Save a large array into equal-length .csv files?
Hi guys, I am trying to save a very large, modified data set as .csv files of equal length, i.e. with the same number of rows in each file. I am using the following script from this link with my own database:
%% Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds1);
%% Step 2 - operate on tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;
%% Step 3 - use tall/write to emit .mat files
writeDir = tempname
mkdir(writeDir);
write(writeDir, tt);
%% Step 4 - use parfor to parallelise the writetable loop
ds = datastore(writeDir);
N = numpartitions(ds, gcp);
csvDir2 = tempname
mkdir(csvDir2);
parfor idx1 = 1 : N
    idx2 = 0;
    subds = partition(ds, N, idx1);
    while hasdata(subds)
        idx2 = 1 + idx2;
        fname = fullfile(csvDir2, sprintf('out_%06d_%06d.csv', idx1, idx2));
        writetable(read(subds), fname);
    end
end
To make each .csv file contain 20000 rows, I changed the datastore call in step 4 to the following:
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir,'ReadSize',RequiredDataRowsPerFile);
This has some effect, but the resulting .csv files still do not all have the same number of rows (the last file will, of course, always be different).
I would appreciate any help. Thanks
Tim
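One possible approach, offered as a minimal sketch rather than a tested fix for the data above: a likely reason for the uneven files is that ReadSize is only an upper bound on the rows returned by each read, so a read can come back short, for example at the end of an input file or of a partition. Buffering the rows and writing a file only once the buffer holds a full block avoids depending on the size of each read. The names rowsPerFile, srcDir, outDir and buffer are illustrative assumptions, and the small demo input stands in for the .mat files in writeDir.

```matlab
% Sketch: re-chunk a datastore into CSV files of exactly rowsPerFile rows.
rowsPerFile = 4;                       % would be 20000 in the question

% Demo input (assumption): three CSV files with uneven row counts.
srcDir = tempname; mkdir(srcDir);
outDir = tempname; mkdir(outDir);
counts = [5 3 6];                      % 14 rows in total
first = 1;
for k = 1:numel(counts)
    x = (first:first + counts(k) - 1)';
    writetable(table(x, 2*x, 'VariableNames', {'A', 'B'}), ...
        fullfile(srcDir, sprintf('in_%d.csv', k)));
    first = first + counts(k);
end

ds = datastore(srcDir);                % each read may return a short chunk
buffer = table();                      % rows carried over between reads
fileIdx = 0;
while hasdata(ds)
    buffer = [buffer; read(ds)];       %#ok<AGROW>
    % Write as many full files as the buffer currently holds.
    while height(buffer) >= rowsPerFile
        fileIdx = fileIdx + 1;
        writetable(buffer(1:rowsPerFile, :), ...
            fullfile(outDir, sprintf('out_%06d.csv', fileIdx)));
        buffer(1:rowsPerFile, :) = [];
    end
end
% Any remaining rows become the final, shorter file.
if height(buffer) > 0
    fileIdx = fileIdx + 1;
    writetable(buffer, fullfile(outDir, sprintf('out_%06d.csv', fileIdx)));
end
```

This version is deliberately serial. Keeping the parfor loop from step 4 would leave one short remainder file per partition, because each worker only sees its own subset of the data, so exact equality and parallel writing pull in different directions here.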