How to save a tall array / table to a text or csv file?
7 次查看(过去 30 天)
显示 更早的评论
I'm trying to save the contents of a tall table that doesn't fit into memory to a .txt file.
MATLAB provides the function write. However, it can only write the table contents to .mat files. So far I haven't found an option or another function that could write the data to a text file.
A workaround I'm trying to do is to continously gather a part of the tall table, save it to a text file, gather the next part etc. until the table end is reached. For that to work in tall syntax I suppose I need a vector containing the row numbers of the tall table. That way I could find the index of the rows I want to gather and write with:
idx = (RowNumbers >= lowerLimit) & (RowNumbers <= upperLimit);
With the index vector it is then possible to gather the rows I want to save to the text file:
TableToSave = gather(Table(idx,:));
once the data is gathered the table could be saved with the writetable function. After that, the lowerLimit and upperLimit could be adjusted and a new chunk of the table could be saved.
The point where I'm failing is the construction of this vector containing the row numbers. In theory it's simply
RowNumbers = 1:1:size(Table,1); // Or: RowNumbers = 1:1:gather(size(Table,1));
The first one doesn't work because the 1:X syntax doesn't support 'tall doubles' and the second approach doesn't work I suppose because the resulting RowNumbers and index vector are completely in-memory while the table is not.
So if I try to
idx = tall((RowNumbers >= lowerLimit) & (RowNumbers <= upperLimit));
and use
gather(head(DB(idx,:)));
The following error appears:
Incompatible tall array arguments. The first dimension in each tall array must have the same size, and each tall array must be based on the same datastore.
To sum up:
1. Is there another way to save tall arrays / tables to text files?
2. How to create an "unevaluated" row number array that then could be used for the described workaround?
Thanks a lot!
0 个评论
采纳的回答
Edric Ellis
2017-9-7
I think the least inefficient method is probably to combine use of tall/write with writetable. Calling gather repeatedly is going to be inefficient - the approach below takes 2 passes over the data (one of which is in the optimised .mat form, so should be quicker). Here's the sort of thing I mean.
%%Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
'SelectedVariableNames', varnames);
tt = tall(ds1);
%%Step 2 - operate on tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;
%%Step 3 - use tall/write to emit .mat files
writeDir = tempname
mkdir(writeDir);
write(writeDir, tt);
%%Step 4 - iteratively convert the tall/write output to CSV
ds = datastore(writeDir);
csvDir = tempname
mkdir(csvDir);
idx = 0;
while hasdata(ds)
idx = 1 + idx;
fname = fullfile(csvDir, sprintf('out_%06d.csv', idx));
writetable(read(ds), fname);
end
A refinement of this would be to partition the datastore and operate on it in parallel, using the techniques described here in the documentation.
2 个评论
Edric Ellis
2017-9-7
Here's the parfor version of Step 4.
%%Step 5 - use parfor to parallelise the writetable loop
ds = datastore(writeDir);
N = numpartitions(ds, gcp);
csvDir2 = tempname
mkdir(csvDir2);
parfor idx1 = 1 : N
idx2 = 0;
subds = partition(ds, idx1, N);
while hasdata(subds)
idx2 = 1 + idx2;
fname = fullfile(csvDir2, sprintf('out_%06d_%06d.csv', idx1, idx2));
writetable(read(subds), fname);
end
end
更多回答(1 个)
Adam Filion
2018-10-1
编辑:Adam Filion
2018-10-1
As of R2018b the tall write command now directly supports writing tall arrays to .txt files (also .csv, .xls* and custom formats) in addition to .mat:
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Tall Arrays 的更多信息
产品
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!