How to save a tall array / table to a text or csv file?

I'm trying to save the contents of a tall table that doesn't fit into memory to a .txt file.
MATLAB provides the function write. However, it can only write the table contents to .mat files. So far I haven't found an option or another function that could write the data to a text file.
A workaround I'm trying is to repeatedly gather a part of the tall table, save it to a text file, gather the next part, and so on until the end of the table is reached. For that to work in tall syntax, I suppose I need a vector containing the row numbers of the tall table. That way I could find the index of the rows I want to gather and write with:
idx = (RowNumbers >= lowerLimit) & (RowNumbers <= upperLimit);
With the index vector it is then possible to gather the rows I want to save to the text file:
TableToSave = gather(Table(idx,:));
Once the data is gathered, the table could be saved with the writetable function. After that, lowerLimit and upperLimit could be adjusted and a new chunk of the table could be saved.
The point where I'm failing is the construction of this vector containing the row numbers. In theory it's simply
RowNumbers = 1:1:size(Table,1); % Or: RowNumbers = 1:1:gather(size(Table,1));
The first one doesn't work because the 1:X syntax doesn't support 'tall doubles'. The second approach doesn't work, I suppose, because the resulting RowNumbers and index vector are completely in-memory while the table is not.
So if I try to
idx = tall((RowNumbers >= lowerLimit) & (RowNumbers <= upperLimit));
and use
gather(head(DB(idx,:)));
The following error appears:
Incompatible tall array arguments. The first dimension in each tall array must have the same size, and each tall array must be based on the same datastore.
To sum up:
1. Is there another way to save tall arrays / tables to text files?
2. How to create an "unevaluated" row number array that then could be used for the described workaround?
Thanks a lot!

Accepted Answer

Edric Ellis 2017-9-7
I think the least inefficient method is probably to combine use of tall/write with writetable. Calling gather repeatedly is going to be inefficient - the approach below takes 2 passes over the data (one of which is in the optimised .mat form, so should be quicker). Here's the sort of thing I mean.
%% Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds1);
%% Step 2 - operate on tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;
%% Step 3 - use tall/write to emit .mat files
writeDir = tempname
mkdir(writeDir);
write(writeDir, tt);
%% Step 4 - iteratively convert the tall/write output to CSV
ds = datastore(writeDir);
csvDir = tempname
mkdir(csvDir);
idx = 0;
while hasdata(ds)
    idx = 1 + idx;
    fname = fullfile(csvDir, sprintf('out_%06d.csv', idx));
    writetable(read(ds), fname);
end
A refinement of this would be to partition the datastore and operate on it in parallel, using the techniques described here in the documentation.
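As a rough sketch of that refinement (not the exact code from the documentation), the conversion loop could be split across workers as shown below. It assumes Parallel Computing Toolbox is available, that the datastore created from the tall/write output supports numpartitions and partition, and that the airlinesmall.csv sample file shipped with MATLAB is on the path. The file name pattern out_%03d_%06d.csv is only an illustrative choice.

```matlab
% Setup mirroring the steps above: write a tall table to .mat files.
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});
writeDir = tempname;
mkdir(writeDir);
write(writeDir, tall(ds1));

% Convert the .mat output to CSV, one partition per parfor iteration.
ds = datastore(writeDir);
csvDir = tempname;
mkdir(csvDir);
N = numpartitions(ds, gcp);            % assumes a parallel pool can be opened
parfor ii = 1:N
    subds = partition(ds, N, ii);      % partition(ds, N, index)
    chunk = 0;
    while hasdata(subds)
        chunk = chunk + 1;
        % Include the partition index so workers never write the same file.
        fname = fullfile(csvDir, sprintf('out_%03d_%06d.csv', ii, chunk));
        writetable(read(subds), fname);
    end
end
```

Each worker reads only its own partition, so the output files are independent and can be concatenated afterwards if a single CSV is needed.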
  2 Comments
Benjamin Imbach 2017-9-8
Thanks! This is what I was looking for. I still wish there was a direct way to write the tall table to a csv but until then this will do!
A small correction:
subds = partition(ds, idx1, N);
should be
subds = partition(ds, N, idx1);
In your case it worked since the datastore is probably so small that N = 1.
Thanks again!


More Answers (1)

Adam Filion 2018-10-1
Edited: Adam Filion 2018-10-1
As of R2018b, the tall write command directly supports writing tall arrays to .txt files (also .csv, .xls* and custom formats) in addition to .mat.
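A minimal sketch of what this might look like, assuming R2018b or later and the airlinesmall.csv sample file that ships with MATLAB. The output folder and the out_*.csv file name pattern are illustrative choices, not taken from the original post.

```matlab
% Build a small tall table from the sample data set.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'ArrDelay', 'DepDelay'});
tt = tall(ds);
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;

% Hypothetical output location; write emits one or more files there.
outDir = tempname;
mkdir(outDir);

% The .csv extension in the wildcard pattern selects delimited text output.
write(fullfile(outDir, 'out_*.csv'), tt);

% Alternative: keep the default file names and set the type explicitly.
% write(outDir, tt, 'FileType', 'text');
```

The output is still split across several files rather than one large CSV, which is what lets write process data that does not fit in memory.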
