Internal problem while evaluating tall expression (requested 40.5 GB array)
Hi, I'm working with a large data set of approximately 500k rows and 6k columns. I'm using a datastore and a tall array to handle the loading. The file is comma-separated, with most of its values coded as integers or strings. I have a dictionary for decoding these values. What I am trying to do is replace the codes with their actual meanings and save the decoded file locally.
Below is the structure of my program:
classdef myTable < handle
    % ...
    methods
        function this = myTable
        end
        % ...
    end
    methods
        function loadCsv(this)
            % ...
            ds = datastore(this.csvSource);
            ds.SelectedFormats = repmat({'%q'}, 1, length(ds.VariableNames));
            this.csvTable = tall(ds);
        end
        % ...
        function decoding(this)
            % ...
        end
        function export(this)
            % ...
            write([this.outputDir '/' this.csvTableName '_decoded_*.csv'], this.csvTable, 'WriteFcn', @myWriter);
        end
    end
end
%% helper
function myWriter(info, data)
    filename = info.SuggestedFilename;
    writetable(data, filename, 'FileType', 'text', 'Delimiter', ',')
end
The error occurred at this.export:
Error using digraph/distances
Internal problem while evaluating tall expression. The problem was:
Requested 73733x73733 (40.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive.
Question: I thought the write function would partition the data while exporting. Isn't that true? Why did MATLAB still try to create such a big array?
I am using a Windows machine with 16 GB of RAM and MATLAB R2020a (I first tried R2019a and just upgraded to R2020a).
Thank you!
16 Comments
Peng Li
2020-3-23
Peng Li
2020-3-23
Peng Li
2020-3-24
Peng Li
2020-3-24
per isakson
2020-3-24
Edited: per isakson
2020-3-24
You are asking for too much. I have looked at your code and made a working example based on an example in the documentation. It seems to work. I fail to understand what's going wrong for you. Your code includes a lot of irrelevant stuff.
Proposal
- present an MWE (minimal working example) that produces this error
- upload one row (or a few rows) of your data set.
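A minimal example along the lines proposed above might look like the following sketch. It is not the example referred to in this comment, and the two-column sample data, the file names, and the anonymous writer function are all made up for illustration. It only mirrors the datastore, tall, and write calls from the question.

```matlab
% Hypothetical miniature of the workflow: 3 rows, 2 coded columns.
srcFile = fullfile(tempdir, 'mwe_source.csv');
T = table({'1'; '2'; '1'}, {'a'; 'b'; 'a'}, 'VariableNames', {'sex', 'grp'});
writetable(T, srcFile);

% Read every column as text, as in the question.
ds = datastore(srcFile);
ds.SelectedFormats = repmat({'%q'}, 1, numel(ds.VariableNames));
tt = tall(ds);

% Write into a fresh, empty folder using a custom writer.
outDir = tempname;
mkdir(outDir);
writer = @(info, data) writetable(data, info.SuggestedFilename, ...
    'FileType', 'text', 'Delimiter', ',');
write(fullfile(outDir, 'mwe_decoded_*.csv'), tt, 'WriteFcn', writer);

% List what was produced.
outFiles = dir(fullfile(outDir, 'mwe_decoded_*.csv'));
disp({outFiles.name});
```

If a script like this runs cleanly, widening the sample toward the real column count is one way to find where the failure starts.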
Sean de Wolski
2020-3-24
Yes, please provide a few sample rows.
Peng Li
2020-3-24
Peng Li
2020-3-25
Sean de Wolski
2020-3-25
Your understanding is correct.
But we need to know why digraph is trying to create a 73733x73733 array. It could be that you have something shadowed so it's not calling a builtin; it could be expected and you need to partition differently. I don't know.
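One way to check the shadowing possibility is to list every file on the path that answers to the names in the error. This is only a sketch: it assumes that any digraph definition located outside the MATLAB installation folder deserves a closer look, which is a rule of thumb and not a definitive test.

```matlab
% List every definition of the names involved in the error.
digraphPaths   = which('digraph', '-all');
distancesPaths = which('distances', '-all');
disp(digraphPaths);
disp(distancesPaths);

% Flag digraph entries that do not live under the MATLAB installation.
isShipped = startsWith(digraphPaths, matlabroot);
suspects  = digraphPaths(~isShipped);
if isempty(suspects)
    disp('No digraph definitions found outside matlabroot.');
else
    disp('Possible shadowing files:');
    disp(suspects);
end
```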
Peng Li
2020-3-25
Peng Li
2020-3-25
Walter Roberson
2020-3-25
A complete error message showing the traceback would help.
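A sketch of one way to capture the full traceback as text for posting: wrap the failing call in try/catch and print the extended report. The `error` call below is only a stand-in for the real failing call (the export method in the question).

```matlab
try
    % Stand-in for the failing call, e.g. the export method in the question.
    error('demo:fail', 'Stand-in for the failing export call.');
catch ME
    % Extended report includes the message and the full call stack.
    report = getReport(ME, 'extended', 'hyperlinks', 'off');
    disp(report);
end
```

Turning hyperlinks off keeps the text free of markup when it is pasted into a forum post.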
Peng Li
2020-3-25
Sean de Wolski
2020-3-26
Tall uses a digraph to figure out the smallest number of lower-level operations that need to be done, so that it can traverse the data set as few times as possible and without repetition.
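For a sense of scale, the sketch below shows why an all-pairs `distances` call on a digraph can be expensive: the result has one row and one column per node. It assumes the failing call computed all-pairs distances on a graph with 73733 nodes. The thread does not confirm what those nodes represent, and the 4-node graph is a toy.

```matlab
% Toy graph: a 4-node directed chain 1 -> 2 -> 3 -> 4.
G = digraph([1 2 3], [2 3 4]);
d = distances(G);          % all-pairs shortest paths, numnodes-by-numnodes
disp(size(d));

% Memory for a result of the same kind with the node count from the error.
n = 73733;
bytesNeeded = n^2 * 8;     % 8 bytes per double element
fprintf('%.1f GB\n', bytesNeeded / 2^30);
```

The computed size matches the 40.5 GB in the error message, which is consistent with a full matrix of doubles.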
Peng Li
2020-3-26
Peng Li
2020-3-27
Answers (0)