MATLAB unable to parse a numeric field when I use the gather function on a tall array

I have a CSV file with a large number of data points on which I want to run a particular algorithm. So I created a tall array from the file and wanted to import a small chunk of the data at a time. However, when I tried to use gather to bring the small chunk into memory, I got the following error.
"Board_Ai0" is the header of the CSV file. It is not present in row 15355, as can be seen below, where I opened the CSV file in MATLAB's Import Tool.
The same algorithm works perfectly fine when I don't use a tall array and instead import the whole file into memory. However, I have other, larger CSV files that I also want to analyze, and they won't fit in memory.
UPDATE: Apparently the images were illegible, but someone else edited the question to enlarge them, so they should be fine now. Also, I can't attach the data files to this question because the files that give me this problem are all larger than 5 GB.
Harald, 2025-9-1, 12:35
This makes sense: as it stands, MATLAB tries to import the entire file at once. I should have mentioned that.
You need to set 'ReadMode' to 'partialfile' and specify a 'ReadFcn' that imports a certain number, say 100,000, of rows at a time. It could then look like this:
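ds = fileDatastore("yourfile.csv", "ReadFcn", @readdata, "UniformRead", true, "ReadMode", "partialfile");
data = readall(ds);

function [data, startrow, done] = readdata(filename, startrow)
    % Import nRows rows per call; startrow is passed between calls as userdata
    nRows = 100000;
    if isempty(startrow)
        startrow = 2;                  % first call: userdata is empty, so skip the header line
    end
    opts = detectImportOptions(filename);
    opts.DataLines = [startrow, startrow+nRows-1];
    data = readtimetable(filename, opts);
    done = height(data) < nRows;       % a short chunk means the end of the file was reached
    data = rmmissing(data);            % drop incomplete rows only after the end-of-file check
    startrow = startrow + nRows;
end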
Best wishes,
Harald
Ninad, 2025-9-6, 12:47
The code works well when I run it on a file that fits in memory, but when I run it on a file that doesn't, I get the following error:
The code is:
ds = fileDatastore("1kcross.csv", "ReadFcn", @readdata, "UniformRead", true, "PreviewFcn", @givetimetable, "ReadMode", "partialfile");
data = tall(ds);
slice = data(1:10000000, :);
slice = gather(slice);

% Local functions must come after all other code in a MATLAB script
function [data, startrow, done] = readdata(filename, startrow)
    nRows = 10000000;
    if isempty(startrow)
        startrow = 2;                  % first call: skip the header line
    end
    opts = detectImportOptions(filename);
    opts.DataLines = [startrow, startrow+nRows-1];
    data = readtimetable(filename, opts);
    done = height(data) < nRows;       % a short chunk means the end of the file was reached
    data = rmmissing(data);
    startrow = startrow + nRows;
end

function [data, startrow, done] = givetimetable(~, ~)
    % Preview function: returns a fixed one-row timetable instead of reading the file
    data = timetable(seconds(200.000005), 0.139389038085938, 'VariableNames', ["Board0_Ai0"]);
    startrow = 2;
    done = true;
end
What am I still doing wrong?


Accepted Answer

Stephen23, 2025-9-8, 14:43 (edited by dpb, 2025-9-8, 16:45)
Providing the RANGE argument does not prevent READTABLE from calling its automatic format detection, which might involve loading all or a significant part of the file into memory. The documented solution is to provide an import options object yourself (e.g., generate it once from a known-good, smaller file and then store it) or, alternatively, to use a low-level file-reading command such as FSCANF or FREAD.
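Stephen23's first suggestion could look roughly like this: run detectImportOptions once on a small file with the same layout, then reuse that options object for every chunk, changing only DataLines. This is a minimal sketch under stated assumptions: the file names, the layout (one header line followed by two numeric columns, time and Board0_Ai0), and the chunk size are made up, and the sketch writes its own tiny stand-in files so it runs on its own. It has not been checked against a multi-GB file.

```matlab
% Stand-in files so the sketch runs on its own. ASSUMED layout: one header
% line, then two numeric columns (time in seconds, Board0_Ai0).
smallFile = fullfile(tempdir, 'sample_small.csv');
bigFile   = fullfile(tempdir, 'sample_big.csv');
fid = fopen(smallFile, 'w');
fprintf(fid, 'Time,Board0_Ai0\n');
fprintf(fid, '%.6f,%.6f\n', [(0:4)*1e-3; sin(0:4)]);
fclose(fid);
fid = fopen(bigFile, 'w');
fprintf(fid, 'Time,Board0_Ai0\n');
fprintf(fid, '%.6f,%.6f\n', [(0:24)*1e-3; sin(0:24)]);
fclose(fid);

% Format detection runs ONCE, and only on the small file.
opts = detectImportOptions(smallFile);
% save('importOpts.mat', 'opts');   % optionally store it for later sessions

chunkRows = 10;                   % rows per chunk (use far more for real data)
startrow  = opts.DataLines(1);    % first data line found in the small file
nTotal    = 0;                    % running count of data rows read
while true
    opts.DataLines = [startrow, startrow + chunkRows - 1];
    T = readtable(bigFile, opts); % opts is supplied, so no detection pass on bigFile
    nTotal = nTotal + height(T);  % ...process the chunk T here...
    if height(T) < chunkRows
        break                     % short (or empty) chunk: end of file reached
    end
    startrow = startrow + chunkRows;
end
fprintf('Read %d data rows in chunks of %d\n', nTotal, chunkRows);
```

In the fileDatastore setup from the thread, the same idea would mean handing opts to the ReadFcn (for example `"ReadFcn", @(f, u) readdata(f, u, opts)`, where this three-input readdata is hypothetical) instead of calling detectImportOptions inside it. readtimetable also accepts an options object, so the approach should carry over to timetables.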
Ninad, about 12 hours ago
I tried the import-options route and MATLAB crashed. Anyway, I have decided to stop working on this problem and just buy more RAM.
@dpb, since you mentioned on the other answer that having the data file would make this easier, here is a data file that gave me problems:
https://drive.google.com/file/d/162QUEpXudcHb5sE1IQxDLdUuu_RcCkGw/view?usp=sharing
dpb, about 7 hours ago
I was suggesting attaching a piece of the file (perhaps zipped, to include a little more). That would give folks enough to test with while duplicating the actual format.
What, precisely, does "MATLAB crashed" mean? Did MATLAB itself actually abort, or was it another out-of-memory error, or ...?
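For the other route Stephen23 mentions, a low-level reader, a minimal sketch with FSCANF might look like this; it keeps only one chunk in memory at a time. The layout (one header line, then comma-separated time and Board0_Ai0 values), the file name, and the running-maximum computation are assumptions for illustration, and the sketch writes its own small stand-in file so it is runnable.

```matlab
% Stand-in file so the sketch runs on its own. ASSUMED layout: one header
% line, then "time,value" numeric pairs, one pair per line.
fname = fullfile(tempdir, 'demo_lowlevel.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'Time,Board0_Ai0\n');
fprintf(fid, '%.6f,%.6f\n', [(0:24)*1e-3; sin(0:24)]);
fclose(fid);

chunkRows  = 10;      % rows per chunk (use far more for real data)
nTotal     = 0;       % running count of data rows
runningMax = -Inf;    % example per-chunk computation: running max of Board0_Ai0

fid = fopen(fname, 'r');
fgetl(fid);                                    % skip the single header line
while true
    % Read up to chunkRows lines; A is 2-by-n (row 1 = time, row 2 = value)
    A = fscanf(fid, '%f,%f', [2, chunkRows]);
    if isempty(A)
        break                                  % nothing left to read
    end
    nTotal = nTotal + size(A, 2);
    runningMax = max(runningMax, max(A(2, :)));
end
fclose(fid);
fprintf('Read %d rows; max Board0_Ai0 = %.6f\n', nTotal, runningMax);
```

Because fscanf does no format detection, memory use is governed by chunkRows alone; the trade-off is that you must know the column layout and the number of header lines yourself.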


More Answers (1)

dpb, 2025-9-6, 13:59 (edited 2025-9-6, 18:43)
It appears that detectImportOptions is the problem: apparently it reads the whole file into memory before it does its forensics.
I don't think you need an import options object anyway; use the 'Range' name-value argument to readtimetable instead.
Something like:
function [data, startrow, done] = readdata(filename, startrow)
    nRows = 10000000;
    if isempty(startrow)
        startrow = 2;   % this looks unlikely to be right: the earlier image suggests 3(?) header rows
    end
    range = sprintf('%d:%d', startrow, startrow+nRows-1);   % row range covering exactly nRows rows
    data = readtimetable(filename, 'Range', range);
    done = height(data) < nRows;   % check for the end of the file before dropping rows
    data = rmmissing(data);
    startrow = startrow + nRows;
end
This may still have some issues with the timetable, however, if readtimetable first reads variable names from a header line, because that header line isn't present in the later sections of the file. I don't know what trouble you'll run into with such large files if you try to read 100K lines from further into the file but also tell it to read the variable names from the second or third line of the file. The best bet is probably to ignore the variable names and let MATLAB use its defaults, then either set Properties.VariableNames after reading or just accept the defaults.
Harald, 2025-9-9, 7:36
@Ninad, sorry that my suggestion did not work, and for the trouble around this. I would usually test my suggestions, but that is difficult without the data.
@dpb, while I work at MathWorks, I am not a developer or in Technical Support. I try to support Answers as my core duties permit.
dpb, about 5 hours ago (edited about 5 hours ago)
@Harald, no problem; I was just explaining why I hadn't poked at it harder earlier...
If @Ninad would attach a short section of a file, it would indeed make things simpler. At the moment it isn't convenient to stop a debugging session and create a local copy of a similar file to poke at.
The documentation isn't all that helpful: the only examples I can find that use tables/timetables with tall arrays use tiny data files and don't use fileDatastore, so they have no callback function that returns a table. I don't believe there is an example of that combination.


Release: R2025a
