Best way to read large text files (over 2 mil rows) into matlab?
I need to read in a .csv file with 4 columns and over 2 million rows. Each column consists of a 3-row header followed by 50,000 numerical values; this pattern of a header followed by 50,000 numbers repeats hundreds of times within the same columns, until there are over 2 million rows of data.
What is the fastest and most efficient way to read these columns into MATLAB? It isn't a big deal if the cells that contain strings get read in as NaN; I can always fix that after the file has been read in.
The code that I am currently using to read in the data (shown below) takes over 3 hours and completely freezes my computer while it runs.
filename = 'input.csv';
delimiter = ',';
startRow = 1;
%% Read columns of data as strings:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the
% code from the Import Tool.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, ...
    'HeaderLines', startRow-1, 'ReturnOnError', false);
%% Close the text file.
fclose(fileID);
Answers (1)
per isakson
2014-8-8
Edited: per isakson
2014-8-9
The file consists of many blocks of header lines followed by numerical data(?). There is no high-level function in MATLAB that reads a file like yours.
- Does "50,000 numbers" translate to 12,500 rows (4 values per row)?
- the entire file held as one string variable in MATLAB will be approx. 0.2 GB
- the numerical data converted to double will be less than 0.1 GB
That should fit comfortably in memory.
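The arithmetic behind these estimates can be checked with a few lines. This is only a rough sketch: the row and column counts come from the question, while the figure of about 50 characters per row is an assumption that is not stated anywhere in the thread.

```matlab
% Back-of-the-envelope memory estimate (all figures approximate).
nRows          = 2e6;   % "over 2 million rows", from the question
nCols          = 4;     % four columns, from the question
bytesPerDouble = 8;     % a MATLAB double occupies 8 bytes

numericGB = nRows * nCols * bytesPerDouble / 1e9;

% ASSUMPTION (not from the thread): about 50 characters per row.
% MATLAB stores each char in 2 bytes.
charsPerRow  = 50;
bytesPerChar = 2;
textGB = nRows * charsPerRow * bytesPerChar / 1e9;

fprintf('numeric: %.3f GB, text: %.3f GB\n', numericGB, textGB);
```

With these inputs the numeric data comes to roughly 0.06 GB and the text to roughly 0.2 GB; a different average row length changes the second figure proportionally.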
 
I think that the "fastest and most efficient way" is to
- read the entire file into one string variable
- split the string into sub-strings, each containing header lines followed by numerical data
- parse the sub-strings with textscan
Filling in the details requires more information on the format of the file.
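Pending those details, the three steps above might look roughly like the sketch below. It rests on assumptions that the thread does not confirm: every data row starts (with no leading blanks) with a digit, sign, or decimal point, every header line starts with some other character, and each data row holds four comma-separated numbers. The function name readBlockedCsv is hypothetical. Unlike the outline above, the sub-strings here contain only the numeric rows, so the header lines are dropped rather than parsed.

```matlab
function blocks = readBlockedCsv(filename)
%READBLOCKEDCSV Hypothetical helper: read repeated header+data blocks.
%   blocks{k} is an N-by-4 double array with the numeric rows of block k.
%   ASSUMPTIONS: data rows begin with one of 0-9, +, -, or . and contain
%   four comma-separated numbers; header lines begin with anything else.

% Step 1: read the entire file into one character vector.
txt = fileread(filename);

% Locate the first and last character of every line.
nl        = find(txt == char(10));
lineStart = [1, nl + 1];
lineStart(lineStart > numel(txt)) = [];   % no line after a trailing newline
lineEnd   = [lineStart(2:end) - 1, numel(txt)];

% Classify each line by its first character (see ASSUMPTIONS above).
isData = ismember(txt(lineStart), '0123456789+-.');

% Step 2: find runs of consecutive data lines; each run is one block.
d         = diff([0, isData, 0]);
firstLine = find(d == 1);
lastLine  = find(d == -1) - 1;

% Step 3: parse each block's sub-string with textscan.
blocks = cell(numel(firstLine), 1);
for k = 1:numel(firstLine)
    sub = txt(lineStart(firstLine(k)):lineEnd(lastLine(k)));
    C = textscan(sub, '%f%f%f%f', 'Delimiter', ',', 'CollectOutput', true);
    blocks{k} = C{1};
end
end
```

A call such as blocks = readBlockedCsv('input.csv'); followed by data = vertcat(blocks{:}); would then stack all blocks into one numeric matrix. If the header lines can start with a digit (a date, for example), the classification rule would need to change to match the real file.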