Textscan doesn't work on big files?

Oscar Perez 2024-5-22

Commented: Harald 2024-5-24

topcube.txt

I'm currently using the latest MATLAB version on a Mac with 16 GB of RAM.

I tried to split a really big cube file (100 GB) into smaller cube files with only 210151 lines per file using this code:

%% Splitting
% opening the result.cube file
fid = fopen(cube) ;
if fid == -1
    error('File could not be opened.');
end
m = 1 ;
while ~feof(fid)
    % skip the alpha and beta density
    fseek(fid,16596786,0) ;
    
    % copy the spin density
    text = textscan(fid,'%s',210150,'Delimiter','\n','Whitespace','') ;
    
    
    % Prints the cube snap shot to the subdirectory 
    name = string(step_nr(m))+'.cube' ;
    full_path = fullfile(name1,name) ;
    fid_new = fopen(full_path,"w") ;
    fprintf(fid_new,'%s\n', text{1}{:}) ;
    fclose(fid_new) ;
    m = m+1 ;
end
fclose(fid) ;
save("steps","step_nr")
end

My problem is that textscan apparently is not suited to this kind of file. I also tried copying line by line with fgetl, but that takes ages for a 100 GB file. Is there a more efficient way to split the file?

I've read about fscanf and tried this:

tic;
fid = fopen('result.cube');
fgetl(fid) ; fgetl(fid) ;
f = fscanf(fid, '%d %f %f %f', [4 4]) ;
s = fscanf(fid, '%d %f %f %f %f', [5 192]) ;
n = fscanf(fid, '%f %f %f %f %f %f', [6 209953]) ;
fid_new = fopen("new",'w') ;
fprintf(fid_new, '%d %.6f %.6f %.6f\n', f) ;
fprintf(fid_new, '%d %.6f %.6f %.6f %.6f\n', s) ;
fprintf(fid_new, '%f %f %f %f %f\n', n) ;
fclose(fid) ;
t=toc

But my problem here is that `s` is not aligned in the individual file the way it is in the big file, and `n` is written as plain decimals instead of in exponent notation such as E-02. I also tried copying it line by line, but that takes far too long. Any suggestions on how to improve this? I want it to look like this:

2 Comments

Steven Lord 2024-5-22

Is your goal to split the file or is your goal to work with the data in MATLAB? If the latter, some of the Large File and Big Data functionality available in MATLAB may be of use to you.

Oscar Perez 2024-5-22

My goal is actually just splitting a really huge file into smaller ones. Afterwards, I want to deal with them individually.
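If the goal is only to split the file, one option is to copy raw bytes instead of parsing the text at all. The following is a minimal sketch under stated assumptions, not a tested solution for the actual cube file: the demo file, the chunk size, and the line count are hypothetical stand-ins, and lines are assumed to end in a single `\n` byte. It uses only fread, fwrite, fseek, and find.

```matlab
% Sketch: copy a fixed number of lines as raw bytes, without parsing them.
% The demo file, chunk size, and line count are illustrative assumptions.
src = [tempname '.txt'];
fid = fopen(src, 'w');
fprintf(fid, 'line %d\n', 1:10);       % small stand-in for the big file
fclose(fid);

nLines    = 4;                          % lines per output file (210150 in the question)
chunkSize = 16;                         % bytes per read; several MB for real data
LF        = uint8(10);                  % the newline byte '\n'

fidIn  = fopen(src, 'r');
dst    = [tempname '.txt'];
fidOut = fopen(dst, 'w');
remaining = nLines;                     % lines still to be copied
while remaining > 0
    buf = fread(fidIn, chunkSize, '*uint8');
    if isempty(buf), break; end         % end of file reached
    nl = find(buf == LF);               % newline positions inside this chunk
    if numel(nl) >= remaining
        last = nl(remaining);           % index of the final newline wanted
        fwrite(fidOut, buf(1:last));
        fseek(fidIn, last - numel(buf), 'cof');  % rewind the unread surplus
        remaining = 0;
    else
        fwrite(fidOut, buf);            % whole chunk belongs to this block
        remaining = remaining - numel(nl);
    end
end
fclose(fidOut);
fclose(fidIn);
copied = fileread(dst);                 % contents of the first split file
```

Because the bytes are copied verbatim, column alignment and exponent notation such as E-02 are preserved exactly as they appear in the source file. In the loop from the question, the same copy step would be repeated once per output file, with an fseek call in between to skip the unwanted part.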

Answers (1)

Harald 2024-5-22

Hi Oscar,

Please attach a sample data file (1 MB will be plenty) so that we can reproduce the issue.

What problem do you encounter with the textscan approach? One issue I suspect: While textscan usually resumes where the previous textscan command left off, you always use fseek to move to the same point again. It seems you should place the call to fseek outside of the while loop.
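Whether a repeated fseek call returns to the same point depends on its third argument, the origin: `'bof'` (or -1) is the beginning of the file, `'cof'` (or 0) is the current position, and `'eof'` (or 1) is the end of the file. The small sketch below, on a hypothetical six-line demo file, may help check which behavior applies; with an origin of 0, as in the code from the question, each call advances relative to the current position.

```matlab
% Sketch: absolute vs. relative seeking with fseek. The demo file is an
% illustrative assumption; each line is 4 characters plus '\n' (5 bytes).
demo = [tempname '.txt'];
fid = fopen(demo, 'w');
fprintf(fid, '%s\n', 'aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff');
fclose(fid);

fid = fopen(demo, 'r');
fseek(fid, 5, 'bof');  posA = ftell(fid);   % absolute: byte 5
fseek(fid, 5, 'bof');  posB = ftell(fid);   % absolute again: still byte 5
fseek(fid, 5, 'cof');  posC = ftell(fid);   % relative: byte 10
lineRead = fgetl(fid);                      % reads the third line, moves past its '\n'
fseek(fid, 5, 0);      posD = ftell(fid);   % numeric 0 means 'cof': skips one more line
fclose(fid);
```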

For block reading, I would usually resort to datastores. If the data is in tabular format, I would specifically use tabularTextDatastore:

https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.tabulartextdatastore.html
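As a rough illustration of block reading with a datastore, the sketch below reads a small comma-separated file a few rows at a time. The demo file, the variable names, and the `ReadSize` of 4 are hypothetical; a cube file whose header lines have a different number of columns than the data lines may not fit this tabular model without extra options, so treat this as a sketch rather than a drop-in solution.

```matlab
% Sketch: read a tabular text file in blocks with tabularTextDatastore.
% The demo CSV and the block size are illustrative assumptions.
csvFile = [tempname '.csv'];
T = table((1:10)', ((1:10)').^2, 'VariableNames', {'k', 'kSquared'});
writetable(T, csvFile);

ds = tabularTextDatastore(csvFile, 'ReadSize', 4);  % at most 4 rows per read
blockRows = [];
while hasdata(ds)
    block = read(ds);                   % a table holding the next block of rows
    blockRows(end+1) = height(block);   %#ok<SAGROW> record the block size
end
totalRows = sum(blockRows);
```

Each `block` could be written to its own file inside the loop, so that only one block is held in memory at a time.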

Best wishes,

Harald

9 Comments

Oscar Perez 2024-5-24

Edited: Oscar Perez 2024-5-24

MATLAB stops exactly at textscan and displays "out of memory", already on the first pass through the loop.

Harald 2024-5-24

Ok. Can you try with a smaller number of rows (say 20000 or 2000) to see what the memory usage is?
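One way to run this check is to read a smaller number of rows with the same textscan options and inspect the result with whos. The sketch below does this on a hypothetical stand-in file; the sample line and the row counts are assumptions, and the real cube file may behave differently.

```matlab
% Sketch: measure the memory used by textscan output for different row
% counts. The demo file and its contents are illustrative assumptions.
demo = [tempname '.txt'];
sampleLine = repmat('1.00000E-02 ', 1, 6);   % made-up line of six values
fid = fopen(demo, 'w');
for k = 1:3000
    fprintf(fid, '%s\n', sampleLine);
end
fclose(fid);

fid = fopen(demo, 'r');
for n = [200 2000]
    frewind(fid);                            % start from the top each time
    C = textscan(fid, '%s', n, 'Delimiter', '\n', 'Whitespace', '');
    info = whos('C');                        % size of the result in memory
    fprintf('%6d lines -> %.2f MB in memory\n', n, info.bytes / 2^20);
end
fclose(fid);
```

If memory use grows roughly in proportion to the row count here, the full 210150-row read can be extrapolated from these numbers. If even a small row count runs out of memory on the real file, the read is probably not stopping where expected.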

Products: MATLAB

Release: R2024a
