Is it possible to split a large text file in half and then use textscan on both parts?

Hi,
This is my first time in this forum.
I am working on a large text file containing 10^5 x 600 elements, each with 16 digits. I use the textscan command to read the data. I already know the number of columns, so I can generate the format spec beforehand. The main part of my code is shown below:
array=textscan(fileID,Spec,NumRow,'Delimiter',delim,'MultipleDelimsAsOne',true,'HeaderLines',1,'ReturnOnError',false);
When I set NumRow (the number of rows) to 50,000 or fewer, it works fine and takes only about 1 minute to run. However, my system seems to crash when I increase NumRow to 100,000. I suspect that I am running out of virtual memory.
So I wonder: is there a way to split the data into two parts, say rows 1 to 50,000 and rows 50,001 to 100,000?
Thanks! Ati
  3 comments
Atipong 2013-5-14
Hi,
It's something like this, with 10^5 rows and 600 columns separated by spaces.
-4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01
1.5550000000e-02 -1.5832100000e-09 -4.3949250000e-01
-1.9371000000e-04 -1.1074875000e-01 -6.1198500000e-01


Answers (2)

per isakson 2013-5-13
Edited: per isakson 2013-5-13
Something like this
nRow = 50000;
fid = fopen( ... )
buf1 = textscan( fid, ..., nRow, .... );
....
buf2 = textscan( fid, ..., nRow, .... );
fclose( fid );
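To make the outline above concrete, here is a minimal sketch of reading a file in chunks with repeated textscan calls on the same fid. The file name, the 10 x 3 demo data, the chunk size of 4 and the running sum are all made up for illustration; they are not from the original question. One assumption worth checking against the textscan documentation: as far as I know, 'HeaderLines' is applied on every call, so the sketch skips the header once with fgetl instead of passing 'HeaderLines' to each call.

```matlab
% Hypothetical demo file: 10 rows x 3 columns plus one header line.
fname = fullfile(tempdir, 'chunk_demo.txt');
data  = reshape(1:30, 3, 10).';           % row r holds [3r-2, 3r-1, 3r]
fid   = fopen(fname, 'w');
fprintf(fid, 'c1 c2 c3\n');
fprintf(fid, '%.10e %.10e %.10e\n', data.');
fclose(fid);

nRow  = 4;                                % rows per chunk (assumed value)
spec  = repmat('%f', 1, 3);               % one %f per column
total = 0;                                % running sum stands in for real processing
nRead = 0;                                % number of rows seen so far

fid = fopen(fname, 'r');
fgetl(fid);                               % skip the header line once
while true
    % Each call continues from where the previous call stopped.
    buf = textscan(fid, spec, nRow, 'Delimiter', ' ', ...
                   'MultipleDelimsAsOne', true, 'CollectOutput', true);
    chunk = buf{1};                       % numeric block, fewer rows at end of file
    if isempty(chunk), break; end
    total = total + sum(chunk(:));        % process the chunk before reading the next
    nRead = nRead + size(chunk, 1);
end
fclose(fid);
delete(fname);                            % remove the demo file
```

Because buf is overwritten on every pass, only one chunk is held in memory at a time.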
  3 comments
per isakson 2013-5-14
Edited: per isakson 2013-5-14
You have to process the data in buf1 and
clear buf1
before reading the rest of the file. Alternatively, reuse the same variable so that the first chunk is overwritten:
buf = textscan( fid, ..., nRow, .... );
....
buf = textscan( fid, ..., nRow, .... );
Personally, I think I would have written the data to one or more binary files and used memmapfile to work with the data.
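A rough sketch of the binary-file idea, assuming the values are stored as doubles. The tiny 6 x 4 matrix, the file name and the field name 'x' are placeholders for illustration; in practice each chunk returned by textscan would be appended with fwrite right after it is read.

```matlab
nRow = 6; nCol = 4;                       % tiny stand-in for 10^5 x 600
A = reshape(1:nRow*nCol, nRow, nCol);     % hypothetical data

binName = fullfile(tempdir, 'memmap_demo.bin');
fid = fopen(binName, 'w');
fwrite(fid, A(1:3, :).', 'double');       % first chunk, written row by row
fwrite(fid, A(4:6, :).', 'double');       % second chunk appended to the same file
fclose(fid);

% Map the file as an nCol-by-nRow array: column k of m.Data.x is data row k.
m = memmapfile(binName, 'Format', {'double', [nCol nRow], 'x'});
row5 = m.Data.x(:, 5).';                  % fetch row 5 without loading the whole file
col2 = m.Data.x(2, :).';                  % fetch column 2
clear m                                   % release the mapping
delete(binName);                          % remove the demo file
```

The chunks are transposed before writing because fwrite stores data column by column; this keeps each original row contiguous in the file.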



Yao Li 2013-5-14
You can use for loops to auto-generate the formatSpec for textscan(). For example, you can read two columns at a time by defining formatSpec as:
nCol = 600;
formatSpec_array = cell(1, nCol/2);
for j = 1:nCol/2
    temp = repmat({'%*f'}, 1, nCol);     % skip every column by default
    temp(2*j-1:2*j) = {'%f'};            % read only columns 2j-1 and 2j
    formatSpec_array{j} = [temp{:}];     % join the fields into one format string
end
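Such format strings could then be used as follows; this is only a sketch under assumed conditions. The 5 x 6 demo file, its name and the column sums are invented for illustration, and the file has no header line. The file is rewound with frewind before each pass so that every call to textscan starts at the top.

```matlab
nCol = 6; nRow = 5;                       % tiny stand-in for 600 columns
A = reshape(1:nRow*nCol, nRow, nCol);     % hypothetical data
fname = fullfile(tempdir, 'colpair_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%.10e ', 1, nCol-1) '%.10e\n'], A.');
fclose(fid);

colSum = zeros(1, nCol);                  % column sums stand in for real processing
fid = fopen(fname, 'r');
for j = 1:nCol/2
    fmt = repmat({'%*f'}, 1, nCol);       % skip all columns...
    fmt(2*j-1:2*j) = {'%f'};              % ...except the j-th pair
    frewind(fid);                         % each pass starts at the top of the file
    c = textscan(fid, [fmt{:}], 'CollectOutput', true);
    colSum(2*j-1:2*j) = sum(c{1}, 1);     % process the pair, then let it be overwritten
end
fclose(fid);
delete(fname);                            % remove the demo file
```

Note the trade-off: only two columns are in memory at a time, but the whole file is parsed once per pair, so 600 columns would mean 300 passes over the file.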
