Reading ASCII file in portions
I have a 4.5 GB ASCII file that I would like to read in portions.
For example, I would like to read 1 GB at a time and store the data in MATLAB. After each 1 GB read, I would like to append the next 1 GB to the data already stored.
Is it possible and if so what is the code?
I tried to use something like the following
segsize = 1000000;
while ~feof(fid)
    data=fread(fid,segsize,'*char');
end
But it is not reading the entire file. I am guessing it stops after segsize characters. How do I make it read 1 GB and store it in MATLAB, then read another 1 GB? I'd like to conserve RAM, as I intend to read much larger files.
Thanks for the Help!
2 comments
Matt Kindig
2013-9-16
Is "data" changing throughout the loop? In other words, is the fread() correctly reading? How big is "data" after, say, one iteration of the while loop?
Answers (1)
Walter Roberson
2013-9-16
feof(fid) does not predict that end of file is about to occur: feof() does not become true until an end-of-file has already been encountered. You need to check how much data you got back from fread(), because you might not get any (if the file was positioned right before end of file when fread() was called).
[data, count] = fread(fid, segsize, '*char');
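The count check described above can be sketched as a loop that stops when fread() returns nothing, rather than relying on feof(). This is a minimal illustration, not the poster's actual code: the temporary demo file, the small segsize, and the variables totalRead and nChunks are assumptions made so the example runs on its own.

```matlab
% Hypothetical demo file so the sketch is self-contained; use your own path instead.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(repmat('0123456789', 1, 250)), 'uint8');   % 2500 bytes of ASCII
fclose(fid);

segsize = 1000;     % demo chunk size; something like 2^30 would be about 1 GB
totalRead = 0;      % characters read so far
nChunks = 0;        % number of non-empty chunks

fid = fopen(fname, 'r');
while true
    [data, count] = fread(fid, segsize, '*char');
    if count == 0
        break;      % fread returned nothing: end of file was reached
    end
    nChunks = nChunks + 1;
    totalRead = totalRead + count;
    % Process "data" here (parse it, or append it to stored results).
end
fclose(fid);
delete(fname);
```

Note that appending every chunk to one growing array ends up holding the whole file in memory, so if the goal is to conserve RAM it is usually better to reduce each chunk to the values you need before moving on to the next one.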
Question: is the file definitely ASCII? As in the last printable character is decimal 126, the tilde ("~") character? Or is the file potentially UTF-8 or UTF-16 encoded due to having been created that way or edited using an editor that automatically saves to UTF-* ? If the file happens to contain bytes with value beyond 127, what do you want to happen? Should the fread() try to examine the byte sequence to see if it should decode the UTF-8 or UTF-16 into the Unicode that MATLAB uses internally? Or should the fread() return each byte of input as a distinct position in the string?
The code you have now is for the case where the file might be UTF-* encoded and fread() is to examine the byte stream to see if it can decode it. If you do not want that to happen, then use 'uint8=>char' instead of '*char'.
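To illustrate the byte-by-byte alternative, here is a small sketch that reads with 'uint8=>char' so each byte becomes exactly one character. The demo file and its four bytes (including a two-byte UTF-8 sequence) are assumptions chosen for illustration. How '*char' would treat the same bytes depends on the encoding associated with the file in fopen(), so that case is not shown.

```matlab
% Hypothetical demo file: 'H', 'i', then the two-byte UTF-8 sequence for U+00E9.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, [72 105 195 169], 'uint8');
fclose(fid);

fid = fopen(fname, 'r');
[raw, count] = fread(fid, Inf, 'uint8=>char');   % one char per byte, no decoding
fclose(fid);
delete(fname);

rawCodes = double(raw(:).');   % numeric value of each returned character
```

With this precision, count always equals the number of bytes read, which makes chunk sizes predictable when splitting a large file.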
2 comments
Walter Roberson
2013-9-16
Does the file contain any characters other than A-Z a-z 0-9 ~!@#$%^&*()_+`-=[]{}\| ;':",.<>/? and spaces and end-of-line characters ?