Why does fread of a 2 GB file need more than 8 GB of RAM?

textscan is too slow.
So I want to load a 2 GB file into RAM with fread (which is fast) and then scan it.
fread works well with small files, but if I call fread(fid,'*char') on a 2 GB file, RAM usage spikes past my 8 GB limit and I get an out-of-memory error.
Any ideas?
  2 comments
Jan 2013-6-4
Please post the full code, because there might be unexpected problems.
Gabriel 2013-6-4
Well, the code is simple:
fid = fopen(filename);
test = fread(fid, '*char');


Answers (3)

Jan 2013-6-4
Reading a 2 GB file into a CHAR array requires 4 GB of RAM, because MATLAB uses 2-byte chars. Depending on how you store the data, the contents of a temporary array may then be copied, so that 8 GB is the expected memory consumption. I would actually expect that this copy could be avoided, so it would help if you showed us the code fragment.
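The 2-bytes-per-char arithmetic above can be checked on a small file. The following is a minimal sketch, not something posted in the thread: the temporary file and the variable names are made up for illustration. It reads the same 1000 bytes once with '*char' and once with '*uint8' and compares the sizes reported by whos.

```matlab
% Write a small ASCII demo file (stand-in for the real 2 GB file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, repmat(uint8('a'), 1, 1000), 'uint8');   % 1000 bytes on disk
fclose(fid);

% Read as char: MATLAB stores each char in 2 bytes.
fid = fopen(fname, 'r');
asChar = fread(fid, '*char');
fclose(fid);

% Read as uint8: 1 byte per element, same content.
fid = fopen(fname, 'r');
asBytes = fread(fid, '*uint8');
fclose(fid);

charInfo = whos('asChar');
byteInfo = whos('asBytes');
fprintf('char:  %d bytes in memory\n', charInfo.bytes);
fprintf('uint8: %d bytes in memory\n', byteInfo.bytes);

delete(fname);
```

If the data only needs to be held as raw bytes before parsing, reading with '*uint8' and converting pieces with char() later would halve the footprint of the array itself. Whether that is enough to avoid the spike described in the question is not established in this thread.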
  2 comments
Gabriel 2013-6-4
Precisely. I expect it to require 4 GB, yet when I watch the system monitor, the whole thing goes over 8 GB and into swap.
I also understand the part about arrays being copied when passed into functions, etc. But shouldn't FREAD be able to load a 2 GB file into a 4 GB char array without needing more than 8 GB of RAM?
Jan 2013-6-4
Edited: Jan 2013-6-4
I've seen equivalent behavior in another FREAD implementation (not in MATLAB): the required final size was not determined by FSEEK; instead, the file was read in chunks until the buffer was full, and then the buffer was re-allocated at double the size. After the obvious drawbacks were pointed out in a discussion, the author decided to replace the doubling method with a smarter Fibonacci sequence. :-)
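For comparison, here is a minimal sketch of determining the final size up front with fseek and ftell and then asking fread for exactly that many elements. The demo file and variable names are invented for illustration. Note that the thread does not confirm how MATLAB's fread allocates its buffer internally, so it is only an assumption that passing an explicit count changes the peak memory use.

```matlab
% Create a small binary demo file (stand-in for the large file).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(1:200), 'uint8');
fclose(fid);

fid = fopen(fname, 'r');
fseek(fid, 0, 'eof');        % jump to the end of the file
nBytes = ftell(fid);         % current position = file size in bytes
fseek(fid, 0, 'bof');        % back to the beginning

% Request exactly nBytes elements, 1 byte each.
data = fread(fid, nBytes, '*uint8');
fclose(fid);
delete(fname);

fprintf('File size: %d bytes, elements read: %d\n', nBytes, numel(data));
```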



Iain 2013-6-4
As Jan implied, passing variables around often leads to memory duplication: 2 GB arrays get COPIED when they are passed into functions.
The "Out of memory" error normally comes up when MATLAB cannot find a single contiguous chunk of RAM big enough for a variable.
Use much smaller chunks of memory: read the file and parse it in chunks of, say, 64 MB.
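The chunked approach suggested above could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: it assumes one number per line, uses a tiny demo file and a tiny chunk size so it runs quickly, and carries any incomplete trailing line over to the next chunk so that no number is split across a chunk boundary.

```matlab
% Create a small demo file with one number per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d\n', 1:1000);
fclose(fid);

chunkBytes = 256;     % demo value; the suggestion above would be 64*2^20
leftover   = '';      % incomplete last line carried into the next chunk
total      = 0;
count      = 0;

fid = fopen(fname, 'r');
while true
    raw = fread(fid, chunkBytes, '*uint8')';   % row vector, 1 byte per element
    if isempty(raw)
        break;                                 % nothing left to read
    end
    txt = [leftover, char(raw)];
    lastNL = find(txt == char(10), 1, 'last'); % last complete line ends here
    if isempty(lastNL)
        leftover = txt;                        % no full line yet, keep reading
        continue;
    end
    leftover = txt(lastNL+1:end);
    vals  = sscanf(txt(1:lastNL), '%f');       % parse only complete lines
    total = total + sum(vals);
    count = count + numel(vals);
end
fclose(fid);

% Parse a final line that has no trailing newline, if any.
if ~isempty(leftover)
    vals  = sscanf(leftover, '%f');
    total = total + sum(vals);
    count = count + numel(vals);
end
delete(fname);

fprintf('Parsed %d values, sum = %d\n', count, total);
```

Only one chunk plus the carried-over fragment is held as text at any time, so peak memory depends on chunkBytes rather than on the file size.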
  2 comments
Walter Roberson 2013-6-4
The arrays will only get copied if they are modified; otherwise the data pointer will point to the original storage.
Gabriel 2013-6-4
I apologize, I think I did not express myself well. Parsing is not the issue. I fully expect scanning functions to be (relatively) memory-hungry.
What I don't understand is why fread needs so much overhead to load a 2 GB+ file into the workspace.



Gabriel 2013-6-4
Edited: Gabriel 2013-6-4
In any case, I have found a workaround for textscanning large ASCII files (4 GB and beyond) that contain numbers.
The trick is to pad the numbers with Perl or sed before reading them into MATLAB. If you pad your numbers with leading zeros, every line has the same number of characters, so FREAD is easy to execute in chunks.
Example:
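while ~feof(fid)
    % X and bytesPerLine are placeholders: lines per chunk, bytes per padded line
    tmp = fread(fid, X * bytesPerLine, '*char')';
    if isempty(tmp), break; end
    data = textscan(tmp, '%f');
    process(data);   % placeholder for your own processing step
end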
With this trick, I went from 3 MB/sec to 130 MB/sec for processing a file.
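A self-contained sketch of this fixed-width idea follows. Everything in it is an assumption made for illustration: the padding is done with fprintf instead of Perl or sed, the file is a tiny demo with one zero-padded integer per line, and the processing step is just a running sum. Because every line has the same length, each chunk is a whole number of lines and no leftover handling is needed.

```matlab
% Write a demo file where every line is 8 zero-padded digits plus a newline.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%08d\n', 1:1000);
fclose(fid);

bytesPerLine  = 9;     % 8 digits + 1 newline character
linesPerChunk = 64;    % demo value; use a much larger value for real files
total = 0;
count = 0;

fid = fopen(fname, 'r');
while true
    % Read whole lines only; 'uint8=>char' reads bytes and returns chars.
    tmp = fread(fid, bytesPerLine * linesPerChunk, 'uint8=>char')';
    if isempty(tmp)
        break;
    end
    data  = textscan(tmp, '%f');     % parse the chunk held in memory
    total = total + sum(data{1});    % stand-in for the real processing
    count = count + numel(data{1});
end
fclose(fid);
delete(fname);

fprintf('Parsed %d values, sum = %d\n', count, total);
```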
