How to read, reshape, and write large data?

2 views (last 30 days)
Hello!
I have an m-by-n data matrix `data` that I want to reshape into a single column vector, data(:), and write to an output file.
% read 100 rows of data
data = [];
for idx_row = 1:100
    A = fscanf(fileID, formatSpec);
    data = cat(1, data, A);
end
% Convert to int16
data = data*1e6;
data = int16(data);
% Write to file
fp = fopen([filepath 'data.dat'], 'wb');
fwrite(fp, data(:), 'int16');
fclose(fp);
The problem is that data is far too large to fit in memory (e.g. 100 x 1e10). Also, each row of the data is saved in a separate file, so I must read them separately.
I can read a single row, which works fine, but when I try to add more rows, the computer runs out of memory rather quickly. :(
Preallocating a large array to fill with the data runs into the same out-of-memory problem -
data = nan(100,1e10)
Error using nan
Requested 100x10000000000 (7450.6GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive.
How can I make it work? Thanks in advance!
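For reference, the 7450.6 GB figure in the error message above is just the raw size of a 100 x 1e10 double array; a quick back-of-the-envelope check in plain MATLAB (no assumptions beyond the sizes in the question):

```matlab
% Each double takes 8 bytes, so a 100 x 1e10 array needs:
nbytes = 100 * 1e10 * 8;   % 8e12 bytes
ngib   = nbytes / 2^30;    % ~7450.58, matching the "7450.6GB" in the error
```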
2 Comments
Rik 2021-7-29
If you don't have 8 TB of RAM, you can't create such a large array (and even if you had it, there could still be a problem, as the memory needs to be contiguous). Using int16 to preallocate your array will help, but only by a factor of 4.
You will have to do this chunk by chunk.
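As a sketch of the point about preallocating in int16 (the sizes here are illustrative, much smaller than the question's):

```matlab
% double preallocation: 8 bytes per element
d   = zeros(100, 1e6);           % ~800 MB for this smaller example
% int16 preallocation: 2 bytes per element, a factor of 4 less
d16 = zeros(100, 1e6, 'int16');  % ~200 MB
```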
Chunru 2021-7-29
8 TB is way too big for today's systems. However, the array size is not strictly limited by the RAM size; it is limited by the virtual memory size the OS manages (which may use the hard disk as part of the memory hierarchy). Of course, speed suffers when data is exchanged between RAM and disk very frequently.


Accepted Answer

Chunru 2021-7-29
Edited: Chunru 2021-7-29
You can read a small portion at a time and write it to the output file. This way you will not use a lot of memory.
blocksize = 1e6;
nfiles = 100;
fileID = zeros(1, nfiles);
for i = 1:nfiles
    % fileID(i) = fopen(...);   % open each per-row input file
end
fp = fopen([filepath 'data.dat'], 'wb');
data = zeros(nfiles, blocksize);
% nblocks = number of full blocks per file; you may need special
% treatment for the last (partial) block
for iblock = 1:nblocks
    for i = 1:nfiles
        % For large files, use fread/fwrite for speed;
        % fscanf and fprintf are slow and take much more disk space.
        % (count and precision here are assumed)
        data(i, :) = fread(fileID(i), blocksize, 'double');  % read a block from each file
    end
    % write this block; data(:) is column-major, so it interleaves across files
    fwrite(fp, int16(data(:)*1e6), 'int16');
end
fclose all
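One way to provide the "special treatment for last block" mentioned in the comment above - a sketch that assumes each input file holds `nsamples` double values (`nsamples` and the precision are assumptions, not from the original):

```matlab
nsamples = 1e10;                          % values per input file (assumed)
nblocks  = floor(nsamples / blocksize);   % number of full blocks
nrem     = mod(nsamples, blocksize);      % samples left in the final partial block
% after the main loop over full blocks, read and write the remainder:
if nrem > 0
    lastdata = zeros(nfiles, nrem);
    for i = 1:nfiles
        lastdata(i, :) = fread(fileID(i), nrem, 'double');
    end
    fwrite(fp, int16(lastdata(:)*1e6), 'int16');
end
```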
2 Comments
Rik 2021-7-29
The problem is that you need the first element from every file, then the second element from every file, etc.
And about the coding style: I would suggest using fclose(fp); instead of closing all files. That habit will get you when you do have multiple files open.
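Closing each handle explicitly, as suggested, is just one more short loop (using the fileID array and fp from the answer; a sketch, not the original code):

```matlab
for i = 1:nfiles
    fclose(fileID(i));   % close each per-row input file
end
fclose(fp);              % close the output file
```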
Chunru 2021-7-29
Instead of reading the first element from every file, we read a block of data from every file (obviously for speed). You don't need all the data from a single file before doing the partial reshaping. "fclose all" is a lazy way here, as I was tired of writing another for loop to close all the files.


More Answers (0)
