Big Problem/Bug with new matfile command for partial MAT-file reads/writes - creates massively bloated files.

Please look at this minimal example:
% create a 1 MB "incompressible" array
one_meg = uint8(rand(1,1000,1000)*256);
% choose a file, delete any existing copy, and open it with write access
testfile = 'D:\Data\PGRtest\testfile.mat';
if exist(testfile, 'file'), delete(testfile); end
matObj = matfile(testfile,'Writable',true);
%keep a copy of what we write to the file in memory for verification
memcpy = zeros(50,1000,1000,'uint8');
%write the array 50 times to this file
for i = 1:50
    tic
    % store in file and in memory in the same format: pages of 1000x1000
    matObj.RawDat(i,1:1000,1:1000) = one_meg;
    memcpy(i,1:1000,1:1000) = one_meg;
    tm = toc;
    % time increases from 45 ms to 250 ms by the last iteration
    fprintf('Iteration %d, time taken: %.0f ms\n', i, tm*1000);
end
% check the file size: it should be 50 MB or smaller with compression,
% but the file is 1200 MB...?
s = dir(testfile);
fprintf('file size: %.0f MB\n', s.bytes/1024/1024);
% load the MAT-file
load(testfile)
% the data inside is 50 MB as expected, nowhere near 1200 MB
whos('RawDat')
% verify: the read data equals the memory copy, so where did all that extra space go?
% (isequal avoids the saturating uint8 subtraction, which could hide differences)
isequal(memcpy, RawDat)
This is using Windows 7 64-bit, MATLAB R2011b 64-bit.
The problem is mostly described in the comments. Essentially: why does 50 MB of data produce a 1200 MB MAT-file when it is written through the matfile object?
I have tried storing the data with 2 dimensions instead of 3, and I have tried using doubles instead of uint8. I have also tried changing the default MAT-file format away from 7.3, although that is the only version that supports matfile.
I can't understand why it takes longer and longer. It is as if each write to the file rewrites all the existing data a second time, so the first write is 1 MB, then 2 MB, then 3 MB, etc., instead of 1 MB each time.
I expect 'testfile' to be a <50 MB MAT-file containing a 50x1000x1000 array. What I see is a 1.2 GB file containing that array, which is clearly incorrect.
If the array is saved directly from the workspace using 'save', the MAT-file is 2 MB and contains the same data.
Looks like this is a bug.
Any ideas? Do you get the same results? Thanks, Tom.
  3 comments
Jiri Hajek 2021-4-20
Ten years later and the same problem is still around... My data saved into a v7 MAT-file are around 1 MB, compared to almost 100 MB in a v7.3 file. Loading and saving times are unfortunately proportionally longer as well. Note, however, that the data contained in the file take up only around 20 MB, so there is roughly a 5-fold increase in size from saving to a v7.3 file. Also note that I don't use the -nocompression flag when saving the file.
Any ideas or suggestions would be highly appreciated...

Accepted Answer

Philip Borghesani 2011-11-28
For the same reasons that growing an array in memory is a bad idea, growing an array in a MAT-file is not good programming practice. Your file has been horribly fragmented because of the matrix growth. The full 3-D matrix must occupy one linear segment of the file.
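A rough back-of-the-envelope check of this explanation, under the assumption (not confirmed in this thread) that every growth step writes a fresh full-size copy of the variable and leaves the previous copy behind as unused space in the file:

```matlab
% Assumption for illustration: growing RawDat from (i-1) to i pages writes a
% new i-page copy, and the old copy stays in the file as dead space.
page_bytes  = 1000 * 1000;                % one uint8 page, as in the question
n_pages     = 50;
copy_bytes  = (1:n_pages) * page_bytes;   % size of the copy written at step i
total_bytes = sum(copy_bytes);            % 1 + 2 + ... + 50 pages
total_mb    = total_bytes / 1024 / 1024;  % same unit the question prints
fprintf('estimated file size: %.0f MB\n', total_mb);   % prints 1216
```

That estimate is close to the roughly 1200 MB reported in the question, and the same assumption would also account for each iteration taking longer than the one before.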
If you preallocate the file variable by adding the line:
matObj.RawDat = memcpy; % preallocate
after creating the memcpy variable, your file size will be reasonable.
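A minimal sketch of the preallocated version, reusing the variable names from the question. The temporary file name is an illustrative assumption, and no particular resulting file size is claimed here:

```matlab
% Sketch: write the full-size variable once, then fill it page by page.
% testfile is a hypothetical temporary location, not the path from the question.
testfile = [tempname '.mat'];
nPages = 50;
n = 1000;
matObj = matfile(testfile, 'Writable', true);

% Preallocate: a single full-size write, so later writes do not grow the variable.
matObj.RawDat = zeros(nPages, n, n, 'uint8');

for i = 1:nPages
    one_meg = uint8(rand(1, n, n) * 256);     % new "incompressible" page
    matObj.RawDat(i, 1:n, 1:n) = one_meg;     % overwrite in place
end

s = dir(testfile);
fprintf('file size: %.0f MB\n', s.bytes / 1024 / 1024);
```

Comparing the printed size and the per-iteration timings with the original loop should show whether the growth was the cause on your setup.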
If your code is a model of what you want to do, I suggest storing your chunks of data in the cells of a RawDat cell array inside your file.
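A sketch of what the cell-array variant might look like. The names testfile and RawCells are illustrative, and this assumes that matfile accepts parenthesis indexing on a cell array variable (brace indexing into cells is not supported by matfile), so treat it as something to try rather than a verified recipe:

```matlab
% Sketch only: one chunk per cell, written with () indexing.
testfile = [tempname '.mat'];      % hypothetical temporary file
nPages = 5;
n = 1000;
matObj = matfile(testfile, 'Writable', true);

% Create the container with empty cells; no page data is written yet.
matObj.RawCells = cell(nPages, 1);

for i = 1:nPages
    page = uint8(rand(n, n) * 256);
    matObj.RawCells(i, 1) = {page};    % assign a 1x1 cell to element i
end

% Read one chunk back: () returns a 1x1 cell, which is unwrapped in memory.
c = matObj.RawCells(nPages, 1);
lastPage = c{1};
```

The container is initialised with empty cells here only to keep the example explicit; per the answer, the cells themselves should not need preallocating.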
You are also indexing into your array inefficiently, but that does not seem to be causing any performance issues. For MATLAB it would be best if RawDat were (1000,1000,50) in size.
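The reason the (1000,1000,50) layout is preferable is that MATLAB stores arrays in column-major order, so the first index varies fastest. A small illustration (the variable names are made up for this example) compares how far apart neighbouring elements of one page sit in each layout:

```matlab
% Illustration only: linear-index distance between two neighbouring
% elements of page 1 for the two layouts (MATLAB is column-major).
sz_first = [50 1000 1000];   % page index first, as in the question
sz_last  = [1000 1000 50];   % page index last, as suggested

idx_first = sub2ind(sz_first, [1 1], [1 2], [1 1]);  % (1,1,1) and (1,2,1)
idx_last  = sub2ind(sz_last,  [1 2], [1 1], [1 1]);  % (1,1,1) and (2,1,1)

stride_first = diff(idx_first);   % 50: each page is scattered through the array
stride_last  = diff(idx_last);    % 1: each page is one contiguous block
fprintf('stride, page first: %d; page last: %d\n', stride_first, stride_last);
```

With the page index last, a write such as matObj.RawDat(1:1000,1:1000,i) = page touches one contiguous block instead of a million separate elements.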
  4 comments
Philip Borghesani 2011-11-28
Did you see my suggestion to use a cell array? A cell array should not need to be preallocated, and each cell is stored separately in the file, so growth will not be an issue.
Thomas Osgood 2011-11-28
Yes, sorry, I will give that a try and report how it goes. Either way, it looks like you have come up with a solution!

More Answers (1)

Walter Roberson 2011-11-27
Keep in mind that save defaults to -v7, which uses compression, whereas matfile uses -v7.3. Those are HDF5 files, which appear not to be compressed the way MATLAB uses them (though that may have changed since -v7.3 files were first introduced).
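One way to check how the two formats behave on a given MATLAB release is to save the same variable both ways and compare the file sizes. This is a sketch with made-up file names and deliberately compressible test data; the results will depend on the release and on the data:

```matlab
% Sketch: save identical data as -v7 and -v7.3 and compare sizes on disk.
x = zeros(1000, 1000, 'uint8');     % highly compressible test data
f7  = [tempname '_v7.mat'];         % hypothetical temporary file names
f73 = [tempname '_v73.mat'];

save(f7,  'x', '-v7');
save(f73, 'x', '-v7.3');

d7  = dir(f7);
d73 = dir(f73);
fprintf('-v7: %d bytes, -v7.3: %d bytes\n', d7.bytes, d73.bytes);
```

Repeating the comparison with your real data (instead of zeros) gives a fairer picture, since random or already-compressed data will not shrink much in either format.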
