Big Problem/Bug with new matfile command for partial mat file read/writes - creates massively bloated files.
Please look at this minimal example:
%create a 1mb "incompressible" array
one_meg = uint8(rand(1,1000,1000)*256);
%choose a file, clear it and open it with write access
testfile = 'D:\Data\PGRtest\testfile.mat';
system(['del "' testfile '"'] );
matObj = matfile(testfile,'Writable',true);
%keep a copy of what we write to the file in memory for verification
memcpy = zeros(50,1000,1000,'uint8');
%write the array 50 times to this file
for i = 1:50
    tic
    %store in file and memory in same format - pages of 1000x1000
    matObj.RawDat(i,1:1000,1:1000) = one_meg;
    memcpy(i,1:1000,1:1000) = one_meg;
    tm = toc;
    %time increases from 45ms to 250ms at last iteration
    %(%.0f rather than %i, because tm*1000 is not an integer)
    fprintf('Iteration %i, time taken: %.0fms\n',i,tm*1000);
end
%check file size - should be 50mb or smaller from compression
%the file size is 1200mb....?
s = dir(testfile);
fprintf('file size: %.0f mb\n', s.bytes/1024/1024);
%load the mat file
load(testfile)
%the data inside is 50mb as expected, nowhere near 1200mb
whos('RawDat')
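%verify
%the read data is equal to the memory copy.. where did all that extra space go?
%(isequal instead of sum(abs(a-b)): uint8 subtraction saturates at 0,
% so a difference-based check could miss mismatches)
isequal(memcpy, RawDat)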
This is using Windows 7 64bit, Matlab 2011b 64bit.
The problem is mostly described in the comments - essentially why does 50mb of data create a 1200mb mat file when created using the matfile system object?
I have tried storing the data with 2 dimensions instead of 3. I have tried using doubles instead of uint8. I have tried changing the default .mat file format from 7.3, although this is the only version that supports matfile.
I can't understand why it takes longer and longer. It is as if each write to the file rewrites all the existing data a second time, so the first write is 1mb, then 2mb, then 3mb etc. instead of 1mb each time.
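As a rough sanity check of that guess (only a back-of-the-envelope model, not a confirmed explanation of what matfile does internally), the arithmetic can be sketched as follows. The variable names are made up for illustration, and the model simply assumes that write number i stores i pages instead of one.

```matlab
% Hypothetical model: write number i stores i pages on disk instead of 1.
n_writes   = 50;          % iterations in the example loop
page_bytes = 1000*1000;   % one 1000x1000 uint8 page

% If every write added exactly one page:
expected_bytes = n_writes * page_bytes;

% If write i re-stored all i pages written so far (assumption):
cumulative_bytes = sum(1:n_writes) * page_bytes;

% Same unit conversion as in the example (bytes/1024/1024)
expected_mib   = expected_bytes   / 1024 / 1024;
cumulative_mib = cumulative_bytes / 1024 / 1024;

fprintf('one page per write: %.0f mb\n', expected_mib);
fprintf('cumulative rewrite: %.0f mb\n', cumulative_mib);
```

Under this assumed model the total comes to 1275 pages, which is about 1216 mb with the bytes/1024/1024 conversion, so it is at least in the same range as the 1200mb file seen above.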
I expect 'testfile' to be a <50mb mat file containing a 50x1000x1000 array. What I see is a 1.2GB file containing that array - clearly incorrect.
If the array is saved directly from workspace using 'save' the mat file is 2mb containing the same data.
Looks like this is a bug.
Any ideas? Do you get the same results? Thanks, Tom.
3 Comments
Titus Edelhofer
2011-11-28
Hi Tom,
looks strange, indeed. I get the same results: the 1.2 GB file, as well as a reasonably sized file when saving the data once (also in 7.3 format). I will pass your example to our development team to look into this.
Thanks,
Titus
Thomas Osgood
2011-11-28
Jiri Hajek
2021-4-20
Ten years later and the same problem is still around... My data saved into a v7 mat file is around 1MB, as compared to almost 100MB in a v7.3 file. Loading and saving times are unfortunately proportionally longer as well. Note, however, that the data contained in the file takes up only around 20MB, so there is roughly a 5-fold increase in size from saving to a v7.3 file. Also note that I don't use the -nocompression flag when saving the file.
Any ideas or suggestions would be highly appreciated...
Accepted Answer
More Answers (1)
Walter Roberson
2011-11-27
Keep in mind that save defaults to -v7, which is compressed, whereas matfile uses -v7.3. Those are HDF5 files, which do not appear to be compressed the way MATLAB uses them (though that may have changed since -v7.3 files were first introduced).
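To see how the two formats compare on a given data set, a minimal sketch along these lines could be used. The data layout mirrors the question (50 identical pages), the file names are hypothetical temporary files, and no particular size ratio is claimed here, since that may depend on the data and the MATLAB release.

```matlab
% Illustrative data only: 50 identical pages, as in the question's loop
page   = uint8(rand(1,1000,1000)*256);
RawDat = repmat(page, [50 1 1]);

% Hypothetical temporary file names
f7  = [tempname '_v7.mat'];
f73 = [tempname '_v73.mat'];

% Save the same variable once in each format
save(f7,  'RawDat', '-v7');
save(f73, 'RawDat', '-v7.3');

% Compare the resulting file sizes on disk
d7  = dir(f7);
d73 = dir(f73);
fprintf('-v7   file: %.1f mb\n', d7.bytes/1024/1024);
fprintf('-v7.3 file: %.1f mb\n', d73.bytes/1024/1024);

% Clean up the temporary files
delete(f7);
delete(f73);
```

Note that this saves the whole array in one call, so it only compares the formats; it does not reproduce the incremental matfile writes from the question.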
3 Comments
Thomas Osgood
2011-11-28
Walter Roberson
2011-11-28
matfile objects *force* -v7.3
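One way to check this on a file created through matfile is to look at the text at the start of the file. The sketch below assumes that -v7.3 files begin with the text 'MATLAB 7.3 MAT-file' (and older formats with 'MATLAB 5.0 MAT-file'); treat that header convention as an assumption to verify on your own release. The file name and the variable x are hypothetical.

```matlab
% Hypothetical temporary file, created through a matfile object
fname = [tempname '.mat'];
m = matfile(fname, 'Writable', true);
m.x = uint8(1:10);   % the first assignment creates the file on disk

% Read the first 19 characters of the file header
fid = fopen(fname, 'r');
hdr = fread(fid, [1 19], 'uint8=>char');
fclose(fid);

disp(hdr);

% Assumption: a -v7.3 file starts with 'MATLAB 7.3'
is_v73 = strncmp(hdr, 'MATLAB 7.3', 10);

% Clean up
clear m
delete(fname);
```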
Thomas Osgood
2011-11-28