Find out why mat files differ in size

I'm developing a rather complex class hierarchy with a few GB of data embedded in its instances, which may get saved to MAT-files for later analysis.
I refactored a lot to improve the memory and CPU footprint (using dependent properties, customized loadobj and saveobj methods, etc.) and saw that the resulting MAT-file grew in size (using save() with v7.0 and compression enabled). I screwed something up.
I have some old reference MAT-files from former versions that are about 30% smaller. However, if I load them using the current class definitions, the resulting objects in RAM are almost exactly the same size (<1% difference), measured using the great getArrayFromByteStream function, see Serializing/deserializing Matlab data - Undocumented Matlab. That means I can't infer from the instantiated objects what grew in size.
Question: How do I find out what really gets saved to the mat file, i.e. which variable/object is much larger compared to the old versions?
I can roll back to my former version via Git, but that does not really help me understand why exactly the MAT-files got bigger.
Any ideas?
Thanks,
Jan

Accepted Answer

Jan Kappen 2024-3-25
Got it fixed.
I followed an approach similar to the one @Samay Sagar proposed, but ultimately used getArrayFromByteStream, see Serializing/deserializing Matlab data - Undocumented Matlab. I checked out the old version of my library in a second MATLAB session and compared all properties step by step, skipping Dependent properties via reflection.
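A minimal sketch of such a property-by-property comparison, under some assumptions: the function name propertySizes is hypothetical, and getByteStreamFromArray is the undocumented serializing counterpart of getArrayFromByteStream covered in the same Undocumented Matlab article, so its behavior may differ between releases. It lists the serialized size of every stored property and skips Dependent, Transient and Constant ones via the metaclass.

```matlab
function report = propertySizes(obj)
% propertySizes  Serialized size of each stored property of OBJ.
%   Hypothetical helper, not part of MATLAB. Relies on the undocumented
%   getByteStreamFromArray function, which may change between releases.
    mc = metaclass(obj);
    props = mc.PropertyList;
    % Keep only properties that hold data and would be written by save()
    keep = ~[props.Dependent] & ~[props.Transient] & ~[props.Constant];
    props = props(keep);
    names = strings(numel(props), 1);
    bytes = nan(numel(props), 1);
    for k = 1:numel(props)
        names(k) = props(k).Name;
        try
            value = obj.(props(k).Name);
            bytes(k) = numel(getByteStreamFromArray(value));
        catch
            % Property not readable from here (e.g. private access): stays NaN
        end
    end
    report = table(names, bytes, 'VariableNames', {'Property', 'Bytes'});
    report = sortrows(report, 'Bytes', 'descend');
end
```

Running this on the same data in the old and the new session and comparing the two tables should point to the properties that grew. Private or protected properties show up as NaN unless the function is called from inside the class.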
Root cause: I had split a data table (class table) into two class objects that should have used dependent properties plus an internal table to store the data. It turned out I forgot to mark one block of properties as transient/dependent, so they were saved as well.
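To illustrate the attributes involved, here is a small sketch. The class SignalView and its property names are hypothetical and not taken from the actual library. Only the private Data table is written by save(); the Dependent property is computed on access and the Transient one lives in memory only.

```matlab
% File: SignalView.m  (hypothetical class, for illustration only)
classdef SignalView
    properties (Access = private)
        Data = table()   % the single stored copy; this is what save() writes
    end
    properties (Dependent)
        Time             % computed from Data on access, never saved
    end
    properties (Transient)
        Cache            % kept in memory, skipped by save()
    end
    methods
        function obj = SignalView(data)
            % Allow a no-argument call so default construction works
            if nargin > 0
                obj.Data = data;
            end
        end
        function t = get.Time(obj)
            % Forward to the internal table; assumes it has a Time variable
            t = obj.Data.Time;
        end
    end
end
```

Forgetting the Dependent or Transient attribute on a block like this stores the same data twice, which matches the kind of size growth described above.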
Afterwards, the MAT-file sizes were basically the same. Interestingly, it makes no difference whether the table itself is saved or a class wrapping it; both compress very efficiently. Very nice, MathWorks!
PS: I just found out that MAT-files can also be compared visually, see Compare and Merge MAT-Files - MATLAB & Simulink (mathworks.com). The tool can even "look" into objects, though not arbitrarily nested ones, so it could be a good starting point.

More Answers (1)

Samay Sagar 2024-3-25
You can use the "whos" command with the '-file' option to list the size of each variable stored in a MAT-file and spot which variables changed between versions.
Here is a sample script to identify changes between two MAT-files:
% List the variables stored in each file
oldVariables = whos('-file', 'old_version.mat');
newVariables = whos('-file', 'new_version.mat');
newNames = {newVariables.name};
% Compare variable sizes
for i = 1:numel(oldVariables)
    name = oldVariables(i).name;
    oldSize = oldVariables(i).bytes;
    % Find the corresponding variable in the new version
    j = find(strcmp(name, newNames), 1);
    fprintf('%s:\n', name);
    if isempty(j)
        fprintf('  Variable not found in new version\n\n');
    else
        newSize = newVariables(j).bytes;
        sizeChange = newSize - oldSize;
        % Note: the percentage is Inf or NaN if oldSize is 0
        percentageChange = (sizeChange / oldSize) * 100;
        fprintf('  Old Size: %d bytes\n', oldSize);
        fprintf('  New Size: %d bytes\n', newSize);
        fprintf('  Size Change: %d bytes (%.2f%%)\n\n', sizeChange, percentageChange);
    end
end
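The comparison above works per top-level variable, so a file that holds one large object yields a single number. A possible complement, assuming the individual values can be pulled out of the object, is to save each value alone and look at the resulting file size on disk. The helper name savedSize is hypothetical.

```matlab
function bytes = savedSize(value)
% savedSize  On-disk size in bytes of VALUE when saved alone with save -v7.
%   Hypothetical helper, not part of MATLAB.
    tmpFile = [tempname '.mat'];
    cleanup = onCleanup(@() delete(tmpFile));   % remove the temp file on exit
    save(tmpFile, 'value', '-v7');              % -v7 applies compression
    info = dir(tmpFile);
    bytes = info.bytes;
end
```

For example, calling savedSize(obj.SomeProperty) for each stored property (SomeProperty being a placeholder) in the old and new code versions gives a rough picture of which part of the object accounts for the growth on disk.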
Read more about “whos” in the MATLAB documentation.
1 Comment
Jan Kappen 2024-3-25
Thank you very much for that approach. Unfortunately, it does not seem to work with handle class objects. Also, I had just one variable in that MAT-file: a big class object that encapsulates all the data.
I followed a similar approach but ultimately used getArrayFromByteStream, see Serializing/deserializing Matlab data - Undocumented Matlab. I checked out the old version of my library in a second MATLAB session and compared all properties step by step, skipping Dependent properties via reflection.

Release

R2023b
