The “matfile” function in MATLAB provides a way to access and modify variables in MAT-files without loading the entire file into memory. This is useful when working with large datasets that do not fit into memory. However, the way “matfile” handles data can lead to inefficiencies in certain scenarios, as you observed.
Using matfile for incremental writes is slower and produces larger files due to frequent disk access, potential space reallocation, and added file overhead. In contrast, the save function compresses data efficiently in a single operation, resulting in faster performance and smaller file sizes.
In the shared example, the code writes to foo.mat in a loop, modifying the file 100,000 times. This process is slow because each write operation incurs file I/O overhead. Additionally, the resulting file is larger because it contains more overhead and the incremental writes compress less effectively.
When you save A to bar.mat using the “save” function, MATLAB writes the entire array to disk in one operation, which allows it to compress the data efficiently and minimize file overhead, resulting in a faster operation and a smaller file.
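To see the difference on your own machine, here is a minimal sketch of the two approaches side by side. The array size, the temporary file names, and the variable names are my own assumptions for illustration, not taken from your code, and the loop count is kept far below 100,000 so it finishes quickly. Timings and file sizes will vary by system and MATLAB release.

```matlab
% Hypothetical size and file names, chosen only for illustration.
n = 2000;                          % kept small so the sketch runs quickly
A = rand(n, 1);

fooFile = [tempname '.mat'];       % written incrementally through matfile
barFile = [tempname '.mat'];       % written once through save

% Incremental writes: every assignment goes to disk separately.
m = matfile(fooFile, 'Writable', true);
tic
for k = 1:n
    m.A(k, 1) = A(k);
end
tIncremental = toc;

% Single write: build A in memory, then save it in one operation.
% (Add '-v7.3' here if you want both files in the same MAT-file version.)
tic
save(barFile, 'A');
tSave = toc;

% Report elapsed time and size on disk for both files.
fooInfo = dir(fooFile);
barInfo = dir(barFile);
fprintf('matfile loop: %.3f s, %d bytes\n', tIncremental, fooInfo.bytes);
fprintf('single save : %.3f s, %d bytes\n', tSave, barInfo.bytes);
```

Both files end up holding the same data, so any difference in time or size comes from how the data was written.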
Recommendations:
- “matfile” is best used when dealing with data that is too large to fit into memory. For smaller datasets, or when performance is a concern, consider constructing the variable in memory first and then saving it in one operation.
- Preallocate space: If the final size of the variable is known, preallocating space in the MAT-file can sometimes improve performance and reduce file fragmentation, although this will not necessarily reduce the final file size.
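As a rough sketch of the preallocation suggestion: assigning a value to the last element of a variable through “matfile” sizes it up front, so later writes fill existing space instead of growing the variable. The file name, sizes, and the choice to write in chunks rather than one element at a time are my own assumptions here; the chunking is an extra step that simply cuts down the number of separate disk writes.

```matlab
% Hypothetical size, chunk length, and file name for illustration only.
n = 2000;
chunk = 500;
preFile = [tempname '.mat'];

m = matfile(preFile, 'Writable', true);

% Preallocate: assigning the last element creates A with n rows on disk.
m.A(n, 1) = 0;

% Fill the preallocated variable in chunks instead of element by element.
for first = 1:chunk:n
    last = min(first + chunk - 1, n);
    m.A(first:last, 1) = rand(last - first + 1, 1);
end

% Query the size of A in the file without loading it into memory.
sz = size(m, 'A');
fprintf('A in file is %d-by-%d\n', sz(1), sz(2));
```

Whether this helps depends on the data and the access pattern, so it is worth timing against the single “save” approach before committing to it.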
While matfile provides a flexible interface for working with large datasets, its performance characteristics and impact on file size make it less suitable for scenarios where data can be efficiently handled in memory.
Please refer to the MATLAB documentation for “matfile” and “save” for more details.
Hope it helps!
Best Regards,
Simar