load specified column in matfile too slow

3 次查看(过去 30 天)
I have a matfile with size of: 5e6*50.
I want to write a code to load the specific column into my memory, but I found that the time reading a specified column is nearly the same with that reading the whole matfile.
below is the test code:
A=rand(5e6,50);
save A A
f=matfile('A');
tic
tmp=f.A;
toc
tic
tmp=f.A(:,1);
toc
is there anyway to improve the performance?
Thanks!
Yu
  2 个评论
Walter Roberson
Walter Roberson 2018-10-3
I notice you did not specifically save with -v7.3, so you might be getting a -v7 file.
When I test on my system with -v7.3, selecting one column comes out roughly 10% faster. Not as good as one might hope, though.
Yu Li
Yu Li 2018-10-3
Yes, the expected result should be 2% of reading the whole file, since I have totally 50 columns.

请先登录,再进行评论。

回答(1 个)

Walter Roberson
Walter Roberson 2018-10-3
In this case, you can do much better by using -nocompression
Save -v7.3 -nocompression
Elapsed time is 17.814604 seconds.
Done save -v7.3 -nocompression
Start matfile -v7.3 -nocompression
Elapsed time is 0.016278 seconds.
Done matfile -v7.3 -nocompression
Start recall entire variable -v7.3 -nocompression
Elapsed time is 2.195975 seconds.
Done recall entire variable -v7.3 -nocompression
Start recall one column -v7.3 -nocompression
Elapsed time is 1.089280 seconds.
Done recall one column -v7.3 -nocompression
Save -v7.3
Elapsed time is 58.543461 seconds.
Done save -v7.3
Start matfile -v7.3
Elapsed time is 0.077814 seconds.
Done matfile -v7.3
Start recall entire variable -v7.3
Elapsed time is 10.139135 seconds.
Done recall entire variable -v7.3
Start recall one column -v7.3
Elapsed time is 9.118167 seconds.
Done recall one column -v7.3
Source code:
A=rand(5e6,50);
time_it(A, {'-v7.3' '-nocompression'})
time_it(A, {'-v7.3'})
function time_it(A, saveoptions)
savedesc = strjoin(saveoptions, ' ');
fprintf('Save %s\n', savedesc);
tic
save('A', 'A', saveoptions{:});
toc
fprintf('Done save %s\n', savedesc);
fprintf('Start matfile %s\n', savedesc);
tic
f = matfile('A');
toc
fprintf('Done matfile %s\n', savedesc);
fprintf('Start recall entire variable %s\n', savedesc);
tic
tmp=f.A;
toc
fprintf('Done recall entire variable %s\n', savedesc);
fprintf('Start recall one column %s\n', savedesc);
tic
tmp=f.A(:,1);
toc
fprintf('Done recall one column %s\n', savedesc);
end
  7 个评论
Yu Li
Yu Li 2018-10-3
Hi:
Thanks for your test, I got much deeper understanding about this.
I think there should have some bottom neck here, I will contact Mathworks for further investigation.
Thanks!
Yu
Walter Roberson
Walter Roberson 2018-10-3
You can refer them to this post and my test code.
One thing they are likely to point out is that the default of compression is intended for "real" data, not for rand(), and that when you read/write with compression, the performance would be expected to vary with how compressible the data is. Thus you should probably run this code with the rand() replaced by load of one of your actual matrices.

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Workspace Variables and MAT-Files 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by