When loading .mat files in a parfor, the first time is way slower than the second time.

Hi all,
I've encountered a weird behavior I wasn't able to understand or find a possible explanation of.
I wrote a function for loading some files (data structures whose size ranges from 40 to 100 MB) from a dataset in a parfor, and do some operations.
I've noticed that the first time I launch the script, the execution is much slower than the successive executions (38 seconds vs 1.8 seconds).
I've tried replacing the parfor with a simple for, but there is still a difference between the first and the successive runs, even though it is smaller (17 seconds vs 11 seconds).
I've also tried different datasets, and there is the same behavior. When I restart Matlab and I launch the same call the first time, same thing. If I stop and restart parpool, same thing.
I am wondering why it is like this and if I can do something to avoid this behavior.
Matlab 2019a Update 4, Unix (64-bit)
PS: parpool was already started.
PPS: the successive executions are faster even after calling clear all/clearvars.
PPPS: to remove all possible other influences, I've cleaned the code so that now it just loads files. Same behavior.
  2 comments
Daniel M 2019-11-15
I think this is an issue with either caching or the just-in-time compiler organizing itself on the first run of the loop, or an issue with broadcasting in the parfor.
Questions:
  1. Are you using "clear" or "clear all" at the top of your script? Try using "clearvars" instead.
  2. Are you loading the same files in every iteration? You could try loading them once before the loop instead.
Francesco Onorati 2019-11-15
Edited: Francesco Onorati 2019-11-15
There are no broadcast variables in the loop. I just tried clear all and clearvars, but the successive executions are still way faster. I load the same data: first time, slow; second (and successive) time(s), fast. If I change dataset, same thing: first time, slow; after, fast.


Answers (1)

Daniel M 2019-11-15
Edited: Daniel M 2019-11-15
So you're doing something like this?
for k = 1:10
    mydata = load('myfile.mat');
    output = someFunction(mydata);
end
That's pretty inefficient. You should load the data once, outside the loop. Reading data that is already in memory is faster than loading it from disk on every iteration, because memory access is typically much faster than file I/O.
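As a minimal sketch of loading once and reusing the result, assuming a small demo file and a placeholder in place of the real processing (the file name `myfile_demo.mat` and the anonymous `someFunction` below are hypothetical stand-ins, not from the original post):

```matlab
% Hypothetical stand-ins: the demo file and someFunction are assumptions.
fname = fullfile(tempdir, 'myfile_demo.mat');
x = rand(1, 1e5);
save(fname, 'x');
someFunction = @(s) sum(s.x);   % placeholder for the real processing

mydata = load(fname);           % single read from disk, before the loop
output = zeros(1, 10);
for k = 1:10
    output(k) = someFunction(mydata);   % reuses the struct already in memory
end

delete(fname);                  % remove the demo file
```

This only helps when every iteration really needs the same file; if each iteration reads a different file, the load has to stay inside the loop.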
As for why the first iteration is slower, I believe that is due to the JIT compiler doing its magic. This is also referred to as 'warm-up time'. Hopefully someone with a deeper understanding can weigh in here.
Try running this script to test for warm up time. Note: run this in a script, not the command window (because the JIT effects may not take place in the command window).
clearvars
close all
clc
% Create some data, but only once
if ~exist('data.mat','file')
    data = rand(1,1e8,'single');
    save('data.mat','data');
    clear data
end
fname = 'data.mat';
fprintf('loading\n')
tic
mydata = load(fname);
data = mydata.data;
loadtime = toc;
% display the loading time
fprintf('It took %f s to load the file.\n',loadtime)
% Run some stuff in a loop and time it.
iters = 20;
t2 = zeros(1,iters);
for k = 1:iters
    t1 = tic;
    % do some arbitrary processing on data
    tmp1 = data.^2;
    tmp2 = sin(tmp1);
    t2(k) = toc(t1);
end
figure
stem(t2)
% stem(t2) plots iteration index on x and elapsed time on y
xlabel('Iteration')
ylabel('Time (s)')
% first couple iterations take longer
% get the warm up time (from first few iterations)
warmtime = max(t2(1:3))/mean(t2(end-3:end)) - 1;
fprintf('First few iterations were %.0f %% slower than last\n',warmtime*100)
fprintf('done!\n')
And the output:
loading
It took 2.576633 s to load the file.
First few iterations were 58 % slower than last
done!
% see attached figure
  5 comments
Francesco Onorati 2019-11-15
Edited: Francesco Onorati 2019-11-15
function test_parfor(path)
files = dir(path);
parfor k = 1:length(files)
    mydata = load(fullfile(path, files(k).name));
end
Daniel M 2019-11-15
Edited: Daniel M 2019-11-15
Can you write a self-contained test script, please? That does not run, nor is it a test of parfor.
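One possible shape for such a script is sketched below. It is only an illustration under stated assumptions: the folder name `parfor_load_test`, the number of files, and the array size are arbitrary choices, not values from this thread. It writes a few .mat files, then times two successive parfor passes that do nothing but load them.

```matlab
% Sketch: time two successive parfor passes over the same .mat files.
% Folder name, file count, and array size are arbitrary assumptions.
folder = fullfile(tempdir, 'parfor_load_test');
if ~exist(folder, 'dir')
    mkdir(folder);
end
nFiles = 4;
for k = 1:nFiles
    data = rand(1, 1e6);
    save(fullfile(folder, sprintf('file%02d.mat', k)), 'data');
end

files = dir(fullfile(folder, '*.mat'));   % only .mat files, no '.' or '..'
nPasses = 2;
elapsed = zeros(1, nPasses);
for p = 1:nPasses
    t = tic;
    parfor k = 1:numel(files)
        s = load(fullfile(folder, files(k).name));   % load only, no processing
    end
    elapsed(p) = toc(t);
    fprintf('Pass %d: %.3f s\n', p, elapsed(p));
end

rmdir(folder, 's');   % remove the demo files
```

Two caveats. The pool should already be running, as in the original question, so that pool startup is not counted in the first pass. And because the files have just been written, the operating system may already hold them in its file cache, so the first pass here may not be as slow as a true first read; pointing `folder` at an existing dataset (and dropping the file creation and `rmdir` lines) is a closer test.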
