Parfor with Iterations Having Unequal Execution Times

4 次查看(过去 30 天)
I've run into an interesting problem with parfor...
I'm trying to use parfor to evaluate a bunch of different input cases of a program simultaneously. I know beforehand that the cases are not going to take the same amount of time to evaluate, and I can tell you with reasonable accuracy which ones are going to take the longest. The number of cases that each worker is running is very low (about 40 total cases being distributed to 12 workers).
Some of these cases will take as much as three or four times longer than the faster ones, and so a lot of user time could be saved if I could predict what order the parfor loop was going to execute in and try to distribute the jobs intelligently.
The simple example would be this: say I have four jobs to run on two workers, three of which will take 10 seconds and one of which will take 30 seconds. Obviously the ideal solution is for one worker to evaluate the long job and the other worker to evaluate the other three in the same amount of time. What tends to happen, though, is that both workers will evaluate one of the short jobs first, and then one of them will wind up with 40 seconds of working time while the other worker sits idle for half that time.
Even more annoying, I think that sometimes there will be an available job sitting in the "queue" with a worker available, but it won't run because that job is supposed to go to a worker that is busy. The example here would be two 10-second jobs and two 30-second jobs, and one worker winds up evaluating both 30-second jobs.
Anyone have any clever ideas of how to work around this problem?

采纳的回答

Titus Edelhofer
Titus Edelhofer 2011-12-7
Hi,
if your cases are input cases you might try to use the (slightly more elaborate way) of distributing the code using a job with 40 tasks. It helps at least for the distribution of the tasks, because they get distributed one by one. You might still end up that the longest run is the last in the row to be computed.
One step further would be to create more then one job where you distribute your tasks more clever.
Titus
  1 个评论
Greg
Greg 2011-12-7
After looking into the job scheduler stuff a bit, I think this is going to be the "right" way to do what I want, though it's going to take a little bit more effort to get there.

请先登录,再进行评论。

更多回答(4 个)

Jaap
Jaap 2011-12-7
is it possible to make an inner loop that goes through the faster cases?
or: if you know everything beforehand, make two parfor loops. while in the first, you will do all short/fast cases (hoping that rem(12,cases)==0). the next loop, you will do the rest in the same fashion.
  1 个评论
Greg
Greg 2011-12-7
That's not a bad idea, but it might come back to bite me if I'm a little bit wrong about which one is going to take the longest.

请先登录,再进行评论。


Titus Edelhofer
Titus Edelhofer 2011-12-7
Hi,
here a rather ugly way which gives you full control on which worker processes which index:
function myparfortest
% this is a usual parfor loop
x = zeros(200, 1);
parfor i=1:200
x(i) = rank(magic(i));
end
% use spmd to do the same, although
spmd
y = zeros(200, 1);
% suppose we have 3 workers:
% worker1: process i=1, 4, 7, ...
% worker2: process i=2, 5, 8, ...
% worker3: process i=3, 6, 9, ...
for i=labindex:numlabs:200
y(i) = rank(magic(i));
end
end
% now collect the result
Y = zeros(200, 1);
for i=1:length(y)
yi = y{i};
Y(i:length(y):end) = yi(i:length(y):end);
end
% let's see if we did right:
isequal(x, Y)
Titus
  1 个评论
Greg
Greg 2011-12-7
This is a little hack-ish, but it's easy and it has promise if you know the execution order precisely beforehand. It won't dynamically re-evaluate like the job scheduler would, though, if some jobs start finishing earlier than expected.

请先登录,再进行评论。


Mikheil Azatov
Mikheil Azatov 2016-12-16
A bit late to the party, but have you figured this out since ? I'm having the same issue with R2016b.

Martin Ryba
Martin Ryba 2021-4-7
Hi I've noticed similar behavior all the way to r2020a. I have a big parfor loop where the execution time of each step is on the order of 10 minutes and I need to do 5000+ of them. I can get 100-150 workers. This gets submitted to a cluster (LSF) so if other jobs are on the node it will slow that node down. So, it appears the work unit allocation is frozen at the start of the loop, and you get to wait for the slowest of your workers to finish his list of jobs before the loop completes. Any good workarounds to give a more dynamic queue since the activity to manage the queue is small compared to the duration of each task?

类别

Help CenterFile Exchange 中查找有关 Parallel Computing Fundamentals 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by