How to prevent parfor from slowing down towards later iterations?

Hi there,
I'm running a large number of simulations on an HPC system in a parfor loop. The number of iterations is around 150000, and I will probably want to run more than that later.
My scripts and functions write a progress update to a text log after each iteration, and I notice that the main parfor loop slows down dramatically towards the end. The CPU time elapsed per iteration does not change apart from slight variations, which leads me to believe that fewer workers are being used towards the end.
Does MATLAB assign all of the iterations to the workers once at the start of the parfor loop, and then leave workers idle if they finish sooner than the others? It seems as if a few unlucky workers are stuck finishing their allocated iterations while the rest sit idle.
Is there a way to change this?

Accepted Answer

Damian Pietrus 2025-5-6
Out of curiosity, how many workers do you have in your pool? Is the total number of iterations in your parfor a reasonable multiple of that number?
When assigning parfor iterations to workers in the pool, your client MATLAB session first sends small chunks of iterations to get started, sends larger chunks in the middle of the processing time, then finally sends smaller chunks so that the workers finish reasonably close to one another. However, this automatic process may sometimes result in certain workers wrapping up early and waiting for others to finish. You can see an extremely simplified version in the code below. Since neither 17 nor 19 is an exact multiple of the 4 workers in the pool, the final round of iterations leaves some workers idle in both cases, and the overall execution time will be about the same for both loops.
% Start a pool of 4 process workers if none is running
if isempty(gcp('nocreate'))
    parpool('Processes', 4);
end

% Time the first parallel loop (17 iterations)
tic
nIterations = 17;
parfor i = 1:nIterations
    % Simulate some work with a pause
    pause(0.50);
end
toc

% Time the second parallel loop (19 iterations)
tic
nIterations = 19;
parfor i = 1:nIterations
    % Simulate some work with a pause
    pause(0.50);
end
toc
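The similar timings follow from a simple count. The sketch below is an idealized estimate only: it assumes every iteration costs exactly the 0.5 s pause, ignores scheduling overhead, and treats the work as proceeding in rounds of one iteration per worker, which is a simplification of how parfor actually hands out chunks. The variable names are illustrative.

```matlab
% Idealized estimate: equal-cost iterations, no scheduling overhead.
nWorkers = 4;            % pool size used in the example above
tIter = 0.50;            % seconds per iteration (the pause duration)
nIterations = [17 19];   % the two loop lengths being compared

% Each "round" runs at most one iteration per worker.
nRounds = ceil(nIterations / nWorkers);
tEstimate = nRounds * tIter;

% One line of output per loop length
fprintf('n = %d: %d rounds, about %.1f s\n', [nIterations; nRounds; tEstimate]);
```

Both loop lengths round up to the same number of rounds, so the measured times from tic/toc should land close to each other, plus whatever overhead the pool adds.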
One potential workaround would be to use parforOptions to manually control the range partitioning. To quote the doc:
You can control how parfor divides iterations into subranges for the workers with parforOptions. Controlling the range partitioning can optimize performance of a parfor-loop. For best performance, try to split into subranges that are:
  • Large enough that the computation time is large compared to the overhead of scheduling the subrange
  • Small enough that there are enough subranges to keep all workers busy
This would allow you to manually choose how many iterations are being sent to each worker, potentially leveling out execution time at the end of your loop. If you do give it a try, let us know how it goes!
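As a minimal sketch of that workaround, the loop from the example above can be run with fixed-size subranges. The subrange size of 1 and the sliced output variable `results` are assumptions chosen for illustration; a real workload would need its own tuning.

```matlab
% Sketch: fixed-size subranges via parforOptions (Parallel Computing Toolbox).
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool('Processes', 4);
end

nIterations = 19;
subrangeSize = 1;   % assumed value for illustration; tune for the real workload

% Ask parfor to hand out iterations in subranges of a fixed size
opts = parforOptions(pool, ...
    'RangePartitionMethod', 'fixed', ...
    'SubrangeSize', subrangeSize);

results = zeros(1, nIterations);
tic
parfor (i = 1:nIterations, opts)
    pause(0.50);        % stand-in for the real simulation
    results(i) = i^2;   % sliced output so the loop returns something
end
toc
```

Small subranges let a worker that finishes early pick up more work, at the cost of more scheduling overhead per subrange. 'RangePartitionMethod' can also be given a function handle that takes the number of iterations and the number of workers and returns a vector of subrange sizes summing to the number of iterations, which allows, for example, smaller subranges towards the end of the loop.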
  1 Comment
Felix 2025-5-7
I have a node with 128 workers. So far I have used 127 of them for the parpool. I have tried setting different subrange sizes in parforOptions but have not yet found a good solution. However, I did not realize that I should work out useful combinations of workers and iterations. Thank you!
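One way to explore such combinations before launching a long run is to tabulate, for a few candidate subrange sizes, how many subranges result and how full the final round of work would be. This sketch assumes equal-cost iterations and fixed-size subranges handed out in rounds of one subrange per worker, which simplifies what the scheduler really does; the candidate sizes are arbitrary.

```matlab
nIterations = 150000;   % approximate loop length from the question
nWorkers = 127;         % pool size from the comment above
candidates = [10 50 100 500 1000 1182];   % arbitrary subrange sizes to compare

for s = candidates
    nSub = ceil(nIterations / s);                 % number of subranges
    nRounds = ceil(nSub / nWorkers);              % rounds of one subrange per worker
    busyLast = nSub - (nRounds - 1) * nWorkers;   % workers busy in the final round
    fprintf('size %5d: %6d subranges, %4d rounds, %3d of %d workers busy in last round\n', ...
        s, nSub, nRounds, busyLast, nWorkers);
end
```

A size that fills the last round looks ideal under these assumptions, but if iteration costs vary, a single large subrange per worker leaves no spare work for workers that finish early. Smaller subranges trade some scheduling overhead for that flexibility.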


More Answers (1)

Matt J 2025-5-6
Edited: Matt J 2025-5-6
Is there a way to change this?
I suspect not. Firstly, if you have reduction variables, the workers that aren't iterating aren't really "idle". They need to store their accumulated portion of the reduction variables until they can be combined with the portion from other workers.
Also, it can be counter-productive for one worker to broadcast part of its tasks to an idle worker midstream. In particular, any temporary variables residing on the non-idle worker would have to be cloned and broadcast. This will have overhead, especially if the temporary variables are large. At the very least, a complex decision would need to be made as to when these extra broadcasting steps are worth it.
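For readers unfamiliar with the terms, here is a small sketch of a parfor loop that contains both kinds of variable. The names `total` and `scratch` are made up for illustration and do not come from the question.

```matlab
n = 1000;
total = 0;                       % reduction variable: accumulated across iterations
parfor i = 1:n
    scratch = i * ones(1, 3);    % temporary variable: created anew in each iteration on the worker
    total = total + sum(scratch);
end
disp(total)
```

Each worker accumulates its own partial value of `total` for the iterations it ran, and the partial values are combined when the loop finishes. `scratch` exists only on the worker for the duration of one iteration and is never sent back to the client.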
  1 Comment
Felix 2025-5-7
Thank you!
I have accepted Damian's answer because it mentions parforOptions, which gets me a little closer to good performance, although I believe your answer is technically correct.


Release

R2024b
