Parfor overhead: local cores vs. cluster cores

I have a parfor loop that takes its input data from a very large cell array; every element of the cell array is eventually used over the course of the loop. The process takes about 150 seconds on 20 local cores, but about 500 seconds on 20 cluster cores. (The cluster has 100 cores, which I would like to use for scaling.)
Two questions:
1) Is it safe to assume that this time difference is due to network communication latency?
2) If the answer to (1) is yes, is there a more efficient way to send the data in the cell array? As a highly simplified example of what I currently have:
for model_it = 1:100
    % some operations to create cell1, which is of length k
    parfor ih = 1:k
        temp = cell1{ih};
        out = f(temp); % some operations done to temp
        output_store{ih} = out;
    end
    % some operations that use output_store to create the inputs for cell1 on the next model_it
end
I do not believe parallel.pool.Constant is an option here because the data in cell1 changes on every model iteration. Do I have other options for setting up this problem?
1 Comment
Edric Ellis, 2021-5-20
Try using ticBytes and tocBytes to see just how much data is being sent. Is there any way you can invert things to run parfor as the outer loop?
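As a rough illustration of the ticBytes/tocBytes suggestion, the sketch below wraps a single parfor pass and reports how much data went to the workers. The data sizes and the placeholder f are assumptions made up for the example, not the original poster's code; only ticBytes, tocBytes, gcp, and parfor are real Parallel Computing Toolbox functions.

```matlab
% Hypothetical stand-ins for the question's data and f; sizes are arbitrary.
k = 8;
cell1 = arrayfun(@(i) rand(200), 1:k, 'UniformOutput', false);
f = @(x) sum(x(:));            % placeholder for the real per-element work
output_store = cell(1, k);

pool = gcp;                    % current pool, or start the default one
ticBytes(pool);                % start counting bytes transferred
parfor ih = 1:k
    temp = cell1{ih};          % cell1 is sliced: a worker receives only its elements
    output_store{ih} = feval(f, temp);
end
bytes = tocBytes(pool);        % one row per worker: [sent to worker, received from worker]

totalSentMB = sum(bytes(:, 1)) / 1e6;
fprintf('Sent %.2f MB to workers in total\n', totalSentMB);
```

Running the same measurement on the local pool and on the cluster pool should show whether the transferred volume is comparable, which would help separate data-transfer cost from other sources of overhead.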


Answers (0)
