Unable to achieve desired speedup using parfor
Hi,
I am running several instances of a MATLAB program (distributed as p-code) using parfor loops on two computers with the following configurations:
Comp A: 16 cores at 3.4 GHz, 8 GB RAM per core at 3200 MHz
Comp B: 32 cores at 3.6 GHz, 8 GB RAM per core at 3200 MHz
I am launching 16 instances on A and 32 on B. I find that all instances on B finish in about half the time that those on A take. This baffles me, since the specs scale almost identically. Also, all instances do the same work, so the computational load per instance is identical. Is there any hardware optimization I should make to improve efficiency on A?
Comments (5)
Rik
2024-11-1,7:47
My initial guess was that the two CPUs were from different generations, so the number of instructions per cycle might differ. That doesn't seem to be the case here.
Perhaps it is the cache: if everything fits in the CPU cache, there is no need to go to RAM. I can't think of any other plausible cause, unless the smaller chip doesn't actually reach the stated frequency because of thermal and/or power throttling.
Answers (1)
Matt J
2024-11-1,10:58
I find that all instances on B finish in about half the time as those on A. It baffles me...
That is the expected result, assuming you are running the same loop on both computers. For example, if it is a 32-iteration loop,
parfor i = 1:32
    % ... (the same work in every iteration)
end
then Comp A would be assigned 2 iterations per core, while Comp B would be assigned only 1. So it makes perfect sense that Comp B finishes in half the time.
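As a rough worked version of that arithmetic, assume (these are simplifying assumptions, not measurements) that every iteration takes the same time t, that each core runs one worker, and that iterations are spread evenly. A parfor loop of N iterations on W workers then needs about ⌈N/W⌉ rounds, so the wall-clock time T is approximately:

```latex
T \approx \left\lceil \frac{N}{W} \right\rceil t, \qquad
T_A \approx \left\lceil \frac{32}{16} \right\rceil t = 2t, \qquad
T_B \approx \left\lceil \frac{32}{32} \right\rceil t = t .
```

This ignores the small clock-speed difference (3.4 GHz vs 3.6 GHz), pool startup and data-transfer overhead, and the fact that parfor does not guarantee a perfectly even split of iterations across workers, so real timings will only roughly follow the 2:1 ratio.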