improve and speed up parfor loop

Salam Al-Rubaye 2020-4-17
Commented: Matt J 2020-4-21
Hello,
I have code that runs 10,000 iterations. It involves a Monte Carlo simulation using normal distributions, with 4,000,000 samples per simulation. I tried using parfor to speed it up, but when I compare the run time to a plain for loop, it is almost the same.
Is there a way to speed up the code so that it benefits from the parfor loop?
Thanks,
Here is my code
clc;
clear;
close all;
...
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
...
A=readmatrix("x.csv");
runs = 4000000;
results=zeros(10000,1);
meanG=constant;
sdG=constant;
parfor j=1:x
    mean=A(j,1); %
    sd=A(j,2);
    guss=A(j,3); %
    for n=1:0.5:40
        B=normrnd(mean,sd,[1,runs]);
        F=equation
        G=normrnd(F*meanG,F*sdG,[1,runs]);
        %Other calculation to calculate C
        if C>10
            d=equation;
            break
        end
    end
    record(j)=d;
end
  1 Comment
darova 2020-4-17
Maybe if you show a bit more and explain what this code does, someone can help you.


Answers (2)

Matt J 2020-4-17
Edited: Matt J 2020-4-17
We can't see all the operations in your loop, but the ones we can see are pretty basic ones. Operations as common and basic as those would probably be coded already to utilize a multicore CPU very efficiently, so there probably isn't much room for improvement with parfor. To get a clearer idea how much improvement is possible, though, we would need to see screen shots of your CPU usage and the usage of all its cores (e.g., from the Task Manager, if you are on a Windows OS).
Some of the randomization steps you are doing though look like they could be hoisted out of the loop, e.g.,
B=normrnd(mean,sd,[79,runs]); % one row per loop step; 1:0.5:40 has 79 values
for n=1:0.5:40
F=equation
...
end
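A minimal sketch of the hoisting idea, under stated assumptions: mu, sigma, and the reduced runs value are hypothetical stand-ins for the values in the question, and randn is used in place of normrnd so that no toolbox is required. Note that with the original runs = 4,000,000, a 79-row matrix of doubles would need roughly 2.5 GB, and an early break means some of the pre-generated rows are never used, so this trade-off may or may not pay off.

```matlab
% Hypothetical stand-in values; the original post does not show the real ones.
mu = 2;                      % stands in for A(j,1)
sigma = 0.5;                 % stands in for A(j,2)
runs = 1000;                 % reduced from 4,000,000 to keep the sketch small
nSteps = numel(1:0.5:40);    % number of iterations of the inner loop

% One call generates the samples for every step at once.
B_all = mu + sigma * randn(nSteps, runs);

k = 0;
for n = 1:0.5:40
    k = k + 1;
    B = B_all(k, :);         % row k holds the sample for this step
    % ... compute F, G and C from B here (elided in the original post) ...
end
```

Whether this is faster depends on how often the loop exits early; timing both versions on a realistic input is the only reliable check.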
  9 Comments
Matt J 2020-4-21
Edited: Matt J 2020-4-21
I don't know bash very well, but the nodes=1 suggests to me that you are not running on multiple CPUs. Or, if you are, your for-loop has access to them as well, just as if you were running on a single 20-core CPU. If this is the case, then once again your for loop and your parfor loop have access to the exact same computing hardware, and there is no guarantee that you will get significant speed-up.
It might tell us more if you show us the output of,
>> gcp
Matt J 2020-4-21
"It might tell us more if you show us the output of,"
Never mind this part. Raymond has pointed out that your workers are obviously non-remote.



Raymond Norris 2020-4-21
It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local Workers may just offset this. Try running MATLAB in single-thread mode (-singleCompThread) and then benchmark your code again.
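As a rough way to check this from within a session, rather than restarting MATLAB with -singleCompThread, the sketch below times a stand-in workload with the default thread limit and again with one computational thread. The work function is a hypothetical placeholder, since the real loop body is not shown in the post.

```matlab
% Hypothetical stand-in workload; replace with one iteration of the real loop.
work = @() sum(randn(1, 1e6) .^ 2);

nOrig = maxNumCompThreads;      % current limit on computational threads
tMulti = timeit(work);          % timing with the default limit

maxNumCompThreads(1);           % restrict MATLAB to one computational thread
tSingle = timeit(work);         % timing with a single thread
maxNumCompThreads(nOrig);       % restore the previous limit

fprintf('default threads: %.4f s, single thread: %.4f s\n', tMulti, tSingle);
```

If the two timings are close, the loop body is probably not benefiting much from implicit multithreading, and parfor workers have more room to help.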
You might consider posting a bit more of your code so that we can provide more guidance for your parfor.
  1. As it's written, A is not a sliced input, it's a broadcast variable, which could impact performance.
  2. Is record(j) supposed to be results(j)?
  3. For a particular iteration of j, what happens if C is never greater than 10 (and d does not get defined)?
  4. Again, without all of the code, it's hard to make the following recommendation, but I would consider refactoring your code as such:
parfor j = 1:x
    results(j) = unit_of_work(A,runs,j,meanG,sdG);
end
function d = unit_of_work(A,runs,j,meanG,sdG)
    % meanG and sdG are passed in because a function cannot see the caller's workspace
    mean=A(j,1); %
    sd=A(j,2);
    guss=A(j,3); %
    for n=1:0.5:40
        B=normrnd(mean,sd,[1,runs]);
        F=equation
        G=normrnd(F*meanG,F*sdG,[1,runs]);
        %Other calculation to calculate C
        if C>10
            d=equation;
            break
        end
    end
end
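Points 1 and 3 in the list above can be sketched as follows, with the caveat that this is an illustration under assumptions and not the poster's actual computation. The matrix A, the reduced runs value, and the formulas for C and d are hypothetical placeholders, and randn stands in for normrnd. Splitting A into column vectors that are indexed only by j lets parfor treat them as sliced inputs, and initializing d to NaN gives a defined result when the threshold is never reached.

```matlab
% Hypothetical stand-in data; the post reads A from "x.csv".
A = [1.2 0.1; 6 0.2; 20 0.3; 0 0.1];    % columns: mean, sd
runs = 1000;                             % reduced from 4,000,000
x = size(A, 1);

% Column vectors indexed only by the loop variable are sliced variables,
% so each worker receives just the elements it needs.
mu = A(:, 1);
sd = A(:, 2);

results = nan(x, 1);
parfor j = 1:x
    d = NaN;                             % default if C never exceeds 10
    for n = 1:0.5:40
        B = mu(j) + sd(j) * randn(1, runs);   % randn stands in for normrnd
        C = n * mean(B);                 % placeholder for the elided calculation
        if C > 10
            d = n;                       % placeholder for the elided "d=equation"
            break
        end
    end
    results(j) = d;
end
```

Rows whose threshold is never reached come back as NaN, which makes them easy to find afterwards with isnan(results).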
  4 Comments
Matt J 2020-4-21
I see, but I think the OP's intention is to have non-local workers.
Raymond Norris 2020-4-21
Raymond Norris 2020-4-21
Doesn't appear that way. Notice the reference to local here:
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
