Using parfor instead of for
So I'm trying to compare CPU and GPU program runtimes using a parallel computational scheme. My simple file is attached, but whenever I use parfor in the most computationally heavy section, the runtime increases dramatically (from 1.8 s to 477 s). How else can I speed up the code through parallelization? Is my code too simple to see a performance increase?
Thanks!
2 Comments
Adam
2014-10-23
How are you measuring run time? One thing I always find difficult to factor into such a decision is the overhead of starting up the MATLAB pool in the first place. If it isn't already open, your call to parfor will open it (unless you have set the option not to do so).
This overhead can often make the difference between whether or not it is worth using a parfor loop. Often you are in a situation where if the parallel pool were already open then parfor would be faster, but if it has to open the pool just to do your calculation then overall it will be slower.
I guess you could wrap the loop in an ugly if statement that checks whether the pool is open and then branches to either parfor or for, but often in such cases it just isn't worth the effort to use parallel computing at all.
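A minimal sketch of that check, assuming Parallel Computing Toolbox is installed. The loop body (`sqrt(k)`) and the names `n`, `y`, and `maxWorkers` are placeholders, not taken from the attached file. `gcp('nocreate')` returns an empty value when no pool is open and does not start one. Instead of writing two copies of the loop, this variant uses the optional worker limit of parfor: as far as I know, a limit of 0 makes the loop body run on the client.

```matlab
n = 1000;                          % placeholder problem size
y = zeros(1, n);

pool = gcp('nocreate');            % [] if no pool is open; never starts one
if isempty(pool)
    maxWorkers = 0;                % no pool: run the loop body on the client
else
    maxWorkers = pool.NumWorkers;  % pool already open: use its workers
end

parfor (k = 1:n, maxWorkers)
    y(k) = sqrt(k);                % placeholder for the real computation
end
```

With this pattern the loop never pays the pool start-up cost by itself; it only goes parallel when something else has already opened the pool.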
Accepted Answer
Bruno Pop-Stefanov
2014-10-23
Edited: Bruno Pop-Stefanov
2014-10-23
Dividing the work into several jobs and sending the jobs to the workers is very expensive. This is done at the line with parfor, before the loop starts. You're right that there is no point in parallelizing the code if this overhead is greater than the time needed to run the loop in serial.
I ran the code with for instead of parfor (it took me 308 s for just x=3, and the parfor version was taking far too long to let it run to completion) and counted 6277 iterations of the while loop enclosing the parfor loop. That means that Parallel Computing Toolbox has to divide the loop and send the work to the workers 6277 times. That's a lot...
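To see this per-entry overhead in isolation, here is a small sketch that enters a tiny parfor loop many times and compares it with the same work in a plain for loop. The sizes `nOuter` and `n` and the loop body are made up for illustration and are unrelated to the attached file. The pool is opened before timing so that start-up cost is not counted.

```matlab
nOuter = 50;                        % number of times the inner loop is entered
n = 200;                            % iterations per inner loop (deliberately small)

if isempty(gcp('nocreate'))
    parpool;                        % open the default pool before timing
end

v = zeros(1, n);
tic
for it = 1:nOuter
    parfor k = 1:n                  % work is divided and sent out on every entry
        v(k) = k^2 + it;
    end
end
tParfor = toc;

w = zeros(1, n);
tic
for it = 1:nOuter
    for k = 1:n
        w(k) = k^2 + it;
    end
end
tFor = toc;

fprintf('parfor inside loop: %.3f s, plain for: %.3f s\n', tParfor, tFor);
```

The actual timings depend on the machine and the pool size, so none are quoted here. Both versions compute the same values; only the scheduling differs.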
It's better to divide the work at a higher level, i.e. above the while loop. For example, you could run x=1, x=2, and x=3 on three workers at the same time instead of one after another. Instead of taking 3 times 308 s, it should take just over 308 s:
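parpool(3) % start a pool of three workers
tic
spmd % instead of for x=1:3
    x = labindex; % each worker handles one value of x (1, 2, or 3)
    DataPoints=mesh(x)^2;
    ...
end
toc
delete(gcp) % shut the pool down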
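The same idea can also be written with a single outer parfor, so the work is divided only once. This is just a sketch: `meshSizes` holds made-up values, and the while loop is a trivial stand-in (it halves a matrix until the change is small) for the real serial solver, which would stay unchanged inside the loop body.

```matlab
meshSizes = [8 16 32];                  % hypothetical mesh sizes, one per case
iters = zeros(1, numel(meshSizes));     % iterations needed for each case

parfor x = 1:numel(meshSizes)           % entered once; each case runs serially
    n = meshSizes(x);
    u = ones(n);                        % stand-in for the solution field
    count = 0;
    change = Inf;
    while change > 1e-6 && count < 1000 % stand-in for the original while loop
        uNew = 0.5 * u;                 % placeholder update step
        change = max(abs(uNew(:) - u(:)));
        u = uNew;
        count = count + 1;
    end
    iters(x) = count;                   % sliced output, one entry per case
end
```

Unlike spmd, parfor does not need the pool size to match the number of cases, which may be convenient if more mesh sizes are added later.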
Also, it would be nice to get rid of the inner for loop for j=2:N. You could see a speedup if you can vectorize this for loop somehow.
More Answers (1)
Matt J
2014-10-23
Edited: Matt J
2014-10-23
There doesn't appear to be any good reason to make u a cell array. It looks like it could be made into a simple matrix with elements u(i,j). The same is true for A, D, and un below.
A{i}(j)=(1/h)*(((u{i}(j+1)+u{i}(j))/2)^2-((u{i}(j)+u{i}(j-1))/2)^2);
D{i}(j)=(1/h^2)*(u{i}(j+1)+u{i}(j-1)+u{i+1}(j)+u{i-1}(j)-4*u{i}(j));
un{i}(j)=u{i}(j)+deltat*(-A{i}(j)+(10^-6)*D{i}(j))-((deltat/h)*(DeltaP));
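As a rough illustration of the cell-to-matrix change, assuming every cell of u holds a row vector of the same length (the data below is invented, not taken from the attached file):

```matlab
% Hypothetical stand-in for the cell array u: 4 rows of 5 values each
u = cell(1, 4);
for i = 1:4
    u{i} = (1:5) + 10*i;
end

U = vertcat(u{:});    % 4-by-5 matrix; U(i,j) holds the same value as u{i}(j)
```

After this, every `u{i}(j)` in the expressions above becomes `U(i,j)`, and whole rows or blocks can be addressed at once.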
It also doesn't look like you even need either parfor or for loops to compute these expressions. They all involve expressions that are either convolutions, or can be vectorized, e.g.
D=conv2(u,[0 1 0; 1 -4 1; 0 1 0],'same')/h^2;
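Putting the three expressions together without loops might look like the sketch below. The sizes and constants (`M`, `N`, `h`, `deltat`, `DeltaP`) are invented values, u is random data, and only interior points are updated, with boundary values copied unchanged. The boundary treatment in particular is an assumption, since the attached file is not shown here.

```matlab
% Hypothetical sizes and constants, chosen only for illustration
M = 6;  N = 7;
h = 0.1;  deltat = 1e-3;  DeltaP = 0.5;
u = rand(M, N);                 % plain matrix instead of a cell array

r = 2:M-1;                      % interior rows
c = 2:N-1;                      % interior columns
uc = u(r, c);                   % centre values u(i,j)

% Element-wise versions of the three loop expressions
A = (((u(r, c+1) + uc)/2).^2 - ((uc + u(r, c-1))/2).^2) / h;
D = (u(r, c+1) + u(r, c-1) + u(r+1, c) + u(r-1, c) - 4*uc) / h^2;

un = u;                         % assumption: boundary values stay as they are
un(r, c) = uc + deltat*(-A + 1e-6*D) - (deltat/h)*DeltaP;

% The same D from conv2; only interior entries are comparable,
% because 'same' pads the edges with zeros
Dconv = conv2(u, [0 1 0; 1 -4 1; 0 1 0], 'same') / h^2;
maxDiff = max(max(abs(Dconv(r, c) - D)));
```

On the interior points, `maxDiff` should be at the level of floating-point round-off, which confirms that the indexed form and the conv2 form compute the same D.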