- For very small operations, such as A = B + C with 2x2 matrices B and C, the parallel code must first be set up, run on separate cores, and then the threads joined. This can cost more than simply executing serially.
- With a very large number of threads, the OS spends much of its time switching between threads, saving the state of each thread and then loading it again. This results in a heavy loss of time.
Pcg and Parallel Computing Toolbox
Hi MATLAB community, I know that the function pcg is supported in Parallel Computing Toolbox for data-parallel computations with distributed arrays. I am using an HPC architecture made up of 8 nodes; each blade consists of 2 quad-core processors sharing memory, for 8 cores per blade and 64 cores in total. I ran pcg on 1 core and pcg with distributed arrays on 32 cores.
tic
[y]=pcg(A,b,[],100); %first case
toc
A=distributed(A);
b=distributed(b);
tic
[x,flagCG_1,iter] = pcg(@(x)gather(A*x),b,[],100); %second case on 32 cores
toc
I obtained:
Elapsed time is 0.001279 seconds. %first case
Elapsed time is 0.316632 seconds. %second case on 32 cores
Why is the time in the second case greater than in the first? What am I doing wrong? I tried larger matrices, but the second case is always slower than the first. I am probably not using pcg correctly with distributed arrays. Thanks for your help.
Answers (2)
Gaurav Garg
2020-9-16
Hey Rosalba,
Using Parallel Computing Toolbox for very small problems, or with a very large number of threads, can prove to be of no use, because the overhead of setting up parallel execution or of switching between threads can outweigh the work itself.
In your problem, the former seems to be the case.
As a workaround, you can either run the code without distributed arrays or try a parfor loop (code snippet given below):
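parfor i = 1:iter
    % f, k and l are placeholders for your own function and inputs
    x{i} = f(k,l);   % store each result by index so it survives the loop
end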
Note that the function f should not have any data dependency among the iterations.
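To make the parfor suggestion concrete, here is a minimal sketch that solves several independent right-hand sides with pcg, one per iteration. The test matrix, the number of right-hand sides, the tolerance and the variable names are all assumptions chosen for illustration; they do not come from the original question.

```matlab
% Sketch only: requires Parallel Computing Toolbox; all names are illustrative.
A = delsq(numgrid('S', 30));        % small sparse SPD test matrix (assumed)
nRhs = 8;                           % number of independent right-hand sides (assumed)
B = rand(size(A, 1), nRhs);
X = zeros(size(B));

parfor k = 1:nRhs
    % Each iteration reads and writes only column k,
    % so there is no data dependency between iterations.
    [xk, ~] = pcg(A, B(:, k), 1e-6, 200);
    X(:, k) = xk;
end
```

This pattern only pays off when there are many such independent solves; a single small solve will still be faster without the loop.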
Oli Tissot
2020-9-11
You can simply do:
dA = distributed(A);
db = distributed(b);
[dx, flag, iter] = pcg(dA, db, [], 100); % dx is a distributed array
x = gather(dx);
But you should be aware that distributed arrays are not designed to be faster than in-memory arrays; they are designed to process arrays that would not fit in your local memory. In practice, operations on distributed arrays are usually slower because of the extra communication cost, but if the matrix is large enough, these operations could not be performed at all otherwise! If you are interested in performance, you may try gpuArray, since pcg is supported for gpuArray.
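As a rough illustration of the gpuArray route, the sketch below builds a sparse test system, moves it to the GPU and calls pcg on it. It assumes a supported GPU is available; the test matrix, tolerance and iteration limit are placeholders rather than values from the question.

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
n = 100;                          % grid size (assumed for illustration)
A = delsq(numgrid('S', n));       % sparse SPD 2-D Laplacian test matrix
b = ones(size(A, 1), 1);

gA = gpuArray(A);                 % copy the sparse matrix to the GPU
gb = gpuArray(b);

[gx, flag, relres, iter] = pcg(gA, gb, 1e-6, 1000);
x = gather(gx);                   % bring the solution back to host memory
```

As with distributed arrays, the transfer to and from the GPU has a cost, so a gain is only likely for reasonably large systems.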
4 comments
Steven Lord
2020-9-14
How large a problem are you trying to solve? If you're trying to solve small problems, using parallel code may not save you any time, as you've seen, because the overhead of setting up the problem in parallel may outweigh any savings you get from running in parallel.
Picture yourself going to the grocery store with three young children (before COVID-19, of course). If you split the four items on your shopping list among your group, are you really saving time? Maybe, but if you do, you probably don't save much. And depending on how mature the children are, you may very well lose time. What if you split the 100 items on your list into four segments? Then you may save more time than in the four-item case.
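One way to see where the overhead stops dominating is to time the same solve at several problem sizes, in memory and with distributed arrays. The following is only a sketch: the sizes, the test matrix and the tolerance are assumptions, and the timings depend entirely on the machine and the pool.

```matlab
% Sketch only: requires Parallel Computing Toolbox; sizes are illustrative.
if isempty(gcp('nocreate')), parpool; end   % start a pool if none is running

for n = [50 200 800]
    A = delsq(numgrid('S', n));             % sparse SPD test matrix (assumed)
    b = ones(size(A, 1), 1);

    % In-memory solve; two outputs suppress the convergence message.
    tic; [~, ~] = pcg(A, b, 1e-6, 100); tLocal = toc;

    % Same solve with the data spread across the workers.
    dA = distributed(A); db = distributed(b);
    tic; [~, ~] = pcg(dA, db, 1e-6, 100); tDist = toc;

    fprintf('%7d unknowns: local %.4f s, distributed %.4f s\n', ...
            size(A, 1), tLocal, tDist);
end
```

For a more careful measurement, each case could be repeated several times, since a single tic/toc pair is noisy for runs this short.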