Solve large linear systems with Parallel Computing Toolbox
Dear all,
I need to solve linear systems of the form $(A - \theta_i I)\,x_i = v$, $i = 1, \dots, 7$, where the $\theta_i$ are just 7 doubles and $I$ is the identity matrix. These systems are independent, so they can be solved in parallel, and I want to do that in MATLAB with the Parallel Computing Toolbox (PCT). The matrix is A = gallery('poisson',n).
On my cluster I have a node with 16 CPUs, and I want to exploit this to boost performance. I wrote the following code to check whether parfor gives an improvement over the classical for loop. I started a parallel pool of 7 workers, and when running on the cluster I requested 7 CPU cores, following the philosophy "1 worker per CPU core", but the performance does not improve.
Here is the code and its output:
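clear all
close all
m = 70^2;                          % system size (70-by-70 grid)
A = gallery('poisson',70);         % sparse 2-D Poisson matrix, m-by-m
I = speye(m);                      % sparse identity
v = ones(m,1);                     % right-hand side
x = zeros(m,7);                    % one solution per column
theta = [1.1,0.2,5.6,0.2,6,8,9.9]; % the 7 shifts
% Serial loop: one sparse direct solve per shift
tic
for i=1:7
    x(:,i) = (A - theta(i)*I)\v;
end
toc
% Parallel loop on a pool of 7 workers (pool start-up is not timed)
parpool(7)
tic
parfor i=1:7
    x(:,i) = (A - theta(i)*I)\v;
end
toc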
The results are:
Elapsed time is 0.184104 seconds. (with for loop)
Elapsed time is 0.451166 seconds. (with parfor loop)
So my questions are:
- Is there something wrong in how I wrote my code to run in parallel? How can I improve the performance (iterative solvers, or different methods)?
- Why is the parfor considerably slower than the classical for? I've read that linear algebra operations are already multithreaded, so perhaps there is nothing to gain from a parfor.
Answers (1)
Dana
2020-7-16
Edited: Dana
2020-7-16
First of all, when I run the code you posted, MATLAB warns that A - theta(i)*I is singular to working precision for i = 2 and i = 4 (both use theta = 0.2). Not sure if that's expected.
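One way to check whether these warnings are expected, without doing any solves: the eigenvalues of gallery('poisson',n) (the 5-point Laplacian on an n-by-n grid with Dirichlet boundary) are known in closed form, $\lambda_{jk} = 4 - 2\cos\frac{j\pi}{n+1} - 2\cos\frac{k\pi}{n+1}$, so the distance from each $\theta_i$ to that spectrum tells how close $A - \theta_i I$ is to singular. The sketch below assumes n = 70 as in the question; the variable names gap and kappa are illustrative only.

```matlab
% Closed-form eigenvalues of gallery('poisson',n), the 5-point Laplacian
% on an n-by-n grid: lambda_jk = 4 - 2*cos(j*pi/(n+1)) - 2*cos(k*pi/(n+1))
n = 70;
theta = [1.1, 0.2, 5.6, 0.2, 6, 8, 9.9];

c = 2*cos((1:n)*pi/(n+1));      % 1-by-n row vector
lambda = 4 - c(:) - c;          % n-by-n grid via implicit expansion
lambda = lambda(:);             % all n^2 eigenvalues of A

for i = 1:numel(theta)
    d = abs(lambda - theta(i)); % |eigenvalues| of the symmetric A - theta_i*I
    gap = min(d);               % distance from theta_i to the spectrum of A
    kappa = max(d) / gap;       % 2-norm condition number of A - theta_i*I
    fprintf('i = %d  theta = %4.1f  gap = %.2e  cond = %.2e\n', ...
            i, theta(i), gap, kappa);
end
```

Because every eigenvalue lies strictly between 0 and 8, shifts inside that interval (all but 8 and 9.9 here) can land very close to an eigenvalue; a tiny gap is exactly what triggers the "singular to working precision" warning.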
Second, I actually do find parfor faster when I run your code (though only barely). Not sure why you're finding otherwise, but it could be something specific to your processor.
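Since results differ between machines, it may help to measure how much the serial loop already benefits from MATLAB's built-in multithreading, rather than assuming it. The sketch below times the same serial loop with one computational thread and with the default count, using maxNumCompThreads; whether the sparse solver behind \ actually scales with threads for this matrix depends on the MATLAB version and the hardware, which is what the timing is meant to reveal. Expect the same singular-matrix warnings for i = 2 and 4.

```matlab
% Time the serial loop with 1 computational thread and with the default
n = 70;
m = n^2;
A = gallery('poisson', n);
I = speye(m);
v = ones(m, 1);
theta = [1.1, 0.2, 5.6, 0.2, 6, 8, 9.9];
x = zeros(m, numel(theta));

nDefault = maxNumCompThreads;       % current number of computational threads
for nThreads = unique([1, nDefault])
    maxNumCompThreads(nThreads);    % limit MATLAB's computational threads
    tic
    for i = 1:numel(theta)
        x(:, i) = (A - theta(i)*I) \ v;
    end
    fprintf('%2d thread(s): %.4f s\n', nThreads, toc);
end
maxNumCompThreads(nDefault);        % restore the original setting
```

If the two timings are close, the serial loop is effectively single-threaded and a parfor has more room to help; if the multithreaded run is much faster, the serial loop is already using the cores that parfor workers would compete for.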
7 comments
Bruno Luong
2020-7-16
Edited: Bruno Luong
2020-7-16
You should then definitely look into using one of the iterative solvers, such as pcg, cgs, and friends.
With such methods you can provide A through a function; in your case this comes down to computing
y = A*x - theta_i*x
for any given vector x, where A is sparse.
This should be faster, and if you can also provide a cheap approximation of inv(A) for preconditioning, it will be faster still.
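A minimal sketch of the matrix-free approach, under one stated substitution: pcg assumes a symmetric positive definite matrix, but $A - \theta_i I$ is symmetric and indefinite whenever $\theta_i$ lies between the smallest and largest eigenvalues of A (which sit in (0, 8)), as most of the shifts here do. The sketch therefore swaps in minres, which handles symmetric indefinite systems and accepts a function handle that returns the matrix-vector product. The values of tol and maxit are illustrative assumptions, not tuned settings.

```matlab
% Matrix-free MINRES solves of (A - theta_i*I) x_i = v, one per shift
n = 70;
m = n^2;
A = gallery('poisson', n);
v = ones(m, 1);
theta = [1.1, 0.2, 5.6, 0.2, 6, 8, 9.9];
tol = 1e-8;                          % illustrative relative-residual tolerance
maxit = 2000;                        % illustrative iteration cap
x = zeros(m, numel(theta));

for i = 1:numel(theta)
    % y = A*z - theta_i*z: the shifted matrix is never formed explicitly
    afun = @(z) A*z - theta(i)*z;
    [x(:, i), flag, relres, iter] = minres(afun, v, tol, maxit);
    fprintf('theta = %4.1f  flag = %d  relres = %.1e  iter = %d\n', ...
            theta(i), flag, relres, iter);
end
```

A nonzero flag means the tolerance was not reached within maxit, which is likely for the nearly singular shift 0.2. Each iteration of the loop is independent, so it can also be written as a parfor. minres additionally accepts preconditioner factors M1 and M2 (their product must be symmetric positive definite), for example an incomplete Cholesky factor L = ichol(A) passed as M1 = L and M2 = L'; whether a preconditioner built from A alone helps for the larger shifts is not guaranteed.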
I have no idea about the efficiency of parfor, since I do not own the Parallel Computing Toolbox.
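For readers who do have the toolbox, a sketch for seeing where the parfor time goes. In the question the whole serial loop takes about 0.184 s, roughly 0.026 s per solve, so fixed parfor costs (scheduling, sending A, I, v and theta to the workers, collecting x) can easily be comparable to the work saved; pool workers also typically run with a single computational thread each by default. The sketch assumes R2016b or later for ticBytes/tocBytes and uses parallel.pool.Constant so that A is copied to each worker only once; the loop index k and the two-run structure are illustrative choices.

```matlab
% Measure data transfer and repeat-run time of the parfor loop
n = 70;
m = n^2;
A = gallery('poisson', n);
v = ones(m, 1);
theta = [1.1, 0.2, 5.6, 0.2, 6, 8, 9.9];
x = zeros(m, numel(theta));

pool = gcp('nocreate');              % reuse an existing pool if there is one
if isempty(pool)
    pool = parpool(7);
end
Ac = parallel.pool.Constant(A);      % copy A to each worker once

for k = 1:2                          % run 1 may include one-time start-up costs
    ticBytes(pool);
    tic
    parfor i = 1:numel(theta)
        x(:, i) = (Ac.Value - theta(i)*speye(m)) \ v;
    end
    fprintf('parfor run %d: %.4f s\n', k, toc);
    tocBytes(pool)                   % bytes sent to / received from each worker
end
```

If the second run is still slower than the serial loop, the per-loop overhead dominates at this problem size, and parfor is more likely to pay off when each solve is substantially more expensive (a larger n, or many more shifts).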