Solving multiple systems of equations using GPU and iterative methods

I am trying to solve an inverse by solving multiple systems of linear equations. The coefficient matrix is large and sparse, and direct methods produce significant rounding errors. I have used the incomplete LU factorization as preconditioner for the iterative methods, and parallelization using CPU is quite simple as follows:
[L1,U1] = ilu(U,struct('type','ilutp','droptol',tolILU));
E3aux = speye(size(U));
parfor j=1:size(U,1),
[E3(:,j),~] = bigs(U,E3aux(:,j),[],[],L1,U1);
end
However, since bigs function also works with GPUarrays, I wonder if it would be possible to make something similar using the GPU. I know that it is possible to do it for one system of equations as follows:
cpuTime = timeit(@()bicg(U,b,[],[],L1,U1));
fprintf('It takes %3.4f seconds on the CPU to execute a systems of equations.\n',cpuTime);
Ugpu = gpuArray(full(U));
bgpu = gpuArray(full(b));
L1gpu = gpuArray(full(L1));
U1gpu = gpuArray(full(U1));
gpuTime = gputimeit(@()bicg(Ugpu,bgpu,[],[],L1gpu,U1gpu));
fprintf('It takes %3.4f seconds on the GPU to execute system.\n',gpuTime);
fprintf(['Unvectorized GPU code is %3.2f times slower ',...
'than the CPU version.\n'], gpuTime/cpuTime);
Getting the following output:
It takes 0.0195 seconds on the GPU to execute system.
Unvectorized GPU code is 25.29 times slower than the CPU version.
However, I do not know:
  1. How to use preconditioners without transforming them into full matrices.
  2. How to paralellize for multiple independent vectors.

 采纳的回答

Sparse gpuArrays have been supported since R2015b and bicg since R2016b, so you should just call bicg with the original sparse matrices and you should get good performance. If you pass your system matrix U as a preconditioner then ILU is the technique used to apply it so you will get the same behaviour. You cannot pass two matrix preconditioners to the sparse gpuArray version of bicg.
The iterative solvers cannot (by their very nature) solve for multiple right-hand-sides at the same time. You can only do that by calling the solver repeatedly - or by using a direct solver.

2 个评论

Thanks for your quick response Joss.
Regarding your last comment. I understand that iterative solvers cannot solve for multiple right-hand-sides at the same time, but I am not sure if what you are claiming is that it is not possible to parallelize the solution for multiple systems, like I did using the CPU (see the code below), but using the GPU instead.
E3aux = speye(size(U));
parfor j=1:size(U,1),
[E3(:,j),~] = bigs(U,E3aux(:,j),[],[],L1,U1);
end
Sure you can, but this only makes sense if you have more than one GPU. Otherwise you're gaining nothing.

请先登录,再进行评论。

更多回答(0 个)

类别

帮助中心File Exchange 中查找有关 Creating and Concatenating Matrices 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by