Preconditioning algorithm on GPU for solution of sparse matrices

Hi,
I solve large sparse Ax = b systems with preconditioned conjugate gradient algorithms. Since R2016a, MATLAB has supported several Krylov solvers such as bicgstab, pcg, and gmres on the GPU with a preconditioner for sparse matrices. Those functions accept only a single sparse matrix M (where M = M1*M2 for M1 lower and M2 upper triangular sparse factors), not M1 and M2 separately.
I'm wondering how MATLAB applies the preconditioner. Sparse triangular solves are notoriously slow on the GPU, so I suspect it might use some kind of iterative method, or perhaps apply the preconditioner on the CPU instead. What exactly is done in the background when the preconditioner is applied?
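For concreteness, here is the kind of call I mean; the test matrix and tolerances below are just placeholders for my actual system:

```matlab
% Sketch: preconditioned solve on the GPU with a single sparse M.
% The incomplete factorization itself runs on the CPU.
A = gallery('poisson', 100);        % sparse SPD test matrix
b = ones(size(A,1), 1);
[L, U] = ilu(A);                    % incomplete LU factors on the CPU
M = L * U;                          % the single preconditioner matrix
x = pcg(gpuArray(A), gpuArray(b), 1e-6, 200, gpuArray(M));
```

On the CPU I would normally pass L and U separately as the M1 and M2 arguments, which is exactly what the GPU interface does not allow.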

Accepted Answer

MATLAB's preconditioning for sparse iterative solvers on the GPU is currently implemented using ILU factorization and sparse triangular solves. If you have a preconditioner better suited to your problem, you can use the functional form (a function handle that applies M\x); this diverts to a different implementation and can be faster and/or converge better, depending on your problem.
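As a minimal sketch of the functional form, here is a simple Jacobi (diagonal) preconditioner passed as a handle that applies M\x; the matrix and names are illustrative, not tied to any particular problem:

```matlab
% Functional form: pass a handle that applies M\x instead of M itself.
A = gpuArray(gallery('poisson', 100));   % sparse SPD test system
b = ones(size(A,1), 1, 'gpuArray');
d = full(diag(A));                       % Jacobi preconditioner M = diag(A)
mfun = @(x) x ./ d;                      % applies M\x elementwise
x = pcg(A, b, 1e-6, 200, mfun);
```

Any operation you can express inside the handle works here, so a custom preconditioner that avoids triangular solves entirely is possible.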

5 Comments

Thanks for your quick reply.
I have been programming in CUDA for a while now, and I can confirm that sparse triangular solves are quite slow on the GPU. I therefore thought MATLAB might have developed a new sparse triangular solver, or might apply the preconditioner using Jacobi iterations on the triangular factors after the ILU. I'm interested because MATLAB's GPU conjugate gradient functions are faster than the CPU versions, and if they really use TRSV (sparse triangular solve) that is surprising.
Can you confirm that MATLAB really uses TRSV after the ILU? If so, why can't we pass the lower and upper triangular matrices to the function directly? I'd gladly read further documentation, but I couldn't find any for the GPU versions of these functions.
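To be concrete about the alternative I have in mind, here is a sketch (purely illustrative, not MATLAB's implementation) of applying a lower-triangular factor with Jacobi sweeps instead of a sequential TRSV:

```matlab
% Approximate solve of L*y = r via Jacobi sweeps: split L = D + N and
% iterate y <- D \ (r - N*y).  Each sweep is one parallel SpMV, which
% maps to the GPU far better than a level-scheduled backsubstitution.
function y = jacobi_trisolve(L, r, nsweeps)
    D = diag(diag(L));          % diagonal part of L
    N = L - D;                  % strictly lower-triangular remainder
    y = r ./ full(diag(L));     % initial guess from the diagonal alone
    for k = 1:nsweeps
        y = D \ (r - N * y);    % one Jacobi sweep
    end
end
```

A few sweeps give only an approximate application of the preconditioner, which is why this kind of scheme can change the convergence behaviour of the outer Krylov iteration.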
MATLAB uses a third-party library to perform the CG methods when the inputs are all sparse matrices. I can't say much about the internals of that code, but I know it uses NVIDIA's cuSPARSE library for the ILU and triangular solves, and those should be about as optimized as they can get. If you want to implement this in your own code (via MEX functions, perhaps), I recommend reading their API documentation for csrsm_solve to see if it has what you need.
Exposing a sparse triangular solve is part of a drive to implement a sparse direct solver for gpuArray, which we hope to bring in a future version.
I hope I'm not dragging this out too long.
I realized cuSPARSE has a reordering function named csrcolor, which reorders a matrix to reduce its number of levels and expose more parallelism. I have read about this in a couple of NVIDIA papers, but those papers also say that reordering reduces the quality of the preconditioner and slows convergence. Yet when I solved my complex-valued system of equations with bicgstab on the GPU, very interestingly, the number of iterations dropped by half with the same preconditioner. (It should have increased a bit.)
I can speculate that the GPU versions of MATLAB's CG functions require the M matrix (rather than the M1 and M2 triangular factors) because they reorder it to expose more parallelism.
I think reordering is the trick MATLAB is using here; I will try the same thing in CUDA C and see what happens.
P.S. For my case, iterative solvers are far faster for very large systems (~10^6 DoFs).
It sounds like you're saying that the ILU produces better factors of M than the original two triangular matrices used to create it. That's possible; I don't know the details of the implementation.
@Joss, is the current PCG implementation multithreaded? Does it use the Intel MKL solver behind the scenes?


