Accelerate eig with GPUs

Hi all, I need to diagonalize a lot of matrices. The problem is similar to:
A = rand(5000, 5000, 500); % this snippet is just a demo; the real program builds A with actual logic
EVs = zeros(5000, 500);
for idx = 1:500
EVs(:, idx) = eig(A(:,:,idx));
end
This works fine on CPUs and scales easily with parfor and MDCS. Since eig is faster on GPUs, I tried this:
A = rand(5000, 5000, 500); % this snippet is just a demo; the real program builds A with actual logic
EVs = zeros(5000, 500, 'gpuArray');
for idx = 1:500
B = gpuArray(A(:, :, idx));
EVs(:, idx) = eig(B);
end
EVs = gather(EVs);
This does not perform much better. Is there a way to avoid the gpuArray call in each loop iteration? Something like pagefun with eig would be the solution, I guess (unfortunately, pagefun does not support eig).
Best wishes, Niklas
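For reference, the CPU scaling mentioned in the question might look like the minimal sketch below. It assumes Parallel Computing Toolbox is installed (otherwise parfor just runs the loop serially), and mSize and runs are small illustrative values, not the sizes from the post.

```matlab
% Minimal sketch: distribute independent eig calls across CPU workers.
% mSize and runs are illustrative values, not the sizes from the post.
mSize = 200;
runs = 16;
A = rand(mSize, mSize, runs);

% Eigenvalues of a non-symmetric real matrix are complex in general,
% so the output is preallocated as complex.
EVs = complex(zeros(mSize, runs));

parfor idx = 1:runs
    % A and EVs are sliced by idx, so each worker only receives its own pages.
    EVs(:, idx) = eig(A(:, :, idx));
end
```

Each iteration is independent, which is why the loop parallelizes without any further changes.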

1 comment

Birk Andreas's comment moved here:
Please Mathworks, implement eig for use with pagefun as soon as possible!!!


Accepted Answer

Matt J on 2018-10-16
Edited: Matt J on 2018-10-16
You need to build A directly on the GPU, for example,
EVs = zeros(5000, 500, 'gpuArray');
A=gpuArray.rand(5000,5000,500);
for idx = 1:500
B = A(:, :, idx);
EVs(:, idx) = eig(B);
end
EVs = gather(EVs);
For your real A, you have to examine which operations you currently use to build A on the host, and which of those operations are not available on the GPU.
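One practical caveat: a 5000 x 5000 x 500 array of doubles takes about 100 GB, so it cannot be held on a typical GPU in one piece. If A has to be built on the host, a possible compromise is to transfer it in batches of pages, so that there is one host-to-device copy per batch instead of one per matrix. The following is only a sketch under that assumption; batchSize is a hypothetical tuning parameter and the sizes are kept small for illustration.

```matlab
% Minimal sketch: move the matrices to the GPU in batches of pages.
% mSize, runs and batchSize are illustrative values, not from the post.
mSize = 200;
runs = 20;
batchSize = 5;                      % pages transferred per host-to-device copy
A = rand(mSize, mSize, runs);       % stands in for the host-side data
EVs = complex(zeros(mSize, runs));  % eigenvalues may be complex

for first = 1:batchSize:runs
    last = min(first + batchSize - 1, runs);
    G = gpuArray(A(:, :, first:last));   % one transfer for the whole batch
    for k = 1:(last - first + 1)
        % eig runs on the GPU; gather copies the eigenvalues back to the host
        EVs(:, first + k - 1) = gather(eig(G(:, :, k)));
    end
end
```

The batch size would have to be chosen so that one batch, plus the workspace eig needs, fits in GPU memory.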

3 comments

Hi Matt,
thanks, I guess that does the trick by avoiding unnecessary communication with the GPU. Unfortunately, the speedup is not as big as I expected:
Elapsed time is 118.457193 seconds. <- CPU time without parfor
Elapsed time is 107.845443 seconds. <- GPU on GeForce 1080Ti with 12GB
Used code:
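mSize = 1000; % matrix dimension
runs = 200; % number of matrices
A = rand(mSize, mSize, runs);
% CPU: time the eig loop on host memory
tic
EVC = zeros(mSize, runs);
for idx = 1:runs
EVC(:, idx) = eig(A(:,:,idx));
end
toc
% GPU: the timed region includes the one-time transfer of A and the final gather
tic
EVG = zeros(mSize, runs, 'gpuArray');
A = gpuArray(A);
for idx = 1:runs
G = A(:,:,idx);
EVG(:, idx) = eig(G);
end
EVG = gather(EVG); % copy the results back to the host
toc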
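The tic/toc region for the GPU above also covers the transfer of A and the final gather. To compare a single eig call on each device in isolation, timeit and gputimeit could be used instead; both call the function several times, and gputimeit waits for the GPU to finish before stopping the clock. This is a minimal sketch with an illustrative matrix size, not a rerun of the measurements reported here.

```matlab
% Minimal sketch: time one eig call on the CPU and one on the GPU.
% mSize is an illustrative value.
mSize = 500;
G = rand(mSize, 'gpuArray');   % matrix created directly on the GPU
C = gather(G);                 % identical data on the host

tCPU = timeit(@() eig(C));     % typical time of one CPU call, in seconds
tGPU = gputimeit(@() eig(G));  % typical time of one GPU call, in seconds

fprintf('CPU: %.4f s, GPU: %.4f s, ratio: %.2f\n', tCPU, tGPU, tCPU / tGPU);
```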
Niklas on 2018-10-16
Edited: Niklas on 2018-10-16
With bigger matrices, the speedup is higher.
Elapsed time is 1006.431223 seconds. <- CPU
Elapsed time is 295.168202 seconds. <- GPU
Unfortunately, nvidia-smi shows low GPU utilization. Maybe I will write a CUDA snippet to deal with it.
Yeah, I can't see that there would be a lot of parallelism in eigenvalue computation.

Release

R2018a