Is there a way to speed up a for loop when combined with the GPU?

Hello,
At the moment a for loop is the bottleneck in my code. As I understand it, the GPU does not work well with indexing, and because of the for loop the data keeps moving between GPU and CPU memory. Does anyone have a suggestion for speeding up this part, or is this a stalemate that cannot be optimized further because of the memory transfers? In my case lab is 200000000x130 and dydis is 100000.
function [a,b]=skaicia (lab,dydis,z)
    comi=gpuArray(0.05);
    t=gpuArray(0.6);
    d=gpuArray(50001);
    langas=gpuArray(50000);
    atidaryta=gpuArray(50000);
    x1 = zeros(dydis,65,1);
    for i=1:z
        x1(:,:,i)=lab(i*dydis+1-dydis:i*dydis,:);
        x=gpuArray(x1(:,:,i));
        x23=x(1:end-d,:);
        [n1,n2]=size(x);
        n1=gpuArray(n1);
        n2=gpuArray(n2);
        xt=permute(x,[2 1 3]);
        dx1=(d-langas-1:d-2);
        dx=permute(dx1,[2 1])+ (1:n1-d);
        [sujn1(:,:,i),sujn2(:,:,i)]=mazinta(xt,dx,n2,n1,d,x23,t,langas,atidaryta,comi);
    end
    a=sujn1;
    b=sujn2;
end
2 Comments
Walter Roberson 2019-3-31
You are growing x1 dynamically along the third dimension -- you allocate it as dydis by 65 by 1, but you assign into x1(:,:,i) so it keeps getting larger.
You pull the part of x1 that you just assigned into a gpuArray that becomes x. You never use x1 again in your code other than continuing to grow it and copying the latest slice to gpu. You do not return x1.
Therefore it would be more efficient to directly do
x = gpuArray( lab(i*dydis+1-dydis:i*dydis,:) );
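A minimal sketch of how the loop might look with this change applied, offered as an illustration rather than the poster's actual code. The names skaicia_sketch and blockFcn are hypothetical: blockFcn stands in for the call to mazinta, whose definition is not shown in this thread, and is assumed to return two 2-D outputs of the same size for every block. The sketch also allocates the outputs once, since sujn1 and sujn2 in the original loop grow along the third dimension in the same way x1 does. canUseGPU is assumed to be available (MATLAB R2019b or later); on a machine without a supported GPU the data simply stays on the CPU.

```matlab
function [a, b] = skaicia_sketch(lab, dydis, z, blockFcn)
% Hypothetical variant of skaicia. blockFcn is a stand-in for mazinta and
% is assumed to return two 2-D outputs whose sizes do not change with i.
a = [];
b = [];
useGpu = canUseGPU();                  % assumption: R2019b or later
for i = 1:z
    rows = (i-1)*dydis + 1 : i*dydis;  % same rows as i*dydis+1-dydis:i*dydis
    x = lab(rows, :);                  % slice the block directly, no x1 copy
    if useGpu
        x = gpuArray(x);               % one host-to-device transfer per block
    end
    [s1, s2] = blockFcn(x);
    if i == 1
        % Allocate the full outputs once, after the first block shows
        % their size; 'like' keeps them on the GPU when s1/s2 are gpuArrays.
        a = zeros([size(s1, 1), size(s1, 2), z], 'like', s1);
        b = zeros([size(s2, 1), size(s2, 2), z], 'like', s2);
    end
    a(:, :, i) = s1;
    b(:, :, i) = s2;
end
end
```

In the original function, blockFcn would wrap the rest of the loop body (the x23, xt and dx setup followed by the call to mazinta).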
Mantas Vaitonis 2019-3-31
Thank you Walter, your suggestion was very helpful and did improve the speed of the calculations.
