Using GPU on multiple nested loops

3 次查看(过去 30 天)
The following code is slow for large ncnt (typically >2000)
and I want to use my GPU for the outermost (iplane) loop.
Can you give me an hint? (I have an NVIDIA RTX 8000.)
nxs=101;
nys=101;
nzs=101;
ncnt=100;
xnmin=-1.0;
xnmax= 1.0;
ynmin=-1.0;
ynmax= 1.0;
znmin=-1.0;
znmax= 1.0;
coefftmp=complex(rand(1,ncnt));
igalltmp=rand(3,ncnt);
vktmp=rand(1,3);
wfrtmp=complex(zeros(nxs,nys,nzs));
tic
for iplane=1:ncnt % GPU loop
ee=exp(2.*pi*complex(0.,1.));
vkg=vktmp+double(igalltmp(:,iplane)');
ekx=ee^vkg(1);
eky=ee^vkg(2);
ekz=ee^vkg(3);
coefft=coefftmp(iplane);
for iz=1:nzs
z=znmin+(znmax-znmin)*double(iz-1)/double(nzs-1);
ekzz=ekz^z;
for iy=1:nys
y=ynmin+(ynmax-ynmin)*double(iy-1)/double(nys-1);
ekyy=eky^y;
for ix=1:nxs
x=xnmin+(xnmax-xnmin)*double(ix-1)/double(nxs-1);
ekxx=ekx^x;
wfrtmp(ix,iy,iz)=wfrtmp(ix,iy,iz)+coefft*ekxx*ekyy*ekzz;
end
end
end
end
wfr(ispin,:,:,:)=wfrtmp/sqrt(Vcell);
toc
  4 个评论
Walter Roberson
Walter Roberson 2019-8-26
If you want to use GPU, you are going to have to rewrite your code to be vectorized.
I suggest you consider
zvec = linspace(znmin, znmax, nzs);
yvec = linspace(ynmin, ynmax, nys);
xvec = linspace(xnmin, xnmax, nxs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);
before any looping. After that you can do things like
coeff .* ekz.^Z .* eky.^Y .* ekx.^X
Jhinhwan Lee
Jhinhwan Lee 2019-8-26
Thanks! I did something basically the same and it is more than ten times faster now.
I also found including the X, Y and Z in the argument of the exp function slightly better: In the cases of kx=1, 2, 3, ... ekx=exp(2.*pi*complex(0.,1.)*kx)=1 and ekx^x==1 no matter what x is, while exp(2.*pi*complex(0.,1.)*kx*x) depends on x (unless k=0) as expected.
coeff*exp(2.*pi*complex(0.,1.)*(kx*X+ky*Y+kz*Z))

请先登录,再进行评论。

回答(1 个)

Raunak Gupta
Raunak Gupta 2019-8-30
Hi,
For speeding up the code you need to first vectorize the three loops inside the main loop as they are independent of each other. As mentioned in the comments you can use linspace and ndgrid for doing exponentiation for all three variables independently.
The above part only vectorizes the code but to actually use GPU you can create the initial arrays using gpuArray. This may also significantly fasten up the code. The function that you have used inside the code is supported for gpuArray but if you want to use any specific function you can check about all the supported function here.

类别

Help CenterFile Exchange 中查找有关 GPU Computing 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by