Unexpected speed decrease of 2D Fourier Transform on GPU when iFFTed
I apply FFT2 to a stack of images, crop part of the result, and then apply iFFT2 to the cropped part:
For example on GPU: FFT2(1920*1240*30 (single) ) -> crop to 320*207*30 (single) -> iFFT2(320*207*30 (single) )
1920/6=320
1240/6≈207 (rounded)
The graph shows the execution time of each function, normalized to the number of single-precision values processed:
Note that the yellow line (FFT2+crop1/6+iFFT2) is more than an order of magnitude slower than the purple line, which has 36 times more data to process with iFFT2!
Any idea on what is happening here?
Here is the script I have used:
clear
n=10;
cx=1920;
cy=1240;
FPT=2:5:50;
fpt=size(FPT,2);
b=zeros(1,fpt);
for kk=1:8
    for ii=1:fpt
        ii
        I=gpuArray(single(rand(cy,cx,FPT(1,ii))));
        Ia=gpuArray(single(rand(round(cy/6),round(cx/6),FPT(1,ii))+1i.*rand(round(cy/6),round(cx/6),FPT(1,ii))));
        mask=zeros(cy,cx,FPT(1,ii));
        mask(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),round(cx/2)-round(cx/12):round(cx/2)+round(cx/12))...
            =(ones(size(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),2),size(round(cx/2)-round(cx/12):round(cx/2)+round(cx/12),2)));
        mask=gpuArray(single(mask));
        tic
        for jj=1:n
            switch kk
                case 1
                    tic
                    B=fft2(I);
                case 2
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                case 3
                    tic
                    B=fft2(I);
                    C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
                        ((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
                    D=ifft2(C);
                case 4
                    tic
                    B=fft2(I);
                    C=ifft2(B);
                case 5
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                case 6
                    tic
                    B=fft2(I);
                    C=B.*mask;
                    D=ifft2(C);
                    E1=imresize(abs(D),1/6);
                    E2=imresize(angle(D),1/6);
                case 7
                    tic
                    C=fft2(I);
                    B=ifft2(Ia);
                case 8
                    tic
                    B=ifft2(Ia);
            end
        end
        b(1,ii)=toc/n; % b is the time of execution normalized to
        % the amount of data and the number of times a case has been evaluated
    end
    hold on
    plot(b)
    clear A B C D I E1 E2
end
b is the variable plotted in the above graphic.
My graphics card is a GeForce RTX 2080 Ti.
Any help will be appreciated.
Thanks,
Tual
Accepted Answer
Joss Knight
2019-6-8
I modified your code inserting wait(gpuDevice) before each tic and toc and got a much more sensible graph:
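For reference, here is a minimal sketch of what synchronized timing of case 3 (fft2, crop, ifft2) could look like. It is not the exact modified script, only an illustration under a few assumptions: the stack depth of 30 is taken from the example in the question, the variable names rows, cols, dev and tPerCall are made up for this sketch, and a single tic is placed outside the loop, because the tic inside each case of the original script appears to restart the timer on every iteration. GPU calls in MATLAB can return before the GPU has finished, which is why wait(gpuDevice) is needed around the timed region.

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
cy = 1240; cx = 1920;        % image size from the question
nFrames = 30;                % stack depth from the question's example
n = 10;                      % repetitions, as in the question

I = gpuArray(single(rand(cy, cx, nFrames)));

% Same crop window as case 3 of the question's script.
rows = (cy/2 - round(cy/12)):(cy/2 + round(cy/12));
cols = (cx/2 - round(cx/12)):(cx/2 + round(cx/12));

dev = gpuDevice;

wait(dev);                   % finish any pending GPU work before starting the clock
tic
for jj = 1:n
    B = fft2(I);
    C = B(rows, cols, :);
    D = ifft2(C);
end
wait(dev);                   % block until the queued GPU work has actually completed
tPerCall = toc / n;          % average wall-clock time per fft2 + crop + ifft2

fprintf('fft2 + crop + ifft2: %.4g s per call\n', tPerCall);
```

An alternative that handles the synchronization and repetition itself is gputimeit, for example gputimeit(@() ifft2(C)).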
More Answers (1)
Bruno Luong
2019-6-3
If you want a fast FFT, make your data length a power of 2, or a product of small integers.
166 is a bad length because its prime factorization is 2 * 83.
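To check a given length, one can look at its prime factors with factor. The sketch below does this for 166 and, as an assumption about which sizes matter, for 207 and 321 (the crop sizes that the index ranges in the question's script produce) and 320. It then steps each length upward until all prime factors are at most 7. That limit is a choice made for this illustration, not a rule stated in the answer, and whether a slightly larger crop window is acceptable depends on the application.

```matlab
% Lengths to inspect: 166 from the answer; 207, 321 and 320 relate to the question.
sizes = [166 207 321 320];

for n = sizes
    % factor() returns the prime factors of n in ascending order.
    fprintf('%d -> %s\n', n, mat2str(factor(n)));
end

% Illustrative search: increase each length until its largest prime factor
% is no greater than maxPrime (an assumed threshold for "small integers").
maxPrime = 7;
padded = sizes;
for k = 1:numel(padded)
    while max(factor(padded(k))) > maxPrime
        padded(k) = padded(k) + 1;
    end
end

disp(padded)
```

With these inputs, 166 moves to 168 (2^3 * 3 * 7), while 320 (2^6 * 5) is already a product of small primes and stays unchanged.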