Effective GPU Bandwidth Nvidia Quadro 6000
Hello, I would like to use GPU acceleration to speed up the fft2 computations in my code. The GPU I'm using is an Nvidia Quadro 6000, which has a theoretical memory bandwidth of 144 GB/s. However, the effective bandwidth I measure is almost 100 times lower, which makes using the GPU hardly worthwhile:
Test : 2048 x 2048
Elapsed CPU time is : 0.109062 sec
Elapsed GPU time is : 0.007661 sec
Elapsed GPU time with CPU transfer is : 0.079723 sec
Speed up : 14.236 without memory transfer
           1.36801 with memory transfer

Test : 4096 x 4096
Elapsed CPU time is : 0.356208 sec
Elapsed GPU time is : 0.026819 sec
Elapsed GPU time with CPU transfer is : 0.29406 sec
Speed up : 13.2819 without memory transfer
           1.21134 with memory transfer

Test : 8192 x 8192
Elapsed CPU time is : 1.30381 sec
Elapsed GPU time is : 0.121605 sec
Elapsed GPU time with CPU transfer is : 1.17194 sec
Speed up : 10.7217 without memory transfer
           1.11252 with memory transfer
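Splitting the 8192 x 8192 timings into compute and transfer shows where the time goes. This is a worked check on the numbers above, assuming the "with CPU transfer" time is simply the GPU compute time plus the host/GPU copies:

```latex
\begin{aligned}
S_{\text{no xfer}} &= \frac{T_{\text{CPU}}}{T_{\text{GPU}}} = \frac{1.30381}{0.121605} \approx 10.72 \\
S_{\text{xfer}} &= \frac{T_{\text{CPU}}}{T_{\text{GPU}} + T_{\text{xfer}}} = \frac{1.30381}{1.17194} \approx 1.113 \\
T_{\text{xfer}} &= 1.17194 - 0.121605 = 1.050335\ \text{s} \approx 0.90 \times 1.17194\ \text{s} \\
S_{\max} &= \lim_{T_{\text{GPU}} \to 0} \frac{T_{\text{CPU}}}{T_{\text{GPU}} + T_{\text{xfer}}} = \frac{1.30381}{1.050335} \approx 1.24
\end{aligned}
```

Roughly 90% of the GPU time is spent on transfers (the 2048 and 4096 cases give similar shares), so under this assumption even an infinitely fast fft2 on the device could not beat the CPU by more than about 1.24x at this size; a larger gain would need the data to stay on the GPU across several operations.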
The effective transfer bandwidth I measure with the benchmark below is only about 1.45 GB/s.
Could this be due to the MATLAB version I'm using (R2011a), or is such poor performance normal?
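The fft2 timings themselves imply a similar figure. As an assumption (the post does not state the data type), suppose the input is an N x N real double matrix (8 bytes per element) that is sent to the GPU, and the complex double result (16 bytes per element) is gathered back. For N = 8192:

```latex
\begin{aligned}
\text{bytes moved} &= 8N^2 + 16N^2 = 24 \times 8192^2 = 1\,610\,612\,736\ \text{B} \\
T_{\text{xfer}} &= 1.17194 - 0.121605 = 1.050335\ \text{s} \\
B_{\text{eff}} &= \frac{1\,610\,612\,736\ \text{B}}{1.050335\ \text{s}} \approx 1.53 \times 10^{9}\ \text{B/s} = 1.53\ \text{GB/s}
\end{aligned}
```

The same assumption gives about 1.40 GB/s for 2048 x 2048 and 1.51 GB/s for 4096 x 4096, in line with the roughly 1.45 GB/s reported by the benchmark.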
Benchmark used to measure the bandwidth:
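% Measure host->GPU (send) and GPU->host (gather) transfer bandwidth.
% uint8 data is 1 byte per element, so sizes are in bytes (4 KiB to 64 MiB).
sizes = power(2, 12:26);
repeats = 10;
D = gpuDevice  % display the selected GPU
sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii = 1:numel(sizes)
    data = randi([0 255], sizes(ii), 1, 'uint8');
    for rr = 1:repeats
        % Time the copy to the GPU; keep the fastest of the repeats
        timer = tic();
        gdata = gpuArray(data);
        sendTimes(ii) = min(sendTimes(ii), toc(timer));
        % Time the copy back to host memory
        timer = tic();
        data2 = gather(gdata);
        gatherTimes(ii) = min(gatherTimes(ii), toc(timer));
    end
end
% Bandwidth in GB/s (1 GB = 1e9 bytes)
sendBandwidth = (sizes./sendTimes)/1e9
[maxSendBandwidth, maxSendIdx] = max(sendBandwidth);
fprintf('Peak send speed is %g GB/s (at %d bytes)\n', maxSendBandwidth, sizes(maxSendIdx))
gatherBandwidth = (sizes./gatherTimes)/1e9
[maxGatherBandwidth, maxGatherIdx] = max(gatherBandwidth);
fprintf('Peak gather speed is %g GB/s (at %d bytes)\n', maxGatherBandwidth, sizes(maxGatherIdx))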
Answers (2)
Edric Ellis
2013-3-19
1 vote
Your experiment is measuring the transfer bandwidth across the PCI bus, not the device's global memory bandwidth. PCI bus bandwidth is discussed in an entry on Loren's blog here: http://blogs.mathworks.com/loren/#1fa09fa2-c99c-4bb0-8b11-eb805fdd7040.
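To put the measured 1.45 GB/s in context, the PCIe link sets a much lower ceiling than the 144 GB/s device memory figure. Assuming the Quadro 6000 sits in a PCIe 2.0 x16 slot (16 lanes at 5 GT/s with 8b/10b encoding), the theoretical per-direction bandwidth works out as:

```latex
\begin{aligned}
B_{\text{PCIe 2.0 x16}} &= 16 \times 5\ \text{GT/s} \times \frac{8}{10} \times \frac{1\ \text{byte}}{8\ \text{bits}} = 8\ \text{GB/s} \\
\frac{144\ \text{GB/s}}{8\ \text{GB/s}} &= 18, \qquad \frac{8\ \text{GB/s}}{1.45\ \text{GB/s}} \approx 5.5
\end{aligned}
```

Under that assumption, about 18x of the "almost 100 times lower" gap comes from comparing bus bandwidth with on-board memory bandwidth, and only the remaining ~5.5x is the shortfall of the measured transfer rate against the theoretical PCIe limit.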
We have made various performance improvements to the gpuArray code since R2011a, so it would be best for you to upgrade if you can.
Domenico
2013-3-19
0 votes
1 comment
Edric Ellis
2013-3-19
Those figures were produced with R2012b and show that 8 GB/s is not achieved; however, they do show a decent improvement over your measured speed. It's hard to predict exactly how much of the difference is due to the software and how much to the different hardware.