GPU time slower than CPU time, what went wrong with my GPU implementation?

Question

Ruby Fu 2012-1-19


Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/26552-gpu-time-slower-than-cpu-time-what-went-wrong-with-my-gpu-implementation


Hi all, I have been testing the GPU computing feature in MATLAB. The code below runs and times a large matrix multiplication (1024x1024) on the CPU and on the GPU:

    A=rand(1024);
    gA=gpuArray(A);
    %warming up
    for i=1:10
        C=A*A;
        gC=gA*gA;
    end
    tic,C=A*A;toc;
    tic,gC=gA*gA; toc;

After many trials, the CPU consistently turns out to be faster than the GPU. I am surprised, because someone on the Stack Overflow forum ran the exact same test and showed that the GPU was faster:

    >> A = rand(1024); gA = gpuArray(A);
    % warm up by executing the operations a couple of times, and then:
    >> tic, C = A * A; toc
    Elapsed time is 0.075396 seconds.
    >> tic, gC = gA * gA; toc
    Elapsed time is 0.008621 seconds.

The only reason I can think of is that we are using different GPUs. The other person has a Tesla C2070, while the laptop I am using is a Dell Inspiron 17R (NVIDIA GeForce GT 525M).

Is it possible that with a lesser GPU the computation is actually slower than on the CPU?

Thank you! Ruby

1 Comment

ALysko 2015-4-14

A bit of extra info regarding double precision performance:

Tesla C2070 and GeForce GT 525M are two very different GPUs (peak single / double precision):

    Tesla C2070:      1.03 TFlops / 0.515 TFlops
    GeForce GT 525M:  0.23 TFlops / 0.031 TFlops
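To put those peak figures in context, here is a rough back-of-the-envelope sketch. It assumes that a dense n-by-n matrix product costs about 2*n^3 floating-point operations and that each card could sustain its quoted double-precision peak, which real code never quite does. The variable names are purely illustrative.

```matlab
% Rough lower bound on the time for one 1024x1024 double-precision multiply,
% assuming each GPU ran at its quoted peak (an idealisation, not a benchmark).
n = 1024;
flopCount = 2 * n^3;                        % approx. operations in a dense n-by-n product

gpuNames   = {'Tesla C2070', 'GeForce GT 525M'};
peakDouble = [0.515e12, 0.031e12];          % FLOP/s, double-precision figures quoted above

tBest = flopCount ./ peakDouble;            % best-case run time in seconds
for k = 1:numel(gpuNames)
    fprintf('%-16s best case: %5.1f ms\n', gpuNames{k}, 1e3 * tBest(k));
end
fprintf('Peak ratio (Tesla / GT 525M): %.1fx\n', peakDouble(1) / peakDouble(2));
```

Under these assumptions the Tesla needs at least about 4.2 ms and the GT 525M at least about 69 ms, a ratio of roughly 16.6x. The second figure is already close to the 0.075 s CPU time quoted in the question, so even a perfectly efficient GT 525M would have little room to beat a CPU in double precision.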

Titan Black may need a manual switch to enable full double precision:

1) The web page http://nvidianews.nvidia.com/news/nvidia-introduces-geforce-gtx-titan-dna-of-the-world-s-fastest-supercomputer-powered-by-world-s-fastest-gpu and page 44 of the PDF "GeForce-Update-Feb-2014.pdf" say that the Titan Black delivers 5.1 TFlops in single precision and 1.3 TFlops in double precision.

2) The web page http://www.bit-tech.net/news/hardware/2014/02/18/nvidia-gtx-titan-black-launched/1 compares the Titan Black to the plain Titan (tested by MathWorks), single / double precision:

    Titan Black:  5.1 TFlops / 1.2 TFlops
    Titan:        4.5 TFlops / 1.3 TFlops

(Thus, the MathWorks benchmarks for the Titan should be similar to or worse than the benchmarks for the Titan Black.)

3) The page https://devtalk.nvidia.com/default/topic/716573/gtx-titan-double-precision-flops-way-off-specs/ talks specifically about the Mathworks benchmarks with gpuBench():

Before any changes (default settings), gpuBench results in GFLOPS:

                       MTimes_D  Backslash_D  FFT_D  MTimes_S  Backslash_S  FFT_S
    Tesla C2075             333          246     73       696          435    163
    GeForce GTX TITAN       223           82     77      3635          179    252

After (switching the card into double precision in the Control Panel), gpuBench results in GFLOPS:

                       MTimes_D  Backslash_D  FFT_D  MTimes_S  Backslash_S  FFT_S
    Tesla C2075             333          246     73       696          435    163
    GeForce GTX TITAN      1285          128    146      3423          182    227

4) How to switch into double precision (which limits the GPU clock boost):

http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/59785-nvidia-geforce-gtx-titan-6gb-performance-review-2.html
http://forums.evga.com/When-to-Use-Double-Precision-under-NVIDIA-Control-Panel-Manage-3D-Settings-m2252867.aspx
http://nvidia.custhelp.com/app/answers/detail/a_id/3130/~/setting-power-management-mode-from-adaptive-to-maximum-performance

and for Linux: http://ambermd.org/gpus/


Answer 1

Ben Tordoff 2012-1-20


Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/26552-gpu-time-slower-than-cpu-time-what-went-wrong-with-my-gpu-implementation#answer_34692

Hi Ruby,

I've just uploaded a benchmarking tool to the File Exchange which runs a whole load of these types of timings to put your GPU in context with others on the market:

http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

One thing to bear in mind is that virtually all GPUs that aren't explicitly designed for scientific computing are optimized for single-precision maths (as is used by OpenGL etc.). GeForce cards, mobile or otherwise, are quite good for single-precision performance but usually about 8x worse for double. MATLAB defaults to using double-precision everywhere. Of the NVIDIA cards, only the Tesla and top-end Quadro series do well at double-precision. Add to that the fact that a mobile GPU typically has far fewer cores than a desktop one, and I'd be amazed if you saw any significant speed-ups compared to a modern mobile CPU when doing double-precision maths.
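The single- versus double-precision point can be checked directly with a small sketch like the one below. It assumes Parallel Computing Toolbox, a supported NVIDIA GPU, and a MATLAB release that provides `timeit` and `gputimeit` (both are newer than this thread). The variable names are illustrative, and the numbers you get will depend entirely on your hardware.

```matlab
% Compare the same 1024x1024 multiply in double and single precision,
% on the CPU and on the GPU. Requires Parallel Computing Toolbox and a GPU.
n  = 1024;
Ad = rand(n);              % double precision (MATLAB's default)
As = single(Ad);           % same data converted to single precision

gAd = gpuArray(Ad);        % copies of both matrices in GPU memory
gAs = gpuArray(As);

% timeit/gputimeit call the function several times and handle warm-up;
% gputimeit also waits for the GPU to finish before stopping the clock.
tCpuD = timeit(@() Ad * Ad);
tCpuS = timeit(@() As * As);
tGpuD = gputimeit(@() gAd * gAd);
tGpuS = gputimeit(@() gAs * gAs);

fprintf('double: CPU %.4f s, GPU %.4f s\n', tCpuD, tGpuD);
fprintf('single: CPU %.4f s, GPU %.4f s\n', tCpuS, tGpuS);
```

If the GPU only wins (or loses by much less) in the single-precision row, that is consistent with the card being tuned for single-precision work, as described above.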

Anyway, give the benchmark a try and let us all know what you find.

Cheers

Ben


Answer 2

Walter Roberson 2012-1-19


Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/26552-gpu-time-slower-than-cpu-time-what-went-wrong-with-my-gpu-implementation#answer_34634

Your GeForce GT 525M would be handling the graphics rendering, whereas the Tesla probably would not be handling graphics (and can be specifically configured to take it off graphics duties, I seem to recall.)

The GT 525M has 96 cores at up to 1.2 GHz; the Tesla C2070 has 448 cores at 1.15 GHz -- more than 4 times the cores.

2 Comments

Ruby Fu 2012-1-19

Hi Walter,

Thanks for the response.

I think your answer explains why the GPU computation I performed is slower than the one performed on the Tesla. However, I am also seeing that my GPU computing time is longer than the CPU computing time for the same code. Is this also due to the different number of cores the two types of hardware provide? Is there a possibility that this MATLAB feature can be improved?

Thanks!

Walter Roberson 2012-1-19

I only know the broad outlines of how things work. I know that the time to load and unload the data can overwhelm the benefits of using a GPU. Large enough matrix multiplications done on the CPU are normally farmed out to the BLAS/LAPACK libraries, which are highly optimized and use multiple cores. The trade-off point of "large enough" could in theory depend upon which CPU you are using, but I do not know if MATLAB takes that into account. You would need to know about the relative CPU capabilities to compare GPU/CPU figures meaningfully.

I believe that AccelerEyes' Jacket is benchmarked as faster than the native MATLAB GPU support.
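The cost of loading and unloading the data mentioned above can be measured separately from the multiply itself. The following is a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU; the variable names are illustrative. In current MATLAB releases GPU operations can return before the GPU has finished, so the sketch calls `wait` on the device before reading the clock.

```matlab
% Time the host-to-GPU copy, the multiply, and the GPU-to-host copy separately.
A  = rand(1024);
gd = gpuDevice;                       % handle to the currently selected GPU

% Warm-up so that one-off start-up costs are not counted below.
gW = gpuArray(A); gW = gW * gW; wait(gd);

tic; gA = gpuArray(A); wait(gd); tUpload   = toc;   % host -> GPU transfer
tic; gC = gA * gA;     wait(gd); tCompute  = toc;   % multiply on the GPU only
tic; C  = gather(gC);            tDownload = toc;   % GPU -> host transfer

fprintf('upload %.4f s, compute %.4f s, download %.4f s\n', ...
        tUpload, tCompute, tDownload);
```

If the two transfer times are comparable to or larger than the compute time, then a workflow that moves data to the GPU for a single operation and straight back is unlikely to beat the CPU, whatever the card.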
