numerical instabilites for GPU results

Question

0 个投票

I run this code

T=randn(10000,64);
data=randn(1000,64,10);
Tg=gpuArray(T);
datag=gpuArray(data);
res=zeros(10000,1000);
resg=gpuArray(res);
for i=1:10
    res=res+T*data(:,:,i)';
end
for i=1:10
    resg=resg+Tg*datag(:,:,i)';
end
resg=gather(resg);
norm(res-resg,'fro')/norm(res,'fro')

where I would expect "res" (CPU comptuted) and "resg" (GPU computed) to be the same, but they are not.

I am running this on a Tesla Card, i.e.

gpuDevice

ans =

parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
                    Name: 'Tesla C1060'
                   Index: 1
       ComputeCapability: '1.3'
          SupportsDouble: 1
           DriverVersion: 3.2000
      MaxThreadsPerBlock: 512
        MaxShmemPerBlock: 16384
      MaxThreadBlockSize: [512 512 64]
             MaxGridSize: [65535 65535]
               SIMDWidth: 32
             TotalMemory: 4.2948e+09
              FreeMemory: 4.0671e+09
     MultiprocessorCount: 30
             ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
  KernelExecutionTimeout: 0
        CanMapHostMemory: 1
         DeviceSupported: 1
          DeviceSelected: 1
Methods, Events, Superclasses

3 个评论
显示 1更早的评论隐藏 1更早的评论

Felix 2011-5-18

There are large numerical differences, i.e.norm(res-resg,'fro')/norm(res,'fro') returns something on the order of 1e234. These are clearly no subtle BLAS differences. I suspect there is something wrong when moving data between the CPU and the GPU?

Gaszton 2011-5-19

I runned the code on my gt425m:

ans =

2.4946e-016

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Felix 2011-5-20

0 个投票

I upgraded to the latest drivers

270.41.19

, which seems to have fixed the problem.

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

James Tursa 2011-5-20

FYI, it is bad form to accept your own answer when Edric was the one that suggested updating your drivers.

请先登录，再进行评论。

Answer 2

Edric Ellis 2011-5-19

在 MATLAB Online 中打开

2 个投票

I've just run this using R2011a on Linux and Windows using C1060 cards, and in each case the final "norm" calculation gives a result of around 2e-16. So, this should work! Could you post the output of running

parallel.internal.gpu.CUDADriverVersion

and

ver distcomp

4 个评论
显示 2更早的评论隐藏 2更早的评论

Felix 2011-5-20

what is your driver version?

When I run this:

T=randn(10000,64);

A=randn(1000,64);

Ag=gpuArray(A);

Tg=gpuArray(T);

res=gather(Tg*Ag');

norm(res-T*A','fro')/norm(T*A','fro')

I get ~1e-16 at first and ~0.05 on repeated runs, so there is a problem in the matrix mult.

Sean de Wolski 2012-3-14

Copying Felix' first post with license censored:

Here it is:

parallel.internal.gpu.CUDADriverVersion

ans =

260.19.26

ver distcomp

-------------------------------------------------------------------------------------

MATLAB Version 7.12.0.635 (R2011a)

MATLAB License Number: ############

Operating System: Linux 2.6.30.10-105.2.23.fc11.x86_64 #1 SMP Thu Feb 11 07:06:34 UTC 2010 x86_64

Java VM Version: Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode

-------------------------------------------------------------------------------------

Parallel Computing Toolbox Version 5.1 (R2011a)

请先登录，再进行评论。

numerical instabilites for GPU results

3 个评论
显示 1更早的评论隐藏 1更早的评论

采纳的回答

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

更多回答（1 个）

4 个评论
显示 2更早的评论隐藏 2更早的评论

类别

产品

标签

Community Treasure Hunt

numerical instabilites for GPU results

3 个评论 显示 1更早的评论 隐藏 1更早的评论

采纳的回答

1 个评论 显示 -1更早的评论 隐藏 -1更早的评论

更多回答（1 个）

4 个评论 显示 2更早的评论 隐藏 2更早的评论

类别

产品

标签

另请参阅

Community Treasure Hunt

3 个评论
显示 1更早的评论隐藏 1更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

4 个评论
显示 2更早的评论隐藏 2更早的评论