PTX kernel time to run
显示 更早的评论
Hello, i am using R2010b, CUDA toolkit 3.1 with a geforce gt425m. While is was optimalizing my cuda code i observed that calling the kernel with feval in matlab has a ~2ms constant time measured with
tic feval(k,...) toc
the kernel code:
#define C_WIDTH 1024
#define C_HEIGHT 768
__global__ void timetest1(float* holo) {
int mindex=blockIdx.x*blockDim.x+threadIdx.x;
int size=C_WIDTH*C_HEIGHT;
if (mindex>=size)
return;
holo[mindex]=mindex*mindex;
}
Even if i take out the write to global memory //holo[mindex]=mindex*mindex; there is a ~2ms time
Does anybody know the origin of this lag? It would be great to somehow eliminate it.
Thanks,
Gaszton
PS: my matlab code for the kernel:
clear
import parallel.gpu.GPUArray
xsize=1024; ysize=768;
vectorsize=xsize*ysize; threadpblock=1024; k=parallel.gpu.CUDAKernel('TimeTest.ptx', 'TimeTest.cu'); k.ThreadBlockSize=[threadpblock,1,1]; k.GridSize=[ceil(vectorsize/threadpblock),1];
dholo=parallel.gpu.GPUArray.zeros(vectorsize,1,'single');
tic [dholo]=feval(k,dholo); time=toc;
['ms time= ' num2str(time*1000)]
clear
采纳的回答
更多回答(0 个)
类别
在 帮助中心 和 File Exchange 中查找有关 GPU Computing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!