CUDA: number of tasks exceeds number of threads times blocks

I have a problem when my number of tasks exceeds the total number of available threads. Let's imagine I want to add two vectors of length 100 000.
MATLAB code:
N=100*1000
a=double(-[1:N]);
b=double(2*[1:N]);
a_gpu=gpuArray(a);%Create array on GPU
b_gpu=gpuArray(b);%Create array on GPU
c_gpu=gpuArray(zeros(1,N));%Create array on GPU
k = parallel.gpu.CUDAKernel('add.ptx', 'add.cu');
k.ThreadBlockSize = 100;
k.GridSize=[100,1];
o = feval(k, a_gpu,b_gpu,c_gpu);
I know that I could increase the ThreadBlockSize and GridSize, but that is not what I want to know. Imagine my vector were much longer.
My CUDA code looks like this:
__global__ void add( double *a, double *b, double *c) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    a[tid] = a[tid] + b[tid];
    tid += blockDim.x * gridDim.x;
}
In the last line I try to force the program to really go to the end of my vector by reusing the same threads a second, third, ... time. That is what I read in the book "CUDA by Example".
But for some reason it does not work when called from MATLAB. If I use the same kernel from plain C and CUDA, it works.
What is wrong with my code? What is the usual way to handle the case where the number of tasks is larger than the maximum thread block size times the grid size? I could use the other grid dimension too, but how do I avoid this problem in general?
Thanks a lot
Robert
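
A note on the pattern described above: reusing the same threads for a second, third, ... element is usually written as a grid-stride loop. The kernel in the question has no loop, so its final `tid += ...` has no effect, and with ThreadBlockSize = 100 and GridSize = [100,1] only the first 100 * 100 = 10 000 of the 100 000 elements are processed. The following is a minimal sketch of the loop form, not a confirmed fix for the MATLAB case. The length parameter `n`, the `const` qualifiers, the write into `c` instead of `a`, and the host helper `count_mismatches` are all assumptions added for illustration and do not appear in the original post.

```cuda
// add.cu: sketch of a grid-stride loop (illustrative, see assumptions above).
#include <cuda_runtime.h>
#include <vector>

// Each thread starts at its global index and then jumps ahead by the total
// number of launched threads until it runs past the end of the vectors.
// The parameter n (vector length) is an assumption; the original kernel has none.
__global__ void add(const double *a, const double *b, double *c, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int tid = threadIdx.x + blockIdx.x * blockDim.x; tid < n; tid += stride) {
        c[tid] = a[tid] + b[tid];
    }
}

// Hypothetical host-side check for a standalone nvcc build.
// Fills a and b like the MATLAB script (a = -(1:n), b = 2*(1:n)), launches the
// kernel, and returns the number of wrong elements, or -1 on a CUDA error.
int count_mismatches(int n, int blocks, int threads) {
    std::vector<double> a(n), b(n), c(n, 0.0);
    for (int i = 0; i < n; ++i) {
        a[i] = -(i + 1.0);
        b[i] = 2.0 * (i + 1.0);
    }
    size_t bytes = static_cast<size_t>(n) * sizeof(double);
    double *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess &&
              cudaMalloc(&db, bytes) == cudaSuccess &&
              cudaMalloc(&dc, bytes) == cudaSuccess &&
              cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        add<<<blocks, threads>>>(da, db, dc, n);
        // cudaMemcpy on the default stream waits for the kernel to finish.
        ok = cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    if (!ok) return -1;
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != i + 1.0) ++bad;  // a[i] + b[i] should equal i + 1
    }
    return bad;
}
```

With this form the launch configuration no longer has to match the vector length: 100 blocks of 100 threads cover 100 000 elements in ten passes of the loop. From MATLAB, the extra length argument would presumably be passed as a trailing scalar, for example feval(k, a_gpu, b_gpu, c_gpu, N). The host helper is only there for a standalone check and can be dropped when compiling the file to PTX.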

Answers (0)
