gpucoder.atomicInc

Atomically increment a variable in global or shared memory within a specified upper bound

Since R2021b

Syntax

[A,oldA] = gpucoder.atomicInc(A,B)

Description

[A,oldA] = gpucoder.atomicInc(A,B) increments the value of A in global or shared memory within the upper bound B. If the value of A is greater than or equal to B, A is reset to 0 instead of being incremented. The operation is atomic in the sense that the entire read-modify-write operation is guaranteed to be performed without interference from other threads. The order of the input and output arguments must match the syntax provided.
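The wraparound rule can be modeled on the host with a short sketch. It assumes that gpucoder.atomicInc follows the semantics of the CUDA atomicInc() function it maps to, which stores ((old >= B) ? 0 : old + 1) and returns the old value. The name emulatedAtomicInc is hypothetical, and the code uses standard C++ std::atomic rather than CUDA, so it illustrates the behavior and is not the code that GPU Coder generates.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical host-side model of a wraparound increment.
// Stores (old >= bound) ? 0 : old + 1 into `a` as one atomic
// read-modify-write and returns the value held before the update.
std::uint32_t emulatedAtomicInc(std::atomic<std::uint32_t>& a,
                                std::uint32_t bound) {
    std::uint32_t old = a.load();
    std::uint32_t next;
    do {
        // Reset to 0 once the bound is reached, otherwise add 1.
        next = (old >= bound) ? 0u : old + 1u;
        // On failure, compare_exchange_weak reloads `old` with the current
        // value, so the loop retries if another thread modified `a`.
    } while (!a.compare_exchange_weak(old, next));
    return old;
}
```

With a bound of 3, a variable that starts at 2 moves to 3 on the first call, wraps to 0 on the second, and moves to 1 on the third. Each call returns the value held before the update.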

Examples

Increment within an Upper Bound Using CUDA atomicInc

Perform a simple atomic wraparound increment operation by using the gpucoder.atomicInc function and generate CUDA® code that calls the corresponding CUDA atomicInc() API.

In one file, write an entry-point function myAtomicInc that accepts inputs a and b, where b is the upper bound.

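function a = myAtomicInc(a,b)

% Map the computation in this function to CUDA kernels.
coder.gpu.kernelfun;
for i = 1:numel(a)
    % Atomically increment a(i), wrapping to 0 at the upper bound b.
    [a(i),~] = gpucoder.atomicInc(a(i), b);
end

end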

To create types for the uint32 inputs for use in code generation, use the coder.newtype function. Here, A is a variable-size row vector with up to 30 elements and B is a fixed-size scalar.

A = coder.newtype('uint32', [1 30], [0 1]);
B = coder.newtype('uint32', [1 1], [0 0]);
inputArgs = {A,B};

To generate a CUDA library, use the codegen function.

cfg = coder.gpuConfig('lib');
cfg.GenerateReport = true;

codegen -config cfg -args inputArgs myAtomicInc -d myAtomicInc

The generated CUDA code contains the myAtomicInc_kernel1 kernel, which calls the CUDA atomicInc() API.

//
// File: myAtomicInc.cu
//
...

static __global__ __launch_bounds__(1024, 1) void myAtomicInc_kernel1(
    const uint32_T b, const int32_T i, uint32_T a_data[])
{
  uint64_T loopEnd;
  uint64_T threadId;

...

  for (uint64_T idx{threadId}; idx <= loopEnd; idx += threadStride) {
    int32_T b_i;
    b_i = static_cast<int32_T>(idx);
    atomicInc(&a_data[b_i], b);
  }
}
...

void myAtomicInc(uint32_T a_data[], int32_T a_size[2], uint32_T b)
{
  dim3 block;
  dim3 grid;
...

    cudaMemcpy(gpu_a_data, a_data, a_size[1] * sizeof(uint32_T),
               cudaMemcpyHostToDevice);
    myAtomicInc_kernel1<<<grid, block>>>(b, i, gpu_a_data);
    cudaMemcpy(a_data, gpu_a_data, a_size[1] * sizeof(uint32_T),
               cudaMemcpyDeviceToHost);
...

}

Input Arguments

`A`, `B` — Operands
scalars | vectors | matrices | multidimensional arrays

Operands, specified as scalars, vectors, matrices, or multidimensional arrays. Inputs A and B must satisfy the following requirements:

Have the same data type.
Have the same size or have sizes that are compatible. For example, A is an M-by-N matrix and B is a scalar or 1-by-N row vector.

Data Types: uint32

Version History

Introduced in R2021b
