Call Custom CUDA Kernels from the Generated Code
From within your MATLAB® code, you can directly call external CUDA® kernels, also called custom code or legacy code. To call CUDA kernels, use coder.ceval
. The code generator integrates your
CUDA kernel into the CUDA code generated from MATLAB. Integrate code when there are external libraries, optimized code, or object
files developed using CUDA that you want to use with your generated code.
The external CUDA kernel must use the __global__
qualifier to execute the
function (kernels) on the GPU device and to call the function from the host or from the
device. Functions with the __device__
qualifier are called device
functions. The device functions are different from global functions in that they can only be
called from other device or global functions. For information on integrating custom device
functions, see Call Custom CUDA Device Function from the Generated Code.
Note
Use coder.ceval
only in MATLAB code intended for
code generation. coder.ceval
generates an error in uncompiled
MATLAB code. To determine if a MATLAB function is executing in MATLAB, use coder.target
. If the function is executing in MATLAB, call the MATLAB version
of the CUDA kernel.
Call Custom CUDA Kernel
This example shows how to integrate a simple CUDA kernel with MATLAB code by using coder.ceval
. Consider the MATLAB function, saxpy
:
type saxpy.m
function y = saxpy(a,x,y) y = a*x + y; end
For this example, suppose that you want to implement the a x plus y operation by using external CUDA kernel. Consider the CUDA kernel, saxpy_kernel
, implemented in the file saxpy.cu
:
type saxpy.cu
#include "saxpy.h" __global__ void saxpy_kernel(uint32_T n, real32_T a, real32_T *x, real32_T *y) { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) y[i] = a*x[i] + y[i]; }
To integrate saxpy_kernel
with your MATLAB code, you need a header file that contains the function prototype. See the file saxpy.h
:
type saxpy.h
#ifndef real32_T #define real32_T float #define uint32_T unsigned int #endif #define saxpy(grid,block,n,a,x,y) saxpy_kernel<<<grid,block>>>(n,a,x,y) __global__ void saxpy_kernel(uint32_T n, real32_T a, real32_T *x, real32_T *y);
This example generates CUDA MEX, uint32_T
and real32_T
are custom types used in the generated MEX code. The code generator produces data types in CUDA code that correspond to the data types that you use in your MATLAB code. The data types that are generated depend on the target platform and compiler. The code generator can produce either built-in CUDA/C++ data types, such as short, long, int, and so on, or custom data types defined by using typedef statements. By default, the code generator produces built-in types for standalone code (lib, dll, or exe) and custom types for MEX code. For more information, see Mapping MATLAB Types to Types in Generated Code.
Entry-Point Function
Use the coder.ceval
command to call the CUDA kernel in the saxpyRef.m
entry-point function. Use coder.ceval
only in MATLAB code intended for code generation. The coder.rref
and coder.ref
commands instruct the code generator to pass pointers to the arrays, rather than copy them.
type saxpyRef.m
function y = saxpyRef(a,x,y) % saxpyRef Entry-point function for computing single-precision % (A*X) Plus Y % Copyright 2022 The MathWorks, Inc. coder.gpu.kernelfun; if coder.target('MATLAB') y = a*x + y; else coder.ceval('saxpy', uint32(floor((numel(x)+255)/256)), ... uint32(256),uint32(numel(x)), single(a), ... coder.rref(x,'gpu'),coder.ref(y,'gpu')); end end
Generate CUDA Code
To generate CUDA code, create a GPU code configuration object. Specify the location of the custom CUDA files by setting custom code properties on the configuration object. For more information, see Configure Build for External C/C++ Code.
cfg = coder.gpuConfig("mex"); cfg.GenerateReport = true; cfg.CustomSource = "saxpy.cu"; cfg.CustomInclude = pwd; cfg.CustomSourceCode = '#include "saxpy.h"'; aType = coder.newtype('single', [1 1], [0 0]); xType = coder.newtype('single', [4096 256], [0 0]); yType = coder.newtype('single', [4096 256], [0 0]); inputArgs = {aType,xType,yType}; codegen -config cfg saxpyRef -args inputArgs
Code generation successful: View report
Generated Code
To compare your generated CUDA code to the original MATLAB code, open the CUDA file, saxyRef.cu
in the work
\codegen\mex\saxpyRef
folder.
#include "saxpy.h" // Function Definitions void saxpyRef(real32_T a, const real32_T x[1048576], real32_T y[1048576]) { real32_T(*gpu_x)[1048576]; real32_T(*gpu_y)[1048576]; cudaMalloc(&gpu_y, 4194304UL); cudaMalloc(&gpu_x, 4194304UL); // saxpyRef Entry-point function for computing single-precision (A*X) Plus // Y // Copyright 2022 The MathWorks, Inc. cudaMemcpy(*gpu_x, x, 4194304UL, cudaMemcpyHostToDevice); cudaMemcpy(*gpu_y, y, 4194304UL, cudaMemcpyHostToDevice); saxpy(4096U, 256U, 1048576U, a, &(*gpu_x)[0], &(*gpu_y)[0]); cudaMemcpy(y, *gpu_y, 4194304UL, cudaMemcpyDeviceToHost); cudaFree(*gpu_x); cudaFree(*gpu_y); }
Run Generated MEX
Run the generated MEX with random inputs and compare the results with MATLAB simulation.
a = single(15); x = randi(10,4096,256,'single'); y = zeros(4096,256,'single'); yMATLAB = saxpyRef(a,x,y); yGPU = saxpyRef_mex(a,x,y); if (yGPU - yMATLAB == 0) fprintf('\nMATLAB simulation matches GPU execution.\n'); end
MATLAB simulation matches GPU execution.