The easiest way to create CUDA® kernels is to place the
coder.gpu.kernelfun pragma into your primary MATLAB® function. The primary function is also known as the top-level or entry-point function. When GPU Coder™ encounters the
kernelfun pragma, it attempts to parallelize all the computation within this function and then maps it to the GPU. For more information about GPU kernels, see GPU Programming Paradigm.
In this tutorial, you learn how to:
Prepare your MATLAB code for CUDA code generation by using the kernelfun pragma.
Create and set up a GPU Coder project.
Define function input properties.
Check for code generation readiness and run-time issues.
Specify code generation properties.
Generate CUDA C code by using the GPU Coder app.
This tutorial requires the following products:
NVIDIA® GPU enabled for CUDA
CUDA toolkit and driver
Environment variables for the compilers and libraries. For more information, see Environment Variables.
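Before starting, it can help to confirm that the prerequisites above are visible to MATLAB. The following is a minimal sketch, assuming a host GPU and the coder.gpuEnvConfig and coder.checkGpuInstall functions that ship with GPU Coder. The variable names envCfg and results are illustrative.

```matlab
% Sketch: verify the GPU code generation environment on the host machine.
envCfg = coder.gpuEnvConfig('host');   % configuration object for the host GPU
envCfg.BasicCodegen = 1;               % test basic CUDA code generation
envCfg.Quiet = 1;                      % suppress verbose command-line output
results = coder.checkGpuInstall(envCfg);
disp(results)                          % struct summarizing which checks passed
```

If a check fails, revisit the environment variables for the compilers and libraries before continuing.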
You do not have to be familiar with the algorithm in the example to complete the tutorial.
The Mandelbrot set is the region in the complex plane consisting of the values z_0 for which the trajectories defined by this equation remain bounded as k→∞.

    z_{k+1} = z_k^2 + z_0,   k = 0, 1, 2, ...
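To make the boundedness test concrete, here is a small scalar sketch. It assumes the usual recurrence z_{k+1} = z_k^2 + z_0, which matches the update z = z.*z + z0 used in the code later in this tutorial, and it treats a trajectory as escaped once it leaves the disk of radius 2. The sample point and the name escapeStep are illustrative only.

```matlab
% Scalar sketch of the escape test for a single starting value z0.
z0 = 0.5 + 0.5i;          % sample point chosen outside the Mandelbrot set
maxIterations = 500;
z = z0;
escapeStep = NaN;         % stays NaN if the trajectory never leaves |z| <= 2
for k = 1:maxIterations
    z = z*z + z0;         % one step of the recurrence
    if abs(z) > 2
        escapeStep = k;   % first step at which the trajectory escapes
        break
    end
end
disp(escapeStep)
```

Points inside the set never trigger the break, so escapeStep remains NaN for them.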
The overall geometry of the Mandelbrot set is shown in the figure. This view does not have the resolution to show the richly detailed structure of the fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot set exhibits an elaborate boundary that reveals progressively finer recursive detail.
For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot set in the valley between the main cardioid and the p/q bulb to its left. A 1000x1000 grid of real parts (x) and imaginary parts (y) is created between these two limits. The Mandelbrot algorithm is then iterated at each grid location. An iteration number of 500 is enough to render the image in full resolution.
    maxIterations = 500;
    gridSize = 1000;
    xlim = [-0.748766713922161,-0.748766707771757];
    ylim = [0.123640844894862,0.123640851045266];
This tutorial uses an implementation of the Mandelbrot set that relies on standard MATLAB commands running on the CPU. This implementation is based on the code provided in the Experiments with MATLAB e-book by Cleve Moler. The calculation is vectorized so that every location is updated simultaneously.
Create a MATLAB function called
mandelbrot_count.m with the following lines of code. This code is a baseline vectorized MATLAB implementation of the Mandelbrot set. For every point
(xGrid,yGrid) in the grid, it calculates the iteration index
count at which the trajectory defined by the equation reaches a distance of
2 from the origin. It then returns the natural logarithm of
count, which is used to generate the color-coded plot of the Mandelbrot set. Later in this tutorial, you modify this file to make it suitable for code generation.
    function count = mandelbrot_count(maxIterations,xGrid,yGrid)
    % mandelbrot computation

    z0 = xGrid + 1i*yGrid;
    count = ones(size(z0));

    z = z0;
    for n = 0:maxIterations
        z = z.*z + z0;
        inside = abs(z)<=2;
        count = count + inside;
    end
    count = log(count);
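The counting logic is easier to follow on a tiny input. The sketch below repeats the loop from mandelbrot_count on two hand-picked points, with a small maxIterations chosen only for illustration. It omits the final log so that the raw counts are visible.

```matlab
% Sketch: the counting loop from mandelbrot_count applied to two sample points.
maxIterations = 10;            % small value chosen for illustration
z0 = [0, 2+2i];                % 0 stays bounded; 2+2i leaves radius 2 at once
count = ones(size(z0));

z = z0;
for n = 0:maxIterations        % the loop body runs maxIterations+1 times
    z = z.*z + z0;
    inside = abs(z) <= 2;      % logical mask of points still within radius 2
    count = count + inside;    % only bounded points keep accumulating
end
disp(count)
```

A point that never escapes ends with a count of maxIterations+2, while a point that escapes on the first update keeps its initial count of 1.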
Create a MATLAB script called
mandelbrot_test.m with the following lines of code. The script generates a 1000 x 1000 grid of real parts (x) and imaginary parts (y) between the limits specified by xlim and
ylim. It also calls the
mandelbrot_count function and plots the resulting Mandelbrot set.
    maxIterations = 500;
    gridSize = 1000;
    xlim = [-0.748766713922161,-0.748766707771757];
    ylim = [0.123640844894862,0.123640851045266];

    x = linspace(xlim(1),xlim(2),gridSize);
    y = linspace(ylim(1),ylim(2),gridSize);
    [xGrid,yGrid] = meshgrid(x,y);

    %% Mandelbrot computation in MATLAB
    count = mandelbrot_count(maxIterations,xGrid,yGrid);

    % Show
    figure(1)
    imagesc(x,y,count);
    colormap([jet();flipud(jet());0 0 0]);
    axis off
    title('Mandelbrot set with MATLAB');
Before making the MATLAB version of the Mandelbrot set algorithm suitable for code generation, you can test the functionality of the original code.
Change the current MATLAB working folder to the location that contains
mandelbrot_test.m. GPU Coder places generated code in this folder. Change your current working folder if you do not have full access to this folder.
Run the test script mandelbrot_test. The script shows the geometry of the Mandelbrot set within the boundary set by the variables xlim and ylim.
Before you generate code with GPU Coder, check for coding issues in the original MATLAB code.
There are two tools that help you detect code generation issues at design time:
Code Analyzer tool
Code generation readiness tool
The Code Analyzer is a tool incorporated into the MATLAB Editor that continuously checks your code as you enter it. The Code Analyzer reports issues and recommends modifications to maximize performance and maintainability of your code. To identify the warnings and errors specific to code generation from your MATLAB code, add the
%#codegen directive to your MATLAB file. For more information, see Code Analyzer preferences.
The Code Analyzer does not detect all code generation issues. After eliminating the errors or warnings that the Code Analyzer detects, compile your code with GPU Coder to determine if the code has other compliance issues.
The code generation readiness tool screens the MATLAB code for features and functions that are not supported for code generation. This tool provides a report that lists issues and recommendations for making the MATLAB code suitable for code generation. You can access the code generation readiness tool in these ways:
In the Current Folder browser, right-click the MATLAB file that contains the entry-point function.
At the command line, use the coder.screener function.
In the GPU Coder app, after you specify the entry-point files, the app runs the Code Analyzer and the code generation readiness tool.
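For the command-line route, a minimal sketch looks like the following. It assumes that mandelbrot_count.m is in the current folder or on the MATLAB path.

```matlab
% Sketch: run the code generation readiness tool from the command line.
% Assumes mandelbrot_count.m is in the current folder or on the MATLAB path.
coder.screener('mandelbrot_count')
```

The tool opens a report that lists any unsupported functions or language features it finds in the file.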
You can use GPU Coder to check for issues at code generation time. When GPU Coder detects errors or warnings, it generates an error report that describes the issues and provides links to the problematic MATLAB code. For more information, see Code Generation Reports.
To begin the process of making your MATLAB code suitable for code generation, use the file mandelbrot_count.m.
Set your MATLAB current folder to the work folder that contains your files for this tutorial.
In the MATLAB Editor, open
mandelbrot_count.m. The Code Analyzer message indicator at the top right corner of the MATLAB Editor is green. The analyzer did not detect errors, warnings, or opportunities for improvement in the code.
After the function declaration, add the
%#codegen directive to turn on the error checking that is specific to code generation.
function count = mandelbrot_count(maxIterations,xGrid,yGrid) %#codegen
The Code Analyzer message indicator remains green, indicating that it has not detected any code generation issues.
To map the
mandelbrot_count function to a CUDA kernel, modify the original MATLAB code by placing the
coder.gpu.kernelfun pragma in the body of the function.
    function count = mandelbrot_count(maxIterations,xGrid,yGrid) %#codegen

    % Add kernelfun pragma to trigger kernel creation
    coder.gpu.kernelfun;

    % mandelbrot computation
    z0 = xGrid + 1i*yGrid;
    count = ones(size(z0));

    z = z0;
    for n = 0:maxIterations
        z = z.*z + z0;
        inside = abs(z)<=2;
        count = count + inside;
    end
    count = log(count);
If you use the
coder.gpu.kernelfun pragma, GPU Coder attempts to map the computations in the function
mandelbrot_count to the GPU.
Save the file. You are now ready to compile your code by using the GPU Coder app.
On the MATLAB toolstrip Apps tab, under Code Generation, click the GPU Coder app icon. You can also open the app by typing
gpucoder in the MATLAB Command Window. The app opens the Select source files page.
On the Select source files page, enter or select the name of the primary function,
mandelbrot_count. The primary function is also known as the top-level or entry-point function. The app creates a project with the default name
mandelbrot_count.prj in the current folder.
Click Next and go to the Define Input Types step. The app analyzes the function for coding issues and code generation readiness. If the app identifies issues, it opens the Review Code Generation Readiness page where you can review and fix issues. In this example, because the app does not detect issues, it opens the Define Input Types page.
The code generator must determine the data types of all the variables in the MATLAB files at compile time. Therefore, you must specify the data types of all the input variables. You can specify the input data types in one of these two ways:
Provide a test file that calls the project entry-point functions. The GPU Coder app can infer the input argument types by running the test file.
Enter the input types directly.
For more information about input specifications, see Input Specification.
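The second option, entering the input types directly, has a programmatic counterpart in coder.typeof. The sketch below describes the three inputs of mandelbrot_count without running a test file. It assumes fixed-size 1000-by-1000 grids, mirroring gridSize in this tutorial. The names iterType, gridType, and inputTypes are illustrative.

```matlab
% Sketch: describe the inputs of mandelbrot_count without a test file.
gridSize = 1000;                                  % matches the tutorial grid
iterType = coder.typeof(0);                       % maxIterations: double scalar
gridType = coder.typeof(0,[gridSize gridSize]);   % xGrid, yGrid: double matrix
inputTypes = {iterType,gridType,gridType};        % one entry per input argument
```

A cell array like inputTypes can be passed to the codegen command through its -args option.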
In this example, to define the properties of the inputs maxIterations, xGrid, and
yGrid, specify the test file mandelbrot_test.m:
Enter or select the test file mandelbrot_test.m.
Click Autodefine Input Types.
The test file
mandelbrot_test.m calls the entry-point function,
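mandelbrot_count.m, with the expected input types. The app infers that the input maxIterations is
double(1x1) and that the inputs xGrid and yGrid are double arrays matching the 1000-by-1000 grids created in the test file.
Click Next to go to the Check for Run-Time Issues step.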
The Check for Run-Time Issues step generates a MEX file from your entry-point functions, runs the MEX function, and reports issues. This step is optional. However, it is a best practice to perform this step. Using this step, you can detect and fix defects that are harder to diagnose in the generated GPU code.
GPU Coder provides the option to perform GPU-specific checks at this point. When you select this option, GPU Coder generates CUDA C code and a MEX file from your entry-point functions, runs the MEX function, and reports issues. Some of the GPU-specific run-time checks include:
Checks for register spills.
Stack size conformance checks.
There may be certain MATLAB constructs in your code that cause the Check for Run-Time Issues to fail CPU-specific checks but pass the GPU-specific checks.
To open the Check for Run-Time Issues dialog box, click the Check for Issues arrow.
In the Check for Run-Time Issues dialog box, specify a test file or enter code that calls the entry-point function with example inputs. For this example, use the test file
mandelbrot_test.m that you used to define the input types.
To enable GPU-specific checks, select the GPU option button. Click Check for Issues.
The app generates a MEX function. It runs the test script
mandelbrot_test replacing calls to
mandelbrot_count with calls to the generated MEX. If the app detects issues during the MEX function generation or execution, it provides warning and error messages. You can click these messages to navigate to the problematic code and fix the issue. In this example, the app does not detect issues. The MEX function has the same functionality as the original mandelbrot_count function.
Click Next to go to the Generate Code step.
To open the Generate dialog box, click the Generate arrow.
In the Generate dialog box, you can select the type of build that you want GPU Coder to perform. The available options are listed in this table.
Source Code: CUDA C source code to integrate with an external project.
MEX: Compiled code to run inside MATLAB.
Static Library: Binary library for static linking with an external project.
Dynamic Library: Binary library for dynamic linking with an external project.
Executable: Standalone program (requires a separate main file written in C).
For this tutorial, set Build type to
MEX(.mex). By generating a MEX output, you can check the correctness of the generated CUDA code from within MATLAB. The MEX build type does not require additional settings like Toolchain and Hardware Board. It also does not provide the option to generate only the source code. GPU Coder can automatically select an available CUDA toolchain as long as the Environment Variables are set properly.
To view advanced options, select More Settings. In the Compiler Flags option, add
--fmad=false. This flag, when passed to
nvcc, instructs the compiler to disable the floating-point multiply-add (FMAD) optimization. Setting this option prevents numerical mismatches in the generated code that arise from architectural differences between the CPU and the GPU. For more information, see Numerical Differences Between CPU and GPU.
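The same build can be approximated at the command line. This is a minimal sketch, not the app's exact project settings. It assumes the coder.gpuConfig and codegen interfaces from GPU Coder and fixed-size 1000-by-1000 inputs. The variable names cfg and inputTypes are illustrative.

```matlab
% Sketch: command-line counterpart of the MEX build with FMAD disabled.
cfg = coder.gpuConfig('mex');                    % MEX build type
cfg.GpuConfig.CompilerFlags = '--fmad=false';    % flag forwarded to nvcc

% Input types: double scalar, then two 1000-by-1000 double matrices.
inputTypes = {coder.typeof(0), ...
    coder.typeof(0,[1000 1000]), ...
    coder.typeof(0,[1000 1000])};

% Generate mandelbrot_count_mex in the current folder.
codegen -config cfg -args inputTypes mandelbrot_count
```

Scripting the build this way makes the settings repeatable without reopening the app.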
This table describes the settings specific to GPU Coder.
GPU Coder Configuration Properties
|UI Setting||Value Type||Description|
Specify custom name prefix for kernel names in the generated code. For example, entering
Kernel names can contain uppercase letters, lowercase letters, digits 0–9, and the underscore character _. GPU Coder removes unsupported characters from the kernel names and appends
Selects the type of GPU memory allocation:
Size above which the private variables are allocated on the heap instead of the stack.
Available stack limit per GPU thread.
Allows GPU Coder to utilize cuSOLVER library calls where appropriate.
Generates CUDA code with benchmarking options such as
Generates code with error-checking for CUDA API and kernel calls.
Select the minimum compute capability for code generation. The compute capability identifies the features supported by the GPU hardware. Applications use it at run time to determine which hardware features and instructions are available on the present GPU. If you specify a custom compute capability, GPU Coder ignores this setting.
Specify the name of the NVIDIA virtual GPU architecture for which the CUDA input files must be compiled.
For example, to specify a virtual architecture type
Pass additional flags to the GPU compiler. For example,
For similar NVIDIA compiler options, see the topic on NVCC Command Options in the CUDA toolkit documentation.
In a multi-GPU environment such as NVIDIA Drive platforms, specify the CUDA device to target.
GPU Coder generates the MEX executable
mandelbrot_count_mex in your working folder. The
<pwd>\codegen\mex\mandelbrot_count folder contains all the other generated files, including the CUDA source (*.cu) and header files. The GPU Coder app indicates that the code generation succeeded. It displays the source MATLAB files and generated output files on the left side of the page. On the Variables tab, it displays information about the MATLAB source variables. On the Target Build Log tab, it displays the build log, including compiler warnings and errors. By default, in the code window, the app displays the CUDA source file
mandelbrot_count.cu. To view a different file, in the Source Code or Output Files pane, click the file name.
To view the code generation report, click View Report. The report provides links to your MATLAB code and the generated CUDA (*.cu) files. It also provides compile-time information for the variables and expressions in your MATLAB code. This information helps you to find sources of error and warnings. It also helps you to debug code generation issues in your code. For more information, see Code Generation Reports.
The GPU Kernels section on the Generated Code tab provides a list of kernels created during GPU code generation. The items in this list link to the relevant source code. For example, when you click mandelbrot_count_kernel1, the code section for this kernel is shown in the code browser window.
After you review the report, you can close the Code Generation Report window. To view the report later, open the report file saved under the code generation folder. The folder
<pwd>\codegen\mex\mandelbrot_count also contains the
gpu_codegen_info.mat MAT-file that contains the statistics for the generated GPU code. This MAT-file contains the
cuda_Kernel variable that has information about the thread and block sizes, shared and constant memory usage, and input and output arguments of each kernel. The
cudaMemcpy variables contain information about the size of all the GPU variables and the number of
memcpy calls between the host and the device.
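To look at these statistics from MATLAB, a sketch like the following may help. It assumes that code generation has already produced gpu_codegen_info.mat under the default codegen folder in the current working folder. The names infoFile and info are illustrative.

```matlab
% Sketch: list and load the statistics saved with the generated GPU code.
infoFile = fullfile(pwd,'codegen','mex','mandelbrot_count', ...
    'gpu_codegen_info.mat');
whos('-file',infoFile)       % list the variables stored in the MAT-file
info = load(infoFile);       % load them as fields of a struct
disp(fieldnames(info))       % for example, the cuda_Kernel variable
```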
In the GPU Coder app, click Next to open the Finish Workflow page.
The Finish Workflow page indicates that the code generation succeeded. It provides a project summary and links to the MATLAB source files, the code generation report, and the generated output binaries. You can save the configuration parameters of the current GPU Coder project as a MATLAB script. See Convert MATLAB Coder Project to MATLAB Script.
To verify the correctness of the generated MEX file, see Verify Correctness of the Generated Code.
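As a quick check before following that topic, you can compare the MEX output against the MATLAB baseline on the same grid. This is a minimal sketch, assuming mandelbrot_count_mex was generated in the current folder and a CUDA-enabled GPU is available. The names countRef, countGpu, and maxAbsDiff are illustrative.

```matlab
% Sketch: compare the generated MEX output with the MATLAB baseline.
maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161,-0.748766707771757];
ylim = [0.123640844894862,0.123640851045266];

x = linspace(xlim(1),xlim(2),gridSize);
y = linspace(ylim(1),ylim(2),gridSize);
[xGrid,yGrid] = meshgrid(x,y);

countRef = mandelbrot_count(maxIterations,xGrid,yGrid);      % runs on the CPU
countGpu = mandelbrot_count_mex(maxIterations,xGrid,yGrid);  % runs on the GPU

% Report the largest element-wise deviation between the two results.
maxAbsDiff = max(abs(countGpu(:) - countRef(:)));
fprintf('Maximum absolute difference: %g\n',maxAbsDiff);
```

Small nonzero differences can occur because of architectural differences between the CPU and the GPU, which is why the tutorial disables FMAD.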