gpuPerformanceAnalyzer

Analyze and optimize performance of the generated code

Since R2023a

    Description

    gpuPerformanceAnalyzer(fcn, fcn_inputs) generates GPU code for the MATLAB® entry-point function fcn and analyzes its performance through code execution profiling plots and reports. fcn_inputs is a cell array of example input values for fcn that are used during code generation and execution profiling.

    Note

    The profiling workflow depends on profiling tools from NVIDIA®. From CUDA® Toolkit v10.1 onwards, NVIDIA restricts access to performance counters to admin users. To enable GPU performance counters for all user accounts, see the instructions in Permission issue with Performance Counters (NVIDIA).

    Note

    The profiling tools from NVIDIA might not support legacy GPU hardware such as the Kepler family of devices. For information on supported GPU devices, see the NVIDIA documentation.

    gpuPerformanceAnalyzer(___,Name=Value) generates GPU code and analyzes performance through code execution profiling plots and reports by using the options specified by one or more Name=Value pair arguments.

    Examples

    This example shows how to analyze the performance of the CUDA code generated for the Mandelbrot algorithm by using gpuPerformanceAnalyzer.

    The Mandelbrot set is the region in the complex plane consisting of the values z0 for which the trajectories defined by this equation remain bounded as k→∞.

    $$z_{k+1} = z_k^2 + z_0, \quad k = 0, 1, \ldots$$
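
    To make the iteration concrete, here is a minimal scalar sketch that applies the equation to two sample points and counts how many iterations keep |z| at or below 2. It is not the entry-point function used in this example; the helper name escapeCount, the sample points, and the iteration limit of 50 are illustrative assumptions.

```matlab
% Escape-time iteration for two sample points (illustrative values).
maxIterations = 50;
cInside  = complex(0, 0);   % z stays at 0, so the trajectory is bounded
cOutside = complex(1, 0);   % z goes 2, 5, 26, ... and leaves the disk |z| <= 2

nInside  = escapeCount(cInside,  maxIterations);
nOutside = escapeCount(cOutside, maxIterations);

function n = escapeCount(z0, maxIterations)
% Hypothetical helper: count the iterations for which |z| <= 2.
z = z0;
n = 0;
for k = 1:maxIterations
    z = z*z + z0;           % z_{k+1} = z_k^2 + z_0
    if abs(z) > 2
        break               % the trajectory has escaped
    end
    n = n + 1;
end
end
```

    A point whose count reaches the iteration limit is treated as inside the set at this resolution. Points that escape early lie outside it.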

    The overall geometry of the Mandelbrot set is shown in the figure. This view does not have the resolution to show the richly detailed structure of the fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot set exhibits an elaborate boundary that reveals progressively finer recursive detail.

    Geometry of the Mandelbrot set with a region of interest circled in the middle of the image

    For this example, pick a set of limits that specify a highly zoomed part of the Mandelbrot set in the valley between the main cardioid and the p/q bulb to its left. A 1000-by-1000 grid of real parts (x) and imaginary parts (y) is created between these two limits. The Mandelbrot algorithm is then iterated at each grid location. An iteration number of 500 renders the image in full resolution.

    maxIterations = 500;
    gridSize = 1000;
    xlim = [-0.748766713922161,-0.748766707771757];
    ylim = [0.123640844894862,0.123640851045266];
    
    x = linspace( xlim(1), xlim(2), gridSize );
    y = linspace( ylim(1), ylim(2), gridSize );
    [xGrid,yGrid] = meshgrid( x, y );
    

    The mandelbrot_count.m entry-point function contains a vectorized implementation of the Mandelbrot set.

    function count = mandelbrot_count(maxIterations, xGrid, yGrid) %#codegen
    
    z0 = complex(xGrid,yGrid);
    count = ones(size(z0));
    
    % Map computation to GPU.
    coder.gpu.kernelfun;
    
    z = z0;
    for n = 0:maxIterations
        z = z.*z + z0;
        inside = abs(z)<=2;
        count = count + inside;
    end
    count = log(count);
    

    To generate CUDA code for mandelbrot_count and analyze its performance, use the gpuPerformanceAnalyzer function.

    cfg = coder.gpuConfig('dll');
    cfg.GpuConfig.CompilerFlags = '--fmad=false';
    cfg.GpuConfig.EnableMemoryManager = true;
    
    gpuPerformanceAnalyzer('mandelbrot_count', ...
    {maxIterations,xGrid,yGrid},Config=cfg, ...
    NumIterations=2,OutFolder="PerfTest");
    
    ### Starting GPU code generation
    Code generation successful: View report
    
    ### GPU code generation finished
    ### Starting SIL execution for 'mandelbrot_count'
        To terminate execution: clear mandelbrot_count_sil
    ### Stopping SIL execution for 'mandelbrot_count'
    ### Starting profiling data processing
    ### Profiling data processing finished
    ### Showing profiling data
    

    After collecting the profiling data, gpuPerformanceAnalyzer opens the GPU Performance Analyzer report window.

    GPU performance analyzer report for the Mandelbrot set

    Input Arguments

    Entry-point function fcn, specified as the name of a function in the current working folder or on the MATLAB path. If the MATLAB file is on a path that contains non-7-bit ASCII characters, such as Japanese characters, gpuPerformanceAnalyzer might not find the file.

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)});

    Example input values fcn_inputs, specified as a cell array of values that define the size, class, and complexity of the inputs of the entry-point function. The position of each value in the cell array must correspond to the position of the matching input argument in the MATLAB function definition. Instead of an example value, you can provide a coder.Type object, which you create by using coder.typeof.

    To generate a function that has fewer input arguments than the function definition has, omit the example values for the arguments that you do not want.

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)});

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)}, Config=cfg, NumIterations=2, OutFolder="PerfTest");

    InputTypes specifies the properties (size, class, and complexity) of the entry-point function inputs used during code generation. If this value is empty, the code generator infers the input properties from fcn_inputs.

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)}, InputTypes={coder.typeof(ones(5,10))});
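
    The following sketch separates the two roles: fcn_inputs supplies the concrete values, and InputTypes describes the type used for code generation. It assumes that a variable-size type with an upper bound is acceptable here, which this page does not confirm. The name foo.m is the placeholder function used in the examples on this page, and the 100-by-100 bound is an arbitrary illustrative choice.

```matlab
% Concrete example value passed as the function input.
exampleInput = rand(5,10);

% Type used for code generation: a double matrix whose two dimensions
% can each vary up to 100 (illustrative upper bound, an assumption).
inputType = coder.typeof(0, [100 100], [true true]);

% 'foo.m' is a placeholder entry-point function.
gpuPerformanceAnalyzer('foo.m', {exampleInput}, InputTypes={inputType});
```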

    Config specifies the configuration object that contains the code generation parameters:

    • For CUDA MEX generation, specify a coder.MexCodeConfig object.

    • For CUDA standalone library or executable generation, specify a coder.EmbeddedCodeConfig object.

    By default, gpuPerformanceAnalyzer uses a GPU code generation configuration object for a dynamically linked library (dll).

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)}, Config=coder.gpuConfig('dll'));
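
    As a rough sketch of the two cases listed above, both kinds of configuration object can be created with coder.gpuConfig, the function used in the Mandelbrot example. This assumes GPU Coder is installed. The class displayed for the 'dll' build type may differ depending on which other products are installed, so no expected output is shown.

```matlab
% Configuration object for CUDA MEX generation.
cfgMex = coder.gpuConfig('mex');
disp(class(cfgMex));

% Configuration object for a CUDA dynamic library,
% as used in the Mandelbrot example on this page.
cfgDll = coder.gpuConfig('dll');
disp(class(cfgDll));

% Pass either object through the Config name-value argument.
% 'foo.m' is the placeholder function name used on this page.
gpuPerformanceAnalyzer('foo.m', {rand(5,10)}, Config=cfgDll);
```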

    NumIterations specifies the number of times to execute the generated code. The value must be a positive integer.

    By default, the View Mode of the GPU Performance Analyzer window is set to Entry-Point Function and the Profiling Timeline shows only the last execution of the generated code. To view all the iterations, set the View Mode to Full Application.

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)},NumIterations=2);

    OutFolder specifies the absolute or relative path of the folder in which to store generated files. The value of OutFolder must not contain:

    • Spaces, as spaces can lead to code generation failures in certain operating system configurations.

    • Non-7-bit ASCII characters, such as Japanese characters.

    If the folder specified by OutFolder does not exist, gpuPerformanceAnalyzer creates it.

    If you do not specify the folder location, gpuPerformanceAnalyzer generates files in the default folder:

    codegen/target/fcn_name

    target can be:

    • mex for CUDA MEX

    • lib for CUDA libraries

    • dll for CUDA dynamic libraries

    fcn_name is the name of the MATLAB function.

    The function does not support the following characters in folder names: asterisk (*), question-mark (?), dollar ($), and pound (#).

    Note

    Each time gpuPerformanceAnalyzer generates the same type of output for the same code, it removes the files from the previous build. If you want to preserve files from a previous build, before starting another build, copy them to a different location.

    Example: gpuPerformanceAnalyzer('foo.m', {rand(5,10)}, OutFolder="PerfTest");
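
    Because a rebuild removes the files from the previous build, one way to preserve them is to copy the output folder before calling gpuPerformanceAnalyzer again. The sketch below uses only standard MATLAB file functions. The timestamped backup name is a hypothetical naming scheme, and PerfTest is the folder name from the example above.

```matlab
outDir = "PerfTest";   % output folder from the example above

if isfolder(outDir)
    % Build a backup name such as PerfTest_20240101_120000 (hypothetical scheme).
    stamp = char(datetime("now", "Format", "yyyyMMdd_HHmmss"));
    backupDir = outDir + "_" + stamp;

    % Copy the previous build, including subfolders, before rebuilding.
    copyfile(outDir, backupDir);
end
```

    The backup name contains no spaces or non-ASCII characters, so it stays within the folder-name restrictions listed above.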

    Version History

    Introduced in R2023a