Profile Generated CUDA MEX Functions Using Performance Analyzer
This example shows how to profile generated CUDA® MEX files by using the GPU Performance Analyzer. For more information on the MATLAB® code that this example uses, see Fog Rectification.
Third-Party Prerequisites
CUDA-enabled NVIDIA® GPU and compatible driver.
Verify GPU Environment
To verify that the compilers and libraries necessary for running this example are set up correctly, use the coder.checkGpuInstall function.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
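Because Quiet is set, the check above prints nothing, so a failed check is easy to miss. The following is a minimal sketch, assuming that coder.checkGpuInstall returns a results structure when called with an output argument and that the structure has a basiccodegen field. The field name is an assumption, so the code guards it with isfield.

```matlab
% Sketch: capture the check results instead of discarding them.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
results = coder.checkGpuInstall(envCfg);

% Show every field that the check reported.
disp(results);

% Assumed field name: basiccodegen. Skip the test if the field is absent.
if isfield(results,'basiccodegen') && ~results.basiccodegen
    error('Basic GPU code generation check failed. Review the GPU setup.');
end
```

Stopping early here avoids a less obvious failure later in the codegen step.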
Define Entry-Point Function
Define the entry-point function fog_rectification that takes a foggy image as input and returns a defogged image.
type fog_rectification.m
function [out] = fog_rectification(input) %#codegen
%
%   Copyright 2017-2023 The MathWorks, Inc.

coder.gpu.kernelfun;

% restoreOut is used to store the output of restoration
restoreOut = zeros(size(input),"double");

% Changing the precision level of input image to double
input = double(input)./255;

%% Dark channel Estimation from input
darkChannel = min(input,[],3);

% diff_im is used as input and output variable for anisotropic
% diffusion
diff_im = 0.9*darkChannel;
num_iter = 3;

% 2D convolution mask for Anisotropic diffusion
hN = [0.0625 0.1250 0.0625; 0.1250 0.2500 0.1250; 0.0625 0.1250 0.0625];
hN = double(hN);

%% Refine dark channel using Anisotropic diffusion.
for t = 1:num_iter
    diff_im = conv2(diff_im,hN,"same");
end

%% Reduction with min
diff_im = min(darkChannel,diff_im);

diff_im = 0.6*diff_im ;

%% Parallel element-wise math to compute
%  Restoration with inverse Koschmieder's law
factor = 1.0./(1.0-(diff_im));
restoreOut(:,:,1) = (input(:,:,1)-diff_im).*factor;
restoreOut(:,:,2) = (input(:,:,2)-diff_im).*factor;
restoreOut(:,:,3) = (input(:,:,3)-diff_im).*factor;
restoreOut = uint8(255.*restoreOut);

%%
% Stretching performs the histogram stretching of the image.
% im is the input color image and p is cdf limit.
% out is the contrast stretched image and cdf is the cumulative
% prob. density function and T is the stretching function.

% RGB to grayscale conversion
im_gray = im2gray(restoreOut);
[row,col] = size(im_gray);

% histogram calculation
[count,~] = imhist(im_gray);
prob = count'/(row*col);

% cumulative Sum calculation
cdf = cumsum(prob(:));

% Utilize gpucoder.reduce to find less than particular probability.
% This is equal to "i1 = length(find(cdf <= (p/100)));", but is
% more GPU friendly.

% lessThanP is the preprocess function that returns 1 if the input
% value from cdf is less than the defined threshold and returns 0
% otherwise. gpucoder.reduce then sums up the returned values to get
% the final count.
i1 = gpucoder.reduce(cdf,@plus,"preprocess", @lessThanP);
i2 = 255 - gpucoder.reduce(cdf,@plus,"preprocess", @greaterThanP);

o1 = floor(255*.10);
o2 = floor(255*.90);

t1 = (o1/i1)*[0:i1];
t2 = (((o2-o1)/(i2-i1))*[i1+1:i2])-(((o2-o1)/(i2-i1))*i1)+o1;
t3 = (((255-o2)/(255-i2))*[i2+1:255])-(((255-o2)/(255-i2))*i2)+o2;

T = (floor([t1 t2 t3]));

restoreOut(restoreOut == 0) = 1;

u1 = (restoreOut(:,:,1));
u2 = (restoreOut(:,:,2));
u3 = (restoreOut(:,:,3));

% replacing the value from look up table
out1 = T(u1);
out2 = T(u2);
out3 = T(u3);

out = zeros([size(out1),3], "uint8");
out(:,:,1) = uint8(out1);
out(:,:,2) = uint8(out2);
out(:,:,3) = uint8(out3);
end

function out = lessThanP(input)
p = 5/100;
out = uint32(0);
if input <= p
    out = uint32(1);
end
end

function out = greaterThanP(input)
p = 5/100;
out = uint32(0);
if input >= 1 - p
    out = uint32(1);
end
end
Approach 1: Generate and Profile MEX without Instrumentation
Generate a CUDA MEX for the fog_rectification function by running the codegen command. Do not supply any additional options to the codegen command other than -config and -args. The generated code does not contain profiling instrumentation.
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
codegen -config cfg -args {inputImg} fog_rectification.m
Code generation successful: View report
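Before profiling, it can help to confirm that the MEX file is actually visible to MATLAB. The following is a minimal sketch, assuming the default output name fog_rectification_mex that the later steps call. It relies on exist returning 3 for a MEX file on the path.

```matlab
% Sketch: confirm that the generated MEX file is on the MATLAB path.
mexName = 'fog_rectification_mex';   % default name produced by codegen

% exist returns 3 when the name resolves to a MEX file.
if exist(mexName,'file') ~= 3
    error('MEX file %s was not found on the MATLAB path.', mexName);
end

% mexext returns the platform-specific MEX file extension.
fprintf('Found %s.%s\n', mexName, mexext);
```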
Start the GPU profiler by running the gpuprofile command. Run the generated MEX twice and view the profiling results.
gpuprofile on
fog_rectification_mex(inputImg);
fog_rectification_mex(inputImg);
gpuprofile viewer
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
The GPU Performance Analyzer report shows CPU overhead and GPU activities for the two MEX executions. Because there is no profiling instrumentation, the Functions and Loops rows are empty.
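As a coarse cross-check outside the profiler, you can time the two MEX executions with tic and toc. This is a sketch only: it measures host wall-clock time for each call, not individual GPU activities, and it assumes that inputImg and fog_rectification_mex from the previous steps are available. The first call is often slower because it typically includes one-time initialization, but that is an expectation rather than a guarantee.

```matlab
% Sketch: coarse wall-clock timing of two consecutive MEX calls.
t1 = tic;
fog_rectification_mex(inputImg);
firstCall = toc(t1);

t2 = tic;
fog_rectification_mex(inputImg);
secondCall = toc(t2);

% Report both timings in milliseconds.
fprintf('First call:  %.2f ms\n', 1e3*firstCall);
fprintf('Second call: %.2f ms\n', 1e3*secondCall);
```

These numbers do not replace the per-event breakdown in the report. They only indicate whether the two runs differ noticeably.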
Approach 2: Generate and Profile MEX with Instrumentation
To add profiling instrumentation to the generated MEX, run the codegen command again with the -gpuprofile option.
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
codegen -config cfg -args {inputImg} fog_rectification.m -gpuprofile
Code generation successful: View report
Start the GPU profiler by running the gpuprofile command. Run the generated MEX twice and view the profiling results.
gpuprofile on
fog_rectification_mex(inputImg);
fog_rectification_mex(inputImg);
gpuprofile viewer
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
The GPU Performance Analyzer now shows the Functions and Loops events.
Approach 3: Generate and Profile MEX Using gpuPerformanceAnalyzer Function
You can also generate and profile the MEX by passing a MEX configuration object to the gpuPerformanceAnalyzer function.
cfg = coder.gpuConfig('mex');
inputImg = imread('foggyInput.png');
gpuPerformanceAnalyzer('fog_rectification.m', {inputImg}, Config=cfg);
### Starting GPU code generation
Code generation successful: View report
### GPU code generation finished
### Starting application profiling
### Application profiling finished
### Starting profiling data processing
### Profiling data processing finished
### Showing profiling data
When you profile the MEX using the gpuPerformanceAnalyzer function, you can also view the generated code and trace the events to the code in the Performance Analyzer report.