
Deep Learning Prediction with NVIDIA TensorRT Library

This example shows how to generate code for a deep learning application by using the NVIDIA® TensorRT™ library. The example uses the codegen command to generate a MEX file that performs prediction with a logo recognition classification network by using TensorRT. It also shows how to use the codegen command to generate MEX files that perform prediction in 8-bit integer and 16-bit floating-point precision.

Third-Party Prerequisites

Required

This example generates CUDA® MEX and requires a CUDA-enabled NVIDIA GPU and a compatible driver. The 8-bit integer and 16-bit floating-point precision modes require specific GPU compute capabilities. For details, see Third-Party Hardware (GPU Coder).

Optional

For non-MEX builds, such as static libraries, dynamic libraries, or executables, you must also have:

Verify GPU Environment

Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries necessary for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'tensorrt';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
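
The 8-bit integer and 16-bit floating-point modes used later in this example depend on the compute capability of the GPU. As a complementary check, the following minimal sketch parses a compute capability string and compares it against the thresholds stated later in this example. The value of cc is a placeholder. On a system with Parallel Computing Toolbox, you could instead read it from the ComputeCapability property of the object that gpuDevice returns. The variable names are illustrative.

```matlab
% Placeholder compute capability (assumption for illustration).
% On a real system: dev = gpuDevice; cc = dev.ComputeCapability;
cc = '6.1';

% Convert 'major.minor' into a single integer, for example '6.1' -> 61.
parts = sscanf(cc, '%d.%d');
ccNum = parts(1)*10 + parts(2);

% Thresholds as stated in this example.
int8OK = (ccNum == 61) || (ccNum >= 70);
fp16OK = (ccNum == 53) || (ccNum == 60) || (ccNum >= 62);

fprintf('int8 supported: %d, fp16 supported: %d\n', int8OK, fp16OK);
```

With the placeholder value, the int8 check passes and the fp16 check does not, because 6.1 is not among the compute capabilities listed for fp16.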

Download and Load Pretrained Network

This example uses a pretrained logo recognition network to classify logos in images. Download the pretrained LogoNet network from the MathWorks® website and load the file. The network was developed in MATLAB® and is approximately 42 MB in size. It can recognize 32 logos under various lighting conditions and camera angles. For information on training the logo recognition network, see Logo Recognition Network (GPU Coder).

net = getLogonet;

Convert the SeriesNetwork object to a dlnetwork object and save the network to a MAT-file.

dlconvnet = dag2dlnetwork(net);
save dlLogoNet.mat dlconvnet

The logonet_predict Entry-Point Function

The logonet_predict.m entry-point function takes an image input and performs prediction on the image by using the deep learning network saved in the dlLogoNet.mat file. The function loads the network object from dlLogoNet.mat into a persistent variable dlLogonet and reuses the persistent variable during subsequent prediction calls. For more information, see Code Generation for dlarray (GPU Coder).

type('logonet_predict.m')
function out = logonet_predict(in)
%#codegen

% Copyright 2017-2023 The MathWorks, Inc.

% A persistent object dlLogonet is used to load the network object. At the
% first call to this function, the persistent object is constructed and
% set up. When the function is called subsequent times, the same object is
% reused to call predict on inputs, thus avoiding reconstructing and
% reloading the network object.

dlIn = dlarray(in, 'SSC');

persistent dlLogonet;

if isempty(dlLogonet)
   
    dlLogonet = coder.loadDeepLearningNetwork('dlLogoNet.mat','dlLogonet');

end

dlOut = predict(dlLogonet, dlIn);

out = extractdata(dlOut);

end

Run MEX Code Generation

To generate CUDA code for the logonet_predict entry-point function, create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a TensorRT deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command, specifying an input size of 227-by-227-by-3. This value corresponds to the input layer size of the logo recognition network. By default, the generated TensorRT code runs inference in 32-bit floating-point precision.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config cfg logonet_predict -args {ones(227,227,3,'single')} -report
Code generation successful: View report
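
Rather than hard-coding the input size, you can read it from the network. This is a minimal sketch, assuming that the first layer of net is an image input layer that exposes an InputSize property and that net and cfg exist in the workspace as created earlier in this example. The variable name inputSize is illustrative.

```matlab
% Assumes net and cfg exist in the workspace, as created earlier in this example.
% For this network, the input layer size is expected to be [227 227 3].
inputSize = net.Layers(1).InputSize;

% Pass an example input of the queried size to codegen.
codegen -config cfg logonet_predict -args {ones(inputSize,'single')} -report
```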

Perform Prediction on Test Image

Load an input image. Call logonet_predict_mex on the input image.

im = imread('gpucoder_tensorrt_test.png');
im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(single(im));

% get top 5 probability scores and their labels
[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
classnames = net.Layers(end).ClassNames;
top5labels = classnames(indx(1:5));
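
The sort call returns both the sorted scores and the indices of those scores in the original vector, which is what lets the code map the best scores back to class names. The following small sketch shows the same pattern on hypothetical scores and labels, which are not output of the network.

```matlab
% Hypothetical scores and class names, for illustration only.
demoScores = single([0.05; 0.70; 0.10; 0.15]);
demoNames  = {'A'; 'B'; 'C'; 'D'};

% Sort in descending order; demoIdx maps sorted positions back to classes.
[demoVal, demoIdx] = sort(demoScores, 'descend');

% Keep the two best entries and convert the scores to percentages.
topScores = demoVal(1:2)*100;
topLabels = demoNames(demoIdx(1:2));   % {'B'; 'D'}
```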

Display the top five classification labels.

outputImage = zeros(227,400,3, 'uint8');
for k = 1:3
    outputImage(:,174:end,k) = im(:,:,k);
end

scol = 1;
srow = 20;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow],...
        [char(top5labels(k)),' ',num2str(scores(k),'%2.2f'),'%'],...
        'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 20;
end

imshow(outputImage);


Free the GPU memory by removing the loaded MEX function.

clear logonet_predict_mex;

Generate TensorRT Code for 8-Bit Integer Prediction

Generate TensorRT code that runs inference in int8 precision.

Code generation by using the NVIDIA TensorRT Library with inference computation in 8-bit integer precision supports these additional networks:

  • Object detector networks, such as YOLOv2 and SSD

  • Regression and semantic segmentation networks

TensorRT requires a calibration data set to calibrate a network that is trained in floating point to compute inference in 8-bit integer precision. Set the data type to int8 and the path to the calibration data set by using the DeepLearningConfig property. logos_dataset is a subfolder that contains images grouped by their classification labels. For int8 support, the GPU compute capability must be 6.1, 7.0, or higher.

Note that for semantic segmentation networks, the calibration data images must be of a format supported by the imread function.
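
To give a sense of what 8-bit integer inference involves, this sketch applies symmetric linear quantization to a few hypothetical activation values. It is a simplified illustration and not the calibration algorithm that TensorRT uses. Here the dynamic range is simply the largest absolute value, whereas a calibrator estimates the range from the calibration data set. All variable names are illustrative.

```matlab
% Hypothetical floating-point activations (illustration only).
x = single([-1.0 -0.3 0 0.25 0.8]);

% Map the largest magnitude to 127 (symmetric range, assumption for this sketch).
scale = max(abs(x)) / 127;

% Quantize: scale, round to the nearest integer, clamp, and store as int8.
q = int8(min(max(round(x / scale), -127), 127));

% Dequantize and measure the worst-case rounding error.
xHat = single(q) * scale;
maxErr = max(abs(x - xHat));
fprintf('scale = %.6f, max error = %.6f\n', scale, maxErr);
```

For values inside the chosen range, the rounding error is bounded by half of the scale, which is why a well-chosen range matters for accuracy.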

unzip('logos_dataset.zip');
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.GpuConfig.ComputeCapability = '6.1';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'int8';
cfg.DeepLearningConfig.DataPath = 'logos_dataset';
cfg.DeepLearningConfig.NumCalibrationBatches = 50;
codegen -config cfg logonet_predict -args {ones(227,227,3,'single')} -report
Code generation successful: View report

Run INT8 Prediction on Test Image

Load an input image. Call logonet_predict_mex on the input image.

im = imread('gpucoder_tensorrt_test.png');
im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(single(im));

% get top 5 probability scores and their labels
[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
classnames = net.Layers(end).ClassNames;
top5labels = classnames(indx(1:5));

Display the top five classification labels.

outputImage = zeros(227,400,3, 'uint8');
for k = 1:3
    outputImage(:,174:end,k) = im(:,:,k);
end

scol = 1;
srow = 20;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow],...
        [char(top5labels(k)),' ',num2str(scores(k),'%2.2f'),'%'],...
        'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 20;
end

imshow(outputImage);


Free the GPU memory by removing the loaded MEX function.

clear logonet_predict_mex;

Generate TensorRT Code for 16-Bit Floating-Point Prediction

Generate TensorRT code that runs inference in fp16 precision. For fp16 support, the GPU compute capability must be 5.3, 6.0, 6.2, or higher.

Note that quantization error can occur when results accumulated in single precision are converted to half precision. For more information, see Quantization of Deep Neural Networks (GPU Coder).
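
The rounding behind this error can be illustrated without a GPU. The following sketch emulates rounding a value to half precision by keeping 11 significant bits. It is an approximation for normal-range values, written for illustration, and is not how TensorRT performs the conversion. The input value and variable names are hypothetical.

```matlab
% Hypothetical single-precision value (illustration only).
x = single(0.1);

% Split x into fraction and exponent: x = f * 2^e with 0.5 <= |f| < 1.
[f, e] = log2(double(x));

% Keep 11 significant bits, the precision of a normal half-precision value.
% Simplification: ignores subnormals, overflow, and tie-breaking rules.
xHalf = pow2(round(f * 2^11) / 2^11, e);

relErr = abs(xHalf - double(x)) / double(x);
fprintf('half-rounded value = %.10f, relative error = %.2e\n', xHalf, relErr);
```

The relative error of a single rounding step stays below 2^-11, but such errors can add up across the layers of a network.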

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.GpuConfig.ComputeCapability = '5.3';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'fp16';
codegen -config cfg logonet_predict -args {ones(227,227,3,'single')} -report
Code generation successful: View report

Run FP16 Prediction on Test Image

Load an input image. Call logonet_predict_mex on the input image.

im = imread('gpucoder_tensorrt_test.png');
im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(single(im));

% get top 5 probability scores and their labels
[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
classnames = net.Layers(end).ClassNames;
top5labels = classnames(indx(1:5));

Display the top five classification labels.

outputImage = zeros(227,400,3, 'uint8');
for k = 1:3
    outputImage(:,174:end,k) = im(:,:,k);
end

scol = 1;
srow = 20;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow],...
        [char(top5labels(k)),' ',num2str(scores(k),'%2.2f'),'%'],...
        'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 20;
end

imshow(outputImage);


Free the GPU memory by removing the loaded MEX function.

clear logonet_predict_mex;