Generate Code for a Deep Learning Network for x86-64 Platforms Using Advanced Vector Instructions
This example shows how to generate code that uses advanced vector instructions and implements an image classification algorithm. The generated code does not depend on any deep learning libraries such as MKL-DNN. In this example, you first generate a MEX function and then an executable, both of which accept a batch of images as input and perform classification.
Prerequisites
Intel® processor with support for Intel Advanced Vector Extensions 2 (Intel AVX2) instructions. If your Intel processor does not support Intel AVX2 instructions, or if you are using a macOS platform, do not set the code configuration property InstructionSetExtensions to use AVX2 instructions (a guard for this case is sketched after these prerequisites).
This example is supported on Linux®, Windows® and Mac® platforms. This example is not supported for MATLAB® Online™.
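The following sketch shows one way to apply this guidance from MATLAB. It mirrors the guard used later in this example when generating the executable: the InstructionSetExtensions property is set to 'AVX2' only on non-Mac hosts. Checking whether your specific Intel processor supports AVX2 is left to you and your hardware documentation.
cfg = coder.config('exe');
% Request AVX2 code generation only on non-Mac hosts; leave the property
% at its default value otherwise.
if ~ismac
    cfg.InstructionSetExtensions = 'AVX2';
end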
Download Input Video File
Download the sample video file.
if ~exist('./object_class.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/object_class.avi.zip';
    websave('object_class.avi.zip',url);
    unzip('object_class.avi.zip');
end
Define the netPredict Function
This example uses MobileNet-v2 to show image classification on Intel desktops. You can obtain a pretrained MobileNet-v2 model for MATLAB by downloading the Deep Learning Toolbox™ Model for MobileNet-v2 Network support package.
The netPredict function loads the MobileNet-v2 network into a persistent network object and then performs prediction on the input. Subsequent calls to the function reuse the persistent network object.
type netPredict.m
% Copyright 2024 The MathWorks, Inc.

function scores = netPredict(in) %#codegen

% A persistent object dlnet is used to load the dlnetwork object. At
% the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is
% reused to call predict on inputs, avoiding reconstructing and reloading
% the dlnetwork object.
persistent dlnet;

if isempty(dlnet)
    % Get the dlnetwork for the MobileNet-v2 model.
    dlnet = imagePretrainedNetwork('mobilenetv2');
end

% compute scores
scores = predict(dlnet, in);
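Optionally, before generating code, you can call netPredict directly in MATLAB to confirm that the network loads and runs. This quick check is illustrative and assumes the Deep Learning Toolbox Model for MobileNet-v2 Network support package is installed; the variable names here are not part of the shipped example.
% Run the entry-point function once in MATLAB on a dummy input.
testInput = dlarray(ones(224,224,3,1,'single'),'SSCB');  % one 224-by-224-by-3 image
testScores = netPredict(testInput);                      % scores over the ImageNet classes
size(testScores)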
Create Configuration Object for MEX Generation
To generate a MEX function for the netPredict function, create a configuration object cfg and specify the build type as MEX. Set the TargetLang property of the configuration object to 'C'. To generate a MEX function that leverages AVX2 intrinsics, set the SIMDAcceleration property to 'Full'.
Next, attach a deep learning configuration object to cfg. To generate generic C code that does not call into any third-party deep learning libraries, set TargetLibrary to 'none' when creating the deep learning configuration object.
cfg = coder.config('mex');
cfg.TargetLang = 'C';
cfg.SIMDAcceleration = 'Full';
cfg.DeepLearningConfig = coder.DeepLearningConfig(TargetLibrary='none');
Customize Configuration Object to Write Deep Learning Constants to Data Files
Networks can have large deep learning constants, such as convolution and fully connected layer weights, that do not change once the network has been trained. When generating code for deep learning networks, you can either write these deep learning constants to their own data files or embed them in the generated source files. If your network has large constants, consider the first option: it keeps the generated source files from growing too large and helps avoid toolchain compiler crashes that can be caused by very large source files.
To write these deep learning constants to their own data files during code generation, set the LargeConstantGeneration configuration property to 'WriteOnlyDNNConstantsToDataFiles'.
You can also control the minimum size a deep learning constant must be for it to be written to a data file. Set the LargeConstantThreshold configuration property to a threshold size in bytes. Deep learning constants whose sizes are greater than or equal to this threshold are written to data files, while smaller deep learning constants are embedded in the generated source files. For this example, set the threshold to 256 bytes.
cfg.LargeConstantGeneration = 'WriteOnlyDNNConstantsToDataFiles';
cfg.LargeConstantThreshold = 256;
Generate MEX for netPredict
To generate a MEX function, pass the MEX configuration object to the codegen command. Specify the input type to be a dlarray object with format SSCB and size 224-by-224-by-3-by-batchSize. Here, batchSize is the number of images in a batch, which you set to 5. The size of the dlarray input corresponds to the input layer size of the MobileNet-v2 network.
batchSize = 5;
sampleInput = dlarray(ones(224,224,3,batchSize,'single'),'SSCB');
codegen -config cfg netPredict -args {sampleInput} -report
Code generation successful: View report
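Optionally, inspect the generated folder to confirm that the larger network constants were written to separate data files instead of being embedded in the generated source files. The exact names and extensions of these data files depend on your MATLAB Coder release, so this listing is only a sketch.
% List the contents of the MEX code generation folder.
dir(fullfile('codegen','mex','netPredict'))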
Perform Prediction on a Batch of Images
Create a VideoReader object and read five frames using the read object function. Because batchSize is 5, read five images at a time. Resize the batch of input images to the size that MobileNet-v2 expects.
videoReader = VideoReader('object_class.avi');
imBatch = read(videoReader,[1 5]);
imBatch = imresize(imBatch, [224,224]);
To compute the output classification scores for the inputs you provide, call the generated netPredict_mex function.
scores = netPredict_mex(dlarray(single(imBatch),'SSCB'));
Get the top five probability scores and their labels for each image in the batch.
[val,idx] = sort(scores, 'descend');
sortedScores = val(1:5,:)*100;
[~, labels] = imagePretrainedNetwork('mobilenetv2');

for i = 1:batchSize
    sortedLabels = labels(idx(1:5,i));
    disp(['Top 5 predictions of image, ', num2str(i)]);
    for j=1:5
        disp([sortedLabels{j},' ',num2str(sortedScores(j,i), '%2.2f'),'%'])
    end
end
Top 5 predictions of image, 1
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 2
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 3
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 4
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 5
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%
Display the top five classification labels on the image.
outputImage = zeros(224,400,3, 'uint8');
for k = 1:3
    outputImage(:,177:end,k) = imBatch(:,:,k,1);
end

scol = 1;
srow = 1;

outputImage = insertText(outputImage, [scol, srow], 'Classification with MobileNet-v2', 'TextColor', 'w','FontSize',20, 'BoxColor', 'black');
srow = srow + 30;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow], [sortedLabels{k},' ',num2str(sortedScores(k), '%2.2f'),'%'], 'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 25;
end

imshow(outputImage);
Clear the persistent network object from memory.
clear netPredict_mex
Relocate Deep Learning Constants and Run netPredict_mex
The generated MEX function needs to know the location of the deep learning constant data files in order to run. If you relocate the deep learning constant files, you must indicate the new location by setting the environment variable CODER_DATA_PATH.
First, move the weights to a new location. They are currently in the codegen folder under codegen/mex/netPredict. Create a subfolder and move the entire codegen folder there.
mkdir('mySubfolder');
movefile('codegen', 'mySubfolder');
Then, set CODER_DATA_PATH to the new location. Note that on Windows, it is recommended to set this environment variable outside of MATLAB and start a new MATLAB instance.
setenv('CODER_DATA_PATH', fullfile('mySubfolder','codegen','mex','netPredict'));
Now that the MEX knows where to find the relocated weights, run it again.
scores = netPredict_mex(dlarray(single(imBatch),'SSCB'));
Finally, clear the persistent network object from memory.
clear netPredict_mex
Define the netPredictExe Entry-Point Function
To generate an executable from MATLAB code, define a new entry-point function, netPredictExe. This function is similar to the previous entry-point function netPredict but also includes code for preprocessing and postprocessing. The API that netPredictExe uses is platform independent. The function accepts the input video file name and the batch size as input arguments. These arguments are compile-time constants.
The function netPredictExe contains four subsections that perform these actions:
Load the classification labels for the pretrained network
Read the input batch of images and resize them as needed by the network
Run inference on input image batch
Overlay the results on the images
type netPredictExe.m
% Copyright 2023 The MathWorks, Inc.

function netPredictExe(inputVideo,batchSize) %#codegen

% Persistent objects are used to load the dlnetwork object and labels.
% At the first call to this function, the persistent objects are constructed and
% setup. When the function is called subsequent times, the same objects are reused,
% avoiding reconstructing and reloading the dlnetwork object.
persistent dlnet;
persistent labels;

if isempty(dlnet) || isempty(labels)
    % Get the dlnetwork for the MobileNet-v2 model.
    [dlnet, labels] = imagePretrainedNetwork('mobilenetv2', ClassNamesType='cell');
end

% Create video reader and video player objects %
videoReader = VideoReader(inputVideo);
depVideoPlayer = vision.DeployableVideoPlayer;

i = 1;

% Read frames until end of video file %
while ~(i+batchSize > (videoReader.NumFrames+1))
    % Read and resize batch of frames as specified by input argument%
    reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i);

    % run predict on resized input images %
    scores = predict(dlnet, dlarray(single(reSizedImagesBatch),'SSCB'));

    % extract the data from the output dlarray %
    scores = extractdata(scores);

    % overlay the prediction scores on images and display %
    overlayResultsOnImages(scores,labels,reSizedImagesBatch,batchSize,depVideoPlayer)

    i = i + batchSize;
end

release(depVideoPlayer);
end

function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs :
% videoReader        - Object used for reading the images from video file
% batchSize          - Number of images in batch to process. Supplied by user
% i                  - index to track frames read from video file
%
% Outputs :
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize

batchSize = coder.const(batchSize);
img = read(videoReader,[i (i+batchSize-1)]);
reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
resizeTo = [224,224];
reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end

function overlayResultsOnImages(scores,labels,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the prediction results on the batch of images and display them%
%
% Inputs :
% scores             - classification results for given network
% labels             - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize
% batchSize          - Number of images in batch to process. Supplied by user
% depVideoPlayer     - Object for displaying results
%
% Outputs :
% Predicted results overlaid on input images

% sort the predicted scores %
[val,indx] = sort(scores, 'descend');

% labels is a heterogeneous cell array. We need to index into labels to
% display the top five classification labels, but codegen does not
% support non-constant indexing into heterogeneous cell arrays. To get
% around this, create a local copy of labels, localLabels. Codegen is
% able to treat the local copy as a homogeneous cell array to get around
% the limitation.
localLabels = labels;

for j = 1:batchSize
    scores = val(1:5,j)*100;
    outputImage = zeros(224,400,3, 'uint8');
    for k = 1:3
        outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
    end

    % Overlay the results on image %
    scol = 1;
    srow = 1;

    outputImage = insertText(outputImage, [scol, srow], 'Classification with MobileNet-v2', TextColor=[255 255 255], FontSize=20, BoxColor=[0 0 0]);
    srow = srow + 30;

    for k = 1:5
        scoreStr = sprintf('%2.2f',scores(k));
        outputImage = insertText(outputImage, [scol, srow], [localLabels{indx(k,j)},' ',scoreStr,'%'], TextColor=[255 255 255], FontSize=15, BoxColor=[0 0 0]);
        srow = srow + 25;
    end

    depVideoPlayer(outputImage);
end
end
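Optionally, you can run netPredictExe directly in MATLAB before generating the executable. This call is illustrative; it assumes the Computer Vision Toolbox functions that the entry-point function uses (vision.DeployableVideoPlayer and insertText) are available, and it opens a video player window that steps through the video in batches of five frames.
% Run the entry-point function in MATLAB with the sample video and a batch size of 5.
netPredictExe('object_class.avi', 5);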
The readImageInputBatch Function
This function reads a batch of frames from the input video file and resizes them to 224-by-224-by-3, which is the input size the MobileNet-v2 network expects.
function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs :
% videoReader        - Object used for reading the images from video file
% batchSize          - Number of images in batch to process. Supplied by user
% i                  - index to track frames read from video file
%
% Outputs :
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize

img = read(videoReader,[i (i+batchSize-1)]);
reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
resizeTo = coder.const([224,224]);
reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end
The predict Function
This function accepts a dlarray object with the resized batch of images as input and returns the prediction results. Extract the numeric data from the output dlarray object after the call to predict.
% run predict on resized input images %
scores = predict(dlnet, dlarray(single(reSizedImagesBatch),'SSCB'));

% extract the data from the output dlarray %
scores = extractdata(scores);
The overlayResultsOnImages Function
This function accepts the prediction results and sorts them in descending order. It overlays these results on the input images and displays them.
function overlayResultsOnImages(scores,labels,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the prediction results on the batch of images and display them%
%
% Inputs :
% scores             - classification results for given network
% labels             - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize
% batchSize          - Number of images in batch to process. Supplied by user
% depVideoPlayer     - Object for displaying results
%
% Outputs :
% Predicted results overlaid on input images

% sort the predicted scores %
[val,indx] = sort(scores, 'descend');

% labels is a heterogeneous cell array. We need to index into labels to
% display the top five classification labels, but codegen does not
% support non-constant indexing into heterogeneous cell arrays. To get
% around this, create a local copy of labels, localLabels. Codegen is
% able to treat the local copy as a homogeneous cell array to get around
% the limitation.
localLabels = labels;

for j = 1:batchSize
    scores = val(1:5,j)*100;
    outputImage = zeros(224,400,3, 'uint8');
    for k = 1:3
        outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
    end

    % Overlay the results on image %
    scol = 1;
    srow = 1;

    outputImage = insertText(outputImage, [scol, srow], 'Classification with MobileNet-v2', TextColor=[255 255 255], FontSize=20, BoxColor=[0 0 0]);
    srow = srow + 30;

    for k = 1:5
        scoreStr = sprintf('%2.2f',scores(k));
        outputImage = insertText(outputImage, [scol, srow], [localLabels{indx(k,j)},' ',scoreStr,'%'], TextColor=[255 255 255], FontSize=15, BoxColor=[0 0 0]);
        srow = srow + 25;
    end

    depVideoPlayer(outputImage);
end
end
Build and Run Executable
Create a code configuration object for generating an executable and attach a deep learning configuration object to it.
If you do not intend to create a custom C main function and instead want to use the generated example C main, set the GenerateExampleMain property to 'GenerateCodeAndCompile'.
cfg = coder.config('exe');
cfg.TargetLang = 'C';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');
cfg.LargeConstantGeneration = 'WriteOnlyDNNConstantsToDataFiles';
cfg.LargeConstantThreshold = 256;
To enable the generated code to leverage AVX2 intrinsics, set InstructionSetExtensions to 'AVX2'. You can also choose to use a different instruction set extension or none at all. Note that 'AVX2' is not supported on Mac machines. For more information on optimizing the generated code, see Optimize C/C++ Code Performance for Deep Learning Applications without Deep Learning Libraries (MATLAB Coder).
if ~ismac
    cfg.InstructionSetExtensions = 'AVX2';
end
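If you prefer a different setting, the property also accepts other instruction set names. The exact list of supported values depends on your MATLAB Coder release; the values shown in this sketch are assumptions to verify against your documentation rather than a definitive list.
% Possible alternatives (verify against your release's documentation):
% cfg.InstructionSetExtensions = 'SSE2';   % assumed default on many Intel hosts
% cfg.InstructionSetExtensions = 'None';   % disable instruction set extensions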
Set the batchSize and inputVideoFile variables.
batchSize = 5;
inputVideoFile = 'object_class.avi';
Run the codegen command to build the executable.
codegen -config cfg netPredictExe -args {coder.Constant(inputVideoFile), coder.Constant(batchSize)} -report
Code generation successful: View report
Run the generated executable netPredictExe either at the MATLAB command line or at the desktop terminal.
if isunix
    system('./netPredictExe');
elseif ispc
    system('netPredictExe.exe');
else
    disp('Platform is not supported')
end
Relocate Deep Learning Constants and Run netPredictExe
If you want to relocate the weight files, you must also update the environment variable CODER_DATA_PATH.
First, relocate the weight files. Then, outside of MATLAB, set the environment variable CODER_DATA_PATH to the new location of the weight files. Finally, run the executable again.
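For reference, the following sketch shows one way to perform these steps from MATLAB on Linux. The folder name exeSubfolder is illustrative, the sketch assumes the default codegen/exe/netPredictExe output layout, and on Windows you should instead set CODER_DATA_PATH outside of MATLAB as noted earlier.
% Move the generated folder to an illustrative new location.
mkdir('exeSubfolder');
movefile('codegen','exeSubfolder');

% Point CODER_DATA_PATH at the relocated deep learning constant files.
% Environment variables set with setenv are inherited by system() calls.
setenv('CODER_DATA_PATH', fullfile(pwd,'exeSubfolder','codegen','exe','netPredictExe'));

% Run the executable again.
if isunix
    system('./netPredictExe');
elseif ispc
    system('netPredictExe.exe');
end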