Generate Code for a Deep Learning Network for x86-64 Platforms Using Advanced Vector Instructions
This example shows how to generate code that uses advanced vector instructions and implements an image classification algorithm. The generated code does not depend on any deep learning libraries such as MKL-DNN. In this example, you first generate a MEX function and then an executable, both of which accept a batch of images as input and perform classification.
Prerequisites
Intel® processor with support for Intel Advanced Vector Extensions 2 (Intel AVX2) instructions. If your Intel processor does not support Intel AVX2 instructions, or if you are using a macOS platform, do not set the code configuration property InstructionSetExtensions to use AVX2 instructions (a guard for this case is sketched after these prerequisites).
This example is supported on Linux®, Windows® and Mac® platforms. This example is not supported for MATLAB® Online™.
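The following sketch shows one way to apply this guidance from MATLAB. It mirrors the guard used later in this example when generating the executable: the InstructionSetExtensions property is set to 'AVX2' only on non-Mac hosts. Checking whether your specific Intel processor supports AVX2 is left to you and your hardware documentation.
cfg = coder.config('exe');
% Request AVX2 code generation only on non-Mac hosts; leave the property
% at its default value otherwise.
if ~ismac
    cfg.InstructionSetExtensions = 'AVX2';
end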
Download Input Video File
Download the sample video file.
if ~exist('./object_class.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/object_class.avi.zip';
    websave('object_class.avi.zip',url);
    unzip('object_class.avi.zip');
end
Define the netPredict Function
This example uses MobileNet-v2 to show image classification on Intel desktops. You can obtain a pretrained MobileNet-v2 model for MATLAB by downloading the Deep Learning Toolbox™ Model for MobileNet-v2 Network support package.
The netPredict function loads the MobileNet-v2 network into a persistent network object and then performs prediction on the input. Subsequent calls to the function reuse the persistent network object.
type netPredict.m
% Copyright 2024 The MathWorks, Inc.

function scores = netPredict(in) %#codegen

% A persistent object dlnet is used to load the dlnetwork object. At
% the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is
% reused to call predict on inputs, avoiding reconstructing and reloading
% the dlnetwork object.
persistent dlnet;

if isempty(dlnet)
    % Get the dlnetwork for the MobileNet-v2 model.
    dlnet = imagePretrainedNetwork('mobilenetv2');
end

% compute scores
scores = predict(dlnet, in);
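Optionally, before generating code, you can call netPredict directly in MATLAB to confirm that the network loads and runs. This quick check is illustrative and assumes the Deep Learning Toolbox Model for MobileNet-v2 Network support package is installed; the variable names here are not part of the shipped example.
% Run the entry-point function once in MATLAB on a dummy input.
testInput = dlarray(ones(224,224,3,1,'single'),'SSCB');  % one 224-by-224-by-3 image
testScores = netPredict(testInput);                      % scores over the ImageNet classes
size(testScores)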
Create Configuration Object for MEX Generation
To generate a MEX function for the netPredict function, create a configuration object cfg and specify the build type as MEX. Set the TargetLang property of the configuration object to 'C'. To generate a MEX function that leverages AVX2 intrinsics, set the SIMDAcceleration property to 'Full'.
Next, attach a deep learning configuration object to cfg. To generate generic C code that does not call into any third-party deep learning libraries, set TargetLibrary to 'none' when creating the deep learning configuration object.
cfg = coder.config('mex');
cfg.TargetLang = 'C';
cfg.SIMDAcceleration = 'Full';
cfg.DeepLearningConfig = coder.DeepLearningConfig(TargetLibrary='none');
Customize Configuration Object to Write Deep Learning Constants to Data Files
Networks can have large deep learning constants, such as convolution and fully connected layer weights, that do not change once the network has been trained. When generating code for deep learning networks, you can either write these deep learning constants to their own data files or embed them in the generated source files. If your network has large constants, consider the first option: it keeps the generated source files from growing too large and helps avoid toolchain compiler crashes that can be caused by very large source files.
To write these deep learning constants to their own data files during code generation, set the LargeConstantGeneration configuration property to 'WriteOnlyDNNConstantsToDataFiles'.
You can also control the minimum size a deep learning constant must be for it to be written to a data file. Set the LargeConstantThreshold configuration property to a threshold size in bytes. Deep learning constants whose sizes are greater than or equal to this threshold are written to data files, while smaller deep learning constants are embedded in the generated source files. For this example, set the threshold to 256 bytes.
cfg.LargeConstantGeneration = 'WriteOnlyDNNConstantsToDataFiles';
cfg.LargeConstantThreshold = 256;
Generate MEX for netPredict
To generate a MEX function, pass the MEX configuration object to the codegen command. Specify the input type to be a dlarray object with format SSCB and size 224-by-224-by-3-by-batchSize. Here, batchSize is the number of images in a batch, which you set to 5. The size of the dlarray input corresponds to the input layer size of the MobileNet-v2 network.
batchSize = 5;
sampleInput = dlarray(ones(224,224,3,batchSize,'single'),'SSCB');
codegen -config cfg netPredict -args {sampleInput} -report
Code generation successful: View report
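Optionally, inspect the generated folder to confirm that the larger network constants were written to separate data files instead of being embedded in the generated source files. The exact names and extensions of these data files depend on your MATLAB Coder release, so this listing is only a sketch.
% List the contents of the MEX code generation folder.
dir(fullfile('codegen','mex','netPredict'))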
Perform Prediction on a Batch of Images
Create a VideoReader object and read five frames using the read object function. Because batchSize is 5, read five images at a time. Resize the batch of input images to the size that MobileNet-v2 expects.
videoReader = VideoReader('object_class.avi');
imBatch = read(videoReader,[1 5]);
imBatch = imresize(imBatch, [224,224]);
To compute the output classification scores for the inputs you provide, call the generated netPredict_mex function.
scores = netPredict_mex(dlarray(single(imBatch),'SSCB'));
Get the top five probability scores and their labels for each image in the batch.
[val,idx] = sort(scores, 'descend');
sortedScores = val(1:5,:)*100;
[~, labels] = imagePretrainedNetwork('mobilenetv2');

for i = 1:batchSize
    sortedLabels = labels(idx(1:5,i));
    disp(['Top 5 predictions of image, ', num2str(i)]);
    for j=1:5
        disp([sortedLabels{j},' ',num2str(sortedScores(j,i), '%2.2f'),'%'])
    end
end
Top 5 predictions of image, 1
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 2
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 3
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 4
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%

Top 5 predictions of image, 5
electric guitar 95.20%
acoustic guitar 3.43%
banjo 0.66%
stage 0.11%
violin 0.08%
Display the top five classification labels on the image.
outputImage = zeros(224,400,3, 'uint8');
for k = 1:3
    outputImage(:,177:end,k) = imBatch(:,:,k,1);
end

scol = 1;
srow = 1;

outputImage = insertText(outputImage, [scol, srow], 'Classification with MobileNet-v2', 'TextColor', 'w','FontSize',20, 'BoxColor', 'black');
srow = srow + 30;

for k = 1:5
    outputImage = insertText(outputImage, [scol, srow], [sortedLabels{k},' ',num2str(sortedScores(k), '%2.2f'),'%'], 'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 25;
end

imshow(outputImage);
Clear the persistent network object from memory.
clear netPredict_mex
Relocate Deep Learning Constants and Run netPredict_mex
The generated MEX function needs to know the location of the deep learning constant data files in order to run. If you relocate the deep learning constant files, you must indicate the new location by setting the environment variable CODER_DATA_PATH.
First, move the weights to a new location. They are currently in the codegen folder under codegen/mex/netPredict. Create a subfolder and move the entire codegen folder there.
mkdir('mySubfolder');
movefile('codegen', 'mySubfolder');
Then, set CODER_DATA_PATH to the new location. Note that on Windows, it is recommended to set this environment variable outside of MATLAB and start a new MATLAB instance.
setenv('CODER_DATA_PATH', fullfile('mySubfolder','codegen','mex','netPredict'));
Now that the MEX knows where to find the relocated weights, run it again.
scores = netPredict_mex(dlarray(single(imBatch),'SSCB'));
Finally, clear the persistent network object from memory.
clear netPredict_mex
Define the netPredictExe Entry-Point Function
To generate an executable from MATLAB code, define a new entry-point function, netPredictExe. This function is similar to the previous entry-point function netPredict but also includes code for preprocessing and postprocessing. The API that netPredictExe uses is platform independent. The function accepts the input video file name and the batch size as input arguments. These arguments are compile-time constants.
The function netPredictExe contains four subsections that perform these actions:
Load the classification labels for the pretrained network
Read the input batch of images and resize them as needed by the network
Run inference on input image batch
Overlay the results on the images
type netPredictExe.m
% Copyright 2023 The MathWorks, Inc.

function netPredictExe(inputVideo,batchSize) %#codegen

% Persistent objects are used to load the dlnetwork object and labels.
% At the first call to this function, the persistent objects are constructed and
% setup. When the function is called subsequent times, the same objects are reused,
% avoiding reconstructing and reloading the dlnetwork object.
persistent dlnet;
persistent labels;

if isempty(dlnet) || isempty(labels)
    % Get the dlnetwork for the MobileNet-v2 model.
    [dlnet, labels] = imagePretrainedNetwork('mobilenetv2', ClassNamesType='cell');
end

% Create video reader and video player objects %
videoReader = VideoReader(inputVideo);
depVideoPlayer = vision.DeployableVideoPlayer;

i = 1;

% Read frames until end of video file %
while ~(i+batchSize > (videoReader.NumFrames+1))
    % Read and resize batch of frames as specified by input argument%
    reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i);

    % run predict on resized input images %
    scores = predict(dlnet, dlarray(single(reSizedImagesBatch),'SSCB'));

    % extract the data from the output dlarray %
    scores = extractdata(scores);

    % overlay the prediction scores on images and display %
    overlayResultsOnImages(scores,labels,reSizedImagesBatch,batchSize,depVideoPlayer)

    i = i + batchSize;
end

release(depVideoPlayer);
end

function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs :
% videoReader        - Object used for reading the images from video file
% batchSize          - Number of images in batch to process. Supplied by user
% i                  - index to track frames read from video file
%
% Outputs :
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize

batchSize = coder.const(batchSize);
img = read(videoReader,[i (i+batchSize-1)]);
reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
resizeTo = [224,224];
reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end

function overlayResultsOnImages(scores,labels,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the prediction results on the batch of images and display them%
%
% Inputs :
% scores             - classification results for given network
% labels             - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize
% batchSize          - Number of images in batch to process. Supplied by user
% depVideoPlayer     - Object for displaying results
%
% Outputs :
% Predicted results overlaid on input images

% sort the predicted scores %
[val,indx] = sort(scores, 'descend');

% labels is a heterogeneous cell array. We need to index into labels to
% display the top five classification labels, but codegen does not
% support non-constant indexing into heterogeneous cell arrays. To get
% around this, create a local copy of labels, localLabels. Codegen is
% able to treat the local copy as a homogeneous cell array to get around
% the limitation.
localLabels = labels;

for j = 1:batchSize
    scores = val(1:5,j)*100;
    outputImage = zeros(224,400,3, 'uint8');
    for k = 1:3
        outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
    end

    % Overlay the results on image %
    scol = 1;
    srow = 1;

    outputImage = insertText(outputImage, [scol, srow], 'Classification with MobileNet-v2', TextColor=[255 255 255], FontSize=20, BoxColor=[0 0 0]);
    srow = srow + 30;

    for k = 1:5
        scoreStr = sprintf('%2.2f',scores(k));
        outputImage = insertText(outputImage, [scol, srow], [localLabels{indx(k,j)},' ',scoreStr,'%'], TextColor=[255 255 255], FontSize=15, BoxColor=[0 0 0]);
        srow = srow + 25;
    end

    depVideoPlayer(outputImage);
end
end
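Optionally, you can run netPredictExe directly in MATLAB before generating the executable. This call is illustrative; it assumes the Computer Vision Toolbox functions that the entry-point function uses (vision.DeployableVideoPlayer and insertText) are available, and it opens a video player window that steps through the video in batches of five frames.
% Run the entry-point function in MATLAB with the sample video and a batch size of 5.
netPredictExe('object_class.avi', 5);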
The readImageInputBatch Function
This function reads a batch of frames from the input video file and resizes them to 224-by-224-by-3, which is the input size the MobileNet-v2 network expects.
function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs :
% videoReader        - Object used for reading the images from video file
% batchSize          - Number of images in batch to process. Supplied by user
% i                  - index to track frames read from video file
%
% Outputs :
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize

img = read(videoReader,[i (i+batchSize-1)]);
reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
resizeTo = coder.const([224,224]);
reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end
The predict Function
This function accepts a dlarray object with the resized batch of images as input and returns the prediction results. Extract the numeric data from the output dlarray object after the call to predict.
% run predict on resized input images %
scores = predict(dlnet, dlarray(single(reSizedImagesBatch),'SSCB'));

% extract the data from the output dlarray %
scores = extractdata(scores);
The overlayResultsOnImages Function
This function accepts the prediction results and sorts them in descending order. It overlays these results on the input images and displays them.
function overlayResultsOnImages(scores,labels,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the prediction results on the batch of images and display them%
%
% Inputs :
% scores             - classification results for given network
% labels             - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchSize
% batchSize          - Number of images in batch to process. Supplied by user
% depVideoPlayer     - Object for displaying results
%
% Outputs :
% Predicted results overlaid on input images

% sort the predicted scores %
[val,indx] = sort(scores, 'descend');

% labels is a heterogeneous cell array. We need to index into labels to
% display the top five classification labels, but codegen does not
% support non-constant indexing into heterogeneous cell arrays. To get
% around this, create a local copy of labels, localLabels. Codegen is
% able to treat the local copy as a homogeneous cell array to get around
% the limitation.
localLabels = labels;

for j = 1:batchSize
    scores = val(1:5,j)*100;
    outputImage = zeros(224,400,3, 'uint8');
    for k = 1:3
        outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
    end

    % Overlay the results on image %
    scol = 1;
    srow = 1;

    outputImage = insertText(outputImage, [scol, srow], 'Classification with MobileNet-v2', TextColor=[255 255 255], FontSize=20, BoxColor=[0 0 0]);
    srow = srow + 30;

    for k = 1:5
        scoreStr = sprintf('%2.2f',scores(k));
        outputImage = insertText(outputImage, [scol, srow], [localLabels{indx(k,j)},' ',scoreStr,'%'], TextColor=[255 255 255], FontSize=15, BoxColor=[0 0 0]);
        srow = srow + 25;
    end

    depVideoPlayer(outputImage);
end
end
Build and Run Executable
Create a code configuration object for generating an executable and attach a deep learning configuration object to it.
If you do not intend to create a custom C main function and instead want to use the generated example C main, set the GenerateExampleMain property to 'GenerateCodeAndCompile'.
cfg = coder.config('exe');
cfg.TargetLang = 'C';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');
cfg.LargeConstantGeneration = 'WriteOnlyDNNConstantsToDataFiles';
cfg.LargeConstantThreshold = 256;
To enable the generated code to leverage AVX2 intrinsics, set InstructionSetExtensions to 'AVX2'. You can also choose to use a different instruction set extension or none at all. Note that 'AVX2' is not supported on Mac machines. For more information on optimizing the generated code, see Optimize C/C++ Code Performance for Deep Learning Applications without Deep Learning Libraries (MATLAB Coder).
if ~ismac
    cfg.InstructionSetExtensions = 'AVX2';
end
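If you prefer a different setting, the property also accepts other instruction set names. The exact list of supported values depends on your MATLAB Coder release; the values shown in this sketch are assumptions to verify against your documentation rather than a definitive list.
% Possible alternatives (verify against your release's documentation):
% cfg.InstructionSetExtensions = 'SSE2';   % assumed default on many Intel hosts
% cfg.InstructionSetExtensions = 'None';   % disable instruction set extensions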
Set the batchSize and inputVideoFile variables.
batchSize = 5;
inputVideoFile = 'object_class.avi';
Run the codegen command to build the executable.
codegen -config cfg netPredictExe -args {coder.Constant(inputVideoFile), coder.Constant(batchSize)} -report
Code generation successful: View report
Run the generated executable netPredictExe either at the MATLAB command line or at the desktop terminal.
if isunix
    system('./netPredictExe');
elseif ispc
    system('netPredictExe.exe');
else
    disp('Platform is not supported')
end
Relocate Deep Learning Constants and Run netPredictExe
If you want to relocate the weight files, you must also update the environment variable CODER_DATA_PATH.
First, relocate the weight files. Then, outside of MATLAB, set the environment variable CODER_DATA_PATH to the new location of the weight files. Finally, run the executable again.
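For reference, the following sketch shows one way to perform these steps from MATLAB on Linux. The folder name exeSubfolder is illustrative, the sketch assumes the default codegen/exe/netPredictExe output layout, and on Windows you should instead set CODER_DATA_PATH outside of MATLAB as noted earlier.
% Move the generated folder to an illustrative new location.
mkdir('exeSubfolder');
movefile('codegen','exeSubfolder');

% Point CODER_DATA_PATH at the relocated deep learning constant files.
% Environment variables set with setenv are inherited by system() calls.
setenv('CODER_DATA_PATH', fullfile(pwd,'exeSubfolder','codegen','exe','netPredictExe'));

% Run the executable again.
if isunix
    system('./netPredictExe');
elseif ispc
    system('netPredictExe.exe');
end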