Generate INT8 Code for Deep Learning Network on Cortex-M Target
Deep learning uses neural network architectures that contain many processing layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB use single-precision floating-point data types. Even small networks require a considerable amount of memory and hardware to perform these floating-point arithmetic operations. These requirements can inhibit deployment of deep learning models to devices that have low computational power and limited memory. Storing the weights and activations at a lower precision reduces the memory requirements of the network.
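The memory saving from lower precision can be illustrated with a few lines of base MATLAB. The following is a minimal sketch, not the scheme that the code generator necessarily applies: the weight matrix `W` is a hypothetical stand-in, and the symmetric power-of-two scaling is an assumption chosen only to make the arithmetic easy to follow.

```matlab
% Illustrative sketch: store a small weight matrix as int8 instead of single.
rng(0);                                   % reproducible random values
W = single(randn(4, 3));                  % hypothetical single-precision weights

% Assumed scheme: pick a power-of-two scale so the largest magnitude fits in int8.
maxAbs   = double(max(abs(W(:))));
exponent = ceil(log2(maxAbs / 127));
scale    = single(2^exponent);

Wq   = int8(round(W / scale));            % int8() saturates to [-128, 127]
Wdeq = single(Wq) * scale;                % dequantize to inspect the rounding error

infoW  = whos('W');
infoWq = whos('Wq');
fprintf('single: %d bytes, int8: %d bytes\n', infoW.bytes, infoWq.bytes);
fprintf('max abs quantization error: %g (bound: %g)\n', ...
    max(abs(W(:) - Wdeq(:))), scale / 2);
```

Each `single` element occupies 4 bytes and each `int8` element occupies 1 byte, so the 12 weights shrink from 48 bytes to 12 bytes, at the cost of a rounding error of at most half the scale per element.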
This example shows how to quantize a pretrained deep learning network, generate a C static library for it, and deploy the code on a Cortex-M processor. Quantization is performed by providing the calibration result file produced by the calibrate (Deep Learning Toolbox) function to the codegen command. The generated optimized C code reduces memory consumption by performing inference computations in 8-bit integers for the fully connected layer and takes advantage of ARM® processor SIMD by using the CMSIS-NN library. In this example, an LSTM network predicts human activity based on time series data representing accelerometer readings in three different directions.
In this example, you generate a PIL MEX function. When you run the PIL MEX within the MATLAB environment on your host computer, the PIL interface in turn executes the generated executable on the target hardware.
Note:
This example uses a pretrained LSTM network. For more information on how to train an LSTM network, see Sequence Classification Using Deep Learning (Deep Learning Toolbox).
Reduction in memory consumption and performance improvement might depend on the specific network you choose to deploy.
This example is supported on the Windows® platform only.
Third-Party Prerequisites
Cortex-M hardware - STM32F746G Discovery board
CMSIS-NN Library
Generate Calibration Result File
Load the pretrained network attached as a MAT-file. Create a dlquantizer (Deep Learning Toolbox) object and specify the network. Note that code generation does not support quantized deep neural networks produced by the quantize (Deep Learning Toolbox) function.
load('activityRecognisationNet.mat');
dq = dlquantizer(net, 'ExecutionEnvironment', 'CPU');
Use the calibrate (Deep Learning Toolbox) function to exercise the network with sample inputs and collect range information. In the training data you pass to the calibrate function, all sequences must have the same length. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the LSTM and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. Save the dlquantizer object as a MAT-file to pass it to the codegen function.
load('HumanActivityTrainFixedSeqLength.mat')
xDs = arrayDatastore(cellfun(@single,XTrain,'UniformOutput',false),"IterationDimension",1,"OutputType","same");
tDs = arrayDatastore(YTrain,"IterationDimension",1,"OutputType","same");
data = combine(xDs,tDs);
dq.calibrate(data);
save('activityRecognisationQuantObj.mat', 'dq')
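Because calibrate requires all sequences to have the same length, it can help to check the data before calibrating. The following is a minimal sketch of such a check. It assumes that each cell holds one sequence laid out as features by time steps; `XTrainDemo` is a hypothetical stand-in for `XTrain`, used only so that the snippet runs on its own.

```matlab
% Hypothetical stand-in for XTrain: three sequences, 3 features by 10 time steps each.
XTrainDemo = {rand(3, 10); rand(3, 10); rand(3, 10)};

% Sequence length is assumed to be the second dimension of each cell.
seqLengths = cellfun(@(x) size(x, 2), XTrainDemo);

% Fail early if any sequence differs in length from the first one.
allSameLength = all(seqLengths == seqLengths(1));
assert(allSameLength, 'Calibration sequences must all have the same length.');
```

To apply the same check to the real data, replace `XTrainDemo` with `XTrain` after loading the MAT-file.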
Generate PIL MEX Function
In this example, you generate code for the entry-point function net_predict.m. This function uses the coder.loadDeepLearningNetwork function to load a deep learning model and to construct and set up a recurrent neural network (RNN). The entry-point function then predicts the responses by using the predict (Deep Learning Toolbox) function.
type net_predict.m
% Copyright 2021 The MathWorks, Inc.
function out = net_predict(netFile, in)
net = coder.loadDeepLearningNetwork(netFile);
out = net.predict(in);
end
To generate a PIL MEX function, create a code configuration object for a static library and set the verification mode to 'PIL'. Set the target language to C. Limit the stack size to a reasonable value, for example 512 bytes, because the default size is much larger than the memory available on the hardware board.
cfg = coder.config('lib', 'ecoder', true);
cfg.VerificationMode = 'PIL';
cfg.StackUsageMax = 512;
cfg.TargetLang = 'C';
Create a deep learning configuration object for the CMSIS-NN library.
dlcfg = coder.DeepLearningConfig('cmsis-nn');
Attach the saved dlquantizer object MAT-file to dlcfg to generate code that performs low-precision (int8) inference.
dlcfg.CalibrationResultFile = 'activityRecognisationQuantObj.mat';
Set the DeepLearningConfig property of cfg to dlcfg.
cfg.DeepLearningConfig = dlcfg;
Create a coder.Hardware object for the STM32F746G Discovery board and attach it to the code generation configuration object. In the following code, replace comport with the port to which the Cortex-M hardware is connected. Also, on the Cortex-M hardware board, set the CMSISNN_PATH environment variable to the location of the CMSIS-NN library build on the Cortex-M board. For more information on building the library and setting environment variables, see Prerequisites for Deep Learning with MATLAB Coder.
hwName = 'STM32F746G-Discovery';
hw = coder.hardware(hwName);
hw.PILInterface = 'Serial';
% Uncomment the below line and replace comport with the actual port number
% hw.PILCOMPort = comport;
cfg.Hardware = hw;
cfg.BuildConfiguration = 'Faster Builds';
In the above code, replace comport with the actual port number. Generate a PIL MEX function by using the codegen command.
codegen -config cfg net_predict -args {coder.Constant('activityRecognisationNet.mat'),single(zeros(3,10))}
Run Generated PIL MEX Function
Load test data from HumanActivityTest.mat.
load('HumanActivityTest.mat')
inputData = single(XTest{1}(1:3,1:10));
Run the generated MEX function net_predict_pil on a test data set.
YPred = net_predict_pil('activityRecognisationNet.mat', inputData);
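One way to judge the effect of quantization is to compare the PIL output with the prediction of the original floating-point network on the host. The following is a sketch under stated assumptions: it assumes that the MAT-file stores the network in a variable named `net` (as the earlier load call suggests), that `inputData` and `YPred` are still in the workspace, and that the output is laid out as classes by time steps. The variable names `YRef`, `maxDiff`, and `agreement` are illustrative only.

```matlab
% Reference prediction from the single-precision network on the host.
s    = load('activityRecognisationNet.mat');   % assumes the file contains 'net'
YRef = predict(s.net, inputData);

% Largest element-wise difference between host and PIL outputs.
maxDiff = max(abs(YRef(:) - YPred(:)));

% Fraction of time steps where both outputs select the same class
% (assumes classes run along the first dimension).
[~, idxRef] = max(YRef, [], 1);
[~, idxPil] = max(YPred, [], 1);
agreement = mean(idxRef == idxPil);
fprintf('max difference: %g, class agreement: %.2f\n', maxDiff, agreement);

% Clearing the PIL MEX function ends the PIL execution on the target.
clear net_predict_pil;
```

Small numerical differences are expected, because the generated code performs part of the inference in 8-bit integers.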