Generate Code for LSTM Network and Deploy on Cortex-M Target
This example demonstrates how to generate floating-point C code for a sequence-to-sequence long short-term memory (LSTM) network. You generate a processor-in-the-loop (PIL) application that makes predictions at each step of an input timeseries.
This example shows three approaches for handling variable sequence length inputs to the LSTM network in the generated code. For each approach, you generate a PIL application that does one of the following:
- Accepts a single observation of variable sequence length
- Accepts multiple observations of variable sequence lengths
- Leverages the stateful behavior of the LSTM network to accept an input of fixed sequence length
This example uses the accelerometer sensor data from a smartphone carried on the body and makes predictions on the activity of the wearer.
Wearer movements are classified into one of five categories, namely dancing, running, sitting, standing, and walking.
For more information on training the network, see Sequence Classification Using Deep Learning (Deep Learning Toolbox).
When you generate and run the PIL executable, the generated C code runs on an STMicroelectronics® STM32F746G-Discovery board. This board is based on an ARM Cortex®-M7 microcontroller.
You can also deploy this example on other STMicroelectronics Discovery boards and STMicroelectronics Nucleo boards that use ARM Cortex-M processors. For deployment on these devices, you must install the corresponding support package and the associated required products, as described in the support package documentation.
For deployment on STMicroelectronics Discovery boards, install the Embedded Coder Support Package for STMicroelectronics Discovery Boards.
Supported STMicroelectronics Discovery boards:
- STM32F746G-Discovery
- STM32F769I-Discovery
- STM32F4-Discovery
For deployment on STMicroelectronics Nucleo boards, install the Simulink Coder Support Package for STMicroelectronics Nucleo Boards.
Supported STMicroelectronics Nucleo boards:
- Nucleo-F401RE
- Nucleo-F103RB
- Nucleo-F302R8
- Nucleo-F031K6
- Nucleo-L476RG
- Nucleo-L053R8
- Nucleo-F746ZG
- Nucleo-F411RE
- Nucleo-F767ZI
- Nucleo-H743ZI/Nucleo-H743ZI2
Required Hardware and Peripherals
- STM32F746G-Discovery board
- USB type A to Mini-B cable
Connect the hardware board to the host computer by using a USB type A to Mini-B cable. To install drivers for the board, see Install Drivers for STMicroelectronics STM32 Boards (Embedded Coder).
Set Code Configuration Parameters
Create Code Configuration Object
Create a coder.EmbeddedCodeConfig object cfg for generating a static library.
cfg = coder.config('lib','ecoder',true);
Configure Object for PIL Execution
To enable PIL-based execution, set VerificationMode to 'PIL'.
cfg.VerificationMode = 'PIL';
To generate generic C code that does not depend on third-party libraries, set TargetLibrary to 'none'.
cfg.DeepLearningConfig = coder.DeepLearningConfig('TargetLibrary', 'none');
Specify Target Hardware
To specify the target hardware, create a coder.Hardware object. Assign this object to the Hardware property of the object cfg.
cfg.Hardware = coder.hardware('STM32F746G-Discovery');
Set PIL Communication Interface
Set up a serial PIL communication interface.
cfg.Hardware.PILInterface = 'Serial';
To determine the COM port for serial communication, follow steps 2 to 4 in Code Verification and Validation with PIL and Monitoring and Tuning (Embedded Coder). Then, set the PILCOMPort property.
cfg.Hardware.PILCOMPort = 'COM4';
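As a quick cross-check of the port name, you can also list the serial ports that MATLAB sees on the host. The following is a minimal sketch that assumes MATLAB R2019b or later, where the serialportlist function is available. The variable name availablePorts is only illustrative, and the listing does not tell you which port belongs to the board.

```matlab
% List serial ports that are currently free (assumes MATLAB R2019b or later).
availablePorts = serialportlist("available");

if isempty(availablePorts)
    disp('No free serial ports found. Check the USB connection and drivers.')
else
    % Show the candidates. Which one belongs to the board still has to be
    % confirmed, for example by unplugging the board and listing again.
    disp(availablePorts)
end
```

After you identify the port, assign its name to cfg.Hardware.PILCOMPort as shown above.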
Limit Stack Size
The default stack size is much larger than the memory available on the hardware this example uses. Set the stack size to a smaller value, for example, 512 bytes.
cfg.StackUsageMax = 512;
To view the build log at the command line, enable verbose build.
cfg.Verbose = 1;
Enable ARM Cortex-M CRL
To generate optimal code, use the ARM Cortex-M (CMSIS) code replacement library.
cfg.CodeReplacementLibrary = 'ARM Cortex-M (CMSIS)';
Approach 1: Generate PIL Executable That Accepts a Single Observation of Variable Sequence Length
lstmNetwork_predict Entry-Point Function
This entry-point function takes an input sequence and passes it to a trained LSTM network for prediction. Specifically, this function uses the LSTM network trained in the example Sequence to Sequence Classification Using Deep Learning.
The function loads the network object from the activityRecognitionNet.mat file into a persistent variable. The function reuses this persistent object on subsequent prediction calls.
type('lstmNetwork_predict.m')
function out = lstmNetwork_predict(in) %#codegen

% Copyright 2019-2021 The MathWorks, Inc.

persistent mynet;

if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('activityRecognitionNet.mat');
end

% pass in input
out = predict(mynet,in);
Specify Input Type and Size
Specify the type and size of the input argument to the codegen command by using the coder.typeof function.
For this example, the input is of single data type with a feature dimension value of three and a variable sequence length.
Specifying the sequence length as variable-size enables the generated code to perform prediction on an input sequence of any length.
matrixInput = coder.typeof(single(0),[3 Inf],[false true]);
Generate PIL Executable
Run the codegen command to generate code and the PIL executable.
codegen -config cfg lstmNetwork_predict -args {matrixInput} -report
Run Generated PIL Executable
Load the MAT-file XValidateData.mat. This MAT-file stores the variable XValidateData that contains sample timeseries of sensor readings on which you can test the generated code. Also, load the MAT-file labelsActivity.mat that contains the activity labels.
load XValidateData.mat
load labelsActivity.mat
Call lstmNetwork_predict_pil on the first observation, which has a sequence length of six. You can call the same PIL executable with observations of other sequence lengths as well.
YPred1 = lstmNetwork_predict_pil(XValidateData{1});
Clear the PIL executable.
clear lstmNetwork_predict_pil;
YPred1 is a 5-by-6 numeric matrix containing the probabilities of the five classes for each of the six time steps.
% For each time step, find the predicted class by calculating the index of the maximum probability value.
[~, maxIndex] = max(YPred1, [], 1);
Associate the index of the maximum probability value to the corresponding label.
Display the associated labels. From the results, you can see that the network predicted the human position for the first observation.
predictedLabels_1stObservation = labels(maxIndex);
disp(predictedLabels_1stObservation)
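You can try the index-to-label step without the hardware. The following is a minimal sketch that uses a made-up 5-by-6 score matrix in place of YPred1 and a hypothetical label list in place of the labels variable from labelsActivity.mat. The order of the class names is an assumption for illustration and might not match the order stored in the MAT-file.

```matlab
% Hypothetical class names (order is an assumption, not read from labelsActivity.mat).
exampleLabels = {'Dancing'; 'Running'; 'Sitting'; 'Standing'; 'Walking'};

% Made-up scores: one row per class, one column per time step.
exampleScores = single([ ...
    0.05 0.10 0.10 0.05 0.02 0.01; ...
    0.05 0.10 0.10 0.05 0.02 0.01; ...
    0.70 0.60 0.20 0.10 0.05 0.03; ...
    0.15 0.15 0.50 0.70 0.86 0.90; ...
    0.05 0.05 0.10 0.10 0.05 0.05]);

% Index of the largest score in each column (dimension 1 runs across classes).
[~, exampleIndex] = max(exampleScores, [], 1);

% Map each index to its class name and display one label per time step.
examplePredicted = exampleLabels(exampleIndex);
disp(examplePredicted')
```

With these made-up scores, the first two time steps map to the third class and the remaining four map to the fourth class.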
Approach 2: Generate PIL Executable That Accepts Multiple Observations of Different Sequence Lengths
If you want to perform prediction on many observations at once, you can group the observations together in a cell array and pass the cell array for prediction. The cell array must be a column cell array, and each cell must contain one observation.
All observations must have the same feature dimension, but their sequence lengths can vary.
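Before passing data to the generated code, it can help to confirm that the input meets these shape requirements. The following is a minimal sketch that checks a stand-in cell array; exampleData and the other variable names are hypothetical, and the random values only mimic the layout of XValidateData (three features, single precision).

```matlab
% Hypothetical stand-in for XValidateData: four observations, three features
% each, with different sequence lengths.
exampleData = {single(rand(3, 6)); single(rand(3, 10)); ...
               single(rand(3, 4)); single(rand(3, 8))};

numFeatures = 3;  % feature dimension used in this example

% The cell array must be a column, with one observation per cell.
isColumnCell = iscell(exampleData) && iscolumn(exampleData);

% Every observation must be a single-precision matrix with numFeatures rows.
hasValidCells = all(cellfun(@(x) isa(x, 'single') && ismatrix(x) && ...
    size(x, 1) == numFeatures, exampleData));

% Sequence lengths are free to differ between observations.
sequenceLengths = cellfun(@(x) size(x, 2), exampleData);

fprintf('Column cell: %d, valid cells: %d, lengths: %s\n', ...
    isColumnCell, hasValidCells, mat2str(sequenceLengths'));
```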
Specify Input Type and Size
In this example, XValidateData contains four observations. To generate a PIL executable that can accept XValidateData as an input, specify the input type to be a 4-by-1 cell array. Further, specify that each cell be of the same type as matrixInput, the type you specified for the single observation in the previous codegen command.
matrixInput = coder.typeof(single(0),[3 Inf],[false true]);
cellInput = coder.typeof({matrixInput}, [4 1]);
Generate PIL Executable
Run the codegen command to generate code and the PIL executable.
codegen -config cfg lstmNetwork_predict -args {cellInput} -report
Run the PIL Executable
Load the MAT-file XValidateData.mat. This MAT-file stores the variable XValidateData that contains sample timeseries of sensor readings on which you can test the generated code. Also, load the MAT-file labelsActivity.mat that contains the activity labels.
load XValidateData.mat;
load labelsActivity.mat;
Run the PIL executable for all observations.
YPred2 = lstmNetwork_predict_pil(XValidateData);
Clear the PIL executable.
clear lstmNetwork_predict_pil;
The output is a 4-by-1 cell array of predictions for the four observations passed to lstmNetwork_predict_pil.
disp(YPred2)
Display the associated labels for the first observation.
% For each time step, find the predicted class by calculating the index of the maximum probability.
[~, maxIndex] = max(YPred2{1}, [], 1);
predictedLabels_1stObservation = labels(maxIndex);
disp(predictedLabels_1stObservation)
Approach 3: Generate PIL Executable for Stateful LSTM
lstmNetwork_predict_and_update Entry-Point Function
Instead of passing the entire timeseries to predict in one step, you can run prediction on an input by streaming in one timestep at a time by using the predictAndUpdateState (Deep Learning Toolbox) function. This function accepts an input, produces an output prediction, and updates the internal state of the network so that future predictions take this initial input into account. Use this approach on resource-constrained hardware that does not have enough memory to operate on the entire timeseries.
The attached lstmNetwork_predict_and_update function accepts a single-timestep input and processes the input by using the predictAndUpdateState function. This function outputs a prediction for the input timestep and updates the network so that subsequent inputs are treated as subsequent timesteps of the same observation. After passing in all timesteps one at a time, the resulting output is the same as if all timesteps were passed in as a single input.
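You can illustrate the equivalence between streaming and whole-sequence processing without the network. The following is a toy sketch, not the LSTM from this example. It uses a made-up one-layer recurrence h_t = tanh(W*x_t + U*h_(t-1)) with arbitrary weights W and U, and it carries the hidden state in an ordinary variable to mimic the role of the network state. All names in the block are hypothetical.

```matlab
% Arbitrary fixed weights for a toy recurrence (not a trained network).
W = [0.5 -0.2 0.1; 0.3 0.8 -0.5];        % 2 hidden units, 3 input features
U = [0.1 0.4; -0.3 0.2];                 % hidden-to-hidden weights
x = reshape(linspace(-1, 1, 18), 3, 6);  % 3 features, 6 time steps
numSteps = size(x, 2);

% Whole-sequence pass: the loop over time lives inside one "call".
hFull = zeros(2, numSteps);
h = zeros(2, 1);
for t = 1:numSteps
    h = tanh(W * x(:, t) + U * h);
    hFull(:, t) = h;
end

% Streaming pass: one 3-by-1 time step per "call", state kept between calls.
hStream = zeros(2, numSteps);
state = zeros(2, 1);                     % plays the role of the network state
for t = 1:numSteps
    state = tanh(W * x(:, t) + U * state);
    hStream(:, t) = state;
end

% Streaming pass with the state reset before step 4, as if the state were lost.
hReset = zeros(2, numSteps);
state = zeros(2, 1);
for t = 1:numSteps
    if t == 4
        state = zeros(2, 1);             % state lost: earlier steps are forgotten
    end
    state = tanh(W * x(:, t) + U * state);
    hReset(:, t) = state;
end

% Streaming matches the whole-sequence pass only when the state is preserved.
maxDifference = max(abs(hFull(:) - hStream(:)));
resetDifference = max(abs(hFull(:) - hReset(:)));
fprintf('Preserved state: %g, reset state: %g\n', maxDifference, resetDifference);
```

In this toy setting, the streamed outputs match the whole-sequence outputs only while the state is carried from one call to the next. The same reasoning is why each observation should be streamed from a fresh state.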
type('lstmNetwork_predict_and_update.m')
function out = lstmNetwork_predict_and_update(in) %#codegen

% Copyright 2019-2021 The MathWorks, Inc.

persistent mynet;

if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('activityRecognitionNet.mat');
end

% pass in input
[mynet, out] = predictAndUpdateState(mynet,in);
Specify Input Type and Size
To run the codegen command on this new design file, you must specify the type and size of the input arguments to the entry-point function. Because each call of lstmNetwork_predict_and_update accepts a single timestep, specify matrixInput to have a fixed sequence length of 1 instead of a variable sequence length.
matrixInput = coder.typeof(single(0),[3 1]);
Generate PIL Executable
Run the codegen command to generate code and the PIL executable.
codegen -config cfg lstmNetwork_predict_and_update -args {matrixInput} -report
Run Generated PIL Executable
Load the MAT-file XValidateData.mat. This MAT-file stores the variable XValidateData that contains sample timeseries of sensor readings on which you can test the generated code. Also, load the MAT-file labelsActivity.mat that contains the activity labels.
load XValidateData.mat;
load labelsActivity.mat;
Get the sequence length of the first observation.
sequenceLength = size(XValidateData{1},2);
Run the generated PIL executable on the sample's first observation by looping over each time step.
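% Preallocate the output: five class scores for each time step.
YPredStateful = zeros(5, sequenceLength, 'single');
for i = 1:sequenceLength
    % get each timestep data
    eachTimestepData = XValidateData{1}(:,i);
    YPredStateful(:,i) = lstmNetwork_predict_and_update_pil(eachTimestepData);
end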
Clear the generated PIL executable after each observation.
clear lstmNetwork_predict_and_update_pil;
clear lstmNetwork_predict;
Associate the index of the maximum probability value to the corresponding label.
[~, maxIndex] = max(YPredStateful, [], 1);
predictedLabelsStateful = labels(maxIndex);
disp(predictedLabelsStateful)
See Also
codegen | coder.hardware | coder.typeof | coder.config | coder.DeepLearningConfig | predictAndUpdateState (Deep Learning Toolbox)
Related Topics
- Sequence Classification Using Deep Learning (Deep Learning Toolbox)
- Install Drivers for STMicroelectronics STM32 Boards (Embedded Coder)
- Code Verification and Validation with PIL and Monitoring and Tuning (Embedded Coder)