Code Generation for a Sequence-to-Sequence LSTM Network
This example demonstrates how to generate CUDA® code for a long short-term memory (LSTM) network. The example generates a MEX application that makes predictions at each step of an input timeseries. It demonstrates two methods: one that uses a standard LSTM network, and one that leverages the stateful behavior of the same LSTM network. The example uses accelerometer sensor data from a smartphone carried on the body and predicts the activity of the wearer. User movements are classified into one of five categories: dancing, running, sitting, standing, and walking. The example uses a pretrained LSTM network. For more information on training, see the Sequence-to-Sequence Classification Using Deep Learning example from Deep Learning Toolbox™.
Third-Party Prerequisites
Required
This example generates CUDA MEX and has the following third-party requirements.
CUDA-enabled NVIDIA® GPU and compatible driver.
Optional
For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.
NVIDIA CUDA toolkit.
Environment variables for the compilers and libraries. For more information, see Third-Party Hardware (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).
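For example, on a Linux host you can make the CUDA compiler (nvcc) visible to GPU Coder by updating the PATH environment variable from within MATLAB. This is a minimal sketch only; the installation path /usr/local/cuda is an assumption, and the full set of variables your configuration needs is described in Setting Up the Prerequisite Products (GPU Coder).
% Illustrative sketch: add the CUDA toolkit binaries to the search path.
% The toolkit location below is an assumed default install path.
cudaBinPath = '/usr/local/cuda/bin';
setenv('PATH',[getenv('PATH') ':' cudaBinPath]);
system('nvcc --version')   % confirm that nvcc is now found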
Verify GPU Environment
Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
The lstmnet_predict Entry-Point Function
A sequence-to-sequence LSTM network enables you to make different predictions for each individual time step of a data sequence. The lstmnet_predict.m entry-point function takes an input sequence and passes it to a trained LSTM network for prediction. Specifically, the function uses the LSTM network trained in the Sequence-to-Sequence Classification Using Deep Learning example. A dlarray object is created within the entry-point function; the input and output of the function are of primitive data types. The entry-point function loads the dlnetwork object from the lstmnet.mat file into a persistent variable and reuses that persistent object on subsequent prediction calls. For more information, see Code Generation for dlarray (GPU Coder).
To display an interactive visualization of the network architecture and information about the network layers, use the analyzeNetwork function.
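For example, a minimal sketch of such an inspection, assuming lstmnet.mat is on the MATLAB path (this snippet is illustrative and is not required for code generation):
% Load the trained network outside of code generation and inspect it.
net = coder.loadDeepLearningNetwork('lstmnet.mat');
analyzeNetwork(net)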
type('lstmnet_predict.m')
function out = lstmnet_predict(in) %#codegen

% Copyright 2019-2024 The MathWorks, Inc.

dlIn = dlarray(in,'CT');

persistent dlnet;
if isempty(dlnet)
    dlnet = coder.loadDeepLearningNetwork('lstmnet.mat');
end

dlOut = predict(dlnet,dlIn);

out = extractdata(dlOut);
end
Generate CUDA MEX
To generate CUDA MEX for the lstmnet_predict.m entry-point function, create a GPU configuration object and specify the target to be MEX. Set the target language to C++. Create a deep learning configuration object that specifies the target library as none. Attach this deep learning configuration object to the GPU configuration object.
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('TargetLibrary','none');
At compile time, GPU Coder™ must know the data types of all the inputs to the entry-point function. Specify the type and size of the input argument to the codegen (MATLAB Coder) command by using the coder.typeof (MATLAB Coder) function. For this example, the input is of single data type with a feature dimension value of three and a variable sequence length. Specifying the sequence length as variable-size enables prediction on an input sequence of any length.
matrixInput = coder.typeof(single(0),[3 Inf],[false true]);
Run the codegen command.
codegen -config cfg lstmnet_predict -args {matrixInput} -report
Code generation successful: View report
Run Generated MEX on Test Data
The HumanActivityValidate MAT-file stores the variable XValidate, which contains sample timeseries of sensor readings on which you can test the generated code. Load the MAT-file and cast the data to single for deployment. Call lstmnet_predict_mex on the first observation.
load HumanActivityValidate
XValidate = cellfun(@single, XValidate, 'UniformOutput', false);
YPred1 = lstmnet_predict_mex(XValidate{1});
YPred1 is a 5-by-53888 numeric matrix containing the probabilities of the five classes for each of the 53888 time steps. For each time step, find the predicted class by calculating the index of the maximum probability.
[~, maxIndex] = max(YPred1, [], 1);
Associate the indices of maximum probability with the corresponding labels. Display the first ten labels. From the results, you can see that the network predicted the human to be sitting for the first ten time steps.
labels = categorical({'Dancing', 'Running', 'Sitting', 'Standing', 'Walking'});
predictedLabels1 = labels(maxIndex);
disp(predictedLabels1(1:10)')
Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting
Compare Predictions with Test Data
Use a plot to compare the MEX output data with the test data.
figure
plot(predictedLabels1,'.-');
hold on
plot(YValidate{1});
hold off

xlabel("Time Step")
ylabel("Activity")
title("Predicted Activities")
legend(["Predicted" "Test Data"])
Call Generated MEX on an Observation with a Different Sequence Length
Call lstmnet_predict_mex on the second observation, which has a different sequence length. In this example, XValidate{2} has a sequence length of 64480, whereas XValidate{1} has a sequence length of 53888. The generated code handles prediction correctly because the sequence length dimension was specified as variable-size.
YPred2 = lstmnet_predict_mex(XValidate{2});
[~, maxIndex] = max(YPred2, [], 1);
predictedLabels2 = labels(maxIndex);
disp(predictedLabels2(1:10)')
Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting
Generate MEX with Stateful LSTM
Instead of passing the entire timeseries to predict in one step, you can run prediction on an input by streaming in one timestep at a time and updating the state of the dlnetwork. The predict function returns the output prediction along with the updated network state. The lstmnet_predict_and_update function takes a single-timestep input and updates the state of the network so that subsequent inputs are treated as subsequent timesteps of the same sample. After passing in all timesteps one at a time, the resulting output is the same as if all timesteps were passed in as a single input.
type('lstmnet_predict_and_update.m')
function out = lstmnet_predict_and_update(in) %#codegen

% Copyright 2019-2024 The MathWorks, Inc.

dlIn = dlarray(in,'CT');

persistent dlnet;
if isempty(dlnet)
    dlnet = coder.loadDeepLearningNetwork('lstmnet.mat');
end

[dlOut, updatedState] = predict(dlnet, dlIn);
dlnet.State = updatedState;

out = extractdata(dlOut);
end
Run codegen on this new design file. Because each call takes in a single timestep, specify matrixInput to have a fixed sequence dimension of 1 instead of a variable sequence length.
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('TargetLibrary','none');

matrixInput = coder.typeof(single(0),[3 1]);

codegen -config cfg lstmnet_predict_and_update -args {matrixInput} -report
Code generation successful: View report
Run the generated MEX on the first validation sample's first timestep.
firstSample = XValidate{1};
firstTimestep = firstSample(:,1);

YPredStateful = lstmnet_predict_and_update_mex(firstTimestep);
[~, maxIndex] = max(YPredStateful, [], 1);
predictedLabelsStateful1 = labels(maxIndex)
predictedLabelsStateful1 = categorical
Sitting
Compare the output label with the ground truth.
YValidate{1}(1)
ans = categorical
Sitting
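As noted earlier, streaming every time step through the stateful network should reproduce the full-sequence result. The following loop is an illustrative sketch that is not part of the original example; variable names such as YPredStatefulAll are assumptions, and clearing the MEX function first resets the persistent network state.
% Illustrative sketch: stream all time steps of the first validation sequence
% through the stateful MEX and compare against the single-call result YPred1.
clear lstmnet_predict_and_update_mex          % reset the persistent network state
numTimesteps = size(XValidate{1},2);
YPredStatefulAll = zeros(5,numTimesteps,'single');
for t = 1:numTimesteps
    YPredStatefulAll(:,t) = lstmnet_predict_and_update_mex(XValidate{1}(:,t));
end
max(abs(YPredStatefulAll(:) - YPred1(:)))     % expect a value near zero
The maximum absolute difference should be close to zero, up to floating-point round-off.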