Code Generation for a Sequence-to-Sequence LSTM Network
This example demonstrates how to generate CUDA® code for a long short-term memory (LSTM) network. The example generates a MEX application that makes predictions at each step of an input timeseries. It demonstrates two methods: one that uses a standard LSTM network, and one that leverages the stateful behavior of the same LSTM network. The example uses accelerometer sensor data from a smartphone carried on the body and predicts the activity of the wearer. User movements are classified into one of five categories: dancing, running, sitting, standing, and walking. The example uses a pretrained LSTM network. For more information on training, see the Sequence-to-Sequence Classification Using Deep Learning example from Deep Learning Toolbox™.
Third-Party Prerequisites
Required
This example generates CUDA MEX and has the following third-party requirements.
CUDA-enabled NVIDIA® GPU and compatible driver.
Optional
For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.
NVIDIA CUDA toolkit.
Environment variables for the compilers and libraries. For more information, see Third-Party Hardware (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).
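For example, on a Linux host you can make the CUDA compiler (nvcc) visible to GPU Coder by updating the PATH environment variable from within MATLAB. This is a minimal sketch only; the installation path /usr/local/cuda is an assumption, and the full set of variables your configuration needs is described in Setting Up the Prerequisite Products (GPU Coder).
% Illustrative sketch: add the CUDA toolkit binaries to the search path.
% The toolkit location below is an assumed default install path.
cudaBinPath = '/usr/local/cuda/bin';
setenv('PATH',[getenv('PATH') ':' cudaBinPath]);
system('nvcc --version')   % confirm that nvcc is now found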
Verify GPU Environment
Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
The lstmnet_predict Entry-Point Function
A sequence-to-sequence LSTM network enables you to make different predictions for each individual time step of a data sequence. The lstmnet_predict.m entry-point function takes an input sequence and passes it to a trained LSTM network for prediction. Specifically, the function uses the LSTM network trained in the Sequence-to-Sequence Classification Using Deep Learning example. A dlarray object is created within the entry-point function; the input and output of the function are of primitive data types. The entry-point function loads the dlnetwork object from the lstmnet.mat file into a persistent variable and reuses that persistent object on subsequent prediction calls. For more information, see Code Generation for dlarray (GPU Coder).
To display an interactive visualization of the network architecture and information about the network layers, use the analyzeNetwork function.
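For example, a minimal sketch of such an inspection, assuming lstmnet.mat is on the MATLAB path (this snippet is illustrative and is not required for code generation):
% Load the trained network outside of code generation and inspect it.
net = coder.loadDeepLearningNetwork('lstmnet.mat');
analyzeNetwork(net)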
type('lstmnet_predict.m')
function out = lstmnet_predict(in) %#codegen

% Copyright 2019-2024 The MathWorks, Inc.

dlIn = dlarray(in,'CT');

persistent dlnet;
if isempty(dlnet)
    dlnet = coder.loadDeepLearningNetwork('lstmnet.mat');
end

dlOut = predict(dlnet,dlIn);

out = extractdata(dlOut);
end
Generate CUDA MEX
To generate CUDA MEX for the lstmnet_predict.m entry-point function, create a GPU configuration object and specify the target to be MEX. Set the target language to C++. Create a deep learning configuration object that specifies the target library as none. Attach this deep learning configuration object to the GPU configuration object.
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('TargetLibrary','none');
At compile time, GPU Coder™ must know the data types of all the inputs to the entry-point function. Specify the type and size of the input argument to the codegen (MATLAB Coder) command by using the coder.typeof (MATLAB Coder) function. For this example, the input is of single data type with a feature dimension value of three and a variable sequence length. Specifying the sequence length as variable-size enables prediction on an input sequence of any length.
matrixInput = coder.typeof(single(0),[3 Inf],[false true]);
Run the codegen command.
codegen -config cfg lstmnet_predict -args {matrixInput} -report
Code generation successful: View report
Run Generated MEX on Test Data
The HumanActivityValidate MAT-file stores the variable XValidate, which contains sample timeseries of sensor readings on which you can test the generated code. Load the MAT-file and cast the data to single for deployment. Call lstmnet_predict_mex on the first observation.
load HumanActivityValidate
XValidate = cellfun(@single, XValidate, 'UniformOutput', false);
YPred1 = lstmnet_predict_mex(XValidate{1});
YPred1 is a 5-by-53888 numeric matrix containing the probabilities of the five classes for each of the 53888 time steps. For each time step, find the predicted class by calculating the index of the maximum probability.
[~, maxIndex] = max(YPred1, [], 1);
Associate the indices of maximum probability with the corresponding labels. Display the first ten labels. From the results, you can see that the network predicted the human to be sitting for the first ten time steps.
labels = categorical({'Dancing', 'Running', 'Sitting', 'Standing', 'Walking'});
predictedLabels1 = labels(maxIndex);
disp(predictedLabels1(1:10)')
Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting
Compare Predictions with Test Data
Use a plot to compare the MEX output data with the test data.
figure
plot(predictedLabels1,'.-');
hold on
plot(YValidate{1});
hold off

xlabel("Time Step")
ylabel("Activity")
title("Predicted Activities")
legend(["Predicted" "Test Data"])
Call Generated MEX on an Observation with a Different Sequence Length
Call lstmnet_predict_mex on the second observation, which has a different sequence length. In this example, XValidate{2} has a sequence length of 64480, whereas XValidate{1} has a sequence length of 53888. The generated code handles prediction correctly because the sequence length dimension was specified as variable-size.
YPred2 = lstmnet_predict_mex(XValidate{2});
[~, maxIndex] = max(YPred2, [], 1);
predictedLabels2 = labels(maxIndex);
disp(predictedLabels2(1:10)')
Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting Sitting
Generate MEX with Stateful LSTM
Instead of passing the entire timeseries to predict in one step, you can run prediction on an input by streaming in one timestep at a time and updating the state of the dlnetwork. The predict function returns the output prediction along with the updated network state. The lstmnet_predict_and_update function takes a single-timestep input and updates the state of the network so that subsequent inputs are treated as subsequent timesteps of the same sample. After passing in all timesteps one at a time, the resulting output is the same as if all timesteps were passed in as a single input.
type('lstmnet_predict_and_update.m')
function out = lstmnet_predict_and_update(in) %#codegen

% Copyright 2019-2024 The MathWorks, Inc.

dlIn = dlarray(in,'CT');

persistent dlnet;
if isempty(dlnet)
    dlnet = coder.loadDeepLearningNetwork('lstmnet.mat');
end

[dlOut, updatedState] = predict(dlnet, dlIn);
dlnet.State = updatedState;

out = extractdata(dlOut);
end
Run codegen on this new design file. Because each call takes in a single timestep, specify matrixInput to have a fixed sequence dimension of 1 instead of a variable sequence length.
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('TargetLibrary','none');

matrixInput = coder.typeof(single(0),[3 1]);

codegen -config cfg lstmnet_predict_and_update -args {matrixInput} -report
Code generation successful: View report
Run the generated MEX on the first validation sample's first timestep.
firstSample = XValidate{1};
firstTimestep = firstSample(:,1);

YPredStateful = lstmnet_predict_and_update_mex(firstTimestep);
[~, maxIndex] = max(YPredStateful, [], 1);
predictedLabelsStateful1 = labels(maxIndex)
predictedLabelsStateful1 = categorical
Sitting
Compare the output label with the ground truth.
YValidate{1}(1)
ans = categorical
Sitting
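As noted earlier, streaming every time step through the stateful network should reproduce the full-sequence result. The following loop is an illustrative sketch that is not part of the original example; variable names such as YPredStatefulAll are assumptions, and clearing the MEX function first resets the persistent network state.
% Illustrative sketch: stream all time steps of the first validation sequence
% through the stateful MEX and compare against the single-call result YPred1.
clear lstmnet_predict_and_update_mex          % reset the persistent network state
numTimesteps = size(XValidate{1},2);
YPredStatefulAll = zeros(5,numTimesteps,'single');
for t = 1:numTimesteps
    YPredStatefulAll(:,t) = lstmnet_predict_and_update_mex(XValidate{1}(:,t));
end
max(abs(YPredStatefulAll(:) - YPred1(:)))     % expect a value near zero
The maximum absolute difference should be close to zero, up to floating-point round-off.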