sequenceInputLayer

Sequence input layer

Description

A sequence input layer inputs sequence data to a neural network and applies data normalization.

Creation

Description

layer = sequenceInputLayer(inputSize) creates a sequence input layer and sets the InputSize property.

layer = sequenceInputLayer(inputSize,Name,Value) sets optional properties, such as the MinLength, Normalization, Mean, and Name properties, using name-value pairs. You can specify multiple name-value pairs. Enclose each property name in single quotes.
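
For example, the following sketch (the input size of 12 is an assumption) creates a sequence input layer with z-score normalization and a custom name:

layer = sequenceInputLayer(12,'Normalization','zscore','Name','input');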

Properties

Sequence Input

Size of the input, specified as a positive integer or a vector of positive integers.

  • For vector sequence input, InputSize is a scalar corresponding to the number of features.

  • For 1-D image sequence input, InputSize is a vector of two elements [h c], where h is the image height and c is the number of channels of the image.

  • For 2-D image sequence input, InputSize is a vector of three elements [h w c], where h is the image height, w is the image width, and c is the number of channels of the image.

  • For 3-D image sequence input, InputSize is a vector of four elements [h w d c], where h is the image height, w is the image width, d is the image depth, and c is the number of channels of the image.

To specify the minimum sequence length of the input data, use the MinLength property.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Minimum sequence length of input data, specified as a positive integer. When training or making predictions with the network, if the input data has fewer than MinLength time steps, then the software throws an error.

When you create a network that downsamples data in the time dimension, you must take care that the network supports your training data and any data for prediction. Some deep learning layers require that the input has a minimum sequence length. For example, a 1-D convolution layer requires that the input has at least as many time steps as the filter size.

As sequence data propagates through a network, the sequence length can change. For example, downsampling operations, such as 1-D convolution, can output data with fewer time steps than the input. As a result, downsampling operations can cause later layers in the network to throw an error because the data has a shorter sequence length than the minimum length required by the layer.

When you train or assemble a network, the software automatically checks that sequences of length 1 can propagate through the network. Some networks might not support sequences of length 1, but can successfully propagate sequences of longer lengths. To check that a network supports propagating your training and expected prediction data, set the MinLength property to a value less than or equal to the minimum length of your data and the expected minimum length of your prediction data.
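
For example, in this sketch (the channel count, filter size, and number of filters are illustrative assumptions), the 1-D convolution layer requires at least filterSize time steps, so setting MinLength to the same value lets the software verify that your data can propagate through the network:

numChannels = 3;     % assumed number of input channels
filterSize = 11;     % assumed filter size; the convolution needs at least 11 time steps

layers = [
    sequenceInputLayer(numChannels,'MinLength',filterSize)
    convolution1dLayer(filterSize,16)
    globalAveragePooling1dLayer
    fullyConnectedLayer(4)
    softmaxLayer];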

Tip

To prevent convolution and pooling layers from changing the size of the data, set the Padding option of the layer to "same" or "causal".
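
For example, in this sketch (the layer sizes are assumptions), the causal padding keeps the number of time steps unchanged, so the convolution layer does not raise the minimum sequence length that the network requires:

layers = [
    sequenceInputLayer(3)                          % 3 assumed channels
    convolution1dLayer(5,16,'Padding','causal')    % output has the same number of time steps as the input
    reluLayer];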

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Data normalization to apply every time data is forward propagated through the input layer, specified as one of the following:

  • 'zerocenter' — Subtract the mean specified by Mean.

  • 'zscore' — Subtract the mean specified by Mean and divide by StandardDeviation.

  • 'rescale-symmetric' — Rescale the input to be in the range [-1, 1] using the minimum and maximum values specified by Min and Max, respectively.

  • 'rescale-zero-one' — Rescale the input to be in the range [0, 1] using the minimum and maximum values specified by Min and Max, respectively.

  • 'none' — Do not normalize the input data.

  • function handle — Normalize the data using the specified function. The function must be of the form Y = func(X), where X is the input data and the output Y is the normalized data.
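
For example, this sketch normalizes the input with a function handle; the scaling factor is an assumption, not a value taken from a particular dataset:

scale = 0.01;    % assumed scaling factor
layer = sequenceInputLayer(12,'Normalization',@(X) X*scale);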

Tip

By default, the trainnet and trainNetwork functions calculate the normalization statistics automatically at training time. To save time when training, specify the required statistics for normalization and set the ResetInputNormalization option in trainingOptions to 0 (false).

The software applies normalization to all input elements, including padding values.
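
For example, a sketch of that workflow, with placeholder statistics (the channel count, chMean, and chStd are assumptions rather than values from a real dataset):

numChannels = 12;                 % assumed number of channels
chMean = zeros(numChannels,1);    % assumed precomputed per-channel means
chStd = ones(numChannels,1);      % assumed precomputed per-channel standard deviations

layer = sequenceInputLayer(numChannels,'Normalization','zscore', ...
    'Mean',chMean,'StandardDeviation',chStd);
options = trainingOptions('adam','ResetInputNormalization',false);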

Data Types: char | string | function_handle

Normalization dimension, specified as one of the following:

  • 'auto' – If the ResetInputNormalization training option is 0 (false) and you specify any of the normalization statistics (Mean, StandardDeviation, Min, or Max), then normalize over the dimensions matching the statistics. Otherwise, recalculate the statistics at training time and apply channel-wise normalization.

  • 'channel' – Channel-wise normalization.

  • 'element' – Element-wise normalization.

  • 'all' – Normalize all values using scalar statistics.
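
For example, this sketch (the input size is an assumption) requests channel-wise z-score normalization explicitly; because no statistics are specified, the software calculates them at training time:

layer = sequenceInputLayer(12,'Normalization','zscore', ...
    'NormalizationDimension','channel');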

Data Types: char | string

Mean for zero-center and z-score normalization, specified as a numeric array, or empty.

  • For vector sequence input, Mean must be an InputSize-by-1 vector of means per channel, a numeric scalar, or [].

  • For 2-D image sequence input, Mean must be a numeric array of the same size as InputSize, a 1-by-1-by-InputSize(3) array of means per channel, a numeric scalar, or [].

  • For 3-D image sequence input, Mean must be a numeric array of the same size as InputSize, a 1-by-1-by-1-by-InputSize(4) array of means per channel, a numeric scalar, or [].

If you specify the Mean property, then Normalization must be 'zerocenter' or 'zscore'. If Mean is [], then the trainnet and trainNetwork functions calculate the mean and ignore padding values. To train a dlnetwork object using a custom training loop or assemble a network without training it using the assembleNetwork function, you must set the Mean property to a numeric scalar or a numeric array.
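
For example, this sketch builds a small dlnetwork with zero-center normalization; because custom training loops do not calculate normalization statistics, the per-channel means (assumed values here) are supplied directly:

numChannels = 3;                  % assumed number of channels
chMean = [0.1; -0.2; 0.05];       % assumed precomputed per-channel means

layers = [
    sequenceInputLayer(numChannels,'Normalization','zerocenter','Mean',chMean)
    lstmLayer(64,'OutputMode','last')
    fullyConnectedLayer(4)        % 4 assumed classes
    softmaxLayer];
net = dlnetwork(layers);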

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Standard deviation used for z-score normalization, specified as a numeric array, a numeric scalar, or empty.

  • For vector sequence input, StandardDeviation must be an InputSize-by-1 vector of standard deviations per channel, a numeric scalar, or [].

  • For 2-D image sequence input, StandardDeviation must be a numeric array of the same size as InputSize, a 1-by-1-by-InputSize(3) array of standard deviations per channel, a numeric scalar, or [].

  • For 3-D image sequence input, StandardDeviation must be a numeric array of the same size as InputSize, a 1-by-1-by-1-by-InputSize(4) array of standard deviations per channel, a numeric scalar, or [].

If you specify the StandardDeviation property, then Normalization must be 'zscore'. If StandardDeviation is [], then the trainnet and trainNetwork functions calculate the standard deviation and ignore padding values. To train a dlnetwork object using a custom training loop or assemble a network without training it using the assembleNetwork function, you must set the StandardDeviation property to a numeric scalar or a numeric array.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Minimum value for rescaling, specified as a numeric array, or empty.

  • For vector sequence input, Min must be an InputSize-by-1 vector of minima per channel, a numeric scalar, or [].

  • For 2-D image sequence input, Min must be a numeric array of the same size as InputSize, a 1-by-1-by-InputSize(3) array of minima per channel, or a numeric scalar.

  • For 3-D image sequence input, Min must be a numeric array of the same size as InputSize, a 1-by-1-by-1-by-InputSize(4) array of minima per channel, or a numeric scalar.

If you specify the Min property, then Normalization must be 'rescale-symmetric' or 'rescale-zero-one'. If Min is [], then the trainnet and trainNetwork functions calculate the minima and ignore padding values. To train a dlnetwork object using a custom training loop or assemble a network without training it using the assembleNetwork function, you must set the Min property to a numeric scalar or a numeric array.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Maximum value for rescaling, specified as a numeric array, or empty.

  • For vector sequence input, Max must be an InputSize-by-1 vector of maxima per channel, a numeric scalar, or [].

  • For 2-D image sequence input, Max must be a numeric array of the same size as InputSize, a 1-by-1-by-InputSize(3) array of maxima per channel, a numeric scalar, or [].

  • For 3-D image sequence input, Max must be a numeric array of the same size as InputSize, a 1-by-1-by-1-by-InputSize(4) array of maxima per channel, a numeric scalar, or [].

If you specify the Max property, then Normalization must be 'rescale-symmetric' or 'rescale-zero-one'. If Max is [], then the trainnet and trainNetwork functions calculate the maxima and ignore padding values. To train a dlnetwork object using a custom training loop or assemble a network without training it using the assembleNetwork function, you must set the Max property to a numeric scalar or a numeric array.
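
For example, this sketch rescales each channel to the range [0, 1] using known per-channel ranges (the values are assumptions, such as sensor limits), so the training functions do not need to calculate the minima and maxima:

chMin = [-5; -5; 0];      % assumed per-channel minima
chMax = [5; 5; 100];      % assumed per-channel maxima

layer = sequenceInputLayer(3,'Normalization','rescale-zero-one', ...
    'Min',chMin,'Max',chMax);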

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

This property is read-only.

Flag to split input data into real and imaginary components, specified as one of these values:

  • 0 (false) – Do not split input data.

  • 1 (true) – Split data into real and imaginary components.

When SplitComplexInputs is 1, the layer outputs twice as many channels as the input data. For example, if the input data is complex-valued with numChannels channels, then the layer outputs data with 2*numChannels channels, where channels 1 through numChannels contain the real components of the input data and channels numChannels+1 through 2*numChannels contain the imaginary components of the input data. If the input data is real, then channels numChannels+1 through 2*numChannels are all zero.

To input complex-valued data into a neural network, the SplitComplexInputs option of the input layer must be 1.

For an example showing how to train a network with complex-valued data, see Train Network with Complex-Valued Data.
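
For example, this sketch (the channel count of 8 is an assumption) creates an input layer for complex-valued sequences; downstream layers then receive 16 real-valued channels:

layer = sequenceInputLayer(8,'SplitComplexInputs',true);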

Layer

Layer name, specified as a character vector or a string scalar. For Layer array input, the trainnet, trainNetwork, assembleNetwork, layerGraph, and dlnetwork functions automatically assign names to layers with the name "".

The SequenceInputLayer object stores this property as a character vector.

Data Types: char | string

This property is read-only.

Number of inputs of the layer. The layer has no inputs.

Data Types: double

This property is read-only.

Input names of the layer. The layer has no inputs.

Data Types: cell

This property is read-only.

Number of outputs from the layer, returned as 1. This layer has a single output only.

Data Types: double

This property is read-only.

Output names, returned as {'out'}. This layer has a single output only.

Data Types: cell

Examples

Create a sequence input layer with the name 'seq1' and an input size of 12.

layer = sequenceInputLayer(12,'Name','seq1')
layer = 
  SequenceInputLayer with properties:

                      Name: 'seq1'
                 InputSize: 12
                 MinLength: 1
        SplitComplexInputs: 0

   Hyperparameters
             Normalization: 'none'
    NormalizationDimension: 'auto'

Include a sequence input layer in a Layer array.

inputSize = 12;
numHiddenUnits = 100;
numClasses = 9;

layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]
layers = 
  5x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 12 dimensions
     2   ''   LSTM                    LSTM with 100 hidden units
     3   ''   Fully Connected         9 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex

Create a sequence input layer for sequences of 224-by-224 RGB images with the name 'seq1'.

layer = sequenceInputLayer([224 224 3], 'Name', 'seq1')
layer = 
  SequenceInputLayer with properties:

                      Name: 'seq1'
                 InputSize: [224 224 3]
                 MinLength: 1
        SplitComplexInputs: 0

   Hyperparameters
             Normalization: 'none'
    NormalizationDimension: 'auto'

Train a deep learning LSTM network for sequence-to-label classification.

Load the example data from WaveformData.mat. The data is a numObservations-by-1 cell array of sequences, where numObservations is the number of sequences. Each sequence is a numChannels-by-numTimeSteps numeric array, where numChannels is the number of channels of the sequence and numTimeSteps is the number of time steps of the sequence.

load WaveformData

Visualize some of the sequences in a plot.

numChannels = size(data{1},1);

idx = [3 4 5 12];
figure
tiledlayout(2,2)
for i = 1:4
    nexttile
    stackedplot(data{idx(i)}',DisplayLabels="Channel "+string(1:numChannels))
    
    xlabel("Time Step")
    title("Class: " + string(labels(idx(i))))
end

Set aside data for testing. Partition the data into a training set containing 90% of the data and a test set containing the remaining 10% of the data. To partition the data, use the trainingPartitions function, attached to this example as a supporting file. To access this file, open the example as a live script.

numObservations = numel(data);
[idxTrain,idxTest] = trainingPartitions(numObservations, [0.9 0.1]);
XTrain = data(idxTrain);
TTrain = labels(idxTrain);

XTest = data(idxTest);
TTest = labels(idxTest);

Define the LSTM network architecture. Specify the input size as the number of channels of the input data. Specify an LSTM layer with 120 hidden units that outputs the last element of the sequence. Finally, include a fully connected layer with an output size that matches the number of classes, followed by a softmax layer and a classification layer.

numHiddenUnits = 120;
numClasses = numel(categories(TTrain));

layers = [ ...
    sequenceInputLayer(numChannels)
    lstmLayer(numHiddenUnits,OutputMode="last")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]
layers = 
  5×1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 3 dimensions
     2   ''   LSTM                    LSTM with 120 hidden units
     3   ''   Fully Connected         4 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex

Specify the training options. Train using the Adam solver with a learn rate of 0.01 and a gradient threshold of 1. Set the maximum number of epochs to 150 and shuffle every epoch. The software, by default, trains on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).

options = trainingOptions("adam", ...
    MaxEpochs=150, ...
    InitialLearnRate=0.01,...
    Shuffle="every-epoch", ...
    GradientThreshold=1, ...
    Verbose=false, ...
    Plots="training-progress");

Train the LSTM network with the specified training options.

net = trainNetwork(XTrain,TTrain,layers,options);

Classify the test data.

YTest = classify(net,XTest);

Calculate the classification accuracy of the predictions.

acc = mean(YTest == TTest)
acc = 0.8400

Display the classification results in a confusion chart.

figure
confusionchart(TTest,YTest)

To create an LSTM network for sequence-to-label classification, create a layer array containing a sequence input layer, an LSTM layer, a fully connected layer, a softmax layer, and a classification output layer.

Set the size of the sequence input layer to the number of features of the input data. Set the size of the fully connected layer to the number of classes. You do not need to specify the sequence length.

For the LSTM layer, specify the number of hidden units and the output mode 'last'.

numFeatures = 12;
numHiddenUnits = 100;
numClasses = 9;
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

For an example showing how to train an LSTM network for sequence-to-label classification and classify new data, see Sequence Classification Using Deep Learning.

To create an LSTM network for sequence-to-sequence classification, use the same architecture as for sequence-to-label classification, but set the output mode of the LSTM layer to 'sequence'.

numFeatures = 12;
numHiddenUnits = 100;
numClasses = 9;
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

To create an LSTM network for sequence-to-one regression, create a layer array containing a sequence input layer, an LSTM layer, a fully connected layer, and a regression output layer.

Set the size of the sequence input layer to the number of features of the input data. Set the size of the fully connected layer to the number of responses. You do not need to specify the sequence length.

For the LSTM layer, specify the number of hidden units and the output mode 'last'.

numFeatures = 12;
numHiddenUnits = 125;
numResponses = 1;

layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numResponses)
    regressionLayer];

To create an LSTM network for sequence-to-sequence regression, use the same architecture as for sequence-to-one regression, but set the output mode of the LSTM layer to 'sequence'.

numFeatures = 12;
numHiddenUnits = 125;
numResponses = 1;

layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    fullyConnectedLayer(numResponses)
    regressionLayer];

For an example showing how to train an LSTM network for sequence-to-sequence regression and predict on new data, see Sequence-to-Sequence Regression Using Deep Learning.

You can make LSTM networks deeper by inserting extra LSTM layers with the output mode 'sequence' before the LSTM layer. To prevent overfitting, you can insert dropout layers after the LSTM layers.

For sequence-to-label classification networks, the output mode of the last LSTM layer must be 'last'.

numFeatures = 12;
numHiddenUnits1 = 125;
numHiddenUnits2 = 100;
numClasses = 9;
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits1,'OutputMode','sequence')
    dropoutLayer(0.2)
    lstmLayer(numHiddenUnits2,'OutputMode','last')
    dropoutLayer(0.2)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

For sequence-to-sequence classification networks, the output mode of the last LSTM layer must be 'sequence'.

numFeatures = 12;
numHiddenUnits1 = 125;
numHiddenUnits2 = 100;
numClasses = 9;
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits1,'OutputMode','sequence')
    dropoutLayer(0.2)
    lstmLayer(numHiddenUnits2,'OutputMode','sequence')
    dropoutLayer(0.2)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

Create a deep learning network for data containing sequences of images, such as video and medical image data.

  • To input sequences of images into a network, use a sequence input layer.

  • To apply convolutional operations independently to each time step, first convert the sequences of images to an array of images using a sequence folding layer.

  • To restore the sequence structure after performing these operations, convert this array of images back to image sequences using a sequence unfolding layer.

  • To convert images to feature vectors, use a flatten layer.

You can then input vector sequences into LSTM and BiLSTM layers.

Define Network Architecture

Create a classification LSTM network that classifies sequences of 28-by-28 grayscale images into 10 classes.

Define the following network architecture:

  • A sequence input layer with an input size of [28 28 1].

  • A convolution, batch normalization, and ReLU layer block with 20 5-by-5 filters.

  • An LSTM layer with 200 hidden units that outputs the last time step only.

  • A fully connected layer of size 10 (the number of classes) followed by a softmax layer and a classification layer.

To perform the convolutional operations on each time step independently, include a sequence folding layer before the convolutional layers. LSTM layers expect vector sequence input. To restore the sequence structure and reshape the output of the convolutional layers to sequences of feature vectors, insert a sequence unfolding layer and a flatten layer between the convolutional layers and the LSTM layer.

inputSize = [28 28 1];
filterSize = 5;
numFilters = 20;
numHiddenUnits = 200;
numClasses = 10;

layers = [ ...
    sequenceInputLayer(inputSize,'Name','input')
    
    sequenceFoldingLayer('Name','fold')
    
    convolution2dLayer(filterSize,numFilters,'Name','conv')
    batchNormalizationLayer('Name','bn')
    reluLayer('Name','relu')
    
    sequenceUnfoldingLayer('Name','unfold')
    flattenLayer('Name','flatten')
    
    lstmLayer(numHiddenUnits,'OutputMode','last','Name','lstm')
    
    fullyConnectedLayer(numClasses, 'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];

Convert the layers to a layer graph and connect the miniBatchSize output of the sequence folding layer to the corresponding input of the sequence unfolding layer.

lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,'fold/miniBatchSize','unfold/miniBatchSize');

View the final network architecture using the plot function.

figure
plot(lgraph)

Algorithms

Extended Capabilities

Version History

Introduced in R2017b
