Define Nested Deep Learning Layer Using Network Composition

If Deep Learning Toolbox™ does not provide the layer you require for your classification or regression problem, then you can define your own custom layer using this example as a guide. For a list of built-in layers, see List of Deep Learning Layers.

To create a custom layer that itself defines a neural network, you can declare a dlnetwork object as a learnable parameter in the properties (Learnable) section of the layer definition. This method is known as network composition. You can use network composition to:

Create a network with control flow, for example, a network with a section that can dynamically change depending on the input data.
Create a network with loops, for example, a network with sections that feed the output back into itself.
Implement weight sharing, for example, in networks where different data needs to pass through the same layers such as twin neural networks or generative adversarial networks (GANs).

For nested networks that have both learnable and state parameters, for example, networks with batch normalization or LSTM layers, declare the network in the properties (Learnable, State) section of the layer definition.

For more information, see Deep Learning Network Composition.

To create a single layer that represents a block of layers, for example, a residual block, use a networkLayer. Network layers simplify building and editing large networks or networks with repeating components. For more information, see Create and Train Network with Nested Layers.
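As a point of comparison with the custom-layer approach in this example, the following minimal sketch wraps a small block of layers in a single network layer. It assumes a release that provides the networkLayer function. The block contents, the input size, and the name "convBlock" are illustrative choices, not part of this example.

```matlab
% Sketch: wrap a small block of layers in a single networkLayer.
% The layers, sizes, and the name "convBlock" are illustrative assumptions.
blockLayers = [
    convolution2dLayer(3,16,Padding="same")
    batchNormalizationLayer
    reluLayer];

block = networkLayer(blockLayers,Name="convBlock");

% Use the block like any other layer in a layer array.
layers = [
    imageInputLayer([28 28 1],Normalization="none")
    block
    globalAveragePooling2dLayer
    fullyConnectedLayer(10)
    softmaxLayer];

net = dlnetwork(layers);
```

A network layer suits a fixed block of layers. A custom layer with a nested dlnetwork, as in the rest of this example, is the more flexible option when the layer needs its own forward logic.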

This example shows how to create a custom layer representing a residual block. The custom layer residualBlockLayer contains a learnable block of layers consisting of convolution, batch normalization, ReLU, and addition layers. The block also includes a skip connection, which can optionally contain a convolution layer and a batch normalization layer. The layer has a single input that is used twice, as the input to each branch. This diagram highlights the residual block structure.

Tip

For this use case, it's typically easier to use a neural network without nesting. For an example showing how to create a residual network without using custom layers, see Train Residual Network for Image Classification.

To define a custom deep learning layer, you can use the template provided in this example, which takes you through these steps:

Name the layer — Give the layer a name so that you can use it in MATLAB®.
Declare the layer properties — Specify the properties of the layer, including learnable parameters and state parameters.
Create the constructor function (optional) — Specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then at creation, the software initializes the Name, Description, and Type properties with [] and sets the number of layer inputs and outputs to 1.
Create initialize function (optional) — Specify how to initialize the learnable and state parameters when the software initializes the network. If you do not specify an initialize function, then the software does not initialize parameters when it initializes the network.
Create forward functions — Specify how data passes forward through the layer (forward propagation) at prediction time and at training time.
Create reset state function (optional) — Specify how to reset state parameters.
Create a backward function (optional) — Specify the derivatives of the loss with respect to the input data and the learnable parameters (backward propagation). If you do not specify a backward function, then the forward functions must support dlarray objects.

Custom Layer Template

Copy the custom layer template into a new file in MATLAB. This template gives the structure of a layer class definition. It outlines:

The optional properties blocks for the layer properties, learnable parameters, and state parameters.
The optional layer constructor function.
The optional initialize function.
The predict function and the optional forward function.
The optional resetState function for layers with state properties.
The optional backward function.

classdef myLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional) 
        % & nnet.layer.Acceleratable % (Optional)

    properties
        % (Optional) Layer properties.

        % Declare layer properties here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.

        % Declare learnable parameters here.
    end

    properties (State)
        % (Optional) Layer state parameters.

        % Declare state parameters here.
    end

    properties (Learnable, State)
        % (Optional) Nested dlnetwork objects with both learnable
        % parameters and state parameters.

        % Declare nested networks with learnable and state parameters here.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the class.

            % Define layer constructor function here.
        end

        function layer = initialize(layer,layout)
            % (Optional) Initialize layer learnable and state parameters.
            %
            % Inputs:
            %         layer  - Layer to initialize
            %         layout - Data layout, specified as a networkDataLayout
            %                  object
            %
            % Outputs:
            %         layer - Initialized layer
            %
            %  - For layers with multiple inputs, replace layout with 
            %    layout1,...,layoutN, where N is the number of inputs.
            
            % Define layer initialization function here.
        end
        

        function [Y,state] = predict(layer,X)
            % Forward input data through the layer at prediction time and
            % output the result and updated state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through 
            %         X     - Input data
            % Outputs:
            %         Y     - Output of layer forward function
            %         state - (Optional) Updated layer state
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN, 
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Y with 
            %    Y1,...,YM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state 
            %    with state1,...,stateK, where K is the number of state 
            %    parameters.

            % Define layer predict function here.
        end

        function [Y,state,memory] = forward(layer,X)
            % (Optional) Forward input data through the layer at training
            % time and output the result, the updated state, and a memory
            % value.
            %
            % Inputs:
            %         layer - Layer to forward propagate through 
            %         X     - Layer input data
            % Outputs:
            %         Y      - Output of layer forward function 
            %         state  - (Optional) Updated layer state 
            %         memory - (Optional) Memory value for custom backward
            %                  function
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN, 
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Y with 
            %    Y1,...,YM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state 
            %    with state1,...,stateK, where K is the number of state 
            %    parameters.

            % Define layer forward function here.
        end

        function layer = resetState(layer)
            % (Optional) Reset layer state.

            % Define reset state function here.
        end

        function [dLdX,dLdW,dLdSin] = backward(layer,X,Y,dLdY,dLdSout,memory)
            % (Optional) Backward propagate the derivative of the loss
            % function through the layer.
            %
            % Inputs:
            %         layer   - Layer to backward propagate through 
            %         X       - Layer input data 
            %         Y       - Layer output data 
            %         dLdY    - Derivative of loss with respect to layer 
            %                   output
            %         dLdSout - (Optional) Derivative of loss with respect 
            %                   to state output
            %         memory  - Memory value from forward function
            % Outputs:
            %         dLdX   - Derivative of loss with respect to layer input
            %         dLdW   - (Optional) Derivative of loss with respect to
            %                  learnable parameter 
            %         dLdSin - (Optional) Derivative of loss with respect to 
            %                  state input
            %
            %  - For layers with state parameters, the backward syntax must
            %    include both dLdSout and dLdSin, or neither.
            %  - For layers with multiple inputs, replace X and dLdX with
            %    X1,...,XN and dLdX1,...,dLdXN, respectively, where N is
            %    the number of inputs.
            %  - For layers with multiple outputs, replace Y and dLdY with
            %    Y1,...,YM and dLdY1,...,dLdYM, respectively, where M is the
            %    number of outputs.
            %  - For layers with multiple learnable parameters, replace 
            %    dLdW with dLdW1,...,dLdWP, where P is the number of 
            %    learnable parameters.
            %  - For layers with multiple state parameters, replace dLdSin
            %    and dLdSout with dLdSin1,...,dLdSinK and 
            %    dLdSout1,...,dLdSoutK, respectively, where K is the number
            %    of state parameters.

            % Define layer backward function here.
        end
    end
end

Name Layer and Specify Superclasses

First, give the layer a name. In the first line of the class file, replace the existing name myLayer with residualBlockLayer.

classdef residualBlockLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional) 
        % & nnet.layer.Acceleratable % (Optional)
    ...
end

If you do not specify a backward function, then the layer functions, by default, receive unformatted dlarray objects as input. To specify that the layer receives formatted dlarray objects as input and also outputs formatted dlarray objects, also inherit from the nnet.layer.Formattable class when defining the custom layer.

Passing data through a dlnetwork requires formatted dlarray objects. To enable the layer to receive formatted dlarray objects as input, inherit from nnet.layer.Formattable. The layer functions support acceleration, so also inherit from nnet.layer.Acceleratable. For more information about accelerating custom layer functions, see Custom Layer Function Acceleration.

classdef residualBlockLayer < nnet.layer.Layer ...
        & nnet.layer.Formattable ...
        & nnet.layer.Acceleratable

    ...
end

Next, rename the myLayer constructor function (the first function in the methods section) so that it has the same name as the layer.

    methods
        function layer = residualBlockLayer()           
            ...
        end

        ...
    end

Save Layer

Save the layer class file in a new file named residualBlockLayer.m. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.

Declare Properties and Learnable Parameters

Declare the layer properties in the properties section and declare learnable parameters by listing them in the properties (Learnable) section.

By default, custom layers have these properties. Do not declare these properties in the properties section.

Property	Description
`Name`	Layer name, specified as a character vector or string scalar. For `Layer` array input, the `trainnet` and `dlnetwork` functions automatically assign names to layers with the name `""`.
`Description`	One-line description of the layer, specified as a string scalar or a character vector. This description appears when the layer is displayed in a `Layer` array. If you do not specify a layer description, then the software displays the layer class name.
`Type`	Type of the layer, specified as a character vector or a string scalar. The value of `Type` appears when the layer is displayed in a `Layer` array. If you do not specify a layer type, then the software displays the layer class name.
`NumInputs`	Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets `NumInputs` to the number of names in `InputNames`. The default value is 1.
`InputNames`	Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and `NumInputs` is greater than 1, then the software automatically sets `InputNames` to `{'in1',...,'inN'}`, where `N` is equal to `NumInputs`. The default value is `{'in'}`.
`NumOutputs`	Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets `NumOutputs` to the number of names in `OutputNames`. The default value is 1.
`OutputNames`	Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and `NumOutputs` is greater than 1, then the software automatically sets `OutputNames` to `{'out1',...,'outM'}`, where `M` is equal to `NumOutputs`. The default value is `{'out'}`.

If the layer has no other properties, then you can omit the properties section.

Tip

If you are creating a layer with multiple inputs, then you must set either the NumInputs or InputNames properties in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or OutputNames properties in the layer constructor. For an example, see Define Custom Deep Learning Layer with Multiple Inputs.

The residual block layer does not require any additional properties, so you can remove the properties section.

This custom layer has only one learnable parameter, the residual block itself specified as a dlnetwork object. The network also has state parameters (because it has batch normalization layers), so declare this parameter in the properties (Learnable, State) section and call the parameter Network.

    properties (Learnable, State)
        % Nested dlnetwork objects with both learnable
        % parameters and state parameters.
    
        % Residual block.
        Network
    end

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

The residual block layer constructor function takes four input arguments, one required and three optional:

Number of convolutional filters
Stride (optional, with default stride 1)
Flag to include convolution in skip connection (optional, with default flag false)
Layer name (optional, with default name "")

In the constructor function residualBlockLayer, specify the required input argument numFilters and the optional arguments as name-value pairs with the name NameValueArgs. Add a comment to the top of the function that explains the syntax of the function.

        function layer = residualBlockLayer(numFilters,NameValueArgs)
            % layer = residualBlockLayer(numFilters) creates a residual
            % block layer with the specified number of filters.
            %
            % layer = residualBlockLayer(numFilters,Name=Value) specifies
            % additional options using one or more name-value arguments:
            % 
            %     Stride                 - Stride of convolution operation 
            %                              (default 1)
            %
            %     IncludeSkipConvolution - Flag to include convolution in
            %                              skip connection
            %                              (default false)
            %
            %     Name                   - Layer name
            %                              (default "")

            ...
        end

Parse Input Arguments

Parse the input arguments using an arguments block. List the arguments in the same order as the function syntax and specify the default values. Then, extract the values from the NameValueArgs input.

            % Parse input arguments.
            arguments
                numFilters                
                NameValueArgs.Stride = 1
                NameValueArgs.IncludeSkipConvolution = false
                NameValueArgs.Name = ""
            end
            
            stride = NameValueArgs.Stride;
            includeSkipConvolution = NameValueArgs.IncludeSkipConvolution;
            name = NameValueArgs.Name;
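The arguments block above accepts any value for each argument. If you want MATLAB to reject invalid values early, you can add size, class, and validation-function constraints. The following sketch shows this in a standalone helper function. The function name parseResidualBlockArgs and the restriction of Stride to a positive integer scalar are assumptions made for illustration, not part of the example layer.

```matlab
function opts = parseResidualBlockArgs(numFilters,NameValueArgs)
    % Hypothetical helper that mirrors the constructor argument parsing,
    % with validation added to each argument.
    arguments
        numFilters (1,1) {mustBeInteger,mustBePositive}
        % Assumption: scalar stride only. Remove the size constraint to
        % allow a two-element stride vector.
        NameValueArgs.Stride (1,1) {mustBeInteger,mustBePositive} = 1
        NameValueArgs.IncludeSkipConvolution (1,1) logical = false
        NameValueArgs.Name {mustBeTextScalar} = ""
    end

    % Return all parsed values in a single structure.
    opts = NameValueArgs;
    opts.NumFilters = numFilters;
end
```

For example, parseResidualBlockArgs(64,Stride=2) returns a structure with the fields Stride, IncludeSkipConvolution, Name, and NumFilters, while parseResidualBlockArgs(-1) throws a validation error.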

Initialize Layer Properties

In the constructor function, initialize the layer properties, including the dlnetwork object. Replace the comment % Define layer constructor function here with code that initializes the layer properties.

Set the Name property to the input argument name.

            % Set layer name.
            layer.Name = name;

Give the layer a one-line description by setting the Description property of the layer. Set the description to describe the layer and any optional properties.

            % Set layer description.
            description = "Residual block with " + numFilters + " filters, stride " + stride;
            if includeSkipConvolution
                description = description + ", and skip convolution";
            end
            layer.Description = description;

Specify the type of the layer by setting the Type property. The value of Type appears when the layer is displayed in a Layer array.

            % Set layer type.
            layer.Type = "Residual Block";

Define the residual block. You can create the residual block layers as an uninitialized nested dlnetwork object without an input layer and allow the software to automatically initialize the learnable and state parameters at training time. For more information, see Automatically Initialize Learnable dlnetwork Objects for Training.

First, create a neural network containing the main layers of the block.

            % Define nested network.
            net = dlnetwork;
            
            layers = [
                convolution2dLayer(3,numFilters,Padding="same",Stride=stride)
                batchNormalizationLayer
                reluLayer
                convolution2dLayer(3,numFilters,Padding="same")
                batchNormalizationLayer
                additionLayer(2,Name="add")
                reluLayer];

            net = addLayers(net, layers);

Next, add the skip connection. If the includeSkipConvolution flag is true, then also include a convolution layer and batch normalization layer in the skip connection.

            % Add skip connection.
            if includeSkipConvolution
                layers = [
                    convolution2dLayer(1,numFilters,Stride=stride)
                    batchNormalizationLayer(Name="bnSkip")];
                
                net = addLayers(net,layers);
                net = connectLayers(net,"bnSkip","add/in2"); 
            end

Because there is no input layer, this network has two unconnected inputs. If the skip connection does not include the convolution and batch normalization layers, then the unconnected inputs are the input to the first convolution layer and the second input of the "add" layer. If the skip connection includes these layers, then the unconnected inputs are the inputs to the first convolution layer and the convolution layer in the skip connection.
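To see the unconnected inputs for yourself, you can build the main branch outside of the layer and inspect the InputNames property of the dlnetwork object. This is a minimal sketch for the case without a convolution in the skip connection. The explicit layer names ("conv1", "bn1", and so on) and the filter count are illustrative assumptions that the constructor does not use.

```matlab
% Sketch: build the main branch of the residual block and list the
% network inputs. Layer names and numFilters are illustrative.
numFilters = 16;

net = dlnetwork;

layers = [
    convolution2dLayer(3,numFilters,Padding="same",Name="conv1")
    batchNormalizationLayer(Name="bn1")
    reluLayer(Name="relu1")
    convolution2dLayer(3,numFilters,Padding="same",Name="conv2")
    batchNormalizationLayer(Name="bn2")
    additionLayer(2,Name="add")
    reluLayer(Name="relu2")];

net = addLayers(net,layers);

% With no input layer, the network inputs are the unconnected layer inputs.
disp(net.InputNames)
```

The number of entries in InputNames is the number of data inputs that the predict and forward functions of the dlnetwork object expect.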

Finally, set the layer Network property.

            % Set Network property.
            layer.Network = net;

View the completed constructor function.

        function layer = residualBlockLayer(numFilters,NameValueArgs)
            % layer = residualBlockLayer(numFilters) creates a residual
            % block layer with the specified number of filters.
            %
            % layer = residualBlockLayer(numFilters,Name=Value) specifies
            % additional options using one or more name-value arguments:
            % 
            %     Stride                 - Stride of convolution operation 
            %                              (default 1)
            %
            %     IncludeSkipConvolution - Flag to include convolution in
            %                              skip connection
            %                              (default false)
            %
            %     Name                   - Layer name
            %                              (default "")
    
            % Parse input arguments.
            arguments
                numFilters
                NameValueArgs.Stride = 1
                NameValueArgs.IncludeSkipConvolution = false
                NameValueArgs.Name = ""
            end
    
            stride = NameValueArgs.Stride;
            includeSkipConvolution = NameValueArgs.IncludeSkipConvolution;
            name = NameValueArgs.Name;
    
            % Set layer name.
            layer.Name = name;
    
            % Set layer description.
            description = "Residual block with " + numFilters + " filters, stride " + stride;
            if includeSkipConvolution
                description = description + ", and skip convolution";
            end
            layer.Description = description;
            
            % Set layer type.
            layer.Type = "Residual Block";
    
            % Define nested network.
            net = dlnetwork;

            layers = [
                convolution2dLayer(3,numFilters,Padding="same",Stride=stride)
                batchNormalizationLayer
                reluLayer
                convolution2dLayer(3,numFilters,Padding="same")
                batchNormalizationLayer
                additionLayer(2,Name="add")
                reluLayer];
    
            net = addLayers(net, layers);
    
            % Add skip connection.
            if includeSkipConvolution
                layers = [
                    convolution2dLayer(1,numFilters,Stride=stride)
                    batchNormalizationLayer(Name="bnSkip")];
     
                net = addLayers(net,layers);
                net = connectLayers(net,"bnSkip","add/in2");  
            end 
    
            % Set Network property.
            layer.Network = net;
        end

With this constructor function, the command residualBlockLayer(64,Stride=2,IncludeSkipConvolution=true,Name="res5") creates a residual block layer named "res5" with 64 filters, a stride of 2, and a convolution in the skip connection. The required sizes of weights and parameters are determined when the completed network is assembled for training.
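As a quick check of the constructor, you can create the layer and display the properties that the constructor sets. This sketch assumes that the class file residualBlockLayer.m, with the constructor shown above, is saved in the current folder or on the MATLAB path.

```matlab
% Create the layer using the constructor from this example.
layer = residualBlockLayer(64,Stride=2,IncludeSkipConvolution=true,Name="res5");

% Display the properties set by the constructor.
disp(layer.Name)         % res5
disp(layer.Description)  % Residual block with 64 filters, stride 2, and skip convolution
disp(layer.Type)         % Residual Block
```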

Because the nested network supports automatic initialization, defining the initialize function is optional. Some layers require information from the input data to initialize the learnable parameters. For example, the weights of an SReLU layer must have the same number of channels as the input data. For such layers, you can implement a custom initialize function. For an example, see Define Custom Deep Learning Layer with Learnable Parameters.

Create Forward Functions

Create the layer forward functions to use at prediction time and training time.

Create a function named predict that propagates the data forward through the layer at prediction time and outputs the result.

The predict function syntax depends on the type of layer.

Y = predict(layer,X) forwards the input data X through the layer and outputs the result Y, where layer has a single input and a single output.
[Y,state] = predict(layer,X) also outputs the updated state parameter state, where layer has a single state parameter.

You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:

For layers with multiple inputs, replace X with X1,...,XN, where N is the number of inputs. The NumInputs property must match N.
For layers with multiple outputs, replace Y with Y1,...,YM, where M is the number of outputs. The NumOutputs property must match M.
For layers with multiple state parameters, replace state with state1,...,stateK, where K is the number of state parameters.

Tip

If the number of inputs to the layer can vary, then use varargin instead of X1,…,XN. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi.

If the number of outputs can vary, then use varargout instead of Y1,…,YM. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Yj.
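The following sketch illustrates a predict function that uses varargin. It defines a hypothetical layer, exampleSumLayer, that adds a variable number of inputs. The class name and its constructor arguments are assumptions made for illustration and are unrelated to the residual block layer.

```matlab
classdef exampleSumLayer < nnet.layer.Layer
    % Hypothetical layer that sums a variable number of inputs.

    methods
        function layer = exampleSumLayer(numInputs,name)
            arguments
                numInputs (1,1) {mustBeInteger,mustBePositive}
                name {mustBeTextScalar} = ""
            end

            % A layer with multiple inputs must set NumInputs (or
            % InputNames) in the constructor.
            layer.NumInputs = numInputs;
            layer.Name = name;
        end

        function Y = predict(layer,varargin)
            % varargin{i} corresponds to the input Xi.
            Y = varargin{1};
            for i = 2:layer.NumInputs
                Y = Y + varargin{i};
            end
        end
    end
end
```

With this class saved as exampleSumLayer.m, the call predict(exampleSumLayer(3),1,2,3) returns 6.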

Tip

If the custom layer has a dlnetwork object for a learnable parameter, then in the predict function of the custom layer, use the predict function for the dlnetwork. When you do so, the dlnetwork object predict function uses the appropriate layer operations for prediction. If the dlnetwork has state parameters, then also return the network state.

Because the residual block has only one input, one output, and a state parameter, the syntax for predict for the custom layer is [Y,state] = predict(layer,X).

By default, the layer uses predict as the forward function at training time. To use a different forward function at training time, or retain a value required for a custom backward function, you must also create a function named forward.

The dimensions of the inputs depend on the type of data and the output of the connected layers.

Layer Input	Example Shape	Example Data Format
2-D images	h-by-w-by-c-by-N numeric array, where h, w, c and N are the height, width, number of channels of the images, and number of observations, respectively.	`"SSCB"`
3-D images	h-by-w-by-d-by-c-by-N numeric array, where h, w, d, c and N are the height, width, depth, number of channels of the images, and number of image observations, respectively.	`"SSSCB"`
Vector sequences	c-by-N-by-s array, where c is the number of features of the sequence, N is the number of sequence observations, and s is the sequence length.	`"CBT"`
2-D image sequences	h-by-w-by-c-by-N-by-s array, where h, w, and c correspond to the height, width, and number of channels of the image, respectively, N is the number of image sequence observations, and s is the sequence length.	`"SSCBT"`
3-D image sequences	h-by-w-by-d-by-c-by-N-by-s array, where h, w, d, and c correspond to the height, width, depth, and number of channels of the image, respectively, N is the number of image sequence observations, and s is the sequence length.	`"SSSCBT"`
Features	c-by-N array, where c is the number of features, and N is the number of observations.	`"CB"`

For layers that output sequences, the layers can output sequences of any length or output data with no time dimension.
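To make the shapes and data formats in the table concrete, this short sketch creates formatted dlarray objects for two of the rows and queries their dimension labels. The array sizes are arbitrary values chosen for illustration.

```matlab
% 2-D images: 28-by-28 pixels, 3 channels, 16 observations.
X = dlarray(rand(28,28,3,16,"single"),"SSCB");
disp(dims(X))          % SSCB
disp(size(X))          % 28 28 3 16
disp(finddim(X,"C"))   % 3, the position of the channel dimension

% Vector sequences: 12 features, 16 observations, 100 time steps.
S = dlarray(rand(12,16,100),"CBT");
disp(dims(S))          % CBT
```

Because residualBlockLayer inherits from nnet.layer.Formattable, its predict function receives data with these labels attached.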

For the residual block layer, a forward pass of the layer is simply a forward pass of the dlnetwork object.

Implement this operation in the custom layer function predict. To perform a forward pass of the dlnetwork for prediction, use the predict function for dlnetwork objects. In this case, the input to the residual block layer is used as the input to both of the unconnected inputs to the dlnetwork object, so the syntax for predict for the dlnetwork object is [Y,state] = predict(net,X,X).

This example does not use a memory value or a separate forward function for training, so you can remove the forward function from the class file. The layer then uses predict at training time, which means that the batch normalization layers in the nested network apply their prediction-time normalization during training as well.

Create the predict function and add a comment to the top of the function that explains the syntaxes of the function.

        function [Y,state] = predict(layer, X)
            % Forward input data through the layer at prediction time and
            % output the result and state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through
            %         X     - Input data
            % Outputs:
            %         Y     - Output of layer forward function
            %         state - Layer state

            % Predict using network.
            net = layer.Network;
            [Y,state] = predict(net,X,X);
            
        end

Because the predict function uses only functions that support dlarray objects, defining the backward function is optional. For a list of functions that support dlarray objects, see List of Functions with dlarray Support.
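Batch normalization layers normalize with mini-batch statistics at training time and with stored statistics at prediction time. If you want the nested network to use its training-time behavior, one possible approach is to also define a forward function that calls the forward function of the dlnetwork object. The following method is a sketch of that option and is not part of the completed layer in this example. Place it in the methods block next to predict.

```matlab
        function [Y,state] = forward(layer,X)
            % Sketch (not part of the original example): forward input
            % data through the layer at training time and output the
            % result and the updated state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through
            %         X     - Input data
            % Outputs:
            %         Y     - Output of layer forward function
            %         state - Updated layer state

            % Use the training-time forward pass of the nested network.
            % As in predict, X feeds both unconnected network inputs.
            net = layer.Network;
            [Y,state] = forward(net,X,X);
        end
```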

Completed Layer

View the completed layer class file.

classdef residualBlockLayer < nnet.layer.Layer ...
        & nnet.layer.Formattable ...
        & nnet.layer.Acceleratable
    % Example custom residual block layer.


    properties (Learnable, State)
        % Nested dlnetwork objects with both learnable
        % parameters and state parameters.
    
        % Residual block.
        Network
    end
    
    methods
        function layer = residualBlockLayer(numFilters,NameValueArgs)
            % layer = residualBlockLayer(numFilters) creates a residual
            % block layer with the specified number of filters.
            %
            % layer = residualBlockLayer(numFilters,Name=Value) specifies
            % additional options using one or more name-value arguments:
            % 
            %     Stride                 - Stride of convolution operation 
            %                              (default 1)
            %
            %     IncludeSkipConvolution - Flag to include convolution in
            %                              skip connection
            %                              (default false)
            %
            %     Name                   - Layer name
            %                              (default "")
    
            % Parse input arguments.
            arguments
                numFilters
                NameValueArgs.Stride = 1
                NameValueArgs.IncludeSkipConvolution = false
                NameValueArgs.Name = ""
            end
    
            stride = NameValueArgs.Stride;
            includeSkipConvolution = NameValueArgs.IncludeSkipConvolution;
            name = NameValueArgs.Name;
    
            % Set layer name.
            layer.Name = name;
    
            % Set layer description.
            description = "Residual block with " + numFilters + " filters, stride " + stride;
            if includeSkipConvolution
                description = description + ", and skip convolution";
            end
            layer.Description = description;
            
            % Set layer type.
            layer.Type = "Residual Block";
    
            % Define nested network.
            net = dlnetwork;

            layers = [
                convolution2dLayer(3,numFilters,Padding="same",Stride=stride)
                batchNormalizationLayer
                reluLayer
                convolution2dLayer(3,numFilters,Padding="same")
                batchNormalizationLayer
                additionLayer(2,Name="add")
                reluLayer];
    
            net = addLayers(net, layers);
    
            % Add skip connection.
            if includeSkipConvolution
                layers = [
                    convolution2dLayer(1,numFilters,Stride=stride)
                    batchNormalizationLayer(Name="bnSkip")];
     
                net = addLayers(net,layers);
                net = connectLayers(net,"bnSkip","add/in2");  
            end 
    
            % Set Network property.
            layer.Network = net;
        end
        
        function [Y,state] = predict(layer, X)
            % Forward input data through the layer at prediction time and
            % output the result and state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through
            %         X     - Input data
            % Outputs:
            %         Y     - Output of layer forward function
            %         state - Layer state

            % Predict using network.
            net = layer.Network;
            [Y,state] = predict(net,X,X);
            
        end
    end
end
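To try the completed layer, you can place it in a layer array, create a dlnetwork object, and pass some data through the network. This is a minimal sketch that assumes residualBlockLayer.m is on the MATLAB path. The surrounding layers, the input size, and the random input data are illustrative choices.

```matlab
% Sketch: use two residual block layers in a small image network.
layers = [
    imageInputLayer([32 32 3],Normalization="none")
    convolution2dLayer(3,16,Padding="same")
    batchNormalizationLayer
    reluLayer
    residualBlockLayer(16,Name="res1")
    residualBlockLayer(32,Stride=2,IncludeSkipConvolution=true,Name="res2")
    globalAveragePooling2dLayer
    fullyConnectedLayer(10)
    softmaxLayer];

% Creating the dlnetwork also initializes the nested networks.
net = dlnetwork(layers);

% Pass a mini-batch of four random images through the network.
X = dlarray(rand(32,32,3,4,"single"),"SSCB");
Y = predict(net,X);

disp(size(Y))   % 10 4
disp(dims(Y))   % CB
```

The second block uses a stride of 2 and changes the number of channels from 16 to 32, so it needs the convolution in the skip connection for the two branches to have matching sizes at the addition layer.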

GPU Compatibility

If the layer forward functions fully support dlarray objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type gpuArray (Parallel Computing Toolbox).

Many MATLAB built-in functions support gpuArray (Parallel Computing Toolbox) and dlarray input arguments. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).

In this example, the MATLAB functions used in predict all support dlarray objects, so the layer is GPU compatible.
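One way to exercise the layer on a GPU is to convert the input data to a gpuArray before calling predict. The sketch below assumes that residualBlockLayer.m is on the MATLAB path and falls back to the CPU when no supported GPU is available. The small network and input sizes are illustrative.

```matlab
% Sketch: run a network containing the custom layer on a GPU if available.
layers = [
    imageInputLayer([32 32 3],Normalization="none")
    residualBlockLayer(8,IncludeSkipConvolution=true,Name="res1")
    globalAveragePooling2dLayer
    fullyConnectedLayer(10)
    softmaxLayer];

net = dlnetwork(layers);

X = dlarray(rand(32,32,3,4,"single"),"SSCB");

% Move the data to the GPU only when a supported device is available.
if canUseGPU
    X = gpuArray(X);
end

Y = predict(net,X);
disp(size(Y))   % 10 4
```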