How to create Custom Regression Output Layer with multiple inputs for training sequence-to-sequence LSTM model?

Question

0 个投票

For the neural network architecture I am using for my problem, I would like to define a Regression Output Layer with a custom loss function. For this, I would need the regression layer to have two inputs, one from a fully connected layer and other from the sequenceInput layer, however I am not able to achieve that. How do I get around this?

Following is the definition of the custom layer:

classdef customLossLayerMultiInput < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with mean-absolute-error loss and additional properties.
    
    properties
        node_properties
        numFeature
    end
    
    methods
        function layer = customLossLayerMultiInput(name, node_properties, numFeature)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for LSTM training';
            layer.node_properties = node_properties;
            layer.numFeature = numFeature;
        end
        
        function loss = forwardLoss(layer, Y, T, varargin)
            % Calculate the forward loss
            
            % Reshape predictions and targets
            Y = reshape(Y, [], 1);
            T = reshape(T, [], 1);
            X1 = varargin{1};
            X2 = varargin{2}; 
            % Sequence input data
            sequence_input_data = reshape(X1, [], layer.numFeature);
            % Calculate mean residue
            mean_residue = PI_BEM_Residue(Y, T, sequence_input_data, layer.node_properties);
            % Calculate RMSE loss
            rmse_loss = rmse(Y, T);
            % Total loss
            loss = mean_residue + rmse_loss;
        end
    end
end

And this is the network architecture

layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    customLossLayerMultiInput(LayerName, node_properties,numFeatures)
    ];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,"inputLayer",strcat(LayerName,'\in2'));

For this setup, I am getting an error

Error using nnet.cnn.LayerGraph>iValidateLayerName
Layer 'RegressionLayer_Node2\in2' does not exist.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Follow Question

Answer 1

Ben 2023-6-20

0 个投票

Unfortunately it's not possible to define a custom multi-input loss layer.

The possible options are:

If Y, X1 and X2 have compatible sizes you can concatenate them before customLossLayerMultiInput and pass these in as a single input to the loss.
Use dlnetwork and a custom training loop - in this case you can write a much more flexible loss function rather than a custom loss layer, however you need to write a training loop following an example like this.

7 个评论
显示 5更早的评论隐藏 5更早的评论

Shubham Baisthakur 2023-6-20

在 MATLAB Online 中打开

Hello Ben,

Thanks for your response. Luckily I could modify my data such that the sizes of Y, X1 and X2 will be compatible. Following is the code I am using to generate this network

    LayerName = strcat('RegressionLayer_Node',num2str(node_properties.nodenumber));
    
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    concatenationLayer(1,2,'Name','concat')
    customLossLayer(LayerName, node_properties,numFeatures,X_train);
    ];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,'inputLayer','concat/in2');
figure, plot(lgraph)

This is how the network architecture looks like

And I am training this network using following code:

    % Define the training options
    options = trainingOptions('adam', ...
        'MaxEpochs', 30, ...
        'MiniBatchSize', 1, ...
        'InitialLearnRate',x.learning_rate,...
        'SequenceLength', 'longest', ...
        'Shuffle', 'once', ...
        'L2Regularization',0.001,...
        'ValidationData',{X_val,Y_val}, ...
        'ValidationFrequency',10,...
        'Verbose',false,...
        'ExecutionEnvironment','auto');
    % Train the LSTM network
    net = trainNetwork(X_train, Y_train, lgraph, options);
    

Here the variable X_train and Y_train are a cell arrays of dimension Nx1. Each cell in X_train has a size 4x4000 and the corresponding cell in Y_train has a size 1x4000. I am trying to concatenate these arrays using the concatenation layer but during the training I am getting an error:

Error using trainNetwork
Invalid training data. For cell array input, responses must be an N-by-1 cell array of sequences, where N is the number of sequences. The spatial and
channel dimensions of the sequences must be the same as the output size of the last layer (5).

Would you please help me figure out how to tackle this?

Shubham Baisthakur 2023-6-21

编辑：Shubham Baisthakur 2023-6-21

在 MATLAB Online 中打开

Hey Ben,

As you can see in the network architecture, the input to the custom regression layer (RegressionLayer_Node2) is concatenation of the output of dropout layer and sequenece input layer, therfore the dimension of input to the custom regression layer is five (4 from input layer + 1 from drop out layer), where my target has dimension equal to one. I would need 4 input parameters to define the custom loss function, following is the definition of my custom regression layer:

classdef physicsInformedLossLayer < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with mean-absolute-error loss and additional properties.
    
    properties
        node_properties
    end
    
    methods
        function layer = physicsInformedLossLayer(name, node_properties)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for NN training';
            layer.node_properties = node_properties;
        end
        
        function loss = forwardLoss(layer, Y, T)
            % Calculate the forward loss
            
            Vx = Y(1);
            Vy = Y(2);
            Azimuth = Y(4);
            Y_op = Y(end);
                                    
            nodenumber = layer.node_properties.nodenumber;
            theta = layer.node_properties.theta;
            Airfoils = layer.node_properties.Airfoils;
            TipRad = layer.node_properties.TipRad;
            HubRad = layer.node_properties.HubRad;
            BlSpn = layer.node_properties.BlSpn;
            Solid = layer.node_properties.Solid;
            chi0 = layer.node_properties.chi0;
            phi = Y(ii,1)+theta;
            
            [Residue] =  CallStateResidual(phi, theta, Vx(ii,1), Vy(ii,1), nodenumber, Airfoils, TipRad, HubRad, BlSpn, Solid, Azimuth, chi0);
            loss = (Y_op-T)^2 + Residue;
        end
    end
end

The output of the regression layer still has the dimension equal to one. How do I make this work?

Shubham Baisthakur 2023-6-22

在 MATLAB Online 中打开

Hello Ben, thanks for all the suggestions so far! I agree that dlnetwork provides more flexibility but the training process is much slower and the convergence rate is not great. Following is the code I am using for defining custom training loop

layers = [featureInputLayer(size(X_train,2),'Name','fetureInput')
    fullyConnectedLayer(opt_param.numneuron1,'Name','fc1')
    fullyConnectedLayer(opt_param.numneuron2,'Name','fc2')
    fullyConnectedLayer(1,'Name','fc3')
    dropoutLayer(opt_param.dropout_rate)
    ];
    % Create a layer graph
    net = dlnetwork(layers);
    % Define the training options
    miniBatchSize = 30000;
    numEpochs = 50;
    numObservations = numel(Y_train);
    numIterationsPerEpoch = floor(numObservations./miniBatchSize);
    numIterations = numEpochs * numIterationsPerEpoch;
    % Adam parameters
    averageGrad = [];
    averageSqGrad = [];
    %Initialize training progress monitor
    monitor = trainingProgressMonitor(Metrics="Loss",Info="Epoch",XLabel="Iteration");
    % Network training
    iteration = 0;
    epoch = 0;
    while epoch < numEpochs && ~monitor.Stop
        epoch = epoch + 1;
    
        % Shuffle data.
        idx = randperm(numel(Y_train));
        X_epoch = X_train(idx,:);
        Y_epoch = Y_train(idx);
    
        i = 0;
        while i < numIterationsPerEpoch && ~monitor.Stop
            i = i + 1;
            iteration = iteration + 1;
    
            % Read mini-batch of data and convert the labels to dummy
            % variables.
            idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
            X = X_epoch(idx,:);
            T = Y_epoch(idx);
    
            % Convert mini-batch of data to a dlarray.
            X = dlarray(single(X));
            T = dlarray(single(T));
            node_properties_array = dlarray(node_properties_array);
    
            % Evaluate the model loss and gradients using dlfeval and the
            % modelLoss function.
            [loss,gradients] = dlfeval(@modelLoss,net,X,T,node_properties_array);
    
            % Update the network parameters using the Adam optimizer.
            [net,averageGrad,averageSqGrad] = adamupdate(net,gradients,averageGrad,averageSqGrad,iteration);
            % Update the training progress monitor.
            recordMetrics(monitor,iteration,Loss=loss);
            updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
            monitor.Progress = 100 * iteration/numIterations;
        end
    end
    Y_val_pred = predict(net, dlarray(X_val',"CB"));
    val_loss = mean(mean((Y_val' - Y_val_pred).^2));
    

And the modelLoss function is:

function [loss,gradients] = modelLoss(net,X,T,node_properties_array)
X = dlarray(X',"CB");
% Forward data through network.
[Y,~] = forward(net,X);
X = extractdata(X);
Y = extractdata(Y);
node_properties_array = extractdata(node_properties_array);
% % Initialize node properties
nodenumber = node_properties_array(1);
theta = node_properties_array(2);
TipRad = node_properties_array(3);
HubRad = node_properties_array(4);
BlSpn = node_properties_array(5);
Solid = node_properties_array(6);
chi0 = node_properties_array(7);
load Airfoils.mat
batch_size = numel(Y);
Residue = zeros(1,batch_size);
Residue = dlarray(Residue);
for ii = 1:batch_size
    Vx = X(1,ii);
    Vy = X(2,ii);
    Azimuth = X(4,ii);
    phi = Y(ii)+theta;
    input_array = [phi theta Vx Vy nodenumber TipRad HubRad BlSpn Solid Azimuth chi0];
    [Residue(1,ii)] =  CallStateResidual_ANN_mex(double(input_array), Airfoils);
end
% data_loss = mean((Y-T').^2);
physics_loss = mean(abs(Residue))/100;
% loss = sqrt(data_loss+physics_loss)
% Train the model with physics_loss alone
loss = sqrt(data_loss+physics_loss)
% Calculate gradients of loss with respect to learnable parameters.
gradients = dlgradient(loss,net.Learnables);
end

However, whenver I try to run this function using physics_loss alone, I get the following error:

Error using dlarray/dlgradient
Value to differentiate is not traced. It must be a traced real dlarray scalar. Use dlgradient inside a function called by dlfeval to trace the
variables.

I think this could be happening due to the fact that the matlab function I am using to calculate the Residue term uses double as input as Matlab code generation does not support dlarrays but if I don't use the mex version, the code is really slow in execution. I would really appreciate if you could help me find a solution to this.

Ben 2023-6-23

It's true that for dlgradient(loss,learnables) to work that loss must be computed using only dlarray methods on learnables, otherwise we can't compute the derivatives automatically.

Sice Residue depends on Y which is the output of forward(net,X) you can't use a MEX function on Y and still get a traced output that we can compute gradients of. The extractdata calls break the tracing that is used to compute automatic derivatives.

At a glance it looks like CallStateResidual_ANN_mex is not vectorized over the batch size, so that's the first thing I'd suggest, though it's hard to know if that's plausible since I can't see the implementation of that function.

After that note that dlaccelerate can help optimize modelLoss if it's being called multiple times, so long as the only inputs to modelLoss that vary frequently are dlarray-s.

It's worth noting that the computation in modelLoss would have to happen for trainNetwork too if this was allowed as a custom loss layer, so it's not really avoidable. I would expect dlaccelerate to make up for most of the difference between speed of trainNetwork and custom training loops. As for convergence, this should be the same if your custom training loop implements all the same things as your trainingOptions does in trainNetwork - note that trainingOptions includes non-zero L2Regularization by default.

Shubham Baisthakur 2023-6-23

Hello Ben,

Thanks for the suggestions, I didn't know about the dlaccelerate functionality. Vectorising the CallStateResidual_ANN would be a difficult task but I will give it a try. Hopefully this will improve the computation time.

Thanks again!

请先登录，再进行评论。

How to create Custom Regression Output Layer with multiple inputs for training sequence-to-sequence LSTM model?

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

7 个评论
显示 5更早的评论隐藏 5更早的评论

更多回答（0 个）

类别

产品

版本

标签

Community Treasure Hunt

How to create Custom Regression Output Layer with multiple inputs for training sequence-to-sequence LSTM model?

0 个评论 显示 -2更早的评论 隐藏 -2更早的评论

采纳的回答

7 个评论 显示 5更早的评论 隐藏 5更早的评论

更多回答（0 个）

类别

产品

版本

标签

另请参阅

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

7 个评论
显示 5更早的评论隐藏 5更早的评论