How to create a custom Regression Output Layer with multiple inputs for training a sequence-to-sequence LSTM model?

For the neural network architecture I am using for my problem, I would like to define a Regression Output Layer with a custom loss function. For this, I would need the regression layer to have two inputs, one from a fully connected layer and the other from the sequenceInputLayer; however, I am not able to achieve that. How do I get around this?
Following is the definition of the custom layer:
classdef customLossLayerMultiInput < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with a physics-informed loss (residual + RMSE) and additional properties.

    properties
        node_properties
        numFeature
    end

    methods
        function layer = customLossLayerMultiInput(name, node_properties, numFeature)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for LSTM training';
            layer.node_properties = node_properties;
            layer.numFeature = numFeature;
        end

        function loss = forwardLoss(layer, Y, T, varargin)
            % Calculate the forward loss

            % Reshape predictions and targets
            Y = reshape(Y, [], 1);
            T = reshape(T, [], 1);

            % Additional inputs (intended to come from the sequence input layer)
            X1 = varargin{1};
            X2 = varargin{2};

            % Sequence input data
            sequence_input_data = reshape(X1, [], layer.numFeature);

            % Calculate mean residue
            mean_residue = PI_BEM_Residue(Y, T, sequence_input_data, layer.node_properties);

            % Calculate RMSE loss
            rmse_loss = rmse(Y, T);

            % Total loss
            loss = mean_residue + rmse_loss;
        end
    end
end
And this is the network architecture
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer')                       % Sequence input layer
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer')  % LSTM layer
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer')  % LSTM layer
    ];
For this setup, I am getting the following error:
Error using nnet.cnn.LayerGraph>iValidateLayerName
Layer 'RegressionLayer_Node2\in2' does not exist.

Accepted Answer

Ben 2023-6-20
Unfortunately it's not possible to define a custom multi-input loss layer.
The possible options are:
  1. If Y, X1, and X2 have compatible sizes, you can concatenate them before customLossLayerMultiInput and pass them in as a single input to the loss.
  2. Use dlnetwork and a custom training loop. In this case you can write a much more flexible loss function rather than a custom loss layer; however, you need to write a training loop following an example like this (a minimal sketch of this option is included after this list).
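For reference, here is a minimal sketch of option 2, not a drop-in implementation. It reuses the names from the question (numFeatures, num_hidden_units, x.dropout_rate, node_properties, PI_BEM_Residue) and assumes the training sequences are stored in cell arrays XTrain/TTrain with one numFeatures-by-numTimeSteps matrix per sequence; numEpochs is a placeholder, and PI_BEM_Residue is assumed to use only dlarray-supported operations (see the comments below).

% Network without an output layer - the loss is computed in modelLoss below
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer')
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer')
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer')
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer')];
net = dlnetwork(layers);

% Custom training loop with Adam updates (one sequence per iteration for simplicity)
numEpochs = 100;                                  % placeholder value
averageGrad = []; averageSqGrad = []; iteration = 0;
for epoch = 1:numEpochs
    for i = 1:numel(XTrain)
        iteration = iteration + 1;
        X = dlarray(XTrain{i}, 'CT');             % numFeatures-by-numTimeSteps sequence
        T = dlarray(TTrain{i}, 'CT');             % 1-by-numTimeSteps target sequence
        % Evaluate the loss and gradients using automatic differentiation
        [loss, gradients] = dlfeval(@modelLoss, net, X, T, node_properties, numFeatures);
        % Update the learnable parameters
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
            averageGrad, averageSqGrad, iteration);
    end
end

function [loss, gradients] = modelLoss(net, X, T, node_properties, numFeatures)
    % Forward pass: Y is the network prediction and X is the raw sequence input,
    % so the loss can use both - which is what the multi-input loss layer was meant to do
    Y = forward(net, X);

    % Strip the dlarray dimension labels before reshaping
    Yv = reshape(stripdims(Y), [], 1);
    Tv = reshape(stripdims(T), [], 1);
    sequence_input_data = reshape(stripdims(X), [], numFeatures);

    % Physics-informed residual term (user-supplied function from the question;
    % it must use only dlarray-supported operations so dlgradient can trace it)
    mean_residue = PI_BEM_Residue(Yv, Tv, sequence_input_data, node_properties);

    % RMSE data-fit term
    rmse_loss = sqrt(mean((Yv - Tv).^2));

    % Total loss and its gradients with respect to the learnable parameters
    loss = mean_residue + rmse_loss;
    gradients = dlgradient(loss, net.Learnables);
end

modelLoss can be a local function at the end of the script or live in its own file; it performs the same computation the custom loss layer attempted, but it now receives both the network output Y and the raw sequence input X.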
  7 Comments
Ben 2023-6-23
It's true that for dlgradient(loss, learnables) to work, the loss must be computed using only dlarray methods applied to the learnables; otherwise the derivatives cannot be computed automatically.
Since Residue depends on Y, which is the output of forward(net, X), you can't apply a MEX function to Y and still get a traced output whose gradients can be computed. The extractdata calls break the tracing that is used to compute automatic derivatives.
At a glance it looks like CallStateResidual_ANN_mex is not vectorized over the batch size, so vectorizing it is the first thing I'd suggest, though it's hard to know whether that's feasible since I can't see the implementation of that function.
After that, note that dlaccelerate can help optimize modelLoss if it is called many times, as long as the only inputs to modelLoss that vary frequently are dlarray objects; a minimal sketch follows below.
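(A minimal sketch of that suggestion, assuming modelLoss and its arguments are the ones from the custom training loop sketch above:)

% Wrap the loss function once, before the training loop
accfun = dlaccelerate(@modelLoss);   % returns an AcceleratedFunction with a cached trace
clearCache(accfun)                   % clear stale traces, e.g. after editing modelLoss

% Inside the loop, call the accelerated function instead of @modelLoss
[loss, gradients] = dlfeval(accfun, net, X, T, node_properties, numFeatures);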
It's worth noting that the computation in modelLoss would have to happen inside trainNetwork too if this were allowed as a custom loss layer, so it's not really avoidable. I would expect dlaccelerate to make up for most of the difference in speed between trainNetwork and a custom training loop. As for convergence, this should be the same if your custom training loop implements all the same things your trainingOptions specify for trainNetwork; note that trainingOptions includes a non-zero L2Regularization by default (a short sketch of adding that term follows).
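(For completeness, a hedged sketch of adding the equivalent L2 term inside a custom training loop; it assumes the gradients and net variables from the sketch above and should be applied just before adamupdate:)

l2Regularization = 1e-4;   % trainingOptions default for L2Regularization
% Add the L2 penalty to the gradients of the weight parameters only
idx = net.Learnables.Parameter == "Weights";
gradients(idx,:) = dlupdate(@(g,w) g + l2Regularization*w, ...
    gradients(idx,:), net.Learnables(idx,:));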
Shubham Baisthakur 2023-6-23
Hello Ben,
Thanks for the suggestions; I didn't know about the dlaccelerate functionality. Vectorising CallStateResidual_ANN would be a difficult task, but I will give it a try. Hopefully this will improve the computation time.
Thanks again!


More Answers (0)
