Custom Training Loop Model Loss Functions
Training a deep neural network is an optimization task. By considering a deep learning model as a function f(X;θ), where X is the model input and θ is the set of learnable parameters, you can optimize θ so that it minimizes some loss value based on the training data. Typically, for a given input X with corresponding targets T, you optimize θ to minimize the error between the predictions Y=f(X;θ) and T. For example, for classification and regression tasks, you can use cross-entropy and mean squared error (MSE) loss, respectively.
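Written as a formula, the objective described above can be sketched as follows. The symbol L for the loss function and the learning rate η are notation introduced here for illustration; they do not appear in the text above.

```latex
\theta^{*} = \arg\min_{\theta} \; L\bigl(f(X;\theta),\,T\bigr),
\qquad
\theta \leftarrow \theta - \eta \,\nabla_{\theta} L\bigl(f(X;\theta),\,T\bigr)
```

The second expression is a plain gradient descent step, one common way to move θ toward a minimizer. Solvers such as Adam or SGDM refine this basic step.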
The trainnet function provides several built-in loss functions to use for training. You can use cross-entropy loss for classification and mean squared error loss for regression by specifying "crossentropy" and "mse" as the lossFcn argument, respectively. For example, to train a neural network using the trainnet function with cross-entropy loss, use
    net = trainnet(X,T,layers,"crossentropy",options);
For more information, see the lossFcn argument of the trainnet function.
When you train a deep learning model with a custom training loop, you minimize the loss by using the gradients of the loss with respect to the learnable parameters. To calculate these gradients using automatic differentiation, you must define a model loss function.
For an example showing how to train a deep learning model with a dlnetwork
object, see Train Network Using Custom Training Loop. For an example showing
how to train a deep learning model defined as a function, see Train Network Using Model Function.
Create Model Loss Function for Model Defined as dlnetwork Object
For a model specified as a dlnetwork object, create a function of the form [loss,gradients] = modelLoss(net,X,T), where net is the network, X is the network input, T contains the targets, and loss and gradients are the returned loss and gradients, respectively. Optionally, you can pass extra arguments to the model loss function (for example, if the loss function requires extra information), or return extra arguments (for example, the updated network state).
For example, this function returns the cross-entropy loss and the gradients of the loss with respect to the learnable parameters in the specified dlnetwork object net, given input data X, and targets T.
    function [loss,gradients] = modelLoss(net,X,T)
        % Forward data through the dlnetwork object.
        Y = forward(net,X);
        % Compute loss.
        loss = crossentropy(Y,T);
        % Compute gradients.
        gradients = dlgradient(loss,net.Learnables);
    end
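The optional extra input and output mentioned above could look like the following minimal sketch. The extra input l2Factor (a weight decay factor), the tiny network, and the synthetic data are all assumptions made for illustration. The sketch returns the network state as a third output and adds an L2 regularization term to the gradients.

```matlab
% Small network and synthetic data (sizes are illustrative assumptions).
layers = [featureInputLayer(4); fullyConnectedLayer(3); softmaxLayer];
net = dlnetwork(layers);
X = dlarray(rand(4,8,"single"),"CB");   % 4 features, 8 observations
T = eye(3,"single");
T = T(:,randi(3,[1 8]));                % one-hot targets, 3-by-8

l2Factor = 1e-4;                        % hypothetical extra argument

% Extra inputs are passed after the function handle; extra outputs are returned as usual.
[loss,gradients,state] = dlfeval(@modelLoss,net,X,T,l2Factor);

% Store the returned state in the network.
net.State = state;

function [loss,gradients,state] = modelLoss(net,X,T,l2Factor)
    % Forward data and also return the updated network state.
    [Y,state] = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
    % Add the gradient of an L2 penalty (weight decay) to each gradient.
    gradients = dlupdate(@(g,w) g + l2Factor*w,gradients,net.Learnables);
end
```

For this network the state is empty, but the same pattern applies to networks with stateful layers such as batch normalization.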
For an example showing how to train a neural network using a custom training loop, see Train Network Using Custom Training Loop.
To speed up training, you can accelerate your custom loss function using the dlaccelerate function. For example,
    accLossFcn = dlaccelerate(@modelLoss);
Not all deep learning functions fully support acceleration. For more information, see Deep Learning Function Acceleration.
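As a minimal sketch of how the accelerated function might be used, the following evaluates it with dlfeval in place of the original function handle. The small network and random data are assumptions made only so that the snippet is self-contained.

```matlab
% Small network and synthetic data (sizes are illustrative assumptions).
layers = [featureInputLayer(4); fullyConnectedLayer(3); softmaxLayer];
net = dlnetwork(layers);
X = dlarray(rand(4,8,"single"),"CB");
T = eye(3,"single");
T = T(:,randi(3,[1 8]));

% Accelerate the model loss function and clear any previously cached traces.
accLossFcn = dlaccelerate(@modelLoss);
clearCache(accLossFcn);

% Evaluate the accelerated function with dlfeval, exactly like the original handle.
[loss,gradients] = dlfeval(accLossFcn,net,X,T);

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

The benefit generally appears over repeated calls with inputs of the same size and format, such as the iterations of a training loop.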
Create Model Loss Function for Model Defined as Function
For a model specified as a function, create a function of the form [loss,gradients] = modelLoss(parameters,X,T), where parameters contains the learnable parameters, X is the model input, T contains the targets, and loss and gradients are the returned loss and gradients, respectively. Optionally, you can pass extra arguments to the model loss function (for example, if the loss function requires extra information), or return extra arguments (for example, the updated model state).
For example, to compute the model loss and gradients for a model specified by the function model and learnable parameters parameters, use:
    function [loss,gradients,state] = modelLoss(parameters,X,T)
        [Y,state] = model(parameters,X);
        loss = crossentropy(Y,T);
        gradients = dlgradient(loss,parameters);
    end
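To make the pieces above concrete, here is a minimal sketch of a model function and its parameter structure. The single fully connected layer, the field names fc.Weights and fc.Bias, and the data sizes are assumptions chosen for illustration, not a prescribed layout.

```matlab
% Learnable parameters stored in a structure (field names are hypothetical).
parameters.fc.Weights = dlarray(0.01*randn(3,4,"single"));
parameters.fc.Bias = dlarray(zeros(3,1,"single"));

% Synthetic data: 4 features, 8 observations, 3 classes.
X = dlarray(rand(4,8,"single"),"CB");
T = eye(3,"single");
T = T(:,randi(3,[1 8]));

[loss,gradients,state] = dlfeval(@modelLoss,parameters,X,T);

function [loss,gradients,state] = modelLoss(parameters,X,T)
    [Y,state] = model(parameters,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,parameters);
end

function [Y,state] = model(parameters,X)
    % One fully connected operation followed by softmax.
    Y = fullyconnect(X,parameters.fc.Weights,parameters.fc.Bias);
    Y = softmax(Y);
    % This simple model has no state; return an empty structure.
    state = struct;
end
```

The returned gradients structure has the same fields as parameters, so it can be passed directly to an update function.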
For an example showing how to train a deep learning model defined as a function using a custom training loop, see Train Network Using Model Function.
To speed up training, you can accelerate your custom loss function using the dlaccelerate function. For example,
    accLossFcn = dlaccelerate(@modelLoss);
Not all deep learning functions fully support acceleration. For more information, see Deep Learning Function Acceleration.
Functions for Building Custom Loss Functions
To help create a custom loss function, you can use the deep learning functions in this table.
| Function | Description |
|---|---|
| softmax | The softmax activation operation applies the softmax function to the channel dimension of the input data. |
| sigmoid | The sigmoid activation operation applies the sigmoid function to the input data. |
| crossentropy | The cross-entropy operation computes the cross-entropy loss between network predictions and binary or one-hot encoded targets for single-label and multi-label classification tasks. |
| indexcrossentropy | The index cross-entropy operation computes the cross-entropy loss between network predictions and targets specified as integer class indices for single-label classification tasks. |
| l1loss | The L1 loss operation computes the L1 loss given network predictions and target values. When the Reduction option is "sum" and the NormalizationFactor option is "batch-size", the computed value is known as the mean absolute error (MAE). |
| l2loss | The L2 loss operation computes the L2 loss (based on the squared L2 norm) given network predictions and target values. When the Reduction option is "sum" and the NormalizationFactor option is "batch-size", the computed value is known as the mean squared error (MSE). |
| huber | The Huber operation computes the Huber loss between network predictions and target values for regression tasks. When the TransitionPoint option is 1, this is also known as smooth L1 loss. |
| ctc | The CTC operation computes the connectionist temporal classification (CTC) loss between unaligned sequences. |
| mse | The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks. |
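As a minimal sketch of building a custom loss from the functions in the table, the following blends the L1 and L2 losses for a regression task. The blending factor alpha and the synthetic predictions and targets are assumptions made for illustration.

```matlab
% Synthetic predictions and targets: 1 response, 16 observations.
Y = dlarray(rand(1,16,"single"),"CB");
T = rand(1,16,"single");

% Individual loss terms computed with the default options.
lossL1 = l1loss(Y,T);
lossL2 = l2loss(Y,T);

% Hypothetical blending factor between the two terms.
alpha = 0.5;
loss = alpha*lossL1 + (1-alpha)*lossL2;

% For comparison, the Huber loss with a transition point of 1.
lossHuber = huber(Y,T,TransitionPoint=1);
```

Because each term is computed with dlarray operations, a loss assembled this way can be used inside a model loss function and differentiated with dlgradient.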
Evaluate Model Loss Function
To evaluate the model loss function using automatic differentiation, use the dlfeval
function, which evaluates a function with automatic differentiation enabled. For the first
input of dlfeval, pass the model loss function specified as a function
handle. For the following inputs, pass the required variables for the model loss function.
For the outputs of the dlfeval function, specify the same outputs as
the model loss function.
For example, evaluate the model loss function modelLoss with a dlnetwork object net, input data X, and targets T, and return the model loss and gradients.
    [loss,gradients] = dlfeval(@modelLoss,net,X,T);
Similarly, evaluate the model loss function modelLoss using a model function with learnable parameters specified by the structure parameters, input data X, and targets T, and return the model loss and gradients.
    [loss,gradients] = dlfeval(@modelLoss,parameters,X,T);
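To see the mechanics of dlfeval in isolation, consider this small sketch with a function whose gradient is easy to verify by hand. The function name sumOfSquares is hypothetical.

```matlab
% Input as a dlarray so that operations on it can be traced.
x = dlarray([1 2 3]);

% Evaluate the function with automatic differentiation enabled.
[y,dydx] = dlfeval(@sumOfSquares,x);

disp(extractdata(y))      % 14
disp(extractdata(dydx))   % 2 4 6

function [y,dydx] = sumOfSquares(x)
    % y = x1^2 + x2^2 + x3^2, so dy/dx = 2*x.
    y = sum(x.^2,"all");
    dydx = dlgradient(y,x);
end
```

The outputs requested from dlfeval match the outputs of the function being evaluated, which is the same rule that applies to a model loss function.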
Update Learnable Parameters Using Gradients
To update the learnable parameters, you can use these functions.
| Function | Description |
|---|---|
| adamupdate | Update parameters using adaptive moment estimation (Adam) |
| rmspropupdate | Update parameters using root mean squared propagation (RMSProp) |
| sgdmupdate | Update parameters using stochastic gradient descent with momentum (SGDM) |
| lbfgsupdate | Update parameters using limited-memory BFGS (L-BFGS) |
| dlupdate | Update parameters using custom function |
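As a minimal sketch of two entries in this table, the following applies a plain gradient descent step with dlupdate and a momentum step with sgdmupdate. The learning rate, momentum value, network, and data are assumptions made for illustration.

```matlab
% Small network and synthetic data (sizes are illustrative assumptions).
layers = [featureInputLayer(4); fullyConnectedLayer(3); softmaxLayer];
net = dlnetwork(layers);
X = dlarray(rand(4,8,"single"),"CB");
T = eye(3,"single");
T = T(:,randi(3,[1 8]));

[loss,gradients] = dlfeval(@modelLoss,net,X,T);

learnRate = 0.01;   % hypothetical value
momentum = 0.9;     % hypothetical value

% Custom update: plain gradient descent applied to every learnable parameter.
sgdStep = @(w,g) w - learnRate*g;
netSgd = dlupdate(sgdStep,net,gradients);

% Built-in update: SGDM, starting with an empty velocity.
vel = [];
[netMomentum,vel] = sgdmupdate(net,gradients,vel,learnRate,momentum);

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

The dlupdate route is useful when the update rule is not one of the built-in solvers.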
For example, update the learnable parameters of a dlnetwork object net using the adamupdate function.
    [net,trailingAvg,trailingAvgSq] = adamupdate(net,gradients, ...
        trailingAvg,trailingAvgSq,iteration);
Here, gradients contains the gradients of the loss with respect to the learnable parameters, trailingAvg and trailingAvgSq are the moving averages of the gradients and the squared gradients that the adamupdate function maintains between calls, and iteration is the iteration number. Initialize trailingAvg and trailingAvgSq as empty arrays before the first update.
Similarly, update the learnable parameters parameters of a model function using the adamupdate function.
    [parameters,trailingAvg,trailingAvgSq] = adamupdate(parameters,gradients, ...
        trailingAvg,trailingAvgSq,iteration);
The remaining arguments have the same meaning as in the dlnetwork case.
Use Model Loss Function in Custom Training Loop
When training a deep learning model using a custom training loop, evaluate the model loss and gradients and update the learnable parameters for each mini-batch.
This code snippet shows an example of using the dlfeval and
adamupdate functions in a custom training loop.
    iteration = 0;
    % Loop over epochs.
    for epoch = 1:numEpochs
        % Loop over mini-batches.
        for i = 1:numIterationsPerEpoch
            iteration = iteration + 1;
            % Prepare mini-batch.
            % ...
            % Evaluate model loss and gradients.
            [loss,gradients] = dlfeval(@modelLoss,net,X,T);
            % Update learnable parameters of the network.
            [net,trailingAvg,trailingAvgSq] = adamupdate(net,gradients, ...
                trailingAvg,trailingAvgSq,iteration);
        end
    end
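The skeleton above leaves the data preparation open. A complete, minimal sketch that fills it in with synthetic data might look like this. The data, network, mini-batch size, and number of epochs are all assumptions, and the random targets mean the loss is not expected to decrease meaningfully.

```matlab
% Synthetic classification data (illustrative assumptions).
numObservations = 128;
numFeatures = 4;
numClasses = 3;
X = rand(numFeatures,numObservations,"single");
labels = randi(numClasses,[1 numObservations]);
I = eye(numClasses,"single");
T = I(:,labels);                         % one-hot targets

net = dlnetwork([featureInputLayer(numFeatures)
    fullyConnectedLayer(numClasses)
    softmaxLayer]);

numEpochs = 5;
miniBatchSize = 32;
numIterationsPerEpoch = floor(numObservations/miniBatchSize);

% Adam state starts empty.
trailingAvg = [];
trailingAvgSq = [];
iteration = 0;

for epoch = 1:numEpochs
    idx = randperm(numObservations);     % shuffle each epoch
    for i = 1:numIterationsPerEpoch
        iteration = iteration + 1;

        % Prepare mini-batch.
        batchIdx = idx((i-1)*miniBatchSize+1:i*miniBatchSize);
        XBatch = dlarray(X(:,batchIdx),"CB");
        TBatch = T(:,batchIdx);

        % Evaluate model loss and gradients.
        [loss,gradients] = dlfeval(@modelLoss,net,XBatch,TBatch);

        % Update learnable parameters.
        [net,trailingAvg,trailingAvgSq] = adamupdate(net,gradients, ...
            trailingAvg,trailingAvgSq,iteration);
    end
    fprintf("Epoch %d, loss %.4f\n",epoch,extractdata(loss));
end

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```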
For an example showing how to train a deep learning model with a
dlnetwork object, see Train Network Using Custom Training Loop. For an example
showing how to train a deep learning model defined as a function, see Train Network Using Model Function.
Debug Model Loss Functions
If the implementation of the model loss function has an issue, then the call to
dlfeval can throw an error. Sometimes, when you use the
dlfeval function, it is not clear which line of code is
throwing the error. To help locate the error, you can try the following.
Call Model Loss Function Directly
Try calling the model loss function directly (that is, without using the
dlfeval function) with generated inputs of the expected
sizes. If any of the lines of code throw an error, then the error message provides
extra detail. Note that when you do not use the dlfeval
function, any calls to the dlgradient function throw an
error.
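    % Generate image input data. The "SSCB" format (spatial, spatial,
    % channel, batch) is an assumption; adjust it to match your network.
    X = rand([28 28 1 100],'single');
    X = dlarray(X,"SSCB");
    % Generate one-hot encoded target data.
    T = repmat(eye(10,'single'),[1 10]);
    % Call the model loss function directly, without dlfeval.
    [loss,gradients] = modelLoss(net,X,T);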
Run Model Loss Code Manually
Run the code inside the model loss function manually with generated inputs of the expected sizes and inspect the output and any thrown error messages.
For example, consider the following model loss function.
    function [loss,gradients] = modelLoss(net,X,T)
        % Forward data through the dlnetwork object.
        Y = forward(net,X);
        % Compute loss.
        loss = crossentropy(Y,T);
        % Compute gradients.
        gradients = dlgradient(loss,net.Learnables);
    end
Check the model loss function by running the following code.
    % Generate image input data. The "SSCB" format (spatial, spatial,
    % channel, batch) is an assumption; adjust it to match your network.
    X = rand([28 28 1 100],'single');
    X = dlarray(X,"SSCB");
    % Generate one-hot encoded target data.
    T = repmat(eye(10,'single'),[1 10]);
    % Check forward pass.
    Y = forward(net,X);
    % Check loss calculation.
    loss = crossentropy(Y,T)
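Two further checks can complement the manual run: comparing the size and format of the predictions with the targets, and wrapping the dlfeval call in a try/catch block to print the full error report. The following is a minimal sketch of both, using a small hypothetical network and synthetic data so that it is self-contained.

```matlab
% Small network and synthetic data (sizes are illustrative assumptions).
layers = [featureInputLayer(4); fullyConnectedLayer(3); softmaxLayer];
net = dlnetwork(layers);
X = dlarray(rand(4,8,"single"),"CB");
T = eye(3,"single");
T = T(:,randi(3,[1 8]));

% Inspect the format and size of the predictions.
Y = forward(net,X);
disp(dims(Y))
disp(size(Y))
assert(isequal(size(Y),size(T)),"Prediction and target sizes differ.")

% Print the extended error report if the model loss function fails.
try
    [loss,gradients] = dlfeval(@modelLoss,net,X,T);
catch err
    disp(getReport(err,"extended"))
end

function [loss,gradients] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

The extended report includes the call stack, which can help identify the line inside the model loss function that raised the error.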
See Also
Topics
- Custom Loss Functions
- Custom Training Loops
- Train Network Using Custom Training Loop
- Train Network Using Model Function
- Specify Training Options in Custom Training Loop
- Update Batch Normalization Statistics in Custom Training Loop
- Make Predictions Using dlnetwork Object
- List of Functions with dlarray Support