Set Up Parameters and Train Convolutional Neural Network
To specify the training options for the trainnet function, use the trainingOptions function. Pass the resulting options object to the trainnet function.
For example, to create a training options object that specifies:
Train using the adaptive moment estimation (Adam) solver.
Train for at most four epochs.
Monitor the training progress in a plot and monitor the accuracy metric.
Disable the verbose output.
use:
options = trainingOptions("adam", ...
    MaxEpochs=4, ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);
To train the network using these training options, use:
net = trainnet(data,layers,lossFcn,options);
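For example, the complete workflow might look like the following sketch. The data names are hypothetical: it assumes XTrain is a 28-by-28-by-1-by-N array of grayscale images, TTrain is a categorical vector of labels, and the network has 10 classes.

% Minimal sketch of the full workflow (XTrain, TTrain, and the layer
% sizes are illustrative assumptions, not values from this topic).
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,16,Padding="same")
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer];

options = trainingOptions("adam", ...
    MaxEpochs=4, ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);

% Train for classification using cross-entropy loss.
net = trainnet(XTrain,TTrain,layers,"crossentropy",options);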
Note
This topic outlines some commonly used training options. The options listed here are only a subset. For a complete list, see trainingOptions.
Solvers
The solver is the algorithm that the training function uses to optimize the learnable parameters. Specify the solver using the first argument of the trainingOptions function. For example, to create a training options object with the default settings for the Adam optimizer, use:
options = trainingOptions("adam");
For more information, see trainingOptions.
Tip
Different solvers work better for different tasks. The Adam solver is often a good optimizer to try first.
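If Adam does not converge well for your task, stochastic gradient descent with momentum is a common alternative. This sketch selects it with an illustrative momentum value and learning rate:

% Sketch: select the SGDM solver and set its momentum term.
options = trainingOptions("sgdm", ...
    Momentum=0.95, ...
    InitialLearnRate=0.01);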
Monitoring Options
To monitor the training progress, you can display training metrics in a plot. For example, to monitor the accuracy in a plot and disable the verbose output, use:
options = trainingOptions("adam", ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);
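You can also track more than one metric at a time by passing a string array of metric names; as a sketch, assuming "fscore" is among the built-in metrics available in your release:

% Sketch: monitor several metrics during training.
options = trainingOptions("adam", ...
    Plots="training-progress", ...
    Metrics=["accuracy" "fscore"], ...
    Verbose=false);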
For more information, see trainingOptions.
Data Format Options
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.
In most cases, you can pass your training data directly to the network. If your data has a different layout from what the network expects, then you can specify the layout of the data using data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
"S"
— Spatial"C"
— Channel"B"
— Batch"T"
— Time"U"
— Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
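To make this concrete, the sketch below creates a random array in that layout and then attaches the format directly to the data as a formatted dlarray; the sizes are purely illustrative.

% Sketch: a batch of sequences with 12 channels, 8 observations, and
% 100 time steps, stored in channel-by-batch-by-time order.
X = rand(12,8,100);

% Attach the format to the data as a formatted dlarray.
X = dlarray(X,"CBT");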
If you have data that has a different layout from what the network expects, it is usually easier to provide data format information than to reshape and preprocess your data. For example, to specify that you have sequence data, where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, specify that the input data has format "CBT" (channel, batch, time) using:
options = trainingOptions("adam", ...
    InputDataFormats="CBT");
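If your targets are also in a nonstandard layout, you can describe them in the same way using the TargetDataFormats option; a sketch:

% Sketch: describe both the input layout and the target layout.
options = trainingOptions("adam", ...
    InputDataFormats="CBT", ...
    TargetDataFormats="CBT");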
For more information, see trainingOptions.
Stochastic Solver Options
Stochastic solvers train neural networks by iterating over mini-batches of data and updating the neural network learnable parameters. You can specify stochastic solver options that control the mini-batches, epochs (full passes of the training data), learning rate, and other solver-specific settings such as momentum for the stochastic gradient descent with momentum (SGDM) solver. For example, to specify a mini-batch size of 16 with an initial learning rate of 0.01, use:
options = trainingOptions("adam", ...
    MiniBatchSize=16, ...
    InitialLearnRate=0.01);
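Stochastic solvers also support learning rate schedules. For example, this sketch drops the learning rate by a factor of 10 every 10 epochs; the values are illustrative.

% Sketch: piecewise learning rate schedule.
options = trainingOptions("adam", ...
    InitialLearnRate=0.01, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropFactor=0.1, ...
    LearnRateDropPeriod=10);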
For more information, see trainingOptions.
Tip
If the mini-batch loss during training ever becomes NaN, then the learning rate is likely too high. Try reducing the learning rate, for example by a factor of 3, and then restart network training.
L-BFGS Solver Options
The limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) solver is a full-batch solver, which means that it processes the entire training set in a single iteration. You can specify L-BFGS solver options that control the iterations (full passes of the training data), line search, and other solver-specific settings. For example, to train for at most 2000 iterations using L-BFGS and use a line search that finds a learning rate satisfying the strong Wolfe conditions, use:
options = trainingOptions("lbfgs", ...
    MaxIterations=2000, ...
    LineSearchMethod="strong-wolfe");
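You can also control when the solver stops by setting tolerances on the gradient and the step size; this sketch uses illustrative tolerance values.

% Sketch: stop early once the gradient or the step becomes very small.
options = trainingOptions("lbfgs", ...
    MaxIterations=2000, ...
    GradientTolerance=1e-5, ...
    StepTolerance=1e-5);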
For more information, see trainingOptions.
Validation Options
You can monitor training progress using a held-out validation data set. Performing validation at regular intervals during training helps you to determine if your network is overfitting to the training data. To check if your network is overfitting, compare the training metrics to the corresponding validation metrics. If the training metrics are significantly better than the validation metrics, then the network could be overfitting. For example, to specify a validation data set and validate the network every 100 iterations, use:
options = trainingOptions("adam", ...
    ValidationData={XValidation,TValidation}, ...
    ValidationFrequency=100);
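You can also stop training automatically when the validation loss stops improving by setting a validation patience; the patience value in this sketch is illustrative.

% Sketch: stop training if the validation loss does not improve
% for 5 consecutive validations.
options = trainingOptions("adam", ...
    ValidationData={XValidation,TValidation}, ...
    ValidationFrequency=100, ...
    ValidationPatience=5);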
If your network has layers that behave differently during prediction than during training (for example, dropout layers), then the validation metrics can be better than the training metrics.
For more information, see trainingOptions.
Regularization and Normalization Options
You can prevent overfitting and improve convergence using regularization and normalization. Regularization can help prevent overfitting by adding a penalty term to the loss function. Normalization can improve convergence and stability by scaling input data to a standard range. For example, to specify an L2 regularization factor of 0.0002, use:
options = trainingOptions("adam", ...
    L2Regularization=0.0002);
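Input normalization itself is usually specified on the network's input layer rather than in the training options. This sketch z-scores image inputs and combines that with L2 regularization; the layer sizes are illustrative.

% Sketch: normalize inputs in the input layer and regularize the weights.
layers = [
    imageInputLayer([28 28 1],Normalization="zscore")
    fullyConnectedLayer(10)
    softmaxLayer];

options = trainingOptions("adam", ...
    L2Regularization=0.0002);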
For more information, see trainingOptions.
Gradient Clipping Options
To prevent large gradients from introducing errors in the training process, you can limit their magnitude. For example, to clip the value of each partial derivative so that its magnitude does not exceed 2, use:
options = trainingOptions("adam", ...
    GradientThresholdMethod="absolute-value", ...
    GradientThreshold=2);
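The "absolute-value" method clips each partial derivative independently. To instead rescale the gradient as a whole when its norm exceeds the threshold, you can use a norm-based method; a sketch:

% Sketch: rescale the gradient of all learnable parameters together
% when its global L2 norm exceeds 2.
options = trainingOptions("adam", ...
    GradientThresholdMethod="global-l2norm", ...
    GradientThreshold=2);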
For more information, see trainingOptions.
Sequence Options
Training a neural network usually requires data with fixed sizes, for example, sequences with the same number of channels and time steps. To transform batches of sequences so that the sequences have the same length, you can specify padding and truncation options. For example, to left-pad mini-batches so that the sequences in each mini-batch have the same length, use:
options = trainingOptions("adam", ...
    SequencePaddingDirection="left");
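You can also control the common length itself using the SequenceLength option. For example, this sketch truncates every mini-batch to the length of its shortest sequence, so that no padding is added.

% Sketch: truncate sequences instead of padding them.
options = trainingOptions("adam", ...
    SequenceLength="shortest");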
For more information, see trainingOptions.
Hardware and Acceleration Options
The software, by default, trains using a supported GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™ license. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). You can specify additional hardware and acceleration options. For example, to specify to use multiple GPUs on one machine, using a local parallel pool based on your default cluster profile, use:
options = trainingOptions("adam", ...
    ExecutionEnvironment="multi-gpu");
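If you do not have a supported GPU, or you want to force training onto a specific device, you can set the execution environment explicitly; for example, to train on the CPU:

% Sketch: force training on the CPU.
options = trainingOptions("adam", ...
    ExecutionEnvironment="cpu");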
For more information, see trainingOptions.
Checkpoint Options
For large networks and large datasets, training can take a long time to run. To periodically save the network during training, you can save checkpoint networks. For example, to save a checkpoint network every 5 epochs in the folder named "checkpoints", use:
options = trainingOptions("adam", ...
    CheckpointPath="checkpoints", ...
    CheckpointFrequency=5);
If the training is interrupted for some reason, you can resume training from the last saved checkpoint neural network.
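For example, you might resume training from a checkpoint as in this sketch. The file name is hypothetical, and the sketch assumes the checkpoint MAT-file stores the network in a variable named net.

% Sketch: resume training from a saved checkpoint network.
% (The file name is hypothetical; the variable name "net" is an assumption.)
checkpoint = load("checkpoints/net_checkpoint__100.mat","net");
net = trainnet(data,checkpoint.net,lossFcn,options);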
For more information, see trainingOptions.
See Also
trainnet | trainingOptions | dlnetwork