Train Convolutional Neural Network for Regression
This example shows how to train a convolutional neural network to predict the angles of rotation of handwritten digits.
Regression tasks involve predicting continuous numerical values instead of discrete class labels. This example constructs a convolutional neural network architecture for regression, trains the network, and then uses the trained network to predict the angles of rotated handwritten digits.
This diagram illustrates the flow of image data through a regression neural network.
Load Data
The data set contains synthetic images of handwritten digits together with the corresponding angles (in degrees) by which each image is rotated.
Load the training and test data from the MAT files DigitsDataTrain.mat and DigitsDataTest.mat, respectively. The variables anglesTrain and anglesTest are the rotation angles in degrees. The training and test data sets each contain 5000 images.
load DigitsDataTrain
load DigitsDataTest
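To confirm that the variables loaded as expected, you can list their sizes and types. This quick check assumes the MAT files contain the variables used in the rest of the example.

whos XTrain anglesTrain XTest anglesTest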
Display some of the training images.
numObservations = size(XTrain,4);
idx = randperm(numObservations,49);
I = imtile(XTrain(:,:,:,idx));
figure
imshow(I);
Partition XTrain and anglesTrain into training and validation partitions using the trainingPartitions function, attached to this example as a supporting file. To access this function, open the example as a live script. Set aside 15% of the training data for validation.
[idxTrain,idxValidation] = trainingPartitions(numObservations,[0.85 0.15]);
XValidation = XTrain(:,:,:,idxValidation);
anglesValidation = anglesTrain(idxValidation);
XTrain = XTrain(:,:,:,idxTrain);
anglesTrain = anglesTrain(idxTrain);
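The trainingPartitions supporting file is not included in this listing. Purely for illustration, and assuming only that the helper returns one vector of disjoint, randomly permuted indices per requested proportion, a minimal sketch might look like this (the actual supporting file may differ):

function varargout = trainingPartitions(numObservations,splits)
% Hypothetical sketch of the supporting file: split the indices
% 1:numObservations into disjoint random partitions with the
% proportions given in splits (for example, [0.85 0.15]).
idx = randperm(numObservations);
edges = [0 floor(cumsum(splits)*numObservations)];
edges(end) = numObservations;  % assign any remainder to the last partition
varargout = cell(1,numel(splits));
for k = 1:numel(splits)
    varargout{k} = idx(edges(k)+1:edges(k+1));
end
end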
Check Data Normalization
When training neural networks, it often helps to make sure that your data is normalized in all stages of the network. Normalization helps stabilize and speed up network training using gradient descent. If your data is poorly scaled, then the loss can become NaN and the network parameters can diverge during training. Common ways of normalizing data include rescaling the data so that its range becomes [0,1] or so that it has a mean of zero and standard deviation of one. You can normalize the following data:
Input data. Normalize the predictors before you input them to the network. In this example, the input images are already normalized to the range [0,1].
Layer outputs. You can normalize the outputs of each convolutional and fully connected layer by using a batch normalization layer.
Responses. If you use batch normalization layers to normalize the layer outputs at the end of the network, then the predictions of the network are normalized when training starts. If the response has a very different scale from these predictions, then network training can fail to converge. If your response is poorly scaled, then try normalizing it and see if network training improves. If you normalize the response before training, then you must transform the predictions of the trained network to obtain the predictions for the original response, as sketched below.
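A minimal sketch of this pattern, using z-score normalization (not needed for this data set, because the angles are already well scaled):

muAngles = mean(anglesTrain);
sigmaAngles = std(anglesTrain);
anglesTrainNorm = (anglesTrain - muAngles)/sigmaAngles;
% ... train the network on anglesTrainNorm instead of anglesTrain ...
% YNorm = minibatchpredict(net,XTest);
% Y = YNorm*sigmaAngles + muAngles;  % predictions in the original units (degrees)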
Plot the distribution of the response. The response (the rotation angle in degrees) is approximately uniformly distributed between -45 and 45, which works well without needing normalization. In classification problems, the outputs are class probabilities, which are always normalized.
figure
histogram(anglesTrain)
axis tight
ylabel("Counts")
xlabel("Rotation Angle")
In general, the data does not have to be exactly normalized. However, if you train the network in this example to predict 100*anglesTrain or anglesTrain+500 instead of anglesTrain, then the loss becomes NaN and the network parameters diverge when training starts. These results occur even though the only difference between a network predicting aY + b and a network predicting Y is a simple rescaling of the weights and biases of the final fully connected layer.
If the distribution of the input or response is very uneven or skewed, you can also apply nonlinear transformations (for example, taking logarithms) to the data before training the network.
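For example, a hedged sketch for a strictly positive, right-skewed response (not the case for this data set, where the angles can be negative):

responsesLog = log(responsesTrain);  % responsesTrain is a hypothetical positive-valued response
% ... train the network on responsesLog ...
% Y = exp(minibatchpredict(net,XTest));  % transform predictions back to the original scale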
Define Neural Network Architecture
Define the neural network architecture.
For image input, specify an image input layer.
Specify four convolution-batchnorm-ReLU blocks with increasing numbers of filters.
Between each block, specify an average pooling layer with 2-by-2 pooling regions and a stride of 2.
At the end of the network, include a fully connected layer with an output size that matches the number of responses.
numResponses = 1;

layers = [
    imageInputLayer([28 28 1])

    convolution2dLayer(3,8,Padding="same")
    batchNormalizationLayer
    reluLayer

    averagePooling2dLayer(2,Stride=2)

    convolution2dLayer(3,16,Padding="same")
    batchNormalizationLayer
    reluLayer

    averagePooling2dLayer(2,Stride=2)

    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer
    reluLayer

    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(numResponses)];
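Before training, you can optionally inspect the layer array, including the activation sizes at each layer, in the Deep Learning Network Analyzer:

analyzeNetwork(layers)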
Specify Training Options
Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.
Set the initial learn rate to 0.001 and lower the learning rate after 20 epochs.
Monitor the network accuracy during training by specifying validation data and validation frequency. The software trains the network on the training data and calculates the accuracy on the validation data at regular intervals during training. The validation data is not used to update the network weights.
Display the training progress in a plot and monitor the root mean squared error.
Disable the verbose output.
miniBatchSize = 128;
validationFrequency = floor(numel(anglesTrain)/miniBatchSize);

options = trainingOptions("sgdm", ...
    MiniBatchSize=miniBatchSize, ...
    InitialLearnRate=1e-3, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropFactor=0.1, ...
    LearnRateDropPeriod=20, ...
    Shuffle="every-epoch", ...
    ValidationData={XValidation,anglesValidation}, ...
    ValidationFrequency=validationFrequency, ...
    Plots="training-progress", ...
    Metrics="rmse", ...
    Verbose=false);
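With 15% of the 5000 training images set aside for validation, the training set contains 4250 images, so validationFrequency = floor(4250/128) = 33 iterations, which corresponds to validating roughly once per epoch.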
Train Neural Network
Train the neural network using the trainnet function. For regression, use mean squared error loss. By default, the trainnet function uses a GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the function uses the CPU. To specify the execution environment, use the ExecutionEnvironment training option.
net = trainnet(XTrain,anglesTrain,layers,"mse",options);
Test Network
Test the neural network using the testnet function. For regression, evaluate the root mean squared error (RMSE). By default, the testnet function uses a GPU if one is available. To select the execution environment manually, use the ExecutionEnvironment argument of the testnet function.
rmse = testnet(net,XTest,anglesTest,"rmse")
rmse = 4.9274
Visualize the accuracy in a plot by making predictions with the test data and comparing the predictions with the targets. Make predictions using the minibatchpredict function. By default, the minibatchpredict function uses a GPU if one is available.
YTest = minibatchpredict(net,XTest);
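As an optional sanity check, you can recompute the RMSE directly from the predictions. This sketch assumes anglesTest is a column vector with one angle per test image; the result should match the value returned by testnet.

predictionError = anglesTest - YTest;
rmseManual = sqrt(mean(predictionError.^2))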
Plot the predicted values against the targets.
figure
scatter(YTest,anglesTest,"+")
xlabel("Prediction")
ylabel("Target")
hold on
plot([-60 60],[-60 60],"r--")
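You can also summarize this plot with a single number, for example the fraction of test images predicted to within 10 degrees of the target. The threshold of 10 degrees is an arbitrary choice for illustration.

thr = 10;
numCorrect = sum(abs(YTest - anglesTest) < thr);
accuracy = numCorrect/numel(anglesTest)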
Make Predictions with New Data
Use the neural network to make a prediction with the first test image. To make a prediction with a single image, use the predict function. To use a GPU, first convert the data to gpuArray.
X = XTest(:,:,:,1);
if canUseGPU
    X = gpuArray(X);
end
Y = predict(net,X)
Y = single
    34.7356
figure
imshow(X)
title("Angle: " + gather(Y))
See Also
trainnet | trainingOptions | dlnetwork