Reproduce Network Training on a GPU
This example shows how to train a network several times on a GPU and get identical results.
Ensuring the reproducibility of model training and inference on the GPU can be beneficial for experimentation and debugging. Reproducing model training on the GPU is particularly important in the verification of deep learning systems.
Prepare Training Data and Network
Use the supporting functions prepareDigitsData and prepareAutoencoderLayers to prepare the training data and the network architecture. These functions prepare the data and build the autoencoder network as described in the Prepare Datastore for Image-to-Image Regression example, and are attached to this example as supporting files.
[dsTrain,dsVal] = prepareDigitsData;
layers = prepareAutoencoderLayers;
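Optionally, before training, you can inspect the data and network to confirm that the supporting functions return what you expect. This is a minimal sketch, not part of the original example; it assumes the returned training datastore supports the preview function.

sample = preview(dsTrain)   % first input-response sample from the training datastore
analyzeNetwork(layers)      % open an interactive summary of the layer array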
Define Training Options
Specify the training options. The options are the same as those in the Prepare Datastore for Image-to-Image Regression example, with these exceptions:
- Train for five epochs. Five epochs are not sufficient for the network to converge, but are sufficient to demonstrate whether training is exactly reproducible.
- Return the network corresponding to the last training iteration. Doing so ensures a fair comparison when you compare the trained networks.
- Train the network on a GPU. By default, the trainnet function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).
- Disable all visualizations.
options = trainingOptions("adam", ...
    MaxEpochs=5, ...
    MiniBatchSize=500, ...
    ValidationData=dsVal, ...
    ValidationPatience=5, ...
    OutputNetwork="last-iteration", ...
    ExecutionEnvironment="gpu", ...
    Verbose=false);
Check whether a GPU is selected and is available for training.
gpu = gpuDevice;
disp(gpu.Name + " selected.")
NVIDIA RTX A5000 selected.
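If you run the example on a machine that might not have a suitable GPU, you can add a guard so that it fails early with a clear message. This optional sketch, which is not part of the original example, uses the canUseGPU function.

% Optional guard: stop early if no supported GPU is available.
if ~canUseGPU
    error("This example requires Parallel Computing Toolbox and a supported GPU device.")
end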
Train Network Twice and Compare Results
Train the network twice using the trainnet function. To ensure that random number generation does not affect the training, set the random number generator and seed on the CPU and the GPU before each training run using the rng and gpurng (Parallel Computing Toolbox) functions, respectively.
rng("default") gpurng("default") net1 = trainnet(dsTrain,layers,"mse",options); rng("default") gpurng("default") net2 = trainnet(dsTrain,layers,"mse",options);
Check whether the learnable parameters of the trained networks are equal. Because the training uses nondeterministic algorithms by default, the learnable parameters of the two networks differ.
isequal(net1.Learnables.Value,net2.Learnables.Value)
ans = logical
0
Plot the difference between the weights of the first convolution layer from the first and second training runs. The plot shows that there is a small difference in the weights of the two networks.
learnablesDiff = net1.Learnables.Value{1}(:) - net2.Learnables.Value{1}(:);
learnablesDiff = extractdata(learnablesDiff);

figure
bar(learnablesDiff)
ylabel("Difference in Weight Value")
xlabel("Learnable Parameter Number")
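To quantify the discrepancy rather than only visualizing it, you can report, for example, the largest absolute difference between corresponding weights. This short addition is not part of the original example.

% Largest absolute difference between corresponding weights.
maxDiff = max(abs(learnablesDiff))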
Set Determinism Option and Train Networks
Use the deep.gpu.deterministicAlgorithms function to set the GPU determinism state to true, and capture the previous state of the GPU determinism so that you can restore it later. All subsequent calls to GPU deep learning operations use only deterministic algorithms.
previousState = deep.gpu.deterministicAlgorithms(true);
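If an error interrupts the example before the state is restored at the end, the determinism setting persists for the rest of the session. As an optional alternative to restoring the state manually, you can use the onCleanup function; this is a sketch, not part of the original example.

% Optional: restore the previous determinism state automatically when this
% object is cleared or goes out of scope, even if an error occurs first.
restoreState = onCleanup(@() deep.gpu.deterministicAlgorithms(previousState));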
Train the network twice using the trainnet function, setting the CPU and GPU random number generator and seed each time. Using only deterministic algorithms can slow down training and inference.
rng("default") gpurng("default") net3 = trainnet(dsTrain,layers,"mse",options); rng("default") gpurng("default") net4 = trainnet(dsTrain,layers,"mse",options);
Check whether the learnable parameters of the trained networks are equal. Because the training uses only deterministic algorithms, the learnable parameters of the two networks are equal.
isequal(net3.Learnables.Value,net4.Learnables.Value)
ans = logical
1
Plot the difference between the weights of the first convolution layer from the first and second training runs. The plot shows that there is no difference in the weights of the two networks.
learnablesDiff = net3.Learnables.Value{1}(:) - net4.Learnables.Value{1}(:);
learnablesDiff = extractdata(learnablesDiff);

figure
bar(learnablesDiff)
ylabel("Difference in Weight Value")
xlabel("Learnable Parameter Number")
Restore the GPU determinism state to its original value.
deep.gpu.deterministicAlgorithms(previousState);
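Optionally, you can confirm that the state was restored. Calling deep.gpu.deterministicAlgorithms with no input argument returns the current state without changing it.

% Query the current determinism state without changing it.
state = deep.gpu.deterministicAlgorithms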
See Also
deep.gpu.deterministicAlgorithms | rng | gpurng (Parallel Computing Toolbox) | trainnet | trainingOptions
Related Topics
- Generate Random Numbers That Are Repeatable
- Random Number Streams on a GPU (Parallel Computing Toolbox)
- Control Random Number Streams on Workers (Parallel Computing Toolbox)