Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud
Training deep networks is computationally intensive and can take many hours of computing time; however, neural networks are inherently parallel algorithms. You can take advantage of this parallelism by running training in parallel on high-performance GPUs and computer clusters.
It is recommended to train using a GPU or multiple GPUs. Use a single CPU or multiple CPUs only if you do not have a GPU. CPUs are normally much slower than GPUs for both training and inference. Running on a single GPU typically offers much better performance than running on multiple CPU cores.
If you do not have a suitable GPU, you can rent high-performance GPUs and clusters in the cloud. For more information on how to access MATLAB® in the cloud for deep learning, see Deep Learning in the Cloud.
Using a GPU or parallel options requires Parallel Computing Toolbox™. Using a GPU also requires a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Using a remote cluster also requires MATLAB Parallel Server™.
Tip
For trainnet workflows, GPU support is automatic. By default, the trainnet function uses a GPU if one is available. If you have access to a machine with multiple GPUs, specify the ExecutionEnvironment training option as "multi-gpu".

To run custom training workflows on the GPU, use minibatchqueue to automatically convert data to gpuArray objects.
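For example, a minimal sketch of a trainnet workflow that uses all local GPUs; imdsTrain, layers, and the training options shown are hypothetical placeholders for your own data, network, and settings:

```matlab
% Train on all available local GPUs (requires Parallel Computing Toolbox).
options = trainingOptions("adam", ...
    ExecutionEnvironment="multi-gpu", ...  % use "auto" or "gpu" for a single GPU
    MaxEpochs=5, ...
    MiniBatchSize=128);

net = trainnet(imdsTrain, layers, "crossentropy", options);
```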
You can use parallel resources to scale up deep learning for a single network. You can also train multiple networks simultaneously. The following sections show the available options for deep learning in parallel in MATLAB:
Note
If you run MATLAB on a single remote machine, for example, a cloud machine that you connect to via SSH or remote desktop protocol, then follow the steps for local resources. For more information on connecting to cloud resources, see Deep Learning in the Cloud.
Train Single Network in Parallel
Use Local Resources to Train Single Network in Parallel
The following table shows you the available options for training and inference with a single network on your local workstation.
| Resource | trainnet Workflows | Custom Training Workflows | Required Products |
| --- | --- | --- | --- |
| Single CPU | Automatic if no GPU is available. Training using a single CPU is not recommended. | Training using a single CPU is not recommended. | MATLAB |
| Multiple CPU cores | Training using multiple CPU cores is not recommended if you have access to a GPU. | Training using multiple CPU cores is not recommended if you have access to a GPU. | MATLAB, Parallel Computing Toolbox |
| Single GPU | Automatic. By default, training and inference run on the GPU if one is available. Alternatively, specify the ExecutionEnvironment training option as "gpu". | Use minibatchqueue to automatically convert data to gpuArray objects. For an example, see Train Network Using Custom Training Loop. | MATLAB, Parallel Computing Toolbox |
| Multiple GPUs | Specify the ExecutionEnvironment training option as "multi-gpu". For an example, see Train Network Using Automatic Multi-GPU Support. | Start a local parallel pool with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use minibatchqueue to convert data to gpuArray objects on each worker. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Parallel Computing Toolbox |
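For example, a minimal sketch of preparing local resources for a multi-GPU custom training loop (requires Parallel Computing Toolbox and supported GPUs):

```matlab
% Start a local parallel pool with one worker per available GPU.
numGPUs = gpuDeviceCount("available");
pool = parpool("Processes", numGPUs);
```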
Use Remote Cluster Resources to Train Single Network in Parallel
The following table shows you the available options for training and inference with a single network on a remote cluster.
| Resource | trainnet Workflows | Custom Training Workflows | Required Products |
| --- | --- | --- | --- |
| Any | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel-auto". If the pool has access to GPUs, then only workers with a unique GPU perform training computation and excess workers become idle. If the pool does not have GPUs, then training takes place on all available CPU workers instead. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. By default, the software performs calculations using only the CPU. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Parallel Computing Toolbox, MATLAB Parallel Server |
| Multiple CPUs | Training using multiple CPU cores is not recommended if you have access to a GPU. Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel-cpu". If the pool has access to GPUs, the GPUs are not used. | Training using multiple CPU cores is not recommended if you have access to a GPU. | MATLAB, Parallel Computing Toolbox, MATLAB Parallel Server |
| Multiple GPUs | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel-auto". Only workers with a unique GPU perform training computation; excess workers become idle. | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use minibatchqueue to convert data to gpuArray objects on each worker. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Parallel Computing Toolbox, MATLAB Parallel Server |
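For example, a minimal sketch of training on a remote cluster with trainnet, assuming your remote cluster is already set as the default cluster profile and that imdsTrain and layers are placeholders for your own data and network:

```matlab
% Requires Parallel Computing Toolbox and MATLAB Parallel Server; the pool is
% opened automatically on the default cluster profile.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="parallel-auto", ...  % uses GPU workers if the pool has GPUs
    MiniBatchSize=128);

net = trainnet(imdsTrain, layers, "crossentropy", options);
```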
Train Multiple Networks in Parallel
Use Local or Remote Cluster Resources to Train Multiple Networks in Parallel
To train multiple networks in parallel, train each network on a different parallel worker. You can modify the network or training parameters on each worker to perform parameter sweeps in parallel.
Use parfor (Parallel Computing Toolbox) or parfeval (Parallel Computing Toolbox) to train a single network on each worker. To run in the background without blocking your local MATLAB, use parfeval. You can plot results using the OutputFcn training option.
You can run locally or using a remote cluster. Using a remote cluster requires MATLAB Parallel Server.
| Resource | trainnet Workflows | Custom Training Workflows | Required Products |
| --- | --- | --- | --- |
| Multiple CPUs | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to train a single network on each worker. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to run a custom training loop on each worker. | MATLAB, Parallel Computing Toolbox (MATLAB Parallel Server for remote clusters) |
| Multiple GPUs | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to train a single network on each worker. | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to run a custom training loop on each worker. Convert each mini-batch of data to gpuArray. | MATLAB, Parallel Computing Toolbox (MATLAB Parallel Server for remote clusters) |
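For example, a minimal sketch of a parameter sweep with parfor, where imdsTrain, layers, and the swept mini-batch sizes are hypothetical placeholders:

```matlab
% Train one network per worker, each with a different mini-batch size.
miniBatchSizes = [64 128 256];
trainedNets = cell(numel(miniBatchSizes), 1);

parfor k = 1:numel(miniBatchSizes)
    opts = trainingOptions("adam", ...
        MiniBatchSize=miniBatchSizes(k), ...
        Verbose=false);
    trainedNets{k} = trainnet(imdsTrain, layers, "crossentropy", opts);
end
```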
Use Experiment Manager to Train Multiple Networks in Parallel
You can use Experiment Manager to run trials on multiple parallel workers
simultaneously. Set up your parallel environment and, on the Experiment Manager
toolstrip, set Mode to Simultaneous
before running your experiment. Experiment Manager runs as many simultaneous
trials as there are workers in your parallel pool. For more information, see
Run Experiments in Parallel.
Batch Deep Learning
You can offload deep learning computations to run in the background using the batch (Parallel Computing Toolbox) function. This means that you can continue using MATLAB while your computation runs in the background, or you can close your client MATLAB and fetch results later.

You can run batch jobs in a local or remote cluster. To offload your deep learning computations, use batch to submit a script or function that runs in the cluster. You can perform any kind of deep learning computation as a batch job, including parallel computations. For an example, see Send Deep Learning Batch Job to Cluster.
When you submit a batch job as a script, by default, workspace variables are copied from the client to the workers. To avoid copying workspace variables to the workers, submit batch jobs as functions.
To run in parallel, use a script or function that contains the same code that you would use to run in parallel locally or in a cluster. For example, your script or function can run trainnet with the ExecutionEnvironment training option set to "parallel-auto", or run a custom training loop in parallel.

Use batch to submit the script or function to the cluster and use the Pool option to specify the number of workers you want to use. For more information on running parallel computations with batch, see Run Batch Parallel Jobs (Parallel Computing Toolbox).
To run deep learning computation on multiple networks, it is recommended to submit a single batch job for each network. Doing so avoids the overhead required to start a parallel pool in the cluster and allows you to use the job monitor to observe the progress of each network computation individually.
You can submit multiple batch jobs. If the submitted jobs require more workers than are currently available in the cluster, then later jobs are queued until earlier jobs have finished. Queued jobs start when enough workers are available to run the job.
The default search paths of the workers might not be the same as that of your client MATLAB. To ensure that workers in the cluster have access to the needed files, such as code files, data files, or model files, specify paths to add to the workers using the AdditionalPaths option.
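For example, a minimal sketch of submitting a batch job, where trainMyNetwork and the path are hypothetical placeholders for your own training function and project folder:

```matlab
% Submit a training function as a batch job that uses a pool of 4 workers and
% can access additional code and data files on the cluster workers.
job = batch(@trainMyNetwork, 1, {}, ...
    Pool=4, ...
    AdditionalPaths="/path/to/myProject");
```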
To retrieve results after the job is finished, use the fetchOutputs (Parallel Computing Toolbox) function. fetchOutputs retrieves all variables in the batch worker workspace. When you submit a batch job as a script, by default, workspace variables are copied from the client to the workers. To avoid copying workspace variables to the workers, submit batch jobs as functions instead of scripts.
You can use the diary (Parallel Computing Toolbox) function to capture command-line output while running batch jobs. This can be useful when executing the trainnet function with the Verbose option set to true.
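Continuing the earlier sketch, once the job finishes you can collect its outputs and review the captured command-line output; job is the hypothetical job object returned by batch:

```matlab
wait(job);                    % block until the batch job finishes
results = fetchOutputs(job);  % retrieve the function outputs
diary(job)                    % display captured command-line output, such as trainnet progress
```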
Manage Cluster Profiles and Automatic Pool Creation
Parallel Computing Toolbox comes preconfigured with the cluster profile Processes for running parallel code on your local desktop machine. By default, MATLAB starts all parallel pools using the Processes cluster profile. If you want to run code on a remote cluster, you must start a parallel pool using the remote cluster profile. You can manage cluster profiles using the Cluster Profile Manager. For more information about managing cluster profiles, see Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox).
Some functions, including trainnet, parfor, and parfeval, can automatically start a parallel pool. To take advantage of automatic parallel pool creation, set your desired cluster as the default cluster profile in the Cluster Profile Manager. Alternatively, you can create the pool manually and specify the desired cluster resource when you create the pool.
If you want to use multiple GPUs in a remote cluster to train multiple networks in parallel or for custom training loops, best practice is to manually start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs.
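As a sketch, you can either rely on automatic pool creation by making a cluster profile the default, or start the pool yourself; "MyCluster" and the worker count are hypothetical placeholders:

```matlab
% Automatic pool creation: make a cluster profile the default so that
% trainnet, parfor, and parfeval start their pools on that cluster.
parallel.defaultClusterProfile("MyCluster");

% Manual pool creation: start a pool on the cluster yourself, for example with
% one worker per available GPU (replace 8 with the number of GPUs in your cluster).
pool = parpool("MyCluster", 8);
```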
Deep Learning Precision
For best performance, it is recommended to use a GPU for all deep learning workflows. Because single-precision and double-precision performance of GPUs can differ substantially, it is important to know in which precision computations are performed. Typically, GPUs offer much better performance for calculations in single precision.
If you only use a GPU for deep learning, then single-precision performance is one of the most important characteristics of a GPU. If you also use a GPU for other computations using Parallel Computing Toolbox, then high double-precision performance is important. This is because many functions in MATLAB use double-precision arithmetic by default. For more information, see Perform Calculations in Single Precision (Parallel Computing Toolbox).
By default, the software performs computations using single-precision, floating-point arithmetic when you train a neural network using the trainnet function. The trainnet function returns a network with single-precision learnable and state parameters.
When you use prediction or validation functions with a dlnetwork object with single-precision learnable and state parameters, the software performs the computations using single-precision, floating-point arithmetic.
When you use prediction or validation functions with a dlnetwork object with double-precision learnable and state parameters:

- If the input data is single precision, the software performs the computations using single-precision, floating-point arithmetic.
- If the input data is double precision, the software performs the computations using double-precision, floating-point arithmetic.
For custom training workflows, it is recommended to convert data to single precision for training and inference. If you use minibatchqueue to manage mini-batches, your data is converted to single precision by default.
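For example, a minimal sketch of a minibatchqueue that makes the default behavior explicit; dsTrain is a placeholder for your training datastore:

```matlab
mbq = minibatchqueue(dsTrain, ...
    MiniBatchSize=128, ...
    OutputCast="single", ...       % "single" is already the default output type
    OutputEnvironment="auto");     % outputs become gpuArray when a GPU is available
```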
Reproducibility
To provide the best performance, deep learning using a GPU in MATLAB is not guaranteed to be deterministic. Depending on your network architecture, under some conditions you might get different results when using a GPU to train two identical networks or make two predictions using the same network and data. If you require determinism when performing deep learning operations using a GPU, use the deep.gpu.deterministicAlgorithms function (since R2024b).
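For example, a minimal sketch of enabling deterministic GPU algorithms around a reproducibility-sensitive computation, assuming R2024b or later and that the function returns the previous state:

```matlab
% Capture the previous state so you can restore it afterward.
previousState = deep.gpu.deterministicAlgorithms(true);

% ... run GPU training or inference that must be deterministic ...

deep.gpu.deterministicAlgorithms(previousState);
```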
See Also
trainnet | trainingOptions | dlnetwork | minibatchqueue | Deep Network Designer | Experiment Manager
Related Topics
- Deep Learning with MATLAB on Multiple GPUs
- Resolve GPU Memory Issues
- Run MATLAB using GPUs in the Cloud (Parallel Computing Toolbox)
- Deep Learning with Big Data
- Deep Learning in the Cloud
- Train Deep Learning Networks in Parallel
- Send Deep Learning Batch Job to Cluster
- Work with Deep Learning Data in the Cloud
- Run Experiments in Parallel