Train Shallow Networks on CPUs and GPUs
Parallel Computing Toolbox
Tip
This topic describes shallow networks. For deep learning, see instead Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud.
Neural network training and simulation involve many parallel calculations. Multicore CPUs, graphics processing units (GPUs), and clusters of computers with multiple CPUs and GPUs can all take advantage of parallel calculations.
Together, Deep Learning Toolbox™ and Parallel Computing Toolbox™ enable the multiple CPU cores and GPUs of a single computer to speed up training and simulation of large problems.
The following is a standard single-threaded training and simulation session. (While the benefits of parallelism are most visible for large problems, this example uses a small dataset that ships with Deep Learning Toolbox.)
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);
net2 = train(net1, x, t);
y = net2(x);
Parallel CPU Workers
Modern desktop and workstation processors ship with many cores, and workstations with two processors have more still. Using multiple CPU cores in parallel can dramatically speed up calculations.
Start or get the current parallel pool and view the number of workers in the pool.
pool = gcp;
pool.NumWorkers
An error occurs if you do not have a license for Parallel Computing Toolbox.
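To avoid that error in a script, you can test for the license before touching the pool. The following is a minimal sketch, not the documented workflow. It assumes that 'Distrib_Computing_Toolbox' is the license feature name for Parallel Computing Toolbox, and the variable names hasPCT and numWorkers are illustrative.

```matlab
% Sketch: check for the toolbox license before starting a pool.
% Assumption: 'Distrib_Computing_Toolbox' is the license feature name.
hasPCT = license('test', 'Distrib_Computing_Toolbox');

if hasPCT
    pool = gcp('nocreate');   % returns an empty pool object if none is running
    if isempty(pool)
        pool = parpool;       % start a pool using the default profile
    end
    numWorkers = pool.NumWorkers;
else
    numWorkers = 0;           % no pool available; train single-threaded
end
```

A script can then branch on numWorkers to decide whether to pass 'useParallel' to train.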
When a parallel pool is open, set the train function's 'useParallel' option to 'yes' to specify that training and simulation be performed across the pool.
net2 = train(net1,x,t,'useParallel','yes');
y = net2(x,'useParallel','yes');
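To see whether the pool actually helps for a given problem, you can time both variants. The following sketch uses tic and toc; the variable names tSerial and tParallel are illustrative. With a dataset as small as this one, the parallel run may well be slower, because the benefits of parallelism are most visible for large problems.

```matlab
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);
net1.trainParam.showWindow = false;   % suppress the training window

% Time single-threaded training.
tic;
netSerial = train(net1, x, t);
tSerial = toc;

% Time training across the parallel pool (a pool opens if none is running).
tic;
netParallel = train(net1, x, t, 'useParallel', 'yes');
tParallel = toc;

fprintf('Serial: %.2f s, parallel: %.2f s\n', tSerial, tParallel);
```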
GPU Computing
GPUs can have thousands of cores on a single card and are highly efficient on parallel algorithms like neural networks.
Use gpuDeviceCount to check whether a supported GPU card is available in your system. Use the function gpuDevice to review the currently selected GPU information or to select a different GPU.
gpuDeviceCount
gpuDevice
gpuDevice(2) % Select device 2, if available
An “Undefined function or variable” error appears if you do not have a license for Parallel Computing Toolbox.
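Calling gpuDevice(2) fails on a machine with fewer than two GPUs, so a script may want to guard the selection. The following is one possible sketch; the variable names numGPUs and gpu are illustrative.

```matlab
% Sketch: select a GPU only if one is present.
numGPUs = gpuDeviceCount;

if numGPUs >= 2
    gpu = gpuDevice(2);   % select device 2
elseif numGPUs == 1
    gpu = gpuDevice;      % use the currently selected device
else
    gpu = [];             % no supported GPU found
end

if ~isempty(gpu)
    fprintf('Using GPU %d: %s\n', gpu.Index, gpu.Name);
end
```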
When you have selected the GPU device, set the train or sim function's 'useGPU' option to 'yes' to perform training and simulation on it.
net2 = train(net1,x,t,'useGPU','yes');
y = net2(x,'useGPU','yes');
Multiple GPU/CPU Computing
You can use multiple GPUs for higher levels of parallelism.
After opening a parallel pool, set both 'useParallel' and 'useGPU' to 'yes' to harness all the GPUs and CPU cores on a single computer. Each worker associated with a unique GPU uses that GPU. The rest of the workers perform calculations on their CPU core.
net2 = train(net1,x,t,'useParallel','yes','useGPU','yes');
y = net2(x,'useParallel','yes','useGPU','yes');
For some problems, using GPUs and CPUs together can result in the highest computing speed.
For other problems, the CPUs might not keep up with the GPUs, and so using only GPUs is faster.
Set 'useGPU' to 'only' to restrict the parallel computing to workers with unique GPUs.
net2 = train(net1,x,t,'useParallel','yes','useGPU','only');
y = net2(x,'useParallel','yes','useGPU','only');
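When mixing CPU and GPU workers, it helps to confirm which resources each worker ended up using. The sketch below adds the 'showResources' option, which is not covered in this topic but is accepted by train and sim, to print a summary of the computing resources at the command line.

```matlab
pool = gcp;                      % start or reuse a parallel pool
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);

% Print the computing resources used for training and for simulation.
net2 = train(net1, x, t, 'useParallel', 'yes', 'useGPU', 'yes', ...
    'showResources', 'yes');
y = net2(x, 'useParallel', 'yes', 'useGPU', 'yes', ...
    'showResources', 'yes');
```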
Cluster Computing with MATLAB Parallel Server
MATLAB® Parallel Server™ allows you to harness all the CPUs and GPUs on a network cluster of computers. To take advantage of a cluster, open a parallel pool with a cluster profile. To manage and select profiles, go to the MATLAB Home tab and, in the Environment area, use the Parallel menu.
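A pool can also be opened on a cluster programmatically by passing a profile name to parpool. In the sketch below, 'MyClusterProfile' is a hypothetical profile name; substitute a profile that exists in your installation.

```matlab
% Close any pool that is already open, since only one pool can run at a time.
pool = gcp('nocreate');
if ~isempty(pool)
    delete(pool);
end

% 'MyClusterProfile' is a hypothetical name, not a profile that ships with MATLAB.
pool = parpool('MyClusterProfile');
pool.NumWorkers
```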
After opening a parallel pool, train the network by calling train with the 'useParallel' and 'useGPU' options.
net2 = train(net1,x,t,'useParallel','yes');
y = net2(x,'useParallel','yes');

net2 = train(net1,x,t,'useParallel','yes','useGPU','only');
y = net2(x,'useParallel','yes','useGPU','only');
Load Balancing, Large Problems, and Beyond
For more information on parallel computing with Deep Learning Toolbox, see Shallow Neural Networks with Parallel and GPU Computing, which introduces other topics, such as how to manually distribute data sets across CPU and GPU workers to best take advantage of differences in machine speed and memory.
Distributing data manually also allows worker data to load sequentially, so that data sets are limited in size only by the total RAM of a cluster instead of the RAM of a single computer. This lets you apply neural networks to very large problems.
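Manual distribution can be sketched with Composite values, where each element holds the portion of the data that lives on one worker. The example below is illustrative only: the even split computed with linspace is an assumption of this sketch, and the variable names Xc, Tc, and edges are hypothetical. For simplicity it loads the full dataset on the client and then splits it, so it demonstrates the distribution step rather than sequential loading of a dataset too large for one machine.

```matlab
pool = gcp;                      % start or reuse a parallel pool
[x, t] = bodyfat_dataset;
net1 = feedforwardnet(10);

Q = size(x, 2);                  % number of samples (columns)
Xc = Composite;                  % one element per worker
Tc = Composite;
numWorkers = numel(Xc);

% Split the sample indices into numWorkers contiguous, roughly equal ranges.
edges = round(linspace(0, Q, numWorkers + 1));
for i = 1:numWorkers
    idx = (edges(i) + 1):edges(i + 1);
    Xc{i} = x(:, idx);           % inputs for worker i
    Tc{i} = t(:, idx);           % targets for worker i
end

% Train and simulate on the distributed data; the output is also a Composite.
net2 = train(net1, Xc, Tc);
yc = net2(Xc);
```

To favor a faster worker or one with more memory, give that worker a larger range of columns.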