Deep Learning and Traditional Machine Learning: Choosing the Right Approach

Chapter 3 Your Hardware Options

The speed of your algorithm depends on the size and complexity of your data, the algorithm itself, and your available hardware. This chapter focuses on hardware considerations when:

Training the model
Running the model in production

Some things to remember:

If you need results quickly, try traditional machine learning algorithms first. They are generally quicker to train and require less computational power than deep learning models. The main factors in training time are the number of variables and the number of observations in the training data.

Deep learning models take time to train. Pretrained networks and public data sets have shortened training through transfer learning, but it is easy to underestimate the practical effort of adapting these networks to your own training data. Depending on your hardware and computing power, training can take anywhere from a minute to a few weeks.
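
To make the point about variables and observations concrete, here is a rough sketch for one specific case: linear regression fitted through the normal equations. The function name and the example sizes are illustrative assumptions, and other algorithms scale differently, so treat the numbers as an order-of-magnitude guide only.

```python
def least_squares_ops(n_observations: int, n_variables: int) -> int:
    """Approximate operation count for linear regression via the normal equations.

    Forming X^T X takes about n * d^2 multiply-adds and solving the resulting
    d-by-d system takes about d^3, so cost grows linearly with observations
    and faster than linearly with variables.
    """
    n, d = n_observations, n_variables
    return n * d * d + d ** 3


base = least_squares_ops(10_000, 20)
more_rows = least_squares_ops(20_000, 20)   # double the observations
more_cols = least_squares_ops(10_000, 40)   # double the variables

print(f"2x observations: {more_rows / base:.2f}x the work")
print(f"2x variables:    {more_cols / base:.2f}x the work")
```

In this sketch, doubling the observations roughly doubles the work, while doubling the variables roughly quadruples it.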

Training the Model

Desktop CPUs

Desktop CPUs are sufficient for training most machine learning models but may prove slow for deep learning models.

CPU Clusters

Big data frameworks such as Apache Spark™ spread the computation across a cluster of CPUs.

The cluster or cloud option has gained popularity because of the high cost of obtaining GPUs: it lets several researchers share the hardware. Because deep learning models take a long time to train (often hours or days), it is common to train several models in parallel in the hope that one or more of them will give improved results.
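
The idea of training several candidates side by side and keeping the best one can be sketched as follows. The `train` function is a toy stand-in (gradient descent on a one-variable quadratic), and the candidate learning rates are made up for illustration. In a real setup each run would typically go to its own process, machine, or GPU rather than to a thread.

```python
from concurrent.futures import ThreadPoolExecutor


def train(learning_rate: float, steps: int = 50) -> tuple[float, float]:
    """Toy training run: minimize (w - 3)^2 and return (learning_rate, final_loss)."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= learning_rate * grad
    return learning_rate, (w - 3.0) ** 2


# Hypothetical candidate configurations to try side by side.
candidate_rates = [0.001, 0.01, 0.1, 0.5]

# Threads keep the sketch simple; CPU-bound training in pure Python would
# need separate processes or machines to actually run in parallel.
with ThreadPoolExecutor(max_workers=len(candidate_rates)) as pool:
    results = list(pool.map(train, candidate_rates))

best_rate, best_loss = min(results, key=lambda r: r[1])
print(f"best learning rate: {best_rate}, final loss: {best_loss}")
```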

GPUs

GPUs are the norm for training most deep learning models because they offer dramatic speed improvements over training on a CPU. To shorten the overall time to a usable model, practitioners commonly train multiple deep learning models simultaneously, which requires additional hardware.

Running the Model in Production

The trend toward smarter and more connected sensors is moving more processing and analytics closer to the sensors. This shrinks the amount of data that is transferred over the network, which reduces the cost of transmission and can reduce the power consumption of wireless devices.
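
As a rough illustration of how much data on-sensor processing can save, the sketch below compares the size of a raw sample window with the size of a few summary features computed from it. The window length, the signal, and the choice of features (mean, standard deviation, peak) are assumptions made for the example.

```python
import math
import struct


def summarize(window: list[float]) -> tuple[float, float, float]:
    """Reduce a window of raw samples to mean, standard deviation, and peak magnitude."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((x - mean) ** 2 for x in window) / n
    peak = max(abs(x) for x in window)
    return mean, math.sqrt(variance), peak


# Hypothetical sensor window: 1,000 samples of a sine wave.
window = [math.sin(2 * math.pi * i / 100) for i in range(1000)]

# Encode both payloads as little-endian 32-bit floats.
raw_payload = struct.pack(f"<{len(window)}f", *window)
feature_payload = struct.pack("<3f", *summarize(window))

print(f"raw window: {len(raw_payload)} bytes")
print(f"features:   {len(feature_payload)} bytes")
```

Sending three features instead of the full window cuts the payload from 4,000 bytes to 12 bytes, at the cost of doing the computation on the device.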

Several factors will drive the architecture of the production system:

Will a network connection always be available?
How often will the model need to be updated?
Do you have specialized hardware to run deep learning models?

Pedestrian Detection on an NVIDIA GPU with TensorRT

Will a network connection always be available?

Machine learning and deep learning models that run on hardware at the edge will provide quick results and will not require a network connection.

How often will the model need to be updated?

If the model runs at the edge, sufficiently powerful hardware must be available there, and pushing out model updates is harder than if the model resided on a centralized server.

Tools are available that can convert machine learning models, which are typically developed in high-level interpreted languages, into standalone C/C++ code, which can be run on low-power embedded devices.
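
To give a feel for what such a conversion produces, here is a toy sketch that turns a small linear classifier into standalone C source. It does not reflect how any particular code generation tool works. The function name `emit_c_predict` and the weights are hypothetical, and real tools handle far more model types, data types, and optimizations.

```python
def emit_c_predict(weights: list[float], bias: float, name: str = "predict") -> str:
    """Return C source for a linear classifier that outputs 1 if w.x + b > 0, else 0."""
    n = len(weights)
    coeffs = ", ".join(repr(w) for w in weights)
    return "\n".join([
        f"static const double W[{n}] = {{{coeffs}}};",
        "",
        f"int {name}(const double x[{n}]) {{",
        f"    double score = {bias!r};",
        f"    for (int i = 0; i < {n}; ++i) score += W[i] * x[i];",
        "    return score > 0.0;",
        "}",
    ])


# Hypothetical trained parameters, baked into the generated source as constants.
c_source = emit_c_predict([0.5, -1.25, 2.0], bias=-0.75)
print(c_source)
```

The generated function has no dependency on the training environment, which is what makes this style of deployment suitable for low-power devices.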

Do you have specialized hardware to run deep learning models?

For deep learning models, specialized hardware is typically required due to the higher memory and compute requirements.

GPU Coder™ enables code generation that leverages optimized libraries from Intel® (MKL-DNN), NVIDIA® (TensorRT, cuDNN), and Arm® (Arm Compute Library) to create deployable models with high-performance inference speed. With GPU Coder Support Package for NVIDIA GPUs, you can cross-compile and deploy the generated CUDA® code as a standalone application on an embedded GPU.

Deep Learning on an Intel Processor with MKL-DNN (1:36)

A Note on Validation

The level of validation required before using a model in production varies greatly by application. In safety-critical applications, models can be integrated with existing validation processes, such as hardware-in-the-loop testing, to ensure the model runs as expected in the production environment.

Guess the Algorithm

Battelle used MATLAB to develop signal processing and support vector machine (SVM) algorithms and ran them in real time.

The participant was shown a computer-generated virtual hand performing movements such as wrist flexion and extension, thumb flexion and extension, and hand opening and closing, and instructed to think about making the same movements with his own hand.

Working in MATLAB, the team developed algorithms to analyze data from the 96 channels in the implanted electrode array. Using Wavelet Toolbox™, they performed wavelet decomposition to isolate the frequency ranges of the brain signals that govern movement.
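
The sketch below shows how a wavelet decomposition splits a signal into frequency bands. The source does not say which wavelet family or sampling rate the team used. The Haar wavelet is chosen here only because it is the simplest, and the 1,000 Hz rate and the test signal are assumptions. The signal length must be divisible by 2 raised to the number of levels.

```python
import math


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One Haar level: return (approximation, detail), each half the input length."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal: list[float], levels: int) -> list[list[float]]:
    """Return [detail_1, ..., detail_levels, final_approximation]."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    return bands


def band_edges(sample_rate_hz: float, level: int) -> tuple[float, float]:
    """Nominal frequency range (Hz) of the detail coefficients at a given level."""
    return sample_rate_hz / 2 ** (level + 1), sample_rate_hz / 2 ** level


# Hypothetical test signal: a slow and a fast oscillation, 64 samples.
signal = [math.sin(2 * math.pi * i / 16) + 0.5 * math.sin(2 * math.pi * i / 4)
          for i in range(64)]

bands = haar_decompose(signal, levels=3)
for level in (1, 2, 3):
    low, high = band_edges(1000.0, level)
    energy = sum(c * c for c in bands[level - 1])
    print(f"level {level}: {low:.1f}-{high:.1f} Hz, energy {energy:.3f}")
```

Each level halves the frequency range of the previous one, so selecting the coefficients at particular levels isolates the bands of interest.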

During testing sessions, the team trained the SVM by having the participant attempt the movements shown in the videos. They used the trained SVM’s output to animate a computer-generated virtual hand that the participant could manipulate on screen. The same SVM output was scaled and used to control the 130 channels of the neuromuscular electrical stimulation (NMES) sleeve.
