Ebook

Deep Learning and Traditional Machine Learning: Choosing the Right Approach

Chapter 3

Your Hardware Options


The speed of your algorithm depends on the size and complexity of your data, the algorithm itself, and your available hardware. This chapter focuses on hardware considerations when:

  • Training the model
  • Running the model in production

Some things to remember:

If you need results quickly, try traditional machine learning algorithms first. They are generally quicker to train and require less computational power than deep learning models. The main factors in training time are the number of variables and the number of observations in the training data.
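As a rough illustration of why variables and observations dominate training time, the sketch below counts the multiply-add operations in batch gradient descent for a linear model. The function name and the cost model (two passes over every observation and variable per epoch) are assumptions made for this example; it is not a benchmark of any particular library or algorithm.

```python
def gradient_descent_cost(n_observations, n_variables, n_epochs):
    """Approximate multiply-adds for batch gradient descent on a linear model.

    Assumed cost model: each epoch makes one prediction pass and one
    gradient pass, and each pass touches every (observation, variable)
    pair once.
    """
    passes_per_epoch = 2  # forward prediction + gradient accumulation
    return passes_per_epoch * n_epochs * n_observations * n_variables


# Hypothetical data set sizes, chosen only to show the scaling.
small = gradient_descent_cost(n_observations=10_000, n_variables=20, n_epochs=100)
large = gradient_descent_cost(n_observations=1_000_000, n_variables=200, n_epochs=100)

print(f"small data set: {small:,} multiply-adds")
print(f"large data set: {large:,} multiply-adds ({large // small}x more)")
```

Under this simple model, the work grows with the product of observations and variables, so increasing both compounds quickly.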

Deep learning models will take time to train. Pretrained networks and public data sets have shortened the time to train deep learning models through transfer learning, but it is easy to underestimate the real-world practicalities of incorporating your training data into these networks. Training can take anywhere from a minute to a few weeks, depending on your hardware and computing power.

Training the Model

Desktop CPUs

Desktop CPUs are sufficient for training most machine learning models but may prove slow for deep learning models.

CPU Clusters

Big data frameworks such as Apache Spark™ spread the computation across a cluster of CPUs.

The cluster or cloud option has gained popularity because of the high cost of obtaining GPUs: it lets several researchers share the same hardware. Because deep learning models take a long time to train (often on the order of hours or days), it is common to have several models training in parallel, in the hope that one (or some) of them will provide improved results.
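The pattern of training several candidate models side by side and keeping the best one can be sketched as follows. This is a minimal illustration, assuming a toy one-parameter linear model and a set of hypothetical learning rates; the function names are invented for the example. Python threads will not speed up this CPU-bound toy, and real training jobs would typically be spread across separate processes, GPUs, or cluster nodes. The point is only the submit-many, pick-the-best structure.

```python
from concurrent.futures import ThreadPoolExecutor


def train_linear_model(learning_rate, data, n_epochs=200):
    """Fit y = w * x by gradient descent; return (learning_rate, w, mse)."""
    w = 0.0
    n = len(data)
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= learning_rate * grad
    mse = sum((w * x - y) ** 2 for x, y in data) / n
    return learning_rate, w, mse


def train_candidates(learning_rates, data, max_workers=4):
    """Train one model per learning rate concurrently; return the best result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(lambda lr: train_linear_model(lr, data), learning_rates)
        )
    # The "best" candidate is the one with the lowest training error.
    return min(results, key=lambda result: result[2])


# Toy data generated from y = 3x (hypothetical, for illustration only).
data = [(x / 10, 3 * x / 10) for x in range(1, 11)]

best_lr, best_w, best_mse = train_candidates([0.001, 0.01, 0.1, 0.5], data)
print(f"best learning rate: {best_lr}, w = {best_w:.3f}, mse = {best_mse:.2e}")
```

In practice the candidates would differ in architecture or hyperparameters, and the comparison would use a held-out validation set rather than the training error used here.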

GPUs

GPUs are the norm for training most deep learning models because they offer dramatic speed improvements over training on a CPU. To reduce training time, it is common for practitioners to have multiple deep learning models training simultaneously (which requires additional hardware).

Running the Model in Production

The trend toward smarter and more connected sensors is moving more processing and analytics closer to the sensors. This shrinks the amount of data that is transferred over the network, which reduces the cost of transmission and can reduce the power consumption of wireless devices.
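To make the data-reduction argument concrete, the sketch below compares the size of a raw window of sensor readings with the size of a summary computed on the device. The sensor values, window length, choice of summary statistics, and JSON encoding are all assumptions made for illustration; a real system would choose these to suit its model and network protocol.

```python
import json
import statistics


def summarize_window(readings):
    """Reduce a window of raw sensor readings to a few summary statistics."""
    return {
        "count": len(readings),
        "mean": round(statistics.fmean(readings), 3),
        "min": min(readings),
        "max": max(readings),
    }


# Hypothetical window: 1,000 temperature samples collected on the device.
raw_readings = [20.0 + (i % 50) * 0.1 for i in range(1000)]

# Payload if every raw sample is sent to a central server.
raw_payload = json.dumps(raw_readings).encode("utf-8")
# Payload if only the summary computed at the edge is sent.
summary_payload = json.dumps(summarize_window(raw_readings)).encode("utf-8")

print(f"raw payload:     {len(raw_payload)} bytes")
print(f"summary payload: {len(summary_payload)} bytes")
```

The same idea applies when the device runs the model itself and transmits only the prediction instead of the inputs.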

Several factors will drive the architecture of the production system:

  • Will a network connection always be available?
  • How often will the model need to be updated?
  • Do you have specialized hardware to run deep learning models?

Will a network connection always be available?

Machine learning and deep learning models that run on hardware at the edge will provide quick results and will not require a network connection.

How often will the model need to be updated?

If the model runs at the edge, sufficiently powerful hardware must be available there to run it, and pushing out updates to the model will be more difficult than if the model resided on a centralized server.

Tools are available that can convert machine learning models, which are typically developed in high-level interpreted languages, into standalone C/C++ code, which can be run on low-power embedded devices.
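The idea behind such conversion can be shown with a deliberately small sketch: a Python function that takes the coefficients of an already-trained linear model and emits standalone C source for its prediction function. This is a toy written for this example, not how any particular code generation tool works; the function name `linear_model_to_c` and the coefficient values are hypothetical, and production tools handle far more (data types, memory layout, nonlinear models, target-specific optimizations).

```python
def linear_model_to_c(weights, bias, function_name="predict"):
    """Return C source for y = bias + sum(weights[i] * x[i])."""
    n = len(weights)
    # repr() of a finite Python float is also a valid C double literal.
    weight_list = ", ".join(repr(float(w)) for w in weights)
    return "\n".join([
        f"/* Generated from a trained linear model with {n} inputs. */",
        f"static const double WEIGHTS[{n}] = {{{weight_list}}};",
        f"static const double BIAS = {float(bias)!r};",
        "",
        f"double {function_name}(const double x[{n}]) {{",
        "    double y = BIAS;",
        f"    for (int i = 0; i < {n}; ++i) {{",
        "        y += WEIGHTS[i] * x[i];",
        "    }",
        "    return y;",
        "}",
    ])


# Hypothetical coefficients standing in for a trained model.
c_source = linear_model_to_c(weights=[0.5, -1.25, 2.0], bias=0.1)
print(c_source)
```

The generated file has no dependency on the Python runtime, which is what makes this style of deployment suitable for low-power embedded devices.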

Do you have specialized hardware to run deep learning models?

For deep learning models, specialized hardware is typically required due to the higher memory and compute requirements.

GPU Coder™ enables code generation that leverages optimized libraries from Intel® (MKL-DNN), NVIDIA® (TensorRT, cuDNN), and Arm® (Arm Compute Library) to create deployable models with high-performance inference speed. With GPU Coder Support Package for NVIDIA GPUs, you can cross-compile and deploy the generated CUDA® code as a standalone application on an embedded GPU.

A Note on Validation

Depending on your application, the level of validation required before using the model in production will vary greatly. In safety-critical applications, models can be integrated with existing validation processes such as hardware-in-the-loop testing to ensure the model runs as expected in the production environment.
