Ebook

Deep Learning and Traditional Machine Learning: Choosing the Right Approach

CHAPTERS

Chapter 1

Your Project



What Are You Trying to Do?

Whether to use deep learning techniques or machine learning techniques depends on your project. While one task alone might be better suited to machine learning, your full application might involve multiple steps that, taken together, are better suited to deep learning.

In this chapter, we present six common tasks and indicate which technique typically applies best to each:

  • Predicting an output
  • Identifying objects
  • Moving physically or in a simulation
  • Uncovering trends
  • Enhancing images or signals
  • Responding to speech or text

Tasks

PREDICT an output based on historical and current data

Example: Use real-time sensor data from a motor to predict remaining useful life for rotating machinery. The Similarity-Based Remaining Useful Life Estimation example uses linear regression.

Applications: Predictive maintenance, financial trading, recommender systems

Input: Sensor data, timestamped financial data, numeric data

Common algorithms: Linear regression, decision trees, support vector machines (SVMs), neural networks, association rules

Typical approach: Machine learning is more common
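
To make the prediction workflow concrete, here is a minimal sketch of the linear regression route in MATLAB. The table featureTbl and its Vibration, Temperature, and RUL columns are hypothetical condition indicators, not data from the cited example.

    % Hypothetical table of condition indicators with a known remaining useful
    % life (RUL) column; fit a linear regression, then predict RUL for a new reading.
    mdl = fitlm(featureTbl, 'RUL ~ Vibration + Temperature');

    newReading   = table(0.42, 71.3, 'VariableNames', {'Vibration', 'Temperature'});
    predictedRUL = predict(mdl, newReading);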

IDENTIFY objects or actions in image, video, and signal data

Example: Create a computer vision application that can detect vehicles. The Object Detection Using Faster R-CNN Deep Learning example uses a convolutional neural network.

Applications: Advanced driver assistance systems (ADAS) with object detection, robotics, computer vision perception for image recognition, activity detection, voice biometrics (voiceprint)

Input: Images, videos, signals 

Common algorithms: CNNs, clustering, Viola-Jones

Typical approach: Deep learning is more common
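
As a quick sketch of the classical end of the algorithm list above, the Viola-Jones detector in Computer Vision Toolbox locates objects (faces, by default) without any deep learning; img is assumed to be an image you supply. The deep learning route in the cited example replaces this hand-engineered detector with a CNN such as Faster R-CNN.

    % Classical Viola-Jones detection: the default model detects frontal faces.
    detector = vision.CascadeObjectDetector();
    bboxes   = detector(img);                          % one [x y width height] row per detection
    imshow(insertShape(img, 'Rectangle', bboxes))      % draw the detections on the image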

MOVE an object physically or in a simulation

Example: Perform robotic path planning to learn the best possible route to a destination. The Reinforcement Learning (Q-Learning) File Exchange submission uses a deep Q network.

Applications: Control systems, robotics in manufacturing, self-driving cars, drones, video games

Input: Mathematical models, sensor data, videos, lidar data

Common algorithms: Reinforcement learning (deep Q networks), artificial neural networks (ANNs), CNNs, recurrent neural networks (RNNs)

Typical approach: Deep learning is more common
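
The heart of Q-learning, named above, fits in a few lines of plain MATLAB. This is a minimal tabular sketch on a toy problem: stepEnvironment is a placeholder for your own simulation, and the state count, action count, and hyperparameters are illustrative.

    nStates = 10;  nActions = 4;
    alpha = 0.1;  gamma = 0.9;  epsilon = 0.1;   % learning rate, discount, exploration rate
    Q = zeros(nStates, nActions);                % table of action values

    for episode = 1:500
        s = randi(nStates);                      % start each episode in a random state
        for step = 1:50
            if rand < epsilon
                a = randi(nActions);             % explore
            else
                [~, a] = max(Q(s, :));           % exploit the current estimate
            end
            [sNext, r] = stepEnvironment(s, a);  % placeholder for your simulation or robot
            % Q-learning update: nudge Q(s,a) toward reward plus discounted future value
            Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(sNext, :)) - Q(s, a));
            s = sNext;
        end
    end

A deep Q network, as in the cited File Exchange submission, replaces the table Q with a neural network so the same update scales to large or continuous state spaces.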

UNCOVER trends, sentiments, fraud, or threats

Example: Determine how many topics are present in text data. The Analyze Text Data Using Topic Models example uses the latent Dirichlet allocation (LDA) topic model.

Applications: Natural language processing for safety records, market or medical research, sentiment analysis, cybersecurity, document summarization

Input: Streaming text data, static text data

Common algorithms: RNNs, linear regression, SVMs, naive Bayes, latent Dirichlet allocation, latent semantic analysis, word2vec

Typical approach: Machine learning is more common
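
For the topic modeling example above, a minimal Text Analytics Toolbox sketch looks like this; textData is assumed to be a string array of documents, and the choice of five topics is arbitrary.

    documents = tokenizedDocument(textData);      % split raw text into words
    bag       = bagOfWords(documents);            % word-count matrix
    bag       = removeInfrequentWords(bag, 2);    % drop words appearing two times or fewer

    mdl = fitlda(bag, 5);                         % fit a 5-topic LDA model
    topkwords(mdl, 7, 1)                          % inspect the top 7 words of topic 1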

ENHANCE images and signals

Example: Create high-resolution images from low-resolution images. The Single Image Super-Resolution Using Deep Learning example uses a very-deep super-resolution (VDSR) neural network.

Applications: Improve image resolution, denoise signals in audio

Input: Images and signal data

Common algorithms: LSTM, CNNs, VDSR neural network

Typical approach: Deep learning is more common

RESPOND to speech or text commands based on context and learned routines

Example: Automatically recognize spoken commands like “on,” “off,” “stop,” and “go.” The Speech Command Recognition Using Deep Learning example uses a CNN.

Applications: Customer care calls, smart devices, virtual assistants, machine translation and dictation

Input: Acoustic data, text data

Common algorithms: RNNs (LSTM algorithms in particular), CNNs, word2vec

Typical approach: Both approaches are used

Audio signal for the spoken command “on.”

Timeline of artificial intelligence: AI first appeared in the 1950s, machine learning followed around the 1980s, and deep learning emerged most recently, after 2015, as computer performance evolved through each stage.



How Accurate Do You Need to Be?

In general, if you have a large data set, deep learning techniques can produce more accurate results than machine learning techniques. Deep learning uses more complex models with more parameters that can be more closely “fit” to the data.

So, how much data is a “large” data set? It depends. Some popular image classification networks available for transfer learning were trained on a data set consisting of 1.2 million images from 1,000 different categories.

If you want to use machine learning and have a laser focus on accuracy, be careful not to overfit your model to the data.

Overfitting happens when your algorithm is fit too closely to your training data and cannot generalize to a wider data set. The model can’t properly handle new data that doesn’t match its narrow expectations.

To avoid overfitting from the start, make sure you have plenty of training, validation, and test data. Use the training and validation data first to train the model; the data needs to be representative of your real-world data and you need to have enough of it. Once your model is trained, use test data to check that your model is performing well; the test data should be completely new data.
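
One simple way to set up those three sets, assuming your observations are rows of a table tbl, is a pair of holdout splits with cvpartition (the 60/20/20 proportions here are a common starting point, not a rule):

    rng default                                          % reproducible split
    holdTest  = cvpartition(height(tbl), 'HoldOut', 0.2);
    testData  = tbl(test(holdTest), :);                  % 20% kept aside until the end
    trainVal  = tbl(training(holdTest), :);

    holdVal   = cvpartition(height(trainVal), 'HoldOut', 0.25);
    valData   = trainVal(test(holdVal), :);              % 20% of the original data
    trainData = trainVal(training(holdVal), :);          % 60% of the original data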

You’ll need more data to teach the network to distinguish between similar images, such as the African and European swallow.

If you think your model is starting to overfit the data, take a look at:

  • Regularization — Penalizes large parameter values to keep the model from relying too heavily on individual data points.
  • Dropout probability — Randomly deactivates a fraction of the network’s units during each training pass so the model can’t simply memorize the training set (both knobs appear in the sketch below).
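
Both knobs show up directly in Deep Learning Toolbox syntax. Below is a minimal sketch, assuming 28-by-28 grayscale image inputs and training and validation arrays XTrain, YTrain, XVal, and YVal that you supply; the layer sizes and coefficients are illustrative, not tuned values.

    layers = [
        imageInputLayer([28 28 1])
        convolution2dLayer(3, 16, 'Padding', 'same')
        reluLayer
        dropoutLayer(0.2)                    % randomly deactivate 20% of activations each pass
        fullyConnectedLayer(10)
        softmaxLayer
        classificationLayer];

    options = trainingOptions('sgdm', ...
        'L2Regularization', 1e-4, ...        % regularization: penalize large weights
        'ValidationData', {XVal, YVal}, ...  % watch validation loss for signs of overfitting
        'Plots', 'training-progress');

    % net = trainNetwork(XTrain, YTrain, layers, options);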

Do You Need to Explain the Results?

Data scientists often refer to the ability to share and explain results as model interpretability. A model that is easily interpretable has:

  • A small number of features that typically are created from some physical understanding of the system
  • A transparent decision-making process

Interpretability is important for many health, safety, and financial applications. For example, you may need to:

  • Prove that your model complies with government or industry standards
  • Explain factors that contributed to a diagnosis
  • Show the absence of bias in decision-making

If you must have the ability to demonstrate the steps the algorithm took to reach a conclusion, focus your attention on machine learning techniques. Decision trees are famously easy to follow down their Boolean paths of “if x, then y.” Traditional statistics techniques such as linear and logistic regression are well accepted. Even random forests are relatively simple to explain if taken one tree at a time.
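
As a small illustration of that transparency, this sketch fits a shallow decision tree on MATLAB’s built-in Fisher iris data and prints its if/then rules; the limit of four splits is arbitrary.

    load fisheriris                            % meas (measurements) and species (labels)
    tree = fitctree(meas, species, ...
        'MaxNumSplits', 4, ...
        'PredictorNames', {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'});

    view(tree)                                 % print the Boolean "if x, then y" rules
    % view(tree, 'Mode', 'graph')              % or draw the tree as a figure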

If your project is more suited to a neural network, support vector machine, or other model of similar opacity, you still have options.

Research on interpretability using proxies

  • Local interpretable model-agnostic explanations (LIME) take a series of individual inputs and outputs to approximate a complex model’s decision-making.
  • Another area of research uses decision trees, pruned down to a manageable size, as a way to illustrate a more complex model.

Domain Knowledge

How much do you know about the system where your project sits? If you are working on a controls application, do you understand the related systems that might affect your project, or is your experience more siloed? Domain knowledge can play a part in choosing what data to include in a model and determining the most important features of that data.

What Data Should You Include?

For example, suppose a medical researcher wants to make sense of a large amount of patient data. The records could contain thousands of features, from the characteristics of a disease to DNA traits to environmental factors. If you have a solid understanding of the data, select the features you think will be the most influential and start with a machine learning algorithm. If you have high-dimensional data, try dimensionality reduction techniques such as principal component analysis (PCA) to create a smaller number of features and improve results.

Feature Selection

For a model to produce accurate results, you need to make sure it’s using the right data. Feature selection is how you ensure your model is focused on the data with the most predictive power and is not distracted by data that won’t impact decision making. Precise feature selection will result in a faster, more efficient, more interpretable model.

If you have a lot of domain knowledge, use machine learning and manually select the important features of your data.

If you have limited domain knowledge, try automatic feature selection techniques such as neighborhood component analysis or use a deep learning algorithm like CNN for feature selection.

If your data has lots of features, use principal component analysis with machine learning to reduce dimensionality.
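
Here is a minimal sketch of those last two options, assuming a numeric feature matrix X and class labels y that you supply; the cutoffs of 10 features and 95% explained variance are arbitrary choices.

    % Automatic feature selection with neighborhood component analysis
    nca = fscnca(X, y);                             % learns a weight per feature
    [~, topIdx] = maxk(nca.FeatureWeights, 10);     % keep the 10 most informative features
    Xselected = X(:, topIdx);

    % Dimensionality reduction with principal component analysis
    [coeff, score, ~, ~, explained] = pca(zscore(X));
    numComp  = find(cumsum(explained) >= 95, 1);    % components covering 95% of the variance
    Xreduced = score(:, 1:numComp);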

For Example

Signal processing engineers often need to transform their 1D signals to reduce dimensionality (signals often arrive at high sample rates, making the raw data impractical to process) and to expose prominent features specific to the data. One common approach is to convert the 1D signal into a 2D representation using a transform such as a spectrogram.

This conversion highlights the most prominent frequencies of a signal, creating an “image” that can then be used as input to a CNN.
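
A minimal sketch of that conversion with Signal Processing Toolbox, assuming a 1D signal vector x sampled at fs hertz (the window length and overlap here are typical starting values, not tuned):

    [s, f, t] = spectrogram(x, hamming(256), 128, 256, fs);   % short-time Fourier transform
    p   = 10*log10(abs(s).^2 + eps);                           % power in decibels
    img = rescale(p);                                          % scale to [0,1] for use as CNN input

    imagesc(t, f, img); axis xy                                % visualize: time vs. frequency
    xlabel('Time (s)'); ylabel('Frequency (Hz)')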

Original signals for the spoken words “up,” “on,” and “right” (top) and their corresponding spectrograms (bottom).