Data Preprocessing for Deep Learning
Why Is Data Preprocessing Important?
Data preprocessing is the process of preparing raw data before feeding it to a deep learning model. It can highlight patterns in the data and transform the data into a form suited to the network architecture. Careful data preparation can have a significant impact on network accuracy.
Reducing Data Dimensionality
Reducing data dimensionality helps the network recognize patterns in data by removing extraneous information. It also mitigates the “curse of dimensionality”: as the number of input dimensions grows, so do the amount of training data and the network complexity needed to learn reliably. To reduce dimensionality effectively, you must understand your data, although some network architectures can automatically extract relevant features from data.
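The text does not name a specific reduction method. As one illustration, the sketch below uses principal component analysis (PCA), a common technique chosen here as an assumption, computed with NumPy's SVD. The synthetic data (10 features driven by only 2 hidden factors) and the function names `pca_reduce` and `explained_variance_ratio` are hypothetical, made up for this example.

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project rows of X (n_samples x n_features) onto the top-k principal components."""
    # Center each feature so the components describe variance, not the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Keep the k directions that explain the most variance
    return X_centered @ Vt[:k].T

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 10 features, but only 2 underlying factors plus small noise
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

Z = pca_reduce(X, k=2)
print(X.shape, "->", Z.shape)  # (200, 10) -> (200, 2)
print(explained_variance_ratio(X)[:2].sum())  # close to 1: two components keep almost all variance
```

Knowing that the data has only a few underlying factors is exactly the kind of understanding of your data that makes a reduction like this safe to apply.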

Prepping Data for the Network Architecture
A network expects data in a specific format in terms of size, units, and signal type. Two common types of network architecture are convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. The core building block of a CNN is the convolutional layer, which slides a filter across the input volume and responds strongly in regions where the input matches the filter's pattern. From these activated regions, the CNN learns which features are present and where.
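To make the sliding-filter idea concrete, here is a minimal sketch of a single-channel convolution with stride 1 and no padding. It assumes a 2-D grayscale input rather than a full multi-channel input volume, and it uses a hand-made vertical-edge filter; in a real CNN the filter weights are learned during training. The name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` with stride 1 and no padding ('valid' mode).

    Like most deep learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product between the kernel and the patch it currently covers
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 6x6 image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made vertical-edge detector (a trained CNN would learn such weights)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

activation = conv2d_valid(image, kernel)
print(activation)  # each row is [0, 3, 3, 0]: activations peak where the edge is
```

The nonzero entries of `activation` mark both what was found (a vertical edge) and where it was found, which is the information the text describes the CNN learning from activated regions.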
Data Types
For image and video data, every input image must have the same width, height, and number of color channels. As a result, part of the preparation can be cropping, padding, or resizing images. For signal data, the signal length and sample rate must be consistent, which might require cropping, padding, or resampling.  These are only some of the data types you can use in deep learning.
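The sketch below shows, under simplifying assumptions, how inputs of different shapes can be brought to one consistent size: center crop or zero-pad for images, and resample then crop or zero-pad for 1-D signals. The helper names are hypothetical. Resampling here uses plain linear interpolation with no anti-aliasing filter, which is adequate for illustration but not a substitute for a proper resampler such as `scipy.signal.resample_poly`. Interpolating image resizing is not shown.

```python
import numpy as np

def center_crop_or_pad(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Crop or zero-pad an H x W (x C) image around its center to out_h x out_w (x C)."""
    h, w = img.shape[:2]
    out = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
    # Size of the region shared by the source and the output
    ch, cw = min(h, out_h), min(w, out_w)
    # Source offsets (cropping) and destination offsets (padding), both centered
    sy, sx = (h - ch) // 2, (w - cw) // 2
    dy, dx = (out_h - ch) // 2, (out_w - cw) // 2
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out

def resample_linear(x: np.ndarray, fs_in: float, fs_out: float) -> np.ndarray:
    """Resample a 1-D signal by linear interpolation (no anti-aliasing filter)."""
    n_out = int(round(len(x) / fs_in * fs_out))
    t_in = np.arange(len(x)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, x)

def crop_or_pad_1d(x: np.ndarray, length: int) -> np.ndarray:
    """Truncate or zero-pad a 1-D signal at the end to exactly `length` samples."""
    if len(x) >= length:
        return x[:length]
    return np.pad(x, (0, length - len(x)))

# Two RGB images of different sizes -> one 224 x 224 x 3 batch
imgs = [np.ones((300, 200, 3), dtype=np.float32),
        np.ones((150, 260, 3), dtype=np.float32)]
batch = np.stack([center_crop_or_pad(im, 224, 224) for im in imgs])
print(batch.shape)  # (2, 224, 224, 3)

# Signals recorded at different rates -> 16 kHz, exactly 1 second each
sig_a = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)    # 1 s at 8 kHz
sig_b = np.sin(2 * np.pi * 440 * np.arange(24000) / 22050)  # ~1.09 s at 22.05 kHz
fixed = [crop_or_pad_1d(resample_linear(s, fs, 16000), 16000)
         for s, fs in [(sig_a, 8000), (sig_b, 22050)]]
print([len(s) for s in fixed])  # [16000, 16000]
```

The target size (224 x 224) and the 16 kHz, 1-second target are arbitrary choices for the example; in practice they are dictated by the network's input layer.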


Example: Preprocessing Audio Signals with STFT
A short-time Fourier transform (STFT) can be used to visualize how the frequency content of an audio signal changes over time. The squared magnitude of the STFT is known as the spectrogram, a time-frequency representation of the signal. Because the spectrogram is a 2-D array, it can be treated like an image and used as input to a CNN.
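A minimal NumPy sketch of the spectrogram described above: the signal is split into overlapping Hann-windowed frames, each frame is transformed with a one-sided FFT, and the squared magnitude is kept. The frame length of 256 samples, the hop of 128, the test signal, and the dB scaling step are assumptions chosen for illustration, not values prescribed by the text.

```python
import numpy as np

def spectrogram(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Squared-magnitude STFT: frequency bins along axis 0, time frames along axis 1."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Each column is one windowed frame of the signal
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    # One-sided FFT of each frame (real input -> frame_len // 2 + 1 bins)
    stft = np.fft.rfft(frames, axis=0)
    return np.abs(stft) ** 2

fs = 8000
t = np.arange(fs) / fs  # 1 second of samples
# Two-tone test signal: 500 Hz in the first half, 2000 Hz in the second half
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

S = spectrogram(x)
print(S.shape)  # (129, 61): 129 frequency bins x 61 time frames

# Log (dB) scaling is often applied before using a spectrogram as network input
log_S = 10 * np.log10(S + 1e-10)
# Add batch and channel axes so the array looks like a one-image, one-channel batch
cnn_input = log_S[np.newaxis, ..., np.newaxis]
print(cnn_input.shape)  # (1, 129, 61, 1)
```

With a bin spacing of 8000 / 256 = 31.25 Hz, the energy sits in bin 16 (500 Hz) for early frames and in bin 64 (2000 Hz) for late frames, so the change in frequency content over time appears directly in the 2-D array.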
