Data preprocessing is the second stage of the workflow for predictive maintenance algorithm development.

Data preprocessing is often necessary to clean the data and convert it into a form from which you can extract condition indicators. Data preprocessing can include:

Outlier and missing-value removal, offset removal, and detrending.

Noise reduction, such as filtering or smoothing.

Transformations between time and frequency domain.

More advanced signal processing such as short-time Fourier transforms and transformations to the order domain.

You can perform data preprocessing on arrays or tables of measured or simulated data that you manage with Predictive Maintenance Toolbox™ ensemble datastores, as described in Data Ensembles for Condition Monitoring and Predictive Maintenance. Generally, you preprocess your data before analyzing it to identify a promising condition indicator, a quantity that changes in a predictable way as system performance degrades. (See Condition Indicators for Monitoring, Fault Detection, and Prediction.) There can be some overlap between the steps of preprocessing and identifying condition indicators. Typically, though, preprocessing results in a cleaned or transformed signal, on which you perform further analysis to condense the signal information into a condition indicator.

Understanding your machine and the kind of data you have can help determine what preprocessing methods to use. For example, if you are filtering noisy vibration data, knowing what frequency range is most likely to display useful features can help you choose preprocessing techniques. Similarly, it might be useful to transform gearbox vibration data to the order domain, which is used for rotating machines when the rotational speed changes over time. However, that same preprocessing would not be useful for vibration data from a car chassis, which is a rigid body.

MATLAB® includes many functions that are useful for basic preprocessing of data in arrays or tables. These include functions for:

Data cleaning, such as `fillmissing` and `filloutliers`. Data cleaning uses various techniques for finding, removing, and replacing bad or missing data.

Smoothing data, such as `smoothdata` and `movmean`. Use smoothing to eliminate unwanted noise or high variance in data.

Detrending data, such as `detrend`. Removing a trend from the data lets you focus your analysis on the fluctuations in the data about the trend. While some trends can be meaningful, others are due to systematic effects, and some types of analyses yield better insight once you remove them. Removing offsets is another, similar type of preprocessing.

Scaling or normalizing data, such as `rescale`. Scaling changes the bounds of the data, and can be useful, for example, when you are working with data in different units.
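As a rough illustration of how these functions can be chained, the following sketch cleans a synthetic signal. The signal, the sample rate, the window lengths, and the order of the steps are all assumptions chosen for illustration, not a recommended recipe for any particular machine.

```matlab
% Synthetic signal: linear drift plus a 2 Hz sinusoid, with one dropout
% and one spike. All values are illustrative assumptions.
fs = 100;                          % sample rate in Hz (assumed)
t  = (0:10*fs-1)'/fs;              % 10 s time vector, 1000 samples
x  = 0.5*t + sin(2*pi*2*t);        % drift plus oscillation
x(200) = NaN;                      % simulated missing sample
x(600) = 50;                       % simulated outlier

% Data cleaning: fill the gap, then replace local outliers.
x1 = fillmissing(x, 'linear');
x2 = filloutliers(x1, 'linear', 'movmedian', 25);

% Detrending: remove the best-fit straight line.
x3 = detrend(x2);

% Smoothing: 5-sample moving average.
x4 = smoothdata(x3, 'movmean', 5);

% Scaling: map the result to the interval [0, 1].
x5 = rescale(x4);
```

Whether to detrend before or after smoothing, and which outlier detection method to use, depends on your data.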

Another common type of preprocessing is to extract a useful portion of the signal and discard other portions. For instance, you might discard the first five seconds of a signal that is part of some start-up transient, and retain only the data from steady-state operation. For an example that performs this kind of preprocessing, see Using Simulink to Generate Fault Data.
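A minimal sketch of this kind of trimming, assuming a column vector of samples with a matching time vector, might use logical indexing. The signal and the five-second cutoff are hypothetical values for illustration.

```matlab
% Synthetic signal whose amplitude settles after a start-up transient.
fs = 100;                              % sample rate in Hz (assumed)
t  = (0:20*fs-1)'/fs;                  % 20 s time vector
x  = (1 - exp(-t)) .* sin(2*pi*3*t);   % amplitude ramps up, then levels off

% Keep only the samples at or after the assumed end of the transient.
tStart  = 5;                           % seconds to discard (assumed)
keep    = t >= tStart;
tSteady = t(keep);
xSteady = x(keep);
```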

For more information on basic preprocessing commands in MATLAB, see Preprocessing Data.

Filtering is another way to remove noise or unwanted components from a signal. Filtering is helpful when you know what frequency range in the data is most likely to display useful features for condition monitoring or prediction. The basic MATLAB function `filter` lets you filter a signal with a transfer function. You can use `designfilt` to generate filters for use with `filter`, such as bandpass, high-pass, and low-pass filters, and other common filter forms. For more information about using these functions, see Digital and Analog Filters.
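For example, the following sketch designs a low-pass FIR filter with `designfilt` and applies it with `filter`. It assumes Signal Processing Toolbox is available. The test signal and the filter specifications are assumptions chosen so that a 10 Hz component is kept and a 200 Hz component is suppressed.

```matlab
% Test signal: a 10 Hz component of interest plus 200 Hz interference.
fs = 1000;                             % sample rate in Hz (assumed)
t  = (0:fs-1)'/fs;                     % 1 s time vector
x  = sin(2*pi*10*t) + 0.5*sin(2*pi*200*t);

% Low-pass FIR design: pass below 50 Hz, stop above 100 Hz.
d = designfilt('lowpassfir', ...
    'PassbandFrequency', 50, ...
    'StopbandFrequency', 100, ...
    'PassbandRipple', 1, ...
    'StopbandAttenuation', 60, ...
    'SampleRate', fs);

% Apply the filter. The output is delayed relative to the input,
% and the first samples contain the filter start-up transient.
y = filter(d, x);
```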

If you have a Wavelet Toolbox™ license, you can use wavelet tools for more complex filter approaches. For instance, you can divide your data into subbands, process the data in each subband separately, and recombine them to construct a modified version of the original signal. For more information about such filters, see Filter Banks (Wavelet Toolbox). You can also use the Signal Processing Toolbox™ function `emd` to separate a mixed signal into components with different time-frequency behavior.

Predictive Maintenance Toolbox and Signal Processing Toolbox provide functions that let you study and characterize vibrations in mechanical systems in the time domain. Use these functions for preprocessing or extraction of condition indicators. For example:

`tsa` - Remove noise coherently with time-synchronous averaging and analyze wear using envelope spectra. The example Using Simulink to Generate Fault Data uses time-synchronous averaging to preprocess vibration data.

`tsadifference` - Remove the regular signal, the first-order sidebands, and other specific sidebands with their harmonics from a time-synchronous averaged (TSA) signal.

`tsaregular` - Isolate the known signal from a TSA signal by removing the residual signal and specific sidebands.

`tsaresidual` - Isolate the residual signal from a TSA signal by removing the known signal components and their harmonics.

`ordertrack` - Use order analysis to analyze and visualize spectral content occurring in rotating machinery. Track and extract orders and their time-domain waveforms.

`rpmtrack` - Track and extract the RPM profile from a vibration signal by computing the RPM as a function of time.

`envspectrum` - Compute an envelope spectrum. The envelope spectrum removes the high-frequency sinusoidal components from the signal and focuses on the lower-frequency modulations. The example Rolling Element Bearing Fault Diagnosis uses an envelope spectrum for such preprocessing.

For more information on these and related functions, see Vibration Analysis.
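As a small sketch of time-synchronous averaging, the following code applies `tsa` to a noisy periodic signal. The rotation frequency, the noise level, and the idealized tachometer pulse times are assumptions for illustration. In practice, the pulse times come from a measured tachometer signal.

```matlab
rng(0);                                % fix the seed so the noise is repeatable
fs = 1000;                             % sample rate in Hz (assumed)
t  = (0:2*fs-1)'/fs;                   % 2 s time vector
f0 = 10;                               % shaft rotation frequency in Hz (assumed)
x  = sin(2*pi*f0*t) + 0.5*randn(size(t));   % periodic component plus noise

% Idealized tachometer pulses, one at the start of each rotation.
tPulse = (0:1/f0:t(end))';

% Average the signal over rotations. Noise that is not synchronous
% with the rotation is attenuated in the averaged signal.
[xTsa, tTsa] = tsa(x, fs, tPulse);
```

The output `xTsa` spans a single rotation, so it is much shorter than `x`.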

For vibrating or rotating systems, fault development can be indicated by changes in frequency-domain behavior such as the changing of resonant frequencies or the presence of new vibrational components. Signal Processing Toolbox provides many functions for analyzing such spectral behavior. Often these are useful as preprocessing before performing further analysis for extracting condition indicators. Such functions include:

`pspectrum` - Compute the power spectrum, time-frequency power spectrum, or power spectrogram of a signal. The spectrogram contains information about how the power distribution changes with time. The example Multi-Class Fault Detection Using Simulated Data performs data preprocessing using `pspectrum`.

`envspectrum` - Compute an envelope spectrum. A fault that causes a repeating impulse or pattern will impose amplitude modulation on the vibration signal of the machinery. The envelope spectrum removes the high-frequency sinusoidal components from the signal and focuses on the lower-frequency modulations. The example Rolling Element Bearing Fault Diagnosis uses an envelope spectrum for such preprocessing.

`orderspectrum` - Compute an average order-magnitude spectrum.

`modalfrf` - Estimate the frequency-response function of a signal.

For more information on these and related functions, see Vibration Analysis.
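The following sketch shows two of these functions on synthetic signals. The tone frequencies, the carrier, and the modulation rate are assumptions chosen for illustration. The carrier is placed at 300 Hz so that it falls inside the default analysis band of `envspectrum` for this sample rate.

```matlab
fs = 1000;                             % sample rate in Hz (assumed)
t  = (0:2*fs-1)'/fs;                   % 2 s time vector

% Power spectrum of a two-tone signal: locate the dominant frequency.
x = sin(2*pi*50*t) + 0.3*sin(2*pi*120*t);
[p, f] = pspectrum(x, fs);
[~, idx] = max(p);
fPeak = f(idx);                        % frequency of the strongest component

% Envelope spectrum of an amplitude-modulated signal:
% a 300 Hz carrier modulated at 10 Hz.
xAm = (1 + 0.5*sin(2*pi*10*t)) .* sin(2*pi*300*t);
[es, fEnv] = envspectrum(xAm, fs);

% Search for the modulation peak, skipping the lowest bins in case
% they hold residual low-frequency content.
sel  = fEnv > 2 & fEnv < 100;
fSel = fEnv(sel);
[~, k] = max(es(sel));
fMod = fSel(k);                        % estimated modulation frequency
```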

Signal Processing Toolbox includes functions for analyzing systems whose frequency-domain behavior changes with time. Such analysis is called *time-frequency* analysis, and is useful for analyzing and detecting transient or changing signals associated with changes in system performance. These functions include:

`spectrogram` - Compute a spectrogram using a short-time Fourier transform. The spectrogram describes the time-localized frequency content of a signal and its evolution over time. The example Condition Monitoring and Prognostics Using Vibration Signals uses `spectrogram` to preprocess signals and help identify potential condition indicators.

`hht` - Compute the Hilbert spectrum of a signal. The Hilbert spectrum is useful for analyzing signals that comprise a mixture of signals whose spectral content changes in time. This function computes the spectrum of each component in the mixed signal, where the components are determined by empirical mode decomposition.

`emd` - Compute the empirical mode decomposition of a signal. This decomposition describes the mixture of signals analyzed in a Hilbert spectrum, and can help you separate a mixed signal to extract a component whose time-frequency behavior changes as system performance degrades. You can use `emd` to generate the inputs for `hht`.

`kurtogram` - Compute the time-localized spectral kurtosis, which characterizes a signal by differentiating stationary Gaussian signal behavior from nonstationary or non-Gaussian behavior in the frequency domain. As preprocessing for other tools such as envelope analysis, spectral kurtosis can supply key inputs such as the optimal band. (See `pkurtosis`.) The example Rolling Element Bearing Fault Diagnosis uses spectral kurtosis for preprocessing and extraction of condition indicators.

For more information on these and related functions, see Time-Frequency Analysis.
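As an illustrative sketch, the following code applies `spectrogram`, `emd`, and `hht` to a synthetic signal whose frequency rises over time. The signal and the window, overlap, and FFT lengths are assumptions, not tuned values.

```matlab
fs = 1000;                             % sample rate in Hz (assumed)
t  = (0:2*fs-1)'/fs;                   % 2 s time vector

% Swept sine with instantaneous frequency 50 + 100*t Hz.
x = sin(2*pi*(50*t + 50*t.^2));

% Short-time Fourier transform: 256-sample segments, 128-sample overlap,
% 256-point FFT.
[s, f, tSeg] = spectrogram(x, 256, 128, 256, fs);

% Dominant frequency in each time segment, which follows the sweep.
[~, idx] = max(abs(s), [], 1);
fRidge = f(idx);

% Empirical mode decomposition, then the Hilbert spectrum of the
% resulting intrinsic mode functions.
imf = emd(x);                          % each column is one mode
[hs, fH, tH] = hht(imf, fs);
```

Calling `spectrogram` or `hht` without output arguments plots the result instead of returning it.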