What Is Anomaly Detection - MATLAB & Simulink

What Is Anomaly Detection?

Design algorithms to detect unexpected events or patterns in data

Anomaly detection is the process of identifying events or patterns in data that deviate from expected behavior. Anomalies often indicate important behavior, such as machine faults, security breaches, or process inefficiencies.

Benefits of Anomaly Detection

Anomaly detection methods can range from simple outlier detection to complex machine learning algorithms trained to uncover hidden patterns in massive multivariate data sets. For example, in an industrial setting with thousands of sensors, time series anomaly detection algorithms can uncover patterns that would be impossible for humans to identify manually. Use cases and benefits of anomaly detection include:

  • Predictive maintenance: Anomalies in equipment sensor outputs, such as vibration or temperature data, may be precursors to more severe failures. Addressing these anomalies early can improve equipment safety and avoid costly downtime.
  • Process monitoring: Anomaly detection in process data, such as energy flows across a power grid, can reveal opportunities for optimizing performance and improving asset availability.
  • Test data insights: During testing, anomaly detection can be used to identify issues in prototypes, remove bad sensor readings, and understand system performance. This approach helps ensure that products meet important standards before becoming operational.
  • Quality control: In manufacturing, anomalies can indicate physical defects on production lines that may go unnoticed with traditional methods. Catching them early helps minimize waste and optimize yield.

Types of Anomalies

Anomalies in time series data typically fall into three main categories:

Point anomalies, or outliers, are individual data points that deviate from the expected value. An isolated point anomaly may indicate a problem or system change that requires investigation, or it may not indicate a problem unless several point anomalies occur in quick succession.
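A point anomaly can often be flagged with a robust threshold. The Python sketch below illustrates one common rule: mark any point that lies more than three scaled median absolute deviations (MAD) from the median. The threshold of 3, the function name mad_outliers, and the sample readings are assumptions chosen for illustration, not settings prescribed by this article.

```python
from statistics import median

def mad_outliers(values, threshold=3.0):
    """Return indices of points more than `threshold` scaled MADs from the median."""
    med = median(values)
    # 1.4826 scales the MAD so it estimates the standard deviation for normal data.
    scaled_mad = 1.4826 * median([abs(v - med) for v in values])
    if scaled_mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values) if abs(v - med) > threshold * scaled_mad]

# Hypothetical temperature readings with one isolated spike at index 4.
readings = [20.1, 19.8, 20.3, 20.0, 35.7, 19.9, 20.2, 20.1]
outlier_indices = mad_outliers(readings)
print(outlier_indices)
```

Because the median and MAD are barely affected by the spike itself, the threshold stays tight around the normal readings, which is why this kind of rule is often preferred over a mean and standard deviation.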

Collective anomalies occur when a group of data points together deviate from expected patterns. For example, a sequence of irregular fluctuations in power consumption might indicate a larger issue, such as a failing component or change in usage patterns.

Multivariate anomalies become apparent only when multiple data sources are analyzed together. For example, temperature and pressure readings in a chemical process may individually indicate safe operating conditions, but a change in their relationship could signal a problem.

Anomaly Detection with MATLAB

There are many ways to design time series anomaly detection algorithms in MATLAB®. The anomaly detection approach most suitable for a given application will depend on the amount of anomalous data available and whether you can distinguish anomalies from normal data.

Data Access and Exploration

The foundation of any anomaly detection project is data. MATLAB supports many ways to access your data, whether importing historical data sets stored locally or in the cloud or acquiring data directly from sensors or databases.

Sometimes you can perform anomaly detection just by looking at your data. For example, in the figure below showing signals collected from a fan, you can easily see the abrupt signal changes that indicate anomalies in the fan behavior. If you are able to detect anomalies by eye, you may be able to use a simple algorithm such as findchangepts or controlchart.

Data from a cooling fan (motor voltage, fan speed, and temperature) showing anomalies that are easy to detect by eye, created using the MATLAB plot function.
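When a shift is visible by eye, a changepoint search can locate it automatically. The Python sketch below illustrates the general idea of finding a single change in the mean: try every split position and keep the one that minimizes the total squared error of the two segments. It is not the implementation behind findchangepts, and the function name and fan_speed values are made up for illustration.

```python
def find_mean_changepoint(signal):
    """Return the index that best splits `signal` into two constant-mean segments."""
    def sse(segment):
        # Sum of squared errors around the segment mean.
        seg_mean = sum(segment) / len(segment)
        return sum((x - seg_mean) ** 2 for x in segment)

    best_index, best_cost = None, float("inf")
    for k in range(1, len(signal)):
        cost = sse(signal[:k]) + sse(signal[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

# Hypothetical fan speed that drops abruptly at sample 5.
fan_speed = [50.2, 49.8, 50.1, 50.0, 49.9, 42.1, 41.8, 42.3, 42.0, 41.9]
change_index = find_mean_changepoint(fan_speed)
print(change_index)
```

The returned index is the first sample of the new regime. This brute-force search is quadratic in the signal length, which is fine for a sketch but would need a faster formulation for long recordings.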

Data Preprocessing and Feature Engineering

Anomalies are often hard to detect visually from raw data. Today’s complex machines can have thousands of sensors, and sometimes anomalies become apparent only when considering many sensors at once. If you have labeled data, you can examine the statistical distributions of time- and frequency-domain features. You can also apply unsupervised feature ranking to identify the features with the highest variance or the lowest correlation with other features; these features have the best chance of training an accurate time series anomaly detection algorithm.
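As a rough illustration of unsupervised feature ranking, the Python sketch below extracts three time-domain features from each signal segment and ranks them by variance. The feature set, the function names, the sample segments, and the choice to divide each feature by its mean before computing variance (so that features with different units are comparable) are all assumptions made for this example.

```python
import math
from statistics import mean, pvariance

def extract_features(segment):
    """Compute a few common time-domain features for one signal segment."""
    rms = math.sqrt(mean([x * x for x in segment]))
    peak = max(abs(x) for x in segment)
    return {
        "rms": rms,
        "peak_to_peak": max(segment) - min(segment),
        "crest_factor": peak / rms,
    }

def rank_by_relative_variance(feature_rows):
    """Rank features by the variance of each feature after dividing by its mean."""
    ranking = []
    for name in feature_rows[0]:
        column = [row[name] for row in feature_rows]
        center = mean(column)
        ranking.append((name, pvariance([v / center for v in column])))
    return sorted(ranking, key=lambda item: item[1], reverse=True)

# Hypothetical vibration segments whose amplitude varies but whose shape does not.
segments = [
    [1.0, -1.0, 1.0, -1.0],
    [2.0, -2.0, 2.0, -2.0],
    [1.0, -1.0, 1.0, -1.0],
    [3.0, -3.0, 3.0, -3.0],
]
feature_rows = [extract_features(s) for s in segments]
ranked = rank_by_relative_variance(feature_rows)
for name, score in ranked:
    print(name, round(score, 4))
```

In this toy data the amplitude-based features vary from segment to segment while the crest factor stays constant, so the crest factor ranks last and would carry no information for a detector.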

The Diagnostic Feature Designer in Predictive Maintenance Toolbox™ can help you to interactively extract, analyze, and rank features from time series data.

Using the Diagnostic Feature Designer app to extract, explore, and rank features in normal three-axis vibration data. The app shows a bar chart for feature ranking and probability histograms.

Designing Anomaly Detection Algorithms

Anomaly detection algorithms may involve applying a statistical method to historical data or training an AI model to detect anomalies on new data. MATLAB provides a broad range of time series anomaly detection approaches that fall into three categories: statistical and distance-based methods, one-class AI models, and clustering. These are all possible tools to use when designing anomaly detection algorithms, but where you start will depend on the data you have and your goals.

The table below provides a high-level overview of when to use different broad categories of anomaly detection methods and a few example functions in MATLAB. This is not an exhaustive list, and you should try a variety of techniques to achieve the best results.

Anomaly Detection Approaches

Category: Statistical and distance-based methods
Example functions: isoutlier, findchangepts, matrixProfile, robustcov, mahal
Use when:
  • You have a lot of normal data, with some anomalies mixed in.
  • You want to find the anomalies in your data.
  • You want a simple, easily explainable approach.

Category: One-class AI models
Example functions: ocsvm, iforest, rccforest, lof, deepSignalAnomalyDetector
Use when:
  • You have a lot of normal data, but few anomalies.
  • You want to train an AI model to detect anomalies on new data.

Category: Clustering methods
Example functions: kmeans, dbscan, clusterdata, fitgmdist
Use when:
  • You have a balanced mix of normal data and anomalies.
  • You want to separate the data into identifiable clusters.

Statistical and Distance-Based Methods

Statistical and distance-based methods for anomaly detection rely on assumptions about the underlying data distribution. They identify anomalies by determining which data points deviate significantly from expected behavior. These methods include:

  • Outlier and changepoint detection: Outlier detection flags individual data points that fall outside expected thresholds, such as unusually high or low values. Changepoint detection identifies shifts in the data’s statistical properties, such as a sudden change in the mean or variance. These are useful methods for point or collective anomaly detection that can help you understand or clean up your data, but they may not be robust enough to catch subtler behavior changes.
  • Matrix and distance profiling: The distance profile and matrix profile analyze time series data to uncover motifs (repeated patterns) and discords (rare or anomalous patterns). By comparing segments of the data with one another, they are particularly useful for detecting collective anomalies in long time series, and they are computationally efficient.
  • Robust covariance: Robust covariance models the data’s distribution and detects points that fall outside its contours. By considering the relationships among variables, it’s effective for anomaly detection in data sets where multivariate anomalies result from deviations in multiple features simultaneously.
  • Mahalanobis distance: Mahalanobis distance measures how far a data point is from the center of a multivariate distribution. It’s particularly suited for detecting multivariate anomalies in data sets where features are correlated.
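To make the multivariate case concrete, the Python sketch below computes the Mahalanobis distance for two-dimensional data using the closed-form inverse of a 2x2 covariance matrix. The function name, the temperature and pressure values, and the two test points are hypothetical; the sketch illustrates the distance itself and is not the implementation of mahal or robustcov.

```python
from statistics import mean

def mahalanobis_2d(point, data):
    """Mahalanobis distance from `point` to the distribution of 2-D `data`."""
    xs, ys = zip(*data)
    mx, my = mean(xs), mean(ys)
    n = len(data)
    # Entries of the sample covariance matrix S.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d' * inv(S) * d, using the closed-form 2x2 inverse.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return squared ** 0.5

# Hypothetical (temperature, pressure) pairs that rise together in normal operation.
normal = [(60, 1.02), (62, 1.09), (64, 1.21), (66, 1.28), (68, 1.41), (70, 1.49)]

# Both points lie inside the individual ranges of each variable,
# but only the first one respects the usual temperature-pressure relationship.
consistent_distance = mahalanobis_2d((67, 1.35), normal)
inconsistent_distance = mahalanobis_2d((62, 1.45), normal)
print(round(consistent_distance, 2), round(inconsistent_distance, 2))
```

The second point would pass a per-variable range check, yet its distance is far larger because it breaks the correlation between the two features. The sketch assumes the covariance matrix is invertible, which fails if the features are perfectly correlated.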

Applying distanceProfile for anomaly detection. The distance profile gives the distance between a query subsequence (red) and all other subsequences in the time series, which identifies the most similar subsequence (motif) and the most different one (discord).
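The idea behind a distance profile can be sketched in a few lines of Python: z-normalize the query subsequence and every other subsequence of the same length, then take the Euclidean distance between them. This is a brute-force illustration with hypothetical names and data, not the algorithm used by distanceProfile, which is designed to be much more efficient.

```python
from statistics import mean, pstdev

def z_normalize(window):
    """Shift and scale a window to zero mean and unit standard deviation."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]

def distance_profile(series, query_start, length):
    """Distance from one query subsequence to every subsequence of `series`."""
    query = z_normalize(series[query_start:query_start + length])
    profile = []
    for start in range(len(series) - length + 1):
        window = z_normalize(series[start:start + length])
        profile.append(sum((q - w) ** 2 for q, w in zip(query, window)) ** 0.5)
    return profile

# Hypothetical periodic signal with one distorted cycle (samples 8 to 11).
series = [0, 1, 0, -1, 0, 1, 0, -1, 0, 1, 1, 1, 0, 1, 0, -1]
profile = distance_profile(series, query_start=0, length=4)

# Skip windows that overlap the query itself when searching for matches.
candidates = range(4, len(profile))
motif_start = min(candidates, key=lambda i: profile[i])
discord_start = max(candidates, key=lambda i: profile[i])
print(motif_start, discord_start)
```

A single distance profile only describes how the rest of the series relates to one query. The matrix profile repeats this comparison for every subsequence, so a distorted cycle stands out as the subsequence whose nearest match is unusually far away.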

One-Class AI Models

One-class models are designed for unsupervised anomaly detection, that is, for cases where most or all of the data is “normal” (not anomalous). These models can be based on machine learning or deep learning. They are trained on normal data, and any deviations from normal are flagged as anomalies. One-class models work best when anomalies are rare or lack a well-defined pattern, and you have lots of normal data for training. They can be applied to point, collective, or multivariate anomalies.

Popular one-class AI models for anomaly detection in MATLAB include:

  • Isolation forest: Isolation forests build trees that isolate each observation into a leaf. The anomaly score is based on the average depth at which a sample is isolated across the trees: anomalous samples take fewer splits to isolate than normal ones. This anomaly detection method supports a mix of numeric and categorical features and works on high-dimensional data.
  • One-class support vector machine: One-class support vector machines create a boundary around normal data points in a high-dimensional feature space, separating them from points outside the boundary, which are classified as anomalies. This approach requires numeric features as input and will not work well on high-dimensional data.
  • Autoencoders: Autoencoders are neural networks trained on normal data that attempt to reconstruct the original input. The trained autoencoder will reconstruct a normal input accurately, but a large difference between the input and its reconstruction indicates an anomaly. Autoencoders may use convolutional neural networks or long short-term memory networks. Signal Processing Toolbox™ provides a deepSignalAnomalyDetector object for applying these deep learning methods.

Example of multivariate anomalies detected in three-axis vibration data, plotted before and after maintenance for three channels. Here, the data collected right after maintenance is normal, while the data collected right before maintenance is identified as an anomaly.
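The isolation idea described above can be sketched in plain Python. Each tree splits the data on a random feature at a random value, and the average number of splits needed to reach a point serves as a raw score, with shallower meaning more anomalous. This is a deliberately simplified illustration: it omits the subsampling and score normalization used by full isolation forest implementations, it is not the code behind iforest, and all names and data points are hypothetical.

```python
import random

def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until points are isolated."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(tree, point, depth=0):
    """Number of splits needed to reach the leaf that holds `point`."""
    if "split" not in tree:
        return depth
    branch = "left" if point[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)

def average_depths(points, n_trees=100, max_depth=8, seed=0):
    """Average isolation depth of each point over a forest of random trees."""
    rng = random.Random(seed)
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    return [sum(path_length(t, p) for t in trees) / n_trees for p in points]

# Hypothetical two-feature data: seven clustered points and one far-away point.
points = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.2, 2.0),
          (1.0, 2.2), (0.8, 2.1), (1.1, 1.8), (5.0, 6.0)]
depths = average_depths(points)
most_anomalous = depths.index(min(depths))
print(most_anomalous)
```

The distant point is usually cut off by the very first split, so its average depth is the smallest in the set, while points inside the cluster need several more splits before they are alone in a leaf.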

Clustering

When you have anomalies mixed into your data but cannot label them, you can also try unsupervised clustering approaches to anomaly detection. Clustering algorithms group similar data points based on their characteristics. Typically, clustering methods are applied to features extracted from the time series data, such as principal components. Sometimes you can associate individual clusters with normal or anomalous data, but unless your data set is balanced (containing many anomalies of the same type), you are more likely to get useful results with the one-class AI methods.
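As a minimal sketch of the clustering approach, the Python code below runs Lloyd's k-means algorithm on a single extracted feature. The function name, the deterministic initialization, and the RMS values are assumptions made for illustration; this is not the implementation of kmeans, and real applications would typically cluster several features at once.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar feature values with Lloyd's algorithm (requires k >= 2)."""
    ordered = sorted(values)
    # Spread the initial centroids across the sorted data so runs are repeatable.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Hypothetical RMS vibration features: four healthy segments, four faulty ones.
rms_values = [0.9, 1.1, 1.0, 1.2, 3.1, 2.9, 3.0, 3.2]
labels, centroids = kmeans_1d(rms_values)
print(labels, [round(c, 2) for c in centroids])
```

The algorithm only separates the data into groups. Deciding which cluster represents anomalous behavior still requires domain knowledge, for example knowing that higher vibration levels indicate a fault.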

Testing and Validation

Testing and validation ensure that an anomaly detection model performs accurately and reliably on unseen data. This process requires some labeled anomalous data, which can be hard to come by. When anomalous data is scarce, you can generate synthetic time series data using rule-based methods (e.g., adding noise or spikes) or from physics-based Simscape™ models of your system. You can also generate additional tabular feature data using synthesizeTabularData.
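A rule-based generator can be very simple. The Python sketch below adds spikes of a fixed magnitude at random positions in a clean signal and returns the spike positions as ground-truth labels. The function name, the spike magnitude, and the sine-wave signal are hypothetical choices for illustration.

```python
import math
import random

def inject_spikes(signal, n_spikes, magnitude, seed=0):
    """Return a copy of `signal` with spikes added, plus the spike indices as labels."""
    rng = random.Random(seed)
    spike_indices = sorted(rng.sample(range(len(signal)), n_spikes))
    corrupted = list(signal)
    for i in spike_indices:
        # Each spike points up or down with equal probability.
        corrupted[i] += magnitude * rng.choice([-1, 1])
    return corrupted, spike_indices

# Hypothetical clean signal: five periods of a sine wave.
clean = [math.sin(2 * math.pi * t / 20) for t in range(100)]
corrupted, spike_indices = inject_spikes(clean, n_spikes=3, magnitude=5.0)
print(spike_indices)
```

Because the positions of the injected spikes are known, the corrupted signal can serve as labeled test data for scoring a detector.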

Engineers commonly split data into training, validation, and test sets: the training set teaches the model normal behavior, the validation set tunes it during training, and the test set evaluates its final performance. Performance metrics like precision, recall, F1-score, and ROC-AUC assess how well the model detects anomalies. Robustness testing further ensures reliability by evaluating the model on noisy, incomplete, or unseen data patterns.
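Precision, recall, and F1-score follow directly from counts of true positives, false positives, and false negatives. The Python sketch below computes them for binary labels, where 1 marks an anomaly; the function name and the label vectors are made up for illustration.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (precision, recall, f1) for binary labels where 1 marks an anomaly."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)    # normal points flagged by mistake
    fn = sum(1 for t, p in pairs if t and not p)    # anomalies that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical test set with three true anomalies; the detector finds two of them
# and raises one false alarm.
true_labels      = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predicted_labels = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = detection_metrics(true_labels, predicted_labels)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Plain accuracy is usually a poor choice for anomaly detection because anomalies are rare: a detector that never flags anything can still score high accuracy while having zero recall.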

Deployment

You can deploy algorithms to embedded devices for real-time anomaly detection by generating C/C++ code or Simulink blocks. If you want the model to learn continuously from new data, you might also apply incremental anomaly detection. Here, the algorithm processes incoming data with little or no prior information about it and adapts to changes in real time.
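One simple way to picture incremental detection is a detector that keeps running statistics and updates them with every new sample. The Python sketch below uses Welford's online algorithm for the mean and variance and flags samples that fall more than a chosen number of standard deviations from the running mean. The class name, the threshold of 4, the warm-up length, and the data stream are assumptions for illustration; this is not the method used by MATLAB's incremental anomaly detection tools.

```python
class StreamingDetector:
    """Flag samples far from a running mean that is updated one sample at a time."""

    def __init__(self, threshold=4.0, warmup=10):
        self.threshold = threshold  # allowed distance in standard deviations
        self.warmup = warmup        # samples to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, value):
        """Score `value` against the current statistics, then learn from it."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = (self.m2 / (self.count - 1)) ** 0.5
            is_anomaly = std > 0 and abs(value - self.mean) > self.threshold * std
        # Welford's update: no past samples need to be stored.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly

# Hypothetical sensor stream with a single spike at index 11.
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.9, 10.1, 10.0,
          10.2, 15.0, 9.9, 10.1]
detector = StreamingDetector()
flagged = [i for i, value in enumerate(stream) if detector.update(value)]
print(flagged)
```

This sketch learns from every sample, including flagged ones, so it adapts to lasting shifts in the data at the cost of temporarily widening its threshold after a spike. Skipping the update for flagged samples would keep the threshold tight but prevent the detector from adapting to a new operating level.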

For operating on enterprise-wide data, you can also deploy anomaly detection algorithms to your choice of cloud environments.
