Introduction to Anomaly Detection for Engineers
Brian Douglas
Anomaly detection is the process of identifying events or patterns that differ from expected behavior. This is important for applications like predictive maintenance but can be hard to achieve by inspection alone. Machine learning and deep learning (AI) techniques for anomaly detection can uncover anomalies in time series or image data that would otherwise be hard to spot. Learn how and why to apply anomaly detection algorithms to identify anomalies in hardware sensor data.
Published: 23 Sep 2022
One of these images is not like the other. Can you spot it? Ok, that’s pretty easy. How about in time series data? Which of these data streams is the odd one out? Ok, fine, that's also pretty simple. But what about these images? Or these time series? Can you tell which is the expected behavior and which is anomalous behavior? It can get pretty tough to identify events or patterns that differ from the expected behavior through inspection alone. And as I will show you later in this video with a demonstration, it’s especially difficult to find very subtle changes in the behavior of hardware. This is where anomaly detection can be beneficial.
An anomaly is a deviation from the expected behavior, and anomaly detection algorithms try to find those deviations. Now, a deviation doesn’t necessarily mean that a fault has occurred, or that something is broken or not performing to within its specification. It’s simply a difference, and understanding when those differences occur in your data can be really helpful. So, let’s talk about that. I’m Brian, and welcome to a MATLAB Tech Talk.
Anomaly detection is useful for many applications including looking for fraud in financial transactions, checking for defects in manufacturing production lines, and looking for unusual movements in video surveillance footage. However, what I want to expand on more in this video is how anomaly detection can be used for finding the sort of low hanging fruit in predictive maintenance applications. With predictive maintenance, an algorithm is looking for specific fault trends in a machine and is trying to come up with an estimate of how much time is left before failure. And this estimate will help you determine when to schedule maintenance and which parts of your system require it.
The downside of predictive maintenance is that it typically requires a lot of historical data in order to train the algorithm to predict faults. That is, we need to know what failure looks like from similar machines that have failed in the past. And since we don’t want machines to fail, and will obviously go to great lengths to prevent failures, some faults happen so infrequently that you might not have enough data to successfully learn to classify them.
On the other hand, anomaly detection isn’t looking for faults, it’s looking for deviations from a nominal system. And to detect when a system is behaving differently than expected, for any reason, you just need data for the normal behavior, and that is usually pretty easy to come by since hopefully your system runs normally most of the time.
This makes anomaly detection easier to set up than a full-up predictive maintenance algorithm. And, if ultimately you do want to set up predictive maintenance, then running anomaly detection is still beneficial because it’s a good way to flag possible failure data that you can then use for predictive maintenance training.
Alright, so if I’ve convinced you that anomaly detection is worthwhile, let’s now talk about how it works. And I’m going to keep this rather generic because there are a lot of different ways to approach anomaly detection algorithmically. For example, you could set a threshold on a particular metric and check to see if it’s ever exceeded. Thresholding is a very simple and straightforward approach, especially for single-variable data. For anomalies that require looking across multiple variables, or for ones that have multiple types of features that indicate an anomaly, something a little more robust might be needed. For example, you can use machine learning approaches like one-class support vector machines, isolation forests, or autoencoders.
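To make the thresholding idea concrete, here is a minimal sketch (in Python, for illustration; the metric values and the threshold are made up, not from the pendulum hardware):

```python
# Hypothetical single-variable threshold check.
# Each value is some summary metric computed per time window, e.g. vibration RMS.
metric = [0.02, 0.03, 0.01, 0.05, 0.31, 0.04]
threshold = 0.25  # chosen from what nominal data typically looks like

# Flag every window whose metric exceeds the threshold.
anomaly_indices = [i for i, m in enumerate(metric) if m > threshold]
print(anomaly_indices)  # [4]
```

That single comparison is the entire algorithm, which is why thresholding works well when one variable tells the whole story, and falls short when the anomaly only shows up across several variables at once.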
These approaches all share the same basic structure: in some way they learn a model using data of the expected behavior, and then use that model to determine whether the observed behavior falls outside of what the model considers normal.
For one-class SVMs, the model is a hyperplane that maximizes the distance between expected and unexpected behavior. For isolation forests, the model is a collection of trees that isolate each observation into a leaf; the more decisions it takes to isolate an observation, the less likely it is to be an anomaly. And with autoencoders, the model is a deep neural network that is trained to reconstruct the input data. The better it does at reconstruction, the more likely the data is expected.
There are more than just these algorithms as well but I just want to give you a sense of how different ways to model the expected behavior can help you determine when an anomaly has occurred.
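This learn-normal-then-score pattern looks roughly the same no matter which model you pick. As a sketch, here is an isolation forest using scikit-learn (a Python library, not the MATLAB tools used later in this video; the data is randomly generated for illustration):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Fabricate some "nominal" two-variable data clustered near the origin.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.1, size=(200, 2))

# Step 1: learn a model of expected behavior from nominal data only.
model = IsolationForest(random_state=0).fit(normal)

# Step 2: score new observations. predict() returns +1 for inliers, -1 for anomalies.
labels = model.predict(np.array([[0.0, 0.05],    # looks like the training data
                                 [3.0, -3.0]]))  # far from anything seen
print(labels)  # the second point should be flagged as -1 (anomalous)
```

Swapping in a one-class SVM or an autoencoder changes the model in step 1, but the overall fit-on-normal, score-new-data structure stays the same.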
Ok, at this point, I think it might be easier to understand anomaly detection if we can watch one of the algorithms in action using some real hardware. For this example, I’m using the QUBE-Servo 2 from Quanser. This is a rotary inverted pendulum and I’m controlling it with a feedback controller that is trying to keep the red arm upright while also following a reference angle for the silver arm. The reference is just a square wave that is stepping back and forth, and back and forth, over and over again.
Now, I’m using 4 measurements to control the system, but I’m only recording two of those measurements: the motor voltage in volts and the angle off vertical of the red pendulum in degrees. These are the only two that I’m going to use for anomaly detection. Right now, you can see that for the most part the motor voltage is very low and the pendulum angle is near zero degrees, except for when the reference steps to a new location, and then both the voltage and the angle increase slightly to follow that reference before settling back down.
This is the nominal hardware behavior since I know that it is functioning as I designed it. Now, I don’t know every type of failure for this system, and so instead of checking explicitly for certain failures, I’m just going to check to see if any anomalies occur going forward. So, my first step in anomaly detection is to define this nominal behavior in some way so that I can determine if the behavior deviates from it.
For this example, I’m going to use an autoencoder to model the nominal behavior. And like I talked about earlier, an autoencoder is a type of deep neural network that attempts to reduce the dimensionality of the input data. What that means is that it takes the high-dimension input data, encodes it in a lower dimension by representing it with less information, and then decodes it again to reconstruct the original data.
If the data is dominated by low-dimensional behaviors, like the predictable dynamics of the pendulum, then the autoencoder will capture those dynamics in the encoding process and essentially ignore other things like the sensor noise, since it would take more dimensions to recreate those random motions perfectly. So, if we train the autoencoder on this nominal behavior, we would expect the reconstructed data to match the general behavior of the pendulum but ignore more of the high-frequency noise and other motions.
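This intuition, that reconstruction keeps low-dimensional structure and discards the rest, can be sketched without a deep network at all: a linear autoencoder is closely related to PCA. In this toy Python sketch (made-up data, not the pendulum's), two perfectly correlated channels are encoded into a single latent channel and then reconstructed almost exactly; any noise added on top would be the part that fails to reconstruct:

```python
import numpy as np

# Two correlated channels driven by one shared "system dynamic" (rank-1 data).
t = np.arange(100)
X = np.column_stack([np.sin(0.1 * t), 0.5 * np.sin(0.1 * t)])

# Linear "autoencoder": encode onto the top principal direction, then decode.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]                # top principal direction plays encoder and decoder
Z = X @ v                # encode: 2 channels -> 1 latent channel
X_hat = np.outer(Z, v)   # decode: reconstruct both channels from the latent

recon_err = np.mean(np.abs(X - X_hat))  # near zero, since X is exactly rank 1
```

A deep autoencoder generalizes this idea with nonlinear layers, which is what lets it capture the pendulum's dynamics rather than just a straight-line correlation.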
So, let’s do that. I found this MATLAB example called Time Series Anomaly Detection Using Deep Learning, which uses an autoencoder on time series data, which is exactly what I want to do. So, for the most part, I’m following along with this example and tweaking it slightly for my particular problem. The main difference is that instead of three data channels, I only have two: motor voltage and pendulum angle.
And the reason why I like this example is that it’s training the auto encoder directly on the raw data which is something I want to try first. Anomalies are often difficult to detect in raw data, even for machine learning, since raw data can have high dimensionality and therefore, the algorithm needs to learn on its own how to narrow down all that information into patterns that indicate normal behavior. On the other hand, I could do a little feature engineering and preprocess the data in such a way that highlights the features that I think are most likely to indicate anomalies and then the machine learning algorithm only needs to learn using just those particular features. Again, I’m going to try the raw data approach in this example and see how it does, however, the derived feature approach is very popular for real world problems.
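To make the feature-engineering alternative concrete, here is a hypothetical sketch (in Python; the function, window length, and feature choices are mine for illustration, not from the video) that summarizes each window of a raw signal with a few hand-picked features before any learning happens:

```python
import numpy as np

def window_features(x, fs, win_s=4):
    """Hypothetical feature engineering: summarize each window of a 1-D signal."""
    n = int(win_s * fs)  # samples per window
    feats = []
    for start in range(0, len(x) - n + 1, n):
        w = x[start:start + n]
        feats.append([np.sqrt(np.mean(w**2)),        # RMS level
                      np.max(np.abs(w)),             # peak magnitude
                      np.mean(np.abs(np.diff(w)))])  # roughness proxy
    return np.array(feats)

# 8 seconds of quiet signal with a single spike partway through.
x = np.zeros(800)
x[500] = 1.0
F = window_features(x, fs=100)
print(F.shape)  # (2, 3): two 4-second windows, three features each
```

The anomaly detector then learns on this small feature matrix instead of the raw samples, which is why the derived-feature approach often needs less data and training time than learning from raw signals.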
Alright, I collected 10 minutes of nominal data from my hardware, and just like the example I split it up into training data and validation data. I left the autoencoder architecture exactly the same as it was in the example, as well as the training options, and then trained the network with the trainNetwork MATLAB function. Overall, it took about a minute or so to run.
Now that I have a network that can reconstruct the nominal input data, I can run that model on new incoming data with the predict function. So, the predict function output is the network’s attempt at reconstructing the input data. And if we go back to the real-time running of the hardware, that’s what we see over here.
The blue lines are the reconstructed data from the trained autoencoder given these inputs. And this third figure just plots the raw data and the reconstructed data on the same chart so we can see how well the network was able to capture the essential dynamics of this system. And visually, it appears to do a pretty decent job. However, we can quantify how good of a job it does by looking at the mean absolute error between the two signals. So, I’m taking 4 seconds of raw data and reconstructed data, calculating the mean absolute error, and showing it here.
Notice that when the pendulum is stationary, the error is pretty low, less than 0.1, and when it performs the fast dynamic maneuver, the error increases a bit to about 0.18 or so. So, this is telling me that the autoencoder probably hasn’t perfectly captured the nominal dynamics of this system; otherwise, I’d expect the mean error to be about the same in both situations. But perhaps it’s good enough. We’ll soon find out.
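The windowed error metric itself is simple to compute. Here is a sketch (in Python, with stand-in signals; the sample rate and the 0.1 offset are made up, not the pendulum's actual numbers):

```python
import numpy as np

fs = 100                       # hypothetical sample rate, Hz
t = np.arange(4 * fs) / fs     # one 4-second window of samples

# Stand-ins for the two recorded channels and their reconstruction.
raw = np.column_stack([np.sin(2 * np.pi * t),    # e.g., motor voltage
                       np.cos(2 * np.pi * t)])   # e.g., pendulum angle
recon = raw + 0.1              # pretend the network reconstructs with a small offset

# Mean absolute error across both channels: one number per window.
mae = np.mean(np.abs(raw - recon))
```

Averaging over a whole window, rather than checking sample by sample, keeps one noisy sample from triggering a false detection.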
This red line is at the maximum error that the autoencoder has produced in recreating the nominal data. And now, with this red line as a threshold, we have a way to check for anomalies.
We can assume that as long as the system is behaving nominally, the auto encoder network will be able to reconstruct the data to within this error threshold. However, if new dynamics crop up in the system, or if there are new external influences on this system, then we would expect that the network will not be able to reconstruct that data as well since it wasn’t trained on it, and therefore, the error would be greater than this threshold. In this way, to detect anomalies going forward, we just need to look at this error term and compare it to the threshold.
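Put together, the detection logic reduces to one comparison per window. A sketch (in Python; the error values here are hypothetical, loosely echoing the roughly 0.18 maximum seen on the nominal run):

```python
# Hypothetical reconstruction errors, one per 4-second window of nominal data.
nominal_errors = [0.06, 0.08, 0.05, 0.18, 0.07]
threshold = max(nominal_errors)   # the "red line": worst error seen on nominal data

# Errors from new incoming windows: anything above the red line is an anomaly.
new_errors = [0.07, 0.09, 0.31, 0.12]
flags = [e > threshold for e in new_errors]
print(flags)  # [False, False, True, False]
```

Using the nominal maximum as the threshold is a conservative choice; in practice you might add margin above it, or use a high percentile instead of the maximum, to trade false alarms against missed detections.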
So, let’s try it out. I have the pendulum running nominally, and as you can see the error is below the threshold. However, let me disturb the pendulum. Now, even though the controller was able to compensate for that disturbance and the arm stayed upright, the autoencoder network was unable to reconstruct those strange dynamics and the error jumped above the threshold, indicating an anomaly.
Now, poking it with my finger is a pretty obvious disturbance, but the nice thing about anomaly detection is that the anomaly itself can be very subtle to a human and still produce more error through the network than the nominal system does. For example, watch what happens when I use this twine to add just a small amount of friction to the pendulum arm. This particular anomaly is one that is quite common for motor-based systems. Bearing drag often increases over time, and you want to be able to detect that increase and flag it for further action. And as you can see, it didn’t take much of a deviation to trigger a detection. The pendulum angle and the motor voltage don’t appear to visually change very much, but this additional drag produces a signal that the autoencoder network can’t reconstruct as well.
Alright, lastly, I want to show you that this algorithm can also detect other types of anomalies, like a change in the load on the motor. And I’m going to create this anomaly by adding a small amount of, well, change onto the tip of the pendulum. Notice how when I first drop the coins in, the slight force from them trips the detector. So far so good, but we should also be able to catch the change in dynamics when the pendulum swings to the other side.
Ha, yep it caught it! Ok, that might have been too much change for the controller to handle.
Alright, I’m not going to add any new anomalies into this system. Hopefully, this demonstration shows you how anomaly detection works to find any deviations from the nominal system rather than looking for a specific deviation.
And what’s really cool about this is that this algorithm could be run in real time like I’m doing here, or run periodically on saved data to check for historical anomalies. Also, anomaly detection could be tied into a larger design where something is counting the number of times an anomaly occurs in a week, and an operator is just looking at how fast that counter is growing to determine when to intervene. It’s all really cool, and the number of uses is unlimited, so I hope you go and try it out yourself. All of the references and resources I used in this video are linked below.
Alright, that’s where I’m going to leave this video. If you don’t want to miss any future Tech Talk videos, don’t forget to subscribe to this channel. And if you want to check out my channel, Control System Lectures, I cover more control theory topics there as well. Thanks for watching, and I’ll see you next time.