Video length is 16:07

Tracking a Single Object With an IMM Filter | Understanding Sensor Fusion and Tracking, Part 4

From the series: Understanding Sensor Fusion and Tracking

Brian Douglas

This video describes how we can improve tracking a single object by estimating state with an interacting multiple model filter. We will build up some intuition about the IMM filter and show how it is a better tracking algorithm than a single model Kalman filter.

We cover why tracking is a harder problem than positioning and localization: there is less information available to the tracking filter. We then explain how the IMM makes up for that lack of information and show some simulated results.

Published: 25 Sep 2019

In this video, we’re going to switch our focus from trying to estimate the state of our own system to estimating the state of a remote object. So we’re switching from the idea of positioning and localization to single object tracking. Figuring out where another object is isn’t all that different from figuring out where you are. We’re simply trying to determine state, like position or velocity, by fusing together the results from sensors and models. The part that makes tracking harder is that we usually have to do it with less information. But to deal with the lack of some information we can upgrade a single model estimation filter, like the standard Kalman filter that we used in the last video, to an interacting multiple model filter. In this video, we’re going to build up some intuition around the IMM by showing how it achieves state estimation when tracking an uncertain object. If you haven’t heard of an IMM before, I hope you stick around because I think it’s a pretty awesome approach to solving the tracking problem. I’m Brian, and welcome to a MATLAB Tech Talk.

Throughout this video, I’m going to be showing some simulation results so that, as we build up the IMM filter, you can see how each change impacts the quality of the estimation. I generated the results using the Tracking Maneuvering Targets example that comes with the Sensor Fusion and Tracking Toolbox from MathWorks. This example simulates tracking an object that goes through three distinct maneuvers: it travels at a constant velocity at the beginning, then executes a constant turn, and it ends with a constant acceleration. Within the script, we can set up different single and multiple model filters to track this object.

And to give you a glimpse of what we’re working towards, I’ll show you the end result. On the left is the result for a typical single model filter, and on the right is the result for an interacting multiple model filter. The bottom graph shows the normalized distance between the object's true position and the estimated position. As you can see, the IMM does a much better job tracking this maneuvering object; the normalized distance through all three maneuvers is much lower than the single model solution. So the question is why? What makes the IMM so special? To answer that, we need a little background information.

Estimation filters, like a Kalman filter, work by predicting the future state of a system and then correcting that state with a measurement. So we predict, and then we measure and correct. In order to predict, we have to give the filter a model of the system, something that it can use to estimate where the system will be at some time in the future. And then at that future time, a measurement of the system state is made using one or more sensors. And we use that measured state to correct the predicted state based on the relative confidence in both it and the prediction. This blended result is the output of the filter.
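To make those two steps concrete, here’s a minimal sketch of one predict/correct cycle of a linear Kalman filter, written as plain MATLAB with made-up numbers and a simple one-axis motion model. It’s meant to illustrate the cycle, not to reproduce the toolbox filters used in the example.

```matlab
% Minimal sketch of one predict/correct cycle of a linear Kalman filter.
% The state is [position; velocity] along one axis; the numbers are made up.
dt = 1;                              % time between measurements
F  = [1 dt; 0 1];                    % motion model: simple constant velocity
Q  = 0.1 * eye(2);                   % process noise (uncertainty in the prediction)
H  = [1 0];                          % we only measure position
R  = 4;                              % measurement noise variance
x  = [0; 10];                        % current state estimate
P  = eye(2);                         % current state covariance
z  = 10.5;                           % new position measurement

% Predict: use the model to carry the state and its uncertainty forward
x = F * x;
P = F * P * F' + Q;

% Correct: blend the prediction with the measurement based on
% the relative confidence in each
y = z - H * x;                       % innovation (measurement residual)
S = H * P * H' + R;                  % innovation covariance
K = P * H' / S;                      % Kalman gain: how much to trust the measurement
x = x + K * y;                       % blended estimate, the output of the filter
P = (eye(2) - K * H) * P;            % updated covariance
```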

This two-step process, predict and correct, is the same whether we’re estimating the state of our own system or we’re estimating the state of a remote object we’re tracking. However, for a tracked object, one of those steps is not as easy as the other.

Let’s start with the differences in the measurement step. In the last video, we used a GPS and IMU to measure state. These are sensors that are embedded within the system and that we have access to. With tracking, we often don’t have access to the sensors within the system and so the measurements need to come from remote sensors like a radar tracking station or a camera vision system. But the exact set of sensors that you use doesn’t change the nature of the measurement step. The idea is that we want to fuse together sensors that complement each other by combining the strengths of each so that you get a good overall measurement. So you can imagine that as long as you have the right combination of sensors—remote or local—then measuring the state of a system you have control over is pretty much the exact same as measuring the state of a remote object. There is, however, at least one major difference, and that is the idea of a false positive result. You get a measurement, but it’s not of the object that you’re tracking; it’s for some other object in the vicinity. This gets into a data association problem that we’ll talk more about in the next video. For now, assume that we know that we are measuring the object we’re tracking and there’s no confusion there.

So what about the prediction step? Well, this is where the difference lies. It’s much harder to predict the future state of an object that you don’t have control over than it is one that you do.

Let’s demonstrate the prediction problem with an example. Imagine an airplane being tracked by a radar station that updates once every few seconds, and you want to predict where the plane will be at the next detection. Let’s say you’re the filter here. Do you have a guess? Probably around here, right? It’s been pretty consistent before this, so it makes sense that it’ll continue on this trajectory. Now, what if the last few measurements looked like this instead? You’d probably assume that the airplane was currently turning and you’d have more confidence in a prediction that continued that trend. So how could we code this kind of intuition into a filter?

Well, consider this: Motion comes from three things. The first is the dynamics and kinematics of the system that carry the current state forward. The airplane already has some velocity, so it will move forward in a fairly predictable manner based on the physics of the plane traveling through the air. Two, motion comes from the commanded and known inputs into the system that add or remove energy and change the state; this would be things like adjusting the engines or control surfaces. If the pilot rotates the control wheel to the right, then you would be correct to assume that the state of the plane also moves to the right. And, three, motion comes from unknown or random inputs from the environment, things like wind gusts and air density changes. These are the three things that we need to take into account when predicting a future state.

So how does an estimation filter do this? Well, we give the filter access to the dynamics in the form of a mathematical model. And if it’s a system that you have control over, then the filter can have access to the control inputs as well. That is, you can tell the filter when you’re commanding the system and it can play those commands through the model to better the prediction. Now, the unknown inputs into the system, as well as uncertainty in the model, by definition can’t be known and therefore they only degrade the prediction. We take this degradation into account with the filter process noise. The higher the process noise, the more uncertain you are about the prediction.
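Here’s how a known control input fits into that prediction, again as an illustrative sketch with made-up values rather than anything from the example script. The commanded input gets its own term, while the unknown inputs only ever appear as added uncertainty.

```matlab
% Prediction when the control inputs are known (illustrative sketch).
% Same one-axis [position; velocity] state as before; the numbers are made up.
dt = 1;
F  = [1 dt; 0 1];                % dynamics carry the current state forward
B  = [0; dt];                    % a known input changes the velocity
u  = -0.5;                       % e.g., a commanded deceleration
Q  = 0.1 * eye(2);               % unknown inputs and model error: process noise
x  = [0; 10];
P  = eye(2);

x = F * x + B * u;               % prediction includes the commanded, known input
P = F * P * F' + Q;              % the unknown part only grows the uncertainty
```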

So if you were the one flying the plane, and you knew that you didn’t command any adjustments to the airplane, no control inputs, then you could expect with reasonable certainty that the plane would maintain its current speed and direction, so the prediction at the red X is probably pretty close.

But what if you weren’t flying the plane, but tracking it remotely? How do we account for the control inputs in this situation? Well, it depends on whether we’re talking about cooperative tracking or uncooperative tracking. A cooperative object shares information with the tracking filter. So the airplane would share the commands it was sending to the engines and the control surfaces, and therefore tracking a cooperative object is pretty similar to flying it ourselves.

Uncooperative objects, however, don’t share their control inputs and so we have to treat them as additional unknown disturbances.

Let’s revisit our prediction of the airplane, but this time it’s uncooperative. Now, how can we handle this? Well, when we were the ones doing the prediction earlier, we assumed that whatever motion the airplane was engaged in was probably the most likely motion to continue into the future. Sure, the pilot may change course, but, at least over a short time period, it’s likely they will maintain the same motion. Therefore, the model that we give our filter should take into account the motion that we are expecting. If we think the plane is traveling straight, the model should carry the state straight forward. If we think the airplane is turning, the model should rotate the state off in one direction or the other. Choosing the right single model is sort of a pre-prediction decision, we’ll say.

Let’s go back to the MATLAB example and see how well this single model filter does with a maneuvering object. The model that this filter is using is a constant velocity model, so it is predicting the future state under the assumption that the object continues forward at a fixed speed. If we look at the normalized distance, you can see that it does a great job when the object is moving at a constant velocity, maybe about 5 units of error or so, but the error increases dramatically during the constant turn portion. I don’t even know how bad it gets; it’s off the chart. And it’s about 30 units of error during the constant acceleration section. So with a single model, our prediction is great if the object actually performs that motion but falls apart if the model doesn’t match reality.
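For reference, a constant velocity model amounts to a state-transition matrix like the one in this sketch, here written for planar motion with an assumed sample time. This is my own hand-rolled construction, not the example’s code.

```matlab
% Constant velocity model in two dimensions (illustrative sketch).
% State is [x; vx; y; vy] and T is the time between detections (assumed here).
T  = 1;
F1 = [1 T; 0 1];                    % one axis: position advances by velocity * T
F  = blkdiag(F1, F1);               % same assumption applied to both axes
xk = [0; 50; 0; 0];                 % made-up state: moving at 50 units/s along x
xk = F * xk;                        % prediction: keep going straight, same speed
```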

Now, you might say that the problem is that we’re putting too much trust in our prediction. We’ve increased the number of unknowns in our system, and therefore we should have less confidence in our prediction. The airplane could turn or slow down or speed up. We just don’t know. So we should account for this by increasing the process noise in our filter. Trusting the prediction less has the byproduct of trusting the correction measurement more. And this makes sense. If we have a hard time predicting where the airplane will be, why not just believe the radar measurement when we get one and basically ignore most of the useless prediction? Well, let’s go back to the MATLAB simulation and see how this idea plays out.

In this run, I’ve left the constant velocity model but upped the process noise, and you can clearly see there is a difference. When the object is turning, the error is now a much better 30 units or so, and the acceleration portion improved as well. But there’s a cost: The constant velocity section, which is the portion that our model is set up for in the first place, got worse.
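Mechanically, “upping the process noise” in a hand-rolled filter just means scaling up Q, which inflates the predicted covariance and, through the gain, shifts trust toward the measurements. A rough sketch of the idea, with an arbitrary scale factor:

```matlab
% "Upping the process noise" just means scaling Q (illustrative sketch).
dt = 1;
F  = [1 dt; 0 1];
H  = [1 0];
R  = 4;
P  = eye(2);
Q_nominal = 0.1 * eye(2);        % tuned for true constant velocity motion
Q  = 100 * Q_nominal;            % inflated to admit unmodeled maneuvers

P = F * P * F' + Q;              % the predicted covariance now grows much faster
K = P * H' / (H * P * H' + R);   % so the gain grows, trusting the measurement more
```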

The constant velocity section got worse because we’re relying more on the noisy measurements. So if we can’t trust the prediction and we are mostly relying on the sensor measurements anyway, then what good is this estimation filter? The whole point is to use a prediction to account for some of the measurement noise, lowering the overall uncertainty.

Well, this is the problem we’re left with. How do we estimate the state of a maneuvering object better than what the sensors alone are capable of measuring?

And the answer: Run more than one model. Basically, we can think of this as running several simultaneous estimation filters, each with a different prediction model and process noise. The idea is to have one model for each type of motion that you expect the tracked object to engage in. You know, things like constant velocity, constant acceleration, or constant turning, and so on. Whatever is necessary to cover the full range of possible motions.

Each model predicts where the object will be if it follows that particular motion. Then, when we get a measurement, it is compared to every single prediction. From this, claims can be made as to which model most likely represents the true motion and we can place more trust in that model for the next prediction cycle. This behaves just like how a human would do prediction. If the airplane seems to be flying straight, assume it’ll keep flying straight. If you see that it’s starting to turn, assume that turn will continue for some time. With this method, there will be some transient error whenever the object transitions to a new motion, but the filter will quickly realize that a new model has a better prediction and will start to increase its likelihood.
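One common way to score the models, and the way I’ll sketch it here, is to evaluate how probable each filter’s innovation is under that filter’s own innovation covariance and then fold those likelihoods into the model probabilities. Implementations vary in the details; this sketch just shows the gist with made-up numbers for two models.

```matlab
% Scoring the models after a measurement (illustrative sketch).
% y{i} is model i's innovation (measurement minus prediction), S{i} is its
% innovation covariance, and mu holds the current model probabilities.
y  = {[0.4; -0.2], [3.1; 2.7]};          % made up: model 1 predicted well, model 2 didn't
S  = {eye(2), eye(2)};
mu = [0.5; 0.5];

L = zeros(2, 1);
for i = 1:2
    d = numel(y{i});
    % Gaussian likelihood of the innovation under model i
    L(i) = exp(-0.5 * (y{i}' / S{i} * y{i})) / sqrt((2*pi)^d * det(S{i}));
end
mu = (L .* mu) / sum(L .* mu);           % model 1 now carries most of the weight
```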

This is the general idea behind multiple model algorithms, but there is still one more step to get to interacting multiple models.

The problem we’ll have with the current way we’ve set up the filters is that each one is operating on its own, isolated from the others. This means that for a model that doesn’t represent the true motion, it’s going to be maintaining its own bad estimate of the system state and state covariance. Then, when the object changes motion, and there is a transition to this model, with its bad state estimate and covariance, the filter is going to take some time to converge again.  So, in this way, every time there is a transition to a new motion, the transient period will be longer than necessary while the filter is trying to catch up.

So to fix this, we allow the models to interact. After a measurement, the overall filter gets an updated state and state covariance based on the blending of the most likely models. At that point, every filter is reinitialized with a mixed estimate of state and covariance based on their probability of “switching to” or “mixing with” each other. This constantly pulls each individual filter toward the blended estimate, reducing its own residual error even when its model doesn’t represent the true motion of the object. In this way, an IMM filter can switch to an individual model without having to wait for it to converge first.
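The mixing itself is a probability-weighted blend. Sketched with made-up numbers for two models, and assuming a Markov transition matrix p(i,j) for the probability of switching from model i to model j between updates, it looks roughly like this; a real implementation also mixes the covariances, which I’ve only noted in a comment.

```matlab
% The IMM interaction (mixing) step before the next prediction (sketch).
% mu(i) is the probability of model i; p(i,j) is the assumed probability of
% switching from model i to model j between updates; x{i} is filter i's state.
p  = [0.95 0.05;                        % mostly stay in the same motion,
      0.05 0.95];                       % with a small chance of switching
mu = [0.9; 0.1];
x  = {[0; 10], [0; 12]};                % made-up per-model state estimates

cbar = p' * mu;                         % predicted model probabilities
xmix = cell(1, 2);
for j = 1:2
    w = (p(:, j) .* mu) / cbar(j);      % mixing weights into model j
    xmix{j} = zeros(size(x{1}));
    for i = 1:2
        xmix{j} = xmix{j} + w(i) * x{i};    % blended starting state for filter j
    end
    % Covariances are mixed the same way, plus a spread-of-means term that
    % accounts for how far apart the individual estimates are.
end
```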

So now we can finally make sense of the IMM result that I showed you at the beginning of this video. This IMM is set up with three models: constant velocity, turn, and acceleration, to match the three expected motions of this object. On the left are plots showing the normalized distance, or the error, for the different models we talked about. That way you can see the results of all three side by side. The top right shows the maneuver profile of the object, and there’s a new graph in the bottom right that shows how likely each model in the IMM is to represent the true motion. The colored overlay is just there to give you a visual reference for which motion the object is currently engaged in.

So let’s kick this off. Check out the IMM results. You can see that the overall normalized distance is very low for all three maneuvers. Also, check out how the likelihood of each model skyrockets when the object is doing the motion that that model is predicting, and how the transient time between motions is pretty short. So as long as the object isn’t constantly and quickly changing motions, this transient error won’t detract much from the overall quality of the estimate.

So this is how we make up for the lack of control input information when tracking uncooperative objects. We build a model for each expected motion, and then set up an IMM to blend them together based on the likelihood that they represent the true motion.

Now before I end this video, I do want to address one more thing. You might be tempted to just run an IMM with a million models, something that could cover every possible motion scenario, right? Well, the problem with this is that for every model you run you have to pay a price. Namely, the computational cost of running a pile of predictions. And if it’s a high-speed real-time tracking situation, you may only have milliseconds to run the full filter. In addition, there is also the pain of having to set up all of these filters and get the process noise right. Let's say computational speed is not a problem; you only care about performance. Well, even then having too many models can hurt performance. For one, it increases the number of transitions between models, and it’s harder to determine when that transition should take place if there are a lot of models that represent very similar motions. Both of these contribute to a less optimal estimation.

So, unfortunately, you still have to approach this filter in a smart way and try to find the smallest set of models that can adequately predict the possible motions of the object that you’re tracking. Practically speaking, this tends to be fewer than 10 models, and usually just three or four.

And, something else to keep in mind is that everything I’ve just explained is just what’s necessary to track a single object. Our problem gets even harder when we expand this to tracking multiple objects at once. And that is what we’ll cover in the next video.

So, if you don’t want to miss future Tech Talk videos, don’t forget to subscribe to this channel.  Also, if you want to check out my channel, Control System Lectures, I cover more control topics there as well. I’ll see you next time.