Video length is 15:24

What Is Track-Level Fusion? | Understanding Sensor Fusion and Tracking, Part 6

From the series: Understanding Sensor Fusion and Tracking

Brian Douglas

Gain insights into track-level fusion, the types of tracking situations that require it, and some of the challenges associated with it.

You’ll see two different tracking architectures—track-to-track fusion and central-level tracking—and learn the benefits of choosing one architecture over the other.

Published: 27 Aug 2020

In this video, I want to introduce you to track-level fusion or track-to-track fusion. You may hear these two terms used interchangeably. Now, we're not going to go into any particular algorithm. Rather, I want to explain what track-level fusion is and how it's different from the tracking algorithms we've already covered in this series.

And hopefully along the way, this will provide some motivation and intuition into why track-level fusion is necessary in some situations. And then also, we're going to go through some of the challenges associated with it. So I hope you stick around. I'm Brian, and welcome to a MATLAB Tech Talk.

To begin, let's compare two different tracking architectures. We'll start with an architecture that we'll call central-level tracking. You'll probably be used to this if you've been following along with this series. In this architecture, the sensor detections are fed into a tracking algorithm that assigns the detections to tracks and updates the states and covariances of the tracked objects-- exactly what we saw in the past few videos.

The key here is that all of the sensor data is fused or blended together at the same level and within the same tracker, sort of a mothership approach where there's this one centralized unit that takes in all of the information that's available and performs the calculations necessary to estimate the tracks.

Now, with this architecture, we could have just a single sensor or multiple sensors that are blended together. And we could be tracking just a single target or multiple targets. And the targets themselves can be point objects or extended objects. As long as sensor measurements feed into a central tracker, no matter what it's tracking, we can think of it as a central-level tracker.

Now, let's compare this architecture to one that uses so-called sensor-level tracking and track-level fusion. The idea here is that one or more sensors feed into a central-level tracker just like the other architecture. But now, we have several of these trackers each fusing together their own set of sensors.

These are the sensor-level trackers in this architecture. Each produces its own track estimates. But now, using a track-level fuser, we can combine or fuse together all of these estimated tracks into a single track set, which we'll call the central tracks.

As a quick example, one sensor-level tracker might say that there is an object here with this probability distribution. Another sensor-level tracker might say that an object is here with a different probability distribution. The track-level fuser would need to determine if these two tracks are two different objects or associate them as the same object.

And if it's the same object, then combine the two estimates to create a new state and probability distribution that is more accurate than either source track on its own. And we can use this type of architecture with multiple sensors and to track single or multiple objects and for both point and extended objects, so similar capabilities as central-level tracking but with a slightly different approach, a distributed approach to solving the problem.
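As a rough sketch of those two fuser steps, the Python below first decides whether two source tracks are likely the same object and, if so, combines them. It is a one-dimensional toy, not an algorithm from this series: the `Track1D` type, the gate value of 9.0, and the inverse-variance weighting are all assumptions made for illustration, and the fusion step assumes the two estimates are independent.

```python
from dataclasses import dataclass


@dataclass
class Track1D:
    """Hypothetical 1D track: a position estimate and its variance."""
    position: float
    variance: float


def same_object(a: Track1D, b: Track1D, gate: float = 9.0) -> bool:
    """Association test: squared distance normalized by the combined variance.

    The gate of 9.0 (roughly a 3-sigma gate) is an assumed tuning value.
    """
    d2 = (a.position - b.position) ** 2 / (a.variance + b.variance)
    return d2 <= gate


def fuse_independent(a: Track1D, b: Track1D) -> Track1D:
    """Inverse-variance weighted fusion, valid only if a and b are independent."""
    w_a = 1.0 / a.variance
    w_b = 1.0 / b.variance
    variance = 1.0 / (w_a + w_b)
    position = variance * (w_a * a.position + w_b * b.position)
    return Track1D(position, variance)


track_a = Track1D(position=10.0, variance=4.0)
track_b = Track1D(position=12.0, variance=4.0)
far_track = Track1D(position=40.0, variance=4.0)

if same_object(track_a, track_b):
    fused = fuse_independent(track_a, track_b)
    print(fused)
print(same_object(track_a, far_track))
```

The fused variance comes out smaller than either input variance, which is the "more accurate than either source track" effect, but only because independence was assumed.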

So the question at this point might be, why not just use a single tracker? Why go through the extra steps of having multiple trackers and then having to fuse their tracks together? Well, to answer that, let's look at some of the benefits of doing track-to-track fusion and some of the scenarios where it's a more appealing option. And then after that, we'll look at some of the challenges with it. And hopefully, you'll be able to form an opinion as to when you may choose one architecture over another.

Track-level fusion can be beneficial if you are worried about access to data, bandwidth, compute capabilities, and specialization. Let's start with access to data. You may be forced to use a track-level fuser if you don't have access to the raw sensor data, and this might be the case if you buy a sensor that has a fusion and tracking algorithm built into it.

For example, you might have a lidar system that doesn't return a point cloud but instead is capable of tracking some number of objects in the scene and returning the tracks for each of them. And a track here might be a state vector with position, velocity, orientation, and shape, as well as a set of covariance matrices that indicates how much confidence there is in each state.

In this case, in order to blend this track information with other sensors, say, with a visible camera system on your autonomous vehicle, you're going to need a track-level fuser. Now, even if you buy or build sensors that give you access to the detections, it may not be possible to transfer all of that information in real time from the sensor to the compute element that's running the tracker.

Some sensors can generate data rates that are far too great for the bandwidth of the communication buses, especially lidar or visible cameras that are sampling dozens of times per second. And if you have limited bandwidth within your communication system, that is, a limited number of bits per second that you can send, then you may be interested in reducing the size of the data that each sensor is sending.

Track information is small compared to, say, a camera image. So if a local computer can process the sensor information and distill it down to a best estimate for the objects it's tracking, then there's a lot less information that needs to be sent to the main computer that's running the track-level fuser.
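To put rough numbers on that difference, here is a back-of-the-envelope calculation in Python. Every figure in it is an assumption chosen for illustration (an uncompressed 1920x1080 RGB camera at 30 frames per second, and 20 tracks that each carry a 6-element state and a 6x6 covariance as 64-bit floats); real sensors and track formats will differ.

```python
def camera_bytes_per_second(width: int, height: int,
                            bytes_per_pixel: int, fps: int) -> int:
    """Raw, uncompressed image data rate."""
    return width * height * bytes_per_pixel * fps


def track_bytes_per_second(num_tracks: int, state_dim: int,
                           bytes_per_float: int, updates_per_second: int) -> int:
    """Data rate for tracks made of a state vector plus a full covariance matrix."""
    floats_per_track = state_dim + state_dim * state_dim
    return num_tracks * floats_per_track * bytes_per_float * updates_per_second


# Assumed example values, not taken from the video.
camera_rate = camera_bytes_per_second(width=1920, height=1080,
                                      bytes_per_pixel=3, fps=30)
track_rate = track_bytes_per_second(num_tracks=20, state_dim=6,
                                    bytes_per_float=8, updates_per_second=30)

print(f"camera: {camera_rate / 1e6:.1f} MB/s")
print(f"tracks: {track_rate / 1e3:.1f} kB/s")
print(f"ratio:  {camera_rate / track_rate:.0f}x")
```

Under these assumptions the track stream is several hundred times smaller than the raw image stream, which is the kind of reduction that makes a local sensor-level tracker attractive on a constrained bus.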

Now, even if bandwidth isn't a problem, and you can send all the data you want, there might still be an issue with compute capabilities. Again, imagine there are dozens of visible cameras and lidar sensors on a vehicle. A central-level tracker would need to be able to ingest and process all of that data in one giant tracking algorithm. And this may take too much processing time to produce estimates at the desired sample rate.

However, if a local computer is processing its own sensor data, all of that initial processing is being done in parallel, distributed among many computers. And then the track fusion algorithm only needs to be able to process the much smaller track information, which can speed up the entire processing time considerably.

OK, so maybe none of this matters to you because let's say you have a powerful computer that's able to handle all of the data at once. It may still be beneficial to have a track-level fuser, because it allows for the sensor-level trackers to be specialized to their particular sensor type.

Remember, in a tracker, we have to set up motion models and sensor models and associate detections to objects and existing tracks. And we have to tune it to a particular set of hardware and expected environment conditions and so on. And all of that can be much easier if we build a tracker that focuses on fusing together, say, just the camera data. And then we can fuse those tracks with ones that are generated by, say, the lidar-based tracker. In this way, we don't have a single massive tracking algorithm but many smaller algorithms that can be easier to set up, tune, and test.

So each of these are reasons why you may want to or need to implement a track-level fusion algorithm on your vehicle. And these all seem like really good benefits. So the question now is, why don't we just do track-level fusion all the time? And to answer that, let's look at two of its challenges-- reduced accuracy and correlated noise.

Let's start with reduced accuracy. We've already said that sensors can produce a lot of data, and the tracker distills that data down to a state vector with a lot less information. And as a byproduct of that process, we may be removing some information that is useful, information that the track-level fuser no longer has access to. In this way, when we fuse tracks together, it can produce a result that is less accurate than we could get by fusing together all of the information at the sensor level.

As a quick example of this, imagine we're tracking the same two objects with two different sensors. Sensor A has the following detections. Its tracker has generated a track for the left object but not for the right. It's treating those detections as spurious noise and ignoring them, or maybe, it just hasn't established a track for that object yet.

Sensor B is doing the opposite. It has a track for the right object, but it's ignoring the left detections. Now, fusing these two tracks together results in both of them being included in the central track list, unmodified, since there's no other information to update them with. However, if we had grouped all of the sensor detections together, the single detections would have been grouped with the others and could be used to help improve the estimates.

But there is another issue with track-level fusion that arguably is more important, and that's the problem with correlated noise. If the tracks that we're fusing together are correlated in some way, then we can't just multiply their probabilities together like we can in a standard Kalman filter.

As an extreme example of this, imagine two trackers that are using the same process model to predict a future state. Each model is initialized with perfect knowledge of the state of the object which it got from its own respective perfect sensor. As the models propagate this perfect state forward in time, our uncertainty grows due to the process noise or errors in the model.

We now have two uncertain track estimates. We may try to fuse these two tracks together so that we have a central estimate that we have more confidence in. I mean, if two different models predict the same state, then it seems like we should have more confidence in the fused solution than we would in either solution on its own.

However, these track probabilities are highly correlated since they were generated using the same model. So we shouldn't actually have any more confidence in the fused solution than we do in the individual ones, because we essentially just ran the same model twice. So why would combining the results give us any more information?
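A small worked example makes the double counting concrete. The Python below is a sketch under simplifying assumptions: two scalar estimates with the same variance `P` and a correlation coefficient `rho` between their errors, fused by taking a simple average. The function name and the numbers are hypothetical. The variance of that average is P(1 + rho)/2, which follows directly from Var((a + b)/2) = (P + P + 2·rho·P)/4.

```python
def variance_of_average(variance: float, rho: float) -> float:
    """Variance of the average of two equal-variance estimates.

    rho is the correlation coefficient between the two estimation errors:
    0.0 means independent, 1.0 means the errors are identical.
    """
    return variance * (1.0 + rho) / 2.0


P = 4.0  # assumed variance of each source track

for rho in (0.0, 0.5, 1.0):
    fused = variance_of_average(P, rho)
    print(f"rho={rho:.1f}: fused variance = {fused:.2f}")
```

With `rho = 0.0` the fused variance is half of `P`, which is the confidence gain we expect from independent sources. With `rho = 1.0` it stays at `P`: running the same model twice adds nothing. A fuser that assumes independence would report the halved variance in both cases, so it would be overconfident whenever the tracks are actually correlated.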

So if there's no correlation between tracks, then we want our tracker to take advantage of this and increase our confidence in the solution. But if there is a lot of correlation, then we want the tracker to fuse the solutions in a way that doesn't increase our confidence. But here's the real issue. We may not necessarily know how or if two tracks are correlated. And handling this unknown correlation is the idea behind some of the existing fusion algorithms like covariance intersection.

And I've left a link to a resource that goes into way more depth on this, but let me give you a visualization of approximately how it works. Assume a tracker produces a probability distribution of where an object is located on a 2D plane represented with this oval. And a second tracker produces this distribution of the same object. To fuse these two probabilities together, we can create a third probability that bounds the intersection of these two.

That is, we're looking at where these two ovals intersect and creating a distribution that completely includes that intersection.

You can see that this distribution is smaller than either of the previous ones, indicating that we have more confidence in the solution. However, as the process noise or the sensor noise becomes more correlated, and these two probabilities start to line up, you can see how the intersection grows until the two probabilities perfectly align, and it becomes the exact same distribution.

Now, this is a conservative approach to fusing probabilities, because as you can imagine, we may have distributions that line up purely by chance and with no correlation at all. And when that happens, the method will still treat them as though they are correlated.
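For readers who want to see the arithmetic behind that picture, here is a minimal Python sketch of the commonly published covariance intersection update for 2D estimates: the fused inverse covariance is a weighted blend, w·Pa⁻¹ + (1 − w)·Pb⁻¹, and the weight w is picked to make the fused covariance as small as possible. The details here are assumptions for illustration: the helper names are hypothetical, the weight is found with a coarse grid search, and "small" is measured by the trace (other criteria, such as the determinant, are also used).

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m, n):
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]


def scale2(s, m):
    return [[s * m[i][j] for j in range(2)] for i in range(2)]


def matvec2(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def covariance_intersection(xa, Pa, xb, Pb, steps=100):
    """Fuse two 2D estimates without knowing their cross-correlation.

    Returns (fused_state, fused_covariance, weight). The weight is chosen
    by grid search to minimize the trace of the fused covariance.
    """
    Ia, Ib = inv2(Pa), inv2(Pb)
    best = None
    for i in range(steps + 1):
        w = i / steps
        P = inv2(add2(scale2(w, Ia), scale2(1.0 - w, Ib)))
        trace = P[0][0] + P[1][1]
        if best is None or trace < best[0]:
            best = (trace, w, P)
    _, w, P = best
    ya = matvec2(scale2(w, Ia), xa)
    yb = matvec2(scale2(1.0 - w, Ib), xb)
    x = matvec2(P, [ya[0] + yb[0], ya[1] + yb[1]])
    return x, P, w


# Two ovals with different orientations: the fused covariance shrinks.
x1, P1, w1 = covariance_intersection([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                                     [2.0, 2.0], [[4.0, 0.0], [0.0, 1.0]])
print(x1, P1, w1)

# Two identical ovals: the fused covariance is the same as the inputs.
x2, P2, w2 = covariance_intersection([1.0, 1.0], [[4.0, 0.0], [0.0, 4.0]],
                                     [1.0, 1.0], [[4.0, 0.0], [0.0, 4.0]])
print(x2, P2, w2)
```

The second call shows the conservative behavior described above: when the two distributions coincide, the result has the same covariance as the inputs, whether or not the sources were actually correlated.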

OK, there's one last thing I want to talk about in this video. So far, we've looked at this architecture where the source tracks into the fuser come from sensor-level trackers. However, in some situations, the source tracks may come from other track-level fusers, and this can create some interesting phenomena.

To illustrate this problem, let's imagine that there are two autonomous vehicles each with their own set of sensor-level trackers that feed into a track-level fuser. The rear vehicle is positioned such that it can't see a pedestrian up ahead, but the front vehicle can see it. You could imagine that it would benefit the rear vehicle to know that there's a pedestrian up ahead, so that when it does come into view, it doesn't need to waste precious time establishing a new track. It'll already exist.

So to accomplish this, both vehicles can share their central tracks with each other and fuse them along with their own estimated tracks. But now we've introduced a possible problem called rumors, and it happens like this. The front vehicle tells the rear vehicle about the pedestrian by sharing its track with it. So now, both vehicles know about the object and are propagating its state with their own process models.

Each time step, the front vehicle senses the object and updates its track and says, hey, the object is still here. And the rear vehicle says, yep. I've been tracking it as well with my process model. We're good. But let's imagine now that the object disappears or moves out of frame of the sensor.

The front vehicle then may drop the track and say, hey, that object's gone. But the rear vehicle, which is still propagating the state, may not have dropped the track by then and tells the front vehicle, hey, don't worry about it. Somebody told me about this object, and I'm still tracking it. Here is its state information, even though all it's doing is propagating the state that the front vehicle gave it.

Well, now, the front vehicle would say, OK, I'll keep this track since you're telling me it's still there. And now, a rumor has started. That track persists even though no vehicle is actually sensing it anymore. So track fusion algorithms need to be set up in a way that discourages rumor propagation, which can actually be pretty tricky to do without discouraging actual non-rumored tracks.
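The toy simulation below illustrates how such a rumor can sustain itself, and one possible way to discourage it. It is a deliberately simplified model, not a real fusion algorithm: each vehicle stores only a timestamp for the track, the function and parameter names are hypothetical, and the mitigation shown (a shared track carries the time of the last real sensor detection, and receiving it never makes the track look fresher than that) is an assumed example rather than a method described in this video.

```python
def steps_until_dropped(steps, last_sensed_step, timeout, trust_relayed_tracks):
    """Simulate two vehicles sharing one track.

    Returns the time step at which both vehicles have dropped the track,
    or None if the track is still alive when the simulation ends.
    """
    front = rear = None  # timestamp of the newest accepted evidence, or None
    for t in range(steps):
        # Only the front vehicle ever senses the object directly.
        if t <= last_sensed_step:
            front = t

        if trust_relayed_tracks:
            # Naive exchange: any track received from the other vehicle
            # counts as fresh evidence at the current time.
            new_front = t if rear is not None else front
            new_rear = t if front is not None else rear
        else:
            # Provenance-aware exchange: a shared track is only as fresh
            # as the last real sensor detection behind it.
            stamps = [s for s in (front, rear) if s is not None]
            newest = max(stamps) if stamps else None
            new_front = new_rear = newest
        front, rear = new_front, new_rear

        # Drop tracks whose evidence is older than the timeout.
        if front is not None and t - front > timeout:
            front = None
        if rear is not None and t - rear > timeout:
            rear = None

        if front is None and rear is None and t > last_sensed_step:
            return t
    return None


naive = steps_until_dropped(steps=20, last_sensed_step=4, timeout=2,
                            trust_relayed_tracks=True)
guarded = steps_until_dropped(steps=20, last_sensed_step=4, timeout=2,
                              trust_relayed_tracks=False)
print("naive exchange, dropped at:", naive)
print("provenance-aware exchange, dropped at:", guarded)
```

In the naive exchange the two vehicles keep refreshing each other's track and it never expires, even though nobody has sensed the object since step 4. In the provenance-aware version the track times out shortly after the last real detection. The trade-off mentioned above still applies: a timeout that is too aggressive would also drop legitimate tracks that one vehicle simply cannot see for a while.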

OK, so there are more benefits and challenges with central-level and track-level fusion than what I covered here, but these are some of the major ones. As you might see now, there isn't one tracking approach that's best suited for every situation. Hopefully, you can start to piece together which tracking scenarios could benefit from the speed and efficiency of track-level fusion, and which scenarios would be better suited for the less complex and possibly more accurate central-level tracking.

As always, if you want to explore further than what I've covered here, I've linked to several resources that go into more detail on everything I've talked about, but I hope this short video has helped. If you don't want to miss any other future Tech Talk videos, don't forget to subscribe to this channel. And if you want to check out my channel, Control System Lectures, I cover more control theory topics there as well. Thanks for watching, and I'll see you next time.