Why the Riccati Equation Is Important for LQR Control
From the series: State Space
Brian Douglas
This Tech Talk looks at an optimal controller called linear quadratic regulator, or LQR, and shows why the Riccati equation plays such an important role in solving it efficiently.
The talk walks through three different ways that the LQR problem can be solved: an intuitive, but ultimately inefficient brute force way; a more efficient learning algorithm way; and then the most efficient approach, which is accomplished analytically using the algebraic Riccati equation.
Published: 13 Jul 2023
In this video, we’re going to look at an optimal controller called linear quadratic regulator, or LQR, and talk about why the Riccati equation plays such an important role in solving it efficiently. Now, just to set your expectations, we’re not going to go through a rigorous mathematical proof, and I know I’ve just thrown a lot of math symbols up on this page, and it might be a bit overwhelming, but I think this will all make sense by the end of this video because we’re going to walk through a simplified, but useful, explanation. I think it’s pretty interesting so I hope you stick around for it. I’m Brian and welcome to a MATLAB Tech Talk.
If you’re not familiar with LQR, we have another MATLAB Tech Talk which I’ve linked to below that explains what it is and why we may use it. But let me just give you the briefest of overviews so we’re all on the same page.
Let’s set up a simple feedback control system. We have a plant that we’re trying to control using full state feedback. Full state just means every state, x, is observed and then fed back into the controller. The controller in this case is a gain matrix, -K, which is multiplied with x to create the input, u, which goes back into the plant. The question is, what are the optimal values of K so that the system behaves the way we want? And one way to solve this is with a linear quadratic regulator.
The linear in LQR means that the plant that we’re trying to control is modeled as a set of linear equations. In state space, this is x dot = Ax + Bu, and y = Cx + Du. We only really need to look at the A and B matrices since the output, y, is just the full state of the system, x. This means C is an identity matrix and D is 0. So, they’re not important for this controller. Alright, that’s the first part. We need linear state space equations, and specifically the A and B matrices.
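Written out, the model and the controller structure described so far are:

\[
\dot{x} = Ax + Bu, \qquad y = Cx + Du \;\;\text{with}\;\; C = I,\; D = 0 \;\Rightarrow\; y = x, \qquad u = -Kx .
\]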
The quadratic term refers to the quadratic cost function that we are trying to minimize. In this case, we have a matrix, Q, that penalizes the square of any deviation in the state of the system, and a matrix, R, that penalizes the square of any controller effort, and then we sum that across all time. So, simplistically speaking, we’re trading how well a controller can drive the state of the system to zero, versus how much effort it takes to do so.
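Written out, the quadratic cost being described has the standard infinite-horizon LQR form:

\[
J = \int_0^{\infty} \left( x^T Q x + u^T R u \right) dt .
\]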
So, now given the tuning parameters Q and R, and the system dynamics A and B, what is the optimal gain matrix, K? That is, what is the K that will produce the lowest overall cost?
We can figure that out pretty easily with a single command in MATLAB. We can define our plant dynamics with the A and B matrices, and then define our cost function with the Q and R matrices, and then solve for the optimal gains using the LQR function. And for this particular system the two gains are about 3.4 and 2.3. Plus, it also returns the solution to the algebraic Riccati equation, which is this matrix P. Don’t worry about what this is right now, we’ll come back to it later.
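As a minimal sketch of that workflow, assuming placeholder A, B, Q, and R matrices (the exact plant from the video isn’t shown, so the resulting gains won’t match the 3.4 and 2.3 quoted above):

```matlab
A = [0 1; -2 -1];    % placeholder linear plant: x_dot = A*x + B*u
B = [0; 1];
Q = diag([10 1]);    % penalty on squared state deviation
R = 1;               % penalty on squared controller effort

[K, P] = lqr(A, B, Q, R);   % K: optimal gains, P: solution to the algebraic Riccati equation

% The optimal controller is then full state feedback: u = -K*x
```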
Alright, so that’s the answer to this LQR problem, but that’s not terribly satisfying because we want to know how to get that answer. And so, what we’re going to do here is walk through three different ways that we can solve the LQR problem so that hopefully you have a better idea of what these numbers represent and why we solve LQR the way we do. We’ll look at an inefficient brute force way, a more efficient learning algorithm way, and then the most efficient, and the actual way that it’s done, which is an analytic approach using the algebraic Riccati equation.
Alright, so let’s start with brute force. Brute force basically means solve the problem with lots and lots of calculations. To show you what I mean, I created this simple MATLAB app that calculates the cost for a given gain set. For example, here I have K1 and K2 both equal to 1. The app then simulates the closed loop system with these gains, which produces a plot of the system state over time and a plot of the controller effort that produced that state. And then it plugs this information into the cost function and manually calculates what the cost is.
So, with these gains it produces a cost of about 252.
Now, I can pick a different set of gains, which changes the behavior of the system, which changes the cost. So, with brute force, we just do this for every possible gain combination and find the one with the lowest cost. And to visualize what this looks like, I’m programmatically sweeping K1 and K2 between 0 and 5 and plotting the resulting cost on the vertical axis. This produces a convex surface. You know, it sort of looks like a blanket with the ends held up, creating a low point somewhere in the middle. This low point is the lowest cost which, as expected, is right around K1 = 3.4 and K2 = 2.3. So, this is one way to solve the LQR problem.
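As a rough sketch of what that sweep could look like in MATLAB, reusing the placeholder plant and weights from the earlier snippet, with an assumed initial state and a finite horizon standing in for the infinite one:

```matlab
A = [0 1; -2 -1];  B = [0; 1];  Q = diag([10 1]);  R = 1;
x0 = [1; 0];                         % assumed initial condition
t  = linspace(0, 20, 2000);          % finite horizon approximating infinity
k1 = linspace(0, 5, 50);
k2 = linspace(0, 5, 50);
J  = zeros(numel(k1), numel(k2));

for i = 1:numel(k1)
    for j = 1:numel(k2)
        K = [k1(i) k2(j)];
        [~, x] = ode45(@(t, x) (A - B*K)*x, t, x0);           % closed loop with u = -K*x
        u = -(K*x')';                                         % controller effort over time
        J(i,j) = trapz(t, sum(x*Q.*x, 2) + sum(u*R.*u, 2));   % integral of x'Qx + u'Ru
    end
end

surf(k2, k1, J)                      % the convex "blanket" cost surface
xlabel('K2'), ylabel('K1'), zlabel('Cost J')
```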
But as you can probably imagine, it is entirely too inefficient in most cases. I mean who has the time to calculate every possible gain combination, especially for more complex systems?
Alright, so let’s move on to a different method. We could try learning-based algorithms.
In general, many learning algorithms use the concept of gradient descent. Gradient descent algorithms start at some point on this surface, that is, they start at some initial gain set, calculate the cost, and then follow the gradient, or the slope of the cost surface, down until they reach the lowest point. This is kind of like dropping a ball onto our blanket and letting it roll down to the bottom, where it comes to rest at the optimal gains. In this way, we don’t need to calculate the entire cost surface, we just need to calculate the cost that lies along the trajectory of the descending gradient.
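As a very rough sketch of that idea, here is a finite-difference gradient descent over the two gains, again with the placeholder plant and weights; the helper function lqrCost, the step size, and the iteration count are all assumptions and would need tuning for a real problem.

```matlab
A = [0 1; -2 -1];  B = [0; 1];  Q = diag([10 1]);  R = 1;
x0 = [1; 0];
t  = linspace(0, 20, 2000);
costOfGains = @(K) lqrCost(K, A, B, Q, R, x0, t);

K     = [1 1];      % initial gain guess (where the "ball" is dropped)
alpha = 0.05;       % step size
delta = 1e-3;       % finite-difference step for estimating the gradient

for iter = 1:200
    grad = zeros(size(K));
    for n = 1:numel(K)
        dK = zeros(size(K));  dK(n) = delta;
        grad(n) = (costOfGains(K + dK) - costOfGains(K - dK)) / (2*delta);
    end
    K = K - alpha * grad;   % roll downhill along the cost surface
end

function J = lqrCost(K, A, B, Q, R, x0, t)
    % Simulate the closed loop u = -K*x and integrate x'Qx + u'Ru over time.
    [~, x] = ode45(@(t, x) (A - B*K)*x, t, x0);
    u = -(K*x')';
    J = trapz(t, sum(x*Q.*x, 2) + sum(u*R.*u, 2));
end
```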
If you’re curious about what this looks like in practice, we have an example showing how to solve LQR with reinforcement learning in MATLAB.
But with that being said, learning algorithms are useful optimization techniques when there is a lot of flexibility in the system dynamics model or the cost function. This is because there often isn’t an analytic way to solve the problem, and so we have to approach it iteratively through gradient descent or some other learning method. But that isn’t the case for LQR. We set up the problem such that we need linear system dynamics and we need a quadratic cost function. And the reason for being this rigid is because, with this formulation, we know how to solve the problem. We don’t have to calculate any of this cost surface because there is a way to just find the lowest cost solution directly. And the solution involves solving an algebraic Riccati equation.
Let’s walk through it. Now there will be a lot of math ahead, but most of it is just moving matrices around, so I hope it’s easy to follow. But the bottom line is that we’re just going to manipulate our cost function in a way that lets us more easily find the controller u that minimizes J.
And this is where a clever manipulation comes in. Let’s introduce a matrix P which is symmetric, so P = P transpose. We don’t know what this is, but we’ll find out later. For now, let’s add it into our cost function with x0 transpose times P times x0, where x0 is the initial state of the system. OK? Now we can’t just add in extra cost, that changes the cost function! So, we’ll subtract the same value to keep everything the same. Perfectly harmless! Now we can move this negative part into the integral, where it transforms into the derivative of x transpose times P times x.
We haven’t changed the value of J because if we take the integral of the derivative we’re left with x transpose P x evaluated at time equals infinity minus the value at time equals 0. But the state goes to zero as time approaches infinity. At least it does for stable systems. And, so, we’re left with the negative term exactly like how we started.
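Written out, that step is:

\[
J = x_0^T P x_0 \;-\; x_0^T P x_0 \;+\; \int_0^{\infty}\!\left( x^T Q x + u^T R u \right) dt
  \;=\; x_0^T P x_0 \;+\; \int_0^{\infty}\!\left( x^T Q x + u^T R u + \frac{d}{dt}\!\left( x^T P x \right) \right) dt
\]

since, for a stable system, \( \int_0^{\infty} \frac{d}{dt}\left(x^T P x\right) dt = x(\infty)^T P x(\infty) - x_0^T P x_0 = -x_0^T P x_0 \).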
Alright, now let’s just look at this derivative we added and expand it to get "x dot transpose" P x plus "x transpose" P x dot. How has this helped? Well, we know what x dot is, right? It’s from the state equation. It’s our constraint. So, plugging in x dot gives us this equation for the derivative term, which we can now plug back into our cost function. That gives us this long equation.
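In symbols, that expansion and substitution is:

\[
\frac{d}{dt}\!\left( x^T P x \right) = \dot{x}^T P x + x^T P \dot{x} = (Ax + Bu)^T P x + x^T P (Ax + Bu)
\]

which, plugged back in, gives

\[
J = x_0^T P x_0 + \int_0^{\infty}\!\left( x^T Q x + u^T R u + (Ax + Bu)^T P x + x^T P (Ax + Bu) \right) dt .
\]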
Which is the cost, taking into account the constraint. And we can simplify this by grouping the x transpose x terms together. So, we get this A transpose P + PA + Q group inside of the x transpose x, and then we get the rest of this stuff.
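Grouped, the cost reads:

\[
J = x_0^T P x_0 + \int_0^{\infty}\!\left( x^T\!\left( A^T P + P A + Q \right) x + u^T R u + u^T B^T P x + x^T P B u \right) dt .
\]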
Now, as a reminder, we’re still trying to find the control u that minimizes this cost function. The part outside the integral doesn’t depend on u, so we don’t have control over that, and neither does this first part inside the integral. So, we really only need to look at which u minimizes this last part. But unfortunately, it’s not just u, it also depends on x and P, making it kind of gross and hard to solve. But, here, we can take advantage of a cool technique called completing the square.
Notice that these three terms have this form of u^2 plus some terms with just u. And in general, we can turn this into a square. That is, u plus some unknown value, squared. And if we expand this, we get u squared, plus some term with just u, plus the unknown value squared. And since we just want to match the first two terms, we have to remove this unknown value squared at the end. So, we’re looking to put this equation into this square form, and this is what it looks like. Notice that we have this u plus an additional term, scaled by R and squared, and then we subtract the square of that additional term. Like we expected, and you can expand this and see that it is equal to what we started with.
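Written out, completing the square on the u terms gives:

\[
u^T R u + u^T B^T P x + x^T P B u = \left( u + R^{-1} B^T P x \right)^T R \left( u + R^{-1} B^T P x \right) - x^T P B R^{-1} B^T P x .
\]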
Now, when we plug this back into our cost function, we get this result. Remember, we haven’t changed the value of J, just reordered it in a fancy way. And I know this looks like a mess, but now it’s super easy to minimize the cost. This first term doesn’t depend on u, so that’s cost we can’t minimize. But this integral, we can minimize.
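For reference, the fully rearranged cost is:

\[
J = x_0^T P x_0 + \int_0^{\infty}\!\left( x^T\!\left( A^T P + P A + Q - P B R^{-1} B^T P \right) x + \left( u + R^{-1} B^T P x \right)^T R \left( u + R^{-1} B^T P x \right) \right) dt .
\]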
If we look at this second part, we can make this zero by choosing u to be -R^-1 * B transpose * P * x. Which is kind of a cool result. This is saying that the optimal controller is just full state feedback where K is a constant gain equal to R^-1 * B transpose * some matrix P.
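That is, the optimal control law is:

\[
u = -R^{-1} B^T P x = -K x, \qquad K = R^{-1} B^T P .
\]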
Now, what is this matrix P? Well, to figure that out, we can look at this first term. If the middle section here equals 0, then x transpose times it times x will also be zero. So, to make that happen, we’re left with this equation. This is the famous algebraic Riccati equation. We have to find P such that this equation equals 0. If we can solve this Riccati equation, then we have everything we need to find the optimal gains.
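Written out, the algebraic Riccati equation is:

\[
A^T P + P A + Q - P B R^{-1} B^T P = 0 .
\]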
So, to summarize, we find the P that solves the algebraic Riccati equation, which sets this part of the cost function to 0, and then we use that matrix P to find the optimal gains for the controller u, which sets this part of the cost to 0.
And what’s great about this method is that solving the algebraic Riccati equation is much more efficient than any of the other methods we looked at.
Now, a closed form solution doesn’t always exist for it and so we do have to rely on numeric methods to find the matrix P, but that’s relatively simple. Also, since this equation is quadratic with respect to P, there are up to two possible solutions for each dimension in the system. However, only one of the solutions will produce a stable closed loop system. So, we are looking for the stabilizing solution.
Alright, if we now go back to the LQR command in MATLAB, we know a bit more about what it’s doing, right? We have to give it the A, B, Q, and R matrices because those are needed to solve the Riccati equation. It then numerically solves for the possible solutions for P, checks the closed loop system for each of the possible gain sets, and returns the gains that produce a stable system. Pretty awesome, and pretty efficient all things considered!
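As a small sketch of checking this directly, with the same placeholder matrices as before (icare returns the stabilizing solution of the continuous-time algebraic Riccati equation):

```matlab
A = [0 1; -2 -1];  B = [0; 1];  Q = diag([10 1]);  R = 1;

P = icare(A, B, Q, R);      % stabilizing solution of the algebraic Riccati equation
K = R \ (B' * P);           % K = R^-1 * B' * P, the same gains lqr(A, B, Q, R) returns

eig(A - B*K)                % closed-loop poles should all sit in the left half-plane
```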
Alright, that’s where I’m going to leave this video. Hopefully this has given you a little intuition behind LQR and the algebraic Riccati equation.
Please check out the links below because I’ve linked to lots of good resources for more information on all of this. Also, you can find all of the Tech Talk videos, across many different topics, nicely organized at mathworks.com. And if you liked this video, then you might be interested to learn about LQR at a higher level than what we covered here. Alright, thanks for watching, and I’ll see you next time.