
ELEC 5650 - Linear Quadratic Regulator

"We have decided to call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics, which we form from the Greek ... for steersman."

 -- Norbert Wiener

These are the lecture notes for 'ELEC 5650: Networked Sensing, Estimation and Control' in the 2024-25 Spring semester, delivered by Prof. Ling Shi at HKUST. This session covers Linear Quadratic Regulator (LQR) theory and its applications in control systems.

  1. Mathematical Tools
  2. Estimation Theory
  3. Kalman Filter
  4. Linear Quadratic Regulator <--

Linear Quadratic Regulator

Dynamic Programming

Consider a discrete-time dynamical system over a finite horizon N. The goal is to find a control policy that minimizes the expected cumulative cost:

$$
J_\pi(x_0) = \mathbb{E}_{\omega_k}\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k\big[x_k, \mu_k(x_k), \omega_k\big] \right\}
$$

The system evolves according to:

$$
x_{k+1} = f_k(x_k, u_k, \omega_k)
$$

Principle of Optimality

"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

 -- Richard Bellman

The principle of optimality allows us to break the multi-stage optimization problem down into a sequence of simpler single-stage problems.

Let $\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$ be an optimal policy. Then for any $k$ and any reachable state $x_k$, the sub-policy $\pi_k^{N-1} = \{\mu_k^*, \ldots, \mu_{N-1}^*\}$ must minimize the cost-to-go:

$$
J_k^N(x_k) = \mathbb{E}_{\omega_k}\left\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i\big[x_i, \mu_i^*(x_i), \omega_i\big] \right\}
$$

This implies the Bellman recursion

$$
J_k(x_k) = \min_{u_k} \mathbb{E}_{\omega_k}\left\{ g_k(x_k, u_k, \omega_k) + J_{k+1}\big[f_k(x_k, u_k, \omega_k)\big] \right\}
$$

Dynamic Programming Algorithm

The solution is computed recursively through the following steps:

Terminal Cost

$$
J_N(x_N) = g_N(x_N)
$$

Backward Recursion

$$
J_k(x_k) = \min_{u_k \in U} \mathbb{E}_{\omega_k}\left\{ g_k(x_k, u_k, \omega_k) + J_{k+1}\big[f_k(x_k, u_k, \omega_k)\big] \right\}
$$

$$
\mu_k^*(x_k) = \arg\min_{u_k \in U} \mathbb{E}_{\omega_k}\left\{ g_k(x_k, u_k, \omega_k) + J_{k+1}\big[f_k(x_k, u_k, \omega_k)\big] \right\}
$$
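The backward recursion above can be sketched for a toy deterministic problem with finite state and control sets. All the numbers below (states, dynamics, costs) are illustrative assumptions, not from the notes:

```python
# Toy finite-horizon DP: states and controls are small finite sets, and the
# dynamics f and stage cost g are deterministic (a simplification of the
# stochastic formulation above; all values here are illustrative).
states = [0, 1, 2, 3]          # x_k takes integer values
controls = [-1, 0, 1]          # u_k nudges the state
N = 3                          # horizon

f = lambda x, u: min(max(x + u, 0), 3)   # x_{k+1} = f(x_k, u_k), clipped to [0, 3]
g = lambda x, u: x**2 + u**2             # stage cost g_k(x_k, u_k)
gN = lambda x: 10 * x**2                 # terminal cost g_N(x_N)

# Terminal condition: J_N(x) = g_N(x)
J = {x: gN(x) for x in states}
policy = []                    # policy[k][x] = optimal u at stage k

# Backward recursion: J_k(x) = min_u { g(x, u) + J_{k+1}(f(x, u)) }
for k in reversed(range(N)):
    J_new, mu = {}, {}
    for x in states:
        costs = {u: g(x, u) + J[f(x, u)] for u in controls}
        mu[x] = min(costs, key=costs.get)
        J_new[x] = costs[mu[x]]
    J, policy = J_new, [mu] + policy

print(J[3], policy[0][3])   # optimal cost-to-go and first action from x0 = 3
```

After the loop, `J` holds the cost-to-go $J_0(\cdot)$ and `policy[k]` is the stage-$k$ optimal feedback map $\mu_k^*$.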

Consider the following linear system

$$
x_{k+1} = A x_k + B u_k
$$

We want to find a sequence of controls $u_0, \ldots, u_{N-1}$ that minimizes

$$
J = \underbrace{x_N^T Q x_N}_{g_N(x_N)} + \sum_{k=0}^{N-1} \underbrace{\left( x_k^T Q x_k + u_k^T R u_k \right)}_{g_k(x_k, u_k)}, \qquad Q \succeq 0,\; R \succ 0
$$

Solution

Terminal Cost

$$
J_N(x_N) = g_N(x_N) = x_N^T Q x_N \triangleq x_N^T P_N x_N
$$

Backward Recursion

$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= \min_{u_{N-1}} \left\{ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + J_N(x_N) \right\} \\
&= \min_{u_{N-1}} \left\{ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + x_N^T P_N x_N \right\} \\
&= \min_{u_{N-1}} \left\{ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + (A x_{N-1} + B u_{N-1})^T P_N (A x_{N-1} + B u_{N-1}) \right\} \\
&= \min_{u_{N-1}} \left\{ u_{N-1}^T (R + B^T P_N B) u_{N-1} + 2 x_{N-1}^T A^T P_N B u_{N-1} + x_{N-1}^T (Q + A^T P_N A) x_{N-1} \right\}
\end{aligned}
$$

Setting the gradient with respect to $u_{N-1}$ to zero gives

$$
u_{N-1}^* = -\underbrace{(R + B^T P_N B)^{-1} B^T P_N A}_{L_{N-1}} x_{N-1}
$$

Substituting back,

$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= x_{N-1}^T P_{N-1} x_{N-1} \\
&= x_{N-1}^T \left[ A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A - 2 A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A + Q + A^T P_N A \right] x_{N-1} \\
&= x_{N-1}^T \left[ Q + A^T P_N A - A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A \right] x_{N-1}
\end{aligned}
$$

$$
P_{N-1} = Q + A^T P_N A - A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A
$$

Summary

$$
P_N = Q, \qquad
\begin{cases}
u_k^* = -L_k x_k \\
L_k = (B^T P_{k+1} B + R)^{-1} B^T P_{k+1} A \\
P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B (B^T P_{k+1} B + R)^{-1} B^T P_{k+1} A
\end{cases}
$$
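The summary recursion translates directly into code. A minimal sketch, assuming an illustrative double-integrator plant (the matrices below are not from the notes):

```python
import numpy as np

# Finite-horizon LQR via the backward Riccati recursion from the summary:
#   P_N = Q,  L_k = (B^T P_{k+1} B + R)^{-1} B^T P_{k+1} A,
#   P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B L_k,  u_k = -L_k x_k.
def lqr_finite_horizon(A, B, Q, R, N):
    P = Q.copy()                     # P_N = Q
    gains = []
    for _ in range(N):
        BtP = B.T @ P
        L = np.linalg.solve(BtP @ B + R, BtP @ A)     # L_k
        P = A.T @ P @ A + Q - A.T @ P @ B @ L         # P_k
        gains.append(L)
    return list(reversed(gains)), P    # gains[k] = L_k, P = P_0

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # illustrative double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

gains, P0 = lqr_finite_horizon(A, B, Q, R, N=50)

# Simulate the closed loop x_{k+1} = (A - B L_k) x_k
x = np.array([[5.0], [0.0]])
for L in gains:
    x = (A - B @ L) @ x
print(np.linalg.norm(x))   # state driven toward the origin
```

Note that `np.linalg.solve` is used instead of forming the inverse explicitly, which is the standard numerically preferable choice.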

Riccati Equation

Define

$$
P_{k+1} = h(P_k) = A^T P_k A + Q - A^T P_k B (B^T P_k B + R)^{-1} B^T P_k A
$$

Assume $(A, B)$ is controllable and $(A, Q^{1/2})$ is observable. Then the following hold:

  1. For any $P_0 \succeq 0$, $\lim_{k \to \infty} P_k = P$
  2. $P$ is the unique positive semidefinite solution to $P = h(P)$
  3. $D = A - BL$ is stable, where $L = (B^T P B + R)^{-1} B^T P A$ and $u_k = -L x_k$
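The claimed convergence and stability can be illustrated numerically: iterating $P \leftarrow h(P)$ from different PSD initial conditions reaches the same fixed point, and with the convention $u_k = -L x_k$ the closed loop $A - BL$ is stable. The plant below is an illustrative unstable example, not from the notes:

```python
import numpy as np

# Riccati map h(P) from the notes.
def h(P, A, B, Q, R):
    BtP = B.T @ P
    return A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(BtP @ B + R, BtP @ A)

A = np.array([[1.2, 0.5], [0.0, 0.9]])   # open loop unstable (eigenvalue 1.2)
B = np.array([[0.0], [1.0]])             # (A, B) controllable
Q = np.eye(2)
R = np.array([[1.0]])

def iterate(P0, iters=500):
    P = P0
    for _ in range(iters):
        P = h(P, A, B, Q, R)
    return P

P_from_zero = iterate(np.zeros((2, 2)))
P_from_large = iterate(100 * np.eye(2))
print(np.linalg.norm(P_from_zero - P_from_large))   # same fixed point

L = np.linalg.solve(B.T @ P_from_zero @ B + R, B.T @ P_from_zero @ A)
rho = max(abs(np.linalg.eigvals(A - B @ L)))
print(rho)   # spectral radius < 1: closed loop is stable
```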

Existence

We first prove that a solution $P \succeq 0$ to $P = h(P)$ exists.

Take $P_0 = 0$. Then $P_k = h(P_{k-1}) = \cdots = h^k(0) \succeq 0$, and $x_0^T h^k(0) x_0$ is exactly the optimal $k$-stage cost from $x_0$. Since extending the horizon can only increase the optimal cost,

$$
\min \sum_{i=0}^{k-1} \left( x_i^T Q x_i + u_i^T R u_i \right) = x_0^T h^k(0) x_0 \;\le\; \min \sum_{i=0}^{k} \left( x_i^T Q x_i + u_i^T R u_i \right) = x_0^T h^{k+1}(0) x_0
$$

Moreover, $h$ is monotone: if $X \succeq Y \succeq 0$, then $h(X) \succeq h(Y)$. Now fix any control sequence $\bar u_0, \ldots, \bar u_{k-1}$ with finite total cost (one exists because $(A, B)$ is controllable); its cost $\bar J_k = \sum_{i=0}^{k-1} \left( x_i^T Q x_i + \bar u_i^T R \bar u_i \right)$ is a constant that upper-bounds the optimal cost:

$$
\forall k, \quad x_0^T P_k x_0 = x_0^T h^k(0) x_0 \le \bar J_k
$$

Since $x_0^T P_k x_0$ is nondecreasing and bounded above for every $x_0$, the sequence $P_k$ converges when $P_0 = 0$; call the limit $P$, which satisfies $P = h(P)$.

Stability

$$
P = h(P) = A^T P A + Q - A^T P B (B^T P B + R)^{-1} B^T P A = D^T P D + Q + L^T R L
$$

$$
x_{k+1} = A x_k + B u_k = (A - BL) x_k = D x_k
$$

To show that $D$ is stable, it suffices to show that $x_k \to 0$ as $k \to \infty$ for every $x_0$.

$$
\begin{cases}
x_{k+1}^T P x_{k+1} - x_k^T P x_k = x_k^T D^T P D x_k - x_k^T P x_k = -x_k^T (Q + L^T R L) x_k \\
x_k^T P x_k - x_{k-1}^T P x_{k-1} = -x_{k-1}^T (Q + L^T R L) x_{k-1} \\
\quad \vdots \\
x_1^T P x_1 - x_0^T P x_0 = -x_0^T (Q + L^T R L) x_0
\end{cases}
$$

Summing these equations telescopes to

$$
x_{k+1}^T P x_{k+1} = x_0^T P x_0 - \sum_{i=0}^{k} x_i^T (Q + L^T R L) x_i
$$

Because $P \succeq 0$, $x_{k+1}^T P x_{k+1} \ge 0$, hence

$$
\sum_{i=0}^{k} x_i^T (Q + L^T R L) x_i \le x_0^T P x_0 < \infty
$$

This implies

$$
\lim_{k \to \infty} x_k^T (Q + L^T R L) x_k = 0
$$

Since $Q + L^T R L \succ 0$ (which follows from the observability assumption), we must have

$$
\lim_{k \to \infty} x_k = 0
$$

This proves that $D$ is stable.
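The Lyapunov identity $P = D^T P D + Q + L^T R L$ (with $D = A - BL$, $u_k = -L x_k$) that drives the telescoping argument can be checked numerically; the matrices below are illustrative:

```python
import numpy as np

# Verify P = D^T P D + Q + L^T R L at a (numerically) converged Riccati
# fixed point. A, B, Q, R are illustrative choices.
A = np.array([[1.1, 0.3], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Iterate the Riccati map P <- h(P) from P = 0 until convergence.
P = np.zeros((2, 2))
for _ in range(500):
    BtP = B.T @ P
    P = A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(BtP @ B + R, BtP @ A)

L = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
D = A - B @ L
lhs = P
rhs = D.T @ P @ D + Q + L.T @ R @ L
print(np.linalg.norm(lhs - rhs))   # ~0: the Lyapunov form of the DARE holds
```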

Convergence

Next we prove that $\lim_{k \to \infty} h^k(P_0) = P$ for any $P_0 \succeq 0$. Consider the suboptimal policy that applies the steady-state feedback $u_i = -L x_i$ over $k$ stages with terminal weight $P_0$. Since $x_0^T h^k(P_0) x_0$ is the optimal $k$-stage cost, the cost $\bar J_k$ of this particular policy satisfies $\bar J_k \ge x_0^T h^k(P_0) x_0$.

$$
\begin{aligned}
\bar J_k &= x_k^T P_0 x_k + \sum_{i=0}^{k-1} \left( x_i^T Q x_i + u_i^T R u_i \right) \\
&= x_0^T (D^k)^T P_0 D^k x_0 + \sum_{i=0}^{k-1} x_0^T (D^i)^T (Q + L^T R L) D^i x_0 \\
&= x_0^T \left[ (D^k)^T P_0 D^k + \sum_{i=0}^{k-1} (D^i)^T (Q + L^T R L) D^i \right] x_0 \\
&= x_0^T \left[ (D^k)^T P_0 D^k + P - (D^k)^T P D^k \right] x_0
\end{aligned}
$$

Since $D$ is stable, $D^k \to 0$, so

$$
\lim_{k \to \infty} \bar J_k = x_0^T P x_0
$$

By monotonicity of $h$, $x_0^T h^k(0) x_0 \le x_0^T h^k(P_0) x_0 \le \bar J_k$, and both bounds converge to $x_0^T P x_0$. Therefore

$$
\forall P_0 \succeq 0, \quad \lim_{k \to \infty} x_0^T h^k(P_0) x_0 = x_0^T P x_0
\quad \Longrightarrow \quad
\lim_{k \to \infty} P_k = \lim_{k \to \infty} h^k(P_0) = P
$$

Uniqueness

Assume $\bar P \succeq 0$ is another solution to $\bar P = h(\bar P)$. Taking $P_0 = \bar P$ in the convergence result gives $\bar P = h^k(\bar P) \to P$, hence $\bar P = P$.

Linear Quadratic Gaussian

$$
x_{k+1} = A x_k + B u_k + \omega_k, \qquad \omega_k \sim \mathcal{N}(0, W)
$$

$$
y_k = C x_k + \nu_k, \qquad \nu_k \sim \mathcal{N}(0, V)
$$

We want to minimize the quadratic cost function:

$$
J = \mathbb{E}\left[ x_N^T Q x_N + \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) \right], \qquad Q \succeq 0,\; R \succ 0
$$

By the separation principle, we can decompose the problem into an optimal estimator (a Kalman filter) and an optimal controller (an LQR).

Kalman Filter

The Kalman filter provides the optimal state estimate $\hat{x}_{k|k}$ with error covariance $P_{k|k}$:

Time Update

$$
\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + B u_{k-1}
$$

$$
P_{k|k-1} = A P_{k-1|k-1} A^T + W
$$

Measurement Update

$$
K_k = P_{k|k-1} C^T (C P_{k|k-1} C^T + V)^{-1}
$$

$$
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - C \hat{x}_{k|k-1})
$$

$$
P_{k|k} = (I - K_k C) P_{k|k-1}
$$
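A minimal sketch of one filter step, applied in a loop to a simulated system. All matrices and noise levels below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative plant
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])               # only position is measured
W = 0.01 * np.eye(2)                     # process noise covariance
V = np.array([[0.1]])                    # measurement noise covariance

def kf_step(xhat, P, u, y):
    # Time update
    xpred = A @ xhat + B @ u
    Ppred = A @ P @ A.T + W
    # Measurement update
    K = Ppred @ C.T @ np.linalg.inv(C @ Ppred @ C.T + V)
    xnew = xpred + K @ (y - C @ xpred)
    Pnew = (np.eye(2) - K @ C) @ Ppred
    return xnew, Pnew

# Simulate the noisy system and filter it
x = np.array([[1.0], [0.0]])
xhat, P = np.zeros((2, 1)), 10 * np.eye(2)
for k in range(200):
    u = np.array([[0.0]])
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W).reshape(2, 1)
    y = C @ x + rng.multivariate_normal(np.zeros(1), V).reshape(1, 1)
    xhat, P = kf_step(xhat, P, u, y)

print(np.linalg.norm(x - xhat))   # estimation error stays small
```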

Linear Quadratic Regulator

$$
J = \sum_{k=0}^{\infty} \left( x_k^T Q x_k + u_k^T R u_k \right)
$$

Solve for $P$ from the discrete-time algebraic Riccati equation

$$
P = A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q
$$

and obtain the steady-state feedback gain

$$
L = (R + B^T P B)^{-1} B^T P A, \qquad u_k = -L \hat{x}_{k|k}
$$
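Putting the two pieces together gives a steady-state LQG sketch: iterate the Riccati map to obtain $P$ and $L$, run a Kalman filter, and feed back $u_k = -L \hat{x}_{k|k}$. Everything below (plant, noise levels, iteration counts) is an illustrative assumption:

```python
import numpy as np

def dare_iter(A, B, Q, R, iters=1000):
    """Fixed-point iteration on P = A'PA + Q - A'PB (R + B'PB)^{-1} B'PA."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        P = A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(BtP @ B + R, BtP @ A)
    return P

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative plant
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]])
W, V = 1e-4 * np.eye(2), np.array([[0.01]])

P = dare_iter(A, B, Q, R)
L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u_k = -L xhat_k

rng = np.random.default_rng(1)
x, xhat, S = np.array([[3.0], [0.0]]), np.zeros((2, 1)), np.eye(2)
for k in range(300):
    u = -L @ xhat                                    # certainty equivalence
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W).reshape(2, 1)
    y = C @ x + rng.multivariate_normal(np.zeros(1), V).reshape(1, 1)
    # Kalman filter: time update, then measurement update
    xpred, Spred = A @ xhat + B @ u, A @ S @ A.T + W
    K = Spred @ C.T @ np.linalg.inv(C @ Spred @ C.T + V)
    xhat = xpred + K @ (y - C @ xpred)
    S = (np.eye(2) - K @ C) @ Spred

print(np.linalg.norm(x))   # regulated near the origin despite noise
```

In practice `scipy.linalg.solve_discrete_are` solves the Riccati equation directly; the fixed-point iteration is kept here to mirror the notes.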