
ELEC 5650 - Linear Quadratic Regulator

"We have decided to call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics, which we form from the Greek ... for steersman."

 -- Norbert Wiener

These are the lecture notes for 'ELEC 5650: Networked Sensing, Estimation and Control' in the 2024-25 Spring semester, delivered by Prof. Ling Shi at HKUST. This session covers Linear Quadratic Regulator (LQR) theory and its applications in control systems.

  1. Mathematical Tools
  2. Estimation Theory
  3. Kalman Filter
  4. Linear Quadratic Regulator <--

Linear Quadratic Regulator

Dynamic Programming

Consider a discrete-time dynamical system over a finite horizon N. The goal is to find a control policy that minimizes the expected cumulative cost:

$$
J_\pi(x_0) = \mathbb{E}_{\omega_k}\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k\big[x_k, \mu_k(x_k), \omega_k\big] \right\}
$$

The system evolves according to:

$$
x_{k+1} = f_k(x_k, u_k, \omega_k)
$$

Principle of Optimality

"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

 -- Richard Bellman

The principle of optimality allows us to break the multi-stage optimization problem down into a sequence of simpler single-stage problems.

Let $\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$ be an optimal policy. Then for any $k$ and any reachable state $x_k$, the sub-policy $\pi_k^{N-1} = \{\mu_k^*, \ldots, \mu_{N-1}^*\}$ must minimize the cost-to-go:

$$
J_k^N(x_k) = \mathbb{E}_{\omega_k}\left\{ g_N(x_N) + \sum_{i=k}^{N-1} g_i\big[x_i, \mu_i^*(x_i), \omega_i\big] \right\}
$$

This implies the Bellman recursion

$$
J_k(x_k) = \min_{u_k} \mathbb{E}_{\omega_k}\left\{ g_k(x_k, u_k, \omega_k) + J_{k+1}\big[f_k(x_k, u_k, \omega_k)\big] \right\}
$$

Dynamic Programming Algorithm

The solution is computed recursively through the following steps:

Terminal Cost

$$
J_N(x_N) = g_N(x_N)
$$

Backward Recursion

$$
J_k(x_k) = \min_{u_k \in U} \mathbb{E}_{\omega_k}\left\{ g_k(x_k, u_k, \omega_k) + J_{k+1}\big[f_k(x_k, u_k, \omega_k)\big] \right\}
$$

$$
\mu_k^*(x_k) = \arg\min_{u_k \in U} \mathbb{E}_{\omega_k}\left\{ g_k(x_k, u_k, \omega_k) + J_{k+1}\big[f_k(x_k, u_k, \omega_k)\big] \right\}
$$
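The backward recursion above can be sketched for a toy deterministic problem with finite state and control sets. All the numbers below (states, dynamics, costs) are illustrative assumptions, not from the notes:

```python
# Toy finite-horizon DP: states and controls are small finite sets, and the
# dynamics f and stage cost g are deterministic (a simplification of the
# stochastic formulation above; all values here are illustrative).
states = [0, 1, 2, 3]          # x_k takes integer values
controls = [-1, 0, 1]          # u_k nudges the state
N = 3                          # horizon

f = lambda x, u: min(max(x + u, 0), 3)   # x_{k+1} = f(x_k, u_k), clipped to [0, 3]
g = lambda x, u: x**2 + u**2             # stage cost g_k(x_k, u_k)
gN = lambda x: 10 * x**2                 # terminal cost g_N(x_N)

# Terminal condition: J_N(x) = g_N(x)
J = {x: gN(x) for x in states}
policy = []                    # policy[k][x] = optimal u at stage k

# Backward recursion: J_k(x) = min_u { g(x, u) + J_{k+1}(f(x, u)) }
for k in reversed(range(N)):
    J_new, mu = {}, {}
    for x in states:
        costs = {u: g(x, u) + J[f(x, u)] for u in controls}
        mu[x] = min(costs, key=costs.get)
        J_new[x] = costs[mu[x]]
    J, policy = J_new, [mu] + policy

print(J[3], policy[0][3])   # optimal cost-to-go and first action from x0 = 3
```

After the loop, `J` holds the cost-to-go $J_0(\cdot)$ and `policy[k]` is the stage-$k$ optimal feedback map $\mu_k^*$.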

Consider the following linear system

$$
x_{k+1} = A x_k + B u_k
$$

We want to find a sequence of controls $u_0, \ldots, u_{N-1}$ that minimizes

$$
J = \underbrace{x_N^T Q x_N}_{g_N(x_N)} + \sum_{k=0}^{N-1} \underbrace{\left( x_k^T Q x_k + u_k^T R u_k \right)}_{g_k(x_k, u_k)}, \qquad Q \succeq 0,\; R \succ 0
$$

Solution

Terminal Cost

$$
J_N(x_N) = g_N(x_N) = x_N^T Q x_N \triangleq x_N^T P_N x_N
$$

Backward Recursion

$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= \min_{u_{N-1}} \left\{ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + J_N(x_N) \right\} \\
&= \min_{u_{N-1}} \left\{ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + x_N^T P_N x_N \right\} \\
&= \min_{u_{N-1}} \left\{ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + (A x_{N-1} + B u_{N-1})^T P_N (A x_{N-1} + B u_{N-1}) \right\} \\
&= \min_{u_{N-1}} \left\{ u_{N-1}^T (R + B^T P_N B) u_{N-1} + 2 x_{N-1}^T A^T P_N B u_{N-1} + x_{N-1}^T (Q + A^T P_N A) x_{N-1} \right\}
\end{aligned}
$$

Setting the gradient with respect to $u_{N-1}$ to zero gives

$$
u_{N-1}^* = -\underbrace{(R + B^T P_N B)^{-1} B^T P_N A}_{L_{N-1}} x_{N-1}
$$

Substituting back,

$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= x_{N-1}^T P_{N-1} x_{N-1} \\
&= x_{N-1}^T \left[ A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A - 2 A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A + Q + A^T P_N A \right] x_{N-1} \\
&= x_{N-1}^T \left[ Q + A^T P_N A - A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A \right] x_{N-1}
\end{aligned}
$$

$$
P_{N-1} = Q + A^T P_N A - A^T P_N B (R + B^T P_N B)^{-1} B^T P_N A
$$

Summary

$$
P_N = Q, \qquad
\begin{cases}
u_k^* = -L_k x_k \\
L_k = (B^T P_{k+1} B + R)^{-1} B^T P_{k+1} A \\
P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B (B^T P_{k+1} B + R)^{-1} B^T P_{k+1} A
\end{cases}
$$
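The summary recursion translates directly into code. A minimal sketch, assuming an illustrative double-integrator plant (the matrices below are not from the notes):

```python
import numpy as np

# Finite-horizon LQR via the backward Riccati recursion from the summary:
#   P_N = Q,  L_k = (B^T P_{k+1} B + R)^{-1} B^T P_{k+1} A,
#   P_k = A^T P_{k+1} A + Q - A^T P_{k+1} B L_k,  u_k = -L_k x_k.
def lqr_finite_horizon(A, B, Q, R, N):
    P = Q.copy()                     # P_N = Q
    gains = []
    for _ in range(N):
        BtP = B.T @ P
        L = np.linalg.solve(BtP @ B + R, BtP @ A)     # L_k
        P = A.T @ P @ A + Q - A.T @ P @ B @ L         # P_k
        gains.append(L)
    return list(reversed(gains)), P    # gains[k] = L_k, P = P_0

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # illustrative double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

gains, P0 = lqr_finite_horizon(A, B, Q, R, N=50)

# Simulate the closed loop x_{k+1} = (A - B L_k) x_k
x = np.array([[5.0], [0.0]])
for L in gains:
    x = (A - B @ L) @ x
print(np.linalg.norm(x))   # state driven toward the origin
```

Note that `np.linalg.solve` is used instead of forming the inverse explicitly, which is the standard numerically preferable choice.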

Riccati Equation

Define

$$
P_{k+1} = h(P_k) = A^T P_k A + Q - A^T P_k B (B^T P_k B + R)^{-1} B^T P_k A
$$

Assume $(A, B)$ is controllable and $(A, Q^{1/2})$ is observable. Then the following hold:

  1. For any $P_0 \succeq 0$, $\lim_{k \to \infty} P_k = P$
  2. $P$ is the unique positive semidefinite solution to $P = h(P)$
  3. $D = A - BL$ is stable, where $L = (B^T P B + R)^{-1} B^T P A$ and $u_k = -L x_k$
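The claimed convergence and stability can be illustrated numerically: iterating $P \leftarrow h(P)$ from different PSD initial conditions reaches the same fixed point, and with the convention $u_k = -L x_k$ the closed loop $A - BL$ is stable. The plant below is an illustrative unstable example, not from the notes:

```python
import numpy as np

# Riccati map h(P) from the notes.
def h(P, A, B, Q, R):
    BtP = B.T @ P
    return A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(BtP @ B + R, BtP @ A)

A = np.array([[1.2, 0.5], [0.0, 0.9]])   # open loop unstable (eigenvalue 1.2)
B = np.array([[0.0], [1.0]])             # (A, B) controllable
Q = np.eye(2)
R = np.array([[1.0]])

def iterate(P0, iters=500):
    P = P0
    for _ in range(iters):
        P = h(P, A, B, Q, R)
    return P

P_from_zero = iterate(np.zeros((2, 2)))
P_from_large = iterate(100 * np.eye(2))
print(np.linalg.norm(P_from_zero - P_from_large))   # same fixed point

L = np.linalg.solve(B.T @ P_from_zero @ B + R, B.T @ P_from_zero @ A)
rho = max(abs(np.linalg.eigvals(A - B @ L)))
print(rho)   # spectral radius < 1: closed loop is stable
```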

Existence

We first prove that a solution $P \succeq 0$ to $P = h(P)$ exists.

Take $P_0 = 0$. Then $P_k = h(P_{k-1}) = \cdots = h^k(0) \succeq 0$, and $x_0^T h^k(0) x_0$ is exactly the optimal $k$-stage cost from $x_0$. Since extending the horizon can only increase the optimal cost,

$$
\min \sum_{i=0}^{k-1} \left( x_i^T Q x_i + u_i^T R u_i \right) = x_0^T h^k(0) x_0 \;\le\; \min \sum_{i=0}^{k} \left( x_i^T Q x_i + u_i^T R u_i \right) = x_0^T h^{k+1}(0) x_0
$$

Moreover, $h$ is monotone: if $X \succeq Y \succeq 0$, then $h(X) \succeq h(Y)$. Now fix any control sequence $\bar u_0, \ldots, \bar u_{k-1}$ with finite total cost (one exists because $(A, B)$ is controllable); its cost $\bar J_k = \sum_{i=0}^{k-1} \left( x_i^T Q x_i + \bar u_i^T R \bar u_i \right)$ is a constant that upper-bounds the optimal cost:

$$
\forall k, \quad x_0^T P_k x_0 = x_0^T h^k(0) x_0 \le \bar J_k
$$

Since $x_0^T P_k x_0$ is nondecreasing and bounded above for every $x_0$, the sequence $P_k$ converges when $P_0 = 0$; call the limit $P$, which satisfies $P = h(P)$.

Stability

$$
P = h(P) = A^T P A + Q - A^T P B (B^T P B + R)^{-1} B^T P A = D^T P D + Q + L^T R L
$$

$$
x_{k+1} = A x_k + B u_k = (A - BL) x_k = D x_k
$$

To show that $D$ is stable, it suffices to show that $x_k \to 0$ as $k \to \infty$ for every $x_0$.

$$
\begin{cases}
x_{k+1}^T P x_{k+1} - x_k^T P x_k = x_k^T D^T P D x_k - x_k^T P x_k = -x_k^T (Q + L^T R L) x_k \\
x_k^T P x_k - x_{k-1}^T P x_{k-1} = -x_{k-1}^T (Q + L^T R L) x_{k-1} \\
\quad \vdots \\
x_1^T P x_1 - x_0^T P x_0 = -x_0^T (Q + L^T R L) x_0
\end{cases}
$$

Summing these equations telescopes to

$$
x_{k+1}^T P x_{k+1} = x_0^T P x_0 - \sum_{i=0}^{k} x_i^T (Q + L^T R L) x_i
$$

Because $P \succeq 0$, $x_{k+1}^T P x_{k+1} \ge 0$, hence

$$
\sum_{i=0}^{k} x_i^T (Q + L^T R L) x_i \le x_0^T P x_0 < \infty
$$

This implies

$$
\lim_{k \to \infty} x_k^T (Q + L^T R L) x_k = 0
$$

Since $Q + L^T R L \succ 0$ (which follows from the observability assumption), we must have

$$
\lim_{k \to \infty} x_k = 0
$$

This proves that $D$ is stable.
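The Lyapunov identity $P = D^T P D + Q + L^T R L$ (with $D = A - BL$, $u_k = -L x_k$) that drives the telescoping argument can be checked numerically; the matrices below are illustrative:

```python
import numpy as np

# Verify P = D^T P D + Q + L^T R L at a (numerically) converged Riccati
# fixed point. A, B, Q, R are illustrative choices.
A = np.array([[1.1, 0.3], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Iterate the Riccati map P <- h(P) from P = 0 until convergence.
P = np.zeros((2, 2))
for _ in range(500):
    BtP = B.T @ P
    P = A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(BtP @ B + R, BtP @ A)

L = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
D = A - B @ L
lhs = P
rhs = D.T @ P @ D + Q + L.T @ R @ L
print(np.linalg.norm(lhs - rhs))   # ~0: the Lyapunov form of the DARE holds
```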

Convergence

Next we prove that $\lim_{k \to \infty} h^k(P_0) = P$ for any $P_0 \succeq 0$. Consider the suboptimal policy that applies the steady-state feedback $u_i = -L x_i$ over $k$ stages with terminal weight $P_0$. Since $x_0^T h^k(P_0) x_0$ is the optimal $k$-stage cost, the cost $\bar J_k$ of this particular policy satisfies $\bar J_k \ge x_0^T h^k(P_0) x_0$.

$$
\begin{aligned}
\bar J_k &= x_k^T P_0 x_k + \sum_{i=0}^{k-1} \left( x_i^T Q x_i + u_i^T R u_i \right) \\
&= x_0^T (D^k)^T P_0 D^k x_0 + \sum_{i=0}^{k-1} x_0^T (D^i)^T (Q + L^T R L) D^i x_0 \\
&= x_0^T \left[ (D^k)^T P_0 D^k + \sum_{i=0}^{k-1} (D^i)^T (Q + L^T R L) D^i \right] x_0 \\
&= x_0^T \left[ (D^k)^T P_0 D^k + P - (D^k)^T P D^k \right] x_0
\end{aligned}
$$

Since $D$ is stable, $D^k \to 0$, so

$$
\lim_{k \to \infty} \bar J_k = x_0^T P x_0
$$

By monotonicity of $h$, $x_0^T h^k(0) x_0 \le x_0^T h^k(P_0) x_0 \le \bar J_k$, and both bounds converge to $x_0^T P x_0$. Therefore

$$
\forall P_0 \succeq 0, \quad \lim_{k \to \infty} x_0^T h^k(P_0) x_0 = x_0^T P x_0
\quad \Longrightarrow \quad
\lim_{k \to \infty} P_k = \lim_{k \to \infty} h^k(P_0) = P
$$

Uniqueness

Assume $\bar P \succeq 0$ is another solution to $\bar P = h(\bar P)$. Taking $P_0 = \bar P$ in the convergence result gives $\bar P = h^k(\bar P) \to P$, hence $\bar P = P$.

Linear Quadratic Gaussian

$$
x_{k+1} = A x_k + B u_k + \omega_k, \qquad \omega_k \sim \mathcal{N}(0, W)
$$

$$
y_k = C x_k + \nu_k, \qquad \nu_k \sim \mathcal{N}(0, V)
$$

We want to minimize the quadratic cost function:

$$
J = \mathbb{E}\left[ x_N^T Q x_N + \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) \right], \qquad Q \succeq 0,\; R \succ 0
$$

By the separation principle, we can decompose the problem into an optimal estimator (a Kalman filter) and an optimal controller (an LQR).

Kalman Filter

The Kalman filter provides the optimal state estimate $\hat{x}_{k|k}$ with error covariance $P_{k|k}$:

Time Update

$$
\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + B u_{k-1}
$$

$$
P_{k|k-1} = A P_{k-1|k-1} A^T + W
$$

Measurement Update

$$
K_k = P_{k|k-1} C^T (C P_{k|k-1} C^T + V)^{-1}
$$

$$
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - C \hat{x}_{k|k-1})
$$

$$
P_{k|k} = (I - K_k C) P_{k|k-1}
$$
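A minimal sketch of one filter step, applied in a loop to a simulated system. All matrices and noise levels below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative plant
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])               # only position is measured
W = 0.01 * np.eye(2)                     # process noise covariance
V = np.array([[0.1]])                    # measurement noise covariance

def kf_step(xhat, P, u, y):
    # Time update
    xpred = A @ xhat + B @ u
    Ppred = A @ P @ A.T + W
    # Measurement update
    K = Ppred @ C.T @ np.linalg.inv(C @ Ppred @ C.T + V)
    xnew = xpred + K @ (y - C @ xpred)
    Pnew = (np.eye(2) - K @ C) @ Ppred
    return xnew, Pnew

# Simulate the noisy system and filter it
x = np.array([[1.0], [0.0]])
xhat, P = np.zeros((2, 1)), 10 * np.eye(2)
for k in range(200):
    u = np.array([[0.0]])
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W).reshape(2, 1)
    y = C @ x + rng.multivariate_normal(np.zeros(1), V).reshape(1, 1)
    xhat, P = kf_step(xhat, P, u, y)

print(np.linalg.norm(x - xhat))   # estimation error stays small
```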

Linear Quadratic Regulator

$$
J = \sum_{k=0}^{\infty} \left( x_k^T Q x_k + u_k^T R u_k \right)
$$

Solve for $P$ from the discrete-time algebraic Riccati equation

$$
P = A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A + Q
$$

and obtain the steady-state feedback gain

$$
L = (R + B^T P B)^{-1} B^T P A, \qquad u_k = -L \hat{x}_{k|k}
$$
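Putting the two pieces together gives a steady-state LQG sketch: iterate the Riccati map to obtain $P$ and $L$, run a Kalman filter, and feed back $u_k = -L \hat{x}_{k|k}$. Everything below (plant, noise levels, iteration counts) is an illustrative assumption:

```python
import numpy as np

def dare_iter(A, B, Q, R, iters=1000):
    """Fixed-point iteration on P = A'PA + Q - A'PB (R + B'PB)^{-1} B'PA."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        P = A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(BtP @ B + R, BtP @ A)
    return P

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative plant
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]])
W, V = 1e-4 * np.eye(2), np.array([[0.01]])

P = dare_iter(A, B, Q, R)
L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u_k = -L xhat_k

rng = np.random.default_rng(1)
x, xhat, S = np.array([[3.0], [0.0]]), np.zeros((2, 1)), np.eye(2)
for k in range(300):
    u = -L @ xhat                                    # certainty equivalence
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W).reshape(2, 1)
    y = C @ x + rng.multivariate_normal(np.zeros(1), V).reshape(1, 1)
    # Kalman filter: time update, then measurement update
    xpred, Spred = A @ xhat + B @ u, A @ S @ A.T + W
    K = Spred @ C.T @ np.linalg.inv(C @ Spred @ C.T + V)
    xhat = xpred + K @ (y - C @ xpred)
    S = (np.eye(2) - K @ C) @ Spred

print(np.linalg.norm(x))   # regulated near the origin despite noise
```

In practice `scipy.linalg.solve_discrete_are` solves the Riccati equation directly; the fixed-point iteration is kept here to mirror the notes.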