ELEC 5650 - Kalman Filter

"We have decided to call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics, which we form from the Greek ... for steersman."

 -- Norbert Wiener

These are the lecture notes for "ELEC 5650: Networked Sensing, Estimation and Control" in the 2024-25 Spring semester, delivered by Prof. Ling Shi at HKUST. In this session, we derive the Kalman Filter from three different perspectives: geometric, probabilistic, and optimization. Each perspective provides unique insights into understanding and implementing the Kalman Filter algorithm.

  1. Mathematical Tools
  2. Estimation Theory
  3. Kalman Filter <--
  4. Linear Quadratic Regulator

Takeaway Notes

Consider an LTI system with initial conditions $\hat{x}_{0|0}$ and $\hat{P}_{0|0}$:

$$
\begin{aligned}
x_{k+1} &= A x_k + B u_k + \omega_k, \quad \omega_k \sim \mathcal{N}(0, Q) \\
y_k &= C x_k + \nu_k, \quad \nu_k \sim \mathcal{N}(0, R)
\end{aligned}
$$

Find the estimate of $x_k$ given $\{u_0, u_1, \dots, u_k\}$ and $\{y_0, y_1, \dots, y_k\}$.

Assumptions

  1. $(A, B)$ is controllable and $(A, C)$ is observable
  2. $Q \succeq 0$, $R \succ 0$, $P_0 \succeq 0$
  3. $\omega_k$, $\nu_k$ and $x_0$ are mutually uncorrelated
  4. The future state of the system is conditionally independent of the past states given the current state

Time Update

$$
\begin{aligned}
\hat{x}_{k|k-1} &= A \hat{x}_{k-1|k-1} + B u_k \\
\hat{P}_{k|k-1} &= A \hat{P}_{k-1|k-1} A^T + Q
\end{aligned}
$$
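
A minimal NumPy sketch of this step (the function name, interface, and dimensions are only illustrative, not from the notes):

```python
import numpy as np

def kf_predict(x_est, P_est, A, B, Q, u):
    """Time update: propagate the previous estimate and covariance one step forward."""
    x_pred = A @ x_est + B @ u          # x_{k|k-1} = A x_{k-1|k-1} + B u_k
    P_pred = A @ P_est @ A.T + Q        # P_{k|k-1} = A P_{k-1|k-1} A^T + Q
    return x_pred, P_pred
```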

Measurement Update

$$
\begin{aligned}
K_k &= \hat{P}_{k|k-1} C^T (C \hat{P}_{k|k-1} C^T + R)^{-1} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k (y_k - C \hat{x}_{k|k-1}) \\
\hat{P}_{k|k} &= \hat{P}_{k|k-1} - K_k C \hat{P}_{k|k-1} = (\hat{P}_{k|k-1}^{-1} + C^T R^{-1} C)^{-1}
\end{aligned}
$$
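
A matching sketch of the measurement update, with arbitrary example values rather than anything from the notes; the final assertion checks that the two covariance forms above agree:

```python
import numpy as np

def kf_update(x_pred, P_pred, C, R, y):
    """Measurement update: correct the prediction with the new observation y_k."""
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain K_k
    x_est = x_pred + K @ (y - C @ x_pred)    # corrected state estimate
    P_est = P_pred - K @ C @ P_pred          # corrected covariance
    return x_est, P_est, K

# Example: scalar position measurement of a (position, velocity) state.
C = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
x_pred, P_pred = np.array([0.2, 1.0]), np.eye(2)
x_est, P_est, K = kf_update(x_pred, P_pred, C, R, np.array([0.35]))

# Sanity check: the subtractive form equals the information form of P_{k|k}.
assert np.allclose(P_est, np.linalg.inv(np.linalg.inv(P_pred) + C.T @ np.linalg.inv(R) @ C))
```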

Geometric Perspective (LMMSE Estimation)

The geometric perspective views the Kalman Filter as a Linear Minimum Mean Square Error (LMMSE) estimator, rooted in orthogonal projection theory in Hilbert space. The key insight is that the Kalman Filter's innovation term $e_k$ is orthogonal to all past observations $Y_{k-1}$, ensuring that the newly incorporated information is uncorrelated with previous measurements and thus preserving the estimator's optimality.
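
As a quick sanity check of this orthogonality property, the sketch below runs a Monte Carlo experiment on a scalar system invented purely for illustration: across many independent runs, the innovations at two different time steps (each a linear function of the measurements up to that step) should be sample-uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar system used purely for this illustration.
A, C, Q, R, P0 = 0.9, 1.0, 0.2, 0.5, 1.0
n_runs, n_steps = 20000, 10

innovations = np.zeros((n_runs, n_steps))
for run in range(n_runs):
    x = rng.normal(0.0, np.sqrt(P0))        # true initial state, x_0 ~ N(0, P0)
    x_est, P = 0.0, P0                      # filter initialized with the true prior
    for k in range(n_steps):
        # Simulate the system one step and take a measurement.
        x = A * x + rng.normal(0.0, np.sqrt(Q))
        y = C * x + rng.normal(0.0, np.sqrt(R))
        # Time update.
        x_pred = A * x_est
        P_pred = A * P * A + Q
        # Innovation and measurement update.
        e = y - C * x_pred
        innovations[run, k] = e
        S = C * P_pred * C + R
        K = P_pred * C / S
        x_est = x_pred + K * e
        P = P_pred - K * C * P_pred

# By the orthogonality principle, innovations at different steps should be uncorrelated.
print(np.corrcoef(innovations[:, 9], innovations[:, 5])[0, 1])  # close to 0
```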

Time Update

$$
\begin{aligned}
\hat{x}_{k|k-1} &= \mathrm{proj}_{Y_{k-1}}(x_k) \\
&= \mathrm{proj}_{Y_{k-1}}(A x_{k-1} + B u_k + \omega_k) \\
&= A\,\mathrm{proj}_{Y_{k-1}}(x_{k-1}) + B\,\mathrm{proj}_{Y_{k-1}}(u_k) + \mathrm{proj}_{Y_{k-1}}(\omega_k) \\
&= A \hat{x}_{k-1|k-1} + B u_k \\
\tilde{x}_{k|k-1} &= x_k - \hat{x}_{k|k-1} \\
\hat{P}_{k|k-1} &= \mathbb{E}[\tilde{x}_{k|k-1} \tilde{x}_{k|k-1}^T] \\
&= \mathbb{E}[(x_k - \hat{x}_{k|k-1})(x_k - \hat{x}_{k|k-1})^T] \\
&= \mathbb{E}[(A \tilde{x}_{k-1|k-1} + \omega_k)(A \tilde{x}_{k-1|k-1} + \omega_k)^T] \\
&= A\,\mathbb{E}[\tilde{x}_{k-1|k-1} \tilde{x}_{k-1|k-1}^T] A^T + 2 A\,\mathbb{E}[\tilde{x}_{k-1|k-1} \omega_k^T] + \mathbb{E}[\omega_k \omega_k^T] \\
&= A \hat{P}_{k-1|k-1} A^T + Q
\end{aligned}
$$

Measurement Update

$$
\begin{aligned}
e_k &= y_k - \hat{y}_{k|k-1} = y_k - \mathrm{proj}_{Y_{k-1}}(y_k) \\
&= y_k - \mathrm{proj}_{Y_{k-1}}(C x_k + \nu_k) \\
&= y_k - C\,\mathrm{proj}_{Y_{k-1}}(x_k) - \mathrm{proj}_{Y_{k-1}}(\nu_k) \\
&= y_k - C \hat{x}_{k|k-1} \\
\hat{x}_{k|k} &= \mathrm{proj}_{Y_{k}}(x_k) = \hat{x}_{k|k-1} + K_k e_k = \hat{x}_{k|k-1} + K_k (y_k - C \hat{x}_{k|k-1}) \\
\tilde{x}_{k|k} &= x_k - \hat{x}_{k|k} \\
\hat{P}_{k|k} &= \mathbb{E}[\tilde{x}_{k|k} \tilde{x}_{k|k}^T] = \mathbb{E}[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T] \\
&= \mathbb{E}[(\tilde{x}_{k|k-1} - K_k e_k)(\tilde{x}_{k|k-1} - K_k e_k)^T] \\
&= \mathbb{E}[\tilde{x}_{k|k-1} \tilde{x}_{k|k-1}^T] - 2 K_k \mathbb{E}[e_k \tilde{x}_{k|k-1}^T] + K_k \mathbb{E}[e_k e_k^T] K_k^T \\
&= \hat{P}_{k|k-1} - 2 K_k \mathbb{E}[(y_k - C \hat{x}_{k|k-1}) \tilde{x}_{k|k-1}^T] + K_k \mathbb{E}[(y_k - C \hat{x}_{k|k-1})(y_k - C \hat{x}_{k|k-1})^T] K_k^T \\
&= \hat{P}_{k|k-1} - 2 K_k C \hat{P}_{k|k-1} + K_k (C \hat{P}_{k|k-1} C^T + R) K_k^T \\
\frac{\partial\,\mathrm{tr}(\hat{P}_{k|k})}{\partial K_k} &= -2 \hat{P}_{k|k-1} C^T + 2 K_k (C \hat{P}_{k|k-1} C^T + R) = 0 \\
\Rightarrow K_k &= \hat{P}_{k|k-1} C^T (C \hat{P}_{k|k-1} C^T + R)^{-1}
\end{aligned}
$$
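
A small numerical check of the last step, with randomly generated positive definite matrices standing in for $\hat{P}_{k|k-1}$ and $R$ (none of these values come from the notes): the gain obtained from the zero-gradient condition should minimize $\mathrm{tr}(\hat{P}_{k|k})$ over all candidate gains. The code writes the posterior covariance in its fully symmetric form rather than the $-2 K_k C \hat{P}_{k|k-1}$ shorthand used above, which leaves the trace unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random but valid problem data (illustration only).
n, m = 3, 2                                     # state and measurement dimensions
C = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
P_pred = G @ G.T + np.eye(n)                    # stands in for P_{k|k-1} (positive definite)
H = rng.standard_normal((m, m))
R = H @ H.T + np.eye(m)                         # measurement noise covariance

S = C @ P_pred @ C.T + R
K_star = P_pred @ C.T @ np.linalg.inv(S)        # gain from the zero-gradient condition

def posterior_trace(K):
    """tr(P_{k|k}) as a function of an arbitrary gain K (symmetric form)."""
    P_post = P_pred - K @ C @ P_pred - P_pred @ C.T @ K.T + K @ S @ K.T
    return np.trace(P_post)

# Perturbing the optimal gain should never decrease the trace.
for _ in range(1000):
    K_other = K_star + 0.1 * rng.standard_normal(K_star.shape)
    assert posterior_trace(K_other) >= posterior_trace(K_star) - 1e-9
```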

Probabilistic Perspective (Bayesian Estimation)

The filter maintains a Gaussian belief state that gets refined through sequential application of Bayes' rule, where prediction corresponds to Chapman-Kolmogorov propagation and update implements Bayesian conditioning.

$$
\begin{aligned}
p(x_k | y_{1:k}, u_{1:k}) &= p(x_k | y_k, y_{1:k-1}, u_{1:k}) \\
&= \frac{p(y_k | x_k, y_{1:k-1}, u_{1:k})\, p(x_k | y_{1:k-1}, u_{1:k})}{p(y_k | y_{1:k-1}, u_{1:k})} \\
&= \eta\, \underbrace{p(y_k | x_k)}_{\text{observation model}}\, \underbrace{p(x_k | y_{1:k-1}, u_{1:k})}_{\text{prior belief}} \\
&= \eta\, p(y_k | x_k) \int p(x_k, x_{k-1} | y_{1:k-1}, u_{1:k})\, \mathrm{d}x_{k-1} \\
&= \eta\, p(y_k | x_k) \int p(x_k | x_{k-1}, y_{1:k-1}, u_{1:k})\, p(x_{k-1} | y_{1:k-1}, u_{1:k})\, \mathrm{d}x_{k-1} \\
&= \eta\, p(y_k | x_k) \int \underbrace{p(x_k | x_{k-1}, u_k)}_{\text{motion model}}\, \underbrace{p(x_{k-1} | y_{1:k-1}, u_{1:k-1})}_{\text{previous belief}}\, \mathrm{d}x_{k-1} \\
&= \eta\, \mathcal{N}(y_k; C x_k, R) \int \mathcal{N}(x_k; A x_{k-1} + B u_k, Q)\, \mathcal{N}(x_{k-1}; \hat{x}_{k-1|k-1}, \hat{P}_{k-1|k-1})\, \mathrm{d}x_{k-1} \\
&= \eta\, \mathcal{N}(y_k; C x_k, R)\, \mathcal{N}(x_k; \hat{x}_{k|k-1}, \hat{P}_{k|k-1}) \\
&\triangleq \mathcal{N}(x_k; \hat{x}_{k|k}, \hat{P}_{k|k})
\end{aligned}
$$

Applying Bayes' rule and the Markov assumption to $p(x_k | y_{1:k}, u_{1:k})$ in this way makes both the time update and the measurement update explicit.
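
To make the equivalence concrete, the sketch below checks numerically that Bayesian conditioning (multiplying the Gaussian prior by the Gaussian likelihood, written in information form) returns exactly the Kalman measurement update; all numerical values are arbitrary examples, not taken from the notes.

```python
import numpy as np

# Example prior, measurement model, and observation (illustrative only).
C = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
x_pred = np.array([0.5, -0.3])
P_pred = np.array([[1.0, 0.2], [0.2, 0.5]])
y = np.array([0.9])

# Kalman-filter measurement update.
S = C @ P_pred @ C.T + R
K = P_pred @ C.T @ np.linalg.inv(S)
x_kf = x_pred + K @ (y - C @ x_pred)
P_kf = P_pred - K @ C @ P_pred

# Bayesian conditioning: multiply the Gaussian prior by the Gaussian likelihood
# in information (canonical) form and renormalize.
info = np.linalg.inv(P_pred) + C.T @ np.linalg.inv(R) @ C
P_bayes = np.linalg.inv(info)
x_bayes = P_bayes @ (np.linalg.inv(P_pred) @ x_pred + C.T @ np.linalg.inv(R) @ y)

assert np.allclose(x_kf, x_bayes) and np.allclose(P_kf, P_bayes)
```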

Optimization Perspective (MAP Estimation)

The Kalman Filter solves a weighted least-squares problem where the optimal state estimate minimizes a cost function balancing prediction fidelity against measurement consistency, with covariance matrices acting as natural weighting matrices.

By Bayes' rule, we know that

$$
p(x_k | y_{1:k}, u_{1:k}) \propto \underbrace{p(y_k | x_k)}_{\text{measurement}}\, \underbrace{p(x_k | y_{1:k-1}, u_{1:k})}_{\text{prior}}
$$

Maximizing the posterior probability is equivalent to minimizing the negative log-posterior.

$$
-\ln p(x_k | y_{1:k}, u_{1:k}) = -\ln p(y_k | x_k) - \ln p(x_k | y_{1:k-1}, u_{1:k}) + C
$$

Substituting the Gaussian probability densities gives

$$
\begin{aligned}
J(x_k) &= \frac{1}{2} \|y_k - C x_k\|_{R^{-1}}^2 + \frac{1}{2} \|x_k - \hat{x}_{k|k-1}\|_{\hat{P}_{k|k-1}^{-1}}^2 + C \\
\hat{x}_{k|k} &= \arg\min_{x_k} \left( \|x_k - \hat{x}_{k|k-1}\|_{\hat{P}_{k|k-1}^{-1}}^2 + \|y_k - C x_k\|_{R^{-1}}^2 \right)
\end{aligned}
$$

The prior is given by

$$
\begin{aligned}
\hat{x}_{k|k-1} &= \mathbb{E}[x_k | y_{1:k-1}, u_{1:k}] \\
\hat{P}_{k|k-1} &= \mathrm{Cov}(x_k | y_{1:k-1}, u_{1:k})
\end{aligned}
$$
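
Consistent with this weighted least-squares view, the MAP estimate can be recovered by whitening both terms of $J(x_k)$ and handing the stacked problem to a generic least-squares solver; it coincides with the Kalman measurement update. The numerical values below are arbitrary stand-ins for the prior and the measurement, not taken from the notes.

```python
import numpy as np

# Example prior, measurement model, and observation (illustrative only).
C = np.array([[1.0, 1.0]])
R = np.array([[0.5]])
x_pred = np.array([1.0, 0.0])
P_pred = np.array([[0.8, 0.1], [0.1, 0.4]])
y = np.array([1.4])

# Whitening: write each weighted norm ||v||^2_W as an ordinary 2-norm ||L v||^2
# with W = L^T L, using a Cholesky factor of the inverse covariance.
L_prior = np.linalg.cholesky(np.linalg.inv(P_pred)).T
L_meas = np.linalg.cholesky(np.linalg.inv(R)).T

# Stacked ordinary least-squares problem equivalent to argmin J(x_k).
A_stack = np.vstack([L_prior, L_meas @ C])
b_stack = np.concatenate([L_prior @ x_pred, L_meas @ y])
x_map, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)

# Standard Kalman measurement update for comparison.
K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
x_kf = x_pred + K @ (y - C @ x_pred)

assert np.allclose(x_map, x_kf)
```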