
ELEC 5650 - Estimation Theory

"We have decided to call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics, which we form from the Greek ... for steersman."

 -- Norbert Wiener

These are the lecture notes for "ELEC 5650: Networked Sensing, Estimation and Control" in the 2024-25 Spring semester, delivered by Prof. Ling Shi at HKUST. In this session, we explore fundamental concepts and techniques in estimation theory, including maximum a posteriori (MAP) estimation, minimum mean squared error (MMSE) estimation, maximum likelihood (ML) estimation, weighted least squares estimation, and linear minimum mean square error (LMMSE) estimation.

  1. Mathematic Tools
  2. Estimation Theory <--
  3. Kalman Filter
  4. Linear Quadratic Regulator

MAP (Maximum A Posteriori) Estimation

Let $x$ be the parameter to be estimated and $y$ the observation. The MAP estimate maximizes the posterior:

$$
\hat{x} = \arg\max_x
\begin{cases}
f(x \mid y), & x \text{ is continuous} \\
p(x \mid y), & x \text{ is discrete}
\end{cases}
$$
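As a quick illustration (my own toy numbers, not from the lecture), the discrete case reduces to taking the argmax over a posterior table:

```python
import numpy as np

# Hypothetical discrete example: x takes values 0, 1, 2 and the
# posterior p(x | y) has already been computed for the observed y.
x_values = np.array([0, 1, 2])
posterior = np.array([0.2, 0.5, 0.3])   # p(x | y), sums to 1

# MAP estimate: the value of x that maximizes the posterior
x_map = x_values[np.argmax(posterior)]
print("MAP estimate:", x_map)           # -> 1
```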

MMSE (Minimum Mean Squared Error) Estimation

$$
\hat{x} = \arg\min_{\hat{x}} E[e^T e \mid y], \qquad e = x - \hat{x}
$$

$$
\hat{x}_{\mathrm{MMSE}} = E[x \mid y] = \int x f(x \mid y)\, dx \quad \text{or} \quad \sum_x x\, p(x \mid y)
$$

Proof:

$$
\begin{aligned}
E[e^T e \mid y] &= E[(x-\hat{x})^T(x-\hat{x}) \mid y] = E[x^T x \mid y] - 2\hat{x}^T E[x \mid y] + \hat{x}^T \hat{x} \\
\frac{\partial}{\partial \hat{x}}\left( E[x^T x \mid y] - 2\hat{x}^T E[x \mid y] + \hat{x}^T \hat{x} \right) &= -2 E[x \mid y] + 2\hat{x} = 0 \\
\hat{x}_{\mathrm{MMSE}} &= E[x \mid y]
\end{aligned}
$$
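As a sanity check (my own sketch, not part of the notes), for a jointly Gaussian pair the conditional mean is available in closed form, and a Monte Carlo comparison shows it attains a lower empirical MSE than a competing estimator, as the proof predicts. The model $y = x + v$ and the noise levels below are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sigma_x, sigma_n = 1.0, 0.5

# Jointly Gaussian toy model: y = x + noise
x = rng.normal(0.0, sigma_x, n)
y = x + rng.normal(0.0, sigma_n, n)

# Conditional mean E[x | y] for this Gaussian model (closed form)
gain = sigma_x**2 / (sigma_x**2 + sigma_n**2)
x_hat_mmse = gain * y

# A competing estimator that just uses the raw observation
x_hat_naive = y

print("MSE of E[x|y]:", np.mean((x - x_hat_mmse) ** 2))   # ~ 0.20
print("MSE of y     :", np.mean((x - x_hat_naive) ** 2))  # ~ 0.25
```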

ML (Maximum Likelihood) Estimation

ML estimation is non-Bayesian: $p(y \mid x)$ is a conditional probability (with $x$ treated as random), whereas $p(y; x)$ is a probability parameterized by a deterministic unknown $x$; in general $p(y \mid x) \neq p(y; x)$.

Assume we have $n$ measurements $X = (X_1, \dots, X_n)$; we use $p(X;\theta)$ to describe the joint probability of $X$.

$$
\hat{\theta}_n = \arg\max_\theta
\begin{cases}
f(X;\theta), & X \text{ is continuous} \\
p(X;\theta), & X \text{ is discrete}
\end{cases}
$$

If the measurements are independent, the joint probability factorizes:

$$
p(X;\theta) = \prod_{i=1}^{n} p(X_i;\theta), \qquad \log p(X;\theta) = \sum_{i=1}^{n} \log p(X_i;\theta)
$$
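A minimal sketch (my addition) of ML estimation for i.i.d. Gaussian measurements, where the log-likelihood is the sum of per-sample terms; the "true" mean and standard deviation below are assumptions chosen for the demo, and the Gaussian ML solution happens to be available in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed ground truth for the demo: X_i ~ N(mu, sigma^2), i.i.d.
mu_true, sigma_true = 2.0, 1.5
X = rng.normal(mu_true, sigma_true, size=500)

# Log-likelihood of the sample as a function of theta = (mu, sigma)
def log_likelihood(mu, sigma, X):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (X - mu) ** 2 / (2 * sigma**2))

# For the Gaussian case the maximizer is known in closed form:
mu_ml = X.mean()                               # sample mean
sigma_ml = np.sqrt(np.mean((X - mu_ml) ** 2))  # ML (biased) std estimate

print("ML estimates:", mu_ml, sigma_ml)
print("log-likelihood at the ML estimate:", log_likelihood(mu_ml, sigma_ml, X))
```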

MAP & ML

$$
\begin{aligned}
\hat{\theta}_{\mathrm{MAP}} &= \arg\max_\theta p(\theta \mid x) = \arg\max_\theta \frac{p(\theta)\, p(x \mid \theta)}{p(x)} = \arg\max_\theta p(\theta)\, p(x \mid \theta) \\
\hat{\theta}_{\mathrm{ML}} &= \arg\max_\theta p(x;\theta)
\end{aligned}
$$
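To make the contrast concrete, here is a toy sketch (my own, under assumed values) for estimating a Gaussian mean $\theta$ from i.i.d. samples with known noise variance and a Gaussian prior $\theta \sim N(\mu_0, \sigma_0^2)$: the ML estimate is the sample mean, while the MAP estimate shrinks it toward the prior mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model: x_i ~ N(theta, sigma^2) with known sigma,
# and a Gaussian prior theta ~ N(mu0, sigma0^2).
sigma, mu0, sigma0 = 1.0, 0.0, 0.5
theta_true = 1.0
x = rng.normal(theta_true, sigma, size=10)

# ML: maximize p(x; theta) -> sample mean
theta_ml = x.mean()

# MAP: maximize p(theta) p(x | theta); for Gaussians the posterior
# mode equals the posterior mean.
n = len(x)
precision = 1 / sigma0**2 + n / sigma**2
theta_map = (mu0 / sigma0**2 + n * x.mean() / sigma**2) / precision

print("ML :", theta_ml)    # the sample mean
print("MAP:", theta_map)   # pulled toward the prior mean mu0
```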

Weighted Least Squares Estimation

$$
\begin{aligned}
E(x) &= \|Ax - b\|_{\Sigma}^2 = x^T A^T \Sigma^{-1} A x - 2 b^T \Sigma^{-1} A x + b^T \Sigma^{-1} b \\
\frac{\partial E}{\partial x} &= 2 A^T \Sigma^{-1} A x - 2 A^T \Sigma^{-1} b = 0 \\
\hat{x} &= \left(A^T \Sigma^{-1} A\right)^{-1} A^T \Sigma^{-1} b
\end{aligned}
$$
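A minimal numeric sketch (my addition) of the closed-form WLS solution $\hat{x} = (A^T\Sigma^{-1}A)^{-1}A^T\Sigma^{-1}b$; the matrix $A$, the true $x$, and the per-measurement noise variances are arbitrary illustrative values. Solving the normal equations with `np.linalg.solve` avoids explicitly inverting $A^T\Sigma^{-1}A$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative problem: b = A x_true + noise, with per-measurement noise
# variances collected in Sigma (assumed known).
A = rng.normal(size=(6, 2))
x_true = np.array([1.0, -2.0])
noise_vars = np.array([0.1, 0.1, 0.5, 0.5, 1.0, 1.0])
Sigma = np.diag(noise_vars)
b = A @ x_true + rng.normal(0.0, np.sqrt(noise_vars))

# Weighted least squares: solve (A^T Sigma^-1 A) x = A^T Sigma^-1 b
W = np.linalg.inv(Sigma)
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print("WLS estimate:", x_hat)   # close to x_true
```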

LMMSE (Linear Minimum Mean Square Error) Estimation

LMMSE estimation seeks a linear (affine) estimator

$$
\hat{x} = Ky + b
$$

that minimizes the mean squared error:

$$
\begin{aligned}
\mathrm{MSE} &= E[(x-\hat{x})^T(x-\hat{x})] = E[x^T x] - 2E[x^T \hat{x}] + E[\hat{x}^T \hat{x}] \\
&= E[x^T x] - 2E[x^T(Ky+b)] + E[(Ky+b)^T(Ky+b)] \\
\frac{\partial\,\mathrm{MSE}}{\partial b} &= -2E[x] + 2b + 2KE[y] = 0 \;\Rightarrow\; b = \mu_x - K\mu_y \\
\mathrm{MSE} &= E[x^T x] - 2E\!\left[x^T(Ky + \mu_x - K\mu_y)\right] + E\!\left[(Ky + \mu_x - K\mu_y)^T(Ky + \mu_x - K\mu_y)\right] \\
&= E[x^T x] - 2E[x^T K y] - 2\mu_x^T \mu_x + 2\mu_x^T K \mu_y + E[y^T K^T K y] + \mu_x^T \mu_x - \mu_y^T K^T K \mu_y \\
\frac{\partial\,\mathrm{MSE}}{\partial K} &= -2\Sigma_{xy} + 2K\Sigma_{yy} = 0 \;\Rightarrow\; K = \Sigma_{xy}\Sigma_{yy}^{-1} \\
\hat{x} &= Ky + b = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) \\
\Sigma_{ee} &= E[(x-\hat{x})(x-\hat{x})^T] = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}
\end{aligned}
$$
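A hedged numeric sketch (my addition) of these formulas: the means and covariances below are estimated from samples of an assumed linear-Gaussian toy model $y = Hx + v$, and the estimator $\hat{x} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$ is then applied to every observation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Toy linear model (assumption for the demo): y = H x + v
H = np.array([[1.0, 0.5],
              [0.2, 1.0]])
x = rng.normal(size=(n, 2)) @ np.diag([1.0, 2.0]) + np.array([1.0, -1.0])
v = rng.normal(scale=0.3, size=(n, 2))
y = x @ H.T + v

# Sample means and covariances
mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
Sigma_xy = np.cov(x.T, y.T)[:2, 2:]   # cross-covariance of x and y
Sigma_yy = np.cov(y.T)

# LMMSE estimator: x_hat = mu_x + Sigma_xy Sigma_yy^-1 (y - mu_y)
K = Sigma_xy @ np.linalg.inv(Sigma_yy)
x_hat = mu_x + (y - mu_y) @ K.T

print("empirical MSE:", np.mean(np.sum((x - x_hat) ** 2, axis=1)))
```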

Orthogonality Principle

$$
\begin{aligned}
\langle x - Ky - b,\, y \rangle &= E[(x - Ky - b)\, y^T] = E[x y^T] - K E[y y^T] - b\, E[y^T] \\
&= \Sigma_{xy} + \mu_x \mu_y^T - K\left(\Sigma_{yy} + \mu_y \mu_y^T\right) - (\mu_x - K\mu_y)\mu_y^T \\
&= \Sigma_{xy} - \left(\Sigma_{xy}\Sigma_{yy}^{-1}\right)\Sigma_{yy} = 0 \\
&\Rightarrow\; x - (Ky + b) \perp y
\end{aligned}
$$

This shows that the error $e = x - \hat{x}$ is orthogonal to (uncorrelated with) the observation $y$.
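This can be checked numerically; a small sketch (my own, using an assumed scalar toy model $y = x + v$) verifies that the empirical cross-covariance between the error and the observation is close to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Scalar toy model (assumption for the check): y = x + noise
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, 0.5, n)

# Scalar LMMSE estimator x_hat = mu_x + K (y - mu_y)
S = np.cov(x, y)                 # [[Sxx, Sxy], [Syx, Syy]]
K = S[0, 1] / S[1, 1]
e = x - (x.mean() + K * (y - y.mean()))

# Orthogonality principle: the error is uncorrelated with the observation
print("cov(e, y) ~", np.cov(e, y)[0, 1])   # close to 0
```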

Innovation Process

Inverting $\Sigma_{yy}$ is expensive in general, but it becomes easy when $\Sigma_{yy}$ is diagonal. Using the Gram-Schmidt (G.S.) process, we can obtain orthogonal vectors $e_1, \dots, e_k$ and a lower-triangular transform matrix $F$ from $y_1, \dots, y_k$. The key idea of the orthogonal projection is to decompose the observation $y_k$ into a part that can be predicted from the past observations $y_1, \dots, y_{k-1}$ and a new part that is uncorrelated with them (the innovation).

$$
e = Fy
$$

Then the covariance can be calculated by

$$
\Sigma_{ee} = F\Sigma_{yy}F^T, \qquad \Sigma_{ex} = F\Sigma_{yx}
$$

$$
K_e = \Sigma_{ex}^T\,\Sigma_{ee}^{-1} = \Sigma_{xy}F^T\,(F^T)^{-1}\Sigma_{yy}^{-1}F^{-1} = \Sigma_{xy}\Sigma_{yy}^{-1}F^{-1} = K F^{-1}
$$

Although $K_e$ is not equal to $K$, it serves as the Kalman gain in the transformed (projected) space defined by the matrix $F$.
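A sketch (my addition) of the whitening idea: for an assumed $\Sigma_{yy}$, a lower-triangular $F$ obtained here from a Cholesky factorization, which matches the Gram-Schmidt construction up to scaling, makes $\Sigma_{ee} = F\Sigma_{yy}F^T$ the identity, so "inverting" it becomes trivial.

```python
import numpy as np

# Example covariance of the stacked measurements y_1..y_k (assumed values)
Sigma_yy = np.array([[2.0, 0.8, 0.3],
                     [0.8, 1.5, 0.4],
                     [0.3, 0.4, 1.0]])

# Cholesky factor: Sigma_yy = L L^T, with L lower triangular
L = np.linalg.cholesky(Sigma_yy)

# Lower-triangular whitening transform, playing the role of F (e = F y)
F = np.linalg.inv(L)

Sigma_ee = F @ Sigma_yy @ F.T
print(np.round(Sigma_ee, 10))   # identity: innovations are uncorrelated
```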

For a newly arriving measurement $y_{k+1}$, we can find $e_{k+1}$ by the G.S. process:

$$
e_{k+1} = y_{k+1} - \hat{y}_{k+1|k} = y_{k+1} - \mathrm{proj}(y_{k+1};\, \mathcal{E}_k) = y_{k+1} - \sum_{i=1}^{k} \frac{\langle y_{k+1}, e_i \rangle}{\langle e_i, e_i \rangle}\, e_i
$$

where $\mathcal{E}_k$ denotes the innovations $e_1, \dots, e_k$ obtained so far.

It satisfies

$$
\langle e_{k+1}, y_i \rangle = E[e_{k+1} y_i^T] = 0, \qquad i = 1, \dots, k
$$

To estimate $x$ at time $k+1$:

$$
\begin{aligned}
\hat{x}_{k+1} &= \mathrm{proj}(x_{k+1};\, \mathcal{E}_{k+1}) = \sum_{i=1}^{k+1} \frac{\langle x_{k+1}, e_i \rangle}{\langle e_i, e_i \rangle}\, e_i \\
&= \sum_{i=1}^{k} \frac{\langle x_{k+1}, e_i \rangle}{\langle e_i, e_i \rangle}\, e_i + \frac{\langle x_{k+1}, e_{k+1} \rangle}{\langle e_{k+1}, e_{k+1} \rangle}\, e_{k+1} \\
&= \sum_{i=1}^{k} \frac{\langle x_k, e_i \rangle}{\langle e_i, e_i \rangle}\, e_i + \frac{\langle x_{k+1}, e_{k+1} \rangle}{\langle e_{k+1}, e_{k+1} \rangle}\, e_{k+1} \qquad (\text{for a static state, } x_{k+1} = x_k) \\
&= \hat{x}_k + \frac{\langle x_{k+1}, e_{k+1} \rangle}{\langle e_{k+1}, e_{k+1} \rangle}\, e_{k+1}
\end{aligned}
$$
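A small sketch (my own, assuming a static scalar state $x_{k+1} = x_k$ with a zero-mean Gaussian prior and i.i.d. scalar measurement noise) of this recursion: each new innovation $e_{k+1}$ adds one projection term to the previous estimate, with $\langle x, e_{k+1} \rangle = p_k$ and $\langle e_{k+1}, e_{k+1} \rangle = p_k + r$ for this model.

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed toy setup: static scalar state x with prior variance p0,
# measured repeatedly as y_k = x + v_k, v_k ~ N(0, r).
p0, r = 4.0, 0.5
x_true = rng.normal(0.0, np.sqrt(p0))
ys = x_true + rng.normal(0.0, np.sqrt(r), size=20)

x_hat, p = 0.0, p0          # prior mean and prior (error) variance
for y in ys:
    e = y - x_hat           # innovation: the new information in y
    gain = p / (p + r)      # <x, e> / <e, e> for this model
    x_hat = x_hat + gain * e
    p = p - gain * p        # updated error variance

print("true x:", x_true, " recursive estimate:", x_hat)
```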