Class 5: Kalman Filter
Macroeconometrics - Spring 2011
Jacek Suda, BdF and PSE
April 11, 2011
Outline
1. Prediction Error Decomposition
2. State-space Form
3. Deriving the Kalman Filter
See Kim and Nelson, Chapter 3; Hamilton, Chapter 13.
Notation
Denote:
- {Y_t}: a covariance-stationary process, e.g. ARMA(p, q),
- Ω_t: information available at time t,
- Y_{t+1|t}: forecast of Y_{t+1} based on Ω_t. In our simple model it is Y_{t+1} given Y_t.
Linear Projection
Linear projection:
Ŷ_{t+1|t} = α'X_t = α_1 X_{1t} + ... + α_p X_{pt},
where
E[(Y_{t+1} − α'X_t) X_{it}] = 0, i = 1, ..., p.
These p moment conditions ensure that the error is orthogonal to any information in Ω_t: forecast errors are uncorrelated with past information.
Result: The minimum-MSE linear forecast of Y_{t+1} is the linear projection.
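As an illustration (my own sketch, not from the lecture), the projection coefficients can be estimated by least squares on simulated AR(1) data and the orthogonality conditions checked numerically; the conditioning set X_t = (1, y_t) and all parameter values below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi, mu = 5000, 0.7, 1.0
y = np.zeros(T)
for t in range(1, T):                            # simulate y_t = mu + phi*(y_{t-1}-mu) + eps_t
    y[t] = mu + phi * (y[t - 1] - mu) + rng.standard_normal()

X = np.column_stack([np.ones(T - 1), y[:-1]])    # X_t = (1, y_t)
Y = y[1:]                                        # Y_{t+1}
alpha = np.linalg.lstsq(X, Y, rcond=None)[0]     # projection coefficients
resid = Y - X @ alpha                            # forecast errors
print("alpha:", alpha)                           # approx (mu*(1-phi), phi)
print("orthogonality:", X.T @ resid / (T - 1))   # approx zero by construction
```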
ARMA Models
Start from the Wold form
Y_t − µ = ψ(L)ε_t,  ε_t ~ WN,
ψ(L) = Σ_{j=0}^∞ ψ_j L^j,  ψ_0 = 1,  Σ_{j=0}^∞ ψ_j² < ∞.
Then
Y_{t+s} = µ + ε_{t+s} + ψ_1 ε_{t+s−1} + ... + ψ_s ε_t + ψ_{s+1} ε_{t−1} + ...,
Ŷ_{t+s|t} = µ + ψ_s ε_t + ψ_{s+1} ε_{t−1} + ...,
where the last line uses only information up to time t and E_t[ε_{t+i}] = 0 for i > 0.
ARMA Models
MSE(Ŷ_{t+s|t}, Y_{t+s}) = E[(ε_{t+s} + ψ_1 ε_{t+s−1} + ... + ψ_{s−1} ε_{t+1})²]
= σ²(1 + ψ_1² + ψ_2² + ... + ψ_{s−1}²) < var(Y_{t+s}).
We are better off with the linear projection than with the unconditional variance. But
lim_{s→∞} σ² Σ_{k=0}^{s−1} ψ_k² = var(Y_t),
so the upper limit for the forecast uncertainty is the unconditional variance.
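A short numerical illustration (not from the slides): for an AR(1), where ψ_j = φ^j, the s-step forecast MSE σ² Σ_{k=0}^{s−1} ψ_k² increases with the horizon s and converges to the unconditional variance σ²/(1 − φ²).

```python
import numpy as np

phi, sigma2 = 0.9, 1.0
uncond_var = sigma2 / (1 - phi**2)
for s in (1, 5, 20, 100):
    psi = phi ** np.arange(s)                 # psi_0, ..., psi_{s-1}
    mse_s = sigma2 * np.sum(psi**2)           # s-step-ahead forecast MSE
    print(f"s={s:3d}  MSE={mse_s:7.3f}  unconditional var={uncond_var:7.3f}")
```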
Kalman Filter
Forecasts based on the Wold form assume an infinite number of observations, which we do not have in reality. The Kalman filter
- calculates linear projections for a finite number of observations,
- gives exact finite-sample forecasts,
- allows for exact MLE of ARMA models based on the Prediction Error Decomposition.
Normal Distribution
Joint normality:
ỹ_T = (y_1, ..., y_T)' ~ N(µ·1_{T×1}, Ω_{T×T}).
Since the process is covariance stationary, each Y_t has the same mean and variance, ω_11 = ω_22 = ... = ω_TT = σ² = γ_0, and ω_ij = γ_{|i−j|}:
Ω = [ γ_0      γ_1   ...  γ_{T−1} ]
    [ γ_1      γ_0   ...  γ_{T−2} ]
    [  ...                  ...   ]
    [ γ_{T−1}  ...         γ_0    ]
The likelihood function:
L(θ | ỹ_T) = (2π)^{−T/2} det(Ω)^{−1/2} exp{−½ (ỹ_T − µ)' Ω^{−1} (ỹ_T − µ)}.
Factorization
For large T, Ω might be large and difficult to invert. Since Ω is a positive definite symmetric matrix, there exists a unique triangular factorization of Ω,
Ω = A f A',
where f (T×T) is a diagonal matrix with f_t > 0 for all t,
f = diag(f_1, f_2, ..., f_T),
and A (T×T) is lower triangular with 1s on the diagonal,
A = [ 1                     ]
    [ a_21   1              ]
    [  ...         ...      ]
    [ a_T1   a_T2   ...   1 ].
Likelihood
The likelihood function can be rewritten as
L(θ | ỹ_T) = (2π)^{−T/2} det(A f A')^{−1/2} exp{−½ (ỹ_T − µ)' (A f A')^{−1} (ỹ_T − µ)}.
Define η = A^{−1}(ỹ_T − µ) (the prediction errors), so that Aη = ỹ_T − µ. Since A is a lower-triangular matrix with 1s along the principal diagonal,
η_1 = y_1 − µ
η_2 = y_2 − µ − a_21 η_1
η_3 = y_3 − µ − a_31 η_1 − a_32 η_2
...
η_T = y_T − µ − Σ_{i=1}^{T−1} a_Ti η_i.
Likelihood
Also, since A is lower triangular with 1s along the principal diagonal, det(A) = 1. Then det(A f A') = det(A) det(f) det(A') = det(f), and
L(θ | ỹ_T) = (2π)^{−T/2} det(f)^{−1/2} exp{−½ η' f^{−1} η} = Π_{t=1}^T (2π f_t)^{−1/2} exp{−η_t²/(2 f_t)},
where η_t is the t-th element of η (T×1), the prediction error y_t − ŷ_{t|t−1}, with
ŷ_{t|t−1} = µ − Σ_{i=1}^{t−1} a*_{t,i}(y_i − µ),  t = 2, 3, ..., T,
where a*_{t,i} is the (t, i)-th element of A^{−1}.
Kalman Filter
Note: Given ỹ_T ~ N(µ, Ω), we have η_t | Ω_{t−1} ~ N(0, f_t), where f_t is the (t, t) diagonal element of the matrix f, and
ln L = −½ Σ_{t=1}^T ln(2π f_t) − ½ Σ_{t=1}^T η_t²/f_t,
since the η_t are normal and independent of each other.
The Kalman filter recursively calculates the linear projection of y_t on past information Ω_{t−1} for any model that can be cast in state-space form: for any such structure it solves for the linear prediction.
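The factorization and the prediction error decomposition can be checked numerically. The sketch below is my own illustration (the zero-mean AR(1) autocovariances, sample size, and parameter values are assumptions for the example): it builds Ω, obtains Ω = A f A' from the Cholesky factor, and verifies that the log-likelihood computed from the prediction errors η_t and variances f_t equals the direct multivariate-normal formula.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)
T, phi, sigma2 = 200, 0.8, 1.0

# Build Omega for a zero-mean AR(1): gamma_k = sigma^2 * phi^k / (1 - phi^2)
gamma = sigma2 * phi ** np.arange(T) / (1 - phi**2)
Omega = gamma[np.abs(np.subtract.outer(np.arange(T), np.arange(T)))]

# Simulate one sample path y ~ N(0, Omega) and factor Omega = A f A'
C = cholesky(Omega, lower=True)
y = C @ rng.standard_normal(T)
f = np.diag(C) ** 2                       # f_t: prediction-error variances
A = C / np.diag(C)                        # unit lower triangular factor

# Prediction errors eta = A^{-1} y, then lnL = -1/2 sum[ln(2 pi f_t) + eta_t^2 / f_t]
eta = solve_triangular(A, y, lower=True, unit_diagonal=True)
lnL_pe = -0.5 * np.sum(np.log(2 * np.pi * f) + eta**2 / f)

# Direct evaluation of the multivariate normal log-likelihood for comparison
sign, logdet = np.linalg.slogdet(Omega)
lnL_direct = -0.5 * (T * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(Omega, y))
print(lnL_pe, lnL_direct)                 # equal up to rounding error
```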
Measurement (Observation) Equation
A general form that encompasses a wide variety of models.
1. Measurement (Observation) Equation
Represents the static relationship between observed variables (data) and unobserved state variables:
y_t = H_t β_t + A z_t + e_t,
where y_t denotes observed data, β_t is a state vector that captures the dynamics, z_t is a vector of exogenous, observed variables (for example, lagged values of y_t but also other data), and e_t is an error term, e_t ~ N(0, R). The presence of the state vector makes this representation more than a simple linear model.
Transition (State) Equation
2. Transition (State) Equation
Captures the dynamics in the system and propagates it forward over time:
β_t = µ + F β_{t−1} + v_t,
where µ is a vector of constants, F is the transition matrix, and v_t is an error vector, v_t ~ N(0, Q). It is like an AR(1), but in vector/matrix form.
Transition (State) Equation
β_t = µ + F β_{t−1} + v_t.
The state vector has an AR(1)-type representation; the equation describes the evolution of the state vector. These state variables can be unobservable. The transition equation can then be used to draw inference about the unobservables, conditioning on the data, which are observable (a Bayesian perspective).
Error terms
Error terms:
e_t ~ N(0, R),  v_t ~ N(0, Q),
where R, Q are variance-covariance matrices and E[e_t v_τ'] = 0 for all t, τ.
Is this a restrictive assumption? The model can be represented in a way that is not very restrictive. Even with E[e_t v_τ'] ≠ 0 we can estimate the model with a (modified) Kalman filter, but it becomes more complicated. The normality assumption may not always be appropriate, but it allows us to use MLE.
Examples: AR(p)
The state-space form applies to a very wide variety of time-series models. Consider an AR(p) process
y_t − µ = φ_1(y_{t−1} − µ) + ... + φ_p(y_{t−p} − µ) + ε_t,  E(ε_t ε_τ) = σ² for t = τ, 0 otherwise.
State equation:
[ y_t − µ       ]   [ φ_1  φ_2  ...  φ_{p−1}  φ_p ] [ y_{t−1} − µ ]   [ ε_t ]
[ y_{t−1} − µ   ] = [ 1    0    ...  0        0   ] [ y_{t−2} − µ ] + [ 0   ]
[  ...          ]   [  ...           ...          ] [  ...        ]   [ ... ]
[ y_{t−p+1} − µ ]   [ 0    0    ...  1        0   ] [ y_{t−p} − µ ]   [ 0   ]
Observation equation:
y_t = µ + [1  0  ...  0] (y_t − µ, y_{t−1} − µ, ..., y_{t−p+1} − µ)'.
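A small helper (hypothetical, named `ar_state_space` here for illustration) that builds the companion-form matrices F, H, Q matching the state and observation equations above:

```python
import numpy as np

def ar_state_space(phis, sigma2):
    """Return F, H, Q for an AR(p) with coefficients phis = [phi_1, ..., phi_p]."""
    p = len(phis)
    F = np.zeros((p, p))
    F[0, :] = phis                 # first row: phi_1, ..., phi_p
    F[1:, :-1] = np.eye(p - 1)     # identity block shifts the lags down
    H = np.zeros((1, p))
    H[0, 0] = 1.0                  # observation picks out the first state
    Q = np.zeros((p, p))
    Q[0, 0] = sigma2               # only the first state receives the shock eps_t
    return F, H, Q

F, H, Q = ar_state_space([0.5, 0.3, -0.1], sigma2=1.0)
print(F, H, Q, sep="\n")
```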
Examples: ARMA(1,1)
ARMA(1,1): Set µ = 0,
y_t = φ y_{t−1} + ε_t + θ ε_{t−1},  ε_t ~ N(0, σ²).
There may be more than one way to represent a model in state-space form, and there may be differences in (computational) efficiency between the different ways.
Examples: ARMA(1,1)
State equation: the general form is β_t = F β_{t−1} + v_t. Put
β_t = (y_t, ε_t)',  β_{t−1} = (y_{t−1}, ε_{t−1})',
and write y_t = φ y_{t−1} + θ ε_{t−1} + ε_t in matrix notation:
[ y_t ]   [ φ  θ ] [ y_{t−1} ]   [ ε_t ]
[ ε_t ] = [ 0  0 ] [ ε_{t−1} ] + [ ε_t ]
  β_t        F        β_{t−1}      v_t
with v_t ~ N(0, Q), Q = [ σ²  σ² ; σ²  σ² ].
Here y_t is observable and ε_t is unobservable (the forecast error).
Examples: ARMA(1,1)
Observation equation:
y_t = [1  0] (y_t, ε_t)',
i.e. y_t = H β_t, with no exogenous variables (A = 0) and R = 0 for this ARMA(1,1) case.
The parameters φ, θ, σ² are captured in the F and Q matrices; the Kalman filter will be used to estimate them. For the Kalman filter, what goes into β_t does not matter: only the matrices F, Q, H, R matter, and the state vector is pinned down by F, Q, H and the observations.
ARMA(1,1): Alternative Representation
A more elegant (i.e. easier for computation) representation. In lag-operator notation (an alternative representation for the ARMA(1,1)):
(1 − φL)y_t = (1 + θL)ε_t
y_t = (1 − φL)^{−1}(1 + θL)ε_t = (1 + θL)(1 − φL)^{−1}ε_t.
Define x_t = (1 − φL)^{−1}ε_t, so that (1 − φL)x_t = ε_t, i.e. x_t − φ x_{t−1} = ε_t (x_t is an AR(1), not observed). Then
y_t = (1 + θL)x_t = x_t + θ x_{t−1}.
So y_t is a linear combination of two unobservable AR(1) components, x_t and x_{t−1}.
ARMA(1,1): State-Space
Observation equation (all randomness is in the state equation):
y_t = H β_t,  y_t = [1  θ] (x_t, x_{t−1})'.
Inside H there are parameters to be estimated. A = 0 (no exogenous variables) and R = 0, as the observation equation is just an identity (no randomness from e_t).
ARMA(1,1): State-Space
State equation:
[ x_t     ]   [ φ  0 ] [ x_{t−1} ]   [ ε_t ]
[ x_{t−1} ] = [ 1  0 ] [ x_{t−2} ] + [ 0   ]
    β_t          F        β_{t−1}      v_t
with v_t ~ N(0, Q), Q = [ σ²  0 ; 0  0 ].
So φ is in F, θ is in H, and σ² is in Q. Given F, Q, H, A, R and the data (the y_t's), use the Kalman filter to find the prediction error decomposition of the joint likelihood for ỹ_T = (y_1, ..., y_T), given by L(φ, θ, σ² | ỹ_T) (the exact likelihood).
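As a numerical check (my own sketch, with assumed parameter values), simulating this alternative state equation and applying H = [1 θ] reproduces the direct ARMA(1,1) recursion y_t = φ y_{t−1} + ε_t + θ ε_{t−1}:

```python
import numpy as np

phi, theta, sigma2, T = 0.6, 0.4, 1.0, 8
rng = np.random.default_rng(2)
eps = np.sqrt(sigma2) * rng.standard_normal(T)

F = np.array([[phi, 0.0], [1.0, 0.0]])   # transition matrix; phi sits in F
H = np.array([1.0, theta])               # observation vector; theta sits in H

beta = np.zeros(2)                       # beta_t = (x_t, x_{t-1})', zero initial state
y_direct = 0.0
for t in range(T):
    beta = F @ beta + np.array([eps[t], 0.0])            # state equation
    y_state = H @ beta                                   # observation equation
    y_direct = phi * y_direct + eps[t] + (theta * eps[t - 1] if t > 0 else 0.0)
    print(round(float(y_state), 6), round(y_direct, 6))  # the two series coincide
```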
Kalman Filter
Kalman filter:
- purpose: to make inference about the unobservables given the observables,
- application: signal extraction in engineering,
- in economics: we do not know the parameters in F, Q, H and want to estimate them.
State-space form
ME: Measurement (Observation) equation: y_t = H β_t + e_t, e_t ~ N(0, R).
SE: Transition (State) equation: β_t = µ + F β_{t−1} + v_t, v_t ~ N(0, Q), with E[e_t v_τ'] = 0.
Mean of β
1. β_t is a random variable: it may be unobservable, with no data for it. It is a normal random variable, as it is a sum of normal variables (v_t is normal). Conditional on Ω_{t−1},
β_t | Ω_{t−1} ~ N(E[β_t | Ω_{t−1}], var(β_t | Ω_{t−1})),
and E[β_t | Ω_{t−1}] ≡ β_{t|t−1} is the conditional expectation. We may not know the β's themselves, but if we have information about their distribution we can calculate the mean, variance, etc. Since β_{t−1} may not be observable, take expectations of it:
E[β_t | Ω_{t−1}] ≡ β_{t|t−1} = µ + F E[β_{t−1} | Ω_{t−1}] + 0,
β_{t|t−1} = µ + F β_{t−1|t−1}.
In an AR(1): E[y_t | Ω_{t−1}] = µ + φ y_{t−1}, where the last term is observable.
Variance of β
Conditional variance:
var(β_t | Ω_{t−1}) ≡ P_{t|t−1} = E[(β_t − β_{t|t−1})(β_t − β_{t|t−1})'].
Recall var(ax) = a² var(x) for a scalar a; in matrix form var(Fβ) = F var(β) F'. There are two sources of randomness (variation) in β_t:
1. v_t is a random variable,
2. β_{t−1} is also random, so there may be a difference between β_{t−1} and β_{t−1|t−1}; they need not be equal.
Hence
P_{t|t−1} = F P_{t−1|t−1} F' + Q,
i.e. the uncertainty about β_t equals the propagated uncertainty about β_{t−1}, P_{t−1|t−1}, plus the uncertainty about v_t. Note: cov(β_{t−1}, v_t) = 0.
Kalman Filter
2. y_t is a random variable, and we have data on y_t. We have a joint density of y_t and β_t and some prior; using the data we get the posterior of β_t. We want to make inference about β_t, which we do not observe. We see y_t, which is related to β_t, and we make inferences on β_t through the joint density (distribution) of the y's and β's (a Bayesian view).
Distribution of y_t
Distribution of y_t given the state-space form:
y_t | Ω_{t−1} ~ N(E[y_t | Ω_{t−1}], var(y_t | Ω_{t−1})).
Conditional mean:
E[y_t | Ω_{t−1}] ≡ y_{t|t−1} = H β_{t|t−1} + 0.
Conditional variance:
var(y_t | Ω_{t−1}) ≡ f_{t|t−1} = H P_{t|t−1} H' + R,
since we do not know β_t. Note: cov(Hβ_t, e_t) = 0 because E[v_t e_t'] = 0. If E[v_t e_t'] ≠ 0 we would add another term to var(y_t | Ω_{t−1}) capturing that.
Joint Distribution
Covariance between β_t and y_t:
cov(β_t, y_t | Ω_{t−1}) = P_{t|t−1} H',
as cov(β_t, Hβ_t + e_t) = cov(β_t, β_t)H' + cov(β_t, e_t) = P_{t|t−1} H' + 0.
Then the joint distribution of β_t and y_t is joint normal:
[ β_t ]              ( [ β_{t|t−1}   ]   [ P_{t|t−1}     P_{t|t−1} H' ] )
[ y_t ] | Ω_{t−1} ~ N( [ H β_{t|t−1} ] , [ H P_{t|t−1}   f_{t|t−1}    ] ).
Kalman Filter
Two steps of the Kalman filter: (a) prediction, (b) updating the inference on β_t given y_t.
Definition. Given β_{0|0}, P_{0|0}, the Kalman filter solves the following six equations for t = 1, ..., T.
Prediction of y_t, β_t:
(1) β_{t|t−1} = µ + F β_{t−1|t−1},
(2) P_{t|t−1} = F P_{t−1|t−1} F' + Q,
(3) forecast error: η_{t|t−1} ≡ y_t − y_{t|t−1} = y_t − H β_{t|t−1},
(4) variance of forecast error: f_{t|t−1} = H P_{t|t−1} H' + R.
Updating of β_t given y_t:
(5) β_{t|t} = β_{t|t−1} + κ_t η_{t|t−1},
(6) P_{t|t} = P_{t|t−1} − κ_t H P_{t|t−1},
where κ_t ≡ P_{t|t−1} H' f_{t|t−1}^{−1} is the Kalman gain.
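A compact implementation sketch of these six equations (my own code, not from the lecture; it assumes the simplified measurement equation y_t = H β_t + e_t used in this class, and writes the gain κ_t as K). It returns the filtered states together with the prediction errors and their variances, which are exactly the ingredients of the prediction error decomposition of the likelihood.

```python
import numpy as np

def kalman_filter(y, mu, F, H, Q, R, beta0, P0):
    """Run equations (1)-(6) for t = 1, ..., T given initial beta_{0|0}, P_{0|0}."""
    beta, P = beta0, P0
    betas, etas, fs = [], [], []
    for y_t in y:
        # Prediction
        beta_pred = mu + F @ beta                  # (1) beta_{t|t-1}
        P_pred = F @ P @ F.T + Q                   # (2) P_{t|t-1}
        eta = y_t - H @ beta_pred                  # (3) forecast error eta_{t|t-1}
        f = H @ P_pred @ H.T + R                   # (4) its variance f_{t|t-1}
        # Updating
        K = P_pred @ H.T @ np.linalg.inv(f)        # Kalman gain kappa_t
        beta = beta_pred + K @ eta                 # (5) beta_{t|t}
        P = P_pred - K @ H @ P_pred                # (6) P_{t|t}
        betas.append(beta); etas.append(eta); fs.append(f)
    return np.array(betas), np.array(etas), np.array(fs)
```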
Kalman Filter
β_{0|0} and P_{0|0} are equal to the unconditional mean and variance, and reflect prior beliefs. Equation (5) is a linear combination of the previous guess and the forecast error:
(5) β_{t|t} = β_{t|t−1} + κ_t η_{t|t−1},
(6) P_{t|t} = P_{t|t−1} − κ_t H P_{t|t−1},  κ_t ≡ P_{t|t−1} H' f_{t|t−1}^{−1} (Kalman gain).
The Kalman gain depends on the relationship between y_t and β_t, since P_{t|t−1} H' = cov(β_t, y_t) and f_{t|t−1}^{−1} is the precision of the forecast error. The bigger the variance of the forecast error, the smaller the Kalman gain and the less weight is put on the update. Equation (6) measures the conditional variance: since we observe y_t, the uncertainty declines.
Kalman Gain
The stronger the covariance between y_t and β_t, the more we update when we see a large forecast error. If the relationship is weaker, we do not put much weight on it, as the error is probably not driven by β_t. The weight also depends on the variance of the forecast error: if f_{t|t−1}^{−1} is large, we put a high weight on that observation. Once we have η_{t|t−1} and f_{t|t−1}, we can do MLE after constructing the joint likelihood from the prediction error decomposition.
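For example, under the ARMA(1,1) state-space form above, the log-likelihood can be stacked from the filter output and maximized numerically. This is an illustrative sketch only: it reuses the `kalman_filter` function from the earlier block, and the parameterization of σ², the starting values, and the prior P_{0|0} are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    # Uses kalman_filter() from the sketch after the six filter equations.
    phi, theta, sigma2 = params[0], params[1], np.exp(params[2])   # sigma^2 > 0 via exp
    F = np.array([[phi, 0.0], [1.0, 0.0]])
    H = np.array([[1.0, theta]])
    Q = np.array([[sigma2, 0.0], [0.0, 0.0]])
    R = np.zeros((1, 1))
    mu = np.zeros(2)
    beta0 = np.zeros(2)
    P0 = np.eye(2) * 10.0                       # assumed prior variance on the state
    _, etas, fs = kalman_filter(y, mu, F, H, Q, R, beta0, P0)
    eta, f = etas[:, 0], fs[:, 0, 0]
    # minus the prediction error decomposition: -lnL = 1/2 sum[ln(2 pi f_t) + eta_t^2 / f_t]
    return 0.5 * np.sum(np.log(2 * np.pi * f) + eta**2 / f)

# y is an observed series (e.g. simulated as in the ARMA(1,1) block above):
# res = minimize(neg_loglik, x0=[0.5, 0.0, 0.0], args=(y,), method="Nelder-Mead")
# phi_hat, theta_hat, sigma2_hat = res.x[0], res.x[1], np.exp(res.x[2])
```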