Class 5: Kalman Filter

Class 5: Kalman Filter
Macroeconometrics - Spring 2011
Jacek Suda, BdF and PSE
April 11, 2011

Outline
1. Prediction Error Decomposition
2. State-space Form
3. Deriving the Kalman Filter
See Kim and Nelson, Chapter 3; Hamilton, Chapter 13.

Notation
Denote:
- $\{Y_t\}$: covariance-stationary process, e.g. ARMA(p, q),
- $\Omega_t$: information available at time $t$,
- $\hat{Y}_{t+1|t}$: forecast of $Y_{t+1}$ based on $\Omega_t$. In our simple model this is the forecast of $Y_{t+1}$ given $Y_t$.

Linear Projection
Linear projection:
$$\hat{Y}_{t+1|t} = \alpha' X_t = \alpha_1 X_{1t} + \dots + \alpha_p X_{pt},$$
where $E[(Y_{t+1} - \alpha' X_t) X_{it}] = 0$, $i = 1, \dots, p$. These $p$ moment conditions ensure that the error is orthogonal to any information in $\Omega_t$: forecast errors are uncorrelated with past information. Result: the minimum-MSE linear forecast of $Y_{t+1}$ is the linear projection.
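As a quick numerical illustration of these moment conditions (a minimal sketch I added, with simulated data and made-up dimensions, not something from the lecture), the projection coefficients can be computed from sample moments and the residuals checked for orthogonality against each regressor:

```python
import numpy as np

# Minimal sketch: linear projection of Y_{t+1} on X_t via sample moments
# (simulated data; dimensions and seed are illustrative only).
rng = np.random.default_rng(42)
T, p = 500, 3
X = rng.normal(size=(T, p))                 # "information" variables X_t
alpha_true = np.array([0.5, -0.3, 0.2])
Y = X @ alpha_true + rng.normal(size=T)     # Y_{t+1} = alpha' X_t + error

# alpha = E[X_t X_t']^{-1} E[X_t Y_{t+1}], estimated with sample moments
alpha_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Orthogonality: sample analogue of E[(Y_{t+1} - alpha' X_t) X_it] for each i
resid = Y - X @ alpha_hat
print(alpha_hat)
print(X.T @ resid / T)                      # numerically close to zero
```

The last line prints the sample analogues of the $p$ moment conditions, which should be numerically close to zero.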

ARMA Models
Wold form:
$$Y_t - \mu = \psi(L)\varepsilon_t, \qquad \varepsilon_t \sim WN, \qquad \psi(L) = \sum_{j=0}^{\infty}\psi_j L^j, \quad \psi_0 = 1, \quad \sum_{j=0}^{\infty}\psi_j^2 < \infty.$$
Then
$$Y_{t+s} = \mu + \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \dots + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \dots,$$
$$\hat{Y}_{t+s|t} = \mu + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \dots,$$
where the last line uses only information up to time $t$ and $E_t[\varepsilon_{t+i}] = 0$ for $i > 0$.

ARMA Models
$$MSE(\hat{Y}_{t+s|t}, Y_{t+s}) = E\big[(\varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \dots + \psi_{s-1}\varepsilon_{t+1})^2\big] = \sigma^2(1 + \psi_1^2 + \psi_2^2 + \dots + \psi_{s-1}^2) < \mathrm{var}(Y_{t+s}),$$
so we are better off with the linear projection than with the unconditional mean (whose MSE is the unconditional variance). But
$$\lim_{s\to\infty}\sigma^2\sum_{k=0}^{s}\psi_k^2 = \mathrm{var}(Y_t),$$
so the upper limit for forecast uncertainty is as high as the unconditional variance.
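To make the limit concrete, here is a minimal sketch (my own illustration, assuming an AR(1) so that $\psi_j = \phi^j$): the $s$-step forecast MSE $\sigma^2\sum_{k=0}^{s-1}\psi_k^2$ rises toward the unconditional variance $\sigma^2/(1-\phi^2)$ as $s$ grows.

```python
import numpy as np

# Minimal sketch: s-step forecast MSE for an AR(1), where psi_j = phi**j.
phi, sigma2 = 0.9, 1.0
uncond_var = sigma2 / (1 - phi**2)       # var(Y_t) for an AR(1)

for s in (1, 5, 20, 100):
    psi = phi ** np.arange(s)            # psi_0, ..., psi_{s-1}
    mse = sigma2 * np.sum(psi**2)        # sigma^2 * (1 + psi_1^2 + ... + psi_{s-1}^2)
    print(s, mse, uncond_var)            # MSE approaches the unconditional variance
```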

Kalman Filter
Forecasts based on the Wold form assume an infinite number of observations, which we do not have in reality. The Kalman filter:
- calculates linear projections for a finite number of observations (exact finite-sample forecasts),
- allows for exact MLE of ARMA models based on the Prediction Error Decomposition.

Normal Distribution
Joint normality:
$$\tilde{y}_T = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix} \sim N(\mu_{T\times 1}, \Omega_{T\times T}).$$
Since the process is covariance stationary, each $Y_t$ has the same mean and variance, $\omega_{11} = \omega_{22} = \dots = \omega_{TT} = \gamma_0$ (the common variance), and $\omega_{ij} = \gamma_{|i-j|}$:
$$\Omega = \begin{bmatrix} \omega_{11} & \omega_{12} & \dots & \omega_{1T} \\ \omega_{21} & \omega_{22} & & \vdots \\ \vdots & & \ddots & \\ \omega_{T1} & \dots & & \omega_{TT} \end{bmatrix} = \begin{bmatrix} \gamma_0 & \gamma_1 & \dots & \gamma_{T-1} \\ \gamma_1 & \gamma_0 & & \vdots \\ \vdots & & \ddots & \\ \gamma_{T-1} & \dots & & \gamma_0 \end{bmatrix}.$$
The likelihood function:
$$L(\theta \mid \tilde{y}_T) = (2\pi)^{-T/2}\det(\Omega)^{-1/2} e^{-\frac{1}{2}(\tilde{y}_T - \mu)'\Omega^{-1}(\tilde{y}_T - \mu)}.$$

Factorization
For large $T$, $\Omega$ may be large and difficult to invert. Since $\Omega$ is a positive definite symmetric matrix, there exists a unique triangular factorization of $\Omega$,
$$\Omega = A f A',$$
where $f$ ($T\times T$) is a diagonal matrix with $f_t > 0$ for all $t$, and $A$ ($T\times T$) is lower triangular with 1s on the diagonal:
$$f = \begin{bmatrix} f_1 & 0 & \dots & 0 \\ 0 & f_2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \dots & & f_T \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 0 & \dots & 0 \\ a_{21} & 1 & & \vdots \\ \vdots & & \ddots & \\ a_{T1} & a_{T2} & \dots & 1 \end{bmatrix}.$$

Likelihood
The likelihood function can be rewritten as:
$$L(\theta \mid \tilde{y}_T) = (2\pi)^{-T/2}\det(AfA')^{-1/2} e^{-\frac{1}{2}(\tilde{y}_T - \mu)'(AfA')^{-1}(\tilde{y}_T - \mu)}.$$
Define $\eta = A^{-1}(\tilde{y}_T - \mu)$ (the prediction error), so that $A\eta = \tilde{y}_T - \mu$. Since $A$ is a lower-triangular matrix with 1s along the principal diagonal,
$$\eta_1 = y_1 - \mu,$$
$$\eta_2 = y_2 - \mu - a_{21}\eta_1,$$
$$\eta_3 = y_3 - \mu - a_{31}\eta_1 - a_{32}\eta_2,$$
$$\vdots$$
$$\eta_T = y_T - \mu - \sum_{i=1}^{T-1} a_{Ti}\eta_i.$$

Likelihood
Also, since $A$ is lower triangular with 1s along the principal diagonal, $\det(A) = 1$. Then $\det(AfA') = \det(A)\det(f)\det(A') = \det(f)$, and
$$L(\theta \mid \tilde{y}_T) = (2\pi)^{-T/2}\det(f)^{-1/2} e^{-\frac{1}{2}\eta' f^{-1}\eta} = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi f_t}}\, e^{-\frac{\eta_t^2}{2 f_t}},$$
where $\eta_t$ is the $t$-th element of $\eta$ ($T\times 1$), i.e. the prediction error $y_t - \hat{y}_{t|t-1}$, with
$$\hat{y}_{t|t-1} = \mu - \sum_{i=1}^{t-1} a_{t,i}(y_i - \mu), \qquad t = 2, 3, \dots, T,$$
where $a_{t,i}$ is the $(t,i)$-th element of $A^{-1}$.
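As a numerical check of this factorization argument, the following sketch (my own, assuming a zero-mean AR(1) so that $\Omega$ is a known Toeplitz matrix) builds $\Omega = AfA'$ from a Cholesky factor, forms the prediction errors $\eta = A^{-1}\tilde{y}_T$, and verifies that the decomposed log-likelihood matches the direct multivariate-normal log-likelihood:

```python
import numpy as np

# Minimal sketch: prediction error decomposition for a zero-mean AR(1)
# (parameter values and simulation are illustrative only).
phi, sigma2, T = 0.7, 1.0, 50
rng = np.random.default_rng(0)

# Autocovariances of a zero-mean AR(1): gamma_k = sigma2 * phi**k / (1 - phi**2)
gamma = sigma2 * phi ** np.arange(T) / (1.0 - phi**2)
Omega = gamma[np.abs(np.subtract.outer(np.arange(T), np.arange(T)))]

# Simulate one path y_1..y_T from N(0, Omega)
y = rng.multivariate_normal(np.zeros(T), Omega)

# Triangular factorization Omega = A f A' from the Cholesky factor
L = np.linalg.cholesky(Omega)        # Omega = L L'
d = np.diag(L)
A = L / d                            # unit lower triangular
f = d**2                             # diagonal elements of f

# Prediction errors eta = A^{-1} y and the decomposed log-likelihood
eta = np.linalg.solve(A, y)
loglik_pe = -0.5 * np.sum(np.log(2 * np.pi * f) + eta**2 / f)

# Direct multivariate-normal log-likelihood for comparison
sign, logdet = np.linalg.slogdet(Omega)
loglik_mvn = -0.5 * (T * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(Omega, y))

print(loglik_pe, loglik_mvn)         # the two should agree
```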

Kalman Filter
Note: given $\tilde{y}_T \sim N(\mu, \Omega)$, we have $\eta_t \mid \Omega_{t-1} \sim N(0, f_t)$, where $f_t$ is the $(t,t)$ diagonal element of the $f$ matrix, and
$$\ln L = -\frac{1}{2}\sum_{t=1}^{T}\ln(2\pi f_t) - \frac{1}{2}\sum_{t=1}^{T}\frac{\eta_t^2}{f_t},$$
since the $\eta_t$ are normal and independent of each other. The Kalman filter recursively calculates the linear projection of $y_t$ on past information $\Omega_{t-1}$ for any model that can be cast in state-space form: for any such structure it solves for the linear prediction.

Measurement (Observation) Equation
The state-space form is a general form that encompasses a wide variety of models.
1. Measurement (Observation) Equation: represents the static relationship between observed variables (data) and unobserved state variables,
$$y_t = H_t\beta_t + A z_t + e_t,$$
where $y_t$ denotes observed data, $\beta_t$ is a state vector that captures the dynamics, $z_t$ collects exogenous observed variables (for example, lagged values of $y_t$, but also other data), and $e_t$ is an error term, $e_t \sim N(0, R)$. The presence of the unobserved state vector makes this more than a simple linear regression model.

Transition (State) Equation
2. Transition (State) Equation: captures the dynamics in the system, i.e. what propagates the system forward over time,
$$\beta_t = \mu + F\beta_{t-1} + v_t,$$
where $\mu$ is a vector of constants, $F$ is the transition matrix, and $v_t$ is an error vector, $v_t \sim N(0, Q)$. It is like an AR(1), but in vector/matrix form.

Transition (State) Equation
$$\beta_t = \mu + F\beta_{t-1} + v_t.$$
The state vector has an AR(1)-type representation; the equation describes the evolution of the state vector. The state vector can be unobservable, and the transition equation can then be used to get information about the unobservable states, conditioning on the observable data (a Bayesian view).

Error Terms
Error terms:
$$e_t \sim N(0, R), \qquad v_t \sim N(0, Q),$$
where $R$, $Q$ are variance-covariance matrices and $E[e_t v_\tau'] = 0$ for all $t, \tau$. This is a restrictive assumption, but the model can be represented in a way that is not very restrictive: even with $E[e_t v_\tau'] \neq 0$ we can estimate the model with a (modified) Kalman filter, though it becomes more complicated. The normality assumption may not always be appropriate, but it allows us to use MLE.

Examples: AR(p)
The state-space form applies to a very wide variety of time-series models. Consider an AR(p) process
$$y_t - \mu = \phi_1(y_{t-1} - \mu) + \dots + \phi_p(y_{t-p} - \mu) + \varepsilon_t, \qquad E(\varepsilon_\tau\varepsilon_t) = \sigma^2 \text{ for } t = \tau, \ 0 \text{ otherwise}.$$
State equation:
$$\begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_{p-1} & \phi_p \\ 1 & 0 & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} - \mu \\ y_{t-2} - \mu \\ \vdots \\ y_{t-p} - \mu \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
Observation equation:
$$y_t = \mu + \begin{bmatrix} 1 & 0 & \dots & 0 \end{bmatrix} \begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix}.$$
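A small sketch (my own, not from the slides) of how the companion-form matrices $F$, $H$, $Q$ for an AR(p) would be assembled for given coefficients:

```python
import numpy as np

def ar_p_state_space(phi, sigma2):
    """Companion-form state-space matrices for an AR(p) in deviations from the mean.
    phi: AR coefficients (phi_1, ..., phi_p); sigma2: innovation variance."""
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi                  # first row: phi_1, ..., phi_p
    F[1:, :-1] = np.eye(p - 1)     # identity block shifts the lags down
    Q = np.zeros((p, p))
    Q[0, 0] = sigma2               # only the first state receives the shock eps_t
    H = np.zeros(p)
    H[0] = 1.0                     # y_t - mu is the first element of the state
    return F, H, Q

# Illustrative coefficients only
F, H, Q = ar_p_state_space(np.array([0.5, 0.3, -0.1]), sigma2=1.0)
print(F, H, Q, sep="\n")
```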

Examples: ARMA(1,1)
ARMA(1,1): set $\mu = 0$,
$$y_t = \phi y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}, \qquad \varepsilon_t \sim N(0, \sigma^2).$$
There may be more than one way to represent a model in state-space form, and there may be differences in (computational) efficiency between the different representations.

Examples: ARMA(1,1)
State equation: the general form is $\beta_t = F\beta_{t-1} + v_t$. Put
$$\beta_t = \begin{bmatrix} y_t \\ \varepsilon_t \end{bmatrix}, \qquad \beta_{t-1} = \begin{bmatrix} y_{t-1} \\ \varepsilon_{t-1} \end{bmatrix},$$
and write $y_t = \phi y_{t-1} + \theta\varepsilon_{t-1} + \varepsilon_t$ in matrix notation:
$$\underbrace{\begin{bmatrix} y_t \\ \varepsilon_t \end{bmatrix}}_{\beta_t} = \underbrace{\begin{bmatrix} \phi & \theta \\ 0 & 0 \end{bmatrix}}_{F} \underbrace{\begin{bmatrix} y_{t-1} \\ \varepsilon_{t-1} \end{bmatrix}}_{\beta_{t-1}} + \underbrace{\begin{bmatrix} \varepsilon_t \\ \varepsilon_t \end{bmatrix}}_{v_t},$$
with $v_t \sim N(0, Q)$, $Q = \begin{bmatrix} \sigma^2 & \sigma^2 \\ \sigma^2 & \sigma^2 \end{bmatrix}$. Here $y_t$ is observable and $\varepsilon_t$ (the forecast error) is unobservable.

Examples: ARMA(1,1)
Observation equation:
$$y_t = \underbrace{\begin{bmatrix} 1 & 0 \end{bmatrix}}_{H} \underbrace{\begin{bmatrix} y_t \\ \varepsilon_t \end{bmatrix}}_{\beta_t},$$
with no exogenous variables ($A = 0$) and $R = 0$, so $y_t = H\beta_t$ for this case (ARMA(1,1)). The parameters $\phi, \theta, \sigma^2$ are captured in the $F$ and $Q$ matrices; the Kalman filter will be used to estimate them. For the Kalman filter, what goes into $\beta_t$ does not matter: only the parameters $F, Q, H, R$ matter. The state vector is implicitly defined by $F, Q, H$ and the observations.
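For concreteness, a short sketch (my own, with illustrative parameter values) writing out the matrices of this first ARMA(1,1) representation:

```python
import numpy as np

# First ARMA(1,1) state-space representation: beta_t = (y_t, eps_t)'.
phi, theta, sigma2 = 0.7, 0.4, 1.0          # illustrative values only
F = np.array([[phi, theta],
              [0.0, 0.0]])
Q = sigma2 * np.array([[1.0, 1.0],          # v_t = (eps_t, eps_t)', so every entry is sigma^2
                       [1.0, 1.0]])
H = np.array([1.0, 0.0])                    # y_t = [1 0] beta_t
R = 0.0                                     # no measurement error; A = 0 (no exogenous variables)
print(F, Q, H, R, sep="\n")
```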

ARMA(1,1): Alternative Representation
A more elegant (i.e. easier for computation) representation. In lag notation (alternative representation for ARMA(1,1)):
$$(1 - \phi L)y_t = (1 + \theta L)\varepsilon_t \quad\Rightarrow\quad y_t = (1 - \phi L)^{-1}(1 + \theta L)\varepsilon_t = (1 + \theta L)(1 - \phi L)^{-1}\varepsilon_t.$$
Define $x_t = (1 - \phi L)^{-1}\varepsilon_t$, i.e. $(1 - \phi L)x_t = \varepsilon_t$, so $x_t - \phi x_{t-1} = \varepsilon_t$ ($x_t$ is AR(1), not observed). Then
$$y_t = (1 + \theta L)x_t = x_t + \theta x_{t-1}.$$
So $y_t$ is a linear combination of the unobservable AR(1) process $x_t$ and its lag $x_{t-1}$.

ARMA(1,1): State-Space
Observation equation (all randomness is in the state equation):
$$y_t = H\beta_t, \qquad y_t = \underbrace{\begin{bmatrix} 1 & \theta \end{bmatrix}}_{H} \begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix}.$$
Inside $H$ there is now a parameter to be estimated. $A = 0$ (no exogenous variables) and $R = 0$, since the observation equation is just an identity (no randomness from $e_t$).

ARMA(1,1): State-Space
State equation:
$$\underbrace{\begin{bmatrix} x_t \\ x_{t-1} \end{bmatrix}}_{\beta_t} = \underbrace{\begin{bmatrix} \phi & 0 \\ 1 & 0 \end{bmatrix}}_{F} \underbrace{\begin{bmatrix} x_{t-1} \\ x_{t-2} \end{bmatrix}}_{\beta_{t-1}} + \underbrace{\begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}}_{v_t}, \qquad v_t \sim N(0, Q), \quad Q = \begin{bmatrix} \sigma^2 & 0 \\ 0 & 0 \end{bmatrix}.$$
So $\phi$ is in $F$, $\theta$ in $H$, and $\sigma^2$ in $Q$. Given $F, Q, H, A, R$ and data (the $y_t$'s), use the Kalman filter to find the prediction error decomposition of the joint likelihood for $\tilde{y}_T = (y_1, \dots, y_T)$, given by $L(\phi, \theta, \sigma^2 \mid \tilde{y}_T)$ (the exact likelihood).
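To check that the two ARMA(1,1) representations describe the same observable process, the sketch below (my own, using the same illustrative parameter values as before) builds both sets of matrices and verifies that they imply the same unconditional variance of $y_t$, obtained by iterating the state-covariance equation $P = FPF' + Q$:

```python
import numpy as np

def stationary_state_cov(F, Q, iters=2000):
    """Solve P = F P F' + Q by iteration (stationary covariance of the state)."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = F @ P @ F.T + Q
    return P

phi, theta, sigma2 = 0.7, 0.4, 1.0           # illustrative values only

# Representation 1: beta_t = (y_t, eps_t)'
F1 = np.array([[phi, theta], [0.0, 0.0]])
Q1 = sigma2 * np.ones((2, 2))
H1 = np.array([1.0, 0.0])

# Representation 2: beta_t = (x_t, x_{t-1})', y_t = x_t + theta * x_{t-1}
F2 = np.array([[phi, 0.0], [1.0, 0.0]])
Q2 = np.array([[sigma2, 0.0], [0.0, 0.0]])
H2 = np.array([1.0, theta])

var_y_1 = H1 @ stationary_state_cov(F1, Q1) @ H1
var_y_2 = H2 @ stationary_state_cov(F2, Q2) @ H2
var_y_exact = sigma2 * (1 + 2 * phi * theta + theta**2) / (1 - phi**2)
print(var_y_1, var_y_2, var_y_exact)         # all three should agree
```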

Kalman Filter
Kalman filter:
- purpose: to make inference about the unobservable given the observable,
- application: signal extraction in engineering; in economics we do not know the parameters $F, Q, H$ and want to estimate them.
State-space form:
- Measurement (observation) equation: $y_t = H\beta_t + e_t$, $e_t \sim N(0, R)$,
- Transition (state) equation: $\beta_t = \mu + F\beta_{t-1} + v_t$, $v_t \sim N(0, Q)$, with $E[e_t v_\tau'] = 0$.

Mean of β
1. $\beta_t$ is a random variable: it may be unobservable, with no data for it; it is a normal random variable, since it is a sum of normal variables ($v_t$ is normal).
Conditional mean:
$$\beta_t \mid \Omega_{t-1} \sim N\big(E[\beta_t \mid \Omega_{t-1}], \mathrm{var}(\beta_t \mid \Omega_{t-1})\big), \qquad E[\beta_t \mid \Omega_{t-1}] \equiv \beta_{t|t-1}.$$
We may not know what the $\beta$'s are, but if we have information about their distribution we can calculate the mean, variance, etc. Since $\beta_{t-1}$ may not be observable, take expectations of it:
$$\beta_{t|t-1} = E[\beta_t \mid \Omega_{t-1}] = \mu + F E[\beta_{t-1} \mid \Omega_{t-1}] + 0 = \mu + F\beta_{t-1|t-1}.$$
In an AR(1): $E[y_t \mid \Omega_{t-1}] = \mu + \phi E[y_{t-1} \mid \Omega_{t-1}]$, where the last term is observable.

Variance of β
Conditional variance:
$$\mathrm{Var}(\beta_t \mid \Omega_{t-1}) \equiv P_{t|t-1} = E[(\beta_t - \beta_{t|t-1})(\beta_t - \beta_{t|t-1})'].$$
Recall $\mathrm{var}(ax) = a^2\mathrm{var}(x)$ for scalar $a$; for a random vector, $\mathrm{var}(Ax) = A\,\mathrm{var}(x)\,A'$. There are two sources of randomness (variation) for $\beta_t$:
1. $v_t$ is a random variable,
2. $\beta_{t-1}$ is also random, so there may be a difference between $\beta_{t-1}$ and $\beta_{t-1|t-1}$; they need not be equal to each other.
Hence
$$P_{t|t-1} = F P_{t-1|t-1} F' + Q,$$
where $P_{t|t-1}$, the uncertainty about $\beta_t$, equals the sum of the (propagated) uncertainty about $\beta_{t-1}$, $P_{t-1|t-1}$, and the uncertainty about $v_t$. Note: $\mathrm{cov}(\beta_{t-1}, v_t) = 0$.

Kalman Filter
2. $y_t$ is a random variable, and now we have data on $y_t$. We have a joint density of $y_t$ and $\beta_t$ and some prior; using the data we get the posterior of $\beta_t$. We want to make inference about $\beta_t$, which we do not observe; we see $y_t$, which is related to $\beta_t$. We make inferences on $\beta_t$ via the joint density (distribution) of the $y$'s and $\beta$'s (a Bayesian view).

Distribution of y_t
Distribution of $y_t$ given the state-space form:
$$y_t \mid \Omega_{t-1} \sim N\big(E[y_t \mid \Omega_{t-1}], \mathrm{var}(y_t \mid \Omega_{t-1})\big).$$
Conditional mean:
$$E[y_t \mid \Omega_{t-1}] \equiv y_{t|t-1} = H\beta_{t|t-1} + 0.$$
Conditional variance:
$$\mathrm{var}(y_t \mid \Omega_{t-1}) \equiv f_{t|t-1} = H P_{t|t-1} H' + R,$$
since we do not know $\beta_t$. Note: $\mathrm{cov}(H\beta_t, e_t) = 0$ because $E[e_t v_\tau'] = 0$. If $E[v_t e_t'] \neq 0$, we would add another term to $\mathrm{var}(y_t \mid \Omega_{t-1})$ capturing that.

Joint Distribution
Covariance between $\beta_t$ and $y_t$:
$$\mathrm{cov}(\beta_t, y_t \mid \Omega_{t-1}) = P_{t|t-1}H',$$
since $\mathrm{cov}(\beta_t, H\beta_t + e_t) = \mathrm{cov}(\beta_t, \beta_t)H' + \mathrm{cov}(\beta_t, e_t) = P_{t|t-1}H' + 0$. Then the joint distribution of $\beta_t$ and $y_t$ is normal:
$$\begin{bmatrix} \beta_t \\ y_t \end{bmatrix} \Big| \Omega_{t-1} \sim N\left( \begin{bmatrix} \beta_{t|t-1} \\ H\beta_{t|t-1} \end{bmatrix}, \begin{bmatrix} P_{t|t-1} & P_{t|t-1}H' \\ H P_{t|t-1} & f_{t|t-1} \end{bmatrix} \right).$$

Kalman Filter
Two steps of the Kalman filter: (a) prediction, (b) given $y_t$, updating the inference on $\beta_t$.
Definition. Given $\beta_{0|0}$, $P_{0|0}$, the Kalman filter solves the following six equations for $t = 1, \dots, T$.
Prediction of $y_t$, $\beta_t$:
$$(1)\quad \beta_{t|t-1} = \mu + F\beta_{t-1|t-1},$$
$$(2)\quad P_{t|t-1} = F P_{t-1|t-1} F' + Q.$$
Forecast error and its variance:
$$(3)\quad \eta_{t|t-1} \equiv y_t - y_{t|t-1} = y_t - H\beta_{t|t-1},$$
$$(4)\quad f_{t|t-1} = H P_{t|t-1} H' + R.$$
Updating of the inference on $\beta_t$:
$$(5)\quad \beta_{t|t} = \beta_{t|t-1} + \kappa_t\eta_{t|t-1},$$
$$(6)\quad P_{t|t} = P_{t|t-1} - \kappa_t H P_{t|t-1},$$
where $\kappa_t \equiv P_{t|t-1}H' f_{t|t-1}^{-1}$ is the Kalman gain.

Kalman Filter
$\beta_{0|0}$ and $P_{0|0}$ are set equal to the unconditional mean and variance, and reflect prior beliefs. Equation (5) is a linear combination of the previous guess and the forecast error:
$$(5)\quad \beta_{t|t} = \beta_{t|t-1} + \kappa_t\eta_{t|t-1}, \qquad (6)\quad P_{t|t} = P_{t|t-1} - \kappa_t H P_{t|t-1}, \qquad \kappa_t \equiv P_{t|t-1}H' f_{t|t-1}^{-1} \ \text{(Kalman gain)}.$$
The Kalman gain depends on the relationship between $y_t$ and $\beta_t$, since $P_{t|t-1}H' = \mathrm{cov}(\beta_t, y_t)$ and $f_{t|t-1}^{-1}$ is the precision of the forecast error: the bigger the variance of the forecast error, the smaller the Kalman gain and the less weight is put on updating. Equation (6) gives the conditional variance; since we observe $y_t$, the uncertainty declines.
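The six equations translate directly into code. The sketch below (my own implementation, assuming the time-invariant form $y_t = H\beta_t + e_t$, $\beta_t = \mu + F\beta_{t-1} + v_t$ with a scalar observation, as in the ARMA(1,1) example) runs the recursion and accumulates the prediction-error-decomposition log-likelihood from $\eta_{t|t-1}$ and $f_{t|t-1}$; maximizing that log-likelihood over the parameters is the exact MLE referred to above.

```python
import numpy as np

def kalman_filter(y, mu, F, H, Q, R, beta0, P0):
    """Kalman filter for y_t = H beta_t + e_t, beta_t = mu + F beta_{t-1} + v_t.
    y: (T,) scalar observations; returns filtered states and the log-likelihood
    from the prediction error decomposition."""
    T, k = len(y), len(beta0)
    H = np.atleast_2d(H)                        # 1 x k
    beta_tt, P_tt = beta0.copy(), P0.copy()
    beta_filtered = np.zeros((T, k))
    loglik = 0.0
    for t in range(T):
        # (1)-(2) Prediction of beta_t
        beta_pred = mu + F @ beta_tt
        P_pred = F @ P_tt @ F.T + Q
        # (3)-(4) Forecast error and its variance
        eta = y[t] - (H @ beta_pred)[0]
        f = (H @ P_pred @ H.T)[0, 0] + R
        # (5)-(6) Updating, with Kalman gain kappa_t = P_pred H' f^{-1}
        kappa = (P_pred @ H.T / f).ravel()
        beta_tt = beta_pred + kappa * eta
        P_tt = P_pred - np.outer(kappa, H @ P_pred)
        beta_filtered[t] = beta_tt
        # Prediction error decomposition of the log-likelihood
        loglik += -0.5 * (np.log(2 * np.pi * f) + eta**2 / f)
    return beta_filtered, loglik

# Example: ARMA(1,1) in the alternative representation (phi in F, theta in H, sigma^2 in Q);
# parameter values, simulation, and the rough prior P0 are illustrative only.
phi, theta, sigma2 = 0.7, 0.4, 1.0
rng = np.random.default_rng(1)
eps = rng.normal(scale=np.sqrt(sigma2), size=201)
y = np.zeros(201)
for t in range(1, 201):
    y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]

F = np.array([[phi, 0.0], [1.0, 0.0]])
Q = np.array([[sigma2, 0.0], [0.0, 0.0]])
H = np.array([1.0, theta])
beta0, P0 = np.zeros(2), np.eye(2)              # the unconditional moments could be used instead
_, ll = kalman_filter(y[1:], np.zeros(2), F, H, Q, R=0.0, beta0=beta0, P0=P0)
print(ll)
```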

Kalman Gain
The stronger the covariance between $y_t$ and $\beta_t$, the more we update when we see a large forecast error. If the relationship is weaker, we do not put much weight on the error, as it is probably not driven by $\beta_t$. The weight also depends on the variance of the forecast error: if the precision $f_{t|t-1}^{-1}$ is large, we put a high weight on that observation. Once we have $\eta_{t|t-1}$ and $f_{t|t-1}$, we can do MLE after constructing the joint likelihood via the prediction error decomposition.