Time Series Modelling and Kalman Filters
Chris Williams, School of Informatics, University of Edinburgh
November 2010
Outline
- Stochastic processes
- AR, MA and ARMA models
- The Fourier view
- Parameter estimation for ARMA models
- Linear-Gaussian HMMs (Kalman filtering)
Reading: Handout on Time Series Modelling: AR, MA, ARMA and All That
Reading: Bishop 13.3 (but not 13.3.2, 13.3.3)
Example time series
- FTSE 100
- Meteorology: temperature, pressure, ...
- Seismology
- Electrocardiogram (ECG)
- ...
Stochastic Processes
- A stochastic process is a family of random variables X(t), t ∈ T, indexed by a parameter t in an index set T
- We will consider discrete-time stochastic processes where T = Z (the integers)
- A time series is said to be strictly stationary if the joint distribution of X(t_1), ..., X(t_n) is the same as the joint distribution of X(t_1 + τ), ..., X(t_n + τ) for all t_1, ..., t_n, τ
- A time series is said to be weakly stationary if its mean is constant and its autocovariance function depends only on the lag, i.e.
    E[X(t)] = µ
    Cov[X(t), X(t + τ)] = γ(τ)
- A Gaussian process is a family of random variables, any finite number of which have a joint Gaussian distribution
- The ARMA models we will study are stationary Gaussian processes
Autoregressive (AR) Models
Example: the AR(1) process
    x_t = α x_{t-1} + w_t,  where w_t ~ N(0, σ^2)
By repeated substitution we get
    x_t = w_t + α w_{t-1} + α^2 w_{t-2} + ...
Hence E[X(t)] = 0, and if |α| < 1 the process is stationary with
    Var[X(t)] = (1 + α^2 + α^4 + ...) σ^2 = σ^2 / (1 - α^2)
Similarly
    Cov[X(t), X(t-k)] = α^k Var[X(t-k)] = α^k σ^2 / (1 - α^2)
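These moments can be checked by simulation. A minimal sketch (the AR(1) recursion is from the slide; the parameter values and simulation length are illustrative):

```python
import random

def simulate_ar1(alpha, sigma, n, seed=0):
    """Simulate x_t = alpha * x_{t-1} + w_t with w_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = alpha * x + rng.gauss(0.0, sigma)
        xs.append(x)
    return xs

xs = simulate_ar1(alpha=0.5, sigma=1.0, n=200_000)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
# Theory: E[X(t)] = 0 and Var[X(t)] = sigma^2 / (1 - alpha^2) = 1/0.75
print(mean, var)
```

The sample variance should come out close to 1/0.75 ≈ 1.33, matching the geometric-series formula above.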
[Figures: sample paths of AR(1) processes with α = 0.5]
The general case is an AR(p) process
    x_t = Σ_{i=1}^p α_i x_{t-i} + w_t
Notice how x_t is obtained by a (linear) regression from x_{t-1}, ..., x_{t-p}, hence an autoregressive process
Introduce the backward shift operator B, so that B x_t = x_{t-1}, B^2 x_t = x_{t-2}, etc.
Then the AR(p) model can be written as
    φ(B) x_t = w_t,  where φ(B) = 1 - α_1 B - ... - α_p B^p
The condition for stationarity is that all the roots of φ(B) lie outside the unit circle
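The root condition is easy to check numerically. A sketch (using numpy; the coefficient values are illustrative):

```python
import numpy as np

def ar_is_stationary(alphas):
    """Check whether all roots of phi(B) = 1 - alpha_1 B - ... - alpha_p B^p
    lie outside the unit circle."""
    # np.roots wants coefficients ordered from the highest power down:
    # -alpha_p B^p - ... - alpha_1 B + 1
    coeffs = [-a for a in reversed(alphas)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.5]))       # AR(1), phi(B) = 1 - 0.5B, root B = 2
print(ar_is_stationary([0.2, 0.1]))  # a stationary AR(2)
print(ar_is_stationary([1.5]))       # root B = 2/3 inside unit circle
```

For the AR(1) case with |α| < 1 the single root is B = 1/α, which lies outside the unit circle, agreeing with the stationarity condition on the previous slide.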
Yule-Walker Equations
Multiply the AR(p) equation through by x_{t-k}:
    x_t = Σ_{i=1}^p α_i x_{t-i} + w_t
    x_t x_{t-k} = Σ_{i=1}^p α_i x_{t-i} x_{t-k} + w_t x_{t-k}
Taking expectations (and exploiting stationarity) we obtain
    γ_k = Σ_{i=1}^p α_i γ_{k-i},  k = 1, 2, ...
Use p simultaneous equations to obtain the γ's from the α's. For inference, we can solve a linear system to obtain the α's given estimates of the γ's
Example: AR(1) process, γ_k = α^k γ_0
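The inference direction can be sketched in code: estimate the autocovariances from data, then solve the Yule-Walker system for the α's. (The AR(2) coefficients and simulation settings below are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process with known coefficients
a1, a2 = 0.2, 0.1
n = 100_000
x = np.zeros(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.standard_normal()

def gamma(k):
    """Sample autocovariance at lag k (the process is zero-mean)."""
    return float(np.mean(x[k:] * x[:n - k])) if k > 0 else float(np.mean(x * x))

# Yule-Walker for p = 2 (using gamma_{-j} = gamma_j by stationarity):
#   gamma_1 = alpha_1 gamma_0 + alpha_2 gamma_1
#   gamma_2 = alpha_1 gamma_1 + alpha_2 gamma_0
G = np.array([[gamma(0), gamma(1)],
              [gamma(1), gamma(0)]])
g = np.array([gamma(1), gamma(2)])
alpha_hat = np.linalg.solve(G, g)
print(alpha_hat)  # close to the true (0.2, 0.1)
```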
Graphical model illustrating an AR(2) process
[Figure: chain graphical model, each x_t depending on x_{t-1} and x_{t-2}]
[Figures: sample paths, AR(2) with α_1 = 0.2, α_2 = 0.1 and AR(2) with α_1 = 1.0, α_2 = 0.5]
Vector AR processes
    x_t = Σ_{i=1}^p A_i x_{t-i} + G w_t
where the A_i's and G are square matrices
We can in general consider modelling multivariate (as opposed to univariate) time series
An AR(2) process can be written as a vector AR(1) process:
    ( x_t     )   ( α_1  α_2 ) ( x_{t-1} )   ( 1  0 ) ( w_t     )
    ( x_{t-1} ) = ( 1    0   ) ( x_{t-2} ) + ( 0  0 ) ( w_{t-1} )
In general an AR(p) process can be written as a vector AR(1) process with a p-dimensional state vector (cf. ODEs)
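The companion-matrix construction can be verified directly: iterating the vector AR(1) recursion with state (x_t, x_{t-1}) reproduces the scalar AR(2) recursion. A sketch with illustrative coefficients:

```python
import numpy as np

a1, a2 = 0.2, 0.1
A = np.array([[a1,  a2],
              [1.0, 0.0]])   # companion matrix
G = np.array([[1.0, 0.0],
              [0.0, 0.0]])

rng = np.random.default_rng(1)
n = 50
w = rng.standard_normal(n)

# Scalar AR(2), x_0 = x_1 = 0
x = np.zeros(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + w[t]

# Vector AR(1), state s = (x_{t-1}, x_{t-2}) before each update
s = np.zeros(2)
xv = np.zeros(n)
for t in range(2, n):
    s = A @ s + G @ np.array([w[t], w[t - 1]])
    xv[t] = s[0]   # first state component is x_t

print(np.allclose(x, xv))  # the two recursions agree
```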
Moving Average (MA) processes
    x_t = Σ_{j=0}^q β_j w_{t-j}   (a linear filter)
        = θ(B) w_t
with scaling so that β_0 = 1, and θ(B) = 1 + Σ_{j=1}^q β_j B^j
Example: MA(1) process
[Figure: graphical model with noise variables w_s and observations x_s]
We have E[X(t)] = 0, and
    Var[X(t)] = (1 + β_1^2 + ... + β_q^2) σ^2
    Cov[X(t), X(t-k)] = E[ (Σ_{j=0}^q β_j w_{t-j}) (Σ_{i=0}^q β_i w_{t-k-i}) ]
                      = σ^2 Σ_{j=0}^{q-k} β_{j+k} β_j   for k = 0, 1, ..., q
                      = 0                               for k > q
Note that the covariance cuts off for k > q
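The formula and its cut-off are simple to compute. A sketch (the MA(1) coefficient is illustrative):

```python
def ma_autocov(betas, sigma2, k):
    """Autocovariance at lag k of x_t = sum_j beta_j w_{t-j},
    where betas = [beta_0, ..., beta_q] with beta_0 = 1."""
    q = len(betas) - 1
    if k > q:
        return 0.0   # covariance cuts off beyond lag q
    return sigma2 * sum(betas[j + k] * betas[j] for j in range(q - k + 1))

betas = [1.0, 0.5]                 # MA(1) with beta_1 = 0.5
print(ma_autocov(betas, 1.0, 0))   # 1 + 0.5^2 = 1.25
print(ma_autocov(betas, 1.0, 1))   # beta_1 * beta_0 = 0.5
print(ma_autocov(betas, 1.0, 2))   # 0.0
```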
ARMA(p, q) processes
    x_t = Σ_{i=1}^p α_i x_{t-i} + Σ_{j=0}^q β_j w_{t-j}
    φ(B) x_t = θ(B) w_t
Writing an AR(p) process as an MA(∞) process:
    φ(B) x_t = w_t
    x_t = (1 - α_1 B - ... - α_p B^p)^{-1} w_t = (1 + β_1 B + β_2 B^2 + ...) w_t
Similarly an MA(q) process can be written as an AR(∞) process
The utility of ARMA(p, q) is potential parsimony
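For AR(1) the inversion gives β_j = α^j, which can be checked numerically: starting from x_0 = 0, the AR recursion and the (finite, since only finitely many w's have entered) MA expansion generate identical series. A sketch with an illustrative α:

```python
import random

alpha = 0.5
rng = random.Random(0)
n = 30
w = [rng.gauss(0.0, 1.0) for _ in range(n)]

# AR(1) recursion with x_0 = 0
x_ar = [0.0]
for t in range(1, n):
    x_ar.append(alpha * x_ar[-1] + w[t])

# MA expansion: x_t = sum_{j=0}^{t-1} alpha^j w_{t-j}
x_ma = [sum(alpha ** j * w[t - j] for j in range(t)) for t in range(n)]

diff = max(abs(a - b) for a, b in zip(x_ar, x_ma))
print(diff)  # the two constructions agree term by term
```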
The Fourier View
- ARMA models are linear time-invariant systems. Hence sinusoids are their eigenfunctions (Fourier analysis)
- This means it is natural to consider the power spectrum of the ARMA process. The power spectrum S(k) can be determined from the {α}, {β} coefficients
- Roughly speaking, S(k) is the amount of power allocated on average to the eigenfunction e^{2πikt}
- This is a useful way to understand some properties of ARMA processes, but we will not pursue it further here
- If you want to know more, see e.g. Chatfield (1989) chapter 7 or Diggle (1990) chapter 4
Parameter Estimation
Let the vector of observations be x = (x(t_1), ..., x(t_n))^T
Estimate and subtract a constant offset µ̂ if the mean is non-zero
- ARMA models driven by Gaussian noise are Gaussian processes. Thus the likelihood L(x; α, β) is a multivariate Gaussian, and we can optimize it wrt the parameters (e.g. by gradient ascent)
- An AR(p) model
      x_t = Σ_{i=1}^p α_i x_{t-i} + w_t
  can be viewed as a linear regression of x_t on the p previous time steps, so α and σ^2 can be estimated using linear regression
- This viewpoint also enables the fitting of nonlinear AR models
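The regression view can be sketched directly: stack lagged values into a design matrix and solve by least squares. (The AR(2) coefficients and settings below are illustrative, not from the slides.)

```python
import numpy as np

rng = np.random.default_rng(2)
a1, a2 = 0.2, 0.1   # true coefficients, noise variance sigma^2 = 1
n = 50_000
x = np.zeros(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.standard_normal()

# Design matrix: row for time t contains (x_{t-1}, x_{t-2})
X = np.column_stack([x[1:-1], x[:-2]])
y = x[2:]
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2_hat = np.mean((y - X @ alpha_hat) ** 2)   # residual variance
print(alpha_hat, sigma2_hat)  # close to (0.2, 0.1) and 1.0
```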
Model Order Selection, References
- For an MA(q) process there should be a cut-off in the autocorrelation function for lags greater than q
- For general ARMA models this is a model order selection problem, discussed in an upcoming lecture
Some useful books:
- The Analysis of Time Series: An Introduction. C. Chatfield, Chapman and Hall, 4th edition, 1989
- Time Series: A Biostatistical Introduction. P. J. Diggle, Clarendon Press, 1990
Linear-Gaussian HMMs
- HMM with continuous state-space and observations
- The filtering problem is known as Kalman filtering
[Figure: graphical model, states z_1, z_2, z_3, ..., z_N linked by A, with each observation x_n generated from z_n via C]
Dynamical model
    z_{n+1} = A z_n + w_{n+1}
where w_{n+1} ~ N(0, Γ) is Gaussian noise, i.e.
    p(z_{n+1} | z_n) = N(A z_n, Γ)
Observation model
    x_n = C z_n + v_n
where v_n ~ N(0, Σ) is Gaussian noise, i.e.
    p(x_n | z_n) = N(C z_n, Σ)
Initialization
    p(z_1) = N(µ_0, V_0)
Inference Problem: filtering
As the whole model is Gaussian, we only need to compute means and variances
    p(z_n | x_1, ..., x_n) = N(µ_n, V_n)
    µ_n = E[z_n | x_1, ..., x_n]
    V_n = E[(z_n - µ_n)(z_n - µ_n)^T | x_1, ..., x_n]
The recursive update is split into two parts:
- Time update: p(z_n | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n)
- Measurement update: p(z_{n+1} | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n, x_{n+1})
Time update: z_{n+1} = A z_n + w_{n+1}, thus
    E[z_{n+1} | x_1, ..., x_n] = A µ_n
    cov(z_{n+1} | x_1, ..., x_n) =: P_n = A V_n A^T + Γ
Measurement update (like the posterior in Factor Analysis):
    µ_{n+1} = A µ_n + K_{n+1} (x_{n+1} - C A µ_n)
    V_{n+1} = (I - K_{n+1} C) P_n
where
    K_{n+1} = P_n C^T (C P_n C^T + Σ)^{-1}
K_{n+1} is known as the Kalman gain matrix
Simple example
    z_{n+1} = z_n + w_{n+1},  w_n ~ N(0, 1)
    x_n = z_n + v_n,          v_n ~ N(0, 1)
    p(z_1) = N(0, σ^2)
In the limit σ^2 → ∞ we find
    µ_3 = (5 x_3 + 2 x_2 + x_1) / 8
Notice how later data has more weight
Compare z_{n+1} = z_n (so that w_n has zero variance); then
    µ_3 = (x_3 + x_2 + x_1) / 3
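The time and measurement updates above can be sketched as a scalar filter; running it on this random-walk example with a very broad prior (standing in for the σ^2 → ∞ limit) recovers the (5, 2, 1)/8 weights. The implementation details and test values are illustrative:

```python
def kalman_filter_1d(xs, A, C, Gamma, Sigma, mu0, V0):
    """Scalar Kalman filter; returns the filtered means mu_n."""
    mu, V = mu0, V0
    mus = []
    for n, x in enumerate(xs):
        if n == 0:
            # first observation is absorbed directly against the prior
            P, pred = V, mu
        else:
            # time update
            P = A * V * A + Gamma
            pred = A * mu
        # measurement update
        K = P * C / (C * P * C + Sigma)       # Kalman gain
        mu = pred + K * (x - C * pred)
        V = (1.0 - K * C) * P
        mus.append(mu)
    return mus

x1, x2, x3 = 1.0, 2.0, 0.5
mus = kalman_filter_1d([x1, x2, x3], A=1.0, C=1.0, Gamma=1.0, Sigma=1.0,
                       mu0=0.0, V0=1e12)   # huge V0 approximates the diffuse prior
print(mus[-1], (5 * x3 + 2 * x2 + x1) / 8)  # the two agree
```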
Applications
"Much as a coffee filter serves to keep undesirable grounds out of your morning mug, the Kalman filter is designed to strip unwanted noise out of a stream of data." Barry Cipra, SIAM News 26(5), 1993
- Navigational and guidance systems
- Radar tracking
- Sonar ranging
- Satellite orbit determination
Extensions
- Dealing with non-linearity: the Extended Kalman Filter (EKF). If x_n = f(z_n) + v_n where f is a non-linear function, we can linearize f, e.g. around E[z_n | x_1, ..., x_{n-1}]. Works for weak non-linearities
- For very non-linear problems use sampling methods (known as particle filters). Example: the work of Blake and Isard on tracking, see http://www.robots.ox.ac.uk/~vdg/dynamics.html
- It is possible to train KFs using a forward-backward algorithm