Parameter Estimation for Linear Dynamical Systems. Zoubin Ghahramani. Georey E. Hinton. University of Toronto. Toronto, Canada M5S 1A4

Size: px

Start display at page:

Download "Parameter Estimation for Linear Dynamical Systems. Zoubin Ghahramani. Georey E. Hinton. University of Toronto. Toronto, Canada M5S 1A4"

Cecily Nelson
7 years ago
Views:

1 Parameter Estimation for Linear Dynamical Systems Zoubin Ghahramani Georey E. Hinton Department of Computer Science University of Toronto 6 King's College Road Toronto, Canada M5S A4 zoubin@cs.toronto.edu Technical Report CRG-TR-96- February, 996 Abstract Linear systems have been used extensively in engineering to model and control the behavior of dynamical systems. In this note, we present the Expectation Maximization (EM) algorithm for estimating the parameters of linear systems (Shumway and Stoer, 98). We also point out the relationship between linear dynamical systems, factor analysis, and hidden Markov models. Introduction The goal of this note is to introduce the EM algorithm for estimating the parameters of linear dynamical systems (LDS). Such linear systems can be used both for supervised and unsupervised modeling of time series. We rst describe the model and then briey point out its relation to factor analysis and other data modeling techniques. The Model Linear time-invariant dynamical systems, also known as linear Gaussian state-space models, can be described by the following two equations: x t+ = Ax t + w t () y t = Cx t + v t : () Time is indexed by the discrete index t. The output y t is a linear function of the state, x t, and the state at one time step depends linearly on the previous state. Both state and output noise, w t and v t, are zero-mean normally distributed random variables with covariance matrices Q and R, respectively. Only the output of the system is observed, the state and all the noise variables are hidden. Rather than regarding the state as a deterministic value corrupted by random noise, we combine the state variable and the state noise variable into a single Gaussian random

2 variable; we form a similar combination for the output. Based on () and () we can write the conditional densities for the state and output, P (y t jx t ) = exp [y t Cx t ] 0 R [y t Cx t ] () p= jrj = (3) P (x t jx t ) = exp [x t Ax t ] 0 Q [x t Ax t ] () k= jqj = (4) A sequence of T output vectors (y ; y ; : : : ; y T ) is denoted by fyg; a subsequence (y t0 ; y t0 +; : : : ; y t ) by fyg t t0 ; similarly for the states. By the Markov property implicit in this model, P (fxg; fyg) = P (x ) TY P (x t jx t ) Assuming a Gaussian initial state density P (x ) = exp [x ] 0 V [x ] Therefore, the joint log probability is a sum of quadratic terms, log P (fxg; fyg) = t= TY t= [y t Cx t ] 0 R [y t Cx t ] T [x t Ax t ] 0 Q [x t Ax t ] P (y t jx t ): (5) () k= jv j = : (6) log jrj T log jqj [x ] 0 V [x ] log jv T (p + k) j log : (7) Often the inputs to the system can also be observed. In this case, the goal is to model the input{output response of a system. Denoting the inputs by u t, the state equation is x t+ = Ax t + Bu t + w t : (8) where B is the input matrix relating inputs linearly to states. We will present the learning algorithm for the output-only case, although the extensions to the input{output case are straightforward. If only the outputs of the system can be observed the problem can be seen as an unsupervised problem. That is, the goal is to model the unconditional density of the observations. If both inputs and outputs are observed, the problem becomes supervised, modeling the conditional density of the output given the input. Related Methods In its unsupervised incarnation, this model is an extension of maximum likelihood factor analysis (Everitt, 984). The factor, x t, evolves over time according to linear dynamics. In factor analysis, a further assumption is made that the output noise along each dimension

3 is uncorrelated, i.e. that R is diagonal. The goal of factor analysis is therefore to compress the correlational structure of the data into the values of the lower dimensional factors, while allowing independent noise terms to model the uncorrelated noise. The assumption of a diagonal R matrix can also be easily incorporated into the estimation procedure for the parameters of a linear dynamical system. The linear dynamical system can also be seen as a continuous-state analogue of the hidden Markov model (HMM; see Rabiner and Juang, 986, for a review). The forward part of the forward-backward algorithm from HMMs is computed by the well-known Kalman lter in LDSs; similarly, the backward part is computed by using Rauch's recursion (Rauch, 963). Together, these two recursions can be used to solve the problem of inferring the probabilities probabilities of the states given the observation sequence (known in engineering as the smoothing problem). These posterior probabilities form the basis of the E step of the EM algorithm. Finally, linear dynamical systems can also be represented as graphical probabilistic models (sometimes referred to as belief networks). The Kalman-Rauch recursions are special cases of the probability propagation algorithms that have been developed for graphical models (Lauritzen and Spiegelhalter, 988; Pearl, 988). The EM Algorithm Shumway and Stoer (98) presented an EM algorithm for linear dynamical systems where the observation matrix, C, is known. Since then, many authors have presented closely related models and extensions, also t with the EM algorithm (Shumway and Stoer, 99; Kim, 994; Athaide, 995). Here we present a basic form of the EM algorithm with C unknown, an obvious modication of Shumway and Stoer's original work. This note is meant as a succinct review of this literature for those wishing to implement learning in linear dynamical systems. The E step of EM requires computing the expected log likelihood, Q = E[log P (fxg; fyg)jfyg]: (9) This quantity depends on three expectations E[x t jfyg], E[x t x 0 t jfyg], E[x tx 0 t jfyg] which we will denote by the symbols: ^x t E[x t jfyg] (0) P t E[x t x 0 tjfyg] () P t;t E[x t x 0 t jfyg]: () Note that the state estimate, ^x t, diers from the one computed in a Kalman lter in that it depends on past and future observations; the Kalman lter estimates E[x t jfyg t ] (Anderson and Moore, 979). We rst describe the M step of the parameter estimation algorithm before showing how the above expectations are computed in the E step. 3

4 The M step The parameters of this system are A, C, R, Q,, V. Each of these is re-estimated by taking the corresponding partial derivative of the expected log likelihood, setting to zero, and solving. This results in the following: Output = t= C new = R y t^x 0 t + t= y t^x 0 t t=! T X t= R CP t = 0 (3) P t! (4) Output noise = T R t= R new = T y ty 0 t C^x ty 0 t + CP tc 0 = 0 (5) t= (y t y 0 t C new^x t y 0 t) (6) State dynamics = Q P t;t + Q AP t = 0 (7) A new =! X T P t;t P t! (8) State noise covariance: = T Q = T Q (P t AP t ;t P t;t A 0 + AP t A 0 ) = 0 Q new = T P t A new P t A new P t ;t! P t ;t! (9) (0) Initial state = (^x )V = 0 () new = ^x () 4

5 Initial state = V (P ^x 0 ^x ) (3) V new = P ^x ^x 0 (4) The above equations can be readily generalized to multiple observation sequences, with one subtlety regarding the estimate of the initial state covariance. Assume N observation sequences of length T, let ^x (i) t be the estimate of state at time t given the i th sequence, and Then the initial state covariance is The E step ^x t = N V new = P ^x ^x 0 + N NX i= NX i= ^x (i) t : [^x (i) ^x ] [^x (i) ^x ] 0 : (5) Using x t to denote E(x t jfyg ), and Vt Kalman lter forward recursions: to denote Var(x t jfyg ), we obtain the following x t t = Ax t t (6) Vt t = AV t t A0 + Q (7) K t = Vt t C 0 (CVt t C 0 + R) (8) x t t = x t t + K t (y t Cx t t ) (9) Vt t = Vt t K t CVt t ; (30) where x 0 = and V 0 = V. Following Shumway and Stoer (98), to compute ^x t x T t and P t Vt T + x T t xt 0 t one performs a set of backward recursions using J t = V t t A0 (V t t ) (3) x T t = x t t + J t (x T t V T t = V t t + J t (V T t Axt t ) (3) V t t )J 0 t : (33) We also require P t;t V T + t;t xt t xt 0 t, which can be obtained through the backward recursions V T t ;t = V t t J 0 t + J t (V T t t;t AVt )J 0 t ; (34) which is initialized V T T;T = (I K T C)A V T T : 5

6 References Anderson, B. D. O. and Moore, J. B. (979). Optimal Filtering. Prentice-Hall, Englewood Clis, NJ. Athaide, C. R. (995). Likelihood Evaluation and State Estimation for Nonlinear State Space Models. Ph.D. Thesis, Graduate Group in Managerial Science and Applied Economics, University of Pennsylvania, Philadelphia, PA. Everitt, B. S. (984). An Introduction to Latent Variable Models. Chapman and Hall, London. Kim, C.-J. (994). Dynamic linear models with Markov-switching. J. Econometrics, 60:{. Lauritzen, S. L. and Spiegelhalter, D. J. (988). Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statistical Society B, pages 57{4. Pearl, J. (988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA. Rabiner, L. R. and Juang, B. H. (986). An Introduction to hidden Markov models. IEEE Acoustics, Speech & Signal Processing Magazine, 3:4{6. Rauch, H. E. (963). Solutions to the linear smoothing problem. IEEE Transactions on Automatic Control, 8:37{37. Shumway, R. H. and Stoer, D. S. (98). An approach to time series smoothing and forecasting using the EM algorithm. J. Time Series Analysis, 3(4):53{64. Shumway, R. H. and Stoer, D. S. (99). Dynamic linear models with switching. J. Amer. Stat. Assoc., 86:763{769. 6

Deterministic Sampling-based Switching Kalman Filtering for Vehicle Tracking

Proceedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conference Toronto, Canada, September 17-20, 2006 WA4.1 Deterministic Sampling-based Switching Kalman Filtering for Vehicle