Maximum Likelihood Estimation of an ARMA(p,q) Model




Constantino Hevia
The World Bank. DECRG. October 2008

This note describes the Matlab function arma_mle.m, which computes the maximum likelihood estimates of a stationary ARMA(p,q) model.

Problem: To fit an ARMA(p,q) model to a time series \{y_1, y_2, ..., y_T\} with zero unconditional mean. An ARMA(p,q) process is given by

    y_t = \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_q \varepsilon_{t-q},

where \varepsilon_t is an i.i.d. shock, normally distributed with mean zero and variance \sigma^2. If the original series does not have zero mean, we first construct \tilde{y}_t = y_t - \frac{1}{T} \sum_{s=1}^T y_s and then fit the ARMA model to \tilde{y}_t.

Usage:

    results = arma_mle(y,p,q,[info])

Arguments:

    y    = vector of observed time series with mean zero.
    p    = length of the autoregressive (AR) part of the ARMA model (integer).
    q    = length of the moving average (MA) part of the ARMA model (integer).
    info = [optional] If info is not zero, the program prints information about the
           convergence of the optimization algorithm. The default value is zero.

Output: A structure with the following elements:

    results.ar    = [\hat{\phi}_1, \hat{\phi}_2, ..., \hat{\phi}_p]: estimated coefficients of the AR part.
    results.ma    = [\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_q]: estimated coefficients of the MA part.
    results.sigma = \hat{\sigma}: estimated standard deviation of \varepsilon_t.
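The demeaning step above is an ordinary sample-mean subtraction. A minimal Python sketch (the note's own code is Matlab; the data here are made up for illustration):

```python
import numpy as np

# made-up series with a nonzero mean (illustrative only)
y_raw = np.array([2.3, 1.9, 2.8, 2.1, 2.4])

# tilde-y_t = y_t - (1/T) * sum_s y_s : the series the ARMA model is fit to
y = y_raw - y_raw.mean()

print(abs(y.sum()) < 1e-12)  # True: the demeaned series sums to ~0
```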

The file test_arma_mle.m performs a Monte Carlo experiment using the function arma_mle.m. The user inputs a theoretical ARMA model. The program runs a large number of simulations and then estimates the parameters for each simulation. Finally, the histograms of the estimates are shown.

Algorithm

In this section I describe the algorithm used to compute the maximum likelihood estimates of the ARMA(p,q) process. Suppose that we want to fit the (mean zero) time series \{y_t\}_{t=1}^T to the following ARMA(p,q) model

    y_t = \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_q \varepsilon_{t-q},    (1)

where \varepsilon_t is an i.i.d. shock normally distributed with mean zero and variance \sigma^2. Let r = \max(p, q+1), and rewrite the model as

    y_t = \phi_1 y_{t-1} + ... + \phi_r y_{t-r} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_{r-1} \varepsilon_{t-r+1},    (2)

where we interpret \phi_j = 0 for j > p and \theta_j = 0 for j > q.

The estimation procedure is based on the Kalman filter (see Hamilton (1994) for the derivation of the filter). To use the Kalman filter we need to write the model in the following (state-space) form

    x_{t+1} = A x_t + R \varepsilon_{t+1}    (3)
    y_t = Z' x_t    (4)

where x_t is an r \times 1 state vector, A is an r \times r matrix, and R and Z are r \times 1 vectors. These matrices and vectors are defined as follows:

    A = \begin{bmatrix} \phi_1 & 1 & 0 & \cdots & 0 \\ \phi_2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ \phi_{r-1} & 0 & 0 & \cdots & 1 \\ \phi_r & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad
    R = \begin{bmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{r-1} \end{bmatrix}, \quad
    Z = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

To see that the system (3)-(4) is equivalent to (2), write the last row of (3) as

    x_{r,t+1} = \phi_r x_{1,t} + \theta_{r-1} \varepsilon_{t+1}.

Lagging this equation r - 1 periods we find

    x_{r,t-r+2} = \phi_r L^{r-1} x_{1,t} + \theta_{r-1} L^{r-1} \varepsilon_{t+1}    (5)

where we define L^r x_t = x_{t-r} as the r-th power of the lag operator, for any integer r. The second to last row implies

    x_{r-1,t+1} = \phi_{r-1} x_{1,t} + x_{r,t} + \theta_{r-2} \varepsilon_{t+1}.

Lagging r - 2 periods we obtain

    x_{r-1,t-r+3} = \phi_{r-1} L^{r-2} x_{1,t} + x_{r,t-r+2} + \theta_{r-2} L^{r-2} \varepsilon_{t+1}.

Introducing (5) into the previous equation we find

    x_{r-1,t-r+3} = \phi_{r-1} L^{r-2} x_{1,t} + \phi_r L^{r-1} x_{1,t} + \theta_{r-2} L^{r-2} \varepsilon_{t+1} + \theta_{r-1} L^{r-1} \varepsilon_{t+1}

or

    x_{r-1,t-r+3} = (\phi_{r-1} L^{r-2} + \phi_r L^{r-1}) x_{1,t} + (\theta_{r-2} L^{r-2} + \theta_{r-1} L^{r-1}) \varepsilon_{t+1}.    (6)

Take now row r - 2,

    x_{r-2,t+1} = \phi_{r-2} x_{1,t} + x_{r-1,t} + \theta_{r-3} \varepsilon_{t+1}.

Lagging r - 3 periods we find

    x_{r-2,t-r+4} = \phi_{r-2} L^{r-3} x_{1,t} + x_{r-1,t-r+3} + \theta_{r-3} L^{r-3} \varepsilon_{t+1}.

Plugging (6) into the previous equation we obtain

    x_{r-2,t-r+4} = (\phi_{r-2} L^{r-3} + \phi_{r-1} L^{r-2} + \phi_r L^{r-1}) x_{1,t} + (\theta_{r-3} L^{r-3} + \theta_{r-2} L^{r-2} + \theta_{r-1} L^{r-1}) \varepsilon_{t+1}.

Following this iterative procedure until row 1 we find

    x_{1,t+1} = (\phi_1 + \phi_2 L + ... + \phi_{r-1} L^{r-2} + \phi_r L^{r-1}) x_{1,t} + (1 + \theta_1 L + ... + \theta_{r-1} L^{r-1}) \varepsilon_{t+1}

or

    (1 - \phi_1 L - \phi_2 L^2 - ... - \phi_r L^r) x_{1,t+1} = (1 + \theta_1 L + \theta_2 L^2 + ... + \theta_{r-1} L^{r-1}) \varepsilon_{t+1}.    (7)
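The equivalence being derived here can also be checked numerically: building A, R, and Z from the coefficients and simulating the state equation with the same shocks as the ARMA recursion should give identical paths. A Python sketch (the note's code is Matlab; the function name and the ARMA(2,1) coefficient values are mine, chosen only for illustration):

```python
import numpy as np

def arma_state_space(phi, theta):
    """A (r x r), R, Z (length r) for x_{t+1} = A x_t + R eps_{t+1},
    y_t = Z'x_t, with r = max(p, q+1); coefficients beyond p or q are zero."""
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    A = np.zeros((r, r))
    A[:p, 0] = phi               # first column: phi_1, ..., phi_r
    A[:-1, 1:] = np.eye(r - 1)   # ones on the superdiagonal
    R = np.zeros(r)
    R[0] = 1.0
    R[1:q + 1] = theta           # R = (1, theta_1, ..., theta_{r-1})'
    Z = np.zeros(r)
    Z[0] = 1.0
    return A, R, Z

phi, theta = [0.5, -0.2], [0.4]          # illustrative ARMA(2,1)
p, q = len(phi), len(theta)
A, R, Z = arma_state_space(phi, theta)

rng = np.random.default_rng(0)
T = 50
eps = rng.standard_normal(T)

# simulate through the state equation, starting from x = 0
x = np.zeros(len(Z))
y_ss = np.zeros(T)
for t in range(T):
    x = A @ x + R * eps[t]
    y_ss[t] = Z @ x

# simulate the ARMA recursion (2) directly, with zero pre-sample values
y_arma = np.zeros(T)
for t in range(T):
    ar = sum(phi[j] * y_arma[t - 1 - j] for j in range(p) if t - 1 - j >= 0)
    ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
    y_arma[t] = ar + ma + eps[t]

print(np.allclose(y_ss, y_arma))  # True: the two representations coincide
```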

Now, the observation equation (4) and the definition of Z imply y_t = x_{1,t}. Using (7) evaluated at t - 1 we arrive at the ARMA representation (2),

    (1 - \phi_1 L - \phi_2 L^2 - ... - \phi_r L^r) y_t = (1 + \theta_1 L + \theta_2 L^2 + ... + \theta_{r-1} L^{r-1}) \varepsilon_t,

which proves that the system (3)-(4) is equivalent to (2).

Denote by \hat{x}_{t+1|t} = E[x_{t+1} | y_1, ..., y_t, \hat{x}_{1|0}] the expected value of x_{t+1} conditional on the history of observations (y_1, ..., y_t). The Kalman filter provides an algorithm for computing \hat{x}_{t+1|t} recursively, given an initial value \hat{x}_{1|0} = 0. (Note that 0 is the unconditional mean of x_t.) Associated with each of these forecasts is a mean squared error matrix, defined as

    P_{t+1|t} = E[(x_{t+1} - \hat{x}_{t+1|t})(x_{t+1} - \hat{x}_{t+1|t})'].

Given the estimate \hat{x}_{t|t-1}, we use (4) to compute the innovations

    a_t = y_t - E[y_t | y_1, ..., y_{t-1}, \hat{x}_{1|0}] = y_t - Z' \hat{x}_{t|t-1}.

The innovation variance, denoted by \omega_t, satisfies

    \omega_t = E[(y_t - Z' \hat{x}_{t|t-1})(y_t - Z' \hat{x}_{t|t-1})']
             = E[(Z' x_t - Z' \hat{x}_{t|t-1})(Z' x_t - Z' \hat{x}_{t|t-1})']
             = Z' P_{t|t-1} Z.    (8)

In addition to the estimates \hat{x}_{t+1|t}, the Kalman filter equations imply the following evolution of the matrices P_{t+1|t}:

    P_{t+1|t} = A [P_{t|t-1} - P_{t|t-1} Z Z' P_{t|t-1} / \omega_t] A' + \sigma^2 R R'.    (9)

Given an initial matrix P_{1|0} = E(x_t x_t') and the initial value \hat{x}_{1|0} = 0, the likelihood function of the observation vector \{y_1, y_2, ..., y_T\} is given by

    L = \prod_{t=1}^T (2 \pi \omega_t)^{-1/2} \exp(-a_t^2 / (2 \omega_t)).
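The filter recursions (8)-(9) can be sketched in Python as follows. This is my own sketch, not the note's Matlab code: the function name is hypothetical, P_{1|0} is taken to be the unconditional variance of x_t (the solution of P = A P A' + \sigma^2 R R', which requires a stationary AR polynomial), and R and Z are passed as 1-D vectors.

```python
import numpy as np

def arma_innovations(y, A, R, Z, sigma2):
    """Kalman filter for x_{t+1} = A x_t + R eps_{t+1}, y_t = Z'x_t.
    Returns the innovations a_t and their variances omega_t, as in (8)-(9).
    Assumes stationarity so that P_{1|0} = E(x x') solves P = A P A' + s2 RR'."""
    r = A.shape[0]
    RR = sigma2 * np.outer(R, R)
    # unconditional variance: vec(P) = (I - A kron A)^{-1} vec(sigma2 R R')
    P = np.linalg.solve(np.eye(r * r) - np.kron(A, A), RR.ravel()).reshape(r, r)
    x = np.zeros(r)                      # x_{1|0} = 0
    a = np.zeros(len(y))
    omega = np.zeros(len(y))
    for t, yt in enumerate(y):
        a[t] = yt - Z @ x                              # innovation a_t
        omega[t] = Z @ P @ Z                           # innovation variance (8)
        x = A @ x + (A @ P @ Z) * (a[t] / omega[t])    # next forecast x_{t+1|t}
        P = A @ (P - np.outer(P @ Z, Z @ P) / omega[t]) @ A.T + RR   # (9)
    return a, omega
```

Given the output, the minus-two log-likelihood of the next section is simply `np.sum(np.log(omega) + a**2 / omega)` (up to the dropped constant).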

Taking logarithms, dropping the constant, and multiplying by -2 we obtain

    \sum_{t=1}^T [\ln(\omega_t) + a_t^2 / \omega_t].    (10)

In principle, to find the MLE estimates we minimize (10) with respect to the parameters \phi_j, \theta_j, and \sigma^2. However, the following trick allows us to concentrate out the term \sigma^2 and optimize only with respect to the parameters \phi_j, \theta_j. Suppose we initialize the filter with the matrix \tilde{P}_{1|0} = P_{1|0} / \sigma^2. Then, from (9) it follows that each P_{t+1|t} is proportional to \sigma^2, and from (8) it follows that the innovation variance is also proportional to \sigma^2, say \omega_t = \sigma^2 \tilde{\omega}_t. This implies that we can optimize first with respect to \sigma^2 by hand, replace the result into the objective function, and then optimize the resulting expression (called the concentrated log-likelihood) with respect to the parameters \phi_j, \theta_j. To see this, note that (10) becomes

    \sum_{t=1}^T [\ln \sigma^2 + \ln \tilde{\omega}_t + a_t^2 / (\sigma^2 \tilde{\omega}_t)]    (11)

and \sigma^2 cancels out of the evolution equations of \tilde{P}_{t+1|t} and of the projections \hat{x}_{t+1|t}, so the innovations a_t do not depend on \sigma^2. We can therefore optimize (11) directly with respect to \sigma^2 to obtain

    \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^T a_t^2 / \tilde{\omega}_t.

Replacing this result into (11) we obtain the concentrated log-likelihood function

    \sum_{t=1}^T [\ln \hat{\sigma}^2 + \ln \tilde{\omega}_t + a_t^2 / (\hat{\sigma}^2 \tilde{\omega}_t)]
    = T \ln \left( \frac{1}{T} \sum_{t=1}^T a_t^2 / \tilde{\omega}_t \right) + \sum_{t=1}^T \ln \tilde{\omega}_t + T
    = T \ln(1/T) + T + T \ln \left( \sum_{t=1}^T a_t^2 / \tilde{\omega}_t \right) + \sum_{t=1}^T \ln \tilde{\omega}_t

or, dropping irrelevant constants,

    T \ln \left( \sum_{t=1}^T a_t^2 / \tilde{\omega}_t \right) + \sum_{t=1}^T \ln \tilde{\omega}_t.    (12)

Because the innovations a_t and the variances \tilde{\omega}_t are nonlinear functions of the parameters \phi_j and \theta_j, the criterion (12) must be minimized numerically.
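The concentration step can be verified numerically: plugging \hat{\sigma}^2 back into (11) reproduces (12) up to a constant, and no other \sigma^2 does better. A Python sketch, with made-up values for a_t and \tilde{\omega}_t (they stand in for the output of a filter run with \sigma^2 = 1, and are not from any real estimation):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
a = rng.standard_normal(T)      # made-up innovations a_t
om = 1.0 + rng.random(T)        # made-up scaled variances tilde-omega_t

def criterion(s2):
    # equation (11): T ln s2 + sum ln omega_t + (1/s2) sum a_t^2 / omega_t
    return T * np.log(s2) + np.sum(np.log(om)) + np.sum(a**2 / om) / s2

s2_hat = np.sum(a**2 / om) / T  # closed-form minimizer of (11)
concentrated = T * np.log(np.sum(a**2 / om)) + np.sum(np.log(om))   # (12)

# (12) equals the minimized (11) up to the constant T - T ln T ...
print(np.isclose(criterion(s2_hat), concentrated + T - T * np.log(T)))  # True
# ... and s2_hat indeed beats any other value of sigma^2
print(criterion(s2_hat) < criterion(2 * s2_hat))  # True
```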

The Matlab function arma_mle.m performs this minimization using the optimization routine fminunc.m from the Matlab optimization package. The initial conditions for the parameters are based on the two-step regression procedure described in Hannan and McDougall (1988). The first step consists in running a (relatively) long autoregression and computing the fitted residuals. The second step computes an OLS regression of y_t on its p lagged values and on q lagged values of the fitted residuals obtained in the first step.

REFERENCES

[1] James D. Hamilton. Time Series Analysis. Princeton University Press, 1994.

[2] E.J. Hannan and A.J. McDougall. "Regression Procedures for ARMA Estimation." Journal of the American Statistical Association, Vol. 83, June 1988.
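The two-step regression procedure used for the starting values can be sketched in Python as follows (the note's implementation is Matlab; the function name and the long-AR order k are my own assumptions, not values from the note):

```python
import numpy as np

def hannan_init(y, p, q, k=20):
    """Two-step regression starting values, in the spirit of
    Hannan and McDougall (1988):
      step 1: fit a long AR(k) by OLS; its residuals proxy for eps_t;
      step 2: regress y_t on p lags of y and q lags of those residuals.
    k is an assumed tuning choice for the long autoregression."""
    T = len(y)
    # step 1: long autoregression of y_t on y_{t-1}, ..., y_{t-k}
    X = np.column_stack([y[k - j - 1:T - j - 1] for j in range(k)])
    b, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    e = np.zeros(T)
    e[k:] = y[k:] - X @ b                     # fitted residuals
    # step 2: OLS of y_t on lagged y's and lagged residuals
    m = k + q                                 # first usable observation
    W = np.column_stack(
        [y[m - j - 1:T - j - 1] for j in range(p)]
        + [e[m - j - 1:T - j - 1] for j in range(q)])
    c, *_ = np.linalg.lstsq(W, y[m:], rcond=None)
    return c[:p], c[p:]                       # AR and MA starting values
```

On simulated data from a known ARMA model, these starting values should land close to the true coefficients, which is what makes them a sensible initial condition for fminunc.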