#2.3 ML Estimation in an Autoregressive Model Solution Guide


Econometrics II, Fall 2016, Department of Economics, University of Copenhagen. By Morten Nyboe Tabor.

(1) Present the intuition for the maximum likelihood estimation principle, and outline the basic steps in deriving the estimators and the covariance matrix of the estimates. What is the number of parameters in the statistical model?

We assume that we know the density of $y_t$, given by

$$y_t \sim \text{Density}(\theta), \tag{2.1}$$

where $\theta$ is a $K$-dimensional vector of parameters for the assumed density, so the statistical model has $K$ parameters. The density function, $f(y_t \mid \theta)$, measures how likely it is to observe $y_t$ (for discrete data it is the probability of observing $y_t$). For independent and identically distributed (iid) observations the joint density is given by

$$f(y_1,\dots,y_T \mid \theta) = \prod_{t=1}^{T} f(y_t \mid \theta). \tag{2.2}$$

The likelihood function is the joint density viewed as a function of $\theta$ for the observed data,

$$L(\theta) = f(y_1,\dots,y_T \mid \theta) = \prod_{t=1}^{T} f(y_t \mid \theta) = \prod_{t=1}^{T} L_t(\theta). \tag{2.3}$$

It can be written as the product of the individual likelihood contributions, $L_t(\theta)$, which indicate how much the individual observations contribute to the joint likelihood. The maximum likelihood estimator, $\hat\theta_{ML}$, maximizes the joint likelihood. The intuition behind ML is that we choose the parameter values under which the observed data are most likely, given the model. It is often more convenient, however, to work with the log of the likelihood function,

$$\log L(\theta) = \log \prod_{t=1}^{T} L_t(\theta) = \sum_{t=1}^{T} \log L_t(\theta). \tag{2.4}$$

Since the log function is a monotonic transformation, maximizing the log-likelihood and the likelihood function yields the same estimator, but the log-likelihood function is typically easier to work with because it is a sum rather than a product. To derive the ML estimator we carry out the following steps.

Step 1. Write the likelihood function and the log-likelihood function given the assumed distribution,

$$L(\theta) = f(y_1,\dots,y_T \mid \theta) = \prod_{t=1}^{T} f(y_t \mid \theta) = \prod_{t=1}^{T} L_t(\theta), \tag{2.5}$$

$$\log L(\theta) = \sum_{t=1}^{T} \log L_t(\theta). \tag{2.6}$$

Step 2. Find the score vector, which is the first derivative of the log-likelihood function with respect to the parameter vector $\theta$,

$$s(\theta) = \frac{\partial \log L(\theta)}{\partial \theta} = \sum_{t=1}^{T} \frac{\partial \log L_t(\theta)}{\partial \theta} = \sum_{t=1}^{T} s_t(\theta) \qquad (K \times 1). \tag{2.7}$$

Note that $s(\theta)$ is of dimension $K \times 1$ and that the score vector can be written as the sum of the individual scores for each observation. The score vector gives the slope of the log-likelihood function.

Step 3. Find the first-order conditions for the ML estimator, $\hat\theta_{ML}$,

$$s(\hat\theta_{ML}) = \sum_{t=1}^{T} s_t(\hat\theta_{ML}) = 0 \qquad (K \times 1), \tag{2.8}$$

and solve for $\hat\theta_{ML}$ to find the ML estimator. Note that this is a system of $K$ equations, the so-called likelihood equations, in $K$ unknown parameters. In practice, it can be impossible to find an analytical solution to the likelihood equations. In such cases, numerical optimization algorithms can be used to find the ML estimator.

Step 4. Find the Hessian as the second derivative,

$$H_t = \frac{\partial^2 \log L_t(\theta)}{\partial \theta \, \partial \theta'} \qquad (K \times K), \tag{2.9}$$

which indicates the curvature of the log-likelihood function. The Hessian is a symmetric $K \times K$ matrix.

Step 5. Find the information matrix for observation $t$,

$$I(\theta) = -E\left[\frac{\partial^2 \log L_t(\theta)}{\partial \theta \, \partial \theta'}\right] = -E[H_t]. \tag{2.10}$$

The asymptotic covariance matrix is given by the inverse of the information matrix. Since the Hessian measures the curvature of the log-likelihood function, the variance of the ML estimator depends on the Hessian: the sharper the curvature (the larger the second derivative in absolute value), the more precisely the data determine $\theta$, and the smaller the variance. The ML estimator is asymptotically normally distributed,

$$\sqrt{T}\,\left(\hat\theta_{ML} - \theta_0\right) \xrightarrow{d} N(0, V), \tag{2.11}$$

where $\theta_0$ are the true parameters and $V$ is the asymptotic variance, so that, approximately in large samples,

$$\hat\theta_{ML} \overset{a}{\sim} N\left(\theta_0, T^{-1} V\right). \tag{2.12}$$

The asymptotic variance, $V$, is given by

$$V = I(\theta_0)^{-1} = \left(-E\left[\frac{\partial^2 \log L_t(\theta)}{\partial \theta \, \partial \theta'}\bigg|_{\theta = \theta_0}\right]\right)^{-1}. \tag{2.13}$$

In practice, we can estimate the asymptotic variance by replacing population expectations with sample averages and the unknown parameters with the ML estimates,

$$\hat V_H = \left(-\frac{1}{T}\sum_{t=1}^{T} \frac{\partial^2 \log L_t(\theta)}{\partial \theta \, \partial \theta'}\bigg|_{\theta = \hat\theta_{ML}}\right)^{-1}, \tag{2.14}$$

where the subscript $H$ indicates that the estimate is based on the Hessian. Alternatively, the asymptotic variance can be estimated based on the outer product of the scores, $\hat V_S = \left(T^{-1}\sum_{t=1}^{T} s_t(\hat\theta_{ML})\, s_t(\hat\theta_{ML})'\right)^{-1}$.
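Steps 1 to 5 can be traced numerically. The sketch below is a minimal illustration under a hypothetical one-parameter model ($K = 1$) that is not part of the exercise: iid exponential data with $\log L_t(\lambda) = \log\lambda - \lambda y_t$. It solves the likelihood equation with Newton-Raphson, one common choice of numerical optimizer, and then estimates $V$ in two ways: from the Hessian as in (2.14) and from the outer product of the scores. The function names and data are illustrative assumptions.

```python
def newton_raphson_ml(score, hessian, theta0, tol=1e-10, max_iter=100):
    """Solve the scalar likelihood equation s(theta) = 0 by Newton-Raphson,
    iterating theta <- theta - s(theta) / H(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical iid exponential data: f(y | lam) = lam * exp(-lam * y),
# so log L_t(lam) = log(lam) - lam * y_t  (K = 1 parameter).
y = [0.8, 1.9, 0.4, 2.7, 1.2, 0.6]
T = len(y)

def score_t(lam, yt):          # individual score s_t(lam) = 1/lam - y_t
    return 1.0 / lam - yt

def hessian_t(lam):            # individual Hessian H_t(lam) = -1/lam^2
    return -1.0 / lam ** 2

def score(lam):                # s(lam) = sum of the individual scores
    return sum(score_t(lam, yt) for yt in y)

def hessian(lam):              # sum of the individual Hessians
    return T * hessian_t(lam)

lam_hat = newton_raphson_ml(score, hessian, theta0=0.5)

# Hessian-based estimate of V: inverse of minus the average Hessian at lam_hat.
v_hessian = 1.0 / (-hessian(lam_hat) / T)
# Outer-product-of-scores estimate: inverse of the average squared score.
v_opg = 1.0 / (sum(score_t(lam_hat, yt) ** 2 for yt in y) / T)

print(lam_hat, T / sum(y))       # numerical and closed-form ML estimates agree
print(v_hessian / T, v_opg / T)  # two estimates of Var(lam_hat) = V / T
```

In this model the likelihood equation has the closed-form solution $\hat\lambda = 1/\bar y$, which makes it easy to check the numerical answer. The two variance estimates coincide only asymptotically under correct specification, so in a small sample they generally differ.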

(2) Show how the joint density function for the time series, $y_0, y_1, \dots, y_T$, denoted $f(y_0, y_1, \dots, y_T \mid \theta, \sigma^2)$, can be factorized into a series of conditional and marginal distributions. Discuss how to construct the likelihood function for $y_1, y_2, \dots, y_T$ conditional on $y_0$. How does this procedure differ from the iid case?

For iid data we can factorize the joint density as the product of the individual densities,

$$f(y_1, y_2, \dots, y_T \mid \theta, \sigma^2) = \prod_{t=1}^{T} f(y_t \mid \theta, \sigma^2). \tag{2.15}$$

For most economic time series the iid assumption does not hold, so we cannot use this factorization. However, we can use the factorization of a joint density into a conditional and a marginal density,

$$f(A, B) = f(A \mid B)\, f(B), \tag{2.16}$$

to factorize the joint unconditional density of $y_0, y_1, \dots, y_T$ into a series of conditional densities and a marginal density,

$$\begin{aligned}
f(y_0, y_1, \dots, y_T \mid \theta, \sigma^2)
&= f(y_T \mid y_0, \dots, y_{T-1}; \theta, \sigma^2)\, f(y_0, \dots, y_{T-1}; \theta, \sigma^2) \\
&= f(y_T \mid y_0, \dots, y_{T-1}; \theta, \sigma^2)\, f(y_{T-1} \mid y_0, \dots, y_{T-2}; \theta, \sigma^2)\, f(y_0, \dots, y_{T-2}; \theta, \sigma^2) \\
&\;\;\vdots \\
&= \prod_{t=1}^{T} f(y_t \mid y_0, \dots, y_{t-1}; \theta, \sigma^2) \cdot f(y_0 \mid \theta, \sigma^2).
\end{aligned} \tag{2.17}$$

Dividing by the marginal density of $y_0$, we get the joint density of $y_1, \dots, y_T$ conditional on $y_0$,

$$f(y_1, \dots, y_T \mid y_0; \theta, \sigma^2) = \frac{f(y_0, y_1, \dots, y_T \mid \theta, \sigma^2)}{f(y_0 \mid \theta, \sigma^2)} = \prod_{t=1}^{T} f(y_t \mid y_0, \dots, y_{t-1}; \theta, \sigma^2). \tag{2.18}$$

Even though time series data do not satisfy the iid assumption, conditioning on $y_0$ lets us factorize the joint density into a product of densities, one for each observation. The difference from the iid case is that each factor is a density conditional on the past observations rather than a marginal density. We can therefore still use the usual additive form of the log-likelihood function for ML estimation.
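The factorization (2.18) can be checked numerically in a small case. The sketch below assumes the Gaussian AR(1) model introduced in the next question and takes $T = 2$. Given $y_0$, the pair $(y_1, y_2)$ is bivariate normal with means $(\theta y_0, \theta^2 y_0)$, variances $\sigma^2$ and $\sigma^2(1+\theta^2)$, and covariance $\theta\sigma^2$. The parameter values and observations are hypothetical. The joint density computed directly should equal $f(y_1 \mid y_0)\, f(y_2 \mid y_0, y_1)$.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bivariate_normal_pdf(x1, x2, m1, m2, s11, s12, s22):
    """Bivariate normal density with means (m1, m2) and covariance [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 ** 2
    d1, d2 = x1 - m1, x2 - m2
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

theta, sigma2 = 0.6, 1.5      # hypothetical parameter values
y0, y1, y2 = 0.4, -0.3, 1.1   # hypothetical observations

# Joint density of (y1, y2) given y0, computed directly from the bivariate normal.
joint = bivariate_normal_pdf(
    y1, y2,
    m1=theta * y0, m2=theta ** 2 * y0,
    s11=sigma2, s12=theta * sigma2, s22=sigma2 * (1 + theta ** 2),
)

# The same density via the factorization: f(y1 | y0) * f(y2 | y0, y1),
# where f(y2 | y0, y1) = f(y2 | y1) in the AR(1) model.
factorized = normal_pdf(y1, theta * y0, sigma2) * normal_pdf(y2, theta * y1, sigma2)

print(joint, factorized)   # the two numbers agree up to rounding
```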

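(3) Find an expression for the likelihood contribution for $y_t \mid y_{t-1}$, denoted $L_t(\theta, \sigma^2)$, and state the likelihood function for $y_1, y_2, \dots, y_T \mid y_0$. Also write the corresponding log-likelihood function.

We consider the first-order autoregressive, AR(1), model

$$y_t = \theta y_{t-1} + \epsilon_t, \qquad t = 1, 2, \dots, T, \tag{2.19}$$

where we assume that $\epsilon_t \sim iid\, N(0, \sigma^2)$, and we condition on the initial value $y_0$. We derive the ML estimator based on the assumption that the error term is normally distributed. Note that we have two parameters to estimate: the autoregressive parameter, $\theta$, and the variance of the error term, $\sigma^2$.

First, we find the likelihood contribution of the error term, $\epsilon_t = y_t - \theta y_{t-1}$, where $\mu_\epsilon = E(\epsilon_t) = 0$. Given $y_{t-1}$, $y_t$ is normally distributed with mean $\theta y_{t-1}$ and variance $\sigma^2$, so

$$L_t(\theta, \sigma^2) = f(y_t \mid y_{t-1}; \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(\epsilon_t - \mu_\epsilon)^2}{2\sigma^2}\right\} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_t - \theta y_{t-1})^2}{2\sigma^2}\right\}. \tag{2.20}$$

Since $y_t$ depends on the past only through $y_{t-1}$, $f(y_t \mid y_0, \dots, y_{t-1}; \theta, \sigma^2) = f(y_t \mid y_{t-1}; \theta, \sigma^2)$, and the likelihood function is given by

$$L(\theta, \sigma^2) = f(y_1, \dots, y_T \mid y_0; \theta, \sigma^2) = \prod_{t=1}^{T} f(y_t \mid y_{t-1}; \theta, \sigma^2) = \prod_{t=1}^{T} L_t(\theta, \sigma^2) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_t - \theta y_{t-1})^2}{2\sigma^2}\right\}. \tag{2.21}$$

The log-likelihood function is given by

$$\log L(\theta, \sigma^2) = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log \sigma^2 - \sum_{t=1}^{T} \frac{(y_t - \theta y_{t-1})^2}{2\sigma^2}, \tag{2.22}$$

and the log-likelihood contributions are

$$\log L_t(\theta, \sigma^2) = -\frac{1}{2}\log 2\pi - \frac{1}{2}\log \sigma^2 - \frac{(y_t - \theta y_{t-1})^2}{2\sigma^2}. \tag{2.23}$$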
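Here is a minimal sketch of (2.22) and (2.23) as functions, with the convention (an assumption of this sketch) that `y[0]` holds the initial value $y_0$ that we condition on. The function names and the data are hypothetical. The check confirms that the log-likelihood equals the sum of the individual contributions.

```python
import math

def ar1_loglik_contributions(theta, sigma2, y):
    """log L_t(theta, sigma2) for t = 1, ..., T, as in (2.23); y[0] is y_0."""
    return [
        -0.5 * math.log(2 * math.pi)
        - 0.5 * math.log(sigma2)
        - (y[t] - theta * y[t - 1]) ** 2 / (2 * sigma2)
        for t in range(1, len(y))
    ]

def ar1_loglik(theta, sigma2, y):
    """Conditional log-likelihood (2.22) of y_1, ..., y_T given y_0."""
    T = len(y) - 1
    ssr = sum((y[t] - theta * y[t - 1]) ** 2 for t in range(1, T + 1))
    return (-0.5 * T * math.log(2 * math.pi)
            - 0.5 * T * math.log(sigma2)
            - ssr / (2 * sigma2))

# Hypothetical data, with y[0] playing the role of y_0.
y = [0.0, 0.5, 0.9, 0.2, -0.4, 0.1, 0.7]
total = ar1_loglik(0.5, 1.0, y)
by_parts = sum(ar1_loglik_contributions(0.5, 1.0, y))
print(total, by_parts)   # equal up to rounding: log L is the sum of the log L_t
```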

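(4) Calculate the individual scores,

$$s_t(\theta, \sigma^2) = \begin{pmatrix} \dfrac{\partial \log L_t(\theta, \sigma^2)}{\partial \theta} \\[2ex] \dfrac{\partial \log L_t(\theta, \sigma^2)}{\partial \sigma^2} \end{pmatrix}.$$

We find the individual scores by differentiating the log-likelihood contributions with respect to the parameters $\theta$ and $\sigma^2$. Remember that here we differentiate with respect to $\sigma^2$ and not $\sigma$. (Alternatively, you could differentiate with respect to $\sigma$ and get similar results.) This gives

$$s_t(\theta, \sigma^2) = \begin{pmatrix} \dfrac{y_{t-1}(y_t - \theta y_{t-1})}{\sigma^2} \\[2ex] -\dfrac{1}{2\sigma^2} + \dfrac{(y_t - \theta y_{t-1})^2}{2\sigma^4} \end{pmatrix}. \tag{2.24}$$

(5) State the likelihood equations as the first-order conditions for maximizing the log-likelihood function. Solve the first-order conditions and find the ML estimators, $\hat\theta_{ML}$ and $\hat\sigma^2_{ML}$.

The first-order conditions are given by the likelihood equations,

$$s(\theta, \sigma^2) = \sum_{t=1}^{T} s_t(\theta, \sigma^2) = \begin{pmatrix} \sum_{t=1}^{T} \dfrac{y_{t-1}(y_t - \theta y_{t-1})}{\sigma^2} \\[2ex] \sum_{t=1}^{T} \left(-\dfrac{1}{2\sigma^2} + \dfrac{(y_t - \theta y_{t-1})^2}{2\sigma^4}\right) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \tag{2.25}$$

We can rewrite the two equations separately. The first equation gives

$$\sum_{t=1}^{T} y_{t-1}(y_t - \theta y_{t-1}) = 0 \;\Leftrightarrow\; \sum_{t=1}^{T} y_{t-1} y_t - \theta \sum_{t=1}^{T} y_{t-1}^2 = 0 \;\Leftrightarrow\; \theta = \frac{\sum_{t=1}^{T} y_t y_{t-1}}{\sum_{t=1}^{T} y_{t-1}^2}, \tag{2.26}$$

and the second equation gives

$$-\frac{T}{2\sigma^2} + \frac{\sum_{t=1}^{T} (y_t - \theta y_{t-1})^2}{2\sigma^4} = 0 \;\Leftrightarrow\; \sigma^2 = \frac{1}{T}\sum_{t=1}^{T} (y_t - \theta y_{t-1})^2. \tag{2.27}$$

Hence, we get the ML estimator of the autoregressive parameter,

$$\hat\theta_{ML} = \left(\sum_{t=1}^{T} y_{t-1}^2\right)^{-1} \sum_{t=1}^{T} y_t y_{t-1}, \tag{2.28}$$

and, by noting that the residuals are $\hat\epsilon_t = y_t - \hat\theta_{ML} y_{t-1}$, we get the ML estimator of the error variance,

$$\hat\sigma^2_{ML} = \frac{1}{T}\sum_{t=1}^{T} \hat\epsilon_t^2. \tag{2.29}$$

(6) How do the ML estimators compare to the OLS estimators in the model (2.19)?

The maximum likelihood estimator, $\hat\theta_{ML}$, is identical to the OLS estimator, $\hat\theta_{OLS}$. Note, however, that the ML estimator of the error variance differs from the OLS estimator of the error variance, which is given by

$$\hat\sigma^2_{OLS} = \frac{1}{T-K}\sum_{t=1}^{T} \hat\epsilon_t^2, \tag{2.30}$$

where $K$ is the number of regressors, here $K = 1$. Under the classical regression assumptions, $\hat\sigma^2_{OLS}$ is unbiased. Since $\hat\sigma^2_{ML} = \frac{T-K}{T}\,\hat\sigma^2_{OLS}$, the ML estimator is then biased downward in finite samples. However, the factor $(T-K)/T$ tends to one as $T \to \infty$, so the ML estimator is still consistent. We also note that the ML estimator is asymptotically efficient: its asymptotic variance attains the Cramér-Rao lower bound, $I(\theta)^{-1}$, which is the smallest possible asymptotic variance among all consistent and asymptotically normal estimators.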
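A minimal simulation sketch of questions (5) and (6). It assumes hypothetical true values $\theta = 0.5$, $\sigma^2 = 1$, $T = 500$, and a fixed seed. It computes the closed-form estimators (2.28) to (2.30), checks that the likelihood equations (2.25) hold at the estimates up to rounding, and confirms that $\hat\sigma^2_{ML} = \frac{T-1}{T}\hat\sigma^2_{OLS}$. The helper names `simulate_ar1` and `ar1_ml` are illustrative and do not come from any library.

```python
import random

def simulate_ar1(theta, sigma2, T, y0=0.0, seed=0):
    """Simulate y_0, ..., y_T from y_t = theta * y_{t-1} + eps_t, eps_t iid N(0, sigma2)."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(T):
        y.append(theta * y[-1] + rng.gauss(0.0, sigma2 ** 0.5))
    return y

def ar1_ml(y):
    """Closed-form ML estimators (2.28)-(2.29) plus the OLS variance estimator (2.30)."""
    lag, cur = y[:-1], y[1:]
    T = len(cur)
    theta_hat = sum(a * b for a, b in zip(cur, lag)) / sum(b * b for b in lag)
    resid = [a - theta_hat * b for a, b in zip(cur, lag)]
    ssr = sum(e * e for e in resid)
    return theta_hat, ssr / T, ssr / (T - 1), resid   # K = 1 regressor

y = simulate_ar1(theta=0.5, sigma2=1.0, T=500, seed=42)
theta_hat, sigma2_ml, sigma2_ols, resid = ar1_ml(y)

# The likelihood equations (2.25) should hold at the estimates, up to rounding.
lag = y[:-1]
s_theta = sum(b * e for b, e in zip(lag, resid)) / sigma2_ml
s_sigma2 = sum(-1 / (2 * sigma2_ml) + e * e / (2 * sigma2_ml ** 2) for e in resid)

print(theta_hat, sigma2_ml, sigma2_ols)
print(s_theta, s_sigma2)
```

Both score components are zero only up to floating-point error. With $T = 500$ the two variance estimators differ by the factor $499/500$.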

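(7) Find the Hessian matrix of second derivatives,

$$H_t = \begin{pmatrix} \dfrac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta^2} & \dfrac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \, \partial \sigma^2} \\[2ex] \dfrac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \, \partial \theta} & \dfrac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial (\sigma^2)^2} \end{pmatrix},$$

and the information matrix, $I(\theta, \sigma^2) = -E[H_t]$. Comment on the role of the information matrix in inference on the parameters and state the asymptotic distribution.

The Hessian matrix collects the second derivatives of the log-likelihood contributions. Using $\epsilon_t = y_t - \theta y_{t-1}$, these are

$$\frac{\partial^2 \log L_t}{\partial \theta^2} = -\frac{y_{t-1}^2}{\sigma^2}, \qquad \frac{\partial^2 \log L_t}{\partial \theta \, \partial \sigma^2} = -\frac{y_{t-1}(y_t - \theta y_{t-1})}{\sigma^4} = -\frac{y_{t-1}\epsilon_t}{\sigma^4}, \qquad \frac{\partial^2 \log L_t}{\partial (\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{(y_t - \theta y_{t-1})^2}{\sigma^6} = \frac{1}{2\sigma^4} - \frac{\epsilon_t^2}{\sigma^6},$$

which gives the Hessian matrix

$$H_t = \begin{pmatrix} -\dfrac{y_{t-1}^2}{\sigma^2} & -\dfrac{y_{t-1}\epsilon_t}{\sigma^4} \\[2ex] -\dfrac{y_{t-1}\epsilon_t}{\sigma^4} & \dfrac{1}{2\sigma^4} - \dfrac{\epsilon_t^2}{\sigma^6} \end{pmatrix}. \tag{2.31}$$

Note that the Hessian itself is not diagonal, because the cross-derivative $-y_{t-1}\epsilon_t/\sigma^4$ is in general nonzero. Only its expectation turns out to be block diagonal. The information matrix is the negative expected Hessian,

$$I(\theta, \sigma^2) = -E\left[H_t(\theta, \sigma^2)\right].$$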
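One way to check the analytical Hessian (2.31) is to compare it with a central finite-difference approximation of the second derivatives of $\log L_t$ from (2.23). The sketch below does this at hypothetical values of $(\theta, \sigma^2, y_t, y_{t-1})$. The helper names and the step size `h` are illustrative choices. The check also shows that the off-diagonal element is clearly nonzero for a single observation.

```python
import math

def loglik_t(theta, sigma2, y_t, y_lag):
    """log L_t(theta, sigma2) from (2.23)."""
    e = y_t - theta * y_lag
    return -0.5 * math.log(2 * math.pi) - 0.5 * math.log(sigma2) - e * e / (2 * sigma2)

def hessian_t(theta, sigma2, y_t, y_lag):
    """Analytical Hessian (2.31) of log L_t with respect to (theta, sigma2)."""
    e = y_t - theta * y_lag
    h_tt = -y_lag ** 2 / sigma2
    h_ts = -y_lag * e / sigma2 ** 2
    h_ss = 1 / (2 * sigma2 ** 2) - e * e / sigma2 ** 3
    return [[h_tt, h_ts], [h_ts, h_ss]]

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at the point x (a list of floats)."""
    n = len(x)

    def f_shift(i, di, j, dj):
        z = list(x)
        z[i] += di
        z[j] += dj
        return f(z)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (f_shift(i, h, j, h) - f_shift(i, h, j, -h)
                       - f_shift(i, -h, j, h) + f_shift(i, -h, j, -h)) / (4 * h * h)
    return H

theta, sigma2, y_t, y_lag = 0.6, 1.5, 1.1, -0.3   # hypothetical values
H_analytic = hessian_t(theta, sigma2, y_t, y_lag)
H_numeric = numerical_hessian(lambda p: loglik_t(p[0], p[1], y_t, y_lag), [theta, sigma2])

print(H_analytic)
print(H_numeric)   # agrees with H_analytic to several decimals
```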

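As $E[\epsilon_t] = 0$, $E[\epsilon_t^2] = \sigma^2$, and $E[y_{t-1}\epsilon_t] = 0$, we get the information matrix

$$I(\theta, \sigma^2) = \begin{pmatrix} \dfrac{E[y_{t-1}^2]}{\sigma^2} & 0 \\[2ex] 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}.$$

The inverse of the information matrix gives the asymptotic variances used for inference, that is, for standard errors, t-tests and confidence intervals. Since $I(\theta, \sigma^2)$ is block diagonal, its inverse is block diagonal too, so $\hat\theta_{ML}$ and $\hat\sigma^2_{ML}$ are asymptotically uncorrelated. Assuming the process is stationary ($|\theta| < 1$), so that $E[y_{t-1}^2]$ is finite and constant, the asymptotic distribution is

$$\sqrt{T}\begin{pmatrix} \hat\theta_{ML} - \theta_0 \\ \hat\sigma^2_{ML} - \sigma^2_0 \end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_0 \left(E[y_{t-1}^2]\right)^{-1} & 0 \\ 0 & 2\sigma^4_0 \end{pmatrix}\right).$$

The variance of the ML estimator, $\hat\theta_{ML}$, is therefore approximately

$$V(\hat\theta_{ML}) = \frac{1}{T}\,\sigma^2 \left(E[y_{t-1}^2]\right)^{-1}.$$

As we do not know $E[y_{t-1}^2]$, we replace the expectation with the sample average to get the estimate of the asymptotic variance,

$$\hat V(\hat\theta_{ML}) = \frac{1}{T}\,\hat\sigma^2_{ML}\left(\frac{1}{T}\sum_{t=1}^{T} y_{t-1}^2\right)^{-1} = \hat\sigma^2_{ML}\left(\sum_{t=1}^{T} y_{t-1}^2\right)^{-1}.$$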
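The sketch below turns the inverted information matrix into standard errors, an approximate 95% Wald confidence interval for $\theta$, and a t-statistic for $\theta = 0$. It is a minimal illustration that assumes a simulated stationary AR(1) with hypothetical true values $\theta_0 = 0.5$ and $\sigma^2_0 = 1$. Under stationarity $E[y_{t-1}^2] = \sigma^2/(1-\theta^2)$, so the estimated $V(\hat\theta_{ML})$ should be close to $(1-\theta_0^2)/T$. The critical value 1.96 is the usual normal approximation.

```python
import math
import random

# Simulate a stationary AR(1) with hypothetical true values theta0 = 0.5, sigma2_0 = 1.
rng = random.Random(1)
theta0, sigma2_0, T = 0.5, 1.0, 1000
y = [0.0]
for _ in range(T):
    y.append(theta0 * y[-1] + rng.gauss(0.0, math.sqrt(sigma2_0)))

lag, cur = y[:-1], y[1:]
sum_lag2 = sum(b * b for b in lag)
theta_hat = sum(a * b for a, b in zip(cur, lag)) / sum_lag2
sigma2_hat = sum((a - theta_hat * b) ** 2 for a, b in zip(cur, lag)) / T

# Estimated variances from the inverted (block-diagonal) information matrix.
var_theta = sigma2_hat / sum_lag2          # = (1/T) * sigma2_hat / ((1/T) * sum y_{t-1}^2)
var_sigma2 = 2 * sigma2_hat ** 2 / T       # = (1/T) * 2 * sigma2_hat^2

se_theta = math.sqrt(var_theta)
se_sigma2 = math.sqrt(var_sigma2)
ci_theta = (theta_hat - 1.96 * se_theta, theta_hat + 1.96 * se_theta)  # approx. 95% Wald CI
t_stat = theta_hat / se_theta              # t-test of H0: theta = 0

print(theta_hat, se_theta, ci_theta)
print(sigma2_hat, se_sigma2, t_stat)
```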
