Alastair Hall ECG 752: Econometrics Spring 2005

SAS Handout # 5: Serial Correlation

In this handout we consider methods for estimation and inference in regression models with serially correlated errors. All the discussion is in the context of a simple aggregate production function model of the form:

   ln(Q_t) = β_{0,1} + β_{0,2} ln(L_t) + β_{0,3} ln(K_t) + u_t    (1)

where Q_t is an index of gross national product in constant dollars in year t, L_t is a labour input index (number of persons adjusted for hours of work and education level), and K_t is a capital input index (capital stock adjusted for rates of utilization). Annual data for the U.S. for the period 1929-1967 are contained in the data file proddata, which can be downloaded from the course web page http://www4.ncsu.edu/~arhall/ecg752.htm. The data can be read into SAS as follows:

proc import datafile="k:\proddata" out=aa dbms=tab;
   getnames=yes;
   datarow=2;
run;
data bb;
   set aa;
   y=log(q);
   x1=log(l);
   x2=log(k);
run;

Given the time series nature of the data, it is reasonable to be concerned that the error process may be serially correlated. In class, we discussed two approaches to inference based respectively on OLS and GLS estimation. In this handout, we consider how both can be performed in SAS. The output from these procedures is contained in an appendix.

(i) OLS based inference: The standard regression procedure, proc reg, does not have an option for calculating robust standard errors in the presence of serial correlation. However, this feature is available within proc model if the model is estimated via Generalized Method of Moments (GMM). Instrumental variables (IV) is the GMM estimator in which the moment condition is E[z_t(y_t - x_t'β_0)] = 0, and it can be recalled from ECG 751 that OLS is the IV estimator with instrument vector z_t = x_t.
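To see why setting z_t = x_t in the IV moment condition reproduces OLS, note that the sample moment condition then becomes the OLS normal equations. A minimal numerical sketch (in Python, outside SAS; the single-regressor, no-intercept case, with made-up data):

```python
# Sketch: in the one-regressor (no-intercept) case the IV estimator solves
# sum z_t*(y_t - x_t*b) = 0, so b_IV = sum(z*y)/sum(z*x).
# With z_t = x_t this is exactly the OLS slope sum(x*y)/sum(x*x).

def iv_slope(z, x, y):
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))

def ols_slope(x, y):
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0, 4.0]  # made-up data for illustration only
y = [1.1, 1.9, 3.2, 3.9]

# Choosing the instrument equal to the regressor reproduces OLS:
assert abs(iv_slope(x, x, y) - ols_slope(x, y)) < 1e-12
```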
proc model is a very general estimation procedure that can handle both linear and nonlinear models via a variety of estimation routines. The following program estimates the model in (1) via OLS and generates robust standard errors calculated using a HAC estimator.

proc model data=bb;
   parms b c d;
   y = b + (c*x1) + (d*x2);
   label b="intercept" c="coefficient of x1" d="coefficient of x2";
   fit y / gmm kernel=(bart,1,0.2);
run;

Two aspects of this code should be noted. Since proc model is a very general procedure, it makes no assumptions about the functional form, and so this must be specified. It also makes no assumption about which variables are endogenous, and so this must be specified via the fit statement. The kernel= option specifies the use of a HAC estimator to calculate the long run variance. Three kernels are supported: bart, which gives the Bartlett kernel discussed in class; parzen, which gives the Parzen kernel analyzed in Practice problem set # 4; and QS, which gives the quadratic spectral kernel that we have not discussed. The next two numbers in the kernel option specify the bandwidth as follows: kernel=(bart,m,n) means that the HAC is calculated with a Bartlett kernel and bandwidth b_T = mT^n. Notice that with this formula the bandwidth need not be an integer, and, in fact, there is no reason from the underlying theory why it need be so. However, it is intuitively more appealing to work with integer values of b_T with either the Bartlett or Parzen kernels, and this is common practice. If you want to fix a specific bandwidth b_T = b, say, then you must put m = b and n = 0. It should also be noted that proc model actually obtains the estimates via numerical optimization, and this explains the layout of the output. In the case here, convergence occurs in one step because the model is linear. As can be seen from the output, the use of a HAC yields different standard errors than proc reg. The program above yields b_T = 2.080717. Compare the results with those obtained with b_T = 2 or 3.
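The reported bandwidth is just b_T = mT^n evaluated at m = 1, n = 0.2 and T = 39 annual observations (1929-1967). A quick check of the arithmetic (in Python, outside SAS):

```python
# Bandwidth implied by the option kernel=(bart,m,n): b_T = m * T**n.
def bandwidth(m, n, T):
    return m * T ** n

T = 39  # annual observations, 1929-1967
# SAS reports b_T = 2.080717 for kernel=(bart,1,0.2):
print(round(bandwidth(1, 0.2, T), 4))  # 2.0807

# To fix a specific bandwidth b_T = b, put m = b and n = 0:
print(bandwidth(3, 0, T))  # 3
```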
(ii) GLS based inference: We assume that u_t is an AR(1) process, that is,

   u_t = θ u_{t-1} + w_t,   w_t ~ i.i.d.(0, σ²_w),

and are concerned with the problem of testing whether θ = 0.
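To get a feel for this error process, here is a small simulation sketch (in Python, outside SAS) of u_t = θ u_{t-1} + w_t for a given innovation sequence:

```python
# Simulate u_t = theta*u_{t-1} + w_t for a supplied innovation sequence w.
def ar1(theta, w, u0=0.0):
    u, prev = [], u0
    for wt in w:
        prev = theta * prev + wt
        u.append(prev)
    return u

# With theta = 0.5 a single unit shock decays geometrically:
print(ar1(0.5, [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]

# theta = 0 (the null hypothesis) gives serially uncorrelated errors u_t = w_t:
assert ar1(0.0, [1.0, 2.0, 3.0]) == [1.0, 2.0, 3.0]
```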
It can be recalled that the basic proc reg output does not contain any diagnostics for serial correlation. However, if we include the option dw then the output includes the Durbin-Watson statistic. For our example, the Durbin-Watson statistic can be calculated as follows:

proc reg data=bb;
   model y=x1 x2 / dw;
run;

Run the program and open the output window. Notice that the Durbin-Watson statistic is printed out following the parameter estimates table. For this example, we obtain d = 0.862. Recall that the Durbin-Watson test is one sided. In our case, the first order residual autocorrelation is 0.554, and so it makes sense to test H_0: θ = 0 versus H_1: θ > 0. Recall that the decision rule involves an inconclusive region, that is:

   Reject H_0 if d < d_L.
   Fail to reject H_0 if d > d_U.
   The test is inconclusive if d_L ≤ d ≤ d_U.

The lower and upper bounds, d_L and d_U respectively, for a 5% test are reproduced in Table G.6 on page 958 of W. H. Greene (2003), Econometric Analysis, fifth edition. Notice that these points depend on T and also on the number of regressors excluding the intercept. (Beware: Greene uses k to denote the number of regressors minus the intercept, whereas in our class notation k denotes the number of regressors including the intercept.) For our example, d_L = 1.38, and so the Durbin-Watson test indicates positive autocorrelation in the residuals. (Aside: if it is desired to test H_0 against H_1: θ < 0, then the form of the decision rule is the same but the test statistic is 4 - d; see Greene p. 270.) While the Durbin-Watson test statistic is routinely reported, it is only strictly valid under the Classical assumptions. A more generally applicable test can be obtained by regressing e_t on e_{t-1}. Clearly, to implement this test it is necessary to save the residuals from the OLS regression and also create the lagged value of the residual.
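The Durbin-Watson statistic itself is simple to compute from the residuals: d = Σ_{t=2}^T (e_t - e_{t-1})² / Σ_{t=1}^T e_t², which is approximately 2(1 - r̂_1), where r̂_1 is the first order residual autocorrelation (roughly consistent with d = 0.862 and r̂_1 = 0.554 above). A Python sketch (outside SAS), using artificial residual series:

```python
# Durbin-Watson statistic from a residual series e_1, ..., e_T:
# d = sum_{t=2}^T (e_t - e_{t-1})**2 / sum_{t=1}^T e_t**2.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# Perfectly persistent residuals (extreme positive autocorrelation) give d = 0 ...
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))  # 0.0
# ... while alternating residuals (extreme negative autocorrelation) push d toward 4.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```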
The test can be calculated as follows:

proc reg data=bb;
   model y=x1 x2;
   output out=resdat r=e;
run;
data cc;
   set resdat;
   lage=lag(e);
run;
proc reg data=cc;
   model e=lage;
run;

The test statistic is the regression t statistic from the regression of e_t on e_{t-1}, and we denote it here by τ̂_1.
The null and alternative hypotheses are H_0: θ = 0 and H_1: θ ≠ 0. The decision rule is to reject H_0 at the 100α% significance level if |τ̂_1| > z_{1-α/2}, where z_{1-α/2} is the 100(1-α/2)th percentile of the standard normal distribution. For our example, the p-value for this test is 0.0002, and so we reject H_0 at all conventional levels of significance. Therefore, once again the evidence points towards serial correlation in the errors. Three points are worth noting about the previous test:

1. The decision rule is based on the fact that under H_0, τ̂_1 converges in distribution to N(0, 1), and so the test is only asymptotically valid (unlike the Durbin-Watson).

2. The limiting distribution is valid in the independent stochastic regressor model discussed in class (that is, Assumptions ISR1-ISR6) plus some additional mild regularity conditions for the WLLN and CLT.

3. If, in addition to ISR1-ISR6, u_t has a normal distribution, then τ̂_1 is the LM test for H_0: θ = 0 versus H_1: θ ≠ 0.

In the face of this evidence, it is clearly desirable to re-estimate the model to take account of the serial correlation in the errors. This can be done using proc autoreg. This procedure is designed to estimate a linear regression model with an AR(p) error term of the form:

   y_t = x_t'β_0 + u_t,
   u_t = ε_t - Σ_{i=1}^p φ_i u_{t-i},   ε_t ~ i.i.d.(0, σ²_ε),

where p must be specified by the user. Note that SAS reports estimates of φ, and in our notation φ_i = -θ_i. To begin, we estimate this model with p = 1. The appropriate code is as follows:

proc autoreg data=bb;
   model y=x1 x2 / nlag=1;
run;

As you can see, the output from proc autoreg contains four parts: (i) the OLS results ignoring any serial correlation; (ii) the sample autocorrelations of the residuals up to lag p; (iii) estimates of the autoregressive parameters, their standard errors and a t-ratio for the hypothesis that the AR coefficient is zero; (iv) GLS estimates of the model.
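The GLS estimates in part (iv) rest on the familiar quasi-differencing idea: in the AR(1) case, multiplying the lagged equation by θ and subtracting gives an equation in y_t - θ y_{t-1} and x_t - θ x_{t-1} whose error is the serially uncorrelated w_t. (proc autoreg handles the estimation details itself, including the treatment of the first observation; the Python sketch below only illustrates the transformation, and simply drops the first observation.)

```python
# Quasi-differencing step underlying GLS with AR(1) errors:
# y*_t = y_t - theta*y_{t-1} (and likewise for each regressor) turns
# u_t = theta*u_{t-1} + w_t into the serially uncorrelated error w_t.
# The first observation is dropped in this simple sketch.
def quasi_difference(series, theta):
    return [series[t] - theta * series[t - 1] for t in range(1, len(series))]

y = [1.0, 2.0, 4.0, 8.0]  # made-up series for illustration
print(quasi_difference(y, 0.5))  # [1.5, 3.0, 6.0]
```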
This output contains a number of statistics that will be discussed in class, and so we leave them undefined for now. This procedure also has many of the features of proc reg. Compare the OLS and GLS results.
It is also possible to estimate the model by unconditional maximum likelihood, i.e. based on the unconditional likelihood function. To do this, it is necessary to include a second option in the model statement as follows:

proc autoreg data=bb;
   model y=x1 x2 / nlag=1 method=ml;
run;

Compare the GLS and ML results.

Exercises:

1. Calculate the robust standard errors of the OLS estimates using the Parzen kernel. Compare your results with those obtained using the Bartlett kernel.

2. Calculate lag(q), lag1(q), lag2(q), lag3(q), print them out side by side, and compare. What happens at the beginning of the series and why?

3. The LM test for H_0: θ = 0 versus H_1: θ ≠ 0 can be implemented by running the regression of e_t on e_{t-1} with or without the intercept. Under H_0, the two versions are asymptotically equivalent. What difference does this make in the empirical example above?

4. Suppose that u_t = θ_1 u_{t-1} + θ_2 u_{t-2} + w_t; then it is possible to test H_0: θ_1 = θ_2 = 0 versus H_1: θ_i ≠ 0 for at least one i by running a regression of e_t on e_{t-1} and e_{t-2}, and rejecting H_0 at the 100α% level if 2F > c_α, where F is the F-statistic calculated by SAS and c_α is the 100(1-α)th percentile of the χ²_2 distribution. Perform this test for the production function model described above. (In fact, this test generalizes in the obvious way to AR(p) errors, as we discuss in class.)

5. If nlag=1, then the t-value for the estimate of the AR coefficient is the Wald test for H_0: θ = 0 versus H_1: θ ≠ 0. What is the relationship between the Wald and LM test statistics? Would you expect them to be equal? Does the choice between them have a qualitative effect on the inference in this case?

6. Estimate the model by GLS with p = 2, p = 3 and p = 4. Compare the results. Does the choice of lag length affect the regression coefficient estimates?

7. It is also possible to estimate so called subset AR models in which certain of the AR coefficients are set to zero.
For example, if it is desired to estimate the AR(4) model with φ_i = 0 for i = 1, 2, 3, then this can be done using the option nlag=(4). Run this model and compare the results with those obtained for p = 4 in the previous question.
8. Now estimate the AR(4) model in which φ_i = 0 for i = 2, 3. Compare the results with those obtained above.
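As a check on the critical value needed in exercise 4: for the χ² distribution with 2 degrees of freedom the CDF is F(x) = 1 - e^{-x/2}, so the 100(1-α)th percentile is simply -2 ln α and no table is needed. In Python (outside SAS):

```python
import math

# For chi-square with 2 degrees of freedom, CDF F(x) = 1 - exp(-x/2),
# so the 100(1-alpha)th percentile is c_alpha = -2*ln(alpha).
def chi2_2_critical(alpha):
    return -2.0 * math.log(alpha)

c = chi2_2_critical(0.05)
print(round(c, 3))  # 5.991

# Decision rule for exercise 4 at the 5% level: reject H0 if 2*F > c.
```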