Alastair Hall, ECG 752: Econometrics, Spring 2005. SAS Handout # 5: Serial Correlation

In this handout we consider methods for estimation and inference in regression models with serially correlated errors. All the discussion is in the context of a simple aggregate production function model of the form:

    ln(Q_t) = β_{0,1} + β_{0,2} ln(L_t) + β_{0,3} ln(K_t) + u_t        (1)

where Q_t is an index of gross national product in constant dollars in year t, L_t is a labour input index (number of persons adjusted for hours of work and education level), and K_t is a capital input index (capital stock adjusted for rates of utilization). Annual data for the U.S. for the period 1929-1967 are contained in the data file proddata, which can be downloaded from the course web page http://www4.ncsu.edu/~arhall/ecg752.htm. The data can be read into SAS as follows:

    proc import datafile="k:\proddata" out=aa dbms=tab;
      getnames=yes;
      datarow=2;
    run;

    data bb;
      set aa;
      y=log(q);
      x1=log(l);
      x2=log(k);
    run;

Given the time series nature of the data, it is reasonable to be concerned that the error process may be serially correlated. In class, we discussed two approaches to inference, based respectively on OLS and GLS estimation. In this handout, we consider how both can be performed in SAS. The output from these procedures is contained in an appendix.

(i) OLS-based inference: The standard regression procedure, proc reg, does not have an option for calculating robust standard errors in the presence of serial correlation. However, this feature is available within proc model if the model is estimated via Generalized Method of Moments. Instrumental variables is the GMM estimator in which the moment condition is E[z_t (y_t − x_t'β_0)] = 0, and it can be recalled from ECG 751 that OLS is an IV estimator with instrument vector z_t = x_t.
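The IV/GMM view of OLS mentioned above amounts to solving the sample analogue of E[x_t (y_t − x_t'β_0)] = 0, i.e. the normal equations X'X b = X'y. A minimal pure-Python sketch of that calculation (the simulated Cobb-Douglas data, the coefficient values, and the helper name `ols` are all hypothetical stand-ins for the proddata file, not the handout's numbers):

```python
import random

# Hypothetical stand-in data: the handout uses the file "proddata"
# (U.S. annual data, 1929-1967). Here we simulate T observations so the
# sketch is self-contained; beta is an assumed "true" parameter vector.
random.seed(0)
T = 39
beta = [1.0, 0.7, 0.3]
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(T)]
y = [sum(b * x for b, x in zip(beta, row)) + random.gauss(0, 0.1) for row in X]

def ols(X, y):
    """Solve the sample moment condition (1/T) sum_t x_t (y_t - x_t'b) = 0,
    i.e. the normal equations X'X b = X'y, by Gaussian elimination."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row[:] + [v] for row, v in zip(XtX, Xty)]   # augmented matrix [X'X | X'y]
    for i in range(k):                               # forward elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    b = [0.0] * k
    for i in reversed(range(k)):                     # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

bhat = ols(X, y)
print([round(v, 3) for v in bhat])
```

With 39 observations and little noise, the estimates land close to the assumed beta; proc model's GMM fit with z_t = x_t reproduces this same OLS solution, and the HAC option then only changes the standard errors, not the point estimates.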

proc model is a very general estimation procedure that can handle both linear and nonlinear models via a variety of estimation routines. The following program estimates the model in (1) via OLS and generates robust standard errors calculated using a HAC estimator.

    proc model data=bb;
      parms b c d;
      y = b + (c*x1) + (d*x2);
      label b='intercept' c='coefficient of x1' d='coefficient of x2';
      fit y / gmm kernel=(bart,1,0.2);
    run;

Two aspects of this code should be noted:

- Since proc model is a very general procedure, it makes no assumptions about the functional form, and so this must be specified. It also makes no assumption about which variables are endogenous, and so this must be specified via the fit statement.

- The kernel= option specifies the use of a HAC estimator to calculate the long run variance. Three kernels are supported: bart, which gives the Bartlett kernel discussed in class; parzen, which gives the Parzen kernel analyzed in Practice problem set # 4; and QS, which gives the quadratic spectral kernel that we have not discussed. The next two numbers in the kernel option specify the bandwidth as follows: kernel=(bart,m,n) means that the HAC is calculated with a Bartlett kernel and bandwidth b_T = m*T^n. Notice that with this formula the bandwidth need not be an integer, and, in fact, there is no reason from the underlying theory why it need be so. However, it is intuitively more appealing to work with integer values of b_T with either the Bartlett or Parzen kernels, and this is common practice. If you want to fix a specific bandwidth b_T = b, say, then you must set m = b and n = 0.

It should also be noted that proc model actually obtains the estimates via numerical optimization, and this explains the layout of the output. In the case here, convergence occurs in one step because the model is linear. As can be seen from the output, the use of a HAC yields different standard errors than proc reg. The program above yields b_T = 2.080717. Compare the results with those obtained with b_T = 2 or 3.
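The Bartlett-kernel long-run variance calculation behind the kernel= option can be sketched in a few lines. This is a generic Newey-West-style implementation in pure Python, using the common weight convention w_j = 1 − j/(b_T + 1); SAS's internal implementation may differ in such details, so treat it as an illustration rather than a replica:

```python
def bartlett_lrv(u, m=1.0, n=0.2):
    """Bartlett-kernel HAC estimate of the long-run variance of a scalar
    series u_t, with bandwidth b_T = m * T**n as in kernel=(bart,m,n)."""
    T = len(u)
    ubar = sum(u) / T
    v = [x - ubar for x in u]
    bT = m * T ** n                       # bandwidth; need not be an integer

    def gamma(j):                         # sample autocovariance at lag j
        return sum(v[t] * v[t - j] for t in range(j, T)) / T

    lrv = gamma(0)
    j = 1
    while j <= bT:                        # Bartlett weight 1 - j/(b_T + 1)
        w = 1.0 - j / (bT + 1.0)
        lrv += 2.0 * w * gamma(j)
        j += 1
    return lrv
```

With m = 1 and n = 0.2, a series of length T = 39 gives b_T = 39^0.2 ≈ 2.0807, matching the b_T = 2.080717 reported for the program above; setting n = 0 fixes b_T = m, as described in the text.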
(ii) GLS-based inference: We assume that u_t is an AR(1) process, that is,

    u_t = θ u_{t−1} + w_t,   w_t ~ i.i.d.(0, σ_w^2),

and we are concerned with the problem of testing whether θ = 0.

It can be recalled that the basic proc reg output does not contain any diagnostics for serial correlation. However, if we include the option dw then the output includes the Durbin-Watson statistic. For our example, the Durbin-Watson statistic can be calculated as follows:

    proc reg data=bb;
      model y=x1 x2 / dw;
    run;

Run the program and open the output window. Notice that the Durbin-Watson statistic is printed out following the parameter estimates table. For this example, we obtain: d = 0.862. Recall that the Durbin-Watson test is one sided. In our case, the first order residual autocorrelation is 0.554, and so it makes sense to test H_0: θ = 0 versus H_1: θ > 0. Recall that the decision rule involves an inconclusive region, that is:

    Reject H_0 if d < d_L.
    Fail to reject H_0 if d > d_U.
    The test is inconclusive if d_L ≤ d ≤ d_U.

where the lower and upper bounds, d_L and d_U respectively, for a 5% test are reproduced in Table G.6 on page 958 of W. H. Greene (2003), Econometric Analysis, fifth edition. Notice that these points depend on T and also on the number of regressors excluding the intercept. (Beware: Greene uses k to denote the number of regressors excluding the intercept, whereas in our class notation k denotes the number of regressors including the intercept.) For our example, d_L = 1.38, and since d = 0.862 < d_L the Durbin-Watson test indicates positive autocorrelation in the residuals. (Aside: if it is desired to test H_0 against H_1: θ < 0, then the form of the decision rule is the same but the test statistic is 4 − d; see Greene p. 270.)

While the Durbin-Watson test statistic is routinely reported, it is only strictly valid under the Classical assumptions. A more generally applicable test can be obtained by regressing e_t on e_{t−1}. Clearly, to implement this test it is necessary to save the residuals from the OLS regression and also to create the lagged value of the residual.
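As an aside before running that regression: the Durbin-Watson statistic quoted above is simple enough to compute directly from any residual series. A minimal pure-Python sketch (the function name is ours; pass in the OLS residuals):

```python
def durbin_watson(e):
    """Durbin-Watson statistic d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2,
    computed from a list of OLS residuals. Values near 2 suggest no
    first-order autocorrelation; values well below 2 suggest positive
    autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den
```

As a rough consistency check on the handout's numbers, the familiar approximation d ≈ 2(1 − ρ̂₁) with first-order residual autocorrelation ρ̂₁ = 0.554 gives 0.892, in line with the reported d = 0.862.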
Therefore, the test can be calculated as follows:

    proc reg data=bb;
      model y=x1 x2;
      output out=resdat r=e;
    run;

    data cc;
      set resdat;
      lage=lag(e);
    run;

    proc reg data=cc;
      model e=lage;
    run;

The test statistic is the regression t statistic from the regression of e_t on e_{t−1}, and we denote this

here by τ̂_1. The null and alternative hypotheses are H_0: θ = 0 and H_1: θ ≠ 0. The decision rule is to reject H_0 at the 100α% significance level if |τ̂_1| > z_{1−α/2}, where z_{1−α/2} is the 100(1 − α/2)th percentile of the standard normal distribution. For our example, the p-value for this test is 0.0002, and so we reject H_0 at all conventional levels of significance. Therefore, once again the evidence points towards serial correlation in the errors.

Three points are worth noting about the previous test:

1. The decision rule is based on the fact that under H_0, τ̂_1 converges in distribution to N(0, 1), and so the test is only asymptotically valid (unlike the Durbin-Watson).

2. The limiting distribution is valid in the independent stochastic regressor model discussed in class (that is, Assumptions ISR1-ISR6) plus some additional mild regularity conditions for the WLLN and CLT.

3. If, in addition to ISR1-ISR6, u_t has a normal distribution, then τ̂_1 is the LM test for H_0: θ = 0 versus H_1: θ ≠ 0.

In the face of this evidence, it is clearly desirable to re-estimate the model to take account of the serial correlation in the errors. This can be done using proc autoreg. This procedure is designed to estimate a linear regression model with an AR(p) error term of the form:

    y_t = x_t'β_0 + u_t
    u_t = ε_t − Σ_{i=1}^{p} φ_i u_{t−i},   ε_t ~ i.i.d.(0, σ_ε^2)

where p must be specified by the user. Note that SAS reports estimates of φ, and in our notation φ_i = −θ_i. To begin, we estimate this model with p = 1. The appropriate code is as follows:

    proc autoreg data=bb;
      model y=x1 x2 / nlag=1;
    run;

As you can see, the output from proc autoreg contains four parts: (i) the OLS results ignoring any serial correlation; (ii) the sample autocorrelations of the residuals up to lag p; (iii) estimates of the autoregressive parameters, their standard errors, and a t-ratio for the hypothesis that the AR coefficient is zero; (iv) GLS estimates of the model.
This output contains a number of statistics that will be discussed in class, and so we leave them undefined for now. This procedure also has many of the features of proc reg. Compare the OLS and GLS results.
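The GLS step can be understood as two ingredients: estimate θ from the OLS residuals, then quasi-difference the data and rerun OLS. The following pure-Python sketch shows those two ingredients in a Cochrane-Orcutt-style simplification; proc autoreg's actual (Yule-Walker by default) computations differ in detail, and both function names and the example series are ours:

```python
def ar1_coef(e):
    """OLS slope from regressing e_t on e_{t-1} with no intercept:
    the theta-hat used for the feasible GLS transform."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

def quasi_difference(z, theta):
    """Cochrane-Orcutt-style transform z*_t = z_t - theta * z_{t-1}.
    Applied to y and to every regressor, it yields a regression whose
    error w_t = u_t - theta * u_{t-1} is serially uncorrelated when
    u_t is AR(1) with parameter theta."""
    return [z[t] - theta * z[t - 1] for t in range(1, len(z))]
```

For instance, the geometrically decaying series [1, 0.5, 0.25, 0.125] has ar1_coef equal to 0.5 exactly, and quasi-differencing it with theta = 0.5 removes all the serial dependence, leaving zeros.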

It is also possible to estimate the model by unconditional maximum likelihood, i.e. based on the unconditional likelihood function. To do this, it is necessary to include a second option in the model statement as follows:

    proc autoreg data=bb;
      model y=x1 x2 / nlag=1 method=ml;
    run;

Compare the GLS and ML results.

Exercises:

1. Calculate the robust standard errors of the OLS estimates using the Parzen kernel. Compare your results with those obtained using the Bartlett kernel.

2. Calculate lag(q), lag1(q), lag2(q), lag3(q), print them out side by side, and compare. What happens at the beginning of the series and why?

3. The LM test for H_0: θ = 0 versus H_1: θ ≠ 0 can be implemented by running the regression of e_t on e_{t−1} with or without the intercept. Under H_0, the two versions are asymptotically equivalent. What difference does this make in the empirical example above?

4. Suppose that u_t = θ_1 u_{t−1} + θ_2 u_{t−2} + w_t. Then it is possible to test H_0: θ_1 = θ_2 = 0 versus H_1: θ_i ≠ 0 for at least one i by running a regression of e_t on e_{t−1} and e_{t−2} and rejecting H_0 at the 100α% level if 2F > c_α, where F is the F-statistic calculated by SAS and c_α is the 100(1 − α)th percentile of the χ²_2 distribution. Perform this test for the production function model described above. (In fact, this test generalizes in the obvious way to AR(p) errors, as we discuss in class.)

5. If nlag=1, then the t-value for the estimate of the AR coefficient is the Wald test for H_0: θ = 0 versus H_1: θ ≠ 0. What is the relationship between the Wald and LM test statistics? Would you expect them to be equal? Does the choice between them have a qualitative effect on the inference in this case?

6. Estimate the model by GLS with p = 2, p = 3 and p = 4. Compare the results. Does the choice of lag length affect the regression coefficient estimates?

7. It is also possible to estimate so-called subset AR models in which certain of the AR coefficients are set to zero. For example, if it is desired to estimate the AR(4) model with φ_i = 0 for i = 1, 2, 3, then this can be done using the option nlag=(4). Run this model and compare the results with those obtained for p = 4 in the previous question.

8. Now estimate the AR(4) model in which φ_i = 0 for i = 2, 3. Compare the results with those obtained above.