7. Autocorrelation (Violation of Assumption #B3)

Assumption #B3: The error term $u_i$ is not autocorrelated, i.e.
$$\text{Cov}(u_i, u_j) = 0 \quad \text{for all } i = 1, \ldots, N \text{ and } j = 1, \ldots, N \text{ where } i \neq j$$

Where do we typically find autocorrelation:
- Time-series data sets (less frequently in cross-sectional data sets)

169
Outlook: Strong similarities between heteroskedasticity and autocorrelation with respect to consequences and estimation procedures (GLS, FGLS estimators)

Example: [I]
Estimation of a price-revenue function (monthly data)
Variables:
- $y_i$ = monthly revenue quantity (in 1000 pieces)
- $x_i$ = selling price (in euros)

170
Month    Obs.  Price  Revenue    Month    Obs.  Price  Revenue
01:2002   1    24.2    1590      01:2003   13   32.2    1700
02:2002   2    25.5    1630      02:2003   14   32.4    1450
03:2002   3    26.8    1570      03:2003   15   33.2    1450
04:2002   4    26.4    1960      04:2003   16   34.0    1450
05:2002   5    25.2    2150      05:2003   17   33.7    1000
06:2002   6    24.4    2450      06:2003   18   32.8    1080
07:2002   7    26.2    2770      07:2003   19   31.3    1270
08:2002   8    26.1    2400      08:2003   20   30.9    1520
09:2002   9    27.4    2200      09:2003   21   30.0    1820
10:2002  10    28.4    1270      10:2003   22   28.3    1660
11:2002  11    29.8    1250      11:2003   23   27.5    1500
12:2002  12    31.3    1500      12:2003   24   26.8    1410

Dependent Variable: REVENUE
Method: Least Squares
Date: 11/17/04  Time: 13:50
Sample: 2002:01 2003:12
Included observations: 24

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          4262.118      686.0876      6.212207     0.0000
PRICE       -89.58094     23.56704    -3.801111     0.0010

R-squared            0.396408    Mean dependent var     1668.750
Adjusted R-squared   0.368972    S.D. dependent var     445.9802
S.E. of regression   354.2746    Akaike info criterion  14.65768
Sum squared resid    2761231.    Schwarz criterion      14.75585
Log likelihood      -173.8921    F-statistic            14.44845
Durbin-Watson stat   0.760994    Prob(F-statistic)      0.000978
Example: [II]
Linear model with a single regressor:
$$y_i = \alpha + \beta x_i + u_i \quad (i = 1, \ldots, 24)$$
⟹ a price increase of 1 euro decreases the monthly revenue by roughly 90000 pieces (since $\hat{\beta} = -89.58$ and $y_i$ is measured in 1000 pieces)

172
[Figure: Scatterplot of Revenue against Price with the fitted regression line]
Evidently:
- The line connecting the residuals rarely crosses the regression line
- We frequently find that positive $u_{i-1}$-values are followed by positive $u_i$-values and negative $u_{i-1}$-values are followed by negative $u_i$-values
⟹ $\text{Cov}(u_{i-1}, u_i) \neq 0$ (violation of Assumption #B3)

Question: Impact on estimation and testing procedures

174
7.1 Consequences

Note: We assume an explicit pattern of autocorrelation (alternative patterns are not considered here)

Definition 7.1: (AR(1) process)
Let $u_1, \ldots, u_N$ be the error terms of the linear regression model. Furthermore, let $\rho \in \mathbb{R}$ be a (constant) parameter and let $e_1, \ldots, e_N$ denote additional error terms that satisfy all B-assumptions (#B1–#B4). If
$$u_i = \rho u_{i-1} + e_i \quad (i = 2, \ldots, N)$$
we say that the error term $u_i$ follows a first-order autoregressive process (in symbols: $u_i \sim \text{AR}(1)$).

175
Remarks:
- An AR(1) process regresses $u_i$ on its predecessor value $u_{i-1}$ plus the new random shock $e_i$
- For $\rho = 1$ or $\rho = -1$ we have so-called random walks (important stochastic processes)
- For $|\rho| > 1$ processes become explosive
⟹ in this lecture: $-1 < \rho < 1$

Now: Expected values, (co)variances, correlation coefficients of an AR(1) process

176
Theorem 7.2: (Moments of an AR(1) process)
Let the error term $u_i$ ($i = 1, \ldots, N$) follow an AR(1) process according to Definition 7.1 where $-1 < \rho < 1$. Furthermore, let $\text{Var}(e_i) \equiv \sigma_e^2$ denote the constant variance of all $e_i$. We then have for all (admissible) $i = 1, \ldots, N$:
$$E(u_i) = 0, \quad \text{Var}(u_i) = \frac{\sigma_e^2}{1 - \rho^2} \equiv \sigma^2, \quad \text{Cov}(u_i, u_{i-\tau}) = \rho^\tau \frac{\sigma_e^2}{1 - \rho^2} = \rho^\tau \sigma^2 \neq 0, \quad \text{Corr}(u_i, u_{i-\tau}) = \rho^\tau$$
(Proof: class)

177
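The moments in Theorem 7.2 can be illustrated by simulation; a minimal sketch, assuming numpy is available (sample size and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma_e, N = 0.6, 1.0, 200_000

# e_i satisfies the B-assumptions: zero mean, constant variance, no autocorrelation
e = rng.normal(0.0, sigma_e, N)

# u_i = rho * u_{i-1} + e_i, started in the stationary distribution
u = np.empty(N)
u[0] = e[0] / np.sqrt(1 - rho**2)
for i in range(1, N):
    u[i] = rho * u[i - 1] + e[i]

sigma2 = sigma_e**2 / (1 - rho**2)        # Var(u_i) claimed by Theorem 7.2
print(u.mean())                           # close to 0
print(u.var(), sigma2)                    # both close to 1.5625
for tau in (1, 2, 3):
    print(np.corrcoef(u[tau:], u[:-tau])[0, 1], rho**tau)   # Corr ~ rho^tau
```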
Obviously:
If the error term $u_i$ follows an AR(1) process with $-1 < \rho < 1$, then the Assumptions #B1, #B2 are satisfied whereas #B3 is violated

Now: Autocorrelation in matrix notation (the $u_i$'s follow an AR(1) process)

Notation:
$$u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}, \quad u_{-1} = \begin{pmatrix} u_0 \\ u_1 \\ \vdots \\ u_{N-1} \end{pmatrix}, \quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$$

178
Matrix representation: [I]
Linear regression model
$$y = X\beta + u$$
with AR(1) error terms ($-1 < \rho < 1$)
$$u = \rho u_{-1} + e$$
⟹ Theorem 7.2 yields $\text{Cov}(u)$

179
Matrix representation: [II]
Due to $\sigma^2 \equiv \sigma_e^2 / (1 - \rho^2)$ we obtain
$$\text{Cov}(u) = \begin{pmatrix} \sigma^2 & \text{Cov}(u_1, u_2) & \cdots & \text{Cov}(u_1, u_N) \\ \text{Cov}(u_2, u_1) & \sigma^2 & \cdots & \text{Cov}(u_2, u_N) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(u_N, u_1) & \text{Cov}(u_N, u_2) & \cdots & \sigma^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{N-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{N-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1}\sigma^2 & \rho^{N-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 \Omega$$

180
Matrix representation: [III]
where
$$\Omega = \begin{pmatrix} 1 & \rho & \cdots & \rho^{N-1} \\ \rho & 1 & \cdots & \rho^{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \cdots & 1 \end{pmatrix}$$

181
Question:
Is there any transformation of the autocorrelated model so that
- the parameter vector $\beta$ remains unchanged
- autocorrelation vanishes
- the transformed model $y^* = X^*\beta + u^*$ satisfies all #A-, #B-, #C-assumptions?
(cf. Section 6, Slide 111)

182
Hope:
If yes, then the OLS estimator of the transformed model (the GLS estimator) would be BLUE (cf. Section 6, Slides 123–126)

Result:
In analogy to the line of argument given on Slides 117–121 under heteroskedasticity the following result obtains: there exists a regular matrix P so that the transformed model
$$Py = PX\beta + Pu$$
satisfies all #A-, #B-, #C-assumptions

183
Form of P in the autocorrelated model: [I]
P has to satisfy the following equations:
$$P'P = \Omega^{-1} \quad \text{and} \quad P\Omega P' = I_N$$
(see Slides 117, 120)
First, the inverse of $\Omega$ from Slide 181 is given by
$$\Omega^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1+\rho^2 & \ddots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$
(check it)

184
Form of P in the autocorrelated model: [II]
The form of P is given by
$$P = \frac{1}{\sqrt{1 - \rho^2}} \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$

185
Form of P in the autocorrelated model: [III]
⟹ transformed model: $y^* = X^*\beta + u^*$ where
$$y^* = Py = \begin{pmatrix} \sqrt{1-\rho^2}\, y_1 \\ y_2 - \rho y_1 \\ \vdots \\ y_N - \rho y_{N-1} \end{pmatrix}, \quad u^* = Pu = \begin{pmatrix} \sqrt{1-\rho^2}\, u_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}$$
$$X^* = PX = \begin{pmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\, x_{11} & \cdots & \sqrt{1-\rho^2}\, x_{K1} \\ 1 - \rho & x_{12} - \rho x_{11} & \cdots & x_{K2} - \rho x_{K1} \\ \vdots & \vdots & & \vdots \\ 1 - \rho & x_{1N} - \rho x_{1(N-1)} & \cdots & x_{KN} - \rho x_{K(N-1)} \end{pmatrix}$$

186
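As a numerical sanity check, the two defining properties of P can be verified directly; a minimal sketch, assuming numpy (all names illustrative). Note that the displayed transformed vectors omit the common scalar $1/\sqrt{1-\rho^2}$ of P; it rescales every row equally and cancels in the OLS formula applied to the transformed data.

```python
import numpy as np

rho, N = 0.6, 6

# Omega from Slide 181: (i, j) entry is rho^|i - j|
idx = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
Omega = rho ** idx

# P from Slide 185: -rho on the subdiagonal, 1 on the diagonal,
# first row sqrt(1 - rho^2) * e_1', common prefactor 1/sqrt(1 - rho^2)
P = np.eye(N) - rho * np.eye(N, k=-1)
P[0, 0] = np.sqrt(1 - rho**2)
P /= np.sqrt(1 - rho**2)

print(np.allclose(P.T @ P, np.linalg.inv(Omega)))   # P'P = Omega^{-1} -> True
print(np.allclose(P @ Omega @ P.T, np.eye(N)))      # P Omega P' = I_N -> True
```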
Remarks:
- The transformed model $y^* = X^*\beta + u^*$ satisfies all #A-, #B-, #C-assumptions
- The parameter vector $\beta$ remains unchanged
⟹ consequences of autocorrelation parallel those of heteroskedasticity

187
Consequences of autocorrelation: [I]
- The OLS estimator
$$\hat{\beta} = (X'X)^{-1}X'y$$
is still unbiased, but no longer BLUE (cf. Theorem 6.1, Slide 109)
- The covariance matrix of the OLS estimator is given by
$$\text{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$$
- The GLS estimator
$$\hat{\beta}^{\text{GLS}} = [X^{*\prime}X^*]^{-1} X^{*\prime} y^* = [X'\Omega^{-1}X]^{-1} X'\Omega^{-1} y$$
is BLUE

188
Consequences of autocorrelation: [II]
- Its covariance matrix is given by
$$\text{Cov}(\hat{\beta}^{\text{GLS}}) = \sigma^2 [X'\Omega^{-1}X]^{-1}$$
(cf. Theorem 6.3, Slide 123)
- Unbiased estimator of $\sigma^2$:
$$\hat{\sigma}^2 = \frac{\hat{u}^{*\prime}\hat{u}^*}{N - K - 1} = \frac{(P\hat{u})'P\hat{u}}{N - K - 1}$$

189
Impact of neglecting autocorrelation: [I]
- The OLS estimator
$$\hat{\beta} = (X'X)^{-1}X'y$$
is unbiased, but inefficient
- The estimator $\hat{\sigma}^2 (X'X)^{-1}$ of the covariance matrix $\text{Cov}(\hat{\beta})$ is biased
- The estimator
$$\hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{N - K - 1}$$
of the error-term variance is biased

190
Impact of neglecting autocorrelation: [II]
⟹ test statistics are based on biased estimators
⟹ hypothesis tests are likely to be unreliable (t-, F-tests)

191
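The size of the distortion can be made concrete numerically. A minimal sketch, assuming numpy and a hypothetical design matrix, comparing the naive covariance formula $\sigma^2(X'X)^{-1}$ with the correct sandwich form from Slide 188 (both evaluated at the true $\sigma^2$, to isolate the matrix part of the bias):

```python
import numpy as np

rng = np.random.default_rng(1)
N, rho, sigma_e = 100, 0.7, 1.0
sigma2 = sigma_e**2 / (1 - rho**2)

# Hypothetical design: intercept plus a slowly varying regressor
X = np.column_stack([np.ones(N), np.linspace(0.0, 10.0, N)])
Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

XtX_inv = np.linalg.inv(X.T @ X)
cov_naive = sigma2 * XtX_inv                              # ignores autocorrelation
cov_true = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv   # sandwich form, Slide 188

print(np.sqrt(np.diag(cov_naive)))   # understated standard errors ...
print(np.sqrt(np.diag(cov_true)))    # ... versus the correct, larger ones here
```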
7.2 Diagnostics

Graphical analysis: [I]
- First, estimation of the model $y = X\beta + u$ by OLS, i.e.
$$\hat{\beta} = (X'X)^{-1}X'y$$
- Calculation of the residuals
$$\hat{u} = y - X\hat{\beta}$$

192
Graphical analysis: [II]
- Plot of the residuals versus time
  - slow swings around zero ⟹ positive autocorrelation
  - fast swings around zero ⟹ negative autocorrelation
- Scatterplot of $\hat{u}_{i-1}$ versus $\hat{u}_i$ (see the sketch below)
  - positive slope ⟹ positive autocorrelation
  - negative slope ⟹ negative autocorrelation

Example: [I]
Price-revenue function on Slides 170–172

193
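Both diagnostic plots can be produced directly from the OLS residuals; a minimal sketch, assuming numpy and matplotlib are available (function and variable names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_diagnostics(X, y):
    """Plot the OLS residuals against time and against their own lag."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
    u_hat = y - X @ beta_hat                       # residuals

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(u_hat, marker="o")                    # slow swings -> positive autocorr.
    ax1.axhline(0.0, color="gray")
    ax1.set(xlabel="i", ylabel="residual")
    ax2.scatter(u_hat[:-1], u_hat[1:])             # positive slope -> positive autocorr.
    ax2.set(xlabel=r"$\hat{u}_{i-1}$", ylabel=r"$\hat{u}_i$")
    plt.show()
```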
OLS regression output as on Slide 171 (Durbin-Watson stat 0.760994), together with the residuals and their lagged values:

Obs   RESID(-1)    RESID
 1        —       -504.2594
 2    -504.2594   -347.8042
 3    -347.8042   -291.3490
 4    -291.3490     62.81861
 5      62.81861   145.3215
 6     145.3215    373.6567
 7     373.6567    854.9024
 8     854.9024    475.9443
 9     475.9443    392.3995
10     392.3995   -448.0195
11    -448.0195   -342.6062
12    -342.6062     41.76520
13      41.76520   322.3880
14     322.3880     90.30423
15      90.30423   161.9690
16     161.9690    233.6337
17     233.6337   -243.2406
18    -243.2406   -243.8634
19    -243.8634   -188.2348
20    -188.2348     25.93283
21      25.93283   245.3100
22     245.3100    -66.97761
23     -66.97761  -298.6424
24    -298.6424   -451.3490
[Figure: Left panel, residuals $\hat{u}_i$ ($i = 1, \ldots, 24$) plotted against time (02:01–03:10); right panel, scatterplot of $\hat{u}_i$ against $\hat{u}_{i-1}$]
Obviously:
Positive dependence between $\hat{u}_{i-1}$ and $\hat{u}_i$
⟹ indication of positive autocorrelation
⟹ a conceivable specification of an AR(1) error-term process could be
$$u_i = \rho u_{i-1} + e_i$$
with $0 < \rho < 1$

196
Now: Use the pairs of residuals $(\hat{u}_{i-1}, \hat{u}_i)$ to estimate $\rho$
Model:
$$\hat{u}_i = \rho \hat{u}_{i-1} + e_i \quad (i = 2, \ldots, N)$$
OLS estimator of $\rho$:
$$\hat{\rho} = \frac{\sum_{i=2}^{N} (\hat{u}_{i-1} - \bar{\hat{u}})(\hat{u}_i - \bar{\hat{u}})}{\sum_{i=2}^{N} (\hat{u}_{i-1} - \bar{\hat{u}})^2} = \frac{\sum_{i=2}^{N} \hat{u}_{i-1}\hat{u}_i}{\sum_{i=2}^{N} \hat{u}_{i-1}^2}$$

197
Remarks:
- For the sum of the residuals computed via OLS we always have $\sum_{i=1}^{N} \hat{u}_i = 0$ and thus
$$\bar{\hat{u}} = \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i = 0$$
(cf. Von Auer, 2007, p. 57)
- Since for $i = 1$ there is no residual $\hat{u}_{i-1} = \hat{u}_0$, we only have the $N - 1$ observations $i = 2, \ldots, N$

198
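In code, $\hat{\rho}$ reduces to one line once the residuals are available; a sketch assuming numpy (names illustrative):

```python
import numpy as np

def estimate_rho(u_hat):
    """OLS estimate of rho in the regression u_hat_i = rho * u_hat_{i-1} + e_i.
    Only i = 2, ..., N enter: the first residual has no predecessor."""
    return (u_hat[:-1] @ u_hat[1:]) / (u_hat[:-1] @ u_hat[:-1])
```

Applied to the 24 residuals of the price-revenue example, this reproduces the value reported below.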
OLS estimate of $\rho$ in the price-revenue example: $\hat{\rho} = 0.579310$

Dependent Variable: RESID
Method: Least Squares
Date: 11/20/04  Time: 19:15
Sample(adjusted): 2002:02 2003:12
Included observations: 23 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
RESID(-1)   0.579310      0.171176     3.384285      0.0027

R-squared            0.339455    Mean dependent var     21.92432
Adjusted R-squared   0.339455    S.D. dependent var     336.8232
S.E. of regression   273.7493    Akaike info criterion  14.10481
Sum squared resid    1648652.    Schwarz criterion      14.15418
Log likelihood      -161.2053    Durbin-Watson stat     1.574718

Question: Is $\rho$ significantly different from zero?
⟹ Durbin-Watson test for autocorrelation

199
Durbin-Watson test: [I]
- Most popular test for autocorrelation (due to Durbin & Watson, 1950, 1951)
- Tests for both positive ($\rho > 0$) and negative ($\rho < 0$) autocorrelation
- Test statistic:
  - calculate the residuals $\hat{u}_i$ via OLS
  - test statistic:
$$DW = \sum_{i=2}^{N} (\hat{u}_i - \hat{u}_{i-1})^2 \Big/ \sum_{i=1}^{N} \hat{u}_i^2$$

200
Durbin-Watson test: [II]
- Relation to the OLS estimator $\hat{\rho}$ from Slide 197:
$$DW \approx 2(1 - \hat{\rho})$$
- Properties:
  - since $-1 < \rho < 1$ it follows that $0 < DW < 4$
  - no autocorrelation: $\hat{\rho} \approx 0 \Rightarrow DW \approx 2$
  - positive autocorrelation: $\hat{\rho} \approx 1 \Rightarrow DW \approx 0$
  - negative autocorrelation: $\hat{\rho} \approx -1 \Rightarrow DW \approx 4$

201
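Both the statistic and the approximation $DW \approx 2(1 - \hat{\rho})$ are easy to check; a sketch assuming numpy (names illustrative):

```python
import numpy as np

def durbin_watson(u_hat):
    """DW = sum_{i=2}^N (u_hat_i - u_hat_{i-1})^2 / sum_{i=1}^N u_hat_i^2."""
    return np.sum(np.diff(u_hat) ** 2) / np.sum(u_hat ** 2)

# In the price-revenue example: DW = 0.760994, while the approximation gives
# 2 * (1 - 0.579310) = 0.84 -- close, but not exact, in this small sample.
```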
Durbin-Watson test: [III]
Test for positive autocorrelation:
- hypotheses: $H_0: \rho \leq 0$ versus $H_1: \rho > 0$
- distribution of DW under $H_0$ depends on
  - sample size (N)
  - number of exogenous regressors (K)
  - specific values of the regressors $x_{1i}, \ldots, x_{Ki}$
⟹ exact calculation by econometric software

202
Durbin-Watson test: [IV]
- distribution of DW under $H_0$ has lower and upper bounds
⟹ exact critical values at the $\alpha$-level ($d_\alpha$) have lower and upper bounds (i.e. $d_\alpha^L \leq d_\alpha \leq d_\alpha^U$)
(for $\alpha = 0.05$ see Von Auer, 2007, p. 402)
- explicit decision rule:
  - reject $H_0: \rho \leq 0$ if $DW < d_\alpha^L$
  - do not reject $H_0: \rho \leq 0$ if $DW > d_\alpha^U$
  - no decision if $d_\alpha^L \leq DW \leq d_\alpha^U$

203
Durbin-Watson test: [V]
Test for negative autocorrelation:
- hypotheses: $H_0: \rho \geq 0$ versus $H_1: \rho < 0$
- explicit decision rule:
  - reject $H_0: \rho \geq 0$ if $DW > 4 - d_\alpha^L$
  - do not reject $H_0: \rho \geq 0$ if $DW < 4 - d_\alpha^U$
  - no decision if $4 - d_\alpha^U \leq DW \leq 4 - d_\alpha^L$

204
Example:
Estimation of the price-revenue function (Slides 170, 171)
Test for positive autocorrelation at the 5% level:
$$H_0: \rho \leq 0 \quad \text{versus} \quad H_1: \rho > 0$$
We have $N = 24$, $K = 1$, $DW = 0.760994$, $d_{0.05}^L = 1.27$, $d_{0.05}^U = 1.45$ and thus
$$DW = 0.760994 < 1.27 = d_{0.05}^L$$
⟹ reject $H_0$ at the 5% level

205
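The three-way decision rules of Slides 203–204 can be wrapped in a small helper; a sketch (names illustrative), shown with the numbers of this example:

```python
def dw_test(DW, d_L, d_U, alternative="positive"):
    """Durbin-Watson decision rule for positive or negative autocorrelation."""
    if alternative == "negative":
        DW = 4.0 - DW   # mirror the statistic, then apply the same bounds
    if DW < d_L:
        return "reject H0"
    if DW > d_U:
        return "do not reject H0"
    return "no decision"

print(dw_test(0.760994, 1.27, 1.45))   # -> 'reject H0' (5% level, N = 24, K = 1)
```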
Drawbacks of the Durbin-Watson test:
- Frequently there is no decision (e.g. if $DW \in [d_\alpha^L, d_\alpha^U]$ when testing for positive autocorrelation)
- The DW test is unreliable if predecessor values like $y_{i-1}, y_{i-2}, \ldots$ are used as regressors (so-called lag models)
- The DW test only tests for AR(1) autocorrelation

206
7.3 Feasible Estimation Procedures

Now: Estimation of the autocorrelated model
$$y = X\beta + u$$
with AR(1) error terms
$$u = \rho u_{-1} + e \quad (-1 < \rho < 1)$$

207
Problem:
From the data set (X, y) we do not have direct knowledge of the autocorrelation parameter $\rho$
⟹ FGLS estimation

Two feasible estimation procedures:
- GLS approach (Hildreth & Lu)
- FGLS approach (Cochrane & Orcutt)

208
1. Method by Hildreth & Lu: [I]
- Search algorithm
- Consider the GLS estimator
$$\hat{\beta}^{\text{GLS}} = [X'\Omega^{-1}X]^{-1} X'\Omega^{-1} y$$
where
$$\Omega^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\ 0 & -\rho & 1+\rho^2 & \ddots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$
(cf. Slides 184, 188)

209
1. Method by Hildreth & Lu: [II]
- Perform GLS estimation for distinct $\rho$-values ($-1 < \rho < 1$)
⟹ compute the sum of squared residuals $\hat{u}^{*\prime}\hat{u}^*$ for each estimation
- Find the $\rho$-value with the minimal sum of squared residuals
⟹ the GLS estimator of $\beta$ associated with this $\rho$-value is called the Hildreth-Lu estimator (see the sketch below)

210
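A minimal sketch of the grid search, assuming numpy (the grid resolution and all names are illustrative, not part of the original procedure). It uses $\hat{u}^{*\prime}\hat{u}^* = \hat{u}'\Omega^{-1}\hat{u}$, which follows from $P'P = \Omega^{-1}$:

```python
import numpy as np

def gls(X, y, Omega_inv):
    """GLS estimate and the sum of squared transformed residuals u*'u*."""
    beta = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    u = y - X @ beta
    return beta, u @ Omega_inv @ u

def hildreth_lu(X, y, grid=np.arange(-0.99, 1.0, 0.01)):
    """Grid search over rho: pick the GLS fit with minimal u*'u*."""
    N = len(y)
    idx = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    best = None
    for rho in grid:
        Omega_inv = np.linalg.inv(rho ** idx)   # Omega from Slide 181, inverted
        beta, ssr = gls(X, y, Omega_inv)
        if best is None or ssr < best[2]:
            best = (rho, beta, ssr)
    return best   # (rho_HL, Hildreth-Lu estimate of beta, minimal u*'u*)
```

For the price-revenue data the minimum lies near $\rho = 0.69$, cf. the table on the next slide.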
Example: Data on price-revenue function

ρ        0.60       0.68       0.69       0.70       0.80
û*'û*    1756.358   1739.757   1739.748   1740.212   1772.438
α̂        3420.42    3192.98    3162.48    3131.62    2817.53
β̂        -61.97     -54.50     -53.51     -52.50     -42.40

Hildreth-Lu estimates: $\hat{\alpha}^{HL} = 3162.48$, $\hat{\beta}^{HL} = -53.51$

211
2. Method by Cochrane & Orcutt: [I]
- Iterative multi-step procedure
Procedure:
1. OLS estimation of the model $y = X\beta + u$
2. Save the residuals $\hat{u} = y - X\hat{\beta}$

212
2. Method by Cochrane & Orcutt: [II]
3. Consider the regression $\hat{u} = \rho \hat{u}_{-1} + e$ and estimate $\rho$ by the OLS estimator
$$\hat{\rho} = \frac{\sum_{i=2}^{N} \hat{u}_{i-1}\hat{u}_i}{\sum_{i=2}^{N} \hat{u}_{i-1}^2}$$
4. Use $\hat{\rho}$ to apply the FGLS estimator
$$\hat{\beta}^{\text{FGLS}} = [X'\hat{\Omega}^{-1}X]^{-1} X'\hat{\Omega}^{-1} y$$

213
2. Method by Cochrane & Orcutt: [III]
5. Improvement due to iteration (see the sketch below):
- Compute the new residuals $\hat{u}^{(2)} = y - X\hat{\beta}^{\text{FGLS}}$
- Re-estimate $\rho$ as in Step #3
- Find the new FGLS estimator $\hat{\beta}^{\text{FGLS}(2)}$
- Repeat Step #5 until the FGLS estimator of $\beta$ does not exhibit any further (substantial) change

214
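A sketch of the full iteration, assuming numpy (the tolerance, iteration cap, and names are illustrative); the $\hat{\rho}$ step is the same formula as on Slide 197:

```python
import numpy as np

def cochrane_orcutt(X, y, tol=1e-6, max_iter=50):
    """Iterated Cochrane-Orcutt / FGLS as in Steps 1-5 of Slides 212-214."""
    N = len(y)
    idx = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    beta = np.linalg.solve(X.T @ X, X.T @ y)        # Step 1: OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta                            # Step 2/5: residuals
        rho = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])  # Step 3: estimate rho
        Omega_inv = np.linalg.inv(rho ** idx)       # Omega evaluated at rho_hat
        beta_new = np.linalg.solve(X.T @ Omega_inv @ X,
                                   X.T @ Omega_inv @ y)   # Step 4: FGLS
        if np.max(np.abs(beta_new - beta)) < tol:   # Step 5: stop when stable
            return beta_new, rho
        beta = beta_new
    return beta, rho
```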
Example: [I]
Consider the price-revenue example

Estimate   OLS estimate   Iteration #1   Iteration #2   Iteration #3
ρ̂          —              0.58           0.64           0.66
α̂          4262.12        3473.74        3310.85        3264.88
β̂          -89.58         -63.73         -58.37         -56.86

Cochrane-Orcutt estimates: $\hat{\alpha}^{CO} = 3264.88$, $\hat{\beta}^{CO} = -56.86$ (possibly further iterations)

215
Example: [II]
Contrasting the estimation results:

Parameter   OLS estimate   Hildreth-Lu   Cochrane-Orcutt
α̂           4262.12        3162.48       3264.88
β̂           -89.58         -53.51        -56.86

216