6. Heteroskedasticity (Violation of Assumption #B2)
1 6. Heteroskedasticity (Violation of Assumption #B2) Assumption #B2: The error term u_i has constant variance for i = 1,..., N, i.e. Var(u_i) = σ². Terminology: Homoskedasticity: constant variances of the u_i; Heteroskedasticity: non-constant variances of the u_i 100
2 Typical situations of heteroskedasticity: Cross-section data sets covering household and regional data Data with measurement errors following a trend Financial market data (exchange-rate, asset-price returns) Example: [I] Rents for (business) real-estate in distinct town quarters 101
3 Example: [II] Variables: y_i = rent for real-estate in quarter i (EUR/m²), x_i = distance to city center (in km). Single-regressor model: y_i = α + β x_i + u_i, i = 1,..., 12. [Data table: RENT and DISTANCE for the 12 quarters; values not preserved in the transcription] 102
4 Dependent Variable: RENT; Method: Least Squares; Sample: 1 12; Included observations: 12. [EViews output table: coefficients, standard errors, t-statistics and summary statistics for C and DISTANCE not preserved in the transcription] 103
5 Example: [III] Residual variation increases with the regressor ⇒ indication of heteroskedasticity. [Scatter plot: residuals vs. distance] 104
6 Issues: Consequences of heteroskedasticity; diagnostics (tests for heteroskedasticity); estimation and hypothesis-testing in the presence of heteroskedasticity (weighted OLS estimation, Aitken estimator) 105
7 6.1 Consequences Homoskedasticity vs. heteroskedasticity: [I] Consider the linear regression model y = Xβ + u. Homoskedasticity means that Cov(u) = σ² I_N = diag(σ²,..., σ²) (validity of Assumption #B3) 106
8 Homoskedasticity vs. heteroskedasticity: [II] Heteroskedasticity means that Cov(u) = diag(σ_1²,..., σ_N²) = σ² Ω, where Ω = diag(σ_1²/σ²,..., σ_N²/σ²) 107
9 Example: For the real-estate data set we could assume σ_i² = σ² x_i, that is, Ω = diag(x_1,..., x_N) ⇒ Ω can be determined directly from the data 108
10 Now: Central result (proof and derivations are given below) Theorem 6.1: (Consequences of heteroskedasticity) In the presence of heteroskedasticity the OLS estimator β̂ = (X'X)⁻¹X'y is unbiased. However, the OLS estimator β̂ is no longer BLUE, i.e., there is another linear and unbiased estimator of β with a smaller covariance matrix. 109
11 Proof of unbiasedness: We have β̂ = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + u) = β + (X'X)⁻¹X'u. It follows that E(β̂) = β + (X'X)⁻¹X' E(u). From Assumption #B1 we have E(u) = 0_{N×1} and thus E(β̂) = β (Assumption #B2 is not needed) 110
12 Now: Construction of a linear estimator of β that, in the presence of heteroskedasticity, is (1) unbiased and (2) more efficient than the OLS estimator β̂ = (X'X)⁻¹X'y. Procedure: [I] We transform the heteroskedastic model y = Xβ + u such that the parameter vector β remains unchanged, heteroskedasticity vanishes, and the transformed model y* = X*β + u* satisfies all #A-, #B-, #C-assumptions 111
13 Procedure: [II] New estimator of β: OLS estimator of the transformed model, β̂_GLS = [X*'X*]⁻¹X*'y*. Example: [I] Consider the single-regressor model y_i = α + β x_i + u_i (i = 1,..., N) (cf. our real-estate example), where Var(u_i) = σ_i² = σ² x_i 112
14 Example: [II] Transformation: y_i/√x_i = α (1/√x_i) + β (x_i/√x_i) + u_i/√x_i. Define y_i* ≡ y_i/√x_i, z_i* ≡ 1/√x_i, x_i* ≡ x_i/√x_i, u_i* ≡ u_i/√x_i ⇒ transformed model: y_i* = α z_i* + β x_i* + u_i* (multiple regression without intercept) 113
15 Example: [III] We have E(u_i*) = (1/√x_i) E(u_i) = 0 and Var(u_i*) = (1/√x_i)² Var(u_i) = (1/x_i) σ² x_i = σ² ⇒ model is homoskedastic 114
16 Summary: Transformed model is homoskedastic. Transformed model satisfies all #A-, #B-, #C-assumptions. y_i*, z_i*, x_i* can be obtained from the original (x_i, y_i)-values ⇒ OLS estimator of the transformed model is obtainable 115
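The √x_i-transformation of the single-regressor example can be sketched numerically. Everything below is simulated; the parameter values, sample size and the use of NumPy are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Simulate y_i = alpha + beta*x_i + u_i with Var(u_i) = sigma^2 * x_i,
# then estimate the transformed model y*_i = alpha*z*_i + beta*x*_i + u*_i by OLS.
rng = np.random.default_rng(0)
N, alpha, beta, sigma = 500, 10.0, -0.8, 1.5
x = rng.uniform(1.0, 9.0, N)                 # regressor (kept strictly positive)
u = rng.normal(0.0, sigma * np.sqrt(x))      # heteroskedastic errors: Var(u_i) = sigma^2 * x_i
y = alpha + beta * x + u

# Transformed variables: divide every term by sqrt(x_i)
s = np.sqrt(x)
y_star = y / s
Z = np.column_stack([1.0 / s, x / s])        # columns z*_i = 1/sqrt(x_i), x*_i = sqrt(x_i)

# OLS on the transformed model; z* plays the role of the (now non-constant) intercept column
coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
alpha_hat, beta_hat = coef
```

Since z_i* absorbs the intercept, the transformed regression is run without a separate constant, exactly as in the "multiple regression without intercept" on Slide 113.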
17 Generalization: [I] Consider the heteroskedastic model y = Xβ + u, where (cf. Slide 107) Cov(u) = E[uu'] = σ² Ω. All elements of the diagonal matrix Ω = diag(σ_1²/σ²,..., σ_N²/σ²) are positive 116
18 Generalization: [II] Ω is a positive definite matrix ⇒ there is at least one regular (N × N) matrix P such that P'P = Ω⁻¹ (cf. Econometrics I, Slide 49) 117
19 Generalization: [III] We transform the heteroskedastic model y = Xβ + u via the matrix P into Py = PXβ + Pu. Using the notation y* ≡ Py, X* ≡ PX, u* ≡ Pu, we obtain y* = X*β + u* 118
20 Generalization: [IV] For the vector u* of the transformed model we have E(u*) = E(Pu) = P E(u) = 0_{N×1} and Cov(u*) = E{[Pu − E(Pu)][Pu − E(Pu)]'} = E[Puu'P'] = P E[uu'] P' = σ² PΩP' = σ² I_N ⇒ transformed model is homoskedastic 119
21 Remark: The validity of PΩP' = I_N follows from the equation P'P = Ω⁻¹ after left-hand-side multiplication with PΩ and right-hand-side multiplication with P⁻¹: PΩP'PP⁻¹ = PΩΩ⁻¹P⁻¹ ⇒ PΩP' = PP⁻¹ = I_N 120
22 Summary: There always exists a transformation matrix P that transforms the heteroskedastic model y = Xβ + u with Cov(u) = σ² Ω into the homoskedastic model y* = X*β + u* with Cov(u*) = σ² I_N. The homoskedastic model satisfies all #A-, #B-, #C-assumptions. β remains unaffected by the transformation 121
23 Now: Potential estimator of β (cf. Slide 112): OLS estimator of the transformed model, β̂_GLS = [X*'X*]⁻¹X*'y* = [(PX)'PX]⁻¹(PX)'Py = [X'P'PX]⁻¹X'P'Py = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y 122
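The chain of equalities can be checked numerically. The sketch below uses simulated data and a diagonal Ω with arbitrary positive entries (both assumptions for illustration) to confirm that the GLS formula reproduces OLS on the P-transformed data:

```python
import numpy as np

# Verify: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y  equals  OLS on (Py, PX) with P'P = Omega^{-1}.
rng = np.random.default_rng(1)
N, K = 40, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, K))])
y = rng.normal(size=N)
omega_diag = rng.uniform(0.5, 3.0, N)        # diagonal entries sigma_i^2 / sigma^2 > 0
Omega_inv = np.diag(1.0 / omega_diag)

# GLS formula
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Transformation matrix: P = diag(1/sqrt(omega_i)) satisfies P'P = Omega^{-1}
P = np.diag(1.0 / np.sqrt(omega_diag))
Xs, ys = P @ X, P @ y
beta_ols_transformed, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
```

For diagonal Ω the matrix P is simply a rescaling of each observation by 1/σ_i (up to the factor σ), which is exactly the weighting used in the single-regressor example.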
24 Definition 6.2: (Generalized Least Squares estimator) The OLS estimator of β obtained from the transformed homoskedastic model, β̂_GLS = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y, is called the Generalized Least Squares (GLS) estimator (also: Aitken estimator). Theorem 6.3: (Properties of the GLS estimator) The GLS estimator β̂_GLS = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y is linear and unbiased. Its covariance matrix is given by Cov(β̂_GLS) = σ² [X'Ω⁻¹X]⁻¹. (Proof: see class) 123
25 Remark: Under homoskedasticity it follows that Ω = I_N and thus β̂_GLS = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y = [X'I_N X]⁻¹X'I_N y = [X'X]⁻¹X'y = β̂ (GLS and OLS estimators coincide) 124
26 Obviously: Both the GLS estimator β̂_GLS = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y and the OLS estimator β̂ = [X'X]⁻¹X'y are linear and unbiased estimators. Question: Which estimator is more efficient? 125
27 Answer: The transformed model y* = X*β + u* satisfies all #A-, #B-, #C-assumptions ⇒ the GLS estimator β̂_GLS = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y is BLUE of β (Gauß-Markov theorem) ⇒ the OLS estimator β̂ = [X'X]⁻¹X'y cannot be efficient 126
28 Question: What are the consequences of erroneously using the OLS estimator and its associated formulae for the standard errors in the presence of heteroskedasticity? Answer: We compare Cov(β̂) under heteroskedasticity (the true situation) with Cov(β̂) as computed under homoskedasticity (the untrue situation) 127
29 Comparison: [I] Under heteroskedasticity we have Cov(β̂) = E{[β̂ − E(β̂)][β̂ − E(β̂)]'} = E{[β̂ − β][β̂ − β]'} = E{(X'X)⁻¹X'u [(X'X)⁻¹X'u]'} = E{(X'X)⁻¹X'uu'X(X'X)⁻¹} = (X'X)⁻¹X' E[uu'] X(X'X)⁻¹ = σ² (X'X)⁻¹X'ΩX(X'X)⁻¹ 128
30 Comparison: [II] If we neglect heteroskedasticity we have Cov(β̂) = σ² (X'X)⁻¹. Similarly, in the estimation of σ² we have: Under heteroskedasticity: σ̂² = û*'û*/(N − K − 1) = (Pû)'(Pû)/(N − K − 1). Under neglect of heteroskedasticity: σ̂² = û'û/(N − K − 1) 129
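The two covariance formulas can be contrasted in a small sketch. The design matrix is simulated and the pattern σ_i² = σ² x_i is assumed, as in the real-estate example:

```python
import numpy as np

# True OLS covariance under heteroskedasticity (the "sandwich" form)
# versus the formula that wrongly assumes homoskedasticity.
rng = np.random.default_rng(2)
N = 30
X = np.column_stack([np.ones(N), rng.uniform(1.0, 9.0, N)])
sigma2 = 2.0
omega_diag = X[:, 1]                         # assumed pattern: sigma_i^2 = sigma^2 * x_i
XtX_inv = np.linalg.inv(X.T @ X)

# (X.T * omega_diag) scales observation i by omega_i, so the middle term is X' Omega X
cov_true = sigma2 * XtX_inv @ (X.T * omega_diag) @ X @ XtX_inv   # heteroskedasticity-aware
cov_naive = sigma2 * XtX_inv                                     # ignores heteroskedasticity
```

The two matrices differ whenever Ω ≠ I_N, which is why standard errors computed from the naive formula are misleading under heteroskedasticity.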
31 Summary: Under neglect of heteroskedasticity, estimation of β via the OLS estimator β̂ = (X'X)⁻¹X'y is unbiased, but inefficient. Under neglect of heteroskedasticity, the ordinary estimators of the covariance matrix Cov(β̂) and of the variance σ² of the error term are biased ⇒ statistics of t- and F-tests are based on biased estimators ⇒ t- and F-tests are very likely to be misleading 130
32 6.2 Diagnostics Now: Statistical tests for heteroskedasticity. Basic structure of all tests: H_0: Homoskedasticity vs. H_1: Heteroskedasticity 131
33 Consequence: Non-rejection of H_0 ⇒ OLS results are unsuspicious. Rejection of H_0 ⇒ problematic OLS results (cf. Section 6.1) ⇒ application of alternative estimation procedures (cf. Section 6.3) 132
34 Problem of all tests: Tests are based on different patterns of heteroskedasticity (e.g. σ_i² = σ² x_ki, σ_i² = σ² x_ki², etc.) ⇒ alternative tests for heteroskedasticity. 1. The Goldfeld-Quandt test (special case) Assumed pattern of heteroskedasticity: Variances of the u_i are split into two groups: σ_i² = σ_A² for all i belonging to group A (i ∈ A), σ_i² = σ_B² for all i belonging to group B (i ∈ B) 133
35 Hypothesis test: H_0: σ_A² = σ_B² versus H_1: σ_A² ≠ σ_B². Test statistic: [I] Notation: N_A is the number of observations in group A, N_B is the number of observations in group B, S_A = Σ_{i∈A} û_i² is the sum of squared residuals in group A, S_B = Σ_{i∈B} û_i² is the sum of squared residuals in group B 134
36 Test statistic: [II] Under Assumption #B4 it follows that [S_A/((N_A − K − 1) σ_A²)] / [S_B/((N_B − K − 1) σ_B²)] ~ F_{N_A−K−1, N_B−K−1}. Under H_0: σ_A² = σ_B² we have T = [S_A/(N_A − K − 1)] / [S_B/(N_B − K − 1)] ~ F_{N_A−K−1, N_B−K−1}. Reject H_0 at the significance level α if T ∈ [0, F_{N_A−K−1, N_B−K−1; α/2}] ∪ [F_{N_A−K−1, N_B−K−1; 1−α/2}, +∞) 135
37 Remarks: [I] We can also test the one-sided alternative H_1: σ_A² > σ_B² via the statistic T. The critical region of this test at the α-level is given by [F_{N_A−K−1, N_B−K−1; 1−α}, +∞). We test the reverse alternative H_1: σ_A² < σ_B² by interchanging the roles of the groups A and B 136
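A sketch of the Goldfeld-Quandt statistic with two simulated groups. Group sizes, error variances and the single regressor are illustrative assumptions; the critical value would come from an F table or an F-distribution routine:

```python
import numpy as np

# Two pre-specified groups A and B, K = 1 regressor; group B is built with a
# larger error variance, so T = [S_A/(N_A-K-1)] / [S_B/(N_B-K-1)] should be well below 1.
rng = np.random.default_rng(3)
nA, nB, K = 20, 20, 1
xA, xB = rng.uniform(0.0, 1.0, nA), rng.uniform(0.0, 1.0, nB)
uA = rng.normal(0.0, 1.0, nA)       # group A: standard deviation 1
uB = rng.normal(0.0, 3.0, nB)       # group B: standard deviation 3
yA, yB = 1.0 + 2.0 * xA + uA, 1.0 + 2.0 * xB + uB

def ssr(x, y):
    """Sum of squared OLS residuals of y on [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(resid @ resid)

T = (ssr(xA, yA) / (nA - K - 1)) / (ssr(xB, yB) / (nB - K - 1))
# Reject H0: sigma_A^2 = sigma_B^2 if T falls into the tails of the F_{18,18} distribution.
```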
38 Remarks: [II] The general Goldfeld-Quandt test can be used to test whether the σ_i²-values depend in a monotone way on a single exogenous variable x_ki (cf. Gujarati, 2003) 137
39 2. The Breusch-Pagan test Assumed pattern of heteroskedasticity: [I] Consider J exogenous variables z_1,..., z_J and J coefficients α_1,..., α_J. For i = 1,..., N consider the transformation h(α_1 z_1i + α_2 z_2i + ... + α_J z_Ji) with h: R → R⁺ satisfying the following properties: h is continuously differentiable and h(0) = 1 138
40 Assumed pattern of heteroskedasticity: [II] We assume that the variances of the u_i are given by σ_i² = σ² h(α_1 z_1i + α_2 z_2i + ... + α_J z_Ji). Example: For h: R → R⁺ with h(x) = exp(x) we have σ_i² = σ² exp(α_1 z_1i + α_2 z_2i + ... + α_J z_Ji) (multiplicative heteroskedasticity) 139
41 Heteroskedasticity test: Defining α ≡ [α_1 ... α_J]', the testing problem is H_0: α = 0_{J×1} versus H_1: α ≠ 0_{J×1}. Test statistic: [I] There is a test that does not depend on the function h (Breusch-Pagan test) 140
42 Test statistic: [II] Derivation (in its simplest form): Estimate the model y = Xβ + u by OLS. Compute the residuals û = y − ŷ = y − Xβ̂. Estimate the model û_i² = α_0 + α_1 z_1i + ... + α_J z_Ji + u_i (i = 1,..., N) and find the coefficient of determination R² 141
43 Test statistic: [III] We have T = N R² ~ (asymptotically) χ²_J (chi-square distribution with J degrees of freedom). Under H_0: α = 0_{J×1} the impact of z_1,..., z_J on û_i² should be equal to zero ⇒ decision rule: Reject H_0 at the α-level if T = N R² > χ²_{J; 1−α} 142
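The three steps can be sketched as follows. The data are simulated with Var(u_i) = σ² x_i, and z_i = x_i is taken as the only test variable (so J = 1); both choices are illustrative assumptions:

```python
import numpy as np

# Breusch-Pagan in its simplest form: OLS residuals, auxiliary regression of
# squared residuals on the z-variable, then T = N * R^2.
rng = np.random.default_rng(4)
N = 400
x = rng.uniform(1.0, 9.0, N)
u = rng.normal(0.0, np.sqrt(x))              # Var(u_i) = sigma^2 * x_i with sigma^2 = 1
y = 5.0 + 0.5 * x + u
X = np.column_stack([np.ones(N), x])

# Step 1: OLS residuals of the original model
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
e2 = resid**2

# Step 2: auxiliary regression of e2 on [1, z] with z = x; R^2 of that regression
fit = X @ np.linalg.lstsq(X, e2, rcond=None)[0]
R2 = 1.0 - np.sum((e2 - fit) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 3: test statistic, asymptotically chi-square with J = 1 degree of freedom;
# chi^2_{1; 0.95} is about 3.84, so large T rejects homoskedasticity
T = N * R2
```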
44 Remark: The Breusch-Pagan test is a Lagrange-Multiplier test (cf. the lecture Advanced Statistics ) 143
45 3. The White test Special feature of previous tests: Explicit structural form of heteroskedasticity. White test: Allows for entirely unknown patterns of heteroskedasticity; best-known heteroskedasticity test. Theoretical foundation: Eicker (1967), White (1980) 144
46 Preliminaries: [I] Covariance matrix of the OLS estimator under heteroskedasticity: Cov(β̂) = σ² (X'X)⁻¹X'ΩX(X'X)⁻¹ (cf. Slide 128). Question: Can we consistently estimate Cov(β̂) without any structural assumption on the σ_i²? (i.e. in the presence of heteroskedasticity of unknown form) Answer: Yes, see White (1980) 145
47 Preliminaries: [II] Consider the partitioning of the X matrix into its observation rows, X = [x_1' ; x_2' ; ... ; x_N'], where x_i' = [1 x_1i x_2i ... x_Ki] (the first column of X is a column of ones). Estimate the model y = Xβ + u by OLS 146
48 Preliminaries: [III] Compute the residuals û = y − ŷ = y − Xβ̂. A consistent estimator of Cov(β̂) under heteroskedasticity of unknown form is given by Ĉov(β̂) = (X'X)⁻¹ [Σ_{i=1}^N û_i² x_i x_i'] (X'X)⁻¹ 147
49 Definition 6.4: (Heteroskedasticity-robust standard errors) The standard errors of the OLS estimator β̂ = (X'X)⁻¹X'y, which are given by the square roots of the diagonal elements of the estimated covariance matrix Ĉov(β̂) = (X'X)⁻¹ [Σ_{i=1}^N û_i² x_i x_i'] (X'X)⁻¹, are called heteroskedasticity-robust standard errors or White standard errors. 148
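A minimal sketch of the White covariance estimator on simulated single-regressor data (all numbers are assumptions), compared with the conventional homoskedasticity-based standard errors:

```python
import numpy as np

# Heteroskedasticity-robust (White) standard errors:
# Cov_hat = (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
rng = np.random.default_rng(5)
N = 200
x = rng.uniform(1.0, 9.0, N)
u = rng.normal(0.0, 0.5 * x)                 # heteroskedastic errors
y = 3.0 - 0.4 * x + u
X = np.column_stack([np.ones(N), x])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * e[:, None] ** 2).T @ X           # sum_i e_i^2 x_i x_i'
cov_white = XtX_inv @ meat @ XtX_inv
se_white = np.sqrt(np.diag(cov_white))

# Conventional (homoskedasticity-based) standard errors for comparison
sigma2_hat = e @ e / (N - 2)
se_naive = np.sqrt(np.diag(sigma2_hat * XtX_inv))
```

With the assumed variance pattern the two sets of standard errors differ visibly, which is the point of reporting White standard errors in empirical studies.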
50 Remarks: White standard errors are available in EViews. White standard errors should be reported in empirical studies 149
51 [Two EViews outputs, Slide 150: OLS regression of RENT on C and DISTANCE (Sample: 1 12, 12 observations), once with ordinary standard errors and once with "White Heteroskedasticity-Consistent Standard Errors & Covariance"; the numerical values were not preserved in the transcription] 150
52 Now: White test for heteroskedasticity of unknown form. Basis of the test: [I] Comparison of the estimated covariance matrices of the OLS estimator β̂ = (X'X)⁻¹X'y under homoskedasticity: Ĉov(β̂) = σ̂² (X'X)⁻¹, where σ̂² = û'û/(N − K − 1); under heteroskedasticity: Ĉov(β̂) = (X'X)⁻¹ [Σ_{i=1}^N û_i² x_i x_i'] (X'X)⁻¹ (the estimated White covariance matrix) 151
53 Basis of the test: [II] Under homoskedasticity (H_0) both estimators should not differ substantially ⇒ test statistic of the White test 152
54 Test statistic of the White test: [I] Estimate the model y = Xβ + u by OLS and compute the residuals û = y − ŷ = y − Xβ̂. Use the squared residuals û_i², the exogenous variables x_1i,..., x_Ki, their squared values x_1i²,..., x_Ki² and all cross products x_ki x_li (k, l = 1,..., K, k ≠ l) to specify the model û_i² = γ_0 + γ_1 x_1i + ... + γ_K x_Ki + Σ_{k=1}^K Σ_{l=k}^K δ_kl x_ki x_li + u_i 153
55 Test statistic of the White test: [II] Estimate this model by OLS and find the coefficient of determination R². Under H_0 we have T = N R² ~ (asymptotically) χ²_{K(K+1)}. Reject H_0 at the significance level α if T > χ²_{K(K+1); 1−α} 154
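A sketch of the auxiliary regression for K = 2 simulated regressors (parameter values assumed for illustration). The auxiliary design contains the constant, the levels, the squares and the cross product:

```python
import numpy as np

# White test: regress squared OLS residuals on levels, squares and cross
# products of the regressors; the statistic is T = N * R^2.
rng = np.random.default_rng(6)
N = 400
x1 = rng.uniform(1.0, 5.0, N)
x2 = rng.uniform(1.0, 5.0, N)
u = rng.normal(0.0, x1)                      # error variance depends on x1
y = 1.0 + x1 - x2 + u
X = np.column_stack([np.ones(N), x1, x2])

# OLS residuals of the original model
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
e2 = e**2

# Auxiliary design: constant, x1, x2, x1^2, x2^2, x1*x2
Z = np.column_stack([np.ones(N), x1, x2, x1**2, x2**2, x1 * x2])
fit = Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
R2 = 1.0 - np.sum((e2 - fit) ** 2) / np.sum((e2 - e2.mean()) ** 2)
T = N * R2                                   # compare with a chi-square critical value
```

With the strong assumed heteroskedasticity, T lands far beyond typical chi-square critical values, so the test rejects homoskedasticity here.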
56 Example: Test the data set on Slide 102 for heteroskedasticity via the Goldfeld-Quandt test, the Breusch-Pagan test and the White test (see class). Interesting question: Which test should be preferred? 155
57 Remarks: The White test is the most general test, but often has low power. Whenever we realistically conjecture an explicit pattern of heteroskedasticity (e.g. σ_i² = σ² x_ki²), we should use the alternative tests (Goldfeld-Quandt test, Breusch-Pagan test). Use graphical tools to analyze residuals in order to detect potential patterns of heteroskedasticity 156
58 6.3 Feasible Estimation Procedures Result from Section 6.1: For the heteroskedastic model y = Xβ + u, Cov(u) = E[uu'] = σ² Ω, the GLS estimator β̂_GLS = [X'Ω⁻¹X]⁻¹X'Ω⁻¹y is BLUE of β (cf. Slide 126) 157
59 Problem: Frequently, the diagonal matrix Ω is not known ⇒ the GLS estimator β̂_GLS cannot be computed. Remedy: Replace the unknown (true) Ω by an unbiased and/or consistent estimate Ω̂ ⇒ feasible GLS estimator (FGLS) 158
60 Definition 6.5: (FGLS estimator) Let Ω̂ be an unbiased and/or consistent estimator of the unknown covariance matrix Ω of the heteroskedastic model y = Xβ + u, Cov(u) = σ² Ω. The estimator β̂_FGLS = [X'Ω̂⁻¹X]⁻¹X'Ω̂⁻¹y is called the feasible generalized least squares estimator of β. 159
61 Example: [I] Consider the data set on Slide 102. Classify the u_i variances for the central quarters (i = 1,..., 5) and the periphery quarters (i = 6,..., 12). Consider the following model: y_i = α + β x_i + u_i, where σ_i² = σ_A² for i = 1,..., 5 and σ_i² = σ_B² for i = 6,..., 12 160
62 Example: [II] Transformation of the model: y_i/σ_A = α (1/σ_A) + β (x_i/σ_A) + u_i/σ_A for i = 1,..., 5 and y_i/σ_B = α (1/σ_B) + β (x_i/σ_B) + u_i/σ_B for i = 6,..., 12 ⇒ variances of the error terms: Var(u_i/σ_A) = (1/σ_A²) Var(u_i) = σ_A²/σ_A² = 1 for i = 1,..., 5 and Var(u_i/σ_B) = (1/σ_B²) Var(u_i) = σ_B²/σ_B² = 1 for i = 6,..., 12 161
63 Example: [III] Summary: y_i* = α z_i* + β x_i* + u_i* for i = 1,..., 12, where y_i* = y_i/σ_A, z_i* = 1/σ_A, x_i* = x_i/σ_A, u_i* = u_i/σ_A for i = 1,..., 5 and y_i* = y_i/σ_B, z_i* = 1/σ_B, x_i* = x_i/σ_B, u_i* = u_i/σ_B for i = 6,..., 12 162
64 Example: [IV] The variances σ_A², σ_B² are unknown ⇒ estimation of the variances via the respective regressions y_i = α + β x_i + u_i for i = 1,..., 5 and for i = 6,..., 12, with the respective estimators σ̂² = û'û/(N − K − 1) 163
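The grouped FGLS procedure of this example can be sketched as follows. The data below are simulated stand-ins for the 12-quarter sample; group sizes, parameters and error variances are illustrative assumptions:

```python
import numpy as np

# Grouped FGLS: estimate sigma_A^2 and sigma_B^2 from separate OLS fits,
# rescale every term in each group, then re-estimate by OLS without intercept.
rng = np.random.default_rng(7)
nA, nB = 40, 60
xA, xB = rng.uniform(0.0, 2.0, nA), rng.uniform(2.0, 10.0, nB)
yA = 12.0 - 0.7 * xA + rng.normal(0.0, 0.5, nA)   # group A: sigma_A = 0.5
yB = 12.0 - 0.7 * xB + rng.normal(0.0, 2.0, nB)   # group B: sigma_B = 2.0

def fit_sigma2(x, y):
    """OLS fit of y on [1, x]; return the residual-variance estimate u'u/(n-2)."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r / (len(y) - 2))

sA = np.sqrt(fit_sigma2(xA, yA))                  # estimated sigma_A
sB = np.sqrt(fit_sigma2(xB, yB))                  # estimated sigma_B

# Rescale each group by its estimated standard deviation and pool the data
y_star = np.concatenate([yA / sA, yB / sB])
Z = np.column_stack([np.concatenate([np.ones(nA) / sA, np.ones(nB) / sB]),
                     np.concatenate([xA / sA, xB / sB])])
coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
alpha_fgls, beta_fgls = coef
```

As in the slides, the rescaled intercept column z_i* replaces the constant, so the pooled regression is run without a separate intercept.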
65 [Two EViews outputs, Slide 164: OLS regressions of RENT on C and DISTANCE for the subsamples 1-5 (5 observations) and 6-12 (7 observations); the numerical values were not preserved in the transcription] 164
66 Example: [V] Estimated variances and standard deviations σ̂_A², σ̂_A and σ̂_B², σ̂_B (numerical values not preserved in the transcription) ⇒ (estimated) transformation of the model (cf. Slides 161, 162) ⇒ FGLS estimates of the coefficients α̂_FGLS, β̂_FGLS and estimate of the error-term variance of the transformed model, Var(u_i*) 165
67 [EViews output, Slide 166: OLS regression of the transformed variable RENT_TR on Z_TR and DISTANCE_TR (Sample: 1 12, 12 observations, no intercept); the numerical values were not preserved in the transcription] 166
68 Example: [VI] However, we know that Var(u_i*) = 1 ⇒ corrected standard errors of α̂_FGLS, β̂_FGLS (via the covariance matrix σ² (X*'X*)⁻¹ of the OLS estimator): SE(α̂_FGLS) = 0.243, SE(β̂_FGLS) = [value not preserved] ⇒ corrected t-values of α̂_FGLS and β̂_FGLS [values not preserved] 167
69 Properties of FGLS estimators: FGLS estimators are unbiased. Variances of FGLS estimators are lower than the variances of the OLS estimators. FGLS estimators are asymptotically efficient (approximation to the GLS estimator for N → ∞) 168
More informationAlgebra 1 Course Title
Algebra 1 Course Title Course- wide 1. What patterns and methods are being used? Course- wide 1. Students will be adept at solving and graphing linear and quadratic equations 2. Students will be adept
More informationANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES
Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-26-32 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More informationUniversity of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.
University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording
More informationChapter 1 Introduction. 1.1 Introduction
Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations
More informationNotes on Applied Linear Regression
Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationFrom the help desk: Swamy s random-coefficients model
The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationThese axioms must hold for all vectors ū, v, and w in V and all scalars c and d.
DEFINITION: A vector space is a nonempty set V of objects, called vectors, on which are defined two operations, called addition and multiplication by scalars (real numbers), subject to the following axioms
More informationPartial Fractions. Combining fractions over a common denominator is a familiar operation from algebra:
Partial Fractions Combining fractions over a common denominator is a familiar operation from algebra: From the standpoint of integration, the left side of Equation 1 would be much easier to work with than
More informationMATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationPerforming Unit Root Tests in EViews. Unit Root Testing
Página 1 de 12 Unit Root Testing The theory behind ARMA estimation is based on stationary time series. A series is said to be (weakly or covariance) stationary if the mean and autocovariances of the series
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationOrthogonal Diagonalization of Symmetric Matrices
MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationRegression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013
Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear
More informationTime Series Analysis
Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos
More informationOn Marginal Effects in Semiparametric Censored Regression Models
On Marginal Effects in Semiparametric Censored Regression Models Bo E. Honoré September 3, 2008 Introduction It is often argued that estimation of semiparametric censored regression models such as the
More informationWhat drove Irish Government bond yields during the crisis?
What drove Irish Government bond yields during the crisis? David Purdue and Rossa White, September 2014 1. Introduction The Irish Government bond market has been exceptionally volatile in the seven years
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationPremaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
More informationPredict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationMonitoring Structural Change in Dynamic Econometric Models
Monitoring Structural Change in Dynamic Econometric Models Achim Zeileis Friedrich Leisch Christian Kleiber Kurt Hornik http://www.ci.tuwien.ac.at/~zeileis/ Contents Model frame Generalized fluctuation
More informationindividualdifferences
1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,
More information10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method
578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationDeterminants of Stock Market Performance in Pakistan
Determinants of Stock Market Performance in Pakistan Mehwish Zafar Sr. Lecturer Bahria University, Karachi campus Abstract Stock market performance, economic and political condition of a country is interrelated
More informationSTATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239
STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationForecasting Using Eviews 2.0: An Overview
Forecasting Using Eviews 2.0: An Overview Some Preliminaries In what follows it will be useful to distinguish between ex post and ex ante forecasting. In terms of time series modeling, both predict values
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More information