Quantile Regression under misspecification, with an application to the U.S. wage structure

Size: px

Start display at page:

Download "Quantile Regression under misspecification, with an application to the U.S. wage structure"

Merry Watkins
10 years ago
Views:

1 Quantile Regression under misspecification, with an application to the U.S. wage structure Angrist, Chernozhukov and Fernandez-Val Reading Group Econometrics November 2, 2010

2 Intro: initial problem The paper starts with OLS: under misspecification, OLS minimizes the mean-squared error linear approximation. Suppose the true model is Y i = g(x i ) + ε i OLS estimation minimizes (β) 2 = (g(x i ) X i β) 2 The first question of the paper is : What are the properties of quantile regression (QR) when the model is misspecified?

Suppose the true model is Y i = g(x i ) + ε i OLS estimation minimizes (β) 2 = (g(x i )

3 Intro: what do they do? ACFV show that QR estimation is equivalent to the minimization of a weighted mean-squared error linear approximation. This allows : To decompose the contribution of covariates in the estimation. To interpret partial regression and to investigate omitted variable bias. ACFV also derive distribution theory of quantile process.

4 Framework and notations Conditional quantile function: It solves: with the check function: Q τ (Y X ) = inf{y : F Y (y X ) τ} Q τ (Y X ) arg min q(x ) E[ρ τ (Y q(x ))] ρ τ (u)(τ 1(u 0))u Restricting to linear functions : q(x ) = X β, the parameter vector solves: β(τ) = arg min E[ρ τ (Y X β)] β

q(x ) E[ρ τ (Y q(x ))] ρ τ (u)(τ 1(u 0))u Restricting to linear functions

5 Framework and notations The QR specification error is then : τ (X, β) = X β Q τ (Y X ) The deviation from the true quantile specification is : ε τ = Y Q τ (Y X ) with density f ετ (e X ).

6 First result (Theorem 1) Under assumptions : (i) f Y (y X ) exists. (ii) E(Y ), E(Q τ (Y X )) and E X are finite (iii) β(τ) is a unique QR estimator Then β(τ) solves: with : min β E[w τ (X, β). τ (X, β) 2 ] w τ (X, β) = = (1 u)f ετ (u τ (X, β) X )du (1 u)f Y (ux β + (1 u)q τ (Y X ) X )du

7 Theorem 1 : interpretation of weights w τ (X, β) = 1 0 (1 u)f Y [ux β + (1 u)q τ (Y X ) X ]du ux β + (1 u)q τ (Y X ) describes a line between the approximation point and the true one. Weights are thus a weighted average of the density of Y in this segment. More weight is given to points of the segment close to the true value of the conditional quantile. ACFV refer to w τ (X, β) as importance weights. Overall weights are given by (π(x) is the density of X): w τ (x, β) π(x) Weights can be approximated by : w τ (X, β) = 1/2f Y [Q τ (Y X ) X ] + ϱ τ (X ) If f y is locally almost constant on the segment, ϱ τ (X ) 0 (more importance is given where f Y is large).

More weight is given to points of the segment close to the true value of the conditional quantile. ACFV refer to w τ (X, β) as importance weights.

8 Second result (Theorem 2) Under assumptions : (i) f Y (y X ) exists. E(Y ) (ii) E(Y ), E(Q τ (Y X )) and E X are finite (iii) β(τ) is a unique QR estimator Then β(τ) uniquely solves: with : β(τ) = arg min β E[ w τ (X, β(τ)). τ (X, β) 2 ] w τ (X, β(τ)) = 1/2 = 1/ f ετ (u τ (X, β(τ)) X )du f Y (ux β(τ) + (1 u)q τ (Y X ) X )du

estimator Then β(τ) uniquely solves: with : β(τ) = arg min β E[ w τ (X, β(τ)).

9 Theorem 2 : Implications Weights do not depend on β anymore. QR solves a weighted least square approximation. As before, weights can be approximated (under smoothness assumption and if f is locally constant): w τ (X, β(τ)) w τ (X, β(τ)) 1/2f Y [Q τ (Y X ) X ]

As before, weights can be approximated (under smoothness

10 Example from US census A. τ = 0.10 B. τ = 0.50 C. τ = 0.90 Log earnings CQ QR MD Log earnings CQ QR MD Log earnings CQ QR MD Schooling Schooling Schooling D. τ = 0.10 E. τ = 0.50 F. τ = 0.90 Weight QR weights Imp. weights Den. weights Weight QR weights Imp. weights Den. weights Weight QR weights Imp. weights Den. weights Schooling Schooling Schooling

τ = 0.90 Weight 0.0 0.1 0.2 0.3 0.4 0.5 QR weights Imp. weights Den.

11 Example from US census Minimum distance (MD) estimator is obtained as in Chamberlain (1994) as: β(τ) = arg min β E[ τ (X, β) 2 ] Differences in slope come from difference in weights. Importance weights and densities approximations are very close: w τ (X, β(τ)) 1/2f Y [Q τ (Y X ) X ]

in slope come from difference in weights.

12 Example from US census TABLE I COMPARISON OF CQF AND QR-BASED INTERQUANTILE SPREADS Interquantile Spread Census Obs. CQ QR CQ QR CQ QR A. Overall , , , B. High school graduates , , , C. College graduates , , , Notes: The sample consist of U.S.-born black and white men aged The table shows average measures calculated using the distribution of the covariates in each year. The covariates are schooling, race, and a quadratic function of experience. Sampling weights were used for the 2000 Census.

76 2000 25,963 1.29 1.32 0.59 0.60 0.70 0.72 C. College graduates 1980 7,158 1.26 1.19 0.61 0.54 0.65 0.64 1990 15,517 1.44 1.38 0.70 0.66 0.74 0.72 2000 19,388 1.55 1.57 0.75 0.80 0.

13 Partial Quantile Regression Since QR solve a weighted least square approximation, ACFV use it to derive a partial regression framework and omitted variable bias formula. Suppose X = (X 1, X 2 ). Define β (τ) = (β 1 (τ), β 2 (τ)) Partialling out is obtained by computing WLS of Q τ (Y X ) and X 1 on X 2 : Q τ (Y X ) = X 2 π Q + q τ (Y X ) X1 = X 2 π 1 + V 1 Weights are wτ (X ) = w τ (X, β(τ)) In a second step, β 1 (τ) is obtained minimizing the weighted sum of squares of errors: β 1 (τ) = arg min β 1 E[ w τ (X ).(q τ (Y X ) V 1 β 1 ) 2 ] or β 1 (τ) = arg min β 1 E[ w τ (X ).(Q τ (Y X ) V 1 β 1 ) 2 ]

Define β (τ) = (β 1 (τ), β 2 (τ)) Partialling out is obtained by computing WLS of Q τ (Y X ) and X 1 on X 2 : Q τ (Y X ) = X 2 π Q + q τ (Y X ) X1 = X

14 Omitted Variable Bias Using these results, we can derive omitted variable bias formula. Suppose we do not observe X 2. We obtain γ 1 (τ) as : γ 1 (τ) = arg min γ 1 E[ρ τ (Y X 1 γ 1 )] That can be compared to the results for β 1 (τ) in : (β 1 (τ), β 2 (τ)) = arg min β 1,β 2 E[ρ τ (Y X 1β 1 X 2β 2 )] The difference between the two is given by: γ 1 (τ) = β 1 (τ) + (E[ w τ (X )X 1X 1 ]) 1 E[ w τ (X )X 1R τ (X )]

We obtain γ 1 (τ) as : γ 1 (τ) = arg min γ 1 E[ρ τ (Y X 1 γ 1 )] That can be compared to the results

15 Omitted Variable Bias The bias, (E[ w τ (X )X 1 X 1]) 1 E[ w τ (X )X 1 R τ (X )], depends on : w τ (X ) = 1/2 1 0 f ε τ (u. τ (X, γ 1 (τ)) X )du τ (X, γ 1 (τ)) = X 1 γ 1 Q τ (Y X ) ε τ = Y Q τ (Y X ) R τ (X ) = Q τ (Y X ) X 1 β 1 Bias is similar to OLS omitted variable bias.

16 Sampling Properties under misspecification How does misspecification affects inference? As we saw before, QR consistently estimates the approximation of conditional quantile function What about inference on this approximation? ACFL take into account the whole QR process : Allows to infer on several quantiles simultaneously Thus define ˆβ(.) as the set of estimators ˆβ(τ), τ T a close subset of ]0, 1[

17 Theorem 3 Under assumptions : (i) (Y i, X i ) are iid (ii) f Y (y X ) exists, is bounded and uniformly continuous (iii) τ J(τ) = E[f Y (X β(τ) X )X X ] > 0 (iv) E X 2+ε < for some ε > 0 Then : Where : sup ˆβ(τ) β(τ) = o p (1) (1) τ T J(.) n( ˆβ(.) β(.)) z() N (0, Σ(τ, τ )) z(.) (2) n Σ(τ, τ ) = E(z(τ), z(τ )) = E [ (τ 1{Y < X β(τ)})(τ 1{Y < X β(τ )})X X ]

some ε > 0 Then : Where : sup ˆβ(τ) β(τ) = o p (1) (1) τ T J(.) n( ˆβ(.) β(.)) z() N (0, Σ(τ, τ )) z(.

18 Theorem 3 Σ(τ, τ ) = E [ (τ 1{Y < X β(τ)})(τ 1{Y < X β(τ )})X X ] Under the assumption that the model is well specified, Q τ (Y X ) = X β(τ), the Covariance matrix is: Σ 0 (τ, τ ) = [min(τ, τ ) ττ ]E [ X X ] Joint normality is obtained even with misspecification However, misspecification affects the form of the covariance-matrix. This raises difficulties to make inference.

[min(τ, τ ) ττ ]E [ X X ] Joint normality is obtained even with misspecification However,

19 Testing hypotheses ACFV provide a method to test hypotheses of the form: H 0 : R(τ)β(τ) = r(τ) τ T They use a Kolmogorov statistic (using consistent estimates of Σ 0 (τ, τ ) and J(.)). Under misspecification, this statistic has no well-behaved distribution Bootstrap is used to find the quantiles of this statistic.

20 Schooling Coefficient (%) Schooling Coefficient (%) Schooling Coefficient (%) Example from US census A B C % Robust Uniform CI 95% Robust Pointwise CI 95% Robust Uniform CI 95% Robust Pointwise CI 95% Robust Uniform CI 95% Robust Pointwise CI Quantile Index Quantile Index Quantile Index

2000 95% Robust Uniform CI 95% Robust Pointwise CI 95% Robust Uniform CI 95% Robust Pointwise CI

21 Example from US census

22 Log earnings Log earnings Log earnings Example from US census A. Without Controls B. With controls (at covariate means) C. With controls (at 1980 covariate means) Quantile Index Quantile Index Quantile Index

23 Example from US census robust 95% simultaneous confidence intervals are computed by subsampling More heterogeneity in 1990 and 2000 than in 1980: Wage dispersion increases with schooling More increase in the second half...

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS