Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure
Angrist, Chernozhukov and Fernandez-Val
Reading Group Econometrics, November 2, 2010
Intro: initial problem

The paper starts from OLS: under misspecification, OLS delivers the best linear approximation to the true regression function in mean-squared error. Suppose the true model is

Y_i = g(X_i) + ε_i, with E[ε_i | X_i] = 0

Then the OLS estimand solves

β = argmin_b E[(g(X_i) − X_i'b)²]

The first question of the paper: what are the analogous properties of quantile regression (QR) when the model is misspecified?
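This approximation property is easy to check numerically. The sketch below is a minimal simulation (the nonlinear g and all parameter values are assumed for illustration): OLS on the noisy outcome Y recovers, up to sampling error, the same coefficients as a least-squares fit of the true conditional mean g(x) on the regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0, 4, n)
g = np.exp(0.5 * x)              # assumed nonlinear conditional mean g(x)
y = g + rng.normal(size=n)       # Y = g(x) + eps, with E[eps | x] = 0

X = np.column_stack([np.ones(n), x])

# OLS of the observed outcome Y on (1, x)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Least-squares fit of the *true* mean g(x) on (1, x)
beta_approx = np.linalg.lstsq(X, g, rcond=None)[0]

# Under misspecification the two coincide up to sampling noise
print(beta_ols, beta_approx)
```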
Intro: what do they do?

ACFV show that QR estimation is equivalent to minimizing a weighted mean-squared error of the linear approximation to the conditional quantile function. This allows them:
- to decompose the contribution of covariates in the estimation;
- to interpret partial quantile regression and to investigate omitted variable bias.
ACFV also derive distribution theory for the QR process.
Framework and notation

The conditional quantile function is

Q_τ(Y|X) = inf{y : F_Y(y|X) ≥ τ}

It solves

Q_τ(Y|X) ∈ argmin_{q(X)} E[ρ_τ(Y − q(X))]

with the check function

ρ_τ(u) = (τ − 1(u ≤ 0)) u

Restricting to linear functions q(X) = X'β, the parameter vector solves

β(τ) = argmin_β E[ρ_τ(Y − X'β)]
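The linear QR problem can be solved directly by minimizing the sample analogue of E[ρ_τ(Y − X'β)]. A minimal sketch, on an assumed simulated location model (not the paper's data), recovers the τ-quantile line Q_τ(Y|x) = 1 + 2x + z_τ:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rho(u, tau):
    """Check function: rho_tau(u) = (tau - 1(u <= 0)) * u."""
    return (tau - (u <= 0)) * u

rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(0, 2, n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # location model: Q_tau(Y|x) = 1 + 2x + z_tau
X = np.column_stack([np.ones(n), x])

tau = 0.75
beta_hat = minimize(lambda b: rho(y - X @ b, tau).mean(), np.zeros(2),
                    method="Nelder-Mead",
                    options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 5000}).x
print(beta_hat)   # intercept ~ 1 + norm.ppf(0.75), slope ~ 2
```

Any convex-optimization routine works here; Nelder-Mead is used only because the check function is not differentiable at zero.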
Framework and notation

The QR specification error is

Δ_τ(X, β) = X'β − Q_τ(Y|X)

The deviation of Y from the true conditional quantile is

ε_τ = Y − Q_τ(Y|X), with conditional density f_{ε_τ}(e|X)
First result (Theorem 1)

Under the assumptions:
(i) f_Y(y|X) exists;
(ii) E[Y], E[Q_τ(Y|X)] and E‖X‖ are finite;
(iii) β(τ) uniquely solves the QR problem.
Then β(τ) solves

min_β E[w_τ(X, β) · Δ_τ(X, β)²]

with

w_τ(X, β) = ∫₀¹ (1 − u) f_{ε_τ}(u Δ_τ(X, β) | X) du
          = ∫₀¹ (1 − u) f_Y(u X'β + (1 − u) Q_τ(Y|X) | X) du
Theorem 1: interpretation of the weights

w_τ(X, β) = ∫₀¹ (1 − u) f_Y(u X'β + (1 − u) Q_τ(Y|X) | X) du

As u runs over [0, 1], u X'β + (1 − u) Q_τ(Y|X) traces the segment between the approximation point and the true conditional quantile. The weight is thus an average of the density of Y along this segment, with more mass on points of the segment close to the true conditional quantile. ACFV refer to w_τ(X, β) as importance weights.

The overall weights are w_τ(x, β) π(x), where π(x) is the density of X.

The weights can be approximated by

w_τ(X, β) = ½ f_Y(Q_τ(Y|X) | X) + ϱ_τ(X)

If f_Y is locally almost constant on the segment, ϱ_τ(X) ≈ 0: more importance is given where f_Y is large.
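When f_Y(·|x) is known, the weight integral can be evaluated by quadrature. A minimal sketch, under an assumed Gaussian model with τ = 0.5 and an illustrative evaluation point x₀ = 1, compares the exact integral with the ½ f_Y(Q_τ(Y|X)|X) approximation when the approximation point sits close to the true quantile:

```python
import numpy as np
from scipy.stats import norm

# Assumed model for illustration: Y | x ~ N(mu(x), 1), so f_Y(.|x) is known
mu = lambda x: 1.0 + 0.5 * x**2
tau = 0.5
q_true = lambda x: mu(x) + norm.ppf(tau)            # Q_tau(Y|x)

def importance_weight(x, xb, n_grid=4000):
    """w_tau(x, beta) = int_0^1 (1-u) f_Y(u*xb + (1-u)*Q_tau(Y|x) | x) du."""
    u = (np.arange(n_grid) + 0.5) / n_grid          # midpoint rule on (0, 1)
    points = u * xb + (1 - u) * q_true(x)           # segment between xb and Q_tau
    fy = norm.pdf(points, loc=mu(x), scale=1.0)
    return ((1 - u) * fy).mean()

x0 = 1.0
w = importance_weight(x0, q_true(x0) + 0.05)        # approximation close to truth
approx = 0.5 * norm.pdf(q_true(x0), loc=mu(x0), scale=1.0)
print(w, approx)   # nearly equal: f_Y is almost constant on the short segment
```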
Second result (Theorem 2)

Under the assumptions:
(i) f_Y(y|X) exists;
(ii) E[Y], E[Q_τ(Y|X)] and E‖X‖ are finite;
(iii) β(τ) uniquely solves the QR problem.
Then β(τ) uniquely solves

β(τ) = argmin_β E[w̄_τ(X, β(τ)) · Δ_τ(X, β)²]

with

w̄_τ(X, β(τ)) = ½ ∫₀¹ f_{ε_τ}(u Δ_τ(X, β(τ)) | X) du
             = ½ ∫₀¹ f_Y(u X'β(τ) + (1 − u) Q_τ(Y|X) | X) du
Theorem 2: implications

The weights no longer depend on the free parameter β: QR solves a weighted least-squares approximation problem. As before, the weights can be approximated (under a smoothness assumption, when f_Y is locally constant):

w̄_τ(X, β(τ)) ≈ ½ f_Y(Q_τ(Y|X) | X)
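Theorem 2 can be illustrated numerically when the true CQF and f_Y are known. In the sketch below (an assumed simulation with Y|x ~ N(x², 1) and τ = 0.5, not the paper's data), the QR coefficients from check-function minimization approximately coincide with weighted least squares of the true quantile x² on (1, x), using the Theorem 2 weights evaluated at the QR solution:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 20_000
x = rng.uniform(0, 2, n)
q = x**2                                  # true CQF at tau = 0.5: Q_0.5(Y|x) = x^2
y = q + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
tau = 0.5

# 1. QR by direct check-function minimization
rho = lambda u: (tau - (u <= 0)) * u
beta_qr = minimize(lambda b: rho(y - X @ b).mean(), np.zeros(2),
                   method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 5000}).x

# 2. Theorem 2 weights at beta_qr:
#    w(x) = 0.5 * int_0^1 f_Y(u x'b + (1-u) q(x) | x) du, with f_Y(.|x) = N(x^2, 1)
u = (np.arange(200) + 0.5) / 200          # midpoint grid on (0, 1)
seg = np.outer(X @ beta_qr, u) + np.outer(q, 1 - u)
w = 0.5 * norm.pdf(seg - q[:, None]).mean(axis=1)

# 3. Weighted least squares of the true quantile q(x) on X with these weights
A = X * w[:, None]
beta_wls = np.linalg.solve(X.T @ A, A.T @ q)
print(beta_qr, beta_wls)   # approximately equal, as Theorem 2 predicts
```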
Example from US census

Figure: log earnings against schooling, comparing the conditional quantile function (CQ), QR, and minimum-distance (MD) fits for τ = 0.10, 0.50, 0.90 (panels A-C), and the corresponding QR, importance, and density weights (panels D-F).
Example from US census

The minimum-distance (MD) estimator is obtained as in Chamberlain (1994):

β*(τ) = argmin_β E[Δ_τ(X, β)²]

Differences in slope between QR and MD come from differences in weights. The importance weights and their density approximation are very close:

w̄_τ(X, β(τ)) ≈ ½ f_Y(Q_τ(Y|X) | X)
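The gap between MD and QR can be reproduced in a toy simulation (an assumed design, not the census data): with heteroskedastic errors the density weights vary with x, so QR tilts the fit toward the region where f_Y is tallest, while MD fits the CQF with equal weight everywhere.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 200_000
x = rng.uniform(0, 2, n)
q = x**2                               # true median function: Q_0.5(Y|x) = x^2
sigma = 0.2 + 0.9 * x                  # heteroskedasticity: density tallest at small x
y = q + sigma * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
tau = 0.5

# Minimum distance (Chamberlain 1994): unweighted LS fit of the true CQF on X
beta_md = np.linalg.lstsq(X, q, rcond=None)[0]

# QR: check-function minimization on the data
rho = lambda u: (tau - (u <= 0)) * u
beta_qr = minimize(lambda b: rho(y - X @ b).mean(), np.zeros(2),
                   method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 5000}).x

print(beta_md[1], beta_qr[1])   # MD slope ~ 2; QR slope is smaller
```

The QR slope is pulled below the MD slope because the weights emphasize small x, where the quadratic CQF is flat.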
Example from US census

TABLE I: Comparison of CQF and QR-based interquantile spreads

                               90-10         90-50         50-10
Census            Obs.       CQ    QR      CQ    QR      CQ    QR
A. Overall
1980            65,023     1.20  1.19    0.51  0.52    0.68  0.67
1990            86,785     1.35  1.35    0.60  0.61    0.75  0.74
2000            97,397     1.43  1.45    0.67  0.70    0.76  0.75
B. High school graduates
1980            25,020     1.09  1.17    0.44  0.50    0.65  0.67
1990            22,837     1.26  1.31    0.52  0.55    0.74  0.76
2000            25,963     1.29  1.32    0.59  0.60    0.70  0.72
C. College graduates
1980             7,158     1.26  1.19    0.61  0.54    0.65  0.64
1990            15,517     1.44  1.38    0.70  0.66    0.74  0.72
2000            19,388     1.55  1.57    0.75  0.80    0.80  0.78

Notes: The sample consists of U.S.-born black and white men aged 40-49. The table shows average measures calculated using the distribution of the covariates in each year. The covariates are schooling, race, and a quadratic function of experience. Sampling weights were used for the 2000 Census.
Partial Quantile Regression

Since QR solves a weighted least-squares approximation, ACFV use it to derive a partial regression framework and an omitted variable bias formula.

Suppose X = (X₁', X₂')' and β(τ) = (β₁(τ)', β₂(τ)')'. Partialling out amounts to WLS regressions of Q_τ(Y|X) and X₁ on X₂, with weights w̄_τ(X) = w̄_τ(X, β(τ)):

Q_τ(Y|X) = X₂'π_Q + q_τ(Y|X)
X₁ = X₂'π₁ + V₁

In a second step, β₁(τ) is obtained by minimizing the weighted sum of squared errors:

β₁(τ) = argmin_{β₁} E[w̄_τ(X) (q_τ(Y|X) − V₁'β₁)²]

or, equivalently,

β₁(τ) = argmin_{β₁} E[w̄_τ(X) (Q_τ(Y|X) − V₁'β₁)²]
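This two-step construction is the weighted Frisch-Waugh-Lovell decomposition. A minimal sketch of the algebra, using an assumed linear stand-in for Q_τ(Y|X) and arbitrary placeholder weights (purely illustrative, not the paper's w̄_τ), verifies that the residual-on-residual WLS coefficient equals the coefficient from the full weighted regression:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
q = 1.0 + 1.5 * x1 + 2.0 * x2          # stand-in for Q_tau(Y|X), assumed known
X = np.column_stack([np.ones(n), x1, x2])
w = 1.0 / (1.0 + x2**2)                # any positive weights (placeholder for w_tau)

def wls(Z, t, w):
    """Weighted least squares of t on Z: (Z'WZ)^{-1} Z'Wt."""
    A = Z * w[:, None]
    return np.linalg.solve(Z.T @ A, A.T @ t)

# Full weighted regression of q on (1, x1, x2): coefficient on x1
beta_full = wls(X, q, w)

# Partialling out: WLS of q and x1 on (1, x2), then residual-on-residual WLS
Z2 = np.column_stack([np.ones(n), x2])
q_res = q - Z2 @ wls(Z2, q, w)
v1 = x1 - Z2 @ wls(Z2, x1, w)
beta_partial = wls(v1[:, None], q_res, w)[0]

print(beta_full[1], beta_partial)   # identical coefficients
```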
Omitted Variable Bias

Using these results, we can derive an omitted variable bias formula. Suppose we do not observe X₂. We obtain γ₁(τ) as

γ₁(τ) = argmin_{γ₁} E[ρ_τ(Y − X₁'γ₁)]

which can be compared with β₁(τ) from the long regression

(β₁(τ), β₂(τ)) = argmin_{β₁, β₂} E[ρ_τ(Y − X₁'β₁ − X₂'β₂)]

The difference between the two is

γ₁(τ) = β₁(τ) + (E[w̃_τ(X) X₁ X₁'])⁻¹ E[w̃_τ(X) X₁ R_τ(X)]
Omitted Variable Bias

The bias term, (E[w̃_τ(X) X₁ X₁'])⁻¹ E[w̃_τ(X) X₁ R_τ(X)], depends on:

w̃_τ(X) = ½ ∫₀¹ f_{ε_τ}(u Δ_τ(X, γ₁(τ)) | X) du
Δ_τ(X, γ₁(τ)) = X₁'γ₁(τ) − Q_τ(Y|X)
ε_τ = Y − Q_τ(Y|X)
R_τ(X) = Q_τ(Y|X) − X₁'β₁(τ)

The bias has the same weighted-projection structure as the OLS omitted variable bias.
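The formula can be sanity-checked in a design where it collapses to the familiar OLS form. In the sketch below (an assumed jointly Gaussian simulation, chosen so that the short regression is still correctly specified and the weights are effectively constant), the short QR slope equals the long slope plus the projection coefficient of the omitted X₂ on X₁ times the coefficient on X₂:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 100_000
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)       # X1 correlated with the omitted X2
y = 1.0 + 1.5 * x1 + 2.0 * x2 + rng.normal(size=n)
tau = 0.75

rho = lambda u: (tau - (u <= 0)) * u
def qr(Z):
    """QR of y on Z by check-function minimization."""
    return minimize(lambda b: rho(y - Z @ b).mean(), np.zeros(Z.shape[1]),
                    method="Nelder-Mead",
                    options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 10_000}).x

X_long = np.column_stack([np.ones(n), x1, x2])
X_short = np.column_stack([np.ones(n), x1])

beta = qr(X_long)     # long regression: slope on x1 ~ 1.5
gamma = qr(X_short)   # short regression: biased slope on x1

# With (near-)constant weights the bias is the usual projection coefficient:
delta = np.linalg.lstsq(X_short, x2, rcond=None)[0][1]   # x2 on x1
print(gamma[1], beta[1] + delta * 2.0)   # approximately equal
```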
Sampling properties under misspecification

How does misspecification affect inference? As we saw before, QR consistently estimates the (weighted) linear approximation to the conditional quantile function. What about inference on this approximation?

ACFV consider the whole QR process, which allows inference on several quantiles simultaneously. Define β̂(·) as the process of estimators {β̂(τ) : τ ∈ T}, with T a closed subset of (0, 1).
Theorem 3

Under the assumptions:
(i) (Yᵢ, Xᵢ) are iid;
(ii) f_Y(y|X) exists, is bounded and uniformly continuous;
(iii) J(τ) = E[f_Y(X'β(τ)|X) X X'] is positive definite for all τ ∈ T;
(iv) E‖X‖^(2+ε) < ∞ for some ε > 0.
Then

sup_{τ ∈ T} ‖β̂(τ) − β(τ)‖ = o_p(1)   (1)

J(·) √n (β̂(·) − β(·)) ⇒ z(·)   (2)

where z(·) is a mean-zero Gaussian process with covariance function

Σ(τ, τ′) = E[z(τ) z(τ′)'] = E[(τ − 1{Y < X'β(τ)})(τ′ − 1{Y < X'β(τ′)}) X X']
Theorem 3

Σ(τ, τ′) = E[(τ − 1{Y < X'β(τ)})(τ′ − 1{Y < X'β(τ′)}) X X']

Under the assumption that the model is well specified, Q_τ(Y|X) = X'β(τ), the covariance matrix simplifies to

Σ₀(τ, τ′) = [min(τ, τ′) − τ τ′] E[X X']

Joint normality is obtained even under misspecification. However, misspecification affects the form of the covariance matrix, which complicates inference.
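Both covariance formulas have simple plug-in estimates. The sketch below (an assumed, correctly specified simulated location model, so the two estimates should approximately agree) computes the misspecification-robust Σ̂(τ, τ′) and the simplified Σ̂₀(τ, τ′):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 100_000
x = rng.uniform(0, 2, n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # correctly specified location model
X = np.column_stack([np.ones(n), x])

def qr(tau):
    """QR of y on X by check-function minimization."""
    rho = lambda u: (tau - (u <= 0)) * u
    return minimize(lambda b: rho(y - X @ b).mean(), np.zeros(2),
                    method="Nelder-Mead",
                    options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 5000}).x

t1, t2 = 0.25, 0.75
b1, b2 = qr(t1), qr(t2)

# Robust plug-in: Sigma(t, t') = E[(t - 1{Y < X'b(t)})(t' - 1{Y < X'b(t')}) X X']
s = (t1 - (y < X @ b1)) * (t2 - (y < X @ b2))
XX = X[:, :, None] * X[:, None, :]        # n stacked 2x2 outer products x x'
Sigma = (s[:, None, None] * XX).mean(axis=0)

# Simplified form under correct specification: [min(t, t') - t t'] E[XX']
Sigma0 = (min(t1, t2) - t1 * t2) * XX.mean(axis=0)
print(Sigma)
print(Sigma0)   # approximately equal here
```

Under a misspecified design the two matrices would differ, which is exactly why robust inference is needed.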
Testing hypotheses

ACFV provide a method to test hypotheses of the form

H₀: R(τ) β(τ) = r(τ) for all τ ∈ T

They use a Kolmogorov-type statistic (built with consistent estimates of Σ(τ, τ′) and J(·)). Under misspecification this statistic does not have a tractable limiting distribution, so resampling (bootstrap/subsampling) is used to find its critical values.
Example from US census

Figure: schooling coefficient (%) against the quantile index for the 1980, 1990, and 2000 Censuses (panels A-C), with 95% robust pointwise and uniform confidence intervals.
Example from US census

Figure: log earnings against the quantile index for the 1980, 1990, and 2000 Censuses: without controls (panel A), with controls at covariate means (panel B), and with controls at 1980 covariate means (panel C).
Example from US census

Robust 95% simultaneous confidence intervals are computed by subsampling. There is more heterogeneity in 1990 and 2000 than in 1980: wage dispersion increases with schooling, with more of the increase in the second half.