Quantile Regression under misspecification, with an application to the U.S. wage structure



Angrist, Chernozhukov and Fernandez-Val
Reading Group Econometrics, November 2, 2010

Intro: initial problem

The paper starts with OLS: under misspecification, OLS delivers the minimum mean-squared-error linear approximation to the conditional expectation function. Suppose the true model is Y_i = g(X_i) + ε_i with E[ε_i | X_i] = 0. Then OLS estimates the β minimizing the squared approximation error
Δ(β)^2 = E[(g(X_i) − X_i'β)^2].
The first question of the paper is: what are the properties of quantile regression (QR) when the model is misspecified?
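This approximation property is easy to check numerically. A minimal sketch (the quadratic g and uniform design are illustrative assumptions, not from the paper): OLS on noisy data from Y = g(X) + ε recovers, up to sampling error, the least-squares fit of g itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(0.0, 3.0, n)
y = x**2 + rng.normal(size=n)            # true model: g(x) = x^2, E[eps | x] = 0
X = np.column_stack([np.ones(n), x])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS on the observed data
b_app = np.linalg.lstsq(X, x**2, rcond=None)[0]  # LS fit of g(x) directly
# the two linear fits agree up to sampling error: OLS estimates the
# best linear approximation to g, not g itself
```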

Intro: what do they do?

ACFV show that QR estimation is equivalent to minimizing a weighted mean-squared-error linear approximation. This allows them:
- to decompose the contribution of covariates in the estimation;
- to interpret partial quantile regression and to investigate omitted variable bias.
ACFV also derive the distribution theory of the quantile process.

Framework and notation

Conditional quantile function:
Q_τ(Y | X) = inf{ y : F_Y(y | X) ≥ τ }.
It solves
Q_τ(Y | X) ∈ argmin_{q(X)} E[ρ_τ(Y − q(X))],
with the check function
ρ_τ(u) = (τ − 1(u ≤ 0)) u.
Restricting to linear functions q(X) = X'β, the parameter vector solves
β(τ) = argmin_β E[ρ_τ(Y − X'β)].
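The check-function problem can be minimized directly. A minimal sketch (not the paper's code; scipy's Nelder-Mead and the intercept-only design are illustrative assumptions): for a model with only a constant, the minimizer of the empirical check loss is the sample τ-quantile.

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, tau):
    # check function: rho_tau(u) = (tau - 1(u <= 0)) * u
    return (tau - (u <= 0)) * u

def qr_fit(y, X, tau):
    # estimate beta(tau) by minimizing the empirical check loss
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]    # start from OLS
    obj = lambda b: rho(y - X @ b, tau).sum()
    return minimize(obj, b0, method="Nelder-Mead").x

rng = np.random.default_rng(0)
y = rng.normal(size=2000)
X = np.ones((2000, 1))               # intercept-only design
b = qr_fit(y, X, 0.75)               # close to the sample 0.75-quantile
```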

Framework and notation

The QR specification error is
Δ_τ(X, β) = X'β − Q_τ(Y | X).
The deviation of Y from the true conditional quantile is
ε_τ = Y − Q_τ(Y | X), with conditional density f_{ε_τ}(e | X).

First result (Theorem 1)

Under the assumptions:
(i) f_Y(y | X) exists;
(ii) E|Y|, E|Q_τ(Y | X)| and E‖X‖ are finite;
(iii) β(τ) is the unique QR solution;
β(τ) solves
min_β E[ w_τ(X, β) Δ_τ(X, β)^2 ],
with
w_τ(X, β) = ∫_0^1 (1 − u) f_{ε_τ}(u Δ_τ(X, β) | X) du
          = ∫_0^1 (1 − u) f_Y(u X'β + (1 − u) Q_τ(Y | X) | X) du.

Theorem 1: interpretation of the weights

w_τ(X, β) = ∫_0^1 (1 − u) f_Y(u X'β + (1 − u) Q_τ(Y | X) | X) du.
The point u X'β + (1 − u) Q_τ(Y | X) traces the segment between the approximation X'β and the true conditional quantile. The weight is thus a weighted average of the density of Y along this segment, with more weight given to points of the segment close to the true conditional quantile. ACFV refer to w_τ(X, β) as importance weights. The overall weights are w_τ(x, β) π(x), where π(x) is the density of X. The weights can be approximated by
w_τ(X, β) = (1/2) f_Y(Q_τ(Y | X) | X) + ϱ_τ(X),
where ϱ_τ(X) ≈ 0 if f_Y is approximately constant on the segment. More importance is therefore given to regions where f_Y is large.
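The weight integral is easy to evaluate numerically. A sketch under an assumed standard normal conditional density for Y given X (an illustration, not from the paper): when the approximation coincides with the true quantile, w_τ equals exactly (1/2) f_Y(Q_τ | X), and it falls below that value when the segment extends into a lower-density region.

```python
from scipy.integrate import quad
from scipy.stats import norm

def importance_weight(xb, q_tau, f):
    # w_tau(x, b) = integral_0^1 (1 - u) f(u * x'b + (1 - u) * Q_tau) du
    return quad(lambda u: (1.0 - u) * f(u * xb + (1.0 - u) * q_tau), 0.0, 1.0)[0]

f = norm.pdf                     # assumed density of Y given X: N(0, 1)
q = norm.ppf(0.9)                # true conditional 0.9-quantile
w_exact = importance_weight(q + 0.3, q, f)  # fit lies 0.3 above the truth
w_local = 0.5 * f(q)             # local approximation (1/2) f_Y(Q_tau | X)
# the segment reaches into the upper tail, so w_exact < w_local
```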

Second result (Theorem 2)

Under the same assumptions:
(i) f_Y(y | X) exists;
(ii) E|Y|, E|Q_τ(Y | X)| and E‖X‖ are finite;
(iii) β(τ) is the unique QR solution;
β(τ) uniquely solves
β(τ) = argmin_β E[ w̄_τ(X, β(τ)) Δ_τ(X, β)^2 ],
with
w̄_τ(X, β(τ)) = (1/2) ∫_0^1 f_{ε_τ}(u Δ_τ(X, β(τ)) | X) du
             = (1/2) ∫_0^1 f_Y(u X'β(τ) + (1 − u) Q_τ(Y | X) | X) du.

Theorem 2: implications

The weights no longer depend on the argument β: QR solves a weighted least-squares approximation problem with fixed weights w̄_τ(X, β(τ)). As before, the weights can be approximated (under smoothness, when f_Y is locally constant on the segment):
w̄_τ(X, β(τ)) ≈ (1/2) f_Y(Q_τ(Y | X) | X).
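Theorem 2 can be checked numerically. The sketch below (illustrative assumptions throughout: a discrete uniform design on a grid and Y = X^2 + ε with ε ~ N(0,1), so the linear model is misspecified) computes the population QR coefficient by direct check-loss minimization, then recovers the same coefficient as the fixed point of weighted least squares of the true CQF on X with the Theorem 2 weights.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

tau = 0.5
xs = np.linspace(0.0, 3.0, 61)            # discrete uniform design for X
q = xs**2 + norm.ppf(tau)                 # true CQF of Y = X^2 + eps
X = np.column_stack([np.ones_like(xs), xs])

# population QR: minimize E[rho_tau(Y - X'b)], integrating eps on a fine grid
es = np.linspace(-6.0, 6.0, 2001)
pe = norm.pdf(es); pe /= pe.sum()
def pop_loss(b):
    u = xs[:, None]**2 + es[None, :] - (X @ b)[:, None]
    return ((tau - (u <= 0)) * u * pe[None, :]).sum()
b_qr = minimize(pop_loss, [0.0, 0.0], method="Nelder-Mead",
                options={"xatol": 1e-8, "fatol": 1e-12}).x

# Theorem 2 weights: w(x, b) = 1/2 * integral_0^1 f_Y(u x'b + (1-u) Q_tau | x) du
us = np.linspace(0.0, 1.0, 201)
def weights(b):
    mid = us[None, :] * (X @ b)[:, None] + (1.0 - us[None, :]) * q[:, None]
    return 0.5 * norm.pdf(mid - xs[:, None]**2).mean(axis=1)

b = np.zeros(2)
for _ in range(200):                      # damped iteration to the fixed point
    W = X * weights(b)[:, None]
    b_new = np.linalg.solve(X.T @ W, W.T @ q)
    b = 0.5 * b + 0.5 * b_new
# b matches b_qr: QR is WLS of the CQF on X with weights evaluated at beta(tau)
```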

Example from the US census

[Figure: Panels A-C plot log earnings against schooling for τ = 0.10, 0.50, 0.90, comparing the conditional quantile function (CQ), the QR fit, and the minimum-distance (MD) fit. Panels D-F plot the corresponding QR weights, importance weights, and density weights against schooling.]

Example from the US census

The minimum-distance (MD) estimator is obtained as in Chamberlain (1994):
β_MD(τ) = argmin_β E[ Δ_τ(X, β)^2 ].
Differences in slope between QR and MD come from the difference in weights. The importance weights and their density approximation are very close:
w̄_τ(X, β(τ)) ≈ (1/2) f_Y(Q_τ(Y | X) | X).
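The role of the weights shows up in a toy version of this comparison (a sketch; the heteroskedastic design Y = X^2 + (1 + X)ε with ε ~ N(0,1) is an illustrative assumption). MD fits the CQF by unweighted least squares, while the density-weighted fit downweights regions where f_Y(Q_τ | X) is small, so the two slopes differ.

```python
import numpy as np
from scipy.stats import norm

tau = 0.9
xs = np.linspace(0.0, 3.0, 61)            # discrete uniform design
z = norm.ppf(tau)
q = xs**2 + (1.0 + xs) * z                # CQF of Y = X^2 + (1 + X) * eps
X = np.column_stack([np.ones_like(xs), xs])

# minimum distance (Chamberlain 1994): unweighted LS of the CQF on X
b_md = np.linalg.lstsq(X, q, rcond=None)[0]

# density-approximated weights: (1/2) f_Y(Q_tau | x) = (1/2) phi(z) / (1 + x)
w = 0.5 * norm.pdf(z) / (1.0 + xs)
W = X * w[:, None]
b_w = np.linalg.solve(X.T @ W, W.T @ q)
# the weighted slope is flatter: low-x regions, where the local slope of
# the CQF is small, receive more weight
```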

Example from the US census

TABLE I. Comparison of CQF- and QR-based interquantile spreads

                              90-10          90-50          50-10
Census        Obs.         CQ     QR      CQ     QR      CQ     QR
A. Overall
1980        65,023        1.20   1.19    0.51   0.52    0.68   0.67
1990        86,785        1.35   1.35    0.60   0.61    0.75   0.74
2000        97,397        1.43   1.45    0.67   0.70    0.76   0.75
B. High school graduates
1980        25,020        1.09   1.17    0.44   0.50    0.65   0.67
1990        22,837        1.26   1.31    0.52   0.55    0.74   0.76
2000        25,963        1.29   1.32    0.59   0.60    0.70   0.72
C. College graduates
1980         7,158        1.26   1.19    0.61   0.54    0.65   0.64
1990        15,517        1.44   1.38    0.70   0.66    0.74   0.72
2000        19,388        1.55   1.57    0.75   0.80    0.80   0.78

Notes: The sample consists of U.S.-born black and white men aged 40-49. The table shows average measures calculated using the distribution of the covariates in each year. The covariates are schooling, race, and a quadratic function of experience. Sampling weights were used for the 2000 Census.

Partial Quantile Regression

Since QR solves a weighted least-squares approximation, ACFV use it to derive a partial regression framework and an omitted variable bias formula. Suppose X = (X_1', X_2')' and write β(τ) = (β_1(τ)', β_2(τ)')'. Partialling out is obtained by computing the WLS projections of Q_τ(Y | X) and X_1 on X_2:
Q_τ(Y | X) = X_2'π_Q + q_τ(Y | X),
X_1 = X_2'π_1 + V_1,
with weights w̄_τ(X) = w̄_τ(X, β(τ)). In a second step, β_1(τ) is obtained by minimizing the weighted sum of squared errors:
β_1(τ) = argmin_{β_1} E[ w̄_τ(X) (q_τ(Y | X) − V_1'β_1)^2 ]
or, equivalently,
β_1(τ) = argmin_{β_1} E[ w̄_τ(X) (Q_τ(Y | X) − V_1'β_1)^2 ].

Omitted Variable Bias

Using these results, we can derive an omitted variable bias formula. Suppose we do not observe X_2. We obtain γ_1(τ) as
γ_1(τ) = argmin_{γ_1} E[ρ_τ(Y − X_1'γ_1)],
which can be compared to β_1(τ) in
(β_1(τ), β_2(τ)) = argmin_{β_1, β_2} E[ρ_τ(Y − X_1'β_1 − X_2'β_2)].
The difference between the two is
γ_1(τ) = β_1(τ) + (E[w̄_τ(X) X_1 X_1'])^{-1} E[w̄_τ(X) X_1 R_τ(X)].

Omitted Variable Bias

The bias term (E[w̄_τ(X) X_1 X_1'])^{-1} E[w̄_τ(X) X_1 R_τ(X)] depends on:
w̄_τ(X) = (1/2) ∫_0^1 f_{ε_τ}(u Δ_τ(X, γ_1(τ)) | X) du,
Δ_τ(X, γ_1(τ)) = X_1'γ_1(τ) − Q_τ(Y | X),
ε_τ = Y − Q_τ(Y | X),
R_τ(X) = Q_τ(Y | X) − X_1'β_1(τ).
The bias is similar to the OLS omitted variable bias.
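In a pure location-shift model the weights are constant in X and the formula reduces exactly to the OLS one. A simulation sketch (the design is an illustrative assumption, and QR is fit by generic check-loss minimization rather than the paper's code): with Y = X_1 + X_2 + ε and X_2 = 0.5 X_1 + v, omitting X_2 inflates the QR coefficient on X_1 from 1 to about 1.5, just as in OLS.

```python
import numpy as np
from scipy.optimize import minimize

def qr_fit(y, X, tau):
    # QR by direct minimization of the empirical check loss, OLS start
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]
    obj = lambda b: ((tau - (y - X @ b <= 0)) * (y - X @ b)).sum()
    return minimize(obj, b0, method="Nelder-Mead",
                    options={"maxiter": 2000}).x

rng = np.random.default_rng(0)
n = 4000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)       # omitted variable, correlated with x1
y = x1 + x2 + rng.normal(size=n)         # location model: beta_1 = beta_2 = 1
ones = np.ones(n)

tau = 0.75
b_long = qr_fit(y, np.column_stack([ones, x1, x2]), tau)   # slope on x1 near 1
b_short = qr_fit(y, np.column_stack([ones, x1]), tau)      # x2 omitted
# b_short[1] is near 1.5 = beta_1 + pi_1 * beta_2, the OLS-style OVB
```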

Sampling Properties under misspecification

How does misspecification affect inference? As we saw, QR consistently estimates the best linear approximation to the conditional quantile function. What about inference on this approximation? ACFV consider the whole QR process, which allows inference on several quantiles simultaneously. Define β̂(·) as the collection of estimators β̂(τ) for τ ∈ T, a closed subset of (0, 1).

Theorem 3

Under the assumptions:
(i) (Y_i, X_i) are iid;
(ii) f_Y(y | X) exists, is bounded, and is uniformly continuous;
(iii) J(τ) = E[f_Y(X'β(τ) | X) X X'] is positive definite for all τ ∈ T;
(iv) E‖X‖^{2+ε} < ∞ for some ε > 0;
then
sup_{τ ∈ T} ‖β̂(τ) − β(τ)‖ = o_p(1)
and
J(·) √n (β̂(·) − β(·)) ⇒ z(·),
where z(·) is a mean-zero Gaussian process with covariance function
Σ(τ, τ') = E[z(τ) z(τ')'] = E[(τ − 1{Y < X'β(τ)})(τ' − 1{Y < X'β(τ')}) X X'].

Theorem 3

Σ(τ, τ') = E[(τ − 1{Y < X'β(τ)})(τ' − 1{Y < X'β(τ')}) X X'].
Under the assumption that the model is correctly specified, Q_τ(Y | X) = X'β(τ), the covariance matrix simplifies to
Σ_0(τ, τ') = [min(τ, τ') − τ τ'] E[X X'].
Joint normality is thus obtained even under misspecification. However, misspecification changes the form of the covariance matrix, which complicates inference.
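The gap between Σ and Σ_0 is visible in a small misspecified example (a sketch; the design Y = X^2 + ε with a discrete uniform X grid is an illustrative assumption, and the least-squares fit of the CQF below stands in for β(τ)). Under correct specification P(Y < X'β | X) = τ, so the conditional second moment of the score collapses to τ(1 − τ) and the two formulas coincide; under misspecification that probability varies with X and they do not.

```python
import numpy as np
from scipy.stats import norm

tau = 0.9
xs = np.linspace(0.0, 3.0, 61)            # discrete uniform design
X = np.column_stack([np.ones_like(xs), xs])
q = xs**2 + norm.ppf(tau)                 # CQF of Y = X^2 + eps, eps ~ N(0,1)
b = np.linalg.lstsq(X, q, rcond=None)[0]  # linear fit of the CQF (stand-in)

p = norm.cdf(X @ b - xs**2)               # P(Y < X'b | x)
m = tau**2 * (1.0 - p) + (1.0 - tau)**2 * p   # E[(tau - 1{Y < X'b})^2 | x]
Sigma = (X * m[:, None]).T @ X / len(xs)      # robust kernel at tau' = tau
Sigma0 = tau * (1.0 - tau) * (X.T @ X) / len(xs)  # correct-specification form
# with p = tau for all x, m would equal tau*(1 - tau) and Sigma = Sigma0
```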

Testing hypotheses

ACFV provide a method to test hypotheses of the form
H_0 : R(τ) β(τ) = r(τ) for all τ ∈ T.
They use a Kolmogorov-type statistic, based on consistent estimates of Σ(τ, τ') and J(·). Under misspecification, this statistic has no pivotal limiting distribution, so the bootstrap is used to find its quantiles.
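A minimal sketch of such a test (illustrative assumptions throughout: a simulated location model, a small quantile grid, a pairs bootstrap of the recentered sup statistic, and a generic Nelder-Mead QR fit rather than the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def qr_slope(y, X, tau):
    # QR slope by direct check-loss minimization, started at OLS
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]
    obj = lambda b: ((tau - (y - X @ b <= 0)) * (y - X @ b)).sum()
    return minimize(obj, b0, method="Nelder-Mead").x[1]

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0.0, 3.0, n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # H0 true: slope 0.5 at all quantiles
X = np.column_stack([np.ones(n), x])

taus = [0.25, 0.50, 0.75]
slopes = np.array([qr_slope(y, X, t) for t in taus])

# Kolmogorov-type statistic for H0: slope(tau) = 0.5 on the grid T
stat = np.sqrt(n) * np.abs(slopes - 0.5).max()

# pairs bootstrap of the recentered sup statistic
B, boot = 60, []
for _ in range(B):
    i = rng.integers(0, n, n)
    bs = np.array([qr_slope(y[i], X[i], t) for t in taus])
    boot.append(np.sqrt(n) * np.abs(bs - slopes).max())
crit = float(np.quantile(boot, 0.95))    # bootstrap 95% critical value
```

Rejecting when stat exceeds crit gives an (approximate) 5% test that is valid without assuming the linear quantile model is correctly specified.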

Example from the US census

[Figure: Schooling coefficient (%) against the quantile index for the 1980, 1990, and 2000 Censuses (panels A-C), with 95% robust uniform and 95% robust pointwise confidence intervals.]

Example from the US census

[Figure: Log earnings against the quantile index for 1980, 1990, and 2000. Panel A: without controls; Panel B: with controls, at covariate means; Panel C: with controls, at 1980 covariate means.]

Example from the US census

The robust 95% simultaneous confidence intervals are computed by subsampling. There is more heterogeneity in 1990 and 2000 than in 1980: wage dispersion increases with schooling, with a larger increase in the second half...