Abadie s Semi-parametric. Difference-in-Difference Estimator

Similar documents
AN INTRODUCTION TO MATCHING METHODS FOR CAUSAL INFERENCE

Gender Differences in Employed Job Search Lindsey Bowen and Jennifer Doyle, Furman University

The Wage Return to Education: What Hides Behind the Least Squares Bias?

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

Standard errors of marginal effects in the heteroskedastic probit model

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

LOGIT AND PROBIT ANALYSIS

Discussion Section 4 ECON 139/ Summer Term II

ASSESSING FINANCIAL EDUCATION: EVIDENCE FROM BOOTCAMP. William Skimmyhorn. Online Appendix

From the help desk: Swamy s random-coefficients model

Binary Logistic Regression

Lecture 3: Differences-in-Differences

A Basic Introduction to Missing Data

Figure 1.1 Percentage of persons without health insurance coverage: all ages, United States,

Development of the nomolog program and its evolution

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Lecture 15. Endogeneity & Instrumental Variable Estimation

Nonlinear Regression Functions. SW Ch 8 1/54/

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?*

Online Appendix The Earnings Returns to Graduating with Honors - Evidence from Law Graduates

Local classification and local likelihoods

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised February 21, 2015

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

Forecasting in STATA: Tools and Tricks

11. Analysis of Case-control Studies Logistic Regression

Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs

25 Working with categorical data and factor variables

Multiple Imputation for Missing Data: A Cautionary Tale

Implementing Propensity Score Matching Estimators with STATA

Returns to education on earnings in Italy: an evaluation using propensity score techniques

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

Handling missing data in Stata a whirlwind tour

From the help desk: Bootstrapped standard errors

Public Health Insurance Expansions for Parents and Enhancement Effects for Child Coverage

Washington State s Integrated Basic Education and Skills Training Program (I-BEST): New Evidence of Effectiveness

Regression Analysis of the Relationship between Income and Work Hours

Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors

How does Access to Short Term Disability Insurance Impact SSDI Claiming? *

Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

ECON 523 Applied Econometrics I /Masters Level American University, Spring Description of the course

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Propensity scores for the estimation of average treatment effects in observational studies

Estimating the random coefficients logit model of demand using aggregate data

Rethinking the Cultural Context of Schooling Decisions in Disadvantaged Neighborhoods: From Deviant Subculture to Cultural Heterogeneity

Education and Wage Differential by Race: Convergence or Divergence? *

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Chapter 7: Simple linear regression Learning Objectives

Interaction effects and group comparisons Richard Williams, University of Notre Dame, Last revised February 20, 2015

Rates for Vehicle Loans: Race and Loan Source

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing Sector

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Profit-sharing and the financial performance of firms: Evidence from Germany

Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data

Missing data in randomized controlled trials (RCTs) can

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Missing Data Part 1: Overview, Traditional Methods Page 1

The Long-Run Effects of Attending an Elite School: Evidence from the UK (Damon Clark and Emilia Del Bono): Online Appendix

Additional sources Compilation of sources:

Machine Learning Methods for Causal Effects. Susan Athey, Stanford University Guido Imbens, Stanford University

III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

Stata Walkthrough 4: Regression, Prediction, and Forecasting

Introduction to Fixed Effects Methods

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Multinomial and Ordinal Logistic Regression

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Iowa School District Profiles. Central City

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Universal Health Insurance And Mortality: Evidence From Massachusetts

Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, Hull Univ. Business School

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Item Nonresponse in Wages: Testing for Biased Estimates in Wage Equations

Multiple logistic regression analysis of cigarette use among high school students

Multiple Linear Regression in Data Mining

STATISTICAL ANALYSIS OF UBC FACULTY SALARIES: INVESTIGATION OF

How to set the main menu of STATA to default factory settings standards

Descriptive Statistics

HEALTH INSURANCE COVERAGE STATUS American Community Survey 5-Year Estimates

How To Find Out If A Local Market Price Is Lower Than A Local Price

USUAL WEEKLY EARNINGS OF WAGE AND SALARY WORKERS FIRST QUARTER 2015

Propensity Score Matching Regression Discontinuity Limited Dependent Variables

Multivariate Models of Student Success

Organizing Your Approach to a Data Analysis

Regression with a Binary Dependent Variable

Washington State s Integrated Basic Education and Skills Training Program (I-BEST): New Evidence of Effectiveness

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Transcription:

Abadie s Semi-parametric Difference-in-Difference Estimator Kenneth Houngbedji Paris School of Economics September, 2015 Abstract The difference-in-differences (DID) estimator measures the effect of a treatment or policy intervention by comparing change over time of the outcome variable across treatment groups. To interpret the estimate as a causal effect, this strategy requires that, in the absence of the treatment, the outcome variable would have followed the same trend in treated and untreated groups. This assumption may be implausible if selection for treatment is correlated with characteristics that affect the dynamic of the outcome variable. This paper describes the command absdid which implements the semi-parametric differences-in-differences (SDID) estimator of Abadie (2005). The SDID is a reweighing technique that addresses the imbalance of characteristics between treated and untreated groups. Hence, it makes the parallel trend assumption more credible. In addition, the SDID estimator allows the use of covariates to describe how the average effect of the treatment varies for different groups of the treated population. Keywords: semi-parametric estimations, difference-in-difference, propensity score Email: kenneth.houngbedji[@]psemail.eu, address: Paris School of Economics, Bureau A5-48 Boulevard Jourdan 75014 Paris, France. 1

2 1 The semi-parametric difference-in-difference estimator Let s consider the general setting of studies of causal effects used by Rosenbaum and Rubin (1983). We want to estimate the causal effect of a treatment on a variable of interest y at some time t. Each subject has two potential outcomes (y 1t, y 0t ). y 1t is the value of y if the subject receives the treatment by time t. y 0t is the value of y had the participant not received the treatment at time t. d is an indicator of whether or not a participant is treated by time t. At time t = 0 the baseline b no one is treated. At the time t 0, d is equal to 1 for treated participant and 0 otherwise. We want to estimate the average effect of the treatment on the treated (ATT): ) ATT E (y 1t y 0t d = 1. (1) Since y 0t is never observed for a treated participant, the ATT cannot be directly estimated. Assume y 0b is the value of y at time t = 0 i.e. the baseline. Let s x b be a set of pre-treatment characteristics; y t y t y b is the change of y between time t and the baseline b and π (x b ) P ( d = 1 ) xb is the conditional probability to be in the treatment group also called the propensity score. Abadie (2005) shows that the sample analog of ( yt E P (d = 1) d π (x ) ) b 1 π (x b ) (2) gives an unbiased estimate of the ATT if Equation (3) and Equation (4) hold. ) ) E (y 0t y 0b d = 1, xb = E (y 0t y 0b d = 0, xb (3) P (d = 1) > 0 and π (x b ) < 1 (4) The estimator is a weighted average of the difference of trend y t across treatment groups. It proceeds by reweighing the trend for the untreated participants based on their propensity score π (x b ). As π(x b ) 1 π(x b ) is an increasing function of π (x ), untreated b participants with higher propensity score are given higher weight.

3 Abadie (2005) suggests to approximate the propensity score π (x b ) semiparametrically using a polynomial series of order k of the predictors. Thereafter, the values predicted are plugged into the sample analogue of Equation (2). Even though the approximation improves for higher polynomial order, the estimation become less precise. It is also possible to estimate π (x b ) with the series logit estimator (SLE) (see Hirano et al., 2003). This method uses a logit specification to constrain the estimated propensity score to vary between 0 and 1. Independently of the approximation method used, the errors related to the estimation of the propensity scores are taken into account when estimating the standard error of the ATT as described in Abadie (2005). Other estimators use the propensity score to estimate the ATT. The kernel matching and nearest neighbor matching estimators are among the most widely used estimators for quasi experimental identification. However, both estimators assume that the propensity score is given and not estimated and produce on average estimates with smaller standard errors than the estimator of Abadie (2005). 2 The absdid command The command absdid is the Stata equivalent of a Matlab code written by Abadie in an empirical application of the semiparametric difference in difference estimator. 1 absdid estimates the ATT by comparing change over time of the outcome of interest across treatment groups while adjusting for difference between treatment groups on the observable characteristics at baseline which are correlated to the propensity score. 2.1 The syntax The general syntax for the command absdid is: absdid depvar [ if ] [ in ] [, tvar(varname) xvar(varlist) yxvar(varlist) order(#) csinf(#) csup(#) sle common ] 1 The original code is tailored to measure the effect of union membership on wages for workers. It is available at http://www.hks.harvard.edu/fs/aabadie/cdid_union.m.

4 depvar is a variable that represents the change of the outcome of interest between baseline and post treatment for each observation. tvar(varname) is the treatment variable. It takes the value 1 when the observation is treated and 0 otherwise. xvar(varlist) are the control variables. They can be either continuous or binary variables and are used to estimate the propensity score. order(#) takes integer values and represents the order of the polynomial function used to estimate the propensity score. The default is order(1). sle uses a series logit estimator of Hirano et al. (2003) to estimate the propensity score instead of simple polynomial series (the default). common imposes a common support by dropping treatment observations with a propensity score higher than the maximum score of the untreated observations or less than the minimum score of the treated. csinf(#) imposes a common support by dropping the observations of which the propensity score is lower than the value provided as csinf. The default is csinf(0). csup(#), imposes common support by dropping the observations of which the propensity score is higher than the value provided as csup. The default is csup(1). yxvar(varlist) allows to estimate the average impact of the treatment for the treated along the variables in varlist. It is mandatory to declare depvar, tvar(varname) and xvar(varlist). 2.2 Estimation of the propensity score with absdid Consider ˆπ (x b ) is the approximated propensity score; k is the order the polynomial function used to approximate π (x b ). By default absdid uses linear probability model

5 to approximate π (x b ) : ˆπ (x b ) = ˆγ 0 + ˆγ 1 x 1 + k ˆγ 2i x i 2, (5) i=1 where x 1 is a binary variable; x 2 is a continuous variable and x i 2 = i j=1 x 2. The coefficients ˆγ 0, ˆγ 1, ˆγ 21,..., ˆγ 2i,..., ˆγ 2k are estimated using an ordinary least square estimator. When the sle option is specified, absdid uses the following specification to approximate π (x b ): ( ˆπ (x b ) = Λ ˆγ 0 + ˆγ 1 x 1 + K ) ˆγ 2k x k 2. (6) k=1 Higher order the binary variables like x 1 are not considered because for any value k > 1, x k 1 = x 1. 3 Example To illustrate how absdid works, we reproduce the application exercise available on Abadie s website and estimate the effect of participation to a worker union on wages of unionized female workers. The data used is an excerpt of the current population survey (CPS) a US Government monthly survey of unemployment and labor force participation. It consists of workers observed in 1996 who were resurveyed in 1997 and can be downloaded as described below (see Table A-1 for a description of the key variables).. infile month wagespro wagespr union97 union96 age sex black hispanic married > educ exp state smsa dind grade year using "http://www.hks.harvard.edu/fs/aa > badie/cpsu97.raw", clear (44752 observations read). g lwagepro = log(wagespro). g lwagepr = log(wagespr). g hschool = (grade >= 39). g college = (grade >= 43)

6 We restrict the sample to female workers that were not unionized in 1996 and identify the union-wage effect on the workers that joined a worker union between 1996 and 1997. Let s note w 1,97 the wage of a worker in 1997 if she joins a worker union and w 0,97 the wage hasn t she joined the union. The parameter of interest is: ( ) ATT (w) E w 1,97 w 0,97 union97 = 1. (7) As wage variations are traditionally modeled through a log-normal distribution, we first get the estimate β of ATT (log(w)) and deduce the percentage effect of worker union on wage using the transformation suggested by Kennedy (1981): ( w1,97 w ) E 0,97 union97 = 1 w 0,97 ( = exp ˆβ 1 2 V ( β) ) 1. (8) To account for the fact that the workers who joined a union in 1997 differ from those that remained non-unionized with respect to age, education level and race see Table A-2 we use a semiparametric difference-in-difference approach. Hence, we assume that, in absence of worker unions, wage dynamics of unionized workers would have been similar to that of non-unionized workers with the same age, education level, race, state of residence, and sector of activity. If that assumption holds, the semiparametric difference-in-difference estimator of the union-wage effect for female workers can be estimated as described below.. g dlwage = log(wagespro)-log(wagespr). asdid dlwage if sex == 0 & union96 == 0, tvar(union97) xvar(age black hispa > nic married i.grade i.state i.dind i.month) order(4) csinf(.001) csup(1) ATT Coef. Std. Err. z P> z [95% Conf. Interval] dlwage.032768.0160057 2.05 0.041.0013974.0641386 Then, we implement the transformation of Equation (8).. nlcom exp(_b[dlwage] -.5*(_se[dlwage])^2) - 1

7 _nl_1: exp(_b[dlwage] -.5*(_se[dlwage])^2) - 1 ATT Coef. Std. Err. z P> z [95% Conf. Interval] _nl_1.0331785.0165367 2.01 0.045.000767.0655899 This estimation procedure suggests that joining a worker union increased wage of female workers by 3.3 percent in 1997. Similarly, we can also estimate how the treatment effect varies across different groups of female workers. See Table 1 for the results after the transformation recommended by Kennedy (1981). For comparison, Table A-3 shows the raw estimates without correction.. qui asdid dlwage if sex == 0 & union96 == 0, tvar(union97) xvar(age black h > ispanic married i.grade i.state i.dind i.month) yxvar(age hschool college) > order(4) csinf(.001) csup(1). qui nlcom exp(_b[dlwage] -.5*(_se[dlwage])^2) - 1. qui nlcom exp(_b[agexdlwage] -.5*(_se[ageXdlwage])^2) - 1. qui nlcom exp(_b[hschoolxdlwage] -.5*(_se[hschoolXdlwage])^2) - 1. qui nlcom exp(_b[collegexdlwage] -.5*(_se[collegeXdlwage])^2) - 1

8 Table 1: Effect of worker union on wage of unionized workers. ATT LPM SLE (1) (2) (3) (4) Constant 0.033** 0.326** 0.041** 0.400*** (0.017) (0.127) (0.017) (0.152) Age (years) -0.003** -0.004** (0.002) (0.002) High school -0.142** -0.173*** (0.056) (0.060) College 0.039 0.042 (0.036) (0.038) Number of workers 16,270 16,270 18,273 18,273 Note: The table reports estimates of the union-wage effects using asdid. Models (1) and (3) report estimates of the average ATT. Models (2) and (4) show how the ATT varies across different groups of unionized female workers. The ATT reported in (1) and (2) are estimated using a linear polynomial function of degree 4 to approximate the propensity score. The ATT reported in (3) and (4) are estimated using a logit specification or degree 4 to estimate the propensity score. Standard errors are in parentheses and significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

9 4 Discussion The semiparametric difference-in-difference estimate can vary with the type of approximation used sle or simple linear probability model (the default) and the order of the polynomial approximation used order(#). Yet, in the example used in this paper, the confidence intervals of the ATT estimated with LPM and SLE intersect. To reduce the margin for arbitrage, one could use a cross validation technique to decide the combination of method which suits best the the semiparametric approximation of the propensity score. It can also help to consider that the LPM is likely to produce estimates of the propensity score which are either negative or greater than 1. This is not the case of the SLE approximation. Consequently, the sample size used to estimate the ATT is larger when the propensity score is approximated with the sle option. References Abadie, A. 2005. Semiparametric Difference-in-Differences Estimators. Review of Economic Studies 72(1): 1 19. Hirano, K., G. W. Imbens, and G. Ridder. 2003. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. Econometrica 71(4): 1161 1189. Kennedy, P. E. 1981. Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations. American Economic Review 71(4): 801. Rosenbaum, P. R., and D. B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70(1): 41 55. About the authors Kenneth Houngbedji is a researcher at the Paris School of Economics. His main research interests are in development economics and quantitative methods.

10 Appendix Table A-1: Definitions of key variables of interest Variable name Definition month Interview calendar month which varies from 1 to 12 wagespr hourly wage of the worker 1996 wagespro hourly wage of the worker 1997 age age (in years) of the worker in 1996 sex black hispanic married binary variable equal to 0 for a female worker and 1 for a male worker binary variable equal to 1 is the worker is black and 0 if not binary variable equal to 1 if the worker is Hispanic and 0 if not binary variable equal to 1 for married worker and 0 otherwise grade highest grade attained by worker observed in 1996 hschool college state dind binary variable equal to 1 for workers that have completed high school and 0 if not binary variable equal to 1 for workers that have completed college and 0 if not 2-digit code for state of residence 2-digit code for the sector of activity of the worker

11 Table A-2: Characteristics of female workers across treatment groups. Variables Entire sample Unionized Non- Unionized Diff. Union coverage in 1997 0.05 [0.22] Wage variables: Log wage in 1997 2.36 2.43 2.36 0.07*** [0.52] [0.49] [0.53] (0.02) Log wage in 1996 2.30 2.34 2.30 0.04** [0.54] [0.52] [0.54] (0.02) Covariates in 1996: Age (years) 39.33 40.37 39.27 1.09*** [11.01] [10.55] [11.03] (0.37) High school 0.93 0.92 0.93-0.01 [0.26] [0.27] [0.26] (0.01) College 0.25 0.35 0.24 0.10*** [0.43] [0.48] [0.43] (0.01) African American 0.10 0.19 0.09 0.10*** [0.29] [0.39] [0.29] (0.01) Hispanic 0.06 0.07 0.06 0.01 [0.24] [0.26] [0.24] (0.01) Married 0.63 0.63 0.63-0.00 [0.48] [0.48] [0.48] (0.02) Number of workers 18,470 958 17,512 18,470 Note: Standard deviations are in brackets and standard errors are in parentheses and significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.

12 Table A-3: Effects of worker union on log of wage of unionized workers. ATT LPM SLE (1) (2) (3) (4) Constant 0.033** 0.287*** 0.040** 0.343*** (0.016) (0.096) (0.017) (0.108) Age (years) -0.003** -0.004** (0.002) (0.002) High school -0.151** -0.187*** (0.065) (0.072) College 0.039 0.042 (0.035) (0.036) Number of workers 16,270 16,270 18,273 18,273 Note: The table reports estimates of the effects of worker union on log of wage using asdid. Models (1) and (3) report estimates of the average ATT. Models (2) and (4) show how the ATT varies across different groups of unionized female workers. The ATT reported in (1) and (2) are estimated using a linear polynomial function of degree 4 to approximate the propensity score. The ATT reported in (3) and (4) are estimated using a logit specification or degree 4 to estimate the propensity score. Standard errors are in parentheses and significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.