Abadie s Semi-parametric Difference-in-Difference Estimator Kenneth Houngbedji Paris School of Economics September, 2015 Abstract The difference-in-differences (DID) estimator measures the effect of a treatment or policy intervention by comparing change over time of the outcome variable across treatment groups. To interpret the estimate as a causal effect, this strategy requires that, in the absence of the treatment, the outcome variable would have followed the same trend in treated and untreated groups. This assumption may be implausible if selection for treatment is correlated with characteristics that affect the dynamic of the outcome variable. This paper describes the command absdid which implements the semi-parametric differences-in-differences (SDID) estimator of Abadie (2005). The SDID is a reweighing technique that addresses the imbalance of characteristics between treated and untreated groups. Hence, it makes the parallel trend assumption more credible. In addition, the SDID estimator allows the use of covariates to describe how the average effect of the treatment varies for different groups of the treated population. Keywords: semi-parametric estimations, difference-in-difference, propensity score Email: kenneth.houngbedji[@]psemail.eu, address: Paris School of Economics, Bureau A5-48 Boulevard Jourdan 75014 Paris, France. 1
2 1 The semi-parametric difference-in-difference estimator Let s consider the general setting of studies of causal effects used by Rosenbaum and Rubin (1983). We want to estimate the causal effect of a treatment on a variable of interest y at some time t. Each subject has two potential outcomes (y 1t, y 0t ). y 1t is the value of y if the subject receives the treatment by time t. y 0t is the value of y had the participant not received the treatment at time t. d is an indicator of whether or not a participant is treated by time t. At time t = 0 the baseline b no one is treated. At the time t 0, d is equal to 1 for treated participant and 0 otherwise. We want to estimate the average effect of the treatment on the treated (ATT): ) ATT E (y 1t y 0t d = 1. (1) Since y 0t is never observed for a treated participant, the ATT cannot be directly estimated. Assume y 0b is the value of y at time t = 0 i.e. the baseline. Let s x b be a set of pre-treatment characteristics; y t y t y b is the change of y between time t and the baseline b and π (x b ) P ( d = 1 ) xb is the conditional probability to be in the treatment group also called the propensity score. Abadie (2005) shows that the sample analog of ( yt E P (d = 1) d π (x ) ) b 1 π (x b ) (2) gives an unbiased estimate of the ATT if Equation (3) and Equation (4) hold. ) ) E (y 0t y 0b d = 1, xb = E (y 0t y 0b d = 0, xb (3) P (d = 1) > 0 and π (x b ) < 1 (4) The estimator is a weighted average of the difference of trend y t across treatment groups. It proceeds by reweighing the trend for the untreated participants based on their propensity score π (x b ). As π(x b ) 1 π(x b ) is an increasing function of π (x ), untreated b participants with higher propensity score are given higher weight.
3 Abadie (2005) suggests to approximate the propensity score π (x b ) semiparametrically using a polynomial series of order k of the predictors. Thereafter, the values predicted are plugged into the sample analogue of Equation (2). Even though the approximation improves for higher polynomial order, the estimation become less precise. It is also possible to estimate π (x b ) with the series logit estimator (SLE) (see Hirano et al., 2003). This method uses a logit specification to constrain the estimated propensity score to vary between 0 and 1. Independently of the approximation method used, the errors related to the estimation of the propensity scores are taken into account when estimating the standard error of the ATT as described in Abadie (2005). Other estimators use the propensity score to estimate the ATT. The kernel matching and nearest neighbor matching estimators are among the most widely used estimators for quasi experimental identification. However, both estimators assume that the propensity score is given and not estimated and produce on average estimates with smaller standard errors than the estimator of Abadie (2005). 2 The absdid command The command absdid is the Stata equivalent of a Matlab code written by Abadie in an empirical application of the semiparametric difference in difference estimator. 1 absdid estimates the ATT by comparing change over time of the outcome of interest across treatment groups while adjusting for difference between treatment groups on the observable characteristics at baseline which are correlated to the propensity score. 2.1 The syntax The general syntax for the command absdid is: absdid depvar [ if ] [ in ] [, tvar(varname) xvar(varlist) yxvar(varlist) order(#) csinf(#) csup(#) sle common ] 1 The original code is tailored to measure the effect of union membership on wages for workers. It is available at http://www.hks.harvard.edu/fs/aabadie/cdid_union.m.
4 depvar is a variable that represents the change of the outcome of interest between baseline and post treatment for each observation. tvar(varname) is the treatment variable. It takes the value 1 when the observation is treated and 0 otherwise. xvar(varlist) are the control variables. They can be either continuous or binary variables and are used to estimate the propensity score. order(#) takes integer values and represents the order of the polynomial function used to estimate the propensity score. The default is order(1). sle uses a series logit estimator of Hirano et al. (2003) to estimate the propensity score instead of simple polynomial series (the default). common imposes a common support by dropping treatment observations with a propensity score higher than the maximum score of the untreated observations or less than the minimum score of the treated. csinf(#) imposes a common support by dropping the observations of which the propensity score is lower than the value provided as csinf. The default is csinf(0). csup(#), imposes common support by dropping the observations of which the propensity score is higher than the value provided as csup. The default is csup(1). yxvar(varlist) allows to estimate the average impact of the treatment for the treated along the variables in varlist. It is mandatory to declare depvar, tvar(varname) and xvar(varlist). 2.2 Estimation of the propensity score with absdid Consider ˆπ (x b ) is the approximated propensity score; k is the order the polynomial function used to approximate π (x b ). By default absdid uses linear probability model
5 to approximate π (x b ) : ˆπ (x b ) = ˆγ 0 + ˆγ 1 x 1 + k ˆγ 2i x i 2, (5) i=1 where x 1 is a binary variable; x 2 is a continuous variable and x i 2 = i j=1 x 2. The coefficients ˆγ 0, ˆγ 1, ˆγ 21,..., ˆγ 2i,..., ˆγ 2k are estimated using an ordinary least square estimator. When the sle option is specified, absdid uses the following specification to approximate π (x b ): ( ˆπ (x b ) = Λ ˆγ 0 + ˆγ 1 x 1 + K ) ˆγ 2k x k 2. (6) k=1 Higher order the binary variables like x 1 are not considered because for any value k > 1, x k 1 = x 1. 3 Example To illustrate how absdid works, we reproduce the application exercise available on Abadie s website and estimate the effect of participation to a worker union on wages of unionized female workers. The data used is an excerpt of the current population survey (CPS) a US Government monthly survey of unemployment and labor force participation. It consists of workers observed in 1996 who were resurveyed in 1997 and can be downloaded as described below (see Table A-1 for a description of the key variables).. infile month wagespro wagespr union97 union96 age sex black hispanic married > educ exp state smsa dind grade year using "http://www.hks.harvard.edu/fs/aa > badie/cpsu97.raw", clear (44752 observations read). g lwagepro = log(wagespro). g lwagepr = log(wagespr). g hschool = (grade >= 39). g college = (grade >= 43)
6 We restrict the sample to female workers that were not unionized in 1996 and identify the union-wage effect on the workers that joined a worker union between 1996 and 1997. Let s note w 1,97 the wage of a worker in 1997 if she joins a worker union and w 0,97 the wage hasn t she joined the union. The parameter of interest is: ( ) ATT (w) E w 1,97 w 0,97 union97 = 1. (7) As wage variations are traditionally modeled through a log-normal distribution, we first get the estimate β of ATT (log(w)) and deduce the percentage effect of worker union on wage using the transformation suggested by Kennedy (1981): ( w1,97 w ) E 0,97 union97 = 1 w 0,97 ( = exp ˆβ 1 2 V ( β) ) 1. (8) To account for the fact that the workers who joined a union in 1997 differ from those that remained non-unionized with respect to age, education level and race see Table A-2 we use a semiparametric difference-in-difference approach. Hence, we assume that, in absence of worker unions, wage dynamics of unionized workers would have been similar to that of non-unionized workers with the same age, education level, race, state of residence, and sector of activity. If that assumption holds, the semiparametric difference-in-difference estimator of the union-wage effect for female workers can be estimated as described below.. g dlwage = log(wagespro)-log(wagespr). asdid dlwage if sex == 0 & union96 == 0, tvar(union97) xvar(age black hispa > nic married i.grade i.state i.dind i.month) order(4) csinf(.001) csup(1) ATT Coef. Std. Err. z P> z [95% Conf. Interval] dlwage.032768.0160057 2.05 0.041.0013974.0641386 Then, we implement the transformation of Equation (8).. nlcom exp(_b[dlwage] -.5*(_se[dlwage])^2) - 1
7 _nl_1: exp(_b[dlwage] -.5*(_se[dlwage])^2) - 1 ATT Coef. Std. Err. z P> z [95% Conf. Interval] _nl_1.0331785.0165367 2.01 0.045.000767.0655899 This estimation procedure suggests that joining a worker union increased wage of female workers by 3.3 percent in 1997. Similarly, we can also estimate how the treatment effect varies across different groups of female workers. See Table 1 for the results after the transformation recommended by Kennedy (1981). For comparison, Table A-3 shows the raw estimates without correction.. qui asdid dlwage if sex == 0 & union96 == 0, tvar(union97) xvar(age black h > ispanic married i.grade i.state i.dind i.month) yxvar(age hschool college) > order(4) csinf(.001) csup(1). qui nlcom exp(_b[dlwage] -.5*(_se[dlwage])^2) - 1. qui nlcom exp(_b[agexdlwage] -.5*(_se[ageXdlwage])^2) - 1. qui nlcom exp(_b[hschoolxdlwage] -.5*(_se[hschoolXdlwage])^2) - 1. qui nlcom exp(_b[collegexdlwage] -.5*(_se[collegeXdlwage])^2) - 1
8 Table 1: Effect of worker union on wage of unionized workers. ATT LPM SLE (1) (2) (3) (4) Constant 0.033** 0.326** 0.041** 0.400*** (0.017) (0.127) (0.017) (0.152) Age (years) -0.003** -0.004** (0.002) (0.002) High school -0.142** -0.173*** (0.056) (0.060) College 0.039 0.042 (0.036) (0.038) Number of workers 16,270 16,270 18,273 18,273 Note: The table reports estimates of the union-wage effects using asdid. Models (1) and (3) report estimates of the average ATT. Models (2) and (4) show how the ATT varies across different groups of unionized female workers. The ATT reported in (1) and (2) are estimated using a linear polynomial function of degree 4 to approximate the propensity score. The ATT reported in (3) and (4) are estimated using a logit specification or degree 4 to estimate the propensity score. Standard errors are in parentheses and significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
9 4 Discussion The semiparametric difference-in-difference estimate can vary with the type of approximation used sle or simple linear probability model (the default) and the order of the polynomial approximation used order(#). Yet, in the example used in this paper, the confidence intervals of the ATT estimated with LPM and SLE intersect. To reduce the margin for arbitrage, one could use a cross validation technique to decide the combination of method which suits best the the semiparametric approximation of the propensity score. It can also help to consider that the LPM is likely to produce estimates of the propensity score which are either negative or greater than 1. This is not the case of the SLE approximation. Consequently, the sample size used to estimate the ATT is larger when the propensity score is approximated with the sle option. References Abadie, A. 2005. Semiparametric Difference-in-Differences Estimators. Review of Economic Studies 72(1): 1 19. Hirano, K., G. W. Imbens, and G. Ridder. 2003. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. Econometrica 71(4): 1161 1189. Kennedy, P. E. 1981. Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations. American Economic Review 71(4): 801. Rosenbaum, P. R., and D. B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70(1): 41 55. About the authors Kenneth Houngbedji is a researcher at the Paris School of Economics. His main research interests are in development economics and quantitative methods.
10 Appendix Table A-1: Definitions of key variables of interest Variable name Definition month Interview calendar month which varies from 1 to 12 wagespr hourly wage of the worker 1996 wagespro hourly wage of the worker 1997 age age (in years) of the worker in 1996 sex black hispanic married binary variable equal to 0 for a female worker and 1 for a male worker binary variable equal to 1 is the worker is black and 0 if not binary variable equal to 1 if the worker is Hispanic and 0 if not binary variable equal to 1 for married worker and 0 otherwise grade highest grade attained by worker observed in 1996 hschool college state dind binary variable equal to 1 for workers that have completed high school and 0 if not binary variable equal to 1 for workers that have completed college and 0 if not 2-digit code for state of residence 2-digit code for the sector of activity of the worker
11 Table A-2: Characteristics of female workers across treatment groups. Variables Entire sample Unionized Non- Unionized Diff. Union coverage in 1997 0.05 [0.22] Wage variables: Log wage in 1997 2.36 2.43 2.36 0.07*** [0.52] [0.49] [0.53] (0.02) Log wage in 1996 2.30 2.34 2.30 0.04** [0.54] [0.52] [0.54] (0.02) Covariates in 1996: Age (years) 39.33 40.37 39.27 1.09*** [11.01] [10.55] [11.03] (0.37) High school 0.93 0.92 0.93-0.01 [0.26] [0.27] [0.26] (0.01) College 0.25 0.35 0.24 0.10*** [0.43] [0.48] [0.43] (0.01) African American 0.10 0.19 0.09 0.10*** [0.29] [0.39] [0.29] (0.01) Hispanic 0.06 0.07 0.06 0.01 [0.24] [0.26] [0.24] (0.01) Married 0.63 0.63 0.63-0.00 [0.48] [0.48] [0.48] (0.02) Number of workers 18,470 958 17,512 18,470 Note: Standard deviations are in brackets and standard errors are in parentheses and significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.
12 Table A-3: Effects of worker union on log of wage of unionized workers. ATT LPM SLE (1) (2) (3) (4) Constant 0.033** 0.287*** 0.040** 0.343*** (0.016) (0.096) (0.017) (0.108) Age (years) -0.003** -0.004** (0.002) (0.002) High school -0.151** -0.187*** (0.065) (0.072) College 0.039 0.042 (0.035) (0.036) Number of workers 16,270 16,270 18,273 18,273 Note: The table reports estimates of the effects of worker union on log of wage using asdid. Models (1) and (3) report estimates of the average ATT. Models (2) and (4) show how the ATT varies across different groups of unionized female workers. The ATT reported in (1) and (2) are estimated using a linear polynomial function of degree 4 to approximate the propensity score. The ATT reported in (3) and (4) are estimated using a logit specification or degree 4 to estimate the propensity score. Standard errors are in parentheses and significance levels are denoted as follows: * p<0.10, ** p<0.05, *** p<0.01.