Empirical Methods. MIT 14.771/ Harvard 2390b



Handout by Prof. Esther Duflo

The goal of this handout is to present the most common empirical methods used in applied economics. Excellent references for the program evaluation and natural experiment approach are Angrist and Krueger (1999) and Mayer (1999). Angrist and Krueger (1999) contains more material, at a more detailed level than this handout, and should be a high-priority paper to read for students planning to write a thesis in empirical development, labor, or public finance.

1 The evaluation problem

Empirical methods in development economics, labor economics, and public finance have been developed to try to answer counterfactual questions. What would have happened to this person's behavior if she had been subjected to an alternative policy T (e.g. would she work more if marginal taxes were lower, would she earn less if she had not gone to school, would she be more likely to be immunized if there had been an immunization center in her village)?

Here is an example that illustrates the fundamental difficulties of program evaluation. Let $Y_i^T$ be the average test score of children in a given school $i$ if the school has textbooks, and $Y_i^C$ the average test score of children in the same school if the school has no textbooks. We are interested in the difference $Y_i^T - Y_i^C$, which is the effect of having textbooks for school $i$.

Problem: we will never observe a school $i$ both with and without books at the same time. What can we do? We will never know the effect of having textbooks on a particular school, but we may hope to learn the average effect that it will have on schools:

$$E[Y_i^T - Y_i^C].$$
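
To make the counterfactual problem concrete, the sketch below uses made-up potential outcomes for six hypothetical schools (the numbers and names are illustrative assumptions, not data from any study). In real data only one of the two scores is ever observed for each school; the example keeps both so that the average effect can be set against a naive comparison of schools with and without textbooks.

```python
# Hypothetical potential outcomes for six schools (illustrative numbers only).
# y_t: average test score if the school has textbooks
# y_c: average test score if the same school has no textbooks
# has_books: whether the school actually has textbooks
schools = [
    {"y_t": 70, "y_c": 65, "has_books": True},
    {"y_t": 72, "y_c": 66, "has_books": True},
    {"y_t": 68, "y_c": 64, "has_books": True},
    {"y_t": 55, "y_c": 50, "has_books": False},
    {"y_t": 57, "y_c": 52, "has_books": False},
    {"y_t": 53, "y_c": 48, "has_books": False},
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Average effect E[Y^T - Y^C]: computable here only because both outcomes are known.
average_effect = mean(s["y_t"] - s["y_c"] for s in schools)

# What the data actually reveal: Y^T for schools with books, Y^C for the others.
observed_with = mean(s["y_t"] for s in schools if s["has_books"])
observed_without = mean(s["y_c"] for s in schools if not s["has_books"])
naive_difference = observed_with - observed_without

print(average_effect)    # 5.0
print(naive_difference)  # 20.0
```

In this made-up example the naive comparison (20 points) is four times the average effect (5 points), because the schools that have textbooks would have scored higher even without them.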

Imagine we have access to data on lots of schools in one region. Some schools have textbooks and others do not. We may think of taking the average in both groups, and then the difference between average test scores in schools with textbooks and average test scores in schools without textbooks. This is equal to:

$$D = E[Y_i^T \mid \text{School has textbooks}] - E[Y_i^C \mid \text{School has no textbooks}] = E[Y_i^T \mid T] - E[Y_i^C \mid C].$$

Subtracting and adding $E[Y_i^C \mid T]$, we obtain

$$D = E[Y_i^T \mid T] - E[Y_i^C \mid T] + E[Y_i^C \mid T] - E[Y_i^C \mid C] = E[Y_i^T - Y_i^C \mid T] + E[Y_i^C \mid T] - E[Y_i^C \mid C].$$

The first term, $E[Y_i^T - Y_i^C \mid T]$, is the treatment effect that we try to isolate (the effect of treatment on the treated): on average, in the treatment schools, what difference will the books make? The difference $E[Y_i^C \mid T] - E[Y_i^C \mid C]$ is the selection bias. It tells us that, besides the effect of the textbooks, there may be systematic differences between schools with textbooks and other schools. Empirical methods try to solve this problem.

2 Randomized evaluations

The ideal set-up to evaluate the effect of a policy X on outcome Y is a randomized experiment. A useful reference is Rosenbaum (1995). In a randomized experiment, a sample of $N$ individuals is selected from the population (note that this sample may not be random and may be selected according to observables). This sample is then divided randomly into two groups: the Treatment group ($N_T$ individuals) and the Control group ($N_C$ individuals). Obviously $N_T + N_C = N$. The Treatment group is then treated by policy X while the Control group is not. Then the outcome Y is observed and compared for both Treatment and Control groups. The effect of policy X is measured in general by the difference in empirical means of Y between Treatments and Controls:

$$\hat{D} = \hat{E}(Y_i \mid T) - \hat{E}(Y_i \mid C),$$

where $\hat{E}$ denotes the empirical mean. As Treatment has been randomly assigned, the difference $E[Y_i^C \mid T] - E[Y_i^C \mid C]$ is equal to 0 (in the absence of the treatment, schools are the same). Therefore,

$$E[Y_i \mid T] - E[Y_i \mid C] = E[Y_i^T - Y_i^C \mid T] = E[Y_i^T - Y_i^C],$$

the causal parameter of interest. The regression counterpart, used to obtain standard errors for $\hat{D}$, is

$$Y_i = \alpha + D \cdot 1(i \in T) + \epsilon_i,$$

where $1(i \in T)$ is a dummy for being in the Treatment group. How? The formula for $\hat{D}_{OLS}$ is simple to handle when there is only one independent variable:

$$\hat{D}_{OLS} = \frac{\sum_i 1(i \in T)\,[Y_i - \bar{Y}]}{\sum_i 1(i \in T)\,[1(i \in T) - N_T/N]}.$$

The denominator is equal to:

$$Den = \sum_i 1(i \in T)^2 - (N_T/N) \sum_i 1(i \in T) = N_T (1 - N_T/N).$$

The numerator is equal to:

$$Num = \sum_i 1(i \in T)\,[Y_i - \bar{Y}] = \sum_i 1(i \in T)\, Y_i - \bar{Y} \sum_i 1(i \in T),$$

which implies:

$$Num = N_T \hat{E}(Y_i \mid T) - N_T [N_T \hat{E}(Y_i \mid T) + N_C \hat{E}(Y_i \mid C)]/N = N_T (1 - N_T/N)\, \hat{E}(Y_i \mid T) - (N_T/N)(N - N_T)\, \hat{E}(Y_i \mid C) = N_T (1 - N_T/N)\, [\hat{E}(Y_i \mid T) - \hat{E}(Y_i \mid C)].$$

Taking the ratio of Num and Den, we indeed find that:

$$\hat{D}_{OLS} = \hat{E}(Y_i \mid T) - \hat{E}(Y_i \mid C).$$

Problems of Randomized Experiments

1. Cost

(a) Financial costs: Experiments are very costly and difficult to implement properly in economics. The negative income tax experiments of the late 60s and 70s in the US illustrate most of the issues (see Pencavel 1986, Ashenfelter and Plant 1990). As a result, experiments are often either poorly managed, or small, or both (with the corresponding problems we will see below).

(b) Ethical problems: It is not possible to run all the experiments we would like to because they might substantially affect the economic or social outcomes of the Treated. Alternatively, NGOs or governments are reluctant to deprive the controls of a treatment which they consider potentially valuable. Insisting on the fact that the experiment is a productive use of limited resources may be a good way to go.

2. Threats to internal validity

(a) Non-response bias: People may move away during the experiment. If people who leave have particular characteristics systematically related to the outcome, then there is attrition bias (cf. Hausman and Wise (1979) about attrition in the NIT experiment).

(b) Mix-up of Treatment and Controls: Sometimes, keeping the allocation to control and treatment random is almost impossible. Example: the Krueger (2000) evaluation of the Tennessee STAR small class size experiment: children were moved to small classes (due to parental pressure, bad behavior, etc.). The actual class size is therefore not random even though the initial assignment was random. It is then important to use the initial assignment as the treatment, because it is the only variation that was randomly assigned. It can then be used as an instrument for actual class size (cf. below).

3. Threats to external validity

(a) Limited duration: Experiments are in general temporary. People may react differently to a temporary program than to a permanent program.

(b) Experiment specificity: In general, an experiment is run in a particular geographic area (e.g., the NIT experiments). It is not obvious that the same experiment would have given the same results in another area. Therefore, it is often difficult to generalize the results of an experiment to the total population.

(c) Hawthorne and John Henry effects: Treatment and control may behave differently because they know they are being observed. Therefore the effects may not generalize to a context where subjects are not observed.

(d) General equilibrium effects: Extrapolation is complicated by general equilibrium effects: small-scale experiments do not generate general equilibrium effects that might be very important when the policy is applied to everybody in the population.

4. Threats to power

(a) Small samples: Because experiments are difficult to administer, samples are often small, which makes it difficult to obtain significant results. It is important to do a power calculation before starting an experiment (what is the sample size required to be able to discriminate from 0 an effect of a given size?). See the command sampsize in Stata. But the crucial inputs (mean and variance of the outcomes before treatment) are often missing, so there is always some guesswork involved in planning experiments.

(b) Experiment design and power of the experiment: When the unit of randomization is a group (e.g. a school), we may need to collect data on a very large number of individuals to get significant results if outcomes are strongly correlated within groups (see below how standard errors are corrected for the grouped structure). This was a difficulty in the Kremer, Glewwe and Moulin (1998) textbook experiments.
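
The power calculation mentioned under "Threats to power" can be sketched in a few lines. This is a minimal illustration that assumes the usual normal-approximation formula for comparing two means with equal-sized arms, and the usual design-effect inflation $1 + (m - 1)\rho$ for randomization by groups of size $m$ with intra-group correlation $\rho$. The function name and the default values are hypothetical, and this is not the Stata routine.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80, cluster_size=1, icc=0.0):
    """Approximate number of individuals needed in each arm (two-sided test).

    Normal-approximation formula (an assumption of this sketch):
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / effect^2
    multiplied by the design effect 1 + (cluster_size - 1) * icc when the
    unit of randomization is a group and outcomes are correlated within groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / effect ** 2
    design_effect = 1 + (cluster_size - 1) * icc
    return math.ceil(n * design_effect)

# Detecting an effect of 0.2 standard deviations, individual randomization:
print(sample_size_per_arm(effect=0.2, sd=1.0))
# Same effect, randomizing schools of 30 pupils with intra-school correlation 0.1:
print(sample_size_per_arm(effect=0.2, sd=1.0, cluster_size=30, icc=0.1))
```

The second call requires several times more individuals than the first, which is the point made in 4(b): strong within-group correlation makes group-level randomization expensive in terms of sample size.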

3 Controlling for selection bias by controlling for observables

3.1 OLS

OLS is the basic regression design.

3.1.1 Definition

$$Y = X\beta + \epsilon$$

Suppose we have $N$ observations.

- $Y$ is the dependent variable ($N \times 1$ vector).
- $X$ contains the independent variables ($N \times K$ matrix, for $K$ independent variables). One element of $X$ may be $T$, the variable we are interested in. We write $X = (T, x_2, \ldots, x_K)$.
- $\epsilon$ is the error term ($N \times 1$ vector).

The OLS estimator is:

$$\hat{\beta} = (X'X)^{-1} X'Y = \beta + (X'X)^{-1} X'\epsilon.$$

$\hat{\beta}$ is consistent if $\epsilon$ and $X$ are uncorrelated, that is, $E(X'\epsilon) = 0$. NB: this is not as strong a requirement as being independent.

Stata OLS command: regress y T x2 .. xK
where y is the name of the dependent variable, T is the variable of interest and x2 .. xK are the names of the K - 1 control variables.

3.1.2 Inference

The asymptotic variance which Stata reports is correct when the variance matrix of the error term is diagonal (this rules out autocorrelation) with identical terms on the diagonal (this rules out heteroskedasticity), that is, $V(\epsilon) = \sigma_\epsilon^2 I_N$, where $I_N$ is the identity matrix of rank $N$. The asymptotic variance of the OLS estimator is then given by:

$$VAR(\hat{\beta}) = \sigma_\epsilon^2 (X'X)^{-1}.$$
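
As a rough illustration of the estimator $\hat{\beta} = (X'X)^{-1}X'Y$, the sketch below solves the normal equations $(X'X)\beta = X'Y$ in plain Python for a tiny made-up data set with a constant and a treatment dummy. The helper names (solve, ols) and the data are hypothetical. It also checks the earlier claim that the coefficient on the dummy equals the difference in means between treated and controls.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Return beta_hat, the solution of the normal equations (X'X) beta = X'Y."""
    k = len(x_rows[0])
    xtx = [[sum(row[a] * row[b] for row in x_rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(x_rows, y)) for a in range(k)]
    return solve(xtx, xty)

# Hypothetical data: each row of X is (constant, T), with T a treatment dummy.
X = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0]]
Y = [12.0, 14.0, 13.0, 9.0, 11.0]

alpha_hat, d_hat = ols(X, Y)
mean_treated = (12.0 + 14.0 + 13.0) / 3
mean_control = (9.0 + 11.0) / 2
print(d_hat, mean_treated - mean_control)  # the two numbers agree up to rounding
```

In practice one would use a statistical package rather than hand-rolled linear algebra; the point here is only to show the formula at work.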

When the error term is non-spherical, $V(\epsilon) = \Omega$, the asymptotic variance of the OLS estimator is different from the previous formula and is given by:

$$VAR = (X'X)^{-1} (X'\Omega X) (X'X)^{-1}.$$

There are two important examples of non-spherical disturbances:

1. Heteroskedasticity: $\Omega$ is diagonal ($\epsilon_i$ is uncorrelated with $\epsilon_j$ when $i \neq j$) but $Var(\epsilon_i)$ may vary with $i$.
Stata command: regress y x1 .. xK, robust
produces correct standard errors in that case using the White method.

2. Group error structure: Example: survey design in developing countries is often clustered (cf. Deaton (1997) for more on this). First, clusters (i.e. villages or neighborhoods) are randomly selected, then individuals are selected within clusters.

$$Y_{ij} = X_{ij}\beta + \epsilon_{ij},$$

where $i$ is the individual and $j$ is the village. Assume that there are village common fixed effects:

$$\epsilon_{ij} = \mu_j + \nu_{ij},$$

where the $\nu_{ij}$ are independent and have constant variance. Then the error term matrix $\Omega$ is block diagonal.
Stata command: regress y x1 .. xK, cluster(village)
where village is the subgroup indicator, produces standard errors which are corrected both for heteroskedasticity and for the grouped structure.

3.1.3 Problems with OLS

1. Under-controlling

The most frequent problem with OLS is that of omitted variable bias. Our coefficient is likely to be biased if we omit relevant control variables. The classic example is that of returns to education. If ability (or other factors affecting future earnings) is correlated with education choice and is not included in the regression, the OLS coefficient is biased.

Suppose our true model is

$$Y = \beta_0 + \beta_1 T + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon,$$

where $T$ represents our variable of interest (e.g. schooling) and $X_3$ and $X_4$ represent other control variables (e.g. ability, family background). However, we do not have information on $X_3$ and $X_4$, so we run the "short regression":

$$Y = \tilde{\beta}_0 + \tilde{\beta}_1 T + \tilde{\beta}_2 X_2 + \eta.$$

Then we know that

$$\tilde{\beta}_1 = \frac{Cov(Y, \tilde{T})}{Var(\tilde{T})},$$

where $\tilde{T}$ is the residual from the regression of $T$ on $X_2$, i.e. $T = \gamma_0 + \gamma_1 X_2 + \tilde{T}$ with $Cov(X_2, \tilde{T}) = 0$. So the numerator of $\tilde{\beta}_1$ is

$$Cov(Y, \tilde{T}) = Cov(\beta_0 + \beta_1 T + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon, \tilde{T}) = Cov(\beta_1 T + \beta_3 X_3 + \beta_4 X_4, \tilde{T}) = \beta_1 Var(\tilde{T}) + \beta_3 Cov(X_3, \tilde{T}) + \beta_4 Cov(X_4, \tilde{T}),$$

using $Cov(T, \tilde{T}) = Var(\tilde{T})$ and $Cov(\epsilon, \tilde{T}) = 0$. Dividing by $Var(\tilde{T})$,

$$\tilde{\beta}_1 = \beta_1 + \beta_3 \delta_{31} + \beta_4 \delta_{41},$$

where $\delta_{31}$ is the coefficient on $T$ when $X_3$ is regressed on $T$ and $X_2$, and $\delta_{41}$ is the coefficient on $T$ when $X_4$ is regressed on $T$ and $X_2$. In words:

Short regression coeff. = Long regression coeff. + [coeffs. on omitted variables in long regression] x [coeffs. on included variables when omitted variables are regressed on them]

This formula is very useful in determining the sign of the omitted variables bias. For instance, in the returns to education example with ability as the omitted variable, we expect that unobserved ability will have a positive impact on wages in the long regression. If we assume that higher-ability people choose to get more schooling, then the omitted variables bias is positive, which means that our estimated coefficient on schooling is biased upwards.

2. Over-controlling

Controlling for variables that are caused by the variable of interest will also lead to biased coefficients. For example, if wage and ability (as measured by IQ, for example) are both caused by schooling, then controlling for IQ in an OLS regression of wage on education will lead to a downward bias in the OLS coefficient of education (intuitively: the ability variable picks up some of the causal effect of education, namely the increase in wages which is due to the effect of education on ability, which itself affects wages). The relationship between short and long regression coefficients is still given by the omitted variables formula above, only here the short regression gives the coefficient we really want, and the long regression is what we mistakenly run. In the case of the schooling example, it thus results in a downward bias.

3. Estimating the extent of omitted variables bias

Computing the formula above explicitly is difficult since we typically do not have information on the omitted variables. However, if the true relationship depends on a large number of variables, the included regressors are a random subset of this set of factors, and none of the factors dominates the relationship with wages or schooling, then the relationship between the indices of observables in the schooling and wage equations is the same as the relationship between the unobservables (Altonji, Elder and Taber 2000).

To get an idea of how much our results might be affected by unobserved covariates, we can compute how large the omitted variables bias must be to make our results invalid. If our schooling variable takes only 2 values, 0 and 1, we can compute the normalized shift in schooling due to observables,

$$\frac{E(X'\beta \mid S = 1) - E(X'\beta \mid S = 0)}{Var(X'\beta)},$$

and ask how large the normalized shift due to unobservables,

$$\frac{E(\epsilon \mid S = 1) - E(\epsilon \mid S = 0)}{Var(\epsilon)},$$

would have to be in order to explain away the entire estimate of $\beta_1$. If selection on unobservables has to be very large compared to selection on observables in order to attribute all our results to omitted variables bias, we feel more confident about our results.
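
The omitted variables formula of this section can be checked numerically. The sketch below is a simplified, hypothetical case with a single omitted variable (ability, A) and no other included control, so the short regression has only a constant and schooling T. The data and coefficient values are made up, and the outcome is generated without noise so that the identity holds exactly in the sample.

```python
def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope from a simple OLS regression of y on a constant and x: Cov(x, y) / Var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Hypothetical data: schooling T and omitted ability A, positively related.
T = [8, 10, 12, 12, 14, 16]
A = [90, 95, 100, 110, 105, 120]

beta_T, beta_A = 0.08, 0.01   # assumed "long regression" coefficients
Y = [1.0 + beta_T * t + beta_A * a for t, a in zip(T, A)]  # no noise, for clarity

short_coeff = slope(T, Y)     # short regression: Y on T only
delta = slope(T, A)           # omitted variable A regressed on included variable T

# Short coeff. = long coeff. + (coeff. on omitted variable) x (delta)
print(short_coeff, beta_T + beta_A * delta)
```

Because ability enters wages positively and is positively related to schooling in this made-up sample, the short-regression coefficient exceeds the assumed long-regression coefficient, matching the upward bias described above.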

3.2 Matching

3.2.1 Matching on observables

Instead of doing a regression, it is possible to use matching methods. Matching is easier to implement when the treatment variable takes only two values. A clearly presented application is Angrist (1998).

An obvious case is when the treatment is random conditional on a set of observable variables X. Example: at Dartmouth, roommates are allocated randomly after conditioning on responses to a set of questions: are you more neat or messy? do you smoke? do you listen to loud music? People with the same answers to all of these questions are put in a pile and then randomly allocated to each other and to a room. What is the effect of the high school score of my roommate on my GPA? (Sacerdote 2000). Imagine the treatment variable $T = 1$ if the roommate has a high score in high school.

Randomization conditional on observables implies that:

$$E[Y_i^C \mid X, T] - E[Y_i^C \mid X, C] = 0.$$

So:

$$E[Y_i \mid X, T] - E[Y_i \mid X, C] = E[Y_i^T \mid X, T] - E[Y_i^C \mid X, T].$$

And therefore:

$$E_X\{E[Y_i^T \mid X, T] - E[Y_i^C \mid X, C]\} = E[Y_i^T - Y_i^C \mid T],$$

our parameter of interest. Finally,

$$E_X\{E[Y_i^T \mid X, T] - E[Y_i^C \mid X, T]\} = \int \{E[Y_i^T \mid x, T] - E[Y_i^C \mid x, C]\}\, P(X = x \mid T)\, dx.$$

This means that, if X takes discrete values, we can compare Treatment and Control in all the cells formed by the combinations of the Xs (e.g. neat, smoker, no loud music), and then take a weighted average over these cells, using as weights the proportion of treated in the cells (this is the sample analog of this expression). Cells where there are only controls or only treatments are dropped.

Comparing matching and OLS:

- They are the same if the treatment effects are constant.

- If treatment effects are different, they will be different, because they apply different weighting schemes. OLS is efficient under the assumption that the treatment effect is constant, so it weights observations by the conditional variance of the treatment status.

- Matching does not use cells where there are only treatment observations, whereas OLS takes advantage of the linearity assumption to use all the observations, so the treatment and control groups actually being compared may be very different under matching and under OLS (for example, comparing the CPS to the sample of training program participants in the training program mentioned below means that very different people are compared). Matching will throw away all the control observations for which we cannot find at least one treatment observation with the same characteristics.

Important caveat: Sometimes matching on observables might lead to a greater bias than OLS, if treatment is not truly random conditional on observables, i.e. matching may not eliminate the omitted variables bias due to unobservables. For instance, suppose we match up people on the basis of family background and attribute any resulting difference in wages to differences in education. It is quite possible that people with the same family background have widely varying ability levels, but very similar levels of schooling. In this case, we would obtain a very large estimate of the returns to schooling, due to the omitted variable bias. This might even be larger than the bias in usual OLS, because in the latter case we have a greater range of schooling levels with probably the same range of ability levels.
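
The cell-by-cell matching estimator described above can be sketched as follows. The function name, the record layout, and the roommate data are hypothetical, chosen only to mirror the Dartmouth example; the weights are each cell's share of the treated observations that remain after unmatched cells are dropped.

```python
from collections import defaultdict

def matching_estimate(records):
    """Exact-matching estimate of the effect of treatment on the treated.

    records: iterable of (cell, treated, outcome), where `cell` is a hashable
    combination of discrete covariates. Cells lacking either treated or control
    observations are dropped; the remaining cell differences are weighted by
    each cell's share of the remaining treated observations.
    """
    treated, control = defaultdict(list), defaultdict(list)
    for cell, is_treated, outcome in records:
        (treated if is_treated else control)[cell].append(outcome)

    usable = [c for c in treated if c in control]
    n_treated = sum(len(treated[c]) for c in usable)
    estimate = 0.0
    for c in usable:
        diff = sum(treated[c]) / len(treated[c]) - sum(control[c]) / len(control[c])
        estimate += diff * len(treated[c]) / n_treated
    return estimate

# Hypothetical roommate data: cell = (neat/messy, smoker status); outcome = GPA.
data = [
    (("neat", "non-smoker"), True, 3.6), (("neat", "non-smoker"), True, 3.4),
    (("neat", "non-smoker"), False, 3.2),
    (("messy", "smoker"), True, 3.0),
    (("messy", "smoker"), False, 2.9), (("messy", "smoker"), False, 2.7),
    (("messy", "non-smoker"), False, 3.1),  # no treated observation in this cell: dropped
]
print(matching_estimate(data))
```

Here the first cell contributes a difference of about 0.3 with weight 2/3 and the second a difference of about 0.2 with weight 1/3; the control-only cell plays no role, which is the "throwing away" of unmatched controls noted above.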

3.2.2 Propensity score matching

Exact matching is not practical when X is continuous or contains many variables. A result due to Rosenbaum and Rubin (1984) is that, for $p(X)$ equal to the probability that $T = 1$ given $X$,

$$E[Y_i^C \mid X, T] - E[Y_i^C \mid X, C] = 0$$

implies:

$$E[Y_i^C \mid p(X), T] - E[Y_i^C \mid p(X), C] = 0.$$

So it is possible to first estimate the propensity score, and then compare observations which have a similar propensity score. It is often easier to estimate the propensity score non-parametrically or semi-parametrically than to condition directly on observables.

Example: Dehejia and Wahba (1999), revisiting Lalonde (1986) on the effect of training on earnings, show that the propensity score matching approach leads to results that are close to the experimental evidence, where the regression approaches failed. In practice, they first estimated a logit model of training participation on covariates and lags of earnings, and then compared treatment and control in each quintile of the estimated propensity scores. They obtained the final estimate by weighting each difference by the proportion of trainees in the given quintile.

4 Difference-in-differences type estimators

General references: (Campbell 1969, Meyer 1995).

4.1 Simple Differences

As randomized experiments are very rare, economists have to rely on actual policy changes to identify the effects of policies on outcomes. These are called "natural experiments" because we take advantage of changes that were not made explicitly to measure the effects of policies.

The key issue when analyzing a natural experiment is to divide the data into a control and a treatment group. The most obvious way to do that is a simple difference method using data before ($t = 0$) and after the change ($t = 1$):

$$Y_{it} = \alpha + \beta \cdot 1(t = 1) + \epsilon_{it}.$$

The OLS estimate of $\beta$ is the difference in means $\bar{Y}_1 - \bar{Y}_0$ before and after the change.

Problem: how to distinguish the policy effect from a secular change? With 2 periods only, this is impossible. The estimate is unbiased only under the very strong assumption that, absent the policy change, there would have been no change in average Y.

With many years of data, it is possible to develop a more convincing estimation methodology. Suppose that years $0, \ldots, T$ are available and the change took place in year $t^*$. Put all the year dummies in the regression:

$$Y_{it} = \alpha + \sum_{\tau=1}^{T} \beta_\tau \, 1(t = \tau) + \epsilon_{it}.$$

Then $\hat{\beta}_\tau = \bar{Y}_\tau - \bar{Y}_0$.

Question: is there a break in the pattern of $\hat{\beta}_\tau$ around the reform date $t^*$?

Problems: when the reform is gradual, this strategy is not going to work well.

4.2 Difference-in-differences

A way to improve on the simple difference method is to compare outcomes before and after a policy change for a group affected by the change (Treatment Group) to those for a group not affected by the change (Control Group).

Example: Minimum wage increase in New Jersey but not in Pennsylvania. Compare employment in the fast food industry before and after the change in both states (Card and Krueger 1992).

Alternatively: instead of comparing before and after, it is possible to compare a region where a policy is implemented to a region with no such policy.

Example: micro-credit. Poor households are eligible to borrow from Grameen Bank. Grameen implements the program only in a subset of villages. Morduch (1998) compares rich households to poor households in villages where Grameen implements the program and in other villages.
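
The first comparison described in this subsection (before versus after, for a treatment group and a control group) reduces to arithmetic on four cell means. The sketch below uses made-up store-level employment figures; they are purely illustrative and are not the Card and Krueger data, and the function name is hypothetical.

```python
def mean(v):
    return sum(v) / len(v)

def diff_in_diff(obs):
    """obs: list of (group, period, outcome) with group in {"T", "C"} and period in {0, 1}.

    Returns the before-after change for the treatment group minus the
    before-after change for the control group.
    """
    cell = {}
    for g in ("T", "C"):
        for t in (0, 1):
            cell[g, t] = mean([y for grp, per, y in obs if grp == g and per == t])
    change_treated = cell["T", 1] - cell["T", 0]
    change_control = cell["C", 1] - cell["C", 0]
    return change_treated - change_control

# Hypothetical employment per store before (0) and after (1) a policy change.
obs = [
    ("T", 0, 20.0), ("T", 0, 22.0), ("T", 1, 21.0), ("T", 1, 23.0),
    ("C", 0, 23.0), ("C", 0, 25.0), ("C", 1, 21.0), ("C", 1, 23.0),
]
print(diff_in_diff(obs))  # 3.0
```

In this made-up example the treatment group's average rises by 1 while the control group's falls by 2, so the double difference is 3 even though the simple before-after difference for the treatment group alone is only 1.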

The DD estimate is:

$$DD = [\hat{E}(Y_1 \mid T) - \hat{E}(Y_0 \mid T)] - [\hat{E}(Y_1 \mid C) - \hat{E}(Y_0 \mid C)].$$

The idea is to correct the simple before-and-after difference for the treatment group by subtracting the simple difference for the control group. DD estimates are often cleanly presented in a 2 by 2 box.

The DD estimate is an unbiased estimate of the effect of the policy change if, absent the policy change, the average change $Y_1 - Y_0$ would have been the same for treatment and controls. This is the "parallel trend" assumption.

Regression counterpart. Run OLS on

$$Y_{it} = \alpha + \beta \cdot 1(t = 1) + \gamma \cdot 1(i \in T) + \eta \cdot 1(t = 1) \cdot 1(i \in T) + \epsilon_{it}.$$

The OLS estimate of $\eta$ is numerically identical to the DD estimate (the proof is similar to, though somewhat more complicated than, the one for the simple difference case).

DD estimates are very common in applied work. Whether or not they are convincing depends on the context and on how close the control and treatment groups are. There are a number of simple checks that one should always do to assess the validity of the DD strategy in each particular case.

Checks of DD strategy

1. Use data for prior periods (say period -1) and redo the DD comparing year 0 and year -1 (assuming there was no policy change between year 0 and year -1). If this placebo DD is non-zero, there is a good chance that your estimate comparing year 0 and year 1 is biased as well. More generally, when many years are available, it is very useful to plot the series of average outcomes for Treatment and Control groups and see whether trends are parallel and whether there is a sudden change just after the reform for the Treatment group.

2. Use an alternative control group $C'$. If the DD with the alternative control is different from the DD with the original control $C$, then the original DD is likely to be biased (cf. Gruber 1996).

3. Replace $Y$ by another outcome $Y'$ that is not supposed to be affected by the reform. If the DD using $Y'$ is non-zero, then it is likely that the DD for $Y$ is biased as well.

NB: For 1) and 2), it is possible to do a DDD strategy. The DDD estimate is the difference between the DD of interest and the placebo DD (that is supposed to be zero). However, the DDD is of limited interest in general because:

- If the DD placebo is non-zero, it will be difficult to convince people that the DDD removes all the bias.

- If the DD placebo is zero, then DD and DDD give the same results but DD is preferable because standard errors are much smaller for DD than for DDD.

(Gruber 1994, Gruber 1996) are neat empirical examples of the use of DD estimators.

Note: The closer the Treatment and Control groups are, the more convincing the DD approach is (note that in the case of a randomized experiment, Treatment and Controls are identical in large samples). It is often useful to compute simple differences between Treatment and Controls along covariates (such as age, race, income, education, ...) to see whether Treatment and Controls differ systematically. In the regression framework, it is useful to include covariates interacted with the time dummy to control for changes in the composition of the control and treatment groups.

Common Problems with DD estimates

Targeting based on differences

A pre-condition for the validity of the DD assumption is that the program is not implemented based on pre-existing differences in outcomes.

Example: "Ashenfelter dip". It was common to compare wage gains among participants and non-participants in training programs to evaluate the effect of training on earnings. Ashenfelter and Card (1985) note that training participants often experience a dip in earnings just before they enter the program (which is presumably why they entered the program in the first place). Since wages have a natural tendency to mean reversion, this leads to an upward bias of the DD estimate of the program effect.

In the case of difference-in-differences that combine regional and eligibility variation: often the regional targeting is based upon the situation of the group of eligible people (e.g. Grameen will locate a bank in the villages where the poor are worse off). It is easy to check that this will lead to negative difference-in-differences in the absence of the program, if villages differ in terms of distribution of wealth.

Functional form dependence:

When average levels of the outcome Y are very different for controls and treatments before the policy change, the magnitude or even the sign of the DD effect is very sensitive to the functional form posited.

Illustration: Suppose you look at the effect of a training program targeted to the young. The unemployment level for the young decreases from 30% to 20%. The unemployment level for the old decreases from 10% to 5%. Because of the dramatic difference in pre-program unemployment levels (30% vs 10%), it is difficult to assess whether the program was effective. The DD in levels would be

$$(30 - 20) - (10 - 5) = 10 - 5 = 5 \text{ percentage points},$$

suggesting a positive effect of training on employment. However, if you consider log changes in unemployment, the DD becomes

$$[\log(30) - \log(20)] - [\log(10) - \log(5)] = \log(1.5) - \log(2) < 0,$$

suggesting that training had a negative effect on employment.

Long-term response versus reliability trade-off:

DD estimates are more reliable when you compare outcomes just before and just after the policy change because the identifying assumption (parallel trends) is more likely to hold over a short time window. With a long time window, many other things are likely to happen and confound the policy change effect. However, for policy purposes, it is often more interesting to know the medium or long term effect of a policy change. In any case, one must be very cautious in extrapolating short-term responses to long-term responses (see the literature on labor supply or taxable income elasticities).

Heterogeneous behavioral responses:

When both control and treatment groups experience a change, but of different size, it is still possible to do a DD estimate. However, the DD estimate might be meaningless if the intensity of behavioral responses for Treatments and Controls is different. Simple illustration: effect of $M_{it}$ (for example the marginal tax rate) on outcome $Y_{it}$ (taxable income). Assume the model is:

$$Y_{it} = \mu_i + \alpha_t + \eta_i M_{it}.$$

For treatment individuals, $M_{it}$ increased from 0 to $M_T$. For control individuals, $M_{it}$ increased from 0 to $M_C$.

$$DD = [Y_1^T - Y_0^T] - [Y_1^C - Y_0^C] = \eta_T M_T - \eta_C M_C.$$

If $\eta_C = 2\eta_T$ and $M_T = 2M_C$ then DD is zero even though both $\eta$'s can be large and positive. This issue arises for example in Feldstein (1995) on taxable income elasticities.

Inference

The observations in the control and the treatment group may tend to move together over time. In other words, there may be a common random effect at the time*group level. In this case, the standard error of the estimator should take this correlation into account: we have in effect less information than we think.

Example: Suppose that the outcome can be described by the equations:

$$y_{it} = \beta_T + \gamma \cdot 1_{Post} + \alpha_{Tt} + \epsilon_{it},$$

if $i$ belongs to the treatment group, and

$$y_{it} = \beta_C + \alpha_{Ct} + \epsilon_{it},$$

if $i$ belongs to the control group, where $\alpha_{Tt}$ and $\alpha_{Ct}$ are random group effects (not necessarily i.i.d.).

The variance of the difference-in-differences estimator should take into account the variance of $\alpha_{Tt} - \alpha_{Ct}$: the variance-covariance matrix of the error term is block diagonal. The standard OLS variance does not take it into account.

With only 2 periods, 1 treatment group and 1 control group, there is nothing we can do to adjust the standard error of the DD estimator: DD is unbiased, but not consistent. With several periods, we can use the pre-treatment periods to calculate the variance of $\alpha_{Tt} - \alpha_{Ct}$, and adjust the standard error for it.

This problem is described in general terms in Moulton (1986), and for the case of DD specifically in Lang (2000). Stata offers a correction of the standard error with the cluster option. However, this correction runs into trouble when the number of clusters is small. Using the formula in Moulton seems to be safer, but it needs to be programmed.

4.3 Fixed Effects

Fixed effects can be seen as a generalization of DD to the case of more than two periods (say S periods) and more than 2 groups (say J groups). Suppose that group $j$ in year $t$ experiences a given policy T (for example an income tax rate) of intensity $T_{jt}$. We want to know the effect of T on an outcome Y.

OLS regression:

$$Y_{jt} = \alpha + \beta T_{jt} + \epsilon_{jt}$$

With no fixed effects, the estimate of $\beta$ is biased if the treatment $T_{jt}$ is correlated with $\epsilon_{jt}$ (that is, correlated with the outcome $Y_{jt}$ even if the treatments $T_{jt}$ were all identical across time and groups). This is often the case in practice: for example, if $T_{jt}$ is the generosity of welfare in state $j$ and year $t$ and $Y_{jt}$ is unemployment, the simple OLS estimate is likely to be biased downward if poorer states with high unemployment levels have less generous benefits.

A way to solve this problem is to put time dummies and group dummies in the regression:

$$Y_{jt} = \alpha + \gamma_t + \delta_j + \beta T_{jt} + \epsilon_{jt}$$

Then identification is obtained out of within-group time variation: group-specific changes over

time. This is a direct extension of DD, where there are 2 groups that experience different changes in policy over 2 periods. Note that changes common to all groups are captured by the time dummies and thus are not a source of variation that identifies $\beta$.

The problems and shortcomings of fixed effects are basically the same as those of DD. The big advantage relative to DD is that many changes and years can be pooled in a single regression, producing more precise and robust results. However, the disadvantage is that fixed effects is a black-box regression, and it is more difficult to check trends visually, as can be done with a single change.

Another common criticism of fixed effects is that state policy reforms may respond to trends in outcomes Y (example: increase the generosity of welfare benefits when the economy is not doing well) and thus produce a spurious correlation even when one controls for group and time dummies.

Fixed effects are valid only if the response is immediate. If full responses take more than 1 period, the fixed effects estimate might be biased because the true model should include lagged variables $T_{j,t-1}$.

5 Instrumental Variable (IV) methodology

5.1 Basics

We know that the OLS regression $Y = X\beta + \epsilon$ is biased when $\epsilon$ is correlated with X. A way to get around this issue is to use an instrument Z for X. An instrument Z is a set of P variables. P must be equal to or larger than K, the number of variables in X. The IV formula is given by

$$\hat\beta_{IV} = (X' P_Z X)^{-1} (X' P_Z Y)$$

where

$$P_Z = Z (Z'Z)^{-1} Z'$$

NB: when the number of instruments is exactly equal to the number of independent variables

(P = K), the formula reduces to:

$$\hat\beta_{IV} = (Z'X)^{-1} (Z'Y)$$

$\hat\beta_{IV}$ is consistent when Z satisfies two conditions:

1) Z is uncorrelated with $\epsilon$
2) Z is correlated with X ($Z'X$ is of rank K).

Stata command:

regress y x1.. xk (z1.. zp)

where x1.. xk is the list of independent variables and z1.. zp is the list of instruments. Note that, in general, we are interested in the coefficient on one variable only (say x1) and we are confident that the other controls x2.. xk are not correlated with $\epsilon$. In that case, x2.. xk can be used as instruments and we need to find only one extra instrument z. The Stata command in that case is:

regress y x1 x2.. xk (z x2.. xk)

The spirit of OLS is to compare outcomes Y for high X vs low X. The spirit of IV is to compare outcomes Y for high Z vs low Z. The regression of the outcome Y on the instruments Z is called the reduced form.

To understand this clearly, it is useful to consider the case of a single variable X and a single binary instrument Z. For example, X is a variable indicating whether you served in the military during the Vietnam era (Angrist 1990), Z is a variable indicating whether you had a high or a low number in the draft lottery, and Y is your earnings after the war. In that case, simple computations of the type we did for simple differences show that:

$$\hat\beta_{IV} = \frac{\hat{E}(Y \mid Z = 1) - \hat{E}(Y \mid Z = 0)}{\hat{E}(X \mid Z = 1) - \hat{E}(X \mid Z = 0)}$$

That is, the IV estimate is the ratio of the difference in means of the outcome Y between the group Z = 1 and the group Z = 0 to the difference in means of the variable X between the group Z = 1 and the group Z = 0. This is the Wald estimator, a very transparent IV estimator.

Good instruments are:

- Strongly correlated with X: $E(X \mid Z)$ varies a lot with Z. This correlation is checked by the first stage: regress X on Z,

$$X = Z\gamma + \nu$$

$\gamma$ has to be non-zero and significant; otherwise the instrument is weak and standard errors for $\beta$ will be large.

- Uncorrelated with Y beyond the direct effect through X (in other words, Z can be excluded from the equation $Y = X\beta + \epsilon$; that is, it is not correlated with $\epsilon$). That cannot be tested and has to be assessed on a case-by-case basis.

When there are more instruments than columns in X, two tests can be used:

- An overidentification test, which in essence compares all the IV estimates obtained from using different subsets of instruments, and tests whether they are the same.
- A Hausman test, when you trust one instrument: it compares the results obtained with only this instrument against the results obtained using the whole set of instruments.

These tests are useful, but have two problems:

- They may reject if the treatment effect is heterogeneous and the instruments exploit variation at different parts of the treatment response function (cf. below on the interpretation of IV).
- Their power is not very strong and they tend to accept too often.

5.2 Where to find instruments?

Instruments do not fall from the sky. Because it is difficult to test the validity of the instruments, you need to be convinced on a priori grounds that they are valid. Good instruments are usually generated by real or natural experiments. Examples:

- Random encouragement designs: These are cases where the probability that someone receives

a treatment varies randomly across people. The actual treatment status may then result from a choice, and thus be endogenous.
  - Vietnam era draft lottery (Angrist 1990): a low lottery number makes it more likely that someone is drafted, but he can still dodge the draft if he has a low number, or enroll voluntarily if he has a high number.
  - To test the effect of flu vaccine on flu (Imbens, K.Hirano, D.Rubin and A.Zhou 2000), a random encouragement design was used. A letter reminding doctors to propose a flu vaccine to their clients was randomly sent to a set of doctors. The instrument (the letter) is randomly assigned, but not the treatment (flu vaccine).
- Instruments trying to approximate a random encouragement design:
  - Distance to a hospital with operating facilities as an instrument for surgery in heart attacks.
  - Distance to school as an instrument for schooling.
  These instruments must be evaluated carefully.
- Policy reforms etc... An instrument can be formed by interacting two variables, for example a time and a group variable. We are then using a DD as the first stage of the relationship. In the second stage, we control for the two uninteracted variables.

For example, consider the school experiment in (Duflo 2000). There are two types of regions (High H and Low L program regions) and two types of cohorts (Young Y and Old O). The program affected mostly the education of young cohorts in the high program regions. Assume that the program affected the wages of individuals only through its effect on education.

The difference-in-differences estimator for the effect of the program on education S is:

$$(E[S \mid H, Y] - E[S \mid H, O]) - (E[S \mid L, Y] - E[S \mid L, O])$$

The difference-in-differences estimator for the effect of the program on wages W is:

$$(E[W \mid H, Y] - E[W \mid H, O]) - (E[W \mid L, Y] - E[W \mid L, O])$$

The effect of education on wages can be obtained by taking the ratio of the two DD. This is the Wald estimator:

$$\frac{(E[W \mid H, Y] - E[W \mid H, O]) - (E[W \mid L, Y] - E[W \mid L, O])}{(E[S \mid H, Y] - E[S \mid H, O]) - (E[S \mid L, Y] - E[S \mid L, O])}$$

The corresponding regression would be:

$$W = \alpha + \beta Y + \gamma H + \delta S + \epsilon$$

where H is a dummy equal to 1 in the high program region, Y is a dummy equal to 1 for the young, and S is instrumented with the interaction $H \times Y$.

5.3 Problems with IV

1. IV can be very biased (much more than OLS). Suppose our instrument is not truly exogenous, i.e. $Cov(Z, \epsilon) \neq 0$. Consider the example of the difference in wages (Y) due to serving in the Vietnam War (X), using the draft lottery number (Z) as an instrument. We know that the OLS estimator $E(Y \mid X = 1) - E(Y \mid X = 0)$ is biased, because serving in the army is likely to be correlated with lots of unobserved characteristics. For the IV Wald estimator, the denominator represents the difference in the probability of serving in the army between people with high and low lottery numbers, i.e. this number is less than 1. Suppose in fact that the draft lottery number were not random; then $E(Y \mid Z = 1) - E(Y \mid Z = 0)$ is a biased estimate of the reduced form impact of the lottery number on wages. Notice now that even if the bias in the reduced form is of the same order of magnitude as the bias of OLS, the IV estimate as a whole is much more biased, because the denominator is less than one. If the instrument is strong, i.e. a very good predictor of army service, then the denominator is closer to (1 - 0), and hence the bias due to the violation of the exclusion restriction is smaller.

2. Even instruments that are randomly assigned can be invalid. What is needed is that they do not affect the outcome directly. Examples:

- Draft lottery and military service (Angrist 1990):

A low number could encourage someone to stay in college to evade the draft, thereby increasing his earnings directly.
- Flu vaccine: The letter sent to the doctors seems to have convinced them to take other steps to prevent the flu (Imbens et al. 2000). It therefore had a direct effect on flu, not due to the shot per se. The IV estimate using the letter as instrument would be an overestimate.

3. How representative is the IV answer? Notice that the IV estimator is the ratio of the change in Y due to a change in Z to the change in X due to a change in Z, and we are assuming that a lower draft lottery number makes army service more likely, not less. We can partition all our sample units into the following categories: those for whom the lottery number makes a difference to the army service decision and those for whom it doesn't (this includes those who would have volunteered anyway, and those who would have avoided the draft irrespective of their lottery number). Then the change in X due to a change in Z is non-zero only for the first group (the "compliers"), and so the IV estimate represents the impact of army service on wages only for this group. This is called the Local Average Treatment Effect, i.e. the impact of the treatment (army service) only for the group affected by the instrument (see Angrist and Imbens 1994; Angrist, Imbens and Rubin 1996; Angrist and Krueger 1999 for detailed explanations of this). If we assume that the impact of army service on wages is the same for every individual in the population ("constant treatment effect"), then this IV estimate represents a population average. However, if the impact of army service is different for non-compliers, then we must be careful when extrapolating IV estimates to the whole population.

4. Specification searching and publication bias. Papers with a t-statistic above 2 are more likely to be published. IV estimates have larger standard errors than OLS; therefore they also need larger point estimates to be significant. Reported IV estimates will therefore have a natural tendency to be too high.
This is Ashenfelter, Harmon and Oosterbeek's (1999) explanation for why IV estimates of returns to education tend to be higher than OLS estimates.
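Points 1 and 3 above can be illustrated with a small simulation. The sketch below is hypothetical: the population shares (20% always-takers, 30% compliers, 50% never-takers), the effect sizes and all function names are assumptions chosen for illustration, not values from any of the cited studies. With a valid instrument the Wald estimator should recover the effect for compliers only (2.0 here), while a direct effect of Z on Y of 0.3 is divided by a first stage of 0.3 and so shifts the estimate by about 1.0.

```python
import random
from statistics import mean


def wald_estimate(y, x, z):
    """Reduced-form difference in means divided by first-stage difference in means."""
    def group_mean(values, group):
        return mean(v for v, zi in zip(values, z) if zi == group)

    reduced_form = group_mean(y, 1) - group_mean(y, 0)
    first_stage = group_mean(x, 1) - group_mean(x, 0)
    return reduced_form / first_stage


def naive_difference(y, x):
    """Simple difference in mean outcomes between treated and untreated units."""
    treated = mean(yi for yi, xi in zip(y, x) if xi == 1)
    untreated = mean(yi for yi, xi in zip(y, x) if xi == 0)
    return treated - untreated


def simulate(n=200_000, direct_effect=0.0, seed=0):
    """Hypothetical population with heterogeneous treatment effects (assumed values)."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)          # randomly assigned binary instrument
        u = rng.random()
        if u < 0.2:                     # always-taker: treated whatever z is
            xi, effect, unobserved = 1, 1.0, 2.0
        elif u < 0.5:                   # complier: treated only when z = 1
            xi, effect, unobserved = zi, 2.0, 0.0
        else:                           # never-taker: never treated
            xi, effect, unobserved = 0, 3.0, 0.0
        # A non-zero direct_effect violates the exclusion restriction.
        yi = effect * xi + unobserved + direct_effect * zi + rng.gauss(0, 1)
        y.append(yi)
        x.append(xi)
        z.append(zi)
    return y, x, z


y, x, z = simulate()
print("naive treated-untreated difference:", round(naive_difference(y, x), 2))
print("Wald, valid instrument:", round(wald_estimate(y, x, z), 2))

y_bad, x_bad, z_bad = simulate(direct_effect=0.3)
print("Wald, exclusion restriction violated:", round(wald_estimate(y_bad, x_bad, z_bad), 2))
```

In this setup the naive comparison is contaminated by the unobserved component of always-takers, and the Wald estimate says nothing about the (different) effects assumed for always-takers and never-takers.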

5.4 Getting Instruments from Theoretical Models

In many papers, authors write down theoretical models which generate instruments. For example, (Strauss 1986) writes down a model of the effect of food on productivity. The price of food is negatively correlated with food quantity. The model is written such that the price of food is also uncorrelated with productivity apart from its effect on food intake. This strategy, known as structural model estimation, produces a framework that is complete (theoretical model and data application) and estimates that are fully meaningful in the context of the model. However, these estimates are valid only to the extent that the structural model is valid.

6 Regression Discontinuity Design

References: The important reference is (Campbell 1969). See (Angrist and Lavy 1999, Van der Klauw 1996) for convincing applications.

RDD can be used when the treatment is a discontinuous function of an underlying continuous variable. Examples:

- Grameen bank eligibility rule: eligible if the household owns less than 0.5 hectares.
- Financial aid at NYU for college studies: step function of an index (grades in high school, SAT scores, income of parents...).
- Maimonides' rule for class size in Israel: an extra teacher is added as soon as the number of pupils in a class reaches a multiple of 40 students.

When this rule is followed at least approximately, it means that two people with very close characteristics will be exposed to different treatments.

Idea of RD: compare outcomes for people whose value of the underlying targeting variable is just below and just above the discontinuity.
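The comparison just described can be sketched numerically. The example below is a hypothetical simulation: the cutoff of 0.5, the true effect of 1.0, the linear dependence of Y on X with slope 2 and the function name `rd_difference` are all assumptions made for illustration. It compares mean outcomes in a window of half-width `eps` on each side of the cutoff; the difference should approach the true effect as the window shrinks, at the cost of using fewer observations.

```python
import random
from statistics import mean


def rd_difference(xs, ys, cutoff, eps):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    above = [y for x, y in zip(xs, ys) if cutoff <= x < cutoff + eps]
    below = [y for x, y in zip(xs, ys) if cutoff - eps <= x < cutoff]
    return mean(above) - mean(below)


rng = random.Random(1)
CUTOFF, TRUE_EFFECT = 0.5, 1.0        # assumed values for the illustration

# Targeting variable X is uniform on [0, 1); treatment is assigned when X >= CUTOFF.
xs = [rng.random() for _ in range(400_000)]
# Outcome depends smoothly on X (slope 2) and jumps by TRUE_EFFECT at the cutoff.
ys = [2.0 * x + TRUE_EFFECT * (x >= CUTOFF) + rng.gauss(0, 0.5) for x in xs]

for eps in (0.4, 0.1, 0.02):
    n_used = sum(1 for x in xs if CUTOFF - eps <= x < CUTOFF + eps)
    print(eps, n_used, round(rd_difference(xs, ys, CUTOFF, eps), 3))
```

With a wide window the difference also picks up the smooth dependence of Y on X (roughly 2 times `eps` in this setup), which is why the comparison is only convincing close to the discontinuity.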

Formally: Imagine first that the treatment rule is based on some number X and that the rule is:

- T = 1 if $X \geq \bar{X}$
- T = 0 if $X < \bar{X}$

Then with a large sample, you would compute (for some $\epsilon$):

$$E[Y \mid \bar{X} \leq X < \bar{X} + \epsilon] - E[Y \mid \bar{X} - \epsilon \leq X < \bar{X}] = E[Y^T \mid T, \bar{X} \leq X < \bar{X} + \epsilon] - E[Y^C \mid C, \bar{X} - \epsilon \leq X < \bar{X}]$$

The assumption is that as $\epsilon$ goes to 0, the difference between the two groups in the absence of the treatment shrinks to 0.

More realistically, the rule increases the probability that someone will be treated.

- Not everybody with $X \geq \bar{X}$ will be treated (for example, some people may not have asked for financial aid, even though they would qualify for it).
- Some people with $X < \bar{X}$ will be treated (for example, some schools with fewer than 40 students still get a second teacher).

Formally:

- $P(T = 1) = p_1$ if $X \geq \bar{X}$
- $P(T = 1) = p_0$ if $X < \bar{X}$,

with $p_1 > p_0$. We can again calculate the difference in outcome between individuals just above and just below $\bar{X}$:

$$E[Y \mid \bar{X} \leq X < \bar{X} + \epsilon] - E[Y \mid \bar{X} - \epsilon \leq X < \bar{X}]$$

Under the same assumption as before (that for $\epsilon$ small enough, the outcomes in the absence of treatment would be the same in the two groups), we can attribute this difference to the difference in the probability of treatment. But now, there are some treated people and some control people

on both sides of $\bar{X}$. To obtain the effect of the treatment, we must scale up the difference by dividing it by the difference in the probability of treatment between the two groups:

$$\frac{E[Y \mid \bar{X} \leq X < \bar{X} + \epsilon] - E[Y \mid \bar{X} - \epsilon \leq X < \bar{X}]}{E[T \mid \bar{X} \leq X < \bar{X} + \epsilon] - E[T \mid \bar{X} - \epsilon \leq X < \bar{X}]}$$

The relationship between this and IV should be clear: this is the Wald estimate (which we derived above), using a dummy for $X \geq \bar{X}$ as an instrument for the treatment status.

This regression-discontinuity Wald estimator is numerically identical to a non-parametric kernel estimator with a uniform kernel. Under this interpretation, this estimator would be valid even if the IV assumptions were violated. However, it would be asymptotically biased, and we would need to use a slightly more complicated non-parametric estimator to reduce the bias (Hahn, Todd and Van der Klaauw 2001).

Researchers have exploited this to construct IV versions of the RD estimator. We start with a model:

$$Y = \alpha T + g(X) + \epsilon,$$

where $g(X)$ is a set of smooth functions of X (polynomials, splines, etc.), thus controlling for the dependence of Y on X. A dummy for $X \geq \bar{X}$ can then be used as an instrument for receiving the treatment, in a regular 2SLS strategy.

Cautionary remarks:

It is important to check in the data that there is actually a discontinuity in the probability of being treated at the expected point $\bar{X}$. Example: In the Grameen case, (Morduch 1999) shows that people with more than 0.5 hectares of land are as likely as other people to get credit. The first step should be to regress the treatment variable non-parametrically on the variable X, and check whether the discontinuity is actually present in the data.

In developing countries, even strong rules are rarely followed to the letter... Fancy means-testing procedures are unlikely to generate RD that can be exploited in practice.

A large sample is required, since you will be exploiting only variation coming from individuals around $\bar{X}$.

7 The measurement error problem

7.1 Classical measurement error

Assume that you want to estimate the relationship

$$y_i^* = \beta x_i^* + \epsilon_i,$$

for $i = 1$ to $N$, where, for example, $y^*$ is log calories per capita (after having taken out the mean) and $x^*$ is the log of long-run resources per capita. However, the true $y^*$ and the true $x^*$ are both unobserved. What you observe are proxies of these measures, i.e. the true variables measured with error. (For example: it is quite difficult to know what people really eat: they eat food out of the home, there is wastage, ... It is also difficult to know people's long-run resources. What we observe in a survey is people's current income, which can vary much more.)

We model measurement error in the following way. We observe $y_i$ and $x_i$, which are the true variables plus some noise:

$$y_i = y_i^* + \nu_i, \qquad x_i = x_i^* + \upsilon_i$$

In the classical measurement error case, the assumption is that measurement errors are uncorrelated with the truth, and with each other (we will see what happens if we relax this assumption). So:

$$E[\nu_i y_i^*] = E[\upsilon_i x_i^*] = E[\nu_i x_i^*] = E[\upsilon_i y_i^*] = E[\upsilon_i \nu_i] = 0$$

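Obviously, we will also assume that the model is otherwise correctly specified: $E[\epsilon_i x_i^*] = 0$. Let us rewrite the model in terms of the variables we actually observe:

$$y_i = \beta x_i + (\epsilon_i + \nu_i - \beta \upsilon_i)$$

The source of the problem is that the new error term $w_i = \epsilon_i + \nu_i - \beta \upsilon_i$ is now correlated with $x_i$. To see this, let us express the OLS estimator of $\beta$ in the observed equation:

$$\hat\beta_{OLS} = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} = \frac{\sum_{i=1}^{N} (x_i^* + \upsilon_i)(\beta x_i^* + \epsilon_i + \nu_i)}{\sum_{i=1}^{N} (x_i^* + \upsilon_i)(x_i^* + \upsilon_i)}$$

We want to know the probability limit of $\hat\beta_{OLS}$ as $N \to \infty$. With our assumptions we obtain:

$$\text{plim}(\hat\beta_{OLS}) = \text{plim}\left(\frac{\beta \frac{1}{N} \sum_{i=1}^{N} x_i^{*2}}{\frac{1}{N} \sum_{i=1}^{N} (x_i^{*2} + \upsilon_i^2)}\right) = \beta \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{\upsilon}},$$

where for a (mean zero) random variable x, $\sigma_x^2 = \text{plim}\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)$ is the variance of x.

Note:

1. There is an attenuation bias: the estimated coefficient is smaller in absolute value than the true coefficient (the "Iron law of econometrics").
2. The measurement error in y does not lead to attenuation bias, in the uncorrelated case.
3. The larger the variance of the measurement error relative to the variance of the underlying variable (that is, the lower the "signal to noise ratio"), the larger the attenuation bias.

7.2 The problem of measurement error with fixed effects

Imagine you now have the relationship: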

$$y_{it}^* = \beta x_{it}^* + \epsilon_{it}, \quad \text{with } \epsilon_{it} = \omega_i + \xi_{it}.$$

You are worried that there is a correlation between $\omega_i$ and $x_{it}^*$, which would lead to a bias in the OLS estimation of this equation. If you have two years of data, you might think of taking first differences:

$$y_{i2}^* - y_{i1}^* = \beta (x_{i2}^* - x_{i1}^*) + \xi_{i2} - \xi_{i1}$$

which we rewrite:

$$\Delta y_{it}^* = \beta \Delta x_{it}^* + \Delta \xi_{it}$$

The fixed effect has now disappeared, so we have solved this problem. However, the measurement problem is still here. The probability limit of the OLS estimate of $\beta$ in the first-difference equation (assuming uncorrelated measurement error) is:

$$\text{plim}\, \hat\beta_{FD} = \beta \frac{\sigma^2_{\Delta x^*}}{\sigma^2_{\Delta x^*} + \sigma^2_{\Delta \upsilon}}$$

If measurement errors are independent in each period (an extreme case), then $\sigma^2_{\Delta \upsilon} = 2\sigma^2_{\upsilon}$. However, $x^*$ is presumably strongly autocorrelated, so $\sigma^2_{\Delta x^*} < \sigma^2_{x^*}$. Therefore the attenuation bias is stronger with fixed effects. In fact, it can become really large if the underlying variable does not move very much over time but there is measurement error in every period.

7.3 Instrumental variables to solve the measurement error problem

Coming back to the case of a single cross-section, assume that you have another, independent measure of $x^*$, possibly noisy as well. For example, you ask each spouse in the family about food expenditure (while the other spouse is not in the room).

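$$z_i = x_i^* + \mu_i, \quad \text{with } E[\mu_i x_i^*] = E[\mu_i \upsilon_i] = 0.$$

The instrumental variable estimator of $\beta$ is:

$$\hat\beta_{IV} = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i z_i} = \frac{\sum_{i=1}^{N} (x_i^* + \upsilon_i)(\beta x_i^* + \epsilon_i + \nu_i)}{\sum_{i=1}^{N} (x_i^* + \upsilon_i)(x_i^* + \mu_i)}$$

$$\text{plim}\, \hat\beta_{IV} = \beta$$

So the IV estimator is consistent. (As written, x serves as the instrument for z; by symmetry, using z as the instrument for x gives the same probability limit.)

7.4 Non-classical measurement error

This can occur for various reasons:

1. Measurement error is correlated with the underlying variables. In this case, there is not necessarily an attenuation bias, and error in the measurement of y can also lead to biased estimates. For example, assume that $E[\nu_i x_i^*] = \sigma_{\nu x} \neq 0$. There would be a positive correlation between the measurement error in calorie intake and income if calorie intakes tend to be overestimated for high-income households (because they waste more) and underestimated for low-income households. Then the probability limit of the OLS estimator becomes:

$$\text{plim}(\hat\beta_{OLS}) = \frac{\beta \sigma^2_{x^*} + \sigma_{\nu x}}{\sigma^2_{x^*} + \sigma^2_{\upsilon}}$$

The bias depends on how large $\sigma_{\nu x}$ is.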

2. Our regressors are categorical variables, e.g. years of schooling (discrete values) or a dummy for high-school graduate. In this case, the lowest category cannot under-report and the highest cannot over-report, which means that the distribution of the measurement error is related to the value of the regressor, thus violating the classical assumptions. In the case of only two categories, the OLS estimates are biased downwards, but two-stage IV estimates are biased upwards. In the case of only two schooling categories (0 and 1) and two measurements $S_1$ and $S_2$ of the true schooling level S, we can use $S_2$ as an instrument for $S_1$. In this case, Kane et al. (1999) derive the probability limit of the 2SLS estimator to be

$$\text{plim}(\hat\beta_{2SLS}) = \frac{\beta}{1 - (\alpha_1 + \alpha_2)}$$

where $\alpha_1 = Pr(S_1 = 0 \mid S = 1)$ and $\alpha_2 = Pr(S_1 = 1 \mid S = 0)$. Since the denominator is less than 1, the IV estimator is biased upwards. In the general case of multiple categories, we cannot even determine the direction of the bias. Different ways of solving these problems include estimating the extent of measurement error by using a validation data set (Pischke 1995), putting restrictions on the form of the measurement error (Card 1996), or estimating the extent of measurement error from the data by using the presence of two measures of the regressor (Kane et al. 1999).

3. Suppose people do not report the mismeasured x and y, but instead are aware of the possibility of mismeasurement and report their best estimates of $x^*$ and $y^*$, based upon the observed x and y. For instance, people are asked how much food they buy in a month and they report a monthly figure based on last week's consumption. Or people are asked their income levels, and they report their best estimate, which may not include some components like interest income or capital gains. The best estimate $\tilde{x}$ is $E[x^* \mid x]$, which for our linear model would be a linear combination of the observed x and $\mu_x$, the unconditional mean of $x^*$ (and similarly for y).
Crucially, the measurement error between the reported value and the true value ($\tilde{x} - x^*$) would now be uncorrelated with the reported $\tilde{x}$ (a property of conditional expectations), so that measurement error in the regressor does not lead to a downward bias in OLS. Further, using IV in such a situation would result in an upward-biased estimate. On the other hand, the reported $\tilde{y}$ is

$$\tilde{y} = E[y^* \mid y] = \lambda y + (1 - \lambda)\mu_y \;\Rightarrow\; Cov(\tilde{y}, x^*) = \lambda\, Cov(y, x^*) = \lambda\, Cov(y^*, x^*),$$

so our OLS estimates are biased downwards when there is measurement error in y, but not biased if there is measurement error in x. This is the reverse of the results

with classical measurement error! Hyslop and Imbens (2000) also consider the case where respondents report their best estimate of $x^*$ taking into account both the observed y and the observed x, and find that measurement error can even lead to upward biases in OLS.
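The classical results of sections 7.1 to 7.3 can be checked with a short simulation. This is a minimal sketch under assumed values: all variables are normal, the true regressor and both measurement errors have unit variance, $\beta = 1$, the autocorrelation of the true regressor is 0.9, and the function names are made up for the illustration. Under these assumptions the formulas above give a probability limit of 0.5 for OLS, 1 for IV with a second noisy measure, and 0.2/(0.2 + 2), about 0.09, for first differences.

```python
import random


def slope(x, y):
    """OLS slope through the origin: sum(x*y) / sum(x*x) (variables have mean zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def iv_slope(z, x, y):
    """IV slope using z as the instrument for x: sum(z*y) / sum(z*x)."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))


rng = random.Random(42)
N, BETA, RHO = 200_000, 1.0, 0.9      # assumed values for the illustration

# True regressor in two periods: unit variance, autocorrelation RHO.
x1_true = [rng.gauss(0, 1) for _ in range(N)]
x2_true = [RHO * a + (1 - RHO ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1_true]

# Observed regressors: truth plus independent noise of variance 1.
x1 = [a + rng.gauss(0, 1) for a in x1_true]
x2 = [a + rng.gauss(0, 1) for a in x2_true]
# Second, independent noisy measure of the period-1 regressor.
z1 = [a + rng.gauss(0, 1) for a in x1_true]

# Observed outcomes: structural error plus measurement error in y.
y1 = [BETA * a + rng.gauss(0, 0.5) + rng.gauss(0, 0.5) for a in x1_true]
y2 = [BETA * a + rng.gauss(0, 0.5) + rng.gauss(0, 0.5) for a in x2_true]

beta_ols = slope(x1, y1)              # attenuated towards 0.5
beta_iv = iv_slope(z1, x1, y1)        # consistent for BETA
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
beta_fd = slope(dx, dy)               # much stronger attenuation

print("OLS:", round(beta_ols, 3))
print("IV with second measure:", round(beta_iv, 3))
print("First differences:", round(beta_fd, 3))
```

The measurement error in y only adds noise here; the attenuation comes entirely from the error in x, and it is magnified in first differences because differencing removes most of the signal but none of the noise.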

References

Altonji, Joseph, Todd E. Elder, and Christopher R. Taber (2000) Selection on observed and unobserved variables: Assessing the effectiveness of Catholic schools. NBER Working Paper No. W7831

Angrist, Joshua D. (1990) Lifetime earnings and the Vietnam era draft lottery: Evidence from Social Security Administrative records. American Economic Review 80(3), 313-336

Angrist, Joshua D. (1998) Estimating the labor market impact of voluntary military service using social security data on military applicants. Econometrica 66(2), 249-88

Angrist, Joshua D., and Alan B. Krueger (1999) Empirical strategies in labor economics. Forthcoming, Handbook of Labor Economics

Angrist, Joshua D., and Guido Imbens (1994) Identification and estimation of local average treatment effects. Econometrica 62(2), 467-475

Angrist, Joshua D., and Victor Lavy (1999) Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics 114(2), 533-575

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin (1996) Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434), 444-455

Ashenfelter, Orley, and David Card (1985) Using the longitudinal structure of earnings to estimate the effect of training programs. Review of Economics and Statistics 67(4), 648-60

Ashenfelter, Orley, and Mark W. Plant (1990) Nonparametric estimates of the labor-supply effects of negative income tax programs. Journal of Labor Economics 8(1), 396-415

Ashenfelter, Orley, Colm P. Harmon, and Hessel Oosterbeek (1999) A review of estimates of the schooling/earnings relationship. Labour Economics 6(4), 453-470

Campbell, Donald T. (1969) Reforms as experiments. American Psychologist 24, 407-429

Card, David, and Alan Krueger (1992) Does school quality matter? Returns to education and the characteristics of public schools in the United States. Journal of Political Economy 100(1), 1-40

Deaton, Angus (1997) The Analysis of Household Surveys (World Bank, International Bank for Reconstruction and Development)

Dehejia, Rajeev H., and Sadek Wahba (1999) Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association 94(448), 1053-62

Duflo, Esther (2000) Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment. Working Paper 7860, National Bureau of Economic Research, August

Feldstein, Martin (1995) The effect of marginal tax rates on taxable income: A panel study of the 1986 tax reform act. Journal of Political Economy 103(3), 551-72

Gruber, Jonathan (1994) The incidence of mandated maternity benefits. American Economic Review 84(3), 622-641

Gruber, Jonathan (1996) Cash welfare as a consumption smoothing mechanism for single mothers. Working Paper 5738, National Bureau of Economic Research

Hahn, Jinyong, Petra Todd, and Wilbert Van der Klaauw (2001) Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 69(1), 201-209

Hausman, Jerry A., and David A. Wise (1979) Attrition bias in experimental and panel data: The Gary income maintenance experiment. Econometrica 47(2), 455-73

Imbens, Guido, K. Hirano, D. Rubin, and A. Zhou (2000) Estimating the effect of flu shots in a randomized encouragement design. Biostatistics 1(1), 69-88

Kremer, Michael, Paul Glewwe, and Sylvie Moulin (1998) Textbooks and test scores: Evidence from a prospective evaluation in Kenya. Mimeo, Harvard

Krueger, Alan (2000) Experimental estimates of education production functions. Forthcoming, Quarterly Journal of Economics

Lalonde, Robert J. (1986) Evaluating the econometric evaluations of training programs using experimental data. American Economic Review 76(4), 602-620

Mayer, Susan E. (1999) How did the increase in economic inequality between 1970 and 1990 affect poor American children's educational attainment? Mimeo

Meyer, Bruce D. (1995) Natural and quasi-experiments in economics. Journal of Business and Economic Statistics 13(2), 151-161

Morduch, Jonathan (1998) Does microfinance really help the poor? New evidence from flagship programs in Bangladesh. Mimeo

Morduch, Jonathan (1999) The microfinance promise. Forthcoming, Journal of Economic Literature

Pencavel, John (1986) Labor supply of men. In Handbook of Labor Economics, ed. Orley Ashenfelter and Richard Layard (Elsevier Science)

Rosenbaum, Paul, and Donald B. Rubin (1984) Estimating the effects caused by treatments: Comment [on the nature and discovery of structure]. Journal of the American Statistical Association 79(385), 26-28

Rosenbaum, Paul R. (1995) Observational studies. In Series in Statistics (New York, Heidelberg and London: Springer)

Sacerdote, Bruce (2000) Peer effects with random assignment: Results for Dartmouth roommates. Working Paper 7469, National Bureau of Economic Research

Strauss, John (1986) Does better nutrition raise farm productivity? Journal of Political Economy 94(2), 297-320

Van der Klauw, Wilbert (1996) A regression-discontinuity evaluation of the effect of financial aid offers on college enrollment. Mimeo, New York University, Department of Economics