Problem set 2, Part 2: Generalized Roy Model, 2 Factor, no normality

After doing this problem set you should be able to figure out how to include more factors (so you make the model more flexible) and get rid of the normality assumptions. (There is a paper by Ferguson (1983) showing that a mixture of normals can approximate almost any distribution arbitrarily well, so we are really being very flexible here. That the model does not depend on functional form or distributional assumptions can be seen in Carneiro, Hansen and Heckman (2003).)

Ok, so much for simple 1 factor models and normality assumptions. Let us now go to a model of the form

$$ I = Z\gamma + V \tag{1} $$
$$ Y_{t,1} = X\beta_{t,1} + \varepsilon_{t,1} \tag{2} $$
$$ Y_{t,0} = X\beta_{t,0} + \varepsilon_{t,0} \tag{3} $$
$$ D = 1(I > 0) \tag{4} $$
$$ Y_t = D\,Y_{t,1} + (1 - D)\,Y_{t,0} $$

Assume that

$$ V = f_1\alpha_{V1} + f_2\alpha_{V2} + U_V \tag{5} $$
$$ \varepsilon_{t,1} = f_1\alpha_{t,11} + f_2\alpha_{t,12} + U_{t,1} \tag{6} $$
$$ \varepsilon_{t,0} = f_1\alpha_{t,01} + f_2\alpha_{t,02} + U_{t,0} \tag{7} $$

with $(U_{t,1}, U_{t,0}, U_V)$ mutually independent ($U_{t,1} \perp U_{t,0} \perp U_V$) for all $t$, and

$$ f_1 \perp f_2, \qquad (f_1, f_2) \perp (U_{t,0}, U_{t,1}, U_V), $$
$$ U_{t,1} \sim N\left(0, \sigma^2_{U_{t,1}}\right), \qquad U_{t,0} \sim N\left(0, \sigma^2_{U_{t,0}}\right), \qquad U_V \sim N\left(0, \sigma^2_{U_V}\right). $$
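To see how the factor structure in (5)-(7) ties the choice equation to the outcome equations, it may help to write out the covariances it implies. The following is only a sketch under the independence assumptions stated above. The symbols $\sigma^2_{f_1} = \mathrm{Var}(f_1)$ and $\sigma^2_{f_2} = \mathrm{Var}(f_2)$ are introduced here for illustration and are assumed finite:

$$ \mathrm{Cov}(\varepsilon_{t,1}, V) = \alpha_{t,11}\alpha_{V1}\,\sigma^2_{f_1} + \alpha_{t,12}\alpha_{V2}\,\sigma^2_{f_2} $$
$$ \mathrm{Cov}(\varepsilon_{t,0}, V) = \alpha_{t,01}\alpha_{V1}\,\sigma^2_{f_1} + \alpha_{t,02}\alpha_{V2}\,\sigma^2_{f_2} $$
$$ \mathrm{Cov}(\varepsilon_{t,1}, \varepsilon_{t,0}) = \alpha_{t,11}\alpha_{t,01}\,\sigma^2_{f_1} + \alpha_{t,12}\alpha_{t,02}\,\sigma^2_{f_2} $$

The cross terms vanish because $f_1 \perp f_2$ and the $U$ terms are independent of the factors and of each other. All dependence between selection and outcomes therefore runs through $(f_1, f_2)$, and once you condition on the factors the equations are independent.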
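Suppose that additionally you have two external test equations, which we observe regardless of $D$ and which only depend on $f_1$. The tests take the form

$$ T_1 = Q\theta_1 + f_1\delta_{11} + U_{T_1} $$
$$ T_2 = Q\theta_2 + f_1\delta_{21} + U_{T_2} $$
$$ U_{T_1} \sim N\left(0, \sigma^2_{T_1}\right), \qquad U_{T_2} \sim N\left(0, \sigma^2_{T_2}\right). $$

1. Write down the likelihood function for this problem assuming that $f_1$ and $f_2$ have some distribution, say $\Pr(f_1, f_2) = \Pr(f_1)\Pr(f_2)$. Notice that conditional on $f$ everything is independent, so take advantage of this when writing the likelihood.

Now assume that

$$ f_1 \sim \sum_{k=1}^{K_1} p_{f_1,k}\, N\left(\mu_{f_1,k}, \sigma^2_{f_1,k}\right), \qquad \sum_{k=1}^{K_1} p_{f_1,k}\,\mu_{f_1,k} = 0, \qquad \sum_{k=1}^{K_1} p_{f_1,k} = 1 $$
$$ f_2 \sim \sum_{k=1}^{K_2} p_{f_2,k}\, N\left(\mu_{f_2,k}, \sigma^2_{f_2,k}\right), \qquad \sum_{k=1}^{K_2} p_{f_2,k}\,\mu_{f_2,k} = 0, \qquad \sum_{k=1}^{K_2} p_{f_2,k} = 1 $$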
where I am abusing notation to let $X \sim \sum_{k=1}^{K} p_k N\left(\mu_k, \sigma^2_k\right)$ stand for

$$ \Pr(X) = \sum_{k=1}^{K} p_k \frac{1}{\sqrt{2\pi\sigma^2_k}} \exp\left( -\frac{1}{2} \left( \frac{X - \mu_k}{\sigma_k} \right)^2 \right). $$

Also impose the following normalizations: $\sigma^2_{U_V} = 1$, $\delta_{11} = 1$ and $\alpha_{1,02} = 1$. To keep it simple assume that $K_1 = 2$ and $K_2 = 2$, but you can write it as a general program that allows for more mixture components, more time periods, more test equations and more factors (later you will write a program that allows for more choices too!).

2. Just to start your engines, what is the formula for the variance of a mixture of normals random variable like $X$ above?

3. Program either a Maximum Likelihood or an MCMC version of this model. Notice that if you do an MLE version you are now going to have to integrate over 2 continuous distributions, which is going to take a long time. I strongly recommend the MCMC version, since extending it to more factors is natural and it is still very fast, whereas extending the MLE version is not. You'll still have a chance to practice MLE on a dynamic program on PS8. If using an MCMC method, put non-informative priors on $\gamma$, $\theta_j$, $\beta_{t,1}$ and $\beta_{t,0}$. Put normal(0, 10.0) priors (proper but with little information) on $\alpha_{t,01}$, $\alpha_{t,11}$, $\alpha_{t,V1}$ and $\delta_{21}$, and gamma(2, 1) priors on $1/\sigma^2_{U_{t,1}}$, $1/\sigma^2_{U_{t,0}}$ and $1/\sigma^2_{T_j}$.

We are going to use dataset 2b for this part of the problem set and for future problem sets. The way this dataset was generated, again abusing notation for the mixtures, is the following:

$$ f_1 \sim 0.5\,N(1, 2) + 0.5\,N(-1, 2) $$
$$ f_2 \sim 0.3\,N(0.5, 0.5) + 0.7\,N(-0.2143, 0.1) $$
$$ U_{t,1} \sim N(0, 1), \qquad U_{t,0} \sim N(0, 1), \qquad U_V \sim N(0, 1), \qquad t = 1, 2, 3. $$
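The mixture notation can be made concrete with a short sketch that evaluates the mixture density and draws from it. This is only an illustration: the function names `mixture_pdf` and `mixture_draw` are hypothetical, and the second argument of $N(\cdot,\cdot)$ is read as a variance, which is an assumption based on the $N(\mu_k, \sigma^2_k)$ notation above.

```python
import numpy as np

def mixture_pdf(x, p, mu, var):
    """Density of sum_k p[k] * N(mu[k], var[k]) evaluated at x (var is a variance)."""
    x = np.asarray(x, dtype=float)[..., None]          # add a component axis
    p, mu, var = (np.asarray(a, dtype=float) for a in (p, mu, var))
    comp = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return comp @ p                                     # weight and sum components

def mixture_draw(rng, n, p, mu, var):
    """Draw n values: pick a component with probability p, then draw from it."""
    mu, var = np.asarray(mu, dtype=float), np.asarray(var, dtype=float)
    k = rng.choice(len(p), size=n, p=p)
    return rng.normal(mu[k], np.sqrt(var[k]))

# Mixture parameters stated for f_1 and f_2 (assumption: N(a, b) = mean a, variance b).
P_F1, MU_F1, VAR_F1 = [0.5, 0.5], [1.0, -1.0], [2.0, 2.0]
P_F2, MU_F2, VAR_F2 = [0.3, 0.7], [0.5, -0.2143], [0.5, 0.1]

# The density should integrate to one (simple Riemann sum on a wide grid).
grid = np.linspace(-15.0, 15.0, 6001)
dx = grid[1] - grid[0]
total_mass = float(np.sum(mixture_pdf(grid, P_F1, MU_F1, VAR_F1)) * dx)

# The normalization sum_k p_k * mu_k = 0 for each factor.
mean_f1 = float(np.dot(P_F1, MU_F1))
mean_f2 = float(np.dot(P_F2, MU_F2))

rng = np.random.default_rng(0)
draws_f1 = mixture_draw(rng, 200_000, P_F1, MU_F1, VAR_F1)
```

Drawing the component label first and then the normal draw is the same two-step logic an MCMC sampler would typically use when it augments the data with mixture component indicators.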
$Z_0 = X_0$ are just a constant equal to 1. Next we generate

$$ X_1 \sim N(0, 2), \qquad Z_1 \sim N(0, 2), $$

so we are in the case where $Z = (Z_0, Z_1)$ and $X = (X_0, X_1)$ are exogenous. We finally form

$$ Y_{t,1} = 2X_0 + X_1 + 2f_1 + f_2 + U_{t,1} $$
$$ Y_{t,0} = X_0 + X_1 + f_1 + f_2 + U_{t,0} $$
$$ I = 0.5Z_0 + Z_1 + f_1 + f_2 + U_V $$

for $t = 1, 2, 3$, and let $D = 1(I > 0)$, so that the observed $Y_t$ is

$$ Y_t = D\,Y_{t,1} + (1 - D)\,Y_{t,0}. $$

Finally, the test equations were generated as

$$ U_{T_1} \sim N(0, 1), \qquad U_{T_2} \sim N(0, 1), \qquad Q_0 = 1, \qquad Q_1 \sim N(0, 1), $$
$$ T_1 = Q_0 + Q_1 + f_1 + U_{T_1} $$
$$ T_2 = Q_0 + 2Q_1 + 0.5f_1 + U_{T_2}. $$

4. Run your program on this data. If you did it correctly, your estimates should be close to the values we assigned when we built the dataset.

Suppose we are interested in estimating mean treatment parameters for present values (but see Carneiro, Hansen and Heckman for the use of these methods in the estimation of distributions), and assume there is no discounting. That is, define

$$ Y_1 = \sum_{t=1}^{3} Y_{t,1}, \qquad Y_0 = \sum_{t=1}^{3} Y_{t,0}. $$
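The data generating process described above can be sketched in a few lines, which may be useful for testing a sampler on data whose true parameters are known. This is a minimal sketch, not the code that produced dataset 2b: the names `draw_mixture` and `simulate_dataset`, the sample size and the seed are all hypothetical, and the second argument of $N(\cdot,\cdot)$ is assumed to be a variance. Because the simulation keeps both potential outcomes, it also gives infeasible benchmarks for the mean treatment parameters on present values.

```python
import numpy as np

def draw_mixture(rng, n, p, mu, var):
    """Draw n values from sum_k p[k] N(mu[k], var[k]) (var read as a variance)."""
    k = rng.choice(len(p), size=n, p=p)
    return rng.normal(np.asarray(mu)[k], np.sqrt(np.asarray(var)[k]))

def simulate_dataset(n, seed=0, periods=3):
    """Simulate one sample following the stated DGP (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    f1 = draw_mixture(rng, n, [0.5, 0.5], [1.0, -1.0], [2.0, 2.0])
    f2 = draw_mixture(rng, n, [0.3, 0.7], [0.5, -0.2143], [0.5, 0.1])
    x0 = np.ones(n)                                  # X_0 = Z_0 = Q_0 = 1
    x1 = rng.normal(0.0, np.sqrt(2.0), n)            # X_1 ~ N(0, 2)
    z1 = rng.normal(0.0, np.sqrt(2.0), n)            # Z_1 ~ N(0, 2)
    q1 = rng.normal(0.0, 1.0, n)                     # Q_1 ~ N(0, 1)
    u1 = rng.normal(size=(n, periods))               # U_{t,1}
    u0 = rng.normal(size=(n, periods))               # U_{t,0}
    common = (x1 + f2)[:, None]                      # terms shared by both states
    y1 = 2.0 * x0[:, None] + common + 2.0 * f1[:, None] + u1
    y0 = x0[:, None] + common + f1[:, None] + u0
    index = 0.5 * x0 + z1 + f1 + f2 + rng.normal(size=n)   # I, with U_V ~ N(0, 1)
    d = (index > 0).astype(int)
    y = np.where(d[:, None] == 1, y1, y0)            # observed outcome Y_t
    t1 = x0 + q1 + f1 + rng.normal(size=n)
    t2 = x0 + 2.0 * q1 + 0.5 * f1 + rng.normal(size=n)
    return {"y": y, "y1": y1, "y0": y0, "d": d, "x1": x1, "z1": z1,
            "q1": q1, "t1": t1, "t2": t2, "f1": f1, "f2": f2}

sim = simulate_dataset(100_000)

# Present values without discounting: sum potential outcomes over t = 1, 2, 3.
gain = sim["y1"].sum(axis=1) - sim["y0"].sum(axis=1)
ate_benchmark = gain.mean()                   # uses both potential outcomes
tt_benchmark = gain[sim["d"] == 1].mean()     # same, restricted to D = 1
```

In real data only `y`, `d`, the covariates and the tests would be available, so these benchmarks are useful only as a check on whatever estimator you program.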
5. (Derive analytically if you want; it is a very nice exercise to do and you will not regret it, since you will use something similar in PS8.) Estimate from your results in the previous section the following:
a) Average Treatment on the Treated.
b) Average Treatment Effect.
c) Average effect of treatment for people at the margin of indifference between $D = 1$ and $D = 0$ (a nice way to do this numerically is to change the intercept of $I$ by very little and take the average treatment effect for those persons who actually change choice).
d) How would you estimate the Marginal Treatment Effect? (Can you derive it?)

6. Now let's look at the robustness of the method to changes in available information. This is going to be very important when making comparisons across methods. Suppose now that $f_2$ becomes available somehow, so that it is observed by you (the econometrician). This means we are back in a one factor model, since $f_2$ is now like an $X$ and a $Z$. Reestimate the model. Do your results change? (Hint: they shouldn't change much; to see why, check Heckman and Navarro-Lozano (2004).)

In this final stage we are going to give names to things to make it easier to understand. In the previous model, suppose that the choice being made is schooling, but that now there are 3 levels of schooling, so we have

$$ I_1 = U_{V_1} $$
$$ I_2 = Z\gamma_2 + f_1\alpha_{2,V1} + f_2\alpha_{2,V2} + U_{V_2} $$
$$ I_3 = Z\gamma_3 + f_1\alpha_{3,V1} + f_2\alpha_{3,V2} + U_{V_3} $$
$$ U_{V_j} \sim N(0, 1). $$

You should recognize the assumptions from problem set 1b. Now suppose that the outcomes you observe (i.e., the $Y_{t,j}$) in each schooling level are wages (of course, you only observe wages for the schooling level chosen, not for all 3). However, we now add employment to the decisions being made. That is, not only do I observe wages only for the schooling level chosen, but at any given time period I only observe wages if

$$ E_t = W_t\rho_t + f_1\pi_{t,1} + f_2\pi_{t,2} + \eta_t > 0, \qquad \eta_t \sim N(0, 1). $$
That is, I only observe wages for those who choose to be employed (we could easily allow the employment decision to depend on the schooling level chosen too). Assume as before that everything is independent conditional on the factors.

7. Can you write the likelihood for this model?

8. What about an algorithm? (You do not need to program it, just write how you would do it.)

By now you should be able to see that extending the model (say, to more time periods, other choices, more than 2 choices, etc.) is pretty straightforward under the factor structure assumption. This, however, is only one way to do it. With what you have learned you should be able to figure out other methods and program them, since the principles are always the same.