Regression with a Binary Dependent Variable (SW Ch. 11)

So far the dependent variable (Y) has been continuous:
• district-wide average test score
• traffic fatality rate

But we might want to understand the effect of X on a binary variable:
• Y = get into college, or not
• Y = person smokes, or not
• Y = mortgage application is accepted, or not

Example: Mortgage denial and race
The Boston Fed HMDA data set

• Individual applications for single-family mortgages made in 1990 in the greater Boston area
• 2380 observations, collected under the Home Mortgage Disclosure Act (HMDA)

Variables
• Dependent variable:
   o Is the mortgage denied or accepted?
• Independent variables:
   o income, wealth, employment status
   o other loan, property characteristics
   o race of applicant

The Linear Probability Model (SW Section 11.1)

A natural starting point is the linear regression model with a single regressor:

$Y_i = \beta_0 + \beta_1 X_i + u_i$

But:
• What does $\beta_1$ mean when Y is binary? Is $\beta_1 = \Delta Y / \Delta X$?
• What does the line $\beta_0 + \beta_1 X$ mean when Y is binary?
• What does the predicted value $\hat{Y}$ mean when Y is binary? For example, what does $\hat{Y} = 0.6$ mean?

The linear probability model, ctd.

$Y_i = \beta_0 + \beta_1 X_i + u_i$

Recall assumption #1: $E(u_i \mid X_i) = 0$, so

$E(Y_i \mid X_i) = E(\beta_0 + \beta_1 X_i + u_i \mid X_i) = \beta_0 + \beta_1 X_i$

When Y is binary,

$E(Y) = 1 \times \Pr(Y=1) + 0 \times \Pr(Y=0) = \Pr(Y=1)$

so

$E(Y \mid X) = \Pr(Y=1 \mid X)$

The linear probability model, ctd.

When Y is binary, the linear regression model

$Y_i = \beta_0 + \beta_1 X_i + u_i$

is called the linear probability model.

• The predicted value is a probability:
   o $E(Y \mid X=x) = \Pr(Y=1 \mid X=x)$ = prob. that Y = 1 given x
   o $\hat{Y}$ = the predicted probability that $Y_i = 1$, given X
• $\beta_1$ = change in probability that Y = 1 for a given $\Delta x$:

$\beta_1 = \dfrac{\Pr(Y=1 \mid X = x + \Delta x) - \Pr(Y=1 \mid X = x)}{\Delta x}$

Example: linear probability model, HMDA data

[Figure: Mortgage denial v. ratio of debt payments to income (P/I ratio) in the HMDA data set (subset)]

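Linear probability model: HMDA data

$\widehat{\text{deny}} = -.080 + .604 \times \text{P/I ratio}$   (n = 2380)
standard errors: (.032) on the intercept, (.098) on P/I ratio

• What is the predicted value for P/I ratio = .3?
  $\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio} = .3) = -.080 + .604 \times .3 = .101$
• Calculating "effects": increase P/I ratio from .3 to .4:
  $\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio} = .4) = -.080 + .604 \times .4 = .162$
  The effect on the probability of denial of an increase in P/I ratio from .3 to .4 is to increase the probability by $.604 \times .1 = .060$, that is, by 6.0 percentage points (what?).
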
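The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the two coefficients are the ones printed in the fitted line above, and the function and variable names are made up for illustration.

```python
# Linear probability model from the slide:
# deny-hat = -.080 + .604 * (P/I ratio)
INTERCEPT = -0.080
SLOPE = 0.604


def lpm_probability(pi_ratio):
    """Predicted probability of denial under the linear probability model."""
    return INTERCEPT + SLOPE * pi_ratio


p_at_30 = lpm_probability(0.3)
p_at_40 = lpm_probability(0.4)
effect = p_at_40 - p_at_30  # equals SLOPE * 0.1 because the model is linear

print(f"P/I = .3: {p_at_30:.3f}")
print(f"P/I = .4: {p_at_40:.3f}")
print(f"change:   {effect:.3f} ({100 * effect:.1f} percentage points)")
```

Because the model is linear, the same .1 increase in the P/I ratio changes the predicted probability by the same amount no matter where it starts.
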
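Next include black as a regressor:

$\widehat{\text{deny}} = -.091 + .559 \times \text{P/I ratio} + .177 \times \text{black}$
standard errors: (.032) on the intercept, (.098) on P/I ratio, (.025) on black

Predicted probability of denial:
• for black applicant with P/I ratio = .3:
  $\widehat{\Pr}(\text{deny} = 1) = -.091 + .559 \times .3 + .177 \times 1 = .254$
• for white applicant, P/I ratio = .3:
  $\widehat{\Pr}(\text{deny} = 1) = -.091 + .559 \times .3 + .177 \times 0 = .077$
• difference = .177 = 17.7 percentage points
• Coefficient on black is significant at the 5% level
• Still plenty of room for omitted variable bias…
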
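A short Python sketch of the same comparison, using only the three coefficients printed above (the names are illustrative). It shows that in the linear probability model the gap between the two applicants is just the coefficient on black, whatever the P/I ratio.

```python
# Coefficients from the slide:
# deny-hat = -.091 + .559 * (P/I ratio) + .177 * black
B0, B_PI, B_BLACK = -0.091, 0.559, 0.177


def lpm_probability(pi_ratio, black):
    """Predicted denial probability; black is 1 or 0."""
    return B0 + B_PI * pi_ratio + B_BLACK * black


p_black = lpm_probability(0.3, 1)
p_white = lpm_probability(0.3, 0)
gap = p_black - p_white  # equals B_BLACK for any P/I ratio

print(f"black, P/I = .3: {p_black:.3f}")
print(f"white, P/I = .3: {p_white:.3f}")
print(f"difference:      {gap:.3f}")
```
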
The linear probability model: Summary

• Models probability as a linear function of X
• Advantages:
   o simple to estimate and to interpret
   o inference is the same as for multiple regression (need heteroskedasticity-robust standard errors)
• Disadvantages:
   o Does it make sense that the probability should be linear in X?
   o Predicted probabilities can be $< 0$ or $> 1$!
• These disadvantages can be solved by using a nonlinear probability model: probit and logit regression

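The second disadvantage is easy to see numerically. The sketch below reuses the single-regressor coefficients reported earlier for the HMDA data (-.080 and .604); the P/I values 0.0 and 2.0 are hypothetical extremes chosen only to push the fitted line outside the unit interval.

```python
# Single-regressor linear probability model coefficients from the HMDA example
INTERCEPT, SLOPE = -0.080, 0.604


def lpm_probability(pi_ratio):
    """Fitted value of the linear probability model."""
    return INTERCEPT + SLOPE * pi_ratio


def in_unit_interval(p):
    """True if p is a valid probability."""
    return 0.0 <= p <= 1.0


# 0.0 and 2.0 are hypothetical P/I ratios used only for illustration
for pi_ratio in (0.0, 0.3, 2.0):
    p = lpm_probability(pi_ratio)
    print(f"P/I = {pi_ratio:.1f}: fitted value {p:.3f}, valid probability: {in_unit_interval(p)}")
```
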
Probit and Logit Regression (SW Section 11.2)

The problem with the linear probability model is that it models the probability of Y=1 as being linear:

$\Pr(Y = 1 \mid X) = \beta_0 + \beta_1 X$

Instead, we want:
• $0 \le \Pr(Y = 1 \mid X) \le 1$ for all X
• $\Pr(Y = 1 \mid X)$ to be increasing in X (for $\beta_1 > 0$)

This requires a nonlinear functional form for the probability. How about an "S-curve"…

The probit model satisfies these conditions:
• $0 \le \Pr(Y = 1 \mid X) \le 1$ for all X
• $\Pr(Y = 1 \mid X)$ to be increasing in X (for $\beta_1 > 0$)

Probit regression models the probability that Y=1 using the cumulative standard normal distribution function, evaluated at $z = \beta_0 + \beta_1 X$:

$\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$

• $\Phi$ is the cumulative normal distribution function.
• $z = \beta_0 + \beta_1 X$ is the "z-value" or "z-index" of the probit model.

Example: Suppose $\beta_0 = -2$, $\beta_1 = 3$, $X = .4$, so

$\Pr(Y = 1 \mid X = .4) = \Phi(-2 + 3 \times .4) = \Phi(-0.8)$

$\Pr(Y = 1 \mid X = .4)$ = area under the standard normal density to left of z = -.8, which is

$\Pr(Z \le -0.8) = .2119$

Probit regression, ctd.

Why use the cumulative normal probability distribution?
• The "S-shape" gives us what we want:
   o $0 \le \Pr(Y = 1 \mid X) \le 1$ for all X
   o $\Pr(Y = 1 \mid X)$ to be increasing in X (for $\beta_1 > 0$)
• Easy to use: the probabilities are tabulated in the cumulative normal tables
• Relatively straightforward interpretation:
   o z-value = $\beta_0 + \beta_1 X$
   o $\hat{\beta}_0 + \hat{\beta}_1 X$ is the predicted z-value, given X
   o $\beta_1$ is the change in the z-value for a unit change in X

STATA Example: HMDA data

. probit deny p_irat, r;

Iteration 0:   log likelihood =  -872.0853
Iteration 1:   log likelihood =  -835.6633
Iteration 2:   log likelihood = -831.80534
Iteration 3:   log likelihood = -831.79234
(We'll discuss this later)

Probit estimates                                  Number of obs   =       2380
                                                  Wald chi2(1)    =      40.68
                                                  Prob > chi2     =     0.0000
Log likelihood = -831.79234                       Pseudo R2       =     0.0462

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------

$\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio}) = \Phi(-2.19 + 2.97 \times \text{P/I ratio})$
standard errors: (.16) on the intercept, (.47) on P/I ratio

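STATA Example: HMDA data, ctd.

$\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio}) = \Phi(-2.19 + 2.97 \times \text{P/I ratio})$
standard errors: (.16) on the intercept, (.47) on P/I ratio

• Positive coefficient: does this make sense?
• Standard errors have usual interpretation
• Predicted probabilities:
  $\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio} = .3) = \Phi(-2.19 + 2.97 \times .3) = \Phi(-1.30) = .097$
• Effect of change in P/I ratio from .3 to .4:
  $\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I ratio} = .4) = \Phi(-2.19 + 2.97 \times .4) = .159$
  Predicted probability of denial rises from .097 to .159
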
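These predicted probabilities can be recomputed in Python from the rounded coefficients (-2.19 and 2.97). This is a sketch with illustrative names; the third decimal can differ slightly from the slide depending on whether the z-value is rounded before it is looked up.

```python
from statistics import NormalDist

B0, B1 = -2.19, 2.97  # rounded probit coefficients from the slide
PHI = NormalDist().cdf  # standard normal CDF


def probit_probability(pi_ratio):
    """Predicted probability of denial from the fitted probit model."""
    return PHI(B0 + B1 * pi_ratio)


p_at_30 = probit_probability(0.3)
p_at_40 = probit_probability(0.4)

print(f"P/I = .3: {p_at_30:.3f}")
print(f"P/I = .4: {p_at_40:.3f}")
print(f"change:   {p_at_40 - p_at_30:.3f}")
```

Unlike the linear probability model, the size of this change depends on the starting value of the P/I ratio, because $\Phi$ is nonlinear.
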
Probit regression with multiple regressors

$\Pr(Y = 1 \mid X_1, X_2) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2)$

• $\Phi$ is the cumulative normal distribution function.
• $z = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ is the "z-value" or "z-index" of the probit model.
• $\beta_1$ is the effect on the z-score of a unit change in $X_1$, holding constant $X_2$

STATA Example: HMDA data

. probit deny p_irat black, r;

Iteration 0:   log likelihood =  -872.0853
Iteration 1:   log likelihood = -800.88504
Iteration 2:   log likelihood =  -797.1478
Iteration 3:   log likelihood = -797.13604

Probit estimates                                  Number of obs   =       2380
                                                  Wald chi2(2)    =     118.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -797.13604                       Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

We'll go through the estimation details later…

STATA Example: predicted probit probabilities

. probit deny p_irat black, r;

Probit estimates                                  Number of obs   =       2380
                                                  Wald chi2(2)    =     118.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -797.13604                       Pseudo R2       =     0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: "normprob(z1);

Pred prob, p_irat=.3, white: .07546603

NOTE
• _b[_cons] is the estimated intercept (-2.258738)
• _b[p_irat] is the coefficient on p_irat (2.741637)
• sca creates a new scalar which is the result of a calculation
• display prints the indicated information to the screen

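STATA Example: HMDA data, ctd.

$\widehat{\Pr}(\text{deny} = 1 \mid \text{P/I}, \text{black}) = \Phi(-2.26 + 2.74 \times \text{P/I ratio} + .71 \times \text{black})$
standard errors: (.16) on the intercept, (.44) on P/I ratio, (.08) on black

• Is the coefficient on black statistically significant?
• Estimated effect of race for P/I ratio = .3:
  $\widehat{\Pr}(\text{deny} = 1 \mid .3, 1) = \Phi(-2.26 + 2.74 \times .3 + .71 \times 1) = .233$
  $\widehat{\Pr}(\text{deny} = 1 \mid .3, 0) = \Phi(-2.26 + 2.74 \times .3 + .71 \times 0) = .075$
• Difference in rejection probabilities = .158 (15.8 percentage points)
• Still plenty of room for omitted variable bias…
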
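The estimated effect of race can be recomputed from the rounded coefficients (-2.26, 2.74, .71). The sketch below uses illustrative names and Python's standard normal CDF in place of Stata's.

```python
from statistics import NormalDist

B0, B_PI, B_BLACK = -2.26, 2.74, 0.71  # rounded probit coefficients from the slide
PHI = NormalDist().cdf  # standard normal CDF


def probit_probability(pi_ratio, black):
    """Predicted denial probability; black is 1 or 0."""
    return PHI(B0 + B_PI * pi_ratio + B_BLACK * black)


p_black = probit_probability(0.3, 1)
p_white = probit_probability(0.3, 0)
gap = p_black - p_white

print(f"black, P/I = .3: {p_black:.3f}")
print(f"white, P/I = .3: {p_white:.3f}")
print(f"difference:      {gap:.3f}")
```

In the probit model this difference is not a single coefficient: it changes with the P/I ratio at which the two applicants are compared.
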
Logit regression

Logit regression models the probability of Y=1 as the cumulative standard logistic distribution function, evaluated at $z = \beta_0 + \beta_1 X$:

$\Pr(Y = 1 \mid X) = F(\beta_0 + \beta_1 X)$

F is the cumulative logistic distribution function:

$F(\beta_0 + \beta_1 X) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$

Logistic regression, ctd.

$\Pr(Y = 1 \mid X) = F(\beta_0 + \beta_1 X)$, where $F(\beta_0 + \beta_1 X) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$.

Example: $\beta_0 = -3$, $\beta_1 = 2$, $X = .4$,
so $\beta_0 + \beta_1 X = -3 + 2 \times .4 = -2.2$
so $\Pr(Y = 1 \mid X = .4) = 1/(1 + e^{-(-2.2)}) = .0998$

Why bother with logit if we have probit?
• Historically, numerically convenient
• In practice, very similar to probit

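The logistic example works the same way in code. A minimal sketch, with the example's supposed parameter values and an illustrative function name:

```python
import math


def logit_probability(b0, b1, x):
    """Pr(Y = 1 | X = x) under a logit model with coefficients b0, b1."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))


p = logit_probability(-3.0, 2.0, 0.4)  # z = -2.2
print(f"Pr(Y = 1 | X = .4) = {p:.4f}")
```
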
STATA Example: HMDA data

. logit deny p_irat black, r;

Iteration 0:   log likelihood =  -872.0853
Iteration 1:   log likelihood =  -806.3571
Iteration 2:   log likelihood = -795.74477
Iteration 3:   log likelihood = -795.69521
Iteration 4:   log likelihood = -795.69521
(Later…)

Logit estimates                                   Number of obs   =       2380
                                                  Wald chi2(2)    =     117.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -795.69521                       Pseudo R2       =     0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
------------------------------------------------------------------------------

. dis "Pred prob, p_irat=.3, white: "
>    1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));

Pred prob, p_irat=.3, white: .07485143

NOTE: the probit predicted probability is .07546603

Predicted probabilities from estimated probit and logit models usually are very close.

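As a rough check of how close the two models are, the sketch below plugs the full-precision probit and logit coefficients from the Stata output into their respective distribution functions for a white applicant with P/I ratio = .3. The coefficient values are taken from the printed tables; everything else (names, layout) is illustrative.

```python
import math
from statistics import NormalDist

# Coefficients as printed in the Stata output: (_cons, p_irat, black)
PROBIT = (-2.258738, 2.741637, 0.7081579)
LOGIT = (-4.125558, 5.370362, 1.272782)


def index(coefs, pi_ratio, black):
    """z-value: intercept + slope terms."""
    b0, b_pi, b_black = coefs
    return b0 + b_pi * pi_ratio + b_black * black


probit_p = NormalDist().cdf(index(PROBIT, 0.3, 0))
logit_p = 1.0 / (1.0 + math.exp(-index(LOGIT, 0.3, 0)))

print(f"probit: {probit_p:.5f}")
print(f"logit:  {logit_p:.5f}")
print(f"gap:    {abs(probit_p - logit_p):.5f}")
```

The coefficients themselves differ a lot between the two models because the logistic and normal distributions are scaled differently; it is the predicted probabilities that line up.
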
Estimation and Inference in Probit (and Logit) Models (SW Section 11.3)

Probit model:

$\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$

• Estimation and inference
   o How to estimate $\beta_0$ and $\beta_1$?
   o What is the sampling distribution of the estimators?
   o Why can we use the usual methods of inference?
• First discuss nonlinear least squares (easier to explain)
• Then discuss maximum likelihood estimation (what is actually done in practice)

Probit estimation by nonlinear least squares

Recall OLS:

$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$

• The result is the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$

In probit, we have a different regression function: the nonlinear probit model. So, we could estimate $\beta_0$ and $\beta_1$ by nonlinear least squares:

$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - \Phi(b_0 + b_1 X_i)]^2$

Solving this yields the nonlinear least squares estimator of the probit coefficients.

Nonlinear least squares, ctd.

$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - \Phi(b_0 + b_1 X_i)]^2$

How to solve this minimization problem?
• Calculus doesn't give an explicit solution.
• Must be solved numerically using the computer, e.g. by the "trial and error" method of trying one set of values for $(b_0, b_1)$, then trying another, and another,…
• Better idea: use specialized minimization algorithms

In practice, nonlinear least squares isn't used because it isn't efficient: an estimator with a smaller variance is…

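The "trial and error" idea can be sketched as a crude grid search in Python. Everything here is an assumption made for illustration: the data are simulated rather than the HMDA data, the "true" coefficients (-2, 3), the sample size, the seed and the grid bounds are arbitrary, and a real application would use a specialized minimization algorithm instead.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def simulate(n, b0, b1, seed=0):
    """Simulated probit data: X uniform on [0, 1], Y = 1 with prob. Phi(b0 + b1*X)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < PHI(b0 + b1 * x) else 0 for x in xs]
    return xs, ys


def ssr(b0, b1, xs, ys):
    """Sum of squared residuals Y_i - Phi(b0 + b1*X_i)."""
    return sum((y - PHI(b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs, ys = simulate(500, -2.0, 3.0)

# Try every (b0, b1) pair on a coarse grid and keep the one with the smallest SSR
grid = [(-3.0 + 0.25 * i, 2.0 + 0.25 * j) for i in range(9) for j in range(9)]
best = min(grid, key=lambda b: ssr(b[0], b[1], xs, ys))

print("grid-search NLS estimate (b0, b1):", best)
```

The answer can only be as precise as the grid spacing, which is one reason specialized algorithms are the better idea.
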
Probit estimation by maximum likelihood

The likelihood function is the conditional density of $Y_1,\dots,Y_n$ given $X_1,\dots,X_n$, treated as a function of the unknown parameters $\beta_0$ and $\beta_1$.

• The maximum likelihood estimator (MLE) is the value of $(\beta_0, \beta_1)$ that maximizes the likelihood function.
• The MLE is the value of $(\beta_0, \beta_1)$ that best describes the full distribution of the data.
• In large samples, the MLE is:
   o consistent
   o normally distributed
   o efficient (has the smallest variance of all estimators)

Special case: the probit MLE with no X

Y = 1 with probability $p$, 0 with probability $1 - p$ (Bernoulli distribution)

Data: $Y_1,\dots,Y_n$, i.i.d.

Derivation of the likelihood starts with the density of $Y_1$:

$\Pr(Y_1 = 1) = p$ and $\Pr(Y_1 = 0) = 1 - p$

so

$\Pr(Y_1 = y_1) = p^{y_1}(1-p)^{1-y_1}$ (verify this for $y_1 = 0, 1$!)

Joint density of $(Y_1, Y_2)$:
Because $Y_1$ and $Y_2$ are independent,

$\Pr(Y_1 = y_1, Y_2 = y_2) = \Pr(Y_1 = y_1) \times \Pr(Y_2 = y_2) = [p^{y_1}(1-p)^{1-y_1}] \times [p^{y_2}(1-p)^{1-y_2}]$

Joint density of $(Y_1,\dots,Y_n)$:

$\Pr(Y_1 = y_1, Y_2 = y_2, \dots, Y_n = y_n) = [p^{y_1}(1-p)^{1-y_1}] \times [p^{y_2}(1-p)^{1-y_2}] \times \dots \times [p^{y_n}(1-p)^{1-y_n}] = p^{\sum_{i=1}^{n} y_i}(1-p)^{n - \sum_{i=1}^{n} y_i}$

The likelihood is the joint density, treated as a function of the unknown parameters, which here is $p$:

f(;y 1,,Y ) = i= Y 1 i (1 ) ( ) Yi i= 1 The MLE maximizes the likelihood. Its stadard to work with the log likelihood, l[f(;y 1,,Y )]: l[f(;y 1,,Y )] = ( ) ( ) i i Y l( ) + Y l(1 ) i= 1 i= 1 dl f( ; Y1,..., Y ) d 1 1 + 1 = ( ) ( Y ) i Yi = 0 i= 1 i= 1 9-31
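Solving for $p$ yields the MLE; that is, $\hat{p}^{MLE}$ satisfies

$\left(\sum_{i=1}^{n} Y_i\right)\dfrac{1}{\hat{p}^{MLE}} + \left(n - \sum_{i=1}^{n} Y_i\right)\left(\dfrac{-1}{1-\hat{p}^{MLE}}\right) = 0$

or

$\left(\sum_{i=1}^{n} Y_i\right)\dfrac{1}{\hat{p}^{MLE}} = \left(n - \sum_{i=1}^{n} Y_i\right)\dfrac{1}{1-\hat{p}^{MLE}}$

or

$\dfrac{\bar{Y}}{1-\bar{Y}} = \dfrac{\hat{p}^{MLE}}{1-\hat{p}^{MLE}}$

or

$\hat{p}^{MLE} = \bar{Y}$ = fraction of 1's
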
The MLE in the "no-X" case (Bernoulli distribution):

$\hat{p}^{MLE} = \bar{Y}$ = fraction of 1's

• For $Y_i$ i.i.d. Bernoulli, the MLE is the "natural" estimator of $p$, the fraction of 1's, which is $\bar{Y}$
• We already know the essentials of inference:
   o In large n, the sampling distribution of $\hat{p}^{MLE} = \bar{Y}$ is normally distributed
   o Thus inference is "as usual": hypothesis testing via t-statistic, confidence interval as $\pm 1.96\,SE$
• STATA note: to emphasize requirement of large-n, the printout calls the t-statistic the z-statistic; instead of the F-statistic, the chi-squared statistic (= $q \times F$).

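The result $\hat{p}^{MLE} = \bar{Y}$ can be checked numerically by maximizing the Bernoulli log likelihood over a fine grid of candidate values of $p$. A sketch only: the ten observations below are hypothetical, and the grid step of .001 is an arbitrary choice.

```python
import math

ys = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # hypothetical binary data, four 1's out of ten


def log_likelihood(p, ys):
    """Bernoulli log likelihood: (sum Y) ln p + (n - sum Y) ln(1 - p)."""
    s, n = sum(ys), len(ys)
    return s * math.log(p) + (n - s) * math.log(1.0 - p)


# Candidate values .001, .002, ..., .999 (endpoints excluded so the logs are finite)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, ys))
y_bar = sum(ys) / len(ys)

print(f"grid maximizer of the log likelihood: {p_hat}")
print(f"sample fraction of 1's:               {y_bar}")
```
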
The probit likelihood with one X

The derivation starts with the density of $Y_1$, given $X_1$:

$\Pr(Y_1 = 1 \mid X_1) = \Phi(\beta_0 + \beta_1 X_1)$
$\Pr(Y_1 = 0 \mid X_1) = 1 - \Phi(\beta_0 + \beta_1 X_1)$

so

$\Pr(Y_1 = y_1 \mid X_1) = \Phi(\beta_0 + \beta_1 X_1)^{y_1}\,[1 - \Phi(\beta_0 + \beta_1 X_1)]^{1-y_1}$

The probit likelihood function is the joint density of $Y_1,\dots,Y_n$ given $X_1,\dots,X_n$, treated as a function of $\beta_0, \beta_1$:

$f(\beta_0, \beta_1; Y_1,\dots,Y_n \mid X_1,\dots,X_n) = \{\Phi(\beta_0 + \beta_1 X_1)^{Y_1}[1 - \Phi(\beta_0 + \beta_1 X_1)]^{1-Y_1}\} \times \dots \times \{\Phi(\beta_0 + \beta_1 X_n)^{Y_n}[1 - \Phi(\beta_0 + \beta_1 X_n)]^{1-Y_n}\}$

The probit likelihood function:

$f(\beta_0, \beta_1; Y_1,\dots,Y_n \mid X_1,\dots,X_n) = \{\Phi(\beta_0 + \beta_1 X_1)^{Y_1}[1 - \Phi(\beta_0 + \beta_1 X_1)]^{1-Y_1}\} \times \dots \times \{\Phi(\beta_0 + \beta_1 X_n)^{Y_n}[1 - \Phi(\beta_0 + \beta_1 X_n)]^{1-Y_n}\}$

• Can't solve for the maximum explicitly
• Must maximize using numerical methods
• As in the case of no X, in large samples:
   o $\hat{\beta}_0^{MLE}$, $\hat{\beta}_1^{MLE}$ are consistent
   o $\hat{\beta}_0^{MLE}$, $\hat{\beta}_1^{MLE}$ are normally distributed (more later…)
   o Their standard errors can be computed
   o Testing, confidence intervals proceeds as usual
• For multiple X's, see SW App. 11.2

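A sketch of what "evaluating the likelihood" means in practice: the function below computes the log of the probit likelihood (a sum of logs rather than a product, which is numerically safer) for a candidate $(b_0, b_1)$. The data are simulated under assumed values (-2, 3); the sample size and seed are arbitrary. A numerical maximizer would call such a function repeatedly.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def probit_log_likelihood(b0, b1, xs, ys):
    """Sum over i of Y_i ln Phi(z_i) + (1 - Y_i) ln[1 - Phi(z_i)], z_i = b0 + b1*X_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = PHI(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Simulated data (an assumption, not the HMDA data): X uniform on [0, 1]
rng = random.Random(1)
xs = [rng.random() for _ in range(2000)]
ys = [1 if rng.random() < PHI(-2.0 + 3.0 * x) else 0 for x in xs]

ll_true = probit_log_likelihood(-2.0, 3.0, xs, ys)  # at the values used to simulate
ll_zero = probit_log_likelihood(0.0, 0.0, xs, ys)   # every probability equal to .5

print(f"log likelihood at (-2, 3): {ll_true:.1f}")
print(f"log likelihood at (0, 0):  {ll_zero:.1f}")
```
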
The logit likelihood with one X

• The only difference between probit and logit is the functional form used for the probability: $\Phi$ is replaced by the cumulative logistic function.
• Otherwise, the likelihood is similar; for details see SW App. 11.2
• As with probit,
   o $\hat{\beta}_0^{MLE}$, $\hat{\beta}_1^{MLE}$ are consistent
   o $\hat{\beta}_0^{MLE}$, $\hat{\beta}_1^{MLE}$ are normally distributed
   o Their standard errors can be computed
   o Testing, confidence intervals proceeds as usual

Measures of fit

The $R^2$ and $\bar{R}^2$ don't make sense here (why?). So, two other specialized measures are used:

1. The fraction correctly predicted = fraction of Y's for which predicted probability is >50% (if $Y_i = 1$) or is <50% (if $Y_i = 0$).

2. The pseudo-$R^2$ measures the fit using the likelihood function: it measures the improvement in the value of the log likelihood, relative to having no X's (see SW App. 11.2). This simplifies to the $R^2$ in the linear model with normally distributed errors.

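Both measures are simple to compute. The sketch below assumes the pseudo-$R^2$ that Stata reports is one minus the ratio of the fitted log likelihood to the log likelihood with no X's, and that the Iteration 0 value in the Stata log is that no-X log likelihood; the log-likelihood numbers are the ones printed in the HMDA output, while the four observations used for the fraction correctly predicted are hypothetical.

```python
def pseudo_r2(ll_model, ll_no_x):
    """Improvement in the log likelihood relative to the model with no X's."""
    return 1.0 - ll_model / ll_no_x


def fraction_correctly_predicted(ys, probs):
    """Share of observations with prob > .5 when Y = 1 or prob < .5 when Y = 0."""
    correct = sum(
        1 for y, p in zip(ys, probs)
        if (y == 1 and p > 0.5) or (y == 0 and p < 0.5)
    )
    return correct / len(ys)


LL_NO_X = -872.0853  # Iteration 0 in the Stata logs (assumed to be the no-X model)
probit_r2 = pseudo_r2(-797.13604, LL_NO_X)  # probit with p_irat and black
logit_r2 = pseudo_r2(-795.69521, LL_NO_X)   # logit with p_irat and black

print(f"probit pseudo-R2: {probit_r2:.4f}")
print(f"logit pseudo-R2:  {logit_r2:.4f}")

# Hypothetical outcomes and predicted probabilities
share = fraction_correctly_predicted([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.6])
print(f"fraction correctly predicted: {share}")
```
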
Large-n distribution of the MLE (not in SW)

This is a foundation of mathematical statistics. We'll do this for the "no-X" special case, for which $p$ is the only unknown parameter. Here are the steps:

1. Derive the log likelihood ("$\mathcal{L}(p)$") (done).
2. The MLE is found by setting its derivative to zero; that requires solving a nonlinear equation.
3. For large n, $\hat{p}^{MLE}$ will be near the true $p$ ($p^{true}$), so this nonlinear equation can be approximated (locally) by a linear equation (Taylor series around $p^{true}$).
4. This can be solved for $\hat{p}^{MLE} - p^{true}$.
5. By the Law of Large Numbers and the CLT, for large n, $\sqrt{n}(\hat{p}^{MLE} - p^{true})$ is normally distributed.

1. Derive the log likelihood

Recall: the density for observation #1 is:

$\Pr(Y_1 = y_1) = p^{y_1}(1-p)^{1-y_1}$ (density)

so

$f(p; Y_1) = p^{Y_1}(1-p)^{1-Y_1}$ (likelihood)

The likelihood for $Y_1,\dots,Y_n$ is

$f(p; Y_1,\dots,Y_n) = f(p; Y_1) \times \dots \times f(p; Y_n)$

so the log likelihood is

$\mathcal{L}(p) = \ln f(p; Y_1,\dots,Y_n) = \ln[f(p; Y_1) \times \dots \times f(p; Y_n)] = \sum_{i=1}^{n} \ln f(p; Y_i)$

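2. Set the derivative of $\mathcal{L}(p)$ to zero to define the MLE:

$\left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{\hat{p}^{MLE}} = \sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{\hat{p}^{MLE}} = 0$

3. Use a Taylor series expansion around $p^{true}$ to approximate this as a linear function of $\hat{p}^{MLE}$:

$0 = \left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{\hat{p}^{MLE}} \approx \left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{p^{true}} + \left.\dfrac{\partial^2 \mathcal{L}(p)}{\partial p^2}\right|_{p^{true}} (\hat{p}^{MLE} - p^{true})$
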
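4. Solve this linear approximation for $(\hat{p}^{MLE} - p^{true})$:

$\left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{p^{true}} + \left.\dfrac{\partial^2 \mathcal{L}(p)}{\partial p^2}\right|_{p^{true}} (\hat{p}^{MLE} - p^{true}) \approx 0$

so

$\left.\dfrac{\partial^2 \mathcal{L}(p)}{\partial p^2}\right|_{p^{true}} (\hat{p}^{MLE} - p^{true}) \approx -\left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{p^{true}}$

or

$(\hat{p}^{MLE} - p^{true}) \approx -\left[\left.\dfrac{\partial^2 \mathcal{L}(p)}{\partial p^2}\right|_{p^{true}}\right]^{-1} \left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{p^{true}}$
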
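5. Substitute things in and apply the LLN and CLT.

$\mathcal{L}(p) = \sum_{i=1}^{n} \ln f(p; Y_i)$

$\left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{p^{true}} = \sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}}$

$\left.\dfrac{\partial^2 \mathcal{L}(p)}{\partial p^2}\right|_{p^{true}} = \sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}}$

so

$(\hat{p}^{MLE} - p^{true}) \approx -\left[\left.\dfrac{\partial^2 \mathcal{L}(p)}{\partial p^2}\right|_{p^{true}}\right]^{-1} \left.\dfrac{\partial \mathcal{L}(p)}{\partial p}\right|_{p^{true}} = \left[-\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}}\right]^{-1} \left[\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}}\right]$
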
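Multiply through by $\sqrt{n}$:

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \approx \left[-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}}\right]^{-1} \left[\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}}\right]$

Because $Y_i$ is i.i.d., the i-th terms in the summands are also i.i.d. Thus, if these terms have enough (2) moments, then under general conditions (not just Bernoulli likelihood):

$-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}} \xrightarrow{p} a$ (a constant) (WLLN)

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}} \xrightarrow{d} N(0, \sigma^2_{\ln f})$ (CLT) (Why?)
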
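Putting this together,

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \approx \left[-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}}\right]^{-1} \left[\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}}\right]$

$-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}} \xrightarrow{p} a$ (a constant) (WLLN)

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}} \xrightarrow{d} N(0, \sigma^2_{\ln f})$ (CLT) (Why?)

so

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \xrightarrow{d} N(0, \sigma^2_{\ln f}/a^2)$ (large-n normal)
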
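Work out the details for probit/no X (Bernoulli) case:

Recall:

$f(p; Y_i) = p^{Y_i}(1-p)^{1-Y_i}$

so

$\ln f(p; Y_i) = Y_i \ln p + (1 - Y_i)\ln(1-p)$

and

$\dfrac{\partial \ln f(p; Y_i)}{\partial p} = \dfrac{Y_i}{p} - \dfrac{1 - Y_i}{1-p} = \dfrac{Y_i - p}{p(1-p)}$

and

$\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2} = -\dfrac{Y_i}{p^2} - \dfrac{1 - Y_i}{(1-p)^2} = -\left[\dfrac{Y_i}{p^2} + \dfrac{1 - Y_i}{(1-p)^2}\right]$
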
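Denominator term first:

$\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2} = -\left[\dfrac{Y_i}{p^2} + \dfrac{1 - Y_i}{(1-p)^2}\right]$

so

$-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}} = \dfrac{1}{n}\sum_{i=1}^{n}\left[\dfrac{Y_i}{p^2} + \dfrac{1 - Y_i}{(1-p)^2}\right] = \dfrac{\bar{Y}}{p^2} + \dfrac{1 - \bar{Y}}{(1-p)^2}$

$\xrightarrow{p} \dfrac{p}{p^2} + \dfrac{1-p}{(1-p)^2}$ (LLN)

$= \dfrac{1}{p} + \dfrac{1}{1-p} = \dfrac{1}{p(1-p)}$
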
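Next the numerator:

$\dfrac{\partial \ln f(p; Y_i)}{\partial p} = \dfrac{Y_i - p}{p(1-p)}$

so

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}} = \dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \dfrac{Y_i - p}{p(1-p)} = \dfrac{1}{p(1-p)} \cdot \dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} (Y_i - p) \xrightarrow{d} N\!\left(0, \dfrac{\sigma_Y^2}{[p(1-p)]^2}\right)$
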
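Put these pieces together:

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \approx \left[-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}}\right]^{-1} \left[\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}}\right]$

where

$-\dfrac{1}{n}\sum_{i=1}^{n} \left.\dfrac{\partial^2 \ln f(p; Y_i)}{\partial p^2}\right|_{p^{true}} \xrightarrow{p} \dfrac{1}{p(1-p)}$

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^{n} \left.\dfrac{\partial \ln f(p; Y_i)}{\partial p}\right|_{p^{true}} \xrightarrow{d} N\!\left(0, \dfrac{\sigma_Y^2}{[p(1-p)]^2}\right)$

Thus

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \xrightarrow{d} N(0, \sigma_Y^2)$
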
Summary: probit MLE, no-X case

The MLE: $\hat{p}^{MLE} = \bar{Y}$

Working through the full MLE distribution theory gave:

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \xrightarrow{d} N(0, \sigma_Y^2)$

But because $p^{true} = \Pr(Y = 1) = E(Y) = \mu_Y$, this is:

$\sqrt{n}(\bar{Y} - \mu_Y) \xrightarrow{d} N(0, \sigma_Y^2)$

A familiar result from the first week of class!

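A small Monte Carlo sketch of this result: draw many Bernoulli samples, compute $\sqrt{n}(\bar{Y} - p)$ for each, and compare the variance of those draws with $\sigma_Y^2 = p(1-p)$. The values p = .3, n = 400, 2000 replications and the seed are all arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics


def scaled_errors(p, n, reps, seed=0):
    """Draws of sqrt(n) * (Y-bar - p) across repeated Bernoulli(p) samples of size n."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y_bar = sum(1 for _ in range(n) if rng.random() < p) / n
        draws.append(math.sqrt(n) * (y_bar - p))
    return draws


P_TRUE = 0.3
draws = scaled_errors(P_TRUE, n=400, reps=2000)

sim_mean = statistics.fmean(draws)
sim_var = statistics.pvariance(draws)
theory_var = P_TRUE * (1.0 - P_TRUE)  # sigma_Y^2 for a Bernoulli variable

print(f"simulated mean:     {sim_mean:.3f} (theory: 0)")
print(f"simulated variance: {sim_var:.3f} (theory: {theory_var:.3f})")
```

A histogram of `draws` would look close to a bell curve centered at zero, which is the normal approximation used for inference.
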
The MLE derivation applies generally

$\sqrt{n}(\hat{p}^{MLE} - p^{true}) \xrightarrow{d} N(0, \sigma^2_{\ln f}/a^2)$

• Standard errors are obtained from working out expressions for $\sigma^2_{\ln f}/a^2$
• Extends to >1 parameter $(\beta_0, \beta_1)$ via matrix calculus
• Because the distribution is normal for large n, inference is conducted as usual; for example, the 95% confidence interval is MLE $\pm\ 1.96\,SE$.
• The expression above uses "robust" standard errors; further simplifications yield non-robust standard errors, which apply if $\partial \ln f(Y_i; p)/\partial p$ is homoskedastic.

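As an illustration of "inference as usual", the sketch below rebuilds the z-statistic and 95% confidence interval for the P/I ratio coefficient in the single-regressor probit, using the coefficient and robust standard error printed in the Stata output. The helper names are made up, and the multiplier 1.96 is the rounded normal critical value, so the endpoints can differ from Stata's in the last digits.

```python
def z_statistic(coef, se):
    """Large-n test statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


def confidence_interval_95(coef, se):
    """Large-n 95% confidence interval: estimate +/- 1.96 standard errors."""
    half_width = 1.96 * se
    return coef - half_width, coef + half_width


COEF_PI, SE_PI = 2.967908, 0.4653114  # p_irat row of the probit output

z = z_statistic(COEF_PI, SE_PI)
low, high = confidence_interval_95(COEF_PI, SE_PI)

print(f"z = {z:.2f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```
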
Summary: distribution of the MLE
(Why did I do this to you?)

• The MLE is normally distributed for large n
• We worked through this result in detail for the probit model with no X's (the Bernoulli distribution)
• For large n, confidence intervals and hypothesis testing proceeds as usual
• If the model is correctly specified, the MLE is efficient, that is, it has a smaller large-n variance than all other estimators (we didn't show this).
• These methods extend to other models with discrete dependent variables, for example count data (# crimes/day): see SW App. 11.2.

Application to the Boston HMDA Data (SW Section 11.4)

• Mortgages (home loans) are an essential part of buying a home.
• Is there differential access to home loans by race?
• If two otherwise identical individuals, one white and one black, applied for a home loan, is there a difference in the probability of denial?

The HMDA Data Set

• Data on individual characteristics, property characteristics, and loan denial/acceptance
• The mortgage application process circa 1990-1991:
   o Go to a bank or mortgage company
   o Fill out an application (personal + financial info)
   o Meet with the loan officer
• Then the loan officer decides: by law, in a race-blind way. Presumably, the bank wants to make profitable loans, and the loan officer doesn't want to originate defaults.

The loan officer's decision

• Loan officer uses key financial variables:
   o P/I ratio
   o housing expense-to-income ratio
   o loan-to-value ratio
   o personal credit history
• The decision rule is nonlinear:
   o loan-to-value ratio > 80%
   o loan-to-value ratio > 95% (what happens in default?)
   o credit score

Regression specifications

Pr(deny=1 | black, other X's) = …
• linear probability model
• probit

Main problem with the regressions so far: potential omitted variable bias. All these (i) enter the loan officer decision function, all (ii) are or could be correlated with race:
• wealth, type of employment
• credit history
• family status

[Table: Variables in the HMDA data set]

Summary of Empirical Results

• Coefficients on the financial variables make sense.
• Black is statistically significant in all specifications
• Race-financial variable interactions aren't significant.
• Including the covariates sharply reduces the effect of race on denial probability.
• LPM, probit, logit: similar estimates of effect of race on the probability of denial.
• Estimated effects are large in a "real world" sense.

Remaining threats to internal, external validity

• Internal validity
   1. omitted variable bias
      what else is learned in the in-person interviews?
   2. functional form misspecification (no…)
   3. measurement error (originally, yes; now, no…)
   4. selection
      random sample of loan applications
      define population to be loan applicants
   5. simultaneous causality (no)
• External validity
   This is for Boston in 1990-91. What about today?

Summary (SW Section 11.5)

• If $Y_i$ is binary, then $E(Y \mid X) = \Pr(Y=1 \mid X)$
• Three models:
   o linear probability model (linear multiple regression)
   o probit (cumulative standard normal distribution)
   o logit (cumulative standard logistic distribution)
• LPM, probit, logit all produce predicted probabilities
• Effect of $\Delta X$ is change in conditional probability that Y=1. For logit and probit, this depends on the initial X
• Probit and logit are estimated via maximum likelihood
   o Coefficients are normally distributed for large n
   o Large-n hypothesis testing, conf. intervals is as usual