Regression with a Binary Dependent Variable (SW Ch. 11)




So far the dependent variable (Y) has been continuous:
district-wide average test score
traffic fatality rate

But we might want to understand the effect of X on a binary variable:
Y = get into college, or not
Y = person smokes, or not
Y = mortgage application is accepted, or not

Example: Mortgage denial and race
The Boston Fed HMDA data set

Individual applications for single-family mortgages made in 1990 in the greater Boston area
2380 observations, collected under the Home Mortgage Disclosure Act (HMDA)

Variables
Dependent variable:
o Is the mortgage denied or accepted?
Independent variables:
o income, wealth, employment status
o other loan, property characteristics
o race of applicant

The Linear Probability Model (SW Section 11.1)

A natural starting point is the linear regression model with a single regressor:

Yi = β0 + β1Xi + ui

But:
What does β1 mean when Y is binary? Is β1 = ΔY/ΔX?
What does the line β0 + β1X mean when Y is binary?
What does the predicted value Ŷ mean when Y is binary? For example, what does Ŷ = 0.6 mean?

The linear probability model, ctd.

Yi = β0 + β1Xi + ui

Recall assumption #1: E(ui|Xi) = 0, so

E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi

When Y is binary,
E(Y) = 1×Pr(Y=1) + 0×Pr(Y=0) = Pr(Y=1)
so
E(Y|X) = Pr(Y=1|X)

The linear probability model, ctd.

When Y is binary, the linear regression model
Yi = β0 + β1Xi + ui
is called the linear probability model.

The predicted value is a probability:
o E(Y|X=x) = Pr(Y=1|X=x) = prob. that Y = 1 given x
o Ŷ = the predicted probability that Yi = 1, given X

β1 = change in probability that Y = 1 for a given Δx:

β1 = [Pr(Y = 1|X = x + Δx) − Pr(Y = 1|X = x)] / Δx

Example: linear probability model, HMDA data

Mortgage denial v. ratio of debt payments to income (P/I ratio) in the HMDA data set (subset of the data; scatterplot not transcribed)

Linear probability model: HMDA data

deny-hat = -.080 + .604 P/I ratio     (n = 2380)
           (.032)  (.098)

What is the predicted value for P/I ratio = .3?

Pr(deny = 1 | P/I ratio = .3) = -.080 + .604×.3 = .101

Calculating "effects": increase P/I ratio from .3 to .4:

Pr(deny = 1 | P/I ratio = .4) = -.080 + .604×.4 = .162

The effect on the probability of denial of an increase in P/I ratio from .3 to .4 is to increase the probability by .061, that is, by 6.1 percentage points (what?).
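The slides do not show the Stata command for this regression; a minimal sketch, assuming the same dataset and variable names used in the probit examples later (deny, p_irat), would be:

. regress deny p_irat, r;
. display "predicted Pr(deny) at P/I ratio = .3: " _b[_cons]+_b[p_irat]*.3;

The second line reproduces the fitted probability above from the stored coefficients.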

Next include black as a regressor:

deny-hat = -.091 + .559 P/I ratio + .177 black
           (.032)  (.098)           (.025)

Predicted probability of denial:

for black applicant with P/I ratio = .3:
Pr(deny = 1) = -.091 + .559×.3 + .177×1 = .254

for white applicant, P/I ratio = .3:
Pr(deny = 1) = -.091 + .559×.3 + .177×0 = .077

difference = .177 = 17.7 percentage points

Coefficient on black is significant at the 5% level
Still plenty of room for omitted variable bias

The linear probability model: Summary

Models probability as a linear function of X

Advantages:
o simple to estimate and to interpret
o inference is the same as for multiple regression (need heteroskedasticity-robust standard errors)

Disadvantages:
o Does it make sense that the probability should be linear in X?
o Predicted probabilities can be < 0 or > 1!

These disadvantages can be solved by using a nonlinear probability model: probit and logit regression
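To see the second disadvantage concretely with the rounded estimates above (a quick check; the P/I ratio of 2 is a hypothetical value chosen only for illustration):

. display -.080 + .604*0;
. display -.080 + .604*2;

The first returns -.08, a fitted "probability" below zero, and the second 1.128, a fitted "probability" above one.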

Probit and Logit Regression (SW Section 11.2)

The problem with the linear probability model is that it models the probability of Y=1 as being linear:

Pr(Y = 1|X) = β0 + β1X

Instead, we want:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Pr(Y = 1|X) to be increasing in X (for β1 > 0)

This requires a nonlinear functional form for the probability. How about an S-curve?

The probit model satisfies these conditions:
0 ≤ Pr(Y = 1|X) ≤ 1 for all X
Pr(Y = 1|X) is increasing in X (for β1 > 0)

Probit regression models the probability that Y=1 using the cumulative standard normal distribution function, evaluated at z = β0 + β1X:

Pr(Y = 1|X) = Φ(β0 + β1X)

Φ is the cumulative normal distribution function.
z = β0 + β1X is the "z-value" or "z-index" of the probit model.

Example: Suppose β0 = -2, β1 = 3, X = .4, so
Pr(Y = 1|X=.4) = Φ(-2 + 3×.4) = Φ(-0.8)
Pr(Y = 1|X=.4) = area under the standard normal density to the left of z = -.8, which is...

Pr(Z ≤ -0.8) = .2119
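In Stata this area can be computed directly with the normprob() function that appears later in these slides (a one-line check, which returns approximately .2119):

. display normprob(-0.8);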

Probit regression, ctd.

Why use the cumulative normal probability distribution?
The S-shape gives us what we want:
o 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
o Pr(Y = 1|X) increasing in X (for β1 > 0)
Easy to use: the probabilities are tabulated in the cumulative normal tables
Relatively straightforward interpretation:
o z-value = β0 + β1X
o β̂0 + β̂1X is the predicted z-value, given X
o β1 is the change in the z-value for a unit change in X

STATA Example: HMDA data

. probit deny p_irat, r;

Iteration 0:  log likelihood = -872.0853
Iteration 1:  log likelihood = -835.6633
Iteration 2:  log likelihood = -831.80534
Iteration 3:  log likelihood = -831.7934

We'll discuss this iteration log later

Probit estimates                               Number of obs  =      2380
                                               Wald chi2(1)   =     40.68
                                               Prob > chi2    =    0.0000
Log likelihood = -831.7934                     Pseudo R2      =    0.0462

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
       _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
------------------------------------------------------------------------------

Pr(deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 × P/I ratio)
                                (.16)    (.47)

STATA Example: HMDA data, ctd.

Pr(deny = 1 | P/I ratio) = Φ(-2.19 + 2.97 × P/I ratio)
                                (.16)    (.47)

Positive coefficient: does this make sense?
Standard errors have the usual interpretation
Predicted probabilities:

Pr(deny = 1 | P/I ratio = .3) = Φ(-2.19 + 2.97×.3) = Φ(-1.30) = .097

Effect of change in P/I ratio from .3 to .4:

Pr(deny = 1 | P/I ratio = .4) = Φ(-2.19 + 2.97×.4) = .159

Predicted probability of denial rises from .097 to .159
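These two fitted probabilities can be reproduced with the rounded coefficients above (a sketch; the values differ slightly if the unrounded estimates are used). The two commands return roughly .097 and .16:

. display normprob(-2.19 + 2.97*.3);
. display normprob(-2.19 + 2.97*.4);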

Probit regression with multiple regressors

Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)

Φ is the cumulative normal distribution function.
z = β0 + β1X1 + β2X2 is the "z-value" or "z-index" of the probit model.
β1 is the effect on the z-score of a unit change in X1, holding constant X2

STATA Example: HMDA data

. probit deny p_irat black, r;

Iteration 0:  log likelihood = -872.0853
Iteration 1:  log likelihood = -800.88504
Iteration 2:  log likelihood = -797.1478
Iteration 3:  log likelihood = -797.13604

Probit estimates                               Number of obs  =      2380
                                               Wald chi2(2)   =    118.18
                                               Prob > chi2    =    0.0000
Log likelihood = -797.13604                    Pseudo R2      =    0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871093    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

We'll go through the estimation details later

STATA Example: predicted probit probabilities

. probit deny p_irat black, r;

Probit estimates                               Number of obs  =      2380
                                               Wald chi2(2)   =    118.18
                                               Prob > chi2    =    0.0000
Log likelihood = -797.13604                    Pseudo R2      =    0.0859

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   2.741637   .4441633     6.17   0.000     1.871093    3.612181
       black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
       _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
------------------------------------------------------------------------------

. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: " normprob(z1);
Pred prob, p_irat=.3, white: .07546603

NOTE
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen

STATA Example: HMDA data, ctd.

Pr(deny = 1 | P/I, black) = Φ(-2.26 + 2.74 P/I ratio + .71 black)
                                 (.16)   (.44)           (.08)

Is the coefficient on black statistically significant?

Estimated effect of race for P/I ratio = .3:

Pr(deny = 1 | .3, 1) = Φ(-2.26 + 2.74×.3 + .71×1) = .233
Pr(deny = 1 | .3, 0) = Φ(-2.26 + 2.74×.3 + .71×0) = .075

Difference in rejection probabilities = .158 (15.8 percentage points)

Still plenty of room for omitted variable bias
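The same difference can be computed inside Stata by reusing the slides' sca/normprob approach after the probit above (a sketch; the scalar names zb and zw are made up for this illustration). It should return approximately .158:

. sca zb = _b[_cons]+_b[p_irat]*.3+_b[black]*1;
. sca zw = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "difference in denial probabilities: " normprob(zb)-normprob(zw);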

Logit regression

Logit regression models the probability of Y=1 as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X:

Pr(Y = 1|X) = F(β0 + β1X)

F is the cumulative logistic distribution function:

F(β0 + β1X) = 1 / [1 + e^-(β0 + β1X)]

Logistic regression, ctd.

Pr(Y = 1|X) = F(β0 + β1X)

where F(β0 + β1X) = 1 / [1 + e^-(β0 + β1X)].

Example: β0 = -3, β1 = 2, X = .4, so
β0 + β1X = -3 + 2×.4 = -2.2
so Pr(Y = 1|X=.4) = 1/(1 + e^2.2) = .0998

Why bother with logit if we have probit?
Historically, numerically convenient
In practice, very similar to probit
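The arithmetic in this example can be checked with one line, written the same way the slides later compute logit probabilities; it returns approximately .0998:

. display 1/(1 + exp(-(-3 + 2*.4)));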

STATA Example: HMDA data

. logit deny p_irat black, r;

Iteration 0:  log likelihood = -872.0853
Iteration 1:  log likelihood = -806.3571
Iteration 2:  log likelihood = -795.74477
Iteration 3:  log likelihood = -795.6951
Iteration 4:  log likelihood = -795.6951

Later...

Logit estimates                                Number of obs  =      2380
                                               Wald chi2(2)   =    117.75
                                               Prob > chi2    =    0.0000
Log likelihood = -795.6951                     Pseudo R2      =    0.0876

------------------------------------------------------------------------------
             |               Robust
        deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
       black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
       _cons |  -4.125558    .345825   -11.93   0.000    -4.803363   -3.447753
------------------------------------------------------------------------------

. dis "Pred prob, p_irat=.3, white: "
>     1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));

Pred prob, p_irat=.3, white: .07485143

NOTE: the probit predicted probability is .07546603

Predicted probabilities from estimated probit and logit models usually are very close.

Estimation and Inference in Probit (and Logit) Models (SW Section 11.3)

Probit model:
Pr(Y = 1|X) = Φ(β0 + β1X)

Estimation and inference
o How to estimate β0 and β1?
o What is the sampling distribution of the estimators?
o Why can we use the usual methods of inference?

First discuss nonlinear least squares (easier to explain)
Then discuss maximum likelihood estimation (what is actually done in practice)

Probit estimation by nonlinear least squares

Recall OLS:

min over (b0, b1) of  Σ(i=1 to n) [Yi − (b0 + b1Xi)]²

The result is the OLS estimators β̂0 and β̂1.

In probit, we have a different regression function: the nonlinear probit model. So, we could estimate β0 and β1 by nonlinear least squares:

min over (b0, b1) of  Σ(i=1 to n) [Yi − Φ(b0 + b1Xi)]²

Solving this yields the nonlinear least squares estimator of the probit coefficients.

Nonlinear least squares, ctd.

min over (b0, b1) of  Σ(i=1 to n) [Yi − Φ(b0 + b1Xi)]²

How to solve this minimization problem?
Calculus doesn't give an explicit solution.
Must be solved numerically using the computer, e.g. by "trial and error" method of trying one set of values for (b0, b1), then trying another, and another,...
Better idea: use specialized minimization algorithms
In practice, nonlinear least squares isn't used because it isn't efficient: an estimator with a smaller variance is...
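For completeness, the nonlinear least squares objective above could be handed to Stata's nl command (a sketch only; this is not what the chapter, or the probit command below, actually uses, and normal() here is the current name for the cumulative normal that these slides call normprob()):

. nl (deny = normal({b0} + {b1}*p_irat)), vce(robust);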

Probit estimation by maximum likelihood

The likelihood function is the conditional density of Y1,...,Yn given X1,...,Xn, treated as a function of the unknown parameters β0 and β1.

The maximum likelihood estimator (MLE) is the value of (β0, β1) that maximizes the likelihood function.

The MLE is the value of (β0, β1) that best describes the full distribution of the data.

In large samples, the MLE is:
o consistent
o normally distributed
o efficient (has the smallest variance of all estimators)

Special case: the probit MLE with no X

Y = 1 with probability p, = 0 with probability 1 − p   (Bernoulli distribution)

Data: Y1,...,Yn, i.i.d.

Derivation of the likelihood starts with the density of Y1:
Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 − p
so
Pr(Y1 = y1) = p^y1 (1 − p)^(1−y1)   (verify this for y1 = 0, 1!)

Joint density of (Y1, Y2):

Because Y1 and Y2 are independent,
Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) × Pr(Y2 = y2)
= [p^y1 (1 − p)^(1−y1)] × [p^y2 (1 − p)^(1−y2)]

Joint density of (Y1,...,Yn):
Pr(Y1 = y1, Y2 = y2, ..., Yn = yn)
= [p^y1 (1 − p)^(1−y1)] × ... × [p^yn (1 − p)^(1−yn)]
= p^(Σ yi) (1 − p)^(n − Σ yi)

The likelihood is the joint density, treated as a function of the unknown parameter, which here is p:

f(;y 1,,Y ) = i= Y 1 i (1 ) ( ) Yi i= 1 The MLE maximizes the likelihood. Its stadard to work with the log likelihood, l[f(;y 1,,Y )]: l[f(;y 1,,Y )] = ( ) ( ) i i Y l( ) + Y l(1 ) i= 1 i= 1 dl f( ; Y1,..., Y ) d 1 1 + 1 = ( ) ( Y ) i Yi = 0 i= 1 i= 1 9-31

Solving for p yields the MLE; that is, p̂^MLE satisfies

(Σ Yi)(1/p̂^MLE) − (n − Σ Yi) × 1/(1 − p̂^MLE) = 0

or

(Σ Yi)(1/p̂^MLE) = (n − Σ Yi) × 1/(1 − p̂^MLE)

or

Ȳ / p̂^MLE = (1 − Ȳ) / (1 − p̂^MLE)

or

p̂^MLE = Ȳ = fraction of 1's

The MLE i the o-x case (Beroulli distributio): ˆ MLE = Y = fractio of 1 s For Y i i.i.d. Beroulli, the MLE is the atural estimator of, the fractio of 1 s, which is Y We already kow the essetials of iferece: o I large, the samlig distributio of ˆ MLE = Y is ormally distributed o Thus iferece is as usual: hyothesis testig via t-statistic, cofidece iterval as ± 1.96SE STATA ote: to emhasize requiremet of large-, the ritout calls the t-statistic the z-statistic; istead of the F-statistic, the chi-squared statistic (= q F). 9-33

The probit likelihood with one X

The derivation starts with the density of Y1, given X1:
Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
Pr(Y1 = 0|X1) = 1 − Φ(β0 + β1X1)
so
Pr(Y1 = y1|X1) = Φ(β0 + β1X1)^y1 × [1 − Φ(β0 + β1X1)]^(1−y1)

The probit likelihood function is the joint density of Y1,...,Yn given X1,...,Xn, treated as a function of β0, β1:

f(β0, β1; Y1,...,Yn | X1,...,Xn)
= {Φ(β0 + β1X1)^Y1 [1 − Φ(β0 + β1X1)]^(1−Y1)} × ... × {Φ(β0 + β1Xn)^Yn [1 − Φ(β0 + β1Xn)]^(1−Yn)}

The probit likelihood function:

f(β0, β1; Y1,...,Yn | X1,...,Xn)
= {Φ(β0 + β1X1)^Y1 [1 − Φ(β0 + β1X1)]^(1−Y1)} × ... × {Φ(β0 + β1Xn)^Yn [1 − Φ(β0 + β1Xn)]^(1−Yn)}

Can't solve for the maximum explicitly
Must maximize using numerical methods
As in the case of no X, in large samples:
o β̂0^MLE, β̂1^MLE are consistent
o β̂0^MLE, β̂1^MLE are normally distributed (more later...)
o Their standard errors can be computed
o Testing, confidence intervals proceeds as usual

For multiple X's, see SW App. 11.2

The logit likelihood with one X

The only difference between probit and logit is the functional form used for the probability: Φ is replaced by the cumulative logistic function.
Otherwise, the likelihood is similar; for details see SW App. 11.2
As with probit,
o β̂0^MLE, β̂1^MLE are consistent
o β̂0^MLE, β̂1^MLE are normally distributed
o Their standard errors can be computed
o Testing, confidence intervals proceeds as usual

Measures of fit

The R² and adjusted R² don't make sense here (why?). So, two other specialized measures are used:

1. The fraction correctly predicted = fraction of Y's for which the predicted probability is >50% (if Yi=1) or is <50% (if Yi=0).

2. The pseudo-R² measures the fit using the likelihood function: it measures the improvement in the value of the log likelihood, relative to having no X's (see SW App. 11.2). This simplifies to the R² in the linear model with normally distributed errors.
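Neither measure needs to be computed by hand; a minimal sketch in Stata, after the probit with p_irat and black from the earlier slides (the variable names phat and correct are made up for this illustration): the mean of correct is the fraction correctly predicted, and probit stores the pseudo-R² in e(r2_p).

. quietly probit deny p_irat black, r;
. predict phat;
. gen byte correct = (phat >= .5 & deny == 1) | (phat < .5 & deny == 0);
. summarize correct;
. display "pseudo-R2 reported by probit: " e(r2_p);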

Large-n distribution of the MLE (not in SW)

This is foundation of mathematical statistics. We'll do this for the no-X special case, for which p is the only unknown parameter. Here are the steps:

1. Derive the log likelihood ("Λ(p)") (done).
2. The MLE is found by setting its derivative to zero; that requires solving a nonlinear equation.
3. For large n, p̂^MLE will be near the true p (p^true), so this nonlinear equation can be approximated (locally) by a linear equation (Taylor series around p^true).
4. This can be solved for p̂^MLE − p^true.
5. By the Law of Large Numbers and the CLT, for n large, √n (p̂^MLE − p^true) is normally distributed.

1. Derive the log likelihood

Recall: the density for observation #1 is:
Pr(Y1 = y1) = p^y1 (1 − p)^(1−y1)   (density)
so
f(p; Y1) = p^Y1 (1 − p)^(1−Y1)   (likelihood)

The likelihood for Y1,...,Yn is
f(p; Y1,...,Yn) = f(p; Y1) × ... × f(p; Yn)
so the log likelihood is
Λ(p) = ln f(p; Y1,...,Yn) = ln[f(p; Y1) × ... × f(p; Yn)] = Σ(i=1 to n) ln f(p; Yi)

2. Set the derivative of Λ(p) to zero to define the MLE:

dΛ(p)/dp evaluated at p̂^MLE  =  Σ(i=1 to n) ∂ln f(p̂^MLE; Yi)/∂p  =  0

3. Use a Taylor series expansion around p^true to approximate this as a linear function of p̂^MLE:

0 = dΛ(p̂^MLE)/dp ≈ dΛ(p^true)/dp + [d²Λ(p^true)/dp²] × (p̂^MLE − p^true)

4. Solve this linear approximation for (p̂^MLE − p^true):

dΛ(p^true)/dp + [d²Λ(p^true)/dp²] (p̂^MLE − p^true) ≈ 0

so

[d²Λ(p^true)/dp²] (p̂^MLE − p^true) ≈ −dΛ(p^true)/dp

or

(p̂^MLE − p^true) ≈ −[d²Λ(p^true)/dp²]^(−1) × dΛ(p^true)/dp

5. Substitute things in and apply the LLN and CLT.

Λ(p) = Σ(i=1 to n) ln f(p; Yi)

dΛ(p^true)/dp = Σ(i=1 to n) ∂ln f(p^true; Yi)/∂p

d²Λ(p^true)/dp² = Σ(i=1 to n) ∂²ln f(p^true; Yi)/∂p²

so

(p̂^MLE − p^true) ≈ −[d²Λ(p^true)/dp²]^(−1) × dΛ(p^true)/dp
= −[Σ ∂²ln f(p^true; Yi)/∂p²]^(−1) × [Σ ∂ln f(p^true; Yi)/∂p]

Multiply through by √n:

√n (p̂^MLE − p^true) ≈ −[(1/n) Σ ∂²ln f(p^true; Yi)/∂p²]^(−1) × [(1/√n) Σ ∂ln f(p^true; Yi)/∂p]

Because Yi is i.i.d., the i-th terms in the summands are also i.i.d.

Thus, if these terms have enough (2) moments, then under general conditions (not just the Bernoulli likelihood):

(1/n) Σ(i=1 to n) ∂²ln f(p^true; Yi)/∂p²  →p  a  (a constant)   (WLLN)

(1/√n) Σ(i=1 to n) ∂ln f(p^true; Yi)/∂p  →d  N(0, σ²_(∂ln f/∂p))   (CLT)   (Why?)

Putting this together,

√n (p̂^MLE − p^true) ≈ −[(1/n) Σ ∂²ln f(p^true; Yi)/∂p²]^(−1) × [(1/√n) Σ ∂ln f(p^true; Yi)/∂p]

where

(1/n) Σ(i=1 to n) ∂²ln f(p^true; Yi)/∂p²  →p  a  (a constant)   (WLLN)
(1/√n) Σ(i=1 to n) ∂ln f(p^true; Yi)/∂p  →d  N(0, σ²_(∂ln f/∂p))   (CLT)   (Why?)

so

√n (p̂^MLE − p^true)  →d  N(0, σ²_(∂ln f/∂p) / a²)   (large-n normal)

Work out the details for the probit/no-X (Bernoulli) case:

Recall:
f(p; Yi) = p^Yi (1 − p)^(1−Yi)
so
ln f(p; Yi) = Yi ln p + (1 − Yi) ln(1 − p)
and
∂ln f(p; Yi)/∂p = Yi/p − (1 − Yi)/(1 − p) = (Yi − p) / [p(1 − p)]
and
∂²ln f(p; Yi)/∂p² = −Yi/p² − (1 − Yi)/(1 − p)²

Denominator term first:

∂²ln f(p; Yi)/∂p² = −Yi/p² − (1 − Yi)/(1 − p)²

so

(1/n) Σ(i=1 to n) ∂²ln f(p^true; Yi)/∂p² = −(1/n) Σ Yi/p² − (1/n) Σ (1 − Yi)/(1 − p)²
= −Ȳ/p² − (1 − Ȳ)/(1 − p)²
→p −p/p² − (1 − p)/(1 − p)²   (LLN)
= −1/p − 1/(1 − p) = −1 / [p(1 − p)]

Next the numerator:

∂ln f(p; Yi)/∂p = (Yi − p) / [p(1 − p)]

so

(1/√n) Σ(i=1 to n) ∂ln f(p^true; Yi)/∂p = (1/√n) Σ (Yi − p) / [p(1 − p)]
= {1 / [p(1 − p)]} × (1/√n) Σ (Yi − p)
→d N(0, σ²Y / [p(1 − p)]²)

Put these pieces together:

√n (p̂^MLE − p^true) ≈ −[(1/n) Σ ∂²ln f(p^true; Yi)/∂p²]^(−1) × [(1/√n) Σ ∂ln f(p^true; Yi)/∂p]

where

(1/n) Σ(i=1 to n) ∂²ln f(p^true; Yi)/∂p²  →p  −1 / [p(1 − p)]
(1/√n) Σ(i=1 to n) ∂ln f(p^true; Yi)/∂p  →d  N(0, σ²Y / [p(1 − p)]²)

Thus

√n (p̂^MLE − p^true)  →d  N(0, σ²Y)

Summary: robit MLE, o-x case The MLE: ˆ MLE = Y Workig through the full MLE distributio theory gave: ( ˆ MLE true ) d N(0, σ ) Y But because true = Pr(Y = 1) = E(Y) = µ Y, this is: (Y µ Y ) d N(0, σ ) Y A familiar result from the first week of class! 9-49

The MLE derivation applies generally

√n (p̂^MLE − p^true)  →d  N(0, σ²_(∂ln f/∂p) / a²)

Standard errors are obtained from working out expressions for σ²_(∂ln f/∂p) / a²
Extends to >1 parameter (β0, β1) via matrix calculus
Because the distribution is normal for large n, inference is conducted as usual, for example, the 95% confidence interval is MLE ± 1.96SE.
The expression above uses "robust" standard errors; further simplifications yield non-robust standard errors which apply if ∂ln f(Yi; p)/∂p is homoskedastic.
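For example, the 95% confidence intervals in the earlier Stata output can be reproduced (up to the rounding of 1.96) directly from the stored coefficient and standard error (a sketch):

. quietly probit deny p_irat black, r;
. display "95% CI for black: " _b[black]-1.96*_se[black] "  to  " _b[black]+1.96*_se[black];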

Summary: distribution of the MLE (Why did I do this to you?)

The MLE is normally distributed for large n
We worked through this result in detail for the probit model with no X's (the Bernoulli distribution)
For large n, confidence intervals and hypothesis testing proceeds as usual
If the model is correctly specified, then the MLE is efficient, that is, it has a smaller large-n variance than all other estimators (we didn't show this).
These methods extend to other models with discrete dependent variables, for example count data (# crimes/day); see SW App. 11.2.

Application to the Boston HMDA Data (SW Section 11.4)

Mortgages (home loans) are an essential part of buying a home.
Is there differential access to home loans by race?
If two otherwise identical individuals, one white and one black, applied for a home loan, is there a difference in the probability of denial?

The HMDA Data Set

Data on individual characteristics, property characteristics, and loan denial/acceptance
The mortgage application process circa 1990-1991:
o Go to a bank or mortgage company
o Fill out an application (personal + financial info)
o Meet with the loan officer
Then the loan officer decides, by law, in a race-blind way. Presumably, the bank wants to make profitable loans, and the loan officer doesn't want to originate defaults.

The loan officer's decision

Loan officer uses key financial variables:
o P/I ratio
o housing expense-to-income ratio
o loan-to-value ratio
o personal credit history

The decision rule is nonlinear:
o loan-to-value ratio > 80%
o loan-to-value ratio > 95% (what happens in default?)
o credit score

Regression specifications

Pr(deny = 1 | black, other X's) =
linear probability model
probit

Main problem with the regressions so far: potential omitted variable bias. All these (i) enter the loan officer decision function, and (ii) are or could be correlated with race:
wealth, type of employment
credit history
family status

Variables in the HMDA data set...

[Tables of the HMDA variables and the estimated LPM/probit/logit regression results were shown on the following slides; not transcribed.]

Summary of Empirical Results

Coefficients on the financial variables make sense.
Black is statistically significant in all specifications
Race-financial variable interactions aren't significant.
Including the covariates sharply reduces the effect of race on denial probability.
LPM, probit, logit: similar estimates of effect of race on the probability of denial.
Estimated effects are large in a "real world" sense.

Remaining threats to internal, external validity

Internal validity
1. omitted variable bias: what else is learned in the in-person interviews?
2. functional form misspecification (no...)
3. measurement error (originally, yes; now, no...)
4. selection: random sample of loan applications; define population to be loan applicants
5. simultaneous causality (no)

External validity
This is for Boston in 1990-91. What about today?

Summary (SW Section 11.5)

If Yi is binary, then E(Y|X) = Pr(Y=1|X)
Three models:
o linear probability model (linear multiple regression)
o probit (cumulative standard normal distribution)
o logit (cumulative standard logistic distribution)
LPM, probit, logit all produce predicted probabilities
Effect of ΔX is change in conditional probability that Y=1. For logit and probit, this depends on the initial X
Probit and logit are estimated via maximum likelihood
o Coefficients are normally distributed for large n
o Large-n hypothesis testing, confidence intervals is as usual