Lecture 12: Qualitative Dependent Variables

12.1 Qualitative Dependent Variable (Dummy Dependent Variable)

A qualitative dependent variable is a binary variable that takes the values 0 and 1. The dependent variable is interpreted as a probability measure for which the realized value is either 0 or 1. Models involving binary (dummy) dependent variables are called discrete choice models or qualitative response models.

Examples:

- Modeling a household's decision to buy a car: a typical family either bought a car or it did not. The variable takes the value 1 if the household purchased a car and 0 if it did not.
- The decision to study for an MBA degree, modeled as a function of the unemployment rate, the average wage rate, family income, etc.: the variable takes two values, 1 if the person is in an MBA program and 0 if he/she is not.

Further examples: union membership of a worker; whether a household owns a house.

When we handle models involving dichotomous response variables, the four most commonly used approaches to estimating such models are:

1. the linear probability model (LPM);
2. the logit model;
3. the probit model;
4. the tobit (censored regression) model.

12.2 Linear Probability (or Binary Choice) Models [LPM]

Consider the following simple model:

Y_i = α + βX_i + u_i

where

X_i = household income;
Y_i = 1 if the household (the ith observation) buys a car in a given year;
Y_i = 0 if the household does not buy a car.

In this model the dichotomous Y_i is represented as a linear function of X_i. This is called the linear probability model because E[Y_i | X_i] can be interpreted as the conditional probability that the event will occur given X_i, that is, Pr(Y_i = 1 | X_i). Given observed values of Y_i equal to 0 or 1, define P_i = Pr(Y_i = 1 | X_i). It follows that

E[Y_i | X_i] = 1 · P_i + 0 · (1 - P_i) = P_i,

and since E[Y_i | X_i] = α + βX_i, we have

Pr(Y_i = 1 | X_i) = α + βX_i.

More importantly, the linear probability model violates the assumption of homoskedasticity. When Y is a binary variable, we have

Var(Y_i | X_i) = P_i (1 - P_i)

where P_i denotes the probability of success, P_i(X) = α + βX_i. This shows that heteroskedasticity is present in the LPM, which implies that the OLS estimators are inefficient. Hence we have to correct for heteroskedasticity when estimating the LPM if we want an estimator that is more efficient than OLS.

By definition,

u_i = 1 - α - βX_i   if Y_i = 1
u_i = -α - βX_i      if Y_i = 0

so that

0 = E(u_i | X_i) = P_i (1 - α - βX_i) + (1 - P_i)(-α - βX_i)

and

σ²_i = E[(u_i - E(u_i))² | X_i] = E(u²_i)   since E(u_i | X_i) = 0
     = P_i (1 - α - βX_i)² + (1 - P_i)(-α - βX_i)²
     = P_i (1 - P_i)² + (1 - P_i) P²_i
     = P_i (1 - P_i),

which makes use of the fact that α + βX_i = P_i. Hence σ²_i = (1 - α - βX_i)(α + βX_i), which varies with i, thus establishing the heteroskedasticity of the residuals u_i.

Procedure for estimating the LPM (a code sketch follows the remarks below):

1. Obtain the OLS estimators of the LPM.
2. Determine whether all of the OLS fitted values, ŷ_i, satisfy 0 < ŷ_i < 1. If so, proceed to step 3. If not, some adjustment is needed to bring all fitted values into the unit interval.
3. Construct the estimator of σ²_i: σ̂²_i = ŷ_i (1 - ŷ_i).
4. Apply WLS to estimate the equation.

REMARKS: Even though the normality assumption on u_i is violated, the OLS estimates of α and β are unbiased and consistent, but they are inefficient because of the heteroskedasticity. Why not simply estimate α and β by regressing Y against a constant and X? Does this cause any problem? It does: with a dummy dependent variable the residual is heteroskedastic, so applying OLS yields inefficient estimates.
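The four-step procedure above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original notes: the variable names (income, buys_car), the simulated data, and the use of statsmodels for the OLS and WLS fits are all assumptions made purely for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data: car purchase (0/1) loosely related to household income.
income = rng.uniform(20, 120, size=500)            # hypothetical income in $1000s
p_true = np.clip(0.01 * income - 0.2, 0.05, 0.95)  # true purchase probability
buys_car = rng.binomial(1, p_true)

X = sm.add_constant(income)

# Step 1: OLS estimates of the linear probability model.
ols_res = sm.OLS(buys_car, X).fit()
p_hat = ols_res.fittedvalues

# Step 2: check that all fitted values lie strictly inside (0, 1);
# here we simply truncate any that do not (one possible adjustment).
p_hat = np.clip(p_hat, 0.01, 0.99)

# Step 3: estimate sigma_i^2 = p_hat_i * (1 - p_hat_i).
sigma2_hat = p_hat * (1 - p_hat)

# Step 4: weighted least squares with weights 1 / sigma_i^2.
wls_res = sm.WLS(buys_car, X, weights=1.0 / sigma2_hat).fit()

print("OLS coefficients:", ols_res.params)
print("WLS coefficients:", wls_res.params)
```

Both estimators are consistent for (α, β); the WLS step only improves efficiency by reweighting observations according to the estimated error variance.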

DISCUSSION: The LPM is plagued by several problems, such as:

1. non-normality of u_i;
2. heteroskedasticity of u_i;
3. the possibility of ŷ_i lying outside the 0-1 range;
4. generally lower R² values;
5. it is not logically a very attractive model, because it assumes that P_i = E[Y = 1 | X] increases linearly with X.

We therefore need a (probability) model with two features:

1. As X_i increases, P_i = E[Y_i = 1 | X_i] increases but never steps outside the 0-1 interval.
2. The relationship between P_i and X_i is nonlinear, that is, one which approaches zero at slower and slower rates as X_i gets small and approaches one at slower and slower rates as X_i gets very large.

P_i = f(X_i)

where f(·) is S-shaped and resembles the cumulative distribution function (CDF) of a random variable. In practice, the CDFs commonly chosen to represent 0-1 response models are:

1. the logistic distribution (the logit model);
2. the normal distribution (the probit model).

A small numerical comparison of the two CDFs is sketched below.
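To see that both candidate CDFs are S-shaped and bounded between 0 and 1, here is a small numerical sketch (an illustration added here, not from the notes) comparing the logistic CDF with the standard normal CDF using scipy.

```python
import numpy as np
from scipy.stats import norm

z = np.linspace(-4, 4, 9)

logistic_cdf = 1.0 / (1.0 + np.exp(-z))  # logit model: F(z) = 1 / (1 + e^{-z})
normal_cdf = norm.cdf(z)                 # probit model: F(z) = Phi(z)

for zi, lg, nm in zip(z, logistic_cdf, normal_cdf):
    print(f"z = {zi:5.1f}   logistic = {lg:.3f}   normal = {nm:.3f}")

# Both functions increase monotonically from 0 to 1, so fitted probabilities
# based on either CDF can never fall outside the unit interval.
```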

Maximum Likelihood Estimation (MLE)

The MLE method can be used to estimate β (the collection of β_1, ..., β_k), σ², and Ω(θ), the disturbance covariance matrix (up to the scale factor σ²), at the same time. Let Γ estimate Ω^{-1}. Then the log likelihood can be written as

log L = -(n/2)[log(2π) + log σ²] - (1/(2σ²)) ε′Γε + (1/2) log|Γ|

First-order conditions (FOC):

∂ log L / ∂β  = (1/σ²) X′Γ(Y - Xβ)
∂ log L / ∂σ² = -n/(2σ²) + (1/(2σ⁴)) (Y - Xβ)′Γ(Y - Xβ)
∂ log L / ∂Γ  = (1/2)[Γ^{-1} - (1/σ²) εε′] = (1/(2σ²))(σ²Ω - εε′)

12.3 Heteroskedasticity-Robust Inference After OLS Estimation

Since hypothesis tests and confidence intervals based on the usual OLS standard errors are invalid in the presence of heteroskedasticity, we must decide whether to abandon OLS entirely or to reformulate the corresponding test statistics and confidence intervals. For the latter option, we have to adjust the standard errors and the t, F, and LM statistics so that they remain valid in the presence of heteroskedasticity of unknown form. This procedure is called heteroskedasticity-robust inference, and it is valid in large samples.

How to estimate the variance Var(β̂_j) in the presence of heteroskedasticity

Consider the simple regression model

y_i = β_1 + β_2 x_i + u_i

and assume that assumptions A1-A4 are satisfied. If the errors are heteroskedastic, then Var(u_i | x_i) = σ²_i. The OLS slope estimator can be written as

β̂_2 = β_2 + Σ_i (x_i - x̄) u_i / Σ_i (x_i - x̄)²

and we have

Var(β̂_2) = Σ_i (x_i - x̄)² σ²_i / SST²_x

where SST_x = Σ_{i=1}^{n} (x_i - x̄)² is the total sum of squares of the x_i.

Note: when σ²_i = σ² for all i, Var(β̂_2) reduces to the usual form, σ²/SST_x.

Regarding the estimation of Var(β̂_2) in the presence of heteroskedasticity, White (1980) proposed a procedure that is valid in large samples. Let û_i denote the OLS residuals from the initial regression of y on x. White (1980) showed that a valid estimator of Var(β̂_2), for heteroskedasticity of any form (including homoskedasticity), is

Σ_i (x_i - x̄)² û²_i / SST²_x.

Brief sketch of the argument (for the complete proof, refer to White (1980)): as n grows,

n · Σ_i (x_i - x̄)² û²_i / SST²_x  →p  E[(x_i - μ_x)² u²_i] / (σ²_x)²,

which is also the probability limit of

n · Var(β̂_2) = n · Σ_i (x_i - x̄)² σ²_i / SST²_x.

Therefore, by the law of large numbers and the central limit theorem, we can use the estimator Σ_i (x_i - x̄)² û²_i / SST²_x of Var(β̂_2) to construct confidence intervals and t tests (see the sketch below).

For the multiple regression model

y_i = β_1 + β_2 x_{i2} + ... + β_k x_{ik} + u_i,

under assumptions A1-A4, a valid estimator of Var(β̂_j) is

Σ_i r̂²_{ij} û²_i / SSR²_j,

where r̂_{ij} denotes the ith residual from regressing x_j on all other independent variables (including an intercept), and SSR_j = Σ_{i=1}^{n} r̂²_{ij} is the sum of squared residuals from that auxiliary regression.

REMARKS: The variance of the usual OLS estimator β̂_j is

Var(β̂_j) = σ² / [SST_j (1 - R²_j)],   for j = 1, ..., k,

where SST_j = Σ_{i=1}^{n} (x_{ij} - x̄_j)² and R²_j is the R² from regressing x_j on all other independent variables (including an intercept); note that SSR_j = SST_j (1 - R²_j).
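A minimal sketch (not in the original notes) of White's robust variance estimator for the simple-regression slope, computed by hand and compared with statsmodels' HC0 option; the data are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=0.5 + 0.3 * x)          # heteroskedastic errors: sd grows with x
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
uhat = res.resid

# White (1980) estimator of Var(beta_2_hat) for the slope:
#   sum (x_i - xbar)^2 * uhat_i^2 / SST_x^2
xbar = x.mean()
sst_x = np.sum((x - xbar) ** 2)
var_b2_robust = np.sum((x - xbar) ** 2 * uhat ** 2) / sst_x ** 2

# The manual value and the HC0 robust standard error should coincide.
print("manual robust se :", np.sqrt(var_b2_robust))
print("statsmodels HC0  :", res.get_robustcov_results(cov_type="HC0").bse[1])
print("usual OLS se     :", res.bse[1])
```

With the error variance growing in x, the usual OLS standard error understates the sampling variability of the slope, while the robust standard error remains valid in large samples.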

The square root of the estimated Var(β̂_j) is called the heteroskedasticity-robust standard error for β̂_j. Once the heteroskedasticity-robust standard errors are obtained, we can construct a heteroskedasticity-robust t statistic:

t = (estimate - hypothesized value) / standard error

12.4 The Probit Model

Assume there is a response function of the form I_i = α + βX_i, where X_i is observable but I_i is an unobservable variable. What we observe in practice is Y_i, which takes the value 1 if I_i > I* and 0 otherwise; I* is a critical or threshold level of the index. For example, we can assume that the decision of the ith household to own a house or not depends on an unobservable utility index that is determined by X. We thus have

Y_i = 1  if α + βX_i > I*
Y_i = 0  if α + βX_i ≤ I*

If we denote by F(z) the cumulative distribution function of the standard normal distribution, that is, F(z) = P(Z ≤ z), and treat the threshold I* as a standard normal random variable, then

P_i = P(Y_i = 1) = P(I* ≤ I_i) = F(I_i) = (1/√(2π)) ∫_{-∞}^{I_i} e^{-t²/2} dt = (1/√(2π)) ∫_{-∞}^{α+βX_i} e^{-t²/2} dt

where t ∼ N(0, 1), and hence

I_i = F^{-1}(P_i) = α + βX_i.

The joint probability density of the sample of observations (called the likelihood function) is therefore

L = ∏_{Y_i=0} F( (-α - βX_i)/σ ) × ∏_{Y_i=1} [ 1 - F( (-α - βX_i)/σ ) ]

The parameters α and β are estimated by maximizing L, which is highly nonlinear in the parameters and cannot be estimated by conventional regression programs; it requires specialized nonlinear optimization procedures, such as BHHH or BFGS (see the sketch below).
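The probit likelihood can indeed be maximized with a general-purpose optimizer such as BFGS. The following sketch is only an illustration (not the notes' own code): the data are simulated, σ is normalized to 1 as is standard in probit estimation, and scipy is assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = (0.5 + 1.2 * x + rng.normal(size=n) > 0).astype(float)  # data from a latent index

def neg_loglik(params):
    """Negative probit log likelihood; params = (alpha, beta), sigma normalized to 1."""
    alpha, beta = params
    index = alpha + beta * x
    # log L = sum_{Y=1} log F(index) + sum_{Y=0} log F(-index)
    return -np.sum(y * norm.logcdf(index) + (1 - y) * norm.logcdf(-index))

result = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("probit MLE (alpha, beta):", result.x)
```

The objective uses log F(index) and log F(-index), which is the log of the likelihood written above once the symmetry of the normal CDF is used.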

12.5 The Logit Model

The logit model (also known as the logistic model) has the following functional form:

P_i = E[Y_i = 1 | X_i] = 1 / (1 + e^{-(α+βX_i)})

We see that as βX → +∞, P → 1, and as βX → -∞, P → 0. Thus P can never lie outside the range [0, 1].

Logistic distribution function: the CDF of a logistic random variable is

Pr(Z ≤ z) = F(z) = 1 / (1 + e^{-z}),

so F(z) → 1 as z → ∞ and F(z) → 0 as z → -∞.

Writing z_i = α + βX_i, P_i is the probability that Y_i = 1 and 1 - P_i is the probability that Y_i = 0, with

1 - P_i = 1 / (1 + e^{z_i}).

Hence we have

P_i / (1 - P_i) = (1 + e^{z_i}) / (1 + e^{-z_i}) = e^{z_i}.

Taking the natural log, we obtain

L_i = ln[ P_i / (1 - P_i) ] = z_i = α + βX_i.

Estimation of the Logit Model

L_i = ln[ P_i / (1 - P_i) ] = α + βX_i + u_i

Suppose we have data on individual households, with P_i = 1 if a household owns a house and P_i = 0 if it does not. It is then meaningless to calculate the logarithm of P_i/(1 - P_i), since it is undefined when P_i is either 0 or 1. Therefore we cannot estimate this model by standard OLS; we need to use MLE to estimate the parameters (see the sketch below).
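A minimal sketch of logit estimation by maximum likelihood (illustrative only, not from the notes): the data are simulated, the variable names (income, owns_house) are invented for the example, and statsmodels' Logit is assumed available.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
income = rng.uniform(20, 120, size=n)
z = -4.0 + 0.06 * income
owns_house = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))  # P(Y=1) = 1/(1 + e^{-z})

X = sm.add_constant(income)
logit_res = sm.Logit(owns_house, X).fit(disp=0)   # maximum likelihood estimation
print("logit MLE (alpha, beta):", logit_res.params)

p_hat = logit_res.predict(X)      # fitted probabilities, always inside (0, 1)
print("fitted probability range:", p_hat.min(), p_hat.max())

# Marginal effect of income at each observation: beta_hat * P_hat * (1 - P_hat)
# (this expression is derived in the marginal-effect formula below).
marg_eff = logit_res.params[1] * p_hat * (1 - p_hat)
print("average marginal effect:", marg_eff.mean())
```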

If we have data in which P is strictly between 0 and 1, then we can simply transform P, obtain Y_i = ln[P_i/(1 - P_i)], and regress Y_i against a constant and X_i.

The marginal effect of X on P is

∂P̂/∂X = β̂ e^{-(α̂+β̂X)} / [1 + e^{-(α̂+β̂X)}]² = β̂ P̂ (1 - P̂).

12.6 The Tobit Model (or Censored Regression)

The observed values of a dependent variable sometimes have a discrete jump at zero; that is, some of the values may be zero while others are positive, and we never observe negative values. What are the consequences of disregarding this fact and regressing Y against a constant and X? In this situation the residual will not satisfy the condition E(u_i) = 0, which is required for the unbiasedness of the estimates.

There is an asymmetry between observations with positive values of Y and those with zero (censored) values. The model becomes

Y_i = α + βX_i + u_i   if Y_i > 0 (i.e., u_i > -α - βX_i)
Y_i = 0                if Y_i ≤ 0 (i.e., u_i ≤ -α - βX_i)

The basic assumption behind this model is that there exists an index function I_i = α + βX_i + u_i for each economic agent being studied. If I_i ≤ 0, the value of the dependent variable is set to zero; if I_i > 0, the value of the dependent variable is set to I_i.

Suppose u has a normal distribution with mean zero and variance σ². Then Z = u/σ is a standard normal random variable. Denote by f(z) the probability density of the standard normal variable Z, and by F(z) its cumulative distribution function, that is, P[Z ≤ z]. The joint probability density for those observations for which Y_i is positive is then

P_1 = ∏_{i=1}^{m} (1/σ) f[ (Y_i - α - βX_i)/σ ]

where m is the number of observations in the subsample for which Y is positive.

For the second subsample (of size n), for which the observed Y is zero, the relevant event is u ≤ -α - βX. The probability of this event is

P_2 = ∏_{j=1}^{n} P[ u_j ≤ -α - βX_j ] = ∏_{j=1}^{n} F[ (-α - βX_j)/σ ]

The joint probability for the entire sample is therefore given by L = P_1 · P_2. Because this likelihood is nonlinear in the parameters, OLS is not applicable here; we have to employ the maximum likelihood procedure, that is, maximize L with respect to the parameters, to obtain estimates of α and β (a sketch follows the references).

References

Greene, W. H., 2003, Econometric Analysis, 5th ed., Prentice Hall, Chapter 21.

Gujarati, D. N., 2003, Basic Econometrics, 4th ed., McGraw-Hill, Chapter 15.
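To make the tobit likelihood L = P_1 · P_2 concrete, here is a sketch (not part of the notes; the data are simulated and scipy's optimizer is assumed) that maximizes the corresponding log likelihood over (α, β, σ).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 600
x = rng.uniform(0, 10, size=n)
ystar = -3.0 + 1.0 * x + rng.normal(scale=2.0, size=n)  # latent index I_i
y = np.maximum(ystar, 0.0)                              # observed Y: censored at zero

def neg_loglik(params):
    """Negative tobit log likelihood: -(log P1 + log P2)."""
    alpha, beta, log_sigma = params
    sigma = np.exp(log_sigma)          # keep sigma positive during optimization
    resid = (y - alpha - beta * x) / sigma
    pos = y > 0
    # Positive observations contribute (1/sigma) f((Y - alpha - beta X)/sigma);
    # zero observations contribute F((-alpha - beta X)/sigma).
    ll_pos = norm.logpdf(resid[pos]) - np.log(sigma)
    ll_zero = norm.logcdf((-alpha - beta * x[~pos]) / sigma)
    return -(ll_pos.sum() + ll_zero.sum())

result = minimize(neg_loglik, x0=np.array([0.0, 0.5, 0.0]), method="BFGS")
alpha_hat, beta_hat, sigma_hat = result.x[0], result.x[1], np.exp(result.x[2])
print("tobit MLE (alpha, beta, sigma):", alpha_hat, beta_hat, sigma_hat)
```

Running OLS on the censored y instead would ignore the zero observations' contribution F[(-α - βX)/σ] and produce biased estimates, which is the point made at the start of Section 12.6.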
