Overview and Introduction. Contact

Transcription

1 Overview and Introduction Contact
Lecturer: S. Sperlich (MZG 8.128)
Lecture: Tue, 8-10am, MZG
Contact: (Office hours: by appointment)
Assistant: M. Dickel (MZG 8.135)
Tutorial: Thu, 8-10am, MZG or WiSoRZ
Start: Thu, April 16, 8am s.t., WiSoRZ
Contact: (Office hours: by appointment)
Econometrics 2 (Summer 2009) 1 / 35

2 Overview and Introduction Contents
Binary Choice Models
Multiple Discrete Variables (Polychotomous) Models
Tobit 1, Tobit 2, Tobit 3
Hypothesis Tests (with recapitulation)
GMM (including recapitulation of Endogeneity and IV methods)
Duration and Survival Models
Panel Data Analysis (basics)

3 Overview and Introduction Literature
Arellano, M. (2003) Panel Data Econometrics, Oxford University Press.
Baltagi, B. (2001) Econometric Analysis of Panel Data, Wiley College Textbooks.
Berndt, E. R. (1996) The Practice of Econometrics: Classic and Contemporary, Addison Wesley.
Greene, W. (2003) Econometric Analysis, Prentice Hall.
Hsiao, C. (2003) Analysis of Panel Data, Cambridge University Press.
Judge, G., Hill, R., Griffiths, W., Lütkepohl, H. (1988) Introduction to the Theory and Practice of Econometrics, New York: Wiley.
Maddala, G. S. (1986) Limited Dependent and Qualitative Variables in Econometrics.
Wooldridge, J. (2002) Econometric Analysis of Cross Section and Panel Data, MIT Press.

4 Overview and Introduction Organization
Language (English - German)
Tutorials (IT room and/or seminar room)
Timetable for tutorials
Exams
Requirements: maths, statistics, profound knowledge of multivariate regression
Desired: Introduction to Econometrics or Econometrics 1

5 Chapter 1 Binary and Multiple Choice Models
Examples?
Previous knowledge: linear, logit, probit
Differences from normal linear regression

6 Discrete Choice Models Classes of Discrete Variables
Discrete variables are either binary or multinomial.
Further classification of multinomial discrete variables:
categorical, e.g.
$y = 1$ if income < 3000 €,
$y = 2$ if income is between 3000 € and 5000 €,
$y = 3$ if income > 5000 €;
non-categorical, e.g. $y$ = number of cars in the household.
The further classification of categorical variables depends on whether they have a natural order or sequence.

7 Discrete Choice Models Classes of Discrete Variables
nominal / unordered categorical, e.g.
$y = 1$ if the mode of transport is car,
$y = 2$ if the mode of transport is bus,
$y = 3$ if the mode of transport is train;
ordered and/or sequential, e.g.
$y = 1$ if an individual chooses not to work,
$y = 2$ if an individual wants to work but can't get a job,
$y = 3$ if the individual works.
The characteristics of a discrete variable dictate which estimation methods are available.

8 Binary Choice Models Theoretical Framework
Consider a binary dependent variable $y$, which has only two possible outcomes (0 and 1), and a vector of explanatory variables $x$ thought to influence the realization of $y$.
The unconditional expectation of the binary variable $y$ is by definition a probability:
$$E(y) = P(y = 1).$$
Further, let the set of explanatory variables $x$ influence the outcome of $y$. Then the conditional expectation of $y$ given $x$ is:
$$E(y \mid x) = P(y = 1 \mid x).$$

9 Binary Choice Models Theoretical Framework
Relate this term to standard regression analysis. For $y = F(x, \beta) + u$ with $E(u \mid x) = 0$, the conditional expectation is
$$E(y \mid x) = E(F(x, \beta) + u \mid x) = F(x, \beta) + E(u \mid x) = F(x, \beta).$$
Hence, the standard regression functional $F(x, \beta)$ is a representation of the conditional expectation of $y$ given $x$.
If the dependent variable in a regression relationship is binary, then the regression functional $F(x, \beta)$ equals the conditional probability of observing $y = 1$.
Thus, the characteristics of binary choice models depend crucially on the way we specify the regression functional $F(x, \beta)$.

10 The Linear Probability Model
Consider a binary dependent variable $y$ and a $k$-dimensional vector of explanatory variables $x$. We may specify the conditional probability directly as:
$$P(y = 1 \mid x) = F(x, \beta) = x'\beta.$$
Introducing random disturbances, we have
$$y = x'\beta + u,$$
where $u$ represents the stochastic disturbance term in the relationship, $f(u)$ represents its density, and $E(u \mid x) = 0$ by definition.
For a sample of $n$ observations $\{y_i, x_i\}$ drawn at random from a population,
$$y_i = x_i'\beta + u_i.$$
OLS estimation procedures may be applied. This is known as the Linear Probability Model (LPM).

12 The Linear Probability Model Problems with the LPM
The disturbance terms are non-normal: given $x_i$, $u_i$ takes only two values,
$u_i = 1 - x_i'\beta$ with probability $x_i'\beta$,
$u_i = -x_i'\beta$ with probability $1 - x_i'\beta$.
The disturbance terms are heteroskedastic:
$$\mathrm{Var}(u_i) = E(u_i^2) = (-x_i'\beta)^2 (1 - x_i'\beta) + (1 - x_i'\beta)^2 (x_i'\beta) = (x_i'\beta)(1 - x_i'\beta) = P(y_i = 1 \mid x_i)\, P(y_i = 0 \mid x_i).$$
The conditional expectation is not bounded between zero and one:
$$E(y_i \mid x_i) = P(y_i = 1 \mid x_i) = x_i'\beta,$$
which is defined over the entire real line.
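
The two last problems can be seen on a small example. The following is a minimal sketch, assuming a made-up sample with a single regressor; the data and the names `b0`, `b1`, `p_hat` and `var_u` are illustrative and not taken from the lecture. It fits the LPM by OLS and prints the fitted probabilities next to the implied variances $p(1 - p)$.

```python
# Toy data (hypothetical): one regressor x and a binary outcome y.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 1, 0, 1, 1, 1]
n = len(x)

# OLS for y_i = b0 + b1 * x_i + u_i (closed form for a single regressor).
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted "probabilities" and the implied disturbance variances p(1 - p).
p_hat = [b0 + b1 * xi for xi in x]
var_u = [p * (1 - p) for p in p_hat]

for xi, p, v in zip(x, p_hat, var_u):
    print(f"x={xi}  p_hat={p:+.3f}  p(1-p)={v:+.3f}")
```

In this sample the fitted value at the smallest $x$ is negative and the one at the largest $x$ exceeds one, and the implied variance changes from observation to observation.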

13 The Linear Probability Model Possible Solutions
Weighted Least Squares accounts for the heteroskedasticity, with weights
$$w_i = \sqrt{(x_i'\hat\beta)(1 - x_i'\hat\beta)}$$
calculated from a first-stage estimation. The adjusted model then becomes
$$\frac{y_i}{w_i} = \frac{x_i'}{w_i}\beta + \frac{u_i}{w_i}.$$
This still does not return probabilities within the range $[0, 1]$.
A better solution is to re-specify or transform the regression model itself to constrain the probability outcome.
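
The weighting step can be sketched as follows. This is an illustration only, assuming that the weight is the estimated standard deviation $\sqrt{\hat p_i (1 - \hat p_i)}$ of the disturbance; the function name `wls_transform` and the sample values are hypothetical. The sketch also makes a practical limitation visible: the weight is undefined as soon as a first-stage fitted value lies outside $(0, 1)$.

```python
import math

def wls_transform(y, X, p_hat):
    """Divide y_i and every regressor in x_i by w_i = sqrt(p_i * (1 - p_i))."""
    y_star, X_star = [], []
    for yi, xi, p in zip(y, X, p_hat):
        if not 0.0 < p < 1.0:
            raise ValueError(f"fitted probability {p} outside (0, 1): weight undefined")
        w = math.sqrt(p * (1.0 - p))
        y_star.append(yi / w)
        X_star.append([xij / w for xij in xi])
    return y_star, X_star

# Hypothetical first-stage fitted values, all inside (0, 1).
y = [0, 1, 1]
X = [[1.0, 0.5], [1.0, 2.0], [1.0, 3.5]]   # first column is the constant
p_hat = [0.2, 0.5, 0.8]
y_star, X_star = wls_transform(y, X, p_hat)
print(y_star)
print(X_star)
```

OLS on the transformed variables `y_star` and `X_star` (without adding a further constant) would then give the WLS estimates.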

15 Probit and Logit Models
In general,
$$E(y_i \mid x_i) = P(y_i = 1 \mid x_i) = F(x_i, \beta).$$
For the Linear Probability Model,
$$F(x_i, \beta) = x_i'\beta.$$
To solve the probability problem, constrain the outcome $F(x_i, \beta)$ to the interval $[0, 1]$.
Which alternatives do we know?

16 Probit and Logit Models The Transformation Approach
For the Probit,
$$F(x_i, \beta) = \Phi(x_i'\beta),$$
where $\Phi$ represents the cumulative distribution function of the standard normal distribution.
For the Logit,
$$F(x_i, \beta) = \Lambda(x_i'\beta),$$
where
$$\Lambda(z) = \frac{\exp(z)}{1 + \exp(z)} = \frac{1}{1 + \exp(-z)}$$
represents the logistic function.
What effect does the choice of link function have? First, some characteristics:
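
Both transformations are easy to evaluate numerically. The sketch below uses only the Python standard library; the function names `probit_link` and `logit_link` are chosen here for illustration. The standard normal CDF is obtained from the error function via $\Phi(z) = \tfrac{1}{2}\,[1 + \mathrm{erf}(z / \sqrt{2})]$.

```python
import math

def probit_link(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_link(z):
    """Logistic function, Lambda(z) = exp(z) / (1 + exp(z)) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # this branch avoids overflow for very negative z
    return ez / (1.0 + ez)

for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"z={z:+.1f}  Phi={probit_link(z):.4f}  Lambda={logit_link(z):.4f}")
```

Both functions stay strictly between 0 and 1 and equal 0.5 at $z = 0$; the logistic function has heavier tails than the normal CDF.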

17 Probit and Logit Models The Transformation Approach
Notice that the functions $\Phi(z)$ and $\Lambda(z)$ are both monotone increasing functions of $z$. Moreover, in both cases,
$F(x_i, \beta) \to 0$ as $x_i'\beta \to -\infty$,
$F(x_i, \beta) \to 1$ as $x_i'\beta \to +\infty$.
So Probit and Logit models both return well-defined probabilities.
However, because the transformed regression function is non-linear in $\beta$, we can no longer use OLS and must move to ML techniques.
The LPM might therefore be considered a first-order approximation to the arbitrary non-linear probability function $F(\cdot)$. That is,
$$F(x, \beta) \approx F(x_0, \beta) + (x - x_0)' \frac{\partial F(x_0, \beta)}{\partial x} = x'\beta^0,$$
using a first-order Taylor series expansion around $x = x_0$. The approximation is linear in $x$, so it can be written as $x'\beta^0$ when $x$ contains a constant.

18 Probit and Logit Models Latent Variable
Assume that there is some underlying (and unobserved) latent propensity variable $y^*$, where $y^* \in (-\infty, \infty)$.
Whilst we do not observe $y^*$ directly, we do observe a binary outcome $y$ such that
$$y = \mathbf{1}\{y^* > 0\},$$
where $\mathbf{1}\{\cdot\}$ is termed the indicator function, taking the value 1 if the condition within braces is satisfied, and 0 otherwise.
Define the latent equation in linear form:
$$y^* = x'\beta + u,$$
where $u$ is random with symmetric density $f$ and corresponding cumulative distribution function $F$.

19 Probit and Logit Models Latent Variable
We now have that
$$E(y \mid x) = P(y = 1 \mid x) = P(y^* > 0 \mid x) = P(x'\beta + u > 0) = P(u > -x'\beta) = 1 - F(-x'\beta) = F(x'\beta).$$
By specifying an appropriate distribution function for $u$, we can derive the Probit and Logit models.
When $u$ is assumed normally distributed, parameters must be scaled to force the variance of $u$ to $\sigma^2 = \mathrm{Var}(u) = 1$. Why?
$$P(y = 1 \mid x) = P(u > -x'\beta) = P(u/\sigma > -x'(\beta/\sigma)) = P(z > -x'(\beta/\sigma)) = \Phi(x'(\beta/\sigma)),$$
where $z = u/\sigma$ is standard normal. Only the ratio $\beta/\sigma$ enters the probability.
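
The scaling argument can be checked numerically. The following sketch assumes normally distributed $u$ with standard deviation `sigma`; the regressor values, parameters and the rescaling factor `c` are hypothetical. Multiplying $\beta$ and $\sigma$ by the same positive constant leaves $P(y = 1 \mid x)$ unchanged, which is why $\sigma$ has to be fixed at 1.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_y1(x, beta, sigma):
    """P(y = 1 | x) when u ~ N(0, sigma^2): Phi(x'beta / sigma)."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return norm_cdf(index / sigma)

x = [1.0, 0.4]                     # hypothetical regressors (constant first)
beta, sigma = [-0.5, 1.2], 1.0     # hypothetical parameters
c = 3.0                            # arbitrary positive rescaling factor

p_original = prob_y1(x, beta, sigma)
p_rescaled = prob_y1(x, [c * b for b in beta], c * sigma)
print(p_original, p_rescaled)
```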

20 Probit and Logit Models Theoretical Foundations
Suppose $y = 1$ represents a person who works, and $y = 0$ one who doesn't.
Consider state-specific utilities $U_y$:
$$U_{y=1} = x'\beta_1 + u_1, \qquad U_{y=0} = x'\beta_0 + u_0.$$
Participation in the work force requires that $U_{y=1} > U_{y=0}$, such that
$$y = \mathbf{1}\{U_{y=1} > U_{y=0}\} = \mathbf{1}\{x'\beta_1 + u_1 > x'\beta_0 + u_0\} = \mathbf{1}\{u_1 - u_0 > -x'(\beta_1 - \beta_0)\}.$$
Only the difference $\beta_1 - \beta_0$ can be identified. Hence $y = \mathbf{1}\{y^* > 0\}$, where
$$y^* = x'(\beta_1 - \beta_0) + (u_1 - u_0) = x'\beta + u.$$

21 ML Estimation
Consider a sample of $n$ observations $\{y_i, x_i\}$, where $y_i$ is binary. Assume $y_i = \mathbf{1}\{y_i^* > 0\}$ for $y_i^* = x_i'\beta + u_i$.
For any vector $\beta$, the probability of observing the sample outcomes $y_i$ conditional on the $x_i$ is:
$$L(\beta \mid x_i) = \prod_{i=1}^{n} P(y_i \mid x_i, \beta) = \prod_{i=1}^{n} P(y_i = 0 \mid x_i, \beta)^{1 - y_i}\, P(y_i = 1 \mid x_i, \beta)^{y_i}.$$
Taking logs,
$$\ln L(\beta \mid x_i) = \sum_{i=1}^{n} \left\{ (1 - y_i) \ln P(y_i = 0 \mid x_i, \beta) + y_i \ln P(y_i = 1 \mid x_i, \beta) \right\}.$$

22 ML Estimation
For the Probit model,
$$P(y_i = 1 \mid x_i, \beta) = \Phi(x_i'\beta), \qquad P(y_i = 0 \mid x_i, \beta) = 1 - \Phi(x_i'\beta),$$
giving a log-likelihood of the form
$$\ln L(\beta \mid x_i) = \sum_{i=1}^{n} \left\{ (1 - y_i) \ln(1 - \Phi(x_i'\beta)) + y_i \ln \Phi(x_i'\beta) \right\}.$$
For the Logit model,
$$P(y_i = 1 \mid x_i, \beta) = \Lambda(x_i'\beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}, \qquad P(y_i = 0 \mid x_i, \beta) = 1 - \Lambda(x_i'\beta) = \frac{1}{1 + \exp(x_i'\beta)},$$
to give
$$\ln L(\beta \mid x_i) = \sum_{i=1}^{n} \left\{ (1 - y_i) \ln(1 - \Lambda(x_i'\beta)) + y_i \ln \Lambda(x_i'\beta) \right\}.$$
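
Both log-likelihoods share the same structure and differ only in the distribution function. A minimal sketch, with a hypothetical four-observation sample and illustrative function names, evaluates them for a given $\beta$:

```python
import math

def norm_cdf(z):
    """Standard normal CDF (Probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z):
    """Logistic function (Logit)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y, cdf):
    """Sum over i of (1 - y_i) ln(1 - F(x_i'beta)) + y_i ln F(x_i'beta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p1 = cdf(sum(xij * bj for xij, bj in zip(xi, beta)))
        total += (1 - yi) * math.log(1.0 - p1) + yi * math.log(p1)
    return total

# Hypothetical sample: constant plus one regressor.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

beta = [0.0, 0.0]
ll_probit = log_likelihood(beta, X, y, norm_cdf)
ll_logit = log_likelihood(beta, X, y, logistic)
print(ll_probit, ll_logit)
```

At $\beta = 0$ every observation has probability 0.5 under both models, so both values equal $n \ln 0.5$. A parameter vector that fits the data better gives a larger (less negative) value.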

23 ML Estimation First Order Conditions
Parameters which maximize the general log-likelihood require that
$$S(\beta) = \frac{\partial \ln L(\beta \mid x_i)}{\partial \beta} = 0.$$
For the Probit,
$$S(\beta) = \sum_{i=1}^{n} \frac{y_i - \Phi(x_i'\beta)}{\Phi(x_i'\beta)\,(1 - \Phi(x_i'\beta))}\, \phi(x_i'\beta)\, x_i.$$
For the Logit,
$$S(\beta) = \sum_{i=1}^{n} \left[ y_i - \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} \right] x_i.$$
The ML solution is obtained by finding the parameters for which $S(\beta) = 0$.
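
One common way to solve $S(\beta) = 0$ numerically is Newton-Raphson; the slides do not prescribe a particular algorithm, so this is only a sketch under that assumption. It is restricted to a Logit with a constant and one regressor so that the $2 \times 2$ system can be solved by hand; the data and the function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_score(beta, X, y):
    """S(beta) = sum_i [y_i - Lambda(x_i'beta)] x_i."""
    k = len(beta)
    score = [0.0] * k
    for xi, yi in zip(X, y):
        resid = yi - logistic(sum(a * b for a, b in zip(xi, beta)))
        for j in range(k):
            score[j] += resid * xi[j]
    return score

def newton_logit_2d(X, y, steps=25):
    """Newton-Raphson for a two-parameter Logit (constant plus one regressor)."""
    b = [0.0, 0.0]
    for _ in range(steps):
        s = logit_score(b, X, y)
        # Negative Hessian: sum_i Lambda_i (1 - Lambda_i) x_i x_i'
        h00 = h01 = h11 = 0.0
        for xi in X:
            p = logistic(xi[0] * b[0] + xi[1] * b[1])
            w = p * (1.0 - p)
            h00 += w * xi[0] * xi[0]
            h01 += w * xi[0] * xi[1]
            h11 += w * xi[1] * xi[1]
        det = h00 * h11 - h01 * h01
        # Update: beta_new = beta + (negative Hessian)^{-1} S(beta)
        b = [b[0] + (h11 * s[0] - h01 * s[1]) / det,
             b[1] + (-h01 * s[0] + h00 * s[1]) / det]
    return b

# Hypothetical sample in which the outcomes overlap, so the ML estimate exists.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
beta_hat = newton_logit_2d(X, y)
score_at_mle = logit_score(beta_hat, X, y)
print(beta_hat, score_at_mle)
```

At the returned estimate both components of the score are numerically zero, which is the first order condition stated above.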

24 Interpretation Binary Choice Models
Let's keep concentrating on the following Binary Choice models:
LPM: $P(y_i = 1 \mid x_i, \beta) = x_i'\beta$
Probit: $P(y_i = 1 \mid x_i, \beta) = \Phi(x_i'\beta)$
Logit: $P(y_i = 1 \mid x_i, \beta) = \Lambda(x_i'\beta)$
If $\beta_j$ is positive (negative), then $P(y_i = 1 \mid x_i, \beta) = F(x_i'\beta)$ will increase (decrease) with an increase in $x_j$.

25 Interpretation Marginal Effects
LPM: $\dfrac{\partial P(y_i = 1 \mid x_i, \beta)}{\partial x_{ij}} = \beta_j$
Probit: $\dfrac{\partial P(y_i = 1 \mid x_i, \beta)}{\partial x_{ij}} = \phi(x_i'\beta)\, \beta_j$
Logit: $\dfrac{\partial P(y_i = 1 \mid x_i, \beta)}{\partial x_{ij}} = \dfrac{\exp(x_i'\beta)}{(1 + \exp(x_i'\beta))^2}\, \beta_j$
Implications: Slope estimates are not directly comparable across models. For example, the variances of the disturbances in the Logit model and the Probit model are different. Hence the parameters are also scaled differently.
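
The three marginal effects can be written as one small function. This is a sketch with illustrative names (`marginal_effect`, `index` for the value of $x_i'\beta$); the coefficient value 1.0 used in the loop is hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_effect(model, index, beta_j):
    """dP(y = 1 | x) / dx_j evaluated at the linear index x'beta."""
    if model == "lpm":
        return beta_j
    if model == "probit":
        return norm_pdf(index) * beta_j
    if model == "logit":
        lam = 1.0 / (1.0 + math.exp(-index))
        # lam * (1 - lam) equals exp(z) / (1 + exp(z))**2
        return lam * (1.0 - lam) * beta_j
    raise ValueError(f"unknown model: {model}")

for index in (0.0, 1.0, 2.0):
    effects = {m: marginal_effect(m, index, 1.0) for m in ("lpm", "probit", "logit")}
    print(index, effects)
```

With the same coefficient, the Logit effect at index 0 is 0.25 and the Probit effect is $1/\sqrt{2\pi} \approx 0.399$; both shrink as the index moves away from zero, while the LPM effect stays constant.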

27 Interpretation Marginal Effects
Notice also:
The marginal effects in the LPM are constant (i.e. independent of the data).
The marginal effects in the Probit and Logit models depend on $x_i$.
A popular transformation (Amemiya's rule of thumb):
$\beta_{LPM} \approx 0.25\, \beta_L$ for the slopes, and
$\beta_{LPM} \approx 0.25\, \beta_L + 0.5$ for the intercept;
$\beta_P \approx 0.625\, \beta_L$.

28 Interpretation An Empirical Example: childcare take-up estimates
Parameter Estimates
Variable                  LPM   Probit   Logit
single woman
other children aged
woman works
left school at
attended college/uni
youngest child aged
youngest child aged
receives maintenance
constant
Data source: 1991/92 General Household Survey, from which a random sample of $n = 1288$ women responsible for at least one child of pre-school age was taken.

29 Interpretation An Empirical Example: childcare take-up estimates
The dependent variable is 1 if the woman pays for childcare, else 0.
The reference in all cases is a married woman who doesn't work, left school at 16, has one child aged less than two and receives no maintenance.
For the reference household, all explanatory variables take a value of 0, which leads to probability estimates in each model of:
LPM: $P(y_i = 1 \mid x_i) = x_i'\beta = 0.153$,
Probit: $P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \Phi(-0.995) = 0.161$,
Logit: $P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta) = \dfrac{\exp(-1.645)}{1 + \exp(-1.645)} \approx 0.162$.
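
The reference-household probabilities can be recomputed from the three constants. In the sketch below the negative signs of the Probit and Logit constants are an assumption: they are inferred from the reported probability of about 0.16, since a positive index would give a probability above one half. The variable names are illustrative.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Constants as quoted on the slide (reference household: all regressors are 0).
# The minus signs are inferred, see the note above.
const_lpm, const_probit, const_logit = 0.153, -0.995, -1.645

p_lpm = const_lpm
p_probit = norm_cdf(const_probit)
p_logit = logistic(const_logit)
print(f"LPM {p_lpm:.3f}  Probit {p_probit:.3f}  Logit {p_logit:.3f}")
```

All three models give a take-up probability of roughly 0.15 to 0.16 for the reference household. With the rounded constant, the Probit value comes out marginally below the 0.161 quoted on the slide, which presumably used unrounded estimates.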

30 Interpretation An Empirical Example: childcare take-up estimates
How, for example, does the probability change for women who attended university?
LPM: $P(y_i = 1 \mid x_i) = x_i'\beta = 0.153 + 0.160 = 0.313$,
Probit: $P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \Phi(-0.995 + 0.458) = \Phi(-0.537) = 0.296$,
Logit: $P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta) = \dfrac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$.

31 Statistical Inference Binary Choice Models: statistical inference
For the LPM, estimated standard errors are easily derived and evaluated. But remember, the LPM is heteroskedastic.
For the Probit and Logit models,
$$\sqrt{n}\,(\hat\beta - \beta) \overset{\text{asym.}}{\sim} N(0, I(\beta)^{-1}),$$
where $I$ is the Fisher information.
Computer software for ML estimation evaluates the variance-covariance matrix $V(\hat\beta)$ directly.
Hence, statistical inference and hypothesis testing can be carried out using standard inferential techniques.

32 Statistical Inference Goodness of fit
Let $L_{UR}$ represent the likelihood of the full model.
Let $L_R$ represent the likelihood of a restricted model estimated on an intercept alone.
Then the formulations of two proposed measures are as follows:
$$\text{Cragg-Uhler pseudo } R^2 = \frac{L_{UR}^{2/n} - L_R^{2/n}}{1 - L_R^{2/n}},$$
$$\text{McFadden pseudo } R^2 = 1 - \frac{\ln L_{UR}}{\ln L_R}.$$
Remember: the classical and adjusted $R^2$ can't be used. (Why?)
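
Both measures are simple functions of the two maximized log-likelihoods. The sketch below follows the formulas as written on the slide; the function names and the log-likelihood values are hypothetical. Working with $\exp(2 \ln L / n)$ instead of $L^{2/n}$ avoids numerical underflow, because the likelihood itself is usually a very small number.

```python
import math

def mcfadden_r2(lnL_ur, lnL_r):
    """1 - ln L_UR / ln L_R."""
    return 1.0 - lnL_ur / lnL_r

def cragg_uhler_r2(lnL_ur, lnL_r, n):
    """(L_UR^{2/n} - L_R^{2/n}) / (1 - L_R^{2/n}), computed from log-likelihoods."""
    l_ur = math.exp(2.0 * lnL_ur / n)
    l_r = math.exp(2.0 * lnL_r / n)
    return (l_ur - l_r) / (1.0 - l_r)

# Hypothetical log-likelihoods for a sample of n = 100 observations.
lnL_r, lnL_ur, n = -68.0, -55.0, 100
print(mcfadden_r2(lnL_ur, lnL_r))
print(cragg_uhler_r2(lnL_ur, lnL_r, n))
```

Both measures are 0 when the full model does not improve on the intercept-only model and approach 1 as $\ln L_{UR}$ approaches 0.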

33 Statistical Inference Goodness of fit
An alternative outcome-based measure: the proportion of correct predictions.
For $\hat P_i = \hat P(y_i = 1 \mid x_i)$, e.g. $\Phi(x_i'\hat\beta)$ (Probit), let
$$\tilde y_i = \mathbf{1}\{\hat P_i > 0.5\}.$$
Define the proportion of correct predictions as
$$P = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{y_i = \tilde y_i\}.$$
Many statistical software packages report tables of predicted and observed binary values:
              predicted 0   predicted 1
observed 0    n_00          n_01
observed 1    n_10          n_11
This measure should be avoided. It is not informative if one of the two outcomes is rare in the random sample.
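
The weakness of this measure is easy to reproduce. In the hypothetical sample below only 5 of 100 observations have $y = 1$, and the "model" is deliberately uninformative: it assigns the sample share 0.05 to everyone. The function name `proportion_correct` is illustrative.

```python
def proportion_correct(y, p_hat, cutoff=0.5):
    """Share of observations with y_i equal to 1{p_hat_i > cutoff}."""
    y_tilde = [1 if p > cutoff else 0 for p in p_hat]
    return sum(1 for yi, yt in zip(y, y_tilde) if yi == yt) / len(y)

# Hypothetical unbalanced sample: only 5 of 100 observations have y = 1.
y = [1] * 5 + [0] * 95
# An uninformative model that assigns the sample share 0.05 to everyone.
p_hat = [0.05] * 100

share = proportion_correct(y, p_hat)
print(share)
```

The uninformative model predicts $\tilde y_i = 0$ for every observation and is still "correct" in 95 percent of the cases, although it never identifies a single $y_i = 1$.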

34 Statistical Inference Testing the Overall Significance of the Regression
Let $L_{UR}$ represent the likelihood of the full model.
Let $L_R$ represent the likelihood of a restricted model.
Let $r$ represent the number of restrictions imposed.
Then:
$$-2 \ln(L_R / L_{UR}) = 2(\ln L_{UR} - \ln L_R) \sim \chi^2_r.$$
For example:
$H_0$: $\beta_2 = \beta_3 = \dots = \beta_k = 0$
$H_A$: at least one $\beta_j \neq 0$, $j = 2, \dots, k$.
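
A likelihood ratio test only needs the two maximized log-likelihoods. The sketch below uses hypothetical values and $r = 2$ restrictions; this choice keeps the example within the standard library, because for two degrees of freedom the $\chi^2$ upper-tail probability has the closed form $\exp(-x/2)$ and the 5 percent critical value is 5.991. For other values of $r$, a table or a statistics library would be needed.

```python
import math

def lr_statistic(lnL_ur, lnL_r):
    """LR = 2 (ln L_UR - ln L_R)."""
    return 2.0 * (lnL_ur - lnL_r)

# Hypothetical log-likelihoods; r = 2 restrictions are tested.
lnL_ur, lnL_r, r = -55.0, -68.0, 2

lr = lr_statistic(lnL_ur, lnL_r)
critical_5pct = 5.991              # 95% quantile of the chi-square(2) distribution
p_value = math.exp(-lr / 2.0)      # upper-tail probability, valid for r = 2 only
reject_h0 = lr > critical_5pct
print(lr, p_value, reject_h0)
```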

35 Transitions in Binary Choice models
Let $y_i^* = x_i'\beta + u_i$. Imagine a change from $x_i$ to $x_i^R \neq x_i$.
This clearly alters the latent variable from $y_i^*$ to
$$y_i^{R*} = (x_i^R)'\beta + u_i.$$
A measure used often is the Odds Ratio
$$R(x_i) = P(y_i = 1 \mid x_i) / P(y_i = 0 \mid x_i).$$
But how does this exogenous shock impact on the probability $P_{i(j \to k)}$ of transition from any state $j$ to $k$ ($j, k = 0, 1$)?
Hence, we need probabilities such as
$$P_{i(0 \to 1)} = P(y_i^{R*} > 0 \mid y_i^* < 0).$$
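
Because the same disturbance $u_i$ appears before and after the change, the transition probability can be worked out directly. The event $y_i^* < 0$ means $u_i < -x_i'\beta$, and $y_i^{R*} > 0$ means $u_i > -(x_i^R)'\beta$. For a symmetric distribution with CDF $F$ this gives $P_{i(0 \to 1)} = [F((x_i^R)'\beta) - F(x_i'\beta)] / [1 - F(x_i'\beta)]$ if the index increases, and 0 otherwise. The sketch below is one way to compute it; the function name and the index values are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transition_0_to_1(index_old, index_new, cdf):
    """P(y^R* > 0 | y* < 0) for an unchanged u with a symmetric distribution."""
    if index_new <= index_old:
        return 0.0                 # a lower index cannot move anyone from 0 to 1
    p_old = cdf(index_old)         # P(y = 1) before the change
    p_new = cdf(index_new)         # P(y = 1) after the change
    return (p_new - p_old) / (1.0 - p_old)

# Hypothetical Probit indices before and after the change in x.
p_01 = transition_0_to_1(-1.0, -0.5, norm_cdf)
p_01_reverse = transition_0_to_1(-0.5, -1.0, norm_cdf)
print(p_01, p_01_reverse)
```

The numerator is the mass of individuals whose disturbance lies between the two thresholds, and the denominator is the share that was in state 0 before the change.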