
1 Continuous Outcomes
Objectives
o Review the linear regression model (LRM)
o Discuss the idea of identification
o Present the method of maximum likelihood
Continuous LHS \ 1

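2 The Linear Regression Model
$$y_i = \mathbf{x}_i \boldsymbol{\beta} + \varepsilon_i$$
If $\mathbf{x}_i = (1 \;\; x_{i1} \;\; x_{i2} \;\; x_{i3})$,
$$y_i = \begin{bmatrix} 1 & x_{i1} & x_{i2} & x_{i3} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \varepsilon_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$$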

3 Graphically
[Figure]

4 Assumptions
o Linearity
o Linear independence of the x's
o Errors:
  o Zero conditional mean
  o Homoscedastic
  o Uncorrelated for any pair of x's
  o [Normality]

5 Linearity
y is linearly related to the x's through the β's.

6 Collinearity
The $x_k$'s are not perfectly collinear.

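7 Zero conditional mean
$$E(\varepsilon_i \mid \mathbf{x}_i) = 0$$
This identifying assumption implies:
$$E(y_i \mid \mathbf{x}_i) = E(\mathbf{x}_i \boldsymbol{\beta} + \varepsilon_i \mid \mathbf{x}_i) = \mathbf{x}_i \boldsymbol{\beta} + E(\varepsilon_i \mid \mathbf{x}_i) = \mathbf{x}_i \boldsymbol{\beta}$$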

8 Homoscedastic errors
$$\mathrm{Var}(\varepsilon_i \mid \mathbf{x}_i) = \sigma^2 \quad \text{for all } i$$

9 Uncorrelated errors
For two observations i and j, the covariance between $\varepsilon_i$ and $\varepsilon_j$ is 0.
What common situations violate this assumption?

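10 Estimation by OLS
The OLS estimator of $\boldsymbol{\beta}$ is the value $\hat{\boldsymbol{\beta}}$ that minimizes the sum of the squared residuals, $\sum_{i=1}^{N} (y_i - \mathbf{x}_i \boldsymbol{\beta})^2$:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
When the assumptions of the model hold, $\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimator.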
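As a rough illustration of the OLS criterion, the sketch below specializes the matrix formula to one regressor plus a constant, where the solution has a familiar closed form. The data values and the function names `ols_simple` and `ssr` are hypothetical, chosen only for this example.

```python
# Minimal OLS sketch for a constant and one regressor (hypothetical data).

def ols_simple(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared residuals for a candidate (intercept, slope)."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical regressor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical outcomes

b0, b1 = ols_simple(x, y)
best = ssr(x, y, b0, b1)

# Any other candidate line should give a larger sum of squared residuals.
worse = ssr(x, y, b0 + 0.1, b1 - 0.05)
print(b0, b1, best, worse)
```

Moving the intercept or slope away from the OLS values increases the sum of squared residuals, which is the sense in which the estimator "minimizes" it.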

11 Estimation by Maximum Likelihood
Instead of minimizing the sum of squared errors, we maximize the likelihood.
The ML estimate is that value of the parameter that makes the observed (sample) data most likely.

12 A simple example
Let:
o s be the number of men in the sample
o N be the sample size
o π be the population probability of being male
We know s and N. We want an estimate of π:
$$L(\pi \mid s, N)$$

13 The Likelihood Function
Binomial formula:
$$\Pr(s \mid \pi, N) = \frac{N!}{s!\,(N-s)!}\, \pi^{s} (1-\pi)^{N-s}$$
Note that $k! = k\,(k-1) \cdots 2 \cdot 1$.
Rewrite as the likelihood function for s = 3 and N = 10:
$$L(\pi \mid s=3, N=10) = \frac{10!}{3!\,7!}\, \pi^{3} (1-\pi)^{7}$$

14 Probability of s given fixed N and π (from Binomial formula)
[Figure: $p(s \mid \pi = .3, N = 10)$]

15 Probability of π given fixed N and s (from Likelihood formula)
[Figure]

16 Maximize the likelihood
The maximum occurs when the derivative (or gradient) is zero:
$$\frac{\partial L(\pi \mid s=3, N=10)}{\partial \pi} = 0$$
The value that maximizes the likelihood function also maximizes the log of the likelihood (which is easier to calculate):
$$\frac{\partial \ln L(\pi \mid s=3, N=10)}{\partial \pi} = 0$$

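17 For our example,
$$\begin{aligned}
\frac{\partial \ln L(\pi \mid s=3, N=10)}{\partial \pi}
&= \frac{\partial \ln\left[ \frac{10!}{3!\,7!}\, \pi^{3} (1-\pi)^{7} \right]}{\partial \pi} \\
&= \frac{\partial \ln \frac{10!}{3!\,7!}}{\partial \pi} + \frac{\partial\, 3 \ln \pi}{\partial \pi} + \frac{\partial\, 7 \ln(1-\pi)}{\partial \pi} \\
&= \frac{\partial\, 3 \ln \pi}{\partial \pi} + \frac{\partial\, 7 \ln(1-\pi)}{\partial \pi} \\
&= \frac{3}{\pi} - \frac{7}{1-\pi} = 0
\end{aligned}$$
$\hat{\pi} = .3$ maximizes the likelihood.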
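The same maximum can be checked numerically. The sketch below evaluates the binomial likelihood for s = 3 and N = 10 on a grid of candidate values of π and picks the largest; the grid spacing and the function names are assumptions made for this illustration, and a grid search stands in for the calculus only as a check.

```python
import math

def binom_likelihood(pi, s=3, n=10):
    """L(pi | s, N) from the binomial formula."""
    return math.comb(n, s) * pi ** s * (1 - pi) ** (n - s)


def score(pi, s=3, n=10):
    """Derivative of ln L with respect to pi: s/pi - (N - s)/(1 - pi)."""
    return s / pi - (n - s) / (1 - pi)


# Candidate values .01, .02, ..., .99 (an arbitrary grid for illustration).
grid = [i / 100 for i in range(1, 100)]
pi_hat = max(grid, key=binom_likelihood)

print(pi_hat)          # grid value with the largest likelihood
print(score(pi_hat))   # derivative of the log-likelihood at that value
```

The derivative of the log-likelihood is positive below the maximizing value and negative above it, which matches the first-order condition in the derivation.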

18 Question {for you}
Does anything about this approach seem strange to you?
[HINT: What if our data represented coin flips, with s = heads?]

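19 ML estimation of the Sample Mean
PDF for y:
$$f(y_i \mid \mu, \sigma = 1) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(y_i - \mu)^2}{2} \right)$$
Rewrite in terms of µ:
$$L(\mu \mid y_i, \sigma = 1) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(y_i - \mu)^2}{2} \right)$$
For three independent observations, the likelihood is
$$L(\mu \mid \mathbf{y}, \sigma = 1) = \prod_{i=1}^{3} L(\mu \mid y_i, \sigma = 1), \qquad \ln L(\mu \mid \mathbf{y}, \sigma = 1) = \sum_{i=1}^{3} \ln L(\mu \mid y_i, \sigma = 1)$$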
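To see where this likelihood peaks, the sketch below sums the log of the normal density (with σ fixed at 1) over three observations and searches a grid of candidate values of µ. The three data points and the grid are hypothetical; the point is only that the maximizing value coincides with the sample mean.

```python
import math

def log_lik(mu, ys):
    """ln L(mu | y, sigma = 1) for independent normal observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in ys)


ys = [1.0, 2.0, 4.5]                       # three hypothetical observations
grid = [i / 100 for i in range(0, 501)]    # candidate mu values 0.00 .. 5.00

mu_hat = max(grid, key=lambda mu: log_lik(mu, ys))
sample_mean = sum(ys) / len(ys)

print(mu_hat, sample_mean)
```

Because the log-likelihood is a downward-opening quadratic in µ, the grid search and the sample mean agree whenever the mean lies on the grid.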

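21 ML estimation for the LRM
The pdf:
$$f(y_i \mid \alpha + \beta x_i, \sigma) = \frac{1}{\sigma}\, \phi\!\left( \frac{y_i - [\alpha + \beta x_i]}{\sigma} \right)$$
Rewrite as the likelihood equation:
$$L(\alpha, \beta, \sigma \mid \mathbf{y}, \mathbf{x}) = \prod_{i=1}^{N} \frac{1}{\sigma}\, \phi\!\left( \frac{y_i - [\alpha + \beta x_i]}{\sigma} \right)$$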
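A small numerical check can make this likelihood concrete. The sketch below codes the log of the likelihood above and evaluates it at the closed-form values that maximize it for a one-regressor model with normal errors: the least-squares intercept and slope, and σ̂ equal to the square root of the mean squared residual. The data and all function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def lrm_log_lik(alpha, beta, sigma, x, y):
    """Sum over i of ln[(1/sigma) * phi((y_i - [alpha + beta * x_i]) / sigma)]."""
    return sum(
        math.log(normal_pdf((yi - (alpha + beta * xi)) / sigma) / sigma)
        for xi, yi in zip(x, y)
    )


x = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical regressor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical outcomes
n = len(x)

# Least-squares intercept and slope (these also maximize the normal likelihood).
x_bar, y_bar = sum(x) / n, sum(y) / n
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar

# ML estimate of sigma divides the sum of squared residuals by N (not N - 2).
resid_ss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(resid_ss / n)

ll_max = lrm_log_lik(alpha_hat, beta_hat, sigma_hat, x, y)
print(alpha_hat, beta_hat, sigma_hat, ll_max)
```

Perturbing any one of the three parameters lowers the log-likelihood, which is a quick way to confirm that the closed-form values sit at the maximum.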

23 The Properties of ML Estimators
Under very general conditions, the ML estimator is:
o Consistent
o Asymptotically efficient
o Asymptotically normally distributed
These are asymptotic properties; they describe the ML estimator as the sample size approaches infinity.
But how big must N be to be approximately infinite?
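Consistency can be illustrated with a small simulation. The sketch below reuses the binomial setting from the earlier example, where the ML estimate is s/N, and draws samples of increasing size from a population with π = .3. The seed, the sample sizes, and the function name are arbitrary choices for illustration; a single simulated sequence is suggestive, not a proof.

```python
import random

def ml_estimate_pi(n, true_pi, rng):
    """Draw n Bernoulli(true_pi) observations and return the ML estimate s / n."""
    s = sum(1 for _ in range(n) if rng.random() < true_pi)
    return s / n


rng = random.Random(12345)   # fixed seed so the run is reproducible
true_pi = 0.3

estimates = {n: ml_estimate_pi(n, true_pi, rng) for n in (10, 100, 1000, 100000)}
for n, est in estimates.items():
    print(n, est, abs(est - true_pi))
```

With small N the estimate can sit far from .3; as N grows the estimates tend to cluster around the true value.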

24 Guidelines
o It is risky to use ML for N < 100; N > 500 seems safe.
o These values should be raised depending on characteristics of the model and the data.
o Some models seem to require more observations, for example, the ordinal regression model.

25 Identification
Occurs before estimation

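26 Demonstration of Identification
In the LRM, the structural model is:
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \varepsilon \quad \text{where } E(\varepsilon \mid \mathbf{x}) = 0$$
If we assume instead:
$$E(\varepsilon \mid \mathbf{x}) = \delta$$
the structural equation can be modified to create an error with mean zero:
$$\begin{aligned}
y &= 0 + \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \varepsilon \\
&= (\delta - \delta) + \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \varepsilon \\
&= (\beta_0 + \delta) + \beta_1 x_1 + \cdots + \beta_K x_K + (\varepsilon - \delta) \\
&= \beta_0^{*} + \beta_1 x_1 + \cdots + \beta_K x_K + \varepsilon^{*}
\end{aligned}$$
where $\beta_0^{*} = \beta_0 + \delta$ and $\varepsilon^{*} = \varepsilon - \delta$.

27
This equation has all of the properties of the LRM, including
$$E(\varepsilon^{*} \mid \mathbf{x}) = 0$$
But note:
$$\beta_0^{*} = \beta_0 + \delta$$
No matter how large the sample, it is impossible to disentangle estimates of $\beta_0$ and $\delta$.
$\beta_0$ and $\delta$ are not identified individually, although their sum $\beta_0 + \delta$ is identified.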
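The argument can be made concrete with two parameter settings that share the same sum of intercept and error mean. In the sketch below, the regressor values, the mean-zero error component `e`, and the helper names are hypothetical; `e` is chosen to sum to zero and to be orthogonal to `x` so that the fitted values come out clean.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from least squares with one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def make_y(beta0, delta, beta1, x, e):
    """Structural model whose error has mean delta: error_i = e_i + delta."""
    return [beta0 + beta1 * xi + (ei + delta) for xi, ei in zip(x, e)]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
e = [0.1, -0.2, 0.1, 0.0, 0.0]   # sums to zero and is orthogonal to x

# Two different (intercept, error mean) pairs with the same sum, 1.5.
y_a = make_y(beta0=1.0, delta=0.5, beta1=2.0, x=x, e=e)
y_b = make_y(beta0=1.5, delta=0.0, beta1=2.0, x=x, e=e)

fit_a = ols_simple(x, y_a)
fit_b = ols_simple(x, y_b)
print(fit_a, fit_b)
```

Both settings generate the same data, so no estimator can tell them apart; the fitted intercept recovers only the sum of the intercept and the error mean, while the slope is unaffected.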

28 Graphically
[Figure: $E(\varepsilon \mid \mathbf{x}) = \delta$]

29 Basic Ideas about Identification
o A parameter is unidentified when it is impossible to estimate the parameter regardless of the data available.
o Models become identified by adding assumptions, not by increasing the sample size. For example, $E(\varepsilon) = 0$.
o It is possible for some parameters of a model to be identified while others are not. For example, $\beta$ but not $\alpha$.
o While individual parameters may not be identified, combinations of those parameters may be identified. For example, $\alpha + \delta$ but not $\delta$ and $\alpha$.

30 Interpreting Regression Coefficients
o Slopes as marginal change (partial derivative)
o Slopes as discrete change (first difference)
o Relationship between discrete and marginal change

31 Partial or Marginal Change
The partial derivative of y with respect to $x_k$:
$$\frac{\partial E(y \mid \mathbf{x})}{\partial x_k} = \frac{\partial\, \mathbf{x}\boldsymbol{\beta}}{\partial x_k} = \beta_k$$

32 Discrete Change Notation
Before: $E(y \mid \mathbf{x}, x_2)$ is the expected value of y given x, explicitly noting a specific value of $x_2$.
After: $E(y \mid \mathbf{x}, x_2 + 1)$ is the expected value of y given x when $x_2$ increases by 1.

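33
$$\begin{aligned}
\frac{\Delta E(y \mid \mathbf{x}, x_2)}{\Delta x_2}
&= \text{After} - \text{Before} \\
&= E(y \mid \mathbf{x}, x_2 + 1) - E(y \mid \mathbf{x}, x_2) \\
&= [\beta_0 + \beta_1 x_1 + \beta_2 (x_2 + 1) + \beta_3 x_3] - [\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3] \\
&= [\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_2 + \beta_3 x_3] - [\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3] \\
&= \beta_2
\end{aligned}$$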

34 Equality of Discrete and Partial Change
In the LRM,
$$\frac{\partial E(y \mid \mathbf{x})}{\partial x_k} = \frac{\Delta E(y \mid \mathbf{x}, x_k)}{\Delta x_k} = \beta_k$$

35 Simple Interpretation
o For a unit increase in $x_k$, the expected change in y equals $\beta_k$, holding all other variables constant.
o Having characteristic $x_k$ (as opposed to not having the characteristic) results in an expected change of $\beta_k$ in y, holding all other variables constant.

36 Data
Career data on biochemists who obtained their Ph.D.s in 1957, 1958, 1962, and 1963 (n=408)

37 Descriptive Information
Name  Description
JOB   Prestige of job (from 1 to 5).
FEM   1 if female; 0 if male.
PHD   Prestige of Ph.D. department.
MENT  Citations received by mentor.
FEL   1 if held fellowship; else 0.
ART   Number of articles published.
CIT   Number of citations received.
[Mean, Std Dev, Min, and Max columns: values not preserved]

38
. use regjob3,clear
(Long's data on academic jobs of biochemists \ )
. codebook job fem phd ment fel art cit, compact
Variable  Label
job   Prestige of 1st job on 1 to 5 scale
fem   Gender: 1=female 0=male
phd   PhD prestige on 1 to 5 scale
ment  Citations received by mentor
fel   Fellow: 1=yes 0=no
art   # of articles published
cit   # of citations received
job   Prestige of 1st job on 100 to 500 scale
phd   PhD prestige on 100 to 500 scale
[Obs, Unique, Mean, Min, and Max columns: values not preserved]

39
. summarize job fem phd ment fel art cit
Variable | Obs  Mean  Std. Dev.  Min  Max
[Rows: job, fem, phd, ment, fel, art, cit; values not preserved]

40 Stata: Estimating the LRM
Our LRM is:
$$\text{JOB} = \beta_0 + \beta_1 \text{FEM} + \beta_2 \text{PHD} + \beta_3 \text{MENT} + \beta_4 \text{FEL} + \beta_5 \text{ART} + \beta_6 \text{CIT} + \varepsilon$$
In Stata:
<command> <y> <x x x x>, <options>
. regress job fem phd ment fel art cit
[Output values not preserved. Header statistics: Number of obs, F(6, 401), Prob > F, R-squared, Adj R-squared, Root MSE. ANOVA rows: Model, Residual, Total (columns SS, df, MS). Coefficient table columns: Coef., Std. Err., t, P>|t|, [95% Conf. Interval]; rows: fem, phd, ment, fel, art, cit, _cons.]

41 Simple Interpretation
[Coefficient table for job: columns Coef., Std. Err., t, P>|t|, [95% Conf. Interval]; rows fem, phd, ment, fel, art, cit, _cons; values not preserved]
o For every additional citation, the prestige of the first job is expected to increase by .004 units, holding all other variables constant.
o The expected prestige of the first job is .14 points lower for females than for their male counterparts, holding all other variables constant.

42 Comparison of Linear & Nonlinear Models
[Figure]

43
In nonlinear models, partial and discrete change are not equal:
$$\frac{\partial E(y \mid \mathbf{x})}{\partial x_k} \neq \frac{\Delta E(y \mid \mathbf{x})}{\Delta x_k}$$
In nonlinear models, both discrete and partial change depend on:
o the value of $x_k$, and
o the values of the other x's in the model
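The contrast between linear and nonlinear models can be checked numerically. The sketch below compares a unit discrete change with a numerical derivative for a linear mean function and for a logistic mean function; the logistic form, the coefficient values, and the function names are assumptions chosen only to stand in for "a nonlinear model".

```python
import math

def linear_mean(x, b0=1.0, b1=0.5):
    """E(y | x) in a linear model with hypothetical coefficients."""
    return b0 + b1 * x


def logistic_mean(x, b0=-1.0, b1=0.5):
    """A hypothetical nonlinear mean function: logistic in b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def discrete_change(f, x):
    """Change in f when x increases by one unit."""
    return f(x + 1) - f(x)


def marginal_change(f, x, h=1e-6):
    """Central-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x0 in (0.0, 4.0):
    print("linear  ", x0, discrete_change(linear_mean, x0), marginal_change(linear_mean, x0))
    print("logistic", x0, discrete_change(logistic_mean, x0), marginal_change(logistic_mean, x0))
```

In the linear case both measures equal the slope at every starting value; in the logistic case they differ from each other and change with the starting value of x.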

44 Standardized and Semi-Standardized Coefficients
o y-standardized
o x-standardized
o fully standardized

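45 y-standardized coefficients
Standardizing y to a unit variance:
$$\frac{y}{\sigma_y} = \frac{\beta_0}{\sigma_y} + \frac{\beta_1}{\sigma_y} x_1 + \frac{\beta_2}{\sigma_y} x_2 + \frac{\beta_3}{\sigma_y} x_3 + \frac{\varepsilon}{\sigma_y}$$
Adding new notation:
$$y^{S} = \beta_0^{S_y} + \beta_1^{S_y} x_1 + \beta_2^{S_y} x_2 + \beta_3^{S_y} x_3 + \varepsilon^{S_y}$$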

46 Interpretation
For a continuous variable:
  o For a unit increase in $x_k$, y is expected to change by $\beta_k^{S_y}$ standard deviations, holding all other variables constant.
For a dummy variable:
  o Having characteristic $x_k$ (as opposed to not having the characteristic) results in an expected change in y of $\beta_k^{S_y}$ standard deviations, holding all other variables constant.

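47 x-standardized coefficients
Standardizing the x's to a unit variance:
$$y = \beta_0 + (\sigma_1 \beta_1) \frac{x_1}{\sigma_1} + (\sigma_2 \beta_2) \frac{x_2}{\sigma_2} + (\sigma_3 \beta_3) \frac{x_3}{\sigma_3} + \varepsilon$$
Adding new notation:
$$y = \beta_0 + \beta_1^{S_x} x_1^{S} + \beta_2^{S_x} x_2^{S} + \beta_3^{S_x} x_3^{S} + \varepsilon$$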

48 Interpretation
For a continuous variable:
  o For a standard deviation increase in $x_k$, y is expected to change by $\beta_k^{S_x}$ units, holding all other variables constant.
For a dummy variable:
  o The meaning of a standard deviation change is unclear.

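49 Fully standardized coefficients
Standardizing both y and the x's:
$$\frac{y}{\sigma_y} = \frac{\beta_0}{\sigma_y} + \left( \frac{\sigma_1 \beta_1}{\sigma_y} \right) \frac{x_1}{\sigma_1} + \left( \frac{\sigma_2 \beta_2}{\sigma_y} \right) \frac{x_2}{\sigma_2} + \left( \frac{\sigma_3 \beta_3}{\sigma_y} \right) \frac{x_3}{\sigma_3} + \frac{\varepsilon}{\sigma_y}$$
Adding new notation:
$$y^{S} = \beta_0^{S} + \beta_1^{S} x_1^{S} + \beta_2^{S} x_2^{S} + \beta_3^{S} x_3^{S} + \varepsilon^{S_y}$$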

50 Interpretation
For a continuous variable:
  o For a standard deviation increase in $x_k$, y is expected to change by $\beta_k^{S}$ standard deviations, holding all other variables constant.
For a dummy variable:
  o The meaning of a standard deviation change is unclear.
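The three kinds of standardized coefficient follow directly from the formulas on the preceding slides: multiply the raw slope by the standard deviation of x, divide it by the standard deviation of y, or do both. The sketch below uses made-up numbers, and the function name is hypothetical; the dictionary keys borrow the column labels that appear in the listcoef output elsewhere in these slides.

```python
def standardized_coefs(b, sd_x, sd_y):
    """Return x-, y-, and fully standardized versions of a raw slope b."""
    return {
        "bstdx": b * sd_x,           # change in y (raw units) per SD increase in x
        "bstdy": b / sd_y,           # change in y (in SDs) per unit increase in x
        "bstdxy": b * sd_x / sd_y,   # change in y (in SDs) per SD increase in x
    }


# Hypothetical values, not taken from the biochemist data.
result = standardized_coefs(b=0.5, sd_x=2.0, sd_y=4.0)
print(result)
```

With these made-up inputs, a one standard deviation increase in x goes with a change of 1.0 raw units, or 0.25 standard deviations, in y.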

51 Stata: Using listcoef to compute standardized effects
After running regress, the listcoef command is used:
. listcoef, cons help
regress (N=408): Unstandardized and Standardized Estimates
Observed SD:
SD of Error:
job | b  t  P>|t|  bstdx  bstdy  bstdxy  SDofX
[Rows: fem, phd, ment, fel, art, cit, _cons; values not preserved]
b = raw coefficient
t = t-score for test of b=0
P>|t| = p-value for t-test
bstdx = x-standardized coefficient
bstdy = y-standardized coefficient
bstdxy = fully standardized coefficient
SDofX = standard deviation of X

52 Interpreting standardized coefficients
job | b  t  P>|t|  bstdx  bstdy  bstdxy  SDofX
[Rows: fem, phd, ment, fel, art, cit, _cons; values not preserved]
Your turn

53 End LRM