# Problem arises when only a non-random sample is available; differs from censored regression model in that $x_i$ is also unobserved

## Transcription

### 4 Data Issues

#### 4.1 Truncated Regression

Population model:

$$y_i = x_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

- Given a random sample $\{y_i, x_i\}_{i=1}^{N}$, OLS is consistent and efficient.
- The problem arises when only a non-random sample is available; specifically, $\{y_i, x_i\}$ is observed iff $y_i \le b_i$.
- This differs from the censored regression model in that $x_i$ is also unobserved.

Examples:

- Only individuals with income below the poverty line are surveyed.
- Only firms with fewer than 100 employees are surveyed.

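MLE: the likelihood must account for truncation. The log-likelihood function is

$$\ln L(\theta) = \sum_i \ln\left[\Pr(y_i \mid x_i, \theta, b_i, y_i \le b_i)\right]$$

Again, what is $\Pr(y_i \mid x_i, \theta, b_i, y_i \le b_i)$?

- $\Pr(y_i \mid y_i \le b_i) = f(y_i)/F(b_i)$, where $f(\cdot)$ is the PDF of $y$ and $F(\cdot)$ is the CDF of $y$.
- Division by $F(b_i)$ rescales the probabilities so that they sum to one.

This implies that the log-likelihood function is

$$\ln L(\theta) = \sum_i \ln\left[\frac{(1/\sigma)\,\phi(\varepsilon_i/\sigma)}{\Phi\big((b_i - x_i\beta)/\sigma\big)}\right], \qquad \varepsilon_i = y_i - x_i\beta$$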
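The truncated log-likelihood can be sketched in a few lines. This is a minimal illustration, not the routine behind any particular package: it assumes a single regressor plus an intercept, and the names `truncated_loglik`, `norm_pdf`, and `norm_cdf` are hypothetical. The rescaling term is evaluated at the standardized truncation point $(b_i - x_i\beta)/\sigma$.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def truncated_loglik(y, x, b, alpha, beta, sigma):
    """Log-likelihood when y_i = alpha + beta*x_i + eps_i is observed only if y_i <= b_i.

    The scalar intercept/slope parameterization is an assumption made for
    illustration; the notes write the mean as x_i * beta.
    """
    total = 0.0
    for y_i, x_i, b_i in zip(y, x, b):
        mean_i = alpha + beta * x_i
        # Untruncated normal log-density of the observation.
        log_density = math.log(norm_pdf((y_i - mean_i) / sigma) / sigma)
        # Rescaling term: log-probability of falling below the truncation point.
        log_rescale = math.log(norm_cdf((b_i - mean_i) / sigma))
        total += log_density - log_rescale
    return total
```

When every truncation point is far above the data, the rescaling term vanishes and the function reduces to the ordinary normal log-likelihood. In practice the function would be passed to a numerical optimizer.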

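Truncation from above and below. The population model is

$$y_i = x_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

where $\{y_i, x_i\}$ is observed iff $a_i \le y_i \le b_i$. The log-likelihood function is

$$\ln L(\theta) = \sum_i \ln\left[\Pr(y_i \mid x_i, \theta, a_i, b_i, a_i \le y_i \le b_i)\right]$$

Again, what is $\Pr(y_i \mid x_i, \theta, a_i, b_i, a_i \le y_i \le b_i)$? Rescaling by the probability of falling inside the observed range gives

$$\ln L(\theta) = \sum_i \ln\left[\frac{(1/\sigma)\,\phi(\varepsilon_i/\sigma)}{\Phi\big((b_i - x_i\beta)/\sigma\big) - \Phi\big((a_i - x_i\beta)/\sigma\big)}\right]$$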

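Marginal effects.

Truncated from above only:

$$\frac{\partial E[y_i \mid y_i \le b_i]}{\partial x_k} = \beta_k\left(1 - \lambda_i^2 - \alpha_i\lambda_i\right), \qquad \alpha_i = \frac{b_i - x_i\beta}{\sigma}, \qquad \lambda_i = \frac{\phi(\alpha_i)}{\Phi(\alpha_i)}$$

Truncated from above and below:

$$\frac{\partial E[y_i \mid a_i \le y_i \le b_i]}{\partial x_k} = \beta_k\left\{1 - \lambda_i^2 - \alpha_{i2}\lambda_i - \frac{[b_i - a_i]\,\phi(\alpha_{i1})}{\sigma\left[\Phi(\alpha_{i2}) - \Phi(\alpha_{i1})\right]}\right\}$$

where

$$\alpha_{i1} = \frac{a_i - x_i\beta}{\sigma}, \qquad \alpha_{i2} = \frac{b_i - x_i\beta}{\sigma}, \qquad \lambda_i = \frac{\phi(\alpha_{i2}) - \phi(\alpha_{i1})}{\Phi(\alpha_{i2}) - \Phi(\alpha_{i1})}$$

STATA: -truncreg-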
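The scale factor for truncation from above can be checked numerically. The sketch below assumes the standard truncated-normal mean $E[y \mid y \le b] = x\beta - \sigma\,\phi(\alpha)/\Phi(\alpha)$ and compares its finite-difference derivative with respect to $x\beta$ against $1 - \lambda^2 - \alpha\lambda$; the function names are hypothetical.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def truncated_mean_above(xb, b, sigma):
    """E[y | y <= b] when y ~ N(xb, sigma^2)."""
    a = (b - xb) / sigma
    return xb - sigma * norm_pdf(a) / norm_cdf(a)


def marginal_effect_factor(xb, b, sigma):
    """Factor that multiplies beta_k: 1 - lambda^2 - alpha * lambda."""
    a = (b - xb) / sigma
    lam = norm_pdf(a) / norm_cdf(a)
    return 1.0 - lam * lam - a * lam


def finite_difference(xb, b, sigma, step=1e-5):
    """Central-difference derivative of the truncated mean with respect to xb."""
    upper = truncated_mean_above(xb + step, b, sigma)
    lower = truncated_mean_above(xb - step, b, sigma)
    return (upper - lower) / (2.0 * step)
```

Because the factor lies strictly between 0 and 1, the marginal effect in the truncated population is smaller in magnitude than $\beta_k$.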

#### 4.2 Sample Selection (Incidental Truncation)

Population model:

$$y_i = x_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

- Given a random sample $\{y_i, x_i\}_{i=1}^{N}$, OLS is consistent and efficient.
- The problem arises when data on $y$ are available only for a non-random sample.
- Let $S_i = 1$ if $y_i$ is observed and $S_i = 0$ if $y_i$ is unobserved.
- This differs from the truncated regression model in that $x_i$ is observed regardless of $S_i$.
- This differs from the censored regression model in that there is no clear censoring rule; i.e., $S_i = 0$ implies that nothing is known about $y_i$, whereas in censored regression we know on which side of the censoring point $c_i$ the outcome $y_i$ lies.

This implies the following data structure:

- We have data on a random sample $\{y_i, x_i, S_i\}_{i=1}^{N}$, but $y_i = .$ (missing) if $S_i = 0$.
- Only $M \equiv \sum_i S_i$ observations can be used to estimate any model.

Examples:

- Wages are only observed for workers.
- Firm profits are only observed for firms that remain in business.
- SAT scores are only observed for test takers.
- House prices are only observed for houses on the market.

Issue: is OLS still unbiased and consistent? Answer: it depends.

Exogenous sample selection. If $S_i$ depends only on

- (i) exogenous observables, $x_i$, or
- (ii) unobservables, $u_i$, where $u_i \perp \varepsilon_i$,

then OLS is unbiased and consistent, where estimation uses only the sub-sample of $M$ observations.

Example: if

$$w_i = \alpha + \beta\,\text{educ}_i + \gamma_1\,\text{age}_i + \gamma_2\,\text{age}_i^2 + \varepsilon_i$$

and $\Pr(w_i \text{ observed}) = f(\text{educ}, \text{age})$, then OLS using only workers is consistent.

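Endogenous sample selection: model the outcome and selection equations simultaneously.

$$y_i = x_i\beta + \varepsilon_i$$

$$S_i^* = z_i\gamma + u_i$$

$$S_i = \begin{cases} 1 & \text{if } S_i^* > 0 \\ 0 & \text{if } S_i^* \le 0 \end{cases}$$

$$y_i = . \ \text{ if } S_i = 0$$

$$(\varepsilon_i, u_i) \sim N_2(0, 0, \sigma^2, 1, \rho)$$

- $x$ and $z$ are exogenous.
- $z = [x \;\; w]$, where $w$ contains the exclusion restriction(s).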

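Problem:

- $E[y \mid z] = x\beta$, but $E[y \mid z, S = 1] = x\beta + \rho\sigma\,\phi(z\gamma)/\Phi(z\gamma)$, where $\phi(z\gamma)/\Phi(z\gamma)$ is known as the Inverse Mills Ratio (IMR).
- This implies that $E[y \mid z, S = 1] = x\beta$ iff $\rho = 0$.
- OLS estimation of $y_i = x_i\beta + \varepsilon_i$ using only the $M$ selected observations omits the IMR term. In the selected sample the error then has conditional mean $\rho\sigma\,\phi(z_i\gamma)/\Phi(z_i\gamma)$, so it is not mean zero and is not independent of $x$.

Solution: estimate the IMR and include it as a regressor.

- (i) Using $i = 1, \dots, N$, estimate a probit model where $S$ is the dependent variable and $z$ are the covariates $\Rightarrow \hat\gamma$.
- (ii) Obtain $\widehat{IMR}_i = \phi(z_i\hat\gamma)/\Phi(z_i\hat\gamma)$.
- (iii) Regress $y_i$ on $x_i$ and $\widehat{IMR}_i$ via OLS (using $i = 1, \dots, M$).

Test of endogenous selection (a test on the coefficient of $\widehat{IMR}_i$):

$$H_o: \rho = 0 \qquad H_a: \rho \ne 0$$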
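The selection term can be illustrated by simulation. The sketch below is an assumption-laden toy, not the Heckman estimator itself: it fixes the selection index $z\gamma$ at a constant, draws $(\varepsilon, u)$ with $\operatorname{Var}(\varepsilon) = \sigma^2$, $\operatorname{Var}(u) = 1$, and correlation $\rho$, and compares the mean of $\varepsilon$ among selected observations with $\rho\sigma\,\phi(z\gamma)/\Phi(z\gamma)$. All function names are hypothetical.

```python
import math
import random


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def inverse_mills(index):
    """Inverse Mills Ratio evaluated at the selection index z*gamma."""
    return norm_pdf(index) / norm_cdf(index)


def selected_error_mean(rho, sigma, index, n=100_000, seed=0):
    """Simulated mean of eps among observations with S = 1 (index + u > 0)."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # eps has variance sigma^2 and correlation rho with u.
        eps = sigma * (rho * u + math.sqrt(1.0 - rho * rho) * e)
        if index + u > 0:
            total += eps
            count += 1
    return total / count
```

With $\rho = 0$ the selected-sample mean of $\varepsilon$ is close to zero, which is the exogenous-selection case in which OLS on the $M$ observations remains consistent.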

Notes:

- The usual OLS standard errors are incorrect since the IMR is predicted; one must account for the additional uncertainty due to estimation of $\gamma$.
- An exclusion restriction (a variable in $z$ that is not in $x$) is needed, because otherwise the model is identified only from the nonlinearity of the IMR, which arises solely from the assumption of joint normality.
- STATA: -heckman-, -heckman2-

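#### 4.3 $\operatorname{Cov}(x, \varepsilon) \ne 0$

OLS requires $\operatorname{Cov}(x_i, \varepsilon_i) = 0$; otherwise,

$$E[\hat\beta_{ols}] = \beta + \frac{\operatorname{Cov}(x, \varepsilon)}{\operatorname{Var}(x)} \ne \beta$$

This situation can arise for a number of reasons:

- omitted variable bias (unobserved heterogeneity)
- reverse causation
- measurement error

Terminology:

- $x$ is exogenous if it is uncorrelated with $\varepsilon$.
- $x$ is endogenous if it is correlated with $\varepsilon$.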

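##### 4.3.1 Omitted Variable Bias

A relevant regressor is excluded from the regression model and is correlated with $x$.

Example:

$$y_i = \alpha + \beta x_i + \gamma w_i + \varepsilon_i \qquad \text{(True Model)}$$

$$y_i = \alpha + \beta x_i + \tilde\varepsilon_i \qquad \text{(Estimated Model)}$$

where $\tilde\varepsilon_i = \gamma w_i + \varepsilon_i$. OLS on the estimated model yields

$$E[\hat\beta_{ols}] = \beta + \frac{\operatorname{Cov}(x, \tilde\varepsilon)}{\operatorname{Var}(x)} = \beta + \frac{\operatorname{Cov}(x, \gamma w) + \operatorname{Cov}(x, \varepsilon)}{\operatorname{Var}(x)} = \beta + \frac{\gamma\operatorname{Cov}(x, w) + \operatorname{Cov}(x, \varepsilon)}{\operatorname{Var}(x)}$$

If $\operatorname{Cov}(x, \varepsilon) = 0$ (i.e., the only source of correlation between $x$ and $\tilde\varepsilon$ is $w$), then

$$E[\hat\beta_{ols}] = \beta + \gamma\,\frac{\operatorname{Cov}(x, w)}{\operatorname{Var}(x)} \gtrless \beta$$

depending on $\operatorname{sgn}(\gamma)$ and the direction of the correlation between $x$ and $w$.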
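A short simulation can make the bias formula concrete. The data-generating values below (a slope of 0.5 linking $x$ to $w$, unit-variance shocks, an intercept of 1) are assumptions chosen for illustration, and the function names are hypothetical. The sketch compares the slope from the regression that omits $w$ with $\beta + \gamma \operatorname{Cov}(x, w)/\operatorname{Var}(x)$.

```python
import random


def sample_cov(a, b):
    """Sample covariance with the N - 1 divisor."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(y, x):
    """Slope of a simple regression of y on x."""
    return sample_cov(x, y) / sample_cov(x, x)


def simulate_ovb(beta=1.0, gamma=2.0, n=50_000, seed=0):
    """Return (slope omitting w, slope predicted by the bias formula)."""
    rng = random.Random(seed)
    x, w, y = [], [], []
    for _ in range(n):
        w_i = rng.gauss(0.0, 1.0)
        x_i = 0.5 * w_i + rng.gauss(0.0, 1.0)  # x is correlated with w
        y_i = 1.0 + beta * x_i + gamma * w_i + rng.gauss(0.0, 1.0)
        x.append(x_i)
        w.append(w_i)
        y.append(y_i)
    short_slope = ols_slope(y, x)
    predicted = beta + gamma * sample_cov(x, w) / sample_cov(x, x)
    return short_slope, predicted
```

Under these assumed values $\operatorname{Cov}(x, w) = 0.5$ and $\operatorname{Var}(x) = 1.25$, so the short regression slope converges to $1 + 2 \times 0.4 = 1.8$ rather than to $\beta = 1$.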

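Notes:

- $w$ may represent an observed variable that is excluded by mistake, or an unobserved variable that the analyst does not have data on.
- In the multiple regression model, bias spills over across variables.

Example:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \gamma w_i + \varepsilon_i \qquad \text{(True Model)}$$

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \tilde\varepsilon_i \qquad \text{(Estimated Model)}$$

where $\tilde\varepsilon_i = \gamma w_i + \varepsilon_i$. If $\operatorname{Cov}(x_1, \tilde\varepsilon) = 0$ but $\operatorname{Cov}(x_2, \tilde\varepsilon) \ne 0$, then not only is $\hat\beta_2$ biased, but $\hat\beta_1$ is also biased iff $\operatorname{Cov}(x_1, x_2) \ne 0$.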

##### 4.3.2 Reverse Causation

Not only does $x$ have an effect on $y$, but $y$ also has an effect on $x$ (i.e., the two variables are jointly determined).

Example: wages of working women and the number of children. More children may reduce a woman's productivity at work, or increase her desire for a more flexible job (sacrificing pay), thus reducing her wage; a low-wage woman may opt for more children because the opportunity cost of her time is lower.

Model:

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

$$x_i = \theta + \delta y_i + \mu_i$$

where the parameters represent the structural parameters. Substitution for $y$ in the second equation reveals

$$x_i = \theta + \delta\alpha + \delta\beta x_i + \delta\varepsilon_i + \mu_i = \frac{1}{1 - \delta\beta}\left(\theta + \delta\alpha + \delta\varepsilon_i + \mu_i\right)$$

which implies that $\operatorname{Cov}(x, \varepsilon) \ne 0$. Intuitively, an unobserved shock to $y$ (i.e., $\varepsilon$) must be correlated with $x$ since changes in $y$ lead to changes in $x$.

##### 4.3.3 Measurement Error

Problem: data are measured imprecisely.

Examples:

- recall error
- coding errors
- mis-information (e.g., overstated income, understated drug use)
- rounding errors (e.g., labor supply = 40 hrs/wk, or rounded to the nearest 5; income rounded to \$1000s)

Two cases: (i) error in the dependent variable, or (ii) error(s) in the independent variable(s).

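Dependent variable. The true model is

$$y_i^* = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)$$

where $*$ on a variable indicates that it is correctly measured.

- Given a random sample $\{y_i^*, x_i\}_{i=1}^{N}$, OLS is consistent and efficient.
- With measurement error, one does not observe $y_i^*$; instead one observes $y_i$, where

$$\underbrace{y_i}_{\text{observed}} = \underbrace{y_i^*}_{\text{true}} + \underbrace{\mu_i}_{\text{measurement error}}, \qquad \mu_i \sim N(0, \sigma_\mu^2)$$

- Reliability ratio:

$$RR = \frac{\operatorname{Var}(y^*)}{\operatorname{Var}(y)} \in [0, 1]$$

- Substitution implies that the estimated model is

$$y_i = \alpha + \beta x_i + (\mu_i + \varepsilon_i) = \alpha + \beta x_i + \tilde\varepsilon_i$$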

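Properties of OLS estimates:

- $\hat\beta_{OLS}$ is unbiased and consistent iff $\operatorname{Cov}(x, \tilde\varepsilon) = 0$, which is the case if the measurement error is unrelated to $x$:

$$\operatorname{Cov}(x, \tilde\varepsilon) = \underbrace{\operatorname{Cov}(x, \varepsilon)}_{0 \text{ by assumption}} + \underbrace{\operatorname{Cov}(x, \mu)}_{0 \text{ if ME} \,\perp\, x}$$

- $\hat\alpha_{OLS}$ is unbiased and consistent iff $\hat\beta_{OLS}$ is unbiased and consistent (since $\hat\alpha_{OLS} = \bar{y} - \hat\beta_{OLS}\bar{x}$) and $E[\tilde\varepsilon] = 0$, which is the case if

$$E[\tilde\varepsilon] = \underbrace{E[\varepsilon]}_{0 \text{ by assumption}} + \underbrace{E[\mu]}_{0 \text{ if classical ME}}$$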

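- OLS standard errors are correct if $\mu_i \sim N$, since this implies $\tilde\varepsilon_i \sim N$; this holds even if $\operatorname{Cov}(\mu, \varepsilon) \ne 0$.
- What is $\sigma_{\tilde\varepsilon}^2$?

$$\operatorname{Var}(\tilde\varepsilon) = \operatorname{Var}(\mu + \varepsilon) = \operatorname{Var}(\mu) + \operatorname{Var}(\varepsilon) + 2\operatorname{Cov}(\mu, \varepsilon) = \sigma_\mu^2 + \sigma_\varepsilon^2 + 2\rho\sigma_\mu\sigma_\varepsilon$$

which is greater than $\operatorname{Var}(\varepsilon)$ if $\rho = 0$ (and, more generally, whenever $\rho \ge 0$).

- If $\operatorname{Var}(\tilde\varepsilon) > \operatorname{Var}(\varepsilon)$, then standard errors are larger.

Summary: Classical Errors-in-Variables (CEV) model.

Assumptions:

- (i) $\mu_i \sim N(0, \sigma_\mu^2)$
- (ii) $\operatorname{Cov}(\mu, \varepsilon) = 0$
- (iii) $\operatorname{Cov}(x, \mu) = 0$

Implications:

- (i) OLS is unbiased and consistent.
- (ii) Standard errors are correct.
- (iii) $R^2$ falls and standard errors rise due to the extra noise in the data.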

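Independent variable. The true model is

$$y_i = \alpha + \beta x_i^* + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)$$

where $*$ on a variable indicates that it is correctly measured.

- Given a random sample $\{y_i, x_i^*\}_{i=1}^{N}$, OLS is consistent and efficient.
- With measurement error, one does not observe $x_i^*$; instead one observes $x_i$, where

$$\underbrace{x_i}_{\text{observed}} = \underbrace{x_i^*}_{\text{true}} + \underbrace{\mu_i}_{\text{measurement error}}, \qquad \mu_i \sim N(0, \sigma_\mu^2)$$

- Reliability ratio:

$$RR = \frac{\operatorname{Var}(x^*)}{\operatorname{Var}(x)} \in [0, 1]$$

- Substitution implies that the estimated model is

$$y_i = \alpha + \beta x_i + (\varepsilon_i - \beta\mu_i) = \alpha + \beta x_i + \tilde\varepsilon_i$$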

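Properties of OLS estimates:

- $\hat\beta_{OLS}$ is unbiased and consistent iff $\operatorname{Cov}(x, \tilde\varepsilon) = 0$, which is not likely:

$$\operatorname{Cov}(x, \tilde\varepsilon) = \operatorname{Cov}(x, \varepsilon) - \operatorname{Cov}(x, \beta\mu) = \underbrace{\operatorname{Cov}(x^*, \varepsilon)}_{0 \text{ by assumption}} + \underbrace{\operatorname{Cov}(\mu, \varepsilon)}_{?} - \underbrace{\beta\operatorname{Cov}(x, \mu)}_{\ne 0}$$

- It follows that $\hat\beta_{OLS}$ is unbiased and consistent if (i) $\beta = 0$ and $\operatorname{Cov}(\mu, \varepsilon) = 0$, or (ii) $\operatorname{Cov}(\mu, \varepsilon) = \beta\operatorname{Cov}(x, \mu)$.
- $\hat\alpha_{OLS}$ is unbiased and consistent iff $\hat\beta_{OLS}$ is unbiased and consistent (since $\hat\alpha_{OLS} = \bar{y} - \hat\beta_{OLS}\bar{x}$) and $E[\tilde\varepsilon] = 0$, which is the case if

$$E[\tilde\varepsilon] = \underbrace{E[\varepsilon]}_{0 \text{ by assumption}} - \beta\underbrace{E[\mu]}_{0 \text{ if classical ME}}$$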

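Summary: Classical Errors-in-Variables (CEV) model.

Assumptions:

- (i) $\mu_i \sim N(0, \sigma_\mu^2)$
- (ii) $\operatorname{Cov}(\mu, \varepsilon) = 0$
- (iii) $\operatorname{Cov}(x^*, \mu) = 0$

Implications:

- (i) OLS is biased and inconsistent.
- (ii) $\hat\beta_{OLS}$ is attenuated toward zero (i.e., biased toward zero: biased down in absolute value, with the correct sign):

$$\begin{aligned}
\operatorname{plim}(\hat\beta_{OLS}) &= \beta + \frac{\operatorname{Cov}(x, \tilde\varepsilon)}{\operatorname{Var}(x)} = \beta + \frac{\operatorname{Cov}(x, \varepsilon - \beta\mu)}{\operatorname{Var}(x)} = \beta + \frac{\operatorname{Cov}(x, \varepsilon) - \beta\operatorname{Cov}(x, \mu)}{\operatorname{Var}(x)} \\
&= \beta + \underbrace{\frac{\operatorname{Cov}(x, \varepsilon)}{\operatorname{Var}(x)}}_{=0} - \beta\Bigg[\underbrace{\frac{\operatorname{Cov}(x^*, \mu)}{\operatorname{Var}(x)}}_{=0} + \underbrace{\frac{\operatorname{Cov}(\mu, \mu)}{\operatorname{Var}(x)}}_{=\sigma_\mu^2/\sigma_x^2}\Bigg] \\
&= \beta\left[1 - \frac{\sigma_\mu^2}{\sigma_x^2}\right] = \beta\left[\frac{\sigma_x^2 - \sigma_\mu^2}{\sigma_x^2}\right] = \beta\underbrace{\left[\frac{\sigma_{x^*}^2}{\sigma_x^2}\right]}_{\in [0,1]} = \beta\underbrace{RR}_{\in [0,1]}
\end{aligned}$$

which is smaller than $\beta$ in absolute value, but of the same sign as $\beta$.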
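The attenuation result can be checked by simulation. This is a minimal sketch under the CEV assumptions, with assumed parameter values and hypothetical function names: the regressor is observed with independent normal noise, and the OLS slope is compared with $\beta \cdot RR$.

```python
import random


def sample_cov(a, b):
    """Sample covariance with the N - 1 divisor."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(y, x):
    """Slope of a simple regression of y on x."""
    return sample_cov(x, y) / sample_cov(x, x)


def simulate_attenuation(beta=2.0, sd_true=1.0, sd_error=1.0, n=50_000, seed=0):
    """Return (OLS slope on the mismeasured regressor, beta * reliability ratio)."""
    rng = random.Random(seed)
    x_observed, y = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, sd_true)
        mu = rng.gauss(0.0, sd_error)  # classical measurement error
        y.append(1.0 + beta * x_true + rng.gauss(0.0, 1.0))
        x_observed.append(x_true + mu)
    reliability = sd_true ** 2 / (sd_true ** 2 + sd_error ** 2)
    return ols_slope(y, x_observed), beta * reliability
```

With equal signal and noise variances the reliability ratio is 0.5, so the estimated slope is roughly half of the true $\beta$.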

- (iii) In multiple regression,

$$y_i = \alpha + \beta x_i + \sum_{k=1}^{K}\gamma_k x_{ki} + \tilde\varepsilon_i$$

where $x$ is a mismeasured version of $x^*$ and $x_k$, $k = 1, \dots, K$, are correctly measured, $\hat\beta_{OLS}$ suffers from attenuation bias, and the $\hat\gamma_k$ are also biased in a complex way unless $x_k$ is uncorrelated with $x$.

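##### 4.3.4 The Solution: Instrumental Variables

Goal: devise an alternative estimation technique to obtain consistent estimates when $x$ is endogenous.

Solution: identify $\beta$ from exogenous variation in $x$.

- Suppose $x$ can be decomposed into two independent parts, $x = x^{\text{endog}} + x^{\text{exog}}$, where

$$\operatorname{Cov}(x, \varepsilon) = \operatorname{Cov}(x^{\text{endog}}, \varepsilon) + \operatorname{Cov}(x^{\text{exog}}, \varepsilon)$$

and $\operatorname{Cov}(x^{\text{endog}}, \varepsilon) \ne 0$, but $\operatorname{Cov}(x^{\text{exog}}, \varepsilon) = 0$.

- The idea is to use the variation in $x$ due to $x^{\text{exog}}$ to identify $\beta$, and to ignore the variation in $x$ from $x^{\text{endog}}$, since the impact of this variation on $y$ confounds the effects of $x$ and $\varepsilon$.
- To use only the variation arising from $x^{\text{exog}}$, additional information is needed.
- This new information comes from adding data on a new variable, $z$, called an instrument, instrumental variable (IV), or exclusion restriction.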

$z$ is an IV for $x$ iff

- (i) $\operatorname{Cov}(x, z) \ne 0$
- (ii) $\operatorname{Cov}(\varepsilon, z) = 0$
- (iii) $E[y \mid x, z] = E[y \mid x]$ (i.e., $z$ has no direct effect on $y$; $z$ is excluded from the model for $y$)

(i) and (ii) imply that $z$ is correlated with $x$ only through the exogenous part of $x$.

Estimation techniques:

- IV
- Two-Stage Least Squares (TSLS or 2SLS)
- MLE

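IV estimator. The model

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

implies

$$\operatorname{Cov}(y, z) = \operatorname{Cov}(\alpha, z) + \operatorname{Cov}(\beta x, z) + \operatorname{Cov}(\varepsilon, z) = \beta\operatorname{Cov}(x, z)$$

Estimator:

$$\hat\beta_{IV} = \frac{\widehat{\operatorname{Cov}}(y, z)}{\widehat{\operatorname{Cov}}(x, z)}$$

which is consistent (though, as a ratio of sample moments, generally not unbiased in finite samples).

Formula:

$$\hat\beta_{IV} = \frac{\frac{1}{N-1}\sum_i (y_i - \bar{y})(z_i - \bar{z})}{\frac{1}{N-1}\sum_i (x_i - \bar{x})(z_i - \bar{z})}$$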
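The covariance-ratio formula translates directly into code. The sketch below is illustrative only: the data-generating process (an instrument `z`, a shock `v` shared by `x` and the error term) and all function names are assumptions, chosen so that OLS is inconsistent while the IV ratio recovers $\beta$.

```python
import random


def sample_cov(a, b):
    """Sample covariance with the N - 1 divisor."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(y, x):
    """OLS slope: Cov(x, y) / Var(x)."""
    return sample_cov(x, y) / sample_cov(x, x)


def iv_slope(y, x, z):
    """IV slope: Cov(y, z) / Cov(x, z)."""
    return sample_cov(y, z) / sample_cov(x, z)


def simulate_iv(beta=1.0, n=50_000, seed=0):
    """Return (OLS slope, IV slope) for an endogenous regressor."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0.0, 1.0)   # instrument, independent of the error
        v_i = rng.gauss(0.0, 1.0)   # shock shared by x and the error
        x_i = 0.8 * z_i + v_i
        eps_i = 0.7 * v_i + rng.gauss(0.0, 1.0)
        x.append(x_i)
        z.append(z_i)
        y.append(1.0 + beta * x_i + eps_i)
    return ols_slope(y, x), iv_slope(y, x, z)
```

Under these assumed values the OLS slope converges to about $1 + 0.7/1.64 \approx 1.43$, while the IV slope stays close to the true value of 1.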

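Properties of $\hat\beta_{IV}$:

- $\hat\beta_{IV}$ is consistent. Because $\sum_i (z_i - \bar{z}) = 0$ and $z$ is uncorrelated with $\varepsilon$,

$$\begin{aligned}
\operatorname{plim}\hat\beta_{IV} &= \operatorname{plim}\frac{\frac{1}{N-1}\sum_i y_i(z_i - \bar{z})}{\frac{1}{N-1}\sum_i x_i(z_i - \bar{z})} = \operatorname{plim}\frac{\frac{1}{N-1}\sum_i (\alpha + \beta x_i + \varepsilon_i)(z_i - \bar{z})}{\frac{1}{N-1}\sum_i x_i(z_i - \bar{z})} \\
&= \operatorname{plim}\frac{\frac{1}{N-1}\sum_i \beta x_i(z_i - \bar{z})}{\frac{1}{N-1}\sum_i x_i(z_i - \bar{z})} = \beta
\end{aligned}$$

- $\hat\alpha_{IV}$ is consistent, since $\hat\alpha_{IV} = \bar{y} - \hat\beta_{IV}\bar{x}$.
- $\operatorname{Var}(\varepsilon) = \sigma^2$ is estimated by

$$\hat\sigma^2 = \frac{1}{N-2}\sum_i (y_i - \hat\alpha_{IV} - \hat\beta_{IV}x_i)^2$$

- Variance of $\hat\beta_{IV}$:

$$\operatorname{Var}(\hat\beta_{IV}) = \frac{\sigma^2}{N\operatorname{Var}(x)\,\rho_{x,z}^2}, \qquad \widehat{\operatorname{Var}}(\hat\beta_{IV}) = \frac{\hat\sigma^2}{\sum_i (x_i - \bar{x})^2 R_{x,z}^2}$$

where $R_{x,z}^2$ is the sample counterpart of $\rho_{x,z}^2$ (they coincide in simple OLS). The variance is decreasing in $\operatorname{Var}(x)$ and in $|\rho_{x,z}|$.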

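Notes:

- $\operatorname{Var}(\hat\beta_{IV}) > \operatorname{Var}(\hat\beta_{OLS})$ if $\rho_{x,z}^2 < 1$; recall that $\operatorname{Var}(\hat\beta_{OLS}) = \sigma^2/\sum_i (x_i - \bar{x})^2$.
- It is inefficient to use IV if $x$ is exogenous.
- IV is algebraically equivalent to OLS using $x$ as an instrument for itself:

$$\hat\beta_{IV} = \frac{\frac{1}{N-1}\sum_i (y_i - \bar{y})(z_i - \bar{z})}{\frac{1}{N-1}\sum_i (x_i - \bar{x})(z_i - \bar{z})} = \frac{\frac{1}{N-1}\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\frac{1}{N-1}\sum_i (x_i - \bar{x})^2} = \hat\beta_{OLS}$$

and

$$\hat\alpha_{IV} = \bar{y} - \hat\beta_{IV}\bar{x} = \bar{y} - \hat\beta_{OLS}\bar{x} = \hat\alpha_{OLS}$$

and

$$\operatorname{Var}(\hat\beta_{IV}) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2 R_{x,z}^2} = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2 R_{x,x}^2} = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} = \operatorname{Var}(\hat\beta_{OLS})$$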

Multiple regression with only one endogenous variable:

- Exogenous $x$'s serve as instruments for themselves.
- The solution is simple using matrix algebra.

Multiple regression with more than one endogenous variable:

- A unique instrument is needed for each endogenous variable.
- Exogenous $x$'s serve as instruments for themselves.
- The solution is simple using matrix algebra.

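TSLS: estimation proceeds in two steps.

First stage:

$$x_i = \delta + \pi z_i + \mu_i$$

- Estimable via OLS $\Rightarrow \hat{x}_i$.
- $\operatorname{Cov}(x, \varepsilon) \ne 0 \Rightarrow \operatorname{Cov}(\mu, \varepsilon) \ne 0$.
- $\hat{x}_i$ varies across $i$ due to variation in $z_i$ (not $\mu_i$, since $\hat{x}_i$ does not depend on $\mu_i$).

Second stage:

$$y_i = \alpha + \beta\hat{x}_i + \tilde\varepsilon_i$$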
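The two stages can be sketched with two simple regressions. The function names below are hypothetical, and the sketch covers only the case of one endogenous regressor and one instrument; in that exactly identified case the second-stage slope coincides algebraically with the ratio $\operatorname{Cov}(y, z)/\operatorname{Cov}(x, z)$.

```python
def sample_cov(a, b):
    """Sample covariance with the N - 1 divisor."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_fit(y, x):
    """Return (intercept, slope) from a simple regression of y on x."""
    slope = sample_cov(x, y) / sample_cov(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope


def two_stage_least_squares(y, x, z):
    """Return (alpha, beta) from TSLS with one endogenous x and one instrument z."""
    # First stage: regress x on z and form fitted values.
    delta_hat, pi_hat = ols_fit(x, z)
    x_hat = [delta_hat + pi_hat * z_i for z_i in z]
    # Second stage: regress y on the fitted values.
    return ols_fit(y, x_hat)
```

The point estimates from this manual procedure are correct, but the standard errors reported by the second-stage regression are not, because they ignore that $\hat{x}_i$ is itself estimated.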

Notes:

- $\hat\beta_{TSLS}$ is consistent.
- Standard errors need to be adjusted since $\hat{x}_i$ is a predicted regressor.
- If there are multiple endogenous variables, a unique IV is needed for each endogenous $x$.
- If the second stage contains other exogenous variables, these variables must be included in the first stage.
- A test of $\pi \ne 0$ is a test for $\operatorname{Cov}(x, z) \ne 0$.
- Endogeneity can be tested using a Hausman test comparing $\hat\beta_{TSLS}$ with $\hat\beta_{OLS}$.
- If there is more than one IV for an endogenous variable, then the model is overidentified (as opposed to exactly identified):
  - a test of non-zero covariance between the set of IVs and $x$ is given by a test that the coefficients on all IVs are jointly equal to zero;
  - overidentification enables other tests for instrument validity;
  - GMM estimation is more efficient if $\varepsilon$ is heteroskedastic.

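MLE: estimate the first and second stages simultaneously, but with the second stage replaced by the reduced form (i.e., $y$ is expressed solely as a function of the exogenous variables in the model).

Model:

$$x_i = \delta + \pi z_i + \mu_i$$

$$\begin{aligned}
y_i &= \alpha + \beta x_i + \varepsilon_i && \text{(structural eqn)} \\
&= (\alpha + \beta\delta) + \beta\pi z_i + (\varepsilon_i + \beta\mu_i) \\
&= (\alpha + \beta\delta) + \beta\pi z_i + \tilde\varepsilon_i && \text{(reduced form)}
\end{aligned}$$

where $(\varepsilon, \mu) \sim N_2(0, \Sigma)$ (bivariate normal distribution) and

$$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & \rho\sigma_\varepsilon\sigma_\mu \\ \rho\sigma_\varepsilon\sigma_\mu & \sigma_\mu^2 \end{pmatrix}$$

is a 2x2 symmetric, positive definite matrix.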

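The joint distribution of the reduced form errors is $(\tilde\varepsilon, \mu) \sim N_2(0, \tilde\Sigma)$, where

$$\tilde\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 + \beta^2\sigma_\mu^2 + 2\beta\rho\sigma_\varepsilon\sigma_\mu & \rho\sigma_\varepsilon\sigma_\mu + \beta\sigma_\mu^2 \\ \rho\sigma_\varepsilon\sigma_\mu + \beta\sigma_\mu^2 & \sigma_\mu^2 \end{pmatrix}$$

Derive $\ln L(\theta)$, where $\theta = \{\delta, \pi, \alpha, \beta, \sigma_\varepsilon, \sigma_\mu, \rho\}$:

$$\ln L(\theta) = \sum_i \ln\left[\Pr(y_i, x_i \mid z_i, \theta)\right] = \sum_i \ln\left[\Pr(\tilde\varepsilon_i, \mu_i \mid z_i, \theta)\right] = \sum_i \ln\left[|J|\,\phi_2\left(\frac{\tilde\varepsilon_i}{\sigma_{\tilde\varepsilon}}, \frac{\mu_i}{\sigma_\mu}, \tilde\Sigma\right)\right]$$

where $|J|$ is the determinant of the Jacobian and $\phi_2$ is the bivariate standard normal PDF. Estimates are obtained as

$$\hat\theta = \arg\max_\theta \ln L(\theta)$$

- A test of $H_o: \pi = 0$ is a test for $\operatorname{Cov}(x, z) \ne 0$.
- A test of endogeneity is given by $H_o: \rho = 0$.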

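Specification tests.

Testing endogeneity:

- may be relevant for economic reasons;
- is relevant since OLS is more efficient if $x$ is exogenous.

Hausman test:

- If $x$ is exogenous, then $\hat\beta_{IV} \approx \hat\beta_{OLS}$.
- If $x$ is endogenous, then $\hat\beta_{IV} \ne \hat\beta_{OLS}$.
- Define a test statistic based on the difference $\hat\beta_{IV} - \hat\beta_{OLS}$:

$$H = \left(\hat\beta_{IV} - \hat\beta_{OLS}\right)'\left(\hat\Sigma_{IV} - \hat\Sigma_{OLS}\right)^{-1}\left(\hat\beta_{IV} - \hat\beta_{OLS}\right) \sim \chi_K^2$$

where $K$ = # of $x$'s.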

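Durbin-Wu-Hausman test. Model:

$$x_i = \delta + \pi z_i + \mu_i$$

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

$x$ is endogenous iff $\operatorname{Cov}(\mu, \varepsilon) \ne 0$.

Steps:

- (i) Estimate the first stage via OLS and obtain the residuals $\hat\mu_i$.
- (ii) Estimate $y_i = \alpha + \beta x_i + \delta\hat\mu_i + \varepsilon_i$ via OLS (here $\delta$ denotes the coefficient on the residual, not the first-stage intercept).
- (iii) Test $H_o: \delta = 0$; rejection implies that $x$ is endogenous.

If there are multiple endogenous variables, conduct the joint test $H_o: \delta_1 = \dots = \delta_K = 0$ ($K$ = # of endogenous variables).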

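Testing overidentifying restrictions: if # IVs > # endogenous variables, one can test whether $\operatorname{Cov}(z, \varepsilon) = 0$.

Steps:

- (i) Regress $y$ on $x$ via TSLS $\Rightarrow \hat\alpha_{TSLS}, \hat\beta_{TSLS} \Rightarrow \hat\varepsilon_i$.
- (ii) Regress $\hat\varepsilon_i$ on the $z$'s (all IVs) $\Rightarrow R^2$.
- (iii) The test statistic is $NR^2 \sim \chi_q^2$, where $q$ is the # of overidentifying restrictions.

Intuition: if $\operatorname{Cov}(z, \varepsilon) = 0$, then the explanatory power of the second regression should be small (a low $R^2$).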

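Weak IV $\Rightarrow \operatorname{Cov}(x, z) \approx 0$. One can show that

$$\operatorname{plim}\hat\beta_{IV} = \beta + \frac{\rho_{z,\varepsilon}}{\rho_{z,x}}\,\frac{\sigma_\varepsilon}{\sigma_x}$$

- If $z$ is a valid IV, then $|\rho_{z,x}| > 0$ and $\rho_{z,\varepsilon} = 0$, so $\operatorname{plim}\hat\beta_{IV} = \beta$.
- But if $\rho_{z,x} \approx 0$ and/or $\rho_{z,\varepsilon} \ne 0$, then $\operatorname{plim}\hat\beta_{IV} \ne \beta$.

For OLS,

$$\operatorname{plim}\hat\beta_{OLS} = \beta + \rho_{x,\varepsilon}\,\frac{\sigma_\varepsilon}{\sigma_x}$$

and the asymptotic bias of OLS is smaller than that of IV iff

$$|\rho_{z,\varepsilon}| > |\rho_{x,\varepsilon}\,\rho_{z,x}|$$

which becomes more likely as $\rho_{z,x} \to 0$.

STATA: -ivreg2-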
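The two asymptotic bias expressions are easy to compare numerically. The helper names and the correlation values in the usage note are hypothetical, chosen only to show how a slightly invalid instrument becomes worse than OLS as it weakens.

```python
def iv_asymptotic_bias(rho_z_eps, rho_z_x, sd_eps, sd_x):
    """plim(beta_IV) - beta = (rho_{z,eps} / rho_{z,x}) * (sigma_eps / sigma_x)."""
    return (rho_z_eps / rho_z_x) * (sd_eps / sd_x)


def ols_asymptotic_bias(rho_x_eps, sd_eps, sd_x):
    """plim(beta_OLS) - beta = rho_{x,eps} * (sigma_eps / sigma_x)."""
    return rho_x_eps * (sd_eps / sd_x)


def ols_beats_iv(rho_z_eps, rho_z_x, rho_x_eps):
    """True when the asymptotic bias of OLS is smaller in magnitude than that of IV."""
    return abs(rho_z_eps) > abs(rho_x_eps * rho_z_x)
```

For example, with $\rho_{z,\varepsilon} = 0.05$ and $\rho_{x,\varepsilon} = 0.3$, an instrument with $\rho_{z,x} = 0.5$ gives an IV bias of 0.1 (in units of $\sigma_\varepsilon/\sigma_x$) against 0.3 for OLS, whereas a weak instrument with $\rho_{z,x} = 0.1$ gives an IV bias of 0.5, which exceeds the OLS bias.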

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Instrumental Variables Regression. Instrumental Variables (IV) estimation is used when the model has endogenous s.

Instrumental Variables Regression Instrumental Variables (IV) estimation is used when the model has endogenous s. IV can thus be used to address the following important threats to internal validity: Omitted

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Econometrics II. Lecture 9: Sample Selection Bias

Econometrics II Lecture 9: Sample Selection Bias Måns Söderbom 5 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom, www.soderbom.net.

### IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT

### HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

### Analysis of Microdata

Rainer Winkelmann Stefan Boes Analysis of Microdata With 38 Figures and 41 Tables 4y Springer Contents 1 Introduction 1 1.1 What Are Microdata? 1 1.2 Types of Microdata 4 1.2.1 Qualitative Data 4 1.2.2

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

### An Introduction to Time Series Regression

An Introduction to Time Series Regression Henry Thompson Auburn University An economic model suggests examining the effect of exogenous x t on endogenous y t with an exogenous control variable z t. In

### On Marginal Effects in Semiparametric Censored Regression Models

On Marginal Effects in Semiparametric Censored Regression Models Bo E. Honoré September 3, 2008 Introduction It is often argued that estimation of semiparametric censored regression models such as the

### Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010

Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### 1 The Problem: Endogeneity There are two kinds of variables in our models: exogenous variables and endogenous variables. Endogenous Variables: These a

Notes on Simultaneous Equations and Two Stage Least Squares Estimates Copyright - Jonathan Nagler; April 19, 1999 1. Basic Description of 2SLS ffl The endogeneity problem, and the bias of OLS. ffl The

### Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Structural Econometric Modeling in Industrial Organization Handout 1

Structural Econometric Modeling in Industrial Organization Handout 1 Professor Matthijs Wildenbeest 16 May 2011 1 Reading Peter C. Reiss and Frank A. Wolak A. Structural Econometric Modeling: Rationales

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### Sales forecasting # 1

Sales forecasting # 1 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

### Accounting for Time-Varying Unobserved Ability Heterogeneity within Education Production Functions

Accounting for Time-Varying Unobserved Ability Heterogeneity within Education Production Functions Weili Ding Queen s University Steven F. Lehrer Queen s University and NBER July 2008 Abstract Traditional

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Panel Data: Linear Models

Panel Data: Linear Models Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Panel Data: Linear Models 1 / 45 Introduction Outline What

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Lecture 3: Differences-in-Differences

Lecture 3: Differences-in-Differences Fabian Waldinger Waldinger () 1 / 55 Topics Covered in Lecture 1 Review of fixed effects regression models. 2 Differences-in-Differences Basics: Card & Krueger (1994).

### A Subset-Continuous-Updating Transformation on GMM Estimators for Dynamic Panel Data Models

Article A Subset-Continuous-Updating Transformation on GMM Estimators for Dynamic Panel Data Models Richard A. Ashley 1, and Xiaojin Sun 2,, 1 Department of Economics, Virginia Tech, Blacksburg, VA 24060;

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### Factor Analysis. Factor Analysis

Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

### Measurement Error in Criminal Justice Data

Measurement Error in Criminal Justice Data John Pepper Department of Economics University of Virginia jvpepper@virginia.edu Carol Petrie Committee on Law and Justice National Research Council CPetrie@nas.edu

### Fraternity & Sorority Academic Report Fall 2015

Fraternity & Sorority Academic Report Organization Lambda Upsilon Lambda 1-1 1 Delta Chi 77 19 96 2 Alpha Delta Chi 30 1 31 3 Alpha Delta Pi 134 62 196 4 Alpha Sigma Phi 37 13 50 5 Sigma Alpha Epsilon

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Worked examples Multiple Random Variables

Worked eamples Multiple Random Variables Eample Let X and Y be random variables that take on values from the set,, } (a) Find a joint probability mass assignment for which X and Y are independent, and

### Regression with a Binary Dependent Variable

Regression with a Binary Dependent Variable Chapter 9 Michael Ash CPPA Lecture 22 Course Notes Endgame Take-home final Distributed Friday 19 May Due Tuesday 23 May (Paper or emailed PDF ok; no Word, Excel,

### Sales forecasting # 2

Sales forecasting # 2 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

### Lecture 2: Simple Linear Regression

DMBA: Statistics Lecture 2: Simple Linear Regression Least Squares, SLR properties, Inference, and Forecasting Carlos Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching

### Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

### Hypothesis Testing in Linear Regression Models

Chapter 4 Hypothesis Testing in Linear Regression Models 41 Introduction As we saw in Chapter 3, the vector of OLS parameter estimates ˆβ is a random vector Since it would be an astonishing coincidence

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### Time Series Analysis

Time Series Analysis Autoregressive, MA and ARMA processes Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 212 Alonso and García-Martos

### Online Appendices to the Corporate Propensity to Save

Online Appendices to the Corporate Propensity to Save Appendix A: Monte Carlo Experiments In order to allay skepticism of empirical results that have been produced by unusual estimators on fairly small

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Maximum likelihood estimation of a bivariate ordered probit model: implementation and Monte Carlo simulations

The Stata Journal (yyyy) vv, Number ii, pp. 1 18 Maximum likelihood estimation of a bivariate ordered probit model: implementation and Monte Carlo simulations Zurab Sajaia The World Bank Washington, DC

### Clustering in the Linear Model

Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

### Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### Basic Statistcs Formula Sheet

Basic Statistcs Formula Sheet Steven W. ydick May 5, 0 This document is only intended to review basic concepts/formulas from an introduction to statistics course. Only mean-based procedures are reviewed,

### Fraternity & Sorority Academic Report Spring 2016

Fraternity & Sorority Academic Report Organization Overall GPA Triangle 17-17 1 Delta Chi 88 12 100 2 Alpha Epsilon Pi 77 3 80 3 Alpha Delta Chi 28 4 32 4 Alpha Delta Pi 190-190 4 Phi Gamma Delta 85 3

### Using instrumental variables techniques in economics and finance

Using instrumental variables techniques in economics and finance Christopher F Baum 1 Boston College and DIW Berlin German Stata Users Group Meeting, Berlin, June 2008 1 Thanks to Mark Schaffer for a number

### Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

### Heteroskedasticity and Weighted Least Squares

Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS

SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS John Banasik, Jonathan Crook Credit Research Centre, University of Edinburgh Lyn Thomas University of Southampton ssm0 The Problem We wish to estimate an

### Modern Methods for Missing Data

Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### 2. What are the theoretical and practical consequences of autocorrelation?

Lecture 10 Serial Correlation In this lecture, you will learn the following: 1. What is the nature of autocorrelation? 2. What are the theoretical and practical consequences of autocorrelation? 3. Since

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in 𝒳 = {v_1, ..., v_m}, its probability mass function P : 𝒳 → [0, 1] is defined as: P(v_i) = Pr[X =

### UNIVERSITY OF WAIKATO. Hamilton New Zealand

UNIVERSITY OF WAIKATO Hamilton New Zealand Can We Trust Cluster-Corrected Standard Errors? An Application of Spatial Autocorrelation with Exact Locations Known John Gibson University of Waikato Bonggeun

### Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

### Chapter 3: The Multiple Linear Regression Model

Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference that is frequently used concerns tests of hypotheses.

### Lecture Note: Self-Selection The Roy Model. David H. Autor MIT 14.661 Spring 2003 November 14, 2003

Lecture Note: Self-Selection The Roy Model David H. Autor MIT 14.661 Spring 2003 November 14, 2003 1 1 Introduction A core topic in labor economics is self-selection. What this term means in theory is

### The Bivariate Normal Distribution

The Bivariate Normal Distribution This is Section 4.7 of the 1st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included

### Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d'Orléans Université d'Orléans April 2010 Introduction Definition We now consider

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

### Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings

### Bivariate Regression Analysis. The beginning of many types of regression

Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression

### PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

### The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

### 5. Linear Regression

5. Linear Regression Outline Simple linear regression Linear model

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### Topic 4: Multivariate random variables. Multiple random variables

Topic 4: Multivariate random variables Joint, marginal, and conditional pmf Joint, marginal, and conditional pdf and cdf Independence Expectation, covariance, correlation Conditional expectation Two jointly

### MT426 Notebook 3 Fall 2012 prepared by Professor Jenny Baglivo. 3 MT426 Notebook 3 3. 3.1 Definitions... 3. 3.2 Joint Discrete Distributions...

MT426 Notebook 3 Fall 2012 prepared by Professor Jenny Baglivo © Copyright 2004-2012 by Jenny A. Baglivo. All Rights Reserved. Contents 3 MT426 Notebook 3 3.1 Definitions

### Post-Secondary Education in Canada: Can Ability Bias Explain the Earnings Gap Between College and University Graduates?

DISCUSSION PAPER SERIES IZA DP No. 2784 Post-Secondary Education in Canada: Can Ability Bias Explain the Earnings Gap Between College and University Graduates? Vincenzo Caponi Miana Plesca May 2007 Forschungsinstitut

### CHAPTER 6. SIMULTANEOUS EQUATIONS

Economics 24B Daniel McFadden 1999 1. INTRODUCTION CHAPTER 6. SIMULTANEOUS EQUATIONS Economic systems are usually described in terms of the behavior of various economic agents, and the equilibrium that

### SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

### HOW EFFECTIVE IS TARGETED ADVERTISING?

HOW EFFECTIVE IS TARGETED ADVERTISING? Ayman Farahat and Michael Bailey Marketplace Architect Yahoo! July 28, 2011 Thanks Randall Lewis, Yahoo! Research Agenda An Introduction to Measuring Effectiveness

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class does the object belong

### Note 2 to Computer class: Standard mis-specification tests

Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the

### Sampling Theory for Discrete Data

Sampling Theory for Discrete Data * Economic survey data are often obtained from sampling protocols that involve stratification, censoring, or selection. Econometric estimators designed for random samples

### A General Approach to Variance Estimation under Imputation for Missing Survey Data

A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey

### Premium Copayments and the Trade-off between Wages and Employer-Provided Health Insurance. February 2011

PRELIMINARY DRAFT DO NOT CITE OR CIRCULATE COMMENTS WELCOMED Premium Copayments and the Trade-off between Wages and Employer-Provided Health Insurance February 2011 By Darren Lubotsky School of Labor &

### Performance Related Pay and Labor Productivity

DISCUSSION PAPER SERIES IZA DP No. 2211 Performance Related Pay and Labor Productivity Anne C. Gielen Marcel J.M. Kerkhofs Jan C. van Ours July 2006 Forschungsinstitut zur Zukunft der Arbeit Institute

### An Internal Model for Operational Risk Computation

An Internal Model for Operational Risk Computation Seminarios de Matemática Financiera Instituto MEFF-RiskLab, Madrid http://www.risklab-madrid.uam.es/ Nicolas Baud, Antoine Frachot & Thierry Roncalli

### Linear Models for Continuous Data

Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

### ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely to keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.

### Non-Inferiority Tests for Two Means using Differences

Chapter 450 Non-Inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

### The Real Business Cycle Model

The Real Business Cycle Model Ester Faia Goethe University Frankfurt Nov 2015 Ester Faia (Goethe University Frankfurt) RBC Nov 2015 1 / 27 Introduction The RBC model explains the co-movements in the fluctuations

### Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations