Chapter 3: The Multiple Linear Regression Model
Advanced Econometrics - HEC Lausanne
Christophe Hurlin (University of Orléans)
November 23, 2013

Section 1. Introduction
1. Introduction
The objectives of this chapter are the following:
1. Define the multiple linear regression model.
2. Introduce the ordinary least squares (OLS) estimator.
1. Introduction
The outline of this chapter is the following:
Section 2: The multiple linear regression model
Section 3: The ordinary least squares estimator
Section 4: Statistical properties of the OLS estimator
Subsection 4.1: Finite sample properties
Subsection 4.2: Asymptotic properties
1. Introduction
References
Amemiya T. (1985), Advanced Econometrics, Harvard University Press.
Greene W. (2007), Econometric Analysis, sixth edition, Pearson - Prentice Hall (recommended).
Pelgrin F. (2010), Lecture Notes, Advanced Econometrics, HEC Lausanne (a special thank).
Ruud P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press.
1. Introduction
Notations: in this chapter, I will (try to...) follow some conventions of notation.
$f_Y(y)$: probability density or mass function
$F_Y(y)$: cumulative distribution function
$\Pr(\cdot)$: probability
$y$: vector
$Y$: matrix
Be careful: in this chapter, I don't distinguish between a random vector (matrix) and a vector (matrix) of deterministic elements. For more appropriate notations, see: Abadir and Magnus (2002), Notation in econometrics: a proposal for a standard, Econometrics Journal.
Section 2. The Multiple Linear Regression Model
2. The Multiple Linear Regression Model
Objectives
1. Define the concept of multiple linear regression model.
2. Semi-parametric and parametric multiple linear regression models.
3. The multiple linear Gaussian model.
2. The Multiple Linear Regression Model
Definition (Multiple linear regression model)
The multiple linear regression model is used to study the relationship between a dependent variable and one or more independent variables. The generic form of the linear regression model is
$$y = x_1 \beta_1 + x_2 \beta_2 + \dots + x_K \beta_K + \varepsilon$$
where $y$ is the dependent or explained variable and $x_1, \dots, x_K$ are the independent or explanatory variables.
2. The Multiple Linear Regression Model
Notations
1. $y$ is the dependent variable, the regressand or the explained variable.
2. $x_j$ is an explanatory variable, a regressor or a covariate.
3. $\varepsilon$ is the error term or disturbance. IMPORTANT: do not use the term "residual".
2. The Multiple Linear Regression Model
Notations (cont'd)
The term $\varepsilon$ is a random disturbance, so named because it "disturbs" an otherwise stable relationship. The disturbance arises for several reasons:
1. Primarily because we cannot hope to capture every influence on an economic variable in a model, no matter how elaborate. The net effect, which can be positive or negative, of these omitted factors is captured in the disturbance.
2. There are many other contributors to the disturbance in an empirical model. Probably the most significant is errors of measurement. It is easy to theorize about the relationships among precisely defined variables; it is quite another matter to obtain accurate measures of these variables.
2. The Multiple Linear Regression Model
Notations (cont'd)
We assume that each observation in a sample $\{y_i, x_{i1}, x_{i2}, \dots, x_{iK}\}$ for $i = 1, \dots, N$ is generated by an underlying process described by
$$y_i = x_{i1} \beta_1 + x_{i2} \beta_2 + \dots + x_{iK} \beta_K + \varepsilon_i$$
Remark: $x_{ik}$ denotes the value of the $k$th explanatory variable for the $i$th unit of the sample, i.e. the convention is $x_{\text{unit},\text{variable}}$.
2. The Multiple Linear Regression Model
Notations (cont'd)
Let the $N \times 1$ column vector $x_k$ be the $N$ observations on variable $x_k$, for $k = 1, \dots, K$. Let us assemble these data in an $N \times K$ data matrix, $X$. Let $y$ be the $N \times 1$ column vector of the $N$ observations $y_1, y_2, \dots, y_N$. Let $\varepsilon$ be the $N \times 1$ column vector containing the $N$ disturbances.
$$\underset{N \times 1}{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_i \\ \vdots \\ y_N \end{pmatrix} \quad \underset{N \times 1}{x_k} = \begin{pmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{ik} \\ \vdots \\ x_{Nk} \end{pmatrix} \quad \underset{N \times 1}{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_N \end{pmatrix} \quad \underset{K \times 1}{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}$$
or equivalently
$$\underset{N \times K}{X} = (x_1 : x_2 : \dots : x_K) = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1K} \\ x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2K} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{i1} & x_{i2} & \dots & x_{ik} & \dots & x_{iK} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{Nk} & \dots & x_{NK} \end{pmatrix}$$
2. The Multiple Linear Regression Model
Fact
In most cases, the first column of $X$ is assumed to be a column of 1s, so that $\beta_1$ is the constant term in the model.
$$\underset{N \times K}{X} = \begin{pmatrix} 1 & x_{12} & \dots & x_{1k} & \dots & x_{1K} \\ 1 & x_{22} & \dots & x_{2k} & \dots & x_{2K} \\ \vdots & \vdots & & \vdots & & \vdots \\ 1 & x_{i2} & \dots & x_{ik} & \dots & x_{iK} \\ \vdots & \vdots & & \vdots & & \vdots \\ 1 & x_{N2} & \dots & x_{Nk} & \dots & x_{NK} \end{pmatrix} \quad \text{with } x_1 = \mathbf{1}_{N \times 1}$$
2. The Multiple Linear Regression Model
Remark
More generally, the matrix $X$ may as well contain stochastic and non-stochastic elements, such as: a constant; a time trend; dummy variables (for specific episodes in time); etc. Therefore, $X$ is generally a mixture of fixed and random variables.
2. The Multiple Linear Regression Model
Definition (Simple linear regression model)
The simple linear regression model is a model with only one stochastic regressor: $K = 1$ if there is no constant, or $K = 2$ if there is a constant:
$$y_i = \beta_1 x_i + \varepsilon_i \qquad \text{or} \qquad y_i = \beta_1 + \beta_2 x_{i2} + \varepsilon_i$$
for $i = 1, \dots, N$, or in vector form, $y = \beta_1 + \beta_2 x_2 + \varepsilon$.
2. The Multiple Linear Regression Model
Definition (Multiple linear regression model)
The multiple linear regression model can be written
$$\underset{N \times 1}{y} = \underset{N \times K}{X}\,\underset{K \times 1}{\beta} + \underset{N \times 1}{\varepsilon}$$
2. The Multiple Linear Regression Model
One key difference for the specification of the MLRM: parametric versus semi-parametric specification.
Parametric model: the distribution of the error terms is fully characterized, e.g. $\varepsilon \sim \mathcal{N}(0, \Omega)$.
Semi-parametric specification: only a few moments of the error terms are specified, e.g. $E(\varepsilon) = 0$ and $V(\varepsilon) = E(\varepsilon \varepsilon^\top) = \Omega$.
2. The Multiple Linear Regression Model
This difference does not matter for the derivation of the ordinary least squares estimator. But it does matter for (among others):
1. the characterization of the statistical properties of the OLS estimator (e.g., efficiency);
2. the choice of alternative estimators (e.g., the maximum likelihood estimator);
3. etc.
2. The Multiple Linear Regression Model
Definition (Semi-parametric multiple linear regression model)
The semi-parametric multiple linear regression model is defined by
$$y = X\beta + \varepsilon$$
where the error term $\varepsilon$ satisfies
$$E(\varepsilon \mid X) = 0_{N \times 1} \qquad V(\varepsilon \mid X) = \sigma^2 I_N$$
and $I_N$ is the identity matrix of order $N$.
2. The Multiple Linear Regression Model
Remarks
1. If the matrix $X$ is non-stochastic (fixed), i.e. there are only fixed regressors, then the conditions on the error term $\varepsilon$ read:
$$E(\varepsilon) = 0 \qquad V(\varepsilon) = \sigma^2 I_N$$
2. If the (conditional) variance covariance matrix of $\varepsilon$ is not diagonal, i.e. if $V(\varepsilon \mid X) = \Omega$, the model is called the multiple generalized linear regression model.
2. The Multiple Linear Regression Model
Remarks (cont'd)
The two conditions on the error term $\varepsilon$,
$$E(\varepsilon \mid X) = 0_{N \times 1} \qquad V(\varepsilon \mid X) = \sigma^2 I_N$$
are equivalent to
$$E(y \mid X) = X\beta \qquad V(y \mid X) = \sigma^2 I_N$$
2. The Multiple Linear Regression Model
Definition (The multiple linear Gaussian model)
The (parametric) multiple linear Gaussian model is defined by
$$y = X\beta + \varepsilon$$
where the error term $\varepsilon$ is normally distributed:
$$\varepsilon \sim \mathcal{N}(0, \sigma^2 I_N)$$
As a consequence, the vector $y$ has a conditional normal distribution:
$$y \mid X \sim \mathcal{N}(X\beta, \sigma^2 I_N)$$
Remarks
1. The multiple linear Gaussian model is (by definition) a parametric model.
2. If the matrix $X$ is non-stochastic (fixed), i.e. there are only fixed regressors, then the vector $y$ has a marginal normal distribution: $y \sim \mathcal{N}(X\beta, \sigma^2 I_N)$.
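As an illustration, here is a minimal simulation sketch of the multiple linear Gaussian model; all numerical values ($N$, $K$, $\beta$, $\sigma$) are hypothetical choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: N observations, K = 3 regressors, first column = constant
N, K = 500, 3
beta = np.array([1.0, 2.0, -0.5])      # assumed true parameter vector
sigma = 1.5                            # assumed true std. dev. of the disturbances

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
eps = rng.normal(scale=sigma, size=N)  # eps ~ N(0, sigma^2 I_N)
y = X @ beta + eps                     # hence y | X ~ N(X beta, sigma^2 I_N)
```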
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions that describe how the data set is produced by a data generating process (DGP):
Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
2. The Multiple Linear Regression Model
Definition (Assumption 1: Linearity)
The model is linear with respect to the parameters $\beta_1, \dots, \beta_K$.
2. The Multiple Linear Regression Model
Remarks
The model specifies a linear relationship between the dependent variable and the regressors. For instance, the models
$$y = \beta_0 + \beta_1 x + u \qquad y = \beta_0 + \beta_1 \cos(x) + v \qquad y = \beta_0 + \beta_1 \frac{1}{x} + w$$
are all linear (with respect to $\beta$). In contrast, the model
$$y = \beta_0 + \beta_1 x^{\beta_2} + \varepsilon$$
is nonlinear.
2. The Multiple Linear Regression Model
Remark
The model can be linear after some transformations. Starting from $y = A x^\beta \exp(\varepsilon)$, one has a log-linear specification:
$$\ln(y) = \ln(A) + \beta \ln(x) + \varepsilon$$
Definition (Log-linear model)
The log-linear model is
$$\ln(y_i) = \beta_1 \ln(x_{i1}) + \beta_2 \ln(x_{i2}) + \dots + \beta_K \ln(x_{iK}) + \varepsilon_i$$
This equation is also known as the constant elasticity form, as in this equation the elasticity of $y$ with respect to changes in $x$ does not vary with $x_{ik}$:
$$\beta_k = \frac{\partial \ln(y_i)}{\partial \ln(x_{ik})} = \frac{\partial y_i}{\partial x_{ik}} \cdot \frac{x_{ik}}{y_i}$$
2. The Multiple Linear Regression Model
Definition (Assumption 2: Full column rank)
$X$ is an $N \times K$ matrix with rank $K$.
2. The Multiple Linear Regression Model
Interpretation
1. There is no exact relationship among any of the independent variables in the model.
2. The columns of $X$ are linearly independent.
2. The Multiple Linear Regression Model
Example
Suppose that a cross-section model satisfies:
$$y_i = \beta_0 + \beta_1\,\text{non labor income}_i + \beta_2\,\text{salary}_i + \beta_3\,\text{total income}_i + \varepsilon_i$$
The identification condition does not hold, since total income is exactly equal to salary plus non-labor income (exact linear dependency in the model).
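A minimal numerical sketch of this failure, with made-up income data: the column total = nonlabor + salary makes $X$ rank deficient, so $X^\top X$ is (numerically) singular.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
nonlabor = rng.gamma(2.0, 500.0, size=N)   # hypothetical income data
salary = rng.gamma(5.0, 800.0, size=N)
total = nonlabor + salary                  # exact linear dependency

X = np.column_stack([np.ones(N), nonlabor, salary, total])
print(np.linalg.matrix_rank(X))            # 3 < K = 4: identification fails
print(np.linalg.cond(X.T @ X))             # huge condition number: X'X not invertible
```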
2. The Multiple Linear Regression Model
Remarks
1. Perfect multicollinearity is generally not difficult to spot and is signalled by most statistical software.
2. Imperfect multicollinearity is a more serious issue (see further).
2. The Multiple Linear Regression Model
Definition (Identification)
The multiple linear regression model is identifiable if and only if one of the following equivalent assertions holds:
(i) $\operatorname{rank}(X) = K$
(ii) the matrix $X^\top X$ is invertible
(iii) the columns of $X$ form a basis of $\mathcal{L}(X)$
(iv) $X\beta_1 = X\beta_2 \implies \beta_1 = \beta_2$, for all $(\beta_1, \beta_2) \in \mathbb{R}^K \times \mathbb{R}^K$
(v) $X\beta = 0 \implies \beta = 0$, for all $\beta \in \mathbb{R}^K$
(vi) $\ker(X) = \{0\}$
2. The Multiple Linear Regression Model
Definition (Assumption 3: Strict exogeneity of the regressors)
The regressors are exogenous in the sense that
$$E(\varepsilon \mid X) = 0_{N \times 1}$$
or equivalently, for all units $i \in \{1, \dots, N\}$,
$$E(\varepsilon_i \mid X) = 0$$
or equivalently
$$E(\varepsilon_i \mid x_{jk}) = 0$$
for any explanatory variable $k \in \{1, \dots, K\}$ and any unit $j \in \{1, \dots, N\}$.
2. The Multiple Linear Regression Model
Comments
1. The expected value of the error term at observation $i$ (in the sample) is not a function of the independent variables observed at any observation (including the $i$th observation). The independent variables are not predictors of the error terms.
2. The strict exogeneity condition can be rewritten as: $E(y \mid X) = X\beta$.
3. If the regressors are fixed, this condition can be rewritten as: $E(\varepsilon) = 0_{N \times 1}$.
2. The Multiple Linear Regression Model
Implications
The (strict) exogeneity condition $E(\varepsilon \mid X) = 0_{N \times 1}$ has two implications:
1. The zero conditional mean of $\varepsilon$ implies that the unconditional mean of $\varepsilon$ is also zero (the reverse is not true):
$$E(\varepsilon) = E_X\left(E(\varepsilon \mid X)\right) = E_X(0) = 0$$
2. The zero conditional mean of $\varepsilon$ implies that (the reverse is not true):
$$E(\varepsilon_i x_{jk}) = 0 \quad \forall i, j, k \qquad \text{or} \qquad \operatorname{Cov}(\varepsilon_i, X) = 0 \quad \forall i$$
2. The Multiple Linear Regression Model
Definition (Assumption 4: Spherical disturbances)
The error terms are such that
$$V(\varepsilon_i \mid X) = E(\varepsilon_i^2 \mid X) = \sigma^2 \quad \text{for all } i \in \{1, \dots, N\}$$
and
$$\operatorname{Cov}(\varepsilon_i, \varepsilon_j \mid X) = E(\varepsilon_i \varepsilon_j \mid X) = 0 \quad \text{for all } i \neq j$$
The condition of constant variances is called homoscedasticity. The uncorrelatedness across observations is called nonautocorrelation.
2. The Multiple Linear Regression Model
Comments
1. Spherical disturbances = homoscedasticity + nonautocorrelation.
2. If the errors are not spherical, we call them nonspherical disturbances.
3. The assumption of homoscedasticity is a strong one: this is the exception rather than the rule!
2. The Multiple Linear Regression Model
Comments
Let us consider the (conditional) variance covariance matrix of the error terms:
$$\underset{N \times N}{V(\varepsilon \mid X)} = \underset{N \times N}{E(\varepsilon \varepsilon^\top \mid X)} = \begin{pmatrix} E(\varepsilon_1^2 \mid X) & E(\varepsilon_1 \varepsilon_2 \mid X) & \dots & E(\varepsilon_1 \varepsilon_N \mid X) \\ E(\varepsilon_2 \varepsilon_1 \mid X) & E(\varepsilon_2^2 \mid X) & \dots & E(\varepsilon_2 \varepsilon_N \mid X) \\ \vdots & \vdots & \ddots & \vdots \\ E(\varepsilon_N \varepsilon_1 \mid X) & E(\varepsilon_N \varepsilon_2 \mid X) & \dots & E(\varepsilon_N^2 \mid X) \end{pmatrix}$$
The two assumptions (homoscedasticity and nonautocorrelation) imply that:
$$V(\varepsilon \mid X) = E(\varepsilon \varepsilon^\top \mid X) = \sigma^2 I_N = \begin{pmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{pmatrix}$$
2. The Multiple Linear Regression Model
Definition (Assumption 5: Data generation)
The data in $(x_{i1}, x_{i2}, \dots, x_{iK})$ may be any mixture of constants and random variables.
2. The Multiple Linear Regression Model
Comments
1. Analysis will be done conditionally on the observed $X$, so whether the elements in $X$ are fixed constants or random draws from a stochastic process will not influence the results.
2. In the case of stochastic regressors, the unconditional statistical properties of the estimators are obtained in two steps: (1) using the result conditioned on $X$, and (2) finding the unconditional result by averaging (i.e., integrating over) the conditional distributions.
An assumption regarding $(x_{i1}, x_{i2}, \dots, x_{iK}, y_i)$ for $i = 1, \dots, N$ is also required. This is a statement about how the sample is drawn. In the sequel, we assume that the $(x_{i1}, x_{i2}, \dots, x_{iK}, y_i)$ for $i = 1, \dots, N$ are independently and identically distributed (i.i.d.): the observations are drawn by simple random sampling from a large population.
2. The Multiple Linear Regression Model
Definition (Assumption 6: Normal distribution)
The disturbances are normally distributed:
$$\varepsilon_i \mid X \sim \mathcal{N}(0, \sigma^2)$$
or equivalently
$$\varepsilon \mid X \sim \mathcal{N}(0_{N \times 1}, \sigma^2 I_N)$$
2. The Multiple Linear Regression Model
Comments
1. Once again, this is a convenience that we will dispense with after some analysis of its implications.
2. Normality is not necessary to obtain many of the results presented below.
3. Assumption 6 implies assumptions 3 (exogeneity) and 4 (spherical disturbances).
2. The Multiple Linear Regression Model
Summary: the main assumptions of the multiple linear regression model
A1 (linearity): the model is linear in $\beta$
A2 (identification): $X$ is an $N \times K$ matrix with rank $K$
A3 (exogeneity): $E(\varepsilon \mid X) = 0_{N \times 1}$
A4 (spherical error terms): $V(\varepsilon \mid X) = \sigma^2 I_N$
A5 (data generation): $X$ may be fixed or random
A6 (normal distribution): $\varepsilon \mid X \sim \mathcal{N}(0_{N \times 1}, \sigma^2 I_N)$
2. The Multiple Linear Regression Model
Key Concepts
1. Simple linear regression model
2. Multiple linear regression model
3. Semi-parametric multiple linear regression model
4. Multiple linear Gaussian model
5. Assumptions of the multiple linear regression model
6. Linearity (A1), identification (A2), exogeneity (A3), spherical error terms (A4), data generation (A5) and normal distribution (A6)
Section 3. The ordinary least squares estimator
3. The ordinary least squares estimator
Introduction
1. The multiple linear regression model assumes that the following specification is true in the population:
$$y = X\beta + \varepsilon$$
where other unobserved factors determining $y$ are captured by the error term $\varepsilon$.
2. Consider a sample $\{x_{i1}, x_{i2}, \dots, x_{iK}, y_i\}_{i=1}^N$ of i.i.d. random variables (be careful: note the change of notation here) and only one realization of this sample (your data set).
3. How to estimate the vector of parameters $\beta$?
3. The ordinary least squares estimator
Introduction (cont'd)
1. If we assume that assumptions A1-A6 hold, we have a multiple linear Gaussian model (parametric model), and a solution is to use the MLE. The ML estimator for $\beta$ coincides with the ordinary least squares (OLS) estimator (cf. chapter 2).
2. If we assume that only assumptions A1-A5 hold, we have a semi-parametric multiple linear regression model, and the MLE is infeasible.
3. In this case, the only solution is to use the ordinary least squares (OLS) estimator.
3. The ordinary least squares estimator
Intuition
Let us consider the simple linear regression model and, for simplicity, denote $x_i = x_{i2}$:
$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$
The general idea of the OLS consists in minimizing the distance between the points $(x_i, y_i)$ and the regression line $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$, i.e. the points $(x_i, \hat{y}_i)$, for all $i = 1, \dots, N$.
3. The ordinary least squares estimator
Intuition (cont'd)
Estimates of $\beta_1$ and $\beta_2$ are chosen by minimizing the sum of squared residuals (SSR), $\sum_{i=1}^N \hat{\varepsilon}_i^2$. This SSR can be written as
$$\sum_{i=1}^N \hat{\varepsilon}_i^2 = \sum_{i=1}^N \left(y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i\right)^2$$
Therefore, $\hat{\beta}_1$ and $\hat{\beta}_2$ are the solutions of the minimization problem
$$\left(\hat{\beta}_1, \hat{\beta}_2\right) = \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^N (y_i - \beta_1 - \beta_2 x_i)^2$$
3. The ordinary least squares estimator
Definition (OLS - simple linear regression model)
In the simple linear regression model $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are the solutions of the minimization problem
$$\left(\hat{\beta}_1, \hat{\beta}_2\right) = \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^N (y_i - \beta_1 - \beta_2 x_i)^2$$
The solutions are:
$$\hat{\beta}_1 = \bar{y}_N - \hat{\beta}_2 \bar{x}_N \qquad \hat{\beta}_2 = \frac{\sum_{i=1}^N (x_i - \bar{x}_N)(y_i - \bar{y}_N)}{\sum_{i=1}^N (x_i - \bar{x}_N)^2}$$
where $\bar{y}_N = N^{-1} \sum_{i=1}^N y_i$ and $\bar{x}_N = N^{-1} \sum_{i=1}^N x_i$ respectively denote the sample means of the dependent variable $y$ and the regressor $x$.
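As a sketch, these closed-form solutions can be checked on simulated data; the DGP below ($\beta_1 = 1$, $\beta_2 = 2$, standard normal errors) is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)   # assumed DGP: beta1 = 1, beta2 = 2

xbar, ybar = x.mean(), y.mean()
beta2_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta1_hat = ybar - beta2_hat * xbar
print(beta1_hat, beta2_hat)              # estimates close to (1, 2)
```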
3. The ordinary least squares estimator
Remark
The OLS estimator is a linear estimator (cf. chapter 1), since it can be expressed as a linear function of the observations $y_i$:
$$\hat{\beta}_2 = \sum_{i=1}^N \omega_i y_i \quad \text{with} \quad \omega_i = \frac{x_i - \bar{x}_N}{\sum_{i=1}^N (x_i - \bar{x}_N)^2}$$
in the case where $\bar{y}_N = 0$.
3. The ordinary least squares estimator
Definition (Fitted value)
The predicted or fitted value for observation $i$ is
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$$
with a sample mean equal to the sample average of the observations:
$$\bar{\hat{y}}_N = \frac{1}{N} \sum_{i=1}^N \hat{y}_i = \bar{y}_N = \frac{1}{N} \sum_{i=1}^N y_i$$
Definition (Fitted residual)
The residual for observation $i$ is
$$\hat{\varepsilon}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$$
with a sample mean equal to zero by definition:
$$\bar{\hat{\varepsilon}}_N = \frac{1}{N} \sum_{i=1}^N \hat{\varepsilon}_i = 0$$
3. The ordinary least squares estimator
Remarks
1. The fit of the regression is good if the sum $\sum_{i=1}^N \hat{\varepsilon}_i^2$ (or SSR) is small, i.e., the unexplained part of the variance of $y$ is small.
2. The coefficient of determination or $R^2$ is given by:
$$R^2 = \frac{\sum_{i=1}^N (\hat{y}_i - \bar{y}_N)^2}{\sum_{i=1}^N (y_i - \bar{y}_N)^2} = 1 - \frac{\sum_{i=1}^N \hat{\varepsilon}_i^2}{\sum_{i=1}^N (y_i - \bar{y}_N)^2}$$
3. The ordinary least squares estimator
Orthogonality conditions
Under assumption A3 (strict exogeneity), we have $E(\varepsilon_i \mid x_i) = 0$. This condition implies that:
$$E(\varepsilon_i) = 0 \qquad E(\varepsilon_i x_i) = 0$$
Using the sample analogs of these moment conditions (cf. chapter 6, GMM), one has:
$$N^{-1} \sum_{i=1}^N \left(y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i\right) = 0 \qquad N^{-1} \sum_{i=1}^N \left(y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i\right) x_i = 0$$
This is a system of two equations with two unknowns, $\hat{\beta}_1$ and $\hat{\beta}_2$.
Definition (Orthogonality conditions)
The ordinary least squares estimator can be defined from the two sample analogs of the following moment conditions:
$$E(\varepsilon_i) = 0 \qquad E(\varepsilon_i x_i) = 0$$
The corresponding system of equations is just-identified.
3. The ordinary least squares estimator
OLS and the multiple linear regression model
Now consider the multiple linear regression model
$$y = X\beta + \varepsilon \qquad \text{or} \qquad y_i = \sum_{k=1}^K \beta_k x_{ik} + \varepsilon_i$$
Objective: find an estimator (estimate) of $\beta_1, \beta_2, \dots, \beta_K$ and $\sigma^2$ under assumptions A1-A5.
Different methods:
1. Minimize the sum of squared residuals (SSR).
2. Solve the same minimization problem with matrix notation.
3. Use moment conditions.
4. Use a geometric interpretation.
3. The ordinary least squares estimator
1. Minimize the sum of squared residuals (SSR):
As in the simple linear regression,
$$\hat{\beta} = \arg\min_\beta \sum_{i=1}^N \varepsilon_i^2 = \arg\min_\beta \sum_{i=1}^N \left(y_i - \sum_{k=1}^K \beta_k x_{ik}\right)^2$$
One can derive the first order conditions with respect to $\beta_k$ for $k = 1, \dots, K$ and solve a system of $K$ equations with $K$ unknowns.
3. The ordinary least squares estimator
2. Using matrix notations:
Definition (OLS and multiple linear regression model)
In the multiple linear regression model $y_i = x_i^\top \beta + \varepsilon_i$, with $x_i = (x_{i1}, \dots, x_{iK})^\top$, the OLS estimator $\hat{\beta}$ is the solution of
$$\hat{\beta} = \arg\min_\beta \sum_{i=1}^N \left(y_i - x_i^\top \beta\right)^2$$
The OLS estimator of $\beta$ is:
$$\hat{\beta} = \left(\sum_{i=1}^N x_i x_i^\top\right)^{-1} \left(\sum_{i=1}^N x_i y_i\right)$$
Definition (Normal equations)
Under suitable regularity conditions, in the multiple linear regression model $y_i = x_i^\top \beta + \varepsilon_i$, with $x_i = (x_{i1} : \dots : x_{iK})^\top$, the normal equations are
$$\sum_{i=1}^N x_i \left(y_i - x_i^\top \hat{\beta}\right) = 0_{K \times 1}$$
3. The ordinary least squares estimator
2. Using matrix notations:
Definition (OLS and multiple linear regression model)
In the multiple linear regression model $y = X\beta + \varepsilon$, the OLS estimator $\hat{\beta}$ is the solution of the minimization problem
$$\hat{\beta} = \arg\min_\beta \varepsilon^\top \varepsilon = \arg\min_\beta (y - X\beta)^\top (y - X\beta)$$
The OLS estimator of $\beta$ is:
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y$$
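A minimal matrix-form sketch on simulated (hypothetical) data; solving the normal equations with np.linalg.solve is numerically preferable to forming $(X^\top X)^{-1}$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])        # hypothetical true coefficients
y = X @ beta_true + rng.normal(size=N)

# beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```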
3. The ordinary least squares estimator
2. Using matrix notations:
Definition
The ordinary least squares estimator $\hat{\beta}$ of $\beta$ minimizes the following criterion:
$$s(\beta) = \|y - X\beta\|^2_{I_N} = (y - X\beta)^\top (y - X\beta)$$
The FOC (normal equations) are defined by:
$$\left.\frac{\partial s(\beta)}{\partial \beta}\right|_{\hat{\beta}} = -2\, \underset{K \times N}{X^\top}\, \underset{N \times 1}{\left(y - X\hat{\beta}\right)} = 0_{K \times 1}$$
The second-order conditions hold:
$$\left.\frac{\partial^2 s(\beta)}{\partial \beta\, \partial \beta^\top}\right|_{\hat{\beta}} = 2\, \underset{K \times K}{X^\top X} \text{ is positive definite}$$
since by definition (under A2) $X^\top X$ is a positive definite matrix. We have a minimum.
3. The ordinary least squares estimator
2. Using matrix notations:
Definition (Normal equations)
Under suitable regularity conditions, in the multiple linear regression model $y = X\beta + \varepsilon$, the normal equations are given by:
$$\underset{K \times N}{X^\top}\, \underset{N \times 1}{\left(y - X\hat{\beta}\right)} = 0_{K \times 1}$$
3. The ordinary least squares estimator
Definition (Unbiased variance estimator)
In the multiple linear regression model $y = X\beta + \varepsilon$, the unbiased estimator of $\sigma^2$ is given by:
$$\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^N \hat{\varepsilon}_i^2 = \frac{SSR}{N - K}$$
2. Using matrix notations:
The estimator $\hat{\sigma}^2$ can also be written as:
$$\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^N \left(y_i - x_i^\top \hat{\beta}\right)^2 = \frac{(y - X\hat{\beta})^\top (y - X\hat{\beta})}{N - K} = \frac{\|y - X\hat{\beta}\|^2_{I_N}}{N - K}$$
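Continuing the sketch above (self-contained, with hypothetical data again), the unbiased variance estimator divides the SSR by $N - K$ rather than $N$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)   # assumed sigma^2 = 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K)      # SSR / (N - K): unbiased for sigma^2
print(sigma2_hat)                          # close to the assumed sigma^2 = 1
```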
3. The ordinary least squares estimator
3. Using moment conditions:
Under assumption A3 (strict exogeneity), we have $E(\varepsilon \mid X) = 0$. This condition implies:
$$E(\varepsilon_i x_i) = 0_{K \times 1}$$
with $x_i = (x_{i1} : \dots : x_{iK})^\top$. Using the sample analogs, one has:
$$N^{-1} \sum_{i=1}^N x_i \left(y_i - x_i^\top \hat{\beta}\right) = 0_{K \times 1}$$
We have $K$ (normal) equations with $K$ unknown parameters $\hat{\beta}_1, \dots, \hat{\beta}_K$. The system is just identified.
3. The ordinary least squares estimator
4. Geometric interpretation:
1. The ordinary least squares estimation method consists in determining the adjusted vector $\hat{y}$ which is the closest to $y$ (in a certain space...), such that the squared norm between $y$ and $\hat{y}$ is minimized.
2. Finding $\hat{y}$ is equivalent to finding an estimator of $\beta$.
Definition (Geometric interpretation)
The adjusted vector $\hat{y}$ is the (orthogonal) projection of $y$ onto the column space of $X$. The fitted residual vector $\hat{\varepsilon}$ is the projection of $y$ onto the orthogonal complement of the column space of $X$. The vectors $\hat{y}$ and $\hat{\varepsilon}$ are orthogonal.
3. The ordinary least squares estimator
4. Geometric interpretation:
[Figure: orthogonal projection of $y$ onto the column space of $X$. Source: F. Pelgrin (2010), Lecture Notes, Advanced Econometrics]
3. The ordinary least squares estimator
4. Geometric interpretation:
Definition (Projection matrices)
The vectors $\hat{y}$ and $\hat{\varepsilon}$ are defined to be:
$$\hat{y} = P y \qquad \hat{\varepsilon} = M y$$
where $P$ and $M$ denote the two following projection matrices:
$$P = X \left(X^\top X\right)^{-1} X^\top \qquad M = I_N - P = I_N - X \left(X^\top X\right)^{-1} X^\top$$
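A short numerical sketch of these two matrices and their textbook properties (idempotency, $MX = 0$, orthogonality of $\hat{y}$ and $\hat{\varepsilon}$); the data are again simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=N)

P = X @ np.linalg.solve(X.T @ X, X.T)    # P = X (X'X)^{-1} X'
M = np.eye(N) - P                        # M, the "residual maker"

y_hat, e_hat = P @ y, M @ y
print(np.allclose(P @ P, P), np.allclose(M @ M, M))  # both idempotent
print(np.allclose(M @ X, 0.0))                       # MX = 0
print(np.isclose(y_hat @ e_hat, 0.0))                # y_hat orthogonal to e_hat
```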
3. The ordinary least squares estimator
Other geometric interpretations:
Suppose that there is a constant term in the model.
1. The least squares residuals sum to zero: $\sum_{i=1}^N \hat{\varepsilon}_i = 0$.
2. The regression hyperplane passes through the point of means of the data $(\bar{x}_N, \bar{y}_N)$.
3. The mean of the fitted (adjusted) values of $y$ equals the mean of the actual values of $y$: $\bar{\hat{y}}_N = \bar{y}_N$.
3. The ordinary least squares estimator
Definition (Coefficient of determination)
The coefficient of determination of the multiple linear regression model (with a constant term) is the ratio of the total (empirical) variance explained by the model to the total (empirical) variance of $y$:
$$R^2 = \frac{\sum_{i=1}^N (\hat{y}_i - \bar{y}_N)^2}{\sum_{i=1}^N (y_i - \bar{y}_N)^2} = 1 - \frac{\sum_{i=1}^N \hat{\varepsilon}_i^2}{\sum_{i=1}^N (y_i - \bar{y}_N)^2}$$
Remark
1. The coefficient of determination measures the proportion of the total variance (or variability) in $y$ that is accounted for by variation in the regressors (or the model).
2. Problem: the $R^2$ automatically and spuriously increases when extra explanatory variables are added to the model.
3. The ordinary least squares estimator
Definition (Adjusted R-squared)
The adjusted R-squared coefficient is defined to be:
$$\bar{R}^2 = 1 - \frac{N - 1}{N - p - 1} \left(1 - R^2\right)$$
where $p$ denotes the number of regressors (not counting the constant term, i.e., $p = K - 1$ if there is a constant or $p = K$ otherwise).
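A minimal sketch computing both coefficients on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
ssr = e @ e                                # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)          # total sum of squares

r2 = 1.0 - ssr / sst
p = K - 1                                  # regressors, not counting the constant
r2_adj = 1.0 - (N - 1) / (N - p - 1) * (1.0 - r2)
print(r2, r2_adj)                          # r2_adj <= r2
```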
3. The ordinary least squares estimator
Remark
One can show that:
1. $\bar{R}^2 < R^2$
2. if $N$ is large, $\bar{R}^2 \simeq R^2$
3. the adjusted R-squared $\bar{R}^2$ can be negative.
3. The ordinary least squares estimator
Key Concepts
1. OLS estimator and estimate
2. Fitted or predicted value
3. Residual or fitted residual
4. Orthogonality conditions
5. Normal equations
6. Geometric interpretations of the OLS
7. Coefficient of determination and adjusted R-squared
Section 4. Statistical properties of the OLS estimator
4. Statistical properties of the OLS estimator
In order to study the statistical properties of the OLS estimator, we have to distinguish (cf. chapter 1):
1. the finite sample properties;
2. the large sample or asymptotic properties.
But we also have to distinguish the properties according to the assumptions made on the linear regression model:
1. semi-parametric linear regression model (the exact distribution of $\varepsilon$ is unknown) versus parametric linear regression model (and especially the Gaussian linear regression model, assumption A6);
2. $X$ is a matrix of random regressors versus $X$ is a matrix of fixed regressors.
4. Statistical properties of the OLS estimator
Fact (Assumptions)
In the rest of this section, we assume that assumptions A1-A5 hold.
A1 (linearity): the model is linear in $\beta$
A2 (identification): $X$ is an $N \times K$ matrix with rank $K$
A3 (exogeneity): $E(\varepsilon \mid X) = 0_{N \times 1}$
A4 (spherical error terms): $V(\varepsilon \mid X) = \sigma^2 I_N$
A5 (data generation): $X$ may be fixed or random
Subsection 4.1. Finite sample properties of the OLS estimator
4.1. Finite sample properties
Objectives
The objectives of this subsection are the following:
1. Compute the first two moments of the (unknown) finite sample distribution of the OLS estimators $\hat{\beta}$ and $\hat{\sigma}^2$.
2. Determine the finite sample distribution of the OLS estimators $\hat{\beta}$ and $\hat{\sigma}^2$ under particular assumptions (A6).
3. Determine if the OLS estimators are "good": efficient estimator versus BLUE.
4. Introduce the Gauss-Markov theorem.
4.1. Finite sample properties
First moments of the OLS estimators
In a first step, we will derive the first moments of the OLS estimators:
Step 1: compute $E(\hat{\beta})$ and $V(\hat{\beta})$
Step 2: compute $E(\hat{\sigma}^2)$ and $V(\hat{\sigma}^2)$
4.1. Finite sample properties
Definition (Unbiased estimator)
In the multiple linear regression model $y = X\beta_0 + \varepsilon$, under assumption A3 (strict exogeneity), the OLS estimator $\hat{\beta}$ is unbiased:
$$E(\hat{\beta}) = \beta_0$$
where $\beta_0$ denotes the true value of the vector of parameters. This result holds whether or not the matrix $X$ is considered as random.
4.1. Finite sample properties
Proof
Case 1: fixed regressors (cf. chapter 1)
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y = \beta_0 + \left(X^\top X\right)^{-1} X^\top \varepsilon$$
So, if $X$ is a matrix of fixed regressors:
$$E(\hat{\beta}) = \beta_0 + \left(X^\top X\right)^{-1} X^\top E(\varepsilon)$$
Under assumption A3 (exogeneity), $E(\varepsilon \mid X) = E(\varepsilon) = 0$. Then we get:
$$E(\hat{\beta}) = \beta_0$$
The OLS estimator is unbiased.
4.1. Finite sample properties
Proof (cont'd)
Case 2: random regressors
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y = \beta_0 + \left(X^\top X\right)^{-1} X^\top \varepsilon$$
If $X$ includes some random elements:
$$E(\hat{\beta} \mid X) = \beta_0 + \left(X^\top X\right)^{-1} X^\top E(\varepsilon \mid X)$$
Under assumption A3 (exogeneity), $E(\varepsilon \mid X) = 0$. Then we get:
$$E(\hat{\beta} \mid X) = \beta_0$$
The OLS estimator $\hat{\beta}$ is conditionally unbiased. Besides, we have:
$$E(\hat{\beta}) = E_X\left(E(\hat{\beta} \mid X)\right) = E_X(\beta_0) = \beta_0$$
where $E_X$ denotes the expectation with respect to the distribution of $X$. So, the OLS estimator $\hat{\beta}$ is unbiased: $E(\hat{\beta}) = \beta_0$.
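This unbiasedness can be illustrated by a small Monte Carlo sketch (all settings hypothetical): averaging the OLS estimates over many simulated samples recovers $\beta_0$.

```python
import numpy as np

rng = np.random.default_rng(6)
N, K, R = 100, 3, 5000                   # R Monte Carlo replications (hypothetical)
beta0 = np.array([1.0, 2.0, -0.5])       # assumed true parameter vector

est = np.empty((R, K))
for r in range(R):
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ beta0 + rng.normal(size=N)
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(est.mean(axis=0))                  # close to beta0: E(beta_hat) = beta0
```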
4.1. Finite sample properties
Definition (Variance of the OLS estimator, non-stochastic regressors)
In the multiple linear regression model $y = X\beta + \varepsilon$, if the matrix $X$ is non-stochastic, the unconditional variance covariance matrix of the OLS estimator $\hat{\beta}$ is:
$$V(\hat{\beta}) = \sigma^2 \left(X^\top X\right)^{-1}$$
Proof
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y = \beta_0 + \left(X^\top X\right)^{-1} X^\top \varepsilon$$
So, if $X$ is a matrix of fixed regressors:
$$V(\hat{\beta}) = E\left[\left(\hat{\beta} - \beta_0\right)\left(\hat{\beta} - \beta_0\right)^\top\right] = E\left[\left(X^\top X\right)^{-1} X^\top \varepsilon \varepsilon^\top X \left(X^\top X\right)^{-1}\right] = \left(X^\top X\right)^{-1} X^\top E\left(\varepsilon \varepsilon^\top\right) X \left(X^\top X\right)^{-1}$$
Under assumption A4 (spherical disturbances), we have $V(\varepsilon) = E(\varepsilon \varepsilon^\top) = \sigma^2 I_N$. The variance covariance matrix of the OLS estimator is then:
$$V(\hat{\beta}) = \left(X^\top X\right)^{-1} X^\top \sigma^2 I_N\, X \left(X^\top X\right)^{-1} = \sigma^2 \left(X^\top X\right)^{-1} X^\top X \left(X^\top X\right)^{-1} = \sigma^2 \left(X^\top X\right)^{-1}$$
4.1. Finite sample properties
Definition (Variance of the OLS estimator, stochastic regressors)
In the multiple linear regression model $y = X\beta_0 + \varepsilon$, if the matrix $X$ is stochastic, the conditional variance covariance matrix of the OLS estimator $\hat{\beta}$ is:
$$V(\hat{\beta} \mid X) = \sigma^2 \left(X^\top X\right)^{-1}$$
The unconditional variance covariance matrix is equal to:
$$V(\hat{\beta}) = \sigma^2\, E_X\left[\left(X^\top X\right)^{-1}\right]$$
where $E_X$ denotes the expectation with respect to the distribution of $X$.
Proof
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y = \beta_0 + \left(X^\top X\right)^{-1} X^\top \varepsilon$$
So, if $X$ is a stochastic matrix:
$$V(\hat{\beta} \mid X) = E\left[\left(\hat{\beta} - \beta_0\right)\left(\hat{\beta} - \beta_0\right)^\top \,\middle|\, X\right] = \left(X^\top X\right)^{-1} X^\top E\left(\varepsilon \varepsilon^\top \mid X\right) X \left(X^\top X\right)^{-1}$$
Under assumption A4 (spherical disturbances), we have $V(\varepsilon \mid X) = E(\varepsilon \varepsilon^\top \mid X) = \sigma^2 I_N$. The conditional variance covariance matrix of the OLS estimator is then:
$$V(\hat{\beta} \mid X) = \left(X^\top X\right)^{-1} X^\top \sigma^2 I_N\, X \left(X^\top X\right)^{-1} = \sigma^2 \left(X^\top X\right)^{-1}$$
Since $E(\hat{\beta} \mid X) = \beta_0$ does not depend on $X$, the variance decomposition reduces to the average of the conditional variances, so the (unconditional) variance covariance matrix of the OLS estimator is:
$$V(\hat{\beta}) = E_X\left[V(\hat{\beta} \mid X)\right] = \sigma^2\, E_X\left[\left(X^\top X\right)^{-1}\right]$$
where $E_X$ denotes the expectation with respect to the distribution of $X$.
4.1. Finite sample properties
Summary
Case 1: X stochastic — $E(\hat{\beta}) = \beta_0$; $V(\hat{\beta}) = \sigma^2 E_X[(X^\top X)^{-1}]$; conditional mean $E(\hat{\beta} \mid X) = \beta_0$; conditional variance $V(\hat{\beta} \mid X) = \sigma^2 (X^\top X)^{-1}$
Case 2: X nonstochastic — $E(\hat{\beta}) = \beta_0$; $V(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$
4.1. Finite sample properties
Question
How to estimate the variance covariance matrix of the OLS estimator?
$$V(\hat{\beta}_{OLS}) = \sigma^2 \left(X^\top X\right)^{-1} \text{ if } X \text{ is nonstochastic} \qquad V(\hat{\beta}_{OLS}) = \sigma^2\, E_X\left[\left(X^\top X\right)^{-1}\right] \text{ if } X \text{ is stochastic}$$
Definition (Variance estimator)
An unbiased estimator of the variance covariance matrix of the OLS estimator is given by:
$$\hat{V}(\hat{\beta}_{OLS}) = \hat{\sigma}^2 \left(X^\top X\right)^{-1}$$
where $\hat{\sigma}^2 = (N - K)^{-1} \hat{\varepsilon}^\top \hat{\varepsilon}$ is an unbiased estimator of $\sigma^2$. This result holds whether $X$ is stochastic or non-stochastic.
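In practice, the diagonal of this estimated matrix gives the squared standard errors of the coefficients; a minimal sketch, once more on hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = e @ e / (N - K)

V_hat = sigma2_hat * XtX_inv             # estimated var-cov matrix of beta_hat
se = np.sqrt(np.diag(V_hat))             # standard errors of the coefficients
print(se)
```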
4.1. Finite sample properties
Summary
Case 1: X stochastic — variance $V(\hat{\beta}) = \sigma^2 E_X[(X^\top X)^{-1}]$; estimator $\hat{V}(\hat{\beta}_{OLS}) = \hat{\sigma}^2 (X^\top X)^{-1}$
Case 2: X nonstochastic — variance $V(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$; estimator $\hat{V}(\hat{\beta}_{OLS}) = \hat{\sigma}^2 (X^\top X)^{-1}$
4.1. Finite sample properties
Definition (Estimator of the variance of disturbances)
Under assumptions A1-A5, in the multiple linear regression model $y = X\beta + \varepsilon$, the estimator $\hat{\sigma}^2$ is unbiased:
$$E(\hat{\sigma}^2) = \sigma^2 \qquad \text{where} \qquad \hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^N \hat{\varepsilon}_i^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K}$$
This result holds whether or not the matrix $X$ is considered as random.
4.1. Finite sample properties
Proof
We assume that $X$ is stochastic. Let $M$ denote the projection matrix (the "residual maker") defined by:
$$M = I_N - X \left(X^\top X\right)^{-1} X^\top \qquad \text{so that} \qquad \underset{N \times 1}{\hat{\varepsilon}} = \underset{N \times N}{M}\, \underset{N \times 1}{y}$$
The $N \times N$ matrix $M$ satisfies the following properties:
1. If $X$ is regressed on $X$, a perfect fit will result and the residuals will be zero, so $MX = 0$.
2. The matrix $M$ is symmetric ($M^\top = M$) and idempotent ($MM = M$).
4.1. Finite sample properties
Proof (cont'd)
The residuals are defined to be $\hat{\varepsilon} = M y$. Since $y = X\beta + \varepsilon$, we have
$$\hat{\varepsilon} = M(X\beta + \varepsilon) = MX\beta + M\varepsilon$$
Since $MX = 0$, we have $\hat{\varepsilon} = M\varepsilon$.
The estimator $\hat{\sigma}^2$ is based on the sum of squared residuals (SSR):
$$\hat{\sigma}^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K} = \frac{\varepsilon^\top M \varepsilon}{N - K}$$
The expected value of the SSR is
$$E(\hat{\varepsilon}^\top \hat{\varepsilon} \mid X) = E(\varepsilon^\top M \varepsilon \mid X)$$
The quantity $\varepsilon^\top M \varepsilon$ is a $1 \times 1$ scalar, so it is equal to its trace:
$$E(\varepsilon^\top M \varepsilon \mid X) = E\left(\operatorname{tr}(\varepsilon^\top M \varepsilon) \mid X\right) = E\left(\operatorname{tr}(M \varepsilon \varepsilon^\top) \mid X\right)$$
since $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
4.1. Finite sample properties
Proof (cont'd)
Since $M = I_N - X(X^\top X)^{-1} X^\top$ depends only on $X$, we have:
$$E\left(\operatorname{tr}(M \varepsilon \varepsilon^\top) \mid X\right) = \operatorname{tr}\left(M\, E(\varepsilon \varepsilon^\top \mid X)\right)$$
Under assumptions A3 and A4, we have $E(\varepsilon \varepsilon^\top \mid X) = \sigma^2 I_N$. As a consequence:
$$E(\hat{\varepsilon}^\top \hat{\varepsilon} \mid X) = \operatorname{tr}\left(\sigma^2 M I_N\right) = \sigma^2 \operatorname{tr}(M)$$
Then:
$$E(\hat{\varepsilon}^\top \hat{\varepsilon} \mid X) = \sigma^2 \operatorname{tr}\left(I_N - X(X^\top X)^{-1} X^\top\right) = \sigma^2 \operatorname{tr}(I_N) - \sigma^2 \operatorname{tr}\left(X (X^\top X)^{-1} X^\top\right)$$
Using $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, $\operatorname{tr}\left(X (X^\top X)^{-1} X^\top\right) = \operatorname{tr}\left((X^\top X)^{-1} X^\top X\right) = \operatorname{tr}(I_K)$, hence:
$$E(\hat{\varepsilon}^\top \hat{\varepsilon} \mid X) = \sigma^2 \operatorname{tr}(I_N) - \sigma^2 \operatorname{tr}(I_K) = \sigma^2 (N - K)$$
4.1. Finite sample properties
Proof (cont'd)
By definition of $\hat{\sigma}^2$, we have:
$$E(\hat{\sigma}^2 \mid X) = \frac{E(\hat{\varepsilon}^\top \hat{\varepsilon} \mid X)}{N - K} = \frac{\sigma^2 (N - K)}{N - K} = \sigma^2$$
So, the estimator $\hat{\sigma}^2$ is conditionally unbiased. Then:
$$E(\hat{\sigma}^2) = E_X\left(E(\hat{\sigma}^2 \mid X)\right) = E_X(\sigma^2) = \sigma^2$$
The estimator $\hat{\sigma}^2$ is unbiased: $E(\hat{\sigma}^2) = \sigma^2$.
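The $N - K$ correction can also be illustrated by a short Monte Carlo sketch (hypothetical settings; a small $N$ makes the difference from dividing by $N$ visible):

```python
import numpy as np

rng = np.random.default_rng(8)
N, K, R = 30, 3, 20000
sigma2 = 2.0                             # assumed true variance

s2 = np.empty(R)
for r in range(R):
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=np.sqrt(sigma2), size=N)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    s2[r] = e @ e / (N - K)              # unbiased estimator

print(s2.mean())                         # close to sigma2 = 2.0
print((s2 * (N - K) / N).mean())         # SSR / N would be biased downward
```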
4.1. Finite sample properties
Remark
Using the same principle, we can compute the variance of the estimator $\hat{\sigma}^2$:
$$V(\hat{\sigma}^2 \mid X) = V\left(\frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K} \,\middle|\, X\right) = \frac{1}{(N - K)^2} V\left(\varepsilon^\top M \varepsilon \mid X\right)$$
As a consequence, we have:
$$V(\hat{\sigma}^2) = E_X\left(V(\hat{\sigma}^2 \mid X)\right)$$
But it takes... at least ten slides...
4.1. Finite sample properties
Definition (Variance of the estimator $\hat{\sigma}^2$)
In the multiple linear regression model $y = X\beta_0 + \varepsilon$, the variance of the estimator $\hat{\sigma}^2$ is
$$V(\hat{\sigma}^2) = \frac{2\sigma^4}{N - K}$$
where $\sigma^2$ denotes the true value of the variance of the error terms. This result holds whether or not the matrix $X$ is considered as random.
Summary
Whether $X$ is stochastic or nonstochastic: $E(\hat{\sigma}^2) = \sigma^2$ and $V(\hat{\sigma}^2) = \dfrac{2\sigma^4}{N - K}$; conditionally, $E(\hat{\sigma}^2 \mid X) = \sigma^2$ and $V(\hat{\sigma}^2 \mid X) = \dfrac{2\sigma^4}{N - K}$
4.1. Finite sample properties
Finite sample distributions
Summary
1. Under assumptions A1-A5, we can derive the first two moments of the (unknown) finite sample distribution of the OLS estimator $\hat{\beta}$, i.e. $E(\hat{\beta})$ and $V(\hat{\beta})$.
2. Are we able to characterize the finite sample distribution (or exact sampling distribution) of $\hat{\beta}$?
3. For that, we need to put an assumption on the distribution of $\varepsilon$ and use a parametric specification.
4.1. Finite sample properties
Fact (Finite sample distribution, I)
(1) In a parametric multiple linear regression model with stochastic regressors, the conditional finite sample distribution of $\hat{\beta}$ is known; the unconditional finite sample distribution is generally unknown:
$$\hat{\beta} \mid X \sim D \qquad \hat{\beta} \sim \;??$$
where $D$ is a multivariate distribution.
Fact (Finite sample distribution, II)
(2) In a parametric multiple linear regression model with nonstochastic regressors, the marginal (unconditional) finite sample distribution of $\hat{\beta}$ is known:
$$\hat{\beta} \sim D$$
Fact (Finite sample distribution, III)
(3) In a semi-parametric multiple linear regression model, with fixed or random regressors, the finite sample distribution of $\hat{\beta}$ is unknown:
$$\hat{\beta} \mid X \sim \;?? \qquad \hat{\beta} \sim \;??$$
4.1. Finite sample properties
Example (Linear Gaussian regression model)
Under assumption A6 (normality - linear Gaussian regression model), $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_N)$ and
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y = \beta + \left(X^\top X\right)^{-1} X^\top \varepsilon$$
If $X$ is a stochastic matrix, the conditional distribution of $\hat{\beta}$ is
$$\hat{\beta} \mid X \sim \mathcal{N}\left(\beta, \sigma^2 \left(X^\top X\right)^{-1}\right)$$
If $X$ is a nonstochastic matrix, the marginal distribution of $\hat{\beta}$ is
$$\hat{\beta} \sim \mathcal{N}\left(\beta, \sigma^2 \left(X^\top X\right)^{-1}\right)$$
4.1. Finite sample properties
Theorem (Linear Gaussian regression model)
Under assumption A6 (normality), the estimators $\hat{\beta}$ and $\hat{\sigma}^2$ have a finite sample distribution given by:
$$\hat{\beta} \sim \mathcal{N}\left(\beta, \sigma^2 \left(X^\top X\right)^{-1}\right) \qquad \frac{\hat{\sigma}^2}{\sigma^2}(N - K) \sim \chi^2(N - K)$$
Moreover, $\hat{\beta}$ and $\hat{\sigma}^2$ are independent. This result holds whether or not the matrix $X$ is considered as random. In this last case, the distribution of $\hat{\beta}$ is conditional on $X$, with $V(\hat{\beta} \mid X) = \sigma^2 (X^\top X)^{-1}$.
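A minimal Monte Carlo sketch of the second statement (fixed design across replications, assumed $\sigma^2 = 1$): the draws of $(N - K)\hat{\sigma}^2 / \sigma^2$ should match the first two moments of a $\chi^2(N - K)$.

```python
import numpy as np

rng = np.random.default_rng(9)
N, K, R = 40, 3, 10000
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed design
beta = np.array([1.0, 2.0, -0.5])

stat = np.empty(R)
for r in range(R):
    y = X @ beta + rng.normal(size=N)    # Gaussian errors, sigma^2 = 1
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    stat[r] = e @ e                      # = (N - K) * sigma2_hat / sigma^2

print(stat.mean(), N - K)                # chi^2(N - K) has mean N - K
print(stat.var(), 2 * (N - K))           # ... and variance 2(N - K)
```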
4.1. Finite sample properties
Efficient estimator or BLUE?
4.1. Finite sample properties
Question: is the OLS estimator a "good" estimator of $\beta$?
The question is whether this estimator is preferred to other unbiased estimators:
$$V(\hat{\beta}_{OLS}) \lessgtr V(\hat{\beta}_{other})$$
In general, to answer this question we use the FDCR or Cramer-Rao bound and study the efficiency of the estimator (cf. chapters 1 and 2). Problem: the computation of the FDCR bound requires an assumption on the distribution of $\varepsilon$.
4.1. Finite sample properties
Two cases:
1. In a parametric model, it is possible to show that the OLS estimator is efficient: its variance reaches the FDCR bound. The OLS is then the Best Unbiased Estimator (BUE).
2. In a semi-parametric model: (a) we don't know the conditional distribution of $y$ given $X$, nor the likelihood of the sample; (b) the FDCR bound is unknown, so it is impossible to show the efficiency of the OLS estimator.
3. In a semi-parametric model, we therefore introduce another concept: the Best Linear Unbiased Estimator (BLUE).
4.1. Finite sample properties
Problem (Efficiency of the OLS estimator)
In a semi-parametric multiple linear regression model (with no assumption on the distribution of $\varepsilon$), it is impossible to compute the FDCR or Cramer-Rao bound and to show the efficiency of the OLS estimator. The efficiency of the OLS estimator can only be shown in a parametric multiple linear regression model.
4.1. Finite sample properties
Theorem (Efficiency - Gaussian model)
In a Gaussian multiple linear regression model $y = X\beta_0 + \varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_N)$ (assumption A6), the OLS estimator $\hat{\beta}$ is efficient. Its variance attains the FDCR or Cramer-Rao bound:
$$V(\hat{\beta}) = I_N(\beta_0)^{-1}$$
This result holds whether or not the matrix $X$ is considered as random.
4.1. Finite sample properties
Proof
Let us consider the case where $X$ is nonstochastic. Under assumption A6, the distribution is known, since $y \sim \mathcal{N}(X\beta, \sigma^2 I_N)$. The log-likelihood of the sample $\{y_i, x_i\}_{i=1}^N$, with $x_i = (x_{i1} : \dots : x_{iK})^\top$, is then equal to:
$$\ell_N(\beta; y, x) = -\frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi) - \frac{(y - X\beta)^\top (y - X\beta)}{2\sigma^2}$$
If we denote $\theta = (\beta^\top : \sigma^2)^\top$, we have (cf. chapter 2):
$$I_N(\theta_0) = -E\left(\frac{\partial^2 \ell_N(\theta; y, x)}{\partial \theta\, \partial \theta^\top}\right) = \begin{pmatrix} \frac{1}{\sigma^2} X^\top X & 0_{K \times 1} \\ 0_{1 \times K} & \frac{N}{2\sigma^4} \end{pmatrix}$$
Proof (cont'd)
The inverse of this block matrix is given by:
$$I_N(\theta_0)^{-1} = \begin{pmatrix} \sigma^2 \left(X^\top X\right)^{-1} & 0_{K \times 1} \\ 0_{1 \times K} & \frac{2\sigma^4}{N} \end{pmatrix}$$
whose upper-left block is $I_N(\beta_0)^{-1}$. Then we have:
$$V(\hat{\beta}) = I_N(\beta_0)^{-1} = \sigma^2 \left(X^\top X\right)^{-1}$$
4.1. Finite sample properties
Problem
1. In a semi-parametric model (with no assumption on the distribution of $\varepsilon$), it is impossible to compute the FDCR bound and to show the efficiency of the OLS estimator.
2. The solution consists in introducing the concept of best linear unbiased estimator (BLUE): the Gauss-Markov theorem...
4.1. Finite sample properties
Theorem (Gauss-Markov theorem)
In the linear regression model under assumptions A1-A5, the least squares estimator $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of $\beta_0$, whether $X$ is stochastic or nonstochastic.
Comment
The estimator $\hat{\beta}_k$, for $k = 1, \dots, K$, is the BLUE of $\beta_k$:
Best = smallest variance
Linear (in $y$ or $y_i$): $\hat{\beta}_k = \sum_{i=1}^N \omega_{ki} y_i$
Unbiased: $E(\hat{\beta}_k) = \beta_k$
Estimator: $\hat{\beta}_k = f(y_1, \dots, y_N)$
4.1. Finite sample properties
Remark
The fact that the OLS is the BLUE means that there is no linear unbiased estimator with a smaller variance than $\hat{\beta}$. BUT, nothing rules out that a nonlinear unbiased estimator may have a smaller variance than $\hat{\beta}$.
4.1. Finite sample properties
Summary
Semi-parametric model: $\hat{\beta}$ is the BLUE (Gauss-Markov), whether $X$ is stochastic or nonstochastic.
Parametric model: $\hat{\beta}$ is efficient and BUE (FDCR bound), whether $X$ is stochastic or nonstochastic.
BUE: Best Unbiased Estimator
4.1. Finite sample properties Other remarks Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 143 / 174
4.1. Finite sample properties Remarks 1 Including irrelevant variables (i.e., coe cient = 0) does not a ect the unbiasedness of the estimator. 2 Irrelevant variables decrease the precision of OLS estimator, but do not result in bias. 3 Omitting relevant regressors obviously does a ect the estimator, since the zero conditional mean assumption does not hold anymore. Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 144 / 174
4.1. Finite sample properties

Remarks
1. Omitting relevant variables causes bias (under some conditions on these variables).
   1. The direction of the bias can (sometimes) be characterized.
   2. The size of the bias is more complicated to derive.
2. More generally, correlation between a single regressor and the error term (endogeneity) usually results in all OLS estimators being biased, as the sketch below illustrates.
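A minimal Matlab sketch of the omitted-variable bias, assuming illustrative coefficients and a correlation between the included and the omitted regressor:

```matlab
% Omitted-variable bias: x2 is relevant and correlated with x1
rng(3);
N  = 100000;
x1 = randn(N,1);
x2 = 0.8*x1 + randn(N,1);               % relevant regressor, correlated with x1
y  = 1 + 2*x1 + 1.5*x2 + randn(N,1);
b_full  = [ones(N,1) x1 x2]\y;          % correctly specified model
b_short = [ones(N,1) x1]\y;             % x2 omitted: exogeneity fails
fprintf('full: %.3f   short (biased): %.3f\n', b_full(2), b_short(2));
% the short-regression slope converges to 2 + 1.5*0.8 = 3.2, not to 2
```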
4.1. Finite sample properties

Key Concepts
1. OLS unbiased estimator under assumption A3 (exogeneity)
2. Mean and variance of the OLS estimator under assumptions A3-A4 (exogeneity - spherical disturbances)
3. Finite sample distribution
4. Estimator of the variance of the error terms
5. BLUE estimator and efficient estimator (FDCR bound)
6. Gauss-Markov theorem
Subsection 4.2. Asymptotic properties of the OLS estimator
4.2. Asymptotic properties

Objectives
The objectives of this subsection are the following:
1. Study the consistency of the OLS estimator.
2. Derive the asymptotic distribution of the OLS estimator $\hat{\beta}$.
3. Estimate the asymptotic variance covariance matrix of $\hat{\beta}$.
4.2. Asymptotic properties

Theorem (Consistency)
Consider the multiple linear regression model $y_i = x_i^\top \beta_0 + \varepsilon_i$, with $x_i = (x_{i1} : \ldots : x_{iK})^\top$. Under assumptions A1-A5, the OLS estimators $\hat{\beta}$ and $\hat{\sigma}^2$ are (weakly) consistent:
$$\hat{\beta} \xrightarrow{p} \beta_0 \qquad \hat{\sigma}^2 \xrightarrow{p} \sigma^2$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
We have:
$$\hat{\beta} = \left(\sum_{i=1}^N x_i x_i^\top\right)^{-1}\left(\sum_{i=1}^N x_i y_i\right) = \beta_0 + \left(\sum_{i=1}^N x_i x_i^\top\right)^{-1}\left(\sum_{i=1}^N x_i \varepsilon_i\right)$$
By multiplying and dividing by $N$, we get:
$$\hat{\beta} = \beta_0 + \left(\frac{1}{N}\sum_{i=1}^N x_i x_i^\top\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i\right)$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
$$\hat{\beta} = \beta_0 + \left(\frac{1}{N}\sum_{i=1}^N x_i x_i^\top\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i\right)$$
1. By using the (weak) law of large numbers (Khinchine's theorem), we have:
$$\frac{1}{N}\sum_{i=1}^N x_i x_i^\top \xrightarrow{p} \mathbb{E}\left(x_i x_i^\top\right) \qquad \frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i \xrightarrow{p} \mathbb{E}(x_i \varepsilon_i)$$
2. By using Slutsky's theorem:
$$\hat{\beta} \xrightarrow{p} \beta_0 + \mathbb{E}^{-1}\left(x_i x_i^\top\right)\mathbb{E}(x_i \varepsilon_i)$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
$$\hat{\beta} \xrightarrow{p} \beta_0 + \mathbb{E}^{-1}\left(x_i x_i^\top\right)\mathbb{E}(x_i \varepsilon_i)$$
Under assumption A3 (exogeneity), $\mathbb{E}(\varepsilon_i \mid x_i) = 0$ implies:
$$\mathbb{E}(\varepsilon_i x_i) = 0_{K \times 1} \qquad \mathbb{E}(\varepsilon_i) = 0$$
We therefore have $\hat{\beta} \xrightarrow{p} \beta_0$: the OLS estimator $\hat{\beta}$ is (weakly) consistent.
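A minimal Matlab sketch of this consistency result (illustrative data generating process): the distance between $\hat{\beta}$ and $\beta_0$ shrinks as $N$ grows.

```matlab
% Consistency: beta_hat approaches beta0 as the sample size N grows
rng(4);
beta0 = [1; 0.5];
for N = [100 1000 10000 100000]
    x = randn(N,1);                 % stochastic regressor
    X = [ones(N,1) x];
    y = X*beta0 + randn(N,1);       % disturbances independent of x (A3 holds)
    fprintf('N = %6d   ||beta_hat - beta0|| = %.4f\n', N, norm(X\y - beta0));
end
```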
4.2. Asymptotic properties

Theorem (Asymptotic distribution)
Consider the multiple linear regression model $y_i = x_i^\top \beta_0 + \varepsilon_i$, with $x_i = (x_{i1} : \ldots : x_{iK})^\top$. Under assumptions A1-A5, the OLS estimator $\hat{\beta}$ is asymptotically normally distributed:
$$\sqrt{N}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} \mathcal{N}\left(0,\ \sigma^2\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)\right)$$
or equivalently
$$\hat{\beta} \overset{asy}{\sim} \mathcal{N}\left(\beta_0,\ \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)\right)$$
4.2. Asymptotic properties

Comments
1. Whatever the specification used (parametric or semi-parametric), it is always possible to characterize the asymptotic distribution of the OLS estimator.
2. Under assumptions A1-A5, the OLS estimator $\hat{\beta}$ is asymptotically normally distributed, even if $\varepsilon$ is not normally distributed.
3. This result (asymptotic normality) is valid whether $X$ is stochastic or nonstochastic.
4.2. Asymptotic properties

Proof (cf. chapter 1)
This proof was given in chapter 1 (Estimation theory).
1. Let us rewrite the OLS estimator as:
$$\hat{\beta} = \beta_0 + \left(\sum_{i=1}^N x_i x_i^\top\right)^{-1}\left(\sum_{i=1}^N x_i \varepsilon_i\right)$$
2. Normalize the vector $\hat{\beta} - \beta_0$:
$$\sqrt{N}\left(\hat{\beta} - \beta_0\right) = \left(\frac{1}{N}\sum_{i=1}^N x_i x_i^\top\right)^{-1}\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i\right) = A_N^{-1} b_N$$
with
$$A_N = \frac{1}{N}\sum_{i=1}^N x_i x_i^\top \qquad b_N = \sqrt{N}\,\frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
3. Using the WLLN (Khinchine's theorem) and the CMT (continuous mapping theorem):
$$A_N^{-1} = \left(\frac{1}{N}\sum_{i=1}^N x_i x_i^\top\right)^{-1} \xrightarrow{p} \mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
Remark: if $x_i$ is nonstochastic, this part is useless.
4. Using the CLT:
$$b_N = \sqrt{N}\left(\frac{1}{N}\sum_{i=1}^N x_i \varepsilon_i - \mathbb{E}(x_i \varepsilon_i)\right) \xrightarrow{d} \mathcal{N}\left(0,\ V(x_i \varepsilon_i)\right)$$
Under assumption A3, $\mathbb{E}(\varepsilon_i \mid x_i) = 0$ implies $\mathbb{E}(x_i \varepsilon_i) = 0$ and
$$V(x_i \varepsilon_i) = \mathbb{E}\left(x_i \varepsilon_i \varepsilon_i x_i^\top\right) = \mathbb{E}_X\left(\mathbb{E}\left(x_i \varepsilon_i \varepsilon_i x_i^\top \mid x_i\right)\right) = \mathbb{E}_X\left(x_i V(\varepsilon_i \mid x_i)\, x_i^\top\right) = \sigma^2\,\mathbb{E}_X\left(x_i x_i^\top\right)$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
So we have $\sqrt{N}\left(\hat{\beta} - \beta_0\right) = A_N^{-1} b_N$ with:
$$A_N^{-1} \xrightarrow{p} A^{-1} = \mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
$$b_N \xrightarrow{d} \mathcal{N}(E_b, V_b) \qquad E_b = 0 \qquad V_b = \sigma^2\,\mathbb{E}_X\left(x_i x_i^\top\right) = \sigma^2 A$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
$$A_N^{-1} \xrightarrow{p} A^{-1} = \mathbb{E}_X^{-1}\left(x_i x_i^\top\right) \qquad b_N \xrightarrow{d} \mathcal{N}(E_b, V_b) \qquad E_b = 0 \qquad V_b = \sigma^2\,\mathbb{E}_X\left(x_i x_i^\top\right) = \sigma^2 A$$
By using Slutsky's theorem (for a convergence in distribution), we have:
$$A_N^{-1} b_N \xrightarrow{d} \mathcal{N}(E_c, V_c)$$
$$E_c = A^{-1} E_b = A^{-1} \cdot 0 = 0 \qquad V_c = A^{-1} V_b A^{-1} = A^{-1}\,\sigma^2 A\,A^{-1} = \sigma^2 A^{-1}$$
4.2. Asymptotic properties

Proof (cf. chapter 1)
$$\sqrt{N}\left(\hat{\beta} - \beta_0\right) = A_N^{-1} b_N \xrightarrow{d} \mathcal{N}\left(0,\ \sigma^2 A^{-1}\right)$$
with $A^{-1} = \mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$ (be careful: it is the inverse of the expectation and not the reverse). Finally, we have:
$$\sqrt{N}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} \mathcal{N}\left(0,\ \sigma^2\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)\right)$$
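A minimal Matlab sketch of the asymptotic normality result, assuming an illustrative data generating process with uniform (hence non-Gaussian) disturbances: the simulated covariance of $\sqrt{N}(\hat{\beta} - \beta_0)$ should be close to $\sigma^2\,\mathbb{E}_X^{-1}(x_i x_i^\top)$, which here equals $\sigma^2 I_2$ (standard normal regressor plus an intercept).

```matlab
% Asymptotic normality of sqrt(N)*(beta_hat - beta0) with non-Gaussian errors
rng(5);
N = 500; R = 5000; beta0 = [1; 0.5]; sigma2 = 2;
Z = zeros(R,2);
for r = 1:R
    x = randn(N,1);
    X = [ones(N,1) x];
    u = sqrt(12*sigma2)*(rand(N,1) - 0.5);    % uniform: mean 0, variance sigma2
    Z(r,:) = (sqrt(N)*(X\(X*beta0 + u) - beta0))';
end
disp(cov(Z));           % simulated asymptotic variance
disp(sigma2*eye(2));    % theoretical sigma^2 * inv(E(x_i x_i')) = sigma^2 * I_2
```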
4.2. Asymptotic properties

Remark
1. In some textbooks (Greene, 2007 for instance), the asymptotic variance covariance matrix of the OLS estimator is expressed as a function of the plim of $N^{-1} X^\top X$, denoted by $Q$:
$$Q = \operatorname{plim} \frac{1}{N} X^\top X$$
2. Both notations are equivalent since:
$$\frac{1}{N} X^\top X = \frac{1}{N}\sum_{i=1}^N x_i x_i^\top$$
By using the WLLN (Khinchine's theorem), we have:
$$\frac{1}{N} X^\top X \xrightarrow{p} Q = \mathbb{E}_X\left(x_i x_i^\top\right)$$
4.2. Asymptotic properties

Theorem (Asymptotic distribution)
Consider the multiple linear regression model $y = X\beta_0 + \varepsilon$. Under assumptions A1-A5, the OLS estimator $\hat{\beta}$ is asymptotically normally distributed:
$$\sqrt{N}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} \mathcal{N}\left(0,\ \sigma^2 Q^{-1}\right)$$
where
$$Q = \operatorname{plim} \frac{1}{N} X^\top X = \mathbb{E}_X\left(x_i x_i^\top\right)$$
or equivalently
$$\hat{\beta} \overset{asy}{\sim} \mathcal{N}\left(\beta_0,\ \frac{\sigma^2}{N} Q^{-1}\right)$$
4.2. Asymptotic properties

Definition (Asymptotic variance covariance matrix)
The asymptotic variance covariance matrix of the OLS estimator $\hat{\beta}$ is equal to
$$V_{asy}\left(\hat{\beta}\right) = \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
or equivalently
$$V_{asy}\left(\hat{\beta}\right) = \frac{\sigma^2}{N} Q^{-1}$$
where $Q = \operatorname{plim} N^{-1} X^\top X$.
4.2. Asymptotic properties

Remark
Finite sample variance:
$$V\left(\hat{\beta}\right) = \sigma^2\,\mathbb{E}_X\left(\left(X^\top X\right)^{-1}\right)$$
Asymptotic variance:
$$V_{asy}\left(\hat{\beta}\right) = \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
4.2. Asymptotic properties

Remark
How to estimate the asymptotic variance covariance matrix of the OLS estimator?
$$V_{asy}\left(\hat{\beta}\right) = \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
4.2. Asymptotic properties

Definition (Estimator of the asymptotic variance matrix)
A consistent estimator of the asymptotic variance covariance matrix is given by:
$$\hat{V}_{asy}\left(\hat{\beta}\right) = \hat{\sigma}^2\left(X^\top X\right)^{-1}$$
where $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$:
$$\hat{\sigma}^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K}$$
4.2. Asymptotic properties

Proof
We know that:
$$\frac{X^\top X}{N} = \frac{1}{N}\sum_{i=1}^N x_i x_i^\top \xrightarrow{p} \mathbb{E}_X\left(x_i x_i^\top\right) = Q \qquad \hat{\sigma}^2 \xrightarrow{p} \sigma^2$$
$$V_{asy}\left(\hat{\beta}\right) = \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
By using the CMT and Slutsky's theorem, we get:
$$\hat{\sigma}^2\left(\frac{X^\top X}{N}\right)^{-1} \xrightarrow{p} \sigma^2\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
4.2. Asymptotic properties

Proof (cont'd)
As a consequence:
$$\hat{\sigma}^2\left(\frac{X^\top X}{N}\right)^{-1} \xrightarrow{p} \sigma^2\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right)$$
Since $V_{asy}(\hat{\beta}) = \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}(x_i x_i^\top)$, dividing both sides by $N$ gives:
$$\frac{\hat{\sigma}^2}{N}\left(\frac{X^\top X}{N}\right)^{-1} = \hat{\sigma}^2\left(X^\top X\right)^{-1} \xrightarrow{p} \frac{\sigma^2}{N}\,\mathbb{E}_X^{-1}\left(x_i x_i^\top\right) = V_{asy}\left(\hat{\beta}\right)$$
or equivalently
$$\hat{V}_{asy}\left(\hat{\beta}\right) \xrightarrow{p} V_{asy}\left(\hat{\beta}\right)$$
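In Matlab, the estimator $\hat{V}_{asy}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$ and the associated standard errors can be computed as follows (a sketch on simulated data with illustrative values):

```matlab
% Estimated asymptotic variance: V_hat = sigma2_hat * inv(X'X)
rng(6);
N = 300; K = 3;
X = [ones(N,1) randn(N,K-1)];
y = X*[1; 2; -1] + 1.5*randn(N,1);
beta_hat   = X\y;
e          = y - X*beta_hat;            % residuals (not the disturbances!)
sigma2_hat = (e'*e)/(N - K);            % consistent estimator of sigma^2
V_hat      = sigma2_hat*inv(X'*X);      % estimated asymptotic variance matrix
se         = sqrt(diag(V_hat));         % standard errors of the coefficients
disp([beta_hat se]);
```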
4.2. Asymptotic properties

Remark
Under assumptions A1-A5, for a stochastic matrix $X$, we have:
Finite sample: variance $V(\hat{\beta}) = \sigma^2\,\mathbb{E}_X\left((X^\top X)^{-1}\right)$, estimator $\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$
Asymptotic: variance $V_{asy}(\hat{\beta}) = \frac{\sigma^2}{N} Q^{-1}$ with $Q = \operatorname{plim} \frac{1}{N} X^\top X$, estimator $\hat{V}_{asy}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$
4.2. Asymptotic properties

Example (CAPM)
The empirical analogue of the CAPM is given by:
$$\tilde{r}_{it} = \alpha_i + \beta_i\,\tilde{r}_{mt} + \varepsilon_t$$
where $\tilde{r}_{it} = r_{it} - r_{ft}$ is the excess return of security $i$ at time $t$, $\tilde{r}_{mt} = r_{mt} - r_{ft}$ is the market excess return at time $t$, and $\varepsilon_t$ is an i.i.d. error term with:
$$\mathbb{E}(\varepsilon_t) = 0 \qquad V(\varepsilon_t) = \sigma^2 \qquad \mathbb{E}(\varepsilon_t \mid \tilde{r}_{mt}) = 0$$
Data file: capm.xls for Microsoft, SP500 and Tbill (closing prices) from 11/1/1993 to 04/03/2003.
4.2. Asymptotic properties

Example (CAPM, cont'd)
Question: write a Matlab code in order to find (1) the estimates of $\alpha$ and $\beta$ for Microsoft, (2) the corresponding standard errors, (3) the SE of the regression, (4) the SSR, (5) the coefficient of determination and (6) the adjusted $R^2$. Compare your results to the following table (Eviews); a possible sketch is given below.
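A possible Matlab sketch for this exercise. The column layout of capm.xls and the construction of the returns are assumptions (the slide does not specify them), so they may need to be adapted before comparing with the Eviews output.

```matlab
% CAPM exercise (sketch): OLS of Microsoft excess returns on market excess returns
raw = xlsread('capm.xls');             % assumed columns: [Microsoft SP500 Tbill]
r_i = diff(log(raw(:,1)));             % security (Microsoft) returns
r_m = diff(log(raw(:,2)));             % market (SP500) returns
r_f = raw(2:end,3);                    % risk-free rate (assumed already a return)
y   = r_i - r_f;                       % excess return of the security
X   = [ones(size(y)), r_m - r_f];      % constant and market excess return
N   = length(y); K = 2;
b   = X\y;                             % b = [alpha_hat; beta_hat]
e   = y - X*b;
SSR = e'*e;                            % sum of squared residuals
s2  = SSR/(N - K);                     % variance of the regression
se  = sqrt(diag(s2*inv(X'*X)));        % standard errors
SER = sqrt(s2);                        % standard error of the regression
R2  = 1 - SSR/sum((y - mean(y)).^2);   % coefficient of determination
R2a = 1 - (1 - R2)*(N - 1)/(N - K);    % adjusted R-squared
fprintf('alpha = %.4f (%.4f)  beta = %.4f (%.4f)\n', b(1), se(1), b(2), se(2));
fprintf('SER = %.4f  SSR = %.4f  R2 = %.4f  R2adj = %.4f\n', SER, SSR, R2, R2a);
```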
4.2. Asymptotic properties

[Eviews estimation output for the CAPM regression of Microsoft excess returns: table not recoverable from the text extraction.]
4.2. Asymptotic properties

[Additional estimation output/figure: not recoverable from the text extraction.]
4.2. Asymptotic properties

Key Concepts
1. The OLS estimator is weakly consistent.
2. The OLS estimator is asymptotically normally distributed.
3. Asymptotic variance covariance matrix.
4. Estimator of the asymptotic variance.
End of Chapter 3