RNy, econ460, 06 Lecture note

1. Regression equations and other model equations

Hendry and Nilsen (HN) start with the joint probability density function (pdf) for the observable random variables, for example in 5. in HN about the two-variable regression model. Extending the joint distribution of the data from the previous lecture, we can think of

    f(X_0, X_1, ..., X_k)        (1)

as the pdf for the k + 1 random variables (X_{0i}, X_{1i}, ..., X_{ki}). When we choose to use a regression model equation, we specify one random variable as the regressand and the others as regressors. Let Y ≡ X_0 define the regressand. We can always write the joint pdf as the product of a conditional pdf and a marginal pdf:

    f(Y, X_1, ..., X_k) = f(Y | X_1, ..., X_k) · f(X_1, ..., X_k)        (2)

Note that f(X_1, ..., X_k) is a joint pdf in its own right, but it is marginal relative to the full pdf on the left-hand side. From f(Y | X_1, ..., X_k) we can always construct the conditional expectation function

    E(Y | X_1, ..., X_k)        (3)

and define the disturbance ε as

    ε = Y − E(Y | X_1, ..., X_k).        (4)

For each realization of the X-variables, the conditional expectation E(Y | x_1, ..., x_k) is deterministic. But we can consider the expectation for any realization of X, and by the Law of Iterated Expectations we get

    E(ε | X_1, ..., X_k) = E([Y − E(Y | X_1, ..., X_k)] | X_1, ..., X_k) = 0
    ⟹ E(ε) = E(E(ε | X_1, ..., X_k)) = 0        (5)

Adding E(Y | X_1, ..., X_k) and ε together gives the regression model in equation form:

    Y = E(Y | X_1, ..., X_k) + ε        (6)

However, we keep in mind that E(Y | X_1, ..., X_k) need not be linear in the parameters, and that the specification of the functional form is often the main modelling task.

An important reference case is, however, when the joint pdf is a multivariate normal distribution. In that case the conditional expectation function is linear, as is assumed directly in the model specification on page 68 (5.) in HN: assumption ii), conditional normality. This assumption would have been implied if HN had assumed that the joint pdf on page 66 was a normal distribution; in that case the marginal pdf is also normal (a well-known property of the multivariate normal distribution). Conversely, when conditional normality is used in the model specification, nothing is implied about normality of the joint pdf. It can be anything, because the marginal pdf can also be anything. In that sense, the assumption of conditional normality is weaker than the assumption of joint normality. The same remarks apply to the model specification for multiple regression in Ch 7 of HN.
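As a small illustration of the normal reference case (a Python/NumPy sketch with made-up numbers, not taken from HN), the code below simulates a bivariate normal pair (Y, X), computes the linear conditional expectation E(Y | X) = μ_Y + (σ_YX/σ_XX)(X − μ_X), and checks that the disturbance ε = Y − E(Y | X) has mean zero and is uncorrelated with X, as in (4)-(5).

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])              # (mu_Y, mu_X), illustrative values
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])         # joint covariance of (Y, X)

Y, X = rng.multivariate_normal(mu, Sigma, size=100_000).T

b = Sigma[0, 1] / Sigma[1, 1]          # slope of the linear conditional expectation
cond_mean = mu[0] + b * (X - mu[1])    # E(Y | X), linear in X under joint normality
eps = Y - cond_mean                    # disturbance as defined in (4)

print(eps.mean())                      # close to 0, cf. (5)
print(np.corrcoef(eps, X)[0, 1])       # close to 0: eps is uncorrelated with X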

For k = 3, the equation form of the model becomes

    Y_i = β_1 + β_2 X_{2i} + β_3 X_{3i} + ε_i,    i = 1, 2, ..., n        (7)

as a consequence of the assumed conditional normality, but we could also derive the same equation if we had reason to assume that the joint pdf f(Y, X_1, X_2, X_3) was normal. Either way, ε_i is a normally distributed variable with expectation zero and constant variance. Note that in (7) we have added the usual subscript i for observation, and we have defined X_1 as a constant with value 1.

To be relevant, the parameters of the regression E(Y | X_1, ..., X_k), or parameters derived from the regression, should include our parameters of interest (focus parameters). Examples of parameters of interest from a regression model:

- The average response in Y to a change in one X_j.
- The best prediction of Y given x_1, ..., x_k.
- Short-run, medium-run and long-run multipliers (in dynamic regression models).
- Short-term and long-run forecasts (in dynamic regression models).

An alternative to regression modelling is instead to put the joint pdf f(Y, X_1, ..., X_k) on equation form. We call this system modelling, and it will be a central theme later in the course.

The distinction between regression model equations and other model equations can be subtle. We often focus on a single equation in the joint pdf, for example the demand function in a two-equation model for price and quantity in a product market. The model equation (for demand Y) will then look like a linear regression model:

    Y_i = γ_1 + γ_2 X_{2i} + γ_3 X_{3i} + ε_i,    i = 1, 2, ..., n,        (8)

but because E(ε_i | X_{2i}, X_{3i}) ≠ 0, it is not a regression equation but a structural equation which is part of a system-of-equations representation of the joint pdf.

2. Projection matrices and TSS and RSS, and all that

The matrix

    M = I − X(X'X)^{-1}X'        (9)

is often called the residual maker. That nickname is easy to understand, since

    My = (I − X(X'X)^{-1}X')y = y − X(X'X)^{-1}X'y = y − Xβ̂ = ε̂.

M plays a central role in many derivations. The following properties are worth noting (and showing for yourself):

    M' = M,    symmetric matrix        (10)
    MM = M,    idempotent matrix        (11)
    MX = 0,    regression of X on X gives a perfect fit        (12)

Another way of interpreting MX = 0 is that, since M produces the OLS residuals, M is orthogonal to X. M does not affect the residuals:

    Mε̂ = M(y − Xβ̂) = My − MXβ̂ = ε̂        (13)

Since y = Xβ + ε, we also have

    Mε = M(y − Xβ) = ε̂.        (14)

Note that this gives

    ε̂'ε̂ = ε'Mε        (15)

which shows that RSS is a quadratic form in the theoretical disturbances.
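A quick numerical check of (10)-(15) can be done with simulated data (a sketch with made-up numbers, not part of HN):

import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # first column is the constant
beta = np.array([1.0, 2.0, -0.5])                                # illustrative true parameters
eps = rng.normal(size=n)
y = X @ beta + eps

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T                 # the residual maker (9)
eps_hat = M @ y                                                  # My = OLS residuals

print(np.allclose(M, M.T))                                       # (10) symmetric
print(np.allclose(M @ M, M))                                     # (11) idempotent
print(np.allclose(M @ X, 0))                                     # (12) MX = 0
print(np.allclose(M @ eps_hat, eps_hat))                         # (13) M does not affect the residuals
print(np.allclose(eps_hat @ eps_hat, eps @ M @ eps))             # (15) RSS = eps'M eps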

A second important matrix in regression analysis is

    P = X(X'X)^{-1}X'        (16)

which is called the prediction matrix, since

    ŷ = Xβ̂ = X(X'X)^{-1}X'y = Py        (17)

P is also symmetric and idempotent. In linear algebra, M and P are both known as projection matrices. M and P are orthogonal:

    MP = PM = 0        (18)

M and P are also complementary projections:

    M = I − P,    M + P = I        (19)

which gives

    (M + P)y = Iy = y = Py + My = ŷ + ε̂        (20)

and therefore the scalar product y'y becomes

    y'y = (y'P' + y'M')(Py + My) = ŷ'ŷ + ε̂'ε̂

Written out, this is

    Σ_{i=1}^{n} Y_i² (TSS)  =  Σ_{i=1}^{n} Ŷ_i² (ESS)  +  Σ_{i=1}^{n} ε̂_i² (RSS)        (21)

You may be more used to writing this famous decomposition as

    Σ_{i=1}^{n} (Y_i − Ȳ)² (TSS)  =  Σ_{i=1}^{n} (Ŷ_i − Ŷ̄)² (ESS)  +  Σ_{i=1}^{n} ε̂_i² (RSS)        (22)

where the Y_i's and Ŷ_i's have been de-meaned, or centered (Ȳ and Ŷ̄ denote the respective sample means). As long as there is a constant in the model, there is no contradiction between (21) and (22), since X'ε̂ = X'(y − Xβ̂) = 0 as a result of OLS and, when there is an intercept in the regression, X = [ι  X_2]. The first row of X'ε̂ = 0 then gives

    ι'ε̂ = Σ_{i=1}^{n} ε̂_i = 0        (23)

Using this, you can show that (21) can be written as (22), and that

    Ȳ = Ŷ̄.        (24)
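The sketch below (made-up data, intercept included) verifies the uncentered decomposition (21), the centered decomposition (22), and the identities (23)-(24) numerically:

import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])      # intercept plus two regressors
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T                            # prediction matrix (16)
y_hat = P @ y
e_hat = y - y_hat

print(np.allclose(y @ y, y_hat @ y_hat + e_hat @ e_hat))        # (21) uncentered: y'y = ŷ'ŷ + ε̂'ε̂

TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y_hat.mean()) ** 2)
RSS = e_hat @ e_hat
print(np.allclose(TSS, ESS + RSS))                              # (22) centered decomposition

print(np.isclose(e_hat.sum(), 0.0))                             # (23) residuals sum to zero
print(np.isclose(y.mean(), y_hat.mean()))                       # (24) mean of fitted values equals Ȳ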

Alternative ways of writing RSS that sometimes come in handy:

    ε̂'ε̂ = (y'M')My = y'My = y'ε̂ = ε̂'y        (25)

    ε̂'ε̂ = y'y − y'P'Py = y'y − y'[X(X'X)^{-1}X']'[X(X'X)^{-1}X']y        (26)
         = y'y − y'X(X'X)^{-1}X'y = y'y − y'Xβ̂ = y'y − β̂'X'y.

R² can also be defined with reference to (22):

    R² = 1 − ε̂'ε̂ / Σ_{i=1}^{n}(Y_i − Ȳ)² = 1 − ε̂'ε̂ / ((M_ι y)'(M_ι y)),    M_ι = I − (1/n)ιι'        (27)

where we used the so-called centering matrix M_ι. It has the property that when we pre-multiply an n-dimensional vector by M_ι, we get a vector where the mean has been subtracted, which is the same as the residual from regressing a variable on a constant.

3. Partitioned regression

We now let

    β̂ = ( β̂_1 )
         ( β̂_2 )

where β̂_1 has k_1 parameters and β̂_2 has k_2 parameters, k_1 + k_2 = k. The corresponding partitioning of X is X = [X_1  X_2], where X_1 is n × k_1 and X_2 is n × k_2. The OLS estimated model with this notation is

    y = X_1β̂_1 + X_2β̂_2 + ε̂,    ε̂ = My.        (28)

We begin by writing out the normal equations (X'X)β̂ = X'y in terms of the partitioning:

    ( X_1'X_1   X_1'X_2 ) ( β̂_1 )     ( X_1'y )
    ( X_2'X_1   X_2'X_2 ) ( β̂_2 )  =  ( X_2'y )        (29)

where we have used

    [X_1  X_2]'[X_1  X_2] = ( X_1'X_1   X_1'X_2 )
                            ( X_2'X_1   X_2'X_2 ),

see page 0 and Exercise .9 in DM. The vector of OLS estimators can therefore be expressed as

    ( β̂_1 )     ( X_1'X_1   X_1'X_2 )^{-1} ( X_1'y )
    ( β̂_2 )  =  ( X_2'X_1   X_2'X_2 )      ( X_2'y )        (30)

We need a result for the inverse of a partitioned matrix. There are several versions, but probably the most useful for us is

    ( A_11   A_12 )^{-1}     (  A^{11}                   −A^{11} A_12 A_22^{-1} )
    ( A_21   A_22 )       =  ( −A^{22} A_21 A_11^{-1}     A^{22}                )        (31)

where

    A^{11} = (A_11 − A_12 A_22^{-1} A_21)^{-1}        (32)
    A^{22} = (A_22 − A_21 A_11^{-1} A_12)^{-1}.       (33)
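The block-inverse formula (31)-(33) is easy to check numerically; the sketch below does so for an arbitrary random positive definite matrix (an illustrative example, just to confirm the algebra):

import numpy as np

rng = np.random.default_rng(3)
k1, k2 = 2, 3
B = rng.normal(size=(k1 + k2, k1 + k2))
A = B @ B.T + (k1 + k2) * np.eye(k1 + k2)        # symmetric positive definite, hence invertible

A11, A12 = A[:k1, :k1], A[:k1, k1:]
A21, A22 = A[k1:, :k1], A[k1:, k1:]

inv = np.linalg.inv
A11_s = inv(A11 - A12 @ inv(A22) @ A21)          # A^{11} in (32)
A22_s = inv(A22 - A21 @ inv(A11) @ A12)          # A^{22} in (33)

A_inv_blocks = np.block([
    [A11_s,                   -A11_s @ A12 @ inv(A22)],
    [-A22_s @ A21 @ inv(A11),  A22_s                 ],
])
print(np.allclose(A_inv_blocks, inv(A)))         # matches the ordinary inverse of A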

With the aid of (31)-(33) it can be shown that

    ( β̂_1 )     ( (X_1'M_2X_1)^{-1} X_1'M_2y )
    ( β̂_2 )  =  ( (X_2'M_1X_2)^{-1} X_2'M_1y )        (34)

where M_1 and M_2 are the two residual-makers:

    M_1 = I − X_1(X_1'X_1)^{-1}X_1'        (35)
    M_2 = I − X_2(X_2'X_2)^{-1}X_2'.       (36)

As an application, start with X = [X_1  X_2] = [ι  X_2], where ι is the n × 1 vector of ones, ι' = (1, 1, ..., 1), so X_1 = ι and

    X_2 = ( X_12  ⋯  X_1k )
          ( X_22  ⋯  X_2k )
          (  ⋮         ⋮  )
          ( X_n2  ⋯  X_nk )

is the n × (k − 1) matrix of the remaining regressors. Then

    M_1 = I − ι(ι'ι)^{-1}ι' = I − (1/n)ιι' = M_ι,

the centering matrix. Show that it is

    M_ι = ( 1 − 1/n    −1/n    ⋯    −1/n   )
          (  −1/n    1 − 1/n   ⋯    −1/n   )
          (    ⋮         ⋮     ⋱      ⋮    )
          (  −1/n      −1/n    ⋯  1 − 1/n  )

By using the general formula for partitioned regression above, we get

    β̂_2 = (X_2'M_1X_2)^{-1} X_2'M_1y = ([M_ι X_2]'[M_ι X_2])^{-1} [M_ι X_2]'y
        = ((X_2 − X̄_2)'(X_2 − X̄_2))^{-1} (X_2 − X̄_2)'y        (37)

where X̄_2 is the n × (k − 1) matrix with the variable means in the k − 1 columns. In the seminar set you are invited to show the same result for β̂_2 more directly, without the use of the results from partitioned regression, by re-writing the regression model.
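A small numerical sketch of (37) (made-up data): regressing y on the column-centered X_2 alone reproduces the slope coefficients from the full regression on [ι  X_2].

import numpy as np

rng = np.random.default_rng(4)
n, k2 = 100, 3
X2 = rng.normal(size=(n, k2))
y = 1.0 + X2 @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Full regression on [ι, X_2]
X = np.column_stack([np.ones(n), X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# (37): regression on the centered regressors M_ι X_2 only
X2c = X2 - X2.mean(axis=0)                       # column means removed
beta_2 = np.linalg.solve(X2c.T @ X2c, X2c.T @ y)

print(np.allclose(beta_full[1:], beta_2))        # the slope estimates coincide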

4. Frisch-Waugh-Lovell (FWL) theorem

The expression for β̂_1 in (34) suggests that there is another simple method for finding β̂_1 that involves M_2: we premultiply the model (28), first with M_2 and then with X_1':

    M_2y = M_2X_1β̂_1 + ε̂        (38)
    X_1'M_2y = X_1'M_2X_1β̂_1        (39)

where we have used M_2X_2 = 0 and M_2ε̂ = ε̂ in (38), and X_1'ε̂ = 0 in (39), since the influence of the k_2 second variables has already been regressed out. We can find β̂_1 from this normal equation:

    β̂_1 = (X_1'M_2X_1)^{-1} X_1'M_2y        (40)

It can be shown that the covariance matrix is

    Cov(β̂_1) = σ̂²(X_1'M_2X_1)^{-1},

where σ̂² denotes the unbiased estimator of the common variance of the disturbances:

    σ̂² = ε̂'ε̂ / (n − k).

The corresponding expressions for β̂_2 are

    β̂_2 = (X_2'M_1X_2)^{-1} X_2'M_1y        (41)
    Cov(β̂_2) = σ̂²(X_2'M_1X_2)^{-1}.

Interpretation of (40): use the symmetry and idempotency of M_2 to write

    β̂_1 = ((M_2X_1)'(M_2X_1))^{-1} (M_2X_1)'M_2y

Here, M_2y is the residual vector ε̂_{Y·X_2}, and M_2X_1 is the matrix with the k_1 residual vectors obtained from regressing each variable in X_1 on X_2, ε̂_{X_1·X_2}. But this means that β̂_1 can be obtained by first running these two auxiliary regressions, saving the residuals in ε̂_{Y·X_2} and ε̂_{X_1·X_2}, and then running a second regression with ε̂_{Y·X_2} as regressand and ε̂_{X_1·X_2} as the matrix of regressors:

    β̂_1 = ((M_2X_1)'(M_2X_1))^{-1} (M_2X_1)'(M_2y) = (ε̂_{X_1·X_2}'ε̂_{X_1·X_2})^{-1} ε̂_{X_1·X_2}'ε̂_{Y·X_2}        (42)

We have, in fact, shown the Frisch-Waugh(-Lovell) theorem! If we want to estimate the partial effects on Y of changes in the variables in X_1 (i.e., controlled for the influence of X_2), we can do that in two ways: either by estimating the full multivariate model y = X_1β̂_1 + X_2β̂_2 + ε̂, or by following the above two-step procedure of obtaining residuals with X_2 as regressor and then obtaining β̂_1 from (42) (this equivalence is illustrated numerically in the sketch below).

Other applications:
- Trend correction
- Seasonal adjustment by seasonal dummy variables
- Leverage (or influential observations)
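A minimal numerical sketch (made-up data) of the FWL equivalence in (42): the coefficients on X_1 from the full regression coincide with those from regressing the residuals ε̂_{Y·X_2} on ε̂_{X_1·X_2}.

import numpy as np

rng = np.random.default_rng(5)
n, k1 = 120, 2
X1 = rng.normal(size=(n, k1))
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])      # includes a constant
y = X1 @ np.array([1.5, -0.7]) + X2 @ np.array([0.3, 0.8, -1.2]) + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# 1) Full multivariate regression of y on [X_1, X_2]
beta_full = ols(np.column_stack([X1, X2]), y)
beta1_full = beta_full[:k1]

# 2) Two-step FWL: partial X_2 out of both y and X_1, then regress residuals on residuals
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
e_y_x2 = M2 @ y            # residual vector from regressing y on X_2
e_x1_x2 = M2 @ X1          # one residual vector per column of X_1
beta1_fwl = ols(e_x1_x2, e_y_x2)

print(np.allclose(beta1_full, beta1_fwl))        # identical, as (42) claims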