# Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Size: px
Start display at page:

Download "Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model"

Transcription

1 Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written as (1) or () where xt is the tth row of the matrix X or simply as (3) where it is implicit that x t is a row vector containing the regressors for the tth time period. The classical assumptions on the model can be summarized as (4) Assumption V as written implies II and III. These assumptions are described as 1. linearity. zero mean of the error vector 3. scalar covariance matrix for the error vector 4. non-stochastic X matrix of full rank 5. normality of the error vector With normally distributed disturbances, the joint density (and therefore likelihood function) of y is

2 (5) The natural log of the likelihood function is given by (6) Maximum likelihood estimators are obtained by setting the derivatives of (6) equal to zero and solving the resulting k+1 equations for the k s and. These first order conditions for the M.L estimators are (7) Solving we obtain (8) The ordinary least squares estimator is obtained be minimizing the sum of squared errors which is defined by

3 3 (9) The necessary condition for to be a minimum is that (10) This gives the normal equations which can then be solved to obtain the least squares estimator (11) The maximum likelihood estimator of estimator is given as is the same as the least squares estimator. The distribution of this (1) We have shown that the least squares estimator is: 1. unbiased. minimum variance of all unbiased estimators 3. consistent 4. asymptotically normal 5. asymptotically efficient. In this section we will discuss how the statistical properties of crucially dependent upon the assumptions I-V. The discussion will proceed by dropping one assumption at a time and considering the consequences. Following a general discussion, later sections will analyze specific violations of the assumptions in detail.

4 4 B. Nonlinearity 1. nonlinearity in the variables only If the model is nonlinear in the variables, but linear in the parameters, it can still be estimated using linear regression techniques. For example consider a set of variables z = (z 1, z,...,z p ), a set of k functions h... h, and parameters. Now define the model: 1 k (13) 0 This model is linear in the parameters and can be estimated using standard techniques where the functions h take the place of the x variables in the standard model. i. intrinsic linearity in the parameters a. idea Sometimes models are nonlinear in the parameters. Some of these may be intrinsically linear, however. In the classical model, if the k parameters can be written as k one-to-one functions (perhaps nonlinear) of a set of k underlying parameters 1,..., k, then model is intrinsically linear in. b. example (14) The model is nonlinear in the parameter A, but since it is linear in, and is a one-to-one 0 function of A, the model is intrinsically linear. 3. inherently nonlinear models Models that are inherently nonlinear cannot be estimated using ordinary least squares and the previously derived formulas. Alternatives include Taylor's series approximations and direct nonlinear estimation. In the section on non-linear estimation we showed that the non-linear least squares estimator is: 1. consistent. asymptotically normal We also showed that the maximum likelihood estimator in a general non-linear model is 1. consistent. asymptotically normal 3. asymptotically efficient in the sense that within the consistent asymptotic normal (CAN) class it has minimum variance

5 If the distribution of the error terms in the non-linear least squares model is normal, and the errors are iid(o, ), then the non-linear least squares estimator and the maximum likelihood estimator will be the same, just as in the classical normal linear regression model. C. Non-zero expected value of error term (E( t) 0) Consider the case where has a non-zero expectation. The least squares estimators of is given by (15) 5 The expected value of is given as follows where (16) which appears to suggest that all of the least squares estimators in the vector are biased. However, if E( ) = for all t, then t (17) To interpret this consider the rules for matrix multiplication.

6 6 (18) Now consider (19) The first column of the X matrix is a column of ones. Therefore (0) Thus it is clear that

7 7 (1) and only the estimator of the intercept is biased. This situation can arise if a relevant and important factor has been omitted from the model, but the factor doesn't change over time. The effect of this variable is then included in the intercept and separate estimators of and can't be obtained. 1 More general violations lead to more serious problems and in general the least squares estimators of and are biased.

8 8 D. A non-scalar identity covariance matrix 1. introduction Assumption III implies that the covariance matrix of the error vector is a constant multiplied by the identity matrix. In general this covariance may be any positive definite matrix. Different assumptions about this matrix will lead to different properties of various estimators.. heteroskedasticity Heteroskedasticity is the case where the diagonal terms of the covariance matrix are not all equal, i.e. Var ( ) for all t t With heteroskedasticity alone the covariance matrix is given by () This model will have k + n parameters and cannot be estimated using n observations unless some assumptions (restrictions) about the parameters are made....

9 9 3. autocorrelation Autocorrelation is the case where the off-diagonal elements of the covariance matrix are not zero, i.e. Cov (, ) 0 for t s. With no autocorrelation, the errors have no discernible pattern. t s In the case above, positive levels of tend to be associated with positive levels and so on. With autocorrelation alone is given by (3) This model will have k (n(n-1)/) parameters and cannot be estimated using n observations unless some assumptions (restrictions) about the parameters are made.

10 10 4. the general linear model For situations in which autocorrelation or heteroskedasticity exists (4) and the model can be written more generally as (5) Assumption VI as written here allows X to be stochastic, but along with II, allows all results to be conditioned on X in a meaningful way. This model is referred to as the generalized normal linear regression model and includes the classical normal linear regression model as a special case, i.e., when = I. The unknown parameters in the generalized regression model are the 's = ( 1,..., k)', and the n(n + 1)/ independent elements of the covariance matrix. In general it is not possible to estimate unless simplifying assumptions are made since one cannot estimate k + [n(n+1)/] parameters with n observations. 5. Least squares estimations of in the general linear model with = known Least squares estimation makes no assumptions about the disturbance matrix and so is defined as before using the sum of squared errors. The sum of squared errors is defined by (6) The necessary condition for to be a minimum is that (7)

11 11 This gives the normal equations which can then be solved to obtain the least squares estimator (8) The least squares estimator is exactly the same as before. Its properties may be different, however, as will be shown in a later section. 6. Maximum likelihood estimation with known The likelihood function for the vector random variable y is given by the multivariate normal density. For this model Therefore the likelihood function is given by (9) (30) The natural log of the likelihood function is given as (31) The M.L.E. of is defined by maximizing 31 (3) This then yields as an estimator of (33) This estimator differs from the least squares estimator. Thus the least squares estimator will have different properties than the maximum likelihood estimator. Notice that if is equal to I, the estimators are the same.

12 1 (34) 7. Best linear unbiased estimation with known BLUE estimators are obtained by finding the best estimator that satisfies certain conditions. BLUE estimators have the properties of being linear, unbiased, and minimum variance among all linear unbiased estimators. Linearity and unbiasedness can be summarized as (35) The estimator must also be minimum variance. One definition of this is that the variance of each must be a minimum. The variance of the ith is given by the ith diagonal element of (36) This can be denoted as (37) where is the ith row of the matrix A, is and is given as where i is the ith row of an kxk identity matrix. The construction of the estimator can be reduced to selecting the matrix A so that the rows of A (38) Because the result will be symmetric for each (hence, for each a ), denote by where a is an (n by 1) vector. The problem then becomes: i i

13 13 (39) The column vector i is the ith column of the identity matrix. The Lagrangian is as follows (40) To minimize it take the derivatives with respect to a and (41) -1 Now substitute a' = (1/) 'X' into the second equation in 41 to obtain (4) It is obvious that AX = I. The BLUE and MLE estimators of are identical, but different from the least squares estimator of. We sometimes call the BLUE estimator of in the general linear model, the generalized least squares estimator, GLS. This estimator is also sometimes called the Aitken estimator after the individual who first proposed it. 8. A note on the distribution of a. introduction

14 14 For the Classical Normal Linear Regression Model we showed that (43) For the Generalized Regression Model (44) b. unbiasedness of ordinary least squares in the general linear model As before write in the following fashion. (45) Now take the expected value of equation 45 (46) Because is either fixed or a function only of X if X is stochastic, it can be factored out of the expectation, leaving E( X), which has an expectation of zero by assumption II. Now find the unconditional expectation of by using the law of iterated expectations. In the sense of Theorem 3 of the section on alternative estimators, h(x,y) is and E Y X computes the expected value of conditioned on X. The interpretation of this result is that for any particular set of observations, X, the least squares estimator has expectation. c. variance of the OLS estimator First rewrite equation 4 as follows (47)

15 15 (48) Now directly compute the variance of given X. (49) If the regressors are non-stochastic, then this is also the unconditional variance of regressors are stochastic, then the unconditional variance is given by. If the d. unbiasedness of MLE and BLUE in the general linear model First write the GLS estimator as follows (50) Now take the expected value of equation 50 (51) Because is either fixed or a function only of X if X is stochastic, it can be factored out of the expectation, leaving E( X), which has an expectation of zero by assumption II. Now find the unconditional expectation of by using the law of iterated expectations. The interpretation of this result is that for any particular set of observations, X, the generalized least squares estimator has expectation. e. variance of the GLS (MLE and BLUE) estimator (5)

16 16 First rewrite equation 50 as follows (53) Now directly compute the variance of given X. (54) If the regressors are non-stochastic, then this is also the unconditional variance of regressors are stochastic, then the unconditional variance is given by. If the f. summary of finite sample properties of OLS in the general model Note that all the estimators are unbiased estimators of, but. If = I then the classical model results are obtained. Thus using least squares in the generalized model gives unbiased estimators, but the variance of the estimator may not be minimal. 9. Consistency of OLS in the generalized linear regression model We have shown that the least squares estimator in the general model is unbiased. If we can show that its variance goes to zero as n goes to infinity we will have shown that it is mean square error consistent, and thus that is converges in probability to. This variance is given by (55) As previously, we will assume that (56) With this assumption, we need to consider the remaining term, i.e.,

17 17 (57) The leading term,, will, by itself go to zero. We can write the matrix term in the following useful fashion similar to the way we wrote out a matrix product in proving the asymptotic normality of the non-linear least squares estimator. Remember specifically that (58) where In similar fashion we can show that the matrix in equation 57 can be written as (59) The second term in equation 59 is a sum of n terms divided by n. In order to check convergence of this product, we need to consider the order of each term. Remember the definition of order given earlier. Definition of order: 1. A sequence {a } is at most of order n, which we denote O(n ) if is bounded. n When =0, {a } is bounded, and we also write a = O(1), which we say as big oh one. n n. A sequence {a } is of smaller order then n, which we denote o(n ) if. When =0, {a } converges to zero, and we also write a = o(1), which we say as little oh one. n n n The first term in the product is of order 1/n, O(1/n). The second term, in general is of O(n). So it appears that if the product of these two terms converges, it might converge to a matrix of non-zero constants. If this were the case, proving consistency would be a problem. At this point we will simply make an assumption as follows.

18 18 (60) If this is the case, then the expression in equation 59 will converge in the limit to zero, and will be consistent. Using arguments similar to those adopted previously, we can also show that the OLS estimator will be asymptotically normal in a wide variety of settings (Amemiya, 187). Discussion of the GLS estimator will be discussed later. 10. Consistent estimators of the covariance matrix in the case of general error structures a. general discussion If were known, then the estimator of the asymptotic covariance matrix of would be (61) The outer terms are available from the data, and if were known, we would have the information we need to compute standard errors. From a sample of n observations, there is no way to estimate the elements of. But what we really need is an estimator of which is a symmetric k k matrix. What we then need is an estimator of this matrix. We can write this is a more useful fashion as follows (6) where t is the appropriate element of as compared to. The idea will be to use information on the residuals from the least squares regression to devise a way to approximate Q. b. heteroskedasticity only In the case where there is no auto correlation, that is when is a diagonal matrix, we can write equation 6 as * (63) White has shown that under very general conditions, the estimator

19 19 (64) The proof is based on the fact that is a consistent estimate of, (meaning the residuals are -1 consistent estimates of ), and fairly mild assumptions on X. Then rather than using (X X) to estimated the variance of in the general model, we instead use (65) c. autocorrelation In the case of a more general covariance matrix, a candidate estimator for Q * might be (66) The difficulty here is that to this matrix may not converge in the limit. To obtain convergence, it is necessary to assume that the terms involving unequal subscripts in (66) diminish in importance as n grows. A sufficient condition is that terms with subscript pairs t - grow smaller as the distance between then grows larger. A more practical problem for estimation is that Q * may not be positive definite. Newey and West have proposed an estimator to solve this problem using some of the cross products e e. This estimator will be discussed in a later section. E. Stochastic X matrix (possibly less than full rank) 1. X matrix less than full rank t t- If the X matrix, which is nxk, has rank less than k then X'X cannot be inverted and the least squares estimator will not be defined. This was discussed in detail in the section on multicollinearity.. Stochastic X Consider the least squares estimator of in the classical model. We showed that it was unbiased as follows. (67) If the X matrix is stochastic and correlated with, we cannot factor it out of the second term in equation 48. If this is the case,. In such cases, the least squares estimator is usually not only

20 0 biased, but is inconsistent as well. Consider for example the case where converges to a finite and non-singular matrix Q. Then we can compute the probability limit of as (68) unless. We showed previously that a consistent estimator of could be obtained using instrumental variables (IV). The idea of instrumental variables is to devise an estimator of such that the second term in equation 49 will have a probability limit of zero. The instrumental variables estimator is based on the idea that the instruments used in the estimation are not highly correlated with and any correlation disappears in large samples. A further condition is that these instruments are correlated with variables in the matrix X. We defined instrumental variables estimators in two different cases, when the number of instrumental variables was equal to the number of columns of the X matrix, i.e., Z was n x k matrix, and cases where there were more than k instruments. In either case we assumed that had the following properties (69) Then the instrumental variables estimator was given by By finding the plim of this estimator, we showed that it was consistent (70) (71)

21 In the case where the number of instrumental variables was greater than k, we formed k instruments by projecting each of the columns of the stochastic X matrix on all of the instruments, and then used the predicted values of X from these regressions as instrumental variables in defining the IV estimator. If we let P Z be the matrix that projects orthogonally onto the column space defined by the vectors Z, S(Z), then the IV estimator is given by 1 (7) We always assume that the matrix X PZX has full rank, which is a necessary condition for identified. In a similar fashion to equation 5, we can show that this IV estimator is consistent. F. Random disturbances are not distributed normally (assumptions I-IV hold) 1. General discussion to be An inspection of the derivation of the least squares estimator reveals that the deduction is not dependent upon any of the assumptions II-V except for the full rank condition on X. It really doesn't depend on I, if we are simply estimating a linear model no matter the nature of the underlying model. Thus for the model the OLS estimator is always (73) (74) even when assumption V is dropped. However, the statistical properties of distribution of. are very sensitive to the Similarly, we note that while the BLUE of depends upon II-IV, is invariant with respect to the assumptions about the underlying density of as long as II-IV are valid. We can thus conclude that even when the error term is not normally distributed.. Properties of the estimators (OLS and BLUE) when is not normally distributed When the error terms in the linear regression model are not normally distributed, the OLS and BLUE estimators are: a. unbiased b. minimum variance of all unbiased linear estimators (not necessarily of all unbiased estimators since the Cramer Rao lower bound is not known unless we know the density of the residuals) c. consistent d. but standard t and F tests and confidence intervals are not necessarily valid for nonnormally (75)

22 distributed residuals The distribution of (e.g., normal, Beta, Chi Square, etc.) will depend on the distribution of which determines the distribution of y (y = X + ). The maximum likelihood estimator, of course, will differ since it depends explicitly on the joint density function of the residuals. And this joint density gives rise to the likelihood function (76) and requires a knowledge of the distribution of the random disturbances. It is not defined otherwise. MLE are generally efficient estimators and least squares estimators will be efficient if f(y; ) is normal. However, least squares need not be efficient if the residuals are not distributed normally.

23 3 3. example Consider the case in which the density function of the random disturbances is defined by the Laplace distribution (77) which can be graphically depicted as The associated likelihood function is defined by (78) where x t = (1, x t,..., x tk), ' = ( 1,..., k). The log likelihood function is given by (79) The MLE of in this case will minimize (80) and is sometimes called the "least lines," minimum absolute deviations (MAD), or least absolute deviation (LAD) estimator. It will have all the properties of maximum likelihood estimators such as being asymptotically unbiased, consistent, and asymptotically efficient. It need not, however, be unbiased, linear, or minimum variance of all unbiased estimators. 4. Testing for and using other distributions The functional form of the distribution of the residuals is rarely investigated. This can be done, however, by comparing the distribution of with the normal. t

24 Various tests have been proposed to test the assumption of normality. These tests take different forms. One class of tests is based on examining the skewness or kurtosis of the distribution of the estimated residuals. Chi square goodness of fit tests have been proposed which are based upon comparing the histogram of estimated residuals with the normal distribution. The Kolmogorov-Smirnov test is based upon the distribution of the maximum vertical distance between the cumulative histogram and the cumulative distribution of the hypothesized distribution. An alternative approach is to consider general distribution functions such as the beta or gamma which include many of the common alternative specifications as special cases. Literature Cited Aitken, A.C. On Least Squares and Liner Combinations of Observations. Proceedings of the Royal Statistical Society, 55, (1935):4-48 Amemiya, T. Advanced Econometrics. Cambridge: Harvard University Press, Huber, P. J., Robust Statistics, New York: Wiley, Newey, W., and K. West. A Simple Positive Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55 (1987): White, H. A Heteroskedasticity Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica 48 (1980):

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

### Chapter 3: The Multiple Linear Regression Model

Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances

METRON - International Journal of Statistics 2008, vol. LXVI, n. 3, pp. 285-298 SHALABH HELGE TOUTENBURG CHRISTIAN HEUMANN Mean squared error matrix comparison of least aquares and Stein-rule estimators

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Solution to Homework 2

Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

### 1 Short Introduction to Time Series

ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The

### Solving Linear Systems, Continued and The Inverse of a Matrix

, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Notes on Determinant

ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### 1 Another method of estimation: least squares

1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### 5.3 The Cross Product in R 3

53 The Cross Product in R 3 Definition 531 Let u = [u 1, u 2, u 3 ] and v = [v 1, v 2, v 3 ] Then the vector given by [u 2 v 3 u 3 v 2, u 3 v 1 u 1 v 3, u 1 v 2 u 2 v 1 ] is called the cross product (or

### T ( a i x i ) = a i T (x i ).

Chapter 2 Defn 1. (p. 65) Let V and W be vector spaces (over F ). We call a function T : V W a linear transformation form V to W if, for all x, y V and c F, we have (a) T (x + y) = T (x) + T (y) and (b)

### Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions What will happen if we violate the assumption that the errors are not serially

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### LS.6 Solution Matrices

LS.6 Solution Matrices In the literature, solutions to linear systems often are expressed using square matrices rather than vectors. You need to get used to the terminology. As before, we state the definitions

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-26-32 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Vector and Matrix Norms

Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### MATH 551 - APPLIED MATRIX THEORY

MATH 55 - APPLIED MATRIX THEORY FINAL TEST: SAMPLE with SOLUTIONS (25 points NAME: PROBLEM (3 points A web of 5 pages is described by a directed graph whose matrix is given by A Do the following ( points

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### CAPM, Arbitrage, and Linear Factor Models

CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, Linear Factor Models 1/ 41 Introduction We now assume all investors actually choose mean-variance e cient portfolios. By equating these investors

### Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### Estimating Industry Multiples

Estimating Industry Multiples Malcolm Baker * Harvard University Richard S. Ruback Harvard University First Draft: May 1999 Rev. June 11, 1999 Abstract We analyze industry multiples for the S&P 500 in

### 10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method

578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Solving Systems of Linear Equations

LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### 3.1 Least squares in matrix form

118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression

### Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

### Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

### Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### The Method of Least Squares

Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this

### Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Methods for Finding Bases

Methods for Finding Bases Bases for the subspaces of a matrix Row-reduction methods can be used to find bases. Let us now look at an example illustrating how to obtain bases for the row space, null space,

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

### Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

### The Bivariate Normal Distribution

The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included

### α = u v. In other words, Orthogonal Projection

Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v

### Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### 13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.

3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in three-space, we write a vector in terms

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### Department of Economics

Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### Autocovariance and Autocorrelation

Chapter 3 Autocovariance and Autocorrelation If the {X n } process is weakly stationary, the covariance of X n and X n+k depends only on the lag k. This leads to the following definition of the autocovariance

### Linear Models for Continuous Data

Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

### Time Series and Forecasting

Chapter 22 Page 1 Time Series and Forecasting A time series is a sequence of observations of a random variable. Hence, it is a stochastic process. Examples include the monthly demand for a product, the

### Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

### Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

### Estimating an ARMA Process

Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

### A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form

Section 1.3 Matrix Products A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form (scalar #1)(quantity #1) + (scalar #2)(quantity #2) +...

### Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

### Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

### 1 Solving LPs: The Simplex Algorithm of George Dantzig

Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.