# Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model


September 2004

## A. Introduction and assumptions

The classical normal linear regression model can be written as

$$y_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t \qquad (1)$$

or

$$y = X\beta + \varepsilon \qquad (2)$$

where $x_t$ is the $t$th row of the matrix $X$, or simply as

$$y_t = x_t \beta + \varepsilon_t \qquad (3)$$

where it is implicit that $x_t$ is a row vector containing the regressors for the $t$th time period. The classical assumptions on the model can be summarized as

$$\begin{aligned}
\text{I.} \quad & y = X\beta + \varepsilon \\
\text{II.} \quad & E(\varepsilon) = 0 \\
\text{III.} \quad & E(\varepsilon\varepsilon') = \sigma^2 I_n \\
\text{IV.} \quad & X \text{ is non-stochastic with rank } k \\
\text{V.} \quad & \varepsilon \sim N(0, \sigma^2 I_n)
\end{aligned} \qquad (4)$$

Assumption V as written implies II and III. These assumptions are described as

1. linearity
2. zero mean of the error vector
3. scalar covariance matrix for the error vector
4. non-stochastic $X$ matrix of full rank
5. normality of the error vector

With normally distributed disturbances, the joint density (and therefore likelihood function) of $y$ is

$$f(y;\beta,\sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right] \qquad (5)$$

The natural log of the likelihood function is given by

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta) \qquad (6)$$

Maximum likelihood estimators are obtained by setting the derivatives of (6) equal to zero and solving the resulting $k+1$ equations for the $k$ $\beta$'s and $\sigma^2$. These first-order conditions for the M.L. estimators are

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}X'(y - X\beta) = 0, \qquad
\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta) = 0 \qquad (7)$$

Solving, we obtain

$$\tilde\beta = (X'X)^{-1}X'y, \qquad \tilde\sigma^2 = \frac{(y - X\tilde\beta)'(y - X\tilde\beta)}{n} \qquad (8)$$

The ordinary least squares estimator is obtained by minimizing the sum of squared errors, which is defined by

$$SSE(\beta) = (y - X\beta)'(y - X\beta) \qquad (9)$$

The necessary condition for $\hat\beta$ to be a minimum is that

$$\frac{\partial SSE}{\partial \beta} = -2X'(y - X\beta) = 0 \qquad (10)$$

This gives the normal equations $X'X\hat\beta = X'y$, which can then be solved to obtain the least squares estimator

$$\hat\beta = (X'X)^{-1}X'y \qquad (11)$$

The maximum likelihood estimator of $\beta$ is the same as the least squares estimator. The distribution of this estimator is given as

$$\hat\beta \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}\right) \qquad (12)$$

We have shown that the least squares estimator is:

1. unbiased
2. minimum variance of all unbiased estimators
3. consistent
4. asymptotically normal
5. asymptotically efficient

In this section we will discuss how the statistical properties of $\hat\beta$ depend crucially upon assumptions I–V. The discussion will proceed by dropping one assumption at a time and considering the consequences. Following a general discussion, later sections will analyze specific violations of the assumptions in detail.
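The estimators in (8) and (11) are easy to verify numerically. Below is a minimal sketch (assuming NumPy is available; the data and coefficient values are simulated for illustration) that solves the normal equations and computes both the ML variance estimator, which divides by $n$, and the familiar unbiased version, which divides by $n - k$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)   # sigma^2 = 0.09

# Normal equations: (X'X) b = X'y
bhat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ bhat
sigma2_mle = (e @ e) / n        # ML estimator divides by n (biased downward)
sigma2_unb = (e @ e) / (n - k)  # unbiased estimator divides by n - k
```

Solving the normal equations directly with `np.linalg.solve` is both faster and numerically more stable than forming $(X'X)^{-1}$ explicitly.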

## B. Nonlinearity

### 1. Nonlinearity in the variables only

If the model is nonlinear in the variables, but linear in the parameters, it can still be estimated using linear regression techniques. For example, consider a set of variables $z = (z_1, z_2, \ldots, z_p)$, a set of $k$ functions $h_1, \ldots, h_k$, and parameters $\beta_1, \ldots, \beta_k$. Now define the model

$$y_t = \beta_1 h_1(z_t) + \beta_2 h_2(z_t) + \cdots + \beta_k h_k(z_t) + \varepsilon_t \qquad (13)$$

This model is linear in the parameters and can be estimated using standard techniques, where the functions $h_i$ take the place of the $x$ variables in the standard model.

### 2. Intrinsic linearity in the parameters

**a. idea**

Sometimes models are nonlinear in the parameters. Some of these may be intrinsically linear, however. In the classical model, if the $k$ parameters can be written as $k$ one-to-one functions (perhaps nonlinear) of a set of $k$ underlying parameters $\theta_1, \ldots, \theta_k$, then the model is intrinsically linear in $\theta$.

**b. example**

$$y_t = A e^{\beta_1 x_t + \varepsilon_t} \;\Longrightarrow\; \ln y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad \beta_0 = \ln A \qquad (14)$$

The model is nonlinear in the parameter $A$, but since it is linear in $\beta_0$, and $\beta_0$ is a one-to-one function of $A$, the model is intrinsically linear.

### 3. Inherently nonlinear models

Models that are inherently nonlinear cannot be estimated using ordinary least squares and the previously derived formulas. Alternatives include Taylor series approximations and direct nonlinear estimation. In the section on nonlinear estimation we showed that the nonlinear least squares estimator is:

1. consistent
2. asymptotically normal

We also showed that the maximum likelihood estimator in a general nonlinear model is

1. consistent
2. asymptotically normal
3. asymptotically efficient, in the sense that within the consistent asymptotically normal (CAN) class it has minimum variance
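The intrinsic-linearity idea above can be illustrated with an exponential model with multiplicative error: taking logs makes it linear in $\beta_0 = \ln A$, and $A$ is then recovered from the estimate of $\beta_0$. A sketch (assuming NumPy; the functional form and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 2, size=n)
A, b1 = 3.0, 0.8
y = A * np.exp(b1 * x + rng.normal(scale=0.1, size=n))  # multiplicative error

# Log transform: ln y = b0 + b1*x + eps, with b0 = ln A (one-to-one in A)
Z = np.column_stack([np.ones(n), x])
b0_hat, b1_hat = np.linalg.solve(Z.T @ Z, Z.T @ np.log(y))
A_hat = np.exp(b0_hat)   # recover A from the intrinsic parameter
```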

If the distribution of the error terms in the nonlinear least squares model is normal, and the errors are iid$(0, \sigma^2)$, then the nonlinear least squares estimator and the maximum likelihood estimator will be the same, just as in the classical normal linear regression model.

## C. Non-zero expected value of the error term ($E(\varepsilon_t) \neq 0$)

Consider the case where $\varepsilon$ has a non-zero expectation. The least squares estimator of $\beta$ is given by

$$\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon \qquad (15)$$

The expected value of $\hat\beta$ is given as follows

$$E(\hat\beta) = \beta + (X'X)^{-1}X'E(\varepsilon) \qquad (16)$$

which appears to suggest that all of the least squares estimators in the vector $\hat\beta$ are biased. However, if $E(\varepsilon_t) = \mu$ for all $t$, then

$$E(\hat\beta) = \beta + \mu\,(X'X)^{-1}X'\iota \qquad (17)$$

where $\iota$ is an $n \times 1$ column of ones. To interpret this, consider the rules for matrix multiplication.

$$X'\iota = \begin{pmatrix} \sum_t 1 \\ \sum_t x_{t2} \\ \vdots \\ \sum_t x_{tk} \end{pmatrix} \qquad (18)$$

Now consider

$$\iota = X e_1, \qquad e_1 = (1, 0, \ldots, 0)' \qquad (19)$$

The first column of the $X$ matrix is a column of ones. Therefore

$$(X'X)^{-1}X'\iota = (X'X)^{-1}X'X e_1 = e_1 \qquad (20)$$

Thus it is clear that

$$E(\hat\beta) = \beta + \mu e_1, \qquad \text{i.e.,}\quad E(\hat\beta_1) = \beta_1 + \mu, \quad E(\hat\beta_j) = \beta_j \;\; (j = 2, \ldots, k) \qquad (21)$$

and only the estimator of the intercept is biased. This situation can arise if a relevant and important factor has been omitted from the model, but the factor doesn't change over time. The effect of this variable is then included in the intercept, and separate estimators of $\beta_1$ and $\mu$ can't be obtained. More general violations lead to more serious problems, and in general the least squares estimators of $\beta$ and $\sigma^2$ are biased.
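The result that a common non-zero error mean is absorbed entirely by the intercept, leaving the slope estimator unbiased, can be illustrated by simulation. A sketch (assuming NumPy; $\mu$ and the coefficients are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, mu = 100, 2000, 0.7
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])    # fixed design across replications
beta = np.array([1.0, 2.0])

est = np.empty((reps, 2))
for r in range(reps):
    eps = mu + rng.normal(size=n)       # E(eps_t) = mu for every t
    y = X @ beta + eps
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)

bias = est.mean(axis=0) - beta
# intercept absorbs mu; the slope estimator remains (nearly) unbiased
```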

## D. A non-scalar identity covariance matrix

### 1. Introduction

Assumption III implies that the covariance matrix of the error vector is a constant multiplied by the identity matrix. In general this covariance may be any positive definite matrix. Different assumptions about this matrix will lead to different properties of various estimators.

### 2. Heteroskedasticity

Heteroskedasticity is the case where the diagonal terms of the covariance matrix are not all equal, i.e. $\mathrm{Var}(\varepsilon_t) = \sigma_t^2$, not the same for all $t$. With heteroskedasticity alone the covariance matrix is given by

$$E(\varepsilon\varepsilon') = \begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix} \qquad (22)$$

This model will have $k + n$ parameters and cannot be estimated using $n$ observations unless some assumptions (restrictions) about the parameters are made.

### 3. Autocorrelation

Autocorrelation is the case where the off-diagonal elements of the covariance matrix are not zero, i.e. $\mathrm{Cov}(\varepsilon_t, \varepsilon_s) \neq 0$ for $t \neq s$. With no autocorrelation, the errors have no discernible pattern; with positive autocorrelation, positive levels of $\varepsilon_t$ tend to be associated with positive levels of $\varepsilon_{t-1}$, and so on. With autocorrelation alone, $E(\varepsilon\varepsilon')$ is given by

$$E(\varepsilon\varepsilon') = \begin{pmatrix}
\sigma^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma^2
\end{pmatrix} \qquad (23)$$

This model will have $k + 1 + n(n-1)/2$ parameters and cannot be estimated using $n$ observations unless some assumptions (restrictions) about the parameters are made.
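As an illustration of non-zero off-diagonal covariances, errors generated by a first-order autoregression $\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$ (one common restriction that makes the covariance matrix estimable) have stationary lag-one covariance $\rho\,\sigma_u^2/(1 - \rho^2)$. A simulation sketch (assuming NumPy; $\rho$ and the sample sizes are illustrative):

```python
import numpy as np

# AR(1) errors: eps_t = rho * eps_{t-1} + u_t, u_t ~ N(0, 1)
rng = np.random.default_rng(9)
rho, n, reps = 0.6, 200, 4000
lag1 = []
for _ in range(reps):
    u = rng.normal(size=n)
    eps = np.empty(n)
    eps[0] = u[0] / np.sqrt(1 - rho**2)       # start from the stationary distribution
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
    lag1.append(np.mean(eps[1:] * eps[:-1]))  # sample Cov(eps_t, eps_{t-1})
cov_lag1 = np.mean(lag1)
# stationary theory: Cov(eps_t, eps_{t-1}) = rho / (1 - rho**2) = 0.9375 here
```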

### 4. The general linear model

For situations in which autocorrelation or heteroskedasticity exists,

$$E(\varepsilon\varepsilon') = \sigma^2 \Omega, \qquad \Omega \text{ positive definite} \qquad (24)$$

and the model can be written more generally as

$$y = X\beta + \varepsilon, \qquad E(\varepsilon \mid X) = 0, \qquad E(\varepsilon\varepsilon' \mid X) = \sigma^2 \Omega \qquad (25)$$

Assumption VI as written here allows $X$ to be stochastic but, along with II, allows all results to be conditioned on $X$ in a meaningful way. This model is referred to as the generalized normal linear regression model and includes the classical normal linear regression model as a special case, i.e., when $\Omega = I$. The unknown parameters in the generalized regression model are the $\beta$'s $= (\beta_1, \ldots, \beta_k)'$, $\sigma^2$, and the $n(n+1)/2$ independent elements of the covariance matrix. In general it is not possible to estimate them unless simplifying assumptions are made, since one cannot estimate $k + [n(n+1)/2]$ parameters with $n$ observations.

### 5. Least squares estimation of $\beta$ in the general linear model with $\Omega$ known

Least squares estimation makes no assumptions about the disturbance covariance matrix and so is defined as before using the sum of squared errors. The sum of squared errors is defined by

$$SSE(\beta) = (y - X\beta)'(y - X\beta) \qquad (26)$$

The necessary condition for $\hat\beta$ to be a minimum is that

$$\frac{\partial SSE}{\partial \beta} = -2X'(y - X\beta) = 0 \qquad (27)$$

This gives the normal equations, which can then be solved to obtain the least squares estimator

$$\hat\beta = (X'X)^{-1}X'y \qquad (28)$$

The least squares estimator is exactly the same as before. Its properties may be different, however, as will be shown in a later section.

### 6. Maximum likelihood estimation with $\Omega$ known

The likelihood function for the vector random variable $y$ is given by the multivariate normal density. For this model

$$y \sim N(X\beta,\; \sigma^2 \Omega) \qquad (29)$$

Therefore the likelihood function is given by

$$L = (2\pi\sigma^2)^{-n/2}\,|\Omega|^{-1/2} \exp\!\left[-\frac{1}{2\sigma^2}(y - X\beta)'\Omega^{-1}(y - X\beta)\right] \qquad (30)$$

The natural log of the likelihood function is given as

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\ln|\Omega| - \frac{1}{2\sigma^2}(y - X\beta)'\Omega^{-1}(y - X\beta) \qquad (31)$$

The M.L.E. of $\beta$ is defined by maximizing (31):

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}X'\Omega^{-1}(y - X\beta) = 0 \qquad (32)$$

This then yields as an estimator of $\beta$

$$\tilde\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y \qquad (33)$$

This estimator differs from the least squares estimator. Thus the least squares estimator will have different properties than the maximum likelihood estimator. Notice that if $\Omega$ is equal to $I$, the estimators are the same.
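The estimator in (33) can be computed directly when $\Omega$ is known. A minimal sketch with a known diagonal (heteroskedastic) $\Omega$, using simulated data and assumed parameter values (NumPy assumed available):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([0.5, 1.5])

# Known heteroskedastic Omega: Var(eps_t) proportional to (1 + x_t^2)
omega_diag = 1.0 + x**2
eps = rng.normal(scale=np.sqrt(omega_diag))
y = X @ beta + eps

Om_inv = np.diag(1.0 / omega_diag)   # Omega^{-1}, known here by construction
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_gls = np.linalg.solve(X.T @ Om_inv @ X, X.T @ Om_inv @ y)  # (X'Ω⁻¹X)⁻¹ X'Ω⁻¹ y
```

Both estimators are unbiased here; the GLS estimator differs because it downweights the high-variance observations.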

$$\tilde\beta \sim N\!\left(\beta,\; \sigma^2 (X'\Omega^{-1}X)^{-1}\right) \qquad (34)$$

### 7. Best linear unbiased estimation with $\Omega$ known

BLUE estimators are obtained by finding the best estimator that satisfies certain conditions. BLUE estimators have the properties of being linear, unbiased, and minimum variance among all linear unbiased estimators. Linearity and unbiasedness can be summarized as

$$\hat\beta = Ay, \qquad E(\hat\beta) = AX\beta = \beta \;\Longrightarrow\; AX = I \qquad (35)$$

The estimator must also be minimum variance. One definition of this is that the variance of each $\hat\beta_i$ must be a minimum. The variance of the $i$th $\hat\beta_i$ is given by the $i$th diagonal element of

$$\mathrm{Var}(\hat\beta) = \sigma^2 A\Omega A' \qquad (36)$$

This can be denoted as

$$\mathrm{Var}(\hat\beta_i) = \sigma^2 a_i'\Omega a_i \qquad (37)$$

where $a_i'$ is the $i$th row of the matrix $A$. The unbiasedness condition $AX = I$, taken row by row, is

$$a_i'X = i_i' \qquad (38)$$

where $i_i'$ is the $i$th row of a $k \times k$ identity matrix. The construction of the estimator can be reduced to selecting the matrix $A$ so that the rows of $A$ satisfy (38). Because the result will be symmetric for each $\hat\beta_i$ (hence, for each $a_i$), denote the $i$th row of $A$ by $a_i'$, where $a_i$ is an $(n \times 1)$ vector. The problem then becomes:

$$\min_{a_i} \; a_i'\Omega a_i \quad \text{subject to} \quad X'a_i = i_i \qquad (39)$$

The column vector $i_i$ is the $i$th column of the identity matrix. The Lagrangian is as follows

$$\mathcal{L} = a_i'\Omega a_i + \lambda'(i_i - X'a_i) \qquad (40)$$

To minimize it, take the derivatives with respect to $a_i$ and $\lambda$:

$$\frac{\partial \mathcal{L}}{\partial a_i} = 2\Omega a_i - X\lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = i_i - X'a_i = 0 \qquad (41)$$

Now substitute $a_i' = \tfrac{1}{2}\lambda'X'\Omega^{-1}$ into the second equation in (41) to obtain

$$\lambda = 2\,(X'\Omega^{-1}X)^{-1} i_i, \qquad a_i = \Omega^{-1}X(X'\Omega^{-1}X)^{-1} i_i \qquad (42)$$

Stacking the rows gives $A = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}$, and it is obvious that $AX = I$. The BLUE and MLE estimators of $\beta$ are identical, but different from the least squares estimator of $\beta$. We sometimes call the BLUE estimator of $\beta$ in the general linear model the generalized least squares estimator, GLS. This estimator is also sometimes called the Aitken estimator, after the individual who first proposed it.

### 8. A note on the distribution of $\hat\beta$

**a. introduction**

For the classical normal linear regression model we showed that

$$\hat\beta \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}\right) \qquad (43)$$

For the generalized regression model,

$$\hat\beta \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}X'\Omega X(X'X)^{-1}\right) \qquad (44)$$

**b. unbiasedness of ordinary least squares in the general linear model**

As before, write $\hat\beta$ in the following fashion.

$$\hat\beta = \beta + (X'X)^{-1}X'\varepsilon \qquad (45)$$

Now take the expected value of equation 45 conditional on $X$:

$$E(\hat\beta \mid X) = \beta + (X'X)^{-1}X'E(\varepsilon \mid X) = \beta \qquad (46)$$

Because $(X'X)^{-1}X'$ is either fixed or a function only of $X$ if $X$ is stochastic, it can be factored out of the expectation, leaving $E(\varepsilon \mid X)$, which has an expectation of zero by assumption II. Now find the unconditional expectation of $\hat\beta$ by using the law of iterated expectations: $E(\hat\beta) = E_X\!\left[E(\hat\beta \mid X)\right] = \beta$. In the sense of Theorem 3 of the section on alternative estimators, $h(x, y)$ is $\hat\beta$, and $E_{Y \mid X}$ computes the expected value of $\hat\beta$ conditioned on $X$. The interpretation of this result is that for any particular set of observations $X$, the least squares estimator has expectation $\beta$.

**c. variance of the OLS estimator**

First rewrite equation 45 as follows

$$\hat\beta - \beta = (X'X)^{-1}X'\varepsilon \qquad (47)$$

$$(\hat\beta - \beta)(\hat\beta - \beta)' = (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \qquad (48)$$

Now directly compute the variance of $\hat\beta$ given $X$.

$$\mathrm{Var}(\hat\beta \mid X) = E\!\left[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X\right] = \sigma^2 (X'X)^{-1}X'\Omega X(X'X)^{-1} \qquad (49)$$

If the regressors are non-stochastic, then this is also the unconditional variance of $\hat\beta$. If the regressors are stochastic, then the unconditional variance is given by $\sigma^2 E_X\!\left[(X'X)^{-1}X'\Omega X(X'X)^{-1}\right]$.

**d. unbiasedness of MLE and BLUE in the general linear model**

First write the GLS estimator as follows

$$\hat\beta_{GLS} = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon \qquad (50)$$

Now take the expected value of equation 50 conditional on $X$:

$$E(\hat\beta_{GLS} \mid X) = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(\varepsilon \mid X) = \beta \qquad (51)$$

Because $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}$ is either fixed or a function only of $X$ if $X$ is stochastic, it can be factored out of the expectation, leaving $E(\varepsilon \mid X)$, which has an expectation of zero by assumption II. Now find the unconditional expectation of $\hat\beta_{GLS}$ by using the law of iterated expectations. The interpretation of this result is that for any particular set of observations $X$, the generalized least squares estimator has expectation $\beta$.

**e. variance of the GLS (MLE and BLUE) estimator**

$$\mathrm{Var}(\hat\beta_{GLS} \mid X) = \sigma^2 (X'\Omega^{-1}X)^{-1} \qquad (52)$$
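The OLS sandwich variance and the GLS variance derived here can be compared numerically: by the Gauss–Markov argument of the BLUE derivation, their difference is positive semi-definite, so every GLS coefficient variance is at least as small. A deterministic sketch (assuming NumPy; the design and $\Omega$ are illustrative, with $\sigma^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Omega = np.diag(1.0 + 3.0 * x**2)     # heteroskedastic Omega, sigma^2 = 1

XtX_inv = np.linalg.inv(X.T @ X)
var_ols = XtX_inv @ X.T @ Omega @ X @ XtX_inv          # sandwich form for OLS
var_gls = np.linalg.inv(X.T @ np.linalg.inv(Omega) @ X)

# Var(OLS) - Var(GLS) should be positive semi-definite
diff = var_ols - var_gls
eigs = np.linalg.eigvalsh(diff)
```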

First rewrite equation 50 as follows

$$\hat\beta_{GLS} - \beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\varepsilon \qquad (53)$$

Now directly compute the variance of $\hat\beta_{GLS}$ given $X$.

$$\mathrm{Var}(\hat\beta_{GLS} \mid X) = E\!\left[(\hat\beta_{GLS} - \beta)(\hat\beta_{GLS} - \beta)' \mid X\right] = \sigma^2 (X'\Omega^{-1}X)^{-1} \qquad (54)$$

If the regressors are non-stochastic, then this is also the unconditional variance of $\hat\beta_{GLS}$. If the regressors are stochastic, then the unconditional variance is given by $\sigma^2 E_X\!\left[(X'\Omega^{-1}X)^{-1}\right]$.

**f. summary of finite sample properties of OLS in the general model**

Note that all the estimators are unbiased estimators of $\beta$, but $\mathrm{Var}(\hat\beta_{OLS}) \geq \mathrm{Var}(\hat\beta_{GLS})$. If $\Omega = I$ then the classical model results are obtained. Thus using least squares in the generalized model gives unbiased estimators, but the variance of the estimator may not be minimal.

### 9. Consistency of OLS in the generalized linear regression model

We have shown that the least squares estimator in the general model is unbiased. If we can show that its variance goes to zero as $n$ goes to infinity, we will have shown that it is mean square error consistent, and thus that it converges in probability to $\beta$. This variance is given by

$$\mathrm{Var}(\hat\beta \mid X) = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\Omega X}{n}\right)\left(\frac{X'X}{n}\right)^{-1} \qquad (55)$$

As previously, we will assume that

$$\lim_{n \to \infty} \frac{X'X}{n} = Q, \quad \text{a finite non-singular matrix} \qquad (56)$$

With this assumption, we need to consider the remaining term, i.e.,

$$\frac{\sigma^2}{n}\left(\frac{X'\Omega X}{n}\right) \qquad (57)$$

The leading term, $\sigma^2/n$, will by itself go to zero. We can write the matrix term in a useful fashion, similar to the way we wrote out a matrix product in proving the asymptotic normality of the nonlinear least squares estimator. Remember specifically that

$$X'X = \sum_{t=1}^{n} x_t' x_t \qquad (58)$$

where $x_t$ is the $t$th row of $X$. In similar fashion we can show that the matrix in equation 57 can be written as

$$\frac{X'\Omega X}{n} = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \omega_{ts}\, x_t' x_s \qquad (59)$$

The second term in equation 59 is a sum of $n^2$ terms divided by $n$. In order to check convergence of this product, we need to consider the order of each term. Remember the definition of order given earlier.

Definition of order:

1. A sequence $\{a_n\}$ is at most of order $n^\lambda$, which we denote $O(n^\lambda)$, if $n^{-\lambda}a_n$ is bounded. When $\lambda = 0$, $\{a_n\}$ is bounded, and we also write $a_n = O(1)$, which we say as "big oh one."
2. A sequence $\{a_n\}$ is of smaller order than $n^\lambda$, which we denote $o(n^\lambda)$, if $n^{-\lambda}a_n \to 0$. When $\lambda = 0$, $\{a_n\}$ converges to zero, and we also write $a_n = o(1)$, which we say as "little oh one."

The first term in the product, $\sigma^2/n$, is of order $1/n$, $O(1/n)$. The second term in general is of $O(n)$. So it appears that if the product of these two terms converges, it might converge to a matrix of non-zero constants. If this were the case, proving consistency would be a problem. At this point we will simply make an assumption as follows.

$$\lim_{n \to \infty} \frac{X'\Omega X}{n} \;\text{exists and is a finite matrix} \qquad (60)$$

If this is the case, then the expression in equation 55 will converge in the limit to zero, and $\hat\beta$ will be consistent. Using arguments similar to those adopted previously, we can also show that the OLS estimator will be asymptotically normal in a wide variety of settings (Amemiya, 1985). The GLS estimator will be discussed later.

### 10. Consistent estimators of the covariance matrix in the case of general error structures

**a. general discussion**

If $\Omega$ were known, then the estimator of the asymptotic covariance matrix of $\hat\beta$ would be

$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'(\sigma^2\Omega)X}{n}\right)\left(\frac{X'X}{n}\right)^{-1} \qquad (61)$$

The outer terms are available from the data, and if $\Omega$ were known, we would have the information we need to compute standard errors. From a sample of $n$ observations, there is no way to estimate all $n(n+1)/2$ elements of $\sigma^2\Omega$. But what we really need is an estimator of $Q^* = \frac{1}{n}X'(\sigma^2\Omega)X$, which is a symmetric $k \times k$ matrix. We can write this in a more useful fashion as follows

$$Q^* = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \sigma_{ts}\, x_t' x_s \qquad (62)$$

where $\sigma_{ts}$ is the appropriate element of $\sigma^2\Omega$. The idea will be to use information on the residuals from the least squares regression to devise a way to approximate $Q^*$.

**b. heteroskedasticity only**

In the case where there is no autocorrelation, that is, when $\Omega$ is a diagonal matrix, we can write equation 62 as

$$Q^* = \frac{1}{n}\sum_{t=1}^{n} \sigma_t^2\, x_t' x_t \qquad (63)$$

White has shown that under very general conditions, the estimator

$$S_0 = \frac{1}{n}\sum_{t=1}^{n} e_t^2\, x_t' x_t \qquad (64)$$

is a consistent estimator of $Q^*$. The proof is based on the fact that $\hat\beta$ is a consistent estimate of $\beta$ (meaning the residuals $e_t$ are consistent estimates of the $\varepsilon_t$), together with fairly mild assumptions on $X$. Then rather than using $\sigma^2(X'X)^{-1}$ to estimate the variance of $\hat\beta$ in the general model, we instead use

$$\widehat{\mathrm{Var}}(\hat\beta) = n\,(X'X)^{-1} S_0\, (X'X)^{-1} = (X'X)^{-1}\left(\sum_{t=1}^{n} e_t^2\, x_t' x_t\right)(X'X)^{-1} \qquad (65)$$

**c. autocorrelation**

In the case of a more general covariance matrix, a candidate estimator for $Q^*$ might be

$$\hat Q^* = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} e_t e_s\, x_t' x_s \qquad (66)$$

The difficulty here is that this matrix may not converge in the limit. To obtain convergence, it is necessary to assume that the terms involving unequal subscripts in (66) diminish in importance as $n$ grows. A sufficient condition is that terms with subscript pairs $t$, $t - \ell$ grow smaller as the distance $\ell$ between them grows larger. A more practical problem for estimation is that $\hat Q^*$ may not be positive definite. Newey and West have proposed an estimator to solve this problem using some of the cross products $e_t e_{t-\ell}$. This estimator will be discussed in a later section.

## E. Stochastic X matrix (possibly less than full rank)

### 1. X matrix less than full rank

If the $X$ matrix, which is $n \times k$, has rank less than $k$, then $X'X$ cannot be inverted and the least squares estimator will not be defined. This was discussed in detail in the section on multicollinearity.

### 2. Stochastic X

Consider the least squares estimator of $\beta$ in the classical model. We showed that it was unbiased as follows.

$$E(\hat\beta) = \beta + E\!\left[(X'X)^{-1}X'\varepsilon\right] = \beta \qquad (67)$$

If the $X$ matrix is stochastic and correlated with $\varepsilon$, we cannot factor it out of the second term in equation 67. If this is the case, $E(\hat\beta) \neq \beta$. In such cases, the least squares estimator is usually not only

biased, but is inconsistent as well. Consider for example the case where $X'X/n$ converges to a finite and non-singular matrix $Q$. Then we can compute the probability limit of $\hat\beta$ as

$$\mathrm{plim}\,\hat\beta = \beta + Q^{-1}\,\mathrm{plim}\!\left(\frac{X'\varepsilon}{n}\right) \qquad (68)$$

which differs from $\beta$ unless $\mathrm{plim}(X'\varepsilon/n) = 0$. We showed previously that a consistent estimator of $\beta$ could be obtained using instrumental variables (IV). The idea of instrumental variables is to devise an estimator of $\beta$ such that the second term in equation 68 will have a probability limit of zero. The instrumental variables estimator is based on the idea that the instruments used in the estimation are not highly correlated with $\varepsilon$, and any correlation disappears in large samples. A further condition is that these instruments are correlated with variables in the matrix $X$. We defined instrumental variables estimators in two different cases: when the number of instrumental variables was equal to the number of columns of the $X$ matrix, i.e., $Z$ was an $n \times k$ matrix, and cases where there were more than $k$ instruments. In either case we assumed that $Z$ had the following properties

$$\mathrm{plim}\frac{Z'\varepsilon}{n} = 0, \qquad \mathrm{plim}\frac{Z'X}{n} = Q_{ZX}, \;\text{finite and of full rank} \qquad (69)$$

Then the instrumental variables estimator was given by

$$\hat\beta_{IV} = (Z'X)^{-1}Z'y \qquad (70)$$

By finding the plim of this estimator, we showed that it was consistent:

$$\mathrm{plim}\,\hat\beta_{IV} = \beta + Q_{ZX}^{-1}\,\mathrm{plim}\!\left(\frac{Z'\varepsilon}{n}\right) = \beta \qquad (71)$$
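The contrast between the inconsistent OLS estimator and the consistent IV estimator of (70) can be seen in a simulation where the regressor shares a component with the error. A sketch (assuming NumPy; the data-generating design and coefficients are illustrative):

```python
import numpy as np

# x is correlated with the structural error u; the instrument z is not.
rng = np.random.default_rng(6)
n = 5000
z = rng.normal(size=n)                 # instrument: correlated with x, not with u
u = rng.normal(size=n)                 # structural error
x = z + u + rng.normal(size=n)         # regressor correlated with u
y = 2.0 + 1.5 * x + u                  # structural equation

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])   # same number of instruments as regressors

b_ols = np.linalg.solve(X.T @ X, X.T @ y)  # inconsistent: plim of slope is 1.5 + 1/3
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^{-1} Z'y, consistent
```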

In the case where the number of instrumental variables was greater than $k$, we formed $k$ instruments by projecting each of the columns of the stochastic $X$ matrix on all of the instruments, and then used the predicted values of $X$ from these regressions as instrumental variables in defining the IV estimator. If we let $P_Z = Z(Z'Z)^{-1}Z'$ be the matrix that projects orthogonally onto the column space defined by the vectors $Z$, $S(Z)$, then the IV estimator is given by

$$\hat\beta_{IV} = (X'P_Z X)^{-1}X'P_Z y \qquad (72)$$

We always assume that the matrix $X'P_Z X$ has full rank, which is a necessary condition for $\beta$ to be identified. In a similar fashion to equation 71, we can show that this IV estimator is consistent.

## F. Random disturbances are not distributed normally (assumptions I–IV hold)

### 1. General discussion

An inspection of the derivation of the least squares estimator reveals that the deduction does not depend upon any of the assumptions II–V except for the full rank condition on $X$. It does not really depend on I either, if we are simply fitting a linear model regardless of the nature of the underlying model. Thus for the model

$$y = X\beta + \varepsilon \qquad (73)$$

the OLS estimator is always

$$\hat\beta = (X'X)^{-1}X'y \qquad (74)$$

even when assumption V is dropped. However, the statistical properties of $\hat\beta$ are very sensitive to the distribution of $\varepsilon$.

Similarly, we note that while the BLUE of $\beta$ depends upon II–IV, it is invariant with respect to the assumptions about the underlying density of $\varepsilon$ as long as II–IV are valid. We can thus conclude that

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y \qquad (75)$$

even when the error term is not normally distributed.

### 2. Properties of the estimators (OLS and BLUE) when $\varepsilon$ is not normally distributed

When the error terms in the linear regression model are not normally distributed, the OLS and BLUE estimators are:

a. unbiased
b. minimum variance of all unbiased *linear* estimators (not necessarily of all unbiased estimators, since the Cramér–Rao lower bound is not known unless we know the density of the residuals)
c. consistent
d. but standard t and F tests and confidence intervals are not necessarily valid for non-normally distributed residuals

The distribution of $\hat\beta$ (e.g., normal, beta, chi-square, etc.) will depend on the distribution of $\varepsilon$, which determines the distribution of $y$ ($y = X\beta + \varepsilon$). The maximum likelihood estimator, of course, will differ, since it depends explicitly on the joint density function of the residuals. This joint density gives rise to the likelihood function

$$L = \prod_{t=1}^{n} f(\varepsilon_t) = f(y; \beta) \qquad (76)$$

and requires a knowledge of the distribution of the random disturbances. It is not defined otherwise. MLEs are generally efficient estimators, and least squares estimators will be efficient if $f(y; \beta)$ is normal. However, least squares need not be efficient if the residuals are not distributed normally.

### 3. Example

Consider the case in which the density function of the random disturbances is defined by the Laplace distribution

$$f(\varepsilon_t) = \frac{1}{2\sigma}\exp\!\left(-\frac{|\varepsilon_t|}{\sigma}\right) \qquad (77)$$

a symmetric density peaked at zero with fatter tails than the normal. The associated likelihood function is defined by

$$L = \prod_{t=1}^{n} \frac{1}{2\sigma}\exp\!\left(-\frac{|y_t - x_t\beta|}{\sigma}\right) \qquad (78)$$

where $x_t = (1, x_{t2}, \ldots, x_{tk})$ and $\beta' = (\beta_1, \ldots, \beta_k)$. The log likelihood function is given by

$$\ln L = -n\ln(2\sigma) - \frac{1}{\sigma}\sum_{t=1}^{n} |y_t - x_t\beta| \qquad (79)$$

The MLE of $\beta$ in this case will minimize

$$\sum_{t=1}^{n} |y_t - x_t\beta| \qquad (80)$$

and is sometimes called the "least lines," minimum absolute deviations (MAD), or least absolute deviations (LAD) estimator. It will have all the properties of maximum likelihood estimators, such as being asymptotically unbiased, consistent, and asymptotically efficient. It need not, however, be unbiased, linear, or minimum variance of all unbiased estimators.

### 4. Testing for and using other distributions

The functional form of the distribution of the residuals is rarely investigated. This can be done, however, by comparing the distribution of the estimated residuals $e_t$ with the normal.
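The LAD objective in (80) has no closed-form solution. One simple numerical approach — a sketch, not a method the notes themselves prescribe; production work would use linear programming or a quantile-regression routine — is iteratively reweighted least squares, which repeatedly solves a weighted least squares problem with weights $1/|e_t|$ (NumPy assumed; the data are simulated with Laplace errors as in the example):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.laplace(scale=1.0, size=n)    # Laplace ("double exponential") errors
y = X @ np.array([1.0, 2.0]) + eps

# LAD via iteratively reweighted least squares
b = np.linalg.solve(X.T @ X, X.T @ y)   # start from OLS
for _ in range(50):
    w = 1.0 / np.maximum(np.abs(y - X @ b), 1e-6)  # floor avoids division by zero
    XtW = X.T * w
    b = np.linalg.solve(XtW @ X, XtW @ y)
```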

Various tests have been proposed to test the assumption of normality. These tests take different forms. One class of tests is based on examining the skewness or kurtosis of the distribution of the estimated residuals. Chi-square goodness-of-fit tests have been proposed which are based upon comparing the histogram of estimated residuals with the normal distribution. The Kolmogorov–Smirnov test is based upon the distribution of the maximum vertical distance between the cumulative histogram and the cumulative distribution of the hypothesized distribution. An alternative approach is to consider general distribution functions, such as the beta or gamma, which include many of the common alternative specifications as special cases.

## Literature Cited

Aitken, A. C. "On Least Squares and Linear Combinations of Observations." Proceedings of the Royal Society of Edinburgh 55 (1935): 42–48.

Amemiya, T. Advanced Econometrics. Cambridge: Harvard University Press, 1985.

Huber, P. J. Robust Statistics. New York: Wiley, 1981.

Newey, W., and K. West. "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica 55 (1987): 703–708.

White, H. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica 48 (1980): 817–838.


### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### Introduction to Matrix Algebra I

Appendix A Introduction to Matrix Algebra I Today we will begin the course with a discussion of matrix algebra. Why are we studying this? We will use matrix algebra to derive the linear regression model

### OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance

Lecture 5: Hypothesis Testing What we know now: OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance (if the Gauss-Markov

### Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Solution to Homework 2

Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### 2.1: MATRIX OPERATIONS

.: MATRIX OPERATIONS What are diagonal entries and the main diagonal of a matrix? What is a diagonal matrix? When are matrices equal? Scalar Multiplication 45 Matrix Addition Theorem (pg 0) Let A, B, and

### Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

### Limitations of regression analysis

Limitations of regression analysis Ragnar Nymoen Department of Economics, UiO 8 February 2009 Overview What are the limitations to regression? Simultaneous equations bias Measurement errors in explanatory

### Restricted Least Squares, Hypothesis Testing, and Prediction in the Classical Linear Regression Model

Restrited Least Squares, Hypothesis Testing, and Predition in the Classial Linear Regression Model A. Introdution and assumptions The lassial linear regression model an be written as (1) or (2) where xt

### Notes on Determinant

ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

### 1.5 Elementary Matrices and a Method for Finding the Inverse

.5 Elementary Matrices and a Method for Finding the Inverse Definition A n n matrix is called an elementary matrix if it can be obtained from I n by performing a single elementary row operation Reminder:

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions What will happen if we violate the assumption that the errors are not serially

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Linear and Piecewise Linear Regressions

Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis

### Linear Systems. Singular and Nonsingular Matrices. Find x 1, x 2, x 3 such that the following three equations hold:

Linear Systems Example: Find x, x, x such that the following three equations hold: x + x + x = 4x + x + x = x + x + x = 6 We can write this using matrix-vector notation as 4 {{ A x x x {{ x = 6 {{ b General

### ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-26-32 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

### Solving Linear Systems, Continued and The Inverse of a Matrix

, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix.

MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix. Inverse matrix Definition. Let A be an n n matrix. The inverse of A is an n n matrix, denoted

### We call this set an n-dimensional parallelogram (with one vertex 0). We also refer to the vectors x 1,..., x n as the edges of P.

Volumes of parallelograms 1 Chapter 8 Volumes of parallelograms In the present short chapter we are going to discuss the elementary geometrical objects which we call parallelograms. These are going to

### Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

### Cointegration. Basic Ideas and Key results. Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board

Cointegration Basic Ideas and Key results Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana

### 1 Gaussian Elimination

Contents 1 Gaussian Elimination 1.1 Elementary Row Operations 1.2 Some matrices whose associated system of equations are easy to solve 1.3 Gaussian Elimination 1.4 Gauss-Jordan reduction and the Reduced

### 1 Another method of estimation: least squares

1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i

### EC 6310: Advanced Econometric Theory

EC 6310: Advanced Econometric Theory July 2008 Slides for Lecture on Bayesian Computation in the Nonlinear Regression Model Gary Koop, University of Strathclyde 1 Summary Readings: Chapter 5 of textbook.

### Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20

Matrix algebra January 20 Introduction Basics The mathematics of multiple regression revolves around ordering and keeping track of large arrays of numbers and solving systems of equations The mathematical

### Determinants. Dr. Doreen De Leon Math 152, Fall 2015

Determinants Dr. Doreen De Leon Math 52, Fall 205 Determinant of a Matrix Elementary Matrices We will first discuss matrices that can be used to produce an elementary row operation on a given matrix A.

### Block designs/1. 1 Background

Block designs 1 Background In a typical experiment, we have a set Ω of experimental units or plots, and (after some preparation) we make a measurement on each plot (for example, the yield of the plot).

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Matrix Inverse and Determinants

DM554 Linear and Integer Programming Lecture 5 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1 2 3 4 and Cramer s rule 2 Outline 1 2 3 4 and

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### 9 Matrices, determinants, inverse matrix, Cramer s Rule

AAC - Business Mathematics I Lecture #9, December 15, 2007 Katarína Kálovcová 9 Matrices, determinants, inverse matrix, Cramer s Rule Basic properties of matrices: Example: Addition properties: Associative:

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### 10.3 POWER METHOD FOR APPROXIMATING EIGENVALUES

55 CHAPTER NUMERICAL METHODS. POWER METHOD FOR APPROXIMATING EIGENVALUES In Chapter 7 we saw that the eigenvalues of an n n matrix A are obtained by solving its characteristic equation n c n n c n n...

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### 5.3 The Cross Product in R 3

53 The Cross Product in R 3 Definition 531 Let u = [u 1, u 2, u 3 ] and v = [v 1, v 2, v 3 ] Then the vector given by [u 2 v 3 u 3 v 2, u 3 v 1 u 1 v 3, u 1 v 2 u 2 v 1 ] is called the cross product (or

### Inverses and powers: Rules of Matrix Arithmetic

Contents 1 Inverses and powers: Rules of Matrix Arithmetic 1.1 What about division of matrices? 1.2 Properties of the Inverse of a Matrix 1.2.1 Theorem (Uniqueness of Inverse) 1.2.2 Inverse Test 1.2.3

### Maximum Likelihood Estimation of an ARMA(p,q) Model

Maximum Likelihood Estimation of an ARMA(p,q) Model Constantino Hevia The World Bank. DECRG. October 8 This note describes the Matlab function arma_mle.m that computes the maximum likelihood estimates