# Simultaneous Prediction of Actual and Average Values of Study Variable Using Stein-rule Estimators

Save this PDF as:

Size: px
Start display at page:

Download "Simultaneous Prediction of Actual and Average Values of Study Variable Using Stein-rule Estimators"

## Transcription

1 Shalabh & Christian Heumann Simultaneous Prediction of Actual and Average Values of Study Variable Using Stein-rule Estimators Technical Report Number 104, 2011 Department of Statistics University of Munich

2 Simultaneous Prediction of Actual and Average Values of Study variable Using Stein-rule Estimators Shalabh Department of Mathematics & Statistics Indian Institute of Technology, Kanpur (INDIA) Christian Heumann Institute of Statistics University of Munich, Munich (GERMANY) Abstract The simultaneous prediction of average and actual values of study variable in a linear regression model is considered in this paper. Generally, either of the ordinary least squares estimator or Stein-rule estimators are employed for the construction of predictors for the simultaneous prediction. A linear combination of ordinary least squares and Stein-rule predictors are utilized in this paper to construct an improved predictors. Their efficiency properties are derived using the small disturbance asymptotic theory and dominance conditions for the superiority of predictors over each other are analyzed. 1

3 1 Introduction: Traditionally the predictions from a linear regression model are made either for the actual values of study variable or for the average values. However, this may not be the case in many practical situations and one may be required to predict both the actual and average values of the study variable simultaneously; see, e.g. Rao et al. (2008), Shalabh (1995), and Zellner (1994). As an illustrative example, consider a new drug that promotes the duration of sleep in human beings. The manufacturer of such a drug will be more interested in knowing the average increase in the sleep duration by a specific dose, for example, in designing an advertisement or sale campaign and somewhat less interested in the actual increase of sleep duration. On the other strand, a user may be more interested in knowing the actual increase in sleep duration rather than the average duration. Suppose the statistician utilizes the theory of regression analysis for prediction. It is expected from the statistician to safeguard the interest of both the manufacturer and user who are interested in the prediction of the average and actual increase, respectively, although they may assign varying weight to prediction of actual and average increases of sleep attributable to the specific dose of new drug. The classical theory of prediction can predict either the actual value or the average value of study variable but not simultaneously. In view of the importance of simultaneous prediction of actual and average values of study variable in a linear regression model, Shalabh (1995), see also Rao et al. (2008), has presented a framework for the simultaneous prediction of actual and average values of study variable. Shalabh (1995) has examined the efficiency properties of predictions arising from least squares and Stein-rule estimation procedures. The work on the issue of simultaneous prediction has been extended in various directions from various perspectives in different models in the literature. Toutenburg and Shalabh (2000), Shalabh and Chandra (2002), 2

4 and Dube and Manocha (2002) analyzed the simultaneous prediction in restricted regression model, Chaturvedi and Singh (2000) and Chaturvedi et al. (2008) employed Stein-rule estimators for simultaneous prediction; Chaturvedi et al. (2002) discussed the issue of simultaneous prediction in a multivariate set up with an unknown covariance matrix of disturbance vector; Shalabh et al. (2008) considered the simultaneous prediction in measurement error models etc. In all such works, either the ordinary least squares (OLS) predictor or the Stein-rule (SR) predictor are utilized for predicting the actual and average values of study variable. They provide more efficient predictions under certain conditions depending on whether they are used for actual or average value predictions. So a natural question arises that can we utilize the good properties of the two predictors and obtain an improved estimator? Based on this, we have utilized the OLS and SR predictors together and have proposed two predictors in this paper. Their efficiency properties are derived and analyzed. The small disturbance asymptotic theory is utilized to derive the efficiency properties and dominance conditions for the superiority of predictors over each other are derived and analyzed. The plan of this paper is as follows. Section 2 provides these predictions and presents their motivation. Their properties are analyzed in Sections 3 and 4 employing the small disturbance asymptotic theory. Some concluding remarks are placed in Section 5. Finally, derivation of main results is outlined in Appendix. 2 Model Specification And Predictions: Let us postulate the following linear regression model: y = Xβ + u (2.1) where y is a n 1 vector of n observations on the study variable, X is a n p matrix of n observations on p(> 2) explanatory variables, β is a p 1 vector of p regression 3

5 coefficients and u is a n 1 vector of disturbances following a multivariate normal distribution with mean vector 0 and variance covariance matrix σ 2 I n. rank. It is assumed that the scalar σ 2 is unknown and the matrix X has full column When the simultaneous prediction of average values A v = E(y) and actual values A c = y within the simple is to be considered, we may define our target function as T = λa v + (1 λ)a c (2.2) where λ is a nonstochastic scalar between 0 and 1; see Shalabh (1995), Rao et al. (2008). The value of λ may reflect the weight being given to the prediction of average values in relation to the prediction of actual values. The least squares estimator of β is given by b L = (X X) 1 X y (2.3) which is the best linear unbiased estimator of β. Sometimes the properties like linearity and unbiasedness may not be desirable. Under such situation, it may be possible to obtain an estimator with reduced variability by relaxing the properties of linearity and unbiasedness. The family of Stein-rule estimators gives rise to such estimators. The Stein-rule estimator of β is defined by 2(p 2)k Hy b S = 1 (n p + 2).y b y L (2.4) Hy where H = X(X X) 1 X, H = (I H) and k is any positive nonstochastic scalar; see, e.g., Judge and Bock (1978), Saleh(2006). Based on (2.3) and (2.4), predictions for the values of the study variable are obtained by Xb L and Xb S which can be used for both the average values A v = E(y) as well as actual values A c = y. 4

6 For both the components A v and A c of T defined by (2.2), we may use Xb L so that the vector of predictions for T is given by T LL = Xb L. (2.5) Similarly, if we employ Xb S for both A v and A c, we find the vector of predictions as T SS = Xb s 2(p 2)k Hy = 1 (n p + 2).y Xb y L. (2.6) Hy On the other hand, if we use Xb L for A c and Xb S for A v in T, we get the vector of predictions as T SL = λxb S + (1 λ)xb L = 1 2(p 2)λk Hy (n p + 2).y y Hy b L. (2.7) Similarly, utilizing Xb L for A v and Xb S for A c in T, we find yet another vector of predictions T LS = λxb L + (1 λ)xb S = 1 2(p 2)(1 λ)k. y Hy (n p + 2) y Hy b L. (2.8) Our motivation underlying the formulation of (2.7) and (2.8) is as follows. If we compare Xb L and Xb S with respect to the criterion of total mean squared error, it is well known that Xb L is superior to Xb S for all positive values of k when the aim is to predict A c (the actual values of study variable ). When the aim is to predict A v (the average values of study variable), Xb S is superior to Xb L for positive values of k below one. Thus if we use superior predictions, i.e., Xb L for A c and Xb S for A v in T defined by (2.3), we get T SL. Conversely, if we consider predictions, i.e. Xb L for A V and Xb S for A c, it leads to T LS. 5

7 Looking at the expressions in (2.5), (2.6), (2.7) and (2.8). We may define a family of predictions for T as follows: P g = 1 2(p 2)gk Hy (n p + 2).y y Hy Xb L (2.9) where 0 g 1 is any nonstochastic scalar characterizing the predictions. Notice that assigning values 0, 1, λ and (1 λ) to g in (2.9) yield T LL, T SS, T SL and T LS respectively. Next, let us consider the problem of prediction outside the sample. For this purpose, let us assume to be given a matrix X f of m (e.g., future) values of explanatory values corresponding to which the values of study variable are to be predicted. Further, we assume that the regression relationship continues to remain valid so that we can write y f = X f β + u f (2.10) where y f denotes a m 1 vector of values of study variable to be predicted and u f is a m 1 vector of disturbances having same distributional properties as u in (2.1). Further, we assume that u and u f are stochastically independent. Defining the target function as T f = λe(y f ) + (1 λ)y f, (2.11) we can formulate the following vectors of predictions of T f in the spirit of (2.5)- (2.8): T fll = X f b L (2.12) T fss = X f b S (2.13) 2(p 2)λk Hy T fsl = 1 (n p + 2).y X y f b L (2.14) Hy T fls = 1 2(p 2)(1 λ)k. y Hy (n p + 2) y Hy 6 X f b L (2.15)

8 whence the following family of predictions can be defined: P fg = 1 2(p 2)g fk Hy (n p + 2).y X y f b L (2.16) Hy where 0 g f 1 is a nonstochastic scalar characterizing the predictions. If we set the value of g as 0,1,λ and (1 λ), we obtain (2.12), (2.13), (2.14) and (2.15) respectively as special cases. 3 Asymptotic Efficiency Properties of Predictions Within The Sample: It is easy to see that the predictions based on least squares are weakly unbiased in the sense that E(T LL T ) = 0. (3.1) Further, the second order moment matrix is E(T LL T )(T LL T ) = σ 2 λ 2 I n + (1 2λ) H. (3.2) Similar exact expressions for the bias vector and second order moment matrix of P g for any nonzero value of g can be derived following, for instance, Judge and Bock (1978) but they would be sufficiently intricate and would not permit to deduce any clear inference regarding the efficiency properties. We therefore propose to consider their asymptotic approximations employing the small disturbance asymptotic theory. Theorem I: The asymptotic approximation for the bias vector of P g for nonzero values of g to order O(σ 2 ) is B(P g ) = E(P g T ) 2(n p)(p 2)gk = σ 2 Xβ (3.3) (n p + 2)β X Xβ 7

9 while the difference between the second order moment matrices of P g and P o T LL to order O(σ 4 ) is given by D(P g ; P o ) = E(P o T )(P o T ) E(P g T )(P g T ) 4(n p)(p 2)gk = σ 4 XCX (3.4) (n p + 2)β X Xβ where C = λ(x X) 1 2λ + (p 2)gk ββ. (3.5) β X Xβ These results are derived in Appendix. From (3.3), we observe that P g is not weakly unbiased. However, if we define the norm of bias vector to the order of our approximation as 4(n p) B(P g ) B(P g ) = σ 4 2 (p 2) 2 g 2 k 2 (n p + 2) 2 β X Xβ (3.6) then with respect to the criterion of such a norm, P g is superior than P g for g less than g. In particular, both T SL and T LS are better than T SS for positive λ. Further, T SL is superior or inferior than T LS when λ is less or greater than 0.5. When λ = 0.5, i.e., equal weight is assigned to the prediction of actual and average values of study variable, both T LS and T SL are equally good. Next, let us compare the predictions with respect to the criterion of second order moment matrix to order O(σ 4 ). For this purpose, we utilize the following two results for any p 1 vector a and p p positive definite matrix A. Result I: The matrix (A 1 aa ) is positive definite if only only if a Aa is less than 1; see, e.g., Yancey, Judge and Bock (1974) for proof. Result II: The matrix (aa A 1 ) cannot be non-negative definite for p > 1; see, e.g., Gulkey and Price (1981). Applying Result I to matrix C given by (3.5), we observe that it cannot be positive definite whence it follows from (3.4) that P g cannot be superior to P o 8

10 with respect to the criterion of second order moment matrix to the order of our approximation. Similarly, using Result II, we find that the matrix C cannot be non-negative definite by virtue of our specification that p exceeds 2. In other words, P o cannot be superior to P g. It is thus seen that P g neither dominates P o nor is dominated by P o according to second order moment matrix criterion. For the comparison of P g and P g, we observe from (3.4) that D(P g ; P g ) = E(P g T )(P g T ) E(P g T )(P g T ) 4 4(n p)(p 2)gk = σ (n p + 2)β XXβ (g g ) λ(x X) 1 2λ + (g + g )k ββ. (3.7) β X Xβ Applying Result I and Result II, once again we find no clear dominance of P g over P g. Let us now compare the predictions with respect to the criterion of trace of second order moment matrix to order O(σ 4 ). From (3.4), we see that trd(p g ; P o ) = σ 4 4(n p)(p 2)2 gk (λ gk) (3.8) (n p + 2)β XXβ which is positive when k < λ g. (3.9) Thus P 1 T SS, P λ T SL and P 1 λ T LS are better than P o T LL when k is less than λ, 1 and (1 λ) respectively. Just the reverse is true, i.e., T LL beats T SS, T SL and T LS for k exceeding λ, 1 and (1 λ) respectively which holds true at least so long as k exceeds 1. 9

11 Table 1: Choice of k for the superiority of predictions Predictions T LL T SS T SL T LS T LL * k > λ k > 1 k > (1 λ) T SS k < λ k < λ 1+λ k < λ 2 λ T SL k < 1 k > λ 1+λ * k < λ for λ > 1 2 k > λ for λ < 1 2 T LS k < (1 λ) k > λ 2 λ The entry in the (i, j) th k < λ < 1 2 k > λ < 1 2 cell gives the condition for the superiority of predictions in the i th row over the predictions in the j th column. For example, the entry in (1, 2) th cell is which is the condition for the superiority of T LL over T SS. * Similarly, we observe from (3.7) that 4(n p)(p 2) trd(p g ; P g ) = σ 4 2 (g g )λ (g + g )k (3.10) (n p + 2)β X Xβ which is positive when k < k > ( ) λ g + g ( ) λ g + g for g > g (3.11) for g < g. (3.12) This result can be used to study the relative performance of T SL and T LS The findings are assembled in Table 1. It is interesting to observe that the superiority conditions presented in the tabular form are simple and easy to use in application. 10

12 4 Asymptotic Efficiency Properties Of Predictions Outside The Sample Let us now consider the predictions for T f specified by (2.11). It is easy to see that E(T fll T f ) = 0 (4.1) E(T fll T f )(T fll T f ) = σ (1 2 λ) 2 I n + X f (X X) 1 X f. (4.2) Assuming g to be different from zero, we observe from (2.16) that P fg is not weakly unbiased for T f unlike P fo T fll as is evident from (4.1). Theorem II: The asymptotic approximation for the bias vector of P fg to order O(σ 2 ) is B(P fg ) = E(P fg T f ) 2(n p)(p = σ 2 2)gf k X (n p + 2)β X f β (4.3) Xβ and the difference between the second order moment matrices of P fg T fll is and P fo D(P fg ; P fo ) = E(P fo T f )(P fo T f ) E(P fg T f )(P fg T f ) 4(n p)(p = σ 4 2)gf k X (n p + 2)β X f C f X f (4.4) Xβ where C f = (X X) (p 2)g fk ββ. (4.5) β X Xβ It is interesting to note from (4.3) and (4.4) that bias vector as well as the matrix difference is free from λ, at least to the order of our approximation. This means that our findings arising from these expressions remain valid whether the aim is to predict actual values or average values as well as both. 11

13 Taking the criterion to be the norm of bias vector to the order of our approximation, as in (3.6), it is observed that both T fsl and T fls are better than T fss. Further, T fsl is better than T fls when λ is less than 0.5. The reverse is true, i.e., T fls is better than T fsl when λ exceeds 0.5. If we choose the criterion to be second order moment matrix to order O(σ 4 ) and use the Results I and II stated in preceding Section, it is found that neither P fg is better than P fo nor vice-versa. Finally, let us take the criterion as trace of second order moment matrix to order O(σ 4 ). Proceeding in the same manner as indicated in the preceding Section, we can easily find the conditions for the superiority one over the other. These are assembled in Table 2. First we observe from (4.4) that 4(n p)(p 2)gf kβ X trd(p fg ; P fo ) = σ 4 f X fβ (n p + 2)(β X Xβ) 2 β X Xβ β X f X fβ tr(x X) 1 X fx f 2 (p 2)g f k. (4.6) The expression on the right hand side is positive when 1 β X Xβ k < (p 2)g f β X f X fβ tr(x X) 1 X fx f 2 (4.7) provided that the quantity in the square brackets is positive. If we define q 1 = q p = ( ) 1 1 p α i 1 p 2 α 1 i=2 ( ) p α i 1 p 2 α p i=1 (4.8) (4.9) with α 1 α 2... α p denoting the eigenvalues of X f X f in the metric of X X, 12

14 we observe that the condition (4.7) is satisfied at least as long as k < q p g f ; q p > 0. (4.10) This condition is easy to verify in any given application. cases of it provide the conditions stated in Table 2. Further, special On the other hand, the expression on the right hand side of (4.6) is negative so long as which lead to the results stated in the first row of Table 2. k > q 1 g f (4.11) Similarly, for the comparison of T fss, T fsl and T fls, we observe that 4(n p)(p 2)kβ X trd(p fg ; P fg ) = σ 4 f X fβ (g (n p + 2)(β X Xβ) 2 f gf) β X Xβ β X f X fβ tr(x X) 1 X fx f 2 k(p 2)(g f + gf). (4.12) The expression on the right hand side is positive implying the superiority of P fg over P fg when k < k > 1 (p 2)(g f + g f ) 1 (p 2)(g f + g f ) β X Xβ β X f X fβ tr(x X) 1 X fx f 2 ; g f > gf (4.13) β X Xβ β X f X fβ tr(x X) 1 X fx f 2 ; g f < gf (4.14) provided that the expression on the right hand side of (4.13) is positive. The conditions (4.13) and (4.14) will surely be satisfied so long as ( ) k < k < q p ; q g f + gf p > 0; g f > gf (4.15) ( ) q 1 g f + g f 13 ; g f > g f. (4.16)

15 These conditions provide the remaining entries in Table 2. 14

16 Table 2: Choice of k for the superiority of predictions Predictions TfLL TfSS TfSL TfLS TfLL * k > q1 k > q 1 k > q 1 λ 1 λ TfSS k < qp; qp > 0 k < ( qp 1+λ TfSL k < qp ; q λ p > 0 k > ( q1 1+λ TfLS k < q p ; q (1 λ) p > 0 k > ( q1 2 λ ) * ) k > ( q1 2 λ ) ; q p > 0 k < ( qp 2 λ) ; q p > 0 ) ; λ < 1 2 k < ( qp 2 λ) ; q p > 0; λ > 1 2 k < ( qp 2 λ) ; q p > 0; λ > 1 2 k > ( q1 2 λ ) ; λ < 1 2 The entry in the (i, j) th cell gives the condition for the superiority of predictions in the i th row over the predictions in the j th column as in Table 1. 15

17 5 Some Concluding Remarks: If we take the performance criterion to be total mean squared error, it is wellknown that least squares predictions are better than Stein-rule predictions for the actual values of study variables while the opposite is true, i.e., Stein-rule predictions under some mild constraints are better than the least squares predictions for average values of study variable. This observation has prompted us to present two predictions when the objective is to predict both the actual and average values simultaneously. The proposed predictions are based like Stein-rule predictions. However, if we look at the norms of bias vectors to the order of our approximation, both are found to be superior to Stein-rule predictions. Next, we have compared the predictions according to the criterion of second order moment matrix to the order of our approximation and have found that none of the four predictions is uniformly superior to the other. Finally, taking the criterion as trace of second order moment matrix, we have deduced conditions for the superiority of one over the other and have presented them in a tabular form. These conditions are elegant and easy to apply in developing efficient predictions. It may be remarked that our investigations can be easy extended on the lines of Ullah, Srivastava and Chandra (1983) to the case when the disturbances are not necessarily normally distributed. Appendix In order to find small disturbance asymptotic approximations for the bias vectors and mean squared error matrices, we replace u in (2.1) by σv so that v has a multivariate normal distribution with mean vector 0 and variance covariance 16

18 matrix I n. Thus we can express we find y Hy y Hy = σ 2 v Hv β X Xβ 1 + 2σ β X v β X Xβ + u Hu σ2 β X Xβ = σ 2 v Hv β X Xβ 2σ3 v Hv.β X v (β X Xβ) 2 + O p(σ 4 ). Using it in (2.9) and observing that 1 b L = β + σ(x X) 1 X v (5.1) (P g T ) = σ(λi n H)v 2σ 2 (p 2)gk v Hv (n p + 2)β X Xβ Xβ ( H σ 3 2(p 2)gk v Hv (n p + 2)β X Xβ Thus the bias vector of P g is given by 2 β X Xβ Xββ X ) v + O p (σ 4 ). E(P g T ) = 2σ 2 (n p)(p 2)gk (n p + 2)β X Xβ + O(σ3 ) (5.2) which provides the result (3.3) of Theorem I. Similarly, dropping the terms with expectation as null matrix, the mean squared error matrix of P g is E(P g T )(P g T ) = σ 2 (λi n H)E(vv )(λi n ( H) ) σ 4 2(p 2)gk 2 H (n p + 2)β X Xβ β X Xβ Xββ X E(v Hv.vv )(λi n H ) ( ) +(λi n H)E(v Hv.vv 2 ) H β X Xβ Xββ X 2(p 2)gk (n p + 2)β X Xβ E(v Hv) 2 Xββ X + O(σ 5 ) = σ 2 λ 2 I n + (1 2λ) H λh 4 4(n p)(p 2)gk σ (n p + 2)β X Xβ +O(σ 5 ) 17 2λ + (p 2)gk Xββ X β X Xβ

19 which leads to the result (3.4) of Theorem I. The results of Theorem II can be derived in a similar manner. References Chaturvedi, Anoop, Alan T. K. Wan, Shri P. Singh (2002): Improved multivariate prediction in a general linear model with an unknown error covariance matrix, Journal of Multivariate Analysis, 83, 1, Chaturvedi, Anoop and Shri Prakash Singh (2000): Stein rule prediction of the composite target function in a general linear regression model, Statistical Papers, 41, 3, Chaturvedi, Anoop, Suchita Kesarwani, Ram Chandra (2008): Simultaneous prediction based on shrinkage estimator in Recent Advances in Linear Models and Related Areas, (Eds. Shalabh and C.Heumann), , Springer. Dube, M. and Veena Manocha (2002): Simultaneous prediction in restricted regression models, Journal of Applied Statistical Science, 11, 4, Guilkey, D.K. and J.M. Price (1981): On comparing restricted least squares estimators Journal of Econometrics, 15, Rao, C. R., H. Toutenburg, Shalabh and C. Heumann (2008): : Linear Models and Generalizations - Least Squares and Alternatives, Springer. Judge, G.G. and M.E. Bock (1978): The Statistical Implications of Pre-Test and Stein- Rule Estimators in Econometrics, North-Holland. 18

20 Saleh, A.K.Md.E. (2006): Theory of Preliminary Test and Stein-Type Estimation with Applications, Wiley Shalabh (1995): Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression model Jan Tinbergen award article in Proceedings of the 50th session of the International Statistical Institute, Shalabh, Chandra Mani Paudel and Narinder Kumar (2008): Simultaneous prediction of actual and average values of response variable in replicated measurement error models in Recent Advances in Linear Models and Related Areas, (Eds. Shalabh and C.Heumann), , Springer. Shalabh and R. Chandra (2002): Prediction in Restricted Regression Models, Journal of Combinatorics, Information System and Sciences, Vol. 29, Nos. 1-4, pp Toutenburg, H. and Shalabh (2000): Improved Prediction in Linear Regression Model with Stochastic Linear Constraints, Biometrical Journal, 42, 1, Ullah, A., V.K. Srivastava and R. Chandra (1983): Properties of shrinkage estimators in Linear regression when disturbances are not normal Journal of Econometrics, 21, Yancey, T.A., G.G. Judge and M.E. Bock (1974): A mean square test when stochastic restrictions are used in regression Communications in Statistics, 3,

### Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances

METRON - International Journal of Statistics 2008, vol. LXVI, n. 3, pp. 285-298 SHALABH HELGE TOUTENBURG CHRISTIAN HEUMANN Mean squared error matrix comparison of least aquares and Stein-rule estimators

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### 1 Eigenvalues and Eigenvectors

Math 20 Chapter 5 Eigenvalues and Eigenvectors Eigenvalues and Eigenvectors. Definition: A scalar λ is called an eigenvalue of the n n matrix A is there is a nontrivial solution x of Ax = λx. Such an x

### EC327: Advanced Econometrics, Spring 2007

EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Appendix D: Summary of matrix algebra Basic definitions A matrix is a rectangular array of numbers, with m

### Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20

Matrix algebra January 20 Introduction Basics The mathematics of multiple regression revolves around ordering and keeping track of large arrays of numbers and solving systems of equations The mathematical

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

### ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

### Chapter 3: The Multiple Linear Regression Model

Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### ECONOMETRIC THEORY. MODULE I Lecture - 1 Introduction to Econometrics

ECONOMETRIC THEORY MODULE I Lecture - 1 Introduction to Econometrics Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur 2 Econometrics deals with the measurement

### Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following

### Block designs/1. 1 Background

Block designs 1 Background In a typical experiment, we have a set Ω of experimental units or plots, and (after some preparation) we make a measurement on each plot (for example, the yield of the plot).

### Chapter 6 Regression Analysis Under Linear Restrictions and Preliminary Test Estimation

Chapter 6 Regression Analysis Under Linear Restrictions and Preliminary Test Estimation One of the basic objectives in any statistical modeling is to find good estimators of the parameters. In the context

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### OLS in Matrix Form. Let y be an n 1 vector of observations on the dependent variable.

OLS in Matrix Form 1 The True Model Let X be an n k matrix where we have observations on k independent variables for n observations Since our model will usually contain a constant term, one of the columns

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

### Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part II)

Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part II) Florian Pelgrin HEC September-December 2010 Florian Pelgrin (HEC) Constrained estimators September-December

### Orthogonal Diagonalization of Symmetric Matrices

MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### 2 Testing under Normality Assumption

1 Hypothesis testing A statistical test is a method of making a decision about one hypothesis (the null hypothesis in comparison with another one (the alternative using a sample of observations of known

### Methods for Finding Bases

Methods for Finding Bases Bases for the subspaces of a matrix Row-reduction methods can be used to find bases. Let us now look at an example illustrating how to obtain bases for the row space, null space,

### The basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23

(copyright by Scott M Lynch, February 2003) Brief Matrix Algebra Review (Soc 504) Matrix algebra is a form of mathematics that allows compact notation for, and mathematical manipulation of, high-dimensional

### Using the Singular Value Decomposition

Using the Singular Value Decomposition Emmett J. Ientilucci Chester F. Carlson Center for Imaging Science Rochester Institute of Technology emmett@cis.rit.edu May 9, 003 Abstract This report introduces

### 3.1 Least squares in matrix form

118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### UNIT 2 MATRICES - I 2.0 INTRODUCTION. Structure

UNIT 2 MATRICES - I Matrices - I Structure 2.0 Introduction 2.1 Objectives 2.2 Matrices 2.3 Operation on Matrices 2.4 Invertible Matrices 2.5 Systems of Linear Equations 2.6 Answers to Check Your Progress

### Solving Systems of Linear Equations

LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

### Solution based on matrix technique Rewrite. ) = 8x 2 1 4x 1x 2 + 5x x1 2x 2 2x 1 + 5x 2

8.2 Quadratic Forms Example 1 Consider the function q(x 1, x 2 ) = 8x 2 1 4x 1x 2 + 5x 2 2 Determine whether q(0, 0) is the global minimum. Solution based on matrix technique Rewrite q( x1 x 2 = x1 ) =

### Linear Dependence Tests

Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### Factor analysis. Angela Montanari

Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

### B such that AB = I and BA = I. (We say B is an inverse of A.) Definition A square matrix A is invertible (or nonsingular) if matrix

Matrix inverses Recall... Definition A square matrix A is invertible (or nonsingular) if matrix B such that AB = and BA =. (We say B is an inverse of A.) Remark Not all square matrices are invertible.

### From the help desk: Swamy s random-coefficients model

The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

### These axioms must hold for all vectors ū, v, and w in V and all scalars c and d.

DEFINITION: A vector space is a nonempty set V of objects, called vectors, on which are defined two operations, called addition and multiplication by scalars (real numbers), subject to the following axioms

### The Characteristic Polynomial

Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

### Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL

Appendix C Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL This note provides a brief description of the statistical background, estimators

### On Small Sample Properties of Permutation Tests: A Significance Test for Regression Models

On Small Sample Properties of Permutation Tests: A Significance Test for Regression Models Hisashi Tanizaki Graduate School of Economics Kobe University (tanizaki@kobe-u.ac.p) ABSTRACT In this paper we

### Marketing Mix Modelling and Big Data P. M Cain

1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

### State Space Time Series Analysis

State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

### Which null hypothesis do overidentification restrictions actually test? Abstract

Which null hypothesis do overidentification restrictions actually test? Rembert De Blander EcRu, U.C.Louvain and CES, K.U.Leuven Abstract In this note I investigate which alternatives are detected by over-identifying

### Regression III: Advanced Methods

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### 10.3 POWER METHOD FOR APPROXIMATING EIGENVALUES

55 CHAPTER NUMERICAL METHODS. POWER METHOD FOR APPROXIMATING EIGENVALUES In Chapter 7 we saw that the eigenvalues of an n n matrix A are obtained by solving its characteristic equation n c n n c n n...

### On the general equation of the second degree

On the general equation of the second degree S Kesavan The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai - 600 113 e-mail:kesh@imscresin Abstract We give a unified treatment of the

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every

### Similar matrices and Jordan form

Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive

### Factorization Theorems

Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

### ( ) which must be a vector

MATH 37 Linear Transformations from Rn to Rm Dr. Neal, WKU Let T : R n R m be a function which maps vectors from R n to R m. Then T is called a linear transformation if the following two properties are

### Various Symmetries in Matrix Theory with Application to Modeling Dynamic Systems

Various Symmetries in Matrix Theory with Application to Modeling Dynamic Systems A Aghili Ashtiani a, P Raja b, S K Y Nikravesh a a Department of Electrical Engineering, Amirkabir University of Technology,

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### 2.5 Gaussian Elimination

page 150 150 CHAPTER 2 Matrices and Systems of Linear Equations 37 10 the linear algebra package of Maple, the three elementary 20 23 1 row operations are 12 1 swaprow(a,i,j): permute rows i and j 3 3

### Inverses and powers: Rules of Matrix Arithmetic

Contents 1 Inverses and powers: Rules of Matrix Arithmetic 1.1 What about division of matrices? 1.2 Properties of the Inverse of a Matrix 1.2.1 Theorem (Uniqueness of Inverse) 1.2.2 Inverse Test 1.2.3

### Hypothesis Testing in the Classical Regression Model

LECTURE 5 Hypothesis Testing in the Classical Regression Model The Normal Distribution and the Sampling Distributions It is often appropriate to assume that the elements of the disturbance vector ε within

### Diagonalisation. Chapter 3. Introduction. Eigenvalues and eigenvectors. Reading. Definitions

Chapter 3 Diagonalisation Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation fo symmetric matrices Reading As in the previous chapter, there is no specific essential

### Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

### Presentation 3: Eigenvalues and Eigenvectors of a Matrix

Colleen Kirksey, Beth Van Schoyck, Dennis Bowers MATH 280: Problem Solving November 18, 2011 Presentation 3: Eigenvalues and Eigenvectors of a Matrix Order of Presentation: 1. Definitions of Eigenvalues

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### CONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation

Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Linearly Independent Sets and Linearly Dependent Sets

These notes closely follow the presentation of the material given in David C. Lay s textbook Linear Algebra and its Applications (3rd edition). These notes are intended primarily for in-class presentation

### Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### Coefficients of determination

Coefficients of determination Jean-Marie Dufour McGill University First version: March 1983 Revised: February 2002, July 2011 his version: July 2011 Compiled: November 21, 2011, 11:05 his work was supported

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### Determinants. Dr. Doreen De Leon Math 152, Fall 2015

Determinants Dr. Doreen De Leon Math 52, Fall 205 Determinant of a Matrix Elementary Matrices We will first discuss matrices that can be used to produce an elementary row operation on a given matrix A.

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Chapter 6. Orthogonality

6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

### Linear Codes. In the V[n,q] setting, the terms word and vector are interchangeable.

Linear Codes Linear Codes In the V[n,q] setting, an important class of codes are the linear codes, these codes are the ones whose code words form a sub-vector space of V[n,q]. If the subspace of V[n,q]

### Matrix Norms. Tom Lyche. September 28, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Matrix Norms Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 28, 2009 Matrix Norms We consider matrix norms on (C m,n, C). All results holds for

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

### Topic 1: Matrices and Systems of Linear Equations.

Topic 1: Matrices and Systems of Linear Equations Let us start with a review of some linear algebra concepts we have already learned, such as matrices, determinants, etc Also, we shall review the method

### 7 - Linear Transformations

7 - Linear Transformations Mathematics has as its objects of study sets with various structures. These sets include sets of numbers (such as the integers, rationals, reals, and complexes) whose structure

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### 4. Matrix inverses. left and right inverse. linear independence. nonsingular matrices. matrices with linearly independent columns

L. Vandenberghe EE133A (Spring 2016) 4. Matrix inverses left and right inverse linear independence nonsingular matrices matrices with linearly independent columns matrices with linearly independent rows

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### 1 Introduction. 2 Matrices: Definition. Matrix Algebra. Hervé Abdi Lynne J. Williams

In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 00 Matrix Algebra Hervé Abdi Lynne J. Williams Introduction Sylvester developed the modern concept of matrices in the 9th

### The Scalar Algebra of Means, Covariances, and Correlations

3 The Scalar Algebra of Means, Covariances, and Correlations In this chapter, we review the definitions of some key statistical concepts: means, covariances, and correlations. We show how the means, variances,

### x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.

Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability

### Practical Guide to the Simplex Method of Linear Programming

Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear

### The Delta Method and Applications

Chapter 5 The Delta Method and Applications 5.1 Linear approximations of functions In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence X 1, X,... of independent and

### 160 CHAPTER 4. VECTOR SPACES

160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results

### Matrix Algebra LECTURE 1. Simultaneous Equations Consider a system of m linear equations in n unknowns: y 1 = a 11 x 1 + a 12 x 2 + +a 1n x n,

LECTURE 1 Matrix Algebra Simultaneous Equations Consider a system of m linear equations in n unknowns: y 1 a 11 x 1 + a 12 x 2 + +a 1n x n, (1) y 2 a 21 x 1 + a 22 x 2 + +a 2n x n, y m a m1 x 1 +a m2 x

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in