4.7 Confidence and Prediction Intervals


Instead of conducting tests we can also construct confidence intervals: for a single regression coefficient, for a set of regression coefficients, or for the mean of the response at given values of the regressors. Sometimes we are interested in the range of plausible values of the response itself at given regressor values, which leads to prediction intervals.

4.7.1 Confidence Intervals for regression coefficients

We already proved that in the multiple linear regression model $\hat\beta$ is multivariate normal with mean $\beta$ and covariance matrix $\sigma^2(X'X)^{-1}$. It then also follows that

Lemma 1. For $0 \le i \le k$, $\hat\beta_i$ is normally distributed with mean $\beta_i$ and variance $\sigma^2 C_{ii}$, where $C_{ii}$ is the $i$th diagonal entry of $(X'X)^{-1}$.

A confidence interval can then easily be found:

Lemma 2. For $0 \le i \le k$, a $(1-\alpha)\cdot 100\%$ confidence interval for $\beta_i$ is given by
$$\hat\beta_i \pm t^{\,n-p}_{\alpha/2}\,\hat\sigma\sqrt{C_{ii}}.$$

In the blood pressure example, to find a 95% confidence interval for the regression coefficient of weight, first find $t^{4}_{0.025} = 2.776$; then
$$2.38 \pm 2.776(1.388)\sqrt{0.018} = 2.38 \pm 0.517.$$
The 95% confidence interval for $\beta_2$ is $[1.863, 2.897]$. We are 95% confident that, on average, the blood pressure increases by between 1.86 and 2.90 for every extra kilogram of weight, after correcting for age.

Joint confidence regions for a set of regression coefficients. Instead of calculating individual confidence intervals for a number of regression coefficients, one can find a joint confidence region for two or more regression coefficients at once. The result is stronger, since the confidence level then applies to all coefficients simultaneously rather than individually.

Lemma 3. A $(1-\alpha)\cdot 100\%$ confidence region for $\beta_1 \in \mathbb{R}^r$ from the partitioned regression coefficient vector $\beta = (\beta_1', \beta_2')'$ is given by
$$\frac{(\hat\beta_1 - \beta_1)'(X_1'X_1)(\hat\beta_1 - \beta_1)}{r\,MS_{Res}} \le F_{\alpha}(r, n-p).$$

Proof: The result is due to the fact that
$$\frac{(\hat\beta_1 - \beta_1)'(X_1'X_1)(\hat\beta_1 - \beta_1)}{r\,MS_{Res}}$$
is $F$ distributed with $df_n = r$ and $df_d = n-p$.

For two parameters this region is an ellipse; for three parameters it would be a three-dimensional ellipsoid.
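As a small R sketch (assuming the blood pressure data sit in a file BP.txt with columns BP, AGE and WEIGHT, as in the plotting code below), the interval for the weight coefficient can be computed directly from the formula in Lemma 2:

# 95% CI for a single coefficient: beta_hat_i +/- t_{alpha/2}^{n-p} * sigma_hat * sqrt(C_ii)
BP.data <- read.table("BP.txt", header = TRUE)   # assumed file and column layout
fit <- lm(BP ~ AGE + WEIGHT, data = BP.data)
X <- model.matrix(fit)                  # design matrix including the intercept column
C <- solve(t(X) %*% X)                  # (X'X)^{-1}
sigma.hat <- summary(fit)$sigma         # residual standard error, sigma hat
n <- nrow(X); p <- ncol(X)
tcrit <- qt(1 - 0.05/2, df = n - p)     # t_{0.025}^{n-p}
est <- coef(fit)["WEIGHT"]
se <- sigma.hat * sqrt(C["WEIGHT", "WEIGHT"])
c(est - tcrit * se, est + tcrit * se)   # same interval as confint(fit)["WEIGHT", ]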

The following graph shows the 95% confidence region for $\beta = (\beta_1, \beta_2)'$ from the model
$$bp = \beta_0 + \beta_1(\text{weight}) + \beta_2(\text{age}) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$
The graph was created with R using the following commands.

library(ellipse)
BP.data <- read.table("BP.txt", header = TRUE)
attach(BP.data)
fit <- lm(BP ~ AGE + WEIGHT)
# 95% joint confidence region (ellipse) for the AGE and WEIGHT coefficients
plot(ellipse(fit, which = c("AGE", "WEIGHT"), level = 0.95), type = "l")
# add the point estimate (centre of the ellipse)
points(fit$coefficients["AGE"], fit$coefficients["WEIGHT"])

4.7.2 Confidence interval for the mean response and prediction interval

To find a confidence interval for $E(Y \mid x_0)$, the mean response when the regressor variables have values $x_0$, observe that a point estimate is given by
$$\hat{Y}_0 = x_0'\hat\beta.$$
It is unbiased and has variance $\sigma^2 x_0'(X'X)^{-1}x_0$ (Proof!!). Because it is also normal, a confidence interval is derived as

Lemma 4. In the MLRM a $(1-\alpha)\cdot 100\%$ confidence interval for $E(Y \mid x_0)$ is given by
$$\hat{Y}_0 \pm t^{\,n-p}_{\alpha/2}\,\hat\sigma\sqrt{x_0'(X'X)^{-1}x_0}.$$

Find a 95% confidence interval for the mean blood pressure of people who are 35 years old and weigh 70 kg, i.e. $x_0' = (1, 70, 35)$. Here $t^{4}_{0.025} = 2.776$,
$$\hat{y}_0 = (1, 70, 35)\begin{pmatrix} -66.13 \\ 2.38 \\ 0.44 \end{pmatrix} = 115.79,$$
and
$$x_0'(X'X)^{-1}x_0 = (1, 70, 35)\begin{pmatrix} 129.777 & -1.534 & -0.452 \\ -1.534 & 0.018 & 0.005 \\ -0.452 & 0.005 & 0.002 \end{pmatrix}\begin{pmatrix} 1 \\ 70 \\ 35 \end{pmatrix} = 0.495,$$
so the confidence interval is given by
$$115.79 \pm 2.776(1.388)\sqrt{0.495} = 115.79 \pm 2.711 = [113.08, 118.50].$$
We are 95% confident that the mean blood pressure of people who are 35 years old and weigh 70 kg falls between 113 and 118.5.

To find a prediction interval for $Y_0 = Y \mid x = x_0$, the distribution of $\hat{Y}_0 - Y_0$ has to be considered; compare this to the approach taken for the SLRM. The mean of $\hat{Y}_0 - Y_0$ is 0, and its variance is $\sigma^2\bigl(1 + x_0'(X'X)^{-1}x_0\bigr)$. In addition, it is normal as a linear combination of normal random variables. Therefore:

Lemma 5. A $(1-\alpha)\cdot 100\%$ prediction interval for $Y_0 = Y \mid x = x_0$ is given by
$$\hat{Y}_0 \pm t^{\,n-p}_{\alpha/2}\,\hat\sigma\sqrt{1 + x_0'(X'X)^{-1}x_0}.$$

Find a 95% prediction interval for the blood pressure of people who are 35 years old and weigh 70 kg, i.e. $x_0' = (1, 70, 35)$. Using the results from above,
$$115.79 \pm 2.776(1.388)\sqrt{1 + 0.495} = 115.79 \pm 4.711 = [111.08, 120.50].$$
We predict that 95% of people who are 35 years old and weigh 70 kg have a blood pressure between 111 and 120.5.
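The same two intervals can also be obtained from R's predict() function; the following is a sketch, assuming the fitted model object fit and the column names AGE and WEIGHT used in the code above:

# confidence interval for the mean response and prediction interval at weight 70, age 35
x0 <- data.frame(AGE = 35, WEIGHT = 70)
predict(fit, newdata = x0, interval = "confidence", level = 0.95)  # should reproduce [113.08, 118.50]
predict(fit, newdata = x0, interval = "prediction", level = 0.95)  # should reproduce [111.08, 120.50]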

4.8 Standardized Regression

In some studies the strength of the influence of different regressors on the mean response is to be compared. Since the regressors are usually measured on different scales, this cannot be done directly from the MLRM. In order to make the coefficients comparable, standardized regression coefficients can be used.

The standardized regression coefficients are the least squares estimators in the model fitting the response variable
$$Y_i^* = \frac{Y_i - \bar{Y}}{s_Y}, \qquad 1 \le i \le n,$$
with regressors
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad 1 \le i \le n,\ 1 \le j \le k,$$
where $s_j^2$ is the sample variance of $x_{ij}$, $1 \le i \le n$, and $s_Y^2$ is the sample variance of $Y_i$, $1 \le i \le n$. The model fitted is
$$Y^* = b_0 + b_1 z_1 + b_2 z_2 + \cdots + b_k z_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$
The least squares estimates $\hat{b}$ are
$$\hat{b} = (Z'Z)^{-1}Z'Y^*.$$

Theorem 4.1. The least squares estimates $\hat\beta$ can be found from the standardized regression coefficients as
$$\hat\beta_j = \hat{b}_j\,\frac{s_Y}{s_j} = \hat{b}_j\left(\frac{SS_T}{SS_{jj}}\right)^{1/2}, \qquad 1 \le j \le k,$$
and
$$\hat\beta_0 = \bar{Y} - \sum_{j=1}^{k}\hat\beta_j\,\bar{x}_j,$$
where $s_Y$ and $s_j$ are the standard deviations of $Y$ and $x_j$, respectively, and
$$SS_{jj} = SS_{x_j} = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 = \sum_{i=1}^{n}x_{ij}^2 - \frac{\left(\sum_{i=1}^{n}x_{ij}\right)^2}{n} = x_j'x_j - \frac{1}{n}\,x_j'J_n x_j.$$

The least squares estimators for this model are called the standardized regression coefficients or beta coefficients (here denoted $b$). They measure the change in the response, in standard deviations, for an increase in $x_j$ by one standard deviation while the other regressors are left unchanged. Because the slopes now measure the change in the response for standardized increases in the regressors, they are comparable: the larger the absolute value of a standardized regression coefficient, the stronger its effect on the response. The intercept $\hat{b}_0$ for this model is 0, because both the response and the regressors are centred.

Using the following R code the standardized regression coefficients are

y <- BP.data$BP
y <- (y - mean(y)) / sd(y)        # standardize the response
(y <- t(t(y)))                    # turn it into a column vector
x1 <- BP.data$WEIGHT
x1 <- (x1 - mean(x1)) / sd(x1)    # standardize weight
(x1 <- t(t(x1)))
x2 <- BP.data$AGE
x2 <- (x2 - mean(x2)) / sd(x2)    # standardize age
(x2 <- t(t(x2)))
(X <- cbind(x1, x2))              # standardized design matrix (no intercept column)
(beta <- solve(t(X) %*% X) %*% t(X) %*% y)

The result:

          [,1]
[1,] 1.4029059
[2,] 0.7115687

So $\hat{b}_1 = 1.40$, $\hat{b}_2 = 0.71$ and $\hat{b}_0 = 0$.

Interpretation: On average the blood pressure increases by 1.4 standard deviations when the weight is increased by one standard deviation and the age is kept unchanged, and on average the blood pressure increases by 0.7 standard deviations when the age is increased by one standard deviation and the weight is kept unchanged. According to these data, changes in weight have a stronger effect on the blood pressure than changes in age.

4.9 Multicollinearity

When the columns of $X$ are linearly dependent, $X$ does not have full rank. Then $X'X$ does not have full rank either, its inverse does not exist and its determinant is 0, which indicates that the normal equations have infinitely many solutions. Should this occur in a practical situation, it indicates that at least one of the regressors is redundant and should be removed from the model.

Multicollinearity refers to the problem that the columns of $X$ are nearly linearly dependent, i.e. the determinant of $X'X$ is close to zero. The consequence is that the entries of the covariance matrix of $\hat\beta$, which is $\sigma^2(X'X)^{-1}$, are large. This results in wide confidence intervals for the regression coefficients, i.e. imprecise estimates, and in turn in poor confidence intervals for the mean response and poor prediction intervals.

A measure for detecting multicollinearity is the determinant of $(X'X)^{-1}$, or the diagonal elements of the inverse of the correlation matrix of the regressors, which is standardized and therefore easier to judge than the covariance matrix. If these diagonal entries, called variance inflation factors, exceed 10, they are considered critical.

Continue Example. For our example the inverse of the correlation matrix of the regressors is
$$\begin{pmatrix} 2.26 & 1.69 \\ 1.69 & 2.26 \end{pmatrix},$$
so the variance inflation factors are 2.26, well below 10, indicating that we do not have to be concerned about multicollinearity.
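As a short R sketch (again assuming BP.data with columns AGE and WEIGHT as above), the variance inflation factors can be computed as the diagonal of the inverse correlation matrix of the regressors:

# inverse correlation matrix of the regressors; its diagonal entries are the VIFs
Rx <- cor(BP.data[, c("AGE", "WEIGHT")])
solve(Rx)          # should match the 2 x 2 matrix above
diag(solve(Rx))    # VIFs; values above 10 would be considered critical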