Topic 3. Chapter 5: Linear Regression in Matrix Form




Statistics 512: Applied Linear Models

Topic Overview

This topic will cover
- thinking in terms of matrices
- regression on multiple predictor variables
- case study: CS majors
- Text Example (NKNW 241)

Chapter 5: Linear Regression in Matrix Form

The SLR Model in Scalar Form

Y_i = β_0 + β_1 X_i + ɛ_i,  where ɛ_i ~ iid N(0, σ²)

Consider now writing an equation for each observation:

Y_1 = β_0 + β_1 X_1 + ɛ_1
Y_2 = β_0 + β_1 X_2 + ɛ_2
  ⋮
Y_n = β_0 + β_1 X_n + ɛ_n

The SLR Model in Matrix Form

[Y_1]   [β_0 + β_1 X_1]   [ɛ_1]
[Y_2] = [β_0 + β_1 X_2] + [ɛ_2]
[ ⋮ ]   [      ⋮      ]   [ ⋮ ]
[Y_n]   [β_0 + β_1 X_n]   [ɛ_n]

[Y_1]   [1  X_1]          [ɛ_1]
[Y_2] = [1  X_2] [β_0]  + [ɛ_2]
[ ⋮ ]   [⋮   ⋮ ] [β_1]    [ ⋮ ]
[Y_n]   [1  X_n]          [ɛ_n]

(I will try to use bold symbols for matrices. At first, I will also indicate the dimensions as a subscript to the symbol.)

X is called the design matrix, β is the vector of parameters, ɛ is the error vector, and Y is the response vector.

The Design Matrix

          [1  X_1]
X_{n×2} = [1  X_2]
          [⋮   ⋮ ]
          [1  X_n]

Vector of Parameters

β_{2×1} = [β_0]
          [β_1]

Vector of Error Terms

ɛ_{n×1} = [ɛ_1, ɛ_2, ..., ɛ_n]'

Vector of Responses

Y_{n×1} = [Y_1, Y_2, ..., Y_n]'

Thus, Y = Xβ + ɛ, i.e.,

Y_{n×1} = X_{n×2} β_{2×1} + ɛ_{n×1}

Variance-Covariance Matrix

In general, for any set of variables U_1, U_2, ..., U_n, their variance-covariance matrix is defined to be

        [σ²{U_1}     σ{U_1,U_2}   ⋯    σ{U_1,U_n}    ]
σ²{U} = [σ{U_2,U_1}  σ²{U_2}            ⋮            ]
        [⋮                        ⋱    σ{U_{n−1},U_n}]
        [σ{U_n,U_1}   ⋯  σ{U_n,U_{n−1}}  σ²{U_n}     ]

where σ²{U_i} is the variance of U_i, and σ{U_i,U_j} is the covariance of U_i and U_j.

When variables are uncorrelated, that means their covariance is 0. The variance-covariance matrix of uncorrelated variables will be a diagonal matrix, since all the covariances are 0.

Note: Variables that are independent will also be uncorrelated. So when variables are correlated, they are automatically dependent. However, it is possible to have variables that are dependent but uncorrelated, since correlation only measures linear dependence. A nice thing about normally distributed RV's is that they are a convenient special case: if they are uncorrelated, they are also independent.

Covariance Matrix of ɛ

                                              [σ²  0   ⋯  0 ]
σ²{ɛ}_{n×n} = Cov[ɛ_1, ɛ_2, ..., ɛ_n]' = σ² I_{n×n} = [0   σ²  ⋯  0 ]
                                              [⋮    ⋮   ⋱  ⋮ ]
                                              [0   0   ⋯  σ²]

Covariance Matrix of Y

σ²{Y}_{n×n} = Cov[Y_1, Y_2, ..., Y_n]' = σ² I_{n×n}

Distributional Assumptions in Matrix Form

ɛ ~ N(0, σ² I)

where I is an n×n identity matrix.
- Ones in the diagonal elements specify that the variance of each ɛ_i is 1 times σ².
- Zeros in the off-diagonal elements specify that the covariance between different ɛ_i is zero.
- This implies that the correlations are zero.

Parameter Estimation

Least Squares

Residuals are ɛ = Y − Xβ. We want to minimize the sum of squared residuals:

Σ ɛ_i² = [ɛ_1 ɛ_2 ⋯ ɛ_n] [ɛ_1, ɛ_2, ..., ɛ_n]' = ɛ'ɛ

We want to minimize ɛ'ɛ = (Y − Xβ)'(Y − Xβ), where the prime (') denotes the transpose of the matrix (exchange the rows and columns). We take the derivative with respect to the vector β. This is like a quadratic function: think (Y − Xβ)². The derivative works out to 2 times the derivative of (Y − Xβ)' with respect to β. That is,

d/dβ [(Y − Xβ)'(Y − Xβ)] = −2X'(Y − Xβ)

We set this equal to 0 (a vector of zeros), and solve for β. So, −2X'(Y − Xβ) = 0. Or, X'Y = X'Xβ (the normal equations).

Normal Equations

X'Y = (X'X)β

Solving this equation for β gives the least squares solution for b = [b_0, b_1]'. Multiply on the left by the inverse of the matrix X'X. (Notice that the matrix X'X is a 2×2 square matrix for SLR.)

REMEMBER THIS:

b = (X'X)^{-1} X'Y

Reality Break: This is just to convince you that we have done nothing new nor magic; all we are doing is writing the same old formulas for b_0 and b_1 in matrix format. Do NOT worry if you cannot reproduce the following algebra, but you SHOULD try to follow it so that you believe me that this is really not a new formula.

Recall in Topic 1, we had

b_1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² = SP_XY / SS_X
b_0 = Ȳ − b_1 X̄
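As a quick numerical check (not part of the notes, which use SAS), here is a small numpy sketch with made-up data verifying that the matrix formula b = (X'X)^{-1}X'Y reproduces the scalar SLR formulas from Topic 1:

```python
# Illustrative sketch with simulated data: the matrix solution to the
# normal equations matches the scalar formulas b1 = SP_XY/SS_X and
# b0 = Ybar - b1*Xbar.
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.uniform(0, 10, n)
Y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

# Design matrix: a column of ones, then the predictor.
X = np.column_stack([np.ones(n), x])

# Matrix form: solve the normal equations X'X b = X'Y.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Scalar form from Topic 1.
SS_X = np.sum((x - x.mean()) ** 2)
SP_XY = np.sum((x - x.mean()) * (Y - Y.mean()))
b1 = SP_XY / SS_X
b0 = Y.mean() - b1 * x.mean()

assert np.allclose(b, [b0, b1])
```

In practice one solves the normal equations directly (as above) rather than explicitly inverting X'X, which is numerically safer.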

Now let's look at the pieces of the new formula:

      [ 1    1   ⋯   1 ] [1  X_1]   [  n      ΣX_i ]
X'X = [X_1  X_2  ⋯  X_n] [⋮   ⋮ ] = [ ΣX_i   ΣX_i² ]
                         [1  X_n]

(X'X)^{-1} = 1/(n ΣX_i² − (ΣX_i)²) [ ΣX_i²  −ΣX_i ] = 1/(n SS_X) [ ΣX_i²  −ΣX_i ]
                                   [ −ΣX_i    n   ]              [ −ΣX_i    n   ]

      [ 1    1   ⋯   1 ] [Y_1]   [  ΣY_i  ]
X'Y = [X_1  X_2  ⋯  X_n] [ ⋮ ] = [ΣX_iY_i ]
                         [Y_n]

Plug these into the equation for b:

b = (X'X)^{-1} X'Y
  = 1/(n SS_X) [ ΣX_i²  −ΣX_i ] [  ΣY_i  ]
               [ −ΣX_i    n   ] [ΣX_iY_i ]
  = 1/(n SS_X) [ (ΣX_i²)(ΣY_i) − (ΣX_i)(ΣX_iY_i) ]
               [ −(ΣX_i)(ΣY_i) + n ΣX_iY_i       ]
  = 1/SS_X [ Ȳ(ΣX_i²) − X̄ ΣX_iY_i ]
           [ ΣX_iY_i − n X̄Ȳ      ]
  = 1/SS_X [ Ȳ(ΣX_i²) − Ȳ(n X̄²) + X̄(n X̄Ȳ) − X̄ ΣX_iY_i ]
           [ SP_XY                                      ]
  = 1/SS_X [ Ȳ SS_X − X̄ SP_XY ]   [ Ȳ − X̄ SP_XY/SS_X ]   [ b_0 ]
           [ SP_XY            ] = [ SP_XY/SS_X       ] = [ b_1 ]

where

SS_X  = ΣX_i² − n X̄²  = Σ(X_i − X̄)²
SP_XY = ΣX_iY_i − n X̄Ȳ = Σ(X_i − X̄)(Y_i − Ȳ)

All we have done is to write the same old formulas for b_0 and b_1 in a fancy new format. See NKNW page 200 for details. Why have we bothered to do this? The cool part is that the same approach works for multiple regression. All we do is make X and b into bigger matrices, and use exactly the same formula.

Other Quantities in Matrix Form

Fitted Values

    [Ŷ_1]   [b_0 + b_1 X_1]   [1  X_1]
Ŷ = [Ŷ_2] = [b_0 + b_1 X_2] = [1  X_2] [b_0] = Xb
    [ ⋮ ]   [      ⋮      ]   [⋮   ⋮ ] [b_1]
    [Ŷ_n]   [b_0 + b_1 X_n]   [1  X_n]

Hat Matrix

Ŷ = Xb
Ŷ = X(X'X)^{-1} X'Y
Ŷ = HY,  where H = X(X'X)^{-1} X'

We call this the "hat matrix" because it turns Y's into Ŷ's.

Estimated Covariance Matrix of b

The matrix b is a linear combination of the elements of Y. These estimates are normal if Y is normal. These estimates will be approximately normal in general.

A Useful Multivariate Theorem

Suppose U ~ N(µ, Σ), a multivariate normal vector, and V = c + DU, a linear transformation of U, where c is a vector and D is a matrix. Then V ~ N(c + Dµ, DΣD').

Recall: b = (X'X)^{-1} X'Y = [(X'X)^{-1} X'] Y and Y ~ N(Xβ, σ²I).

Now apply the theorem to b using U = Y, µ = Xβ, Σ = σ²I, V = b, c = 0, and D = (X'X)^{-1} X'. The theorem tells us the vector b is normally distributed with mean

(X'X)^{-1} (X'X) β = β

and covariance matrix

σ² ((X'X)^{-1} X') I ((X'X)^{-1} X')' = σ² (X'X)^{-1} (X'X) ((X'X)^{-1})' = σ² (X'X)^{-1}

using the fact that both X'X and its inverse are symmetric, so ((X'X)^{-1})' = (X'X)^{-1}.

Next we will use this framework to do multiple regression where we have more than one explanatory variable (i.e., add another column to the design matrix and additional beta parameters).

Multiple Regression

Data for Multiple Regression

Y_i is the response variable (as usual)
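The hat-matrix identities above are easy to confirm numerically. This is an illustrative numpy sketch with simulated data (not from the notes): H is symmetric and idempotent, and HY gives the same fitted values as Xb.

```python
# Illustrative sketch: the hat matrix H = X(X'X)^{-1}X' maps Y to Y-hat,
# and satisfies H' = H and H @ H = H. Data values are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T          # hat matrix
b = XtX_inv @ X.T @ Y          # least squares estimates
Y_hat = H @ Y                  # "puts the hat on Y"

assert np.allclose(H, H.T)     # symmetric
assert np.allclose(H @ H, H)   # idempotent
assert np.allclose(Y_hat, X @ b)
```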

X i,1,x i,2,,x i,p 1 are the p 1 explanatory variables for cases i =1ton Example In Homework #1 you considered modeling GPA as a function of entrance exam score But we could also consider intelligence test scores and high school GPA as potential predictors This would be 3 variables, so p =4 Potential problem to remember!!! These predictor variables are likely to be themselves correlated We always want to be careful of using variables that are themselves strongly correlated as predictors together in the same model The Multiple Regression Model where Y i = β 0 + β 1 X i,1 + β 2 X i,2 + + β p 1 X i,p 1 + ɛ i for i =1, 2,,n Y i is the value of the response variable for the ith case ɛ i iid N(0,σ 2 ) (exactly as before!) β 0 is the intercept (think multidimensionally) β 1,β 2,,β p 1 are the regression coefficients for the explanatory variables X i,k is the value of the kth explanatory variable for the ith case Parameters as usual include all of the β s as well as σ 2 These need to be estimated from the data Interesting Special Cases Polynomial model: Y i = β 0 + β 1 X i + β 2 X 2 i + + β p 1 X p 1 i X s can be indicator or dummy variables with X = 0 or 1 (or any other two distinct numbers) as possible values (eg ANOVA model) Interactions between explanatory variables are then expressed as a product of the X s: + ɛ i Y i = β 0 + β 1 X i,1 + β 2 X i,2 + β 3 X i,1 X i,2 + ɛ i 7

Model in Matrix Form

Y_{n×1} = X_{n×p} β_{p×1} + ɛ_{n×1}
ɛ ~ N(0, σ² I_{n×n})
Y ~ N(Xβ, σ² I)

Design Matrix X:

    [1  X_{1,1}  X_{1,2}  ⋯  X_{1,p−1}]
X = [1  X_{2,1}  X_{2,2}  ⋯  X_{2,p−1}]
    [⋮     ⋮        ⋮            ⋮    ]
    [1  X_{n,1}  X_{n,2}  ⋯  X_{n,p−1}]

Coefficient matrix β:

β = [β_0, β_1, ..., β_{p−1}]'

Parameter Estimation

Least Squares: find b to minimize SSE = (Y − Xb)'(Y − Xb). Obtain the normal equations as before:

X'Xb = X'Y

Least Squares Solution

b = (X'X)^{-1} X'Y

Fitted (predicted) values for the mean of Y are

Ŷ = Xb = X(X'X)^{-1} X'Y = HY,  where H = X(X'X)^{-1} X'

Residuals

e = Y − Ŷ = Y − HY = (I − H)Y

Notice that the matrices H and (I − H) have two special properties. They are
- Symmetric: H' = H and (I − H)' = (I − H)
- Idempotent: H² = H and (I − H)(I − H) = (I − H)

Covariance Matrix of Residuals

Cov(e) = σ²(I − H)(I − H)' = σ²(I − H)
Var(e_i) = σ²(1 − h_{i,i}),

where h_{i,i} is the ith diagonal element of H.

Note: h_{i,i} = X_i'(X'X)^{-1} X_i, where X_i' = [1  X_{i,1}  ⋯  X_{i,p−1}].

Residuals e_i are usually somewhat correlated: cov(e_i, e_j) = −σ² h_{i,j}; this is not unexpected, since they sum to 0.

Estimation of σ

Since we have estimated p parameters, SSE = e'e has df_E = n − p. The estimate for σ² is the usual estimate:

s² = e'e / (n − p) = (Y − Xb)'(Y − Xb) / (n − p) = SSE / df_E = MSE
s = √s² = Root MSE

Distribution of b

We know that b = (X'X)^{-1} X'Y. The only RV involved is Y, so the distribution of b is based on the distribution of Y. Since Y ~ N(Xβ, σ²I), and using the multivariate theorem from earlier (if you like, go through the details on your own), we have

E(b) = ((X'X)^{-1} X') Xβ = β
σ²{b} = Cov(b) = σ²(X'X)^{-1}

Since σ² is estimated by the MSE s², σ²{b} is estimated by s²(X'X)^{-1}.

ANOVA Table

Sources of variation are
- Model (SAS) or Regression (NKNW)
- Error (Residual)
- Total

SS and df add as before:

SSM + SSE = SST
df_M + df_E = df_Total

but their values are different from SLR.
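The residual formulas above can be sketched numerically. This is an illustrative numpy example with simulated data (the notes themselves use SAS): residuals come from (I − H)Y, s² = e'e/(n − p) estimates σ², and the estimated Var(e_i) uses the diagonal of H.

```python
# Illustrative sketch: residuals e = (I - H)Y, MSE s^2 = e'e/(n - p),
# and estimated Var(e_i) = s^2 (1 - h_ii). Data values are made up.
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y

s2 = (e @ e) / (n - p)             # MSE = SSE / df_E
var_e = s2 * (1 - np.diag(H))      # estimated Var(e_i), one per case

# With an intercept column, the residuals sum to zero.
assert abs(e.sum()) < 1e-8
```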

Sum of Squares

SSM  = Σ(Ŷ_i − Ȳ)²
SSE  = Σ(Y_i − Ŷ_i)²
SSTO = Σ(Y_i − Ȳ)²

Degrees of Freedom

df_M = p − 1
df_E = n − p
df_Total = n − 1

The total degrees of freedom have not changed from SLR, but the model df has increased from 1 to p − 1, i.e., the number of X variables. Correspondingly, the error df has decreased from n − 2 to n − p.

Mean Squares

MSM = SSM / df_M = Σ(Ŷ_i − Ȳ)² / (p − 1)
MSE = SSE / df_E = Σ(Y_i − Ŷ_i)² / (n − p)
MST = SSTO / df_Total = Σ(Y_i − Ȳ)² / (n − 1)

ANOVA Table

Source   df            SS     MS     F
Model    df_M = p − 1  SSM    MSM    MSM/MSE
Error    df_E = n − p  SSE    MSE
Total    df_T = n − 1  SST

F-test

H_0: β_1 = β_2 = ⋯ = β_{p−1} = 0 (all regression coefficients are zero)
H_A: β_k ≠ 0, for at least one k = 1, ..., p − 1; at least one of the β's is non-zero (or, not all the β's are zero)

F = MSM/MSE. Under H_0, F ~ F_{p−1, n−p}. Reject H_0 if F is larger than the critical value; if using SAS, reject H_0 if p-value < α = 0.05.

What do we conclude?

If H_0 is rejected, we conclude that at least one of the regression coefficients is non-zero; hence at least one of the X variables is useful in predicting Y. (It doesn't say which one(s), though.) If H_0 is not rejected, then we cannot conclude that any of the X variables is useful in predicting Y.

p-value of F-test

The p-value for the F significance test tells us one of the following:
- there is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model (p ≥ 0.05)
- one or more of the explanatory variables in our model is potentially useful for predicting the response in a linear model (p ≤ 0.05)

R²

The squared multiple regression correlation (R²) gives the proportion of variation in the response variable explained by the explanatory variables. It is sometimes called the coefficient of multiple determination (NKNW, page 230).

R² = SSM/SST (the proportion of variation explained by the model)
R² = 1 − (SSE/SST) (1 minus the proportion not explained by the model)

F and R² are related:

F = [R²/(p − 1)] / [(1 − R²)/(n − p)]

Inference for Individual Regression Coefficients

Confidence Interval for β_k

We know that b ~ N(β, σ²(X'X)^{-1}). Define s²{b_k} = [s²{b}]_{k,k}, the kth diagonal element, where

s²{b}_{p×p} = MSE × (X'X)^{-1}

CI for β_k: b_k ± t_c s{b_k}, where t_c = t_{n−p}(0.975).

Significance Test for β_k

H_0: β_k = 0. Same test statistic t = b_k / s{b_k}. Still use df_E, which now is equal to n − p; the p-value is computed from the t_{n−p} distribution. This tests the significance of a variable given that the other variables are already in the model (i.e., fitted last). Unlike in SLR, the t-tests for β are different from the F-test.
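Both the standard errors from s²(X'X)^{-1} and the F–R² identity above are easy to check numerically. An illustrative numpy sketch with simulated data (not from the notes):

```python
# Illustrative sketch: s^2{b} = MSE (X'X)^{-1}, t statistics
# t = b_k / s{b_k}, and the identity F = [R^2/(p-1)] / [(1-R^2)/(n-p)].
# Data values are made up.
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b

MSE = (e @ e) / (n - p)
cov_b = MSE * XtX_inv              # estimated covariance matrix of b
se_b = np.sqrt(np.diag(cov_b))     # s{b_k}: square roots of the diagonal
t_stats = b / se_b                 # one t statistic per coefficient

SSE = e @ e
SSTO = np.sum((Y - Y.mean()) ** 2)
R2 = 1 - SSE / SSTO
F_anova = ((SSTO - SSE) / (p - 1)) / (SSE / (n - p))
F_from_R2 = (R2 / (p - 1)) / ((1 - R2) / (n - p))

assert np.isclose(F_anova, F_from_R2)
```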

Multiple Regression Case Study

Example: Study of CS Students

Problem: Computer science majors at Purdue have a large drop-out rate.

Potential Solution: Can we find predictors of success? Predictors must be available at time of entry into the program.

Data Available

Grade point average (GPA) after three semesters (Y_i, the response variable), and five potential predictors (p = 6):
- X_1 = High school math grades (HSM)
- X_2 = High school science grades (HSS)
- X_3 = High school English grades (HSE)
- X_4 = SAT Math (SATM)
- X_5 = SAT Verbal (SATV)

Gender (1 = male, 2 = female) is also available (we will ignore this one right now, since it is not a continuous variable).

We have n = 224 observations, so if all five variables are included, the design matrix X is 224 × 6. The SAS program used to generate output for this is cs.sas.

Look at the individual variables

Our first goal should be to take a look at the variables to see
- Is there anything that sticks out as unusual for any of the variables?
- How are these variables related to each other (pairwise)?

If two predictor variables are strongly correlated, we wouldn't want to use them in the same model! We do this by looking at statistics and plots.

data cs;
    infile 'H:\System\Desktop\csdata.dat';
    input id gpa hsm hss hse satm satv genderm1;

Descriptive Statistics: proc means

proc means data=cs maxdec=2;
    var gpa hsm hss hse satm satv;

The option maxdec=2 sets the number of decimal places in the output to 2 (just showing you how).

Output from proc means

The MEANS Procedure

Variable    N      Mean    Std Dev    Minimum    Maximum
---------------------------------------------------------
gpa       224      2.64       0.78       0.12       4.00
hsm       224      8.32       1.64       2.00      10.00
hss       224      8.09       1.70       3.00      10.00
hse       224      8.09       1.51       3.00      10.00
satm      224    595.29      86.40     300.00     800.00
satv      224    504.55      92.61     285.00     760.00
---------------------------------------------------------

Descriptive Statistics

Note that proc univariate also provides lots of other information, not shown.

proc univariate data=cs noprint;
    var gpa hsm hss hse satm satv;
    histogram gpa hsm hss hse satm satv / normal;

Figure 1: Graph of GPA (left) and High School Math (right)

Figure 2: Graph of High School Science (left) and High School English (right)

NOTE: If you want the plots (e.g., histogram, qqplot) and not the copious output from proc univariate, use a noprint statement.

Figure 3: Graph of SAT Math (left) and SAT Verbal (right)

proc univariate data=cs noprint;
    histogram gpa / normal;

Interactive Data Analysis

- Read in the dataset as usual.
- From the menu bar, select Solutions -> Analysis -> Interactive Data Analysis.
- Obtain the SAS/Insight window.
- Open library work.
- Click on Data Set CS and click Open.

Getting a Scatter Plot Matrix

- (CTRL) Click on GPA, SATM, SATV.
- Go to menu Analyze.
- Choose option Scatterplot(Y X).

You can, while in this window, use Edit -> Copy to copy this plot to another program such as Word (see Figure 4). This graph, once you get used to it, can be useful in getting an overall feel for the relationships among the variables. Try other variables and some other options from the Analyze menu to see what happens.

Correlations

SAS will give us the r (correlation) value between pairs of random variables in a data set using proc corr.

proc corr data=cs;
    var hsm hss hse;

Figure 4: Scatterplot Matrix

           hsm        hss        hse
hsm    1.00000    0.57569    0.44689
                   <.0001     <.0001
hss    0.57569    1.00000    0.57937
        <.0001                <.0001
hse    0.44689    0.57937    1.00000
        <.0001     <.0001

To get rid of those p-values, which make the table difficult to read, use a noprob statement. Here are also a few different ways to call proc corr:

proc corr data=cs noprob;
    var satm satv;

           satm       satv
satm    1.00000    0.46394
satv    0.46394    1.00000

proc corr data=cs noprob;
    var hsm hss hse;
    with satm satv;

           hsm        hss        hse
satm   0.45351    0.24048    0.10828
satv   0.22112    0.26170    0.24371

proc corr data=cs noprob;
    var hsm hss hse satm satv;
    with gpa;

           hsm        hss        hse       satm       satv
gpa    0.43650    0.32943    0.28900    0.25171    0.11449

Notice that not only do the X's correlate with Y (this is good), but the X's correlate with each other (this is bad). That means that some of the X's may be redundant in predicting Y.

Use High School Grades to Predict GPA

proc reg data=cs;
    model gpa=hsm hss hse;

Analysis of Variance

                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               3     27.71233     9.23744      18.86    <.0001
Error             220    107.75046     0.48977
Corrected Total   223    135.46279

Root MSE          0.69984    R-Square    0.2046
Dependent Mean    2.63522    Adj R-Sq    0.1937
Coeff Var        26.55711

Parameter Estimates

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1       0.58988     0.29424       2.00      0.0462
hsm          1       0.16857     0.03549       4.75      <.0001
hss          1       0.03432     0.03756       0.91      0.3619
hse          1       0.04510     0.03870       1.17      0.2451

Remove HSS

proc reg data=cs;
    model gpa=hsm hse;

R-Square 0.2016    Adj R-Sq 0.1943

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1       0.62423     0.29172       2.14      0.0335
hsm          1       0.18265     0.03196       5.72      <.0001
hse          1       0.06067     0.03473       1.75      0.0820

Rerun with HSM only

proc reg data=cs;
    model gpa=hsm;

R-Square 0.1905    Adj R-Sq 0.1869

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1       0.90768     0.24355       3.73      0.0002
hsm          1       0.20760     0.02872       7.23      <.0001

The last two models (HSM only; HSM and HSE) appear to be pretty good. Notice that R² goes down a little with the deletion of HSE, so HSE does provide a little information. This is a judgment call: do you prefer a slightly better-fitting model, or a simpler one?

Now look at SAT scores

proc reg data=cs;
    model gpa=satm satv;

R-Square 0.0634    Adj R-Sq 0.0549

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1       1.28868     0.37604       3.43      0.0007
satm         1       0.00228  0.00066291       3.44      0.0007
satv         1   -0.00002456  0.00061847      -0.04      0.9684

Here we see that SATM is a significant explanatory variable, but the overall fit of the model is poor. SATV does not appear to be useful at all.

Now try our two most promising candidates: HSM and SATM

proc reg data=cs;
    model gpa=hsm satm;

R-Square 0.1942    Adj R-Sq 0.1869

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1       0.66574     0.34349       1.94      0.0539
hsm          1       0.19300     0.03222       5.99      <.0001
satm         1    0.00061047  0.00061117       1.00      0.3190

In the presence of HSM, SATM does not appear to be at all useful. Note that adjusted R² is the same as with HSM only, and that the p-value for SATM is now no longer significant.

General Linear Test Approach: HS and SAT's

proc reg data=cs;
    model gpa=satm satv hsm hss hse;
    * Do general linear test;
    * test H0: beta1 = beta2 = 0;
    sat: test satm, satv;
    * test H0: beta3 = beta4 = beta5 = 0;
    hs: test hsm, hss, hse;

R-Square 0.2115    Adj R-Sq 0.1934

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1       0.32672     0.40000       0.82      0.4149
satm         1    0.00094359  0.00068566       1.38      0.1702
satv         1   -0.00040785  0.00059189      -0.69      0.4915
hsm          1       0.14596     0.03926       3.72      0.0003
hss          1       0.03591     0.03780       0.95      0.3432
hse          1       0.05529     0.03957       1.40      0.1637

The first test statement tests the full vs. reduced models (H_0: no SATM or SATV; H_a: full model).

Test sat Results for Dependent Variable gpa

                       Mean
Source          DF    Square    F Value    Pr > F
Numerator        2   0.46566       0.95    0.3882
Denominator    218   0.49000

We do not reject the H_0 that β_1 = β_2 = 0. Probably okay to throw out SAT scores.

The second test statement tests the full vs. reduced models (H_0: no HSM, HSS, or HSE; H_a: all three in the model).

Test hs Results for Dependent Variable gpa

                       Mean
Source          DF    Square    F Value    Pr > F
Numerator        3   6.68660      13.65    <.0001
Denominator    218   0.49000

We reject the H_0 that β_3 = β_4 = β_5 = 0. We CANNOT throw out all high school grades. We can use the test statement to test any set of coefficients equal to zero. This is related to extra sums of squares (later).

Best Model?

Likely the one with just HSM. One could argue that HSM and HSE is marginally better. We'll discuss comparison methods in Chapters 7 and 8.

Key ideas from case study

- First, look at graphical and numerical summaries for one variable at a time.
- Then, look at relationships between pairs of variables with graphical and numerical summaries. Use plots and correlations to understand relationships.
- The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model.
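The general linear test that SAS's test statement performs can be sketched in numpy: fit the full and reduced models and compare their SSEs. This is an illustrative example with made-up data, not the CS data from the notes.

```python
# Illustrative sketch of the general linear test:
# F = [(SSE_R - SSE_F)/(df_R - df_F)] / (SSE_F/df_F),
# comparing a full model against a reduced model that drops predictors.
import numpy as np

def sse(X, Y):
    """Residual sum of squares from the least squares fit of Y on X."""
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

rng = np.random.default_rng(5)
n = 60
Z = rng.normal(size=(n, 4))                      # four candidate predictors
Y = 1.0 + 0.9 * Z[:, 0] + rng.normal(0, 1, n)    # only the first one matters

X_full = np.column_stack([np.ones(n), Z])        # p = 5
X_red = np.column_stack([np.ones(n), Z[:, :2]])  # drop the last two predictors

SSE_F, SSE_R = sse(X_full, Y), sse(X_red, Y)
df_F, df_R = n - 5, n - 3

F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)

# Dropping predictors can only increase the SSE.
assert SSE_R >= SSE_F
```

Compare F to the F distribution with (df_R − df_F, df_F) degrees of freedom, just as SAS does for the sat: and hs: tests above.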

- A variable can be a significant (p < 0.05) predictor alone and not significant (p > 0.05) when other X's are in the model.
- Regression coefficients, standard errors, and the results of significance tests depend on what other explanatory variables are in the model.
- Significance tests (p-values) do not tell the whole story.
- Squared multiple correlations (R²) give the proportion of variation in the response variable explained by the explanatory variables, and can give a different view; see also adjusted R².
- You can fully understand the theory in terms of Y = Xβ + ɛ.
- To effectively use this methodology in practice, you need to understand how the data were collected, the nature of the variables, and how they relate to each other.

Other things that should be considered:
- Diagnostics: Do the models meet the assumptions? You might take one of our final two potential models and check these out on your own.
- Confidence intervals/bands, prediction intervals/bands?

Example II (NKNW page 241)

Dwaine Studios, Inc. operates portrait studios in n = 21 cities of medium size. The program used to generate output for confidence intervals for means and prediction intervals is nknw241.sas.

- Y_i is sales in city i
- X_1: population aged 16 and under
- X_2: per capita disposable income

Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ɛ_i

Read in the data:

data a1;
    infile 'H:\System\Desktop\CH06FI05.DAT';
    input young income sales;
proc print data=a1;

Obs    young    income    sales
  1     68.5      16.7    174.4
  2     45.2      16.8    164.4
  3     91.3      18.2    244.2
  4     47.8      16.3    154.6
...

proc reg data=a1;
    model sales=young income;

                            Sum of          Mean
Source             DF      Squares        Square    F Value    Pr > F
Model               2        24015         12008      99.10    <.0001
Error              18   2180.92741     121.16263
Corrected Total    20        26196

Root MSE          11.00739    R-Square    0.9167
Dependent Mean   181.90476    Adj R-Sq    0.9075
Coeff Var          6.05118

                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1     -68.85707    60.01695      -1.15      0.2663
young        1       1.45456     0.21178       6.87      <.0001
income       1       9.36550     4.06396       2.30      0.0333

clb option: Confidence Intervals for the β's

proc reg data=a1;
    model sales=young income/clb;

              95% Confidence Limits
Intercept    -194.94801    57.23387
young           1.00962     1.89950
income          0.82744    17.90356

Estimation of E(Y_h)

X_h is now a vector of values (Y_h is still just a number):

X_h' = (1, X_{h,1}, X_{h,2}, ..., X_{h,p−1})

like a row of the design matrix. We want a point estimate and a confidence interval for the subpopulation mean corresponding to the set of explanatory variables X_h.

Theory for E(Y_h)

E(Y_h) = µ_h = X_h'β
µ̂_h = X_h'b
s²{µ̂_h} = X_h' s²{b} X_h = s² X_h'(X'X)^{-1} X_h
95% CI: µ̂_h ± s{µ̂_h} t_{n−p}(0.975)

clm option: Confidence Intervals for the Mean

proc reg data=a1;
    model sales=young income/clm;
    id young income;

                         Dep Var    Predicted    Std Error
Obs    young   income      sales        Value  Mean Predict        95% CL Mean
  1     68.5     16.7   174.4000     187.1841       3.8409    179.1146  195.2536
  2     45.2     16.8   164.4000     154.2294       3.5558    146.7591  161.6998
  3     91.3     18.2   244.2000     234.3963       4.5882    224.7569  244.0358
  4     47.8     16.3   154.6000     153.3285       3.2331    146.5361  160.1210

Prediction of Y_h(new)

Predict a new observation Y_h with X values equal to X_h. We want a prediction of Y_h based on a set of predictor values, with an interval that expresses the uncertainty in our prediction. As in SLR, this interval is centered at Ŷ_h and is wider than the interval for the mean.

Theory for Y_h

Y_h = X_h'β + ɛ
Ŷ_h = µ̂_h = X_h'b
s²{pred} = Var(Ŷ_h + ɛ) = Var(Ŷ_h) + Var(ɛ) = s²(1 + X_h'(X'X)^{-1} X_h)
CI for Y_h(new): Ŷ_h ± s{pred} t_{n−p}(0.975)

cli option: Confidence Interval for an Individual Observation

proc reg data=a1;
    model sales=young income/cli;
    id young income;

                         Dep Var    Predicted    Std Error
Obs    young   income      sales        Value  Mean Predict      95% CL Predict
  1     68.5     16.7   174.4000     187.1841       3.8409    162.6910  211.6772
  2     45.2     16.8   164.4000     154.2294       3.5558    129.9271  178.5317
  3     91.3     18.2   244.2000     234.3963       4.5882    209.3421  259.4506
  4     47.8     16.3   154.6000     153.3285       3.2331    129.2260  177.4311

Diagnostics

- Look at the distribution of each variable to gain insight.
- Look at the relationship between pairs of variables; proc corr is useful here. BUT note that relationships between variables can be more complicated than just pairwise: correlations are NOT the entire story.
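The two interval formulas above differ only by the "1 +" in the prediction variance, which is why the cli limits are wider than the clm limits. An illustrative numpy sketch with made-up data (the x_h point is hypothetical):

```python
# Illustrative sketch: SE for the mean response at x_h,
# s^2{mu-hat} = s^2 x_h'(X'X)^{-1}x_h, versus SE for a new observation,
# s^2{pred} = s^2 (1 + x_h'(X'X)^{-1}x_h). Data and x_h are made up.
import numpy as np

rng = np.random.default_rng(6)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])
Y = X @ np.array([5.0, 1.5, 2.0]) + rng.normal(0, 2, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
s2 = (e @ e) / (n - p)             # MSE

x_h = np.array([1.0, 5.0, 5.0])    # hypothetical point for estimation
q = x_h @ XtX_inv @ x_h            # x_h'(X'X)^{-1}x_h

se_mean = np.sqrt(s2 * q)          # for a CI on E(Y_h)
se_pred = np.sqrt(s2 * (1 + q))    # for a prediction interval on Y_h(new)

# The prediction interval is always wider than the mean CI.
assert se_pred > se_mean
```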

- Plot the residuals vs. the predicted/fitted values, each explanatory variable, and time (if available).
- Are the relationships linear? Look at Y vs. each X_i. May have to transform some X's.
- Are the residuals approximately normal? Look at a histogram and a normal quantile plot.
- Is the variance constant? Check all the residual plots.

Remedies

- Similar remedies to simple regression (but more complicated to decide among, though).
- Additionally, may eliminate some of the X's (this is called variable selection).
- Transformations such as Box-Cox.
- Analyze with/without outliers.
- More detail in NKNW Ch 9 and 10.