Lecture 5
Hypothesis Testing in Multiple Linear Regression
BIOST 515
January 20, 2004

Types of tests

- Overall test
- Test for addition of a single variable
- Test for addition of a group of variables

Overall test

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p + \epsilon_i$$

Does the entire set of independent variables contribute significantly to the prediction of $y$?

Test for addition of a single variable

Does the addition of one particular variable of interest add significantly to the prediction of $y$ achieved by the other independent variables already in the model?

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p + \epsilon_i$$

Test for addition of a group of variables

Does the addition of some group of independent variables of interest add significantly to the prediction of $y$ obtained through other independent variables already in the model?

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{i,p-1}\beta_{p-1} + x_{ip}\beta_p + \epsilon_i$$

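The ANOVA table

| Source of variation | Sums of squares | Degrees of freedom | Mean square | E[Mean square] |
|---|---|---|---|---|
| Regression | $SSR = \hat\beta' X' y - n\bar{y}^2$ | $p$ | $SSR/p$ | $\sigma^2 + \beta_R' X_C' X_C \beta_R / p$ |
| Error | $SSE = y'y - \hat\beta' X' y$ | $n-(p+1)$ | $SSE/(n-(p+1))$ | $\sigma^2$ |
| Total | $SSTO = y'y - n\bar{y}^2$ | $n-1$ | | |

$X_C$ is the matrix of centered predictors:

$$X_C = \begin{pmatrix}
x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1p}-\bar{x}_p \\
x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2p}-\bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{np}-\bar{x}_p
\end{pmatrix}$$

and $\beta_R = (\beta_1, \ldots, \beta_p)'$.
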
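The ANOVA table for

$$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \epsilon_i$$

is often provided in the output from statistical software as

| Source of variation | Sums of squares | Degrees of freedom | F |
|---|---|---|---|
| Regression: $x_1$ | $SSR(x_1)$ | 1 | |
| Regression: $x_2 \mid x_1$ | $SSR(x_2 \mid x_1)$ | 1 | |
| $\vdots$ | $\vdots$ | $\vdots$ | |
| Regression: $x_p \mid x_{p-1}, x_{p-2}, \ldots, x_1$ | $SSR(x_p \mid x_{p-1}, x_{p-2}, \ldots, x_1)$ | 1 | |
| Error | $SSE$ | $n-(p+1)$ | |
| Total | $SSTO$ | $n-1$ | |

where

$$SSR = SSR(x_1) + SSR(x_2 \mid x_1) + \cdots + SSR(x_p \mid x_{p-1}, x_{p-2}, \ldots, x_1)$$

and has $p$ degrees of freedom.
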
Overall test

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j, \; j = 1, \ldots, p$$

Rejection of $H_0$ implies that at least one of the regressors, $x_1, x_2, \ldots, x_p$, contributes significantly to the model.

We will use a generalization of the F-test in simple linear regression to test this hypothesis.

Under the null hypothesis, $SSR/\sigma^2 \sim \chi^2_p$ and $SSE/\sigma^2 \sim \chi^2_{n-(p+1)}$ are independent. Therefore, we have

$$F_0 = \frac{SSR/p}{SSE/(n-p-1)} = \frac{MSR}{MSE} \sim F_{p,\,n-p-1}$$

Note: as in simple linear regression, we are assuming that $\epsilon_i \sim N(0, \sigma^2)$ or relying on large sample theory.

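The overall F statistic above can be computed directly from the sums of squares. The following is a minimal sketch on simulated data; the variables `x1`, `x2`, and `y` are hypothetical stand-ins and are not the CHS data used in the lecture.

```r
# Simulated stand-in data (hypothetical; NOT the CHS data set).
set.seed(515)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)
p   <- 2                                   # number of predictors, excluding the intercept

sse  <- sum(resid(fit)^2)                  # error sum of squares
ssto <- sum((y - mean(y))^2)               # total (corrected) sum of squares
ssr  <- ssto - sse                         # regression sum of squares

f0     <- (ssr / p) / (sse / (n - p - 1))  # F_0 = MSR / MSE
f_crit <- qf(0.95, df1 = p, df2 = n - p - 1)            # critical value at alpha = .05
p_val  <- pf(f0, p, n - p - 1, lower.tail = FALSE)      # upper-tail p-value

# The same statistic as reported by summary()
f_summary <- unname(summary(fit)$fstatistic["value"])
```

Here `f0` and `f_summary` should agree up to floating-point error, and $H_0$ is rejected at $\alpha = .05$ when `f0 > f_crit`.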
CHS example, cont.

$$y_i = \beta_0 + \text{weight}_i \beta_1 + \text{height}_i \beta_2 + \epsilon_i$$

    > anova(lmwtht)
    Analysis of Variance Table

    Response: DIABP
               Df Sum Sq Mean Sq F value Pr(>F)
    WEIGHT                                      **
    HEIGHT
    Residuals
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$$F_0 = \frac{\left(SSR(\text{weight}) + SSR(\text{height} \mid \text{weight})\right)/2}{SSE/495} = 5.59 > F_{2,495,.95}$$

We reject the null hypothesis at $\alpha = .05$ and conclude that at least one of $\beta_1$ or $\beta_2$ is not equal to 0.

The overall F statistic is also available from the output of summary().

    > summary(lmwtht)

    Call:
    lm(formula = DIABP ~ WEIGHT + HEIGHT, data = chs)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                             e-10 ***
    WEIGHT                                       *
    HEIGHT
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error:  on 495 degrees of freedom
    Multiple R-Squared: ,     Adjusted R-squared:
    F-statistic:  on 2 and 495 DF,  p-value:

Tests on individual regression coefficients

Once we have determined that at least one of the regressors is important, a natural next question might be: which one(s)?

Important considerations:

- Is the increase in the regression sums of squares sufficient to warrant an additional predictor in the model?
- Additional predictors will increase the variance of $\hat{y}$, so include only predictors that explain the response. (Note: we may not be able to determine this through hypothesis testing, as confounders may not test significant but would still be necessary in the regression model.)
- Adding an unimportant predictor may increase the residual mean square, thereby reducing the usefulness of the model.

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ij}\beta_j + \cdots + x_{ip}\beta_p + \epsilon_i$$

$$H_0: \beta_j = 0$$
$$H_1: \beta_j \neq 0$$

As in simple linear regression, under the null hypothesis

$$t_0 = \frac{\hat\beta_j}{\widehat{se}(\hat\beta_j)} \sim t_{n-p-1}.$$

We reject $H_0$ if $|t_0| > t_{n-p-1,\,1-\alpha/2}$.

This is a partial test because $\hat\beta_j$ depends on all of the other predictors $x_i$, $i \neq j$, that are in the model. Thus, this is a test of the contribution of $x_j$ given the other predictors in the model.

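CHS example, cont.

$$y_i = \beta_0 + \text{weight}_i \beta_1 + \text{height}_i \beta_2 + \epsilon_i$$

$H_0: \beta_2 = 0$ vs $H_1: \beta_2 \neq 0$, given that weight is in the model.

From the ANOVA table we obtain $\hat\sigma^2$ (the residual mean square), and we compute $C = (X'X)^{-1}$. With $C_{22}$ denoting the diagonal element of $C$ that corresponds to $\beta_2$,

$$t_0 = \frac{\hat\beta_2}{\sqrt{\hat\sigma^2 C_{22}}}, \qquad |t_0| < t_{495,.975} = 1.96$$

Therefore, we fail to reject the null hypothesis.
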
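The partial t-test can be reproduced by hand from $\hat\sigma^2$ and $C = (X'X)^{-1}$. This is a sketch on simulated data; `x1`, `x2`, and `y` are hypothetical and do not reproduce the CHS numbers.

```r
# Simulated stand-in data (hypothetical; NOT the CHS data set).
set.seed(515)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)                 # n x (p + 1) design matrix, intercept included
p   <- ncol(X) - 1

C          <- solve(crossprod(X))        # C = (X'X)^{-1}
sigma2_hat <- sum(resid(fit)^2) / (n - p - 1)   # residual mean square (MSE)

j      <- "x2"                           # coefficient under test
se_j   <- sqrt(sigma2_hat * C[j, j])     # estimated standard error of beta_j hat
t0     <- unname(coef(fit)[j]) / se_j    # t statistic for H0: beta_j = 0
t_crit <- qt(0.975, df = n - p - 1)      # two-sided critical value, alpha = .05
reject <- abs(t0) > t_crit

# The same t statistic as reported by summary()
t_summary <- summary(fit)$coefficients[j, "t value"]
```

The hand-computed `t0` should match the `t value` column of `summary(fit)` for the same coefficient.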
Tests for groups of predictors

Often it is of interest to determine whether a group of predictors contributes to predicting $y$ given that another predictor or group of predictors is in the model.

- In the CHS example, we may want to know if age, height and sex are important predictors given that weight is in the model when predicting blood pressure.
- We may want to know if additional powers of some predictor are important in the model given that the linear term is already in the model.
- Given a predictor of interest, are interactions with other confounders of interest as well?

Using sums of squares to test for groups of predictors

Determine the contribution of a predictor or group of predictors to SSR, given that the other regressors are in the model, using the extra-sums-of-squares method.

Consider the regression model with $p$ predictors

$$y = X\beta + \epsilon.$$

We would like to determine if some subset of $r < p$ predictors contributes significantly to the regression model.

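Partition the vector of regression coefficients as

$$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$$

where $\beta_1$ is $(p + 1 - r) \times 1$ and $\beta_2$ is $r \times 1$. We want to test the hypothesis

$$H_0: \beta_2 = 0$$
$$H_1: \beta_2 \neq 0$$

Rewrite the model as

$$y = X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon, \qquad (1)$$

where $X = [X_1 \; X_2]$.
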
Equation (1) is the full model, with SSR expressed as

$$SSR(X) = \hat\beta' X' y \quad (p+1 \text{ degrees of freedom})$$

and

$$MSE = \frac{y'y - \hat\beta' X' y}{n - p - 1}.$$

To find the contribution of the predictors in $X_2$, fit the model assuming $H_0$ is true. This reduced model is

$$y = X_1\beta_1 + \epsilon,$$

where

$$\hat\beta_1 = (X_1' X_1)^{-1} X_1' y$$

and

$$SSR(X_1) = \hat\beta_1' X_1' y \quad (p+1-r \text{ degrees of freedom}).$$

The regression sum of squares due to $X_2$ when $X_1$ is already in the model is

$$SSR(X_2 \mid X_1) = SSR(X) - SSR(X_1)$$

with $r$ degrees of freedom. This is also known as the extra sum of squares due to $X_2$.

$SSR(X_2 \mid X_1)$ is independent of MSE. We can test $H_0: \beta_2 = 0$ with the statistic

$$F_0 = \frac{SSR(X_2 \mid X_1)/r}{MSE} \sim F_{r,\,n-p-1}.$$

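The extra-sum-of-squares test can be carried out by fitting the full and reduced models and differencing. A minimal sketch on simulated data follows; the variable names and the choice of which predictors form $X_2$ are assumptions made only for illustration.

```r
# Simulated stand-in data (hypothetical; NOT the CHS data set).
set.seed(515)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 2 + 0.8 * x1 + 0.4 * x3 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3)   # full model: X = [X1 X2], p = 3 predictors
reduced <- lm(y ~ x1)             # reduced model under H0: coefficients of x2, x3 are 0
p <- 3
r <- 2                            # number of predictors being tested

sse_full <- sum(resid(full)^2)
sse_red  <- sum(resid(reduced)^2)

# Both models share the same total sum of squares, so
# SSR(X2 | X1) = SSR(X) - SSR(X1) = SSE(reduced) - SSE(full).
extra_ss <- sse_red - sse_full
mse      <- sse_full / (n - p - 1)           # MSE from the full model

f0    <- (extra_ss / r) / mse                # F_0 ~ F_{r, n-p-1} under H0
p_val <- pf(f0, r, n - p - 1, lower.tail = FALSE)

# R's built-in comparison of nested models gives the same statistic
tab <- anova(reduced, full)
```

The second row of `tab` reports the same extra sum of squares, F statistic, and p-value.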
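CHS example, cont.

Full model: $y_i = \beta_0 + \text{weight}_i \beta_1 + \text{height}_i \beta_2 + \epsilon_i$

$H_0: \beta_2 = 0$

               Df Sum Sq Mean Sq F value Pr(>F)
    WEIGHT
    HEIGHT
    Residuals

$$F_0 = \frac{SSR(\text{height} \mid \text{weight})/1}{MSE} = 0.95 < F_{1,495,0.95} = 3.86$$

This should look very similar to the t-test for $H_0$: when a single coefficient is tested, $F_0 = t_0^2$.
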
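The link between the single-coefficient F-test and the partial t-test can be checked numerically. The sketch below uses simulated data; `weight`, `height`, and `diabp` are hypothetical variables that only mimic the names in the CHS example.

```r
# Simulated stand-in data (hypothetical; NOT the CHS data set).
set.seed(515)
n      <- 100
weight <- rnorm(n, mean = 160, sd = 30)
height <- rnorm(n, mean = 165, sd = 10)
diabp  <- 60 + 0.05 * weight + rnorm(n, sd = 10)

fit <- lm(diabp ~ weight + height)

# Partial t statistic for height, given weight is in the model
t0 <- summary(fit)$coefficients["height", "t value"]

# Sequential F statistic for height entered after weight
f0 <- anova(fit)["height", "F value"]

# For the last term entered, the two tests coincide: F_0 = t_0^2
c(t_squared = t0^2, F = f0)
```

The equality holds for `height` because it is the last term in the sequential table; the `weight` row tests weight alone and generally does not match its partial t-test.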
$$\text{BP}_i = \beta_0 + \text{weight}_i \beta_1 + \text{height}_i \beta_2 + \text{age}_i \beta_3 + \text{gender}_i \beta_4 + \epsilon_i$$

    > summary(lm(DIABP ~ WEIGHT + HEIGHT + AGE + GENDER, data = chs))

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                             e-08 ***
    WEIGHT
    HEIGHT
    AGE                                          ***
    GENDER
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error:  on 493 degrees of freedom
    Multiple R-Squared: ,     Adjusted R-squared:
    F-statistic:  on 4 and 493 DF,  p-value:

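$H_0: \beta_2 = \beta_3 = \beta_4 = 0$ vs $H_1: \beta_j \neq 0$ for at least one $j$, $j = 2, 3, 4$

               Df Sum Sq Mean Sq F value Pr(>F)
    WEIGHT
    HEIGHT
    AGE
    GENDER
    Residuals

$$SSR(\text{height, age, gender} \mid \text{intercept, weight}) = SSR(\text{intercept, weight, height, age, gender}) - SSR(\text{intercept, weight}) = 1670$$

Notice we can also get this from the ANOVA table above:

$$SSR(\text{height, age, gender} \mid \text{intercept, weight}) = SSR(\text{height} \mid \text{weight}) + SSR(\text{age} \mid \text{weight, height}) + SSR(\text{gender} \mid \text{weight, height, age}) = 1670$$
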
The observed F statistic is

$$F_0 = \frac{1670/3}{MSE} = 13.5 > F_{3,493,.95} = 2.62,$$

and we reject the null hypothesis, concluding that at least one of $\beta_2$, $\beta_3$ or $\beta_4$ is not equal to 0.

This should look very similar to the overall F test if we considered the intercept to be a predictor and all the covariates to be the additional variables under consideration.

What if we had put the predictors in the model in a different order?

$$\text{diabp}_i = \beta_0 + \text{height}_i \beta_2 + \text{age}_i \beta_3 + \text{weight}_i \beta_1 + \text{gender}_i \beta_4 + \epsilon_i$$

               Df Sum Sq Mean Sq F value Pr(>F)
    HEIGHT
    AGE
    WEIGHT
    GENDER
    Residuals

Could we use this table to test $H_0: \beta_2 = \beta_3 = \beta_4 = 0$?

What if we had the ANOVA table for the reduced model?

               Df Sum Sq Mean Sq F value Pr(>F)
    WEIGHT
    Residuals

Given that

$$SSR = SSR(x_2) + SSR(x_3 \mid x_2) + SSR(x_1 \mid x_2, x_3) + SSR(x_4 \mid x_3, x_2, x_1)$$

and

$$SSR(x_2, x_3, x_4 \mid x_1) = SSR - SSR(x_1),$$

we obtain

$$SSR(x_2, x_3, x_4 \mid x_1) = 1680.$$

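The effect of predictor order on the sequential ANOVA table can be explored numerically. The following sketch uses simulated data with hypothetical predictors `x1` to `x4`; it illustrates that the individual rows change with the order, while the total SSR and the extra sum of squares $SSR(x_2, x_3, x_4 \mid x_1)$ do not.

```r
# Simulated stand-in data (hypothetical; NOT the CHS data set).
set.seed(515)
n  <- 150
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)     # correlated with x1, so order matters
x3 <- rnorm(n)
x4 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x3 + rnorm(n)

terms <- c("x1", "x2", "x3", "x4")
tab_a <- anova(lm(y ~ x1 + x2 + x3 + x4))   # x1 entered first
tab_b <- anova(lm(y ~ x2 + x3 + x1 + x4))   # x1 entered third

# Total regression sum of squares does not depend on the order
ssr_a <- sum(tab_a[terms, "Sum Sq"])
ssr_b <- sum(tab_b[terms, "Sum Sq"])

# Order A: SSR(x2, x3, x4 | x1) is the sum of the rows after x1
extra_a <- sum(tab_a[c("x2", "x3", "x4"), "Sum Sq"])

# Order B: the rows cannot simply be summed; use SSR - SSR(x1),
# with SSR(x1) taken from the ANOVA table of the reduced model
ssr_x1  <- anova(lm(y ~ x1))["x1", "Sum Sq"]
extra_b <- ssr_b - ssr_x1
```

Comparing `tab_a["x1", "Sum Sq"]` with `tab_b["x1", "Sum Sq"]` shows how a single row can differ between the two orderings.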
Another question of interest is whether there are any significant interactions in the model.

    > lm(DIABP ~ WEIGHT * HEIGHT * AGE * GENDER, data = chs)

                             Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    WEIGHT
    HEIGHT
    AGE
    GENDER
    WEIGHT:HEIGHT
    WEIGHT:AGE
    HEIGHT:AGE
    WEIGHT:GENDER
    HEIGHT:GENDER
    AGE:GENDER
    WEIGHT:HEIGHT:AGE
    WEIGHT:HEIGHT:GENDER
    WEIGHT:AGE:GENDER
    HEIGHT:AGE:GENDER
    WEIGHT:HEIGHT:AGE:GENDER

ANOVA table

                             Df Sum Sq Mean Sq F value Pr(>F)
    WEIGHT
    HEIGHT
    AGE
    GENDER
    WEIGHT:HEIGHT
    WEIGHT:AGE
    HEIGHT:AGE
    WEIGHT:GENDER
    HEIGHT:GENDER
    AGE:GENDER
    WEIGHT:HEIGHT:AGE
    WEIGHT:HEIGHT:GENDER
    WEIGHT:AGE:GENDER
    HEIGHT:AGE:GENDER
    WEIGHT:HEIGHT:AGE:GENDER
    Residuals

We can simplify the ANOVA table to

                  Df Sum Sq Mean Sq F value Pr(>F)
    Main effects
    Interactions
    Residuals

How do we fill in the rest of this table?

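One way to fill in the collapsed table is to sum the degrees of freedom and sequential sums of squares within each group of rows and then form F statistics against the residual mean square. The sketch below does this on simulated data; the data frame `d` and its columns are hypothetical, and this is one possible approach rather than necessarily the one intended in the lecture.

```r
# Simulated stand-in data (hypothetical; NOT the CHS data set).
set.seed(515)
n <- 200
d <- data.frame(weight = rnorm(n), height = rnorm(n),
                age = rnorm(n), gender = rbinom(n, 1, 0.5))
d$diabp <- 70 + 2 * d$weight - d$age + rnorm(n, sd = 5)

full <- lm(diabp ~ weight * height * age * gender, data = d)
tab  <- anova(full)                       # sequential table, main effects listed first

is_res <- rownames(tab) == "Residuals"
is_int <- grepl(":", rownames(tab), fixed = TRUE)   # interaction rows contain ":"
group  <- ifelse(is_res, "Residuals",
          ifelse(is_int, "Interactions", "Main effects"))
group  <- factor(group, levels = c("Main effects", "Interactions", "Residuals"))

# Collapse Df and Sum Sq within each group
df_g <- tapply(tab$Df, group, sum)
ss_g <- tapply(tab[["Sum Sq"]], group, sum)
ms_g <- ss_g / df_g
mse  <- ms_g[["Residuals"]]

f_g <- ms_g[c("Main effects", "Interactions")] / mse
p_g <- pf(f_g, df_g[c("Main effects", "Interactions")],
          df_g[["Residuals"]], lower.tail = FALSE)

collapsed <- data.frame(Df = as.vector(df_g), SumSq = as.vector(ss_g),
                        MeanSq = as.vector(ms_g),
                        F = c(as.vector(f_g), NA), p = c(as.vector(p_g), NA),
                        row.names = levels(group))
```

Because the main effects enter first, the Interactions row is the extra sum of squares for all interaction terms given the main effects, so its F statistic matches `anova()` applied to the main-effects-only model and `full`.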