

STAT E150 Statistical Methods

Multiple Regression

Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the ages of 18 and 39, a healthy body fat percentage is 8% to 19%. (For women it is 21% to 32%.)

It is not easy to measure body fat percentage, but we can find a model for the relationship between body fat percentage and waist size and use it to estimate the body fat percentage associated with a given waist size. The scatterplot indicates a positive linear relationship between waist size and body fat percentage:

[Scatterplot: Pct BF vs. Waist]

The SPSS output shows a significant linear relationship between the two variables. $R^2 = .678$, so almost 68% of the variability in body fat percentage is accounted for by waist size.

[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate); Coefficients (Constant, Waist: unstandardized B and Std. Error, standardized Beta, t, Sig.)]

What other variables might be used to predict body fat percentage? Can we improve the prediction by including additional variables?
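The $R^2$ in the Model Summary can be computed from the sums of squares in the ANOVA table. The minimal Python sketch below also computes Adjusted R Square, which appears in the same SPSS table and penalizes each added predictor. The function names and the sums of squares in the example are made up for illustration; they are not taken from the body fat output.

```python
def r_squared(sse: float, sst: float) -> float:
    """Proportion of the variability in Y accounted for by the model.

    For a least-squares fit with an intercept, SST = SSR + SSE,
    so 1 - SSE/SST equals SSR/SST.
    """
    return 1.0 - sse / sst


def adjusted_r_squared(sse: float, sst: float, n: int, k: int) -> float:
    """R^2 adjusted for k predictors and n observations.

    Replaces each sum of squares by its mean square:
    1 - [SSE / (n - k - 1)] / [SST / (n - 1)].
    """
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical sums of squares (not from the course data).
print(r_squared(sse=30.0, sst=120.0))  # 0.75
print(adjusted_r_squared(sse=30.0, sst=120.0, n=20, k=1))
```

Adding a predictor can never decrease $R^2$, but adjusted $R^2$ can fall when the new predictor adds little, which gives one way to judge whether an additional variable really improves the prediction.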
The Multiple Linear Regression Model

We have n observations on k explanatory variables $X_1, X_2, X_3, \ldots, X_k$ and a response variable $Y$. The multiple regression model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where $\varepsilon \sim N(0, \sigma_\varepsilon)$ and the errors are independent of one another.

The predictor variables may be higher powers or other functions of quantitative variables, coded categorical variables, or interaction terms. The main restriction is that the model is linear in the coefficients; that is, each term is a constant multiple of a predictor.

Fitting a Multiple Linear Regression Model

As we did in simple linear regression, we will choose a possible set of predictors, estimate the coefficients from sample data, and assess the fit. We again use least squares. The residuals are the differences between the actual Y values and the Y values predicted by the prediction equation

$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \cdots + \hat\beta_k X_k$$

and we use SPSS to find the estimates $\hat\beta_i$ of the coefficients $\beta_i$ that minimize the sum of the squared residuals.

We will test the hypotheses

$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$
$H_a$: The slopes are not all zero.

Our assumptions are:
- The y-values are independent of each other.
- Y has a constant variance for any combination of predictors.
- The values of y are normally distributed for any fixed set of values of the explanatory variables.

That is, the errors are independent values from a $N(0, \sigma_\varepsilon)$ distribution.

If the null hypothesis is rejected, then test a null hypothesis for each of the coefficients:

$H_0: \beta_j = 0$
$H_a: \beta_j \neq 0$

Note: If this null hypothesis is not rejected, it does not mean that the corresponding predictor variable has no relationship to y; it means that the predictor variable does not contribute significantly to modeling y after allowing for all the other predictors.
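To make the least-squares criterion concrete, here is a minimal pure-Python sketch that finds the coefficients minimizing the sum of squared residuals by solving the normal equations $(A^T A)\hat\beta = A^T y$, where $A$ is the predictor data with a leading column of 1s for the intercept. This is only an illustration of the same criterion SPSS uses, not how SPSS is implemented; the function names and the toy data are hypothetical.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    rows: list of n observations, each a list of k predictor values.
    y:    list of n response values.
    """
    # Design matrix: a leading 1 in each row gives the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(A[0])  # number of coefficients, k + 1

    # Normal equations M b = r, with M = A^T A and r = A^T y.
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    r = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        r[col], r[pivot] = r[pivot], r[col]
        for i in range(col + 1, p):
            factor = M[i][col] / M[col][col]
            for j in range(col, p):
                M[i][j] -= factor * M[col][j]
            r[i] -= factor * r[col]

    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (r[i] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


def sum_squared_residuals(rows, y, b):
    """SSE = sum of (actual Y - predicted Y)^2 for coefficients b."""
    total = 0.0
    for row, yi in zip(rows, y):
        y_hat = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
        total += (yi - y_hat) ** 2
    return total


# Toy data (hypothetical), generated exactly from Y = 1 + 2*X1 + 3*X2.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [9, 8, 22, 18, 23]
print(fit_least_squares(rows, y))  # approximately [1.0, 2.0, 3.0]
```

Changing any fitted coefficient, even slightly, increases `sum_squared_residuals`, which is exactly what "least squares" means.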
The hypotheses for fitting a multiple linear regression model to predict body fat percentage from waist size and height are

$H_0: \beta_{height} = \beta_{waist} = 0$
$H_a$: The slopes are not both zero.

Here are the scatterplots using the individual predictors:

[Scatterplots: Pct BF vs. Waist; Pct BF vs. Height]

The first plot suggests a linear relationship between waist size and body fat percentage, but there doesn't appear to be a linear relationship between height and body fat percentage.

Here are some of the results of a multiple regression analysis with both height and waist as predictors:

[SPSS output: Coefficients (Constant, Waist, Height: B, Std. Error, Beta, t, Sig.); ANOVA (Regression, Residual, Total: Sum of Squares, df, Mean Square, F, Sig.). Predictors: (Constant), Height, Waist]

The p-value for height is close to 0, so height does contribute significantly to the multiple regression model once waist size is taken into account, even though height by itself shows no linear relationship with body fat percentage.
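How can height matter in the model when its own scatterplot shows no linear relationship? Its coefficient measures its contribution after allowing for waist size. The invented toy numbers below (not the course data set) are built so that height has zero correlation with the response, yet the two-predictor least-squares fit gives height a clearly nonzero coefficient. The closed-form two-predictor formulas on centered data are standard least-squares results; the function and variable names are hypothetical.

```python
from math import sqrt


def centered(v):
    """Subtract the mean from each value."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def correlation(x, y):
    """Sample correlation coefficient."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / sqrt(dot(xc, xc) * dot(yc, yc))


def fit_two_predictors(x1, x2, y):
    """Least-squares (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = sum(y) / len(y) - b1 * sum(x1) / len(x1) - b2 * sum(x2) / len(x2)
    return b0, b1, b2


# Hypothetical toy data in which waist and height are correlated.
waist = [34, 31.5, 34, 36.5, 34]
height = [66, 67, 68, 69, 70]
pct_bf = [22, 16, 20, 24, 18]

print(correlation(height, pct_bf))                # 0.0
print(fit_two_predictors(waist, height, pct_bf))  # (20.0, 2.0, -1.0)
```

Because waist and height are correlated here, once waist is in the model height explains part of the remaining variation. This is why the individual scatterplots alone cannot tell us which predictors belong in a multiple regression.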
The graph shown below is called a scatterplot matrix. It shows the scatterplots for all pairs of the variables we are using.

[Scatterplot matrix: Pct BF, Waist, Height]

Which pair of variables shows a strong linear relationship?
Which pair of variables shows a weak linear relationship?
Which pair of variables shows no linear relationship?

Residual Analysis

[Residual plots]

These plots tell us that there is no particular pattern in the scatter of the residuals, and that the distribution of the residuals is close to normal.
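A scatterplot matrix can be summarized numerically by the correlation for each pair of variables. Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship. The sketch below computes these pairwise correlations in plain Python; the variable names and numbers are hypothetical placeholders, not the body fat data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(data):
    """Correlation for every pair of variables, one per panel pair of the matrix."""
    return {
        (a, b): pearson_r(data[a], data[b])
        for a, b in combinations(data, 2)
    }


# Hypothetical values for three variables.
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],  # exactly linear in x1
    "x3": [3, 1, 4, 1, 5],
}
for pair, r in pairwise_correlations(data).items():
    print(pair, round(r, 3))
```

Correlation only measures linear association, so the plots are still needed to spot curved patterns or outliers.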
Use the SPSS output to answer the questions below:

[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate); Coefficients (Constant, Waist, Height: B, Std. Error, Beta, t, Sig.); ANOVA (Regression, Residual, Total: Sum of Squares, df, Mean Square, F, Sig.). Predictors: (Constant), Height, Waist. Dependent Variable: Pct BF]

What is the fitted regression equation?

What does the value tell you?

What change in body fat percentage is associated with each additional inch of height?

What is the value of $R^2$? What does it tell you?
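Once SPSS gives the fitted equation, predictions and coefficient interpretations are simple arithmetic. In the sketch below the coefficient values are hypothetical placeholders (substitute the B column of your Coefficients table). It checks that increasing height by one inch while holding waist fixed changes the predicted body fat percentage by exactly the height coefficient.

```python
def predict_pct_bf(b0, b_waist, b_height, waist, height):
    """Predicted body fat percentage from a fitted two-predictor equation."""
    return b0 + b_waist * waist + b_height * height


# Hypothetical placeholder coefficients, NOT the SPSS estimates.
b0, b_waist, b_height = -3.0, 1.5, -0.5

base = predict_pct_bf(b0, b_waist, b_height, waist=36, height=70)
taller = predict_pct_bf(b0, b_waist, b_height, waist=36, height=71)
print(base)           # 16.0
print(taller - base)  # -0.5, which equals b_height
```

The same difference holds at every waist size, which is why the height coefficient is interpreted as the change in predicted body fat percentage per additional inch of height among people with the same waist size.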
[SPSS output: ANOVA (Regression, Residual, Total: Sum of Squares, df, Mean Square, F, Sig.). Predictors: (Constant), Waist, Height. Dependent Variable: Pct BF]

Use the SPSS results to complete the hypothesis test:

The value of the test statistic is: F =
p =
What can you conclude?

We also want to estimate the standard deviation of the error term, $\sigma_\varepsilon$. Each time we add a new predictor to the model, we have a new coefficient to estimate, and so we lose one more degree of freedom. The estimate of $\sigma_\varepsilon$ for a multiple regression model with k predictors, called the standard error of the regression, is

$$\hat\sigma_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}$$

Use the SPSS output to find the standard error of this regression model:

$$\hat\sigma_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} =$$
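The ANOVA F statistic and the standard error of the regression follow directly from the sums of squares and degrees of freedom: the regression mean square has k degrees of freedom, the residual mean square has $n - k - 1$, $F = MSR/MSE$, and $\hat\sigma_\varepsilon = \sqrt{MSE}$, which is the value SPSS reports as Std. Error of the Estimate. The sketch below uses made-up sums of squares (not the body fat output); computing the p-value would also need the F distribution, which this standard-library-only sketch leaves out.

```python
from math import sqrt


def anova_f(ssr, sse, n, k):
    """Overall F statistic for H0: all k slopes are zero."""
    msr = ssr / k            # regression mean square, df = k
    mse = sse / (n - k - 1)  # residual mean square, df = n - k - 1
    return msr / mse


def std_error_of_regression(sse, n, k):
    """Estimate of sigma_epsilon: sqrt(SSE / (n - k - 1))."""
    return sqrt(sse / (n - k - 1))


# Made-up values: n = 25 observations, k = 2 predictors.
print(anova_f(ssr=200.0, sse=100.0, n=25, k=2))
print(std_error_of_regression(sse=100.0, n=25, k=2))
```

With SSE held fixed, adding a predictor lowers $n - k - 1$ and so raises $\hat\sigma_\varepsilon$; a new predictor only improves this estimate if it reduces SSE enough to offset the lost degree of freedom.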
Assessing a Multiple Regression Model

Individual t-tests for Coefficients in Multiple Regression

In order to determine whether any one of the predictor variables is helpful to include in the model, we test the coefficient for that predictor:

$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

The test statistic is

$$t = \frac{\hat\beta_i - 0}{SE_{\hat\beta_i}}$$

with $n - k - 1$ degrees of freedom.

It is important to remember that the meaning of each coefficient depends on all of the predictors in the regression model. If we fail to reject the null hypothesis, it means that we have no evidence that the corresponding predictor variable contributes to the multiple regression model after allowing for all other predictors.

Use the SPSS output to test the coefficients in our model:

[SPSS output: Coefficients (Constant, Waist, Height: B, Std. Error, Beta, t, Sig.)]

$H_0: \beta_{height} = 0$
$H_a: \beta_{height} \neq 0$
t =
p =
What is your conclusion?

$H_0: \beta_{waist} = 0$
$H_a: \beta_{waist} \neq 0$
t =
p =
What is your conclusion?

$H_0: \beta_{waist} = 0$
$H_a: \beta_{waist} > 0$
t =
p =
What is your conclusion?
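The t-test arithmetic is sketched below. SPSS's Sig. column reports a two-sided p-value; for a one-sided alternative such as $H_a: \beta_{waist} > 0$, the symmetry of the t distribution lets us convert it. The function names and the estimate, standard error, and p-value in the example are hypothetical.

```python
def t_statistic(b_hat, se, null_value=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (b_hat - null_value) / se


def residual_df(n, k):
    """Degrees of freedom for the individual t-tests: n - k - 1."""
    return n - k - 1


def one_sided_p_greater(t, two_sided_p):
    """p-value for H_a: beta > 0 from the two-sided Sig. reported by SPSS.

    If t > 0 the estimate points in the direction of H_a and the
    one-sided p-value is half the two-sided one; otherwise it is 1 - p/2.
    """
    return two_sided_p / 2 if t > 0 else 1 - two_sided_p / 2


# Hypothetical estimate, standard error, and two-sided p-value.
t = t_statistic(b_hat=1.5, se=0.5)
print(t, residual_df(n=20, k=2))                  # 3.0 17
print(one_sided_p_greater(t, two_sided_p=0.008))  # 0.004
```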
Using SPSS for Multiple Regression

Creating a Scatterplot Matrix

Click on Graphs > Chart Builder.
Select Scatter/Dot from the list of charts.
Drag the Scatterplot Matrix to the window.
Drag the matrix variables to the horizontal axis.
Click on OK.

The scatterplot matrix will appear in the Output Viewer.
Estimating the Model

Click on Analyze > Regression > Linear and drag the dependent variable and all independent variables to the appropriate locations. Click on OK. This will produce several tables:

[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate); Coefficients (Constant, Height, Waist: B, Std. Error, Beta, t, Sig.); ANOVA (Regression, Residual, Total). Predictors: (Constant), Waist, Height. Dependent Variable: Pct BF]
If you click on Plots in the Linear Regression dialog box, you will get this dialog box:

[Linear Regression: Plots dialog box]

Plot *ZRESID (the standardized residuals) on the Y axis against *ZPRED (the standardized predicted values) on the X axis. You may also choose to create a Normal Probability Plot and/or a histogram of the residuals. Click on Continue and then OK. Here are the results:

[Residual plots]
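*ZPRED and *ZRESID are standardized versions of the predicted values and the residuals. The sketch below assumes *ZPRED standardizes the predicted values by their mean and sample standard deviation, and *ZRESID divides each residual by the standard error of the regression, $\sqrt{SSE/(n-k-1)}$. The function names and data are hypothetical. In the plot of one against the other, we hope to see a patternless horizontal band around 0.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_predicted(fitted):
    """Like *ZPRED: (predicted - mean) / sample SD of predicted values."""
    m, s = mean(fitted), stdev(fitted)
    return [(v - m) / s for v in fitted]


def standardized_residuals(observed, fitted, k):
    """Like *ZRESID: residual / sqrt(SSE / (n - k - 1)) for k predictors."""
    residuals = [y - f for y, f in zip(observed, fitted)]
    n = len(residuals)
    s_e = sqrt(sum(e * e for e in residuals) / (n - k - 1))
    return [e / s_e for e in residuals]


# Hypothetical observed and fitted values from a model with k = 2 predictors.
observed = [12.0, 18.5, 9.0, 22.0, 15.5, 20.0]
fitted = [13.1, 17.2, 10.4, 21.0, 15.9, 19.4]

for zp, zr in zip(standardized_predicted(fitted),
                  standardized_residuals(observed, fitted, k=2)):
    print(f"ZPRED = {zp:6.2f}   ZRESID = {zr:6.2f}")
```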
If you click on Plots in the Linear Regression dialog box, you will get this dialog box:

[Linear Regression: Plots dialog box]

Choose to plot *SRESID (the studentized residuals) on the Y axis against *ZPRED on the X axis. You may also choose to create a Normal Probability Plot of the residuals. Click on Continue and then OK. Here are the results:

[Residual plots]

Assignment 3

Read Chapter 3 through Section 3.2.

Hand in the solutions to the following exercises: Ex. 3.1, 3.3, 3.11a-c

Use SPSS to complete the following exercises. Be sure to paste the SPSS output into your solutions, using Autofit. Ex. 3.12a-c, 3.13a-c, 2.30, 2.31, 2.32
