Multiple Regression Analysis in Minitab


Suppose we are interested in how exercise and body mass index affect blood pressure. A random sample of 10 males 50 years of age is selected, and their height, weight, number of hours of exercise, and blood pressure are measured. Body mass index is calculated by the formula

    BMI = weight (kg) / [height (m)]²

Select Stat-Regression-Regression from the pull-down menu. Place the variable we would like to predict, blood pressure, in the Response: box, and the variables we will use for prediction, exercise and body mass index, in the Predictors: box. Click OK. This generates the following Minitab output.

The regression equation is
BloodPressure = 74.5 - 2.84 Exercise + 2.71 BMI

Predictor      Coef  SE Coef      T      P
Constant      74.49    29.41   2.53  0.039
Exercise     -2.836    1.861  -1.52  0.171
BMI          2.7119   0.9144   2.97  0.021

S = 11.9087   R-Sq = 80.4%   R-Sq(adj) = 74.8%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  4079.8  2039.9  14.38  0.003
Residual Error   7   992.7   141.8
Total            9  5072.5

The interpretation of R² is the same as before: 80.4% of the variation in Y is explained by the regression line. The fitted regression model found from the output is

    (Blood Pressure) = 74.5 - 2.84 * Exercise + 2.71 * BMI
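For readers following along outside Minitab, here is a minimal sketch of the same fit in Python with statsmodels. The file name bloodpressure.csv and its column names are assumptions for illustration only; substitute your own worksheet export.

    # Minimal sketch of the same fit, assuming the 10 observations live in a
    # hypothetical CSV file with columns Exercise, BMI, and BloodPressure.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("bloodpressure.csv")

    # Ordinary least squares with two predictors, mirroring the Minitab run
    model = smf.ols("BloodPressure ~ Exercise + BMI", data=data).fit()

    # The summary reports the coefficient table, R-Sq, R-Sq(adj), and the
    # overall ANOVA F test shown in the Minitab output above
    print(model.summary())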

The next part of the output is the statistical analysis (ANOVA, analysis of variance) for the regression model. The ANOVA represents a hypothesis test where the null hypothesis is

    H₀: βᵢ = 0 for all i (in simple regression, i = 1)
    Hₐ: βᵢ ≠ 0 for at least one coefficient

In this example, the p-value for this overall test is 0.003, so we conclude that at least one of the independent variables is significantly meaningful in explaining blood pressure. The individual t-tests can also be performed. In this example the p-values are 0.171 for exercise and 0.021 for BMI. Thus, β₁ (exercise) is not significantly different from zero when body mass index is in the model, and β₂ (BMI) is significantly different from zero when exercise is in the model.

Model assumption checking and prediction intervals can be handled in the same manner as in simple regression analysis. The normal probability plot and residual plot can be obtained by clicking the Graphs button in the Regression window, then checking the Normal plot of residuals and Residuals versus fits boxes. Click OK to exit the graphs window, then click OK again to run the test.
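If you are working in Python instead, the same two diagnostic plots can be drawn with matplotlib and statsmodels; this sketch reuses the hypothetical fitted model from the earlier sketch.

    # Normal probability plot and residuals-versus-fits plot, reusing the
    # hypothetical "model" object fitted in the earlier sketch.
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Normal probability plot of the residuals
    sm.qqplot(model.resid, line="q", ax=ax1)
    ax1.set_title("Normal plot of residuals")

    # Residuals versus fitted values
    ax2.scatter(model.fittedvalues, model.resid)
    ax2.axhline(0, linestyle="--")
    ax2.set_xlabel("Fitted value")
    ax2.set_ylabel("Residual")
    ax2.set_title("Residuals versus fits")

    plt.tight_layout()
    plt.show()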

Full and Reduced Models

Sometimes in multiple regression analysis it is useful to test whether subsets of coefficients are equal to zero. To do this, a partial F test is used. It answers the question: is the full model better than the reduced model at explaining variation in y? The following hypotheses are considered:

    H₀: β_(L+1) = β_(L+2) = ... = β_K = 0
    Hₐ: at least one of these coefficients is not zero

where L represents the number of variables in the reduced model and K represents the number of variables in the full model. Rejecting H₀ means the full model is preferred over the reduced model, whereas not rejecting H₀ means the reduced model is preferred. The partial F statistic is used to test these hypotheses and is given by

    F = [(SSE_reduced - SSE_full) / (K - L)] / MSE_full

Note that the denominator is the MSE from the full model's output. The decision rule for the partial F test is:

    Reject H₀ if F > F(α; K-L, n-K-1)
    Fail to reject H₀ if F ≤ F(α; K-L, n-K-1)

In our example we might compare the full model, with exercise and BMI predicting blood pressure, to a reduced model with BMI alone predicting blood pressure, at the α = 0.05 level. To calculate the partial F we run the regression for the reduced model by clicking Stat-Regression-Regression, putting BloodPressure in as the Response: variable and BMI in as the Predictors: variable, and clicking OK. The reduced model output reads:

The regression equation is
BloodPressure = 41.0 + 3.61 BMI

Predictor      Coef  SE Coef     T      P
Constant      40.98    21.07  1.94  0.088
BMI          3.6060   0.7569  4.76  0.001

S = 12.8545   R-Sq = 73.9%   R-Sq(adj) = 70.7%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  3750.6  3750.6  22.70  0.001
Residual Error   8  1321.9   165.2
Total            9  5072.5

To calculate the partial F we use the output for the full model found earlier and the reduced model output above.
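As a cross-check on the hand calculation that follows, here is a short Python sketch; it assumes nothing beyond the sums of squares and mean square printed in the two ANOVA tables above.

    # Partial F test from the printed ANOVA tables: the SSE and MSE values
    # are the Residual Error SS/MS, K and L count the predictors.
    from scipy.stats import f

    sse_full, sse_reduced = 992.7, 1321.9
    mse_full = 141.8
    K, L, n = 2, 1, 10

    partial_F = ((sse_reduced - sse_full) / (K - L)) / mse_full
    critical = f.ppf(1 - 0.05, K - L, n - K - 1)  # F(0.05; 1, 7)

    print(round(partial_F, 2), round(critical, 2))  # 2.32 and 5.59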

    F = [(1321.9 - 992.7) / (2 - 1)] / 141.8 = 329.2 / 141.8 ≈ 2.32

Then use an F-table to look up the value for F(α; K-L, n-K-1) = F(0.05; 1, 7) = 5.59. According to our decision rule, 2.32 ≤ 5.59, so we fail to reject H₀. This means that at the α = 0.05 level we have found evidence that the reduced model is preferable for explaining blood pressure.

Calculating Confidence Intervals and Prediction Intervals

Calculating CIs and PIs for multiple regression is fairly similar to doing so for simple linear regression. For multiple regression you can create the intervals for your model based on the predictor variables. Consider the full model from earlier in this tutorial. We can obtain the CI and PI for 6 hours of exercise and a BMI of 20.1 by entering these values after clicking Stat-Regression-Regression-Options to get to the Options window. Then press OK and OK to run the regression analysis. The output will now include:

Predicted Values for New Observations
New Obs     Fit  SE Fit           95% CI           95% PI
      1  111.98    6.38  (96.88, 127.08)  (80.03, 143.93)

Values of Predictors for New Observations
New Obs  Exercise   BMI
      1      6.00  20.1

The 95% CI for this combination is (96.88, 127.08) and the 95% PI is (80.03, 143.93). The values entered can be seen at the bottom of the output, which lets you confirm that each variable was correctly entered and not accidentally switched around or mistyped.
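The same intervals can be reproduced in Python; this sketch reuses the hypothetical fitted model from the first sketch, so the exact numbers depend on your data.

    # 95% CI for the mean response and 95% PI for a new observation at
    # Exercise = 6 and BMI = 20.1, reusing the earlier hypothetical fit.
    import pandas as pd

    new_obs = pd.DataFrame({"Exercise": [6.0], "BMI": [20.1]})
    pred = model.get_prediction(new_obs)

    # mean_ci_* columns give the CI; obs_ci_* columns give the PI
    print(pred.summary_frame(alpha=0.05))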

Transformation of Variables

It is not always obvious what to do when your model does not fit well. Transformations may be the easiest way to produce a better fit, especially when collecting more data is not feasible. Options to try include polynomial regression, an inverse transformation, a log transformation of the explanatory variable, a log transformation of both the dependent and explanatory variables, and many others. This tutorial will look at creating an inverse transformation for the model and storing it in your Minitab sheet. Click Calc-Calculator and enter your information in the appropriate spaces in the window that pops up. Choose a column without data stored in it to store your inverse transformation, and in the Expression: blank enter the appropriate algebra and functions for your transformation. Then press OK and your transformation will appear in your Minitab sheet. Use this variable instead of the original variable in your regression to see if the model becomes a better fit.
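In Python the same step is one line on the hypothetical data frame from the first sketch; the choice to invert Exercise here is an assumption made only for illustration.

    # Store an inverse transformation of Exercise in a new column (assumes
    # the column contains no zero values) and refit using it.
    data["InvExercise"] = 1 / data["Exercise"]
    inv_model = smf.ols("BloodPressure ~ InvExercise + BMI", data=data).fit()
    print(inv_model.summary())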

Multiple Regression Analysis in Minitab 6 regression of on the remaining K-1 regressor variables. Any individual VIF larger than 10 should indiciate that multicollinearity is present. To check for VIFs in Minitab click Stat-Regression-Regression from the drop-down menu. Next click the Options button. Then check the Variance inflation factors box under Display, click OK. Then click OK again to run the test. The data created in the output will look identical to the data collected before, except the table of coefficient will contain an additional column of information: Predictor Coef SE Coef T P VIF Constant 74.49 29.41 2.53 0.039 Exercise -2.836 1.861-1.52 0.171 1.701 BMI 2.7119 0.9144 2.97 0.021 1.701 Seeing the both our VIFs are below 10 we assume multicollinearity is not present in our model. Correcting Multicollinearity In order to correct for multicollinearity you need to remove the variables that are highly correlated with others. You can also try to add more data, this might break the pattern of multicollinearity. There are drawbacks to these solutions. If you remove a variable you will obtain no information on the removed variable. So choosing which correlated variable to remove can be difficult. If you add more data the multicollinearity won t always disappear, and sometimes it is impossible to add more data (due to budget restraints or lack of data beyond what is known). If a regression model is used strictly for forecasting, corrections may not be necessary.