Interpreting Multiple Regression


Statistics 621, Fall Semester 2001
Lecture 5, Robert Stine

Preliminaries

Project and assignments
- Hope to have some further information on the project soon.
- Due date for Assignment #2.

Review of Key Points

Outliers
- Leverage: unusual in terms of the predictor.
- Influential: the regression changes in an important way when the point is removed from the fit. Only leveraged points influence the slope of the model.
- The choice to retain or exclude an outlier is driven by the substance of the problem.

Multiple regression adds predictors to the equation
0. The equation adds more factors:
   $\mathrm{ave}(Y_i \mid X_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}$
1. Independent observations (independent errors).
2. Constant variance of the observations (equal error variance).
3. Normally distributed around the regression line, written as an assumption about the unobserved errors: $\varepsilon_i \sim N(0, \sigma^2)$.

Interpreting regression coefficients
- The marginal coefficient in a one-predictor regression includes the effects of other correlated factors that are not in the regression model.
- The partial coefficient in a multiple regression attempts to separate the effects of the various predictors (i.e., the "holding the other factors fixed" interpretation).

Prediction intervals
- In-sample: prediction ± 2 RMSE.
- Extrapolation: error in the slope estimates is compounded as the model is extrapolated farther out, leading to gradual widening of the intervals.
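As a quick illustration of the model and the rough in-sample interval above, here is a minimal sketch in Python using statsmodels; the data are simulated and the coefficients are hypothetical, not the course data.

```python
# Minimal sketch of a multiple regression fit and the rough in-sample
# interval "prediction +/- 2 RMSE" described above. Data are simulated
# (hypothetical), not the course's car data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))                      # two predictors X1, X2
y = 5 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=2.0, size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()        # ave(Y|X) = b0 + b1 X1 + b2 X2
rmse = np.sqrt(fit.mse_resid)                    # root mean squared error

pred = fit.fittedvalues
lo, hi = pred - 2 * rmse, pred + 2 * rmse        # rough ~95% in-sample interval
print(fit.params.round(2), rmse.round(2))
```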

Interpreting Regression Coefficients

Car fuel consumption example

Parameter Estimates (simple regression)
Term         Estimate   Std Error   t Ratio
Intercept      9.4323      2.0545      4.59
Weight(lb)     0.0136      0.0007     18.94

Parameter Estimates (multiple regression)
Term         Estimate   Std Error   t Ratio
Intercept     11.6843      1.7270      6.77
Weight(lb)     0.0089      0.0009     10.11
Horsepower     0.0884      0.0123      7.21

Several points of view help interpret these.

Geometric view
- Table-top geometry of points and planes in 3-D coordinates.
- The effect of correlation between two predictors on the fitted slopes.

Graph view (a graph with nodes and edges)
- Captures the effects of correlation among the predictors.
- Direct and indirect effects of the predictors upon the response.
- The simple regression (marginal) slope combines the direct and indirect effects.

Back-to-basics view: What does any slope mean?
- How does a slope in either model estimate what happens if we change the weight of a car, when in fact we never changed the weight of a car? It is a comparison of cars of different weights, not a change in weight.
- So how do you then get a multiple regression slope? (See p. 121 for the key plot.)

Deciding which to use: marginal or partial?
- What question are you trying to answer?
- Are you comparing two cars that have different weights?
- Are you comparing two cars with the same horsepower but different weights?
- Which is more appropriate for estimating the cost of gas to drive to California? (A sketch of the marginal/partial distinction follows below.)
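The decomposition behind the graph view can be checked directly: in least squares, the marginal slope equals the partial slope plus the indirect effect that runs through the other predictor. A hypothetical simulation, loosely echoing the Weight/Horsepower tables above (all numbers here are made up):

```python
# Marginal vs. partial slope with correlated predictors. The identity
#   marginal = partial(weight) + partial(hp) * slope(hp on weight)
# holds exactly for least squares fits. All numbers are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
weight = rng.normal(3000, 400, size=n)            # hypothetical car weights (lb)
hp = 0.05 * weight + rng.normal(0, 15, size=n)    # horsepower tracks weight
gp1000m = 10 + 0.009 * weight + 0.09 * hp + rng.normal(0, 3, size=n)

marginal = sm.OLS(gp1000m, sm.add_constant(weight)).fit()
partial = sm.OLS(gp1000m, sm.add_constant(np.column_stack([weight, hp]))).fit()
gamma = sm.OLS(hp, sm.add_constant(weight)).fit().params[1]

print(marginal.params[1])                              # direct + indirect effect
print(partial.params[1] + partial.params[2] * gamma)   # the same number
```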

Key Application

Separating the factors that influence sales
- Which factor is the most important determinant of business growth?
- How do we evaluate the whole model and the individual components?

Concepts and Terminology

New plot: Scatterplot matrix
- A compact graphical summary of the pairwise associations among a collection of several variables: a visual correlation matrix, an array of plots rather than numbers.
- Put the response in the first row, with the predictors arranged below.
- This plot reveals collinearity among the predictors as well as how each is related to the response.
- Generate this plot using JMP-IN's Multivariate command; see the example (car data) on page 116.

[Scatterplot matrix of GP1000M City, Weight(lb), and Horsepower; axis tick labels omitted.]
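The notes build this plot with JMP-IN's Multivariate command; an equivalent sketch in Python follows, assuming the car data have been exported to a CSV file (the file name and column names are assumptions, not part of the course files):

```python
# Scatterplot matrix: response in the first position, predictors after.
# "car89.csv" and the column names are assumed, not taken from the course.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("car89.csv")
cols = ["GP1000M", "Weight(lb)", "Horsepower"]    # response first
pd.plotting.scatter_matrix(df[cols], figsize=(6, 6))
plt.show()
```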

Two types of inference for a multiple regression

One coefficient: the t-ratio
- Is this slope different from zero?
- New interpretation in multiple regression, as an incremental improvement: Does this variable significantly improve a model containing the rest?

All coefficients: the overall F-ratio
- Does this entire model explain a significant amount of variation?

Rationale for the different procedures
- Each addresses a specific aspect of the fitted model: the t-ratio considers one coefficient (intercept or slope); the F-ratio considers all slopes simultaneously, without allocating the explained variation among them.
- Why not just do a bunch of t-tests, one for each slope? One has to watch out for problems due to the multiplicity of testing several conjectures. With 20 predictors, you expect one to be significant by chance alone! Too many things appear significant that are not meaningful.

New statistical summary: Analysis of variance (ANOVA)
A summary of how much variation is being explained per predictor. For the car data with Weight and Horsepower as predictors:

Source     DF    Sum of Squares   Mean Square   F Ratio
Model        2        7062.5945       3531.30   288.3143
Error      109        1335.0408         12.25
C Total    111        8397.6353

- The model-fitting process converts data into regression coefficients.
- The F-ratio answers the question: Which explains more variation, one observation or one slope?

Two new plots, one for each type of test
- t-ratio: a leverage plot for each slope (p. 124, with more next time).
- F-ratio: a plot of the response on the fitted values. The F-ratio tests the overall explanatory power of the model, as reflected in R², which happens to be the squared correlation between the fitted values (the predicted values for the observations, like the values on the simple regression line) and the actual values.
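These quantities can be recomputed from the table by hand; the short check below verifies the F-ratio and shows that R² is the model sum of squares over the total (equivalently, the squared correlation between fitted and actual values):

```python
# Recompute the ANOVA quantities from the table above and recover R^2.
import numpy as np

ss_model, ss_error = 7062.5945, 1335.0408
df_model, df_error = 2, 109

f_ratio = (ss_model / df_model) / (ss_error / df_error)   # ratio of mean squares
r_squared = ss_model / (ss_model + ss_error)               # = corr(fitted, y)^2
print(round(f_ratio, 1), round(r_squared, 2))              # 288.3, 0.84

# With a fitted statsmodels result `fit`, the same R^2 is
#   np.corrcoef(fit.fittedvalues, y)[0, 1] ** 2
```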

Examples for Today (Handouts)

Two examples illustrate what can happen with collinearity in regression. Beginning from a simple regression with a significantly positive marginal slope, the multiple regression has (a) a partial slope that is near zero, and (b) a partial slope that is significantly negative.

Automobile design (cont'd) (Car89.jmp, p. 109)
What other factors are important for the design?
- Add a variable for Horsepower (p. 117).
  - The addition of HP is a significant improvement since its t-ratio = 7.21.
  - Cost of carrying an additional 200 lbs. for 3000 miles: about 5.3 gallons (200 lb × 0.0089 gal per 1000 miles per lb × 3).
  - R² increases from 77% to 84%, and the RMSE drops to 3.50.
- How small can we make the RMSE by adding other predictors?
  - Add Cargo, Seating, and Price (p. 128).
  - Leverage plots reveal other outliers that exert effects on the fit (p. 129).
  - Coefficients for dubious factors (e.g., Price) are sensitive to subsets of the data.
- How do we avoid false positives when searching over many predictors?
  - The expanded data include some special extra predictors; R² always goes up, even if a predictor does not really help.
  - Bonferroni method: compare each p-value to 0.05/(number of predictors considered). (See the sketch after this list.)

We revisit the problem of how to go about expanding (or, in general, building) a regression model in Lectures 9 and 10.
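A hypothetical simulation of the danger the Bonferroni method guards against: regress pure noise on 20 pure-noise predictors and count how many slopes look "significant" at 0.05 versus at 0.05/20.

```python
# Data dredging in miniature: y is unrelated to all 20 predictors, yet
# the nominal 0.05 cutoff typically flags about one of them by chance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, k = 100, 20
X = rng.normal(size=(n, k))                 # 20 useless predictors
y = rng.normal(size=n)                      # response unrelated to all of them

p = sm.OLS(y, sm.add_constant(X)).fit().pvalues[1:]   # skip the intercept
print((p < 0.05).sum())                     # often ~1 false positive
print((p < 0.05 / k).sum())                 # Bonferroni cutoff: usually 0
```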

Key Take-Away Points

Multiple regression
- Partial vs. marginal slope: Which is the right one to use?
- Testing some or all of the coefficients using the t-test or the F-ratio.

Adding predictors to a regression to improve its fit
- Useful predictors lead to better predictions.
- Poor predictors claim to improve the fit, but only add noise (the danger of data dredging; the Bonferroni method guards against it).

Collinearity
- Predictors are related, making it hard to separate their effects; this is the difference between the marginal and partial coefficients.
- Arises from correlation among the predictors.
- The impact can be extreme (a synthetic version appears in the sketch at the end of these notes):
  - a change of sign in the marketing handout example;
  - a collapse of the model's interpretation in the advertising example.

Next Time

Collinearity and diagnostics for multiple regression
- How does one quantify the effects of collinearity?
- How does one check the residuals in a multiple regression, and learn how to add other factors?
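For reference, a minimal synthetic version of the sign change noted in the take-aways: the marginal slope of y on x1 is strongly positive, yet the partial slope turns negative once the correlated predictor x2 enters the model (all data here are made up).

```python
# Sign flip under collinearity: the indirect path through x2 dominates
# the marginal slope, while the direct effect of x1 is negative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x2 = rng.normal(size=n)
x1 = 0.9 * x2 + rng.normal(scale=0.3, size=n)    # x1 strongly tracks x2
y = -1.0 * x1 + 2.0 * x2 + rng.normal(size=n)    # direct effect of x1 < 0

marginal = sm.OLS(y, sm.add_constant(x1)).fit()
partial = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(marginal.params[1])   # ~ +1: positive marginal slope
print(partial.params[1])    # ~ -1: negative partial slope
```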