Multiple Regression. Cautions About Simple Linear Regression


Cautions About Simple Linear Regression

Correlation and regression are powerful tools for describing the relationship between two variables, but be aware of their limitations:

- Correlation and regression describe only linear relationships.
- Correlation and the least-squares regression line are not resistant to outliers.
- Predictions outside the range of the observed data are often inaccurate.
- The relationship between two variables is often influenced by lurking variables not included in the model.
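To make the second caution concrete, here is a minimal Python sketch (standard library only) showing how a single outlier changes both the correlation and the least-squares slope. The data are hypothetical toy values chosen for illustration, not data from these slides, and the function names `pearson_r` and `ls_slope` are illustrative helpers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ls_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: six points lying exactly on the line y = x.
clean_x = [1, 2, 3, 4, 5, 6]
clean_y = [1, 2, 3, 4, 5, 6]

# The same points plus one outlier at (7, 0).
outlier_x = clean_x + [7]
outlier_y = clean_y + [0]

r_clean = pearson_r(clean_x, clean_y)
r_outlier = pearson_r(outlier_x, outlier_y)
slope_clean = ls_slope(clean_x, clean_y)
slope_outlier = ls_slope(outlier_x, outlier_y)

print(f"without outlier: r = {r_clean:.2f}, slope = {slope_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}, slope = {slope_outlier:.2f}")
```

In this toy example one added point is enough to pull a perfect correlation of 1 down to 0.25, which is why plotting the data first matters.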

Least-Squares Regression of Heart Disease and Wine Consumption

[Figure: scatterplot with least-squares line. Vertical axis: heart disease deaths per 100,000 people. Horizontal axis: alcohol consumed as wine (liters per person per year).]

Does this regression provide strong evidence that increased wine consumption lowers the risk of heart disease? No.

- Lurking variables: a variable such as wealth may be related to both heart disease and wine consumption.
- Ecological fallacy: we can't make inferences about what individuals do based on aggregate data. Are individuals who drink more wine suffering less heart disease? These data cannot tell us.

General Principles of Data Analysis

- Plot your data. To understand the data, always start with a series of graphs.
- Interpret what you see. Look for the overall pattern and deviations from that pattern.
- Numerical summary? Choose an appropriate measure to describe the pattern and deviation.
- Mathematical model? If the pattern is regular, summarize the data in a compact mathematical model.

Analysis of Two Quantitative Variables

- Plot your data. For two quantitative variables, use a scatterplot.
- Interpret what you see. Describe the direction, form, and strength of the relationship.
- Numerical summary? If the pattern is roughly linear, summarize with the correlation, means, and standard deviations.
- Mathematical model? Regression gives a compact model of the overall pattern, if the relationship is roughly linear.

Analysis of Three or More Quantitative Variables

- Plot your data. To examine relationships among all possible pairs, use a scatterplot matrix.
- Interpret what you see. Describe the direction, form, and strength of the relationships.
- Numerical summary? If the pattern is roughly linear, summarize with correlations, means, and standard deviations.
- Mathematical model? Multiple regression gives a compact model of the relationship between a response variable and a set of predictors.

[Figure: scatterplot matrix of blood alcohol content, number of 12-ounce beers consumed, and weight (lbs).]

In Stata, obtain this graph with

    graph matrix bac beers weight

Correlation Matrix in Stata

corr uses only cases with no missing values on any variable (like regress). Because the matrix is symmetric, only half is shown.

    . corr bac beers weight
    (obs=16)

                 |      bac    beers   weight
    -------------+---------------------------
             bac |   1.0000
           beers |   0.8943   1.0000
          weight |  -0.1550   0.2489   1.0000

- Weak, negative correlation between weight and BAC.
- Weak, positive correlation between weight and number of beers consumed.

Correlation Matrix in Stata

pwcorr uses all cases with no missing values for each pair. The sig option gives p-values for the hypothesis that r is indistinguishable from 0, the obs option shows the number of cases used for each pair, and the sidak option corrects the p-values for multiple comparisons.

    . pwcorr bac beers weight, sig sidak obs

                 |      bac    beers   weight
    -------------+---------------------------
             bac |   1.0000
                 |
                 |       16
                 |
           beers |   0.8943   1.0000
                 |   0.0000
                 |       16       16
                 |
          weight |  -0.1550   0.2489   1.0000
                 |   0.9186   0.7287
                 |       16       16       16
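The Sidak correction used by the sidak option can be sketched in a few lines of Python. With k = 3 variables there are m = k(k-1)/2 = 3 pairwise comparisons, and the adjusted p-value is min(1, 1 - (1 - p)^m). The sketch below also inverts the formula to recover the approximate unadjusted p-values from the adjusted ones shown above, and computes the usual t statistic for a correlation. The function names are illustrative, and the recovered p-values are only as precise as the four printed decimals allow.

```python
import math


def sidak_adjust(p, m):
    """Sidak-adjusted p-value for m comparisons, capped at 1."""
    return min(1.0, 1.0 - (1.0 - p) ** m)


def sidak_unadjust(p_adj, m):
    """Invert the Sidak adjustment (valid when p_adj < 1)."""
    return 1.0 - (1.0 - p_adj) ** (1.0 / m)


def corr_t_stat(r, n):
    """t statistic (n - 2 degrees of freedom) for testing correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


k = 3                   # variables: bac, beers, weight
m = k * (k - 1) // 2    # number of pairwise correlations = 3

# Adjusted p-values as printed in the pwcorr output above.
p_weight_bac = sidak_unadjust(0.9186, m)
p_weight_beers = sidak_unadjust(0.7287, m)

# t statistic for the bac-beers correlation, r = 0.8943 with n = 16.
t_bac_beers = corr_t_stat(0.8943, 16)

print(f"comparisons: {m}")
print(f"unadjusted p (weight, bac):   {p_weight_bac:.3f}")
print(f"unadjusted p (weight, beers): {p_weight_beers:.3f}")
print(f"t for r(bac, beers):          {t_bac_beers:.2f}")
```

The adjustment makes each p-value larger, so a correlation has to be stronger to count as significant when several pairs are tested at once.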

Multiple Regression in Stata

    . regress bac beers weight

          Source |       SS       df       MS              Number of obs =      16
    -------------+------------------------------           F(  2,    13) =  128.33
           Model |  .027816116     2  .013908058           Prob > F      =  0.0000
        Residual |  .001408883    13  .000108376           R-squared     =  0.9518
    -------------+------------------------------           Adj R-squared =  0.9444
           Total |     .029225    15  .001948333           Root MSE      =  .01041

    ------------------------------------------------------------------------------
             bac |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           beers |   .0199757   .0012629    15.82   0.000     .0172474     .022704
          weight |  -.0003628   .0000567    -6.40   0.000    -.0004853   -.0002404
           _cons |   .0398634   .0104333     3.82   0.002     .0173236    .0624031
    ------------------------------------------------------------------------------

Reading the output:

- F(2, 13) and Prob > F: overall F-test of the model.
- R-squared: R^2, the proportion of variation in bac explained by the model.
- Coef. for beers: slope, b_1.
- Coef. for weight: slope, b_2.
- Coef. for _cons: y-intercept, a.

$\hat{y} = a + b_1 x_1 + b_2 x_2$

Estimated BAC = .0399 + (.0200)(Beers consumed) - (.00036)(Weight)

In Stata, obtain added-variable plots with

    avplots

[Figure: added-variable plot of e( bac | X ) against e( beers | X ). coef = .01997571, se = .0012629, t = 15.82]

[Figure: added-variable plot of e( bac | X ) against e( weight | X ). coef = -.00036282, se = .00005668, t = -6.4]
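The summary statistics in the regress output can all be reproduced from the sums of squares, coefficients, and standard errors it prints. The following Python sketch, which uses only numbers copied from the output above, recomputes F, R-squared, adjusted R-squared, Root MSE, and the t statistics, and then uses the fitted equation for a prediction. The helper `predict_bac` and the example input (5 beers, 150 lbs) are illustrative assumptions, not part of the original analysis.

```python
import math

# Values copied from the regress output.
n = 16                    # number of observations
k = 2                     # number of predictors (beers, weight)
ss_model = 0.027816116
ss_resid = 0.001408883
ss_total = 0.029225

df_model = k
df_resid = n - k - 1      # 13

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

f_stat = ms_model / ms_resid                      # overall F-test
r2 = ss_model / ss_total                          # R-squared
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid        # adjusted R-squared
root_mse = math.sqrt(ms_resid)                    # Root MSE

# Coefficients and standard errors from the coefficient table.
a, b1, b2 = 0.0398634, 0.0199757, -0.0003628
se_b1, se_b2 = 0.0012629, 0.0000567

t_beers = b1 / se_b1      # t = coefficient / standard error
t_weight = b2 / se_b2


def predict_bac(beers, weight_lbs):
    """Predicted BAC from the fitted equation y-hat = a + b1*x1 + b2*x2."""
    return a + b1 * beers + b2 * weight_lbs


# Hypothetical input inside the observed range of both predictors.
example = predict_bac(5, 150)

print(f"F = {f_stat:.2f}, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.5f}")
print(f"t(beers) = {t_beers:.2f}, t(weight) = {t_weight:.2f}")
print(f"Predicted BAC for 5 beers at 150 lbs: {example:.4f}")
```

Because the printed coefficient and standard error for weight are rounded, the recomputed t statistic can differ from Stata's in the last digit.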

Residuals-versus-Fitted Plot

[Figure: residuals plotted against fitted values.]

In Stata, obtain this plot after regress with

    rvfplot, yline(0)

Residuals-versus-Predictor Plot

[Figure: residuals plotted against weight (lbs).]

In Stata, obtain this plot after regress with

    rvpplot weight, yline(0)