CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression



Opening Example. CHAPTER 13 SIMPLE LINEAR REGRESSION: Simple Regression; Linear Regression.

Simple Regression. Definition: A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent. The dependent variable is the one being explained, and the independent variable is the one used to explain the variation in the dependent variable.

Linear Regression. Definition: A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.

Figure 13.1 Relationship between food expenditure and income. (a) Linear relationship. (b) Nonlinear relationship.

Figure 13.2 Plotting a linear equation. Figure 13.3 y-intercept and slope of a line.

SIMPLE LINEAR REGRESSION ANALYSIS. Definition: In the regression model y = A + Bx + ε, A is called the y-intercept or constant term, B is the slope, and ε is the random error term. The dependent and independent variables are y and x, respectively.

Definition: In the model ŷ = a + bx, a and b, which are calculated using sample data, are called the estimates of A and B, respectively.

Table 13.1 Incomes (in hundreds of dollars) and Food Expenditures of Seven Households.

Scatter Diagram. Definition: A plot of paired observations is called a scatter diagram. Figure 13.4 Scatter diagram. Figure 13.5 Scatter diagram and straight lines. Figure 13.6 Regression line and random errors.

Error Sum of Squares (SSE). The error sum of squares, denoted SSE, is

SSE = Σe² = Σ(y − ŷ)²

The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares line.

The Least Squares Line. For the least squares regression line ŷ = a + bx,

b = SS_xy / SS_xx and a = ȳ − b·x̄

where

SS_xy = Σxy − (Σx)(Σy)/n and SS_xx = Σx² − (Σx)²/n

and SS stands for sum of squares. The least squares regression line ŷ = a + bx is also called the regression of y on x.

Example 13-1. Find the least squares regression line for the data on incomes and food expenditures of the seven households given in Table 13.1. Use income as the independent variable and food expenditure as the dependent variable. Table 13.2.

Example 13-1: Solution.
Σx = 386, Σy = 108
x̄ = Σx/n = 386/7 = 55.1429
ȳ = Σy/n = 108/7 = 15.4286
SS_xy = Σxy − (Σx)(Σy)/n = 6403 − (386)(108)/7 = 447.5714
SS_xx = Σx² − (Σx)²/n = 23,058 − (386)²/7 = 1772.8571

Figure 13.7 Error of prediction.

b = SS_xy/SS_xx = 447.5714/1772.8571 = .2525
a = ȳ − b·x̄ = 15.4286 − (.2525)(55.1429) = 1.5050

Thus, our estimated regression model is ŷ = 1.5050 + .2525x.
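As a quick numerical check of Example 13-1, the least squares estimates follow directly from the summary totals. This is a standard-library Python sketch; the variable names are mine, and b is rounded to four places before computing a, as the text does.

```python
# Summary totals for the seven households of Table 13.1
# (income x and food expenditure y, both in hundreds of dollars).
n = 7
sum_x, sum_y = 386, 108
sum_xy, sum_x2 = 6403, 23058

x_bar = sum_x / n                      # 55.1429
y_bar = sum_y / n                      # 15.4286
SS_xy = sum_xy - sum_x * sum_y / n     # 447.5714
SS_xx = sum_x2 - sum_x ** 2 / n        # 1772.8571

b = round(SS_xy / SS_xx, 4)            # slope estimate, .2525
a = round(y_bar - b * x_bar, 4)        # intercept estimate, 1.5050
print(b, a)  # 0.2525 1.505
```

The same two lines of algebra reproduce every later quantity in the chapter, so keeping SS_xy and SS_xx around is worthwhile.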

Interpretation of a and b. Interpretation of a: Consider a household with zero income. Using the estimated regression line obtained in Example 13-1, ŷ = 1.5050 + .2525(0) = $1.5050 hundred. Thus, we can state that a household with no income is expected to spend $150.50 per month on food. The regression line is valid only for the values of x between 33 and 83.

Interpretation of b: The value of b in the regression model gives the change in y (dependent variable) due to a change of one unit in x (independent variable). We can state that, on average, a $100 (or $1) increase in the income of a household will increase the food expenditure by $25.25 (or $.2525).

Figure 13.8 Positive and negative linear relationships between x and y. Case Study 13-1 Regression of Weights on Heights for NFL Players.

Assumptions of the Regression Model. Assumption 1: The random error term ε has a mean equal to zero for each x. Assumption 2: The errors associated with different observations are independent. Assumption 3: For any given x, the distribution of errors is normal. Assumption 4: The distribution of population errors for each x has the same (constant) standard deviation, which is denoted σ_ε.

Figure 13.11 (a) Errors for households with an income of $4000 per month. Figure 13.11 (b) Errors for households with an income of $7500 per month. Figure 13.12 Distribution of errors around the population regression line. Figure 13.13 Nonlinear relations between x and y.

STANDARD DEVIATION OF ERRORS AND COEFFICIENT OF DETERMINATION. Degrees of Freedom for a Simple Linear Regression Model: The degrees of freedom for a simple linear regression model are df = n − 2.

Figure 13.14 Spread of errors for x = 40 and x = 75.

STANDARD DEVIATION OF ERRORS. The standard deviation of errors is calculated as

s_e = √[(SS_yy − b·SS_xy)/(n − 2)]

where SS_yy = Σy² − (Σy)²/n.

Example 13-2. Compute the standard deviation of errors s_e for the data on monthly incomes and food expenditures of the seven households given in Table 13.1. Table 13.3.

Example 13-2: Solution.
SS_yy = Σy² − (Σy)²/n = 1792 − (108)²/7 = 125.7143
s_e = √[(125.7143 − .2525(447.5714))/(7 − 2)] = 1.5939

COEFFICIENT OF DETERMINATION. Total Sum of Squares (SST): The total sum of squares, denoted by SST, is calculated as

SST = Σy² − (Σy)²/n

Note that this is the same formula that we used to calculate SS_yy. Figure 13.15 Total errors.
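The arithmetic in Example 13-2 can be reproduced in a few lines (standard-library Python; names are mine, and the values of b and SS_xy are carried over from Example 13-1):

```python
from math import sqrt

# Example 13-2: standard deviation of errors for the seven households
n = 7
sum_y, sum_y2 = 108, 1792
SS_xy = 447.5714
b = 0.2525

SS_yy = sum_y2 - sum_y ** 2 / n               # 125.7143
s_e = sqrt((SS_yy - b * SS_xy) / (n - 2))     # 1.5939
print(round(SS_yy, 4), round(s_e, 4))  # 125.7143 1.5939
```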

Table 13.4. Figure 13.16 Errors of prediction when regression model is used.

COEFFICIENT OF DETERMINATION. Regression Sum of Squares (SSR): The regression sum of squares, denoted by SSR, is SSR = SST − SSE.

Coefficient of Determination: The coefficient of determination, denoted by r², represents the proportion of SST that is explained by the use of the regression model. The computational formula for r² is

r² = b·SS_xy / SS_yy, and 0 ≤ r² ≤ 1

Example 13-3. For the data of Table 13.1 on monthly incomes and food expenditures of seven households, calculate the coefficient of determination.

Example 13-3: Solution. From earlier calculations made in Examples 13-1 and 13-2, b = .2525, SS_xy = 447.5714, SS_yy = 125.7143.

r² = b·SS_xy/SS_yy = (.2525)(447.5714)/125.7143 = .90
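The computational formula for r² is a one-liner given the sums of squares already in hand (a Python sketch; variable names are mine):

```python
# Example 13-3: coefficient of determination from Examples 13-1 and 13-2
b, SS_xy, SS_yy = 0.2525, 447.5714, 125.7143

r_squared = b * SS_xy / SS_yy
print(round(r_squared, 2))  # 0.9
```

About 90% of the total variation in food expenditures is explained by income.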

INFERENCES ABOUT B: Sampling Distribution of b; Estimation of B; Hypothesis Testing About B.

Sampling Distribution of b. Mean, Standard Deviation, and Sampling Distribution of b: Because of the assumption of normally distributed random errors, the sampling distribution of b is normal. The mean and standard deviation of b, denoted by μ_b and σ_b, respectively, are

μ_b = B and σ_b = σ_ε/√SS_xx

Estimation of B. Confidence Interval for B: The (1 − α)100% confidence interval for B is given by

b ± t·s_b, where s_b = s_e/√SS_xx

and the value of t is obtained from the t distribution table for α/2 area in the right tail of the t distribution and n − 2 degrees of freedom.

Example 13-4. Construct a 95% confidence interval for B for the data on incomes and food expenditures of seven households given in Table 13.1.

Example 13-4: Solution.
s_b = s_e/√SS_xx = 1.5939/√1772.8571 = .0379
df = n − 2 = 7 − 2 = 5, α/2 = (1 − .95)/2 = .025, t = 2.571
b ± t·s_b = .2525 ± 2.571(.0379) = .2525 ± .0974 = .155 to .350

Hypothesis Testing About B. Test Statistic for b: The value of the test statistic t for b is calculated as

t = (b − B)/s_b

The value of B is substituted from the null hypothesis.
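Example 13-4 can be checked numerically as follows. This is a sketch with my own variable names; the table value t = 2.571 (α/2 = .025, df = 5) is taken as given, and s_b is rounded to four places before forming the interval, matching the text.

```python
from math import sqrt

# Example 13-4: 95% confidence interval for B
b, s_e, SS_xx, t = 0.2525, 1.5939, 1772.8571, 2.571

s_b = round(s_e / sqrt(SS_xx), 4)        # 0.0379
lower, upper = b - t * s_b, b + t * s_b  # .1551 to .3499, i.e. .155 to .350
print(s_b, round(lower, 4), round(upper, 4))  # 0.0379 0.1551 0.3499
```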

Example 13-5. Test at the 1% significance level whether the slope of the regression line for the example on incomes and food expenditures of seven households is positive.

Example 13-5: Solution.
Step 1: H₀: B = 0 (The slope is zero); H₁: B > 0 (The slope is positive).
Step 2: σ_ε is not known. Hence, we will use the t distribution to make the test about B.
Step 3: α = .01. Area in the right tail = α = .01, df = n − 2 = 7 − 2 = 5. The critical value of t is 3.365. Figure 13.17.
Step 4: From H₀, t = (b − B)/s_b = (.2525 − 0)/.0379 = 6.66.
Step 5: The value of the test statistic t = 6.66 is greater than the critical value of t = 3.365; it falls in the rejection region. Hence, we reject the null hypothesis. We conclude that x (income) determines y (food expenditure) positively.
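The test statistic in Step 4 of Example 13-5 reduces to one division (Python sketch; names are mine, critical value 3.365 from the t table for α = .01, df = 5):

```python
# Example 13-5: test of H0: B = 0 against H1: B > 0
b, B0, s_b = 0.2525, 0, 0.0379

t = (b - B0) / s_b
reject = t > 3.365          # one-tailed critical value, alpha = .01, df = 5
print(round(t, 2), reject)  # 6.66 True
```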

LINEAR CORRELATION: Linear Correlation Coefficient; Hypothesis Testing About the Linear Correlation Coefficient.

Linear Correlation Coefficient. Value of the Correlation Coefficient: The value of the correlation coefficient always lies in the range of −1 to 1; that is, −1 ≤ ρ ≤ 1 and −1 ≤ r ≤ 1.

Figure 13.18 Linear correlation between two variables. (a) Perfect positive linear correlation, r = 1. (b) Perfect negative linear correlation, r = −1. (c) No linear correlation, r ≈ 0. Figure 13.19 Linear correlation between variables. Copyright 2013 John Wiley & Sons. All rights reserved.

Figure 13.19 Linear correlation between variables.

Linear Correlation Coefficient: The simple linear correlation coefficient, denoted by r, measures the strength of the linear relationship between two variables for a sample and is calculated as

r = SS_xy/√(SS_xx·SS_yy)

Example 13-6. Calculate the correlation coefficient for the example on incomes and food expenditures of seven households.

Example 13-6: Solution.
r = SS_xy/√(SS_xx·SS_yy) = 447.5714/√((1772.8571)(125.7143)) = .95
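Example 13-6 uses only the three sums of squares already computed (Python sketch; names are mine):

```python
from math import sqrt

# Example 13-6: simple linear correlation coefficient
SS_xy, SS_xx, SS_yy = 447.5714, 1772.8571, 125.7143

r = SS_xy / sqrt(SS_xx * SS_yy)
print(round(r, 2))  # 0.95
```

Note that r² here equals the coefficient of determination from Example 13-3, as it must for simple linear regression.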

Hypothesis Testing About the Linear Correlation Coefficient. Test Statistic for r: If both variables are normally distributed and the null hypothesis is H₀: ρ = 0, then the value of the test statistic t is calculated as

t = r·√((n − 2)/(1 − r²))

Here n − 2 are the degrees of freedom.

Example 13-7. Using the 1% level of significance and the data from Example 13-1, test whether the linear correlation coefficient between incomes and food expenditures is positive. Assume that the populations of both variables are normally distributed.

Example 13-7: Solution.
Step 1: H₀: ρ = 0 (The linear correlation coefficient is zero); H₁: ρ > 0 (The linear correlation coefficient is positive).
Step 2: The population distributions for both variables are normally distributed. Hence, we can use the t distribution to perform this test about the linear correlation coefficient.
Step 3: Area in the right tail = .01, df = n − 2 = 7 − 2 = 5. The critical value of t = 3.365. Figure 13.20.
Step 4: Using the unrounded value r = .9481,
t = r·√((n − 2)/(1 − r²)) = .9481·√((7 − 2)/(1 − (.9481)²)) = 6.667
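The Step 4 statistic can be verified directly; note it is computed with the unrounded r = .9481 rather than the two-decimal .95 (Python sketch, my names):

```python
from math import sqrt

# Example 13-7: test statistic for H0: rho = 0, unrounded r
n, r = 7, 0.9481

t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(t, 3))  # 6.667
```

Using the rounded r = .95 instead would give t ≈ 6.8, which is why the unrounded value matters here.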

Example 13-7: Solution.
Step 5: The value of the test statistic t = 6.667 is greater than the critical value of t = 3.365; it falls in the rejection region. Hence, we reject the null hypothesis. We conclude that there is a positive relationship between incomes and food expenditures.

REGRESSION ANALYSIS: A COMPLETE EXAMPLE. Example 13-8. A random sample of eight drivers selected from a small city insured with a company and having similar minimum required auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums (in dollars).

(a) Does the insurance premium depend on the driving experience, or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
(b) Compute SS_xx, SS_yy, and SS_xy.
(c) Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part (a).
(d) Interpret the meaning of the values of a and b calculated in part (c).
(e) Plot the scatter diagram and the regression line.
(f) Calculate r and r² and explain what they mean.
(g) Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
(h) Compute the standard deviation of errors.
(i) Construct a 90% confidence interval for B.
(j) Test at the 5% significance level whether B is negative.
(k) Using α = .05, test whether ρ is different from zero.

(a) Based on theory and intuition, we expect the insurance premium to depend on driving experience. The insurance premium is the dependent variable; the driving experience is the independent variable.

Table 13.5.

(b) x̄ = Σx/n = 90/8 = 11.25, ȳ = Σy/n = 474/8 = 59.25
SS_xy = Σxy − (Σx)(Σy)/n = 4739 − (90)(474)/8 = −593.5000
SS_xx = Σx² − (Σx)²/n = 1396 − (90)²/8 = 383.5000
SS_yy = Σy² − (Σy)²/n = 29,642 − (474)²/8 = 1557.5000

(c) b = SS_xy/SS_xx = −593.5000/383.5000 = −1.5476
a = ȳ − b·x̄ = 59.25 − (−1.5476)(11.25) = 76.6605
ŷ = 76.6605 − 1.5476x

(d) The value of a = 76.6605 gives the value of ŷ for x = 0; that is, it gives the monthly auto insurance premium for a driver with no driving experience. The value of b = −1.5476 indicates that, on average, for every extra year of driving experience, the monthly auto insurance premium decreases by $1.55.

(e) Figure 13.21 Scatter diagram and the regression line. The regression line slopes downward from left to right.

(f) r = SS_xy/√(SS_xx·SS_yy) = −593.5000/√((383.5000)(1557.5000)) = −.77
r² = b·SS_xy/SS_yy = (−1.5476)(−593.5000)/1557.5000 = .59
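Parts (b) and (c) of Example 13-8 follow the same recipe as Example 13-1, now with a negative slope. A standard-library Python check (names are mine; b is rounded before computing a, as the text does):

```python
# Example 13-8, parts (b)-(c): eight drivers, x = years of driving
# experience, y = monthly auto insurance premium in dollars
n = 8
sum_x, sum_y = 90, 474
sum_xy, sum_x2, sum_y2 = 4739, 1396, 29642

SS_xy = sum_xy - sum_x * sum_y / n       # -593.5
SS_xx = sum_x2 - sum_x ** 2 / n          # 383.5
SS_yy = sum_y2 - sum_y ** 2 / n          # 1557.5

b = round(SS_xy / SS_xx, 4)              # -1.5476
a = round(sum_y / n - b * sum_x / n, 4)  # 76.6605
print(SS_xy, SS_xx, SS_yy, b, a)  # -593.5 383.5 1557.5 -1.5476 76.6605
```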

(f) The value of r = −.77 indicates that the driving experience and the monthly auto insurance premium are negatively related. The (linear) relationship is strong but not very strong. The value of r² = .59 states that 59% of the total variation in insurance premiums is explained by years of driving experience and 41% is not.

(g) Using the estimated regression line, we find the predicted value of y for x = 10: ŷ = 76.6605 − 1.5476(10) = $61.18. Thus, we expect the monthly auto insurance premium of a driver with 10 years of driving experience to be $61.18.

(h) s_e = √[(SS_yy − b·SS_xy)/(n − 2)] = √[(1557.5000 − (−1.5476)(−593.5000))/(8 − 2)] = 10.3199

(i) s_b = s_e/√SS_xx = 10.3199/√383.5000 = .5270
α/2 = .5 − (.90/2) = .05, df = n − 2 = 8 − 2 = 6, t = 1.943
b ± t·s_b = −1.5476 ± 1.943(.5270) = −1.5476 ± 1.0240 = −2.5716 to −.5236

(j) Step 1: H₀: B = 0 (B is not negative); H₁: B < 0 (B is negative).
Step 2: Because the standard deviation of the error is not known, we use the t distribution to make the hypothesis test.
Step 3: Area in the left tail = α = .05, df = n − 2 = 8 − 2 = 6. The critical value of t is −1.943.
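Parts (g) through (j) reuse the quantities from parts (b) and (c). A Python sketch checking each number (names are mine; t = 1.943 is the table value for df = 6, and s_b is rounded to four places as in the text):

```python
from math import sqrt

# Example 13-8, parts (g)-(j): prediction, s_e, 90% CI for B, and the
# test statistic for H0: B = 0 vs H1: B < 0
a, b = 76.6605, -1.5476
SS_xy, SS_xx, SS_yy, n, t = -593.5, 383.5, 1557.5, 8, 1.943

y_hat_10 = a + b * 10                      # 61.1845 -> about $61.18
s_e = sqrt((SS_yy - b * SS_xy) / (n - 2))  # 10.3199
s_b = round(s_e / sqrt(SS_xx), 4)          # 0.5270
lower, upper = b - t * s_b, b + t * s_b    # -2.5716 to -0.5236
t_stat = b / s_b                           # -2.937, below -1.943: reject H0
print(round(y_hat_10, 2), round(s_e, 4), s_b,
      round(lower, 4), round(upper, 4), round(t_stat, 3))
```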

Figure 13.22.
Step 4: From H₀, t = (b − B)/s_b = (−1.5476 − 0)/.5270 = −2.937.
Step 5: The value of the test statistic t = −2.937 falls in the rejection region. Hence, we reject the null hypothesis and conclude that B is negative. The monthly auto insurance premium decreases with an increase in years of driving experience.

(k) Step 1: H₀: ρ = 0 (The linear correlation coefficient is zero); H₁: ρ ≠ 0 (The linear correlation coefficient is different from zero).
Step 2: Assuming that variables x and y are normally distributed, we will use the t distribution to perform this test about the linear correlation coefficient.
Step 3: Area in each tail = .05/2 = .025, df = n − 2 = 8 − 2 = 6. The critical values of t are −2.447 and 2.447. Figure 13.23.
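The part (k) statistic, like Example 13-7, is computed with the unrounded correlation coefficient r = −.7679 (which rounds to the −.77 reported in part (f)). A Python sketch with my own names:

```python
from math import sqrt

# Example 13-8, part (k): test statistic for H0: rho = 0, unrounded r
n, r = 8, -0.7679

t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(t, 3))  # -2.936
```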

Step 4: Using the unrounded value r = −.7679,
t = r·√((n − 2)/(1 − r²)) = −.7679·√((8 − 2)/(1 − (−.7679)²)) = −2.936
Step 5: The value of the test statistic t = −2.936 falls in the rejection region. Hence, we reject the null hypothesis. We conclude that the linear correlation coefficient between driving experience and auto insurance premium is different from zero.

USING THE REGRESSION MODEL: Using the Regression Model for Estimating the Mean Value of y; Using the Regression Model for Predicting a Particular Value of y. Figure 13.24 Population and sample regression lines.

Using the Regression Model for Estimating the Mean Value of y. Confidence Interval for μ_y|x: The (1 − α)100% confidence interval for μ_y|x for x = x₀ is

ŷ ± t·s_ŷm

where the value of t is obtained from the t distribution table for α/2 area in the right tail of the t distribution curve and df = n − 2. The value of s_ŷm is calculated as follows:

s_ŷm = s_e·√(1/n + (x₀ − x̄)²/SS_xx)

Example 13-9. Refer to Example 13-1 on incomes and food expenditures. Find a 99% confidence interval for the mean food expenditure for all households with a monthly income of $5500.

Example 13-9: Solution. Using the regression line estimated in Example 13-1, we find the point estimate of the mean food expenditure for x = 55: ŷ = 1.5050 + .2525(55) = $15.3925 hundred. Area in each tail = α/2 = (1 − .99)/2 = .005, df = n − 2 = 7 − 2 = 5, t = 4.032.

s_e = 1.5939, x̄ = 55.1429, and SS_xx = 1772.8571
s_ŷm = s_e·√(1/n + (x₀ − x̄)²/SS_xx) = (1.5939)·√(1/7 + (55 − 55.1429)²/1772.8571) = .6025

Hence, the 99% confidence interval for μ_y|55 is
ŷ ± t·s_ŷm = 15.3925 ± 4.032(.6025) = 15.3925 ± 2.4293 = 12.9632 to 17.8218

Using the Regression Model for Predicting a Particular Value of y. Prediction Interval for y_p: The (1 − α)100% prediction interval for the predicted value of y, denoted by y_p, for x = x₀ is

ŷ ± t·s_ŷp

where the value of t is obtained from the t distribution table for α/2 area in the right tail of the t distribution curve and df = n − 2. The value of s_ŷp is calculated as follows:

s_ŷp = s_e·√(1 + 1/n + (x₀ − x̄)²/SS_xx)
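Example 13-9 can be checked numerically as follows (Python sketch; names are mine, t = 4.032 is the table value for α/2 = .005 and df = 5, and s_ŷm is rounded to four places before forming the interval, as in the text):

```python
from math import sqrt

# Example 13-9: 99% CI for the mean food expenditure at x = 55
a, b = 1.5050, 0.2525
s_e, x_bar, SS_xx, n, t = 1.5939, 55.1429, 1772.8571, 7, 4.032
x0 = 55

y_hat = a + b * x0                                             # 15.3925
s_ym = round(s_e * sqrt(1 / n + (x0 - x_bar) ** 2 / SS_xx), 4) # 0.6025
lower, upper = y_hat - t * s_ym, y_hat + t * s_ym
print(round(y_hat, 4), s_ym, round(lower, 4), round(upper, 4))
```

Because x₀ = 55 is almost exactly x̄, the (x₀ − x̄)² term contributes almost nothing here; the interval widens for x₀ far from x̄.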

Example 13-10. Refer to Example 13-1 on incomes and food expenditures. Find a 99% prediction interval for the predicted food expenditure for a randomly selected household with a monthly income of $5500.

Example 13-10: Solution. Using the regression line estimated in Example 13-1, we find the point estimate of the predicted food expenditure for x = 55: ŷ = 1.5050 + .2525(55) = $15.3925 hundred. Area in each tail = α/2 = (1 − .99)/2 = .005, df = n − 2 = 7 − 2 = 5, t = 4.032.

s_e = 1.5939, x̄ = 55.1429, and SS_xx = 1772.8571
s_ŷp = s_e·√(1 + 1/n + (x₀ − x̄)²/SS_xx) = (1.5939)·√(1 + 1/7 + (55 − 55.1429)²/1772.8571) = 1.7040

Hence, the 99% prediction interval for y_p for x = 55 is
ŷ ± t·s_ŷp = 15.3925 ± 4.032(1.7040) = 15.3925 ± 6.8705 = 8.5220 to 22.2630

(Technology slides: TI-84 screenshots omitted.)
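The only change from Example 13-9 is the extra "1 +" under the square root, which makes the prediction interval for a single household much wider than the confidence interval for the mean (Python sketch; names are mine, t = 4.032 as before):

```python
from math import sqrt

# Example 13-10: 99% prediction interval for one household at x = 55
a, b = 1.5050, 0.2525
s_e, x_bar, SS_xx, n, t = 1.5939, 55.1429, 1772.8571, 7, 4.032
x0 = 55

y_hat = a + b * x0                                                  # 15.3925
s_yp = round(s_e * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / SS_xx), 4)  # 1.7040
lower, upper = y_hat - t * s_yp, y_hat + t * s_yp
print(s_yp, round(lower, 3), round(upper, 3))
```

The resulting interval, about 8.52 to 22.26, is nearly three times as wide as the 12.96-to-17.82 interval for the mean at the same x₀.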

(Technology slides: Minitab and Excel screenshots omitted.)