Chapter 12. Simple Linear Regression and Correlation


Contents
12.1 The Simple Linear Regression Model
12.2 Fitting the Regression Line
12.3 Inferences on the Slope Parameter β1
12.4 Inferences on the Regression Line
12.5 Prediction Intervals for Future Response Values
12.6 The Analysis of Variance Table
12.7 Residual Analysis
12.8 Variable Transformations
12.9 Correlation Analysis
Supplementary Problems

12.1 The Simple Linear Regression Model
Model Definition and Assumptions (1/5)

With the simple linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
the observed value of the dependent variable $y_i$ is composed of a linear function $\beta_0 + \beta_1 x_i$ of the explanatory variable $x_i$, together with an error term $\varepsilon_i$. The error terms $\varepsilon_1, \ldots, \varepsilon_n$ are generally taken to be independent observations from a $N(0, \sigma^2)$ distribution, for some error variance $\sigma^2$. This implies that the values $y_1, \ldots, y_n$ are observations from the independent random variables
$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2),$$
as illustrated in Figure 12.1.

Model Definition and Assumptions (2/5)
[Figure]

Model Definition and Assumptions (3/5)

The parameter $\beta_0$ is known as the intercept parameter, and the parameter $\beta_1$ is known as the slope parameter. A third unknown parameter, the error variance $\sigma^2$, can also be estimated from the data set. As illustrated in Figure 12.2, the data values $(x_i, y_i)$ lie closer to the line $y = \beta_0 + \beta_1 x$ as the error variance $\sigma^2$ decreases.

Model Definition and Assumptions (4/5)

The slope parameter $\beta_1$ is of particular interest since it indicates how the expected value of the dependent variable depends upon the explanatory variable $x$, as shown in Figure 12.3. The data set shown in Figure 12.4 exhibits a quadratic (or at least nonlinear) relationship between the two variables, and it would make no sense to fit a straight line to the data set.

Model Definition and Assumptions (5/5)

Simple Linear Regression Model: The simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ fits a straight line through a set of paired data observations $(x_1, y_1), \ldots, (x_n, y_n)$. The error terms $\varepsilon_1, \ldots, \varepsilon_n$ are taken to be independent observations from a $N(0, \sigma^2)$ distribution. The three unknown parameters, the intercept parameter $\beta_0$, the slope parameter $\beta_1$, and the error variance $\sigma^2$, are estimated from the data set.
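The model above can be made concrete with a short simulation. The following sketch (not part of the original notes; the parameter values, the range of $x$, and the sample size are illustrative assumptions) generates data from $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with independent $N(0, \sigma^2)$ errors.

```python
import numpy as np

# Illustrative (assumed) parameter values; they are not taken from the notes
beta0, beta1, sigma = 2.0, 0.5, 0.3
n = 12

rng = np.random.default_rng(0)
x = rng.uniform(3.0, 7.0, size=n)        # explanatory variable values
eps = rng.normal(0.0, sigma, size=n)     # independent N(0, sigma^2) error terms
y = beta0 + beta1 * x + eps              # observed responses y_i = beta0 + beta1*x_i + eps_i

print(np.column_stack([x, y]))           # the simulated paired observations (x_i, y_i)
```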

Examples (1/2)

Example 3: Car Plant Electricity Usage. The manager of a car plant wishes to investigate how the plant's electricity usage depends upon the plant's production. The linear model $y = \beta_0 + \beta_1 x$ will allow a month's electrical usage to be estimated as a function of the month's production.

Examples (2/2)
[Figure]

12.2 Fitting the Regression Line
Parameter Estimation (1/4)

The regression line $y = \beta_0 + \beta_1 x$ is fitted to the data points $(x_1, y_1), \ldots, (x_n, y_n)$ by finding the line that is "closest" to the data points in some sense. As the figure illustrates, the fitted line is chosen to be the line that minimizes the sum of the squares of the vertical deviations,
$$Q = \sum_{i=1}^{n}\left(y_i - (\beta_0 + \beta_1 x_i)\right)^2,$$
and this is referred to as the least squares fit.

Parameter Estimation (2/4)

With normally distributed error terms, $\hat\beta_0$ and $\hat\beta_1$ are maximum likelihood estimates. The joint density of the error terms $\varepsilon_1, \ldots, \varepsilon_n$ is
$$\frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2\right).$$
This likelihood is maximized by minimizing
$$\sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}\left(y_i - (\beta_0 + \beta_1 x_i)\right)^2 = Q.$$
Setting the partial derivatives
$$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\left(y_i - (\beta_0 + \beta_1 x_i)\right)
\quad\text{and}\quad
\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n}x_i\left(y_i - (\beta_0 + \beta_1 x_i)\right)$$
equal to zero gives the normal equations
$$\sum_{i=1}^{n} y_i = n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i
\quad\text{and}\quad
\sum_{i=1}^{n} x_i y_i = \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2.$$

Parameter Estimation (3/4)

Solving the normal equations gives
$$\hat\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{S_{XY}}{S_{XX}}$$
and then
$$\hat\beta_0 = \frac{\sum_{i=1}^{n} y_i}{n} - \hat\beta_1\frac{\sum_{i=1}^{n} x_i}{n} = \bar{y} - \hat\beta_1\bar{x},$$
where
$$S_{XX} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
and
$$S_{XY} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}.$$
For a specific value $x^*$ of the explanatory variable, this equation provides a fitted value $\hat{y}|_{x^*} = \hat\beta_0 + \hat\beta_1 x^*$ for the dependent variable $y$, as illustrated in the figure.

Parameter Estimation (4/4)

The error variance $\sigma^2$ can be estimated by considering the deviations between the observed data values $y_i$ and their fitted values $\hat{y}_i$. Specifically, the sum of squares for error SSE is defined to be the sum of the squares of these deviations,
$$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\right)^2 = \sum_{i=1}^{n} y_i^2 - \hat\beta_0\sum_{i=1}^{n} y_i - \hat\beta_1\sum_{i=1}^{n} x_i y_i,$$
and the estimate of the error variance is
$$\hat\sigma^2 = \frac{SSE}{n-2}.$$
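The estimation formulas above translate directly into code. The sketch below (the function name and the data values are my own, chosen only for illustration) computes $\hat\beta_1 = S_{XY}/S_{XX}$, $\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}$, and $\hat\sigma^2 = SSE/(n-2)$ from a set of paired observations.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Least squares estimates for the model y_i = beta0 + beta1*x_i + eps_i."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)              # S_XX = sum (x_i - xbar)^2
    Sxy = np.sum((x - xbar) * (y - ybar))      # S_XY = sum (x_i - xbar)(y_i - ybar)
    b1 = Sxy / Sxx                             # slope estimate
    b0 = ybar - b1 * xbar                      # intercept estimate
    sse = np.sum((y - b0 - b1 * x) ** 2)       # sum of squares for error
    sigma2_hat = sse / (n - 2)                 # error variance estimate
    return b0, b1, sigma2_hat

# Usage with made-up data (12 paired observations, purely illustrative)
x = [4.5, 5.1, 3.8, 6.0, 4.9, 5.5, 4.2, 6.3, 3.9, 5.8, 4.7, 5.2]
y = [2.6, 2.9, 2.3, 3.4, 2.8, 3.1, 2.5, 3.5, 2.4, 3.2, 2.7, 3.0]
print(fit_simple_linear_regression(x, y))
```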

Examples (1/5)

Example 3: Car Plant Electricity Usage. For this example $n = 12$ and
$$\sum_{i=1}^{12} x_i = \cdots, \quad \sum_{i=1}^{12} y_i = \cdots, \quad \sum_{i=1}^{12} x_i^2 = \cdots, \quad \sum_{i=1}^{12} y_i^2 = \cdots, \quad \sum_{i=1}^{12} x_i y_i = (\cdots) + \cdots + (\cdots) = \cdots$$

Examples (2/5)
[Figure]

Examples (3/5)

The estimates of the slope parameter and the intercept parameter are
$$\hat\beta_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \cdots$$
and
$$\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = \cdots,$$
so that the fitted regression line is $\hat{y} = \hat\beta_0 + \hat\beta_1 x$.

Examples (4/5)

Using the model for production values $x$ outside this range is known as extrapolation and may give inaccurate results.

Examples (5/5)

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} y_i^2 - \hat\beta_0\sum_{i=1}^{n} y_i - \hat\beta_1\sum_{i=1}^{n} x_i y_i}{n-2} = \frac{\cdots}{10} = \cdots,
\qquad \hat\sigma = \cdots$$

12.3 Inferences on the Slope Parameter β1
Inference Procedures (1/4)

Inferences on the Slope Parameter $\beta_1$:
$$\hat\beta_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{S_{XX}}\right)$$
A two-sided confidence interval with a confidence level $1-\alpha$ for the slope parameter in a simple linear regression model is
$$\beta_1 \in \left(\hat\beta_1 - t_{\alpha/2,\,n-2}\,\mathrm{s.e.}(\hat\beta_1),\ \hat\beta_1 + t_{\alpha/2,\,n-2}\,\mathrm{s.e.}(\hat\beta_1)\right),$$
which is
$$\beta_1 \in \left(\hat\beta_1 - \frac{\hat\sigma\,t_{\alpha/2,\,n-2}}{\sqrt{S_{XX}}},\ \hat\beta_1 + \frac{\hat\sigma\,t_{\alpha/2,\,n-2}}{\sqrt{S_{XX}}}\right).$$
One-sided $1-\alpha$ confidence level confidence intervals are
$$\beta_1 \in \left(-\infty,\ \hat\beta_1 + \frac{\hat\sigma\,t_{\alpha,\,n-2}}{\sqrt{S_{XX}}}\right)
\quad\text{and}\quad
\beta_1 \in \left(\hat\beta_1 - \frac{\hat\sigma\,t_{\alpha,\,n-2}}{\sqrt{S_{XX}}},\ \infty\right).$$

Inference Procedures (2/4)

The two-sided hypotheses
$$H_0: \beta_1 = b_1 \quad\text{versus}\quad H_A: \beta_1 \neq b_1$$
for a fixed value $b_1$ of interest are tested with the t-statistic
$$t = \frac{\hat\beta_1 - b_1}{\hat\sigma/\sqrt{S_{XX}}} = \frac{\hat\beta_1 - b_1}{\mathrm{s.e.}(\hat\beta_1)}.$$
The p-value is
$$p\text{-value} = 2\,P(X > |t|),$$
where the random variable $X$ has a t-distribution with $n-2$ degrees of freedom. A size $\alpha$ test rejects the null hypothesis if $|t| > t_{\alpha/2,\,n-2}$.

Inference Procedures (3/4)

The one-sided hypotheses
$$H_0: \beta_1 \geq b_1 \quad\text{versus}\quad H_A: \beta_1 < b_1$$
have a p-value
$$p\text{-value} = P(X < t),$$
and a size $\alpha$ test rejects the null hypothesis if $t < -t_{\alpha,\,n-2}$.
The one-sided hypotheses
$$H_0: \beta_1 \leq b_1 \quad\text{versus}\quad H_A: \beta_1 > b_1$$
have a p-value
$$p\text{-value} = P(X > t),$$
and a size $\alpha$ test rejects the null hypothesis if $t > t_{\alpha,\,n-2}$.
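The confidence interval and hypothesis tests for the slope can be computed as in the following sketch. It assumes NumPy and SciPy are available; the function name and its defaults are illustrative, not part of the notes.

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, b1_null=0.0, alpha=0.05):
    """Two-sided CI and test for the slope parameter, following Section 12.3."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
    b0 = ybar - b1 * xbar
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    se_b1 = sigma_hat / np.sqrt(Sxx)                 # s.e.(beta1_hat) = sigma_hat / sqrt(S_XX)
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)        # critical point t_{alpha/2, n-2}
    ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)    # two-sided 1-alpha confidence interval
    tstat = (b1 - b1_null) / se_b1                   # t-statistic for H0: beta1 = b1_null
    pvalue = 2 * stats.t.sf(abs(tstat), n - 2)       # two-sided p-value 2*P(X > |t|)
    return ci, tstat, pvalue
```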

Inference Procedures (4/4)

An interesting point to notice is that for a fixed value of the error variance $\sigma^2$, the variance of the slope parameter estimate decreases as the value of $S_{XX}$ increases. This happens as the values of the explanatory variable $x_i$ become more spread out, as illustrated in the figure. This result is intuitively reasonable since a greater spread in the values $x_i$ provides a greater leverage for fitting the regression line, and therefore the slope parameter estimate $\hat\beta_1$ should be more accurate.

Examples (1/2)

Example 3: Car Plant Electricity Usage.
$$S_{XX} = \sum_{i=1}^{12} x_i^2 - \frac{\left(\sum_{i=1}^{12} x_i\right)^2}{12} = \cdots,
\qquad \mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{XX}}} = \cdots$$
The t-statistic for testing $H_0: \beta_1 = 0$ is
$$t = \frac{\hat\beta_1}{\mathrm{s.e.}(\hat\beta_1)} = 6.37,$$
and the two-sided p-value is
$$p\text{-value} = 2\,P(X > 6.37) \approx 0.$$

Examples (2/2)

With $t_{0.005,10} = 3.169$, a 99% two-sided confidence interval for the slope parameter is
$$\beta_1 \in \left(\hat\beta_1 - t_{0.005,10}\,\mathrm{s.e.}(\hat\beta_1),\ \hat\beta_1 + t_{0.005,10}\,\mathrm{s.e.}(\hat\beta_1)\right) = (0.251,\ 0.747).$$

12.4 Inferences on the Regression Line
Inference Procedures (1/2)

Inferences on the Expected Value of the Dependent Variable: A $1-\alpha$ confidence level two-sided confidence interval for $\beta_0 + \beta_1 x^*$, the expected value of the dependent variable for a particular value $x^*$ of the explanatory variable, is
$$\beta_0 + \beta_1 x^* \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha/2,\,n-2}\,\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*),\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha/2,\,n-2}\,\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)\right),$$
where
$$\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}.$$

Inference Procedures (2/2)

One-sided confidence intervals are
$$\beta_0 + \beta_1 x^* \in \left(-\infty,\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha,\,n-2}\,\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)\right)$$
and
$$\beta_0 + \beta_1 x^* \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha,\,n-2}\,\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*),\ \infty\right).$$
Hypothesis tests on $\beta_0 + \beta_1 x^*$ can be performed by comparing the t-statistic
$$t = \frac{(\hat\beta_0 + \hat\beta_1 x^*) - (\beta_0 + \beta_1 x^*)}{\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)}$$
with a t-distribution with $n-2$ degrees of freedom.
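A minimal sketch of the confidence interval for the regression line at a point $x^*$, using the standard error formula above; the function and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, xstar, alpha=0.05):
    """Two-sided 1-alpha confidence interval for beta0 + beta1*x*."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
    b0 = ybar - b1 * xbar
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    fit = b0 + b1 * xstar                                            # fitted value at x*
    se_fit = sigma_hat * np.sqrt(1 / n + (xstar - xbar) ** 2 / Sxx)  # s.e. of the fitted value
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)
    return fit - tcrit * se_fit, fit + tcrit * se_fit
```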

Examples (1/2)

Example 3: Car Plant Electricity Usage.
$$\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}} = \cdots\sqrt{\frac{1}{12} + \frac{(x^* - 4.885)^2}{\cdots}}$$
With $t_{0.025,10} = 2.228$, a 95% confidence interval for $\beta_0 + \beta_1 x^*$ is
$$\left(\hat\beta_0 + \hat\beta_1 x^* - 2.228\,\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*),\ \hat\beta_0 + \hat\beta_1 x^* + 2.228\,\mathrm{s.e.}(\hat\beta_0 + \hat\beta_1 x^*)\right).$$
At $x^* = 5$,
$$\beta_0 + 5\beta_1 \in \left((\cdots) - 0.113,\ (\cdots) + 0.113\right) = (2.79,\ 3.02).$$

Examples (2/2)
[Figure]

12.5 Prediction Intervals for Future Response Values
Inference Procedures (1/2)

Prediction Intervals for Future Response Values: A $1-\alpha$ confidence level two-sided prediction interval for $y|_{x^*}$, a future value of the dependent variable for a particular value $x^*$ of the explanatory variable, is
$$y|_{x^*} \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha/2,\,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}},\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha/2,\,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}\right).$$

Inference Procedures (2/2)

One-sided confidence intervals are
$$y|_{x^*} \in \left(-\infty,\ \hat\beta_0 + \hat\beta_1 x^* + t_{\alpha,\,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}}\right)$$
and
$$y|_{x^*} \in \left(\hat\beta_0 + \hat\beta_1 x^* - t_{\alpha,\,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}},\ \infty\right).$$
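A prediction interval differs from the confidence interval for the regression line only by the extra 1 under the square root, which accounts for the variability of the new observation itself. A minimal sketch, with hypothetical names and assuming SciPy:

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, xstar, alpha=0.05):
    """Two-sided 1-alpha prediction interval for a future response at x*."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
    b0 = ybar - b1 * xbar
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    fit = b0 + b1 * xstar
    # The "1 +" inside the square root distinguishes this from the mean-response interval
    se_pred = sigma_hat * np.sqrt(1 + 1 / n + (xstar - xbar) ** 2 / Sxx)
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)
    return fit - tcrit * se_pred, fit + tcrit * se_pred
```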

Examples (1/2)

Example 3: Car Plant Electricity Usage. With $t_{0.025,10} = 2.228$, a 95% prediction interval for $y|_{x^*}$ is
$$y|_{x^*} \in \left(\hat\beta_0 + \hat\beta_1 x^* - 2.228\,\hat\sigma\sqrt{1 + \frac{1}{12} + \frac{(x^* - 4.885)^2}{\cdots}},\ \hat\beta_0 + \hat\beta_1 x^* + 2.228\,\hat\sigma\sqrt{1 + \frac{1}{12} + \frac{(x^* - 4.885)^2}{\cdots}}\right).$$
At $x^* = 5$,
$$y|_{x^*} \in \left((\cdots) - 0.401,\ (\cdots) + 0.401\right) = (2.50,\ 3.30).$$

Examples (2/2)
[Figure]

12.6 The Analysis of Variance Table
Sum of Squares Decomposition (1/5)
[Figure]

Sum of Squares Decomposition (2/5)
[Figure]

Sum of Squares Decomposition (3/5)

Source       Degrees of freedom   Sum of squares   Mean squares                     F-statistic   p-value
Regression   1                    SSR              MSR = SSR                        F = MSR/MSE   P(F_{1,n-2} > F)
Error        n-2                  SSE              MSE = SSE/(n-2) = $\hat\sigma^2$
Total        n-1                  SST

Figure: Analysis of variance table for simple linear regression analysis.

Sum of Squares Decomposition (4/5)
[Figure]

Sum of Squares Decomposition (5/5)

Coefficient of Determination $R^2$: The total variability in the dependent variable, the total sum of squares
$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2,$$
can be partitioned into the variability explained by the regression line, the regression sum of squares
$$SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2,$$
and the variability about the regression line, the error sum of squares
$$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2.$$
The proportion of the total variability accounted for by the regression line is the coefficient of determination
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \frac{1}{1 + SSE/SSR},$$
which takes a value between zero and one.
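The sum of squares decomposition and the entries of the analysis of variance table can be computed as in the sketch below (an illustrative helper, assuming SciPy for the F tail probability).

```python
import numpy as np
from scipy import stats

def anova_simple_linear_regression(x, y):
    """SST = SSR + SSE decomposition, F-statistic, and R^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    fitted = b0 + b1 * x
    sst = np.sum((y - ybar) ** 2)          # total sum of squares
    ssr = np.sum((fitted - ybar) ** 2)     # regression sum of squares
    sse = np.sum((y - fitted) ** 2)        # error sum of squares
    msr = ssr / 1                          # mean square for regression
    mse = sse / (n - 2)                    # mean square error (estimates sigma^2)
    F = msr / mse
    pvalue = stats.f.sf(F, 1, n - 2)       # P(F_{1,n-2} > F)
    r_squared = ssr / sst                  # coefficient of determination
    return F, pvalue, r_squared
```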

Examples (1/1)

Example 3: Car Plant Electricity Usage.
$$F = \frac{MSR}{MSE} = \cdots, \qquad R^2 = \frac{SSR}{SST} = \cdots$$

12.7 Residual Analysis
Residual Analysis Methods (1/7)

The residuals are defined to be
$$e_i = y_i - \hat{y}_i, \qquad 1 \leq i \leq n,$$
so that they are the differences between the observed values of the dependent variable and the corresponding fitted values. A property of the residuals is
$$\sum_{i=1}^{n} e_i = 0.$$
Residual analysis can be used to
- identify data points that are outliers,
- check whether the fitted model is appropriate,
- check whether the error variance is constant, and
- check whether the error terms are normally distributed.

Residual Analysis Methods (2/7)

A nice random scatter plot, such as the one in the figure, indicates that there are no problems with the regression analysis. Any patterns in the residual plot or any residuals with a large absolute value alert the experimenter to possible problems with the fitted regression model.

Residual Analysis Methods (3/7)

A data point $(x_i, y_i)$ can be considered to be an outlier if it is not predicted well by the fitted model. Residuals of outliers have a large absolute value, as indicated in the figure. Note in the figure that $e_i/s$ is used instead of $e_i$.

[For your interest only]
$$\mathrm{Var}(e_i) = \left(1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{XX}}\right)\sigma^2.$$

Residual Analysis Methods (4/7)

If the residual plot shows positive and negative residuals grouped together as in Figure 12.47, then a linear model is not appropriate. As the figure indicates, a nonlinear model is needed for such a data set.

Residual Analysis Methods (5/7)

If the residual plot shows a funnel shape as in Figure 12.48, so that the size of the residuals depends upon the value of the explanatory variable $x$, then the assumption of a constant error variance $\sigma^2$ is not valid.

Residual Analysis Methods (6/7)

A normal probability plot (a normal score plot) of the residuals checks whether the error terms $\varepsilon_i$ appear to be normally distributed. The normal score of the $i$th smallest residual is
$$\Phi^{-1}\!\left(\frac{i - \tfrac{3}{8}}{n + \tfrac{1}{4}}\right).$$
If the main body of the points in the normal probability plot lies approximately on a straight line, the normality assumption is reasonable; a plot of a clearly different form indicates that the distribution is not normal.
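A sketch of how the residuals and their normal scores could be computed for a normal probability plot, using the $(i - 3/8)/(n + 1/4)$ plotting positions given above; the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def residual_normal_scores(x, y):
    """Residuals e_i = y_i - yhat_i and normal scores for a probability plot."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar
    resid = y - (b0 + b1 * x)                                   # residuals; they sum to zero
    i = np.arange(1, n + 1)
    scores = stats.norm.ppf((i - 3.0 / 8.0) / (n + 1.0 / 4.0))  # Phi^{-1}((i - 3/8)/(n + 1/4))
    # Plotting (scores, sorted residuals): a roughly straight line supports normality
    return scores, np.sort(resid)
```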

Residual Analysis Methods (7/7)
[Figure]

Example: Nile River Flowrate
Examples (1/2)
[Figure]

Examples (2/2)

At $x = 3.88$: $\hat{y} = \cdots = 2.77$, $e_i = y_i - \hat{y}_i = \cdots = 1.24$, and $e_i/\hat\sigma = 1.24/\cdots = 3.75$.
At $x = 6.13$: $e_i = y_i - \hat{y}_i = 5.67 - (\cdots) = 1.02$, and $e_i/\hat\sigma = 1.02/\cdots = 3.07$.

12.8 Variable Transformations
Intrinsically Linear Models (1/4) to (4/4)
[Figures]

Example: Roadway Base Aggregates
Examples (1/5) to (5/5)
[Figures]

12.9 Correlation Analysis
The Sample Correlation Coefficient

Sample Correlation Coefficient: The sample correlation coefficient $r$ for a set of paired data observations $(x_i, y_i)$ is
$$r = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}
= \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
= \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}}.$$
It measures the strength of the linear association between two variables and can be thought of as an estimate of the correlation $\rho$ between the two associated random variables $X$ and $Y$.

Under the assumption that the $X$ and $Y$ random variables have a bivariate normal distribution, a test of the null hypothesis $H_0: \rho = 0$ can be performed by comparing the t-statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$$
with a t-distribution with $n-2$ degrees of freedom. In a regression framework, this test is equivalent to testing $H_0: \beta_1 = 0$.
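A minimal sketch of the sample correlation coefficient and the t-test of $H_0: \rho = 0$ described above, assuming SciPy for the t-distribution tail probability; the function name is illustrative.

```python
import numpy as np
from scipy import stats

def correlation_test(x, y):
    """Sample correlation r and the test of H0: rho = 0 (equivalent to H0: beta1 = 0)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    Syy = np.sum((y - ybar) ** 2)
    Sxy = np.sum((x - xbar) * (y - ybar))
    r = Sxy / np.sqrt(Sxx * Syy)                     # sample correlation coefficient
    tstat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # t = r*sqrt(n-2)/sqrt(1-r^2)
    pvalue = 2 * stats.t.sf(abs(tstat), n - 2)       # two-sided p-value
    return r, tstat, pvalue
```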


Example: Nile River Flowrate
Examples (1/1)

$$r = \cdots, \qquad R^2 = r^2 = \cdots$$
