Ordinary Least Squares Regression Vartanian: SW 540


When to Use Ordinary Least Squares Regression Analysis

A. Variable types
1. When you have an interval/ratio scale dependent variable.
2. When your independent variables are either interval/ratio scale or dummy variables.

B. Types of relationships
We use ordinary least squares regression when we are interested in estimating cause-and-effect relationships that our theory suggests. Thus, if we believe that there is a positive relationship between the unemployment rate in a community and time on welfare (we believe that high unemployment causes people to spend a relatively long time on welfare), then we use ordinary least squares regression analysis.

The Process of Using OLS Regression Analysis
When examining the relationship between an independent and a dependent variable in a scattergram, the line that fits the points best is known as the least squares line. This line is chosen by minimizing the distance between all of the points and the line. In other words, we are choosing the line that is closest to all of the data points.

How do we form the line that goes through the data points in the scattergram? We minimize the sum of the squared deviations of the points from the line. Of all the lines we could draw through the points, we choose the one that minimizes

$$\sum (Y_i - Y_p)^2$$

Here, Y_i are the actual values of Y (one for each sample member) and Y_p is the predicted value of Y (the line we will be drawing through the scatter of points). We are trying to minimize the sum of the squared deviations of the actual (sample) values of Y (Y_i) from the best line we can draw through all of the Y_i points. The expression Σ(Y_i − Y_p)² is known as the unexplained sums of squares or the error sums of squares.

The total sums of squares can be broken up into explained and unexplained sums of squares:

$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - Y_p)^2 + \sum (Y_p - \bar{Y})^2$$

or

Total SS = Unexplained SS + Explained SS

The expression to the left of the equals sign is the total sums of squares. The first expression after the equals sign is the unexplained sums of squares, and the second expression after the equals sign is the explained sums of squares.

C:\WP60_1\LECT1.PHD\OLSReg\Regression.Explained.wpd Page 1
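A minimal Python sketch of the least-squares criterion: it computes Σ(Y_i − Y_p)² for a candidate line and checks that the least-squares line has a smaller sum than nearby alternatives. The data are the five (number of kids, AFDC spell length) observations used in the residual example later in these notes, whose least-squares line is Y_p = 2.4 + 1X; the function name `sum_squared_errors` is our own.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared deviations of the actual Y values from the line Y_p = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Kids (X) and AFDC spell length in months (Y) from the residual example.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 5, 8]

best = sum_squared_errors(2.4, 1.0, xs, ys)  # the least-squares line
# Shifting the intercept or tilting the slope away from (2.4, 1) increases the sum.
alternatives = [
    sum_squared_errors(a, b, xs, ys)
    for a in (2.0, 2.4, 2.8)
    for b in (0.8, 1.0, 1.2)
    if (a, b) != (2.4, 1.0)
]
print(round(best, 4))                            # 3.2
print(all(sse > best for sse in alternatives))   # True
```

Because the sum of squares is a bowl-shaped function of a and b, any line other than the least-squares one gives a strictly larger value.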

Unexplained: our error in predicting what Y will be by using the regression line. Explained: what we gain by using Y_p instead of Ȳ.

What we are trying to do is predict the value of Y, the dependent variable, given that we know something about the person, or the independent variable, X. If we knew nothing about the person, our best guess of Y would be Ȳ, the mean of Y. We are trying to improve on Ȳ in predicting the value of Y, and we do this with our knowledge of the independent variable, X. The Y_p line allows us to predict the value of the dependent variable, Y, for any value of X, the independent variable. For example, we may know that a particular state has welfare payments of $500/month, and we may wish to predict how long a person will stay on AFDC if they live in such a state. Knowing the Y_p line, we can predict how long a person stays on AFDC. Our prediction will not be perfect: if the points are widely dispersed around the line, it may be well off. But if the points are concentrated around the line, then we can predict fairly accurately how long someone will spend on AFDC for a given AFDC payment level within the state.

The OLS line can be thought of as tracing the mean value of Y at each value of X. If we were examining the effect of the number of children on income, we would look at the mean value of Y at each X value, that is, at each number of children. If these means fell exactly on a straight line, connecting them would give the OLS regression line; in general, the OLS line is the straight line that comes closest to these means in the least-squares sense, with each mean weighted by the number of observations at that X value. Not all of the sample points will be located on the OLS regression line: some will be below the line and some will be above it. The closer the points are to this line, the better the independent variable will be as a predictor of the dependent variable.
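The link between the OLS line and the mean of Y at each X can be checked numerically. This Python sketch uses made-up, purely hypothetical data (number of children and months on AFDC) with repeated X values. It fits the line once to the raw points and once to the mean of Y at each X, weighting each mean by how many observations share that X; under the least-squares criterion the two fits coincide. The helper names `fit_line` and `conditional_means` are our own.

```python
from collections import defaultdict


def fit_line(xs, ys, weights=None):
    """Weighted least-squares intercept a and slope b (all weights 1 gives ordinary OLS)."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def conditional_means(xs, ys):
    """Return the distinct X values, the mean of Y at each, and the count at each."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    keys = sorted(groups)
    means = [sum(groups[k]) / len(groups[k]) for k in keys]
    counts = [len(groups[k]) for k in keys]
    return keys, means, counts


# Hypothetical data: several people share each number of children.
kids = [0, 0, 1, 1, 1, 2, 2, 3]
months = [4, 6, 5, 7, 9, 8, 10, 12]

a_raw, b_raw = fit_line(kids, months)
keys, means, counts = conditional_means(kids, months)
a_means, b_means = fit_line(keys, means, weights=counts)
print(a_raw, b_raw)
print(a_means, b_means)  # same line, up to floating-point rounding
```

The weights matter: an X value observed three times pulls the line three times as hard as one observed once.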
We can determine the Y_p line with the following equation:

$$Y_p = a + bX$$

Here, a is the intercept, b is the slope coefficient, and X is the independent variable. Y_p is the predicted value of Y for a given value of X. The formulas for the intercept (a) and the slope (b) are given below.

We can define the a and b coefficients as follows. The intercept, a, is the point where the line crosses the Y axis, that is, the value of Y_p when X is 0. We know this because if we give X a value of 0, Y_p = a. The slope coefficient, b, tells us how much Y_p changes for a one-unit change in X. A positive value for b indicates a positive relationship between the independent and dependent variables; a negative value for b indicates a negative relationship. A value of 1 for b indicates that for every 1-unit increase in the independent variable, the dependent variable increases by 1 unit. If b = 2, a 1-unit increase in the independent variable increases the dependent variable by 2 units. If b = −9, every 1-unit increase in the independent variable decreases the dependent variable by 9 units. Thus,

$$b = \frac{\text{change in } Y_p}{\text{1-unit increase in } X}$$

The slope is generally defined as the rise over the run, ΔY/ΔX.

Let's say we have 5 observations (N = 5), where X, the independent variable, is the number of children in the household, and Y, the dependent variable, is the time in months on AFDC.

The formula for the slope, or the b coefficient estimate, is

$$b = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$

The formula for the intercept, or the a coefficient estimate, is

$$a = \frac{\sum Y - b\sum X}{N}$$

or, equivalently,

$$a = \bar{Y} - b\bar{X}$$
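The slope and intercept formulas translate directly into code. In this Python sketch, `slope_intercept_from_sums` and `sums` are hypothetical helper names. Applied to the sums for the first example (N = 5, ΣX = 15, ΣY = 15, ΣXY = 55, ΣX² = 55), the helper returns a = 0 and b = 1; applied to the kids and spell-length data from the residual example later in the notes, it returns a = 2.4 and b = 1.

```python
def slope_intercept_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) from the computational OLS formulas.

    b = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - (sum(X))^2)
    a = (sum(Y) - b*sum(X)) / N
    """
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


def sums(xs, ys):
    """Compute N, sum(X), sum(Y), sum(XY) and sum(X^2) from raw observations."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
    )


# First example: N = 5, sum(X) = 15, sum(Y) = 15, sum(XY) = 55, sum(X^2) = 55.
print(slope_intercept_from_sums(5, 15, 15, 55, 55))  # (0.0, 1.0)

# Residual example: kids (X) and AFDC spell length (Y).
print(slope_intercept_from_sums(*sums([1, 2, 3, 4, 5], [3, 5, 6, 5, 8])))  # (2.4, 1.0)
```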

For this example:

$$\sum XY = 55, \quad \sum X = 15, \quad \sum Y = 15, \quad \sum X^2 = 55, \quad \left(\sum X\right)^2 = 15^2 = 225$$

$$b = \frac{5(55) - (15)(15)}{5(55) - 225} = \frac{50}{50} = 1$$

$$a = \frac{15 - 1(15)}{5} = \frac{0}{5} = 0$$

So, Y_p = 0 + 1(X), or simply Y_p = X. The b coefficient estimate tells us that for every 1-unit increase in X, the predicted value of the dependent variable increases by 1 unit. The a coefficient estimate tells us that when X = 0, the predicted value of the dependent variable is 0; when X = 1, Y_p = 1. We could graph this line to see the relationship between the independent and the dependent variables. In this case we have a perfect relationship, since all of the points lie on the Y_p line; if we were to compute a correlation coefficient (r), it would equal 1. To graph this relationship, we could determine the value of Y_p for each X.

Let's say we have 5 cases for a second example.

N = 5. Applying the slope and intercept formulas to these cases gives b = −1 and a = 6, so the regression equation is Y_p = 6 − (1)X, or Y_p = 6 − X. The b coefficient estimate, or slope coefficient, for this example is −1, and the a coefficient estimate, or intercept, is 6. Thus, when X = 0, Y_p, the predicted value of Y, is 6; if X = 1, the predicted value of Y is 5. In this second situation we again find a perfect relationship between the two variables: all of the points are on the regression line. If we were to compute the correlation coefficient (r) for this example, it would equal −1. To graph this, we could determine the value of Y_p for each X value, again using the Y_p equation above.
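To graph a regression line, the notes compute Y_p at each value of X. A short Python sketch of that step; the helper `predicted_values` and the range X = 0 to 5 are illustrative assumptions, and the equations plugged in are Y_p = X from the first example and Y_p = 6 − X from the second.

```python
def predicted_values(a, b, x_values):
    """Return (X, Y_p) pairs for the line Y_p = a + b*X."""
    return [(x, a + b * x) for x in x_values]


for label, a, b in [("Y_p = X", 0, 1), ("Y_p = 6 - X", 6, -1)]:
    print(label)
    for x, y_p in predicted_values(a, b, range(0, 6)):
        print(f"  X = {x}  Y_p = {y_p}")
```

The resulting (X, Y_p) pairs are the points you would plot and join to draw the line.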

We will rarely find a perfect relationship between two variables, as we did in the two examples above. For example, for another set of 5 cases (N = 5), we would not find a perfect relationship between the two variables. Applying the formulas, the regression equation is Y_p = 3.6 + .8(X), where b = .8 and a = 3.6. Thus, when X = 0, the predicted value Y_p is 3.6 (replace X with 0 in the Y_p equation above). When X = 1, the predicted value is Y_p = 4.4 (replace X with 1 in the Y_p equation above). When X = 10, the predicted value is Y_p = 11.6.

A final example examines a sample of people who have been on AFDC to determine the relationship between time on AFDC (in months) and the unemployment rate in the area where the AFDC recipient lives. We come up with the following a and b coefficients: a = 3, b = 4. In other words,

$$Y_p = 3 + 4X$$

where X is the unemployment rate (in percent) in the AFDC recipient's area of residence. We can put in different values of X to see what we predict about the dependent variable. If X = 0 (the unemployment rate is 0%), we would predict that AFDC recipients will spend 3 months on AFDC: Y_p = 3 + 4(0) = 3. If X = 1, we would predict an AFDC spell length of 7 months: Y_p = 3 + 4(1) = 7. If X = 2 (the unemployment rate is 2%), we would predict that AFDC recipients would spend 11 months on AFDC: Y_p = 3 + 4(2) = 11.

The Disturbance or Residual Term
The points above and below the regression line constitute what is called the disturbance or residual. We can determine the value of the disturbance or residual for each observation in the sample. The value of Y for each sample member (Y_i) is given by

$$Y_i = a + bX_i + e_i$$

where e_i is the disturbance or residual. The disturbance or residual captures:
1. Variables that have not been used in the equation but should have been. Theory states that you need particular variables in your equation, but you fail to include them.
2. Unknown variance in the measurement of the dependent variable.

In order to get unbiased and efficient estimates of the coefficient estimate, b, the following must be true:

1. The expected value of the residual or disturbance term is zero: E(e_i) = 0.
2. e_i is normally distributed (strictly, this is needed for the t tests described below rather than for unbiasedness).
3. e_i is independent of X; that is, e_i and X are uncorrelated.

If you omit variables that are necessary in determining Y, their effects will be picked up by the residual or disturbance term, e_i. If these omitted variables are correlated with any of the X_i, then your X_i will be correlated with the disturbance term, and you will be violating rule #3 above. If this is the case, you will not have unbiased estimates of your b coefficients. It is important that your theory capture the variables necessary to estimate Y and that you include these variables in your statistical models.

Determining the disturbance for each of the observations

Example:

kids (X)   Spell Length (Y_i)   Disturbance (e)   Y_p
1          3                    −.4               3.4
2          5                    .6                4.4
3          6                    .6                5.4
4          5                    −1.4              6.4
5          8                    .6                7.4

We determine the disturbance or residual for each observation from Y_i = a + bX_i + e_i, with Y_p = a + bX. In this example, b = 1 and a = 2.4.

For the first observation: 3 = 2.4 + 1(1) + e_1, so 3 = 3.4 + e_1 and e_1 = −.4 (Y_p = 2.4 + 1·1 = 3.4).
For the second observation: 5 = 2.4 + 1(2) + e_2, so 5 = 4.4 + e_2 and e_2 = .6 (Y_p = 2.4 + 1·2 = 4.4).
For the third observation: 6 = 2.4 + 1(3) + e_3, so 6 = 5.4 + e_3 and e_3 = .6 (Y_p = 2.4 + 1·3 = 5.4).
For the fourth observation: 5 = 2.4 + 1(4) + e_4, so 5 = 6.4 + e_4 and e_4 = −1.4 (Y_p = 2.4 + 1·4 = 6.4).
For the fifth observation: 8 = 2.4 + 1(5) + e_5, so 8 = 7.4 + e_5 and e_5 = .6 (Y_p = 2.4 + 1·5 = 7.4).

You could also find the e_i values with the formula Y_i − Y_p. Once you determine these residuals, you can see that their mean, or expected value, equals zero (add up all of the residuals: −.4 + .6 + .6 − 1.4 + .6 = 0). You could also determine the error sums of squares, or unexplained sums of squares (they are the same thing), by squaring each of the disturbance terms and adding them up: .16 + .36 + .36 + 1.96 + .36 = 3.2.

To determine the explained sums of squares, first subtract Ȳ from Y_p for each observation, then square this difference. Here Ȳ = 27/5 = 5.4:

Explained sums of squares
1: (3.4 − 5.4)² = 4
2: (4.4 − 5.4)² = 1
3: (5.4 − 5.4)² = 0
4: (6.4 − 5.4)² = 1
5: (7.4 − 5.4)² = 4

The explained sums of squares = 10 and the unexplained sums of squares = 3.2. Therefore, the total sums of squares = 13.2 (Total SS = Unexplained SS + Explained SS). From this information, you can determine how much of the variation in the dependent variable is explained by the independent variable. This is called the R² value and can be determined with the following formula:

$$R^2 = \frac{\text{Explained SS}}{\text{Total SS}}$$
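The residual and sums-of-squares arithmetic in this example can be reproduced with a short Python sketch. It assumes the five (kids, spell length) observations and the coefficients a = 2.4, b = 1 given above; the function name `decompose` is our own.

```python
def decompose(a, b, xs, ys):
    """Residuals and the unexplained/explained/total sums of squares for Y_p = a + b*X."""
    y_bar = sum(ys) / len(ys)
    y_p = [a + b * x for x in xs]
    residuals = [y - p for y, p in zip(ys, y_p)]
    unexplained = sum(e ** 2 for e in residuals)     # sum (Y_i - Y_p)^2
    explained = sum((p - y_bar) ** 2 for p in y_p)   # sum (Y_p - Y_bar)^2
    total = sum((y - y_bar) ** 2 for y in ys)        # sum (Y_i - Y_bar)^2
    return residuals, unexplained, explained, total


kids = [1, 2, 3, 4, 5]
spell_length = [3, 5, 6, 5, 8]
residuals, unexplained, explained, total = decompose(2.4, 1.0, kids, spell_length)

print([round(e, 2) for e in residuals])          # [-0.4, 0.6, 0.6, -1.4, 0.6]
print(abs(sum(residuals)) < 1e-9)                # True: residuals sum to zero
print(round(unexplained, 2), round(explained, 2), round(total, 2))  # 3.2 10.0 13.2
print(round(explained / total, 4))               # R^2 = 0.7576
```

The identity Total SS = Unexplained SS + Explained SS holds exactly only when a and b are the least-squares estimates; for any other line the cross term no longer vanishes.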

In this case, R² = 10/13.2 = .7576: 75.76% of the variation in the dependent variable, AFDC spell length, is explained by the independent variable, number of kids.

Testing to Determine if the Relationship Between the Independent and Dependent Variables is Significant, or Testing the Significance of the b Coefficient Estimate

You will generally be testing a null hypothesis that states that there is no relationship between the independent and dependent variables. In other words, you will be testing

H0: B = 0

If you are testing for a positive relationship between the independent and dependent variables, your one-tailed research hypothesis will be HR: B > 0. A negative research hypothesis will be HR: B < 0. A two-tailed research hypothesis will be HR: B ≠ 0.

In order to test the significance of the b coefficient, you need the standard error of the b coefficient. The standard error of the coefficient is very similar to a standard deviation: it measures the spread of the distribution of b. We use a Student's t distribution to test the b coefficient, to determine whether there is in all likelihood a relationship between the independent and dependent variables. The Student's t value is very similar to the z value (related to the normal distribution) that we learned earlier: t tells us how many standard error units we are away from our hypothesized value. The hypothesized value we are examining is the null hypothesis, a value of B = 0. We found that for the normal distribution, when we were 1.96 units away from the mean of the distribution (z = 1.96), we were in the .025 tail of the normal distribution. When sample sizes get relatively large, it again takes around 1.96 units (now standard error units measured in t values rather than z values) for us to be in the .025 tail of the distribution. In other words, as sample sizes get large, the Student's t distribution approaches a normal distribution.
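The convergence of the t distribution to the normal can be seen numerically. Python's standard library has `statistics.NormalDist` but no t distribution, so this sketch integrates the Student's t density with Simpson's rule and inverts the CDF by bisection. It is an illustrative numerical approximation (the helper names `t_cdf` and `t_critical` are our own); in practice you would read the value from a t table or a statistics library.

```python
import math
from statistics import NormalDist


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus a Simpson's-rule integral of the t density on [0, t]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(df, upper_tail=0.025):
    """Find t with P(T > t) = upper_tail by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # too much probability still in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2


# Two-tailed .05 critical values shrink toward the normal's 1.96 as df grows.
for df in (2, 3, 40, 1000):
    print(df, round(t_critical(df), 3))
print("normal", round(NormalDist().inv_cdf(0.975), 3))  # 1.96
```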

The t value is determined by the formula below:

$$t_{n-k-1} = \frac{b}{SE_b}$$

For now, we will determine the standard error of the b estimate with the following formula:

$$SE_b = \sqrt{\frac{ESS/(n-k-1)}{\sum (X_i - \bar{X})^2}}$$

where ESS stands for the error sums of squares, or the unexplained sums of squares. The n − k − 1 subscript in the t formula indicates the degrees of freedom: n is the number of observations, k is the number of independent variables, and SE_b is the standard error of the b coefficient estimate. If we had 5 observations and 1 independent variable, we would have 3 degrees of freedom. We use the degrees of freedom with a table of critical values for t to determine whether the t value is greater than or equal to the critical value. If the t value is greater than or equal to the critical value, you reject the null hypothesis. If the t value is less than the critical value, you fail to reject the null hypothesis.

Let's say that you determine that the b coefficient estimate = 4 and that the standard error of the b coefficient estimate is 2, with n = 42 (you are examining 42 cases). Let's also say you are examining a one-tailed hypothesis at the .05 level of significance. Your t statistic would be:

$$t_{40} = \frac{4}{2} = 2$$

This indicates that the t value = 2, with 40 degrees of freedom (42 − 1 − 1). The critical value is 1.684. Because the t value is greater than the critical value, you would reject the null hypothesis at the .05 level for a one-tailed test. If you were testing this hypothesis at the .05 level with a two-tailed test, the critical value would be 2.021. Because the t value is less than the critical value, you would fail to reject the null hypothesis.
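The decision rule can be written as a small Python function. This sketch assumes one independent variable (k = 1) unless told otherwise and takes the critical value as an input to be looked up in a t table; the values 1.684 and 2.021 used below are the usual table entries for 40 degrees of freedom at the .05 level (one- and two-tailed). The name `t_test_slope` is our own.

```python
def t_test_slope(b, se_b, n, critical_value, k=1, two_tailed=False):
    """Return (t, degrees of freedom, reject H0?) for H0: B = 0.

    critical_value comes from a t table for n - k - 1 degrees of freedom.
    One-tailed test of HR: B > 0: reject when t >= critical value.
    Two-tailed test of HR: B != 0: reject when |t| >= critical value.
    """
    t = b / se_b
    df = n - k - 1
    statistic = abs(t) if two_tailed else t
    return t, df, statistic >= critical_value


# Example from the notes: b = 4, SE_b = 2, n = 42, one independent variable.
print(t_test_slope(4, 2, 42, critical_value=1.684))                   # (2.0, 40, True)
print(t_test_slope(4, 2, 42, critical_value=2.021, two_tailed=True))  # (2.0, 40, False)
```

The same t value can lead to different conclusions because the two-tailed test splits the .05 between both tails, which pushes the critical value further out.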

A MORE COMPLICATED EXAMPLE
You are examining the relationship between age and wage. You have the following 4 observations:

Obs   Age (X)   Wage (Y)
1     20        5.5
2     30        6.5
3     40        7.5
4     50        8

From this information, we could determine the a and b coefficients: a = 3.9, b = .085, so

$$Y_p = 3.9 + .085X$$

Therefore, the predicted values of Y and the residuals or disturbance terms (Y_i − Y_p) are:
1: Y_p = 3.9 + .085(20) = 5.6; e_1 = 5.5 − 5.6 = −.1
2: Y_p = 3.9 + .085(30) = 6.45; e_2 = 6.5 − 6.45 = .05
3: Y_p = 3.9 + .085(40) = 7.3; e_3 = 7.5 − 7.3 = .2
4: Y_p = 3.9 + .085(50) = 8.15; e_4 = 8 − 8.15 = −.15

If we square each of these residuals and add them up, we get .01 + .0025 + .04 + .0225 = .075. This is the value for the unexplained sums of squares. If we divide this value by n − k − 1, or 2, we get .0375. This value, .0375, is called the Mean Square Error (MSE): the unexplained sums of squares divided by the degrees of freedom. We will use the MSE again when examining the relationship between the entire model of independent variables and the dependent variable.

To determine the standard error of the b estimate, use the formula:

$$SE_b = \sqrt{\frac{ESS/(n-k-1)}{\sum (X_i - \bar{X})^2}}$$

We have just determined that ESS/(n − k − 1) = .0375. To determine the rest of the formula, note that X̄ = 35, so

$$\sum (X_i - \bar{X})^2 = (20-35)^2 + (30-35)^2 + (40-35)^2 + (50-35)^2 = 500$$

$$SE_b = \sqrt{\frac{.0375}{500}} = \sqrt{.000075} \approx .00866$$

We can then determine whether the b coefficient is significant by using the t formula:

$$t_2 = \frac{.085}{.00866} \approx 9.8$$

At two degrees of freedom for a .05, two-tailed test, the critical value is 4.3. Because the t value is greater than the critical value, reject the null hypothesis.
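The whole calculation for the age and wage example can be checked in one pass. This Python sketch assumes the four (age, wage) observations listed above and a single independent variable; `simple_regression_t` is a hypothetical helper name. It returns the coefficients, the unexplained sums of squares, the MSE, SE_b and the t value.

```python
import math


def simple_regression_t(xs, ys):
    """Fit Y_p = a + b*X by OLS and return a, b, ESS, MSE, SE_b and t for H0: B = 0."""
    n, k = len(xs), 1
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    ess = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained SS
    mse = ess / (n - k - 1)
    se_b = math.sqrt(mse / sxx)
    return {"a": a, "b": b, "ESS": ess, "MSE": mse, "SE_b": se_b, "t": b / se_b}


ages = [20, 30, 40, 50]
wages = [5.5, 6.5, 7.5, 8.0]
# Roughly: a = 3.9, b = .085, ESS = .075, MSE = .0375, SE_b = .00866, t = 9.8
for name, value in simple_regression_t(ages, wages).items():
    print(f"{name:>4} = {value:.5f}")
```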

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

More information

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return)

Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return) 1 2 3 Marginal Person Average Person (Average Return of College Goers) Return, Cost (Average Return in the Population) 4 (Marginal Return) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Part 1 : 07/27/10 21:30:31

Part 1 : 07/27/10 21:30:31 Question 1 - CIA 593 III-64 - Forecasting Techniques What coefficient of correlation results from the following data? X Y 1 10 2 8 3 6 4 4 5 2 A. 0 B. 1 C. Cannot be determined from the data given. D.

More information

Introduction to Linear Regression

Introduction to Linear Regression 14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Homework 11. Part 1. Name: Score: / null

Homework 11. Part 1. Name: Score: / null Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Because the slope is, a slope of 5 would mean that for every 1cm increase in diameter, the circumference would increase by 5cm.

Because the slope is, a slope of 5 would mean that for every 1cm increase in diameter, the circumference would increase by 5cm. Measurement Lab You will be graphing circumference (cm) vs. diameter (cm) for several different circular objects, and finding the slope of the line of best fit using the CapStone program. Write out or

More information

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review Mgmt 469 Regression Basics You have all had some training in statistics and regression analysis. Still, it is useful to review some basic stuff. In this note I cover the following material: What is a regression

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Regression and Correlation

Regression and Correlation Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis

More information

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines The STAT menu Trend Lines Practice predicting the future using trend lines The STAT menu The Statistics menu is accessed from the ORANGE shifted function of the 5 key by pressing Ù. When pressed, a CHOOSE

More information

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

More information

4. Multiple Regression in Practice

4. Multiple Regression in Practice 30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

More information

Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

More information

One-Way Analysis of Variance

One-Way Analysis of Variance One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

More information

Regression Analysis of the Relationship between Income and Work Hours

Regression Analysis of the Relationship between Income and Work Hours Regression Analysis of the Relationship between Income and Work Hours Sina Mehdikarimi Samuel Norris Charles Stalzer Georgia Institute of Technology Econometric Analysis (ECON 3161) Dr. Shatakshee Dhongde

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

General Regression Formulae ) (N-2) (1 - r 2 YX

General Regression Formulae ) (N-2) (1 - r 2 YX General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard

More information

Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

More information

Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Linear Models for Continuous Data

Linear Models for Continuous Data Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention

More information

Forecasting in STATA: Tools and Tricks

Forecasting in STATA: Tools and Tricks Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be

More information