Lecture 5: Correlation and Linear Regression


3.5. (Pearson) correlation coefficient

The correlation coefficient measures the strength of the linear relationship between two variables:

The correlation is always between −1 and 1.
Points that fall on a straight line with positive slope have a correlation of 1.
Points that fall on a straight line with negative slope have a correlation of −1.
Points that are not linearly related have a correlation of 0.
The farther the correlation is from 0, the stronger the linear relationship.
The correlation does not change if we change the units of measurement. See Figure 3 on page 105.

Given a bivariate data set of size n, (x_1, y_1), (x_2, y_2), ..., (x_n, y_n),

the sample covariance s_{x,y} is defined by

    s_{x,y} = (1 / (n − 1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ).

Note that if x_i = y_i for all i = 1, ..., n, then s_{x,y} = s_x^2.

The sample correlation coefficient r is defined by

    r = s_{x,y} / (s_x s_y),

where s_x is the sample standard deviation of x_1, ..., x_n, i.e.

    s_x = sqrt( Σ_{i=1}^{n} (x_i − x̄)^2 / (n − 1) ).

To simplify calculation, we often use the following alternative formula:

    r = S_{x,y} / sqrt(S_{x,x} S_{y,y}),

where

    S_{x,y} = Σ_{i=1}^{n} x_i y_i − (Σ x_i)(Σ y_i) / n,
    S_{x,x} = Σ_{i=1}^{n} x_i^2 − (Σ x_i)^2 / n,
    S_{y,y} = Σ_{i=1}^{n} y_i^2 − (Σ y_i)^2 / n.
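The two formulas for r can be checked numerically. Here is a minimal Python sketch (the small data set is invented for illustration) that computes r both from the definition and from the shortcut formula, and verifies that a positive linear change of units leaves r unchanged:

```python
import math

def corr(x, y):
    """Sample correlation from the definition: r = s_xy / (s_x * s_y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

def corr_shortcut(x, y):
    """Same r via the alternative formula: S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    S_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    S_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    S_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return S_xy / math.sqrt(S_xx * S_yy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = corr(x, y)
# The two formulas agree, and r is unchanged by a positive linear
# change of units applied to x, e.g. x -> 32 + 1.8 * x.
assert abs(r - corr_shortcut(x, y)) < 1e-9
assert abs(r - corr([32 + 1.8 * a for a in x], y)) < 1e-9
```

The invariance check mirrors the unit-change property stated above: rescaling and shifting x rescales both S_{x,y} and sqrt(S_{x,x}) by the same factor, so r is unchanged.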

Example: See page 107.

Causation; lurking variables

Go to an elementary school and measure two variables for each child: shoe size and reading level. You will find a positive correlation: as shoe size increases, reading level tends to increase. Should we buy our children bigger shoes? No; both variables are positively associated with age. Age is called a lurking variable.

Remember: an observed correlation between two variables may be spurious. That is, the correlation may be caused by the influence of a lurking variable.

3.6. Prediction: Linear Regression

Objective: Assume two variables x and y are related: when x changes, the value of y also changes. Given a data set (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) and a value x_{n+1}, can we predict the value of y_{n+1}? In this context, x is called the input variable or predictor, and y is called the output variable or response.

Examples: Knowing the price history of IBM stock, can we predict its price for tomorrow? Based on your first quiz, predict your final score. Survey consumers' demand for a certain product and recommend the number of items to be produced.

Method: Linear regression (fitting a straight line to the data).

Question: Why do we only consider linear relationships? (Remember that correlation measures the strength and direction of the linear association between variables.)

Linear relationships are easy to understand and analyze. Linear relationships are common. Variables with nonlinear relationships can sometimes be transformed so that the relationships are linear. (See Lab 4 for an example.) Nonlinear relationships can sometimes be closely approximated by linear relationships.

Recall: a straight line is determined by two constants, its intercept and slope. In its equation y = β_1 x + β_0, β_0 is the intercept of the line with the y-axis and β_1 is the slope of the line.

Finding the best-fitting line

Idea: Draw a line that seems to fit well and then find its equation. Problems:

Different people will come up with different best lines; how do we pick the best? It is very hard for large data sets. It does not generalize to relationships between more than two variables.

For these and other reasons, we look for the least squares line. The least squares line minimizes the sum of squared deviations from the data.

Steps for finding the regression line:

i. Plot a scatter diagram to see whether a linear relation exists. If it does, go to the next step.
ii. Use the data to estimate β_0 and β_1. This can be done with the least squares method:

    Slope: β_1 = S_{x,y} / S_{x,x}
    Intercept: β_0 = ȳ − β_1 x̄.

iii. The fitted regression line is ŷ = β_1 x + β_0.

Predicted values

For a given value of the x-variable, we compute the predicted value by plugging the value into the least squares line equation.

Example 7. See page 117.

Example 8. Exercise 3.44.
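The steps above can be sketched in a few lines of Python. The data set here is invented for illustration; the estimates use the same S_{x,y} and S_{x,x} quantities defined in Section 3.5:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least squares estimates: slope = S_xy / S_xx, intercept = ybar - slope * xbar.
S_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
S_xx = sum(a * a for a in x) - sum(x) ** 2 / n
beta1 = S_xy / S_xx
beta0 = ybar - beta1 * xbar

def predict(x_new):
    """Predicted value from the fitted line y-hat = beta1 * x + beta0."""
    return beta1 * x_new + beta0

# For this data, beta1 = 1.97 and beta0 = 0.09,
# so predict(6) is about 11.91.
```

In practice one would first plot the data (step i) to confirm that a straight line is a reasonable description before computing the fit.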