Chapter 9. Section Correlation




Chapter 9

Section 9.1 - Correlation

Objectives:
- Introduce linear correlation, independent and dependent variables, and the types of correlation
- Find a correlation coefficient
- Test a population correlation coefficient ρ using a table
- Perform a hypothesis test for a population correlation coefficient ρ
- Distinguish between correlation and causation

Correlation
A relationship between two variables.
- The data can be represented by ordered pairs (x, y).
- x is the independent (or explanatory) variable.
- y is the dependent (or response) variable.

A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.

Types of Correlation
[Figure: scatter plots illustrating the types of correlation]

Example: Constructing a Scatter Plot
An economist wants to determine whether there is a linear relationship between a country's gross domestic product (GDP) and carbon dioxide (CO2) emissions. The data are shown in the table. Display the data in a scatter plot and determine whether there appears to be a positive or negative linear correlation or no linear correlation. (Source: World Bank and U.S. Energy Information Administration)

Correlation coefficient
A measure of the strength and the direction of a linear relationship between two variables.
- The symbol r represents the sample correlation coefficient.
- A formula for r is

\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
\]

  where n is the number of data pairs.
- The population correlation coefficient is represented by ρ (rho).
- The range of the correlation coefficient is -1 to 1.

Linear Correlation

Calculating a Correlation Coefficient
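The formula for r can be evaluated by accumulating the five sums it needs. The following is a minimal sketch, not the notes' own procedure: the function name `correlation_coefficient` and the small data set are hypothetical and are not the GDP and CO2 table.

```python
import math


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    # The five sums that appear in the formula for r.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data for illustration only (not the GDP and CO2 table).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_coefficient(xs, ys)
print(round(r, 3))
```

For these made-up pairs the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so r = 30 / √1500, a moderately strong positive linear correlation.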

Example: Finding the Correlation Coefficient
Calculate the correlation coefficient for the gross domestic products and carbon dioxide emissions data. What can you conclude?

Using a Table to Test a Population Correlation Coefficient ρ
Once the sample correlation coefficient r has been calculated, we need to determine whether there is enough evidence to decide that the population correlation coefficient ρ is significant at a specified level of significance.
- Use Table 11 in Appendix B.
- If |r| is greater than the critical value, there is enough evidence to decide that the correlation coefficient ρ is significant.

Example: Determine whether ρ is significant for five pairs of data (n = 5) at a level of significance of α = 0.01.
If |r| > 0.959, the correlation is significant. Otherwise, there is not enough evidence to conclude that the correlation is significant.
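The table test reduces to comparing the absolute value of r with the critical value. A small sketch of that comparison follows; the helper name `is_significant` and the sample r values are hypothetical, and 0.959 is the critical value quoted above for n = 5 and α = 0.01.

```python
def is_significant(r, critical_value):
    """Table test: the correlation is significant when |r| exceeds the critical value."""
    return abs(r) > critical_value


# Critical value quoted in the example for n = 5, alpha = 0.01.
CRITICAL_N5_ALPHA_001 = 0.959

# Hypothetical sample correlation coefficients, for illustration only.
strong_positive = is_significant(0.97, CRITICAL_N5_ALPHA_001)
strong_negative = is_significant(-0.97, CRITICAL_N5_ALPHA_001)
not_enough = is_significant(0.90, CRITICAL_N5_ALPHA_001)
print(strong_positive, strong_negative, not_enough)
```

Taking the absolute value means a strong negative correlation is judged the same way as a strong positive one.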

Example: Using a Table to Test a Population Correlation Coefficient ρ
Using 25 pairs of data for Old Faithful, you found r ≈ 0.979. Is the correlation coefficient significant? Use α = 0.05.

Hypothesis Testing for a Population Correlation Coefficient ρ
A hypothesis test can also be used to determine whether the sample correlation coefficient r provides enough evidence to conclude that the population correlation coefficient ρ is significant at a specified level of significance. A hypothesis test can be one-tailed or two-tailed.

Left-tailed test
  H0: ρ ≥ 0 (no significant negative correlation)
  Ha: ρ < 0 (significant negative correlation)

Right-tailed test
  H0: ρ ≤ 0 (no significant positive correlation)
  Ha: ρ > 0 (significant positive correlation)

Two-tailed test
  H0: ρ = 0 (no significant correlation)
  Ha: ρ ≠ 0 (significant correlation)

The t-test for the Correlation Coefficient
- Can be used to test whether the correlation between two variables is significant.
- The test statistic is r.
- The standardized test statistic

\[
t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}
\]

  follows a t-distribution with d.f. = n - 2.
- In this text, only two-tailed hypothesis tests for ρ are considered.

Using the t-test for ρ
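As a sketch of the computation, the standardized test statistic can be evaluated directly from r and n. The function name `t_statistic` is hypothetical. The inputs assume r ≈ 0.882 for the GDP and CO2 data with n = 10 pairs (the n stated in the Section 9.3 prediction interval example), and the critical value 2.306 is the standard t-table entry for a two-tailed test with α = 0.05 and d.f. = 8.

```python
import math


def t_statistic(r, n):
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    if n <= 2:
        raise ValueError("need more than 2 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| is 1 or more")
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Assumed inputs: r from the GDP and CO2 example, n = 10 data pairs.
r = 0.882
n = 10
t = t_statistic(r, n)

# Two-tailed critical value for alpha = 0.05, d.f. = n - 2 = 8 (from a t-table).
T_CRITICAL = 2.306
reject_null = abs(t) > T_CRITICAL
print(round(t, 3), reject_null)
```

Under these assumptions t is roughly 5.29, which lies in the rejection region, so H0: ρ = 0 would be rejected at α = 0.05.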

Example: t-test for a Correlation Coefficient
Previously you calculated r ≈ 0.882 for the gross domestic products and carbon dioxide emissions data (see the Section 9.1 example, Finding the Correlation Coefficient). Test the significance of this correlation coefficient. Use α = 0.05.

Correlation and Causation
The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables. If there is a significant correlation between two variables, you should consider the following possibilities.
1. Is there a direct cause-and-effect relationship between the variables? Does x cause y?
2. Is there a reverse cause-and-effect relationship between the variables? Does y cause x?
3. Is it possible that the relationship between the variables is caused by a third variable or by a combination of several other variables?
4. Is it possible that the relationship between the two variables is a coincidence?

Section 9.2 - Linear Regression

Objectives:
- Find the equation of a regression line
- Predict y-values using a regression equation

Regression lines
- After verifying that the linear correlation between two variables is significant, we next determine the equation of the line that best models the data (the regression line).
- The regression line can be used to predict the value of y for a given value of x.

Residual
The difference between the observed y-value and the predicted y-value for a given x-value on the line.

Regression line (line of best fit)
The line for which the sum of the squares of the residuals is a minimum.
The equation of a regression line for an independent variable x and a dependent variable y is

  ŷ = mx + b

where m is the slope, b is the y-intercept, and ŷ is the predicted y-value for a given x-value.

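The Equation of a Regression Line
ŷ = mx + b, where

\[
m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2},
\qquad
b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n}
\]

- \(\bar{y}\) is the mean of the y-values in the data.
- \(\bar{x}\) is the mean of the x-values in the data.
- The regression line always passes through the point \((\bar{x}, \bar{y})\).

Example: Finding the Equation of a Regression Line
Find the equation of the regression line for the gross domestic products and carbon dioxide emissions data.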
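The slope and intercept formulas can be checked with a short sketch. The function name `regression_line` and the data are hypothetical; the data are not the GDP and CO2 table.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope from the computational formula, intercept from b = y-bar - m * x-bar.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = regression_line(xs, ys)

# The fitted line passes through the point (x-bar, y-bar).
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(m, b, m * x_bar + b, y_bar)
```

For these made-up pairs the slope is 30/50 = 0.6 and the intercept is 4 - 0.6(3) = 2.2, and evaluating the line at the mean of x returns the mean of y.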

Example: Predicting y-values Using Regression Equations
The regression equation for the gross domestic products (in trillions of dollars) and carbon dioxide emissions (in millions of metric tons) data is ŷ = 196.152x + 102.289. Use this equation to predict the expected carbon dioxide emissions for the following gross domestic products. (Recall from Section 9.1 that x and y have a significant linear correlation.)
1. 1.2 trillion dollars
2. 2.0 trillion dollars
3. 2.5 trillion dollars
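Each prediction is a direct substitution into ŷ = 196.152x + 102.289. A minimal sketch, with the hypothetical function name `predict_co2`:

```python
def predict_co2(gdp_trillions):
    """Predicted CO2 emissions (millions of metric tons) from GDP (trillions of dollars)."""
    return 196.152 * gdp_trillions + 102.289


# The three GDP values from the example.
predictions = {gdp: predict_co2(gdp) for gdp in (1.2, 2.0, 2.5)}
for gdp, co2 in predictions.items():
    print(f"GDP {gdp} trillion dollars -> about {co2:.3f} million metric tons")
```

Substituting by hand gives approximately 337.671, 494.593, and 592.669 million metric tons, respectively.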

Section 9.3 - Measures of Regression and Prediction Intervals

Objectives:
- Interpret the three types of variation about a regression line
- Find and interpret the coefficient of determination
- Find and interpret the standard error of the estimate for a regression line
- Construct and interpret a prediction interval for y

Variation About a Regression Line
Three types of variation about a regression line:
- Total variation
- Explained variation
- Unexplained variation

To find the total variation, you must first calculate:
- The total deviation
- The explained deviation
- The unexplained deviation

\[
\text{Total deviation} = y_i - \bar{y}, \qquad
\text{Explained deviation} = \hat{y}_i - \bar{y}, \qquad
\text{Unexplained deviation} = y_i - \hat{y}_i
\]

Total variation
The sum of the squares of the differences between the y-value of each ordered pair and the mean of y.

\[
\text{Total variation} = \sum \left(y_i - \bar{y}\right)^2
\]

Explained variation
The sum of the squares of the differences between each predicted y-value and the mean of y.

\[
\text{Explained variation} = \sum \left(\hat{y}_i - \bar{y}\right)^2
\]

Unexplained variation
The sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value.

\[
\text{Unexplained variation} = \sum \left(y_i - \hat{y}_i\right)^2
\]

The sum of the explained and unexplained variation is equal to the total variation.

Total variation = Explained variation + Unexplained variation
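The identity Total = Explained + Unexplained holds when the predicted values come from the least-squares regression line. The sketch below fits that line and then computes the three sums of squares; the function name `variations` and the data are hypothetical.

```python
def variations(xs, ys):
    """Return (total, explained, unexplained) variation about the least-squares line."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Fit the regression line y-hat = m*x + b first.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n

    y_bar = sum_y / n
    y_hats = [m * x + b for x in xs]

    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return total, explained, unexplained


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
total, explained, unexplained = variations(xs, ys)
print(total, explained, unexplained)
```

For these made-up pairs the total variation is 6, split into 3.6 explained and 2.4 unexplained (up to floating-point rounding).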

Coefficient of determination
The ratio of the explained variation to the total variation. Denoted by r².

\[
r^2 = \frac{\text{Explained variation}}{\text{Total variation}}
\]

Example: Coefficient of Determination
The correlation coefficient for the gross domestic products and carbon dioxide emissions data as calculated in Section 9.1 is r ≈ 0.883. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation?

Standard error of estimate
The standard deviation of the observed y_i-values about the predicted ŷ-value for a given x_i-value. Denoted by s_e.

\[
s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}}
\]

- n is the number of ordered pairs in the data set.
- The closer the observed y-values are to the predicted y-values, the smaller the standard error of estimate will be.
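Both quantities are short computations. The sketch below squares the r value quoted in the example and evaluates s_e for a small hypothetical data set; the function name `standard_error_of_estimate`, the observed values, and the predicted values are assumptions for illustration, not the GDP and CO2 data.

```python
import math

# Coefficient of determination from the r quoted in the example.
r = 0.883
r_squared = r ** 2
print(round(r_squared, 3))


def standard_error_of_estimate(ys, y_hats):
    """s_e = sqrt(sum((y_i - y_hat_i)^2) / (n - 2))."""
    n = len(ys)
    if n != len(y_hats) or n <= 2:
        raise ValueError("need matching lists with more than 2 values")
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return math.sqrt(unexplained / (n - 2))


# Hypothetical observed values and predictions from the line y-hat = 0.6x + 2.2.
ys = [2, 4, 5, 4, 5]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]
s_e = standard_error_of_estimate(ys, y_hats)
print(round(s_e, 4))
```

With r ≈ 0.883, r² ≈ 0.780, so about 78.0% of the variation in y is explained by the regression line and about 22.0% is unexplained.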

Example: Standard Error of Estimate
The regression equation for the gross domestic products and carbon dioxide emissions data as calculated in Section 9.2 is ŷ = 196.152x + 102.289. Find the standard error of estimate.

Prediction Intervals
Two variables have a bivariate normal distribution if, for any fixed value of x, the corresponding values of y are normally distributed and, for any fixed value of y, the corresponding x-values are normally distributed.

A prediction interval can be constructed for the true value of y. Given a linear regression equation ŷ = mx + b and x_0, a specific value of x, a c-prediction interval for y is

  ŷ - E < y < ŷ + E

where

\[
E = t_c\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\left(x_0 - \bar{x}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}
\]

- The point estimate is ŷ and the margin of error is E.
- The probability that the prediction interval contains y is c.

Constructing a Prediction Interval for y for a Specific Value of x
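The margin of error E can be sketched as a function of the summary statistics. The function name `prediction_interval` and all inputs below are hypothetical. The critical value t_c is taken from a t-table with n - 2 degrees of freedom; 3.182 is the standard entry for c = 0.95 and d.f. = 3.

```python
import math


def prediction_interval(x0, m, b, n, sum_x, sum_x2, s_e, t_c):
    """Return (lower, upper) bounds of the c-prediction interval for y at x = x0."""
    x_bar = sum_x / n
    y_hat = m * x0 + b  # point estimate
    margin = t_c * s_e * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat - margin, y_hat + margin


# Hypothetical summary statistics for x = 1, 2, 3, 4, 5 and the line y-hat = 0.6x + 2.2.
low, high = prediction_interval(
    x0=3.0, m=0.6, b=2.2,
    n=5, sum_x=15, sum_x2=55,
    s_e=0.8944,
    t_c=3.182,  # t-table value for c = 0.95, d.f. = n - 2 = 3
)
print(round(low, 3), round(high, 3))
```

The interval is centered on ŷ and is narrowest when x_0 equals the mean of x, because the term n(x_0 - x̄)² vanishes there.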

Example: Constructing a Prediction Interval
Construct a 95% prediction interval for the carbon dioxide emissions when the gross domestic product is $3.5 trillion. What can you conclude?
Recall: n = 10, ŷ = 196.152x + 102.289, s_e = 138.255, Σx = 15.8, Σx² = 32.44, x̄ = 1.975.
(Check: with n = 10, Σx = 15.8 gives x̄ = 1.58, not 1.975, so verify these sums against the data table before using them.)