Chapter 10. The relationship between TWO variables. Response and Explanatory Variables. Scatterplots. Example 1: Highway Signs 2/26/2009


Chapter 10
Section 10-2: Correlation
Section 10-3: Regression
Section 10-4: Variation and Prediction Intervals

The relationship between TWO variables

So far we have dealt with data obtained from one variable (either categorical or quantitative). In this chapter we will explore the relationship between two quantitative variables.

Response and Explanatory Variables

In most studies involving two variables, each of the variables has a role. We distinguish between:
- the response variable: the outcome of the study
- the explanatory variable: the variable that is thought to explain, predict or affect the response

Scatterplots

In a scatterplot, one axis represents each of the variables, and the data are plotted as points on the graph. Typically, the explanatory (independent) variable is plotted on the x axis and the response (dependent) variable is plotted on the y axis.

Example 1: Highway Signs

A Pennsylvania research firm conducted a study in which 30 drivers (ages 18 to 82) were sampled, and for each one the maximum distance at which he or she could read a newly designed sign was determined. The goal of the study was to explore the relationship between a driver's age and the maximum distance at which signs were legible, and then to use the findings to improve safety for older drivers.

Since the purpose of this study is to explore the effect of age on maximum legibility distance, the explanatory variable is Age and the response variable is Distance.

Scatterplot

Example 2

Here we have two quantitative variables for each of 16 students:
- how many beers they drank
- their blood alcohol level

We are interested in the relationship between the two variables: how is one affected by changes in the other?

[Table: Student, Number of Beers, Blood Alcohol Level]

Scatterplot example

[Figure: scatterplot of the student data. Response (dependent) variable: BAC. Explanatory (independent) variable: Beers.]

Some plots don't have clear explanatory and response variables. Do calories explain sodium amounts in hot dogs?

Describing a Scatterplot

Form: the general shape
- linear
- clusters
- nonlinear
- no relationship

Direction

A positive (or increasing) relationship means that an increase in one of the variables is associated with an increase in the other. A negative (or decreasing) relationship means that an increase in one of the variables is associated with a decrease in the other.

Strength

The strength of the relationship is determined by how closely the data follow the form of the relationship.

Outliers

Deviations from the pattern.

Back to Example 1
Form: linear
Direction: negative
Strength: moderately strong
Outliers: there do not appear to be any outliers

Back to Example 2
Form: linear
Direction: positive
Strength: strong
Outliers: there do not appear to be any outliers

[Figure: state per capita income against state median household income] This is a weak relationship. For a given state median household income, you can't predict the state per capita income very well.

[Figure: daily gas consumption against temperature] This is a very strong relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value.

How to scale a scatterplot

[Figure: the same data shown in four plots with different scales]

Using an inappropriate scale for a scatterplot can give an incorrect impression. The straight-line pattern in the lower plot appears stronger because of the surrounding space.

Both variables should be given a similar amount of space:
- the plot should be roughly square
- the points should occupy all the plot space (no blank space)

Example 3
Form: linear
Direction: positive
Strength: weak
Outliers: 3

Example 4
Form: linear
Direction: negative
Strength: medium-strong
Outliers: no

Adding categorical variables to scatterplots

[Figure: scatterplot with one plotting symbol (+) for northeastern states and a second symbol for midwestern states]

The correlation coefficient, r

The correlation coefficient is a measure of the direction and strength of a linear relationship. It is calculated using the mean and the standard deviation of both the x and y variables.

The formal name for r is the Pearson product moment correlation coefficient. It is named after the English statistician Karl Pearson.

Correlation

Back to Ex. 1

Calculation: r is calculated using the following formula:

$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum z_x z_y$

It looks scary, I know, but here's the basic idea: convert x and y to standardized values (z-scores) and find their average product (well, almost: divide by n - 1 rather than n).

r ranges from -1 to +1.
r quantifies the strength and direction of a linear relationship between two quantitative variables.
Strength: how closely the points follow a straight line.
Direction: positive when individuals with higher x values tend to have higher values of y.

Caution using correlation

Use correlation only for linear relationships.

Influential points

Correlations are calculated using means and standard deviations and thus are NOT resistant to outliers. In the example plot, just moving one point away from the general trend lowers the correlation from its original value of 0.91.

Properties of r

- Correlation requires that both variables be quantitative.
- r has no units.
- r only measures the strength of a linear relationship.
- r ranges from -1 to 1.
- r is negative if the direction of the relationship is negative.
- r is positive if the direction of the relationship is positive.
- r is closer to 1 or -1 when the linear relationship is strong, and closer to 0 when it is weak.
- r is unchanged if you interchange x and y.
- r is unchanged if you make a linear change of scale (for example, from feet to inches).
- The correlation is heavily influenced by outliers.
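The z-score recipe for r can be tried out in a few lines of Python. This is only a sketch: the function name `correlation` and the age/distance values are invented for illustration and are not the data from the highway sign study.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson r: sum of z-score products divided by (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (len(xs) - 1)


# Hypothetical data, invented for illustration only.
ages = [20, 30, 40, 50, 60, 70, 80]
distances_ft = [510, 500, 450, 430, 400, 380, 330]

r = correlation(ages, distances_ft)
r_swapped = correlation(distances_ft, ages)                    # interchange x and y
r_inches = correlation(ages, [d * 12 for d in distances_ft])   # feet -> inches
print(r, r_swapped, r_inches)
```

The three printed values should agree up to rounding, which matches two of the properties listed above: r does not change when x and y are interchanged or when the units are rescaled.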

How to find r using the calculator

1st step: enter your two lists (explanatory and response variables)
  STAT > EDIT > 1: Edit
  L1: enter the values of the explanatory variable
  L2: enter the values of the response variable

2nd step: find the correlation coefficient
  STAT > CALC > 8: LinReg(a+bx)
  LinReg(a+bx) L1,L2
  r is the correlation coefficient

BUT association does not imply causation!

Even if two variables have a high correlation coefficient, it does not mean that the explanatory variable CAUSED the changes in the response variable.

Association does not imply causation! Example 1

During the months of March and April of a certain year, the weekly weight increases of a puppy in New York were collected. For the same time frame, the retail price increases of snowshoes in Alaska were collected. The data were examined and were found to have a very strong linear correlation.

[Table: the weight of a growing puppy in New York (in pounds); the retail price of snowshoes in Alaska (in dollars)]

So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase, or that the price increases of snowshoes are causing the puppy's weight to increase. Of course this is not true!

The moral of this example is: be careful what you infer from your statistical analyses. Unfortunately, the situation is usually not as obvious as this one. Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a potential cause-and-effect relationship.

Association does not imply causation! Example 2

In the early 1930s, the relationship between the human population (response variable) of Oldenburg, Germany, and the number of storks nesting in the town (explanatory variable) was investigated. The correlation coefficient turned out to be high. Does this mean that storks bring babies? Can you give a possible explanation for this strong association?

The thymus example (shocking)

The thymus, a gland in your neck, is unlike other organs of the body: it doesn't get larger as you grow; it actually gets smaller.

Imagine the situation: many infants are dying of what seem to be respiratory obstructions, so doctors begin to do autopsies on infants who die with respiratory symptoms. They have done many autopsies in the past on adults who died of various causes, so they decide to rely on those autopsy results for comparison. What stands out most in the infant autopsies is that all the infants have thymus glands that look too big in comparison to their body size. So the doctors conclude that the respiratory problems are caused by an enlarged thymus.

It became quite common in the early 1900s for surgeons to treat respiratory problems in children by removing the thymus. In particular, in 1912, Dr. Charles Mayo published an article recommending removal of the thymus. He made this recommendation even though a third of the children who were operated on died.

What's the lurking variable in this shocking example?

What could be a lurking variable in these examples?
- There is a strong positive correlation between the foot length of K-12 students and reading scores.
- Students who use tutors have lower test scores than students who don't.
- A survey shows a strong positive correlation between the percentage of a country's inhabitants that use cell phones and the life expectancy in that country.

Important: Association does not imply causation!

One of the most common mistakes people make is to observe a high correlation between two variables and conclude that one must be causing the other. Scatterplots and correlation do NOT demonstrate causation. It's hard to establish the nature and direction of causation, and there is always the risk of overlooking lurking variables.

Simpson's Paradox

A relationship between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is Simpson's paradox. It is an example of the effect of lurking variables on an observed association.

Simpson's paradox is a severe form of confounding in which there is a reversal in the direction of an association caused by a lurking variable.

[Figure: scatterplot with habitats shown in different colors] Overall direction of association: positive. But when we color different habitats in different colors, the data are separated by a lurking variable (habitat) into a series of negative linear associations.
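The reversal can be checked with simple arithmetic. The sketch below uses the acceptance fractions quoted in this chapter's college admissions example (18/120, 24/120, 180/240 and 64/80); the dictionary layout and the function names are illustrative choices, not part of the original slides.

```python
# (accepted, total applicants) by major and gender, taken from the
# fractions quoted in the college admissions example.
admissions = {
    "Business": {"Male": (18, 120), "Female": (24, 120)},
    "Art": {"Male": (180, 240), "Female": (64, 80)},
}


def rate(accepted, total):
    """Proportion of applicants accepted."""
    return accepted / total


def combined_rate(gender):
    """Acceptance rate for one gender with both majors pooled together."""
    accepted = sum(admissions[major][gender][0] for major in admissions)
    total = sum(admissions[major][gender][1] for major in admissions)
    return rate(accepted, total)


# Within each major, compare the two genders.
for major, by_gender in admissions.items():
    for gender, (accepted, total) in by_gender.items():
        print(f"{major:8s} {gender:6s} {rate(accepted, total):.2f}")

# Pool the majors and compare again.
for gender in ("Male", "Female"):
    print(f"Combined {gender:6s} {combined_rate(gender):.2f}")
```

Within each major the female rate is higher, yet the pooled male rate is higher, because most men applied to the major with the higher acceptance rate.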

Simpson's Paradox

Example: Is acceptance into a college (response variable) predicted by gender (explanatory variable)? Consider these data:

          Success  Failure  Total
Male          198      162    360
Female         88      112    200

Proportions accepted by gender:
Male success rate = 198 / 360 = 0.55
Female success rate = 88 / 200 = 0.44
Conclude: males were accepted at a higher rate than females.

Broken down according to the lurking variable "major":

Business
          Success  Failure  Total
Male           18      102    120
Female         24       96    120

Male proportion = 18 / 120 = 0.15
Female proportion = 24 / 120 = 0.20
Therefore: males were accepted at a lower rate than females.

Art
          Success  Failure  Total
Male          180       60    240
Female         64       16     80

Male proportion = 180 / 240 = 0.75
Female proportion = 64 / 80 = 0.80
Therefore: males were accepted at a lower rate than females.

Summary of causation

Association does not imply causation! Association does not imply causation! Association does not imply causation!

Lurking variables and Simpson's paradox occur equally in quantitative and categorical situations. So, in either case, be careful with your conclusion, and remember: Association does not imply causation!

Explanatory variables
A researcher wants to know if taking increasing amounts of ginkgo biloba will result in increased memory ability for different students. He administers it to the students in doses of 250 milligrams, 500 milligrams, and 1000 milligrams. What is the explanatory variable in this study?
a) Amount of ginkgo biloba given to each student.
b) Change in memory ability.
c) Size of the student's brain.
d) Whether the student takes the ginkgo biloba.

Numeric bivariate data
The first step in analyzing numeric bivariate data is to
a) Measure strength of linear relationship.
b) Create a scatterplot.
c) Model linear relationship with regression line.

Scatterplots
Look at the following scatterplot. Choose which description BEST fits the plot.
a) Direction: positive, form: linear, strength: strong
b) Direction: negative, form: linear, strength: strong
c) Direction: positive, form: non-linear, strength: weak
d) Direction: negative, form: non-linear, strength: weak
e) No relationship

Scatterplots
Look at the following scatterplot. Choose which description BEST fits the plot.
a) Direction: positive, form: non-linear, strength: strong
b) Direction: negative, form: linear, strength: strong
c) Direction: positive, form: linear, strength: weak
d) Direction: positive, form: non-linear, strength: weak
e) No relationship

Scatterplots
Look at the following scatterplot. Choose which description BEST fits the plot.
a) Direction: positive, form: non-linear, strength: strong
b) Direction: negative, form: linear, strength: strong
c) Direction: positive, form: linear, strength: weak
d) Direction: positive, form: non-linear, strength: weak
e) No relationship

Scatterplots
Which of the following scatterplots displays the stronger linear relationship?
a) Plot A
b) Plot B
c) Same for both

Correlation
For which of the following situations would it be appropriate to calculate r, the correlation coefficient?
a) Time spent studying for statistics exam and score on the exam.
b) Income for county employees and their respective counties.
c) Eye color and hair color of selected participants.
d) Party affiliation of senators and their vote on presidential impeachment.

Correlation
What is a FALSE statement about r, the correlation coefficient?
a) It is a product of z-scores of X and Y.
b) It can range in value from -1 to 1.
c) It measures the strength and direction of the linear relationship between X and Y.
d) It is measured in units of the X variable.

Correlation
Which scatterplot would give a larger value for r?
a) Plot A
b) Plot B
c) It would be the same for both plots.

Correlation
True or False? Computing r as a measure of the strength of the relationship between X and Y is appropriate for the data in the following scatterplot:
a) True
b) False

Correlation tells us about the strength (scatter) and direction of the linear relationship between two quantitative variables.

In addition, we would like to have a numerical description of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description.

But which line best describes our data?

A regression line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

[Figures: Example 1 revisited; Example 2 revisited]

Which line to use?

In most cases, no line will pass exactly through all the points in a scatterplot. Different people will draw different lines by eye. We need a way to draw a regression line that doesn't depend on our guess as to where the line should go. We will call this best line the least-squares regression line.

Least-squares Regression Line

For a set of data points (x, y), the least-squares regression line is the line for which the sum of squared errors is as small as possible.

Equation of the Least-squares Regression Line

Book's notation: $\hat{y} = b_0 + b_1 x$
Calculator's notation: $\hat{y} = a + bx$

Here $\hat{y}$ is the predicted value. All we need to do is calculate the intercept a and the slope b.

How to find a and b using the calculator

1st step: enter your two lists (explanatory and response variables)
  STAT > EDIT > 1: Edit
  L1: enter the values of the explanatory variable
  L2: enter the values of the response variable

2nd step: run the linear regression
  STAT > CALC > 8: LinReg(a+bx)
  LinReg(a+bx) L1,L2
  a is the intercept, b is the slope

Other way to find a and b

First we calculate the slope of the line, b, from statistics we already know:

$b = r \frac{s_y}{s_x}$

where r is the correlation, $s_y$ is the standard deviation of the response variable y, and $s_x$ is the standard deviation of the explanatory variable x.

Once we know the slope b, we can calculate the y-intercept a:

$a = \bar{y} - b\bar{x}$

where $\bar{x}$ and $\bar{y}$ are the sample means of the x and y variables.

Facts about least-squares regression

- The distinction between explanatory and response variables is essential in regression.
- The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$.

Ex. 1 AGAIN

$\hat{y} = a + bx$ with a = 576 and b = -3, so $\hat{y} = 576 - 3x$, that is:

Distance = 576 feet - 3 × Age
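The "other way" of finding a and b can be sketched in Python. The function name `least_squares` and the sample points are assumptions made for illustration; the points are generated to lie exactly on the Example 1 line so the output is easy to check, and they are not the study data.

```python
from statistics import mean, stdev


def least_squares(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation r: sum of z-score products divided by (n - 1).
    r = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


# Hypothetical points that lie exactly on the line y = 576 - 3x.
ages = [20, 40, 60, 80]
distances_ft = [576 - 3 * age for age in ages]

a, b = least_squares(ages, distances_ft)
print(f"intercept a = {a:.1f}, slope b = {b:.1f}")
```

Because a is computed as the mean of y minus b times the mean of x, the fitted line passes through the point of means by construction.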

Prediction: Interpolation

The equation of the least-squares regression line allows you to predict y for any x within the range studied. This is called interpolating.

Predict the maximum distance at which a sign is legible for a 60-year-old.

Distance = 576 feet - 3 × Age
Predicted distance = 576 - 3 × 60 = 396 feet

396 feet is our best prediction for the maximum distance at which a sign is legible for a 60-year-old.

Prediction Ex. 1

Predict the maximum distance at which a sign is legible for a 90-year-old.

Distance = 576 feet - 3 × Age
Predicted distance = 576 - 3 × 90 = 306 feet

306 feet is our best prediction for the maximum distance at which a sign is legible for a 90-year-old. BUT this prediction is NOT RELIABLE, because no driver in the study was older than 82. It is called EXTRAPOLATION.

Extrapolation

Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line. This can be a very silly thing to do, as seen here.

Example 2 AGAIN

Nobody in the study drank 6.5 beers, but by finding the value of $\hat{y}$ from the regression line for x = 6.5, we can predict the corresponding blood alcohol content (in mg/ml).
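The difference between interpolation and extrapolation can be made explicit in code. This sketch uses the Example 1 line (Distance = 576 - 3 × Age) and the age range of the study (18 to 82); the function name and the wording of the printed notes are illustrative assumptions.

```python
AGE_MIN, AGE_MAX = 18, 82   # range of driver ages in the study
INTERCEPT, SLOPE = 576, -3  # Distance = 576 - 3 * Age (feet)


def predict_distance(age):
    """Return (predicted distance in feet, True if this is an extrapolation)."""
    predicted = INTERCEPT + SLOPE * age
    is_extrapolation = not (AGE_MIN <= age <= AGE_MAX)
    return predicted, is_extrapolation


for age in (60, 90):
    feet, extrapolated = predict_distance(age)
    note = "extrapolation, not reliable" if extrapolated else "interpolation"
    print(f"Age {age}: {feet} feet ({note})")
```

The arithmetic is the same in both cases; only the range check separates a prediction supported by the data from one that is not.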

Residuals

The vertical distances from each point to the least-squares regression line are called residuals. The sum of these residuals is always 0.

Points above the line have a positive residual. Points below the line have a negative residual.

residual = observed y - predicted y = $y - \hat{y}$

Ex. 1 AGAIN

[Figure: one observed point compared with its predicted value, $\hat{y} = 480$; the residual is $y - \hat{y}$]

Sum of squared errors
Which least-squares regression line would have a smaller sum of squared errors?
a) The line in Plot A.
b) The line in Plot B.
c) It would be the same for both plots.

Slope
Look at the following scatterplot. What would be a correct interpretation of the slope?
a) As we increase our CO content by 1 mg, we increase the tar content by 1.01 mg.
b) As we increase our CO content by 0.66 mg, we increase the tar content by 1.01 mg.
c) As we increase our CO content by 0.66 mg, we increase the tar content by 0.66 mg.
d) As we increase our CO content by 1 mg, we increase the tar content by 0.66 mg.

Residuals
Look at the following least-squares regression line. Compare the residuals from the two Points A and B.
a) Point A's would be greater than Point B's.
b) Point A's would be less than Point B's.
c) Point A's would be equal to Point B's.
d) There is not enough information.

Residuals
Residual equals
a)
b)
c)
d)
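Returning to the definition of residuals given earlier on this page, the claim that they always sum to 0 can be checked numerically. In this sketch the data are invented, and the slope is computed as the sum of cross-deviations divided by the sum of squared x-deviations, which is algebraically the same as b = r × s_y / s_x.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # equivalent to r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Observed y minus predicted y for every data point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

res = residuals(xs, ys)
sse = sum(e ** 2 for e in res)  # the sum of squared errors minimized by the line
print("residuals:", [round(e, 2) for e in res])
print("sum of residuals (zero up to rounding):", round(sum(res), 10))
```

Points with a positive residual lie above the fitted line and points with a negative residual lie below it.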

Correlation or regression
Which of the following measures the direction and strength of the linear association between X and Y?
a) Correlation
b) Regression

Correlation or regression
Which of the following makes no distinction between explanatory and response variables?
a) Correlation
b) Regression

Correlation or regression
Which of the following is used for prediction?
a) Correlation
b) Regression

Regression line
A regression line always passes through the point
a)
b)
c)
d)

Linear regression
The following graph shows the linear relationship between diamond size and price for diamonds of size 0.35 carats or less. Using this relationship to predict the price of a diamond that is 1 carat is considered
a) Extrapolation.
b) An influential observation.
c) Prediction.

Don't forget: the first test is next Wednesday, 3/4. It will cover Chapters 1, 2, 3, and 10.


MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) All but one of these statements contain a mistake. Which could be true? A) There is a correlation

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Introductory Statistics Notes

Introductory Statistics Notes Introductory Statistics Notes Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August

More information

Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

Homework 11. Part 1. Name: Score: / null

Homework 11. Part 1. Name: Score: / null Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

More information

Regents Exam Questions A2.S.8: Correlation Coefficient

Regents Exam Questions A2.S.8: Correlation Coefficient A2.S.8: Correlation Coefficient: Interpret within the linear regression model the value of the correlation coefficient as a measure of the strength of the relationship 1 Which statement regarding correlation

More information

Chapter 9 Descriptive Statistics for Bivariate Data

Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)

More information

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p. Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

More information

Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

More information

CHAPTER 10 REGRESSION AND CORRELATION

CHAPTER 10 REGRESSION AND CORRELATION CHAPTER 10 REGRESSION AND CORRELATION LINEAR REGRESSION (SECTION 10.1 10.3 OF UNDERSTANDABLE STATISTICS) Important Note: Before beginning this chapter, press y[catalog] (above Ê) and scroll down to the

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015 Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

More information

Inferential Statistics

Inferential Statistics Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Describing Relationships between Two Variables

Describing Relationships between Two Variables Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took

More information

Chapter 1 Linear Equations and Graphs

Chapter 1 Linear Equations and Graphs Chapter 1 Linear Equations and Graphs Section 1.1 - Linear Equations and Inequalities Objectives: The student will be able to solve linear equations. The student will be able to solve linear inequalities.

More information

Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables

Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables Scatterplot; Roles of Variables 3 Features of Relationship Correlation Regression Definition Scatterplot displays relationship

More information

Simple Regression and Correlation

Simple Regression and Correlation Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas

More information

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS Correct answers are in bold italics.. This scenario applies to Questions 1 and 2: A study was done to compare the lung capacity of coal miners

More information

Correlation & Regression, II. Residual Plots. What we like to see: no pattern. Steps in regression analysis (so far)

Correlation & Regression, II. Residual Plots. What we like to see: no pattern. Steps in regression analysis (so far) Steps in regression analysis (so far) Correlation & Regression, II 9.07 4/6/2004 Plot a scatter plot Find the parameters of the best fit regression line, y =a+bx Plot the regression line on the scatter

More information

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6 WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

More information

The Correlation Coefficient

The Correlation Coefficient The Correlation Coefficient Lelys Bravo de Guenni April 22nd, 2015 Outline The Correlation coefficient Positive Correlation Negative Correlation Properties of the Correlation Coefficient Non-linear association

More information

Chapter 7 Scatterplots, Association, and Correlation

Chapter 7 Scatterplots, Association, and Correlation 78 Part II Exploring Relationships Between Variables Chapter 7 Scatterplots, Association, and Correlation 1. Association. a) Either weight in grams or weight in ounces could be the explanatory or response

More information

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Simple Regression Theory I 2010 Samuel L. Baker

Simple Regression Theory I 2010 Samuel L. Baker SIMPLE REGRESSION THEORY I 1 Simple Regression Theory I 2010 Samuel L. Baker Regression analysis lets you use data to explain and predict. A simple regression line drawn through data points In Assignment

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

More information

Project: Linear Correlation and Regression

Project: Linear Correlation and Regression Project: Linear Correlation and Regression You may very well have studied linear regression before; I know many instructors discuss it in their classes. If the word regression means nothing to you...great!

More information

Chapter 5. Regression

Chapter 5. Regression Topics covered in this chapter: Chapter 5. Regression Adding a Regression Line to a Scatterplot Regression Lines and Influential Observations Finding the Least Squares Regression Model Adding a Regression

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ 1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

More information

AP Statistics Semester Exam Review Chapters 1-3

AP Statistics Semester Exam Review Chapters 1-3 AP Statistics Semester Exam Review Chapters 1-3 1. Here are the IQ test scores of 10 randomly chosen fifth-grade students: 145 139 126 122 125 130 96 110 118 118 To make a stemplot of these scores, you

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph. MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

More information

Lesson 3.2.1 Using Lines to Make Predictions

Lesson 3.2.1 Using Lines to Make Predictions STATWAY INSTRUCTOR NOTES i INSTRUCTOR SPECIFIC MATERIAL IS INDENTED AND APPEARS IN GREY ESTIMATED TIME 50 minutes MATERIALS REQUIRED Overhead or electronic display of scatterplots in lesson BRIEF DESCRIPTION

More information

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Module 7 Test Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. You are given information about a straight line. Use two points to graph the equation.

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

More information

AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 6

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 6 Using Your TI-NSpire Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches) PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

More information

IQR Rule for Outliers

IQR Rule for Outliers 1. Arrange data in order. IQR Rule for Outliers 2. Calculate first quartile (Q1), third quartile (Q3) and the interquartile range (IQR=Q3-Q1). CO2 emissions example: Q1=0.9, Q3=6.05, IQR=5.15. 3. Compute

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 STAB22H3 Statistics I Duration: 1 hour and 45 minutes Last Name: First Name: Student number: Aids

More information

Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner)

Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner) Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner) The Exam The AP Stat exam has 2 sections that take 90 minutes each. The first section is 40 multiple choice questions, and the second

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information