Chapter 10. The relationship between TWO variables. Response and Explanatory Variables. Scatterplots. Example 1: Highway Signs 2/26/2009


Chapter 10
Section 10-2: Correlation
Section 10-3: Regression
Section 10-4: Variation and Prediction Intervals

The relationship between TWO variables
So far we have dealt with data obtained from one variable (either categorical or quantitative). In this chapter we will explore the relationship between two quantitative variables.

Response and Explanatory Variables
In most studies involving two variables, each variable has a role. We distinguish between:
the response variable - the outcome of the study
the explanatory variable - the variable that is claimed to explain, predict or affect the response.

Scatterplots
In a scatterplot, one axis represents each of the variables, and the data are plotted as points on the graph. Typically, the explanatory (independent) variable is plotted on the x axis and the response (dependent) variable is plotted on the y axis.

Example 1: Highway Signs
A Pennsylvania research firm conducted a study in which 30 drivers (ages 18 to 82) were sampled, and for each one the maximum distance at which he or she could read a newly designed sign was determined. The goal of the study was to explore the relationship between a driver's age and the maximum distance at which signs were legible, and then to use the findings to improve safety for older drivers. Since the purpose of the study is to explore the effect of age on maximum legibility distance, the explanatory variable is Age and the response variable is Distance.

Scatterplot Example 2
Here we have two quantitative variables for each of 16 students:
How many beers they drank
Their blood alcohol level
We are interested in the relationship between the two variables: how is one affected by changes in the other?
[Table: Student, Number of Beers, Blood Alcohol Level; values not preserved in this transcript]

Scatterplot example
[Scatterplot: Beers is the explanatory (independent) variable and BAC is the response (dependent) variable]
Some plots don't have clear explanatory and response variables. Do calories explain sodium amounts in hot dogs?

Describing a Scatterplot
Form: general shape (linear, clusters, nonlinear, no relationship)

Direction
A positive (or increasing) relationship means that an increase in one of the variables is associated with an increase in the other.
A negative (or decreasing) relationship means that an increase in one of the variables is associated with a decrease in the other.

Strength
The strength of the relationship is determined by how closely the data follow the form of the relationship.

Outliers
Deviations from the pattern.

Back to Example 1
Form: linear
Direction: negative
Strength: moderately strong
Outliers: there do not appear to be any outliers

Back to Example 2
Form: linear
Direction: positive
Strength: strong
Outliers: there do not appear to be any outliers

This is a weak relationship. For a particular state median household income, you can't predict the state per capita income very well.

This is a very strong relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value.

How to scale a scatterplot
Same data for all four plots: using an inappropriate scale for a scatterplot can give an incorrect impression.
The straight-line pattern in the lower plot appears stronger because of the surrounding space.
Both variables should be given a similar amount of space:
Plot roughly square
Points should occupy all the plot space (no blank space)

Example 3
Form: linear
Direction: positive
Strength: weak
Outliers: 3

Example 4
Form: linear
Direction: negative
Strength: medium-strong
Outliers: no

Adding categorical variables to scatterplots
Different plot symbols distinguish the groups: + for northeastern states and a second symbol for midwestern states.

The correlation coefficient, r
The correlation coefficient is a measure of the direction and strength of a linear relationship. It is calculated using the mean and the standard deviation of both the x and y variables.
The formal name for r is the Pearson product moment correlation coefficient. It is named after the English statistician Karl Pearson.

Correlation (back to Ex. 1)
Calculation: r is calculated using the following formula:
$r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right) = \frac{1}{n-1} \sum z_x z_y$
It looks scary, I know, but here's the basic idea: convert x and y to standardized values (z-scores) and find their average product (well, almost: divide by n - 1 rather than n).
r ranges from -1 to +1.
r quantifies the strength and direction of a linear relationship between two quantitative variables.
Strength: how closely the points follow a straight line.
Direction: positive when individuals with higher x values tend to have higher values of y.

Caution using correlation
Use correlation only for linear relationships.

Influential points
Correlations are calculated using means and standard deviations and thus are NOT resistant to outliers. Just moving one point away from the general trend here decreases the correlation from 0.91 to [value not preserved in this transcript].

Properties of r
Correlation requires that both variables be quantitative.
r has no units.
r only measures the strength of a linear relationship.
r ranges from -1 to 1.
r is negative if the direction of the relationship is negative.
r is positive if the direction of the relationship is positive.
r is closer to 1 or -1 when the linear relationship is strong.
r is unchanged if you interchange x and y.
r is unchanged if you make a linear change of scale (e.g., from feet to inches).
The correlation is heavily influenced by outliers.
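The z-score formula for r can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `correlation` and the small data set are made up for illustration (they are not the highway-sign data). It also checks two of the listed properties, namely that r is unchanged when x and y are interchanged and when the units change from feet to inches.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

# Hypothetical data, invented for this sketch.
ages = [20, 30, 40, 50, 60, 70]
distances_ft = [520, 480, 470, 420, 400, 360]

r = correlation(ages, distances_ft)
r_swapped = correlation(distances_ft, ages)                      # interchange x and y
r_inches = correlation(ages, [12 * d for d in distances_ft])     # feet -> inches

print(round(r, 3), round(r_swapped, 3), round(r_inches, 3))
```

All three printed values agree, and the sign is negative because distance falls as age rises in the made-up data.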

How to find r using the calculator
1st step: enter your two lists (explanatory and response variables)
STAT > EDIT > 1: Edit
L1: enter the values of the explanatory variable
L2: enter the values of the response variable
2nd step: find the correlation coefficient
STAT > CALC > 8: LinReg(a+bx)
LinReg(a+bx) L1,L2
r is the correlation coefficient

BUT association does not imply causation!
Even if two variables have a high correlation coefficient, it does not mean that the explanatory variable CAUSED the changes in the response variable.

Association does not imply causation! Example 1
During March and April of a certain year, the weekly weight increases of a puppy in New York were collected. For the same time frame, the retail price increases of snowshoes in Alaska were collected. The data were examined and found to have a very strong linear correlation.
[Table: the weight of a growing puppy in New York (in pounds) and the retail price of snowshoes in Alaska (in dollars); values not preserved in this transcript]
So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase, or that the price increases of snowshoes are causing the puppy's weight to increase. Of course this is not true! The moral of this example is: be careful what you infer from your statistical analyses. Unfortunately, the situation is usually not as obvious as this one. Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a potential cause-and-effect relationship.

Association does not imply causation! Example 2
In the early 1930s the relationship between the human population (response variable) of Oldenburg, Germany, and the number of storks nesting in the town (explanatory variable) was investigated. The correlation coefficient turned out to be [value not preserved in this transcript]. Does this mean that storks bring babies? Can you give a possible explanation for this strong association?

The thymus example (shocking)
The thymus, a gland in your neck, unlike other organs of the body, doesn't get larger as you grow; it actually gets smaller. Imagine the situation: many infants are dying of what seem to be respiratory obstructions, so doctors begin to do autopsies on infants who die with respiratory symptoms. They have done many autopsies in the past on adults who died of various causes, so they decide to rely on those autopsy results for comparison. What stands out most in the infant autopsies is that all the infants have thymus glands that look too big in comparison to their body size. So the doctors conclude that the respiratory problems are caused by an enlarged thymus. It became quite common in the early 1900s for surgeons to treat respiratory problems in children by removing the thymus. In particular, in 1912, Dr. Charles Mayo published an article recommending removal of the thymus. He made this recommendation even though a third of the children who were operated on died.
What's the lurking variable in this shocking example?

What could be a lurking variable in these examples?
There is a strong positive correlation between the foot length of K-12 students and reading scores.
Students who use tutors have lower test scores than students who don't.
A survey shows a strong positive correlation between the percentage of a country's inhabitants that use cell phones and the life expectancy in that country.

Important: Association does not imply causation!
One of the most common mistakes people make is to observe a high correlation between two variables and conclude that one must be causing the other. Scatterplots and correlation do NOT demonstrate causation. It's hard to establish the nature and direction of causation, and there is always the risk of overlooking lurking variables.

Simpson's Paradox
A relationship between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is Simpson's paradox. Simpson's paradox is an example of the effect of lurking variables on an observed association.

Simpson's paradox
Simpson's paradox is a severe form of confounding in which there is a reversal in the direction of an association caused by a lurking variable.
Overall direction of association: positive.
But when we color different habitats in different colors, the data are separated by a lurking variable (habitat) into a series of negative linear associations.

Simpson's Paradox
Example: Is acceptance into a college (response variable) predicted by gender (explanatory variable)? Consider these data:

          Success  Failure  Total
Male          198      162    360
Female         88      112    200

Proportions accepted by gender:
Male success rate = 198 / 360 = 0.55
Female success rate = 88 / 200 = 0.44
Conclude: males were accepted at a higher rate than females.

Broken down according to the lurking variable "major":

Business
          Success  Failure  Total
Male           18      102    120
Female         24       96    120
Male proportion = 18 / 120 = 0.15
Female proportion = 24 / 120 = 0.20
Therefore: males were accepted at a lower rate than females.

Art
          Success  Failure  Total
Male          180       60    240
Female         64       16     80
Male proportion = 180 / 240 = 0.75
Female proportion = 64 / 80 = 0.80
Therefore: males were accepted at a lower rate than females.

Summary of causation
Association does not imply causation! Association does not imply causation! Association does not imply causation!
The issues of lurking variables and Simpson's paradox occur equally in both quantitative and categorical situations. So, in either case, be careful with your conclusion, and remember: Association does not imply causation!

Explanatory variables
A researcher wants to know whether taking increasing amounts of ginkgo biloba will result in improved memory ability for different students. He administers it to the students in doses of 250 milligrams, 500 milligrams, and 1000 milligrams. What is the explanatory variable in this study?
a) Amount of ginkgo biloba given to each student.
b) Change in memory ability.
c) Size of the student's brain.
d) Whether the student takes the ginkgo biloba.

Numeric bivariate data
The first step in analyzing numeric bivariate data is to
a) Measure strength of linear relationship.
b) Create a scatterplot.
c) Model linear relationship with regression line.

Scatterplots
Look at the following scatterplot. Choose which description BEST fits the plot.
a) Direction: positive, form: linear, strength: strong
b) Direction: negative, form: linear, strength: strong
c) Direction: positive, form: non-linear, strength: weak
d) Direction: negative, form: non-linear, strength: weak
e) No relationship
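The reversal in the college admissions example can be verified directly from the stated counts. The following is a small illustrative sketch (the variable and function names are my own): it computes the acceptance rate within each major and then after combining the majors, using only the accepted and total counts given in the example.

```python
# Admission counts from the example: (accepted, total applicants).
counts = {
    "Business": {"Male": (18, 120), "Female": (24, 120)},
    "Art": {"Male": (180, 240), "Female": (64, 80)},
}

def rate(accepted, total):
    """Proportion of applicants accepted."""
    return accepted / total

# Within each major, females are accepted at the higher rate.
for major, by_gender in counts.items():
    for gender, (acc, tot) in by_gender.items():
        print(f"{major:8s} {gender:6s} {acc}/{tot} = {rate(acc, tot):.2f}")

# Combining the majors reverses the direction of the association.
combined = {}
for gender in ("Male", "Female"):
    acc = sum(counts[major][gender][0] for major in counts)
    tot = sum(counts[major][gender][1] for major in counts)
    combined[gender] = (acc, tot)
    print(f"Combined {gender:6s} {acc}/{tot} = {rate(acc, tot):.2f}")
```

The reversal happens because most men applied to Art, the major with the high acceptance rate, while most women applied to Business, the major with the low one.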

Scatterplots
Look at the following scatterplot. Choose which description BEST fits the plot.
a) Direction: positive, form: non-linear, strength: strong
b) Direction: negative, form: linear, strength: strong
c) Direction: positive, form: linear, strength: weak
d) Direction: positive, form: non-linear, strength: weak
e) No relationship

Scatterplots
Look at the following scatterplot. Choose which description BEST fits the plot.
a) Direction: positive, form: non-linear, strength: strong
b) Direction: negative, form: linear, strength: strong
c) Direction: positive, form: linear, strength: weak
d) Direction: positive, form: non-linear, strength: weak
e) No relationship

Scatterplots
Which of the following scatterplots displays the stronger linear relationship?
a) Plot A
b) Plot B
c) Same for both

Correlation
For which of the following situations would it be appropriate to calculate r, the correlation coefficient?
a) Time spent studying for statistics exam and score on the exam.
b) Income for county employees and their respective counties.
c) Eye color and hair color of selected participants.
d) Party affiliation of senators and their vote on presidential impeachment.

Correlation
What is a FALSE statement about r, the correlation coefficient?
a) It is a product of z-scores of X and Y.
b) It can range in value from -1 to 1.
c) It measures the strength and direction of the linear relationship between X and Y.
d) It is measured in units of the X variable.

Correlation
Which scatterplot would give a larger value for r?
a) Plot A
b) Plot B
c) It would be the same for both plots.

Correlation
True or False? Computing r as a measure of the strength of the relationship between X and Y is appropriate for the data in the following scatterplot:
a) True
b) False

Correlation tells us about the strength (scatter) and direction of the linear relationship between two quantitative variables.
In addition, we would like to have a numerical description of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description.
But which line best describes our data?

A regression line
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
[Figures: Example 1 revisited; Example 2 revisited]

Which line to use?
In most cases, no line will pass exactly through all the points in a scatterplot. Different people will draw different lines by eye. We need a way to draw a regression line that doesn't depend on our guess as to where the line should go. We will call this best line the least-squares regression line.

Least-squares Regression Line
For a set of data points (x, y), the least-squares regression line is the line for which the sum of squared errors is as small as possible.

Equation of the Least-squares Regression Line
Book's notation: $\hat{y} = b_0 + b_1 x$
Calculator's notation: $\hat{y} = a + bx$
Here $\hat{y}$ is the predicted value. All we need to do is calculate the intercept a and the slope b.

How to find a and b using the calculator
1st step: enter your two lists (explanatory and response variables)
STAT > EDIT > 1: Edit
L1: enter the values of the explanatory variable
L2: enter the values of the response variable
2nd step: find the intercept and the slope
STAT > CALC > 8: LinReg(a+bx)
LinReg(a+bx) L1,L2
a is the intercept, b is the slope

Other way to find a and b
First we calculate the slope of the line, b, from statistics we already know:
$b = r \frac{s_y}{s_x}$
r is the correlation
$s_y$ is the standard deviation of the response variable y
$s_x$ is the standard deviation of the explanatory variable x
Once we know b, the slope, we can calculate a, the y-intercept:
$a = \bar{y} - b\bar{x}$
where $\bar{x}$ and $\bar{y}$ are the sample means of the x and y variables.

Facts about least-squares regression
The distinction between explanatory and response variables is essential in regression.
The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$.

Ex. 1 AGAIN
$\hat{y} = a + bx$
a = 576, b = -3
$\hat{y} = 576 - 3x$
Distance = 576 feet - 3 × Age
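The "other way" of finding a and b from r, the standard deviations, and the means can be sketched in Python as follows. This is an illustration under stated assumptions rather than the course's method: the function name `least_squares` and the five data points are invented for the sketch, and r is computed with the z-score formula from the correlation section.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data, invented for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 2.2 + 0.6x

# The fitted line passes through the point of means (x_bar, y_bar).
print(abs((a + b * mean(xs)) - mean(ys)) < 1e-9)  # True
```

Because a is defined as $\bar{y} - b\bar{x}$, substituting $x = \bar{x}$ into the line always returns $\bar{y}$, which is why the line passes through the point of means.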

Prediction: Interpolation
The equation of the least-squares regression line allows you to predict y for any x within the range studied. This is called interpolating.

Prediction: Interpolation
Predict the maximum distance at which a sign is legible for a 60-year-old.
Distance = 576 feet - 3 × Age
Predicted distance = 576 - 3 × 60 = 396 feet
396 feet is our best prediction for the maximum distance at which a sign is legible for a 60-year-old.

Prediction, Ex. 1
Predict the maximum distance at which a sign is legible for a 90-year-old.
Distance = 576 feet - 3 × Age
Predicted distance = 576 - 3 × 90 = 306 feet
306 feet is our best prediction for the maximum distance at which a sign is legible for a 90-year-old.
BUT this prediction is NOT RELIABLE, because the drivers in the study were between 18 and 82 years old. It is called EXTRAPOLATION.

Extrapolation
Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line.
This can be a very silly thing to do, as seen here!

Example 2 AGAIN
[Regression equation for Example 2; coefficients not preserved in this transcript]
Nobody in the study drank 6.5 beers, but by finding the value of ŷ from the regression line for x = 6.5, we would expect a blood alcohol content of [value not preserved in this transcript] mg/ml.
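The two highway-sign predictions can be reproduced with a short Python sketch that also flags extrapolation. The function name `predict_distance` is hypothetical; the coefficients (a = 576, b = -3) and the studied age range (18 to 82) come from Example 1.

```python
def predict_distance(age, a=576.0, b=-3.0, age_range=(18, 82)):
    """Return (predicted distance in feet, True if age is inside the studied range)."""
    low, high = age_range
    return a + b * age, low <= age <= high

for age in (60, 90):
    feet, in_range = predict_distance(age)
    kind = "interpolation" if in_range else "extrapolation (not reliable)"
    print(f"Age {age}: {feet:.0f} feet, {kind}")
# Age 60: 396 feet, interpolation
# Age 90: 306 feet, extrapolation (not reliable)
```

Returning the range check alongside the prediction is one simple way to keep an extrapolated value from being mistaken for a reliable one.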

Residuals
The vertical distances from each point to the least-squares regression line are called residuals. The sum of these residuals is always 0.
Points above the line have a positive residual.
Points below the line have a negative residual.
residual = observed y - predicted y = $y - \hat{y}$

Ex. 1 AGAIN
Predicted value: $\hat{y} = 480$
residual = $y - \hat{y}$ [observed value and residual not preserved in this transcript]

Sum of squared errors
Which least-squares regression line would have a smaller sum of squared errors?
a) The line in Plot A.
b) The line in Plot B.
c) It would be the same for both plots.

Slope
Look at the following scatterplot. What would be a correct interpretation of the slope?
a) As we increase our CO content by 1 mg, we increase the tar content by 1.01 mg.
b) As we increase our CO content by 0.66 mg, we increase the tar content by 1.01 mg.
c) As we increase our CO content by 0.66 mg, we increase the tar content by 0.66 mg.
d) As we increase our CO content by 1 mg, we increase the tar content by 0.66 mg.

Residuals
Look at the following least-squares regression line. Compare the residuals from the two Points A and B.
a) Point A's would be greater than Point B's.
b) Point A's would be less than Point B's.
c) Point A's would be equal to Point B's.
d) There is not enough information.

Residuals
Residual equals
a) b) c) d) [answer options not preserved in this transcript]
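The claims about residuals can be checked numerically. The sketch below is illustrative only: the data are made up, and the slope is computed with the equivalent closed form b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which gives the same value as b = r·s_y/s_x.

```python
from statistics import mean

# Hypothetical data, invented for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))   # slope
a = y_bar - b * x_bar                        # intercept

# residual = observed y - predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)         # sum of squared errors

print([round(e, 2) for e in residuals])      # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)            # True: residuals sum to 0
print(round(sse, 2))                         # 2.4
```

The first point lies below the fitted line (negative residual) and the second lies above it (positive residual). Any other straight line through these points would give a larger sum of squared errors than 2.4.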

Correlation or regression
Which of the following measures the direction and strength of the linear association between X and Y?
a) Correlation
b) Regression

Correlation or regression
Which of the following makes no distinction between explanatory and response variables?
a) Correlation
b) Regression

Correlation or regression
Which of the following is used for prediction?
a) Correlation
b) Regression

Regression line
A regression line always passes through the point
a) b) c) d) [answer options not preserved in this transcript]

Linear regression
The following graph shows the linear relationship between diamond size and price for diamonds of size 0.35 carats or less. Using this relationship to predict the price of a diamond that is 1 carat is considered
a) Extrapolation.
b) An influential observation.
c) Prediction

Don't forget, the first test is next Wednesday, 3/4. It will cover Chapters 1, 2, 3, and 10.


More information

Chapter 7 Scatterplots, Association, and Correlation

Chapter 7 Scatterplots, Association, and Correlation 78 Part II Exploring Relationships Between Variables Chapter 7 Scatterplots, Association, and Correlation 1. Association. a) Either weight in grams or weight in ounces could be the explanatory or response

More information

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6 WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches) PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph. MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

More information

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple. Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Lesson 3.2.1 Using Lines to Make Predictions

Lesson 3.2.1 Using Lines to Make Predictions STATWAY INSTRUCTOR NOTES i INSTRUCTOR SPECIFIC MATERIAL IS INDENTED AND APPEARS IN GREY ESTIMATED TIME 50 minutes MATERIALS REQUIRED Overhead or electronic display of scatterplots in lesson BRIEF DESCRIPTION

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Module 7 Test Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. You are given information about a straight line. Use two points to graph the equation.

More information

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5 Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression

More information

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

More information

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

More information

Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 STAB22H3 Statistics I Duration: 1 hour and 45 minutes Last Name: First Name: Student number: Aids

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Using Excel for Statistical Analysis

Using Excel for Statistical Analysis Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure

More information

A full analysis example Multiple correlations Partial correlations

A full analysis example Multiple correlations Partial correlations A full analysis example Multiple correlations Partial correlations New Dataset: Confidence This is a dataset taken of the confidence scales of 41 employees some years ago using 4 facets of confidence (Physical,

More information

Designer: Nathan Kimball. Stage 1 Desired Results

Designer: Nathan Kimball. Stage 1 Desired Results Interpolation Subject: Science, math Grade: 6-8 Time: 4 minutes Topic: Reading Graphs Designer: Nathan Kimball Stage 1 Desired Results Lesson Overview: In this activity students work with the direct linear

More information

Pearson s Correlation Coefficient

Pearson s Correlation Coefficient Pearson s Correlation Coefficient In this lesson, we will find a quantitative measure to describe the strength of a linear relationship (instead of using the terms strong or weak). A quantitative measure

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. ch12 practice test 1) The null hypothesis that x and y are is H0: = 0. 1) 2) When a two-sided significance test about a population slope has a P-value below 0.05, the 95% confidence interval for A) does

More information

Linear functions Increasing Linear Functions. Decreasing Linear Functions

Linear functions Increasing Linear Functions. Decreasing Linear Functions 3.5 Increasing, Decreasing, Max, and Min So far we have been describing graphs using quantitative information. That s just a fancy way to say that we ve been using numbers. Specifically, we have described

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. MATH 3/GRACEY PRACTICE EXAM/CHAPTERS 2-3 Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) The frequency distribution

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

More information

List of Examples. Examples 319

List of Examples. Examples 319 Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer s disease. 8 Gear tooth strength.

More information

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

More information

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7 Using Your TI-83/84/89 Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

More information

UNIT 1: COLLECTING DATA

UNIT 1: COLLECTING DATA Core Probability and Statistics Probability and Statistics provides a curriculum focused on understanding key data analysis and probabilistic concepts, calculations, and relevance to real-world applications.

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

(Least Squares Investigation)

(Least Squares Investigation) (Least Squares Investigation) o Open a new sketch. Select Preferences under the Edit menu. Select the Text Tab at the top. Uncheck both boxes under the title Show Labels Automatically o Create two points

More information

COWLEY COUNTY COMMUNITY COLLEGE REVIEW GUIDE Compass Algebra Level 2

COWLEY COUNTY COMMUNITY COLLEGE REVIEW GUIDE Compass Algebra Level 2 COWLEY COUNTY COMMUNITY COLLEGE REVIEW GUIDE Compass Algebra Level This study guide is for students trying to test into College Algebra. There are three levels of math study guides. 1. If x and y 1, what

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

USING A TI-83 OR TI-84 SERIES GRAPHING CALCULATOR IN AN INTRODUCTORY STATISTICS CLASS

USING A TI-83 OR TI-84 SERIES GRAPHING CALCULATOR IN AN INTRODUCTORY STATISTICS CLASS USING A TI-83 OR TI-84 SERIES GRAPHING CALCULATOR IN AN INTRODUCTORY STATISTICS CLASS W. SCOTT STREET, IV DEPARTMENT OF STATISTICAL SCIENCES & OPERATIONS RESEARCH VIRGINIA COMMONWEALTH UNIVERSITY Table

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Mind on Statistics. Chapter 2

Mind on Statistics. Chapter 2 Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

More information

CORRELATION ANALYSIS

CORRELATION ANALYSIS CORRELATION ANALYSIS Learning Objectives Understand how correlation can be used to demonstrate a relationship between two factors. Know how to perform a correlation analysis and calculate the coefficient

More information