Section 14 Simple Linear Regression: Introduction to Least Squares Regression
- Jack Washington
- 11 months ago
Transcription
Slide 1. Section 14, Simple Linear Regression: Introduction to Least Squares Regression
There are several different measures of statistical association used for understanding the quantitative relationship between two variables. If the researcher is working with numeric measures and supposes a linear relationship between these two variables, the appropriate measure of association is correlation. Additionally, if a particular set of assumptions is met, we can predict one of the two variables (an outcome) based on the other variable (a predictor); this is called simple linear regression. Further, a researcher may wish to understand the relationships among more than two variables. This can be done with an extension of simple linear regression, called multiple linear regression. Recall that any statistical hypothesis test is a method for quantifying how much evidence is enough to declare a significant outcome in a research study. The hypothesis being tested by a correlation, and also by simple linear regression, is whether two variables have a significant linear association with each other.
Slide 2. Linear Regression: Examples
- Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?
- What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? How fast did the number of people living on farms in the US decrease?
- What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship?
- Does the estriol level of a mother have a linear relationship with the birth-weight of her baby? Can we predict the birth-weight of a baby from the mother's estriol level?
- Does the age at which a child first begins talking predict a score of mental ability later in childhood?
- Is there a linear relationship between systolic blood pressure and age?
We learned that when we have measures of two continuous variables, we can describe their relationship visually with a scatter-plot. In addition, if that relationship appears to be linear, we can measure the strength and direction of the linear association. Finally, if certain assumptions are met, we may be able to predict the value of one measure from the other. The questions on this slide are examples. For the farm data we are asking, in other words, how fast the number of people living on farms in the US decreased from 1935 to 1990. For plasma volume we can also ask: can we predict plasma volume in the blood from a person's body weight, and how well? For estriol: if we can predict birth-weight, can we anticipate a low birth-weight baby from estriol levels?
In all of these examples, we are investigating the relationship between two quantitative variables. We may begin this investigation with a scatter-plot followed by a correlation analysis. We will now take our investigation further by introducing simple linear regression.
Slide 3. Simple Linear Regression
Simple linear regression (SLR) analysis is used to quantify the linear relationship between two quantitative variables. In this way it is like correlation, but regression goes further:
- It allows us to draw the line that best describes the linear relationship between X and Y.
- It allows us to predict the value of the outcome Y for a specified value of X.
- It allows us to quantify how much change in the value of Y is seen with a specified change in the value of X.
In other studies the goal is to assess the relationships among a set of variables.
Slide 4. Variable (X) and Variable (Y)
We can describe the relationship or association between two quantitative variables using:
- Scatterplot
- Correlation
- Simple linear regression
Recall that we usually identify one variable as the outcome of interest, what we have been mostly thinking of as a disease variable so far. This is often called the response, or dependent, variable. The other variable is the predictor of interest, what we have been mostly thinking of as an exposure variable so far. This is often called the explanatory, or independent, variable. When each unit (person) has two measures, we usually call one x and one y. If one variable can help predict the value of the other, we call that variable x; it is the predictor, explanatory, or independent variable. The other variable, y, is the outcome, response, or dependent variable. Sometimes we cannot tell which is the predictor and which is the outcome. Simple linear regression requires that we pick one variable as the outcome.
Slide 5. Wine Consumption and Heart Disease
Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?
Source: Moore and McCabe, Introduction to the Practice of Statistics, 4th Edition, W. H. Freeman & Co., New York.
Here are some data on wine consumption and heart disease deaths. Do these data suggest a linear relationship between the two variables?
Slide 6. Wine Consumption and Heart Disease
The data suggest a negative trend. Can we estimate how much lower heart disease rates are for each extra liter per person per year? How would we draw a line through these data to help us with this estimate? What can we say about the precision of this regression line? How much of the variability in heart disease deaths is explained by the regression line? Do you think these data come from a random sample? What assumptions are we making when using linear regression to make predictions? What confounders must we consider? These are all concepts we will investigate with linear regression.
Slide 7. Population Living on Farms
What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? How fast did the number of people living on farms in the US decrease? Do these data suggest a linear relationship between the two variables?
Slide 8. Population Living on Farms
How fast did the number of people living on farms in the US decrease? We can see a strong negative trend that appears fairly linear. How might we draw a line through these data? Is there a best way to draw this line?
Slide 9. Plasma Volume and Body Weight
What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship?
Table columns: Subject, Body Weight (kg), Plasma Volume (l).
Consider the association between body weight in kilograms and plasma volume in the blood in liters for eight randomly selected people. Do heavier people have more plasma? If so, how much more? Is this relationship linear?
Slide 10. Simple Linear Regression
Scatter-plot: X, body weight (kg); Y, plasma volume (liters); Pearson's correlation = 0.759.
When we plot the data we can see a positive relationship between body weight and plasma volume. The data do not fall perfectly on a line. We can calculate the correlation to help us understand the strength of the linear relationship; here it is 0.759. We may want to draw a line through these data, giving us a mathematical model to estimate plasma volume from weight, but which is the best line? The white line, the green line, or the purple line? The technique of least squares regression will help us pick the line of best fit.
Slide 11. How Do We Choose the Best Line?
The least squares regression line is the line that gets closest to all of the points. How do we measure closeness to more than one point?
minimize $\sum_{i=1}^{n} (y_i - \text{point\_on\_line}_i)^2$
The line of best fit, the regression line, is the line that gets "closest" to all the data points. "Closeness" is measured as the vertical distance from the line to each data point. Specifically, the regression line is the one that minimizes the sum of all the squared vertical distances; hence estimation of this line is called least squares, and the line is called the least squares regression line.
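The criterion on this slide can be tried out numerically. The sketch below is only an illustration: the five data points and the three candidate lines are made-up values (they are not the study's data), and the function name is hypothetical. It computes the sum of squared vertical distances for each candidate and picks the smallest.

```python
# Hypothetical data for illustration only (not the values from the study):
# body weight in kg and plasma volume in liters.
weights = [55, 60, 65, 70, 75]
volumes = [2.5, 2.9, 2.8, 3.2, 3.1]


def sum_squared_distances(intercept, slope, xs, ys):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Three candidate lines, each written as (intercept, slope).
candidates = {
    "flat line": (2.9, 0.0),
    "too steep": (-0.35, 0.05),
    "least squares": (0.95, 0.03),
}

scores = {
    name: sum_squared_distances(a, b, weights, volumes)
    for name, (a, b) in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: sum of squared distances = {score:.4f}")
print("closest line:", best)
```

Any other line drawn through these points would give a larger sum than the least squares line, which is what makes it the "best" line under this definition of closeness.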
Slide 12. Simple Linear Regression
Visually, we find the line that minimizes the sum of the squared vertical distances. For this line, the positive distances (points above the line) and the negative distances (points below the line) sum to zero. This could be very difficult to achieve by trial and error, but we have mathematical formulas that determine this exact line.
Slide 13. Equation of a Line
Before we move further with linear regression, let's review the equation of a line, that is, how we represent a line with a mathematical function. A line is defined by:
- The intercept a (where the line crosses the vertical axis, the value of Y when X = 0), and
- The slope b ("rise over run," how much y changes for each 1-unit change in x).
We write this as $y = a + bx$.
Slide 14. Equation of a Line
We can see that the line crosses the vertical axis at the value a, when x = 0. We also see that for every one-unit increase in x, y changes by the amount b.
Slide 15. Equation of a Line: Statistical Notation
$b_0$ = intercept, $b_1$ = slope, $\hat{y} = b_0 + b_1 x$
In statistics, the symbol for the intercept is b naught ($b_0$) and the symbol for the slope is b sub one ($b_1$). We then write the line as y-hat equals $b_0 + b_1 x$. We use $\hat{y}$ instead of $y$ to distinguish the real data value $y$ from our predicted value $\hat{y}$ at a given value of x.
Slide 16. Equation of a Line: Statistical Notation
Figure: the line $\hat{y} = b_0 + b_1 x$, with the intercept $b_0$ and the slope $b_1$ marked.
Using statistical notation, we have the same picture as before. Here the line crosses the vertical axis at the value b naught, when x = 0. We also see that for every one-unit increase in x, y-hat changes by the amount b sub 1.
Slide 17. Estimating Intercept and Slope
$b_1 = r \, \frac{s_y}{s_x}$, $b_0 = \bar{y} - b_1 \bar{x}$, $\hat{y} = b_0 + b_1 x$
The least squares line minimizes the sum of squared vertical distances. This translates into: b naught equals y-bar minus the slope times x-bar. The slope is the correlation times the ratio of the standard deviation of the observed y values to the standard deviation of the observed x values. In this way, we see that the slope and the correlation are related to one another. The correlation depends on both the slope and the precision. The equations are obtained using mathematics beyond this course. It is enough to understand that these are the equations that determine the least squares regression line, y-hat = b naught plus b sub 1 times x.
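As a minimal sketch of these two formulas, the Python below computes the slope first and then the intercept. The data are made-up values for illustration (not the study's data), and the helper name `pearson_r` is hypothetical; only the standard-library `statistics` functions `mean` and `stdev` are used.

```python
from statistics import mean, stdev

# Hypothetical data for illustration only (not the values from the study).
weights = [55, 60, 65, 70, 75]       # x, body weight in kg
volumes = [2.5, 2.9, 2.8, 3.2, 3.1]  # y, plasma volume in liters


def pearson_r(xs, ys):
    """Pearson's correlation computed from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


r = pearson_r(weights, volumes)

# Slope: correlation times the ratio of standard deviations (s_y / s_x).
slope = r * stdev(volumes) / stdev(weights)

# Intercept: y-bar minus slope times x-bar (so the slope must come first).
intercept = mean(volumes) - slope * mean(weights)

print(f"r = {r:.3f}, b1 = {slope:.4f}, b0 = {intercept:.4f}")
```

Because the standard deviations are always positive, the slope computed this way always has the same sign as the correlation.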
Slide 18. Slope and Correlation
Figure: three panels showing lines with $b_1 > 0$, $b_1 = 0$, and $b_1 < 0$.
Notice that if the slope is positive, then the correlation is positive. If the slope is zero, then the correlation is zero. If the slope is negative, then the correlation is negative.
Slide 19. Simple Linear Regression
Scatter-plot: X, body weight (kg); Y, plasma volume (liters); Pearson's correlation = 0.759.
The data points are represented as the dots in our scatter-plot, but they do not fall exactly on the line. How do we compute (and write) the least squares line for these data? Once we have the line, for any x value within the range of the values in our dataset, y-hat is the point that falls exactly on the least squares line; it is not the data value for y. Thus every such x value can be plugged into the equation to calculate a predicted y value, which we denote y-hat.
Slide 20. Estimating Intercept and Slope
$b_1 = r \, \frac{s_y}{s_x}$, $b_0 = \bar{y} - b_1 \bar{x} = \bar{y} - b_1 (66.875)$, $\hat{y} = b_0 + b_1 x$
Using the equations for estimating the slope and intercept of the least squares regression line, with $r = 0.759$ and $\bar{x} = 66.875$ kg, we obtain the intercept $b_0$ and the slope $b_1$. We must calculate the slope first because the equation for the intercept uses the estimate of the slope. Generally, we do not do these calculations by hand; we use software to compute these values.
Slide 21. Plasma Volume and Weight
Using R, we plot the least squares regression line $\hat{y} = b_0 + b_1 x$. The slope $b_1$ is the average increase in plasma volume, in liters, for every one-kilogram increase in body weight. The intercept $b_0$ is the estimated plasma volume for a person who weighs zero kilograms. This estimate does not make biological sense. The intercept for this model is used only to determine the line, not to make a prediction at x = 0. The only meaningful estimates are within the range of our x values, that is, weights from about 55 to 75 kilograms.
Slide 22. Plasma Volume and Weight
- Measurement of plasma volume is very time consuming.
- Body weight is easy to measure: use the equation and body weight to estimate plasma volume.
$\hat{y} = b_0 + b_1 x = b_0 + b_1 (60) \approx 2.7$ liters
Measuring plasma volume is very time consuming. We may want to estimate the plasma volume of a person outside this study based on the person's weight. For example, what would you expect plasma volume to be, on average, for a 60-kilogram man? We put 60 kilograms in for x and calculate the estimated value to be 2.7 liters. Be very careful to make estimates only within the range of the data that was used to estimate the regression line. Also, be aware that the measurement unit matters: we would not want to insert values in pounds when the regression line is based on kilograms.
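The two cautions here (stay inside the observed range, and use the right units) can be built into a small prediction helper. This is a sketch under stated assumptions: the coefficients 0.95 and 0.03 are hypothetical placeholders, not the fitted values from the study, and the function name is made up. The range limits of 55 to 75 kg follow the approximate range mentioned in the notes.

```python
# Hypothetical coefficients for illustration only (not the study's fitted values).
INTERCEPT = 0.95   # b0, liters
SLOPE = 0.03       # b1, liters per kilogram

# Approximate range of body weights used to fit the line, in kilograms.
MIN_WEIGHT_KG = 55.0
MAX_WEIGHT_KG = 75.0


def predict_plasma_volume(weight_kg):
    """Return y-hat in liters, refusing to extrapolate outside the fitted range."""
    if not MIN_WEIGHT_KG <= weight_kg <= MAX_WEIGHT_KG:
        raise ValueError(
            f"{weight_kg} kg is outside the fitted range "
            f"{MIN_WEIGHT_KG} to {MAX_WEIGHT_KG} kg; check the value and its units"
        )
    return INTERCEPT + SLOPE * weight_kg


print(predict_plasma_volume(60))

# A weight entered in pounds (132 lb is about 60 kg) is rejected rather than
# silently producing a nonsensical estimate.
try:
    predict_plasma_volume(132)
except ValueError as err:
    print("refused:", err)
```

Raising an error is one possible design choice; returning a warning alongside the estimate would be another.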
Slide 23. RSQUARE
The square of the correlation ($r^2$ = RSQUARE) is the fraction of the variation in the values of y that is explained by the least squares regression of y on x.
$r^2 = \frac{\text{variance of predicted values } \hat{y}}{\text{variance of observed values of } y} = \frac{SSM}{SST}$
Recall Pearson's correlation: it measures the strength of the linear relationship between two quantitative variables. There is another measure called the coefficient of determination. Its value is Pearson's correlation squared; for this reason, it is often denoted RSQUARE. With least squares regression, the coefficient of determination is typically used to understand how much of the total variation is explained by the regression of y on x. In fact, RSQUARE = SSM/SST: the model sum of squares divided by the total sum of squares. These values come from the ANOVA table in the linear regression output from the software. We will discuss the ANOVA table at length in a later lesson.
Slide 24. Plasma Volume and Weight
$r^2 = (0.759)^2 = 0.576$
Recall that the correlation between plasma volume and weight is 0.759. If we square this value, we have the coefficient of determination, 0.576. This means 57.6% of the variation in plasma volume is explained by the least squares regression line of plasma volume on body weight. When RSQUARE is close to 1, the regression line (the y-hat values) represents the original data (the y values) well. When RSQUARE is close to 0, the regression line does not represent the original data well.
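The identity RSQUARE = SSM/SST = r squared can be checked numerically. The sketch below uses made-up data (not the study's values) and hypothetical variable names; it fits the least squares line, computes the model and total sums of squares, and compares their ratio with the squared correlation.

```python
from statistics import mean

# Hypothetical data for illustration only (not the values from the study).
weights = [55, 60, 65, 70, 75]       # x
volumes = [2.5, 2.9, 2.8, 3.2, 3.1]  # y

x_bar, y_bar = mean(weights), mean(volumes)
sxx = sum((x - x_bar) ** 2 for x in weights)
syy = sum((y - y_bar) ** 2 for y in volumes)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, volumes))

# Least squares line and its predicted values (y-hat).
slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in weights]

# SST: total variation of y about its mean.
sst = syy
# SSM: variation of the predicted values about the same mean.
ssm = sum((p - y_bar) ** 2 for p in predicted)

r = sxy / (sxx * syy) ** 0.5
r_square = ssm / sst

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, SSM/SST = {r_square:.3f}")
```

For these made-up points the two quantities agree, which is the same relationship the slide uses when it squares 0.759 to get 0.576.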
Slide 25. Simple Linear Regression: Residuals
When we draw the least squares regression line, the line of best fit, the line does not fall directly on all the data points. That is, the y-hat values differ from the actual y values in the data. We call these vertical distances residuals.
Slide 26. Residuals
Model: $\hat{y} = b_0 + b_1 x$
$\varepsilon_i = y_i - \hat{y}_i$, the difference between the observed and predicted value of the response for each value of x, is called the residual.
For each data point, $y - \hat{y}$ is the residual for that point. This value is often denoted epsilon sub i. We can calculate it at any x in our dataset by taking the observed y value minus the predicted value, y-hat, from the model. If the residual is positive, the data value is above the line. If the residual is negative, the data value is below the line. We will use residuals and residual plots in our next lesson to investigate how well the linear model fits the observed data.
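A short sketch of the residual calculation follows. The data are made-up values (not the study's data) and the variable names are hypothetical. It fits the least squares line, computes observed minus predicted for each point, labels each point as above or below the line, and shows that the residuals of a least squares line sum to zero up to rounding.

```python
from statistics import mean

# Hypothetical data for illustration only (not the values from the study).
weights = [55, 60, 65, 70, 75]       # x
volumes = [2.5, 2.9, 2.8, 3.2, 3.1]  # y

# Fit the least squares line.
x_bar, y_bar = mean(weights), mean(volumes)
sxx = sum((x - x_bar) ** 2 for x in weights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, volumes))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y-hat.
residuals = [y - (intercept + slope * x) for x, y in zip(weights, volumes)]

# Positive residual: point above the line. Negative residual: point below it.
positions = ["above" if e > 0 else "below" for e in residuals]

for x, e, pos in zip(weights, residuals, positions):
    print(f"x = {x}: residual = {e:+.3f} ({pos} the line)")
print(f"sum of residuals = {sum(residuals):.2e}")
```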
Slide 27. Estriol and Infant Birth-weight
Let's do another example. Obstetricians sometimes order tests for estriol levels from 24-hour urine specimens taken from pregnant women who are near term. The level of estriol (mg/24 hours) has been found to be positively related to the birth-weight (grams/100) of the infant. Thus, the test can provide indirect evidence of an abnormally small fetus. [Bernard Rosner, Fundamentals of Biostatistics, page 425]
Slide 28. Estriol and Infant Birth-weight
Here is the scatter-plot of birth-weight and estriol for 31 women and babies. We can see that there is a positive relationship between estriol level and birth-weight. The relationship is not perfect, but linear regression may still help with predictions. The slide also reports Pearson's correlation, r, for these data. Notice that birth-weight is in g/100. We will need this unit later for our calculations.
Slide 29. Estriol and Infant Birth-weight
$\hat{y} = 21.52 + 0.608x$
The values of the slope and intercept can be calculated using software, or by using the equations given in earlier slides. The prediction line shown on the scatter-plot is $\hat{y} = 21.52 + 0.608x$. This means that for every one-unit increase in estriol level, the birth-weight of the infant is on average 0.608 g/100 higher, about 60 grams.
Slide 30. Estriol and Infant Birth-weight
Using estriol level to predict infant birth-weight when the estriol level is 10 mg:
$\hat{y} = 21.52 + 0.608x = 21.52 + 0.608(10) = 27.6$ grams/100
Suppose we want to estimate the birth-weight of a baby whose mother has an estriol level of 10 mg. Before we begin, we verify that 10 mg is in the range of the original data. We can do this by looking at the scatter-plot of the data. We can then put 10 mg into the least squares regression equation for x and calculate an estimated weight of 27.6 g/100. This is 2,760 grams.
Slide 31. Estriol and Infant Birth-weight
Using estriol level to predict infant birth-weight when the estriol level is 30 mg:
Suppose we want to estimate the birth-weight of a baby whose mother has an estriol level of 30 mg. Before we begin, we check whether 30 mg is in the range of the original data by looking at the scatter-plot. We see that 30 mg is NOT in the range of the x data for our study. We should not use the regression line to estimate infant birth-weight!
Slide 32. Estriol and Infant Birth-weight
Now let's go in the reverse direction. Low birth-weight may be defined as infant birth-weight less than 2500 grams. For what estriol level is the predicted infant birth-weight equal to 2500 grams? First we must convert to the correct units: 2500 grams = 25 grams/100.
$25 = 21.52 + 0.608x$
$3.48 = 0.608x$
$x = 3.48 / 0.608 = 5.72$
If you set $25 = 21.52 + 0.608x$ and then solve for x, you will find the estriol level that predicts a low birth-weight baby. The value of x is 5.72 mg.
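The forward and reverse calculations for the estriol example can be sketched as below. The slope 0.608 appears in the worked equation; the intercept 21.52 is an inferred value, back-calculated from the stated prediction of 27.6 g/100 at 10 mg, so treat it as an assumption. The function names are hypothetical.

```python
SLOPE = 0.608       # g/100 per mg of estriol, from the worked equation
INTERCEPT = 21.52   # g/100; assumed, inferred from 27.6 = b0 + 0.608 * 10


def predict_birth_weight(estriol_mg):
    """Forward direction: predicted birth-weight in g/100 for an estriol level."""
    return INTERCEPT + SLOPE * estriol_mg


def estriol_for_birth_weight(weight_grams):
    """Reverse direction: estriol level whose predicted birth-weight is the target."""
    target = weight_grams / 100          # convert grams to the model's unit, g/100
    return (target - INTERCEPT) / SLOPE  # solve target = b0 + b1 * x for x


forward = predict_birth_weight(10)
threshold = estriol_for_birth_weight(2500)

print(f"predicted birth-weight at 10 mg: {forward:.1f} g/100 ({forward * 100:.0f} g)")
print(f"estriol level predicting 2500 g: {threshold:.2f} mg")
```

The unit conversion is done inside the reverse function so that a caller cannot forget it, which is the mistake the slide warns about.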
Slide 33. Assumptions
Linear regression requires that we make some assumptions. Conveniently, these assumptions follow the acronym LINE:
- L = linear relationship between y and x.
- I = independence between values of y (the value of one y does not affect the value of another y).
- N = normality of error around each value of y.
- E = equality of variance around y for each value of x.
Our next lesson will explore techniques to evaluate each of these assumptions.
Slide 34. Cautions
In addition to evaluating linear regression assumptions, we must take caution with the interpretation of our results.
- Predicted values should only be computed for X values that fall within the range of X values in the original data.
- Just like a correlation, a regression line only summarizes the linear relationship between X and Y. If the relationship is truly non-linear, then using the regression line can be misleading.
- Seeing a relationship (an association) between X and Y does not imply causation, that is, that changes in X will cause changes in Y.
Slide 35. Cautions
- In the regression context, a lurking variable is a third variable that may influence the relationship between X and Y.
- Outliers and skewed data can impact the regression line, just as they can impact the correlation. Either X or Y or both could have outliers or skewness.
- If including a particular data point changes the regression line compared to when it is not included, the data point is called influential.
Does that seem like many "cautions"? It is: as we learn methods that are more complicated, there will often be more limits on their use and interpretation.
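One way to see what "influential" means is to fit the line with and without a suspect point and compare the slopes. The sketch below uses made-up data (not from any study in these slides), with one extra point placed far from the rest in both X and Y; the function name is hypothetical.

```python
from statistics import mean


def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data for illustration only.
xs = [55, 60, 65, 70, 75]
ys = [2.5, 2.9, 2.8, 3.2, 3.1]

# One extra observation far from the others in both X and Y.
suspect_x, suspect_y = 90, 2.0

intercept_without, slope_without = least_squares_fit(xs, ys)
intercept_with, slope_with = least_squares_fit(xs + [suspect_x], ys + [suspect_y])

print(f"slope without the point: {slope_without:+.4f}")
print(f"slope with the point:    {slope_with:+.4f}")
```

In this made-up case the single added point is enough to flip the sign of the slope, so it would be called influential.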
Chapter 2. Looking at Data: Relationships. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides
Chapter 2 Looking at Data: Relationships Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 2 Looking at Data: Relationships 2.1 Scatterplots
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Chapter 3 2 Describing Relationships 3.1 Scatterplots and Correlation 3.2 Learning Targets After
Chapter 5. Regression
Chapter 5. Regression 1 Chapter 5. Regression Regression Lines Definition. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We
Correlation and Regression
Correlation and Regression 1 Association between Categorical Variables 2 2 Association between Quantitative Variables 3 3 Prediction 8 www.apsu.edu/jonesmatt 1 1 Association between Categorical Variables
Lesson Lesson Outline Outline
Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and
CHAPTER 2 AND 10: Least Squares Regression
CHAPTER 2 AND 0: Least Squares Regression In chapter 2 and 0 we will be looking at the relationship between two quantitative variables measured on the same individual. General Procedure:. Make a scatterplot
The statistical procedures used depend upon the kind of variables (categorical or quantitative):
Math 143 Correlation and Regression 1 Review: We are looking at methods to investigate two or more variables at once. bivariate: multivariate: The statistical procedures used depend upon the kind of variables
Chapter 5 Least Squares Regression
Chapter 5 Least Squares Regression A Royal Bengal tiger wandered out of a reserve forest. We tranquilized him and want to take him back to the forest. We need an idea of his weight, but have no scale!
Chapter 8. Linear Regression. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 8 Linear Regression Copyright 2012, 2008, 2005 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King
Example: A study is done to see how the number of beers that a student drinks predicts his/her blood alcohol content (BAC). Results of 16 students:
Scatterplots Example: A study is done to see how the number of beers that a student drinks predicts his/her blood alcohol content (BAC). Results of 16 students: How many variables do we have? Variables
Correlation and Regression
Correlation and Regression Cal State Northridge Ψ47 Ainsworth Major Points - Correlation Questions answered by correlation Scatterplots An example The correlation coefficient Other kinds of correlations
Often: interest centres on whether or not changes in x cause changes in y or on predicting y from
Fact: r does not depend on which variable you put on x axis and which on the y axis. Often: interest centres on whether or not changes in x cause changes in y or on predicting y from x. In this case: call
Correlation and Simple Regression
Correlation and Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 1 Correlation Categorical vs. Quantitative Variables: We have seen both categorical and quantitative variables during
Today s Outline. Some Terminology: Explanatory and Response Variable. Unit 3: Descriptive Statistics II 9/11/2013
Today s Outline Unit 3: Descriptive Statistics II Textbook Sections 3.5, 9.1, 9.2, & 9.4 1 Associations in Data Two Quantitative Variables Scatterplots Correlation Regression r 2 Residuals Caution One
Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares
Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects
2 Quantitative Variable Analysis
2 Quantitative Variable Analysis Introduction: We sometimes want to look for relationships between 2 quantitative variables, as we did for 2 categorical variables. Again, as we had in the categorical case,
Advanced High School Statistics. Preliminary Edition
Chapter 2 Summarizing Data After collecting data, the next stage in the investigative process is to summarize the data. Graphical displays allow us to visualize and better understand the important features
Unit 6 - Simple linear regression
Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable
Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 2 Simple Linear Regression
Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 2 Simple Linear Regression Hi, this is my second lecture in module one and on simple
Lecture 5: Correlation and Linear Regression
Lecture 5: Correlation and Linear Regression 3.5. (Pearson) correlation coefficient The correlation coefficient measures the strength of the linear relationship between two variables. The correlation is
Chapter 3 Review: Exploring Bivariate Data
Chapter 3 Review: Exploring Bivariate Data Directions: The questions or incomplete statements that follow are each followed by five suggested answers or completions. Choose the response that best answers
Section 3 Part 1. Relationships between two numerical variables
Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.
Chapter 3: Association, Correlation, and Regression
Chapter 3: Association, Correlation, and Regression Section 1: Association between Categorical Variables The response variable (or sometimes called dependent variable) is the outcome variable that depends
Chapter 10 Correlation and Regression
Chapter 10 Correlation and Regression 10-1 Review and Preview 10-2 Correlation 10-3 Regression 10-4 Prediction Intervals and Variation 10-5 Multiple Regression 10-6 Nonlinear Regression Section 10.1-1
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the
Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination
Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used
Ismor Fischer, 5/29/ POPULATION Random Variables X, Y: numerical Definition: Population Linear Correlation Coefficient of X, Y
Ismor Fischer, 5/29/2012 7.2-1 7.2 Linear Correlation and Regression POPULATION Random Variables X, Y: numerical Definition: Population Linear Correlation Coefficient of X, Y ρ = σ XY σ X σ Y FACT: 1 ρ
SECTION 5 REGRESSION AND CORRELATION
SECTION 5 REGRESSION AND CORRELATION 5.1 INTRODUCTION In this section we are concerned with relationships between variables. For example: How do the sales of a product depend on the price charged? How
Sociology 6Z03 Topic 6: Least-Squares Regression
Sociology 6Z03 Topic 6: Least-Squares Regression John Fo McMaster University Fall 2016 John Fo (McMaster University) Soc 6Z03: Least-Squares Regression Fall 2016 1 / 44 Outline: Least-Squares Regression
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Relationship of two variables
Relationship of two variables A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way. Scatter Plot (or Scatter Diagram) A plot
Chapter 8. Linear Models Chapter Outline 8.1 REVIEW OF RATE OF CHANGE 8.2 LINEAR REGRESSION MODELS
www.ck12.org Chapter 8. Linear Models Chapter Outline 8.1 REVIEW OF RATE OF CHANGE 8.2 LINEAR REGRESSION MODELS 8.1 Review of Rate
Simple Regression Theory I 2010 Samuel L. Baker
Simple Regression Theory I 2010 Samuel L. Baker Regression analysis lets you use data to explain and predict. A simple regression line drawn through data points In Assignment
Simple Linear Regression
Simple Linear Regression Simple linear regression models the relationship between an independent variable (x) and a dependent variable (y) using an equation that expresses y as a linear function of x,
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
Chapter 12 Relationships Between Quantitative Variables: Regression and Correlation
Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 12 Relationships Between Quantitative Variables: Regression and Correlation
Section 10-3 REGRESSION EQUATION 2/3/2017 REGRESSION EQUATION AND REGRESSION LINE. Regression
Section 10-3 Regression REGRESSION EQUATION The regression equation expresses a relationship between x (called the independent variable, predictor variable, or explanatory variable) and y (called the dependent
Chapter 4 Describing the Relation between Two variables. How can we explore the association between two quantitative variables?
Chapter 4 Describing the Relation between Two variables How can we explore the association between two quantitative variables? An association exists between two variables if a particular value of one variable
, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.
BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An
Correlation and simple linear regression S5
Basic Medical Statistics Course Correlation and simple linear regression S5 Patrycja Gradowska p.gradowska@nki.nl November 4, 2015 Introduction So far we have looked at the association between: Two
Chapter 14. Inference for Regression
Chapter 14 Inference for Regression Lesson 14-1, Part 1 Inference for Regression Review Least-Squares Regression A family doctor is interested in examining the relationship between a patient's age and total
STAT 3660 Introduction to Statistics. Chapter 8 Correlation
STAT 3660 Introduction to Statistics Chapter 8 Correlation Associations Between Variables 2 Many interesting examples of the use of statistics involve relationships between pairs of variables. Two variables
Lecture 10: Chapter 10
Lecture 10: Chapter 10 C C Moxley UAB Mathematics 31 October 16 10.1 Pairing Data In Chapter 9, we talked about pairing data in a natural way. In this Chapter, we will essentially be discussing whether
Study Resources For Algebra I. Unit 1C Analyzing Data Sets for Two Quantitative Variables
Study Resources For Algebra I Unit 1C Analyzing Data Sets for Two Quantitative Variables This unit explores linear functions as they apply to data analysis of scatter plots. Information compiled and written
Simple Linear Regression Chapter 11
Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related
Chapter 10 Correlation and Regression
Chapter 10 Correlation and Regression Section 10.1 Correlation 1. Introduction Independent variable (x) - also called an explanatory variable or a predictor variable, which is used to predict the
Correlation and regression
Applied Biostatistics Correlation and regression Martin Bland Professor of Health Statistics University of York http://www-users.york.ac.uk/~mb55/msc/ Correlation Example: Muscle strength and height in
Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2
Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables
STA Module 5 Regression and Correlation
STA 2023 Module 5 Regression and Correlation Learning Objectives Upon completing this module, you should be able to: 1. Define and apply the concepts related to linear equations with one independent variable.
Correlation and Regression Regression
Correlation and Regression Regression In studying relationships between two variables, collect the data and then construct a scatter plot. The purpose of the scatter plot, as indicated previously, is to
Chapter 4 Describing the Relation between Two Variables
Chapter 4 Describing the Relation between Two Variables 4.1 Scatter Diagrams and Correlation The response variable is the variable whose value can be explained by the value of the explanatory or predictor
Chapters 2 and 10: Least Squares Regression
Chapters 2 and 10: Least Squares Regression Learning goals for this chapter: Describe the form, direction, and strength of a scatterplot. Use SPSS output to find the following: least-squares regression
Chapter 27. Inferences for Regression. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 27 Inferences for Regression Copyright 2012, 2008, 2005 Pearson Education, Inc. An Example: Body Fat and Waist Size Our chapter example revolves around the relationship between % body fat and waist
Section 2.3: Regression
Section 2.3: Regression Idea: If there is a known linear relationship between two variables x and y (given by the correlation, r), we want to predict what y might be if we know x. The stronger the correlation,
STT200 Chapter 7-9 KM
Chapter 7 Scatterplots, Association, and Correlation The correlation, association, or relationship between two sets of numerical data is often discussed. It's believed that there is a relationship between smoking
CORRELATION & REGRESSION
CORRELATION & REGRESSION MULTIPLE CHOICE QUESTIONS In the following multiple-choice questions, select the best answer. 1. The correlation coefficient is used to determine: a. A specific value of the y-variable
What you need to know (Ch.1):
What you need to know (Ch.1): Data Population Census Sample Parameter Statistic Quantitative data o Discrete data o Continuous data Categorical data Four levels of measurement o The nominal level o The
3.6 COVARIANCE, CORRELATION, AND THE LEAST SQUARES REGRESSION LINE
3.6 COVARIANCE, CORRELATION, AND THE LEAST SQUARES REGRESSION LINE You have no doubt concluded by now that it can be a long and complicated business to find the least squares best-fitting line for a set
AP Statistics Regression Test PART I : TRUE OR FALSE.
AP Statistics Regression Test PART I : TRUE OR FALSE. 1. A high correlation between x and y proves that x causes y. 3. The correlation coefficient has the same sign as the slope of the least squares line
Chapter 7-10 Practice Test. Part I: Multiple Choice (Questions 1-10) - Circle the answer of your choice.
Chapter 7-10 Practice Test Name Part I: Multiple Choice (Questions 1-10) - Circle the answer of your choice. 1. Foresters use regression to predict the volume of timber in a tree using
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Background. research questions 25/09/2012. Application of Correlation and Regression Analyses in Pharmacy. Which Approach Is Appropriate When?
Application of Correlation and Regression Analyses in Pharmacy 25th September, 2012 Mohamed Izham MI, PhD Background Let's say we would like to know: the association between life satisfaction and stress
Regression Analysis: Basic Concepts
Regression Analysis: Basic Concepts Allin Cottrell The simple linear model Represents the dependent variable, y_i, as a linear function of one independent variable, x_i, subject to a random disturbance
Name Period AP Statistics Unit 10 Review
Name Period AP Statistics Unit 10 Review Use the following to answer questions 1-4: At what age do babies learn to crawl? Does it take longer for them to learn in the winter, when babies are often bundled
Example: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 10 Key Ideas Correlation, Correlation Coefficient (r), Section 10-1: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
1) In regression, an independent variable is sometimes called a response variable. 1)
Exam Name TRUE/FALSE. Write 'T' if the statement is true and 'F' if the statement is false. 1) In regression, an independent variable is sometimes called a response variable. 1) 2) One purpose of regression
Correlation and Regression
Chapter 13 Correlation and Regression 13.1 Study Suggestions Like Chapter 12, this chapter stresses interpretation over computation. For example, at no place in Chapter 13 are you required to compute the
Chapter 10. The relationship between TWO variables. Response and Explanatory Variables. Scatterplots. Example 1: Highway Signs 2/26/2009
Chapter 10 Section 10-2: Correlation Section 10-3: Regression Section 10-4: Variation and Prediction Intervals The relationship between TWO variables So far we have dealt with data obtained from one variable
Radnor High School Course Syllabus. Advanced Placement Statistics. Credits: 1.0 Grades: 11, 12 Prerequisite: Honors Algebra 2 or teacher rec.
Radnor High School Course Syllabus Advanced Placement Statistics 0470 Credits: 1.0 Grades: 11, 12 Weighted: Prerequisite: Honors Algebra 2 or teacher rec. Length: 1 year Format: Meets Daily I. Overall
AMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015
AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This
Multiple Regression. Cautions About Simple Linear Regression
Multiple Regression Cautions About Simple Linear Regression Correlation and regression are powerful tools for describing relationship between two variables, but be aware of their limitations Correlation
Correlation and simple linear regression S6
Basic Medical Statistics Course Correlation and simple linear regression S6 Patrycja Gradowska p.gradowska@nki.nl December 3, 2014 Introduction So far we have looked at the association between: Two
Algebra II Notes Curve Fitting with Linear Unit Scatter Plots, Lines of Regression and Residual Plots
Scatter Plots, Lines of Regression and Residual Plots Previously, you: graphed points on the coordinate plane; graphed linear equations; identified linear equations given their graphs; used functions to solve
2. C. About half the scores are lower than 155 and half are higher. So, the midpoint of the distribution is close to 155.
1. B. The tail points to the lower numbers 2. C. About half the scores are lower than 155 and half are higher. So, the midpoint of the distribution is close to 155. 3. A. Most of those scores are higher
where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.
Least Squares Introduction We have mentioned that one should not always conclude that, because two variables are correlated, one variable is causing the other to behave a certain way. However, sometimes
17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
AP Stats Chapter 3 Notes
3.1 Scatterplots & Correlation AP Stats Chapter 3 Notes Why do we study relationships between two variables? What is an explanatory variable? What is a response variable? What is a scatterplot? How do
Lesson 4 Part 1. Relationships between. two numerical variables. Correlation Coefficient. Relationship between two
Lesson 4 Part 1 Relationships between two numerical variables Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables Relationship
Simple Linear Regression Models
Simple Linear Regression Models 14-1 Overview 1. Definition of a Good Model 2. Estimation of Model parameters 3. Allocation of Variation 4. Standard deviation of Errors 5. Confidence Intervals for Regression
Notation and Equations for Regression Lecture 11/4. Notation:
Notation and Equations for Regression Lecture 11/4 Notation: m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
Correlation. Scatterplots of Paired Data:
10.2 - Correlation Objectives: 1. Determine if there is a linear correlation 2. Conduct a hypothesis test to determine correlation 3. Identify correlation errors Overview: In Chapter 9 we presented methods
Lab 11: Simple Linear Regression
Lab 11: Simple Linear Regression Objective: In this lab, you will examine relationships between two quantitative variables using a graphical tool called a scatterplot. You will interpret scatterplots in
Correlation. Subject Height Weight ) Correlations require at least 2 scores for each person
Correlation A statistical technique that describes the relationship between two or more variables Variables are usually observed in a natural environment, with no manipulation by the researcher Example:
a) perfect linear correlation (r = 1) b) no correlation (r = 0) c) positive correlation (0 < r < 1)
CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
Correlation and Regression 07/10/09
Correlation and Regression Eleisa Heron 07/10/09 Introduction Correlation and regression for quantitative variables - Correlation: assessing the association between quantitative variables - Simple linear
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
Relationships Between Two Variables: Scatterplots and Correlation
Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)
2.3 Least-Squares Regression
2.3 Least-Squares Regression The Least-Squares Regression Line Definition. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We
STT 200. Arnab Bhattacharjee. This lecture is based on Chapters 7, 8 & 9
STT 200 Arnab Bhattacharjee This lecture is based on Chapters 7, 8 & 9 Example If we consider purebred dogs, breeds that are large tend to have shorter life spans than those that are small. For example,
MATH 2560 C F03 Elementary Statistics I LECTURE 10: Cautions about Correlations and Regressions.
MATH 2560 C F03 Elementary Statistics I LECTURE 10: Cautions about Correlations and Regressions. 1 Outline residuals; lurking variables; outliers and influential observations; beware the lurking variables
9/16/09. Regression line. Regression. Slope intercept form review. Regression line. Regression line. Regression. y = mx + b.
Regression FPP 10 kind of Correlation coefficient a nice numerical summary of two quantitative variables It indicates direction and strength of association But does it quantify the association? It would
Lecture 8 Linear Models
Lecture 8 Linear Models Scatterplots, Association, and Correlation Two variables measured on the same cases are associated if knowing the value of one of the variables tells you something about the values
Chapter 3 Review. 1. Consider the scatterplot at the right. The correlation between X and Y is approximately A) 0.999. B) 0.8. C) 0.0. D) 0.7.
Chapter 3 Review Name 1. Consider the scatterplot at the right. The correlation between X and Y is approximately A) 0.999. B) 0.8. C) 0.0. D) 0.7. 2. Which of the following statements is true? A) The correlation
In last class, we learned statistical inference for population mean. Meaning. The population mean. The sample mean
RECALL: In last class, we learned statistical inference for population mean. Problem. Population notation and meaning: μ, the population mean; X̄, the sample mean; σ, the population standard deviation; s, the
BIOSTATS 640 Exam 1 Spring 2016
BIOSTATS 640 Intermediate Biostatistics Spring 2016 Examination 1 Units 1 and 2 Review of Introductory Biostatistics & Regression and Correlation Due: Wednesday March 2, 2016 Before you begin: This is
Simple Regression Analysis and Correlation
Chapter 12 Simple Regression Analysis and Correlation In this chapter, we will study the concept of correlation and how it can be used to estimate the relationship between two variables. We will also explore
Final Review Sheet. Mod 2: Distributions for Quantitative Data
Final Review Sheet Mod 2: Distributions for Quantitative Data Things to Remember from this Module: How to calculate and write sentences to explain the Mean, Median, Mode, IQR, Range, Standard Deviation,
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
