# Section 14 Simple Linear Regression: Introduction to Least Squares Regression

## Transcription

### Slide 1

There are several different measures of statistical association for understanding the quantitative relationship between two variables. If the researcher is working with numeric measures and supposes a linear relationship between the two variables, the appropriate measure of association is correlation. Additionally, if a particular set of assumptions is met, we can predict one of the two variables (an outcome) from the other (a predictor); this is called simple linear regression. A researcher may also wish to understand the relationships among more than two variables. This can be done with an extension of simple linear regression called multiple linear regression.

Recall that any statistical hypothesis test is a method for quantifying how much evidence is enough to declare a significant outcome in a research study. The hypothesis tested by a correlation, and also by simple linear regression, is whether two variables have a significant linear association with each other.

### Slide 2: Linear Regression Examples

- Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?
- What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? How fast did the number of people living on farms in the US decrease?
- What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship?
- Does a mother's estriol level have a linear relationship with the birth-weight of her baby? Can we predict a baby's birth-weight from the mother's estriol level?
- Does the age at which a child first begins talking predict a score of mental ability later in childhood?
- Is there a linear relationship between systolic blood pressure and age?

We learned that when we have measurements of two continuous variables, we can describe their relationship visually with a scatter-plot. If that relationship appears to be linear, we can also measure the strength and direction of the linear association. Finally, if certain assumptions are met, we may be able to predict the value of one measure from the other. The examples raise exactly these questions: Can we predict plasma volume in the blood from a person's body weight, and how well? If estriol level predicts birth-weight, can we anticipate a low birth-weight baby from estriol levels?

In all of these examples, we are investigating the relationship between two quantitative variables. We may begin this investigation with a scatter-plot followed by a correlation analysis. We will now take our investigation further by introducing simple linear regression.

### Slide 3: Simple Linear Regression

Simple linear regression (SLR) analysis is used to quantify the linear relationship between two quantitative variables. In this way it is like correlation, but regression goes further:

- It allows us to draw the line that best describes the linear relationship between X and Y.
- It allows us to predict the value of the outcome Y for a specified value of X.
- It allows us to quantify how much of a change in the value of Y is seen with a specified change in the value of X.

In other studies, the goal is to assess the relationships among a set of variables.

### Slide 4: Variable (X) and Variable (Y)

We can describe the relationship, or association, between two quantitative variables using:

- a scatter-plot,
- correlation, and
- simple linear regression.

Usually we identify one variable as the outcome of interest, what we have mostly been thinking of as a disease variable so far. This is often called the response, or dependent, variable. The other variable is the predictor of interest, what we have mostly been thinking of as an exposure variable so far. This is often called the explanatory, or independent, variable.

When each unit (person) has two measures, we usually call one x and the other y. If one variable can help predict the value of the other, we call it x; it is also called the predictor, explanatory, or independent variable. The other variable, y, is called the outcome, response, or dependent variable. Sometimes we cannot tell which is the predictor and which is the outcome, but simple linear regression requires that we pick one variable as the outcome.

### Slide 5: Wine Consumption and Heart Disease

Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?

Source: Moore and McCabe, *Introduction to the Practice of Statistics*, 4th Edition, W. H. Freeman & Co., New York.

Here are some data on wine consumption and heart disease deaths. Do these data suggest a linear relationship between these two variables?

### Slide 6: Wine Consumption and Heart Disease

The data suggest a negative trend. Can we estimate how much lower heart disease rates are for each extra liter of wine per person per year? How would we draw a line through these data to help us with this estimate? What can we say about the precision of this regression line? How much of the variability in heart disease deaths is explained by the regression line? Do you think these data come from a random sample? What assumptions are we making when using linear regression to make predictions? What confounders must we consider? These are all concepts we will investigate with linear regression.

### Slide 7: Population Living on Farms

What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? How fast did the number of people living on farms in the US decrease? Do these data suggest a linear relationship between these two variables?

### Slide 8: Population Living on Farms

How fast did the number of people living on farms in the US decrease? We can see a strong negative trend that appears fairly linear. How might we draw a line through these data? Is there a best way to draw this line?

### Slide 9: Plasma Volume and Body Weight

What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship?

*Table: body weight (kg) and plasma volume (liters) for each of eight subjects.*

Consider the association between body weight in kilograms and plasma volume in the blood in liters for eight randomly selected people. Do heavier people have more plasma? If so, how much more? Is this relationship linear?

### Slide 10: Simple Linear Regression

*Figure: scatter-plot of Y, plasma volume (liters), against X, body weight (kg), with three candidate lines; Pearson's correlation = 0.759.*

When we plot the data, we can see a positive relationship between body weight and plasma volume. The data do not fall perfectly on a line. The correlation, 0.759, helps us understand the strength of the linear relationship. We may want to draw a line through these data, giving us a mathematical model to estimate plasma volume from weight, but which is the best line: the white line, the green line, or the purple line? The technique of least squares regression will help us pick the line of best fit.

### Slide 11: How Do We Choose the Best Line?

The least squares regression line is the line that gets closest to all of the points. How do we measure closeness to more than one point? We choose the line that minimizes

$$\sum_{i=1}^{n} \left(y_i - \text{point on line}_i\right)^2$$

The line of best fit, the regression line, is the line that gets "closest" to all the data points. Closeness is measured as the vertical distance from the line to each data point. Specifically, the regression line is the one that minimizes the sum of all the squared vertical distances; hence estimating this line is called least squares, and the line is called the least squares regression line.
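The least squares criterion can be sketched in a few lines of Python. This is a minimal illustration on a small hypothetical dataset chosen for easy arithmetic (not the plasma volume data): it scores three candidate lines, in the spirit of the white, green, and purple lines on Slide 10, by the sum of squared vertical distances and keeps the smallest. The names `sse` and `candidates` are our own.

```python
def sse(xs, ys, a, b):
    """Sum of squared vertical distances from each point (x, y) to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

# Three candidate lines, each given as (intercept a, slope b).
candidates = {"A": (2.2, 0.6), "B": (1.0, 1.0), "C": (4.0, 0.0)}

scores = {name: sse(xs, ys, a, b) for name, (a, b) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"line {name}: sum of squared distances = {score:.2f}")
print("closest line:", best)
```

Comparing a handful of lines only finds the best among those lines (line A happens to be the least squares line for these five points). The least squares formulas find the best of all possible lines directly, without trial and error.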

### Slide 12: Simple Linear Regression

Visually, we are looking for the line that minimizes the sum of the squared vertical distances. For this line, the positive distances (points above the line) and the negative distances (points below the line) sum to zero. Finding this line by trial and error would be very difficult, but mathematical formulas let us determine this exact line.

### Slide 13: Equation of a Line

Before we move further with linear regression, let's review the equation of a line, that is, how we represent a line with a mathematical function. A line is defined by:

- the intercept $a$ (where the line crosses the vertical axis; the value of $y$ when $x = 0$), and
- the slope $b$ ("rise over run"; how much $y$ changes for each 1-unit change in $x$).

We write this as $y = a + bx$.

### Slide 14: Equation of a Line

We can see that the line crosses the vertical axis at the value $a$, when $x = 0$. We also see that for every one-unit increase in $x$, $y$ changes by the amount $b$.

### Slide 15: Equation of a Line: Statistical Notation

$$b_0 = \text{intercept}, \qquad b_1 = \text{slope}, \qquad \hat{y} = b_0 + b_1 x$$

In statistics, the symbol for the intercept is $b_0$ ("b-naught") and the symbol for the slope is $b_1$ ("b sub one"). We then write the line as $\hat{y} = b_0 + b_1 x$ ("y-hat equals b-naught plus b-one x"). We use $\hat{y}$ instead of $y$ to distinguish between the observed data value $y$ and our predicted value $\hat{y}$ for a given value of $x$.

### Slide 16: Equation of a Line: Statistical Notation

*Figure: the line $\hat{y} = b_0 + b_1 x$, crossing the vertical axis at $b_0$ with slope $b_1$.*

Using statistical notation, we have the same picture as before. Here the line crosses the vertical axis at the value $b_0$, when $x = 0$. We also see that for every one-unit increase in $x$, $\hat{y}$ changes by the amount $b_1$.

### Slide 17: Estimating Intercept and Slope

$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \qquad \hat{y} = b_0 + b_1 x$$

The least squares line minimizes the sum of squared vertical distances. This translates into two equations. The intercept $b_0$ equals $\bar{y}$ minus the slope times $\bar{x}$. The slope is the correlation times the ratio of the standard deviation of the observed $y$ values to the standard deviation of the observed $x$ values. In this way, we see that the slope and the correlation are related to one another; the correlation depends on both the slope and the precision. The equations are obtained using mathematics beyond this course. It is enough to understand that these are the equations that determine the least squares regression line, $\hat{y} = b_0 + b_1 x$.
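The two estimating equations can be written out directly. The sketch below uses only the Python standard library: it computes $r$, then the slope $b_1 = r\,s_y/s_x$, then the intercept $b_0 = \bar{y} - b_1\bar{x}$ (slope first, since the intercept needs it). The helper names and the five data points are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_coefficients(xs, ys):
    """Slope first (b1 = r * s_y / s_x), then intercept (b0 = y_bar - b1 * x_bar)."""
    b1 = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


xs = [1, 2, 3, 4, 5]  # hypothetical predictor values
ys = [2, 4, 5, 4, 5]  # hypothetical outcome values
b0, b1 = least_squares_coefficients(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For these points the slope comes out to 0.6 and the intercept to 2.2, up to floating-point rounding. In practice, as the slides note, software (for example R's `lm()`) does this calculation for you.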

### Slide 18: Slope and Correlation

*Figure: three scatter-plots of y against x whose fitted lines have $b_1 > 0$, $b_1 = 0$, and $b_1 < 0$.*

Notice that if the slope is positive, the correlation is positive. If the slope is zero, the correlation is zero. If the slope is negative, the correlation is negative.
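This pattern follows from the slope formula on Slide 17. Whenever the $x$ values are not all equal and the $y$ values are not all equal, both standard deviations are positive, so multiplying $r$ by $s_y/s_x$ cannot change its sign:

```latex
b_1 = r\,\frac{s_y}{s_x}, \qquad s_x > 0,\; s_y > 0
\;\Longrightarrow\;
\operatorname{sign}(b_1) = \operatorname{sign}(r),
\quad\text{and in particular } b_1 = 0 \iff r = 0.
```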

### Slide 19: Simple Linear Regression

*Figure: scatter-plot of Y, plasma volume (liters), against X, body weight (kg), with the least squares line; Pearson's correlation = 0.759.*

The data points are represented as the dots in our scatter-plot, but they don't fall exactly on the line. How do we compute (and write) the least squares line for these data? Once we have the line, for any $x$ value within the range of the $x$ values in our dataset, $\hat{y}$ is the point that falls exactly on the least squares line, not the observed data value $y$. Thus any such $x$ value can be plugged into the equation to calculate a predicted $y$ value, which we denote $\hat{y}$.

### Slide 20: Estimating Intercept and Slope

$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\bar{x} = \bar{y} - b_1(66.875)$$

Using the equations for estimating the slope and intercept of the least squares regression line, we obtain the slope $b_1$ and intercept $b_0$ for the plasma volume data, where $\bar{x} = 66.875$ kg is the mean body weight. We must calculate the slope first because the equation for the intercept uses the estimate of the slope. Generally, we do not do these calculations by hand; we use software to compute these values.

### Slide 21: Plasma Volume and Weight

Using R, we plot the least squares regression line $\hat{y} = b_0 + b_1 x$. The slope means that for every one-kilogram increase in body weight, plasma volume is on average $b_1$ liters higher. The intercept $b_0$ is the estimated plasma volume for a person who weighs zero kilograms. This estimate does not make biological sense, so the intercept for this model is used only to help us determine the line, not to make a prediction at $x = 0$. The only meaningful estimates are within the range of our $x$ values, that is, weights from about 55 to 75 kilograms.

### Slide 22: Plasma Volume and Weight

- Measuring plasma volume is very time consuming.
- Body weight is easy to measure: use the equation and body weight to estimate plasma volume.

We may want to estimate the plasma volume of a person outside this study from the person's weight. For example, on average, what plasma volume in liters would you expect for a 60-kilogram man? We put 60 kilograms in for $x$ and calculate the estimated value, about 2.7 liters; that is, $\hat{y} = b_0 + b_1 \times 60 \approx 2.7$. Be very careful to make estimates only within the range of the data used to estimate the regression line. Also, be aware that the measurement unit matters: we would not want to insert values in pounds when the regression line is based on kilograms.
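The two warnings on this slide (stay inside the observed range of $x$; use the same units as the fitted line) can be built into a prediction helper. This is a hypothetical Python sketch: the function names are ours, the 55 to 75 kg range comes from Slide 21, and the coefficients in the usage lines are made-up illustrative values, not the fitted plasma volume line.

```python
KG_PER_LB = 0.45359237  # one international pound, in kilograms (exact by definition)


def predict(x, b0, b1, x_min, x_max):
    """Return y-hat = b0 + b1 * x, refusing to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "do not use the regression line there"
        )
    return b0 + b1 * x


def predict_from_pounds(weight_lb, b0, b1, x_min_kg, x_max_kg):
    """The line was fitted in kilograms, so convert pounds before predicting."""
    return predict(weight_lb * KG_PER_LB, b0, b1, x_min_kg, x_max_kg)


# Illustrative (hypothetical) coefficients, NOT the fitted plasma volume line.
B0_DEMO, B1_DEMO = 0.1, 0.04

print(predict(60, B0_DEMO, B1_DEMO, 55, 75))  # 60 kg is inside the observed range
try:
    predict(90, B0_DEMO, B1_DEMO, 55, 75)     # 90 kg is outside 55-75 kg
except ValueError as err:
    print("refused:", err)
```

A side benefit of the range check: passing a weight in pounds such as 132 straight to `predict` is rejected, because 132 lies far outside 55 to 75.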

### Slide 23: RSQUARE

The square of the correlation ($r^2$ = RSQUARE) is the fraction of the variation in the values of $y$ that is explained by the least squares regression of $y$ on $x$:

$$r^2 = \frac{\text{variance of predicted values } \hat{y}}{\text{variance of observed values of } y} = \frac{\text{SSM}}{\text{SST}}$$

Recall Pearson's correlation: it measures the strength of the linear relationship between two quantitative variables. Another measure, called the coefficient of determination, is Pearson's correlation squared; for this reason, it is often denoted RSQUARE. In least squares regression, the coefficient of determination is typically used to understand how much of the total variation in $y$ is explained by the regression of $y$ on $x$. In fact, RSQUARE = SSM/SST: the model sum of squares divided by the total sum of squares. These values come from the ANOVA table in the linear regression output from the software. We will discuss the ANOVA table at length in a later lesson.
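The identity RSQUARE = SSM/SST = $r^2$ can be checked numerically. The sketch below (hypothetical data, standard-library Python, helper name ours) fits the least squares line, computes the model sum of squares $\text{SSM} = \sum (\hat{y}_i - \bar{y})^2$ and the total sum of squares $\text{SST} = \sum (y_i - \bar{y})^2$, and compares SSM/SST with the squared correlation.

```python
import math


def r_squared_two_ways(xs, ys):
    """Return (SSM / SST, r**2) for the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx               # least squares slope
    b0 = y_bar - b1 * x_bar      # least squares intercept
    y_hat = [b0 + b1 * x for x in xs]
    ssm = sum((yh - y_bar) ** 2 for yh in y_hat)  # model sum of squares
    sst = syy                                     # total sum of squares
    r = sxy / math.sqrt(sxx * syy)                # Pearson's correlation
    return ssm / sst, r ** 2


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
ratio, r_sq = r_squared_two_ways(xs, ys)
print(f"SSM/SST = {ratio:.3f}, r^2 = {r_sq:.3f}")
```

Both numbers come out to 0.6 for these points. The identity holds for a least squares line with an intercept; it is not a property of an arbitrary line drawn through the data.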

### Slide 24: Plasma Volume and Weight

$$r^2 = (0.759)^2 = 0.576$$

Recall that the correlation between plasma volume and weight is 0.759. If we square this value, we have the coefficient of determination, 0.576. This means 57.6% of the variation in plasma volume is explained by the least squares regression line of plasma volume on body weight. When RSQUARE is close to 1, the regression line (the $\hat{y}$ values) represents the original data (the $y$ values) well. When RSQUARE is close to 0, the regression line does not represent the original data well.

### Slide 25: Simple Linear Regression: Residuals

When we draw the least squares regression line, the line of best fit, it does not pass directly through all the data points. That is, the $\hat{y}$ values are different from the actual $y$ values in the data. We call these vertical distances residuals.

### Slide 26: Residuals

Model: $\hat{y} = b_0 + b_1 x$

$$\varepsilon_i = y_i - \hat{y}_i$$

The residual $\varepsilon_i$ ("epsilon sub $i$") is the difference between the observed and predicted values of the response for each data point. We can calculate it at any $x$ in our dataset by taking the observed $y$ value minus the predicted value $\hat{y}$ from the model. If the residual is positive, the data value is above the line; if the residual is negative, the data value is below the line. We will use residuals and residual plots in our next lesson to investigate how well the linear model fits the observed data.
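Residuals follow directly from the definition $\varepsilon_i = y_i - \hat{y}_i$. This Python sketch, on five hypothetical points, fits the least squares line, computes each residual, labels each point as above or below the line, and confirms that the residuals of a least squares line with an intercept sum to (numerically) zero. The function names are ours.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope via Sxy / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys, b0, b1):
    """Observed minus predicted: epsilon_i = y_i - y_hat_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

for x, y, e in zip(xs, ys, res):
    where = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x} y={y} residual={e:+.2f} ({where} the line)")
print("sum of residuals:", round(sum(res), 10))
```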

### Slide 27: Estriol and Infant Birth-weight

Let's do another example. Obstetricians sometimes order tests for estriol levels from 24-hour urine specimens taken from pregnant women who are near term. The level of estriol (mg/24 hours) has been found to be positively related to the birth-weight (grams/100) of the infant. Thus, the test can provide indirect evidence of an abnormally small fetus. [Bernard Rosner, *Fundamentals of Biostatistics*, page 425]

### Slide 28: Estriol and Infant Birth-weight

*Figure: scatter-plot of birth-weight (g/100) against estriol level (mg/24 hours) for 31 women and their babies, annotated with Pearson's correlation r.*

Here is the scatter-plot of birth-weight and estriol for 31 women and their babies. We can see that there is a positive relationship between estriol level and birth-weight. The relationship is not perfect, but linear regression may still help with predictions. Notice that birth-weight is in g/100; we will need this unit later for our calculations.

### Slide 29: Estriol and Infant Birth-weight

$$\hat{y} = 21.52 + 0.608x$$

The values of the slope and intercept can be calculated using software, or by using the equations given in earlier slides. The prediction line shown on the scatter-plot is $\hat{y} = 21.52 + 0.608x$. This means that for every one-unit increase in estriol level, the birth-weight of the infant is on average 0.608 g/100 higher, about 60 grams.

### Slide 30: Estriol and Infant Birth-weight

Using estriol level to predict infant birth-weight when the estriol level is 10 mg:

$$\hat{y} = 21.52 + 0.608(10) = 27.6 \text{ grams/100}$$

Suppose we want to estimate the birth-weight of a baby whose mother has an estriol level of 10 mg. Before we begin, we verify that 10 mg is in the range of the original data, which we can do by looking at the scatter-plot. We can then put 10 mg into the least squares regression equation for $x$ and calculate an estimated birth-weight of 27.6 g/100, that is, 2,760 grams.
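The 10 mg prediction and the unit conversion can be reproduced in a few lines of Python. This sketch assumes the fitted line is $\hat{y} = 21.52 + 0.608x$: the slope 0.608 appears on Slide 32, and an intercept of 21.52 is the value that makes the 10 mg prediction equal 27.6 g/100 (and the 2,500-gram reverse calculation equal 5.72 mg), so treat it as reconstructed rather than quoted. The function name is ours.

```python
B0 = 21.52  # intercept in g/100 (reconstructed: 27.6 - 0.608 * 10)
B1 = 0.608  # slope: g/100 of birth-weight per mg/24 h of estriol


def predict_birthweight_g100(estriol_mg):
    """Predicted birth-weight in grams/100 for a given estriol level (mg/24 h)."""
    return B0 + B1 * estriol_mg


y_hat = predict_birthweight_g100(10)  # 10 mg lies within the observed estriol range
grams = y_hat * 100                   # convert grams/100 back to grams
print(f"{y_hat:.1f} g/100 = {grams:,.0f} grams")
```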

### Slide 31: Estriol and Infant Birth-weight

Using estriol level to predict infant birth-weight when the estriol level is 30 mg: suppose we want to estimate the birth-weight of a baby whose mother has an estriol level of 30 mg. Before we begin, we check whether 30 mg is in the range of the original data by looking at the scatter-plot. We see that 30 mg is NOT in the range of the $x$ data for our study, so we should not use the regression line to estimate infant birth-weight!

### Slide 32: Estriol and Infant Birth-weight

Now let's go in the reverse direction. Low birth-weight may be defined as an infant birth-weight of less than 2,500 grams. For what estriol level is the predicted infant birth-weight equal to 2,500 grams? First we must convert to the correct units: 2,500 grams = 25 grams/100. If we set $25 = 21.52 + 0.608x$ and solve for $x$, we find the estriol level at which the predicted birth-weight reaches the low birth-weight threshold:

$$25 = 21.52 + 0.608x \;\Rightarrow\; 0.608x = 3.48 \;\Rightarrow\; x = \frac{3.48}{0.608} \approx 5.72 \text{ mg}$$
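Going in the reverse direction means solving $25 = b_0 + b_1 x$ for $x$, that is, $x = (25 - b_0)/b_1$. A minimal Python sketch, assuming the reconstructed line $\hat{y} = 21.52 + 0.608x$ (slope from this slide; intercept implied by the 27.6 g/100 prediction at 10 mg). The function name is hypothetical.

```python
def estriol_for_birthweight(target_g100, b0=21.52, b1=0.608):
    """Invert y-hat = b0 + b1 * x: the estriol level whose prediction equals the target."""
    return (target_g100 - b0) / b1


threshold_g100 = 2500 / 100  # 2,500 grams expressed in grams/100
x = estriol_for_birthweight(threshold_g100)
print(round(x, 2))  # 5.72
```

Because the slope is positive, estriol levels below about 5.72 mg/24 h give predicted birth-weights below 2,500 grams. As with any prediction from the line, this is only meaningful if 5.72 mg lies within the observed range of estriol values.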

### Slide 33: Assumptions

Linear regression requires that we make some assumptions. Conveniently, these assumptions follow the acronym LINE:

- L = linear relationship between $y$ and $x$.
- I = independence between values of $y$ (the value of one $y$ does not affect the value of another $y$).
- N = normality of the errors: the values of $y$ are normally distributed around the line at each value of $x$.
- E = equality of variance of $y$ around the line for each value of $x$.

Our next lesson will explore techniques to evaluate each of these assumptions.

### Slide 34: Cautions

In addition to evaluating the linear regression assumptions, we must take care in interpreting our results.

- Predicted values should only be computed for X values that fall within the range of X values in the original data.
- Just like a correlation, a regression line only summarizes the linear relationship between X and Y. If the relationship is truly non-linear, then using the regression line can be misleading.
- Seeing a relationship (an association) between X and Y does not imply causation, that is, that changes in X will cause changes in Y.

### Slide 35: Cautions

- In the regression context, a lurking variable is a third variable that may influence the relationship between X and Y.
- Outliers and skewed data can affect the regression line, just as they can affect the correlation. Either X or Y or both could have outliers or skewness.
- If including a particular data point changes the regression line compared with when it is not included, the data point is called influential.

Does that seem like a lot of cautions? It is: as we learn more complicated methods, there will often be more limits on their use and interpretation.
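Influence can be seen by fitting the line with and without a single point. In this hypothetical Python sketch, five points have a least squares slope of 0.6; adding one point far out in x (x = 12, y = 1, chosen purely for illustration) flips the slope to about -0.21, so that point is influential. The helper name is ours.

```python
def slope(xs, ys):
    """Least squares slope b1 = Sxy / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

slope_without = slope(xs, ys)
slope_with = slope(xs + [12], ys + [1])  # add one hypothetical point far out in x

print(f"slope without the extra point: {slope_without:.3f}")
print(f"slope with the extra point:    {slope_with:.3f}")
```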


### Mind on Statistics. Chapter 3

Mind on Statistics Chapter 3 Section 3.1 1. Which one of the following is not appropriate for studying the relationship between two quantitative variables? A. Scatterplot B. Bar chart C. Correlation D.

### Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!

Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare

### Practice 3 SPSS. Partially based on Notes from the University of Reading:

Practice 3 SPSS Partially based on Notes from the University of Reading: http://www.reading.ac.uk Simple Linear Regression A simple linear regression model is fitted when you want to investigate whether

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Algebra I: Lesson 5-4 (5074) SAS Curriculum Pathways

Two-Variable Quantitative Data: Lesson Summary with Examples Bivariate data involves two quantitative variables and deals with relationships between those variables. By plotting bivariate data as ordered

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

### Lecture 18 Linear Regression

Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation - used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Describing Relationships between Two Variables

Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took

### UNDERSTANDING MULTIPLE REGRESSION

UNDERSTANDING Multiple regression analysis (MRA) is any of several related statistical methods for evaluating the effects of more than one independent (or predictor) variable on a dependent (or outcome)

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Homework 8 Solutions

Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### AP STATISTICS 2006 SCORING GUIDELINES. Question 2

2006 SCING GUIDELINES Question 2 Intent of Question The primary goal of this question is to assess a student s ability to identify the estimated regression line and to identify and interpret important

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### LEARNING OBJECTIVES SCALES OF MEASUREMENT: A REVIEW SCALES OF MEASUREMENT: A REVIEW DESCRIBING RESULTS DESCRIBING RESULTS 8/14/2016

UNDERSTANDING RESEARCH RESULTS: DESCRIPTION AND CORRELATION LEARNING OBJECTIVES Contrast three ways of describing results: Comparing group percentages Correlating scores Comparing group means Describe

### STAT 350 Practice Final Exam Solution (Spring 2015)

PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

### Simple Linear Regression

Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating

### Statistics II Final Exam - January Use the University stationery to give your answers to the following questions.

Statistics II Final Exam - January 2012 Use the University stationery to give your answers to the following questions. Do not forget to write down your name and class group in each page. Indicate clearly

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Notes 5: More on regression and residuals ECO 231W - Undergraduate Econometrics

Notes 5: More on regression and residuals ECO 231W - Undergraduate Econometrics Prof. Carolina Caetano 1 Regression Method Let s review the method to calculate the regression line: 1. Find the point of

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### 7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Chapter 9 Descriptive Statistics for Bivariate Data

9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### EXPERIMENT 6: HERITABILITY AND REGRESSION

BIO 184 Laboratory Manual Page 74 EXPERIMENT 6: HERITABILITY AND REGRESSION DAY ONE: INTRODUCTION TO HERITABILITY AND REGRESSION OBJECTIVES: Today you will be learning about some of the basic ideas and

### Prentice Hall Mathematics: Algebra 1 2007 Correlated to: Michigan Merit Curriculum for Algebra 1

STRAND 1: QUANTITATIVE LITERACY AND LOGIC STANDARD L1: REASONING ABOUT NUMBERS, SYSTEMS, AND QUANTITATIVE SITUATIONS Based on their knowledge of the properties of arithmetic, students understand and reason

### Infinite Algebra 1 supports the teaching of the Common Core State Standards listed below.

Infinite Algebra 1 Kuta Software LLC Common Core Alignment Software version 2.05 Last revised July 2015 Infinite Algebra 1 supports the teaching of the Common Core State Standards listed below. High School

### Chapter 12 : Linear Correlation and Linear Regression

Number of Faculty Chapter 12 : Linear Correlation and Linear Regression Determining whether a linear relationship exists between two quantitative variables, and modeling the relationship with a line, if

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

### Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

### PASS Sample Size Software. Linear Regression

Chapter 855 Introduction Linear regression is a commonly used procedure in statistical analysis. One of the main objectives in linear regression analysis is to test hypotheses about the slope (sometimes

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

### SELF-TEST: SIMPLE REGRESSION

ECO 22000 McRAE SELF-TEST: SIMPLE REGRESSION Note: Those questions indicated with an (N) are unlikely to appear in this form on an in-class examination, but you should be able to describe the procedures

### 4. Describing Bivariate Data

4. Describing Bivariate Data A. Introduction to Bivariate Data B. Values of the Pearson Correlation C. Properties of Pearson's r D. Computing Pearson's r E. Variance Sum Law II F. Exercises A dataset with

### Means, standard deviations and. and standard errors

CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

### Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

### Simple Regression and Correlation

Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas

### The Simple Linear Regression Model: Specification and Estimation

Chapter 3 The Simple Linear Regression Model: Specification and Estimation 3.1 An Economic Model Suppose that we are interested in studying the relationship between household income and expenditure on

### ST 311 Evening Problem Session Solutions Week 11

1. p. 175, Question 32 (Modules 10.1-10.4) [Learning Objectives J1, J3, J9, J11-14, J17] Since 1980, average mortgage rates have fluctuated from a low of under 6% to a high of over 14%. Is there a relationship

### Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

Section 5.4 The Quadratic Formula 481 5.4 The Quadratic Formula Consider the general quadratic function f(x) = ax + bx + c. In the previous section, we learned that we can find the zeros of this function