Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1: Section 14, Simple Linear Regression: Introduction to Least Squares Regression

There are several measures of statistical association for describing the quantitative relationship between two variables. If the researcher is working with numeric measures and supposes a linear relationship between the two variables, the appropriate measure of association is correlation. If, in addition, a particular set of assumptions is met, we can predict one of the two variables (an outcome) from the other variable (a predictor); this is called simple linear regression. A researcher may also wish to understand the relationships among more than two variables. This can be done with an extension of simple linear regression called multiple linear regression.

Recall that any statistical hypothesis test is a method for quantifying how much evidence is enough to declare a significant outcome in a research study. The hypothesis tested by a correlation, and also by simple linear regression, is whether two variables have a significant linear association with each other.

Slide 2: Linear Regression: Examples

When we have measures of two continuous variables, we can describe their relationship visually with a scatter-plot. If that relationship appears to be linear, we can measure the strength and direction of the linear association. Finally, if certain assumptions are met, we may be able to predict the value of one measure from the other. For example:

- Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?
- What is the relationship between the number of people living on farms and the passing of time? In other words, how fast did the number of people living on farms in the US decrease from 1935 to 1990?
- What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship? Can we predict plasma volume from a person's body weight? How well?
- Does the estriol level of a mother have a linear relationship with the birth-weight of her baby? Can we predict the birth-weight of a baby from the mother's estriol level? If so, can we anticipate a low birth-weight baby from estriol levels?
- Does the age at which a child first begins talking predict a score of mental ability later in childhood?
- Is there a linear relationship between systolic blood pressure and age?

In all of these examples, we are investigating the relationship between two quantitative variables. We may begin this investigation with a scatter-plot followed by a correlation analysis. We will now take our investigation further by introducing simple linear regression.

Slide 3: Simple Linear Regression

Simple linear regression (SLR) analysis is used to quantify the linear relationship between two quantitative variables. In this way it is like correlation, but regression goes further:

- It allows us to draw the line that best describes the linear relationship between X and Y.
- It allows us to predict the value of the outcome Y for a specified value of X.
- It allows us to quantify how much change in the value of Y is seen with a specified change in the value of X.

In other studies, the goal is to assess the relationships among a set of variables.

Slide 4: Variable (X) and Variable (Y)

We can describe the relationship or association between two quantitative variables using a scatter-plot, correlation, or simple linear regression.

Recall that we usually identify one variable as the outcome of interest, what we have mostly been thinking of as a disease variable so far. This is often called the response, or dependent, variable. The other variable is the predictor of interest, what we have mostly been thinking of as an exposure variable so far. This is often called the explanatory, or independent, variable.

When each unit (person) has two measures, we usually call one x and the other y. If one variable can help predict the value of the other, we call that variable x; it is the predictor, explanatory, or independent variable. The other variable, y, is the outcome, response, or dependent variable. Sometimes we cannot tell which is the predictor and which is the outcome. Simple linear regression requires that we pick one variable as the outcome.

Slide 5: Wine Consumption and Heart Disease

Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?

Here are some data on wine consumption and heart disease deaths. Do these data suggest a linear relationship between the two variables?

Source: Moore and McCabe, Introduction to the Practice of Statistics, 4th Edition, W. H. Freeman & Co., New York.

Slide 6: Wine Consumption and Heart Disease

The data suggest a negative trend. Can we estimate how much lower heart disease rates are for each extra liter of wine per person per year? How would we draw a line through these data to help us with this estimate? What can we say about the precision of this regression line? How much of the variability in heart disease deaths is explained by the regression line? Do you think these data come from a random sample? What assumptions are we making when using linear regression to make predictions? What confounders must we consider? These are all concepts we will investigate with linear regression.

Slide 7: Population Living on Farms

What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? How fast did the number of people living on farms in the US decrease? Do these data suggest a linear relationship between the two variables?

Slide 8: Population Living on Farms

How fast did the number of people living on farms in the US decrease? We can see a strong negative trend that appears fairly linear. How might we draw a line through these data? Is there a best way to draw this line?

Slide 9: Plasma Volume and Body Weight

[Table: Subject, Body weight (kg), Plasma volume (l)]

Consider the association between body weight in kilograms and plasma volume in the blood in liters for eight randomly selected people. What is the relationship between these two measures? Do heavier people have more plasma? If so, how much more? Is the relationship linear?

Slide 10: Simple Linear Regression

[Scatter-plot: X, body weight (kg); Y, plasma volume (liters); Pearson's correlation = 0.759]

When we plot the data, we can see a positive relationship between body weight and plasma volume. The data do not fall perfectly on a line. Pearson's correlation, which helps us understand the strength of the linear relationship, is 0.759. We may want to draw a line through these data, giving us a mathematical model to estimate plasma volume from weight, but which is the best line? The white line, the green line, or the purple line? The technique of least squares regression will help us pick the line of best fit.

Slide 11: How Do We Choose the Best Line?

The line of best fit, the regression line, is the line that gets "closest" to all the data points. How do we measure closeness to more than one point? Closeness is measured as the vertical distance from the line to each data point. Specifically, the regression line is the one that minimizes the sum of all the squared vertical distances:

$\text{minimize} \sum_{i=1}^{n} (y_i - \text{point\_on\_line}_i)^2$

Hence estimation of this line is called least squares, and the line is called the least squares regression line.
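To make the criterion concrete, here is a minimal sketch that scores three candidate lines by their sum of squared vertical distances. The five data points and the candidate lines are hypothetical illustration values, not data from the slides; the third candidate is the least squares line for these particular points.

```python
# Hypothetical illustration data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_distances(intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Three candidate lines, each given as (intercept, slope).
candidates = {
    "flat line": (6.0, 0.0),
    "too steep": (-1.5, 2.5),
    "least squares": (0.05, 1.99),
}

scores = {name: sum_squared_distances(a, b) for name, (a, b) in candidates.items()}
for name, score in scores.items():
    print(f"{name}: sum of squared distances = {score:.3f}")

# The best line is the candidate with the smallest score.
best = min(scores, key=scores.get)
print("closest line:", best)
```

Any other intercept and slope you try for these points should give a larger score than the least squares candidate.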

Slide 12: Simple Linear Regression

Visually, we look for the line that minimizes the sum of the squared vertical distances. For this line, the positive distances (points above the line) and the negative distances (points below the line) sum to zero. This would be very difficult to achieve by trial and error. Fortunately, mathematical formulas let us determine this exact line.

Slide 13: Equation of a Line

Before we move further with linear regression, let's review the equation of a line, that is, how we represent a line with a mathematical function. A line is defined by:

- the intercept a (where the line crosses the vertical axis, the value of y when x = 0), and
- the slope b ("rise over run," how much y changes for each 1 unit change in x).

We write this as $y = a + bx$.

Slide 14: Equation of a Line

We can see the line crosses the vertical axis at the value a, when x = 0. We also see that for every one unit increase in x, y changes by the amount b.

Slide 15: Equation of a Line: Statistical Notation

In statistics, the symbol for the intercept is $b_0$ (b-naught) and the symbol for the slope is $b_1$ (b-sub-one). We then write the line as:

$\hat{y} = b_0 + b_1 x$

We use $\hat{y}$ (y-hat) instead of y to differentiate between the real data value y and our predicted value $\hat{y}$ for a given value of x.

Slide 16: Equation of a Line: Statistical Notation

Using statistical notation, $\hat{y} = b_0 + b_1 x$, we have the same picture as before. Here the line crosses the vertical axis at the value $b_0$, when x = 0. We also see that for every one unit increase in x, $\hat{y}$ changes by the amount $b_1$.

Slide 17: Estimating Intercept and Slope

The least squares line minimizes the sum of squared vertical distances. This translates into:

$b_1 = r \frac{s_y}{s_x}$

$b_0 = \bar{y} - b_1 \bar{x}$

The intercept $b_0$ equals y-bar minus the slope times x-bar. The slope $b_1$ is the correlation times the ratio of the standard deviation of the observed y values to the standard deviation of the observed x values. In this way, we see that the slope and the correlation are related to one another. The correlation depends on both the slope and the precision. The equations are obtained using mathematics beyond this course. It is enough to understand that these are the equations that determine the least squares regression line, $\hat{y} = b_0 + b_1 x$.
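As a rough illustration of these two formulas, the sketch below computes the slope from the correlation and the two standard deviations, then the intercept from the means. The data are hypothetical values chosen only for illustration, and the variable names are ours, not the slides'.

```python
import math
from statistics import mean, stdev

# Hypothetical illustration data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Pearson's correlation r from the sums of squares and cross-products.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

# Slope first, because the intercept formula uses the slope.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

print(f"r = {r:.4f}, slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
```

Because the standard deviations are always positive, the slope computed this way always has the same sign as the correlation.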

Slide 18: Slope and Correlation

[Figure: three panels showing lines with $b_1 > 0$, $b_1 = 0$, and $b_1 < 0$]

Notice that if the slope is positive, then the correlation is positive. If the slope is zero, then the correlation is zero. If the slope is negative, then the correlation is negative.

Slide 19: Simple Linear Regression

[Scatter-plot: X, body weight (kg); Y, plasma volume (liters); Pearson's correlation = 0.759]

The data points are the dots in our scatter-plot, and they do not fall exactly on the line. How do we compute (and write) the least squares line for these data? Once we have the line, for any x value within the range of the values in our dataset, y-hat is the point that falls exactly on the least squares line; it is not the data value for y. Thus every such x value can be plugged into the equation to calculate a predicted y value, which we denote y-hat.

Slide 20: Estimating Intercept and Slope

$b_1 = r \frac{s_y}{s_x}$, then $b_0 = \bar{y} - b_1 \bar{x}$, with $\bar{x} = 66.875$ kg

Applying the equations for the slope and intercept to the plasma volume data gives the least squares regression line $\hat{y} = b_0 + b_1 x$. We must calculate the slope first, because the equation for the intercept uses the estimate of the slope. Generally, we do not do these calculations by hand; we use software to compute these values.

Slide 21: Plasma Volume and Weight

Using R, we plot the least squares regression line. The slope is the average increase in plasma volume, in liters, for each one-kilogram increase in body weight. The intercept is the estimated plasma volume for a person who weighs zero kilograms, which does not make biological sense. The intercept in this model merely helps determine the line; it is not a prediction at x = 0. The only meaningful estimates are within the range of our x values, that is, weights from about 55 to 75 kilograms.

Slide 22: Plasma Volume and Weight

Measuring plasma volume is very time consuming, while body weight is easy to measure. We may therefore want to use the equation and a person's body weight to estimate the plasma volume of a person outside this study. For example, what plasma volume, in liters, would you expect on average for a 60 kilogram man? We substitute 60 kilograms for x:

$\hat{y} = b_0 + b_1 (60) \approx 2.7$ liters

Be very careful to make estimates only within the range of the data that was used to estimate the regression line. Also be aware that the measurement unit is meaningful: we would not want to insert values in pounds when the regression line is based on kilograms.

Slide 23: RSQUARE

Recall that Pearson's correlation measures the strength of the linear relationship between two quantitative variables. There is another measure, called the coefficient of determination. Its value is Pearson's correlation squared, and for this reason it is often denoted RSQUARE. It is the fraction of the variation in the values of y that is explained by the least squares regression of y on x:

$r^2 = \frac{\text{variance of predicted values } \hat{y}}{\text{variance of observed values } y} = \frac{SSM}{SST}$

Here SSM is the sum of squares of the model and SST is the total sum of squares. Those values come from the ANOVA table in the linear regression output from the software. We will discuss the ANOVA table at length in a later lesson.
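The identity RSQUARE = SSM/SST can be checked numerically. The sketch below uses hypothetical data (not from the slides) and computes the slope as the ratio of the cross-product sum to the x sum of squares, which is algebraically the same as the correlation times the ratio of standard deviations.

```python
from statistics import mean

# Hypothetical illustration data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

# Least squares slope and intercept (equivalent to b1 = r * s_y / s_x).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

# SSM: variation of the predicted values around the mean of y.
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)
# SST: total variation of the observed values around the mean of y.
sst = sum((y - y_bar) ** 2 for y in ys)

r_square = ssm / sst
r = sxy / (sxx * sst) ** 0.5  # Pearson's correlation

print(f"SSM/SST = {r_square:.4f}, r squared = {r ** 2:.4f}")
```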

Slide 24: Plasma Volume and Weight

Recall that the correlation between plasma volume and weight is 0.759. If we square this value, we have the coefficient of determination:

$r^2 = (0.759)^2 = 0.576$

This means 57.6% of the variation in plasma volume is explained by the least squares regression line of plasma volume on body weight. When RSQUARE is close to 1, the regression line (the y-hat values) represents the original data (the y values) well. When RSQUARE is close to 0, the regression line does not represent the original data well.

Slide 25: Simple Linear Regression: Residuals

When we draw the least squares regression line, the line of best fit, the line does not pass through all the data points. That is, the y-hat values differ from the actual y values in the data. We call these vertical distances residuals.

Slide 26: Residuals

Model: $\hat{y} = b_0 + b_1 x$

$\varepsilon_i = y_i - \hat{y}_i$

The residual for a data point is y minus y-hat: the difference between the observed and predicted value of the response at that value of x. It is often denoted epsilon sub i. We can calculate the residual at any x in our dataset by taking the observed y value minus the predicted value, y-hat, from the model. If the residual is positive, the data value is above the line. If the residual is negative, the data value is below the line. We will use residuals and residual plots in our next lesson to investigate how well the linear model fits the observed data.
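A short sketch of the residual calculation follows, again on hypothetical data rather than the data from the slides. It also illustrates the earlier point that, for the least squares line, the positive and negative residuals sum to zero.

```python
from statistics import mean

# Hypothetical illustration data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit the least squares line.
x_bar, y_bar = mean(xs), mean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y-hat, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    position = "above" if e > 0 else "below"
    print(f"x = {x}: residual = {e:+.2f} (point is {position} the line)")

print(f"sum of residuals = {sum(residuals):.10f}")
```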

Slide 27: Estriol and Infant Birth-weight

Let's do another example. Obstetricians sometimes order tests for estriol levels from 24-hour urine specimens taken from pregnant women who are near term. The level of estriol (mg/24 hours) has been found to be positively related to the birth-weight (grams/100) of the infant. Thus, the test can provide indirect evidence of an abnormally small fetus. [Bernard Rosner, Fundamentals of Biostatistics, page 425]

Slide 28: Estriol and Infant Birth-weight

Here is the scatter-plot of birth-weight and estriol for 31 women and babies, with Pearson's correlation, r, reported on the slide. We can see that there is a positive relationship between estriol level and birth-weight. The relationship is not perfect, but linear regression may still help with predictions. Notice that birth-weight is in g/100. We will need this unit later for our calculations.

Slide 29: Estriol and Infant Birth-weight

The values of the slope and intercept can be calculated using software, or by using the equations given in earlier slides. The prediction line shown on the scatter-plot is:

$\hat{y} = 21.52 + 0.608x$

This means that for every one unit increase in estriol level, the birth-weight of the infant is on average 0.608 g/100 higher, about 60 grams.

Slide 30: Estriol and Infant Birth-weight

Suppose we want to use estriol level to estimate the birth-weight of a baby whose mother has an estriol level of 10 mg. Before we begin, we verify that 10 mg is in the range of the original data. We can do this by looking at the scatter-plot of the data. We can then put 10 mg into the least squares regression equation for x:

$\hat{y} = 21.52 + 0.608(10) = 27.6$ grams/100

The estimated birth-weight is 27.6 g/100, which is 2,760 grams.

Slide 31: Estriol and Infant Birth-weight

Suppose we want to use estriol level to estimate the birth-weight of a baby whose mother has an estriol level of 30 mg. Before we begin, we check whether 30 mg is in the range of the original data by looking at the scatter-plot. We see that 30 mg is NOT in the range of the x data for our study. We should not use the regression line to estimate infant birth-weight!
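One way to build this check into a calculation is to refuse to predict outside the observed range of x. The sketch below is only an illustration: the function name, the observed x values, and the line's coefficients are all hypothetical, not taken from the estriol study.

```python
def predict_within_range(x_new, observed_xs, intercept, slope):
    """Return the predicted y-hat, but only for x inside the observed range."""
    low, high = min(observed_xs), max(observed_xs)
    if not low <= x_new <= high:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{low}, {high}]"
        )
    return intercept + slope * x_new


# Hypothetical observed x values and fitted line (not from the slides).
observed_xs = [1.0, 2.0, 3.0, 4.0, 5.0]
intercept, slope = 0.05, 1.99

print(predict_within_range(3.0, observed_xs, intercept, slope))

try:
    predict_within_range(30.0, observed_xs, intercept, slope)
except ValueError as err:
    print("no prediction made:", err)
```

Raising an error, rather than returning a number with a warning, is a design choice here; it makes an out-of-range request impossible to overlook.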

Slide 32: Estriol and Infant Birth-weight

Now let's go in the reverse direction. Low birth-weight may be defined as infant birth-weight less than 2500 grams. For what estriol level is the predicted infant birth-weight equal to 2500 grams? First we must convert to the correct units: 2500 grams = 25 grams/100. Then we set y-hat equal to 25 and solve for x:

$25 = 21.52 + 0.608x$

$3.48 = 0.608x$

$x = 3.48 / 0.608 = 5.72$

The estriol level that predicts a low birth-weight baby is 5.72 mg.
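The forward and reverse calculations can be sketched in a few lines. The slope 0.608 appears in the worked solution; the intercept 21.52 is an assumption, inferred from the worked prediction of 27.6 g/100 at an estriol level of 10 mg (27.6 - 0.608 * 10 = 21.52). The function names are ours.

```python
SLOPE = 0.608      # g/100 of birth-weight per mg of estriol (from the slides)
INTERCEPT = 21.52  # assumption: inferred from 27.6 = b0 + 0.608 * 10


def predict_birth_weight(estriol_mg):
    """Predicted birth-weight in g/100 for a given estriol level in mg."""
    return INTERCEPT + SLOPE * estriol_mg


def estriol_for_birth_weight(birth_weight_g100):
    """Estriol level in mg at which the predicted birth-weight equals the target."""
    return (birth_weight_g100 - INTERCEPT) / SLOPE


# Forward direction: estriol level of 10 mg.
weight = predict_birth_weight(10)
print(f"predicted birth-weight: {weight:.1f} g/100 = {weight * 100:.0f} grams")

# Reverse direction: 2500 grams must first be converted to 25 g/100.
threshold = estriol_for_birth_weight(2500 / 100)
print(f"estriol level for a predicted 2500 grams: {threshold:.2f} mg")
```

Note that both functions work in g/100; forgetting the unit conversion is the easiest way to get a meaningless answer.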

Slide 33: Assumptions

Linear regression requires that we make some assumptions. Conveniently, these assumptions follow the acronym LINE:

- L = linear relationship between y and x.
- I = independence between values of y. (The value of one y does not affect the value of another y.)
- N = normality of error around each value of y.
- E = equality of variance around y for each value of x.

Our next lesson will explore techniques to evaluate each of these assumptions.

Slide 34: Cautions

In addition to evaluating the linear regression assumptions, we must take caution with the interpretation of our results:

- Predicted values should only be computed for X values that fall within the range of X values in the original data.
- Just like a correlation, a regression line only summarizes the linear relationship between X and Y. If the relationship is truly non-linear, then using the regression line can be misleading.
- Seeing a relationship (an association) between X and Y does not imply causation, that is, that changes in X will cause changes in Y.

Slide 35: Cautions

- In the regression context, a lurking variable is a third variable that may influence the relationship between X and Y.
- Outliers and skewed data can impact the regression line, just like they can impact the correlation. Either X or Y or both could have outliers or skewness.
- If including a particular data point changes the regression line compared to when it is not included, the data point is called influential.

Does that seem like many "cautions"? It is: as we learn methods that are more complicated, there will often be more limits on their use and interpretation.

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

Describing Relationships between Two Variables

Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

Chapter 9 Descriptive Statistics for Bivariate Data

9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

Means, standard deviations and. and standard errors

CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

Homework 8 Solutions

Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

Review of Fundamental Mathematics

Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

Regression III: Advanced Methods

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not an OLS requirement. Therefore we have: Interpretation

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction Principal Components Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

1. The parameters to be estimated in the simple linear regression model Y = α + βx + ε, ε ~ N(0, σ), are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Student Lecture Notes 13- Chapter 13 Introduction to Linear Regression and Correlation Analysis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaying

6.4 Normal Distribution

Contents 6.4 Normal Distribution 6.4.1 Characteristics of the Normal Distribution 6.4.2 The Standardized Normal Distribution 6.4.3 Meaning of Areas under

STAT 350 Practice Final Exam Solution (Spring 2015)

PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

Father's height (inches)

PEARSON'S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900. There is one dot for each father-son pair. Heights of fathers and

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +, −, ×, ÷ with

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Chapter 6 The Normal Distribution Outline 6-1 Normal Distributions 6-2 Applications of the Normal Distribution 6-3 The Central Limit Theorem 6-4 The Normal Approximation to the Binomial Distribution

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGRESSION SIMPLE LINEAR REGRESSION Simple Regression Linear Regression Simple Regression Definition A regression model is a mathematical equation that describes the

Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Strengths of Linear Relationships; A Note on Non-Linear Relationships; Multiple Linear Regression; Removal of Variables; Independent Samples

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8-12 INSIDE ALGEBRA, GRADES 8-12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the strength of the relationship or association between two quantitative

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 STAB22H3 Statistics I Duration: 1 hour and 45 minutes Last Name: First Name: Student number: Aids

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA We have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two Means Welcome to lesson 1c. This third lesson of lesson 1 will discuss hypothesis testing for two independent means. Steps in Hypothesis

5. Linear Regression

5. Linear Regression Outline; Simple linear regression; Linear model

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

Association Between Variables

Contents 11 Association Between Variables 11.1 Introduction 11.1.1 Measure of Association 11.1.2 Chapter Summary 11.2 Chi

II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

Chapter 3 Quantitative Demand Analysis

Managerial Economics & Business Strategy Chapter 3 Quantitative Demand Analysis McGraw-Hill/Irwin Copyright 2010 by the McGraw-Hill Companies, Inc. All rights reserved. Overview I. The Elasticity Concept

Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

Pre-Algebra 2008. Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems

Academic Content Standards Grade Eight Ohio Pre-Algebra 2008 STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express large numbers and small

Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

Module 5: Multiple Regression Analysis

Using Statistical Data to Make Decisions: Multiple Regression Analysis Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

Moderation. Moderation

Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS

DUSP 11.203 Frank Levy Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS These notes have three purposes: 1) To explain why some simple calculus formulae are useful in understanding

4. Multiple Regression in Practice

30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

Characteristics of Binomial Distributions

Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b₀ + b₁x is a linear equation with one variable, or equivalently, a straight

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day

Regression and Correlation

Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis

MULTIPLE REGRESSION WITH CATEGORICAL DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

Part 1 : 07/27/10 21:30:31

Question 1 - CIA 593 III-64 - Forecasting Techniques What coefficient of correlation results from the following data? X Y 1 10 2 8 3 6 4 4 5 2 A. 0 B. 1 C. Cannot be determined from the data given. D.

Testing for Lack of Fit

Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then σ̂² should be an unbiased estimate of σ². If we have a model which is not complex enough to fit

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there's no difference between theory and practice. In practice, there is. Yogi Berra Statistics courses often teach the two-sample

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

Polynomial and Rational Functions

Polynomial and Rational Functions Quadratic Functions Overview of Objectives, students should be able to: 1. Recognize the characteristics of parabolas. 2. Find the intercepts a. x intercepts by solving

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

Graphical Integration Exercises Part Four: Reverse Graphical Integration

D-4603 1 Graphical Integration Exercises Part Four: Reverse Graphical Integration Prepared for the MIT System Dynamics in Education Project Under the Supervision of Dr. Jay W. Forrester by Laughton Stanley