# Lesson Outline

## Transcription

1 Lesson 15 Linear Regression

2 Lesson 15 Outline. Review of correlation analysis; dependent and independent variables; the least squares regression line; calculating the slope; calculating the intercept; residuals and residual plots; identifying a significant relationship with the t-test of the slope; $R^2$, the coefficient of determination; using the regression line to predict Y from X; the relationship between the correlation coefficient and linear regression.

3 Linear Regression and Correlation. Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables. Correlation analysis is used when the interest is in identifying whether a relationship exists and quantifying its strength. Regression analysis is used to identify a relationship AND to predict the value of one variable given a value of the other variable(s).

4 Review: Correlation Analysis. 1. Plot the data in a scatter plot to get a visual idea of the relationship. 2. Calculate the correlation coefficient: use Pearson's correlation coefficient if both variables are continuous; use the Spearman rank correlation coefficient if both variables are ordinal, or if one is ordinal and the other continuous.

5 Review: Scatter Plots and Association. Plot the 2 variables in a scatter plot (EXCEL). The pattern of the dots in the plot indicates the statistical relationship between the variables (its strength and direction). In a positive relationship the pattern goes from lower left to upper right. In a negative relationship the pattern goes from upper left to lower right. The more the dots cluster around a straight line with a positive or negative direction, the stronger the linear relationship.

6 Review: Correlation Coefficient.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$

The statistic r is called the correlation coefficient. r estimates the population correlation coefficient $\rho$ (the Greek letter rho). The correlation coefficient provides a measure of the linear association between two variables. r is always between −1 and 1.
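
The correlation coefficient formula can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data points are made up for the example and are not the lesson's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

For data in a spreadsheet, Excel's CORREL function returns the same quantity.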

7 Review: Correlation Coefficient in Excel. Use the CORREL function to find the correlation coefficient. If data for one variable are in cells A1:A12 and data for the other variable are in cells B1:B12, =CORREL(A1:A12,B1:B12) will return the Pearson correlation coefficient. Correlation coefficients closer to 1 or −1 indicate a stronger linear relationship. Correlation coefficients close to 0 indicate a weak linear relationship. However, there could be a nonlinear relationship when the correlation coefficient is close to 0.

8 Simple Linear Regression. Like correlation analysis, linear regression analysis is a technique used to explore the relationship between two continuous random variables that have a linear relationship. Regression analysis allows us to investigate the change in one variable that corresponds to a given change in the other variable. If only ONE variable is used to predict the value of the other variable, the analysis is called simple linear regression. When two or more variables are used to predict the value of the other variable, the analysis is called multiple linear regression (not covered in this course).

9 Linear Regression: Background. "Regression" is from a Latin root meaning "going back". Linear regression as a statistical method was first described by Sir Francis Galton in his paper "Regression Towards Mediocrity in Hereditary Stature", published in The Journal of the Anthropological Institute, 1886. Galton described the relationship between mid-parent height (the average of the 2 parents' heights) and the height of their offspring. Parents with a taller mid-parent height had children with heights closer to the average height. Parents with a shorter mid-parent height also had children with heights closer to the average height. Galton called this phenomenon "regression towards mediocrity".

10 Sir Francis Galton: Regression. "When mid-parents are taller than mediocrity, their children tend to be shorter than they", and "when mid-parents are shorter than mediocrity, their children tend to be taller than they".

11 Variables in Simple Linear Regression Analysis. Dependent or response variable: the variable to be predicted from or explained by the other variable. The response variable is typically labeled Y. Y is a continuous variable in simple linear regression. Independent or explanatory variable: the variable used to predict the dependent variable. This variable is typically labeled X. X can also be called the predictive variable or the regressor variable. For simple linear regression, X is a continuous variable. For multiple linear regression, X can be continuous or categorical.

12 Identifying independent and dependent variables. In regression analysis, it's important to correctly identify the dependent (Y) and independent (X) variables. The study description should provide you with information about which is the dependent variable and which is the independent variable. If the study description states that the goal is to predict variable 1 from variable 2, then variable 1 is the dependent variable (Y) and variable 2 is the independent variable (X). Typically, if the variables are separated in time, the variable collected first is the independent variable (X) and the variable collected later is the dependent variable (Y). In Galton's regression analysis, the mid-parent height was the independent variable and the offspring height was the dependent variable.

13 Linear Regression Overview. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis. Does the relationship appear to be linear? Estimate the regression line equation: find the slope and intercept of the regression line, and check the residuals. Is the relationship statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

14 Simple Linear Regression: An Example. Is there a linear relationship between body weight and plasma volume that can be used to predict plasma volume from weight? Plasma volume is the dependent variable Y, since we are interested in predicting it from body weight, the independent variable X. The data table lists each subject's body weight (kg) and plasma volume (l).

15 Scatter plot of the Data. There is a positive relationship between plasma volume and body weight. With this small number of data points it is difficult to see the linear relationship, but there is a general linear trend to the data. We want to identify a line that has a good fit to the data. This isn't a deterministic relationship, so the points won't fall perfectly on the line. [Figure: scatter plot of plasma volume (liters) against body weight (kg).]

16 Estimate the Regression Line Equation. A few of the many possible lines through the data points are illustrated in the plot. How do we decide which line best fits the data? [Figure: plasma volume (liters) against body weight (kg), with several candidate lines.]

17 Least Squares Regression Line. The linear regression line is the line that gets closest to all of the points. This is called the least squares regression line. The least squares regression line minimizes the sum of the squares of the vertical distance between each observed data point ($y_i$) and the line:

$$\text{minimize } \sum_{i=1}^{n} \left(y_i - \text{point on line}_i\right)^2$$
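
As a rough illustration of this criterion, the sketch below computes the sum of squared vertical distances for three candidate lines. The data and the two alternative lines are made up for the example; the first candidate is the least squares line for this particular data set, so its sum should be the smallest.

```python
def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared vertical distances between each y_i and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Candidate lines as (intercept, slope). The first is the least squares
# line for this data; the other two are arbitrary alternatives.
candidates = {
    "least squares": (2.2, 0.6),
    "flatter": (3.1, 0.3),
    "steeper": (1.3, 0.9),
}

sse = {name: sum_squared_errors(x, y, a, b) for name, (a, b) in candidates.items()}
for name, value in sse.items():
    print(f"{name}: {value:.2f}")

# The line with the smallest sum of squared distances
best = min(sse, key=sse.get)
print("best:", best)
```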

18 The vertical distances between each observed Y ($y_i$) and the line are shown in red. The sum of these squared distances is minimized by the least squares regression line. [Figure: plasma volume (L) against body weight (kg).]

19 Least Squares Regression Line Equation. The equation for a line requires a slope and an intercept. In regression analysis, we estimate the population regression line with the least squares regression line calculated from sample data: the sample regression line. The notation for the slope and intercept in the population regression line uses Greek letters: $\beta_0$ for the intercept and $\beta_1$ for the slope. The notation for the slope and intercept in the sample regression line uses Roman letters: a for the intercept and b for the slope.

20 The Population Regression Line.

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$\beta_0$ is the y-intercept of the line, $\beta_1$ is the slope of the regression line, and $\varepsilon$ is the error term: the difference between the observed Y and the regression line.

21 Sample Regression Line. $\beta_0$ and $\beta_1$ are population parameters. The sample estimates for the regression parameters are: a is the estimate for $\beta_0$ and b is the estimate for $\beta_1$. $\hat{Y} = a + bX$ is the regression line calculated from sample data. $\hat{Y}$ is the predicted value of Y.

22 Least Squares Regression Line. a and b are estimates of the regression coefficients $\beta_0$ and $\beta_1$. The regression coefficients are estimated from the sample data by the least squares method. The intercept a is the estimated expected value of Y when X = 0. The slope b is the estimated expected change in Y corresponding to a 1 unit increase in X. $\hat{Y}$ is the expected (or predicted) value of y, the point on the line. It is called the fitted value of y. The following slide illustrates the least squares regression line.

23 The Equation of a Regression Line. [Figure: the line $\hat{Y} = a + bX$, showing the intercept a where x = 0 and the slope b as the change in y for a one-unit change in X.]

24 Interpretation of predicted values of Y. The predicted value of y is the expected y-value. Since not all observed data points are exactly on the regression line, there is a range of possible y-values (a distribution) for each x-value. In regression analysis the distribution of y-values for each x-value is assumed to be a normal distribution. The predicted values of y represent the mean values of the distributions of y for each specified value of x. The following slide illustrates this for 3 values of X: notice that the mean of each distribution is on the regression line (the predicted value of y) and that the distributions of y-values are normal distributions.

25 Simple Linear Regression Model Illustrated. [Figure: normal distributions of Y centered on the regression line at three values of X.]

26 Assumptions for Regression Analysis. There are several assumptions that should be met for regression analysis. For each value of X, the Y variable is assumed to have a normal distribution; the mean of the normal distribution is the predicted value, $\hat{Y}$. The normal distributions are assumed to have equal variance across the entire range of X values. This assumption is called homogeneity of variance, or homoscedasticity. The predicted values of Y fall on the regression line representing the linear relationship between X and Y. The Y observations are assumed to be independent. The observations are from a random sample.

27 Interpretation of the Slope of the Regression Line. The slope b is the expected change in Y corresponding to a 1 unit increase in X. b = 0: there is no linear association between Y and X. b > 0: there is a positive linear association between Y and X (as X increases, the expected value of Y increases). b < 0: there is a negative linear association between Y and X (as X increases, the expected value of Y decreases). The following slide illustrates a positive, a negative and a zero slope.

28 Illustration of negative and positive slopes and slope = 0. [Figure: three lines illustrating a positive slope (b > 0), a zero slope (b = 0) and a negative slope (b < 0).]

29 Calculating the Slope of the Regression Line. The formula to calculate the slope of the least squares regression line is given below:

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Notice that the numerator is the same as the numerator in the formula for the correlation coefficient.
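
The slope calculation can be laid out column by column, in the same way a spreadsheet table of deviations would be. The sketch below uses made-up data (not the lesson's plasma volume values), and the variable names are chosen only for this example.

```python
# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# One row per observation: X, Y, (X - Xbar), (Y - Ybar),
# (X - Xbar)(Y - Ybar), (X - Xbar)^2
rows = []
for xi, yi in zip(x, y):
    dx = xi - x_bar
    dy = yi - y_bar
    rows.append((xi, yi, dx, dy, dx * dy, dx ** 2))

for row in rows:
    print("  ".join(f"{value:6.2f}" for value in row))

sum_cross_products = sum(row[4] for row in rows)  # numerator of b
sum_squares_x = sum(row[5] for row in rows)       # denominator of b
b = sum_cross_products / sum_squares_x
print("slope b =", b)
```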

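30 Calculating b for the plasma volume (Y) and body weight (X) example. The table has columns X, Y, $(X - \bar{X})$, $(Y - \bar{Y})$, $(X - \bar{X})(Y - \bar{Y})$ and $(X - \bar{X})^2$, with Mean and Sum rows.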

31 Slope of regression line. From the previous slide, the slope b is the sum of $(X - \bar{X})(Y - \bar{Y})$ divided by the sum of $(X - \bar{X})^2$. Interpretation of the slope: for every one unit increase in X, the expected increase in Y is b units (0.0436, rounded to 4 decimal places). Plasma volume increases 0.0436 liters for every one kg increase in body weight. The slope is positive, indicating that as body weight (X) increases, plasma volume (Y) also increases.

32 Calculating the Intercept of the Regression Line. The intercept a of the regression line is the estimated value of Y when X = 0. a is calculated from the average value of Y, the average value of X and the estimated slope b by the following formula:

$$a = \bar{Y} - b\bar{X}$$
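
A minimal sketch of the intercept calculation, assuming the slope is computed with the least squares formula first. The helper name `least_squares_line` and the data are made up for the example.

```python
def least_squares_line(x, y):
    """Return (intercept a, slope b) of the least squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar  # intercept from the two means and the slope
    return a, b

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")

# Because a = Ybar - b * Xbar, the fitted line passes through (Xbar, Ybar)
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
print(a + b * x_bar, y_bar)
```

In Excel, the SLOPE and INTERCEPT functions return the same two quantities.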

33 Intercept for Plasma Volume Example. The intercept is calculated from $\bar{X}$, $\bar{Y}$ and b as $a = \bar{Y} - b\bar{X}$. The intercept is the estimated expected value of Y when X = 0. Intercepts do not always have realistic interpretations. In this example, the intercept is the plasma volume predicted for a body weight of 0 kg, which is not a possibility.

34 Regression Line Equation. Once the slope and the intercept have been calculated, the regression equation can be constructed: $\hat{Y} = a + bX$. This is the equation that will be used to predict plasma volume (l) from body weight (kg). The regression equation calculated from sample data is an estimate of the true population regression equation.

35 Regression Line Equation and interpretation of the slope. A 1 unit increase in X for this data = 1 kg, so the interpretation of the slope in this regression line equation is: for each 1 kg increase in body weight, the expected increase in plasma volume is 0.0436 liters. What is the expected plasma volume increase for a 10 kg increase in body weight? For a 10 kilogram increase in body weight, the expected increase in plasma volume = 10 × 0.0436 = 0.436 liters.

36 What if the slope of the regression line is negative? If the slope of the regression line is negative, we would expect a decrease in Y with each unit increase in X. The slope is a measure of the expected change in Y for each 1-unit increase in X. If the slope is positive, the expected change in Y is an increase. If the slope is negative, the expected change in Y is a decrease.

37 Regression Coefficients in Excel. Excel has functions to calculate the slope and the intercept of the least squares regression line. The SLOPE function returns b, the slope: =SLOPE(y-range, x-range). The INTERCEPT function returns a, the intercept: =INTERCEPT(y-range, x-range). For both of these functions, enter the y-range of the data first and then the x-range of the data.

38 Plasma Volume Example in Excel. The Lesson 15 Excel Module works through the plasma volume / body weight regression example: create a scatterplot of the data, work through the calculations of the slope and intercept of the regression line, and use the Excel SLOPE and INTERCEPT functions. After you've worked through the calculations once, use the Excel functions to find the slope and intercept for future regression problems.

39 Residuals. The residual is the difference between the observed (Y) and the expected ($\hat{Y}$) value of Y: Residual = $Y - \hat{Y}$. Y is the observed Y for any X. $\hat{Y}$ is the Y-value on the regression line for that value of X. The residual is the component of Y that is not predicted by X. The least squares regression line is the line that minimizes the sum of the squared residuals.
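
The residual calculation can be sketched as follows, again on made-up data rather than the lesson's values. The sketch also picks out the points closest to and furthest from the fitted line by absolute residual.

```python
# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares slope and intercept
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Fitted value on the line for each X, then residual = observed - fitted
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"X={xi}  Y={yi}  fitted={fi:.2f}  residual={ri:+.2f}")

# Points with the smallest and largest absolute residual
points = list(zip(x, y, residuals))
closest = min(points, key=lambda p: abs(p[2]))[:2]
furthest = max(points, key=lambda p: abs(p[2]))[:2]
print("closest to line:", closest, "furthest from line:", furthest)
```

For a least squares line fitted with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the arithmetic.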

40 Residuals for Plasma Volume Example. The table has columns X, Y, Y' (the expected value of Y) and Residual. Which point is closest to the regression line? Which point is furthest from the regression line? Calculate Y', the expected value of Y, using the regression line equation. The residual is the difference between Y and Y'. The point (74, 3.37) has the smallest residual and the point (70.5, 3.49) has the largest residual.

41 Regression Line and Residuals. [Figure: regression line of plasma volume (L) against body weight (kg), with the points having the largest and smallest residuals labeled.]

42 Analysis of Residuals. A residual plot is a plot of the residual values on the y-axis and the x-values on the x-axis. If there is a linear relationship between X and Y, the correlation between X and the residuals should equal 0. The scatterplot will be a random scatter of points with no evident linear pattern. A nonlinear relationship between X and Y will be more evident in the residual plot of the (X, residual) data than in the scatterplot of the original (X, Y) data. The Excel Regression analysis tool has an option for selecting the residual plot. The residual plot for the plasma volume example is on the following slide.
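
To illustrate what a residual plot is looking for, the sketch below lists residuals in order of X for two made-up data sets: one with a roughly linear trend and one that follows a curve. No plotting library is used; the residuals are just printed. Since the residuals of a least squares line fitted with an intercept are uncorrelated with X by construction, the thing to inspect is the pattern: an irregular scatter around 0 versus a systematic U shape.

```python
def residuals_from_line(x, y):
    """Residuals (observed minus fitted) from the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up illustrative data sets (not from the lesson)
x = [1, 2, 3, 4, 5]
y_linear = [2, 4, 5, 4, 5]    # noisy but roughly linear
y_curved = [1, 4, 9, 16, 25]  # follows y = x^2

res_linear = residuals_from_line(x, y_linear)
res_curved = residuals_from_line(x, y_curved)

print("roughly linear:", [round(r, 2) for r in res_linear])
print("curved:        ", [round(r, 2) for r in res_curved])

# Sign pattern of the curved residuals: positive at both ends,
# negative in the middle
signs_curved = ["+" if r > 0 else "-" for r in res_curved]
print(signs_curved)
```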

43 Residual Plot for Plasma Volume and Body Weight Data. [Figure: residual plot, residuals against body weight (kg).] No evidence of nonlinearity. The points are equally distributed around the value 0 with no evident positive or negative slope.

44 (X, Y) Scatterplot for a nonlinear (or curvilinear) relationship. When there is a curvilinear relationship between X and Y, the least squares regression line does not represent the relationship.

45 Residual Plot for Curvilinear Relationship. [Figure: residual plot, residuals against X.] This is the residual plot for the relationship on the previous slide. It illustrates that the relationship is not linear. The residual plot points aren't evenly distributed around the value 0.

46 Regression analysis for curvilinear relationships. Simple linear regression analysis should not be used when X and Y have a curvilinear relationship. There are several strategies for dealing with a curvilinear relationship between X and Y. One option is to try a logarithmic transformation of the data to see if this improves the linear relationship. Another option is to use piecewise regression: fit one regression line to the increasing portion of the curve and a second regression line to the decreasing portion of the curve. A third option is to include $X^2$ or $X^3$ in the regression equation (covered in PubH 6415 with multiple regression models).

47 Linear Regression Procedure. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis, and add the trend line to the plot. Estimate the regression line equation: find the slope and intercept of the regression line, and check the residuals. Is the relationship between X and Y statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

48 Is the relationship between X and Y significant? If the slope of the regression line = 0, this indicates there is no linear relationship between the variables. If there is no linear relationship, the variables are considered to be independent. A t-test of the slope estimate can be done to test for independence between the X and Y variables. Null hypothesis: slope = 0. The null hypothesis states that the variables are independent. Alternative hypothesis: slope ≠ 0. The alternative hypothesis is that there is a significant relationship between the variables. If the result of the t-test of the slope is significant (p-value below the significance level), reject the null hypothesis and conclude that there is a statistically significant relationship between the two variables.

49 Notation for Population Slope and Intercept. As in any hypothesis test, the null and alternative hypotheses are stated about the population parameters, not about the estimates. The population parameters for the slope and intercept of the regression line are the Greek letters $\beta_1$ and $\beta_0$: $\beta_1$ is the population parameter for the slope and $\beta_0$ is the population parameter for the intercept. The statistic for the t-test of the slope will use the estimated value of the slope (b) that is calculated from the data.

50 t-test of the Slope. 1. State the hypotheses. Null hypothesis: $\beta_1 = 0$. Alternative hypothesis: $\beta_1 \neq 0$. 2. A t-test will be used to test the hypothesis. 3. Significance level: $\alpha = 0.05$. 4. The degrees of freedom for a t-test of the slope are n − 2, where n is the sample size. The critical values of the t-test are found using TINV(0.05, df). For the plasma volume example, n = 8, so the critical values are ±TINV(0.05, 6), approximately ±2.447.

51 t-test of the slope. 5. Calculate the test statistic: the slope estimate divided by the standard error of the slope,

$$t = \frac{b_1}{SE(b_1)}$$

where $b_1$ is the slope estimate (written b elsewhere in this lesson). The formula for the SE of the slope is complicated, so we will use the Excel Data Analysis Tool to do this t-test. The Data Analysis Tool provides the t-statistic and the p-value of the t-test of the slope. 6. State the conclusion. If the test statistic is more extreme than the critical values, reject the null hypothesis and conclude that there is a significant relationship between the variables.
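
For readers who want to see what the tool computes, here is a hedged sketch of the test statistic. It assumes the standard textbook formula for the standard error of the slope, SE(b) = sqrt(SSE / (n − 2)) / sqrt(Σ(x − x̄)²), which the lesson itself does not spell out. The data are made up (5 points, so df = 3), and the critical value 3.182 is the rounded two-sided 0.05 value for 3 degrees of freedom from a t table.

```python
import math

def slope_t_test(x, y):
    """Return (slope, standard error of slope, t statistic, degrees of freedom)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Sum of squared residuals around the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Assumed textbook formula: residual standard error over sqrt(Sxx)
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b, se_b, b / se_b, n - 2

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, se_b, t_stat, df = slope_t_test(x, y)
print(f"b = {b:.4f}, SE(b) = {se_b:.4f}, t = {t_stat:.4f}, df = {df}")

# Two-sided critical value for alpha = 0.05 with df = 3, from a t table
# (rounded; the value Excel's TINV(0.05, 3) would give)
T_CRITICAL_DF3 = 3.182
reject_null = abs(t_stat) > T_CRITICAL_DF3
print("reject null hypothesis of zero slope:", reject_null)
```

With this small made-up sample the statistic does not exceed the critical value, so the null hypothesis would not be rejected; the lesson's plasma volume example reaches the opposite conclusion.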

52 T-test of the Slope in Excel. Data Analysis Tool output for the weight / plasma volume example, with the t-statistic and p-value for the t-test of the slope highlighted. The SUMMARY OUTPUT has three parts: Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 8); an ANOVA table (rows Regression, Residual and Total; columns df, SS, MS, F and Significance F); and a coefficients table (rows Intercept and Body weight; columns Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95%). The p-value for the t-test of the slope is below the significance level, so reject the null hypothesis and conclude that there is a significant relationship between weight and plasma volume.

53 Regression Analysis in Excel. In Excel Module 15, use the Data Analysis Tool to obtain the regression analysis results: select Regression under the Data Analysis Tool. Enter the plasma volume data for the Y-range and the weight data for the X-range. Check Labels if you highlight the column headers. Also check Residuals and Residual Plot. Identify the t-statistic and the p-value for the t-test of the slope. Also identify the slope and the intercept on the output table; these are under the Coefficients column. 95% confidence intervals for the coefficients are also provided if the Confidence Level box is checked.

54 T-test of the Intercept. The Data Analysis Tool also provides results of a t-test of the intercept. The null hypothesis of this test is that the intercept = 0: $\beta_0 = 0$. The alternative hypothesis of this test is that the intercept ≠ 0: $\beta_0 \neq 0$. Usually there is not much interest in the t-test of the intercept, because testing whether the intercept = 0 does not provide information about the relationship between the two variables. From the regression table, you can see that the null hypothesis that the intercept = 0 is not rejected, because its p-value is above the significance level. This result does not affect the significant result of the t-test of the slope.

55 Linear Regression Procedure. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis, and add the trend line to the plot. Estimate the regression line equation: find the slope and intercept of the regression line. Is the relationship statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

56 How well does the regression line equation fit the data? $r^2$ is the notation for the coefficient of determination. $r^2$ is equal to the correlation coefficient (r) squared. It can range from 0 to 1. Interpretation of $r^2$: $r^2$ is the proportion of variation in the dependent variable (Y) that is explained by the estimated least squares regression equation. Larger values of $r^2$ indicate a better fit of the regression line to the data, which indicates a more useful predictive model.
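
The two descriptions of $r^2$ (the squared correlation coefficient, and the proportion of variation in Y explained by the line) can be checked against each other numerically. The sketch below does this on made-up data; "explained variation" is computed as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of Y around its mean.

```python
import math

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in Y (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Route 1: square the correlation coefficient
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Route 2: proportion of variation in Y explained by the fitted line
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_squared_from_fit = 1 - sse / syy

print(round(r_squared_from_r, 4), round(r_squared_from_fit, 4))
```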

57 Calculating $r^2$. In Excel, you can use the CORREL function to find the correlation coefficient and square this value to find the coefficient of determination. For the plasma / weight data, squaring r gives $r^2 = 0.576$. Or you can find $r^2$ in the Regression Statistics block of the Data Analysis Tool output (Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 8): Multiple R is the correlation coefficient and R Square is the coefficient of determination ($r^2$).

58 Interpretation of $r^2$. For the plasma volume example, $r^2 = 0.576$. Interpretation: 57.6% of the variation in plasma volume is explained by the regression line equation with weight as the explanatory variable. Since only 57.6% of the variation in plasma volume is explained by body weight, there are likely other variables that explain some of the variation in plasma volume. Multiple regression analysis uses more than one explanatory variable to predict the dependent variable. This is covered in PubH 6415. If there are other explanatory variables significantly related to plasma volume in a multiple regression model, $r^2$ will increase.

59 Linear Regression Procedure. Look at a scatter plot of the data (we have done this): plot Y on the y-axis and X on the x-axis. Does the relationship appear to be linear? Estimate the regression line equation (we have done this): find the slope and intercept of the regression line. Is the relationship statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data (we have done this)? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

60 Using the Regression Line Equation for Prediction. The regression line equation for the weight and plasma volume data has the form $\hat{Y} = a + bX$, with the intercept a and slope b calculated from the sample data. For a given value of weight (X), the plasma volume (Y) can be predicted. What is the expected plasma volume for an individual who weighs 60 kg? Insert 60 in the equation in place of X and solve for the predicted value: $\hat{Y} = a + b \times 60$ liters.

61 Predicting plasma volume for weight = 60 kg. [Figure: regression line of plasma volume (liters) against body weight (kg).] The predicted plasma volume for weight = 60 kg is the point on the regression line corresponding to x = 60. This point is 2.7 liters.

62 Appropriate Applications of the Regression Line Equation. Predictions using regression line equations are only valid within the range of x-values in the collected data. For the example data, that range is the span of body weights observed in the sample. It would not be appropriate to use this regression line equation to predict plasma volume for an individual weighing 100 kg or an individual weighing 25 kg. There may be a different relationship between weight and plasma volume beyond the values of the collected data, so the relationship identified by the regression line equation should not be extrapolated much beyond the range of the X values.
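
One way to make this caution concrete is a prediction helper that refuses to extrapolate. The sketch below is only an illustration: the function name `predict_within_range`, the coefficients and the observed X values are all made up, and a strict cut-off at the observed minimum and maximum is a design choice for the example, not a rule from the lesson.

```python
def predict_within_range(x_new, intercept, slope, x_observed):
    """Predict Y from the fitted line, refusing to extrapolate beyond the data."""
    lowest, highest = min(x_observed), max(x_observed)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lowest}, {highest}]"
        )
    return intercept + slope * x_new

# Made-up illustrative values (not the lesson's fitted equation)
x_observed = [1, 2, 3, 4, 5]
intercept, slope = 2.2, 0.6

# A value inside the observed range is predicted from the line
inside = predict_within_range(3, intercept, slope, x_observed)
print(inside)

# A value far outside the observed range is rejected
try:
    predict_within_range(10, intercept, slope, x_observed)
    extrapolation_error = None
except ValueError as err:
    extrapolation_error = str(err)
print(extrapolation_error)
```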

63 More cautions about application of regression line predictions. Predictions using regression line equations are only valid for the population represented by the sample data. For example, if data for a regression analysis are collected for girls age 10-18, predictions using the equation are not necessarily valid for boys, adults or girls younger than 10. You can't assume that the relationship between two variables in one population is the same in other populations. Read the study description carefully to identify the population that was sampled. Regression analysis inferences are valid for this population but not necessarily for other populations.

64 What if there isn't a significant relationship between the variables? If regression analysis reveals that there is NOT a significant relationship between the two variables (that is, if the p-value for the t-test of the slope is greater than the significance level), the regression equation is not useful for predicting values of the dependent variable from the independent variable. If the t-test of the slope is NOT significant, end the regression analysis procedure and do not use the regression line equation for prediction. Prediction using the regression line equation is only useful if the null hypothesis of independence between the variables is rejected.

65 Relationship between Correlation and Regression. The correlation coefficient and the slope of the regression line are related. For a given set of data, they will both have the same sign, indicating the direction of the relationship (positive or negative). There is a mathematical relationship between the slope and the correlation coefficient: the slope of the regression line is equal to the correlation coefficient times the standard deviation of y divided by the standard deviation of x:

$$b_1 = r \, \frac{s_y}{s_x}$$
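
This identity is easy to check numerically. The sketch below computes the slope directly from the least squares formula and again from r and the two sample standard deviations (via `statistics.stdev` from the Python standard library); the data are made up for the example.

```python
import math
import statistics

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Slope from the least squares formula
b = sxy / sxx

# Slope from the correlation coefficient and the sample standard deviations
r = sxy / math.sqrt(sxx * syy)
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)
b_from_r = r * s_y / s_x

print(round(b, 6), round(b_from_r, 6))
```

Because the standard deviations are always positive, the identity also shows why the slope and r must share the same sign.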

66 Hypothesis Test of the population correlation coefficient. We can set up a hypothesis test of independence for the population correlation. Null hypothesis: $\rho = 0$, no significant linear association between the variables. Alternative hypothesis: $\rho \neq 0$, a significant linear association between the variables. The test statistic is a t-statistic with n − 2 df:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

After finding the t-statistic, you can use EXCEL to find the p-value: =TDIST(t, n-2, 2).
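
The test can be sketched without a spreadsheet. The block below computes the t-statistic from r and then approximates the two-sided p-value by numerically integrating the t density with Simpson's rule, which is one assumed way to get a tail area using only the standard library. The function names and data are made up for the example.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_density(u, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3  # probability that T lies between 0 and |t|
    return 1 - 2 * area

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)

t_stat = correlation_t_statistic(r, n)
p_value = two_sided_p_value(t_stat, n - 2)
print(f"r = {r:.4f}, t = {t_stat:.4f}, p = {p_value:.4f}")
```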

67 T-test of the correlation coefficient. For a given set of sample data, the t-test for $\rho$ and the t-test for the slope, $\beta_1$, will have the same t-statistic and p-value. For the plasma volume data, the t-statistic for the test of the population correlation coefficient is the same as the t-statistic for the slope of the regression line. You can work through the equation in EXCEL to confirm this, finding the p-value with TDIST(t, 6, 2). The same conclusion is reached from either hypothesis test: there is a significant relationship between the two variables. The p-value < 0.05, so the null hypothesis of independence is rejected at the 0.05 significance level.
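
The equality of the two t-statistics can be confirmed on any data set. The sketch below uses made-up data and assumes the standard textbook formula for the standard error of the slope, sqrt(SSE / (n − 2)) / sqrt(Σ(x − x̄)²).

```python
import math

# Made-up illustrative data (not the lesson's plasma volume data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# t statistic for the slope: b divided by its standard error
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = b / se_b

# t statistic for the correlation coefficient
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt((n - 2) / (1 - r * r))

print(round(t_slope, 6), round(t_corr, 6))
```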

68 Linear Regression and Correlation: which to use? Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables. Use correlation analysis when the interest is primarily in identifying whether a relationship exists. Use the t-test of the correlation coefficient to determine if the relationship is significant. Use regression analysis to identify a relationship AND to predict the value of one variable given a value of the other variable. Use the t-test of the slope to determine if the relationship is significant. Regression analysis is most useful when there is an identified interest in predicting one variable from the other(s). If prediction doesn't make sense, use correlation analysis.

69 Readings and Assignments. Reading: Chapter 8 (pages include 194). Complete the Lesson 15 Practice Exercises. Lesson 15 Excel Modules: Excel Module 15: Plasma Volume works through the example in this lesson; Excel Module 15: BMI works through the example in the text (pages include 206 and 209). Complete OPTIONAL Homework 11: use the Data Analysis Tool for the linear regression problems.


1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Simple Linear Regression

Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Simple Regression and Correlation

Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### 7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Homework 11. Part 1. Name: Score: / null

Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### AMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015

AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 08/11/2016 Structure This Week What is a linear model? How

Using Your TI-NSpire Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

### Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### SELF-TEST: SIMPLE REGRESSION

ECO 22000 McRAE SELF-TEST: SIMPLE REGRESSION Note: Those questions indicated with an (N) are unlikely to appear in this form on an in-class examination, but you should be able to describe the procedures

### EXPERIMENT 6: HERITABILITY AND REGRESSION

BIO 184 Laboratory Manual Page 74 EXPERIMENT 6: HERITABILITY AND REGRESSION DAY ONE: INTRODUCTION TO HERITABILITY AND REGRESSION OBJECTIVES: Today you will be learning about some of the basic ideas and

### Simple Linear Regression

1 Excel Manual Simple Linear Regression Chapter 13 This chapter discusses statistics involving the linear regression. Excel has numerous features that work well for comparing quantitative variables both

### Lecture 18 Linear Regression

Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation - used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation

### Residuals. Residuals = ª Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. ^ e i = y i - y i

A continuation of regression analysis Lesson Objectives Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis. M23-1 M23-2 Example 1: continued Case

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!

Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Regression Analysis. Pekka Tolonen

Regression Analysis Pekka Tolonen Outline of Topics Simple linear regression: the form and estimation Hypothesis testing and statistical significance Empirical application: the capital asset pricing model

### In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

### 0.1 Multiple Regression Models

0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different

### Practice 3 SPSS. Partially based on Notes from the University of Reading:

Practice 3 SPSS Partially based on Notes from the University of Reading: http://www.reading.ac.uk Simple Linear Regression A simple linear regression model is fitted when you want to investigate whether

### Multiple Regression Analysis in Minitab 1

Multiple Regression Analysis in Minitab 1 Suppose we are interested in how the exercise and body mass index affect the blood pressure. A random sample of 10 males 50 years of age is selected and their

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression

### where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.

Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Analyzing Linear Relationships, Two or More Variables

PART V ANALYZING RELATIONSHIPS CHAPTER 14 Analyzing Linear Relationships, Two or More Variables INTRODUCTION In the previous chapter, we introduced Kate Cameron, the owner of Woodbon, a company that produces

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### Chapter 11: Two Variable Regression Analysis

Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

### Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

### Simple Linear Regression

STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

### Study Resources For Algebra I. Unit 1C Analyzing Data Sets for Two Quantitative Variables

Study Resources For Algebra I Unit 1C Analyzing Data Sets for Two Quantitative Variables This unit explores linear functions as they apply to data analysis of scatter plots. Information compiled and written

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### AP * Statistics Review. Linear Regression

AP * Statistics Review Linear Regression Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

### 12.1 Inference for Linear Regression

12.1 Inference for Linear Regression Least Squares Regression Line y = a + bx You might want to refresh your memory of LSR lines by reviewing Chapter 3! 1 Sample Distribution of b p740 Shape Center Spread

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### Chapter 12 : Linear Correlation and Linear Regression

Number of Faculty Chapter 12 : Linear Correlation and Linear Regression Determining whether a linear relationship exists between two quantitative variables, and modeling the relationship with a line, if

### Algebra I: Lesson 5-4 (5074) SAS Curriculum Pathways

Two-Variable Quantitative Data: Lesson Summary with Examples Bivariate data involves two quantitative variables and deals with relationships between those variables. By plotting bivariate data as ordered

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### 496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT

496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT * Use a non-parametric technique. There are statistical methods, called non-parametric methods, that don t make any assumptions about the underlying distribution

### Module 5: Statistical Analysis

Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### Class 6: Chapter 12. Key Ideas. Explanatory Design. Correlational Designs

Class 6: Chapter 12 Correlational Designs l 1 Key Ideas Explanatory and predictor designs Characteristics of correlational research Scatterplots and calculating associations Steps in conducting a correlational

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### , has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

### Calculate Confidence Intervals Using the TI Graphing Calculator

Calculate Confidence Intervals Using the TI Graphing Calculator Confidence Interval for Population Proportion p Confidence Interval for Population μ (σ is known 1 Select: STAT / TESTS / 1-PropZInt x: number

### Regression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Regression in ANOVA 1 Introduction 2 Basic Linear

### Multiple Regression in SPSS STAT 314

Multiple Regression in SPSS STAT 314 I. The accompanying data is on y = profit margin of savings and loan companies in a given year, x 1 = net revenues in that year, and x 2 = number of savings and loan

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

### In Chapter 27 we tried to predict the percent body fat of male subjects from

29 C H A P T E R Multiple Regression WHO WHAT UNITS WHEN WHERE WHY 25 Male subjects Body fat and waist size %Body fat and inches 199s United States Scientific research In Chapter 27 we tried to predict

### Regression. In this class we will:

AMS 5 REGRESSION Regression The idea behind the calculation of the coefficient of correlation is that the scatter plot of the data corresponds to a cloud that follows a straight line. This idea can be

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Math 62 Statistics Sample Exam Questions

Math 62 Statistics Sample Exam Questions 1. (10) Explain the difference between the distribution of a population and the sampling distribution of a statistic, such as the mean, of a sample randomly selected

### Simple Linear Regression One Binary Categorical Independent Variable

Simple Linear Regression Does sex influence mean GCSE score? In order to answer the question posed above, we want to run a linear regression of sgcseptsnew against sgender, which is a binary categorical

### ID X Y

Dale Berger SPSS Step-by-Step Regression Introduction: MRC01 This step-by-step example shows how to enter data into SPSS and conduct a simple regression analysis to develop an equation to predict from.

### Chapter 9. Section Correlation

Chapter 9 Section 9.1 - Correlation Objectives: Introduce linear correlation, independent and dependent variables, and the types of correlation Find a correlation coefficient Test a population correlation