Lesson Outline

1 Lesson 15 Linear Regression

2 Lesson 15 Outline: Review correlation analysis; dependent and independent variables; least squares regression line; calculating the slope; calculating the intercept; residuals and residual plots; identifying a significant relationship with the t-test of the slope; $R^2$, the coefficient of determination; using the regression line to predict Y from X; relationship between the correlation coefficient and linear regression.

3 Linear Regression and Correlation: Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables. Correlation analysis is used when the interest is in identifying whether a relationship exists and in quantifying its strength. Regression analysis is used to identify a relationship AND to predict the value of one variable given a value of the other variable(s).

4 Review: Correlation Analysis. 1. Plot the data in a scatter plot to get a visual idea of the relationship. 2. Calculate the correlation coefficient: (a) use Pearson's correlation coefficient if both variables are continuous; (b) use the Spearman rank correlation coefficient if both variables are ordinal, or if one is ordinal and the other continuous.

5 Review: Scatter Plots and Association. Plot the two variables in a scatter plot (Excel). The pattern of the dots in the plot indicates the statistical relationship between the variables (its strength and direction). In a positive relationship the pattern goes from lower left to upper right. In a negative relationship the pattern goes from upper left to lower right. The more closely the dots cluster around a straight line with a positive or negative direction, the stronger the linear relationship.

6 Review: Correlation Coefficient. $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$ The statistic r is called the correlation coefficient. r estimates the population correlation coefficient $\rho$ (the Greek letter rho). The correlation coefficient provides a measure of the linear association between two variables. r is always between -1 and 1.
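
The correlation formula can be sketched in a few lines of Python. This is only an illustration: the five data points are hypothetical (they are not the lesson's plasma volume data), and the function name `pearson_r` is an assumption introduced here.

```python
import math

# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

def pearson_r(xs, ys):
    """Sum of cross-products divided by the square root of the product of the sums of squares."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    syy = sum((yi - y_bar) ** 2 for yi in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(weight, volume)
print(f"r = {r:.4f}")
```

For these made-up points r is close to 1, which matches the strong positive linear trend in the numbers.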

7 Review: Correlation Coefficient in Excel. Use the CORREL function to find the correlation coefficient. If data for one variable are in cells A1:A12 and data for the other variable are in cells B1:B12, =CORREL(A1:A12,B1:B12) will return the Pearson correlation coefficient. Correlation coefficients closer to 1 or -1 indicate a stronger linear relationship. Correlation coefficients close to 0 indicate a weak linear relationship. However, there could be a nonlinear relationship when the correlation coefficient is close to 0.

8 Simple Linear Regression: Like correlation analysis, linear regression analysis is a technique used to explore the relationship between two continuous random variables that have a linear relationship. Regression analysis allows us to investigate the change in one variable that corresponds to a given change in the other variable. If only ONE variable is used to predict the value of the other variable, the analysis is called simple linear regression. When two or more variables are used to predict the value of the other variable, the analysis is called multiple linear regression (not covered in this course).

9 Linear Regression: Background. Regression is from a Latin root meaning "going back". Linear regression as a statistical method was first described by Sir Francis Galton in his paper "Regression Towards Mediocrity in Hereditary Stature", published in The Journal of the Anthropological Institute, 1886. Galton described the relationship between mid-parent height (the average of the two parents' heights) and the height of their offspring. Taller mid-parents had children with heights closer to the average height. Shorter mid-parents had children with heights closer to the average height. Galton called this phenomenon "regression towards mediocrity".

10 Sir Francis Galton: Regression. When mid-parents are taller than mediocrity, their children tend to be shorter than they, and when mid-parents are shorter than mediocrity, their children tend to be taller than they.

11 Variables in Simple Linear Regression Analysis: The dependent or response variable is the variable to be predicted from or explained by the other variable. The response variable is typically labeled Y. Y is a continuous variable in simple linear regression. The independent or explanatory variable is the variable used to predict the dependent variable. This variable is typically labeled X. X can also be called the predictive variable or the regressor variable. For simple linear regression X is a continuous variable. For multiple linear regression X can be continuous or categorical.

12 Identifying independent and dependent variables: In regression analysis, it's important to correctly identify the dependent (Y) and independent (X) variables. The study description should provide you with information about which is the dependent variable and which is the independent variable. If the study description states that the goal is to predict variable 1 from variable 2, then variable 1 is the dependent variable (Y) and variable 2 is the independent variable (X). Typically, if the variables are separated in time, the variable collected first is the independent variable (X) and the variable collected later is the dependent variable (Y). In Galton's regression analysis, the mid-parent height was the independent variable and the offspring height was the dependent variable.

13 Linear Regression Overview: 1. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis. Does the relationship appear to be linear? 2. Estimate the regression line equation: find the slope and intercept of the regression line and check the residuals. 3. Is the relationship statistically significant? Use a t-test of the slope to determine significance. 4. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. 5. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

14 Simple Linear Regression: An Example. Is there a linear relationship between body weight and plasma volume that can be used to predict plasma volume from weight? Plasma volume is the dependent variable Y, since we are interested in predicting it from body weight, the independent variable X. [Table: subject, body weight (kg), plasma volume (L).]

15 Scatter plot of the Data: There is a positive relationship between plasma volume and body weight. With this small number of data points it is difficult to see the linear relationship, but there is a general linear trend to the data. We want to identify a line that has a good fit to the data. This isn't a deterministic relationship, so the points won't fall perfectly on the line. [Figure: scatter plot of plasma volume (liters) against body weight (kg).]

16 Estimate the Regression Line Equation: A few of the many possible lines through the data points are illustrated in the plot. How do we decide which line best fits the data? [Figure: scatter plot of plasma volume (liters) against body weight (kg) with several candidate lines.]

17 Least Squares Regression Line: The linear regression line is the line that gets closest to all of the points. This is called the least squares regression line. The least squares regression line minimizes the sum of the squares of the vertical distances between each observed data point ($y_i$) and the line: minimize $\sum_{i=1}^{n} (y_i - \text{point on line}_i)^2$.
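
The least squares criterion can be checked numerically. The sketch below uses hypothetical data (not the lesson's plasma volume data) and compares the sum of squared vertical distances for the least squares line with two arbitrary candidate lines; the helper names and the candidate lines are assumptions made for illustration. The slope and intercept are computed with the formulas that appear later in the lesson.

```python
# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

def sum_squared_distances(a, b, xs, ys):
    """Sum of squared vertical distances between each y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))

# Least squares slope and intercept for these data.
x_bar = sum(weight) / len(weight)
y_bar = sum(volume) / len(volume)
b_ls = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, volume))
        / sum((x - x_bar) ** 2 for x in weight))
a_ls = y_bar - b_ls * x_bar

ls_sse = sum_squared_distances(a_ls, b_ls, weight, volume)

# Two arbitrary candidate lines (intercept, slope) chosen only for comparison.
candidates = [(0.5, 0.038), (1.0, 0.03)]
candidate_sses = [sum_squared_distances(a, b, weight, volume) for a, b in candidates]

print(f"least squares line: {ls_sse:.4f}")
for (a, b), sse in zip(candidates, candidate_sses):
    print(f"line a={a}, b={b}: {sse:.4f}")
```

Any other line tried in place of the candidates should give a sum at least as large as the least squares value.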

18 Vertical distances between each observed Y ($y_i$) and the line are in red. The sum of these squared distances is minimized by the least squares regression line. [Figure: plasma volume (L) against body weight (kg), with the vertical distances from each point to the line marked.]

19 Least Squares Regression Line Equation: The equation for a line requires a slope and an intercept. In regression analysis, we estimate the population regression line with the least squares regression line calculated from sample data: the sample regression line. The notation for the slope and intercept in the population regression line uses Greek letters: $\beta_0$ for the intercept and $\beta_1$ for the slope. The notation for the slope and intercept in the sample regression line uses Roman letters: a for the intercept and b for the slope.

20 The Population Regression Line: $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\beta_0$ is the y-intercept of the line, $\beta_1$ is the slope of the regression line, and $\varepsilon$ is the error term, the difference between the observed Y and the regression line.

21 Sample Regression Line: $\beta_0$ and $\beta_1$ are population parameters. The sample estimates for the regression parameters are: a, the estimate for $\beta_0$, and b, the estimate for $\beta_1$. $\hat{Y} = a + bX$ is the regression line calculated from sample data. $\hat{Y}$ is the predicted value of Y.

22 Least Squares Regression Line: a and b are estimates of the regression coefficients $\beta_0$ and $\beta_1$. The regression coefficients are estimated from the sample data by the least squares method. The intercept a is the estimated expected value of Y when X = 0. The slope b is the estimated expected change in Y corresponding to a 1 unit increase in X. $\hat{Y}$ is the expected (or predicted) value of y, the point on the line. It is called the fitted value of y. The following slide illustrates the least squares regression line.

23 The Equation of a Regression Line: [Figure: the line $\hat{Y} = a + bX$, showing the intercept a at X = 0 and the slope b as the change in Y for a one-unit change in X.]

24 Interpretation of predicted values of Y: The predicted value of y is the expected y-value. Since not all observed data points are exactly on the regression line, there is a range of possible y-values (a distribution) for each x-value. In regression analysis the distribution of y-values for each x-value is assumed to be a normal distribution. The predicted values of y represent the mean values of the distributions of y for each specified value of x. The following slide illustrates this for 3 values of X: notice that the mean of each distribution is on the regression line (the predicted value of y) and that the distributions of y-values are normal distributions.

25 Simple Linear Regression Model Illustrated

26 Assumptions for Regression Analysis: There are several assumptions that should be met for regression analysis. For each value of X, the Y variable is assumed to have a normal distribution; the mean of the normal distribution is the predicted value, $\hat{Y}$. The normal distributions are assumed to have equal variance across the entire range of X values. This assumption is called homogeneity of variance, or homoscedasticity. The predicted values of Y fall on the regression line representing the linear relationship between X and Y. The Y observations are assumed to be independent. The observations are from a random sample.

27 Interpretation of the Slope of the Regression line: The slope b is the expected change in Y corresponding to a 1 unit increase in X. b = 0: there is no linear association between Y and X. b > 0: there is a positive linear association between Y and X (as X increases, the expected value of Y increases). b < 0: there is a negative linear association between Y and X (as X increases, the expected value of Y decreases). The following slide illustrates a positive, a negative and a zero slope.

28 Illustration of Negative, Positive and Zero Slopes: [Figure: three lines plotted against x, one with b > 0, one with b = 0 and one with b < 0.]

29 Calculating the Slope of the Regression Line: The formula to calculate the slope of the least squares regression line is $b = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$. Notice that the numerator is the same as the numerator in the formula for the correlation coefficient.
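
As a minimal sketch of the slope formula, the Python below computes b from the two sums. The data are hypothetical (not the lesson's plasma volume data) and the function name `slope` is an assumption; in Excel the same result comes from the SLOPE function.

```python
# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

def slope(xs, ys):
    """Least squares slope: sum of (x - x_bar)(y - y_bar) over sum of (x - x_bar)^2."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    denominator = sum((xi - x_bar) ** 2 for xi in xs)
    return numerator / denominator

b = slope(weight, volume)
print(f"b = {b:.4f} liters per kg")
```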

30 b for plasma (Y) and body weight (X) example: [Table: columns X, Y, (X − X̄), (Y − Ȳ), (X − X̄)(Y − Ȳ) and (X − X̄)², with a Mean row and a Sum row.]

31 Slope of regression line: The slope is the sum of (X − X̄)(Y − Ȳ) divided by the sum of (X − X̄)², both taken from the previous slide. For the plasma volume data this gives b = 0.0436 (rounded to 4 decimal places). Interpretation of the slope: for every one unit increase in X, the expected increase in Y is 0.0436 units. Plasma volume increases 0.0436 liters for every one kg increase in body weight. The slope is positive, indicating that as body weight (X) increases, plasma volume (Y) also increases.

32 Calculating the Intercept of the regression line: The intercept a of the regression line is the estimated value of Y when X = 0. a is calculated from the average value of Y, the average value of X and the estimated slope b by the following formula: $a = \bar{Y} - b\bar{X}$.
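
The intercept formula can be sketched the same way. The data below are hypothetical (not the lesson's plasma volume data) and the function name `fit_line` is an assumption; in Excel the INTERCEPT function returns the same quantity.

```python
# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

def fit_line(xs, ys):
    """Return (a, b): intercept and slope of the least squares line."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
         / sum((xi - x_bar) ** 2 for xi in xs))
    a = y_bar - b * x_bar   # intercept: mean of Y minus slope times mean of X
    return a, b

a, b = fit_line(weight, volume)
print(f"Y-hat = {a:.2f} + {b:.4f} X")
```

Because a is defined as the mean of Y minus b times the mean of X, the fitted line always passes through the point of means.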

33 Intercept for Plasma Volume Example: [Table: X̄, Ȳ, b and a for the plasma volume data.] The intercept is the estimated expected value of Y when X = 0. Intercepts do not always have realistic interpretations. In this example, the intercept is the predicted plasma volume when body weight = 0 kg, which is not a possibility.

34 Regression Line Equation: Once the slope and the intercept have been calculated, the regression equation can be constructed: $\hat{Y} = a + bX$. This is the equation that will be used to predict plasma volume (L) from body weight (kg). The regression equation calculated from sample data is an estimate of the true population regression equation.

35 Regression Line Equation and interpretation of the slope: A 1 unit increase in X for these data = 1 kg, so the interpretation of the slope in this regression line equation is: for each 1 kg increase in body weight, the expected increase in plasma volume is 0.0436 liters. What is the expected plasma volume increase for a 10 kg increase in body weight? For a 10 kilogram increase in body weight, the expected increase in plasma volume = 10 × 0.0436 = 0.436 liters.

36 What if the slope of the regression line is negative? If the slope of the regression line is negative, we would expect a decrease in Y with each unit increase in X. The slope is a measure of the expected change in Y for each 1-unit increase in X. If the slope is positive, the expected change in Y is an increase. If the slope is negative, the expected change in Y is a decrease.

37 Regression Coefficients in Excel: Excel has functions to calculate the slope and the intercept of the least squares regression line. The SLOPE function returns b, the slope: =SLOPE(y-range, x-range). The INTERCEPT function returns a, the intercept: =INTERCEPT(y-range, x-range). For both of these functions, enter the y-range of the data first and then the x-range of the data.

38 Plasma Volume Example in Excel: The Lesson 15 Excel Module works through the plasma volume / body weight regression example: create a scatterplot of the data; work through the calculations of the slope and intercept of the regression line; use the Excel SLOPE and INTERCEPT functions. After you've worked through the calculations once, use the Excel functions to find the slope and intercept for future regression problems.

39 Residuals: The residual is the difference between the observed (Y) and the expected ($\hat{Y}$) value of Y: Residual = $Y - \hat{Y}$. Y is the observed Y for any X. $\hat{Y}$ is the Y-value on the regression line for that value of X. The residual is the component of Y that is not predicted by X. The least squares regression line is the line that minimizes the sum of the squared residuals.
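
A short sketch of the residual calculation follows. It uses hypothetical data (not the lesson's plasma volume data), and the variable names are assumptions chosen for readability.

```python
# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

# Fit the least squares line Y-hat = a + b*X.
x_bar = sum(weight) / len(weight)
y_bar = sum(volume) / len(volume)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, volume))
     / sum((x - x_bar) ** 2 for x in weight))
a = y_bar - b * x_bar

# Residual = observed Y minus the Y-value on the line for the same X.
fitted = [a + b * x for x in weight]
residuals = [y - y_hat for y, y_hat in zip(volume, fitted)]

for x, y, y_hat, res in zip(weight, volume, fitted, residuals):
    print(f"X={x:5.1f}  Y={y:.2f}  Y-hat={y_hat:.2f}  residual={res:+.2f}")
```

For a least squares line with an intercept, the residuals sum to zero apart from floating-point rounding, which is a quick sanity check on the arithmetic.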

40 Residuals for Plasma Volume Example: [Table: X, Y, $\hat{Y}$, residual.] Calculate $\hat{Y}$, the expected value of Y, using the regression line equation. The residual is the difference between Y and $\hat{Y}$. Which point is closest to the regression line? Which point is furthest from the regression line? (74, 3.37) has the smallest residual and (70.5, 3.49) has the largest residual.

41 Regression Line and Residuals: [Figure: regression line of plasma volume (L) against body weight (kg), with the largest and smallest residuals labeled.]

42 Analysis of Residuals: A residual plot is a plot of the residual values on the y-axis and the x-values on the x-axis. The least squares residuals are uncorrelated with X by construction, so if there is a linear relationship between X and Y the residual plot will be a random scatter of points with no evident pattern. A nonlinear relationship between X and Y will be more evident in the residual plot of the (X, residual) data than in the scatterplot of the original (X, Y) data. The Excel Regression analysis tool has an option for selecting the residual plot. The residual plot for the plasma volume example is on the following slide.
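
To see what a residual plot reveals for curved data, the sketch below fits a straight line to hypothetical points that follow y = x² exactly (an assumption made purely for illustration). The correlation between X and the least squares residuals is zero here too, yet the residuals show a clear U-shaped pattern, which is the signal of nonlinearity to look for.

```python
# Hypothetical curvilinear data: y = x squared, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x ** 2 for x in xs]

# Fit the least squares straight line to the curved data.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Cross-product of centered X and the residuals: zero means no linear correlation.
cross_product = sum((x - x_bar) * res for x, res in zip(xs, residuals))

for x, res in zip(xs, residuals):
    print(f"X={x:.0f}  residual={res:+.1f}")
print(f"sum of (X - X-bar) * residual = {cross_product:.1f}")
```

The residuals are positive at both ends of the X range and negative in the middle, so the points are not evenly scattered around 0 even though their linear correlation with X is zero.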

43 Residual Plot for Plasma Volume and Body Weight Data: [Figure: residual plot, residuals against body weight (kg).] No evidence of nonlinearity. The points are equally distributed around the value 0 with no evident positive or negative slope.

44 (X, Y) Scatterplot for a nonlinear (or curvilinear) relationship: When there is a curvilinear relationship between X and Y, the least squares regression line does not represent the relationship.

45 Residual Plot for Curvilinear Relationship: [Figure: residual plot, residuals against X.] This is the residual plot for the relationship on the previous slide. It illustrates that the relationship is not linear. The residual plot points aren't evenly distributed around the value 0.

46 Regression analysis for curvilinear relationships: Simple linear regression analysis should not be used when X and Y have a curvilinear relationship. There are several strategies for dealing with a curvilinear relationship between X and Y. One option is to try a logarithmic transformation of the data to see if this improves the linear relationship. Another option is to use piecewise regression: fit one regression line to the increasing portion of the curve and a second regression line to the decreasing portion of the curve. A third option is to include $X^2$ or $X^3$ in the regression equation (covered in PubH 6415 with multiple regression models).

47 Linear Regression Procedure: 1. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis, and add the trend line to the plot. 2. Estimate the regression line equation: find the slope and intercept of the regression line and check the residuals. 3. Is the relationship between X and Y statistically significant? Use a t-test of the slope to determine significance. 4. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. 5. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

48 Is the relationship between X and Y significant? If the slope of the regression line = 0, there is no linear relationship between the variables. If there is no linear relationship, the variables are considered to be independent. A t-test of the slope estimate can be done to test for independence between the X and Y variables. Null hypothesis: slope = 0 (the variables are independent). Alternative hypothesis: slope ≠ 0 (there is a significant relationship between the variables). If the result of the t-test of the slope is significant (p-value < 0.05), reject the null hypothesis and conclude that there is a statistically significant relationship between the two variables.

49 Notation for Population slope and Intercept: As in any hypothesis test, the null and alternative hypotheses are stated about the population parameters, not about the estimates. The population parameters for the slope and intercept of the regression line are written with the Greek letters $\beta_1$ and $\beta_0$: $\beta_1$ is the population parameter for the slope and $\beta_0$ is the population parameter for the intercept. The statistic for the t-test of the slope will use the estimated value of the slope (b) that is calculated from the data.

50 t-test of the Slope: 1. State the hypotheses. Null hypothesis: $\beta_1 = 0$. Alternative hypothesis: $\beta_1 \ne 0$. 2. A t-test will be used to test the hypothesis. 3. Significance level = 0.05. 4. The degrees of freedom for a t-test of the slope are n − 2, where n = sample size. The critical values of the t-test are found using TINV(0.05, df). For the plasma volume example, n = 8, so the critical values are ±TINV(0.05, 6) = ±2.447.

51 t-test of the slope: 5. Calculate the test statistic: the slope estimate divided by the standard error of the slope, $t = \dfrac{b}{SE(b)}$. The formula for the SE of the slope is complicated, so we will use the Excel Data Analysis Tool to do this t-test. The Data Analysis Tool provides the t-statistic and the p-value of the t-test of the slope. 6. State the conclusion. If the test statistic is more extreme than the critical values, reject the null hypothesis and conclude that there is a significant relationship between the variables.
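
For readers who want to see the arithmetic behind the test statistic, here is a hedged sketch. The lesson does not give the standard error formula; the code assumes the standard textbook form, SE(b) = sqrt(SSE / (n − 2)) / sqrt(sum of (x − x̄)²), and uses hypothetical data rather than the plasma volume data. Python's standard library has no t-distribution, so the p-value step is left to a tool such as the Excel Data Analysis Tool.

```python
import math

# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(volume) / n
sxx = sum((x - x_bar) ** 2 for x in weight)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, volume)) / sxx
a = y_bar - b * x_bar

# Residual sum of squares and the residual standard error on n - 2 df.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, volume))
residual_se = math.sqrt(sse / (n - 2))

# Assumed standard formula for the standard error of the slope.
se_b = residual_se / math.sqrt(sxx)
t_stat = b / se_b

print(f"b = {b:.4f}, SE(b) = {se_b:.5f}, t = {t_stat:.3f}, df = {n - 2}")
```

The resulting t would then be compared with the critical values for n − 2 degrees of freedom, as in step 6 of the procedure.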

52 t-test of the Slope in Excel: Data Analysis Tool output for the weight / plasma volume example, with the t-statistic and p-value for the t-test of the slope highlighted. [Table: Excel SUMMARY OUTPUT. Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (8). ANOVA: df, SS, MS, F and Significance F for Regression, Residual and Total. Coefficient table: Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95% for Intercept and Body weight.] The p-value for the t-test of the slope is less than 0.05, so reject the null hypothesis and conclude that there is a significant relationship between weight and plasma volume.

53 Regression Analysis in Excel: In Excel Module 15, use the Data Analysis Tool to obtain the regression analysis results. Select Regression under the Data Analysis Tool. Enter the plasma volume data for the Y-range and the weight data for the X-range. Check Labels if you highlight the column headers. Also check Residuals and Residual Plot. Identify the t-statistic and the p-value for the t-test of the slope. Also identify the slope and the intercept on the output table; these are under the Coefficients column. 95% confidence intervals for the coefficients are also provided if the Confidence Level box is checked.

54 t-test of the Intercept: The Data Analysis Tool also provides results of a t-test of the intercept. The null hypothesis of this test is that the intercept = 0: $\beta_0 = 0$. The alternative hypothesis of this test is that the intercept ≠ 0: $\beta_0 \ne 0$. Usually there is not much interest in the t-test of the intercept, because testing whether the intercept = 0 does not provide information about the relationship between the two variables. From the regression table, you can see that the null hypothesis for the intercept = 0 is not rejected, because the p-value is greater than 0.05. This result does not affect the significant result of the t-test of the slope.

55 Linear Regression Procedure: 1. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis, and add the trend line to the plot. 2. Estimate the regression line equation: find the slope and intercept of the regression line. 3. Is the relationship statistically significant? Use a t-test of the slope to determine significance. 4. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. 5. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

56 How well does the regression line equation fit the data? $r^2$ is the notation for the coefficient of determination. $r^2$ is equal to the correlation coefficient (r) squared. It can range from 0 to 1. Interpretation of $r^2$: $r^2$ is the proportion of variation in the dependent variable (Y) that is explained by the estimated least squares regression equation. Larger values of $r^2$ indicate a better fit of the regression line to the data, which indicates a more useful predictive model.
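
The two descriptions of $r^2$ (the correlation coefficient squared, and the proportion of variation in Y explained by the line) can be checked against each other. The sketch below uses hypothetical data, not the plasma volume data, and computes the explained proportion as 1 minus the residual sum of squares divided by the total sum of squares, which is the usual way to express it and an assumption about the intended calculation.

```python
import math

# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

x_bar = sum(weight) / len(weight)
y_bar = sum(volume) / len(volume)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, volume))
sxx = sum((x - x_bar) ** 2 for x in weight)
syy = sum((y - y_bar) ** 2 for y in volume)   # total variation in Y

# Route 1: square the correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Route 2: proportion of the variation in Y explained by the fitted line.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, volume))
r_squared_from_fit = 1 - sse / syy

print(f"r^2 from r:   {r_squared_from_r:.4f}")
print(f"r^2 from fit: {r_squared_from_fit:.4f}")
```

Both routes give the same number, which is why Excel's "R Square" can be read either way for simple linear regression.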

57 Calculating $r^2$: In Excel, you can use the CORREL function to find the correlation coefficient and square this value to find the coefficient of determination. For the plasma / weight data, r ≈ 0.759, so $r^2$ ≈ 0.576. Or you can find $r^2$ in the Data Analysis Tool output under Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 8): Multiple R = the correlation coefficient; R Square = the coefficient of determination ($r^2$).

58 Interpretation of $r^2$: For the plasma volume example, $r^2$ = 0.576. Interpretation: 57.6% of the variation in plasma volume is explained by the regression line equation with weight as the explanatory variable. Since only 57.6% of the variation in plasma volume is explained by body weight, there are likely other variables that explain some of the variation in plasma volume. Multiple regression analysis uses more than one explanatory variable to predict the dependent variable; this is covered in PubH 6415. If there are other explanatory variables significantly related to plasma volume in a multiple regression model, $r^2$ will increase.

59 Linear Regression Procedure: 1. Look at a scatter plot of the data (we have done this): plot Y on the y-axis and X on the x-axis. Does the relationship appear to be linear? 2. Estimate the regression line equation (we have done this): find the slope and intercept of the regression line. 3. Is the relationship statistically significant? Use a t-test of the slope to determine significance. 4. How well does the estimated regression line equation fit the data? (We have done this.) Calculate $R^2$, the coefficient of determination. 5. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

60 Using the Regression Line equation for Prediction: The regression line equation for the weight and plasma volume data has the form $\hat{Y} = a + bX$. For a given value of weight (X), the plasma volume (Y) can be predicted. What is the expected plasma volume for an individual who weighs 60 kg? Insert 60 in the equation in place of X and solve for $\hat{Y}$: $\hat{Y} = a + b \times 60 \approx 2.7$ liters.

61 Predicting plasma volume for weight = 60 kg: [Figure: regression line of plasma volume (liters) against body weight (kg).] The predicted plasma volume for weight = 60 kg is the point on the regression line corresponding to x = 60. This point is 2.7 liters.

62 Appropriate Applications of the Regression Line Equation: Predictions using regression line equations are only valid within the range of x-values in the collected data. For the example data, predictions should stay within the range of body weights observed in the sample. It would not be appropriate to use this regression line equation to predict plasma volume for an individual weighing 100 kg or an individual weighing 25 kg. There may be a different relationship between weight and plasma volume beyond the values of the collected data, so the relationship identified by the regression line equation should not be extrapolated much beyond the range of the X values.
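
One way to make the "stay within the observed range" rule concrete is a prediction helper that refuses to extrapolate. This is a sketch under assumptions: the data are hypothetical (not the plasma volume data), and the function names `fit_line` and `predict` and the choice to raise an error outside the range are illustrative design decisions, not part of the lesson.

```python
# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

def fit_line(xs, ys):
    """Return (a, b): intercept and slope of the least squares line."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def predict(x_new, xs, ys):
    """Predict Y at x_new, but only inside the range of the observed X values."""
    low, high = min(xs), max(xs)
    if not low <= x_new <= high:
        raise ValueError(f"X = {x_new} is outside the observed range {low} to {high}")
    a, b = fit_line(xs, ys)
    return a + b * x_new

print(f"predicted volume at 60 kg: {predict(60.0, weight, volume):.2f} liters")

try:
    predict(100.0, weight, volume)
except ValueError as err:
    print(f"refused: {err}")
```

Raising an error is deliberately strict; a softer variant could return the prediction together with a warning when X is only slightly outside the observed range.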

63 More cautions about application of Regression line predictions: Predictions using regression line equations are only valid for the population represented by the sample data. For example, if data for a regression analysis are collected for girls aged 10-18, predictions using the equation are not necessarily valid for boys, adults or girls younger than 10. You can't assume that the relationship between two variables in one population is the same in other populations. Read the study description carefully to identify the population that was sampled. Regression analysis inferences are valid for this population but not necessarily for other populations.

64 What if there isn't a significant relationship between the variables? If regression analysis reveals that there is NOT a significant relationship between the two variables (that is, if the p-value for the t-test of the slope > 0.05), the regression equation is not useful for predicting values of the dependent variable from the independent variable. If the t-test of the slope is NOT significant, end the regression analysis procedure and do not use the regression line equation for prediction. Prediction using the regression line equation is only useful if the null hypothesis of independence between the variables is rejected.

65 Relationship between Correlation and Regression: The correlation coefficient and the slope of the regression line are related. For a given set of data, they will both have the same sign, indicating the direction of the relationship (positive or negative). There is a mathematical relationship between the slope and the correlation coefficient: the slope of the regression line is equal to the correlation coefficient times the standard deviation of y divided by the standard deviation of x: $b = r \dfrac{s_y}{s_x}$.
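
The identity between the slope and the correlation coefficient is easy to verify numerically. The sketch below uses hypothetical data (not the plasma volume data) and `statistics.stdev` from the Python standard library for the sample standard deviations.

```python
import math
import statistics

# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

x_bar = statistics.mean(weight)
y_bar = statistics.mean(volume)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, volume))
sxx = sum((x - x_bar) ** 2 for x in weight)
syy = sum((y - y_bar) ** 2 for y in volume)

b_direct = sxy / sxx                       # slope from the least squares formula
r = sxy / math.sqrt(sxx * syy)             # Pearson correlation coefficient
s_x = statistics.stdev(weight)             # sample standard deviation of X
s_y = statistics.stdev(volume)             # sample standard deviation of Y
b_from_r = r * s_y / s_x                   # slope from r, s_y and s_x

print(f"slope from formula: {b_direct:.6f}")
print(f"slope from r:       {b_from_r:.6f}")
```

Since the standard deviations are always positive, the identity also shows why b and r must share the same sign.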

66 Hypothesis Test of the population correlation coefficient: We can set up a hypothesis test of independence for the population correlation. Null hypothesis: $\rho = 0$, no significant linear association between the variables. Alternative hypothesis: $\rho \ne 0$, a significant linear association between the variables. The test statistic is a t-statistic with n − 2 df: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$. After finding the t-statistic, you can use Excel to find the p-value = TDIST(t, n-2, 2).

67 t-test of the correlation coefficient: For a given set of sample data, the t-test for $\rho$ and the t-test for the slope, $\beta_1$, will have the same t-statistic and p-value. For the plasma volume data, the t-statistic for the test of the population correlation coefficient is the same as the t-statistic for the slope of the regression line. You can work through the equation in Excel to confirm this, using p-value = TDIST(t, 6, 2). The same conclusion is reached from either hypothesis test: there is a significant relationship between the two variables. The p-value < 0.05, so the null hypothesis of independence is rejected at significance level 0.05.
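
The claim that the two tests share a t-statistic can be confirmed with a short calculation. The sketch uses hypothetical data (not the plasma volume data) and, for the slope test, assumes the standard textbook standard error SE(b) = sqrt(SSE / (n − 2)) / sqrt(sum of (x − x̄)²), which the lesson does not spell out.

```python
import math

# Hypothetical illustration data, NOT the lesson's plasma volume data set.
weight = [55.0, 60.0, 65.0, 70.0, 75.0]   # X: body weight (kg)
volume = [2.6, 2.8, 2.9, 3.2, 3.3]        # Y: plasma volume (liters)

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(volume) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, volume))
sxx = sum((x - x_bar) ** 2 for x in weight)
syy = sum((y - y_bar) ** 2 for y in volume)

# t-statistic from the correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t-statistic from the slope and its (assumed standard) standard error.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, volume))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_from_slope = b / se_b

print(f"t from r:     {t_from_r:.4f}")
print(f"t from slope: {t_from_slope:.4f}")
```

Both values agree up to floating-point rounding, so either one can be passed to a t-distribution function such as TDIST with n − 2 degrees of freedom.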

68 Linear Regression and Correlation: which to use? Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables. Use correlation analysis when the interest is primarily in identifying whether a relationship exists. Use the t-test of the correlation coefficient to determine if the relationship is significant. Use regression analysis to identify a relationship AND to predict the value of one variable given a value of the other variable. Use the t-test of the slope to determine if the relationship is significant. Regression analysis is most useful when there is an identified interest in predicting one variable from the other(s). If prediction doesn't make sense, use correlation analysis.

69 Readings and Assignments: Reading: Chapter 8 pgs , 194, Complete the Lesson 15 Practice Exercises. Lesson 15 Excel Modules: Excel Module 15: Plasma Volume works through the example in this lesson; Excel Module 15: BMI works through the example in the text (pages , 206, 209). Complete OPTIONAL Homework 11: use the Data Analysis Tool for the linear regression problems.


More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Homework 11. Part 1. Name: Score: / null

Homework 11. Part 1. Name: Score: / null Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5 Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

Module 5: Statistical Analysis

Module 5: Statistical Analysis Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

Using Microsoft Excel for Probability and Statistics

Using Microsoft Excel for Probability and Statistics Introduction Using Microsoft Excel for Probability and Despite having been set up with the business user in mind, Microsoft Excel is rather poor at handling precisely those aspects of statistics which

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

More information

Regression and Correlation

Regression and Correlation Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

More information

Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

Formula for linear models. Prediction, extrapolation, significance test against zero slope. Formula for linear models. Prediction, extrapolation, significance test against zero slope. Last time, we looked the linear regression formula. It s the line that fits the data best. The Pearson correlation

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

More information

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7 Using Your TI-83/84/89 Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. ch12 practice test 1) The null hypothesis that x and y are is H0: = 0. 1) 2) When a two-sided significance test about a population slope has a P-value below 0.05, the 95% confidence interval for A) does

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

An analysis method for a quantitative outcome and two categorical explanatory variables.

An analysis method for a quantitative outcome and two categorical explanatory variables. Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller

More information

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices: Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

You have data! What s next?

You have data! What s next? You have data! What s next? Data Analysis, Your Research Questions, and Proposal Writing Zoo 511 Spring 2014 Part 1:! Research Questions Part 1:! Research Questions Write down > 2 things you thought were

More information

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

Using Excel for Statistical Analysis

Using Excel for Statistical Analysis Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

More information

Chapter 7. One-way ANOVA

Chapter 7. One-way ANOVA Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online

More information