- Myron Wilkins
- 7 years ago
1 Lesson 15 Linear Regression
2 Lesson 15 Outline
Review correlation analysis
Dependent and independent variables
Least squares regression line
Calculating the slope
Calculating the intercept
Residuals and residual plots
Identifying a significant relationship: t-test of the slope
R²: coefficient of determination
Using the regression line for prediction of Y from X
Relationship between the correlation coefficient and linear regression
3 Linear Regression and Correlation
Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables.
Correlation analysis is used when the interest is in identifying whether a relationship exists and quantifying its strength.
Regression analysis is used to identify a relationship AND to predict the value of one variable given a value of the other variable(s).
4 Review: Correlation Analysis
1. Plot the data using a scatter plot to get a visual idea of the relationship.
2. Calculate the correlation coefficient.
   1. Use Pearson's correlation coefficient if both variables are continuous.
   2. Use the Spearman rank correlation coefficient if both variables are ordinal, or if one is ordinal and the other continuous.
5 Review: Scatter Plots and Association
Plot the 2 variables in a scatter plot (EXCEL).
The pattern of the dots in the plot indicates the statistical relationship between the variables (the strength and the direction).
Positive relationship: the pattern goes from lower left to upper right.
Negative relationship: the pattern goes from upper left to lower right.
The more the dots cluster around a straight line with a positive or negative direction, the stronger the linear relationship.
6 Review: Correlation Coefficient
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
The statistic r is called the correlation coefficient.
r estimates the population correlation coefficient $\rho$ (the Greek letter rho).
The correlation coefficient provides a measure of the linear association between two variables.
r is always between -1 and 1.
7 Review: Correlation Coefficient in Excel
Use the CORREL function to find the correlation coefficient.
If data for one variable are in cells A1:A12 and data for the other variable are in cells B1:B12, =CORREL(A1:A12,B1:B12) will return the Pearson correlation coefficient.
Correlation coefficients closer to -1 or 1 indicate a stronger linear relationship.
Correlation coefficients close to 0 indicate a weak linear relationship. However, there could be a nonlinear relationship when the correlation coefficient is close to 0.
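For readers working outside Excel, the correlation formula can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data pairs are assumptions made up for this example, not the lesson's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient (the quantity Excel's CORREL returns)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

The result lies between -1 and 1, as the review slide states, and is positive here because y tends to increase with x.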
8 Simple Linear Regression
Like correlation analysis, linear regression analysis is a technique used to explore the relationship between two continuous random variables that have a linear relationship.
Regression analysis allows us to investigate the change in one variable that corresponds to a given change in the other variable.
If only ONE variable is used to predict the value of the other variable, the analysis is called simple linear regression.
When two or more variables are used to predict the value of the other variable, the analysis is called multiple linear regression (not covered in this course).
9 Linear Regression: Background
Regression is from a Latin root meaning "going back".
Linear regression as a statistical method was first described by Sir Francis Galton in his paper "Regression Towards Mediocrity in Hereditary Stature", published in The Journal of the Anthropological Institute, 1886.
Galton described the relationship between mid-parent height (the average of the two parents' heights) and the height of their offspring.
Parents with a taller mid-parent height had children with heights closer to the average height.
Parents with a shorter mid-parent height had children with heights closer to the average height.
Galton called this phenomenon "regression towards mediocrity".
10 Sir Francis Galton: Regression
"When mid-parents are taller than mediocrity, their children tend to be shorter than they", and "when mid-parents are shorter than mediocrity, their children tend to be taller than they."
11 Variables in Simple Linear Regression Analysis
Dependent or response variable: a variable to be predicted from, or explained by, the other variable.
The response variable is typically labeled Y.
Y is a continuous variable in simple linear regression.
Independent or explanatory variable: the variable used to predict the dependent variable.
This variable is typically labeled X.
X can also be called the predictive variable or the regressor variable.
For simple linear regression, X is a continuous variable.
For multiple linear regression, X can be continuous or categorical.
12 Identifying independent and dependent variables
In regression analysis, it's important to correctly identify the dependent (Y) and independent (X) variables.
The study description should provide you with information about which is the dependent variable and which is the independent variable.
If the study description states that the goal is to predict variable 1 from variable 2, then variable 1 is the dependent variable (Y) and variable 2 is the independent variable (X).
Typically, if the variables are separated in time, the variable collected first is the independent variable (X) and the variable collected later is the dependent variable (Y).
In Galton's regression analysis, the mid-parent height was the independent variable and the offspring height was the dependent variable.
13 Linear Regression Overview
Look at a scatter plot of the data.
  Plot Y on the y-axis and X on the x-axis.
  Does the relationship appear to be linear?
Estimate the regression line equation.
  Find the slope and intercept of the regression line.
  Check residuals.
Is the relationship statistically significant?
  Use a t-test of the slope to determine significance.
How well does the estimated regression line equation fit the data?
  Calculate R², the coefficient of determination.
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
14 Simple Linear Regression: An Example
Is there a linear relationship between body weight and plasma volume that can be used to predict plasma volume from weight?
Plasma volume is the dependent variable Y, since we are interested in predicting it from body weight, the independent variable X.
[Table: subject, body weight (kg), plasma volume (l)]
15 Scatter Plot of the Data
[Figure: scatter plot of plasma volume (liters) against body weight (kg)]
There is a positive relationship between plasma volume and body weight.
With this small number of data points it is difficult to see the linear relationship, but there is a general linear trend to the data.
We want to identify a line that has a good fit to the data.
This isn't a deterministic relationship, so the points won't fall perfectly on the line.
16 Estimate the Regression Line Equation
[Figure: scatter plot of plasma volume (liters) against body weight (kg) with several candidate lines]
A few of the many possible lines through the data points are illustrated in the plot.
How do we decide which line best fits the data?
17 Least Squares Regression Line
The linear regression line is the line that gets closest to all of the points. This is called the least squares regression line.
The least squares regression line minimizes the sum of the squares of the vertical distances between each observed data point ($y_i$) and the line:
$$\text{minimize} \sum_{i=1}^{n} \left(y_i - \text{point on line}_i\right)^2$$
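The least squares criterion can be checked numerically by comparing the sum of squared vertical distances for a few candidate lines. The sketch below uses made-up data and a hypothetical helper, `sum_squared_distances`; the line Y' = 2.2 + 0.6X is the least squares line for these particular numbers, and the two other lines are arbitrary alternatives.

```python
def sum_squared_distances(x, y, intercept, slope):
    """Sum of squared vertical distances between the points and a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares line for these data: Y' = 2.2 + 0.6 X
best = sum_squared_distances(x, y, 2.2, 0.6)
# Two other candidate lines through the same cloud of points
flatter = sum_squared_distances(x, y, 2.5, 0.5)
steeper = sum_squared_distances(x, y, 1.0, 1.0)

print(best, flatter, steeper)
```

Both alternative lines give a larger sum of squares than the least squares line, which is what "best fit" means under this criterion.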
18 Vertical Distances
[Figure: scatter plot of plasma volume (L) against body weight (kg) with a fitted line; the vertical distances between each observed Y ($y_i$) and the line are drawn in red]
The sum of these squared distances is minimized by the least squares regression line.
19 Least Squares Regression Line Equation
The equation for a line requires a slope and an intercept.
In regression analysis, we estimate the population regression line with the least squares regression line calculated from sample data: the sample regression line.
The notation for the slope and intercept in the population regression line uses Greek letters: $\beta_0$ for the intercept and $\beta_1$ for the slope.
The notation for the slope and intercept in the sample regression line uses Roman letters: a for the intercept and b for the slope.
20 The Population Regression Line
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
$\beta_0$ is the y-intercept of the line.
$\beta_1$ is the slope of the regression line.
$\varepsilon$ is the error term: the difference between the observed Y and the regression line.
21 Sample Regression Line
$\beta_0$ and $\beta_1$ are population parameters.
Sample estimates for the regression parameters are:
a is the estimate for $\beta_0$
b is the estimate for $\beta_1$
$Y' = a + bX$ is the regression line calculated from sample data.
Y' is the predicted value of Y.
22 Least Squares Regression Line
a and b are estimates of the regression coefficients $\beta_0$ and $\beta_1$.
The regression coefficients are estimated from the sample data by the least squares method.
The intercept a is the estimated expected value of Y when X = 0.
The slope b is the estimated expected change in Y corresponding to a 1 unit increase in X.
Y' is the expected (or predicted) value of Y, the point on the line. It is called the fitted value of Y.
The following slide illustrates the least squares regression line.
23 The Equation of a Regression Line
[Figure: the line Y' = a + bX on x-y axes; a is the intercept at x = 0 and b is the slope, the change in Y' for a one-unit change in X]
24 Interpretation of Predicted Values of Y
The predicted value of y is the expected y-value.
Since not all observed data points are exactly on the regression line, there is a range of possible y-values (a distribution) for each x-value.
In regression analysis, the distribution of y-values for each x-value is assumed to be a normal distribution.
The predicted values of y represent the mean values of the distributions of y for each specified value of x.
The following slide illustrates this for 3 values of X: notice that the mean of each distribution is on the regression line (the predicted value of y) and that the distributions of y-values are normal distributions.
25 Simple Linear Regression Model Illustrated
[Figure: normal distributions of Y centered on the regression line at three values of X]
26 Assumptions for Regression Analysis
There are several assumptions that should be met for regression analysis:
For each value of X, the Y variable is assumed to have a normal distribution; the mean of the normal distribution is the predicted value, Y'.
The normal distributions are assumed to have equal variance across the entire range of X values. This assumption is called homogeneity or homoscedasticity.
The predicted values of Y fall on the regression line representing the linear relationship between X and Y.
The Y observations are assumed to be independent.
The observations are from a random sample.
27 Interpretation of the Slope of the Regression Line
The slope b is the expected change in Y corresponding to a 1 unit increase in X.
b = 0: There is no linear association between Y and X.
b > 0: There is a positive linear association between Y and X (as X increases, the expected value of Y increases).
b < 0: There is a negative linear association between Y and X (as X increases, the expected value of Y decreases).
The following slide illustrates a positive, a negative and a 0 slope.
28 Illustration of Negative and Positive Slopes and Slope = 0
[Figure: three lines on x-y axes, one with b > 0, one with b = 0 and one with b < 0]
29 Calculating the Slope of the Regression Line
The formula to calculate the slope of the least squares regression line is given below:
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Notice that the numerator is the same as the numerator in the formula for the correlation coefficient.
30 b for the plasma volume (Y) and body weight (X) example
[Table: columns X, Y, (X - Xbar), (Y - Ybar), (X - Xbar)(Y - Ybar), (X - Xbar)², with Mean and SUM rows]
31 Slope of the Regression Line
From the previous slide, b is the sum of (X - Xbar)(Y - Ybar) divided by the sum of (X - Xbar)².
Interpretation of the slope: for every one unit increase in X, the expected increase in Y is 0.0436 units (rounded to 4 decimal places).
Plasma volume increases 0.0436 liters for every one kg increase in body weight.
The slope is positive, indicating that as body weight (X) increases, plasma volume (Y) also increases.
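The column-by-column calculation of the slope can be sketched as follows. The numbers are made up for illustration and are not the plasma volume data, whose table values are not reproduced here; the variable names are likewise assumptions.

```python
# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# One row per observation, mirroring the columns of the worksheet:
# X, Y, (X - Xbar), (Y - Ybar), (X - Xbar)(Y - Ybar), (X - Xbar)^2
rows = [
    (xi, yi, xi - x_bar, yi - y_bar, (xi - x_bar) * (yi - y_bar), (xi - x_bar) ** 2)
    for xi, yi in zip(x, y)
]

sum_xy = sum(row[4] for row in rows)  # sum of cross-products
sum_xx = sum(row[5] for row in rows)  # sum of squared X deviations
b = sum_xy / sum_xx                   # slope of the least squares line

print(sum_xy, sum_xx, b)
```

For these numbers, each one-unit increase in X corresponds to an expected increase of b units in Y.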
32 Calculating the Intercept of the Regression Line
The intercept a of the regression line is the estimated value of Y when X = 0.
a is calculated from the average value of Y, the average value of X and the estimated slope b by the following formula:
$$a = \bar{Y} - b\bar{X}$$
33 Intercept for Plasma Volume Example
[Table: mean of X, mean of Y, slope b and the calculated intercept a]
The intercept is the estimated expected value of Y when X = 0.
Intercepts do not always have realistic interpretations. In this example, the intercept is the predicted plasma volume when body weight = 0 kg, which is not a possible body weight.
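A short sketch of the intercept formula, again on made-up numbers rather than the plasma volume data. It also checks a consequence of the formula a = Ybar - b·Xbar: the fitted line passes through the point of means (Xbar, Ybar).

```python
# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope from the least squares formula
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Intercept: a = Ybar - b * Xbar
a = y_bar - b * x_bar

# The fitted value at X = Xbar equals Ybar
fitted_at_mean = a + b * x_bar

print(round(a, 4), round(b, 4))
```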
34 Regression Line Equation
Once the slope and the intercept have been calculated, the regression equation can be constructed:
$$Y' = a + bX$$
This is the equation that will be used to predict plasma volume (l) from body weight (kg).
The regression equation calculated from sample data is an estimate of the true population regression equation.
35 Regression Line Equation and Interpretation of the Slope
A 1 unit increase in X for this data = 1 kg, so the interpretation of the slope in this regression line equation is: for each 1 kg increase in body weight, the expected increase in plasma volume is 0.0436 liters.
What is the expected plasma volume increase for a 10 kg increase in body weight?
For a 10 kilogram increase in body weight, the expected increase in plasma volume = 10 × 0.0436 = 0.436 liters.
36 What if the slope of the regression line is negative?
If the slope of the regression line is negative, we would expect a decrease in Y with each unit increase in X.
The slope is a measure of the expected change in Y for each 1-unit increase in X.
If the slope is positive, the expected change in Y is an increase.
If the slope is negative, the expected change in Y is a decrease.
37 Regression Coefficients in Excel
Excel has functions to calculate the slope and the intercept of the least squares regression line:
The SLOPE function returns b, the slope: =SLOPE(y-range, x-range)
The INTERCEPT function returns a, the intercept: =INTERCEPT(y-range, x-range)
For both of these functions, enter the y-range of data first and then the x-range of the data.
38 Plasma Volume Example in Excel
The Lesson 15 Excel Module works through the plasma volume / body weight regression example:
Create a scatterplot of the data.
Work through the calculations of the slope and intercept of the regression line.
Use the Excel SLOPE and INTERCEPT functions.
After you've worked through the calculations once, use the Excel functions to find the slope and intercept for future regression problems.
39 Residuals
The residual is the difference between the observed (Y) and the expected (Y') value of Y.
Residual = Y - Y'
Y is the observed Y for any X.
Y' is the Y-value on the regression line for that value of X.
The residual is the component of Y that is not predicted by X.
The least squares regression line is the line that minimizes the sum of the squared residuals.
40 Residuals for Plasma Volume Example
[Table: columns X, Y, Y', Residual]
Calculate Y', the expected value of Y, using the regression line equation. The residual is the difference between Y and Y'.
Which point is closest to the regression line? (74, 3.37) has the smallest residual.
Which point is furthest from the regression line? (70.5, 3.49) has the largest residual.
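The residual table can be sketched in Python as below. The data are made up for illustration (they are not the plasma volume values), and "closest" and "furthest" are judged by the absolute size of the residual, which is an assumption about how the slide ranks the points.

```python
# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Y' for each X, then Residual = Y - Y'
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Points closest to and furthest from the line, by absolute residual
closest = min(zip(x, y, residuals), key=lambda row: abs(row[2]))
furthest = max(zip(x, y, residuals), key=lambda row: abs(row[2]))

print(closest[:2], furthest[:2])
```

For a least squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the calculation.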
41 Regression Line and Residuals
[Figure: regression line on the scatter plot of plasma volume (L) against body weight (kg), with the largest and smallest residuals marked]
42 Analysis of Residuals
A residual plot is a plot of the residual values on the Y-axis and the x-values on the X-axis.
If there is a linear relationship between X and Y, the correlation between X and the residuals should equal 0. The scatterplot will be a random scatter of points with no evident linear pattern.
A nonlinear relationship between X and Y will be more evident in the residual plot of the (X, residual) data than in the scatterplot of the original (X, Y) data.
The Excel Regression analysis tool has an option for selecting the residual plot.
The residual plot for the plasma volume example is on the following slide.
43 Residual Plot for Plasma Volume / Body Weight Data
[Figure: residual plot, residuals on the vertical axis against body weight (kg) on the horizontal axis]
No evidence of nonlinearity. The points are evenly distributed around the value 0 with no evident positive or negative slope.
44 (X, Y) Scatterplot for a Nonlinear (or Curvilinear) Relationship
[Figure: scatterplot of a curvilinear relationship with a fitted straight line]
When there is a curvilinear relationship between X and Y, the least squares regression line does not represent the relationship.
45 Residual Plot for Curvilinear Relationship
[Figure: residual plot, residuals on the vertical axis against X on the horizontal axis]
This is the residual plot for the relationship on the previous slide. It illustrates that the relationship is not linear. The residual plot points aren't evenly distributed around the value 0.
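The pattern a curved relationship leaves in the residuals can be seen on a tiny made-up example. Here Y is exactly X squared, an assumption chosen only to make the curvature obvious; a straight line is fitted anyway and the residuals are inspected.

```python
# Hypothetical curvilinear data: y = x squared (for illustration only)
x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residuals from the straight-line fit
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(residuals)
```

The residuals are positive at both ends of the X range and negative in the middle. A systematic U shape like this, rather than a random scatter around 0, is the signal that a straight line does not describe the relationship.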
46 Regression Analysis for Curvilinear Relationships
Simple linear regression analysis should not be used when X and Y have a curvilinear relationship.
There are several strategies for dealing with a curvilinear relationship between X and Y:
One option is to try a logarithmic transformation of the data to see if this improves the linear relationship.
Another option is to use piecewise regression: fit one regression line to the increasing portion of the curve and a second regression line to the decreasing portion of the curve.
A third option is to include X² or X³ in the regression equation (covered in PubH 6415 with multiple regression models).
47 Linear Regression Procedure
Look at a scatter plot of the data.
  Plot Y on the y-axis and X on the x-axis.
  Add the trend line to the plot.
Estimate the regression line equation.
  Find the slope and intercept of the regression line.
  Check residuals.
Is the relationship between X and Y statistically significant?
  Use a t-test of the slope to determine significance.
How well does the estimated regression line equation fit the data?
  Calculate R², the coefficient of determination.
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
48 Is the relationship between X and Y significant?
If the slope of the regression line = 0, this indicates there is no linear relationship between the variables.
If there is no linear relationship, the variables are considered to be independent.
A t-test of the slope estimate can be done to test for independence between the X and Y variables.
Null hypothesis: slope = 0. The null hypothesis states that the variables are independent.
Alternative hypothesis: slope ≠ 0. The alternative hypothesis is that there is a significant relationship between the variables.
If the result of the t-test of the slope is significant (p-value < α), reject the null hypothesis and conclude that there is a statistically significant relationship between the two variables.
49 Notation for Population Slope and Intercept
As in any hypothesis test, the null and alternative hypotheses are stated about the population parameters, not about the estimates.
The population parameters for the slope and intercept of the regression line are the Greek letters $\beta_1$ and $\beta_0$.
$\beta_1$ is the population parameter for the slope.
$\beta_0$ is the population parameter for the intercept.
The statistic for the t-test of the slope will use the estimated value of the slope (b) that is calculated from the data.
50 t-test of the Slope
1. State the hypotheses. Null hypothesis: $\beta_1 = 0$. Alternative hypothesis: $\beta_1 \neq 0$.
2. A t-test will be used to test the hypothesis.
3. Significance level: $\alpha = 0.05$.
4. The degrees of freedom for a t-test of the slope are n - 2, where n = sample size. The critical values of the t-test are found using TINV(0.05, df). For the plasma volume example, n = 8, so the critical values are TINV(0.05, 6) ≈ 2.447 and -2.447.
51 t-test of the Slope
5. Calculate the test statistic: the slope estimate divided by the standard error of the slope.
$$t = \frac{b_1}{SE(b_1)}$$
Here $b_1$ is the estimated slope (written b elsewhere in this lesson). The formula for the SE of the slope is complicated, so we will use the Excel Data Analysis Tool to do this t-test. The Data Analysis Tool provides the t-statistic and the p-value of the t-test of the slope.
6. State the conclusion. If the test statistic is more extreme than the critical values, reject the null hypothesis and conclude that there is a significant relationship between the variables.
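For readers curious about what the Data Analysis Tool computes, here is a minimal sketch of the t-test of the slope. It assumes the standard textbook formula for the standard error of the slope, SE(b) = sqrt(MSE / Σ(x - x̄)²) with MSE = (sum of squared residuals) / (n - 2), which the lesson itself does not show. The data are made up, and the critical value 3.182 is the two-sided 0.05 value for 3 degrees of freedom from a t table (what TINV(0.05, 3) returns).

```python
import math

# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

# Sum of squared residuals and degrees of freedom
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2

# Standard error of the slope (standard formula, an assumption here)
se_b = math.sqrt((sse / df) / s_xx)
t_stat = b / se_b

# Two-sided critical value for alpha = 0.05 and df = 3, from a t table
t_crit = 3.182
reject_null = abs(t_stat) > t_crit

print(round(t_stat, 4), reject_null)
```

With only five made-up points the test statistic does not exceed the critical value, so the null hypothesis of slope = 0 would not be rejected for this toy data set.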
52 T-test of the Slope in Excel
Data Analysis Tool output for the weight / plasma volume example. The t-statistic and p-value for the t-test of the slope are in the Body weight row of the coefficients table.
[Table: Excel SUMMARY OUTPUT. Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (8). ANOVA: df, SS, MS, F and Significance F for Regression, Residual and Total. Coefficients table: Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95% for Intercept and Body weight.]
The p-value for the t-test of the slope is less than 0.05, so reject the null hypothesis and conclude that there is a significant relationship between weight and plasma volume.
53 Regression Analysis in Excel
In Excel Module 15, use the Data Analysis Tool to obtain the regression analysis results:
Select Regression under the Data Analysis Tool.
Enter the plasma volume data for Y-range and the weight data for X-range.
Check Labels if you highlight the column headers.
Also check Residuals and Residual Plot.
Identify the t-statistic and the p-value for the t-test of the slope.
Also identify the slope and the intercept on the output table. These are under the Coefficients column.
95% confidence intervals for the coefficients are also provided if the Confidence Level box is checked.
54 T-test of the Intercept
The Data Analysis Tool also provides results of a t-test of the intercept.
The null hypothesis of this test is that the intercept = 0: $\beta_0 = 0$.
The alternative hypothesis of this test is that the intercept ≠ 0: $\beta_0 \neq 0$.
Usually there is not much interest in the t-test of the intercept, because testing whether the intercept = 0 does not provide information about the relationship between the two variables.
From the regression table, you can see that the null hypothesis that the intercept = 0 is not rejected, because its p-value is greater than the significance level.
This result does not affect the significant result of the t-test of the slope.
55 Linear Regression Procedure
Look at a scatter plot of the data.
  Plot Y on the y-axis and X on the x-axis.
  Add the trend line to the plot.
Estimate the regression line equation.
  Find the slope and intercept of the regression line.
Is the relationship statistically significant?
  Use a t-test of the slope to determine significance.
How well does the estimated regression line equation fit the data?
  Calculate R², the coefficient of determination.
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
56 How well does the regression line equation fit the data?
r² is the notation for the coefficient of determination.
r² is equal to the correlation coefficient (r) squared. It can range from 0 to 1.
Interpretation of r²: r² is the proportion of variation in the dependent variable (Y) that is explained by the estimated least squares regression equation.
Larger values of r² indicate a better fit of the regression line to the data, which indicates a more useful predictive model.
57 Calculating r²
In Excel, you can use the CORREL function to find the correlation coefficient and square this value to find the coefficient of determination.
For the plasma / weight data, squaring r gives r² = 0.576.
Or you can find r² on the Data Analysis Tool output, under Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations = 8):
Multiple R = the correlation coefficient
R Square = the coefficient of determination (r²)
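The two readings of r² (the squared correlation coefficient, and the proportion of variation in Y explained by the line) can be checked against each other. The sketch below uses made-up data, and the "explained variation" form 1 - SSE/SST is the standard definition, stated here as an assumption since the lesson gives only the verbal interpretation.

```python
import math

# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in Y (SST)

# Route 1: square the correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared_from_r = r ** 2

# Route 2: 1 - (unexplained variation / total variation)
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_squared_from_fit = 1 - sse / s_yy

print(round(r_squared_from_r, 4), round(r_squared_from_fit, 4))
```

Both routes give the same value for simple linear regression, which is why squaring the CORREL result matches the R Square entry in the regression output.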
58 Interpretation of r²
For the plasma volume example, r² = 0.576.
Interpretation: 57.6% of the variation in plasma volume is explained by the regression line equation with weight as the explanatory variable.
Since only 57.6% of the variation in plasma volume is explained by body weight, there are likely other variables that explain some of the variation in plasma volume.
Multiple regression analysis uses more than one explanatory variable to predict the dependent variable. This is covered in PubH 6415.
If there are other explanatory variables significantly related to plasma volume in a multiple regression model, r² will increase.
59 Linear Regression Procedure
Look at a scatter plot of the data (we have done this).
  Plot Y on the y-axis and X on the x-axis.
  Does the relationship appear to be linear?
Estimate the regression line equation (we have done this).
  Find the slope and intercept of the regression line.
Is the relationship statistically significant?
  Use a t-test of the slope to determine significance.
How well does the estimated regression line equation fit the data? (We have done this.)
  Calculate R², the coefficient of determination.
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
60 Using the Regression Line Equation for Prediction
The regression line equation for the weight and plasma volume data has the form $Y' = a + bX$, with the intercept a and slope b estimated from the sample.
For a given value of weight (X), the plasma volume (Y) can be predicted.
What is the expected plasma volume for an individual who weighs 60 kg?
Insert 60 in the equation in place of X and solve for Y': $Y' = a + b \times 60 \approx 2.7$ liters.
61 Predicting Plasma Volume for Weight = 60 kg
[Figure: regression line on the scatter plot of plasma volume (liters) against body weight (kg)]
The predicted plasma volume for weight = 60 kg is the point on the regression line corresponding to x = 60. This point is 2.7 liters.
62 Appropriate Applications of the Regression Line Equation
Predictions using regression line equations are only valid within the range of x-values in the collected data.
For the example data, predictions should stay within the observed range of body weights. It would not be appropriate to use this regression line equation to predict plasma volume for an individual weighing 100 kg or an individual weighing 25 kg.
There may be a different relationship between weight and plasma volume beyond the values of the collected data, so the relationship identified by the regression line equation should not be extrapolated much beyond the range of the X values.
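One way to make the "no extrapolation" rule hard to forget is to build it into the prediction step. The sketch below is a hypothetical helper, `predict`, that refuses to predict outside the observed X range; the coefficients and range are made-up values, not the plasma volume estimates.

```python
def predict(x_new, intercept, slope, x_min, x_max):
    """Predict Y' = intercept + slope * x_new, only inside the observed X range."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]; "
            "the regression line should not be extrapolated"
        )
    return intercept + slope * x_new

# Hypothetical fitted coefficients and observed X range (for illustration only)
a, b = 2.2, 0.6
x_min, x_max = 1, 5

inside = predict(3, a, b, x_min, x_max)

try:
    predict(10, a, b, x_min, x_max)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True

print(round(inside, 2), extrapolation_blocked)
```

Raising an error is a deliberately strict choice; a warning would be a softer alternative when a prediction slightly beyond the range is acceptable.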
63 More Cautions about Application of Regression Line Predictions
Predictions using regression line equations are only valid for the population represented by the sample data.
For example, if data for a regression analysis are collected for girls age 10-18, predictions using the equation are not necessarily valid for boys, adults or girls younger than 10.
You can't assume that the relationship between two variables in one population is the same in other populations.
Read the study description carefully to identify the population that was sampled. Regression analysis inferences are valid for this population but not necessarily for other populations.
64 What if there isn't a significant relationship between the variables?
If regression analysis reveals that there is NOT a significant relationship between the two variables (that is, if the p-value for the t-test of the slope > α), the regression equation is not useful for predicting values of the dependent variable from the independent variable.
If the t-test of the slope is NOT significant, end the regression analysis procedure and do not use the regression line equation for prediction.
Prediction using the regression line equation is only useful if the null hypothesis of independence between the variables is rejected.
65 Relationship between Correlation and Regression
The correlation coefficient and the slope of the regression line are related. For a given set of data:
They will both have the same sign, indicating the direction of the relationship (positive or negative).
There is a mathematical relationship between the slope and the correlation coefficient: the slope of the regression line is equal to the correlation coefficient times the standard deviation of y divided by the standard deviation of x:
$$b_1 = r \frac{s_y}{s_x}$$
Here $b_1$ is the estimated slope (written b elsewhere in this lesson).
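The relationship between the slope and the correlation coefficient can be verified numerically. The sketch uses made-up data and the sample standard deviation from Python's `statistics.stdev` (n - 1 in the denominator).

```python
import math
import statistics

# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Slope directly from the least squares formula
b_direct = s_xy / s_xx

# Slope from the correlation coefficient and the two standard deviations
r = s_xy / math.sqrt(s_xx * s_yy)
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)
b_from_r = r * s_y / s_x

print(round(b_direct, 4), round(b_from_r, 4))
```

Because the standard deviations are always positive, the formula also shows why the slope and r must share the same sign.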
66 Hypothesis Test of the Population Correlation Coefficient
We can set up a hypothesis test of independence for the population correlation:
Null hypothesis: $\rho = 0$ (no significant linear association between the variables)
Alternative hypothesis: $\rho \neq 0$ (significant linear association between the variables)
The test statistic is a t-statistic with n - 2 df:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
After finding the t-statistic, you can use EXCEL to find the p-value = TDIST(t, n-2, 2).
67 T-test of the Correlation Coefficient
For a given set of sample data, the t-test for $\rho$ and the t-test for the slope, $\beta_1$, will have the same t-statistic and p-value.
For the plasma volume data, the t-statistic for the test of the population correlation coefficient is the same as the t-statistic for the slope of the regression line.
You can work through the equation in EXCEL to confirm this. The p-value is TDIST(t, 6, 2), with t the t-statistic.
The same conclusion is reached from either hypothesis test: there is a significant relationship between the two variables.
The p-value < 0.05, so the null hypothesis of independence is rejected at the 0.05 significance level.
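The claim that the two tests share one t-statistic can be confirmed on any data set. The sketch below uses made-up data and, for the slope route, assumes the standard formula SE(b) = sqrt(MSE / Σ(x - x̄)²), which the lesson leaves to Excel.

```python
import math

# Hypothetical illustrative data (not the lesson's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Route 1: t-statistic from the correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Route 2: t-statistic from the slope and its standard error
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt((sse / (n - 2)) / s_xx)
t_from_slope = b / se_b

print(round(t_from_r, 4), round(t_from_slope, 4))
```

Both routes use n - 2 degrees of freedom, so identical t-statistics also mean identical p-values.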
68 Linear Regression and Correlation: Which to Use?
Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables.
Use correlation analysis when the interest is primarily in identifying whether a relationship exists. Use the t-test of the correlation coefficient to determine if the relationship is significant.
Use regression analysis to identify a relationship AND to predict the value of one variable given a value of the other variable. Use the t-test of the slope to determine if the relationship is significant.
Regression analysis is most useful when there is an identified interest in predicting one variable from the other(s).
If prediction doesn't make sense, use correlation analysis.
69 Readings and Assignments
Reading: Chapter 8 pgs , 194,
Complete the Lesson 15 Practice Exercises
Lesson 15 Excel Modules
Excel Module 15: Plasma Volume works through the example in this Lesson
Excel Module 15: BMI works through the example in the text (pages , 206, 209)
Complete OPTIONAL Homework 11: Use the Data Analysis Tool for the Linear Regression problems
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationHomework 11. Part 1. Name: Score: / null
Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationCopyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5
Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression
More informationThe correlation coefficient
The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationModule 5: Statistical Analysis
Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
More informationCorrelation and Regression
Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationSimple Linear Regression
STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationChapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
More informationRegression step-by-step using Microsoft Excel
Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationLinear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationChapter 23. Inferences for Regression
Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationCourse Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
More informationAn analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression
Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationThe Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
More informationUsing Microsoft Excel for Probability and Statistics
Introduction Using Microsoft Excel for Probability and Despite having been set up with the business user in mind, Microsoft Excel is rather poor at handling precisely those aspects of statistics which
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationPOLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.
Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression
More informationRegression and Correlation
Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationThe importance of graphing the data: Anscombe s regression examples
The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective
More informationRelationships Between Two Variables: Scatterplots and Correlation
Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)
More informationKSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationMULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)
MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationFormula for linear models. Prediction, extrapolation, significance test against zero slope.
Formula for linear models. Prediction, extrapolation, significance test against zero slope. Last time, we looked the linear regression formula. It s the line that fits the data best. The Pearson correlation
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationElementary Statistics Sample Exam #3
Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationSimple Linear Regression, Scatterplots, and Bivariate Correlation
1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.
More informationScatter Plot, Correlation, and Regression on the TI-83/84
Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page
More informationCopyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7
Using Your TI-83/84/89 Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationStatistics courses often teach the two-sample t-test, linear regression, and analysis of variance
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
More informationEXCEL Tutorial: How to use EXCEL for Graphs and Calculations.
EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your
More informationCorrelation key concepts:
CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
ch12 practice test 1) The null hypothesis that x and y are is H0: = 0. 1) 2) When a two-sided significance test about a population slope has a P-value below 0.05, the 95% confidence interval for A) does
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationUsing Excel for inferential statistics
FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationCorrelation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers
Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationElements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationMULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
More informationAn analysis method for a quantitative outcome and two categorical explanatory variables.
Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that
More informationSPSS Explore procedure
SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,
More informationPremaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
More informationComparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
More informationDoing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
More informationFactors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
More informationYou have data! What s next?
You have data! What s next? Data Analysis, Your Research Questions, and Proposal Writing Zoo 511 Spring 2014 Part 1:! Research Questions Part 1:! Research Questions Write down > 2 things you thought were
More informationLAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE
LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationDealing with Data in Excel 2010
Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing
More informationCourse Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.
SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed
More informationUsing Excel for Statistical Analysis
Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationCorrelational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots
Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship
More informationChapter 7. One-way ANOVA
Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationLean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY
TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online
More information