Lesson Outline


 Myron Wilkins
1 Lesson 15 Linear Regression
2 Lesson 15 Outline: Review correlation analysis; Dependent and independent variables; Least squares regression line; Calculating the slope; Calculating the intercept; Residuals and residual plots; Identifying a significant relationship: t-test of the slope; $R^2$: coefficient of determination; Using the regression line for prediction of Y from X; Relationship between the correlation coefficient and linear regression.
3 Linear Regression and Correlation. Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables. Correlation analysis is used when the interest is in identifying whether a relationship exists and quantifying the strength of the relationship. Regression analysis is used to identify a relationship AND to predict the value of one variable given a value of the other variable(s).
4 Review: Correlation Analysis. 1. Plot the data using a scatter plot to get a visual idea of the relationship. 2. Calculate the correlation coefficient: (a) use Pearson's correlation coefficient if both variables are continuous; (b) use the Spearman rank correlation coefficient if both variables are ordinal, or if one is ordinal and the other continuous.
5 Review: Scatter Plots and Association. Plot the 2 variables in a scatter plot (EXCEL). The pattern of the dots in the plot indicates the statistical relationship between the variables (the strength and the direction). Positive relationship: the pattern goes from lower left to upper right. Negative relationship: the pattern goes from upper left to lower right. The more the dots cluster around a straight line with a positive or negative direction, the stronger the linear relationship.
6 Review: Correlation Coefficient. $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$. The statistic r is called the correlation coefficient. r estimates the population correlation coefficient $\rho$ (the Greek letter rho). The correlation coefficient provides a measure of the linear association between two variables. r is always between -1 and 1.
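The formula for r can be checked by hand or with a few lines of code. The following is a minimal sketch in Python, not part of the lesson's Excel workflow; the function name `pearson_r` and the five data pairs are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r_demo = pearson_r(x_demo, y_demo)
```

In Excel the same quantity comes from the CORREL function, so the sketch is only a way to see each term of the formula computed explicitly.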
7 Review: Correlation Coefficient in Excel. Use the CORREL function to find the correlation coefficient. If data for one variable are in cells A1:A12 and data for the other variable are in cells B1:B12, =CORREL(A1:A12,B1:B12) will return the Pearson correlation coefficient. Correlation coefficients closer to 1 or -1 indicate a stronger linear relationship. Correlation coefficients close to 0 indicate a weak linear relationship. However, there could be a nonlinear relationship when the correlation coefficient is close to 0.
8 Simple Linear Regression. Like correlation analysis, linear regression analysis is a technique that is used to explore the relationship between two continuous random variables that have a linear relationship. Regression analysis allows us to investigate the change in one variable that corresponds to a given change in the other variable. If only ONE variable is used to predict the value of the other variable, the analysis is called simple linear regression. When two or more variables are used to predict the value of the other variable, the analysis is called multiple linear regression (not covered in this course).
9 Linear Regression: Background. Regression is from a Latin root meaning "going back". Linear regression as a statistical method was first described by Sir Francis Galton in his paper "Regression Towards Mediocrity in Hereditary Stature", published in The Journal of the Anthropological Institute, 1886. Galton described the relationship between midparent height (midparent height = the average of the 2 parents' heights) and the height of their offspring. Taller midparents had children with heights closer to the average height. Shorter midparents had children with heights closer to the average height. Galton called this phenomenon "regression towards mediocrity".
10 Sir Francis Galton: Regression. "When midparents are taller than mediocrity, their children tend to be shorter than they" and "When midparents are shorter than mediocrity, their children tend to be taller than they".
11 Variables in Simple Linear Regression Analysis. Dependent or response variable: the variable to be predicted from or explained by the other variable. The response variable is typically labeled Y. Y is a continuous variable in simple linear regression. Independent or explanatory variable: the variable used to predict the dependent variable. This variable is typically labeled X. X can also be called the predictive variable or the regressor variable. For simple linear regression, X is a continuous variable. For multiple linear regression, X can be continuous or categorical.
12 Identifying independent and dependent variables. In regression analysis, it's important to correctly identify the dependent (Y) and independent (X) variables. The study description should provide you with information about which is the dependent variable and which is the independent variable. If the study description states that the goal is to predict variable 1 from variable 2, then variable 1 is the dependent variable (Y) and variable 2 is the independent variable (X). Typically, if the variables are separated in time, the variable collected first is the independent variable (X) and the variable collected later is the dependent variable (Y). In Galton's regression analysis, the midparent height was the independent variable and the offspring height was the dependent variable.
13 Linear Regression Overview. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis. Does the relationship appear to be linear? Estimate the regression line equation: find the slope and intercept of the regression line. Check residuals. Is the relationship statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
14 Simple Linear Regression: An Example. Is there a linear relationship between body weight and plasma volume that can be used to predict plasma volume from weight? Plasma volume is the dependent variable Y, since we are interested in predicting this from body weight, the independent variable X. [Data table columns: Subject, Body Weight (kg), Plasma Volume (l).]
15 Scatter Plot of the Data. There is a positive relationship between plasma volume and body weight. With this small number of data points it is difficult to see the linear relationship, but there is a general linear trend to the data. We want to identify a line that has a good fit to the data. This isn't a deterministic relationship, so the points won't fall perfectly on the line. [Figure: scatter plot of Plasma Volume (liters) against Body Weight (kg).]
16 Estimate the Regression Line Equation. A few of the many possible lines through the data points are illustrated in the plot. How do we decide which line best fits the data? [Figure: Plasma Volume (liters) against Body Weight (kg), with several candidate lines.]
17 Least Squares Regression Line. The linear regression line is the line that gets closest to all of the points. This is called the least squares regression line. The least squares regression line minimizes the sum of the squares of the vertical distance between each observed data point ($y_i$) and the line: minimize $\sum_{i=1}^{n} (y_i - \text{point on line}_i)^2$.
18 Vertical distances between each observed Y ($y_i$) and the line are in red. The sum of these distances squared is minimized by the least squares regression line. [Figure: Plasma Volume (L) against Body Weight (kg), with the regression line and vertical distances.]
19 Least Squares Regression Line Equation. The equation for a line requires a slope and an intercept. In regression analysis, we estimate the population regression line with the least squares regression line calculated from sample data: the sample regression line. The notation for the slope and intercept in the population regression line are Greek letters: $\beta_0$ for the intercept, $\beta_1$ for the slope. The notation for the slope and intercept in the sample regression line are Roman letters: a for the intercept, b for the slope.
20 The Population Regression Line. $Y = \beta_0 + \beta_1 X + \varepsilon$. $\beta_0$ is the y-intercept of the line. $\beta_1$ is the slope of the regression line. $\varepsilon$ is the error term, the difference between the observed Y and the regression line.
21 Sample Regression Line. $\beta_0$ and $\beta_1$ are population parameters. Sample estimates for the regression parameters are: a is the estimate for $\beta_0$; b is the estimate for $\beta_1$. $\hat{Y} = a + bX$ is the regression line calculated from sample data. $\hat{Y}$ is the predicted value of Y.
22 Least Squares Regression Line. a and b are estimates of the regression coefficients $\beta_0$ and $\beta_1$. The regression coefficients are estimated from the sample data by the least squares method. The intercept a is the estimated expected value of Y when X = 0. The slope b is the estimated expected change in Y corresponding to a 1 unit increase in X. $\hat{Y}$ is the expected (or predicted) value of y, the point on the line. It is called the fitted value of y. The following slide illustrates the least squares regression line.
23 The Equation of a Regression Line: $\hat{Y} = a + bX$. [Figure: a regression line on x-y axes. The intercept a is the height of the line at x = 0; the slope b is the change in the line for a one-unit change in X.]
24 Interpretation of predicted values of Y. The predicted value of y is the expected y-value. Since not all observed data points are exactly on the regression line, there is a range of possible y-values (a distribution) for each x-value. In regression analysis the distribution of y-values for each x-value is assumed to be a normal distribution. The predicted values of y represent the mean values of the distributions of y for each specified value of x. The following slide illustrates this for 3 values of X: notice that the mean of each distribution is on the regression line (the predicted value of y) and that the distributions of y-values are normal distributions.
25 Simple Linear Regression Model Illustrated [figure]
26 Assumptions for Regression Analysis. There are several assumptions that should be met for regression analysis. For each value of X, the Y variable is assumed to have a normal distribution; the mean of the normal distribution is the predicted value, $\hat{Y}$. The normal distributions are assumed to have equal variance across the entire range of X values. This assumption is called homogeneity of variance or homoscedasticity. The predicted values of Y fall on the regression line representing the linear relationship between X and Y. The Y observations are assumed to be independent. The observations are from a random sample.
27 Interpretation of the Slope of the Regression Line. The slope b is the expected change in Y corresponding to a 1 unit increase in X. b = 0: there is no linear association between Y and X. b > 0: there is a positive linear association between Y and X (as X increases, the expected value of Y increases). b < 0: there is a negative linear association between Y and X (as X increases, the expected value of Y decreases). The following slide illustrates a positive, negative and 0 slope.
28 Illustration of Negative, Positive and Zero Slopes. [Figure: three lines on x-y axes, labeled b > 0, b = 0 and b < 0.]
29 Calculating the Slope of the Regression Line. The formula to calculate the slope of the least squares regression line is: $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$. Notice that the numerator is the same as the numerator in the formula for the correlation coefficient.
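The slope formula translates directly into a short calculation. This is a minimal Python sketch rather than the lesson's Excel procedure; the function name `slope` and the data values are invented for illustration.

```python
def slope(x, y):
    """Least squares slope b: sum of cross-products over sum of squares of x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: same cross-product sum that appears in the formula for r
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared deviations of x from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
b_demo = slope(x_demo, y_demo)  # cross-product sum 6 divided by 10
```

For these made-up values the cross-product sum is 6 and the sum of squares of x is 10, so the slope works out to 0.6. Excel's SLOPE function performs the same calculation.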
30 b for plasma (Y) and body weight (X) example. [Table columns: X, Y, $(X - \bar{X})$, $(Y - \bar{Y})$, $(X - \bar{X})(Y - \bar{Y})$, $(X - \bar{X})^2$, with Mean and Sum rows.]
31 Slope of Regression Line. From the previous slide, b is the sum of $(X - \bar{X})(Y - \bar{Y})$ divided by the sum of $(X - \bar{X})^2$, which gives b = 0.0436 (rounded to 4 decimal places). Interpretation of the slope: for every one unit increase in X, the expected increase in Y is 0.0436 units. Plasma volume increases 0.0436 liters for every one kg increase in body weight. The slope is positive, indicating that as body weight (X) increases, plasma volume (Y) also increases.
32 Calculating the Intercept of the Regression Line. The intercept a of the regression line is the estimated value of Y when X = 0. a is calculated from the average value of Y, the average value of X and the estimated slope b by the following formula: $a = \bar{Y} - b\bar{X}$.
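The intercept formula uses only the two sample means and the slope. The sketch below is a hedged Python illustration; the function name `intercept`, the data, and the slope value 0.6 (the least squares slope for these invented numbers) are assumptions, not values from the lesson.

```python
def intercept(x, y, b):
    """Intercept a = mean(y) - b * mean(x) for a given slope b."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and the same length")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return y_bar - b * x_bar


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
b_demo = 0.6  # least squares slope for this invented data set
a_demo = intercept(x_demo, y_demo, b_demo)  # mean(y) = 4, mean(x) = 3
```

One consequence of the formula is that the fitted line always passes through the point of means: substituting the mean of X into a + bX returns the mean of Y.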
33 Intercept for Plasma Volume Example. The intercept is calculated from the means of X and Y and the slope: $a = \bar{Y} - b\bar{X}$. The intercept is the estimated expected value of Y when X = 0. Intercepts do not always have realistic interpretations. In this example, the intercept is the predicted plasma volume when body weight = 0 kg, which is not a possibility.
34 Regression Line Equation. Once the slope and the intercept have been calculated, the regression equation can be constructed: $\hat{Y} = a + bX$. This is the equation that will be used to predict plasma volume (l) from body weight (kg). The regression equation calculated from sample data is an estimate of the true population regression equation.
35 Regression Line Equation and Interpretation of the Slope. A 1 unit increase in X for this data = 1 kg, so the interpretation of the slope in this regression line equation is: for each 1 kg increase in body weight, the expected increase in plasma volume is 0.0436 liters. What is the expected plasma volume increase for a 10 kg increase in body weight? For a 10 kilogram increase in body weight, the expected increase in plasma volume = 10 * 0.0436 = 0.436 liters.
36 What if the slope of the regression line is negative? If the slope of the regression line is negative, we would expect a decrease in Y with each unit increase in X. The slope is a measure of the expected change in Y for each 1-unit increase in X. If the slope is positive, the expected change in Y is an increase. If the slope is negative, the expected change in Y is a decrease.
37 Regression Coefficients in Excel. Excel has functions to calculate the slope and the intercept of the least squares regression line. The SLOPE function returns b, the slope: =SLOPE(y-range, x-range). The INTERCEPT function returns a, the intercept: =INTERCEPT(y-range, x-range). For both of these functions, enter the y-range of the data first and then the x-range of the data.
38 Plasma Volume Example in Excel. The Lesson 15 Excel Module works through the plasma volume / body weight regression example: create a scatterplot of the data; work through the calculations of the slope and intercept of the regression line; use the Excel SLOPE and INTERCEPT functions. After you've worked through the calculations once, use the Excel functions to find the slope and intercept for future regression problems.
39 Residuals. The residual is the difference between the observed (Y) and the expected ($\hat{Y}$) value of Y: Residual = $Y - \hat{Y}$. Y is the observed Y for any X. $\hat{Y}$ is the Y-value on the regression line for that value of X. The residual is the component of Y that is not predicted by X. The least squares regression line is the line that minimizes the sum of the squared residuals.
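To make the definition concrete, the sketch below fits a least squares line and lists the residuals. It is a minimal Python illustration; the helper names `fit_line` and `residuals` and the data are invented, and none of the numbers come from the plasma volume example.

```python
def fit_line(x, y):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(x, y):
    """Observed Y minus fitted Y for every data point."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
res_demo = residuals(x_demo, y_demo)
# The point with the smallest absolute residual lies closest to the line
closest_index = min(range(len(res_demo)), key=lambda i: abs(res_demo[i]))
```

For a least squares line with an intercept, the residuals always sum to zero (up to rounding), which is a quick check that the slope and intercept were calculated correctly.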
40 Residuals for Plasma Volume Example. [Table columns: X, Y, $\hat{Y}$, Residual.] Calculate $\hat{Y}$, the expected value of Y, using the regression line equation. The residual is the difference between Y and $\hat{Y}$. Which point is closest to the regression line? (74, 3.37) has the smallest residual. Which point is furthest from the regression line? (70.5, 3.49) has the largest residual.
41 Regression Line and Residuals. [Figure: Plasma Volume (L) against Body Weight (kg) with the regression line; the points with the largest and smallest residuals are marked.]
42 Analysis of Residuals. A residual plot is a plot of the residual values on the y-axis and the x-values on the x-axis. If there is a linear relationship between X and Y, the correlation between X and the residuals should equal 0. The scatterplot will be a random scatter of points with no evident linear pattern. A nonlinear relationship between X and Y will be more evident in the residual plot of the (X, residual) data than in the scatterplot of the original (X, Y) data. The Excel Regression analysis tool has an option for selecting the residual plot. The residual plot for the plasma volume example is on the following slide.
43 Residual Plot for Plasma Volume and Body Weight Data. [Figure: residual plot with residuals on the y-axis and body weight (kg) on the x-axis.] No evidence of nonlinearity. The points are equally distributed around the value 0 with no evident positive or negative slope.
44 (X, Y) Scatterplot for a nonlinear (or curvilinear) relationship. When there is a curvilinear relationship between X and Y, the least squares regression line does not represent the relationship.
45 Residual Plot for Curvilinear Relationship. [Figure: residual plot with residuals on the y-axis and X on the x-axis.] This is the residual plot for the relationship on the previous slide. It illustrates that the relationship is not linear. The residual plot points aren't evenly distributed around the value 0.
46 Regression analysis for curvilinear relationships. Simple linear regression analysis should not be used when X and Y have a curvilinear relationship. There are several strategies for dealing with a curvilinear relationship between X and Y. One option is to try a logarithmic transformation of the data to see if this improves the linear relationship. Another option is to use piecewise regression: fit one regression line to the increasing portion of the curve and a second regression line to the decreasing portion of the curve. A third option is to include $X^2$ or $X^3$ in the regression equation (covered in PubH 6415 with multiple regression models).
47 Linear Regression Procedure. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis; add the trend line to the plot. Estimate the regression line equation: find the slope and intercept of the regression line. Check residuals. Is the relationship between X and Y statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
48 Is the relationship between X and Y significant? If the slope of the regression line = 0, this indicates there is no linear relationship between the variables. If there is no linear relationship, the variables are considered to be independent. A t-test of the slope estimate can be done to test for independence between the X and Y variables. Null hypothesis: slope = 0. The null hypothesis states that the variables are independent. Alternative hypothesis: slope ≠ 0. The alternative hypothesis is that there is a significant relationship between the variables. If the t-test of the slope result is significant (p-value < 0.05), reject the null hypothesis and conclude that there is a statistically significant relationship between the two variables.
49 Notation for Population Slope and Intercept. As in any hypothesis test, the null and alternative hypotheses are stated about the population parameters, not about the estimates. The population parameters for the slope and intercept of the regression line are the Greek letters $\beta_1$ and $\beta_0$: $\beta_1$ is the population parameter for the slope; $\beta_0$ is the population parameter for the intercept. The statistic for the t-test of the slope will use the estimated value of the slope (b) that is calculated from the data.
50 t-test of the Slope. 1. State the hypotheses. Null hypothesis: $\beta_1 = 0$. Alternative hypothesis: $\beta_1 \neq 0$. 2. A t-test will be used to test the hypothesis. 3. Significance level = 0.05. 4. The degrees of freedom for a t-test of the slope are n - 2, where n = sample size. The critical values of the t-test are found using TINV(0.05, df). For the plasma volume example, n = 8, so the critical values = TINV(0.05, 6) = 2.447 and -2.447.
51 t-test of the Slope. 5. Calculate the test statistic: the slope estimate divided by the standard error of the slope, $t = \frac{b}{SE(b)}$. The formula for the SE of the slope is complicated, so we will use the Excel Data Analysis Tool to do this t-test. The Data Analysis Tool provides the t-statistic and the p-value of the t-test of the slope. 6. State the conclusion. If the test statistic is more extreme than the critical values, reject the null hypothesis and conclude that there is a significant relationship between the variables.
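For readers curious about what the Data Analysis Tool computes, the standard error of the slope in simple linear regression is the square root of (residual sum of squares / (n - 2)) divided by the sum of squared deviations of X. The sketch below is a hedged Python illustration using that standard formula; the function name `slope_t_statistic` and the data are invented, and the critical value 3.182 is the two-sided 5% t-table value for 3 degrees of freedom, which applies to this five-point example only.

```python
import math


def slope_t_statistic(x, y):
    """Return (b, se_b, t) for the t-test of the slope (H0: slope = 0)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual sum of squares around the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope; it is zero for a perfect fit,
    # in which case the division below is undefined
    se_b = math.sqrt((sse / (n - 2)) / sxx)
    return b, se_b, b / se_b


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
b_demo, se_demo, t_demo = slope_t_statistic(x_demo, y_demo)

# Two-sided critical value for alpha = 0.05 with n - 2 = 3 df (t table)
CRITICAL_T_DF3 = 3.182
reject_null = abs(t_demo) > CRITICAL_T_DF3
```

With these made-up values the test statistic is about 2.12, which is not more extreme than 3.182, so the null hypothesis of a zero slope would not be rejected for this small invented sample.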
52 T-test of the Slope in Excel. Data Analysis Tool output for the weight / plasma volume example: the t-statistic and p-value for the t-test of the slope are highlighted. [Excel SUMMARY OUTPUT. Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (8). ANOVA: df, SS, MS, F and Significance F for the Regression, Residual and Total rows. Coefficient table: Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95% for the Intercept and Body weight rows.] The p-value for the t-test is less than 0.05, so reject the null hypothesis and conclude that there is a significant relationship between weight and plasma volume.
53 Regression Analysis in Excel. In Excel Module 15, use the Data Analysis Tool to obtain the regression analysis results: select Regression under the Data Analysis Tool. Enter the plasma volume data for the Y-range and the weight data for the X-range. Check Labels if you highlight the column headers. Also check Residuals and Residual Plot. Identify the t-statistic and the p-value for the t-test of the slope. Also identify the slope and the intercept on the output table. These are under the Coefficients column. 95% confidence intervals for the coefficients are also provided if the Confidence Level box is checked.
54 T-test of the Intercept. The Data Analysis Tool also provides results of a t-test of the intercept. The null hypothesis of this test is that the intercept = 0: $\beta_0 = 0$. The alternative hypothesis of this test is that the intercept ≠ 0: $\beta_0 \neq 0$. Usually there is not much interest in the t-test of the intercept, because testing whether the intercept = 0 does not provide information about the relationship between the two variables. From the Regression Table, you can see from its p-value that the null hypothesis for the intercept = 0 is not rejected. This result does not affect the significant result of the t-test of the slope.
55 Linear Regression Procedure. Look at a scatter plot of the data: plot Y on the y-axis and X on the x-axis; add the trend line to the plot. Estimate the regression line equation: find the slope and intercept of the regression line. Is the relationship statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
56 How well does the regression line equation fit the data? $r^2$ is the notation for the coefficient of determination. $r^2$ is equal to the correlation coefficient (r) squared. It can range from 0 to 1. Interpretation of $r^2$: $r^2$ is the proportion of variation in the dependent variable (Y) that is explained by the estimated least squares regression equation. Larger values of $r^2$ indicate a better fit of the regression line to the data, which indicates a more useful predictive model.
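The "proportion of variation explained" reading can be verified numerically: 1 minus (residual sum of squares / total sum of squares of Y) gives the same number as squaring the correlation coefficient. The Python sketch below illustrates this under assumed names (`r_squared`) and invented data.

```python
def r_squared(x, y):
    """Coefficient of determination: 1 - SSE / SST for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # SSE: variation in Y left unexplained by the regression line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: total variation in Y around its mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r2_demo = r_squared(x_demo, y_demo)  # unexplained 2.4 out of total 6
```

Here the total variation in the invented Y values is 6 and the unexplained part is 2.4, so 60% of the variation is explained by the line.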
57 Calculating $r^2$. In Excel, you can use the CORREL function to find the correlation coefficient and square this value to find the coefficient of determination. For the plasma / weight data, $r^2$ = 0.576 (so r ≈ 0.759). Or you can find $r^2$ on the Data Analysis Tool output under Regression Statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations): Multiple R = the correlation coefficient; R Square = the coefficient of determination ($r^2$).
58 Interpretation of $r^2$. For the plasma volume example, $r^2$ = 0.576. Interpretation: 57.6% of the variation in plasma volume is explained by the regression line equation with weight as the explanatory variable. Since only 57.6% of the variation in plasma volume is explained by body weight, there are likely other variables that explain some of the variation in plasma volume. Multiple regression analysis uses more than one explanatory variable to predict the dependent variable. This is covered in PubH 6415. If there are other explanatory variables significantly related to plasma volume in a multiple regression model, $r^2$ will increase.
59 Linear Regression Procedure. Look at a scatter plot of the data (we have done this): plot Y on the y-axis and X on the x-axis. Does the relationship appear to be linear? Estimate the regression line equation (we have done this): find the slope and intercept of the regression line. Is the relationship statistically significant? Use a t-test of the slope to determine significance. How well does the estimated regression line equation fit the data? (We have done this.) Calculate $R^2$, the coefficient of determination. Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
60 Using the Regression Line Equation for Prediction. The regression line equation for the weight and plasma volume data is $\hat{Y} = a + bX$ with slope b = 0.0436. For a given value of weight (X), the plasma volume (Y) can be predicted. What is the expected plasma volume for an individual who weighs 60 kg? Insert 60 in the equation in place of X and solve for $\hat{Y}$: $\hat{Y} = a + 0.0436 \times 60 \approx 2.7$ liters.
61 Predicting plasma volume for weight = 60 kg. [Figure: Plasma Volume (liters) against Body Weight (kg) with the regression line.] The predicted plasma volume for weight = 60 kg is the point on the regression line corresponding to x = 60. This point is 2.7 liters.
62 Appropriate Applications of the Regression Line Equation. Predictions using regression line equations are only valid within the range of x-values in the collected data. For the example data, this is the range of body weights observed in the sample. It would not be appropriate to use this regression line equation to predict plasma volume for an individual weighing 100 kg or an individual weighing 25 kg. There may be a different relationship between weight and plasma volume beyond the values of the collected data, so the relationship identified by the regression line equation should not be extrapolated much beyond the range of the X values.
63 More cautions about application of regression line predictions. Predictions using regression line equations are only valid for the population represented by the sample data. For example, if data for a regression analysis are collected for girls age 10-18, predictions using the equation are not necessarily valid for boys, adults or girls younger than 10. You can't assume that the relationship between two variables in one population is the same in other populations. Read the study description carefully to identify the population that was sampled. Regression analysis inferences are valid for this population but not necessarily other populations.
64 What if there isn't a significant relationship between the variables? If regression analysis reveals that there is NOT a significant relationship between the two variables (that is, if the p-value for the t-test of the slope > 0.05), the regression equation is not useful for predicting values of the dependent variable from the independent variable. If the t-test of the slope is NOT significant, end the regression analysis procedure and do not use the regression line equation for prediction. Prediction using the regression line equation is only useful if the null hypothesis of independence between the variables is rejected.
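One way to respect the range restriction in practice is to have the prediction step refuse x-values outside the observed data. The sketch below is a minimal Python illustration; `make_predictor` is a hypothetical helper, and the intercept, slope and range shown are invented values, not the coefficients of the plasma volume example.

```python
def make_predictor(a, b, x_min, x_max):
    """Build a prediction function that only accepts x within [x_min, x_max]."""
    def predict(x):
        if not (x_min <= x <= x_max):
            raise ValueError(
                f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
                "the fitted line should not be extrapolated"
            )
        return a + b * x  # point on the regression line
    return predict


# Hypothetical coefficients and range, invented for illustration
predict_demo = make_predictor(a=2.2, b=0.6, x_min=1, x_max=5)
y_hat_demo = predict_demo(3)  # inside the observed range, so allowed
```

Calling `predict_demo(10)` raises a `ValueError` instead of silently returning an extrapolated value. Whether to block or merely warn is a design choice; blocking is shown here because the lesson treats such predictions as inappropriate.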
65 Relationship between Correlation and Regression. The correlation coefficient and the slope of the regression line are related. For a given set of data, they will both have the same sign, indicating the direction of the relationship (positive or negative). There is a mathematical relationship between the slope and the correlation coefficient: the slope of the regression line is equal to the correlation coefficient times the standard deviation of y divided by the standard deviation of x: $b = r \frac{s_y}{s_x}$.
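The identity between the slope and the correlation coefficient can be confirmed numerically. The Python sketch below computes the slope from r and the two sample standard deviations and compares it with the slope from the sums-of-products formula; the function names and the data are invented for illustration.

```python
import statistics


def slope_direct(x, y):
    """Least squares slope from sums of cross-products and squares."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_from_correlation(x, y):
    """Slope computed as r * s_y / s_x (sample standard deviations)."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / (sxx * syy) ** 0.5
    return r * statistics.stdev(y) / statistics.stdev(x)


# Hypothetical data, invented for illustration (not the lesson's data)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
b_direct = slope_direct(x_demo, y_demo)
b_from_r = slope_from_correlation(x_demo, y_demo)
```

Both routes give the same slope up to floating-point rounding, and because standard deviations are never negative, the slope and r necessarily share the same sign.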
66 Hypothesis Test of the Population Correlation Coefficient. We can set up a hypothesis test of independence for the population correlation $\rho$. Null hypothesis: $\rho = 0$ (no significant linear association between the variables). Alternative hypothesis: $\rho \neq 0$ (significant linear association between the variables). The test statistic is a t-statistic with n - 2 df: $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$. After finding the t-statistic, you can use EXCEL to find the p-value: =TDIST(t, n-2, 2). TDIST requires a nonnegative t, so use ABS(t) if the t-statistic is negative.
67 T-test of the Correlation Coefficient. For a given set of sample data, the t-test for $\rho$ and the t-test for the slope, $\beta_1$, will have the same t-statistic and p-value. For the plasma volume data, the t-statistic for the test of the population correlation coefficient is the same as the t-statistic for the slope of the regression line. You can work through the equation in EXCEL to confirm this. P-value = TDIST(t, 6, 2). The same conclusion is reached from either hypothesis test: there is a significant relationship between the two variables. The p-value < 0.05, so the null hypothesis of independence is rejected at significance level 0.05.
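The t-statistic for the correlation coefficient needs only r and n, so it is easy to reproduce outside Excel. The Python sketch below uses the formula t = r * sqrt(n - 2) / sqrt(1 - r^2); the function name `t_from_r` is an assumption, and the inputs are the rounded values stated in this lesson ($r^2$ = 0.576, n = 8), so the result is approximate.

```python
import math


def t_from_r(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Rounded values taken from the lesson: r^2 = 0.576 and n = 8
r_lesson = math.sqrt(0.576)
t_lesson = t_from_r(r_lesson, 8)

# Two-sided 5% critical value for 6 df, as returned by TINV(0.05, 6)
CRITICAL_T_DF6 = 2.447
significant = abs(t_lesson) > CRITICAL_T_DF6
```

With these rounded inputs the statistic exceeds the critical value for 6 degrees of freedom, which agrees with the lesson's conclusion that the relationship is significant at the 0.05 level.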
68 Linear Regression and Correlation: which to use? Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables. Use correlation analysis when the interest is primarily in identifying whether a relationship exists. Use the t-test of the correlation coefficient to determine if the relationship is significant. Use regression analysis to identify a relationship AND to predict the value of one variable given a value of the other variable. Use the t-test of the slope to determine if the relationship is significant. Regression analysis is most useful when there is an identified interest in predicting one variable from the other(s). If prediction doesn't make sense, use correlation analysis.
69 Readings and Assignments. Reading: Chapter 8 pgs , 194, . Complete the Lesson 15 Practice Exercises. Lesson 15 Excel Modules: Excel Module 15: Plasma Volume works through the example in this Lesson; Excel Module 15: BMI works through the example in the text (pages , 206, 209). Complete OPTIONAL Homework 11: Use the Data Analysis Tool for the Linear Regression problems.
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationSimple Linear Regression
Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationExercise 1.12 (Pg. 2223)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationSydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.
Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under
More informationExample: Boats and Manatees
Figure 96 Example: Boats and Manatees Slide 1 Given the sample data in Table 91, find the value of the linear correlation coefficient r, then refer to Table A6 to determine whether there is a significant
More informationLecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation
Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationSimple Regression and Correlation
Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More information7. Tests of association and Linear Regression
7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.
More informationSimple Linear Regression Chapter 11
Simple Linear Regression Chapter 11 Rationale Frequently decisionmaking situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3 Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationHomework 11. Part 1. Name: Score: / null
Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = 0.80 C. r = 0.10 D. There is
More informatione = random error, assumed to be normally distributed with mean 0 and standard deviation σ
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationAMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015
AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This
More informationFor example, enter the following data in three COLUMNS in a new View window.
Statistics with Statview  18 Paired ttest A paired ttest compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the
More informationStatistical Modelling in Stata 5: Linear Models
Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 08/11/2016 Structure This Week What is a linear model? How
More informationCopyright 2013 by Laura Schultz. All rights reserved. Page 1 of 6
Using Your TINSpire Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.
More informationIntroduction to Regression. Dr. Tom Pierce Radford University
Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.
More informationOutline. Correlation & Regression, III. Review. Relationship between r and regression
Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationSELFTEST: SIMPLE REGRESSION
ECO 22000 McRAE SELFTEST: SIMPLE REGRESSION Note: Those questions indicated with an (N) are unlikely to appear in this form on an inclass examination, but you should be able to describe the procedures
More informationEXPERIMENT 6: HERITABILITY AND REGRESSION
BIO 184 Laboratory Manual Page 74 EXPERIMENT 6: HERITABILITY AND REGRESSION DAY ONE: INTRODUCTION TO HERITABILITY AND REGRESSION OBJECTIVES: Today you will be learning about some of the basic ideas and
More informationSimple Linear Regression
1 Excel Manual Simple Linear Regression Chapter 13 This chapter discusses statistics involving the linear regression. Excel has numerous features that work well for comparing quantitative variables both
More informationLecture 18 Linear Regression
Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation  used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation
More informationResiduals. Residuals = ª Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. ^ e i = y i  y i
A continuation of regression analysis Lesson Objectives Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis. M231 M232 Example 1: continued Case
More informationUsing Minitab for Regression Analysis: An extended example
Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to
More informationStatistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!
Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationRegression Analysis. Pekka Tolonen
Regression Analysis Pekka Tolonen Outline of Topics Simple linear regression: the form and estimation Hypothesis testing and statistical significance Empirical application: the capital asset pricing model
More informationIn Chapter 2, we used linear regression to describe linear relationships. The setting for this is a
Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects
More information0.1 Multiple Regression Models
0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different
More informationPractice 3 SPSS. Partially based on Notes from the University of Reading:
Practice 3 SPSS Partially based on Notes from the University of Reading: http://www.reading.ac.uk Simple Linear Regression A simple linear regression model is fitted when you want to investigate whether
More informationMultiple Regression Analysis in Minitab 1
Multiple Regression Analysis in Minitab 1 Suppose we are interested in how the exercise and body mass index affect the blood pressure. A random sample of 10 males 50 years of age is selected and their
More informationSimple Linear Regression in SPSS STAT 314
Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,
More informationCopyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5
Using Your TI83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression
More informationwhere b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.
Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationAnalyzing Linear Relationships, Two or More Variables
PART V ANALYZING RELATIONSHIPS CHAPTER 14 Analyzing Linear Relationships, Two or More Variables INTRODUCTION In the previous chapter, we introduced Kate Cameron, the owner of Woodbon, a company that produces
More information, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (
Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we
More informationChapter 11: Two Variable Regression Analysis
Department of Mathematics Izmir University of Economics Week 1415 20142015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationRegression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology
Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of
More informationSimple Linear Regression
STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze
More informationStudy Resources For Algebra I. Unit 1C Analyzing Data Sets for Two Quantitative Variables
Study Resources For Algebra I Unit 1C Analyzing Data Sets for Two Quantitative Variables This unit explores linear functions as they apply to data analysis of scatter plots. Information compiled and written
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGrawHill/Irwin, 2008, ISBN: 9780073319889. Required Computing
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationThe correlation coefficient
The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative
More informationAP * Statistics Review. Linear Regression
AP * Statistics Review Linear Regression Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production
More information12.1 Inference for Linear Regression
12.1 Inference for Linear Regression Least Squares Regression Line y = a + bx You might want to refresh your memory of LSR lines by reviewing Chapter 3! 1 Sample Distribution of b p740 Shape Center Spread
More informationRegression stepbystep using Microsoft Excel
Step 1: Regression stepbystep using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationChapter 12 : Linear Correlation and Linear Regression
Number of Faculty Chapter 12 : Linear Correlation and Linear Regression Determining whether a linear relationship exists between two quantitative variables, and modeling the relationship with a line, if
More informationAlgebra I: Lesson 54 (5074) SAS Curriculum Pathways
TwoVariable Quantitative Data: Lesson Summary with Examples Bivariate data involves two quantitative variables and deals with relationships between those variables. By plotting bivariate data as ordered
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table covariation least squares
More information496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT
496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT * Use a nonparametric technique. There are statistical methods, called nonparametric methods, that don t make any assumptions about the underlying distribution
More informationModule 5: Statistical Analysis
Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More information17.0 Linear Regression
17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was
More informationClass 6: Chapter 12. Key Ideas. Explanatory Design. Correlational Designs
Class 6: Chapter 12 Correlational Designs l 1 Key Ideas Explanatory and predictor designs Characteristics of correlational research Scatterplots and calculating associations Steps in conducting a correlational
More informationCorrelation and Regression
Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More information, has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.
BA 275 Review Problems  Week 9 (11/20/0611/24/06) CD Lessons: 69, 70, 1620 Textbook: pp. 520528, 111124, 133141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An
More informationCalculate Confidence Intervals Using the TI Graphing Calculator
Calculate Confidence Intervals Using the TI Graphing Calculator Confidence Interval for Population Proportion p Confidence Interval for Population μ (σ is known 1 Select: STAT / TESTS / 1PropZInt x: number
More informationRegression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Regression in ANOVA 1 Introduction 2 Basic Linear
More informationMultiple Regression in SPSS STAT 314
Multiple Regression in SPSS STAT 314 I. The accompanying data is on y = profit margin of savings and loan companies in a given year, x 1 = net revenues in that year, and x 2 = number of savings and loan
More informationCourse Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGrawHill/Irwin, 2010, ISBN: 9780077384470 [This
More informationLecture  32 Regression Modelling Using SPSS
Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture  32 Regression Modelling Using SPSS (Refer
More informationIn Chapter 27 we tried to predict the percent body fat of male subjects from
29 C H A P T E R Multiple Regression WHO WHAT UNITS WHEN WHERE WHY 25 Male subjects Body fat and waist size %Body fat and inches 199s United States Scientific research In Chapter 27 we tried to predict
More informationRegression. In this class we will:
AMS 5 REGRESSION Regression The idea behind the calculation of the coefficient of correlation is that the scatter plot of the data corresponds to a cloud that follows a straight line. This idea can be
More informationChapter 23. Inferences for Regression
Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily
More informationRelationships Between Two Variables: Scatterplots and Correlation
Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means Oneway ANOVA To test the null hypothesis that several population means are equal,
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationAn analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression
Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationMath 62 Statistics Sample Exam Questions
Math 62 Statistics Sample Exam Questions 1. (10) Explain the difference between the distribution of a population and the sampling distribution of a statistic, such as the mean, of a sample randomly selected
More informationSimple Linear Regression One Binary Categorical Independent Variable
Simple Linear Regression Does sex influence mean GCSE score? In order to answer the question posed above, we want to run a linear regression of sgcseptsnew against sgender, which is a binary categorical
More informationID X Y
Dale Berger SPSS StepbyStep Regression Introduction: MRC01 This stepbystep example shows how to enter data into SPSS and conduct a simple regression analysis to develop an equation to predict from.
More informationChapter 9. Section Correlation
Chapter 9 Section 9.1  Correlation Objectives: Introduce linear correlation, independent and dependent variables, and the types of correlation Find a correlation coefficient Test a population correlation
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression  ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More information