CHAPTER 2 AND 10: Least Squares Regression

In Chapters 2 and 10 we will be looking at the relationship between two quantitative variables measured on the same individuals.

General Procedure:

1. Make a scatterplot and describe the form, direction and strength of the relationship. Note: fitting a line only makes sense if the overall pattern of the scatterplot is roughly linear.
2. Look for outliers and influential observations on the scatterplot. Note: inference is not safe if there are influential points, since the results depend strongly on those few points. It is often helpful to rework the data without the influential observations and compare the results.
3. Find the correlation r to get a numerical measure of the direction and strength of the linear relationship.
4. Find r², the fraction of the variation in the values of y that is explained by the least squares regression of y on x.
5. If the data are reasonably linear, find the least squares regression line for the data. Note: the line can be used to predict y for a given x.
6. Make a residual plot and a normal probability plot to check the regression assumptions.
7. If and only if your data were collected using random sampling techniques, you can look at the hypothesis tests and confidence intervals for the correlation, slope and intercept.
8. If and only if your data were collected using random sampling techniques, you can look at the hypothesis tests and confidence intervals for the mean response and the prediction intervals.

Association Between Variables: Two variables measured on the same individuals are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.

Just because two variables are associated doesn't mean that a change in one variable causes a change in the other (causation, Section 2.6). Also, the relationship between two variables might not tell the whole story: other variables may affect the relationship. These other variables are called lurking variables.

Positive association: when above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together.

Negative association: when above-average values of one variable tend to accompany below-average values of the other, and vice versa.

No association: it is hard to find a pattern showing a relationship between the variables.

Response variable: measures an outcome of a study. Dependent variable, Y.

Explanatory variable: explains or causes changes in the response variable. Independent variable, X.

Example 1: A forester has become adept at estimating the volume (in cubic feet) of trees on a particular site prior to a timber sale. Since his operation has now expanded, he would like to train another person to assist in estimating the cubic foot volume of trees. He decided to create a model that will allow him to obtain the actual tree volume based on his assistant's estimate. The forester selects a random sample of trees to be felled. For each tree, the assistant is to guess the cubic foot volume of the tree. The forester also obtains the actual cubic foot volume after the tree has been cut down. Below is his data:

[Data table: Tree, Estimated Volume, Actual Volume for each tree.]

STEP 1: Make a scatterplot; describe the form, direction and strength of the relationship.

Before doing the scatterplot you need to decide which variable is the explanatory variable and which is the response variable. For Example 1, identify the explanatory and the response variables.

Explanatory Variable:
Response Variable:

A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The explanatory variable is plotted on the x axis; the response variable is plotted on the y axis. Look at the overall pattern. The overall pattern can be described by form, direction and strength.

Form: is the scatterplot linear, quadratic, etc.?
Direction: is the association positive or negative?
Strength: how strong is the relationship?

Describe the scatterplot in Example 1.
Form:
Direction:
Strength:

Scatterplot Using SPSS: >Graphs >Scatter/Dot. Select Simple and click Define. Pull Estimate into the X Axis box and Actual into the Y Axis box, then click OK.

Note: To get the fitted line you need to double click on your graph to bring up the chart editor. You will then need to click on the button that looks like a scatterplot with a fitted line through it. Select Linear and then Close.

[Figure: "Estimated Volume versus Actual Volume of Trees": scatterplot of Actual versus Estimate with a fitted line and its R Sq Linear value.]

STEP 2: Look for outliers and influential observations on the scatterplot.

Look for striking deviations from the overall pattern.

Outlier: an observation that lies outside of the overall pattern of the other observations. Points that are outliers in the y direction of a scatterplot have large regression residuals, but other outliers need not have large residuals.

Influential observation: an observation that, if removed, would markedly change the results of the regression calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least squares regression line.

Are there any outliers or influential observations in our data?

Note: To add a categorical variable to a scatterplot, use a different plot color or symbol for each category.
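For readers working in Python rather than SPSS, a scatterplot with a fitted least squares line can be drawn along these lines. This is only a sketch: the estimate/actual values below are made-up stand-ins, not the tree measurements from Example 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical estimated vs. actual tree volumes (cubic feet); not the Example 1 data.
estimate = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
actual   = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

# Least squares fit: polyfit with degree 1 returns (slope, intercept).
slope, intercept = np.polyfit(estimate, actual, 1)

plt.scatter(estimate, actual)                      # raw data
xs = np.linspace(estimate.min(), estimate.max(), 100)
plt.plot(xs, intercept + slope * xs)               # fitted line
plt.xlabel("Estimated Volume (cubic feet)")
plt.ylabel("Actual Volume (cubic feet)")
plt.title("Estimated Volume versus Actual Volume of Trees")
plt.show()
```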

STEP 3: Find the correlation r.

The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r:

r = (1/(n - 1)) Σ [(xi - x̄)/sx][(yi - ȳ)/sy]

Properties of correlation:
- It makes no difference which variable you call x and which you call y, since correlation does not make use of the distinction between the explanatory variable and the response variable.
- Both variables need to be quantitative to calculate correlation.
- The correlation r does not change if we change the units of measurement of x, y, or both.
- A positive r corresponds to a positive relationship between the variables. A negative r corresponds to a negative relationship between the variables. (-1 ≤ r ≤ 1.) Values near 0 indicate a weak relationship and values close to -1 or 1 indicate a strong relationship.
- Correlation measures the strength of only a linear relationship.
- Like the mean and standard deviation, the correlation is not resistant. The correlation r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot.
- Correlation is not a complete description of two-variable data. You should give the means and standard deviations of both x and y along with the correlation.

For Example 1, find the correlation between estimated volume and actual volume. We will use the Pearson Correlation output from SPSS here.
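A minimal Python sketch of the correlation computation, both from the definition above and from NumPy's built-in routine (again using made-up stand-in data, not the Example 1 measurements):

```python
import numpy as np

# Hypothetical paired data (not the tree measurements from Example 1).
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

n = len(x)
sx = x.std(ddof=1)          # sample standard deviations (divide by n - 1)
sy = y.std(ddof=1)

# Correlation from the definition: average product of standardized values.
r_by_hand = np.sum((x - x.mean()) / sx * (y - y.mean()) / sy) / (n - 1)

# Same quantity from NumPy's correlation matrix.
r_builtin = np.corrcoef(x, y)[0, 1]

print(r_by_hand, r_builtin)   # the two values agree
```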

Note: The SPSS manual tells you where to find r using the least squares regression output, but this r is actually the ABSOLUTE VALUE of r, so you need to figure out the sign yourself by looking at the association (positive or negative) of your data.

[SPSS output: Correlations table. The Pearson correlation between Estimate and Actual is .936**, with its two-tailed significance and N. ** Correlation is significant at the 0.01 level (2-tailed).]

STEP 4: Find r².

r² is the percent of variation in y explained by the regression line (the closer to 100%, the better). We can get this from the regression output, or by squaring the correlation r.

For Example 1, find the percent of variation in actual volume of trees explained by the regression line.

STEP 5: Find the least squares regression line for the data.

The least squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

We have data on an explanatory variable x and a response variable y for n individuals. The means and standard deviations of the data are x̄ and sx for x, and ȳ and sy for y; the correlation between x and y is r.

The regression model for the population is:  yi = β0 + β1 xi + εi

The sample prediction equation of the least squares regression is:  ŷ = b0 + b1 x

The slope is:  b1 = r (sy / sx)

The slope measures the amount by which the predicted response changes when the explanatory variable is increased by one unit.

The intercept is:  b0 = ȳ - b1 x̄
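The slope and intercept formulas above can be checked numerically; a short Python sketch, using the same hypothetical stand-in data:

```python
import numpy as np

# Hypothetical data standing in for the estimated and actual volumes.
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

r  = np.corrcoef(x, y)[0, 1]
sx = x.std(ddof=1)
sy = y.std(ddof=1)

b1 = r * sy / sx                  # slope:     b1 = r (sy / sx)
b0 = y.mean() - b1 * x.mean()     # intercept: b0 = ybar - b1 * xbar

# Cross-check against a direct least squares fit.
slope_check, intercept_check = np.polyfit(x, y, 1)
print(b1, slope_check)            # these agree
print(b0, intercept_check)
print(f"r^2 = {r**2:.3f}")        # fraction of variation in y explained by the line
```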

Using SPSS: >Analyze >Regression >Linear. Put Estimate into the Independent box and Actual into the Dependent box. Click OK.

[SPSS output: Model Summary (R = .936, with R Square, Adjusted R Square and Std. Error of the Estimate), ANOVA table (Regression, Residual and Total sums of squares with df, mean squares, F and Sig.), and Coefficients table giving the unstandardized coefficients B and their standard errors for the constant and for Estimate. Predictors: (Constant), Estimate. Dependent Variable: Actual.]

For Example 1, find the least squares regression line. Based on the r² value you found previously, do you think this line will be useful for predicting actual tree volumes?

For Example 1, use the regression line to predict the actual volume of a tree with an estimated volume of 3 cubic feet.

We can use a regression line to make predictions as long as we follow these rules:
- Only use the least squares regression line to find y for a specific value of x. (Don't use it to find x for a specific value of y!)
- Extrapolation involves using the line to find y values corresponding to x values that are outside the range of our data x values. Typically we want to avoid this, since the line may not be valid for a wide range of x values.
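As a rough counterpart to the SPSS regression output, the sketch below fits the least squares line in Python and uses it to predict a response at a new x value chosen inside the range of the (hypothetical) x data, so it is a prediction rather than an extrapolation:

```python
import numpy as np
from scipy import stats

# Hypothetical estimated/actual volumes (not the data from Example 1).
estimate = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
actual   = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

fit = stats.linregress(estimate, actual)   # least squares fit of actual on estimate
print(f"prediction equation: y-hat = {fit.intercept:.2f} + {fit.slope:.2f} x")
print(f"r^2 = {fit.rvalue**2:.3f}")

# Predict the actual volume for a tree whose estimated volume is 23 cubic feet;
# 23 lies inside the range of the x data, so this is not extrapolation.
x_new = 23.0
y_hat = fit.intercept + fit.slope * x_new
print(f"predicted actual volume at estimate = {x_new}: {y_hat:.1f} cubic feet")
```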

Example 2 (from Moore and McCabe, fourth edition): During the period after birth, a male white rat gains 40 grams (g) per week until about 10 weeks of age. (This is unusually regular growth, but 40 g per week is a realistic rate.)

a. If the rat weighed 100 g at birth, give an equation for his weight after x weeks. What is the slope of this line?
b. Would you be willing to use this line to predict the rat's weight at age 2 years? Do the prediction and think about the reasonableness of the result. (There are 454 g in a pound. To help you assess the result, a large cat weighs about 10 pounds.)

Prediction Intervals: Predicting a future observation under conditions similar to those used in the study. Since there is variability involved in using a model created from sample data, a prediction interval is better than a single prediction. Prediction intervals are related to confidence intervals. Use SPSS to calculate these intervals.

Residual: the vertical distance between the observed y value and the corresponding predicted y value:

residual = ei = yi - ŷi = yi - (b0 + b1 xi)

For Example 1, find the residual for tree number 1 and tree number 7.

Assumptions for Regression Inference and the Regression Model:
1. Repeated responses y are independent of each other. This basically means the data come from a simple random sample. (To check this assumption, examine the way in which the units were selected.)
2. For any fixed value of x, the response y varies according to a normal distribution. (To check the assumption of normality you can do a normal probability plot of the residuals in SPSS.)
3. The relationship is linear. (To check the linearity assumption, you can make a scatterplot or a residual plot of the data.)
4. The standard deviation σ of y is the same for all values of x. The value of σ is unknown. (To check for constant variability you can look at the residual plot of the data.)
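Returning to Example 2(b), a short calculation shows why the extrapolation is questionable; this sketch assumes 52 weeks per year, so 2 years is about 104 weeks:

```python
# Example 2(a): weight (g) after x weeks, starting from 100 g and gaining 40 g per week.
def rat_weight(weeks: float) -> float:
    return 100 + 40 * weeks        # intercept 100 g, slope 40 g per week

# Example 2(b): extrapolate to age 2 years (taking 52 weeks per year).
weeks_2yr = 2 * 52                               # 104 weeks
grams = rat_weight(weeks_2yr)                    # 100 + 40*104 = 4260 g
pounds = grams / 454                             # about 9.4 pounds
print(grams, round(pounds, 1))

# 4260 g is roughly the weight of a large cat, which is absurd for a rat:
# the linear growth pattern only holds to about 10 weeks of age, so predicting
# at 104 weeks is extrapolation far outside the range where the line is valid.
```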

STEP 6: Make a residual plot and a normal probability plot to check the regression assumptions.

It is always important to check that the assumptions of the regression model have been met to determine whether your results are valid. This is also important to do before you proceed with inference.

Normal probability plots: If your points fall in a relatively straight line, then you can assume that your response is relatively normal and the second assumption has been met. To check the normality assumption we make a normal probability plot as follows.

Using SPSS: >Analyze >Regression >Linear. Put Estimate into the Independent box and Actual into the Dependent box. Then select Plots, click on the box for Normal probability plot, and click Continue followed by OK.

[Figure: Normal P-P Plot of Regression Standardized Residual, Dependent Variable: Actual; Expected Cum Prob versus Observed Cum Prob.]

For Example 1, has the normality assumption been met?

Residual plots: A residual plot is a scatterplot of the regression residuals against the explanatory variable. It is used to assess the fit of a regression line and to check for constant variability. The residual plot magnifies the deviations from the line to make the pattern easier to read. If the points are random with no pattern, with approximately the same number of points above and below the center line, you can feel confident that assumptions three and four have been met. If you have a funnel shape, the assumption of constant variance has not been met. If you have some other pattern, like a parabola, the linearity assumption has not been met.
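A normal probability plot of the residuals can also be produced outside SPSS; a minimal Python sketch using SciPy's probplot on the hypothetical stand-in data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data standing in for the tree volumes.
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)     # observed minus predicted

# Normal probability plot: a roughly straight line suggests normality is plausible.
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal probability plot of regression residuals")
plt.show()
```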

Using SPSS: >Analyze >Regression >Linear. Put Estimate into the Independent box and Actual into the Dependent box. Click on the Save button, check Unstandardized residuals, and click Continue. The residuals will appear in the data editor. Make a scatterplot of the residuals on the y axis against the estimated volume on the x axis. To get the line at y = 0, while in the chart editor right click on the graph and select Add Y Axis Reference Line. Then select Reference Line and plug in a zero for the y axis position.

[Figure: Residual Plot; Unstandardized Residual versus Estimate.]

For Example 1, have the assumptions of linearity and constant variability been met?

Lastly, it is important to check for outliers and influential observations. Look for large residuals or points that are far from the other points. Often we will want to do the analysis both with and without the outliers, particularly if they are influential observations as well.

Are there any outliers or influential observations in Example 1?
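A residual plot is equally easy to make by hand once the residuals are computed; a Python sketch on the same hypothetical data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for the tree volumes.
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

plt.scatter(x, residuals)          # residuals against the explanatory variable
plt.axhline(0)                     # reference line at residual = 0
plt.xlabel("Estimate")
plt.ylabel("Residual")
plt.title("Residual plot")
plt.show()

# Look for: random scatter (good), a funnel shape (non-constant variance),
# or a curved pattern such as a parabola (the relationship is not linear).
```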

Scatterplot Without the Influential Observation

[Figure: scatterplot of Actual versus Estimate with the influential observation removed, showing a new fitted line and its R Sq Linear value.]

[SPSS output: Model Summary (R = .86, with R Square, Adjusted R Square and Std. Error of the Estimate) and Coefficients table for the model refit without the influential observation. Predictors: (Constant), Estimate. Dependent Variable: Actual.]

STEP 7: Look at the hypothesis tests and confidence intervals.

Up until this point we have looked at regression related concepts that can be used in an exploratory data analysis setting as well as in a more formal setting. We will now look at inference for regression. Before we do this, however, it must be understood that the tests and confidence intervals that we find from now on can only be used on data that have been collected using a random sampling technique such as simple random sampling. If we did not collect our data using a random sample, or if we have conducted a census, these techniques are meaningless.

Test for a Zero Population Correlation:

1. State the null and alternative hypotheses: H0: ρ = 0 versus Ha: ρ > 0, Ha: ρ < 0, or Ha: ρ ≠ 0.
2. Find the test statistic

   t = r √(n - 2) / √(1 - r²),

   where n is the sample size and r is the sample correlation.
3. Calculate the P-value in terms of a random variable T having the t(n - 2) distribution. The P-value for a test of H0 against
   Ha: ρ > 0 is P(T ≥ t);
   Ha: ρ < 0 is P(T ≤ t);
   Ha: ρ ≠ 0 is 2 P(T ≥ |t|).
4. Compare the P-value to the α level: if P-value ≤ α, then reject H0; if P-value > α, then fail to reject H0.
5. State your conclusions in terms of the problem.

For Example 1, test H0: ρ = 0 versus Ha: ρ ≠ 0.
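A Python sketch of the correlation test, computing t = r √(n - 2) / √(1 - r²) directly and checking it against SciPy's built-in Pearson test (hypothetical data again):

```python
import numpy as np
from scipy import stats

# Hypothetical paired data standing in for the tree volumes.
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

n = len(x)
r = np.corrcoef(x, y)[0, 1]

# Test statistic for H0: rho = 0, with df = n - 2.
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_two_sided = 2 * stats.t.sf(abs(t), df=n - 2)   # P-value for Ha: rho != 0
print(t, p_two_sided)

# SciPy's built-in Pearson test gives the same two-sided P-value.
r_check, p_check = stats.pearsonr(x, y)
print(r_check, p_check)
```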

Confidence Intervals for Regression Slope and Intercept:

A level C confidence interval for the intercept β0 is:  b0 ± t* SE(b0)

A level C confidence interval for the slope β1 is:  b1 ± t* SE(b1)

SPSS will give you these confidence intervals for 95%, but you may have to use the estimates of the coefficients and their standard errors to find other confidence intervals. (Use the t table with n - 2 degrees of freedom to get t*.)

Using SPSS: >Analyze >Regression >Linear. Put Estimate into the Independent box and Actual into the Dependent box. Click on Statistics, select Confidence intervals, and click Continue followed by OK.

[SPSS output: Coefficients table with the unstandardized coefficients B, their standard errors, t, Sig., and the 95% confidence interval for B (lower and upper bounds) for the constant and for Estimate. Dependent Variable: Actual.]

For Example 1, find the 95% and 99% confidence intervals for the slope and the y intercept.
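If only the raw data are at hand, the intervals can be computed from b ± t* SE in Python. The standard error expressions used below (s/√Sxx for the slope and s √(1/n + x̄²/Sxx) for the intercept) are the usual simple linear regression formulas, not something quoted in these notes, and the data are the same hypothetical stand-ins:

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for the tree volumes.
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)                 # slope, intercept
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))      # estimate of sigma (Std. Error of the Estimate)

sxx = np.sum((x - x.mean())**2)
se_b1 = s / np.sqrt(sxx)                             # standard error of the slope
se_b0 = s * np.sqrt(1/n + x.mean()**2 / sxx)         # standard error of the intercept

for level in (0.95, 0.99):
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)   # t* with n - 2 df
    print(f"{level:.0%} CI for slope:     {b1 - t_star*se_b1:.3f} to {b1 + t_star*se_b1:.3f}")
    print(f"{level:.0%} CI for intercept: {b0 - t_star*se_b0:.3f} to {b0 + t_star*se_b0:.3f}")
```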

Hypothesis test for the regression slope:

1. State the null and alternative hypotheses: H0: β1 = 0 versus Ha: β1 > 0, Ha: β1 < 0, or Ha: β1 ≠ 0.
2. Find the test statistic

   t = b1 / SE(b1),  with df = n - 2.

   (SPSS will give you the test statistic.)
3. Calculate the P-value. (SPSS will give you the two-sided P-value. If you have a one-sided test, you will have to divide the P-value by 2.)
4. Compare the P-value to the α level: if P-value ≤ α, then reject H0; if P-value > α, then fail to reject H0.
5. State your conclusions in terms of the problem.

The test statistic for the correlation is numerically identical to the test statistic used to test the slope. Therefore, you can read the test statistic and P-value off of the SPSS output for the slope when doing a test for the correlation.

For Example 1, perform a significance test to see whether the slope of the regression line is positive.
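A sketch of the slope test in Python; scipy.stats.linregress reports the two-sided P-value, so the one-sided P-value for Ha: β1 > 0 is obtained by halving it when the estimated slope is positive (hypothetical data again):

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for the tree volumes.
x = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35], dtype=float)
y = np.array([13, 16, 17, 21, 24, 24, 29, 32, 33, 38], dtype=float)

fit = stats.linregress(x, y)
t = fit.slope / fit.stderr            # test statistic for H0: beta1 = 0
p_two_sided = fit.pvalue              # linregress reports the two-sided P-value

# One-sided test of Ha: beta1 > 0: halve the two-sided P-value
# (appropriate here because the estimated slope is positive).
p_one_sided = p_two_sided / 2 if fit.slope > 0 else 1 - p_two_sided / 2
print(t, p_two_sided, p_one_sided)
```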

Example 4: This example uses data that are part of a data set from Dr. T.N.K. Raju, Department of Neonatology, University of Illinois at Chicago.

IMR = Infant Mortality Rate
PQLI = Physical Quality of Life Index (an indicator of average wealth)

[Data table: Case, PQLI, IMR for each case.]

How does the physical quality of life index affect the infant mortality rate? Answer the questions below based on the output that follows.

a. Describe the form, direction and strength of the relationship.
b. What is the correlation?
c. What percent of the variation in infant mortality rate is explained by the regression line?
d. Give an estimate for the standard deviation of the model. (Find s.)
e. Do a hypothesis test of H0: β1 = 0 versus Ha: β1 ≠ 0.
f. What is the equation of the least squares regression line?
g. Use the regression line to predict the infant mortality rate for a PQLI of 25.

h. Is the prediction in part g good? Why?
i. Find the residual for case 1.
j. Find a 99% confidence interval for the slope.
k. What assumptions need to be met for the above to be of use?

[Figure: scatterplot "How Physical Quality of Life Affects Infant Mortality Rate"; IMR versus PQLI.]

[SPSS output: Model Summary (R = .300, with R Square, Adjusted R Square and Std. Error of the Estimate) and Coefficients table with B, standard errors, t, Sig. and the 95% confidence interval for B for the constant and for PQLI. Predictors: (Constant), PQLI. Dependent Variable: IMR.]

[Figure: Residual Plot; Unstandardized Residual versus PQLI.]

[Figure: Normal P-P Plot of Regression Standardized Residual, Dependent Variable: IMR; Expected Cum Prob versus Observed Cum Prob.]

Does the relationship make sense? For example, does it make sense that the infant mortality rate will go up as the physical quality of life index gets better? What could be a potential lurking variable here?

Now let's look at what happens if we add a categorical variable to the picture.

[Data table: Case, PQLI, IMR, Location (rural or urban) for each case.]
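A sketch of adding a categorical variable to a scatterplot by plotting each group with its own symbol; the PQLI/IMR values and rural/urban labels below are invented for illustration and are not Dr. Raju's data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical PQLI/IMR values with a rural/urban label (not the Example 4 data).
pqli     = np.array([10, 15, 22, 30, 35, 42, 50, 58, 65, 72], dtype=float)
imr      = np.array([95, 80, 110, 60, 90, 55, 70, 40, 52, 35], dtype=float)
location = np.array(["rural", "rural", "rural", "urban", "rural",
                     "urban", "rural", "urban", "urban", "urban"])

# One symbol per category, as the notes suggest.
for group, marker in (("rural", "o"), ("urban", "^")):
    keep = location == group
    plt.scatter(pqli[keep], imr[keep], marker=marker, label=group)

plt.xlabel("PQLI")
plt.ylabel("IMR")
plt.title("How Physical Quality of Life Affects Infant Mortality Rate")
plt.legend(title="Location")
plt.show()
```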

[Figure: scatterplot "How Physical Quality of Life Affects Infant Mortality Rate"; IMR versus PQLI, with separate symbols for urban and rural locations.]
