Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression




Objectives:
  To perform a hypothesis test concerning the slope of a least squares line
  To recognize that testing for a statistically significant slope in a linear regression and testing for a statistically significant linear relationship (i.e., correlation) are actually the same test

Recall that we can use the Pearson correlation r and the least squares line to describe a linear relationship between two quantitative variables X and Y. We called a sample of observations on two such quantitative variables bivariate data, represented as (x1, y1), (x2, y2), ..., (xn, yn). Typically, we let Y represent the response variable and X represent the explanatory variable. From such a data set, we have previously seen how to obtain the Pearson correlation r and the least squares line. We now want a hypothesis test to decide whether a linear relationship is statistically significant. For example, return to Table 10-2, where data is recorded on X = the dosage of a certain drug (in grams) and Y = the reaction time to a particular stimulus (in seconds) for each subject in a study. We have reproduced this data here as Table 31-1. From the calculations in Table 10-3, you previously found the correlation between dosage and reaction time to be r = −0.968, and from these same calculations you also found the least squares line to be rct = 10.5 − 0.875(dsg), where rct represents reaction time in seconds and dsg represents dosage in grams. However, these are just descriptive statistics; alone, they cannot tell us whether there is sufficient evidence of a linear relationship between the two quantitative variables. When we study the prediction of a quantitative response variable Y from a quantitative explanatory variable X based on the equation for a straight line, we say we are studying the linear regression to predict Y from X, or the linear regression of Y on X.
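The descriptive statistics above, the Pearson correlation r and the least squares slope and intercept, can be computed directly from their definitions. The sketch below uses a small hypothetical bivariate sample (the Table 31-1 data are not reproduced in this text), chosen only to illustrate the formulas.

```python
# Pearson correlation r and the least squares line for a small
# bivariate sample; the data here are hypothetical, for illustration.
import math

x = [1, 2, 3, 4, 5]   # explanatory variable X
y = [2, 1, 4, 3, 5]   # response variable Y
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and cross products
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = Sxy / math.sqrt(Sxx * Syy)   # Pearson correlation
b = Sxy / Sxx                    # least squares slope
a = y_bar - b * x_bar            # least squares intercept

print(r, b, a)
```

For this sample, r = 0.8 and the least squares line is y = 0.6 + 0.8x; the same arithmetic applies to any bivariate data set.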
Deciding whether or not a linear relationship is significant is the same as deciding whether or not the slope in the linear regression to predict Y from X is different from zero. If the slope is zero, then Y does not tend to change as X changes; if the slope is not zero, then Y will tend to change as X changes. We need a hypothesis test to decide if there is sufficient evidence that the slope in a simple linear regression is different from zero. Such a test could be one-sided or two-sided. If our focus was on finding evidence that the slope is greater than zero (i.e., finding evidence that the linear relationship was a positive one), then we would use a one-sided test. If our focus was on finding evidence that the slope is less than zero (i.e., finding evidence that the linear relationship was a negative one), then we would also use a one-sided test. However, if our focus was on finding evidence that the slope is different from zero in either direction (i.e., finding evidence that the linear relationship was either positive or negative), then we would use a two-sided test. With a simple random sample of bivariate data of size n, the following t statistic with n − 2 degrees of freedom is available for the desired hypothesis test:

t = (b − 0) / ( sY·X / √Σ(x − x̄)² ),

where sY·X is called the standard error of estimate. The formula for this t statistic has a format similar to other t statistics we have encountered. The numerator is the difference between a sample estimate of the slope (b) and the hypothesized value of the slope (zero), and the denominator is an appropriate standard error. The calculation of this test statistic requires obtaining the slope b of the least squares line, the sum of the squared deviations of the sample x values from their mean, Σ(x − x̄)², and the standard error of estimate sY·X. We have previously seen how to obtain b and Σ(x − x̄)², but we have not previously encountered the standard error of estimate sY·X. You can think of the standard error of estimate sY·X for bivariate data similar to the way you think of the standard deviation s for one sample of observations of a quantitative variable, which can be called univariate data. Recall that the standard deviation for a sample of x values is obtained by dividing Σ(x − x̄)² by one less than the sample size and taking the square root of the result. The standard error of estimate sY·X is a measure of the dispersion of data points around the least squares line, similar to the way that the standard deviation s is a measure of the dispersion of observations (around the sample mean) in a single sample. The standard error of estimate sY·X is obtained by dividing the sum of the squared residuals by two less than the sample size and taking the square root of the result. If you wish, you may do this calculation for the data of Table 31-1, after which you can calculate the test statistic. In general, a substantial amount of calculation is needed to obtain this test statistic, and the use of appropriate statistical software or a programmable calculator is recommended. Consequently, we shall not concern ourselves with these calculations in detail.
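The recipe just described, sum the squared residuals, divide by n − 2, take the square root, then divide the slope by sY·X/√Σ(x − x̄)², can be sketched in a few lines. The data below are the same hypothetical sample used earlier, not the Table 31-1 data.

```python
# Standard error of estimate sY·X and the t statistic for the slope,
# computed from the definitions in the text; the data are hypothetical.
import math

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = Sxy / Sxx                  # least squares slope
a = y_bar - b * x_bar          # least squares intercept

# Sum of squared residuals around the least squares line
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

s_yx = math.sqrt(sse / (n - 2))   # standard error of estimate
se_b = s_yx / math.sqrt(Sxx)      # standard error of the slope
t = (b - 0) / se_b                # t statistic with n - 2 df

print(s_yx, t)
```

For this sample the sum of squared residuals is 3.6, so sY·X = √(3.6/3) ≈ 1.095 and t ≈ 2.31 with 3 degrees of freedom.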
To illustrate the t test about the slope in a simple linear regression, let us consider using a 0.05 significance level to perform a hypothesis test to see if the data of Table 31-1 (Table 10-2) provide evidence that the linear relationship between drug dosage and reaction time is significant, or in other words, evidence that the slope in the linear regression to predict reaction time from drug dosage is different from zero. The first step is to state the null and alternative hypotheses, and choose a significance level. This test is two-sided, since we are looking for a relationship which could be positive or negative (i.e., a slope which could be different from zero in either direction). We can complete the first step of the hypothesis test as follows: H0: slope = 0 vs. H1: slope ≠ 0 (α = 0.05, two-sided test). We could alternatively choose to write the hypotheses as follows: H0: The linear relationship between drug dosage and reaction time is not significant vs. H1: The linear relationship between drug dosage and reaction time is significant (α = 0.05). The second step is to collect data and calculate the value of the test statistic. Whether you choose to do all the calculations yourself or to use the appropriate statistical software or programmable calculator, you should find the test statistic to be t6 = −8.73. The third step is to define the rejection region, decide whether or not to reject the null hypothesis, and obtain the p-value of the test. Since we have chosen α = 0.05 for a two-sided test, the rejection region is defined by the t-score with df = 6 above which 0.025 of the area lies; below the negative of this t-score will lie another 0.025 of the area, making a total area of 0.05. From Table A.3, we find that t6;0.025 = 2.447. We can then define our rejection region algebraically as t6 ≤ −2.447 or t6 ≥ +2.447 (i.e., |t6| ≥ 2.447).
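The two-sided rejection rule from the third step can be expressed as a small helper function. The critical value 2.447 below is the Table A.3 entry quoted in the text (df = 6, α = 0.05 two-sided); the function is only a sketch of the decision rule.

```python
# Two-sided rejection rule: reject H0 when |t| meets or exceeds the
# critical value t_(df; alpha/2) looked up from a t table.
def reject_two_sided(t_stat: float, t_crit: float) -> bool:
    """Return True when t_stat falls in the two-sided rejection region."""
    return abs(t_stat) >= t_crit

# df = 6, alpha = 0.05 two-sided: critical value 2.447 (Table A.3)
print(reject_two_sided(-8.73, 2.447))  # observed statistic: reject H0
print(reject_two_sided(1.50, 2.447))   # a statistic outside the rejection region
```

Because the rule compares |t| to the critical value, it rejects for slopes that are significantly different from zero in either direction.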
Since our test statistic, calculated in the second step, was found to be t6 = −8.73, which is in the rejection region, our decision is to reject H0: slope = 0; in other words, our data provide sufficient evidence to believe that H1: slope ≠ 0 is true. From Table A.3, we find that t6;0.0005 = 5.959, and since our observed test statistic is t6 = −8.73, with |−8.73| > 5.959, we see that p-value < 0.0005. To complete the fourth step of the hypothesis test, we can summarize the results of the hypothesis test as follows:

Since t6 = −8.73 and t6;0.025 = 2.447, we have sufficient evidence to reject H0. We conclude that the slope in the linear regression to predict reaction time from drug dosage is different from zero, or in other words, the linear relationship between dosage and reaction time is significant (p-value < 0.0005). The data suggest that the slope is less than zero, or in other words, that the linear relationship is negative. If we do not reject the null hypothesis, then no further analysis is necessary, since we are concluding that the linear relationship is not significant. On the other hand, if we do reject the null hypothesis, then we would want to describe the linear relationship, which of course is easily done by finding the least squares line. This was already done previously for the data of Table 31-1 in Unit 10, where we were working with the same data in Table 10-2. There are also further analyses that are possible once the least squares line has been obtained for a significant regression. These analyses include hypothesis testing and confidence intervals concerning the slope, the intercept, and various predicted values; there are also prediction intervals which can be obtained. These analyses are beyond the scope of our discussion here but are generally available from the appropriate statistical software or programmable calculator. Finally, it is important to keep in mind that the t test concerning whether or not a linear relationship is significant (i.e., the test concerning a slope) is based on the assumptions that the relationship between the two quantitative variables is indeed linear and that the variable Y has a normal distribution for each given value of X. Previously, we have discussed some descriptive methods for verifying the linearity assumption. Hypothesis tests are available concerning both the linearity assumption and the normality assumption.
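The unit's central point, that the test about the slope and the test about the correlation are the same test, can be checked numerically: the slope t statistic always equals r√(n − 2)/√(1 − r²). The sketch below verifies this identity on the same hypothetical sample used earlier.

```python
# Check that the t statistic for the slope equals the t statistic for
# the correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), on hypothetical data.
import math

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = Sxy / math.sqrt(Sxx * Syy)        # Pearson correlation
b = Sxy / Sxx                          # least squares slope
a = y_bar - b * x_bar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))        # standard error of estimate

t_slope = b / (s_yx / math.sqrt(Sxx))                  # t test about the slope
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # t test about r

print(t_slope, t_corr)   # the two statistics agree
```

Because the two statistics are algebraically identical, rejecting H0: slope = 0 and rejecting H0: the correlation is zero are one and the same decision.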
These tests are beyond the scope of our discussion here but are generally available from the appropriate statistical software or programmable calculator.

Self-Test Problem 31-1. In Self-Test Problems 10-1 and 11-1, the relationship between age and grip strength among right-handed males is being investigated, with regard to the prediction of grip strength from age. The Age and Grip Strength Data, displayed in Table 10-4, was recorded for several right-handed males. For convenience, we have redisplayed the data here in Table 31-2. (a) Explain how the data in Table 31-2 is appropriate for a hypothesis test concerning whether or not a linear relationship is significant. (b) A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence that the linear relationship between age and grip strength among right-handed males is positive, or in other words, evidence that the slope is positive in the linear regression to predict grip strength from age. Complete the four steps of the hypothesis test by completing the table titled Hypothesis Test for Self-Test Problem 31-1. You should find that t10 = +3.814. (c) Verify that the least squares line (found in Self-Test Problem 11-1) is grp = 6 + 2(age), and write a one-sentence interpretation of the slope of the least squares line. (d) Considering the results of the hypothesis test, decide which of the Type I or Type II errors is possible, and describe this error. (e) Decide whether H0 would have been rejected or would not have been rejected with each of the following significance levels: (i) α = 0.01, (ii) α = 0.10.

Answers to Self-Test Problem 31-1

(a) The data consist of a random sample of observations of the two quantitative variables age and grip strength.

(b) Step 1: H0: slope = 0 vs. H1: slope > 0 (α = 0.05, one-sided test) OR H0: The linear relationship between age and grip strength is not significant vs. H1: The linear relationship between age and grip strength is significant (α = 0.05)
Step 2: t10 = +3.814
Step 3: The rejection region is t10 ≥ +1.812. H0 is rejected; 0.0005 < p-value < 0.005.
Step 4: Since t10 = +3.814 and t10;0.05 = 1.812, we have sufficient evidence to reject H0. We conclude that the slope is positive in the regression to predict grip strength from age among right-handed males, or in other words, that the linear relationship between age and grip strength is positive (0.0005 < p-value < 0.005).

(c) The least squares line (found in Self-Test Problem 11-1) is grp = 6 + 2(age). With each increase of one year in age, grip strength increases on average by about 2 lbs.

(d) Since H0 is rejected, a Type I error is possible, which is concluding that slope > 0 (i.e., the positive linear relationship is significant) when in reality slope = 0 (i.e., the linear relationship is not significant).

(e) H0 would have been rejected with both α = 0.01 and α = 0.10.
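The one-sided decision in Self-Test Problem 31-1, including part (e)'s check at other significance levels, can be sketched the same way as the two-sided rule. The df = 10 critical values below are standard t-table entries and are used here only as illustrative lookups.

```python
# One-sided (upper-tail) rejection rule for the slope test, applied at
# several significance levels; critical values are standard t-table
# entries for df = 10.
def reject_upper_tail(t_stat: float, t_crit: float) -> bool:
    """Reject H0: slope = 0 in favor of H1: slope > 0 when t_stat >= t_crit."""
    return t_stat >= t_crit

t_obs = 3.814                                     # observed statistic from part (b)
crit = {0.10: 1.372, 0.05: 1.812, 0.01: 2.764}    # df = 10 upper-tail critical values

for alpha in sorted(crit):
    print(alpha, reject_upper_tail(t_obs, crit[alpha]))
```

Since 3.814 exceeds the critical value at every level shown, H0 is rejected at α = 0.10, 0.05, and 0.01, which matches the answer to part (e).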

Summary
When studying a possible linear relationship between two quantitative variables X and Y, we can obtain the Pearson correlation r and the least squares line from a sample consisting of bivariate data, represented as (x1, y1), (x2, y2), ..., (xn, yn). With Y representing the response variable and X representing the explanatory variable, we say that we are studying the linear regression to predict Y from X, or the linear regression of Y on X. Deciding whether or not the linear relationship between X and Y is significant is the same as deciding whether or not the slope in the linear regression of Y on X is different from zero. To make this decision, the t statistic with n − 2 degrees of freedom which can be used is

t = (b − 0) / ( sY·X / √Σ(x − x̄)² ),

where sY·X is called the standard error of estimate, calculated by dividing the sum of the squared residuals by two less than the sample size and taking the square root of this result. The numerator of this t statistic is the difference between a sample estimate of the slope (b) and the hypothesized value of the slope (zero), and the denominator is an appropriate standard error. These calculations can be greatly simplified by the use of appropriate statistical software or a programmable calculator. The null hypothesis in this hypothesis test could be stated as either H0: slope = 0 or H0: The linear relationship between X and Y is not significant, and the test could be one-sided or two-sided. If our focus was on finding evidence that the slope is greater than zero (i.e., finding evidence that the linear relationship was a positive one), then we would use a one-sided test. If our focus was on finding evidence that the slope is less than zero (i.e., finding evidence that the linear relationship was a negative one), then we would also use a one-sided test.
However, if our focus was on finding evidence that the slope is different from zero in either direction (i.e., finding evidence that the linear relationship was either positive or negative), then we would use a two-sided test. If we do not reject the null hypothesis, then no further analysis is necessary, since we are concluding that the linear relationship is not significant. On the other hand, if we do reject the null hypothesis, then we would want to describe the linear relationship, which of course is easily done by finding the least squares line. Further analyses that are possible once the least squares line has been obtained for a significant regression include hypothesis testing and confidence intervals concerning the slope, the intercept, and various predicted values; there are also prediction intervals which can be obtained. The t test concerning whether or not a linear relationship is significant (i.e., the test concerning a slope) is based on the assumptions that the relationship between the two quantitative variables is indeed linear and that the variable Y has a normal distribution for each given value of X. Hypothesis tests are available concerning both the linearity assumption and the normality assumption.