Lesson 1: Comparison of Population Means Part 1c: Comparison of Two Means




Lesson 1: Comparison of Population Means Part 1c: Comparison of Two Means Welcome to lesson 1c. This third lesson of lesson 1 will discuss hypothesis testing for two independent means.

Steps in Hypothesis Testing
1. State the null hypothesis H0 and the alternative hypothesis Ha.
2. Calculate the value of the test statistic on which the test will be based.
3. Find the p-value for the observed data.
4. State a conclusion.
Recall the steps in hypothesis testing. First, we state the null and alternative hypotheses to address our research question. Next, we calculate the test statistic. We then compare the test statistic to a density curve to find the p-value. Finally, we compare this p-value to the type I error probability to determine our conclusion.

Hypothesis Testing: Comparing Two Means
Identify two independent populations. Draw a simple random sample of size n1 from population 1 and a simple random sample of size n2 from population 2. Compute the mean for each sample. Formulate a hypothesis test based on the difference of the means.
Let's discuss the setting for hypothesis testing of two means. First, we take random samples from two independent populations. We wish to compare the average values mu1 and mu2 of these two independent populations to see if they are similar or different. From each sample we compute an average value, xbar1 or xbar2. We will then use these sample means to determine whether the populations are centered in the same location or in different locations. We look at a linear combination (xbar1 - xbar2) to help us make this decision. If xbar1 - xbar2 is close to zero, the populations probably have the same center. If xbar1 - xbar2 is not close to zero, the populations may have different centers.

Hypothesis Testing: Comparing Two Means
Step 1: State your hypotheses
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0 (two-sided)
Or Ha: μ1 - μ2 < 0 (one-sided)
Or Ha: μ1 - μ2 > 0 (one-sided)
The first step is to represent our scientific question in the null and alternative hypotheses. The null, H0: μ1 - μ2 = 0, represents the condition that the populations are centered in the same spot. As with one-sample and matched-pairs hypothesis testing, we can have a one-sided or a two-sided alternative. The two-sided alternative is that the means differ; that is, μ1 - μ2 does not equal zero. We could also look at the alternative that μ2 is greater than μ1: if we subtract μ2 from μ1 we would obtain a negative number. Or we could look at the alternative that μ1 is greater than μ2: if we subtract μ2 from μ1 we would obtain a positive number.

Two-sample problem with σ1 and σ2 known:
z = (xbar1 - xbar2) / √(σ1²/n1 + σ2²/n2)
If the population standard deviation is known for both populations, then the statistic we use is the z statistic. You can see the standard error of the difference of xbar1 and xbar2 in the denominator. You will recall that when we make a linear combination of two means, the variances are additive. This is why the standard error in the denominator has a plus sign instead of a minus sign. As before, knowing the variance of the population is not typical, so we usually substitute the sample variance where the population variance appears in this equation.
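The z statistic above can be sketched in a few lines of Python (Python is not part of the lesson's SAS materials; the sample values and the "known" sigmas below are invented purely for illustration):

```python
from math import sqrt
from statistics import mean

def two_sample_z(x1, x2, sigma1, sigma2):
    # The variances add in the standard error of xbar1 - xbar2,
    # hence the plus sign under the square root.
    se = sqrt(sigma1**2 / len(x1) + sigma2**2 / len(x2))
    return (mean(x1) - mean(x2)) / se

# Invented samples with (hypothetically) known sigma1 = sigma2 = 5
z = two_sample_z([10, 12, 14, 16], [11, 13, 15, 17], 5, 5)  # about -0.283
```

The helper simply mirrors the formula: sample means in the numerator, the additive standard error in the denominator.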

Two-sample problem with σ1 and σ2 unknown (do not assume σ1 = σ2). Use this t when s_larger/s_smaller > 2.
df = smaller of n1 - 1 or n2 - 1
t = (xbar1 - xbar2) / √(s1²/n1 + s2²/n2)
When we do this substitution we switch to using Student's t statistic. Notice the title also says do not assume σ1 = σ2. There are two ways to calculate the standard error of the difference of sample means. The first is what you see in the denominator here. This is for when we cannot assume that the population standard deviations of the two independent populations are the same. We do not know the population standard deviations, so we look at our sample standard deviations. If the larger sample standard deviation divided by the smaller sample standard deviation is greater than two, we do not assume σ1 = σ2. This is a rule of thumb. It says our sample standard deviations are different enough that we cannot assume σ1 = σ2. Our degrees of freedom for this t-test are the smaller of n1 - 1 or n2 - 1.
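The unpooled t and its conservative degrees of freedom can be sketched the same way (again in Python rather than SAS; the data below are invented, chosen so the SD ratio exceeds two):

```python
from math import sqrt
from statistics import mean, stdev

def unpooled_t(x1, x2):
    # Standard error without pooling: s1^2/n1 + s2^2/n2 under the root;
    # the conservative df is the smaller of n1 - 1 and n2 - 1.
    n1, n2 = len(x1), len(x2)
    se = sqrt(stdev(x1)**2 / n1 + stdev(x2)**2 / n2)
    return (mean(x1) - mean(x2)) / se, min(n1 - 1, n2 - 1)

# Invented data whose SD ratio is about 2.4, so we do not pool
t, df = unpooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 12])  # t about -1.83, df = 4
```

Note that `statistics.stdev` computes the sample (n - 1) standard deviation, which is what these formulas call for.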

Two-sample problem with σ1 and σ2 unknown and assumption σ1 = σ2. Use this t when s_larger/s_smaller < 2.
t = ((xbar1 - xbar2) - μ0) / (sp √(1/n1 + 1/n2))
with sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)
with n1 + n2 - 2 degrees of freedom
So what happens when s_larger over s_smaller is less than two? Well, we can assume σ1 = σ2. If we make this assumption, we use the t test given in this slide. Notice we pool the variances to create a common variance. This is called sp². We then use sp in the denominator to calculate the standard error of the difference of sample means. So why worry about whether or not to assume σ1 = σ2? Notice the degrees of freedom when we make the assumption are n1 + n2 - 2. This is more degrees of freedom than in the previous t test. When we have more degrees of freedom in the t-test, we have more power to detect a difference should there be one. So we want to make the assumption that the population variances are equal when we can.
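The pooled version differs only in how the standard error is built; a minimal Python sketch (invented data with similar spreads):

```python
from math import sqrt
from statistics import mean, stdev

def pooled_t(x1, x2, mu0=0.0):
    # Pool the two sample variances into sp^2, then test with
    # n1 + n2 - 2 degrees of freedom.
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * stdev(x1)**2 + (n2 - 1) * stdev(x2)**2) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2) - mu0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Invented data with equal spreads (SD ratio well under 2), so we pool
t, df = pooled_t([1, 2, 3], [2, 3, 4])  # t about -1.22, df = 4
```

Comparing this with the unpooled version makes the trade-off concrete: same numerator, but a common variance estimate and more degrees of freedom.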

SAS Example
We have 10 students; 5 are randomly assigned to control and 5 are randomly assigned to treatment. Response time to a stimulus is measured for all 10 participants.
Research question: Do the treatment scores come from a population whose mean is different from the one from which the control scores were drawn?
Control mean = 88.6 millisec
Treatment mean = 101.6 millisec
Let's try an example. We will also introduce a bit of SAS to understand our example. Recall: SAS is a computer language that helps us analyze data. At this point you may have tried the SAS tutorial for the first assignment. If not, you may want to do this before moving on. Consider the example above. You have 10 students; 5 are randomly assigned to control and 5 are randomly assigned to treatment. The outcome of interest is response time to a stimulus. We are wondering if the treatment group and the control group come from populations whose mean values are different. The control sample mean was 88.6 milliseconds. The treatment sample mean was 101.6 milliseconds.

SAS: Proc ttest
Data Response;
Input Group $ Time;
Datalines;
C 80
C 93
C 83
C 89
C 98
T 100
T 103
T 104
T 99
T 102
;
Proc ttest data = response;
Title "T-test example";
class group;
var time;
run;
This is a SAS program that reads in data and performs a two-sample t-test. The first line, Data Response;, tells SAS that we want to create a temporary data set called response. The next line, Input Group $ Time;, tells SAS that we have two variables. One is called Group and it is categorical; this is indicated by the dollar sign following the word Group. The second variable is called Time. There is no designation of type after Time; SAS will assume a variable is quantitative if there is no designation. Next we have Datalines;. This tells SAS: here comes the actual data. Following Datalines is the data. C and T are control and treatment. The data is separated by a single space, with a new line for each person's data. Notice the semicolon is on a line below the final piece of data. The data step is now complete. We can then do a procedure on this data.

SAS: Proc ttest
Data Response;
Input Group $ Time;
Datalines;
C 80
C 93
C 83
C 89
C 98
T 100
T 103
T 104
T 99
T 102
;
Proc ttest data = response;
Title "T-test example";
class group;
var time;
run;
The procedure is Proc ttest; SAS will analyze the data using a t-test. We tell SAS which data set with data=response;. We can insert a title with the Title command followed by the title in quotes. Next, class group; tells SAS between which two groups we would like to perform the t-test. In our case it is the variable called Group, with control or treatment as the groups. var time; tells SAS that we want to analyze the outcome Time. Finally, run; tells SAS to go ahead and analyze using the procedure. Again, for more on SAS programming see the class SAS tutorials.
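For readers following along without SAS, a rough Python equivalent of the data step and group split might look like this (Python is not used in the lesson; this is only an illustrative cross-check of the group means):

```python
from statistics import mean

# Same observations as the SAS Datalines block: (group, time) pairs
data = [("C", 80), ("C", 93), ("C", 83), ("C", 89), ("C", 98),
        ("T", 100), ("T", 103), ("T", 104), ("T", 99), ("T", 102)]

# Split by group, as class group; does in Proc ttest
control = [time for group, time in data if group == "C"]
treatment = [time for group, time in data if group == "T"]

control_mean = mean(control)      # 88.6
treatment_mean = mean(treatment)  # 101.6
```

The two means match the summary on the previous slide.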

The TTEST Procedure
Statistics
                              Lower CL            Upper CL   Lower CL             Upper CL
Variable  Group       N       Mean       Mean     Mean       Std Dev    Std Dev   Std Dev   Std Err
Time      C           5       79.535     88.6     97.665     4.3742     7.3007    20.979    3.265
Time      T           5       99.025     101.6    104.17     1.2424     2.0736    5.9587    0.9274
Time      Diff (1-2)          -20.83     -13      -5.173     3.6253     5.3666    10.281    3.3941
Note: SAS tests Ho: μ1 = μ2 vs. Ha: μ1 not equal μ2.
If everything ran without error, you will see output. This is the first piece of output you see. We have the variable Time divided by our two groups, control and treatment. We also have a row indicating the difference of the control and treatment outcomes. Let's start with row 1. We see this is our control group with n = 5. We have the sample mean of 88.6 with a 95% confidence interval from 79.5 to 97.7. We have a sample standard deviation of 7.3. SAS also gives us the 95% confidence interval for the standard deviation. Finally, SAS gives us the standard error, 7.3 divided by the square root of 5, with value 3.27. We have the same information in row 2 for our treatment group. The third row has the confidence interval for μ1 - μ2, from -20.8 to -5.2. Notice this confidence interval does not contain zero. Later we will see why this is important to us. Finally, notice SAS is always testing the two-sided hypothesis test Ho: μ1 = μ2 vs. Ha: μ1 not equal μ2.
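The mean, standard deviation, standard error, and mean confidence limits in this table can be verified by hand. A short Python check for the control row (using the critical value t(0.025, df = 4) ≈ 2.776; again illustrative, not part of the SAS lesson):

```python
from math import sqrt
from statistics import mean, stdev

control = [80, 93, 83, 89, 98]

m = mean(control)                        # 88.6
s = stdev(control)                       # about 7.3007
se = s / sqrt(len(control))              # about 3.265
t_crit = 2.776                           # t(0.025, df = 4)
ci = (m - t_crit * se, m + t_crit * se)  # about (79.54, 97.66)
```

These reproduce the Mean, Std Dev, Std Err, and the 95% confidence limits SAS prints for the control group.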

T-Tests
Variable   Method          Variances   DF     t Value   Pr > |t|
Time       Pooled          Equal       8      -3.83     0.0050
Time       Satterthwaite   Unequal     4.64   -3.83     0.0124
Note: SAS tests both with the assumption σ1 = σ2 and with σ1 not equal σ2.
This is the output for the t-test statistic. Notice we have two results, Pooled and Satterthwaite. These correspond to the two choices that we have for the t statistic. We need to decide which is appropriate for our analysis based on our data. If we go back to the previous slide, we see the standard deviation for the control group over the standard deviation for the treatment group is greater than 2. This means we cannot assume the population standard deviations are the same. We need to choose the unequal variances option (Satterthwaite). Our degrees of freedom when we do this calculation by hand are the smaller of n1 - 1 or n2 - 1. SAS uses a formula to calculate more exact degrees of freedom, so our number will not match. The t value is -3.83 with a two-sided p-value of 0.0124. At α = 0.05 we would reject the null and conclude the response times are different.
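Both t statistics, and the fractional Satterthwaite degrees of freedom, can be reproduced from the raw data. A Python sketch of the calculation (the Welch-Satterthwaite df formula is the standard one, matching what SAS computes here):

```python
from math import sqrt
from statistics import mean, stdev

control = [80, 93, 83, 89, 98]
treatment = [100, 103, 104, 99, 102]
n1, n2 = len(control), len(treatment)
v1, v2 = stdev(control)**2, stdev(treatment)**2
diff = mean(control) - mean(treatment)  # -13

# Pooled (equal variances): df = n1 + n2 - 2 = 8
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))

# Satterthwaite (unequal variances): fractional df from the Welch formula
u1, u2 = v1 / n1, v2 / n2
t_unpooled = diff / sqrt(u1 + u2)
df_welch = (u1 + u2)**2 / (u1**2 / (n1 - 1) + u2**2 / (n2 - 1))
```

With equal group sizes the two t values coincide at -3.83, exactly as in the SAS table, while the degrees of freedom (8 vs. about 4.64) and hence the p-values differ.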

Equality of Variances
Variable   Method     Num DF   Den DF   F Value   Pr > F
Time       Folded F   4        4        12.40     0.0318
Note: If Pr > F is < 0.05, do NOT assume σ1 = σ2; do not pool the sample variances.
So why this final piece of output when we are able to draw our conclusion from the previous output? Well, SAS performs a test of equality for the population variances instead of using our ratio as a rule of thumb. Generally, the results will agree. Here SAS tests Ho: σ1 = σ2 vs. Ha: σ1 not equal σ2. The test statistic is an F with a p-value of 0.03. At α = 0.05 we would reject the null and conclude the population variances cannot be assumed to be equal. This would lead us to use the Satterthwaite test above, as we had decided before. Recall: we said that the confidence interval for the difference of means does not contain zero. This corresponds to the two-sided hypothesis test of the difference of means. If the confidence interval does not contain zero, we would reject the null hypothesis of equal means. Note, however, that SAS calculates this confidence interval based on the assumption that σ1 = σ2. Your results may not always match if you cannot make this assumption.
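The folded F statistic itself is just the larger sample variance over the smaller, with (num df, den df) = (n1 - 1, n2 - 1) = (4, 4) here; a quick check:

```python
from statistics import stdev

control = [80, 93, 83, 89, 98]
treatment = [100, 103, 104, 99, 102]

# Folded F: larger sample variance over the smaller
f_value = stdev(control)**2 / stdev(treatment)**2  # about 12.40
```

The value matches the F of 12.40 in the SAS table; the p-value then comes from the F(4, 4) distribution.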

Hypothesis Testing Comparing Two Means: An Example
The effect of environmental exposure to lead on intellectual development is investigated using two randomly selected samples of 7-year-old children from similar backgrounds but with different lead exposures.
Here is an example not using SAS to do our analysis. Lead has a detrimental effect on intellectual development, especially when young children are exposed. The effect of environmental exposure to lead on intellectual development is investigated using two randomly selected samples of 7-year-old children from similar backgrounds but with different lead exposures.

Hypothesis Testing Comparing Two Means: An Example
Serum lead levels in group 1 > 30 ug/dl
Serum lead levels in group 2 < 30 ug/dl
The two groups of children have different lead levels. One group had lead levels above 30 micrograms per deciliter. The other group had lead levels below 30 micrograms per deciliter. Researchers are wondering if children with lead levels above 30 micrograms per deciliter will score differently on intelligence tests than the children with lower lead levels.

Hypothesis Testing Comparing Two Means: An Example
Does a significant difference exist between the mean intelligence test scores in these two groups? The data for intelligence test score are summarized below:
n1 = 61        n2 = 41
xbar1 = 94     xbar2 = 101
s1 = 17        s2 = 8
A random sample was drawn from each population and the children were given an established intelligence test. The results are as follows. Of the 61 kids in the higher lead level group, the average score was 94 with a standard deviation of 17 points. Of the 41 kids with the lower lead level, the average score was 101 with a standard deviation of 8 points. Is there a difference between the mean intelligence test scores for the two populations? The sample averages are different, but this could have happened by chance. We can do a hypothesis test of two means to see if the means are significantly different.

Hypothesis Testing Comparing Two Means: An Example
Step 1: State your hypotheses (set α = .05)
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0 (two-sided)
Step 2: Calculate your test statistic
t = ((xbar1 - xbar2) - μ0) / √(s1²/n1 + s2²/n2) = ((94 - 101) - 0) / √(17²/61 + 8²/41) = -2.789
The first step is to write our null and alternative hypotheses. Remember this is a two-sided hypothesis; we did not specify that either group would be lower. Next we need to decide which test statistic to use. We do not know sigma for either population, so we will use a t statistic, but which one? If we look at the ratio of the sample standard deviations, the larger over the smaller, we see that this value is greater than two. We do not assume the population standard deviations are the same, and we do not pool the variances. The unpooled t value is -2.789.
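A quick check of this unpooled t statistic from the example's summary statistics (group 1: n = 61, mean 94, SD 17; group 2: n = 41, mean 101, SD 8), sketched in Python for illustration:

```python
from math import sqrt

# Summary statistics from the lead-exposure example
n1, xbar1, s1 = 61, 94, 17   # serum lead > 30 ug/dl
n2, xbar2, s2 = 41, 101, 8   # serum lead < 30 ug/dl

# s1/s2 = 17/8 = 2.125 > 2, so do not pool; use the unpooled t
t = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)  # about -2.789
df = min(n1 - 1, n2 - 1)                             # 40
```

The statistic and the conservative degrees of freedom agree with the slide's values.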

Hypothesis Testing Comparing Two Means: An Example
Step 3: Calculate the p-value
2 × P(t ≤ -2.789) = 2 × .0025 = .005
Degrees of freedom = 40
Step 4: Make a conclusion
p-value < α, then reject Ho
The data suggest a significant mean difference exists in intelligence scores for the two groups.
Step 3 is to calculate our p-value. We look at 2 times the probability that a t with 40 degrees of freedom is less than -2.789. Our conclusion is to reject Ho. This means the scores are significantly different at the alpha 0.05 level. Our sample means were not likely different by chance. It is likely the average scores for the populations have different locations. In other words, the data suggest a significant mean difference exists in intelligence scores for the two groups.

Hypothesis Testing: A Pooled T-test Example
Independent random samples selected from two normal populations produced the sample means and standard deviations shown in the table:
           Sample size   Mean   Sample standard deviation
Sample 1   17            5.4    3.4
Sample 2   12            7.9    4.8
Test the null hypothesis that the population means are equal vs. the alternative that they are not equal. Let α = 0.05; this means we will reject the null hypothesis when it is true 5% of the time.
Here is an example where we would choose to use the pooled t test. Independent random samples selected from two normal populations produced the following results. There were 17 subjects in sample 1 with a mean value of 5.4 and a sample standard deviation of 3.4. There were 12 subjects in sample 2 with a mean value of 7.9 and a standard deviation of 4.8. We want to test the null hypothesis that the population means are equal vs. the alternative that they are not equal. Let α = 0.05; this means we will reject the null hypothesis when it is true 5% of the time. Notice the ratio of the larger sample standard deviation divided by the smaller sample standard deviation is less than two.

Hypothesis Testing: A Pooled T-test Example
1. Ho: μ1 - μ2 = 0, Ha: μ1 - μ2 ≠ 0
2. Calculate the test statistic:
sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) = ((17 - 1)(3.4²) + (12 - 1)(4.8²)) / (17 + 12 - 2) = 16.24
t = ((xbar1 - xbar2) - μ0) / (sp √(1/n1 + 1/n2)) = (5.4 - 7.9) / √(16.24 (1/17 + 1/12)) = -1.645
We choose to use the pooled t test. First, we need to figure out what the pooled estimate of the variance would be. We call this sp². Its value is 16.24. We use this value in the denominator of the t statistic. Our statistic yields a value of -1.645.
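The pooled calculation can be checked the same way, directly from the summary statistics (a Python sketch for illustration):

```python
from math import sqrt

# Summary statistics from the pooled-t example
n1, xbar1, s1 = 17, 5.4, 3.4
n2, xbar2, s2 = 12, 7.9, 4.8

# s_larger/s_smaller = 4.8/3.4, under 2, so pool the variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # about 16.24
t = (xbar1 - xbar2) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))    # about -1.645
df = n1 + n2 - 2                                             # 27
```

Both the pooled variance and the t value agree with the slide.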

Hypothesis Testing: A Pooled T-test Example
This test statistic follows the t-distribution with 27 degrees of freedom.
3. P-value = 2 × P(T > 1.645) = 0.11 (answer from calculator).
4. Therefore we fail to reject the null hypothesis based on an α-level of 0.05 and conclude that the two population means are not likely different.
This test statistic follows the t-distribution with 27 degrees of freedom. This is n1 + n2 - 2. The p-value = 2 × P(T > 1.645) = 0.11. I obtained this answer from a calculator or a computer program; you cannot get an exact value using the table in your book. Our p-value leads us to fail to reject the null hypothesis based on an α-level of 0.05 and conclude that the two population means are not likely different. This assumes we had the power to detect a difference should there be one. This ends lesson 1c. Please go to self-assessment 1c.