Two-sample inference: Continuous data




Two-sample inference: Continuous data
Patrick Breheny, STA 580: Biostatistics I, April 5

Introduction

Our next two lectures will deal with two-sample inference for continuous data. As we will see, some procedures work very well when the continuous variable follows a roughly normal distribution, but work poorly when it doesn't. Today's lecture covers procedures that work well when the variable is roughly normal in distribution.

Example: lead exposure and IQ

Our motivating example for today deals with a study that you looked at on assignment 4, concerning lead exposure and neurological development for a group of children in El Paso, Texas. The study compared the IQ levels of 57 children who lived within 1 mile of a lead smelter with a control group of 67 children who lived at least 1 mile away from the smelter. In the study, the average IQ of the children who lived near the smelter was 89.2, while the average IQ of the control group was 92.7.

Looking at the data

[Figure: histograms of IQ (percent of total) for the Near and Far groups.]

Could the results have been due to chance?

Looking at the raw data, it appears that living close to the smelter may hamper neurological development. However, as always, there is sampling variability present: just because, in this sample, the children who lived closer to the smelter had lower IQs does not necessarily imply that the population of children who live near smelters have lower IQs. We need to ask whether or not our finding could have been due to chance.

The difference between two means

We could calculate separate confidence intervals for the two groups using one-sample t-distribution methods. However, as we have said several times, the better way to analyze the data is to use all of the data to answer the single question: is there a difference between the two groups? Denoting the two groups with the subscripts 1 and 2, we will attempt to test the hypothesis that μ₁ = μ₂ by looking at the random variable x̄₁ − x̄₂ and determining whether it is far away from 0 with respect to its variability (standard error). Today, we will go over an exact, albeit computer/labor-intensive, approach (the permutation test); then we will talk about an approximate approach that is much easier to do by hand (the two-sample t-test).

Viewing our study as balls in an urn

The same concept that we encountered in the two-sample Fisher's exact test can be used for continuous data as well. If the smelter had no effect on children's neurological development, then it wouldn't matter whether a child lived near it or not. Under this null hypothesis, then, it would be like writing down each child's IQ on a ball, putting all the balls into an urn, then randomly picking out 57 balls and calling them the "near" group, with the other 67 balls forming the "far" group. If we were to do this, how often would we get a difference as large or larger than 3.5, the difference actually observed?

Results of the experiment

[Figure: histogram of the differences in means obtained across the random draws, ranging from about −10 to 10.]

Results of the experiment (cont'd)

In the experiment, I obtained a random difference in means larger than the observed one 1,809 times out of 10,000. My experimental p-value is therefore 1809/10000 = .18. Thus, our suggestive-looking box plots may have been due entirely to chance.

Experiment examples

[Figure: example recreations of the experiment under the null hypothesis in which a difference as large or larger than the observed one was obtained.]

Permutation tests

This approach to carrying out a hypothesis test is called a permutation test. The different orders in which a sequence of numbers can be arranged are called permutations; a permutation test essentially calculates the percentage of random permutations under the null hypothesis that produce a result as extreme or more extreme than the observed value. Unlike Fisher's exact test, exact solutions are not readily available (this problem is much harder than Fisher's exact test). Unless the number of observations is small enough that we can count the permutations by hand, we need a computer to perform a permutation test.
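As a rough sketch, the urn experiment described above can be coded as follows. The individual IQ measurements are not reproduced on the slides, so the two small groups below are hypothetical stand-in data, not the El Paso values:

```python
import random

def permutation_test(group1, group2, n_perm=10_000, seed=1):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so we pool
    the values ("balls in an urn"), reshuffle them repeatedly, and count
    how often a random relabeling produces a mean difference at least as
    extreme as the one actually observed.
    """
    observed = abs(sum(group1) / len(group1) - sum(group2) / len(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    rng = random.Random(seed)  # seeded for reproducibility
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = abs(sum(pooled[:n1]) / n1
                - sum(pooled[n1:]) / (len(pooled) - n1))
        if d >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical toy data, for illustration only:
near = [85, 90, 78, 95, 88, 92]
far = [95, 99, 88, 104, 91, 97]
p = permutation_test(near, far)
```

With the actual study data and 10,000 permutations, this procedure is what produced the experimental p-value of .18 quoted above.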

Getting an approximate answer based on the normal distribution

As you might guess from the look of the histogram, a much easier way to obtain an answer is to use the normal distribution as an approximation. Letting d̄ = x̄₁ − x̄₂, consider the test statistic:

(d̄ − d₀)/SE_d = (x̄₁ − x̄₂)/SE_d

But what is the standard error of the difference between two means, SE_d?

The standard error of the difference between two means

Suppose we have x̄₁ with standard error SE₁ and x̄₂ with standard error SE₂ (and that x̄₁ and x̄₂ are independent). Then the standard error of x̄₁ − x̄₂ is

SE_d = √(SE₁² + SE₂²)

Note the connections with both the root-mean-square idea and the square root law from earlier in the course. Note also that √(SE₁² + SE₂²) < SE₁ + SE₂: a two-sample test is a more powerful way to look at differences than two separate analyses.
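A quick numerical check of this formula, using illustrative standard errors (not values from the study):

```python
import math

SE1, SE2 = 3.0, 4.0  # illustrative standard errors of the two group means

# Standard error of the difference between the two (independent) means
SE_d = math.sqrt(SE1**2 + SE2**2)
print(SE_d)              # 5.0

# The combined SE is always less than the sum of the two SEs
print(SE_d < SE1 + SE2)  # True
```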

The split

This equation would be perfect if we knew SE₁ and SE₂. But of course we don't; all we have are estimates. There are two ways of settling this question, and they have led to two different forms of the two-sample t-test.

Approach #1: Student's t-test

The first approach was invented by W.S. Gosset ("Student"). His approach was to assume that the standard deviations of the two groups were the same. If you do this, then you only have two sources of variability to worry about: the variability of x̄₁ − x̄₂ and the variability in your estimate of the common standard deviation.

Approach #2: Welch's t-test

The second approach was invented by B.L. Welch. He generalized the two-sample t-test to situations in which the standard deviations differ between the two groups. If you don't make Student's assumption, then you have three sources of variability to worry about: the variability of x̄₁ − x̄₂, the variability of SD₁, and the variability of SD₂.

Student's test vs. Welch's test

We'll talk more about the difference between the two tests at the end of class. In most situations, the two tests provide similar answers. However, Student's test is much easier to do by hand.

The pooled standard deviation

In order to estimate a standard error, we will need an estimate of the common standard deviation. This is obtained by pooling the deviations. To calculate a standard deviation, we took the root-mean-square of the deviations (only with n − 1 in the denominator). To calculate a pooled standard deviation, we take the root-mean-square of all the deviations from both groups (only with n₁ + n₂ − 2 in the denominator). Essentially, the pooled standard deviation is a weighted average of the standard deviations in the two groups, where the weights are the numbers of observations in each group.

The pooled standard deviation and the standard error

Letting SD_p denote our pooled standard deviation, our estimated standard error is

SE_d = √(SD_p²/n₁ + SD_p²/n₂) = SD_p √(1/n₁ + 1/n₂)

This equation is similar to our earlier one, only now the amount by which the SE is reduced in comparison to the SD depends on the sample size in each group.
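A minimal sketch of these two calculations, using the study's quantities. One caveat: the slides quote the individual standard deviations (16.0 and 12.2) without stating which group each belongs to; the pairing below is an assumption, chosen because it reproduces the quoted pooled value of 14.4:

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    # Weighted (by degrees of freedom) root-mean-square of the two SDs
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                     / (n1 + n2 - 2))

def se_diff(sd_p, n1, n2):
    # Standard error of the difference between two means, common SD
    return sd_p * math.sqrt(1 / n1 + 1 / n2)

# Pairing of SDs with group sizes is an assumption (see note above)
sd_p = pooled_sd(12.2, 57, 16.0, 67)   # about 14.4
se = se_diff(14.4, 57, 67)             # about 2.59
```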

Sample size and standard error

So, let's say that we have 50 subjects in one group and 10 subjects in the other group, and we have enough money to enroll 20 more people in the study. To reduce the SE as much as possible, should we assign them to the group that already has 50, or to the group that only has 10? Let's check:

√(1/70 + 1/10) ≈ 0.34
√(1/50 + 1/30) ≈ 0.23
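The two options above can be checked directly; the smaller multiplier wins, so the 20 new subjects should go to the smaller group:

```python
import math

# Option A: add all 20 subjects to the larger group -> sizes 70 and 10
option_a = math.sqrt(1 / 70 + 1 / 10)  # about 0.34

# Option B: add all 20 subjects to the smaller group -> sizes 50 and 30
option_b = math.sqrt(1 / 50 + 1 / 30)  # about 0.23

# A smaller multiplier of SD_p means a smaller standard error
print(option_b < option_a)  # True
```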

The advantages of balanced sample sizes

This example illustrates an important general point: the greatest improvement in accuracy (reduction in standard error) comes when the sample sizes of the two groups are balanced. Occasionally, it is much easier (or cheaper) to obtain (or assign) subjects in one group than in the other; in these cases, one often sees unbalanced sample sizes. However, it is rare to see a ratio that exceeds 3:1, as the study runs into diminishing returns: no matter how much you reduce the standard error of x̄₁, the standard error of x̄₂ will still be there.

The degrees of freedom of the two-sample t-test

We have said that if we make the assumption of equal standard deviations, then we only need to worry about the variability of x̄₁ − x̄₂ and the variability of our estimate of the common standard deviation. How variable is our estimate of the common standard deviation? It is now based on n₁ + n₂ − 2 degrees of freedom (we lose one degree of freedom for each mean that we calculate). Thus, to perform inference, we will look up results on Student's curve with n₁ + n₂ − 2 degrees of freedom.

Student's t-test: procedure

The procedure of Student's two-sample t-test should look quite familiar:
#1 Estimate the standard error: SE_d = SD_p √(1/n₁ + 1/n₂)
#2 Calculate the test statistic: t = (x̄₁ − x̄₂)/SE_d
#3 Calculate the area outside ±t under Student's curve with n₁ + n₂ − 2 degrees of freedom

Student's t-test: example

For the lead study:
- The mean IQ for the children who lived near the smelter was 89.2
- The mean IQ for the children who did not live near the smelter was 92.7
- The pooled standard deviation was 14.4
- The individual standard deviations were 16.0 and 12.2; note that the pooled standard deviation is a weighted average of the individual standard deviations

Student's t-test: example (cont'd)

#1 Estimate the standard error: SE_d = 14.4 √(1/57 + 1/67) = 2.59
#2 Calculate the test statistic: t = (92.7 − 89.2)/2.59 = 1.35
#3 For Student's curve with 57 + 67 − 2 = 122 degrees of freedom, 17.9% of the area lies outside ±1.35

Thus, if there were no difference in IQ between the two groups, we would observe a difference as large or larger than the one we saw about 18% of the time; the study provides very weak evidence of a population difference.
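Steps #1 and #2 above can be reproduced from the summary statistics alone. (Step #3, the tail area of 17.9%, needs a t table or statistical software, since the standard library has no t-distribution CDF.)

```python
import math

# Summary statistics from the lead study
n1, n2 = 57, 67            # near and far group sizes
mean1, mean2 = 89.2, 92.7  # near and far group mean IQs
sd_p = 14.4                # pooled standard deviation

# #1: standard error of the difference
se_d = sd_p * math.sqrt(1 / n1 + 1 / n2)   # about 2.59

# #2: test statistic
t = (mean2 - mean1) / se_d                 # about 1.35

# #3 would compare t to Student's curve with these degrees of freedom
df = n1 + n2 - 2                           # 122
```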

Student's t-test, Welch's t-test, and the permutation test

Note that Student's t-test agrees quite well with our permutation test from earlier, and with Welch's test:
- Permutation: p = .181
- Student's: p = .179
- Welch's: p = .170

This is usually the case when the sample sizes are reasonably large: the approximations work well, agreeing both with each other and with exact approaches.

Confidence intervals

Our hypothesis test suggests that there may be no difference in IQ between the two groups. But is it possible that there is a difference, even a large difference? This is the kind of question answered by a confidence interval.

Confidence intervals: procedure

The procedure for calculating confidence intervals is straightforward:
#1 Estimate the standard error: SE_d = SD_p √(1/n₁ + 1/n₂)
#2 Determine the values ±t_x% that contain the middle x% of Student's curve with n₁ + n₂ − 2 degrees of freedom
#3 Calculate the confidence interval: (x̄₁ − x̄₂ − t_x% · SE_d, x̄₁ − x̄₂ + t_x% · SE_d)

Confidence intervals: example

For the lead study:
#1 The standard error is SE_d = 14.4 √(1/57 + 1/67) = 2.59, and the difference between the two means was 92.7 − 89.2 = 3.5
#2 The values ±1.98 contain the middle 95% of Student's curve with 57 + 67 − 2 = 122 degrees of freedom
#3 Thus, the 95% confidence interval is (3.5 − 1.98(2.59), 3.5 + 1.98(2.59)) = (−1.63, 8.61)

So, although we cannot rule out the idea that lead has no effect on IQ, it is possible that lead reduces IQ by as much as eight and a half points (or increases it by a point and a half).
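The interval can be verified numerically (the endpoints land within rounding of the slide's (−1.63, 8.61); small discrepancies come from when SE_d is rounded):

```python
import math

se_d = 14.4 * math.sqrt(1 / 57 + 1 / 67)  # standard error, about 2.59
diff = 92.7 - 89.2                         # observed difference in means
t_crit = 1.98  # middle 95% of Student's curve with 122 df (from a table)

lower = diff - t_crit * se_d   # about -1.6
upper = diff + t_crit * se_d   # about 8.6
```

Since the interval contains 0, "no difference" cannot be ruled out, which is consistent with the p-value of about .18 from the hypothesis test.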

Permutation tests vs. t-tests

Should I use a permutation test or a t-test? The difference between the two comes down to whether one trusts the normal approximation to the sampling distribution. Thus, if n is large, it doesn't matter; the sampling distribution will always be approximately normal (because of the central limit theorem). If n is small, then this is a tough choice: the difference between the two tests can be sizable, and there is little data with which to assess normality. In practice, however, permutation tests are not very common. The reason is that, if one is concerned about normality, it is usually better to choose one of the methods that we will talk about in the next lecture.

Student's test vs. Welch's test

Should I use Student's test or Welch's test? In the earlier example, the two tests gave basically the same answer, even though the standard deviations of the two groups differed by about 30%. This is often the case. However, the two tests will provide different answers when:
- The standard deviations of the two groups are very different, and
- The number of subjects in each group is also very different

Tradeoff

So which test is better? This is a complicated question and depends on a number of factors, but the basic tradeoff is this: when the standard deviations or sample sizes are fairly close, Student's t-test is (slightly) more powerful; when they are not, Student's t-test is unreliable, since it makes an assumption of common variability and that assumption is inaccurate.