Bootstrap Hypothesis Test




In 1882 Simon Newcomb performed an experiment to measure the speed of light. The numbers below represent the measured time it took for light to travel from Fort Myer on the west bank of the Potomac River to a fixed mirror at the foot of the Washington Monument, 3721 meters away. In the units in which the data are given, the currently accepted true speed of light is 33.02. (To convert these units to time in millionths of a second, multiply by 10^-3 and add 24.8.)

28 -44 29 30 26 27 22 23 33 16 24 29 24 40 21 31 34 -2 25 19

Do the data support the currently accepted speed of 33.02? First, note that the data are not normal:

> speed <- c(28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40, 21, 31, 34, -2, 25, 19)
> hist(speed)
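The unit conversion in the parenthetical can be checked directly. Here is a minimal Python sketch (the helper name `to_microseconds` is ours, not from the original handout):

```python
# Newcomb's coded measurements: multiply by 10^-3 and add 24.8
# to recover the travel time in millionths of a second.
def to_microseconds(coded):
    return coded * 1e-3 + 24.8

speed = [28, -44, 29, 30, 26, 27, 22, 23, 33, 16,
         24, 29, 24, 40, 21, 31, 34, -2, 25, 19]

times = [to_microseconds(x) for x in speed]
# e.g. the first coded value 28 corresponds to about 24.828 microseconds
```

Note that the accepted value 33.02 corresponds to a travel time of about 24.833 millionths of a second.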

> qqnorm(speed)
> qqline(speed)

To do the t-test we must assume the population of measurements is normally distributed. If this is not true, our tests are at best approximations. But with such a small sample, and with such a severe departure from normality, we can't be guaranteed a good approximation. The bootstrap offers one alternative.

Step 1: State the null and alternative hypotheses: H0: mean = 33.02; Ha: mean ≠ 33.02.

Step 2: Set the significance level. We'll choose 5%.

Step 3: Choose a test statistic. We wish to estimate the mean speed, so we'll use the sample average.

Step 4: Find the observed value of the test statistic:

> mean(speed)
[1] 21.75

We now need the p-value, which requires knowing the sampling distribution of our test statistic when the null hypothesis is true. We know it is approximately normal, but also that the approximation might not be very good here. So instead we perform a simulation under conditions in which we know the null hypothesis is true. We use our data to represent the population, but first we shift it over so that its mean really is 33.02:

> newspeed <- speed - mean(speed) + 33.02
> mean(newspeed)
[1] 33.02

The histogram of newspeed has exactly the same shape as that of speed, but it is shifted over so that it is centered at 33.02 rather than at 21.75. Now we reach into our fake population and draw 20 observations at random, with replacement. (We draw 20 because that's the size of our initial sample.) We calculate the average; it will be close to 33.02, because that is the mean of the population. We save it, and then repeat this process. After many repetitions, we have a good idea of what sort of sample averages to expect when the true mean is 33.02. We then compare these to our observed sample average (21.75) to see whether it is consistent with the simulated averages. If so, then perhaps the mean really is 33.02. If not, then perhaps Newcomb had some flaws in his experimental design and was measuring incorrectly. Here's some code:

> bstrap <- c()
> for (i in 1:1000){
+   newsample <- sample(newspeed, 20, replace=T)
+   bstrap <- c(bstrap, mean(newsample))}
> hist(bstrap)

This distribution doesn't look normal, which tells us the bootstrap was worth the trouble: with a larger sample size it would have looked normal, but 20 apparently isn't large enough. We therefore can't trust the normal approximation, and the bootstrap approach serves us better. As you can see, it's not impossible for the sample average to be 21.75 even when the true mean is 33.02. But it's not all that common, either. The p-value is the probability of getting something at least as extreme as what we observed. Our observed value 21.75 is 33.02 - 21.75 = 11.27 units away from the null hypothesis, so the p-value is the probability of being more than 11.27 units away from 33.02. This is P(Test Stat < 21.75) + P(Test Stat > 44.29). We don't know the sampling distribution of our test statistic, but our bootstrap sample lets us estimate this probability:

> (sum(bstrap < 21.75) + sum(bstrap > 44.29))/1000
[1] 0.004

Therefore, we estimate the p-value to be 0.004. (In other words, in 4 out of 1000 repetitions we saw a sample average this extreme.)
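For readers more comfortable in Python, the whole procedure (shift the data to satisfy the null, resample with replacement, estimate the two-sided p-value) can be mirrored with the standard library alone. This is a sketch of the R code above, not a replacement for it; the seed is ours, chosen only for reproducibility:

```python
import random
from statistics import mean

speed = [28, -44, 29, 30, 26, 27, 22, 23, 33, 16,
         24, 29, 24, 40, 21, 31, 34, -2, 25, 19]
null_mean = 33.02
obs = mean(speed)                      # 21.75

# Shift the data so the fake population satisfies the null hypothesis.
newspeed = [x - obs + null_mean for x in speed]

random.seed(1)                         # arbitrary seed, for reproducibility
bstrap = [mean(random.choices(newspeed, k=len(speed)))   # resample with replacement
          for _ in range(1000)]

# Two-sided p-value: how often is a bootstrap mean at least as far
# from 33.02 as the observed 21.75 (i.e. at least 11.27 units away)?
gap = null_mean - obs                  # 11.27
pval = sum(abs(b - null_mean) >= gap for b in bstrap) / len(bstrap)
```

Because of Monte Carlo variation, the estimate will differ slightly from run to run (and from the text's 0.004).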

Because our significance level is 5% and 0.4% < 5%, we reject the null hypothesis and conclude that Newcomb's measurements were not consistent with the currently accepted figure.

We did not have to use the sample average as our test statistic. We could have used a t-statistic, and calculated the observed value:

> (mean(speed) - 33.02)/(sd(speed)/sqrt(20))
[1] -2.859247

Then we would repeat as before, but instead of simply calculating the mean of each re-sample in our bootstrap code, we would calculate this t-statistic (with the re-sample in place of speed in the formula above).

Note that there were two extreme observations: -44 and -2. The t-test, because it depends on the sample average, is notoriously susceptible to being influenced by extreme observations. Let's take those values out and see what happens:

> speed
 [1]  28 -44  29  30  26  27  22  23  33  16  24  29  24  40  21  31  34  -2
[19]  25  19
> betterspeed <- speed[-c(2,18)]
> betterspeed
 [1] 28 29 30 26 27 22 23 33 16 24 29 24 40 21 31 34 25 19

The -44 and -2 occur in the 2nd and 18th positions of speed. The expression speed[-c(2,18)] refers to every item in the vector speed with the 2nd and 18th removed.

> qqnorm(betterspeed)
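The handout describes the bootstrap-t variant but does not show the loop, so here is a hedged Python sketch of it (standard library only; the function name `tstat` and the seed are ours):

```python
import random
from math import sqrt
from statistics import mean, stdev

speed = [28, -44, 29, 30, 26, 27, 22, 23, 33, 16,
         24, 29, 24, 40, 21, 31, 34, -2, 25, 19]
null_mean = 33.02
n = len(speed)

def tstat(x, mu):
    """One-sample t statistic for H0: mean = mu (stdev uses n-1, like R's sd)."""
    return (mean(x) - mu) / (stdev(x) / sqrt(len(x)))

obs_t = tstat(speed, null_mean)        # about -2.86, as in the text

# Resample from the shifted data and compute the t statistic each time.
newspeed = [x - mean(speed) + null_mean for x in speed]
random.seed(1)                         # arbitrary seed, for reproducibility
boot_t = [tstat(random.choices(newspeed, k=n), null_mean)
          for _ in range(1000)]
pval = sum(abs(b) >= abs(obs_t) for b in boot_t) / len(boot_t)
```

Studentizing each re-sample in this way lets the bootstrap account for the variability of the standard deviation as well as that of the mean.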

Note that things look much more normal. We can still do our bootstrap test:

> newspeed <- betterspeed - mean(betterspeed) + 33.02
> mean(betterspeed)
[1] 26.72222
> bstrap <- c()
> for (i in 1:1000){
+   bstrap <- c(bstrap, mean(sample(newspeed, 18, replace=T)))}

(We now resample 18 at a time, since betterspeed has 18 observations.) The observed value of our test statistic is now 26.72222. Our p-value is the probability of seeing something more than 33.02 - 26.722 = 6.298 units away from 33.02. We calculate this as:

> (sum(bstrap < 26.722) + sum(bstrap > 39.318))/1000
[1] 0

It's so extreme that in 1000 repetitions we never saw a value that far away. What if we used the t-test? Since the data now look normal, there's no reason not to:

> t.test(betterspeed, alternative="two.sided", mu=33.02)

        One Sample t-test

data:  betterspeed
t = -4.6078, df = 17, p-value = 0.0002508
alternative hypothesis: true mean is not equal to 33.02
95 percent confidence interval:
 23.83863 29.60582
sample estimates:
mean of x
 26.72222

As you can see, the p-value is not exactly 0, but it is quite small.
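The t.test output can be checked by computing the statistic by hand. This Python sketch uses only the standard library; removing the outliers by value (rather than by position, as the R code does) works here because -44 and -2 each occur once:

```python
from math import sqrt
from statistics import mean, stdev

speed = [28, -44, 29, 30, 26, 27, 22, 23, 33, 16,
         24, 29, 24, 40, 21, 31, 34, -2, 25, 19]

# Drop the two outliers, mirroring betterspeed <- speed[-c(2,18)].
betterspeed = [x for x in speed if x not in (-44, -2)]

# One-sample t statistic for H0: mean = 33.02; stdev uses n-1, like R's sd.
t = (mean(betterspeed) - 33.02) / (stdev(betterspeed) / sqrt(len(betterspeed)))
# t is about -4.608, matching the t = -4.6078 reported by t.test
```

The degrees of freedom are len(betterspeed) - 1 = 17, also matching the R output.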