Hypothesis Testing: p-value

STAT 101
Dr. Kari Lock Morgan

SECTION 4.2
Randomization distribution

Paul the Octopus
http://www.youtube.com/watch?v=3esgpumj9e

Hypotheses
In 2010, Paul the Octopus predicted 8 World Cup games, and predicted them all correctly.
Is this evidence that Paul's chance of guessing correctly, p, is really greater than 50%?
What are the null and alternative hypotheses?
a) H0: p ≠ 0.5, Ha: p = 0.5
b) H0: p = 0.5, Ha: p ≠ 0.5
c) H0: p = 0.5, Ha: p > 0.5
d) H0: p > 0.5, Ha: p = 0.5

Key Question
How unusual is it to see a sample statistic as extreme as that observed, if H0 is true?
If it is very unusual, we have statistically significant evidence against the null hypothesis.
Today's Question: How do we measure how unusual a sample statistic is, if H0 is true?

Measuring Evidence against H0
To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true.

Paul the Octopus
We need to know what kinds of statistics we would observe just by random chance, if the null hypothesis were true.
How could we figure this out?
Simulate many samples of size n = 8 with p = 0.5.

Simulate!
We can simulate this with a coin!
Each coin flip = a guess between two teams (Heads = correct, Tails = incorrect).
Flip a coin 8 times, count the number of heads, and calculate the sample proportion of heads.
Did you get all 8 heads (correct)?
(a) Yes
(b) No
How extreme is Paul's sample proportion of 1?

Paul the Octopus
Based on your simulation results, for a sample size of n = 8, do you think p̂ = 1 is statistically significant?
a) Yes
b) No

Randomization Distribution
A randomization distribution is a collection of statistics from samples simulated assuming the null hypothesis is true.
The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true.

Lots of simulations!
For a better randomization distribution, we need many more simulations!
www.lock5stat.com/statkey

Paul the Octopus
Based on StatKey's simulation results, for a sample size of n = 8, do you think p̂ = 1 is statistically significant?
a) Yes
b) No
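The coin-flipping activity above can be repeated many times in software. The following is a minimal sketch, not StatKey's implementation; the function name `simulate_proportions`, the seed, and the choice of 10,000 samples are assumptions made for illustration.

```python
import random
from collections import Counter


def simulate_proportions(n, p, num_samples, seed=None):
    """Return the sample proportion of successes for each simulated sample."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each guess is correct with probability p (a fair coin flip when p = 0.5).
        correct = sum(rng.random() < p for _ in range(n))
        proportions.append(correct / n)
    return proportions


# Randomization distribution for Paul: n = 8 guesses, null value p = 0.5.
distribution = simulate_proportions(n=8, p=0.5, num_samples=10_000, seed=1)

# Tally how often each sample proportion occurred.
counts = Counter(distribution)
for proportion in sorted(counts):
    print(f"p-hat = {proportion:.3f}: {counts[proportion]} samples")
```

The tally plays the role of the dotplot: values near 0.5 should dominate, and p̂ = 1 should appear only rarely.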

Key Question
How unusual is it to see a sample statistic as extreme as that observed, if H0 is true?
A randomization distribution tells us what kinds of statistics we would see just by random chance, if the null hypothesis is true.
This makes it straightforward to assess how extreme the observed statistic is!

Randomization Distribution
In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2.
What do we require about the method to produce randomization samples?
a) μ = 12
b) μ < 12
c) x̄ = 10.2
We need to generate randomization samples assuming the null hypothesis is true.

Randomization Distribution
In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2.
Where will the randomization distribution be centered?
a) 10.2
b) 12
c) 45
d) 1.8
Randomization distributions are always centered around the null hypothesized value.

Randomization Distribution Center
A randomization distribution simulates samples assuming the null hypothesis is true, so a randomization distribution is centered at the value of the parameter given in the null hypothesis.

Randomization Distribution
In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and x̄ = 10.2.
What will we look for on the randomization distribution?
a) How extreme 10.2 is
b) How extreme 12 is
c) How extreme 45 is
d) What the standard error is
e) How many randomization samples we collected
We want to see how extreme the observed statistic is.

Randomization Distribution
In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21.
What do we require about the method to produce randomization samples?
a) μ1 = μ2
b) μ1 > μ2
c) x̄1 = 26, x̄2 = 21
d) x̄1 − x̄2 = 5
We need to generate randomization samples assuming the null hypothesis is true.
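To see why the distribution for H0: μ = 12 is centered at 12 rather than at 10.2, here is a rough sketch of one common way to generate randomization samples for a single mean: shift the sample so that its mean equals the null value, then resample with replacement. The slides do not specify this method, and they give only n = 45 and x̄ = 10.2, so the 45 data values below are invented placeholders and every name is hypothetical.

```python
import random
from statistics import mean

rng = random.Random(0)

# HYPOTHETICAL placeholder data: only n = 45 and x-bar = 10.2 are known,
# so these values are invented and recentred to have mean 10.2.
raw = [rng.gauss(0, 3) for _ in range(45)]
sample = [x - mean(raw) + 10.2 for x in raw]

NULL_MEAN = 12  # value of mu stated in H0

# Shift every value so that the sample mean equals the null value.
shifted = [x + (NULL_MEAN - mean(sample)) for x in sample]


def randomization_means(values, num_samples, rng):
    """Resample with replacement and return the mean of each resample."""
    n = len(values)
    return [mean(rng.choices(values, k=n)) for _ in range(num_samples)]


distribution = randomization_means(shifted, 2000, rng)
print(f"center of randomization distribution: {mean(distribution):.2f}")

# Lower-tail proportion, because Ha is mu < 12.
as_extreme = sum(m <= 10.2 for m in distribution) / len(distribution)
print(f"proportion of simulated means <= 10.2: {as_extreme:.4f}")
```

Because the data are made up, the printed proportion has no meaning for the example in the slides; the point is only that the simulated means pile up around 12, the null value.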

Randomization Distribution
In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21.
Where will the randomization distribution be centered?
a) 0
b) 1
c) 21
d) 26
e) 5
The randomization distribution is centered around the null hypothesized value, μ1 − μ2 = 0.

Randomization Distribution
In a hypothesis test for H0: μ1 = μ2 vs Ha: μ1 > μ2, we have a sample with x̄1 = 26 and x̄2 = 21.
What do we look for on the randomization distribution?
a) The standard error
b) The center point
c) How extreme 26 is
d) How extreme 21 is
e) How extreme 5 is
We want to see how extreme the observed difference in means is.

Quantifying Evidence
We need a way to quantify evidence against the null.

p-value
The p-value is the chance of obtaining a sample statistic as extreme as (or more extreme than) the observed sample statistic, if the null hypothesis is true.
The p-value can be calculated as the proportion of statistics in a randomization distribution that are as extreme as (or more extreme than) the observed sample statistic.

1000 Simulations
Paul the Octopus: the p-value is the chance of getting all 8 out of 8 guesses correct, if p = 0.5.
What proportion of statistics in the randomization distribution are as extreme as p̂ = 1?
Proportion as extreme as the observed statistic = 0.004 (the p-value)
If Paul is just guessing, the chance of him getting all 8 correct is 0.004.
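For Paul's example the simulated proportion can be checked against an exact calculation, since the number of correct guesses under H0 follows a binomial distribution with n = 8 and p = 0.5. The exact calculation is not part of the slides; the helper name `binomial_upper_tail` is an assumption for illustration.

```python
from math import comb


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Chance of 8 or more correct out of 8 when each guess is a coin flip.
p_value = binomial_upper_tail(n=8, p=0.5, k=8)
print(round(p_value, 4))  # 0.0039, i.e. (0.5)**8 = 1/256
```

This agrees with the simulated value of about 0.004.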

Calculating a p-value
1. What kinds of statistics would we get, just by random chance, if the null hypothesis were true? (randomization distribution)
2. What proportion of these statistics are as extreme as our original sample statistic? (p-value)

ESP
For our ESP example, the p-value is the chance of getting a sample proportion as high as 0.26, from a sample of n = 98, if p = 0.2.
Simulate a randomization distribution with p = 0.2 and n = 98, and see what proportion of simulated statistics are as extreme as 0.26.
www.lock5stat.com/statkey

ESP
Proportion as extreme as the observed statistic = 0.072 (the p-value)
If you were all just guessing randomly, the chance of us getting a sample proportion as high as 0.26 is 0.072.

Randomization Distributions
p-values can be calculated by randomization distributions:
simulate samples, assuming H0 is true
calculate the statistic of interest for each sample
find the p-value as the proportion of simulated statistics as extreme as the observed statistic
Let's do a randomization distribution for a randomized experiment.

Cocaine Addiction
In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug) or Lithium (an existing drug), and then followed to see who relapsed.
Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?

Cocaine Addiction
What are the null and alternative hypotheses?
pD, pL: proportion of cocaine addicts who relapse after taking Desipramine or Lithium, respectively
H0: pD = pL
Ha: pD < pL
What are the possible conclusions?
Reject H0: Desipramine is better than Lithium.
Do not reject H0: We cannot determine from these data whether Desipramine is better than Lithium.
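The three steps listed under "Randomization Distributions" can be sketched for the ESP example as follows. This is an illustrative sketch rather than what StatKey does internally; the function name, seed, and number of samples are assumptions, and 0.26 is used as the cutoff exactly as written in the slides.

```python
import random


def randomization_p_value(n, null_p, observed, num_samples, seed=None):
    """Estimate the chance of a sample proportion >= observed when p = null_p."""
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(num_samples):
        # Step 1: simulate one sample of size n assuming H0 is true.
        successes = sum(rng.random() < null_p for _ in range(n))
        # Step 2: calculate the statistic of interest for this sample.
        p_hat = successes / n
        # Step 3: count it if it is as extreme as the observed statistic.
        if p_hat >= observed:
            as_extreme += 1
    return as_extreme / num_samples


p_value = randomization_p_value(
    n=98, null_p=0.2, observed=0.26, num_samples=5000, seed=1
)
print(f"estimated p-value: {p_value:.3f}")
```

The estimate changes a little from run to run (and with the seed), so it should land near, but not exactly on, the 0.072 reported from StatKey.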

Cocaine Addiction
1. Randomly assign units to treatment groups (Desipramine, Lithium)
2. Conduct experiment
3. Observe relapse counts in each group
[Figure: the 48 participants in two groups of 24, each marked R = Relapse or N = No Relapse]
Desipramine: 10 relapse, 14 no relapse
Lithium: 18 relapse, 6 no relapse
p̂D − p̂L = 10/24 − 18/24 = −0.333

Measuring Evidence against H0
To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true.

Cocaine Addiction
"by random chance" means by the random assignment to the two treatment groups
"if H0 were true" means if the two drugs were equally effective at preventing relapses (equivalently: whether a person relapses or not does not depend on which drug is taken)
Simulate what would happen just by random chance, if H0 were true.

Simulate another randomization
[Figure: the same 48 outcomes (R and N) re-dealt at random into two groups of 24]
Observed:
Desipramine: 10 relapse, 14 no relapse
Lithium: 18 relapse, 6 no relapse
Simulated:
Desipramine: 16 relapse, 8 no relapse
Lithium: 12 relapse, 12 no relapse
p̂D − p̂L = 16/24 − 12/24 = 0.167
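Re-dealing the observed outcomes to the two groups is easy to automate. The sketch below pools the 28 relapses and 20 non-relapses, shuffles them, and splits them 24/24, as in the simulated randomization above; the function name `reallocation_p_value`, the seed, and the number of simulations are assumptions for illustration.

```python
import random


def reallocation_p_value(relapse_a, n_a, relapse_b, n_b, num_samples, seed=None):
    """Lower-tail randomization p-value for the difference p_a - p_b."""
    rng = random.Random(seed)
    observed = relapse_a / n_a - relapse_b / n_b
    # Under H0 the outcome does not depend on the drug, so pool all outcomes
    # (1 = relapse, 0 = no relapse) and re-deal them to the two groups.
    total_relapse = relapse_a + relapse_b
    outcomes = [1] * total_relapse + [0] * (n_a + n_b - total_relapse)
    as_extreme = 0
    for _ in range(num_samples):
        rng.shuffle(outcomes)
        sim_a = sum(outcomes[:n_a])      # relapses dealt to group A
        sim_b = total_relapse - sim_a    # the rest go to group B
        diff = sim_a / n_a - sim_b / n_b
        # Ha is p_a < p_b, so "as extreme" means a difference this low or lower.
        if diff <= observed + 1e-12:
            as_extreme += 1
    return observed, as_extreme / num_samples


# Desipramine: 10 of 24 relapsed; Lithium: 18 of 24 relapsed.
observed, p_value = reallocation_p_value(10, 24, 18, 24, num_samples=10_000, seed=1)
print(f"observed difference: {observed:.3f}")
print(f"estimated p-value:   {p_value:.3f}")
```

Shuffling the pooled outcomes keeps the total number of relapses fixed at 28, which matches the idea that, if H0 is true, each person's outcome would have been the same under either drug. The estimate varies from run to run.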

Simulate another randomization
Desipramine: 17 relapse, 7 no relapse
Lithium: 11 relapse, 13 no relapse
p̂D − p̂L = 17/24 − 11/24 = 0.250
www.lock5stat.com/statkey

p-value
Proportion as extreme as the observed statistic = p-value
If the two drugs are equal regarding cocaine relapse rates, we have a 1.3% chance of seeing a difference in proportions as extreme as that observed.

Death Penalty
A random sample of people were asked "Are you in favor of the death penalty for a person convicted of murder?"
Did the proportion of Americans who favor the death penalty decrease from 1980 to 2010?

        Yes   No
1980    663   342
2010    640   360

Death Penalty, Gallup, www.gallup.com

Death Penalty
p1980, p2010: proportion of Americans who favor the death penalty in 1980, 2010
H0: p1980 = p2010
Ha: p1980 > p2010
p̂1980 = 0.66, p̂2010 = 0.64
So the sample statistic is: p̂1980 − p̂2010 = 0.66 − 0.64 = 0.02
How extreme is 0.02, if p1980 = p2010? (StatKey)

Death Penalty
p-value = 0.164
If the proportion supporting the death penalty has not changed from 1980 to 2010, we would see differences this extreme about 16% of the time.

Alternative Hypothesis
A one-sided alternative contains either > or <.
A two-sided alternative contains ≠.
The p-value is the proportion in the tail in the direction specified by Ha.
For a two-sided alternative, the p-value is twice the proportion in the smallest tail.
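The rules on the "Alternative Hypothesis" slide translate directly into a small helper. This is a sketch under stated assumptions: the function name, the string labels for the alternatives, and the cap at 1 for two-sided p-values are choices made here, and the list of simulated statistics is a made-up toy example, not output from any study in these slides.

```python
def tail_p_value(simulated, observed, alternative):
    """Proportion of simulated statistics as extreme as `observed`.

    alternative: "greater" (Ha contains >), "less" (Ha contains <),
    or "two-sided" (Ha contains a not-equal sign).
    """
    n = len(simulated)
    upper = sum(s >= observed for s in simulated) / n  # right-tail proportion
    lower = sum(s <= observed for s in simulated) / n  # left-tail proportion
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Twice the smaller tail, capped at 1 so it stays a valid proportion.
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Toy randomization distribution (made-up values, for illustration only).
toy = [-3, -2, -1, -1, 0, 0, 0, 1, 1, 2]

print(tail_p_value(toy, 2, "greater"))    # 0.1
print(tail_p_value(toy, 2, "less"))       # 1.0
print(tail_p_value(toy, 2, "two-sided"))  # 0.2
```

With the same observed statistic, the direction of Ha alone changes the p-value, which is why the alternative must be fixed before looking at the tail.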

p-value and Ha
Upper-tail (Right Tail): H0: μ = 0, Ha: μ > 0, x̄ = 2
Lower-tail (Left Tail): H0: μ = 0, Ha: μ < 0, x̄ = 1
Two-tailed: H0: μ = 0, Ha: μ ≠ 0, x̄ = 2

Sleep versus Caffeine
Recall the sleep versus caffeine experiment from last class.
μs and μc are the mean number of words recalled after sleeping and after caffeine.
H0: μs = μc
Ha: μs ≠ μc
Let's find the p-value!
Two-tailed alternative
www.lock5stat.com/statkey

Sleep or Caffeine for Memory?
www.lock5stat.com/statkey
p-value = 2 × 0.022 = 0.044

p-value and H0
If the p-value is small, then a statistic as extreme as that observed would be unlikely if the null hypothesis were true, providing significant evidence against H0.
The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative.
[Figure: randomization distribution of x̄S − x̄C when H0 is true; observed x̄S − x̄C = 3]

p-value and H0
The smaller the p-value, the stronger the evidence against H0.

Summary
The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true.
A p-value is the chance of getting a statistic as extreme as that observed, if H0 is true.
A p-value can be calculated as the proportion of statistics in the randomization distribution as extreme as the observed sample statistic.
The smaller the p-value, the greater the evidence against H0.

To Do
Read Section 4.2
Project 1 proposal (due Wednesday, 2/19)