Hypothesis Testing: Significance

STAT 101, Dr. Kari Lock Morgan
Hypothesis Testing: Significance (Sections 4.3, 4.5)

Topics:
Significance level (4.3)
Statistical conclusions (4.3)
Type I and II errors (4.3)
Statistical versus practical significance (4.5)
Multiple testing (4.5)

Review of Last Class
The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true.
A p-value is the chance of getting a statistic at least as extreme as the one observed, if H0 is true.
A p-value can be calculated as the proportion of statistics in the randomization distribution that are at least as extreme as the observed sample statistic.
The smaller the p-value, the stronger the evidence against H0.

p-value and H0
Which of the following p-values gives the strongest evidence against H0?
a) 0.005  b) 0.1  c) 0.32  d) 0.56  e) 0.94

Which of the following p-values gives the strongest evidence against H0?
a) 0.22  b) 0.45  c) 0.03  d) 0.8  e) 0.71

Two different studies obtain two different p-values. Study A obtained a p-value of 0.002 and Study B obtained a p-value of 0.2. Which study obtained stronger evidence against the null hypothesis?
a) Study A  b) Study B

Formal Decisions
If the p-value is small: REJECT H0
- the sample would be extreme if H0 were true
- the results are statistically significant
- we have evidence for Ha
If the p-value is not small: DO NOT REJECT H0
- the sample would not be too extreme if H0 were true
- the results are not statistically significant
- the test is inconclusive; either H0 or Ha may be true
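
The review above describes a p-value as the proportion of randomization statistics at least as extreme as the observed one. Below is a minimal Python sketch of that idea for a difference in two means. It assumes a two-sided test, reshuffles the group labels as if H0 (no difference) were true, and runs on made-up data. The function name `randomization_p_value` is hypothetical and does not come from the course software.

```python
import random

def randomization_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided randomization-test p-value for a difference in means.

    Each shuffle reassigns the pooled values to the two groups at random,
    which mimics the randomization distribution when H0 (no difference) is true.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    count_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        stat = sum(a) / len(a) - sum(b) / len(b)
        # Count statistics at least as extreme as observed (small tolerance for float rounding)
        if abs(stat) >= abs(observed) - 1e-12:
            count_extreme += 1
    return count_extreme / n_shuffles

# Made-up data for illustration only
treatment = [12, 15, 14, 17, 16, 13]
control = [11, 12, 10, 13, 12, 11]
print(randomization_p_value(treatment, control))
```

The smaller the returned proportion, the rarer the observed difference would be under H0, and so the stronger the evidence against H0.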

Formal Decisions
A formal hypothesis test has only two possible conclusions:
1. The p-value is small: reject the null hypothesis in favor of the alternative.
2. The p-value is not small: do not reject the null hypothesis.

How small is small enough?

Significance Level
The significance level, α, is the threshold below which the p-value is deemed small enough to reject the null hypothesis:
p-value < α: reject H0
p-value ≥ α: do not reject H0

If the p-value is less than α, the results are statistically significant, and we reject the null hypothesis in favor of the alternative.
If the p-value is not less than α, the results are not statistically significant, and our test is inconclusive.
By default α is often 0.05, unless otherwise specified.

Resveratrol, an ingredient in red wine and grapes, has been shown to promote weight loss in rodents, and has recently been investigated in primates (specifically, the Grey Mouse Lemur). A sample of lemurs had various measurements taken before and after receiving resveratrol supplementation for 4 weeks.
BioMed Central (2010, June 22). Lemurs lose weight with life-extending supplement resveratrol. ScienceDaily.

In the test to see if the mean resting metabolic rate is higher after treatment, the p-value is 0.013. Using α = 0.05, is this difference statistically significant? (Should we reject H0: no difference?)

In the test to see if the mean body mass is lower after treatment, the p-value is 0.007. Using α = 0.05, is this difference statistically significant? (Should we reject H0: no difference?)
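
The formal decision rule fits in a small function. This sketch assumes H0 is rejected only when the p-value is strictly less than α, as stated above. `formal_decision` is a hypothetical helper, applied here to the two resveratrol p-values quoted in the slide.

```python
def formal_decision(p_value, alpha=0.05):
    """Return the formal decision of a hypothesis test at significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < alpha:
        return "Reject H0 (statistically significant)"
    return "Do not reject H0 (not statistically significant)"

# p-values from the resveratrol lemur study quoted above
resveratrol = {
    "mean resting metabolic rate higher": 0.013,
    "mean body mass lower": 0.007,
}
for test, p in resveratrol.items():
    print(f"{test}: p = {p} -> {formal_decision(p)}")
```

A p-value exactly equal to α counts as "not less than α" in this sketch, so the function does not reject H0 in that case.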

In the test to see if locomotor activity changes after treatment, the p-value is 0.980. Using α = 0.05, is this difference statistically significant? (Should we reject H0: no difference?)

In the test to see if mean food intake changes after treatment, the p-value is 0.035. Using α = 0.05, is this difference statistically significant? (Should we reject H0: no difference?)

Elephant Example
H0: X is an elephant
Ha: X is not an elephant
What would you conclude if you got the following data?
X walks on two legs.
X has four legs.

Never Accept H0
"Do not reject H0" is not the same as "accept H0"!
Lack of evidence against H0 is NOT the same as evidence for H0!

"For the logical fallacy of believing that a hypothesis has been proved to be true, merely because it is not contradicted by the available facts, has no more right to insinuate itself in statistical than in other kinds of scientific reasoning." - Sir R. A. Fisher

In a hypothesis test of H0: μ = 10 vs Ha: μ < 10, the p-value is 0.002. With α = 0.05, we conclude:
a) Reject H0  b) Do not reject H0  c) Reject Ha

In a hypothesis test of H0: μ = 10 vs Ha: μ < 10, the p-value is 0.002. With α = 0.01, we conclude:
a) There is evidence that μ = 10
b) There is evidence that μ < 10
c) We have insufficient evidence to conclude anything
d) Do not reject Ha
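
The slides do not say how a one-sided p-value such as 0.002 was obtained. The sketch below shows one way it can arise: a lower-tailed z-test of H0: μ = 10 vs Ha: μ < 10. It assumes a known population standard deviation and uses invented summary statistics (not from the slides), chosen so the p-value comes out near 0.002.

```python
from statistics import NormalDist

def lower_tail_p_value(sample_mean, mu0, sigma, n):
    """p-value for H0: mu = mu0 vs Ha: mu < mu0, using a z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return NormalDist().cdf(z)  # area to the LEFT of z, because Ha points left

# Hypothetical summary statistics (not from the slides)
p = lower_tail_p_value(sample_mean=9.424, mu0=10, sigma=2, n=100)
for alpha in (0.05, 0.01):
    if p < alpha:
        print(f"alpha = {alpha}: reject H0; evidence that mu < 10")
    else:
        print(f"alpha = {alpha}: do not reject H0; inconclusive (NOT evidence that mu = 10)")
```

Neither branch ever concludes "μ = 10". Failing to reject H0 leaves the question open, as the Fisher quote warns.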

In a hypothesis test of H0: μ = 10 vs Ha: μ < 10, the p-value is 0.21. With α = 0.01, we conclude:
a) Reject H0  b) Do not reject H0  c) Reject Ha

In a hypothesis test of H0: μ = 10 vs Ha: μ < 10, the p-value is 0.21. With α = 0.01, we conclude:
a) There is evidence that μ = 10
b) There is evidence that μ < 10
c) We have insufficient evidence to conclude anything
d) Do not reject Ha

[Figure: p-value scale comparing the formal decision of a hypothesis test, based on α = 0.05 (statistically significant vs. not statistically significant), with an informal description of the strength of evidence against H0.]

Multiple Sclerosis and Sunlight
It is believed that sunlight offers some protection against multiple sclerosis (MS), but the reason is unknown.
Researchers randomly assigned mice to one of three groups:
Control (nothing)
Vitamin D supplements
UV light
All mice were injected with proteins known to induce a mouse form of MS, and the researchers observed which mice got MS.
Seppa, Nathan. Sunlight may cut MS risk by itself, Science News, April 24, 2010, p. 9, reporting on a study appearing March 22, 2010 in the Proceedings of the National Academy of Sciences.

For each situation below, write down:
the null and alternative hypotheses
an informal description of the strength of evidence against H0
a formal decision about H0, using α = 0.05
a conclusion in the context of the question

In testing whether UV light provides protection against MS (UV light vs control group), the p-value is 0.002.

In testing whether Vitamin D provides protection against MS (Vitamin D vs control group), the p-value is 0.47.

2/17/2014

Errors
There are four possibilities:

Truth \ Decision   Reject H0          Do not reject H0
H0 true            TYPE I ERROR       correct decision
H0 false           correct decision   TYPE II ERROR

A Type I error is rejecting a true null (false positive).
A Type II error is not rejecting a false null (false negative).

In the test to see if resveratrol is associated with food intake, the p-value is 0.035.
If resveratrol is not associated with food intake, a Type I error would have been made.
In the test to see if resveratrol is associated with locomotor activity, the p-value is 0.980.
If resveratrol is associated with locomotor activity, a Type II error would have been made.

Analogy to Law
A person is innocent until proven guilty. Evidence must be beyond a reasonable doubt.
What types of mistakes can be made in a verdict?
Convict an innocent person
Release a guilty person

Probability of Type I Error
[Figure: distribution of statistics, assuming H0 true.]
If the null hypothesis is true:
5% of statistics will be in the most extreme 5%
5% of statistics will give p-values less than 0.05
5% of statistics will lead to rejecting H0 at α = 0.05
If α = 0.05, there is a 5% chance of a Type I error.

If the null hypothesis is true:
1% of statistics will be in the most extreme 1%
1% of statistics will give p-values less than 0.01
1% of statistics will lead to rejecting H0 at α = 0.01
If α = 0.01, there is a 1% chance of a Type I error.
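
A simulation can check the claim that P(Type I error) = α. This sketch assumes a two-sided z-test of H0: μ = 0 with known σ = 1. Every sample is generated with H0 true, and the code records how often p < α. The rejection rate should land close to α, apart from simulation noise.

```python
import random
from statistics import NormalDist, fmean

def simulate_type_i_error(alpha=0.05, n=25, reps=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-sided z-test of H0: mu = 0 (sigma = 1 known)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # data generated with H0 true
        z = fmean(sample) / (1 / n ** 0.5)            # standardized sample mean
        p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / reps

for alpha in (0.05, 0.01):
    print(alpha, simulate_type_i_error(alpha))
```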

Probability of Type I Error
The probability of making a Type I error (rejecting a true null) is the significance level, α.

Probability of Type II Error
How can we reduce the probability of making a Type II error (not rejecting a false null)?
Option 1: a) Decrease the significance level  b) Increase the significance level
Option 2: a) Decrease the sample size  b) Increase the sample size

Probability of Errors
The probability of making a Type I error (rejecting a true null) if the null is true is the significance level, α.
The probability of making a Type II error (not rejecting a false null) if the alternative is true depends on the significance level and the sample size (among other things).
α should be chosen depending on how bad it is to make a Type I or Type II error.

Choosing α
By default, usually α = 0.05.
If a Type I error (rejecting a true null) is much worse than a Type II error, we may choose a smaller α, like α = 0.01.
If a Type II error (not rejecting a false null) is much worse than a Type I error, we may choose a larger α, like α = 0.10.

Significance Level
Come up with a hypothesis testing situation in which you may want to:
use a smaller significance level, like α = 0.01
use a larger significance level, like α = 0.10

Statistical vs Practical Significance
With small sample sizes, even large differences or effects may not be significant.
With large sample sizes, even a very small difference or effect can be significant.
A statistically significant result is not always practically significant, especially with large sample sizes.
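
A closed-form sketch helps with the Type II error question. It assumes an upper-tailed z-test with known σ, and a true mean that exceeds μ0 by a hypothetical amount `effect`. Under those assumptions, the probability of not rejecting H0 is Φ(z_(1-α) - effect·√n/σ), where Φ is the standard normal CDF. The code evaluates this formula for a few choices of α and n.

```python
from statistics import NormalDist

def type_ii_error(alpha, n, effect, sigma=1.0):
    """P(do not reject H0 | Ha true) for an upper-tailed z-test of H0: mu = mu0 vs Ha: mu > mu0.

    Assumes sigma is known and the true mean is mu0 + effect, with effect > 0.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_crit
    shift = effect / (sigma / n ** 0.5)     # mean of the z statistic under the true mean
    return std_normal.cdf(z_crit - shift)   # chance that z still lands below z_crit

# Hypothetical true effect of 0.3 standard deviations
for alpha, n in [(0.05, 25), (0.10, 25), (0.05, 100)]:
    print(f"alpha={alpha}, n={n}: P(Type II error) = {type_ii_error(alpha, n, 0.3):.3f}")
```

In this sketch, raising α and increasing n both lower the Type II error probability. Raising α does so at the cost of more Type I errors, while a larger sample avoids that trade-off.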

Statistical vs Practical Significance
Example: Suppose a weight loss program recruits 10,000 people for a randomized experiment. A difference in average weight loss of only 0.5 lb could be found to be statistically significant.
Suppose the experiment lasted for a year. Is a loss of ½ a pound practically significant?

Diet and Sex of Baby
Are certain foods in your diet associated with whether you conceive a boy or a girl?
To study this, researchers asked women about their eating habits, including whether or not they regularly ate each of 133 different foods.
A significant difference was found for breakfast cereal (mothers of boys ate more), prompting the headline "Breakfast Cereal Boosts Chances of Conceiving Boys."
http://www.newscientist.com/article/dn13754-breakfast-cereals-boost-chances-of-conceiving-boys.html

Breakfast Cereal Boosts Chances of Conceiving Boys
I had identical twin boys a year ago, and I eat breakfast cereal every morning. Do you think this helped to boost my chances of having boys?
c) Impossible to tell

Hypothesis Tests
For each of the 133 foods studied, a hypothesis test was conducted for a difference between mothers who conceived boys and mothers who conceived girls in the proportion who consume each food.
If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05?
How might you explain the significant difference for breakfast cereal?

Multiple Testing
When multiple hypothesis tests are conducted, the chance that at least one test incorrectly rejects a true null hypothesis increases with the number of tests.
If the null hypotheses are all true, about a proportion α of the tests will yield statistically significant results just by random chance.
[Cartoon: www.causeweb.org, author JB Landers]
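
Here is the arithmetic behind the 133-food question. If all m nulls are true, the expected number of significant results is m·α. Adding the assumption that the tests are independent gives a probability of at least one significant result of 1 - (1 - α)^m. The sketch also simulates one study, which assumes continuous test statistics so that each p-value is Uniform(0, 1) under H0. Real food-consumption tests are unlikely to be independent, so treat the probability as a rough guide.

```python
import random

def expected_false_positives(m, alpha):
    """Expected number of significant results when all m null hypotheses are true."""
    return m * alpha

def prob_at_least_one_false_positive(m, alpha):
    """P(at least one significant result), assuming all m nulls are true and tests are independent."""
    return 1 - (1 - alpha) ** m

def simulate_significant_count(m, alpha, seed=2):
    """Count 'significant' results among m tests whose nulls are all true.

    Assumes continuous test statistics, so each p-value is Uniform(0, 1) under H0.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(m)]
    return sum(p < alpha for p in p_values)

m, alpha = 133, 0.05  # 133 foods, as in the diet study above
print(expected_false_positives(m, alpha))
print(prob_at_least_one_false_positive(m, alpha))
print(simulate_significant_count(m, alpha))
```

With about six or seven "significant" foods expected by chance alone, one significant result for breakfast cereal is not surprising.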

Multiple Comparisons
Consider a topic that is being investigated by research teams all over the world. Using α = 0.05, about 5% of teams are going to find something significant, even if the null hypothesis is true.
Consider a research team or company doing many hypothesis tests. Using α = 0.05, about 5% of tests are going to be significant, even if the null hypotheses are all true.
This is a serious problem. The most important thing is to be aware of this issue, and not to trust claims that are obviously one of many tests (unless they specifically mention an adjustment for multiple testing).
There are ways to account for this (e.g. Bonferroni's correction), but these are beyond the scope of this class.

Publication Bias
Publication bias refers to the fact that usually only the significant results get published.
The one study that turns out significant gets published, and no one knows about all the insignificant results.
This, combined with the problem of multiple comparisons, can yield very misleading results.

Jelly Beans Cause Acne!
http://xkcd.com/882/
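
The slides place adjustments for multiple testing beyond the scope of the class. For readers who want a concrete picture, here is a minimal sketch of Bonferroni's correction, which compares each p-value with α/m instead of α. The helper name and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: with m tests, reject H0_i only if p_i < alpha / m.

    This keeps the probability of at least one Type I error across all m tests
    at or below alpha, whether or not the tests are independent.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p < threshold for p in p_values]

# Hypothetical p-values for 5 tests; the threshold is 0.05 / 5 = 0.01
p_values = [0.004, 0.03, 0.2, 0.012, 0.65]
print(bonferroni_reject(p_values))  # -> [True, False, False, False, False]
```

For the 133-food study, the same rule would require p < 0.05/133, roughly 0.0004, before calling any single food significant.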

Summary
Results are statistically significant if the p-value is less than the significance level, α.
In making formal decisions, reject H0 if the p-value is less than α; otherwise, do not reject H0.
Not rejecting H0 is NOT the same as accepting H0.
There are two types of errors: rejecting a true null (Type I) and not rejecting a false null (Type II).
Statistical significance is not the same as practical significance.
Using α = 0.05, about 5% of all hypothesis tests will lead to rejecting the null, even if all nulls are true.

To Do
Project 1 proposal due TODAY at 11:59pm
Read Sections 4.3, 4.5
Do HW 4 (due Monday, 2/24)