
Spring 2008 - Stat C141/Bioeng C141 - Statistics for Bioinformatics
Course Website: http://www.stat.berkeley.edu/users/hhuang/141c-2008.html
Section Website: http://www.stat.berkeley.edu/users/mgoldman
GSI Contact Info: Megan Goldman, mgoldman@stat.berkeley.edu
Office Hours: 342 Evans, M 10-11, Th 3-4, and by appointment

1 Why is multiple testing a problem?

Say you have a set of hypotheses that you wish to test simultaneously. The first idea that might come to mind is to test each hypothesis separately, using some level of significance α. At first blush, this doesn't seem like a bad idea. However, consider a case where you have 20 hypotheses to test and a significance level of 0.05. What's the probability of observing at least one significant result just due to chance?

P(at least one significant result) = 1 - P(no significant results)
                                   = 1 - (1 - 0.05)^20
                                   ≈ 0.64

So, with 20 tests being considered, we have a 64% chance of observing at least one significant result, even if every null hypothesis is actually true. In genomics and other biology-related fields, it's not unusual for the number of simultaneous tests to be quite a bit larger than 20... and the probability of getting a significant result simply due to chance keeps going up. Methods for dealing with multiple testing frequently call for adjusting α in some way, so that the probability of observing at least one significant result due to chance remains below your desired significance level.

2 The Bonferroni correction

The Bonferroni correction sets the significance cut-off at α/n. For example, in the example above, with 20 tests and α = 0.05, you'd only reject a null hypothesis if its p-value is less than 0.05/20 = 0.0025. The Bonferroni correction tends to be a bit too conservative. To demonstrate this, let's calculate the probability of observing at least one significant result when using the correction just described:
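As a quick numerical check (an aside, not part of the original handout), these family-wise error probabilities fall out of a one-line function in R:

```r
# P(at least one false positive) among m independent tests,
# each performed at significance level alpha
fwer <- function(alpha, m) 1 - (1 - alpha)^m

fwer(0.05, 20)       # 20 uncorrected tests: ~0.64
fwer(0.05 / 20, 20)  # Bonferroni cut-off of 0.0025: ~0.0488
```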

P(at least one significant result) = 1 - P(no significant results)
                                   = 1 - (1 - 0.0025)^20
                                   ≈ 0.0488

Here, we're just a shade under our desired 0.05 level. We benefit here from assuming that all tests are independent of each other. In practical applications, that is often not the case. Depending on the correlation structure of the tests, the Bonferroni correction could be extremely conservative, leading to a high rate of false negatives.

3 The False Discovery Rate

In large-scale multiple testing (as often happens in genomics), you may be better served by controlling the false discovery rate (FDR). This is defined as the proportion of false positives among all significant results. The FDR works by estimating some rejection region so that, on average, FDR < α.

4 The positive False Discovery Rate

The positive false discovery rate (pFDR) is a bit of a wrinkle on the FDR. Here, you try to control the probability that the null hypothesis is true, given that the test rejected the null. This method works by first fixing the rejection region, then estimating α, which is quite the opposite of how the FDR is handled. For gory levels of detail, see the Storey paper the professor has linked to from the class website.

5 Comparing the three

First, let's make some data. For kicks and grins, we'll use the random normals in such a way that we'll know what the result of each hypothesis test should be.

x <- c(rnorm(900), rnorm(100, mean = 3))
p <- pnorm(x, lower.tail = FALSE)

These functions should all be familiar from the first few weeks of class. Here, we've made a vector, x, of length 1000. The first 900 entries are random numbers with a standard normal distribution. The last 100 are random numbers from a normal distribution with mean 3 and sd 1. Note that I didn't need to indicate the sd of 1 in the second bit; it's the default value. The second line of code finds the p-value for a hypothesis test on each value of x.
The hypothesis being tested is that the value of x is not different from 0, given that the entries are drawn from a standard normal distribution. The alternative is one-sided, claiming that the value is larger than 0. Now, in this case, we know the truth: the first 900 observations should fail to reject the null hypothesis: they are, in fact, drawn from a standard normal distribution, and any

difference between the observed value and 0 is just due to chance. The last 100 observations should reject the null hypothesis: the difference between these values and 0 is not due to chance alone.

Let's take a look at our p-values, adjust them in various ways, and see what sort of results we get. Note that, since we all will have our own random vectors, your figures will probably be a bit different from mine. They should be pretty close, however.

5.1 No corrections

test <- p > 0.05
summary(test[1:900])
summary(test[901:1000])

The two summary lines will give me a count of how many p-values were above and below 0.05. Based on my random vector, I get something that looks like this:

> summary(test[1:900])
   Mode   FALSE    TRUE
logical      46     854
> summary(test[901:1000])
   Mode   FALSE    TRUE
logical      88      12

The type I error rate (false positives) is 46/900 = 0.0511. The type II error rate (false negatives) is 12/100 = 0.12. Note that the type I error rate is awfully close to our α, 0.05. This isn't a coincidence: α can be thought of as a target type I error rate.

5.2 Bonferroni correction

We have α = 0.05 and 1000 tests, so the Bonferroni correction will have us looking for p-values smaller than 0.05/1000 = 0.00005:

> bonftest <- p > 0.00005
> summary(bonftest[1:900])
   Mode   FALSE    TRUE
logical       1     899
> summary(bonftest[901:1000])
   Mode   FALSE    TRUE
logical      23      77

Here, the type I error rate is 1/900 = 0.0011, but the type II error rate has skyrocketed to 77/100 = 0.77. We've reduced our false positives at the expense of false negatives. Ask yourself: which is worse, false positives or false negatives? Note: there isn't a firm answer. It really depends on the context of the problem and the consequences of each type of error.
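As an aside (not in the handout): base R's p.adjust() can apply the Bonferroni correction for you by inflating the p-values, and rejecting wherever the adjusted value falls below α gives exactly the same decisions as the manual α/n cut-off. A minimal sketch, regenerating the example data with a seed so it runs standalone:

```r
# Recreate the example data; the handout drew these without a seed, so
# everyone's counts differed slightly. set.seed makes it repeatable.
set.seed(141)
x <- c(rnorm(900), rnorm(100, mean = 3))
p <- pnorm(x, lower.tail = FALSE)

# p.adjust(..., "bonferroni") multiplies each p-value by the number of
# tests (capping at 1), so "adjusted p < 0.05" is the same decision
# rule as "raw p < 0.05/1000"
p.bonf <- p.adjust(p, method = "bonferroni")
identical(p.bonf < 0.05, p < 0.05 / 1000)   # TRUE
```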

5.3 FDR

For the FDR, we want to consider the ordered p-values. We'll see if the kth ordered p-value is larger than k × 0.05/1000.

psort <- sort(p)
fdrtest <- NULL
for (i in 1:1000)
    fdrtest <- c(fdrtest, p[i] > match(p[i], psort) * 0.05/1000)

Let's parse this bit of code. I want the string of TRUEs and FALSEs to be in the same order as the original p-values, so we can easily pick off the errors. I start by creating a separate variable, psort, which holds the same values as p, but sorted from smallest to largest. Say I want to test only the first entry of p:

> p[1] > match(p[1], psort) * 0.05/1000
[1] TRUE

p[1] picks off the first entry from the vector p. match(p[1], psort) looks through the vector psort, finds the first value that's exactly equal to p[1], and returns which entry of the vector it is. In my random vector, match(p[1], psort) returns 619. That means that, if you put all the p-values in order from smallest to largest, the 619th smallest value is the one that appears first in my vector. The value you get might differ pretty wildly in this case. Anyhow, on to see how the errors go:

> summary(fdrtest[1:900])
   Mode   FALSE    TRUE
logical       3     897
> summary(fdrtest[901:1000])
   Mode   FALSE    TRUE
logical      70      30

Now we have a type I error rate of 3/900 = 0.0033. The type II error rate is now 30/100 = 0.30, a big improvement over the Bonferroni correction!

5.4 pFDR

The pFDR is an awful lot more involved, coding-wise. Mercifully, someone's already written the package for us.

> library(qvalue)
> pfdrtest <- qvalue(p)$qvalues > 0.05

The qvalue function returns several things; appending $qvalues keeps only the piece of the output called qvalues.

> summary(pfdrtest[1:900])
   Mode   FALSE    TRUE
logical       3     897
> summary(pfdrtest[901:1000])
   Mode   FALSE    TRUE
logical      70      30

I seem to get the same results as with the regular FDR, at least at the 5% level. Let's take a look at the cumulative number of significant calls at increasing levels of α under the different corrections:

Uncorrected   31   57   93  118  134  188
Bonferroni     0    6   13   21   24   31
FDR            0   19   44   63   73   91
pFDR           0   20   48   64   73   93

Here's how the type I error rates do:

Uncorrected  0.0011  0.0022  0.0144  0.0344  0.0511  0.1056
Bonferroni   0       0       0       0.0011  0.0011  0.0011
FDR          0       0       0.0011  0.0022  0.0033  0.0122
pFDR         0       0       0.0011  0.0022  0.0033  0.0144

... and the type II error rates:

Uncorrected  0.70  0.45  0.20  0.13  0.12  0.07
Bonferroni   1     0.94  0.87  0.80  0.77  0.70
FDR          1     0.81  0.57  0.39  0.30  0.20
pFDR         1     0.80  0.53  0.38  0.30  0.20
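To close the loop (an aside, not from the handout): the error-rate comparison can be reproduced end to end with p.adjust, which implements both the Bonferroni rule and the Benjamini-Hochberg "BH" procedure, an FDR-controlling method in the same spirit as the ordered-p-value rule in section 5.3. A sketch under a fixed seed:

```r
# Recreate the example data reproducibly (the handout used no seed)
set.seed(141)
x <- c(rnorm(900), rnorm(100, mean = 3))
p <- pnorm(x, lower.tail = FALSE)
truth <- rep(c(FALSE, TRUE), c(900, 100))  # TRUE = a real effect

# Type I rate: share of the 900 true nulls that get rejected.
# Type II rate: share of the 100 real effects that we miss.
error.rates <- function(reject) {
  c(typeI = mean(reject[!truth]), typeII = mean(!reject[truth]))
}

error.rates(p < 0.05)                          # uncorrected
error.rates(p.adjust(p, "bonferroni") < 0.05)  # Bonferroni
error.rates(p.adjust(p, "BH") < 0.05)          # FDR (Benjamini-Hochberg)
```

Whatever the seed, Bonferroni's type I rate can be no higher than BH's, and its type II rate no lower, matching the pattern in the tables.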