Multiple Testing in QTL Mapping. Lucia Gutierrez lecture notes Tucson Winter Institute

Similar documents
From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data

Package dunn.test. January 6, 2016

Descriptive Statistics

1 Why is multiple testing a problem?

Finding statistical patterns in Big Data

Sample Size and Power in Clinical Trials

Statistical issues in the analysis of microarray data

Tutorial for proteome data analysis using the Perseus software platform

False discovery rate and permutation test: An evaluation in ERP data analysis

False Discovery Rates

Minería de Datos ANALISIS DE UN SET DE DATOS.! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions

The Bonferonni and Šidák Corrections for Multiple Comparisons

Section 13, Part 1 ANOVA. Analysis Of Variance

Package HHG. July 14, 2015

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp , ,

Testing Hypotheses About Proportions

Redwood Building, Room T204, Stanford University School of Medicine, Stanford, CA

Part 2: Analysis of Relationship Between Two Variables

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

The Wondrous World of fmri statistics

Hypothesis testing - Steps

Cancer Biostatistics Workshop Science of Doing Science - Biostatistics

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?...

Independent samples t-test. Dr. Tom Pierce Radford University

Introduction to Hypothesis Testing

HYPOTHESIS TESTING: POWER OF THE TEST

Linear Models in STATA and ANOVA

Master s Thesis. PERFORMANCE OF BETA-BINOMIAL SGoF MULTITESTING METHOD UNDER DEPENDENCE: A SIMULATION STUDY

Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate 1

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Planning sample size for randomized evaluations

Binary Diagnostic Tests Two Independent Samples

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

How To Check For Differences In The One Way Anova

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Statistics 2014 Scoring Guidelines

"Statistical methods are objective methods by which group trends are abstracted from observations on many separate individuals." 1

Package ERP. December 14, 2015

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Gene Expression Analysis

Testing Research and Statistical Hypotheses

II. DISTRIBUTIONS distribution normal distribution. standard scores

Two Correlated Proportions (McNemar Test)

Association Between Variables

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

WISE Power Tutorial All Exercises

Non-Parametric Tests (I)

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Pearson's Correlation Tests

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST

Chapter 1: The Nature of Probability and Statistics

Step 6: Writing Your Hypotheses Written and Compiled by Amanda J. Rockinson-Szapkiw

Gene expression analysis. Ulf Leser and Karin Zimmermann

Permutation & Non-Parametric Tests

6: Introduction to Hypothesis Testing

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Point Biserial Correlation Tests

Dichotomic classes, correlations and entropy optimization in coding sequences

Statistiek I. Proportions aka Sign Tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen.

Course Syllabus MATH 110 Introduction to Statistics 3 credits

Lecture Notes Module 1

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Contingency Tables and the Chi Square Statistic. Interpreting Computer Printouts and Constructing Tables

Conditional Probability, Hypothesis Testing, and the Monty Hall Problem

Topic #6: Hypothesis. Usage

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Tests for Two Proportions

A full analysis example Multiple correlations Partial correlations

Introduction to Hypothesis Testing OPRE 6301

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

MIC - Detecting Novel Associations in Large Data Sets. by Nico Güttler, Andreas Ströhlein and Matt Huska

Statistical Analysis Strategies for Shotgun Proteomics Data

WHAT IS A JOURNAL CLUB?

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

HLA data analysis in anthropology: basic theory and practice

Variables Control Charts

Correlational Research

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

p-values and significance levels (false positive or false alarm rates)

An Instructor s Guide to Understanding Test Reliability. Craig S. Wells James A. Wollack

CRITICAL PATH ANALYSIS AND GANTT CHARTS

Understanding Confidence Intervals and Hypothesis Testing Using Excel Data Table Simulation

Mathematics Task Arcs

Statistical skills example sheet: Spearman s Rank

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct

IEMS 441 Social Network Analysis Term Paper Multiple Testing Multi-theoretical, Multi-level Hypotheses

Package empiricalfdr.deseq2

A direct approach to false discovery rates

Transcription:

Multiple Testing in QTL Mapping Lucia Gutierrez lecture notes Tucson Winter Institute 1

Multiple Testing 2

Multiple Testing 3

Multiple Testing WHAT IS WRONG WITH THE PREVIOUS CARTOON? A common practice is to use P<0.05 to decide about significance of a test. But with large number of tests (as is common in QTL mapping were we perform one test at each marker), the chance of having at least one false positive reaches 1 pretty quickly. Let s look at some theory to clarify this concept. 4

Hypothesis Testing (in QTL context) OUTCOMES OF A STATISTICAL TEST RESULT FROM TEST Reject H0 Do not reject H0 H0 is (i.e. no QTL) H0 is (i.e. a QTL) THE TRUTH FALSE POSITIVE: occurs when a QTL is incorrectly declared present TRUE POSITIVE: occurs when a QTL is correctly declared present. FALSE NEGATIVE: occurs when a QTL is incorrectly declared absent. TRUE NEGATIVE: occurs when a QTL is correctly declared absent. 5 B5

Type I Error Type I error: It is the incorrect rejection of a true null hypothesis (an incorrectly declared QTL) in a given test. Probability of Type I error (α): It is the probability of finding a false positive (an incorrectly declared QTL) in a given test. RESULT FROM TEST Reject H0 Do not reject H0 H0 is (i.e. no QTL) H0 is (i.e. a QTL) THE TRUTH B5 6

Type II Error Type II error: It is the failure to reject a true null hypothesis (when there is a QTL and we fail to declare it) in a given test. Probability of Type II error (β): It is the probability of finding a false negative (failure to declare a true QTL) in a given test. RESULT FROM TEST Reject H0 Do not reject H0 H0 is (i.e. no QTL) H0 is (i.e. a QTL) THE TRUTH B5 7

Type I and II Error NOTE that for a given test, decreasing the false positive rate means that the power (the proportion of true positives, or 1-β) is also decreased. The only way to decrease false positives and increase the power is by changing your design (i.e. increasing the population sizes, reducing experimental error, etc.). RESULT FROM TEST Reject H0 Do not reject H0 Reject H0 Do not reject H0 H0 is (i.e. no QTL) H0 is (i.e. a QTL) THE TRUTH H0 is (i.e. no QTL) H0 is (i.e. a QTL) THE TRUTH B5 8

Multiple Testing Let s now assume that we are not interested in a single hypothesis testing, but we are conducting multiple hypothesis testing. For example, we are performing one hypothesis testing on each marker (at each marker we ask the question about whether the marker is associated to a QTL or not). We could summarize the information of the number of hypothesis that follow each category in the following chart: Decision Truth Do not reject H0 Reject H0 Total H0 is true U V m 0 H0 is false Z S m 1 =m-m 0 Total m-r R m Benjamini and Hochberg, 1995 9

Multiple Testing Some more definitions: Decision Truth Do not reject H0 Reject H0 Total H0 is true U V m 0 H0 is false Z S m 1 =m-m 0 Total m-r R m Per-family error rate (PFER): expected number of false positives: E(V). Per-comparison error rate (PCER): proportion of false positives in the total: E(V)/m. Family-wise error rate (FWER): probability of at least one false positive: P(V 1). Discovery Rate (FDR): proportion of false positives among rejected: E(V/R). We could be interested in controlling the probability of having at least one false positive test (using family control). Or we could be interested in correcting the results by the proportion of false discoveries among the reject tests (using FDR). Benjamini and Hochberg, 1995 10

Multiple Testing Family control If we aim to control FWER, it is necessary to use a somewhat smaller alpha value. But how small should it be? There are many methods that aim to control FWER. We will discuss two common methods that have been used in QTL mapping. Bonferroni: Since the increase in the error is related to the number of independent tests, Bonferroni proposed to use a new alpha value as threshold: 1 * = 1 1 m α α ( α ) m However, many studies have shown that a Bonferroni correction for QTL studies is overly conservative mainly because tests are not independent (i.e. markers are linked and therefore not-independent). Having an unnecessarily stringent threshold reduces power to detect true QTL as has been shown. Benjamini and Hochberg, 1995; Li and Ji, 2005 11

Multiple Testing Family control Li and Ji (2005): An alternative is to use a Bonferroni correction but instead of using the total number of tests (which we know are not independent because markers are correlated), is to use the effective number of independent tests. This idea was first proposed by Cheverud (2001) and then modified by Li and Ji (2005). The steps are: 1. Calculate the correlation matrix of markers. 2. Use the number of significantly different from zero eigenvalues (λ i ) of the correlation matrix to determine the effective number of independent tests (M eff ) as: M M f 3. Use a Bonferroni-type of correction to determine the threshold but using the effective number of independent tests instead of the total number of 1 tests: Meff Li and Ji, 2005 f eff α * = = ( λ ) i 1 i = I x 1 + x x, x ( x ) ( ) ( ) 0 = 1 α Meff ( 1 α ) 12

Multiple Testing Family control Li and Ji (2005): This method have been shown to be better than the Bonferroni correction and the Cheverud (2001) method. It performs equally good as permutation but is fast and simple to perform. Li and Ji, 2005 13

Multiple Testing Discovery Decision Truth Do not reject H0 Reject H0 Total H0 is true U V m 0 H0 is false Z S m 1 =m-m 0 Total m-r R m The Discovery Rate (FDR) is the proportion of false positives amongst the rejected hypothesis: E(V/R). We are looking at the problem from a different perspective: out of the total tests that we reject, how many are true positives? Reject H0 Do not reject H0 H0 is (i.e. no QTL) H0 is (i.e. a QTL) Benjamini and Hochberg, 1995 14

Multiple Testing Discovery Discovery Rate (FDR): The steps to use the Discovery Rate are as follow: 1. Order the observed p-values: p(1)... p(m) 2. Calculate an arithmetic sequence as follows: 3. Reject all hypothesis where: k = max i : p ( i ) 1 i m i m α i α m FDR is also conservative in most cases. Effective number of markers and a FDR using M eff instead of m could be used (Li and Ji, 2005). Benjamini and Hochberg, 1995 15

Multiple Testing - Permutation Permutations (Broman and Sen, 2009). The steps involved are: 1. Randomize (shuffle) the phenotypes relative to the marker data. 2. Perform a QTL mapping and obtain p-values (or LOD scores) and keep the most extreme value (i.e. Maximum LOD or smallest p- value): M i * 3. Repeat 1 and 2 several (r) times (i.e. 1,000 or 10,000). 4. Produce the empirical distribution of the extreme values: M 1 *,... M r *. 5. Use the 95 th percentile of M i * values as the threshold. This are computing-intensive but are precise for all cases. 16

Multiple Testing 17

Multiple Testing SO WHAT WAS WRONG WITH THE PREVIOUS CARTOON? A common practice is to use P<0.05 to decide about significance of a test. But with large number of tests the chance of having at least one false positive is close to 1. LET S LOOK AT OUR OPTIONS IN HYPOTHESIS TESTING 1. Family Control: Bonferroni multiple-testing protection 1. P = 0.05 / number of tests (this is very conservative) 2. P = 0.05 / effective number of tests, with the effective number of tests estimated from the marker data (Li and Ji, 2005). 2. Discovery Rate (Benjamini and Hochberg, 1995). 3. Permutations (Broman and Sen, 2009). 18