Bootstrapping p-value estimations

In microarray studies it is common that the sample size is small and that the distribution of expression values departs from normality. In these situations, permutation and bootstrap tests may be appropriate for identifying differentially expressed genes. Following the bootstrap approach of Algorithm 1, the p-value of each gene $i$, unadjusted for multiple comparisons, is estimated as the proportion of resampling-based Shapley value differences $\delta_i^r(\phi(v_r^1), \phi(v_r^2))$ that are greater than the observed Shapley value difference $\delta_i(\phi(v^1), \phi(v^2))$. The p-values estimated by bootstrap methods (resampling with replacement) are less exact than those obtained from permutation tests (resampling without replacement) (see e.g. Dudoit et al. (2002, 2003)), but, as already mentioned, they can be used to test the null hypothesis of no difference between the means of two statistics (Efron and Tibshirani (1993)) without assuming that the distributions are otherwise equal (see also Bickel (2002)).

Following the approach of Storey and Tibshirani (2003), Figure 1 shows a density histogram of the 5873 estimated p-values provided by Algorithm 1 on the data set of 47 children in TP and PR, when $v^{TP+}$ vs. $v^{PR+}$ is considered. The dashed line is the density we would expect if all genes were null (i.e., with Shapley value not different between the two conditions TP and PR). The density histogram of p-values beyond 0.3 looks fairly flat, which indicates that this region contains mostly null p-values. According to Storey and Tibshirani (2003), the height of this flat portion gives a conservative estimate of the overall proportion of null p-values (77.9%). For comparison, Figure 2 shows a density histogram of the 5873 p-values provided by the t-test. Here the region beyond 0.4 looks fairly flat, and a conservative estimate of the overall proportion of null p-values is 68.5%.

When Algorithm 1 is applied to microarray data, thousands of null hypotheses are tested separately, so the problem of multiple comparisons must be considered. In fact, if $n$ is the number of statistical tests, each performed at level $\alpha$, and the tests are independent, the expected number of false positives is $\alpha n$, which is very large for large $n$. This problem can be alleviated by adjusting the individual p-values of the tests for multiplicity. Several methods have been proposed in the literature to tackle this problem (see Amaratunga and Cabrera (2004) for a summary), mainly under the assumption of independent test statistics. In Algorithm 1 the test statistics are likely not independent; indeed, they are statistics on the Shapley value distribution in the population of genes, which should be representative of the relevance of each gene (interacting with many others) in determining the association between the expression properties of groups of genes and the study conditions.
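To make the p-value estimation step of Algorithm 1 concrete, the following is a minimal Python sketch; it is not taken from the paper, and the function name and array layout (observed differences in a vector `delta_obs` with one entry per gene, resampled differences in an m-by-genes matrix `delta_boot`) are assumptions made here for illustration.

```python
import numpy as np

def bootstrap_pvalues(delta_obs, delta_boot):
    """Unadjusted bootstrap p-values, one per gene.

    delta_obs  : (n_genes,) observed Shapley value differences.
    delta_boot : (m, n_genes) differences recomputed on m resampled data sets.
    Returns, for each gene, the proportion of resampled differences that
    are greater than the observed difference.
    """
    m = delta_boot.shape[0]
    exceed = delta_boot > delta_obs[np.newaxis, :]
    # A common variant adds 1 to numerator and denominator so that no
    # estimated p-value is exactly zero; the plain proportion is kept
    # here to mirror the description in the text.
    return exceed.sum(axis=0) / m
```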

Figure 1: density histogram of the estimated p-values provided by Algorithm 1 (x-axis: p-value, 0 to 1; y-axis: density, 0 to 2.5).

Figure 2: density histogram of the p-values provided by the t-test (x-axis: p-value, 0 to 1; y-axis: density, 0 to 2.5).
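The "height of the flat portion" figures quoted above (77.9% and 68.5%) can be reproduced in spirit with the simple estimator below. This is a simplified sketch of the Storey and Tibshirani (2003) idea, not necessarily the exact procedure used for Figures 1 and 2; the cutoffs 0.3 and 0.4 are taken from the text, while the function name is an assumption.

```python
import numpy as np

def pi0_flat_region(pvalues, lam):
    """Conservative estimate of the proportion of null p-values.

    Under the null, p-values are uniform on [0, 1], so the average density
    of p-values above a cutoff `lam` (where the histogram looks flat)
    estimates the overall proportion of null hypotheses.
    """
    pvalues = np.asarray(pvalues)
    return np.mean(pvalues > lam) / (1.0 - lam)

# Cutoffs as in the text:
#   pi0_flat_region(p_algorithm1, lam=0.3)  # about 0.78 reported
#   pi0_flat_region(p_ttest,      lam=0.4)  # about 0.685 reported
```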

On the other hand, the problem of multiplicity is still present, and quantifying its extent is even harder than in the case of independent test statistics. Moreover, given the very large number of null hypotheses tested in a typical microarray game, aggressively adjusting the p-values for multiplicity could seriously impair the ability of the test to find genes whose relevance index is truly different under the two biological conditions at hand.

Traditional statistical procedures often control the family-wise error rate (FWER), i.e. the probability that at least one true null hypothesis is rejected. Classical p-value adjustment methods for multiple comparisons that control the FWER have been found to be too conservative for analyzing differential expression in large-screening microarray data, and the False Discovery Rate (FDR), i.e. the expected proportion of false positives among all positives, has been suggested as an alternative criterion for controlling false positives (Benjamini and Hochberg (1995), Dudoit et al. (2003)).

Facing the problem of possibly dependent statistical tests, we are presently studying an approach that estimates the FDR and the FWER in Algorithm 1 using, again, resampled data (Bickel (2002), Jain et al. (2005)). We give here a brief introduction to this approach. Let $V(c)$ be the average number of bootstrap Shapley value differences equal to or greater than a threshold $c$:

$$V(c) = \frac{1}{m} \sum_{r=1}^{m} \mathrm{card}\Big(\{ i \in N : \beta_i^r(\phi(v_r^1), \phi(v_r^2)) \geq c \}\Big), \qquad (1)$$

with the convention that the cardinality of the empty set is zero, i.e. $\mathrm{card}(\emptyset) = 0$. Let $R(c)$ be the number of observed Shapley value differences equal to or greater than $c$:

$$R(c) = \mathrm{card}\Big(\{ i \in N : \delta_i(\phi(v^1), \phi(v^2)) \geq c \}\Big). \qquad (2)$$

The simplest way to estimate the FDR at the threshold value $c$ is via the relation (Bickel (2002), Jain et al. (2005))

$$\widehat{FDR}(c) = \frac{V(c)}{R(c)}. \qquad (3)$$

To control the estimated FDR at a level $\epsilon$, let $\gamma$ be the minimum value of $\delta_i(\phi(v^1), \phi(v^2))$ for which $\widehat{FDR}(\delta_i(\phi(v^1), \phi(v^2))) \leq \epsilon$, and reject the $i$-th null hypothesis if $\delta_i(\phi(v^1), \phi(v^2)) \geq \gamma$.
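A minimal sketch of how (1)-(3) could be computed, assuming the same `delta_obs` and `delta_boot` arrays as in the earlier sketch (the vectorized form and the function names are not from the paper):

```python
import numpy as np

def fdr_estimate(c, delta_obs, delta_boot):
    """Resampling-based FDR estimate at threshold c, as in (1)-(3)."""
    V = np.mean((delta_boot >= c).sum(axis=1))   # eq. (1): mean count over resamples
    R = (delta_obs >= c).sum()                   # eq. (2): observed count
    return V / R if R > 0 else 0.0               # eq. (3)

def fdr_threshold(delta_obs, delta_boot, eps):
    """Smallest observed difference gamma with estimated FDR <= eps;
    the i-th null hypothesis is rejected when delta_obs[i] >= gamma."""
    candidates = [d for d in np.sort(delta_obs)
                  if fdr_estimate(d, delta_obs, delta_boot) <= eps]
    return candidates[0] if candidates else None
```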

As for controlling the FWER, as already said, different approaches have been proposed. Here we present a single-step method to adjust the p-values obtained in Algorithm 1 for controlling the FWER. For each $i \in N$, consider the adjusted p-value $\tilde{p}_i$ defined as follows:

$$\tilde{p}_i = \frac{1}{m} \, \mathrm{card}\Big(\Big\{ r \in \{1, \ldots, m\} : \max_{j \in N} \beta_j^r(\phi(v_r^1), \phi(v_r^2)) \geq \delta_i(\phi(v^1), \phi(v^2)) \Big\}\Big); \qquad (4)$$

given the FWER level $\alpha$, reject the $i$-th null hypothesis if $\tilde{p}_i \leq \alpha$.
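The single-step adjustment (4) compares each observed difference with the maximum resampled difference over all genes, in the spirit of max-T resampling-based FWER control. A sketch, again with the assumed `delta_obs` and `delta_boot` arrays:

```python
import numpy as np

def fwer_adjusted_pvalues(delta_obs, delta_boot):
    """Single-step adjusted p-values as in (4).

    For gene i, the adjusted p-value is the fraction of the m resampled
    data sets in which the maximum resampled difference over all genes
    reaches the observed difference of gene i.
    """
    m = delta_boot.shape[0]
    max_per_resample = delta_boot.max(axis=1)    # max over j of beta_j^r
    reach = max_per_resample[:, None] >= delta_obs[None, :]
    return reach.sum(axis=0) / m

# Reject the i-th null hypothesis when the adjusted p-value is <= alpha.
```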

On the other hand, the best method to use for controlling the FDR or the FWER in the CASh framework, where the interaction between genes is the goal of the analysis and independence of the test statistics cannot be assumed at all, has still to be identified and validated.

References

Amaratunga D., Cabrera J. (2004). Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley-Interscience, New Jersey.

Benjamini Y., Hochberg Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57:289-300.

Bickel D.R. (2002). Microarray gene expression analysis: data transformation and multiple comparison bootstrapping. Computing Science and Statistics, 34:383-400. Interface Foundation of North America (Proceedings of the 34th Symposium on the Interface, Montreal, Quebec, Canada, April 17-20, 2002).

Dudoit S., Yang Y., Speed T., Callow M. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12:111-139.

Dudoit S., Shaffer J.P., Boldrick J.C. (2003). Multiple hypothesis testing in microarray experiments. Statistical Science, 18(1):71-103.

Efron B., Tibshirani R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC, New York.

Jain N., Cho H.J., O'Connell M., Lee J.K. (2005). Rank-invariant resampling based estimation of false discovery rate for analysis of small sample microarray data. BMC Bioinformatics, 6:187.

Storey J.D., Tibshirani R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100(16):9440-9445.