Multiple Testing in a Two-Stage Adaptive Design With Combination Tests Controlling FDR


Sanat K. Sarkar (a), Jingjing Chen (b) & Wenge Guo (c)
(a) Department of Statistics, Temple University, Philadelphia, PA 19122
(b) Clinical Statistics, MedImmune/AstraZeneca, Gaithersburg, MD 20878
(c) Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102

Journal of the American Statistical Association. Publisher: Taylor & Francis. Accepted author version posted online: 24 Aug 2013. Published online: Dec 2013.

To cite this article: Sanat K. Sarkar, Jingjing Chen & Wenge Guo (2013) Multiple Testing in a Two-Stage Adaptive Design With Combination Tests Controlling FDR, Journal of the American Statistical Association, 108:504, 1385-1401, DOI: 10.1080/01621459.2013.835662

To link to this article: http://dx.doi.org/10.1080/01621459.2013.835662

Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/jasa

Sanat K. SARKAR, Jingjing CHEN, and Wenge GUO

Testing multiple null hypotheses in two stages to decide which of these can be rejected or accepted at the first stage and which should be followed up for further testing having had additional observations is of importance in many scientific studies. We develop two procedures, each with two different combination functions, Fisher's and Simes', to combine p-values from two stages, given prespecified boundaries on the first-stage p-values in terms of the false discovery rate (FDR) and controlling the overall FDR at a desired level. The FDR control is proved when the pairs of first- and second-stage p-values are independent and those corresponding to the null hypotheses are identically distributed as a pair $(p_1, p_2)$ satisfying the p-clud property. We did simulations to show that (1) our two-stage procedures can have significant power improvements over the first-stage Benjamini-Hochberg (BH) procedure compared to the improvement offered by the ideal BH procedure that one would have used had the second-stage data been available for all the hypotheses, and can continue to control the FDR under some dependence situations, and (2) can offer considerable cost savings compared to the ideal BH procedure. The procedures are illustrated through real gene expression data.

KEY WORDS: Early acceptance and rejection boundaries; False discovery rate; Single-stage BH procedure; Stepdown test; Stepup test; Two-stage multiple testing.

1. INTRODUCTION

Gene association or expression studies that usually involve a large number of endpoints (i.e., genetic markers) are often quite expensive. Such studies conducted in a multistage adaptive design setting can be cost effective and efficient, since genes are screened in early stages and selected genes are further investigated in later stages using additional observations.
Multiplicity in simultaneous testing of hypotheses associated with the endpoints in a multistage adaptive design is an important issue, as in a single-stage design. For addressing the multiplicity concern, controlling the familywise error rate (FWER), the probability of at least one Type I error among all hypotheses, is a commonly applied concept. However, these studies are often explorative, so controlling the false discovery rate (FDR), which is the expected proportion of Type I errors among all rejected hypotheses, is more appropriate than controlling the FWER (Weller et al. 1998; Benjamini and Hochberg 1995; Storey and Tibshirani 2003). Moreover, with a large number of hypotheses typically being tested in these studies, better power can be achieved in a multiple testing method under the FDR framework than under the more conservative FWER framework.

Although adaptive designs with multiple endpoints have been considered in the literature under the FDR framework (Zehetmayer, Bauer, and Posch 2005, 2008; Victor and Hommel 2007; Posch, Zehetmayer, and Bauer 2009), the theory presented so far (see, e.g., Victor and Hommel 2007) toward developing an FDR controlling procedure in the setting of a two-stage adaptive design with combination tests does not seem to be as simple as one would hope for. Moreover, it does not allow setting boundaries on the first-stage p-values in terms of FDR and operating in a manner that would be a natural extension of standard single-stage FDR controlling methods, like the BH (Benjamini and Hochberg 1995) or methods related to it, from a single-stage to a two-stage design setting. So, we consider the following to be our main problem in this article:

To construct an FDR controlling procedure for simultaneous testing of the null hypotheses associated with multiple endpoints in the following two-stage adaptive design setting: The hypotheses are sequentially screened at the first stage as rejected or accepted based on prespecified boundaries on their p-values in terms of the FDR, and those that are left out at the first stage are again sequentially tested at the second stage having determined their second-stage p-values based on additional observations and then using the combined p-values from the two stages through a combination function.

[Author note: Sanat K. Sarkar is Cyrus H. K. Curtis Professor, Department of Statistics, Temple University, Philadelphia, PA 19122 (E-mail: sanat@temple.edu). Jingjing Chen is Associate Director, Clinical Statistics, MedImmune/AstraZeneca, Gaithersburg, MD 20878 (E-mail: chenjin@medimmune.com). Wenge Guo is Assistant Professor, Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102 (E-mail: wenge.guo@gmail.com). This work is based on Jingjing's PhD thesis under the supervision of Sarkar. The research of Sarkar and Guo was supported by NSF Grants DMS-1006344, 1309273 and DMS-1006021, 1309162, respectively. We thank the AE and two referees whose comments led to a much improved presentation.]

We propose two FDR controlling procedures, one extending the original single-stage BH procedure, which we call the BH-TSADC Procedure (BH-type procedure for two-stage adaptive design with combination tests), and the other extending an adaptive version of the single-stage BH procedure incorporating an estimate of the number of true null hypotheses, which we call the Plug-In BH-TSADC Procedure, from a single-stage to a two-stage setting. Let $(p_{1i}, p_{2i})$ be the pair of first- and second-stage p-values corresponding to the $i$th null hypothesis. We provide a theoretical proof of the FDR control of the proposed procedures under the assumption that the $(p_{1i}, p_{2i})$'s are independent and those corresponding to the true null hypotheses are identically distributed as $(p_1, p_2)$ satisfying the p-clud property (Brannath, Posch, and Bauer 2002), and some standard assumption on the combination function.
We consider two special types of combination function, Fisher's and Simes', which are often used in multiple testing applications, and present explicit formulas for probabilities involving them that would be useful to carry out the proposed procedures at the second stage either using critical values that can be determined before observing the p-values or based on estimated FDRs that can be obtained after observing the p-values.

© 2013 American Statistical Association. Journal of the American Statistical Association, December 2013, Vol. 108, No. 504, Theory and Methods. DOI: 10.1080/01621459.2013.835662

We carried out extensive simulations to investigate how well our proposed procedures perform in terms of FDR control and power under independence with respect to the number of true null hypotheses and the selection of early stopping boundaries. Simulations were also performed (1) to examine the cost savings our procedures can potentially offer relative to the maximum possible cost incurred ideally by the BH method one would have used had the second-stage data been available for all the endpoints, and (2) to evaluate whether or not the proposed procedures can continue to control the FDR under different types of (positive) dependence among the underlying test statistics we consider, such as equal, clumpy, and autoregressive of order one [AR(1)] dependence. Our simulation studies indicate that between the two proposed procedures, the BH-TSADC seems to be the better choice in terms of controlling the FDR and power improvement over the single-stage BH procedure when $\pi_0$, the proportion of true nulls, is large. If $\pi_0$ is not large, the Plug-In BH-TSADC procedure is better, but it might lose the FDR control when the p-values exhibit equal or AR(1) type dependence with a large equal- or auto-correlation. In terms of cost, both our procedures can provide significantly large savings.

We applied our proposed two-stage procedures to reanalyze the data on multiple myeloma considered before by Zehetmayer, Bauer, and Posch (2008), of course, for a different purpose. The data consist of a set of 12,625 gene expression measurements for each of 36 patients with bone lytic lesions and 36 patients in a control group without such lesions. We considered these data in a two-stage framework, with the first 18 subjects per group for Stage 1 and the next 18 per group for Stage 2.
With some prechosen early rejection and acceptance boundaries, these procedures produce significantly more discoveries than the first-stage BH procedure relative to the additional discoveries made by the ideal BH procedure based on the full data from both stages.

The article is organized as follows. We review some basic results on the FDR control in a single-stage design in Section 2, present our proposed two-stage procedures in Section 3, discuss the results of simulation studies in Section 4, and illustrate the real data application in Section 5. We conclude the article in Section 6 with some remarks on the present work and brief discussions on some future research topics including those related to designing an FDR-based two-stage study. Proofs of our main theorem and propositions are given in the Appendix.

2. CONTROLLING THE FDR IN A SINGLE-STAGE DESIGN

Suppose that there are $m$ endpoints and the corresponding null hypotheses $H_i$, $i = 1, \ldots, m$, are to be simultaneously tested based on their respective p-values $p_i$, $i = 1, \ldots, m$, obtained in a single-stage design. The FDR of a multiple testing method that rejects $R$ and falsely rejects $V$ null hypotheses is $E(\mathrm{FDP})$, where $\mathrm{FDP} = V/\max\{R, 1\}$ is the false discovery proportion. Multiple testing is often carried out using a stepwise procedure defined in terms of $p_{(1)} \le \cdots \le p_{(m)}$, the ordered p-values. With $H_{(i)}$ the null hypothesis corresponding to $p_{(i)}$, a stepup procedure with critical values $\gamma_1 \le \cdots \le \gamma_m$ rejects $H_{(i)}$ for all $i \le k = \max\{j : p_{(j)} \le \gamma_j\}$, provided the maximum exists; otherwise, it accepts all null hypotheses. A stepdown procedure, on the other hand, with these same critical values rejects $H_{(i)}$ for all $i \le k = \max\{j : p_{(l)} \le \gamma_l \text{ for all } l \le j\}$, provided the maximum exists; otherwise, it accepts all null hypotheses. The following are formulas for the FDRs of a stepup or single-step procedure (when the critical values are the same in a stepup procedure) and a stepdown procedure in a single-stage design, which can guide us in developing stepwise procedures controlling the FDR in a two-stage design.
We will use the notation $\mathrm{FDR}_1$ for the FDR of a procedure in a single-stage design.

Result 1 (Sarkar 2008). Consider a stepup or stepdown method for testing $m$ null hypotheses based on their p-values $p_i$, $i = 1, \ldots, m$, and critical values $\gamma_1 \le \cdots \le \gamma_m$ in a single-stage design. The FDR of this method is given by

\[
\mathrm{FDR}_1 \le \sum_{i \in J_0} E\left[ \frac{I\left(p_i \le \gamma_{R^{(-i)}_{m-1}(\gamma_2, \ldots, \gamma_m) + 1}\right)}{R^{(-i)}_{m-1}(\gamma_2, \ldots, \gamma_m) + 1} \right],
\]

with the equality holding in the case of a stepup method, where $I$ is the indicator function, $J_0$ is the set of indices of the true null hypotheses, and $R^{(-i)}_{m-1}(\gamma_2, \ldots, \gamma_m)$ is the number of rejections in testing the $m - 1$ null hypotheses other than $H_i$ based on their p-values and using the same type of stepwise method with the critical values $\gamma_2 \le \cdots \le \gamma_m$.

With $p_i$ having the cdf $F(u)$ when $H_i$ is true, the FDR of a stepup or stepdown method with the thresholds $\gamma_i$, $i = 1, \ldots, m$, under independence of the p-values, satisfies the following:

\[
\mathrm{FDR}_1 \le \sum_{i \in J_0} E\left( \frac{F\left(\gamma_{R^{(-i)}_{m-1}(\gamma_2, \ldots, \gamma_m) + 1}\right)}{R^{(-i)}_{m-1}(\gamma_2, \ldots, \gamma_m) + 1} \right).
\]

When $F$ is the cdf of $U(0, 1)$ and these thresholds are chosen as $\gamma_i = i\alpha/m$, $i = 1, \ldots, m$, each summand equals $\alpha/m$, so the FDR equals $\pi_0\alpha$ for the stepup and is less than or equal to $\pi_0\alpha$ for the stepdown method, where $\pi_0$ is the proportion of true nulls, and hence the FDR is controlled at $\alpha$. This stepup method is the so-called BH method (Benjamini and Hochberg 1995), the most commonly used FDR controlling procedure in a single-stage design. The FDR is bounded above by $\pi_0\alpha$ for the BH as well as its stepdown analog under a certain type of positive dependence condition among the p-values (Benjamini and Yekutieli 2001; Sarkar 2002, 2008).

The idea of improving the FDR control of the BH method by plugging into it a suitable estimate $\hat\pi_0$ of $\pi_0$, that is, by considering the modified p-values $\hat\pi_0 p_i$, rather than the original p-values, in the BH method, was introduced by Benjamini and Hochberg (2000), and was later brought into the estimation-based approach to controlling the FDR by Storey (2002).
A number of such plugged-in versions of the BH method with proven and improved FDR control, mostly under independence, have been put forward based on different methods of estimating $\pi_0$ (e.g., Storey, Taylor, and Siegmund 2004; Benjamini, Krieger, and Yekutieli 2006; Sarkar 2008; Blanchard and Roquain 2009; Gavrilov, Benjamini, and Sarkar 2009).
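As a rough illustration of the stepup and stepdown rules of this section, and of the statement that the BH method has FDR equal to $\pi_0\alpha$ under independence, the following Python sketch may help. The function names, the toy p-values, and the simulation settings (m = 20, ten true nulls, alternative p-values generated as the fourth power of a uniform variable) are assumptions made for this example only; they are not taken from the article.

```python
import numpy as np

def stepup_count(pvals, crit):
    """k = max{j : p_(j) <= crit_j}; returns 0 if no such j exists."""
    hits = np.nonzero(np.sort(pvals) <= crit)[0]
    return int(hits[-1]) + 1 if hits.size else 0

def stepdown_count(pvals, crit):
    """k = max{j : p_(l) <= crit_l for all l <= j}; returns 0 if p_(1) > crit_1."""
    ok = np.sort(pvals) <= crit
    # argmin of a boolean array is the index of the first False,
    # which equals the number of leading True entries.
    return len(ok) if ok.all() else int(np.argmin(ok))

alpha = 0.05
toy_p = np.array([0.001, 0.015, 0.035, 0.038, 0.8])            # hypothetical p-values
bh_crit = alpha * np.arange(1, toy_p.size + 1) / toy_p.size    # gamma_i = i * alpha / m
k_up = stepup_count(toy_p, bh_crit)       # BH (stepup) number of rejections
k_down = stepdown_count(toy_p, bh_crit)   # stepdown analog with the same constants

# Monte Carlo estimate of the FDR of BH with independent p-values.
rng = np.random.default_rng(0)
m, m0, reps = 20, 10, 20000               # illustrative sizes; first m0 columns are true nulls
p = rng.uniform(size=(reps, m))
p[:, m0:] **= 4                           # assumed alternative: stochastically smaller p-values
crit = alpha * np.arange(1, m + 1) / m
below = np.sort(p, axis=1) <= crit
# Index of the last True in each row, plus one; zero when the row has no True.
k = np.where(below.any(axis=1), m - np.argmax(below[:, ::-1], axis=1), 0)
# BH rejects exactly the hypotheses with p_i <= k * alpha / m.
false_rej = (p[:, :m0] <= (alpha * k / m)[:, None]).sum(axis=1)
fdr_hat = float(np.mean(false_rej / np.maximum(k, 1)))
print(k_up, k_down, round(fdr_hat, 4), "theoretical value:", alpha * m0 / m)
```

With the toy p-values the stepup rule rejects more hypotheses than the stepdown rule, because the stepup rule is allowed to pass over an ordered p-value that exceeds its critical value. The simulated FDR should be close to $\pi_0\alpha = 0.025$ up to Monte Carlo error.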

3. CONTROLLING THE FDR IN A TWO-STAGE ADAPTIVE DESIGN

Now suppose that the $m$ null hypotheses $H_i$, $i = 1, \ldots, m$, are to be simultaneously tested in a two-stage adaptive design setting. When testing a single hypothesis, say $H_i$, the theory of a two-stage combination test can be described as follows: given $p_{1i}$, the p-value available for $H_i$ at the first stage, and two constants $\lambda < \lambda'$, make an early decision regarding the hypothesis by rejecting it if $p_{1i} \le \lambda$, accepting it if $p_{1i} > \lambda'$, and continuing to test it at the second stage if $\lambda < p_{1i} \le \lambda'$. At the second stage, combine $p_{1i}$ with the additional p-value $p_{2i}$ available for $H_i$ using a combination function $C(p_{1i}, p_{2i})$ and reject $H_i$ if $C(p_{1i}, p_{2i}) \le \gamma$, for some constant $\gamma$. The constants $\lambda$, $\lambda'$, and $\gamma$ are determined subject to a control of the Type I error rate at a prespecified level by the test.

For simultaneous testing, we consider a natural extension of this theory from single to multiple testing. More specifically, given the first-stage p-value $p_{1i}$ corresponding to $H_i$ for $i = 1, \ldots, m$, we first determine two thresholds $0 \le \hat\lambda < \hat\lambda' \le 1$, stochastic or nonstochastic, and make an early decision regarding the hypotheses at this stage by rejecting $H_i$ if $p_{1i} \le \hat\lambda$, accepting $H_i$ if $p_{1i} > \hat\lambda'$, and continuing to test $H_i$ at the second stage if $\hat\lambda < p_{1i} \le \hat\lambda'$. At the second stage, we use the additional p-value $p_{2i}$ available for a follow-up hypothesis $H_i$ and combine it with $p_{1i}$ using the combination function $C(p_{1i}, p_{2i})$. The final decision is taken on the follow-up hypotheses at the second stage by determining another threshold $\hat\gamma$, again stochastic or nonstochastic, and by rejecting the follow-up hypothesis $H_i$ if $C(p_{1i}, p_{2i}) \le \hat\gamma$. Both first-stage and second-stage thresholds are to be determined in such a way that the overall FDR is controlled at the desired level $\alpha$.

Let $p_{1(1)} \le \cdots \le p_{1(m)}$ be the ordered versions of the first-stage p-values, with $H_{(i)}$ being the null hypothesis corresponding to $p_{1(i)}$, $i = 1, \ldots, m$, and $q_i = C(p_{1i}, p_{2i})$.
We describe in the following a general multiple testing procedure based on the above theory, before proposing our FDR controlling procedures that will be of this type.

A General Stepwise Procedure.

1. For two nondecreasing sequences of constants $\lambda_1 \le \cdots \le \lambda_m$ and $\lambda'_1 \le \cdots \le \lambda'_m$, with $\lambda_i < \lambda'_i$ for all $i = 1, \ldots, m$, and the first-stage p-values $p_{1i}$, $i = 1, \ldots, m$, define two thresholds as follows: $R_1 = \max\{1 \le i \le m : p_{1(j)} \le \lambda_j \text{ for all } j \le i\}$ and $S_1 = \max\{1 \le i \le m : p_{1(i)} \le \lambda'_i\}$, where $0 \le R_1 \le S_1 \le m$ and $R_1$ or $S_1$ equals zero if the corresponding maximum does not exist. Reject $H_{(i)}$ for all $i \le R_1$, accept $H_{(i)}$ for all $i > S_1$, and continue testing $H_{(i)}$ at the second stage for all $i$ such that $R_1 < i \le S_1$.

2. At the second stage, consider $q_{(i)}$, $i = 1, \ldots, S_1 - R_1$, the ordered versions of the combined p-values $q_i = C(p_{1i}, p_{2i})$, $i = 1, \ldots, S_1 - R_1$, for the follow-up null hypotheses, and find $R_2(R_1, S_1) = \max\{1 \le i \le S_1 - R_1 : q_{(i)} \le \gamma_{R_1 + i, S_1}\}$, given another nondecreasing sequence of constants $\gamma_{r_1 + 1, s_1} \le \cdots \le \gamma_{s_1, s_1}$, for every fixed $r_1 < s_1$. Reject the follow-up null hypothesis $H_{(i)}$ corresponding to $q_{(i)}$ for all $i \le R_2$ if this maximum exists; otherwise, reject none of the follow-up null hypotheses.

Remark 1. We should point out that the above two-stage procedure screens out the null hypotheses at the first stage by accepting those with relatively large p-values through a stepup procedure and by rejecting those with relatively small p-values through a stepdown procedure. At the second stage, it applies a stepup procedure to the combined p-values. Conceptually, one could have used any type of multiple testing procedure to screen out the null hypotheses at the first stage and to test the follow-up null hypotheses at the second stage. However, the particular types of stepwise procedure we have chosen at the two stages provide flexibility in terms of developing a formula for the FDR and eventually determining explicitly the thresholds we need to control the FDR at the desired level.
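The two steps of the general stepwise procedure can be sketched in Python as follows. This is only a minimal illustration: the function name `general_two_stage`, the callable `gamma(j, s1)` standing for the constant $\gamma_{j, s_1}$, the toy p-values, and in particular the second-stage constants used in the example are hypothetical and are not chosen to control the FDR.

```python
import numpy as np

def general_two_stage(p1, p2, lam, lam_prime, combine, gamma):
    """Sketch of the general stepwise procedure.

    lam, lam_prime: nondecreasing sequences with lam[i] < lam_prime[i].
    combine(a, b): combination function C applied elementwise.
    gamma(j, s1): second-stage critical constant gamma_{j, s1} (assumed interface).
    Returns indices rejected at stage 1, rejected at stage 2, and accepted at stage 1.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    lam, lam_prime = np.asarray(lam, float), np.asarray(lam_prime, float)
    m = p1.size
    order = np.argsort(p1)
    ps = p1[order]
    ok = ps <= lam
    r1 = m if ok.all() else int(np.argmin(ok))       # stepdown: early rejections
    hits = np.nonzero(ps <= lam_prime)[0]
    s1 = int(hits[-1]) + 1 if hits.size else 0       # stepup: everything above is accepted
    rejected1, follow, accepted = order[:r1], order[r1:s1], order[s1:]
    q = combine(p1[follow], p2[follow])              # combined p-values of follow-ups
    q_order = np.argsort(q)
    r2 = 0
    for i, qi in enumerate(q[q_order], start=1):
        if qi <= gamma(r1 + i, s1):
            r2 = i                                   # largest i that passes (stepup)
    return rejected1, follow[q_order[:r2]], accepted

# Toy data: second-stage p-values exist only for the hypotheses that are followed up.
m = 6
p1 = [0.001, 0.2, 0.006, 0.9, 0.05, 0.3]
p2 = [np.nan, 0.5, np.nan, np.nan, 0.01, 0.9]
lam = 0.025 * np.arange(1, m + 1) / m
lam_prime = 0.5 * np.arange(1, m + 1) / m
rej1, rej2, acc = general_two_stage(
    p1, p2, lam, lam_prime,
    combine=lambda a, b: a * b,
    gamma=lambda j, s1: 0.01 * j / s1,               # arbitrary illustrative constants
)
print(sorted(rej1.tolist()), rej2.tolist(), acc.tolist())
```

In this toy run two hypotheses are rejected early, one is accepted early, and three are followed up, of which one is rejected at the second stage.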
Let $V_1$ and $V_2$ denote the total numbers of falsely rejected among all the $R_1$ null hypotheses rejected at the first stage and the $R_2$ follow-up null hypotheses rejected at the second stage, respectively, in the above procedure. Then, the overall FDR in this two-stage procedure is given by

\[
\mathrm{FDR}_{12} = E\left[ \frac{V_1 + V_2}{\max\{R_1 + R_2, 1\}} \right].
\]

The following theorem (to be proved in the Appendix) will guide us in determining the first- and second-stage thresholds in the above procedure that will provide a control of $\mathrm{FDR}_{12}$ at the desired level. This is the procedure that will be one of those we propose in this article. Before stating the theorem, we need to define the following notations.

$R_1^{(-i)}$: Defined as $R_1$ in terms of the $m - 1$ first-stage p-values $\{p_{11}, \ldots, p_{1m}\} \setminus \{p_{1i}\}$ and the sequence of constants $\lambda_2 \le \cdots \le \lambda_m$.

$S_1^{(-i)}$: Defined as $S_1$ in terms of $\{p_{11}, \ldots, p_{1m}\} \setminus \{p_{1i}\}$ and the sequence of constants $\lambda'_2 \le \cdots \le \lambda'_m$.

$\tilde R_1^{(-i)}$: Defined as $R_1$ in terms of $\{p_{11}, \ldots, p_{1m}\} \setminus \{p_{1i}\}$ and the sequence of constants $\lambda_1 \le \cdots \le \lambda_{m-1}$.

$R_2^{(-i)}$: Defined as $R_2$ with $R_1$ replaced by $\tilde R_1^{(-i)}$ and $S_1$ replaced by $S_1^{(-i)} + 1$, and noting the number of rejected follow-up null hypotheses based on all the combined p-values except the $q_i$ and the critical values other than the first one; that is,

\[
R_2^{(-i)} \equiv R_2^{(-i)}\left(\tilde R_1^{(-i)}, S_1^{(-i)} + 1\right) = \max\left\{1 \le j \le S_1^{(-i)} - \tilde R_1^{(-i)} : q^{(-i)}_{(j)} \le \gamma_{\tilde R_1^{(-i)} + j + 1,\, S_1^{(-i)} + 1}\right\},
\]

where the $q^{(-i)}_{(j)}$'s are the ordered versions of the combined p-values for the follow-up null hypotheses except the $q_i$.

Theorem 1. The FDR of the above general multiple testing procedure satisfies the following inequality:

\[
\mathrm{FDR}_{12} \le \sum_{i \in J_0} E\left[ \frac{I\left(p_{1i} \le \lambda_{R_1^{(-i)} + 1}\right)}{R_1^{(-i)} + 1} \right] + \sum_{i \in J_0} E\left[ \frac{I\left(\lambda_{\tilde R_1^{(-i)} + 1} < p_{1i} \le \lambda'_{S_1^{(-i)} + 1},\; q_i \le \gamma_{\tilde R_1^{(-i)} + R_2^{(-i)} + 1,\, S_1^{(-i)} + 1}\right)}{\tilde R_1^{(-i)} + R_2^{(-i)} + 1} \right].
\]

3.1 BH-type Procedures

We are now ready to propose our FDR controlling multiple testing procedures in a two-stage adaptive design setting with a combination function. Before that, let us state some assumptions we need.

Assumption 1. The combination function $C(p_1, p_2)$ is nondecreasing in both arguments.

Assumption 2. The pairs $(p_{1i}, p_{2i})$, $i = 1, \ldots, m$, are independently distributed and the pairs corresponding to the null hypotheses are identically distributed as $(p_1, p_2)$ with a joint distribution that satisfies the p-clud property (Brannath, Posch, and Bauer 2002), that is, $\Pr(p_1 \le u) \le u$ and $\Pr(p_2 \le u \mid p_1) \le u$ for all $0 \le u \le 1$.

Let us define the function

\[
H(c; t, t') = \int_t^{t'} \int_0^1 I\left(C(u_1, u_2) \le c\right)\, du_2\, du_1, \quad 0 < c < 1.
\]

When testing a single hypothesis based on the pair $(p_1, p_2)$ using $t$ and $t'$ as the first-stage rejection and acceptance thresholds, respectively, and $c$ as the second-stage rejection threshold, $H(c; t, t')$ is the chance of this hypothesis to be followed up and rejected in the second stage when it is null.

Definition 1 (BH-TSADC Procedure).

1. Given the level $\alpha$ at which the overall FDR is to be controlled, three sequences of constants $\lambda_i = i\lambda/m$, $i = 1, \ldots, m$, $\lambda'_i = i\lambda'/m$, $i = 1, \ldots, m$, for some prefixed $\lambda < \alpha < \lambda'$, and $\gamma_{r_1 + 1, s_1} \le \cdots \le \gamma_{s_1, s_1}$, satisfying

\[
H\left(\gamma_{r_1 + i, s_1}; \lambda_{r_1}, \lambda'_{s_1}\right) = \frac{(r_1 + i)(\alpha - \lambda)}{m}, \quad i = 1, \ldots, s_1 - r_1,
\]

for every fixed $1 \le r_1 < s_1 \le m$, find $R_1 = \max\{1 \le i \le m : p_{1(j)} \le \lambda_j \text{ for all } j \le i\}$ and $S_1 = \max\{1 \le i \le m : p_{1(i)} \le \lambda'_i\}$, with $R_1$ or $S_1$ being equal to zero if the corresponding maximum does not exist.

2. Reject $H_{(i)}$ for $i \le R_1$; accept $H_{(i)}$ for $i > S_1$; and continue testing $H_{(i)}$ for $R_1 < i \le S_1$, if $R_1 < S_1$, making use of the additional p-values $p_{2i}$'s available for all such follow-up hypotheses at the second stage.

3. At the second stage, consider the combined p-values $q_i = C(p_{1i}, p_{2i})$ for the follow-up null hypotheses. Let $q_{(i)}$, $i = 1, \ldots, S_1 - R_1$, be their ordered versions. Reject $H_{(i)}$ (the null hypothesis corresponding to $q_{(i)}$) for all $i \le R_2(R_1, S_1) = \max\{1 \le j \le S_1 - R_1 : q_{(j)} \le \gamma_{R_1 + j, S_1}\}$, provided this maximum exists; otherwise, reject none of the follow-up null hypotheses.

Proposition 1. Let $\pi_0$ be the proportion of true null hypotheses.
Then, the FDR of the BH-TSADC method is less than or equal to $\pi_0\alpha$, and hence controlled at $\alpha$, if Assumptions 1 and 2 hold.

The proposition is proved in the Appendix.

The BH-TSADC procedure can be implemented alternatively, and often more conveniently, in terms of some FDR estimates at both stages. With $R^{(1)}(t) = \#\{i : p_{1i} \le t\}$ and $R^{(2)}(c; t, t') = \#\{i : t < p_{1i} \le t', C(p_{1i}, p_{2i}) \le c\}$, let us define

\[
\widehat{\mathrm{FDR}}_1(t) = \begin{cases} \dfrac{m t}{R^{(1)}(t)} & \text{if } R^{(1)}(t) > 0 \\ 0 & \text{if } R^{(1)}(t) = 0, \end{cases}
\]

and

\[
\widehat{\mathrm{FDR}}_{2|1}(c; t, t') = \begin{cases} \dfrac{m H(c; t, t')}{R^{(1)}(t) + R^{(2)}(c; t, t')} & \text{if } R^{(2)}(c; t, t') > 0 \\ 0 & \text{if } R^{(2)}(c; t, t') = 0. \end{cases}
\]

Then, we have the following:

The BH-TSADC procedure: An alternative definition. Reject $H_{(i)}$ for all $i \le R_1 = \max\{1 \le k \le m : \widehat{\mathrm{FDR}}_1(p_{1(j)}) \le \lambda \text{ for all } j \le k\}$; accept $H_{(i)}$ for all $i > S_1 = \max\{1 \le k \le m : \widehat{\mathrm{FDR}}_1(p_{1(k)}) \le \lambda'\}$; continue to test $H_{(i)}$ at the second stage for all $i$ such that $R_1 < i \le S_1$, if $R_1 < S_1$. Reject $H_{(i)}$, the follow-up null hypothesis corresponding to $q_{(i)}$, at the second stage for all $i \le R_2(R_1, S_1) = \max\{1 \le k \le S_1 - R_1 : \widehat{\mathrm{FDR}}_{2|1}(q_{(k)}; R_1\lambda/m, S_1\lambda'/m) \le \alpha - \lambda\}$.

Remark 2. The BH-TSADC procedure is an extension of the BH procedure, from a method of controlling the FDR in a single-stage design to that in a two-stage adaptive design with combination tests. When $\lambda = 0$ and $\lambda' = 1$, that is, when we have a single-stage design based on the combined p-values, this method reduces to the usual BH method. Note that $\widehat{\mathrm{FDR}}_1(t)$ is a conservative estimate of the FDR of the single-step test with the rejection region $p_{1i} \le t$ for each $H_i$. So, at the first stage the BH-TSADC procedure rejects those null hypotheses at whose p-values the estimated FDRs are all less than or equal to $\lambda$, and accepts those at whose p-values the estimated FDRs are greater than $\lambda'$.

Clearly, the BH-TSADC procedure can potentially be improved in terms of having a tighter control over its FDR at $\alpha$ by plugging a suitable estimate of $\pi_0$ into it while choosing the second-stage thresholds, similar to what is done for the BH method in a single-stage design.
As said in Section 2, there are different ways of estimating $\pi_0$, each of which has been shown to provide the ultimate control of the FDR, of course when the p-values are independent, by the resulting plugged-in version of the single-stage BH method (see, e.g., Sarkar 2008). However, we will consider the following estimate of $\pi_0$, which is of the type considered in Storey, Taylor, and Siegmund (2004) and seems natural in the context of the present adaptive design setting where $m - S_1$ of the null hypotheses are accepted as being true at the first stage:

\[
\hat\pi_0 = \frac{m - S_1 + 1}{m(1 - \lambda')}.
\]

The following definition gives a modified version of the BH-TSADC procedure using this estimate.

Definition 2 (Plug-In BH-TSADC Procedure). Consider the BH-TSADC procedure with $R_1$ and $S_1$ based on the sequences of constants $\lambda_i = i\lambda/m$, $i = 1, \ldots, m$, and $\lambda'_i = i\lambda'/m$, $i = 1, \ldots, m$, given $0 \le \lambda < \lambda'$, and the second-stage critical values $\gamma_{R_1 + i, S_1}$, $i = 1, \ldots, S_1 - R_1$, given by the equations

\[
H\left(\gamma_{r_1 + i, s_1}; \lambda_{r_1}, \lambda'_{s_1}\right) = \frac{(r_1 + i)(\alpha - \lambda)}{m \hat\pi_0}, \tag{1}
\]

for $i = 1, \ldots, s_1 - r_1$.

Proposition 2. The FDR of the Plug-In BH-TSADC method is less than or equal to $\alpha$ if Assumptions 1 and 2 hold.
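The alternative, estimate-based description of the BH-TSADC procedure and its plug-in variant can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the toy p-values, and the helper `H_numeric` (a simple midpoint-rule approximation of the double integral defining $H$, used here so that any combination function can be passed in) are choices made for this example and are not the authors' implementation. The sketch also assumes there are no ties among the p-values and that $\lambda < \alpha < \lambda' < 1$.

```python
import numpy as np

def H_numeric(c, t, t_prime, combine, n=500):
    """Midpoint-rule approximation (an assumption of this sketch) of
    H(c; t, t') = int_t^{t'} int_0^1 I(C(u1, u2) <= c) du2 du1."""
    u1 = t + (t_prime - t) * (np.arange(n) + 0.5) / n
    u2 = (np.arange(n) + 0.5) / n
    return (t_prime - t) * np.mean(combine(u1[:, None], u2[None, :]) <= c)

def bh_tsadc(p1, p2, alpha, lam, lam_prime, combine, plug_in=False):
    """Returns (indices rejected at stage 1, indices rejected at stage 2,
    indices followed up), using estimated FDRs at both stages."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    m = p1.size
    order = np.argsort(p1)
    # Estimated first-stage FDR at the k-th ordered p-value: m * p_1(k) / k.
    fdr1 = m * p1[order] / np.arange(1, m + 1)
    ok = fdr1 <= lam
    r1 = m if ok.all() else int(np.argmin(ok))            # stepdown in terms of lambda
    hits = np.nonzero(fdr1 <= lam_prime)[0]
    s1 = int(hits[-1]) + 1 if hits.size else 0            # stepup in terms of lambda'
    follow = order[r1:s1]
    # Plug-in estimate of pi_0; set to 1 for the basic BH-TSADC procedure.
    pi0_hat = (m - s1 + 1) / (m * (1 - lam_prime)) if plug_in else 1.0
    t, t_prime = r1 * lam / m, s1 * lam_prime / m
    q = combine(p1[follow], p2[follow])
    q_order = np.argsort(q)
    r2 = 0
    for k, c in enumerate(q[q_order], start=1):
        fdr2 = m * pi0_hat * H_numeric(c, t, t_prime, combine) / (r1 + k)
        if fdr2 <= alpha - lam:
            r2 = k                                        # largest k that passes (stepup)
    return order[:r1], follow[q_order[:r2]], follow

# Toy example with the product combination C(p1, p2) = p1 * p2.
product = lambda a, b: a * b
p1 = [0.001, 0.2, 0.006, 0.9, 0.05, 0.3]
p2 = [np.nan, 0.5, np.nan, np.nan, 0.01, 0.9]   # only follow-ups need a stage-2 p-value
rej1, rej2, follow = bh_tsadc(p1, p2, 0.05, 0.025, 0.5, product)
rej1_pi, rej2_pi, _ = bh_tsadc(p1, p2, 0.05, 0.025, 0.5, product, plug_in=True)
print(sorted(rej1.tolist()), rej2.tolist(), sorted(follow.tolist()), rej2_pi.tolist())
```

Working with estimated FDRs avoids solving for the critical values $\gamma_{r_1 + i, s_1}$ in advance: $H$ only has to be evaluated at the observed combined p-values of the follow-up hypotheses.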

[Figure 1. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures with simulated FDRs of first-stage and full-data BH procedures, with m = 100 and $\lambda' = 0.5$; panels for three values of $\lambda$, starting at $\lambda = 0.005$. The online version of this figure is in color.]

A proof of this proposition is given in the Appendix.

As in the BH-TSADC procedure, the Plug-In BH-TSADC procedure can also be described alternatively using estimated FDRs at both stages. Let

\[
\widehat{\mathrm{FDR}}^{*}_{2|1}(c; t, t') = \begin{cases} \dfrac{m \hat\pi_0 H(c; t, t')}{R^{(1)}(t) + R^{(2)}(c; t, t')} & \text{if } R^{(2)}(c; t, t') > 0 \\ 0 & \text{if } R^{(2)}(c; t, t') = 0. \end{cases}
\]

Then, we have the following:

The Plug-In BH-TSADC procedure: An alternative definition. At the first stage, decide the null hypotheses to be rejected, accepted, or continued to be tested at the second stage based on $\widehat{\mathrm{FDR}}_1$, as in (the alternative description of) the BH-TSADC procedure. At the second stage, reject $H_{(i)}$, the follow-up null hypothesis corresponding to $q_{(i)}$, for all $i \le R_2(R_1, S_1) = \max\{1 \le k \le S_1 - R_1 : \widehat{\mathrm{FDR}}^{*}_{2|1}(q_{(k)}; R_1\lambda/m, S_1\lambda'/m) \le \alpha - \lambda\}$.

3.2 Two Special Combination Functions

We now present explicit formulas of $H(c; t, t')$ for two special combination functions, Fisher's and Simes', often used in multiple testing applications.

Fisher's combination function: $C(p_1, p_2) = p_1 p_2$.

\[
H_{\mathrm{Fisher}}(c; t, t') = \int_t^{t'} \int_0^1 I\left(C(u_1, u_2) \le c\right)\, du_2\, du_1 =
\begin{cases}
c \ln\left(\dfrac{t'}{t}\right) & \text{if } c < t \\[1ex]
c - t + c \ln\left(\dfrac{t'}{c}\right) & \text{if } t \le c < t' \\[1ex]
t' - t & \text{if } c \ge t',
\end{cases}
\tag{2}
\]

for $c \in (0, 1)$. The $H$ function for Fisher's combination function is also given in an unpublished manuscript, Chen, J., Sarkar, S. K., and Bretz, F. (2011), "Finding Critical Values with Prefixed Early Stopping Boundaries and Controlled Type I Error for a Two-Stage Combination Test."
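As a small illustration of how formula (2) can be used to obtain second-stage critical values before the combined p-values are observed, the following Python sketch implements the closed form for Fisher's combination function and solves the defining equation of the BH-TSADC procedure by bisection. The function names, the bisection routine, and the numerical settings (m = 6, $r_1 = 2$, $s_1 = 5$, $\alpha = 0.05$, $\lambda = 0.025$, $\lambda' = 0.5$) are assumptions made for this example.

```python
import math

def H_fisher(c, t, t_prime):
    """Closed form (2) of H(c; t, t') for C(p1, p2) = p1 * p2,
    assuming 0 <= t < t' <= 1 and c > 0."""
    if c >= t_prime:
        return t_prime - t
    if c >= t:
        return c - t + c * math.log(t_prime / c)
    return c * math.log(t_prime / t)

def critical_value(target, t, t_prime, tol=1e-12):
    """Bisection for gamma with H_fisher(gamma; t, t') = target.
    Relies on H being nondecreasing in c and on target < t' - t."""
    lo, hi = 0.0, t_prime
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H_fisher(mid, t, t_prime) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical first-stage outcome: r1 early rejections, s1 - r1 follow-ups.
m, alpha, lam, lam_prime = 6, 0.05, 0.025, 0.5
r1, s1 = 2, 5
t, t_prime = r1 * lam / m, s1 * lam_prime / m          # lambda_{r1} and lambda'_{s1}
targets = [(r1 + i) * (alpha - lam) / m for i in range(1, s1 - r1 + 1)]
gammas = [critical_value(x, t, t_prime) for x in targets]
print([round(g, 6) for g in gammas])
```

A root always exists in $(0, \lambda'_{s_1})$ here because $\alpha < \lambda'$ implies $s_1(\alpha - \lambda)/m < \lambda'_{s_1} - \lambda_{r_1}$, which is the largest value $H$ can take.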

Journal of the American Statistical Association, December 2013

Figure 2. Comparison of simulated average powers of BH-TSADC and Plug-In BH-TSADC procedures with simulated average powers of first-stage and full-data BH procedures, with m = 100, λ = 0.005, 0, and 5, λ' = 0.5, and α =. The online version of this figure is in color.

Simes' combination function: $C(p_1, p_2) = \min\{2\min(p_1, p_2), \max(p_1, p_2)\}$.

$$H(c; t, t') = \int_t^{t'} \int_0^1 I(C(u_1, u_2) \le c)\, du_2\, du_1 = \begin{cases} \dfrac{c}{2}(t' - t) & \text{if } c \le t \\ \left(\dfrac{t'}{2} - t\right)c + \dfrac{c^2}{2} & \text{if } t < c \le \min(2t, t') \\ c(t' - t) & \text{if } t' < c \le 2t \\ \dfrac{c}{2}(1 + t') - t & \text{if } 2t < c \le t' \\ \dfrac{c}{2}(1 + 2t') - \dfrac{c^2}{2} - t & \text{if } \max(2t, t') \le c \le 2t' \\ t' - t & \text{if } c \ge 2t', \end{cases}$$

for $c \in (0, 1)$. See also Brannath, Posch, and Bauer (2002) for formula (12).

These formulas can be used to determine the critical values $\gamma_i$'s before observing the combined p-values or to estimate the FDR after observing the combined p-values at the second stage in the BH-TSADC and Plug-In BH-TSADC procedures with Fisher's and Simes' combination functions. Of course, for large values of m, it is numerically more challenging to determine the $\gamma_i$'s than to estimate the FDR at the second stage, and so in that case we would recommend using the alternative versions of these procedures.

Given the p-values from the two stages, Fisher's combination function allows us to give equal importance to the evidence from both stages before forming a composite evidence toward deciding on the corresponding null hypothesis. Simes' combination function, on the other hand, allows us to make this decision based on the strength of evidence provided by each individual p-value relative to the other.

4. SIMULATION STUDIES

There are a number of important issues related to our proposed procedures that are worth investigating. Modifying the

first-stage BH method to make it more powerful in the present two-stage adaptive design setting relative to the ideal BH method that would have been used had the second-stage data been collected for all the hypotheses, without losing the ultimate control over the FDR, is an important rationale behind developing our proposed methods. Hence, it is important to numerically investigate how well the proposed procedures control the FDR and how powerful they can potentially be compared to both the first-stage and ideal BH methods. Since the ultimate control over the FDR has been theoretically established for our methods only under independence, it would be worthwhile to provide some insight through simulations into their FDRs under some dependence situations. The consideration of cost efficiency is as essential as that of improved power performance while choosing a two-stage multiple testing procedure over its single-stage version, and so it is also important to provide numerical evidence of how much cost savings our procedures can potentially offer relative to the maximum possible cost incurred by using the ideal BH method.

We conducted our simulation studies addressing these issues. More details about these studies and conclusions derived from them are given in the following subsections.

Figure 3. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures with simulated FDRs of first-stage and full-data BH procedures, with m = 1000, λ = 0.005, 0, 5, λ' = 0.5, and α =. The online version of this figure is in color.

4.1 FDR and Power Under Independence

To investigate how well our procedures perform relative to the first-stage and full-data BH methods under independence, we (1) generated two independent sets of m uncorrelated random variables $Z_i \sim N(\mu_i, 1)$, $i = 1, \ldots, m$, one for Stage 1 and the other for Stage 2, having set $m\pi_0$ of these $\mu_i$'s at zero and the rest at 2; (2) tested $H_i: \mu_i = 0$ against $K_i: \mu_i > 0$, simultaneously for $i = 1, \ldots, m$, by applying each of the following procedures at α = to the generated data: the (alternative versions of) BH-TSADC and Plug-In BH-TSADC procedures, each with Fisher's and Simes' combination functions, the first-stage BH method, and the BH method based on combining the data from two stages (which we call the full-data BH method); and (3) noted the false discovery proportion and the proportion of

false nulls that are rejected. We repeated Steps 1-3 1000 times and averaged out the above proportions over these 1000 runs to obtain the final simulated values of FDR and average power (the expected proportion of false nulls that are rejected) for each of these procedures.

The simulated FDRs and average powers of these procedures for different values of $\pi_0$ and selections of early stopping boundaries have been graphically displayed in Figures 1-8. Figures 1 and 3 compare the BH-TSADC and Plug-In BH-TSADC procedures based on both Fisher's and Simes' combination functions with the first-stage and full-data BH procedures in terms of the FDR control for m = 100 (Figure 1) and 1000 (Figure 3), the early rejection boundary λ = 0.005, 0, or 5, and the early acceptance boundary λ' = 0.5; whereas Figures 2 and 4 do the same in terms of the average power. Figures 5-8 are reproductions of Figures 1-4, respectively, with different selections of early rejection and acceptance boundaries: λ = 5 and λ' = 0.5, , or 0.9.

Figure 4. Comparison of simulated average powers of BH-TSADC and Plug-In BH-TSADC procedures with simulated average powers of first-stage and full-data BH procedures, with m = 1000, λ = 0.005, 0, 5, λ' = 0.5, and α =. The online version of this figure is in color.

4.2 FDR Under Dependence

We considered three different scenarios for dependent p-values in our simulation study to investigate the FDR control of our procedures under dependence. In particular, we generated two independent sets of m = 100 correlated normal random variables $Z_i \sim N(\mu_i, 1)$, $i = 1, \ldots, m$, one for Stage 1 and the other for Stage 2, with $m\pi_0$ of the $\mu_i$'s being equal to 0 and the rest being equal to 2, and a correlation matrix exhibiting one of three different types of dependence: equal, clumpy, and AR(1) dependence.
In other words, the $Z_i$'s were assumed to have a common, nonnegative correlation ρ in the case of equal dependence, were broken up into 10 independent groups with 10 of the $Z_i$'s within each group having a common, nonnegative correlation ρ in the case of clumpy dependence, and were assumed to have correlations $\rho_{ij} = \mathrm{Cor}(Z_i, Z_j)$ of the form $\rho_{ij} = \rho^{|i-j|}$ for all $i \ne j = 1, \ldots, m$, and some nonnegative ρ in the case of AR(1) dependence. We then applied the (alternative versions of) the BH-TSADC and Plug-In BH-TSADC procedures at level

α = with both Fisher's and Simes' combination functions, λ = 5, and λ' = 0.5 to these datasets. These two steps were repeated 1000 times before obtaining the simulated FDRs for these procedures. Figures 9-11 graphically display the simulated FDRs of these procedures for different values of $\pi_0$ and types of dependent p-values considered.

Figure 5. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures with simulated FDRs of first-stage and full-data BH procedures, with m = 100, λ = 5, λ' = 0.5, , 0.9, and α =. The online version of this figure is in color.

4.3 Cost Saving

Let us consider determining the cost saving in the context of a genome-wide association study. Because of the high cost of genotyping hundreds of thousands of markers on thousands of subjects, such genotyping is often carried out in a two-stage format. A proportion of the available samples are genotyped on a large number of markers in the first stage, and a small proportion of these markers are selected and then followed up by genotyping them on the remaining samples in the second stage.

Suppose that c is the unit cost of genotyping one marker for each patient, n is the total number of patients assigned across Stages 1 and 2, and m is the total number of markers for each patient. Then, if we had to apply the full-data BH method, the total cost of genotyping for all these patients would be $n \cdot m \cdot c$. Whereas, if we apply our proposed methods with a fraction f of the n patients assigned to Stage 1, then the expected total cost would be $f \cdot n \cdot m \cdot c + (1 - f) \cdot n \cdot [m - E(S(f))] \cdot c$, where $S(f)$ is the total number of rejected and accepted hypotheses in the first stage. Thus, for our proposed methods, the expected proportion of saving from the maximum possible cost of using the full-data BH method is

$$\frac{(1 - f) \cdot n \cdot E(S(f)) \cdot c}{m \cdot n \cdot c} = \frac{(1 - f)\, E(S(f))}{m}.$$

Table 1 presents the simulated values of this expected proportion of cost saving for our proposed two-stage methods in multiple testing of m (= 100, 1000, or 5000) independent normal means in the present two-stage setting with a fraction f

(= 0.25, 0.50, or 0.75) of the total number of patients being allocated to the first stage.

Figure 6. Comparison of simulated average powers of BH-TSADC and Plug-In BH-TSADC procedures with simulated average powers of first-stage and full-data BH procedures, with m = 100, λ = 5, λ' = 0.5, , 0.9, and α =. The online version of this figure is in color.

Table 1. Simulated values of the expected proportion of cost saving (with λ = 5 and λ' = 0.5)

            m = 100               m = 1000              m = 5000
            π0 = 0.5   π0 = 0.9   π0 = 0.5   π0 = 0.9   π0 = 0.5   π0 = 0.9
f = 0.25    32         0.5653     337        0.576      336        0.5723
f = 0.50    405        325        442        40         442        407
f = 0.75    0.075      300        0.082      39         0.090      320

4.4 Conclusions

Our simulations in Sections 4.1 and 4.2 mimic the scenarios with equal allocation of sample size between the two stages. So, if we measure the performance of a two-stage procedure by how much power improvement it can offer over the first-stage BH method relative to that offered by the ideal, full-data BH method, then our proposed two-stage FDR controlling procedures with Fisher's combination function are seen from Figures 1 to 8 to do much better under such equal allocation, at least when the p-values are independent both across the hypotheses and stages, than those based on Simes' combination function. Of course, our procedures based on Simes' combination function are doing reasonably well in terms of this measure of relative power improvement. Its performance is roughly between those

of the first-stage and the full-data BH methods. Between our two proposed procedures, whether based on Fisher's or Simes' combination function, the BH-TSADC appears to be the better choice when $\pi_0$ is large, like more than 50%, which is often the case in practice. It controls the FDR not only under independence, which is theoretically known, but the FDR control also seems to be maintained even under different types of positive dependence, as seen from Figures 9 to 11. If, however, $\pi_0$ is not large, the Plug-In BH-TSADC procedure provides better control of the FDR, although it might lose the FDR control when the statistics generating the p-values exhibit equal but moderate to high dependence. Also, as seen from Figures 1 to 8, there is no appreciable difference in the power performances of the proposed procedures over different choices of the early stopping boundaries.

From Table 1, we notice that our two-stage methods can provide large cost savings. For instance, with 90% true nulls and half of the total sample size allocated to the first stage, our procedures can offer 44% saving from the maximum cost of using the ideal, full-data BH method. This proportion gets larger with increasing proportion of true nulls or decreasing proportion of the total sample size allocated to the first stage.

Figure 7. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures with simulated FDRs of first-stage and full-data BH procedures, with m = 1000, λ = 5, λ' = 0.5, , 0.9, and α =. The online version of this figure is in color.

5. A REAL DATA APPLICATION

To illustrate how the proposed procedures can be implemented in practice, we reanalyzed a dataset taken from an experiment by Tian et al. (2003) and post-processed by Jeffery, Higgins, and Culhane (2006). Zehetmayer, Bauer, and Posch (2008) considered these data for a different purpose.
In this dataset, multiple myeloma samples were generated with Affymetrix Human U95A chips, each consisting of 12,625 probe sets. The samples were split into two groups based on the presence or absence of focal lesions of bone. The original dataset contains gene expression measurements of 36 patients without and 137 patients with bone lytic lesions. However, for illustration purposes, we used the gene expression measurements of 36 patients with bone lytic lesions and a control group of the same sample size without such lesions. We

considered these data in a two-stage framework, with the first 18 subjects per group for Stage 1 and the next 18 subjects per group for Stage 2. We prefixed the Stage 1 early rejection boundary λ at 0.005, 0, or 5, and the early acceptance boundary λ' at 0.5, , or 0.9, and applied the proposed (alternative versions of) BH-TSADC and Plug-In BH-TSADC procedures at the FDR level of 5.

Figure 8. Comparison of simulated average powers of BH-TSADC and Plug-In BH-TSADC procedures with simulated average powers of first-stage and full-data BH procedures, with m = 1000, λ = 5, λ' = 0.5, , 0.9, and α =. The online version of this figure is in color.

In particular, we considered all m = 12,625 probe set gene expression measurements for the first-stage data of 36 patients (18 patients per group) and the full data of 72 patients (36 patients per group) across two stages, and analyzed them based on a stepdown procedure with the critical values $\lambda_i = i\lambda/m$, $i = 1, \ldots, m$, and a stepup procedure with the critical values $\lambda'_i = i\lambda'/m$, $i = 1, \ldots, m$, using the corresponding p-values generated from one-sided t-tests applied to the first-stage data. We noted the probe sets that were rejected by the stepdown procedure and those that were accepted by the stepup procedure. With these numbers being $r_1$ and $m - s_1$, respectively, we took the probe sets that were neither rejected by the stepdown procedure nor accepted by the stepup procedure, that is, the probe sets with first-stage p-values more than $r_1\lambda/m$ but less than or equal to $s_1\lambda'/m$, for further analysis using estimated FDR based on their first-stage and second-stage p-values combined through Fisher's and Simes' combination functions as described in the alternative versions of the BH-TSADC and Plug-In BH-TSADC procedures. The results of this analysis are reported in Table 2. As seen from this table, the BH-TSADC with Fisher's combination function is doing the best.
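The first-stage screening described above (a stepdown rule for early rejection and a stepup rule for early acceptance) can be sketched as follows. This is one reading of the description in this section, written under the assumption that the stepdown rule rejects the hypotheses with the $r_1$ smallest p-values, where $r_1$ is the largest rank up to which every ordered p-value stays at or below its critical value, and that the stepup rule accepts everything ranked above $s_1$, the largest rank whose ordered p-value is at or below its critical value. The function name `first_stage_screen` and the toy p-values in the usage note are hypothetical.

```python
def first_stage_screen(p1, lam, lam_prime):
    """Split hypotheses into rejected / continued / accepted sets
    from first-stage p-values p1, given 0 <= lam < lam_prime.

    Returns (r1, s1, rejected, continued, accepted), where the last
    three entries are lists of indices into p1.
    """
    m = len(p1)
    order = sorted(range(m), key=lambda idx: p1[idx])

    # Stepdown with lambda_i = i * lam / m: stop at the first failure.
    r1 = 0
    for rank, idx in enumerate(order, start=1):
        if p1[idx] <= rank * lam / m:
            r1 = rank
        else:
            break

    # Stepup with lambda'_i = i * lam_prime / m: keep the largest passing rank.
    s1 = 0
    for rank, idx in enumerate(order, start=1):
        if p1[idx] <= rank * lam_prime / m:
            s1 = rank

    rejected = order[:r1]      # p-values at most r1 * lam / m
    continued = order[r1:s1]   # followed up at the second stage
    accepted = order[s1:]      # m - s1 hypotheses accepted early
    return r1, s1, rejected, continued, accepted
```

With the toy input `[0.0001, 0.0004, 0.02, 0.3, 0.9]`, `lam = 0.005`, and `lam_prime = 0.5`, the sketch rejects the first two hypotheses, continues the next two to the second stage, and accepts the last one. Because `lam < lam_prime`, the stepdown count never exceeds the stepup count, so the three index lists always partition the hypotheses.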
For instance, with λ = 0.005 and λ' = 0.9, the proportion of additional discoveries it makes over the first-stage BH method is 104/125 = 83.2% of such additional discoveries that the ideal, full-data BH method could make, whereas these percentages are 52/125 = 41.6%, 32/125 = 25.6%, and 16/125 = 12.8% for the BH-TSADC with Simes' combination function, the Plug-In BH-TSADC with Fisher's combination function, and the Plug-In BH-TSADC

with Simes' combination function, respectively. This pattern of dominance of the BH-TSADC with Fisher's combination function over the other procedures is noted for other values of λ and λ' as well.

This table provides some additional insights into our procedures. For instance, under positive dependence across hypotheses, which can be assumed to be the case for this dataset, it appears that the BH-TSADC procedure, with either Fisher's or Simes' combination function, tends to become steadily more powerful with increasing λ' but fixed λ or with decreasing λ but fixed λ'. Note that we did not have the opportunity to get this insight from our simulation studies.

Figure 9. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures under equal dependence with m = 100, λ = 5, λ' = 0.5, and α =. The online version of this figure is in color.

Table 2. The numbers of discoveries made out of 12,625 probe sets in the Affymetrix Human U95A Chips data from Tian et al. (2003) by the BH-TSADC and Plug-In BH-TSADC procedures, each with either Fisher's or Simes' combination function, at the FDR level of 5

                     Fisher's                 Simes'                   BH
λ       λ'       BH-TSADC   Plug-In       BH-TSADC   Plug-In       Stage 1 data   Full data
                            BH-TSADC                 BH-TSADC
0.005   0.5      84         58            33         7             2              127
0.005            97         35            42         7             2              127
0.005   0.9      106        34            54         18            2              127
0       0.5      74         4             24         3             2              127
0                8          3             30         6             2              127
0       0.9      90         3             37         8             2              127
5       0.5      56         3             7          2             2              127
5                63         29            23         5             2              127
5       0.9      69         27            30         8             2              127
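The percentages quoted in the text (for example 83.2% and 25.6%) measure how much of the gap between the first-stage and full-data BH methods a two-stage procedure recovers. A minimal sketch of that arithmetic follows; the helper name `relative_gain` is hypothetical, and the counts in the usage note (106, 34, 2, and 127 discoveries) are the readings of the λ = 0.005, λ' = 0.9 row of Table 2 that are consistent with the quoted percentages.

```python
def relative_gain(two_stage, first_stage, full_data):
    """Share of the additional discoveries available to the full-data BH
    method (over the first-stage BH method) that a two-stage procedure makes."""
    if full_data <= first_stage:
        raise ValueError("full-data BH must make more discoveries than first-stage BH")
    return (two_stage - first_stage) / (full_data - first_stage)
```

Under those readings, `relative_gain(106, 2, 127)` gives 104/125 = 0.832 for the BH-TSADC with Fisher's combination function, and `relative_gain(34, 2, 127)` gives 32/125 = 0.256 for its Plug-In counterpart.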

Figure 10. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures under clumpy dependence with m = 100, λ = 5, λ' = 0.5, and α =. The online version of this figure is in color.

Figure 11. Comparison of simulated FDRs of BH-TSADC and Plug-In BH-TSADC procedures under AR(1) dependence with m = 100, λ = 5, λ' = 0.5, and α =. The online version of this figure is in color.

6. CONCLUDING REMARKS

This article has been motivated by the need to have a two-stage strategy for testing multiple null hypotheses, not known before, that allows making early decisions on the null hypotheses in terms of rejection, acceptance, or continuation to the second stage for further testing with more observations, and eventually controls the FDR in a nonasymptotic setting, as the first step toward designing an FDR based two-stage study. We have produced two such strategies by generalizing the classical BH method and its adaptive version from single-stage to the present two-stage setting. We have proved their FDR control under independence and provided simulation evidence showing their meaningful improvements over the first-stage BH method relative to those ideally offered by the full-data BH method in terms of both power and cost savings, and given an example of their utility in practice. We have also presented numerical evidence that the proposed strategies can maintain control over the FDR even under some dependence situations.

Now that we know how to test multiple hypotheses in the present two-stage adaptive design format controlling the FDR, we can get to addressing issues related to designing FDR based two-stage studies. One such issue is optimal allocation of sample sizes to the two stages. Let us briefly outline the steps one can take toward addressing this issue.

Suppose that we have a study involving m genes, and our problem is to identify the differentially expressed genes between two independent groups by simultaneously testing $H_i: \delta_i = 0$ against $K_i: \delta_i \ne 0$ for $i = 1, \ldots, m$, where $\delta_i = (\mu_{iX} - \mu_{iY})/\sigma_i$ is the (standardized) effect size defined in terms of $\mu_{iX}$ and $\mu_{iY}$, the group means, and $\sigma_i^2$, the common group variance, for the ith gene, given that we decide to have the maximum N number of observations per gene for all the groups and stages combined and choose some fixed early stopping boundaries λ < λ'.
Assume that the observed expression levels for each group follow normal distributions, with proper normalization, so that we can apply the two-sample t test once such observations are available. We consider using equal sample size per group for this test. An optimal FDR based two-stage design based on our method of multiple testing can be constructed as follows. Assume that we take $n_1 = Nf/2$ observations per group for each gene at the first stage, for some fraction $0 < f < 1$, and additional $n_2 = N(1 - f)/2$ observations per group for each of the $m - S(f)$ follow-up genes, where $S(f)$ denotes (as in Section 4.3) the total number of rejected and accepted null hypotheses at the first stage. Let $\bar{x}_{i1}$ and $\bar{y}_{i1}$ be the estimates of $\mu_{iX}$ and $\mu_{iY}$, respectively, and $s_{i1}^2$ be the pooled estimate of $\sigma_i^2$, for the ith gene based on the first-stage observations, and $\bar{x}_{i2}$, $\bar{y}_{i2}$, and $s_{i2}^2$ be those estimates based on the additional observations for the ith follow-up gene. Let

$$t_{ij} = \frac{\bar{x}_{ij} - \bar{y}_{ij}}{s_{ij}\sqrt{2/n_j}} = \hat{\delta}_{ij}\sqrt{n_j/2}, \quad \text{where } \hat{\delta}_{ij} = \frac{\bar{x}_{ij} - \bar{y}_{ij}}{s_{ij}},$$

for $i = 1, \ldots, m$, $j = 1, 2$. Then, $p_{1i} = 2[1 - G_1(|t_{i1}|)]$ is the first-stage p-value for the ith gene, for $i = 1, \ldots, m$, and $p_{2i} = 2[1 - G_2(|t_{i2}|)]$ is the second-stage p-value for the ith follow-up gene, where $G_j$ is the cumulative distribution function of the central t distribution with $2n_j - 2$ degrees of freedom. Now, if we find the f for which our proposed two-stage method of multiple testing based on these first- and second-stage p-values maximizes the average power at some specified alternatives for some targeted genes, then that f will provide a good FDR based two-stage design, given N, λ, and λ'. Of course, it brings forth some newer and interesting theoretical issues that need to be addressed.

We have proposed our FDR controlling procedures in this article considering a nonasymptotic setting. However, one may consider developing procedures that would asymptotically control the FDR by taking the following approach toward finding the first- and second-stage thresholds subject to the early boundaries λ < λ' and the final boundary α on the FDR.
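Given two constants $t < t'$, consider making an early decision regarding $H_i$ by rejecting it if $p_{1i} \le t$, accepting it if $p_{1i} > t'$, and continuing to test it at the second stage if $t < p_{1i} \le t'$. At the second stage, reject $H_i$ if $C(p_{1i}, p_{2i}) \le c$. Storey's (2002) estimate of the FDR at the first stage is given by

$$\widehat{\mathrm{FDR}}_1(t) = \begin{cases} \dfrac{m\hat{\pi}_0 t}{R^{(1)}(t)} & \text{if } R^{(1)}(t) > 0 \\ 0 & \text{if } R^{(1)}(t) = 0, \end{cases}$$

for some estimate $\hat{\pi}_0$ of $\pi_0$. Similarly, the cumulative FDR at the second stage can be estimated as follows:

$$\widehat{\mathrm{FDR}}_2(c, t, t') = \begin{cases} \dfrac{m\hat{\pi}_0 [t + H(c; t, t')]}{R^{(1)}(t) + R^{(2)}(c; t, t')} & \text{if } R^{(1)}(t) + R^{(2)}(c; t, t') > 0 \\ 0 & \text{if } R^{(1)}(t) + R^{(2)}(c; t, t') = 0. \end{cases}$$

Let $\hat{t}_{\lambda} = \sup\{t : \widehat{\mathrm{FDR}}_1(u) \le \lambda \text{ for all } u \le t\}$, $\hat{t}_{\lambda'} = \inf\{t : \widehat{\mathrm{FDR}}_1(u) > \lambda' \text{ for all } u > t\}$, and $\hat{c}_{\alpha}(\lambda, \lambda') = \sup\{c : \widehat{\mathrm{FDR}}_2(c, \hat{t}_{\lambda}, \hat{t}_{\lambda'}) \le \alpha\}$. Then, reject $H_i$ if $p_{1i} \le \hat{t}_{\lambda}$ or if $\hat{t}_{\lambda} < p_{1i} \le \hat{t}_{\lambda'}$ and $C(p_{1i}, p_{2i}) \le \hat{c}_{\alpha}(\lambda, \lambda')$. This may control the overall FDR asymptotically under the weak dependence condition and the consistency property of $\hat{\pi}_0$ (as in Storey, Taylor, and Siegmund 2004).

The foregoing discussion also suggests how to estimate the FDR for each hypothesis in a completed two-stage design of the present form. For instance, for a hypothesis with the pair of p-values $(p_1, p_2)$, the estimated FDR is $\widehat{\mathrm{FDR}}_1(p_1)$ if $p_1 \le \hat{t}_{\lambda}$ or $p_1 \ge \hat{t}_{\lambda'}$, and is $\widehat{\mathrm{FDR}}_2(C(p_1, p_2), \hat{t}_{\lambda}, \hat{t}_{\lambda'})$ if $\hat{t}_{\lambda} < p_1 < \hat{t}_{\lambda'}$.

There is another important issue related to the present problem which we have not touched in this article but hope to address in a different communication. There are other combination functions, such as Fisher's weighted product (Fisher 1932) and weighted inverse normal (Mosteller and Bush 1954); their performances would be worth investigating.

APPENDIX

Proof of Theorem 1.

$$\mathrm{FDR}_{12} = E\left[\frac{V_1 + V_2}{\max\{R_1 + R_2, 1\}}\right] \le E\left[\frac{V_1}{\max\{R_1, 1\}}\right] + E\left[\frac{V_2}{\max\{R_1 + R_2, 1\}}\right].$$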

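Now,

$$E\left[\frac{V_1}{\max\{R_1, 1\}}\right] = \sum_{i \in J_0} E\left[\frac{I(p_{1i} \le \lambda_{R_1})}{\max\{R_1, 1\}}\right] \le \sum_{i \in J_0} E\left[\frac{I\big(p_{1i} \le \lambda_{R_1^{(-i)}+1}\big)}{R_1^{(-i)} + 1}\right]$$

(as shown in Sarkar 2008; see also Result 1). And,

$$E\left[\frac{V_2}{\max\{R_1 + R_2, 1\}}\right] = \sum_{i \in J_0} E\left[\frac{I(\lambda_{R_1+1} < p_{1i} \le \lambda'_{S_1},\ q_i \le \gamma_{R_1+R_2,S_1},\ S_1 > R_1,\ R_2 > 0)}{R_1 + R_2}\right]. \qquad (A.1)$$

Writing $R_2$ more explicitly in terms of $R_1$ and $S_1$, we see that the expression in Equation (A.1) is equal to

$$\sum_{i \in J_0} \sum_{s_1=1}^{m} \sum_{r_1=0}^{s_1-1} \sum_{r_2=1}^{s_1-r_1} E\left[\frac{I(\lambda_{r_1+1} < p_{1i} \le \lambda'_{s_1},\ q_i \le \gamma_{r_1+r_2,s_1},\ R_1 = r_1,\ S_1 = s_1,\ R_2(r_1, s_1) = r_2)}{r_1 + r_2}\right]$$

$$= \sum_{i \in J_0} \sum_{s_1=1}^{m} \sum_{r_1=0}^{s_1-1} \sum_{r_2=1}^{s_1-r_1} E\left[\frac{I\big(\lambda_{r_1+1} < p_{1i} \le \lambda'_{s_1},\ q_i \le \gamma_{r_1+r_2,s_1},\ R_1^{(-i)} = r_1,\ S_1^{(-i)} = s_1 - 1,\ R_2^{(-i)}(r_1, s_1) = r_2 - 1\big)}{r_1 + r_2}\right]$$

$$= \sum_{i \in J_0} \sum_{s_1=0}^{m-1} \sum_{r_1=0}^{s_1} \sum_{r_2=0}^{s_1-r_1} E\left[\frac{I\big(\lambda_{r_1+1} < p_{1i} \le \lambda'_{s_1+1},\ q_i \le \gamma_{r_1+r_2+1,s_1+1},\ R_1^{(-i)} = r_1,\ S_1^{(-i)} = s_1,\ R_2^{(-i)}(r_1, s_1+1) = r_2\big)}{r_1 + r_2 + 1}\right]$$

$$= \sum_{i \in J_0} E\left[\frac{I\big(\lambda_{R_1^{(-i)}+1} < p_{1i} \le \lambda'_{S_1^{(-i)}+1},\ q_i \le \gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1}\big)}{R_1^{(-i)} + R_2^{(-i)} + 1}\right].$$

Thus, the theorem is proved.

Proof of Proposition 1.

$$\mathrm{FDR}_{12} \le \sum_{i \in J_0} E\left[\frac{\Pr_H\big(p_1 \le \lambda_{R_1^{(-i)}+1}\big)}{R_1^{(-i)} + 1}\right] + \sum_{i \in J_0} E\left[\frac{\Pr_H\big(\lambda_{R_1^{(-i)}+1} < p_1 \le \lambda'_{S_1^{(-i)}+1},\ C(p_1, p_2) \le \gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1}\big)}{R_1^{(-i)} + R_2^{(-i)} + 1}\right]$$

$$\le \sum_{i \in J_0} E\left[\frac{\lambda_{R_1^{(-i)}+1}}{R_1^{(-i)} + 1}\right] + \sum_{i \in J_0} E\left[\frac{\Pr\big(\lambda_{R_1^{(-i)}+1} < u_1 \le \lambda'_{S_1^{(-i)}+1},\ C(u_1, u_2) \le \gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1}\big)}{R_1^{(-i)} + R_2^{(-i)} + 1}\right]. \qquad (A.2)$$

The first sum in Equation (A.2) is less than or equal to $\pi_0\lambda$, since $\lambda_{R_1^{(-i)}+1} = [R_1^{(-i)} + 1]\lambda/m$, and the second sum is less than or equal to $\pi_0(\alpha - \lambda)$, since the probability in the numerator in this sum is equal to

$$H\big(\gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1};\ \lambda_{R_1^{(-i)}+1},\ \lambda'_{S_1^{(-i)}+1}\big) = \frac{\big[R_1^{(-i)} + R_2^{(-i)} + 1\big](\alpha - \lambda)}{m}.$$

Thus, the proposition is proved.

Proof of Proposition 2. This can be proved as in Proposition 1. More specifically, first note that the FDR here, which we call the $\mathrm{FDR}^{*}_{12}$, satisfies the following:

$$\mathrm{FDR}^{*}_{12} \le \sum_{i \in J_0} E\left[\frac{I\big(p_{1i} \le \lambda_{R_1^{(-i)}+1}\big)}{R_1^{(-i)} + 1}\right] + \sum_{i \in J_0} E\left[\frac{I\big(\lambda_{R_1^{(-i)}+1} < p_{1i} \le \lambda'_{S_1^{(-i)}+1},\ q_i \le \gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1}\big)}{R_1^{(-i)} + R_2^{(-i)} + 1}\right], \qquad (A.3)$$

where

$$R_2^{(-i)} \equiv R_2^{(-i)}\big(R_1^{(-i)}, S_1^{(-i)} + 1\big) = \max\big\{1 \le j \le S_1^{(-i)} - R_1^{(-i)} : q_{(j)}^{(-i)} \le \gamma_{R_1^{(-i)}+j+1,\,S_1^{(-i)}+1}\big\},$$

with $q_{(j)}^{(-i)}$ being the ordered versions of the combined p-values except the $q_i$.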
As in Proposition 1, the first sum in Equation (A.3) is less than or equal to $\pi_0\lambda$. Before working with the second sum, first note that the γ satisfying Equation (11), that is, the following equation:

$$H(\gamma_{r_1+i,s_1}; \lambda_{r_1}, \lambda'_{s_1}) = \frac{(r_1 + i)(\alpha - \lambda)(1 - \lambda')}{m - s_1 + 1},$$

is less than or equal to the γ satisfying

$$H(\gamma_{r_1+i,s_1}; \lambda_{r_1}, \lambda'_{s_1}) = \frac{(r_1 + i)(\alpha - \lambda)(1 - \lambda')}{m - s_1^{(-j)}},$$

for any fixed $j = 1, \ldots, m$. So, the second sum in Equation (A.3) is less than or equal to

$$\sum_{i \in J_0} E\left[\frac{I\big(\lambda_{R_1^{(-i)}+1} < p_{1i} \le \lambda'_{S_1^{(-i)}+1},\ q_i \le \gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1}\big)}{R_1^{(-i)} + R_2^{(-i)} + 1}\right] = \sum_{i \in J_0} E\left[\frac{H\big(\gamma_{R_1^{(-i)}+R_2^{(-i)}+1,\,S_1^{(-i)}+1};\ \lambda_{R_1^{(-i)}+1},\ \lambda'_{S_1^{(-i)}+1}\big)}{R_1^{(-i)} + R_2^{(-i)} + 1}\right]$$

$$= (\alpha - \lambda) \sum_{i \in J_0} E\left[\frac{1 - \lambda'}{m - S_1^{(-i)}}\right] \le \alpha - \lambda,$$

since $\sum_{i \in J_0} E\left[\dfrac{1 - \lambda'}{m - S_1^{(-i)}}\right] \le 1$; see, for instance, Sarkar (2008, p. 151). Hence, $\mathrm{FDR}^{*}_{12} \le \pi_0\lambda + \alpha - \lambda \le \alpha$, which proves the proposition.

SUPPLEMENTARY MATERIALS

As suggested by one of the reviewers, we have examined the performance of our proposed procedures in a complicated genetic model with exponentially decreasing effect sizes. The simulation results can be found in the supplementary materials.

[Received June 2011. Revised September 2012.]

REFERENCES

Benjamini, Y., and Hochberg, Y. (1995), "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing," Journal of the Royal Statistical Society, Series B, 57, 289-300.

Benjamini, Y., and Hochberg, Y. (2000), "On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics," Journal of Educational and Behavioral Statistics, 25, 60-83.

Benjamini, Y., Krieger, A., and Yekutieli, D. (2006), "Adaptive Linear Step-Up False Discovery Rate Controlling Procedures," Biometrika, 93, 491-507.

Benjamini, Y., and Yekutieli, D. (2001), "The Control of the False Discovery Rate in Multiple Testing Under Dependency," The Annals of Statistics, 29, 1165-1188.

Blanchard, G., and Roquain, E. (2009), "Adaptive FDR Control Under Independence and Dependence," Journal of Machine Learning Research, 10, 2837-2871.

Brannath, W., Posch, M., and Bauer, P. (2002), "Recursive Combination Tests," Journal of the American Statistical Association, 97, 236-244.

Fisher, R. A. (1932), Statistical Methods for Research Workers (4th ed.), London: Oliver and Boyd.

Gavrilov, Y., Benjamini, Y., and Sarkar, S. K. (2009), "An Adaptive Step-Down Procedure With Proven FDR Control Under Independence," The Annals of Statistics, 37, 619-629.

Jeffery, I., Higgins, D., and Culhane, A. (2006), "Comparison and Evaluation of Methods for Generating Differentially Expressed Genes Lists From Microarray Data," BMC Bioinformatics, 7, 359-375.

Mosteller, F., and Bush, R. (1954), "Selected Quantitative Techniques," in Handbook of Social Psychology, Vol. 1, ed. G. Lindzey, Cambridge, MA: Addison-Wesley, pp. 289-334.

Posch, M., Zehetmayer, S., and Bauer, P. (2009), "Hunting for Significance With the False Discovery Rate," Journal of the American Statistical Association, 104, 832-840.

Sarkar, S. K. (2002), "Some Results on False Discovery Rate in Stepwise Multiple Testing Procedures," The Annals of Statistics, 30, 239-257.

Sarkar, S. K. (2008), "On Methods Controlling the False Discovery Rate," Sankhya, Series A, 70, 135-168.

Storey, J. (2002), "A Direct Approach to False Discovery Rates," Journal of the Royal Statistical Society, Series B, 64, 479-498.

Storey, J., Taylor, J., and Siegmund, D. (2004), "Strong Control, Conservative Point Estimation and Simultaneous Conservative Consistency of False Discovery Rates: A Unified Approach," Journal of the Royal Statistical Society, Series B, 66, 187-205.

Storey, J., and Tibshirani, R. (2003), "Statistical Significance in Genomewide Studies," Proceedings of the National Academy of Science USA, 100, 9440-9445.

Tian, E., Zhan, F., Walker, R., Rasmussen, E., Ma, Y., and Barlogie, B. (2003), "The Role of the WNT-Signaling Antagonist DKK1 in the Development of Osteolytic Lesions in Multiple Myeloma," New England Journal of Medicine, 349, 2483-2494.

Victor, A., and Hommel, G. (2007), "Combining Adaptive Design With Control of the False Discovery Rate: A Generalized Definition for a Global P-value," Biometrical Journal, 49, 94-106.

Weller, J., Song, J., Heyen, D., Lewin, H., and Ron, M. (1998), "A New Approach to the Problem of Multiple Comparisons in the Genetic Dissection of Complex Traits," Genetics, 150, 1699-1706.

Zehetmayer, S., Bauer, P., and Posch, M. (2005), "Two-Stage Designs for Experiments With a Large Number of Hypotheses," Bioinformatics, 21, 3771-3777.

Zehetmayer, S., Bauer, P., and Posch, M. (2008), "Optimized Multi-Stage Designs Controlling the False Discovery or the Family-Wise Error Rate," Statistics in Medicine, 27, 4145-4160.