Sample size calculation for multiple testing in microarray data analysis


Biostatistics (2005), 6, 1, pp doi: /biostatistics/kxh026
Biostatistics Vol. 6 No. 1 © Oxford University Press 2005; all rights reserved.

Sample size calculation for multiple testing in microarray data analysis

SIN-HO JUNG
Department of Biostatistics and Bioinformatics, Duke University, Box 2716, Durham, NC 27705, USA

HEEJUNG BANG
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

STANLEY YOUNG
National Institute of Statistical Sciences, Research Triangle Park, NC 27709, USA

SUMMARY

Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.

Keywords: Adjusted p-value; Bonferroni; Multi-step; Permutation; Simulation; Single-step.

1. INTRODUCTION

DNA microarray is a biotechnology for performing genome-wide screening and monitoring of expression levels in cells for thousands of genes simultaneously, and has been extensively applied to a broad range of problems in biomedical fields (Golub et al., 1999; Alizadeh and Staudt, 2000; Sander, 2000). A primary aim is often to reveal the association between the expression levels and an outcome or other risk factor of interest. Golub et al. (1999) explored about 7000 genes extracted from bone marrow in 38 patients, 27 with acute lymphoblastic leukemia (ALL) and 11 with acute myeloid leukemia (AML), in order to identify the susceptible genes with potential clinical heterogeneity in the two subclasses of leukemia. Genes useful to distinguish ALL from AML may provide insight into cancer pathogenesis and patient treatment.

The authors concluded, relying on what they called neighborhood analysis, that roughly 1100 genes were more highly correlated with the AML-ALL class distinction; they then selected the top 50 genes arbitrarily for intensive research. This data set has been referred to and reanalyzed by many other researchers (Thomas et al., 2001; Pan, 2002; Dudoit et al., 2003; Ge et al., 2003). Because different methods and assumptions were adopted, statistical inference obtained from the same data set has varied widely with respect to observed significance and the number of genes declared significant (Pan, 2002; Dudoit et al., 2003).

Traditional statistical testing procedures, such as two-sample t-tests or Wilcoxon rank sum tests, are frequently used to determine statistical significance of the difference in gene expression patterns. These approaches, however, face serious multiplicity: a very large number of hypotheses (possibly or more) are to be tested, while the number of experimental units studied is relatively small, from tens to a few hundred (West et al., 2001). If we use a per comparison type I error rate α in each test, the probability of rejecting any null hypothesis when all null hypotheses are true, which is called the family-wise error rate (FWER), will be greatly inflated. To avoid this pitfall, the Bonferroni test is used most commonly in this field despite its well-known conservativeness. Although Holm (1979) and Hochberg (1998) improved upon such conservativeness by devising multi-step testing procedures, they did not exploit the dependency of the test statistics and consequently the resulting improvement is often minor. Later, Westfall and Young (1989, 1993) proposed adjusting p-values in a state-of-the-art step-down manner using a simulation or resampling method, by which dependency among test statistics is effectively incorporated. Westfall and Wolfinger (1997) derived exact adjusted p-values for a step-down method for discrete data. Recently, Westfall and Young's permutation-based test was introduced to microarray data analyses and strongly advocated by Dudoit and her colleagues. Troendle et al. (2004) favor the permutation test over bootstrap resampling because the latter converges slowly in high dimensional data. Various multiple testing procedures and error control methods applicable to microarray experiments are well documented in Dudoit et al. (2003, pp ). Which test to use among a bewildering variety of choices should be judged by relevance to research questions, validity (of underlying assumptions), type of control (strong or weak), and computability. The Bonferroni-type single-step procedure, however, is still attractive due to its easy calculation and interpretation.

Comparisons between single-step and multi-step testing procedures have been briefly discussed in several papers, but there has been little attempt to compare their theoretical and numerical properties, especially in the microarray framework. A stepwise procedure does not offer a critical value, while Bonferroni's critical value is fixed based on the number of comparisons. Neither provides a simple way to calculate the minimal sample size for a designated power. Sample size estimation in this area is also an important problem, as indicated in Golub et al. (1999), where the authors called for larger studies because they were uncertain about the statistical power.

In this article, we compare the Bonferroni, resampling-based single-step and step-down multiple testing procedures through simulation and a real data example. The null distribution of the test statistics is approximated by permutation, which is nonparametric in that it does not require specification of the joint distribution of the test statistics and hence of the p-values. Adjusted p-values are also derived as better-suited summaries of the evidence against the null.
Most importantly, we show that the single-step test provides a simple and accurate method for sample size determination that can also be used for multi-step tests.

2. MULTIPLE TESTING PROCEDURES: REVIEW

2.1 Single-step vs. multi-step

Suppose that there are $n_1$ subjects in group 1 and $n_2$ subjects in group 2. Gene expression data for $m$ genes are measured from each subject. We want to identify the informative genes, i.e. those that are differentially expressed between the two groups. Let $(X_{1i1},\ldots,X_{1im})$ denote the gene expression levels obtained from subject $i$ $(=1,\ldots,n_1)$ in group 1 and $(X_{2i1},\ldots,X_{2im})$ similarly for subject $i$ $(=1,\ldots,n_2)$ in group 2. Let $\mu_1=(\mu_{11},\ldots,\mu_{1m})$ and $\mu_2=(\mu_{21},\ldots,\mu_{2m})$ represent the respective mean vectors. In order to test the null hypothesis that gene $j$ $(=1,\ldots,m)$ is not differentially expressed between the two conditions, i.e. $H_j:\mu_{1j}-\mu_{2j}=0$, we may use the t-test statistic

$$T_j=\frac{\bar X_{1j}-\bar X_{2j}}{S_j\sqrt{n_1^{-1}+n_2^{-1}}}$$

where $\bar X_{kj}$ is the sample mean in group $k$ $(=1,2)$ and

$$S_j^2=\Big\{\sum_{i=1}^{n_1}(X_{1ij}-\bar X_{1j})^2+\sum_{i=1}^{n_2}(X_{2ij}-\bar X_{2j})^2\Big\}\Big/(n_1+n_2-2)$$

is the pooled sample variance for the $j$th gene.

Suppose that our interest lies in identifying any genes overexpressed in group 1. This question can be stated as multiple one-sided tests of $H_j$ vs. $\bar H_j:\mu_{1j}>\mu_{2j}$ for $j=1,\ldots,m$. Two-sided tests, as a simple extension, will be discussed briefly later and in Appendix 1.

A single-step procedure adopts a common critical value $c$ and rejects $H_j$, in favor of $\bar H_j$, when $T_j>c$. In this case, FWER fixed at $\alpha$ is defined as

$$\alpha=P(T_1>c \text{ or } T_2>c,\ldots,\text{ or } T_m>c\mid H_0)=P\Big(\max_{j=1,\ldots,m}T_j>c\,\Big|\,H_0\Big)\qquad(1)$$

where $H_0:\mu_{1j}=\mu_{2j}$ for all $j=1,\ldots,m$, or equivalently $H_0=\cap_{j=1}^{m}H_j$, is the complete null hypothesis and the relevant alternative hypothesis is $H_a=\cup_{j=1}^{m}\bar H_j$. In order to control FWER below the nominal level $\alpha$, Bonferroni uses $c=c_\alpha=t_{n_1+n_2-2,\alpha/m}$, the upper $\alpha/m$-quantile of the t-distribution with $n_1+n_2-2$ degrees of freedom, imposing normality on the expression data, or $c=z_{\alpha/m}$, the upper $\alpha/m$-quantile of the standard normal distribution, based on asymptotic normality. If gene expression levels are not normally distributed, the assumption of a t-distribution may be violated. Furthermore, $n_1$ and $n_2$ usually may not be large enough to warrant a normal approximation.
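The quantities just defined can be illustrated with a minimal sketch that uses only the Python standard library. The function names are illustrative and not from the paper. Because the standard library has no t quantile function, only the asymptotic critical value $z_{\alpha/m}$ is computed, and the FWER helper assumes independent tests, which is a simplification of the correlated setting considered here.

```python
import math
from statistics import NormalDist, mean

def pooled_t(x1, x2):
    """Two-sample t statistic T_j with pooled variance for one gene."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    s = math.sqrt(ss / (n1 + n2 - 2))           # pooled standard deviation S_j
    return (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))

def bonferroni_z(alpha, m):
    """Upper alpha/m quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / m)

def fwer_independent(per_test_alpha, m):
    """FWER of m independent one-sided tests, each at level per_test_alpha."""
    return 1 - (1 - per_test_alpha) ** m
```

Under the independence assumption, `fwer_independent(0.05, 100)` shows how quickly the unadjusted FWER approaches 1, while `fwer_independent(0.05 / 100, 100)` stays just below the nominal 0.05, which is the Bonferroni guarantee.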
Even if the assumed conditions are met, the Bonferroni procedure is conservative for correlated data. In fact, microarray data are collected from the same individuals and experience co-regulation, so they are expected to be correlated. Motivated by these limitations together with the relationship in (1), we derive the distribution of $W=\max_{j=1,\ldots,m}T_j$ under $H_0$ using permutation. There are $B=\binom{n}{n_1}$ different ways of partitioning the pooled sample of size $n=n_1+n_2$ into two groups of sizes $n_1$ and $n_2$. In order to maintain the dependence structure and distributional characteristics of the gene expression measures within each subject, the sampling unit is subject, not gene. Recently, this type of resampling became popular in multiple testing to avoid the specification of the true distribution for the gene expression data (Dudoit et al., 2002, 2003; Mutter et al., 2001; Ge et al., 2003). Note that the number of possible permutations $B$ can be very large even with a small sample size. For instance, with $n_1=n_2=10$, there exist distinct permutations. A reasonable number of random permutations, say B = , can be chosen for feasible computation.

For the observed test statistic $t_j$ of $T_j$ from the original data, the unadjusted (or raw) p-values can be approximated by

$$p_j\approx B^{-1}\sum_{b=1}^{B}I(t_j^{(b)}\ge t_j)$$

where $I(A)$ is an indicator function of event $A$. For gene-specific inference, an adjusted p-value quantifying the significance of each gene relative to FWER is more realistic. Toward this end, we define an adjusted p-value for gene $j$ as the minimum FWER for which $H_j$ will be rejected, i.e. $\tilde p_j=P(\max_{j'=1,\ldots,m}T_{j'}\ge t_j\mid H_0)$. In what follows, this probability is estimated from the permutation distribution:

Algorithm 1 (Single-step procedure)
(A) Compute the test statistics $t_1,\ldots,t_m$ from the original data.
(B) For the $b$th permutation of the original data $(b=1,\ldots,B)$, compute the test statistics $t_1^{(b)},\ldots,t_m^{(b)}$ and $w_b=\max_{j=1,\ldots,m}t_j^{(b)}$.
(C) Estimate the adjusted p-values by $\tilde p_j=\sum_{b=1}^{B}I(w_b\ge t_j)/B$ for $j=1,\ldots,m$.
(D) Reject all hypotheses $H_j$ $(j=1,\ldots,m)$ such that $\tilde p_j<\alpha$.

Alternatively, with steps (C) and (D) replaced, the cut-off value $c_\alpha$ can be determined:

Algorithm 1′
(C′) Sort $w_1,\ldots,w_B$ to obtain the order statistics $w_{(1)}\le\cdots\le w_{(B)}$ and compute the critical value $c_\alpha=w_{([B(1-\alpha)+1])}$, where $[a]$ is the largest integer no greater than $a$. If there exist ties, $c_\alpha=w_{(k)}$ where $k$ is the smallest integer such that $w_{(k)}\ge w_{([B(1-\alpha)+1])}$.
(D′) Reject all hypotheses $H_j$ $(j=1,\ldots,m)$ for which $t_j>c_\alpha$.

Below is a step-down analog suggested by Dudoit et al. (2002, 2003), originally proposed by Westfall and Young (1989, 1993, see Algorithms 2.8 and 4.1 in their book):

Algorithm 2 (Step-down procedure)
(A) Compute the test statistics $t_1,\ldots,t_m$ from the original data.
(A1) Sort $t_1,\ldots,t_m$ to obtain the ordered test statistics $t_{r_1}\ge\cdots\ge t_{r_m}$, where $H_{r_1},\ldots,H_{r_m}$ are the corresponding hypotheses.
(B) For the $b$th permutation of the original data $(b=1,\ldots,B)$, compute the test statistics $t_{r_1}^{(b)},\ldots,t_{r_m}^{(b)}$ and $u_{b,j}=\max_{j'=j,\ldots,m}t_{r_{j'}}^{(b)}$ for $j=1,\ldots,m$.
(C) Estimate the adjusted p-values by $\tilde p_{r_j}=\sum_{b=1}^{B}I(u_{b,j}\ge t_{r_j})/B$ for $j=1,\ldots,m$.
(C1) Enforce monotonicity by setting $\tilde p_{r_j}\leftarrow\max(\tilde p_{r_{j-1}},\tilde p_{r_j})$ for $j=2,\ldots,m$.
(D) Reject all hypotheses $H_{r_j}$ $(j=1,\ldots,m)$ for which $\tilde p_{r_j}<\alpha$.

Note that two-sided tests can be carried out by replacing $t_j$ by $|t_j|$ in steps (B) and (C) of Algorithm 1. Finally, it can be shown that a single-step procedure, controlling the FWER weakly as in (1), also controls the FWER strongly under the condition of subset pivotality (see p. 42 in Westfall and Young, 1993).
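Algorithms 1 and 2 can be sketched in a few lines of standard-library Python. This is an illustrative implementation under stated assumptions, not the authors' code: the names `t_stats` and `adjusted_p_values` are hypothetical, group labels are coded 1 and 2, random permutations are drawn by shuffling the labels of whole subjects, and the tests are one-sided.

```python
import math
import random
from statistics import mean

def t_stats(data, labels):
    """Pooled-variance t statistic for every gene.
    data[i][j] is the expression of gene j in subject i; labels[i] is 1 or 2."""
    g1 = [row for row, g in zip(data, labels) if g == 1]
    g2 = [row for row, g in zip(data, labels) if g == 2]
    n1, n2 = len(g1), len(g2)
    out = []
    for j in range(len(data[0])):
        x1 = [r[j] for r in g1]
        x2 = [r[j] for r in g2]
        m1, m2 = mean(x1), mean(x2)
        ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
        s = math.sqrt(ss / (n1 + n2 - 2))
        out.append((m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2)))
    return out

def adjusted_p_values(data, labels, n_perm=1000, seed=0):
    """Return (single_step, step_down) adjusted p-values in gene order."""
    rng = random.Random(seed)
    t = t_stats(data, labels)                       # step (A)
    m = len(t)
    order = sorted(range(m), key=lambda j: -t[j])   # step (A1): r_1, ..., r_m
    ss_count = [0] * m
    sd_count = [0] * m
    perm = list(labels)
    for _ in range(n_perm):                         # step (B)
        rng.shuffle(perm)                           # permute subjects, not genes
        tb = t_stats(data, perm)
        w = max(tb)                                 # w_b of Algorithm 1
        for j in range(m):
            ss_count[j] += w >= t[j]
        u = -math.inf
        for pos in range(m - 1, -1, -1):            # successive maxima u_{b,j}
            u = max(u, tb[order[pos]])
            sd_count[pos] += u >= t[order[pos]]
    single = [c / n_perm for c in ss_count]         # step (C), Algorithm 1
    step = [0.0] * m
    running = 0.0
    for pos, j in enumerate(order):
        running = max(running, sd_count[pos] / n_perm)  # steps (C) and (C1)
        step[j] = running
    return single, step
```

In this sketch the step-down adjusted p-value never exceeds the single-step one for the same gene, and the two coincide for the gene with the largest statistic, which is why both procedures share the same smallest adjusted p-value.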
2.2 A simulation study

Through a simulation study, we investigate how well the multiple testing procedures presented in this section control the FWER and how much power they achieve: the Bonferroni (BON), the single-step procedure (SSP) and the step-down procedure (SDP). To evaluate FWER empirically, 1000-dimensional artificial gene expression profiles in each group were generated from a multivariate Gaussian distribution with zero means (i.e. $\mu_1=\mu_2=0$) and unit marginal variances. A block exchangeable correlation structure was assumed with correlation coefficient $\rho$ (= 0, 0.4 or 0.8) and block size 100, i.e. genes are correlated within blocks and uncorrelated between blocks. We used balanced allocation ($n_1=n_2=n/2$) with $n=20$ or 50 subjects. With one-sided FWER $\alpha=0.05$, $c_\alpha$ was approximated from $B=1000$ random permutations and the empirical FWER was estimated by the proportion of the $N=1000$ replications in which $H_0$ was rejected.

As Table 1(A) displays, BON is precise with mild correlation ($\rho\le 0.4$), but becomes highly conservative as correlation increases ($\rho=0.8$). The conservatism becomes more prominent with a larger sample ($n=50$). The estimates from both SSP and SDP are slightly anticonservative with $n=20$ and $\rho=0$, but accurate overall. Also reported are the averages of the $c_\alpha$ values for SSP over the simulations along with those for BON, $t_{n-2,\alpha/m}$. As expected, the estimated critical value $c_\alpha$ increases in $m$ (result not shown), decreases in $n$, and is always smaller than the critical value of BON.

Table 1. Simulation results

(A) Average FWER (critical value)

n   ρ   BON   SSP   SDP
(4.966) 0.066(4.950) (4.898) (4.384) (4.244) 0.046(4.233) (4.177) (3.767)

(B) Average true rejection rate (global power)

                 δ = 1                                     δ = 1.5
n   D   ρ   BON       SSP            SDP            BON            SSP            SDP
            (0.237)   0.022(0.245)   0.022(0.245)   0.116(0.702)   0.117(0.706)   0.117(0.706)
            (0.190)   0.024(0.208)   0.024(0.208)   0.106(0.536)   0.113(0.554)   0.113(0.554)
            (0.097)   0.055(0.210)   0.055(0.210)   0.116(0.339)   0.215(0.517)   0.217(0.517)
            (0.625)   0.020(0.627)   0.020(0.627)   0.115(0.999)   0.119(0.999)   0.119(0.999)
            (0.395)   0.022(0.421)   0.022(0.421)   0.120(0.856)   0.127(0.866)   0.127(0.866)
            (0.185)   0.042(0.314)   0.042(0.314)   0.117(0.507)   0.211(0.688)   0.214(0.688)
            (0.949)   0.268(0.949)   0.268(0.949)   0.842(1.00)    0.844(1.00)    0.845(1.00)
            (0.810)   0.286(0.834)   0.287(0.834)   0.840(0.997)   0.855(0.997)   0.855(0.997)
            (0.516)   0.393(0.695)   0.394(0.695)   0.845(0.969)   0.929(0.990)   0.929(0.990)
            (1.00)    0.267(1.00)    0.268(1.00)    0.842(1.00)    0.844(1.00)    0.846(1.00)
            (0.947)   0.289(0.956)   0.291(0.956)   0.842(1.00)    0.859(1.00)    0.859(1.00)
            (0.692)   0.426(0.836)   0.429(0.836)   0.841(0.984)   0.925(0.996)   0.925(0.996)

BON = Bonferroni, SSP = single-step procedure, and SDP = step-down procedure. n denotes sample size and D denotes the number of genes with non-zero effect size δ out of m = 1000 genes tested. A block diagonal matrix with block size 100 and correlation ρ was used for the correlation structure. Nominal α is set at 0.05. B = N = 1000 permutations and simulations were used. Average false rejection rates (among genes with zero effect size) range in and are omitted in this table.

For the power analysis, the first $D$ genes in group 1 have a non-zero effect size $\delta$, i.e. $\mu_1=(\delta 1_D, 0_{m-D})$, where $1_a$ and $0_a$ are $a$-dimensional row vectors with all components equal to 1 and 0, respectively. Effect size as well as correlation vary: $\delta=1$ or 1.5; $\rho=0$, 0.4 or 0.8. Three different rejection rates were assessed: (1) global power (i.e. the probability of rejecting at least one null hypothesis); (2) false rejection rate (FRR) (i.e. the probability of declaring the genes with a null effect as predictive); and (3) true rejection rate (TRR) (i.e. the probability of declaring the predictive genes as predictive). The distinction is important because high global power does not imply a high rate of rejecting individual (true or false) hypotheses, as Table 1(B) makes clear. For different concepts of power in the multiple testing context, see Dudoit et al. (2003, p. 74). The FRRs are omitted from the table, being similarly very low (maximum 0.15%) for all entries.

All three procedures show that the TRR and global power increase in $n$, $\delta$ or $D$. Interestingly, $\rho$ is associated inversely with global power but positively with TRR for both SSP and SDP. For BON, however, the TRR is virtually constant in $\rho$. SSP and SDP exhibit almost the same performance, although SDP has slightly higher (by 0.5% at most) TRR than SSP, particularly with $D=50$ and $n=50$. SSP and SDP show identical global power (and FWER under the composite null) in all cases. This is obvious because global power is governed by the smallest adjusted p-value, $\min_{j=1,\ldots,m}\tilde p_j$, which is common to the two procedures. We conclude that Algorithms 1 (SSP) and 2 (SDP) behave very similarly in situations typically arising in microarray experiments, where the number of genes is very large but the proportion of genes differentially expressed is small.

To examine possible differences between the two procedures, we simulated a typical multiple testing situation with a small number of tests and report our findings in Table 2. We set $n_1=n_2=10$ and $m=5$, among which $D=0,\ldots,5$ test hypotheses have effect size $\delta=1$. Raw data are generated from a multivariate Gaussian distribution with a compound symmetry (CS) structure and mild correlation coefficient ($\rho=0.3$). For each D, B = permutations were conducted within each simulation and this process was repeated N = times. As $D$ increases, the TRR and FRR are relatively constant in SSP but sharply increase in SDP. Both TRR and FRR are higher in SDP and the difference becomes more pronounced as $D$ increases.

Table 2. Average rejection rate and global power in a classical setting

Columns: D; Procedure (SDP, SSP); Average rejection rate (TRR, FRR); Global power

SDP = step-down procedure and SSP = single-step procedure. TRR and FRR denote true rejection rate (among genes that are differentially expressed) and false rejection rate (among genes that are not differentially expressed), respectively. D is the number of genes with non-zero effect size δ. m = 5 genes and B = N = permutations and simulations were used. Compound symmetry with the correlation coefficient of 0.3 and a total sample size n of 20 (n1 = n2 = 10) were employed.

3. SAMPLE SIZE CALCULATION

In this section, we derive a sample size calculation method using the single-step procedure. The calculated sample size also applies to the step-down procedure since the two procedures have the same global power.
Our discussion focuses on one-sided testing, but the two-sided case can be derived similarly. Recall that the multiple testing procedures discussed in this paper do not require a large sample assumption. However, we derive our sample size formula based on the large sample approximation and then show through simulations that the formula also works well with moderate sample sizes.

3.1 Algorithms for sample size calculation

We wish to determine sample size for a designated global power $1-\beta$. Suppose that the gene expression data $\{(X_{ki1},\ldots,X_{kim}),\ i=1,\ldots,n_k,\ k=1,2\}$ are random samples from an unknown distribution with $E(X_{kij})=\mu_{kj}$, $\mathrm{var}(X_{kij})=\sigma_j^2$ and $\mathrm{corr}(X_{kij},X_{kij'})=\rho_{jj'}$. Let $R=(\rho_{jj'})_{j,j'=1,\ldots,m}$ be the $m\times m$ correlation matrix. Under $H_a$, we specify the effect size as $\delta_j=(\mu_{1j}-\mu_{2j})/\sigma_j$. In the design stage of a microarray study, we usually project the number of predictive genes $D$ and set an equal effect size among them, i.e.

$$\delta_j=\begin{cases}\delta & \text{for } j=1,\ldots,D\\ 0 & \text{for } j=D+1,\ldots,m.\end{cases}\qquad(2)$$

Appendix 2A shows that, for large $n_1$ and $n_2$, $(T_1,\ldots,T_m)$ has approximately the same distribution as $(e_1,\ldots,e_m)\sim N(0,R)$ under $H_0$ and $(e_j+\delta_j\sqrt{npq},\ j=1,\ldots,m)$ under $H_a$, where $p=n_1/n$ and $q=1-p$. Hence, at FWER $=\alpha$, the common critical value $c_\alpha$ is given as the upper $\alpha$ quantile of $\max_{j=1,\ldots,m}e_j$ from (1). Similarly, the global power as a function of $n$ is

$$h_a(n)=P\Big\{\max_{j=1,\ldots,m}(e_j+\delta_j\sqrt{npq})>c_\alpha\Big\}.$$

Thus, given FWER $=\alpha$, the sample size $n$ to detect the specified effect sizes $(\delta_1,\ldots,\delta_m)$ with a global power $1-\beta$ will be calculated as the solution to $h_a(n)=1-\beta$.

Analytic calculation of $c_\alpha$ and $h_a(n)$ will be feasible only when the distributions of $\max_j e_j$ and $\max_j(e_j+\delta_j\sqrt{npq})$ are available in simple forms. With a large $m$, however, it is almost impossible to derive the distributions. We avoid the difficulty by using simulation. Our simulation method is to approximate $c_\alpha$ and $h_a(\cdot)$ by generating random vectors $(e_1,\ldots,e_m)$ from $N(0,R)$. For easy generation of the random numbers, we have to assume a simple, but realistic, correlation structure for the gene expression data.

Recall that $R$ is the correlation matrix among the gene expression data $(X_{ki1},\ldots,X_{kim})$. A reasonable correlation structure would be block compound symmetry (BCS) or CS (i.e. with only 1 block). Suppose that $m$ genes are partitioned into $L$ blocks, and $B_l$ denotes the set of genes belonging to block $l$ $(l=1,\ldots,L)$. We assume that $\rho_{jj'}=\rho$ if $j,j'\in B_l$ for some $l$, and $\rho_{jj'}=0$ otherwise. Under the BCS structure, we can generate $(e_1,\ldots,e_m)$ as a function of i.i.d. standard normal random variates $u_1,\ldots,u_m,b_1,\ldots,b_L$:

$$e_j=u_j\sqrt{1-\rho}+b_l\sqrt{\rho}\quad\text{for } j\in B_l.\qquad(3)$$

Finally, the entire procedure can be summarized as follows:

(a) Specify FWER ($\alpha$), global power ($1-\beta$), effect sizes $(\delta_1,\ldots,\delta_m)$ and correlation structure ($R$).
(b) Generate K (say, ) i.i.d. random vectors $\{(e_1^{(k)},\ldots,e_m^{(k)}),\ k=1,\ldots,K\}$ from $N(0,R)$. Let $\bar e_k=\max_{j=1,\ldots,m}e_j^{(k)}$.
(c) Approximate $c_\alpha$ by $\bar e_{[(1-\alpha)K+1]}$, the $[(1-\alpha)K+1]$th order statistic of $\bar e_1,\ldots,\bar e_K$.
(d) Calculate $n$ by solving $\hat h_a(n)=1-\beta$ by the bisection method (Press et al., 1996), where $\hat h_a(n)=K^{-1}\sum_{k=1}^{K}I\{\max_{j=1,\ldots,m}(e_j^{(k)}+\delta_j\sqrt{npq})>c_\alpha\}$.

Mathematically put, step (d) is equivalent to finding $n^*=\min\{n:\hat h_a(n)\ge 1-\beta\}$.

In Appendix 2A, the asymptotic distribution of $(T_1,\ldots,T_m)$ is derived without resort to the use of permutations in testing. In this sense, the above algorithm using (3) will be called a naive method. Appendix 2B shows that the permutation procedure alters the correlation structure among the test statistics under $H_a$. Suppose that there are $m_1$ genes in block 1, among which the first $D$ are predictive. Then, under (2) and BCS, we have

$$\mathrm{corr}(T_j,T_{j'})\approx\begin{cases}(\rho+pq\delta^2)/(1+pq\delta^2)\equiv\rho_1 & \text{if } 1\le j<j'\le D\\ \rho/\sqrt{1+pq\delta^2}\equiv\rho_2 & \text{if } 1\le j\le D<j'\le m_1\\ \rho & \text{if } D<j<j'\le m_1 \text{ or } j,j'\in B_l \text{ for } l\ge 2\end{cases}\qquad(4)$$

where the approximation is with respect to large $n$. Let $\tilde R$ denote the correlation matrix with these correlation coefficients. Note that $\tilde R=R$ under $H_0:\delta=0$, so that calculation of $c_\alpha$ is the same as in the naive method. However, $h_a(n)$ should be modified to

$$\tilde h_a(n)=P\Big\{\max_{j=1,\ldots,m}(\tilde e_j+\delta_j\sqrt{npq})>c_\alpha\Big\}$$

where random samples of $(\tilde e_1,\ldots,\tilde e_m)$ can be generated using

$$\tilde e_j=\begin{cases}u_j\sqrt{1-\rho_1}+b_1\sqrt{\rho_2}+b_{-1}\sqrt{\rho_1-\rho_2} & \text{if } 1\le j\le D\\ u_j\sqrt{1-\rho}+b_1\sqrt{\rho_2}+b_0\sqrt{\rho-\rho_2} & \text{if } D<j\le m_1\\ u_j\sqrt{1-\rho}+b_l\sqrt{\rho} & \text{if } j\in B_l \text{ for } l\ge 2\end{cases}$$

with $u_1,\ldots,u_m,b_{-1},b_0,b_1,\ldots,b_L$ independently from $N(0,1)$. Then $\{(\tilde e_1^{(k)},\ldots,\tilde e_m^{(k)}),\ k=1,\ldots,K\}$ are i.i.d. random vectors from $N(0,\tilde R)$, and

$$\hat{\tilde h}_a(n)=K^{-1}\sum_{k=1}^{K}I\Big\{\max_{j=1,\ldots,m}(\tilde e_j^{(k)}+\delta_j\sqrt{npq})>c_\alpha\Big\}.$$

The sample size calculation solving $\hat{\tilde h}_a(n)=1-\beta$ will be named a modified method.

Note that the methods discussed here differ from a pure simulation method in that they do not require generating the raw data and then calculating test statistics. Thus, the computing time is not of the order of $n\times m$, but of $m$. Furthermore, we can share the random numbers $u_1,\ldots,u_m,b_{-1},b_0,b_1,\ldots,b_L$ in the calculation of $c_\alpha$ and $n$. We do not need to generate a new set of random numbers at each replication of the bisection procedure either.

If the target $n$ is not large, the large sample approximation may not perform well. In our simulation study, we examine how large $n$ needs to be for an adequate approximation. If the target $n$ is so small that the approximation is questionable, then we have to use a pure simulation method by generating raw data.

3.2 A simulation study

We conducted numerical experiments to investigate the accuracy of our sample size estimation.
First, sample size was computed under one-sided FWER = 0.05; 80% global power; $p=q=0.5$; $\delta=0.5$ or 1; $\rho=0.1$, 0.4 or 0.8; $m$, $D$ and block size varied as shown in Table 3. A simulated sample of the calculated size was generated from the same parameter setting as in the sample size calculation. B = N = 1000 samples were generated, and global power was calculated empirically. Sample size increases in $\rho$ (assuming there is no variable reduction technique involved) and decreases in $\delta$. Given $D$, intuitively, a larger number of tests ($m$) demands a larger sample size. The sample sizes obtained by the naive method are underpowered, especially with $\delta=1$ and large $m$. The modified method remarkably improves the accuracy except when δ = 1 and m = . With large $m$ and $\rho$, the large sample convergence will be slow, resulting in a poor approximation, especially with a large effect size, which yields a small $n$. These results show that power and sample size depend not only on the study design but also on the proposed method for analyzing data.
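Steps (a) to (d) of Section 3.1, together with the modified generator based on (4), can be sketched as follows using only the Python standard library. This is a hedged illustration rather than the authors' implementation: the function name `sample_size` and its arguments are hypothetical, the first D predictive genes are assumed to lie in block 1 (1 ≤ D ≤ block size) with a common effect size δ > 0, and a doubling search followed by integer bisection stands in for the bisection routine of Press et al. (1996).

```python
import math
import random

def sample_size(m, block, D, delta, rho, alpha=0.05, power=0.8, p=0.5,
                K=2000, modified=True, seed=0):
    """Smallest total n whose simulated global power reaches `power`.

    Assumes BCS correlation, delta > 0 and 1 <= D <= block.
    """
    rng = random.Random(seed)
    q = 1.0 - p
    a = p * q * delta ** 2
    # Equation (4); the naive method keeps rho1 = rho2 = rho.
    rho1 = (rho + a) / (1 + a) if modified else rho
    rho2 = rho / math.sqrt(1 + a) if modified else rho
    s_u, s_b = math.sqrt(1 - rho), math.sqrt(rho)
    s_u1, s_b2 = math.sqrt(1 - rho1), math.sqrt(rho2)
    s_m1 = math.sqrt(max(0.0, rho1 - rho2))
    s_0 = math.sqrt(max(0.0, rho - rho2))

    null_max, top, rest = [], [], []
    for _ in range(K):                                # step (b)
        b_m1, b_0 = rng.gauss(0, 1), rng.gauss(0, 1)
        w = t = r = -math.inf
        for start in range(0, m, block):
            b = rng.gauss(0, 1)                       # block effect b_l
            for j in range(start, min(start + block, m)):
                u = rng.gauss(0, 1)
                e = u * s_u + b * s_b                 # equation (3), null case
                w = max(w, e)
                if start > 0:                         # blocks 2, ..., L
                    r = max(r, e)
                elif j < D:                           # predictive genes
                    t = max(t, u * s_u1 + b * s_b2 + b_m1 * s_m1)
                else:                                 # rest of block 1
                    r = max(r, u * s_u + b * s_b2 + b_0 * s_0)
        null_max.append(w)
        top.append(t)
        rest.append(r)

    # Step (c): the [(1 - alpha)K + 1]th order statistic (0-based index).
    null_max.sort()
    c = null_max[min(K - 1, math.floor((1 - alpha) * K + 1e-9))]

    def power_at(n):
        # Only the D predictive genes are shifted by delta * sqrt(npq).
        shift = delta * math.sqrt(n * p * q)
        return sum(max(ti + shift, ri) > c for ti, ri in zip(top, rest)) / K

    # Step (d): doubling to bracket the solution, then integer bisection.
    lo, hi = 0, 1
    while power_at(hi) < power:
        lo, hi = hi, 2 * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_at(mid) >= power:
            hi = mid
        else:
            lo = mid
    return hi
```

The same random numbers serve both the critical value and every evaluation of the power function, as the text recommends. Because the effect size enters the naive calculation only through $\delta\sqrt{npq}$, halving δ roughly quadruples the naive sample size for a fixed seed.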

Table 3. Sample size (empirical power) for 80% global power

                      Correlation   δ = 0.5                               δ = 1
m (block size)   D    formula       ρ = 0.1     ρ = 0.4     ρ = 0.8      ρ = 0.1    ρ = 0.4    ρ = 0.8
     (10)        5    naive         119(0.79)   150(0.79)   179(0.82)    30(0.68)   38(0.75)   45(0.74)
                      modified      127(0.79)   152(0.82)   183(0.80)    35(0.79)   40(0.80)   47(0.76)
1000 (100)       10   naive         139(0.76)   168(0.78)   199(0.76)    35(0.65)   42(0.70)   51(0.75)
                      modified      145(0.81)   176(0.80)   204(0.81)    41(0.79)   48(0.81)   53(0.75)
     (100)       10   naive         183(0.70)   233(0.75)   284(0.79)    45(0.53)   59(0.70)   71(0.70)
                      modified      188(0.77)   239(0.79)   288(0.81)    53(0.74)   64(0.77)   74(0.75)
     (1000)    1000   naive         41(0.64)    86(0.82)    152(0.77)    10(0.21)   22(0.68)   39(0.71)
                      modified      57(0.83)    113(0.87)   185(0.82)    20(0.87)   34(0.85)   49(0.85)

m is the total number of genes tested and D is the number of genes with non-zero effect size δ. Naive and modified represent the original and modified correlation matrix before and after permutation, respectively. Sample size n was estimated from K = 5000 simulated samples. B = N = 1000 times of permutation and simulation were used.

4. APPLICATION TO LEUKEMIA DATA

In this section, the leukemia data from Golub et al. (1999) are reanalyzed. There are $n_{ALL}=27$ patients with ALL and $n_{AML}=11$ patients with AML in the training set, and expression patterns in $m=6810$ human genes are explored. Note that, in general, such expression measures are subject to preprocessing steps such as image analysis and normalization, and also to a priori quality control. Supplemental information and the dataset are located on the authors' website ( mit.edu/mpr).

Gene-specific significance was ascertained for the alternative hypotheses $\bar H_{1,j}:\mu_{ALL,j}\ne\mu_{AML,j}$, $\bar H_{2,j}:\mu_{ALL,j}<\mu_{AML,j}$, and $\bar H_{3,j}:\mu_{ALL,j}>\mu_{AML,j}$ by SDP and SSP. We implemented our algorithm as well as PROC MULTTEST in SAS with B = permutations (Westfall et al., 2001). Because the results were essentially identical, we report the results from SAS.
Table 4 lists 41 genes with two-sided adjusted p-values which are smaller than 0.05. Although adjusted p-values by SDP are slightly smaller than those by SSP, the results are extremely similar, confirming the findings from our simulation study. Note that Golub et al. and we identified 1100 and 1579 predictive genes, respectively, without accounting for multiplicity. A Bonferroni adjustment declared 37 significant genes. This is not so surprising because relatively low correlations among genes were observed in these data. We do not show the results for $\bar H_{3,j}$; only four hypotheses are rejected. Note that the two-sided p-value is smaller than twice the smaller one-sided p-value, as theory predicts (see Appendix 1), and that the difference is often not negligible (Shaffer, 2002).

Table 4. Reanalysis of the leukemia data from Golub et al. (1999)

Columns: Gene index (description); adjusted p-values by SDP and SSP for the alternative hypothesis µ_ALL ≠ µ_AML; adjusted p-values by SDP and SSP for the alternative hypothesis µ_ALL < µ_AML

1701 (FAH Fumarylacetoacetate)
(Leukotriene C4 synthase)
(Zyxin)
(LYN V-yes-1 Yamaguchi)
(LEPR Leptin receptor)
(CD33 CD33 antigen)
(Liver mRNA for IGIF)
(PRG1 Proteoglycan 1)
(DF D component of complement)
(GB DEF)
(Induced Myeloid Leukemia Cell)
(IL8 Precursor)
(PEPTIDYL-PROLYL CIS-TRANS Isomerase)
(Phosphotyrosine independent ligand p62)
(CST3 Cystatin C)
(ATP6C Vacuolar H+ ATPase proton channel subunit)
(CTSD Cathepsin D)
(Interleukin 8)
(ITGAX Integrin)
(Epb72 gene exon 1)
(LGALS3 Lectin)
(Thrombospondin-p50)
(LYZ Lysozyme)
(FTL Ferritin)
(Azurocidin)
(Protein MAD3)
(PFC Properdin P factor)
(Lysophospholipase homolog)
(Lysozyme)
(PPGB Protective protein)
(LYZ Lysozyme)
(HOX 2.2)
(Catalase EC )
(FTH1 Ferritin heavy chain)
(CD36 CD36 antigen)
(ADM)
(CDC25A Cell division cycle)
(APLP2 Amyloid beta precursor-like protein)
(TIMP2 Tissue inhibitor of metalloproteinase)
(C-myb)
(NF-IL6-beta protein mRNA)

Adjusted p-values from two-sided hypothesis less than 0.05 are listed in increasing order among total m = 6810 genes investigated. The total number of studied subjects n was 38 (n_ALL = 27 and n_AML = 11). B = times of permutation were used. Note that C-myb gene has p-value of against the hypothesis µ_ALL > µ_AML. Although some gene descriptions are identical, gene accession numbers are different.

Suppose that we want to design a prospective study to identify predictive genes overexpressed in AML based on observed parameter values. We assume $m=6810$, $p=0.3$ ($\approx 11/38$), $D=10$ or 100, $\delta=0.5$ or 1, and BCS with block size 100 or CS with a common correlation coefficient of $\rho=0.1$ or 0.4. We calculated the sample size using the modified formula under each parameter setting for FWER $\alpha=0.05$ and a global power $1-\beta=0.8$ with $K=5000$ replications. For $D=10$ and $\delta=1$, the minimal sample sizes required for BCS/CS are 59/59 and 74/63 for $\rho=0.1$ and 0.4, respectively. If a larger number of genes, say $D=100$, are anticipated to be overexpressed in AML with the same effect size, the respective sample sizes reduce to 34/34 and 49/41 in order to maintain the same power. With $\delta=0.5$, the required sample size becomes nearly 3.5 to 4 times that for $\delta=1$. Note that, with the same $\rho$, BCS tends to require a larger sample size than CS.

One of the referees raised a question about the accuracy of our sample size formula when the gene expression data have distributions other than the multivariate normal distribution. We considered the setting $\alpha=0.05$, $1-\beta=0.8$, $\delta=1$, $D=100$, $\rho=0.1$ with CS structure, which results in the smallest sample size, $n=34$, in the above sample size calculation. Gene expression data were generated from a correlated asymmetric distribution:

$$X_{kj}=\mu_{kj}+(e_{kj}-2)\sqrt{(1-\rho)/4}+(e_{k0}-2)\sqrt{\rho/4}$$

for $1\le j\le m$ and $k=1,2$. Here, $\mu_{1j}=\delta_j$ and $\mu_{2j}=0$, and $e_{k0},e_{k1},\ldots,e_{km}$ are i.i.d. random variables from a $\chi^2$ distribution with two degrees of freedom, which has mean 2 and variance 4. Note that $(X_{k1},\ldots,X_{km})$ have means $(\mu_{k1},\ldots,\mu_{km})$, marginal variances 1, and a compound symmetry correlation structure with $\rho=0.1$, the correlation being induced by the shared term $e_{k0}$. In this case, we obtained an empirical FWER of and an empirical global power of which are close to the nominal α = 0.05 and 1 − β = 0.8, respectively, from a simulation with B = N = .

5. DISCUSSION

Genomic scientists are using DNA microarray as a major high-throughput assay to display DNA or RNA abundance for a large number of genes concurrently; this examination has rekindled interest in statistical issues such as multiple testing, posing methodological and computational challenges. Endeavors to identify the informative genes should take multiplicity into account, but should also have enough power to discover important genes successfully. This problem is different from the classical multiple testing situations in that the number of truly effective genes is often very small compared to the number of candidate genes under investigation. Moreover, only a small sample size is often available, so large sample theory is not justified for standard statistical inference. An underpowered study is no service to the investigator or to science; results significant without assurance will often fail to replicate, and time will be wasted and resources needlessly expended. In this paper, we compared three popular testing procedures and developed a new fast algorithm for determining sample size with a particular emphasis on the microarray context.
We recommend exact permutation-based tests in general, and we also argue for the utility of the single-step procedure, which is often undervalued. Permutation tests do not require specification of the joint distribution or the true correlation structure of the gene expression data. In typical circumstances occurring in microarrays, we verified that the actual advantage of the step-down procedure is minimal and that the improvement is more relevant in classical testing situations dealing with a small number of hypotheses. The single-step method is fast, easy to understand, computes critical values as well as adjusted p-values and, most importantly, offers a simple way to calculate sample size. Generating high-dimensional (say, ) multivariate (normal) data many times (say, 5000) is not a simple undertaking even with a fast computer. To the best of our knowledge, there is no fast numerical algorithm to generate high-dimensional random vectors from a general correlation structure. Under such technical constraints, some simplifying assumptions (e.g. BCS or CS correlation structure, common effect size and normal test statistics) may be more realistic in microarray analysis. However, further simulation under more varied conditions would be extremely useful. Our method for sample size determination is implemented efficiently using a novel and fast algorithm, and it is accurate, as reflected in the empirical evaluation. Although there have been several publications on sample size estimation in the microarray context, none have examined the accuracy of their estimates.
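The single-step permutation adjustment discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: the function names, the unequal-variance form of the t statistic, the plain proportion used for the adjusted p-value, and the toy data (20 subjects, 50 genes, one shifted gene) are all hypothetical choices made for the example.

```python
import numpy as np


def t_stats(x, labels):
    """Two-sample t statistics (unequal-variance form) for each column of x."""
    a, b = x[labels == 1], x[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def single_step_adjusted_p(x, labels, n_perm=1000, seed=0):
    """Single-step max-|t| adjusted p-values from random label permutations."""
    rng = np.random.default_rng(seed)
    observed = np.abs(t_stats(x, labels))
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # Permuting labels keeps each subject's gene vector intact,
        # so the dependence among genes is preserved under the null.
        perm = rng.permutation(labels)
        max_null[b] = np.abs(t_stats(x, perm)).max()
    # Adjusted p_j: fraction of permutations whose maximum |t| reaches |t_j|
    return (max_null[:, None] >= observed[None, :]).mean(axis=0)


# Toy data: 10 + 10 subjects, 50 genes, only gene 0 differentially expressed
rng = np.random.default_rng(1)
labels = np.repeat([1, 0], [10, 10])
x = rng.normal(size=(20, 50))
x[labels == 1, 0] += 5.0
adj_p = single_step_adjusted_p(x, labels, n_perm=1000)
print(adj_p[:3])
```

Because every gene is compared against the same permutation distribution of the maximum statistic, one set of permutations yields both a common critical value and all adjusted p-values, which is what makes the single-step approach cheap.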
Furthermore, all of these publications focused on exploratory and approximate relationships among statistical power, sample size (or the number of replicates) and effect size (often in terms of fold-change), and used the most conservative Bonferroni adjustment without any attempt to incorporate the underlying correlation structure (Witte et al., 2000; Wolfinger et al., 2001; Black and Doerge, 2002; Lee and Whitmore, 2002; Pan et al., 2002; Simon et al., 2002; Cui and Churchill, 2003). By comparing the empirical power resulting from naive and modified methods, we show that an ostensibly similar but incorrect choice of sample size calculation could cause considerable underestimation of the required sample size. We recommend that the assessment of bias in empirical power (compared to nominal power) be a conventional step in the publication of all sample size papers.

Recently, some researchers have proposed new error concepts such as the false discovery rate (FDR) and the positive FDR (the so-called pFDR), which control the expected proportion of Type I errors among the rejected hypotheses (Benjamini and Hochberg, 1995; Storey, 2002). Controlling these quantities generally relaxes the multiple testing criterion compared to controlling the FWER, and increases the number of genes declared significant. In particular, the pFDR is motivated by a Bayesian perspective and inherits the single-step idea in constructing q-values, which are the counterpart of the adjusted p-values in this case (Ge et al., 2003). It would be useful to compare sample sizes for FDR, pFDR and FWER. FWER is important as a benchmark because the reexamination of Golub et al.'s data tells us that classical FWER control (along with global power) may not be as exceedingly conservative as many researchers thought, and it carries clear conceptual and practical interpretations.

Appendices are available online at

ACKNOWLEDGEMENTS

The authors are grateful to the reviewers for their careful and speedy reviews of this paper. Their comments greatly improved this paper without a doubt.

REFERENCES

ALIZADEH, A. A. AND STAUDT, L. M. (2000). Genomic-scale gene expression profiling of normal and malignant immune cells. Current Opinions in Immunology 12,

BENJAMINI, Y. AND HOCHBERG, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B 57,

BLACK, M. A. AND DOERGE, R. W. (2002). Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics 18,

CUI, X. AND CHURCHILL, G. A. (2003). How many mice and how many arrays? Replication in mouse cDNA microarray experiments. In Johnson, K. F. and Lin, S. M. (eds), Methods of Microarray Data Analysis II, Norwell, MA: Kluwer Academic Publishers, pp

DUDOIT, S., SHAFFER, J. P. AND BOLDRICK, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statistical Science 18,

DUDOIT, S., YANG, Y. H., CALLOW, M. J. AND SPEED, T. P. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12,

GE, Y., DUDOIT, S. AND SPEED, T. P. (2003). Resampling-based multiple testing for microarray data analysis. TEST 12,

GOLUB, T. R., SLONIM, D. K., TAMAYO, P., HUARD, C., GAASENBEEK, M., MESIROV, J. P., COLLER, H., LOH, M. L., DOWNING, J. R., CALIGIURI, M. A., BLOOMFIELD, C. D. AND LANDER, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286,

HOCHBERG, Y. (1998). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75,

HOLM, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6,

LEE, M. L. T. AND WHITMORE, G. A. (2002). Power and sample size for DNA microarray studies. Statistics in Medicine 21,

MUTTER, G. L., BAAK, J. P. A., FITZGERALD, J. T., GRAY, R., NEUBERG, D., KUST, G. A., GENTLEMAN, R., GALLANS, S. R., WEI, L. J. AND WILCOX, M. (2001). Global expression changes of constitutive and hormonally regulated genes during endometrial neoplastic transformation. Gynecologic Oncology 83,

PAN, W. (2002). A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18,

PAN, W., LIN, J. AND LE, C. T. (2002). How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biology 3,

PRESS, W. H., TEUKOLSKY, S. A., VETTERLING, W. T. AND FLANNERY, B. P. (1996). Numerical Recipes in Fortran 90. New York: Cambridge University Press.

SANDER, C. (2000). Genomic medicine and the future of health care. Science 287,

SHAFFER, J. P. (2002). Multiplicity, directional (Type III) errors, and the null hypothesis. Psychological Methods 7,

SIMON, R., RADMACHER, M. D. AND DOBBIN, K. (2002). Design of studies with DNA microarrays. Genetic Epidemiology 23,

STOREY, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society B 64,

THOMAS, J. G., OLSON, J. M., TAPSCOTT, S. J. AND ZHAO, L. P. (2001). An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Research 11,

TROENDLE, J. F., KORN, E. L. AND MCSHANE, L. M. (2004). An example of slow convergence of the bootstrap in high dimensions. American Statistician 58,

WEST, M., BLANCHETTE, C., DRESSMAN, H., HUANG, E., ISHIDA, S., SPRANG, R., ZUZAN, H., OLSON, J., MARKS, J. AND NEVINS, J. (2001). Predicting the clinical status of human breast cancer by using gene expression profiles. Proceedings of the National Academy of Sciences USA 98,

WESTFALL, P. H. AND YOUNG, S. S. (1989). P-value adjustments for multiple tests in multivariate binomial models. Journal of the American Statistical Association 84,

WESTFALL, P. H. AND YOUNG, S. S. (1993). Resampling-based Multiple Testing: Examples and Methods for P-value Adjustment. New York: Wiley.

WESTFALL, P. H. AND WOLFINGER, R. D. (1997). Multiple tests with discrete distributions. American Statistician 51, 3-8.

WESTFALL, P. H., ZAYKIN, D. V. AND YOUNG, S. S. (2001). Multiple tests for genetic effects in association studies: methods in molecular biology. In Looney, S. (ed.), Biostatistical Methods, Totowa, NJ: Humana Press, pp

WITTE, J. S., ELSTON, R. C. AND CARDON, L. R. (2000). On the relative sample size required for multiple comparisons. Statistics in Medicine 19,

WOLFINGER, R. D., GIBSON, G., WOLFINGER, E. D., BENNETT, L., HAMADEH, H., BUSHEL, P., AFSHARI, C. AND PAULES, R. S. (2001). Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology 8,

[Received 27 April 2004; revised 4 August 2004; accepted for publication 23 September 2004]


More information

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

More information

TOWARD BIG DATA ANALYSIS WORKSHOP

TOWARD BIG DATA ANALYSIS WORKSHOP TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics References Some good references for the topics in this course are 1. Higgins, James (2004), Introduction to Nonparametric Statistics 2. Hollander and Wolfe, (1999), Nonparametric

More information

Lasso on Categorical Data

Lasso on Categorical Data Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.

More information

Uncertainty quantification for the family-wise error rate in multivariate copula models

Uncertainty quantification for the family-wise error rate in multivariate copula models Uncertainty quantification for the family-wise error rate in multivariate copula models Thorsten Dickhaus (joint work with Taras Bodnar, Jakob Gierl and Jens Stange) University of Bremen Institute for

More information

1 Review of Newton Polynomials

1 Review of Newton Polynomials cs: introduction to numerical analysis 0/0/0 Lecture 8: Polynomial Interpolation: Using Newton Polynomials and Error Analysis Instructor: Professor Amos Ron Scribes: Giordano Fusco, Mark Cowlishaw, Nathanael

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

T test as a parametric statistic

T test as a parametric statistic KJA Statistical Round pissn 2005-619 eissn 2005-7563 T test as a parametric statistic Korean Journal of Anesthesiology Department of Anesthesia and Pain Medicine, Pusan National University School of Medicine,

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Non-Inferiority Tests for Two Means using Differences

Non-Inferiority Tests for Two Means using Differences Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Topics in Assurance Related Technologies Table of Contents Introduction Some Statistical Background Fitting a Normal Using the Anderson Darling GoF Test Fitting a Weibull Using the Anderson

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

Inference for two Population Means

Inference for two Population Means Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example

More information

Cross-Validation. Synonyms Rotation estimation

Cross-Validation. Synonyms Rotation estimation Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table ANALYSIS OF DISCRT VARIABLS / 5 CHAPTR FIV ANALYSIS OF DISCRT VARIABLS Discrete variables are those which can only assume certain fixed values. xamples include outcome variables with results such as live

More information