Statistical power when testing for genetic differentiation


Molecular Ecology (2001) 10

Statistical power when testing for genetic differentiation

N. RYMAN* and P. E. JORDE

*Division of Population Genetics, Stockholm University, S Stockholm, Sweden; Division of Zoology, Department of Biology, University of Oslo, PO Box 1050 Blindern, N-0316 Oslo, Norway

Blackwell Science, Ltd

Abstract

A variety of statistical procedures are commonly employed when testing for genetic differentiation. In a typical situation two or more samples of individuals have been genotyped at several gene loci by molecular or biochemical means, and in a first step a statistical test for allele frequency homogeneity is performed at each locus separately, using, e.g. the contingency chi-square test, Fisher's exact test, or some modification thereof. In a second step the results from the separate tests are combined for evaluation of the joint null hypothesis that there is no allele frequency difference at any locus, corresponding to the important case where the samples would be regarded as drawn from the same statistical and, hence, biological population. Presently, there are two conceptually different strategies in use for testing the joint null hypothesis of no difference at any locus. One approach is based on the summation of chi-square statistics over loci. Another method is employed by investigators applying the Bonferroni technique (adjusting the P-value required for rejection to account for the elevated alpha errors when performing multiple tests simultaneously) to test if the heterogeneity observed at any particular locus can be regarded as significant when considered separately. Under this approach the joint null hypothesis is rejected if one or more of the component single locus tests is considered significant under the Bonferroni criterion. We used computer simulations to evaluate the statistical power and realized alpha errors of these strategies when evaluating the joint hypothesis after scoring multiple loci.
We find that the extended Bonferroni approach generally is associated with low statistical power and should not be applied in the current setting. Further, and contrary to what might be expected, we find that exact tests typically behave poorly when combined in existing procedures for joint hypothesis testing. Thus, while exact tests are generally to be preferred over approximate ones when testing each particular locus, approximate tests such as the traditional chi-square seem preferable when addressing the joint hypothesis.

Keywords: allele frequency, Bonferroni, chi-square, contingency table, Fisher's exact test, statistical analysis

Received 8 January 2001; revision received 9 April 2001; accepted 20 April 2001

Correspondence: Nils Ryman. Fax: ; Nils.Ryman@popgen.su.se

Introduction

An increasingly common question in conservation and evolutionary biology is whether a set of samples is likely to represent the same gene pool. Several statistical techniques are being applied when addressing this type of problem, but there has been little discussion about their relative merits for detection of genetic heterogeneity. This lack is particularly obvious for methods used to combine the information from multiple loci.

In a typical situation an investigator has collected tissue samples from two or more groups of individuals that are separated in space or time. Application of some biochemical or molecular techniques provides genotypes of the sampled individuals at one or more nuclear loci or at the mitochondrial genome, and each sample is described in terms of its size and allele (or haplotype) frequencies. The specific scientific questions may vary from study to study, but a very basic one, which frequently determines how to proceed with the analysis, is the following: Are the allele frequency differences observed among samples large enough to suggest that all the samples are not drawn from the same population (gene pool)?
© 2001 Blackwell Science Ltd

It appears that in most

cases the underlying evolutionary model is one of selective neutrality, isolation, and genetic drift, which implies that all polymorphic loci examined are potentially informative with respect to the question of overall genetic heterogeneity. The general statistical approach most frequently used, and the one dealt with in this paper, is first to conduct a contingency test for allele frequency homogeneity for each locus separately, and in a second step to evaluate the simultaneous, or joint, information from all loci examined.

The test procedure applied to each individual locus (contingency table) implies assessment of the probability of obtaining, if the null hypothesis (H0) of equal allele frequencies is true, an outcome that is as likely as, or less likely than, the observed one. This probability (P-value) can either be calculated exactly (Fisher 1950; Mehta & Patel 1983), iterated or simulated to a desired degree of precision (Roff & Bentzen 1989; Raymond & Rousset 1995a,b), or approximated by means of some test statistic expected to follow a chi-square distribution (Fisher 1950; Everitt 1977; Sokal & Rohlf 1981).

In the second step the results from the separate tests are combined for evaluation of the joint null hypothesis (H0,J) that there is no allele frequency difference at any locus (i.e. H0,J is true when all the separate H0s are true). Presently there appear to be two conceptually different strategies in use for testing H0,J. One technique is based on the summation of chi-square statistics and utilizes the fact that the sum of a series of chi-square distributed variables also follows a chi-square distribution (e.g. Everitt 1977; Sokal & Rohlf 1981). Another approach is used by investigators applying the Bonferroni technique to test if the heterogeneity observed at any particular locus can be regarded as significant when considered separately.
The general idea behind the Bonferroni method is to account for the increased probability of obtaining, by pure chance when the null hypothesis is true, a significant result at one or more loci when performing multiple tests (e.g. Rice 1989). Under this approach the joint null hypothesis (H0,J) is rejected if one or more of the component contingency tests is considered significant, i.e. at least one Bonferroni significance is required for rejecting the joint null hypothesis of equal allele frequencies at all loci. This strategy for testing H0,J is conceptually adequate in the present context, although it has been noted that it may be quite conservative, resulting in too few rejections (Legendre & Legendre 1998). We are aware of no study, however, that evaluates the two approaches with respect to their ability to detect true heterogeneity.

This paper compares the power of the above statistical methods (summation of chi-square vs. application of the Bonferroni method to determine if any one of the separate locus tests can be considered significant) for detecting genetic heterogeneity when multiple loci have been scored. The results show that the efficiency may differ dramatically between the two approaches and, contrary to what might be expected, this difference may become enhanced as the number of loci increases.

Exemplifying the problem

As an example of the statistical test options, Table 1 presents sample allele frequency data for 12 codominant and di-allelic allozyme loci from two consecutive year-classes of brown trout (Salmo trutta) collected from Lake Blanktjärnen in central Sweden (see Jorde & Ryman 1996 for details). Are the observed allele frequency differences large enough to suggest that there are true genetic differences between year-classes? The allele frequency difference at each individual locus was tested using Fisher's exact test, the conventional chi-square 2 × 2 contingency statistic (X², degrees of freedom, d.f.
= 1), and the chi-square statistic with Yates' continuity correction (X_C², d.f. = 1; chi-square test statistics are denoted by X² to distinguish them from values of a theoretical χ² distribution). Both chi-square approximations provide P-values that are reasonably similar to the exact ones, but those from the conventional X² are generally smaller (less conservative) whereas X_C² tends to produce larger ones. All methods yield significant results (P < 0.05) for the same two loci (saat-4 and bgala-2).

When testing the joint null hypothesis (H0,J) that there are no allele frequency differences at any locus, we note that the sum of a set of χ² distributed variables also follows a χ² distribution with d.f. equal to the sum of d.f. of the contributing variables. This summation is straightforward for X² and X_C², which are both expected to follow approximately a χ² distribution under the null hypothesis. With respect to the P-values obtained in the exact tests, Fisher (1950) has shown that when the null hypothesis is true the quantity −2ln(P) is expected to follow asymptotically a χ² distribution with d.f. = 2. Thus, summing the negative of twice the natural logarithm of the 12 P-values results in an X² statistic that is to be evaluated against a χ² distribution with d.f. = 24 (Table 1; this technique of summing −2ln(P) is sometimes referred to as Fisher's method, but it should not be confused with Fisher's exact test). Under each of the three approaches the summed chi-square statistic is significant (P < 0.05) and results in rejection of the joint null hypothesis (H0,J), although the level of significance differs among them (X² and X_C² yielding the smallest and largest summation P-value, respectively).

In contrast to the above summation approaches, application of the Bonferroni method results in a different conclusion.
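The two joint tests described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the twelve P-values are hypothetical (they are not the values of Table 1), and the helper names `chi2_sf`, `fisher_combined`, and `bonferroni_rejects` are introduced here for the example. The upper-tail χ² probability is computed from the closed forms for d.f. = 1 and d.f. = 2 plus the standard recurrence for the incomplete gamma function, so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer d.f. > x)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # closed form for d.f. = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # closed form for d.f. = 1
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def fisher_combined(p_values):
    """Sum -2ln(P) over loci and evaluate against chi-square with 2k d.f."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf(stat, 2 * len(p_values))

def bonferroni_rejects(p_values, alpha=0.05):
    """Reject H0,J only if at least one single-locus P is below alpha/k."""
    return min(p_values) < alpha / len(p_values)

# Hypothetical single-locus P-values for 12 loci (NOT the data of Table 1).
p_values = [0.02, 0.03] + [0.30] * 10
stat, joint_p = fisher_combined(p_values)
print(f"sum of -2ln(P) = {stat:.2f}, d.f. = {2 * len(p_values)}, joint P = {joint_p:.3f}")
print("Bonferroni rejects H0,J:", bonferroni_rejects(p_values))
```

With these illustrative values the summed statistic exceeds the 5% critical value for d.f. = 24, whereas no single P-value falls below 0.05/12, so the two strategies disagree in the same way as in the brown trout example.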
The Bonferroni logic implies that no individual test in a series of k tests is to be judged significant unless the P-value is smaller than α/k, where α is the preassigned significance level for rejecting the null hypothesis (Rice 1989 and references therein; see below). In the present case with k = 12 (12 loci) and α = 0.05 a P-value less than 0.05/12 ≈ 0.0042

is thus required to be regarded as significant. No individual (single locus) P-value in Table 1 is that small, and consequently no locus can be considered as displaying significant heterogeneity when applying the Bonferroni technique. The joint null hypothesis (H0,J) is therefore also accepted, regardless of whether the basic contingency P-values were obtained by means of chi-square approximation or exact calculation.

Thus, in this example the summation method indicates that the joint null hypothesis should be rejected for each of the test statistics applied, whereas application of the Bonferroni technique consistently suggests acceptance when using the same set of individual P-values. The difference is crucial, but on the basis of the data available an investigator cannot determine what decision is most appropriate. It appears that many scientific journals would accept either approach without requiring additional data analysis.

In the remainder of this paper we present results of computer simulations aimed at evaluating the probability of detecting a true genetic difference (statistical power) and the probability of falsely rejecting a true null hypothesis (α or type I error) when addressing the joint H0,J (Bonferroni vs. summation). We recognize that the present problem can also be addressed by means of exact binomial or multinomial calculations, but we chose the simulation approach for practical reasons.

Table 1 Allele frequencies (the 100 allele) and 2 × 2 contingency test statistics for 12 di-allelic allozyme loci in two cohorts (1992 and 1993) of brown trout from Lake Blanktjärnen, Sweden. The number of fish is 43 and 27 in 1992 and 1993, respectively.
Columns: Locus; Allele frequency (100 allele) by cohort; Fisher's exact test (P, −2ln(P)); Chi-square test (X², P); Chi-square test with Yates' correction (X_C², P).
Rows: saat, DIA, bgala, bglua, G3PDH, sidhp, LDH, aman, smdh, ME, MPI, PEPLT; Sum; Sum of d.f.
Simulations

Random sampling of 2n genes (n diploid individuals) from populations with known allele frequencies was simulated by means of pseudo random number generation. At each locus the hypothesis of equal allele frequencies was tested by various r × c contingency tests (r = number of samples and c = number of alleles) using both Fisher's exact method and chi-square tests with d.f. = (r − 1)(c − 1). The number of replicates (runs) of each simulation was typically in the interval , and the frequency of replicates resulting in rejection and acceptance of a null hypothesis provided estimates of the statistical power (probability of rejecting a false hypothesis) or the α (type I) error (probability of rejecting a true hypothesis), respectively. The intended α level was consistently kept at 0.05, rejecting the null hypothesis for P < 0.05. A replicate was discarded if the random sampling of genes resulted in fewer than c alleles being observed in the combined material from the r samples, and a new set of r samples was drawn in those cases.

Population allele frequencies were generally chosen to result in a realistic proportion of simulated contingency tables where at least one cell had a small expected value (expectancy less than 5 or 1) at low or moderate sample sizes. This was done in order not to provide an overly optimistic view of the fit of the various chi-square approximations to the expected χ² distribution.

When simulating situations where multiple loci (k > 1) are scored, all loci within a population were assumed to segregate at identical allele frequencies (for example, for di-allelic loci the allele under consideration may occur at the frequency 0.10 at all loci in population 1 and at 0.15 at all loci in population 2). Of course, in the real world

it is not very likely to encounter a situation where all the loci examined in a population segregate at exactly the same frequency (although the true allele frequency distribution is unknown in most cases). This model is appropriate for the present purpose of examining differences between test procedures, however. When dealing with relative frequencies both the power and the α error are dependent on the population frequencies, and varying those frequencies might obscure the comparison of test procedures.

A regular chi-square test statistic (X²) was calculated for each simulated contingency table. In addition, Yates' continuity correction was applied to provide a corrected chi-square (X_C²) for all 2 × 2 tables, and for larger tables the G-statistic with and without Williams' correction was used (Sokal & Rohlf 1981). Fisher's exact test for 2 × 2 tables was performed as described by Sokal & Rohlf (1981), and the algorithm of Mehta & Patel (1983) was applied for larger tables. Computational results from the statistical routines developed for the simulation programs were checked with outputs from software such as biom (Rohlf 1987), statistica (StatSoft Inc. 1998), StatXact-Turbo (CYTEL Software Corporation 1992), and genepop (Raymond & Rousset 1995b), and the simulated power estimates were checked against exact calculations and those obtained using the standard normal approximation for power assessment (e.g. Zar 1984, p. 398).

When combining the information from multiple loci for evaluation of the joint null hypothesis (H0,J) of no difference at any locus, the approaches of Bonferroni and summation of chi-square statistics were applied as exemplified in the preceding section. A considerable number of simulations have been conducted during the course of this study using different combinations of allele frequencies, sample sizes, number of alleles, loci, and populations.
In order not to burden the presentation unnecessarily, however, we have tried to choose combinations that are as simple as possible for illustration of general trends and principles. Thus, most of the paper focuses on situations like that in Table 1 where the basic statistical tests refer to 2 × 2 tables (2 samples and 2 alleles per locus).

2 × 2 contingency tables

Four examples of typical simulation results for 2 × 2 contingency tests are depicted in Fig. 1(a). In the most basic case we consider a single locus (k = 1) with two alleles (A and A′), and a random sample of 20 diploid individuals (40 genes) is drawn from each of two populations (1 and 2) where the A allele occurs in the true frequency Q1 and Q2, respectively. In the case of multiple loci (k > 1) all of them have the same allele frequency (Q1) in population 1, and within population 2 all frequencies are Q2. Every plate in Fig. 1(a) represents a specific combination of Q1 and Q2, and for each test procedure the proportion of replicates resulting in rejection of the joint null hypothesis (y-axis) is indicated for different numbers of loci examined (x-axis; k = 1, 2–5, 10, 20–50).

As in the introductory example (Table 1) the joint null hypothesis (H0,J) was rejected for P < 0.05 (summation of chi-squares), and when using the Bonferroni method rejection of H0,J required at least one single locus P-value smaller than 0.05/k. For k = 1 the summation is over a single value only, and the Bonferroni rejection criterion coincides with that of the basic test. When Q1 is different from Q2 (H0 is false; upper plates) the proportion of H0,J rejections estimates the power of the test, and when Q1 = Q2 (H0 is true; lower plates) it estimates the realized α error. In a first step we focus on the situations where H0 is false (Q1 ≠ Q2; Fig. 1a, upper plates).
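The simulation design just described can be sketched as follows. This is a minimal illustration and not the authors' program: it covers only the regular X² statistic for 2 × 2 tables, redraws a single locus (rather than the whole replicate) when fewer than two alleles are observed, and uses hypothetical function names (`chi2_sf`, `x2_stat`, `simulate`) and an arbitrary seed.

```python
import math
import random

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer d.f. > x)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def x2_stat(a, b, n):
    """Pearson X^2 for two samples of n genes holding a and b copies of allele A."""
    total = a + b
    if total == 0 or total == 2 * n:
        return None  # only one allele observed in the combined material
    return 2.0 * n * (a - b) ** 2 / (total * (2 * n - total))

def simulate(q1, q2, n_genes, k, reps, alpha=0.05, seed=1):
    """Proportion of replicates rejecting H0,J by summation and by Bonferroni."""
    rng = random.Random(seed)
    rej_sum = rej_bonf = 0
    for _ in range(reps):
        stats = []
        while len(stats) < k:
            a = sum(rng.random() < q1 for _ in range(n_genes))
            b = sum(rng.random() < q2 for _ in range(n_genes))
            x2 = x2_stat(a, b, n_genes)
            if x2 is not None:   # assumption: redraw this locus only
                stats.append(x2)
        rej_sum += chi2_sf(sum(stats), k) < alpha                    # summed X^2, d.f. = k
        rej_bonf += any(chi2_sf(s, 1) < alpha / k for s in stats)    # any P < alpha/k
    return rej_sum / reps, rej_bonf / reps

if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, simulate(0.10, 0.15, n_genes=40, k=k, reps=2000))
```

Calling `simulate` with Q1 = Q2 gives estimates of the realized α error of the two joint tests instead of their power.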
Considering a single locus only (k = 1) the power estimates for the Q1/Q2 combination of 0.10/0.15 are 0.107, 0.041, and when using X², X_C², and Fisher's exact test, respectively, and for the 0.50/0.60 combination the corresponding estimates are 0.150, 0.103, and The difference between these simulated power estimates and the values expected theoretically is noticeable in the third decimal place only, indicating that the simulations provide reasonably accurate results. Also, Fisher's exact and the X_C² tests are both expected to be more conservative than the X² test (e.g. Everitt 1977), and they yield accordingly lower power estimates within each combination of Q1/Q2.

In contrast to the power observed for a single locus, the results for tests involving multiple loci are more surprising. Here, one might expect a reliable test to detect the true divergence between populations more frequently as additional loci are examined. It is only the ΣX² approach, however, that behaves consistently in this way at both combinations of Q1 and Q2. In the case of Q1/Q2 = 0.10/0.15 the probability of rejecting H0 actually tends to decrease when the number of loci included in the test increases for all methods except the ΣX². When Q1/Q2 = 0.50/0.60 all the procedures provide at least some increase of power when more loci are considered, but the ΣX² approach is consistently the most powerful one.

It is an important observation that the remarkably better power obtained through summation of X² is not associated with an unduly large α error when H0,J is true. Rather, this method is the only one that consistently appears to provide an α error that is reasonably close to the intended one of α = 0.05 (Fig. 1a, lower panels).

Simulation results for the considerably larger sample sizes of 500 individuals (1000 genes) are shown in Fig. 1(b). Here, the population allele frequencies are the same as those in Fig.
1(a) when estimating the α error (lower plates), but for the sake of illustration the assessment of power (upper plates) is based on smaller differences

between Q1 and Q2 to account for the generally higher power associated with larger samples. (With sample sizes as large as 500 individuals the power of all methods is close to unity for the combinations of Q1 and Q2 used in Fig. 1a.) At sample sizes of 1000 genes all methods are reasonably successful in keeping the α error close to the intended one of 0.05, except for those representing summation of X_C² and −2ln(P). This latter observation most likely reflects a somewhat slower approach to the limiting χ² distribution for these two test statistics as compared to X² (see below).

When H0,J is false (upper panels, Fig. 1b) summation of X² is the technique that consistently provides the highest probability of rejecting H0,J. For all methods, however, the power increases as the number of loci grows, although the summation approach in all cases appears more powerful regardless of the statistical procedure applied in the basic contingency tests. This is true also when summing X_C² and −2ln(P), in spite of the lower realized α observed for these statistics.

Fig. 1 Proportion of statistical significances when testing the joint null hypothesis (H0,J) of no difference at any locus after simulated drawing of a sample of size n from each of two populations where the true allele frequencies at di-allelic loci are Q1 and Q2, respectively. The number of loci examined is 1–50. A test for allele frequency homogeneity is conducted for each locus separately using X², X_C², and Fisher's exact test, and the resulting P-values are combined into a joint test by the summation and Bonferroni approaches. The intended α is 0.05, and each data point is based on (Fig. 1a) and 5000 (Fig. 1b) replicates, respectively. See text for details. Figure 1(a): n = 20 individuals (40 genes). Figure 1(b): n = 500 individuals (1000 genes).
Reasons for low power

For both the Bonferroni and the summation method the low power is associated with the approximations involved when assessing P-values for the basic 2 × 2 tables. Both methods are expected to work satisfactorily in theory when samples are large and allele frequencies intermediate. In practice, however, the approximate nature of the test statistics and the P-values produced by the primary contingency tests may provide a realized α in the combined test that is far below the intended one, and the reduced α typically results in a small power.

Bonferroni.

As noted above, the objective behind the Bonferroni technique is to avoid excessive numbers of false significances when performing multiple tests through adjusting the P-value at which a particular component test is to be judged significant. The probability of observing false significances increases rapidly as the number of tests grows. When performing k tests of a true null hypothesis at α = 0.05 the expected probability of obtaining one or more significances by pure chance is 1 − (1 − 0.05)^k; for k = 10, for example, this probability exceeds 40%. In order to maintain the intended α level (here 0.05) of a particular test when a series of k tests have been performed, the idea behind the Bonferroni method is to adjust the P-value for rejection such that the probability of observing one or more significances among the k tests remains at α. This goal is met approximately if rejecting any particular H0 at P < α/k (rather than at P < α). The rationale for this criterion is based on the relationship 1 − (1 − α/k)^k ≈ α.

The above arguments form the basis for the extended application of the Bonferroni correction in the context of testing the joint null hypothesis H0,J that all the component H0s are true when multiple tests have been performed. That is, if all the component (single locus) H0s are true (i.e. H0,J is true), the probability is α of observing one or more Bonferroni significances (at the α/k level). Thus, H0,J is rejected when obtaining a single locus P-value less than α/k; otherwise it is accepted.
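The two relationships quoted above are easy to check numerically. The sketch below assumes independent tests, as the formula 1 − (1 − α)^k does; the function name `familywise_error` is introduced here only for illustration.

```python
def familywise_error(alpha, k):
    """Probability of one or more significances among k independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 5, 10, 20, 50):
    uncorrected = familywise_error(0.05, k)       # each test judged at alpha
    bonferroni = familywise_error(0.05 / k, k)    # each test judged at alpha/k
    print(f"k = {k:2d}  uncorrected = {uncorrected:.3f}  Bonferroni = {bonferroni:.4f}")
```

For k = 10 the uncorrected probability is about 0.40, while the Bonferroni-adjusted one stays just below 0.05, provided that each single-locus test really attains its nominal α.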

If the Bonferroni technique is to work in practice, however, two basic conditions must be met which are not satisfied in many realistic situations. First, it is necessary that each single locus test can produce a P-value that is smaller than α/k. However, this is not possible for many combinations of sample size (n) and population allele frequencies (Q1/Q2) because of the restricted number of potential outcomes of the sampling process. If basing the evaluation of H0,J on a suite of such contingency tests there is a substantial risk that the Bonferroni method yields a realized α that is considerably smaller than the intended one, and thereby a correspondingly reduced power.

To clarify the point we may consider the extreme example of sampling two individuals (four genes) from each of two populations that are fixed for different alleles. The counts of the two types of alleles will be 4; 0 and 0; 4 for the two samples, respectively, which represents the most extreme outcome possible under the null hypothesis of equal allele frequencies. Fisher's exact test yields a (two-sided) P-value of 0.029, which is significant at the 5% level, and this is the smallest P-value that can be obtained with the present sample sizes. Analysis of, say, five alternately fixed loci would result in five P-values of 0.029, and intuition would justifiably make the investigator suspect that the populations are genetically divergent. A Bonferroni evaluation would dismiss such an interpretation, however, because no P-value is smaller than 0.01 (0.05/5), a P-value which is impossible to obtain with the present sample sizes. Thus, in this situation the Bonferroni correction results in a realized α level of zero, and the power is thereby also reduced to zero.
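The extreme example can be reproduced with a small stand-alone implementation of the two-sided Fisher test for a 2 × 2 table. The function below is a sketch written for this illustration (it sums the hypergeometric probabilities of all tables that are as likely as, or less likely than, the observed one); it is not the routine used in the study.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x copies in cell (1, 1) given the margins
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum over all tables that are as likely as, or less likely than, the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

p = fisher_exact_2x2(4, 0, 0, 4)      # allele counts 4; 0 and 0; 4
print(round(p, 3))                    # 0.029, i.e. 2/70
print(p < 0.05, p < 0.05 / 5)         # significant alone, never under Bonferroni with k = 5
```

Because 2/70 is the smallest value this test can return for two samples of four genes, the criterion P < 0.05/5 can never be met, whatever the data.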
In the present simplified example it is easy to see how the Bonferroni method, as an effect of a discrete number of possible experimental outcomes, may result in a reduction of the realized α far below the intended one, and thereby in a corresponding decrease of power. The phenomenon is a general one, however, and the magnitude of the effect depends on the sample sizes, the number of loci (tests), and the true allele frequency differences (Fig. 1).

The other requirement for the Bonferroni method to work satisfactorily is that the realized α error of each of the basic, single locus, contingency tests is reasonably close to the intended one. In other words, when the component H0 is true the probability of obtaining a P-value of 0.05 or less should be close to 0.05, that of obtaining P ≤ 0.01 should be close to 0.01, etc. It appears that this characteristic of the sampling distribution is frequently taken for granted, in spite of the fact that substantial deviations are quite common.

To exemplify the difference between intended and realized α errors in basic contingency tests we may consider the occurrence of various P-values when drawing two independent samples of 20 individuals (40 genes) from a population with the allele frequency Q = 0.1. For illustration, all the 2 × 2 tables possible when drawing two samples of 40 items were created, P-values based on regular chi-square (X²) and Fisher's test were computed, the exact frequency of occurrence of each table and its associated P-values was derived binomially, and the cumulative frequency of occurrence of P ≤ 0.05 was depicted graphically (Fig. 2, upper plate). As seen in Fig. 2 (upper plate) the realized α may be considerably smaller than the intended one, and particularly so for Fisher's exact test. With Fisher's exact test, for example, values of P ≤ 0.05 only occur in a frequency of (less than half the ideal rate), and the discrepancy is even more pronounced for smaller P-values.
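The enumeration described above can be sketched as follows: every pair of allele counts (a, b) from two samples of 40 genes is assigned its binomial probability, and the probabilities of all tables with P ≤ 0.05 are summed. The function names are introduced for this illustration, and treating tables in which only one allele is observed as non-significant (P = 1) is an assumption of the sketch, not something stated in the text.

```python
from math import comb, erfc, sqrt

def fisher_p(a, b, n):
    """Two-sided Fisher P for two samples of n genes with a and b copies of allele A."""
    total = a + b
    denom = comb(2 * n, total)
    lo, hi = max(0, total - n), min(n, total)
    probs = [comb(n, x) * comb(n, total - x) / denom for x in range(lo, hi + 1)]
    p_obs = comb(n, a) * comb(n, b) / denom
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))

def chi2_p(a, b, n):
    """P-value of the regular X^2 (d.f. = 1) for the same 2 x 2 table."""
    total = a + b
    if total == 0 or total == 2 * n:
        return 1.0  # assumption: monomorphic tables count as non-significant
    x2 = 2.0 * n * (a - b) ** 2 / (total * (2 * n - total))
    return erfc(sqrt(x2 / 2.0))

def realized_alpha(p_func, n, q, alpha=0.05):
    """Exact probability, under H0, that the test returns P <= alpha."""
    binom = [comb(n, x) * q ** x * (1 - q) ** (n - x) for x in range(n + 1)]
    return sum(binom[a] * binom[b]
               for a in range(n + 1) for b in range(n + 1)
               if p_func(a, b, n) <= alpha)

alpha_fisher = realized_alpha(fisher_p, 40, 0.1)
alpha_x2 = realized_alpha(chi2_p, 40, 0.1)
print(f"realized alpha, Fisher: {alpha_fisher:.4f}; regular chi-square: {alpha_x2:.4f}")
```

Passing a smaller `alpha` (for example 0.05/k) shows how the realized rejection rate of a single-locus test behaves at the Bonferroni threshold.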
The corresponding cumulative distributions of P-values when sampling 50 and 500 individuals (100 and 1000 genes) are depicted in the central and lower plates of Fig. 2, respectively. For X² the correspondence between intended and realized α is fairly good at sample sizes of 50 individuals or more. In contrast, at the present population allele frequency samples in the order of hundreds of individuals are required for this to occur with Fisher's exact test.

When the realized α values of the separate (single locus) contingency tests are smaller than intended, the Bonferroni approach to testing the joint H0,J of no difference at any locus may result in an overall realized α that is far below the anticipated one. To exemplify, Table 2 gives exact realized values of α for 1–50 loci when applying Fisher's exact test to two independent samples of n = 10 (or n = 20) diploids from a population with the true allele frequency Q = 0.10. For n = 10 and k = 10 loci (tests), for instance, the Bonferroni method implies that H0,J should be rejected when observing at least one contingency P ≤ 0.005 (0.05/10). With two samples of n1 = n2 = 10 diploids the largest Fisher P-value meeting the criterion P ≤ 0.005 is (due to the restricted number of possible 2 × 2 tables at these sample sizes), and P-values this small or smaller occur at a frequency of (exact cumulative binomial probability). Thus, the realized α of the Bonferroni approach to testing H0,J corresponds to the probability of observing one or more single locus P-values of P ≤ 0.005, which is α = 1 − (1 − )^10 = Clearly, this is dramatically smaller than the intended α of 0.05, and with such a small realized α the chance of detecting anything but very large allele frequency differences is minor.

Chi-square summation.

As noted above, the logic of the summation approach is based on the fact that the sum of two or more χ² distributed variables will also follow a χ² distribution with d.f. equal to the summed d.f. of the component variables.
Under the null hypothesis, the test statistic computed for each particular locus (contingency table) is expected to be asymptotically χ² distributed with d.f. = (r − 1)(c − 1), where r is the number of samples (rows) and c is the number of alleles (columns). The fit may be quite poor, however, particularly for small sample sizes and skewed allele frequencies, and the sum of several such

Fig. 2 Exact cumulative frequency of occurrence of possible P-values when drawing two independent samples of n = 20, 50, or 500 diploids (40, 100, or 1000 genes) from a population where the true allele frequency at a di-allelic locus is 0.1. The null hypothesis of allele frequency homogeneity is tested using regular chi-square (X²) and Fisher's exact test. Only the left-most part (P ≤ 0.05) of the distribution is shown.

Table 2 Exact realized α error when applying the Bonferroni method to a series of k 2 × 2 tables. Each table represents two independent samples of n diploid individuals (2n genes) from a population where the true gene frequency at di-allelic loci is 0.10, and where the H0 of no gene frequency difference has been tested by Fisher's exact method. Intended α = 0.05. Frequency of occurrence represents the cumulative binomial probability of obtaining the realized P-value or a smaller one. See text for details.
Columns: k = no. of tested loci; Intended P-value for rejection; and, separately for n = 10 and n = 20: Realized P-value for rejecting a separate 2 × 2 table; Frequency of occurrence; Realized α.

poorly fitted variables may deviate dramatically from the expected χ² distribution. The mean and variance of a χ² distribution are d.f. and 2 d.f., respectively, and as an example of the fit to χ² Table 3 gives the observed mean and variance of the test statistics obtained when simulating the drawing of two independent samples from a population with an allele frequency of 0.1 (Q1 = Q2 = 0.1; n = 3–500). Both X² and X_C² have d.f. = 1, and if the fit is perfect we expect these statistics to yield means and variances of 1 and 2, respectively. With respect to the P-value from Fisher's exact test (exact P) the quantity −2ln(exact P) should be asymptotically χ² distributed with d.f. = 2 (see above), and with a perfect fit we expect the mean and variance to be 2 and 4, respectively.

With respect to the traditional chi-square statistic (X²), the fit of the observed sampling distribution to the theoretical χ² is quite good, except for a reduced variance and a slightly inflated mean at the smallest sample sizes. In contrast, the approach to the limiting χ² distribution is markedly slower for X_C² and −2ln(exact P). The observed sampling distributions tend to be located to the left of the expected one, and this shift in location produces too few false significances (realized α < 0.05), and thereby a low power. The mean and variance of X² are fairly close to their expected values when n ≥ 50 individuals, but X_C² and −2ln(exact P) both yield markedly smaller means and variances even at n = 500.

The shift of location relative to χ² may not seem alarming when plotting the test statistic sample distribution, but a variable representing the sum of several observations from such a distribution may deviate dramatically from the one expected for the sum. To exemplify, the simulated and expected distributions of the 2 × 2 contingency X_C² (d.f. = 1) for n = 20 diploids and Q1 = Q2 = 0.1 are shown in Fig. 3(a).
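The comparison of observed and expected moments can be imitated with a short simulation. The sketch below is not the authors' program: it covers only X² and X_C² for n = 20 diploids (40 genes) and Q1 = Q2 = 0.1, uses an arbitrary seed and number of replicates, and assumes the variant of Yates' correction in which the corrected deviation is not allowed to fall below zero. It also sums the statistics in blocks of 10 loci and compares the sums with 18.307, the 5% critical value of χ² with d.f. = 10.

```python
import random
from statistics import mean, variance

def x2_pair(a, b, n):
    """Return (X^2, Yates-corrected X^2) for two samples of n genes, or None."""
    total = a + b
    if total == 0 or total == 2 * n:
        return None  # only one allele observed; table is redrawn
    denom = total * (2 * n - total)
    diff = abs(a - b)
    x2 = 2.0 * n * diff ** 2 / denom
    xc2 = 2.0 * n * max(diff - 1, 0) ** 2 / denom  # assumption: correction clipped at zero
    return x2, xc2

def simulate_tables(n_genes, q, reps, seed=1):
    """Draw `reps` pairs of samples from one population with allele frequency q."""
    rng = random.Random(seed)
    out = []
    while len(out) < reps:
        a = sum(rng.random() < q for _ in range(n_genes))
        b = sum(rng.random() < q for _ in range(n_genes))
        pair = x2_pair(a, b, n_genes)
        if pair is not None:
            out.append(pair)
    return out

tables = simulate_tables(n_genes=40, q=0.1, reps=5000)
x2 = [t[0] for t in tables]
xc2 = [t[1] for t in tables]
print(f"X^2  : mean = {mean(x2):.3f}, variance = {variance(x2):.3f} (expected 1 and 2)")
print(f"X_C^2: mean = {mean(xc2):.3f}, variance = {variance(xc2):.3f} (expected 1 and 2)")

CRIT_DF10 = 18.307  # 5% critical value of chi-square with d.f. = 10
sums_x2 = [sum(x2[i:i + 10]) for i in range(0, len(x2), 10)]
sums_xc2 = [sum(xc2[i:i + 10]) for i in range(0, len(xc2), 10)]
frac_x2 = sum(s > CRIT_DF10 for s in sums_x2) / len(sums_x2)
frac_xc2 = sum(s > CRIT_DF10 for s in sums_xc2) / len(sums_xc2)
print(f"realized alpha for sums over 10 loci: X^2 {frac_x2:.3f}, X_C^2 {frac_xc2:.3f}")
```

Since the corrected statistic never exceeds the uncorrected one for the same table, the summed X_C² can reject H0,J only when the summed X² does as well.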
The deviation from the χ² distribution with d.f. = 1 may not appear overly large, but when summing 10 observations and comparing with χ² with d.f. = 10 the difference is dramatic (Fig. 3b). Clearly, in a situation like this, when X²_C for 10 loci is being summed, the realized α error is far below the expected one, and the probability of rejecting H₀,J is reduced correspondingly.

In the case of 2 × 2 contingency tables we have focused on the X², X²_C, and −2ln(exact P) test statistics, and it is clear that the traditional chi-square test appears to provide the statistic (X²) that is to be preferred when combining information from multiple loci by summation (Fig. 1, Table 3). It is especially interesting to note that the fit of −2ln(exact P) is markedly poor when n is small, i.e. when the use of an exact test is typically regarded most warranted for comparisons of allele frequencies at each particular locus considered separately.

As a comparison, Table 3 also gives simulated means and variances for G and Williams' G for the case of Q₁ = Q₂ = 0.1. Here, Williams' G performs as well as X², and sometimes even better. In contrast to the other statistics, however, the sampling distribution of G seems to be shifted to the right of χ² for many sample sizes (larger mean and variance than expected). The G-test therefore appears to produce an excess of false significances in the basic contingency tests (realized α > 0.05), and this tendency is expected to grow progressively stronger when testing H₀,J through summation of G-values from multiple loci.

Table 3 Mean and variance of test statistics from 2 × 2 contingency tables when simulating the drawing of two independent samples of equal size (n = [values not recovered] diploids) from a population where the true allele frequency at a di-allelic locus is 0.1. The number of replicates (number of 2 × 2 tables) is [value not recovered] at each sample size. Expected values are those for a χ² distribution with d.f. = 1 [or d.f. = 2 for −2ln(exact P)]. See text for details.

[Table body not recovered. Columns: test statistic; n; mean (observed, expected); variance (observed, expected). Rows: X², X²_C, −2ln(exact P), G, and Williams' G, each at six sample sizes.]

Fig. 3 Simulated sampling distribution of the X²_C test statistic (2 × 2 contingency chi-square with Yates' correction) when drawing two independent samples of 20 diploids (40 genes) from a population where the true allele frequency at di-allelic loci is 0.1. The corresponding χ² distributions are those expected asymptotically under large sample theory. (a) Examining a single locus; expected χ² has d.f. = 1. (b) Testing 10 loci separately and summing the test statistic values; expected χ² has d.f. = 10.

General r × c contingency tables

The fit of the test statistic sample distributions to χ² is generally improved for contingency tables with more than two rows or columns (Everitt 1977), and when testing H₀,J the realized α is therefore expected to be in better agreement with the intended one for both the Bonferroni and the summation methods.

As an example, Table 4 gives the results from simulated drawings of three samples (r = 3) from populations segregating for two and five alleles, respectively (c = 2 or 5), for the diploid sample sizes (n) of 10, 20, and 50. Under the conditions simulated, the difference between the contingency test statistic sampling distribution and the expected χ² is in most cases minimal for the largest sample size (n = 50). As a result, the summation method for addressing H₀,J generally provides a realized error rate that is reasonably close to the intended α at n = 50, although that of −2ln(exact P) is still a bit low in the 3 × 2 tests (Table 4). At all sample sizes the Bonferroni approach usually results in a realized α that is smaller than that obtained by summation. At the smaller sample sizes (10 and 20) the −2ln(exact P) statistic yields an α error that appears unduly small in the 3 × 2 tests, but in the 3 × 5 tables the error is close to the intended one. Most strikingly, however, summation of the G-statistic tends to produce unacceptably high rates of false significances (10–37%) at the smallest sample sizes, an effect of a markedly poor fit to the expected χ² distribution. The traditional chi-square (X²) generally seems to behave in fairly good agreement with expectation, and the summation method appears to perform better than Bonferroni over a wide range of sample sizes.

Although the differences between test statistics and methods for evaluating H₀,J are less pronounced than for the 2 × 2 tables, the tendencies are similar. Combining exact P-values by means of Fisher's method tends to result in an unduly small α error more frequently than when summing X², particularly at small or moderate sample sizes, and it seems that summation should be preferred over Bonferroni.

Finally, it should be noted that the present results, which indicate generally better summation properties of X² relative to the other test statistics, are not caused by choosing combinations of sample sizes (n) and allele frequencies (Q) that produce unduly few tables with low expectancy cells. Considering 2 × 2 contingency tables with n = 20 and Q = 0.1 (Table 3), for example, over 70% of the simulated tables have two (out of four) cells with an expectation less than five. Similarly, at n = 20 nearly 80% of the simulated 3 × 2 contingency tables (Table 4) have three cells with an expected value of less than five; almost all the 3 × 5 tables have 6–12 such cells and over 10% have 3–6 cells with an expectation of less than unity.

Table 4 Mean and variance of test statistics from 3 × c contingency tables when simulating the drawing of three samples of equal size (n = [values not recovered] diploids) from the same population. The number of alleles (c) is 2 and 5, occurring in the frequencies 0.1 and 0.9 (c = 2) and 0.7, 0.1, 0.1, 0.05, and 0.05 (c = 5). The number of replicates (number of 3 × c tables) at each sample size is [value not recovered] and 1000 for the 3 × 2 and 3 × 5 tables, respectively. Expected values are those for a chi-square distribution with d.f. = 2(c − 1) [d.f. = 2 for −2ln(exact P)]. Realized α refers to a situation where the information for 10 loci is combined by means of the summation or the Bonferroni approach and the intended α is [value not recovered]. See text for details.

[Table body not recovered. Columns: c; test statistic; d.f.; n; mean (obs., exp.); variance (obs., exp.); realized α for 10 loci (summation, Bonferroni). Rows: X², −2ln(exact P), G, and Williams' G, each at three sample sizes, for c = 2 and for c = 5.]

Concluding remarks and recommendations

It is obvious from the above that the choice of statistical method may be crucial for the probability of drawing the right conclusion when testing for genetic differentiation. To date, it appears that the primary statistical interest has been devoted to avoidance of false significances (α errors), whereas considerably less attention has been paid to the prospect of detecting true differences (power). Because the two quantities are interrelated, excessive focus on one of them may have undesirable effects on the other.

It appears that the current trend in many fields of evolutionary biology is to score a steadily growing number of loci in a quite restricted number of individuals, frequently in the range, say, [range not recovered]. The question of how to combine the information from multiple loci is therefore becoming increasingly significant. Our results indicate strongly that summation of chi-square (X²) tends to perform better than any of the alternatives examined, even at fairly small sample sizes.
This approach should typically be the method of choice when testing the joint null hypothesis of no difference at any locus.

The technique of summing twice the negative logarithm of P-values from Fisher's exact test [Σ −2ln(exact P)] appears to be used increasingly often. It should be stressed, though, that this approach (Fisher's method applied to Fisher's exact P) may be associated with a strikingly small power even when sample sizes are quite large, and particularly so for 2 × 2 tables. It may seem superficially appealing to base the joint test on probabilities that are computed exactly for each contributing contingency table. Nevertheless, the poor fit to the asymptotically expected χ² distribution frequently makes this method excessively conservative. Therefore, we suggest that results from summation of −2ln(exact P) should generally be accompanied by the results from chi-square summation.

It should be stressed that we by no means suggest that exact contingency tests should be abandoned. Whenever possible, exact calculation of P is to be preferred over any approximation relying on large sample theory, and particularly so when sample sizes are modest or small. Although exact tests are necessarily conservative, they have the obvious advantage that the investigator is guaranteed that the realized α will never exceed the intended one. The problem we address arises when combining the information from several exact tests by means of an approximation such as Fisher's method.

There is no contingency test that is universally best from every perspective and under all circumstances, and there are many situations where an investigator may have valid concern regarding the appropriateness of the chi-square statistic, which necessarily represents an approximation. With small expected values in one or more cells the risk of excessive rates of false significances cannot be ignored, although several reports suggest (as do the present simulation results) that the severity of this continuity problem may be overrated (e.g. Cochran 1954; Lewontin & Felsenstein 1965; Everitt 1977). Exact tests, on the other hand, may be overly conservative and fail to detect true differences more often than anticipated. When evaluating a single contingency table, however, exact calculation of the P-value should be the primary method of choice. It must be noted, though, that the observation of a nonsignificant P-value (P > 0.05) may be quite uninformative in the absence of minimal information on realized α and power of the test at the sample sizes at hand. When combining the information from several contingency tables, however, we recommend that the decision on overall genetic divergence is made on the basis of summation of chi-square (X²).

The Bonferroni correction is also being applied more commonly as a tool for evaluating the joint null hypothesis of no difference at any locus, and its frequently poor performance in the present context may be perceived as a bit surprising. It should be noted, though, that the Bonferroni correction was primarily designed to reduce the probability of obtaining false significances when performing several independent tests. It was not aimed at combining the information from multiple tests that address the same null hypothesis.
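The strategies for evaluating the joint null hypothesis can be contrasted on a small made-up example. The Python sketch below is an illustration only: the ten single-locus X² values are hypothetical, the function names are not from the paper, and the closed-form tail probability used here holds only for an even total d.f. (a general implementation would typically call a library routine such as `scipy.stats.chi2.sf`). Fisher's method is applied to the chi-square P-values purely to show the mechanics of the combination.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with d.f. = 1."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi-square for even d.f. (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even d.f.")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def summation_p(x2_values):
    """Joint P from summing single-locus X2 values, each with d.f. = 1."""
    return chi2_sf_even_df(sum(x2_values), len(x2_values))


def fisher_method_p(p_values):
    """Joint P from Fisher's method: sum of -2 ln(P), d.f. = 2k."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))


def bonferroni_rejects(p_values, alpha=0.05):
    """Reject the joint null if any single P is at or below alpha / k."""
    return min(p_values) <= alpha / len(p_values)


# Hypothetical data: ten loci, each with a weak tendency (X2 = 3.0).
x2 = [3.0] * 10
p_single = [chi2_sf_1df(x) for x in x2]

print("single-locus P:", round(p_single[0], 4))
print("summation P:", summation_p(x2))
print("Fisher's method P:", fisher_method_p(p_single))
print("Bonferroni rejects:", bonferroni_rejects(p_single))
```

In this constructed case every single-locus P-value is about 0.083, so none comes near the Bonferroni limit of 0.005, whereas the summed statistic of 30 on 10 d.f. gives a joint P-value below 0.001.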
The Bonferroni method focuses exclusively on the occurrence of (very) small P-values and largely ignores, for example, tendencies of weak significances to be overrepresented. When testing for genetic heterogeneity under a selective neutrality–genetic drift model, however, any indication of allele frequency differences at any polymorphic locus (regardless of direction) should ideally contribute information that makes the joint null hypothesis less likely. The summation method is directly aimed at picking up such tendencies, and the difference between the two approaches becomes particularly obvious when the underlying test statistic distributions are characterized by marked discontinuities.

As with exact tests, we do not suggest that the Bonferroni method should be avoided in general. The Bonferroni (or the sequential Bonferroni) approach represents a most valuable tool for controlling the α error when an investigator, after conducting multiple tests, focuses on a particular null hypothesis (Rice 1989). Rather, because of the markedly low power, we recommend against its extended use when testing the joint null hypothesis (H₀,J) that all the component H₀s are true.

This paper is focused on contingency tests for allele frequency heterogeneity, but problems with statistical power similar to those discussed here may also occur in other testing situations. The Bonferroni approach, for example, is often applied for evaluation of multiple P-values obtained when testing for Hardy–Weinberg proportions or linkage equilibrium between pairs of loci. For the Bonferroni method to work properly in such cases, it is also necessary that the contributing tests can produce P-values as small as α/k and that those P-values occur at a frequency reasonably close to α/k when the null hypothesis is true.
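Whether a test can produce a P-value as small as α/k at all can be checked directly for 2 × 2 tables. The Python sketch below is an illustration with assumed settings, not a reconstruction of Table 2: it uses the common two-sided definition of Fisher's exact P (the summed probability of all tables no more likely than the observed one), and it takes n = 10 diploids per sample and k = 10 loci as hypothetical values. The function names are made up for the example.

```python
from math import comb


def fisher_exact_p(a, b, m1, m2):
    """Two-sided Fisher exact P for samples of m1 and m2 genes holding a and b copies."""
    c = a + b
    denom = comb(m1 + m2, c)
    lo, hi = max(0, c - m2), min(c, m1)
    # Hypergeometric probability of each table sharing the observed margins.
    probs = {x: comb(m1, x) * comb(m2, c - x) / denom for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1.0 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def min_attainable_p(c, m1, m2):
    """Smallest P that any table with pooled allele count c can give."""
    lo, hi = max(0, c - m2), min(c, m1)
    return min(fisher_exact_p(x, c - x, m1, m2) for x in range(lo, hi + 1))


def smallest_count_reaching(threshold, m1, m2):
    """Smallest pooled allele count whose minimum P is at or below threshold."""
    for c in range(1, m1 + m2):
        if min_attainable_p(c, m1, m2) <= threshold:
            return c
    return None


m = 20                # n = 10 diploids per sample (assumed)
alpha, k = 0.05, 10   # ten loci (assumed)

# Pooled count of 4 is the expectation for 40 genes at allele frequency 0.1.
print("minimum P with 4 copies:", min_attainable_p(4, m, m))
print("copies needed for P <= alpha/k:", smallest_count_reaching(alpha / k, m, m))
```

With these settings a table holding the expected four copies of the rare allele cannot give a P-value below about 0.106, even when all four copies fall in one sample, and at least eight copies (twice the expectation) must be observed before the Bonferroni limit of 0.005 is attainable.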
For instance, if it is impossible in practice to obtain P < α/k in many or most of the contributing tests, then the power of the joint test (the Bonferroni evaluation) may be very close to zero. In such a situation, any attempt to interpret an observed lack of significance in biological terms would typically be meaningless and potentially erroneous.

Acknowledgements

We thank Linda Laikre, Ole Christian Lingjærde, Stefan Palm, Associate Editor Laurent Excoffier, and two anonymous reviewers for comments on earlier versions of this paper. The study was supported by grants to N.R. from the Swedish Natural Science Research Council and from the Swedish research program on Sustainable Coastal Zone Management, SUCOZOMA, funded by the Foundation for Strategic Environmental Research, MISTRA. P.E.J. was supported by a grant from the National Research Council of Norway.

References

Cochran WG (1954) Some methods for strengthening the common χ² test. Biometrics, 10.
CYTEL Software Corporation (1992) StatXact-Turbo; statistical software for exact nonparametric inference. CYTEL Software Corporation, Cambridge, MA.
Everitt BS (1977) The Analysis of Contingency Tables. Chapman & Hall, London.
Fisher RA (1950) Statistical Methods for Research Workers. 11th edn. Oliver and Boyd, London.
Jorde PE, Ryman N (1996) Demographic genetics of brown trout (Salmo trutta) and estimation of effective population size from temporal change of allele frequencies. Genetics, 143.
Legendre P, Legendre L (1998) Numerical Ecology. 2nd edn. Elsevier, Amsterdam.
Lewontin RC, Felsenstein J (1965) The robustness of homogeneity tests in 2 × N tables. Biometrika, 36.
Mehta CR, Patel NR (1983) A network algorithm for performing Fisher's exact test in r × c contingency tables. Journal of the American Statistical Association, 78.
Raymond M, Rousset F (1995a) An exact test for population differentiation. Evolution, 49.
Raymond M, Rousset F (1995b) GENEPOP (version 1.2): a population genetics software for exact tests and ecumenicism. Journal of Heredity, 86.
Rice WR (1989) Analyzing tables of statistical tests. Evolution, 43.
Roff DA, Bentzen P (1989) The statistical analysis of mitochondrial DNA polymorphisms: X² and the problem of small samples. Molecular Biology and Evolution, 6.
Rohlf FJ (1987) BIOM. A Package of Statistical Programs to Accompany the Text of Biometry. Applied Biostatistics, Inc., New York.
Sokal RR, Rohlf FJ (1981) Biometry. 2nd edn. W.H. Freeman, San Francisco, CA.
StatSoft Inc. (1998) STATISTICA for Windows. StatSoft, Inc., Tulsa, OK.
Zar JH (1984) Biostatistical Analysis. 2nd edn. Prentice Hall, Inc., Englewood Cliffs, New Jersey.

Nils Ryman is a professor of genetics and heads the Division of Population Genetics at Stockholm University. His research has focused primarily on the genetic structure of natural populations, the genetic effects of human exploitation of such populations, and related conservation genetics issues. Per Erik Jorde graduated in genetics in Stockholm and is presently a postdoctoral fellow at the University of Oslo, working on microsatellite DNA analyses of fishes.


More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory

Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory LA-UR-12-24572 Approved for public release; distribution is unlimited Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory Alicia Garcia-Lopez Steven R. Booth September 2012

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015 ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender,

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender, This essay critiques the theoretical perspectives, research design and analysis, and interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender, Pair Composition and Computer

More information

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Application: This statistic has two applications that can appear very different,

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES STATISTICAL SIGNIFICANCE OF RANKING PARADOXES Anna E. Bargagliotti and Raymond N. Greenwell Department of Mathematical Sciences and Department of Mathematics University of Memphis and Hofstra University

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

1. How different is the t distribution from the normal?

1. How different is the t distribution from the normal? Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.

More information

WHAT IS A JOURNAL CLUB?

WHAT IS A JOURNAL CLUB? WHAT IS A JOURNAL CLUB? With its September 2002 issue, the American Journal of Critical Care debuts a new feature, the AJCC Journal Club. Each issue of the journal will now feature an AJCC Journal Club

More information

START Selected Topics in Assurance

START Selected Topics in Assurance START Selected Topics in Assurance Related Technologies Table of Contents Introduction Some Statistical Background Fitting a Normal Using the Anderson Darling GoF Test Fitting a Weibull Using the Anderson

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Testing for differences I exercises with SPSS

Testing for differences I exercises with SPSS Testing for differences I exercises with SPSS Introduction The exercises presented here are all about the t-test and its non-parametric equivalents in their various forms. In SPSS, all these tests can

More information

Pearson's Correlation Tests

Pearson's Correlation Tests Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

1 Nonparametric Statistics

1 Nonparametric Statistics 1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

A logistic approximation to the cumulative normal distribution

A logistic approximation to the cumulative normal distribution A logistic approximation to the cumulative normal distribution Shannon R. Bowling 1 ; Mohammad T. Khasawneh 2 ; Sittichai Kaewkuekool 3 ; Byung Rae Cho 4 1 Old Dominion University (USA); 2 State University

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

A Power Primer. Jacob Cohen New York University ABSTRACT

A Power Primer. Jacob Cohen New York University ABSTRACT Psychological Bulletin July 1992 Vol. 112, No. 1, 155-159 1992 by the American Psychological Association For personal use only--not for distribution. A Power Primer Jacob Cohen New York University ABSTRACT

More information

Writing a degree project at Lund University student perspectives

Writing a degree project at Lund University student perspectives 1 Writing a degree project at Lund University student perspectives Summary This report summarises the results of a survey that focused on the students experiences of writing a degree project at Lund University.

More information

False Discovery Rates

False Discovery Rates False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

More information

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction. 1.1 Introduction Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

More information

COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore*

COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore* COMPARING DATA ANALYSIS TECHNIQUES FOR EVALUATION DESIGNS WITH NON -NORMAL POFULP_TIOKS Elaine S. Jeffers, University of Maryland, Eastern Shore* The data collection phases for evaluation designs may involve

More information