Nonparametric and Distribution-Free Statistical Tests


Concepts that you will need to remember from previous chapters

SS_total, SS_group, SS_error: Sums of squares of all scores, of group means, and within groups
MS_group, MS_error: Mean squares for group means and within groups
F statistic: Ratio of MS_group over MS_error
Degrees of freedom: The number of independent pieces of information remaining after estimating one or more parameters
Effect size (d̂): A measure intended to express the size of a treatment in terms that are meaningful to the reader
Eta squared (η²), omega squared (ω²): Correlation-based measures of effect size
Multiple comparisons: Tests on differences between specific group means

In this chapter we are going to change our general approach to hypothesis testing and look at procedures that rely on substituting ranks for raw scores. These are members of the class of nonparametric, or distribution-free, tests. We will first look at the underlying principle that stands behind such tests and then discuss the reasons why one might prefer to use this kind of test. We will see that these tests are a supplement to what we have learned, not a replacement.

Most of the statistical procedures we have discussed in the preceding chapters have involved the estimation of one or more parameters of the distribution of scores in the population(s) from which the data were sampled, and assumptions concerning the shape of that distribution. For example, the t test makes use of the sample variance (s²) as an estimate of the population variance (σ²) and also requires the assumption that the population from which we sampled is normal (or at least that the sampling distribution of the mean is normal). Tests, such as the t test, that involve assumptions either about specific parameters or about the distribution of the population are referred to as parametric tests.

Definition

Parametric tests: Statistical tests that involve assumptions about, or estimation of, population parameters.
Nonparametric tests: Statistical tests that do not rely on parameter estimation or precise distributional assumptions.
Distribution-free tests: Another name for nonparametric tests.

One class of tests, however, places less reliance on parameter estimation and/or distribution assumptions. Such tests usually are referred to as nonparametric tests or distribution-free tests. By and large, if a test is nonparametric it is also distribution-free; in fact, it is the distribution-free nature of the test that is most valuable to us. Although the two names often are used interchangeably, these tests will be referred to here as distribution-free tests.

The argument over the value of distribution-free tests has gone on for many years, and it certainly cannot be resolved in this chapter. Many experimenters feel that, for the vast majority of cases, parametric tests are sufficiently robust (unaffected by violations of assumptions) to make distribution-free tests unnecessary. Others, however, believe just as strongly in the unsuitability of parametric tests and the overwhelming superiority of the distribution-free approach. (Bradley [1968] is a forceful and articulate spokesman for the latter group, even though his book on the subject is over 40 years old.) Regardless of the position you take on this issue, it is important that you are familiar with the most common distribution-free procedures and their underlying rationale. These tests are too prevalent in the experimental literature simply to be ignored.

The major advantage generally attributed to distribution-free tests is also the most obvious: they do not rely on any seriously restrictive assumptions concerning the shape of the sampled population(s). This is not to say that distribution-free tests do not make any distribution assumptions, only that the assumptions they do require are far more general than those required for the parametric tests.

The exact null hypothesis being tested may depend, for example, on whether two populations are symmetric or have a similar shape. None of these tests, however, makes an a priori assumption about the specific shape of the distribution; that is, the validity of the test is not affected by whether the distribution of the variable in the population is normal. A parametric test, on the other hand, usually includes some type of normality assumption; if that assumption is false, the conclusions drawn from the test may be inaccurate.

Another characteristic of distribution-free tests that often acts as an advantage is that many of them, especially the ones discussed in this chapter, are more sensitive to medians than to means. Thus if the nature of your data is such that you are interested primarily in medians, the tests presented here may be particularly useful to you.

Those who favor using parametric tests in every case do not deny that the distribution-free tests are more liberal in the assumptions they require. They do argue, however, that the assumptions normally cited as being required of parametric tests are overly restrictive in practice and that the parametric tests are remarkably unaffected by violations of distribution assumptions. In other words, they argue that the parametric test is still a valid test even if all of its assumptions are not met.

The major disadvantage generally attributed to distribution-free tests is their lower power relative to the corresponding parametric test. In general, when the assumptions of the parametric test are met, the distribution-free test requires more observations than the comparable parametric test for the same level of power. Thus for a given set of data the parametric test is more likely to lead to rejection of a false null hypothesis than is the corresponding distribution-free test. Moreover, even when the distribution assumptions are violated to a moderate degree, the parametric tests are thought to maintain their advantage.

It often is claimed that the distribution-free procedures are particularly useful because of the simplicity of their calculations. However, for an experimenter who has just invested six months collecting data, a difference of five minutes in computation time hardly justifies the use of a less desirable test. Moreover, since most people run their analyses using computer software, the difference in ease of use disappears completely.

There is one other advantage of distribution-free tests. Because many of them rank the raw scores and operate on those ranks, they offer a test of differences in central tendency that is not affected by one or a few very extreme scores (outliers). An extreme score in a set of data actually can make the parametric test less powerful, because it inflates the variance and hence the error term, as well as biasing the mean by shifting it toward the outlier (the latter may increase or decrease the difference between means).

In this chapter we will be concerned with four of the most important distribution-free methods. The first two are analogues of the t test, one for independent samples and one for matched samples. The next two tests are distribution-free analogues of the analysis of variance, the first for k independent groups and the second for k repeated measures.
All these tests are members of a class known as rank-randomization tests because they deal with ranked data and take as the distribution of their test statistic, when the null hypothesis is true, the theoretical distribution of randomly distributed ranks. I'll come back to this idea shortly. Because these tests convert raw data to ranks, the shape of the underlying distribution of scores in the population becomes less important. Thus a set of data that might have come from a normal distribution and a set that might have come from a bimodal distribution both reduce to the same ranks 1, 2, ..., N.

Definition

Rank-randomization tests: A class of nonparametric tests based on the theoretical distribution of randomly assigned ranks.

The use of methods based on ranks is not the only approach when we are concerned about nonnormality, though it is the most common. Wilcox (2003) has an extensive discussion of newer alternative methods (often relying on the trimming of samples), though there is not space to discuss those methods here.

Why do we use ranks?

You might reasonably ask why we would use ranks to run any of the tests in this chapter. There are three good reasons why these tests were designed around the substitution of ranks for raw data. In the first place, ranks can eliminate or reduce the effects of extreme values. The two highest ranks of 20 items will be the values 19 and 20. But the highest raw-score values could be 77 and 78, or 77 and 130. It makes a difference with raw scores, but not with ranks. A second advantage of ranks is that we know certain of their properties, such as that the sum of a set of N ranks is N(N + 1)/2. This greatly simplifies calculations, which was especially important in the days before high-speed computers. The third advantage is that once you have worked out the critical value of the test statistic when you have 8 observations in one group and 13 in another, you never have to solve that problem again. The next time you have 8 scores in one group and 13 in another, converting to ranks will yield the same critical value. With raw scores, however, you would have to set a cutoff for every conceivable collection of 8 scores in one group and 13 in another.

However, while ranks provided an easy solution when we had to do calculations by hand, that advantage is now largely gone. There is a whole set of statistical tests called randomization tests (or sometimes permutation tests) that work by randomizing raw scores. For the Mann-Whitney test we convert to ranks and then ask about all of the possible ways those ranks could have been assigned to groups if the null hypothesis were true. As I said, ranks make it easy to acquire all possible arrangements and identify the 5% most extreme ones. But now we can do exactly the same thing with raw scores. We can write a very simple computer program that randomly assigns scores to groups, calculates some statistic, and then repeats that process 5,000 times or more in a very few seconds. Then we identify the 5% most extreme outcomes, and that gives us our critical value. And if there are too many possible permutations of the raw scores to make the full enumeration practical, we can pick a random 5,000 or 10,000 rearrangements, and that will give us a result that is acceptably close to the result of the full solution.
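To make that idea concrete, here is a minimal sketch of such a randomization test in Python. The scores, the use of a difference in means as the test statistic, and the 5,000 reshuffles are all illustrative choices, not values taken from this chapter:

import random

# Hypothetical scores for two groups of 8 -- purely illustrative.
group1 = [54, 61, 58, 49, 66, 57, 62, 53]
group2 = [71, 68, 75, 80, 64, 77, 70, 73]

def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(group1) - mean(group2))

pooled = group1 + group2
n1 = len(group1)
reshuffles = 5000
extreme = 0

for _ in range(reshuffles):
    random.shuffle(pooled)              # randomly reassign scores to groups
    diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
    if diff >= observed:                # at least as extreme as what we saw
        extreme += 1

print("approximate p =", extreme / reshuffles)

The proportion of reshuffled arrangements at least as extreme as the one we observed serves directly as an approximate two-tailed probability under the null hypothesis.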

If R. A. Fisher were still around, he would argue that when the randomization of raw scores gives a result that is more than trivially different from the result of a parametric test such as t or F, then it is the t or F that is wrong.

20.1 The Mann-Whitney Test

One of the most common and best known of the distribution-free tests is the Mann-Whitney test for two independent samples. This test often is thought of as the distribution-free analogue of the t test for two independent samples, although it tests a slightly different, and broader, null hypothesis. Its null hypothesis is the hypothesis that the two samples were drawn at random from identical populations (not just populations with the same mean), but it is especially sensitive to population differences in central tendency. Thus rejection of H₀ generally is interpreted to mean that the two distributions had different central tendencies, but it is possible that rejection actually resulted from some other difference between the populations. Notice that when we gain one thing (freedom from assumptions), we pay for it with something else (loss of specificity).

Definition

Mann-Whitney test: A nonparametric test for comparing the central tendency of two independent samples.

The Mann-Whitney test is a variation on a test originally devised by Wilcoxon called the Rank-Sum test. Because Wilcoxon also devised another test, to be discussed in the next section, we will refer to this version as the Mann-Whitney test to avoid confusion. Although the test as devised by Mann and Whitney used a slightly different test statistic, the statistic used in this chapter (the sum of the ranks of the scores in one of the groups) is often advocated because it is much easier to calculate. (In fact, this is the statistic that Wilcoxon uses for his test. So, to be honest, I am calling this the Mann-Whitney test but doing it the way Wilcoxon proposed.) The result is the same, because either way of computing a test statistic would lead to exactly the same conclusion when applied to the same set of data.

The logical basis of the Mann-Whitney test is particularly easy to understand. Assume that we have two independent treatment groups, with n₁ observations in Group 1 and n₂ observations in Group 2.

To make it concrete, assume that there are 8 observations in each group. Further assume that we don't know whether or not the null hypothesis is true, but that the data we obtain show Group 2 outscoring Group 1 by a substantial margin. Now suppose that we rank the data from lowest to highest, without regard to group membership, and the ranks come out like this:

Ranked Scores
Group 1 ranks: 1  2  3  4  5  6  7  8      Sum = 36
Group 2 ranks: 9 10 11 12 13 14 15 16      Sum = 100

Look at that! The lowest 8 ranks ended up in Group 1 and the highest 8 ranks ended up in Group 2. That doesn't look like a very likely event if the two populations don't differ. We could calculate how often such a result would happen if we really need to, and if you are very patient. Although it could be done mathematically, we could do it empirically by taking 16 balls and writing the numbers 1 through 16 on them, corresponding to the 16 ranks. (We don't have to worry about actual scores, because we are going to replace scores with ranks anyway.) Now we will toss all of the balls into a bucket, shake the bucket thoroughly, pull out 8 balls, which will correspond to the ranks for Group 1, record the sum of the numbers on those balls, toss them back into the bucket, shake and draw again, record the sum of the numbers, and continue that process all night. By the next morning we will have drawn an awful lot of samples, and we can look at the values we recorded and make a frequency distribution of them. This will tell us how often we had a sum of the ranks of only 36, how often the sum was 37, how often it was 50, or 60, or 90, or whatever. Now we really are finished. We know that if we just draw ranks out at random, only very rarely will we get a sum as small as 36. (A simple calculation shows that an outcome as extreme as ours would be expected to occur only one time out of 12,870, for a probability of .000078.) If the null hypothesis is really true, then there should be no systematic reason for the first group to have only the lowest ranks. It should have ranks that are about like those of the second group. If the ranks in Group 1 are improbably low, that is evidence against the null hypothesis.

I mentioned above that this is a rank-randomization test, and what we have just done illustrates where the name comes from. We run the test by looking at what would happen if we randomly assigned scores (or actually ranks) to groups, even if we don't actually go through the process of doing the random assignment ourselves.
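Rather than drawing balls all night, we can let a few lines of Python do the work exactly. This sketch enumerates all 12,870 possible draws of 8 of the 16 ranks and counts how often the Group 1 rank sum is 36 or less:

from itertools import combinations

# Every possible way 8 of the ranks 1-16 could fall in Group 1.
sums = [sum(draw) for draw in combinations(range(1, 17), 8)]

total = len(sums)                              # 12,870 possible assignments
as_small = sum(1 for s in sums if s <= 36)     # draws with a sum of 36 or less
print(total, as_small, as_small / total)       # 12870  1  0.0000777...

With groups this small the full enumeration is instant; for larger samples we would fall back on the random-sampling approach sketched earlier.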

Now consider the case in which the null hypothesis is true and the scores for the two groups were sampled from identical populations. In this situation, if we were to rank all N scores without regard to group membership, we would expect some low ranks and some high ranks in each group, and the sum of the ranks assigned to Group 1 would be roughly equal to the sum of the ranks assigned to Group 2. In a reasonable result for the situation with a true null hypothesis, Group 2 scores would not look a lot different from Group 1 scores, and ranking the data across both groups might give a sum of 64 for the ranks in Group 1 and a sum of 72 for the ranks in Group 2. Here the sum of the ranks in Group 1 is not much different from the sum of the ranks in Group 2, and a sum like that would occur quite often if we just drew ranks at random.

Mann and Whitney (and Wilcoxon) based their tests on the logic just described, using the sum of the ranks in one of the groups as the test statistic. If that sum is too small relative to the other sum, we will reject the null hypothesis. More specifically, we will take as our test statistic the sum of the ranks assigned to the smaller group, or, if n₁ = n₂, the smaller of the two sums. Given this value, we can use tables of the Mann-Whitney statistic (W_S) to test the null hypothesis. (They needed to concern themselves with only one of the sums because, with a fixed set of numbers [ranks], the sum of the ranks in one group is directly related to the sum of the ranks in the other group. If one sum is high, the other must be low.)

To take a specific example, consider the data in Table 20.1 on the number of recent stressful life events reported by a group of cardiac patients in a local hospital and a control group of orthopedic patients in the same hospital. It is well known that stressful life events (marriage, a new job, death of a spouse, etc.) are associated with illness, and it is reasonable to expect that many cardiac patients would have experienced more recent stressful events than orthopedic patients (who just happened to break an ankle while tearing down a building or a collarbone while skiing). It would appear from the data that this expectation is borne out. Because we have some reason to suspect that life stress scores probably are not symmetrically distributed in the population (especially for cardiac patients, if our research hypothesis is true), we will choose to use a distribution-free test. In this case we will use the Mann-Whitney test because we have two independent groups.

Table 20.1 Stressful Life Events Reported by Cardiac and Orthopedic Patients
(Data and Ranks columns for the cardiac and orthopedic groups)

To apply the Mann-Whitney test, we first rank all 11 scores from lowest to highest, assigning tied ranks to tied scores. The orthopedic group is the smaller of the two, and if those patients generally have had fewer recent stressful life events, then the sum of the ranks assigned to that group would be relatively low. Letting W_S stand for the sum of the ranks in the smaller group (the orthopedic group), we find

W_S = Σ(R_i in smaller group) = 21

We can evaluate the obtained value of W_S by using Table E.8 in Appendix E, which gives the smallest value of W_S we would expect to obtain by chance if the null hypothesis were true. From Table E.8 we find that for n₁ = 5 subjects in the smaller group and n₂ = 6 subjects in the larger group (n₁ is always used to represent the number of subjects in the smaller group) the entry for α = .025 (one-tailed) is 18. This means that for a difference between groups to be significant at the two-tailed .05 level (or the one-tailed .025 level), W_S must be less than or equal to 18. Because we found W_S to be 21, we cannot reject H₀. (By way of comparison, if we ran a t test on these data, ignoring the fact that one sample variance is almost 50 times the other and that the data suggest that our prediction of the shape of the distribution of cardiac scores may be correct, t would be 1.52 on 9 df, which is also a nonsignificant result.)

As an aside, I should point out that we would have rejected H₀ if our value of W_S had been smaller than the tabled value. Until now you have been rejecting H₀ when the obtained test statistic was larger than the corresponding tabled value. When we work with nonparametric tests, the tables are usually set up to lead to rejection for small obtained values. If I were redesigning statistical procedures, I would set the tables up differently, but nobody asked me. Just get used to the fact that parametric tables are set up such that you reject H₀ for large obtained values, and nonparametric tables are often set up so that you reject for small values. That's just the way it is.
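For readers analyzing data in Python, a sketch of the same test follows. The scores here are placeholders standing in for the Table 20.1 data (the actual values are not reproduced above), so only the mechanics, not the numbers, should be taken from this example. scipy reports the algebraically equivalent Mann-Whitney U statistic rather than W_S:

import numpy as np
from scipy import stats

# Placeholder scores -- NOT the actual Table 20.1 data.
orthopedic = [3, 10, 4, 12, 6]           # smaller group, n1 = 5
cardiac = [15, 42, 8, 25, 31, 27]        # larger group,  n2 = 6

# W_S as this chapter defines it: rank all 11 scores together
# (tied scores get tied ranks), then sum the ranks in the smaller group.
ranks = stats.rankdata(np.concatenate([orthopedic, cardiac]))
W_S = ranks[:len(orthopedic)].sum()
print("W_S =", W_S)

# scipy's version of the same test, reported as U with a p value.
U, p = stats.mannwhitneyu(orthopedic, cardiac, alternative="two-sided")
print("U =", U, "p =", p)

Because U and W_S differ only by a constant that depends on the sample sizes, the two statistics always lead to the same conclusion.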

The entries in Table E.8 are for a one-tailed test and will lead to rejection of the null hypothesis only if the sum of the ranks for the smaller group is sufficiently small. It is possible, however, that the larger ranks could be congregated in the smaller group, in which case, if H₀ is false, the sum of the ranks would be larger than chance expectation rather than smaller. One rather awkward way around this problem would be to rank the data all over again, this time ranking from high to low rather than from low to high. If we did that, the smaller ranks would appear in the smaller group, and we could proceed as before. We do not have to go through the process of reranking the data, however. We can accomplish the same thing by making use of the symmetric properties of the distribution of the rank sum, by calculating a statistic called W'_S. W'_S is the sum of the ranks for the smaller group that we would have found if we had reversed our ranking and ranked from highest to lowest:

W'_S = 2W̄ − W_S

where 2W̄ = n₁(n₁ + n₂ + 1) and is tabled in Table E.8 in Appendix E. For a two-tailed test of H₀ (which is what we normally want) we calculate both W_S and W'_S, enter the table with whichever is smaller, and double the listed value of α.

For an illustration of W_S and W'_S, consider the following two sets of ranks, each with four scores in Group 1 and five in Group 2:

Set 1
Group 1 ranks: 1, 2, 3, 5          W_S = 11     W'_S = 29
Group 2 ranks: 4, 6, 7, 8, 9

Set 2
Group 1 ranks: 5, 7, 8, 9          W_S = 29     W'_S = 11
Group 2 ranks: 1, 2, 3, 4, 6

Notice that the two data sets exhibit the same degree of extremeness, in the sense that for the first set four of the five lowest ranks are in Group 1, and in the second set four of the five highest ranks are in Group 1. Moreover, W_S for Set 1 is equal to W'_S for Set 2, and vice versa. Thus if we establish the rule that we will calculate both W_S and W'_S for the smaller group and refer the smaller of W_S and W'_S to the tables, we will have a two-tailed test and will come to the same conclusion with respect to the two data sets.

The Normal Approximation

Table E.8 in Appendix E is suitable for all cases in which n₁ and n₂ are less than or equal to 25. For larger values of n₁ and/or n₂ we can make use of the fact that the distribution of W_S approaches a normal distribution as sample sizes increase.

This distribution has

Mean = n₁(n₁ + n₂ + 1)/2

and

Standard error = √[n₁n₂(n₁ + n₂ + 1)/12]

Because the distribution is normal and we know its mean and its standard deviation (the standard error), we can calculate z:

z = (Statistic − Mean)/(Standard error)
  = (W_S − n₁(n₁ + n₂ + 1)/2) / √[n₁n₂(n₁ + n₂ + 1)/12]

and obtain from the tables of the normal distribution an approximation of the true probability of a value of W_S at least as low as the one obtained.

To illustrate the computations for the case in which the larger ranks fall into the smaller group, and to illustrate the use of the normal approximation (although we don't really need an approximation for such small sample sizes), consider the data in Table 20.2. These are hypothetical (but reasonable) data on the birthweights (in grams) of children born to mothers who did not seek prenatal care until the third trimester and of children born to mothers who received prenatal care starting in the first trimester.

Table 20.2 Data on Birthweight of Infants Born to Mothers with Different Levels of Prenatal Care
(Birthweight and Rank columns for the Third-Trimester and First-Trimester groups; n₁ = 8, n₂ = 10)

W_S = Σ(ranks in smaller group) = 100
W'_S = 2W̄ − W_S = 152 − 100 = 52
z = (W_S − n₁(n₁ + n₂ + 1)/2) / √[n₁n₂(n₁ + n₂ + 1)/12] = (100 − 76)/11.25 = 2.13

For the data in Table 20.2 the sum of the ranks in the smaller group equals 100. From Table E.8 in Appendix E we find 2W̄ = 152; thus W'_S = 2W̄ − W_S = 152 − 100 = 52. Because 52 is smaller than 100, we go to Table E.8 with W'_S = 52, n₁ = 8, and n₂ = 10. (Remember, n₁ is defined as the smaller sample size.) Because we want a two-tailed test, we will double the column headings for α. The critical value of W_S (or W'_S) for a two-tailed test at α = .05 is 53, meaning that only 5% of the time would we expect a value of W_S or W'_S less than or equal to 53 when H₀ is true. Our obtained value of W'_S is 52, which falls in the rejection region, so we will reject H₀. We will conclude that mothers who do not receive prenatal care until the third trimester tend to give birth to smaller babies. This does not necessarily mean that not having care until the third trimester causes smaller babies, but only that variables associated with delayed care (e.g., young mothers, poor nutrition, and poverty) also are associated with lower birthweight.

The use of the normal approximation for evaluating W_S is illustrated in the lower section of Table 20.2. Here we find that z = 2.13.

From Table E.10 in Appendix E we find that the probability of a W_S or W'_S at least as small as 52 (a z at least as extreme as ±2.13) is .0332. Because this value is smaller than our traditional cutoff of α = .05, we will reject H₀ and again conclude that there is sufficient evidence to say that failing to seek early prenatal care is related to lower birthweight. Note that both the exact solution and the normal approximation lead to the same conclusion with respect to H₀. (With the normal approximation it is not necessary to calculate and use W'_S, because using W'_S would lead to the same value of z except for the reversal of its sign. It would be instructive for you to calculate Student's t test for two independent groups from the same set of data.)
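The normal approximation is only a few lines of arithmetic. This sketch reproduces the Table 20.2 computations (n₁ = 8, n₂ = 10, W_S = 100):

from math import sqrt

n1, n2 = 8, 10       # group sizes from Table 20.2 (n1 = smaller group)
W_S = 100            # sum of the ranks in the smaller group

mean = n1 * (n1 + n2 + 1) / 2              # = 76, expected rank sum under H0
se = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # standard error of the rank sum
z = (W_S - mean) / se
print(round(z, 2))                         # 2.13

W_S_prime = n1 * (n1 + n2 + 1) - W_S       # 2W-bar - W_S = 152 - 100 = 52
print(W_S_prime)

Running the same arithmetic with W_S_prime in place of W_S gives z = −2.13, illustrating the sign reversal noted above.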

The Treatment of Ties

When the data contain tied scores, any test that relies on ranks is likely to be somewhat distorted. There are several different ways of dealing with ties. You can assign tied ranks to tied scores (as we have been doing), you can flip a coin and assign consecutive ranks to tied scores, or you can assign untied ranks in whatever way will make it hardest to reject H₀. In actual practice, most people simply assign tied ranks. Although that may not be the statistically best way to proceed, it is the most common and is the method we will use here.

The Null Hypothesis

The Mann-Whitney test evaluates the null hypothesis that the two sets of scores were sampled from identical populations. This is broader than the null hypothesis tested by the corresponding t test, which dealt specifically with means (primarily as a result of the underlying assumptions that ruled out other sources of difference). If the two populations are assumed to have the same shape and dispersion, then the null hypothesis tested by the Mann-Whitney test would actually deal with the central tendency (in this case the medians) of the two populations; if the populations are also symmetric, the test will be a test of means. In any event, the Mann-Whitney test is particularly sensitive to differences in central tendency.

Using SPSS

I will illustrate the use of SPSS for this test, and it should be clear how it would be used for the tests that follow. In Chapter 17 we considered data collected by Willer (2005) on the Masculine Overcompensation Thesis. Those data can be found on the Web site as Tab17.5.dat. The first column represents Gender (1 = Male), the second column represents Condition (1 = Threat), and the third column contains the dependent variable (Price). In Chapter 17 I mentioned that Willer's data were probably positively skewed, although the data that I created to match his data were more or less normal. This might be a place where the Mann-Whitney test would be useful, especially if we had Willer's actual data. I also noted there that Willer was most interested in males and the hypothesis that when males' masculinity is questioned, they might engage in more masculine behavior, and so we will limit our analysis to males.

To restrict the analysis to data from males, you need to go to the drop-down menu labeled Data, choose Select Cases, and then specify that you want to use only the data from Gender = 1. Next, choose Analyze/Nonparametric tests/2-independent samples. Then specify that Price is the test variable and that Threat is the grouping variable. When you do that you also have to indicate that the levels of Threat are 1 and 2. The results of this analysis appear below.

Mann-Whitney Test

Ranks (abbreviated output): for the variable "Price willing to pay," SPSS reports N, Mean Rank, and Sum of Ranks for the Threatened and Confirmed conditions, with a total N of 50; the smaller sum of ranks was 534.5.

Test Statistics (grouping variable: Condition): Wilcoxon W = 534.5; Asymp. Sig. (2-tailed) = .046.

Here you see that the probability of this result under the null hypothesis is given as .046, which is less than .05 and will lead us to conclude that threatened males do engage in more masculine behavior. (SPSS uses a normal approximation, but if you look at Appendix Table E.8 you will see that the critical sum of ranks is 536. From the printout the smaller sum was 534.5, which also leads to rejection of the null hypothesis.)

20.2 Wilcoxon's Matched-Pairs Signed-Ranks Test

Frank Wilcoxon is credited with developing the most popular distribution-free test for independent groups, which I referred to as the Mann-Whitney test to avoid confusion and because of Mann and Whitney's work on it. He also developed the most popular test for matched groups (or paired scores). This test is the distribution-free analogue of the t test for related samples. It tests the null hypothesis that two related (matched) samples were drawn either from identical populations or from symmetric populations with the same mean. More specifically, it tests the null hypothesis that the distribution of difference scores (in the population) is symmetric about zero. This is the same hypothesis tested by the corresponding t test when that test's normality assumption is met.

The logic behind Wilcoxon's matched-pairs signed-ranks test is straightforward and can be illustrated with an example from a study of schizophrenia and subcortical structures by Suddath, Christison, Torrey, Casanova, and Weinberger (1990). Bleuler (1911) originally described schizophrenia as being characterized by a lack of connections between associations in memory. The hippocampus has been suggested as playing an important role in memory storage and retrieval, and it is reasonable to ask whether differences in hippocampal structures (particularly size) could play a role in schizophrenia.

Suddath obtained MRI scans of the brains of 15 schizophrenic individuals and their monozygotic (identical) twins, and measured the volume of each brain's left hippocampus. Because there are many things that control the volume of cortical and subcortical structures, Suddath used monozygotic twin pairs in an effort to control as many of these as possible and to reduce the amount of variance to be explained. The results appear in Table 20.3, as taken from Ramsey and Schafer (1996).

Definition

Wilcoxon's matched-pairs signed-ranks test: A nonparametric test for comparing the central tendency of two matched (related) samples.

If you plot the difference scores for these 15 twin pairs, as shown in Figure 20.1, you will note that the distribution is far from normal. With so few observations it is not feasible to make a definitive statement about normality, but I would not like to have to defend the idea that these are normally distributed observations. For that reason I would prefer to rely on a distribution-free test for paired observations, and that test is the Wilcoxon matched-pairs signed-ranks test.

Table 20.3 Data on Volume (in cm³) of Left Hippocampus in Schizophrenic and Nonschizophrenic Twin Pairs
(columns: Pair, Normal, Schizophrenic, Difference, Rank, Signed Rank; for these data, T+ (positive ranks) = 111 and T− (negative ranks) = 9)

Figure 20.1 Distribution of differences between schizophrenic and normal twins (histogram of the 15 difference scores; Mean = .20, Std. Dev. = .24)

The test is based, as its name suggests, on the ranks of the differences rather than their numerical values. If schizophrenia is associated with lower (or higher) volume of the left hippocampus, we would expect most of the twin pairs to show a lower (or higher) volume for the schizophrenic twin than for the control twin. Thus we would expect predominantly positive (or negative) differences. We also would expect twin pairs who break this pattern to differ only slightly, in the direction opposite the trend. On the other hand, if schizophrenia has nothing to do with volume, we would expect about one-half of the difference scores to be positive and one-half to be negative, with the positive differences about as large as the negative ones. In other words, if H₀ is really true, we would not expect most differences to be in the predicted direction with only small differences in the unpredicted direction. Notice that I have deliberately phrased this paragraph for a two-tailed (nondirectional) test. For a directional test you would simply remove the phrases in parentheses.

In carrying out the Wilcoxon matched-pairs signed-ranks test we first calculate the difference score for each pair of measurements. We then rank all difference scores without regard to the sign of the difference, give the algebraic sign of the differences to the ranks themselves, and finally sum the positive and negative ranks separately. The data in Table 20.3 present the scores (in cm³) for the 15 schizophrenic participants and their twins in columns two and three. The fourth column shows the differences between the twins, with these differences ranked (without regard to sign) in the fifth column. Although the difference for pair 2 is the smallest (most negative) number in column four and would normally be ranked 1, when we drop its sign and look only at the size of the difference, not its direction, it is the ninth-smallest difference. The last column shows the ranks from column five with the sign of the difference applied.

The test statistic (T) is taken as the smaller of the absolute values of the two sums (i.e., dropping the sign) and is evaluated against Table E.7 in Appendix E. (It is important to note that in calculating T we attach algebraic signs to the ranks only for convenience. We could just as easily, for example, circle those ranks that went with lower volume for the normal twin and underline those that went with higher volume for the normal twin. We are merely trying to differentiate between the two cases.)

For the data in Table 20.3 only one of the pairs had the normal twin with a smaller volume than the schizophrenic twin. Although that difference received the rank of 9, it was still only one case. All other pairs showed a difference in the other direction. The sum of the positive ranks is T+ = 111 and the sum of the negative ranks is T− = 9. Because T is defined as the smaller absolute value of T+ and T−, T = 9.

To evaluate T, we refer to Table E.7, a portion of which is shown in Table 20.4.

Table 20.4 Critical Lower-Tail Values of T and Their Associated Probabilities (Abbreviated Version of Table E.7)
(columns: N, then pairs of critical T values with their exact probabilities at each nominal one-tailed α level)

The format of this table is somewhat different from that of the other tables we have seen. The easiest way to understand what the entries in the table represent is by way of an analogy. Suppose that to test the fairness of a coin you are going to flip it eight times and reject the null hypothesis, at α = .05 (one-tailed), if there are too few heads. Out of eight flips of a coin there is no set of outcomes that has a probability of exactly .05 under H₀. The probability of one or fewer heads is .0352, and the probability of two or fewer heads is .1445. Thus if we want to work at α = .05, we can either reject for one or fewer heads, in which case the probability of a Type I error is actually .0352 (less than .05), or we can reject for two or fewer heads, in which case the probability of a Type I error is actually .1445 (much greater than .05).

Do you see where we are going? The same kind of problem arises with T because it has a discrete distribution. No value has a probability of exactly the desired α. In Table E.7 we find that for a one-tailed test at α = .025 (or a two-tailed test at α = .05) with n = 15 the entries are 25 [.0240] and 26 [.0277]. This tells us that if we want to work at a one-tailed α = .025 (and thus a two-tailed α = .05), we can reject H₀ either for T ≤ 25 (in which case α actually equals .0240) or for T ≤ 26 (in which case the true value of α is .0277). Because we want a two-tailed test, the probabilities should be doubled, to 25 [.0480] and 26 [.0554]. We obtained a T value of 9, so we would reject H₀ whichever cutoff we choose. We will conclude, therefore, that the volume of the left hippocampus is not the same for schizophrenic and normal participants. We can see from the data that the left hippocampus is generally smaller in those suffering from schizophrenia. This is a very important finding, if only in that it demonstrates that there is a physical basis underlying schizophrenia, not simply mistaken ways of living.
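A sketch of the same test in Python follows. The difference scores below are placeholders, not the actual Table 20.3 values; they merely mimic the pattern of 14 positive differences and 1 negative one. For a two-sided test, scipy's wilcoxon reports the smaller rank sum, which is the T defined above:

import numpy as np
from scipy import stats

# Placeholder differences (normal minus schizophrenic twin, cm^3) -- NOT
# the actual Table 20.3 values, just 14 positive and 1 negative difference.
diffs = np.array([0.67, 0.19, 0.45, 0.30, 0.25, 0.13, 0.26, 0.50,
                  -0.20, 0.33, 0.14, 0.42, 0.23, 0.10, 0.60])

# T by hand: rank the absolute differences, reattach the signs,
# and sum the positive and negative ranks separately.
ranks = stats.rankdata(np.abs(diffs))
T_plus = ranks[diffs > 0].sum()
T_minus = ranks[diffs < 0].sum()
print(T_plus, T_minus, min(T_plus, T_minus))

# scipy's statistic is the same smaller rank sum, T, with a p value.
T, p = stats.wilcoxon(diffs)
print(T, p)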

Ties

Ties can occur in the data in two different ways. One way would be for a twin pair to have the same scores for both the normal and the schizophrenic twin, leading to a difference score of zero, which has no sign. In that case we normally eliminate that pair from consideration and reduce the sample size accordingly, although this leads to some bias in the test. We could also have tied difference scores that lead to tied rankings. If both tied scores have the same sign, we can break the tie in any way we want (or assign tied ranks) without affecting the final outcome. If the scores have opposite signs, we normally assign tied ranks and proceed as usual.

The Normal Approximation

Just as with the Mann-Whitney test, when the sample size is too large (in this case, larger than 50, which is the limit for Table E.7), a normal approximation is available to evaluate T. For larger sample sizes we know that the sampling distribution of T is approximately normal with

Mean = n(n + 1)/4

and

Standard error = √[n(n + 1)(2n + 1)/24]

Thus we can calculate z as

z = (T − n(n + 1)/4) / √[n(n + 1)(2n + 1)/24]

and evaluate z using Table E.10. The procedure is directly analogous to that used with the Mann-Whitney test and will not be repeated here.

Frank Wilcoxon (1892–1965)

Frank Wilcoxon is an interesting person in statistics for the simple reason that he was not really a statistician and didn't publish any statistical work until he was in his 50s. He was originally trained in inorganic chemistry and spent most of his life doing chemical research dealing with insecticides and fungicides. Wilcoxon had been in a statistical study group with W. J. Youden, an important early figure in statistics, and they had worked their way through Fisher's very influential text. But when it came to analyzing data in later years, Wilcoxon was not satisfied with Fisher's method of randomization of observations. Wilcoxon hit upon the idea of substituting ranks for raw scores, which allowed him to work out the distribution of various test statistics quite easily. His use of ranks stimulated work on rank-based inference and led to a number of related statistical tests applied to ranks. Wilcoxon officially retired in 1957 but then joined Florida State University and worked on sequential ranking methods until his death. His name is still largely synonymous with rank-based statistics.

20.3 Kruskal-Wallis One-Way Analysis of Variance

The Kruskal-Wallis one-way analysis of variance is a direct generalization of the Mann-Whitney test to the case in which we have three or more independent groups. As such, it is the distribution-free analogue of the one-way analysis of variance discussed in Chapter 16. It tests the hypothesis that all samples were drawn from identical populations and is particularly sensitive to differences in central tendency.

Definition

Kruskal-Wallis one-way analysis of variance: A nonparametric test analogous to a standard one-way analysis of variance.

To perform the Kruskal-Wallis test, we simply rank all scores without regard to group membership and then compute the sum of the ranks for each group. The sums are denoted by R_j. If the null hypothesis were true, we would expect the R_j's to be more or less equal (aside from differences due to the sizes of the samples). A measure of the degree to which the R_j's differ from one another is provided by

H = [12 / (N(N + 1))] Σ (R_j² / n_j) − 3(N + 1)

where

n_j = the number of observations in the jth group
R_j = the sum of the ranks in the jth group
N = Σ n_j = total sample size

and the summation is taken over all k groups. H is then evaluated against the χ² distribution on k − 1 df.

Students frequently have problems with a statement such as "H is then evaluated against the χ² distribution on k − 1 df." All that it really means is that we treat H as if it were a value of χ² and look it up in the chi-square tables on k − 1 df.

For an example, assume that the data in Table 20.5 represent the number of simple arithmetic problems (out of 85) solved (correctly or incorrectly) in one hour by participants given a depressant drug, a stimulant drug, or a placebo. Notice that in the Depressant group three of the participants were too depressed to do much of anything, and in the Stimulant group three of the participants ran up against the limit of 85 available problems. These data are decidedly nonnormal, so we will convert the data to ranks and use the Kruskal-Wallis test. The calculations are shown in the lower part of the table.

Table 20.5 Kruskal-Wallis Test Applied to Data on Problem Solving
(columns: Score and Rank for the Depressant, Stimulant, and Placebo groups, with the sum of the ranks, R_i, for each group; the lower part of the table computes H = [12/(N(N + 1))] Σ (R_i²/n_i) − 3(N + 1) = 10.36)

The obtained value of H is 10.36, which can be treated as a χ² on k − 1 = 2 df. The critical value of χ² on 2 df is found in Table E.1 in the Appendices to be 5.99. Because 10.36 > 5.99, we can reject H₀ and conclude that the three drugs lead to different rates of performance. (Like other chi-square tests, this test rejects H₀ for large values of H. It is nonetheless a nondirectional test.)
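As a sketch of how this test might be run in Python (the scores are placeholders, not the actual Table 20.5 data, although they mimic its pattern of floor and ceiling scores):

from scipy import stats

# Placeholder problem counts -- NOT the actual Table 20.5 data.
depressant = [4, 19, 20, 2, 35, 1, 3]
stimulant = [85, 85, 85, 60, 72, 79, 68]
placebo = [41, 38, 50, 55, 46, 44, 49]

# scipy ranks all scores across the groups and computes H
# (with a correction for tied ranks), plus a chi-square-based p value.
H, p = stats.kruskal(depressant, stimulant, placebo)
print(H, p)    # compare H to chi-square on k - 1 = 2 df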

20.4 Friedman's Rank Test for k Correlated Samples

The last test to be discussed in this chapter is the distribution-free analogue of the one-way repeated-measures analysis of variance: Friedman's rank test for k correlated samples. It was developed by the well-known economist Milton Friedman, in the days before he was a well-known economist. This test is closely related to a standard repeated-measures analysis of variance applied to ranks instead of raw scores. It is a test on the null hypothesis that the scores for each treatment were drawn from identical populations, and it is especially sensitive to population differences in central tendency.

Definition

Friedman's rank test for k correlated samples: A nonparametric test analogous to a standard one-way repeated-measures analysis of variance.

We will base our example on a study by Foertsch and Gernsbacher (1997), who investigated the substitution of the genderless word "they" for "he" or "she." With the decrease in the acceptance of the word "he" as a gender-neutral pronoun, many writers are using the grammatically incorrect "they" in its place. (You may have noticed that in this text I have very deliberately used the less-expected pronoun, such as "he" for nurse and "she" for professor, to make the point that profession and gender are not linked. You may also have noticed that you sometimes stumbled over some of those sentences, taking longer to read them. That is what Foertsch and Gernsbacher's study was all about.)

Foertsch and Gernsbacher asked participants to read sentences like "A truck driver should never drive when sleepy, even if (he/she/they) may be struggling to make a delivery on time, because many accidents are caused by drivers who fall asleep at the wheel." On some trials the words in parentheses were replaced by the gender-stereotypic expected pronoun, sometimes by the gender-stereotypic unexpected pronoun, and sometimes by "they." For our purposes the dependent variable will be taken as the difference in reading time between sentences with unexpected pronouns and sentences with "they." There were three kinds of sentences in this study: those in which the expected pronoun was male, those in which it was female, and those in which it could equally be male or female. There are several dependent variables I could use from this study, but I have chosen the effect of seeing "she" when expecting "he," the effect of seeing "he" when expecting "she," and the effect of seeing "they" when the expectation is neutral. (The original study is more complete than this.) The dependent variable is the reading time per character (in milliseconds). The data in Table 20.6 have been created to have roughly the same medians as the authors report.

Table 20.6 Data on Reading Times as a Function of Pronoun
(reading time per character for each participant under the three conditions: Expect He/See She, Expect She/See He, and Neutral/See They)

Here we have repeated measures on each participant, because each participant was presented with each kind of sentence. Some people read anything more slowly than others, and that is reflected in the raw data. The data are far from normally distributed, which is why I am applying a distribution-free test.

For Friedman's test the data are ranked within each participant from low to high. If it is easier to read neutral sentences with "they" than sentences with an unexpected pronoun, then the lowest ranks for each participant should pile up in the Neutral category. The ranked data form a table with one row per participant and the sum of the ranks for each condition at the bottom.

If the null hypothesis were true, we would expect the rankings to be randomly distributed within each participant. Thus one participant might do best on sentences with an expected "he," another might do best with an expected "she," and a third might do best with an expected "they." If this were the case, the sums of the rankings for the conditions would be approximately equal. On the other hand, if neutral sentences with "they" are easiest, then most participants would have their lowest ranking under that condition, and the sums of the rankings for the three conditions would be decidedly unequal.

To apply Friedman's test, we rank the raw scores for each participant separately and then sum the rankings for each condition. We then evaluate the variability of the sums by computing

χ²_F = [12 / (Nk(k + 1))] Σ R_j² − 3N(k + 1)

where

R_j = the sum of the ranks for the jth condition
N = the number of participants
k = the number of conditions

and the summation is taken over all k conditions. This value of χ²_F can be evaluated against the standard χ² distribution on k − 1 df.

For the data in Table 20.6, χ²_F computed in this way exceeds the critical value of χ² on k − 1 = 2 df, which is 5.99, so we can reject H₀ and conclude that reading times are not independent of conditions. People can read a neutral sentence with "they" much faster than they can read sentences in which the gender of the pronoun conflicts with the expected gender. From additional data that Foertsch and Gernsbacher present, it is clear that "they" is easier to read than the wrong-gender pronoun, but harder than the expected-gender pronoun.
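A sketch of the same analysis in Python, again with placeholder reading times rather than the actual Table 20.6 values:

from scipy import stats

# Placeholder reading times (ms per character) for 5 participants -- NOT
# the actual Table 20.6 data.  Each list is one condition, measured on the
# same participants, which is what friedmanchisquare expects.
expect_he_see_she = [52, 61, 49, 58, 55]
expect_she_see_he = [50, 58, 47, 56, 53]
neutral_see_they = [41, 47, 40, 45, 42]

# scipy ranks the three scores within each participant and computes chi^2_F.
chi2_F, p = stats.friedmanchisquare(expect_he_see_she,
                                    expect_she_see_he,
                                    neutral_see_they)
print(chi2_F, p)    # compare to chi-square on k - 1 = 2 df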

20.5 Measures of Effect Size

Measures of effect size are difficult to find for distribution-free statistical tests.¹ An important reason for this is that many of our effect-size measures are based on the size of the standard deviation, and if the data are very badly (nonnormally)

¹ Conover (1980) discusses the use of confidence intervals for nonparametric procedures.


More information

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST EPS 625 INTERMEDIATE STATISTICS The Friedman test is an extension of the Wilcoxon test. The Wilcoxon test can be applied to repeated-measures data if participants are assessed on two occasions or conditions

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES STATISTICAL SIGNIFICANCE OF RANKING PARADOXES Anna E. Bargagliotti and Raymond N. Greenwell Department of Mathematical Sciences and Department of Mathematics University of Memphis and Hofstra University

More information

Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

More information

1 Nonparametric Statistics

1 Nonparametric Statistics 1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

More information

Introduction to Hypothesis Testing OPRE 6301

Introduction to Hypothesis Testing OPRE 6301 Introduction to Hypothesis Testing OPRE 6301 Motivation... The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test 1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Chapter G08 Nonparametric Statistics

Chapter G08 Nonparametric Statistics G08 Nonparametric Statistics Chapter G08 Nonparametric Statistics Contents 1 Scope of the Chapter 2 2 Background to the Problems 2 2.1 Parametric and Nonparametric Hypothesis Testing......................

More information

NAG C Library Chapter Introduction. g08 Nonparametric Statistics

NAG C Library Chapter Introduction. g08 Nonparametric Statistics g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics J. Lozano University of Goettingen Department of Genetic Epidemiology Interdisciplinary PhD Program in Applied Statistics & Empirical Methods Graduate Seminar in Applied Statistics

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate 1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U Previous chapters of this text have explained the procedures used to test hypotheses using interval data (t-tests and ANOVA s) and nominal

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Lab 11. Simulations. The Concept

Lab 11. Simulations. The Concept Lab 11 Simulations In this lab you ll learn how to create simulations to provide approximate answers to probability questions. We ll make use of a particular kind of structure, called a box model, that

More information

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

More information

Comparing the Means of Two Populations: Independent Samples

Comparing the Means of Two Populations: Independent Samples CHAPTER 14 Comparing the Means of Two Populations: Independent Samples 14.1 From One Mu to Two Do children in phonics-based reading programs become better readers than children in whole language programs?

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Nonparametric statistics and model selection

Nonparametric statistics and model selection Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics

Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics Analysis of Data Claudia J. Stanny PSY 67 Research Design Organizing Data Files in SPSS All data for one subject entered on the same line Identification data Between-subjects manipulations: variable to

More information

Stat 5102 Notes: Nonparametric Tests and. confidence interval

Stat 5102 Notes: Nonparametric Tests and. confidence interval Stat 510 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 003 This handout gives a brief introduction to nonparametrics, which is what you do when you don t believe the assumptions

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Means, standard deviations and. and standard errors

Means, standard deviations and. and standard errors CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

More information

The Kruskal-Wallis test:

The Kruskal-Wallis test: Graham Hole Research Skills Kruskal-Wallis handout, version 1.0, page 1 The Kruskal-Wallis test: This test is appropriate for use under the following circumstances: (a) you have three or more conditions

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

individualdifferences

individualdifferences 1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

More information

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST UNDERSTANDING THE DEPENDENT-SAMPLES t TEST A dependent-samples t test (a.k.a. matched or paired-samples, matched-pairs, samples, or subjects, simple repeated-measures or within-groups, or correlated groups)

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus

More information

Non-Inferiority Tests for One Mean

Non-Inferiority Tests for One Mean Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random

More information

Statistics. One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples

Statistics. One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples Statistics One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples February 3, 00 Jobayer Hossain, Ph.D. & Tim Bunnell, Ph.D. Nemours

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

3. Mathematical Induction

3. Mathematical Induction 3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Skewed Data and Non-parametric Methods

Skewed Data and Non-parametric Methods 0 2 4 6 8 10 12 14 Skewed Data and Non-parametric Methods Comparing two groups: t-test assumes data are: 1. Normally distributed, and 2. both samples have the same SD (i.e. one sample is simply shifted

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 Introduction to Discrete Probability Probability theory has its origins in gambling analyzing card games, dice,

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information