Chapter 2 Permutation Tests

Permutation tests are similar to rank tests, except that we use the observations directly without replacing them by ranks.

2.1 The two-sample location problem

Assumptions:
- $x_1, \ldots, x_m$ are observations on i.i.d. r.vs $X_1, \ldots, X_m$ with a c.d.f. $F_1$,
- $y_1, \ldots, y_n$ are observations on i.i.d. r.vs $Y_1, \ldots, Y_n$ with a c.d.f. $F_2$.

The null hypothesis takes the same form as in the MWW test,

$$H_0 : F_1(x) = F_2(x) \text{ for all } x,$$

and the possible alternative hypotheses are as before:
- $H_1 : F_1(x) \geq F_2(x)$ with strict inequality for at least one $x$,
- $H_1 : F_1(x) \leq F_2(x)$ with strict inequality for at least one $x$,
- $H_1 :$ either $F_1(x) \geq F_2(x)$ or $F_1(x) \leq F_2(x)$, with strict inequality for at least one $x$.

For this problem one of the suitable test statistics is

$$T = \bar{X} - \bar{Y}. \tag{2.1}$$

Other possibilities are:
- Difference between sample medians, i.e., $T = Me_1 - Me_2$
- Difference between sample trimmed means, i.e., $T = \bar{X}_t - \bar{Y}_t$
- Any monotonic function of $T$, say $g(T)$; for example $g(T) = \frac{T - a}{b}$, $b > 0$

Null distribution of $T$: Under $H_0$ all of the $m + n$ observations form a single random sample, and every selection of $m$ observations out of $m + n$ is equally likely. There are ${}^{m+n}C_m$ such selections, and every selection of $m$ observations gives a value of $T$, say $t$. Hence, the null distribution of $T$ is given by

$$P(T = t) = \frac{\#(t, m, n)}{{}^{m+n}C_m}, \tag{2.2}$$

where $\#(t, m, n)$ denotes the number of subsets for which $T = t$. The p-value is

$$P(T \geq t_0) = \frac{\#(t \geq t_0, m, n)}{{}^{m+n}C_m},$$

where $t_0$ is obtained from the sample and $\#(t \geq t_0, m, n)$ denotes the number of subsets for which $T \geq t_0$.

Example: visual acuity (McClave and Dietrich, 1988)

In a comparison of visual acuity (VA) of deaf (D) and hearing (H) children, eye movement rates were taken on eight deaf and ten hearing children. A clinical psychologist believes that deaf children have greater visual acuity than hearing children. The larger a child's eye movement rate, the more visual acuity the child possesses. Test the psychologist's claim using the data given below.

VA
Sample  D D D D D D D D H H H H H H H H H H

If there is no difference between the two groups with respect to VA, then both samples come from a population with a common distribution. Hence the null hypothesis is

$$H_0 : F_D(x) = F_H(x) \text{ for all } x$$

against the alternative

$$H_1 : F_D(x) \leq F_H(x) \text{ with strict inequality for at least one } x.$$
We will use the permutation test statistic (2.1) to test this hypothesis. A GenStat program calculating the value of the statistic $T = \bar{D} - \bar{H}$ and simulating the null distribution of $T$ will be shown at the lectures.
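The counting argument behind (2.2) can be sketched in a few lines of Python. This is only an illustration and not the GenStat program used in the lectures: the function names `exact_two_sample_pvalue` and `simulated_two_sample_pvalue` are hypothetical, the toy data are made up, and the upper-tail p-value $P(T \geq t_0)$ is assumed.

```python
import random
from itertools import combinations
from math import comb


def exact_two_sample_pvalue(x, y):
    """Return (t0, p) where p = P(T >= t0) for T = mean(x) - mean(y).

    Enumerates all C(m+n, m) selections, so it suits small samples only.
    """
    pooled = list(x) + list(y)
    m, n = len(x), len(y)
    total = sum(pooled)
    t0 = sum(x) / m - sum(y) / n
    count = 0
    for idx in combinations(range(m + n), m):
        sx = sum(pooled[i] for i in idx)
        t = sx / m - (total - sx) / n
        if t >= t0 - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return t0, count / comb(m + n, m)


def simulated_two_sample_pvalue(x, y, reps=10000, seed=0):
    """Approximate the same p-value from `reps` random relabellings."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    m, n = len(x), len(y)
    t0 = sum(x) / m - sum(y) / n
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        t = sum(pooled[:m]) / m - sum(pooled[m:]) / n
        if t >= t0 - 1e-12:
            hits += 1
    return hits / reps


# Hypothetical toy data (NOT the visual acuity data).
x = [5.0, 7.0, 9.0]
y = [1.0, 2.0, 3.0, 4.0]
t0, p_exact = exact_two_sample_pvalue(x, y)
p_sim = simulated_two_sample_pvalue(x, y)
print(t0, p_exact, p_sim)
```

For the toy data only the observed selection attains $T \geq t_0$, so the exact p-value is $1/35$. With $m = 8$ and $n = 10$, as in the visual acuity example, there are ${}^{18}C_8 = 43758$ selections, so full enumeration is still feasible; for larger samples the simulated version is the practical choice.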
2.2 Test for independence of bivariate data

Let observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a realization of i.i.d. r.vs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ with a c.d.f. $F_{X,Y}$. As in the rank test of independence, here too, we may state the hypotheses as:

$H_0$ : There is no association between r.vs $X$ and $Y$
$H_1$ : There is an association between r.vs $X$ and $Y$.

A suitable test statistic for a permutation test is based on the sample correlation coefficient

$$\hat{\rho} = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}.$$

For a given sample, $n \bar{x} \bar{y}$ and the denominator are constant for all permutations of the data. The only part of the coefficient sensitive to changes due to permutations is the sum $\sum_{i=1}^{n} x_i y_i$. Hence, the function

$$V_p = \sum_{i=1}^{n} X_i Y_i \tag{2.3}$$

may be used as a test statistic for the hypothesis of independence. We get the distribution of $V_p$ similarly to that for the rank statistic $V$. It involves calculating values of $V_p$ for all $n!$ ways of pairing $y_i$ with $x_i$.

2.3 Matched pairs

As in the Wilcoxon signed rank test, let $(y_1, \ldots, y_n)$ and $(z_1, \ldots, z_n)$ be a realization of r.vs $(Y_1, \ldots, Y_n)$ and $(Z_1, \ldots, Z_n)$ such that the $Y$'s and $Z$'s do not have to be independent. Then we analyze the differences $X_i = Y_i - Z_i$, which should be symmetrically distributed about zero if there is no difference between $Y$ and $Z$. The null and alternative hypotheses have the same form as in the rank test, that is

$H_0$ : $X_i$ are symmetrically distributed about zero
$H_1$ : The center of the distribution is not zero
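As an aside on the independence test of Section 2.2, the $n!$ pairings behind $V_p$ in (2.3) can be enumerated directly when $n$ is small. The following is a minimal sketch under stated assumptions: the function name `exact_independence_pvalue` is hypothetical, the data are made up, and an upper-tail p-value $P(V_p \geq v_0)$ (evidence of positive association) is assumed, since the text does not fix the tail.

```python
from itertools import permutations
from math import factorial


def exact_independence_pvalue(x, y):
    """Return (v0, p) where p = P(V_p >= v0) over all n! pairings of y with x."""
    n = len(x)
    v0 = sum(xi * yi for xi, yi in zip(x, y))
    count = 0
    for perm in permutations(y):
        v = sum(xi * yi for xi, yi in zip(x, perm))
        if v >= v0:
            count += 1
    return v0, count / factorial(n)


# Hypothetical toy data; integers keep the comparison v >= v0 exact.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
v0, p = exact_independence_pvalue(x, y)
print(v0, p)
```

Here $v_0 = 29$ and four of the $24$ pairings give $V_p \geq 29$, so the p-value is $4/24$. Because $n!$ grows very quickly, enumeration is only practical for small $n$; otherwise a random sample of pairings would have to be used instead.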
The permutation test statistic is

$$W_p^+ = \sum_{i=1}^{n} \Psi_i x_i, \tag{2.4}$$

where the r.v. $\Psi_i$ is

$$\Psi_i = \begin{cases} 1 & \text{if } x_i > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The null distribution of $W_p^+$ is given by

$$P(W_p^+ = w) = \frac{\#(w, n)}{2^n},$$

where $\#(w, n)$ denotes the number of subsets of the values of the data which give $W_p^+ = w$.

A "learning the mechanics" example will be given at the lectures.
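The $2^n$ equally likely sign assignments behind the null distribution of $W_p^+$ can be sketched as follows. This is an illustration only: the function name `exact_matched_pairs_pvalue` is hypothetical, the differences are made up, zero differences are assumed to be absent, and the upper-tail p-value $P(W_p^+ \geq w_0)$ is assumed.

```python
from itertools import product


def exact_matched_pairs_pvalue(diffs):
    """Return (w0, p) where p = P(W+ >= w0) over all 2^n sign assignments.

    W+ is the sum of the positive differences, as in (2.4).
    """
    n = len(diffs)
    abs_d = [abs(d) for d in diffs]
    w0 = sum(d for d in diffs if d > 0)
    count = 0
    # Each tuple of 0/1 indicators chooses which |x_i| are treated as positive.
    for psi in product((0, 1), repeat=n):
        w = sum(p * a for p, a in zip(psi, abs_d))
        if w >= w0:
            count += 1
    return w0, count / 2 ** n


# Hypothetical differences x_i = y_i - z_i; integers keep comparisons exact.
diffs = [3, 1, -2, 4]
w0, p = exact_matched_pairs_pvalue(diffs)
print(w0, p)
```

For these differences $w_0 = 8$, and three of the $16$ sign assignments give $W_p^+ \geq 8$, so the p-value is $3/16$. With $n = 10$ pairs there are only $2^{10} = 1024$ assignments, so the exact distribution is cheap to compute.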
Example: Coca-Cola Advertising (Newbold, 1988)

The Coca-Cola Company ran a national advertising campaign based on the slogan "Twice the cola, twice the fun". To test whether the campaign had improved brand awareness, random samples of 500 people in each of 10 cities were asked to name five soft drinks, both before and after the campaign had run. The accompanying table shows the numbers naming Coca-Cola. Test the hypothesis that the campaign made no difference to the customers' awareness of the Coca-Cola brand.

City           Before   After
Atlanta
Boston
Chicago
Denver
Los Angeles
Miami
New Orleans
New York
Philadelphia
St. Louis

Calculations in GenStat will be shown during the lecture.
2.4 Rank tests versus permutation tests

There are some similarities and some differences between the two kinds of nonparametric tests. Which one to choose depends on the given hypothesis-testing problem.

Similarities. Both tests are:
- nonparametric: the underlying distribution does not need to be assumed to belong to any particular family of distributions,
- based on general principles of statistical hypothesis testing,
- exact: for large samples we can get close to the p-value by taking enough simulations,
- non-informative about the estimates of the parameters.

Differences

Criterion                 Rank tests                            Permutation tests
Robustness                Not sensitive to outliers             Quite sensitive to outliers
Power                     Less powerful; loss of information    More powerful; for large n comparable
                          by using ranks                        to parametric tests
Computational complexity  Null distribution depends only on     Null distribution has to be calculated
                          the sample size; critical values      for each data set
                          can be tabulated
Asymptotic normality      Usually asymptotically normal         In general, not asymptotically normal
                          (under H_0)
Ties                      Slight problem                        No problem