Biometrika (1982), 69, 2, pp. 377-82
Printed in Great Britain

Two modified Wilcoxon tests for symmetry about an unknown location parameter

BY P. K. BHATTACHARYA
Department of Statistics, University of California, Davis, California, U.S.A.

J. L. GASTWIRTH
Department of Statistics, George Washington University, Washington D.C., U.S.A.

AND A. L. WRIGHT
Department of Mathematics, University of Arizona, Tucson, Arizona, U.S.A.

SUMMARY
Two modifications of the Wilcoxon test for symmetry about the sample median are proposed and their large sample distribution theory derived. Their Pitman efficacies against contamination in the right tail are obtained. Both the asymptotic results and a Monte Carlo study show that the procedures are more powerful than the Wilcoxon test about the median against many asymmetric alternatives while preserving its robustness of level shown by Antille & Kersting (1977).

Some key words: Contamination; Modified Wilcoxon test for symmetry about sample median; Monte Carlo study; Pitman efficacy; Robustness of level.

1. INTRODUCTION
The availability of robust estimators of the location parameter, $\theta$, for families of symmetric distributions (Andrews et al., 1972, pp. 5, 6) highlights the need for tests of symmetry when $\theta$ is unknown. For this we desire a test which is approximately distribution free for a large class of symmetric densities rather than a goodness-of-fit test. Nonparametric tests about an estimated $\theta$ are a class of potentially suitable tests. When $\theta$ is estimated by the sample mean, however, Gastwirth (1971) and M. E. B. Owens in a George Washington University thesis showed that the tests lose their distribution-free character. Later, Antille & Kersting (1977) found that the asymptotic variance of the Wilcoxon test about the sample median did not vary greatly over a wide family of symmetric densities, while Monte Carlo results of R. W. Resek, in an unpublished conference paper, showed that this test had low power.
This article presents two modifications, similar to those of Gastwirth (1965), of the Wilcoxon test about the median which place more weight on the portion of the sample which is affected by the asymmetry. The level of the modified tests is as robust as that of the original test, while their power is larger against most of the asymmetric alternatives examined. The test statistics are defined in § 2 and their asymptotic properties are given in § 3. The results of a Monte Carlo study showed that the tests improved on the Wilcoxon test but are not so powerful as a test of normality as the one using $\sqrt{b_1}$. The details of the proofs and of the simulation which are omitted from the article are given in a technical report by the present authors.

2. FORMAL STATEMENT OF THE PROBLEM AND THE TEST STATISTICS
The problem is to test whether a density function $f(x)$ is symmetric about an unknown median or is skew. The primary alternative is skewness resulting from contamination in the right tail. If $F(x)$ denotes the cumulative distribution function and $\nu$ its median, then we are testing the null hypothesis

$$H_0: F(\nu - x) = 1 - F(\nu + x) \quad \text{versus} \quad H_1: F(\nu - x) \neq 1 - F(\nu + x).$$

If we let $f_0 = F_0'$, symmetric about $\nu$, the alternative considered has a density

$$f_\varepsilon(x) = (1 - \varepsilon) f_0(x) + \varepsilon f_1(x) \quad (0 < \varepsilon < 1),$$

where $f_1$ is a density with location parameter $\mu > \nu$.

To define the test statistics we fix $p$ ($0 < p < 1$, $q = 1 - p$) and introduce the following notation for quantiles of interest and related parameters:

[display lost in the source; the quantiles $\eta$ and $\zeta$ are those satisfying $F(\eta) = 1 - F(\zeta) = \tfrac12 q$, as restated in § 3]

If we let $X_1, \ldots, X_N$ with $N = 2n + 1$ be a random sample from $F(x)$ and $X_{(1)}, \ldots, X_{(N)}$ be the ordered sample, the sample quantiles corresponding to $\nu$, $\eta$ and $\zeta$ are $M_N = X_{(n+1)}$, $X_{(nq+1)}$ and $X_{(N-nq)}$ respectively, where we treat $nq$ as an integer to avoid unnecessary notational complexity.

The first statistic is a modified Wilcoxon test in the form considered by Antille & Kersting (1977). With auxiliary notation indexed by $i = 1, \ldots, nq$ [definition illegible in the source], and with $\chi(u)$ defined to be 1, $\tfrac12$ or 0 according as $u$ is less than, equal to or greater than zero, this statistic is

[display (2.1), defining $T_N$, lost in the source]

This statistic compares the upper $\tfrac12 q$ fraction of the data with the lower $\tfrac12 q$ fraction.

Another way of focusing on this portion of the data is based on the distance between these order statistics and the median. Formally, we define

[display defining the $Y$'s and $Z$'s lost in the source]

If we consider the $Y$'s and $Z$'s as one sample and let $R_1, \ldots, R_n$ and $R_{n+1}, \ldots, R_{2n}$ denote the ranks of the $Y$'s and $Z$'s, respectively, the general linear rank statistic for symmetry (Hajek & Sidak, 1967, p. 108; Jureckova, 1971) is of the form

[display (2.2) lost in the source]

With $\phi(u) = 0$ for $u \le p$ and $\phi(u) = u - p$ for $u > p$, where $p = 1 - q$, (2.2) has the form

[display (2.3), defining $S_N$, lost in the source]

where $x^+ = \max(x, 0)$.
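
The quantities of this section can be illustrated numerically. The following Python sketch is not the authors' code. It picks out the three sample quantiles $M_N = X_{(n+1)}$, $X_{(nq+1)}$ and $X_{(N-nq)}$, and evaluates an unnormalized signed-rank sum with scores $\phi(u) = (u - p)^+$. Several choices are assumptions made for the illustration: $nq$ is rounded to the nearest integer, the ranks are midranks of $|X_i - M_N|$ over all $N$ observations with scores $\phi\{R_i/(N+1)\}$ as described in § 3, an observation equal to the median gets sign 0, and no normalizing constant is applied, so the value is not claimed to equal $S_N$ of (2.3). All function names are hypothetical.

```python
from statistics import median


def sample_quantiles(x, q):
    """Return (X_(nq+1), M_N, X_(N-nq)) for a sample of odd size N = 2n + 1.

    Assumption: nq is rounded to the nearest integer.
    """
    xs = sorted(x)
    big_n = len(xs)
    n = (big_n - 1) // 2
    k = int(round(n * q))
    # 1-based order statistics nq+1, n+1, N-nq become 0-based k, n, N-k-1.
    return xs[k], xs[n], xs[big_n - k - 1]


def phi(u, p):
    """Score function phi(u) = (u - p)^+."""
    return max(u - p, 0.0)


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def signed_rank_sum(x, q):
    """Unnormalized sum of sign(X_i - M_N) * phi(R_i / (N + 1)), with p = 1 - q.

    Assumption: R_i is the midrank of |X_i - M_N| among all N observations.
    """
    p = 1.0 - q
    m = median(x)
    ranks = midranks([abs(v - m) for v in x])
    total = 0.0
    for v, r in zip(x, ranks):
        sign = (v > m) - (v < m)
        total += sign * phi(r / (len(x) + 1), p)
    return total
```

For a sample that is exactly symmetric about its median the sum is zero, while a long right tail makes it positive, which is the direction of the contamination alternatives considered here.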

3. ASYMPTOTIC PROPERTIES OF THE MODIFIED WILCOXON TESTS
In this section we obtain the asymptotic distributions and Pitman efficacies of both modified versions of the Wilcoxon test statistic. The null distribution $F$ will be assumed to be symmetric about zero because the statistics $T$ and $S$ are both translation invariant. To obtain the limit law of $T_N$, define

[display lost in the source; it defines constants, including $c_0$ and $c_3$, in terms of $q$, $f(\zeta)$ and integrals involving $f$]

where the integrals are over $(\zeta, \infty)$, and $\eta$ and $\zeta$ are defined by $F(\eta) = 1 - F(\zeta) = \tfrac12 q$.

THEOREM 1. If $F$ has density $f$ which is differentiable at $\eta$ and $\zeta$, then $\{T_N - \mu_1(F)\}\sqrt{N}/\sigma_1(F) \to N(0, 1)$ in distribution, where $\mu_1(F) = \mu_1 = 1 - c_0$ and

[display giving $\sigma_1^2(F)$ lost in the source]

Under $H_0$, this result reduces to the following.

COROLLARY. For symmetric $F$, $\mu_1 = \tfrac12$ and

$$\sigma_1^2 = \sigma_1^2(0) = (3q)^{-1} + q^{-1}\Bigl[1 - 4\{q f(\zeta)\}^{-1} \int_\zeta^\infty f^2(x)\,dx\Bigr]^2. \qquad (3.1)$$

The theorem is proved by arguing conditionally on the normalized sample quantiles $X_{(nq+1)}$ and $X_{(N-nq)}$ and removing the conditioning. The details are given in the original report.

The asymptotic efficacy of tests based on $T_N$ in (2.1) against contamination alternatives $F_\varepsilon = (1 - \varepsilon)F_0 + \varepsilon F_1$ $(0 < \varepsilon < 1)$ is obtained by differentiating $\mu_1(\varepsilon) = \mu_1(F_\varepsilon)$ and noting that $\sigma_1^2(\varepsilon) = \sigma_1^2(F_\varepsilon)$ is continuous on the right at $\varepsilon = 0$. Specifically, we have the following.

COROLLARY. The Pitman efficacy $\mu_1'(0)^2/\sigma_1^2(0)$ of tests based on $T_N$ against contamination alternatives is

[display lost in the source; its legible terms are $\int f_0^2(x)\,dx$ and $\int \{F_1(-x) f_0(x) + F_0(-x) f_1(x)\}\,dx$]

The statistic $S_N$ given by (2.2) can be expressed in the notation of Hajek & Sidak (1967,
p. 108) for tests of symmetry about $\Delta$ by defining

[display (3.2) lost in the source]

where the sign attached to $X_i$ is $-1$ when $X_i < \Delta$ [the rest of this definition is illegible in the source], $R_i$ is the rank of $|X_i - \Delta|$, and $a_N(i) = \phi\{i/(N+1)\}$, where $\Delta = M_N$, the sample median. The asymptotic null distribution of any rank test statistic of symmetry of the form (3.2) about the sample median is obtained by using the Bahadur (1966) representation of the sample median in conjunction with the methods of Hajek & Sidak (1967) and is given in Theorem 2.

THEOREM 2. If $f$ is a symmetric density, differentiable at its median 0, and $S_N$ is a linear rank statistic of the form (3.2) with $\Delta = M_N$, that is $S_N = [S_{\Delta N}]_{\Delta = M_N}$, then it is asymptotically normal with mean 0 and variance $\sigma_2^2$, where

$$\sigma_2^2 = \int_0^\infty \phi^2\{H(x)\}\,dH(x) + b_+^2 \{4 f^2(0)\}^{-1} - b_+ \{f(0)\}^{-1} \int_0^\infty \phi\{H(x)\}\,dH(x),$$

with

$$H(x) = F(x) - F(-x), \qquad b_+ = \int_0^1 \phi(u)\,\phi^+(u, f)\,du.$$

For the $\phi$ function defining $S_N$ in (2.3) we have the following.

COROLLARY. Under $H_0$, the second modified Wilcoxon statistic $S_N$ is asymptotically normal with mean 0 and variance

[display (3.3) lost in the source]

where $\zeta$ satisfies $\int f(x)\,dx = \tfrac12 q$, with the integral over $(\zeta, \infty)$.

Table 1. Asymptotic variances of modified Wilcoxon test statistics

                              Test T                        Test S
Density               q=1.00   q=0.50   q=0.25     q=1.00   q=0.50   q=0.25
Normal                0.505    0.754    1.41       0.126    0.0394   0.00600
Logistic              0.444    0.691    1.34       0.111    0.0330   0.00491
Double exponential    0.333    0.667    1.33       0.0833   0.0260   0.00423
t_7                   0.449    0.693    1.34       0.112    0.0334   0.00494
t_3                   0.396    0.667    1.35       0.0990   0.0286   0.00434
Cauchy                0.333    0.816    1.74       0.0833   0.0274   0.00458

The cases $q = 1$ correspond to the original Wilcoxon test with centre estimated by the median. In these cases the asymptotic variance of $S$ is one-fourth that of $T$ owing to the different normalizations. Similarly, the asymptotic variance of the version analysed by Antille & Kersting (1977) is one-sixteenth of that of $T$.
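
The entries of Table 1 for densities with closed-form tail integrals can be checked numerically. The sketch below is an illustration and not the authors' computation. For test $T$ it evaluates (3.1), $\sigma_1^2 = (3q)^{-1} + q^{-1}[1 - 4\{q f(\zeta)\}^{-1} I]^2$ with $I = \int_\zeta^\infty f^2(x)\,dx$. For test $S$ it uses $q^3/3 - 2q^2 I/f(0) + 4I^2/f^2(0)$, obtained here by substituting $\phi(u) = (u - p)^+$ into the variance of Theorem 2 under the assumption that $\phi^+(u, f) = -f'(x)/f(x)$ at $x = F^{-1}(\tfrac12 + \tfrac12 u)$, as in Hajek & Sidak (1967), which gives $b_+ = 4I$. That expression should be read as a consistency check and not as the printed form of (3.3). The class and function names are hypothetical.

```python
import math
from statistics import NormalDist


class SymmetricDensity:
    """A density symmetric about 0, described by closed-form callables."""

    def __init__(self, pdf, upper_quantile, tail_sq):
        self.pdf = pdf                        # f(x)
        self.upper_quantile = upper_quantile  # zeta(q), where 1 - F(zeta) = q/2
        self.tail_sq = tail_sq                # integral of f^2 over (z, inf), z >= 0


_ND = NormalDist()


def _logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def _logistic_tail_sq(z):
    # Since f = F(1 - F), the integral of f^2 dx equals that of F(1 - F) dF.
    u = _logistic_cdf(z)
    return 1.0 / 6.0 - (u * u / 2.0 - u ** 3 / 3.0)


NORMAL = SymmetricDensity(
    pdf=_ND.pdf,
    upper_quantile=lambda q: _ND.inv_cdf(1.0 - q / 2.0),
    tail_sq=lambda z: _ND.cdf(-math.sqrt(2.0) * z) / (2.0 * math.sqrt(math.pi)),
)
LOGISTIC = SymmetricDensity(
    pdf=lambda x: _logistic_cdf(x) * (1.0 - _logistic_cdf(x)),
    upper_quantile=lambda q: math.log((2.0 - q) / q),
    tail_sq=_logistic_tail_sq,
)
DOUBLE_EXPONENTIAL = SymmetricDensity(
    pdf=lambda x: 0.5 * math.exp(-abs(x)),
    upper_quantile=lambda q: -math.log(q),
    tail_sq=lambda z: math.exp(-2.0 * z) / 8.0,
)
CAUCHY = SymmetricDensity(
    pdf=lambda x: 1.0 / (math.pi * (1.0 + x * x)),
    upper_quantile=lambda q: math.tan(math.pi * (1.0 - q) / 2.0),
    tail_sq=lambda z: (math.pi / 2.0 - math.atan(z) - z / (1.0 + z * z))
    / (2.0 * math.pi ** 2),
)


def var_t(q, d):
    """Null asymptotic variance of T from (3.1)."""
    zeta = d.upper_quantile(q)
    ratio = 4.0 * d.tail_sq(zeta) / (q * d.pdf(zeta))
    return 1.0 / (3.0 * q) + (1.0 - ratio) ** 2 / q


def var_s(q, d):
    """Null asymptotic variance of S, assuming b_+ = 4 I in Theorem 2."""
    r = d.tail_sq(d.upper_quantile(q)) / d.pdf(0.0)
    return q ** 3 / 3.0 - 2.0 * q * q * r + 4.0 * r * r
```

Calling `var_t(0.5, NORMAL)` or `var_s(0.25, LOGISTIC)` gives values that can be compared with the corresponding rows of Table 1. At $q = 1$ the two expressions satisfy `var_t = 4 * var_s` identically, in line with the remark on normalizations above.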

The limiting efficacy of $S_N$ against contamination alternatives is given by $\mu_2'(0)^2/\sigma_2^2(0)$, where $\mu_2'(0)$ is an integral with upper limit $\infty$ [lower limit illegible in the source] of

$$[\{1 - 2F_1(0)\} f_0(x)/f_0(0) + F_1(x) - F_1(-x)]\, f_0(x)\,dx.$$

Table 1 gives the null asymptotic variances; the modified tests are robust in level. Pitman asymptotic efficacies of these modified tests under contamination of the above symmetric densities $f_0(x)$ by $\sigma^{-1} f_0\{(x - \mu)/\sigma\}$ were computed for selected $\mu$ and $\sigma$, and the modifications were seen to improve the performance in many cases. Details are given in the original report.

4. MONTE CARLO STUDY
This section summarizes a Monte Carlo study which showed that the asymptotic properties of the modified tests hold for moderate sample sizes. Our study was based on 1000 independent samples of 101 observations from each of the symmetric densities (or alternatives). Motivated by the research of Doksum, Fenstad & Aaberge (1977) and R. W. Resek we also included the following tests: (i) the ordinary Wilcoxon test $W$ for symmetry about a known location, (ii) the classical test of skewness $\sqrt{b_1} = m_3/m_2^{3/2}$ with its variance $(\mu_6 - 6\mu_2\mu_4 + 9\mu_2^3)/\mu_2^3$ estimated by the appropriate sample moments, (iii) the David & Johnson (1956) test $J$, a ratio of order statistics [defining display illegible in the source], with $q = 0.04$ and $q = 0.01$.

As the null distributions of both modified tests and $J$ depend on the underlying density functions, we estimated the density and the integrals in (3.1) and (3.3) using the uniform kernel (Tapia & Thompson, 1978, p. 54).

The major findings of our study were as follows.

(a) The David-Johnson test is not robust in level and should not be regarded as a test of symmetry. It is a powerful test of normality.

(b) The classical test, $\sqrt{b_1}$, was level robust in samples of 25, 50 and 101 when the density had several moments, but not for Cauchy data. It had high power, see Table 2, against $\chi^2_{10}$, log normal and the contamination alternatives when the underlying density was 'near normal', for example $t_7$, logistic.

Table 2.
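Number of rejections by various tests of symmetry in 1000 samples of size 101 from asymmetric alternatives

                                                                  Modified Wilcoxon tests
Alternative                                       $\sqrt{b_1}$   q=1.0   q=0.25   q=1.0   q=0.25
10% contaminated normal
  $F_0(\mu) = 0.9$,   $\sigma = 0.5\mu$                38        58      37       50      49
  $F_0(\mu) = 0.9$,   $\sigma = 2\mu$                 261        81     184       73     147
  $F_0(\mu) = 0.975$, $\sigma = 0.5\mu$               332       102     168       96     192
  $F_0(\mu) = 0.975$, $\sigma = 2\mu$                 305        86     253       78     196
$\chi^2_{10}$                                         876       245     466      229     508
Log normal, $\sigma = 0.1$                            271        24      53       19       2
Log normal, $\sigma = 0.4$                            869       412     739      390     870

[The column header is only partly legible in the source: the first column is read as $\sqrt{b_1}$ and the other four as the modified tests at $q = 1.0$ and $q = 0.25$; which pair of columns is $T$ and which is $S$ cannot be recovered.]
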
(c) The modified tests, $S$ and $T$, increased the power of the Wilcoxon test about the median but were not as powerful as $\sqrt{b_1}$ against the contamination alternatives when the underlying density was 'near' normal.

Therefore, in sample sizes of 100 or more, when one is willing to assume the existence of several moments, $\sqrt{b_1}$ is a reliable test of symmetry against skewed or contaminated alternatives. As general tests of symmetry the modified Wilcoxon tests have a greater degree of level robustness for the densities typically considered, but are not as powerful as $\sqrt{b_1}$ over the smaller class of densities for which it is level robust.

This research was supported by the National Science Foundation.

REFERENCES
ANDREWS, D. F., BICKEL, P. J., HAMPEL, F. R., HUBER, P. J., ROGERS, W. H. & TUKEY, J. W. (1972). Robust Estimates of Location. Princeton University Press.
ANTILLE, A. & KERSTING, G. (1977). Tests for symmetry. Z. Wahr. verw. Geb. 39, 235-55.
BAHADUR, R. R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-80.
DAVID, F. N. & JOHNSON, N. L. (1956). Some tests of significance with ordered variables. J. R. Statist. Soc. B 18, 1-20.
DOKSUM, K. A., FENSTAD, G. & AABERGE, R. (1977). Plots and tests for symmetry. Biometrika 64, 473-87.
GASTWIRTH, J. L. (1965). Percentile modifications of two sample rank tests. J. Am. Statist. Assoc. 60, 1127-41.
GASTWIRTH, J. L. (1971). On the sign test for symmetry. J. Am. Statist. Assoc. 66, 821-23.
HAJEK, J. & SIDAK, Z. (1967). Theory of Rank Tests. New York: Academic Press.
JURECKOVA, J. (1971). Asymptotic independence of rank test statistic for testing symmetry on regression. Sankhyā A 33, 1-18.
TAPIA, R. A. & THOMPSON, J. R. (1978). Nonparametric Density Estimation. Baltimore, Md: Johns Hopkins University Press.

[Received May 1981. Revised October 1981]