Two modified Wilcoxon tests for symmetry about an unknown location parameter


Biometrika (1982), 69, 2, pp. 377-82
Printed in Great Britain

BY P. K. BHATTACHARYA
Department of Statistics, University of California, Davis, California, U.S.A.

J. L. GASTWIRTH
Department of Statistics, George Washington University, Washington D.C., U.S.A.

AND A. L. WRIGHT
Department of Mathematics, University of Arizona, Tucson, Arizona, U.S.A.

SUMMARY

Two modifications of the Wilcoxon test for symmetry about the sample median are proposed and their large sample distribution theory derived. Their Pitman efficacies against contamination in the right tail are obtained. Both the asymptotic results and a Monte Carlo study show that the procedures are more powerful than the Wilcoxon test about the median against many asymmetric alternatives while preserving its robustness of level shown by Antille & Kersting (1977).

Some key words: Contamination; Modified Wilcoxon test for symmetry about sample median; Monte Carlo study; Pitman efficacy; Robustness of level.

1. INTRODUCTION

The availability of robust estimators of the location parameter, $\theta$, for families of symmetric distributions (Andrews et al., 1972, pp. 5, 6) highlights the need for tests of symmetry when $\theta$ is unknown. For this we desire a test which is approximately distribution free for a large class of symmetric densities rather than a goodness-of-fit test. Nonparametric tests about an estimated $\theta$ are a class of potentially suitable tests. When $\theta$ is estimated by the sample mean, however, Gastwirth (1971) and M. E. B. Owens in a George Washington University thesis showed that the tests lose their distribution-free character. Later, Antille & Kersting (1977) found that the asymptotic variance of the Wilcoxon test about the sample median did not vary greatly over a wide family of symmetric densities, while Monte Carlo results of R. W. Resek, in an unpublished conference paper, showed that this test had low power.
The present article presents two modifications, similar to those of Gastwirth (1965), of the Wilcoxon test about the median which place more weight on the portion of the sample which is affected by the asymmetry. The level of the modified tests is as robust as that of the original one, while their power was larger against most asymmetric alternatives examined. The test statistics are defined in § 2 and their asymptotic properties are given in § 3. The results of a Monte Carlo study showed that the tests improved on the Wilcoxon but are not so powerful as a test of normality as the one using $\sqrt{b_1}$. The details of the proofs and the simulation which are omitted from the article are given in a technical report by the present authors.

2. FORMAL STATEMENT OF THE PROBLEM AND THE TEST STATISTICS

The problem is to test whether a density function $f(x)$ is symmetric about an unknown median or is skew. The primary alternative is skewness resulting from contamination in the right tail. If $F(x)$ denotes the cumulative distribution function and $\nu$ its median, then we are testing the null hypothesis

$$H_0: F(\nu - x) = 1 - F(\nu + x) \quad \text{versus} \quad H_1: F(\nu - x) \neq 1 - F(\nu + x).$$

If we let $f_0 = F_0'$, symmetric about $\nu$, the alternative considered has a density

$$f(x) = (1 - \varepsilon) f_0(x) + \varepsilon f_1(x),$$

where $f_1$ is a density with location parameter $\mu > \nu$.

To define the test statistics we fix $p$ ($0 < p < 1$, $q = 1 - p$) and introduce the following notation for quantiles of interest and related parameters:

[...]

If we let $X_1, \ldots, X_N$ with $N = 2n + 1$ be a random sample from $F(x)$ and $X_{(1)}, \ldots, X_{(N)}$ be the ordered sample, the sample quantiles corresponding to $\nu$, $\eta$ and $\zeta$ are $M_N = X_{(n+1)}$, $X_{(nq+1)}$ and $X_{(N-nq)}$ respectively, where we treat $nq$ as an integer to avoid unnecessary notational complexity.

The first statistic is a modified Wilcoxon test in the form considered by Antille & Kersting (1977). If we let [...] for $i = 1, \ldots, nq$ and define $\chi(u)$ to be $1$, $\tfrac{1}{2}$ or $0$ according as $u$ is less than, equal to or greater than zero, this statistic is

[...]   (2.1)

This statistic compares the upper $\tfrac{1}{2}q$ fraction of the data with the lower $\tfrac{1}{2}q$ fraction. Another way of focusing on this portion of the data is based on the distance between these order statistics and the median. Formally, we define

[...]

If we consider the $Y$'s and $Z$'s as one sample and let $R_1, \ldots, R_n$ and $R_{n+1}, \ldots, R_{2n}$ denote the ranks of the $Y$'s and $Z$'s, respectively, the general linear rank statistic for symmetry (Hájek & Šidák, 1967, p. 108; Jurečková, 1971) is of the form

[...]   (2.2)

With $\phi(u) = 0$ for $u \le p$ and $\phi(u) = u - p$ for $u > p$, where $p = 1 - q$, (2.2) has the form

[...]   (2.3)

where $x^+ = \max(x, 0)$.
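The truncated score function $\phi(u) = (u - p)^+$ can be illustrated with a minimal Python sketch of a signed rank sum about the sample median. The displays (2.2) and (2.3) are not legible here, so the exact form used below is an assumption: the sum of $\pm\phi\{\text{rank}/(N+1)\}$ over the $2n$ observations other than the median, with ranks taken among their distances from the median and no further normalization. The names `phi` and `modified_rank_statistic` are hypothetical.

```python
def phi(u, p):
    """Truncated score: 0 for u <= p and u - p for u > p."""
    return max(u - p, 0.0)


def modified_rank_statistic(sample, q):
    """Signed rank sum about the sample median with scores phi(rank / (N + 1)).

    Assumptions (not taken from the source): odd sample size N = 2n + 1,
    no ties, ranks taken among the 2n distances from the median, and no
    further normalization of the sum.
    """
    xs = sorted(sample)
    N = len(xs)
    if N % 2 == 0:
        raise ValueError("expected an odd sample size N = 2n + 1")
    n = N // 2
    median = xs[n]
    p = 1.0 - q
    # Distance from the median and side (+1 above, -1 below) for the other 2n points.
    distances = [(abs(x - median), 1 if x > median else -1)
                 for i, x in enumerate(xs) if i != n]
    distances.sort(key=lambda pair: pair[0])
    # Only ranks with rank / (N + 1) > p contribute, i.e. the points far from the median.
    return sum(side * phi(rank / (N + 1), p)
               for rank, (_, side) in enumerate(distances, start=1))
```

With $q = 1$ every rank contributes, as in an ordinary signed rank sum; smaller $q$ keeps only the observations farthest from the median, which is where right-tail contamination shows up. Under these assumptions the right-skewed toy sample `[-1, -0.5, 0, 2, 10]` gives $2/3$ for $q = 1$, and its mirror image gives $-2/3$.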

3. ASYMPTOTIC PROPERTIES OF THE MODIFIED WILCOXON TESTS

In this section we obtain the asymptotic distributions and Pitman efficacies of both modified versions of the Wilcoxon test statistic. The null distribution $F$ will be assumed to be symmetric about zero because the statistics $T$ and $S$ are both translation invariant. To obtain the limit law of $T_N$, define

[...]

where the integrals are over $(\zeta, \infty)$, and $\eta$ and $\zeta$ are defined by $F(\eta) = 1 - F(\zeta) = \tfrac{1}{2}q$.

THEOREM 1. If $F$ has density $f$ which is differentiable at $\eta$ and $\zeta$, then $\{T_N - \mu_1(F)\}\sqrt{N}/\sigma_1(F) \to N(0, 1)$ in distribution, where $\mu_1(F) = \mu_1 = 1 - c_0$ and

[...]

Under $H_0$, this result reduces to the following.

COROLLARY. For symmetric $F$, $\mu_1 = \tfrac{1}{2}$ and

$$\sigma_1^2 = \sigma_1^2(0) = (3q)^{-1} + q^{-1}\Big[1 - 4\{q f(\zeta)\}^{-1} \int_\zeta^\infty f^2(x)\,dx\Big]^2. \tag{3.1}$$

The theorem is proved by arguing conditionally on the normalized sample quantiles $X_{(nq+1)}$ and $X_{(N-nq)}$ and removing the conditioning. The details are given in the original report.

The asymptotic efficacy of tests based on $T_N$ in (2.1) against contamination alternatives $F_\varepsilon = (1 - \varepsilon)F_0 + \varepsilon F_1$ ($0 < \varepsilon < 1$) is obtained by differentiating $\mu_1(\varepsilon) = \mu_1(F_\varepsilon)$ and noting that $\sigma_1^2(\varepsilon) = \sigma_1^2(F_\varepsilon)$ is continuous on the right at $\varepsilon = 0$. Specifically, we have the following.

COROLLARY. The Pitman efficacy $\mu_1'(0)^2/\sigma_1^2(0)$ of tests based on $T_N$ against contamination alternatives is

[... $\int \{F_1(-x) f_0(x) + F_0(-x) f_1(x)\}\,dx$ ...]

The statistic $S_N$ given by (2.2) can be expressed in the notation of Hájek & Šidák (1967,

p. 108) for tests of symmetry about $\Delta$ by defining

[...]   (3.2)

where the sign attached to $X_i$ is $+1$ ($X_i > \Delta$) or $-1$ ($X_i < \Delta$), $R_i$ is the rank of $|X_i - \Delta|$, and $a_N(i) = \phi\{i/(N + 1)\}$, where $\Delta = M_N$, the sample median. The asymptotic null distribution of any rank test statistic of symmetry of the form (3.2) about the sample median is obtained by using the Bahadur (1966) representation of the sample median in conjunction with the methods of Hájek & Šidák (1967) and is given in Theorem 2.

THEOREM 2. If $f$ is a symmetric density, differentiable at its median 0, and $S_N$ is a linear rank statistic of the form (3.2) with $\Delta = M_N$, that is $S_N = [S_N^\Delta]_{\Delta = M_N}$, then it is asymptotically normal with mean 0 and variance [...], where

$$[...] = \int_0^\infty \phi^2\{H(x)\}\,dH(x) + b_+^2\{4f^2(0)\}^{-1} - b_+\{f(0)\}^{-1}\int_0^\infty \phi\{H(x)\}\,dH(x),$$

with

$$H(x) = F(x) - F(-x), \qquad b_+ = \int_0^1 \phi(u)\,\phi_+(u, f)\,du, \qquad [...]$$

For the $\phi$ function defining $S_N$ in (2.3) we have the following.

COROLLARY. Under $H_0$, the second modified Wilcoxon statistic $S_N$ is asymptotically normal with mean 0 and variance

[...]   (3.3)

where $\zeta$ satisfies $\int f(x)\,dx = \tfrac{1}{2}q$, with the integral over $(\zeta, \infty)$.

Table 1. Asymptotic variances of modified Wilcoxon test statistics

                               Test T                          Test S
                      q = 1.00  q = 0.50  q = 0.25    q = 1.00  q = 0.50  q = 0.25
Normal                 0.505     0.754     1.41        0.126     0.0394    0.00600
Logistic               0.444     0.691     1.34        0.111     0.0330    0.00491
Double exponential     0.333     0.667     1.33        0.0833    0.0260    0.00423
t_7                    0.449     0.693     1.34        0.112     0.0334    0.00494
t_3                    0.396     0.667     1.35        0.0990    0.0286    0.00434
Cauchy                 0.333     0.816     1.74        0.0833    0.0274    0.00458

The cases $q = 1$ correspond to the original Wilcoxon test with centre estimated by the median. In these cases the asymptotic variance of $S$ is one-fourth that of $T$ due to different normalizations. Similarly, the asymptotic variance of the version analysed by Antille & Kersting (1977) is one-sixteenth of that of $T$.
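The entries of Table 1 can be checked numerically. The sketch below is not the authors' computation, and it rests on two readings that should be treated as assumptions. For $T$ it takes (3.1) as $(3q)^{-1} + q^{-1}[1 - 4\{q f(\zeta)\}^{-1} I]^2$ with $I = \int_\zeta^\infty f^2(x)\,dx$ and the bracket squared. For $S$ it substitutes $\phi(u) = (u - p)^+$ into the variance expression of Theorem 2, which gives $q^3/3 + 4I^2/f^2(0) - 2q^2 I/f(0)$. The function names, the midpoint rule and the change of variable are illustrative choices.

```python
import math
from statistics import NormalDist


def tail_integral_of_f_squared(pdf, zeta, steps=20000):
    """Midpoint rule for the integral of pdf(x)**2 over (zeta, infinity),
    after the substitution x = zeta + t / (1 - t) with t in (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = zeta + t / (1.0 - t)
        total += pdf(x) ** 2 / (1.0 - t) ** 2  # dx = dt / (1 - t)^2
    return total * h


def null_variances(pdf, quantile, q):
    """Return (variance for T, variance for S) for a density symmetric about 0.

    Both expressions are the readings described in the lead-in (assumptions).
    """
    zeta = quantile(1.0 - q / 2.0)  # upper q/2 quantile: 1 - F(zeta) = q/2
    integral = tail_integral_of_f_squared(pdf, zeta)
    var_t = 1.0 / (3.0 * q) + (1.0 - 4.0 * integral / (q * pdf(zeta))) ** 2 / q
    ratio = integral / pdf(0.0)
    var_s = q ** 3 / 3.0 + 4.0 * ratio ** 2 - 2.0 * q ** 2 * ratio
    return var_t, var_s


# (density, quantile function); the quantile functions are only used for u >= 1/2.
DENSITIES = {
    "normal": (NormalDist().pdf, NormalDist().inv_cdf),
    "logistic": (lambda x: math.exp(-x) / (1.0 + math.exp(-x)) ** 2,
                 lambda u: math.log(u / (1.0 - u))),
    "double exponential": (lambda x: 0.5 * math.exp(-abs(x)),
                           lambda u: -math.log(2.0 * (1.0 - u))),
    "cauchy": (lambda x: 1.0 / (math.pi * (1.0 + x * x)),
               lambda u: math.tan(math.pi * (u - 0.5))),
}

if __name__ == "__main__":
    for name, (pdf, quantile) in DENSITIES.items():
        for q in (1.0, 0.5, 0.25):
            var_t, var_s = null_variances(pdf, quantile, q)
            print(f"{name:20s} q={q:4.2f}  T: {var_t:.3f}  S: {var_s:.5f}")
```

For the double exponential the bracket in (3.1) vanishes, so the variance for $T$ is exactly $1/(3q)$, which agrees with the 0.333, 0.667 and 1.33 reported in Table 1.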

The limiting efficacy of $S_N$ against contamination alternatives is given by $\mu_2'(0)^2/\sigma_2^2(0)$, where

$$[...] \int^{\infty} \big[\{1 - 2F_1(0)\} f_0(x)/f_0(0) + F_1(x) - F_1(-x)\big] f_0(x)\,dx.$$

Table 1 gives the null asymptotic variances; the modified tests are robust in level. Pitman asymptotic efficacies of these modified tests under contamination of the above symmetric densities $f_0(x)$ by $\sigma^{-1} f_0\{(x - \mu)/\sigma\}$ were computed for selected $\mu$ and $\sigma$ and the modifications were seen to improve the performance in many cases. Details are given in the original report.

4. MONTE CARLO STUDY

This section summarizes a Monte Carlo study which showed that asymptotic properties of the modified tests hold for moderate sample sizes. Our study was based on 1000 independent samples of 101 observations from each of the symmetric densities (or alternatives). Motivated by the research of Doksum, Fenstad & Aaberge (1977) and R. W. Resek we also included the following tests: (i) the ordinary Wilcoxon test $W$ for symmetry about a known location, (ii) the classical test of skewness $\sqrt{b_1} = m_3/m_2^{3/2}$ with its variance $(\mu_6 - 6\mu_2\mu_4 + 9\mu_2^3)/\mu_2^3$ estimated by the appropriate sample moments, (iii) the David & Johnson (1956) test, $J =$ [...], with $q = 0.04$ and $q = 0.01$.

As the null distributions of both modified tests and $J$ depend on the underlying density functions, we estimated the density and integrals (3.1) and (3.3) using the uniform kernel (Tapia & Thompson, 1978, p. 54). The major findings of our study were as follows.

(a) The David-Johnson test is not robust in level and should not be regarded as a test of symmetry. It is a powerful test of normality.

(b) The classical test, $\sqrt{b_1}$, was level robust in samples of 25, 50 and 101 when the density had several moments, but not for Cauchy data. It had high power, see Table 2, against $\chi^2_{10}$, log normal and the contamination alternatives when the underlying density was 'near normal', for example $t_7$, logistic.

Table 2. Number of rejections by various tests of symmetry in 1000 samples of size 101 from asymmetric alternatives

                                                         [...]                [...]
Alternative                               sqrt(b_1)  q = 1.0  q = 0.25   q = 1.0  q = 0.25
10% contaminated normal
  F_0(mu) = 0.9,   sigma = 0.5 mu             38        58       37         50       49
  F_0(mu) = 0.9,   sigma = 2 mu              261        81      184         73      147
  F_0(mu) = 0.975, sigma = 0.5 mu            332       102      168         96      192
  F_0(mu) = 0.975, sigma = 2 mu              305        86      253         78      196
chi-squared, 10 d.f.                         876       245      466        229      508
Log normal
  sigma = 0.1                                271        24       53         19        2
  sigma = 0.4                                869       412      739        390      870

(c) The modified tests, $S$ and $T$, increased the power of the Wilcoxon test about the median but were not as powerful as $\sqrt{b_1}$ against the contamination alternatives when the underlying density was 'near' normal.

Therefore, in sample sizes of 100 or more, when one is willing to assume the existence of several moments, $\sqrt{b_1}$ is a reliable test of symmetry against skewed or contaminated alternatives. As general tests of symmetry the modified Wilcoxon tests have a greater degree of level robustness for the densities typically considered, but are not as powerful as $\sqrt{b_1}$ over the smaller class of densities for which it is level robust.

This research was supported by the National Science Foundation.

REFERENCES

ANDREWS, D. F., BICKEL, P. J., HAMPEL, F. R., HUBER, P. J., ROGERS, W. H. & TUKEY, J. W. (1972). Robust Estimates of Location. Princeton University Press.
ANTILLE, A. & KERSTING, G. (1977). Tests for symmetry. Z. Wahr. verw. Geb. 39, 235-55.
BAHADUR, R. R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-80.
DAVID, F. N. & JOHNSON, N. L. (1956). Some tests of significance with ordered variables. J. R. Statist. Soc. B 18, 1-20.
DOKSUM, K. A., FENSTAD, G. & AABERGE, R. (1977). Plots and tests for symmetry. Biometrika 64, 473-87.
GASTWIRTH, J. L. (1965). Percentile modifications of two sample rank tests. J. Am. Statist. Assoc. 60, 1127-41.
GASTWIRTH, J. L. (1971). On the sign test for symmetry. J. Am. Statist. Assoc. 66, 821-23.
HÁJEK, J. & ŠIDÁK, Z. (1967). Theory of Rank Tests. New York: Academic Press.
JUREČKOVÁ, J. (1971). Asymptotic independence of rank test statistic for testing symmetry on regression. Sankhyā A 33, 1-18.
TAPIA, R. A. & THOMPSON, J. R. (1978). Nonparametric Density Estimation. Baltimore, Md: Johns Hopkins University Press.

[Received May 1981. Revised October 1981]
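The classical skewness test (ii) of § 4 can be sketched in a few lines of Python. This is an illustration rather than the code used in the study. It assumes that $(\mu_6 - 6\mu_2\mu_4 + 9\mu_2^3)/\mu_2^3$ is the asymptotic variance of $\sqrt{N}\sqrt{b_1}$ under symmetry (it equals 6 for the normal distribution), that the moments are replaced by sample central moments, and that the standardized statistic is referred to the standard normal distribution. The function names are hypothetical.

```python
import math


def central_moment(sample, k):
    """k-th sample central moment (divisor N)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** k for x in sample) / len(sample)


def skewness_test(sample):
    """Return (sqrt(b1), z), where z is sqrt(b1) standardized by its
    estimated asymptotic standard deviation under symmetry.

    Assumption: the variance expression is that of sqrt(N) * sqrt(b1).
    """
    size = len(sample)
    m2 = central_moment(sample, 2)
    m3 = central_moment(sample, 3)
    m4 = central_moment(sample, 4)
    m6 = central_moment(sample, 6)
    root_b1 = m3 / m2 ** 1.5
    variance = (m6 - 6.0 * m2 * m4 + 9.0 * m2 ** 3) / m2 ** 3
    if variance <= 0.0:
        raise ValueError("estimated variance is not positive for this sample")
    z = math.sqrt(size) * root_b1 / math.sqrt(variance)
    return root_b1, z
```

Because the variance is estimated from moments up to the sixth, the procedure depends on those moments existing, which is consistent with finding (b) that the test is level robust when the density has several moments but not for Cauchy data.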