Bootstrapping Analogs of the One Way MANOVA Test

Hasthika S. Rupasinghe Arachchige Don and David J. Olive
Southern Illinois University
March 17, 2016

Abstract

The classical one way MANOVA model is used to test whether the mean measurements are the same or differ across p groups, and assumes that the covariance matrix of each group is the same. This paper suggests using the Olive (2016a) bootstrap technique to develop analogs of the one way MANOVA test. The new tests can have considerable outlier resistance, and the tests do not need the population covariance matrices to be equal.

KEY WORDS: Behrens Fisher problem, bootstrap, prediction region, coordinatewise median, RMVN estimator.

David J. Olive is Professor, and Hasthika S. Rupasinghe Arachchige Don is PhD student, Department of Mathematics, Southern Illinois University, Carbondale, IL 62901, USA.

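1 INTRODUCTION

The multivariate linear model
$$y_i = B^T x_i + \epsilon_i$$
for $i = 1, \ldots, n$ has $m \ge 2$ response variables $Y_1, \ldots, Y_m$ and $p$ predictor variables $x_1, x_2, \ldots, x_p$. The $i$th case is $(x_i^T, y_i^T) = (x_{i1}, x_{i2}, \ldots, x_{ip}, Y_{i1}, \ldots, Y_{im})$. The model is written in matrix form as $Z = XB + E$ where the matrices are defined below. The model has $E(\epsilon_k) = 0$ and $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon = (\sigma_{ij})$ for $k = 1, \ldots, n$. Then the $p \times m$ coefficient matrix $B = [\beta_1 \ \beta_2 \ \cdots \ \beta_m]$ and the $m \times m$ covariance matrix $\Sigma_\epsilon$ are to be estimated, and $E(Z) = XB$ while $E(Y_{ij}) = x_i^T \beta_j$. The $\epsilon_i$ are assumed to be iid.

The univariate linear model corresponds to $m = 1$ response variable, and is written in matrix form as $Y = X\beta + e$. Subscripts are needed for the $m$ univariate linear models $Y_j = X\beta_j + e_j$ for $j = 1, \ldots, m$ where $E(e_j) = 0$. For the multivariate linear model, $\mathrm{Cov}(e_i, e_j) = \sigma_{ij} I_n$ for $i, j = 1, \ldots, m$ where $I_n$ is the $n \times n$ identity matrix.

The $n \times m$ matrix
$$Z = \begin{bmatrix} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,m} \\ Y_{2,1} & Y_{2,2} & \cdots & Y_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,m} \end{bmatrix} = \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_m \end{bmatrix} = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix}.$$

The $n \times p$ design matrix $X$ of predictor variables is not necessarily of full rank $p$, and
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix} = \begin{bmatrix} v_1 & v_2 & \cdots & v_p \end{bmatrix} = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$$
where often $v_1 = 1$.

The $p \times m$ matrix
$$B = \begin{bmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,m} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{p,1} & \beta_{p,2} & \cdots & \beta_{p,m} \end{bmatrix} = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_m \end{bmatrix}.$$

The $n \times m$ matrix
$$E = \begin{bmatrix} \epsilon_{1,1} & \epsilon_{1,2} & \cdots & \epsilon_{1,m} \\ \epsilon_{2,1} & \epsilon_{2,2} & \cdots & \epsilon_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \epsilon_{n,1} & \epsilon_{n,2} & \cdots & \epsilon_{n,m} \end{bmatrix} = \begin{bmatrix} e_1 & e_2 & \cdots & e_m \end{bmatrix} = \begin{bmatrix} \epsilon_1^T \\ \vdots \\ \epsilon_n^T \end{bmatrix}.$$

Considering the $i$th row of $Z$, $X$, and $E$ shows that $y_i^T = x_i^T B + \epsilon_i^T$.

The multivariate linear regression model and one way MANOVA model are special cases of the multivariate linear model, but using double subscripts will be useful for describing the one way MANOVA model. Suppose there are independent random samples of size $n_i$ from $p$ different populations (treatments), or $n_i$ cases are randomly assigned to $p$ treatment groups where $n = \sum_{i=1}^p n_i$. Assume that $m$ response variables $y_{ij} = (Y_{ij1}, \ldots, Y_{ijm})^T$ are measured for the $i$th treatment group and the $j$th case (often an individual or thing) in the group. Hence $i = 1, \ldots, p$ and $j = 1, \ldots, n_i$. The $Y_{ijk}$ follow different one way ANOVA models for $k = 1, \ldots, m$. Assume $E(y_{ij}) = \mu_i$ and $\mathrm{Cov}(y_{ij}) = \Sigma_\epsilon$. Hence the $p$ treatments have different mean vectors $\mu_i$, but common covariance matrix $\Sigma_\epsilon$.

The one way MANOVA is used to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$. Often $\mu_i = \mu + \tau_i$, so $H_0$ becomes $H_0: \tau_1 = \cdots = \tau_p$. If $m = 1$, the one way MANOVA model is the one way ANOVA model. MANOVA is useful since it takes into account the correlations between the $m$ response variables. The Hotelling's $T^2$ test that uses a common covariance matrix is a special case of the one way MANOVA model with $p = 2$.

Let $\mu_i = \mu + \tau_i$ where $\sum_{i=1}^p n_i \tau_i = 0$. The $j$th case from the $i$th population or treatment group is $y_{ij} = \mu + \tau_i + \epsilon_{ij}$ where $\epsilon_{ij}$ is an error vector, $i = 1, \ldots, p$ and $j = 1, \ldots, n_i$. Let
$$\overline{y} = \hat{\mu} = \sum_{i=1}^p \sum_{j=1}^{n_i} y_{ij} / n$$
be the overall mean. Let $\overline{y}_i = \sum_{j=1}^{n_i} y_{ij} / n_i$ so $\hat{\tau}_i = \overline{y}_i - \overline{y}$. Let the residual vector $\hat{\epsilon}_{ij} = y_{ij} - \overline{y}_i = y_{ij} - \hat{\mu} - \hat{\tau}_i$. Then
$$y_{ij} = \overline{y} + (\overline{y}_i - \overline{y}) + (y_{ij} - \overline{y}_i) = \hat{\mu} + \hat{\tau}_i + \hat{\epsilon}_{ij}.$$

Several $m \times m$ matrices will be useful. Let $S_i$ be the sample covariance matrix corresponding to the $i$th treatment group. Then the within sum of squares and cross products matrix is
$$W = (n_1 - 1) S_1 + \cdots + (n_p - 1) S_p = \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \overline{y}_i)(y_{ij} - \overline{y}_i)^T.$$
Then $\hat{\Sigma}_\epsilon = W/(n - p)$. The treatment or between sum of squares and cross products matrix is
$$B_T = \sum_{i=1}^p n_i (\overline{y}_i - \overline{y})(\overline{y}_i - \overline{y})^T.$$
The total corrected (for the mean) sum of squares and cross products matrix is
$$T = B_T + W = \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \overline{y})(y_{ij} - \overline{y})^T.$$
Note that $S = T/(n - 1)$ is the usual sample covariance matrix of the $y_{ij}$ if it is assumed that all $n$ of the $y_{ij}$ are iid so that the $\mu_i \equiv \mu$ for $i = 1, \ldots, p$.

The one way MANOVA model is $y_{ij} = \mu + \tau_i + \epsilon_{ij}$ where the $\epsilon_{ij}$ are iid with $E(\epsilon_{ij}) = 0$ and $\mathrm{Cov}(\epsilon_{ij}) = \Sigma_\epsilon$. If all $n$ of the $y_{ij}$ are iid with $E(y_{ij}) = \mu$ and $\mathrm{Cov}(y_{ij}) = \Sigma_\epsilon$, it can be shown that $A/df \xrightarrow{P} \Sigma_\epsilon$ where $A = W$, $B_T$, or $T$ and $df$ is the corresponding degrees of freedom. Let $t_0$ be the test statistic. Often Pillai's trace statistic, the Hotelling Lawley trace statistic, or Wilks' lambda are used. Wilks' lambda
$$\Lambda = \frac{|W|}{|B_T + W|} = \frac{|W|}{|T|} = \frac{\left| \sum_{i=1}^p (n_i - 1) S_i \right|}{|(n - 1) S|} = \frac{\left| \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \overline{y}_i)(y_{ij} - \overline{y}_i)^T \right|}{\left| \sum_{i=1}^p \sum_{j=1}^{n_i} (y_{ij} - \overline{y})(y_{ij} - \overline{y})^T \right|}.$$
Then $t_0 = -[n - 0.5(m + p - 2)] \log(\Lambda)$ and the test rejects $H_0$ if $t_0 > \chi^2_{m(p-1)}(1 - \alpha)$. See Johnson and Wichern (1988, p. 238).

Following Mardia, Kent, and Bibby (1979, p. 335), let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ be the eigenvalues of $W^{-1} B_T$. Then $1 + \lambda_i$ for $i = 1, \ldots, m$ are the eigenvalues of $W^{-1} T$ and
$$\Lambda = \prod_{i=1}^m (1 + \lambda_i)^{-1}.$$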
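The sums of squares and cross products matrices and Wilks' lambda defined above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names `manova_sscp` and `wilks_lambda` and the simulated data are assumptions made for the example, and NumPy is assumed to be available. The sketch computes $W$, $B_T$, and $T$ from a list of group samples, forms $\Lambda = |W|/|T|$, and recomputes $\Lambda$ from the eigenvalues of $W^{-1} B_T$ as a consistency check.

```python
import numpy as np

def manova_sscp(groups):
    """Return (W, B_T, T) for a list of (n_i x m) arrays, one per treatment group."""
    all_y = np.vstack(groups)
    ybar = all_y.mean(axis=0)                  # overall mean vector
    m = all_y.shape[1]
    W = np.zeros((m, m))
    B_T = np.zeros((m, m))
    for y in groups:
        ybar_i = y.mean(axis=0)                # group mean vector
        resid = y - ybar_i                     # residual vectors for this group
        W += resid.T @ resid                   # within SSCP contribution
        d = (ybar_i - ybar).reshape(-1, 1)
        B_T += y.shape[0] * (d @ d.T)          # between SSCP contribution
    centered = all_y - ybar
    T = centered.T @ centered                  # total corrected SSCP
    return W, B_T, T

def wilks_lambda(groups):
    """Return (Lambda, t0, df) using the scaling factor stated in the text."""
    W, B_T, T = manova_sscp(groups)
    lam = np.linalg.det(W) / np.linalg.det(T)
    n = sum(y.shape[0] for y in groups)
    m = groups[0].shape[1]
    p = len(groups)
    t0 = -(n - 0.5 * (m + p - 2)) * np.log(lam)
    df = m * (p - 1)                           # chi-square degrees of freedom
    return lam, t0, df

# Simulated data (an assumption for illustration): p = 3 groups, m = 3 responses.
rng = np.random.default_rng(0)
groups = [rng.normal(size=(n_i, 3)) for n_i in (30, 40, 50)]

W, B_T, T = manova_sscp(groups)
lam, t0, df = wilks_lambda(groups)

# Lambda recomputed from the eigenvalues of W^{-1} B_T.
eig = np.linalg.eigvals(np.linalg.solve(W, B_T)).real
lam_eig = np.prod(1.0 / (1.0 + eig))
```

Comparing `t0` with the $1 - \alpha$ quantile of a chi-square distribution with `df` degrees of freedom would complete the test; that quantile is left out here to keep the sketch dependent on NumPy only.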

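Following Fujikoshi (2002) and Kakizawa (2009), let the Hotelling Lawley trace statistic
$$U = \mathrm{tr}(B_T W^{-1}) = \mathrm{tr}(W^{-1} B_T) = \sum_{i=1}^m \lambda_i,$$
and let Pillai's trace statistic
$$V = \mathrm{tr}(B_T T^{-1}) = \mathrm{tr}(T^{-1} B_T) = \sum_{i=1}^m \frac{\lambda_i}{1 + \lambda_i}.$$
If the $y_{ij} - \mu_i$ are iid with common covariance matrix $\Sigma_\epsilon$, and if $H_0$ is true, then under regularity conditions
$$-[n - 0.5(m + p - 2)] \log(\Lambda) \xrightarrow{D} \chi^2_{m(p-1)}, \quad (n - m - p - 1) U \xrightarrow{D} \chi^2_{m(p-1)}, \quad \text{and} \quad (n - 1) V \xrightarrow{D} \chi^2_{m(p-1)}.$$
Note that the common covariance matrix assumption implies that each of the $p$ treatment groups or populations has the same covariance matrix $\Sigma_i = \Sigma_\epsilon$ for $i = 1, \ldots, p$, an extremely strong assumption.

A possible alternative method for one way MANOVA is to use the model $Z = XB + E$ with
$$Z = \begin{bmatrix} Y_{111} & Y_{112} & \cdots & Y_{11m} \\ \vdots & \vdots & & \vdots \\ Y_{1,n_1,1} & Y_{1,n_1,2} & \cdots & Y_{1,n_1,m} \\ Y_{211} & Y_{212} & \cdots & Y_{21m} \\ \vdots & \vdots & & \vdots \\ Y_{2,n_2,1} & Y_{2,n_2,2} & \cdots & Y_{2,n_2,m} \\ \vdots & \vdots & & \vdots \\ Y_{p,1,1} & Y_{p,1,2} & \cdots & Y_{p,1,m} \\ \vdots & \vdots & & \vdots \\ Y_{p,n_p,1} & Y_{p,n_p,2} & \cdots & Y_{p,n_p,m} \end{bmatrix} = X \begin{bmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,m} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{p,1} & \beta_{p,2} & \cdots & \beta_{p,m} \end{bmatrix} + E.$$
Then $X$ is full rank where the $i$th column of $X$ is an indicator for group $i - 1$ for $i = 2, \ldots, p$, $\hat{\beta}_{1k} = \overline{Y}_{p0k} = \hat{\mu}_{pk}$ for $k = 1, \ldots, m$, and $\hat{\beta}_{ik} = \overline{Y}_{i-1,0k} - \overline{Y}_{p0k} = \hat{\mu}_{i-1,k} - \hat{\mu}_{pk}$ for $k = 1, \ldots, m$ and $i = 2, \ldots, p$. Thus testing $H_0: \mu_1 = \cdots = \mu_p$ is equivalent to testing $H_0: LB = 0$ where $L = [0 \ \ I_{p-1}]$. Press (2005, p. 262) uses the above model. Then $y_{ij} = \mu_i + \epsilon_{ij}$ and
$$B = \begin{bmatrix} \mu_p^T \\ \mu_1^T - \mu_p^T \\ \mu_2^T - \mu_p^T \\ \vdots \\ \mu_{p-2}^T - \mu_p^T \\ \mu_{p-1}^T - \mu_p^T \end{bmatrix}.$$

Following Olive (2016b, ch. 10) and Rupasinghe Arachchige Don (2017), large sample theory can also be used to derive a better test. Let $\Sigma_i$ be the nonsingular population covariance matrix of the $i$th treatment group or population. To simplify the large sample theory, assume $n_i = \pi_i n$ where $0 < \pi_i < 1$ and $\sum_{i=1}^p \pi_i = 1$. Assume $H_0$ is true, and let $\mu_i = \mu$ for $i = 1, \ldots, p$. Then by the multivariate central limit theorem,
$$\sqrt{n_i} (\overline{y}_i - \mu) \xrightarrow{D} N_m(0, \Sigma_i), \quad \text{and} \quad \sqrt{n} (\overline{y}_i - \mu) \xrightarrow{D} N_m\left(0, \frac{\Sigma_i}{\pi_i}\right).$$
Let
$$w = \begin{bmatrix} \overline{y}_1 - \overline{y}_p \\ \overline{y}_2 - \overline{y}_p \\ \vdots \\ \overline{y}_{p-2} - \overline{y}_p \\ \overline{y}_{p-1} - \overline{y}_p \end{bmatrix}.$$
Then $\sqrt{n}\, w \xrightarrow{D} N_{m(p-1)}(0, \Sigma_w)$ where $\Sigma_w = (\Sigma_{ij})$ with
$$\Sigma_{ij} = \mathrm{Cov}(\sqrt{n}(\overline{y}_i - \overline{y}_p), \sqrt{n}(\overline{y}_j - \overline{y}_p)) = \frac{\Sigma_p}{\pi_p} \quad \text{for } i \ne j,$$
and
$$\Sigma_{ii} = \mathrm{Cov}(\sqrt{n}(\overline{y}_i - \overline{y}_p)) = \frac{\Sigma_i}{\pi_i} + \frac{\Sigma_p}{\pi_p} \quad \text{for } i = j.$$
Hence
$$t_0 = n w^T \hat{\Sigma}_w^{-1} w = w^T \left( \frac{\hat{\Sigma}_w}{n} \right)^{-1} w \xrightarrow{D} \chi^2_{m(p-1)}$$
as the $n_i \to \infty$ if $H_0$ is true. Here
$$\frac{\hat{\Sigma}_w}{n} = \begin{bmatrix} \frac{S_1}{n_1} + \frac{S_p}{n_p} & \frac{S_p}{n_p} & \cdots & \frac{S_p}{n_p} \\ \frac{S_p}{n_p} & \frac{S_2}{n_2} + \frac{S_p}{n_p} & \cdots & \frac{S_p}{n_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{S_p}{n_p} & \frac{S_p}{n_p} & \cdots & \frac{S_{p-1}}{n_{p-1}} + \frac{S_p}{n_p} \end{bmatrix}$$
is a block matrix where the off diagonal block entries equal $S_p/n_p$ and the $i$th diagonal block entry is $S_i/n_i + S_p/n_p$ for $i = 1, \ldots, (p - 1)$. Reject $H_0$ if
$$t_0 > m(p - 1) F_{m(p-1), d_n}(1 - \alpha)$$
where $d_n = \min(n_1, \ldots, n_p)$. It may make sense to relabel the groups so that $n_p$ is the largest $n_i$ or $S_p/n_p$ has the smallest generalized variance of the $S_i/n_i$. This test may start to outperform the one way MANOVA test if $n \ge (m + p)^2$ and $n_i \ge 10m$ for $i = 1, \ldots, p$.

2 THE PREDICTION REGION METHOD

Since the common covariance matrix assumption $\mathrm{Cov}(\epsilon_k) = \Sigma_\epsilon$ for $k = 1, \ldots, n$ is extremely strong, using the prediction region method to test $H_0: LB = 0$ may be a useful alternative. Take a sample of size $n_i$ with replacement from random sample $i$ for $i = 1, 2, \ldots, p$. Repeating this $B$ times, where $B$ now denotes the number of bootstrap samples, let the $(p - 1)m \times 1$ vector
$$w_i = \mathrm{vec}(L \hat{B}_i) = ((\hat{\mu}_1 - \hat{\mu}_p)^T, \ldots, (\hat{\mu}_{p-1} - \hat{\mu}_p)^T)^T$$
be computed from the $i$th bootstrap sample for $i = 1, \ldots, B$, where $\mathrm{vec}(A)$ stacks columns of a matrix into a vector. For a robust test use
$$w_i = ((T_1 - T_p)^T, \ldots, (T_{p-1} - T_p)^T)^T$$
where $T_i$ is a robust location estimator, such as the coordinatewise median or RMVN location estimator, applied to the cases in the $i$th treatment group. Likely need $n \ge 20mp$, $n \ge (m + p)^2$, and $n_i \ge 20m$.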
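The large sample statistic $t_0 = w^T (\hat{\Sigma}_w/n)^{-1} w$ and the bootstrap vectors $w_i$ described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`diff_vector`, `large_sample_stat`, `bootstrap_diffs`, `coordinatewise_median`), the simulated data, and the number of bootstrap samples are all hypothetical, and NumPy is assumed to be available. The sketch stops at producing the bootstrap vectors; the prediction region step of Olive (2016a) that turns them into a test is not reproduced here.

```python
import numpy as np

def coordinatewise_median(y):
    """Robust location estimator: the median of each response variable."""
    return np.median(y, axis=0)

def diff_vector(groups, loc=lambda y: y.mean(axis=0)):
    """Stack loc(group i) - loc(group p) for i = 1, ..., p-1 into one vector."""
    centers = [loc(y) for y in groups]
    return np.concatenate([c - centers[-1] for c in centers[:-1]])

def large_sample_stat(groups):
    """Return (t0, Sigma_w_hat / n) for the large sample test (assumes m >= 2)."""
    p = len(groups)
    m = groups[0].shape[1]
    w = diff_vector(groups)
    scaled = [np.cov(y, rowvar=False) / y.shape[0] for y in groups]  # S_i / n_i
    # Every block starts as S_p / n_p; diagonal block i then adds S_i / n_i.
    cov_w = np.tile(scaled[-1], (p - 1, p - 1))
    for i in range(p - 1):
        cov_w[i * m:(i + 1) * m, i * m:(i + 1) * m] += scaled[i]
    t0 = float(w @ np.linalg.solve(cov_w, w))
    return t0, cov_w

def bootstrap_diffs(groups, n_boot, rng, loc=coordinatewise_median):
    """Resample each group with replacement and recompute the difference vector."""
    reps = []
    for _ in range(n_boot):
        resampled = [y[rng.integers(0, y.shape[0], size=y.shape[0])] for y in groups]
        reps.append(diff_vector(resampled, loc))
    return np.array(reps)                      # shape (n_boot, (p - 1) * m)

# Simulated data (an assumption for illustration): p = 3 groups, m = 2 responses.
rng = np.random.default_rng(1)
groups = [rng.normal(size=(n_i, 2)) for n_i in (40, 50, 60)]

t0, cov_w = large_sample_stat(groups)
boot = bootstrap_diffs(groups, n_boot=200, rng=rng)
# The rejection cutoff m(p-1) F_{m(p-1), d_n}(1 - alpha) needs an F quantile,
# for example scipy.stats.f.ppf(1 - alpha, m * (p - 1), d_n) if SciPy is installed.
```

Passing `loc=lambda y: y.mean(axis=0)` to `bootstrap_diffs` gives the non-robust version based on sample means. The groups are ordered so that the last one plays the role of group $p$, which matches the suggestion above to relabel the groups so that $n_p$ is the largest $n_i$.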

3 EXAMPLES AND SIMULATIONS

In tables 1 to 4,

4 CONCLUSIONS

Pelawa Watagoda and Rupasinghe Arachchige Don (2016) consider bootstrapping analogs of the two sample Hotelling's $T^2$ test, and Konietschke, Bathke, Harrar, and Pauly (2015) bootstrap MANOVA models.

5 References

Fujikoshi, Y. (2002), "Asymptotic Expansions for the Distributions of Multivariate Basic Statistics and One-Way MANOVA Tests Under Nonnormality," Journal of Statistical Planning and Inference, 108,

Johnson, R.A., and Wichern, D.W. (1988), Applied Multivariate Statistical Analysis, 2nd ed., Prentice Hall, Englewood Cliffs, NJ.

Kakizawa, Y. (2009), "Third-Order Power Comparisons for a Class of Tests for Multivariate Linear Hypothesis Under General Distributions," Journal of Multivariate Analysis, 100,

Konietschke, F., Bathke, A.C., Harrar, S.W., and Pauly, M. (2015), "Parametric and Nonparametric Bootstrap Methods for General MANOVA," Journal of Multivariate Analysis, 140,

Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), Multivariate Analysis, Academic Press, London, UK.

Olive, D.J. (2016a), "Bootstrapping Hypothesis Tests," unpublished manuscript at (

Olive, D.J. (2016b), Robust Multivariate Analysis, Springer, New York, NY, to appear.

Pelawa Watagoda, L.C.R., and Rupasinghe Arachchige Don, H.S. (2016), "Bootstrapping Analogs of the Two Sample Hotelling's $T^2$ Test," Preprint at ( mathsiuedu/olive/stwosamplepdf)

Press, S.J. (2005), Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, 2nd ed., Dover, Mineola, NY.

Rupasinghe Arachchige Don, H.S. (2017), Bootstrapping Analogs of the One Way MANOVA Test, PhD Thesis, Southern Illinois University, to appear.

Van Aelst, S., and Willems, G. (2011), "Robust and Efficient One-Way MANOVA Tests," Journal of the American Statistical Association, 106,