SOME HYPOTHESIS TESTS FOR THE COVARIANCE MATRIX WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE


The Annals of Statistics, 2002, Vol. 30, No. 4

BY OLIVIER LEDOIT AND MICHAEL WOLF¹

UCLA and Credit Suisse First Boston, and Universitat Pompeu Fabra

This paper analyzes whether standard covariance matrix tests work when dimensionality is large, and in particular larger than sample size. In the latter case, the singularity of the sample covariance matrix makes likelihood ratio tests degenerate, but other tests based on quadratic forms of sample covariance matrix eigenvalues remain well-defined. We study the consistency property and limiting distribution of these tests as dimensionality and sample size go to infinity together, with their ratio converging to a finite nonzero limit. We find that the existing test for sphericity is robust against high dimensionality, but not the test for equality of the covariance matrix to a given matrix. For the latter test, we develop a new correction to the existing test statistic that makes it robust against high dimensionality.

1. Introduction. Many empirical problems involve large-dimensional covariance matrices. Sometimes the dimensionality $p$ is even larger than the sample size $n$, which makes the sample covariance matrix $S$ singular. How to conduct statistical inference in this case? For concreteness, we focus on two common testing problems in this paper: (1) the covariance matrix $\Sigma$ is proportional to the identity $I$ (sphericity); (2) the covariance matrix $\Sigma$ is equal to the identity $I$. The identity can be replaced with any other matrix $\Sigma_0$ by multiplying the data by $\Sigma_0^{-1/2}$. Following much of the literature, we assume normality. For both hypotheses the likelihood ratio test statistic is degenerate when $p$ exceeds $n$; see, for example, Muirhead (1982), Sections 8.3 and 8.4, or Anderson (1984), Sections 10.7 and [...]. This steers us toward other test statistics that do not degenerate, such as

$$U = \frac{1}{p}\,\mathrm{tr}\!\left[\left(\frac{S}{(1/p)\,\mathrm{tr}(S)} - I\right)^{2}\right] \quad\text{and}\quad V = \frac{1}{p}\,\mathrm{tr}\!\left[(S - I)^{2}\right], \tag{1}$$

where tr denotes the trace.
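As a rough illustration of equation (1), the sketch below computes $U$ and $V$ from a small data matrix using only the Python standard library. The function names (`trace_moments`, `john_u`, `nagao_v`) are hypothetical and chosen for this example, and the divisor $n$ for a sample of $n + 1$ observations is an assumption that matches the definition of $S$ given later in Assumption 2. The code relies on the identities $U = [(1/p)\,\mathrm{tr}(S^2)] / [(1/p)\,\mathrm{tr}(S)]^2 - 1$ and $V = (1/p)\,\mathrm{tr}(S^2) - (2/p)\,\mathrm{tr}(S) + 1$, which follow from expanding the squares in (1).

```python
def trace_moments(X):
    """Return (tr(S)/p, tr(S^2)/p) for the sample covariance matrix S of X.

    X is a list of n + 1 observations, each a list of p numbers. S uses the
    divisor n. Illustrative helper, not code from the paper.
    """
    n = len(X) - 1
    p = len(X[0])
    means = [sum(row[j] for row in X) / (n + 1) for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    tr1 = 0.0  # accumulates tr(S)
    tr2 = 0.0  # accumulates tr(S^2), the sum of squared entries of symmetric S
    for i in range(p):
        for j in range(i, p):
            s_ij = sum(z[i] * z[j] for z in Z) / n
            if i == j:
                tr1 += s_ij
                tr2 += s_ij ** 2
            else:
                tr2 += 2.0 * s_ij ** 2  # entries (i, j) and (j, i)
    return tr1 / p, tr2 / p


def john_u(t1, t2):
    """Sphericity statistic U of equation (1), from the two trace moments."""
    return t2 / t1 ** 2 - 1.0


def nagao_v(t1, t2):
    """Identity statistic V of equation (1), from the two trace moments."""
    return t2 - 2.0 * t1 + 1.0


# Four observations in dimension p = 2, so n = 3 and S = (2/3) * I.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
t1, t2 = trace_moments(X)
print(john_u(t1, t2), nagao_v(t1, t2))
```

In this toy sample $S$ is proportional to the identity, so $U$ is zero, while $V = (2/3 - 1)^2 = 1/9$ picks up the distance of the common eigenvalue from one.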
Received May 1998; revised November [...].
¹ Supported by DGES Grant BEC [...].
AMS 2000 subject classifications. Primary 62H15; secondary 62E20.
Key words and phrases. Concentration asymptotics, equality test, sphericity test.

John (1971) proves that the test based on $U$ is the locally most powerful invariant test for sphericity, and Nagao (1973) derives $V$ as the equivalent of $U$ for the test of $\Sigma = I$. The asymptotic framework where $U$ and $V$ have been studied assumes that $n$ goes to infinity while $p$ remains fixed. It treats terms of order $p/n$ like terms of order $1/n$, which is inappropriate if $p$ is
of the same order of magnitude as $n$. The robustness of tests based on $U$ and $V$ against high dimensionality is heretofore unknown.

We study the asymptotic behavior of $U$ and $V$ as $p$ and $n$ go to infinity together with the ratio $p/n$ converging to a limit $c \in (0, +\infty)$ called the concentration. The singular case corresponds to a concentration above one. The robustness issue boils down to power and size: is the test still consistent? Is the $n$-limiting distribution under the null still a good approximation? Surprisingly, we find opposite answers for $U$ and $V$. The power and the size of the sphericity test based on $U$ turn out to be robust against $p$ large, and even larger than $n$. But the test of $\Sigma = I$ based on $V$ is not consistent against every alternative when $p$ goes to infinity with $n$, and its $n$-limiting distribution differs from its $(n, p)$-limiting distribution under the null. This prompts us to introduce the modified statistic

$$W = \frac{1}{p}\,\mathrm{tr}\!\left[(S - I)^{2}\right] - \frac{p}{n}\left[\frac{1}{p}\,\mathrm{tr}(S)\right]^{2} + \frac{p}{n}. \tag{2}$$

$W$ has the same $n$-asymptotic properties as $V$: it is $n$-consistent and has the same $n$-limiting distribution as $V$ under the null. We show that, contrary to $V$, the power and the size of the test based on $W$ are robust against $p$ large, and even larger than $n$.

The contributions of this paper are: (i) developing a method to check the robustness of covariance matrix tests against high dimensionality; and (ii) finding two statistics (one old and one new) for commonly used covariance matrix tests that can be used when the sample covariance matrix is singular.

Our results rest on a large and important body of literature on the asymptotics for eigenvalues of random matrices, such as Arharov (1971), Bai (1993), Girko (1979, 1988), Jonsson (1982), Narayanaswamy and Raghavarao (1991), Serdobol'skii (1985, 1995, 1999), Silverstein (1986), Silverstein and Combettes (1992), Wachter (1976, 1978) and Yin and Krishnaiah (1983), among others.
Also, we are adding to a substantial list of papers dealing with statistical tests using results on large random matrices, such as Alalouf (1978), Bai, Krishnaiah and Zhao (1989), Bai and Saranadasa (1996), Dempster (1958, 1960), Läuter (1996), Saranadasa (1993), Wilson and Kshirsagar (1980) and Zhao, Krishnaiah and Bai (1986a, 1986b).

The remainder of the paper is organized as follows. Section 2 compiles preliminary results. Section 3 shows that the test statistic $U$ for sphericity is robust against large dimensionality. Section 4 shows that the test of $\Sigma = I$ based on $V$ is not. Section 5 introduces a new statistic $W$ that can be used when $p$ is large. Section 6 reports evidence from Monte Carlo simulations. Section 7 addresses some possible concerns. Section 8 contains the conclusions. Proofs are deferred to the Appendix.

2. Preliminaries. The exact sense in which sample size and dimensionality go to infinity together is defined by the following assumptions.
ASSUMPTION 1 (Asymptotics). Dimensionality and sample size are two increasing integer functions $p = p_k$ and $n = n_k$ of an index $k = 1, 2, \ldots$ such that $\lim_{k\to\infty} p_k = +\infty$, $\lim_{k\to\infty} n_k = +\infty$ and there exists $c \in (0, +\infty)$ such that $\lim_{k\to\infty} p_k / n_k = c$.

The case where the sample covariance matrix is singular corresponds to a concentration $c$ higher than one. In this paper, we refer to concentration asymptotics or $(n, p)$-asymptotics. Another term sometimes used for the same concept is "increasing dimension asymptotics (i.d.a.)"; for example, see Serdobol'skii (1999).

ASSUMPTION 2 (Data-generating process). For each positive integer $k$, $X_k$ is an $(n_k + 1) \times p_k$ matrix of $n_k + 1$ i.i.d. observations on a system of $p_k$ random variables that are jointly normally distributed with mean vector $\mu_k$ and covariance matrix $\Sigma_k$. Let $\lambda_{1,k}, \ldots, \lambda_{p_k,k}$ denote the eigenvalues of the covariance matrix $\Sigma_k$. We suppose that their average $\alpha = \sum_{i=1}^{p_k} \lambda_{i,k} / p_k$ and their dispersion $\delta^2 = \sum_{i=1}^{p_k} (\lambda_{i,k} - \alpha)^2 / p_k$ are independent of the index $k$. Furthermore, we require $\alpha > 0$.

$S_k$ is the sample covariance matrix with entries

$$s_{ij,k} = \frac{1}{n_k} \sum_{l=1}^{n_k+1} (x_{il,k} - m_{i,k})(x_{jl,k} - m_{j,k}) \quad\text{where}\quad m_{i,k} = \frac{1}{n_k + 1} \sum_{l=1}^{n_k+1} x_{il,k}.$$

The null hypothesis of sphericity can be stated as $\delta^2 = 0$, and the null $\Sigma = I$ can be stated as $\delta^2 = 0$ and $\alpha = 1$. We need one more assumption to obtain convergence results under the alternative.

ASSUMPTION 3 (Higher moments). The averages of the third and fourth moments of the eigenvalues of the population covariance matrix

$$\frac{1}{p_k} \sum_{i=1}^{p_k} (\lambda_{i,k})^{j} \qquad (j = 3, 4)$$

converge to finite limits, respectively.

Dependence on $k$ will be omitted when no ambiguity is possible. Much of the mathematical groundwork has already been laid out by research in the spectral theory of large-dimensional random matrices. The fundamental results of interest to us are as follows.

PROPOSITION 1 (Law of large numbers). Under Assumptions 1-3,

$$\frac{1}{p}\,\mathrm{tr}(S) \xrightarrow{P} \alpha, \tag{3}$$

$$\frac{1}{p}\,\mathrm{tr}(S^2) \xrightarrow{P} (1 + c)\,\alpha^2 + \delta^2, \tag{4}$$

where $\xrightarrow{P}$ denotes convergence in probability.
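The limits in Proposition 1 can be eyeballed with a single simulated draw. The following is a minimal sketch, assuming a diagonal population covariance matrix (the trace statistics depend on $S$ only through its eigenvalues, so this loses no generality for the illustration), a zero mean vector, and the finite-sample ratio $p/n$ standing in for $c$. The helper names, the seed and the sizes $p = 60$, $n = 30$ are arbitrary choices for this example.

```python
import random


def trace_moments(X):
    """Return (tr(S)/p, tr(S^2)/p) for the sample covariance S (divisor n)."""
    n = len(X) - 1
    p = len(X[0])
    means = [sum(row[j] for row in X) / (n + 1) for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    tr1 = 0.0
    tr2 = 0.0
    for i in range(p):
        for j in range(i, p):
            s_ij = sum(z[i] * z[j] for z in Z) / n
            if i == j:
                tr1 += s_ij
                tr2 += s_ij ** 2
            else:
                tr2 += 2.0 * s_ij ** 2
    return tr1 / p, tr2 / p


def simulate(n, eigenvalues, rng):
    """Draw n + 1 independent N(0, diag(eigenvalues)) observations."""
    sd = [lam ** 0.5 for lam in eigenvalues]
    return [[rng.gauss(0.0, s) for s in sd] for _ in range(n + 1)]


rng = random.Random(0)
p, n = 60, 30  # singular case: p / n = 2
eigenvalues = [1.0] * (p // 2) + [0.5] * (p - p // 2)
alpha = sum(eigenvalues) / p
delta2 = sum((lam - alpha) ** 2 for lam in eigenvalues) / p

t1, t2 = trace_moments(simulate(n, eigenvalues, rng))
limit1 = alpha                                 # right-hand side of (3)
limit2 = (1.0 + p / n) * alpha ** 2 + delta2   # right-hand side of (4), c ~ p/n
print(t1, limit1)
print(t2, limit2)
```

With these sizes the simulated moments should land close to 0.75 and 1.75 respectively, even though $S$ is singular.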
All proofs are in the Appendix. This law of large numbers will help us establish whether or not a given test is consistent against every alternative as $n$ and $p$ go to infinity together. The distribution of the test statistic under the null will be found by using the following central limit theorem.

PROPOSITION 2 (Central limit theorem). Under Assumptions 1-2, if $\delta^2 = 0$, then

$$n \begin{bmatrix} \dfrac{1}{p}\,\mathrm{tr}(S) - \alpha \\[2mm] \dfrac{1}{p}\,\mathrm{tr}(S^2) - \dfrac{n + p + 1}{n}\,\alpha^2 \end{bmatrix} \xrightarrow{D} N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \dfrac{2\alpha^2}{c} & 4\left(1 + \dfrac{1}{c}\right)\alpha^3 \\[2mm] 4\left(1 + \dfrac{1}{c}\right)\alpha^3 & 4\left(\dfrac{2}{c} + 5 + 2c\right)\alpha^4 \end{bmatrix} \right), \tag{5}$$

where $\xrightarrow{D}$ denotes convergence in distribution and $N$ the normal distribution.

3. Sphericity test. It is well known that the sphericity test based on $U$ is $n$-consistent. As for $(n, p)$-consistency, Proposition 1 implies that, under Assumptions 1-3,

$$U = \frac{(1/p)\,\mathrm{tr}(S^2)}{[(1/p)\,\mathrm{tr}(S)]^2} - 1 \xrightarrow{P} \frac{(1 + c)\,\alpha^2 + \delta^2}{\alpha^2} - 1 = c + \frac{\delta^2}{\alpha^2}. \tag{6}$$

Since $c$ can be approximated by the known quantity $p/n$, the power of this test to separate the null hypothesis of sphericity $\delta^2/\alpha^2 = 0$ from the alternative $\delta^2/\alpha^2 > 0$ converges to one as $n$ and $p$ go to infinity together: this constitutes an $(n, p)$-consistent test.

John (1972) shows that, as $n$ goes to infinity while $p$ remains fixed, the limiting distribution of $U$ under the null is given by

$$\frac{np}{2}\,U \xrightarrow{D} Y_{p(p+1)/2 - 1} \tag{7}$$

or, equivalently,

$$nU - p \xrightarrow{D} \frac{2}{p}\,Y_{p(p+1)/2 - 1} - p, \tag{8}$$

where $Y_d$ denotes a random variable distributed as a $\chi^2$ with $d$ degrees of freedom. It will become apparent after Proposition 4 why we choose to rewrite equation (7) as (8). This approximation may or may not remain accurate under $(n, p)$-asymptotics, depending on whether it omits terms of order $p/n$. To find out, let us start by deriving the $(n, p)$-limiting distribution of $U$ under the null hypothesis $\delta^2/\alpha^2 = 0$.
PROPOSITION 3. Under the assumptions of Proposition 2,

$$nU - p \xrightarrow{D} N(1, 4). \tag{9}$$

Now we can compare equations (8) and (9).

PROPOSITION 4. Suppose that, for every $k$, the random variable $Y_{p_k(p_k+1)/2 + a}$ is distributed as a $\chi^2$ with $p_k(p_k + 1)/2 + a$ degrees of freedom, where $a$ is a constant integer. Then its limiting distribution under Assumption 1 satisfies

$$\frac{2}{p_k}\,Y_{p_k(p_k+1)/2 + a} - p_k \xrightarrow{D} N(1, 4). \tag{10}$$

Using Proposition 4 with $a = -1$ shows that the $n$-limiting distribution given by equation (8) is still correct under $(n, p)$-asymptotics.

The conclusion of our analysis of the sphericity test based on $U$ is the following: the existing $n$-asymptotic theory (where $p$ is fixed) remains valid if $p$ goes to infinity with $n$, even for the case $p > n$.

4. Test that a covariance matrix is the identity. As $n$ goes to infinity with $p$ fixed, $S \xrightarrow{P} \Sigma$, therefore $V \xrightarrow{P} \frac{1}{p}\,\mathrm{tr}[(\Sigma - I)^2]$. This shows that the test of $\Sigma = I$ based on $V$ is $n$-consistent. As for $(n, p)$-consistency, Proposition 1 implies that, under Assumptions 1-3,

$$V = \frac{1}{p}\,\mathrm{tr}(S^2) - \frac{2}{p}\,\mathrm{tr}(S) + 1 \xrightarrow{P} (1 + c)\,\alpha^2 + \delta^2 - 2\alpha + 1 = c\,\alpha^2 + (\alpha - 1)^2 + \delta^2. \tag{11}$$

Since $\frac{1}{p}\,\mathrm{tr}[(\Sigma - I)^2] = (\alpha - 1)^2 + \delta^2$ is a squared measure of distance between the population covariance matrix and the identity, the null hypothesis can be rewritten as $(\alpha - 1)^2 + \delta^2 = 0$, and the alternative as $(\alpha - 1)^2 + \delta^2 > 0$. The problem is that the probability limit of the test statistic $V$ is not directly a function of $(\alpha - 1)^2 + \delta^2$: it involves another term, $c\,\alpha^2$, which contains the nuisance parameter $\alpha^2$. Therefore the test based on $V$ may sometimes be powerless to separate the null from the alternative. More specifically, when the triplet $(c, \alpha, \delta)$ satisfies

$$c\,\alpha^2 + (\alpha - 1)^2 + \delta^2 = c, \tag{12}$$

the test statistic $V$ has the same probability limit under the null as under the alternative. The clearest counterexamples are those where $\delta^2 = 0$, because Proposition 2 allows us to compute the limit of the power of the test against such alternatives. When $\delta^2 = 0$ the solution to equation (12) is $\alpha = \frac{1 - c}{1 + c}$.
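This solution can be checked by a short calculation, sketched here in the same notation. Setting $\delta^2 = 0$ in equation (12) gives a quadratic in $\alpha$ that factors:

$$c\,\alpha^2 + (\alpha - 1)^2 = c \;\Longleftrightarrow\; (1 + c)\,\alpha^2 - 2\alpha + (1 - c) = 0 \;\Longleftrightarrow\; (\alpha - 1)\big[(1 + c)\,\alpha - (1 - c)\big] = 0.$$

The root $\alpha = 1$ is the null hypothesis itself, so the only alternative with $\delta^2 = 0$ is $\alpha = (1 - c)/(1 + c)$. Since Assumption 2 requires $\alpha > 0$, such an alternative exists only when $c < 1$.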
PROPOSITION 5. Under Assumptions 1-2, if $c \in (0, 1)$ and there exists a finite $d$ such that $\frac{p}{n} = c + \frac{d}{n} + o\!\left(\frac{1}{n}\right)$, then the power of the test of any positive significance level based on $V$ to reject the null $\Sigma = I$ when the alternative $\Sigma = \frac{1 - c}{1 + c}\,I$ is true converges to a limit strictly below one.

We see that the $n$-consistency of the test based on $V$ does not extend to $(n, p)$-asymptotics.

Nagao (1973) shows that, as $n$ goes to infinity while $p$ remains fixed, the limiting distribution of $V$ under the null is given by

$$\frac{np}{2}\,V \xrightarrow{D} Y_{p(p+1)/2} \tag{13}$$

or, equivalently,

$$nV - p \xrightarrow{D} \frac{2}{p}\,Y_{p(p+1)/2} - p, \tag{14}$$

where, as before, $Y_d$ denotes a random variable distributed as a $\chi^2$ with $d$ degrees of freedom. It is not immediately apparent whether this approximation remains accurate under $(n, p)$-asymptotics. The $(n, p)$-limiting distribution of $V$ under the null hypothesis $(\alpha - 1)^2 + \delta^2 = 0$ is derived in equation (38) in the Appendix as part of the proof of Proposition 5:

$$nV - p \xrightarrow{D} N(1, 4 + 8c). \tag{15}$$

Using Proposition 4 with $a = 0$ shows that the $n$-limiting distribution given by equation (14) is incorrect under $(n, p)$-asymptotics.

The conclusion of our analysis of the test of $\Sigma = I$ based on $V$ is the following: the existing $n$-asymptotic theory (where $p$ is fixed) breaks down when $p$ goes to infinity with $n$, including the case $p > n$.

5. Test that a covariance matrix is the identity: new statistic. The ideal would be to find a simple modification of $V$ that has the same $n$-asymptotic properties and better $(n, p)$-asymptotic properties (in the spirit of $U$). This is why we introduce the new statistic

$$W = \frac{1}{p}\,\mathrm{tr}\!\left[(S - I)^{2}\right] - \frac{p}{n}\left[\frac{1}{p}\,\mathrm{tr}(S)\right]^{2} + \frac{p}{n}. \tag{16}$$

As $n$ goes to infinity with $p$ fixed, $W \xrightarrow{P} \frac{1}{p}\,\mathrm{tr}[(\Sigma - I)^2]$, therefore the test of $\Sigma = I$ based on $W$ is $n$-consistent. As for $(n, p)$-consistency, Proposition 1 implies that, under Assumptions 1-3,

$$W \xrightarrow{P} c\,\alpha^2 + (\alpha - 1)^2 + \delta^2 - c\,\alpha^2 + c = c + (\alpha - 1)^2 + \delta^2. \tag{17}$$
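A small numerical comparison of the probability limits in equations (11) and (17) may make the difference between $V$ and $W$ concrete. The sketch below is only arithmetic on those two formulas; the function names and the choice $c = 0.5$ are arbitrary assumptions for the example.

```python
def plim_v(c, alpha, delta2):
    """Probability limit of V, equation (11)."""
    return c * alpha ** 2 + (alpha - 1.0) ** 2 + delta2


def plim_w(c, alpha, delta2):
    """Probability limit of W, equation (17)."""
    return c + (alpha - 1.0) ** 2 + delta2


c = 0.5
alpha_alt = (1.0 - c) / (1.0 + c)  # alternative of Section 4 with delta^2 = 0

# Under the null (alpha = 1, delta^2 = 0) both limits equal c.
print(plim_v(c, 1.0, 0.0), plim_w(c, 1.0, 0.0))

# Under the alternative, V still converges to c, while W moves away from c.
print(plim_v(c, alpha_alt, 0.0), plim_w(c, alpha_alt, 0.0))
```

For $c = 0.5$ the alternative is $\alpha = 1/3$: the limit of $V$ is again $0.5$, whereas the limit of $W$ is $0.5 + 4/9$.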
Since $c$ can be approximated by the known quantity $p/n$, the power of the test based on $W$ to separate the null hypothesis $(\alpha - 1)^2 + \delta^2 = 0$ from the alternative $(\alpha - 1)^2 + \delta^2 > 0$ converges to one as $n$ and $p$ go to infinity together: the test based on $W$ is $(n, p)$-consistent.

The following proposition shows that $W$ has the same $n$-limiting distribution as $V$ under the null.

PROPOSITION 6. As $n$ goes to infinity with $p$ fixed, the limiting distribution of $W$ under the null hypothesis $(\alpha - 1)^2 + \delta^2 = 0$ is the same as for $V$:

$$\frac{np}{2}\,W \xrightarrow{D} Y_{p(p+1)/2} \tag{18}$$

or, equivalently,

$$nW - p \xrightarrow{D} \frac{2}{p}\,Y_{p(p+1)/2} - p, \tag{19}$$

where $Y_d$ denotes a random variable distributed as a $\chi^2$ with $d$ degrees of freedom.

To find out whether this approximation remains accurate under $(n, p)$-asymptotics, we derive the $(n, p)$-limiting distribution of $W$ under the null.

PROPOSITION 7. Under Assumptions 1-2, if $(\alpha - 1)^2 + \delta^2 = 0$ then

$$nW - p \xrightarrow{D} N(1, 4). \tag{20}$$

Using Proposition 4 with $a = 0$ shows that the $n$-limiting distribution given by equation (19) is still correct under $(n, p)$-asymptotics.

The conclusion of our analysis of the test of $\Sigma = I$ based on $W$ is the following: the $n$-asymptotic theory developed for $V$ is directly applicable to $W$, and it remains valid (for $W$ but not $V$) if $p$ goes to infinity with $n$, even in the case $p > n$.

6. Monte Carlo simulations. So far, little is known about the finite-sample behavior of these tests. In particular the question of whether they are unbiased in finite sample is not readily tractable. Yet some light can be shed on finite-sample behavior through Monte Carlo simulations.

Monte Carlo simulations are used to find the size and power of the test statistics $U$, $V$, and $W$ for $p, n = 4, 8, \ldots, 256$. In each case we run 10,000 simulations. The alternative against which power is computed has to be "scalable" in the sense that it can be represented by population covariance matrices of any dimension $p = 4, 8, \ldots, 256$. The simplest alternative we can think of is to set half of the population eigenvalues equal to 1, and the other ones equal to 0.5.

Table 1 reports the size of the sphericity test based on $U$.
The test is carried out by computing the 95% cutoff point from the $\chi^2$ $n$-limiting distribution in
equation (8). We see that the quality of this approximation does not get worse when $p$ gets large: it can be relied upon even when $p > n$. This is what we expected given Proposition 4.

TABLE 1. Size of sphericity test based on $U$. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size converges to nominal size as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.

Table 2 shows the power of the sphericity test based on $U$ against the alternative described above. We see that the power does not become lower when $p$ gets large: power stays high even when $p > n$. This confirms the $(n, p)$-consistency result derived from equation (6). The table indicates that the power seems to depend predominantly on $n$. For fixed sample size, the power of the test is often increasing in $p$, which is somewhat surprising. We do not have any simple explanation of this phenomenon but will address it in future research focusing on the analysis of power.

TABLE 2. Power of sphericity test based on $U$. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other ones are equal to 0.5. Power converges to one as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.
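A scaled-down version of this experiment can be sketched as follows. It is not the simulation code behind Tables 1 and 2: it assumes a diagonal population covariance matrix, uses only 200 replications at a single $(p, n)$ pair, and replaces the $\chi^2$ cutoff of equation (8) with the 95% point of the $N(1, 4)$ limit from Proposition 3, so that only the standard library is needed. All names, the seed and the sizes are choices made for the illustration.

```python
import random
from statistics import NormalDist


def john_u(X):
    """John's U from a list of n + 1 observations (rows) in dimension p."""
    n, p = len(X) - 1, len(X[0])
    means = [sum(row[j] for row in X) / (n + 1) for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    # tr(S) and tr(S^2) computed from the (n+1) x (n+1) Gram matrix Z Z' / n,
    # which has the same nonzero eigenvalues as S = Z' Z / n.
    tr1 = 0.0
    tr2 = 0.0
    for a in range(n + 1):
        for b in range(a, n + 1):
            g = sum(x * y for x, y in zip(Z[a], Z[b])) / n
            if a == b:
                tr1 += g
                tr2 += g * g
            else:
                tr2 += 2.0 * g * g
    return (tr2 / p) / (tr1 / p) ** 2 - 1.0


def rejection_rate(n, eigenvalues, reps, rng):
    """Share of replications in which n*U - p exceeds the N(1, 4) 95% point."""
    p = len(eigenvalues)
    cutoff = 1.0 + 2.0 * NormalDist().inv_cdf(0.95)
    sd = [lam ** 0.5 for lam in eigenvalues]
    hits = 0
    for _ in range(reps):
        X = [[rng.gauss(0.0, s) for s in sd] for _ in range(n + 1)]
        if n * john_u(X) - p > cutoff:
            hits += 1
    return hits / reps


rng = random.Random(1)
p, n, reps = 16, 16, 200
size = rejection_rate(n, [1.0] * p, reps, rng)
power = rejection_rate(n, [1.0] * (p // 2) + [0.5] * (p // 2), reps, rng)
print(size, power)
```

With so few replications the two rates are noisy; the point of the sketch is the mechanics (simulate, compute $nU - p$, compare with a cutoff), not a reproduction of the tables.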
TABLE 3. Size of equality test based on $V$. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size does not converge to nominal size as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.

Using the same methodology as in Table 1, we report in Table 3 the size of the test for $\Sigma = I$ based on $V$. We see that the $\chi^2$ $n$-limiting distribution under the null in equation (14) is a poor approximation for large $p$. This is what we expected given the discussion surrounding equation (15).

Using the same methodology as in Table 2, we report in Table 4 the power of the test based on $V$ against the alternative described above. Given the discussion surrounding equation (12), we anticipate that this test will not be powerful when $c = [(\alpha - 1)^2 + \delta^2] / (1 - \alpha^2) = 2/7$. Indeed we observe that, in the cells where $p/n$ exceeds the critical value $2/7$, this test does not have much power to reject the null.

TABLE 4. Power of equality test based on $V$. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other ones are equal to 0.5. Power does not converge to one as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.
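The critical value $2/7$ follows from solving equation (12) for $c$, which gives $c = [(\alpha - 1)^2 + \delta^2] / (1 - \alpha^2)$ whenever $\alpha < 1$. The sketch below evaluates it in exact arithmetic for the simulated alternative; the function name is hypothetical, and two eigenvalues are enough because $\alpha$ and $\delta^2$ are the same for any even $p$.

```python
from fractions import Fraction


def critical_concentration(eigenvalues):
    """Concentration c at which equation (12) holds, assuming alpha < 1."""
    p = len(eigenvalues)
    alpha = sum(eigenvalues) / p
    delta2 = sum((lam - alpha) ** 2 for lam in eigenvalues) / p
    return ((alpha - 1) ** 2 + delta2) / (1 - alpha ** 2)


# Half of the eigenvalues equal to 1, the other half equal to 0.5:
# alpha = 3/4 and delta^2 = 1/16.
eigs = [Fraction(1), Fraction(1, 2)]
print(critical_concentration(eigs))  # prints 2/7
```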
TABLE 5. Size of equality test based on $W$. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size converges to nominal size as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.

Using the same methodology as in Table 1, we report in Table 5 the size of the test for $\Sigma = I$ based on $W$. We see that the $\chi^2$ approximation in equation (19) for the null distribution does not get worse when $p$ gets large: it can be relied upon even when $p > n$. This is what we expected given the discussion surrounding equation (20).

Using the same methodology as in Table 2, we report in Table 6 the power of the test based on $W$ against the alternative described above. We see that the power does not become lower when $p$ gets large: power stays high even when $p > n$. This confirms the $(n, p)$-consistency result derived from equation (17). As with $U$, the table indicates that the power seems to depend predominantly on $n$, and to be increasing in $p$ for fixed $n$.

TABLE 6. Power of equality test based on $W$. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other eigenvalues are equal to 0.5. Power converges to one as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.
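The contrast between the sizes of the $V$ and $W$ tests can be illustrated with a small experiment in the singular case $p > n$. This is a sketch under stated assumptions, not the code behind Tables 3 and 5: data are drawn under the null $\Sigma = I$, only 300 replications are run at $p = 24$, $n = 12$, and both statistics are compared with the 95% point of $N(1, 4)$, which by Proposition 4 is the large-$p$ limit of the $\chi^2$ approximations (14) and (19). Function names and the seed are arbitrary.

```python
import random
from statistics import NormalDist


def trace_moments(X):
    """Return (tr(S)/p, tr(S^2)/p) using the Gram matrix of centered rows."""
    n, p = len(X) - 1, len(X[0])
    means = [sum(row[j] for row in X) / (n + 1) for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    tr1 = 0.0
    tr2 = 0.0
    for a in range(n + 1):
        for b in range(a, n + 1):
            g = sum(x * y for x, y in zip(Z[a], Z[b])) / n
            if a == b:
                tr1 += g
                tr2 += g * g
            else:
                tr2 += 2.0 * g * g
    return tr1 / p, tr2 / p


def null_rejection_rates(p, n, reps, rng):
    """Rejection rates of the V and W tests when the true covariance is I."""
    cutoff = 1.0 + 2.0 * NormalDist().inv_cdf(0.95)
    hits_v = 0
    hits_w = 0
    for _ in range(reps):
        X = [[rng.gauss(0.0, 1.0) for _j in range(p)] for _l in range(n + 1)]
        t1, t2 = trace_moments(X)
        v = t2 - 2.0 * t1 + 1.0                # statistic V, equation (1)
        w = v - (p / n) * t1 ** 2 + p / n      # statistic W, equation (16)
        hits_v += (n * v - p) > cutoff
        hits_w += (n * w - p) > cutoff
    return hits_v / reps, hits_w / reps


rng = random.Random(2)
rate_v, rate_w = null_rejection_rates(p=24, n=12, reps=300, rng=rng)
print(rate_v, rate_w)
```

Under these assumptions the $V$ test should reject a true null far more often than the nominal 5%, in line with the variance $4 + 8c$ in equation (15), while the $W$ test should stay near the nominal level.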
Overall, these Monte Carlo simulations confirm the finite-sample relevance of the asymptotic results obtained in Sections 3, 4 and 5.

7. Possible concerns. For the discussion that follows, recall the definition of the $r$th mean of a collection of $p$ nonnegative reals, $\{s_1, \ldots, s_p\}$, given by

$$M(r) = \begin{cases} \left( \dfrac{1}{p} \displaystyle\sum_{i=1}^{p} s_i^{\,r} \right)^{1/r}, & \text{if } r \neq 0, \\[3mm] \left( \displaystyle\prod_{i=1}^{p} s_i \right)^{1/p}, & \text{if } r = 0. \end{cases}$$

A possible concern is the use of John's statistic $U$ for testing sphericity, since it is based on the ratio of the first and second means [i.e., $M(1)$ and $M(2)$] of the sample eigenvalues. The likelihood ratio (LR) test statistic, on the other hand, is based on the ratio of the geometric mean [i.e., $M(0)$] to the first mean of the sample eigenvalues; for example, see Muirhead (1982), Section 8.3. It has long been known that the LR test has the desirable property of being unbiased; see Gleser (1966) and Marshall and Olkin (1979), pages [...]. Also, for the related problem of testing homogeneity of variances, it has long been established that certain tests based on ratios of the type $M(r)/M(t)$ with $r \geq 0$ and $t \leq 0$ are unbiased; see Cohen and Strawderman (1971). No unbiasedness properties are known for tests based on ratios of the type $M(r)/M(t)$ with both $r > 0$ and $t > 0$.

Still, we advocate the use of John's statistic $U$ over the LR statistic for testing sphericity when $p$ is large compared to $n$. First, the LR test statistic is degenerate when $p > n$ (though one might try to define an alternative statistic using the nonzero sample eigenvalues only in this case). Second, when $p$ is less than or equal to $n$ but close to $n$ some of the sample eigenvalues will be very close to zero, causing the LR statistic to be nearly degenerate; this should affect the finite-sample performance of the LR test. (Obviously, this also questions the strategy of constructing an LR-like statistic based on the nonzero sample eigenvalues only when $p > n$.) Our intuition is that tests whose statistic involves a mean $M(r)$ with $r \leq 0$ will misbehave when $p$ becomes close to $n$.
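The role of the different means can be made concrete with a few lines of code. The sketch below implements $M(r)$ as defined above, writes John's $U$ as $[M(2)/M(1)]^2 - 1$ (which follows from equation (6)), and evaluates the ratio $M(0)/M(1)$ on which the LR statistic is based. The function names and the two small sets of eigenvalues are invented for the illustration; negative $r$ would additionally require strictly positive values.

```python
import math


def r_mean(values, r):
    """r-th mean M(r) of nonnegative reals; r = 0 gives the geometric mean."""
    p = len(values)
    if r == 0:
        return math.prod(values) ** (1.0 / p)
    return (sum(v ** r for v in values) / p) ** (1.0 / r)


def john_u_from_eigenvalues(eigs):
    """John's U expressed through the first and second means."""
    return (r_mean(eigs, 2) / r_mean(eigs, 1)) ** 2 - 1.0


def geometric_to_arithmetic(eigs):
    """Ratio M(0) / M(1) on which the LR statistic for sphericity is based."""
    return r_mean(eigs, 0) / r_mean(eigs, 1)


well_conditioned = [1.2, 1.0, 0.9, 0.9]
near_singular = [1.2, 1.0, 0.9, 1e-8]  # one sample eigenvalue close to zero
for eigs in (well_conditioned, near_singular):
    print(john_u_from_eigenvalues(eigs), geometric_to_arithmetic(eigs))
```

A single eigenvalue near zero drives $M(0)/M(1)$ toward zero, while $U$ changes only moderately, which is the sensitivity the intuition above points to.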
The reason is that they give too much importance to the sample eigenvalues close to zero, which contain information not on the true covariance matrix but on the ratio $p/n$; see Figure 1 for an illustration. To check this intuition, we run a Monte Carlo on the LR test for sphericity for the case $p \leq n$. Critical values are obtained from the $\chi^2$ approximation under the null; for example, see Muirhead (1982), Section 8.3. The simulation setup is identical to that of Section 6. Table 7 reports the simulated size of the LR test; severe size distortions for large values of $p$ compared to $n$ are obvious. Next we compute the power of the LR test in a way that enables direct comparison with Table 2: we use the distribution of the LR test statistic simulated under the null
to find the cutoff points corresponding to the realized sizes in Table 1 (most of them are equal to the nominal size of 0.05, but for small values of $p$ and $n$ they are lower). Using these cutoff points for the LR test statistic generates a test with exactly the same size as the test based on John's statistic $U$, so we can directly compare the power of the two tests.

FIG. 1. Sample versus true eigenvalues. The solid line represents the distribution of the eigenvalues of the sample covariance matrix based on the asymptotic formula proven by Marčenko and Pastur (1967). Eigenvalues are sorted from largest to smallest, then plotted against their rank. In this case, the true covariance matrix is the identity, that is, the true eigenvalues are all equal to one. The distribution of the true eigenvalues is plotted as a dashed horizontal line at one. Distributions are obtained in the limit as the number of observations $n$ and the number of variables $p$ both go to infinity with the ratio $p/n$ converging to a finite positive limit, the concentration $c$. The four plots correspond to different values of the concentration.

Table 8 is the equivalent of Table 2 except it uses the LR test statistic for $p \leq n$. We can see that the LR test is slightly more powerful than John's test (by one percent or less) when $p$ is small compared to $n$,
but is substantially less powerful when $p$ gets close to $n$. Hence, both in terms of size and power, the test based on $U$ is preferable to the LR test when $p$ is large compared to $n$, and this is the scenario of interest of the paper.

TABLE 7. Size of sphericity test based on LR test statistic. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size does not converge to nominal size as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.

TABLE 8. Power of sphericity test based on LR test statistic. The null hypothesis is rejected when the test statistic exceeds the 95% size-adjusted cutoff point (to enable direct comparison with Table 2) obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other ones are equal to 0.5. Power does not converge to one as dimensionality $p$ goes to infinity with sample size $n$. Results come from 10,000 Monte Carlo simulations.

Another possible concern addresses the notion of consistency when $p$ tends to infinity. For $p$ fixed, the alternative is given by a fixed covariance matrix $\Sigma$ and consistency means that the power of the test tends to one as the sample size $n$ tends to infinity. Of course, when $p$ increases the matrix $\Sigma$ of the alternative can no longer be fixed. Our approach is to work within an asymptotic framework that places certain restrictions on how $\Sigma$ can evolve, namely we require that the quantities $\alpha$ and $\delta^2$ cannot change; see Assumption 2. Obviously, this excludes
certain alternatives of interest such as $\Sigma$ having all eigenvalues equal to 1 except for the largest which is equal to $p^{\beta}$, for some $0 < \beta < 0.5$. For this sequence of alternatives, the test based on John's statistic $U$ is not consistent and a test based on another statistic would have to be devised (e.g., involving the maximum sample eigenvalue). Such other asymptotic frameworks are deferred to future research.

8. Conclusions. In this paper, we have studied the sphericity test and the identity test for covariance matrices when the dimensionality is large compared to the sample size, and in particular when it exceeds the sample size. Our analysis is restricted to an asymptotic framework that considers the first two moments of the eigenvalues of the true covariance matrix to be independent of the dimensionality. We found that the existing test for sphericity based on John's (1971) statistic $U$ is robust against high dimensionality. On the other hand, the related test for identity based on Nagao's (1973) statistic $V$ is inconsistent. We proposed a modification to the statistic $V$ which makes it robust against high dimensionality. Monte Carlo simulations confirmed that our asymptotic results tend to hold well in finite samples. Directions for future research include: applying the method to other test statistics; finding limiting distributions under the alternative to compute power; searching for most powerful tests (within specific asymptotic frameworks for the sequence of alternatives); relaxing the normality assumption.

APPENDIX

PROOF OF PROPOSITION 1. The proof of this proposition is contained inside the proof of the main theorem of Yin and Krishnaiah (1983). Their paper deals with the product of two random matrices but it can be applied to our setup by taking one of them to be the identity matrix as a special case of a random matrix.
Even though their main theorem is derived under assumptions on all the average moments of the eigenvalues of the population covariance matrix, careful inspection of their proof reveals that convergence in probability of the first two average moments requires only assumptions up to the fourth moment. The formulas for the limits come from Yin and Krishnaiah's (1983) second equation on the top of page 504.

PROOF OF PROPOSITION 2. Changing $\alpha$ simply amounts to rescaling $\frac{1}{p}\,\mathrm{tr}(S)$ by $\alpha$ and $\frac{1}{p}\,\mathrm{tr}(S^2)$ by $\alpha^2$, therefore we can assume without loss of generality that $\alpha = 1$. Jonsson's (1982) Theorem 4.1 shows that, under the assumptions of Proposition 2,

$$\begin{bmatrix} \dfrac{n}{n + p}\,\big\{ \mathrm{tr}(S) - E[\mathrm{tr}(S)] \big\} \\[3mm] \dfrac{n^2}{(n + p)^2}\,\big\{ \mathrm{tr}(S^2) - E[\mathrm{tr}(S^2)] \big\} \end{bmatrix} \tag{21}$$
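converges in distribution to a bivariate normal. Since $p/n \to c \in (0, +\infty)$, this implies that

$$n \begin{bmatrix} \dfrac{1}{p}\,\mathrm{tr}(S) - E\!\left[\dfrac{1}{p}\,\mathrm{tr}(S)\right] \\[3mm] \dfrac{1}{p}\,\mathrm{tr}(S^2) - E\!\left[\dfrac{1}{p}\,\mathrm{tr}(S^2)\right] \end{bmatrix} \tag{22}$$

also converges in distribution to a bivariate normal. $\frac{1}{p}\,\mathrm{tr}(S)$ is the average of the diagonal elements of the unbiased sample covariance matrix, therefore its expectation is equal to one. John (1972), Lemma 2, shows that the expectation of $\frac{1}{p}\,\mathrm{tr}(S^2)$ is equal to $\frac{n + p + 1}{n}$. So far we have established that

$$n \begin{bmatrix} \dfrac{1}{p}\,\mathrm{tr}(S) - 1 \\[3mm] \dfrac{1}{p}\,\mathrm{tr}(S^2) - \dfrac{n + p + 1}{n} \end{bmatrix} \tag{23}$$

converges in distribution to a bivariate normal. Since this limiting bivariate normal has mean zero, the only task left is to compute its covariance matrix. This can be done by taking the limit of the covariance matrix of the expression in equation (23). Using once again the moments computed by John (1972), Lemma 2, we find that

$$\mathrm{Var}\!\left[\frac{n}{p}\,\mathrm{tr}(S)\right] = E\!\left[\left(\frac{n}{p}\,\mathrm{tr}(S)\right)^{2}\right] - \left(E\!\left[\frac{n}{p}\,\mathrm{tr}(S)\right]\right)^{2} = \frac{n\,(np + 2)}{p} - n^2 = \frac{2n}{p} \to \frac{2}{c},$$

$$\begin{aligned} \mathrm{Var}\!\left[\frac{n}{p}\,\mathrm{tr}(S^2)\right] &= E\!\left[\left(\frac{n}{p}\,\mathrm{tr}(S^2)\right)^{2}\right] - \left(E\!\left[\frac{n}{p}\,\mathrm{tr}(S^2)\right]\right)^{2} \\ &= \frac{p\,n^3 + (2p^2 + 2p + 8)\,n^2 + (p^3 + 2p^2 + 21p + 20)\,n + 8p^2 + 20p + 20}{np} - (n + p + 1)^2 \\ &= \frac{8n}{p} + \frac{20p^2 + 20p}{p^2} + \frac{8p^3 + 20p^2 + 20p}{n\,p^2} \to 4\left(\frac{2}{c} + 5 + 2c\right). \end{aligned}$$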
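Finally we have to find the covariance term. Let $s_{ij}$ denote the entry $(i, j)$ of the unbiased sample covariance matrix $S$. We have

$$\begin{aligned} E[\mathrm{tr}(S)\,\mathrm{tr}(S^2)] &= \sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{l=1}^{p} E[s_{ii}\,s_{jl}^2] \\ &= p(p - 1)(p - 2)\,E[s_{11}\,s_{23}^2] + p(p - 1)\,E[s_{11}\,s_{22}^2] + 2p(p - 1)\,E[s_{11}\,s_{12}^2] + p\,E[s_{11}^3] \\ &= \frac{p(p - 1)(p - 2)}{n} + \frac{p(p - 1)(n + 2)}{n} + \frac{2p(p - 1)(n + 2)}{n^2} + \frac{p\,(n + 2)(n + 4)}{n^2} \\ &= p^2 + \frac{p^3 + p^2 + 4p}{n} + \frac{4p^2 + 4p}{n^2}. \end{aligned} \tag{24}$$

The moment formulas that appear in equation (24) are computed in the same fashion as in the proof of Lemma 2 by John (1972). This enables us to compute the limiting covariance term as

$$\begin{aligned} \mathrm{Cov}\!\left[\frac{n}{p}\,\mathrm{tr}(S),\, \frac{n}{p}\,\mathrm{tr}(S^2)\right] &= \frac{n^2}{p^2}\,E[\mathrm{tr}(S)\,\mathrm{tr}(S^2)] - E\!\left[\frac{n}{p}\,\mathrm{tr}(S)\right] E\!\left[\frac{n}{p}\,\mathrm{tr}(S^2)\right] \\ &= n^2 + \frac{p^2 + p + 4}{p}\,n + \frac{4p + 4}{p} - n\,(n + p + 1) \\ &= \frac{4\,(n + p + 1)}{p} \to 4\left(1 + \frac{1}{c}\right). \end{aligned} \tag{25}$$

This completes the proof of Proposition 2.

PROOF OF PROPOSITION 3. Define the function $f(x, y) = \frac{y}{x^2} - 1$. Then $U = f\!\left(\frac{1}{p}\,\mathrm{tr}(S), \frac{1}{p}\,\mathrm{tr}(S^2)\right)$. Proposition 2 implies that, by the delta method,

$$n \left[ U - f\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) \right] \xrightarrow{D} N(0, \lim A),$$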
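where

$$A = \begin{bmatrix} \dfrac{\partial f}{\partial x}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \\[3mm] \dfrac{\partial f}{\partial y}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \end{bmatrix}' \begin{bmatrix} \dfrac{2\alpha^2}{c} & 4\left(1 + \dfrac{1}{c}\right)\alpha^3 \\[2mm] 4\left(1 + \dfrac{1}{c}\right)\alpha^3 & 4\left(\dfrac{2}{c} + 5 + 2c\right)\alpha^4 \end{bmatrix} \begin{bmatrix} \dfrac{\partial f}{\partial x}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \\[3mm] \dfrac{\partial f}{\partial y}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \end{bmatrix}$$

and $'$ denotes the transpose. Notice that

$$f\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) = \frac{p + 1}{n}, \tag{26}$$

$$\frac{\partial f}{\partial x}\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) = -2\,\frac{n + p + 1}{n\,\alpha}, \tag{27}$$

$$\frac{\partial f}{\partial y}\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) = \frac{1}{\alpha^2}. \tag{28}$$

Placing the last two expressions into the formula for $A$ yields

$$A = \frac{8\,(n + p + 1)^2}{n^2\,c} - 16\,\frac{n + p + 1}{n}\left(1 + \frac{1}{c}\right) + 4\left(\frac{2}{c} + 5 + 2c\right) \tag{29}$$

$$\to \frac{8\,(1 + c)^2}{c} - 16\,(1 + c)\left(1 + \frac{1}{c}\right) + 4\left(\frac{2}{c} + 5 + 2c\right) = 4. \tag{30}$$

This completes the proof of Proposition 3.

PROOF OF PROPOSITION 4. Let $z_1, z_2, \ldots$ denote a sequence of i.i.d. standard normal random variables. Then $Y_{p_k(p_k+1)/2 + a}$ has the same distribution as $z_1^2 + \cdots + z_{p_k(p_k+1)/2 + a}^2$. Since $E[z_1^2] = 1$ and $\mathrm{Var}[z_1^2] = 2$, the Lindeberg-Lévy central limit theorem implies that

$$\sqrt{\frac{p_k(p_k + 1)}{2} + a}\, \left[ \frac{Y_{p_k(p_k+1)/2 + a}}{p_k(p_k + 1)/2 + a} - 1 \right] \xrightarrow{D} N(0, 2). \tag{31}$$

Multiplying the left-hand side by $\sqrt{p_k(p_k + 1) + 2a}\,/\,p_k$, which converges to one, does not affect the limit, therefore

$$\frac{\sqrt{2}}{p_k}\,Y_{p_k(p_k+1)/2 + a} - \frac{p_k + 1}{\sqrt{2}} - \frac{a\sqrt{2}}{p_k} \xrightarrow{D} N(0, 2). \tag{32}$$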
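Subtracting the term $-a\sqrt{2}/p_k$, which converges to zero, from the left-hand side does not affect the limit, therefore

$$\frac{\sqrt{2}}{p_k}\,Y_{p_k(p_k+1)/2 + a} - \frac{p_k + 1}{\sqrt{2}} \xrightarrow{D} N(0, 2). \tag{33}$$

Rescaling equation (33) yields (10).

PROOF OF PROPOSITION 5. Define the function $g(x, y) = y - 2x + 1$. Then $V = g\!\left(\frac{1}{p}\,\mathrm{tr}(S), \frac{1}{p}\,\mathrm{tr}(S^2)\right)$. Proposition 2 implies that, by the delta method,

$$n \left[ V - g\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) \right] \xrightarrow{D} N(0, \lim B),$$

where

$$B = \begin{bmatrix} \dfrac{\partial g}{\partial x}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \\[3mm] \dfrac{\partial g}{\partial y}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \end{bmatrix}' \begin{bmatrix} \dfrac{2\alpha^2}{c} & 4\left(1 + \dfrac{1}{c}\right)\alpha^3 \\[2mm] 4\left(1 + \dfrac{1}{c}\right)\alpha^3 & 4\left(\dfrac{2}{c} + 5 + 2c\right)\alpha^4 \end{bmatrix} \begin{bmatrix} \dfrac{\partial g}{\partial x}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \\[3mm] \dfrac{\partial g}{\partial y}\!\left(\alpha, \dfrac{n + p + 1}{n}\,\alpha^2\right) \end{bmatrix}.$$

Notice that

$$g\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) = (\alpha - 1)^2 + \frac{p + 1}{n}\,\alpha^2, \tag{34}$$

$$\frac{\partial g}{\partial x}\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) = -2, \tag{35}$$

$$\frac{\partial g}{\partial y}\!\left(\alpha, \frac{n + p + 1}{n}\,\alpha^2\right) = 1. \tag{36}$$

Placing the last two expressions into the formula for $B$ yields

$$B = \frac{8\,\alpha^2}{c} - 16\left(1 + \frac{1}{c}\right)\alpha^3 + 4\left(\frac{2}{c} + 5 + 2c\right)\alpha^4. \tag{37}$$

First let us find the $(n, p)$-limiting distribution of $V$ under the null. Setting $\alpha$ equal to one yields $g\!\left(1, \frac{n + p + 1}{n}\right) = \frac{p + 1}{n}$ and $B = 4 + 8c$. Hence, under the null,

$$n \left[ V - \frac{p + 1}{n} \right] \xrightarrow{D} N(0, 4 + 8c). \tag{38}$$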
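The two delta-method variances obtained so far, the limit $4$ in equation (30) and $B = 4 + 8c$ under the null, can be double-checked numerically. The sketch below simply evaluates the quadratic form of the covariance matrix in Proposition 2 with $\alpha = 1$; the function names are hypothetical and the gradients are the limiting values of (27)-(28) and (35)-(36).

```python
def clt_covariance(c, alpha=1.0):
    """Limiting covariance matrix of Proposition 2 as ((a, b), (b, d))."""
    a = 2.0 * alpha ** 2 / c
    b = 4.0 * (1.0 + 1.0 / c) * alpha ** 3
    d = 4.0 * (2.0 / c + 5.0 + 2.0 * c) * alpha ** 4
    return (a, b), (b, d)


def delta_variance(grad, c, alpha=1.0):
    """Quadratic form grad' * Sigma * grad used by the delta method."""
    (a, b), (_, d) = clt_covariance(c, alpha)
    gx, gy = grad
    return gx * gx * a + 2.0 * gx * gy * b + gy * gy * d


for c in (0.25, 1.0, 4.0):
    var_u = delta_variance((-2.0 * (1.0 + c), 1.0), c)  # limit of the gradient of f
    var_v = delta_variance((-2.0, 1.0), c)              # gradient of g
    print(c, var_u, var_v, 4.0 + 8.0 * c)
```

For every concentration tried, the first variance equals 4 and the second equals $4 + 8c$, matching equations (9) and (15).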
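Now let us find the $(n, p)$-limiting distribution of $V$ under the alternative. Setting $\alpha$ equal to $\frac{1 - c}{1 + c}$ yields

$$g\!\left(\frac{1 - c}{1 + c}, \frac{n + p + 1}{n}\left(\frac{1 - c}{1 + c}\right)^{2}\right) = \left(\frac{2c}{1 + c}\right)^{2} + \frac{p + 1}{n}\left(\frac{1 - c}{1 + c}\right)^{2} = \frac{p + 1}{n} - \frac{4c\,(d + 1)}{(1 + c)^2}\,\frac{1}{n} + o\!\left(\frac{1}{n}\right)$$

and

$$B = \frac{8}{c}\left(\frac{1 - c}{1 + c}\right)^{2} - 16\left(1 + \frac{1}{c}\right)\left(\frac{1 - c}{1 + c}\right)^{3} + 4\left(\frac{2}{c} + 5 + 2c\right)\left(\frac{1 - c}{1 + c}\right)^{4} = \frac{4\,(1 - c)^2\,(1 + 5c^2 + 2c^3)}{(1 + c)^4}.$$

Hence, under the alternative,

$$n \left[ V - \frac{p + 1}{n} \right] \xrightarrow{D} N\!\left( -\frac{4c\,(d + 1)}{(1 + c)^2},\; \frac{4\,(1 - c)^2\,(1 + 5c^2 + 2c^3)}{(1 + c)^4} \right). \tag{39}$$

Therefore the power of a test of significance level $\theta > 0$ to reject the null $\Sigma = I$ when the alternative $\Sigma = \frac{1 - c}{1 + c}\,I$ is true converges to

$$1 - \Phi\!\left( \frac{\Phi^{-1}(1 - \theta)\,\sqrt{4 + 8c} + 4c\,(d + 1)/(1 + c)^2}{\sqrt{4\,(1 - c)^2\,(1 + 5c^2 + 2c^3)/(1 + c)^4}} \right) < 1, \tag{40}$$

where $\Phi$ denotes the standard normal c.d.f.

PROOF OF PROPOSITION 6. Assuming $p$ fixed, it is easily seen that

$$n\,(W - V) = p \left[ 1 - \left( \frac{1}{p}\,\mathrm{tr}(S) \right)^{2} \right] \tag{41}$$

$$\xrightarrow{P} p\,(1 - \alpha^2) = 0 \quad \text{(under the null)}. \tag{42}$$

Hence, under the null, $n\,(W - V)$ converges to zero in probability, as $n$ goes to infinity for $p$ fixed. The proof is completed by applying Slutzky's theorem.

PROOF OF PROPOSITION 7. Define $h(x, y) = y - 2x + 1 - \frac{p}{n}\,x^2 + \frac{p}{n}$. Then $W = h\!\left(\frac{1}{p}\,\mathrm{tr}(S), \frac{1}{p}\,\mathrm{tr}(S^2)\right)$. Proposition 2 implies that, by the delta method,

$$n \left[ W - h\!\left(1, \frac{n + p + 1}{n}\right) \right] \xrightarrow{D} N(0, \lim C),$$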
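where

$$C = \begin{bmatrix} \dfrac{\partial h}{\partial x}\!\left(1, \dfrac{n + p + 1}{n}\right) \\[3mm] \dfrac{\partial h}{\partial y}\!\left(1, \dfrac{n + p + 1}{n}\right) \end{bmatrix}' \begin{bmatrix} \dfrac{2}{c} & 4\left(1 + \dfrac{1}{c}\right) \\[2mm] 4\left(1 + \dfrac{1}{c}\right) & 4\left(\dfrac{2}{c} + 5 + 2c\right) \end{bmatrix} \begin{bmatrix} \dfrac{\partial h}{\partial x}\!\left(1, \dfrac{n + p + 1}{n}\right) \\[3mm] \dfrac{\partial h}{\partial y}\!\left(1, \dfrac{n + p + 1}{n}\right) \end{bmatrix}.$$

Notice that

$$h\!\left(1, \frac{n + p + 1}{n}\right) = \frac{p + 1}{n}, \tag{43}$$

$$\frac{\partial h}{\partial x}\!\left(1, \frac{n + p + 1}{n}\right) = -2\,\frac{n + p}{n}, \tag{44}$$

$$\frac{\partial h}{\partial y}\!\left(1, \frac{n + p + 1}{n}\right) = 1. \tag{45}$$

Placing the last two expressions into the formula for $C$ yields

$$C = \frac{8\,(n + p)^2}{n^2\,c} - 16\,\frac{n + p}{n}\left(1 + \frac{1}{c}\right) + 4\left(\frac{2}{c} + 5 + 2c\right) \to \frac{8\,(1 + c)^2}{c} - 16\,(1 + c)\left(1 + \frac{1}{c}\right) + 4\left(\frac{2}{c} + 5 + 2c\right) = 4. \tag{46}$$

This completes the proof of Proposition 7.

Acknowledgments. We wish to thank Theodore W. Anderson for encouragement. We are also grateful to an Associate Editor and a referee for constructive criticisms that have led to an improved presentation of the paper.

REFERENCES

ALALOUF, I. S. (1978). An explicit treatment of the general linear model with singular covariance matrix. Sankhyā Ser. B.
ANDERSON, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New York.
ARHAROV, L. V. (1971). Limit theorems for the characteristic roots of a sample covariance matrix. Soviet Math. Dokl.
BAI, Z. D. (1993). Convergence rate of expected spectral distributions of large random matrices. II. Sample covariance matrices. Ann. Probab.
BAI, Z. D., KRISHNAIAH, P. R. and ZHAO, L. C. (1989). On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Trans. Inform. Theory.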
More informationStéphane Boucheron 1, Olivier Bousquet 2 and Gábor Lugosi 3
ESAIM: Probability ad Statistics URL: http://wwwemathfr/ps/ Will be set by the publisher THEORY OF CLASSIFICATION: A SURVEY OF SOME RECENT ADVANCES Stéphae Bouchero 1, Olivier Bousquet 2 ad Gábor Lugosi
More informationThe Arithmetic of Investment Expenses
Fiacial Aalysts Joural Volume 69 Number 2 2013 CFA Istitute The Arithmetic of Ivestmet Expeses William F. Sharpe Recet regulatory chages have brought a reewed focus o the impact of ivestmet expeses o ivestors
More informationSystemic Risk and Stability in Financial Networks
America Ecoomic Review 2015, 105(2): 564 608 http://dx.doi.org/10.1257/aer.20130456 Systemic Risk ad Stability i Fiacial Networks By Daro Acemoglu, Asuma Ozdaglar, ad Alireza TahbazSalehi * This paper
More informationType Less, Find More: Fast Autocompletion Search with a Succinct Index
Type Less, Fid More: Fast Autocompletio Search with a Succict Idex Holger Bast MaxPlackIstitut für Iformatik Saarbrücke, Germay bast@mpiif.mpg.de Igmar Weber MaxPlackIstitut für Iformatik Saarbrücke,
More informationCrowds: Anonymity for Web Transactions
Crowds: Aoymity for Web Trasactios Michael K. Reiter ad Aviel D. Rubi AT&T Labs Research I this paper we itroduce a system called Crowds for protectig users aoymity o the worldwideweb. Crowds, amed for
More informationON THE EVOLUTION OF RANDOM GRAPHS by P. ERDŐS and A. RÉNYI. Introduction
ON THE EVOLUTION OF RANDOM GRAPHS by P. ERDŐS ad A. RÉNYI Itroductio Dedicated to Professor P. Turá at his 50th birthday. Our aim is to study the probable structure of a radom graph r N which has give
More informationSignal Reconstruction from Noisy Random Projections
Sigal Recostructio from Noisy Radom Projectios Jarvis Haut ad Robert Nowak Deartmet of Electrical ad Comuter Egieerig Uiversity of WiscosiMadiso March, 005; Revised February, 006 Abstract Recet results
More informationThe Unicorn, The Normal Curve, and Other Improbable Creatures
Psychological Bulleti 1989, Vol. 105. No.1, 156166 The Uicor, The Normal Curve, ad Other Improbable Creatures Theodore Micceri 1 Departmet of Educatioal Leadership Uiversity of South Florida A ivestigatio
More informationBOUNDED GAPS BETWEEN PRIMES
BOUNDED GAPS BETWEEN PRIMES ANDREW GRANVILLE Abstract. Recetly, Yitag Zhag proved the existece of a fiite boud B such that there are ifiitely may pairs p, p of cosecutive primes for which p p B. This ca
More informationTeaching Bayesian Reasoning in Less Than Two Hours
Joural of Experimetal Psychology: Geeral 21, Vol., No. 3, 4 Copyright 21 by the America Psychological Associatio, Ic. 963445/1/S5. DOI: 1.7//963445..3. Teachig Bayesia Reasoig i Less Tha Two Hours Peter
More informationJ. J. Kennedy, 1 N. A. Rayner, 1 R. O. Smith, 2 D. E. Parker, 1 and M. Saunby 1. 1. Introduction
Reassessig biases ad other ucertaities i seasurface temperature observatios measured i situ sice 85, part : measuremet ad samplig ucertaities J. J. Keedy, N. A. Rayer, R. O. Smith, D. E. Parker, ad M.
More informationAdverse Health Care Events Reporting System: What have we learned?
Adverse Health Care Evets Reportig System: What have we leared? 5year REVIEW Jauary 2009 For More Iformatio: Miesota Departmet of Health Divisio of Health Policy P.O. Box 64882 85 East Seveth Place, Suite
More information4. Trees. 4.1 Basics. Definition: A graph having no cycles is said to be acyclic. A forest is an acyclic graph.
4. Trees Oe of the importat classes of graphs is the trees. The importace of trees is evidet from their applicatios i various areas, especially theoretical computer sciece ad molecular evolutio. 4.1 Basics
More informationA Kernel TwoSample Test
Joural of Machie Learig Research 3 0) 73773 Subitted 4/08; Revised /; Published 3/ Arthur Gretto MPI for Itelliget Systes Speastrasse 38 7076 Tübige, Geray A Kerel TwoSaple Test Karste M. Borgwardt Machie
More informationare new doctors safe to practise?
Be prepared: are ew doctors safe to practise? Cotets What we foud 02 Why we ve writte this report 04 What is preparedess ad how ca it be measured? 06 How well prepared are medical graduates? 08 How has
More informationWhich Codes Have CycleFree Tanner Graphs?
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 6, SEPTEMBER 1999 173 Which Coes Have CycleFree Taer Graphs? Tuvi Etzio, Seior Member, IEEE, Ari Trachteberg, Stuet Member, IEEE, a Alexaer Vary,
More informationTurning Brownfields into Greenspaces: Examining Incentives and Barriers to Revitalization
Turig Browfields ito Greespaces: Examiig Icetives ad Barriers to Revitalizatio Juha Siikamäki Resources for the Future Kris Werstedt Virgiia Tech Uiversity Abstract This study employs iterviews, documet
More informationDryad: Distributed DataParallel Programs from Sequential Building Blocks
Dryad: Distributed DataParallel Programs from Sequetial uildig locks Michael Isard Microsoft esearch, Silico Valley drew irrell Microsoft esearch, Silico Valley Mihai udiu Microsoft esearch, Silico Valley
More informationSpinout Companies. A Researcher s Guide
Spiout Compaies A Researcher s Guide Cotets Itroductio 2 Sectio 1 Why create a spiout compay? 4 Sectio 2 Itellectual Property 10 Sectio 3 Compay Structure 15 Sectio 4 Shareholders ad Directors 19 Sectio
More informationCatalogue no. 62557XPB Your Guide to the Consumer Price Index
Catalogue o. 62557XPB Your Guide to the Cosumer Price Idex (Texte fraçais au verso) Statistics Caada Statistique Caada Data i may forms Statistics Caada dissemiates data i a variety of forms. I additio
More informationThe Review of Economic Studies Ltd.
The Review of Ecoomic Studies Ltd. Walras' Tâtoemet i the Theory of Exchage Author(s): H. Uzawa Source: The Review of Ecoomic Studies, Vol. 27, No. 3 (Ju., 1960), pp. 182194 Published by: The Review of
More informationNo One Benefits. How teacher pension systems are failing BOTH teachers and taxpayers
No Oe Beefits How teacher pesio systems are failig BOTH teachers ad taxpayers Authors Kathry M. Doherty, Sadi Jacobs ad Trisha M. Madde Pricipal Fudig The Bill ad Melida Gates Foudatio ad the Joyce Foudatio.
More information