SOME HYPOTHESIS TESTS FOR THE COVARIANCE MATRIX WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE


The Annals of Statistics, 2002, Vol. 30, No. 4

BY OLIVIER LEDOIT AND MICHAEL WOLF [1]
UCLA and Credit Suisse First Boston, and Universitat Pompeu Fabra

This paper analyzes whether standard covariance matrix tests work when dimensionality is large, and in particular larger than sample size. In the latter case, the singularity of the sample covariance matrix makes likelihood ratio tests degenerate, but other tests based on quadratic forms of sample covariance matrix eigenvalues remain well-defined. We study the consistency property and limiting distribution of these tests as dimensionality and sample size go to infinity together, with their ratio converging to a finite nonzero limit. We find that the existing test for sphericity is robust against high dimensionality, but not the test for equality of the covariance matrix to a given matrix. For the latter test, we develop a new correction to the existing test statistic that makes it robust against high dimensionality.

1. Introduction. Many empirical problems involve large-dimensional covariance matrices. Sometimes the dimensionality p is even larger than the sample size n, which makes the sample covariance matrix S singular. How to conduct statistical inference in this case? For concreteness, we focus on two common testing problems in this paper: (1) the covariance matrix $\Sigma$ is proportional to the identity I (sphericity); (2) the covariance matrix $\Sigma$ is equal to the identity I. The identity can be replaced with any other matrix $\Sigma_0$ by multiplying the data by $\Sigma_0^{-1/2}$. Following much of the literature, we assume normality. For both hypotheses the likelihood ratio test statistic is degenerate when p exceeds n; see, for example, Muirhead (1982), Sections 8.3 and 8.4, or Anderson (1984), Section 10.7. This steers us toward other test statistics that do not degenerate, such as

$$U = \frac{1}{p}\operatorname{tr}\left[\left(\frac{S}{(1/p)\operatorname{tr}(S)} - I\right)^{2}\right] \quad\text{and}\quad V = \frac{1}{p}\operatorname{tr}\left[(S - I)^{2}\right], \tag{1}$$

where tr denotes the trace.
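As an illustration of equation (1), the following is a minimal sketch, assuming NumPy is available. The function names (`sample_covariance`, `john_U`, `nagao_V`) are hypothetical and chosen here for readability; they do not come from the paper.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased sample covariance of an (n+1) x p data matrix (rows = observations)."""
    n = X.shape[0] - 1
    Xc = X - X.mean(axis=0)          # remove the column means
    return Xc.T @ Xc / n

def john_U(S):
    """U = (1/p) tr[(S / ((1/p) tr S) - I)^2], the sphericity statistic in (1)."""
    p = S.shape[0]
    D = S / (np.trace(S) / p) - np.eye(p)
    return np.trace(D @ D) / p

def nagao_V(S):
    """V = (1/p) tr[(S - I)^2], the identity-test statistic in (1)."""
    p = S.shape[0]
    D = S - np.eye(p)
    return np.trace(D @ D) / p

# Example with p = 50 > n = 20: S is singular, yet U and V are still well-defined.
rng = np.random.default_rng(0)
X = rng.standard_normal((21, 50))
S = sample_covariance(X)
print(np.linalg.matrix_rank(S), john_U(S), nagao_V(S))
```

Both statistics depend on S only through traces of polynomials in S, which is why no inverse or determinant of S is needed.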
John (1971) proves that the test based on U is the locally most powerful invariant test for sphericity, and Nagao (1973) derives V as the equivalent of U for the test of $\Sigma = I$. The asymptotic framework where U and V have been studied assumes that n goes to infinity while p remains fixed. It treats terms of order p/n like terms of order 1/n, which is inappropriate if p is of the same order of magnitude as n. The robustness of tests based on U and V against high dimensionality is heretofore unknown.

Received May 1998; revised November.
[1] Supported by DGES Grant BEC.
AMS 2000 subject classifications. Primary 62H15; secondary 62E20.
Key words and phrases. Concentration asymptotics, equality test, sphericity test.

We study the asymptotic behavior of U and V as p and n go to infinity together with the ratio p/n converging to a limit $c \in (0, +\infty)$ called the concentration. The singular case corresponds to a concentration above one. The robustness issue boils down to power and size: is the test still consistent? Is the n-limiting distribution under the null still a good approximation?

Surprisingly, we find opposite answers for U and V. The power and the size of the sphericity test based on U turn out to be robust against p large, and even larger than n. But the test of $\Sigma = I$ based on V is not consistent against every alternative when p goes to infinity with n, and its n-limiting distribution differs from its (n, p)-limiting distribution under the null. This prompts us to introduce the modified statistic

$$W = \frac{1}{p}\operatorname{tr}\left[(S - I)^{2}\right] - \frac{p}{n}\left[\frac{1}{p}\operatorname{tr}(S)\right]^{2} + \frac{p}{n}. \tag{2}$$

W has the same n-asymptotic properties as V: it is n-consistent and has the same n-limiting distribution as V under the null. We show that, contrary to V, the power and the size of the test based on W are robust against p large, and even larger than n.

The contributions of this paper are: (i) developing a method to check the robustness of covariance matrix tests against high dimensionality; and (ii) finding two statistics (one old and one new) for commonly used covariance matrix tests that can be used when the sample covariance matrix is singular. Our results rest on a large and important body of literature on the asymptotics for eigenvalues of random matrices, such as Arharov (1971), Bai (1993), Girko (1979, 1988), Jonsson (1982), Narayanaswamy and Raghavarao (1991), Serdobol'skii (1985, 1995, 1999), Silverstein (1986), Silverstein and Combettes (1992), Wachter (1976, 1978) and Yin and Krishnaiah (1983), among others.
Also, we are adding to a substantial list of papers dealing with statistical tests using results on large random matrices, such as Alalouf (1978), Bai, Krishnaiah and Zhao (1989), Bai and Saranadasa (1996), Dempster (1958, 1960), Läuter (1996), Saranadasa (1993), Wilson and Kshirsagar (1980) and Zhao, Krishnaiah and Bai (1986a, 1986b).

The remainder of the paper is organized as follows. Section 2 compiles preliminary results. Section 3 shows that the test statistic U for sphericity is robust against large dimensionality. Section 4 shows that the test of $\Sigma = I$ based on V is not. Section 5 introduces a new statistic W that can be used when p is large. Section 6 reports evidence from Monte Carlo simulations. Section 7 addresses some possible concerns. Section 8 contains the conclusions. Proofs are deferred to the Appendix.

2. Preliminaries. The exact sense in which sample size and dimensionality go to infinity together is defined by the following assumptions.

ASSUMPTION 1 (Asymptotics). Dimensionality and sample size are two increasing integer functions $p = p_k$ and $n = n_k$ of an index $k = 1, 2, \ldots$ such that $\lim_{k\to\infty} p_k = +\infty$, $\lim_{k\to\infty} n_k = +\infty$ and there exists $c \in (0, +\infty)$ such that $\lim_{k\to\infty} p_k/n_k = c$.

The case where the sample covariance matrix is singular corresponds to a concentration c higher than one. In this paper, we refer to concentration asymptotics or (n, p)-asymptotics. Another term sometimes used for the same concept is "increasing dimension asymptotics" (i.d.a.); for example, see Serdobol'skii (1999).

ASSUMPTION 2 (Data-generating process). For each positive integer k, $X_k$ is an $(n_k + 1) \times p_k$ matrix of $n_k + 1$ i.i.d. observations on a system of $p_k$ random variables that are jointly normally distributed with mean vector $\mu_k$ and covariance matrix $\Sigma_k$. Let $\lambda_{1,k}, \ldots, \lambda_{p_k,k}$ denote the eigenvalues of the covariance matrix $\Sigma_k$. We suppose that their average $\alpha = \sum_{i=1}^{p_k} \lambda_{i,k}/p_k$ and their dispersion $\delta^2 = \sum_{i=1}^{p_k} (\lambda_{i,k} - \alpha)^2/p_k$ are independent of the index k. Furthermore, we require $\alpha > 0$.

$S_k$ is the sample covariance matrix with entries

$$s_{ij,k} = \frac{1}{n_k}\sum_{l=1}^{n_k+1} (x_{il,k} - m_{i,k})(x_{jl,k} - m_{j,k}) \quad\text{where}\quad m_{i,k} = \frac{1}{n_k+1}\sum_{l=1}^{n_k+1} x_{il,k}.$$

The null hypothesis of sphericity can be stated as $\delta^2 = 0$, and the null $\Sigma = I$ can be stated as $\delta^2 = 0$ and $\alpha = 1$. We need one more assumption to obtain convergence results under the alternative.

ASSUMPTION 3 (Higher moments). The averages of the third and fourth moments of the eigenvalues of the population covariance matrix,

$$\sum_{i=1}^{p_k} \frac{(\lambda_{i,k})^{j}}{p_k} \qquad (j = 3, 4),$$

converge to finite limits, respectively.

Dependence on k will be omitted when no ambiguity is possible. Much of the mathematical groundwork has already been laid out by research in the spectral theory of large-dimensional random matrices. The fundamental results of interest to us are as follows.

PROPOSITION 1 (Law of large numbers). Under Assumptions 1–3,

$$\frac{1}{p}\operatorname{tr}(S) \xrightarrow{P} \alpha, \tag{3}$$

$$\frac{1}{p}\operatorname{tr}(S^{2}) \xrightarrow{P} (1 + c)\alpha^{2} + \delta^{2}, \tag{4}$$

where $\xrightarrow{P}$ denotes convergence in probability.
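The limits in Proposition 1 can be checked numerically. The sketch below is illustrative only: it assumes NumPy, uses a diagonal population covariance matrix with half of the eigenvalues equal to 1 and half equal to 0.5, and picks n = 200 and p = 400 arbitrarily (so that p/n = 2). The helper name `trace_moments` is hypothetical.

```python
import numpy as np

def trace_moments(X):
    """Return ((1/p) tr S, (1/p) tr S^2) for the unbiased sample covariance S of X."""
    n = X.shape[0] - 1
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    # For a symmetric matrix, tr(S^2) equals the sum of its squared entries.
    return np.trace(S) / p, np.sum(S * S) / p

rng = np.random.default_rng(0)
n, p = 200, 400
eig = np.r_[np.ones(p // 2), 0.5 * np.ones(p // 2)]   # population eigenvalues
X = rng.standard_normal((n + 1, p)) * np.sqrt(eig)    # N(0, diag(eig)) observations

alpha = eig.mean()        # average eigenvalue
delta2 = eig.var()        # dispersion of the eigenvalues
c = p / n                 # stands in for the concentration

m1, m2 = trace_moments(X)
print(m1, alpha)                               # compare with the limit in (3)
print(m2, (1 + c) * alpha**2 + delta2)         # compare with the limit in (4)
```

Note that the second limit contains the extra term $c\alpha^2$, which would be absent if p were held fixed; this term is what drives the results for V later on.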

All proofs are in the Appendix. This law of large numbers will help us establish whether or not a given test is consistent against every alternative as n and p go to infinity together. The distribution of the test statistic under the null will be found by using the following central limit theorem.

PROPOSITION 2 (Central limit theorem). Under Assumptions 1–2, if $\delta^2 = 0$, then

$$n \times \begin{bmatrix} \dfrac{1}{p}\operatorname{tr}(S) - \alpha \\[2mm] \dfrac{1}{p}\operatorname{tr}(S^{2}) - \dfrac{n+p+1}{n}\,\alpha^{2} \end{bmatrix} \xrightarrow{D} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \dfrac{2\alpha^{2}}{c} & 4\left(1 + \dfrac{1}{c}\right)\alpha^{3} \\[2mm] 4\left(1 + \dfrac{1}{c}\right)\alpha^{3} & 4\left(\dfrac{2}{c} + 5 + 2c\right)\alpha^{4} \end{bmatrix} \right), \tag{5}$$

where $\xrightarrow{D}$ denotes convergence in distribution and N the normal distribution.

3. Sphericity test. It is well known that the sphericity test based on U is n-consistent. As for (n, p)-consistency, Proposition 1 implies that, under Assumptions 1–3,

$$U = \frac{(1/p)\operatorname{tr}(S^{2})}{[(1/p)\operatorname{tr}(S)]^{2}} - 1 \xrightarrow{P} \frac{(1 + c)\alpha^{2} + \delta^{2}}{\alpha^{2}} - 1 = c + \frac{\delta^{2}}{\alpha^{2}}. \tag{6}$$

Since c can be approximated by the known quantity p/n, the power of this test to separate the null hypothesis of sphericity $\delta^2/\alpha^2 = 0$ from the alternative $\delta^2/\alpha^2 > 0$ converges to one as n and p go to infinity together: this constitutes an (n, p)-consistent test.

John (1972) shows that, as n goes to infinity while p remains fixed, the limiting distribution of U under the null is given by

$$\frac{np}{2}\, U \xrightarrow{D} Y_{p(p+1)/2 - 1} \tag{7}$$

or, equivalently,

$$nU - p \xrightarrow{D} \frac{2}{p}\, Y_{p(p+1)/2 - 1} - p, \tag{8}$$

where $Y_d$ denotes a random variable distributed as a $\chi^2$ with d degrees of freedom. It will become apparent after Proposition 4 why we choose to rewrite equation (7) as (8). This approximation may or may not remain accurate under (n, p)-asymptotics, depending on whether it omits terms of order p/n. To find out, let us start by deriving the (n, p)-limiting distribution of U under the null hypothesis $\delta^2/\alpha^2 = 0$.
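The test described by equations (6) to (8) can be sketched as follows, assuming NumPy. One choice here is an assumption rather than the authors' procedure: the 95% cutoff of $(2/p)Y_{p(p+1)/2-1} - p$ is estimated from simulated $\chi^2$ draws instead of being read from an exact quantile function. All function names are hypothetical.

```python
import numpy as np

def john_U(S):
    """U written as in (6): (1/p) tr(S^2) / [(1/p) tr(S)]^2 - 1."""
    p = S.shape[0]
    return (np.sum(S * S) / p) / (np.trace(S) / p) ** 2 - 1.0

def chi2_cutoff(p, level=0.95, draws=200_000, seed=0):
    """Simulated quantile of (2/p) * Y - p with Y ~ chi2(p(p+1)/2 - 1), as in (8)."""
    rng = np.random.default_rng(seed)
    df = p * (p + 1) // 2 - 1
    y = rng.chisquare(df, size=draws)
    return np.quantile(2.0 / p * y - p, level)

def sphericity_test(X, level=0.95):
    """Return (n*U - p, reject?) for an (n+1) x p data matrix X."""
    n = X.shape[0] - 1
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    stat = n * john_U(S) - p
    return stat, bool(stat > chi2_cutoff(p, level))

# Usage: a clearly non-spherical population (eigenvalues 1 and 0.1).
rng = np.random.default_rng(1)
eig = np.r_[np.ones(5), 0.1 * np.ones(5)]
X_alt = rng.standard_normal((101, 10)) * np.sqrt(eig)
print(sphericity_test(X_alt))
```

Because U is invariant to rescaling of S, the test does not require knowledge of $\alpha$.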

PROPOSITION 3. Under the assumptions of Proposition 2,

$$nU - p \xrightarrow{D} N(1, 4). \tag{9}$$

Now we can compare equations (8) and (9).

PROPOSITION 4. Suppose that, for every k, the random variable $Y_{p_k(p_k+1)/2+a}$ is distributed as a $\chi^2$ with $p_k(p_k + 1)/2 + a$ degrees of freedom, where a is a constant integer. Then its limiting distribution under Assumption 1 satisfies

$$\frac{2}{p_k}\, Y_{p_k(p_k+1)/2+a} - p_k \xrightarrow{D} N(1, 4). \tag{10}$$

Using Proposition 4 with $a = -1$ shows that the n-limiting distribution given by equation (8) is still correct under (n, p)-asymptotics.

The conclusion of our analysis of the sphericity test based on U is the following: the existing n-asymptotic theory (where p is fixed) remains valid if p goes to infinity with n, even for the case p > n.

4. Test that a covariance matrix is the identity. As n goes to infinity with p fixed, $S \xrightarrow{P} \Sigma$, therefore $V \xrightarrow{P} \frac{1}{p}\operatorname{tr}[(\Sigma - I)^2]$. This shows that the test of $\Sigma = I$ based on V is n-consistent. As for (n, p)-consistency, Proposition 1 implies that, under Assumptions 1–3,

$$V = \frac{1}{p}\operatorname{tr}(S^{2}) - \frac{2}{p}\operatorname{tr}(S) + 1 \xrightarrow{P} (1 + c)\alpha^{2} + \delta^{2} - 2\alpha + 1 = c\alpha^{2} + (\alpha - 1)^{2} + \delta^{2}. \tag{11}$$

Since $\frac{1}{p}\operatorname{tr}[(\Sigma - I)^2] = (\alpha - 1)^2 + \delta^2$ is a squared measure of distance between the population covariance matrix and the identity, the null hypothesis can be rewritten as $(\alpha - 1)^2 + \delta^2 = 0$, and the alternative as $(\alpha - 1)^2 + \delta^2 > 0$. The problem is that the probability limit of the test statistic V is not directly a function of $(\alpha - 1)^2 + \delta^2$: it involves another term, $c\alpha^2$, which contains the nuisance parameter $\alpha^2$. Therefore the test based on V may sometimes be powerless to separate the null from the alternative. More specifically, when the triplet $(c, \alpha, \delta)$ satisfies

$$c\alpha^{2} + (\alpha - 1)^{2} + \delta^{2} = c, \tag{12}$$

the test statistic V has the same probability limit under the null as under the alternative. The clearest counter-examples are those where $\delta^2 = 0$, because Proposition 2 allows us to compute the limit of the power of the test against such alternatives. When $\delta^2 = 0$ the solution to equation (12) is $\alpha = \frac{1-c}{1+c}$.
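As a worked check of equation (12), the short sketch below uses exact rational arithmetic from the Python standard library to confirm that, with $\delta^2 = 0$, the value $\alpha = (1-c)/(1+c)$ gives V the same probability limit as the null value $\alpha = 1$. The function name `v_limit` and the choice c = 1/2 are illustrative assumptions.

```python
from fractions import Fraction

def v_limit(c, alpha, delta2=Fraction(0)):
    """Probability limit of V from (11): c*alpha^2 + (alpha - 1)^2 + delta^2."""
    return c * alpha**2 + (alpha - 1) ** 2 + delta2

c = Fraction(1, 2)
alpha_null = Fraction(1)
alpha_alt = (1 - c) / (1 + c)        # the non-trivial root of (12) when delta^2 = 0

print(alpha_alt)                     # 1/3
print(v_limit(c, alpha_null))        # 1/2, the limit under the null
print(v_limit(c, alpha_alt))         # 1/2 again, so V cannot tell the two apart
```

With $\delta^2 = 0$, equation (12) is the quadratic $(1+c)\alpha^2 - 2\alpha + (1-c) = 0$, whose two roots are $\alpha = 1$ and $\alpha = (1-c)/(1+c)$.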

PROPOSITION 5. Under Assumptions 1–2, if $c \in (0, 1)$ and there exists a finite d such that $p/n = c + d/n + o(1/n)$, then the power of the test of any positive significance level based on V to reject the null $\Sigma = I$ when the alternative $\Sigma = \frac{1-c}{1+c}\, I$ is true converges to a limit strictly below one.

We see that the n-consistency of the test based on V does not extend to (n, p)-asymptotics.

Nagao (1973) shows that, as n goes to infinity while p remains fixed, the limiting distribution of V under the null is given by

$$\frac{np}{2}\, V \xrightarrow{D} Y_{p(p+1)/2} \tag{13}$$

or, equivalently,

$$nV - p \xrightarrow{D} \frac{2}{p}\, Y_{p(p+1)/2} - p, \tag{14}$$

where, as before, $Y_d$ denotes a random variable distributed as a $\chi^2$ with d degrees of freedom. It is not immediately apparent whether this approximation remains accurate under (n, p)-asymptotics. The (n, p)-limiting distribution of V under the null hypothesis $(\alpha - 1)^2 + \delta^2 = 0$ is derived in equation (38) in the Appendix as part of the proof of Proposition 5:

$$nV - p \xrightarrow{D} N(1, 4 + 8c). \tag{15}$$

Using Proposition 4 with $a = 0$ shows that the n-limiting distribution given by equation (14) is incorrect under (n, p)-asymptotics.

The conclusion of our analysis of the test of $\Sigma = I$ based on V is the following: the existing n-asymptotic theory (where p is fixed) breaks down when p goes to infinity with n, including the case p > n.

5. Test that a covariance matrix is the identity: new statistic. The ideal would be to find a simple modification of V that has the same n-asymptotic properties and better (n, p)-asymptotic properties (in the spirit of U). This is why we introduce the new statistic

$$W = \frac{1}{p}\operatorname{tr}\left[(S - I)^{2}\right] - \frac{p}{n}\left[\frac{1}{p}\operatorname{tr}(S)\right]^{2} + \frac{p}{n}. \tag{16}$$

As n goes to infinity with p fixed, $W \xrightarrow{P} \frac{1}{p}\operatorname{tr}[(\Sigma - I)^2]$, therefore the test of $\Sigma = I$ based on W is n-consistent. As for (n, p)-consistency, Proposition 1 implies that, under Assumptions 1–3,

$$W \xrightarrow{P} c\alpha^{2} + (\alpha - 1)^{2} + \delta^{2} - c\alpha^{2} + c = c + (\alpha - 1)^{2} + \delta^{2}. \tag{17}$$
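The contrast between the limits (11) and (17) can be illustrated with a small simulation. This is a sketch under stated assumptions: it relies on NumPy, draws data from the alternative $\Sigma = \frac{1-c}{1+c} I$ of Proposition 5 with arbitrarily chosen n = 300 and p = 150, and uses a hypothetical helper `V_and_W`.

```python
import numpy as np

def V_and_W(X):
    """Compute V from (11) and W from (16) for an (n+1) x p data matrix X."""
    n = X.shape[0] - 1
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    t1 = np.trace(S) / p              # (1/p) tr(S)
    t2 = np.sum(S * S) / p            # (1/p) tr(S^2), S being symmetric
    V = t2 - 2.0 * t1 + 1.0
    W = V - (p / n) * t1**2 + p / n
    return V, W

rng = np.random.default_rng(0)
n, p = 300, 150
c = p / n                             # 0.5
alpha = (1 - c) / (1 + c)             # 1/3: the alternative that V cannot detect
X = np.sqrt(alpha) * rng.standard_normal((n + 1, p))

V, W = V_and_W(X)
print(V, c)                           # V stays near its null limit c
print(W, c + (alpha - 1) ** 2)        # W moves away from c by (alpha - 1)^2
```

In this draw V is close to c even though the data do not come from the null, whereas W exceeds c by roughly $(\alpha - 1)^2 = 4/9$.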

Since c can be approximated by the known quantity p/n, the power of the test based on W to separate the null hypothesis $(\alpha - 1)^2 + \delta^2 = 0$ from the alternative $(\alpha - 1)^2 + \delta^2 > 0$ converges to one as n and p go to infinity together: the test based on W is (n, p)-consistent.

The following proposition shows that W has the same n-limiting distribution as V under the null.

PROPOSITION 6. As n goes to infinity with p fixed, the limiting distribution of W under the null hypothesis $(\alpha - 1)^2 + \delta^2 = 0$ is the same as for V:

$$\frac{np}{2}\, W \xrightarrow{D} Y_{p(p+1)/2} \tag{18}$$

or, equivalently,

$$nW - p \xrightarrow{D} \frac{2}{p}\, Y_{p(p+1)/2} - p, \tag{19}$$

where $Y_d$ denotes a random variable distributed as a $\chi^2$ with d degrees of freedom.

To find out whether this approximation remains accurate under (n, p)-asymptotics, we derive the (n, p)-limiting distribution of W under the null.

PROPOSITION 7. Under Assumptions 1–2, if $(\alpha - 1)^2 + \delta^2 = 0$ then

$$nW - p \xrightarrow{D} N(1, 4). \tag{20}$$

Using Proposition 4 with $a = 0$ shows that the n-limiting distribution given by equation (19) is still correct under (n, p)-asymptotics.

The conclusion of our analysis of the test of $\Sigma = I$ based on W is the following: the n-asymptotic theory developed for V is directly applicable to W, and it remains valid (for W but not V) if p goes to infinity with n, even in the case p > n.

6. Monte Carlo simulations. So far, little is known about the finite-sample behavior of these tests. In particular the question of whether they are unbiased in finite sample is not readily tractable. Yet some light can be shed on finite-sample behavior through Monte Carlo simulations.

Monte Carlo simulations are used to find the size and power of the test statistics U, V, and W for p, n = 4, 8, ..., 256. In each case we run 10,000 simulations. The alternative against which power is computed has to be "scalable" in the sense that it can be represented by population covariance matrices of any dimension p = 4, 8, ..., 256. The simplest alternative we can think of is to set half of the population eigenvalues equal to 1, and the other ones equal to 0.5.

Table 1 reports the size of the sphericity test based on U.
The test is carried out by computing the 95% cutoff point from the $\chi^2$-limiting distribution in equation (8). We see that the quality of this approximation does not get worse when p gets large: it can be relied upon even when p > n. This is what we expected given Proposition 4.

TABLE 1. Size of sphericity test based on U. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size converges to nominal size as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

Table 2 shows the power of the sphericity test based on U against the alternative described above. We see that the power does not become lower when p gets large: power stays high even when p > n. This confirms the (n, p)-consistency result derived from equation (6). The table indicates that the power seems to depend predominantly on n. For fixed sample size, the power of the test is often increasing in p, which is somewhat surprising. We do not have any simple explanation of this phenomenon but will address it in future research focusing on the analysis of power.

TABLE 2. Power of sphericity test based on U. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other ones are equal to 0.5. Power converges to one as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

TABLE 3. Size of equality test based on V. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size does not converge to nominal size as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

Using the same methodology as in Table 1, we report in Table 3 the size of the test for $\Sigma = I$ based on V. We see that the $\chi^2$-limiting distribution under the null in equation (14) is a poor approximation for large p. This is what we expected given the discussion surrounding equation (15).

Using the same methodology as in Table 2, we report in Table 4 the power of the test based on V against the alternative described above. Given the discussion surrounding equation (12), we anticipate that this test will not be powerful when $c = [(\alpha - 1)^2 + \delta^2]/(1 - \alpha^2) = 2/7$. (For the alternative used here, $\alpha = 0.75$ and $\delta^2 = 0.0625$, so that the ratio is $0.125/0.4375 = 2/7$.) Indeed we observe that, in the cells where p/n exceeds the critical value 2/7, this test does not have much power against the alternative.

TABLE 4. Power of equality test based on V. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other ones are equal to 0.5. Power does not converge to one as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

TABLE 5. Size of equality test based on W. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size converges to nominal size as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

Using the same methodology as in Table 1, we report in Table 5 the size of the test for $\Sigma = I$ based on W. We see that the $\chi^2$ approximation in equation (19) for the null distribution does not get worse when p gets large: it can be relied upon even when p > n. This is what we expected given the discussion surrounding equation (15).

Using the same methodology as in Table 2, we report in Table 6 the power of the test based on W against the alternative described above. We see that the power does not become lower when p gets large: power stays high even when p > n. This confirms the (n, p)-consistency result derived from equation (17). As with U, the table indicates that the power seems to depend predominantly on n, and to be increasing in p for fixed n.

TABLE 6. Power of equality test based on W. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other eigenvalues are equal to 0.5. Power converges to one as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.
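A reduced version of the size experiments for V and W might look like the sketch below. It is not the authors' simulation code. It assumes NumPy, uses far fewer replications than the 10,000 reported in the tables, and, as a deliberate simplification, replaces the $\chi^2$ cutoff with the 95% point of the N(1, 4) limit that Proposition 4 associates with it. Names such as `simulated_size` are hypothetical.

```python
import numpy as np

def trace_stats(X):
    """Return ((1/p) tr S, (1/p) tr S^2) via the (n+1) x (n+1) Gram matrix."""
    n = X.shape[0] - 1
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    G = Xc @ Xc.T / n                 # shares its nonzero eigenvalues with S
    return np.trace(G) / p, np.sum(G * G) / p

def simulated_size(n, p, reps=2000, seed=0):
    """Rejection frequencies of the V and W tests under the null Sigma = I."""
    rng = np.random.default_rng(seed)
    cutoff = 1.0 + 2.0 * 1.6449       # 95% point of N(1, 4); an approximation
    reject_V = 0
    reject_W = 0
    for _ in range(reps):
        t1, t2 = trace_stats(rng.standard_normal((n + 1, p)))
        V = t2 - 2.0 * t1 + 1.0
        W = V - (p / n) * t1**2 + p / n
        reject_V += int(n * V - p > cutoff)
        reject_W += int(n * W - p > cutoff)
    return reject_V / reps, reject_W / reps

size_V, size_W = simulated_size(n=32, p=32, reps=1000)
print(size_V, size_W)
```

Under these assumptions the rejection frequency for W should stay near the nominal 5%, while the one for V should be noticeably larger, in line with the variance 4 + 8c in equation (15).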

Overall, these Monte Carlo simulations confirm the finite-sample relevance of the asymptotic results obtained in Sections 3, 4 and 5.

7. Possible concerns. For the discussion that follows, recall the definition of the rth mean of a collection of p nonnegative reals, $\{s_1, \ldots, s_p\}$, given by

$$M(r) = \begin{cases} \left(\dfrac{1}{p}\displaystyle\sum_{i=1}^{p} s_i^{\,r}\right)^{1/r}, & \text{if } r \neq 0, \\[3mm] \left(\displaystyle\prod_{i=1}^{p} s_i\right)^{1/p}, & \text{if } r = 0. \end{cases}$$

A possible concern is the use of John's statistic U for testing sphericity, since it is based on the ratio of the first and second means [i.e., M(1) and M(2)] of the sample eigenvalues. The likelihood ratio (LR) test statistic, on the other hand, is based on the ratio of the geometric mean [i.e., M(0)] to the first mean of the sample eigenvalues; for example, see Muirhead (1982), Section 8.3. It has long been known that the LR test has the desirable property of being unbiased; see Gleser (1966) and Marshall and Olkin (1979). Also, for the related problem of testing homogeneity of variances, it has long been established that certain tests based on ratios of the type M(r)/M(t) with $r \geq 0$ and $t \leq 0$ are unbiased; see Cohen and Strawderman (1971). No unbiasedness properties are known for tests based on ratios of the type M(r)/M(t) with both r > 0 and t > 0.

Still, we advocate the use of John's statistic U over the LR statistic for testing sphericity when p is large compared to n. First, the LR test statistic is degenerate when p > n (though one might try to define an alternative statistic using the nonzero sample eigenvalues only in this case). Second, when p is less than or equal to n but close to n, some of the sample eigenvalues will be very close to zero, causing the LR statistic to be nearly degenerate; this should affect the finite-sample performance of the LR test. (Obviously, this also questions the strategy of constructing an LR-like statistic based on the nonzero sample eigenvalues only when p > n.) Our intuition is that tests whose statistic involves a mean M(r) with $r \leq 0$ will misbehave when p becomes close to n.
The reason is that they give too much importance to the sample eigenvalues close to zero, which contain information not on the true covariance matrix but on the ratio p/n; see Figure 1 for an illustration. To check this intuition, we run a Monte Carlo on the LR test for sphericity for the case $p \leq n$. Critical values are obtained from the $\chi^2$ approximation under the null; for example, see Muirhead (1982), Section 8.3. The simulation set-up is identical to that of Section 6. Table 7 reports the simulated size of the LR test, and severe size distortions for large values of p compared to n are obvious.

FIG. 1. Sample versus true eigenvalues. The solid line represents the distribution of the eigenvalues of the sample covariance matrix based on the asymptotic formula proven by Marčenko and Pastur (1967). Eigenvalues are sorted from largest to smallest, then plotted against their rank. In this case, the true covariance matrix is the identity, that is, the true eigenvalues are all equal to one. The distribution of the true eigenvalues is plotted as a dashed horizontal line at one. Distributions are obtained in the limit as the number of observations n and the number of variables p both go to infinity with the ratio p/n converging to a finite positive limit, the concentration c. The four plots correspond to different values of the concentration.

Next we compute the power of the LR test in a way that enables direct comparison with Table 2: we use the distribution of the LR test statistic simulated under the null to find the cutoff points corresponding to the realized sizes in Table 1 (most of them are equal to the nominal size of 0.05, but for small values of p and n they are lower). Using these cutoff points for the LR test statistic generates a test with exactly the same size as the test based on John's statistic U, so we can directly compare the power of the two tests. Table 8 is the equivalent of Table 2 except it uses the LR test statistic for $p \leq n$. We can see that the LR test is slightly more powerful than John's test (by one percent or less) when p is small compared to n, but is substantially less powerful when p gets close to n. Hence, both in terms of size and power, the test based on U is preferable to the LR test when p is large compared to n, and this is the scenario of interest of the paper.

TABLE 7. Size of sphericity test based on LR test statistic. The null hypothesis is rejected when the test statistic exceeds the 95% cutoff point obtained from the $\chi^2$ approximation. Actual size does not converge to nominal size as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

TABLE 8. Power of sphericity test based on LR test statistic. The null hypothesis is rejected when the test statistic exceeds the 95% size-adjusted cutoff point (to enable direct comparison with Table 2) obtained from the $\chi^2$ approximation. Data are generated under the alternative where half of the population eigenvalues are equal to 1, and the other ones are equal to 0.5. Power does not converge to one as dimensionality p goes to infinity with sample size n. Results come from 10,000 Monte Carlo simulations.

Another possible concern addresses the notion of consistency when p tends to infinity. For p fixed, the alternative is given by a fixed covariance matrix $\Sigma$ and consistency means that the power of the test tends to one as the sample size n tends to infinity. Of course, when p increases the matrix $\Sigma$ of the alternative can no longer be fixed. Our approach is to work within an asymptotic framework that places certain restrictions on how $\Sigma$ can evolve, namely we require that the quantities $\alpha$ and $\delta^2$ cannot change; see Assumption 2. Obviously, this excludes certain alternatives of interest such as having all eigenvalues equal to 1 except for the largest, which is equal to $p^{\beta}$, for some $0 < \beta < 0.5$. For this sequence of alternatives, the test based on John's statistic U is not consistent and a test based on another statistic would have to be devised (e.g., involving the maximum sample eigenvalue). Such other asymptotic frameworks are deferred to future research.

8. Conclusions. In this paper, we have studied the sphericity test and the identity test for covariance matrices when the dimensionality is large compared to the sample size, and in particular when it exceeds the sample size. Our analysis is restricted to an asymptotic framework that considers the first two moments of the eigenvalues of the true covariance matrix to be independent of the dimensionality. We found that the existing test for sphericity based on John's (1971) statistic U is robust against high dimensionality. On the other hand, the related test for identity based on Nagao's (1973) statistic V is inconsistent. We proposed a modification to the statistic V which makes it robust against high dimensionality. Monte Carlo simulations confirmed that our asymptotic results tend to hold well in finite samples.

Directions for future research include: applying the method to other test statistics; finding limiting distributions under the alternative to compute power; searching for most powerful tests (within specific asymptotic frameworks for the sequence of alternatives); relaxing the normality assumption.

APPENDIX

PROOF OF PROPOSITION 1. The proof of this proposition is contained inside the proof of the main theorem of Yin and Krishnaiah (1983). Their paper deals with the product of two random matrices but it can be applied to our set-up by taking one of them to be the identity matrix as a special case of a random matrix.
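Even though their main theorem is derived under assumptions on all the average moments of the eigenvalues of the population covariance matrix, careful inspection of their proof reveals that convergence in probability of the first two average moments requires only assumptions up to the fourth moment. The formulas for the limits come from Yin and Krishnaiah's (1983) second equation on the top of page 504.

PROOF OF PROPOSITION 2. Changing $\alpha$ simply amounts to rescaling $\frac{1}{p}\operatorname{tr}(S)$ by $\alpha$ and $\frac{1}{p}\operatorname{tr}(S^2)$ by $\alpha^2$, therefore we can assume without loss of generality that $\alpha = 1$. Jonsson's (1982) Theorem 4.1 shows that, under the assumptions of Proposition 2,

$$\begin{bmatrix} \dfrac{n}{n+p}\left\{\operatorname{tr}(S) - E[\operatorname{tr}(S)]\right\} \\[2mm] \dfrac{n^{2}}{(n+p)^{2}}\left\{\operatorname{tr}(S^{2}) - E[\operatorname{tr}(S^{2})]\right\} \end{bmatrix} \tag{21}$$

converges in distribution to a bivariate normal. Since $p/n \to c \in (0, +\infty)$, this implies that

$$n \times \begin{bmatrix} \dfrac{1}{p}\operatorname{tr}(S) - E\left[\dfrac{1}{p}\operatorname{tr}(S)\right] \\[2mm] \dfrac{1}{p}\operatorname{tr}(S^{2}) - E\left[\dfrac{1}{p}\operatorname{tr}(S^{2})\right] \end{bmatrix} \tag{22}$$

also converges in distribution to a bivariate normal. $\frac{1}{p}\operatorname{tr}(S)$ is the average of the diagonal elements of the unbiased sample covariance matrix, therefore its expectation is equal to one. John (1972), Lemma 2, shows that the expectation of $\frac{1}{p}\operatorname{tr}(S^2)$ is equal to $\frac{n+p+1}{n}$. So far we have established that

$$n \times \begin{bmatrix} \dfrac{1}{p}\operatorname{tr}(S) - 1 \\[2mm] \dfrac{1}{p}\operatorname{tr}(S^{2}) - \dfrac{n+p+1}{n} \end{bmatrix} \tag{23}$$

converges in distribution to a bivariate normal. Since this limiting bivariate normal has mean zero, the only task left is to compute its covariance matrix. This can be done by taking the limit of the covariance matrix of the expression in equation (23). Using once again the moments computed by John (1972), Lemma 2, we find that

$$\operatorname{Var}\left[\frac{1}{p}\operatorname{tr}(S)\right] = E\left[\left(\frac{1}{p}\operatorname{tr}(S)\right)^{2}\right] - \left(E\left[\frac{1}{p}\operatorname{tr}(S)\right]\right)^{2} = \frac{np + 2}{np} - 1 = \frac{2}{np},$$

$$\begin{aligned} \operatorname{Var}\left[\frac{1}{p}\operatorname{tr}(S^{2})\right] &= E\left[\left(\frac{1}{p}\operatorname{tr}(S^{2})\right)^{2}\right] - \left(E\left[\frac{1}{p}\operatorname{tr}(S^{2})\right]\right)^{2} \\ &= \frac{pn^{3} + (2p^{2} + 2p + 8)n^{2} + (p^{3} + 2p^{2} + 21p + 20)n + 8p^{2} + 20p + 20}{pn^{3}} - \left(\frac{n+p+1}{n}\right)^{2} \\ &= \frac{8}{np} + \frac{20p^{2} + 20p}{n^{2}p^{2}} + \frac{8p^{3} + 20p^{2} + 20p}{n^{3}p^{2}}. \end{aligned}$$

Finally we have to find the covariance term. Let $s_{ij}$ denote the entry (i, j) of the unbiased sample covariance matrix S. We have

$$\begin{aligned} E[\operatorname{tr}(S)\operatorname{tr}(S^{2})] &= \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{l=1}^{p} E[s_{ii}s_{jl}^{2}] \\ &= p(p-1)(p-2)E[s_{11}s_{23}^{2}] + p(p-1)E[s_{11}s_{22}^{2}] + 2p(p-1)E[s_{11}s_{12}^{2}] + pE[s_{11}^{3}] \\ &= \frac{p(p-1)(p-2)}{n} + \frac{p(p-1)(n+2)}{n} + \frac{2p(p-1)(n+2)}{n^{2}} + \frac{p(n+2)(n+4)}{n^{2}} \\ &= p^{2} + \frac{p^{3} + p^{2} + 4p}{n} + \frac{4p^{2} + 4p}{n^{2}}. \end{aligned} \tag{24}$$

The moment formulas that appear in equation (24) are computed in the same fashion as in the proof of Lemma 2 by John (1972). This enables us to compute the limiting covariance term as

$$\begin{aligned} \operatorname{Cov}\left[\frac{1}{p}\operatorname{tr}(S), \frac{1}{p}\operatorname{tr}(S^{2})\right] &= \frac{1}{p^{2}}E[\operatorname{tr}(S)\operatorname{tr}(S^{2})] - E\left[\frac{1}{p}\operatorname{tr}(S)\right]E\left[\frac{1}{p}\operatorname{tr}(S^{2})\right] \\ &= 1 + \frac{p^{2} + p + 4}{np} + \frac{4p + 4}{n^{2}p} - \frac{n+p+1}{n} \\ &= \frac{4}{np}\left(1 + \frac{p+1}{n}\right). \end{aligned} \tag{25}$$

Multiplying these three moments by $n^2$ and letting $p/n \to c$ yields the limits $2/c$, $4(2/c + 5 + 2c)$ and $4(1 + 1/c)$ that appear in equation (5) with $\alpha = 1$. This completes the proof of Proposition 2.

PROOF OF PROPOSITION 3. Define the function $f(x, y) = \frac{y}{x^{2}} - 1$. Then $U = f\left(\frac{1}{p}\operatorname{tr}(S), \frac{1}{p}\operatorname{tr}(S^{2})\right)$. Proposition 2 implies that, by the delta method,

$$n\left[U - f\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right)\right] \xrightarrow{D} N(0, \lim A),$$

where

$$A = \begin{bmatrix} \dfrac{\partial f}{\partial x}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \\[2mm] \dfrac{\partial f}{\partial y}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \end{bmatrix}' \begin{bmatrix} \dfrac{2\alpha^{2}}{c} & 4\left(1 + \dfrac{1}{c}\right)\alpha^{3} \\[2mm] 4\left(1 + \dfrac{1}{c}\right)\alpha^{3} & 4\left(\dfrac{2}{c} + 5 + 2c\right)\alpha^{4} \end{bmatrix} \begin{bmatrix} \dfrac{\partial f}{\partial x}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \\[2mm] \dfrac{\partial f}{\partial y}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \end{bmatrix}$$

and $'$ denotes the transpose. Notice that

$$f\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right) = \frac{p+1}{n}, \tag{26}$$

$$\frac{\partial f}{\partial x}\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right) = -2\,\frac{n+p+1}{n\alpha}, \tag{27}$$

$$\frac{\partial f}{\partial y}\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right) = \frac{1}{\alpha^{2}}. \tag{28}$$

Placing the last two expressions into the formula for A yields

$$A = 4\,\frac{(n+p+1)^{2}}{n^{2}}\cdot\frac{2}{c} - 4\,\frac{n+p+1}{n}\cdot 4\left(1 + \frac{1}{c}\right) + 4\left(\frac{2}{c} + 5 + 2c\right) \tag{29}$$

$$\longrightarrow 8\,\frac{(1+c)^{2}}{c} - 16\,\frac{(1+c)^{2}}{c} + 4\left(\frac{2}{c} + 5 + 2c\right) = 4. \tag{30}$$

This completes the proof of Proposition 3.

PROOF OF PROPOSITION 4. Let $z_1, z_2, \ldots$ denote a sequence of i.i.d. standard normal random variables. Then $Y_{p_k(p_k+1)/2+a}$ has the same distribution as $z_1^2 + \cdots + z_{p_k(p_k+1)/2+a}^2$. Since $E[z_1^2] = 1$ and $\operatorname{Var}[z_1^2] = 2$, the Lindeberg–Lévy central limit theorem implies that

$$\sqrt{p_k(p_k+1)/2 + a}\left[\frac{Y_{p_k(p_k+1)/2+a}}{p_k(p_k+1)/2 + a} - 1\right] \xrightarrow{D} N(0, 2). \tag{31}$$

Multiplying the left-hand side by $\sqrt{p_k(p_k+1) + 2a}\,/\,p_k$, which converges to one, does not affect the limit, therefore

$$\frac{\sqrt{2}}{p_k}\, Y_{p_k(p_k+1)/2+a} - \frac{p_k + 1}{\sqrt{2}} - \frac{a\sqrt{2}}{p_k} \xrightarrow{D} N(0, 2). \tag{32}$$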

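Removing from the left-hand side the term $a\sqrt{2}/p_k$, which converges to zero, does not affect the limit, therefore

$$\frac{\sqrt{2}}{p_k}\, Y_{p_k(p_k+1)/2+a} - \frac{p_k + 1}{\sqrt{2}} \xrightarrow{D} N(0, 2). \tag{33}$$

Rescaling equation (33) yields (10).

PROOF OF PROPOSITION 5. Define the function $g(x, y) = y - 2x + 1$. Then $V = g\left(\frac{1}{p}\operatorname{tr}(S), \frac{1}{p}\operatorname{tr}(S^{2})\right)$. Proposition 2 implies that, by the delta method,

$$n\left[V - g\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right)\right] \xrightarrow{D} N(0, \lim B),$$

where

$$B = \begin{bmatrix} \dfrac{\partial g}{\partial x}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \\[2mm] \dfrac{\partial g}{\partial y}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \end{bmatrix}' \begin{bmatrix} \dfrac{2\alpha^{2}}{c} & 4\left(1 + \dfrac{1}{c}\right)\alpha^{3} \\[2mm] 4\left(1 + \dfrac{1}{c}\right)\alpha^{3} & 4\left(\dfrac{2}{c} + 5 + 2c\right)\alpha^{4} \end{bmatrix} \begin{bmatrix} \dfrac{\partial g}{\partial x}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \\[2mm] \dfrac{\partial g}{\partial y}\left(\alpha, \dfrac{n+p+1}{n}\alpha^{2}\right) \end{bmatrix}.$$

Notice that

$$g\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right) = (\alpha - 1)^{2} + \frac{p+1}{n}\alpha^{2}, \tag{34}$$

$$\frac{\partial g}{\partial x}\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right) = -2, \tag{35}$$

$$\frac{\partial g}{\partial y}\left(\alpha, \frac{n+p+1}{n}\alpha^{2}\right) = 1. \tag{36}$$

Placing the last two expressions into the formula for B yields

$$B = \frac{8}{c}\alpha^{2} - 16\left(1 + \frac{1}{c}\right)\alpha^{3} + 4\left(\frac{2}{c} + 5 + 2c\right)\alpha^{4}. \tag{37}$$

First let us find the (n, p)-limiting distribution of V under the null. Setting $\alpha$ equal to one yields $g\left(1, \frac{n+p+1}{n}\right) = \frac{p+1}{n}$ and $B = 4 + 8c$. Hence, under the null,

$$n\left(V - \frac{p+1}{n}\right) \xrightarrow{D} N(0, 4 + 8c). \tag{38}$$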
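The delta-method variances used in the proofs of Propositions 3 and 5 can be verified with exact arithmetic. The following sketch uses only the Python standard library; the function names are hypothetical, and the gradient of f is evaluated in the limit $(n+p+1)/n \to 1 + c$.

```python
from fractions import Fraction

def limiting_cov(c, alpha=Fraction(1)):
    """Entries (s11, s12, s22) of the limiting covariance matrix in (5)."""
    s11 = 2 * alpha**2 / c
    s12 = 4 * (1 + 1 / c) * alpha**3
    s22 = 4 * (2 / c + 5 + 2 * c) * alpha**4
    return s11, s12, s22

def quad_form(grad, cov):
    """grad' * Cov * grad for a 2-vector grad and a symmetric 2 x 2 matrix."""
    gx, gy = grad
    s11, s12, s22 = cov
    return gx * gx * s11 + 2 * gx * gy * s12 + gy * gy * s22

def var_U(c, alpha=Fraction(1)):
    """Limit of A: gradient of f(x, y) = y/x^2 - 1 at (alpha, (1 + c) alpha^2)."""
    grad = (-2 * (1 + c) / alpha, 1 / alpha**2)
    return quad_form(grad, limiting_cov(c, alpha))

def var_V(c, alpha=Fraction(1)):
    """B: gradient of g(x, y) = y - 2x + 1 is (-2, 1) everywhere."""
    return quad_form((Fraction(-2), Fraction(1)), limiting_cov(c, alpha))

for c in (Fraction(1, 4), Fraction(1), Fraction(5, 2)):
    print(c, var_U(c), var_V(c), 4 + 8 * c)
```

For every concentration tried, the variance for U equals 4 regardless of $\alpha$, while the variance for V at $\alpha = 1$ equals 4 + 8c, matching equations (9) and (15).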

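Now let us find the (n, p)-limiting distribution of V under the alternative. Setting $\alpha$ equal to $\frac{1-c}{1+c}$ yields

$$\begin{aligned} g\left(\frac{1-c}{1+c}, \frac{n+p+1}{n}\left(\frac{1-c}{1+c}\right)^{2}\right) &= \left(\frac{2c}{1+c}\right)^{2} + \frac{p+1}{n}\left(\frac{1-c}{1+c}\right)^{2} \\ &= \frac{p+1}{n} - \frac{4c}{(1+c)^{2}}\left(\frac{p+1}{n} - c\right) \\ &= \frac{p+1}{n} - \frac{4c(d+1)}{n(1+c)^{2}} + o\left(\frac{1}{n}\right) \end{aligned}$$

and

$$B = \frac{8}{c}\left(\frac{1-c}{1+c}\right)^{2} - 16\left(1 + \frac{1}{c}\right)\left(\frac{1-c}{1+c}\right)^{3} + 4\left(\frac{2}{c} + 5 + 2c\right)\left(\frac{1-c}{1+c}\right)^{4} = \frac{4(1-c)^{2}(1 + 5c^{2} + 2c^{3})}{(1+c)^{4}}.$$

Hence, under the alternative,

$$n\left(V - \frac{p+1}{n}\right) \xrightarrow{D} N\left(-\frac{4c(d+1)}{(1+c)^{2}},\; \frac{4(1-c)^{2}(1 + 5c^{2} + 2c^{3})}{(1+c)^{4}}\right). \tag{39}$$

Therefore the power of a test of significance level $\theta > 0$ to reject the null $\Sigma = I$ when the alternative $\Sigma = \frac{1-c}{1+c}\, I$ is true converges to

$$1 - \Phi\left(\frac{\sqrt{4 + 8c}\;\Phi^{-1}(1 - \theta) + 4c(d+1)/(1+c)^{2}}{\sqrt{4(1-c)^{2}(1 + 5c^{2} + 2c^{3})/(1+c)^{4}}}\right) < 1, \tag{40}$$

where $\Phi$ denotes the standard normal c.d.f.

PROOF OF PROPOSITION 6. Assuming p fixed, it is easily seen that

$$n(W - V) = p\left[1 - \left(\frac{1}{p}\operatorname{tr}(S)\right)^{2}\right] \tag{41}$$

$$\xrightarrow{P} p(1 - \alpha^{2}) \quad (= 0 \text{ under the null}). \tag{42}$$

Hence, under the null, n(W − V) converges to zero in probability, as n goes to infinity for p fixed. The proof is completed by applying Slutzky's theorem.

PROOF OF PROPOSITION 7. Define $h(x, y) = y - 2x + 1 - \frac{p}{n}x^{2} + \frac{p}{n}$. Then $W = h\left(\frac{1}{p}\operatorname{tr}(S), \frac{1}{p}\operatorname{tr}(S^{2})\right)$. Proposition 2 implies that, by the delta method,

$$n\left[W - h\left(1, \frac{n+p+1}{n}\right)\right] \xrightarrow{D} N(0, \lim C),$$

where

$$C = \begin{bmatrix} \dfrac{\partial h}{\partial x}\left(1, \dfrac{n+p+1}{n}\right) \\[2mm] \dfrac{\partial h}{\partial y}\left(1, \dfrac{n+p+1}{n}\right) \end{bmatrix}' \begin{bmatrix} \dfrac{2}{c} & 4\left(1 + \dfrac{1}{c}\right) \\[2mm] 4\left(1 + \dfrac{1}{c}\right) & 4\left(\dfrac{2}{c} + 5 + 2c\right) \end{bmatrix} \begin{bmatrix} \dfrac{\partial h}{\partial x}\left(1, \dfrac{n+p+1}{n}\right) \\[2mm] \dfrac{\partial h}{\partial y}\left(1, \dfrac{n+p+1}{n}\right) \end{bmatrix}.$$

Notice that

$$h\left(1, \frac{n+p+1}{n}\right) = \frac{p+1}{n}, \tag{43}$$

$$\frac{\partial h}{\partial x}\left(1, \frac{n+p+1}{n}\right) = -2\,\frac{n+p}{n}, \tag{44}$$

$$\frac{\partial h}{\partial y}\left(1, \frac{n+p+1}{n}\right) = 1. \tag{45}$$

Placing the last two expressions into the formula for C yields

$$C = 4\,\frac{(n+p)^{2}}{n^{2}}\cdot\frac{2}{c} - 4\,\frac{n+p}{n}\cdot 4\left(1 + \frac{1}{c}\right) + 4\left(\frac{2}{c} + 5 + 2c\right) \tag{46}$$

$$\longrightarrow 8\,\frac{(1+c)^{2}}{c} - 16\,\frac{(1+c)^{2}}{c} + 4\left(\frac{2}{c} + 5 + 2c\right) = 4. \tag{47}$$

This completes the proof of Proposition 7.

Acknowledgments. We wish to thank Theodore W. Anderson for encouragement. We are also grateful to an Associate Editor and a referee for constructive criticisms that have led to an improved presentation of the paper.

REFERENCES

ALALOUF, I. S. (1978). An explicit treatment of the general linear model with singular covariance matrix. Sankhyā Ser. B.

ANDERSON, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New York.

ARHAROV, L. V. (1971). Limit theorems for the characteristic roots of a sample covariance matrix. Soviet Math. Dokl.

BAI, Z. D. (1993). Convergence rate of expected spectral distributions of large random matrices. II. Sample covariance matrices. Ann. Probab.

BAI, Z. D., KRISHNAIAH, P. R. and ZHAO, L. C. (1989). On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Trans. Inform. Theory.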

A Capacity Supply Model for Virtualized Servers

A Capacity Supply Model for Virtualized Servers 96 Iformatia Eoomiă vol. 3, o. 3/009 A apaity upply Model for Virtualized ervers Alexader PINNOW, tefa OTERBURG Otto-vo-Guerike-Uiversity, Magdeburg, Germay {alexader.piow stefa.osterburg}@iti.s.ui-magdeburg.de

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n We will cosider the liear regressio model i matrix form. For simple liear regressio, meaig oe predictor, the model is i = + x i + ε i for i =,,,, This model icludes the assumptio that the ε i s are a sample

More information

A Result on Diffuse Random Measure

A Result on Diffuse Random Measure It. J. Cotemp. Math. Si., Vol. 2, 2007, o. 14, 679-683 Result o Diffuse Radom Measure. Varsei ad. Samimi Departmet of Mathematis Faulty of Siees, The Uiversity of Guila P.O. Box 1914 P.C. 41938, Rasht,

More information

Chapter 14 Nonparametric Statistics

Chapter 14 Nonparametric Statistics Chapter 14 Noparametric Statistics A.K.A. distributio-free statistics! Does ot deped o the populatio fittig ay particular type of distributio (e.g, ormal). Sice these methods make fewer assumptios, they

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

2.11. Semiconductor thermodynamics

2.11. Semiconductor thermodynamics 2.11. Semiodutor thermodyamis Thermodyamis a be used to explai some harateristis of semiodutors ad semiodutor devies, whih a ot readily be explaied based o the trasport of sigle partiles. Oe example is

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. Powers of a matrix We begi with a propositio which illustrates the usefuless of the diagoalizatio. Recall that a square matrix A is diogaalizable if

More information

5: Introduction to Estimation

5: Introduction to Estimation 5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis

Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis Chapter 8 Key Ideas Hypothesis (Null ad Alterative), Hypothesis Test, Test Statistic, P-value Type I Error, Type II Error, Sigificace Level, Power Sectio 8-1: Overview Cofidece Itervals (Chapter 7) are

More information

7. Sample Covariance and Correlation

7. Sample Covariance and Correlation 1 of 8 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 7. Sample Covariace ad Correlatio The Bivariate Model Suppose agai that we have a basic radom experimet, ad that X ad Y

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

SOLID MECHANICS DYNAMICS TUTORIAL DAMPED VIBRATIONS. On completion of this tutorial you should be able to do the following.

SOLID MECHANICS DYNAMICS TUTORIAL DAMPED VIBRATIONS. On completion of this tutorial you should be able to do the following. SOLID MECHANICS DYNAMICS TUTORIAL DAMPED VIBRATIONS This work overs elemets of the syllabus for the Egieerig Couil Eam D5 Dyamis of Mehaial Systems, C05 Mehaial ad Strutural Egieerig ad the Edeel HNC/D

More information

Statistica Siica 6(1996), 311-39 EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM Zhidog Bai ad Hewa Saraadasa Natioal Su Yat-se Uiversity Abstract: With the rapid developmet of moder computig

More information

Laws of Exponents. net effect is to multiply with 2 a total of 3 + 5 = 8 times

Laws of Exponents. net effect is to multiply with 2 a total of 3 + 5 = 8 times The Mathematis 11 Competey Test Laws of Expoets (i) multipliatio of two powers: multiply by five times 3 x = ( x x ) x ( x x x x ) = 8 multiply by three times et effet is to multiply with a total of 3

More information

Determining the sample size

Determining the sample size Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

Maximum Likelihood Estimators.

Maximum Likelihood Estimators. Lecture 2 Maximum Likelihood Estimators. Matlab example. As a motivatio, let us look at oe Matlab example. Let us geerate a radom sample of size 00 from beta distributio Beta(5, 2). We will lear the defiitio

More information

Sequences II. Chapter 3. 3.1 Convergent Sequences

Sequences II. Chapter 3. 3.1 Convergent Sequences Chapter 3 Sequeces II 3. Coverget Sequeces Plot a graph of the sequece a ) = 2, 3 2, 4 3, 5 + 4,...,,... To what limit do you thik this sequece teds? What ca you say about the sequece a )? For ǫ = 0.,

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: x t, which was s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value

More information

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas: Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries

More information

3. Covariance and Correlation

3. Covariance and Correlation Virtual Laboratories > 3. Expected Value > 1 2 3 4 5 6 3. Covariace ad Correlatio Recall that by takig the expected value of various trasformatios of a radom variable, we ca measure may iterestig characteristics

More information

x : X bar Mean (i.e. Average) of a sample

x : X bar Mean (i.e. Average) of a sample A quick referece for symbols ad formulas covered i COGS14: MEAN OF SAMPLE: x = x i x : X bar Mea (i.e. Average) of a sample x i : X sub i This stads for each idividual value you have i your sample. For

More information

Convexity, Inequalities, and Norms

Convexity, Inequalities, and Norms Covexity, Iequalities, ad Norms Covex Fuctios You are probably familiar with the otio of cocavity of fuctios. Give a twicedifferetiable fuctio ϕ: R R, We say that ϕ is covex (or cocave up) if ϕ (x) 0 for

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Lecture 4: Cheeger s Inequality

Lecture 4: Cheeger s Inequality Spectral Graph Theory ad Applicatios WS 0/0 Lecture 4: Cheeger s Iequality Lecturer: Thomas Sauerwald & He Su Statemet of Cheeger s Iequality I this lecture we assume for simplicity that G is a d-regular

More information

Overview of some probability distributions.

Overview of some probability distributions. Lecture Overview of some probability distributios. I this lecture we will review several commo distributios that will be used ofte throughtout the class. Each distributio is usually described by its probability

More information

Normal Distribution.

Normal Distribution. Normal Distributio www.icrf.l Normal distributio I probability theory, the ormal or Gaussia distributio, is a cotiuous probability distributio that is ofte used as a first approimatio to describe realvalued

More information

9.8: THE POWER OF A TEST

9.8: THE POWER OF A TEST 9.8: The Power of a Test CD9-1 9.8: THE POWER OF A TEST I the iitial discussio of statistical hypothesis testig, the two types of risks that are take whe decisios are made about populatio parameters based

More information

The second difference is the sequence of differences of the first difference sequence, 2

The second difference is the sequence of differences of the first difference sequence, 2 Differece Equatios I differetial equatios, you look for a fuctio that satisfies ad equatio ivolvig derivatives. I differece equatios, istead of a fuctio of a cotiuous variable (such as time), we look for

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

THE TWO-VARIABLE LINEAR REGRESSION MODEL

THE TWO-VARIABLE LINEAR REGRESSION MODEL THE TWO-VARIABLE LINEAR REGRESSION MODEL Herma J. Bieres Pesylvaia State Uiversity April 30, 202. Itroductio Suppose you are a ecoomics or busiess maor i a college close to the beach i the souther part

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Istructor: Nicolas Christou Three importat distributios: Distributios related to the ormal distributio Chi-square (χ ) distributio.

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ whe the populatio stadard deviatio is kow ad populatio distributio is ormal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses about

More information

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the. Cofidece Itervals A cofidece iterval is a iterval whose purpose is to estimate a parameter (a umber that could, i theory, be calculated from the populatio, if measuremets were available for the whole populatio).

More information

ABSTRACT INTRODUCTION MATERIALS AND METHODS

ABSTRACT INTRODUCTION MATERIALS AND METHODS INTENATIONAL JOUNAL OF AGICULTUE & BIOLOGY 156 853/6/8 1 5 9 http://www.fspublishers.org Multiplate Peetratio Tests to Predit Soil Pressure-siage Behaviour uder etagular egio M. ASHIDI 1, A. KEYHANI AND

More information

Unit 20 Hypotheses Testing

Unit 20 Hypotheses Testing Uit 2 Hypotheses Testig Objectives: To uderstad how to formulate a ull hypothesis ad a alterative hypothesis about a populatio proportio, ad how to choose a sigificace level To uderstad how to collect

More information

An example of non-quenched convergence in the conditional central limit theorem for partial sums of a linear process

An example of non-quenched convergence in the conditional central limit theorem for partial sums of a linear process A example of o-queched covergece i the coditioal cetral limit theorem for partial sums of a liear process Dalibor Volý ad Michael Woodroofe Abstract A causal liear processes X,X 0,X is costructed for which

More information

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights Ceter, Spread, ad Shape i Iferece: Claims, Caveats, ad Isights Dr. Nacy Pfeig (Uiversity of Pittsburgh) AMATYC November 2008 Prelimiary Activities 1. I would like to produce a iterval estimate for the

More information

Linear Algebra II. 4 Determinants. Notes 4 1st November Definition of determinant

Linear Algebra II. 4 Determinants. Notes 4 1st November Definition of determinant MTH6140 Liear Algebra II Notes 4 1st November 2010 4 Determiats The determiat is a fuctio defied o square matrices; its value is a scalar. It has some very importat properties: perhaps most importat is

More information

STATISTICAL METHODS FOR BUSINESS

STATISTICAL METHODS FOR BUSINESS STATISTICAL METHODS FOR BUSINESS UNIT 7: INFERENTIAL TOOLS. DISTRIBUTIONS ASSOCIATED WITH SAMPLING 7.1.- Distributios associated with the samplig process. 7.2.- Iferetial processes ad relevat distributios.

More information

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem Lecture 4: Cauchy sequeces, Bolzao-Weierstrass, ad the Squeeze theorem The purpose of this lecture is more modest tha the previous oes. It is to state certai coditios uder which we are guarateed that limits

More information

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval Chapter 8 Tests of Statistical Hypotheses 8. Tests about Proportios HT - Iferece o Proportio Parameter: Populatio Proportio p (or π) (Percetage of people has o health isurace) x Statistic: Sample Proportio

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Divide and Conquer. Maximum/minimum. Integer Multiplication. CS125 Lecture 4 Fall 2015

Divide and Conquer. Maximum/minimum. Integer Multiplication. CS125 Lecture 4 Fall 2015 CS125 Lecture 4 Fall 2015 Divide ad Coquer We have see oe geeral paradigm for fidig algorithms: the greedy approach. We ow cosider aother geeral paradigm, kow as divide ad coquer. We have already see a

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean 1 Social Studies 201 October 13, 2004 Note: The examples i these otes may be differet tha used i class. However, the examples are similar ad the methods used are idetical to what was preseted i class.

More information

MEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book)

MEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book) MEI Mathematics i Educatio ad Idustry MEI Structured Mathematics Module Summary Sheets Statistics (Versio B: referece to ew book) Topic : The Poisso Distributio Topic : The Normal Distributio Topic 3:

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics

More information

3. Greatest Common Divisor - Least Common Multiple

3. Greatest Common Divisor - Least Common Multiple 3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006 Exam format UC Bereley Departmet of Electrical Egieerig ad Computer Sciece EE 6: Probablity ad Radom Processes Solutios 9 Sprig 006 The secod midterm will be held o Wedesday May 7; CHECK the fial exam

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK

THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E MCCARTHY, SANDRA POTT, AND BRETT D WICK Abstract We provide a ew proof of Volberg s Theorem characterizig thi iterpolatig sequeces as those for

More information

Quantum Mechanics for Scientists and Engineers. David Miller

Quantum Mechanics for Scientists and Engineers. David Miller Quatum Mechaics for Scietists ad Egieers David Miller Measuremet ad expectatio values Measuremet ad expectatio values Quatum-mechaical measuremet Probabilities ad expasio coefficiets Suppose we take some

More information

1.3 Binomial Coefficients

1.3 Binomial Coefficients 18 CHAPTER 1. COUNTING 1. Biomial Coefficiets I this sectio, we will explore various properties of biomial coefficiets. Pascal s Triagle Table 1 cotais the values of the biomial coefficiets ( ) for 0to

More information

Math C067 Sampling Distributions

Math C067 Sampling Distributions Math C067 Samplig Distributios Sample Mea ad Sample Proportio Richard Beigel Some time betwee April 16, 2007 ad April 16, 2007 Examples of Samplig A pollster may try to estimate the proportio of voters

More information

1 Hypothesis testing for a single mean

1 Hypothesis testing for a single mean BST 140.65 Hypothesis Testig Review otes 1 Hypothesis testig for a sigle mea 1. The ull, or status quo, hypothesis is labeled H 0, the alterative H a or H 1 or H.... A type I error occurs whe we falsely

More information

Factors of sums of powers of binomial coefficients

Factors of sums of powers of binomial coefficients ACTA ARITHMETICA LXXXVI.1 (1998) Factors of sums of powers of biomial coefficiets by Neil J. Cali (Clemso, S.C.) Dedicated to the memory of Paul Erdős 1. Itroductio. It is well ow that if ( ) a f,a = the

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1) BASIC STATISTICS. SAMPLES, RANDOM SAMPLING AND SAMPLE STATISTICS.. Radom Sample. The radom variables X,X 2,..., X are called a radom sample of size from the populatio f(x if X,X 2,..., X are mutually idepedet

More information

Chapter 5: Inner Product Spaces

Chapter 5: Inner Product Spaces Chapter 5: Ier Product Spaces Chapter 5: Ier Product Spaces SECION A Itroductio to Ier Product Spaces By the ed of this sectio you will be able to uderstad what is meat by a ier product space give examples

More information

Lecture 10: Hypothesis testing and confidence intervals

Lecture 10: Hypothesis testing and confidence intervals Eco 514: Probability ad Statistics Lecture 10: Hypothesis testig ad cofidece itervals Types of reasoig Deductive reasoig: Start with statemets that are assumed to be true ad use rules of logic to esure

More information

Infinite Sequences and Series

Infinite Sequences and Series CHAPTER 4 Ifiite Sequeces ad Series 4.1. Sequeces A sequece is a ifiite ordered list of umbers, for example the sequece of odd positive itegers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29...

More information

INFINITE SERIES KEITH CONRAD

INFINITE SERIES KEITH CONRAD INFINITE SERIES KEITH CONRAD. Itroductio The two basic cocepts of calculus, differetiatio ad itegratio, are defied i terms of limits (Newto quotiets ad Riema sums). I additio to these is a third fudametal

More information

CS103X: Discrete Structures Homework 4 Solutions

CS103X: Discrete Structures Homework 4 Solutions CS103X: Discrete Structures Homewor 4 Solutios Due February 22, 2008 Exercise 1 10 poits. Silico Valley questios: a How may possible six-figure salaries i whole dollar amouts are there that cotai at least

More information

Mann-Whitney U 2 Sample Test (a.k.a. Wilcoxon Rank Sum Test)

Mann-Whitney U 2 Sample Test (a.k.a. Wilcoxon Rank Sum Test) No-Parametric ivariate Statistics: Wilcoxo-Ma-Whitey 2 Sample Test 1 Ma-Whitey 2 Sample Test (a.k.a. Wilcoxo Rak Sum Test) The (Wilcoxo-) Ma-Whitey (WMW) test is the o-parametric equivalet of a pooled

More information

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio

More information

BINOMIAL EXPANSIONS 12.5. In this section. Some Examples. Obtaining the Coefficients

BINOMIAL EXPANSIONS 12.5. In this section. Some Examples. Obtaining the Coefficients 652 (12-26) Chapter 12 Sequeces ad Series 12.5 BINOMIAL EXPANSIONS I this sectio Some Examples Otaiig the Coefficiets The Biomial Theorem I Chapter 5 you leared how to square a iomial. I this sectio you

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

Analyzing Patterns of User Content Generation in Online Social Networks

Analyzing Patterns of User Content Generation in Online Social Networks Aalyzig Patters of User Cotet Geeratio i Olie Soial Networks Lei Guo, Ehua Ta, Sogqig Che, Xiaodog Zhag, ad Yihog (Eri) Zhao Yahoo! I. 7 First Aveue Suyvale, CA 989, USA {lguo,yzhao}@yahoo-i.om Dept. of

More information
