

Statistica Sinica 6(1996), 311-329

EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM

Zhidong Bai and Hewa Saranadasa

National Sun Yat-sen University

Abstract: With the rapid development of modern computing techniques, statisticians are dealing with data with much higher dimension. Consequently, due to their loss of accuracy or power, some classical statistical inferences are being challenged by non-exact approaches. The purpose of this paper is to point out and briefly analyze such a phenomenon and to encourage statisticians to reexamine classical statistical approaches when they are dealing with high dimensional data. As an example, we derive the asymptotic power of the classical Hotelling's $T^2$ test and Dempster's non-exact test for a two-sample problem. Also, an asymptotically normally distributed test statistic is proposed. Our results show that both Dempster's non-exact test and the new test have higher power than Hotelling's test when the data dimension is proportionally close to the within sample degrees of freedom. Although our new test has an asymptotic power function similar to Dempster's, it does not rely on the normality assumption. Some simulation results are presented which show that the non-exact tests are more powerful than Hotelling's test even for moderately large dimension and sample sizes.

Key words and phrases: Edgeworth expansion, Hotelling $T^2$ test, hypothesis test, power function, significance test, $\chi^2$ approximation.

1. Introduction

Modern computation techniques make it possible to deal with high dimensional data. Some recent examples of interest in dealing with high dimensional data can be found in Narayanaswamy and Raghavarao (1991) and Saranadasa (1991, 1993). Examples may also be found in applied statistical inference handling samples of many measurements on individuals. For example, in a clinical trial of pharmaceutical studies, many blood chemistry measurements are measured on each individual. In some studies the number of variables is comparable to or even exceeds the total sample size.
The purpose of this article is to raise the following questions: What is new in high dimensional statistical inference, and what should be done? The difference between high dimensional statistical inference and classical statistical inference will be referred to as the "Effect of High Dimension" (EHD).
There are two aspects of the EHD. First, there are too many interesting or nuisance parameters in the model. For example, in M-estimation in linear models, the number of regression parameters may be proportional to the sample size. This problem remains unsolved. The best results are due to Huber (1973), in which the consistency of estimation is proved under the assumption that $p^2/n \to 0$ and the asymptotic normality under $p^3/n \to 0$, where $n$ and $p$ are the sample size and the dimension of the regression coefficient vector. Although these requirements on the ratio of the dimension to the sample size were reduced, very strong assumptions were made on the design sequence. References are made to Portnoy (1984, 1985). Another example is the Errors-in-Variables model, in which the true regressor variables can be considered as nuisance parameters whose number is $np$ (while the number of observations is $n(p+1)$). In these cases, either the estimation is very poor or it is impossible to get an unbiased or consistent estimator.

The second case is that the dimension itself of the data is very high. Of course, the number of parameters to be estimated must then be very large. An example is the detection of the signal number in omnidirectional signal processing. When the number of sensors is increased, the detection accuracy is supposed to be better. However, the simulation results show the opposite when the traditional method (the MUSIC method) is used if the number of sensors is 10 or more. We believe that the reason is that the number of elements of the covariance matrix (parameters to be estimated) becomes very large ($2p^2$, and 200 if $p = 10$). Some references in this direction are Bai, Krishnaiah and Zhao (1989) and Zhao, Krishnaiah and Bai (1986a,b).

Although the EHD has been noticed in many different directions of multivariate statistical inference, the problem has not yet been clearly stated in the literature and no appropriate methods have been proposed to deal with the EHD.
To this end, we shall analyze these problems through the two sample problem, as an example to show how and why the EHD affects inferences and how the EHD can be reduced. A classical method to deal with this problem is the famous Hotelling's $T^2$ test. Its advantages include: it is invariant under linear transformation, its exact distribution is known under the null hypothesis, and it is powerful when the dimension of data is sufficiently small compared with the sample sizes. However, Hotelling's test has the serious defect that the $T^2$ statistic is undefined when the dimension of data is greater than the within sample degrees of freedom. Seeking remedies, Chung and Fraser (1958) proposed a nonparametric test and Dempster (1958, 1960) discussed the so-called "non-exact" significance test. Dempster (1960) also considered the so-called randomization test. These works seek alternatives to Hotelling's test in situations when the latter does not apply. The non-exact test is not only a remedy when $T^2$ is undefined: we show that even when $T^2$ is well defined, the non-exact test is more powerful than the $T^2$ test when the dimension is proportionally "close to" the sample degrees of freedom (more discussion on the ratio will be given in Section 5).

Both the $T^2$ test and Dempster's non-exact test strongly rely on the normality assumption. Moreover, Dempster's non-exact test statistic involves a complicated estimation of $r$, the "degrees of freedom" for the chi-square approximation. To simplify the testing procedure, a new method is proposed in Section 4. It is proven in Sections 3 and 4 that the asymptotic power of the new test is equivalent to that of Dempster's test. Simulation results further show that our new approach is slightly more powerful than Dempster's. We believe that the estimation of $r$ and its rounding to an integer in Dempster's procedure may cause an error of order $O(1/n)$. This might indicate that the new approach is superior to Dempster's test in the second order term in some Edgeworth-type expansions. We shall not discuss this in detail in this paper but hope to address it in future work. Some simulation results and discussions are presented in Section 5 and some technical proofs are given in the Appendix.

2. Asymptotic Power of Hotelling's Test

In this section, we derive the asymptotic power function of the $T^2$ test for the two sample problem. The model described here is the same as the one in Dempster's test given in the next section. Suppose that $x_{ij} \sim N_p(\mu_i, \Sigma)$, $j = 1, \dots, N_i$, $i = 1, 2$, are two independent samples. To test the hypothesis $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$, traditionally one uses Hotelling's famous $T^2$ test, which is defined by

$$T^2 = \eta\,(\bar x_1 - \bar x_2)' A^{-1} (\bar x_1 - \bar x_2), \tag{2.1}$$

where $\bar x_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}$, $i = 1, 2$, $A = \sum_{i=1}^{2}\sum_{j=1}^{N_i} (x_{ij} - \bar x_i)(x_{ij} - \bar x_i)'$ and $\eta = \frac{n N_1 N_2}{N_1 + N_2}$ with $n = N_1 + N_2 - 2$. The purpose of this section is to investigate the power function of Hotelling's test when $p/n \to y \in (0, 1)$, which guarantees the existence of the $T^2$ statistic, and to compare it with the other non-exact tests given in later sections.
To derive the asymptotic power of Hotelling's test, we first derive an asymptotic expression for the threshold of the test. It is well known that under the null hypothesis, $\frac{n-p+1}{np} T^2$ has an $F$ distribution with degrees of freedom $p$ and $n - p + 1$. Let the significance level be chosen as $\alpha$ and the threshold be denoted by $F_\alpha(p, n-p+1)$. We have the following lemma.

Lemma 2.1.

$$\frac{p}{n-p+1} F_\alpha(p, n-p+1) = \frac{y_n}{1-y_n} + \sqrt{\frac{2y}{(1-y)^3 n}}\;\xi_\alpha + o\!\left(\frac{1}{\sqrt n}\right),$$

where $y_n = p/n$, $\lim_{n\to\infty} y_n = y \in (0, 1)$ and $\xi_\alpha$ is the $1-\alpha$ quantile of the standard normal distribution.
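As a numerical illustration of Lemma 2.1, the following minimal sketch evaluates the two leading terms of the threshold for $T^2/n$. It rests on two assumptions that are not part of the lemma: the limit $y$ is replaced by the finite-sample ratio $y_n = p/n$, and the $o(1/\sqrt n)$ remainder is dropped. The function name `hotelling_threshold_approx` is hypothetical.

```python
from statistics import NormalDist


def hotelling_threshold_approx(p: int, n: int, alpha: float) -> float:
    """Leading terms of the Lemma 2.1 threshold for T^2 / n.

    Assumption (not in the lemma): y is replaced by y_n = p / n and the
    o(1 / sqrt(n)) remainder is ignored.
    """
    if not 0 < p < n:
        raise ValueError("need 0 < p < n so that y_n lies in (0, 1)")
    y = p / n
    # 1 - alpha quantile of the standard normal distribution (xi_alpha).
    xi_alpha = NormalDist().inv_cdf(1.0 - alpha)
    centre = y / (1.0 - y)
    spread = (2.0 * y / ((1.0 - y) ** 3 * n)) ** 0.5
    return centre + spread * xi_alpha


# Dimensions matching N1 = 25, N2 = 20 (so n = 43) and p = 40.
threshold = hotelling_threshold_approx(p=40, n=43, alpha=0.05)
```

At $\alpha = 0.5$ the quantile $\xi_\alpha$ is zero, so the approximation reduces to the centring term $y_n/(1-y_n)$.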
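Proof. Under the null hypothesis, by the Central Limit Theorem,

$$\sqrt{\frac{(1-y)^3 n}{2y}}\left(\frac{T^2}{n} - \frac{y_n}{1-y_n}\right) \to N(0, 1) \quad \text{as } n \to \infty,$$

from which the result follows immediately.

Now, we consider the behavior of $T^2/n$ under $H_1$. In this case, its distribution is the same as

$$(w + \tau^{-1/2}\delta)'\, U^{-1} (w + \tau^{-1/2}\delta), \tag{2.2}$$

where $\delta = \Sigma^{-1/2}(\mu_1 - \mu_2)$, $U = \sum_{i=1}^{n} u_i u_i'$, $w = (w_1, \dots, w_p)'$ and $u_i$, $i = 1, \dots, n$, are i.i.d. $N(0, I_p)$ random vectors, and $\tau = (N_1 + N_2)/(N_1 N_2)$. Denote the spectral decomposition of $U^{-1}$ by $O'\,\mathrm{diag}[d_1, \dots, d_p]\,O$ with eigenvalues $d_1 \ge \dots \ge d_p > 0$. Then (2.2) becomes

$$(Ow + \tau^{-1/2}\|\delta\| v)'\,\mathrm{diag}[d_1, \dots, d_p]\,(Ow + \tau^{-1/2}\|\delta\| v), \tag{2.3}$$

where $v = O\delta/\|\delta\|$. Since $U$ has the Wishart distribution $W(n, I_p)$, the orthogonal matrix $O$ has the Haar distribution on the group of all orthogonal $p \times p$ matrices, and hence the vector $v$ is uniformly distributed on the unit $p$-sphere. Note that the conditional distribution of $Ow$ given $O$ is $N(0, I_p)$, the same as that of $w$, which is independent of $O$. This shows that $Ow$ is independent of $v$. Therefore, replacing $Ow$ in (2.3) by $w$ does not change the joint distribution of $Ow$, $v$ and the $d_i$'s. Consequently, $T^2/n$ has the same distribution as

$$\sum_{i=1}^{p} \left(w_i^2 + 2 w_i v_i \tau^{-1/2}\|\delta\| + \tau^{-1}\|\delta\|^2 v_i^2\right) d_i, \tag{2.4}$$

where $v = (v_1, \dots, v_p)'$ is uniformly distributed on the unit sphere of $R^p$ and is independent of $w$ and the $d_i$'s.

Lemma 2.2. Using the above notation, we have

$$\sqrt n\left(\sum_{i=1}^{p} d_i - \frac{y_n}{1-y_n}\right) \to 0 \quad\text{and}\quad n\sum_{i=1}^{p} d_i^2 \to \frac{y}{(1-y)^3} \quad \text{in probability.}$$

Proof. Recalling (2.4) with $\delta = 0$ under the null hypothesis and applying the Central Limit Theorem with $D = \{d_1, \dots, d_p\}$ given, we have

$$P\left(\sqrt{\frac{(1-y)^3 n}{2y}}\left(\frac{T^2}{n} - \frac{y_n}{1-y_n}\right) \le x\right) = E\left[P\left(\frac{\sum_{i=1}^{p}(w_i^2 - 1)d_i}{\sqrt{2\sum_{i=1}^{p} d_i^2}} \le \frac{\sqrt n\left(\frac{y_n}{1-y_n} - \sum_{i=1}^{p} d_i\right) + \sqrt{\frac{2y}{(1-y)^3}}\,x}{\sqrt{2n\sum_{i=1}^{p} d_i^2}} \,\Bigg|\, D\right)\right]$$

$$= E\,\Phi\left(\frac{\sqrt n\left(\frac{y_n}{1-y_n} - \sum_{i=1}^{p} d_i\right) + \sqrt{\frac{2y}{(1-y)^3}}\,x}{\sqrt{2n\sum_{i=1}^{p} d_i^2}}\right) + o(1), \tag{2.5}$$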
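where $\Phi$ is the distribution function of a standard normal variable. On the other hand, as shown in the proof of Lemma 2.1, the Central Limit Theorem implies that the above quantity tends to $\Phi(x)$ for all $x$. Hence, by the type-convergence theorem (see Page 16 of Loève (1977)), the lemma is proved.

Now we are in a position to derive an approximation of the power function of Hotelling's test.

Theorem 2.1. If $y_n = p/n \to y \in (0, 1)$, $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$ and $\|\delta\| = o(1)$, then

$$\beta_H(\delta) - \Phi\left(-\xi_\alpha + \sqrt{\frac{n(1-y)}{2y}}\;\kappa(1-\kappa)\|\delta\|^2\right) \to 0, \tag{2.6}$$

where $\beta_H(\delta)$ is the power function of Hotelling's test.

Remark 2.1. The usual consideration of the alternative hypothesis in limiting theorems is to assume that $\sqrt n\,\|\delta\|^2 \to a > 0$. Under this additional assumption, it follows from (2.6) that the limiting power of Hotelling's test is given by $\Phi\left(-\xi_\alpha + ((1-y)/2y)^{1/2}\kappa(1-\kappa)a\right)$. This formula shows that the limiting power of Hotelling's test is slowly increasing for $y$ close to 1, as the noncentral parameter (namely $a$) increases.

Proof. Write $D = (d_1, \dots, d_p)$. Using the facts $E v_1^2 = 1/p$, $E v_1^4 = 3/[p(p+2)]$ and $E v_1^2 v_2^2 = 1/[p(p+2)]$, and then applying Lemma 2.2, one easily obtains

$$n E\left[\left(\sum_{i=1}^{p} w_i v_i d_i \tau^{-1/2}\|\delta\|\right)^2 \Bigg|\, D\right] = \frac{n\tau^{-1}\|\delta\|^2}{p}\sum_{i=1}^{p} d_i^2 \to 0 \quad \text{in Pr.,} \tag{2.7}$$

$$n E\left[\left(\sum_{i=1}^{p} (v_i^2 - E v_i^2)\tau^{-1}\|\delta\|^2 d_i\right)^2 \Bigg|\, D\right] = n\tau^{-2}\|\delta\|^4\left[\frac{2}{p(p+2)}\sum_{i=1}^{p} d_i^2 - \frac{2}{p^2(p+2)}\left(\sum_{i=1}^{p} d_i\right)^2\right] \to 0 \quad \text{in Pr.} \tag{2.8}$$

and

$$\sum_{i=1}^{p} (E v_i^2)\tau^{-1}\|\delta\|^2 d_i = \frac{1}{p}\tau^{-1}\|\delta\|^2\sum_{i=1}^{p} d_i = \frac{y_n\tau^{-1}\|\delta\|^2}{p(1-y_n)}\left(1 + o_p\!\left(\frac{1}{\sqrt n}\right)\right). \tag{2.9}$$

Thus, by the above and Lemma 2.1, we have

$$\beta_H(\delta) = P\left(\sum_{i=1}^{p} w_i^2 d_i \ge \frac{y_n}{1-y_n} + \sqrt{\frac{2y}{(1-y)^3 n}}\;\xi_\alpha - \frac{y_n\tau^{-1}\|\delta\|^2}{p(1-y_n)} + o\!\left(\frac{1}{\sqrt n}\right)\right)$$

$$= E\left[P\left(\frac{\sum_{i=1}^{p}(w_i^2 - 1)d_i}{\sqrt{2\sum_{i=1}^{p} d_i^2}} \ge \frac{\sqrt{\frac{2y}{(1-y)^3 n}}\;\xi_\alpha - \frac{y_n\tau^{-1}\|\delta\|^2}{p(1-y_n)} + o\!\left(\frac{1}{\sqrt n}\right)}{\sqrt{2\sum_{i=1}^{p} d_i^2}} \,\Bigg|\, D\right)\right]$$

$$= \Phi\left(-\xi_\alpha + \sqrt{\frac{n(1-y)}{2y}}\;\kappa(1-\kappa)\|\delta\|^2\right) + o(1). \tag{2.10}$$

The proof of Theorem 2.1 is now complete.

3. Discussion on Dempster's Non-Exact Test

Dempster (1958, 1960) proposed a non-exact test for the hypothesis described in Section 2, with the dimension of data possibly greater than the sample degrees of freedom. First, let us briefly describe his test. Denote $N = N_1 + N_2$, $X' = (x_{11}, x_{12}, \dots, x_{1N_1}, x_{21}, \dots, x_{2N_2})$ and by

$$H' = \left(\frac{1}{\sqrt N} J_N,\; \left(\sqrt{\frac{N_2}{N_1(N_1+N_2)}}\, J'_{N_1},\; -\sqrt{\frac{N_1}{N_2(N_1+N_2)}}\, J'_{N_2}\right)',\; h_3, \dots, h_N\right)$$

a suitably chosen orthogonal matrix, where $J_d$ is a $d$-dimensional column vector of 1's. Let $Y = HX = (y_1, \dots, y_N)'$. Then, the vectors $y_1, \dots, y_N$ are independent normal random vectors with $E(y_1) = (N_1\mu_1 + N_2\mu_2)/\sqrt N$, $E(y_2) = \tau^{-1/2}(\mu_1 - \mu_2)$, $E(y_j) = 0$ for $3 \le j \le N$, and $\mathrm{Cov}(y_j) = \Sigma$, $1 \le j \le N$. Then, Dempster proposed his non-exact significance test statistic $F = Q_2 / \left(\sum_{i=3}^{N} Q_i / n\right)$, where $Q_i = y_i' y_i$ and $n = N - 2$. He used the so-called $\chi^2$-approximation technique, assuming $Q_i$ is approximately distributed as $m\chi^2_r$, where the parameters $m$ and $r$ may be solved by the method of moments. Then, the distribution of $F$ is approximately $F_{r,\,nr}$. But generally the parameter $r$ (its explicit form is given in (3.3) below) is unknown. He estimated $r$ by either of the following two ways.

Approach 1: $\hat r_1$ is the solution of the equation

$$t = \left[\frac{1}{\hat r_1} + \frac{1 + \frac{1}{n}}{3\hat r_1^2}\right](n - 1). \tag{3.1}$$

Approach 2: $\hat r_2$ is the solution of the equation

$$t + w = \left[\frac{1}{\hat r_2} + \frac{1 + \frac{1}{n}}{3\hat r_2^2}\right](n - 1) + \left[\frac{1}{\hat r_2} + \frac{3}{2\hat r_2^2}\right]\binom{n}{2}, \tag{3.2}$$

where $t = n\left[\ln\left(\frac{1}{n}\sum_{i=3}^{N} Q_i\right)\right] - \sum_{i=3}^{N} \ln Q_i$, $w = -\sum_{3 \le i < j \le N} \ln \sin^2\theta_{ij}$ and $\theta_{ij}$ is the angle between the vectors $y_i$ and $y_j$, $3 \le i < j \le N$. Dempster's test is then to reject $H_0$ if $F > F_\alpha(\hat r, n\hat r)$.

By elementary calculus, we have

$$r = \frac{(\mathrm{tr}(\Sigma))^2}{\mathrm{tr}(\Sigma^2)} \quad\text{and}\quad m = \frac{\mathrm{tr}(\Sigma^2)}{\mathrm{tr}\,\Sigma}. \tag{3.3}$$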
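The moment-matching parameters in (3.3) are easy to evaluate for a known covariance matrix. The sketch below is illustrative only: the function names are hypothetical, matrices are plain lists of lists, and the equicorrelated matrix $(1-\rho)I_p + \rho J_p$ is used as an example input.

```python
def dempster_r_and_m(sigma):
    """Return (r, m) as in (3.3): r = (tr S)^2 / tr(S^2), m = tr(S^2) / tr S.

    `sigma` is a symmetric matrix given as a list of rows.
    """
    p = len(sigma)
    trace = sum(sigma[i][i] for i in range(p))
    # For a symmetric matrix, tr(S^2) equals the sum of squared entries.
    trace_sq = sum(sigma[i][j] ** 2 for i in range(p) for j in range(p))
    return trace ** 2 / trace_sq, trace_sq / trace


def equicorrelated(p, rho):
    """Build (1 - rho) * I_p + rho * J_p: ones on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


r_identity, m_identity = dempster_r_and_m(equicorrelated(40, 0.0))
r_equi, m_equi = dempster_r_and_m(equicorrelated(40, 0.5))
```

For the identity matrix the sketch gives $r = p$ and $m = 1$, while strong correlation between coordinates makes $r$ much smaller than $p$.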
From (3.3) and the Cauchy-Schwarz inequality, it follows that $r \le p$. On the other hand, under regular conditions, both $\mathrm{tr}(\Sigma)$ and $\mathrm{tr}(\Sigma^2)$ are of the order $O(n)$, and hence $r$ is of the same order. Under the wider conditions (3.7) and (3.8) given in Theorem 3.1 below, it can be proved that $r \to \infty$. Further, we may prove that $t$ and $w$ are asymptotically normal, concentrating around $n/r$ and $\binom{n}{2}/r$ respectively. From these estimates, one may conclude that both $\hat r_1$ and $\hat r_2$ are ratio-consistent (in the sense that $\hat r / r \to 1$). Therefore, the solutions of equations (3.1) and (3.2) should satisfy

$$\hat r_1 = \frac{n}{t} + O(1) \tag{3.4}$$

and

$$\hat r_2 = \frac{1}{w}\binom{n}{2} + O(1) \tag{3.5}$$

respectively. Since the random effect may cause an error of order $O(1)$, one may simply choose the estimates of $r$ as $\frac{n}{t}$ or $\frac{1}{w}\binom{n}{2}$.

In the remainder of this section, we derive an asymptotic power function of Dempster's non-exact test, under the conditions: $p/n \to y > 0$, $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$ and the parameter $r$ is known. The reader should note that the limiting ratio $y$ is allowed to be greater than one in this case, which is different from what is assumed in Section 2. When $r$ is unknown, substituting $r$ by the estimators $\hat r_1$ or $\hat r_2$ may cause an error of high order smallness in the approximation of the power function of Dempster's non-exact test, as will be seen in the proof of Theorem 3.1. Similar to Lemma 2.1, one may show the following lemma.

Lemma 3.1. When $n, r \to \infty$,

$$F_\alpha(r, nr) = 1 + \sqrt{2/r}\;\xi_\alpha + o(1/\sqrt r). \tag{3.6}$$

Then we have the following approximation of the power function of Dempster's test.

Theorem 3.1. If

$$\mu'\Sigma\mu = o(\tau\,\mathrm{tr}\,\Sigma^2), \tag{3.7}$$

$$\lambda_{\max} = o\left(\sqrt{\mathrm{tr}\,\Sigma^2}\right) \tag{3.8}$$

and $r$ is known, then

$$\beta_D(\mu) - \Phi\left(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\right) \to 0, \tag{3.9}$$

where $\mu = \mu_1 - \mu_2$.
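The limiting power expressions (2.6) and (3.9) can be compared numerically. The following is a minimal sketch under stated assumptions: the limits $y$ and $\kappa$ are replaced by their finite-sample values, the $o(1)$ terms are dropped, and the example takes $\Sigma = I_p$, so that $\delta = \mu$ and $\mathrm{tr}\,\Sigma^2 = p$. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_hotelling(n, p, kappa, delta_sq, alpha=0.05):
    """Approximate power from (2.6); delta_sq stands for ||delta||^2.

    Assumption: y is replaced by p / n, which must lie in (0, 1).
    """
    y = p / n
    xi = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (n * (1.0 - y) / (2.0 * y)) ** 0.5 * kappa * (1.0 - kappa) * delta_sq
    return STD_NORMAL.cdf(-xi + shift)


def power_dempster(n, kappa, mu_sq, trace_sigma_sq, alpha=0.05):
    """Approximate power from (3.9); mu_sq stands for ||mu||^2."""
    xi = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = n * kappa * (1.0 - kappa) * mu_sq / (2.0 * trace_sigma_sq) ** 0.5
    return STD_NORMAL.cdf(-xi + shift)


# Example: N1 = 25, N2 = 20, p = 40, Sigma = I_p, ||mu||^2 = 1.
n, p, kappa = 43, 40, 25 / 45
approx_hotelling = power_hotelling(n, p, kappa, delta_sq=1.0)
approx_dempster = power_dempster(n, kappa, mu_sq=1.0, trace_sigma_sq=float(p))
```

With $\Sigma = I_p$ the two shift terms differ by the factor $\sqrt{1-y}$, so the Hotelling approximation falls below the Dempster one whenever the alternative is non-null, and both reduce to $\alpha$ when $\mu = 0$.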
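Remark 3.1. In usual cases when considering the asymptotic power of Dempster's test, the quantity $\|\mu\|^2$ is ordinarily assumed to have the same order as $1/\sqrt n$ and $\mathrm{tr}(\Sigma^2)$ to have order $n$. Thus, the quantities $n\|\mu\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$ and $\sqrt n\,\|\delta\|^2$ are both bounded away from zero and infinity. The expression of the asymptotic power of the Hotelling test involves a factor $\sqrt{1-y}$, which disappears in the expression of the asymptotic power of Dempster's test. This reveals the reason why the power of the Hotelling test increases much more slowly than that of the Dempster test as the noncentral parameter increases, if $y$ is close to one.

Proof. Let $\delta = (\delta_1, \dots, \delta_p)' = \Sigma^{-1/2}\mu$, expressed in the eigenbasis of $\Sigma$. Then,

$$\beta_D(\mu) = P\left(\frac{\sum_{i=1}^{p}\left(y_i^2 + 2\tau^{-1/2}\delta_i y_i + \tau^{-1}\delta_i^2\right)\lambda_i}{\sum_{i=1}^{p}\sum_{j=1}^{n} z_{ij}^2\lambda_i} > n^{-1}F_\alpha(r, nr)\right), \tag{3.10}$$

where $y_i$, $z_{ij}$, $i = 1, \dots, p$, $j = 1, \dots, n$, are i.i.d. $N(0, 1)$ variables and $\lambda_1, \dots, \lambda_p$ are the eigenvalues of $\Sigma$. By the Central Limit Theorem, the laws of large numbers, (3.7) and (3.8), one may easily show that

$$\frac{\sum_{i=1}^{p}\left(y_i^2 - 1 + 2\tau^{-1/2}\delta_i y_i\right)\lambda_i}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} = \frac{\sum_{i=1}^{p}\left(y_i^2 - 1 + 2\tau^{-1/2}\delta_i y_i\right)\lambda_i}{\sqrt{2\,\mathrm{tr}\,\Sigma^2 + 4\tau^{-1}\mu'\Sigma\mu}}\,(1 + o(1)) \xrightarrow{D} N(0, 1) \tag{3.11}$$

and

$$\sum_{i=1}^{p}\sum_{j=1}^{n} z_{ij}^2\lambda_i = n(\mathrm{tr}\,\Sigma)\left[1 + \sqrt{2/(nr)}\;N(0, 1) + o_p\!\left(\sqrt{1/(nr)}\right)\right]. \tag{3.12}$$

Noting that $\sum_{i=1}^{p}\lambda_i\delta_i^2 = \|\mu\|^2$ and $r = \frac{(\mathrm{tr}\,\Sigma)^2}{\mathrm{tr}\,\Sigma^2}$, the result (3.9) follows from (3.7) and Lemma 3.1 immediately. The proof of Theorem 3.1 is now complete.

4. A New Approach to Test $H_0$

In this section, we propose a new test for $H_0$. Instead of the normality of the underlying distributions, we assume:

(a) $x_{ij} = \Gamma z_{ij} + \mu_j$, $i = 1, \dots, N_j$, $j = 1, 2$, where $\Gamma$ is a $p \times m$ matrix ($m \le \infty$) with $\Gamma\Gamma' = \Sigma$ and $z_{ij}$ are i.i.d. random $m$-vectors with independent components satisfying $E z_{ij} = 0$, $\mathrm{Var}(z_{ij}) = I_m$, $E z_{ijk}^4 = 3 + \Delta < \infty$ and $E\prod_{k=1}^{m} z_{ijk}^{\nu_k} = 0$ (and $1$) when there is at least one $\nu_k = 1$ (there are two $\nu_k$'s equal to 2, correspondingly), whenever $\nu_1 + \dots + \nu_m = 4$;

(b) $p/n \to y > 0$ and $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$;

(c) (3.7) and (3.8) are true.

Here and later, it should be noted that all random variables and parameters depend on $n$. For simplicity we omit the subscript $n$ from all random variables except those statistics defined later.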
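Now, we begin to construct our test. Consider the statistic

$$M_n = (\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \tau\,\mathrm{tr}\,S_n, \tag{4.1}$$

where $S_n = \frac{1}{n}A$, and $\bar x_1$, $\bar x_2$ and $A$ are defined in Section 2. Under $H_0$, we have $E M_n = 0$. If the conditions (a) - (c) are true, it may be proved (see the Appendix) that under $H_0$,

$$Z_n = \frac{M_n}{\sqrt{\mathrm{Var}\,M_n}} \to N(0, 1) \quad \text{as } n \to \infty. \tag{4.2}$$

If the underlying distributions are normal as described in Section 2, then under $H_0$ we have

$$\sigma_M^2 := \mathrm{Var}\,M_n = 2\tau^2\left(1 + \frac{1}{n}\right)\mathrm{tr}\,\Sigma^2. \tag{4.3}$$

If the underlying distributions are not normal but satisfy the conditions (a) - (c), one may show (see the Appendix) that

$$\mathrm{Var}\,M_n = \sigma_M^2(1 + o(1)). \tag{4.4}$$

Hence (4.2) is still true if the denominator of $Z_n$ is replaced by $\sigma_M$. Therefore, to complete the construction of our test statistic, we need only find a ratio-consistent estimator of $\mathrm{tr}(\Sigma^2)$ and substitute it into the denominator of $Z_n$. It seems that a natural estimator of $\mathrm{tr}\,\Sigma^2$ should be $\mathrm{tr}\,S_n^2$. However, unlike the case where $p$ is fixed, $\mathrm{tr}\,S_n^2$ is generally neither unbiased nor ratio-consistent, even under the normal assumption. If $nS_n \sim W_p(n, \Sigma)$, it is routine to verify that

$$B_n^2 = \frac{n^2}{(n+2)(n-1)}\left(\mathrm{tr}\,S_n^2 - \frac{1}{n}(\mathrm{tr}\,S_n)^2\right)$$

is an unbiased and ratio-consistent estimator of $\mathrm{tr}\,\Sigma^2$. Here, it should be noted that $\mathrm{tr}\,S_n^2 - \frac{1}{n}(\mathrm{tr}\,S_n)^2 \ge 0$, by the Cauchy-Schwarz inequality. In the Appendix, we shall prove that $B_n^2$ is still a ratio-consistent estimator of $\mathrm{tr}\,\Sigma^2$ under the Conditions (a) - (c). Replacing $\mathrm{tr}\,\Sigma^2$ in (4.3) by the ratio-consistent estimator $B_n^2$, we obtain our test statistic

$$Z = \frac{(\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \tau\,\mathrm{tr}\,S_n}{\tau\sqrt{\frac{2(n+1)n}{(n+2)(n-1)}\left(\mathrm{tr}\,S_n^2 - n^{-1}(\mathrm{tr}\,S_n)^2\right)}} = \frac{\frac{N_1 N_2}{N_1 + N_2}(\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \mathrm{tr}\,S_n}{\sqrt{\frac{2(n+1)}{n}}\,B_n} \to N(0, 1). \tag{4.5}$$

Due to (4.5), the test rejects $H_0$ if $Z > \xi_\alpha$. Regarding the asymptotic power of our new test, we have the following theorem.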
Theorem 4.1. Under the Conditions (a) - (c),

$$\beta_{BS}(\mu) - \Phi\left(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\right) \to 0. \tag{4.6}$$

Proof. Let $\bar z_j$ be the sample mean of $z_{ij}$, $i = 1, \dots, N_j$, $j = 1, 2$, and let $M_n^0 = (\bar z_1 - \bar z_2)'\Gamma'\Gamma(\bar z_1 - \bar z_2) - \tau\,\mathrm{tr}(S_n)$. Then, $M_n^0$ has the same distribution as $M_n$ under $H_0$. Thus, $\mathrm{Var}(M_n^0) = \sigma_M^2(1 + o(1))$ and $M_n^0/\sqrt{\mathrm{Var}(M_n^0)} \to N(0, 1)$. Note that $M_n = M_n^0 + 2\mu'\Gamma(\bar z_1 - \bar z_2) + \|\mu\|^2$ and, by (3.7),

$$\mathrm{Var}(\mu'\Gamma(\bar z_1 - \bar z_2)) = \tau\mu'\Sigma\mu = o(\tau^2\,\mathrm{tr}(\Sigma^2)).$$

Hence, $\mathrm{Var}(M_n^0)/\mathrm{Var}(M_n) \to 1$ and consequently

$$\frac{M_n - \|\mu\|^2}{\sqrt{\mathrm{Var}(M_n^0)}} \to N(0, 1).$$

Note that $\frac{2(n+1)}{n}\tau^2 B_n^2 / \mathrm{Var}(M_n^0) \to 1$. Hence,

$$Z - \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}(\Sigma^2)}} \to N(0, 1).$$

This implies that

$$\beta_{BS}(\mu) = P_{H_1}(Z > \xi_\alpha) = P\left(\frac{M_n - \|\mu\|^2}{\sqrt{\mathrm{Var}\,M_n^0}} > \xi_\alpha - \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} + o(1)\right) = \Phi\left(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\right) + o(1), \tag{4.7}$$

which completes the proof of the theorem.

5. Discussions and Simulations

Comparing Theorems 2.1, 3.1 and 4.1, we find that from the point of view of large sample theory, Hotelling's test is less powerful than the other two tests when $y$ is close to one, and that the latter two tests have the same asymptotic power function. Our simulation results show that even for moderate sample and dimension sizes, Hotelling's test is still less powerful than the other two tests when the underlying covariance structure is reasonably regular (i.e., the structure of $\Sigma$ does not cause too large a difference between $\mu'\Sigma^{-1}\mu$ and $\sqrt p\,\|\mu\|^2/\sqrt{\mathrm{tr}(\Sigma^2)}$), whereas the Type I error does not change much in the latter two tests.

It would not be hard to see that, using the approach of this paper, one may easily derive similar results for the one-sample problem, namely, Hotelling's test is less powerful than a non-exact test, which can be defined as in Section 4, when the dimension of data is high.

Now, we would like to explain why this phenomenon happens. The reason for the lower power of Hotelling's test is the "inaccuracy" of the estimator of the covariance matrix. Let $X_1, \dots, X_n$ be i.i.d. random $p$-vectors of mean 0 and variance-covariance matrix $I_p$. By the law of large numbers, the sample covariance matrix $S_n = n^{-1}\sum_{i=1}^{n} X_i X_i'$ should be "close" to the identity $I_p$ with an error of the order $O_p(1/\sqrt n)$ when $p$ is fixed. However, when $p$ is proportional to $n$ (say $p/n \to y \in (0, 1)$), the ratio of the largest and the smallest eigenvalues of $S_n$ tends to $(1 + \sqrt y)^2/(1 - \sqrt y)^2$ (see, e.g., Bai, Silverstein and Yin (1988), Bai and Yin (1993), Geman (1980), Silverstein (1985) and Yin, Bai and Krishnaiah (1988)). More precisely, in the theory of spectral analysis of large dimensional random matrices, it has been proven that the empirical distribution of the eigenvalues of $S_n$ tends to a limiting distribution spreading over $[(1 - \sqrt y)^2, (1 + \sqrt y)^2]$ as $n \to \infty$ (see, e.g., Jonsson (1982), Wachter (1978), Yin (1986) and Yin, Bai and Krishnaiah (1983)). These show that $S_n$ is not close to $I_p$. Especially when $y$ is "close to" one, $S_n$ has many small eigenvalues and hence $S_n^{-1}$ has many huge eigenvalues. This will cause the deficiency of the $T^2$ test. We believe that in many other multivariate statistical inferences with an inverse of a sample covariance matrix involved, the same phenomenon should exist (as another example, see Saranadasa (1991, 1993)).

Here we would like to explain our quotation-marked "close to" one. Note that the limiting ratio of the largest to the smallest eigenvalues of $S_n$ tends to $(1 + \sqrt y)^2/(1 - \sqrt y)^2$. For our simulation example, $y = 0.93$ and the ratio of the extreme eigenvalues is about [value missing in this transcription]. That is very serious. Even for $y$ as small as $0.1$ or $0.01$, the ratio can be as large as $3.705$ and $1.494$.
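The limiting ratio of extreme eigenvalues quoted above is a one-line formula, so it can be checked directly. This sketch simply evaluates $(1 + \sqrt y)^2/(1 - \sqrt y)^2$; the function name is hypothetical.

```python
def extreme_eigenvalue_ratio(y: float) -> float:
    """Limiting ratio (1 + sqrt(y))^2 / (1 - sqrt(y))^2 for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("the limiting ratio is only defined here for 0 < y < 1")
    root = y ** 0.5
    return ((1.0 + root) / (1.0 - root)) ** 2


# Ratios for the values of y discussed in the text.
ratios = {y: extreme_eigenvalue_ratio(y) for y in (0.01, 0.1, 0.93)}
```

The values for $y = 0.1$ and $y = 0.01$ agree with the figures $3.705$ and $1.494$ given in the text, and the ratio grows without bound as $y$ approaches one.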
These show that it is not necessary to require the dimension of data to be very close to the degrees of freedom to make the effect of high dimension visible. In fact, this has been shown by our simulation for $p = 4$.

Dempster's test statistic depends on the choice of the vectors $h_3, h_4, \dots, h_N$, because different choices of these vectors would produce different estimates of the parameter $r$. On the other hand, the estimation of $r$ and the rounding of the estimates may cause an error (probably an error of second order smallness) in Dempster's test. Thus, we conjecture that our new test can be more powerful than Dempster's in the second terms of an Edgeworth type expansion of their power functions. This conjecture was strongly supported by our simulation results. Because our test statistic is mathematically simple, it is not difficult to get an Edgeworth expansion by using the results obtained in Babu and Bai (1993), Bai and Rao (1991) or Bhattacharya and Ghosh (1978). It seems difficult to get a similar expansion for Dempster's test due to his complicated estimation of $r$.

We conducted our simulation study to compare the power of the three tests for both normal and non-normal cases. Let $N_1 = 25$, $N_2 = 20$, and $p = 40$. For the non-normal case, observations were generated by the following moving average model: Let $\{U_{ijk}\}$ be a set of independent gamma variables with shape parameter 4 and scale parameter 1. Define

$$X_{ijk} = U_{ijk} + \rho\,U_{i,j+1,k} + \mu_{jk} \quad (j = 1, \dots, p;\ i = 1, \dots, N_k;\ k = 1, 2),$$

where $\rho$ and the $\mu$'s are constants. Under this model, $\Sigma = (\sigma_{ij})$ with $\sigma_{ii} = 4(1 + \rho^2)$, $\sigma_{i,i\pm 1} = 4\rho$ and $\sigma_{ij} = 0$ for $|i - j| > 1$. For the normal case, the covariance matrices were chosen to be $\Sigma = I_p$ and $\Sigma = (1 - \rho)I_p + \rho J_p$ with $\rho = 0.5$, where $J$ is a $p \times p$ matrix with all entries one. Simulation was also conducted for small $p$ (chosen as $p = 4$). The tests were made for size $\alpha = 0.05$ with 1000 repetitions. The power is evaluated at the standard parameter $\eta = \|\mu_1 - \mu_2\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$.

The simulation for the non-normal case was conducted for $\rho = 0$, $.3$, $.6$ and $.9$ (Table 5.1 and Figure 5.1). All three tests have almost the same significance level. Under the alternative hypothesis, the power curves of Dempster's test and our test are rather close, but that of our test is always higher than that of Dempster's test. Theoretically, the power function for Hotelling's test should increase very slowly when the noncentral parameter increases. This was also demonstrated by our simulation results. The reader should note that there are only 1000 repetitions for each value of the noncentral parameter in our simulation, which may cause an error of $\sqrt{1/1000} = 0.0316$ by the Central Limit Theorem. It is therefore not surprising that the simulated power function of Hotelling's test, whose magnitude is only around $0.05$, seems not to be increasing at some points of the noncentral parameter.
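The simulation setup can be sketched in a few lines. The block below is an illustrative reimplementation, not the authors' code: it draws samples from the moving average gamma model described above and evaluates the statistic $Z$ of (4.5). The function names, the seed and the mean shift of $1$ per coordinate are assumptions made for the example, and only the Python standard library is used.

```python
import random


def moving_average_gamma_sample(size, p, rho, mean_shift, rng):
    """Draw `size` p-vectors with X_j = U_j + rho * U_{j+1} + mean_shift[j],
    where the U's are independent gamma(shape 4, scale 1) variables."""
    sample = []
    for _ in range(size):
        u = [rng.gammavariate(4.0, 1.0) for _ in range(p + 1)]
        sample.append([u[j] + rho * u[j + 1] + mean_shift[j] for j in range(p)])
    return sample


def column_means(sample):
    size = len(sample)
    return [sum(row[j] for row in sample) / size for j in range(len(sample[0]))]


def bs_statistic(sample1, sample2):
    """Statistic Z of (4.5), computed from two lists of p-vectors."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2 - 2
    p = len(sample1[0])
    mean1, mean2 = column_means(sample1), column_means(sample2)
    # Pooled covariance S_n = A / n.
    s = [[0.0] * p for _ in range(p)]
    for sample, mean in ((sample1, mean1), (sample2, mean2)):
        for row in sample:
            centred = [row[j] - mean[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += centred[a] * centred[b] / n
    trace_s = sum(s[a][a] for a in range(p))
    trace_s_sq = sum(s[a][b] ** 2 for a in range(p) for b in range(p))
    # B_n^2: the estimator of tr(Sigma^2) used in the denominator of Z.
    b_sq = n * n / ((n + 2) * (n - 1)) * (trace_s_sq - trace_s ** 2 / n)
    diff_sq = sum((mean1[j] - mean2[j]) ** 2 for j in range(p))
    inv_tau = n1 * n2 / (n1 + n2)
    return (inv_tau * diff_sq - trace_s) / (2.0 * (n + 1) / n * b_sq) ** 0.5


rng = random.Random(0)  # arbitrary seed, chosen only for reproducibility
p, rho = 40, 0.3
first = moving_average_gamma_sample(25, p, rho, [0.0] * p, rng)
second_null = moving_average_gamma_sample(20, p, rho, [0.0] * p, rng)
second_shifted = moving_average_gamma_sample(20, p, rho, [1.0] * p, rng)
z_null = bs_statistic(first, second_null)
z_alt = bs_statistic(first, second_shifted)
```

The test would reject $H_0$ when the returned value exceeds $\xi_\alpha$. Repeating the two draws many times and recording the rejection frequency would give simulated size and power in the spirit of the study described above.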
Similar tables are presented for the normal case (Table 5.2 and Figure 5.2). For higher dimension cases the power functions of Dempster's test and our test are almost the same, and our method is not worse than Hotelling's test even for $p = 4$.

Acknowledgement

The research of the first author was partially supported by US NSF grant DMS... and partially by ROC NSC grant NSC...
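Table 5.1. Simulated power functions of the three tests with multivariate Gamma distribution ($N = 45$, $p = 40$, $\alpha = .05$). Columns H, D and BS are given for each of the settings $\rho = 0$, $\lambda = 1$; $\rho = .3$, $\lambda = 3.4$; $\rho = .6$, $\lambda = 15.6$; $\rho = .9$, $\lambda = 235.8$.

Table 5.2. Simulated power functions of the three tests with multivariate normal distribution ($N = 45$, $\alpha = .05$). Columns H, D and BS are given for each of the settings $\rho = 0$, $\lambda = 1$ and $\rho = .5$, $\lambda = 41$ ($p = 40$); $\rho = 0$, $\lambda = 1$ and $\rho = .5$, $\lambda = 5$ ($p = 4$).

H: Hotelling's $F$ test, D: Dempster's non-exact $F$ test, BS: proposed normal test, $\eta = \|\mu_1 - \mu_2\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$ and $\lambda = \lambda_{\max}/\lambda_{\min}$.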
Figure 5.1. Simulated power functions of the three tests with multivariate Gamma distribution. Panels: $\rho = 0.0$, $0.3$, $0.6$ and $0.9$; vertical axis: power; curves: Hotelling's test, Dempster's test and BS's test.

Figure 5.2. Simulated power functions of the three tests with multivariate normal distribution. Panels: $\rho = 0.0$ and $\rho = 0.5$; vertical axis: power.
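Appendix. Asymptotics Related to the Statistic $M_n$

A.1. The proof of (4.4): By definition, we have

$$M_n = \left(1 + \frac{\tau N_1}{n}\right)\|\bar x_1\|^2 + \left(1 + \frac{\tau N_2}{n}\right)\|\bar x_2\|^2 - 2\bar x_1'\bar x_2 - \tau n^{-1}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\|x_{ij}\|^2.$$

Under $H_0$, we may assume $\mu_1 = \mu_2 = 0$. Write $\Gamma = [\Gamma_1, \dots, \Gamma_p]' = [\gamma_{k\ell}]$ and $\Gamma'\Gamma = [\omega_{k\ell}]$. Then, by Conditions (a) - (c), we have

$$\mathrm{Var}\left(\tau n^{-1}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\|x_{ij}\|^2\right) = \tau^2 n^{-2} E\left[\sum_{j=1}^{2}\sum_{i=1}^{N_j}\sum_{k=1}^{p}\left[(\Gamma_k' z_{ij})^2 - \|\Gamma_k\|^2\right]\right]^2 = \tau^2 n^{-2} N\left[2\,\mathrm{tr}(\Sigma^2) + \Delta\sum_{\ell=1}^{m}\omega_{\ell\ell}^2\right]$$

$$\le C\tau^2 n^{-1}\left[\mathrm{tr}\,\Sigma^2 + \lambda_{\max}\mathrm{tr}\,\Sigma\right] = o(\sigma_M^2).$$

Similarly, we may show that

$$\mathrm{Var}(\bar x_1'\bar x_2) = N_1^{-2}N_2^{-2} E\left(\sum_{i=1}^{N_1}\sum_{\ell=1}^{N_2} x_{i1}' x_{\ell 2}\right)^2 = \frac{\mathrm{tr}(\Sigma^2)}{N_1 N_2},$$

$\mathrm{Var}(\|\bar x_1\|^2) = \frac{2}{N_1^2}\mathrm{tr}(\Sigma^2) + \frac{\Delta}{N_1^3}\sum_{\ell=1}^{m}\omega_{\ell\ell}^2$, $\mathrm{Var}(\|\bar x_2\|^2) = \frac{2}{N_2^2}\mathrm{tr}(\Sigma^2) + \frac{\Delta}{N_2^3}\sum_{\ell=1}^{m}\omega_{\ell\ell}^2$, $\mathrm{Cov}(\|\bar x_1\|^2, \|\bar x_2\|^2) = 0$ and $\mathrm{Cov}(\bar x_1'\bar x_2, \|\bar x_j\|^2) = 0$ for $j = 1, 2$. Therefore, by the fact that $\sum_{\ell=1}^{m}\omega_{\ell\ell}^2 \le p\lambda_{\max}^2$, we have

$$\mathrm{Var}(M_n) = \left[2\tau^2\mathrm{tr}(\Sigma^2) + \left(\frac{\Delta}{N_1^3} + \frac{\Delta}{N_2^3}\right)\sum_{\ell=1}^{m}\omega_{\ell\ell}^2\right](1 + o(1)) = \sigma_M^2(1 + o(1)).$$

The proof of (4.4) is then complete.

A.2. The asymptotic normality of $Z_n$ under $H_0$: From the proof of A.1, one can see that $\tau(\mathrm{tr}(S_n) - \mathrm{tr}(\Sigma))/\sigma_M \to 0$. Therefore, to show that $Z_n \to N(0, 1)$, we need only show that $\left[\|\bar x_1 - \bar x_2\|^2 - E(\|\bar x_1 - \bar x_2\|^2)\right]/\sigma_M \to N(0, 1)$. We may rewrite

$$\frac{\|\bar x_1 - \bar x_2\|^2 - E(\|\bar x_1 - \bar x_2\|^2)}{\sigma_M} = \sigma_M^{-1}\sum_{k=1}^{p}\left[(\Gamma_k'(\bar z_1 - \bar z_2))^2 - \tau\|\Gamma_k\|^2\right] := \sum_{\ell=1}^{m}\sum_{h=1}^{N}\left[U_{\ell,h} + V_{\ell,h} + \sigma_M^{-1}\omega_{\ell\ell}\left(w_{\ell,h}^2 - E(w_{\ell,h}^2)\right)\right],$$

where $\bar z_j = N_j^{-1}\sum_{i=1}^{N_j} z_{ij}$, $\bar z_{jk}$ denotes the $k$th component of $\bar z_j$, and

$$U_{\ell,h} = 2\sigma_M^{-1}\omega_{\ell\ell}\,w_{\ell,h}\sum_{k_1=1}^{h-1} w_{\ell,k_1}, \qquad V_{\ell,h} = 2\sigma_M^{-1} w_{\ell,h}\sum_{\ell_1=1}^{\ell-1}\omega_{\ell_1\ell}\left(\bar z_{1\ell_1} - \bar z_{2\ell_1}\right),$$

with the convention that $\sum_{\ell_1=1}^{0} = 0$ and the notation

$$w_{\ell,h} = \begin{cases} \frac{1}{N_1} z_{h1\ell} & \text{if } h = 1, \dots, N_1, \\ -\frac{1}{N_2} z_{h-N_1,2,\ell} & \text{if } h = N_1 + 1, \dots, N. \end{cases}$$

Since

$$\mathrm{Var}\left(\sigma_M^{-1}\sum_{\ell=1}^{m}\sum_{h=1}^{N}\omega_{\ell\ell}\left(w_{\ell,h}^2 - E(w_{\ell,h}^2)\right)\right) = (2 + \Delta)\left(\frac{1}{N_1^3} + \frac{1}{N_2^3}\right)\sum_{\ell=1}^{m}\omega_{\ell\ell}^2\big/\sigma_M^2 \to 0,$$

we need only show that $\sum_{\ell=1}^{m}\sum_{h=1}^{N}\left[U_{\ell,h} + V_{\ell,h}\right] \to N(0, 1)$. Note that $\{U_{N(\ell-1)+h} = U_{\ell,h} + V_{\ell,h}\}$ forms a sequence of martingale differences with $\sigma$-fields $\mathcal F_{N(\ell-1)+h} = \mathcal F(z_{ijt},\ j = 1, 2,\ t < \ell,\ i = 1, \dots, N_j$ and $w_{\ell,i},\ i < h)$. Then the asymptotic normality may be proved by employing Corollary 3.1 in Hall and Heyde (1980) with routine verification of the following:

$$\sum_{\ell=1}^{m}\sum_{h=1}^{N} E\left(U_{\ell,h}^4 + V_{\ell,h}^4\right) \to 0$$

and

$$\mathrm{Var}\left(\sum_{\ell=1}^{m}\sum_{h=1}^{N} E\left[(U_{\ell,h}^2 + V_{\ell,h}^2)\mid\mathcal F_{N(\ell-1)+h}\right]\right) \to 0.$$

The proof of (4.2) is now complete.

A.3. The ratio-consistency of $B_n^2$: We only need show that $\tilde B_n^2 = \mathrm{tr}\,S_n^2 - \frac{1}{n}(\mathrm{tr}\,S_n)^2$ is ratio-consistent for $\mathrm{tr}(\Sigma^2)$. Without loss of generality, we assume that $\mu_1 = \mu_2 = 0$. Note that

$$S_n = n^{-1}\left[\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}' - N_1\bar x_1\bar x_1' - N_2\bar x_2\bar x_2'\right].$$

Since $E\,\bar x_j'\bar x_j = N_j^{-1}\mathrm{tr}(\Sigma) = o\left(\sqrt{\mathrm{tr}(\Sigma^2)}\right)$, $j = 1, 2$, it follows that $\bar x_j'\bar x_j = o_p\left(\sqrt{\mathrm{tr}(\Sigma^2)}\right)$. Therefore, we need only show that

$$\hat B = \mathrm{tr}\left(\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}'\right)^2 - \frac{1}{n}\left(\mathrm{tr}\left(\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}'\right)\right)^2$$

is a ratio-consistent estimator of $\mathrm{tr}(\Sigma^2)$.

By elementary calculation, we have $E\left(\mathrm{tr}\left(\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}'\right)\right) = \frac{N}{n}\mathrm{tr}(\Sigma)$ and $\mathrm{Var}\left(\mathrm{tr}\left(\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}'\right)\right) = O\left(n^{-1}\mathrm{tr}(\Sigma^2)\right)$. These, together with $n^{-1}\mathrm{tr}(\Sigma) = o\left(\sqrt{\mathrm{tr}(\Sigma^2)}\right)$, imply that

$$\frac{1}{n}\left(\mathrm{tr}\left(\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}'\right)\right)^2 = \frac{N^2}{n^3}(\mathrm{tr}(\Sigma))^2 + o_p(\mathrm{tr}(\Sigma^2)).$$

Rewrite

$$\mathrm{tr}\left(\frac{1}{n}\sum_{j=1}^{2}\sum_{i=1}^{N_j} x_{ij}x_{ij}'\right)^2 = \frac{N^2}{n^2}\mathrm{tr}(\Sigma^2) + \frac{2N}{n^2}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\mathrm{tr}\left((\Gamma'\Gamma)^2(z_{ij}z_{ij}' - I_m)\right)$$

$$+ \frac{1}{n^2}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\sum_{j'=1}^{2}\sum_{i'=1}^{N_{j'}}\mathrm{tr}\left((\Gamma'\Gamma)(z_{ij}z_{ij}' - I_m)(\Gamma'\Gamma)(z_{i'j'}z_{i'j'}' - I_m)\right) := \frac{N^2}{n^2}\mathrm{tr}(\Sigma^2) + H_1 + H_2.$$

We have $E(H_1) = 0$ and $\mathrm{Var}(H_1) = \frac{4N^3}{n^4}\left[2\,\mathrm{tr}(\Sigma^4) + \Delta\sum_{i=1}^{m}\left([(\Gamma'\Gamma)^2]_{ii}\right)^2\right] = o(\mathrm{tr}^2(\Sigma^2))$. Thus,

$$H_1 = o_p(\mathrm{tr}(\Sigma^2)).$$

Write $H_2 = H_{21} + H_{22} + H_{23} + H_{24} + H_{25}$, where

$$H_{21} = \frac{1}{n^2}\sum_{(i,j)\neq(i',j')}\mathrm{tr}\left((\Gamma'\Gamma)(z_{ij}z_{ij}' - I_m)(\Gamma'\Gamma)(z_{i'j'}z_{i'j'}' - I_m)\right),$$

$$H_{22} = \frac{1}{n^2}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\sum_{(k'\neq\ell,\ \ell'\neq k)}\omega_{k\ell}\,\omega_{k'\ell'}\left(z_{ij\ell}z_{ijk'}z_{ij\ell'}z_{ijk}\right),$$

$$H_{23} = \frac{2}{n^2}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\sum_{\ell\neq k\neq\ell'}\omega_{k\ell}\,\omega_{\ell\ell'}\left((z_{ij\ell}^2 - 1)(z_{ij\ell'}z_{ijk})\right),$$

$$H_{24} = \frac{4}{n^2}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\sum_{\ell\neq k}\omega_{k\ell}\,\omega_{\ell\ell}\left((z_{ij\ell}^2 - 1)(z_{ij\ell}z_{ijk})\right)$$

and

$$H_{25} = \frac{1}{n^2}\sum_{j=1}^{2}\sum_{i=1}^{N_j}\sum_{k,\ell=1}^{m}\omega_{k\ell}^2\,(z_{ij\ell}^2 - 1)(z_{ijk}^2 - 1).$$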
We have $E(H_{21}) = 0$, $E(H_{22}) = \frac{N}{n^2}\left[\mathrm{tr}(\Sigma^2) + \mathrm{tr}^2(\Sigma) - 2\sum_{k=1}^{m}\omega_{kk}^2\right] = \frac{N}{n^2}\mathrm{tr}^2(\Sigma) + o(\mathrm{tr}(\Sigma^2))$ and

$$\mathrm{Var}(H_{21}) = \frac{N(N-1)}{n^4}\left[4\,\mathrm{tr}^2(\Sigma^2) + 4\Delta\sum_{i,j,t=1}^{m}\omega_{ij}^2\omega_{it}^2 + \Delta^2\sum_{i,j=1}^{m}\omega_{ij}^4\right] = o(\mathrm{tr}^2(\Sigma^2)).$$

Similarly, we may show that $\mathrm{Var}(H_{22})$ and $\mathrm{Var}(H_{23})$ have the same order. Finally, one may show that

$$E|H_{24}| \le \frac{CN}{n^2}\sum_{\ell=1}^{m}\omega_{\ell\ell}\,E\left|\sum_{k=1}^{m}\omega_{k\ell}z_{11k}\right| \le \frac{CN}{n^2}\sum_{\ell=1}^{m}\omega_{\ell\ell}\sqrt{\sum_{k=1}^{m}\omega_{k\ell}^2} \le \frac{CN}{n^2}\lambda_{\max}\mathrm{tr}(\Sigma) = o(\mathrm{tr}(\Sigma^2))$$

and

$$E|H_{25}| \le \frac{CN}{n^2}\sum_{k,\ell=1}^{m}\omega_{k\ell}^2 = o(\mathrm{tr}(\Sigma^2)).$$

Combining the above, we obtain $H_2 = \frac{N}{n^2}\mathrm{tr}^2(\Sigma) + o_p(\mathrm{tr}(\Sigma^2))$. Thus, $\hat B = \mathrm{tr}(\Sigma^2)[1 + o_p(1)]$ and consequently, the ratio-consistency of $\hat B$ follows.

References

Babu, G. J. and Bai, Z. D. (1993). Edgeworth expansions of a function of sample means under minimal moment conditions and partial Cramer's conditions. Sankhya Ser. A 55.

Bai, Z. D., Krishnaiah, P. R. and Zhao, L. C. (1989). On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Trans. Inform. Theory 35.

Bai, Z. D. and Rao, C. R. (1991). Edgeworth expansion of a function of sample means. Ann. Statist. 19.

Bai, Z. D., Silverstein, J. W. and Yin, Y. Q. (1988). A note on the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivariate Anal. 26.

Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of large dimensional sample covariance matrix. Ann. Probab. 21.

Bhattacharya, R. N. and Ghosh, J. K. (1988). On moment conditions for valid formal Edgeworth expansions. J. Multivariate Anal. 27.

Chung, J. H. and Fraser, D. A. S. (1958). Randomization tests for a multivariate two-sample problem. J. Amer. Statist. Assoc. 53.

Dempster, A. P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29.

Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics 16.

Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8, 252-261.

Hall, P. G. and Heyde, C. C. (1980). Martingale Limit Theory and Its Applications. Academic Press, New York.

Huber, Peter J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1.

Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal. 12.

Loève, M. (1977). Probability Theory, 4th Ed. Springer-Verlag, New York.

Narayanaswamy, C. R. and Raghavarao, D. (1991). Principal component analysis of large dispersion matrices. Appl. Statist. 40.

Portnoy, S. (1984). Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large. I. Consistency. Ann. Statist. 12.

Portnoy, S. (1985). Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large: II. Normal approximation (Corr: 91V19 p2282). Ann. Statist. 13.

Saranadasa, H. (1991). Discriminant analysis based on experimental design concepts. Ph.D. Thesis, Department of Statistics, Temple University.

Saranadasa, H. (1993). Asymptotic expansion of the misclassification probabilities of D- and A-criteria for discrimination from two high dimensional populations using the theory of large dimensional random matrices. J. Multivariate Anal. 46.

Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13.

Wachter, K. W. (1978). The strong limits of random matrix spectra for sample matrices of independent elements. Ann. Probab. 6.

Yin, Y. Q. (1986). Limiting spectral distribution for a class of random matrices. J. Multivariate Anal. 20.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1983). Limiting behavior of the eigenvalues of a multivariate F matrix. J. Multivariate Anal. 13.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Related Fields 78.

Zhao, L. C., Krishnaiah, P. R. and Bai, Z. D. (1986a). On detection of the number of signals in presence of white noise. J. Multivariate Anal. 20, 1-25.

Zhao, L. C., Krishnaiah, P. R. and Bai, Z. D. (1986b). On detection of the number of signals when the noise covariance matrix is arbitrary. J. Multivariate Anal. 20.

Department of Applied Mathematics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan.

The R. W. Johnson Pharmaceutical Research Institute, Preclinical Biostatistics, Welsh and McKean Road, Spring House, PA, U.S.A.

(Received July 1993; accepted April 1995)