

Statistica Sinica 6 (1996)

EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM

Zhidong Bai and Hewa Saranadasa

National Sun Yat-sen University

Abstract: With the rapid development of modern computing techniques, statisticians are dealing with data of much higher dimension. Consequently, due to their loss of accuracy or power, some classical statistical inferences are being challenged by nonexact approaches. The purpose of this paper is to point out and briefly analyze such a phenomenon and to encourage statisticians to reexamine classical statistical approaches when they are dealing with high dimensional data. As an example, we derive the asymptotic power of the classical Hotelling's $T^2$ test and Dempster's nonexact test for a two-sample problem. Also, an asymptotically normally distributed test statistic is proposed. Our results show that both Dempster's nonexact test and the new test have higher power than Hotelling's test when the data dimension is proportionally close to the within sample degrees of freedom. Although our new test has an asymptotic power function similar to Dempster's, it does not rely on the normality assumption. Some simulation results are presented which show that the nonexact tests are more powerful than Hotelling's test even for moderately large dimension and sample sizes.

Key words and phrases: Edgeworth expansion, Hotelling $T^2$ test, hypothesis test, power function, significance test, $\chi^2$ approximation.

1. Introduction

Modern computation techniques make it possible to deal with high dimensional data. Some recent examples of interest in dealing with high dimensional data can be found in Narayanaswamy and Raghavarao (1991) and Saranadasa (1991, 1993). Examples may also be found in applied statistical inference handling samples of many measurements on individuals. For example, in a clinical trial of pharmaceutical studies, many blood chemistry measurements are taken on each individual. In some studies the number of variables is comparable to or even exceeds the total sample size.

The purpose of this article is to raise the following questions: What is new in high dimensional statistical inference and what should be done? The difference of high dimensional statistical inference from classical statistical inference will be referred to as the "Effect of High Dimension" (EHD).
There are two aspects of the EHD. The first is that there are too many interesting or nuisance parameters in the model. For example, in M-estimation in linear models, the number of regression parameters may be proportional to the sample size. This problem remains unsolved. The best results are due to Huber's work (1973), in which the consistency of estimation is proved under the assumption that $p^2/n \to 0$ and the asymptotic normality under $p^3/n \to 0$, where $n$ and $p$ are the sample size and the dimension of the regression coefficient vector. Although these requirements on the ratio of the dimension to the sample size were reduced, very strong assumptions were made on the design sequence. References are made to Portnoy (1984, 1985). Another example is the model of Error in Variables, in which the true regressor variables can be considered as nuisance parameters whose number is $np$ (while the number of observations is $n(p+1)$). In these cases, either the estimation is very poor or it is impossible to get an unbiased or consistent estimator.

The second case is that the dimension itself of the data is very high. Of course, the number of parameters to be estimated must be very large. An example is the detection of the signal number in omnidirectional signal processing. When the number of sensors is increased, the detection accuracy is supposed to be better. However, simulation results show the opposite when the traditional method (the MUSIC method) is used if the number of sensors is 10 or more. We believe that the reason is that the number of elements of the covariance matrix (parameters to be estimated) becomes very large ($2p^2$, which is 200 if $p = 10$). Some references in this direction are Bai, Krishnaiah and Zhao (1989) and Zhao, Krishnaiah and Bai (1986a,b).

Although the EHD has been noticed in many different directions of multivariate statistical inference, the problem has not yet been clearly stated in the literature and no appropriate methods have been proposed to deal with the EHD.
To this end, we shall analyze these problems through the two sample problem, as an example to show how and why the EHD affects inferences and how the EHD can be reduced. A classical method to deal with this problem is the famous Hotelling's $T^2$ test. Its advantages include: it is invariant under linear transformation, its exact distribution is known under the null hypothesis and it is powerful when the dimension of data is sufficiently small compared with the sample sizes. However, Hotelling's test has the serious defect that the $T^2$ statistic is undefined when the dimension of data is greater than the within sample degrees of freedom. Seeking remedies, Chung and Fraser (1958) proposed a nonparametric test and Dempster (1958, 1960) discussed the so-called "nonexact" significance test. Dempster (1960) also considered the so-called randomization test. These works seek alternatives to Hotelling's test in situations when the latter does not apply. Beyond being a remedy when $T^2$ is undefined, we show that even when it is well defined, the nonexact test is more powerful than the $T^2$ test when the dimension is proportionally "close to" the sample degrees of freedom (more discussion on the ratio will be given in Section 5).

Both the $T^2$ test and Dempster's nonexact test strongly rely on the normality assumption. Moreover, Dempster's nonexact test statistic involves a complicated estimation of $r$, the "degrees of freedom" for the chi-square approximation. To simplify the testing procedure, a new method is proposed in Section 4. It is proven in Sections 3 and 4 that the asymptotic power of the new test is equivalent to that of Dempster's test. Simulation results further show that our new approach is slightly more powerful than Dempster's. We believe that the estimation of $r$ and its rounding to an integer in Dempster's procedure may cause an error of order $O(1/n)$. This might indicate that the new approach is superior to Dempster's test in the second order term in some Edgeworth-type expansions. We shall not discuss this in detail in this paper but hope to address it in future work. Some simulation results and discussions are presented in Section 5 and some technical proofs are given in the Appendix.

2. Asymptotic Power of Hotelling's Test

In this section, we derive the asymptotic power function of the $T^2$ test for the two sample problem. The model described here is the same as the one in Dempster's test given in the next section. Suppose that $x_{ij} \sim N_p(\mu_i, \Sigma)$, $j = 1, \ldots, N_i$, $i = 1, 2$, are two independent samples. To test the hypothesis $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \ne \mu_2$, traditionally one uses Hotelling's famous $T^2$ test, which is defined by

$$T^2 = \eta\,(\bar x_1 - \bar x_2)' A^{-1} (\bar x_1 - \bar x_2), \tag{2.1}$$

where $\bar x_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}$, $i = 1, 2$, $A = \sum_{i=1}^{2}\sum_{j=1}^{N_i}(x_{ij} - \bar x_i)(x_{ij} - \bar x_i)'$ and $\eta = n\,\frac{N_1 N_2}{N_1 + N_2}$ with $n = N_1 + N_2 - 2$. The purpose of this section is to investigate the power function of Hotelling's test when $p/n \to y \in (0, 1)$, which guarantees the existence of the $T^2$ statistic, and to compare it with the other nonexact tests given in later sections.
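As an illustration of definition (2.1), the following is a minimal sketch of the two-sample statistic, assuming the standard scaling in which $\frac{n-p+1}{np}T^2$ is $F(p, n-p+1)$ under $H_0$ and normality. The function name `hotelling_t2` and the row-per-observation array layout are choices made here for illustration, not part of the paper.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Two-sample Hotelling T^2 and its F-scaled version.

    x1, x2: arrays of shape (N1, p) and (N2, p), one observation per row.
    The statistic is undefined when p > n = N1 + N2 - 2, because the
    pooled sum-of-squares matrix A is then singular.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2 - 2
    if p > n:
        raise ValueError("T^2 is undefined when p exceeds n = N1 + N2 - 2")
    d = x1.mean(axis=0) - x2.mean(axis=0)      # difference of sample means
    c1 = x1 - x1.mean(axis=0)
    c2 = x2 - x2.mean(axis=0)
    a = c1.T @ c1 + c2.T @ c2                  # pooled sum-of-squares matrix A
    quad = d @ np.linalg.solve(a, d)           # d' A^{-1} d without forming A^{-1}
    t2 = n * n1 * n2 / (n1 + n2) * quad
    f_stat = (n - p + 1) / (n * p) * t2        # compare with F(p, n - p + 1)
    return t2, f_stat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(hotelling_t2(rng.standard_normal((25, 5)), rng.standard_normal((20, 5))))
```

The sketch solves a linear system instead of inverting $A$, which is the usual numerically safer choice; the explicit check on $p$ mirrors the defect discussed in the Introduction.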
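To derive the asymptotic power of Hotelling's test, we first derive an asymptotic expression for the threshold of the test. It is well known that under the null hypothesis, $\frac{n-p+1}{np}T^2$ has an $F$ distribution with degrees of freedom $p$ and $n - p + 1$. Let the significance level be chosen as $\alpha$ and the threshold be denoted by $F_\alpha(p, n-p+1)$. We have the following lemma.

Lemma 2.1.
$$\frac{p}{n-p+1}F_\alpha(p, n-p+1) = \frac{y_n}{1-y_n} + \sqrt{\frac{2y}{(1-y)^3 n}}\,\xi_\alpha + o\Big(\frac{1}{\sqrt n}\Big),$$
where $y_n = p/n$, $\lim_{n\to\infty} y_n = y \in (0, 1)$ and $\xi_\alpha$ is the $1-\alpha$ quantile of the standard normal distribution.

Proof. Under the null hypothesis, by the Central Limit Theorem,
$$\sqrt{\frac{(1-y)^3 n}{2y}}\Big(\frac{T^2}{n} - \frac{y_n}{1-y_n}\Big) \to N(0, 1) \quad \text{as } n \to \infty,$$
from which the result follows immediately.

Now, we consider the behavior of $T^2/n$ under $H_1$. In this case, its distribution is the same as

$$(w + \tau^{-1/2}\delta)'\,U^{-1}\,(w + \tau^{-1/2}\delta), \tag{2.2}$$

where $\delta = \Sigma^{-1/2}(\mu_1 - \mu_2)$, $U = \sum_{i=1}^{n} u_i u_i'$, $w = (w_1, \ldots, w_p)'$ and $u_i$, $i = 1, \ldots, n$, are i.i.d. $N(0, I_p)$ random vectors, and $\tau = (N_1 + N_2)/(N_1 N_2)$. Denote the spectral decomposition of $U^{-1}$ by $O'\,\mathrm{diag}[d_1, \ldots, d_p]\,O$ with eigenvalues $d_1 \ge \cdots \ge d_p > 0$. Then, (2.2) becomes

$$(Ow + \tau^{-1/2}\|\delta\| v)'\,\mathrm{diag}[d_1, \ldots, d_p]\,(Ow + \tau^{-1/2}\|\delta\| v), \tag{2.3}$$

where $v = O\delta/\|\delta\|$. Since $U$ has the Wishart distribution $W(n, I_p)$, the orthogonal matrix $O$ has the Haar distribution on the group of all orthogonal $p$-matrices, and hence the vector $v$ is uniformly distributed on the unit $p$-sphere. Note that the conditional distribution of $Ow$ given $O$ is $N(0, I_p)$, the same as that of $w$, which is independent of $O$. This shows that $Ow$ is independent of $v$. Therefore, replacing $Ow$ in (2.3) by $w$ does not change the joint distribution of $Ow$, $v$ and the $d_i$'s. Consequently, $T^2/n$ has the same distribution as

$$\sum_{i=1}^{p}\big(w_i^2 + 2 w_i v_i \tau^{-1/2}\|\delta\| + \tau^{-1}\|\delta\|^2 v_i^2\big)\,d_i, \tag{2.4}$$

where $v = (v_1, \ldots, v_p)'$ is uniformly distributed on the unit sphere of $R^p$ and is independent of $w$ and the $d_i$'s.

Lemma 2.2. Using the above notation, we have $\sqrt n\,\big(\sum_{i=1}^{p} d_i - \frac{y_n}{1-y_n}\big) \to 0$ and $n\sum_{i=1}^{p} d_i^2 \to \frac{y}{(1-y)^3}$ in probability.

Proof. Recalling (2.4) with $\delta = 0$ under the null hypothesis and applying the Central Limit Theorem with $D = \{d_1, \ldots, d_p\}$ given, we have

$$P\Big(\frac{T^2}{n} \le \frac{y_n}{1-y_n} + \sqrt{\frac{2y}{(1-y)^3 n}}\,x\Big) = E\Big[P\Big(\frac{\sum_{i=1}^{p}(w_i^2 - 1)d_i}{\sqrt{2\sum_{i=1}^{p} d_i^2}} \le \frac{\big(\frac{y_n}{1-y_n} - \sum_{i=1}^{p} d_i\big) + \sqrt{\frac{2y}{(1-y)^3 n}}\,x}{\sqrt{2\sum_{i=1}^{p} d_i^2}}\ \Big|\ D\Big)\Big] = E\,\Phi\Big(\frac{\big(\frac{y_n}{1-y_n} - \sum_{i=1}^{p} d_i\big) + \sqrt{\frac{2y}{(1-y)^3 n}}\,x}{\sqrt{2\sum_{i=1}^{p} d_i^2}}\Big) + o(1), \tag{2.5}$$

where $\Phi$ is the distribution function of a standard normal variable. On the other hand, as shown in the proof of Lemma 2.1, the Central Limit Theorem implies that the above quantity tends to $\Phi(x)$ for all $x$. Hence, by the type-convergence theorem (see Page 16 of Loeve (1977)), the lemma is proved.

Now we are in a position to derive an approximation of the power function of Hotelling's test.

Theorem 2.1. If $y_n = p/n \to y \in (0, 1)$, $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$ and $\|\delta\| = o(1)$, then

$$\beta_H(\delta) - \Phi\Big(-\xi_\alpha + \sqrt{\frac{n(1-y)}{2y}}\,\kappa(1-\kappa)\|\delta\|^2\Big) \to 0, \tag{2.6}$$

where $\beta_H(\delta)$ is the power function of Hotelling's test.

Remark 2.1. The usual consideration of the alternative hypothesis in limiting theorems is to assume that $\sqrt n\,\|\delta\|^2 \to a > 0$. Under this additional assumption, it follows from (2.6) that the limiting power of Hotelling's test is given by $\Phi\big(-\xi_\alpha + ((1-y)/(2y))^{1/2}\kappa(1-\kappa)a\big)$. This formula shows that the limiting power of Hotelling's test increases slowly with the noncentral parameter (namely $a$) when $y$ is close to 1.

Proof. Write $D = (d_1, \ldots, d_p)$. Using the facts $E v_1^2 = 1/p$, $E v_1^4 = 3/[p(p+2)]$ and $E v_1^2 v_2^2 = 1/[p(p+2)]$ and then applying Lemma 2.2, one easily obtains

$$n\,E\Big[\Big(\sum_{i=1}^{p} w_i v_i d_i\,\tau^{-1/2}\|\delta\|\Big)^2\ \Big|\ D\Big] = n\,\tau^{-1}\|\delta\|^2\,\frac{1}{p}\sum_{i=1}^{p} d_i^2 \to 0 \ \text{ in Pr.}, \tag{2.7}$$

$$n\,E\Big[\Big(\sum_{i=1}^{p}(v_i^2 - E v_i^2)\,\tau^{-1}\|\delta\|^2 d_i\Big)^2\ \Big|\ D\Big] = n\,\tau^{-2}\|\delta\|^4\Big[\frac{2}{p(p+2)}\sum_{i=1}^{p} d_i^2 - \frac{2}{p^2(p+2)}\Big(\sum_{i=1}^{p} d_i\Big)^2\Big] \to 0 \ \text{ in Pr.} \tag{2.8}$$

and

$$\sum_{i=1}^{p}(E v_i^2)\,\tau^{-1}\|\delta\|^2 d_i = \frac{1}{p}\,\tau^{-1}\|\delta\|^2\sum_{i=1}^{p} d_i = \frac{y_n\,\tau^{-1}\|\delta\|^2}{p(1-y_n)}\Big(1 + o_p\Big(\frac{1}{\sqrt n}\Big)\Big). \tag{2.9}$$

Thus, by the above and Lemma 2.1, we have

$$\beta_H(\delta) = P\Big(\sum_{i=1}^{p} w_i^2 d_i \ge \frac{y_n}{1-y_n} + \sqrt{\frac{2y}{(1-y)^3 n}}\,\xi_\alpha - \frac{y_n\,\tau^{-1}\|\delta\|^2}{p(1-y_n)} + o\Big(\frac{1}{\sqrt n}\Big)\Big)$$
$$= E\Big[P\Big(\frac{\sum_{i=1}^{p}(w_i^2 - 1)d_i}{\sqrt{2\sum_{i=1}^{p} d_i^2}} \ge \frac{\sqrt{\frac{2y}{(1-y)^3 n}}\,\xi_\alpha - \frac{y\,\tau^{-1}\|\delta\|^2}{p(1-y)} + o\big(\frac{1}{\sqrt n}\big)}{\sqrt{2\sum_{i=1}^{p} d_i^2}}\ \Big|\ D\Big)\Big]$$
$$= \Phi\Big(-\xi_\alpha + \sqrt{\frac{n(1-y)}{2y}}\,\kappa(1-\kappa)\|\delta\|^2\Big) + o(1). \tag{2.10}$$

The proof of Theorem 2.1 is now complete.

3. Discussion on Dempster's Nonexact Test

Dempster (1958, 1960) proposed a nonexact test for the hypothesis described in Section 2, with the dimension of data possibly greater than the sample degrees of freedom. First, let us briefly describe his test. Denote $N = N_1 + N_2$, $X' = (x_{11}, x_{12}, \ldots, x_{1N_1}, x_{21}, \ldots, x_{2N_2})$ and by

$$H' = \Big(\tfrac{1}{\sqrt N} J_N,\ \Big(\sqrt{\tfrac{N_2}{N_1 N}}\,J'_{N_1},\ -\sqrt{\tfrac{N_1}{N_2 N}}\,J'_{N_2}\Big)',\ h_3, \ldots, h_N\Big)$$

a suitably chosen orthogonal matrix, where $J_d$ is a $d$-dimensional column vector of 1's. Let $Y = HX = (y_1, \ldots, y_N)'$. Then, the vectors $y_1, \ldots, y_N$ are independent normal random vectors with $E(y_1) = (N_1\mu_1 + N_2\mu_2)/\sqrt N$, $E(y_2) = \tau^{-1/2}(\mu_1 - \mu_2)$, $E(y_j) = 0$ for $3 \le j \le N$, and $\mathrm{Cov}(y_j) = \Sigma$, $1 \le j \le N$. Then, Dempster proposed his nonexact significance test statistic $F = Q_2/\big(\sum_{i=3}^{N} Q_i/n\big)$, where $Q_i = y_i' y_i$ and $n = N - 2$. He used the so-called $\chi^2$-approximation technique, assuming $Q_i$ is approximately distributed as $m\chi^2_r$, where the parameters $m$ and $r$ may be solved by the method of moments. Then, the distribution of $F$ is approximately $F_{r, nr}$. But generally the parameter $r$ (its explicit form is given in (3.3) below) is unknown. He estimated $r$ by either of the following two ways.

Approach 1: $\hat r_1$ is the solution of the equation

$$t = \Big[\frac{1}{\hat r_1} + \Big(1 + \frac{1}{n}\Big)\frac{1}{3\hat r_1^2}\Big](n - 1). \tag{3.1}$$

Approach 2: $\hat r_2$ is the solution of the equation

$$t + w = \Big[\frac{1}{\hat r_2} + \Big(1 + \frac{1}{n}\Big)\frac{1}{3\hat r_2^2}\Big](n - 1) + \binom{n}{2}\Big[\frac{1}{\hat r_2} + \frac{3}{2\hat r_2^2}\Big], \tag{3.2}$$

where $t = n\big[\ln\big(\frac{1}{n}\sum_{i=3}^{N} Q_i\big)\big] - \sum_{i=3}^{N}\ln Q_i$, $w = -\sum_{3 \le i < j \le N}\ln\sin^2\theta_{ij}$ and $\theta_{ij}$ is the angle between the vectors $y_i$ and $y_j$, $3 \le i < j \le N$. Dempster's test is then to reject $H_0$ if $F > F_\alpha(\hat r, n\hat r)$.

By elementary calculus, we have

$$r = \frac{(\mathrm{tr}\,\Sigma)^2}{\mathrm{tr}(\Sigma^2)} \quad \text{and} \quad m = \frac{\mathrm{tr}(\Sigma^2)}{\mathrm{tr}\,\Sigma}. \tag{3.3}$$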
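To make (3.3) concrete, the sketch below evaluates $r$ and $m$ for a known covariance matrix. It is only an illustration: in practice $\Sigma$ is unknown, which is why Dempster estimates $r$ through (3.1) or (3.2). The function name `dempster_r_m` is hypothetical.

```python
import numpy as np


def dempster_r_m(sigma):
    """Moment-matched parameters of (3.3): Q_i is approximated by m * chi^2_r."""
    sigma = np.asarray(sigma, dtype=float)
    tr1 = np.trace(sigma)               # tr(Sigma)
    tr2 = np.trace(sigma @ sigma)       # tr(Sigma^2)
    return tr1 ** 2 / tr2, tr2 / tr1    # (r, m)


if __name__ == "__main__":
    p, rho = 40, 0.5
    identity = np.eye(p)
    equicorrelated = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    print(dempster_r_m(identity))        # r equals p when Sigma = I_p
    print(dempster_r_m(equicorrelated))  # strong correlation makes r far smaller than p
```

For the equicorrelated matrix with $p = 40$ and $\rho = 0.5$, $\mathrm{tr}\,\Sigma = 40$ and $\mathrm{tr}(\Sigma^2) = 430$, so $r = 1600/430 \approx 3.72$ and $m = 10.75$: the effective degrees of freedom can be much smaller than the dimension.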
From (3.3) and the Cauchy-Schwarz inequality, it follows that $r \le p$. On the other hand, under regular conditions, both $\mathrm{tr}(\Sigma)$ and $\mathrm{tr}(\Sigma^2)$ are of the order $O(n)$, and hence $r$ is of the same order. Under the wider conditions (3.7) and (3.8) given in Theorem 3.1 below, it can be proved that $r \to \infty$. Further, we may prove that t (=r)n (1 ;1= ) and w (;1) 4 r N (1 + 8 (;1) r ). From these estimates, one may conclude that both $\hat r_1$ and $\hat r_2$ are ratio-consistent (in the sense that $\hat r/r \to 1$). Therefore, the solutions of equations (3.1) and (3.2) should satisfy

$$\hat r_1 = \frac{n}{t} + O(1) \tag{3.4}$$

and

$$\hat r_2 = \frac{1}{w}\binom{n}{2} + O(1), \tag{3.5}$$

respectively. Since the random effect may cause an error of order $O(1)$, one may simply choose the estimates of $r$ as $\frac{n}{t}$ or $\frac{1}{w}\binom{n}{2}$.

In the remainder of this section, we derive an asymptotic power function of Dempster's nonexact test, under the conditions: $p/n \to y > 0$, $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$ and the parameter $r$ is known. The reader should note that the limiting ratio $y$ is allowed to be greater than one in this case, which is different from what is assumed in Section 2. When $r$ is unknown, substituting the estimators $\hat r_1$ or $\hat r_2$ for $r$ may cause an error of higher-order smallness in the approximation of the power function of Dempster's nonexact test, as will be seen in the proof of Theorem 3.1.

Similar to Lemma 2.1, one may show the following lemma.

Lemma 3.1. When $n, r \to \infty$,

$$F_\alpha(r, nr) = 1 + \sqrt{2/r}\,\xi_\alpha + o(1/\sqrt r). \tag{3.6}$$

Then we have the following approximation of the power function of Dempster's test.

Theorem 3.1. If

$$\mu'\Sigma\mu = o(\tau\,\mathrm{tr}\,\Sigma^2), \tag{3.7}$$

$$\lambda_{\max} = o\big(\sqrt{\mathrm{tr}\,\Sigma^2}\big), \tag{3.8}$$

and $r$ is known, then

$$\beta_D(\mu) - \Phi\Big(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\Big) \to 0, \tag{3.9}$$

where $\mu = \mu_1 - \mu_2$.
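The asymptotic power functions of Theorems 2.1 and 3.1 can be compared numerically. The sketch below assumes the readings $\Phi\big(-\xi_\alpha + \sqrt{n(1-y)/(2y)}\,\kappa(1-\kappa)\|\delta\|^2\big)$ for Hotelling's test and $\Phi\big(-\xi_\alpha + n\kappa(1-\kappa)\|\mu\|^2/\sqrt{2\,\mathrm{tr}\,\Sigma^2}\big)$ for Dempster's test; the function names and the example values are illustrative choices, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_hotelling(delta_sq, n, y, kappa, alpha=0.05):
    """Approximate power from (2.6); delta_sq is ||delta||^2 and y = p/n lies in (0, 1)."""
    xi = _STD_NORMAL.inv_cdf(1 - alpha)
    shift = sqrt(n * (1 - y) / (2 * y)) * kappa * (1 - kappa) * delta_sq
    return _STD_NORMAL.cdf(-xi + shift)


def power_dempster(mu_sq, tr_sigma2, n, kappa, alpha=0.05):
    """Approximate power from (3.9); mu_sq is ||mu_1 - mu_2||^2, tr_sigma2 is tr(Sigma^2)."""
    xi = _STD_NORMAL.inv_cdf(1 - alpha)
    shift = n * kappa * (1 - kappa) * mu_sq / sqrt(2 * tr_sigma2)
    return _STD_NORMAL.cdf(-xi + shift)


if __name__ == "__main__":
    # Sigma = I_p, so ||delta||^2 = ||mu||^2 and tr(Sigma^2) = p.
    n1, n2, p = 25, 20, 40
    n = n1 + n2 - 2
    kappa = n1 / (n1 + n2)
    for effect in (0.0, 0.5, 1.0, 2.0):
        print(effect,
              round(power_hotelling(effect, n, p / n, kappa), 3),
              round(power_dempster(effect, p, n, kappa), 3))
```

When $\Sigma = I_p$ the two shifts differ exactly by the factor $\sqrt{1-y}$, so the gap between the two curves widens as $y = p/n$ approaches one.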
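Remark 3.1. In usual cases when considering the asymptotic power of Dempster's test, the quantity $\|\mu\|^2$ is ordinarily assumed to have the same order as $1/\sqrt n$ and $\mathrm{tr}(\Sigma^2)$ to have order $n$. Thus, the quantities $n\|\mu\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$ and $\sqrt n\,\|\delta\|^2$ are both bounded away from zero and infinity. The expression of the asymptotic power of Hotelling's test involves a factor $\sqrt{1-y}$ which disappears in the expression of the asymptotic power of Dempster's test. This reveals the reason why the power of the Hotelling test increases much more slowly than that of the Dempster test as the noncentral parameter increases if $y$ is close to one.

Proof. Let $\delta = (\delta_1, \ldots, \delta_p)' = \Sigma^{-1/2}\mu$. Then,

$$\beta_D(\mu) = P\Big(\frac{\sum_{i=1}^{p}\big(y_i^2 + 2\tau^{-1/2}\delta_i y_i + \tau^{-1}\delta_i^2\big)\lambda_i}{\sum_{i=1}^{p}\sum_{j=1}^{n} z_{ij}^2\lambda_i} > n^{-1}F_\alpha(r, nr)\Big), \tag{3.10}$$

where $y_i$, $z_{ij}$, $i = 1, \ldots, p$, $j = 1, \ldots, n$, are i.i.d. $N(0, 1)$ variables and $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\Sigma$. By the Central Limit Theorem, the laws of large numbers, (3.7) and (3.8), one may easily show that

$$\frac{\sum_{i=1}^{p}\big(y_i^2 - 1 + 2\tau^{-1/2}\delta_i y_i\big)\lambda_i}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} \simeq \frac{\sum_{i=1}^{p}\big(y_i^2 - 1 + 2\tau^{-1/2}\delta_i y_i\big)\lambda_i}{\sqrt{2\,\mathrm{tr}\,\Sigma^2 + 4\tau^{-1}\mu'\Sigma\mu}} \xrightarrow{D} N(0, 1) \tag{3.11}$$

and

$$\sum_{i=1}^{p}\sum_{j=1}^{n} z_{ij}^2\lambda_i = n\,(\mathrm{tr}\,\Sigma)\Big(1 + \sqrt{2/(nr)}\,N(0, 1) + o_p\big(\sqrt{1/(nr)}\big)\Big). \tag{3.12}$$

Noting that $\sum_{i=1}^{p}\delta_i^2\lambda_i = \|\mu\|^2$ and $r = \frac{(\mathrm{tr}\,\Sigma)^2}{\mathrm{tr}\,\Sigma^2}$, the result (3.9) follows immediately from (3.7) and Lemma 3.1. The proof of Theorem 3.1 is now complete.

4. A New Approach to Test $H_0$

In this section, we propose a new test for $H_0$. Instead of the normality of the underlying distributions, we assume:

(a) $x_{ij} = \Gamma z_{ij} + \mu_j$, $i = 1, \ldots, N_j$, $j = 1, 2$, where $\Gamma$ is a $p \times m$ matrix ($m \le \infty$) with $\Gamma\Gamma' = \Sigma$ and $z_{ij}$ are i.i.d. random $m$-vectors with independent components satisfying $E z_{ij} = 0$, $\mathrm{Var}(z_{ij}) = I_m$, $E z_{ijk}^4 = 3 + \Delta < \infty$ and $E\prod_{k=1}^{m} z_{ijk}^{\nu_k} = 0$ (and 1) when there is at least one $\nu_k = 1$ (there are two $\nu_k$'s equal to 2, correspondingly), whenever $\nu_1 + \cdots + \nu_m = 4$;

(b) $p/n \to y > 0$ and $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$;

(c) (3.7) and (3.8) are true.

Here and later, it should be noted that all random variables and parameters depend on $n$. For simplicity we omit the subscript $n$ from all random variables except those statistics defined later.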
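Now, we begin to construct our test. Consider the statistic

$$M_n = (\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \tau\,\mathrm{tr}\,S_n, \tag{4.1}$$

where $S_n = \frac{1}{n}A$, and $\bar x_1$, $\bar x_2$ and $A$ are defined in Section 2. Under $H_0$, we have $E M_n = 0$. If the conditions (a)-(c) are true, it may be proved (see the Appendix) that under $H_0$,

$$Z_n = \frac{M_n}{\sqrt{\mathrm{Var}\,M_n}} \to N(0, 1) \quad \text{as } n \to \infty. \tag{4.2}$$

If the underlying distributions are normal as described in Section 2, then under $H_0$ we have

$$\sigma_M^2 := \mathrm{Var}\,M_n = 2\tau^2\Big(1 + \frac{1}{n}\Big)\mathrm{tr}\,\Sigma^2. \tag{4.3}$$

If the underlying distributions are not normal but satisfy the conditions (a)-(c), one may show (see the Appendix) that

$$\mathrm{Var}\,M_n = \sigma_M^2\,(1 + o(1)). \tag{4.4}$$

Hence (4.2) is still true if the denominator of $Z_n$ is replaced by $\sigma_M$. Therefore, to complete the construction of our test statistic, we need only find a ratio-consistent estimator of $\mathrm{tr}(\Sigma^2)$ and substitute it into the denominator of $Z_n$. It seems that a natural estimator of $\mathrm{tr}\,\Sigma^2$ should be $\mathrm{tr}\,S_n^2$. However, unlike the case where $p$ is fixed, $\mathrm{tr}\,S_n^2$ is generally neither unbiased nor ratio-consistent even under the normal assumption. If $nS_n \sim W_p(n, \Sigma)$, it is routine to verify that

$$B_n^2 = \frac{n^2}{(n+2)(n-1)}\Big(\mathrm{tr}\,S_n^2 - \frac{1}{n}(\mathrm{tr}\,S_n)^2\Big)$$

is an unbiased and ratio-consistent estimator of $\mathrm{tr}\,\Sigma^2$. Here, it should be noted that $\mathrm{tr}\,S_n^2 - \frac{1}{n}(\mathrm{tr}\,S_n)^2 \ge 0$, by the Cauchy-Schwarz inequality. In the Appendix, we shall prove that $B_n^2$ is still a ratio-consistent estimator of $\mathrm{tr}\,\Sigma^2$ under the Conditions (a)-(c). Replacing $\mathrm{tr}\,\Sigma^2$ in (4.3) by the ratio-consistent estimator $B_n^2$, we obtain our test statistic

$$Z = \frac{(\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \tau\,\mathrm{tr}\,S_n}{\tau\sqrt{\frac{2(n+1)n}{(n+2)(n-1)}\big(\mathrm{tr}\,S_n^2 - n^{-1}(\mathrm{tr}\,S_n)^2\big)}} = \frac{\frac{N_1 N_2}{N_1 + N_2}(\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \mathrm{tr}\,S_n}{\sqrt{\frac{2(n+1)}{n}}\,B_n} \to N(0, 1). \tag{4.5}$$

Due to (4.5), the test rejects $H_0$ if $Z > \xi_\alpha$. Regarding the asymptotic power of our new test, we have the following theorem.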
Theorem 4.1. Under the Conditions (a)-(c),

$$\beta_{BS}(\mu) - \Phi\Big(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\Big) \to 0. \tag{4.6}$$

Proof. Let $\bar z_j$ be the sample mean of $z_{ij}$, $i = 1, \ldots, N_j$, $j = 1, 2$, and let $M_n^0 = (\bar z_1 - \bar z_2)'\Gamma'\Gamma(\bar z_1 - \bar z_2) - \tau\,\mathrm{tr}(S_n)$. Then, $M_n^0$ has the same distribution as $M_n$ under $H_0$. Thus, $\mathrm{Var}(M_n^0) = \sigma_M^2(1 + o(1))$ and $M_n^0/\sqrt{\mathrm{Var}(M_n^0)} \to N(0, 1)$. Note that $M_n = M_n^0 + 2\mu'\Gamma(\bar z_1 - \bar z_2) + \|\mu\|^2$ and, by (3.7),

$$\mathrm{Var}\big(\mu'\Gamma(\bar z_1 - \bar z_2)\big) = \tau\,\mu'\Sigma\mu = o\big(\tau^2\,\mathrm{tr}(\Sigma^2)\big).$$

Hence, $\mathrm{Var}(M_n^0)/\mathrm{Var}(M_n) \to 1$ and consequently

$$\frac{M_n - \|\mu\|^2}{\sqrt{\mathrm{Var}(M_n^0)}} \to N(0, 1).$$

Note that $\frac{2(n+1)}{n}\tau^2 B_n^2/\mathrm{Var}(M_n^0) \to 1$. Hence,

$$Z - \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}(\Sigma^2)}} \to N(0, 1).$$

This implies that

$$\beta_{BS}(\mu) = P_{H_1}(Z > \xi_\alpha) = P\Big(\frac{M_n - \|\mu\|^2}{\sqrt{\mathrm{Var}\,M_n^0}} > \xi_\alpha - \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} + o(1)\Big) = \Phi\Big(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\Big) + o(1), \tag{4.7}$$

which completes the proof of the theorem.

5. Discussions and Simulations

Comparing Theorems 2.1, 3.1 and 4.1, we find that from the point of view of large sample theory, Hotelling's test is less powerful than the other two tests when $y$ is close to one, and that the latter two tests have the same asymptotic power function. Our simulation results show that even for moderate sample and dimension sizes, Hotelling's test is still less powerful than the other two tests when the underlying covariance structure is reasonably regular (i.e., the structure of $\Sigma$ does not cause too large a difference between $\mu'\Sigma^{-1}\mu$ and $\sqrt n\,\|\mu\|^2/\sqrt{\mathrm{tr}(\Sigma^2)}$), whereas the Type I error does not change much in the latter two tests.

It would not be hard to see that using the approach of this paper, one may easily derive similar results for the one-sample problem, namely, Hotelling's test is less powerful than a nonexact test which can be defined as in Section 4, when the dimension of data is high.

Now, we would like to explain why this phenomenon happens. The reason for the lower power of Hotelling's test is the "inaccuracy" of the estimator of the covariance matrix. Let $X_1, \ldots, X_n$ be i.i.d. random $p$-vectors of mean 0 and variance-covariance matrix $I_p$. By the law of large numbers, the sample covariance matrix $S_n = n^{-1}\sum_{i=1}^{n} X_i X_i'$ should be "close" to the identity $I_p$ with an error of the order $O_p(1/\sqrt n)$ when $p$ is fixed. However, when $p$ is proportional to $n$ (say $p/n \to y \in (0, 1)$), the ratio of the largest and the smallest eigenvalues of $S_n$ tends to $(1 + \sqrt y)^2/(1 - \sqrt y)^2$ (see, e.g., Bai, Silverstein and Yin (1988), Bai and Yin (1993), Geman (1980), Silverstein (1985) and Yin, Bai and Krishnaiah (1988)). More precisely, in the theory of spectral analysis of large dimensional random matrices, it has been proven that the empirical distribution of the eigenvalues of $S_n$ tends to a limiting distribution spreading over $[(1 - \sqrt y)^2, (1 + \sqrt y)^2]$ as $n \to \infty$ (see, e.g., Jonsson (1982), Wachter (1978), Yin (1986) and Yin, Bai and Krishnaiah (1983)). These show that $S_n$ is not close to $I_p$. Especially when $y$ is "close to" one, $S_n$ has many small eigenvalues and hence $S_n^{-1}$ has many huge eigenvalues. This will cause the deficiency of the $T^2$ test. We believe that in many other multivariate statistical inferences with an inverse of a sample covariance matrix involved, the same phenomenon should exist (as another example, see Saranadasa (1991, 1993)).

Here we would like to explain our quotation-marked "close to" one. Note that the limiting ratio of the largest to the smallest eigenvalues of $S_n$ tends to $(1 + \sqrt y)^2/(1 - \sqrt y)^2$. For our simulation example, $y = 0.93$ and the ratio of the extreme eigenvalues is about 3039. That is very serious. Even for $y$ as small as 0.1 or 0.01, the ratio can be as large as 3.705 and 1.494.
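The limiting ratio $(1 + \sqrt y)^2/(1 - \sqrt y)^2$ quoted above is easy to tabulate. The short sketch below does so for the values of $y$ mentioned in the text; the function name is an illustrative choice.

```python
from math import sqrt


def limiting_eigenvalue_ratio(y):
    """Limit of lambda_max(S_n) / lambda_min(S_n) when p/n -> y, 0 < y < 1, Sigma = I_p."""
    if not 0 < y < 1:
        raise ValueError("the limiting ratio is finite only for 0 < y < 1")
    root = sqrt(y)
    return ((1 + root) / (1 - root)) ** 2


if __name__ == "__main__":
    for y in (0.01, 0.1, 0.5, 0.93):
        # y = 0.01 gives about 1.494 and y = 0.1 about 3.705, matching the text.
        print(y, limiting_eigenvalue_ratio(y))
```

The ratio grows without bound as $y \to 1$, which is the regime where inverting the sample covariance matrix becomes unstable.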
These show that it is not necessary to require the dimension of data to be very close to the degrees of freedom to make the effect of high dimension visible. In fact, this has been shown by our simulation for $p = 4$.

Dempster's test statistic depends on the choice of vectors $h_3, h_4, \ldots, h_N$ because different choices of these vectors would produce different estimates of the parameter $r$. On the other hand, the estimation of $r$ and the rounding of the estimates may cause an error (probably an error of second order smallness) in Dempster's test. Thus, we conjecture that our new test can be more powerful than Dempster's in the second terms of an Edgeworth type expansion of their power functions. This conjecture was strongly supported by our simulation results. Because our test statistic is mathematically simple, it is not difficult to get an Edgeworth expansion by using the results obtained in Babu and Bai (1993), Bai and Rao (1991) or Bhattacharya and Ghosh (1978). It seems difficult to get a similar expansion for Dempster's test due to his complicated estimation of $r$.

We conducted our simulation study to compare the power of the three tests for both normal and nonnormal cases. Let $N_1 = 25$, $N_2 = 20$, and $p = 40$. For the nonnormal case, observations were generated by the following moving average model: Let $\{U_{ijk}\}$ be a set of independent gamma variables with shape parameter 4 and scale parameter 1. Define

$$X_{ijk} = U_{ijk} + \rho\,U_{i,j+1,k} + \mu_{jk} \quad (j = 1, \ldots, p;\ i = 1, \ldots, N_k;\ k = 1, 2),$$

where $\rho$ and the $\mu$'s are constants. Under this model, $\Sigma = (\sigma_{ij})$ with $\sigma_{ii} = 4(1 + \rho^2)$, $\sigma_{i,i\pm1} = 4\rho$ and $\sigma_{ij} = 0$ for $|i - j| > 1$. For the normal case, the covariance matrices were chosen to be $\Sigma = I_p$ and $\Sigma = (1 - \rho)I_p + \rho J_p$ with $\rho = 0.5$, where $J$ is a $p \times p$ matrix with all entries one. Simulation was also conducted for small $p$ (chosen as $p = 4$). The tests were made for size $\alpha = 0.05$ with 1000 repetitions. The power is evaluated at the standard parameter $\|\mu_1 - \mu_2\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$.

The simulation for the nonnormal case was conducted for $\rho = 0$, .3, .6 and .9 (Table 5.1 and Figure 5.1). All three tests have almost the same significance level. Under the alternative hypothesis, the power curves of Dempster's test and our test are rather close, but that of our test is always higher than that of Dempster's test. Theoretically, the power function for Hotelling's test should increase very slowly when the noncentral parameter increases. This was also demonstrated by our simulation results. The reader should note that there are only 1000 repetitions for each value of the noncentral parameter in our simulation, which may cause an error of $\sqrt{1/1000} = 0.0316$ by the Central Limit Theorem. It is therefore not surprising that the simulated power function of Hotelling's test, whose magnitude is only around 0.05, seems not to be increasing at some points of the noncentral parameter.
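The proposed statistic (4.5) and a size check in the spirit of the simulation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' simulation code: it only covers the normal case with $\Sigma = I_p$ under $H_0$, and the function names, seed and defaults are assumptions made here.

```python
import numpy as np
from statistics import NormalDist


def bs_statistic(x1, x2):
    """Statistic Z of (4.5); rows of x1 (N1 x p) and x2 (N2 x p) are observations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.shape[0], x2.shape[0]
    n = n1 + n2 - 2
    d = x1.mean(axis=0) - x2.mean(axis=0)
    c = np.vstack([x1 - x1.mean(axis=0), x2 - x2.mean(axis=0)])
    gram = c @ c.T                        # N x N matrix with tr(gram^k) = tr(A^k)
    tr_s = np.trace(gram) / n             # tr(S_n), with S_n = A / n
    tr_s2 = np.sum(gram * gram) / n ** 2  # tr(S_n^2)
    b2 = n ** 2 / ((n + 2) * (n - 1)) * (tr_s2 - tr_s ** 2 / n)  # B_n^2
    tau_inv = n1 * n2 / (n1 + n2)
    return (tau_inv * (d @ d) - tr_s) / np.sqrt(2 * (n + 1) / n * b2)


def simulate_size(n1=25, n2=20, p=40, reps=1000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate under H0 for N(0, I_p) data, one-sided test Z > xi_alpha."""
    rng = np.random.default_rng(seed)
    xi = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        x1 = rng.standard_normal((n1, p))
        x2 = rng.standard_normal((n2, p))
        if bs_statistic(x1, x2) > xi:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(simulate_size())
```

The statistic needs only traces, so it stays defined when $p$ exceeds $n$, where Hotelling's $T^2$ is not. Working with the $N \times N$ Gram matrix instead of the $p \times p$ matrix $A$ is a computational convenience that leaves both traces unchanged.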
Similar tables are presented for the normal case (Table 5.2 and Figure 5.2). For higher dimension cases the power functions of Dempster's test and our test are almost the same, and our method is not worse than Hotelling's test even for $p = 4$.

Acknowledgement

The research of the first author was partially supported by US NSF grant DMS and partially by ROC NSC grant NSC M L.
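Table 5.1. Simulated power functions of the three tests with multivariate Gamma distribution ($N = 45$, $p = 40$, $\alpha = .05$). Panels: $\rho = 0$, $\lambda_{\max}/\lambda_{\min} = 1$; $\rho = .3$, $\lambda_{\max}/\lambda_{\min} = 3.4$; $\rho = .6$, $\lambda_{\max}/\lambda_{\min} = 15.6$; $\rho = .9$, $\lambda_{\max}/\lambda_{\min} = 235.8$. Columns within each panel: H, D, BS.

Table 5.2. Simulated power functions of the three tests with multivariate normal distribution. First block ($N = 45$, $p = 40$, $\alpha = .05$): $\rho = 0$, $\lambda_{\max}/\lambda_{\min} = 1$; $\rho = .5$, $\lambda_{\max}/\lambda_{\min} = 41$. Second block ($N = 45$, $p = 40$, $\alpha = .05$): $\rho = 0$, $\lambda_{\max}/\lambda_{\min} = 1$; $\rho = .5$, $\lambda_{\max}/\lambda_{\min} = 5$. Columns within each panel: H, D, BS.

H: Hotelling's $F$ test, D: Dempster's nonexact $F$ test, BS: proposed normal test. The two parameters are $\|\mu_1 - \mu_2\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$ and $\lambda_{\max}/\lambda_{\min}$.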
Figure 5.1. Simulated power functions of the three tests with multivariate Gamma distribution. Panels: ($p = 40$, $\rho = 0.0$), ($p = 40$, $\rho = 0.3$), ($p = 40$, $\rho = 0.6$), ($p = 4$, $\rho = 0.9$). Vertical axis: Power. Curves: Hotelling's test, Dempster's test, BS's test.

Figure 5.2. Simulated power functions of the three tests with multivariate normal distribution. Panels: ($p = 4$, $\rho = 0.0$), ($p = 4$, $\rho = 0.5$), ($p = 4$, $\rho = 0.0$), ($p = 4$, $\rho = 0.5$). Vertical axis: Power.
15 EFFECT OF HIGH DIMENSION 35 Appedix. Asymptotics Related to the Statistic M A.1. The proof of (4.4): By deitio, we have M =(1+N 1 ;1 )kx 1 k +(1+N ;1 )kx k ; x 0 1 x ; ;1 X XNj j=1 kx ij k : Uder H 0, we may assume 1 = = 0. Write ; = [; 1 ::: ; p ] 0 = [ k `] ad ; 0 ;=[ k`]. The, by Coditios (a)  (c), we have Var( ;1 X XNj j=1 h = ; N tr( )+ Similarly, we may show that kx ij k )= ; E mx `=1 `` i X XNj px j=1 k=1 [(; 0 kz ij ) ;k; k k ] C ;1 [tr + max tr] = o( M): XN 1 Var(x 0 x 1 )=N ; N ; 1 E XN `=1 x 0 i1 x` 1 = tr( ) N 1 N Var(kx 1 k ) = N 1 P tr( )+ N m`=1 ``, P Var(kx 1 3 k = N tr( )+ N m`=1 `` 3 Cov(kx 1 k kx k ) = 0 ad Cov(x 0 1x kx j k )=0forj =1. Therefore, by the fact that P m `=1 `` p max, wehave 1 Var(M )= tr( )+ N1 3 The proof of (4.4) is the complete. + 1 h m i N X`=1 `` = M(1 + o(1)): 3 A.. The asymptotic ormality ofz uder H 0 : From the proof of A.1, oe ca see that (tr(s ) ; tr())= M! 0. Therefore, to show that Z!N(0 1), we eed oly show that[kx 1 ; x k ; E(kx 1 ; x k )]= M!N(0 1). We may rewrite kx 1 ; x k ; E(kx 1 ; x k )= := px k=1 mx `=1 h=1 [(; 0 k(z 1 ; z )) ;k; k k ] NX [U` h + V` h + ``(w ` h ; E(w ` h))]
16 36 ZHIDONG BAI AND HEWA SARANADASA P where z j = N j ;1 N j z ij, z jk deotes the kth compoet ofz j ad h;1 X i U` h = M hw` h ;1 `` w` k1 k 1=1 X `;1 V` h = M ;1 w` h `1=1 `1`(z 1`1 ; z `1) with the covetio that P 0 `1=1 = 0 ad the otatio w` h = 8 >< >: 1 N 1 z h 1 ` if h =1 ::: N 1, 1 N z h;n1 ` if h = N 1 +1 ::: N. Sice Var( P m P N `=1 h=1(w ` h ; E(w ` h))) = ( + )( 1 N N ) P m ``= 1 3 `=1 M! 0, we eed oly show that P m P N `=1 h=1[u` h + V` h ]!N(0 1). Note that fun(`;1)+k = U` k+v` k g forms a sequece of martigale diereces with elds F N(`;1)+h = F(z ijt j = 1 t < ` i = 1 ::: N j ad w` i i h). The the asymptotic ormality may be proved by employig Corollary 3.1 i Hall (1980) with routie vericatio of the followig: ad m Var X`=1 mx NX `=1 h=1 NX h=1 The proof of (4.) is ow complete. E(U 4` h + V 4 ` h)! 0 E[(U ` h + V` h)jf N(`;1)+h ]! 0: A.3. The ratiocosistecy of B : We oly eed show that ~ B = trs ; 1 (trs ) is ratiocosistet for tr( ). Without loss of geerality, we assume that 1 = =0. Note that h X XNj S = ;1 j=1 x ij x 0 ij ; N 1x 1 x 0 1 ; N x x 0 Sice Ex 0 j x j = N ;1 j tr() = o( p tr( )), j = 1, it follows that, x 0 j x j = o( p tr( )). Therefore, we eed oly show that 1 ^B =tr X XNj j=1 x ij x 0 ij 1 ; tr( 1 X XNj j=1 i : x ij x 0 ij)
17 EFFECT OF HIGH DIMENSION 37 is a ratiocosistet estimator of tr( ). P By elemetary calculatio, we have E(tr( 1 P N j j=1 x ij x 0 ij)) = N tr() ad Var(tr( p P 1 P N j j=1 x ij x 0 ij)) = O(tr( )). These, together with p ;1= tr() = o( tr( )), imply that Rewrite 1 tr 1 tr( 1 X XNj j=1 X XNj j=1 x ij x 0 ij = N tr( )+ N + 1 X X XNj N j 0 X j=1 j 0 =1 i 0 =1 x ij x 0 N ij) = (tr()) + o p (tr( )): X XNj j=1 := N tr( )+H 1 + H : We have E(H 1 )=0adVar(H 1 )= 4N 3 4 Thus, tr((; 0 ;) (z ij z 0 ij ; I m)) (tr((; 0 ;)(z ij z 0 ij ; I m ))(; 0 ;)(z i 0 j0z0 i 0 j ; I m)) 0 h tr( 4 )+ P i m ([(; 0 ;) ] ii = o(tr ( )). H 1 = o p (tr( )): Write H = H 1 + H + H 3 + H 4 + H 5, where H 1 = 1 H = 1 X (ij)6=(i 0 j 0 ) X XNj (tr((; 0 ;)(z ij z 0 ij ; I m))(; 0 ;)(z i 0 j0z0 i 0 j ; I m)) 0 X j=1 (k 0 6=` `06=k) k ` k 0 `0(z ij`z ijk 0z ij`0z ijk ) ad H 3 = H 4 = H 5 = 1 X XNj X j=1 X XNj X `6=k6=`0 j=1 `6=k X XNj mx j=1 k `=1 k `` `0((z ij` ; 1)(z ij`0z ijk )) k `` `((z ij` ; 1)(z ij`z ijk )) k `(z ij` ; 1)(z ijk ; 1):
We have E(H 1 )=0, E(H )= N [tr( )+tr (); P m k=1 kk = N tr ()+ o(tr( )) and Var(H 1 )= N(N ; 1) h 4tr ( )+4 4 mx i j t=1 ij it + X m ij 4 i j=1 i = o(tr ( )). Similarly, we may show that Var(H ) and Var(H 3 ) have the same order. Finally, one may show that EjH 4 j CN CN EjH 5 j CN mx `=1 mx ` `E mx k `z 11k k=1 `=1 ` `v uut m X k=1 k ` CN maxtr() = o(tr()) and mx k `=1 k ` = o(tr()). Combining the above, we obtain H = N tr () + o p (tr( )). Thus, $\hat B = \mathrm{tr}(\Sigma^2)[1 + o_p(1)]$ and consequently, the ratio-consistency of $\hat B$ follows.

References

Babu, G. J. and Bai, Z. D. (1993). Edgeworth expansions of a function of sample means under minimal moment conditions and partial Cramer's conditions. Sankhya Ser. A 55.

Bai, Z. D., Krishnaiah, P. R. and Zhao, L. (1989). On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Trans. Inform. Theory 35.

Bai, Z. D. and Rao, C. R. (1991). Edgeworth expansion of a function of sample means. Ann. Statist. 19.

Bai, Z. D., Silverstein, J. W. and Yin, Y. Q. (1988). A note on the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivariate Anal. 26.

Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of large dimensional sample covariance matrix. Ann. Probab. 21.

Bhattacharya, R. N. and Ghosh, J. K. (1988). On moment conditions for valid formal Edgeworth expansions. J. Multivariate Anal. 27.

Chung, J. H. and Fraser, D. A. S. (1958). Randomization tests for a multivariate two-sample problem. J. Amer. Statist. Assoc. 53.

Dempster, A. P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29.

Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics 16.

Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8, 252-261.

Hall, P. G. and Heyde, C. C. (1980). Martingale Limit Theory and Its Applications. Academic Press, New York.

Huber, Peter J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1.

Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal. 12.

Loeve, M. (1977). Probability Theory, 4th Ed. Springer-Verlag, New York.

Narayanaswamy, C. R. and Raghavarao, D. (1991). Principal component analysis of large dispersion matrices. Appl. Statist. 40.

Portnoy, S. (1984). Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large. I. Consistency. Ann. Statist. 12.

Portnoy, S. (1985). Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large: II. Normal approximation (Corr: 91V19 p2282). Ann. Statist. 13.

Saranadasa, H. (1991). Discriminant analysis based on experimental design concepts. Ph.D. Thesis, Department of Statistics, Temple University.

Saranadasa, H. (1993). Asymptotic expansion of the misclassification probabilities of D- and A-criteria for discrimination from two high dimensional populations using the theory of large dimensional random matrices. J. Multivariate Anal. 46.

Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13.

Wachter, K. W. (1978). The strong limits of random matrix spectra for sample matrices of independent elements. Ann. Probab. 6.

Yin, Y. Q. (1986). Limiting spectral distribution for a class of random matrices. J. Multivariate Anal. 20.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1983). Limiting behavior of the eigenvalues of a multivariate F matrix. J. Multivariate Anal. 13.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Related Fields 78.

Zhao, L. C., Krishnaiah, P. R. and Bai, Z. D. (1986a). On detection of the number of signals in presence of white noise. J. Multivariate Anal. 20, 1-25.

Zhao, L. C., Krishnaiah, P. R. and Bai, Z. D. (1986b). On detection of the number of signals when the noise covariance matrix is arbitrary. J. Multivariate Anal. 20.

Department of Applied Mathematics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan.

The R. W. Johnson Pharmaceutical Research Institute, Preclinical Biostatistics, Welsh and McKean Road, Spring House, PA, U.S.A.

(Received July 1993; accepted April 1995)