
Statistica Sinica 6 (1996)

EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM

Zhidong Bai and Hewa Saranadasa

National Sun Yat-sen University

Abstract: With the rapid development of modern computing techniques, statisticians are dealing with data of much higher dimension. Consequently, due to their loss of accuracy or power, some classical statistical inferences are being challenged by non-exact approaches. The purpose of this paper is to point out and briefly analyze such a phenomenon and to encourage statisticians to reexamine classical statistical approaches when they are dealing with high dimensional data. As an example, we derive the asymptotic power of the classical Hotelling's $T^2$ test and Dempster's non-exact test for a two-sample problem. Also, an asymptotically normally distributed test statistic is proposed. Our results show that both Dempster's non-exact test and the new test have higher power than Hotelling's test when the data dimension is proportionally close to the within sample degrees of freedom. Although our new test has an asymptotic power function similar to Dempster's, it does not rely on the normality assumption. Some simulation results are presented which show that the non-exact tests are more powerful than Hotelling's test even for moderately large dimension and sample sizes.

Key words and phrases: Edgeworth expansion, Hotelling $T^2$ test, hypothesis test, power function, significance test, $\chi^2$ approximation.

1. Introduction

Modern computation techniques make it possible to deal with high dimensional data. Some recent examples of interest in dealing with high dimensional data can be found in Narayanaswamy and Raghavarao (1991) and Saranadasa (1991, 1993). Examples may also be found in applied statistical inference handling samples of many measurements on individuals. For example, in a clinical trial of pharmaceutical studies, many blood chemistry measurements are made on each individual. In some studies the number of variables is comparable to or even exceeds the total sample size.
The purpose of this article is to raise the following questions: What is new in high dimensional statistical inference, and what should be done? The difference between high dimensional statistical inference and classical statistical inference will be referred to as the "Effect of High Dimension" (EHD).

There are two aspects of the EHD. The first is that there are too many parameters of interest or nuisance parameters in the model. For example, in M-estimation in linear models, the number of regression parameters may be proportional to the sample size. This problem remains unsolved. The best results are due to Huber (1973). He proves the consistency of estimation under the assumption that $p^2/n \to 0$, and the asymptotic normality under $p^3/n \to 0$, where $n$ and $p$ are the sample size and the dimension of the regression coefficient vector. These requirements on the ratio of the dimension to the sample size were later reduced, but very strong assumptions were then made on the design sequence; see Portnoy (1984, 1985). Another example is the errors-in-variables model. Here the true regressor variables can be considered as nuisance parameters whose number is $np$, while the number of observations is $n(p+1)$. In these cases, either the estimation is very poor or it is impossible to get an unbiased or consistent estimator.

The second aspect is that the dimension of the data itself is very high. Of course, the number of parameters to be estimated must then also be very large. An example is the detection of the number of signals in omni-directional signal processing. When the number of sensors increases, the detection accuracy is supposed to improve. However, simulation results show the opposite when the traditional method (the MUSIC method) is used and the number of sensors is 10 or more. We believe the reason is that the number of elements of the covariance matrix (the parameters to be estimated) becomes very large: $2p^2$, which is 200 if $p = 10$. Some references in this direction are Bai, Krishnaiah and Zhao (1989) and Zhao, Krishnaiah and Bai (1986a, b). The EHD has been noticed in many different directions of multivariate statistical inference. However, the problem has not yet been clearly stated in the literature, and no appropriate methods have been proposed to deal with it.
To this end, we analyze these problems through the two sample problem. We use it as an example to show how and why the EHD affects inferences and how the EHD can be reduced. A classical method for this problem is the famous Hotelling's $T^2$ test. Its advantages include the following: it is invariant under linear transformations, its exact distribution is known under the null hypothesis, and it is powerful when the dimension of the data is sufficiently small compared with the sample sizes. However, Hotelling's test has a serious defect: the $T^2$ statistic is undefined when the dimension of the data is greater than the within sample degrees of freedom. Seeking remedies, Chung and Fraser (1958) proposed a nonparametric test, and Dempster (1958, 1960) discussed the so-called "non-exact" significance test. Dempster (1960) also considered the so-called randomization test. These works seek alternatives to Hotelling's test in situations where the latter does not apply. The non-exact test is more than a remedy for the case where $T^2$ is undefined. We show that even when $T^2$ is well defined, the non-exact test is more powerful than the $T^2$ test when the dimension is proportionally "close to" the sample degrees of freedom. The ratio is discussed further in Section 5.

Both the $T^2$ test and Dempster's non-exact test rely strongly on the normality assumption. Moreover, Dempster's non-exact test statistic involves a complicated estimation of $r$, the "degrees of freedom" for the chi-square approximation. To simplify the testing procedure, a new method is proposed in Section 4. It is proven in Sections 3 and 4 that the asymptotic power of the new test is equivalent to that of Dempster's test. Simulation results further show that our new approach is slightly more powerful than Dempster's. We believe that the estimation of $r$ and its rounding to an integer in Dempster's procedure may cause an error of order $O(1/n)$. This might indicate that the new approach is superior to Dempster's test in the second order term of some Edgeworth-type expansions. We do not discuss this in detail in this paper but hope to address it in future work. Some simulation results and discussions are presented in Section 5, and some technical proofs are given in the Appendix.

2. Asymptotic Power of Hotelling's Test

In this section, we derive the asymptotic power function of the $T^2$ test for the two sample problem. The model described here is the same as the one for Dempster's test, given in the next section. Suppose that $x_{ij} \sim N_p(\mu_i, \Sigma)$, $j = 1, \ldots, N_i$, $i = 1, 2$, are two independent samples. Consider testing $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$. Traditionally one uses Hotelling's famous $T^2$ test, which is defined by

$$T^2 = \frac{nN_1N_2}{N_1 + N_2}\,(\bar x_1 - \bar x_2)' A^{-1} (\bar x_1 - \bar x_2), \qquad (2.1)$$

where $\bar x_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}$, $i = 1, 2$, $A = \sum_{i=1}^{2}\sum_{j=1}^{N_i}(x_{ij} - \bar x_i)(x_{ij} - \bar x_i)'$ and $n = N_1 + N_2 - 2$. The purpose of this section is to investigate the power function of Hotelling's test when $p/n \to y \in (0, 1)$, a condition that guarantees the existence of the $T^2$ statistic. We then compare it with the non-exact tests given in later sections.
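A minimal pure-Python sketch of (2.1) may help make the definition concrete. It assumes only the standard library. The helper names `mean_vector`, `pooled_ssp`, `solve` and `hotelling_t2` are hypothetical illustrations, not code from the paper. The solver raises an error when $A$ is numerically singular, which is exactly the situation $p > n$ in which $T^2$ is undefined.

```python
from statistics import fmean


def mean_vector(sample):
    """Componentwise mean of a list of p-vectors (lists of floats)."""
    p = len(sample[0])
    return [fmean(x[k] for x in sample) for k in range(p)]


def pooled_ssp(sample1, sample2):
    """A = sum over both samples of (x_ij - xbar_i)(x_ij - xbar_i)'."""
    p = len(sample1[0])
    a = [[0.0] * p for _ in range(p)]
    for sample in (sample1, sample2):
        xbar = mean_vector(sample)
        for x in sample:
            d = [x[k] - xbar[k] for k in range(p)]
            for r in range(p):
                for c in range(p):
                    a[r][c] += d[r] * d[c]
    return a


def solve(a, b, tol=1e-12):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < tol:
            raise ValueError("A is singular, so T^2 is undefined (e.g. p > n)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (m[r][p] - sum(m[r][c] * x[c] for c in range(r + 1, p))) / m[r][r]
    return x


def hotelling_t2(sample1, sample2):
    """Return T^2 of (2.1) and the F-scaled value (n - p + 1)/(n p) * T^2."""
    n1, n2 = len(sample1), len(sample2)
    p = len(sample1[0])
    n = n1 + n2 - 2
    d = [u - v for u, v in zip(mean_vector(sample1), mean_vector(sample2))]
    quad = sum(dk * sk for dk, sk in zip(d, solve(pooled_ssp(sample1, sample2), d)))
    t2 = n * n1 * n2 / (n1 + n2) * quad
    return t2, (n - p + 1) / (n * p) * t2
```

For $p = 1$ the statistic reduces to the square of the pooled two-sample $t$ statistic. Applying the same nonsingular affine map to every observation leaves $T^2$ unchanged, which reflects the invariance mentioned above.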
To derive the asymptotic power of Hotelling's test, we first derive an asymptotic expression for the threshold of the test. It is well known that under the null hypothesis, $\frac{n-p+1}{np}T^2$ has an $F$-distribution with $p$ and $n - p + 1$ degrees of freedom. Let the significance level be chosen as $\alpha$, and denote the threshold by $F_\alpha(p, n-p+1)$. We have the following lemma.

Lemma 2.1.
$$\frac{p}{n-p+1}F_\alpha(p, n-p+1) = \frac{y_n}{1-y_n} + \sqrt{\frac{2y_n}{(1-y_n)^3}}\,\frac{\xi_\alpha}{\sqrt n} + o\Big(\frac{1}{\sqrt n}\Big),$$
where $y_n = p/n$, $\lim_{n\to\infty} y_n = y \in (0,1)$ and $\xi_\alpha$ is the $1-\alpha$ quantile of the standard normal distribution.
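The expansion in Lemma 2.1 is easy to evaluate. The sketch below computes the right-hand side, that is, the approximate rejection threshold for $T^2/n$. The function name `lemma21_threshold` is a hypothetical illustration. The sketch uses only `statistics.NormalDist` from the Python standard library to obtain $\xi_\alpha$. It does not compute exact $F$ quantiles.

```python
from math import sqrt
from statistics import NormalDist


def lemma21_threshold(n, p, alpha):
    """Approximate p/(n-p+1) * F_alpha(p, n-p+1) by the expansion of Lemma 2.1."""
    y = p / n
    if not 0.0 < y < 1.0:
        raise ValueError("Lemma 2.1 assumes 0 < p/n < 1")
    xi = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha quantile of N(0, 1)
    return y / (1.0 - y) + sqrt(2.0 * y / (1.0 - y) ** 3) * xi / sqrt(n)


if __name__ == "__main__":
    # The threshold for T^2/n grows quickly as y = p/n approaches one.
    for p in (10, 50, 90):
        print(p, round(lemma21_threshold(100, p, 0.05), 3))
```

Both terms blow up as $y \to 1$ because of the factors $(1-y)^{-1}$ and $(1-y)^{-3/2}$. This is the first sign of the effect of high dimension on Hotelling's test.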

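Proof. Under the null hypothesis, by the Central Limit Theorem,
$$\sqrt{\frac{n(1-y_n)^3}{2y_n}}\left(\frac{T^2}{n} - \frac{y_n}{1-y_n}\right) \to N(0,1) \quad \text{as } n \to \infty,$$
from which the result follows immediately.

Now, we consider the behavior of $T^2/n$ under $H_1$. In this case, its distribution is the same as that of
$$(w + \tau^{-1/2}\delta)'\,U^{-1}(w + \tau^{-1/2}\delta), \qquad (2.2)$$
where $\delta = \Sigma^{-1/2}(\mu_1 - \mu_2)$, $U = \sum_{i=1}^n u_iu_i'$ and $\tau = (N_1 + N_2)/(N_1N_2)$. Here $w = (w_1, \ldots, w_p)'$ and the $u_i$, $i = 1, \ldots, n$, are i.i.d. $N(0, I_p)$ random vectors. Denote the spectral decomposition of $U^{-1}$ by $O'\,\mathrm{diag}[d_1, \ldots, d_p]\,O$, with eigenvalues $d_1 \ge \cdots \ge d_p > 0$. Then (2.2) becomes
$$(Ow + \tau^{-1/2}\|\delta\| v)'\,\mathrm{diag}[d_1, \ldots, d_p]\,(Ow + \tau^{-1/2}\|\delta\| v), \qquad (2.3)$$
where $v = O\delta/\|\delta\|$. Since $U$ has the Wishart distribution $W(n, I_p)$, the orthogonal matrix $O$ has the Haar distribution on the group of all $p \times p$ orthogonal matrices. Hence the vector $v$ is uniformly distributed on the unit $p$-sphere. Note that the conditional distribution of $Ow$ given $O$ is $N(0, I_p)$, the same as that of $w$, which is independent of $O$. This shows that $Ow$ is independent of $v$. Therefore, replacing $Ow$ in (2.3) by $w$ does not change the joint distribution of $Ow$, $v$ and the $d_i$'s. Consequently, $T^2/n$ has the same distribution as
$$\sum_{i=1}^p \left(w_i^2 + 2w_iv_i\tau^{-1/2}\|\delta\| + \tau^{-1}\|\delta\|^2v_i^2\right)d_i, \qquad (2.4)$$
where $v = (v_1, \ldots, v_p)'$ is uniformly distributed on the unit sphere of $R^p$ and is independent of $w$ and the $d_i$'s.

Lemma 2.2. Using the above notation, we have
$$\sqrt n\left(\sum_{i=1}^p d_i - \frac{y_n}{1-y_n}\right) \to 0 \quad\text{and}\quad n\sum_{i=1}^p d_i^2 \to \frac{y}{(1-y)^3} \quad \text{in probability.}$$

Proof. Recall (2.4) with $\delta = 0$ under the null hypothesis. Applying the Central Limit Theorem with $D = \{d_1, \ldots, d_p\}$ given, we have
$$P\left(\frac{T^2}{n} \le \frac{y_n}{1-y_n} + \sqrt{\frac{2y_n}{n(1-y_n)^3}}\,x\right) = E\left[P\left(\frac{\sum_{i=1}^p(w_i^2-1)d_i}{\sqrt{2\sum_{i=1}^p d_i^2}} \le \frac{\sqrt n\big(\frac{y_n}{1-y_n} - \sum_{i=1}^p d_i\big) + \sqrt{\frac{2y_n}{(1-y_n)^3}}\,x}{\sqrt{2n\sum_{i=1}^p d_i^2}} \,\middle|\, D\right)\right]$$
$$= E\,\Phi\left(\frac{\sqrt n\big(\frac{y_n}{1-y_n} - \sum_{i=1}^p d_i\big) + \sqrt{\frac{2y_n}{(1-y_n)^3}}\,x}{\sqrt{2n\sum_{i=1}^p d_i^2}}\right) + o(1), \qquad (2.5)$$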

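where $\Phi$ is the distribution function of a standard normal variable. On the other hand, as shown in the proof of Lemma 2.1, the Central Limit Theorem implies that the above quantity tends to $\Phi(x)$ for all $x$. Hence the lemma follows from the type-convergence theorem (see page 216 of Loève (1977)).

Now we are in a position to derive an approximation of the power function of Hotelling's test.

Theorem 2.1. Suppose $y_n = p/n \to y \in (0, 1)$, $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$ and $\|\delta\| = o(1)$. Then
$$\beta_H(\delta) - \Phi\left(-\xi_\alpha + \sqrt{\frac{n(1-y)}{2y}}\,\kappa(1-\kappa)\|\delta\|^2\right) \to 0, \qquad (2.6)$$
where $\beta_H(\delta)$ is the power function of Hotelling's test.

Remark 2.1. In limit theorems, the alternative hypothesis is usually taken to satisfy $\sqrt n\,\|\delta\|^2 \to a > 0$. Under this additional assumption, it follows from (2.6) that the limiting power of Hotelling's test is $\Phi\big(-\xi_\alpha + \sqrt{(1-y)/(2y)}\,\kappa(1-\kappa)a\big)$. This formula shows that, for $y$ close to 1, the limiting power of Hotelling's test increases only slowly as the non-central parameter $a$ increases.

Proof. Write $D = (d_1, \ldots, d_p)$. Using the facts $Ev_1^2 = 1/p$, $Ev_1^4 = 3/[p(p+2)]$ and $Ev_1^2v_2^2 = 1/[p(p+2)]$, and then applying Lemma 2.2, one easily obtains
$$E\left[\Big(\sqrt n\sum_{i=1}^p 2w_iv_id_i\,\tau^{-1/2}\|\delta\|\Big)^2 \,\Big|\, D\right] = \frac{4n\|\delta\|^2}{\tau p}\sum_{i=1}^p d_i^2 \to 0 \quad \text{in Pr.}, \qquad (2.7)$$
$$E\left[\Big(\sqrt n\sum_{i=1}^p (v_i^2 - Ev_i^2)\,\tau^{-1}\|\delta\|^2 d_i\Big)^2 \,\Big|\, D\right] = \frac{n\|\delta\|^4}{\tau^2}\left[\frac{2}{p(p+2)}\sum_{i=1}^p d_i^2 - \frac{2}{p^2(p+2)}\Big(\sum_{i=1}^p d_i\Big)^2\right] \to 0 \quad \text{in Pr.}, \qquad (2.8)$$
and
$$\sum_{i=1}^p (Ev_i^2)\,\tau^{-1}\|\delta\|^2 d_i = \frac{\|\delta\|^2}{p\tau}\sum_{i=1}^p d_i = \frac{y_n\|\delta\|^2}{p\tau(1-y_n)}\Big(1 + o_p\big(\tfrac{1}{\sqrt n}\big)\Big). \qquad (2.9)$$
Thus, by the above and Lemma 2.1, we have
$$\beta_H(\delta) = P\left(\sum_{i=1}^p w_i^2d_i \ge \frac{y_n}{1-y_n} + \sqrt{\frac{2y_n}{n(1-y_n)^3}}\,\xi_\alpha - \frac{y_n\|\delta\|^2}{p\tau(1-y_n)} + o_p\Big(\frac{1}{\sqrt n}\Big)\right)$$
$$= E\left[P\left(\frac{\sum_{i=1}^p (w_i^2-1)d_i}{\sqrt{2\sum_{i=1}^p d_i^2}} \ge \frac{\sqrt{\frac{2y_n}{(1-y_n)^3}}\,\xi_\alpha + \sqrt n\big(\frac{y_n}{1-y_n} - \sum_{i=1}^p d_i\big) - \frac{\sqrt n\,y_n\|\delta\|^2}{p\tau(1-y_n)} + o_p(1)}{\sqrt{2n\sum_{i=1}^p d_i^2}} \,\middle|\, D\right)\right]$$
$$= \Phi\left(-\xi_\alpha + \sqrt{\frac{n(1-y)}{2y}}\,\kappa(1-\kappa)\|\delta\|^2\right) + o(1). \qquad (2.10)$$
The proof of Theorem 2.1 is now complete.

3. Discussion on Dempster's Non-Exact Test

Dempster (1958, 1960) proposed a non-exact test for the hypothesis described in Section 2. In his test the dimension of the data may be greater than the sample degrees of freedom. First, let us briefly describe his test. Denote $N = N_1 + N_2$ and $X' = (x_{11}, x_{12}, \ldots, x_{1N_1}, x_{21}, \ldots, x_{2N_2})$. Let
$$H' = \left(\frac{1}{\sqrt N}J_N,\ \Big(\sqrt{\tfrac{N_2}{NN_1}}\,J_{N_1}',\ -\sqrt{\tfrac{N_1}{NN_2}}\,J_{N_2}'\Big)',\ h_3, \ldots, h_N\right)$$
be a suitably chosen orthogonal matrix, where $J_d$ is a $d$-dimensional column vector of 1's. Let $Y = HX = (y_1, \ldots, y_N)'$. Then $y_1, \ldots, y_N$ are independent normal random vectors with $E(y_1) = (N_1\mu_1 + N_2\mu_2)/\sqrt N$, $E(y_2) = \tau^{-1/2}(\mu_1 - \mu_2)$, $E(y_j) = 0$ for $3 \le j \le N$, and $\mathrm{Cov}(y_j) = \Sigma$ for $1 \le j \le N$. Dempster proposed the non-exact significance test statistic
$$F = \frac{Q_2}{\sum_{i=3}^N Q_i/n},$$
where $Q_i = y_i'y_i$ and $n = N - 2$. He used the so-called $\chi^2$ approximation technique. That is, he assumed that $Q_i$ is approximately distributed as $m\chi^2_r$, with the parameters $m$ and $r$ determined by the method of moments. Then the distribution of $F$ is approximately $F_{r, nr}$. In general, however, the parameter $r$ (whose explicit form is given in (3.3) below) is unknown. He estimated $r$ in either of the following two ways.

Approach 1: $\hat r_1$ is the solution of the equation
$$t = \left[\frac{1}{\hat r_1} + \frac{1}{3\hat r_1^2}\Big(1 + \frac1n\Big)\right](n - 1). \qquad (3.1)$$

Approach 2: $\hat r_2$ is the solution of the analogous equation (3.2). In that equation, $t + w$ is set equal to a second order approximation of its expectation, written as a function of $\hat r_2$.

Here $t = n\log\big(\frac1n\sum_{i=3}^N Q_i\big) - \sum_{i=3}^N \log Q_i$ and $w = -\sum_{3 \le i < j \le N}\log\sin\theta_{ij}$, where $\theta_{ij}$ is the angle between the vectors $y_i$ and $y_j$, $3 \le i < j \le N$. Dempster's test rejects $H_0$ if $F > F_\alpha(\hat r, n\hat r)$. By elementary calculus, we have
$$r = \frac{(\mathrm{tr}\,\Sigma)^2}{\mathrm{tr}(\Sigma^2)} \quad\text{and}\quad m = \frac{\mathrm{tr}(\Sigma^2)}{\mathrm{tr}\,\Sigma}. \qquad (3.3)$$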
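Formula (3.3) is the moment-matching solution of $mr = \mathrm{tr}\,\Sigma$ and $2m^2r = 2\,\mathrm{tr}\,\Sigma^2$, the mean and variance of $Q_i$ under normality. The sketch below evaluates $r$ and $m$ from the eigenvalues of $\Sigma$ and computes Dempster's $F$ from two samples. It relies on an assumption not stated in the text: the standard analysis-of-variance identity $Q_2 = \frac{N_1N_2}{N}\|\bar x_1 - \bar x_2\|^2$ and $\sum_{i\ge 3}Q_i = \mathrm{tr}\,A$, which holds for any orthogonal $H$ with the first two rows given above. The function names are hypothetical.

```python
def dempster_r_m(eigenvalues):
    """r and m of (3.3): r = (tr Sigma)^2 / tr(Sigma^2), m = tr(Sigma^2) / tr Sigma."""
    tr1 = sum(eigenvalues)
    tr2 = sum(lam * lam for lam in eigenvalues)
    return tr1 * tr1 / tr2, tr2 / tr1


def dempster_f(sample1, sample2):
    """Dempster's F = Q_2 / (sum_{i>=3} Q_i / n).

    Assumption (standard ANOVA identity, not stated in the text): because H is
    orthogonal, Q_2 = N1 N2 / N * ||xbar1 - xbar2||^2 and sum_{i>=3} Q_i = tr(A),
    so F itself does not depend on the choice of h_3, ..., h_N.
    """
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2 - 2
    p = len(sample1[0])
    means = [[sum(x[k] for x in s) / len(s) for k in range(p)] for s in (sample1, sample2)]
    q2 = n1 * n2 / (n1 + n2) * sum((a - b) ** 2 for a, b in zip(*means))
    tr_a = sum(
        (x[k] - m[k]) ** 2
        for s, m in zip((sample1, sample2), means)
        for x in s
        for k in range(p)
    )
    return q2 / (tr_a / n)


if __name__ == "__main__":
    print(dempster_r_m([1.0] * 40))               # Sigma = I_40: r = p = 40, m = 1
    print(dempster_r_m([0.5] * 39 + [20.5]))      # (1 - rho) I + rho J, rho = 0.5
```

With $\Sigma = I_p$ the bound $r \le p$ (see the next paragraph) is attained. For the equicorrelated matrix, $r$ drops to about 3.7 even though $p = 40$. In this sketch the choice of $h_3, \ldots, h_N$ matters only for the estimation of $r$, not for $F$.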

From (3.3) and the Cauchy-Schwarz inequality, it follows that $r \le p$. On the other hand, under regularity conditions, both $\mathrm{tr}(\Sigma)$ and $\mathrm{tr}(\Sigma^2)$ are of order $O(n)$, and hence $r$ is of the same order. Under the wider conditions (3.7) and (3.8), given in Theorem 3.1 below, it can be proved that $r \to \infty$. Further, we may prove that $t \approx (n/r)\,N(1 - 1/n,\ 2/n)$. Likewise, $w$ scaled by a factor of order $r/n^2$ is asymptotically normal with mean close to one. From these estimates, one may conclude that both $\hat r_1$ and $\hat r_2$ are ratio-consistent, in the sense that $\hat r/r \to 1$. Therefore, the solution of equation (3.1) should satisfy
$$\hat r_1 = \frac{n}{t} + O(1). \qquad (3.4)$$
Similarly, the solution of (3.2) equals an explicit multiple of $n^2/w$, up to an error of order $O(1)$ (3.5). Since the random effect may cause an error of order $O(1)$, one may simply estimate $r$ by $n/t$ or by the corresponding multiple of $n^2/w$.

In the remainder of this section, we derive an asymptotic power function of Dempster's non-exact test under the following conditions: $p/n \to y > 0$, $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$, and the parameter $r$ is known. The reader should note that the limiting ratio $y$ is allowed to be greater than one here, unlike the assumption in Section 2. When $r$ is unknown, replacing it by the estimator $\hat r_1$ or $\hat r_2$ causes only an error of higher order smallness in the approximation of the power function. This will be seen in the proof of Theorem 3.1. As for Lemma 2.1, one may show the following lemma.

Lemma 3.1. When $r \to \infty$,
$$F_\alpha(r, nr) = 1 + \sqrt{2/r}\,\xi_\alpha + o(1/\sqrt r). \qquad (3.6)$$

We then have the following approximation of the power function of Dempster's test.

Theorem 3.1. Suppose
$$\mu'\Sigma\mu = o(\tau\,\mathrm{tr}\,\Sigma^2), \qquad (3.7)$$
$$\lambda_{\max} = o\big(\sqrt{\mathrm{tr}\,\Sigma^2}\big), \qquad (3.8)$$
and $r$ is known. Then
$$\beta_D(\mu) - \Phi\left(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\right) \to 0, \qquad (3.9)$$
where $\mu = \mu_1 - \mu_2$ and $\lambda_{\max}$ is the largest eigenvalue of $\Sigma$.
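The two limiting power formulas (2.6) and (3.9) can be compared numerically. The sketch below evaluates both with `statistics.NormalDist`. It assumes $\Sigma = I_p$, so that $\delta = \mu_1 - \mu_2$ and $\mathrm{tr}\,\Sigma^2 = p$, and it uses the sample sizes of Section 5 purely for illustration. These are asymptotic approximations, not finite-sample powers, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def hotelling_power(n, y, kappa, delta_sq, alpha=0.05):
    """Asymptotic power (2.6); delta_sq = ||Sigma^{-1/2}(mu1 - mu2)||^2."""
    xi = _STD.inv_cdf(1.0 - alpha)
    shift = sqrt(n * (1.0 - y) / (2.0 * y)) * kappa * (1.0 - kappa) * delta_sq
    return _STD.cdf(-xi + shift)


def dempster_power(n, kappa, mu_sq, tr_sigma2, alpha=0.05):
    """Asymptotic power (3.9); mu_sq = ||mu1 - mu2||^2."""
    xi = _STD.inv_cdf(1.0 - alpha)
    return _STD.cdf(-xi + n * kappa * (1.0 - kappa) * mu_sq / sqrt(2.0 * tr_sigma2))


if __name__ == "__main__":
    # Sigma = I_p (assumption), so delta = mu and tr(Sigma^2) = p.
    n1, n2, p = 25, 20, 40
    n = n1 + n2 - 2
    kappa, y = n1 / (n1 + n2), p / n
    for mu_sq in (0.0, 0.5, 1.0, 2.0):
        print(mu_sq,
              round(hotelling_power(n, y, kappa, mu_sq), 3),
              round(dempster_power(n, kappa, mu_sq, p), 3))
```

With $\Sigma = I_p$ the Hotelling shift equals the Dempster shift multiplied by $\sqrt{1-y}$. With $y = 40/43$ that factor is about 0.26, which is the effect discussed in Remark 3.1.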

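Remark 3.1. In the usual asymptotic analysis of the power of Dempster's test, $\|\mu\|^2$ is ordinarily assumed to be of order $1/\sqrt n$, and $\mathrm{tr}(\Sigma^2)$ to be of order $n$. Thus, the quantities $n\|\mu\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$ and $\sqrt n\,\|\delta\|^2$ are both bounded away from zero and infinity. The expression for the asymptotic power of Hotelling's test contains a factor $\sqrt{1-y}$, which is absent from the expression for the asymptotic power of Dempster's test. This explains why, when $y$ is close to one, the power of Hotelling's test increases much more slowly than that of Dempster's test as the non-central parameter increases.

Proof. Let $\nu = (\nu_1, \ldots, \nu_p)'$ denote the coordinates of $\mu_1 - \mu_2$ in an orthonormal basis of eigenvectors of $\Sigma$. Then
$$\beta_D(\mu) = P\left(\frac{\sum_{i=1}^p\big(\lambda_iy_i^2 + 2\tau^{-1/2}\nu_i\sqrt{\lambda_i}\,y_i + \tau^{-1}\nu_i^2\big)}{n^{-1}\sum_{i=1}^p\sum_{j=1}^n\lambda_iz_{ij}^2} > F_\alpha(r, nr)\right), \qquad (3.10)$$
where $y_i$, $z_{ij}$, $i = 1, \ldots, p$, $j = 1, \ldots, n$, are i.i.d. $N(0,1)$ variables and $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\Sigma$. By the Central Limit Theorem, the laws of large numbers, (3.7) and (3.8), one may easily show that
$$\frac{\sum_{i=1}^p\big(\lambda_i(y_i^2 - 1) + 2\tau^{-1/2}\nu_i\sqrt{\lambda_i}\,y_i\big)}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} = \frac{\sum_{i=1}^p\big(\lambda_i(y_i^2 - 1) + 2\tau^{-1/2}\nu_i\sqrt{\lambda_i}\,y_i\big)}{\sqrt{2\,\mathrm{tr}\,\Sigma^2 + 4\tau^{-1}\mu'\Sigma\mu}} + o_p(1) \xrightarrow{D} N(0,1) \qquad (3.11)$$
and
$$\frac1n\sum_{i=1}^p\sum_{j=1}^n z_{ij}^2\lambda_i = (\mathrm{tr}\,\Sigma)\left[1 + \sqrt{\frac{2}{nr}}\,N(0,1) + o_p\Big(\sqrt{\frac{1}{nr}}\Big)\right]. \qquad (3.12)$$
Note that $\sum_{i=1}^p\nu_i^2 = \|\mu\|^2$ and $r = (\mathrm{tr}\,\Sigma)^2/\mathrm{tr}\,\Sigma^2$. The result (3.9) then follows immediately from (3.7) and Lemma 3.1. The proof of Theorem 3.1 is now complete.

4. A New Approach to Test $H_0$

In this section, we propose a new test for $H_0$. Instead of the normality of the underlying distributions, we assume:

(a) $x_{ij} = \Gamma z_{ij} + \mu_j$, $i = 1, \ldots, N_j$, $j = 1, 2$, where $\Gamma$ is a $p \times m$ matrix ($m \le \infty$) with $\Gamma\Gamma' = \Sigma$. The $z_{ij}$ are i.i.d. random $m$-vectors with independent components satisfying $Ez_{ij} = 0$, $\mathrm{Var}(z_{ij}) = I_m$ and $Ez_{ijk}^4 = 3 + \Delta < \infty$. Moreover, whenever $a_1 + \cdots + a_m = 4$, $E\prod_{k=1}^m z_{ijk}^{a_k} = 0$ if at least one $a_k$ equals 1, and $E\prod_{k=1}^m z_{ijk}^{a_k} = 1$ if two of the $a_k$'s equal 2;
(b) $p/n \to y > 0$ and $N_1/(N_1 + N_2) \to \kappa \in (0, 1)$;
(c) (3.7) and (3.8) are true.

Here and below, all random variables and parameters depend on $n$. For simplicity, we omit the subscript $n$ from all random variables except the statistics defined later.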
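Condition (3.8) can be checked numerically for a given covariance structure. The sketch below computes $\lambda_{\max}/\sqrt{\mathrm{tr}\,\Sigma^2}$ for three structures as $p$ grows: the identity, the moving-average (tridiagonal) structure of Section 5, and the equicorrelated matrix $(1-\rho)I_p + \rho J_p$. The eigenvalue formulas are standard closed forms for these matrices. The function names are hypothetical. The conclusion that the equicorrelated case does not satisfy (3.8) is an observation drawn from these formulas, not a claim made in the paper.

```python
from math import cos, pi, sqrt


def condition_38_ratio(eigenvalues):
    """lambda_max / sqrt(tr Sigma^2); condition (3.8) requires this to tend to 0."""
    return max(eigenvalues) / sqrt(sum(lam * lam for lam in eigenvalues))


def identity_eigs(p):
    return [1.0] * p


def compound_symmetry_eigs(p, rho):
    """Eigenvalues of (1 - rho) I_p + rho J_p."""
    return [1.0 - rho] * (p - 1) + [1.0 - rho + p * rho]


def tridiagonal_eigs(p, rho):
    """Eigenvalues of Sigma with sigma_ii = 4(1 + rho^2) and sigma_{i,i+1} = 4 rho."""
    return [4.0 * (1.0 + rho * rho + 2.0 * rho * cos(k * pi / (p + 1)))
            for k in range(1, p + 1)]


if __name__ == "__main__":
    for p in (40, 400, 4000):
        print(p,
              round(condition_38_ratio(identity_eigs(p)), 4),
              round(condition_38_ratio(tridiagonal_eigs(p, 0.5)), 4),
              round(condition_38_ratio(compound_symmetry_eigs(p, 0.5)), 4))
```

The first two ratios decay like $p^{-1/2}$. The equicorrelated ratio stays near one, because a single eigenvalue of order $p$ dominates $\mathrm{tr}\,\Sigma^2$.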

We now construct our test. Consider the statistic
$$M_n = (\bar x_1 - \bar x_2)'(\bar x_1 - \bar x_2) - \tau\,\mathrm{tr}\,S_n, \qquad (4.1)$$
where $S_n = \frac1n A$, and $\bar x_1$, $\bar x_2$ and $A$ are defined in Section 2. Under $H_0$, we have $EM_n = 0$. If conditions (a)-(c) hold, it may be proved (see the Appendix) that under $H_0$
$$Z_n = \frac{M_n}{\sqrt{\mathrm{Var}\,M_n}} \to N(0, 1) \quad \text{as } n \to \infty. \qquad (4.2)$$
If the underlying distributions are normal as described in Section 2, then under $H_0$ we have
$$\sigma_M^2 := \mathrm{Var}\,M_n = 2\tau^2\Big(1 + \frac1n\Big)\mathrm{tr}\,\Sigma^2. \qquad (4.3)$$
If the underlying distributions are not normal but satisfy conditions (a)-(c), one may show (see the Appendix) that
$$\mathrm{Var}\,M_n = \sigma_M^2\,(1 + o(1)). \qquad (4.4)$$
Hence (4.2) still holds if the denominator of $Z_n$ is replaced by $\sigma_M$. To complete the construction of our test statistic, we therefore need only a ratio-consistent estimator of $\mathrm{tr}(\Sigma^2)$ to substitute into the denominator of $Z_n$. A natural estimator of $\mathrm{tr}\,\Sigma^2$ might seem to be $\mathrm{tr}\,S_n^2$. However, unlike the case where $p$ is fixed, $\mathrm{tr}\,S_n^2$ is generally neither unbiased nor ratio-consistent, even under the normality assumption. If $nS_n \sim W_p(n, \Sigma)$, it is routine to verify that
$$B_n^2 = \frac{n^2}{(n+2)(n-1)}\left(\mathrm{tr}\,S_n^2 - \frac1n(\mathrm{tr}\,S_n)^2\right)$$
is an unbiased and ratio-consistent estimator of $\mathrm{tr}\,\Sigma^2$. Note that $\mathrm{tr}\,S_n^2 - \frac1n(\mathrm{tr}\,S_n)^2 \ge 0$ by the Cauchy-Schwarz inequality, since $S_n$ has at most $n$ nonzero eigenvalues. In the Appendix, we prove that $B_n^2$ remains a ratio-consistent estimator of $\mathrm{tr}\,\Sigma^2$ under conditions (a)-(c). Replacing $\mathrm{tr}\,\Sigma^2$ in (4.3) by $B_n^2$, we obtain our test statistic
$$Z = \frac{(\bar x_1-\bar x_2)'(\bar x_1-\bar x_2) - \tau\,\mathrm{tr}\,S_n}{\tau\sqrt{\frac{2n(n+1)}{(n+2)(n-1)}\left(\mathrm{tr}\,S_n^2 - n^{-1}(\mathrm{tr}\,S_n)^2\right)}} = \frac{\frac{N_1N_2}{N_1+N_2}(\bar x_1-\bar x_2)'(\bar x_1-\bar x_2) - \mathrm{tr}\,S_n}{\sqrt{\frac{2(n+1)}{n}B_n^2}} \to N(0,1). \qquad (4.5)$$
By (4.5), the test rejects $H_0$ if $Z > \xi_\alpha$. For the asymptotic power of the new test, we have the following theorem.
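The statistic $Z$ in (4.5) needs only $\bar x_1 - \bar x_2$, $\mathrm{tr}\,S_n$ and $\mathrm{tr}\,S_n^2$. No matrix inverse is involved, so it remains defined when $p > n$. The following pure-Python sketch uses the second form of (4.5). The function names `bs_statistic` and `bs_test` are hypothetical. The usage example draws normal data with an arbitrary mean shift of 0.3 per coordinate, chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def bs_statistic(sample1, sample2):
    """Z of (4.5): [N1 N2/N ||xbar1 - xbar2||^2 - tr S_n] / sqrt(2 (n+1)/n * B_n^2)."""
    n1, n2 = len(sample1), len(sample2)
    p = len(sample1[0])
    n = n1 + n2 - 2
    means = [[sum(x[k] for x in s) / len(s) for k in range(p)] for s in (sample1, sample2)]
    # Pooled covariance S_n = A / n; it exists even when p > n.
    s_n = [[0.0] * p for _ in range(p)]
    for sample, m in zip((sample1, sample2), means):
        for x in sample:
            d = [x[k] - m[k] for k in range(p)]
            for r in range(p):
                for c in range(p):
                    s_n[r][c] += d[r] * d[c] / n
    tr_s = sum(s_n[k][k] for k in range(p))
    tr_s2 = sum(v * v for row in s_n for v in row)  # tr(S_n^2), since S_n is symmetric
    b2 = n * n / ((n + 2) * (n - 1)) * (tr_s2 - tr_s * tr_s / n)
    diff_sq = sum((a - b) ** 2 for a, b in zip(*means))
    numerator = n1 * n2 / (n1 + n2) * diff_sq - tr_s
    return numerator / sqrt(2.0 * (n + 1) / n * b2)


def bs_test(sample1, sample2, alpha=0.05):
    """Return (Z, reject) where reject means Z exceeds the upper-alpha normal quantile."""
    z = bs_statistic(sample1, sample2)
    return z, z > NormalDist().inv_cdf(1.0 - alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    # p = 60 exceeds n = 43, where Hotelling's T^2 is undefined.
    x1 = [[rng.gauss(0.0, 1.0) for _ in range(60)] for _ in range(25)]
    x2 = [[rng.gauss(0.3, 1.0) for _ in range(60)] for _ in range(20)]
    print(bs_test(x1, x2))
```

The statistic is unchanged when the two samples are swapped or when the same vector is added to every observation, because it depends only on $\bar x_1 - \bar x_2$ and the within-sample deviations.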

Theorem 4.1. Under conditions (a)-(c),
$$\beta_{BS}(\mu) - \Phi\left(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\right) \to 0. \qquad (4.6)$$

Proof. Let $\bar z_j$ be the sample mean of $z_{ij}$, $i = 1, \ldots, N_j$, $j = 1, 2$, and let $M_0 = (\bar z_1 - \bar z_2)'\Gamma'\Gamma(\bar z_1 - \bar z_2) - \tau\,\mathrm{tr}(S_n)$. Then $M_0$ has the same distribution as $M_n$ under $H_0$. Thus $\mathrm{Var}(M_0) = \sigma_M^2(1 + o(1))$ and $M_0/\sqrt{\mathrm{Var}(M_0)} \to N(0,1)$. Note that $M_n = M_0 + 2\mu'\Gamma(\bar z_1 - \bar z_2) + \|\mu\|^2$. By (3.7), $\mathrm{Var}(\mu'\Gamma(\bar z_1 - \bar z_2)) = \tau\mu'\Sigma\mu = o(\tau^2\,\mathrm{tr}(\Sigma^2))$. Hence $\mathrm{Var}(M_0)/\mathrm{Var}(M_n) \to 1$, and consequently
$$\frac{M_n - \|\mu\|^2}{\sqrt{\mathrm{Var}(M_0)}} \to N(0,1).$$
Note that $\frac{2(n+1)}{n}\tau^2B_n^2/\mathrm{Var}(M_0) \to 1$ in probability. Hence
$$Z - \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}(\Sigma^2)}} \to N(0,1).$$
This implies that
$$\beta_{BS}(\mu) = P_{H_1}(Z > \xi_\alpha) = P\left(\frac{M_n - \|\mu\|^2}{\sqrt{\mathrm{Var}\,M_0}} > \xi_\alpha - \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} + o(1)\right) = \Phi\left(-\xi_\alpha + \frac{n\kappa(1-\kappa)\|\mu\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}}\right) + o(1), \qquad (4.7)$$
which completes the proof of the theorem.

5. Discussions and Simulations

Comparing Theorems 2.1, 3.1 and 4.1, we find two things from the point of view of large sample theory. Hotelling's test is less powerful than the other two tests when $y$ is close to one, and the latter two tests have the same asymptotic power function. Our simulation results show that even for moderate sample sizes and dimensions, Hotelling's test is still less powerful than the other two tests when the underlying covariance structure is reasonably regular. By this we mean that the structure of $\Sigma$ does not cause too large a difference between $\mu'\Sigma^{-1}\mu$ and $\sqrt p\,\|\mu\|^2/\sqrt{\mathrm{tr}(\Sigma^2)}$. At the same time, the Type I error does not change much in the latter two tests. Using the approach of this paper, one may easily derive similar results for the one-sample problem. Namely, when the dimension of the data is high, Hotelling's test is less powerful than a non-exact test that can be defined as in Section 4.

We now explain why this phenomenon happens. The lower power of Hotelling's test is due to the "inaccuracy" of the estimator of the covariance matrix. Let $X_1, \ldots, X_n$ be i.i.d. random $p$-vectors with mean 0 and variance-covariance matrix $I_p$. When $p$ is fixed, the law of large numbers implies that the sample covariance matrix $S_n = n^{-1}\sum_{i=1}^n X_iX_i'$ is "close" to the identity $I_p$, with an error of order $O_p(1/\sqrt n)$. However, when $p$ is proportional to $n$ (say $p/n \to y \in (0,1)$), the ratio of the largest to the smallest eigenvalue of $S_n$ tends to $(1 + \sqrt y)^2/(1 - \sqrt y)^2$. See, e.g., Bai, Silverstein and Yin (1988), Bai and Yin (1993), Geman (1980), Silverstein (1985) and Yin, Bai and Krishnaiah (1988). More precisely, the theory of spectral analysis of large dimensional random matrices shows the following: as $n \to \infty$, the empirical distribution of the eigenvalues of $S_n$ tends to a limiting distribution spread over $[(1 - \sqrt y)^2, (1 + \sqrt y)^2]$. See, e.g., Jonsson (1982), Wachter (1978), Yin (1986) and Yin, Bai and Krishnaiah (1983). These results show that $S_n$ is not close to $I_p$. In particular, when $y$ is "close to" one, $S_n$ has many small eigenvalues, and hence $S_n^{-1}$ has many huge eigenvalues. This causes the deficiency of the $T^2$ test. We believe that the same phenomenon exists in many other multivariate statistical inferences that involve the inverse of a sample covariance matrix; for another example, see Saranadasa (1991, 1993).

We now explain why "close to" one appears in quotation marks. The limiting ratio of the largest to the smallest eigenvalue of $S_n$ is $(1 + \sqrt y)^2/(1 - \sqrt y)^2$. In our simulation example, $y = 0.93$, so the ratio of the extreme eigenvalues is about $(1+\sqrt{0.93})^2/(1-\sqrt{0.93})^2 \approx 3.0 \times 10^3$. That is very serious. Even for $y$ as small as 0.1 or 0.01, the ratio is as large as 3.705 and 1.494, respectively.
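The quoted ratios follow directly from the limiting support $[(1-\sqrt y)^2, (1+\sqrt y)^2]$. The small sketch below evaluates the endpoints and their ratio. The function names are hypothetical. It evaluates the limiting formula only and does not simulate sample covariance matrices.

```python
from math import sqrt


def support_edges(y):
    """Endpoints of the limiting spectral support [(1 - sqrt y)^2, (1 + sqrt y)^2]."""
    return (1.0 - sqrt(y)) ** 2, (1.0 + sqrt(y)) ** 2


def extreme_eigen_ratio(y):
    """Limit of lambda_max(S_n) / lambda_min(S_n) when p/n -> y in (0, 1)."""
    if not 0.0 < y < 1.0:
        raise ValueError("need 0 < y < 1")
    lo, hi = support_edges(y)
    return hi / lo


if __name__ == "__main__":
    for y in (0.93, 0.1, 0.01):
        lo, hi = support_edges(y)
        print(y, round(lo, 4), round(hi, 4), round(extreme_eigen_ratio(y), 3))
```

For $y = 0.93$ the lower edge is about $1.3 \times 10^{-3}$. This is why $S_n^{-1}$ has such large eigenvalues in the simulation setting.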
These figures show that the effect of high dimension can become visible even when the dimension of the data is not very close to the degrees of freedom. In fact, this has been shown by our simulation for $p = 4$.

Dempster's test statistic depends on the choice of the vectors $h_3, h_4, \ldots, h_N$, because different choices of these vectors produce different estimates of the parameter $r$. Moreover, the estimation of $r$ and the rounding of the estimates may cause an error in Dempster's test, probably of second order smallness. We therefore conjecture that our new test can be more powerful than Dempster's in the second terms of Edgeworth-type expansions of their power functions. Our simulation results strongly support this conjecture. Because our test statistic is mathematically simple, it is not difficult to obtain an Edgeworth expansion for it by using the results of Babu and Bai (1993), Bai and Rao (1991) or Bhattacharya and Ghosh (1978). A similar expansion for Dempster's test seems difficult to obtain because of its complicated estimation of $r$.

We conducted a simulation study to compare the power of the three tests for both normal and non-normal cases. Let $N_1 = 25$, $N_2 = 20$ and $p = 40$. For the non-normal case, observations were generated by the following moving average model. Let $\{U_{ijk}\}$ be a set of independent gamma variables with shape parameter 4 and scale parameter 1. Define $X_{ijk} = U_{ijk} + \rho U_{i,j+1,k} + \mu_{jk}$ ($j = 1, \ldots, p$, $i = 1, \ldots, N_k$, $k = 1, 2$), where $\rho$ and the $\mu$'s are constants. Under this model, $\Sigma = (\sigma_{ij})$ with $\sigma_{ii} = 4(1 + \rho^2)$, $\sigma_{i,i\pm1} = 4\rho$ and $\sigma_{ij} = 0$ for $|i - j| > 1$. For the normal case, the covariance matrices were chosen to be $\Sigma = I_p$ and $\Sigma = (1 - \rho)I_p + \rho J_p$ with $\rho = 0.5$, where $J_p$ is the $p \times p$ matrix with all entries one. Simulation was also conducted for a small $p$, namely $p = 4$. The tests were made at size $\alpha = 0.05$ with 1000 repetitions. The power is evaluated at the standardized parameter $\eta = \|\mu_1 - \mu_2\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$. The simulation for the non-normal case was conducted for $\rho = 0$, .3, .6 and .9 (Table 5.1 and Figure 5.1).

All three tests have almost the same significance level. Under the alternative hypothesis, the power curves of Dempster's test and our test are rather close, but that of our test is always higher. Theoretically, the power function of Hotelling's test should increase very slowly as the noncentral parameter increases, and our simulation results confirm this. Note that there are only 1000 repetitions for each value of the noncentral parameter in our simulation. By the Central Limit Theorem, this may cause an error of about $1/\sqrt{1000} \approx 0.0316$. It is therefore not surprising that the simulated power function of Hotelling's test, whose magnitude is only around 0.05, does not appear to increase at some values of the noncentral parameter.
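The two covariance designs and the moving-average generator can be sketched as follows. The generator name and its argument layout are hypothetical. `random.gammavariate(alpha, beta)` is the standard-library gamma sampler with shape `alpha` and scale `beta`. The closed-form eigenvalues of the tridiagonal Toeplitz matrix give $\lambda_{\max}/\lambda_{\min}$. For $p = 40$ and $\rho = .3, .6, .9$ this is roughly 3.4, 15.6 and 236, and for $(1-\rho)I_p + \rho J_p$ with $\rho = .5$ it is 41 ($p = 40$) and 5 ($p = 4$).

```python
import random
from math import cos, pi


def ma_gamma_sample(n_obs, p, rho, mu, rng):
    """n_obs draws of X_j = U_j + rho * U_{j+1} + mu_j, U i.i.d. Gamma(shape 4, scale 1)."""
    rows = []
    for _ in range(n_obs):
        u = [rng.gammavariate(4.0, 1.0) for _ in range(p + 1)]
        rows.append([u[j] + rho * u[j + 1] + mu[j] for j in range(p)])
    return rows


def ma_eigen_ratio(p, rho):
    """lambda_max / lambda_min of the tridiagonal Sigma (4(1 + rho^2) on, 4 rho off)."""
    eigs = [4.0 * (1.0 + rho * rho + 2.0 * rho * cos(k * pi / (p + 1)))
            for k in range(1, p + 1)]
    return max(eigs) / min(eigs)


def compound_eigen_ratio(p, rho):
    """lambda_max / lambda_min of (1 - rho) I_p + rho J_p."""
    return (1.0 - rho + p * rho) / (1.0 - rho)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print("MA", rho, round(ma_eigen_ratio(40, rho), 1))
    print("CS", compound_eigen_ratio(40, 0.5), compound_eigen_ratio(4, 0.5))
    sample = ma_gamma_sample(25, 40, 0.3, [0.0] * 40, random.Random(0))
    print(len(sample), len(sample[0]))
```

A common shift $\mu_{jk}$ cancels under $H_0$, so the generator leaves the mean vector to the caller. The ratio for $\rho = .9$ shows how far the gamma design moves away from a "reasonably regular" covariance structure.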
Similar tables are presented for the normal case (Table 5.2 and Figure 5.2). In the higher dimension cases, the power functions of Dempster's test and our test are almost the same. Our method is not worse than Hotelling's test even for $p = 4$.

Acknowledgement

The research of the first author was partially supported by US NSF grant DMS and partially by ROC NSC grant NSC M L.

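Table 5.1. Simulated power functions of the three tests with multivariate Gamma distribution; $N = 45$, $p = 40$, $\alpha = .05$. Column groups, each with columns H, D, BS: $\rho = 0$, $\lambda_{\max}/\lambda_{\min} = 1$; $\rho = .3$, $\lambda_{\max}/\lambda_{\min} = 3.4$; $\rho = .6$, $\lambda_{\max}/\lambda_{\min} = 15.6$; $\rho = .9$, $\lambda_{\max}/\lambda_{\min} = 235.8$.

Table 5.2. Simulated power functions of the three tests with multivariate normal distribution; $N = 45$, $\alpha = .05$. Column groups, each with columns H, D, BS: $p = 40$, $\rho = 0$, $\lambda_{\max}/\lambda_{\min} = 1$; $p = 40$, $\rho = .5$, $\lambda_{\max}/\lambda_{\min} = 41$; $p = 4$, $\rho = 0$, $\lambda_{\max}/\lambda_{\min} = 1$; $p = 4$, $\rho = .5$, $\lambda_{\max}/\lambda_{\min} = 5$.

H: Hotelling's F test; D: Dempster's non-exact F test; BS: proposed normal test. Power is tabulated against $\eta = \|\mu_1 - \mu_2\|^2/\sqrt{\mathrm{tr}\,\Sigma^2}$, and $\lambda_{\max}/\lambda_{\min}$ is the ratio of the largest to the smallest eigenvalue of $\Sigma$.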

[Figure 5.1. Simulated power functions of the three tests with multivariate Gamma distribution. Panels: ($p = 40$, $\rho = 0.0$), ($p = 40$, $\rho = 0.3$), ($p = 40$, $\rho = 0.6$), ($p = 40$, $\rho = 0.9$); vertical axis: Power; curves: Hotelling's test, Dempster's test, BS's test.]

[Figure 5.2. Simulated power functions of the three tests with multivariate normal distribution. Panels: ($p = 40$, $\rho = 0.0$), ($p = 40$, $\rho = 0.5$), ($p = 4$, $\rho = 0.0$), ($p = 4$, $\rho = 0.5$); vertical axis: Power.]

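Appendix. Asymptotics Related to the Statistic $M_n$

A.1. The proof of (4.4). By definition, we have
$$M_n = \big(1 + \tau N_1n^{-1}\big)\|\bar x_1\|^2 + \big(1 + \tau N_2n^{-1}\big)\|\bar x_2\|^2 - 2\bar x_1'\bar x_2 - \tau n^{-1}\sum_{j=1}^2\sum_{i=1}^{N_j}\|x_{ij}\|^2.$$
Under $H_0$, we may assume $\mu_1 = \mu_2 = 0$. Write $\Gamma = [\Gamma_1, \ldots, \Gamma_p]' = [\gamma_{k\ell}]$ and $\Gamma'\Gamma = [\theta_{k\ell}]$. Then, by conditions (a)-(c), we have
$$\mathrm{Var}\Big(\tau n^{-1}\sum_{j=1}^2\sum_{i=1}^{N_j}\|x_{ij}\|^2\Big) = \tau^2n^{-2}E\Big(\sum_{j=1}^2\sum_{i=1}^{N_j}\sum_{k=1}^p\big[(\Gamma_k'z_{ij})^2 - \|\Gamma_k\|^2\big]\Big)^2 = \tau^2n^{-2}N\Big[2\,\mathrm{tr}(\Sigma^2) + \Delta\sum_{\ell=1}^m\theta_{\ell\ell}^2\Big]$$
$$\le C\tau^2n^{-1}\big[\mathrm{tr}\,\Sigma^2 + \lambda_{\max}\mathrm{tr}\,\Sigma\big] = o(\sigma_M^2).$$
Similarly, we may show that
$$\mathrm{Var}(\bar x_1'\bar x_2) = \frac{\mathrm{tr}(\Sigma^2)}{N_1N_2},$$
$$\mathrm{Var}(\|\bar x_1\|^2) = \frac{2}{N_1^2}\mathrm{tr}(\Sigma^2) + \frac{\Delta}{N_1^3}\sum_{\ell=1}^m\theta_{\ell\ell}^2, \qquad \mathrm{Var}(\|\bar x_2\|^2) = \frac{2}{N_2^2}\mathrm{tr}(\Sigma^2) + \frac{\Delta}{N_2^3}\sum_{\ell=1}^m\theta_{\ell\ell}^2,$$
$\mathrm{Cov}(\|\bar x_1\|^2, \|\bar x_2\|^2) = 0$ and $\mathrm{Cov}(\bar x_1'\bar x_2, \|\bar x_j\|^2) = 0$ for $j = 1, 2$. Therefore, since $\sum_{\ell=1}^m\theta_{\ell\ell}^2 \le p\lambda_{\max}^2$, we have
$$\mathrm{Var}(M_n) = 2\tau^2\,\mathrm{tr}(\Sigma^2) + \Delta\Big(\frac{1}{N_1^3} + \frac{1}{N_2^3}\Big)\sum_{\ell=1}^m\theta_{\ell\ell}^2 + o(\sigma_M^2) = \sigma_M^2(1 + o(1)).$$
The proof of (4.4) is then complete.

A.2. The asymptotic normality of $Z_n$ under $H_0$. From the proof of A.1, one can see that $\tau(\mathrm{tr}(S_n) - \mathrm{tr}(\Sigma))/\sigma_M \to 0$ in probability. Therefore, to show that $Z_n \to N(0,1)$, we need only show that $[\|\bar x_1 - \bar x_2\|^2 - E(\|\bar x_1 - \bar x_2\|^2)]/\sigma_M \to N(0,1)$. We may rewrite
$$\frac{\|\bar x_1 - \bar x_2\|^2 - E\|\bar x_1 - \bar x_2\|^2}{\sigma_M} = \sigma_M^{-1}\sum_{k=1}^p\Big[\big(\Gamma_k'(\bar z_1 - \bar z_2)\big)^2 - \tau\|\Gamma_k\|^2\Big] := \sum_{\ell=1}^m\sum_{h=1}^N\Big[U_{\ell h} + V_{\ell h} + \sigma_M^{-1}\theta_{\ell\ell}\big(w_{\ell h}^2 - E(w_{\ell h}^2)\big)\Big],$$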

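where $\bar z_j = N_j^{-1}\sum_{i=1}^{N_j}z_{ij}$, $\bar z_{jk}$ denotes the $k$th component of $\bar z_j$, and
$$U_{\ell h} = 2\sigma_M^{-1}\theta_{\ell\ell}\,w_{\ell h}\sum_{k_1=1}^{h-1}w_{\ell k_1}, \qquad V_{\ell h} = 2\sigma_M^{-1}w_{\ell h}\sum_{\ell_1=1}^{\ell-1}\theta_{\ell_1\ell}\big(\bar z_{1\ell_1} - \bar z_{2\ell_1}\big),$$
with the convention that $\sum_{\ell_1=1}^{0} = 0$ and the notation
$$w_{\ell h} = \begin{cases} N_1^{-1}z_{h1\ell} & \text{if } h = 1, \ldots, N_1,\\ -N_2^{-1}z_{h-N_1,2,\ell} & \text{if } h = N_1 + 1, \ldots, N.\end{cases}$$
Since $\mathrm{Var}\big(\sum_{\ell=1}^m\sum_{h=1}^N\theta_{\ell\ell}(w_{\ell h}^2 - E(w_{\ell h}^2))\big) = (2 + \Delta)\big(N_1^{-3} + N_2^{-3}\big)\sum_{\ell=1}^m\theta_{\ell\ell}^2 = o(\sigma_M^2)$, we need only show that $\sum_{\ell=1}^m\sum_{h=1}^N[U_{\ell h} + V_{\ell h}] \to N(0,1)$. Note that $\{\zeta_{N(\ell-1)+h} = U_{\ell h} + V_{\ell h}\}$ forms a sequence of martingale differences with respect to the $\sigma$-fields $\mathcal F_{N(\ell-1)+h} = \sigma(z_{ijt}: j = 1, 2,\ t < \ell,\ i = 1, \ldots, N_j;\ w_{\ell i},\ i \le h)$. The asymptotic normality may then be proved by applying Corollary 3.1 of Hall and Heyde (1980), after routine verification of the following:
$$\sum_{\ell=1}^m\sum_{h=1}^N E\big(U_{\ell h}^4 + V_{\ell h}^4\big) \to 0$$
and
$$\mathrm{Var}\Big(\sum_{\ell=1}^m\sum_{h=1}^N E\big[(U_{\ell h} + V_{\ell h})^2 \,\big|\, \mathcal F_{N(\ell-1)+h-1}\big]\Big) \to 0.$$
The proof of (4.2) is now complete.

A.3. The ratio-consistency of $B_n^2$. We only need to show that $\tilde B_n^2 = \mathrm{tr}\,S_n^2 - \frac1n(\mathrm{tr}\,S_n)^2$ is ratio-consistent for $\mathrm{tr}(\Sigma^2)$. Without loss of generality, we assume that $\mu_1 = \mu_2 = 0$. Note that
$$S_n = \frac1n\Big[\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}' - N_1\bar x_1\bar x_1' - N_2\bar x_2\bar x_2'\Big].$$
Since $E\bar x_j'\bar x_j = N_j^{-1}\mathrm{tr}(\Sigma) = o\big(\sqrt{\mathrm{tr}(\Sigma^2)}\big)$ for $j = 1, 2$, it follows that $\bar x_j'\bar x_j = o_p\big(\sqrt{\mathrm{tr}(\Sigma^2)}\big)$. Therefore, we need only show that
$$\hat B_n^2 = \mathrm{tr}\Big(\frac1n\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}'\Big)^2 - \frac1n\Big(\mathrm{tr}\Big(\frac1n\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}'\Big)\Big)^2$$
is a ratio-consistent estimator of $\mathrm{tr}(\Sigma^2)$.

By elementary calculation, we have $E\big(\mathrm{tr}\big(\frac1n\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}'\big)\big) = \frac Nn\mathrm{tr}(\Sigma)$ and $\mathrm{Var}\big(\mathrm{tr}\big(\frac1n\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}'\big)\big) = O\big(n^{-1}\mathrm{tr}(\Sigma^2)\big)$. Together with $n^{-1}\mathrm{tr}(\Sigma) = o\big(\sqrt{\mathrm{tr}(\Sigma^2)}\big)$, these imply that
$$\frac1n\Big(\mathrm{tr}\Big(\frac1n\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}'\Big)\Big)^2 = \frac{N^2}{n^3}(\mathrm{tr}\,\Sigma)^2 + o_p(\mathrm{tr}(\Sigma^2)).$$
Rewrite
$$\mathrm{tr}\Big(\frac1n\sum_{j=1}^2\sum_{i=1}^{N_j}x_{ij}x_{ij}'\Big)^2 = \frac{N^2}{n^2}\mathrm{tr}(\Sigma^2) + \frac{2N}{n^2}\sum_{j=1}^2\sum_{i=1}^{N_j}\mathrm{tr}\big((\Gamma'\Gamma)^2(z_{ij}z_{ij}' - I_m)\big)$$
$$+ \frac{1}{n^2}\sum_{j=1}^2\sum_{i=1}^{N_j}\sum_{j'=1}^2\sum_{i'=1}^{N_{j'}}\mathrm{tr}\big((\Gamma'\Gamma)(z_{ij}z_{ij}' - I_m)(\Gamma'\Gamma)(z_{i'j'}z_{i'j'}' - I_m)\big) := \frac{N^2}{n^2}\mathrm{tr}(\Sigma^2) + H_1 + H_2.$$
We have $E(H_1) = 0$ and
$$\mathrm{Var}(H_1) = \frac{4N^3}{n^4}\Big[2\,\mathrm{tr}(\Sigma^4) + \Delta\sum_{i=1}^m\big([(\Gamma'\Gamma)^2]_{ii}\big)^2\Big] = o\big(\mathrm{tr}^2(\Sigma^2)\big).$$
Thus, $H_1 = o_p(\mathrm{tr}(\Sigma^2))$. Write $H_2 = H_{21} + H_{22} + H_{23} + H_{24} + H_{25}$, where
$$H_{21} = \frac{1}{n^2}\sum_{(ij)\ne(i'j')}\mathrm{tr}\big((\Gamma'\Gamma)(z_{ij}z_{ij}' - I_m)(\Gamma'\Gamma)(z_{i'j'}z_{i'j'}' - I_m)\big),$$
$$H_{22} = \frac{1}{n^2}\sum_{j=1}^2\sum_{i=1}^{N_j}\sum_{k' \ne \ell,\ \ell' \ne k}\theta_{k\ell}\theta_{k'\ell'}\,z_{ij\ell}z_{ijk'}z_{ij\ell'}z_{ijk},$$
$$H_{23} = \frac{2}{n^2}\sum_{j=1}^2\sum_{i=1}^{N_j}\sum_{\ell,\,k,\,\ell'\ \text{distinct}}\theta_{k\ell}\theta_{\ell\ell'}(z_{ij\ell}^2 - 1)\,z_{ij\ell'}z_{ijk},$$
$$H_{24} = \frac{4}{n^2}\sum_{j=1}^2\sum_{i=1}^{N_j}\sum_{\ell \ne k}\theta_{k\ell}\theta_{\ell\ell}(z_{ij\ell}^2 - 1)\,z_{ij\ell}z_{ijk},$$
$$H_{25} = \frac{1}{n^2}\sum_{j=1}^2\sum_{i=1}^{N_j}\sum_{k,\ell=1}^m\theta_{k\ell}^2(z_{ij\ell}^2 - 1)(z_{ijk}^2 - 1).$$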

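We have $E(H_{21}) = 0$,
$$E(H_{22}) = \frac{N}{n^2}\Big[\mathrm{tr}(\Sigma^2) + \mathrm{tr}^2(\Sigma) - 2\sum_{k=1}^m\theta_{kk}^2\Big] = \frac{N}{n^2}\mathrm{tr}^2(\Sigma) + o(\mathrm{tr}(\Sigma^2))$$
and
$$\mathrm{Var}(H_{21}) = \frac{N(N-1)}{n^4}\Big[4\,\mathrm{tr}^2(\Sigma^2) + 4\,\mathrm{tr}(\Sigma^4) + 8\Delta\sum_{i,j,t=1}^m\theta_{ij}^2\theta_{it}^2 + 2\Delta^2\sum_{i,j=1}^m\theta_{ij}^4\Big] = o\big(\mathrm{tr}^2(\Sigma^2)\big).$$
Similarly, we may show that $\mathrm{Var}(H_{22})$ and $\mathrm{Var}(H_{23})$ are of the same order. Finally, one may show that
$$E|H_{24}| \le \frac{CN}{n^2}\sum_{\ell=1}^m\theta_{\ell\ell}\,E\Big|\sum_{k\ne\ell}\theta_{k\ell}z_{11k}\Big| \le \frac{CN}{n^2}\sum_{\ell=1}^m\theta_{\ell\ell}\sqrt{\sum_{k=1}^m\theta_{k\ell}^2} \le \frac{CN}{n^2}\lambda_{\max}\mathrm{tr}(\Sigma) = o(\mathrm{tr}(\Sigma^2))$$
and
$$E|H_{25}| \le \frac{CN}{n^2}\sum_{k,\ell=1}^m\theta_{k\ell}^2 = o(\mathrm{tr}(\Sigma^2)).$$
Combining the above, we obtain $H_2 = \frac{N}{n^2}\mathrm{tr}^2(\Sigma) + o_p(\mathrm{tr}(\Sigma^2))$. Moreover, $\frac{N}{n^2}\mathrm{tr}^2(\Sigma) - \frac{N^2}{n^3}\mathrm{tr}^2(\Sigma) = -\frac{2N}{n^3}\mathrm{tr}^2(\Sigma) = O\big(n^{-1}\mathrm{tr}(\Sigma^2)\big)$, because $\mathrm{tr}^2(\Sigma) \le p\,\mathrm{tr}(\Sigma^2)$, and $N/n \to 1$. Hence $\hat B_n^2 = \mathrm{tr}(\Sigma^2)[1 + o_p(1)]$, and the ratio-consistency of $B_n^2$ follows.

References

Babu, G. J. and Bai, Z. D. (1993). Edgeworth expansions of a function of sample means under minimal moment conditions and partial Cramér's conditions. Sankhyā Ser. A 55.

Bai, Z. D., Krishnaiah, P. R. and Zhao, L. (1989). On rates of convergence of efficient detection criteria in signal processing with white noise. IEEE Trans. Inform. Theory 35.

Bai, Z. D. and Rao, C. R. (1991). Edgeworth expansion of a function of sample means. Ann. Statist. 19.

Bai, Z. D., Silverstein, J. W. and Yin, Y. Q. (1988). A note on the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivariate Anal. 26.

Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 21.

Bhattacharya, R. N. and Ghosh, J. K. (1988). On moment conditions for valid formal Edgeworth expansions. J. Multivariate Anal. 27.

Chung, J. H. and Fraser, D. A. S. (1958). Randomization tests for a multivariate two-sample problem. J. Amer. Statist. Assoc. 53.

Dempster, A. P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29.

Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics 16.

Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8, 252-261.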

Hall, P. G. and Heyde, C. C. (1980). Martingale Limit Theory and Its Applications. Academic Press, New York.

Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1.

Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal. 12.

Loève, M. (1977). Probability Theory, 4th Ed. Springer-Verlag, New York.

Narayanaswamy, C. R. and Raghavarao, D. (1991). Principal component analysis of large dispersion matrices. Appl. Statist. 40.

Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when $p^2/n$ is large. I. Consistency. Ann. Statist. 12.

Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when $p^2/n$ is large: II. Normal approximation (Corr: 91V19 p8). Ann. Statist. 13.

Saranadasa, H. (1991). Discriminant analysis based on experimental design concepts. Ph.D. Thesis, Department of Statistics, Temple University.

Saranadasa, H. (1993). Asymptotic expansion of the misclassification probabilities of D- and A-criteria for discrimination from two high dimensional populations using the theory of large dimensional random matrices. J. Multivariate Anal. 46.

Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13.

Wachter, K. W. (1978). The strong limits of random matrix spectra for sample matrices of independent elements. Ann. Probab. 6.

Yin, Y. Q. (1986). Limiting spectral distribution for a class of random matrices. J. Multivariate Anal. 20.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1983). Limiting behavior of the eigenvalues of a multivariate F matrix. J. Multivariate Anal. 13.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Related Fields 78.

Zhao, L. C., Krishnaiah, P. R. and Bai, Z. D. (1986a). On detection of the number of signals in presence of white noise. J. Multivariate Anal. 20, 1-25.

Zhao, L. C., Krishnaiah, P. R. and Bai, Z. D. (1986b). On detection of the number of signals when the noise covariance matrix is arbitrary. J. Multivariate Anal. 20.

Department of Applied Mathematics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan.

The R. W. Johnson Pharmaceutical Research Institute, Preclinical Biostatistics, Welsh and McKean Road, Spring House, PA, U.S.A.

(Received July 1993; accepted April 1995)


Probabilistic Engineering Mechanics. Do Rosenblatt and Nataf isoprobabilistic transformations really differ? Probabilistic Egieerig Mechaics 4 (009) 577 584 Cotets lists available at ScieceDirect Probabilistic Egieerig Mechaics joural homepage: wwwelseviercom/locate/probegmech Do Roseblatt ad Nataf isoprobabilistic

More information