Concentration of Measure
- Justin Kelly
- 9 years ago
Copyright © John Lafferty, Han Liu, and Larry Wasserman. Do Not Distribute.

Chapter 7 Concentration of Measure

Often we want to show that some random quantity is close to its mean with high probability. Results of this kind are known as concentration inequalities. In this chapter we consider some important concentration results such as Hoeffding's inequality, Bernstein's inequality and McDiarmid's inequality. Then we consider uniform bounds that guarantee that a set of random quantities are simultaneously close to their means with high probability.

7.1 Introduction

Often we need to show that a random quantity is close to its mean. For example, later we will prove Hoeffding's inequality which implies that, if $Z_1,\dots,Z_n$ are Bernoulli random variables with mean $\mu$ then
$$\mathbb{P}(|\bar Z_n - \mu| > \epsilon) \le 2e^{-2n\epsilon^2}$$
where $\bar Z_n = n^{-1}\sum_{i=1}^n Z_i$. More generally, we want a result of the form
$$\mathbb{P}\bigl(|f(Z_1,\dots,Z_n) - \mu_n(f)| > \epsilon\bigr) < \delta_n \tag{7.1}$$
where $\mu_n(f) = \mathbb{E}(f(Z_1,\dots,Z_n))$ and $\delta_n \to 0$ as $n \to \infty$. Such results are known as concentration inequalities and the phenomenon that many random quantities are close to their mean with high probability is called concentration of measure. These results are
fundamental for establishing performance guarantees of many algorithms. For statistical learning theory, we will need uniform bounds of the form
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |f(Z_1,\dots,Z_n) - \mu_n(f)| > \epsilon\Bigr) < \delta_n \tag{7.2}$$
over a class of functions $\mathcal{F}$.

7.3 Example. To motivate the need for such results, consider empirical risk minimization in classification. Suppose we have data $(X_1,Y_1),\dots,(X_n,Y_n)$ where $Y_i \in \{0,1\}$ and $X_i \in \mathbb{R}^d$. Let $h:\mathbb{R}^d \to \{0,1\}$ be a classifier. The training error is
$$\hat R_n(h) = \frac{1}{n}\sum_{i=1}^n I(Y_i \ne h(X_i))$$
and the true classification error is $R(h) = \mathbb{P}(Y \ne h(X))$. We would like to know if $\hat R_n(h)$ is close to $R(h)$ with high probability. This is precisely of the form (7.1) with $Z_i = (X_i,Y_i)$ and $f(Z_1,\dots,Z_n) = n^{-1}\sum_{i=1}^n I(Y_i \ne h(X_i))$.

Now let $\mathcal{H}$ be a set of classifiers. Let $\hat h$ minimize the training error $\hat R_n(h)$ over $\mathcal{H}$ and let $h_*$ minimize the true error $R(h)$ over $\mathcal{H}$. Can we guarantee that the risk $R(\hat h)$ of the selected classifier is close to the risk $R(h_*)$ of the best classifier? Let $E$ denote the event that $\sup_{h\in\mathcal{H}} |\hat R_n(h) - R(h)| \le \epsilon$. When the event $E$ holds, we have that
$$R(h_*) \le R(\hat h) \le \hat R_n(\hat h) + \epsilon \le \hat R_n(h_*) + \epsilon \le R(h_*) + 2\epsilon$$
where the four inequalities use, in order, the following facts: $h_*$ minimizes $R$, $E$ holds, $\hat h$ minimizes $\hat R_n$, and $E$ holds. It follows that, when $E$ holds, $|R(\hat h) - R(h_*)| \le 2\epsilon$. Concentration of measure is used to prove that $E$ holds with high probability.

Besides classification, concentration inequalities are used for studying many other methods such as clustering, random projections and density estimation.
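The chain of inequalities in Example 7.3 is deterministic, so it can be checked on simulated data. The following is a minimal sketch under assumptions that are not in the text: $X$ is uniform on $[0,1]$, the label is $I(X > 0.5)$ flipped with probability `eta`, and the class is a hypothetical finite grid of threshold classifiers $h_t(x) = I(x > t)$, for which the true risk has the closed form used in `true_risk`.

```python
import random

def true_risk(t, eta=0.1):
    """R(h_t) for h_t(x) = 1{x > t}, X ~ Uniform(0,1), Y = 1{X > 0.5}
    with the label flipped with probability eta (assumed toy model)."""
    return eta + (1 - 2 * eta) * abs(t - 0.5)

def training_risk(t, data):
    """Empirical risk: fraction of sample points misclassified by h_t."""
    return sum(int(y != int(x > t)) for x, y in data) / len(data)

def simulate(n=500, eta=0.1, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < eta:  # label noise
            y = 1 - y
        data.append((x, y))
    thresholds = [k / 20 for k in range(21)]  # finite class H
    # sup over H of |training risk - true risk|
    sup_dev = max(abs(training_risk(t, data) - true_risk(t, eta))
                  for t in thresholds)
    t_hat = min(thresholds, key=lambda t: training_risk(t, data))  # ERM
    t_star = min(thresholds, key=lambda t: true_risk(t, eta))      # oracle
    excess = true_risk(t_hat, eta) - true_risk(t_star, eta)
    return sup_dev, excess

sup_dev, excess = simulate()
print(f"sup |R_hat - R| = {sup_dev:.4f}, excess risk = {excess:.4f}")
```

Whatever the sample, the excess risk of the empirical minimizer is at most twice the uniform deviation, which is why the rest of the chapter concentrates on bounding that deviation.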
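Notation. If $P$ is a probability measure and $f$ is a function then we write
$$Pf = P(f) = \int f(z)\,dP(z) = \mathbb{E}(f(Z)).$$
Given $Z_1,\dots,Z_n$, let $P_n$ denote the empirical measure that puts mass $1/n$ at each data point:
$$P_n(A) = \frac{1}{n}\sum_{i=1}^n I(Z_i \in A)$$
where $I(Z_i \in A) = 1$ if $Z_i \in A$ and $I(Z_i \in A) = 0$ otherwise. Then we write
$$P_n f = P_n(f) = \int f(z)\,dP_n(z) = \frac{1}{n}\sum_{i=1}^n f(Z_i).$$

7.2 Basic Inequalities

7.2.1 Hoeffding's Inequality

Suppose that $Z$ has a finite mean and that $\mathbb{P}(Z \ge 0) = 1$. Then, for any $\epsilon > 0$,
$$\mathbb{E}(Z) = \int_0^\infty z\,dP(z) \ge \int_\epsilon^\infty z\,dP(z) \ge \epsilon\int_\epsilon^\infty dP(z) = \epsilon\,\mathbb{P}(Z > \epsilon) \tag{7.4}$$
which yields Markov's inequality:
$$\mathbb{P}(Z > \epsilon) \le \frac{\mathbb{E}(Z)}{\epsilon}. \tag{7.5}$$
An immediate consequence of Markov's inequality is Chebyshev's inequality
$$\mathbb{P}(|Z - \mu| > \epsilon) = \mathbb{P}(|Z - \mu|^2 > \epsilon^2) \le \frac{\mathbb{E}(Z-\mu)^2}{\epsilon^2} = \frac{\sigma^2}{\epsilon^2} \tag{7.6}$$
where $\mu = \mathbb{E}(Z)$ and $\sigma^2 = \mathrm{Var}(Z)$. If $Z_1,\dots,Z_n$ are iid with mean $\mu$ and variance $\sigma^2$ then, since $\mathrm{Var}(\bar Z_n) = \sigma^2/n$, Chebyshev's inequality yields
$$\mathbb{P}(|\bar Z_n - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2}. \tag{7.7}$$
While this inequality is useful, it does not decay exponentially fast as $n$ increases. To improve the inequality, we use Chernoff's method: for any $t > 0$,
$$\mathbb{P}(Z > \epsilon) = \mathbb{P}(e^{Z} > e^{\epsilon}) = \mathbb{P}(e^{tZ} > e^{t\epsilon}) \le e^{-t\epsilon}\,\mathbb{E}(e^{tZ}). \tag{7.8}$$
We then minimize over $t$ and conclude that: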
$$\mathbb{P}(Z > \epsilon) \le \inf_{t \ge 0} e^{-t\epsilon}\,\mathbb{E}(e^{tZ}). \tag{7.9}$$
To use the above result we need to bound the moment generating function $\mathbb{E}(e^{tZ})$.

7.10 Lemma. Let $Z$ be a mean $\mu$ random variable such that $a \le Z \le b$. Then, for any $t$,
$$\mathbb{E}(e^{tZ}) \le e^{t\mu + t^2(b-a)^2/8}. \tag{7.11}$$

Proof. For simplicity, assume that $\mu = 0$. Since $a \le Z \le b$, we can write $Z$ as a convex combination of $a$ and $b$, namely, $Z = \alpha b + (1-\alpha)a$ where $\alpha = (Z-a)/(b-a)$. By the convexity of the function $y \mapsto e^{ty}$ we have
$$e^{tZ} \le \alpha e^{tb} + (1-\alpha)e^{ta} = \frac{Z-a}{b-a}e^{tb} + \frac{b-Z}{b-a}e^{ta}.$$
Take expectations of both sides and use the fact that $\mathbb{E}(Z) = 0$ to get
$$\mathbb{E}e^{tZ} \le -\frac{a}{b-a}e^{tb} + \frac{b}{b-a}e^{ta} = e^{g(u)} \tag{7.12}$$
where $u = t(b-a)$, $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$ and $\gamma = -a/(b-a)$. Note that $g(0) = g'(0) = 0$. Also, $g''(u) \le 1/4$ for all $u > 0$. By Taylor's theorem, there is a $\xi \in (0,u)$ such that
$$g(u) = g(0) + u g'(0) + \frac{u^2}{2}g''(\xi) = \frac{u^2}{2}g''(\xi) \le \frac{u^2}{8} = \frac{t^2(b-a)^2}{8}.$$
Hence, $\mathbb{E}e^{tZ} \le e^{g(u)} \le e^{t^2(b-a)^2/8}$.

7.13 Theorem (Hoeffding). If $Z_1, Z_2, \dots, Z_n$ are independent with $\mathbb{P}(a \le Z_i \le b) = 1$ and common mean $\mu$ then for any $\epsilon > 0$
$$\mathbb{P}(|\bar Z_n - \mu| > \epsilon) \le 2e^{-2n\epsilon^2/(b-a)^2} \tag{7.14}$$
where $\bar Z_n = n^{-1}\sum_{i=1}^n Z_i$.
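To get a feel for how conservative (7.14) can be, the following sketch compares the right-hand side with a Monte Carlo estimate of the tail probability for Bernoulli(1/2) samples. The sample size, deviation, number of trials and seed are arbitrary illustrative choices, not values from the text.

```python
import math
import random

def hoeffding_bound(n, eps, a=0.0, b=1.0):
    """Right-hand side of (7.14): 2 exp(-2 n eps^2 / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / (b - a) ** 2)

def tail_frequency(n, eps, p=0.5, trials=2000, seed=1):
    """Monte Carlo estimate of P(|mean - p| > eps) for n Bernoulli(p) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) > eps:
            hits += 1
    return hits / trials

n, eps = 100, 0.1
freq = tail_frequency(n, eps)
bound = hoeffding_bound(n, eps)
print(f"empirical tail = {freq:.4f}, Hoeffding bound = {bound:.4f}")
```

The bound holds for every bounded distribution with the same range, so for any particular distribution the simulated tail frequency is typically well below it.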
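Proof. For simplicity assume that $\mathbb{E}(Z_i) = 0$. Now we use the Chernoff method. For any $t > 0$, we have, from Markov's inequality, that
$$\mathbb{P}\Bigl(\frac{1}{n}\sum_{i=1}^n Z_i \ge \epsilon\Bigr) = \mathbb{P}\Bigl(\frac{t}{n}\sum_{i=1}^n Z_i \ge t\epsilon\Bigr) = \mathbb{P}\bigl(e^{(t/n)\sum_i Z_i} \ge e^{t\epsilon}\bigr) \le e^{-t\epsilon}\,\mathbb{E}\bigl(e^{(t/n)\sum_i Z_i}\bigr) = e^{-t\epsilon}\prod_{i}\mathbb{E}\bigl(e^{(t/n)Z_i}\bigr) \tag{7.15}$$
$$\le e^{-t\epsilon}\,e^{(t^2/n^2)\sum_{i}(b_i-a_i)^2/8} \tag{7.16}$$
where the last inequality follows from Lemma 7.10. Now we minimize the right hand side over $t$. In particular, we set $t = 4\epsilon n^2/\sum_{i}(b_i-a_i)^2$ and get $\mathbb{P}(\bar Z_n \ge \epsilon) \le e^{-2n\epsilon^2/c}$, where $c = n^{-1}\sum_{i=1}^n (b_i-a_i)^2$. By a similar argument, $\mathbb{P}(\bar Z_n \le -\epsilon) \le e^{-2n\epsilon^2/c}$ and the result follows.

7.17 Corollary. If $Z_1, Z_2, \dots, Z_n$ are independent with $\mathbb{P}(a_i \le Z_i \le b_i) = 1$ and common mean $\mu$, then, with probability at least $1-\delta$,
$$|\bar Z_n - \mu| \le \sqrt{\frac{c}{2n}\log\Bigl(\frac{2}{\delta}\Bigr)} \tag{7.18}$$
where $c = n^{-1}\sum_{i=1}^n (b_i-a_i)^2$.

7.19 Corollary. If $Z_1, Z_2, \dots, Z_n$ are independent Bernoulli random variables with $\mathbb{P}(Z_i = 1) = p$ then, for any $\epsilon > 0$, $\mathbb{P}(|\bar Z_n - p| > \epsilon) \le 2e^{-2n\epsilon^2}$. Hence, with probability at least $1-\delta$ we have that $|\bar Z_n - p| \le \sqrt{\frac{1}{2n}\log\bigl(\frac{2}{\delta}\bigr)}$.

7.20 Example (Classification). Returning to the classification problem, let $h$ be a classifier and let $f(z) = I(y \ne h(x))$ where $z = (x,y)$. Then Hoeffding's inequality implies that $|R(h) - \hat R_n(h)| \le \sqrt{\frac{1}{2n}\log\bigl(\frac{2}{\delta}\bigr)}$ with probability at least $1-\delta$.

The following result extends Hoeffding's inequality to more general functions $f(z_1,\dots,z_n)$.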
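7.21 Theorem (McDiarmid). Let $Z_1,\dots,Z_n$ be independent random variables. Suppose that
$$\sup_{z_1,\dots,z_n,z_i'} \bigl|f(z_1,\dots,z_{i-1},z_i,z_{i+1},\dots,z_n) - f(z_1,\dots,z_{i-1},z_i',z_{i+1},\dots,z_n)\bigr| \le c_i \tag{7.22}$$
for $i = 1,\dots,n$. Then
$$\mathbb{P}\Bigl(\bigl|f(Z_1,\dots,Z_n) - \mathbb{E}(f(Z_1,\dots,Z_n))\bigr| \ge \epsilon\Bigr) \le 2\exp\Bigl(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\Bigr). \tag{7.23}$$

Proof. Let $Y = f(Z_1,\dots,Z_n)$ and $\mu = \mathbb{E}(f(Z_1,\dots,Z_n))$. Then
$$\mathbb{P}(|Y - \mu| \ge \epsilon) = \mathbb{P}(Y - \mu \ge \epsilon) + \mathbb{P}(Y - \mu \le -\epsilon).$$
We will bound the first quantity. The second follows similarly. Let $V_i = \mathbb{E}(Y \mid Z_1,\dots,Z_i) - \mathbb{E}(Y \mid Z_1,\dots,Z_{i-1})$. Then
$$f(Z_1,\dots,Z_n) - \mathbb{E}(f(Z_1,\dots,Z_n)) = \sum_{i=1}^n V_i$$
and $\mathbb{E}(V_i \mid Z_1,\dots,Z_{i-1}) = 0$. Using a similar argument as in Lemma 7.10, we have
$$\mathbb{E}(e^{tV_i} \mid Z_1,\dots,Z_{i-1}) \le e^{t^2c_i^2/8}. \tag{7.24}$$
Now, for any $t > 0$,
$$\mathbb{P}(Y - \mu \ge \epsilon) = \mathbb{P}\Bigl(\sum_{i=1}^n V_i \ge \epsilon\Bigr) = \mathbb{P}\bigl(e^{t\sum_{i=1}^n V_i} \ge e^{t\epsilon}\bigr) \le e^{-t\epsilon}\,\mathbb{E}\bigl(e^{t\sum_{i=1}^n V_i}\bigr)$$
$$= e^{-t\epsilon}\,\mathbb{E}\Bigl(e^{t\sum_{i=1}^{n-1} V_i}\,\mathbb{E}\bigl(e^{tV_n} \mid Z_1,\dots,Z_{n-1}\bigr)\Bigr) \le e^{-t\epsilon}\,e^{t^2c_n^2/8}\,\mathbb{E}\bigl(e^{t\sum_{i=1}^{n-1}V_i}\bigr) \le \cdots \le e^{-t\epsilon}\,e^{t^2\sum_{i=1}^n c_i^2/8}.$$
The result follows by taking $t = 4\epsilon/\sum_{i=1}^n c_i^2$.

Remark: If $f(z_1,\dots,z_n) = n^{-1}\sum_{i=1}^n z_i$ then we get back Hoeffding's inequality.

7.25 Example. Let $X_1,\dots,X_n \sim P$ and let $P_n(A) = n^{-1}\sum_{i=1}^n I(X_i \in A)$. Define $f(X_1,\dots,X_n) = \sup_A |P_n(A) - P(A)|$. Changing one observation changes $f$ by at most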
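$1/n$. Hence,
$$\mathbb{P}\bigl(|f(X_1,\dots,X_n) - \mathbb{E}(f(X_1,\dots,X_n))| > \epsilon\bigr) \le 2e^{-2n\epsilon^2}.$$

7.2.2 Sharper Inequalities

Hoeffding's inequality does not use any information about the random variables except the fact that they are bounded. If the variance of $X_i$ is small, then we can get a sharper inequality from Bernstein's inequality. We begin with a preliminary result.

7.26 Lemma. Suppose that $|X| \le c$ and $\mathbb{E}(X) = 0$. For any $t > 0$,
$$\mathbb{E}(e^{tX}) \le \exp\Bigl\{t^2\sigma^2\Bigl(\frac{e^{tc} - 1 - tc}{(tc)^2}\Bigr)\Bigr\} \tag{7.27}$$
where $\sigma^2 = \mathrm{Var}(X)$.

Proof. Let $F = \sum_{r=2}^\infty \frac{t^{r-2}\,\mathbb{E}(X^r)}{r!\,\sigma^2}$. Then,
$$\mathbb{E}(e^{tX}) = \mathbb{E}\Bigl(1 + tX + \sum_{r=2}^\infty \frac{t^rX^r}{r!}\Bigr) = 1 + t^2\sigma^2F \le e^{t^2\sigma^2F}. \tag{7.28}$$
For $r \ge 2$, $\mathbb{E}(X^r) = \mathbb{E}(X^{r-2}X^2) \le c^{r-2}\sigma^2$ and so
$$F \le \sum_{r=2}^\infty \frac{t^{r-2}c^{r-2}\sigma^2}{r!\,\sigma^2} = \frac{1}{(tc)^2}\sum_{r=2}^\infty \frac{(tc)^r}{r!} = \frac{e^{tc} - 1 - tc}{(tc)^2}. \tag{7.29}$$
Hence, $\mathbb{E}(e^{tX}) \le \exp\bigl\{t^2\sigma^2\,\frac{e^{tc} - 1 - tc}{(tc)^2}\bigr\}$.

7.30 Theorem (Bernstein). If $\mathbb{P}(|X_i| \le c) = 1$ and $\mathbb{E}(X_i) = \mu$ then, for any $\epsilon > 0$,
$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) \le 2\exp\Bigl\{-\frac{n\epsilon^2}{2\sigma^2 + 2c\epsilon/3}\Bigr\} \tag{7.31}$$
where $\sigma^2 = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(X_i)$.

Proof. For simplicity, assume that $\mu = 0$. From Lemma 7.26,
$$\mathbb{E}(e^{tX_i}) \le \exp\Bigl\{t^2\sigma_i^2\,\frac{e^{tc} - 1 - tc}{(tc)^2}\Bigr\} \tag{7.32}$$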
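where $\sigma_i^2 = \mathbb{E}(X_i^2)$. Now,
$$\mathbb{P}(\bar X_n > \epsilon) = \mathbb{P}\Bigl(\sum_{i=1}^n X_i > n\epsilon\Bigr) = \mathbb{P}\bigl(e^{t\sum_{i=1}^n X_i} > e^{tn\epsilon}\bigr) \tag{7.33}$$
$$\le e^{-tn\epsilon}\,\mathbb{E}\bigl(e^{t\sum_{i=1}^n X_i}\bigr) = e^{-tn\epsilon}\prod_{i=1}^n \mathbb{E}(e^{tX_i}) \tag{7.34}$$
$$\le e^{-tn\epsilon}\exp\Bigl\{nt^2\sigma^2\,\frac{e^{tc} - 1 - tc}{(tc)^2}\Bigr\}. \tag{7.35}$$
Take $t = (1/c)\log(1 + \epsilon c/\sigma^2)$ to get
$$\mathbb{P}(\bar X_n > \epsilon) \le \exp\Bigl\{-\frac{n\sigma^2}{c^2}\,h\Bigl(\frac{c\epsilon}{\sigma^2}\Bigr)\Bigr\} \tag{7.36}$$
where $h(u) = (1+u)\log(1+u) - u$. The result follows by noting that $h(u) \ge u^2/(2 + 2u/3)$ for $u \ge 0$.

A useful corollary is the following.

7.37 Lemma. Let $X_1,\dots,X_n$ be iid and suppose that $|X_i| \le c$ and $\mathbb{E}(X_i) = \mu$. With probability at least $1-\delta$,
$$|\bar X_n - \mu| \le \sqrt{\frac{2\sigma^2\log(1/\delta)}{n}} + \frac{2c\log(1/\delta)}{3n}. \tag{7.38}$$
In particular, if $\sigma \le \sqrt{2c^2\log(1/\delta)/(9n)}$, then with probability at least $1-\delta$,
$$|\bar X_n - \mu| \le \frac{C}{n} \tag{7.39}$$
where $C = 4c\log(1/\delta)/3$.

We also get a very specific inequality in the special case that $X$ is Gaussian.

7.40 Theorem. Suppose that $X_1,\dots,X_n \sim N(\mu,\sigma^2)$. Then
$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) \le \exp\Bigl\{-\frac{n\epsilon^2}{2\sigma^2}\Bigr\}. \tag{7.41}$$

Proof. Let $X \sim N(0,1)$ with density $\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$ and distribution function $\Phi(x) = \int_{-\infty}^x \phi(s)\,ds$. For any $\epsilon > 0$,
$$\mathbb{P}(X > \epsilon) = \int_\epsilon^\infty \phi(s)\,ds \le \frac{1}{\epsilon}\int_\epsilon^\infty s\,\phi(s)\,ds = -\frac{1}{\epsilon}\int_\epsilon^\infty \phi'(s)\,ds = \frac{\phi(\epsilon)}{\epsilon} \le \frac{1}{\epsilon}e^{-\epsilon^2/2}. \tag{7.42}$$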
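By symmetry we have that
$$\mathbb{P}(|X| > \epsilon) \le \frac{2}{\epsilon}e^{-\epsilon^2/2}.$$
Now suppose that $X_1,\dots,X_n \sim N(\mu,\sigma^2)$. Then $\bar X_n = n^{-1}\sum_{i=1}^n X_i \sim N(\mu,\sigma^2/n)$. Let $Z \sim N(0,1)$. Then,
$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) = \mathbb{P}\bigl(\sqrt{n}\,|\bar X_n - \mu|/\sigma > \sqrt{n}\,\epsilon/\sigma\bigr) = \mathbb{P}\bigl(|Z| > \sqrt{n}\,\epsilon/\sigma\bigr) \tag{7.43}$$
$$\le \frac{2\sigma}{\epsilon\sqrt{n}}\,e^{-n\epsilon^2/(2\sigma^2)} \le e^{-n\epsilon^2/(2\sigma^2)} \tag{7.44}$$
for all large $n$.

7.2.3 Bounds on Expected Values

Suppose we have an exponential bound on $\mathbb{P}(X_n > \epsilon)$. In that case we can bound $\mathbb{E}(X_n)$ as follows.

7.45 Theorem. Suppose that $X_n \ge 0$ and that for every $\epsilon > 0$,
$$\mathbb{P}(X_n > \epsilon) \le c_1e^{-c_2n\epsilon^2} \tag{7.46}$$
for some $c_2 > 0$ and $c_1 > 1/e$. Then, $\mathbb{E}(X_n) \le \sqrt{C/n}$ where $C = (1 + \log(c_1))/c_2$.

Proof. Recall that for any nonnegative random variable $Y$, $\mathbb{E}(Y) = \int_0^\infty \mathbb{P}(Y \ge t)\,dt$. Hence, for any $a > 0$,
$$\mathbb{E}(X_n^2) = \int_0^\infty \mathbb{P}(X_n^2 \ge t)\,dt = \int_0^a \mathbb{P}(X_n^2 \ge t)\,dt + \int_a^\infty \mathbb{P}(X_n^2 \ge t)\,dt \le a + \int_a^\infty \mathbb{P}(X_n^2 \ge t)\,dt.$$
Equation (7.46) implies that $\mathbb{P}(X_n > \sqrt{t}) \le c_1e^{-c_2nt}$. Hence,
$$\mathbb{E}(X_n^2) \le a + \int_a^\infty \mathbb{P}(X_n^2 \ge t)\,dt = a + \int_a^\infty \mathbb{P}(X_n \ge \sqrt{t})\,dt \le a + c_1\int_a^\infty e^{-c_2nt}\,dt = a + \frac{c_1e^{-c_2na}}{c_2n}.$$
Set $a = \log(c_1)/(nc_2)$ and conclude that
$$\mathbb{E}(X_n^2) \le \frac{\log(c_1)}{nc_2} + \frac{1}{nc_2} = \frac{1 + \log(c_1)}{nc_2}.$$
Finally, we have $\mathbb{E}(X_n) \le \sqrt{\mathbb{E}(X_n^2)} \le \sqrt{\frac{1 + \log(c_1)}{nc_2}}$.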
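As a small worked check of Theorem 7.45, take $X_n$ to be the absolute value of the average of $n$ iid $N(0,1)$ variables. This example is an assumption introduced here for illustration: it uses the standard two-sided Gaussian tail bound $\mathbb{P}(X_n > \epsilon) \le 2e^{-n\epsilon^2/2}$, so that $c_1 = 2$ and $c_2 = 1/2$, and compares the resulting bound with the exact value $\mathbb{E}(X_n) = \sqrt{2/(\pi n)}$.

```python
import math

def expectation_bound(n, c1, c2):
    """Theorem 7.45: E(X_n) <= sqrt(C / n) with C = (1 + log c1) / c2."""
    return math.sqrt((1.0 + math.log(c1)) / (c2 * n))

def exact_mean_abs_gaussian_average(n):
    """E|average of n iid N(0,1)| = sqrt(2 / (pi n)), since the
    average is distributed as N(0, 1/n)."""
    return math.sqrt(2.0 / (math.pi * n))

# Assumed tail bound for this example: P(X_n > eps) <= 2 exp(-n eps^2 / 2),
# that is, c1 = 2 and c2 = 1/2 in (7.46).
for n in (10, 100, 1000):
    exact = exact_mean_abs_gaussian_average(n)
    bound = expectation_bound(n, c1=2.0, c2=0.5)
    print(f"n = {n}: exact = {exact:.4f}, bound = {bound:.4f}")
```

Both quantities scale like $n^{-1/2}$; the theorem loses only a constant factor in this case.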
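Now we consider bounding the maximum of a set of random variables.

7.47 Theorem. Let $X_1,\dots,X_n$ be random variables. Suppose there exists $\sigma > 0$ such that $\mathbb{E}(e^{tX_i}) \le e^{t^2\sigma^2/2}$ for all $t > 0$. Then
$$\mathbb{E}\Bigl(\max_{1\le i\le n} X_i\Bigr) \le \sigma\sqrt{2\log n}. \tag{7.48}$$

Proof. By Jensen's inequality,
$$\exp\Bigl\{t\,\mathbb{E}\Bigl(\max_{1\le i\le n} X_i\Bigr)\Bigr\} \le \mathbb{E}\exp\Bigl\{t\max_{1\le i\le n} X_i\Bigr\} = \mathbb{E}\max_{1\le i\le n}\exp\{tX_i\} \le \sum_{i=1}^n \mathbb{E}(\exp\{tX_i\}) \le n\,e^{t^2\sigma^2/2}.$$
Thus, $\mathbb{E}(\max_{1\le i\le n} X_i) \le \frac{\log n}{t} + \frac{t\sigma^2}{2}$. The result follows by setting $t = \sqrt{2\log n}/\sigma$.

7.3 Uniform Bounds

7.3.1 Binary Functions

A binary function on a space $\mathcal{Z}$ is a function $f:\mathcal{Z} \to \{0,1\}$. Let $\mathcal{F}$ be a class of binary functions on $\mathcal{Z}$. For any $z_1,\dots,z_n$ define
$$\mathcal{F}_{z_1,\dots,z_n} = \bigl\{(f(z_1),\dots,f(z_n)) : f \in \mathcal{F}\bigr\}. \tag{7.49}$$
$\mathcal{F}_{z_1,\dots,z_n}$ is a finite collection of binary vectors and $|\mathcal{F}_{z_1,\dots,z_n}| \le 2^n$. The set $\mathcal{F}_{z_1,\dots,z_n}$ is called the projection of $\mathcal{F}$ onto $z_1,\dots,z_n$.

7.50 Example. Let $\mathcal{F} = \{f_t : t \in \mathbb{R}\}$ where $f_t(z) = 1$ if $z > t$ and $f_t(z) = 0$ if $z \le t$. Consider three real numbers $z_1 < z_2 < z_3$. Then
$$\mathcal{F}_{z_1,z_2,z_3} = \bigl\{(0,0,0),\ (0,0,1),\ (0,1,1),\ (1,1,1)\bigr\}.$$

Define the growth function or shattering number by
$$s(\mathcal{F},n) = \sup_{z_1,\dots,z_n} \bigl|\mathcal{F}_{z_1,\dots,z_n}\bigr|. \tag{7.51}$$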
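The projection in Example 7.50 can be enumerated directly. The sketch below does this for the threshold class; the helper name `threshold_projection` and the choice of candidate thresholds are illustrative assumptions. For $n$ distinct points it returns $n+1$ vectors, so the growth function of this class is linear in $n$ rather than $2^n$.

```python
def threshold_projection(points):
    """Projection of the class {f_t(z) = 1 if z > t else 0 : t real}
    onto the given points, returned as a set of 0/1 tuples."""
    zs = sorted(set(points))
    # One threshold below all points, one between each consecutive pair,
    # and one at the maximum: together they realize every labelling
    # that a threshold function can produce on these points.
    candidates = [zs[0] - 1.0]
    candidates += [(lo + hi) / 2.0 for lo, hi in zip(zs, zs[1:])]
    candidates.append(zs[-1])
    return {tuple(int(z > t) for z in points) for t in candidates}

proj = threshold_projection([0.2, 1.5, 3.0])
print(sorted(proj))  # [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(len(threshold_projection([float(i) for i in range(10)])))  # 11
```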
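A binary function $f$ can be thought of as an indicator function for a set, namely, $A = \{z : f(z) = 1\}$. Conversely, any set can be thought of as a binary function, namely, its indicator function $I_A(z)$. We can therefore re-express the growth function in terms of sets. If $\mathcal{A}$ is a class of subsets of $\mathbb{R}^d$ then $s(\mathcal{A},n)$ is defined to be $s(\mathcal{F},n)$ where $\mathcal{F} = \{I_A : A \in \mathcal{A}\}$ is the set of indicator functions, and $s(\mathcal{A},n)$ is again called the shattering number. It follows that
$$s(\mathcal{A},n) = \max_{F} s(\mathcal{A},F)$$
where the maximum is over all finite sets $F$ of size $n$ and $s(\mathcal{A},F) = |\{A \cap F : A \in \mathcal{A}\}|$ denotes the number of subsets of $F$ picked out by $\mathcal{A}$. We say that a finite set $F$ of size $n$ is shattered by $\mathcal{A}$ if $s(\mathcal{A},F) = 2^n$.

7.52 Theorem. Let $\mathcal{A}$ and $\mathcal{B}$ be classes of subsets of $\mathbb{R}^d$.
1. $s(\mathcal{A},n+m) \le s(\mathcal{A},n)\,s(\mathcal{A},m)$.
2. If $\mathcal{C} = \mathcal{A} \cup \mathcal{B}$ then $s(\mathcal{C},n) \le s(\mathcal{A},n) + s(\mathcal{B},n)$.
3. If $\mathcal{C} = \{A \cup B : A \in \mathcal{A}, B \in \mathcal{B}\}$ then $s(\mathcal{C},n) \le s(\mathcal{A},n)\,s(\mathcal{B},n)$.
4. If $\mathcal{C} = \{A \cap B : A \in \mathcal{A}, B \in \mathcal{B}\}$ then $s(\mathcal{C},n) \le s(\mathcal{A},n)\,s(\mathcal{B},n)$.

Proof. See exercise 10.

VC Dimension. Recall that a finite set $F$ of size $n$ is shattered by $\mathcal{A}$ if $s(\mathcal{A},F) = 2^n$. The VC dimension (named after Vapnik and Chervonenkis) of $\mathcal{A}$ is the size of the largest set that can be shattered by $\mathcal{A}$.

The VC dimension of a class of sets $\mathcal{A}$ is
$$\mathrm{VC}(\mathcal{A}) = \sup\bigl\{n : s(\mathcal{A},n) = 2^n\bigr\}. \tag{7.53}$$
The VC dimension of a class of binary functions $\mathcal{F}$ is
$$\mathrm{VC}(\mathcal{F}) = \sup\bigl\{n : s(\mathcal{F},n) = 2^n\bigr\}. \tag{7.54}$$
If the VC dimension is finite, then the growth function cannot grow too quickly. In fact, there is a phase transition: $s(\mathcal{F},n) = 2^n$ for $n < d$ and then the growth switches to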
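polynomial.

7.55 Theorem (Sauer's Theorem). Suppose that $\mathcal{F}$ has finite VC dimension $d$. Then,
$$s(\mathcal{F},n) \le \sum_{i=0}^d \binom{n}{i} \tag{7.56}$$
and for all $n \ge d$,
$$s(\mathcal{F},n) \le \Bigl(\frac{en}{d}\Bigr)^d. \tag{7.57}$$

Proof. When $n = d = 1$, (7.56) clearly holds. We proceed by induction. Suppose that (7.56) holds for $n-1$ and $d-1$ and also that it holds for $n-1$ and $d$. We will show that it holds for $n$ and $d$. Let $h(n,d) = \sum_{i=0}^d \binom{n}{i}$. We need to show that $\mathrm{VC}(\mathcal{F}) \le d$ implies that $s(\mathcal{F},n) \le h(n,d)$. Let $F_1 = \{z_1,\dots,z_n\}$ and $F_2 = \{z_2,\dots,z_n\}$. Let $\mathcal{F}_1 = \{(f(z_1),\dots,f(z_n)) : f \in \mathcal{F}\}$ and $\mathcal{F}_2 = \{(f(z_2),\dots,f(z_n)) : f \in \mathcal{F}\}$. For $f,g \in \mathcal{F}$, write $f \sim g$ if $g(z_1) = 1 - f(z_1)$ and $g(z_j) = f(z_j)$ for $j = 2,\dots,n$. Let
$$\mathcal{G} = \bigl\{f \in \mathcal{F} : \text{there exists } g \in \mathcal{F} \text{ such that } g \sim f\bigr\}.$$
Define $\mathcal{F}_3 = \{(f(z_2),\dots,f(z_n)) : f \in \mathcal{G}\}$. Then $|\mathcal{F}_1| = |\mathcal{F}_2| + |\mathcal{F}_3|$. Note that $\mathrm{VC}(\mathcal{F}_2) \le d$ and $\mathrm{VC}(\mathcal{F}_3) \le d-1$. The latter follows since, if $\mathcal{F}_3$ shatters a set, then we can add $z_1$ to create a set that is shattered by $\mathcal{F}_1$. By assumption $|\mathcal{F}_2| \le h(n-1,d)$ and $|\mathcal{F}_3| \le h(n-1,d-1)$. Hence,
$$|\mathcal{F}_1| \le h(n-1,d) + h(n-1,d-1) = h(n,d).$$
Thus, $s(\mathcal{F},n) \le h(n,d)$ which proves (7.56).

To prove (7.57), we use the fact that $n \ge d$ and so:
$$\sum_{i=0}^d \binom{n}{i} \le \Bigl(\frac{n}{d}\Bigr)^d\sum_{i=0}^d \binom{n}{i}\Bigl(\frac{d}{n}\Bigr)^i \le \Bigl(\frac{n}{d}\Bigr)^d\sum_{i=0}^n \binom{n}{i}\Bigl(\frac{d}{n}\Bigr)^i = \Bigl(\frac{n}{d}\Bigr)^d\Bigl(1 + \frac{d}{n}\Bigr)^n \le \Bigl(\frac{n}{d}\Bigr)^d e^d.$$

The VC dimensions of some common examples are summarized in Table 7.1.

Now we can extend the concentration inequalities to hold uniformly over sets of binary functions. We start with finite collections.

7.58 Theorem. Suppose that $\mathcal{F} = \{f_1,\dots,f_N\}$ is a finite set of binary functions. Then, with probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le \sqrt{\frac{2}{n}\log\Bigl(\frac{2N}{\delta}\Bigr)}. \tag{7.59}$$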
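The phase transition described by Sauer's theorem (7.56) and (7.57) is easy to see numerically. The following sketch tabulates $2^n$, the binomial sum $h(n,d)$ and the polynomial bound $(en/d)^d$; the value $d = 3$ and the grid of $n$ values are arbitrary illustrative choices.

```python
import math

def sauer_sum(n, d):
    """h(n, d) = sum_{i=0}^{d} C(n, i), the bound in (7.56)."""
    return sum(math.comb(n, i) for i in range(d + 1))

def polynomial_bound(n, d):
    """(e n / d)^d, the bound in (7.57); valid for n >= d >= 1."""
    return (math.e * n / d) ** d

d = 3
for n in (3, 5, 10, 50):
    print(n, 2 ** n, sauer_sum(n, d), round(polynomial_bound(n, d), 1))
```

For $n \le d$ the binomial sum equals $2^n$, while for larger $n$ it grows only like $n^d$ and stays below $(en/d)^d$.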
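Class $\mathcal{A}$                               VC dimension $V_{\mathcal{A}}$
$\mathcal{A} = \{A_1,\dots,A_N\}$                 $\le \log_2 N$
Intervals $[a,b]$ on the real line                $2$
Discs in $\mathbb{R}^2$                           $3$
Closed balls in $\mathbb{R}^d$                    $\le d+2$
Rectangles in $\mathbb{R}^d$                      $2d$
Half-spaces in $\mathbb{R}^d$                     $d+1$
Convex polygons in $\mathbb{R}^2$                 $\infty$

Table 7.1. The VC dimension of some classes $\mathcal{A}$.

Proof. It follows from Hoeffding's inequality that, for each $f \in \mathcal{F}$, $\mathbb{P}(|P_n(f) - P(f)| > \epsilon) \le 2e^{-n\epsilon^2/2}$. Hence,
$$\mathbb{P}\Bigl(\max_{f\in\mathcal{F}} |P_n(f) - P(f)| > \epsilon\Bigr) = \mathbb{P}\bigl(|P_n(f) - P(f)| > \epsilon \text{ for some } f \in \mathcal{F}\bigr) \le \sum_{j=1}^N \mathbb{P}\bigl(|P_n(f_j) - P(f_j)| > \epsilon\bigr) \le 2Ne^{-n\epsilon^2/2}.$$
The conclusion follows by setting the right hand side equal to $\delta$ and solving for $\epsilon$.

Now we consider results for the case where $\mathcal{F}$ is infinite. We begin with an important result due to Vapnik and Chervonenkis.

7.60 Theorem (Vapnik and Chervonenkis). Let $\mathcal{F}$ be a class of binary functions. For any $t > \sqrt{2/n}$,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P)f| > t\Bigr) \le 4\,s(\mathcal{F},2n)\,e^{-nt^2/8} \tag{7.61}$$
and hence, with probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le \sqrt{\frac{8}{n}\log\Bigl(\frac{4\,s(\mathcal{F},2n)}{\delta}\Bigr)}. \tag{7.62}$$

Before proving the theorem, we need the symmetrization lemma. Let $Z_1',\dots,Z_n'$ denote a second independent sample from $P$. Let $P_n'$ denote the empirical distribution of this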
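second sample. The variables $Z_1',\dots,Z_n'$ are called a ghost sample.

7.63 Lemma (Symmetrization). For all $t > \sqrt{2/n}$,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P)f| > t\Bigr) \le 2\,\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P_n')f| > t/2\Bigr). \tag{7.64}$$

Proof. Let $f_n \in \mathcal{F}$ maximize $|(P_n - P)f|$. Note that $f_n$ is a random function as it depends on $Z_1,\dots,Z_n$. We claim that if $|(P_n - P)f_n| > t$ and $|(P - P_n')f_n| \le t/2$ then $|(P_n' - P_n)f_n| > t/2$. This follows since
$$t < |(P_n - P)f_n| = |(P_n - P_n' + P_n' - P)f_n| \le |(P_n - P_n')f_n| + |(P_n' - P)f_n| \le |(P_n - P_n')f_n| + \frac{t}{2}$$
and hence $|(P_n' - P_n)f_n| > t/2$. So
$$I(|(P_n - P)f_n| > t)\,I(|(P - P_n')f_n| \le t/2) = I\bigl(|(P_n - P)f_n| > t,\ |(P - P_n')f_n| \le t/2\bigr) \le I(|(P_n' - P_n)f_n| > t/2).$$
Now take the expected value over $Z_1',\dots,Z_n'$ and conclude that
$$I(|(P_n - P)f_n| > t)\,\mathbb{P}'(|(P - P_n')f_n| \le t/2) \le \mathbb{P}'(|(P_n' - P_n)f_n| > t/2). \tag{7.65}$$
By Chebyshev's inequality,
$$\mathbb{P}'(|(P - P_n')f_n| \le t/2) \ge 1 - \frac{4\,\mathrm{Var}'(f_n)}{nt^2} \ge 1 - \frac{1}{nt^2} \ge \frac{1}{2}.$$
(Here we used the fact that $W \in [0,1]$ implies that $\mathrm{Var}(W) \le 1/4$.) Inserting this into (7.65) we have that
$$I(|(P_n - P)f_n| > t) \le 2\,\mathbb{P}'(|(P_n' - P_n)f_n| > t/2).$$
Thus,
$$I\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P)f| > t\Bigr) \le 2\,\mathbb{P}'\Bigl(\sup_{f\in\mathcal{F}} |(P_n' - P_n)f| > t/2\Bigr).$$
Now take the expectation over $Z_1,\dots,Z_n$ to conclude that
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P)f| > t\Bigr) \le 2\,\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n' - P_n)f| > t/2\Bigr).$$

The importance of symmetrization is that we have replaced $(P_n - P)f$, which can take any real value, with $(P_n - P_n')f$, which can take only finitely many values. Now we prove the Vapnik-Chervonenkis theorem.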
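Proof. Let $V = \mathcal{F}_{Z_1',\dots,Z_n',Z_1,\dots,Z_n}$. For any $v \in V$ write $(P_n' - P_n)v$ to mean $(1/n)\bigl(\sum_{i=1}^n v_i - \sum_{i=n+1}^{2n} v_i\bigr)$. Using the symmetrization lemma and Hoeffding's inequality,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P)f| > t\Bigr) \le 2\,\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n' - P_n)f| > t/2\Bigr) = 2\,\mathbb{P}\Bigl(\max_{v\in V} |(P_n' - P_n)v| > t/2\Bigr)$$
$$\le 2\sum_{v\in V}\mathbb{P}\bigl(|(P_n' - P_n)v| > t/2\bigr) \le 2\sum_{v\in V} 2e^{-nt^2/8} \le 4\,s(\mathcal{F},2n)\,e^{-nt^2/8}.$$

Recall that, for a class with finite VC dimension $d$, $s(\mathcal{F},n) \le (en/d)^d$ and so $s(\mathcal{F},2n) \le (2en/d)^d$; hence we have:

7.66 Corollary. If $\mathcal{F}$ has finite VC dimension $d$, then, with probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le \sqrt{\frac{8}{n}\Bigl(\log\Bigl(\frac{4}{\delta}\Bigr) + d\log\Bigl(\frac{2en}{d}\Bigr)\Bigr)}. \tag{7.67}$$

7.3.2 Rademacher Complexity

A more general way to develop uniform bounds is to use a quantity called Rademacher complexity. In this section we assume that $\mathcal{F}$ is a class of functions $f$ such that $0 \le f(z) \le 1$. Random variables $\sigma_1,\dots,\sigma_n$ are called Rademacher random variables if they are independent, identically distributed and $\mathbb{P}(\sigma_i = 1) = \mathbb{P}(\sigma_i = -1) = 1/2$. Define the Rademacher complexity of $\mathcal{F}$ by
$$\mathrm{Rad}_n(\mathcal{F}) = \mathbb{E}\Bigl(\sup_{f\in\mathcal{F}}\Bigl|\frac{1}{n}\sum_{i=1}^n \sigma_i f(Z_i)\Bigr|\Bigr). \tag{7.68}$$
Define the empirical Rademacher complexity of $\mathcal{F}$ by
$$\mathrm{Rad}_n(\mathcal{F},Z^n) = \mathbb{E}_\sigma\Bigl(\sup_{f\in\mathcal{F}}\Bigl|\frac{1}{n}\sum_{i=1}^n \sigma_i f(Z_i)\Bigr|\Bigr) \tag{7.69}$$
where $Z^n = (Z_1,\dots,Z_n)$ and the expectation is over $\sigma$ only.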
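Because the empirical Rademacher complexity depends on the class only through its projection onto the sample, it can be approximated by simulating the signs $\sigma_i$. The sketch below is a Monte Carlo estimate under the assumption that the definition takes the absolute value of the signed average; the function name, the number of draws and the two example classes (thresholds on $n$ sorted points, and the single constant function $1$) are illustrative choices.

```python
import random

def empirical_rademacher(vectors, draws=500, seed=0):
    """Monte Carlo estimate of E_sigma max_v |(1/n) sum_i sigma_i v_i|,
    where `vectors` is the projection of the class onto the sample."""
    rng = random.Random(seed)
    vectors = list(vectors)
    n = len(vectors[0])
    total = 0.0
    for _ in range(draws):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # Rademacher signs
        total += max(abs(sum(s * v for s, v in zip(sigma, vec))) / n
                     for vec in vectors)
    return total / draws

n = 50
# Threshold class on n sorted distinct points: vectors (0,...,0,1,...,1).
thresholds = [tuple(int(i >= k) for i in range(n)) for k in range(n + 1)]
# A one-element class containing only the constant function 1.
single = [tuple([1] * n)]
rad_thresholds = empirical_rademacher(thresholds)
rad_single = empirical_rademacher(single)
print(rad_thresholds, rad_single)
```

With a shared seed both estimates use the same sign draws, so the larger class can never give a smaller value, in line with the monotonicity property of Rademacher complexity.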
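Intuitively, $\mathrm{Rad}_n(\mathcal{F})$ is large if we can find functions $f \in \mathcal{F}$ that look like random noise, that is, they are highly correlated with $\sigma_1,\dots,\sigma_n$. Here are some properties of the Rademacher complexity.

7.70 Lemma.
1. If $\mathcal{F} \subset \mathcal{G}$ then $\mathrm{Rad}_n(\mathcal{F},Z^n) \le \mathrm{Rad}_n(\mathcal{G},Z^n)$.
2. Let $\mathrm{conv}(\mathcal{F})$ denote the convex hull of $\mathcal{F}$. Then $\mathrm{Rad}_n(\mathcal{F},Z^n) = \mathrm{Rad}_n(\mathrm{conv}(\mathcal{F}),Z^n)$.
3. For any $c \in \mathbb{R}$, $\mathrm{Rad}_n(c\mathcal{F},Z^n) = |c|\,\mathrm{Rad}_n(\mathcal{F},Z^n)$.
4. Let $g:\mathbb{R} \to \mathbb{R}$ be such that $g(0) = 0$ and $|g(y) - g(x)| \le L|x - y|$ for all $x,y$. Then $\mathrm{Rad}_n(g\circ\mathcal{F},Z^n) \le 2L\,\mathrm{Rad}_n(\mathcal{F},Z^n)$.

Proof. See the exercises.

7.71 Theorem. With probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le 2\,\mathrm{Rad}_n(\mathcal{F}) + \sqrt{\frac{1}{2n}\log\Bigl(\frac{2}{\delta}\Bigr)} \tag{7.72}$$
and
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le 2\,\mathrm{Rad}_n(\mathcal{F},Z^n) + \sqrt{\frac{4}{n}\log\Bigl(\frac{2}{\delta}\Bigr)}. \tag{7.73}$$

Proof. The proof has two steps. First we show that $\sup_{f\in\mathcal{F}} |P_n(f) - P(f)|$ is close to its mean. Then we bound the mean.

Step 1: Let $g(Z_1,\dots,Z_n) = \sup_{f\in\mathcal{F}} |P_n(f) - P(f)|$. If we change $Z_i$ to some other value $Z_i'$ then $|g(Z_1,\dots,Z_n) - g(Z_1,\dots,Z_i',\dots,Z_n)| \le \frac{1}{n}$. By McDiarmid's inequality,
$$\mathbb{P}\bigl(|g(Z_1,\dots,Z_n) - \mathbb{E}[g(Z_1,\dots,Z_n)]| > \epsilon\bigr) \le 2e^{-2n\epsilon^2}.$$
Hence, with probability at least $1-\delta$,
$$g(Z_1,\dots,Z_n) \le \mathbb{E}[g(Z_1,\dots,Z_n)] + \sqrt{\frac{1}{2n}\log\Bigl(\frac{2}{\delta}\Bigr)}. \tag{7.74}$$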
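Step 2: Now we bound $\mathbb{E}[g(Z_1,\dots,Z_n)]$. Once again we introduce a ghost sample $Z_1',\dots,Z_n'$ and Rademacher variables $\sigma_1,\dots,\sigma_n$. Note that $P(f) = \mathbb{E}'P_n'(f)$. Also note that
$$\frac{1}{n}\sum_{i=1}^n \bigl(f(Z_i') - f(Z_i)\bigr) \stackrel{d}{=} \frac{1}{n}\sum_{i=1}^n \sigma_i\bigl(f(Z_i') - f(Z_i)\bigr)$$
where $\stackrel{d}{=}$ means equal in distribution. Hence,
$$\mathbb{E}[g(Z_1,\dots,Z_n)] = \mathbb{E}\Bigl[\sup_{f\in\mathcal{F}} |P(f) - P_n(f)|\Bigr] = \mathbb{E}\Bigl[\sup_{f\in\mathcal{F}} \bigl|\mathbb{E}'\bigl(P_n'(f) - P_n(f)\bigr)\bigr|\Bigr]$$
$$\le \mathbb{E}\,\mathbb{E}'\Bigl[\sup_{f\in\mathcal{F}} |P_n'(f) - P_n(f)|\Bigr] = \mathbb{E}\,\mathbb{E}'\Bigl[\sup_{f\in\mathcal{F}} \Bigl|\frac{1}{n}\sum_{i=1}^n \bigl(f(Z_i') - f(Z_i)\bigr)\Bigr|\Bigr]$$
$$= \mathbb{E}\,\mathbb{E}'\Bigl[\sup_{f\in\mathcal{F}} \Bigl|\frac{1}{n}\sum_{i=1}^n \sigma_i\bigl(f(Z_i') - f(Z_i)\bigr)\Bigr|\Bigr]$$
$$\le \mathbb{E}'\Bigl[\sup_{f\in\mathcal{F}} \Bigl|\frac{1}{n}\sum_{i=1}^n \sigma_i f(Z_i')\Bigr|\Bigr] + \mathbb{E}\Bigl[\sup_{f\in\mathcal{F}} \Bigl|\frac{1}{n}\sum_{i=1}^n \sigma_i f(Z_i)\Bigr|\Bigr] = 2\,\mathrm{Rad}_n(\mathcal{F}).$$
Combining this bound with (7.74) proves the first result.

To prove the second result, let $a(Z_1,\dots,Z_n) = \mathrm{Rad}_n(\mathcal{F},Z^n)$ and note that $a(Z_1,\dots,Z_n)$ changes by at most $1/n$ if we change one observation. McDiarmid's inequality implies that $|\mathrm{Rad}_n(\mathcal{F},Z^n) - \mathrm{Rad}_n(\mathcal{F})| \le \sqrt{\frac{1}{2n}\log\bigl(\frac{2}{\delta}\bigr)}$ with probability at least $1-\delta$. Combining this with the first result yields the second result.

In the special case where $\mathcal{F}$ is a class of binary functions, we can relate $\mathrm{Rad}_n(\mathcal{F})$ to shattering numbers.

7.75 Theorem. Let $\mathcal{F}$ be a set of binary functions. Then, for all $n$,
$$\mathrm{Rad}_n(\mathcal{F}) \le \sqrt{\frac{2\log s(\mathcal{F},n)}{n}}. \tag{7.76}$$

Proof. Let $D = \{Z_1,\dots,Z_n\}$. Define $S(f,\sigma) = \frac{1}{n}\sum_{i=1}^n \sigma_i f(Z_i)$ and $S(v,\sigma) = \frac{1}{n}\sum_{i=1}^n \sigma_i v_i$. Now, $-1 \le \sigma_i f(Z_i) \le 1$. Note that
$$\mathrm{Rad}_n(\mathcal{F}) = \mathbb{E}\Bigl(\sup_{f\in\mathcal{F}} |S(f,\sigma)|\Bigr) = \mathbb{E}\Bigl(\mathbb{E}\Bigl(\sup_{f\in\mathcal{F}} |S(f,\sigma)| \,\Big|\, D\Bigr)\Bigr) = \mathbb{E}\Bigl(\mathbb{E}\Bigl(\max_{v\in V} |S(v,\sigma)| \,\Big|\, D\Bigr)\Bigr)$$
where $V = \mathcal{F}_{Z_1,\dots,Z_n}$. Now, $\sigma_i v_i/n$ has mean $0$ and $-1/n \le \sigma_i v_i/n \le 1/n$ so, by Lemma 7.10, $\mathbb{E}(e^{t\sigma_i v_i/n}) \le e^{t^2/(2n^2)}$ for any $t > 0$. From Theorem 7.47,
$$\mathbb{E}\Bigl(\max_{v\in V} |S(v,\sigma)| \,\Big|\, D\Bigr) \le \sqrt{\frac{2\log |V|}{n}} \le \sqrt{\frac{2\log s(\mathcal{F},n)}{n}}$$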
and the result follows.

In fact, there is a sharper relationship between $\mathrm{Rad}_n(\mathcal{F})$ and VC dimension.

7.77 Theorem. Suppose that $\mathcal{F}$ has finite VC dimension $d$. There exists a universal constant $C > 0$ such that $\mathrm{Rad}_n(\mathcal{F}) \le C\sqrt{d/n}$.

For a proof, see, for example, Devroye and Lugosi (2001).

Combining these results with Theorem 7.75 and Theorem 7.77 we get the following result.

7.78 Corollary. With probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le \sqrt{\frac{8\log s(\mathcal{F},n)}{n}} + \sqrt{\frac{1}{2n}\log\Bigl(\frac{2}{\delta}\Bigr)}. \tag{7.79}$$
If $\mathcal{F}$ has finite VC dimension $d$ then, with probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le 2C\sqrt{\frac{d}{n}} + \sqrt{\frac{1}{2n}\log\Bigl(\frac{2}{\delta}\Bigr)}. \tag{7.80}$$

7.3.3 Bounds For Classes of Real Valued Functions

Suppose now that $\mathcal{F}$ is a class of real-valued functions. There are various methods to obtain uniform bounds. We consider two such methods: covering numbers and bracketing numbers.

If $Q$ is a measure and $p \ge 1$, define
$$\|f\|_{L_p(Q)} = \Bigl(\int |f(x)|^p\,dQ(x)\Bigr)^{1/p}.$$
When $Q$ is Lebesgue measure we simply write $\|f\|_p$. We also define
$$\|f\|_\infty = \sup_x |f(x)|.$$
A set $\mathcal{C} = \{f_1,\dots,f_N\}$ is an $\epsilon$-cover of $\mathcal{F}$ (or an $\epsilon$-net) if, for every $f \in \mathcal{F}$ there exists an $f_j \in \mathcal{C}$ such that $\|f - f_j\|_{L_p(Q)} < \epsilon$.
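7.81 Definition. The size of the smallest $\epsilon$-cover is called the covering number and is denoted by $N_p(\epsilon,\mathcal{F},Q)$. The uniform covering number is defined by
$$N_p(\epsilon,\mathcal{F}) = \sup_Q N_p(\epsilon,\mathcal{F},Q)$$
where the supremum is over all probability measures $Q$.

Now we show how covering numbers can be used to obtain bounds.

7.82 Theorem. Suppose that $\|f\|_\infty \le B$ for all $f \in \mathcal{F}$. Then,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| > \epsilon\Bigr) \le 2N(\epsilon/3,\mathcal{F},L_\infty)\,e^{-n\epsilon^2/(18B^2)}.$$

Proof. Let $N = N(\epsilon/3,\mathcal{F},L_\infty)$ and let $\mathcal{C} = \{f_1,\dots,f_N\}$ be an $\epsilon/3$ cover. For any $f \in \mathcal{F}$ there is an $f_j \in \mathcal{C}$ such that $\|f - f_j\|_\infty \le \epsilon/3$. So
$$|P_n(f) - P(f)| \le |P_n(f) - P_n(f_j)| + |P_n(f_j) - P(f_j)| + |P(f_j) - P(f)| \le |P_n(f_j) - P(f_j)| + \frac{2\epsilon}{3}.$$
Hence,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| > \epsilon\Bigr) \le \mathbb{P}\Bigl(\max_{f_j\in\mathcal{C}} |P_n(f_j) - P(f_j)| + \frac{2\epsilon}{3} > \epsilon\Bigr) = \mathbb{P}\Bigl(\max_{f_j\in\mathcal{C}} |P_n(f_j) - P(f_j)| > \frac{\epsilon}{3}\Bigr)$$
$$\le \sum_{j=1}^N \mathbb{P}\Bigl(|P_n(f_j) - P(f_j)| > \frac{\epsilon}{3}\Bigr) \le 2N(\epsilon/3,\mathcal{F},L_\infty)\,e^{-n\epsilon^2/(18B^2)}$$
from the union bound and Hoeffding's inequality.

The VC dimension can be used to bound covering numbers.

7.83 Theorem. Let $\mathcal{F}$ be a class of functions $f:\mathbb{R}^d \to [0,B]$ with VC dimension $d$ such that $2 \le d < \infty$. Let $p \ge 1$ and $0 < \epsilon < B/4$. Then
$$N_p(\epsilon,\mathcal{F}) \le 3\Bigl(\frac{2eB^p}{\epsilon^p}\log\Bigl(\frac{3eB^p}{\epsilon^p}\Bigr)\Bigr)^d.$$
(For a proof, see Devroye, Gyorfi and Lugosi (1996).) However, there are cases where the covering numbers are finite and yet the VC dimension is infinite.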
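For a finite, tabulated family of functions an $\epsilon$-cover in the sup norm can be built greedily, which gives an upper bound on the covering number (a greedy cover is valid but need not be the smallest). The sketch below uses a hypothetical one-parameter family $f_\theta(x) = \theta x$ on a grid of $[0,1]$; the family, the grid and the value of $\epsilon$ are assumptions made only for illustration.

```python
def sup_distance(f, g):
    """Sup-norm distance between two functions tabulated on a common grid."""
    return max(abs(a - b) for a, b in zip(f, g))

def greedy_cover(functions, eps):
    """Greedily pick centers so that every function is within eps
    (strictly) of some center. The result is a valid eps-cover, so its
    size is an upper bound on the covering number of the family."""
    centers = []
    for f in functions:
        # f becomes a new center only if no existing center is within eps.
        if all(sup_distance(f, c) >= eps for c in centers):
            centers.append(f)
    return centers

grid = [i / 100 for i in range(101)]
# Hypothetical class: f_theta(x) = theta * x, theta in {0, 0.01, ..., 1}.
family = [[(k / 100) * x for x in grid] for k in range(101)]
cover = greedy_cover(family, eps=0.1)
print(len(cover), "centers cover", len(family), "functions")
```

Here the sup-norm distance between $f_\theta$ and $f_{\theta'}$ is $|\theta - \theta'|$, so the cover size scales like $1/\epsilon$, the behaviour expected of a one-dimensional parametric class.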
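Bracketing Numbers. Another measure of complexity is the bracketing number. Given a pair of functions $\ell$ and $u$ with $\ell \le u$, we define the bracket
$$[\ell,u] = \bigl\{h : \ell(x) \le h(x) \le u(x) \text{ for all } x\bigr\}.$$
A collection of pairs of functions $(\ell_1,u_1),\dots,(\ell_N,u_N)$ is a bracketing of $\mathcal{F}$ if
$$\mathcal{F} \subset \bigcup_{j=1}^N [\ell_j,u_j].$$
The collection is an $\epsilon$-$L_q(P)$-bracketing if it is a bracketing and if
$$\Bigl(\int |u_j(x) - \ell_j(x)|^q\,dP(x)\Bigr)^{1/q} \le \epsilon$$
for $j = 1,\dots,N$. The bracketing number $N_{[\,]}(\epsilon,\mathcal{F},L_q(P))$ is the size of the smallest $\epsilon$ bracketing. Bracketing numbers are a little larger than covering numbers but provide stronger control of the class $\mathcal{F}$.

7.84 Theorem.
1. $N_p(\epsilon,\mathcal{F},P) \le N_{[\,]}(2\epsilon,\mathcal{F},L_p(P))$.
2. Let $X_1,\dots,X_n \sim P$. Suppose that $N_{[\,]}(\epsilon,\mathcal{F},L_1(P)) < \infty$ for all $\epsilon > 0$. Then, for every $\delta > 0$,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| > \delta\Bigr) \to 0 \tag{7.85}$$
as $n \to \infty$.

Proof. See exercise 11.

7.86 Theorem. Let $A = \sup_{f\in\mathcal{F}} \int |f|\,dP$ and $B = \sup_{f\in\mathcal{F}} \|f\|_\infty$. Then
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| > \epsilon\Bigr) \le 2N_{[\,]}(\epsilon/8,\mathcal{F},L_1(P))\exp\Bigl(-\frac{3n\epsilon^2}{4B[6A+\epsilon]}\Bigr) + 2N_{[\,]}(\epsilon/8,\mathcal{F},L_1(P))\exp\Bigl(-\frac{3n\epsilon}{40B}\Bigr).$$
Hence, if $\epsilon \le 2A/3$, so that $6A + \epsilon \le 20A/3$ and $3n\epsilon/(40B) \ge 9n\epsilon^2/(80AB)$,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| > \epsilon\Bigr) \le 4N_{[\,]}(\epsilon/8,\mathcal{F},L_1(P))\exp\Bigl(-\frac{9n\epsilon^2}{80AB}\Bigr). \tag{7.87}$$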
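Proof. (This proof follows Yukich (1985).) For notational simplicity in the proof, let us write $N(\epsilon) \equiv N_{[\,]}(\epsilon,\mathcal{F},L_1(P))$. Define $z_n(f) = \int f\,(dP_n - dP)$. Let $[\ell_1,u_1],\dots,[\ell_N,u_N]$ be a minimal $\epsilon/8$ bracketing. We may assume that for each $j$, $\|u_j\|_\infty \le B$ and $\|\ell_j\|_\infty \le B$. (Otherwise, we simply truncate the brackets.) For each $j$, choose some $f_j \in [\ell_j,u_j]$. Consider any $f \in \mathcal{F}$ and let $[\ell_j,u_j]$ denote a bracket containing $f$. Then
$$|z_n(f)| \le |z_n(f_j)| + |z_n(f - f_j)|.$$
Furthermore,
$$|z_n(f - f_j)| = \Bigl|\int (f - f_j)(dP_n - dP)\Bigr| \le \int |f - f_j|\,(dP_n + dP) \le \int |u_j - \ell_j|\,(dP_n + dP)$$
$$= \int |u_j - \ell_j|\,(dP_n - dP) + 2\int |u_j - \ell_j|\,dP \le \int |u_j - \ell_j|\,(dP_n - dP) + 2\,\frac{\epsilon}{8} = z_n(|u_j - \ell_j|) + \frac{\epsilon}{4}.$$
Hence,
$$|z_n(f)| \le |z_n(f_j)| + \Bigl[z_n(|u_j - \ell_j|) + \frac{\epsilon}{4}\Bigr].$$
Thus,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |z_n(f)| > \epsilon\Bigr) \le \mathbb{P}\Bigl(\max_j |z_n(f_j)| > \epsilon/2\Bigr) + \mathbb{P}\Bigl(\max_j z_n(|u_j - \ell_j|) + \epsilon/4 > \epsilon/2\Bigr)$$
$$\le \mathbb{P}\Bigl(\max_j |z_n(f_j)| > \epsilon/2\Bigr) + \mathbb{P}\Bigl(\max_j z_n(|u_j - \ell_j|) > \epsilon/4\Bigr).$$
Now
$$\mathrm{Var}(f_j) \le \int f_j^2\,dP = \int |f_j|\,|f_j|\,dP \le \|f_j\|_\infty\int |f_j|\,dP \le AB.$$
Hence, by Bernstein's inequality,
$$\mathbb{P}\Bigl(\max_j |z_n(f_j)| > \epsilon/2\Bigr) \le 2\sum_{j=1}^N \exp\Bigl(-\frac{1}{2}\,\frac{n(\epsilon/2)^2}{AB + B\epsilon/6}\Bigr) \le 2N(\epsilon/8)\exp\Bigl(-\frac{3n\epsilon^2}{4B[6A+\epsilon]}\Bigr).$$
Similarly,
$$\mathrm{Var}(|u_j - \ell_j|) \le \int (u_j - \ell_j)^2\,dP \le \int |u_j - \ell_j|\,|u_j - \ell_j|\,dP \le \|u_j - \ell_j\|_\infty\int |u_j - \ell_j|\,dP \le 2B\,\frac{\epsilon}{8} = \frac{B\epsilon}{4}.$$
Also, $\|u_j - \ell_j\|_\infty \le 2B$. Hence, by Bernstein's inequality,
$$\mathbb{P}\Bigl(\max_j z_n(|u_j - \ell_j|) > \epsilon/4\Bigr) \le 2\sum_{j=1}^N \exp\Bigl(-\frac{1}{2}\,\frac{n(\epsilon/4)^2}{B\epsilon/4 + 2B(\epsilon/4)/3}\Bigr) \le 2N(\epsilon/8)\exp\Bigl(-\frac{3n\epsilon}{40B}\Bigr).$$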
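The following result is from Example 19.7 from van der Vaart (1998).

7.88 Lemma. Let $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ where $\Theta$ is a bounded subset of $\mathbb{R}^d$. Suppose there exists a function $m$ such that, for every $\theta_1,\theta_2$,
$$|f_{\theta_1}(x) - f_{\theta_2}(x)| \le m(x)\,\|\theta_1 - \theta_2\|.$$
Then,
$$N_{[\,]}(\epsilon,\mathcal{F},L_q(P)) \le \Bigl(\frac{4\sqrt{d}\,\mathrm{diam}(\Theta)\,\|m\|_{L_q(P)}}{\epsilon}\Bigr)^d$$
where $\|m\|_{L_q(P)} = \bigl(\int |m(x)|^q\,dP(x)\bigr)^{1/q}$.

Proof. Let
$$\delta = \frac{\epsilon}{4\sqrt{d}\,\|m\|_{L_q(P)}}.$$
We can cover $\Theta$ with (at most) $N = (\mathrm{diam}(\Theta)/\delta)^d$ cubes $C_1,\dots,C_N$ of size $\delta$. Let $c_1,\dots,c_N$ denote the centers of the cubes. Note that $C_j \subset B(c_j,\sqrt{d}\,\delta)$ where $B(x,r)$ denotes a ball of radius $r$ centered at $x$. Hence, $\bigcup_j B(c_j,\sqrt{d}\,\delta)$ covers $\Theta$. Let $\theta_j$ be the projection of $c_j$ onto $\Theta$. Then $\bigcup_j B(\theta_j,2\sqrt{d}\,\delta)$ covers $\Theta$. In summary, for every $\theta \in \Theta$ there is a $\theta_j \in \{\theta_1,\dots,\theta_N\}$ such that
$$\|\theta - \theta_j\| \le 2\sqrt{d}\,\delta = \frac{\epsilon}{2\,\|m\|_{L_q(P)}}.$$
Define
$$\ell_j = f_{\theta_j} - \frac{\epsilon\,m(x)}{2\,\|m\|_{L_q(P)}} \quad\text{and}\quad u_j = f_{\theta_j} + \frac{\epsilon\,m(x)}{2\,\|m\|_{L_q(P)}}.$$
We claim that the brackets $[\ell_1,u_1],\dots,[\ell_N,u_N]$ cover $\mathcal{F}$. To see this, choose any $f_\theta \in \mathcal{F}$. Let $\theta_j$ be the closest element of $\{\theta_1,\dots,\theta_N\}$ to $\theta$. Then
$$f_\theta(x) = f_{\theta_j}(x) + f_\theta(x) - f_{\theta_j}(x) \le f_{\theta_j}(x) + |f_\theta(x) - f_{\theta_j}(x)| \le f_{\theta_j}(x) + m(x)\,\|\theta - \theta_j\| \le f_{\theta_j}(x) + \frac{\epsilon\,m(x)}{2\,\|m\|_{L_q(P)}} = u_j(x).$$
By a similar argument, $f_\theta(x) \ge \ell_j(x)$. Also, $\int (u_j - \ell_j)^q\,dP \le \epsilon^q$. Finally, note that the number of brackets is
$$N = \Bigl(\frac{\mathrm{diam}(\Theta)}{\delta}\Bigr)^d = \Bigl(\frac{4\sqrt{d}\,\mathrm{diam}(\Theta)\,\|m\|_{L_q(P)}}{\epsilon}\Bigr)^d.$$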
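7.89 Example (Density Estimation). Let $X_1,\dots,X_n \sim P$ where $P$ has support on a compact set $\mathcal{X} \subset \mathbb{R}^d$. Consider the kernel density estimator
$$\hat p_h(x) = \frac{1}{nh^d}\sum_{i=1}^n K\Bigl(\frac{\|x - X_i\|}{h}\Bigr)$$
where $K$ is a smooth symmetric function and $h > 0$ is a bandwidth. We study $\hat p_h$ in detail in the chapter on nonparametric density estimation. Here we bound the sup norm distance between $\hat p_h(x)$ and its mean $p_h(x) = \mathbb{E}(\hat p_h(x))$.

7.90 Theorem. Suppose that $K(x) \le K(0)$ for all $x$ and that
$$|K(y) - K(x)| \le L\,\|x - y\|$$
for all $x,y$. Then
$$\mathbb{P}\Bigl(\sup_x |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le 2\Bigl(\frac{32L\sqrt{d}\,\mathrm{diam}(\mathcal{X})}{\epsilon\,h^{d+1}}\Bigr)^d\Bigl[\exp\Bigl(-\frac{3n\epsilon^2h^d}{4K(0)(6+\epsilon)}\Bigr) + \exp\Bigl(-\frac{3n\epsilon h^d}{40K(0)}\Bigr)\Bigr].$$
Hence, if $\epsilon \le 2/3$ then
$$\mathbb{P}\Bigl(\sup_x |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le 4\Bigl(\frac{32L\sqrt{d}\,\mathrm{diam}(\mathcal{X})}{\epsilon\,h^{d+1}}\Bigr)^d\exp\Bigl(-\frac{3n\epsilon^2h^d}{28K(0)}\Bigr).$$

Proof. Let $\mathcal{F} = \{f_x : x \in \mathcal{X}\}$ where $f_x(u) = h^{-d}K(\|x - u\|/h)$. We apply Theorem 7.86 with $A = 1$ and $B = K(0)/h^d$. We need to bound $N_{[\,]}(\epsilon,\mathcal{F},L_1(P))$. Now
$$|f_x(u) - f_y(u)| = \frac{1}{h^d}\Bigl|K\Bigl(\frac{\|x - u\|}{h}\Bigr) - K\Bigl(\frac{\|y - u\|}{h}\Bigr)\Bigr| \le \frac{L}{h^{d+1}}\bigl|\,\|x - u\| - \|y - u\|\,\bigr| \le \frac{L}{h^{d+1}}\,\|x - y\|.$$
Apply Lemma 7.88 with $m(x) = L/h^{d+1}$. This implies that
$$N_{[\,]}(\epsilon,\mathcal{F},L_1(P)) \le \Bigl(\frac{4L\sqrt{d}\,\mathrm{diam}(\mathcal{X})}{\epsilon\,h^{d+1}}\Bigr)^d.$$
Hence, Theorem 7.86 yields,
$$\mathbb{P}\Bigl(\sup_x |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le 2\Bigl(\frac{32L\sqrt{d}\,\mathrm{diam}(\mathcal{X})}{\epsilon\,h^{d+1}}\Bigr)^d\Bigl[\exp\Bigl(-\frac{3n\epsilon^2h^d}{4K(0)(6+\epsilon)}\Bigr) + \exp\Bigl(-\frac{3n\epsilon h^d}{40K(0)}\Bigr)\Bigr].$$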
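7.91 Corollary. Suppose that $h = h_n = (C_n/n)^\xi$ where $\xi \le 1/d$ and $C_n = (\log n)^a$ for some $a \ge 0$. Then
$$\mathbb{P}\Bigl(\sup_x |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le 4\Bigl(\frac{32L\sqrt{d}\,\mathrm{diam}(\mathcal{X})\,n^{\xi(d+1)}}{\epsilon\,C_n^{\xi(d+1)}}\Bigr)^d\exp\Bigl(-\frac{3\epsilon^2\,C_n^{\xi d}\,n^{1-\xi d}}{28K(0)}\Bigr).$$
Hence, for sufficiently large $n$,
$$\mathbb{P}\Bigl(\sup_x |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le c_1\exp\bigl(-c_2\,\epsilon^2\,C_n^{\xi d}\,n^{1-\xi d}\bigr).$$
Note that the proofs of the last two results did not depend on $P$. Hence, if $\mathcal{P}$ is the set of distributions with support on $\mathcal{X}$, we have that
$$\sup_{P\in\mathcal{P}} \mathbb{P}\Bigl(\sup_x |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le c_1\exp\bigl(-c_2\,\epsilon^2\,C_n^{\xi d}\,n^{1-\xi d}\bigr).$$

7.92 Example. Here are some further examples. In exercise 12 you are asked to prove these results.
1. Let $\mathcal{F}$ be the set of cdf's on $\mathbb{R}$. Then $N_{[\,]}(\epsilon,\mathcal{F},L_2(P)) \le 2/\epsilon^2$.
2. (Sobolev Spaces) Let $\mathcal{F}$ be the functions $f$ on $[0,1]$ such that $\|f\|_\infty \le 1$, the $(k-1)$ derivative is absolutely continuous and $\int (f^{(k)}(x))^2\,dx \le 1$. Then, there is a constant $C > 0$ such that
$$N_{[\,]}(\epsilon,\mathcal{F},L_\infty(P)) \le \exp\Bigl[C\Bigl(\frac{1}{\epsilon}\Bigr)^{1/k}\Bigr].$$
3. Let $\mathcal{F}$ be the set of monotone functions $f$ on $\mathbb{R}$ such that $\|f\|_\infty \le 1$. Then, there is a constant $C > 0$ such that
$$N_{[\,]}(\epsilon,\mathcal{F},L_1(P)) \le \exp\Bigl[C\Bigl(\frac{1}{\epsilon}\Bigr)\Bigr].$$

7.4 Additional results

7.4.1 Talagrand's Inequality

One of the most important developments in concentration of measure is Talagrand's inequality (Talagrand 1994, 1996) which can be thought of as a uniform version of Bernstein's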
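inequality. Let $\mathcal{F}$ be a class of functions and define $Z_n = \sup_{f\in\mathcal{F}} |P_n(f)|$.

7.93 Theorem. Let $v \ge \mathbb{E}\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^n f^2(X_i)$ and $U \ge \sup_{f\in\mathcal{F}} \|f\|_\infty$. There exists a universal constant $K$ such that
$$\mathbb{P}\Bigl(\Bigl|\sup_{f\in\mathcal{F}} |P_n(f)| - \mathbb{E}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f)|\Bigr)\Bigr| > t\Bigr) \le K\exp\Bigl\{-\frac{nt}{KU}\log\Bigl(1 + \frac{tU}{v}\Bigr)\Bigr\}. \tag{7.94}$$

To make use of Talagrand's inequality, we need to estimate $\mathbb{E}(\sup_{f\in\mathcal{F}} |P_n(f)|)$.

7.95 Theorem (Giné and Guillou, 2001). Suppose that there exist $A$ and $d$ such that
$$\sup_P N(\epsilon a,\mathcal{F},L_2(P)) \le (A/\epsilon)^d$$
where $a = \|F\|_{L_2(P)}$ and $F(x) = \sup_{f\in\mathcal{F}} |f(x)|$. Then for some $C > 0$
$$\mathbb{E}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f)|\Bigr) \le \frac{C}{n}\Bigl(dU\log\Bigl(\frac{AU}{\sigma}\Bigr) + \sqrt{d}\,\sqrt{n}\,\sigma\sqrt{\log\Bigl(\frac{AU}{\sigma}\Bigr)}\Bigr).$$

Combining these results gives Giné and Guillou's version of Talagrand's inequality:

7.96 Theorem. Let $v \ge \mathbb{E}\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^n f^2(X_i)$ and $U \ge \sup_{f\in\mathcal{F}} \|f\|_\infty$. There exists a universal constant $K$ such that
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| > t\Bigr) \le K\exp\Bigl\{-\frac{nt}{KU}\log\Bigl(1 + \frac{ntU}{K\bigl(\sqrt{n}\,\sigma + U\sqrt{\log(AU/\sigma)}\bigr)^2}\Bigr)\Bigr\} \tag{7.97}$$
whenever
$$t \ge \frac{C}{n}\Bigl(U\log\Bigl(\frac{AU}{\sigma}\Bigr) + \sqrt{n}\,\sigma\sqrt{\log\Bigl(\frac{AU}{\sigma}\Bigr)}\Bigr).$$

7.98 Example (Density Estimation). Giné and Guillou (2002) apply Talagrand's inequality to get bounds on density estimators. Let $X_1,\dots,X_n \sim P$ where $X_i \in \mathbb{R}^d$ and suppose that $P$ has density $p$. The kernel density estimator of $p$ with bandwidth $h$ is
$$\hat p_h(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h^d}K\Bigl(\frac{\|x - X_i\|}{h}\Bigr).$$
Applying the results above to $\hat p_h(x)$ we see that (under very weak conditions on $K$) for all small $\epsilon$ and large $n$,
$$\mathbb{P}\Bigl(\sup_{x\in\mathbb{R}^d} |\hat p_h(x) - p_h(x)| > \epsilon\Bigr) \le c_1e^{-c_2nh^d\epsilon^2} \tag{7.99}$$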
where $p_h(x) = \mathbb{E}(\hat p_h(x))$ and $c_1, c_2$ are positive constants. This agrees with the earlier result Theorem 7.90.

7.4.2 A Bound on Expected Values

Now we consider bounding the expected value of the maximum of an infinite set of random variables. Let $\{X_f : f \in \mathcal{F}\}$ be a collection of mean 0 random variables indexed by $f \in \mathcal{F}$ and let $d$ be a metric on $\mathcal{F}$. Let $N(\mathcal{F},r)$ be the covering number of $\mathcal{F}$, that is, the smallest number of balls of radius $r$ required to cover $\mathcal{F}$. Say that $\{X_f : f \in \mathcal{F}\}$ is sub-Gaussian if, for every $t > 0$ and every $f,g \in \mathcal{F}$,
$$\mathbb{E}\bigl(e^{t(X_f - X_g)}\bigr) \le e^{t^2d^2(f,g)/2}.$$
We say that $\{X_f : f \in \mathcal{F}\}$ is sample continuous if, for every sequence $f_1,f_2,\dots \in \mathcal{F}$ such that $d(f_i,f) \to 0$ for some $f \in \mathcal{F}$, we have that $X_{f_i} \to X_f$ almost surely. The following theorem is from Cesa-Bianchi and Lugosi (2006) and is a variation of a theorem due to Dudley (1978).

7.100 Theorem. Suppose that $\{X_f : f \in \mathcal{F}\}$ is sub-Gaussian and sample continuous. Then
$$\mathbb{E}\Bigl(\sup_{f\in\mathcal{F}} X_f\Bigr) \le 12\int_0^{D/2}\sqrt{\log N(\mathcal{F},\epsilon)}\,d\epsilon \tag{7.101}$$
where $D = \sup_{f,g\in\mathcal{F}} d(f,g)$.

Proof. The proof uses Dudley's chaining technique. We follow the version in Theorem 8.3 of Cesa-Bianchi and Lugosi (2006). Let $\mathcal{F}_k$ be a minimal cover of $\mathcal{F}$ of radius $D2^{-k}$. Thus $|\mathcal{F}_k| = N(\mathcal{F},D2^{-k})$. Let $f_0$ denote the unique element in $\mathcal{F}_0$. Each $X_f$ is a random variable and hence is a mapping from some sample space $S$ to the reals. Fix $s \in S$ and let $f_*$ be such that $\sup_{f\in\mathcal{F}} X_f(s) = X_{f_*}(s)$. (If an exact maximizer does not exist, we can choose an approximate maximizer but we shall assume an exact maximizer.) Let $f_k \in \mathcal{F}_k$ minimize the distance to $f_*$. Hence,
$$d(f_{k-1},f_k) \le d(f_*,f_k) + d(f_*,f_{k-1}) \le 3D2^{-k}.$$
Now $\lim_{k\to\infty} f_k = f_*$ and by sample continuity
$$\sup_{f} X_f(s) = X_{f_*}(s) = X_{f_0}(s) + \sum_{k=1}^\infty \bigl(X_{f_k}(s) - X_{f_{k-1}}(s)\bigr).$$
Recall that $\mathbb{E}(X_{f_0}) = 0$. Therefore,
$$\mathbb{E}\Bigl(\sup_{f} X_f\Bigr) \le \sum_{k=1}^\infty \mathbb{E}\max_{f,g}\,(X_f - X_g)$$
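where the max is over all $f \in \mathcal{F}_k$ and $g \in \mathcal{F}_{k-1}$ such that $d(f,g) \le 3D2^{-k}$. There are at most $N(\mathcal{F},D2^{-k})^2$ such pairs. By Theorem 7.47,
$$\mathbb{E}\max_{f,g}\,(X_f - X_g) \le 3D2^{-k}\sqrt{2\log N(\mathcal{F},D2^{-k})^2}.$$
By summing over $k$ we have
$$\mathbb{E}\Bigl(\sup_{f} X_f\Bigr) \le \sum_{k=1}^\infty 3D2^{-k}\sqrt{2\log N(\mathcal{F},D2^{-k})^2} = 12\sum_{k=1}^\infty D2^{-(k+1)}\sqrt{\log N(\mathcal{F},D2^{-k})} \le 12\int_0^{D/2}\sqrt{\log N(\mathcal{F},\epsilon)}\,d\epsilon.$$

7.102 Example. Let $Y_1,\dots,Y_n$ be a sample from a continuous cdf $F$ on $[0,1]$ with bounded density. Let $X_s = \sqrt{n}\,(F_n(s) - F(s))$ where $F_n$ is the empirical distribution function. The collection $\{X_s : s \in [0,1]\}$ can be shown to be sub-Gaussian and sample continuous with respect to the Euclidean metric on $[0,1]$. The covering number is $N([0,1],r) = 1/r$. Hence,
$$\mathbb{E}\Bigl(\sup_{0\le s\le 1}\sqrt{n}\,(F_n(s) - F(s))\Bigr) \le 12\int_0^{1/2}\sqrt{\log(1/\epsilon)}\,d\epsilon \le C$$
for some $C > 0$. Hence,
$$\mathbb{E}\Bigl(\sup_{0\le s\le 1}(F_n(s) - F(s))\Bigr) \le \frac{C}{\sqrt{n}}.$$

7.5 Summary

The most important results in this chapter are Hoeffding's inequality:
$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) \le 2e^{-2n\epsilon^2/c},$$
Bernstein's inequality
$$\mathbb{P}(|\bar X_n - \mu| > \epsilon) \le 2\exp\Bigl\{-\frac{n\epsilon^2}{2\sigma^2 + 2c\epsilon/3}\Bigr\},$$
the Vapnik-Chervonenkis bound,
$$\mathbb{P}\Bigl(\sup_{f\in\mathcal{F}} |(P_n - P)f| > t\Bigr) \le 4\,s(\mathcal{F},2n)\,e^{-nt^2/8}$$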
and the Rademacher bound: with probability at least $1-\delta$,
$$\sup_{f\in\mathcal{F}} |P_n(f) - P(f)| \le 2\,\mathrm{Rad}_n(\mathcal{F}) + \sqrt{\frac{1}{2n}\log\Bigl(\frac{2}{\delta}\Bigr)}.$$
These, and similar results, provide the theoretical basis for many statistical machine learning methods. The literature contains many refinements and extensions of these results.

7.6 Bibliographic Remarks

Concentration of measure is a vast and still growing area. Some good references are Devroye, Gyorfi and Lugosi (1996), van der Vaart and Wellner (1996), Chapter 19 of van der Vaart (1998), Dubhashi and Panconesi (2009), and Ledoux (2005).

Exercises

7.1 Suppose that $X \ge 0$ and $\mathbb{E}(X) < \infty$. Show that $\mathbb{E}(X) = \int_0^\infty \mathbb{P}(X \ge t)\,dt$.
7.2 Show that $h(u) \ge u^2/(2 + 2u/3)$ for $u \ge 0$ where $h(u) = (1+u)\log(1+u) - u$.
7.3 In the proof of McDiarmid's inequality, verify that $\mathbb{E}(V_i \mid X_1,\dots,X_{i-1}) = 0$.
7.4 Prove Lemma
7.5 Prove equation (7.24).
7.6 Prove the results in Table 7.1.
7.7 Derive Hoeffding's inequality from McDiarmid's inequality.
7.8 Prove lemma
7.9 Consider Example 7.102. Show that $\{X_s : s \in [0,1]\}$ is sub-Gaussian. Show that $\int_0^{1/2}\sqrt{\log(1/\epsilon)}\,d\epsilon \le C$ for some $C > 0$.
7.10 Prove Theorem 7.52.
7.11 Prove Theorem 7.84.
7.12 Prove the results in Example 7.92.
