# Stéphane Boucheron 1, Olivier Bousquet 2 and Gábor Lugosi 3

## Transcription

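6 TITLE WILL BE SET BY THE PUBLISHER

where $c \cdot A = \{ca : a \in A\}$ and $A \oplus B = \{a + b : a \in A,\ b \in B\}$. Moreover, if $A = \{a^{(1)}, \ldots, a^{(N)}\} \subset \mathbb{R}^n$ is a finite set, then

$$R_n(A) \le \max_{j=1,\ldots,N} \|a^{(j)}\| \, \frac{\sqrt{2 \log N}}{n} \tag{4}$$

where $\|\cdot\|$ denotes Euclidean norm. If

$$\mathrm{absconv}(A) = \Big\{ \sum_{j=1}^N c_j a^{(j)} : N \in \mathbb{N},\ \sum_{j=1}^N |c_j| \le 1,\ a^{(j)} \in A \Big\}$$

is the absolute convex hull of $A$, then

$$R_n(A) = R_n(\mathrm{absconv}(A)). \tag{5}$$

Finally, the contraction principle states that if $\varphi : \mathbb{R} \to \mathbb{R}$ is a function with $\varphi(0) = 0$ and Lipschitz constant $L_\varphi$, and $\varphi \circ A$ is the set of vectors of the form $(\varphi(a_1), \ldots, \varphi(a_n)) \in \mathbb{R}^n$ with $a \in A$, then

$$R_n(\varphi \circ A) \le L_\varphi R_n(A).$$

*Proof.* The first three properties are immediate from the definition. Inequality (4) follows from Hoeffding's inequality, which states that if $X$ is a bounded zero-mean random variable taking values in an interval $[\alpha, \beta]$, then for any $s > 0$, $\mathbb{E} \exp(sX) \le \exp\big(s^2 (\beta - \alpha)^2 / 8\big)$. In particular, by independence,

$$\mathbb{E} \exp\Big( s \frac{1}{n} \sum_{i=1}^n \sigma_i a_i \Big) = \prod_{i=1}^n \mathbb{E} \exp\Big( s \frac{1}{n} \sigma_i a_i \Big) \le \prod_{i=1}^n \exp\Big( \frac{s^2 a_i^2}{2 n^2} \Big) = \exp\Big( \frac{s^2 \|a\|^2}{2 n^2} \Big).$$

This implies that

$$e^{s R_n(A)} = \exp\Big( s\, \mathbb{E} \max_{j=1,\ldots,N} \frac{1}{n} \sum_{i=1}^n \sigma_i a_i^{(j)} \Big) \le \mathbb{E} \exp\Big( s \max_{j=1,\ldots,N} \frac{1}{n} \sum_{i=1}^n \sigma_i a_i^{(j)} \Big) \le \sum_{j=1}^N \mathbb{E}\, e^{s \frac{1}{n} \sum_{i=1}^n \sigma_i a_i^{(j)}} \le N \max_{j=1,\ldots,N} \exp\Big( \frac{s^2 \|a^{(j)}\|^2}{2 n^2} \Big).$$

Taking the logarithm of both sides, dividing by $s$, and choosing $s$ to minimize the obtained upper bound for $R_n(A)$, we arrive at (4). The identity (5) is easily seen from the definition. For a proof of the contraction principle, see Ledoux and Talagrand [133].

Often it is useful to derive further upper bounds on Rademacher averages. As an illustration, we consider the case when $\mathcal{F}$ is a class of indicator functions. Recall that this is the case in our motivating example in the classification problem described above, when each $f \in \mathcal{F}$ is the indicator function of a set of the form $\{(x, y) : g(x) \ne y\}$. In such a case, for any collection of points $x_1^n = (x_1, \ldots, x_n)$, $\mathcal{F}(x_1^n)$ is a finite subset of $\mathbb{R}^n$ whose cardinality is denoted by $S_{\mathcal{F}}(x_1^n)$ and is called the VC shatter coefficient (where VC stands for Vapnik-Chervonenkis). Obviously, $S_{\mathcal{F}}(x_1^n) \le 2^n$. By inequality (4), we have, for all $x_1^n$,

$$R_n(\mathcal{F}(x_1^n)) \le \sqrt{\frac{2 \log S_{\mathcal{F}}(x_1^n)}{n}} \tag{6}$$

where we used the fact that for each $f \in \mathcal{F}$, $\sum_i f(x_i)^2 \le n$. In particular,

$$\mathbb{E} \sup_{f \in \mathcal{F}} |P f - P_n f| \le 2\, \mathbb{E} \sqrt{\frac{2 \log S_{\mathcal{F}}(X_1^n)}{n}}.$$

The logarithm of the VC shatter coefficient may be upper bounded in terms of a combinatorial quantity, called the VC dimension. If $A \subset \{-1, 1\}^n$, then the VC dimension of $A$ is the size $V$ of the largest set of indices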
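Inequality (4) can be checked numerically on a small finite set. The sketch below is only an illustration under stated assumptions: it takes $R_n(A) = \mathbb{E} \max_{a \in A} \frac{1}{n} \sum_i \sigma_i a_i$ (the form used in the proof above), computes it exactly by enumerating all $2^n$ sign vectors, and compares it with the right-hand side of (4). The set `A` and the function names are hypothetical.

```python
import itertools
import math


def rademacher_average(vectors):
    """Exact E max_{a in A} (1/n) * sum_i sigma_i * a_i for a finite set A.

    Enumerates all 2^n sign vectors, so it is only feasible for small n.
    """
    n = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += max(sum(s * a for s, a in zip(signs, vec)) for vec in vectors) / n
    return total / 2 ** n


def finite_class_bound(vectors):
    """Right-hand side of (4): max_j ||a^(j)|| * sqrt(2 log N) / n."""
    n = len(vectors[0])
    max_norm = max(math.sqrt(sum(a * a for a in vec)) for vec in vectors)
    return max_norm * math.sqrt(2.0 * math.log(len(vectors))) / n


# Hypothetical example: N = 4 binary vectors in R^6.
A = [
    (1, 0, 1, 1, 0, 0),
    (0, 1, 1, 0, 1, 0),
    (1, 1, 0, 0, 0, 1),
    (0, 0, 0, 1, 1, 1),
]
exact = rademacher_average(A)
bound = finite_class_bound(A)
print(f"exact R_n(A) = {exact:.4f}, bound (4) = {bound:.4f}")
```

Exhaustive enumeration is used instead of Monte Carlo sampling so that the comparison is deterministic; for larger $n$ one would sample the signs instead.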

of misclassification errors has been proved to be at least as hard as solving any NP-complete problem, but even approximately minimizing the number of misclassification errors within a constant factor of the optimum has been shown to be NP-hard. This means that, unless P = NP, we will not be able to build a computationally efficient empirical risk minimizer for half-spaces that will work for all input space dimensions. If the input space dimension $d$ is fixed, an algorithm running in $O(n^{d-1} \log n)$ steps enumerates the trace of half-spaces on a sample of length $n$. This allows an exhaustive search for the empirical risk minimizer. Such a possibility should be considered with circumspection since its range of applications would extend much beyond problems where input dimension is less than 5.

### 4.1 Margin-based performance bounds

An attempt to solve both of these problems is to modify the empirical functional to be minimized by introducing a cost function. Next we describe the main ideas of empirical minimization of cost functionals and its analysis. We consider classifiers of the form

$$g_f(x) = \begin{cases} 1 & \text{if } f(x) \ge 0 \\ -1 & \text{otherwise} \end{cases}$$

where $f : \mathcal{X} \to \mathbb{R}$ is a real-valued function. In such a case the probability of error of $g$ may be written as

$$L(g_f) = \mathbb{P}\{\mathrm{sgn}(f(X)) \ne Y\} \le \mathbb{E}\, \mathbf{1}_{f(X)Y < 0}.$$

To lighten notation we will simply write $L(f) = L(g_f)$. Let $\varphi : \mathbb{R} \to \mathbb{R}_+$ be a nonnegative cost function such that $\varphi(x) \ge \mathbf{1}_{x > 0}$. (Typical choices of $\varphi$ include $\varphi(x) = e^x$, $\varphi(x) = \log_2(1 + e^x)$, and $\varphi(x) = (1 + x)_+$.) Introduce the cost functional and its empirical version by

$$A(f) = \mathbb{E} \varphi(-f(X)Y) \quad \text{and} \quad A_n(f) = \frac{1}{n} \sum_{i=1}^n \varphi(-f(X_i) Y_i).$$

Obviously, $L(f) \le A(f)$ and $L_n(f) \le A_n(f)$.

**Theorem 4.1.** Assume that the function $f_n$ is chosen from a class $\mathcal{F}$ based on the data $(Z_1, \ldots, Z_n) \stackrel{\mathrm{def}}{=} (X_1, Y_1), \ldots, (X_n, Y_n)$. Let $B$ denote a uniform upper bound on $\varphi(-f(x)y)$ and let $L_\varphi$ be the Lipschitz constant of $\varphi$. Then the probability of error of the corresponding classifier may be bounded, with probability at least $1 - \delta$, by

$$L(f_n) \le A_n(f_n) + 2 L_\varphi\, \mathbb{E} R_n(\mathcal{F}(X_1^n)) + B \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}.$$

Thus, the Rademacher average of the class of real-valued functions $f$ bounds the performance of the classifier.
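The relation $L_n(f) \le A_n(f)$ is easy to see on a toy sample. The following sketch evaluates the three cost functions named above on hypothetical scores $f(X_i)$ and labels $Y_i$; the data and function names are illustrative assumptions, and the empirical error is counted as the fraction of pairs with $f(X_i) Y_i < 0$, matching the indicator used in the text.

```python
import math


def exp_cost(x):
    """phi(x) = e^x."""
    return math.exp(x)


def logit_cost(x):
    """phi(x) = log_2(1 + e^x)."""
    return math.log2(1.0 + math.exp(x))


def hinge_cost(x):
    """phi(x) = (1 + x)_+."""
    return max(0.0, 1.0 + x)


def empirical_error(scores, labels):
    """Fraction of pairs with f(X_i) * Y_i < 0."""
    return sum(1 for s, y in zip(scores, labels) if s * y < 0) / len(scores)


def empirical_cost(phi, scores, labels):
    """A_n(f) = (1/n) * sum_i phi(-f(X_i) * Y_i)."""
    return sum(phi(-s * y) for s, y in zip(scores, labels)) / len(scores)


# Hypothetical scores f(X_i) and labels Y_i in {-1, +1}.
scores = [2.1, -0.3, 0.8, -1.5, 0.2, -0.7]
labels = [1, 1, 1, -1, -1, -1]

err = empirical_error(scores, labels)
costs = {phi.__name__: empirical_cost(phi, scores, labels)
         for phi in (exp_cost, logit_cost, hinge_cost)}

print(f"empirical error: {err:.3f}")
for name, value in costs.items():
    print(f"{name}: {value:.3f}")
```

Each cost dominates the indicator $\mathbf{1}_{x > 0}$ pointwise, so every empirical cost printed here is at least the empirical error.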

*Proof.* The proof is similar to the argument of the previous section:

$$\begin{aligned}
L(f_n) \le A(f_n) &\le A_n(f_n) + \sup_{f \in \mathcal{F}} \big(A(f) - A_n(f)\big) \\
&\le A_n(f_n) + 2\, \mathbb{E} R_n(\varphi \circ \mathcal{H}(Z_1^n)) + B \sqrt{\frac{2 \log \frac{1}{\delta}}{n}} \\
&\le A_n(f_n) + 2 L_\varphi\, \mathbb{E} R_n(\mathcal{H}(Z_1^n)) + B \sqrt{\frac{2 \log \frac{1}{\delta}}{n}} \\
&= A_n(f_n) + 2 L_\varphi\, \mathbb{E} R_n(\mathcal{F}(X_1^n)) + B \sqrt{\frac{2 \log \frac{1}{\delta}}{n}},
\end{aligned}$$

where $\mathcal{H}$ is the class of functions $\mathcal{X} \times \{-1, 1\} \to \mathbb{R}$ of the form $-f(x)y$, $f \in \mathcal{F}$, and the third inequality follows from the contraction principle of Theorem 3.3.

#### 4.1.1 Weighted voting schemes

In many applications such as boosting and bagging, classifiers are combined by weighted voting schemes, which means that the classification rule is obtained by means of functions $f$ from a class

$$\mathcal{F}_\lambda = \Big\{ f(x) = \sum_{j=1}^N c_j g_j(x) : N \in \mathbb{N},\ \sum_{j=1}^N |c_j| \le \lambda,\ g_1, \ldots, g_N \in \mathcal{C} \Big\} \tag{7}$$

where $\mathcal{C}$ is a class of base classifiers, that is, functions defined on $\mathcal{X}$, taking values in $\{-1, 1\}$. A classifier of this form may be thought of as one that, upon observing $x$, takes a weighted vote of the classifiers $g_1, \ldots, g_N$ (using the weights $c_1, \ldots, c_N$) and decides according to the weighted majority. In this case, by (5) and (6) we have

$$R_n(\mathcal{F}_\lambda(X_1^n)) \le \lambda R_n(\mathcal{C}(X_1^n)) \le \lambda \sqrt{\frac{2 V_{\mathcal{C}} \log(n + 1)}{n}}$$

where $V_{\mathcal{C}}$ is the VC dimension of the base class.

To understand the richness of classes formed by weighted averages of classifiers from a base class, just consider the simple one-dimensional example in which the base class $\mathcal{C}$ contains all classifiers of the form $g(x) = 2 \cdot \mathbf{1}_{x \le a} - 1$, $a \in \mathbb{R}$. Then $V_{\mathcal{C}} = 1$ and the closure of $\mathcal{F}_\lambda$ (under the $L_\infty$ norm) is the set of all functions of total variation bounded by $2\lambda$. Thus, $\mathcal{F}_\lambda$ is rich in the sense that any classifier may be approximated by classifiers associated with the functions in $\mathcal{F}_\lambda$. In particular, the VC dimension of the class of all classifiers induced by functions in $\mathcal{F}_\lambda$ is infinite. For such large classes of classifiers it is impossible to guarantee that $L(f_n)$ exceeds the minimal risk in the class by something of the order of $n^{-1/2}$ (see Section 5.5). However, $L(f_n)$ may be made as small as the minimum of the cost functional $A(f)$ over the class plus $O(n^{-1/2})$.

Summarizing, we have obtained that if $\mathcal{F}_\lambda$ is of the form indicated above, then for any function $f_n$ chosen from $\mathcal{F}_\lambda$ in a data-based manner, the probability of error of the associated classifier satisfies, with probability at least $1 - \delta$,

$$L(f_n) \le A_n(f_n) + 2 L_\varphi \lambda \sqrt{\frac{2 V_{\mathcal{C}} \log(n + 1)}{n}} + B \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}. \tag{8}$$

The remarkable fact about this inequality is that the upper bound only involves the VC dimension of the class $\mathcal{C}$ of base classifiers, which is typically small. The price we pay is that the first term on the right-hand side is the empirical cost functional instead of the empirical probability of error.

As a first illustration, consider the example when $\gamma$ is a fixed positive parameter and

$$\varphi(x) = \begin{cases} 0 & \text{if } x \le -\gamma \\ 1 & \text{if } x \ge 0 \\ 1 + x/\gamma & \text{otherwise.} \end{cases}$$

In this case $B = 1$ and $L_\varphi = 1/\gamma$. Notice also that $\mathbf{1}_{x > 0} \le \varphi(x) \le \mathbf{1}_{x > -\gamma}$ and therefore $A_n(f) \le L_n^\gamma(f)$, where $L_n^\gamma(f)$ is the so-called margin error defined by

$$L_n^\gamma(f) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{f(X_i) Y_i < \gamma}.$$

Notice that for all $\gamma > 0$, $L_n^\gamma(f) \ge L_n(f)$ and that $L_n^\gamma(f)$ is increasing in $\gamma$. An interpretation of the margin error $L_n^\gamma(f)$ is that it counts, apart from the number of misclassified pairs $(X_i, Y_i)$, also those which are well classified but only with a small "confidence" (or "margin") by $f$. Thus, (8) implies the following margin-based bound for the risk:

**Corollary 4.2.** For any $\gamma > 0$, with probability at least $1 - \delta$,

$$L(f_n) \le L_n^\gamma(f_n) + 2 \frac{\lambda}{\gamma} \sqrt{\frac{2 V_{\mathcal{C}} \log(n + 1)}{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}. \tag{9}$$

Notice that, as $\gamma$ grows, the first term of the sum increases, while the second decreases. The bound can be very useful whenever a classifier has a small margin error for a relatively large $\gamma$ (i.e., if the classifier classifies the training data well with high "confidence"), since the second term only depends on the VC dimension of the small base class $\mathcal{C}$. This result has been used to explain the good behavior of some voting methods such as AdaBoost, since these methods have a tendency to find classifiers that classify the data points well with a large margin.

#### 4.1.2 Kernel methods

Another popular way to obtain classification rules from a class of real-valued functions, which is used in kernel methods such as Support Vector Machines (SVM) or Kernel Fisher Discriminant (KFD), is to consider balls of a reproducing kernel Hilbert space. The basic idea is to use a positive definite kernel function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, that is, a symmetric function satisfying

$$\sum_{i,j=1}^n \alpha_i \alpha_j k(x_i, x_j) \ge 0,$$

for all choices of $n$, $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and $x_1, \ldots, x_n \in \mathcal{X}$. Such a function naturally generates a space of functions of the form

$$\mathcal{F} = \Big\{ f(\cdot) = \sum_{i=1}^n \alpha_i k(x_i, \cdot) : n \in \mathbb{N},\ \alpha_i \in \mathbb{R},\ x_i \in \mathcal{X} \Big\},$$

which, with the inner product

$$\Big\langle \sum_i \alpha_i k(x_i, \cdot), \sum_j \beta_j k(x_j, \cdot) \Big\rangle \stackrel{\mathrm{def}}{=} \sum_{i,j} \alpha_i \beta_j k(x_i, x_j),$$

can be completed into a Hilbert space. The key property is that for all $x_1, x_2 \in \mathcal{X}$ there exist elements $f_{x_1}, f_{x_2} \in \mathcal{F}$ such that $k(x_1, x_2) = \langle f_{x_1}, f_{x_2} \rangle$. This means that any linear algorithm based on computing inner products can be extended into a non-linear version by replacing the inner products by a kernel function. The advantage is that even though the algorithm remains of low complexity, it works in a class of functions that can potentially represent any continuous function arbitrarily well (provided $k$ is chosen appropriately).
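Both kernel properties mentioned above can be illustrated in a few lines. The sketch below is an assumption-laden example rather than anything from the source: it uses a Gaussian kernel to spot-check the positive definiteness condition on random coefficients, and a degree-2 polynomial kernel on $\mathbb{R}^2$, whose explicit feature map is known in closed form, to show a kernel value coinciding with an inner product. All names and the choice of kernels are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def quadratic_form(kernel, points, alphas):
    """sum_{i,j} alpha_i * alpha_j * k(x_i, x_j)."""
    return sum(ai * aj * kernel(xi, xj)
               for ai, xi in zip(alphas, points)
               for aj, xj in zip(alphas, points))


def poly2_kernel(x, y):
    """k(x, y) = <x, y>^2 on R^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_features(x):
    """Explicit feature map with <features(x), features(y)> = <x, y>^2."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


rng = random.Random(0)  # fixed seed so the run is reproducible
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]

# Spot-check the quadratic form on 100 random coefficient vectors.
forms = [quadratic_form(gaussian_kernel, points,
                        [rng.gauss(0, 1) for _ in points])
         for _ in range(100)]

# Kernel value versus explicit inner product in feature space.
x, y = points[0], points[1]
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))

print(f"smallest quadratic form: {min(forms):.6f}")
print(f"kernel: {poly2_kernel(x, y):.6f}, explicit inner product: {explicit:.6f}")
```

A random spot-check cannot prove positive definiteness; it only fails to find a counterexample, which is what one expects for the Gaussian kernel.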

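Algorithms working with kernels usually perform minimization of a cost functional on a ball of the associated reproducing kernel Hilbert space of the form

$$\mathcal{F}_\lambda = \Big\{ f(x) = \sum_{j=1}^N c_j k(x_j, x) : N \in \mathbb{N},\ \sum_{i,j=1}^N c_i c_j k(x_i, x_j) \le \lambda^2,\ x_1, \ldots, x_N \in \mathcal{X} \Big\}. \tag{10}$$

Notice that, in contrast with (7) where the constraint is of $\ell_1$ type, the constraint here is of $\ell_2$ type. Also, the basis functions, instead of being chosen from a fixed class, are determined by elements of $\mathcal{X}$ themselves.

An important property of functions in the reproducing kernel Hilbert space associated with $k$ is that for all $x \in \mathcal{X}$,

$$f(x) = \langle f, k(x, \cdot) \rangle.$$

This is called the reproducing property. The reproducing property may be used to estimate precisely the Rademacher average of $\mathcal{F}_\lambda$. Indeed, denoting by $\mathbb{E}_\sigma$ expectation with respect to the Rademacher variables $\sigma_1, \ldots, \sigma_n$, we have

$$\begin{aligned}
R_n(\mathcal{F}_\lambda(X_1^n)) &= \frac{1}{n} \mathbb{E}_\sigma \sup_{\|f\| \le \lambda} \sum_{i=1}^n \sigma_i f(X_i) \\
&= \frac{1}{n} \mathbb{E}_\sigma \sup_{\|f\| \le \lambda} \sum_{i=1}^n \sigma_i \langle f, k(X_i, \cdot) \rangle \\
&= \frac{\lambda}{n} \mathbb{E}_\sigma \Big\| \sum_{i=1}^n \sigma_i k(X_i, \cdot) \Big\|
\end{aligned}$$

by the Cauchy-Schwarz inequality, where $\|\cdot\|$ denotes the norm in the reproducing kernel Hilbert space. The Kahane-Khinchine inequality states that for any vectors $a_1, \ldots, a_n$ in a Hilbert space,

$$\frac{1}{2} \mathbb{E} \Big\| \sum_{i=1}^n \sigma_i a_i \Big\|^2 \le \Big( \mathbb{E} \Big\| \sum_{i=1}^n \sigma_i a_i \Big\| \Big)^2 \le \mathbb{E} \Big\| \sum_{i=1}^n \sigma_i a_i \Big\|^2.$$

It is also easy to see that

$$\mathbb{E} \Big\| \sum_{i=1}^n \sigma_i a_i \Big\|^2 = \mathbb{E} \sum_{i,j=1}^n \sigma_i \sigma_j \langle a_i, a_j \rangle = \sum_{i=1}^n \|a_i\|^2,$$

so we obtain

$$\frac{\lambda}{n \sqrt{2}} \sqrt{\sum_{i=1}^n k(X_i, X_i)} \le R_n(\mathcal{F}_\lambda(X_1^n)) \le \frac{\lambda}{n} \sqrt{\sum_{i=1}^n k(X_i, X_i)}.$$

This is very nice as it gives a bound that can be computed very easily from the data. A reasoning similar to the one leading to (9), using the bounded differences inequality to replace the Rademacher average by its empirical version, gives the following.

**Corollary 4.3.** Let $f_n$ be any function chosen from the ball $\mathcal{F}_\lambda$. Then, with probability at least $1 - \delta$,

$$L(f_n) \le L_n^\gamma(f_n) + 2 \frac{\lambda}{\gamma n} \sqrt{\sum_{i=1}^n k(X_i, X_i)} + \sqrt{\frac{2 \log \frac{2}{\delta}}{n}}.$$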
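The two-sided bound on the Rademacher average of the kernel ball can be verified exactly for a small sample, since $\|\sum_i \sigma_i k(X_i, \cdot)\|^2 = \sum_{i,j} \sigma_i \sigma_j k(X_i, X_j)$ depends only on the Gram matrix. The sketch below assumes a Gaussian kernel on the real line and hypothetical sample points, enumerates all sign vectors, and also evaluates the right-hand side of Corollary 4.3 for an assumed margin error; none of these choices come from the source.

```python
import itertools
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_ball_rademacher(gram, lam):
    """Exact (lam / n) * E_sigma || sum_i sigma_i k(X_i, .) ||."""
    n = len(gram)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        sq_norm = sum(signs[i] * signs[j] * gram[i][j]
                      for i in range(n) for j in range(n))
        total += math.sqrt(max(sq_norm, 0.0))  # guard against rounding below 0
    return lam * total / (n * 2 ** n)


def trace_bounds(gram, lam):
    """Lower and upper bounds computed from sum_i k(X_i, X_i)."""
    n = len(gram)
    root_trace = math.sqrt(sum(gram[i][i] for i in range(n)))
    return lam * root_trace / (n * math.sqrt(2.0)), lam * root_trace / n


def corollary_4_3_rhs(margin_error, gram, lam, gamma, delta):
    """Right-hand side of Corollary 4.3 for a given margin error."""
    n = len(gram)
    root_trace = math.sqrt(sum(gram[i][i] for i in range(n)))
    return (margin_error
            + 2.0 * lam * root_trace / (gamma * n)
            + math.sqrt(2.0 * math.log(2.0 / delta) / n))


# Hypothetical sample of n = 8 points and ball radius lambda = 2.
xs = [-1.2, -0.7, -0.1, 0.3, 0.4, 0.9, 1.5, 2.0]
gram = [[gaussian_kernel(a, b) for b in xs] for a in xs]
lam = 2.0

lower, upper = trace_bounds(gram, lam)
exact = kernel_ball_rademacher(gram, lam)
print(f"{lower:.4f} <= {exact:.4f} <= {upper:.4f}")
print(f"Corollary 4.3 bound: {corollary_4_3_rhs(0.1, gram, lam, 0.5, 0.05):.4f}")
```

With only eight points the final bound exceeds 1 and is vacuous; the example is meant to show how the terms are computed from the Gram matrix diagonal, not to produce a useful guarantee.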

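Next we introduce Rademacher random variables, obtaining, by simple symmetrization,

$$2\, \mathbb{P}\Big\{ \sup_{f \in \mathcal{F}} \frac{P_n' f - P_n f}{\sqrt{(P_n f + P_n' f)/2}} \ge t \Big\} = 2\, \mathbb{E}\Big[ \mathbb{P}_\sigma \Big\{ \sup_{f \in \mathcal{F}} \frac{\frac{1}{n} \sum_{i=1}^n \sigma_i (f(X_i') - f(X_i))}{\sqrt{(P_n f + P_n' f)/2}} \ge t \Big\} \Big]$$

(where $\mathbb{P}_\sigma$ is the conditional probability, given the $X_i$ and $X_i'$). The last step uses tail bounds for individual functions and a union bound over $\mathcal{F}(X_1^{2n})$, where $X_1^{2n}$ denotes the union of the initial sample $X_1^n$ and of the extra symmetrization sample $X_1', \ldots, X_n'$.

Summarizing, we obtain the following inequalities:

**Theorem 5.1.** Let $\mathcal{F}$ be a class of functions taking binary values in $\{0, 1\}$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$, all $f \in \mathcal{F}$ satisfy

$$\frac{P f - P_n f}{\sqrt{P f}} \le 2 \sqrt{\frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{n}}.$$

Also, with probability at least $1 - \delta$, for all $f \in \mathcal{F}$,

$$\frac{P_n f - P f}{\sqrt{P_n f}} \le 2 \sqrt{\frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{n}}.$$

As a consequence, we have that for all $s > 0$, with probability at least $1 - \delta$,

$$\sup_{f \in \mathcal{F}} \frac{P f - P_n f}{P f + P_n f + s/2} \le 2 \sqrt{\frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{s n}} \tag{14}$$

and the same is true if $P$ and $P_n$ are permuted. Another consequence of Theorem 5.1 with interesting applications is the following. For all $t \in (0, 1]$, with probability at least $1 - \delta$,

$$\forall f \in \mathcal{F}, \quad P_n f \le (1 - t) P f \ \text{ implies } \ P f \le 4\, \frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{t^2 n}. \tag{15}$$

In particular, setting $t = 1$,

$$\forall f \in \mathcal{F}, \quad P_n f = 0 \ \text{ implies } \ P f \le 4\, \frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{n}.$$

#### 5.1.2 Applications to empirical risk minimization

It is easy to see that, for non-negative numbers $A, B, C \ge 0$, the fact that $A \le B \sqrt{A} + C$ entails $A \le B^2 + B \sqrt{C} + C$. Taking $A = P f$ and $C = P_n f$ in the bound on $P f - P_n f$ given by Theorem 5.1, we obtain that, with probability at least $1 - \delta$, for all $f \in \mathcal{F}$,

$$P f \le P_n f + 2 \sqrt{P_n f\, \frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{n}} + 4\, \frac{\log S_{\mathcal{F}}(X_1^{2n}) + \log \frac{4}{\delta}}{n}.$$

**Corollary 5.2.** Let $g_n^*$ be the empirical risk minimizer in a class $\mathcal{C}$ of VC dimension $V$. Then, with probability at least $1 - \delta$,

$$L(g_n^*) \le L_n(g_n^*) + 2 \sqrt{L_n(g_n^*)\, \frac{2 V \log(n + 1) + \log \frac{4}{\delta}}{n}} + 4\, \frac{2 V \log(n + 1) + \log \frac{4}{\delta}}{n}.$$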
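The algebraic step and the resulting bound can both be checked numerically. In the sketch below, `largest_root` solves $A = B\sqrt{A} + C$ for the largest admissible $A$ (a quadratic in $\sqrt{A}$) and compares it with $B^2 + B\sqrt{C} + C$ on a small grid, and `erm_bound` evaluates the right-hand side of Corollary 5.2. The function names, grid values, and sample parameters are illustrative assumptions.

```python
import math


def erm_bound(empirical_risk, vc_dim, n, delta):
    """Right-hand side of Corollary 5.2."""
    eps = (2.0 * vc_dim * math.log(n + 1) + math.log(4.0 / delta)) / n
    return empirical_risk + 2.0 * math.sqrt(empirical_risk * eps) + 4.0 * eps


def largest_root(b, c):
    """Largest A >= 0 with A = b * sqrt(A) + c.

    Any A satisfying A <= b * sqrt(A) + c is at most this value.
    """
    root = (b + math.sqrt(b * b + 4.0 * c)) / 2.0
    return root * root


# Check A <= B^2 + B * sqrt(C) + C at the extreme admissible A on a grid.
lemma_holds = all(
    largest_root(b, c) <= b * b + b * math.sqrt(c) + c + 1e-12
    for b in (0.0, 0.1, 0.5, 1.0, 3.0)
    for c in (0.0, 0.01, 0.2, 1.0, 5.0)
)
print("lemma holds on grid:", lemma_holds)

# Hypothetical setting: VC dimension 3, delta = 0.05.
for n in (1_000, 10_000, 100_000):
    print(n, round(erm_bound(0.0, 3, n, 0.05), 5), round(erm_bound(0.1, 3, n, 0.05), 5))
```

When the empirical risk is zero only the last term survives, which is the fast-rate regime; for a positive empirical risk the square-root term dominates for large $n$.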

Consider first the extreme situation when there exists a classifier in $\mathcal{C}$ which classifies without error. This also means that for some $g' \in \mathcal{C}$, $Y = g'(X)$ with probability one. This is clearly a quite restrictive assumption, only satisfied in very special cases. Nevertheless, the assumption that $\inf_{g \in \mathcal{C}} L(g) = 0$ has been commonly used in computational learning theory, perhaps because of its mathematical simplicity. In such a case, clearly $L_n(g_n^*) = 0$, so that we get, with probability at least $1 - \delta$,

$$L(g_n^*) - \inf_{g \in \mathcal{C}} L(g) \le 4\, \frac{2 V \log(n + 1) + \log \frac{4}{\delta}}{n}. \tag{16}$$

The main point here is that the upper bound obtained in this special case is of smaller order of magnitude than in the general case ($O(V \ln n / n)$ as opposed to $O\big(\sqrt{V \ln n / n}\big)$). One can actually obtain a version which interpolates between these two cases as follows: for simplicity, assume that there is a classifier $g'$ in $\mathcal{C}$ such that $L(g') = \inf_{g \in \mathcal{C}} L(g)$. Then we have

$$L_n(g_n^*) \le L_n(g') = L_n(g') - L(g') + L(g').$$

Using Bernstein's inequality, we get, with probability $1 - \delta$,

$$L_n(g') - L(g') \le \sqrt{\frac{2 L(g') \log \frac{1}{\delta}}{n}} + \frac{2 \log \frac{1}{\delta}}{3 n},$$

which, together with Corollary 5.2, yields:

**Corollary 5.3.** There exists a constant $C$ such that, with probability at least $1 - \delta$,

$$L(g_n^*) - \inf_{g \in \mathcal{C}} L(g) \le C \left( \sqrt{\inf_{g \in \mathcal{C}} L(g)\, \frac{V \log n + \log \frac{1}{\delta}}{n}} + \frac{V \log n + \log \frac{1}{\delta}}{n} \right).$$

### 5.2 Noise and fast rates

We have seen that in the case where $f$ takes values in $\{0, 1\}$ there is a nice relationship between the variance of $f$ (which controls the size of the deviations between $P_n f$ and $P f$) and its expectation, namely, $\mathrm{Var}(f) \le P f$. This is the key property that allows one to obtain faster rates of convergence for $L(g_n^*) - \inf_{g \in \mathcal{C}} L(g)$.

In particular, in the ideal situation mentioned above, when $\inf_{g \in \mathcal{C}} L(g) = 0$, the difference $L(g_n^*) - \inf_{g \in \mathcal{C}} L(g)$ may be much smaller than the worst-case difference $\sup_{g \in \mathcal{C}} (L(g) - L_n(g))$. This actually happens in many cases, whenever the distribution satisfies certain conditions. Next we describe such conditions and show how the finer bounds can be derived.

The main idea is that, in order to get precise rates for $L(g_n^*) - \inf_{g \in \mathcal{C}} L(g)$, we consider functions of the form $\mathbf{1}_{g(X) \ne Y} - \mathbf{1}_{g'(X) \ne Y}$ where $g'$ is a classifier minimizing the loss in the class $\mathcal{C}$, that is, such that $L(g') = \inf_{g \in \mathcal{C}} L(g)$. Note that functions of this form are no longer non-negative.

To illustrate the basic ideas in the simplest possible setting, consider the case when the loss class $\mathcal{F}$ is a finite set of $N$ functions of the form $\mathbf{1}_{g(X) \ne Y} - \mathbf{1}_{g'(X) \ne Y}$. In addition, we assume that there is a relationship between the variance and the expectation of the functions in $\mathcal{F}$ given by the inequality

$$\mathrm{Var}(f) \le \Big( \frac{P f}{h} \Big)^\alpha \tag{17}$$
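The role of the variance-expectation relationship can be seen by comparing the Bernstein deviation term used above with a variance-free Hoeffding term. The Hoeffding comparison is an addition for illustration and is not part of the source argument; the function names and the values of $n$, $\delta$, and $L(g')$ below are likewise hypothetical.

```python
import math


def bernoulli_variance(p):
    """Variance of a {0, 1}-valued function with expectation p."""
    return p * (1.0 - p)


def bernstein_deviation(p, n, delta):
    """sqrt(2 p log(1/delta) / n) + 2 log(1/delta) / (3 n), as in the text."""
    log_term = math.log(1.0 / delta)
    return math.sqrt(2.0 * p * log_term / n) + 2.0 * log_term / (3.0 * n)


def hoeffding_deviation(n, delta):
    """Variance-free deviation sqrt(log(1/delta) / (2 n)) for [0, 1]-valued losses."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


n, delta = 10_000, 0.05
rows = [(p, bernstein_deviation(p, n, delta), hoeffding_deviation(n, delta))
        for p in (0.0, 0.001, 0.01, 0.1, 0.5)]
for p, bern, hoef in rows:
    print(f"L(g') = {p:<6} Bernstein = {bern:.5f}  Hoeffding = {hoef:.5f}")
```

For small $L(g')$ the Bernstein term is much smaller than the Hoeffding term, while near $L(g') = 1/2$ the variance information no longer helps; this is the interpolation that Corollary 5.3 expresses.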

be stated by any of the following three equivalent statements:

(1) $\exists \beta > 0,\ \forall g \in \{0, 1\}^{\mathcal{X}},\quad \mathbb{E}\big[ \mathbf{1}_{g(X) \ne g^*(X)} \big] \le \beta (L(g) - L^*)^\alpha$

(2) $\exists c > 0,\ \forall A \subset \mathcal{X},\quad \int_A dP(x) \le c \Big( \int_A |2\eta(x) - 1| \, dP(x) \Big)^\alpha$

(3) $\exists B > 0,\ \forall t \ge 0,\quad \mathbb{P}\{ |2\eta(X) - 1| \le t \} \le B t^{\frac{\alpha}{1 - \alpha}}$

We refer to this as the Mammen-Tsybakov noise condition. The proof that these statements are equivalent is straightforward, and we omit it, but we comment on the meaning of these statements. Notice first that $\alpha$ has to be in $[0, 1]$ because

$$L(g) - L^* = \mathbb{E}\big[ |2\eta(X) - 1| \, \mathbf{1}_{g(X) \ne g^*(X)} \big] \le \mathbb{E}\, \mathbf{1}_{g(X) \ne g^*(X)}.$$

Also, when $\alpha = 0$ these conditions are void. The case $\alpha = 1$ in (1) is realized when there exists an $s > 0$ such that $|2\eta(X) - 1| > s$ almost surely (which is just the extreme noise condition we considered above).

The most important consequence of these conditions is that they imply a relationship between the variance and the expectation of functions of the form $\mathbf{1}_{g(X) \ne Y} - \mathbf{1}_{g^*(X) \ne Y}$. Indeed, we obtain

$$\mathbb{E}\big[ (\mathbf{1}_{g(X) \ne Y} - \mathbf{1}_{g^*(X) \ne Y})^2 \big] \le c (L(g) - L^*)^\alpha.$$

This is thus enough to get (18) for a finite class of functions.

The sharper bounds, established in this section and the next, come at the price of the assumption that the Bayes classifier is in the class $\mathcal{C}$. Because of this, it is difficult to compare the fast rates achieved with the slower rates proved in Section 3. On the other hand, noise conditions like the Mammen-Tsybakov condition may be used to get improvements even when $g^*$ is not contained in $\mathcal{C}$. In these cases the approximation error $L(g') - L^*$ also needs to be taken into account, and the situation becomes somewhat more complex. We return to these issues in Sections 5.3.5 and 8.

### 5.3 Localization

The purpose of this section is to generalize the simple argument of the previous section to more general classes $\mathcal{C}$ of classifiers. This generalization reveals the importance of the modulus of continuity of the empirical process as a measure of complexity of the learning problem.

#### 5.3.1 Talagrand's inequality

One of the most important recent developments in empirical process theory is a concentration inequality for the supremum of an empirical process first proved by Talagrand [212] and refined later by various authors. This inequality is at the heart of many key developments in statistical learning theory. Here we recall the following version:

**Theorem 5.4.** Let $b > 0$ and set $\mathcal{F}$ to be a set of functions from $\mathcal{X}$ to $\mathbb{R}$. Assume that all functions in $\mathcal{F}$ satisfy $P f - f \le b$. Then, with probability at least $1 - \delta$, for any $\theta > 0$,

$$\sup_{f \in \mathcal{F}} (P f - P_n f) \le (1 + \theta)\, \mathbb{E}\Big[ \sup_{f \in \mathcal{F}} (P f - P_n f) \Big] + \sqrt{\frac{2 (\sup_{f \in \mathcal{F}} \mathrm{Var}(f)) \log \frac{1}{\delta}}{n}} + \frac{(1 + 3/\theta)\, b \log \frac{1}{\delta}}{3 n},$$

which, for $\theta = 1$, translates to

$$\sup_{f \in \mathcal{F}} (P f - P_n f) \le 2\, \mathbb{E}\Big[ \sup_{f \in \mathcal{F}} (P f - P_n f) \Big] + \sqrt{\frac{2 (\sup_{f \in \mathcal{F}} \mathrm{Var}(f)) \log \frac{1}{\delta}}{n}} + \frac{4 b \log \frac{1}{\delta}}{3 n},$$
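To see how the three terms of Theorem 5.4 trade off, the right-hand side can be evaluated directly. The sketch below is illustrative only: the inputs (expected supremum, maximal variance, $b$, $n$, $\delta$) are hypothetical numbers, and `best_theta` is an added convenience that minimizes the $\theta$-dependent part $\theta\, \mathbb{E}[\sup] + b \log(1/\delta)/(n\theta)$ by elementary calculus; it is not taken from the source.

```python
import math


def talagrand_bound(expected_sup, max_variance, b, n, delta, theta=1.0):
    """Right-hand side of Theorem 5.4 for a given theta > 0."""
    log_term = math.log(1.0 / delta)
    return ((1.0 + theta) * expected_sup
            + math.sqrt(2.0 * max_variance * log_term / n)
            + (1.0 + 3.0 / theta) * b * log_term / (3.0 * n))


def best_theta(expected_sup, b, n, delta):
    """Minimizer of theta * E + b * log(1/delta) / (n * theta) over theta > 0.

    Requires expected_sup > 0.
    """
    return math.sqrt(b * math.log(1.0 / delta) / (n * expected_sup))


# Hypothetical inputs.
expected_sup, max_variance, b, n, delta = 0.05, 0.1, 1.0, 5_000, 0.05

at_one = talagrand_bound(expected_sup, max_variance, b, n, delta, theta=1.0)
theta_star = best_theta(expected_sup, b, n, delta)
at_best = talagrand_bound(expected_sup, max_variance, b, n, delta, theta=theta_star)

print(f"theta = 1: {at_one:.5f}")
print(f"theta = {theta_star:.4f}: {at_best:.5f}")
```

Choosing $\theta$ small keeps the leading constant in front of the expected supremum close to 1 at the price of a larger $b \log(1/\delta)/n$ term, which is negligible when $n$ is large.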

### Properties of MLE: consistency, asymptotic normality. Fisher information.

Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

### Chapter 7 Methods of Finding Estimators

Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

### I. Chi-squared Distributions

1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

### ORDERS OF GROWTH KEITH CONRAD

ORDERS OF GROWTH KEITH CONRAD Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really wat to uderstad their behavior It also helps you better grasp topics i calculus

### Module 4: Mathematical Induction

Module 4: Mathematical Iductio Theme 1: Priciple of Mathematical Iductio Mathematical iductio is used to prove statemets about atural umbers. As studets may remember, we ca write such a statemet as a predicate

### 3. Covariance and Correlation

Virtual Laboratories > 3. Expected Value > 1 2 3 4 5 6 3. Covariace ad Correlatio Recall that by takig the expected value of various trasformatios of a radom variable, we ca measure may iterestig characteristics

### Introduction to Statistical Learning Theory

Itroductio to Statistical Learig Theory Olivier Bousquet 1, Stéphae Bouchero 2, ad Gábor Lugosi 3 1 Max-Plack Istitute for Biological Cyberetics Spemastr 38, D-72076 Tübige, Germay olivierbousquet@m4xorg

### In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

### Asymptotic Growth of Functions

CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

### Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

### 7. Sample Covariance and Correlation

1 of 8 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 7. Sample Covariace ad Correlatio The Bivariate Model Suppose agai that we have a basic radom experimet, ad that X ad Y

### Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

### Department of Computer Science, University of Otago

Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

### Incremental calculation of weighted mean and variance

Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

### Modified Line Search Method for Global Optimization

Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

### Soving Recurrence Relations

Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

### Sequences and Series

CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

### A probabilistic proof of a binomial identity

A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

### Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

### Convexity, Inequalities, and Norms

Covexity, Iequalities, ad Norms Covex Fuctios You are probably familiar with the otio of cocavity of fuctios. Give a twicedifferetiable fuctio ϕ: R R, We say that ϕ is covex (or cocave up) if ϕ (x) 0 for

### Class Meeting # 16: The Fourier Transform on R n

MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

### Statistical Learning Theory

1 / 130 Statistical Learig Theory Machie Learig Summer School, Kyoto, Japa Alexader (Sasha) Rakhli Uiversity of Pesylvaia, The Wharto School Pe Research i Machie Learig (PRiML) August 27-28, 2012 2 / 130

### Lecture 7: Borel Sets and Lebesgue Measure

EE50: Probability Foudatios for Electrical Egieers July-November 205 Lecture 7: Borel Sets ad Lebesgue Measure Lecturer: Dr. Krisha Jagaatha Scribes: Ravi Kolla, Aseem Sharma, Vishakh Hegde I this lecture,

### Section IV.5: Recurrence Relations from Algorithms

Sectio IV.5: Recurrece Relatios from Algorithms Give a recursive algorithm with iput size, we wish to fid a Θ (best big O) estimate for its ru time T() either by obtaiig a explicit formula for T() or by

### Sequences II. Chapter 3. 3.1 Convergent Sequences

Chapter 3 Sequeces II 3. Coverget Sequeces Plot a graph of the sequece a ) = 2, 3 2, 4 3, 5 + 4,...,,... To what limit do you thik this sequece teds? What ca you say about the sequece a )? For ǫ = 0.,

### NPTEL STRUCTURAL RELIABILITY

NPTEL Course O STRUCTURAL RELIABILITY Module # 0 Lecture 1 Course Format: Web Istructor: Dr. Aruasis Chakraborty Departmet of Civil Egieerig Idia Istitute of Techology Guwahati 1. Lecture 01: Basic Statistics

### Maximum Likelihood Estimators.

Lecture 2 Maximum Likelihood Estimators. Matlab example. As a motivatio, let us look at oe Matlab example. Let us geerate a radom sample of size 00 from beta distributio Beta(5, 2). We will lear the defiitio

### THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

We will cosider the liear regressio model i matrix form. For simple liear regressio, meaig oe predictor, the model is i = + x i + ε i for i =,,,, This model icludes the assumptio that the ε i s are a sample

Advaced Probability Theory Math5411 HKUST Kai Che (Istructor) Chapter 1. Law of Large Numbers 1.1. σ-algebra, measure, probability space ad radom variables. This sectio lays the ecessary rigorous foudatio

### if A S, then X \ A S, and if (A n ) n is a sequence of sets in S, then n A n S,

Lecture 5: Borel Sets Topologically, the Borel sets i a topological space are the σ-algebra geerated by the ope sets. Oe ca build up the Borel sets from the ope sets by iteratig the operatios of complemetatio

### 4.1 Sigma Notation and Riemann Sums

0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

### 1 Introduction to reducing variance in Monte Carlo simulations

Copyright c 007 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a uow mea µ = E(X) of a distributio by

### The second difference is the sequence of differences of the first difference sequence, 2

Differece Equatios I differetial equatios, you look for a fuctio that satisfies ad equatio ivolvig derivatives. I differece equatios, istead of a fuctio of a cotiuous variable (such as time), we look for

### 8.5 Alternating infinite series

65 8.5 Alteratig ifiite series I the previous two sectios we cosidered oly series with positive terms. I this sectio we cosider series with both positive ad egative terms which alterate: positive, egative,

### Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

### Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis

Chapter 8 Key Ideas Hypothesis (Null ad Alterative), Hypothesis Test, Test Statistic, P-value Type I Error, Type II Error, Sigificace Level, Power Sectio 8-1: Overview Cofidece Itervals (Chapter 7) are

### 1 Computing the Standard Deviation of Sample Means

Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

### Hypothesis testing. Null and alternative hypotheses

Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

### 0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

### Lesson 12. Sequences and Series

Retur to List of Lessos Lesso. Sequeces ad Series A ifiite sequece { a, a, a,... a,...} ca be thought of as a list of umbers writte i defiite order ad certai patter. It is usually deoted by { a } =, or

### BASIC STATISTICS. Discrete. Mass Probability Function: P(X=x i ) Only one finite set of values is considered {x 1, x 2,...} Prob. t = 1.

BASIC STATISTICS 1.) Basic Cocepts: Statistics: is a sciece that aalyzes iformatio variables (for istace, populatio age, height of a basketball team, the temperatures of summer moths, etc.) ad attempts

### A Gentle Introduction to Algorithms: Part II

A Getle Itroductio to Algorithms: Part II Cotets of Part I:. Merge: (to merge two sorted lists ito a sigle sorted list.) 2. Bubble Sort 3. Merge Sort: 4. The Big-O, Big-Θ, Big-Ω otatios: asymptotic bouds

### Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

### Lecture Notes CMSC 251

We have this messy summatio to solve though First observe that the value remais costat throughout the sum, ad so we ca pull it out frot Also ote that we ca write 3 i / i ad (3/) i T () = log 3 (log ) 1

### 5: Introduction to Estimation

5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

### Totally Corrective Boosting Algorithms that Maximize the Margin

Mafred K. Warmuth mafred@cse.ucsc.edu Ju Liao liaoju@cse.ucsc.edu Uiversity of Califoria at Sata Cruz, Sata Cruz, CA 95064, USA Guar Rätsch Guar.Raetsch@tuebige.mpg.de Friedrich Miescher Laboratory of

### LECTURE 13: Cross-validation

LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

### 13 Fast Fourier Transform (FFT)

13 Fast Fourier Trasform FFT) The fast Fourier trasform FFT) is a algorithm for the efficiet implemetatio of the discrete Fourier trasform. We begi our discussio oce more with the cotiuous Fourier trasform.

### Confidence Intervals for One Mean with Tolerance Probability

Chapter 421 Cofidece Itervals for Oe Mea with Tolerace Probability Itroductio This procedure calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) with

### TAYLOR SERIES, POWER SERIES

TAYLOR SERIES, POWER SERIES The followig represets a (icomplete) collectio of thigs that we covered o the subject of Taylor series ad power series. Warig. Be prepared to prove ay of these thigs durig the

### Hypothesis Tests Applied to Means

The Samplig Distributio of the Mea Hypothesis Tests Applied to Meas Recall that the samplig distributio of the mea is the distributio of sample meas that would be obtaied from a particular populatio (with

### SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE

SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE By Guillaume Lecué CNRS, LAMA, Mare-la-vallée, 77454 Frace ad By Shahar Medelso Departmet of Mathematics,

### Divide and Conquer, Solving Recurrences, Integer Multiplication Scribe: Juliana Cook (2015), V. Williams Date: April 6, 2016

CS 6, Lecture 3 Divide ad Coquer, Solvig Recurreces, Iteger Multiplicatio Scribe: Juliaa Cook (05, V Williams Date: April 6, 06 Itroductio Today we will cotiue to talk about divide ad coquer, ad go ito

### 5 Boolean Decision Trees (February 11)

5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

### Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Valentina Fedorova, Alex Gammerman, Ilia Nouretdinov, and Vladimir Vovk Computer Learning Research Centre Royal Holloway, University of London, UK {valentina,ilia,alex,vovk}@cs.rhul.ac.uk

### The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection Shigehito Furuyama Teruhisa Nakai Department of Systems Management Engineering Faculty of Engineering Kansai University

### CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations Introduction A wide variety of recurrence problems occur in models. Some of these recurrence relations can be solved using iteration or some other ad

### where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ken D. Duft, Extension Economist In the March 98 issue of this publication we reviewed the procedure by which a capital investment project was assessed. The

### 1 The Binomial Theorem: Another Approach

The Binomial Theorem: Another Approach Pascal's Triangle In class (and in our text) we saw that, for integer $n$, the binomial theorem can be stated $(a + b)^n = c_0 a^n + c_1 a^{n-1} b + c_2 a^{n-2} b^2 + \cdots + c_{n-1} a b^{n-1} + c_n b^n$, where the coefficients

### Trading the randomness - Designing an optimal trading strategy under a drifted random walk price model

Trading the randomness - Designing an optimal trading strategy under a drifted random walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: In this paper the author intends to explore

### THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q-binary search trees are obtained from words, equipped with the geometric distribution instead of permutations. The average

### when n = 1, 2, 3, 4, 5, 6, This list represents the amount of dollars you have after n days. Note: The use of is read as and so on.

Geometric Series Before we define what is meant by a series, we need to introduce a related topic, that of sequences. Formally, a sequence is a function that computes an ordered list. Suppose that on day 1, you have

### Measurable Functions

Measurable Functions Dung Le 1 1 Definition It is necessary to determine the class of functions that will be considered for the Lebesgue integration. We want to guarantee that the sets which arise when working with these

### Review for College Algebra Final Exam

Review for College Algebra Final Exam (Please remember that half of the final exam will cover chapters 1-4. This review sheet covers only the new material, from chapters 5 and 7.) 5.1 Systems of equations in

### CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are often used to automate the recording of measurements. The transducers and signal conditioning circuits produce a voltage signal that is proportional to a quantity

### SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Find the following using the definition of the Riemann integral: a 0 x + dx 3 Consider the partition P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the interval

### Standard Errors and Confidence Intervals

Standard Errors and Confidence Intervals Introduction In the document Data Description, Populations and the Normal Distribution a sample had been obtained from the population of heights of 5-year-old boys. If we assume

### MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability in order to present the definition of martingales. From there we will apply this

### Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. 1. The concept of a structure. Consider the following counting problems: (1) to find for each n the number of partitions of an n-element set, (2) to find for each n the

### Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem The purpose of this lecture is more modest than the previous ones. It is to state certain conditions under which we are guaranteed that limits

### 3. Continuous Random Variables

Statistics and probability: 3-1 3. Continuous Random Variables A continuous random variable is a random variable which can take values measured on a continuous scale e.g. weights, strengths, times or lengths. For any

Physics 6A Winter 20 Theorems About Power Series Consider a power series, $f(x) = \sum_n a_n x^n$, () where the $a_n$ are real coefficients and $x$ is a real variable. There exists a real non-negative number $R$, called the radius

### Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable We recall our discussion of 5 card poker hands. Example 13 : a) What is the probability of event A that a 5

### Normal Distribution.

Normal Distribution www.icrf.nl Normal distribution In probability theory, the normal or Gaussian distribution, is a continuous probability distribution that is often used as a first approximation to describe real-valued

### Output Analysis (2, Chapters 10 &11 Law)

B. Maddah ENMG 6 Simulation 05/0/07 Output Analysis (2, Chapters 10 &11 Law) Comparing alternative system configuration Since the output of a simulation is random, then comparing different systems via simulation should

### Recursion and Recurrences

Chapter 5 Recursion and Recurrences 5.1 Growth Rates of Solutions to Recurrences Divide and Conquer Algorithms One of the most basic and powerful algorithmic techniques is divide and conquer. Consider, for example,

### Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring Main issues so far Why does insurance work? How is risk premium defined and why is it important? How can claim frequency

### Confidence Intervals for the Mean of Non-normal Data Class 23, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Confidence Intervals for the Mean of Non-normal Data Class 23, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom Learning Goals. Be able to derive the formula for conservative normal confidence intervals for the proportion

### Riemann Sums y = f (x)

Riemann Sums Recall that we have previously discussed the area problem. In its simplest form we can state it this way: The Area Problem Let f be a continuous, non-negative function on the closed interval [a, b]. Find

### Irreducible polynomials with consecutive zero coefficients

Irreducible polynomials with consecutive zero coefficients Theodoulos Garefalakis Department of Mathematics, University of Crete, 71409 Heraklion, Greece Abstract Let q be a prime power. We consider the problem

### Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Rajiv T. Maheswaran, Milind Tambe, Emma Bowring, Jonathan P. Pearce, and Pradeep Varakantham University of Southern California

### INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Investment Performance Standards (GIPS) Guidance Statement on Calculation Methodology The Association for Investment Management and Research (AIMR) seeks

### Universal coding for classes of sources

Connexions module: m46228 Universal coding for classes of sources Dever Greene This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License We have discussed several parametric

### Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 7 - Sampling Distributions 1 Introduction What is statistics? It consists of three major areas: Data Collection: sampling plans and experimental designs Descriptive Statistics: numerical and graphical summaries

### Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1 Inferential Statistics POPULATION SAMPLE A clothing store chain regularly buys from a supplier large quantities of a certain piece of clothing. Each item can be classified either

### Overview of some probability distributions.

Lecture Overview of some probability distributions. In this lecture we will review several common distributions that will be used often throughout the class. Each distribution is usually described by its probability

### Definition. Definition. 7-2 Estimating a Population Proportion. Definition. Definition

7-2 Estimating a Population Proportion In this section we present methods for using a sample proportion to estimate the value of a population proportion. The sample proportion is the best point estimate of the population

### Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

This document was written and copyrighted by Paul Dawkins. Use of this document and its online version is governed by the Terms and Conditions of Use located at http://tutorial.math.lamar.edu/terms.asp. The online version

### Methods of Evaluating Estimators

Math 541: Statistical Theory II Instructor: Songfeng Zheng Methods of Evaluating Estimators Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables, i.e., a random sample from $f(x \mid \theta)$, where $\theta$ is unknown. An estimator of $\theta$ is

### 1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: $\bar{x} \pm t \, s/\sqrt{n}$, which was based on the sample mean. So, $\bar{x}$ is guaranteed to be in the interval you form. D. Use the rule : p-value

INFINITE SERIES KEITH CONRAD. Introduction The two basic concepts of calculus, differentiation and integration, are defined in terms of limits (Newton quotients and Riemann sums). In addition to these is a third fundamental

### Confidence Intervals for One Mean

Chapter 420 Confidence Intervals for One Mean Introduction This routine calculates the sample size necessary to achieve a specified distance from the mean to the confidence limit(s) at a stated confidence level for a

### Math Discrete Math Combinatorics MULTIPLICATION PRINCIPLE:

Math 355 - Discrete Math 4.1-4.4 Combinatorics Notes MULTIPLICATION PRINCIPLE: If there are m ways to do something and n ways to do another thing then there are mn ways to do both. In the language of set theory: Let

### B1. Fourier Analysis of Discrete Time Signals

B1. Fourier Analysis of Discrete Time Signals Objectives Introduce discrete time periodic signals Define the Discrete Fourier Series (DFS) expansion of periodic signals Define the Discrete Fourier Transform (DFT)

### Infinite Sequences and Series

CHAPTER 4 Infinite Sequences and Series 4.1. Sequences A sequence is an infinite ordered list of numbers, for example the sequence of odd positive integers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29...

### The Stable Marriage Problem

The Stable Marriage Problem William Hunt Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV William.Hunt@mail.wvu.edu 1 Introduction Imagine you are a matchmaker,

### Numerical Solution of Equations

School of Mechanical Aerospace and Civil Engineering Numerical Solution of Equations T J Craft George Begg Building, C4 TPFE MSc CFD- Reading: J Ferziger, M Peric, Computational Methods for Fluid Dynamics HK Versteeg,

### An example of non-quenched convergence in the conditional central limit theorem for partial sums of a linear process

An example of non-quenched convergence in the conditional central limit theorem for partial sums of a linear process Dalibor Volný and Michael Woodroofe Abstract A causal linear processes X,X 0,X is constructed for which