On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
Purushottam Kar
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, UP, INDIA.

Bharath K Sriperumbudur
Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WB, ENGLAND.

Prateek Jain
Microsoft Research India, Vigyan, #9, Lavelle Road, Bangalore, KA, INDIA.

Harish C Karnick
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, UP, INDIA.

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

Abstract

In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al. (2012) for the same problem. Using our decoupling technique, we are further able to obtain fast convergence rates for strongly convex pairwise loss functions. We are also able to analyze a class of memory efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, in order to complement our generalization bounds, we propose a novel memory efficient online learning algorithm for higher order learning problems with bounded regret guarantees.

1. Introduction

Several supervised learning problems involve working with pairwise or higher order loss functions, i.e., loss functions that depend on more than one training sample. Take for example the metric learning problem (Jin et al., 2009), where the goal is to learn a metric $M$ that brings points of a similar label together while keeping differently labeled points apart.
In this case the loss function used is a pairwise loss function $\ell(M, (x, y), (x', y')) = \phi\left(yy'(1 - M(x, x'))\right)$ where $\phi$ is the hinge loss function. In general, a pairwise loss function is of the form $\ell : \mathcal{H} \times \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ where $\mathcal{H}$ is the hypothesis space and $\mathcal{X}$ is the input domain. Other examples include preference learning (Xing et al., 2002), ranking (Agarwal & Niyogi, 2009), AUC maximization (Zhao et al., 2011) and multiple kernel learning (Kumar et al., 2012).

In practice, algorithms for such problems use intersecting pairs of training samples to learn. Hence the training data pairs are not i.i.d. and consequently, standard generalization error analysis techniques do not apply to these algorithms. Recently, the analysis of batch algorithms learning from such coupled samples has received much attention (Cao et al., 2012; Clémençon et al., 2008; Brefeld & Scheffer, 2005) where a dominant idea has been to use an alternate representation of the U-statistic and provide uniform convergence bounds. Another popular approach has been to use algorithmic stability (Agarwal & Niyogi, 2009; Jin et al., 2009) to obtain algorithm-specific results.

While batch algorithms for pairwise (and higher-order) learning problems have been studied well theoretically, online learning based stochastic algorithms are more popular in practice due to their scalability. However, their generalization properties were not studied until recently. Wang et al. (2012) provided the first generalization error analysis of online learning methods applied to pairwise loss functions. In particular, they showed that such higher-order online learning methods also admit online to batch conversion bounds (similar to those for first-order problems (Cesa-Bianchi et al., 2001)) which can be combined with regret bounds to obtain generalization error bounds. However, due to their proof technique and dependence on $L_\infty$ covering numbers of function classes, their bounds are not tight and have a strong dependence on the dimensionality of the input space.

In the literature, there are several instances where Rademacher complexity based techniques achieve sharper bounds than those based on covering numbers (Kakade et al., 2008). However, the coupling of different input pairs in our problem does not allow us to use such techniques directly.

In this paper we introduce a generic technique for analyzing online learning algorithms for higher order learning problems. Our technique, which uses an extension of Rademacher complexities to higher order function classes (instead of covering numbers), allows us to give bounds that are tighter than those of Wang et al. (2012) and that, for several learning scenarios, have no dependence on input dimensionality at all.

Key to our proof is a technique we call Symmetrization of Expectations which acts as a decoupling step and allows us to reduce excess risk estimates to Rademacher complexities of function classes. Wang et al. (2012), on the other hand, perform a symmetrization with probabilities which, apart from being more involved, yields suboptimal bounds.

Another advantage of our technique is that it allows us to obtain fast convergence rates for learning algorithms that use strongly convex loss functions. Our result, which uses a novel two stage proof technique, extends a similar result in the first order setting by Kakade & Tewari (2008) to the pairwise setting.

Wang et al. (2012) (and our results mentioned above) assume an online learning setup in which a stream of points $z_1, \ldots, z_n$ is observed and the penalty function used at the $t$-th step is

$$\hat{L}_t(h) = \frac{1}{t-1} \sum_{\tau=1}^{t-1} \ell(h, z_t, z_\tau).$$

Consequently, the results of Wang et al. (2012) expect regret bounds with respect to these all-pairs penalties $\hat{L}_t$. This requires one to use and store all previously seen points, which is expensive in both computation and storage. Hence, in practice, learning algorithms update their hypotheses using only a bounded subset of the past samples (Zhao et al., 2011).

In the above mentioned setting, we are able to give generalization bounds that only require algorithms to give regret bounds with respect to finite-buffer penalty functions such as

$$\hat{L}^{\text{buf}}_t(h) = \frac{1}{|B|} \sum_{z \in B} \ell(h, z_t, z)$$

where $B$ is a buffer that is updated at each step. Our proofs hold for any stream oblivious buffer update policy including FIFO and the widely used reservoir sampling policy (Vitter, 1985; Zhao et al., 2011). (See Footnote 1.)

To complement our online to batch conversion bounds, we also provide a memory efficient online learning algorithm that works with bounded buffers. Although our algorithm is constrained to observe and learn using the finite-buffer penalties $\hat{L}^{\text{buf}}_t$ alone, we are still able to provide high confidence regret bounds with respect to the all-pairs penalty functions $\hat{L}_t$. We note that Zhao et al. (2011) also propose an algorithm that uses finite buffers and claim an all-pairs regret bound for the same. However, their regret bound does not hold due to a subtle mistake in their proof.

We also provide empirical validation of our proposed online learning algorithm on AUC maximization tasks and show that our algorithm performs competitively with that of Zhao et al. (2011), in addition to being able to offer theoretical regret bounds.

Our Contributions:
(a) We provide a generic online-to-batch conversion technique for higher-order supervised learning problems offering bounds that are sharper than those of Wang et al. (2012).
(b) We obtain fast convergence rates when loss functions are strongly convex.
(c) We analyze online learning algorithms that are constrained to learn using a finite buffer.
(d) We propose a novel online learning algorithm that works with finite buffers but is able to provide a high confidence regret bound with respect to the all-pairs penalty functions.

Footnote 1: Independently, Wang et al. (2013) also extended their proof to give similar guarantees. However, their bounds hold only for the FIFO update policy and have worse dependence on dimensionality in several cases (see Section 5).

2. Problem Setup

For ease of exposition, we introduce an online learning model for higher order supervised learning problems in this section; concrete learning instances such as AUC maximization and metric learning are given in Section 6. For the sake of simplicity, we restrict ourselves to pairwise problems in this paper; our techniques can be readily extended to higher order problems as well.

For pairwise learning problems, our goal is to learn a real valued bivariate function $h^* : \mathcal{X} \times \mathcal{X} \to \mathcal{Y}$, where $h^* \in \mathcal{H}$, under some loss function $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}_+$ where $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$.

The online learning algorithm is given sequential access to a stream of elements $z_1, z_2, \ldots, z_n$ chosen i.i.d. from the domain $\mathcal{Z}$. Let $Z^t := \{z_1, \ldots, z_t\}$. At each time step $t = 2, \ldots, n$, the algorithm posits a hypothesis $h_{t-1} \in \mathcal{H}$ upon which the element $z_t$ is revealed and the algorithm incurs the following penalty:

$$\hat{L}_t(h_{t-1}) = \frac{1}{t-1} \sum_{\tau=1}^{t-1} \ell(h_{t-1}, z_t, z_\tau). \quad (1)$$

For any $h \in \mathcal{H}$, we define its expected risk as:

$$L(h) := \mathbb{E}_{z, z'}\, \ell(h, z, z'). \quad (2)$$

Our aim is to present an ensemble $h_1, \ldots, h_{n-1}$ such that the expected risk of the ensemble is small. More specifically, we desire that, for some small $\epsilon > 0$,

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le L(h^*) + \epsilon,$$

where $h^* = \arg\min_{h \in \mathcal{H}} L(h)$ is the population risk minimizer. Note that this allows us to do hypothesis selection in a way that ensures small expected risk. Specifically, if one chooses a hypothesis as $\hat{h} := \frac{1}{n-1} \sum_{t=2}^{n} h_{t-1}$ (for convex $\ell$) or $\hat{h} := \arg\min_{t = 1, \ldots, n-1} L(h_t)$, then we have $L(\hat{h}) \le L(h^*) + \epsilon$.

Since the model presented above requires storing all previously seen points, it becomes unusable in large scale learning scenarios. Instead, in practice, a sketch of the stream is maintained in a buffer $B$ of capacity $s$. At each step, the penalty is now incurred only on the pairs $\{(z_t, z) : z \in B_t\}$ where $B_t$ is the state of the buffer at time $t$. That is,

$$\hat{L}^{\text{buf}}_t(h_{t-1}) = \frac{1}{|B_t|} \sum_{z \in B_t} \ell(h_{t-1}, z_t, z). \quad (3)$$

We shall assume that the buffer is updated at each step using some stream oblivious policy such as FIFO or Reservoir sampling (Vitter, 1985) (see Section 5).

In Section 3, we present online-to-batch conversion bounds for online learning algorithms that give regret bounds w.r.t. penalty functions given by (1). In Section 4, we extend our analysis to algorithms using strongly convex loss functions. In Section 5 we provide generalization error bounds for algorithms that give regret bounds w.r.t. finite-buffer penalty functions given by (3). Finally in Section 7 we present a novel memory efficient online learning algorithm with regret bounds.

3. Online to Batch Conversion Bounds for Bounded Loss Functions

We now present our generalization bounds for algorithms that provide regret bounds with respect to the all-pairs loss functions (see Eq. (1)). Our results give tighter bounds and have a much better dependence on input dimensionality than the bounds given by Wang et al. (2012). See Section 3.1 for a detailed comparison.

As was noted by Wang et al. (2012), the generalization error analysis of online learning algorithms in this setting does not follow from existing techniques for first-order problems (such as (Cesa-Bianchi et al., 2001; Kakade & Tewari, 2008)). The reason is that the terms $V_t = \hat{L}_t(h_{t-1})$ do not form a martingale due to the intersection of training samples in $V_t$ and $V_\tau$, $\tau < t$.

Our technique, which aims to utilize the Rademacher complexities of function classes in order to get tighter bounds, faces yet another challenge at the symmetrization step, a precursor to the introduction of Rademacher complexities. It turns out that, due to the coupling between the head variable $z_t$ and the tail variables $z_\tau$ in the loss function $\hat{L}_t$, a standard symmetrization between true $z_\tau$ and ghost $\tilde{z}_\tau$ samples does not succeed in generating Rademacher averages and instead yields complex looking terms.

More specifically, suppose we have true variables $z_t$ and ghost variables $\tilde{z}_t$ and are in the process of bounding the expected excess risk by analyzing expressions of the form

$$E_{\text{orig}} = \ell(h_{t-1}, z_t, z_\tau) - \ell(h_{t-1}, \tilde{z}_t, \tilde{z}_\tau).$$

Performing a traditional symmetrization of the variables $z_\tau$ with $\tilde{z}_\tau$ would give us expressions of the form

$$E_{\text{symm}} = \ell(h_{t-1}, z_t, \tilde{z}_\tau) - \ell(h_{t-1}, \tilde{z}_t, z_\tau).$$

At this point the analysis hits a barrier since, unlike first order situations, we cannot relate $E_{\text{symm}}$ to $E_{\text{orig}}$ by means of introducing Rademacher variables.

We circumvent this problem by using a technique that we call Symmetrization of Expectations. The technique allows us to use standard symmetrization to obtain Rademacher complexities.
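More specifically, under Symmetrization of Expectations we analyze expressions of the form

$$E_{\text{orig}} = \mathbb{E}_{z}\, \ell(h_{t-1}, z, z_\tau) - \mathbb{E}_{z}\, \ell(h_{t-1}, z, \tilde{z}_\tau)$$

which upon symmetrization yield expressions such as

$$E_{\text{symm}} = \mathbb{E}_{z}\, \ell(h_{t-1}, z, \tilde{z}_\tau) - \mathbb{E}_{z}\, \ell(h_{t-1}, z, z_\tau)$$

which allow us to introduce Rademacher variables since $E_{\text{symm}} = -E_{\text{orig}}$. This idea is exploited by the lemma given below that relates the expected risk of the ensemble to the penalties incurred during the online learning process. In the following we use the following extension of Rademacher averages (Kakade et al., 2008) to bivariate function classes:

$$\mathcal{R}_n(\mathcal{H}) = \mathbb{E}\left[ \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{\tau=1}^{n} \epsilon_\tau h(z, z_\tau) \right]$$

where the expectation is over $\epsilon_\tau$, $z$ and $z_\tau$. We shall denote composite function classes as follows: $\ell \circ \mathcal{H} := \{(z, z') \mapsto \ell(h, z, z'),\ h \in \mathcal{H}\}$.

Lemma 1. Let $h_1, \ldots, h_{n-1}$ be an ensemble of hypotheses generated by an online learning algorithm working with a bounded loss function $\ell : \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to [0, B]$. Then for any $\delta > 0$, we have with probability at least $1 - \delta$,

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le \frac{1}{n-1} \sum_{t=2}^{n} \hat{L}_t(h_{t-1}) + \frac{2}{n-1} \sum_{t=2}^{n} \mathcal{R}_{t-1}(\ell \circ \mathcal{H}) + 3B \sqrt{\frac{\log \frac{n}{\delta}}{n-1}}.$$

The proof of the lemma involves decomposing the excess risk term into a martingale difference sequence and a residual term in a manner similar to Wang et al. (2012). The martingale sequence, being a bounded one, is shown to converge using the Azuma-Hoeffding inequality. The residual term is handled using uniform convergence techniques involving Rademacher averages. The complete proof of the lemma is given in Appendix A.

Similar to Lemma 1, the following converse relation between the population and empirical risk of the population risk minimizer $h^*$ can also be shown.

Lemma 2. For any $\delta > 0$, we have with probability at least $1 - \delta$,

$$\frac{1}{n-1} \sum_{t=2}^{n} \hat{L}_t(h^*) \le L(h^*) + \frac{2}{n-1} \sum_{t=2}^{n} \mathcal{R}_{t-1}(\ell \circ \mathcal{H}) + 3B \sqrt{\frac{\log \frac{1}{\delta}}{n-1}}.$$

An online learning algorithm will be said to have an all-pairs regret bound $\mathfrak{R}_n$ if it presents an ensemble $h_1, \ldots, h_{n-1}$ such that

$$\sum_{t=2}^{n} \hat{L}_t(h_{t-1}) \le \inf_{h \in \mathcal{H}} \sum_{t=2}^{n} \hat{L}_t(h) + \mathfrak{R}_n.$$

Suppose we have an online learning algorithm with a regret bound $\mathfrak{R}_n$. Then combining Lemmata 1 and 2 gives us the following online to batch conversion bound:

Theorem 3. Let $h_1, \ldots, h_{n-1}$ be an ensemble of hypotheses generated by an online learning algorithm working with a $B$-bounded loss function $\ell$ that guarantees a regret bound of $\mathfrak{R}_n$. Then for any $\delta > 0$, we have with probability at least $1 - \delta$,

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le L(h^*) + \frac{4}{n-1} \sum_{t=2}^{n} \mathcal{R}_{t-1}(\ell \circ \mathcal{H}) + \frac{\mathfrak{R}_n}{n-1} + 6B \sqrt{\frac{\log \frac{n}{\delta}}{n-1}}.$$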
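To get a feel for how the three terms of Theorem 3 trade off, the following is a minimal sketch that evaluates the excess-risk slack, i.e. the right-hand side minus $L(h^*)$. It assumes the bound has the form $\frac{4}{n-1}\sum_{t=2}^{n}\mathcal{R}_{t-1}(\ell\circ\mathcal{H}) + \frac{\mathfrak{R}_n}{n-1} + 6B\sqrt{\log(n/\delta)/(n-1)}$. The function name `theorem3_slack` and the `rademacher` callback are hypothetical, introduced only for illustration; the caller must supply an upper bound on $\mathcal{R}_m(\ell\circ\mathcal{H})$ for each $m$.

```python
import math
from typing import Callable


def theorem3_slack(n: int, regret: float, B: float, delta: float,
                   rademacher: Callable[[int], float]) -> float:
    """Evaluate the slack terms of the online-to-batch bound.

    n          -- length of the stream (n >= 2)
    regret     -- all-pairs regret bound of the online algorithm
    B          -- upper bound on the loss
    delta      -- failure probability, 0 < delta < 1
    rademacher -- hypothetical callback: m -> upper bound on R_m(l o H)
    """
    if n < 2 or not (0.0 < delta < 1.0):
        raise ValueError("need n >= 2 and 0 < delta < 1")
    # Rademacher term: (4 / (n - 1)) * sum_{t=2}^{n} R_{t-1}
    rad_term = 4.0 / (n - 1) * sum(rademacher(t - 1) for t in range(2, n + 1))
    # Regret term: regret / (n - 1)
    regret_term = regret / (n - 1)
    # Confidence term: 6 B sqrt(log(n / delta) / (n - 1))
    conf_term = 6.0 * B * math.sqrt(math.log(n / delta) / (n - 1))
    return rad_term + regret_term + conf_term
```

For example, passing `rademacher=lambda m: 1.0 / math.sqrt(m)` (an assumed dimension-independent complexity) and `regret=math.sqrt(n)` shows all three terms shrinking as `n` grows.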
As we shall see in Section 6, for several learning problems, the Rademacher complexities behave as $\mathcal{R}_{t-1}(\ell \circ \mathcal{H}) \le C_d \cdot O\left(\frac{1}{\sqrt{t-1}}\right)$ where $C_d$ is a constant dependent only on the dimension $d$ of the input space and the $O(\cdot)$ notation hides constants dependent on the domain size and the loss function. This allows us to bound the excess risk as follows:

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le L(h^*) + \frac{\mathfrak{R}_n}{n-1} + O\left( \frac{C_d}{\sqrt{n}} + \sqrt{\frac{\log(n/\delta)}{n}} \right).$$

Here, the error decreases with $n$ at a standard $1/\sqrt{n}$ rate (up to a $\sqrt{\log n}$ factor), similar to that obtained by Wang et al. (2012). However, for several problems the above bound can be significantly tighter than those offered by covering number based arguments. We provide below a detailed comparison of our results with those of Wang et al. (2012).

3.1. Discussion on the nature of our bounds

As mentioned above, our proof enables us to use Rademacher complexities which are typically easier to analyze and provide tighter bounds (Kakade et al., 2008). In particular, as shown in Section 6, for $L_2$ regularized learning formulations, the Rademacher complexities are dimension independent, i.e. $C_d = 1$. Consequently, unlike the bounds of Wang et al. (2012) that have a linear dependence on $d$, our bound becomes independent of the input space dimension. For sparse learning formulations with $L_1$ or trace norm regularization, we have $C_d = \sqrt{\log d}$, giving us a mild dependence on the input dimensionality.

Our bounds are also tighter than those of Wang et al. (2012) in general. Whereas we provide a confidence bound of $\delta < \exp\left(-n\epsilon^2 + \log n\right)$, Wang et al. (2012) offer a weaker bound $\delta < (1/\epsilon)^d \exp\left(-n\epsilon^2 + \log n\right)$.

An artifact of the proof technique of Wang et al. (2012) is that their proof is required to exclude a constant fraction of the ensemble $(h_1, \ldots, h_{cn})$ from the analysis, failing which their bounds turn vacuous. Our proof on the other hand is able to give guarantees for the entire ensemble.

In addition to this, as the following sections show, our proof technique enjoys the flexibility of being extendable to give fast convergence guarantees for strongly convex loss functions as well as being able to accommodate learning algorithms that use finite buffers.

4. Fast Convergence Rates for Strongly Convex Loss Functions

In this section we extend results of the previous section to give fast convergence guarantees for online learning algorithms that use strongly convex loss functions of the following form: $\ell(h, z, z') = g(\langle h, \phi(z, z') \rangle) + r(h)$, where $g$ is a convex function and $r(h)$ is a $\sigma$-strongly convex regularizer (see Section 6 for examples), i.e. $\forall h_1, h_2 \in \mathcal{H}$ and $\alpha \in [0, 1]$, we have

$$r(\alpha h_1 + (1 - \alpha) h_2) \le \alpha r(h_1) + (1 - \alpha) r(h_2) - \frac{\sigma}{2} \alpha (1 - \alpha) \|h_1 - h_2\|^2.$$

For any norm $\|\cdot\|$, let $\|\cdot\|_*$ denote its dual norm. Our analysis reduces the pairwise problem to a first order problem and a martingale convergence problem. We require the following fast convergence bound in the standard first order batch learning setting:

Theorem 4. Let $\mathcal{F}$ be a closed and convex set of functions over $\mathcal{X}$. Let $\wp(f, x) = p(\langle f, \phi(x) \rangle) + r(f)$, for a $\sigma$-strongly convex function $r$, be a loss function with $P$ and $\hat{P}$ as the associated population and empirical risk functionals and $f^*$ as the population risk minimizer. Suppose $\wp$ is $L$-Lipschitz and $\|\phi(x)\|_* \le R$, $\forall x \in \mathcal{X}$. Then w.p. $1 - \delta$, for any $\epsilon > 0$, we have for all $f \in \mathcal{F}$,

$$P(f) - P(f^*) \le (1 + \epsilon)\left( \hat{P}(f) - \hat{P}(f^*) \right) + \frac{C_\delta}{\epsilon \sigma n}$$

where $C_\delta = C_d^2 \cdot (4(1 + \epsilon) L R)^2 (32 + \log(1/\delta))$ and $C_d$ is the dependence of the Rademacher complexity of the class $\mathcal{F}$ on the input dimensionality $d$.

The above theorem is a minor modification of a similar result by Sridharan et al. (2008) and the proof (given in Appendix B) closely follows their proof as well. We can now state our online to batch conversion result for strongly convex loss functions.

Theorem 5. Let $h_1, \ldots, h_{n-1}$ be an ensemble of hypotheses generated by an online learning algorithm working with a $B$-bounded, $L$-Lipschitz and $\sigma$-strongly convex loss function $\ell$. Further suppose the learning algorithm guarantees a regret bound of $\mathfrak{R}_n$. Let $V_n = \max\left\{ \mathfrak{R}_n,\ 2 C_d^2 \log n \log(n/\delta) \right\}$. Then for any $\delta > 0$, we have with probability at least $1 - \delta$,

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le L(h^*) + \frac{\mathfrak{R}_n}{n-1} + C_d \cdot O\left( \frac{\sqrt{V_n \log n \log(n/\delta)}}{n-1} \right),$$

where the $O(\cdot)$ notation hides constants dependent on domain size and the loss function such as $L$, $B$ and $\sigma$.

The decomposition of the excess risk in this case is not made explicitly but rather emerges as a side-effect of the proof progression. The proof starts off by applying Theorem 4 to the hypothesis in each round with the following loss function $\wp(h, z') := \mathbb{E}_{z}\, \ell(h, z, z')$. Applying the regret bound to the resulting expression gives us a martingale difference sequence which we then bound using Bernstein-style inequalities and a proof technique from Kakade & Tewari (2008). The complete proof is given in Appendix C.

We now note some properties of this result. The effective dependence of the above bound on the input dimensionality is $C_d^2$ since the expression $\sqrt{V_n}$ hides a $C_d$ term. We have $C_d^2 = 1$ for non sparse learning formulations and $C_d^2 = \log d$ for sparse learning formulations. We note that our bound matches that of Kakade & Tewari (2008) (for first-order learning problems) up to a logarithmic factor.

5. Analyzing Online Learning Algorithms that use Finite Buffers

In this section, we present our online to batch conversion bounds for algorithms that work with finite-buffer loss functions $\hat{L}^{\text{buf}}_t$. Recall that an online learning algorithm working with finite buffers incurs a loss $\hat{L}^{\text{buf}}_t(h_{t-1}) = \frac{1}{|B_t|} \sum_{z \in B_t} \ell(h_{t-1}, z_t, z)$ at each step where $B_t$ is the state of the buffer at time $t$.

An online learning algorithm will be said to have a finite-buffer regret bound $\mathfrak{R}^{\text{buf}}_n$ if it presents an ensemble $h_1, \ldots, h_{n-1}$ such that

$$\sum_{t=2}^{n} \hat{L}^{\text{buf}}_t(h_{t-1}) - \inf_{h \in \mathcal{H}} \sum_{t=2}^{n} \hat{L}^{\text{buf}}_t(h) \le \mathfrak{R}^{\text{buf}}_n.$$

For our guarantees to hold, we require the buffer update policy used by the learning algorithm to be stream oblivious.
More specifically, we require the buffer update rule to decide upon the inclusion of a particular point $z_i$ in the buffer based only on its stream index $i \in [n]$. Popular examples of stream oblivious policies include Reservoir sampling (Vitter, 1985) (referred to as RS henceforth) and FIFO. Stream oblivious policies allow us to decouple buffer construction randomness from training sample randomness which makes analysis easier; we leave the analysis of stream aware buffer update policies as a topic of future research. In the above mentioned setting, we can prove the following online to batch conversion bounds:

Theorem 6. Let $h_1, \ldots, h_{n-1}$ be an ensemble of hypotheses generated by an online learning algorithm working with a finite buffer of capacity $s$ and a $B$-bounded loss function $\ell$. Moreover, suppose that the algorithm guarantees a regret bound of $\mathfrak{R}^{\text{buf}}_n$. Then for any $\delta > 0$, we have with probability at least $1 - \delta$,

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le L(h^*) + \frac{\mathfrak{R}^{\text{buf}}_n}{n-1} + O\left( \frac{C_d}{\sqrt{s}} + B \sqrt{\frac{\log \frac{n}{\delta}}{s}} \right).$$

If the loss function is Lipschitz and strongly convex as well, then with the same confidence, we have

$$\frac{1}{n-1} \sum_{t=2}^{n} L(h_{t-1}) \le L(h^*) + \frac{\mathfrak{R}^{\text{buf}}_n}{n-1} + C_d \cdot O\left( \sqrt{\frac{W_n \log \frac{n}{\delta}}{sn}} \right)$$

where $W_n = \max\left\{ \mathfrak{R}^{\text{buf}}_n,\ \frac{2 C_d^2\, n \log(n/\delta)}{s} \right\}$ and $C_d$ is the dependence of $\mathcal{R}_n(\mathcal{H})$ on the input dimensionality $d$.

The above bound guarantees an excess error of $\tilde{O}(1/s)$ for algorithms (such as Follow-the-leader (Hazan et al., 2006)) that offer logarithmic regret $\mathfrak{R}^{\text{buf}}_n = O(\log n)$. We stress that this theorem is not a direct corollary of our results for the infinite buffer case (Theorems 3 and 5). Instead, our proofs require a more careful analysis of the excess risk in order to accommodate the finiteness of the buffer and the randomness (possibly) used in constructing it.

More specifically, care needs to be taken to handle randomized buffer update policies such as RS which introduce additional randomness into the analysis. A naive application of techniques used to prove results for the unbounded buffer case would result in bounds that give non trivial generalization guarantees only for large buffer sizes such as $s = \omega(\sqrt{n})$. Our bounds, on the other hand, only require $s = \tilde{\omega}(1)$.

Key to our proofs is a conditioning step where we first analyze the conditional excess risk by conditioning upon randomness used by the buffer update policy.
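The two stream oblivious policies named in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fifo_update` and `reservoir_update` are hypothetical, and the reservoir step is the textbook "Algorithm R" form of reservoir sampling. In both cases the decision to keep or evict depends only on the stream index `t` and on the policy's own random bits, never on the value of the point `z`, which is exactly the stream obliviousness the analysis relies on.

```python
import random
from collections import deque


def fifo_update(buffer: deque, z, s: int) -> None:
    """FIFO policy: keep the s most recent points."""
    if len(buffer) >= s:
        buffer.popleft()          # evict the oldest point
    buffer.append(z)


def reservoir_update(buffer: list, z, s: int, t: int,
                     rng: random.Random) -> None:
    """Reservoir sampling (RS): after step t the buffer is a uniform
    sample, without replacement, of the first t points.

    t is the 1-based stream index of z.
    """
    if len(buffer) < s:
        buffer.append(z)          # buffer not yet full
    else:
        j = rng.randrange(t)      # uniform over {0, ..., t - 1}
        if j < s:                 # happens with probability s / t
            buffer[j] = z
```

Because neither function inspects `z`, running the same policy with the same random seed on two different streams retains the same stream positions in the buffer.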
Conditioning on the buffer randomness is made possible by the stream-oblivious nature of the update policy and thus, stream-obliviousness is required by our analysis. Subsequently, we analyze the excess risk by taking expectations over randomness used by the buffer update policy. The complete proofs of both parts of Theorem 6 are given in Appendix D.

Note that the above results only require an online learning algorithm to provide regret bounds w.r.t. the finite-buffer penalties $\hat{L}^{\text{buf}}_t$ and do not require any regret bounds w.r.t. the all-pairs penalties $\hat{L}_t$.

For instance, the finite buffer based online learning algorithms OAM$_{\text{seq}}$ and OAM$_{\text{gra}}$ proposed in (Zhao et al., 2011) are able to provide a regret bound w.r.t. $\hat{L}^{\text{buf}}_t$ (Zhao et al., 2011, Lemma 2) but are not able to do so w.r.t. the all-pairs loss function (see Section 7 for a discussion). Using Theorem 6, we are able to give a generalization bound for OAM$_{\text{seq}}$ and OAM$_{\text{gra}}$ and hence explain the good empirical performance of these algorithms as reported in (Zhao et al., 2011). Note that Wang et al. (2013) are not able to analyze OAM$_{\text{seq}}$ and OAM$_{\text{gra}}$ since their analysis is restricted to algorithms that use the (deterministic) FIFO update policy whereas OAM$_{\text{seq}}$ and OAM$_{\text{gra}}$ use the (randomized) RS policy of Vitter (1985).

6. Applications

In this section we make explicit our online to batch conversion bounds for several learning scenarios and also demonstrate their dependence on input dimensionality by calculating their respective Rademacher complexities. Recall that our definition of Rademacher complexity for a pairwise function class is given by

$$\mathcal{R}_n(\mathcal{H}) = \mathbb{E}\left[ \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{\tau=1}^{n} \epsilon_\tau h(z, z_\tau) \right].$$

For our purposes, we would be interested in the Rademacher complexities of composition classes of the form $\ell \circ \mathcal{H} := \{(z, z') \mapsto \ell(h, z, z'),\ h \in \mathcal{H}\}$ where $\ell$ is some Lipschitz loss function. Frequently we have $\ell(h, z, z') = \phi\left(h(x, x') Y(y, y')\right)$ where $Y(y, y') = y - y'$ or $Y(y, y') = yy'$ and $\phi : \mathbb{R} \to \mathbb{R}$ is some margin loss function (Steinwart & Christmann, 2008). Suppose $\phi$ is $L$-Lipschitz and $Y = \sup_{y, y' \in \mathcal{Y}} |Y(y, y')|$. Then we have

Theorem 7. $\mathcal{R}_n(\ell \circ \mathcal{H}) \le L Y \mathcal{R}_n(\mathcal{H})$.
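The pairwise Rademacher average recalled above can be estimated numerically for simple classes. The sketch below is an illustration under stated assumptions, not part of the paper's analysis: it takes the linear class $\{(x, x') \mapsto w^\top (x - x') : \|w\|_2 \le W\}$, fixes one sample (a head point and $n$ tail points) instead of taking the expectation over the data, and averages over random sign vectors only. For this class the supremum has the closed form $\frac{W}{n} \left\| \sum_\tau \epsilon_\tau (x - x_\tau) \right\|_2$ by Cauchy-Schwarz. The function name `pairwise_rademacher_linear` is hypothetical.

```python
import math
import random


def pairwise_rademacher_linear(head, tails, w_radius, n_draws, rng):
    """Monte Carlo estimate (over the signs only) of the pairwise
    Rademacher average of {(x, x') -> w.(x - x') : ||w||_2 <= w_radius}
    on a fixed head point and a fixed list of tail points."""
    n, d = len(tails), len(head)
    # Pairwise feature differences x - x_tau for the fixed head point x.
    diffs = [[head[k] - x[k] for k in range(d)] for x in tails]
    total = 0.0
    for _ in range(n_draws):
        signed_sum = [0.0] * d
        for v in diffs:
            eps = rng.choice((-1.0, 1.0))        # Rademacher variable
            for k in range(d):
                signed_sum[k] += eps * v[k]
        # sup over the L2 ball is attained in the direction of signed_sum.
        norm = math.sqrt(sum(c * c for c in signed_sum))
        total += w_radius * norm / n
    return total / n_draws
```

If every point has Euclidean norm at most 1, each difference has norm at most 2, so by Jensen's inequality the exact average over signs is at most $2W/\sqrt{n}$; the estimate therefore decays at the $1/\sqrt{n}$ rate with no explicit dependence on the dimension $d$.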
The proof of Theorem 7 uses standard contraction inequalities and is given in Appendix E. This reduces our task to computing the values of $\mathcal{R}_n(\mathcal{H})$ which we do using a two stage proof technique (see Appendix F). For any subset $X$ of a Banach space and any norm $\|\cdot\|_p$, we define $\|X\|_p := \sup_{x \in X} \|x\|_p$. Let the domain $\mathcal{X} \subset \mathbb{R}^d$.

AUC maximization (Zhao et al., 2011): the goal here is to maximize the area under the ROC curve for a linear classification problem where the hypothesis space $\mathcal{W} \subset \mathbb{R}^d$. We have $h_w(x, x') = w^\top x - w^\top x'$ and $\ell(h_w, z, z') = \phi\left((y - y') h_w(x, x')\right)$ where $\phi$ is the hinge loss. In case our classifiers are $L_p$ regularized for $p > 1$, we can show that $\mathcal{R}_n(\mathcal{W}) \le 2 \|\mathcal{X}\|_q \|\mathcal{W}\|_p \sqrt{\frac{q-1}{n}}$ where $q = p/(p-1)$. Using the sparsity promoting $L_1$ regularizer gives us $\mathcal{R}_n(\mathcal{W}) \le 2 \|\mathcal{X}\|_\infty \|\mathcal{W}\|_1 \sqrt{\frac{e \log d}{n}}$. Note that we obtain dimension independence, for example when the classifiers are $L_2$ regularized, which allows us to bound the Rademacher complexities of kernelized function classes for bounded kernels as well.

Metric learning (Jin et al., 2009): the goal here is to learn a Mahalanobis metric $M_W(x, x') = (x - x')^\top W (x - x')$ using the loss function $\ell(W, z, z') = \phi\left(yy'\left(1 - M_W^2(x, x')\right)\right)$ for a hypothesis class $\mathcal{W} \subset \mathbb{R}^{d \times d}$. In this case it is possible to use a variety of mixed norm $\|\cdot\|_{p,q}$ and Schatten norm $\|\cdot\|_{S(p)}$ regularizations on matrices in the hypothesis class. In case we use trace norm regularization on the matrix class, we get $\mathcal{R}_n(\mathcal{W}) \le \|\mathcal{X}\|_2^2 \|\mathcal{W}\|_{S(1)} \sqrt{\frac{e \log d}{n}}$. The $(2,2)$-norm regularization offers a dimension independent bound $\mathcal{R}_n(\mathcal{W}) \le \|\mathcal{X}\|_2^2 \|\mathcal{W}\|_{2,2} \sqrt{\frac{1}{n}}$. The mixed $(2,1)$-norm regularization offers $\mathcal{R}_n(\mathcal{W}) \le \|\mathcal{X}\|_2 \|\mathcal{X}\|_\infty \|\mathcal{W}\|_{2,1} \sqrt{\frac{e \log d}{n}}$.

Multiple kernel learning (Kumar et al., 2012): the goal here is to improve the SVM classification algorithm by learning a good kernel $K$ that is a positive combination of base kernels $K_1, \ldots, K_p$, i.e. $K_\mu(x, x') = \sum_{i=1}^{p} \mu_i K_i(x, x')$ for some $\mu \in \mathbb{R}^p$, $\mu \ge 0$. The base kernels are bounded, i.e. for all $i$, $|K_i(x, x')| \le \kappa^2$ for all $x, x' \in \mathcal{X}$. The notion of goodness used here is the one proposed by Balcan & Blum (2006) and involves using the loss function $\ell(\mu, z, z') = \phi\left(yy' K_\mu(x, x')\right)$ where $\phi(\cdot)$ is a margin loss function meant to encode some notion of alignment. The two hypothesis classes for the combination vector $\mu$ that we study are the $L_1$ regularized unit simplex $\Delta(1) = \{\mu : \|\mu\|_1 = 1, \mu \ge 0\}$ and the $L_2$ regularized unit sphere $S_2(1) = \{\mu : \|\mu\|_2 = 1, \mu \ge 0\}$. We are able to show the following Rademacher complexity bounds for these classes: $\mathcal{R}_n(S_2(1)) \le \kappa^2 \sqrt{\frac{p}{n}}$ and $\mathcal{R}_n(\Delta(1)) \le \kappa^2 \sqrt{\frac{e \log p}{n}}$.

The details of the Rademacher complexity derivations for these problems and other examples such as similarity learning can be found in Appendix F.

7. OLP: Online Learning with Pairwise Loss Functions

In this section, we present an online learning algorithm for learning with pairwise loss functions in a finite buffer setting. The key contribution in this section is a buffer update policy that, when combined with a variant of the GIGA algorithm (Zinkevich, 2003), allows us to give high probability regret bounds.

Algorithm 1 RS-x: Stream Subsampling with Replacement
Input: Buffer $B$, new point $z_t$, buffer size $s$, timestep $t$.
 1: if $|B| < s$ then  //There is space
 2:   $B \leftarrow B \cup \{z_t\}$
 3: else  //Overflow situation
 4:   if $t = s + 1$ then  //Repopulation step
 5:     TMP $\leftarrow B \cup \{z_t\}$
 6:     Repopulate $B$ with $s$ points sampled uniformly with replacement from TMP.
 7:   else  //Normal update step
 8:     Independently, replace each point of $B$ with $z_t$ with probability $1/t$.
 9:   end if
10: end if

Algorithm 2 OLP: Online Learning with Pairwise Loss Functions
Input: Step length scale $\eta$, buffer size $s$
Output: An ensemble $w_2, \ldots, w_n \in \mathcal{W}$ with low regret
 1: $w_0 \leftarrow 0$, $B \leftarrow \emptyset$
 2: for $t = 1$ to $n$ do
 3:   Obtain a training point $z_t$
 4:   Set step length $\eta_t \leftarrow \eta / \sqrt{t}$
 5:   $w_t \leftarrow \Pi_{\mathcal{W}}\left[ w_{t-1} - \frac{\eta_t}{|B|} \sum_{z \in B} \nabla_w \ell(w_{t-1}, z_t, z) \right]$  //$\Pi_{\mathcal{W}}$ projects onto the set $\mathcal{W}$
 6:   $B \leftarrow$ Update-buffer($B$, $z_t$, $s$, $t$)  //using RS-x
 7: end for
 8: return $w_2, \ldots, w_n$

In previous work, Zhao et al. (2011) presented an online learning algorithm that uses finite buffers with the RS policy and proposed an all-pairs regret bound. The RS policy ensures, over the randomness used in buffer updates, that at any given time, the buffer contains a uniform sample from the preceding stream. Using this property, (Zhao et al., 2011, Lemma 2) claimed that $\mathbb{E}\left[\hat{L}^{\text{buf}}_t(h_{t-1})\right] = \hat{L}_t(h_{t-1})$ where the expectation is taken over the randomness used in buffer construction. However, a property such as $\mathbb{E}\left[\hat{L}^{\text{buf}}_t(h)\right] = \hat{L}_t(h)$ holds only for functions $h$ that are either fixed or obtained independently of the random variables used in buffer updates (over which the expectation is taken). Since $h_{t-1}$ is learned from points in the buffer itself, the above property, and consequently the regret bound, does not hold.

We remedy this issue by showing a relatively weaker claim; we show that with high probability we have $\hat{L}_t(h_{t-1}) \le \hat{L}^{\text{buf}}_t(h_{t-1}) + \epsilon$.
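Algorithms 1 and 2 can be sketched in plain Python as below. This is a minimal illustration under several assumptions that are not in the paper: the hypothesis set is taken to be an L2 ball of radius `radius`, the loss is the AUC hinge loss $\phi((y - y') w^\top (x - x'))$ with $\phi(u) = \max(0, 1 - u)$, the gradient step is skipped while the buffer is still empty (at $t = 1$), and the update is written as a descent step. All function names are hypothetical.

```python
import math
import random


def rsx_update(buffer, z, s, t, rng):
    """RS-x: update `buffer` in place with point z seen at 1-based step t."""
    if len(buffer) < s:                       # there is space
        buffer.append(z)
    elif t == s + 1:                          # repopulation step
        tmp = buffer + [z]
        buffer[:] = [rng.choice(tmp) for _ in range(s)]
    else:                                     # normal update step
        for i in range(s):
            if rng.random() < 1.0 / t:        # independently, w.p. 1/t
                buffer[i] = z


def hinge_auc_grad(w, z, z_other):
    """Subgradient in w of max(0, 1 - (y - y') * w.(x - x'))."""
    (x, y), (x2, y2) = z, z_other
    diff = [a - b for a, b in zip(x, x2)]
    margin = (y - y2) * sum(wi * di for wi, di in zip(w, diff))
    if margin >= 1.0:
        return [0.0] * len(w)
    return [-(y - y2) * di for di in diff]


def project_l2(w, radius):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return w if norm <= radius else [wi * radius / norm for wi in w]


def olp(stream, s, eta, radius, rng):
    """OLP sketch: projected gradient steps on the finite-buffer penalty.

    stream is a list of (x, y) pairs with x a list of floats.
    Returns the iterates w_1, ..., w_n (one per time step).
    """
    d = len(stream[0][0])
    w, buffer, ensemble = [0.0] * d, [], []
    for t, z in enumerate(stream, start=1):
        if buffer:                            # assumption: no step on empty buffer
            step = eta / math.sqrt(t)
            grad = [0.0] * d
            for zb in buffer:
                g = hinge_auc_grad(w, z, zb)
                grad = [a + b for a, b in zip(grad, g)]
            w = project_l2(
                [wi - step * gi / len(buffer) for wi, gi in zip(w, grad)],
                radius)
        ensemble.append(list(w))
        rsx_update(buffer, z, s, t, rng)      # buffer is updated after the step
    return ensemble
```

Note that the hypothesis at step `t` is computed from the buffer as it stood before `z` was offered to it, so the penalty at each step pairs the new point only with earlier points.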
At a high level, the claim that $\hat{L}_t(h_{t-1}) \le \hat{L}^{\text{buf}}_t(h_{t-1}) + \epsilon$ holds with high probability is similar to showing uniform convergence bounds for $\hat{L}^{\text{buf}}_t$. However, the reservoir sampling algorithm is not particularly well suited to prove such uniform convergence bounds as it essentially performs sampling without replacement (see Appendix G for a discussion). We overcome this hurdle by proposing a new buffer update policy RS-x (see Algorithm 1) that, at each time step, guarantees $s$ i.i.d. samples from the preceding stream (see Appendix H for a proof).

Our algorithm uses this buffer update policy in conjunction with an online learning algorithm OLP (see Algorithm 2) that is a variant of the well-known GIGA algorithm (Zinkevich, 2003). We provide the following all-pairs regret guarantee for our algorithm:

Theorem 8. Suppose the OLP algorithm working with an $s$-sized buffer generates an ensemble $w_1, \ldots, w_{n-1}$. Then with probability at least $1 - \delta$,

$$\frac{\mathfrak{R}_n}{n-1} \le O\left( C_d \sqrt{\frac{\log \frac{n}{\delta}}{s}} + \sqrt{\frac{1}{n}} \right).$$

See Appendix I for the proof. A drawback of our bound is that it offers sublinear regret only for buffer sizes $s = \omega(\log n)$. A better regret bound for constant $s$ or a lower-bound on the regret is an open problem.

8. Experimental Evaluation

In this section we present experimental evaluation of our proposed OLP algorithm. We stress that the aim of this evaluation is to show that our algorithm, which enjoys high confidence regret bounds, also performs competitively in practice with respect to the OAM$_{\text{gra}}$ algorithm proposed by Zhao et al. (2011), since our results in Section 5 show that OAM$_{\text{gra}}$ does enjoy good generalization guarantees despite the lack of an all-pairs regret bound.

In our experiments, we adapted the OLP algorithm to the AUC maximization problem and compared it with OAM$_{\text{gra}}$ on 8 different benchmark datasets. We used 60% of the available data points, up to a fixed maximum number of points, to train both algorithms. We refer the reader to Appendix J for a discussion on the implementation of the RS-x algorithm. Figure 1 presents the results of our experiments on 4 datasets across 5 random training/test splits. Results on other datasets can be found in Appendix K. The results demonstrate that OLP performs competitively with OAM$_{\text{gra}}$ while in some cases having slightly better performance for small buffer sizes.

Figure 1. Performance of OLP (using RS-x) and OAM$_{\text{gra}}$ (using RS) by (Zhao et al., 2011) on AUC maximization tasks with varying buffer sizes. Panels: (a) Sonar, (b) Segment, (c) IJCNN, (d) Covertype. Each panel plots average AUC value against buffer size for Vitter's RS policy and the RS-x policy.

9. Conclusion

In this paper we studied the generalization capabilities of online learning algorithms for pairwise loss functions from several different perspectives. Using the method of Symmetrization of Expectations, we first provided sharp online to batch conversion bounds for algorithms that offer all-pairs regret bounds. Our results for bounded and strongly convex loss functions closely match their first order counterparts. We also extended our analysis to algorithms that are only able to provide finite-buffer regret bounds, using which we were able to explain the good empirical performance of some existing algorithms. Finally we presented a new memory-efficient online learning algorithm that is able to provide all-pairs regret bounds in addition to performing well empirically.

Several interesting directions can be pursued for future work, foremost being the development of online learning algorithms that can guarantee sub-linear regret at constant buffer sizes or else a regret lower bound for finite buffer algorithms. Secondly, the idea of a stream-aware buffer update policy is especially interesting both from an empirical as well as theoretical point of view and would possibly require novel proof techniques for its analysis. Lastly, scalability issues that arise when working with higher order loss functions also pose an interesting challenge.

Acknowledgment

The authors thank the anonymous referees for comments that improved the presentation of the paper.
PK is supported by the Microsoft Corporation and Microsoft Research India under a Microsoft Research India Ph.D. fellowship award.
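
As an illustration of the distinction drawn before Theorem 8 between a buffer sampled without replacement and a buffer holding s i.i.d. samples from the preceding stream, the following Python sketch contrasts two update rules. It is a minimal sketch under stated assumptions, not a reproduction of Algorithm 1 (RS-x), which is not shown in this excerpt: the function names `reservoir_update`, `iid_buffer_update` and `run_policy` are hypothetical, and the i.i.d. variant is simply s independent one-slot reservoirs, which is one straightforward way to obtain i.i.d. uniform samples at every time step.

```python
import random


def reservoir_update(buffer, z, t, s, rng=random):
    """Classic reservoir sampling (Vitter, 1985): after t points the buffer
    is a uniform sample of size s drawn WITHOUT replacement."""
    if len(buffer) < s:
        buffer.append(z)                 # still filling the buffer
    elif rng.random() < s / t:
        buffer[rng.randrange(s)] = z     # evict a uniformly chosen slot


def iid_buffer_update(buffer, z, t, s, rng=random):
    """Illustrative with-replacement policy (an assumption of this sketch):
    every slot is an independent one-slot reservoir, so after t points each
    slot is uniform over z_1..z_t and the slots are mutually independent."""
    if t == 1:
        buffer[:] = [z] * s              # only one point seen so far
        return
    for i in range(s):
        if rng.random() < 1.0 / t:       # independent coin flip per slot
            buffer[i] = z


def run_policy(update, stream, s, rng):
    """Feed a stream through an update rule and return the final buffer."""
    buffer = []
    for t, z in enumerate(stream, start=1):
        update(buffer, z, t, s, rng)
    return buffer


if __name__ == "__main__":
    rng = random.Random(0)
    stream = list(range(1, 101))
    print("without replacement:", run_policy(reservoir_update, stream, 5, rng))
    print("i.i.d. slots:       ", run_policy(iid_buffer_update, stream, 5, rng))
```

The without-replacement buffer never contains the same stream position twice, whereas the i.i.d. buffer may, which is the price paid for independence between slots.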
References

Agarwal, Shivani and Niyogi, Partha. Generalization Bounds for Ranking Algorithms via Algorithmic Stability. JMLR, 10:441–474.
Balcan, Maria-Florina and Blum, Avrim. On a Theory of Learning with Similarity Functions. In ICML.
Bellet, Aurélien, Habrard, Amaury, and Sebban, Marc. Similarity Learning for Provably Accurate Sparse Linear Classification. In ICML, 2012.
Brefeld, Ulf and Scheffer, Tobias. AUC Maximizing Support Vector Learning. In ICML Workshop on ROC Analysis in Machine Learning.
Cao, Qiong, Guo, Zheng-Chu, and Ying, Yiming. Generalization Bounds for Metric and Similarity Learning, 2012. arXiv.
Cesa-Bianchi, Nicoló and Gentile, Claudio. Improved Risk Tail Bounds for On-Line Algorithms. IEEE Trans. on Inf. Theory, 54(1).
Cesa-Bianchi, Nicoló, Conconi, Alex, and Gentile, Claudio. On the Generalization Ability of On-Line Learning Algorithms. In NIPS, 2001.
Clémençon, Stéphan, Lugosi, Gábor, and Vayatis, Nicolas. Ranking and empirical minimization of U-statistics. Annals of Statistics, 36.
Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Generalization Bounds for Learning Kernels. In ICML, 2010a.
Cortes, Corinna, Mohri, Mehryar, and Rostamizadeh, Afshin. Two-Stage Learning Kernel Algorithms. In ICML, 2010b.
Cristianini, Nello, Shawe-Taylor, John, Elisseeff, André, and Kandola, Jaz S. On Kernel-Target Alignment. In NIPS, 2001.
Freedman, David A. On Tail Probabilities for Martingales. Annals of Probability, 3(1):100–118, 1975.
Hazan, Elad, Kalai, Adam, Kale, Satyen, and Agarwal, Amit. Logarithmic Regret Algorithms for Online Convex Optimization. In COLT.
Jin, Rong, Wang, Shijun, and Zhou, Yang. Regularized Distance Metric Learning: Theory and Algorithm. In NIPS, 2009.
Kakade, Sham M. and Tewari, Ambuj. On the Generalization Ability of Online Strongly Convex Programming Algorithms. In NIPS.
Kakade, Sham M., Sridharan, Karthik, and Tewari, Ambuj. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization. In NIPS.
Kakade, Sham M., Shalev-Shwartz, Shai, and Tewari, Ambuj. Regularization Techniques for Learning with Matrices. JMLR, 13, 2012.
Kumar, Abhishek, Niculescu-Mizil, Alexandru, Kavukcuoglu, Koray, and Daumé III, Hal. A Binary Classification Framework for Two-Stage Multiple Kernel Learning. In ICML, 2012.
Ledoux, Michel and Talagrand, Michel. Probability in Banach Spaces: Isoperimetry and Processes. Springer.
Sridharan, Karthik, Shalev-Shwartz, Shai, and Srebro, Nathan. Fast Rates for Regularized Objectives. In NIPS.
Steinwart, Ingo and Christmann, Andreas. Support Vector Machines. Information Science and Statistics. Springer.
Vitter, Jeffrey Scott. Random Sampling with a Reservoir. ACM Trans. on Math. Soft., 11(1):37–57, 1985.
Wang, Yuyang, Khardon, Roni, Pechyony, Dmitry, and Jones, Rosie. Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions. JMLR - Proceedings Track, 23, 2012.
Wang, Yuyang, Khardon, Roni, Pechyony, Dmitry, and Jones, Rosie. Online Learning with Pairwise Loss Functions, 2013. arXiv.
Xing, Eric P., Ng, Andrew Y., Jordan, Michael I., and Russell, Stuart J. Distance Metric Learning with Application to Clustering with Side-Information. In NIPS.
Zhao, Peilin, Hoi, Steven C. H., Jin, Rong, and Yang, Tianbao. Online AUC Maximization. In ICML, 2011.
Zinkevich, Martin. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, 2003.
