(Almost) No Label No Cry

Size: px
Start display at page:

Download "(Almost) No Label No Cry"

Transcription

1 (Almost) No Label No Cry Gorgo Patrn,, Rchard Nock,, Paul Rvera,, Tbero Caetano,3,4 Australan Natonal Unversty, NICTA, Unversty of New South Wales 3, Ambata 4 Sydney, NSW, Australa {namesurname}@anueduau Abstract In Learnng wth Label Proportons (LLP), the objectve s to learn a supervsed classfer when, nstead of labels, only label proportons for bags of observatons are known Ths settng has broad practcal relevance, n partcular for prvacy preservng data processng We frst show that the mean operator, a statstc whch aggregates all labels, s mnmally suffcent for the mnmzaton of many proper scorng losses wth lnear (or kernelzed) classfers wthout usng labels We provde a fast learnng algorthm that estmates the mean operator va a manfold regularzer wth guaranteed approxmaton bounds Then, we present an teratve learnng algorthm that uses ths as ntalzaton We ground ths algorthm n Rademacher-style generalzaton bounds that ft the LLP settng, ntroducng a generalzaton of Rademacher complexty and a Label Proporton Complexty measure Ths latter algorthm optmzes tractable bounds for the correspondng bag-emprcal rsk Experments are provded on fourteen domans, whose sze ranges up to 300K observatons They dsplay that our algorthms are scalable and tend to consstently outperform the state of the art n LLP Moreover, n many cases, our algorthms compete wth or are just percents of AUC away from the Oracle that learns knowng all labels On the largest domans, half a dozen proportons can suffce, e roughly 40K tmes less than the total number of labels Introducton Machne learnng has recently experenced a prolferaton of problem settngs that, to some extent, enrch the classcal dchotomy between supervsed and unsupervsed learnng Cases as multple nstance labels, nosy labels, partal labels as well as sem-supervsed learnng have been studed motvated by applcatons where fully supervsed learnng s no longer realstc In the present work, we are nterested n learnng a bnary classfer from nformaton provded at the level of groups of nstances, called bags The type of nformaton we assume avalable s the label proportons per bag, ndcatng the fracton of postve bnary labels of ts nstances Inspred by [], we refer to ths framework as Learnng wth Label Proportons (LLP) Settngs that perform a bag-wse aggregaton of labels nclude Multple Instance Learnng (MIL) [] In MIL, the aggregaton s logcal rather than statstcal: each bag s provded wth a bnary label expressng an OR condton on all the labels contaned n the bag More general settng also exst [3] [4] [5] Many practcal scenaros ft the LLP abstracton (a) Only aggregated labels can be obtaned due to the physcal lmts of measurement tools [6] [7] [8] [9] (b) The problem s sem- or unsupervsed but doman experts have knowledge about the unlabelled samples n form of expectaton, as pseudomeasurement [5] (c) Labels exsted once but they are now gven n an aggregated fashon for prvacy-preservng reasons, as n medcal databases [0], fraud detecton [], house prce market, electon results, census data, etc (d) Ths settng also arses n computer vson [] [3] [4] Related work The settng was frst ntroduced by [], where a prncpled herarchcal model generates labels consstent wth the proportons and s traned through MCMC Subsequently, [9] and ts follower [6] offer a varety of standard learnng algorthms desgned to generate self-consstent

2 labels [5] gves a Bayesan nterpretaton of LLP where the key dstrbuton s estmated through an RBM Other deas rely on structural learnng of Bayesan networks wth mssng data [7], and on K- MEANS clusterng to solve prelmnary label assgnment [3] [8] Recent SVM mplementatons [] [6] outperform most of the other known methods Theoretcal works on LLP belong to two man categores The frst contans unform convergence results, for the estmators of label proportons [], or the estmator of the mean operator [7] The second contans approxmaton results for the classfer [7] Our work bulds upon ther Mean Map algorthm, that reles on the trck that the logstc loss may be splt n two, a convex part dependng only on the observatons, and a lnear part nvolvng a suffcent statstc for the label, the mean operator Beng able to estmate the mean operator means beng able to ft a classfer wthout usng labels In [7], ths estmaton reles on a restrctve homogenety assumpton that the class-condtonal estmaton of features does not depend on the bags Experments dsplay the lmts of ths assumpton [][6] Contrbutons In ths paper we consder lnear classfers, but our results hold for kernelzed formulatons followng [7] We frst show that the trck about the logstc loss can be generalzed, and the mean operator s actually mnmally suffcent for a wde set of symmetrc proper scorng losses wth no class-dependent msclassfcaton cost, that encompass the logstc, square and Matsushta losses [8] We then provde an algorthm, LMM, whch estmates the mean operator va a Laplacan-based manfold regularzer wthout callng to the homogenety assumpton We show that under a weak dstngushablty assumpton between bags, our estmaton of the mean operator s all the better as the observatons norm ncrease Ths, as we show, cannot hold for the Mean Map estmator Then, we provde a data-dependent approxmaton bound for our classfer wth respect to the optmal classfer, that s shown to be better than prevous bounds [7] We also show that the manfold regularzer s soluton s tghtly related to the lnear separablty of the bags We then provde an teratve algorthm, AMM, that takes as nput the soluton of LMM and optmzes t further over the set of consstent labelngs We ground the algorthm n a unform convergence result nvolvng a generalzaton of Rademacher complextes for the LLP settng The bound nvolves a bag-emprcal surrogate rsk for whch we show that AMM optmzes tractable bounds All our theoretcal results hold for any symmetrc proper scorng loss Experments are provded on fourteen domans, rangng from hundreds to hundreds of thousands of examples, comparng AMM and LMM to ther contenders: Mean Map, InvCal [] and SVM [6] They dsplay that AMM and LMM outperform ther contenders, and sometmes even compete wth the fully supervsed learner whle requrng few proportons only Tests on the largest domans dsplay the scalablty of both algorthms Such expermental evdence serously questons the safety of prvacy-preservng summarzaton of data, whenever accurate aggregates and nformatve ndvdual features are avalable Secton () presents our algorthms and related theoretcal results Secton (3) presents experments Secton (4) concludes A Supplementary Materal [9] ncludes proofs and addtonal experments LLP and the mean operator: theoretcal results and algorthms Learnng settng Hereafter, boldfaces lke p denote vectors, whose coordnates are denoted p l for l,, For any m N, let [m] {,,, m} Let Σ m {σ {, } m } and X R d Examples are couples (observaton, label) X Σ, sampled d accordng to some unknown but fxed dstrbuton D Let S {(x, y ), 
[m]} D m denote a sze-m sample In Learnng wth Label Proportons (LLP), we do not observe drectly S but S y, whch denotes S wth labels removed; we are gven ts partton n n > 0 bags, S y j S j, j [n], along wth ther respectve label proportons ˆπ j ˆP[y + S j ] and bag proportons ˆp j m j /m wth m j card(s j ) (Ths generalzes to a cover of S, by copyng examples among bags) The bag assgnment functon that parttons S s unknown but fxed In real world domans, t would rather be known, eg state, gender, age band A classfer s a functon h : X R, from a set of classfers H H L denotes the set of lnear classfers, noted h θ (x) θ x wth θ X A (surrogate) loss s a functon F : R R + We let F (S, h) (/m) F (y h(x )) denote the emprcal surrogate rsk on S correspondng to loss F For the sake of clarty, ndexes, j and k respectvely refer to examples, bags and features The mean operator and ts mnmal suffcency µ S m We defne the (emprcal) mean operator as: y x ()

3 Algorthm Laplacan Mean Map (LMM) Input S j, ˆπ j, j [n]; γ > 0 (7); w (7); V (8); permssble φ (); λ > 0; Step : let B± arg mn X R n d l(l, X) usng (7) (Lemma ) Step : let µ S j ˆp j(ˆπ j b+ j ( ˆπ j) b j ) Step 3 : let θ arg mn θ F φ (S y, θ, µ S ) + λ θ (3) Return θ Table : Correspondence between permssble functons φ and the correspondng loss F φ loss name F φ (x) φ(x) logstc loss log( + exp( x)) x log x ( x) log( x) square loss ( x) x( x) Matsushta loss x + + x x( x) The estmaton of the mean operator µ S appears to be a learnng bottleneck n the LLP settng [7] The fact that the mean operator s suffcent to learn a classfer wthout the label nformaton motvates the noton of mnmal suffcent statstc for features n ths context Let F be a set of loss functons, H be a set of classfers, I be a subset of features Some quantty t(s) s sad to be a mnmal suffcent statstc for I wth respect to F and H ff: for any F F, any h H and any two samples S and S, the quantty F (S, h) F (S, h) does not depend on I ff t(s) t(s ) Ths defnton can be motvated from the one n statstcs by buldng losses from log lkelhoods The followng Lemma motvates further the mean operator n the LLP settng, as t s the mnmal suffcent statstc for a broad set of proper scorng losses that encompass the logstc and square losses [8] The proper scorng losses we consder, hereafter called symmetrc (SPSL), are twce dfferentable, non-negatve and such that msclassfcaton cost s not label-dependent Lemma µ S s a mnmal suffcent statstc for the label varable, wth respect to SPSL and H L ([9], Subsecton ) Ths property, very useful for LLP, may also be exploted n other weakly supervsed tasks [] Up to constant scalngs that play no role n ts mnmzaton, the emprcal surrogate rsk correspondng to any SPSL, F φ (S, h), can be wrtten wth loss: F φ (x) φ(0) + φ ( x) a φ + φ ( x), () φ(0) φ(/) b φ and φ s a permssble functon [0, 8], e dom(φ) [0, ], φ s strctly convex, dfferentable and symmetrc wth respect to / φ s the convex conjugate of φ Table shows examples of F φ It follows from Lemma and ts proof, that any F φ (Sθ), can be wrtten for any θ h θ H L as: ( ) F φ (S, θ) b φ F φ (σθ x ) m θ µ S F φ (S y, θ, µ S ), (3) where σ Σ σ The Laplacan Mean Map (LMM) algorthm The sum n eq (3) s convex and dfferentable n θ Hence, once we have an accurate estmator of µ S, we can then easly ft θ to mnmze F φ (S y, θ, µ S ) Ths two-steps strategy s mplemented n LMM n algorthm µ S can be retreved from n bag-wse, label-wse unknown averages b σ j : n µ S (/) ˆp j j σ Σ (ˆπ j + σ( σ))b σ j, (4) wth b σ j E S [x σ, j] denotng these n unknowns (for j [n], σ Σ ), and let b j (/m j ) x S j x The n b σ j s are soluton of a set of n denttes that are (n matrx form): B Π B ± 0, (5) 3

4 where B [b b b n ] R n d, Π [DIAG(ˆπ) DIAG( ˆπ)] R n n and B ± R n d s the matrx of unknowns: [ ] B ± b + b + b + n b - b - b - n (6) } {{ } } {{ } (B + ) (B ) System (5) s underdetermned, unless one makes the homogenety assumpton that yelds the Mean Map estmator [7] Rather than makng such a restrctve assumpton, we regularze the cost that brngs (5) wth a manfold regularzer [], and search for B± arg mn X R n d l(l, X), wth: l(l, X) tr ( (B X Π)D w (B Π X) ) + γtr ( X ) LX, (7) and γ > 0 D w DIAG(w) s a user-fxed bas matrx wth w R n +, (and w ˆp n general) and: [ ] La 0 L εi + R 0 n n, (8) L a where L a D V R n n s the Laplacan of the bag smlartes V s a symmetrc smlarty matrx wth non negatve coordnates, and the dagonal matrx D satsfes d jj j v jj, j [n] The sze of the Laplacan s O(n ), whch s small compared to O(m ) f there are not many bags One can nterpret the Laplacan regularzaton as smoothng the estmates of b σ j wrt the smlarty of the respectve bags Lemma The soluton B± to mn X R n d l(l, X) s B± ( ΠD w Π + γl ) ΠDw B ([9], Subsecton ) Ths Lemma explans the role of penalty εi n (8) as ΠD w Π and L have respectvely n- and ( )-dm null spaces, so the nverson may not be possble Even when ths does not happen exactly, ths may ncur numercal nstabltes n computng the nverse For domans where ths rsk exsts, pckng a small ε > 0 solves the problem Let b σ j denote the row-wse decomposton of B± followng (6), from whch we compute µ S followng (4) when we use these n estmates n leu of the true b σ j We compare µ j ˆπ j b + j ( ˆπ j)b j, j [n] to our estmates µ j ˆπ j b+ j ( ˆπ j) b j, j [n], granted that µ S j ˆp jµ j and µ S j ˆp j µ j Theorem 3 Suppose that γ satsfes γ ((ε(n) ) + max j j v jj )/ mn j w j Let M [µ µ µ n ] R n d, M [ µ µ µ n ] R n d and ς(v, B ± ) ((ε(n) ) + max j j v jj ) B ± F The followng holds: M M F ( ) n mn wj ς(v, B ± ) (9) j ([9], Subsecton 3) The multplcatve factor to ς n (9) s roughly O(n 5/ ) when there s no large dscrepancy n the bas matrx D w, so the upperbound s drven by ς(, ) when there are not many bags We have studed ts varatons when the dstngushablty between bags ncreases Ths settng s nterestng because n ths case we may kll two brds n one shot, wth the estmaton of M and the subsequent learnng problem potentally easer, n partcular for lnear separators We consder two examples for v jj, the frst beng (half) the normalzed assocaton []: v nc jj ( ASSOC(Sj, S j ) ASSOC(S j, S j S j ) + ASSOC(S j, S j ) ASSOC(S j, S j S j ) ) NASSOC(S j, S j ), (0) v G,s jj exp( b j b j /s), s > 0 () Here, ASSOC(S j, S j ) x S j,x S x x j [] To put these two smlarty measures n the context of Theorem 3, consder the settng where we can make assumpton (D) that there exsts a small constant κ > 0 such that b j b j κ max σ,j b σ j, j, j [n] Ths s a weak dstngushablty property as f no such κ exsts, then the centers of dstnct bags may just be confounded Consder also the addtonal assumpton, (D), that there exsts κ > 0 such that max j d j κ, j [n], where d j max x,x x Sj x s a bag s dameter In the followng Lemma, the lttle-oh notaton s wth respect to the largest unknown n eq (4), e max σ,j b σ j 4

5 Algorthm Alternatng Mean Map (AMM OPT ) Input LMM parameters + optmzaton strategy OPT {mn, max} + convergence predcate PR Step : let θ 0 LMM(LMM parameters) and t 0 Step : repeat Step : let σ t arg OPT σ Σ ˆπ F φ (S y, θ t, µ S (σ)) Step : let θ t+ arg mn θ F φ (S y, θ, µ S (σ t )) + λ θ Step 3 : let t t + untl predcate PR s true Return θ arg mn t F φ (S y, θ t+, µ S (σ t )) Lemma 4 There exsts ε > 0 such that ε ε, the followng holds: () ς(v nc, B ± ) o() under assumptons (D + D); () ς(v G,s, B ± ) o() under assumpton (D), s > 0 ([9], Subsecton 4) Hence, provded a weak (D) or stronger (D+D) dstngushablty assumpton holds, the dvergence between M and M gets smaller wth the ncrease of the norm of the unknowns b σ j The proof of the Lemma suggests that the convergence may be faster for VG,s The followng Lemma shows that both smlartes also partally encode the hardness of solvng the classfcaton problem wth lnear separators, so that the manfold regularzer lmts the dstorton of the b ± s between two bags that tend not to be lnearly separable Lemma 5 Take v jj {v G, jj, vnc jj } There exsts 0 < κ l < κ n < such that () f v jj > κ n then S j, S j are not lnearly separable, and f v jj < κ l then S j, S j are lnearly separable ([9], Subsecton 5) Ths Lemma s an advocacy to ft s n a data-dependent way n v G,s jj The queston may be rased as to whether fnte samples approxmaton results lke Theorem 3 can be proven for the Mean Map estmator [7] [9], Subsecton 6 answers by the negatve In the Laplacan Mean Map algorthm (LMM, Algorthm ), Steps and have now been descrbed Step 3 s a dfferentable convex mnmzaton problem for θ that does not use the labels, so t does not present any techncal dffculty An nterestng queston s how much our classfer θ n Step 3 dverges from the one that would be computed wth the true expresson for µ S, θ It s not hard to show that Lemma 7 n Altun and Smola [3], and Corollary 9 n Quadranto et al [7] hold for LMM so that θ θ (λ) µ S µ S The followng Theorem shows a data-dependent approxmaton bound that can be sgnfcantly better, when t holds that θ x, θ x φ ([0, ]), (φ s the frst dervatve) We call ths settng proper scorng complance (PSC) [8] PSC always holds for the logstc and Matsushta losses for whch φ ([0, ]) R For other losses lke the square loss for whch φ ([0, ]) [, ], shrnkng the observatons n a ball of suffcently small radus s suffcent to ensure ths Theorem 6 Let f k R m denote the vector encodng the k th feature varable n S : f k x k (k [d]) Let F denote the feature matrx wth column-wse normalzed feature vectors: fk (d/ k f k ) (d )/(d) f k Under PSC, we have θ θ (λ + q) µ S µ S, wth: q det F F m e b φ φ (φ (q /λ)) (> 0), () for some q I [±(x + max{ µ S, µ S })] Here, x max x and φ (φ ) ([9], Subsecton 7) To see how large q can be, consder the smple case where all egenvalues of F F, λk ( F F) [λ ± δ] for small δ In ths case, q s proportonal to the average feature norm : det F F tr ( ) F F + o(δ) x + o(δ) m md md 5

6 The Alternatng Mean Map (AMM) algorthm Let us denote Σˆπ {σ Σ m : :x S j σ (ˆπ j )m j, j [n]} the set of labelngs that are consstent wth the observed proportons ˆπ, and µ S (σ) (/m) σ x the based mean operator computed from some σ Σˆπ Notce that the true mean operator µ S µ S (σ) for at least one σ Σˆπ The Alternatng Mean Map algorthm, (AMM, Algorthm ), starts wth the output of LMM and then optmzes t further over the set of consstent labelngs At each teraton, t frst pcks a consstent labelng n Σˆπ that s the best (OPT mn) or the worst (OPT max) for the current classfer (Step ) and then fts a classfer θ on the gven set of labels (Step ) The algorthm then terates untl a convergence predcate s met, whch tests whether the dfference between two values for F φ (,, ) s too small (AMM mn ), or the number of teratons exceeds a user-specfed lmt (AMM max ) The classfer returned θ s the best n the sequence In the case of AMM mn, t s the last of the sequence as rsk F φ (S y,, ) cannot ncrease Agan, Step s a convex mnmzaton wth no techncal dffculty Step s combnatoral It can be solved n tme almost lnear n m [9] (Subsecton 8) Lemma 7 The runnng tme of Step n AMM s Õ(m), where the tlde notaton hdes log-terms Bag-Rademacher generalzaton bounds for LLP We relate the mn and max strateges of AMM by unform convergence bounds nvolvng the true surrogate rsk, e ntegratng the unknown dstrbuton D and the true labels (whch we may never know) Prevous unform convergence bounds for LLP focus on coarser graned problems, lke the estmaton of label proportons [] We rely on a LLP generalzaton of Rademacher complexty [4, 5] Let F : R R + be a loss functon and H a set of classfers The bag emprcal Rademacher complexty of sample S, Rm, b s defned as Rm b E σ Σm sup h H {E σ Σ ˆπ E S [σ(x)f (σ (x)h(x))] The usual emprcal Rademacher complexty equals Rm b for card(σˆπ ) The Label Proporton Complexty of H s: L m E Dm E I /,I / sup E S [σ (x)(ˆπ s (x) ˆπl (x))h(x)] (3) h H Here, each of I / l, l, s a random (unformly) subset of [m] of cardnal m Let S(I/ l ) be the sze-m subset of S that corresponds to the ndexes Take l, and any x S If I / l then ˆπ l s (x ) ˆπ l l (x ) s x s bag s label proporton measured on S\S(I / l ) Else, ˆπs (x ) s ts bag s label proporton measured on S(I / ) and ˆπl (x ) s ts label (e a bag s label proporton that would contan only x ) Fnally, σ (x) x S(I / ) Σ L m tends to be all the smaller as classfers n H have small magntude on bags whose label proporton s close to / Theorem 8 Suppose h 0 st h(x) h, x, h Then, for any loss F φ, any tranng sample of sze m and any 0 < δ, wth probablty > δ, the followng bound holds over all h H: ( ) E D [F φ (yh(x))] E Σ ˆπ E S [F φ (σ(x)h(x))] + Rm b h + L m b φ m log δ (4) Furthermore, under PSC (Theorem 6), we have for any F φ : Rm b b φ E Σm sup {E S [σ(x)(ˆπ(x) (/))h(x)]} (5) h H ([9], Subsecton 9) Despte smlar shapes (3) (5), R b m and L m behave dfferently: when bags are pure (ˆπ j {0, }, j), L m 0 When bags are mpure (ˆπ j /, j), R b m 0 As bags get mpure, the bag-emprcal surrogate rsk, E Σ ˆπ E S [F φ (σ(x)h(x))], also tends to ncrease AMM mn and AMM max respectvely mnmze a lowerbound and an upperbound of ths rsk 3 Experments Algorthms We compare LMM, AMM (F φ logstc loss) to the orgnal MM [7], InvCal [], conv- SVM and alter- SVM [6] (lnear kernels) To make experments extensve, we test several ntalzatons for AMM that are not dsplayed n Algorthm (Step ): () the edge mean map estmator, µ S EMM /m ( y )( x ) (AMM EMM ), () the constant estmator µ S (AMM ), and fnally AMM 
0ran whch runs 0 random ntal models ( θ 0 ), and selects the one wth smallest rsk; 6

7 AUC rel to MM 3 0 MM LMM G LMM G,s LMM nc 4 6 dvergence (a) AUC rel to Oracle MM LMM G LMM G,s LMM nc (b) AUC rel to Oracle AMM MM AMM G AMM G,s AMM nc AMM 0ran (c) AUC Oracle AMM G Bgger domans Small domans 0^ 5 0^ 3 0^ #bags/#nstance (d) Fgure : Relatve AUC (wrt MM) as homogenety assumpton s volated (a) Relatve AUC (wrt Oracle) vs on heart for LMM(b), AMM mn (c) AUC vs n/m for AMM mn G and the Oracle (d) Table : Small domans results #wn/#lose for row vs column Bold faces means p-val < 00 for Wlcoxon sgned-rank tests Top-left subtable s for one-shot methods, bottom-rght teratve ones, bottom-left compare the two Italc s state-of-the-art Grey cells hghlght the best of all (AMM mn G ) LMM algorthm MM LMM InvCal AMM mn AMM max conv- G G,s nc MM G G,s 0ran MM G G,s 0ran SVM AMM mn AMM max SVM G 36/4 G,s 38/3 30/6 nc 8/ 3/37 /37 InvCal 4/46 3/47 4/46 4/46 MM 33/6 6/4 5/5 3/8 46/4 G 38/ 35/4 30/0 37/3 47/3 3/7 G,s 35/4 33/7 30/0 35/5 47/3 4/ 7/5 eg AMM mn G,s wns on AMMmn G 7 tmes, loses 5, wth 8 tes 0ran 7/ 4/6 /8 6/4 44/6 0/30 6/34 9/3 MM 5/5 3/7 /8 5/5 45/5 5/35 3/37 3/37 8/4 G 7/3 /8 /8 6/4 45/5 7/33 4/36 4/36 0/40 3/4 G,s 5/5 /9 /8 4/6 45/5 5/35 3/37 3/37 /38 5/ 6/ 0ran 3/7 /9 9/3 4/6 50/0 9/3 5/35 7/33 7/43 9/30 0/9 7/3 conv- /9 /48 /48 /48 /48 4/46 3/47 3/47 4/46 3/47 3/47 4/46 0/50 alter- 0/50 0/50 0/50 0/50 0/30 0/50 0/50 0/50 3/47 3/47 /48 /49 0/50 7/3 ths s the same procedure of alter- SVM Matrx V (eqs (0), ()) used s ndcated n subscrpt: LMM/AMM G, LMM/AMM G,s, LMM/AMM nc respectvely denote v G,s wth s, v G,s wth s learned on cross valdaton (CV; valdaton ranges ndcated n [9]) and v nc For space reasons, results not dsplayed n the paper can be found n [9], Secton 3 (ncludng runtme comparsons, and detaled results by doman) We splt the algorthms n two groups, one-shot and teratve The latter, ncludng AMM, (conv/alter)- SVM, teratvely optmze a cost over labelngs (always consstent wth label proportons for AMM, not always for (conv/alter)- SVM) The former (LMM, InvCal) do not and are thus much faster Tests are done on a 4-core 3GHz CPUs Mac wth 3GB of RAM AMM/LMM/MM are mplemented n R Code for InvCal and SVM s [6] Smulated domans, MM and the homogenety assumpton The testng metrc s the AUC Pror to testng on our domans, we generate 6 domans that gradually move away the b σ j away from each other (wrt j), thus volatng ncreasngly the homogenety assumpton [7] The degree of volaton s measured as B ± B ± F, where B ± s the homogenety assumpton matrx, that replaces all b σ j by b σ for σ {, }, see eq (5) Fgure (a) dsplays the ratos of the AUC of LMM to the AUC of MM It shows that LMM s all the better wth respect to MM as the homogenety assumpton s volated Furthermore, learnng s n LMM mproves the results Experments on the smulated doman of [6] on whch MM obtans zero accuracy also dsplay that our algorthms perform better ( teraton only of AMM max brngs 00% AUC) Small and large domans experments We convert 0 small domans [9] (m 000) and 4 bgger ones (m > 8000) from UCI[6] nto the LLP framework We cast to one-aganst-all classfcaton when the problem s multclass On large domans, the bag assgnment functon s nspred by []: we craft bags accordng to a selected feature value, and then we remove that feature from the data Ths conforms to the dea that bag assgnment s structured and non random n real-world problems Most of our small domans, however, do not have a lot of features, so nstead of clusterng on one feature and then dscard t, we run K-MEANS on the whole data to make the bags, for K n [5] Small domans 
results We perform 5-folds nested CV comparsons on the 0 domans 50 AUC values for each algorthm Table synthesses the results [9], splttng one-shot and teratve algo- 7

8 Table 3: AUCs on bg domans (name: #nstances #features) Icap-shape, IIhabtat, IIIcap-colour, IVrace, Veducaton, VIcountry, VIIpoutcome, VIIIjob (number of bags); for each feature, the best result over one-shot, and over teratve algorthms s bold faced AMM mn AMM max algorthm mushroom: adult: marketng: 45 4 census: I(6) II(7) III(0) IV(5) V(6) VI(4) V(4) VII(4) VIII() IV(5) VIII(9) VI(4) EMM MM LMM G LMM G,s AMMEMM AMMMM AMM G AMM G,s AMM AMMEMM AMMMM AMM G AMM G,s AMM Oracle rthms LMM G,s outperforms all one-shot algorthms LMM G and LMM G,s are compettve wth many teratve algorthms, but lose aganst ther AMM counterpart, whch proves that addtonal optmzaton over labels s benefcal AMM G and AMM G,s are confrmed as the best varant of AMM, the frst beng the best n ths case Surprsngly, all mean map algorthms, even one-shots, are clearly superor to SVMs Further results [9] reveal that SVM performances are dampened by learnng classfers wth the nverted polarty e flppng the sgn of the classfer mproves ts performances Fgure (b, c) presents the AUC relatve to the Oracle (whch learns the classfer knowng all labels and mnmzng the logstc loss), as a functon of the Gn of bag assgnment, gn(s) 4E j [ˆπ j ( ˆπ j )] For an close to, we were expectng a drop n performances The unexpected [9] s that on some domans, large entropes ( 8) do not prevent AMM mn to compete wth the Oracle No such pattern clearly emerges for SVM and AMM max [9] Bg domans results We adopt a /5 hold-out method Scalablty results [9] dsplay that every method usng v nc and SVM are not scalable to bg domans; n partcular, the estmated tme for a sngle run of alter- SVM s >00 hours on the adult doman Table 3 presents the results on the bg domans, dstngushng the feature used for bag assgnment Bg domans confrm the effcency of LMM+AMM No approach clearly outperforms the rest, although LMM G,s s often the best one-shot Synthess Fgure (d) gves the AUCs of AMM mn G over the Oracle for all domans [9], as a functon of the degree of supervson, n/m ( f the problem s fully supervsed) Notceably, on 90% of the runs, AMM mn G gets an AUC representng at least 70% of the Oracle s Results on bg domans can be remarkable: on the census doman wth bag assgnment on race, 5 proportons are suffcent for an AUC 5 ponts below the Oracle s whch learns wth 00K labels 4 Concluson In ths paper, we have shown that effcent learnng n the LLP settng s possble, for general loss functons, va the mean operator and wthout resortng to the homogenety assumpton Through ts estmaton, the suffcency allows one to resort to standard learnng procedures for bnary classfcaton, practcally mplementng a reducton between machne learnng problems [7]; hence the mean operator estmaton may be a vable shortcut to tackle other weakly supervsed settngs [] [3] [4] [5] Approxmaton results and generalzaton bounds are provded Experments dsplay results that are superor to the state of the art, wth algorthms that scale to bg domans at affordable computatonal costs Performances sometmes compete wth the Oracle s that learns knowng all labels, even on bg domans Such expermental fndng poses severe mplcatons on the relablty of prvacy-preservng aggregaton technques wth smple group statstcs lke proportons Acknowledgments NICTA s funded by the Australan Government through the Department of Communcatons and the Australan Research Councl through the ICT Centre of Excellence Program G Patrn acknowledges that part of the research was conducted at the Commonwealth Bank of Australa We thank A Menon, D García-García, N de 
Fretas for nvaluable feedback, and FYu for help wth the code 8

9 References [] F X Yu, S Kumar, T Jebara, and S F Chang On learnng wth label proportons CoRR, abs/40590, 04 [] T G Detterch, R H Lathrop, and T Lozano-Pérez Solvng the multple nstance problem wth axsparallel rectangles Artfcal Intellgence, 89:3 7, 997 [3] G S Mann and A McCallum Generalzed expectaton crtera for sem-supervsed learnng of condtonal random felds In 46 th ACL, 008 [4] J Graça, K Ganchev, and B Taskar Expectaton maxmzaton and posteror constrants In NIPS*0, pages , 007 [5] P Lang, M I Jordan, and D Klen Learnng from measurements n exponental famles In 6 th ICML, pages , 009 [6] D J Muscant, J M Chrstensen, and J F Olson Supervsed learnng by tranng on aggregate outputs In 7 th ICDM, pages 5 6, 007 [7] J Hernández-González, I Inza, and J A Lozano Learnng bayesan network classfers from label proportons Pattern Recognton, 46(): , 03 [8] M Stolpe and K Mork Learnng from label proportons by optmzng cluster model selecton In 5 th ECMLPKDD, pages , 0 [9] B C Chen, L Chen, R Ramakrshnan, and D R Muscant Learnng from aggregate vews In th ICDE, pages 3 3, 006 [0] J Wojtusak, K Irvn, A Brerdnc, and A V Baranova Usng publshed medcal results and nonhomogenous data n rule learnng In 0 th ICMLA, pages 84 89, 0 [] S Rüpng Svm classfer estmaton from group probabltes In 7 th ICML, pages 9 98, 00 [] H Kueck and N de Fretas Learnng about ndvduals from group statstcs In th UAI, pages , 005 [3] S Chen, B Lu, M Qan, and C Zhang Kernel k-means based framework for aggregate outputs classfcaton In 9 th ICDMW, pages , 009 [4] K T La, F X Yu, M S Chen, and S F Chang Vdeo event detecton by nferrng temporal nstance labels In th CVPR, 04 [5] K Fan, H Zhang, S Yan, L Wang, W Zhang, and J Feng Learnng a generatve classfer from label proportons Neurocomputng, 39:47 55, 04 [6] F X Yu, D Lu, S Kumar, T Jebara, and S F Chang SVM for Learnng wth Label Proportons In 30 th ICML, pages 504 5, 03 [7] N Quadranto, A J Smola, T S Caetano, and Q V Le Estmatng labels from label proportons JMLR, 0: , 009 [8] R Nock and F Nelsen Bregman dvergences and surrogates for learnng IEEE TransPAMI, 3: , 009 [9] G Patrn, R Nock, P Rvera, and T S Caetano (Almost) no label no cry - supplementary materal In NIPS*7, 04 [0] M J Kearns and Y Mansour On the boostng ablty of top-down decson tree learnng algorthms In 8 th ACM STOC, pages , 996 [] M Belkn, P Nyog, and V Sndhwan Manfold regularzaton: A geometrc framework for learnng from labeled and unlabeled examples JMLR, 7: , 006 [] J Sh and J Malk Normalzed cuts and mage segmentaton IEEE TransPAMI, : , 000 [3] Y Altun and A J Smola Unfyng dvergence mnmzaton and statstcal nference va convex dualty In 9 th COLT, pages 39 53, 006 [4] P L Bartlett and S Mendelson Rademacher and gaussan complextes: Rsk bounds and structural results JMLR, 3:463 48, 00 [5] V Koltchnsk and D Panchenko Emprcal margn dstrbutons and boundng the generalzaton error of combned classfers Ann of Stat, 30: 50, 00 [6] K Bache and M Lchman UCI machne learnng repostory, 03 [7] A Beygelzmer, V Dan, T Hayes, J Langford, and B Zadrozny Error lmtng reductons between classfcaton tasks In th ICML, pages 49 56, 005 9

10 (Almost) No Label No Cry - Supplementary Materal Gorgo Patrn,, Rchard Nock,, Paul Rvera,, Tbero Caetano,3,4 Australan Natonal Unversty, NICTA, Unversty of New South Wales 3, Ambata 4 Sydney, NSW, Australa {namesurname}@anueduau Table of contents Supplementary materal on proofs Pg Proof of Lemma Pg Proof of Lemma Pg Proof of Theorem 3 Pg 3 Proof of Lemma 4 Pg 4 Proof of Lemma 5 Pg 6 Mean Map estmator s Lemma and Proof Pg 8 Proof of Theorem 6 Pg 9 Proof of Lemma 7 Pg 3 Proof of Theorem 8 Pg 3 Supplementary materal on experments Pg 7 Full Expermental Setup Pg 7 Smulated Doman for Volaton of Homogenety Assumpton Pg 8 Smulated Doman from [] Pg 8 Addtonal Tests on alter- SVM [] Pg 8 Scalablty Pg 9 Full Results on Small Domans Pg 9

11 Supplementary Materal on Proofs Proof of Lemma For any SPSL F (S, h), we can wrte t as ([], Lemma, [3]): F (S, h) F φ (S, h) D φ (y m φ (h(x ))), () where y ff y and 0 otherwse, φ s permssble and D φ s the Bregman dvergence wth generator φ [3] It also holds that: D φ (y φ (h(x ))) b φ F φ (yh(x)) wth: F φ (x) φ ( x) + φ(0) φ(0) φ(/) a φ + φ ( x), () b φ and φ s the convex conjugate of φ, e φ (x) xφ (x) φ(φ (x)) Furthermore, for any permssble φ, the conjex conjugate φ (x) verfes the property φ ( x) φ (x) x, (3) and so we get that: F (S, h) D φ (y m φ (h(x ))) b φ m b φ m b φ m b φ m b φ m b φ m F φ (y h(x )) ( F φ (y h(x )) + ) F φ (y h(x )) ( F φ (y h(x )) + ) F φ ( y h(x )) y h(x ) b φ F φ (yh(x )) y h(x ) m y {,+} ( ) F φ (σh(x )) h y x m σ {,+} σ {,+} F φ (σh(x )) h (µ S) (6) (4) holds because of (3), (5) holds because h s lnear So for any samples S and S wth respectve sze m and m, we have (agan usng the property that h s lnear): ( ) F (S, h) F (S, h) b φ F φ (σh(x )) m m F φ (σh(x )) x S x S σ {,+} whch yelds the statement of the Lemma Proof of Lemma Usng the fact that D w and L are symmetrc, we have: l(l, X) X + h (µ S µ S ), (7) X tr ( B D w Π ) X + X tr ( X ΠD w Π ) X + γ X tr ( X ) LX ΠD w B + ΠD w Π X + γlx 0, out of whch B± follows n Lemma (4) (5)

12 3 Proof of Theorem 3 We let Π o [DIAG(ˆπ) DIAG(ˆπ )] N an orthonormal system (n jj (ˆπ j +( ˆπ j) ) /, j [n] and 0 otherwse) Let K Πo be the n-dm subspace of R d generated by Π o The proof of Theorem (3) explots the followng Lemma, whch assumes that ε s any > 0 real for L n (8) (man fle) to be 0 When ε 0, the result of Theorem (3) stll holds but follows a dfferent proof Lemma Let A ΠD w Π and L defned as n (8) (man paper) Denote for short U ( L A + γ I ) (8) Suppose there exsts ξ > 0 such that for any x R n, the projecton of Ux n K Πo, x U,o, satsfes Then: Proof Combnng Lemma and (5), we get x U,o ξ x (9) M M F γξ B ± F (0) B ± B± Defne the followng permutaton matrx: C ( ) (A + γl) A I B ± ( (γl) A + I ) B ± () [ 0 I I 0 ] R n n () A ΠD w Π s not nvertble but dagonalsable Its (orthonormal) egenvectors can be parttoned n two matrces P o and P such that: We have: P o P [DIAG(ˆπ ) DIAG(ˆπ)] N CΠ o R n n (egenvalues 0), (3) ΠN R n n (egenvalues w j (ˆπ j + ( ˆπ j) ), j) (4) M M P o CB ± P o C B± P ( o C (γl) A + ) I B ± Π ( o (γl) A + ) I B ± (5) γπ ( o L A + γ ) I B ± (6) Eq (5) follows from the fact that C s dempotent Pluggng Frobenus norm n (6), we obtan M M F γ Π ( o L A + γ ) I B ± F γ d k Π o ( L A + γ I ) b ± k d γ ξ b ± k (7) k γ ξ B ± F, whch yelds (0) In (7), b ± k denotes column k n B± Ineq (7) makes use of assumpton (9) To ensure x U,o ξ x, t s suffcent that Ux ξ x, and snce Ux U F x, t s suffcent to show that, (8) U ξ F 3

13 wth U ξ L ξ A + ξγ I, for relevant choces of ξ We have let L ξ (/ξ)l Let 0 λ () λ n () denote the ordered egenvalues of a postve-semdefnte matrx n R n n It follows that, snce L s symmetrc postve defnte, we have λ j (L ξ A) λ j(a) λ n (L ξ ) ( 0), j [n] We have used eq (3) Weyl s Theorem then brngs: λ j (U ξ ) λ n (L ξ ) λ j (A) + ξγ λ n (L ξ ) { ξ γ f j [n] λ n(l ξ ) λ j(a) otherwse (9) Gershgorn s Theorem brngs λ n (/ξ)(ε + max j j l jj ), and furthermore the egenvalues of A satsfy λ j w j /, j n + We thus have: U ξ F nγ ξ ) 4n (ε + max j j l + jj ξ mn j wj (0) In (9) and (0), we have used the egenvalues of A gven n eqs (3) and (4) Assumng: γ ξ n, () a suffcent condton for the rght-hand sde of (0) to be s that ξ ε + max j j l jj n mn j w j () To fnsh up the proof, recall that L D V wth d jj j,j v jj and the coordnates v jj 0 Hence, l jj j j j v jj n max v jj, j [n] j j The proof s fnshed by pluggng ths upperbound n () to choose ξ, then takng the maxmal value for γ n () and fnally solvng the upperbound n (0) Ths ends the proof of Theorem 3 4 Proof of Lemma 4 We frst consder the normalzed assocaton crteron n (0): ASSOC(S j, S j ) vjj N ( ASSOC(Sj, S j ) ASSOC(S j, S j S j ) + x S j,x S j ASSOC(S ) j, S j ) ASSOC(S j, S j S j ) x x (3), 4

14 Remark that b j b j x x m j m j x S j x S j m x + j x S j m j x S j m + j m j m j x S j x x S j x + m j m j x S j,x S j x S j m j x S j x x x x x m j m j x S j m j m j x m j m j x S j,x S j x S j,x S j x x x S j x x x (4) + m j x m j m + m j x j m j m x x j m j m j x S j x S j x S j,x S j } {{ } a x x (5) m j m j x S j,x S j ASSOC(S j, S j ) (6) m j m j ( n ) ( Eq (4) explots the fact that j a n ) j n j a j and eq (5) explots the fact that a (m j m j ) x S j,x S x j x We thus have: ASSOC(S j, S j ) ASSOC(S j, S j S j ) ASSOC(S j, S j ) ASSOC(S j, S j ) + ASSOC(S j, S j ) ASSOC(S j, S j ) ASSOC(S j, S j ) + mjm j b j b j κ m j κ m j + mjm j b j b j + m j κ b j b j 5 (7) (8) (9)

15 Eq (7) uses (6) and eq (8) uses assumpton (D) Eq (8) also holds when permutng j and j, so we get: ( ) ς(v NC ε, B ± ) max j j n + + mj κ b j b j + + m j κ b j b j B ± F ( ) ε n + B ± mnj mj F + κ mn j,j b j b j ( ) ε n + B ± mnj mj F (30) + κ mn j,j b j b j ε n d max σ,j bσ j + 4κ d max σ,j b σ j mn j,j b j b j ε n d max 4κ d σ,j bσ j + κ max σ,j b σ j ) f (max NC σ,j bσ j o(), (3) where the last nequalty uses assumpton (D), and (30) uses the property that (a+b) a +b We have let f NC (x) ε n dx + 4κ d κx, (3) whch s ndeed o() f ε o(n / x) Ths proves the Lemma for ς(v NC, B ± ) The case of ς(v G,s, B ± ) s easer, as ( exp b ) ( j b j exp mn j,j b j b j ) s s ( exp κ ) s max σ,j bσ j, from assumpton (D) alone, whch gves ( ( ε ς(v G,s, B ± ) B ± F n + exp κ )) s max σ,j bσ j ( ( ε B ± F n + exp κ )) s max σ,j bσ j ( ( ε d max σ,j bσ j n + exp κ )) s max σ,j bσ j ) f (max G σ,j bσ j o(), (33) as clamed We have let f G (x) ε n dx+dx exp( κx/s), whch s ndeed o() f ε o(n / x) Remark that we shall have n general f G (x) f NC (x) and even f G (x) o(f NC (x)) f ε 0, so we may expect better convergence n the case of V G,s as max σ,j b σ j grows 5 Proof of Lemma 5 We frst restate the Lemma n a more explct way, that shall provde explct values for κ l and κ n Lemma There exst κ jj and s jj dependng on d j, d j, and κ jj > dependng on m j, m j, such that: 6

16 If v G,s jj jj > exp( /4) then S j, S j are not lnearly separable; If v G,s jj jj < exp( 64) then S j, S j are lnearly separable; If v NC jj If v NC jj > κ jj then S j, S j are not lnearly separable; < κ jj /κ jj then S j, S j are lnearly separable Proof We frst consder the normalzed assocaton crteron n (0), and we prove the Lemma for the followng expressons of κ jj and κ jj : κ jj d jj + d jj d j d j, (34) κ jj 5 max{m j, m j }, (35) wth d jj max{d j, d j } and d j max x,x S j x x, j j [n] For any bag S j, we let (b j, r j) MEB(S j ) denote the mnmum enclosng ball (MEB) for bag S j and dstance L, that s, r j s the smallest unque real such that!b j : d(x, b j ) x b j r j, x S j We have let d(x, b j ) x b j We are gong to prove a frst result nvolvng the MEBs of S j and S j, and then wll translate the result to the Lemma s statement The followng propertes follows from standard propertes of MEBs and the fact that d(, ) s a dstance (they hold for any j j ): (a) d(x, x ) r j, x, x S j ; (b) If bags S j and S j are lnearly separable, then x CO(S j ), x S j such that d(x, x ) max{r j, r j }; here, CO denotes the convex closure; (c) If bags S j and S j are lnearly separable, then d(b j, b j ) max{r j, r j }, where b j and b j are the bags average; (d) x S j, x S j st d(x, x ) r j ; (e) d(x, x ) max{r j, r j } + d(b j, b j ), x CO(S j), x CO(S j ) Let us defne ASSOC(S j, S j ) d (x, x ) (36) x S j,x S j We remark that, assumng that each bag contans at least two elements wthout loss of generalty: vjj NC + (37) + ASSOC(Bj,B j ) ASSOC(B j,b j) + ASSOC(Bj,B j ) ASSOC(B j,b j ) We have ASSOC(S j, S j ) 4m j rj and ASSOC(S j, S j ) 4m j r j (because of (a)), and also ASSOC(S j, S j ) max{m j, m j } max{rj, r j } when S j and S j are lnearly separable (because of (b)), whch yelds n ths case vjj NC + + max{mj,m j } max{r j,r j } m jrj + max{r j,r j } r j + + max{mj,m j } max{r j,r j } m j r j + max{r j,r j } r j (38) Let us name κ jj the rght-hand sde of (38) It follows that when vnc jj > κ jj, S j and S j are not lnearly separable 7

17 On the other hand, we have ASSOC(S j, S j ) m j rj and ASSOC(S j, S j ) m j r j (because of (d)), and also ASSOC(S j, S j ) m j m j ( max{r j, r j } + d(b j, b j )) m j m j (4 max{rj, rj } + d (b j, b j )), (39) because of (e) and the fact that (a + b) a + b It follows that j j : vjj NC + (40) + m j (4 max{r j,r j }+d (b j,b j )) + mj(4 max{r j,r j }+d (b j,b j )) rj r j For any j j, when d (b j, b j ) 4 max{r j, r j }, then we have from (40): vjj NC + + 6m j max{r j,r j } + 6mj max{r j,r j } rj r j > κ jj /(3 max{m j, m j }) (4) Hence, when vjj NC κ jj /(3 max{m j, m j }), t mples d(b j, b j ) > max{r j, r j }, mplyng d(b j, b j ) > r j + r j, whch s a suffcent condton for the lnear separablty of S j and S j So, we can relate the lnear separablty of S j and S j to the value of vjj NC wth respect to κ jj defned n (38) To remove the dependence n the MEB parameters and obtan the statement of the Lemma, we just have to remark that d j /4 r j 4d j, j [n], whch yelds κ jj /6 κ jj κ jj Hence, when vjj NC > κ jj, t follows that vnc jj > κ jj and S j and S j are not lnearly separable On the other hand, when vjj NC κ jj /(6 3 max{m j, m j }) κ jj /κ jj, then vjj NC κ jj /(3 max{m j, m j }) and the bags S j and S j are lnearly separable Ths acheves the proof of Lemma 5 for the normalzed assocaton crteron n (0) The proof for v G,s jj s shorter, and we prove t for s j,j max{d j, d j } (4) We have (/) max{d j, d j } max{r j, r j } max{d j, d j } Hence, because of (c) above, f S j and S j are lnearly separable, then v G,s jj /e/4 ; so, when v G,s jj > /e/4, the two bags are not lnearly separable On the other hand, f d(b j, b j ) max{r j, r j }, then because of (e) above d(b j, b j ) 4 max{r j, r j } 8 max{d j, d j }, and so v G,s jj /e64 Ths mples that f v G,s jj < /e64, then d(b j, b j ) > max{r j, r j } r j + r j, and thus the two bags are lnearly separable, as clamed Ths acheves the proof of Lemma Ths acheves the proof of Lemma 5 6 Mean Map estmator s Lemma and Proof It s not hard to check that the randomzed procedure that bulds µ S RAND yx for some random x S and y {, } guarantees O( + γ) approxmablty when some bags are close to the convex hull of S, for small γ > 0 Hence, the Mean Map estmaton of µ S can be very poor n that respect Lemma 3 For any γ > 0, the Mean Map estmator µ S MM µ S / max σ,j b σ j γ, even when (D + D) hold cannot guarantee µ MM S Proof Let x > 0, ɛ (0, ), p (0, ), p / We create a dataset from four observatons, {(x 0, ), (x 0, ), (x 3 x, ), (x 4 x, )} There are two bags, S takes ɛ of x and ɛ of x S takes ɛ of x 4 and ɛ of x 3 The label-wse estmators µ σ of [4] are soluton of ( [ ] [ ] ɛ ɛ ɛ ɛ [ µ µ ] ɛ ɛ ɛ [ ( ɛ)x ɛx ] ɛ 8 ɛ ] ) [ ɛ ɛ ɛ ɛ ] [ x 0 (43)

18 On the other hand, the true quanttes are: [ ] µ µ [ ( ɛ)x ɛx ] (44) We now mx classes n S and pck bag proportons q P S [S ] and q P S [S ] We have the class proportons defned by P S [y +] ɛq + ( ɛ)( q) p Then ( ) ( ) µ S µ S p( ɛ) ɛ x ( p)ɛ ɛ x ɛ p ɛ ɛ x ɛ( q)x (45) Furthermore, max b σ x We get µ S µ S max b σ ɛ( q) (46) Pckng ɛ and ( q) both > (γ/) s suffcent to have eq (46) > γ for any γ > 0 Remark that both assumptons (D) and (D) hold for any κ < and any κ > 0 7 Proof of Theorem 6 The proof of the Theorem nvolves two Lemmata, the frst of whch s of ndependent nterest and holds for any convex twce dfferentable functon F, and not just any F φ So, let us defne: ( ) b F (S y, θ, µ) F (σθ x ) m θ µ (47) where b s any fxed postve real Defne also the regularzed loss: F (S y, θ, µ, λ) F (S y, θ, µ) + λ θ (48) Let f k R m denote the vector encodng the k th varable n S : f k x k For any k [d], let ( d f k σ k f k denote a normalzaton of vectors f k n the sense that d f k ( d d k ( d k f k f k k ) d d fk (49) ) d ) d k f k (50) Let Ṽ collect all vectors f k n column and V collect all vectors f k n column Wthout loss of generalty, we assume V V 0, e V V postve defnte (e no feature s a lnear combnaton of the others), mplyng, because the columns of Ṽ are just postve rescalng of the columns of V, that Ṽ Ṽ 0 as well We use V nstead of F as n the man paper, n order not to counfound wth the general convex surrogate notaton F that we use here Lemma 4 Gven any two µ and µ, let θ and θ be the respectve mnmzers of F (S y,, µ, λ) and F (S y,, µ, λ) Suppose there exsts F > 0 such that surrogate F satsfes F (±(αθ + ( α)θ ) x ) F, α [0, ], [m] (5) Then the followng holds: θ θ λ + em F vol (Ṽ) µ µ, (5) where vol(ṽ) det Ṽ Ṽ denote the volume of the (row/column) system of Ṽ 9

19 Proof Our proof begns followng the same frst steps as the proof of Lemma 7 n [5], addng the steps that handle the lowerbound on F Consder the followng auxlary functon A F (τ ): A F (τ ) ( F (S y, θ, µ) F (S y, θ, µ ) ) (τ θ ) + λ τ θ, (53) where the gradent of F s computed wth respect to parameter θ The gradent of A F () s: The gradent of A F satsfes A F (τ ) F (S y, θ, µ) F (S y, θ, µ ) + λ(τ θ ), (54) A F (θ ) F (S y, θ, µ, λ) F (S y, θ, µ, λ) 0, (55) as both gradents n the rght are 0 because of the optmalty of θ and θ wth respect to F (S y,, µ, λ) and F (S y,, µ, λ) The Hessan H of A F s HA F (τ ) λi 0 and so A F s convex and s thus mnmal at τ θ Fnally, A F (θ ) 0 It comes thus A F (θ ) 0, whch yelds equvalently: 0 ( F (S y, θ, µ) F (S y, θ, µ ) ) (θ θ ) + λ θ θ ( ) b F (yθ x ) m µ b F (yθ x ) + m µ (θ θ ) y y +λ θ θ ( b F (yθ x ) ) F (yθ x ) (θ θ m ) y y } {{ } a (µ µ ) (θ θ ) + λ θ θ (56) Let us lowerbound a We have F (yθ x) yf (yθ x)x, and a Taylor expanson brngs that for any θ, θ, there exsts some α [0, ] such that, defnng we have: We thus get: a u α, y(αθ + ( α)θ ) x, (57) F (yθ x ) F (yθ x ) + y(θ θ ) x F (u α, ) (58) ( F (yθ x ) y y ( y ) F (yθ x ) (θ θ ) y(f (yθ x ) F (yθ x ))x ) (θ θ ) ( ) (θ θ ) x F (u α, )x (θ θ ) y ((θ θ ) x ) F (u α, ) F ((θ θ ) x ) (59) F (θ θ ) SS (θ θ ), (60) where matrx S R d m s formed by the observatons of S y n columns, and neq (59) comes from (5) Defne T (d/ x )SS Its trace satsfes tr (T) d Let λ d λ d λ > 0 0

20 denote egenvalues of T, wth λ strctly postve because SS V V 0 The AGH nequalty brngs: Multplyng both sde by λ and rearrangng yelds: d λ k ( ) d d λ k (6) d k ( ) d tr (T) λ d ( ) d d λ d ( ) d d (6) d λ ( ) d d det T (63) d Let λ > 0 denote the mnmal egenvalue of SS It satsfes λ ( x /d)λ and thus t comes from neq (63): ( ) d ( ) d d d λ d x det SS ( ) [ d ( ) ] d d d det d x SS ( ) d d det Ṽ Ṽ (64) d ( ) d d vol (Ṽ) (65) d e vol (Ṽ) (66) We have used notaton vol(ṽ) det Ṽ Ṽ Snce (θ θ ) SS (θ θ ) λ θ θ, combnng (60) wth (66) yelds the followng lowerbound on a: Gong back to (56), we get λ θ θ (µ µ ) (θ θ ) + a e F vol (Ṽ) θ θ (67) b em F vol (Ṽ) θ θ 0 Snce (µ µ ) (θ θ ) µ µ θ θ, we get after channg the nequaltes and solvng for θ θ : as clamed θ θ λ + em F vol (Ṽ) µ µ, The second Lemma s used to (5) when F (x) F φ Notce that we cannot rely on strong convexty arguments on F φ, as ths do not hold n general The Lemma s stated n a more general settng than for just F F φ

21 Lemma 5 Fx λ, b > 0, and let x max x Suppose that µ µ for some µ > 0 Let ( ) b F (S y, θ, µ, λ) F (σθ x ) m θ µ + λ θ, (68) and let θ arg mn θ F (S y, θ, µ, λ) Suppose that F () s L-Lpschtz Then σ θ blx + µ λ (69) Proof Let us defne a shrnkng of the optmal soluton θ, θ α αθ for α (0, ) We have ( ) b F (S y, θ α, µ, λ) F (σθα x ) m θ α µ + λ θ α σ ( ) b F (σαθ x ) α m θ µ + λα θ σ ( b F (σθ x ) + L ) σαθ m x σθ x + α θ µ σ +λα θ (70) ( ) b F (σθ bk( α) x ) + θ x α m m θ µ σ +λα θ, (7) where (70) holds because F s L-Lpschtz To have eq (7) smaller than F (S y, θ, µ, λ), we need equvalently: bl( α) θ x α m θ µ + λα θ θ µ + λ θ, that s: bl( α) m θ x + α θ µ λ( α ) θ, and to fnd an α (0, ) such that ths holds, because of Cauchy-Schwartz nequalty, t s suffcent that ( α)(blx + µ) λ( α ) θ, e: θ blx + µ λ( + α) Hence, whenever θ > (blx + µ )/λ, there s a shrnkng of the optmal soluton to eq (68) that further decreases the rsk, thus contradctng ts optmalty Ths ends the proof of Lemma 5 Notce that Lemma 5 does not requre F (x) to be convex, nor dfferentable To use ths Lemma, remark that for any F φ, F φ(x) b φ (φ ) ( x) b φ (φ ) ( x) [ /b φ, 0], (7) for any x φ ([0, ]) [], and thus F φ s /b φ -Lpschtz Fnally, consderng (5), for any α [0, ] ± (αθ + ( α)θ ) x (α θ + ( α) θ )x x + α µ + ( α) µ (73) λ x + max{ µ, µ }, (74) λ where neq (73) uses Lemma 5 wth b /K b φ µ and µ are the parameters of F (S y,, µ, λ) and F (S y,, µ, λ) n Lemma 4

22 Algorthm Label Assgnaton (LA) Input θ R d, a bag B {x R d,,,, m}, bag sze m + [m]; If B then stop Else f m + (m) then y I(m + m) I(m + 0),,,, m Else Step : arg max θ x Step : y sgn(θ x ) Step 3 : LA(θ, B\{x }, m + I(y )) Now, gong back to the parameters of Theorem 6, we make the change µ µ S and µ µ S and obtan the statement of the Theorem for nterval Ths acheves the proof of Theorem 6 I [±(x + max{ µ S, µ S })] (75) 8 Proof of Lemma 7 We make the proof for optmzaton strategy OPT mn The case OPT max flps the choce of the label n Step To mnmze F φ (S y, θ t, µ S (σ)) over σ Σˆπ, we just have to fnd σ arg max σ Σ ˆπ θ σ x, and we can do that bag-wse Algorthm presents the labelng (notaton (m) {,,, m }) Remark that the tme complexty for one bag s O(m j log m j ) due to the orderng (Step ), so the overall complexty s ndeed O(m max log m ) Lemma 6 Let σ {σ, σ,, σ m} be the set of labels obtaned after runnng LA(θ, S j, m + j ) for j,,, n Then σ arg max σ Σ ˆπ θ σ x Proof The total edge, θ σ x (for any σ Σˆπ ), can be summable bag-wse wrt the coordnates of σ Consder thus the optmal set {σ } B arg max σ {,} m : σm + m θ x σ B x, for some bag B {x,,,, m }, wth constrant m + [m ] Ths set contans the label assgnment σ returned by LA(θ, B, m + ), a property that follows from two smple observatons: P Consder any observaton x of bag B; for any optmal labelng σ of B, let m + m + I(σ ) Defne the set {σ } of optmal labelngs of B\{x } wth constrant m + m + I(σ ) Then ths set concdes wth the set created by takng the elements of {σ } B to whch we drop coordnate Ths follows from the per-observaton summablty of the total edge wrt labels P Assume m + (m ) arg max θ x, there exsts an optmal assgnment σ such that σ sgn(θ x ) Otherwse, startng from any optmal assgnment σ, we can flp the label of x and the label of any other x for whch σ σ, and get a label assgnment that satsfes constrant m + and cannot be worse than σ, and s thus optmal, a contradcton Hence, LA(θ, B, m + ) pcks at each teraton a label that matches one n a subset of optmal labelngs, and the recursve call preserves the subset of optmal labelngs Snce when m + (m) the soluton returned by LA(θ, B, m + ) s obvously optmal, we end up when the current B s empty wth σ arg max σ Σ ˆπ θ σ x, as clamed 9 Proof of Theorem 8 We prove separately Eqs (4) and (5) 3

23 9 Proof of eq (4) Notatons : unless explctly stated, all samples lke S and S are of sze m To make the readng of our expectatons clear and smple, we shall wrte E D for E (x,y) D, E Σm for E σ Σm, E S for E (x,y) S, E D m for E S D and E Dm for E S D We now proceed to the proof, that follows the same man steps as that of Theorem 5 n [6] For any q [0, ], let us defne the convex combnaton: F φ (q, h(x)) qf φ (h(x)) + ( q)f φ ( h(x)) (76) It follows that E Σ ˆπ E S [F φ (σ(x)h(x))] E S [F φ (ˆπ(x), h(x))], (77) wth ˆπ(x) the label proporton of the bag to whch x belongs n S We also have h, wth Λ(S) E D [F φ (yh(x))] E S [F φ (ˆπ(x), h(x))] + Λ(S), (78) sup g {E D [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} (79) Let us bound the devatons of Λ(S) around ts expectaton on the samplng of S, usng the ndependent bounded dfferences nequalty (IBDI, [7]) for whch we need to upperbound the maxmum dfference for the supremum term computed over two samples S and S of the same sze, such that S s S wth one example replaced We have: Λ(S) Λ(S ) E S [F φ (ˆπ(x), g(x))] E S [F φ (ˆπ (x), g(x))], (80) wth ˆπ and ˆπ denotng the correspondng label proportons n S and S Let {x } S\S and {x } S \S Let x S j and x S j for some bags j and j Upperbound (80) depends only on bags j and j For any x (S j S j )\{x, x }, eqs () and (3) brng: F φ (ˆπ(x), g(x)) F φ (ˆπ (x), g(x)) F φ(g(x)) F φ ( g(x)) m(x) g(x) b φ m(x) (8) h b φ m(x), (8) where m(x) s the sze of the bag to whch t belongs n S, plus ff t s bag j and j j, mnus ff t s bag j and j j Furthermore, () and (3) also brng: F φ (ˆπ(x), g(x)) F φ ( g(x) ) + b φ (( ˆπ(x)) g(x)>0 + ˆπ(x)( g(x)>0 )) g(x) F φ (0) + b φ (( ˆπ(x)) g(x)>0 + ˆπ(x)( g(x)>0 ))h Also, t comes from ts defnton that: We obtan that: Λ(S) Λ(S ) m F φ (0) + h b φ, x S F φ (0) b φ (0φ (0) φ(φ (0))) φ(/) b φ (83) ) ( + h + + h + b φ b φ m x (S j S j )\{x,x } h b φ m(x) Q m, (84) 4

24 where ( ) h Q + b φ So the IBDI yelds that wth probablty δ/ over the samplng of S, (85) Λ(S) E Dm sup {E D [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} + Q g m log δ, (86) We now upperbound the expectaton n (86) Usng the convexty of the supremum, we have E Dm sup {E D [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} g { E Dm sup ED m [F φ(yg(x))] E S [F φ (ˆπ(x), g(x))] } g E Dm,D sup {E m S [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} (87) g Consder any set S D m, and let I / [m] be a subset of m ndces, pcked unformly at random among all ( ) m m possble choces For any I [m], let S(I) denote the subset of examples whose ndex matches I, and for any x S(I), let ˆπ(x S(I)) denote ts bag proporton n S(I) For any I / l ndexed by l and any x S, let: ˆπ s l (x) { ˆπ(x S(I / l )) f x S(I / l ) ˆπ(x S\S(I / l )) otherwse (88) denote the label proportons nduced by the splt of S n two subsamples S(I / l ) and S\S(I/ l ) Let { ˆπ l l (x) y f x S(I / l ) ˆπ(x S\S(I / l )) otherwse, (89) where y s the true label of x Let σ l (x) x S(I / l ) The Label Proporton Complexty (LPC) L m quantfes the dscrepance between these two estmators When each bag n S has label proporton zero or one, each term factorng classfer h n eq (3) (man fle) s zero, so L m 0 Lemma 7 The followng holds true: E Dm,D sup {E m S [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} g E Dm,Σ m sup {E S [σ(x)f φ (ˆπ(x), h(x))]} + L m (90) h Proof For any σ Σ m and any sets S {x, x,, x m } and S {x, x,, x m}of sze m, denote and S σ S σ {x ff σ, x otherwse}, {x ff σ, x otherwse} (S S )\S σ (9) ˆπ (x) { ˆπσ (x) f x S σ, ˆπ σ (x) otherwse, (9) where ˆπ σ () denote the label proportons n S σ and ˆπ σ () denote the label proportons n S σ Let ˆπ() denote the label proportons n S, ˆπ () denote the label proportons n S (we know each bag to whch each example n S belongs to, so we can compute these estmators), We have E Dm,D m sup h E Dm,D m sup h E Dm,D m sup h {E S [F φ (yh(x))] E S [F φ (ˆπ(x), h(x))]} { E S [F φ (ˆπ (x), h(x))] E S [F φ (ˆπ(x), h(x))] b φ { E Sσ [σ(x)f φ (ˆπ l (x), h(x))] E Sσ [σ(x)f φ (ˆπ r (x), h(x))] b φ 5 } } (93),

25 wth E S [(( ˆπ (x)) y ˆπ (x) y )h(x)] ; (94) ˆπ l (x) (( + σ(x))ˆπ (x) + ( σ(x))ˆπ(x)), ˆπ r (x) We also have from eq () and (3): (( + σ(x))ˆπ(x) + ( σ(x))ˆπ (x)) (95) E Sσ [σ(x)f φ (ˆπ l (x), h(x))] E Sσ [σ(x)f φ (ˆπ σ (x), h(x))] b φ, (96) E Sσ [σ(x)f φ (ˆπ r (x), h(x))] E Sσ [σ(x)f φ (ˆπ σ (x), h(x))] 3, b φ (97) wth E Sσ [σ(x)(ˆπ l (x) ˆπ σ (x))h(x)], (98) 3 E Sσ [σ(x)(ˆπ r (x) ˆπ σ (x))h(x)] (99) We also have: 3 E S [(ˆπ (x) y )h(x)] + E S [(ˆπ(x) ˆπ (x))h(x)] 4 (00) Puttng eqs (93), (96), (97) and (00) altogether, we get, after ntroducng Rademacher varables: {E S [F φ (yh(x))] E S [F φ (ˆπ(x), h(x))]} E Dm,D m,σm sup h E Dm,D m,σm sup h E Dm,D m,σm sup h +E Dm,D m,σm sup h {E Sσ [σ(x)f φ (ˆπ σ (x), h(x))] E Sσ [σ(x)f φ (ˆπ σ (x), h(x))] + 4 } {E Sσ [σ(x)f φ (ˆπ σ (x), h(x))] E Sσ [σ(x)f φ (ˆπ σ (x), h(x))]} {E S [(ˆπ (x) y )h(x)] + E S [(ˆπ(x) ˆπ (x))h(x)]} E Dm,D sup {E m,σm S [σ(x)f φ (ˆπ (x), h(x))] E S [σ(x)f φ (ˆπ(x), h(x))]} h +E Dm,D sup {E m,σm S [(ˆπ (x) y )h(x)] + E S [(ˆπ(x) ˆπ (x))h(x)]} (0) h E Dm,Σ m sup {E S [σ(x)f φ (ˆπ(x), h(x))]} h {E S [(ˆπ (x) y )h(x)] + E S [(ˆπ(x) ˆπ (x))h(x)]} (0) +E Dm,D m,σm sup h Eq (0) holds because the dstrbuton of the supremum s the same We also have: E Dm,D sup {E m,σm S [(ˆπ (x) y )h(x)] + E S [(ˆπ(x) ˆπ (x))h(x)]} h E Dm,D m,σm sup h {E S [(ˆπ(x) ˆπ (x))h(x)] E S [( y ˆπ (x))h(x)]} E Dm E I /,I / sup E S [σ (x)(ˆπ s (x) ˆπl (x))h(x)] (03) h L m (04) Eq (03) holds because swappng the sample does not make any dfference n the outer expectaton, as each couple of swapped samples s generated wth the same probablty wthout swappng Puttng altogether (0) and (04) ends the proof of Lemma 7 We now bound the devatons of E Σm sup h {E S [σ(x)f φ (ˆπ(x), h(x))]} wth respect to ts expectaton over the samplng of S, E Dm,Σ m sup h {E S [σ(x)f φ (ˆπ(x), h(x))]} To do that, we use a thrd tme the IBDI and compute an upperbound for E Σ m sup g {E S [σ(x)f φ (ˆπ(x), h(x))]} E Σm sup g {E S [σ(x)f φ (ˆπ(x), h(x))]} [ ] sup E g {E S [σ(x)f φ (ˆπ(x), h(x))]} Σm (05) max Σ m sup g {E S [σ(x)f φ (ˆπ(x), h(x))]} [ ] sup g {E S [σ(x)f φ (ˆπ(x), h(x))]} sup g {E S [σ(x)f φ (ˆπ(x), h(x))]} 6 Q m, (06)

26 where Q s defned n eq (85) Eq (05) holds because of the trangular nequalty Ineq (06) holds because σ() So wth probablty δ/ over the samplng of S, E Σm sup {E S [σ(x)f φ (ˆπ(x), h(x))]} h E Dm,Σ m sup {E S [σ(x)f φ (ˆπ(x), h(x))]} Q h m log δ, (07) where Q s defned va (84) We obtan that wth probablty > ((δ/) + (δ/)) δ, the followng holds h: E D [F φ (yh(x))] E S [F φ (ˆπ(x), h(x))] + Λ(S) (see (78) and (79)) E S [F φ (ˆπ(x), h(x))] + E Dm sup {E D [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} g as clamed 9 Proof of eq (5) +Q m log (from (86)) δ E S [F φ (ˆπ(x), h(x))] + E Dm,D sup {E m S [F φ (yg(x))] E S [F φ (ˆπ(x), g(x))]} g +Q m log (from (87)) δ E S [F φ (ˆπ(x), h(x))] + E Dm,Σ m sup {E S [σ(x)f φ (ˆπ(x), g(x))]} + L m g +Q m log (Lemma (7)) δ E S [F φ (ˆπ(x), h(x))] + E Σm sup {E S [σ(x)f φ (ˆπ(x), h(x))]} + L m h +Q m log δ (from (07)) E Σ ˆπ E S [F φ (σ(x)h(x))] + ˆR b m + L m + 4 ( ) h + b φ m log δ, We have F φ (x) (/b φ))(φ ) ( x) (/b φ )(φ ) ( x) [ /b φ, 0], and thus F φ s /b φ - Lpschtz, so Theorem 4 n [8] brngs: Rm(F, b { η) E σ Σm sup E [m] [σ E σ Σ [F ˆπ φ(σ h(x ) η)]] } h H { b φ E σ Σm sup E [m] [σ E σ Σ [σ ˆπ h(x ) η]] } h H { b φ E σ Σm sup E [m] [σ E σ Σ [σ ˆπ h(x )]] } h H { b φ E σ Σm sup E [m] [σ (ˆπ(x ) )h(x )] }, h H as clamed 3 Supplementary Materal on Experments 3 Full Expermental Setup All mean operator algorthms have been coded n R For SVM and InvCal, we used a Matlab mplementaton from the authors of [] The ranges of parameters for cross valdaton are λ λ m wth λ {0} 0 {0,,}, γ 0 {,,0}, σ {,,0} for mean operator algorthms We ran all 7

3 Supplementary Material on Experiments

3.1 Full Experimental Setup

All mean operator algorithms have been coded in R. For ∝SVM and InvCal, we used a Matlab implementation from the authors of [1]. The ranges of parameters for cross validation are λ = λ̃·λ_m with λ̃ ∈ {0} ∪ 10^{0,1,2}, γ ∈ 10^{1,…,10} and σ ∈ {1,…,10} for the mean operator algorithms. We ran all experiments with D = wI and ε = 0. Since we tested on similar domains (1-6 are actually the same), ranges for InvCal and ∝SVM were taken from [1]. To avoid an additional source of complexity in the analysis, we cross-validated all hyper-parameters using the knowledge of all labels of the validation sets; notice that labels at validation time would generally not be accessible in real-world applications.

3.2 Simulated Domain for Violation of the Homogeneity Assumption

The synthetic data generated for this test consists of 6 classification problems, each formed by 6 bags of 100 two-dimensional normal samples. The distribution generating the first dataset satisfies the homogeneity assumption (Figure 1 (a)). Then, we gradually change the position of the class-conditional, bag-conditional means along one linear direction (to the right in Figure 1 (b) and (c)), with different offsets for different bags. In Figure 1 we give a graphical explanation of the process with 3 bags; a sketch of a generator of this kind is given below.

[Figure 1: Violation of the homogeneity assumption. Three panels (a)-(c) of two-dimensional samples, with marker shape giving the label (+/−) and colour the bag (1-3); the class-conditional bag means shift further right from (a) to (c).]
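The following R sketch (our own illustration, not the experiment code) generates data of this kind; the number of bags, points per bag and offset magnitudes are assumptions chosen for the illustration, not the exact constants used above:

    # Bags of 2D Gaussians whose class-conditional means drift per bag,
    # violating the homogeneity assumption behind the Mean Map estimator.
    set.seed(1)
    make_problem <- function(n_bags = 4, n_per_bag = 100, drift = 0.5) {
      do.call(rbind, lapply(seq_len(n_bags), function(b) {
        y  <- ifelse(runif(n_per_bag) < 0.5, +1, -1)   # balanced binary labels
        mu <- y * 1.0 + drift * (b - 1)                # class mean shifted per bag, x1 only
        data.frame(bag = b, y = y,
                   x1 = rnorm(n_per_bag, mean = mu),
                   x2 = rnorm(n_per_bag, mean = 0))
      }))
    }
    d_hom   <- make_problem(drift = 0)   # homogeneous, as in Figure 1 (a)
    d_shift <- make_problem(drift = 1)   # strongly shifted, as in Figure 1 (c)
    # the only supervision an LLP learner sees: per-bag label proportions
    props <- tapply((d_shift$y + 1) / 2, d_shift$bag, mean)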

3.3 Simulated Domain from [1]

The MM algorithm was shown to learn a model with zero-accuracy prediction on the toy domain of [1]. We report in Table 1 the performance of all mean operator algorithms, measured in the transductive setting, training with cross-validation. Although none of the distances used in our experiments with LMM leads to reasonable accuracy on the toy dataset, AMM_max initialised with any starting point learns in one step a model which perfectly classifies all the instances. We also notice that EMM returns an optimal classifier by itself (not reported in Table 1).

[Table 1: AUC on the toy dataset of [1] for AMM_min, AMM_max, EMM, MM, LMM_G, LMM_{G,s} and LMM_nc; numeric entries lost in transcription.]

3.4 Additional Tests on alter-∝SVM [1]

In our experiments we observe that the AUC achieved by ∝SVM can be high, but it is also often below 0.5; in those cases the algorithm outputs models which are worse than random, and the average performance over the 5 test folds drops. We are able to reproduce the same behaviour on the heart dataset provided by the authors in a demo for alter-∝SVM; this also proves that our bag assignment for the LLP simulation does not introduce the issue. In a first test, we randomly select 3/4 of the dataset and randomly assign instances to 4 bags of fixed size 64, following [1]. We repeat the training split 50 times with C and C_p as in the demo, and we measure AUCs on the same training set. As expected, a consistent number of runs ends up producing AUC smaller than 0.5. We display in Figure 2 (a) the AUC's density profile, which shows a relevant mass around 0.5; notice also that the two distribution modes look symmetric around 0.5. In a second test we investigate further, measuring the pairs of training-set AUC and loss value obtained by the same execution of the algorithm. In this case, we run over all the parameter ranges defined in ∝SVM's paper, and do not pick the model that minimizes the loss over the 10 random runs, but record the losses of all of them. Figures 2 (b) and (c) show scatter plots relative to two chosen training-set splits. We observe that loss minimization can lead both to high and to low AUCs, with only few points close to 0.5. A possible explanation might be the inverted polarity of the learnt linear classifier; inverted polarity in this context means having a model which would achieve better performance classifying instance labels opposite to the ones it predicts (see the sketch after Figure 2). We conclude that optimizing ∝SVM's loss might in some cases be equivalent to training a max-margin separator of the unlabelled data, which exploits only weakly the information given by the label proportions. This would give a heuristic understanding of the frequent symmetrical behaviour of the AUC.

[Figure 2: alter-∝SVM: empirical distribution of training-set AUC (a), and relationship between loss and AUC on two different training splits (b), (c).]
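The polarity remark has a simple mechanical core: negating a model's scores turns an AUC of a into exactly 1 − a, which is consistent with the two modes of Figure 2 (a) sitting symmetrically around 0.5. A minimal R illustration (the scores are synthetic and the small auc helper is ours, not code from [1]):

    # AUC as P(score of a positive > score of a negative), ties counted 1/2
    auc <- function(score, y) {
      pos <- score[y == +1]
      neg <- score[y == -1]
      mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
    }
    set.seed(1)
    y <- rep(c(+1, -1), each = 200)
    s <- rnorm(400, mean = (y + 1) / 2)  # mildly informative scores
    a         <- auc(s, y)               # some value above 0.5
    a_flipped <- auc(-s, y)              # equals 1 - a: inverted polarity
    stopifnot(isTRUE(all.equal(a + a_flipped, 1)))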

3.5 Scalability

Figure 3 (a) shows the runtime of learning (including cross-validation) for MM and LMM with regard to the number of bags, which is the natural parameter of time complexity for our Laplacian-based methods. Despite the 3 layers of cross-validation of LMM_{G,s}, LMM_nc turns out to be the only method clearly not scalable. Figure 3 (b) presents how our one-shot algorithms scale on all the small domains as a function of problem size; runtime is averaged over the different bag assignments. The same plot is given in Figure 3 (c) for the iterative algorithms, in particular AMM_min and (alter/conv)-∝SVM. All curves are completed with measurements on the bigger domains when available. The runtime of the ∝SVMs is not directly comparable with that of our methods. This is due both (a) to the implementations being in different programming languages and (b) to the fact that the code provided implements kernel ∝SVM, even for linear kernels, which is a big overhead in computation and memory access. Nevertheless, the high growth rate of conv-∝SVM makes the algorithm not suitable for large datasets. Noticeably, even if alter-∝SVM does not show such behaviour, we are not able to run it on our bigger domains, since it requires approximately 10 hours to run on a training-set split with fixed parameters.

[Figure 3: Learning runtime of LMM as a function of the number of bags (a), and as a function of domain size (#instances × #features) for one-shot (b) and iterative (c) methods; curves for MM, LMM_G, LMM_{G,s}, LMM_nc in (a)-(b), and for the AMM variants, alter-∝SVM and conv-∝SVM in (c).]

3.6 Full Results on Small Domains

Finally, we report details of all the experiments run on the 10 small domains (Table 2).

[Table 2: Small domains size, with columns dataset, instances, features; datasets: arrhythmia, australian, breastw, colic, german, heart, ionosphere, vertebral column, vote, wine. Numeric size columns lost in transcription.]

In the following tables, columns show the number of bags generated through K-MEANS (2, 4, 8, 16 and 32 bags; a sketch of this bag-assignment procedure is given below). Each cell contains the average AUC over the 5 test splits and its standard deviation; runtime in seconds is in the separate column. The best performing algorithm, and the ones not worse than 0.01 AUC from it, are bold faced. Comparisons are made within the respective top/bottom sub-tables, which group one-shot and iterative algorithms. We also highlight the runs which achieve average AUC greater than or equal to the Oracle's.
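A sketch of ours of that bag-assignment step (the helper name is hypothetical and the actual experiment code may differ): cluster the features with k-means, use cluster membership as bags, and keep the labels only through each bag's label proportion:

    # Simulate LLP supervision on a labelled dataset (X: numeric matrix, y in {-1,+1}):
    # bags are k-means clusters of the features; labels survive only as proportions.
    make_llp_bags <- function(X, y, n_bags) {
      bag <- kmeans(X, centers = n_bags, nstart = 10)$cluster
      list(bag = bag,
           proportions = tapply((y + 1) / 2, bag, mean))  # fraction of positives per bag
    }
    # e.g., for the 8-bag column of the tables: b <- make_llp_bags(X, y, n_bags = 8)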

[Table 3: arrhythmia. Average AUC with standard deviation and runtime (s) for 2, 4, 8, 16 and 32 bags; one-shot sub-table: EMM, MM, LMM_G, LMM_{G,s}, LMM_nc, InvCal; iterative sub-tables (AMM_min and AMM_max blocks): AMM_EMM, AMM_MM, AMM_G, AMM_{G,s}, AMM_nc, AMM, AMM_10ran, then alter-∝SVM, conv-∝SVM and the Oracle. Numeric entries lost in transcription.]

[Table 4: australian. Same layout as Table 3; numeric entries lost in transcription.]

[Table 5: breastw. Same layout as Table 3; numeric entries lost in transcription.]

[Table 6: colic. Same layout as Table 3; numeric entries lost in transcription.]

[Table 7: german. Same layout as Table 3; numeric entries lost in transcription.]

[Table 8: heart. Same layout as Table 3; numeric entries lost in transcription.]

[Table 9: ionosphere. Same layout as Table 3; numeric entries lost in transcription.]

[Table 10: vertebral column. Same layout as Table 3; numeric entries lost in transcription.]

[Table 11: vote (the feature physician-fee-freeze was removed to make the problem harder). Same layout as Table 3; numeric entries lost in transcription.]

[Table 12: wine. Same layout as Table 3; numeric entries lost in transcription.]

[Figure 4: Relative AUC (w.r.t. the Oracle) on arrhythmia; panels: (a) MM and the LMM variants, (b) the AMM variants, (c) alter-∝SVM, conv-∝SVM and InvCal.]

[Figure 5: Relative AUC (w.r.t. the Oracle) on australian; panels as in Figure 4.]

[Figure 6: Relative AUC (w.r.t. the Oracle) on breastw; panels as in Figure 4.]

[Figure 7: Relative AUC (w.r.t. the Oracle) on colic; panels as in Figure 4.]

[Figure 8: Relative AUC (w.r.t. the Oracle) on german; panels as in Figure 4.]

[Figure 9: Relative AUC (w.r.t. the Oracle) on heart; panels as in Figure 4.]

[Figure 10: Relative AUC (w.r.t. the Oracle) on ionosphere; panels as in Figure 4.]

[Figure 11: Relative AUC (w.r.t. the Oracle) on vertebral column; panels as in Figure 4.]

[Figure 12: Relative AUC (w.r.t. the Oracle) on vote; panels as in Figure 4.]

[Figure 13: Relative AUC (w.r.t. the Oracle) on wine; panels as in Figure 4.]

References

[1] F. X. Yu, D. Liu, S. Kumar, T. Jebara, and S. F. Chang. ∝SVM for Learning with Label Proportions. In 30th ICML, pages 504-512, 2013.

[2] R. Nock and F. Nielsen. Bregman divergences and surrogates for learning. IEEE Trans. PAMI, 31, 2009.

[3] A. Banerjee, X. Guo, and H. Wang. On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. on Information Theory, 51, 2005.

[4] N. Quadrianto, A. J. Smola, T. S. Caetano, and Q. V. Le. Estimating labels from label proportions. JMLR, 10, 2009.

[5] Y. Altun and A. J. Smola. Unifying divergence minimization and statistical inference via convex duality. In 19th COLT, pages 139-153, 2006.

[6] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. JMLR, 3:463-482, 2002.

[7] C. McDiarmid. Concentration. In M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors, Probabilistic Methods for Algorithmic Discrete Mathematics. Springer Verlag, 1998.

[8] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer Verlag, 1991.
