(Almost) No Label No Cry

Size: px
Start display at page:

Download "(Almost) No Label No Cry"

Transcription

1 (Almost) No Label No Cry Gorgo Patrn,, Rchard Nock,, Paul Rvera,, Tbero Caetano,3,4 Australan Natonal Unversty, NICTA, Unversty of New South Wales 3, Ambata 4 Sydney, NSW, Australa Abstract In Learnng wth Label Proportons (LLP), the objectve s to learn a supervsed classfer when, nstead of labels, only label proportons for bags of observatons are known Ths settng has broad practcal relevance, n partcular for prvacy preservng data processng We frst show that the mean operator, a statstc whch aggregates all labels, s mnmally suffcent for the mnmzaton of many proper scorng losses wth lnear (or kernelzed) classfers wthout usng labels We provde a fast learnng algorthm that estmates the mean operator va a manfold regularzer wth guaranteed approxmaton bounds Then, we present an teratve learnng algorthm that uses ths as ntalzaton We ground ths algorthm n Rademacher-style generalzaton bounds that ft the LLP settng, ntroducng a generalzaton of Rademacher complexty and a Label Proporton Complexty measure Ths latter algorthm optmzes tractable bounds for the correspondng bag-emprcal rsk Experments are provded on fourteen domans, whose sze ranges up to 300K observatons They dsplay that our algorthms are scalable and tend to consstently outperform the state of the art n LLP Moreover, n many cases, our algorthms compete wth or are just percents of AUC away from the Oracle that learns knowng all labels On the largest domans, half a dozen proportons can suffce, e roughly 40K tmes less than the total number of labels Introducton Machne learnng has recently experenced a prolferaton of problem settngs that, to some extent, enrch the classcal dchotomy between supervsed and unsupervsed learnng Cases as multple nstance labels, nosy labels, partal labels as well as sem-supervsed learnng have been studed motvated by applcatons where fully supervsed learnng s no longer realstc In the present work, we are nterested n learnng a bnary classfer from nformaton provded at the level of groups of nstances, called bags The type of nformaton we assume avalable s the label proportons per bag, ndcatng the fracton of postve bnary labels of ts nstances Inspred by [], we refer to ths framework as Learnng wth Label Proportons (LLP) Settngs that perform a bag-wse aggregaton of labels nclude Multple Instance Learnng (MIL) [] In MIL, the aggregaton s logcal rather than statstcal: each bag s provded wth a bnary label expressng an OR condton on all the labels contaned n the bag More general settng also exst [3] [4] [5] Many practcal scenaros ft the LLP abstracton (a) Only aggregated labels can be obtaned due to the physcal lmts of measurement tools [6] [7] [8] [9] (b) The problem s sem- or unsupervsed but doman experts have knowledge about the unlabelled samples n form of expectaton, as pseudomeasurement [5] (c) Labels exsted once but they are now gven n an aggregated fashon for prvacy-preservng reasons, as n medcal databases [0], fraud detecton [], house prce market, electon results, census data, etc (d) Ths settng also arses n computer vson [] [3] [4] Related work The settng was frst ntroduced by [], where a prncpled herarchcal model generates labels consstent wth the proportons and s traned through MCMC Subsequently, [9] and ts follower [6] offer a varety of standard learnng algorthms desgned to generate self-consstent

2 labels [5] gves a Bayesan nterpretaton of LLP where the key dstrbuton s estmated through an RBM Other deas rely on structural learnng of Bayesan networks wth mssng data [7], and on K- MEANS clusterng to solve prelmnary label assgnment [3] [8] Recent SVM mplementatons [] [6] outperform most of the other known methods Theoretcal works on LLP belong to two man categores The frst contans unform convergence results, for the estmators of label proportons [], or the estmator of the mean operator [7] The second contans approxmaton results for the classfer [7] Our work bulds upon ther Mean Map algorthm, that reles on the trck that the logstc loss may be splt n two, a convex part dependng only on the observatons, and a lnear part nvolvng a suffcent statstc for the label, the mean operator Beng able to estmate the mean operator means beng able to ft a classfer wthout usng labels In [7], ths estmaton reles on a restrctve homogenety assumpton that the class-condtonal estmaton of features does not depend on the bags Experments dsplay the lmts of ths assumpton [][6] Contrbutons In ths paper we consder lnear classfers, but our results hold for kernelzed formulatons followng [7] We frst show that the trck about the logstc loss can be generalzed, and the mean operator s actually mnmally suffcent for a wde set of symmetrc proper scorng losses wth no class-dependent msclassfcaton cost, that encompass the logstc, square and Matsushta losses [8] We then provde an algorthm, LMM, whch estmates the mean operator va a Laplacan-based manfold regularzer wthout callng to the homogenety assumpton We show that under a weak dstngushablty assumpton between bags, our estmaton of the mean operator s all the better as the observatons norm ncrease Ths, as we show, cannot hold for the Mean Map estmator Then, we provde a data-dependent approxmaton bound for our classfer wth respect to the optmal classfer, that s shown to be better than prevous bounds [7] We also show that the manfold regularzer s soluton s tghtly related to the lnear separablty of the bags We then provde an teratve algorthm, AMM, that takes as nput the soluton of LMM and optmzes t further over the set of consstent labelngs We ground the algorthm n a unform convergence result nvolvng a generalzaton of Rademacher complextes for the LLP settng The bound nvolves a bag-emprcal surrogate rsk for whch we show that AMM optmzes tractable bounds All our theoretcal results hold for any symmetrc proper scorng loss Experments are provded on fourteen domans, rangng from hundreds to hundreds of thousands of examples, comparng AMM and LMM to ther contenders: Mean Map, InvCal [] and SVM [6] They dsplay that AMM and LMM outperform ther contenders, and sometmes even compete wth the fully supervsed learner whle requrng few proportons only Tests on the largest domans dsplay the scalablty of both algorthms Such expermental evdence serously questons the safety of prvacy-preservng summarzaton of data, whenever accurate aggregates and nformatve ndvdual features are avalable Secton () presents our algorthms and related theoretcal results Secton (3) presents experments Secton (4) concludes A Supplementary Materal [9] ncludes proofs and addtonal experments LLP and the mean operator: theoretcal results and algorthms Learnng settng Hereafter, boldfaces lke p denote vectors, whose coordnates are denoted p l for l,, For any m N, let [m] {,,, m} Let Σ m {σ {, } m } and X R d Examples are couples (observaton, label) X Σ, sampled d accordng to some unknown but fxed dstrbuton D Let S {(x, y ), 
[m]} D m denote a sze-m sample In Learnng wth Label Proportons (LLP), we do not observe drectly S but S y, whch denotes S wth labels removed; we are gven ts partton n n > 0 bags, S y j S j, j [n], along wth ther respectve label proportons ˆπ j ˆP[y + S j ] and bag proportons ˆp j m j /m wth m j card(s j ) (Ths generalzes to a cover of S, by copyng examples among bags) The bag assgnment functon that parttons S s unknown but fxed In real world domans, t would rather be known, eg state, gender, age band A classfer s a functon h : X R, from a set of classfers H H L denotes the set of lnear classfers, noted h θ (x) θ x wth θ X A (surrogate) loss s a functon F : R R + We let F (S, h) (/m) F (y h(x )) denote the emprcal surrogate rsk on S correspondng to loss F For the sake of clarty, ndexes, j and k respectvely refer to examples, bags and features The mean operator and ts mnmal suffcency µ S m We defne the (emprcal) mean operator as: y x ()

3 Algorthm Laplacan Mean Map (LMM) Input S j, ˆπ j, j [n]; γ > 0 (7); w (7); V (8); permssble φ (); λ > 0; Step : let B± arg mn X R n d l(l, X) usng (7) (Lemma ) Step : let µ S j ˆp j(ˆπ j b+ j ( ˆπ j) b j ) Step 3 : let θ arg mn θ F φ (S y, θ, µ S ) + λ θ (3) Return θ Table : Correspondence between permssble functons φ and the correspondng loss F φ loss name F φ (x) φ(x) logstc loss log( + exp( x)) x log x ( x) log( x) square loss ( x) x( x) Matsushta loss x + + x x( x) The estmaton of the mean operator µ S appears to be a learnng bottleneck n the LLP settng [7] The fact that the mean operator s suffcent to learn a classfer wthout the label nformaton motvates the noton of mnmal suffcent statstc for features n ths context Let F be a set of loss functons, H be a set of classfers, I be a subset of features Some quantty t(s) s sad to be a mnmal suffcent statstc for I wth respect to F and H ff: for any F F, any h H and any two samples S and S, the quantty F (S, h) F (S, h) does not depend on I ff t(s) t(s ) Ths defnton can be motvated from the one n statstcs by buldng losses from log lkelhoods The followng Lemma motvates further the mean operator n the LLP settng, as t s the mnmal suffcent statstc for a broad set of proper scorng losses that encompass the logstc and square losses [8] The proper scorng losses we consder, hereafter called symmetrc (SPSL), are twce dfferentable, non-negatve and such that msclassfcaton cost s not label-dependent Lemma µ S s a mnmal suffcent statstc for the label varable, wth respect to SPSL and H L ([9], Subsecton ) Ths property, very useful for LLP, may also be exploted n other weakly supervsed tasks [] Up to constant scalngs that play no role n ts mnmzaton, the emprcal surrogate rsk correspondng to any SPSL, F φ (S, h), can be wrtten wth loss: F φ (x) φ(0) + φ ( x) a φ + φ ( x), () φ(0) φ(/) b φ and φ s a permssble functon [0, 8], e dom(φ) [0, ], φ s strctly convex, dfferentable and symmetrc wth respect to / φ s the convex conjugate of φ Table shows examples of F φ It follows from Lemma and ts proof, that any F φ (Sθ), can be wrtten for any θ h θ H L as: ( ) F φ (S, θ) b φ F φ (σθ x ) m θ µ S F φ (S y, θ, µ S ), (3) where σ Σ σ The Laplacan Mean Map (LMM) algorthm The sum n eq (3) s convex and dfferentable n θ Hence, once we have an accurate estmator of µ S, we can then easly ft θ to mnmze F φ (S y, θ, µ S ) Ths two-steps strategy s mplemented n LMM n algorthm µ S can be retreved from n bag-wse, label-wse unknown averages b σ j : n µ S (/) ˆp j j σ Σ (ˆπ j + σ( σ))b σ j, (4) wth b σ j E S [x σ, j] denotng these n unknowns (for j [n], σ Σ ), and let b j (/m j ) x S j x The n b σ j s are soluton of a set of n denttes that are (n matrx form): B Π B ± 0, (5) 3

4 where B [b b b n ] R n d, Π [DIAG(ˆπ) DIAG( ˆπ)] R n n and B ± R n d s the matrx of unknowns: [ ] B ± b + b + b + n b - b - b - n (6) } {{ } } {{ } (B + ) (B ) System (5) s underdetermned, unless one makes the homogenety assumpton that yelds the Mean Map estmator [7] Rather than makng such a restrctve assumpton, we regularze the cost that brngs (5) wth a manfold regularzer [], and search for B± arg mn X R n d l(l, X), wth: l(l, X) tr ( (B X Π)D w (B Π X) ) + γtr ( X ) LX, (7) and γ > 0 D w DIAG(w) s a user-fxed bas matrx wth w R n +, (and w ˆp n general) and: [ ] La 0 L εi + R 0 n n, (8) L a where L a D V R n n s the Laplacan of the bag smlartes V s a symmetrc smlarty matrx wth non negatve coordnates, and the dagonal matrx D satsfes d jj j v jj, j [n] The sze of the Laplacan s O(n ), whch s small compared to O(m ) f there are not many bags One can nterpret the Laplacan regularzaton as smoothng the estmates of b σ j wrt the smlarty of the respectve bags Lemma The soluton B± to mn X R n d l(l, X) s B± ( ΠD w Π + γl ) ΠDw B ([9], Subsecton ) Ths Lemma explans the role of penalty εi n (8) as ΠD w Π and L have respectvely n- and ( )-dm null spaces, so the nverson may not be possble Even when ths does not happen exactly, ths may ncur numercal nstabltes n computng the nverse For domans where ths rsk exsts, pckng a small ε > 0 solves the problem Let b σ j denote the row-wse decomposton of B± followng (6), from whch we compute µ S followng (4) when we use these n estmates n leu of the true b σ j We compare µ j ˆπ j b + j ( ˆπ j)b j, j [n] to our estmates µ j ˆπ j b+ j ( ˆπ j) b j, j [n], granted that µ S j ˆp jµ j and µ S j ˆp j µ j Theorem 3 Suppose that γ satsfes γ ((ε(n) ) + max j j v jj )/ mn j w j Let M [µ µ µ n ] R n d, M [ µ µ µ n ] R n d and ς(v, B ± ) ((ε(n) ) + max j j v jj ) B ± F The followng holds: M M F ( ) n mn wj ς(v, B ± ) (9) j ([9], Subsecton 3) The multplcatve factor to ς n (9) s roughly O(n 5/ ) when there s no large dscrepancy n the bas matrx D w, so the upperbound s drven by ς(, ) when there are not many bags We have studed ts varatons when the dstngushablty between bags ncreases Ths settng s nterestng because n ths case we may kll two brds n one shot, wth the estmaton of M and the subsequent learnng problem potentally easer, n partcular for lnear separators We consder two examples for v jj, the frst beng (half) the normalzed assocaton []: v nc jj ( ASSOC(Sj, S j ) ASSOC(S j, S j S j ) + ASSOC(S j, S j ) ASSOC(S j, S j S j ) ) NASSOC(S j, S j ), (0) v G,s jj exp( b j b j /s), s > 0 () Here, ASSOC(S j, S j ) x S j,x S x x j [] To put these two smlarty measures n the context of Theorem 3, consder the settng where we can make assumpton (D) that there exsts a small constant κ > 0 such that b j b j κ max σ,j b σ j, j, j [n] Ths s a weak dstngushablty property as f no such κ exsts, then the centers of dstnct bags may just be confounded Consder also the addtonal assumpton, (D), that there exsts κ > 0 such that max j d j κ, j [n], where d j max x,x x Sj x s a bag s dameter In the followng Lemma, the lttle-oh notaton s wth respect to the largest unknown n eq (4), e max σ,j b σ j 4

5 Algorthm Alternatng Mean Map (AMM OPT ) Input LMM parameters + optmzaton strategy OPT {mn, max} + convergence predcate PR Step : let θ 0 LMM(LMM parameters) and t 0 Step : repeat Step : let σ t arg OPT σ Σ ˆπ F φ (S y, θ t, µ S (σ)) Step : let θ t+ arg mn θ F φ (S y, θ, µ S (σ t )) + λ θ Step 3 : let t t + untl predcate PR s true Return θ arg mn t F φ (S y, θ t+, µ S (σ t )) Lemma 4 There exsts ε > 0 such that ε ε, the followng holds: () ς(v nc, B ± ) o() under assumptons (D + D); () ς(v G,s, B ± ) o() under assumpton (D), s > 0 ([9], Subsecton 4) Hence, provded a weak (D) or stronger (D+D) dstngushablty assumpton holds, the dvergence between M and M gets smaller wth the ncrease of the norm of the unknowns b σ j The proof of the Lemma suggests that the convergence may be faster for VG,s The followng Lemma shows that both smlartes also partally encode the hardness of solvng the classfcaton problem wth lnear separators, so that the manfold regularzer lmts the dstorton of the b ± s between two bags that tend not to be lnearly separable Lemma 5 Take v jj {v G, jj, vnc jj } There exsts 0 < κ l < κ n < such that () f v jj > κ n then S j, S j are not lnearly separable, and f v jj < κ l then S j, S j are lnearly separable ([9], Subsecton 5) Ths Lemma s an advocacy to ft s n a data-dependent way n v G,s jj The queston may be rased as to whether fnte samples approxmaton results lke Theorem 3 can be proven for the Mean Map estmator [7] [9], Subsecton 6 answers by the negatve In the Laplacan Mean Map algorthm (LMM, Algorthm ), Steps and have now been descrbed Step 3 s a dfferentable convex mnmzaton problem for θ that does not use the labels, so t does not present any techncal dffculty An nterestng queston s how much our classfer θ n Step 3 dverges from the one that would be computed wth the true expresson for µ S, θ It s not hard to show that Lemma 7 n Altun and Smola [3], and Corollary 9 n Quadranto et al [7] hold for LMM so that θ θ (λ) µ S µ S The followng Theorem shows a data-dependent approxmaton bound that can be sgnfcantly better, when t holds that θ x, θ x φ ([0, ]), (φ s the frst dervatve) We call ths settng proper scorng complance (PSC) [8] PSC always holds for the logstc and Matsushta losses for whch φ ([0, ]) R For other losses lke the square loss for whch φ ([0, ]) [, ], shrnkng the observatons n a ball of suffcently small radus s suffcent to ensure ths Theorem 6 Let f k R m denote the vector encodng the k th feature varable n S : f k x k (k [d]) Let F denote the feature matrx wth column-wse normalzed feature vectors: fk (d/ k f k ) (d )/(d) f k Under PSC, we have θ θ (λ + q) µ S µ S, wth: q det F F m e b φ φ (φ (q /λ)) (> 0), () for some q I [±(x + max{ µ S, µ S })] Here, x max x and φ (φ ) ([9], Subsecton 7) To see how large q can be, consder the smple case where all egenvalues of F F, λk ( F F) [λ ± δ] for small δ In ths case, q s proportonal to the average feature norm : det F F tr ( ) F F + o(δ) x + o(δ) m md md 5

6 The Alternatng Mean Map (AMM) algorthm Let us denote Σˆπ {σ Σ m : :x S j σ (ˆπ j )m j, j [n]} the set of labelngs that are consstent wth the observed proportons ˆπ, and µ S (σ) (/m) σ x the based mean operator computed from some σ Σˆπ Notce that the true mean operator µ S µ S (σ) for at least one σ Σˆπ The Alternatng Mean Map algorthm, (AMM, Algorthm ), starts wth the output of LMM and then optmzes t further over the set of consstent labelngs At each teraton, t frst pcks a consstent labelng n Σˆπ that s the best (OPT mn) or the worst (OPT max) for the current classfer (Step ) and then fts a classfer θ on the gven set of labels (Step ) The algorthm then terates untl a convergence predcate s met, whch tests whether the dfference between two values for F φ (,, ) s too small (AMM mn ), or the number of teratons exceeds a user-specfed lmt (AMM max ) The classfer returned θ s the best n the sequence In the case of AMM mn, t s the last of the sequence as rsk F φ (S y,, ) cannot ncrease Agan, Step s a convex mnmzaton wth no techncal dffculty Step s combnatoral It can be solved n tme almost lnear n m [9] (Subsecton 8) Lemma 7 The runnng tme of Step n AMM s Õ(m), where the tlde notaton hdes log-terms Bag-Rademacher generalzaton bounds for LLP We relate the mn and max strateges of AMM by unform convergence bounds nvolvng the true surrogate rsk, e ntegratng the unknown dstrbuton D and the true labels (whch we may never know) Prevous unform convergence bounds for LLP focus on coarser graned problems, lke the estmaton of label proportons [] We rely on a LLP generalzaton of Rademacher complexty [4, 5] Let F : R R + be a loss functon and H a set of classfers The bag emprcal Rademacher complexty of sample S, Rm, b s defned as Rm b E σ Σm sup h H {E σ Σ ˆπ E S [σ(x)f (σ (x)h(x))] The usual emprcal Rademacher complexty equals Rm b for card(σˆπ ) The Label Proporton Complexty of H s: L m E Dm E I /,I / sup E S [σ (x)(ˆπ s (x) ˆπl (x))h(x)] (3) h H Here, each of I / l, l, s a random (unformly) subset of [m] of cardnal m Let S(I/ l ) be the sze-m subset of S that corresponds to the ndexes Take l, and any x S If I / l then ˆπ l s (x ) ˆπ l l (x ) s x s bag s label proporton measured on S\S(I / l ) Else, ˆπs (x ) s ts bag s label proporton measured on S(I / ) and ˆπl (x ) s ts label (e a bag s label proporton that would contan only x ) Fnally, σ (x) x S(I / ) Σ L m tends to be all the smaller as classfers n H have small magntude on bags whose label proporton s close to / Theorem 8 Suppose h 0 st h(x) h, x, h Then, for any loss F φ, any tranng sample of sze m and any 0 < δ, wth probablty > δ, the followng bound holds over all h H: ( ) E D [F φ (yh(x))] E Σ ˆπ E S [F φ (σ(x)h(x))] + Rm b h + L m b φ m log δ (4) Furthermore, under PSC (Theorem 6), we have for any F φ : Rm b b φ E Σm sup {E S [σ(x)(ˆπ(x) (/))h(x)]} (5) h H ([9], Subsecton 9) Despte smlar shapes (3) (5), R b m and L m behave dfferently: when bags are pure (ˆπ j {0, }, j), L m 0 When bags are mpure (ˆπ j /, j), R b m 0 As bags get mpure, the bag-emprcal surrogate rsk, E Σ ˆπ E S [F φ (σ(x)h(x))], also tends to ncrease AMM mn and AMM max respectvely mnmze a lowerbound and an upperbound of ths rsk 3 Experments Algorthms We compare LMM, AMM (F φ logstc loss) to the orgnal MM [7], InvCal [], conv- SVM and alter- SVM [6] (lnear kernels) To make experments extensve, we test several ntalzatons for AMM that are not dsplayed n Algorthm (Step ): () the edge mean map estmator, µ S EMM /m ( y )( x ) (AMM EMM ), () the constant estmator µ S (AMM ), and fnally AMM 
0ran whch runs 0 random ntal models ( θ 0 ), and selects the one wth smallest rsk; 6

7 AUC rel to MM 3 0 MM LMM G LMM G,s LMM nc 4 6 dvergence (a) AUC rel to Oracle MM LMM G LMM G,s LMM nc (b) AUC rel to Oracle AMM MM AMM G AMM G,s AMM nc AMM 0ran (c) AUC Oracle AMM G Bgger domans Small domans 0^ 5 0^ 3 0^ #bags/#nstance (d) Fgure : Relatve AUC (wrt MM) as homogenety assumpton s volated (a) Relatve AUC (wrt Oracle) vs on heart for LMM(b), AMM mn (c) AUC vs n/m for AMM mn G and the Oracle (d) Table : Small domans results #wn/#lose for row vs column Bold faces means p-val < 00 for Wlcoxon sgned-rank tests Top-left subtable s for one-shot methods, bottom-rght teratve ones, bottom-left compare the two Italc s state-of-the-art Grey cells hghlght the best of all (AMM mn G ) LMM algorthm MM LMM InvCal AMM mn AMM max conv- G G,s nc MM G G,s 0ran MM G G,s 0ran SVM AMM mn AMM max SVM G 36/4 G,s 38/3 30/6 nc 8/ 3/37 /37 InvCal 4/46 3/47 4/46 4/46 MM 33/6 6/4 5/5 3/8 46/4 G 38/ 35/4 30/0 37/3 47/3 3/7 G,s 35/4 33/7 30/0 35/5 47/3 4/ 7/5 eg AMM mn G,s wns on AMMmn G 7 tmes, loses 5, wth 8 tes 0ran 7/ 4/6 /8 6/4 44/6 0/30 6/34 9/3 MM 5/5 3/7 /8 5/5 45/5 5/35 3/37 3/37 8/4 G 7/3 /8 /8 6/4 45/5 7/33 4/36 4/36 0/40 3/4 G,s 5/5 /9 /8 4/6 45/5 5/35 3/37 3/37 /38 5/ 6/ 0ran 3/7 /9 9/3 4/6 50/0 9/3 5/35 7/33 7/43 9/30 0/9 7/3 conv- /9 /48 /48 /48 /48 4/46 3/47 3/47 4/46 3/47 3/47 4/46 0/50 alter- 0/50 0/50 0/50 0/50 0/30 0/50 0/50 0/50 3/47 3/47 /48 /49 0/50 7/3 ths s the same procedure of alter- SVM Matrx V (eqs (0), ()) used s ndcated n subscrpt: LMM/AMM G, LMM/AMM G,s, LMM/AMM nc respectvely denote v G,s wth s, v G,s wth s learned on cross valdaton (CV; valdaton ranges ndcated n [9]) and v nc For space reasons, results not dsplayed n the paper can be found n [9], Secton 3 (ncludng runtme comparsons, and detaled results by doman) We splt the algorthms n two groups, one-shot and teratve The latter, ncludng AMM, (conv/alter)- SVM, teratvely optmze a cost over labelngs (always consstent wth label proportons for AMM, not always for (conv/alter)- SVM) The former (LMM, InvCal) do not and are thus much faster Tests are done on a 4-core 3GHz CPUs Mac wth 3GB of RAM AMM/LMM/MM are mplemented n R Code for InvCal and SVM s [6] Smulated domans, MM and the homogenety assumpton The testng metrc s the AUC Pror to testng on our domans, we generate 6 domans that gradually move away the b σ j away from each other (wrt j), thus volatng ncreasngly the homogenety assumpton [7] The degree of volaton s measured as B ± B ± F, where B ± s the homogenety assumpton matrx, that replaces all b σ j by b σ for σ {, }, see eq (5) Fgure (a) dsplays the ratos of the AUC of LMM to the AUC of MM It shows that LMM s all the better wth respect to MM as the homogenety assumpton s volated Furthermore, learnng s n LMM mproves the results Experments on the smulated doman of [6] on whch MM obtans zero accuracy also dsplay that our algorthms perform better ( teraton only of AMM max brngs 00% AUC) Small and large domans experments We convert 0 small domans [9] (m 000) and 4 bgger ones (m > 8000) from UCI[6] nto the LLP framework We cast to one-aganst-all classfcaton when the problem s multclass On large domans, the bag assgnment functon s nspred by []: we craft bags accordng to a selected feature value, and then we remove that feature from the data Ths conforms to the dea that bag assgnment s structured and non random n real-world problems Most of our small domans, however, do not have a lot of features, so nstead of clusterng on one feature and then dscard t, we run K-MEANS on the whole data to make the bags, for K n [5] Small domans 
results We perform 5-folds nested CV comparsons on the 0 domans 50 AUC values for each algorthm Table synthesses the results [9], splttng one-shot and teratve algo- 7

8 Table 3: AUCs on bg domans (name: #nstances #features) Icap-shape, IIhabtat, IIIcap-colour, IVrace, Veducaton, VIcountry, VIIpoutcome, VIIIjob (number of bags); for each feature, the best result over one-shot, and over teratve algorthms s bold faced AMM mn AMM max algorthm mushroom: adult: marketng: 45 4 census: I(6) II(7) III(0) IV(5) V(6) VI(4) V(4) VII(4) VIII() IV(5) VIII(9) VI(4) EMM MM LMM G LMM G,s AMMEMM AMMMM AMM G AMM G,s AMM AMMEMM AMMMM AMM G AMM G,s AMM Oracle rthms LMM G,s outperforms all one-shot algorthms LMM G and LMM G,s are compettve wth many teratve algorthms, but lose aganst ther AMM counterpart, whch proves that addtonal optmzaton over labels s benefcal AMM G and AMM G,s are confrmed as the best varant of AMM, the frst beng the best n ths case Surprsngly, all mean map algorthms, even one-shots, are clearly superor to SVMs Further results [9] reveal that SVM performances are dampened by learnng classfers wth the nverted polarty e flppng the sgn of the classfer mproves ts performances Fgure (b, c) presents the AUC relatve to the Oracle (whch learns the classfer knowng all labels and mnmzng the logstc loss), as a functon of the Gn of bag assgnment, gn(s) 4E j [ˆπ j ( ˆπ j )] For an close to, we were expectng a drop n performances The unexpected [9] s that on some domans, large entropes ( 8) do not prevent AMM mn to compete wth the Oracle No such pattern clearly emerges for SVM and AMM max [9] Bg domans results We adopt a /5 hold-out method Scalablty results [9] dsplay that every method usng v nc and SVM are not scalable to bg domans; n partcular, the estmated tme for a sngle run of alter- SVM s >00 hours on the adult doman Table 3 presents the results on the bg domans, dstngushng the feature used for bag assgnment Bg domans confrm the effcency of LMM+AMM No approach clearly outperforms the rest, although LMM G,s s often the best one-shot Synthess Fgure (d) gves the AUCs of AMM mn G over the Oracle for all domans [9], as a functon of the degree of supervson, n/m ( f the problem s fully supervsed) Notceably, on 90% of the runs, AMM mn G gets an AUC representng at least 70% of the Oracle s Results on bg domans can be remarkable: on the census doman wth bag assgnment on race, 5 proportons are suffcent for an AUC 5 ponts below the Oracle s whch learns wth 00K labels 4 Concluson In ths paper, we have shown that effcent learnng n the LLP settng s possble, for general loss functons, va the mean operator and wthout resortng to the homogenety assumpton Through ts estmaton, the suffcency allows one to resort to standard learnng procedures for bnary classfcaton, practcally mplementng a reducton between machne learnng problems [7]; hence the mean operator estmaton may be a vable shortcut to tackle other weakly supervsed settngs [] [3] [4] [5] Approxmaton results and generalzaton bounds are provded Experments dsplay results that are superor to the state of the art, wth algorthms that scale to bg domans at affordable computatonal costs Performances sometmes compete wth the Oracle s that learns knowng all labels, even on bg domans Such expermental fndng poses severe mplcatons on the relablty of prvacy-preservng aggregaton technques wth smple group statstcs lke proportons Acknowledgments NICTA s funded by the Australan Government through the Department of Communcatons and the Australan Research Councl through the ICT Centre of Excellence Program G Patrn acknowledges that part of the research was conducted at the Commonwealth Bank of Australa We thank A Menon, D García-García, N de 
Fretas for nvaluable feedback, and FYu for help wth the code 8

9 References [] F X Yu, S Kumar, T Jebara, and S F Chang On learnng wth label proportons CoRR, abs/40590, 04 [] T G Detterch, R H Lathrop, and T Lozano-Pérez Solvng the multple nstance problem wth axsparallel rectangles Artfcal Intellgence, 89:3 7, 997 [3] G S Mann and A McCallum Generalzed expectaton crtera for sem-supervsed learnng of condtonal random felds In 46 th ACL, 008 [4] J Graça, K Ganchev, and B Taskar Expectaton maxmzaton and posteror constrants In NIPS*0, pages , 007 [5] P Lang, M I Jordan, and D Klen Learnng from measurements n exponental famles In 6 th ICML, pages , 009 [6] D J Muscant, J M Chrstensen, and J F Olson Supervsed learnng by tranng on aggregate outputs In 7 th ICDM, pages 5 6, 007 [7] J Hernández-González, I Inza, and J A Lozano Learnng bayesan network classfers from label proportons Pattern Recognton, 46(): , 03 [8] M Stolpe and K Mork Learnng from label proportons by optmzng cluster model selecton In 5 th ECMLPKDD, pages , 0 [9] B C Chen, L Chen, R Ramakrshnan, and D R Muscant Learnng from aggregate vews In th ICDE, pages 3 3, 006 [0] J Wojtusak, K Irvn, A Brerdnc, and A V Baranova Usng publshed medcal results and nonhomogenous data n rule learnng In 0 th ICMLA, pages 84 89, 0 [] S Rüpng Svm classfer estmaton from group probabltes In 7 th ICML, pages 9 98, 00 [] H Kueck and N de Fretas Learnng about ndvduals from group statstcs In th UAI, pages , 005 [3] S Chen, B Lu, M Qan, and C Zhang Kernel k-means based framework for aggregate outputs classfcaton In 9 th ICDMW, pages , 009 [4] K T La, F X Yu, M S Chen, and S F Chang Vdeo event detecton by nferrng temporal nstance labels In th CVPR, 04 [5] K Fan, H Zhang, S Yan, L Wang, W Zhang, and J Feng Learnng a generatve classfer from label proportons Neurocomputng, 39:47 55, 04 [6] F X Yu, D Lu, S Kumar, T Jebara, and S F Chang SVM for Learnng wth Label Proportons In 30 th ICML, pages 504 5, 03 [7] N Quadranto, A J Smola, T S Caetano, and Q V Le Estmatng labels from label proportons JMLR, 0: , 009 [8] R Nock and F Nelsen Bregman dvergences and surrogates for learnng IEEE TransPAMI, 3: , 009 [9] G Patrn, R Nock, P Rvera, and T S Caetano (Almost) no label no cry - supplementary materal In NIPS*7, 04 [0] M J Kearns and Y Mansour On the boostng ablty of top-down decson tree learnng algorthms In 8 th ACM STOC, pages , 996 [] M Belkn, P Nyog, and V Sndhwan Manfold regularzaton: A geometrc framework for learnng from labeled and unlabeled examples JMLR, 7: , 006 [] J Sh and J Malk Normalzed cuts and mage segmentaton IEEE TransPAMI, : , 000 [3] Y Altun and A J Smola Unfyng dvergence mnmzaton and statstcal nference va convex dualty In 9 th COLT, pages 39 53, 006 [4] P L Bartlett and S Mendelson Rademacher and gaussan complextes: Rsk bounds and structural results JMLR, 3:463 48, 00 [5] V Koltchnsk and D Panchenko Emprcal margn dstrbutons and boundng the generalzaton error of combned classfers Ann of Stat, 30: 50, 00 [6] K Bache and M Lchman UCI machne learnng repostory, 03 [7] A Beygelzmer, V Dan, T Hayes, J Langford, and B Zadrozny Error lmtng reductons between classfcaton tasks In th ICML, pages 49 56, 005 9

10 (Almost) No Label No Cry - Supplementary Materal Gorgo Patrn,, Rchard Nock,, Paul Rvera,, Tbero Caetano,3,4 Australan Natonal Unversty, NICTA, Unversty of New South Wales 3, Ambata 4 Sydney, NSW, Australa Table of contents Supplementary materal on proofs Pg Proof of Lemma Pg Proof of Lemma Pg Proof of Theorem 3 Pg 3 Proof of Lemma 4 Pg 4 Proof of Lemma 5 Pg 6 Mean Map estmator s Lemma and Proof Pg 8 Proof of Theorem 6 Pg 9 Proof of Lemma 7 Pg 3 Proof of Theorem 8 Pg 3 Supplementary materal on experments Pg 7 Full Expermental Setup Pg 7 Smulated Doman for Volaton of Homogenety Assumpton Pg 8 Smulated Doman from [] Pg 8 Addtonal Tests on alter- SVM [] Pg 8 Scalablty Pg 9 Full Results on Small Domans Pg 9

11 Supplementary Materal on Proofs Proof of Lemma For any SPSL F (S, h), we can wrte t as ([], Lemma, [3]): F (S, h) F φ (S, h) D φ (y m φ (h(x ))), () where y ff y and 0 otherwse, φ s permssble and D φ s the Bregman dvergence wth generator φ [3] It also holds that: D φ (y φ (h(x ))) b φ F φ (yh(x)) wth: F φ (x) φ ( x) + φ(0) φ(0) φ(/) a φ + φ ( x), () b φ and φ s the convex conjugate of φ, e φ (x) xφ (x) φ(φ (x)) Furthermore, for any permssble φ, the conjex conjugate φ (x) verfes the property φ ( x) φ (x) x, (3) and so we get that: F (S, h) D φ (y m φ (h(x ))) b φ m b φ m b φ m b φ m b φ m b φ m F φ (y h(x )) ( F φ (y h(x )) + ) F φ (y h(x )) ( F φ (y h(x )) + ) F φ ( y h(x )) y h(x ) b φ F φ (yh(x )) y h(x ) m y {,+} ( ) F φ (σh(x )) h y x m σ {,+} σ {,+} F φ (σh(x )) h (µ S) (6) (4) holds because of (3), (5) holds because h s lnear So for any samples S and S wth respectve sze m and m, we have (agan usng the property that h s lnear): ( ) F (S, h) F (S, h) b φ F φ (σh(x )) m m F φ (σh(x )) x S x S σ {,+} whch yelds the statement of the Lemma Proof of Lemma Usng the fact that D w and L are symmetrc, we have: l(l, X) X + h (µ S µ S ), (7) X tr ( B D w Π ) X + X tr ( X ΠD w Π ) X + γ X tr ( X ) LX ΠD w B + ΠD w Π X + γlx 0, out of whch B± follows n Lemma (4) (5)

12 3 Proof of Theorem 3 We let Π o [DIAG(ˆπ) DIAG(ˆπ )] N an orthonormal system (n jj (ˆπ j +( ˆπ j) ) /, j [n] and 0 otherwse) Let K Πo be the n-dm subspace of R d generated by Π o The proof of Theorem (3) explots the followng Lemma, whch assumes that ε s any > 0 real for L n (8) (man fle) to be 0 When ε 0, the result of Theorem (3) stll holds but follows a dfferent proof Lemma Let A ΠD w Π and L defned as n (8) (man paper) Denote for short U ( L A + γ I ) (8) Suppose there exsts ξ > 0 such that for any x R n, the projecton of Ux n K Πo, x U,o, satsfes Then: Proof Combnng Lemma and (5), we get x U,o ξ x (9) M M F γξ B ± F (0) B ± B± Defne the followng permutaton matrx: C ( ) (A + γl) A I B ± ( (γl) A + I ) B ± () [ 0 I I 0 ] R n n () A ΠD w Π s not nvertble but dagonalsable Its (orthonormal) egenvectors can be parttoned n two matrces P o and P such that: We have: P o P [DIAG(ˆπ ) DIAG(ˆπ)] N CΠ o R n n (egenvalues 0), (3) ΠN R n n (egenvalues w j (ˆπ j + ( ˆπ j) ), j) (4) M M P o CB ± P o C B± P ( o C (γl) A + ) I B ± Π ( o (γl) A + ) I B ± (5) γπ ( o L A + γ ) I B ± (6) Eq (5) follows from the fact that C s dempotent Pluggng Frobenus norm n (6), we obtan M M F γ Π ( o L A + γ ) I B ± F γ d k Π o ( L A + γ I ) b ± k d γ ξ b ± k (7) k γ ξ B ± F, whch yelds (0) In (7), b ± k denotes column k n B± Ineq (7) makes use of assumpton (9) To ensure x U,o ξ x, t s suffcent that Ux ξ x, and snce Ux U F x, t s suffcent to show that, (8) U ξ F 3

13 wth U ξ L ξ A + ξγ I, for relevant choces of ξ We have let L ξ (/ξ)l Let 0 λ () λ n () denote the ordered egenvalues of a postve-semdefnte matrx n R n n It follows that, snce L s symmetrc postve defnte, we have λ j (L ξ A) λ j(a) λ n (L ξ ) ( 0), j [n] We have used eq (3) Weyl s Theorem then brngs: λ j (U ξ ) λ n (L ξ ) λ j (A) + ξγ λ n (L ξ ) { ξ γ f j [n] λ n(l ξ ) λ j(a) otherwse (9) Gershgorn s Theorem brngs λ n (/ξ)(ε + max j j l jj ), and furthermore the egenvalues of A satsfy λ j w j /, j n + We thus have: U ξ F nγ ξ ) 4n (ε + max j j l + jj ξ mn j wj (0) In (9) and (0), we have used the egenvalues of A gven n eqs (3) and (4) Assumng: γ ξ n, () a suffcent condton for the rght-hand sde of (0) to be s that ξ ε + max j j l jj n mn j w j () To fnsh up the proof, recall that L D V wth d jj j,j v jj and the coordnates v jj 0 Hence, l jj j j j v jj n max v jj, j [n] j j The proof s fnshed by pluggng ths upperbound n () to choose ξ, then takng the maxmal value for γ n () and fnally solvng the upperbound n (0) Ths ends the proof of Theorem 3 4 Proof of Lemma 4 We frst consder the normalzed assocaton crteron n (0): ASSOC(S j, S j ) vjj N ( ASSOC(Sj, S j ) ASSOC(S j, S j S j ) + x S j,x S j ASSOC(S ) j, S j ) ASSOC(S j, S j S j ) x x (3), 4

14 Remark that b j b j x x m j m j x S j x S j m x + j x S j m j x S j m + j m j m j x S j x x S j x + m j m j x S j,x S j x S j m j x S j x x x x x m j m j x S j m j m j x m j m j x S j,x S j x S j,x S j x x x S j x x x (4) + m j x m j m + m j x j m j m x x j m j m j x S j x S j x S j,x S j } {{ } a x x (5) m j m j x S j,x S j ASSOC(S j, S j ) (6) m j m j ( n ) ( Eq (4) explots the fact that j a n ) j n j a j and eq (5) explots the fact that a (m j m j ) x S j,x S x j x We thus have: ASSOC(S j, S j ) ASSOC(S j, S j S j ) ASSOC(S j, S j ) ASSOC(S j, S j ) + ASSOC(S j, S j ) ASSOC(S j, S j ) ASSOC(S j, S j ) + mjm j b j b j κ m j κ m j + mjm j b j b j + m j κ b j b j 5 (7) (8) (9)

15 Eq (7) uses (6) and eq (8) uses assumpton (D) Eq (8) also holds when permutng j and j, so we get: ( ) ς(v NC ε, B ± ) max j j n + + mj κ b j b j + + m j κ b j b j B ± F ( ) ε n + B ± mnj mj F + κ mn j,j b j b j ( ) ε n + B ± mnj mj F (30) + κ mn j,j b j b j ε n d max σ,j bσ j + 4κ d max σ,j b σ j mn j,j b j b j ε n d max 4κ d σ,j bσ j + κ max σ,j b σ j ) f (max NC σ,j bσ j o(), (3) where the last nequalty uses assumpton (D), and (30) uses the property that (a+b) a +b We have let f NC (x) ε n dx + 4κ d κx, (3) whch s ndeed o() f ε o(n / x) Ths proves the Lemma for ς(v NC, B ± ) The case of ς(v G,s, B ± ) s easer, as ( exp b ) ( j b j exp mn j,j b j b j ) s s ( exp κ ) s max σ,j bσ j, from assumpton (D) alone, whch gves ( ( ε ς(v G,s, B ± ) B ± F n + exp κ )) s max σ,j bσ j ( ( ε B ± F n + exp κ )) s max σ,j bσ j ( ( ε d max σ,j bσ j n + exp κ )) s max σ,j bσ j ) f (max G σ,j bσ j o(), (33) as clamed We have let f G (x) ε n dx+dx exp( κx/s), whch s ndeed o() f ε o(n / x) Remark that we shall have n general f G (x) f NC (x) and even f G (x) o(f NC (x)) f ε 0, so we may expect better convergence n the case of V G,s as max σ,j b σ j grows 5 Proof of Lemma 5 We frst restate the Lemma n a more explct way, that shall provde explct values for κ l and κ n Lemma There exst κ jj and s jj dependng on d j, d j, and κ jj > dependng on m j, m j, such that: 6

16 If v G,s jj jj > exp( /4) then S j, S j are not lnearly separable; If v G,s jj jj < exp( 64) then S j, S j are lnearly separable; If v NC jj If v NC jj > κ jj then S j, S j are not lnearly separable; < κ jj /κ jj then S j, S j are lnearly separable Proof We frst consder the normalzed assocaton crteron n (0), and we prove the Lemma for the followng expressons of κ jj and κ jj : κ jj d jj + d jj d j d j, (34) κ jj 5 max{m j, m j }, (35) wth d jj max{d j, d j } and d j max x,x S j x x, j j [n] For any bag S j, we let (b j, r j) MEB(S j ) denote the mnmum enclosng ball (MEB) for bag S j and dstance L, that s, r j s the smallest unque real such that!b j : d(x, b j ) x b j r j, x S j We have let d(x, b j ) x b j We are gong to prove a frst result nvolvng the MEBs of S j and S j, and then wll translate the result to the Lemma s statement The followng propertes follows from standard propertes of MEBs and the fact that d(, ) s a dstance (they hold for any j j ): (a) d(x, x ) r j, x, x S j ; (b) If bags S j and S j are lnearly separable, then x CO(S j ), x S j such that d(x, x ) max{r j, r j }; here, CO denotes the convex closure; (c) If bags S j and S j are lnearly separable, then d(b j, b j ) max{r j, r j }, where b j and b j are the bags average; (d) x S j, x S j st d(x, x ) r j ; (e) d(x, x ) max{r j, r j } + d(b j, b j ), x CO(S j), x CO(S j ) Let us defne ASSOC(S j, S j ) d (x, x ) (36) x S j,x S j We remark that, assumng that each bag contans at least two elements wthout loss of generalty: vjj NC + (37) + ASSOC(Bj,B j ) ASSOC(B j,b j) + ASSOC(Bj,B j ) ASSOC(B j,b j ) We have ASSOC(S j, S j ) 4m j rj and ASSOC(S j, S j ) 4m j r j (because of (a)), and also ASSOC(S j, S j ) max{m j, m j } max{rj, r j } when S j and S j are lnearly separable (because of (b)), whch yelds n ths case vjj NC + + max{mj,m j } max{r j,r j } m jrj + max{r j,r j } r j + + max{mj,m j } max{r j,r j } m j r j + max{r j,r j } r j (38) Let us name κ jj the rght-hand sde of (38) It follows that when vnc jj > κ jj, S j and S j are not lnearly separable 7

17 On the other hand, we have ASSOC(S j, S j ) m j rj and ASSOC(S j, S j ) m j r j (because of (d)), and also ASSOC(S j, S j ) m j m j ( max{r j, r j } + d(b j, b j )) m j m j (4 max{rj, rj } + d (b j, b j )), (39) because of (e) and the fact that (a + b) a + b It follows that j j : vjj NC + (40) + m j (4 max{r j,r j }+d (b j,b j )) + mj(4 max{r j,r j }+d (b j,b j )) rj r j For any j j, when d (b j, b j ) 4 max{r j, r j }, then we have from (40): vjj NC + + 6m j max{r j,r j } + 6mj max{r j,r j } rj r j > κ jj /(3 max{m j, m j }) (4) Hence, when vjj NC κ jj /(3 max{m j, m j }), t mples d(b j, b j ) > max{r j, r j }, mplyng d(b j, b j ) > r j + r j, whch s a suffcent condton for the lnear separablty of S j and S j So, we can relate the lnear separablty of S j and S j to the value of vjj NC wth respect to κ jj defned n (38) To remove the dependence n the MEB parameters and obtan the statement of the Lemma, we just have to remark that d j /4 r j 4d j, j [n], whch yelds κ jj /6 κ jj κ jj Hence, when vjj NC > κ jj, t follows that vnc jj > κ jj and S j and S j are not lnearly separable On the other hand, when vjj NC κ jj /(6 3 max{m j, m j }) κ jj /κ jj, then vjj NC κ jj /(3 max{m j, m j }) and the bags S j and S j are lnearly separable Ths acheves the proof of Lemma 5 for the normalzed assocaton crteron n (0) The proof for v G,s jj s shorter, and we prove t for s j,j max{d j, d j } (4) We have (/) max{d j, d j } max{r j, r j } max{d j, d j } Hence, because of (c) above, f S j and S j are lnearly separable, then v G,s jj /e/4 ; so, when v G,s jj > /e/4, the two bags are not lnearly separable On the other hand, f d(b j, b j ) max{r j, r j }, then because of (e) above d(b j, b j ) 4 max{r j, r j } 8 max{d j, d j }, and so v G,s jj /e64 Ths mples that f v G,s jj < /e64, then d(b j, b j ) > max{r j, r j } r j + r j, and thus the two bags are lnearly separable, as clamed Ths acheves the proof of Lemma Ths acheves the proof of Lemma 5 6 Mean Map estmator s Lemma and Proof It s not hard to check that the randomzed procedure that bulds µ S RAND yx for some random x S and y {, } guarantees O( + γ) approxmablty when some bags are close to the convex hull of S, for small γ > 0 Hence, the Mean Map estmaton of µ S can be very poor n that respect Lemma 3 For any γ > 0, the Mean Map estmator µ S MM µ S / max σ,j b σ j γ, even when (D + D) hold cannot guarantee µ MM S Proof Let x > 0, ɛ (0, ), p (0, ), p / We create a dataset from four observatons, {(x 0, ), (x 0, ), (x 3 x, ), (x 4 x, )} There are two bags, S takes ɛ of x and ɛ of x S takes ɛ of x 4 and ɛ of x 3 The label-wse estmators µ σ of [4] are soluton of ( [ ] [ ] ɛ ɛ ɛ ɛ [ µ µ ] ɛ ɛ ɛ [ ( ɛ)x ɛx ] ɛ 8 ɛ ] ) [ ɛ ɛ ɛ ɛ ] [ x 0 (43)

18 On the other hand, the true quanttes are: [ ] µ µ [ ( ɛ)x ɛx ] (44) We now mx classes n S and pck bag proportons q P S [S ] and q P S [S ] We have the class proportons defned by P S [y +] ɛq + ( ɛ)( q) p Then ( ) ( ) µ S µ S p( ɛ) ɛ x ( p)ɛ ɛ x ɛ p ɛ ɛ x ɛ( q)x (45) Furthermore, max b σ x We get µ S µ S max b σ ɛ( q) (46) Pckng ɛ and ( q) both > (γ/) s suffcent to have eq (46) > γ for any γ > 0 Remark that both assumptons (D) and (D) hold for any κ < and any κ > 0 7 Proof of Theorem 6 The proof of the Theorem nvolves two Lemmata, the frst of whch s of ndependent nterest and holds for any convex twce dfferentable functon F, and not just any F φ So, let us defne: ( ) b F (S y, θ, µ) F (σθ x ) m θ µ (47) where b s any fxed postve real Defne also the regularzed loss: F (S y, θ, µ, λ) F (S y, θ, µ) + λ θ (48) Let f k R m denote the vector encodng the k th varable n S : f k x k For any k [d], let ( d f k σ k f k denote a normalzaton of vectors f k n the sense that d f k ( d d k ( d k f k f k k ) d d fk (49) ) d ) d k f k (50) Let Ṽ collect all vectors f k n column and V collect all vectors f k n column Wthout loss of generalty, we assume V V 0, e V V postve defnte (e no feature s a lnear combnaton of the others), mplyng, because the columns of Ṽ are just postve rescalng of the columns of V, that Ṽ Ṽ 0 as well We use V nstead of F as n the man paper, n order not to counfound wth the general convex surrogate notaton F that we use here Lemma 4 Gven any two µ and µ, let θ and θ be the respectve mnmzers of F (S y,, µ, λ) and F (S y,, µ, λ) Suppose there exsts F > 0 such that surrogate F satsfes F (±(αθ + ( α)θ ) x ) F, α [0, ], [m] (5) Then the followng holds: θ θ λ + em F vol (Ṽ) µ µ, (5) where vol(ṽ) det Ṽ Ṽ denote the volume of the (row/column) system of Ṽ 9

19 Proof Our proof begns followng the same frst steps as the proof of Lemma 7 n [5], addng the steps that handle the lowerbound on F Consder the followng auxlary functon A F (τ ): A F (τ ) ( F (S y, θ, µ) F (S y, θ, µ ) ) (τ θ ) + λ τ θ, (53) where the gradent of F s computed wth respect to parameter θ The gradent of A F () s: The gradent of A F satsfes A F (τ ) F (S y, θ, µ) F (S y, θ, µ ) + λ(τ θ ), (54) A F (θ ) F (S y, θ, µ, λ) F (S y, θ, µ, λ) 0, (55) as both gradents n the rght are 0 because of the optmalty of θ and θ wth respect to F (S y,, µ, λ) and F (S y,, µ, λ) The Hessan H of A F s HA F (τ ) λi 0 and so A F s convex and s thus mnmal at τ θ Fnally, A F (θ ) 0 It comes thus A F (θ ) 0, whch yelds equvalently: 0 ( F (S y, θ, µ) F (S y, θ, µ ) ) (θ θ ) + λ θ θ ( ) b F (yθ x ) m µ b F (yθ x ) + m µ (θ θ ) y y +λ θ θ ( b F (yθ x ) ) F (yθ x ) (θ θ m ) y y } {{ } a (µ µ ) (θ θ ) + λ θ θ (56) Let us lowerbound a We have F (yθ x) yf (yθ x)x, and a Taylor expanson brngs that for any θ, θ, there exsts some α [0, ] such that, defnng we have: We thus get: a u α, y(αθ + ( α)θ ) x, (57) F (yθ x ) F (yθ x ) + y(θ θ ) x F (u α, ) (58) ( F (yθ x ) y y ( y ) F (yθ x ) (θ θ ) y(f (yθ x ) F (yθ x ))x ) (θ θ ) ( ) (θ θ ) x F (u α, )x (θ θ ) y ((θ θ ) x ) F (u α, ) F ((θ θ ) x ) (59) F (θ θ ) SS (θ θ ), (60) where matrx S R d m s formed by the observatons of S y n columns, and neq (59) comes from (5) Defne T (d/ x )SS Its trace satsfes tr (T) d Let λ d λ d λ > 0 0

20 denote egenvalues of T, wth λ strctly postve because SS V V 0 The AGH nequalty brngs: Multplyng both sde by λ and rearrangng yelds: d λ k ( ) d d λ k (6) d k ( ) d tr (T) λ d ( ) d d λ d ( ) d d (6) d λ ( ) d d det T (63) d Let λ > 0 denote the mnmal egenvalue of SS It satsfes λ ( x /d)λ and thus t comes from neq (63): ( ) d ( ) d d d λ d x det SS ( ) [ d ( ) ] d d d det d x SS ( ) d d det Ṽ Ṽ (64) d ( ) d d vol (Ṽ) (65) d e vol (Ṽ) (66) We have used notaton vol(ṽ) det Ṽ Ṽ Snce (θ θ ) SS (θ θ ) λ θ θ, combnng (60) wth (66) yelds the followng lowerbound on a: Gong back to (56), we get λ θ θ (µ µ ) (θ θ ) + a e F vol (Ṽ) θ θ (67) b em F vol (Ṽ) θ θ 0 Snce (µ µ ) (θ θ ) µ µ θ θ, we get after channg the nequaltes and solvng for θ θ : as clamed θ θ λ + em F vol (Ṽ) µ µ, The second Lemma s used to (5) when F (x) F φ Notce that we cannot rely on strong convexty arguments on F φ, as ths do not hold n general The Lemma s stated n a more general settng than for just F F φ

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

1 Approximation Algorithms

1 Approximation Algorithms CME 305: Dscrete Mathematcs and Algorthms 1 Approxmaton Algorthms In lght of the apparent ntractablty of the problems we beleve not to le n P, t makes sense to pursue deas other than complete solutons

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering Lecture 7a Clusterng Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Clusterng Groups together smlar nstances n the data sample Basc clusterng problem: dstrbute data nto k dfferent groups such that

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting
