(Almost) No Label No Cry


Giorgio Patrini (1,2), Richard Nock (1,2), Paul Rivera (1,2), Tiberio Caetano (1,3,4)
(1) Australian National University, (2) NICTA, (3) University of New South Wales, (4) Ambiata
Sydney, NSW, Australia

Abstract

In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy-preserving data processing. We first show that the mean operator, a statistic which aggregates all labels, is minimally sufficient for the minimization of many proper scoring losses with linear (or kernelized) classifiers without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. Then, we present an iterative learning algorithm that uses this as initialization. We ground this algorithm in Rademacher-style generalization bounds that fit the LLP setting, introducing a generalization of Rademacher complexity and a Label Proportion Complexity measure. This latter algorithm optimizes tractable bounds for the corresponding bag-empirical risk. Experiments are provided on fourteen domains, whose size ranges up to 300K observations. They show that our algorithms are scalable and tend to consistently outperform the state of the art in LLP. Moreover, in many cases, our algorithms compete with, or are just percents of AUC away from, the Oracle that learns knowing all labels. On the largest domains, half a dozen proportions can suffice, i.e. roughly 40K times less than the total number of labels.

1 Introduction

Machine learning has recently experienced a proliferation of problem settings that, to some extent, enrich the classical dichotomy between supervised and unsupervised learning. Cases such as multiple instance labels, noisy labels and partial labels, as well as semi-supervised learning, have been studied, motivated by applications where fully supervised learning is no longer realistic. In the present work, we are interested in learning a binary classifier from information provided at the level of groups of instances, called bags. The type of information we assume available is the label proportions per bag, indicating the fraction of positive binary labels of its instances. Inspired by [1], we refer to this framework as Learning with Label Proportions (LLP). Settings that perform a bag-wise aggregation of labels include Multiple Instance Learning (MIL) [2]. In MIL, the aggregation is logical rather than statistical: each bag is provided with a binary label expressing an OR condition on all the labels contained in the bag. More general settings also exist [3] [4] [5].

Many practical scenarios fit the LLP abstraction. (a) Only aggregated labels can be obtained due to the physical limits of measurement tools [6] [7] [8] [9]. (b) The problem is semi- or unsupervised but domain experts have knowledge about the unlabelled samples in the form of expectation, as pseudo-measurement [5]. (c) Labels existed once but they are now given in an aggregated fashion for privacy-preserving reasons, as in medical databases [10], fraud detection [11], house price market, election results, census data, etc. (d) This setting also arises in computer vision [12] [13] [14].

Related work. The setting was first introduced by [12], where a principled hierarchical model generates labels consistent with the proportions and is trained through MCMC. Subsequently, [9] and its follower [6] offer a variety of standard learning algorithms designed to generate self-consistent labels. [15] gives a Bayesian interpretation of LLP where the key distribution is estimated through an RBM. Other ideas rely on structural learning of Bayesian networks with missing data [7], and on K-MEANS clustering to solve preliminary label assignment [13] [8]. Recent SVM implementations [11] [16] outperform most of the other known methods. Theoretical works on LLP belong to two main categories. The first contains uniform convergence results, for the estimators of label proportions [1], or the estimator of the mean operator [17]. The second contains approximation results for the classifier [17]. Our work builds upon their Mean Map algorithm, which relies on the trick that the logistic loss may be split in two: a convex part depending only on the observations, and a linear part involving a sufficient statistic for the label, the mean operator. Being able to estimate the mean operator means being able to fit a classifier without using labels. In [17], this estimation relies on a restrictive homogeneity assumption that the class-conditional estimation of features does not depend on the bags. Experiments display the limits of this assumption [11] [16].

Contributions. In this paper we consider linear classifiers, but our results hold for kernelized formulations following [17]. We first show that the trick about the logistic loss can be generalized, and the mean operator is actually minimally sufficient for a wide set of "symmetric" proper scoring losses with no class-dependent misclassification cost, which encompass the logistic, square and Matsushita losses [18]. We then provide an algorithm, LMM, which estimates the mean operator via a Laplacian-based manifold regularizer without calling on the homogeneity assumption. We show that under a weak distinguishability assumption between bags, our estimation of the mean operator is all the better as the norm of the observations increases. This, as we show, cannot hold for the Mean Map estimator. Then, we provide a data-dependent approximation bound for our classifier with respect to the optimal classifier, which is shown to be better than previous bounds [17]. We also show that the manifold regularizer's solution is tightly related to the linear separability of the bags. We then provide an iterative algorithm, AMM, that takes as input the solution of LMM and optimizes it further over the set of consistent labelings. We ground the algorithm in a uniform convergence result involving a generalization of Rademacher complexities for the LLP setting. The bound involves a bag-empirical surrogate risk for which we show that AMM optimizes tractable bounds. All our theoretical results hold for any symmetric proper scoring loss. Experiments are provided on fourteen domains, ranging from hundreds to hundreds of thousands of examples, comparing AMM and LMM to their contenders: Mean Map, InvCal [11] and ∝SVM [16]. They show that AMM and LMM outperform their contenders, and sometimes even compete with the fully supervised learner while requiring only a few proportions. Tests on the largest domains display the scalability of both algorithms. Such experimental evidence seriously questions the safety of privacy-preserving summarization of data, whenever accurate aggregates and informative individual features are available.

Section (2) presents our algorithms and related theoretical results. Section (3) presents experiments. Section (4) concludes. A Supplementary Material [19] includes proofs and additional experiments.

2 LLP and the mean operator: theoretical results and algorithms

Learning setting. Hereafter, boldfaces like $p$ denote vectors, whose coordinates are denoted $p_l$ for $l = 1, 2, \dots$. For any $m \in \mathbb{N}_*$, let $[m] = \{1, 2, \dots, m\}$. Let $\Sigma_m = \{\sigma \in \{-1, 1\}^m\}$ and $X \subseteq \mathbb{R}^d$. Examples are couples (observation, label) $\in X \times \Sigma_1$, sampled i.i.d. according to some unknown but fixed distribution $D$. Let $S = \{(x_i, y_i), i \in [m]\} \sim D_m$ denote a size-$m$ sample. In Learning with Label Proportions (LLP), we do not observe $S$ directly but $S_{|y}$, which denotes $S$ with labels removed; we are given its partition into $n > 0$ bags, $S_{|y} = \cup_j S_j$, $j \in [n]$, along with their respective label proportions $\hat{\pi}_j = \hat{P}[y = +1 \mid S_j]$ and bag proportions $\hat{p}_j = m_j/m$ with $m_j = \mathrm{card}(S_j)$. (This generalizes to a cover of $S$, by copying examples among bags.) The "bag assignment function" that partitions $S$ is unknown but fixed. In real-world domains, it would rather be known, e.g. state, gender, age band. A classifier is a function $h : X \to \mathbb{R}$, from a set of classifiers $H$. $H_L$ denotes the set of linear classifiers, noted $h_\theta(x) = \theta^\top x$ with $\theta \in X$. A (surrogate) loss is a function $F : \mathbb{R} \to \mathbb{R}_+$. We let $F(S, h) = (1/m) \sum_i F(y_i h(x_i))$ denote the empirical surrogate risk on $S$ corresponding to loss $F$. For the sake of clarity, indexes $i$, $j$ and $k$ respectively refer to examples, bags and features.

The mean operator and its minimal sufficiency. We define the (empirical) mean operator as:

$$\mu_S = \frac{1}{m} \sum_i y_i x_i. \qquad (1)$$
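To make the notation of the learning setting concrete, here is a minimal sketch that computes the mean operator of eq. (1) together with the label proportions $\hat{\pi}_j$ and bag proportions $\hat{p}_j$ on a toy sample. The function name `bag_statistics` and the toy data are illustrative assumptions, not part of the paper; note that an LLP learner would only ever see the features, the bag assignment and the proportions, never `ys` or `mu_S` itself.

```python
def bag_statistics(xs, ys, bags):
    """Return (mu_S, pi_hat, p_hat) for a labeled sample split into bags.

    xs   : list of feature vectors (lists of floats)
    ys   : list of labels in {-1, +1} (hidden from an LLP learner)
    bags : list of bag indices, one per example
    """
    m, d = len(xs), len(xs[0])
    # Mean operator, eq. (1): mu_S = (1/m) * sum_i y_i * x_i
    mu = [sum(y * x[k] for x, y in zip(xs, ys)) / m for k in range(d)]
    pi_hat, p_hat = {}, {}
    for j in sorted(set(bags)):
        labels = [y for y, b in zip(ys, bags) if b == j]
        # Fraction of positive labels in bag j
        pi_hat[j] = sum(1 for y in labels if y == 1) / len(labels)
        # Fraction of the sample that falls in bag j
        p_hat[j] = len(labels) / m
    return mu, pi_hat, p_hat


# Toy sample: four examples in two bags (hypothetical data).
xs = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
ys = [1, -1, 1, 1]
bags = [0, 0, 1, 1]
mu_S, pi_hat, p_hat = bag_statistics(xs, ys, bags)
```

On this toy sample the first bag is evenly mixed and the second is purely positive, which is exactly the information the proportions expose.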
Algorithm 1 Laplacian Mean Map (LMM)
Input $S_j$, $\hat{\pi}_j$, $j \in [n]$; $\gamma > 0$ (7); $w$ (7); $V$ (8); permissible $\phi$ (2); $\lambda > 0$;
Step 1 : let $\tilde{B}^\pm \leftarrow \arg\min_{X \in \mathbb{R}^{2n \times d}} \ell(L, X)$ using (7) (Lemma 2)
Step 2 : let $\tilde{\mu}_S \leftarrow \sum_j \hat{p}_j (\hat{\pi}_j \tilde{b}_j^{+} - (1 - \hat{\pi}_j) \tilde{b}_j^{-})$
Step 3 : let $\tilde{\theta}_* \leftarrow \arg\min_\theta F_\phi(S_{|y}, \theta, \tilde{\mu}_S) + \lambda \|\theta\|_2^2$ (3)
Return $\tilde{\theta}_*$

Table 1: Correspondence between permissible functions $\phi$ and the corresponding loss $F_\phi$.

loss name          $F_\phi(x)$                $-\phi(x)$
logistic loss      $\log(1 + \exp(-x))$       $-x \log x - (1 - x) \log(1 - x)$
square loss        $(1 - x)^2$                $x(1 - x)$
Matsushita loss    $-x + \sqrt{1 + x^2}$      $\sqrt{x(1 - x)}$

The estimation of the mean operator $\mu_S$ appears to be a learning bottleneck in the LLP setting [17]. The fact that the mean operator is sufficient to learn a classifier without the label information motivates the notion of minimal sufficient statistic for features in this context. Let $\mathcal{F}$ be a set of loss functions, $H$ be a set of classifiers, $I$ be a subset of features. Some quantity $t(S)$ is said to be a minimal sufficient statistic for $I$ with respect to $\mathcal{F}$ and $H$ iff: for any $F \in \mathcal{F}$, any $h \in H$ and any two samples $S$ and $S'$, the quantity $F(S, h) - F(S', h)$ does not depend on $I$ iff $t(S) = t(S')$. This definition can be motivated from the one in statistics by building losses from log likelihoods. The following Lemma motivates further the mean operator in the LLP setting, as it is the minimal sufficient statistic for a broad set of proper scoring losses that encompass the logistic and square losses [18]. The proper scoring losses we consider, hereafter called "symmetric" (SPSL), are twice differentiable, non-negative and such that misclassification cost is not label-dependent.

Lemma 1 $\mu_S$ is a minimal sufficient statistic for the label variable, with respect to SPSL and $H_L$.

([19], Subsection 2.1) This property, very useful for LLP, may also be exploited in other weakly supervised tasks [2]. Up to constant scalings that play no role in its minimization, the empirical surrogate risk corresponding to any SPSL, $F_\phi(S, h)$, can be written with loss:

$$F_\phi(x) = \frac{\phi(0) + \phi^\star(-x)}{\phi(0) - \phi(1/2)} = \frac{a_\phi + \phi^\star(-x)}{b_\phi}, \qquad (2)$$

and $\phi$ is a permissible function [20, 18], i.e. $\mathrm{dom}(\phi) \supseteq [0, 1]$, $\phi$ is strictly convex, differentiable and symmetric with respect to $1/2$. $\phi^\star$ is the convex conjugate of $\phi$.
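The reason labels can be factored out of these losses is that each one differs from its mirror image by a term that is linear in its argument: $F(-x) - F(x) = c\,x$. The sketch below checks this numerically for the three losses of Table 1, taken exactly as listed there (so up to the constant scaling mentioned in the text). The slopes in `SLOPES` were worked out by hand for this illustration and are not quoted from the paper.

```python
import math

# Losses F_phi(x) as listed in Table 1 (up to constant scalings).
LOSSES = {
    "logistic": lambda x: math.log(1.0 + math.exp(-x)),
    "square": lambda x: (1.0 - x) ** 2,
    "matsushita": lambda x: -x + math.sqrt(1.0 + x * x),
}

# Slope c such that F(-x) - F(x) = c * x, derived by hand for each loss:
#   logistic:   log(1 + e^x) - log(1 + e^-x) = x
#   square:     (1 + x)^2 - (1 - x)^2        = 4x
#   matsushita: (x + sqrt(1 + x^2)) - (-x + sqrt(1 + x^2)) = 2x
SLOPES = {"logistic": 1.0, "square": 4.0, "matsushita": 2.0}


def mirror_gap(name, x):
    """Return F(-x) - F(x) for the named loss."""
    loss = LOSSES[name]
    return loss(-x) - loss(x)


# The gap is linear in x for every loss in the table.
for name in LOSSES:
    for x in (-2.0, -0.5, 0.3, 1.7):
        assert abs(mirror_gap(name, x) - SLOPES[name] * x) < 1e-12
```

Because the gap is linear, summing it over a sample with a linear classifier leaves only a term in $\theta^\top \sum_i y_i x_i$, which is where the mean operator enters.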
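Table 1 shows examples of $F_\phi$. It follows from Lemma 1 and its proof that any $F_\phi(S, \theta)$ can be written for any $\theta \equiv h_\theta \in H_L$ as:

$$F_\phi(S, \theta) = \frac{b_\phi}{2m} \sum_i \left( \sum_\sigma F_\phi(\sigma \theta^\top x_i) \right) - \frac{1}{2} \theta^\top \mu_S = F_\phi(S_{|y}, \theta, \mu_S), \qquad (3)$$

where $\sigma \in \Sigma_1$.

The Laplacian Mean Map (LMM) algorithm. The sum in eq. (3) is convex and differentiable in $\theta$. Hence, once we have an accurate estimator of $\mu_S$, we can easily fit $\theta$ to minimize $F_\phi(S_{|y}, \theta, \mu_S)$. This two-step strategy is implemented in LMM in Algorithm 1. $\mu_S$ can be retrieved from $2n$ bag-wise, label-wise unknown averages $b_j^\sigma$:

$$\mu_S = \frac{1}{2} \sum_{j=1}^{n} \hat{p}_j \sum_{\sigma \in \Sigma_1} (2\hat{\pi}_j + \sigma(1 - \sigma)) b_j^\sigma, \qquad (4)$$

with $b_j^\sigma = E_S[x \mid \sigma, j]$ denoting these $2n$ unknowns (for $j \in [n]$, $\sigma \in \Sigma_1$), and let $b_j = (1/m_j) \sum_{x_i \in S_j} x_i$. The $2n$ $b_j^\sigma$s are solution of a set of $n$ identities that are (in matrix form):

$$B - \Pi^\top B^\pm = 0, \qquad (5)$$

where $B = [b_1 | b_2 | \dots | b_n]^\top \in \mathbb{R}^{n \times d}$, $\Pi = [\mathrm{DIAG}(\hat{\pi}) | \mathrm{DIAG}(1 - \hat{\pi})]^\top \in \mathbb{R}^{2n \times n}$ and $B^\pm \in \mathbb{R}^{2n \times d}$ is the matrix of unknowns:

$$B^\pm = [\underbrace{b_1^{+} | b_2^{+} | \dots | b_n^{+}}_{(B^{+})^\top} | \underbrace{b_1^{-} | b_2^{-} | \dots | b_n^{-}}_{(B^{-})^\top}]^\top. \qquad (6)$$

System (5) is underdetermined, unless one makes the homogeneity assumption that yields the Mean Map estimator [17]. Rather than making such a restrictive assumption, we regularize the cost that brings (5) with a manifold regularizer [21], and search for $\tilde{B}^\pm = \arg\min_{X \in \mathbb{R}^{2n \times d}} \ell(L, X)$, with:

$$\ell(L, X) = \mathrm{tr}\left( (B^\top - X^\top \Pi) D_w (B - \Pi^\top X) \right) + \gamma\, \mathrm{tr}\left( X^\top L X \right), \qquad (7)$$

and $\gamma > 0$. $D_w = \mathrm{DIAG}(w)$ is a user-fixed bias matrix with $w \in \mathbb{R}^n_{+,*}$ (and $w \neq \hat{p}$ in general) and:

$$L = \varepsilon I + \begin{bmatrix} L_a & 0 \\ 0 & L_a \end{bmatrix} \in \mathbb{R}^{2n \times 2n}, \qquad (8)$$

where $L_a = D - V \in \mathbb{R}^{n \times n}$ is the Laplacian of the bag similarities. $V$ is a symmetric similarity matrix with non-negative coordinates, and the diagonal matrix $D$ satisfies $d_{jj} = \sum_{j'} v_{jj'}$, $\forall j \in [n]$. The size of the Laplacian is $O(n^2)$, which is small compared to $O(m^2)$ if there are not many bags. One can interpret the Laplacian regularization as smoothing the estimates of $b_j^\sigma$ w.r.t. the similarity of the respective bags.

Lemma 2 The solution $\tilde{B}^\pm$ to $\min_{X \in \mathbb{R}^{2n \times d}} \ell(L, X)$ is $\tilde{B}^\pm = \left( \Pi D_w \Pi^\top + \gamma L \right)^{-1} \Pi D_w B$.

([19], Subsection 2.2) This Lemma explains the role of penalty $\varepsilon I$ in (8): $\Pi D_w \Pi^\top$ and $L$ have respectively $n$- and $(\geq 1)$-dim null spaces, so the inversion may not be possible. Even when this does not happen exactly, this may incur numerical instabilities in computing the inverse. For domains where this risk exists, picking a small $\varepsilon > 0$ solves the problem. Let $\tilde{b}_j^\sigma$ denote the row-wise decomposition of $\tilde{B}^\pm$ following (6), from which we compute $\tilde{\mu}_S$ following (4) when we use these $2n$ estimates in lieu of the true $b_j^\sigma$. We compare $\mu_j = \hat{\pi}_j b_j^{+} - (1 - \hat{\pi}_j) b_j^{-}$, $\forall j \in [n]$, to our estimates $\tilde{\mu}_j = \hat{\pi}_j \tilde{b}_j^{+} - (1 - \hat{\pi}_j) \tilde{b}_j^{-}$, $\forall j \in [n]$, granted that $\mu_S = \sum_j \hat{p}_j \mu_j$ and $\tilde{\mu}_S = \sum_j \hat{p}_j \tilde{\mu}_j$.

Theorem 3 Suppose that $\gamma$ satisfies $\gamma \sqrt{2} \leq ((\varepsilon (2n)^{-1}) + \max_{j \neq j'} v_{jj'}) / \min_j w_j$. Let $M = [\mu_1 | \mu_2 | \dots | \mu_n]^\top \in \mathbb{R}^{n \times d}$, $\tilde{M} = [\tilde{\mu}_1 | \tilde{\mu}_2 | \dots | \tilde{\mu}_n]^\top \in \mathbb{R}^{n \times d}$ and $\varsigma(V, B^\pm) = ((\varepsilon (2n)^{-1}) + \max_{j \neq j'} v_{jj'})^2 \|B^\pm\|_F$. The following holds:

$$\|M - \tilde{M}\|_F \leq \sqrt{n} \left( \sqrt{2} \min_j w_j^2 \right)^{-1} \times \varsigma(V, B^\pm). \qquad (9)$$

([19], Subsection 2.3) The multiplicative factor to $\varsigma$ in (9) is roughly $O(n^{5/2})$ when there is no large discrepancy in the bias matrix $D_w$, so the upper bound is driven by $\varsigma(\cdot, \cdot)$ when there are not many bags. We have studied its variations when the "distinguishability" between bags increases. This setting is interesting because in this case we may kill two birds with one stone, with the estimation of $M$ and the subsequent learning problem potentially easier, in particular for linear separators. We consider two examples for $v_{jj'}$, the first being (half) the normalized association [22]:

$$v^{nc}_{jj'} = \frac{1}{2} \left( \frac{\mathrm{ASSOC}(S_j, S_j)}{\mathrm{ASSOC}(S_j, S_j \cup S_{j'})} + \frac{\mathrm{ASSOC}(S_{j'}, S_{j'})}{\mathrm{ASSOC}(S_{j'}, S_j \cup S_{j'})} \right) = \frac{1}{2} \mathrm{NASSOC}(S_j, S_{j'}), \qquad (10)$$

$$v^{G,s}_{jj'} = \exp(-\|b_j - b_{j'}\|_2 / s), \quad s > 0. \qquad (11)$$

Here, $\mathrm{ASSOC}(S_j, S_{j'}) = \sum_{x \in S_j, x' \in S_{j'}} \|x - x'\|_2^2$ [22]. To put these two similarity measures in the context of Theorem 3, consider the setting where we can make assumption (D1) that there exists a small constant $\kappa > 0$ such that $\|b_j - b_{j'}\|_2^2 \geq \kappa \max_{\sigma, j} \|b_j^\sigma\|_2^2$, $\forall j, j' \in [n]$. This is a weak distinguishability property: if no such $\kappa$ exists, then the centers of distinct bags may just be confounded. Consider also the additional assumption, (D2), that there exists $\kappa' > 0$ such that $\max_j d_j^2 \leq \kappa'$, $\forall j \in [n]$, where $d_j = \max_{x_i, x_{i'} \in S_j} \|x_i - x_{i'}\|_2$ is a bag's diameter. In the following Lemma, the little-oh notation is with respect to the "largest" unknown in eq. (4), i.e. $\max_{\sigma, j} \|b_j^\sigma\|_2$.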
Algorithm 2 Alternating Mean Map (AMM^OPT)
Input LMM parameters + optimization strategy OPT $\in$ {min, max} + convergence predicate PR
Step 1 : let $\tilde{\theta}_0 \leftarrow$ LMM(LMM parameters) and $t \leftarrow 0$
Step 2 : repeat
  Step 2.1 : let $\sigma_t \leftarrow \arg \mathrm{OPT}_{\sigma \in \Sigma_{\hat{\pi}}} F_\phi(S_{|y}, \tilde{\theta}_t, \mu_S(\sigma))$
  Step 2.2 : let $\tilde{\theta}_{t+1} \leftarrow \arg\min_\theta F_\phi(S_{|y}, \theta, \mu_S(\sigma_t)) + \lambda \|\theta\|_2^2$
  Step 2.3 : let $t \leftarrow t + 1$
until predicate PR is true
Return $\tilde{\theta}_* = \arg\min_t F_\phi(S_{|y}, \tilde{\theta}_{t+1}, \mu_S(\sigma_t))$

Lemma 4 There exists $\varepsilon_* > 0$ such that $\forall \varepsilon \leq \varepsilon_*$, the following holds: (i) $\varsigma(V^{nc}, B^\pm) = o(1)$ under assumptions (D1 + D2); (ii) $\varsigma(V^{G,s}, B^\pm) = o(1)$ under assumption (D1), $\forall s > 0$.

([19], Subsection 2.4) Hence, provided a weak (D1) or stronger (D1+D2) distinguishability assumption holds, the divergence between $M$ and $\tilde{M}$ gets smaller with the increase of the norm of the unknowns $b_j^\sigma$. The proof of the Lemma suggests that the convergence may be faster for $V^{G,s}$. The following Lemma shows that both similarities also partially encode the hardness of solving the classification problem with linear separators, so that the manifold regularizer "limits" the distortion of the $\tilde{b}^\pm$s between two bags that tend not to be linearly separable.

Lemma 5 Take $v_{jj'} \in \{v^{G,\cdot}_{jj'}, v^{nc}_{jj'}\}$. There exists $0 < \kappa_l < \kappa_n < 1$ such that (i) if $v_{jj'} > \kappa_n$ then $S_j$, $S_{j'}$ are not linearly separable, and (ii) if $v_{jj'} < \kappa_l$ then $S_j$, $S_{j'}$ are linearly separable.

([19], Subsection 2.5) This Lemma is an advocacy to fit $s$ in a data-dependent way in $v^{G,s}_{jj'}$. The question may be raised as to whether finite-sample approximation results like Theorem 3 can be proven for the Mean Map estimator [17]. [19], Subsection 2.6 answers by the negative.

In the Laplacian Mean Map algorithm (LMM, Algorithm 1), Steps 1 and 2 have now been described. Step 3 is a differentiable convex minimization problem for $\theta$ that does not use the labels, so it does not present any technical difficulty. An interesting question is how much our classifier $\tilde{\theta}_*$ in Step 3 diverges from the one that would be computed with the true expression for $\mu_S$, $\theta_*$. It is not hard to show that Lemma 7 in Altun and Smola [23], and Corollary 9 in Quadrianto et al. [17] hold for LMM so that $\|\tilde{\theta}_* - \theta_*\|_2^2 \leq (2\lambda)^{-1} \|\tilde{\mu}_S - \mu_S\|_2^2$. The following Theorem shows a data-dependent approximation bound that can be significantly better, when it holds that $\theta_*^\top x_i, \tilde{\theta}_*^\top x_i \in \phi'([0, 1])$, $\forall i$ ($\phi'$ is the first derivative). We call this setting proper scoring compliance (PSC) [18]. PSC always holds for the logistic and Matsushita losses for which $\phi'([0, 1]) = \mathbb{R}$. For other losses like the square loss for which $\phi'([0, 1]) = [-1, 1]$, shrinking the observations in a ball of sufficiently small radius is sufficient to ensure this.

Theorem 6 Let $f_k \in \mathbb{R}^m$ denote the vector encoding the $k$th feature variable in $S$: $f_{ki} = x_{ik}$ ($k \in [d]$). Let $\tilde{F}$ denote the feature matrix with column-wise normalized feature vectors: $\tilde{f}_k = (d / \sum_{k'} \|f_{k'}\|_2^2)^{(d-1)/(2d)} f_k$. Under PSC, we have $\|\tilde{\theta}_* - \theta_*\|_2^2 \leq (2\lambda + q)^{-1} \|\tilde{\mu}_S - \mu_S\|_2^2$, with:

$$q = \frac{\det \tilde{F}^\top \tilde{F}}{m} \times \frac{2 e^{-1}}{b_\phi \phi''(\phi'^{-1}(q'/\lambda))} \quad (> 0), \qquad (12)$$

for some $q' \in I = [\pm (x_* + \max\{\|\mu_S\|_2, \|\tilde{\mu}_S\|_2\})]$. Here, $x_* = \max_i \|x_i\|_2$ and $\phi'' = (\phi')'$.

([19], Subsection 2.7) To see how large $q$ can be, consider the simple case where all eigenvalues of $\tilde{F}^\top \tilde{F}$, $\lambda_k(\tilde{F}^\top \tilde{F}) \in [\lambda_\circ \pm \delta]$ for small $\delta$. In this case, $q$ is proportional to the average feature "norm":

$$\frac{\det \tilde{F}^\top \tilde{F}}{m} = \frac{\mathrm{tr}\left( F^\top F \right)}{md} + o(\delta) = \frac{\sum_i \|x_i\|_2^2}{md} + o(\delta).$$
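Steps 1 and 2 of LMM reduce to one regularized linear solve followed by a weighted sum, which the following sketch spells out. It is an illustration under stated assumptions rather than the authors' implementation (which the paper says is written in R): the bias weights are set to `w = p_hat`, the similarity is a Gaussian-type kernel on bag centers in the spirit of eq. (11), the default values of `gamma`, `eps` and `s` are arbitrary, and a small pure-Python Gauss-Jordan routine stands in for a linear algebra library. Step 3 (fitting $\theta$) is omitted.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    a is k x k, b is k x d (lists of lists); returns x as k x d."""
    k = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [[aug[r][k + j] / aug[r][r] for j in range(len(b[0]))] for r in range(k)]


def lmm_mean_operator(bag_means, pi_hat, p_hat, gamma=0.1, eps=0.01, s=1.0):
    """Steps 1-2 of LMM: estimate the mean operator from bag-level data only."""
    n, d = len(bag_means), len(bag_means[0])
    w = list(p_hat)  # bias weights D_w; w = p_hat is an assumption of this sketch
    # Gaussian-type similarity between bag centers (assumed form, cf. eq. (11)).
    v = [[math.exp(-math.dist(bag_means[j], bag_means[k]) / s) for k in range(n)]
         for j in range(n)]
    # Laplacian L_a = D - V; L = eps * I + blockdiag(L_a, L_a), eq. (8).
    la = [[(sum(v[j]) if j == k else 0.0) - v[j][k] for k in range(n)]
          for j in range(n)]
    # Rows 0..n-1 of Pi hold DIAG(pi_hat), rows n..2n-1 hold DIAG(1 - pi_hat).
    coef = [pi_hat[j] for j in range(n)] + [1.0 - pi_hat[j] for j in range(n)]
    lhs = [[0.0] * (2 * n) for _ in range(2 * n)]
    rhs = [[0.0] * d for _ in range(2 * n)]
    for r in range(2 * n):
        for c in range(2 * n):
            jr, jc = r % n, c % n
            if jr == jc:  # (Pi D_w Pi^T)[r][c] is nonzero only within one bag
                lhs[r][c] += coef[r] * w[jr] * coef[c]
            if r // n == c // n:  # same diagonal block of L
                lhs[r][c] += gamma * la[jr][jc]
            if r == c:
                lhs[r][c] += gamma * eps
        rhs[r] = [coef[r] * w[r % n] * x for x in bag_means[r % n]]  # Pi D_w B
    b_pm = solve(lhs, rhs)  # closed form of Lemma 2
    # Step 2: mu = sum_j p_j * (pi_j * b_j^+ - (1 - pi_j) * b_j^-).
    mu = [sum(p_hat[j] * (pi_hat[j] * b_pm[j][k]
                          - (1.0 - pi_hat[j]) * b_pm[n + j][k]) for j in range(n))
          for k in range(d)]
    return mu, b_pm


# Two pure bags: the true mean operator is 0.5*[2, 0] - 0.5*[0, 2] = [1, -1].
mu_tilde, b_pm = lmm_mean_operator([[2.0, 0.0], [0.0, 2.0]], [1.0, 0.0],
                                   [0.5, 0.5], gamma=1e-6)
```

With pure bags the label-wise averages are fully determined by the bag means, so a very small `gamma` should recover the true mean operator almost exactly; with mixed bags the Laplacian term decides how the underdetermined system is resolved.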
The Alternating Mean Map (AMM) algorithm. Let us denote $\Sigma_{\hat{\pi}} = \{\sigma \in \Sigma_m : \sum_{i : x_i \in S_j} \sigma_i = (2\hat{\pi}_j - 1) m_j, \forall j \in [n]\}$ the set of labelings that are consistent with the observed proportions $\hat{\pi}$, and $\mu_S(\sigma) = (1/m) \sum_i \sigma_i x_i$ the biased mean operator computed from some $\sigma \in \Sigma_{\hat{\pi}}$. Notice that the true mean operator $\mu_S = \mu_S(\sigma)$ for at least one $\sigma \in \Sigma_{\hat{\pi}}$. The Alternating Mean Map algorithm (AMM, Algorithm 2) starts with the output of LMM and then optimizes it further over the set of consistent labelings. At each iteration, it first picks a consistent labeling in $\Sigma_{\hat{\pi}}$ that is the best (OPT = min) or the worst (OPT = max) for the current classifier (Step 2.1) and then fits a classifier $\tilde{\theta}$ on the given set of labels (Step 2.2). The algorithm then iterates until a convergence predicate is met, which tests whether the difference between two values for $F_\phi(\cdot, \cdot, \cdot)$ is too small (AMM^min), or the number of iterations exceeds a user-specified limit (AMM^max). The classifier returned, $\tilde{\theta}_*$, is the best in the sequence. In the case of AMM^min, it is the last of the sequence as risk $F_\phi(S_{|y}, \cdot, \cdot)$ cannot increase. Again, Step 2.2 is a convex minimization with no technical difficulty. Step 2.1 is combinatorial. It can be solved in time almost linear in $m$ [19] (Subsection 2.8).

Lemma 7 The running time of Step 2.1 in AMM is $\tilde{O}(m)$, where the tilde notation hides log-terms.

Bag-Rademacher generalization bounds for LLP. We relate the min and max strategies of AMM by uniform convergence bounds involving the true surrogate risk, i.e. integrating the unknown distribution $D$ and the true labels (which we may never know). Previous uniform convergence bounds for LLP focus on coarser-grained problems, like the estimation of label proportions [1]. We rely on an LLP generalization of Rademacher complexity [24, 25]. Let $F : \mathbb{R} \to \mathbb{R}_+$ be a loss function and $H$ a set of classifiers. The bag empirical Rademacher complexity of sample $S$, $R^b_m$, is defined as $R^b_m = E_{\sigma \sim \Sigma_m} \sup_{h \in H} \{E_{\sigma' \sim \Sigma_{\hat{\pi}}} E_S[\sigma(x) F(\sigma'(x) h(x))]\}$. The usual empirical Rademacher complexity equals $R^b_m$ for $\mathrm{card}(\Sigma_{\hat{\pi}}) = 1$. The Label Proportion Complexity of $H$ is:

$$L_{2m} = E_{D_{2m}} E_{I_1^{/2}, I_2^{/2}} \sup_{h \in H} E_S[\sigma_1(x)(\hat{\pi}^s_{|2}(x) - \hat{\pi}^\ell_{|1}(x)) h(x)]. \qquad (13)$$

Here, each of $I_l^{/2}$, $l = 1, 2$, is a random (uniformly) subset of $[2m]$ of cardinal $m$. Let $S(I_l^{/2})$ be the size-$m$ subset of $S$ that corresponds to the indexes. Take $l = 1, 2$ and any $x_i \in S$. If $i \notin I_l^{/2}$ then $\hat{\pi}^s_{|l}(x_i) = \hat{\pi}^\ell_{|l}(x_i)$ is $x_i$'s bag's label proportion measured on $S \backslash S(I_l^{/2})$. Else, $\hat{\pi}^s_{|2}(x_i)$ is its bag's label proportion measured on $S(I_2^{/2})$ and $\hat{\pi}^\ell_{|1}(x_i)$ is its label (i.e. a bag's label proportion that would contain only $x_i$). Finally, $\sigma_1(x) = 2 \cdot 1_{x \in S(I_1^{/2})} - 1 \in \Sigma_1$. $L_{2m}$ tends to be all the smaller as classifiers in $H$ have small magnitude on bags whose label proportion is close to $1/2$.

Theorem 8 Suppose $\exists h_* \geq 0$ s.t. $|h(x)| \leq h_*$, $\forall x, \forall h$. Then, for any loss $F_\phi$, any training sample of size $m$ and any $0 < \delta \leq 1$, with probability $> 1 - \delta$, the following bound holds over all $h \in H$:

$$E_D[F_\phi(y h(x))] \leq E_{\Sigma_{\hat{\pi}}} E_S[F_\phi(\sigma(x) h(x))] + 2 R^b_m + L_{2m} + 4 \left( \frac{2 h_*}{b_\phi} + 1 \right) \sqrt{\frac{1}{2m} \log \frac{2}{\delta}}. \qquad (14)$$

Furthermore, under PSC (Theorem 6), we have for any $F_\phi$:

$$R^b_m \leq 2 b_\phi E_{\Sigma_m} \sup_{h \in H} \{E_S[\sigma(x)(\hat{\pi}(x) - (1/2)) h(x)]\}. \qquad (15)$$

([19], Subsection 2.9) Despite similar shapes (13) (15), $R^b_m$ and $L_{2m}$ behave differently: when bags are pure ($\hat{\pi}_j \in \{0, 1\}, \forall j$), $L_{2m} = 0$. When bags are impure ($\hat{\pi}_j = 1/2, \forall j$), $R^b_m = 0$. As bags get impure, the bag-empirical surrogate risk, $E_{\Sigma_{\hat{\pi}}} E_S[F_\phi(\sigma(x) h(x))]$, also tends to increase. AMM^min and AMM^max respectively minimize a lower bound and an upper bound of this risk.

3 Experiments

Algorithms. We compare LMM and AMM ($F_\phi$ = logistic loss) to the original MM [17], InvCal [11], conv-∝SVM and alter-∝SVM [16] (linear kernels). To make experiments extensive, we test several initializations for AMM that are not displayed in Algorithm 2 (Step 1): (i) the edge mean map estimator, $\tilde{\mu}_S^{EMM} = 1/m^2 (\sum_i y_i)(\sum_i x_i)$ (AMM_EMM), (ii) the constant estimator $\tilde{\mu}_S^{1} = 1$ (AMM_1), and finally AMM_10ran, which runs 10 random initial models ($\|\theta_0\|_2 \leq 1$) and selects the one with smallest risk; this is the same procedure as alter-∝SVM. The matrix $V$ (eqs (10), (11)) used is indicated in subscript: LMM/AMM_G, LMM/AMM_{G,s}, LMM/AMM_nc respectively denote $v^{G,s}$ with $s = 1$, $v^{G,s}$ with $s$ learned on cross validation (CV; validation ranges indicated in [19]) and $v^{nc}$. For space reasons, results not displayed in the paper can be found in [19], Section 3 (including runtime comparisons, and detailed results by domain). We split the algorithms in two groups, one-shot and iterative. The latter, including AMM and (conv/alter)-∝SVM, iteratively optimize a cost over labelings (always consistent with label proportions for AMM, not always for (conv/alter)-∝SVM). The former (LMM, InvCal) do not and are thus much faster. Tests are done on a 4-core 3.2GHz CPUs Mac with 32GB of RAM. AMM/LMM/MM are implemented in R. Code for InvCal and ∝SVM is [16].

Figure 1: Relative AUC (wrt MM) as homogeneity assumption is violated (a). Relative AUC (wrt Oracle) vs entropy on heart for LMM (b), AMM^min (c). AUC vs n/m for AMM^min_G and the Oracle (d). [Panels: (a) AUC rel. to MM vs divergence, for MM, LMM_G, LMM_{G,s}, LMM_nc; (b) AUC rel. to Oracle, for MM, LMM_G, LMM_{G,s}, LMM_nc; (c) AUC rel. to Oracle, for AMM_MM, AMM_G, AMM_{G,s}, AMM_nc, AMM_10ran; (d) AUC vs #bag/#instance, for the Oracle and AMM_G on bigger and small domains.]

Table 2: Small domains results. #win/#lose for row vs column. Bold faces means p-val < .001 for Wilcoxon signed-rank tests. Top-left subtable is for one-shot methods, bottom-right iterative ones, bottom-left compare the two. Italic is state-of-the-art. Grey cells highlight the best of all (AMM^min_G). [Rows and columns: MM, LMM (G; G,s; nc), InvCal, AMM^min (MM; G; G,s; 10ran), AMM^max (MM; G; G,s; 10ran), conv-∝SVM, alter-∝SVM; the cell values are not recoverable from this transcription.]

Simulated domains, MM and the homogeneity assumption. The testing metric is the AUC. Prior to testing on our domains, we generate 16 domains that gradually move the $b_j^\sigma$ away from each other (wrt $j$), thus violating increasingly the homogeneity assumption [17]. The degree of violation is measured as $\|B^\pm - \bar{B}^\pm\|_F$, where $\bar{B}^\pm$ is the homogeneity assumption matrix, which replaces all $b_j^\sigma$ by $b^\sigma$ for $\sigma \in \{-1, 1\}$, see eq. (5). Figure 1 (a) displays the ratios of the AUC of LMM to the AUC of MM. It shows that LMM is all the better with respect to MM as the homogeneity assumption is violated. Furthermore, learning $s$ in LMM improves the results. Experiments on the simulated domain of [16], on which MM obtains zero accuracy, also show that our algorithms perform better (1 iteration only of AMM^max brings 100% AUC).

Small and large domains experiments. We convert 10 small domains [19] ($m \leq 1000$) and 4 bigger ones ($m > 8000$) from UCI [26] into the LLP framework. We cast to one-against-all classification when the problem is multiclass. On large domains, the bag assignment function is inspired by [1]: we craft bags according to a selected feature value, and then we remove that feature from the data. This conforms to the idea that bag assignment is structured and non-random in real-world problems. Most of our small domains, however, do not have a lot of features, so instead of clustering on one feature and then discarding it, we run K-MEANS on the whole data to make the bags, for $K \in 2^{[5]}$.

Small domains results. We perform 5-fold nested CV comparisons on the 10 domains, i.e. 50 AUC values for each algorithm. Table 2 synthesises the results [19], splitting one-shot and iterative algorithms. LMM_{G,s} outperforms all one-shot algorithms. LMM_G and LMM_{G,s} are competitive with many iterative algorithms, but lose against their AMM counterpart, which shows that additional optimization over labels is beneficial. AMM_G and AMM_{G,s} are confirmed as the best variants of AMM, the first being the best in this case. Surprisingly, all mean map algorithms, even one-shots, are clearly superior to ∝SVMs. Further results [19] reveal that ∝SVM performances are dampened by learning classifiers with the "inverted polarity", i.e. flipping the sign of the classifier improves its performances. Figure 1 (b, c) presents the AUC relative to the Oracle (which learns the classifier knowing all labels and minimizing the logistic loss), as a function of the Gini entropy of bag assignment, $\mathrm{gini}(S) = 4 E_j[\hat{\pi}_j (1 - \hat{\pi}_j)]$. For an entropy close to 1, we were expecting a drop in performances. The unexpected [19] is that on some domains, large entropies ($\geq .8$) do not prevent AMM^min from competing with the Oracle. No such pattern clearly emerges for ∝SVM and AMM^max [19].

Table 3: AUCs on big domains (name: #instances × #features). I=cap-shape, II=habitat, III=cap-colour, IV=race, V=education, VI=country, VII=poutcome, VIII=job (number of bags); for each feature, the best result over one-shot, and over iterative algorithms is bold faced. [Columns: mushroom (I, II, III), adult (IV, V, VI), marketing (V, VII, VIII), census (IV, VIII, VI). Rows: EMM, MM, LMM_G, LMM_{G,s}; AMM^min and AMM^max, each with EMM, MM, G, G,s and 1 initializations; Oracle. The domain sizes, bag counts and AUC values are not recoverable from this transcription.]

Big domains results. We adopt a 1/5 hold-out method. Scalability results [19] show that every method using $v^{nc}$ and ∝SVM are not scalable to big domains; in particular, the estimated time for a single run of alter-∝SVM is >100 hours on the adult domain. Table 3 presents the results on the big domains, distinguishing the feature used for bag assignment. Big domains confirm the efficiency of LMM+AMM. No approach clearly outperforms the rest, although LMM_{G,s} is often the best one-shot.

Synthesis. Figure 1 (d) gives the AUCs of AMM^min_G over the Oracle for all domains [19], as a function of the "degree of supervision", $n/m$ (= 1 if the problem is fully supervised). Noticeably, on 90% of the runs, AMM^min_G gets an AUC representing at least 70% of the Oracle's. Results on big domains can be remarkable: on the census domain with bag assignment on race, 5 proportions are sufficient for an AUC 5 points below the Oracle's, which learns with 200K labels.

4 Conclusion

In this paper, we have shown that efficient learning in the LLP setting is possible, for general loss functions, via the mean operator and without resorting to the homogeneity assumption. Through its estimation, the sufficiency allows one to resort to standard learning procedures for binary classification, practically implementing a reduction between machine learning problems [27]; hence the mean operator estimation may be a viable shortcut to tackle other weakly supervised settings [2] [3] [4] [5]. Approximation results and generalization bounds are provided. Experiments display results that are superior to the state of the art, with algorithms that scale to big domains at affordable computational costs. Performances sometimes compete with those of the Oracle, which learns knowing all labels, even on big domains. Such experimental findings have severe implications for the reliability of privacy-preserving aggregation techniques with simple group statistics like proportions.

Acknowledgments

NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. G. Patrini acknowledges that part of the research was conducted at the Commonwealth Bank of Australia. We thank A. Menon, D. García-García, N. de Freitas for invaluable feedback, and F. Yu for help with the code.
References

[1] F. X. Yu, S. Kumar, T. Jebara, and S. F. Chang. On learning with label proportions. CoRR, abs/1402.5902, 2014.
[2] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89:31-71, 1997.
[3] G. S. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning of conditional random fields. In 46th ACL, 2008.
[4] J. Graça, K. Ganchev, and B. Taskar. Expectation maximization and posterior constraints. In NIPS*20, 2007.
[5] P. Liang, M. I. Jordan, and D. Klein. Learning from measurements in exponential families. In 26th ICML, 2009.
[6] D. J. Musicant, J. M. Christensen, and J. F. Olson. Supervised learning by training on aggregate outputs. In 7th ICDM, pages 252-261, 2007.
[7] J. Hernández-González, I. Inza, and J. A. Lozano. Learning Bayesian network classifiers from label proportions. Pattern Recognition, 46(12), 2013.
[8] M. Stolpe and K. Morik. Learning from label proportions by optimizing cluster model selection. In 15th ECMLPKDD, 2011.
[9] B. C. Chen, L. Chen, R. Ramakrishnan, and D. R. Musicant. Learning from aggregate views. In 22nd ICDE, 2006.
[10] J. Wojtusiak, K. Irvin, A. Birerdinc, and A. V. Baranova. Using published medical results and non-homogenous data in rule learning. In 10th ICMLA, pages 84-89, 2011.
[11] S. Rüping. SVM classifier estimation from group probabilities. In 27th ICML, pages 911-918, 2010.
[12] H. Kueck and N. de Freitas. Learning about individuals from group statistics. In 21st UAI, 2005.
[13] S. Chen, B. Liu, M. Qian, and C. Zhang. Kernel k-means based framework for aggregate outputs classification. In 9th ICDMW, 2009.
[14] K. T. Lai, F. X. Yu, M. S. Chen, and S. F. Chang. Video event detection by inferring temporal instance labels. In CVPR, 2014.
[15] K. Fan, H. Zhang, S. Yan, L. Wang, W. Zhang, and J. Feng. Learning a generative classifier from label proportions. Neurocomputing, 139:47-55, 2014.
[16] F. X. Yu, D. Liu, S. Kumar, T. Jebara, and S. F. Chang. ∝SVM for Learning with Label Proportions. In 30th ICML, pages 504-512, 2013.
[17] N. Quadrianto, A. J. Smola, T. S. Caetano, and Q. V. Le. Estimating labels from label proportions. JMLR, 10, 2009.
[18] R. Nock and F. Nielsen. Bregman divergences and surrogates for learning. IEEE Trans. PAMI, 31, 2009.
[19] G. Patrini, R. Nock, P. Rivera, and T. S. Caetano. (Almost) no label no cry - supplementary material. In NIPS*27, 2014.
[20] M. J. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. In 28th ACM STOC, 1996.
[21] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR, 7, 2006.
[22] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. PAMI, 22, 2000.
[23] Y. Altun and A. J. Smola. Unifying divergence minimization and statistical inference via convex duality. In 19th COLT, pages 139-153, 2006.
[24] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR, 3:463-482, 2002.
[25] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. of Stat., 30:1-50, 2002.
[26] K. Bache and M. Lichman. UCI machine learning repository, 2013.
[27] A. Beygelzimer, V. Dani, T. Hayes, J. Langford, and B. Zadrozny. Error limiting reductions between classification tasks. In 22nd ICML, pages 49-56, 2005.
(Almost) No Label No Cry - Supplementary Material

1 Table of contents

Supplementary material on proofs
  Proof of Lemma 1
  Proof of Lemma 2
  Proof of Theorem 3
  Proof of Lemma 4
  Proof of Lemma 5
  Mean Map estimator's Lemma and Proof
  Proof of Theorem 6
  Proof of Lemma 7
  Proof of Theorem 8
Supplementary material on experiments
  Full Experimental Setup
  Simulated Domain for Violation of Homogeneity Assumption
  Simulated Domain from []
  Additional Tests on alter-∝SVM []
  Scalability
  Full Results on Small Domains
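2 Supplementary Material on Proofs

2.1 Proof of Lemma 1

For any SPSL $F(S, h)$, we can write it as ([], Lemma, [3]):

$$F(S, h) = F_\phi(S, h) = \frac{1}{m} \sum_i D_\phi\left(y'_i \,\|\, \nabla\phi^{-1}(h(x_i))\right), \qquad (1)$$

where $y'_i = 1$ iff $y_i = 1$ and $0$ otherwise, $\phi$ is permissible and $D_\phi$ is the Bregman divergence with generator $\phi$ [3]. It also holds that $D_\phi(y' \,\|\, \nabla\phi^{-1}(h(x))) = b_\phi F_\phi(y h(x))$ with:

$$F_\phi(x) = \frac{\phi^\star(-x) + \phi(0)}{\phi(0) - \phi(1/2)} = \frac{a_\phi + \phi^\star(-x)}{b_\phi}, \qquad (2)$$

and $\phi^\star$ is the convex conjugate of $\phi$, i.e. $\phi^\star(x) = x \phi'^{-1}(x) - \phi(\phi'^{-1}(x))$. Furthermore, for any permissible $\phi$, the convex conjugate $\phi^\star(x)$ verifies the property

$$\phi^\star(-x) = \phi^\star(x) - x, \qquad (3)$$

and so we get that:

$$F(S, h) = \frac{1}{m} \sum_i D_\phi\left(y'_i \,\|\, \nabla\phi^{-1}(h(x_i))\right)$$
$$= \frac{b_\phi}{m} \sum_i F_\phi(y_i h(x_i))$$
$$= \frac{b_\phi}{2m} \left( \sum_i F_\phi(y_i h(x_i)) + \sum_i F_\phi(y_i h(x_i)) \right)$$
$$= \frac{b_\phi}{2m} \left( \sum_i F_\phi(y_i h(x_i)) + \sum_i \left[ F_\phi(-y_i h(x_i)) - \frac{y_i h(x_i)}{b_\phi} \right] \right) \qquad (4)$$
$$= \frac{b_\phi}{2m} \sum_i \sum_{y \in \{-1, +1\}} F_\phi(y h(x_i)) - \frac{1}{2m} \sum_i y_i h(x_i)$$
$$= \frac{b_\phi}{2m} \sum_i \sum_{\sigma \in \{-1, +1\}} F_\phi(\sigma h(x_i)) - \frac{1}{2} h\left( \frac{1}{m} \sum_i y_i x_i \right) \qquad (5)$$
$$= \frac{b_\phi}{2m} \sum_i \sum_{\sigma \in \{-1, +1\}} F_\phi(\sigma h(x_i)) - \frac{1}{2} h(\mu_S). \qquad (6)$$

(4) holds because of (3), (5) holds because $h$ is linear. So for any samples $S$ and $S'$ with respective sizes $m$ and $m'$, we have (again using the property that $h$ is linear):

$$F(S, h) - F(S', h) = \frac{b_\phi}{2} \left( \frac{1}{m} \sum_{x_i \in S} \sum_{\sigma \in \{-1, +1\}} F_\phi(\sigma h(x_i)) - \frac{1}{m'} \sum_{x_i \in S'} \sum_{\sigma \in \{-1, +1\}} F_\phi(\sigma h(x_i)) \right) + \frac{1}{2} h(\mu_{S'} - \mu_S), \qquad (7)$$

which yields the statement of the Lemma.

2.2 Proof of Lemma 2

Using the fact that $D_w$ and $L$ are symmetric, we have:

$$\frac{\partial \ell(L, X)}{\partial X} = -2 \frac{\partial}{\partial X} \mathrm{tr}\left( B^\top D_w \Pi^\top X \right) + \frac{\partial}{\partial X} \mathrm{tr}\left( X^\top \Pi D_w \Pi^\top X \right) + \gamma \frac{\partial}{\partial X} \mathrm{tr}\left( X^\top L X \right)$$
$$= -2 \Pi D_w B + 2 \Pi D_w \Pi^\top X + 2 \gamma L X = 0,$$

out of which $\tilde{B}^\pm$ follows in Lemma 2.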
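The decomposition in eq. (6) can be checked numerically. The sketch below does so for the logistic loss taken as $\log(1 + e^{-z})$, for which the scaling constant is absorbed and the identity reads $(1/m) \sum_i F(y_i \theta^\top x_i) = (1/2m) \sum_i \sum_\sigma F(\sigma \theta^\top x_i) - (1/2) \theta^\top \mu_S$. The random data, the seed and the vector `theta` are arbitrary choices made for this illustration.

```python
import math
import random


def logistic(z):
    """Logistic loss log(1 + exp(-z))."""
    return math.log(1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def labeled_risk(theta, xs, ys):
    """(1/m) * sum_i F(y_i * theta.x_i): needs every individual label."""
    return sum(logistic(y * dot(theta, x)) for x, y in zip(xs, ys)) / len(xs)


def label_free_risk(theta, xs, mu):
    """(1/2m) * sum_i sum_sigma F(sigma * theta.x_i) - (1/2) * theta.mu:
    uses the labels only through the mean operator mu."""
    m = len(xs)
    both = sum(logistic(dot(theta, x)) + logistic(-dot(theta, x)) for x in xs)
    return both / (2 * m) - 0.5 * dot(theta, mu)


rng = random.Random(0)
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
ys = [rng.choice((-1, 1)) for _ in range(50)]
# Mean operator mu_S = (1/m) * sum_i y_i * x_i
mu_S = [sum(y * x[k] for x, y in zip(xs, ys)) / len(xs) for k in range(3)]
theta = [0.7, -1.2, 0.4]
gap = abs(labeled_risk(theta, xs, ys) - label_free_risk(theta, xs, mu_S))
```

The two risks agree to floating-point precision for any `theta`, which is the practical content of the Lemma: once the mean operator is known (or estimated), individual labels are no longer needed to evaluate or minimize the risk of a linear classifier.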
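Proof of Theorem 3

We let $\Pi_o = [\mathrm{DIAG}(\hat{\pi}) \,|\, \mathrm{DIAG}(\hat{\pi} - 1)]^\top N$, an orthonormal system ($n_{jj} = (\hat{\pi}_j^2 + (1 - \hat{\pi}_j)^2)^{-1/2}$, $\forall j \in [n]$, and $0$ otherwise). Let $K_{\Pi_o}$ be the $n$-dim subspace of $\mathbb{R}^{d}$ generated by $\Pi_o$. The proof of Theorem 3 exploits the following Lemma, which assumes that $\varepsilon$ is any $> 0$ real for $L$ in (8) (main file) to be $\succ 0$. When $\varepsilon = 0$, the result of Theorem 3 still holds but follows a different proof.

Lemma 1. Let $A = \Pi^\top D_w \Pi$ and $L$ defined as in (8) (main paper). Denote for short

$$U = \big(L^{-1}A + \gamma I\big)^{-1}. \tag{8}$$

Suppose there exists $\xi > 0$ such that for any $x \in \mathbb{R}^{2n}$, the projection of $Ux$ in $K_{\Pi_o}$, $x_{U,o}$, satisfies

$$\|x_{U,o}\|_2 \le \xi \|x\|_2. \tag{9}$$

Then:

$$\|M - \tilde{M}\|_F \le \gamma \xi \|B^\pm\|_F. \tag{10}$$

Proof. Combining Lemma 2 and (5), we get

$$\tilde{B}^\pm - B^\pm = \big((A + \gamma L)^{-1}A - I\big)B^\pm = -\big((\gamma L)^{-1}A + I\big)^{-1}B^\pm. \tag{11}$$

Define the following permutation matrix:

$$C = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix} \in \mathbb{R}^{2n \times 2n}. \tag{12}$$

$A = \Pi^\top D_w \Pi$ is not invertible but diagonalisable. Its (orthonormal) eigenvectors can be partitioned in two matrices $P_o$ and $P$ such that:

$$P_o = [\mathrm{DIAG}(\hat{\pi} - 1) \,|\, \mathrm{DIAG}(\hat{\pi})]^\top N = C\,\Pi_o \in \mathbb{R}^{2n \times n} \quad \text{(eigenvalues } 0\text{)}, \tag{13}$$

$$P = \Pi^\top N \in \mathbb{R}^{2n \times n} \quad \text{(eigenvalues } w_j(\hat{\pi}_j^2 + (1 - \hat{\pi}_j)^2),\ \forall j\text{)}. \tag{14}$$

We have:

$$\begin{aligned}
M - \tilde{M} &= P_o^\top C B^\pm - P_o^\top C \tilde{B}^\pm \\
&= P_o^\top C \big((\gamma L)^{-1}A + I\big)^{-1}B^\pm \\
&= \Pi_o^\top \big((\gamma L)^{-1}A + I\big)^{-1}B^\pm && (15)\\
&= \gamma\, \Pi_o^\top \big(L^{-1}A + \gamma I\big)^{-1}B^\pm. && (16)
\end{aligned}$$

Eq. (15) follows from the fact that $C$ is idempotent. Plugging the Frobenius norm in (16), we obtain

$$\begin{aligned}
\|M - \tilde{M}\|_F^2 &= \gamma^2 \big\|\Pi_o^\top \big(L^{-1}A + \gamma I\big)^{-1}B^\pm\big\|_F^2 \\
&= \gamma^2 \sum_{k=1}^{d} \big\|\Pi_o^\top \big(L^{-1}A + \gamma I\big)^{-1} b_k^\pm\big\|_2^2 \\
&\le \gamma^2 \xi^2 \sum_{k=1}^{d} \|b_k^\pm\|_2^2 && (17)\\
&= \gamma^2 \xi^2 \|B^\pm\|_F^2,
\end{aligned}$$

which yields (10). In (17), $b_k^\pm$ denotes column $k$ in $B^\pm$. Ineq. (17) makes use of assumption (9).

To ensure $\|x_{U,o}\|_2 \le \xi\|x\|_2$, it is sufficient that $\|Ux\|_2 \le \xi\|x\|_2$, and since $\|Ux\|_2 \le \|U\|_F \|x\|_2$, it is sufficient to show that

$$\big\|U_\xi^{-1}\big\|_F \le 1, \tag{18}$$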
13 wth U ξ L ξ A + ξγ I, for relevant choces of ξ We have let L ξ (/ξ)l Let 0 λ () λ n () denote the ordered egenvalues of a postvesemdefnte matrx n R n n It follows that, snce L s symmetrc postve defnte, we have λ j (L ξ A) λ j(a) λ n (L ξ ) ( 0), j [n] We have used eq (3) Weyl s Theorem then brngs: λ j (U ξ ) λ n (L ξ ) λ j (A) + ξγ λ n (L ξ ) { ξ γ f j [n] λ n(l ξ ) λ j(a) otherwse (9) Gershgorn s Theorem brngs λ n (/ξ)(ε + max j j l jj ), and furthermore the egenvalues of A satsfy λ j w j /, j n + We thus have: U ξ F nγ ξ ) 4n (ε + max j j l + jj ξ mn j wj (0) In (9) and (0), we have used the egenvalues of A gven n eqs (3) and (4) Assumng: γ ξ n, () a suffcent condton for the rghthand sde of (0) to be s that ξ ε + max j j l jj n mn j w j () To fnsh up the proof, recall that L D V wth d jj j,j v jj and the coordnates v jj 0 Hence, l jj j j j v jj n max v jj, j [n] j j The proof s fnshed by pluggng ths upperbound n () to choose ξ, then takng the maxmal value for γ n () and fnally solvng the upperbound n (0) Ths ends the proof of Theorem 3 4 Proof of Lemma 4 We frst consder the normalzed assocaton crteron n (0): ASSOC(S j, S j ) vjj N ( ASSOC(Sj, S j ) ASSOC(S j, S j S j ) + x S j,x S j ASSOC(S ) j, S j ) ASSOC(S j, S j S j ) x x (3), 4
14 Remark that b j b j x x m j m j x S j x S j m x + j x S j m j x S j m + j m j m j x S j x x S j x + m j m j x S j,x S j x S j m j x S j x x x x x m j m j x S j m j m j x m j m j x S j,x S j x S j,x S j x x x S j x x x (4) + m j x m j m + m j x j m j m x x j m j m j x S j x S j x S j,x S j } {{ } a x x (5) m j m j x S j,x S j ASSOC(S j, S j ) (6) m j m j ( n ) ( Eq (4) explots the fact that j a n ) j n j a j and eq (5) explots the fact that a (m j m j ) x S j,x S x j x We thus have: ASSOC(S j, S j ) ASSOC(S j, S j S j ) ASSOC(S j, S j ) ASSOC(S j, S j ) + ASSOC(S j, S j ) ASSOC(S j, S j ) ASSOC(S j, S j ) + mjm j b j b j κ m j κ m j + mjm j b j b j + m j κ b j b j 5 (7) (8) (9)
15 Eq (7) uses (6) and eq (8) uses assumpton (D) Eq (8) also holds when permutng j and j, so we get: ( ) ς(v NC ε, B ± ) max j j n + + mj κ b j b j + + m j κ b j b j B ± F ( ) ε n + B ± mnj mj F + κ mn j,j b j b j ( ) ε n + B ± mnj mj F (30) + κ mn j,j b j b j ε n d max σ,j bσ j + 4κ d max σ,j b σ j mn j,j b j b j ε n d max 4κ d σ,j bσ j + κ max σ,j b σ j ) f (max NC σ,j bσ j o(), (3) where the last nequalty uses assumpton (D), and (30) uses the property that (a+b) a +b We have let f NC (x) ε n dx + 4κ d κx, (3) whch s ndeed o() f ε o(n / x) Ths proves the Lemma for ς(v NC, B ± ) The case of ς(v G,s, B ± ) s easer, as ( exp b ) ( j b j exp mn j,j b j b j ) s s ( exp κ ) s max σ,j bσ j, from assumpton (D) alone, whch gves ( ( ε ς(v G,s, B ± ) B ± F n + exp κ )) s max σ,j bσ j ( ( ε B ± F n + exp κ )) s max σ,j bσ j ( ( ε d max σ,j bσ j n + exp κ )) s max σ,j bσ j ) f (max G σ,j bσ j o(), (33) as clamed We have let f G (x) ε n dx+dx exp( κx/s), whch s ndeed o() f ε o(n / x) Remark that we shall have n general f G (x) f NC (x) and even f G (x) o(f NC (x)) f ε 0, so we may expect better convergence n the case of V G,s as max σ,j b σ j grows 5 Proof of Lemma 5 We frst restate the Lemma n a more explct way, that shall provde explct values for κ l and κ n Lemma There exst κ jj and s jj dependng on d j, d j, and κ jj > dependng on m j, m j, such that: 6
16 If v G,s jj jj > exp( /4) then S j, S j are not lnearly separable; If v G,s jj jj < exp( 64) then S j, S j are lnearly separable; If v NC jj If v NC jj > κ jj then S j, S j are not lnearly separable; < κ jj /κ jj then S j, S j are lnearly separable Proof We frst consder the normalzed assocaton crteron n (0), and we prove the Lemma for the followng expressons of κ jj and κ jj : κ jj d jj + d jj d j d j, (34) κ jj 5 max{m j, m j }, (35) wth d jj max{d j, d j } and d j max x,x S j x x, j j [n] For any bag S j, we let (b j, r j) MEB(S j ) denote the mnmum enclosng ball (MEB) for bag S j and dstance L, that s, r j s the smallest unque real such that!b j : d(x, b j ) x b j r j, x S j We have let d(x, b j ) x b j We are gong to prove a frst result nvolvng the MEBs of S j and S j, and then wll translate the result to the Lemma s statement The followng propertes follows from standard propertes of MEBs and the fact that d(, ) s a dstance (they hold for any j j ): (a) d(x, x ) r j, x, x S j ; (b) If bags S j and S j are lnearly separable, then x CO(S j ), x S j such that d(x, x ) max{r j, r j }; here, CO denotes the convex closure; (c) If bags S j and S j are lnearly separable, then d(b j, b j ) max{r j, r j }, where b j and b j are the bags average; (d) x S j, x S j st d(x, x ) r j ; (e) d(x, x ) max{r j, r j } + d(b j, b j ), x CO(S j), x CO(S j ) Let us defne ASSOC(S j, S j ) d (x, x ) (36) x S j,x S j We remark that, assumng that each bag contans at least two elements wthout loss of generalty: vjj NC + (37) + ASSOC(Bj,B j ) ASSOC(B j,b j) + ASSOC(Bj,B j ) ASSOC(B j,b j ) We have ASSOC(S j, S j ) 4m j rj and ASSOC(S j, S j ) 4m j r j (because of (a)), and also ASSOC(S j, S j ) max{m j, m j } max{rj, r j } when S j and S j are lnearly separable (because of (b)), whch yelds n ths case vjj NC + + max{mj,m j } max{r j,r j } m jrj + max{r j,r j } r j + + max{mj,m j } max{r j,r j } m j r j + max{r j,r j } r j (38) Let us name κ jj the rghthand sde of (38) It follows that when 
vnc jj > κ jj, S j and S j are not lnearly separable 7
17 On the other hand, we have ASSOC(S j, S j ) m j rj and ASSOC(S j, S j ) m j r j (because of (d)), and also ASSOC(S j, S j ) m j m j ( max{r j, r j } + d(b j, b j )) m j m j (4 max{rj, rj } + d (b j, b j )), (39) because of (e) and the fact that (a + b) a + b It follows that j j : vjj NC + (40) + m j (4 max{r j,r j }+d (b j,b j )) + mj(4 max{r j,r j }+d (b j,b j )) rj r j For any j j, when d (b j, b j ) 4 max{r j, r j }, then we have from (40): vjj NC + + 6m j max{r j,r j } + 6mj max{r j,r j } rj r j > κ jj /(3 max{m j, m j }) (4) Hence, when vjj NC κ jj /(3 max{m j, m j }), t mples d(b j, b j ) > max{r j, r j }, mplyng d(b j, b j ) > r j + r j, whch s a suffcent condton for the lnear separablty of S j and S j So, we can relate the lnear separablty of S j and S j to the value of vjj NC wth respect to κ jj defned n (38) To remove the dependence n the MEB parameters and obtan the statement of the Lemma, we just have to remark that d j /4 r j 4d j, j [n], whch yelds κ jj /6 κ jj κ jj Hence, when vjj NC > κ jj, t follows that vnc jj > κ jj and S j and S j are not lnearly separable On the other hand, when vjj NC κ jj /(6 3 max{m j, m j }) κ jj /κ jj, then vjj NC κ jj /(3 max{m j, m j }) and the bags S j and S j are lnearly separable Ths acheves the proof of Lemma 5 for the normalzed assocaton crteron n (0) The proof for v G,s jj s shorter, and we prove t for s j,j max{d j, d j } (4) We have (/) max{d j, d j } max{r j, r j } max{d j, d j } Hence, because of (c) above, f S j and S j are lnearly separable, then v G,s jj /e/4 ; so, when v G,s jj > /e/4, the two bags are not lnearly separable On the other hand, f d(b j, b j ) max{r j, r j }, then because of (e) above d(b j, b j ) 4 max{r j, r j } 8 max{d j, d j }, and so v G,s jj /e64 Ths mples that f v G,s jj < /e64, then d(b j, b j ) > max{r j, r j } r j + r j, and thus the two bags are lnearly separable, as clamed Ths acheves the proof of Lemma Ths acheves the proof of Lemma 5 6 Mean Map estmator s Lemma and Proof It s 
not hard to check that the randomzed procedure that bulds µ S RAND yx for some random x S and y {, } guarantees O( + γ) approxmablty when some bags are close to the convex hull of S, for small γ > 0 Hence, the Mean Map estmaton of µ S can be very poor n that respect Lemma 3 For any γ > 0, the Mean Map estmator µ S MM µ S / max σ,j b σ j γ, even when (D + D) hold cannot guarantee µ MM S Proof Let x > 0, ɛ (0, ), p (0, ), p / We create a dataset from four observatons, {(x 0, ), (x 0, ), (x 3 x, ), (x 4 x, )} There are two bags, S takes ɛ of x and ɛ of x S takes ɛ of x 4 and ɛ of x 3 The labelwse estmators µ σ of [4] are soluton of ( [ ] [ ] ɛ ɛ ɛ ɛ [ µ µ ] ɛ ɛ ɛ [ ( ɛ)x ɛx ] ɛ 8 ɛ ] ) [ ɛ ɛ ɛ ɛ ] [ x 0 (43)
18 On the other hand, the true quanttes are: [ ] µ µ [ ( ɛ)x ɛx ] (44) We now mx classes n S and pck bag proportons q P S [S ] and q P S [S ] We have the class proportons defned by P S [y +] ɛq + ( ɛ)( q) p Then ( ) ( ) µ S µ S p( ɛ) ɛ x ( p)ɛ ɛ x ɛ p ɛ ɛ x ɛ( q)x (45) Furthermore, max b σ x We get µ S µ S max b σ ɛ( q) (46) Pckng ɛ and ( q) both > (γ/) s suffcent to have eq (46) > γ for any γ > 0 Remark that both assumptons (D) and (D) hold for any κ < and any κ > 0 7 Proof of Theorem 6 The proof of the Theorem nvolves two Lemmata, the frst of whch s of ndependent nterest and holds for any convex twce dfferentable functon F, and not just any F φ So, let us defne: ( ) b F (S y, θ, µ) F (σθ x ) m θ µ (47) where b s any fxed postve real Defne also the regularzed loss: F (S y, θ, µ, λ) F (S y, θ, µ) + λ θ (48) Let f k R m denote the vector encodng the k th varable n S : f k x k For any k [d], let ( d f k σ k f k denote a normalzaton of vectors f k n the sense that d f k ( d d k ( d k f k f k k ) d d fk (49) ) d ) d k f k (50) Let Ṽ collect all vectors f k n column and V collect all vectors f k n column Wthout loss of generalty, we assume V V 0, e V V postve defnte (e no feature s a lnear combnaton of the others), mplyng, because the columns of Ṽ are just postve rescalng of the columns of V, that Ṽ Ṽ 0 as well We use V nstead of F as n the man paper, n order not to counfound wth the general convex surrogate notaton F that we use here Lemma 4 Gven any two µ and µ, let θ and θ be the respectve mnmzers of F (S y,, µ, λ) and F (S y,, µ, λ) Suppose there exsts F > 0 such that surrogate F satsfes F (±(αθ + ( α)θ ) x ) F, α [0, ], [m] (5) Then the followng holds: θ θ λ + em F vol (Ṽ) µ µ, (5) where vol(ṽ) det Ṽ Ṽ denote the volume of the (row/column) system of Ṽ 9
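To make the stability statement of the Lemma above concrete, the following sketch fits the regularized loss (48) for two different mean operators and compares the distance between the two minimizers with the distance between the operators. Several choices here are assumptions made for illustration and are not taken from the paper: the surrogate is the logistic loss with $b = 1$, the solver is a plain Newton iteration, and the check is only against the weaker bound $\|\mu - \mu'\|_2 / (4\lambda)$ that follows from the $2\lambda$-strong convexity of (48), not against the sharper constant of the Lemma. The names `fit` and `stability_gap` are hypothetical.

```python
import numpy as np


def fit(X, mu, lam, iters=50):
    """Newton iterations on (1/2m) sum_i sum_sigma log(1+exp(-sigma*theta.x_i)) - theta.mu/2 + lam*||theta||^2."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        t = np.tanh(X @ theta / 2.0)   # derivative of the symmetrised logistic term
        grad = X.T @ t / (2 * m) - mu / 2.0 + 2 * lam * theta
        hess = (X.T * (1.0 - t ** 2)) @ X / (4 * m) + 2 * lam * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta


def stability_gap(m=40, d=3, lam=0.5, seed=0):
    """Return (||theta - theta'||, ||mu - mu'|| / (4*lam)) for a perturbed mean operator."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(m, d))
    y = rng.choice([-1.0, 1.0], size=m)
    mu = (y[:, None] * X).mean(axis=0)       # true mean operator
    mu_prime = mu + 0.1 * rng.normal(size=d)  # perturbed estimate
    theta = fit(X, mu, lam)
    theta_prime = fit(X, mu_prime, lam)
    lhs = np.linalg.norm(theta - theta_prime)
    rhs = np.linalg.norm(mu - mu_prime) / (4 * lam)
    return lhs, rhs
```

The first returned value should not exceed the second. The Lemma tightens the denominator by adding a data-dependent term, so the true gap is expected to be strictly smaller than this regularization-only bound.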
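Proof. Our proof begins following the same first steps as the proof of Lemma 7 in [5], adding the steps that handle the lower bound on $F''$. Consider the following auxiliary function $A_F(\tau)$:

$$A_F(\tau) = \big(\nabla F(S_{|y}, \theta_*, \mu) - \nabla F(S_{|y}, \theta'_*, \mu')\big)^\top (\tau - \theta'_*) + \lambda \|\tau - \theta'_*\|_2^2, \tag{53}$$

where the gradient of $F$ is computed with respect to parameter $\theta$. The gradient of $A_F(\cdot)$ is:

$$\nabla A_F(\tau) = \nabla F(S_{|y}, \theta_*, \mu) - \nabla F(S_{|y}, \theta'_*, \mu') + 2\lambda(\tau - \theta'_*). \tag{54}$$

The gradient of $A_F$ satisfies

$$\nabla A_F(\theta_*) = \nabla F(S_{|y}, \theta_*, \mu, \lambda) - \nabla F(S_{|y}, \theta'_*, \mu', \lambda) = 0, \tag{55}$$

as both gradients on the right are $0$ because of the optimality of $\theta_*$ and $\theta'_*$ with respect to $F(S_{|y}, \cdot, \mu, \lambda)$ and $F(S_{|y}, \cdot, \mu', \lambda)$. The Hessian $H$ of $A_F$ is $HA_F(\tau) = 2\lambda I \succ 0$, and so $A_F$ is convex and is thus minimal at $\tau = \theta_*$. Finally, $A_F(\theta'_*) = 0$. It comes thus that $A_F(\theta_*) \le 0$, which yields equivalently:

$$\begin{aligned}
0 &\ge \big(\nabla F(S_{|y}, \theta_*, \mu) - \nabla F(S_{|y}, \theta'_*, \mu')\big)^\top (\theta_* - \theta'_*) + \lambda \|\theta_* - \theta'_*\|_2^2 \\
&= \Big(\frac{b}{2m}\sum_i\sum_y \nabla F(y\theta_*^\top x_i) - \frac{1}{2}\mu - \frac{b}{2m}\sum_i\sum_y \nabla F(y\theta_*'^\top x_i) + \frac{1}{2}\mu'\Big)^\top (\theta_* - \theta'_*) + \lambda \|\theta_* - \theta'_*\|_2^2 \\
&= \frac{b}{2m}\underbrace{\Big(\sum_i\sum_y \nabla F(y\theta_*^\top x_i) - \sum_i\sum_y \nabla F(y\theta_*'^\top x_i)\Big)^\top (\theta_* - \theta'_*)}_{a} - \frac{1}{2}(\mu - \mu')^\top(\theta_* - \theta'_*) + \lambda \|\theta_* - \theta'_*\|_2^2. && (56)
\end{aligned}$$

Let us lower bound $a$. We have $\nabla F(y\theta^\top x) = y F'(y\theta^\top x)\,x$, and a Taylor expansion brings that for any $\theta_*, \theta'_*$, there exists some $\alpha \in [0, 1]$ such that, defining

$$u_{\alpha,i} = y(\alpha\theta_* + (1 - \alpha)\theta'_*)^\top x_i, \tag{57}$$

we have:

$$F'(y\theta_*^\top x_i) = F'(y\theta_*'^\top x_i) + y(\theta_* - \theta'_*)^\top x_i\, F''(u_{\alpha,i}). \tag{58}$$

We thus get:

$$\begin{aligned}
a &= \Big(\sum_i\sum_y \big(\nabla F(y\theta_*^\top x_i) - \nabla F(y\theta_*'^\top x_i)\big)\Big)^\top (\theta_* - \theta'_*) \\
&= \Big(\sum_i\sum_y y\big(F'(y\theta_*^\top x_i) - F'(y\theta_*'^\top x_i)\big)x_i\Big)^\top (\theta_* - \theta'_*) \\
&= \Big(\sum_i\sum_y (\theta_* - \theta'_*)^\top x_i\, F''(u_{\alpha,i})\, x_i\Big)^\top (\theta_* - \theta'_*) \\
&= \sum_i\sum_y \big((\theta_* - \theta'_*)^\top x_i\big)^2 F''(u_{\alpha,i}) \\
&\ge 2F^\circ \sum_i \big((\theta_* - \theta'_*)^\top x_i\big)^2 && (59)\\
&= 2F^\circ (\theta_* - \theta'_*)^\top S S^\top (\theta_* - \theta'_*), && (60)
\end{aligned}$$

where matrix $S \in \mathbb{R}^{d \times m}$ is formed by the observations of $S_{|y}$ in columns, and ineq. (59) comes from (51). Define $T = \big(d / \sum_i \|x_i\|_2^2\big)\, S S^\top$. Its trace satisfies $\operatorname{tr}(T) = d$. Let $\lambda_d \ge \lambda_{d-1} \ge \dots \ge \lambda_1 > 0$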
20 denote egenvalues of T, wth λ strctly postve because SS V V 0 The AGH nequalty brngs: Multplyng both sde by λ and rearrangng yelds: d λ k ( ) d d λ k (6) d k ( ) d tr (T) λ d ( ) d d λ d ( ) d d (6) d λ ( ) d d det T (63) d Let λ > 0 denote the mnmal egenvalue of SS It satsfes λ ( x /d)λ and thus t comes from neq (63): ( ) d ( ) d d d λ d x det SS ( ) [ d ( ) ] d d d det d x SS ( ) d d det Ṽ Ṽ (64) d ( ) d d vol (Ṽ) (65) d e vol (Ṽ) (66) We have used notaton vol(ṽ) det Ṽ Ṽ Snce (θ θ ) SS (θ θ ) λ θ θ, combnng (60) wth (66) yelds the followng lowerbound on a: Gong back to (56), we get λ θ θ (µ µ ) (θ θ ) + a e F vol (Ṽ) θ θ (67) b em F vol (Ṽ) θ θ 0 Snce (µ µ ) (θ θ ) µ µ θ θ, we get after channg the nequaltes and solvng for θ θ : as clamed θ θ λ + em F vol (Ṽ) µ µ, The second Lemma s used to (5) when F (x) F φ Notce that we cannot rely on strong convexty arguments on F φ, as ths do not hold n general The Lemma s stated n a more general settng than for just F F φ
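The arithmetic-geometric step in (61)-(63) can be checked numerically. As read here, it states that a positive definite matrix $T$ with $\operatorname{tr}(T) = d$ has smallest eigenvalue at least $((d-1)/d)^{d-1}\det T$, which is in turn at least $\det T / e$. The sketch below is only an illustration of that reading: the random data and the function name `min_eig_bound` are assumptions, and the observations are stored in the columns of `S` as in the proof.

```python
import numpy as np


def min_eig_bound(S):
    """Return (smallest eigenvalue of T, AM-GM lower bound, det(T)/e) for T = d * S S^T / tr(S S^T)."""
    d = S.shape[0]
    G = S @ S.T                       # d x d Gram matrix of the features
    T = d * G / np.trace(G)           # rescaled so that trace(T) = d
    lam_min = np.linalg.eigvalsh(T)[0]  # eigvalsh returns eigenvalues in ascending order
    det_T = np.linalg.det(T)
    bound = ((d - 1) / d) ** (d - 1) * det_T
    return lam_min, bound, det_T / np.e


rng = np.random.default_rng(0)
S = rng.normal(size=(4, 30))          # d = 4 features, m = 30 observations
lam_min, bound, weaker = min_eig_bound(S)
```

With full-rank data the three returned values should be ordered `lam_min >= bound >= weaker`. The bound gets loose when the eigenvalues of $T$ are very uneven, since $\det T$ then shrinks much faster than the smallest eigenvalue.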
More informationSemiSupervised Text Classification Using Partitioned EM
SemSupervsed Text Classfcaton Usng Parttoned EM Gao Cong 1, Wee Sun Lee 1, Haoran Wu 1, Bng Lu 2 1 Department of Computer Scence, Natonal Unversty of Sngapore, Sngapore 117543 {conggao, leews, wuhaoran}@comp.nus.edu.sg
More informationAn MILP model for planning of batch plants operating in a campaignmode
An MILP model for plannng of batch plants operatng n a campagnmode Yanna Fumero Insttuto de Desarrollo y Dseño CONICET UTN yfumero@santafeconcet.gov.ar Gabrela Corsano Insttuto de Desarrollo y Dseño
More informationgreatest common divisor
4. GCD 1 The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no
More information+ + +   This circuit than can be reduced to a planar circuit
MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to
More informationLearning Permutations with Exponential Weights
Journal of Machne Learnng Research 2009 (10) 17051736 Submtted 9/08; Publshed 7/09 Learnng Permutatons wth Exponental Weghts Davd P. Helmbold Manfred K. Warmuth Computer Scence Department Unversty of
More informationNPAR TESTS. OneSample ChiSquare Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6
PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has
More informationOn the Optimal Control of a Cascade of HydroElectric Power Stations
On the Optmal Control of a Cascade of HydroElectrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;
More information1. Measuring association using correlation and regression
How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a
More informationLecture 18: Clustering & classification
O CPS260/BGT204. Algorthms n Computatonal Bology October 30, 2003 Lecturer: Pana K. Agarwal Lecture 8: Clusterng & classfcaton Scrbe: Daun Hou Open Problem In HomeWor 2, problem 5 has an open problem whch
More informationData Broadcast on a MultiSystem Heterogeneous Overlayed Wireless Network *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819840 (2008) Data Broadcast on a MultSystem Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,
More informationAbstract. Clustering ensembles have emerged as a powerful method for improving both the
Clusterng Ensembles: {topchyal, Models jan, of punch}@cse.msu.edu Consensus and Weak Parttons * Alexander Topchy, Anl K. Jan, and Wllam Punch Department of Computer Scence and Engneerng, Mchgan State Unversty
More informationCHAPTER 14 MORE ABOUT REGRESSION
CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp
More informationHeuristic Static LoadBalancing Algorithm Applied to CESM
Heurstc Statc LoadBalancng Algorthm Appled to CESM 1 Yur Alexeev, 1 Sher Mckelson, 1 Sven Leyffer, 1 Robert Jacob, 2 Anthony Crag 1 Argonne Natonal Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439,
More informationProject Networks With MixedTime Constraints
Project Networs Wth MxedTme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa
More informationOn the Interaction between Load Balancing and Speed Scaling
On the Interacton between Load Balancng and Speed Scalng Ljun Chen, Na L and Steven H. Low Engneerng & Appled Scence Dvson, Calforna Insttute of Technology, USA Abstract Speed scalng has been wdely adopted
More informationx f(x) 1 0.25 1 0.75 x 1 0 1 1 0.04 0.01 0.20 1 0.12 0.03 0.60
BIVARIATE DISTRIBUTIONS Let be a varable that assumes the values { 1,,..., n }. Then, a functon that epresses the relatve frequenc of these values s called a unvarate frequenc functon. It must be true
More informationStatistical Methods to Develop Rating Models
Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and
More informationINSTITUT FÜR INFORMATIK
INSTITUT FÜR INFORMATIK Schedulng jobs on unform processors revsted Klaus Jansen Chrstna Robene Bercht Nr. 1109 November 2011 ISSN 21926247 CHRISTIANALBRECHTSUNIVERSITÄT ZU KIEL Insttut für Informat
More informationDistributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
Foundatons and Trends R n Machne Learnng Vol. 3, No. 1 (2010) 1 122 c 2011 S. Boyd, N. Parkh, E. Chu, B. Peleato and J. Ecksten DOI: 10.1561/2200000016 Dstrbuted Optmzaton and Statstcal Learnng va the
More informationUsing Mixture Covariance Matrices to Improve Face and Facial Expression Recognitions
Usng Mxture Covarance Matrces to Improve Face and Facal Expresson Recogntons Carlos E. homaz, Duncan F. Glles and Raul Q. Fetosa 2 Imperal College of Scence echnology and Medcne, Department of Computng,
More informationStudy on CET4 Marks in China s Graded English Teaching
Study on CET4 Marks n Chna s Graded Englsh Teachng CHE We College of Foregn Studes, Shandong Insttute of Busness and Technology, P.R.Chna, 264005 Abstract: Ths paper deploys Logt model, and decomposes
More informationThe Analysis of Covariance. ERSH 8310 Keppel and Wickens Chapter 15
The Analyss of Covarance ERSH 830 Keppel and Wckens Chapter 5 Today s Class Intal Consderatons Covarance and Lnear Regresson The Lnear Regresson Equaton TheAnalyss of Covarance Assumptons Underlyng the
More informationAnswer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy
4.02 Quz Solutons Fall 2004 MultpleChoce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multplechoce questons. For each queston, only one of the answers s correct.
More informationOptimal resource capacity management for stochastic networks
Submtted for publcaton. Optmal resource capacty management for stochastc networks A.B. Deker H. Mlton Stewart School of ISyE, Georga Insttute of Technology, Atlanta, GA 30332, ton.deker@sye.gatech.edu
More informationMARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS
MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS Tmothy J. Glbrde Assstant Professor of Marketng 315 Mendoza College of Busness Unversty of Notre Dame Notre Dame, IN 46556
More informationThe OC Curve of Attribute Acceptance Plans
The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4
More informationVision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION
Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble
More informationThe covariance is the two variable analog to the variance. The formula for the covariance between two variables is
Regresson Lectures So far we have talked only about statstcs that descrbe one varable. What we are gong to be dscussng for much of the remander of the course s relatonshps between two or more varables.
More information