Accuracy at the Top


Accuracy at the Top

Stephen Boyd
Stanford University
Packard 264
Stanford, CA

Mehryar Mohri
Courant Institute and Google
251 Mercer Street
New York, NY 10012
mohri@cims.nyu.edu

Corinna Cortes
Google Research
76 Ninth Avenue
New York, NY 10011
corinna@google.com

Ana Radovanovic
Google Research
76 Ninth Avenue
New York, NY 10011
anaradovanovic@google.com

Abstract

We introduce a new notion of classification accuracy based on the top $\tau$-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top $\tau$-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top.

1 Introduction

The accuracy of the items placed near the top is crucial for many information retrieval systems such as search engines or recommendation systems, since most users of these systems browse or consider only the first $k$ items. Different criteria have been introduced in the past to measure this quality, including the precision at $k$ (Precision@$k$), the normalized discounted cumulative gain (NDCG) and other variants of DCG, or the mean reciprocal rank (MRR) when the rank of the most relevant document is critical. A somewhat different but also related criterion adopted by [21] is based on the position of the top irrelevant item.

Several machine learning algorithms have been recently designed to optimize these criteria and other related ones [6, 11, 12, 21, 7, 14, 13]. A general algorithm inspired by the structured prediction technique SVMStruct [22] was incorporated in an algorithm by [15] which can be used to optimize a convex upper bound on the number of errors among the top $k$ items. The algorithm seeks to solve a convex problem with exponentially many constraints via several rounds of optimization with a smaller number of constraints, augmenting the set of constraints at each round with the most violating one. Another algorithm, also based on structured prediction ideas, is proposed in an unpublished manuscript of [19] and covers several criteria, including Precision@$k$ and NDCG. A regression-based solution is suggested by [10] for DCG in the case of large sample sizes. Some other methods have also been proposed to optimize a smooth version of a non-convex cost function in this context [8]. [1] discusses an optimization solution for an algorithm seeking to minimize the position of the top irrelevant item.

However, one obvious shortcoming of all these algorithms is that the notion of top $k$ does not generalize to new data. For what $k$ should one train if the test data in some instances is half the size and in other cases twice the size? In fact, no generalization guarantee is available for such Precision@$k$ optimization or algorithm. A more principled approach in all the applications already mentioned consists of designing algorithms that optimize accuracy in some top fraction of the scores returned by a real-valued hypothesis. This paper deals precisely with this problem.

The desired objective is to learn a scoring function that is as accurate as possible for the items whose scores are above the top $\tau$-quantile. To be more specific, when applied to a set of size $n$, the number of top items is $k = \tau n$ for a $\tau$-quantile, while for a different set of size $n' \neq n$, this would correspond to $k' = \tau n' \neq k$. The implementation of the Precision@$k$ algorithm of [15] indirectly acknowledges the problem that the notion of top $k$ does not generalize, since the command-line flag requires $k$ to be specified as a fraction of the positive samples. Nevertheless, the formulation of the problem as well as the solution are still in terms of the top $k$ items of the training set.

A study of various statistical questions related to the problem of accuracy at the top is discussed by [9]. The authors also present generalization bounds for the specific case of empirical risk minimization (ERM) under some assumptions about the hypothesis set and the distribution. But, to our knowledge, no previous publication has given general learning guarantees for the problem of accuracy in the top quantile scoring items or carefully addressed the corresponding algorithmic problem.

We discuss the formulation of this problem (Section 3.1) and define an algorithm optimizing a convex surrogate of the corresponding loss in the case of linear scoring functions. We discuss the solution of this problem in terms of several simple convex optimization problems and show that these problems can be extended to the case where positive semi-definite kernels are used (Section 3.2). In Section 4, we present a Rademacher complexity analysis of the problem and give margin-based guarantees for our algorithm based on the $\tau$-quantile value of the functions in the hypothesis set. In Section 5, we also report the results of several experiments evaluating the performance of our algorithm. In a comparison in a bipartite setting with several algorithms seeking high precision at the top, our algorithm achieves a better performance in precision at the top. We start with a presentation of notions and notation useful for the discussion in the following sections.

2 Preliminaries

Let $X$ denote the input space and $D$ a distribution over $X \times X$. We interpret the presence of a pair $(x, x')$ in the support of $D$ as the preference of $x'$ over $x$. We denote by $S = ((x_1, x'_1), \ldots, (x_m, x'_m)) \in (X \times X)^m$ a labeled sample of size $m$ drawn i.i.d. according to $D$ and denote by $\hat{D}$ the corresponding empirical distribution. $D$ induces a marginal distribution over $X$ that we denote by $D_0$, which in the discrete case can be defined via

$$D_0(x) = \frac{1}{2} \sum_{x' \in X} \big( D(x, x') + D(x', x) \big).$$

We also denote by $\hat{D}_0$ the empirical distribution associated to $D_0$ based on the sample $S$. The learning problems we are studying are defined in terms of the top $\tau$-quantile of the values taken by a function $h \colon X \to \mathbb{R}$, that is a score $q$ such that $\Pr_{x \sim D_0}[h(x) > q] = \tau$ (see Figure 1(a)). In general, $q$ is not unique and this equality may hold for all $q$ in an interval $[q_{\min}, q_{\max}]$. We will be particularly interested in the properties of the set of points $x$ whose scores are above a quantile, that is $s_q = \{x \colon h(x) > q\}$.
Since for any $(q, q') \in [q_{\min}, q_{\max}]^2$, $s_q$ and $s_{q'}$ differ only by a set of measure zero, the particular choice of $q$ in that interval has no significant consequence. Thus, in what follows, when it is not unique, we will choose the quantile value to be the maximum, $q_{\max}$.

For any $\tau \in [0, 1]$, let $\varphi_\tau$ denote the function defined by $\forall u \in \mathbb{R}$, $\varphi_\tau(u) = \tau (u)_+ - (1 - \tau)(u)_-$, where $(u)_+ = \max(u, 0)$ and $(u)_- = \min(u, 0)$ (see Figure 1(b)). $\varphi_\tau$ is convex as a sum of two convex functions, since $u \mapsto (u)_+$ is convex and $u \mapsto (u)_-$ concave. We will denote by $\operatorname{argmin}_{u \in \mathbb{R}} f(u)$ the largest minimizer of a function $f$. It is known (see for example [17]) that the (maximum) $\tau$-quantile value $\hat{q}$ of a sample $X = (u_1, \ldots, u_n) \in \mathbb{R}^n$ can be given by $\hat{q} = \operatorname{argmin}_{u \in \mathbb{R}} F(u)$, where $F$ is the convex function defined for all $u \in \mathbb{R}$ by $F(u) = \frac{1}{n} \sum_{i=1}^{n} \varphi_\tau(u_i - u)$.
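As a quick illustration of this characterization (our sketch, not the authors' code; the function names are ours): $F$ is convex and piecewise linear with breakpoints at the sample values, so its largest minimizer can be found by evaluating $F$ at the scores themselves.

```python
import numpy as np

def phi_tau(u, tau):
    # phi_tau(u) = tau * (u)_+ - (1 - tau) * (u)_-
    return tau * np.maximum(u, 0.0) - (1.0 - tau) * np.minimum(u, 0.0)

def top_quantile(scores, tau):
    """Largest minimizer of F(u) = (1/n) sum_i phi_tau(u_i - u).

    F is convex and piecewise linear with breakpoints at the sample
    values, so it suffices to evaluate it at the scores themselves.
    """
    u = np.asarray(scores, dtype=float)
    F = np.array([phi_tau(u - c, tau).mean() for c in u])
    return u[np.isclose(F, F.min())].max()
```

This check-loss characterization of sample quantiles is the standard one from quantile regression [17].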

[Figure 1: (a) Illustration of the $\tau$-quantile. (b) Graph of the function $\varphi_\tau$ for $\tau = .25$.]

3 Accuracy at the top (AATP)

3.1 Problem formulation and algorithm

The learning problem we consider is that of accuracy at the top (AATP), which consists of achieving an ordering of all items so that items whose scores are among the top $\tau$-quantile are as relevant as possible. Ideally, all preferred items are ranked above the quantile and non-preferred ones ranked below. Thus, the loss or generalization error of a hypothesis $h \colon X \to \mathbb{R}$ with top $\tau$-quantile value $q_h$ is the average number of non-preferred elements that $h$ ranks above $q_h$ and preferred ones it ranks below:

$$R(h) = \frac{1}{2} \operatorname*{E}_{(x, x') \sim D} \big[ 1_{h(x) > q_h} + 1_{h(x') < q_h} \big],$$

where $q_h$ can be defined as follows in terms of the distribution $D_0$: $q_h = \operatorname{argmin}_{u \in \mathbb{R}} \operatorname{E}_{x \sim D_0}[\varphi_\tau(h(x) - u)]$.

The quantile value $q_h$ depends on the true distribution $D$. To define the empirical error of $h$ for a sample $S = ((x_1, x'_1), \ldots, (x_m, x'_m)) \in (X \times X)^m$, we will use instead an empirical estimate $\hat{q}_h$ of $q_h$: $\hat{q}_h = \operatorname{argmin}_{u \in \mathbb{R}} \operatorname{E}_{x \sim \hat{D}_0}[\varphi_\tau(h(x) - u)]$. Thus, we define the empirical error of $h$ for a labeled sample as follows:

$$\hat{R}(h) = \frac{1}{2m} \sum_{i=1}^{m} \big[ 1_{h(x_i) > \hat{q}_h} + 1_{h(x'_i) < \hat{q}_h} \big].$$

We first assume that $X$ is a subset of $\mathbb{R}^N$ for some $N \geq 1$ and consider a hypothesis set $H$ of linear functions $h \colon x \mapsto w \cdot x$. We will use a surrogate empirical loss taking into consideration how much the score $w \cdot x_i$ of a non-preferred item $x_i$ exceeds $\hat{q}_h$, and similarly how much lower the score $w \cdot x'_i$ of a preferred point $x'_i$ is than $\hat{q}_h$, and seek a solution $w$ minimizing a trade-off of that surrogate loss and the norm squared $\|w\|^2$. This leads to the following optimization problem for AATP:

$$\min_{w} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \big[ (1 + w \cdot x_i - \hat{q}_w)_+ + (1 + \hat{q}_w - w \cdot x'_i)_+ \big] \quad (1)$$
$$\text{subject to} \quad \hat{q}_w = \operatorname{argmin}_{u \in \mathbb{R}} Q_\tau(w, u),$$

where $C \geq 0$ is a regularization parameter and $Q_\tau$ the quantile function defined as follows for a sample $S$, for any $w \in \mathbb{R}^N$ and $u \in \mathbb{R}$:

$$Q_\tau(w, u) = \frac{1}{2m} \sum_{i=1}^{m} \big[ \varphi_\tau(w \cdot x_i - u) + \varphi_\tau(w \cdot x'_i - u) \big].$$

In the following, we will assume that $\tau$ is a multiple of $1/(2m)$; otherwise, it can be rounded to the nearest such value.
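Note that for a fixed $w$ the quantile constraint is easy to evaluate: $\hat{q}_w$ is simply the (maximum) empirical quantile of the $2m$ training scores. A small sketch (ours), reusing `top_quantile` from the earlier snippet:

```python
import numpy as np

def q_hat(w, X_neg, X_pos, tau):
    """qhat_w = argmin_u Q_tau(w, u): the empirical quantile of the 2m scores."""
    scores = np.concatenate([X_neg @ w, X_pos @ w])
    return top_quantile(scores, tau)
```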

3.2 Analysis of the optimization problem

Problem (1) is not a convex optimization problem since, while the objective function is convex, the equality constraint is not affine. Here, we further analyze the problem and discuss a solution.

The equality constraint could be written as an infinite number of inequalities: $Q_\tau(w, \hat{q}_w) \leq Q_\tau(w, u)$ for all $u \in \mathbb{R}$. Observe, however, that the quantile value $\hat{q}_w$ must coincide with the score of one of the training points $x_k$ or $x'_k$, that is $w \cdot x_k$ or $w \cdot x'_k$. Thus, Problem (1) can be equivalently written with a finite number of constraints as follows:

$$\min_{w} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \big[ (1 + w \cdot x_i - \hat{q}_w)_+ + (1 + \hat{q}_w - w \cdot x'_i)_+ \big]$$
$$\text{subject to} \quad \hat{q}_w \in \{ w \cdot x_k, \, w \cdot x'_k \colon k \in [1, m] \},$$
$$\forall k \in [1, m], \ Q_\tau(w, \hat{q}_w) \leq Q_\tau(w, w \cdot x_k), \qquad \forall k \in [1, m], \ Q_\tau(w, \hat{q}_w) \leq Q_\tau(w, w \cdot x'_k).$$

The inequality constraints do not correspond to non-positivity constraints on convex functions. Thus, the problem is not a standard convex optimization problem, but our analysis leads us to a simple approximate solution. For convenience, let $(z_1, \ldots, z_{2m})$ denote $(x_1, \ldots, x_m, x'_1, \ldots, x'_m)$. Our method consists of solving the convex quadratic programming (QP) problem for each value of $k \in [1, 2m]$:

$$\min_{w} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \big[ (1 + w \cdot x_i - \hat{q}_w)_+ + (1 + \hat{q}_w - w \cdot x'_i)_+ \big] \quad (2)$$
$$\text{subject to} \quad \hat{q}_w = w \cdot z_k.$$

Let $w_k$ be the solution of Problem (2). For each $k \in [1, 2m]$, we determine the $\tau$-quantile value of the scores $\{w_k \cdot z_i \colon i \in [1, 2m]\}$. This can be checked straightforwardly in time $O(m \log m)$ by sorting the scores. Then, the solution $w$ we return is the $w_k$ for which $w_k \cdot z_k$ is closest to the $\tau$-quantile value, and, in the presence of ties, the one for which the objective function is the smallest. The method for determining $w$ is thus based on the solution of $2m$ simple QPs. Our solution naturally parallelizes, so that on a distributed computing environment the computational time for solving the problem can be reduced to roughly the same as that of solving a single QP.

3.3 Kernelized formulation

For any $i \in [1, 2m]$, let $y_i = -1$ if $i \leq m$ and $y_i = +1$ otherwise. Then, Problem (2) admits the following equivalent dual optimization problem, similar to that of SVMs:

$$\max_{\alpha} \ \sum_{i=1}^{2m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{2m} \alpha_i \alpha_j y_i y_j (z_i - z_k) \cdot (z_j - z_k) \quad (3)$$
$$\text{subject to:} \quad \forall i \in [1, 2m], \ 0 \leq \alpha_i \leq C,$$

which depends only on inner products between points of the training set. The vector $w$ can be obtained from the solution via $w = \sum_{i=1}^{2m} \alpha_i y_i (z_i - z_k)$. The algorithm can therefore be generalized by using equivalently any positive semi-definite symmetric (PDS) kernel $K \colon X \times X \to \mathbb{R}$ instead of the inner product in the input space, thereby also extending it to the case of non-vectorial input spaces $X$. The corresponding hypothesis set $H$ is that of linear functions $h \colon x \mapsto w \cdot \Phi(x)$, where $\Phi \colon X \to \mathbb{H}$ is a feature mapping to a Hilbert space $\mathbb{H}$ associated to $K$ and $w$ an element of $\mathbb{H}$. In view of (3), for any $k \in [1, 2m]$, the dual problem of (2) can then be expressed as follows:

$$\max_{\alpha} \ \sum_{i=1}^{2m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{2m} \alpha_i \alpha_j y_i y_j K_k(z_i, z_j) \quad (4)$$
$$\text{subject to:} \quad \forall i \in [1, 2m], \ 0 \leq \alpha_i \leq C,$$

where, for any $k \in [1, 2m]$, $K_k$ is the PDS kernel defined by

$$K_k \colon (z, z') \mapsto K(z, z') - K(z, z_k) - K(z_k, z') + K(z_k, z_k).$$

Our solution can therefore also be found in the dual by solving the QPs defined by (4).
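Putting Section 3.2 together, the following is a minimal sketch of the primal procedure using cvxpy (the paper's experiments used Matlab with CVX); the function and variable names are ours, and the hinge terms follow the formulation of (2) as reconstructed above, so treat the code as illustrative rather than as the authors' implementation:

```python
import cvxpy as cp
import numpy as np

def aatp_fit(X_neg, X_pos, tau, C):
    """Approximate AATP solver: one QP per candidate quantile point z_k."""
    Z = np.vstack([X_neg, X_pos])              # (z_1, ..., z_2m)
    n, d = Z.shape
    best = None
    for k in range(n):
        w = cp.Variable(d)
        q = Z[k] @ w                           # enforce qhat_w = w . z_k
        hinge = cp.sum(cp.pos(1 + X_neg @ w - q)) + cp.sum(cp.pos(1 + q - X_pos @ w))
        problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * hinge))
        problem.solve()
        scores = Z @ w.value
        # empirical top tau-quantile of the 2m scores, found by sorting
        q_tau = np.sort(scores)[max(int(np.ceil((1 - tau) * n)) - 1, 0)]
        gap = abs(scores[k] - q_tau)
        # keep the w_k whose constrained score is closest to the quantile,
        # breaking ties by the smaller objective value
        if best is None or (gap, problem.value) < (best[0], best[1]):
            best = (gap, problem.value, w.value)
    return best[2]
```

Since the $2m$ subproblems are independent, the loop parallelizes trivially, which is exactly the property exploited in a distributed environment.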

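For the kernelized variant of Section 3.3, the only new ingredient is the Gram matrix of the shifted kernel $K_k$; given the base Gram matrix over $(z_1, \ldots, z_{2m})$, it is one line of numpy (again our sketch):

```python
import numpy as np

def shifted_gram(K, k):
    """Gram matrix of K_k(z, z') = K(z,z') - K(z,z_k) - K(z_k,z') + K(z_k,z_k).

    This is the Gram matrix of the feature map z -> Phi(z) - Phi(z_k),
    which is why K_k remains positive semi-definite.
    """
    return K - K[:, [k]] - K[[k], :] + K[k, k]
```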
4 Theoretical guarantees

We here present margin-based generalization bounds for the AATP learning problem.

Let $\Phi_\rho \colon \mathbb{R} \to [0, 1]$ be the function defined by $\Phi_\rho \colon x \mapsto 1_{x \leq 0} + (1 - x/\rho)_+ \, 1_{x > 0}$. For any $\rho > 0$ and $t \in \mathbb{R}$, we define the generalization error $R(h, t)$ and the empirical margin loss $\hat{R}_\rho(h, t)$, both with respect to $t$, by

$$R(h, t) = \frac{1}{2} \operatorname*{E}_{(x, x') \sim D} \big[ 1_{h(x) > t} + 1_{h(x') < t} \big], \qquad \hat{R}_\rho(h, t) = \frac{1}{2m} \sum_{i=1}^{m} \big[ \Phi_\rho(t - h(x_i)) + \Phi_\rho(h(x'_i) - t) \big].$$

In particular, $R(h, q_h)$ corresponds to the generalization error and $\hat{R}_\rho(h, q_h)$ to the empirical margin loss of a hypothesis $h$ for AATP. For any $t > 0$, the empirical margin loss $\hat{R}_\rho(h, t)$ is upper bounded by the average of the fraction of non-preferred elements $x_i$ that $h$ ranks above $t$ or less than $\rho$ below $t$, and the fraction of preferred ones $x'_i$ it ranks below $t$ or less than $\rho$ above $t$:

$$\hat{R}_\rho(h, t) \leq \frac{1}{2m} \sum_{i=1}^{m} \big[ 1_{t - h(x_i) < \rho} + 1_{h(x'_i) - t < \rho} \big]. \quad (5)$$

We denote by $D_1$ the marginal distribution of the first element of the pairs in $X \times X$ derived from $D$, and by $D_2$ the marginal distribution with respect to the second element. Similarly, $S_1$ is the sample derived from $S$ by keeping only the first element of each pair, $S_1 = (x_1, \ldots, x_m)$, and $S_2$ the one obtained by keeping only the second element, $S_2 = (x'_1, \ldots, x'_m)$. We also denote by $\mathfrak{R}_m^{D_1}(H)$ the Rademacher complexity of $H$ with respect to the marginal distribution $D_1$, that is $\mathfrak{R}_m^{D_1}(H) = \operatorname{E}[\hat{\mathfrak{R}}_{S_1}(H)]$, and similarly $\mathfrak{R}_m^{D_2}(H) = \operatorname{E}[\hat{\mathfrak{R}}_{S_2}(H)]$.

Theorem 1 Let $H$ be a set of real-valued functions taking values in $[-M, +M]$ for some $M > 0$. Fix $\tau \in [0, 1]$ and $\rho > 0$; then, for any $\delta > 0$, with probability at least $1 - \delta$ over the choice of a sample $S$ of size $m$, each of the following inequalities holds for all $h \in H$ and $t \in [-M, +M]$:

$$R(h, t) \leq \hat{R}_\rho(h, t) + \frac{1}{\rho} \big[ \mathfrak{R}_m^{D_1}(H) + \mathfrak{R}_m^{D_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + \sqrt{\frac{\log(1/\delta)}{2m}}$$
$$R(h, t) \leq \hat{R}_\rho(h, t) + \frac{1}{\rho} \big[ \hat{\mathfrak{R}}_{S_1}(H) + \hat{\mathfrak{R}}_{S_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + 3 \sqrt{\frac{\log(2/\delta)}{2m}}.$$

Proof. Let $\tilde{H}$ be the family of hypotheses mapping $(X \times X)$ to $\mathbb{R}$ defined by $\tilde{H} = \{ z = (x, x') \mapsto t - h(x) \colon h \in H, t \in [-M, +M] \}$, and similarly $\tilde{H}' = \{ z = (x, x') \mapsto h(x') - t \colon h \in H, t \in [-M, +M] \}$. Consider the two families of functions taking values in $[0, 1]$ defined by $\bar{H} = \{ \Phi_\rho \circ f \colon f \in \tilde{H} \}$ and $\bar{H}' = \{ \Phi_\rho \circ f \colon f \in \tilde{H}' \}$. By the general Rademacher complexity bounds for functions taking values in $[0, 1]$ [18, 3, 20], with probability at least $1 - \delta$,

$$\frac{1}{2} \operatorname{E} \big[ \Phi_\rho(t - h(x)) + \Phi_\rho(h(x') - t) \big] \leq \hat{R}_\rho(h, t) + \mathfrak{R}_m(\bar{H}) + \mathfrak{R}_m(\bar{H}') + \sqrt{\frac{\log(1/\delta)}{2m}},$$

for all $h \in H$. Since $1_{u < 0} \leq \Phi_\rho(u)$ for all $u \in \mathbb{R}$, the generalization error $R(h, t)$ is a lower bound on the left-hand side, $R(h, t) \leq \frac{1}{2} \operatorname{E}[\Phi_\rho(t - h(x)) + \Phi_\rho(h(x') - t)]$, and we obtain

$$R(h, t) \leq \hat{R}_\rho(h, t) + \mathfrak{R}_m(\bar{H}) + \mathfrak{R}_m(\bar{H}') + \sqrt{\frac{\log(1/\delta)}{2m}}.$$

Since $\Phi_\rho$ is $1/\rho$-Lipschitz, by Talagrand's contraction lemma, we have $\mathfrak{R}_m(\bar{H}) \leq (1/\rho)\,\mathfrak{R}_m(\tilde{H})$ and $\mathfrak{R}_m(\bar{H}') \leq (1/\rho)\,\mathfrak{R}_m(\tilde{H}')$. By definition of the Rademacher complexity,

$$\mathfrak{R}_m(\tilde{H}) = \frac{1}{m} \operatorname*{E}_{S \sim D^m,\, \sigma} \Big[ \sup_{h \in H,\, t} \sum_{i=1}^{m} \sigma_i (t - h(x_i)) \Big] = \frac{1}{m} \operatorname{E} \Big[ \sup_{t \in [-M, +M]} t \sum_{i=1}^{m} \sigma_i \Big] + \frac{1}{m} \operatorname{E} \Big[ \sup_{h \in H} \sum_{i=1}^{m} (-\sigma_i) h(x_i) \Big].$$

Since the random variables $\sigma_i$ and $-\sigma_i$ follow the same distribution, the second term coincides with $\mathfrak{R}_m^{D_1}(H)$. The first term can be rewritten and upper bounded as follows using Jensen's inequality:

$$\frac{1}{m} \operatorname{E} \Big[ \sup_{t \in [-M, +M]} t \sum_{i=1}^{m} \sigma_i \Big] = \frac{M}{m} \operatorname{E} \Big[ \Big| \sum_{i=1}^{m} \sigma_i \Big| \Big] \leq \frac{M}{m} \sqrt{ \operatorname{E} \Big[ \Big( \sum_{i=1}^{m} \sigma_i \Big)^{2} \Big] } = \frac{M}{m} \sqrt{m} = \frac{M}{\sqrt{m}}.$$

Note that, by the Kahane-Khintchine inequality, the last upper bound used is tight modulo a constant ($1/\sqrt{2}$). Similarly, we can show that $\mathfrak{R}_m(\tilde{H}') \leq \mathfrak{R}_m^{D_2}(H) + M/\sqrt{m}$. This proves the first inequality of the theorem; the second inequality can be derived from the first one using the standard bound relating the empirical and true Rademacher complexities.

Since the bounds of the theorem hold uniformly for all $t \in [-M, +M]$, they hold in particular for any quantile value $q_h$.

Corollary 2 (Margin bounds for AATP) Let $H$ be a set of real-valued functions taking values in $[-M, +M]$ for some $M > 0$. Fix $\tau \in [0, 1]$ and $\rho > 0$; then, for any $\delta > 0$, with probability at least $1 - \delta$ over the choice of a sample $S$ of size $m$, for all $h \in H$ it holds that:

$$R(h) \leq \hat{R}_\rho(h, q_h) + \frac{1}{\rho} \big[ \mathfrak{R}_m^{D_1}(H) + \mathfrak{R}_m^{D_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + \sqrt{\frac{\log(1/\delta)}{2m}}$$
$$R(h) \leq \hat{R}_\rho(h, q_h) + \frac{1}{\rho} \big[ \hat{\mathfrak{R}}_{S_1}(H) + \hat{\mathfrak{R}}_{S_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + 3 \sqrt{\frac{\log(2/\delta)}{2m}}.$$

A more explicit version of this corollary can be derived for kernel-based hypotheses (Appendix A). In the results of the previous theorem and corollary, the right-hand side of the generalization bounds is expressed in terms of the empirical margin loss with respect to the true quantile value $q_h$, which is upper bounded (see (5)) by half the fraction of non-preferred points in the sample whose score is above $q_h - \rho$ and half the fraction of the preferred points whose score is less than $q_h + \rho$. These fractions are close to the same fractions with $q_h$ replaced by $\hat{q}_h$, since the probability that a score falls between $q_h$ and $\hat{q}_h$ can be shown to be uniformly bounded by a term in $O(1/\sqrt{m})$.¹ Altogether, this analysis provides strong support for our algorithm, which is precisely seeking to minimize the sum of an empirical margin loss based on the quantile and a term that depends on the complexity, as in the right-hand side of the learning guarantees above.

5 Experiments

This section reports the results of experiments with our AATP algorithm on several datasets. To measure its effectiveness, we compare it to two other algorithms, the INFINITEPUSH algorithm [1] and the SVMPERF algorithm [15], which are both algorithms seeking to emphasize the accuracy near the top. Our experiments are carried out using three data sets from the UC Irvine Machine Learning Repository: Ionosphere, Housing, and Spambase (results for Spambase can be found in Appendix C). In addition, we use the TREC 2003 (LETOR 2.0) data set, which is available for download from the Microsoft Research LETOR website.

All the UC Irvine data sets we experiment with are for two-group classification problems. From these we construct bipartite ranking problems where a preference pair consists of one positive and one negative example, as illustrated in the sketch following this preamble. To explicitly indicate the dependency on the quantile, we denote by $q_\tau$ the value of the top $\tau$-th quantile of the score distribution of a hypothesis. We will use $N$ to denote the number of instances in a particular data set, as well as $s_i$, $i = 1, \ldots, N$, to denote the particular score values. If $n_+$ denotes the number of positive examples in the data set and $n_-$ the number of negative examples, then $N = n_+ + n_-$ and the number of preferences is $m = n_+ n_-$.

¹ Note that the Bahadur-Kiefer representation is known to provide a uniform convergence bound on the difference of the true and empirical quantiles when the distribution admits a density [2, 16], a stronger result than what is needed in our context.
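The bipartite construction is mechanical; a small sketch (ours) that enumerates the $m = n_+ n_-$ preference pairs from a binary-labeled data set:

```python
import numpy as np

def bipartite_pairs(X, y):
    """All preference pairs (x, x'): x negative (non-preferred), x' positive."""
    X_neg, X_pos = X[y != 1], X[y == 1]
    i, j = np.meshgrid(np.arange(len(X_neg)), np.arange(len(X_pos)), indexing="ij")
    return X_neg[i.ravel()], X_pos[j.ravel()]   # each of length n_plus * n_minus
```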

[Table 1: Ionosphere data: for each top quantile $\tau$ and each evaluation metric, the three rows correspond to AATP (top), SVMPERF (middle) and INFINITEPUSH (bottom). For the INFINITEPUSH algorithm we only report mean values over the folds.]

5.1 Implementation

We solved the convex optimization problems (2) using the CVX solver. As already noted, the AATP problem can be solved efficiently using a distributed computing environment. The convex optimization problem of the INFINITEPUSH algorithm (see (3.9) of [1]) can also be solved using CVX. However, this optimization problem has as many variables as the product of the numbers of positively and negatively labeled instances ($n_+ n_-$), which makes it prohibitive to solve for large data sets within a runtime of a few days. Thus, we experimented with the INFINITEPUSH algorithm only on the Ionosphere data set. Finally, for SVMPERF's training and score prediction we used the binary executables downloaded from the author's website and used the SVMPERF settings that are the closest to our optimization formulation. Thus, we used the L1-norm for slack variables and allowed the constraint cache and the tolerance for the termination criterion to grow in order to control the algorithm's convergence, especially for large values of the regularization constant.

5.2 Evaluation measures

To evaluate and compare the AATP, INFINITEPUSH, and SVMPERF algorithms, we used a number of standard metrics: precision at the top (P@$\tau$), average precision (AP), number of positives at the absolute top (Positives@top), discounted cumulative gain (DCG@$\tau$), and normalized discounted cumulative gain (NDCG@$\tau$). Definitions are included in Appendix B.

5.3 Ionosphere data

The data set's 351 instances represent radar signals collected from phased antennas, where good signals (225 positively labeled instances) are those that reflect back toward the antennas and bad signals (126 negatively labeled instances) are those that pass through the ionosphere. The data has 34 features. We split the data set into 10 independent sets of instances, say $S_1, \ldots, S_{10}$. Then, we ran 10 experiments, where we used 3 consecutive sets for learning and the rest (7 sets) for testing. We evaluated and compared the algorithms for 5 different top quantiles $\tau \in \{19, 14, 9.5, 5, 1\}$ (%), which correspond to the top 20, 15, 10, 5, and 1 items, respectively. For each $\tau$, the regularization parameter $C$ was selected based on the average value of P@$\tau$.

The performance of AATP is significantly better than that of the other algorithms, particularly for the smallest top quantiles. The two main criteria on which to evaluate the AATP algorithm are precision at the top (P@$\tau$) and number of positives at the top (Positives@top). For $\tau = 5\%$ the AATP algorithm obtains a stellar accuracy of over 90% with an average of 3.3 positive elements at the top (Table 1).
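Appendix B (not reproduced in this transcription) contains the formal definitions of these measures. For concreteness, here is a sketch of the two headline criteria under the standard definitions we assume (ours, not the paper's code): P@$\tau$ as the fraction of positives among the top $\lceil \tau N \rceil$ scored items, and Positives@top as the number of positives ranked above the highest-scoring negative:

```python
import numpy as np

def precision_at_tau(scores, labels, tau):
    """Fraction of positives among the ceil(tau * N) highest-scoring items."""
    order = np.argsort(-scores)               # indices sorted by decreasing score
    k = int(np.ceil(tau * len(scores)))
    return float(np.mean(labels[order[:k]] == 1))

def positives_at_top(scores, labels):
    """Number of positives scored strictly above every negative."""
    top_negative = scores[labels != 1].max()
    return int(np.sum((labels == 1) & (scores > top_negative)))
```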

[Table 2: Housing data: for each quantile value $\tau$ and each evaluation metric, there are two rows corresponding to AATP (top) and SVMPERF (bottom).]

5.4 Housing data

The Boston Housing data set has 506 examples, 35 positive and 471 negative, described by 13 features. We used feature 4 as the binary target value. Two thirds of the data instances were randomly selected and used for training, and the rest for testing. We created 10 experimental folds analogously to the case of the Ionosphere data. The Housing data is very unbalanced, with less than 7% positive examples. For this data set we obtain results very comparable to SVMPERF for the very top quantiles; see Table 2. Naturally, the standard deviations are large as a result of the low percentage of positive examples, so the results are not always significant. For higher top quantiles, e.g., the top 4%, the AATP algorithm significantly outperforms SVMPERF, obtaining over 90% accuracy at the top (P@$\tau$). For the highest top quantiles, the difference in performance between the two algorithms is not significant.

5.5 LETOR 2.0

This data set corresponds to a relatively hard ranking problem, with an average of only about 1% relevant query-URL pairs per query. It consists of 5 folds. Our Matlab implementation (with CVX) of the algorithms prevented us from trying our approach on large data sets. Hence, from each training fold we randomly selected 500 items for training. For testing, we selected 1000 items at random from the test fold. Here, we only report results for P@1%. SVMPERF obtained an accuracy of 12.5% ± 1.5%, while the AATP algorithm obtained an accuracy of 14.6% ± 1.4%. This significantly better result indicates the power of the proposed algorithm.

6 Conclusion

We presented a series of results for the problem of accuracy at the top quantile, including an AATP algorithm, a margin-based theoretical analysis in support of that algorithm, and a series of experiments with several data sets demonstrating the effectiveness of our algorithm. These results are of practical interest in applications where accuracy among the top quantile is sought. The analysis of problems based on other loss functions depending on the top $\tau$-quantile scores is also likely to benefit from the theoretical and algorithmic results we presented. The optimization algorithm we discussed is highly parallelizable, since it is based on solving independent QPs. Our initial experiments reported here were carried out using Matlab with CVX, which prevented us from evaluating our approach on larger data sets, such as the full LETOR 2.0 data set. However, we have now designed a solution for very large $m$ based on the ADMM (Alternating Direction Method of Multipliers) framework [4]. We have implemented that solution and will present and discuss it in future work.

References

[1] S. Agarwal. The infinite push: A new support vector ranking algorithm that directly optimizes accuracy at the absolute top of the list. In Proceedings of the SIAM International Conference on Data Mining, 2011.
[2] R. R. Bahadur. A note on quantiles in large samples. Annals of Mathematical Statistics, 37, 1966.
[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[6] J. S. Breese, D. Heckerman, and C. M. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI '98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 1998.
[7] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 89-96, New York, NY, USA, 2005. ACM.
[8] C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In NIPS, pages 193-200, 2006.
[9] S. Clémençon and N. Vayatis. Ranking the best instances. Journal of Machine Learning Research, 8:2671-2699, 2007.
[10] D. Cossock and T. Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140-5154, 2008.
[11] K. Crammer and Y. Singer. PRanking with ranking. In Neural Information Processing Systems (NIPS 2001). MIT Press, 2001.
[12] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, December 2003.
[13] R. Herbrich, K. Obermayer, and T. Graepel. Advances in Large Margin Classifiers, chapter Large Margin Rank Boundaries for Ordinal Regression. MIT Press, 2000.
[14] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pages 133-142, New York, NY, USA, 2002. ACM.
[15] T. Joachims. A support vector method for multivariate performance measures. In ICML, pages 377-384, 2005.
[16] J. Kiefer. On Bahadur's representation of sample quantiles. Annals of Mathematical Statistics, 38, 1967.
[17] R. Koenker. Quantile Regression. Cambridge University Press, 2005.
[18] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.
[19] Q. V. Le, A. Smola, O. Chapelle, and C. H. Teo. Optimization of ranking measures. Unpublished manuscript, 2009.
[20] M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. The MIT Press, 2012.
[21] C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire. Margin-based ranking meets boosting in the middle. In COLT, pages 63-78, 2005.
[22] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453-1484, 2005.
