Accuracy at the Top


Accuracy at the Top

Stephen Boyd
Stanford University
Packard 264
Stanford, CA

Mehryar Mohri
Courant Institute and Google
251 Mercer Street
New York, NY 10012
mohri@cims.nyu.edu

Corinna Cortes
Google Research
76 Ninth Avenue
New York, NY 10011
corinna@google.com

Ana Radovanovic
Google Research
76 Ninth Avenue
New York, NY 10011
anaradovanovic@google.com

Abstract

We introduce a new notion of classification accuracy based on the top $\tau$-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top $\tau$-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top.

1 Introduction

The accuracy of the items placed near the top is crucial for many information retrieval systems such as search engines or recommendation systems, since most users of these systems browse or consider only the first $k$ items. Different criteria have been introduced in the past to measure this quality, including the precision at $k$ (Precision@$k$), the normalized discounted cumulative gain (NDCG) and other variants of DCG, or the mean reciprocal rank (MRR) when the rank of the most relevant document is critical. A somewhat different but also related criterion adopted by [21] is based on the position of the top irrelevant item.

Several machine learning algorithms have been recently designed to optimize these criteria and other related ones [6, 11, 12, 21, 7, 14, 13]. A general algorithm inspired by the structured prediction technique SVMStruct [22] was incorporated in an algorithm by [15] which can be used to optimize a convex upper bound on the number of errors among the top $k$ items. The algorithm seeks to solve a convex problem with exponentially many constraints via several rounds of optimization with a smaller number of constraints, augmenting the set of constraints at each round with the most violating one. Another algorithm, also based on structured prediction ideas, is proposed in an unpublished manuscript of [19] and covers several criteria, including Precision@$k$ and NDCG. A regression-based solution is suggested by [10] for DCG in the case of large sample sizes. Some other methods have also been proposed to optimize a smooth version of a non-convex cost function in this context [8]. [1] discusses an optimization solution for an algorithm seeking to minimize the position of the top irrelevant item.

However, one obvious shortcoming of all these algorithms is that the notion of top $k$ does not generalize to new data. For what $k$ should one train if the test data in some instances is half the size and in other cases twice the size? In fact, no generalization guarantee is available for such Precision@$k$ optimization or algorithm. A more principled approach in all the applications already mentioned consists of designing algorithms that optimize accuracy in some top fraction of the scores returned by a real-valued hypothesis. This paper deals precisely with this problem.

The desired objective is to learn a scoring function that is as accurate as possible for the items whose scores are above the top $\tau$-quantile. To be more specific, when applied to a set of size $n$, the number of top items is $k = \tau n$ for a $\tau$-quantile, while for a different set of size $n' \neq n$, this would correspond to $k' = \tau n' \neq k$. The implementation of the Precision@$k$ algorithm of [15] indirectly acknowledges the problem that the notion of top $k$ does not generalize, since the command-line flag requires $k$ to be specified as a fraction of the positive samples. Nevertheless, the formulation of the problem as well as the solution are still in terms of the top $k$ items of the training set.

A study of various statistical questions related to the problem of accuracy at the top is discussed by [9]. The authors also present generalization bounds for the specific case of empirical risk minimization (ERM) under some assumptions about the hypothesis set and the distribution. But, to our knowledge, no previous publication has given general learning guarantees for the problem of accuracy in the top quantile scoring items or carefully addressed the corresponding algorithmic problem.

We discuss the formulation of this problem (Section 3.1) and define an algorithm optimizing a convex surrogate of the corresponding loss in the case of linear scoring functions. We discuss the solution of this problem in terms of several simple convex optimization problems and show that these problems can be extended to the case where positive semi-definite kernels are used (Section 3.2). In Section 4, we present a Rademacher complexity analysis of the problem and give margin-based guarantees for our algorithm based on the $\tau$-quantile value of the functions in the hypothesis set. In Section 5, we also report the results of several experiments evaluating the performance of our algorithm. In a comparison in a bipartite setting with several algorithms seeking high precision at the top, our algorithm achieves a better performance in precision at the top. We start with a presentation of notions and notation useful for the discussion in the following sections.

2 Preliminaries

Let $X$ denote the input space and $D$ a distribution over $X \times X$. We interpret the presence of a pair $(x, x')$ in the support of $D$ as the preference of $x'$ over $x$. We denote by $S = ((x_1, x'_1), \ldots, (x_m, x'_m)) \in (X \times X)^m$ a labeled sample of size $m$ drawn i.i.d. according to $D$ and denote by $\hat{D}$ the corresponding empirical distribution. $D$ induces a marginal distribution over $X$ that we denote by $D_0$, which in the discrete case can be defined via

$$D_0(x) = \frac{1}{2} \sum_{x' \in X} \big( D(x, x') + D(x', x) \big).$$

We also denote by $\hat{D}_0$ the empirical distribution associated to $D_0$ based on the sample $S$. The learning problems we are studying are defined in terms of the top $\tau$-quantile of the values taken by a function $h \colon X \to \mathbb{R}$, that is a score $q$ such that $\Pr_{x \sim D_0}[h(x) > q] = \tau$ (see Figure 1(a)). In general, $q$ is not unique and this equality may hold for all $q$ in an interval $[q_{\min}, q_{\max}]$. We will be particularly interested in the properties of the set of points $x$ whose scores are above a quantile, that is $s_q = \{x \colon h(x) > q\}$.
Since for any $(q, q') \in [q_{\min}, q_{\max}]^2$, $s_q$ and $s_{q'}$ differ only by a set of measure zero, the particular choice of $q$ in that interval has no significant consequence. Thus, in what follows, when it is not unique, we will choose the quantile value to be the maximum, $q_{\max}$.

For any $\tau \in [0, 1]$, let $\varphi_\tau$ denote the function defined by $\forall u \in \mathbb{R}$, $\varphi_\tau(u) = \tau (u)_+ - (1 - \tau)(u)_-$, where $(u)_+ = \max(u, 0)$ and $(u)_- = \min(u, 0)$ (see Figure 1(b)). $\varphi_\tau$ is convex as a sum of two convex functions, since $u \mapsto (u)_+$ is convex and $u \mapsto (u)_-$ concave. We will denote by $\operatorname{argmin}_{u \in \mathbb{R}} f(u)$ the largest minimizer of a function $f$. It is known (see for example [17]) that the (maximum) $\tau$-quantile value $\hat{q}$ of a sample $X = (u_1, \ldots, u_n) \in \mathbb{R}^n$ can be given by $\hat{q} = \operatorname{argmin}_{u \in \mathbb{R}} F(u)$, where $F$ is the convex function defined for all $u \in \mathbb{R}$ by $F(u) = \frac{1}{n} \sum_{i=1}^{n} \varphi_\tau(u_i - u)$.
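As a quick illustration of this characterization (our sketch, not the authors' code; the function names are ours): $F$ is convex and piecewise linear with breakpoints at the sample values, so its largest minimizer can be found by evaluating $F$ at the scores themselves.

```python
import numpy as np

def phi_tau(u, tau):
    # phi_tau(u) = tau * (u)_+ - (1 - tau) * (u)_-
    return tau * np.maximum(u, 0.0) - (1.0 - tau) * np.minimum(u, 0.0)

def top_quantile(scores, tau):
    """Largest minimizer of F(u) = (1/n) sum_i phi_tau(u_i - u).

    F is convex and piecewise linear with breakpoints at the sample
    values, so it suffices to evaluate it at the scores themselves.
    """
    u = np.asarray(scores, dtype=float)
    F = np.array([phi_tau(u - c, tau).mean() for c in u])
    return u[np.isclose(F, F.min())].max()
```

This check-loss characterization of sample quantiles is the standard one from quantile regression [17].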

[Figure 1: (a) Illustration of the $\tau$-quantile. (b) Graph of the function $\varphi_\tau$ for $\tau = .25$.]

3 Accuracy at the top (AATP)

3.1 Problem formulation and algorithm

The learning problem we consider is that of accuracy at the top (AATP), which consists of achieving an ordering of all items so that items whose scores are among the top $\tau$-quantile are as relevant as possible. Ideally, all preferred items are ranked above the quantile and non-preferred ones ranked below. Thus, the loss or generalization error of a hypothesis $h \colon X \to \mathbb{R}$ with top $\tau$-quantile value $q_h$ is the average number of non-preferred elements that $h$ ranks above $q_h$ and preferred ones it ranks below:

$$R(h) = \frac{1}{2} \operatorname*{E}_{(x, x') \sim D} \big[ 1_{h(x) > q_h} + 1_{h(x') < q_h} \big],$$

where $q_h$ can be defined as follows in terms of the distribution $D_0$: $q_h = \operatorname{argmin}_{u \in \mathbb{R}} \operatorname{E}_{x \sim D_0}[\varphi_\tau(h(x) - u)]$.

The quantile value $q_h$ depends on the true distribution $D$. To define the empirical error of $h$ for a sample $S = ((x_1, x'_1), \ldots, (x_m, x'_m)) \in (X \times X)^m$, we will use instead an empirical estimate $\hat{q}_h$ of $q_h$: $\hat{q}_h = \operatorname{argmin}_{u \in \mathbb{R}} \operatorname{E}_{x \sim \hat{D}_0}[\varphi_\tau(h(x) - u)]$. Thus, we define the empirical error of $h$ for a labeled sample as follows:

$$\hat{R}(h) = \frac{1}{2m} \sum_{i=1}^{m} \big[ 1_{h(x_i) > \hat{q}_h} + 1_{h(x'_i) < \hat{q}_h} \big].$$

We first assume that $X$ is a subset of $\mathbb{R}^N$ for some $N \geq 1$ and consider a hypothesis set $H$ of linear functions $h \colon x \mapsto w \cdot x$. We will use a surrogate empirical loss taking into consideration how much the score $w \cdot x_i$ of a non-preferred item $x_i$ exceeds $\hat{q}_h$, and similarly how much lower the score $w \cdot x'_i$ of a preferred point $x'_i$ is than $\hat{q}_h$, and seek a solution $w$ minimizing a trade-off of that surrogate loss and the norm squared $\|w\|^2$. This leads to the following optimization problem for AATP:

$$\min_{w} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \big[ (1 + w \cdot x_i - \hat{q}_w)_+ + (1 + \hat{q}_w - w \cdot x'_i)_+ \big] \quad (1)$$
$$\text{subject to} \quad \hat{q}_w = \operatorname{argmin}_{u \in \mathbb{R}} Q_\tau(w, u),$$

where $C \geq 0$ is a regularization parameter and $Q_\tau$ the quantile function defined as follows for a sample $S$, for any $w \in \mathbb{R}^N$ and $u \in \mathbb{R}$:

$$Q_\tau(w, u) = \frac{1}{2m} \sum_{i=1}^{m} \big[ \varphi_\tau(w \cdot x_i - u) + \varphi_\tau(w \cdot x'_i - u) \big].$$

In the following, we will assume that $\tau$ is a multiple of $1/(2m)$; otherwise, it can be rounded to the nearest such value.
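Note that for a fixed $w$ the quantile constraint is easy to evaluate: $\hat{q}_w$ is simply the (maximum) empirical quantile of the $2m$ training scores. A small sketch (ours), reusing `top_quantile` from the earlier snippet:

```python
import numpy as np

def q_hat(w, X_neg, X_pos, tau):
    """qhat_w = argmin_u Q_tau(w, u): the empirical quantile of the 2m scores."""
    scores = np.concatenate([X_neg @ w, X_pos @ w])
    return top_quantile(scores, tau)
```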

3.2 Analysis of the optimization problem

Problem (1) is not a convex optimization problem since, while the objective function is convex, the equality constraint is not affine. Here, we further analyze the problem and discuss a solution.

The equality constraint could be written as an infinite number of inequalities: $Q_\tau(w, \hat{q}_w) \leq Q_\tau(w, u)$ for all $u \in \mathbb{R}$. Observe, however, that the quantile value $\hat{q}_w$ must coincide with the score of one of the training points $x_k$ or $x'_k$, that is $w \cdot x_k$ or $w \cdot x'_k$. Thus, Problem (1) can be equivalently written with a finite number of constraints as follows:

$$\min_{w} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \big[ (1 + w \cdot x_i - \hat{q}_w)_+ + (1 + \hat{q}_w - w \cdot x'_i)_+ \big]$$
$$\text{subject to} \quad \hat{q}_w \in \{ w \cdot x_k, \, w \cdot x'_k \colon k \in [1, m] \},$$
$$\forall k \in [1, m], \ Q_\tau(w, \hat{q}_w) \leq Q_\tau(w, w \cdot x_k), \qquad \forall k \in [1, m], \ Q_\tau(w, \hat{q}_w) \leq Q_\tau(w, w \cdot x'_k).$$

The inequality constraints do not correspond to non-positivity constraints on convex functions. Thus, the problem is not a standard convex optimization problem, but our analysis leads us to a simple approximate solution. For convenience, let $(z_1, \ldots, z_{2m})$ denote $(x_1, \ldots, x_m, x'_1, \ldots, x'_m)$. Our method consists of solving the convex quadratic programming (QP) problem for each value of $k \in [1, 2m]$:

$$\min_{w} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \big[ (1 + w \cdot x_i - \hat{q}_w)_+ + (1 + \hat{q}_w - w \cdot x'_i)_+ \big] \quad (2)$$
$$\text{subject to} \quad \hat{q}_w = w \cdot z_k.$$

Let $w_k$ be the solution of Problem (2). For each $k \in [1, 2m]$, we determine the $\tau$-quantile value of the scores $\{w_k \cdot z_i \colon i \in [1, 2m]\}$. This can be checked straightforwardly in time $O(m \log m)$ by sorting the scores. Then, the solution $w$ we return is the $w_k$ for which $w_k \cdot z_k$ is closest to the $\tau$-quantile value, and, in the presence of ties, the one for which the objective function is the smallest. The method for determining $w$ is thus based on the solution of $2m$ simple QPs. Our solution naturally parallelizes, so that on a distributed computing environment the computational time for solving the problem can be reduced to roughly the same as that of solving a single QP.

3.3 Kernelized formulation

For any $i \in [1, 2m]$, let $y_i = -1$ if $i \leq m$ and $y_i = +1$ otherwise. Then, Problem (2) admits the following equivalent dual optimization problem, similar to that of SVMs:

$$\max_{\alpha} \ \sum_{i=1}^{2m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{2m} \alpha_i \alpha_j y_i y_j (z_i - z_k) \cdot (z_j - z_k) \quad (3)$$
$$\text{subject to:} \quad \forall i \in [1, 2m], \ 0 \leq \alpha_i \leq C,$$

which depends only on inner products between points of the training set. The vector $w$ can be obtained from the solution via $w = \sum_{i=1}^{2m} \alpha_i y_i (z_i - z_k)$. The algorithm can therefore be generalized by using equivalently any positive semi-definite symmetric (PDS) kernel $K \colon X \times X \to \mathbb{R}$ instead of the inner product in the input space, thereby also extending it to the case of non-vectorial input spaces $X$. The corresponding hypothesis set $H$ is that of linear functions $h \colon x \mapsto w \cdot \Phi(x)$, where $\Phi \colon X \to \mathbb{H}$ is a feature mapping to a Hilbert space $\mathbb{H}$ associated to $K$ and $w$ an element of $\mathbb{H}$. In view of (3), for any $k \in [1, 2m]$, the dual problem of (2) can then be expressed as follows:

$$\max_{\alpha} \ \sum_{i=1}^{2m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{2m} \alpha_i \alpha_j y_i y_j K_k(z_i, z_j) \quad (4)$$
$$\text{subject to:} \quad \forall i \in [1, 2m], \ 0 \leq \alpha_i \leq C,$$

where, for any $k \in [1, 2m]$, $K_k$ is the PDS kernel defined by

$$K_k \colon (z, z') \mapsto K(z, z') - K(z, z_k) - K(z_k, z') + K(z_k, z_k).$$

Our solution can therefore also be found in the dual by solving the QPs defined by (4).
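Putting Section 3.2 together, the following is a minimal sketch of the primal procedure using cvxpy (the paper's experiments used Matlab with CVX); the function and variable names are ours, and the hinge terms follow the formulation of (2) as reconstructed above, so treat the code as illustrative rather than as the authors' implementation:

```python
import cvxpy as cp
import numpy as np

def aatp_fit(X_neg, X_pos, tau, C):
    """Approximate AATP solver: one QP per candidate quantile point z_k."""
    Z = np.vstack([X_neg, X_pos])              # (z_1, ..., z_2m)
    n, d = Z.shape
    best = None
    for k in range(n):
        w = cp.Variable(d)
        q = Z[k] @ w                           # enforce qhat_w = w . z_k
        hinge = cp.sum(cp.pos(1 + X_neg @ w - q)) + cp.sum(cp.pos(1 + q - X_pos @ w))
        problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * hinge))
        problem.solve()
        scores = Z @ w.value
        # empirical top tau-quantile of the 2m scores, found by sorting
        q_tau = np.sort(scores)[max(int(np.ceil((1 - tau) * n)) - 1, 0)]
        gap = abs(scores[k] - q_tau)
        # keep the w_k whose constrained score is closest to the quantile,
        # breaking ties by the smaller objective value
        if best is None or (gap, problem.value) < (best[0], best[1]):
            best = (gap, problem.value, w.value)
    return best[2]
```

Since the $2m$ subproblems are independent, the loop parallelizes trivially, which is exactly the property exploited in a distributed environment.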

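For the kernelized variant of Section 3.3, the only new ingredient is the Gram matrix of the shifted kernel $K_k$; given the base Gram matrix over $(z_1, \ldots, z_{2m})$, it is one line of numpy (again our sketch):

```python
import numpy as np

def shifted_gram(K, k):
    """Gram matrix of K_k(z, z') = K(z,z') - K(z,z_k) - K(z_k,z') + K(z_k,z_k).

    This is the Gram matrix of the feature map z -> Phi(z) - Phi(z_k),
    which is why K_k remains positive semi-definite.
    """
    return K - K[:, [k]] - K[[k], :] + K[k, k]
```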
4 Theoretical guarantees

We here present margin-based generalization bounds for the AATP learning problem.

Let $\Phi_\rho \colon \mathbb{R} \to [0, 1]$ be the function defined by $\Phi_\rho \colon x \mapsto 1_{x \leq 0} + (1 - x/\rho)_+ \, 1_{x > 0}$. For any $\rho > 0$ and $t \in \mathbb{R}$, we define the generalization error $R(h, t)$ and the empirical margin loss $\hat{R}_\rho(h, t)$, both with respect to $t$, by

$$R(h, t) = \frac{1}{2} \operatorname*{E}_{(x, x') \sim D} \big[ 1_{h(x) > t} + 1_{h(x') < t} \big], \qquad \hat{R}_\rho(h, t) = \frac{1}{2m} \sum_{i=1}^{m} \big[ \Phi_\rho(t - h(x_i)) + \Phi_\rho(h(x'_i) - t) \big].$$

In particular, $R(h, q_h)$ corresponds to the generalization error and $\hat{R}_\rho(h, q_h)$ to the empirical margin loss of a hypothesis $h$ for AATP. For any $t > 0$, the empirical margin loss $\hat{R}_\rho(h, t)$ is upper bounded by the average of the fraction of non-preferred elements $x_i$ that $h$ ranks above $t$ or less than $\rho$ below $t$, and the fraction of preferred ones $x'_i$ it ranks below $t$ or less than $\rho$ above $t$:

$$\hat{R}_\rho(h, t) \leq \frac{1}{2m} \sum_{i=1}^{m} \big[ 1_{t - h(x_i) < \rho} + 1_{h(x'_i) - t < \rho} \big]. \quad (5)$$

We denote by $D_1$ the marginal distribution of the first element of the pairs in $X \times X$ derived from $D$, and by $D_2$ the marginal distribution with respect to the second element. Similarly, $S_1$ is the sample derived from $S$ by keeping only the first element of each pair, $S_1 = (x_1, \ldots, x_m)$, and $S_2$ the one obtained by keeping only the second element, $S_2 = (x'_1, \ldots, x'_m)$. We also denote by $\mathfrak{R}_m^{D_1}(H)$ the Rademacher complexity of $H$ with respect to the marginal distribution $D_1$, that is $\mathfrak{R}_m^{D_1}(H) = \operatorname{E}[\hat{\mathfrak{R}}_{S_1}(H)]$, and similarly $\mathfrak{R}_m^{D_2}(H) = \operatorname{E}[\hat{\mathfrak{R}}_{S_2}(H)]$.

Theorem 1 Let $H$ be a set of real-valued functions taking values in $[-M, +M]$ for some $M > 0$. Fix $\tau \in [0, 1]$ and $\rho > 0$; then, for any $\delta > 0$, with probability at least $1 - \delta$ over the choice of a sample $S$ of size $m$, each of the following inequalities holds for all $h \in H$ and $t \in [-M, +M]$:

$$R(h, t) \leq \hat{R}_\rho(h, t) + \frac{1}{\rho} \big[ \mathfrak{R}_m^{D_1}(H) + \mathfrak{R}_m^{D_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + \sqrt{\frac{\log(1/\delta)}{2m}}$$
$$R(h, t) \leq \hat{R}_\rho(h, t) + \frac{1}{\rho} \big[ \hat{\mathfrak{R}}_{S_1}(H) + \hat{\mathfrak{R}}_{S_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + 3 \sqrt{\frac{\log(2/\delta)}{2m}}.$$

Proof. Let $\tilde{H}$ be the family of hypotheses mapping $(X \times X)$ to $\mathbb{R}$ defined by $\tilde{H} = \{ z = (x, x') \mapsto t - h(x) \colon h \in H, t \in [-M, +M] \}$, and similarly $\tilde{H}' = \{ z = (x, x') \mapsto h(x') - t \colon h \in H, t \in [-M, +M] \}$. Consider the two families of functions taking values in $[0, 1]$ defined by $\bar{H} = \{ \Phi_\rho \circ f \colon f \in \tilde{H} \}$ and $\bar{H}' = \{ \Phi_\rho \circ f \colon f \in \tilde{H}' \}$. By the general Rademacher complexity bounds for functions taking values in $[0, 1]$ [18, 3, 20], with probability at least $1 - \delta$,

$$\frac{1}{2} \operatorname{E} \big[ \Phi_\rho(t - h(x)) + \Phi_\rho(h(x') - t) \big] \leq \hat{R}_\rho(h, t) + \mathfrak{R}_m(\bar{H}) + \mathfrak{R}_m(\bar{H}') + \sqrt{\frac{\log(1/\delta)}{2m}},$$

for all $h \in H$. Since $1_{u < 0} \leq \Phi_\rho(u)$ for all $u \in \mathbb{R}$, the generalization error $R(h, t)$ is a lower bound on the left-hand side, $R(h, t) \leq \frac{1}{2} \operatorname{E}[\Phi_\rho(t - h(x)) + \Phi_\rho(h(x') - t)]$, and we obtain

$$R(h, t) \leq \hat{R}_\rho(h, t) + \mathfrak{R}_m(\bar{H}) + \mathfrak{R}_m(\bar{H}') + \sqrt{\frac{\log(1/\delta)}{2m}}.$$

Since $\Phi_\rho$ is $1/\rho$-Lipschitz, by Talagrand's contraction lemma, we have $\mathfrak{R}_m(\bar{H}) \leq (1/\rho)\,\mathfrak{R}_m(\tilde{H})$ and $\mathfrak{R}_m(\bar{H}') \leq (1/\rho)\,\mathfrak{R}_m(\tilde{H}')$. By definition of the Rademacher complexity,

$$\mathfrak{R}_m(\tilde{H}) = \frac{1}{m} \operatorname*{E}_{S \sim D^m,\, \sigma} \Big[ \sup_{h \in H,\, t} \sum_{i=1}^{m} \sigma_i (t - h(x_i)) \Big] = \frac{1}{m} \operatorname{E} \Big[ \sup_{t \in [-M, +M]} t \sum_{i=1}^{m} \sigma_i \Big] + \frac{1}{m} \operatorname{E} \Big[ \sup_{h \in H} \sum_{i=1}^{m} (-\sigma_i) h(x_i) \Big].$$

Since the random variables $\sigma_i$ and $-\sigma_i$ follow the same distribution, the second term coincides with $\mathfrak{R}_m^{D_1}(H)$. The first term can be rewritten and upper bounded as follows using Jensen's inequality:

$$\frac{1}{m} \operatorname{E} \Big[ \sup_{t \in [-M, +M]} t \sum_{i=1}^{m} \sigma_i \Big] = \frac{M}{m} \operatorname{E} \Big[ \Big| \sum_{i=1}^{m} \sigma_i \Big| \Big] \leq \frac{M}{m} \sqrt{ \operatorname{E} \Big[ \Big( \sum_{i=1}^{m} \sigma_i \Big)^{2} \Big] } = \frac{M}{m} \sqrt{m} = \frac{M}{\sqrt{m}}.$$

Note that, by the Kahane-Khintchine inequality, the last upper bound used is tight modulo a constant ($1/\sqrt{2}$). Similarly, we can show that $\mathfrak{R}_m(\tilde{H}') \leq \mathfrak{R}_m^{D_2}(H) + M/\sqrt{m}$. This proves the first inequality of the theorem; the second inequality can be derived from the first one using the standard bound relating the empirical and true Rademacher complexities.

Since the bounds of the theorem hold uniformly for all $t \in [-M, +M]$, they hold in particular for any quantile value $q_h$.

Corollary 2 (Margin bounds for AATP) Let $H$ be a set of real-valued functions taking values in $[-M, +M]$ for some $M > 0$. Fix $\tau \in [0, 1]$ and $\rho > 0$; then, for any $\delta > 0$, with probability at least $1 - \delta$ over the choice of a sample $S$ of size $m$, for all $h \in H$ it holds that:

$$R(h) \leq \hat{R}_\rho(h, q_h) + \frac{1}{\rho} \big[ \mathfrak{R}_m^{D_1}(H) + \mathfrak{R}_m^{D_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + \sqrt{\frac{\log(1/\delta)}{2m}}$$
$$R(h) \leq \hat{R}_\rho(h, q_h) + \frac{1}{\rho} \big[ \hat{\mathfrak{R}}_{S_1}(H) + \hat{\mathfrak{R}}_{S_2}(H) \big] + \frac{2M}{\rho \sqrt{m}} + 3 \sqrt{\frac{\log(2/\delta)}{2m}}.$$

A more explicit version of this corollary can be derived for kernel-based hypotheses (Appendix A). In the results of the previous theorem and corollary, the right-hand side of the generalization bounds is expressed in terms of the empirical margin loss with respect to the true quantile value $q_h$, which is upper bounded (see (5)) by half the fraction of non-preferred points in the sample whose score is above $q_h - \rho$ and half the fraction of the preferred points whose score is less than $q_h + \rho$. These fractions are close to the same fractions with $q_h$ replaced by $\hat{q}_h$, since the probability that a score falls between $q_h$ and $\hat{q}_h$ can be shown to be uniformly bounded by a term in $O(1/\sqrt{m})$.¹ Altogether, this analysis provides strong support for our algorithm, which is precisely seeking to minimize the sum of an empirical margin loss based on the quantile and a term that depends on the complexity, as in the right-hand side of the learning guarantees above.

5 Experiments

This section reports the results of experiments with our AATP algorithm on several datasets. To measure its effectiveness, we compare it to two other algorithms, the INFINITEPUSH algorithm [1] and the SVMPERF algorithm [15], which are both algorithms seeking to emphasize the accuracy near the top. Our experiments are carried out using three data sets from the UC Irvine Machine Learning Repository: Ionosphere, Housing, and Spambase (results for Spambase can be found in Appendix C). In addition, we use the TREC 2003 (LETOR 2.0) data set, which is available for download from the Microsoft Research LETOR website.

All the UC Irvine data sets we experiment with are for two-group classification problems. From these we construct bipartite ranking problems where a preference pair consists of one positive and one negative example, as illustrated in the sketch following this preamble. To explicitly indicate the dependency on the quantile, we denote by $q_\tau$ the value of the top $\tau$-th quantile of the score distribution of a hypothesis. We will use $N$ to denote the number of instances in a particular data set, as well as $s_i$, $i = 1, \ldots, N$, to denote the particular score values. If $n_+$ denotes the number of positive examples in the data set and $n_-$ the number of negative examples, then $N = n_+ + n_-$ and the number of preferences is $m = n_+ n_-$.

¹ Note that the Bahadur-Kiefer representation is known to provide a uniform convergence bound on the difference of the true and empirical quantiles when the distribution admits a density [2, 16], a stronger result than what is needed in our context.
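The bipartite construction is mechanical; a small sketch (ours) that enumerates the $m = n_+ n_-$ preference pairs from a binary-labeled data set:

```python
import numpy as np

def bipartite_pairs(X, y):
    """All preference pairs (x, x'): x negative (non-preferred), x' positive."""
    X_neg, X_pos = X[y != 1], X[y == 1]
    i, j = np.meshgrid(np.arange(len(X_neg)), np.arange(len(X_pos)), indexing="ij")
    return X_neg[i.ravel()], X_pos[j.ravel()]   # each of length n_plus * n_minus
```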

[Table 1: Ionosphere data: for each top quantile $\tau$ and each evaluation metric, the three rows correspond to AATP (top), SVMPERF (middle) and INFINITEPUSH (bottom). For the INFINITEPUSH algorithm we only report mean values over the folds.]

5.1 Implementation

We solved the convex optimization problems (2) using the CVX solver. As already noted, the AATP problem can be solved efficiently using a distributed computing environment. The convex optimization problem of the INFINITEPUSH algorithm (see (3.9) of [1]) can also be solved using CVX. However, this optimization problem has as many variables as the product of the numbers of positively and negatively labeled instances ($n_+ n_-$), which makes it prohibitive to solve for large data sets within a runtime of a few days. Thus, we experimented with the INFINITEPUSH algorithm only on the Ionosphere data set. Finally, for SVMPERF's training and score prediction we used the binary executables downloaded from the author's website and used the SVMPERF settings that are the closest to our optimization formulation. Thus, we used the L1-norm for slack variables and allowed the constraint cache and the tolerance for the termination criterion to grow in order to control the algorithm's convergence, especially for large values of the regularization constant.

5.2 Evaluation measures

To evaluate and compare the AATP, INFINITEPUSH, and SVMPERF algorithms, we used a number of standard metrics: precision at the top (P@$\tau$), average precision (AP), number of positives at the absolute top (Positives@top), discounted cumulative gain (DCG@$\tau$), and normalized discounted cumulative gain (NDCG@$\tau$). Definitions are included in Appendix B.

5.3 Ionosphere data

The data set's 351 instances represent radar signals collected from phased antennas, where good signals (225 positively labeled instances) are those that reflect back toward the antennas and bad signals (126 negatively labeled instances) are those that pass through the ionosphere. The data has 34 features. We split the data set into 10 independent sets of instances, say $S_1, \ldots, S_{10}$. Then, we ran 10 experiments, where we used 3 consecutive sets for learning and the rest (7 sets) for testing. We evaluated and compared the algorithms for 5 different top quantiles $\tau \in \{19, 14, 9.5, 5, 1\}$ (%), which correspond to the top 20, 15, 10, 5, and 1 items, respectively. For each $\tau$, the regularization parameter $C$ was selected based on the average value of P@$\tau$.

The performance of AATP is significantly better than that of the other algorithms, particularly for the smallest top quantiles. The two main criteria on which to evaluate the AATP algorithm are precision at the top (P@$\tau$) and number of positives at the top (Positives@top). For $\tau = 5\%$ the AATP algorithm obtains a stellar accuracy of over 90% with an average of 3.3 positive elements at the top (Table 1).
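Appendix B (not reproduced in this transcription) contains the formal definitions of these measures. For concreteness, here is a sketch of the two headline criteria under the standard definitions we assume (ours, not the paper's code): P@$\tau$ as the fraction of positives among the top $\lceil \tau N \rceil$ scored items, and Positives@top as the number of positives ranked above the highest-scoring negative:

```python
import numpy as np

def precision_at_tau(scores, labels, tau):
    """Fraction of positives among the ceil(tau * N) highest-scoring items."""
    order = np.argsort(-scores)               # indices sorted by decreasing score
    k = int(np.ceil(tau * len(scores)))
    return float(np.mean(labels[order[:k]] == 1))

def positives_at_top(scores, labels):
    """Number of positives scored strictly above every negative."""
    top_negative = scores[labels != 1].max()
    return int(np.sum((labels == 1) & (scores > top_negative)))
```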

[Table 2: Housing data: for each quantile value $\tau$ and each evaluation metric, there are two rows corresponding to AATP (top) and SVMPERF (bottom).]

5.4 Housing data

The Boston Housing data set has 506 examples, 35 positive and 471 negative, described by 13 features. We used feature 4 as the binary target value. Two thirds of the data instances were randomly selected and used for training, and the rest for testing. We created 10 experimental folds analogously to the case of the Ionosphere data. The Housing data is very unbalanced, with less than 7% positive examples. For this data set we obtain results very comparable to SVMPERF for the very top quantiles; see Table 2. Naturally, the standard deviations are large as a result of the low percentage of positive examples, so the results are not always significant. For higher top quantiles, e.g., the top 4%, the AATP algorithm significantly outperforms SVMPERF, obtaining over 90% accuracy at the top (P@$\tau$). For the highest top quantiles, the difference in performance between the two algorithms is not significant.

5.5 LETOR 2.0

This data set corresponds to a relatively hard ranking problem, with an average of only about 1% relevant query-URL pairs per query. It consists of 5 folds. Our Matlab implementation (with CVX) of the algorithms prevented us from trying our approach on large data sets. Hence, from each training fold we randomly selected 500 items for training. For testing, we selected 1000 items at random from the test fold. Here, we only report results for P@1%. SVMPERF obtained an accuracy of 12.5% ± 1.5%, while the AATP algorithm obtained an accuracy of 14.6% ± 1.4%. This significantly better result indicates the power of the proposed algorithm.

6 Conclusion

We presented a series of results for the problem of accuracy at the top quantile, including an AATP algorithm, a margin-based theoretical analysis in support of that algorithm, and a series of experiments with several data sets demonstrating the effectiveness of our algorithm. These results are of practical interest in applications where accuracy among the top quantile is sought. The analysis of problems based on other loss functions depending on the top $\tau$-quantile scores is also likely to benefit from the theoretical and algorithmic results we presented. The optimization algorithm we discussed is highly parallelizable, since it is based on solving independent QPs. Our initial experiments reported here were carried out using Matlab with CVX, which prevented us from evaluating our approach on larger data sets, such as the full LETOR 2.0 data set. However, we have now designed a solution for very large $m$ based on the ADMM (Alternating Direction Method of Multipliers) framework [4]. We have implemented that solution and will present and discuss it in future work.

References

[1] S. Agarwal. The infinite push: A new support vector ranking algorithm that directly optimizes accuracy at the absolute top of the list. In Proceedings of the SIAM International Conference on Data Mining, 2011.
[2] R. R. Bahadur. A note on quantiles in large samples. Annals of Mathematical Statistics, 37, 1966.
[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[6] J. S. Breese, D. Heckerman, and C. M. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI '98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 1998.
[7] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 89-96, New York, NY, USA, 2005. ACM.
[8] C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In NIPS, pages 193-200, 2006.
[9] S. Clémençon and N. Vayatis. Ranking the best instances. Journal of Machine Learning Research, 8:2671-2699, 2007.
[10] D. Cossock and T. Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140-5154, 2008.
[11] K. Crammer and Y. Singer. PRanking with ranking. In Neural Information Processing Systems (NIPS 2001). MIT Press, 2001.
[12] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, December 2003.
[13] R. Herbrich, K. Obermayer, and T. Graepel. Advances in Large Margin Classifiers, chapter Large Margin Rank Boundaries for Ordinal Regression. MIT Press, 2000.
[14] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pages 133-142, New York, NY, USA, 2002. ACM.
[15] T. Joachims. A support vector method for multivariate performance measures. In ICML, pages 377-384, 2005.
[16] J. Kiefer. On Bahadur's representation of sample quantiles. Annals of Mathematical Statistics, 38, 1967.
[17] R. Koenker. Quantile Regression. Cambridge University Press, 2005.
[18] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.
[19] Q. V. Le, A. Smola, O. Chapelle, and C. H. Teo. Optimization of ranking measures. Unpublished manuscript, 2009.
[20] M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. The MIT Press, 2012.
[21] C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire. Margin-based ranking meets boosting in the middle. In COLT, pages 63-78, 2005.
[22] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453-1484, 2005.
