On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines


Journal of Machine Learning Research 2 (2001). Submitted 03/01; Published 12/01.

Koby Crammer, Yoram Singer
School of Computer Science & Engineering, Hebrew University, Jerusalem 91904, Israel

Editors: Nello Cristianini, John Shawe-Taylor and Bob Williamson

Abstract

In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin for multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most previous approaches, which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach, comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.

Keywords: Multiclass problems, SVM, Kernel Machines

©2001 Koby Crammer and Yoram Singer.

1. Introduction

Supervised machine learning tasks often boil down to the problem of assigning labels to instances where the labels are drawn from a finite set of elements. This task is referred to as multiclass learning. Numerous specialized algorithms have been devised for multiclass problems by building upon classification learning algorithms for binary problems, i.e., problems in which the set of possible labels is of size two. Notable examples of multiclass learning algorithms are the multiclass extensions of decision tree learning (Breiman et al., 1984; Quinlan, 1993) and various specialized versions of boosting such as AdaBoost.M2 and AdaBoost.MH (Freund and Schapire, 1997; Schapire and Singer, 1999). However, the dominating approach for solving multiclass problems using support vector machines has been to reduce a single multiclass problem into multiple binary problems. For instance, a common method is to build a set of binary classifiers where each classifier distinguishes between one of the labels and the rest. This approach is a special case of using output codes for solving multiclass problems (Dietterich and Bakiri, 1995). However, while multiclass learning using output codes provides a simple and powerful framework, it cannot capture
correlations between the different classes, since it breaks a multiclass problem into multiple independent binary problems.

In this paper we develop and discuss in detail a direct approach for learning multiclass support vector machines (SVM). SVMs have gained an enormous popularity in statistics, learning theory, and engineering (see for instance Vapnik, 1998; Schölkopf et al., 1998; Cristianini and Shawe-Taylor, 2000, and the many references therein). With a few exceptions, most support vector learning algorithms have been designed for binary (two-class) problems. A few attempts have been made to generalize SVMs to multiclass problems (Weston and Watkins, 1999; Vapnik, 1998). These extensions of the binary case are achieved by adding constraints for every class, and thus the size of the quadratic optimization problem is proportional to the number of categories in the classification problem. The result is often a homogeneous quadratic problem which is hard to solve and difficult to store.

The starting point of our approach is a simple generalization of separating hyperplanes and, analogously, a generalized notion of margins for multiclass problems. This notion of a margin has been employed in previous research (Allwein et al., 2000), but not in the context of SVM. Using the definition of a margin for multiclass problems, we describe in Section 3 a compact quadratic optimization problem. We then discuss its dual problem and the form of the resulting multiclass predictor. In Section 4 we give a decomposition of the dual problem into multiple small optimization problems. This decomposition yields a memory- and time-efficient representation of multiclass problems. We proceed and describe an iterative solution for the set of reduced optimization problems. We first discuss in Section 5 the means of choosing which reduced problem to solve on each round of the algorithm. We then discuss in Section 6 an efficient fixed-point algorithm for finding an approximate solution to the reduced problem that was chosen. We analyze the algorithm and derive a bound on its rate of convergence to the optimal solution. The baseline algorithm is based on a main loop which is composed of an example selection for optimization followed by an invocation of the fixed-point algorithm with the example that was chosen. This baseline algorithm can be used with small datasets, but to make it practical for large ones, several technical improvements had to be sought. We therefore devote Section 7 to a description of the different technical improvements we have made in order to make our approach applicable to large datasets. We also discuss the running time and accuracy results achieved in experiments that underscore the technical improvements. In addition, we report in Section 8 the results achieved in evaluation experiments, comparing them to previous work. Finally, we give conclusions in Section 9.

Related work

Naturally, our work builds on previous research and advances in learning using support vector machines. The space is clearly too limited to mention all the relevant work, and thus we refer the reader to the books and collections mentioned above. As we have already mentioned, the idea of casting multiclass problems as a single constrained optimization with a quadratic objective function was proposed by Vapnik (1998), Weston and Watkins (1999), Bredensteiner and Bennett (1999), and Guermeur et al. (2000). However, the size of the resulting optimization problems devised in the above papers is typically large and complex.
The idea of breaking a large constrained optimization problem into small problems, each of which employs a subset of the constraints, was first explored in the context of support vector machines by Boser et al. (1992). These ideas were further
developed by several researchers (see Joachims, 1998 for an overview). However, the roots of this line of research go back to the seminal work of Lev Bregman (1967), which was further developed by Yair Censor and colleagues (see Censor and Zenios, 1997 for an excellent overview). These ideas distilled into Platt's method, called SMO, for sequential minimal optimization. SMO works with reduced problems that are derived from a pair of examples, while our approach employs a single example for each reduced optimization problem. The result is a simple optimization problem which can be solved analytically in binary classification problems (see Platt, 1998) and leads to an efficient numerical algorithm (that is guaranteed to converge) in multiclass settings. Furthermore, although not explored in this paper, it seems possible that the single-example reduction can be used in parallel applications.

Many of the technical improvements we discuss in this paper have been proposed in previous work. In particular, ideas such as using a working set and caching have been described by Burges (1998), Platt (1998), Joachims (1998), and others. Finally, we would like to note that this work is part of a general line of research on multiclass learning we have been involved with. Allwein et al. (2000) described and analyzed a general approach for multiclass problems using error correcting output codes (Dietterich and Bakiri, 1995). Building on that work, we investigated the problem of designing good output codes for multiclass problems (Crammer and Singer, 2000). Although the model of learning using output codes differs from the framework studied in this paper, some of the techniques presented here build upon results from that earlier paper (Crammer and Singer, 2000). Finally, some of the ideas presented in this paper can also be used to build multiclass predictors in online settings using the mistake bound model as the means of analysis. Our current research on multiclass problems concentrates on analogous online approaches (Crammer and Singer, 2001).

2. Preliminaries

Let $S = \{(\bar x_1, y_1), \ldots, (\bar x_m, y_m)\}$ be a set of $m$ training examples. We assume that each example $\bar x_i$ is drawn from a domain $\mathcal{X} \subseteq \mathbb{R}^n$ and that each label $y_i$ is an integer from the set $\mathcal{Y} = \{1, \ldots, k\}$. A (multiclass) classifier is a function $H : \mathcal{X} \to \mathcal{Y}$ that maps an instance $\bar x$ to an element $y$ of $\mathcal{Y}$. In this paper we focus on a framework that uses classifiers of the form

$$H_M(\bar x) = \arg\max_{r=1}^{k} \{\bar M_r \cdot \bar x\},$$

where $M$ is a matrix of size $k \times n$ over $\mathbb{R}$ and $\bar M_r$ is the $r$th row of $M$. We interchangeably call the value of the inner-product of the $r$th row of $M$ with the instance $\bar x$ the confidence and the similarity score for the $r$th class. Therefore, according to our definition above, the predicted label is the index of the row attaining the highest similarity score with $\bar x$. This setting is a generalization of linear binary classifiers. Using the notation introduced above, linear binary classifiers predict that the label of an instance $\bar x$ is 1 if $\bar w \cdot \bar x > 0$ and 2 otherwise ($\bar w \cdot \bar x \le 0$). Such a classifier can be implemented using a matrix of size $2 \times n$ where $\bar M_1 = \bar w$ and $\bar M_2 = -\bar w$. Note, however, that this representation is less efficient as it occupies twice the memory needed. Our model becomes parsimonious when $k \ge 3$, in which case we maintain $k$ prototypes $\bar M_1, \bar M_2, \ldots, \bar M_k$ and set the label of a new input instance by choosing the index of the most similar row of $M$.
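To make the prediction rule concrete, here is a minimal sketch (our illustration, not code from the paper) using NumPy; it assumes the prototype matrix $M$ is given as a $k \times n$ array and that class labels are 0-based:

```python
import numpy as np

def predict(M, x):
    """Multiclass linear prediction H_M(x) = argmax_r M_r . x.

    M : k x n array whose r-th row is the prototype of class r.
    x : an instance in R^n. Returns a 0-based class index.
    """
    scores = M @ x              # similarity score (confidence) of each class
    return int(np.argmax(scores))
```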
Figure 1: Illustration of the margin bound employed by the optimization problem.

Given a classifier $H_M(\bar x)$ (parametrized by a matrix $M$) and an example $(\bar x, y)$, we say that $H_M$ misclassifies the example if $H_M(\bar x) \ne y$. Let $[[\pi]]$ be 1 if the predicate $\pi$ holds and 0 otherwise. Thus, the empirical error for a multiclass problem is given by

$$\epsilon_S(M) = \frac{1}{m} \sum_{i=1}^{m} [[H_M(\bar x_i) \ne y_i]]. \qquad (1)$$

Our goal is to find a matrix $M$ that attains a small empirical error on the sample $S$ and also generalizes well. Direct approaches that attempt to minimize the empirical error are computationally expensive (see for instance Höffgen and Simon, 1992; Crammer and Singer, 2000). Building on Vapnik's work on support vector machines (Vapnik, 1998), we describe in the next section our paradigm for finding a good matrix $M$ by replacing the discrete empirical error minimization problem with a quadratic optimization problem. As we see later, recasting the problem as a minimization problem also enables us to replace inner-products of the form $\bar a \cdot \bar b$ with kernel-based inner-products of the form $K(\bar a, \bar b) = \phi(\bar a) \cdot \phi(\bar b)$.

3. Constructing multiclass kernel-based predictors

To construct multiclass predictors we replace the misclassification error of an example, $[[H_M(\bar x) \ne y]]$, with the following piecewise linear bound,

$$\max_r \{\bar M_r \cdot \bar x + 1 - \delta_{y,r}\} - \bar M_y \cdot \bar x,$$

where $\delta_{p,q}$ is equal to 1 if $p = q$ and 0 otherwise. The above bound is zero if the confidence value for the correct label is larger by at least one than the confidences assigned to the rest of the labels. Otherwise, we suffer a loss which is linearly proportional to the difference between the confidence of the correct label and the maximum among the confidences of the other labels. A graphical illustration of the above is given in Figure 1. The circles in the figure denote different labels; the correct label is plotted in dark grey while the rest of the labels are plotted in light grey. The height of each label designates its confidence. Three settings are plotted in the figure. The left plot corresponds to the case when the margin is larger than one, and therefore the bound $\max_r \{\bar M_r \cdot \bar x + 1 - \delta_{y,r}\} - \bar M_y \cdot \bar x$ equals zero, and hence the example is correctly classified. The middle plot shows a case where the example is correctly classified but with a small margin, and we suffer some loss. The right plot depicts the loss of a misclassified example.
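The piecewise linear bound is equally direct to compute. The sketch below (again our illustration, under the same 0-based-label assumption) evaluates $\max_r \{\bar M_r \cdot \bar x + 1 - \delta_{y,r}\} - \bar M_y \cdot \bar x$ for a single example:

```python
import numpy as np

def multiclass_hinge(M, x, y):
    """Piecewise linear upper bound on the 0/1 error of (x, y):
    max_r { M_r . x + 1 - delta(y, r) } - M_y . x.
    It is zero iff the correct class wins by a margin of at least one.
    """
    scores = M @ x
    margins = scores + 1.0
    margins[y] -= 1.0            # delta(y, r) removes the +1 for r = y
    return float(np.max(margins) - scores[y])
```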
Summing over all the examples in $S$ we get an upper bound on the empirical loss,

$$\epsilon_S(M) \le \frac{1}{m} \sum_{i=1}^{m} \left[ \max_r \{\bar M_r \cdot \bar x_i + 1 - \delta_{y_i,r}\} - \bar M_{y_i} \cdot \bar x_i \right]. \qquad (2)$$

We say that a sample $S$ is linearly separable by a multiclass machine if there exists a matrix $M$ such that the above loss is equal to zero for all the examples in $S$, that is,

$$\forall i \quad \max_r \{\bar M_r \cdot \bar x_i + 1 - \delta_{y_i,r}\} - \bar M_{y_i} \cdot \bar x_i = 0. \qquad (3)$$

Therefore, a matrix $M$ that satisfies Eq. (3) would also satisfy the constraints,

$$\forall i, r \quad \bar M_{y_i} \cdot \bar x_i + \delta_{y_i,r} - \bar M_r \cdot \bar x_i \ge 1. \qquad (4)$$

Define the $\ell_2$-norm of a matrix $M$ to be the $\ell_2$-norm of the vector represented by the concatenation of $M$'s rows, $\|M\|_2^2 = \|(\bar M_1, \ldots, \bar M_k)\|_2^2 = \sum_{i,j} M_{i,j}^2$. Note that if the constraints given by Eq. (4) are satisfied, we can make the differences between $\bar M_{y_i} \cdot \bar x_i$ and $\bar M_r \cdot \bar x_i$ arbitrarily large. Furthermore, previous work on the generalization properties of large margin DAGs (Platt et al., 2000) for multiclass problems showed that the generalization properties depend on the $\ell_2$-norm of $M$ (see also Crammer and Singer, 2000). We therefore would like to seek a matrix $M$ of a small norm that satisfies Eq. (4). When the sample $S$ is linearly separable by a multiclass machine, we seek a matrix $M$ of the smallest norm that satisfies Eq. (4). The result is the following optimization problem,

$$\min_M \frac{1}{2} \|M\|_2^2 \qquad (5)$$

subject to: $\forall i, r \quad \bar M_{y_i} \cdot \bar x_i + \delta_{y_i,r} - \bar M_r \cdot \bar x_i \ge 1$.

Note that $m$ of the constraints (those for $r = y_i$) are automatically satisfied since $\bar M_{y_i} \cdot \bar x_i + \delta_{y_i,y_i} - \bar M_{y_i} \cdot \bar x_i = 1$. This property is an artifact of the separable case. In the general case the sample $S$ might not be linearly separable by a multiclass machine. We therefore add slack variables $\xi_i \ge 0$ and modify Eq. (3) to be,

$$\forall i \quad \max_r \{\bar M_r \cdot \bar x_i + 1 - \delta_{y_i,r}\} - \bar M_{y_i} \cdot \bar x_i = \xi_i. \qquad (6)$$

We now replace the optimization problem defined by Eq. (5) with the following primal optimization problem,

$$\min_{M,\xi} \frac{1}{2} \beta \|M\|_2^2 + \sum_{i=1}^{m} \xi_i \qquad (7)$$

subject to: $\forall i, r \quad \bar M_{y_i} \cdot \bar x_i + \delta_{y_i,r} - \bar M_r \cdot \bar x_i \ge 1 - \xi_i$,

where $\beta > 0$ is a regularization constant, and for $r = y_i$ the inequality constraints become $\xi_i \ge 0$. This is an optimization problem with soft constraints. We would like to note in
passing that it is possible to cast an analogous optimization problem with hard constraints as in (Vapnik, 1998).

To solve the optimization problem we use the Karush-Kuhn-Tucker theorem (see for instance Vapnik, 1998; Cristianini and Shawe-Taylor, 2000). We add a dual set of variables, one for each constraint, and get the Lagrangian of the optimization problem,

$$\mathcal{L}(M, \xi, \eta) = \frac{1}{2} \beta \|M\|_2^2 + \sum_{i=1}^{m} \xi_i + \sum_{i,r} \eta_{i,r} \left[ \bar M_r \cdot \bar x_i - \bar M_{y_i} \cdot \bar x_i - \delta_{y_i,r} + 1 - \xi_i \right] \qquad (8)$$

subject to: $\forall i, r \quad \eta_{i,r} \ge 0$.

We now seek a saddle point of the Lagrangian, which would be the minimum for the primal variables $\{M, \xi\}$ and the maximum for the dual variables $\eta$. To find the minimum over the primal variables we require,

$$\frac{\partial}{\partial \xi_i} \mathcal{L} = 1 - \sum_r \eta_{i,r} = 0 \implies \sum_r \eta_{i,r} = 1. \qquad (9)$$

Similarly, for $\bar M_r$ we require,

$$\nabla_{\bar M_r} \mathcal{L} = \sum_i \eta_{i,r} \bar x_i - \sum_{i : y_i = r} \underbrace{\Big( \sum_q \eta_{i,q} \Big)}_{=1} \bar x_i + \beta \bar M_r = \sum_i \eta_{i,r} \bar x_i - \sum_i \delta_{y_i,r} \bar x_i + \beta \bar M_r = \bar 0,$$

which results in the following form,

$$\bar M_r = \beta^{-1} \left[ \sum_i (\delta_{y_i,r} - \eta_{i,r}) \bar x_i \right]. \qquad (10)$$

Eq. (10) implies that the solution of the optimization problem given by Eq. (7) is a matrix $M$ whose rows are linear combinations of the instances $\bar x_1, \ldots, \bar x_m$. Note that from Eq. (10) we get that the contribution of an instance $\bar x_i$ to $\bar M_r$ is $\delta_{y_i,r} - \eta_{i,r}$. We say that an example $\bar x_i$ is a support pattern if there is a row $r$ for which this coefficient is not zero. For each row $\bar M_r$ of the matrix $M$ we can partition the patterns with nonzero coefficients into two subsets by rewriting Eq. (10) as follows,

$$\bar M_r = \beta^{-1} \left[ \sum_{i : y_i = r} (1 - \eta_{i,r}) \bar x_i + \sum_{i : y_i \ne r} (-\eta_{i,r}) \bar x_i \right].$$

The first sum is over all patterns that belong to the $r$th class. Hence, an example $\bar x_i$ labeled $y_i = r$ is a support pattern only if $\eta_{i,r} = \eta_{i,y_i} < 1$. The second sum is over the rest of the patterns, whose labels are different from $r$. In this case, an example $\bar x_i$ is a support pattern
only if $\eta_{i,r} > 0$. Put another way, since for each pattern $\bar x_i$ the set $\{\eta_{i,1}, \eta_{i,2}, \ldots, \eta_{i,k}\}$ satisfies the constraints $\eta_{i,1}, \ldots, \eta_{i,k} \ge 0$ and $\sum_r \eta_{i,r} = 1$, each such set can be viewed as a probability distribution over the labels $\{1, \ldots, k\}$. Under this probabilistic interpretation, an example $\bar x_i$ is a support pattern if and only if its corresponding distribution is not concentrated on the correct label $y_i$. Therefore, the classifier is constructed using patterns whose labels are uncertain; the rest of the input patterns are ignored.

Next, we develop the Lagrangian using only the dual variables by substituting Eqs. (9) and (10) into Eq. (8). Since the derivation is rather technical we defer the complete derivation to App. A. We obtain the following objective function of the dual program,

$$Q(\eta) = -\frac{1}{2} \beta^{-1} \sum_{i,j} (\bar x_i \cdot \bar x_j) \sum_r (\delta_{y_i,r} - \eta_{i,r})(\delta_{y_j,r} - \eta_{j,r}) - \sum_{i,r} \eta_{i,r} \delta_{y_i,r}.$$

Let $\bar 1_i$ be the vector whose components are all zero except for the $i$th component, which is equal to one, and let $\bar 1$ be the vector whose components are all one. Using this notation we can rewrite the dual program in the following vector form,

$$\max_\eta Q(\eta) = -\frac{1}{2} \beta^{-1} \sum_{i,j} (\bar x_i \cdot \bar x_j) \left[ (\bar 1_{y_i} - \bar\eta_i) \cdot (\bar 1_{y_j} - \bar\eta_j) \right] - \sum_i \bar\eta_i \cdot \bar 1_{y_i} \qquad (11)$$

subject to: $\forall i : \bar\eta_i \ge \bar 0$ and $\bar\eta_i \cdot \bar 1 = 1$.

It is easy to verify that $Q(\eta)$ is concave in $\eta$. Since the set of constraints is convex, there is a unique maximum value of $Q(\eta)$. To simplify the problem we now perform the following change of variables. Let $\bar\tau_i = \bar 1_{y_i} - \bar\eta_i$ be the difference between the point distribution $\bar 1_{y_i}$ concentrating on the correct label and the distribution $\bar\eta_i$ obtained by the optimization problem. Then Eq. (10), which describes the form of $M$, becomes,

$$\bar M_r = \beta^{-1} \sum_i \tau_{i,r} \bar x_i. \qquad (12)$$

Since we search for the value of the variables which maximize the objective function $Q$ (and not the optimum value of $Q$ itself), we can omit any additive and positive multiplicative constants and write the dual problem given by Eq. (11) as,

$$\max_\tau Q(\tau) = -\frac{1}{2} \sum_{i,j} (\bar x_i \cdot \bar x_j)(\bar\tau_i \cdot \bar\tau_j) + \beta \sum_i \bar\tau_i \cdot \bar 1_{y_i} \qquad (13)$$

subject to: $\forall i \quad \bar\tau_i \le \bar 1_{y_i}$ and $\bar\tau_i \cdot \bar 1 = 0$.

Finally, we rewrite the classifier $H(\bar x)$ in terms of the variable $\tau$,

$$H(\bar x) = \arg\max_{r=1}^{k} \{\bar M_r \cdot \bar x\} = \arg\max_{r=1}^{k} \left\{ \sum_i \tau_{i,r} (\bar x_i \cdot \bar x) \right\}. \qquad (14)$$

As in Support Vector Machines (Cortes and Vapnik, 1995), the dual program and the resulting classifier depend only on inner-products of the form $(\bar x_i \cdot \bar x)$. Therefore, we can perform inner-product calculations in some high dimensional inner-product space $\mathcal{Z}$ by
replacing the inner-products in Eq. (13) and in Eq. (14) with a kernel function $K(\cdot, \cdot)$ that satisfies Mercer's conditions (Vapnik, 1998). The general dual program using kernel functions is therefore,

$$\max_\tau Q(\tau) = -\frac{1}{2} \sum_{i,j} K(\bar x_i, \bar x_j)(\bar\tau_i \cdot \bar\tau_j) + \beta \sum_i \bar\tau_i \cdot \bar 1_{y_i} \qquad (15)$$

subject to: $\forall i \quad \bar\tau_i \le \bar 1_{y_i}$ and $\bar\tau_i \cdot \bar 1 = 0$,

and the classification rule $H(\bar x)$ becomes,

$$H(\bar x) = \arg\max_{r=1}^{k} \left\{ \sum_i \tau_{i,r} K(\bar x, \bar x_i) \right\}. \qquad (16)$$

Therefore, constructing a multiclass predictor using kernel-based inner-products is as simple as using standard inner-products. Note that the classifier of Eq. (16) does not contain a bias parameter $b_r$ for $r = 1 \ldots k$. Augmenting these terms will add $m$ more equality constraints to the dual optimization problem, increasing the complexity of the optimization problem. However, one can always use inner-products of the form $K(\bar a, \bar b) + 1$, which is equivalent to using bias parameters and adding $\frac{1}{2}\beta \|\bar b\|^2$ to the objective function. Also note that in the special case of $k = 2$, Eq. (7) reduces to the primal program of SVM by setting $\bar w = \bar M_1 - \bar M_2$ and $C = \beta^{-1}$.

As mentioned above, Weston and Watkins (1999) also developed a multiclass version of SVM. Their approach compared the confidence $\bar M_y \cdot \bar x$ of the correct label to the confidences of all the other labels $\bar M_r \cdot \bar x$, and therefore used $m(k-1)$ slack variables in the primal problem. In contrast, in our framework the confidence of the correct label is compared to the highest similarity-score among the rest of the labels, using only $m$ slack variables in the primal program. As we describe in the sequel, our compact formalization leads to a memory- and time-efficient algorithm for the above optimization problem.

4. Decomposing the optimization problem

The dual quadratic program given by Eq. (15) can be solved using standard quadratic programming (QP) techniques. However, since it employs $mk$ variables, converting the dual program given by Eq. (15) into a standard QP form yields a representation that employs a matrix of size $mk \times mk$, which leads to a very large scale problem in general. Clearly, storing a matrix of that size is intractable for large problems. We now introduce a simple, memory-efficient algorithm for solving the quadratic optimization problem given by Eq. (15) by decomposing it into small problems. The core idea of our algorithm is based on separating the constraints of Eq. (15) into $m$ disjoint sets, $\{\bar\tau_i \mid \bar\tau_i \le \bar 1_{y_i},\ \bar\tau_i \cdot \bar 1 = 0\}_{i=1}^{m}$. The algorithm we propose works in rounds. On each round the algorithm chooses a pattern $p$ and improves the value of the objective function by updating the variables $\bar\tau_p$ under the set of constraints $\bar\tau_p \le \bar 1_{y_p}$ and $\bar\tau_p \cdot \bar 1 = 0$.

Let us fix an example index $p$ and write the objective function only in terms of the variables $\bar\tau_p$. For brevity we use $K_{i,j}$ to denote $K(\bar x_i, \bar x_j)$.
Input: $S = \{(\bar x_1, y_1), \ldots, (\bar x_m, y_m)\}$.
Initialize: $\bar\tau_1 = \bar 0, \ldots, \bar\tau_m = \bar 0$.
Loop:
1. Choose an example $p$.
2. Calculate the constants for the reduced problem:
   $A_p = K(\bar x_p, \bar x_p)$
   $\bar B_p = \sum_{i \ne p} K(\bar x_i, \bar x_p)\,\bar\tau_i - \beta \bar 1_{y_p}$
3. Set $\bar\tau_p$ to be the solution of the reduced problem:
   $\min_{\bar\tau_p} Q(\bar\tau_p) = \frac{1}{2} A_p (\bar\tau_p \cdot \bar\tau_p) + \bar B_p \cdot \bar\tau_p$
   subject to: $\bar\tau_p \le \bar 1_{y_p}$ and $\bar\tau_p \cdot \bar 1 = 0$.
Output: $H(\bar x) = \arg\max_{r=1}^{k} \left\{ \sum_i \tau_{i,r} K(\bar x, \bar x_i) \right\}$.

Figure 2: Skeleton of the algorithm for learning a multiclass support vector machine.

We now isolate the contribution of $\bar\tau_p$ in $Q$:

$$Q_p(\bar\tau_p) \stackrel{\text{def}}{=} -\frac{1}{2} \sum_{i,j} K_{i,j} (\bar\tau_i \cdot \bar\tau_j) + \beta \sum_i \bar\tau_i \cdot \bar 1_{y_i}$$
$$= -\frac{1}{2} K_{p,p} (\bar\tau_p \cdot \bar\tau_p) - \sum_{i \ne p} K_{i,p} (\bar\tau_p \cdot \bar\tau_i) - \frac{1}{2} \sum_{i \ne p,\, j \ne p} K_{i,j} (\bar\tau_i \cdot \bar\tau_j) + \beta\, \bar\tau_p \cdot \bar 1_{y_p} + \beta \sum_{i \ne p} \bar\tau_i \cdot \bar 1_{y_i}$$
$$= -\frac{1}{2} K_{p,p} (\bar\tau_p \cdot \bar\tau_p) - \bar\tau_p \cdot \Big( -\beta \bar 1_{y_p} + \sum_{i \ne p} K_{i,p} \bar\tau_i \Big) - \frac{1}{2} \sum_{i \ne p,\, j \ne p} K_{i,j} (\bar\tau_i \cdot \bar\tau_j) + \beta \sum_{i \ne p} \bar\tau_i \cdot \bar 1_{y_i}. \qquad (17)$$

Let us now define the following variables,

$$A_p = K_{p,p} > 0 \qquad (18)$$
$$\bar B_p = -\beta \bar 1_{y_p} + \sum_{i \ne p} K_{i,p} \bar\tau_i \qquad (19)$$
$$C_p = -\frac{1}{2} \sum_{i \ne p,\, j \ne p} K_{i,j} (\bar\tau_i \cdot \bar\tau_j) + \beta \sum_{i \ne p} \bar\tau_i \cdot \bar 1_{y_i}.$$

Using the variables defined above, the objective function becomes,

$$Q_p(\bar\tau_p) = -\frac{1}{2} A_p (\bar\tau_p \cdot \bar\tau_p) - \bar B_p \cdot \bar\tau_p + C_p.$$
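Step 2 of Figure 2 needs only the kernel row of the chosen pattern. Assuming a precomputed $m \times m$ kernel matrix `K` and an $m \times k$ array `tau` of dual variables (illustrative names, not from the paper), the constants of the reduced problem can be computed as follows:

```python
import numpy as np

def reduced_problem_constants(K, tau, y, p, beta):
    """Constants of the reduced problem for pattern p (Figure 2, step 2).

    K    : m x m kernel matrix, K[i, j] = K(x_i, x_j)
    tau  : m x k array of dual variables
    y    : length-m array of 0-based labels
    beta : regularization constant
    """
    A_p = K[p, p]
    # B_p = sum_{i != p} K(x_i, x_p) tau_i  -  beta * e_{y_p}
    B_p = K[p] @ tau - K[p, p] * tau[p]   # drop the i = p term from the sum
    B_p[y[p]] -= beta
    return A_p, B_p
```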
For brevity, let us now omit all constants that do not affect the solution. Each reduced optimization problem has $k$ variables and $k + 1$ constraints,

$$\min_{\bar\tau_p} Q(\bar\tau_p) = \frac{1}{2} A_p (\bar\tau_p \cdot \bar\tau_p) + \bar B_p \cdot \bar\tau_p \qquad (20)$$

subject to: $\bar\tau_p \le \bar 1_{y_p}$ and $\bar\tau_p \cdot \bar 1 = 0$.

The skeleton of the algorithm is given in Figure 2. The algorithm is initialized with $\bar\tau_i = \bar 0$ for $i = 1 \ldots m$ which, as we discuss later, leads to a simple initialization of the internal variables the algorithm employs for efficient implementation. To complete the details of the algorithm we need to discuss the following issues. First, we need a stopping criterion for the loop. A simple method is to run the algorithm for a fixed number of rounds. A better approach, which we discuss in the sequel, is to continue iterating as long as the algorithm does not meet a predefined accuracy condition. Second, we need a scheme for choosing the pattern $p$ on each round, which then induces the reduced optimization problem given in Eq. (20). Two commonly used methods are to scan the patterns sequentially or to choose a pattern uniformly at random. In this paper we describe a scheme for choosing an example $p$ in a greedy manner. This scheme appears to perform better empirically than other naive schemes. We address these two issues in Section 5. The third issue we need to address is how to solve efficiently the reduced problem given by Eq. (20). Since this problem constitutes the core and the inner loop of the algorithm, we develop an efficient method for solving the reduced quadratic optimization problem. This method is more efficient than using standard QP techniques, especially when it suffices to find an approximation to the optimal solution. Our specialized solution enables us to solve problems with a large number of classes $k$ when a straightforward approach would not be applicable. This method is described in Section 6.

5. Example selection for optimization

To remind the reader, we need to solve Eq. (15),

$$\min_\tau Q(\tau) = \frac{1}{2} \sum_{i,j} K_{i,j} (\bar\tau_i \cdot \bar\tau_j) - \beta \sum_i \bar\tau_i \cdot \bar 1_{y_i}$$

subject to: $\forall i \quad \bar\tau_i \le \bar 1_{y_i}$ and $\bar\tau_i \cdot \bar 1 = 0$,

where as before $K_{i,j} = K(\bar x_i, \bar x_j)$. We use the Karush-Kuhn-Tucker theorem (see Cristianini and Shawe-Taylor, 2000) to find the necessary conditions for a point $\tau$ to be an optimum of Eq. (15). The Lagrangian of the problem is,

$$\mathcal{L}(\tau, u, v) = \frac{1}{2} \sum_{i,j} K_{i,j} \sum_r \tau_{i,r} \tau_{j,r} - \beta \sum_{i,r} \tau_{i,r} \delta_{y_i,r} + \sum_{i,r} u_{i,r} (\tau_{i,r} - \delta_{y_i,r}) - \sum_i v_i \sum_r \tau_{i,r} \qquad (21)$$

subject to: $\forall i, r \quad u_{i,r} \ge 0$.

The first condition is,

$$\frac{\partial \mathcal{L}}{\partial \tau_{i,r}} = \sum_j K_{i,j} \tau_{j,r} - \beta \delta_{y_i,r} + u_{i,r} - v_i = 0. \qquad (22)$$
Let us now define the following auxiliary set of variables,

$$F_{i,r} = \sum_j K_{i,j} \tau_{j,r} - \beta \delta_{y_i,r}. \qquad (23)$$

For each instance $\bar x_i$, the value of $F_{i,r}$ designates the confidence in assigning the label $r$ to $\bar x_i$. A value of $\beta$ is subtracted from the correct label confidence in order to obtain a margin of at least $\beta$. Note that from Eq. (19) we get,

$$F_{p,r} = B_{p,r} + K_{p,p}\, \tau_{p,r}. \qquad (24)$$

We will make use of this relation between the variables $F$ and $B$ in the next section, in which we discuss an efficient solution to the quadratic problem. Taking the derivative with respect to the dual variables of the Lagrangian given by Eq. (21) and using the definition of $F_{i,r}$ from Eq. (23) and the KKT conditions, we get the following set of constraints on a feasible solution for the quadratic optimization problem,

$$\forall i, r \quad F_{i,r} + u_{i,r} = v_i, \qquad (25)$$
$$\forall i, r \quad u_{i,r} (\tau_{i,r} - \delta_{y_i,r}) = 0, \qquad (26)$$
$$\forall i, r \quad u_{i,r} \ge 0. \qquad (27)$$

We now further simplify the equations above. We do so by considering two cases. The first case is when $\tau_{i,r} = \delta_{y_i,r}$. In this case Eq. (26) holds automatically. By combining Eq. (27) and Eq. (25) we get that,

$$F_{i,r} \le v_i. \qquad (28)$$

In the second case, $\tau_{i,r} < \delta_{y_i,r}$. In order for Eq. (26) to hold we must have $u_{i,r} = 0$. Thus, using Eq. (25), we get that $F_{i,r} = v_i$. We now replace this single equality constraint with the following two inequalities,

$$F_{i,r} \ge v_i \quad \text{and} \quad F_{i,r} \le v_i. \qquad (29)$$

To remind the reader, the constraints on $\tau$ from the optimization problem given by Eq. (15) imply that for all $i$, $\bar\tau_i \le \bar 1_{y_i}$ and $\bar\tau_i \cdot \bar 1 = 0$. Therefore, if these constraints are satisfied, there must exist at least one label $r$ for which $\tau_{i,r} < \delta_{y_i,r}$. We thus get that $v_i = \max_r F_{i,r}$. Note also that if $\bar\tau_i = \bar 0$ then $F_{i,y_i} = v_i = \max_r F_{i,r}$ and $F_{i,y_i}$ is the unique maximum. We now combine the set of constraints from Eqs. (28) and (29) into a single inequality,

$$\max_r F_{i,r} \le v_i \le \min_{r : \tau_{i,r} < \delta_{y_i,r}} F_{i,r}. \qquad (30)$$

Finally, dropping $v_i$ we obtain,

$$\max_r F_{i,r} \le \min_{r : \tau_{i,r} < \delta_{y_i,r}} F_{i,r}. \qquad (31)$$

We now define,

$$\psi_i = \max_r F_{i,r} - \min_{r : \tau_{i,r} < \delta_{y_i,r}} F_{i,r}. \qquad (32)$$
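In code, Eq. (23) and Eq. (32) translate into a few vectorized operations. The sketch below (our illustration, assuming a precomputed kernel matrix and 0-based labels) computes the matrix $F$ and the violations $\psi_i$; the greedy selection of the next example is then simply `p = int(np.argmax(psi(F, tau, y)))`:

```python
import numpy as np

def compute_F(K, tau, y, beta):
    """F[i, r] = sum_j K(x_i, x_j) tau[j, r] - beta * delta(y_i, r) (Eq. 23)."""
    F = K @ tau
    F[np.arange(len(y)), y] -= beta
    return F

def psi(F, tau, y):
    """KKT violation psi_i of Eq. (32) for every example."""
    m, k = F.shape
    delta = np.zeros((m, k))
    delta[np.arange(m), y] = 1.0
    # minimum of F[i, r] restricted to coordinates with tau[i, r] < delta(y_i, r)
    restricted = np.where(tau < delta, F, np.inf)
    return F.max(axis=1) - restricted.min(axis=1)
```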
Since $\max_r F_{i,r} \ge \min_{r : \tau_{i,r} < \delta_{y_i,r}} F_{i,r}$, the necessary and sufficient condition for a feasible vector $\bar\tau_i$ to be an optimum for Eq. (15) is that $\psi_i = 0$. In the actual numerical implementation it is sufficient to find $\bar\tau_i$ such that $\psi_i \le \epsilon$, where $\epsilon$ is a predefined accuracy parameter. We therefore keep performing the main loop of Figure 2 so long as there are examples $(\bar x_i, y_i)$ whose values $\psi_i$ are greater than $\epsilon$.

The variables $\psi_i$ also serve as our means for choosing an example for an update. In our implementation we try to keep the memory requirements as small as possible and thus manipulate a single example on each loop. We choose the example index $p$ for which $\psi_p$ is maximal. We then find the vector $\bar\tau_p$ which is the (approximate) solution of the reduced optimization problem given by Eq. (20). Due to the change in $\bar\tau_p$ we need to update $F_{i,r}$ and $\psi_i$ for all $i$ and $r$. The pseudocode describing this process is deferred to the next section, in which we describe a simple and efficient algorithm for finding an approximate solution to the reduced optimization problem. Lin (2001) showed that this scheme does converge to the solution in a finite number of steps. Finally, we would like to note that some of the underlying ideas described in this section have also been explored by Keerthi and Gilbert (2000).

6. Solving the reduced optimization problem

The core of our algorithm relies on an efficient method for solving the reduced optimization problem given by Eq. (17), or the equivalent problem as defined by Eq. (20). In this section we describe an efficient fixed-point algorithm that finds an approximate solution to Eq. (20). We would like to note that an exact solution can also be derived. In (Crammer and Singer, 2000) we described a closely related algorithm for solving a similar quadratic optimization problem in the context of output coding. A simple modification of that algorithm can be used here. However, the algorithm needs to sort $k$ values on each iteration and thus might be slow when $k$ is large. Furthermore, as we discuss in the next section, we found empirically that the quality of the solution is quite insensitive to how well we fulfill the Karush-Kuhn-Tucker condition by bounding $\psi_i$. Therefore, it is enough to find a vector $\bar\tau_p$ that decreases the value of $Q(\bar\tau)$ significantly but is not necessarily the optimal solution.

We start by rewriting $Q(\bar\tau)$ from Eq. (20) using a completion to quadratic form and dropping the pattern index $p$,

$$Q(\bar\tau) = \frac{1}{2} A (\bar\tau \cdot \bar\tau) + \bar B \cdot \bar\tau = \frac{1}{2} A \left[ \Big( \bar\tau + \frac{\bar B}{A} \Big) \cdot \Big( \bar\tau + \frac{\bar B}{A} \Big) \right] - \frac{\bar B \cdot \bar B}{2A}.$$

We now perform the following change of variables,

$$\bar\nu = \bar\tau + \frac{\bar B}{A}, \qquad \bar D = \frac{\bar B}{A} + \bar 1_y. \qquad (33)$$

At this point, we omit additive constants and the multiplicative factor $A$, since they do not affect the value of the optimal solution. Using the above variables, the optimization problem from Eq. (20) now becomes,

$$\min_{\bar\nu} Q(\bar\nu) = \|\bar\nu\|^2 \qquad (34)$$

subject to: $\bar\nu \le \bar D$ and $\bar\nu \cdot \bar 1 = \bar D \cdot \bar 1 - 1$.
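As noted above, an exact solution that sorts $k$ values is also possible. As derived next, the optimum of Eq. (34) has the form $\nu_r = \min\{\theta^*, D_r\}$ with $\theta^*$ determined by Eq. (37); taking that characterization as given, the following sketch (our illustration of one sorting-based approach, not the algorithm of Crammer and Singer, 2000 verbatim) finds $\theta^*$ exactly:

```python
import numpy as np

def solve_theta_exact(D):
    """Exactly solve sum_r min(theta, D_r) = sum_r D_r - 1 (Eq. 37) by sorting.

    For theta in the interval [D_(j+1), D_(j)] (D sorted in decreasing order),
    the left-hand side equals j*theta + sum_{r > j} D_(r), so the candidate is
    theta = (sum_{r <= j} D_(r) - 1) / j; we return the candidate that falls
    inside its own interval, which exists and is unique.
    """
    Ds = np.sort(D)[::-1]                     # decreasing order
    csum = np.cumsum(Ds)
    for j in range(1, len(Ds) + 1):
        theta = (csum[j - 1] - 1.0) / j
        lower = Ds[j] if j < len(Ds) else -np.inf
        if lower <= theta <= Ds[j - 1]:
            return theta
    raise RuntimeError("no solution found; check D for NaNs")
```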
We would like to note that since $F_{i,r} = B_{i,r} + A_i \tau_{i,r}$, we can compute $\psi_i$ from $\bar B_i$ and thus need to store either $\bar B_i$ or $\bar F_i$, but not both.

Let us denote by $\theta$ and $\bar\alpha$ the variables of the dual problem of Eq. (34). Then, the Karush-Kuhn-Tucker conditions imply that,

$$\nu_r \le D_r \ ; \quad \alpha_r (\nu_r - D_r) = 0 \ ; \quad \nu_r + \alpha_r - \theta = 0. \qquad (35)$$

Note that since $\alpha_r \ge 0$, the above conditions imply that $\nu_r \le \theta$ for all $r$. Combining this inequality with the constraint that $\nu_r \le D_r$, we get that the solution satisfies

$$\nu_r \le \min\{\theta, D_r\}. \qquad (36)$$

If $\alpha_r = 0$ we get that $\nu_r = \theta$, and if $\alpha_r > 0$ we must have $\nu_r = D_r$. Thus, Eq. (36) holds with equality, namely, the solution is of the form $\nu_r = \min\{\theta, D_r\}$. Now, since $\bar\nu \cdot \bar 1 = \bar D \cdot \bar 1 - 1$, we get that $\theta$ satisfies the following constraint,

$$\sum_{r=1}^{k} \min\{\theta, D_r\} = \sum_{r=1}^{k} D_r - 1. \qquad (37)$$

The above equation uniquely defines $\theta$, since the sum $\sum_{r=1}^{k} \min\{\theta, D_r\}$ is a strictly monotone and continuous function in $\theta$ below $\max_r D_r$. For $\theta = \max_r D_r$ we have that $\sum_{r=1}^{k} \min\{\theta, D_r\} = \sum_{r=1}^{k} D_r > \sum_{r=1}^{k} D_r - 1$, while $\sum_{r=1}^{k} \min\{\theta, D_r\} \to -\infty$ as $\theta \to -\infty$. Therefore, there always exists a unique value $\theta^*$ that satisfies Eq. (37). The following theorem shows that $\theta^*$ indeed yields the optimal solution of the quadratic optimization problem.

Theorem 1 Let $\nu^*_r = \min\{\theta^*, D_r\}$, where $\theta^*$ is the solution of $\sum_{r=1}^{k} \min\{\theta, D_r\} = \sum_{r=1}^{k} D_r - 1$. Then, for every other feasible point $\bar\nu$ we have $\|\bar\nu\|^2 > \|\bar\nu^*\|^2$.

Proof Assume by contradiction that there is another feasible point $\bar\nu' = \bar\nu^* + \bar\Delta$ which minimizes the objective function. Since $\bar\nu' \ne \bar\nu^*$ we know that $\bar\Delta \ne \bar 0$. Both $\bar\nu^*$ and $\bar\nu'$ satisfy the equality constraint of Eq. (34), thus $\sum_r \Delta_r = 0$. Also, both points satisfy the inequality constraints of Eq. (34), thus $\Delta_r \le 0$ whenever $\nu^*_r = D_r$. Combining the last two facts with the assumption that $\bar\Delta \ne \bar 0$, we get that $\Delta_s > 0$ for some $s$ with $\nu^*_s = \theta^*$. Using again the equality $\sum_r \Delta_r = 0$, we have that there exists an index $u$ with $\Delta_u < 0$. Let us denote $\epsilon = \min\{\Delta_s, -\Delta_u\} > 0$. We now define a new feasible point $\bar\nu''$ as follows. Let $\nu''_s = \nu'_s - \epsilon$, $\nu''_u = \nu'_u + \epsilon$, and $\nu''_r = \nu'_r$ otherwise. We now show that the norm of $\bar\nu''$ is smaller than the norm of $\bar\nu'$. Since $\bar\nu'$ and $\bar\nu''$ differ only in their $s$ and $u$ coordinates, we have that,

$$\|\bar\nu''\|^2 - \|\bar\nu'\|^2 = (\nu''_s)^2 + (\nu''_u)^2 - (\nu'_s)^2 - (\nu'_u)^2.$$

Writing the values of $\bar\nu''$ in terms of $\bar\nu'$ and $\epsilon$ we get,

$$\|\bar\nu''\|^2 - \|\bar\nu'\|^2 = 2\epsilon\,(\epsilon - \nu'_s + \nu'_u).$$
From our construction of $\bar\nu''$ we have that $\nu'_s = \theta^* + \Delta_s \ge \theta^* + \epsilon$ and $\nu'_u = \nu^*_u + \Delta_u \le \theta^* - \epsilon$, and therefore $\nu'_s - \nu'_u \ge 2\epsilon > \epsilon$. This implies that

$$\|\bar\nu''\|^2 - \|\bar\nu'\|^2 < 0,$$

which is clearly a contradiction.

We now use the above characterization of the solution to derive a simple fixed-point algorithm that finds $\theta^*$. We use the simple identity $\min\{\theta, D_r\} + \max\{\theta, D_r\} = \theta + D_r$ and replace the minimum function with the above sum in Eq. (37) to get,

$$\sum_{r=1}^{k} \left[ \theta + D_r - \max\{\theta, D_r\} \right] = \sum_{r=1}^{k} D_r - 1,$$

which amounts to,

$$\theta = \frac{1}{k} \left[ \sum_{r=1}^{k} \max\{\theta, D_r\} \right] - \frac{1}{k}. \qquad (38)$$

Let us define,

$$F(\theta) = \frac{1}{k} \left[ \sum_{r=1}^{k} \max\{\theta, D_r\} \right] - \frac{1}{k}. \qquad (39)$$

Then, the optimal value $\theta^*$ satisfies

$$\theta^* = F(\theta^*). \qquad (40)$$

Eq. (40) can be used for the following iterative algorithm. The algorithm starts with an initial value of $\theta$ and then computes the next value using Eq. (39). It continues iterating by substituting each new value of $\theta$ into $F(\cdot)$, producing a series of values for $\theta$. The algorithm halts when a required accuracy is met, that is, when two successive values of $\theta$ are close enough. A pseudocode of the algorithm is given in Figure 3. The input to the algorithm is the vector $\bar D$, an initial suggestion for $\theta$, and a required accuracy $\epsilon$. We next show that if $\theta_1 \le \max_r D_r$ then the algorithm does converge to the correct value $\theta^*$.

Theorem 2 Let $\theta^*$ be the fixed point of Eq. (40) ($\theta^* = F(\theta^*)$). Assume that $\theta_1 \le \max_r D_r$ and let $\theta_{l+1} = F(\theta_l)$. Then for $l \ge 1$,

$$\frac{|\theta_{l+1} - \theta^*|}{|\theta_l - \theta^*|} \le 1 - \frac{1}{k},$$

where $k$ is the number of classes.

Proof Assume without loss of generality that $\max_r D_r = D_1 \ge D_2 \ge \ldots \ge D_k \ge D_{k+1} \stackrel{\text{def}}{=} -\infty$. Also assume that $\theta^* \in (D_{s+1}, D_s)$ and $\theta_l \in (D_{u+1}, D_u)$, where $u, s \in \{1, 2, \ldots, k\}$.
FixedPointAlgorithm($\bar D$, $\theta_1$, $\epsilon$)
Input: $\bar D$, $\theta_1$, $\epsilon$.
Initialize: $l = 0$.
Repeat:
  $l \leftarrow l + 1$.
  $\theta_{l+1} \leftarrow \frac{1}{k} \left[ \sum_{r=1}^{k} \max\{\theta_l, D_r\} \right] - \frac{1}{k}$.
Until $\frac{|\theta_l - \theta_{l+1}|}{|\theta_l|} \le \epsilon$.
Assign for $r = 1, \ldots, k$: $\nu_r = \min\{\theta_{l+1}, D_r\}$.
Return: $\bar\tau = \bar\nu - \frac{\bar B}{A}$.

Figure 3: The fixed-point algorithm for solving the reduced quadratic program.

Thus,

$$\theta_{l+1} = F(\theta_l) = \frac{1}{k} \left[ \sum_{r=1}^{k} \max\{\theta_l, D_r\} \right] - \frac{1}{k} = \frac{1}{k} \left( \sum_{r=u+1}^{k} \theta_l \right) + \frac{1}{k} \left( \sum_{r=1}^{u} D_r - 1 \right) = \left(1 - \frac{u}{k}\right) \theta_l + \frac{1}{k} \left( \sum_{r=1}^{u} D_r - 1 \right). \qquad (41)$$

Note that if $\theta_l \le \max_r D_r$ then $\theta_{l+1} \le \max_r D_r$. Similarly,

$$\theta^* = F(\theta^*) = \left(1 - \frac{s}{k}\right) \theta^* + \frac{1}{k} \left( \sum_{r=1}^{s} D_r - 1 \right) \implies \theta^* = \frac{1}{s} \left( \sum_{r=1}^{s} D_r - 1 \right). \qquad (42)$$

We now need to consider three cases depending on the relative order of $s$ and $u$. The first case is when $u = s$. In this case we get that,

$$\frac{|\theta_{l+1} - \theta^*|}{|\theta_l - \theta^*|} = \frac{\left| \left(1 - \frac{s}{k}\right)\theta_l + \frac{1}{k}\left(\sum_{r=1}^{s} D_r - 1\right) - \theta^* \right|}{|\theta_l - \theta^*|} = \frac{\left| \left(1 - \frac{s}{k}\right)\theta_l + \frac{s}{k}\theta^* - \theta^* \right|}{|\theta_l - \theta^*|} = 1 - \frac{s}{k} \le 1 - \frac{1}{k},$$

where the second equality follows from Eq. (42). The second case is where $u > s$. In this case we get that for all $r = s+1, \ldots, u$:

$$\theta_l \le D_r \le \theta^*. \qquad (43)$$
Using Eq. (41) and Eq. (42) we get,

$$\theta_{l+1} = \left(1 - \frac{u}{k}\right)\theta_l + \frac{1}{k}\left(\sum_{r=1}^{u} D_r - 1\right) = \left(1 - \frac{u}{k}\right)\theta_l + \frac{1}{k}\left(\sum_{r=1}^{s} D_r - 1\right) + \frac{1}{k}\left(\sum_{r=s+1}^{u} D_r\right) = \left(1 - \frac{u}{k}\right)\theta_l + \frac{s}{k}\theta^* + \frac{1}{k}\left(\sum_{r=s+1}^{u} D_r\right).$$

Applying Eq. (43) we obtain,

$$\theta_{l+1} \le \left(1 - \frac{u}{k}\right)\theta_l + \frac{s}{k}\theta^* + \frac{1}{k}(u - s)\theta^* = \left(1 - \frac{u}{k}\right)\theta_l + \frac{u}{k}\theta^*.$$

Since $\theta_{l+1}$ is bounded by a convex combination of $\theta_l$ and $\theta^*$, and $\theta^*$ is larger than $\theta_l$, we have $\theta^* \ge \theta_{l+1}$. We therefore finally get that,

$$\frac{|\theta_{l+1} - \theta^*|}{|\theta_l - \theta^*|} = \frac{\theta^* - \theta_{l+1}}{\theta^* - \theta_l} \le \frac{\theta^* - \left(1 - \frac{u}{k}\right)\theta_l - \frac{u}{k}\theta^*}{\theta^* - \theta_l} = 1 - \frac{u}{k} \le 1 - \frac{1}{k}.$$

The last case, where $u < s$, is derived analogously to the second case, interchanging the roles of $u$ and $s$.

From the proof we see that the best convergence rate is obtained for large values of $u$. Thus, a good feasible initialization for $\theta_1$ is $\min_r D_r$. In this case,

$$\theta_2 = F(\theta_1) = \frac{1}{k} \sum_{r=1}^{k} D_r - \frac{1}{k}.$$

This gives a simple initialization of the algorithm which ensures that the initial rate of convergence will be fast.

We are now ready to describe the complete implementation of the algorithm for learning a multiclass kernel machine. The algorithm gets a required accuracy parameter $\epsilon$ and the value of the regularization constant $\beta$. It is initialized with $\bar\tau_i = \bar 0$ for all indices $1 \le i \le m$. This value yields a simple initialization of the variables $F_{i,r}$. On each iteration we compute from $F_{i,r}$ the value $\psi_i$ for each example and choose the example index $p$ for which $\psi_p$ is the largest. We then call the fixed-point algorithm, which in turn finds an approximate solution to the reduced quadratic optimization problem for the example indexed $p$. The fixed-point algorithm returns a set of new values for $\bar\tau_p$, which triggers the update of $F_{i,r}$. This process is repeated until the value $\psi_i$ is smaller than $\epsilon$ for all $1 \le i \le m$. The pseudocode of the algorithm is given in Figure 4.
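Before turning to Figure 4, here is the fixed-point procedure of Figure 3 in code (a sketch; the iteration cap is our safeguard and not part of the paper's pseudocode). It returns $\bar\nu$, from which the caller recovers $\bar\tau = \bar\nu - \bar B/A$ as in Figure 3:

```python
import numpy as np

def fixed_point_algorithm(D, theta, eps, max_iter=10_000):
    """Fixed-point iteration theta <- F(theta) of Eq. (39)/(40).

    D     : length-k vector from the change of variables (Eq. 33)
    theta : initial value; any theta <= max(D) guarantees convergence
            (Theorem 2), and min(D) is a good feasible choice
    eps   : required relative accuracy of two successive theta values
    Returns nu with nu_r = min(theta, D_r).
    """
    k = len(D)
    for _ in range(max_iter):
        theta_next = np.maximum(theta, D).sum() / k - 1.0 / k   # Eq. (39)
        converged = abs(theta - theta_next) <= eps * abs(theta)
        theta = theta_next
        if converged:
            break
    return np.minimum(theta, D)
```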
Input: $S = \{(\bar x_1, y_1), \ldots, (\bar x_m, y_m)\}$.
Initialize for $i = 1, \ldots, m$:
  $\bar\tau_i = \bar 0$
  $F_{i,r} = -\beta \delta_{r,y_i}$ (for $r = 1 \ldots k$)
  $A_i = K(\bar x_i, \bar x_i)$
Repeat:
  Calculate for $i = 1 \ldots m$: $\psi_i = \max_r F_{i,r} - \min_{r : \tau_{i,r} < \delta_{y_i,r}} F_{i,r}$.
  Set: $p = \arg\max_i \{\psi_i\}$.
  Set for $r = 1 \ldots k$: $D_r = \frac{F_{p,r}}{A_p} - \tau_{p,r} + \delta_{r,y_p}$, and $\theta_1 = \frac{1}{k} \sum_{r=1}^{k} D_r - \frac{1}{k}$.
  Call: $\bar\tau'_p = \text{FixedPointAlgorithm}(\bar D, \theta_1, \epsilon/2)$. (See Figure 3.)
  Set: $\Delta\bar\tau_p = \bar\tau'_p - \bar\tau_p$.
  Update for $i = 1 \ldots m$ and $r = 1 \ldots k$: $F_{i,r} \leftarrow F_{i,r} + \Delta\tau_{p,r}\, K(\bar x_p, \bar x_i)$.
  Update: $\bar\tau_p \leftarrow \bar\tau'_p$.
Until $\psi_p < \epsilon\beta$.
Output: $H(\bar x) = \arg\max_r \left\{ \sum_i \tau_{i,r} K(\bar x, \bar x_i) \right\}$.

Figure 4: Basic algorithm for learning a multiclass, kernel-based, support vector machine using KKT conditions for example selection.

7. Implementation details

We have discussed so far the underlying principles and algorithmic issues that arise in the design of multiclass kernel-based vector machines. However, to make the learning algorithm practical for large datasets we had to make several technical improvements to the baseline implementation. While these improvements do not change the underlying design principles, they lead to a significant improvement in running time. We therefore devote this section to a description of the implementation details. To compare the performance of the different versions presented in this section we used the MNIST OCR dataset.[1] The MNIST dataset contains 60,000 training examples and 10,000 test examples and thus can underscore significant implementation improvements. Before diving into the technical details we would like to note that many of the techniques are by no means new and have been used in prior implementations of two-class support vector machines (see for instance Platt, 1998; Joachims, 1998; Collobert and Bengio, 2001). However, a few of our implementation improvements build on the specific algorithmic design of multiclass kernel machines.

[1] Available at http://yann.lecun.com/exdb/mnist/index.html
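Putting the pieces together, the sketch below mirrors the main loop of Figure 4, reusing `psi()` and `fixed_point_algorithm()` from the earlier sketches. It assumes a precomputed kernel matrix and 0-based labels, and is meant as an illustration rather than the authors' implementation:

```python
import numpy as np

def train_multiclass_svm(K, y, k, beta, eps):
    """Basic learner of Figure 4 on a precomputed m x m kernel matrix K.

    y : length-m array of 0-based labels; returns the m x k matrix tau.
    """
    m = len(y)
    tau = np.zeros((m, k))
    F = np.zeros((m, k))
    F[np.arange(m), y] = -beta           # tau = 0  =>  F[i, r] = -beta*delta(y_i, r)
    A = np.diag(K).copy()                # A_i = K(x_i, x_i)
    while True:
        ps = psi(F, tau, y)              # KKT violations of Eq. (32)
        p = int(np.argmax(ps))           # greedy choice of the worst violator
        if ps[p] < eps * beta:           # stopping rule of Figure 4
            break
        B_over_A = F[p] / A[p] - tau[p]  # Eq. (24): B_p = F_p - A_p * tau_p
        D = B_over_A.copy()
        D[y[p]] += 1.0                   # D = B/A + e_{y_p}  (Eq. 33)
        theta = D.sum() / k - 1.0 / k    # initialization suggested in Section 6
        nu = fixed_point_algorithm(D, theta, eps / 2.0)
        new_tau_p = nu - B_over_A        # tau = nu - B/A (Figure 3)
        F += np.outer(K[p], new_tau_p - tau[p])   # F[i,r] += dtau_r * K(x_p, x_i)
        tau[p] = new_tau_p
    return tau
```

A new instance $\bar x$ is then classified by $H(\bar x) = \arg\max_r \sum_i \tau_{i,r} K(\bar x, \bar x_i)$, e.g. `int(np.argmax(tau.T @ kvec))` where `kvec[i]` holds $K(\bar x, \bar x_i)$.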
Figure 5: The run time (left) and test error (right) as a function of the required accuracy ε.

Our starting point and baseline implementation is the algorithm described in Figure 2, combined with the fixed-point algorithm for solving the reduced quadratic optimization problem. In the baseline implementation we simply cycle through the examples for a fixed number of iterations, solving the reduced optimization problem while ignoring the KKT conditions for the examples. This scheme is very simple to implement and use. However, it spends unnecessary time on the optimization of patterns which are correctly classified with a large confidence. We use this scheme to illustrate the importance of the efficient example selection described in the previous section. We now describe the different steps we took, starting from the version described in Section 5.

Using KKT conditions for example selection

This is the algorithm described in Figure 4. For each example $i$ and label $r$ we compute $F_{i,r}$. These variables are used to compute $\psi_i$ as described in Section 5. On each round we choose the example $p$ for which $\psi_p$ is the largest and iterate the process until the value of $\psi_i$ is smaller than a predefined accuracy denoted by $\epsilon$. It turns out that the choice of $\epsilon$ is not crucial, and a large range of values yields good results. The larger $\epsilon$ is, the sooner we terminate the main loop of the algorithm. Therefore, we would like to set $\epsilon$ to a large value as long as the generalization performance is not affected. In Figure 5 we show the running time and the test error as a function of $\epsilon$. The results show that a moderate value of $\epsilon = 0.1$ already yields good generalization. The increase in running time when using smaller values of $\epsilon$ is between 20% and 30%. Thus, the algorithm is rather robust to the actual choice of the accuracy parameter $\epsilon$, so long as it is not set to a value which is evidently too large.

Maintaining an active set

The standard implementation described above scans the entire training set and computes $\psi_i$ for each example $\bar x_i$ in the set. However, if only a few support patterns constitute the multiclass machine, then the vector $\bar\tau_i$ is the zero vector for many examples. We thus partition the set of examples into two sets. The first, denoted by $A$ and called the active set, is composed of the set of examples that contribute to the solution, that is, $A = \{i \mid \bar\tau_i \ne \bar 0\}$. The second set is simply its complement, $A^c = \{i \mid \bar\tau_i = \bar 0\}$. During the course of the main loop we first search for an example to update from the set $A$. Only if such an example does not exist, which can happen iff $\forall i \in A,\ \psi_i < \epsilon$, do we scan the
Figure 6: The value of the objective function Q as a function of the number of iterations, for a fixed and a variable scheduling of the accuracy parameter ε.

set $A^c$ for an example $p$ with $\psi_p > \epsilon$. If such an example exists, we remove it from $A^c$, add it to $A$, and call the fixed-point algorithm with that example. This procedure spends most of its time adjusting the weights of examples that constitute the active set and adds a new example only when the active set is exhausted. A natural implication of this procedure is that the support patterns can come only from the active set.

Cooling of the accuracy parameter

The employment of an active set yields a significant reduction in running time. However, the scheme also forces the algorithm to keep updating the vectors $\bar\tau_i$ for $i \in A$ as long as there is even a single example $i$ for which $\psi_i > \epsilon$. This may result in minuscule changes and a slow decrease in $Q$ once most examples in $A$ have been updated. In Figure 6 we plot in a bold line the value of $Q$ as a function of the number of iterations when $\epsilon$ is kept fixed. The line has a staircase-like shape. Careful examination of the iterations in which there was a significant drop in $Q$ revealed that these are the iterations on which new examples were added to the active set. After each addition of a new example, numerous iterations are spent adjusting the weights $\bar\tau_i$. To accelerate the process, especially on early iterations during which we mostly add new examples to the active set, we use a variable accuracy parameter rather than a fixed accuracy. On early iterations the accuracy value is set to a high value, so that the algorithm will mostly add new examples to the active set and spend only a small amount of time on adjusting the weights of the support patterns. As the number of iterations increases, we decrease $\epsilon$ and spend more time on adjusting the weights of support patterns. The result is a smoother and more rapid decrease in $Q$, which leads to faster convergence of the algorithm. We refer to this process of gradually decreasing
Figure 7: Comparison of the run-time on the MNIST dataset of the different versions as a function of the training-set size. Version 1 is the baseline implementation. Version 2 uses KKT conditions for selecting an example to update. Version 3 adds the usage of an active set and cooling of ε. Version 4 adds caching of inner-products. Finally, version 5 uses data structures for representing and using sparse inputs.

$\epsilon$ as cooling. We tested the following cooling schemes (for $t = 0, 1, \ldots$): (a) exponential: $\epsilon(t) = \epsilon_0 \exp(-t)$; (b) linear: $\epsilon(t) = \epsilon_0/(t+1)$; (c) logarithmic: $\epsilon(t) = \epsilon_0/\log_{10}(t+10)$. The initial accuracy $\epsilon_0$ was set to [...]. We found that all of these cooling schemes improve the rate of decrease in $Q$, especially the logarithmic scheme, for which $\epsilon(t)$ is relatively large for a long period and then decreases moderately. The dashed line in Figure 6 designates the value of $Q$ as a function of the number of iterations using a logarithmic cooling scheme for $\epsilon$. In the particular setting of the figure, cooling reduces the number of iterations, and thus the running time, by an order of magnitude.

Caching

Previous implementations of algorithms for support vector machines employ a cache for saving expensive kernel-based inner-products (see for instance Platt, 1998; Joachims, 1998; Collobert and Bengio, 2001). Indeed, one of the most expensive steps in the algorithm is the evaluation of the kernel. Our scheme for maintaining a cache is as follows. For small datasets we store in the cache all the kernel evaluations between each example in the active set and all the examples in the training set. For large problems with many support patterns (and thus a large active set) we use a least-recently-used (LRU) scheme as a caching strategy. In this scheme, when the cache is full we replace the least recently used inner-products of an example with the inner-products of a new example. LRU caching is also used in SVM-light (Joachims, 1998).
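A least-recently-used cache of kernel rows can be kept with an ordered dictionary. The sketch below is one possible realization of the LRU strategy described above; the capacity parameter, names, and interface are our assumptions, not the authors' implementation:

```python
from collections import OrderedDict
import numpy as np

class KernelCache:
    """LRU cache of kernel rows: row i holds (K(x_i, x_1), ..., K(x_i, x_m))."""

    def __init__(self, kernel, X, capacity):
        self.kernel, self.X, self.capacity = kernel, X, capacity
        self._rows = OrderedDict()           # example index -> cached kernel row

    def row(self, i):
        if i in self._rows:
            self._rows.move_to_end(i)        # mark as most recently used
            return self._rows[i]
        if len(self._rows) >= self.capacity:
            self._rows.popitem(last=False)   # evict the least recently used row
        r = np.array([self.kernel(self.X[i], x) for x in self.X])
        self._rows[i] = r
        return r
```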