Learning Permutations with Exponential Weights


Journal of Machine Learning Research 10 (2009) Submitted 9/08; Published 7/09

Learning Permutations with Exponential Weights

David P. Helmbold
Manfred K. Warmuth
Computer Science Department
University of California, Santa Cruz
Santa Cruz, CA

Editor: Yoav Freund

Abstract

We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes predictions using an efficient method for decomposing the weight matrix into a convex combination of permutations. The weight matrix is updated by multiplying the current matrix entries by exponential factors, and an iterative procedure is needed to restore double stochasticity. Even though the result of this procedure does not have a closed form, a new analysis approach allows us to prove an optimal (up to small constant factors) bound on the regret of our algorithm. This regret bound is significantly better than that of either Kalai and Vempala's more efficient Follow the Perturbed Leader algorithm or the computationally expensive method of explicitly representing each permutation as an expert.

Keywords: permutation, ranking, on-line learning, Hedge algorithm, doubly stochastic matrix, relative entropy projection, Sinkhorn balancing

1. Introduction

Finding a good permutation is a key aspect of many problems such as the ranking of search results or matching workers to tasks. In this paper we present an efficient and effective on-line algorithm for learning permutations in a model related to the on-line allocation model of learning with experts (Freund and Schapire, 1997). In each trial, the algorithm probabilistically chooses a permutation and then incurs a linear loss based on how appropriate the permutation was for that trial. The regret is the total expected loss of the algorithm on the whole sequence of trials minus the total loss of the best permutation chosen in hindsight for the whole sequence, and the goal is to find algorithms that have provably small worst-case regret.

For example, one could consider a commuter airline which owns n airplanes of various sizes and flies n routes.(1) Each day the airline must match airplanes to routes. If too small an airplane is assigned to a route then the airline will lose revenue and reputation due to unserved potential passengers. On the other hand, if too large an airplane is used on a long route then the airline could have larger than necessary fuel costs. If the number of passengers wanting each flight were known ahead of time, then choosing an assignment is a weighted matching problem.

An earlier version of this paper appears in Proceedings of the Twentieth Annual Conference on Computational Learning Theory (COLT 2007), published by Springer in the LNAI series. Manfred K. Warmuth acknowledges the support of NSF grant IIS.
(1) We assume that each route starts and ends at the airline's home airport.

(c) 2009 David P. Helmbold and Manfred K. Warmuth.

In the on-line allocation model, the airline first chooses a distribution over possible assignments of airplanes to routes and then randomly selects an assignment from the distribution. The regret of the airline is the earnings of the single best assignment for the whole sequence of passenger requests minus the total expected earnings of the on-line assignments. When airplanes and routes are each numbered from 1 to n, then an assignment is equivalent to selecting a permutation. The randomness helps protect the on-line algorithm from adversaries and allows one to prove good bounds on the algorithm's regret for arbitrary sequences of requests.

Since there are n! permutations on n elements, it is infeasible to simply treat each permutation as an expert and apply one of the expert algorithms that uses exponential weights. Previous work has exploited the combinatorial structure of other large sets of experts to create efficient algorithms (see Helmbold and Schapire, 1997; Takimoto and Warmuth, 2003; Warmuth and Kuzmin, 2008, for examples). Our solution is to make a simplifying assumption on the loss function which allows the new algorithm, called PermELearn, to maintain a sufficient amount of information about the distribution over n! permutations while using only n^2 weights.

We represent a permutation of n elements as an n x n permutation matrix Π where Π_{i,j} = 1 if the permutation maps element i to position j and Π_{i,j} = 0 otherwise. As the algorithm randomly selects a permutation Π̂ at the beginning of a trial, an adversary simultaneously selects an arbitrary loss matrix L ∈ [0,1]^{n×n} which specifies the loss of all permutations for the trial. Each entry L_{i,j} of the loss matrix gives the loss for mapping element i to position j, and the loss of any whole permutation is the sum of the losses of the permutation's mappings, that is, the loss of permutation Π is Σ_i L_{i,Π(i)} = Σ_{i,j} Π_{i,j} L_{i,j}. Note that the per-trial expected losses can be as large as n, as opposed to the common assumption for the expert setting that the losses are bounded in [0,1]. In Section 3 we show how a variety of intuitive loss motifs can be expressed in this matrix form.

This assumption that the loss has a linear matrix form ensures the expected loss of the algorithm can be expressed as Σ_{i,j} W_{i,j} L_{i,j}, where W = E(Π̂). This expectation W is an n x n weight matrix which is doubly stochastic, that is, it has non-negative entries and the property that every row and column sums to 1. The algorithm's uncertainty about which permutation is the target is summarized by W; each weight W_{i,j} is the probability that the algorithm predicts with a permutation mapping element i to position j. It is worth emphasizing that the W matrix is only a summary of the distribution over permutations used by any algorithm (it doesn't indicate which permutations have non-zero probability, for example). However, this summary is sufficient to determine the algorithm's expected loss when the losses of permutations have the assumed loss matrix form.

Our PermELearn algorithm stores the weight matrix W and must convert W into an efficiently sampled distribution over permutations in order to make predictions. By Birkhoff's Theorem, every doubly stochastic matrix can be expressed as the convex combination of at most n^2 - 2n + 2 permutations (see, e.g., Bhatia, 1997). In Appendix A we show that a greedy matching-based algorithm efficiently decomposes any doubly stochastic matrix into a convex combination of at most n^2 - 2n + 2 permutations. Although the efficacy of this algorithm is implied by standard dimensionality arguments, we give a new combinatorial proof that provides independent insight as to why the algorithm finds a convex combination matching Birkhoff's bound.
Our algorithm for learning permutations predicts with a random Π̂ sampled from the convex combination of permutations created by decomposing weight matrix W. It has been applied recently for pricing combinatorial markets when the outcomes are permutations of objects (Chen et al., 2008). The PermELearn algorithm updates the entries of its weight matrix using exponential factors commonly used for updating the weights of experts in on-line learning algorithms (Littlestone and Warmuth, 1994; Vovk, 1990; Freund and Schapire, 1997): each entry W_{i,j} is multiplied by a factor e^{-η L_{i,j}}.

Here η is a positive learning rate that controls the strength of the update (when η = 0, all the factors are one and the update is vacuous). After this update, the weight matrix no longer has the doubly stochastic property, and the weight matrix must be projected back into the space of doubly stochastic matrices (called Sinkhorn balancing, see Section 4) before the next prediction can be made. In Theorem 4 we bound the expected loss of PermELearn over any sequence of trials by

    (n ln n + η L_best) / (1 - e^{-η}),        (1)

where n is the number of elements being permuted, η is the learning rate, and L_best is the loss of the best permutation on the entire sequence. If an upper bound L_est ≥ L_best is known, then η can be tuned (as in Freund and Schapire, 1997) and the expected loss bound becomes

    L_best + sqrt(2 L_est n ln n) + n ln n,        (2)

giving a bound of sqrt(2 L_est n ln n) + n ln n on the worst-case expected regret of the tuned PermELearn algorithm. We also prove a matching lower bound (Theorem 6) of Ω(sqrt(L_best n ln n)) for the expected regret of any algorithm solving our permutation learning problem.

A simpler and more efficient algorithm than PermELearn maintains the sum of the loss matrices on the previous trials. Each trial it adds random perturbations to the cumulative loss matrix and then predicts with the permutation having minimum perturbed loss. This Follow the Perturbed Leader algorithm (Kalai and Vempala, 2005) has good regret bounds for many on-line learning settings. However, the regret bound we can obtain for it in the permutation setting is about a factor of n worse than the bound for PermELearn and the lower bound.

Although computationally expensive, one can also consider running the Hedge algorithm while explicitly representing each of the n! permutations as an expert. If T is the sum of the loss matrices over the past trials and F is the n x n matrix with entries F_{i,j} = e^{-η T_{i,j}}, then the weight of each permutation expert Π is proportional to the product ∏_i F_{i,Π(i)} and the normalization constant is the permanent of the matrix F. Calculating the permanent is a known #P-complete problem and sampling from this distribution over permutations is very inefficient (Jerrum et al., 2004). Moreover, since the loss range of a permutation is [0,n], the standard loss bound for the algorithm that uses one expert per permutation must be scaled up by a factor of n, becoming

    L_best + sqrt(2 L_est n ln(n!)) + n ln(n!)  ≤  L_best + sqrt(2 L_est n^2 ln n) + n^2 ln n.

This expected loss bound is similar to our expected loss bound for PermELearn in Equation (2), except that the n ln n terms are replaced by n^2 ln n. Our method based on Sinkhorn balancing bypasses the estimation of permanents and somehow PermELearn's implicit representation and prediction method exploit the structure of permutations and let us obtain the improved bound. We also give a matching lower bound that shows PermELearn has the optimum regret bound (up to a small constant factor). It is an interesting open question whether the structure of permutations can be exploited to prove bounds like (2) for the Hedge algorithm with one expert per permutation.

PermELearn's weight updates belong to the Exponentiated Gradient family of updates (Kivinen and Warmuth, 1997) since the components L_{i,j} of the loss matrix that appear in the exponential factor are the derivatives of our linear loss with respect to the weights W_{i,j}.

This family of updates usually maintains a probability vector as its weight vector. In that case the normalization of the weight vector is straightforward and is folded directly into the update formula. Our new algorithm PermELearn for learning permutations maintains a doubly stochastic matrix with n^2 weights. The normalization alternately normalizes the rows and columns of the matrix until convergence (Sinkhorn balancing). This may require an unbounded number of steps and the resulting matrix does not have a closed form. Despite this fact, we are able to prove bounds for our algorithm. We first show that our update minimizes a tradeoff between the loss and a relative entropy between doubly stochastic matrices. This relative entropy becomes our measure of progress in the analysis. Luckily, the un-normalized multiplicative update already makes enough progress (towards the best permutation) to achieve the loss bound quoted above. Finally, we interpret the iterations of Sinkhorn balancing as Bregman projections with respect to the same relative entropy and show using the properties of Bregman projections that these projections can only increase the progress and thus don't hurt the analysis (Herbster and Warmuth, 2001).

Our new insight of splitting the update into an un-normalized step followed by a normalization step also leads to a streamlined proof of the loss bound for the Hedge algorithm in the standard expert setting that is interesting in its own right. Since the loss in the allocation setting is linear, the bounds can be proven in many different ways, including potential based methods (see, e.g., Kivinen and Warmuth, 1999; Gordon, 2006; Cesa-Bianchi and Lugosi, 2006). For the sake of completeness we reprove our main loss bound for PermELearn using potential based methods in Appendix B. We show how potential based proof methods can be extended to handle linear equality constraints that don't have a solution in closed form, paralleling a related extension to linear inequality constraints in Kuzmin and Warmuth (2007). In this appendix we also discuss the relationship between the projection and potential based proof methods. In particular, we show how the Bregman projection step corresponds to plugging in suboptimal dual variables into the potential.

The remainder of the paper is organized as follows. We introduce our notation in the next section. Section 3 presents the permutation learning model and gives several intuitive examples of appropriate loss motifs. Section 4 gives the PermELearn algorithm and discusses its computational requirements. One part of the algorithm is to decompose the current doubly stochastic matrix into a small convex combination of permutations using a greedy algorithm. The bound on the number of permutations needed to decompose the weight matrix is deferred to Appendix A. We then bound PermELearn's regret in Section 5 in a two-step analysis that uses a relative entropy as a measure of progress. To exemplify the new techniques, we also analyze the basic Hedge algorithm with the same methodology. The regret bounds for Hedge and PermELearn are re-proven in Appendix B using potential based methods. In Section 6, we apply the Follow the Perturbed Leader algorithm to learning permutations and show that the resulting regret bounds are not as good. In Section 7 we prove a lower bound on the regret when learning permutations that is within a small constant factor of our regret bound on the tuned PermELearn algorithm. The concluding section describes extensions and directions for further work.

2. Notation

All matrices will be n x n matrices.
When A is a matrix, A_{i,j} denotes the entry of A in row i and column j. We use A • B to denote the dot product between matrices A and B, that is, Σ_{i,j} A_{i,j} B_{i,j}. We use single superscripts (e.g., A^k) to identify matrices/permutations from a sequence.

Permutations on n elements are frequently represented in two ways: as a bijective mapping of the elements {1,...,n} into the positions {1,...,n} or as a permutation matrix which is an n x n binary matrix with exactly one 1 in each row and each column. We use the notation Π (and Π̂) to represent a permutation in either format, using the context to indicate the appropriate representation. Thus, for each i ∈ {1,...,n}, we use Π(i) to denote the position that the i-th element is mapped to by permutation Π, and matrix element Π_{i,j} = 1 if Π(i) = j and 0 otherwise. If L is a matrix with n rows then the product ΠL permutes the rows of L. For example, writing the permutation (2,4,3,1) as a matrix and an arbitrary matrix L by its rows l_1,...,l_4:

    Π = ( 0 1 0 0 )        L = ( l_1 )        ΠL = ( l_2 )
        ( 0 0 0 1 )            ( l_2 )             ( l_4 )
        ( 0 0 1 0 )            ( l_3 )             ( l_3 )
        ( 1 0 0 0 )            ( l_4 )             ( l_1 )

    perm. (2,4,3,1) as matrix    an arbitrary matrix    permuting the rows

Convex combinations of permutations create doubly stochastic or balanced matrices: nonnegative matrices whose n rows and n columns each sum to one. Our algorithm maintains its uncertainty about which permutation is best as a doubly stochastic weight matrix W and needs to randomly select a permutation from some distribution whose expectation is W. By Birkhoff's Theorem (see, e.g., Bhatia, 1997), for every doubly stochastic matrix W there is a decomposition into a convex combination of at most n^2 - 2n + 2 permutation matrices. We show in Appendix A how a decomposition of this size can be found effectively. This decomposition gives a distribution over permutations whose expectation is W that now can be effectively sampled because its support is at most n^2 - 2n + 2 permutations.

3. On-line Protocol

We are interested in learning permutations in a model related to the on-line allocation model of learning with experts (Freund and Schapire, 1997). In that model there are N experts and at the beginning of each trial the algorithm allocates a probability distribution w over the experts. The algorithm picks expert i with probability w_i and then receives a loss vector l ∈ [0,1]^N. Each expert i incurs loss l_i and the expected loss of the algorithm is w · l. Finally, the algorithm updates its distribution w for the next trial.

In case of permutations we could have one expert per permutation and allocate a distribution over the n! permutations. Explicitly tracking this distribution is computationally expensive, even for moderate n. As discussed in the introduction, we assume that the losses in each trial can be specified by a loss matrix L ∈ [0,1]^{n×n} where the loss of each permutation Π has the linear form Σ_i L_{i,Π(i)} = Π • L. If the algorithm's prediction Π̂ is chosen probabilistically in each trial then the algorithm's expected loss is E[Π̂ • L] = W • L, where W = E[Π̂]. This expected prediction W is an n x n doubly stochastic matrix and algorithms for learning permutations under the linear loss assumption can be viewed as implicitly maintaining such a doubly stochastic weight matrix.

More precisely, the on-line algorithm follows the following protocol in each trial:

- The learner (probabilistically) chooses a permutation Π̂, and let W = E(Π̂).
- Nature simultaneously chooses a loss matrix L ∈ [0,1]^{n×n} for the trial.
- At the end of the trial, the algorithm is given L. The loss of Π̂ is Π̂ • L and the expected loss of the algorithm is W • L.

- Finally, the algorithm updates its distribution over permutations for the next trial, implicitly updating matrix W.

Although our algorithm can handle arbitrary sequences of loss matrices L ∈ [0,1]^{n×n}, nature could be significantly more restricted. Many ranking applications have an associated loss motif M and nature is constrained to choose (row) permutations of M as its loss matrix L. In effect, at each trial nature chooses a correct permutation Π and uses the loss matrix L = ΠM. Note that the permutation left-multiplies the loss motif, and thus permutes the rows of M. If nature chooses the identity permutation then the loss matrix L is the motif M itself. When M is known to the algorithm, it suffices to give the algorithm only the permutation Π at the end of the trial, rather than the loss matrix L itself.

Figure 1 gives examples of loss motifs. The last loss in Figure 1 is related to a competitive List Update Problem where an algorithm services requests to a list of n items. In the List Update Problem the cost of a request is the requested item's current position in the list. After each request, the requested item can be moved forward in the list for free, and additional rearrangement can be done at a cost of one per transposition. The goal is for the algorithm to be cost-competitive with the best static ordering of the elements in hindsight. Note that the transposition cost for additional list rearrangement is not represented in the permutation loss motif. Blum et al. (2003) give very efficient algorithms for the List Update Problem that do not do additional rearranging of the list (and thus do not incur the cost neglected by the loss motif). In our notation, their bound has the same form as ours (1) but with the n ln n factors replaced by O(n). However, our lower bound (see Section 7) shows that the n ln n factors in (2) are necessary in the general permutation setting.

Note that many compositions of loss motifs are possible. For example, given two motifs with their associated losses, any convex combination of the motifs creates a new motif for the same convex combination of the associated losses. Other component-wise combinations of two motifs (such as product or max) can also produce interesting loss motifs, but the combination usually cannot be distributed across the matrix dot-product calculation, and so cannot be expressed as a simple linear function of the original losses.

4. PermELearn Algorithm

Our permutation learning algorithm uses exponential weights and we call it PermELearn. It maintains an n x n doubly stochastic weight matrix W as its main data structure, where W_{i,j} is the probability that PermELearn predicts with a permutation mapping element i to position j. In the absence of prior information it is natural to start with uniform weights, that is, the matrix with 1/n in each entry.

In each trial PermELearn does two things:

1. Choose a permutation Π̂ from some distribution such that E[Π̂] = W.

2. Create a new doubly stochastic matrix W̃ for use in the next trial based on the current weight matrix W and loss matrix L.
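To make the loss-motif construction above concrete, here is a small numpy sketch of our own (the helper names and the specific motif are chosen for illustration, not taken from the paper): it builds the motif whose loss counts misplaced elements, forms L = ΠM for a correct permutation Π, and evaluates Π̂ • L and W • L.

import numpy as np

def perm_matrix(perm):
    """Permutation (perm[i] = position of element i, 0-indexed) as a 0/1 matrix."""
    n = len(perm)
    P = np.zeros((n, n))
    P[np.arange(n), perm] = 1.0
    return P

n = 4
M = 1.0 - np.eye(n)                 # motif: loss 1 whenever element i is NOT mapped to position i
Pi = perm_matrix([1, 3, 2, 0])      # nature's correct permutation (2,4,3,1), written 0-indexed
L = Pi @ M                          # loss matrix: row-permuted motif

Pi_hat = perm_matrix([1, 3, 0, 2])  # the algorithm's prediction
print(np.sum(Pi_hat * L))           # Π̂ • L = number of elements where the permutations disagree (2.0)

W = np.full((n, n), 1.0 / n)        # uniform doubly stochastic weight matrix
print(np.sum(W * L))                # expected loss W • L (= 3.0 for this motif)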

Figure 1 (loss motifs): each loss L(Π̂, Π) below has an associated motif matrix M (the small example matrices shown in the original figure are not reproduced here):

- the number of elements i where Π̂(i) ≠ Π(i)
- (1/(n-1)) Σ_{i=1}^n |Π(i) - Π̂(i)|, how far the elements are from their correct positions (the division by n-1 ensures that the entries of M are in [0,1])
- (1/(n-1)) Σ_{i=1}^n |Π(i) - Π̂(i)| / Π(i), a position-weighted version of the above emphasizing the early positions in Π
- the number of elements mapped to the first half by Π but the second half by Π̂, or vice versa
- the number of elements mapped to the first two positions by Π that fail to appear in the top three positions of Π̂
- the number of links traversed to find the first element of Π in a list ordered by Π̂

Choosing a permutation is done by Algorithm 1. The algorithm greedily decomposes W into a convex combination of at most n^2 - 2n + 2 permutations (see Theorem 7), and then randomly selects one of these permutations for the prediction.(2) Our decomposition algorithm uses a temporary matrix A initialized to the weight matrix W. Each iteration of Algorithm 1 finds a permutation Π where each A_{i,Π(i)} > 0. This can be done by finding a perfect matching on the n x n bipartite graph containing the edge (i,j) whenever A_{i,j} > 0. We shall soon see that each matrix A is a constant times a doubly stochastic matrix, so the existence of a suitable permutation Π follows from Birkhoff's Theorem. Given such a permutation Π, the algorithm updates A to A - αΠ where α = min_i A_{i,Π(i)}. The updated matrix A has non-negative entries and has strictly more zeros than the original A.

(2) The decomposition is usually not unique and the implementation may have a bias as to exactly which convex combination is chosen.

Algorithm 1  PermELearn: Selecting a permutation
  Require: a doubly stochastic n x n matrix W
  A := W; q := 0
  repeat
    q := q + 1
    Find permutation Π^q such that A_{i,Π^q(i)} is positive for each i ∈ {1,...,n}
    α_q := min_i A_{i,Π^q(i)}
    A := A - α_q Π^q
  until all entries of A are zero      {at end of loop W = Σ_{k=1}^q α_k Π^k}
  Randomly select and return a Π̂ ∈ {Π^1,...,Π^q} using probabilities α_1,...,α_q.

Algorithm 2  PermELearn: Weight Matrix Update
  Require: learning rate η, loss matrix L, and doubly stochastic weight matrix W
  Create W' where each W'_{i,j} = W_{i,j} e^{-η L_{i,j}}        (3)
  Create doubly stochastic W̃ by re-balancing the rows and columns of W' (Sinkhorn balancing) and update W to W̃.

Since the update decreases each row and column sum by α and the original matrix W was doubly stochastic, each matrix A will have rows and columns that sum to the same amount. In other words, each matrix A created during Algorithm 1 is a constant times a doubly stochastic matrix, and thus (by Birkhoff's Theorem) is a constant times a convex combination of permutations. After at most n^2 - n iterations the algorithm arrives at a matrix A having exactly n non-zero entries, so this A is a constant times a permutation matrix. Therefore, Algorithm 1 decomposes the original doubly stochastic matrix into the convex combination of (at most) n^2 - n + 1 permutation matrices. The more refined argument in Appendix A shows that Algorithm 1 never uses more than n^2 - 2n + 2 permutations, matching the bound given by Birkhoff's Theorem.

Several improvements are possible. In particular, we need not compute each perfect matching from scratch. If only z entries of A are zeroed by a permutation, then that permutation is still a matching of size n - z in the graph for the updated matrix. Thus we need to find only z augmenting paths to complete the perfect matching. The entire process thus requires finding O(n^2) augmenting paths at a cost of O(n^2) each, for a total cost of O(n^4) to decompose weight matrix W into a convex combination of permutations.

4.1 Updating the Weights

In the second step, Algorithm 2 updates the weight matrix by multiplying each W_{i,j} entry by the factor e^{-η L_{i,j}}. These factors destroy the row and column normalization, so the matrix must be re-balanced to restore the doubly stochastic property. There is no closed form for the normalization step. The standard iterative re-balancing method for non-negative matrices is called Sinkhorn balancing. This method first normalizes each row of the matrix to sum to one, and then normalizes the columns. Since normalizing the columns typically destroys the row normalization, the process must be iterated until convergence (Sinkhorn, 1964).
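The following Python sketch (ours, not the authors' implementation; it omits the incremental-matching speedup described above and uses a simple augmenting-path matching) mirrors Algorithm 1: repeatedly find a perfect matching on the positive entries of A and subtract off the largest feasible multiple of that permutation.

def find_matching(A, tol=1e-12):
    """Return perm with A[i][perm[i]] > tol for all i, or None if no perfect matching exists."""
    n = len(A)
    match_col = [-1] * n                    # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if A[i][j] > tol and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm

def birkhoff_decompose(W, tol=1e-12):
    """Greedily decompose a doubly stochastic matrix W (list of lists) into (alpha, perm) pieces."""
    A = [list(row) for row in W]            # work on a copy
    pieces = []
    while max(max(row) for row in A) > tol:
        perm = find_matching(A, tol)
        alpha = min(A[i][perm[i]] for i in range(len(A)))
        for i in range(len(A)):
            A[i][perm[i]] -= alpha          # strictly more zeros after each subtraction
        pieces.append((alpha, perm))
    return pieces

print(birkhoff_decompose([[0.5, 0.5], [0.5, 0.5]]))   # two permutation pieces, each with weight 0.5

Sampling a prediction Π̂ then amounts to choosing one of the returned permutations with probability equal to its coefficient.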

    ( 1/2  1/2 )                            ( sqrt(2)/(1+sqrt(2))   1/(1+sqrt(2))       )
    ( 1/4  1/2 )   --Sinkhorn balancing-->  ( 1/(1+sqrt(2))         sqrt(2)/(1+sqrt(2)) )

Figure 2: Example where Sinkhorn balancing requires infinitely many steps.

Normalizing the rows corresponds to pre-multiplying by a diagonal matrix. The product of these diagonal matrices thus represents the combined effect of the multiple row normalization steps. Similarly, the combined effect of the column normalization steps can be represented by post-multiplying the matrix by a diagonal matrix. Therefore we get the well known fact that Sinkhorn balancing a matrix A results in a doubly stochastic matrix RAC where R and C are diagonal matrices. Each entry R_{i,i} is the positive multiplier applied to row i, and each entry C_{j,j} is the positive multiplier of column j needed to convert A into a doubly stochastic matrix. In Figure 2 we give a rational matrix that balances to an irrational matrix. Since each row and column balancing step creates rationals, Sinkhorn balancing produces irrationals only in the limit (after infinitely many steps).

Multiplying a weight matrix from the left and/or right by non-negative diagonal matrices (e.g., row or column normalization) preserves the ratio of product weights between permutations. That is, if A' = RAC, then for any two permutations Π_1 and Π_2,

    ∏_i A'_{i,Π_1(i)} / ∏_i A'_{i,Π_2(i)} = (∏_i A_{i,Π_1(i)} R_{i,i} C_{Π_1(i),Π_1(i)}) / (∏_i A_{i,Π_2(i)} R_{i,i} C_{Π_2(i),Π_2(i)}) = ∏_i A_{i,Π_1(i)} / ∏_i A_{i,Π_2(i)}.

Therefore the matrix of Figure 2 must balance to a doubly stochastic matrix

    ( a    1-a )
    ( 1-a  a   )

such that the ratio of the product weight between the two permutations (1,2) and (2,1) is preserved. This means (1/2 · 1/2)/(1/2 · 1/4) = a^2/(1-a)^2 and thus a = sqrt(2)/(1+sqrt(2)) ≈ 0.586.

This example leads to another important observation: PermELearn's predictions are different than Hedge's when each permutation is treated as an expert. If each permutation is explicitly represented as an expert, then the Hedge algorithm predicts permutation Π with probability proportional to the product weight ∏_i e^{-η Σ_t L^t_{i,Π(i)}}. However, algorithm PermELearn predicts differently. With the weight matrix in Figure 2, Hedge puts probability 2/3 on permutation (1,2) and probability 1/3 on permutation (2,1) while PermELearn puts probability sqrt(2)/(1+sqrt(2)) ≈ 0.586 on permutation (1,2) and probability 1/(1+sqrt(2)) ≈ 0.414 on permutation (2,1).

There has been much written on the balancing of matrices, and we briefly describe only a few of the results here. Sinkhorn showed that this procedure converges and that the RAC balancing of any matrix A into a doubly stochastic matrix is unique (up to canceling multiples of R and C) if it exists(3) (Sinkhorn, 1964).

A number of authors consider balancing a matrix A so that the row and column sums are 1 ± ε. Franklin and Lorenz (1989) show that O(length(A)/ε) Sinkhorn iterations suffice, where length(A) is the bit-length of matrix A's binary representation. Kalantari and Khachiyan (1996) show that O(n^4 ln(n/ε) ln(1/min_{i,j} A_{i,j})) operations suffice using an interior point method.

(3) Some non-negative matrices cannot be converted into doubly stochastic matrices because of their pattern of zeros. The weight matrices we deal with have strictly positive entries, and thus can always be made doubly stochastic with an RAC balancing.
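For completeness, a minimal Sinkhorn balancing sketch of our own (numpy-based; the stopping tolerance eps and iteration cap are our choices). Run on the 2 x 2 matrix discussed above, the iterates approach the balanced matrix with a ≈ 0.586.

import numpy as np

def sinkhorn_balance(A, eps=1e-9, max_iters=100000):
    """Alternately normalize rows and columns until every row/column sum is within eps of 1."""
    A = np.array(A, dtype=float)
    for _ in range(max_iters):
        A = A / A.sum(axis=1, keepdims=True)   # row normalization (pre-multiply by a diagonal matrix)
        A = A / A.sum(axis=0, keepdims=True)   # column normalization (post-multiply by a diagonal matrix)
        if (np.abs(A.sum(axis=1) - 1) < eps).all() and (np.abs(A.sum(axis=0) - 1) < eps).all():
            return A
    return A

print(sinkhorn_balance([[0.5, 0.5], [0.25, 0.5]]))
# approaches [[0.5858, 0.4142], [0.4142, 0.5858]], i.e. a = sqrt(2)/(1+sqrt(2))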

Linial et al. (2000) give a preprocessing step after which only O((n/ε)^2) Sinkhorn iterations suffice. They also present a strongly polynomial time iterative procedure requiring Õ(n^7 log(1/ε)) iterations. Balakrishnan et al. (2004) give an interior point method with complexity O(n^6 log(n/ε)). Finally, Fürer (2004) shows that if the row and column sums of A are 1 ± ε then every matrix entry changes by at most ±nε when A is balanced to a doubly stochastic matrix.

4.2 Dealing with Approximate Balancing

With slight modifications, Algorithm PermELearn can handle the situation where its weight matrix is imperfectly balanced (and thus not quite doubly stochastic). As before, let W be the fully balanced doubly stochastic weight matrix, but we now assume that only an approximately balanced Ŵ is available to predict from. In particular, we assume that each row and column of Ŵ sums to 1 ± ε for some ε < 1/3. Let s ≥ 1 - ε be the smallest row or column sum in Ŵ.

We modify Algorithm 1 in two ways. First, A is initialized to (1/s)Ŵ rather than W. This ensures every row and column in the initial A sums to at least one, to at most 1 + 3ε, and at least one row or column sums to exactly 1. Second, the loop exits as soon as A has an all-zero row or column. Since the smallest row or column sum starts at 1, is decreased by α_k each iteration k, and ends at zero, we have that Σ_{k=1}^q α_k = 1 and the modified Algorithm 1 still outputs a convex combination of permutations C = Σ_{k=1}^q α_k Π^k. Furthermore, each entry C_{i,j} ≤ (1/s)Ŵ_{i,j}. We now bound the additional loss of this modified algorithm.

Lemma 1 If the weight matrix Ŵ is approximately balanced so each row and column sum is in 1 ± ε (for ε ≤ 1/3) then the modified Algorithm 1 has an expected loss C • L at most 3n^3 ε greater than the expected loss W • L of the original algorithm that uses the completely balanced doubly stochastic matrix W.

Proof Let s be the smallest row or column sum in Ŵ. Since each row and column sum of (1/s)Ŵ lies in [1, 1 + 3ε], each entry of (1/s)Ŵ is close to the corresponding entry of the fully balanced W. In particular each (1/s)Ŵ_{i,j} ≤ W_{i,j} + 3nε (Fürer, 2004). This allows us to bound the expected loss when predicting with the convex combination C in terms of the expected loss using a decomposition of the perfectly balanced W:

    C • L ≤ (1/s)Ŵ • L = Σ_{i,j} (Ŵ_{i,j}/s) L_{i,j} ≤ Σ_{i,j} (W_{i,j} + 3nε) L_{i,j} ≤ W • L + 3n^3 ε.

Therefore the extra loss incurred by using an ε-approximately balanced weight matrix at a particular trial is at most 3n^3 ε, as desired.
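A sketch of the Section 4.2 modification (ours; it reuses the find_matching helper from the decomposition sketch above): scale the approximately balanced Ŵ by 1/s and stop the greedy loop once some row or column is exhausted, so the coefficients sum to (roughly) one.

def decompose_approx(W_hat, tol=1e-12):
    """Greedy decomposition of an approximately balanced non-negative matrix (Section 4.2 variant)."""
    n = len(W_hat)
    row_sums = [sum(row) for row in W_hat]
    col_sums = [sum(W_hat[i][j] for i in range(n)) for j in range(n)]
    s = min(row_sums + col_sums)                    # smallest row or column sum
    A = [[w / s for w in row] for row in W_hat]     # now every row/column sums to at least 1
    pieces = []
    while min(min(map(sum, A)), min(sum(A[i][j] for i in range(n)) for j in range(n))) > tol:
        perm = find_matching(A, tol)                # perfect matching on the positive entries
        if perm is None:                            # defensive exit for numerical edge cases
            break
        alpha = min(A[i][perm[i]] for i in range(n))
        for i in range(n):
            A[i][perm[i]] -= alpha
        pieces.append((alpha, perm))
    return pieces                                   # the alphas sum to (approximately) one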

If in a sequence of T trials the matrices Ŵ are ε = 1/(3T n^3) balanced (so that each row and column sum is 1 ± 1/(3T n^3)) then Lemma 1 implies that the total additional expected loss for using approximate balancing is at most 1. The algorithm of Balakrishnan et al. (2004) ε-balances a matrix in O(n^6 log(n/ε)) time (note that this dominates the time for the loss update and constructing the convex combination). This balancing algorithm with ε = 1/(3T n^3) together with the modified prediction algorithm gives a method requiring O(T n^6 log(T n)) total time over the T trials and having a bound of sqrt(2 L_est n ln n) + n ln n + 1 on the worst-case regret.

If the number of trials T is not known in advance then setting ε as a function of t can be helpful. A natural choice is ε_t = 1/(3 t^2 n^3). In this case the total extra regret for not having perfect balancing is bounded by Σ_{t=1}^T 1/t^2 ≤ 5/3 and the total computation time over the T trials is still bounded by O(T n^6 log(T n)).

One might be concerned about the effects of approximate balancing propagating between trials. However this is not an issue. In the following section we show that the loss updates and balancing can be arbitrarily interleaved. Therefore the modified algorithm can either keep a cumulative loss matrix L^{≤t} = Σ_{t'=1}^t L^{t'} and create its next Ŵ by (approximately) balancing the matrix with entries (1/n) e^{-η L^{≤t}_{i,j}}, or apply the multiplicative updates to the previous approximately balanced Ŵ.

5. Bounds for PermELearn

Our analysis of PermELearn follows the entropy-based analysis of the exponentiated gradient family of algorithms (Kivinen and Warmuth, 1997). This style of analysis first shows a per-trial progress bound using relative entropy to a comparator as a measure of progress, and then sums this invariant over the trials to bound the expected total loss of the algorithm. We also show that PermELearn's weight update belongs to the exponentiated gradient family of updates (Kivinen and Warmuth, 1997) since it is the solution to a minimization problem that trades off the loss (in this case a linear loss) against a relative entropy regularization.

Recall that the expected loss of PermELearn on a trial is a linear function of its weight matrix W. Therefore the gradient of the loss is independent of the current value of W. This property of the loss greatly simplifies the analysis. Our analysis for this setting provides a good foundation for learning permutation matrices and lays the groundwork for the future study of other permutation loss functions.

We start our analysis with an attempt to mimic the standard analysis (Kivinen and Warmuth, 1997) for the exponentiated gradient family updates which multiply by exponential factors and renormalize. The per-trial invariant used to analyze the exponentiated gradient family bounds the decrease in relative entropy from any (normalized) vector u to the algorithm's weight vector by a linear combination of the algorithm's loss and the loss of u on the trial. In our case the weight vectors are matrices and we use the following (un-normalized) relative entropy between matrices A and B with non-negative entries:

    Δ(A,B) = Σ_{i,j} ( A_{i,j} ln(A_{i,j}/B_{i,j}) + B_{i,j} - A_{i,j} ).

Note that this is just the sum of the relative entropies between the corresponding rows (or equivalently, between the corresponding columns):

    Δ(A,B) = Σ_i Δ(A_{i,·}, B_{i,·}) = Σ_j Δ(A_{·,j}, B_{·,j})

(here A_{i,·} is the i-th row of A and A_{·,j} is its j-th column).

Unfortunately, the lack of a closed form for the matrix balancing procedure makes it difficult to prove bounds on the loss of the algorithm. Our solution is to break PermELearn's update (Algorithm 2) into two steps, and use only the progress made to the intermediate un-balanced matrix in our per-trial bound (8). After showing that balancing to a doubly stochastic matrix only increases the progress, we can sum the per-trial bound to obtain our main theorem.

5.1 A Dead End

In each trial, PermELearn multiplies each entry of its weight matrix by an exponential factor and then uses one additional factor per row and column to make the matrix doubly stochastic (Algorithm 2 described in Section 4.1):

    W̃_{i,j} := r_i c_j W_{i,j} e^{-η L_{i,j}},        (4)

where the r_i and c_j factors are chosen so that all rows and columns of the matrix W̃ sum to one. We now show that PermELearn's update (4) gives the matrix A solving the following minimization problem:

    argmin_{A : ∀i Σ_j A_{i,j} = 1, ∀j Σ_i A_{i,j} = 1} ( Δ(A,W) + η (A • L) ).        (5)

Since the linear constraints are feasible and the divergence is strictly convex, there always is a unique solution, even though the solution does not have a closed form.

Lemma 2 PermELearn's updated weight matrix W̃ (4) is the solution of (5).

Proof We form a Lagrangian for the optimization problem:

    l(A, ρ, γ) = Δ(A,W) + η (A • L) + Σ_i ρ_i (Σ_j A_{i,j} - 1) + Σ_j γ_j (Σ_i A_{i,j} - 1).

Setting the derivative with respect to A_{i,j} to 0 yields A_{i,j} = W_{i,j} e^{-η L_{i,j}} e^{-ρ_i} e^{-γ_j}. By enforcing the row and column sum constraints we see that the factors r_i = e^{-ρ_i} and c_j = e^{-γ_j} function as row and column normalizers, respectively.

We now examine the progress Δ(U,W) - Δ(U,W̃) towards an arbitrary doubly stochastic matrix U. Using Equation (4) and noting that all three matrices are doubly stochastic (so their entries sum to n), we see that

    Δ(U,W) - Δ(U,W̃) = -η U • L + Σ_i ln r_i + Σ_j ln c_j.

Making this a useful invariant requires lower bounding the sums on the r.h.s. by a constant times W • L, the loss of the algorithm. Unfortunately we are stuck because the r_i and c_j normalization factors don't even have a closed form.

5.2 Successful Analysis

Our successful analysis splits the update (4) into two steps:

    W'_{i,j} := W_{i,j} e^{-η L_{i,j}}   and   W̃_{i,j} := r_i c_j W'_{i,j},        (6)

where (as before) r_i and c_j are chosen so that each row and column of the matrix W̃ sums to one. Using the Lagrangian (as in the proof of Lemma 2), it is easy to see that these W' and W̃ matrices solve the following minimization problems:

    W' = argmin_A ( Δ(A,W) + η (A • L) )   and   W̃ := argmin_{A : ∀i Σ_j A_{i,j} = 1, ∀j Σ_i A_{i,j} = 1} Δ(A,W').        (7)

The second problem shows that the doubly stochastic matrix W̃ is the projection of W' onto the linear row and column sum constraints. The strict convexity of the relative entropy between non-negative matrices and the feasibility of the linear constraints ensure that the solutions for both steps are unique.

We now lower bound the progress Δ(U,W) - Δ(U,W') in the following lemma to get our per-trial invariant.

Lemma 3 For any η > 0, any doubly stochastic matrices U and W and any trial with loss matrix L ∈ [0,1]^{n×n},

    Δ(U,W) - Δ(U,W') ≥ (1 - e^{-η})(W • L) - η (U • L),

where W' is the unbalanced intermediate matrix (6) constructed by PermELearn from W.

Proof The proof manipulates the difference of relative entropies and uses the inequality e^{-ηx} ≤ 1 - (1 - e^{-η})x, which holds for any η and any x ∈ [0,1]:

    Δ(U,W) - Δ(U,W') = Σ_{i,j} ( U_{i,j} ln(W'_{i,j}/W_{i,j}) + W_{i,j} - W'_{i,j} )
                     = Σ_{i,j} ( U_{i,j} ln(e^{-η L_{i,j}}) + W_{i,j} - W_{i,j} e^{-η L_{i,j}} )
                     ≥ Σ_{i,j} ( -η L_{i,j} U_{i,j} + W_{i,j} - W_{i,j} (1 - (1 - e^{-η}) L_{i,j}) )
                     = -η (U • L) + (1 - e^{-η})(W • L).

Relative entropy is a Bregman divergence, so the Generalized Pythagorean Theorem (Bregman, 1967) applies. Specialized to our setting, this theorem states that if S is a closed convex set containing some matrix U with non-negative entries, W' is any matrix with strictly positive entries, and W̃ is the relative entropy projection of W' onto S, then

    Δ(U,W') ≥ Δ(U,W̃) + Δ(W̃,W').

Furthermore, this holds with equality when S is affine, which is the case here since S is the set of matrices whose rows and columns each sum to 1. Rearranging and noting that Δ(A,B) is non-negative yields Corollary 3 of Herbster and Warmuth (2001), which is the inequality we need:

    Δ(U,W') - Δ(U,W̃) = Δ(W̃,W') ≥ 0.

Combining this with the inequality of Lemma 3 gives the critical per-trial invariant:

    Δ(U,W) - Δ(U,W̃) ≥ (1 - e^{-η})(W • L) - η (U • L).        (8)

We now introduce some notation and bound the expected total loss by summing the above inequality over a sequence of trials. When considering a sequence of trials, L^t is the loss matrix at trial t, W^{t-1} is PermELearn's weight matrix W at the start of trial t (so W^0 is the initial weight matrix) and W^t is the updated weight matrix W̃ at the end of the trial.

Theorem 4 For any learning rate η > 0, any doubly stochastic matrices U and initial W^0, and any sequence of T trials with loss matrices L^t ∈ [0,1]^{n×n} (for 1 ≤ t ≤ T), the expected loss of PermELearn is bounded by:

    Σ_{t=1}^T W^{t-1} • L^t ≤ ( Δ(U,W^0) - Δ(U,W^T) + η Σ_{t=1}^T U • L^t ) / (1 - e^{-η}).

Proof Applying (8) to trial t gives:

    Δ(U,W^{t-1}) - Δ(U,W^t) ≥ (1 - e^{-η})(W^{t-1} • L^t) - η (U • L^t).

By summing the above over all T trials we get:

    Δ(U,W^0) - Δ(U,W^T) ≥ (1 - e^{-η}) Σ_{t=1}^T W^{t-1} • L^t - η Σ_{t=1}^T U • L^t.

The bound then follows by solving for the total expected loss, Σ_{t=1}^T W^{t-1} • L^t, of the algorithm.

When the entries of W^0 are all initialized to 1/n and U is a permutation then Δ(U,W^0) = n ln n. Since each doubly stochastic matrix U is a convex combination of permutation matrices, at least one minimizer of the total loss Σ_{t=1}^T U • L^t will be a permutation matrix. If L_best denotes the loss of such a permutation U, then Theorem 4 implies that the total loss of the algorithm is bounded by

    ( Δ(U,W^0) + η L_best ) / (1 - e^{-η}).

If upper bounds Δ(U,W^0) ≤ D_est ≤ n ln n and L_est ≥ L_best are known, then by choosing η = ln(1 + sqrt(2 D_est / L_est)), the above bound becomes (Freund and Schapire, 1997):

    L_best + sqrt(2 L_est D_est) + Δ(U,W^0).        (9)

A natural choice for D_est is n ln n. In this case the tuned bound becomes

    L_best + sqrt(2 L_est n ln n) + n ln n.
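As a numerical sanity check of our own (it assumes numpy and the sinkhorn_balance sketch from Section 4.1), the code below draws random doubly stochastic U and W, applies the two-step update (6), and verifies the per-trial invariant (8).

import numpy as np

def rel_entropy(A, B):
    """Un-normalized relative entropy Δ(A,B) between matrices with positive entries."""
    return float(np.sum(A * np.log(A / B) + B - A))

rng = np.random.default_rng(0)
n, eta = 5, 0.5
U = sinkhorn_balance(rng.random((n, n)))          # a doubly stochastic comparator
W = sinkhorn_balance(rng.random((n, n)))          # the current weight matrix
L = rng.random((n, n))                            # a loss matrix in [0,1]^{n x n}

W_prime = W * np.exp(-eta * L)                    # un-normalized multiplicative step of (6)
W_tilde = sinkhorn_balance(W_prime)               # projection back onto doubly stochastic matrices

lhs = rel_entropy(U, W) - rel_entropy(U, W_tilde)
rhs = (1 - np.exp(-eta)) * np.sum(W * L) - eta * np.sum(U * L)
print(lhs >= rhs - 1e-6)                          # invariant (8): progress ≥ (1-e^{-η}) W•L - η U•L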

5.3 Approximate Balancing

The preceding analysis assumes that PermELearn's weight matrix is perfectly balanced each iteration. However, balancing techniques are only capable of approximately balancing the weight matrix in finite time, so implementations of PermELearn must handle approximately balanced matrices. In Section 4.2, we describe an implementation that uses an approximately balanced Ŵ^{t-1} at the start of iteration t rather than the completely balanced W^{t-1} of the preceding analysis. Lemma 1 shows that when this implementation of PermELearn uses an approximately balanced Ŵ^{t-1} where each row and column sum is in 1 ± ε_t, then the expected loss on trial t is at most W^{t-1} • L^t + 3n^3 ε_t. Summing over all trials and using Theorem 4, this implementation's total loss is at most

    Σ_{t=1}^T ( W^{t-1} • L^t + 3n^3 ε_t ) ≤ ( Δ(U,W^0) - Δ(U,W^T) + η Σ_{t=1}^T U • L^t ) / (1 - e^{-η}) + Σ_{t=1}^T 3n^3 ε_t.

As discussed in Section 4.2, setting ε_t = 1/(3n^3 t^2) leads to an additional loss of less than 5/3 over the bound of Theorem 4 and its subsequent tunings while incurring a total running time (over all T trials) in O(T n^6 log(T n)). In fact, the additional loss for approximate balancing can be made less than any positive c by setting ε_t = c/(5 n^3 t^2). Since the time to approximately balance depends only logarithmically on 1/ε, the total time taken over T trials remains in O(T n^6 log(T n)).

5.4 Split Analysis for the Hedge Algorithm

Perhaps the simplest case where the loss is linear in the parameter vector is the on-line allocation setting of Freund and Schapire (1997). It is instructive to apply our method of splitting the update in this simpler setting. There are N experts and the algorithm keeps a probability distribution w over the experts. In each trial the algorithm picks expert i with probability w_i and then gets a loss vector l ∈ [0,1]^N. Each expert i incurs loss l_i and the algorithm's expected loss is w · l. Finally w is updated to w̃ for the next trial.

The Hedge algorithm (Freund and Schapire, 1997) updates its weight vector to

    w̃_i = w_i e^{-η l_i} / Σ_j w_j e^{-η l_j}.

This update can be motivated by a tradeoff between the un-normalized relative entropy to the old weight vector and expected loss in the last trial (Kivinen and Warmuth, 1999):

    w̃ := argmin_{Σ_i ŵ_i = 1} ( Δ(ŵ,w) + η ŵ · l ).

For vectors, the relative entropy is simply Δ(ŵ,w) := Σ_i ( ŵ_i ln(ŵ_i/w_i) + w_i - ŵ_i ). As in the permutation case, we can split this update (and motivation) into two steps: setting each w'_i = w_i e^{-η l_i} and then w̃ = w' / Σ_i w'_i. These are the solutions to:

    w' := argmin_ŵ ( Δ(ŵ,w) + η ŵ · l )   and   w̃ := argmin_{Σ_i ŵ_i = 1} Δ(ŵ,w').
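A minimal sketch (ours) of the split Hedge update just described: the un-normalized exponential step followed by the projection onto the probability simplex.

import numpy as np

def hedge_two_step(w, loss, eta):
    """One Hedge trial written as the split update: multiplicative step, then normalization."""
    w_prime = w * np.exp(-eta * loss)     # un-normalized exponential update
    return w_prime / w_prime.sum()        # relative-entropy projection onto the probability simplex

w = np.full(4, 0.25)                      # uniform prior over N = 4 experts
for loss in ([0.0, 1.0, 1.0, 0.5], [0.2, 0.9, 1.0, 0.4]):
    w = hedge_two_step(w, np.array(loss), eta=0.5)
print(w)                                  # weight concentrates on the low-loss experts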

The following lower bound has been shown on the progress towards any probability vector u serving as a comparator:(4)

    Δ(u,w) - Δ(u,w̃) = -η u · l - ln Σ_i w_i e^{-η l_i}
                     ≥ -η u · l - ln Σ_i w_i (1 - (1 - e^{-η}) l_i)
                     ≥ -η u · l + (w · l)(1 - e^{-η}),        (10)

where the first inequality uses e^{-ηx} ≤ 1 - (1 - e^{-η})x, for any x ∈ [0,1], and the second uses -ln(1-x) ≥ x, for x ∈ [0,1].

Surprisingly, the same inequality already holds for the un-normalized update:(5)

    Δ(u,w) - Δ(u,w') = -η u · l + Σ_i w_i (1 - e^{-η l_i}) ≥ (w · l)(1 - e^{-η}) - η u · l.

Since the normalization is a projection w.r.t. a Bregman divergence onto a linear constraint satisfied by the comparator u, Δ(u,w') - Δ(u,w̃) ≥ 0 by the Generalized Pythagorean Theorem (Herbster and Warmuth, 2001). The total progress for both steps is again Inequality (10).

With the key Inequality (10) in hand, it is easy to introduce trial dependent notation and sum over trials (as done in the proof of Theorem 4), arriving at the familiar bound for Hedge (Freund and Schapire, 1997): For any η > 0, any probability vectors w^0 and u, and any loss vectors l^t ∈ [0,1]^N,

    Σ_{t=1}^T w^{t-1} · l^t ≤ ( Δ(u,w^0) - Δ(u,w^T) + η Σ_{t=1}^T u · l^t ) / (1 - e^{-η}).        (11)

Note that the r.h.s. is actually constant in the comparator u (Kivinen and Warmuth, 1999), that is, for all u,

    ( Δ(u,w^0) - Δ(u,w^T) + η Σ_{t=1}^T u · l^t ) / (1 - e^{-η}) = ( -ln Σ_i w^0_i e^{-η l^{≤T}_i} ) / (1 - e^{-η}),

where l^{≤T}_i = Σ_{t=1}^T l^t_i is the total loss of expert i. The r.h.s. of the above equality is often used as a potential in proving bounds for expert algorithms. We discuss this further in Appendix B.

5.5 When to Normalize?

Probably the most surprising aspect about the proof methodology is the flexibility about how and when to project onto the constraints. Instead of projecting a nonnegative matrix onto all 2n constraints at once (as in optimization problem (7)), we could mimic the Sinkhorn balancing algorithm by first projecting onto the row constraints and then the column constraints and alternating until convergence. The Generalized Pythagorean Theorem shows that projecting onto any convex constraint that is satisfied by the comparator class of doubly stochastic matrices brings the weight matrix closer to every doubly stochastic matrix.(6) Therefore our bound on Σ_t W^{t-1} • L^t (Theorem 4) holds if the exponential updates are interleaved with any sequence of projections to some subsets of the constraints.

(4) This is essentially Lemma 5.2 of Littlestone and Warmuth (1994). The reformulation of this type of inequality with relative entropies goes back to Kivinen and Warmuth (1999).
(5) Note that if the algorithm does not normalize the weights then w is no longer a distribution. When Σ_i w_i < 1, the loss w · l amounts to incurring 0 loss with probability 1 - Σ_i w_i, and predicting as expert i with probability w_i.
(6) There is a large body of work on finding a solution subject to constraints via iterated Bregman projections (see, e.g., Censor and Lent, 1981).

However, if the normalization constraints are not enforced then W is no longer a convex combination of permutations. Furthermore, the exponential update factors only decrease the entries of W and without any normalization all of the entries of W can get arbitrarily small. If this is allowed to happen then the loss W • L can approach 0 for any loss matrix, violating the spirit of the prediction model.

There is a direct argument that shows that the same final doubly stochastic matrix is reached if we interleave the exponential updates with projections to any of the constraints as long as all 2n constraints hold at the end. To see this we partition the class of matrices with positive entries into equivalence classes. Call two such matrices A and B equivalent if there are diagonal matrices R and C with positive diagonal entries such that B = RAC. Note that [RAC]_{i,j} = R_{i,i} A_{i,j} C_{j,j} and therefore B is just a rescaled version of A. Projecting onto any row and/or column sum constraints amounts to pre- and/or post-multiplying the matrix by some positive diagonal matrices R and C. Therefore if matrices A and B are equivalent then the projection of A (or B) onto a set of row and/or column sum constraints results in another matrix equivalent to both A and B. The importance of equivalent matrices is that they balance to the same doubly stochastic matrix.

Lemma 5 For any two equivalent matrices A and RAC, where the entries of A and the diagonal entries of R and C are positive,

    argmin_{Â : ∀i Σ_j Â_{i,j} = 1, ∀j Σ_i Â_{i,j} = 1} Δ(Â,A) = argmin_{Â : ∀i Σ_j Â_{i,j} = 1, ∀j Σ_i Â_{i,j} = 1} Δ(Â,RAC).

Proof The strict convexity of the relative entropy implies that both problems have a unique matrix as their solution. We will now reason that the unique solutions for both problems are the same. By using a Lagrangian (as in the proof of Lemma 2) we see that the solution of the left optimization problem is a square matrix with r'_i A_{i,j} c'_j in position (i,j). Similarly the solution of the problem on the right has r''_i R_{i,i} A_{i,j} C_{j,j} c''_j in position (i,j). Here the factors r'_i, r''_i function as row normalizers and c'_j, c''_j as column normalizers. Given a solution r'_i, c'_j to the left problem, then r'_i/R_{i,i}, c'_j/C_{j,j} is a solution of the right problem of the same value. Also if r''_i, c''_j is a solution of the right problem, then r''_i R_{i,i}, c''_j C_{j,j} is a solution to the left problem of the same value. This shows that both minimization problems have the same value and the matrix solutions for both problems are the same and unique (even though the normalization factors r'_i, c'_j of, say, the left problem are not necessarily unique). Note that it is crucial for the above argument that the diagonal entries of R and C are positive.

The analogous phenomenon is much simpler in the weighted majority case: two non-negative vectors a and b are equivalent if a = cb, where c is any nonnegative scalar, and again each equivalence class has exactly one normalized weight vector.

PermELearn's intermediate matrix W'_{i,j} := W_{i,j} e^{-η L_{i,j}} can be written W ⊙ M where ⊙ denotes the Hadamard (entry-wise) product and M_{i,j} = e^{-η L_{i,j}}. Note that the Hadamard product commutes with matrix multiplication by diagonal matrices: if C is diagonal and P = (A ⊙ B)C then P_{i,j} = (A_{i,j} B_{i,j}) C_{j,j} = (A_{i,j} C_{j,j}) B_{i,j}, so we also have P = (AC) ⊙ B. Similarly, R(A ⊙ B) = (RA) ⊙ B when R is diagonal.

Hadamard products also preserve equivalence: for equivalent matrices A and B = RAC (for diagonal R and C) the matrices A ⊙ M and B ⊙ M are equivalent (although they are not likely to be equivalent to A and B) since B ⊙ M = (RAC) ⊙ M = R(A ⊙ M)C. This means that any two runs of PermELearn-like algorithms that have the same bag of loss matrices and equivalent initial matrices end with equivalent final matrices even if they project onto different subsets of the constraints at the end of the various trials.

In summary, the proof method discussed so far uses a relative entropy as a measure of progress and relies on Bregman projections as its fundamental tool. In Appendix B we re-derive the bound for PermELearn using the value of the optimization problem (5) as a potential. This value is expressed using the dual optimization problem and intuitively the application of the Generalized Pythagorean Theorem now is replaced by plugging in a non-optimal choice for the dual variables. Both proof techniques are useful.

5.6 Learning Mappings

We have an algorithm that has small regret against the best permutation. Permutations are a subset of all mappings from {1,...,n} to {1,...,n}. We continue using Π for a permutation and introduce Ψ to denote an arbitrary mapping from {1,...,n} to {1,...,n}. Mappings differ from permutations in that the n dimensional vector (Ψ(i))_{i=1}^n can have repeats, that is, Ψ(i) might equal Ψ(j) for i ≠ j. Again we alternately represent a mapping Ψ as an n x n matrix where Ψ_{i,j} = 1 if Ψ(i) = j and 0 otherwise. Note that such square(7) mapping matrices have the special property that they have exactly one 1 in each row. Again the loss is specified by a loss matrix L and the loss of mapping Ψ is Ψ • L.

It is straightforward to design an algorithm MapELearn for learning mappings with exponential weights: simply run n independent copies of the Hedge algorithm, one for each of the n rows of the received loss matrices. That is, the r-th copy of Hedge always receives the r-th row of the loss matrix L as its loss vector. Even though learning mappings is easy, it is nevertheless instructive to discuss the differences with PermELearn.

Note that MapELearn's combined weight matrix is now a convex combination of mappings, that is, a singly stochastic matrix with the constraint that each row sums to one. Again, after the exponential update (3), the constraints are typically not satisfied any more, but they can be easily re-established by simply normalizing each row. The row normalization only needs to be done once in each trial: no iterative process is needed. Furthermore, no fancy decomposition algorithm is needed in MapELearn: for (singly) stochastic weight matrix W, the prediction Ψ̂(i) is simply a random element chosen from the row distribution W_{i,·}. This sampling procedure produces a mapping Ψ̂ such that W = E(Ψ̂) and thus E(Ψ̂ • L) = W • L as needed.

We can use the same relative entropy between the singly stochastic matrices, and the lower bound on the progress for the exponential update given in Lemma 3 still holds. Also our main bound (Theorem 4) is still true for MapELearn and we arrive at the same tuned bound for the total loss of MapELearn:

    L_best + sqrt(2 L_est D_est) + Δ(U,W^0),

where L_best, L_est, and D_est are now the total loss of the best mapping, a known upper bound on L_best, and an upper bound on Δ(U,W^0), respectively. Recall that L_est and D_est are needed to tune the η parameter.

(7) In the case of mappings the restriction to square matrices is not essential.
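A small sketch of the MapELearn scheme described above (ours; numpy-based): the weight matrix is row-normalized after each multiplicative update, and a mapping is sampled by drawing each Ψ̂(i) independently from row i, so that E[Ψ̂] = W.

import numpy as np

def mapelearn_update(W, L, eta):
    """Multiplicative update followed by row normalization (each row is one Hedge copy)."""
    W = W * np.exp(-eta * L)
    return W / W.sum(axis=1, keepdims=True)

def mapelearn_predict(W, rng):
    """Sample a mapping Ψ̂ with Ψ̂(i) drawn from the row distribution W_{i,·}."""
    n = W.shape[0]
    return np.array([rng.choice(n, p=W[i]) for i in range(n)])

rng = np.random.default_rng(1)
n = 3
W = np.full((n, n), 1.0 / n)                     # uniform singly stochastic start
W = mapelearn_update(W, rng.random((n, n)), eta=0.5)
print(mapelearn_predict(W, rng))                 # a mapping: repeats are allowed, unlike a permutation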

Our algorithm PermELearn for permutations may be seen as the above algorithm for mappings while enforcing the column sum constraints in addition to the row constraints used in MapELearn. Since PermELearn's row balancing messes up the column sums and vice versa, an iterative procedure (i.e., Sinkhorn balancing) is needed to create a matrix in which each row and column sums to one. The enforcement of the additional column sum constraints results in a doubly stochastic matrix, an apparently necessary step to produce predictions that are permutations (and an expected prediction equal to the doubly stochastic weight matrix). When it is known that the comparator is a permutation, then the algorithm always benefits from enforcing the additional column constraints. In general we should always make use of any constraints that the comparator is known to satisfy (see, e.g., Warmuth and Vishwanathan, 2005, for a discussion of this).

As discussed in Section 4.1, if Ã is a Sinkhorn-balanced version of a non-negative matrix A, then for any permutations Π_1 and Π_2,

    ∏_i Ã_{i,Π_1(i)} / ∏_i Ã_{i,Π_2(i)} = ∏_i A_{i,Π_1(i)} / ∏_i A_{i,Π_2(i)}.        (12)

An analogous invariant holds for mappings: if Ã is a row-balanced version of a non-negative matrix A, then for any mappings Ψ_1 and Ψ_2,

    ∏_i Ã_{i,Ψ_1(i)} / ∏_i Ã_{i,Ψ_2(i)} = ∏_i A_{i,Ψ_1(i)} / ∏_i A_{i,Ψ_2(i)}.

However it is important to note that column balancing does not preserve the above invariant for mappings. In fact, permutations are the subclass of mappings where invariant (12) holds.

There is another important difference between PermELearn and MapELearn. For MapELearn, the probability of predicting mapping Ψ with weight matrix W is always the product ∏_i W_{i,Ψ(i)}. The analogous property does not hold for PermELearn. Consider the balanced 2 x 2 weight matrix W on the right of Figure 2. This matrix decomposes into sqrt(2)/(1+sqrt(2)) times the permutation (1,2) plus 1/(1+sqrt(2)) times the permutation (2,1). Thus the probability of predicting with permutation (1,2) is sqrt(2) times the probability of permutation (2,1) for the PermELearn algorithm. However, when the probabilities are proportional to the intuitive product form ∏_i W_{i,Π(i)}, then the probability ratio for these two permutations is 2. Notice that this intuitive product weight measure is the distribution used by the Hedge algorithm that explicitly treats each permutation as a separate expert. Therefore PermELearn is clearly different from a concise implementation of Hedge for permutations.

6. Follow the Perturbed Leader Algorithm

Perhaps the simplest on-line algorithm is the Follow the Leader (FL) algorithm: at each trial predict with one of the best models on the data seen so far. Thus FL predicts at trial t with an expert in argmin_i l^{<t}_i or any permutation in argmin_Π Π • L^{<t}, where the superscript <t indicates that we sum over the past trials, that is, l^{<t} := Σ_{q=1}^{t-1} l^q. The FL algorithm is clearly non-optimal; in the expert setting there is a simple adversary strategy that forces FL to have loss at least n times larger than the loss of the best expert in hindsight.

The expected total loss of tuned Hedge is one times the loss of the best expert plus lower order terms. Hedge achieves this by randomly choosing experts. The probability w^{t-1}_i for choosing expert i at trial t is proportional to e^{-η l^{<t}_i}. As the learning rate η → ∞, Hedge becomes FL (when there are no ties) and the same holds for PermELearn.

Thus the exponential weights with moderate η may be seen as a soft min calculation: the algorithm hedges its bets and does not put all its probability on the expert with minimum loss so far.

The Follow the Perturbed Leader (FPL) algorithm of Kalai and Vempala (2005) is an alternate on-line prediction algorithm that works in a very general setting. It adds random perturbations to the total losses of the experts incurred so far and then predicts with the expert of minimum perturbed loss. Their FPL algorithm has bounds closely related to Hedge and other multiplicative weight algorithms and in some cases Hedge can be simulated exactly (Kuzmin and Warmuth, 2005) by judiciously choosing the distribution of perturbations. However, for the permutation problem the bounds we were able to obtain for FPL are weaker than the bounds we obtained for PermELearn that uses exponential weights, despite the apparent similarity between our representations and the general formulation of FPL.

The FPL setting uses an abstract k-dimensional decision space used to encode predictors as well as a k-dimensional state space used to represent the losses of the predictors. At any trial, the current loss of a particular predictor is the dot product between that predictor's representation in the decision space and the state-space vector for the trial. This general setting can explicitly represent each permutation and its loss when k = n!. The FPL setting also easily handles the encodings of permutations and losses used by PermELearn by representing each permutation matrix Π and loss matrix L as n^2-dimensional vectors.

The FPL algorithm (Kalai and Vempala, 2005) takes a parameter ε and maintains a cumulative loss matrix C (initially C is the zero matrix). At each trial, FPL:

1. Generates a random perturbation matrix P where each P_{i,j} is proportional to ±r_{i,j} where r_{i,j} is drawn from the standard exponential distribution.

2. Predicts with a permutation Π minimizing Π • (C + P).

3. After getting the loss matrix L, updates C to C + L.

Note that FPL is more computationally efficient than PermELearn. It takes only O(n^3) time to make its prediction (the time to compute a minimum weight bipartite matching) and only O(n^2) time to update C. Unfortunately the generic FPL loss bounds are not as good as the bounds on PermELearn. In particular, they show that the loss of FPL on any sequence of trials is at most(8)

    (1 + ε) L_best + (8 n^3 (1 + ln n)) / ε,

where ε is a parameter of the algorithm. When the loss of the best expert is known ahead of time, ε can be tuned and the bound becomes

    L_best + 4 sqrt(2 L_best n^3 (1 + ln n)) + 8 n^3 (1 + ln n).

Although FPL gets the same L_best leading term, the excess loss over the best permutation grows as n^3 ln n rather than the n ln n growth of PermELearn's bound. Of course, PermELearn pays for the improved bound by requiring more computation.

(8) The n^3 terms in the bounds for FPL are n times the sum of the entries in the loss matrix. So if the application has a loss motif whose entries sum to only n, then the n^3 factors become n^2.
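A sketch of an FPL trial for permutations (ours, not the authors' code; it assumes scipy's linear_sum_assignment and a two-sided exponential perturbation with scale 1/ε, one standard choice): perturb the cumulative loss matrix and predict with a minimum-cost assignment.

import numpy as np
from scipy.optimize import linear_sum_assignment

def fpl_predict(C, eps, rng):
    """Predict with the permutation minimizing Π • (C + P) for a random perturbation P."""
    n = C.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n, n))
    P = signs * rng.exponential(scale=1.0 / eps, size=(n, n))   # two-sided exponential perturbation
    _, perm = linear_sum_assignment(C + P)                      # minimum-weight bipartite matching
    return perm                                                 # perm[i] = position assigned to element i

rng = np.random.default_rng(0)
n, eps = 4, 0.1
C = np.zeros((n, n))                 # cumulative loss matrix
for _ in range(3):                   # a few trials with random losses
    perm = fpl_predict(C, eps, rng)
    L = rng.random((n, n))
    print(perm, float(L[np.arange(n), perm].sum()))             # prediction and its loss Π • L
    C += L                                                      # step 3: update the cumulative losses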


More information

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts Power-of-wo Polces for Sngle- Warehouse Mult-Retaler Inventory Systems wth Order Frequency Dscounts José A. Ventura Pennsylvana State Unversty (USA) Yale. Herer echnon Israel Insttute of echnology (Israel)

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Fisher Markets and Convex Programs

Fisher Markets and Convex Programs Fsher Markets and Convex Programs Nkhl R. Devanur 1 Introducton Convex programmng dualty s usually stated n ts most general form, wth convex objectve functons and convex constrants. (The book by Boyd and

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

where the coordinates are related to those in the old frame as follows.

where the coordinates are related to those in the old frame as follows. Chapter 2 - Cartesan Vectors and Tensors: Ther Algebra Defnton of a vector Examples of vectors Scalar multplcaton Addton of vectors coplanar vectors Unt vectors A bass of non-coplanar vectors Scalar product

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

We are now ready to answer the question: What are the possible cardinalities for finite fields?

We are now ready to answer the question: What are the possible cardinalities for finite fields? Chapter 3 Fnte felds We have seen, n the prevous chapters, some examples of fnte felds. For example, the resdue class rng Z/pZ (when p s a prme) forms a feld wth p elements whch may be dentfed wth the

More information

General Auction Mechanism for Search Advertising

General Auction Mechanism for Search Advertising General Aucton Mechansm for Search Advertsng Gagan Aggarwal S. Muthukrshnan Dávd Pál Martn Pál Keywords game theory, onlne auctons, stable matchngs ABSTRACT Internet search advertsng s often sold by an

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

Chapter 4 ECONOMIC DISPATCH AND UNIT COMMITMENT

Chapter 4 ECONOMIC DISPATCH AND UNIT COMMITMENT Chapter 4 ECOOMIC DISATCH AD UIT COMMITMET ITRODUCTIO A power system has several power plants. Each power plant has several generatng unts. At any pont of tme, the total load n the system s met by the

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

The Mathematical Derivation of Least Squares

The Mathematical Derivation of Least Squares Pscholog 885 Prof. Federco The Mathematcal Dervaton of Least Squares Back when the powers that e forced ou to learn matr algera and calculus, I et ou all asked ourself the age-old queston: When the hell

More information

n + d + q = 24 and.05n +.1d +.25q = 2 { n + d + q = 24 (3) n + 2d + 5q = 40 (2)

n + d + q = 24 and.05n +.1d +.25q = 2 { n + d + q = 24 (3) n + 2d + 5q = 40 (2) MATH 16T Exam 1 : Part I (In-Class) Solutons 1. (0 pts) A pggy bank contans 4 cons, all of whch are nckels (5 ), dmes (10 ) or quarters (5 ). The pggy bank also contans a con of each denomnaton. The total

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Loop Parallelization

Loop Parallelization - - Loop Parallelzaton C-52 Complaton steps: nested loops operatng on arrays, sequentell executon of teraton space DECLARE B[..,..+] FOR I :=.. FOR J :=.. I B[I,J] := B[I-,J]+B[I-,J-] ED FOR ED FOR analyze

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Generalizing the degree sequence problem

Generalizing the degree sequence problem Mddlebury College March 2009 Arzona State Unversty Dscrete Mathematcs Semnar The degree sequence problem Problem: Gven an nteger sequence d = (d 1,...,d n ) determne f there exsts a graph G wth d as ts

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

Joe Pimbley, unpublished, 2005. Yield Curve Calculations

Joe Pimbley, unpublished, 2005. Yield Curve Calculations Joe Pmbley, unpublshed, 005. Yeld Curve Calculatons Background: Everythng s dscount factors Yeld curve calculatons nclude valuaton of forward rate agreements (FRAs), swaps, nterest rate optons, and forward

More information

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE, FEBRUARY ISSN 77-866 Logcal Development Of Vogel s Approxmaton Method (LD- An Approach To Fnd Basc Feasble Soluton Of Transportaton

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

How To Solve A Problem In A Powerline (Powerline) With A Powerbook (Powerbook)

How To Solve A Problem In A Powerline (Powerline) With A Powerbook (Powerbook) MIT 8.996: Topc n TCS: Internet Research Problems Sprng 2002 Lecture 7 March 20, 2002 Lecturer: Bran Dean Global Load Balancng Scrbe: John Kogel, Ben Leong In today s lecture, we dscuss global load balancng

More information

Period and Deadline Selection for Schedulability in Real-Time Systems

Period and Deadline Selection for Schedulability in Real-Time Systems Perod and Deadlne Selecton for Schedulablty n Real-Tme Systems Thdapat Chantem, Xaofeng Wang, M.D. Lemmon, and X. Sharon Hu Department of Computer Scence and Engneerng, Department of Electrcal Engneerng

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Formulating & Solving Integer Problems Chapter 11 289

Formulating & Solving Integer Problems Chapter 11 289 Formulatng & Solvng Integer Problems Chapter 11 289 The Optonal Stop TSP If we drop the requrement that every stop must be vsted, we then get the optonal stop TSP. Ths mght correspond to a ob sequencng

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

NON-CONSTANT SUM RED-AND-BLACK GAMES WITH BET-DEPENDENT WIN PROBABILITY FUNCTION LAURA PONTIGGIA, University of the Sciences in Philadelphia

NON-CONSTANT SUM RED-AND-BLACK GAMES WITH BET-DEPENDENT WIN PROBABILITY FUNCTION LAURA PONTIGGIA, University of the Sciences in Philadelphia To appear n Journal o Appled Probablty June 2007 O-COSTAT SUM RED-AD-BLACK GAMES WITH BET-DEPEDET WI PROBABILITY FUCTIO LAURA POTIGGIA, Unversty o the Scences n Phladelpha Abstract In ths paper we nvestgate

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

The Geometry of Online Packing Linear Programs

The Geometry of Online Packing Linear Programs The Geometry of Onlne Packng Lnear Programs Marco Molnaro R. Rav Abstract We consder packng lnear programs wth m rows where all constrant coeffcents are n the unt nterval. In the onlne model, we know the

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Stochastic Bandits with Side Observations on Networks

Stochastic Bandits with Side Observations on Networks Stochastc Bandts wth Sde Observatons on Networks Swapna Buccapatnam, Atlla Erylmaz Department of ECE The Oho State Unversty Columbus, OH - 430 buccapat@eceosuedu, erylmaz@osuedu Ness B Shroff Departments

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

A Lyapunov Optimization Approach to Repeated Stochastic Games

A Lyapunov Optimization Approach to Repeated Stochastic Games PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT. 2013 1 A Lyapunov Optmzaton Approach to Repeated Stochastc Games Mchael J. Neely Unversty of Southern Calforna http://www-bcf.usc.edu/

More information

On Robust Network Planning

On Robust Network Planning On Robust Network Plannng Al Tzghadam School of Electrcal and Computer Engneerng Unversty of Toronto, Toronto, Canada Emal: al.tzghadam@utoronto.ca Alberto Leon-Garca School of Electrcal and Computer Engneerng

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

Learning the Best K-th Channel for QoS Provisioning in Cognitive Networks

Learning the Best K-th Channel for QoS Provisioning in Cognitive Networks 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Time Value of Money Module

Time Value of Money Module Tme Value of Money Module O BJECTIVES After readng ths Module, you wll be able to: Understand smple nterest and compound nterest. 2 Compute and use the future value of a sngle sum. 3 Compute and use the

More information

Online Advertisement, Optimization and Stochastic Networks

Online Advertisement, Optimization and Stochastic Networks Onlne Advertsement, Optmzaton and Stochastc Networks Bo (Rambo) Tan and R. Srkant Department of Electrcal and Computer Engneerng Unversty of Illnos at Urbana-Champagn Urbana, IL, USA 1 arxv:1009.0870v6

More information

7.5. Present Value of an Annuity. Investigate

7.5. Present Value of an Annuity. Investigate 7.5 Present Value of an Annuty Owen and Anna are approachng retrement and are puttng ther fnances n order. They have worked hard and nvested ther earnngs so that they now have a large amount of money on

More information

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers Foundatons and Trends R n Machne Learnng Vol. 3, No. 1 (2010) 1 122 c 2011 S. Boyd, N. Parkh, E. Chu, B. Peleato and J. Ecksten DOI: 10.1561/2200000016 Dstrbuted Optmzaton and Statstcal Learnng va the

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information

Section 5.4 Annuities, Present Value, and Amortization

Section 5.4 Annuities, Present Value, and Amortization Secton 5.4 Annutes, Present Value, and Amortzaton Present Value In Secton 5.2, we saw that the present value of A dollars at nterest rate per perod for n perods s the amount that must be deposted today

More information

Product-Form Stationary Distributions for Deficiency Zero Chemical Reaction Networks

Product-Form Stationary Distributions for Deficiency Zero Chemical Reaction Networks Bulletn of Mathematcal Bology (21 DOI 1.17/s11538-1-9517-4 ORIGINAL ARTICLE Product-Form Statonary Dstrbutons for Defcency Zero Chemcal Reacton Networks Davd F. Anderson, Gheorghe Cracun, Thomas G. Kurtz

More information

Enabling P2P One-view Multi-party Video Conferencing

Enabling P2P One-view Multi-party Video Conferencing Enablng P2P One-vew Mult-party Vdeo Conferencng Yongxang Zhao, Yong Lu, Changja Chen, and JanYn Zhang Abstract Mult-Party Vdeo Conferencng (MPVC) facltates realtme group nteracton between users. Whle P2P

More information

Compiling for Parallelism & Locality. Dependence Testing in General. Algorithms for Solving the Dependence Problem. Dependence Testing

Compiling for Parallelism & Locality. Dependence Testing in General. Algorithms for Solving the Dependence Problem. Dependence Testing Complng for Parallelsm & Localty Dependence Testng n General Assgnments Deadlne for proect 4 extended to Dec 1 Last tme Data dependences and loops Today Fnsh data dependence analyss for loops General code

More information

NMT EE 589 & UNM ME 482/582 ROBOT ENGINEERING. Dr. Stephen Bruder NMT EE 589 & UNM ME 482/582

NMT EE 589 & UNM ME 482/582 ROBOT ENGINEERING. Dr. Stephen Bruder NMT EE 589 & UNM ME 482/582 NMT EE 589 & UNM ME 482/582 ROBOT ENGINEERING Dr. Stephen Bruder NMT EE 589 & UNM ME 482/582 7. Root Dynamcs 7.2 Intro to Root Dynamcs We now look at the forces requred to cause moton of the root.e. dynamcs!!

More information

Research Article Enhanced Two-Step Method via Relaxed Order of α-satisfactory Degrees for Fuzzy Multiobjective Optimization

Research Article Enhanced Two-Step Method via Relaxed Order of α-satisfactory Degrees for Fuzzy Multiobjective Optimization Hndaw Publshng Corporaton Mathematcal Problems n Engneerng Artcle ID 867836 pages http://dxdoorg/055/204/867836 Research Artcle Enhanced Two-Step Method va Relaxed Order of α-satsfactory Degrees for Fuzzy

More information

Ants Can Schedule Software Projects

Ants Can Schedule Software Projects Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle FrstName.Name@ucv.cl 2 Unversdad Fns Terrae,

More information

Matrix Multiplication I

Matrix Multiplication I Matrx Multplcaton I Yuval Flmus February 2, 2012 These notes are based on a lecture gven at the Toronto Student Semnar on February 2, 2012. The materal s taen mostly from the boo Algebrac Complexty Theory

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

From Selective to Full Security: Semi-Generic Transformations in the Standard Model

From Selective to Full Security: Semi-Generic Transformations in the Standard Model An extended abstract of ths work appears n the proceedngs of PKC 2012 From Selectve to Full Securty: Sem-Generc Transformatons n the Standard Model Mchel Abdalla 1 Daro Fore 2 Vadm Lyubashevsky 1 1 Département

More information

Quantization Effects in Digital Filters

Quantization Effects in Digital Filters Quantzaton Effects n Dgtal Flters Dstrbuton of Truncaton Errors In two's complement representaton an exact number would have nfntely many bts (n general). When we lmt the number of bts to some fnte value

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

Chapter 7: Answers to Questions and Problems

Chapter 7: Answers to Questions and Problems 19. Based on the nformaton contaned n Table 7-3 of the text, the food and apparel ndustres are most compettve and therefore probably represent the best match for the expertse of these managers. Chapter

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

A Fast Incremental Spectral Clustering for Large Data Sets

A Fast Incremental Spectral Clustering for Large Data Sets 2011 12th Internatonal Conference on Parallel and Dstrbuted Computng, Applcatons and Technologes A Fast Incremental Spectral Clusterng for Large Data Sets Tengteng Kong 1,YeTan 1, Hong Shen 1,2 1 School

More information

21 Vectors: The Cross Product & Torque

21 Vectors: The Cross Product & Torque 21 Vectors: The Cross Product & Torque Do not use our left hand when applng ether the rght-hand rule for the cross product of two vectors dscussed n ths chapter or the rght-hand rule for somethng curl

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information