Parallel Algorithms for Big Data Optimization
Francisco Facchinei, Simone Sagratella, and Gesualdo Scutari, Senior Member, IEEE

Abstract: We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e., sequential) ones, as well as virtually all possibilities in between, with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results on LASSO and logistic regression problems show that the new method consistently outperforms existing algorithms.

Index Terms: Parallel optimization, Distributed methods, Jacobi method, LASSO, Sparse solution.

I. INTRODUCTION

The minimization of the sum of a smooth function, F, and of a nonsmooth (block separable) convex one, G,

    min_{x ∈ X} V(x) ≜ F(x) + G(x),    (1)

is a ubiquitous problem that arises in many fields of engineering, as diverse as compressed sensing, basis pursuit denoising, sensor networks, neuroelectromagnetic imaging, machine learning, data mining, sparse logistic regression, genomics, meteorology, tensor factorization and completion, geophysics, and radio astronomy. Usually the nonsmooth term is used to promote sparsity of the optimal solution, which often corresponds to a parsimonious representation of some phenomenon at hand. Many of the aforementioned applications can give rise to extremely large problems, so that standard optimization techniques are hardly applicable. Indeed, recent years have witnessed a flurry of research activity aimed at developing solution methods that are simple (for example, based solely on matrix/vector multiplications) yet capable of converging to a good approximate solution in reasonable time. It is hardly possible here to even summarize the huge amount of work done in this field; we refer the reader to the recent works [2]-[17] and books [18]-[20] as entry points to the literature.
However, with big data problems it is clearly necessary to design parallel methods able to exploit the computational power of multi-core processors in order to solve many interesting problems. It is then surprising that, while sequential solution methods for Problem (1) have been widely investigated, the analysis of parallel algorithms suitable to large-scale implementations lags behind. Gradient-type methods can of course be easily parallelized, but they are known to generally suffer from slow convergence; furthermore, by linearizing F they do not exploit any structure of F, a fact that instead has been shown to enhance convergence speed [21]. Beyond that, and looking at recent approaches, we are only aware of very few papers that deal with parallel solution methods [9]-[16]. These papers analyze both randomized and deterministic block Coordinate Descent Methods (CDMs); one advantage of the analyses therein is that they provide an interesting (global) rate of convergence. However, (i) they are essentially still (regularized) gradient-based methods; (ii) they are not flexible enough to include, among other things, very natural Jacobi-type methods (where at each iteration a minimization of the original function is performed in parallel with respect to all blocks of variables); and (iii) except for [10], [11], [13], they cannot deal with a nonconvex F.

Footnote: The order of the authors is alphabetical; all the authors contributed equally to the paper. F. Facchinei and S. Sagratella are with the Dept. of Computer, Control, and Management Engineering, Univ. of Rome La Sapienza, Rome, Italy. Emails: <facchinei,sagratella>@dis.uniroma1.it. G. Scutari is with the Dept. of Electrical Engineering, State Univ. of New York at Buffalo, Buffalo, USA. Email: [email protected]. His work was supported by the USA National Science Foundation under Grants CMS and CAREER Award No. Part of this work has been presented at the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, May 2014 [1].
We refer to Section V for a detailed discussion of current parallel and sequential solution methods for (1). In this paper, we propose a new, broad, deterministic algorithmic framework for the solution of Problem (1). The essential, rather natural idea underlying our approach is to decompose (1) into a sequence of (simpler) subproblems whereby the function F is replaced by suitable convex approximations; the subproblems can be solved in a parallel and distributed fashion. Key (new) features of the proposed algorithmic framework are: (i) it is parallel, with a degree of parallelism that can be chosen by the user, ranging from complete parallelism (every variable is updated in parallel with all the others) to fully sequential (only one variable is updated at each iteration), covering virtually all possibilities in between; (ii) it easily leads to distributed implementations; (iii) it can tackle a nonconvex F; (iv) it is very flexible and includes, among others, updates based on gradient- or Newton-type approximations; (v) it easily allows for inexact solution of the subproblems; (vi) it permits the update of only some (blocks of) variables at each iteration (a feature that turns out to be very important numerically); (vii) even in the case of the minimization of a smooth, convex function (i.e., F ∈ C¹ convex and G ≡ 0) our theoretical results compare favorably to state-of-the-art methods. The proposed framework encompasses a gamut of novel algorithms, offering a lot of flexibility to control iteration complexity, communication overhead, and convergence speed, while converging under the same conditions; these desirable features make our schemes applicable to several different problems and scenarios. Among the variety of new updating rules for the (block) variables we propose, it is worth mentioning a combination of Jacobi and Gauss-Seidel updates, which seems particularly valuable in parallel optimization on multi-core/processor architectures; to the best of our knowledge this is the first time that such a scheme is proposed and analyzed. A further contribution of the paper is to implement our
schemes and the most representative ones in the literature on a parallel architecture, the General Compute Cluster of the Center for Computational Research at the State University of New York at Buffalo. Numerical results on LASSO and logistic regression problems show that our algorithms consistently outperform state-of-the-art schemes.

The paper is organized as follows. Section II formally introduces the optimization problem along with the main assumptions under which it is studied. Section III describes our novel general algorithmic framework along with its convergence properties. In Section IV we discuss several instances of the general scheme introduced in Section III. Section V contains a detailed comparison of our schemes with state-of-the-art algorithms for similar problems. Numerical results are presented in Section VI, where we focus on LASSO and logistic regression problems and compare our schemes with state-of-the-art alternative solution methods. Finally, Section VII draws some conclusions. All proofs of our results are given in the Appendix.

II. PROBLEM DEFINITION

We consider Problem (1), where the feasible set X = X_1 × ... × X_N is a Cartesian product of lower dimensional convex sets X_i ⊆ R^{n_i}, and x ∈ R^n is partitioned accordingly: x = (x_1, ..., x_N), with each x_i ∈ R^{n_i}; F is smooth (and not necessarily convex) and G is convex and possibly nondifferentiable, with G(x) = Σ_{i=1}^N g_i(x_i). This formulation is very general and includes problems of great interest. Below we list some instances of Problem (1).

- G(x) = 0; in this case the problem reduces to the minimization of a smooth, possibly nonconvex function over convex constraints.
- F(x) = ‖Ax − b‖² and G(x) = c‖x‖₁, X = R^n, with A ∈ R^{m×n}, b ∈ R^m, and c ∈ R₊₊ given constants; this is the renowned and much studied LASSO problem [2].
- F(x) = ‖Ax − b‖² and G(x) = c Σ_{i=1}^N ‖x_i‖₂, X = R^n, with A ∈ R^{m×n}, b ∈ R^m, and c ∈ R₊₊ given constants; this is the group LASSO problem [22].
- F(x) = Σ_{j=1}^m log(1 + e^{−a_j y_j^T x}) and G(x) = c‖x‖₁ (or G(x) = c Σ_{i=1}^N ‖x_i‖₂), with y_j ∈ R^n, a_j ∈ R, and c ∈ R₊₊ given constants; this is the sparse logistic regression problem [23], [24].
- F(x) = Σ_{j=1}^m max{0, 1 − a_j y_j^T x}² and G(x) = c‖x‖₁, with a_j ∈ {−1, 1}, y_j ∈ R^n, and c ∈ R₊₊ given; this is the ℓ₁-regularized ℓ₂-loss Support Vector Machine problem [5].

Other problems that can be cast in the form (1) include the Nuclear Norm Minimization problem, the Robust Principal Component Analysis problem, the Sparse Inverse Covariance Selection problem, and the Nonnegative Matrix (or Tensor) Factorization problem; see, e.g., [25] and references therein.

Assumptions. Given (1), we make the following blanket assumptions:

(A1) Each X_i is nonempty, closed, and convex;
(A2) F is C¹ on an open set containing X;
(A3) ∇F is Lipschitz continuous on X with constant L_F;
(A4) G(x) = Σ_{i=1}^N g_i(x_i), with all g_i continuous and convex on X_i;
(A5) V is coercive.

Note that the above assumptions are standard and are satisfied by most problems of practical interest. For instance, A3 holds automatically if X is bounded; the block-separability condition A4 is a common assumption in the literature on parallel methods for the class of problems (1) (it is in fact instrumental to deal with the nonsmoothness of G in a parallel environment). Interestingly, A4 is satisfied by all standard choices of G usually encountered in applications, including G(x) = ‖x‖₁ and G(x) = Σ_{i=1}^N ‖x_i‖₂, which are among the most commonly used functions. Assumption A5 is needed to guarantee that the sequence generated by our method is bounded; we could dispense with it at the price of a more complex analysis and a more cumbersome statement of the convergence results.

III. MAIN RESULTS

We begin with an informal description of our algorithmic framework along with a list of key features that we would like our schemes to enjoy; this will shed light on the core idea of the proposed decomposition technique. We want to develop parallel solution methods for Problem (1) whereby operations can be carried out on some or (possibly) all (block) variables x_i at the same time.
The most natural parallel (Jacobi-type) method one can think of is updating all blocks simultaneously: given x^k, each (block) variable x_i is updated by solving the subproblem

    x_i^{k+1} ∈ argmin_{x_i ∈ X_i} { F(x_i, x_{−i}^k) + g_i(x_i) },    (2)

where x_{−i} denotes the vector obtained from x by deleting the block x_i. Unfortunately this method converges only under very restrictive conditions [26] that are seldom verified in practice. To cope with this issue the proposed approach introduces some "memory" in the iterate: the new point is a convex combination of x^k and the solutions of (2). Building on this iterate, we would like our framework to enjoy many additional features, described next.

Approximating F: Solving each subproblem as in (2) may be too costly or difficult in some situations. One may then prefer to approximate this problem, in some suitable sense, in order to facilitate the task of computing the new iterate. To this end, we assume that for all i ∈ N ≜ {1, ..., N} we can define a function P_i(z; w) : X_i × X → R, the candidate approximant of F, having the following properties (we denote by ∇P_i the partial gradient of P_i with respect to the first argument z):

(P1) P_i(·; w) is convex and continuously differentiable on X_i for all w ∈ X;
(P2) ∇P_i(x_i; x) = ∇_{x_i} F(x) for all x ∈ X;
(P3) ∇P_i(z; ·) is Lipschitz continuous on X for all z ∈ X_i.

Such a function P_i should be regarded as a (simple) convex approximation of F at the point x with respect to the block of variables x_i that preserves the first order properties of F with respect to x_i. Based on this approximation we can define at any point x^k ∈ X a regularized approximation h_i(x_i; x^k) of V with respect to x_i, where F is replaced by P_i while the nondifferentiable term is preserved, and a quadratic proximal term
is added to make the overall approximation strongly convex. More formally, we have

    h_i(x_i; x^k) ≜ P_i(x_i; x^k) + (τ_i/2) (x_i − x_i^k)^T Q_i(x^k) (x_i − x_i^k) + g_i(x_i),

where Q_i(x^k) is an n_i × n_i positive definite matrix (possibly dependent on x^k). We always assume that the functions h_i(·; x^k) are uniformly strongly convex.

(A6) All h_i(·; x^k) are uniformly strongly convex on X_i with a common positive definiteness constant q > 0; furthermore, Q_i(·) is Lipschitz continuous on X.

Note that an easy and standard way to satisfy A6 is to take, for any i and for any k, τ_i = q > 0 and Q_i(x^k) = I. However, if P_i(·; x^k) is already uniformly strongly convex, one can avoid the proximal term and set τ_i = 0 while still satisfying A6. Associated with each i and point x^k ∈ X we can define the following optimal block solution map:

    x̂_i(x^k, τ_i) ≜ argmin_{x_i ∈ X_i} h_i(x_i; x^k).    (3)

Note that x̂_i(x^k, τ_i) is always well-defined, since the optimization problem in (3) is strongly convex. Given (3), we can then introduce the solution map

    X ∋ y ↦ x̂(y, τ) ≜ ( x̂_i(y, τ_i) )_{i=1}^N.

The proposed algorithm (formally described later on) is based on the computation of (an approximation of) x̂(x^k, τ). Therefore the functions P_i should lead to maps x̂_i that are as easily computable as possible. An appropriate choice depends on the problem at hand and on computational requirements. We discuss alternative possible choices for the approximations P_i in Section IV.

Inexact solutions: In many situations (especially in the case of large-scale problems), it can be useful to further reduce the computational effort needed to solve the subproblems in (3) by allowing inexact computations z_i^k of x̂_i(x^k, τ_i), i.e., ‖z_i^k − x̂_i(x^k, τ_i)‖ ≤ ε_i^k, where ε_i^k measures the accuracy in computing the solution.
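For a concrete sense of the best-response map x̂_i, here is a minimal Python sketch (an illustration, not the paper's implementation) for the LASSO problem with scalar blocks, taking P_i(x_i; x^k) = F(x_i, x_{−i}^k) and Q_i = 1: the strongly convex subproblem (3) then has a closed-form solution via the soft-thresholding operator (cf. [12]). The function names and the list-of-rows matrix representation are assumptions of this sketch.

```python
import math

def soft_threshold(u, lam):
    """S_lam(u) = sign(u) * max(|u| - lam, 0)."""
    return math.copysign(max(abs(u) - lam, 0.0), u)

def lasso_best_response(i, x, A, b, c, tau):
    """Exact minimizer xhat_i(x^k, tau_i) of h_i(.; x^k) for scalar LASSO block i,
    with F(x) = ||Ax - b||^2 and g_i(x_i) = c*|x_i|.  The optimality condition is
    (2*||a_i||^2 + tau)*(t - x_i^k) + 2*a_i^T r + c*sign(t) ∋ 0,
    where a_i is the i-th column of A and r = A x^k - b."""
    col = [row[i] for row in A]                           # i-th column a_i of A
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi    # residual A x^k - b
         for row, bi in zip(A, b)]
    q = 2.0 * sum(ai * ai for ai in col) + tau            # subproblem curvature
    u = x[i] - 2.0 * sum(ai * ri for ai, ri in zip(col, r)) / q
    return soft_threshold(u, c / q)
```

An inexact computation with ε_i^k > 0 would correspond to running an iterative inner solver on (3) and stopping early; in the scalar case the exact solution costs only one pass over the data.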
Updating only some blocks: Another important feature we want for our algorithmic framework is the capability of updating at each iteration only some of the (block) variables, a feature that has been observed to be very effective numerically. In fact, our schemes are guaranteed to converge under the update of only a subset of the variables at each iteration; the only condition is that such a subset contains at least one (block) component which is within a factor ρ ∈ (0, 1] of the farthest from optimality, in the sense explained next. Since x_i^k is an optimal solution of (3) if and only if x̂_i(x^k, τ_i) = x_i^k, a natural distance of x_i^k from optimality is d_i^k ≜ ‖x̂_i(x^k, τ_i) − x_i^k‖; one could then select the blocks x_i to update based on such an optimality measure (e.g., opting for blocks exhibiting larger d_i^k's). However, this choice requires the computation of all the solutions x̂_i(x^k, τ_i), for i = 1, ..., N, which in some applications (e.g., huge-scale problems) might be computationally too expensive. Building on the same idea, we can introduce alternative, less expensive metrics by replacing the distance ‖x̂_i(x^k, τ_i) − x_i^k‖ with a computationally cheaper error bound, i.e., a function E_i(x) such that

    s̲ ‖x̂_i(x^k, τ_i) − x_i^k‖ ≤ E_i(x^k) ≤ s̄ ‖x̂_i(x^k, τ_i) − x_i^k‖,    (4)

for some 0 < s̲ ≤ s̄. Of course one can always set E_i(x^k) = ‖x̂_i(x^k, τ_i) − x_i^k‖, but other choices are also possible; we discuss this point further in Section IV.

Algorithmic framework: We are now ready to formally introduce our algorithm, Algorithm 1, which includes all the features discussed above; convergence to stationary solutions¹ of (1) is stated in Theorem 1.

Algorithm 1: Inexact Flexible Parallel Algorithm (FLEXA)
Data: {ε_i^k} for i ∈ N, τ_i ≥ 0, {γ^k} > 0, x^0 ∈ X, ρ ∈ (0, 1]. Set k = 0.
(S.1): If x^k satisfies a termination criterion: STOP;
(S.2): For all i ∈ N, solve (3) with accuracy ε_i^k: find z_i^k ∈ X_i s.t. ‖z_i^k − x̂_i(x^k, τ_i)‖ ≤ ε_i^k;
(S.3): Set M^k ≜ max_i {E_i(x^k)}. Choose a set S^k that contains at least one index i for which E_i(x^k) ≥ ρ M^k. Set ẑ_i^k = z_i^k for i ∈ S^k and ẑ_i^k = x_i^k for i ∉ S^k;
(S.4): Set x^{k+1} ≜ x^k + γ^k (ẑ^k − x^k);
(S.5): k ← k + 1, and go to (S.1).

Theorem 1: Let {x^k} be the sequence generated by Algorithm 1, under A1-A6. Suppose that {γ^k} and {ε_i^k} satisfy the following conditions: (i) γ^k ∈ (0, 1]; (ii) γ^k → 0; (iii) Σ_k γ^k = +∞; (iv) Σ_k (γ^k)² < +∞; and (v) ε_i^k ≤ γ^k α_1 min{α_2, 1/‖∇_{x_i} F(x^k)‖} for all i ∈ N and some nonnegative constants α_1 and α_2.
Additionally, if inexact solutions are used in Step 2, i.e., ε_i^k > 0 for some i and infinitely many k, then assume also that G is globally Lipschitz on X. Then, either Algorithm 1 converges in a finite number of iterations to a stationary solution of (1), or every limit point of {x^k} (at least one such point exists) is a stationary solution of (1).

Proof: See Appendix B.

¹ We recall that a stationary solution x* of (1) is a point for which a subgradient ξ ∈ ∂G(x*) exists such that (∇F(x*) + ξ)^T (y − x*) ≥ 0 for all y ∈ X. Of course, if F is convex, stationary points coincide with global minimizers.

The proposed algorithm is extremely flexible. We can always choose S^k = N, resulting in the simultaneous update of all the (block) variables (full Jacobi scheme); or, at the other extreme, one can update a single (block) variable at a time, thus obtaining a Gauss-Southwell kind of method. More classical cyclic Gauss-Seidel methods can also be derived and are discussed in the next subsection. One can also compute inexact solutions (Step 2) while preserving convergence, provided that the error terms ε_i^k and the step-sizes γ^k are chosen according to Theorem 1; some practical choices for these parameters are discussed in Section IV. We emphasize that the Lipschitzianity of G is required only if x̂(x^k, τ) is not computed exactly for infinitely many iterations. At any rate, this Lipschitz condition is
automatically satisfied if G is a norm (and therefore in LASSO and group LASSO problems, for example) or if X is bounded.

As a final remark, note that versions of Algorithm 1 where all (or most of) the variables are updated at each iteration are particularly amenable to implementation in distributed environments (e.g., multi-user communication systems, ad-hoc networks, etc.). In fact, in this case, not only can the calculation of the inexact solutions z_i^k be carried out in parallel, but the information that the i-th subproblem has to exchange with the other subproblems in order to compute the next iterate is very limited. A full appreciation of the potential of our approach in distributed settings depends however on the specific application under consideration and is beyond the scope of this paper. We refer the reader to [21] for some examples, even if in less general settings.

A. A Gauss-Jacobi algorithm

Algorithm 1 and its convergence theory cover fully parallel Jacobi as well as Gauss-Southwell-type methods, and many of their variants. In this section we show that Algorithm 1 can also incorporate hybrid parallel-sequential (Jacobi Gauss-Seidel) schemes, where blocks of variables are updated simultaneously by sequentially computing entries per block. This procedure seems particularly well suited to parallel optimization on multi-core/processor architectures. Suppose that we have P processors that can be used in parallel and we want to exploit them to solve Problem (1) (P will denote both the number of processors and the set {1, 2, ..., P}). We assign to each processor p the variables I_p; therefore I_1, ..., I_P is a partition of I. We denote by x_p ≜ (x_{pi})_{i ∈ I_p} the vector of (block) variables x_{pi} assigned to processor p, with i ∈ I_p; and x_{−p} is the vector of remaining variables, i.e., the vector of those assigned to all processors except the p-th one. Finally, given i ∈ I_p, we partition x_p as x_p = (x_{p,i<}, x_{p,i≥}), where x_{p,i<} is the vector containing all variables in I_p that come before i (in the order assumed in I_p), while x_{p,i≥} are the remaining variables.
Thus we will write, with a slight abuse of notation, x = (x_{p,i<}, x_{p,i≥}, x_{−p}). Once the optimization variables have been assigned to the processors, one could in principle apply the inexact Jacobi Algorithm 1. In this scheme each processor p would compute sequentially, at each iteration k and for every (block) variable x_{pi}, a suitable z_{pi}^k by keeping all variables but x_{pi} fixed to (x_{pj}^k)_{j ∈ I_p, j ≠ i} and x_{−p}^k. But since we are solving the problems for each group of variables assigned to a processor sequentially, this seems a waste of resources; it is instead much more efficient to use, within each processor, a Gauss-Seidel scheme, whereby the current calculated iterates are used in all subsequent calculations. Our Gauss-Jacobi method, formally described in Algorithm 2, implements exactly this idea; its convergence properties are given in Theorem 2.

Theorem 2: Let {x^k} be the sequence generated by Algorithm 2, under the setting of Theorem 1. Then, either Algorithm 2 converges in a finite number of iterations to a stationary solution of (1), or every limit point of the sequence {x^k} (at least one such point exists) is a stationary solution of (1).

Proof: See Appendix C.

Algorithm 2: Inexact Gauss-Jacobi Algorithm
Data: {ε_{pi}^k} for p ∈ P and i ∈ I_p, τ ≥ 0, {γ^k} > 0, x^0 ∈ X. Set k = 0.
(S.1): If x^k satisfies a termination criterion: STOP;
(S.2): For all p ∈ P do (in parallel),
  For all i ∈ I_p do (sequentially)
    a) Find z_{pi}^k s.t. ‖z_{pi}^k − x̂_{pi}((x_{p,i<}^{k+1}, x_{p,i≥}^k, x_{−p}^k), τ)‖ ≤ ε_{pi}^k;
    b) Set x_{pi}^{k+1} ≜ x_{pi}^k + γ^k (z_{pi}^k − x_{pi}^k);
(S.3): k ← k + 1, and go to (S.1).

Although the proof of Theorem 2 is relegated to the appendix, it is interesting to point out that the gist of the proof is to show that Algorithm 2 is nothing else but an instance of Algorithm 1 with errors. By updating all variables at each iteration, Algorithm 2 has the advantage that neither the error bounds E_i nor the exact solutions x̂_{pi} need to be computed in order to decide which variables should be updated.
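To illustrate the hybrid scheme, the following Python sketch (an illustration under simplifying assumptions, not the paper's implementation) performs one iteration of Algorithm 2 on a scalar-block LASSO instance with exact best responses (ε = 0). The outer loop over the index sets I_p would run on separate processors; simulating it sequentially here means later blocks also see earlier processors' updates, whereas a true parallel run would keep the other processors' variables at their iteration-k values.

```python
import math

def soft_threshold(u, lam):
    """S_lam(u) = sign(u) * max(|u| - lam, 0)."""
    return math.copysign(max(abs(u) - lam, 0.0), u)

def gauss_jacobi_step(x, A, b, c, tau, partition, gamma):
    """One iteration (S.2) of Algorithm 2 for LASSO with scalar variables,
    F(x) = ||Ax - b||^2 and g_i(x_i) = c*|x_i|.  `partition` lists the index
    sets I_p.  Within each I_p the sweep is Gauss-Seidel: every update uses
    the freshest values of the variables already updated in that block."""
    x = list(x)
    for I_p in partition:                     # "in parallel" over processors
        for i in I_p:                         # sequential sweep within a processor
            col = [row[i] for row in A]       # i-th column a_i of A
            r = [sum(aij * xj for aij, xj in zip(row, x)) - bi
                 for row, bi in zip(A, b)]    # residual with the latest x
            q = 2.0 * sum(ai * ai for ai in col) + tau
            u = x[i] - 2.0 * sum(ai * ri for ai, ri in zip(col, r)) / q
            z = soft_threshold(u, c / q)      # exact best response, step a)
            x[i] += gamma * (z - x[i])        # convex combination, step b)
    return x
```

Setting `partition=[list(range(n))]` recovers the classical cyclic Gauss-Seidel sweep (P = 1) discussed below.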
Furthermore, it is rather intuitive that the use of the latest available information should reduce the number of overall iterations needed to converge with respect to Algorithm 1 (assuming in the latter algorithm that all variables are updated at each iteration). However, these advantages should be contrasted with the following two facts: (i) updating all variables at each iteration might not always be the best (or a feasible) choice; and (ii) in many practical instances of Problem (1), using the latest information as dictated by Algorithm 2 may require extra calculations (e.g., to compute function information, such as gradients) and communication overhead. These aspects are discussed on specific examples in Section VI. As a final remark, note that Algorithm 2 contains as a special case the classical cyclic Gauss-Seidel scheme (a fact that was less obvious to deduce directly from Algorithm 1); it is sufficient to set P = 1 (corresponding to using only one processor): the single processor updates all the (scalar) variables sequentially while using the new values of those that have already been updated.

IV. EXAMPLES AND SPECIAL CASES

Algorithms 1 and 2 are very general and encompass a gamut of novel algorithms, each corresponding to various choices of the approximant P_i, the error bound function E_i, the step-size sequence γ^k, the block partition, etc. These choices lead to algorithms that can be very different from each other, but all converging under the same conditions. These degrees of freedom offer a lot of flexibility to control iteration complexity, communication overhead, and convergence speed. In this section we outline several effective choices for the design parameters along with some illustrative examples of specific algorithms resulting from a proper combination of these choices.

On the choice of the step-size γ^k. An example of a step-size rule satisfying conditions (i)-(iv) in Theorem 1 is: given 0 <
γ^0 ≤ 1, let

    γ^k = γ^{k−1} (1 − θ γ^{k−1}),   k = 1, 2, ...,    (5)

where θ ∈ (0, 1) is a given constant. Notice that while this rule may still require some tuning for optimal behavior, it is quite reliable, since in general we are not using a (sub)gradient direction, so that many of the well-known practical drawbacks associated with a (sub)gradient method with diminishing step-size are mitigated in our setting. Furthermore, this choice of step-size does not require any form of centralized coordination, which is a favorable feature in a parallel environment. Numerical results in Section VI show the effectiveness of (the customization of) (5) on specific problems. We remark that it is possible to prove convergence of Algorithm 1 also using other step-size rules, such as a standard Armijo-like line-search procedure or a (suitably small) constant step-size. We omit the discussion of these options because the former is not in line with our parallel approach while the latter is numerically less efficient.

On the choice of the error bound function E_i(x). As mentioned, the most obvious choice is to take E_i(x) = ‖x̂_i(x^k, τ_i) − x_i^k‖. This is a valuable choice if the computation of x̂_i(x^k, τ_i) can be easily accomplished. For instance, in the LASSO problem with N = {1, ..., n} (i.e., when each block reduces to a scalar variable), it is well known that x̂_i(x^k, τ_i) can be computed in closed form using the soft-thresholding operator [12]. In situations where the computation of ‖x̂_i(x^k, τ_i) − x_i^k‖ is not possible or advisable, we can resort to estimates. Assume momentarily that G ≡ 0. Then it is known [27, Proposition 6.3.1] that under our assumptions ‖Π_{X_i}(x_i^k − ∇_{x_i} F(x^k)) − x_i^k‖ is an error bound for the minimization problem in (3) and therefore satisfies (4), where Π_{X_i}(y) denotes the Euclidean projection of y onto the closed and convex set X_i. In this situation we can choose E_i(x^k) = ‖Π_{X_i}(x_i^k − ∇_{x_i} F(x^k)) − x_i^k‖. If G(x) ≢ 0 things become more complex. In most cases of practical interest, adequate error bounds can be derived from [11, Lemma 7]. It is interesting to note that the computation of E_i is only needed if a partial update of the (block) variables is performed.
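Putting these pieces together (the step-size rule (5), the error bound E_i(x^k) = |x̂_i − x_i^k|, and the greedy selection of Step 3), here is a compact Python sketch of FLEXA for LASSO with scalar blocks, using the linearization approximant with Q_i = 1, so that each subproblem is solved by one soft-thresholding. All parameter values are illustrative defaults of this sketch, not tuned recommendations from the paper.

```python
import math

def soft_threshold(u, lam):
    """S_lam(u) = sign(u) * max(|u| - lam, 0)."""
    return math.copysign(max(abs(u) - lam, 0.0), u)

def flexa_lasso(A, b, c, tau=2.0, rho=0.5, gamma0=0.9, theta=1e-2, iters=200):
    """Sketch of Algorithm 1 (FLEXA) for F(x) = ||Ax - b||^2, G(x) = c*||x||_1,
    scalar blocks, exact subproblem solutions (epsilon_i^k = 0)."""
    n = len(A[0])
    x = [0.0] * n
    gamma = gamma0
    for _ in range(iters):
        # residual r = A x^k - b and partial gradients grad_i F = 2 a_i^T r
        r = [sum(aij * xj for aij, xj in zip(row, x)) - bi
             for row, bi in zip(A, b)]
        grad = [2.0 * sum(row[i] * ri for row, ri in zip(A, r))
                for i in range(n)]
        # (S.2): best responses of the linearized subproblems
        z = [soft_threshold(x[i] - grad[i] / tau, c / tau) for i in range(n)]
        # (S.3): error bounds E_i and greedy selection of S^k
        E = [abs(zi - xi) for zi, xi in zip(z, x)]
        M = max(E)
        S = [i for i in range(n) if E[i] >= rho * M]
        # (S.4): convex combination x^{k+1} = x^k + gamma^k (zhat^k - x^k)
        for i in S:
            x[i] += gamma * (z[i] - x[i])
        gamma = gamma * (1.0 - theta * gamma)   # step-size rule (5)
    return x
```

Note that with rho < 1 only the blocks whose error bound is within a factor rho of the largest one are updated at each pass, exactly as prescribed by Step 3.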
However, an option that is always feasible is to take S^k = N at each iteration, i.e., to update all (block) variables at every iteration. With this choice we can dispense with the computation of E_i altogether.

On the choice of the approximant P_i(x_i; x). The most obvious choice for P_i is the linearization of F at x^k with respect to x_i:

    P_i(x_i; x^k) = F(x^k) + ∇_{x_i} F(x^k)^T (x_i − x_i^k).

With this choice, and taking for simplicity Q_i(x^k) = I,

    x̂_i(x^k, τ_i) = argmin_{x_i ∈ X_i} { F(x^k) + ∇_{x_i} F(x^k)^T (x_i − x_i^k) + (τ_i/2) ‖x_i − x_i^k‖² + g_i(x_i) }.    (6)

This is essentially the way a new iterate is computed in most sequential (block-)CDMs for the solution of (group) LASSO problems and their generalizations. Note that, contrary to most existing schemes, our algorithm is parallel. At the other extreme we could just take P_i(x_i; x^k) = F(x_i, x_{−i}^k). Of course, to have P1 satisfied (cf. Section III), we must assume that F(x_i, x_{−i}^k) is convex. With this choice, and setting for simplicity Q_i(x^k) = I, we have

    x̂_i(x^k, τ_i) = argmin_{x_i ∈ X_i} { F(x_i, x_{−i}^k) + (τ_i/2) ‖x_i − x_i^k‖² + g_i(x_i) },    (7)

thus giving rise to a parallel nonlinear Jacobi type method for the constrained minimization of V(x). Between the two extremes above, one can consider intermediate choices. For example, if F(x_i, x_{−i}^k) is convex, we can take P_i(x_i; x^k) as a second order approximation of F(x_i, x_{−i}^k), i.e.,

    P_i(x_i; x^k) = F(x^k) + ∇_{x_i} F(x^k)^T (x_i − x_i^k) + (1/2)(x_i − x_i^k)^T ∇²_{x_i x_i} F(x^k) (x_i − x_i^k).    (8)

When g_i(x_i) ≡ 0, this essentially corresponds to taking a Newton step in minimizing the reduced problem min_{x_i ∈ X_i} F(x_i, x_{−i}^k), resulting in

    x̂_i(x^k, τ_i) = argmin_{x_i ∈ X_i} { F(x^k) + ∇_{x_i} F(x^k)^T (x_i − x_i^k) + (1/2)(x_i − x_i^k)^T ∇²_{x_i x_i} F(x^k)(x_i − x_i^k) + (τ_i/2) ‖x_i − x_i^k‖² + g_i(x_i) }.    (9)

Another intermediate choice, relying on a specific structure of the objective function that has important applications, is the following. Suppose that F is a sum-utility function, i.e.,

    F(x) = Σ_{j ∈ J} f_j(x_i, x_{−i}),

for some finite set J. Assume now that for every j ∈ S_i ⊆ J, the function f_j(·, x_{−i}) is convex.
Then we may set

    P_i(x_i; x^k) = Σ_{j ∈ S_i} f_j(x_i, x_{−i}^k) + Σ_{j ∉ S_i} ∇_{x_i} f_j(x^k)^T (x_i − x_i^k),

thus preserving, for each i, the favorable convex part of F with respect to x_i while linearizing the nonconvex parts. This is the approach adopted in [21] in the design of multi-user systems, to which we refer for applications in signal processing and communications.

The framework described in Algorithm 1 can give rise to very different schemes, according to the choices one makes for the many variable features it contains, some of which have been detailed above. Because of space limitations, we cannot discuss here all possibilities. We provide next just a few instances of possible algorithms that fall within our framework.

Example #1: (Proximal) Jacobi algorithms for convex functions. Consider the simplest problem falling within our setting: the unconstrained minimization of a continuously differentiable convex function, i.e., assume that F is convex, G ≡ 0, and X = R^n. Although this is possibly the best studied problem in nonlinear optimization, classical parallel methods for this problem [26, Sec. ] require very strong contraction conditions. In our framework we can take P_i(x_i; x^k) = F(x_i, x_{−i}^k), resulting in a parallel Jacobi-type method which does not need any additional assumptions. Furthermore, our
theory shows that we can even dispense with the convexity assumption and still obtain convergence of a Jacobi-type method to a stationary point. If in addition we take S^k = N, we obtain the class of methods studied in [21], [28]-[30].

Example #2: Parallel coordinate descent methods for LASSO. Consider the LASSO problem, i.e., Problem (1) with F(x) = ‖Ax − b‖², G(x) = c‖x‖₁, and X = R^n. Probably, to date, the most successful class of methods for this problem is that of CDMs, whereby at each iteration a single variable is updated using (6). We can easily obtain a parallel version of this method by taking n_i = 1 and S^k = N and still using (6). Alternatively, instead of linearizing F(x), we can better exploit the structure of F(x) and use (7). In fact, it is well known that for LASSO problems subproblem (7) can be solved analytically. We can easily consider similar methods for the group LASSO problem as well (just take n_i > 1).

Example #3: Parallel coordinate descent methods for Logistic Regression. Consider the Logistic Regression problem, i.e., Problem (1) with F(x) = Σ_{j=1}^m log(1 + e^{−a_j y_j^T x}), G(x) = c‖x‖₁, and X = R^n, where y_j ∈ R^n, a_j ∈ {−1, 1}, and c ∈ R₊₊ are given constants. Since F(x_i, x_{−i}^k) is convex, we can take P_i(x_i; x^k) = F(x^k) + ∇_{x_i} F(x^k)^T (x_i − x_i^k) + (1/2)(x_i − x_i^k)^T ∇²_{x_i x_i} F(x^k)(x_i − x_i^k), thus obtaining a fully distributed and parallel CDM that uses a second order approximation of the smooth function F. Moreover, by taking n_i = 1 and using a soft-thresholding operator, each x̂_i can be computed in closed form.

V. RELATED WORKS

The proposed algorithmic framework draws on Successive Convex Approximation (SCA) paradigms that have a long history in the optimization literature. Nevertheless, our algorithms and their convergence conditions (cf. Theorems 1 and 2) unify and extend current parallel and sequential SCA methods in several directions, as outlined next.
(Partially) Parallel Deterministic Methods: The roots of parallel deterministic SCA schemes (where all the variables are updated simultaneously) can be traced back at least to the work of Cohen on the so-called auxiliary principle [28], [29] and its related developments; see, e.g., [9]-[16], [21], [30]-[32]. Roughly speaking, these works can be divided into two groups, namely: solution methods for convex objective functions [9], [12], [14]-[16], [28], [29] and for nonconvex ones [10], [11], [13], [21], [30]-[32]. All methods in the former group (and [10], [11], [13], [31], [32]) are (proximal) gradient schemes; they thus share the classical drawbacks of gradient-like schemes; moreover, by replacing the convex function F with its first order approximation, they do not take any advantage of the structure of F, a fact that instead has been shown to enhance convergence speed [21]. Comparing with the second group of works [10], [11], [13], [21], [30]-[32], our algorithmic framework improves on their convergence properties while adding more flexibility in the selection of how many variables to update at each iteration. For instance, with the exception of [11], all the aforementioned works do not allow parallel updates of only a subset of all variables, a feature that instead can dramatically improve the convergence speed of the algorithm, as we show in Section VI. Moreover, with the exception of [30], they all require an Armijo-type line-search, which makes them not appealing for a (parallel) distributed implementation. A scheme in [30] is actually based on diminishing step-size rules, but its convergence properties are quite weak: not all the limit points of the sequence generated by this scheme are guaranteed to be stationary solutions of (1).
Our framework instead (i) deals with nonconvex (nonsmooth) problems; (ii) allows one to use a much more varied array of approximations for F and also inexact solutions of the subproblems; (iii) is fully parallelizable and distributable (it does not rely on any line-search); and (iv) leads to the first distributed convergent schemes based on very general (possibly) partial updating rules of the optimization variables. In fact, among deterministic schemes, we are aware of only the algorithms in [11], [14], [15] performing at each iteration a parallel update of only a subset of all the variables. These algorithms however are gradient-like schemes, and do not allow inexact solutions of the subproblems (in some large-scale problems the cost of computing the exact solution of all the subproblems can be prohibitive). In addition, [11] requires an Armijo-type line-search, whereas [14] and [15] are applicable only to convex objective functions and are not fully parallel. In fact, convergence conditions therein impose a constraint on the maximum number of variables that can be simultaneously updated (linked to the spectral radius of some matrices), a constraint that in many large scale problems is likely not satisfied.

Sequential Methods: Our framework also contains sequential updates as special cases; it is then interesting to compare our results to sequential schemes too. Given the vast literature on the subject, we consider here only the most recent and general work [17]. In [17] the authors consider the minimization of a possibly nonsmooth function by Gauss-Seidel methods whereby, at each iteration, a single block of variables is updated by minimizing a global upper convex approximation of the function. However, finding such an approximation is generally not an easy task, if not impossible. To cope with this issue, the authors also proposed a variant of their scheme that does not need this requirement but uses an Armijo-type line-search, which however makes the scheme not suitable for a parallel/distributed implementation. Contrary to [17], in our framework the conditions on the approximation function (cf.
P1-P3) are trivial to satisfy (in particular, P need not be an upper bound of F), significantly enlarging the class of utility functions V to which the proposed solution method is applicable. Furthermore, our framework gives rise to parallel and distributed methods (no line search is used) wherein all variables can be updated rather independently at the same time. VI. NUMERICAL RESULTS In this section we provide numerical results offering solid evidence of the viability of our approach; they clearly show that our algorithmic framework leads to practical methods that exploit well parallelism and compare favourably to existing schemes, both parallel and sequential. The tests were carried out on LASSO and Logistic Regression problems, two of the most studied instances of Problem (1).
All codes have been written in C++ and use the Message Passing Interface (MPI) for parallel operations. All algebra is performed using the GNU Scientific Library (GSL). The algorithms were tested on the General Compute Cluster of the Center for Computational Research at the State University of New York at Buffalo. In particular, for our experiments we used a partition composed of 372 DELL 12x2.40GHz Intel Xeon E5645 processor computer nodes with 48 GB of main memory and a QDR InfiniBand 40 Gb/s network card. In our experiments, distributed algorithms ran on 20 parallel processes (that is, we used 2 nodes with 10 cores each), while sequential algorithms ran on a single process (thus using one single core). A. LASSO problem We implemented the instance of Algorithm 1 described in Example #2 in the previous section, using the approximating function P as in (7). Note that in the case of LASSO problems x̂(x^k, τ), the unique solution of (7), can be easily computed in closed form using the soft-thresholding operator; see e.g. [12]. Tuning of Algorithm 1: The free parameters of our algorithm are chosen as follows. The proximal gains τ_i are initially all set to τ_i = tr(A^T A)/2n, where n is the total number of variables. This initial value, which is half of the mean of the eigenvalues of ∇²F, has been observed to be very effective in all our numerical tests. Choosing an appropriate value of τ_i at each iteration is crucial. Note that in the description of our algorithmic framework we considered fixed values of τ_i, but it is clear that varying them a finite number of times does not affect in any way the theoretical convergence properties of the algorithms. On the other hand, we found that an appropriate update of τ_i in early iterations can enhance considerably the performance of the algorithm. Some preliminary experiments showed that an effective option is to choose τ_i large enough to force a decrease in the objective function value, but not so large as to slow down progress towards optimality.
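The closed-form best response mentioned above can be sketched as follows. This is an illustrative pure-Python sketch (not the paper's C++/MPI implementation), assuming scalar blocks and the quadratic approximation P_i, so that each subproblem reduces to a soft-thresholding step; the function names are hypothetical.

```python
def soft_threshold(u, kappa):
    """Proximal operator of kappa*|.|: shrinks u towards zero by kappa."""
    if u > kappa:
        return u - kappa
    if u < -kappa:
        return u + kappa
    return 0.0

def lasso_best_response(x, grad, tau, lam):
    """Componentwise closed-form minimizer x_hat_i(x^k, tau_i) for LASSO:
    argmin_{x_i} grad_i*(x_i - x_i^k) + (tau_i/2)*(x_i - x_i^k)^2 + lam*|x_i|."""
    return [soft_threshold(x[i] - grad[i] / tau[i], lam / tau[i])
            for i in range(len(x))]
```

Since the update is componentwise, each process can apply it to its own block of variables independently, which is what makes the scheme attractive for parallel implementations.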
We found that the following heuristic works well in practice: (i) all τ_i are doubled if at a certain iteration the objective function does not decrease; and (ii) they are all halved if the objective function decreases for ten consecutive iterations or the relative error on the objective function Δ(x) is sufficiently small, where

  Δ(x) ≜ (V(x) − V*)/V*,  (10)

and V* is the optimal value of the objective function V (in our experiments on LASSO, V* is known, see below). In order to avoid increments in the objective function, whenever all τ_i are doubled, the associated iteration is discarded, and in Step 4 of Algorithm 1 it is set x^{k+1} = x^k. In any case, we limited the number of possible updates of the values of τ_i to 100. The step-size γ^k is updated according to the following rule:

  γ^k = γ^{k−1} ( 1 − min{ 1, 10^{−4}/Δ(x^{k−1}) } θ γ^{k−1} ),  k = 1, ...,  (11)

with γ^0 = 0.9 and θ = 1e−7. The above diminishing rule is based on (5) while guaranteeing that γ^k does not become too close to zero before the relative error is sufficiently small. Note that since the τ_i are changed only a finite number of times and the step-size γ^k decreases, the conditions of Theorem 1 are all satisfied. Finally, the error bound function is chosen as E_i(x^k) = ‖x̂_i(x^k, τ_i) − x_i^k‖, and S^k in Step 3 of the algorithm is set to S^k = {i : E_i(x^k) ≥ σ M^k}, with M^k ≜ max_i E_i(x^k). In our tests we consider two options for σ, namely: i) σ = 0, which leads to a fully parallel scheme where at each iteration all variables are updated; and ii) σ = 0.5, which corresponds to updating only a subset of all the variables at each iteration. Note that for both choices of σ, the resulting set S^k satisfies the requirement in Step 3 of Algorithm 1; indeed, S^k always contains the index i corresponding to the largest E_i(x^k). Recall also that, as already mentioned, the computation of each x̂_i(x^k, τ_i) for the LASSO problem is in closed form and thus inexpensive. We termed the above instance of our Algorithm 1 FLEXible parallel Algorithm (FLEXA); in the sequel we will refer to the two versions of FLEXA as FLEXA σ = 0 and FLEXA σ = 0.5.
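The greedy selection rule and the diminishing step-size rule (11) can be sketched as below; an illustrative Python version with hypothetical names, using the constants reported above (γ^0 = 0.9, θ = 1e−7).

```python
def select_blocks(E, sigma):
    """Greedy rule S^k = {i : E_i(x^k) >= sigma * M^k}, M^k = max_i E_i(x^k).
    sigma = 0 selects every block (fully parallel update);
    larger sigma updates only the blocks with the largest errors."""
    M = max(E)
    return [i for i, e in enumerate(E) if e >= sigma * M]

def next_gamma(gamma, rel_err, theta=1e-7):
    """Diminishing step-size rule (11):
    gamma^k = gamma^{k-1} * (1 - min(1, 1e-4/rel_err) * theta * gamma^{k-1})."""
    return gamma * (1.0 - min(1.0, 1e-4 / rel_err) * theta * gamma)
```

Note that `select_blocks` always returns the index attaining the maximum error, so the requirement in Step 3 of Algorithm 1 is met for any sigma in [0, 1].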
Algorithms in the literature: We compared our versions of FLEXA with the most common distributed and sequential algorithms proposed in the literature to solve the LASSO problem. More specifically, we consider the following schemes. FISTA: The Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) proposed in [12] is a first-order method and can be regarded as the benchmark algorithm for LASSO problems. By taking advantage of the separability of the terms in the objective function V, this method can be easily parallelized and thus implemented on a parallel architecture. FISTA requires the preliminary computation of the Lipschitz constant L_∇F of ∇F; in our experiments we performed this computation using a distributed version of the power method that computes ‖A‖₂² (see, e.g., [33]). SpaRSA: This is the first-order method proposed in [13]; it is a popular spectral projected gradient method that uses a spectral step length together with a nonmonotone line search to enhance convergence. Also this method can be easily parallelized, which is the version implemented in our tests. In all the experiments we set the parameters of SpaRSA as in [13]: M = 5, σ = 0.01, α_max = 1e30, and α_min = 1e−30. GRock: This is a parallel algorithm proposed in [15] that seems to perform extremely well on sparse LASSO problems. We actually tested two instances of GRock, namely: i) one where only one variable is updated at each iteration; and ii) a second instance where the number of variables simultaneously updated is equal to the number of parallel processors (in our experiments we used 20 processors). It is important to remark that the theoretical convergence properties of GRock are in jeopardy as the number of variables updated in parallel increases; roughly speaking, GRock is guaranteed to converge if the columns of the data matrix A in the LASSO problem are almost orthogonal, a feature enjoyed by most of our test problems but not satisfied in many applications. ADMM: This is a classical Alternating Direction Method of Multipliers (ADMM) in the form used in [34].
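FISTA's Lipschitz constant L_∇F = ‖A‖₂², i.e., the largest eigenvalue of A^T A, can be estimated with a power iteration; a minimal serial sketch follows (the paper uses a distributed version of this computation), assuming A is a dense list-of-rows matrix and the function name is hypothetical.

```python
import math

def spectral_norm_sq(A, iters=200):
    """Power iteration estimating the largest eigenvalue of A^T A,
    i.e. ||A||_2^2 (the Lipschitz constant of grad F for LASSO)."""
    n = len(A[0])
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        # apply B = A^T A implicitly: w = A^T (A v)
        Av = [sum(row[j] * v[j] for j in range(n)) for row in A]
        w = [sum(A[r][i] * Av[r] for r in range(len(A))) for i in range(n)]
        lam = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient (v has unit norm after the first pass)
        nrm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / nrm for wi in w]
    return lam
```

In a distributed setting with a column-block partition of A, the products Av and A^T(Av) are computed blockwise with a reduce step, which is presumably why this initialization shows up as a visible startup cost for FISTA in the plots.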
Applied to LASSO problems, this instance leads to a sequential scheme where only one variable at a time can be updated (in closed form). Note that in principle ADMM can be parallelized, but it is well known that it does not scale well with the number of processors; therefore in our tests we have not implemented the parallel version. GS: This is a classical sequential Gauss-Seidel scheme [26] computing x̂ with n_i = 1, and then updating all x_i in a sequential fashion (and using unitary step-size). In all the parallel algorithms we implemented (FLEXA, FISTA, SpaRSA and GRock), the data matrix A of the LASSO problem is stored in a column-block distributed manner A = [A_1 A_2 ... A_P], where P is the number of parallel processors. Thus the computation of each product Ax (which is required to evaluate ∇F) and of the norm ‖x‖₁ (that is, G) is divided into the parallel jobs of computing A_i x_i and ‖x_i‖₁, followed by a reduce operation. Columns of A were equally distributed among the processes. Numerical Tests: We generated six groups of LASSO problems using the random generation technique proposed by Nesterov [10]; this method permits to control the sparsity of the solution. For the first five groups, we considered problems with 10,000 variables and matrix A having 9,000 rows. The five groups differ in the degree of sparsity of the solution; more specifically, the percentage of non-zeros in the solution is 1%, 10%, 20%, 30%, and 40%, respectively. The last group is formed by instances with 100,000 variables and 5,000 rows for A, and solutions having 1% of non-zero variables. In all experiments and for all the algorithms, the initial point was set to the zero vector. Results of our experiments for each of the 10,000-variable groups are reported in Fig. 1, where we plot the relative error as defined in (10) versus the CPU time; all the curves are averaged over ten independent random realizations.
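The column-block partition of A and the reduce step that recombines the partial products A_i x_i can be sketched as follows; this is a pure-Python, single-process stand-in for the MPI implementation described above (the loop over blocks plays the role of the P parallel processes, and the final sum plays the role of the reduce), with hypothetical names.

```python
def split_blocks(seq_len, P):
    """Contiguous index ranges assigning seq_len columns to P processes (near-equal sizes)."""
    size = (seq_len + P - 1) // P
    return [(p * size, min((p + 1) * size, seq_len)) for p in range(P)]

def distributed_matvec(A, x, P):
    """Each 'process' p computes the partial product A_p x_p from its column block;
    the reduce step sums the P partial m-vectors to obtain Ax."""
    m, n = len(A), len(x)
    partials = []
    for lo, hi in split_blocks(n, P):
        partials.append([sum(A[r][j] * x[j] for j in range(lo, hi)) for r in range(m)])
    # reduce: elementwise sum of the partial products
    return [sum(p[r] for p in partials) for r in range(m)]
```

The same pattern applies to ‖x‖₁: each process computes ‖x_p‖₁ locally and a scalar reduce sums the contributions.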
Note that the CPU time includes communication times (for distributed algorithms) and the initial time needed by the methods to perform all pre-iteration computations (this explains why the curves associated with FISTA start after the others; in fact, FISTA requires some nontrivial initializations based on the computation of ‖A‖₂²). Results of our experiments for the LASSO instance with 100,000 variables are reported in Fig. 2. The curves are averaged over the random realizations. Note that we have not included the curves for the sequential algorithms (ADMM and GS) on this group of big problems, since we could not use the same nodes used to run all the other algorithms, due to memory limitations. However, we tested ADMM and GS on these big problems on different high-memory nodes; the obtained results (not reported here) showed that, as the dimensions of the problem increase, sequential methods perform poorly in comparison with parallel methods; therefore we excluded ADMM and GS from the tests for the LASSO instance with 100,000 variables. Given Fig. 1 and 2, the following comments are in order. On all the tested problems, FLEXA with σ = 0.5 outperforms in a consistent manner all other implemented algorithms. Results for FLEXA with σ = 0 are quite similar to those with σ = 0.5 on the 10,000-variable problems. However, on larger problems FLEXA σ = 0 (i.e., the version in which all variables are updated at each iteration) seems ineffective.

Fig. 1: Relative error vs. time (in seconds) for LASSO with 10,000 variables: (a) 1% non-zeros; (b) 10% non-zeros; (c) 20% non-zeros; (d) 30% non-zeros; (e) 40% non-zeros.

Fig. 2: Relative error vs. time (in seconds) for LASSO with 100,000 variables.

This result might seem surprising at first sight: why, once all the optimal solutions x̂_i(x^k, τ_i) are computed, is it more convenient not to use all of them but to update instead only a subset of variables? We briefly discuss this complex issue next.
Remark 3 (On the partial updates): It can be shown that Algorithm 1 has the remarkable capability to identify those variables that will be zero at a solution; because of lack of space, we do not provide here the proof of this statement but only an informal description. Roughly speaking, it can be shown that, for k large enough, those variables that are zero in x̂(x^k, τ) will be zero also in a limiting solution x̄. Therefore, suppose that k is large enough so that this identification property already takes place (we will say that we are in the "identification phase") and consider an index i such that x̄_i = 0. Then, if x_i^k is zero, it is clear, by Steps 3 and 4, that x_i^{k'} will be zero for all indices k' > k, independently of whether i belongs to S^k or not. In other words, if a variable that is zero at the solution is already zero when the algorithm enters the identification phase, that variable will be zero in all subsequent iterations; this fact, intuitively, should enhance the convergence speed of the algorithm. Conversely, if when we enter the identification phase x_i^k is not zero, the algorithm will have to bring it back to zero iteratively. It should then be clear why updating only variables that we have strong reason to believe will be non-zero at a solution is a better strategy than updating them all. Of course, there may be a problem dependence, and the best value of σ can vary from problem to problem. But we believe that the explanation outlined above gives firm theoretical ground to the idea that it might be wise to "waste" some calculations and perform only a partial update of the variables. Referring to the sequential methods (ADMM and GS), they behave strikingly well on the 10,000-variable problems, if one keeps in mind that they only use one process. However, as already observed, they cannot compete with parallel methods on larger problems. FISTA is capable of approaching relatively fast low-accuracy solutions, but has difficulties in reaching high accuracy. The version of GRock with P = 20 is the closest match to FLEXA, but only when the problems are very sparse. This is consistent with the fact that its convergence properties are at stake when the problems are quite dense.
Furthermore, it should be clear that if the problem is very large, updating only 20 variables at each iteration, as GRock does, could slow down the convergence, especially when the optimal solution is not very sparse. From this point of view, the strategy used by FLEXA σ = 0.5 seems to strike a good balance between not updating variables that are probably zero at the optimum and nevertheless updating a sizeable amount of variables when needed in order to enhance convergence. Finally, SpaRSA seems to be very insensitive to the degree of sparsity of the solution; it is comparable to our FLEXA on 10,000-variable problems, but is much less effective on very large-scale problems. In conclusion, Fig. 1 and Fig. 2 show that while there is no algorithm in the literature performing equally well on all the simulated (large and very large-scale) problems, the proposed FLEXA is consistently the winner. B. Logistic regression problems The logistic regression problem is described in Example #3 (cf. Section III). For such a problem, we implemented the instance of Algorithm 1 described in the same example. More specifically, the algorithm is essentially the same as described for LASSO, but with the following differences: (a) the approximant P_i is chosen as the second-order approximation of the original function F; (b) the initial τ_i are set to tr(Y^T Y)/2n for all i, where n is the total number of variables and Y = [y_1 y_2 ... y_m]^T; (c) since the optimal value V* is not known for the logistic regression problem, we no longer use Δ(x) as merit function but Z(x), with Z(x) ≜ ‖∇F(x) − Π_{[−c,c]^n}(∇F(x) − x)‖. Here the projection Π_{[−c,c]^n}(z) can be efficiently computed; it acts component-wise on z, since [−c,c]^n = [−c,c] × ... × [−c,c]. Note that Z(x) is a valid optimality measure function; indeed, Z(x) = 0 is equivalent to the standard necessary optimality condition for Problem (1), see [6]. Therefore, wherever Δ(x) was used for the LASSO problems, we now use Z(x) [including in the step-size rule (11)].

TABLE I: Test problems for logistic regression tests (data sets gisette (scaled), colon-cancer, and leukemia, with columns m, n, and c).
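The merit function Z(x) for the ℓ1-regularized case can be sketched as below; an illustrative Python version, assuming the Euclidean norm and the componentwise projection onto [−c, c]^n (the choice of norm is an assumption of this sketch, not stated in the text).

```python
import math

def merit_Z(grad_F, x, c):
    """Z(x) = || grad F(x) - Proj_{[-c,c]^n}( grad F(x) - x ) ||.
    Z(x) = 0 exactly when x satisfies the standard stationarity
    condition of F(x) + c*||x||_1 (natural-residual test)."""
    def clip(z):  # componentwise projection onto [-c, c]
        return max(-c, min(c, z))
    return math.sqrt(sum((g - clip(g - xi)) ** 2 for g, xi in zip(grad_F, x)))
```

For example, at x_i = 0 the i-th term vanishes precisely when |∇_i F(x)| ≤ c, which is the familiar stationarity test for the ℓ1 term.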
We simulated the instances of the logistic regression problem whose essential data features are given in Table I; we downloaded the data from the LIBSVM repository www.csie.ntu.edu.tw/~cjlin/libsvm/, which we refer to for a detailed description of the test problems. In our implementation, the matrix Y is stored in a column-block distributed manner Y = [Y_1 Y_2 ... Y_P], where P is the number of parallel processors. We compared FLEXA σ = 0 and FLEXA σ = 0.5 with the other parallel algorithms, namely: FISTA, SpaRSA, and GRock. We do not report results for the sequential methods (GS and ADMM) because we already ascertained that they are not competitive. The tuning of the free parameters in all the algorithms is the same as in Fig. 1 and Fig. 2. In Fig. 3 we plot the relative error vs. the CPU time (the latter defined as in Fig. 1 and Fig. 2). Note that this time, in order to plot the relative error, we had to preliminarily estimate V* (which, as mentioned, is not known for logistic regression problems). To do so, we ran FLEXA with σ = 0.5 until the merit function value Z(x^k) went below 1e−6, and used the corresponding value of the objective function as an estimate of V*. We remark that we used this value only to plot the curves in Fig. 3. Results on Logistic Regression reinforce the conclusions drawn from the experiments on LASSO problems. Indeed, Fig. 3 clearly shows that on these problems both FLEXA methods significantly and consistently outperform all other solution methods. In conclusion, our experiments indicate that our algorithmic framework can lead to very efficient and practical solution methods for large-scale problems, with the flexibility to adapt to many different problem characteristics.
Fig. 3: Relative error vs. time (in seconds) for Logistic Regression: (a) gisette; (b) colon-cancer; (c) leukemia.

VII. CONCLUSIONS

We proposed a highly parallelizable algorithmic scheme for the minimization of the sum of a possibly nonconvex differentiable function and a possibly nonsmooth but block-separable convex one. Quite remarkably, our framework leads to different (new) algorithms whose degree of parallelism can be chosen by the user, ranging from fully parallel to sequential schemes, all of them converging under the same conditions. Many well-known sequential and simultaneous solution methods in the literature are just special cases of our algorithmic framework. Our preliminary tests are very promising, showing that our algorithms consistently outperform state-of-the-art schemes. Experiments on larger and more varied classes of problems (including those listed in Section II) are the subject of our current research. We also plan to investigate asynchronous versions of Algorithm 1, the latter being a very important issue in many distributed settings.

APPENDIX

We first introduce some preliminary results instrumental to proving both Theorem 1 and Theorem 2. Hereafter, for notational simplicity, we omit the dependence of x̂(y, τ) on τ and write x̂(y). Given S ⊆ N and x ≜ (x_i)_{i=1}^N, we also denote by (x)_S (or interchangeably x_S) the vector whose component i is equal to x_i if i ∈ S, and zero otherwise.

A. Intermediate results

Lemma 4: Let H(x; y) ≜ Σ_{i∈N} h_i(x_i; y_i). Then, the following hold:

(i) H(·; y) is uniformly strongly convex on X with constant c_τ > 0, i.e.,

  (x − w)^T (∇_x H(x; y) − ∇_x H(w; y)) ≥ c_τ ‖x − w‖²,  (12)

for all x, w ∈ X and given y ∈ X;

(ii) ∇_x H(x; ·) is uniformly Lipschitz continuous on X, i.e., there exists 0 < L_∇H < ∞, independent of x, such that

  ‖∇_x H(x; y) − ∇_x H(x; w)‖ ≤ L_∇H ‖y − w‖,  (13)

for all y, w ∈ X and given x ∈ X.

Proof: The proof is standard and thus omitted.

Proposition 5: Consider Problem (1) under A1-A6. Then the mapping X ∋ y ↦ x̂(y) has the following properties:

(a) x̂(·) is Lipschitz continuous on X, i.e., there exists a positive constant L̂ such that

  ‖x̂(y) − x̂(z)‖ ≤ L̂ ‖y − z‖,  ∀y, z ∈ X;  (14)

(b) the set of the fixed points of x̂(·) coincides with the set of stationary solutions of Problem (1); therefore x̂(·) has a fixed point;

(c) for every given y ∈ X and for any set S ⊆ N, it holds that

  (x̂(y) − y)_S^T (∇_x F(y))_S + Σ_{i∈S} g_i(x̂_i(y)) − Σ_{i∈S} g_i(y_i) ≤ −c_τ ‖(x̂(y) − y)_S‖²,  (15)

with c_τ ≜ q min_i τ_i.

Proof: We prove the proposition in the following order: (c), (a), (b).

(c): Given y ∈ X, by definition, each x̂_i(y) is the unique solution of problem (3); it is then not difficult to see that the following holds: for all z_i ∈ X_i,

  (z_i − x̂_i(y))^T ∇_{x_i} h_i(x̂_i(y); y) + g_i(z_i) − g_i(x̂_i(y)) ≥ 0.  (16)

Summing and subtracting ∇_{x_i} P_i(y_i; y) in (16), choosing z_i = y_i, and using P2, we get

  (y_i − x̂_i(y))^T (∇_{x_i} P_i(x̂_i(y); y) − ∇_{x_i} P_i(y_i; y)) + (y_i − x̂_i(y))^T ∇_{x_i} F(y) + g_i(y_i) − g_i(x̂_i(y)) − τ_i (x̂_i(y) − y_i)^T Q_i(y) (x̂_i(y) − y_i) ≥ 0,  (17)

for all i ∈ N. Observing that the first term in (17) is nonpositive and using P1, we obtain

  (y_i − x̂_i(y))^T ∇_{x_i} F(y) + g_i(y_i) − g_i(x̂_i(y)) ≥ c_{τ_i} ‖x̂_i(y) − y_i‖²,

for all i ∈ N. Summing over i ∈ S we get (15).

(a): We use the notation introduced in Lemma 4. Given y, z ∈ X, by optimality and (16), we have, for all v and w in X,

  (v − x̂(y))^T ∇_x H(x̂(y); y) + G(v) − G(x̂(y)) ≥ 0,
  (w − x̂(z))^T ∇_x H(x̂(z); z) + G(w) − G(x̂(z)) ≥ 0.

Setting v = x̂(z) and w = x̂(y), summing the two inequalities above, and adding and subtracting ∇_x H(x̂(y); z), we
obtain:

  (x̂(z) − x̂(y))^T (∇_x H(x̂(z); z) − ∇_x H(x̂(y); z)) ≤ (x̂(y) − x̂(z))^T (∇_x H(x̂(y); z) − ∇_x H(x̂(y); y)).  (18)

Using (12) we can lower bound the left-hand side of (18) as

  (x̂(z) − x̂(y))^T (∇_x H(x̂(z); z) − ∇_x H(x̂(y); z)) ≥ c_τ ‖x̂(z) − x̂(y)‖²,  (19)

whereas the right-hand side of (18) can be upper bounded as

  (x̂(y) − x̂(z))^T (∇_x H(x̂(y); z) − ∇_x H(x̂(y); y)) ≤ L_∇H ‖x̂(y) − x̂(z)‖ ‖y − z‖,  (20)

where the inequality follows from the Cauchy-Schwartz inequality and (13). Combining (18), (19), and (20), we obtain the desired Lipschitz property of x̂(·).

(b): Let x* ∈ X be a fixed point of x̂(·), that is, x* = x̂(x*). Each x̂_i(y) satisfies (16) for any given y ∈ X. For some ξ_i ∈ ∂g_i(x*_i), setting y = x* and using x* = x̂(x*) and the convexity of g_i, (16) reduces to

  (z_i − x*_i)^T (∇_{x_i} F(x*) + ξ_i) ≥ 0,  (21)

for all z_i ∈ X_i and i ∈ N. Taking into account the Cartesian structure of X, the separability of G, and summing (21) over i ∈ N, we obtain (z − x*)^T (∇_x F(x*) + ξ) ≥ 0 for all z ∈ X, with z ≜ (z_i)_{i=1}^N and ξ ≜ (ξ_i)_{i=1}^N ∈ ∂G(x*); therefore x* is a stationary solution of (1). The converse holds because i) x̂(x*) is the unique optimal solution of (3) with y = x*, and ii) x* is also an optimal solution of (3), since it satisfies the minimum principle.

Lemma 6 ([35, Lemma 3.4, p. 121]): Let {X^k}, {Y^k}, and {Z^k} be three sequences of numbers such that Y^k ≥ 0 for all k. Suppose that

  X^{k+1} ≤ X^k − Y^k + Z^k,  k = 0, 1, ...,

and Σ_{k=0}^∞ Z^k < ∞. Then either X^k → −∞, or else {X^k} converges to a finite value and Σ_{k=0}^∞ Y^k < ∞.

Lemma 7: Let {x^k} be the sequence generated by Algorithm 1. Then, there is a positive constant c̃ such that the following holds: for all k ≥ 1,

  (∇_x F(x^k))_{S^k}^T (x̂(x^k) − x^k)_{S^k} + Σ_{i∈S^k} g_i(x̂_i(x^k)) − Σ_{i∈S^k} g_i(x_i^k) ≤ −c̃ ‖x̂(x^k) − x^k‖².  (22)

Proof: Let j_k be an index in S^k such that E_{j_k}(x^k) ≥ ρ max_i E_i(x^k) (Step 3 of Algorithm 1). Then, using the aforementioned bound and (4), it is easy to check that the following chain of inequalities holds:

  s̄_{j_k} ‖(x̂(x^k) − x^k)_{S^k}‖ ≥ s̄_{j_k} ‖x̂_{j_k}(x^k) − x^k_{j_k}‖ ≥ E_{j_k}(x^k) ≥ ρ max_i E_i(x^k) ≥ ρ (min_i s_i) max_i ‖x̂_i(x^k) − x_i^k‖ ≥ (ρ min_i s_i / √N) ‖x̂(x^k) − x^k‖.  (23)

Invoking now Proposition 5(c) with S = S^k and y = x^k, and using (23), (22) holds true, with c̃ ≜ c_τ (ρ min_i s_i / (√N max_j s̄_j))².

B. Proof of Theorem 1

We are now ready to prove the theorem. For any given k ≥ 0, the Descent Lemma [26] yields

  F(x^{k+1}) ≤ F(x^k) + γ^k ∇_x F(x^k)^T (ẑ^k − x^k) + ((γ^k)² L_∇F / 2) ‖ẑ^k − x^k‖²,  (24)

with ẑ^k ≜ (ẑ_i^k)_{i=1}^N and z^k ≜ (z_i^k)_{i=1}^N defined in Steps 3 and 4 (Algorithm 1). Observe that

  ‖ẑ^k − x^k‖² ≤ ‖z^k − x^k‖² ≤ 2 ‖x̂(x^k) − x^k‖² + 2 Σ_{i=1}^N ‖z_i^k − x̂_i(x^k)‖² ≤ 2 ‖x̂(x^k) − x^k‖² + 2 Σ_{i=1}^N (ε_i^k)²,  (25)

where the first inequality follows from the definitions of z^k and ẑ^k, and in the last inequality we used ‖z_i^k − x̂_i(x^k)‖ ≤ ε_i^k. Denoting by S̄^k the complement of S^k, we also have, for k large enough,

  ∇_x F(x^k)^T (ẑ^k − x^k) = ∇_x F(x^k)^T (ẑ^k − x̂(x^k) + x̂(x^k) − x^k)
  = (∇_x F(x^k))_{S^k}^T (z^k − x̂(x^k))_{S^k} + (∇_x F(x^k))_{S̄^k}^T (x^k − x̂(x^k))_{S̄^k} + (∇_x F(x^k))_{S^k}^T (x̂(x^k) − x^k)_{S^k} + (∇_x F(x^k))_{S̄^k}^T (x̂(x^k) − x^k)_{S̄^k}
  = (∇_x F(x^k))_{S^k}^T (z^k − x̂(x^k))_{S^k} + (∇_x F(x^k))_{S^k}^T (x̂(x^k) − x^k)_{S^k},  (26)

where in the second equality we used the definitions of ẑ^k and of the set S^k. Now, using (26) and Lemma 7, we can write
  ∇_x F(x^k)^T (ẑ^k − x^k) + Σ_{i∈S^k} g_i(ẑ_i^k) − Σ_{i∈S^k} g_i(x_i^k)
  = ∇_x F(x^k)^T (ẑ^k − x^k) + Σ_{i∈S^k} g_i(x̂_i(x^k)) − Σ_{i∈S^k} g_i(x_i^k) + Σ_{i∈S^k} g_i(ẑ_i^k) − Σ_{i∈S^k} g_i(x̂_i(x^k))
  ≤ −c̃ ‖x̂(x^k) − x^k‖² + Σ_{i∈S^k} ε_i^k ‖∇_{x_i} F(x^k)‖ + L_G Σ_{i∈S^k} ε_i^k,  (27)

where L_G is a (global) Lipschitz constant for (all) the g_i. Finally, from the definitions of ẑ^k and of the set S^k, we have, for all k large enough,

  V(x^{k+1}) = F(x^{k+1}) + Σ_{i=1}^N g_i(x_i^{k+1}) = F(x^{k+1}) + Σ_{i=1}^N g_i(x_i^k + γ^k (ẑ_i^k − x_i^k))
  ≤ F(x^{k+1}) + Σ_{i=1}^N g_i(x_i^k) + γ^k Σ_{i∈S^k} (g_i(ẑ_i^k) − g_i(x_i^k))
  ≤ V(x^k) − γ^k (c̃ − γ^k L_∇F) ‖x̂(x^k) − x^k‖² + T^k,  (28)

where in the first inequality we used the convexity of the g_i, whereas the second one follows from (24), (25) and (27), with

  T^k ≜ γ^k (L_G + ‖∇_x F(x^k)‖) Σ_{i∈S^k} ε_i^k + (γ^k)² L_∇F Σ_{i=1}^N (ε_i^k)².

Using assumption (v), we can bound T^k as

  T^k ≤ (γ^k)² [ N α₁ (α₂ L_G + 1) + (γ^k)² L_∇F (N α₁ α₂)² ],

which, by assumption (v), implies Σ_{k=0}^∞ T^k < ∞. Since γ^k → 0, it follows from (28) that there exist a positive constant β₁ and a sufficiently large k, say k̄, such that

  V(x^{k+1}) ≤ V(x^k) − γ^k β₁ ‖x̂(x^k) − x^k‖² + T^k,  (29)

for all k ≥ k̄. Invoking Lemma 6 with the identifications X^k = V(x^{k+1}), Y^k = γ^k β₁ ‖x̂(x^k) − x^k‖², and Z^k = T^k, while using Σ_{k=0}^∞ T^k < ∞, we deduce from (29) that either {V(x^k)} → −∞ or else {V(x^k)} converges to a finite value and

  lim_{k→∞} Σ_{t=k̄}^k γ^t ‖x̂(x^t) − x^t‖² < +∞.  (30)

Since V is coercive, V(x) ≥ min_{y∈X} V(y) > −∞, implying that {V(x^k)} is convergent; it follows from (30) and Σ_{k=0}^∞ γ^k = ∞ that lim inf_{k→∞} ‖x̂(x^k) − x^k‖ = 0. Using Proposition 5, we show next that lim_{k→∞} ‖x̂(x^k) − x^k‖ = 0; for notational simplicity we will write Δx̂(x^k) ≜ x̂(x^k) − x^k. Suppose, by contradiction, that lim sup_{k→∞} ‖Δx̂(x^k)‖ > 0. Then, there exists a δ > 0 such that ‖Δx̂(x^k)‖ > 2δ for infinitely many k and also ‖Δx̂(x^k)‖ < δ for infinitely many k. Therefore, one can always find an infinite set of indices, say K, having the following properties: for any k ∈ K, there exists an integer i_k > k such that

  ‖Δx̂(x^k)‖ < δ,  ‖Δx̂(x^{i_k})‖ > 2δ,  (31)
  δ ≤ ‖Δx̂(x^j)‖ ≤ 2δ,  k < j < i_k.
(32)

Given the above bounds, the following holds: for all k ∈ K,

  δ <(a) ‖Δx̂(x^{i_k})‖ − ‖Δx̂(x^k)‖ ≤ ‖x̂(x^{i_k}) − x̂(x^k)‖ + ‖x^{i_k} − x^k‖  (33)
  ≤(b) (1 + L̂) ‖x^{i_k} − x^k‖  (34)
  ≤(c) (1 + L̂) Σ_{t=k}^{i_k−1} γ^t ( ‖(Δx̂(x^t))_{S^t}‖ + ‖(z^t − x̂(x^t))_{S^t}‖ )
  ≤(d) (1 + L̂) (2δ + ε_max) Σ_{t=k}^{i_k−1} γ^t,  (35)

where (a) follows from (31); (b) is due to Proposition 5(a); (c) comes from the triangle inequality, the updating rule of the algorithm, and the definition of ẑ^k; and in (d) we used (31), (32), and ‖z^t − x̂(x^t)‖ ≤ Σ_{i=1}^N ε_i^t, where ε_max ≜ max_k Σ_{i=1}^N ε_i^k < ∞. It follows from (35) that

  lim inf_{K∋k→∞} Σ_{t=k}^{i_k−1} γ^t ≥ δ / ((1 + L̂)(2δ + ε_max)) > 0.  (36)

We show next that (36) is in contradiction with the convergence of {V(x^k)}. To do that, we preliminarily prove that, for sufficiently large k ∈ K, it must be ‖Δx̂(x^k)‖ ≥ δ/2. Proceeding as in (35), we have: for any given k ∈ K,

  ‖Δx̂(x^{k+1})‖ − ‖Δx̂(x^k)‖ ≤ (1 + L̂) ‖x^{k+1} − x^k‖ ≤ (1 + L̂) γ^k ( ‖Δx̂(x^k)‖ + ε_max ).

It turns out that for sufficiently large k ∈ K so that (1 + L̂) γ^k < δ/(δ + 2 ε_max), it must be

  ‖Δx̂(x^k)‖ ≥ δ/2;  (37)

otherwise the condition ‖Δx̂(x^{k+1})‖ ≥ δ would be violated [cf. (32)]. Hereafter we assume without loss of generality that (37) holds for all k ∈ K (in fact, one can always restrict {x^k}_{k∈K} to a proper subsequence). We can now show that (36) is in contradiction with the convergence of {V(x^k)}. Using (29) (possibly over a subsequence), we have: for sufficiently large k ∈ K,

  V(x^{i_k}) ≤ V(x^k) − β₂ Σ_{t=k}^{i_k−1} γ^t ‖Δx̂(x^t)‖² + Σ_{t=k}^{i_k−1} T^t <(a) V(x^k) − β₂ (δ²/4) Σ_{t=k}^{i_k−1} γ^t + Σ_{t=k}^{i_k−1} T^t,  (38)

where in (a) we used (32) and (37), and β₂ is some positive constant. Since {V(x^k)} converges and Σ_{k=0}^∞ T^k < ∞, (38) implies lim_{K∋k→∞} Σ_{t=k}^{i_k−1} γ^t = 0, which contradicts (36). Finally, since the sequence {x^k} is bounded [due to the coercivity of V and the convergence of {V(x^k)}], it has at least one limit point x̄ that must belong to X. By the continuity of x̂(·) [Proposition 5(a)] and lim_{k→∞} ‖x̂(x^k) − x^k‖ = 0, it must be x̂(x̄) = x̄. By Proposition 5(b), x̄ is also a stationary solution of Problem (1).
As a final remark, note that if ε_i^k = 0 for every i and for every k large enough, i.e., if eventually x̂(x^k) is computed exactly, there is no need to assume that G is globally Lipschitz. In fact, in (27) the term containing L_G disappears, and actually all the terms T^k are zero, so all the subsequent derivations are independent of the Lipschitzianity of G.

C. Proof of Theorem 2

We show next that Algorithm 2 is just an instance of the inexact Jacobi scheme described in Algorithm 1 satisfying the convergence conditions in Theorem 1, which proves Theorem 2. It is not difficult to see that this boils down to proving that, for all p ∈ P and i ∈ I_p, the sequence z_{p_i}^k in Step 2a) of Algorithm 2 satisfies

  ‖z_{p_i}^k − x̂_{p_i}(x^k)‖ ≤ ε̃_{p_i}^k,  (39)

for some {ε̃_{p_i}^k} such that Σ_k ε̃_{p_i}^k γ^k < ∞. The following holds for the LHS of (39):

  ‖z_{p_i}^k − x̂_{p_i}(x^k)‖ ≤ ‖x̂_{p_i}(x^{k+1}_{p<}, x^k_{p≥}) − x̂_{p_i}(x^k)‖ + ‖z_{p_i}^k − x̂_{p_i}(x^{k+1}_{p<}, x^k_{p≥})‖
  ≤(a) ‖x̂_{p_i}(x^{k+1}_{p<}, x^k_{p≥}) − x̂_{p_i}(x^k)‖ + ε_{p_i}^k
  ≤(b) L̂ ‖x^{k+1}_{p<} − x^k_{p<}‖ + ε_{p_i}^k
  =(c) L̂ γ^k ‖z^k_{p<} − x^k_{p<}‖ + ε_{p_i}^k
  ≤ L̂ γ^k Σ_{j=1}^{i−1} ( ‖z_{p_j}^k − x̂_{p_j}(x^k)‖ + ‖x̂_{p_j}(x^k) − x^k_{p_j}‖ ) + ε_{p_i}^k
  ≤(d) L̂ γ^k β + L̂ γ^k Σ_{j<i} ε̃_{p_j}^k + ε_{p_i}^k,

where (a) follows from the error bound in Step 2a) of Algorithm 2; in (b) we used Proposition 5(a); (c) follows from Step 2b); and in (d) we used induction, where β < ∞ is a positive constant. It turns out that (39) is satisfied choosing ε̃_{p_i}^k ≜ L̂ γ^k β + L̂ γ^k Σ_{j<i} ε̃_{p_j}^k + ε_{p_i}^k.

REFERENCES

[1] F. Facchinei, S. Sagratella, and G. Scutari, "Flexible parallel algorithms for big data optimization," in Proc. of the IEEE 2014 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, May 4-9, 2014.
[2] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267-288, 1996.
[3] Z. Qin, K. Scheinberg, and D. Goldfarb, "Efficient block-coordinate descent algorithms for the group lasso," Mathematical Programming Computation, vol. 5, June 2013.
[4] A. Rakotomamonjy, "Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms," Signal Processing, vol. 91, no. 7, July 2011.
[5] G.-X. Yuan, K.-W. Chang, C.-J. Hsieh, and C.-J. Lin, "A comparison of optimization methods and software for large-scale l1-regularized linear classification," The Journal of Machine Learning Research, vol. 11, pp. 3183-3234, 2010.
[6] R. H. Byrd, J. Nocedal, and F. Oztoprak, "An inexact successive quadratic approximation method for convex L-1 regularized optimization," arXiv preprint.
[7] K. Fountoulakis and J. Gondzio, "A second-order method for strongly convex L1-regularization problems," arXiv preprint.
[8] Y. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems," SIAM Journal on Optimization, vol. 22, no. 2, pp. 341-362, 2012.
[9] I. Necoara and D. Clipici, "Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC," Journal of Process Control, vol. 23, no. 3, March 2013.
[10] Y. Nesterov, "Gradient methods for minimizing composite functions," Mathematical Programming, vol. 140, pp. 125-161, August 2013.
[11] P. Tseng and S. Yun, "A coordinate gradient descent method for nonsmooth separable minimization," Mathematical Programming, vol. 117, no. 1-2, pp. 387-423, March 2009.
[12] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183-202, 2009.
[13] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Trans. on Signal Processing, vol. 57, no. 7, pp. 2479-2493, July 2009.
[14] J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin, "Parallel coordinate descent for l1-regularized loss minimization," in Proc. of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, June 28-July 2, 2011.
[15] Z. Peng, M. Yan, and W. Yin, "Parallel and distributed sparse optimization," 2013. [Online]. Available: optimization/disparse/
[16] P. Richtarik and M. Takac, "Parallel coordinate descent methods for big data optimization," arXiv preprint.
[17] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126-1153, 2013.
[18] P. Buhlmann and S. van de Geer, Statistics for High-Dimensional Data. Springer, 2011.
[19] S. Sra, S. Nowozin, and S. J. Wright, Eds., Optimization for Machine Learning, ser. Neural Information Processing. Cambridge, Massachusetts: The MIT Press, Sept. 2011.
[20] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, ser. Foundations and Trends in Machine Learning. Now Publishers Inc., Dec. 2011.
[21] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, "Decomposition by partial linearization: Parallel optimization of multi-agent systems," IEEE Trans. on Signal Processing, vol. 62, pp. 641-656, Feb. 2014.
[22] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49-67, 2006.
[23] S. K. Shevade and S. S. Keerthi, "A simple and efficient algorithm for gene selection using sparse logistic regression," Bioinformatics, vol. 19, no. 17, pp. 2246-2253, 2003.
[24] L. Meier, S. van de Geer, and P. Buhlmann, "The group lasso for logistic regression," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 70, no. 1, pp. 53-71, 2008.
[25] D. Goldfarb, S. Ma, and K. Scheinberg, "Fast alternating linearization methods for minimizing the sum of two convex functions," Mathematical Programming, vol. 141, Oct. 2013.
[26] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 2nd ed. Athena Scientific Press, 1989.
[27] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, New York, 2003.
[28] G. Cohen, "Optimization by decomposition and coordination: A unified approach," IEEE Trans. on Automatic Control, vol. 23, no. 2, April 1978.
[29] G. Cohen, "Auxiliary problem principle and decomposition of optimization problems," Journal of Optimization Theory and Applications, vol. 32, no. 3, Nov. 1980.
[30] M. Patriksson, "Cost approximation: a unified framework of descent algorithms for nonlinear programs," SIAM Journal on Optimization, vol. 8, no. 2, 1998.
[31] M. Fukushima and H. Mine, "A generalized proximal point algorithm for certain non-convex minimization problems," International Journal of Systems Science, vol. 12, no. 8, 1981.
[32] H. Mine and M. Fukushima, "A minimization method for the sum of a convex function and a continuously differentiable function," Journal of Optimization Theory and Applications, vol. 33, no. 1, pp. 9-23, Jan. 1981.
[33] Y. Saad, Numerical Methods for Large Eigenvalue Problems, ser. Classics in Applied Mathematics (Book 66). SIAM, revised edition, May 2011.
[34] Z.-Q. Luo and M. Hong, "On the linear convergence of the alternating direction method of multipliers," arXiv preprint.
[35] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Cambridge, Massachusetts: Athena Scientific Press, May 1996.