A Simple Congestion-Aware Algorithm for Load Balancing in Datacenter Networks




Mehrnoosh Shafiee and Javad Ghaderi, Columbia University

Abstract

We study the problem of load balancing in datacenter networks, namely, assigning the end-to-end data flows among the available paths in order to efficiently balance the load in the network. The solutions used today typically rely on the ECMP (Equal Cost Multi Path) mechanism, which essentially attempts to balance the load in the network by hashing the flows to the available shortest paths. However, it is well known that ECMP performs poorly when there is asymmetry either in the network topology or the flow sizes, and thus there has been much interest recently in alternative mechanisms to address these shortcomings. In this paper, we consider a general network topology where each link has a cost which is a convex function of the link utilization. Flows among the various source-destination pairs are generated dynamically over time, each with a size (bandwidth requirement) and a duration. Once a flow is assigned to a path in the network, it consumes bandwidth equal to its size from all the links along its path for its duration. We propose a low-complexity congestion-aware algorithm that assigns the flows to the available paths in an online fashion and without splitting, and prove that it asymptotically minimizes the total network cost. Extensive simulation results are presented to verify the performance of our algorithm under a wide range of traffic conditions and under different datacenter architectures.

I. INTRODUCTION

There has been a dramatic shift over the recent decades with search, storage, and computing moving into large-scale datacenters. Today's datacenters can contain thousands of servers and typically use a multi-tier switch network to provide connectivity among the servers. To maintain efficiency and quality of service, it is essential that the data flows among the servers are mapped to the available paths in the network properly in order to balance the load and minimize the cost (e.g., delay, congestion, etc.).
For example, when a large flow is routed poorly, collision with the other flows can cause some links to become congested, while other less utilized paths are available. The datacenter networks rely on path multiplicity to provide scalability, flexibility, and cost efficiency. Consequently, there has been much research on flow scheduling algorithms that make better use of the path multiplicity (e.g., [] [5]) or designing new networks with better topological features (e.g., FatTree [], VL2 [6], hypercube [7], hypergrid [8], random graphs such as JellyFish [9], etc.).

In this paper, we consider a general network topology where each link is associated with a cost which is a convex function of the link utilization (e.g., this could be a latency function). The network cost is defined as the sum of the link costs. Flows among the various source-destination pairs are generated dynamically over time, where each flow is associated with a size and a duration. Once a flow is assigned to a path in the network, it consumes resource (bandwidth) equal to its size from all the links along its path for its duration. The main question that we ask is the following. Is it possible to design a low-complexity algorithm that assigns the flows to the available paths in an online fashion and without splitting, so as to minimize the average network cost?

In general, multi flow routing in networks has been extensively studied from both networking systems and theoretical perspectives; however, the problem considered here has two key distinguishing objectives. First, it does not allow flow splitting, because splitting the flow is undesirable due to the TCP reordering effect []. Without splitting, many versions of multi flow routing in networks become hard combinatorial problems [], [2]. Second, it allows dynamic routing, because it considers the current utilization of links in the network when making the routing decisions, unlike static solutions where the mapping of flows to the paths is fixed and requires the knowledge of the traffic matrix.

A. Related Work

Seminal solutions for flow scheduling (e.g. [6], [3]) rely on Equal Cost Multi Path (ECMP) load balancing, which statically splits the traffic among available shortest paths (via flow hashing).
However, it is well known [2] [5], [4] that ECMP can balance load poorly since it may map large long-lived flows to the same path, thus causing significant load imbalance. Further, ECMP is suited for symmetric architectures such as FatTree and performs poorly in presence of asymmetry either due to link failures [5] or in recently proposed datacenter architectures [9]. There have been recent efforts to address the shortcomings of ECMP; however, they are mostly heuristics with no performance guarantees. The proposed algorithms range from centralized solutions (e.g., [2], [3]), where a centralized scheduler makes routing decisions based on a global view of the network, to distributed solutions (e.g., [5], [6]), where routing decisions are made in a distributed manner by the switches. There are also host-based protocols based on Multi Path TCP (e.g., [4]), where the routing decisions are made by the end-host transport protocol rather than by the network operator. [7] investigates a more general problem based on a Gibbs sampling technique and proposes a plausible heuristic that requires re-routing and interruption of flows (which is operationally expensive). There are also algorithms that allow flow splitting and try to resolve the packet reordering effect in symmetric network topologies [5], [6], [8].

Software Defined Networking (SDN) has enabled network control with quicker and more flexible adaptation to changes in the network topology or the traffic pattern and can be leveraged to implement centralized or hybrid algorithms in datacenters [], [9], [2].

B. Contribution

We propose and analyze a simple flow scheduling algorithm to minimize the average network cost (the sum of convex functions of link utilizations). Our main contributions can be summarized as below.

We prove that our simple algorithm is asymptotically optimal in any network topology, in the sense that the performance ratio between our algorithm and the optimal cost approaches 1 as the mean number of flows in the system increases.

Our algorithm does not rely on flow splitting, hence packets of the same flow will travel along the same path without reordering. Further, it does not require migration/rerouting of the flows or the knowledge of the traffic pattern.

Our experimental results show that our algorithm in fact performs very well under a wide range of traffic conditions and datacenter network topologies.

For practical implementations, the weight construct in our algorithm can provide an approach to optimally accommodate dynamic variations in datacenter network traffic in centralized control platforms such as OpenFlow [9].

C. Notations

Given a sequence of random variables $\{X_n\}$, $X_n \Rightarrow X$ indicates convergence in distribution, and $X_n \to X$ indicates almost sure convergence. Given a Markov process $\{X(t)\}$, $X(\infty)$ denotes a random variable whose distribution is the same as the steady-state distribution of $X(t)$ (when it exists). $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^n$. $d(x, S) = \min_{s \in S} \|s - x\|$ is the distance of $x$ from the set $S$. u.o.c. means uniformly over compact sets.

D. Organization

The remainder of the paper is organized as follows. In Section II, we introduce the datacenter network and traffic model. Our algorithm is presented in Section III. Section IV is devoted to the main results and performance analysis using fluid limits. Section V contains our simulation results to verify the performance of our algorithm under a wide range of traffic conditions and various datacenter architectures.
The rigorous proofs of some of the results are provided in Section VI. Section VII contains our concluding remarks.

II. MODEL AND PROBLEM STATEMENT

A. Datacenter Network Model

We consider a datacenter (DC) consisting of a set of servers (host machines) connected by a collection of switches and links. Depending on the DC network topology, all or a subset of the switches are directly connected to servers; for example, in FatTree (Figure 1a) only the edge (top-of-the-rack) switches are connected to servers, while in JellyFish (Figure 1b) all the switches have some ports connected to servers.

Fig. 1: Two datacenter networks connecting 16 servers (rectangles) using 4-port switches (circles). (a) FatTree, with edge, aggregation, and core switch layers. (b) JellyFish (random graph).

Nevertheless, we can model any general DC network topology (FatTree, JellyFish, etc.) by a graph $G(V, E)$ where $V$ is the set of switches and $E$ is the set of communication links. A path between two switches is defined as a set of links that connects the switches and does not intersect itself. The paths between the same pair of source-destination switches may intersect with each other or with other paths in the DC.

B. Traffic Model

Each server can generate a flow destined to some other server. We assume that each flow belongs to a set of flow types $J$. A flow of type $j \in J$ is a triple $(a_j, d_j, s_j)$ where $a_j \in V$ is its source switch (i.e., the switch connected to the source server), $d_j \in V$ is its destination switch (i.e., the switch connected to its destination server), and $s_j$ is its size (bandwidth requirement). Note that based on this definition, we only need to find the routing of flows in the switch network $G(V, E)$, since the routing from the source server to the source switch or from the destination switch to the destination server is trivial (it follows the direct link from the server to the switch). Further, two switches can have more than one flow type with different sizes.
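As a minimal sketch of this model (not the authors' code), the flow-type triple $(a_j, d_j, s_j)$ and the set of non-self-intersecting paths between two switches can be represented as below. The names `FlowType` and `simple_paths`, the adjacency-list encoding of $G(V, E)$, and the treatment of links as undirected are all assumptions made for illustration. The number of simple paths can grow exponentially with the graph size, so in practice one would likely restrict attention to a bounded candidate set.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Link = FrozenSet[str]    # assumed: an undirected link between two switches
Path = Tuple[Link, ...]  # a path is a sequence of links visiting no switch twice


@dataclass(frozen=True)
class FlowType:
    src: str     # a_j: source switch
    dst: str     # d_j: destination switch
    size: float  # s_j: bandwidth requirement


def simple_paths(adj: Dict[str, List[str]], src: str, dst: str) -> List[Path]:
    """Enumerate every non-self-intersecting path from src to dst."""
    paths: List[Path] = []

    def dfs(node: str, visited: frozenset, links: List[Link]) -> None:
        if node == dst:
            paths.append(tuple(links))
            return
        for nxt in adj[node]:
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, links + [frozenset((node, nxt))])

    dfs(src, frozenset({src}), [])
    return paths


# Hypothetical four-switch topology with two disjoint paths from A to D.
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
flow = FlowType(src="A", dst="D", size=1.0)
candidate_paths = simple_paths(adj, flow.src, flow.dst)
```

Here `candidate_paths` plays the role of the path set available to one flow type.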
We assume that type-$j$ flows are generated according to a Poisson process with rate $\lambda_j$, and each flow remains in the system for an exponentially distributed amount of time with mean $1/\mu_j$ (we will see in Section V that our algorithm actually performs very well under much more general arrival and service time processes). For any $j \in J$, let $R_j$ denote the set of all paths from $a_j$ to $d_j$; then each type-$j$ flow must be accommodated by using only one of the paths from $R_j$ (i.e., the flow cannot be split among multiple paths). Assume that $R_j$ is nonempty for each $j \in J$. Define $Y_i^{(j)}(t)$ to be the number of type-$j$ flows routed along the path $i \in R_j$ at time $t$. The network state is defined as

\[ Y(t) = \big( Y_i^{(j)}(t);\ i \in R_j,\ j \in J \big). \tag{1} \]

Under any online (Markov) flow scheduling algorithm, $\{Y(t)\}_{t \ge 0}$ evolves as a Markov chain. We also define $X^{(j)}(t) = \sum_{i \in R_j} Y_i^{(j)}(t)$, which is the total number of type-$j$ flows in the network at time $t$. Let $Z_l(t)$ be the total amount of traffic (congestion) over link $l \in E$. Based on our notations,

\[ Z_l(t) = \sum_{j \in J} \ \sum_{i:\, i \in R_j,\ l \in i} s_j\, Y_i^{(j)}(t), \tag{2} \]

(here $l \in i$ means that link $l$ belongs to path $i$). We also define $\rho_j = \lambda_j / \mu_j$, which is the mean offered load by type-$j$ flows.

C. Problem Formulation

For the purpose of load balancing, the network can attempt to optimize different objectives [2] such as minimizing the maximum link utilization in the network or minimizing the sum of link costs, where each link cost is a convex function of the link utilization (e.g. this could be a link latency measure [22]). In this paper, we use the latter objective, but by choosing proper cost functions, an optimal solution to the latter objective can be used to also approximate the former objective, as we see below. We define $g(Z_l / C_l)$ to be the cost of link $l$ with capacity $C_l$ when its congestion is $Z_l$. Our goal is to find a flow scheduling algorithm that assigns each flow to a single path in the network so as to minimize the mean network cost in the long run, specifically,

\[ \text{minimize} \quad \lim_{t \to \infty} \mathbb{E}\big[ F(Y(t)) \big] \tag{3} \]
\[ \text{subject to: serving each flow using one path,} \]

where

\[ F(Y(t)) = \sum_{l \in E} g\big( Z_l(t) / C_l \big). \tag{4} \]

We consider polynomial cost functions of the form

\[ g(x) = \frac{x^{1+\alpha}}{1+\alpha}, \quad \alpha > 0, \tag{5} \]

where $\alpha > 0$ is a constant. Thus $g$ is increasing and strictly convex in $x$. As $\alpha \to \infty$, the optimal solution to (3) approaches the optimal solution of the optimization problem whose objective is to minimize the maximum link utilization in the network.

III. ALGORITHM DESCRIPTION

In this section, we describe our algorithm for flow assignment, where each flow is assigned to one path in the network (no splitting) without interrupting/migrating the ongoing flows in the network. Recall that $Y(t) = (Y_i^{(j)}(t))$ is the network state, $Y_i^{(j)}(t)$ is the number of type-$j$ flows on path $i \in R_j$, and $Z_l(t)$ is the total traffic on link $l$ given by (2). First, we define two forms of link marginal cost that measure the increase in the link cost if an arriving type-$j$ flow at time $t$ is routed using a path that uses link $l$.

Definition 1. (Link marginal cost) For each link $l$ and flow type $j$, the link marginal cost is defined in either of the forms below.

Integral form:

\[ \Delta_l^{(j)}(Y(t)) = g\Big( \frac{Z_l(t) + s_j}{C_l} \Big) - g\Big( \frac{Z_l(t)}{C_l} \Big). \tag{6} \]

Differential form:

\[ \delta_l^{(j)}(Y(t)) = \frac{s_j}{C_l}\, g'\Big( \frac{Z_l(t)}{C_l} \Big). \tag{7} \]
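To make the two forms of the link marginal cost concrete, here is a small sketch under the polynomial cost (5). The function names and the default `alpha=1.0` are assumptions for illustration only. For a flow that is small relative to the link capacity, the integral form is close to the differential form, and by convexity of $g$ it is never smaller.

```python
def g(x: float, alpha: float = 1.0) -> float:
    """Polynomial link cost g(x) = x^(1+alpha) / (1+alpha)."""
    return x ** (1.0 + alpha) / (1.0 + alpha)


def g_prime(x: float, alpha: float = 1.0) -> float:
    """Derivative of the polynomial link cost: g'(x) = x^alpha."""
    return x ** alpha


def integral_marginal_cost(z: float, s: float, c: float, alpha: float = 1.0) -> float:
    """Exact increase in the cost of a link with load z and capacity c
    when a flow of size s is added to it."""
    return g((z + s) / c, alpha) - g(z / c, alpha)


def differential_marginal_cost(z: float, s: float, c: float, alpha: float = 1.0) -> float:
    """First-order approximation of the same increase: (s / c) * g'(z / c)."""
    return (s / c) * g_prime(z / c, alpha)


# Hypothetical link: load 5, capacity 10, arriving flow of size 0.1.
exact = integral_marginal_cost(5.0, 0.1, 10.0)
approx = differential_marginal_cost(5.0, 0.1, 10.0)
print(exact, approx)
```

With these hypothetical numbers the two values differ only in the third significant digit, which is consistent with the two forms being interchangeable when individual flows are small.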
Based on the link marginal costs, we can characterize the increase in the network cost if an arriving type-$j$ flow at time $t$ is routed using path $i \in R_j$. Specifically, let $Y(t^+) = Y(t) + e_i^{(j)}$, where $e_i^{(j)}$ denotes a vector whose entry corresponding to path $i$ and flow type $j$ is one, and whose other entries are zero. Then $F(Y(t))$ is the network cost before the type-$j$ flow arrival, and $F(Y(t^+))$ is the network cost after assigning the type-$j$ flow to path $i$. Then, it is easy to see that

\[ F(Y(t^+)) - F(Y(t)) = \sum_{l \in i} \Big[ g\Big( \frac{Z_l(t) + s_j}{C_l} \Big) - g\Big( \frac{Z_l(t)}{C_l} \Big) \Big] = \sum_{l \in i} \Delta_l^{(j)}(Y(t)). \tag{8} \]

Similarly, based on the differential marginal costs, we have

\[ \frac{\partial F(Y(t))}{\partial Y_i^{(j)}(t)} = \sum_{l \in i} \frac{s_j}{C_l}\, g'\Big( \frac{Z_l(t)}{C_l} \Big) = \sum_{l \in i} \delta_l^{(j)}(Y(t)). \tag{9} \]

Algorithm 1 describes our flow assignment algorithm, which essentially places the newly generated flow on a path that minimizes the increase in the network cost based on either of the forms (8) or (9).

Algorithm 1 Flow Scheduling Algorithm

Suppose a type-$j$ flow arrives at time $t$ when the system is in state $Y(t)$. Then,

1: Compute the path marginal costs $w_i^{(j)}(Y(t))$, $i \in R_j$, in either of the forms below:

Integral form:

\[ w_i^{(j)}(Y(t)) = \sum_{l \in i} \Delta_l^{(j)}(Y(t)), \tag{10} \]

Differential form:

\[ w_i^{(j)}(Y(t)) = \sum_{l \in i} \delta_l^{(j)}(Y(t)). \tag{11} \]

2: Place the flow on a path $i^\star$ such that

\[ i^\star = \arg\min_{i \in R_j} w_i^{(j)}(Y(t)). \tag{12} \]

Break ties in (12) uniformly at random.

Upon arrival of a flow, Algorithm 1 takes the corresponding feasible paths and their link congestions into account for computing the path marginal costs $w_i^{(j)}(t)$, but it does not need to know any information about the other links in the network. The two forms (10) and (11) are essentially identical in our asymptotic performance analysis in the next section; however, the differential form (11) seems slightly easier to work with. Algorithm 1 can be implemented either centrally or in a distributed manner using a distributed shortest path algorithm that uses the link marginal costs, $\Delta_l^{(j)}(t)$ or $\delta_l^{(j)}(t)$, as link weights.

IV. PERFORMANCE ANALYSIS VIA FLUID LIMITS

The system state $\{Y(t)\}_{t \ge 0}$ is a stochastic process which is not easy to analyze; therefore we analyze the fluid limits of the system instead. Fluid limits can be interpreted as the first order approximation to the original process $\{Y(t)\}_{t \ge 0}$ and provide valuable qualitative insight into the operation of the algorithm. In this section, we introduce the fluid limits of the process $\{Y(t)\}_{t \ge 0}$ and present our main result regarding the convergence of our algorithm to the optimal cost. We deliberately defer the rigorous claims and proofs about the fluid limits to Section VI and for now mainly focus on the convergence analysis to the optimal cost, which is the main contribution of this paper.

A. Informal Description of Fluid Limit Process

In order to obtain the fluid limits, we scale the process in rate and space. Specifically, consider a sequence of systems $\{Y^r(t)\}_{t \ge 0}$ indexed by a sequence of positive numbers $r$, each governed by the same statistical laws as the original system with the flow arrival rates $r\lambda_j$, $j \in J$, and initial state $Y^r(0)$ such that $Y^r(0)/r \to y(0)$ as $r \to \infty$ for some fixed $y(0)$. The fluid-scale process is defined as $y^r(t) = Y^r(t)/r$, $t \ge 0$. We also define $y^r(\infty) = Y^r(\infty)/r$, the random state of the fluid-scale process in steady state. If the sequence of processes $\{y^r(t)\}_{t \ge 0}$ converges to a process $\{y(t)\}_{t \ge 0}$ (uniformly over compact time intervals, with probability 1) as $r \to \infty$, the process $\{y(t)\}_{t \ge 0}$ is called the fluid limit. Then, $y_i^{(j)}(t)$ is the fluid limit number of type-$j$ flows routed through path $i$. Accordingly, we define $z_l^r(t) = Z_l^r(t)/r$ and $x^{(j)r}(t) = X^{(j)r}(t)/r$ and their corresponding limits as $z_l(t)$ and $x^{(j)}(t)$ as $r \to \infty$. The fluid limits under Algorithm 1 follow possibly random trajectories, but they satisfy the following set of differential equations. We state the result as the following lemma, whose proof can be found in Section VI.

Lemma 1. (Fluid equations) Any fluid limit $y(t)$ satisfies the following equations.
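For any $j \in J$ and $i \in R_j$,

\[ \frac{d}{dt} y_i^{(j)}(t) = \lambda_j\, p_i^{(j)}(y(t)) - \mu_j\, y_i^{(j)}(t), \tag{13a} \]
\[ p_i^{(j)}(y(t)) = 0 \quad \text{if } i \notin \arg\min_{k \in R_j} w_k^{(j)}(y(t)), \tag{13b} \]
\[ p_i^{(j)}(y(t)) \ge 0, \qquad \sum_{i \in R_j} p_i^{(j)}(y(t)) = 1, \tag{13c} \]
\[ w_i^{(j)}(y(t)) = \sum_{l \in i} \frac{s_j}{C_l}\, g'\big( z_l(t) / C_l \big). \tag{13d} \]

Equation (13a) is simply an accounting identity for $y_i^{(j)}(t)$ stating that, on the fluid scale, the number of type-$j$ flows over path $i \in R_j$ increases at rate $\lambda_j\, p_i^{(j)}(y(t))$ and decreases at rate $\mu_j\, y_i^{(j)}(t)$ due to departures of type-$j$ flows on path $i$. $p_i^{(j)}(y(t))$ is the fraction of type-$j$ flow arrivals placed on path $i$. $w_i^{(j)}(y(t))$ is the fluid-limit marginal cost of routing type-$j$ flows in path $i$ when the system is in state $y(t)$. Equation (13b) follows from (12) and states that the flows can only be placed on the paths which have the minimum marginal cost $\min_{i \in R_j} w_i^{(j)}(y(t))$. It follows from (13a) and (13c) that the total number of type-$j$ flows in the system, i.e., $x^{(j)}(t) = \sum_{i \in R_j} y_i^{(j)}(t)$, follows a deterministic trajectory described by the following equation,

\[ \frac{d}{dt} x^{(j)}(t) = \lambda_j - \mu_j\, x^{(j)}(t), \quad j \in J, \tag{14} \]

which clearly implies that

\[ x^{(j)}(t) = \rho_j + \big( x^{(j)}(0) - \rho_j \big) e^{-\mu_j t}, \quad j \in J. \tag{15} \]

Consequently, at steady state,

\[ x^{(j)}(\infty) = \rho_j, \quad j \in J, \tag{16} \]

which means that, in steady state, there is a total of $\rho_j$ type-$j$ flows on the fluid scale.

B. Main Result and Asymptotic Optimality

In this section, we state our main result regarding the asymptotic optimality of our algorithm. First note that by (16), the values of $y(\infty)$ are confined to a convex compact set $\Upsilon$ defined below:

\[ \Upsilon \triangleq \Big\{ y = (y_i^{(j)}) :\ y_i^{(j)} \ge 0,\ \sum_{i \in R_j} y_i^{(j)} = \rho_j,\ \forall j \in J \Big\}. \tag{17} \]

Consider the problem of minimizing the network cost in steady state on the fluid scale (the counterpart of the optimization (3)),

\[ \min\ F(y) \tag{18a} \]
\[ \text{s.t.}\ y \in \Upsilon. \tag{18b} \]

Denote by $\Upsilon^\star \subseteq \Upsilon$ the set of optimal solutions to the optimization (18). The following proposition states that the fluid limits of our algorithm indeed converge to an optimal solution of the optimization (18).

Proposition 1. Consider the fluid limits of the system under Algorithm 1 with initial condition $y(0)$; then as $t \to \infty$,

\[ d\big( y(t), \Upsilon^\star \big) \to 0. \tag{19} \]

Convergence is uniform over initial conditions chosen from a compact set.

The theorem below makes the connection between the fluid limits and the original optimization problem (3). It states the main result of this paper, which is the asymptotic optimality of Algorithm 1.

Theorem 1. Let $Y^r(t)$ and $Y^r_{\mathrm{opt}}(t)$ be respectively the system trajectories under Algorithm 1 and any optimal algorithm for the optimization (3). Then in steady state,

\[ \lim_{r \to \infty} \frac{\mathbb{E}\big[ F(Y^r(\infty)) \big]}{\mathbb{E}\big[ F(Y^r_{\mathrm{opt}}(\infty)) \big]} = 1. \tag{20} \]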
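The placement rule of Algorithm 1, to which Theorem 1 refers, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the differential form of the path marginal cost with the polynomial link cost (so $g'(x) = x^\alpha$), the names `path_marginal_cost` and `place_flow` are hypothetical, links are identified by arbitrary string keys, and ties are detected by exact floating-point equality.

```python
import random
from typing import Dict, List, Sequence


def path_marginal_cost(path: Sequence[str], load: Dict[str, float],
                       capacity: Dict[str, float], size: float,
                       alpha: float = 1.0) -> float:
    """Sum over the links of the path of (s_j / C_l) * (Z_l / C_l)^alpha."""
    return sum((size / capacity[l]) * (load[l] / capacity[l]) ** alpha
               for l in path)


def place_flow(paths: List[Sequence[str]], load: Dict[str, float],
               capacity: Dict[str, float], size: float,
               alpha: float = 1.0, rng=random) -> int:
    """Pick a minimum-marginal-cost path (ties broken uniformly at random),
    add the flow's size to every link on it, and return the path index."""
    costs = [path_marginal_cost(p, load, capacity, size, alpha) for p in paths]
    best = min(costs)
    candidates = [i for i, c in enumerate(costs) if c == best]
    choice = rng.choice(candidates)
    for l in paths[choice]:
        load[l] += size
    return choice


# Hypothetical example: a two-hop path over links "a", "b" versus a
# one-hop path over the more congested link "c".
capacity = {"a": 10.0, "b": 10.0, "c": 10.0}
load = {"a": 4.0, "b": 1.0, "c": 6.0}
chosen = place_flow([("a", "b"), ("c",)], load, capacity, size=1.0)
```

When a flow departs, the caller would subtract its size from the same links; no existing flow is ever moved, which matches the no-migration property stated above.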

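As an example of an optimal algorithm for (3), consider one that, every time a flow arrives or departs, re-routes the existing flows in the network in order to minimize the network cost at all times. Of course, this requires solving a complex combinatorial problem every time a flow arrives/departs, and further it interrupts/migrates the existing flows. Under any algorithm (including our algorithm and the optimal one), the mean number of flows in the system in steady state is $O(r)$. Thus by Theorem 1, Algorithm 1 has roughly the same cost as the optimal cost when the number of flows in the system is large, but at much lower complexity and with no migrations/interruptions. The rest of this section is devoted to the proof of Proposition 1. The proof of Theorem 1 relies on Proposition 1 and is provided in Section VI.

C. Proof of Proposition 1

We first characterize the set of optimal solutions $\Upsilon^\star$ using KKT conditions in the lemma below.

Lemma 2. Let $\Gamma_j = \{ i \in R_j : y_i^{(j)} > 0 \} \subseteq R_j$, $j \in J$. A vector $y \in \Upsilon^\star$ iff $y \in \Upsilon$ and there exists a vector $\eta \ge 0$ such that

\[ w_i^{(j)}(y) = \eta_j, \quad i \in \Gamma_j, \tag{21a} \]
\[ w_i^{(j)}(y) \ge \eta_j, \quad i \in R_j \setminus \Gamma_j, \tag{21b} \]

where $w_i^{(j)}(\cdot)$ is defined in (13d).

Proof of Lemma 2. Consider the following optimization problem,

\[ \min\ F(y) \tag{22a} \]
\[ \text{s.t.}\ \sum_{i \in R_j} y_i^{(j)} \ge \rho_j, \quad j \in J, \tag{22b} \]
\[ y_i^{(j)} \ge 0, \quad j \in J,\ i \in R_j. \tag{22c} \]

Since $F(y)$ is a strictly increasing function with respect to $y_i^{(j)}$, for all $j \in J$, $i \in R_j$, it is easy to check that the optimization (18) has the same set of optimal solutions as the optimization (22). Moreover, both optimizations have the same optimal value. Hence we can use the Lagrange multipliers $\eta_j \ge 0$ and $\nu_i^{(j)} \ge 0$ to characterize the Lagrangian as follows.

\[ L(\eta, \nu, y) = F(y) + \sum_{j \in J} \eta_j \Big( \rho_j - \sum_{i \in R_j} y_i^{(j)} \Big) - \sum_{j \in J} \sum_{i \in R_j} \nu_i^{(j)} y_i^{(j)} \]
\[ = \sum_{l \in E} g\Big( \frac{1}{C_l} \sum_{j \in J} \ \sum_{i:\, i \in R_j,\ l \in i} s_j\, y_i^{(j)} \Big) + \sum_{j \in J} \Big[ \eta_j \rho_j - \sum_{i \in R_j} \big( \eta_j + \nu_i^{(j)} \big) y_i^{(j)} \Big]. \tag{23} \]

From KKT conditions [23], $y \in \Upsilon^\star$ if and only if there exist vectors $\eta$ and $\nu$ such that the following holds.

Feasibility:

\[ y \in \Upsilon, \tag{24a} \]
\[ \eta_j \ge 0, \quad j \in J, \tag{24b} \]
\[ \nu_i^{(j)} \ge 0, \quad j \in J,\ i \in R_j. \tag{24c} \]

Complementary slackness:

\[ \eta_j \Big( \rho_j - \sum_{i \in R_j} y_i^{(j)} \Big) = 0, \quad j \in J, \tag{25a} \]
\[ \nu_i^{(j)} y_i^{(j)} = 0, \quad j \in J,\ i \in R_j. \tag{25b} \]

Stationarity:

\[ \frac{\partial L(\eta, \nu, y)}{\partial y_i^{(j)}} = 0, \quad j \in J,\ i \in R_j. \tag{26a} \]

Note that (24a) implies (25a). It follows from (26a) that

\[ \frac{\partial F(y)}{\partial y_i^{(j)}} = \eta_j + \nu_i^{(j)}, \quad j \in J,\ i \in R_j. \tag{27} \]

Define $\Gamma_j$ as in the statement of the lemma. Note that $\Gamma_j$ is nonempty for all $j \in J$ by (24a).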
Then combining (25b) and (27), and noting that $\partial F(y) / \partial y_i^{(j)} = w_i^{(j)}(y)$ by definition, yields (21a)-(21b).

Next, we show that the set of optimal solutions $\Upsilon^\star$ is an invariant set of the fluid limits, using the fluid limit equations (13a)-(13d) and Lemma 2.

Lemma 3. $\Upsilon^\star$ is an invariant set for the fluid limits, i.e., starting from any initial condition $y(0) \in \Upsilon^\star$, $y(t) \in \Upsilon^\star$ for all $t \ge 0$.

Proof of Lemma 3. Consider a type-$j$ flow and let $I^{(j)}(t) = \arg\min_{i \in R_j} w_i^{(j)}(y(t))$ be the set of paths with the minimum path marginal cost. Note that $\sum_{i \in I^{(j)}(t)} p_i^{(j)}(t) = 1$, $t \ge 0$, by (13b); therefore

\[ \frac{d}{dt} \Big( \sum_{i \in I^{(j)}(t)} y_i^{(j)}(t) \Big) = \lambda_j - \mu_j \sum_{i \in I^{(j)}(t)} y_i^{(j)}(t). \tag{28} \]

Since $y(0) \in \Upsilon^\star$, it follows from Lemma 2 that $\sum_{i \in I^{(j)}(0)} y_i^{(j)}(0) = \rho_j$. Hence, Equation (28) has a unique solution for $\sum_{i \in I^{(j)}(t)} y_i^{(j)}(t)$, which is

\[ \sum_{i \in I^{(j)}(t)} y_i^{(j)}(t) = \rho_j, \quad t \ge 0. \tag{29} \]

On the other hand, since $x^{(j)}(0) = \rho_j$, by (15),

\[ x^{(j)}(t) = \sum_{i \in R_j} y_i^{(j)}(t) = \rho_j, \quad t \ge 0. \tag{30} \]

Equations (29) and (30) imply that, at any time $t \ge 0$, $y_i^{(j)}(t) = 0$ for $i \notin I^{(j)}(t)$, and $y_i^{(j)}(t) \ge 0$ for $i \in I^{(j)}(t)$ such that $\sum_{i \in I^{(j)}(t)} y_i^{(j)}(t) = \rho_j$. Hence, $y(t) = (y_i^{(j)}(t)) \in \Upsilon^\star$ by using $\eta_j(t) = \min_{i \in R_j} w_i^{(j)}(y(t))$ in Lemma 2.

Next, we show that the fluid limits indeed converge to the invariant set $\Upsilon^\star$ starting from an initial condition in $\Upsilon$.

Lemma 4. (Convergence to the invariant set) Consider the fluid limits of the system under Algorithm 1 with initial condition $y(0) \in \Upsilon$; then

\[ d\big( y(t), \Upsilon^\star \big) \to 0. \tag{31} \]

Also, convergence is uniform over the set of initial conditions $\Upsilon$.

Proof of Lemma 4. Starting from $y(0) \in \Upsilon$, (15) implies that

\[ x^{(j)}(t) = \sum_{i \in R_j} y_i^{(j)}(t) = \rho_j, \quad j \in J, \tag{32} \]

at any time $t \ge 0$. To show convergence of $y(t)$ to the set $\Upsilon^\star$, we use a Lyapunov argument. Specifically, we choose $F(\cdot)$ as the Lyapunov function and show that $(d/dt) F(y(t)) < 0$ if $y(t) \notin \Upsilon^\star$. Let $\eta_j(y(t)) = \min_{i \in R_j} w_i^{(j)}(y(t))$. Then

\[ \frac{d}{dt} F(y(t)) = \sum_{j \in J} \sum_{i \in R_j} \frac{\partial F(y)}{\partial y_i^{(j)}} \frac{d y_i^{(j)}(t)}{dt} \]
\[ = \sum_{j \in J} \sum_{i \in R_j} w_i^{(j)}(y(t)) \big[ \lambda_j\, p_i^{(j)}(t) - \mu_j\, y_i^{(j)}(t) \big] \]
\[ = \sum_{j \in J} \mu_j \Big[ \rho_j \sum_{i \in R_j} w_i^{(j)}(y(t))\, p_i^{(j)}(t) - \sum_{i \in R_j} w_i^{(j)}(y(t))\, y_i^{(j)}(t) \Big] \]
\[ \overset{(a)}{=} \sum_{j \in J} \mu_j \Big[ \rho_j\, \eta_j(y(t)) - \sum_{i \in R_j} w_i^{(j)}(y(t))\, y_i^{(j)}(t) \Big] \tag{33} \]
\[ \overset{(b)}{<} \sum_{j \in J} \mu_j \Big[ \rho_j\, \eta_j(y(t)) - \eta_j(y(t)) \sum_{i \in R_j} y_i^{(j)}(t) \Big] \]
\[ \overset{(c)}{=} 0. \]

Equality (a) follows from the fact that $p_i^{(j)}(t) = 0$ if $w_i^{(j)}(t) > \eta_j(t)$ by (13b). Inequality (b) follows from the fact that $y(t) \notin \Upsilon^\star$, so by Lemma 2, there exists an $i \in R_j$ such that $y_i^{(j)}(t) > 0$ but $w_i^{(j)}(y(t)) > \eta_j(y(t))$. Equality (c) holds because of (32).

Now we are ready to complete the proof of Proposition 1, i.e., to show that starting from any initial condition in a compact set, uniform convergence to the invariant set $\Upsilon^\star$ holds.

Proof of Proposition 1. First note that $(d/dt) F(y(t))$ (as given by (33)) is a continuous function with respect to $y(t) = (y_i^{(j)}(t))$. This is because the path marginal costs $w_i^{(j)}(y(t))$ are continuous functions of $y(t)$ and so is their minimum $\eta_j(y(t)) = \min_{i \in R_j} w_i^{(j)}(y(t))$. Next, note that by Lemma 4, for any $\epsilon_1 > 0$ and $a \in \Upsilon$, there exists an $\epsilon_2 > 0$ such that if $F(a) - F(\Upsilon^\star) \ge \epsilon_1$ then $(d/dt) F(y(t)) \big|_{y(t) = a} \le -\epsilon_2$. By the continuity of $(d/dt) F(y(t))$ in $y(t)$, there exists a $\delta > 0$ such that $\|y(t) - a\| \le \delta$ implies $\big| (d/dt) F(y(t)) - (d/dt) F(a) \big| \le \epsilon_2 / 2$. Therefore, for all $y(t)$ such that $\|y(t) - a\| \le \delta$, $(d/dt) F(y(t)) \le -\epsilon_2 / 2$. By (15), for any $\delta > 0$, we can find $t_\delta$ large enough such that for all $t > t_\delta$, $\|y(t) - a\| \le \delta$ for some $a \in \Upsilon$. Putting everything together, for any $\epsilon_1 > 0$, there exists $\epsilon_2 > 0$ such that if $F(y(t)) - F(\Upsilon^\star) \ge \epsilon_1$ then $(d/dt) F(y(t)) \le -\epsilon_2 / 2$. This completes the proof of Proposition 1.

V. SIMULATION RESULTS

In this section, we provide simulation results and evaluate the performance of our algorithm under a wide range of traffic conditions in the following datacenter architectures:

FatTree, which consists of a collection of edge, aggregation, and core switches and offers equal length paths between the edge switches. Figure 1a shows a FatTree with 16 servers and 8 4-port edge switches. For simulations, we consider a FatTree with 128 servers and 32 8-port edge switches.

JellyFish, which is a random graph in which each switch has $k$ ports, out of which $r$ ports are used for connection to other switches and the remaining $k - r$ ports are used for connection to servers. Figure 1b shows a JellyFish with 4-port switches, and $k = 4$, $r = 2$ for all the switches. For simulations, we consider a JellyFish constructed using 2 8-port switches and servers. Each 8-port switch is connected to 5 servers and the 3 remaining links are randomly connected to other switches (this corresponds to $k = 8$, $r = 3$ for all the switches).

Our rationale for selecting these architectures stems from the fact that they are on two opposing sides of the spectrum of topologies: while FatTree is a highly structured topology, JellyFish is a random topology; hence they should provide a good estimate for the robustness of our algorithm to different network topologies and possible link failures.

We generate the flows under two different traffic models, to which we refer as the exponential model and the empirical model:

Exponential model: Flows are generated per Poisson processes and have exponentially distributed durations. The parameters of the duration distribution are chosen uniformly at random from .5 to .5 for different flows. The flow sizes are chosen according to a log-normal distribution.

Empirical model: Flows are generated based on recent empirical studies on characterization of datacenter traffic. As suggested by these studies, we consider log-normal inter-arrival times [24], service times based on the empirical result in [], and log-normal flow sizes [24]. Particularly, most periods of congestion tend to be short lived, namely, more than 9% of the flows that are more than second long are no longer than 2 seconds [].
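A workload in the spirit of the exponential model can be generated as sketched below. This is an illustrative assumption, not the authors' simulator: the function name `generate_flows` and every parameter value in the usage example are hypothetical, since the concrete settings are not recoverable from the text. Note also that Python's `lognormvariate(mu, sigma)` takes the mean and standard deviation of the underlying normal distribution, which may differ from how the log-normal parameters are specified in the paper.

```python
import random
from typing import List, Tuple


def generate_flows(arrival_rate: float, mean_duration: float,
                   size_mu: float, size_sigma: float,
                   horizon: float, seed: int = 0) -> List[Tuple[float, float, float]]:
    """Return (arrival_time, duration, size) triples on [0, horizon).

    Arrivals form a Poisson process (exponential inter-arrival times),
    durations are exponential, and sizes are log-normal.
    """
    rng = random.Random(seed)
    flows = []
    t = rng.expovariate(arrival_rate)
    while t < horizon:
        duration = rng.expovariate(1.0 / mean_duration)
        size = rng.lognormvariate(size_mu, size_sigma)
        flows.append((t, duration, size))
        t += rng.expovariate(arrival_rate)
    return flows


# Hypothetical parameters, chosen only to exercise the generator.
flows = generate_flows(arrival_rate=5.0, mean_duration=1.0,
                       size_mu=-3.0, size_sigma=1.0, horizon=100.0, seed=1)
```

Scaling `arrival_rate` while keeping the other parameters fixed is one way to vary the traffic intensity, in line with the scaling by $r$ used in the analysis.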
In both models, the flow sizes are log-normal with mean . and standard deviation . This generates flow sizes ranging from .% to 4% of link capacity, which captures the nature of flow sizes in terms of mice and elephant flows. Furthermore, we consider a random traffic pattern, i.e., source and destination of flows are chosen uniformly at random. The link cost parameter $\alpha$ is chosen to be in these simulations. Under both models, to change the traffic intensity, we keep the other parameters fixed and scale the arrival rates (with parameter $r$). We report the simulation results in terms of the performance ratio between our algorithm and a benchmark algorithm (similar to (20)). Since the optimal algorithm is hard to implement, we instead use a convex relaxation method to find a lower-bound on the optimal cost at each time. Specifically, every time a flow arrives or departs, we use [25] to minimize $F(Y(t))$ by relaxing the combinatorial constraints, i.e., allowing splitting of flows among multiple paths and rerouting the existing flows. We compare the network cost under our algorithm (Algorithm 1) and traditional ECMP, normalized by the lower-bound on the optimal solution (to which we refer to as in the plots).

Fig. 2: Convergence of the network cost under Algorithm 1, normalized with the lower-bound on the optimal solution, to 1. The scaling parameter r is here. (a) Convergence in FatTree. (b) Convergence in JellyFish. Axes: time (horizontal), normalized network cost (vertical).

Fig. 3: Performance ratio of Algorithm 1 and ECMP in FatTree, normalized with the lower-bound. (a) Exponential traffic model. (b) Empirical traffic model. Axes: traffic intensity (horizontal), normalized network cost (vertical).

A. Experimental Results for FatTree

Figure 2a shows that the aggregate cost under Algorithm 1 indeed converges to the optimal solution (the normalized cost ratio goes to 1), which verifies Theorem 1. Figures 3a and 3b show the cost performance under Algorithm 1 and ECMP, normalized by the lower-bound, under the exponential and the empirical traffic models respectively. The traffic intensity is measured in terms of the ratio between the steady state offered load and the bisection bandwidth.
For FatTree, the bisection bandwidth depends on the number of core switches and their number of ports. As we can see, our algorithm is very close to the lower bound on the optimal value (1) for light, medium, and high traffic intensities. The results also suggest that Theorem 1 indeed holds under more general arrival and service time processes. In these simulations, our algorithm gave a performance improvement ranging from 5% to more than %, compared to ECMP, depending on the traffic intensity, under the empirical traffic model.

[Fig. 4: Performance ratio of Algorithm 1 and ECMP in JellyFish, normalized with the lower bound (1). Panels: (a) Exponential traffic model; (b) Empirical traffic model. Axes: normalized network cost versus traffic intensity. Curves: ECMP, Alg. 1.]

B. Experimental Results for JellyFish

Figure 2b shows that the aggregate cost under Algorithm 1 indeed converges to the optimal solution, which again verifies Theorem 1. Figures 4a and 4b compare the performance of Algorithm 1 and ECMP, normalized with the lower bound on the optimal solution (1), under both the exponential and empirical traffic models. As before, the traffic intensity is measured by the ratio between the steady-state offered load and the bisection bandwidth. To determine the bisection bandwidth, we have used the bounds reported in [26], [27] for regular random graphs. Again we see that our algorithm performs very well under light, medium, and high traffic alike. In JellyFish, our algorithm yields performance gains ranging from 6% to 7%, compared to ECMP, under the empirical traffic model.

VI. FORMAL PROOFS OF FLUID LIMITS AND THEOREM 1

A. Proof of Fluid Limits

We prove the existence of fluid limits under our algorithm and derive the corresponding fluid equations (3a)-(3d). Arguments in this section are quite standard [28], [29], [30]. Recall that $Y^r(t)$ is the system state with the flow arrival rates $r\lambda_j$, $j \in J$, and initial state $Y^r(0)$. The fluid-scale process is $y^r(t) = Y^r(t)/r$, $t \in [0, \infty)$. Similarly, $z^r(t) = Z^r(t)/r$ and $x^{(j)r}(t) = X^{(j)r}(t)/r$ are defined. We assume that $y^r(0) \to y(0)$ as $r \to \infty$ for some fixed $y(0)$.
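To build intuition for the fluid scaling $y^r(t) = Y^r(t)/r$, the sketch below simulates the simplest possible case: a single flow type on a single path, so routing plays no role and only the total flow count matters. Arrivals occur at rate $r\lambda$ and each flow departs at rate $\mu$; the scaled count is compared with the solution of $dx/dt = \lambda - \mu x$, $x(0) = 0$. The function names, the default parameter values, and the choice of an empty initial state are assumptions made for illustration only.

```python
import math
import random


def simulate_scaled_flows(r, lam, mu, horizon, rng):
    """Simulate the flow count X^r(t) with arrival rate r*lam and per-flow
    departure rate mu, starting empty. Returns [(t, X^r(t)/r)] at event times."""
    t, x = 0.0, 0
    path = [(0.0, 0.0)]
    while True:
        total_rate = r * lam + mu * x
        t += rng.expovariate(total_rate)  # time to the next event
        if t > horizon:
            break
        if rng.random() < r * lam / total_rate:
            x += 1  # arrival
        else:
            x -= 1  # departure
        path.append((t, x / r))
    return path


def fluid_path(t, lam, mu):
    """Solution of dx/dt = lam - mu * x with x(0) = 0."""
    return (lam / mu) * (1.0 - math.exp(-mu * t))


def max_deviation(r, lam=1.0, mu=1.0, horizon=5.0, seed=0):
    """Largest gap between the scaled sample path and the fluid path,
    evaluated at the event times of the simulation."""
    rng = random.Random(seed)
    path = simulate_scaled_flows(r, lam, mu, horizon, rng)
    return max(abs(x - fluid_path(t, lam, mu)) for t, x in path)


if __name__ == "__main__":
    for r in (10, 100, 1000, 10000):
        print(r, max_deviation(r))
```

Running the loop for increasing `r` should show the gap shrinking, which is the single-class analogue of the u.o.c. convergence used in the proofs.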

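We first show that, under Algorithm 1, the limit of the process $\{y^r(t)\}_{t \ge 0}$ exists along a subsequence of $r$. The process $Y^r(t)$ can be constructed as follows:

\[
Y_i^{(j)r}(t) = Y_i^{(j)r}(0) + \Pi^a_{i,j}\Big(\int_0^t P_i^{(j)}(Y^r(s))\, r\lambda_j \, ds\Big) - \Pi^d_{i,j}\Big(\int_0^t \mu_j Y_i^{(j)r}(s)\, ds\Big), \quad j \in J,\ i \in R_j, \tag{34}
\]

where $\Pi^a_{i,j}(\cdot)$ and $\Pi^d_{i,j}(\cdot)$ are independent unit-rate Poisson processes, and $P_i^{(j)}(Y^r(t))$ is the probability of assigning a type-$j$ flow to path $i$ when the system state is $Y^r(t)$. Note that by the Functional Strong Law of Large Numbers [3], almost surely,

\[
\tfrac{1}{r}\Pi^a_{i,j}(rt) \to t, \ \text{u.o.c.}; \qquad \tfrac{1}{r}\Pi^d_{i,j}(rt) \to t, \ \text{u.o.c.}, \tag{35}
\]

where u.o.c. means uniformly over compact time intervals. Define the fluid-scale arrival and departure processes as

\[
a^r_{i,j}(t) = \tfrac{1}{r}\Pi^a_{i,j}\Big(\int_0^t P_i^{(j)}(Y^r(s))\, r\lambda_j \, ds\Big), \qquad
d^r_{i,j}(t) = \tfrac{1}{r}\Pi^d_{i,j}\Big(\int_0^t \mu_j Y_i^{(j)r}(s)\, ds\Big). \tag{36}
\]

Lemma 5. (Convergence to fluid limit sample paths) If $y^r(0) \to y(0)$, then almost surely, every subsequence $(y^{r_n}, a^{r_n}, d^{r_n})$ has a further subsequence $(y^{r_{n_k}}, a^{r_{n_k}}, d^{r_{n_k}})$ such that $(y^{r_{n_k}}, a^{r_{n_k}}, d^{r_{n_k}}) \to (y, a, d)$. The sample paths $y, a, d$ are Lipschitz continuous and the convergence is u.o.c.

Proof Sketch of Lemma 5. The proof is standard and follows from the fact that $a^r_{i,j}(\cdot)$ and $d^r_{i,j}(\cdot)$ are asymptotically Lipschitz continuous (see, e.g., [28], [29], [32] for similar arguments), namely, there exists a constant $C > 0$ such that for $0 \le t_1 \le t_2 < \infty$,

\[
\limsup_{r} \big(a^r_{i,j}(t_2) - a^r_{i,j}(t_1)\big) \le C (t_2 - t_1),
\]

and similarly for $d^r_{i,j}(\cdot)$. The above inequality follows from (35) and from noting that $y^r(\cdot)$ is uniformly bounded over any finite time interval for large $r$. So the limit $(y, a, d)$ exists along the subsequence.

Proof of Lemma 1. It follows from (34), (36), (35), and the existence of the fluid limits (Lemma 5) that

\[
y_i^{(j)}(t) = y_i^{(j)}(0) + a_i^{(j)}(t) - d_i^{(j)}(t),
\]

where

\[
d_i^{(j)}(t) = \int_0^t y_i^{(j)}(s)\,\mu_j \, ds, \qquad \sum_{i \in R_j} a_i^{(j)}(t) = \lambda_j t, \qquad a_i^{(j)}(t) \text{ is nondecreasing.}
\]

The fluid equations (3a) and (3c) are the differential form of these equations (the fluid sample paths are Lipschitz continuous, so the derivatives exist almost everywhere), where

\[
p_i^{(j)}(t) := \frac{1}{\lambda_j}\,\frac{d a_i^{(j)}}{dt}(t). \tag{37}
\]

For any type $j$, let $w_j(y(t))$ denote the minimum, over the paths in $R_j$, of the path weights at state $y(t)$, with the path weights as defined in (3d). Consider any regular time $t$ and a path $i \in R_j$ that does not attain this minimum at $y(t)$.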
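By the continuity of the path weights in $y(t)$, there must exist a small time interval $(t_1, t_2)$ around $t$ such that the weight of path $i$ at $y(\tau)$ is strictly larger than $w_j(y(\tau))$ for all $\tau \in (t_1, t_2)$. Consequently, for all $r$ large enough along the subsequence, the weight of path $i$ at $y^r(\tau)$ is strictly larger than $w_j(y^r(\tau))$, $\tau \in (t_1, t_2)$. Multiplying both sides by $r^\alpha$, it follows that the same strict inequality holds at the unscaled state $Y^r(\tau)$, $\tau \in (t_1, t_2)$. Hence $P_i^{(j)}(Y^r(\tau)) = 0$, $\tau \in (t_1, t_2)$, and $a^r_{i,j}(t_2) - a^r_{i,j}(t_1) = 0$, for all $r$ large enough along the subsequence. Therefore $a_i^{(j)}(t_2) - a_i^{(j)}(t_1) = 0$, which shows that $(d/dt)\, a_i^{(j)}(t) = 0$ at $t \in (t_1, t_2)$. This establishes (3b).

B. Proof of Theorem 1

We first show that

\[
F(y^r(\infty)) \Rightarrow F^\star, \tag{38}
\]

where $F^\star = F(\Upsilon)$ is the optimal cost. By Proposition 1 and the continuity of $F(\cdot)$, for any fluid sample path $y(t)$ with initial condition $y(0)$ and any small $\epsilon_1 > 0$, we can choose $t_{\epsilon_1}$ large enough such that $|F(y(t_{\epsilon_1})) - F^\star| \le \epsilon_1$. With probability 1, $y^r(t) \to y(t)$ u.o.c. (see Lemma 5); hence, by the continuous mapping theorem [31], we also have $F(y^r(t)) \to F(y(t))$, u.o.c. For any $\epsilon_2 > 0$, for $r$ large enough, we can choose an $\epsilon_3 > 0$ such that, uniformly over all initial states $y^r(0)$ with $\|y^r(0) - y(0)\| \le \epsilon_3$,

\[
P\big\{ |F(y^r(t_{\epsilon_1})) - F(y(t_{\epsilon_1}))| < \epsilon_1 \big\} > 1 - \epsilon_2. \tag{39}
\]

This claim is true, since otherwise for a sequence of initial states $y^r(0) \to y(0)$ we would have $P\{ |F(y^r(t_{\epsilon_1})) - F(y(t_{\epsilon_1}))| < \epsilon_1 \} \le 1 - \epsilon_2$, which is impossible because, almost surely, we can choose a subsequence of $r$ along which the uniform convergence $F(y^r(t)) \to F(y(t))$, with initial condition $y(0)$, holds. Hence,

\[
\begin{aligned}
P\big\{ |F(y^r(t_{\epsilon_1})) - F^\star| < 2\epsilon_1 \big\}
&\ge P\big\{ |F(y^r(t_{\epsilon_1})) - F(y(t_{\epsilon_1}))| + |F(y(t_{\epsilon_1})) - F^\star| < 2\epsilon_1 \big\} \\
&\ge P\big\{ |F(y^r(t_{\epsilon_1})) - F(y(t_{\epsilon_1}))| < \epsilon_1 \big\} > 1 - \epsilon_2,
\end{aligned}
\]

which in particular implies (38) because $\epsilon_1$ and $\epsilon_2$ can be made arbitrarily small.

Next, we show (2). Under any algorithm (including our algorithm and the optimal one), $\sum_{i \in R_j} Y_i^{(j)r}(\infty)/r = X^{(j)r}(\infty)/r$, where $X^{(j)r}(\infty)$ has Poisson distribution with mean $r\rho_j$, and $X^{(j)r}(\infty)$, $j \in J$, are independent. Let $s = \max_j s_j < \infty$. The traffic over each link $l$ is clearly bounded as $Z_l^r(\infty)/r < s \sum_j X^{(j)r}(\infty)/r = s X^r(\infty)/r$, where $X^r(\infty)$ has Poisson distribution with mean $r \sum_j \rho_j$. Hence, $F(y^r(\infty))$ is stochastically dominated by $|E|\, g\big( s X^r(\infty) / (r C) \big)$, and $g$ is polynomial. It then follows that the sequences of random variables $\{F(y^r(\infty))\}$ (and also $\{y^r(\infty)\}$) are uniformly integrable under any algorithm.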
Then, in view of (38), by Theorem 3.5 of [31], under our algorithm,

\[
E\big[ F(Y^r(\infty)/r) \big] \to F^\star. \tag{40}
\]
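As an illustration of why a polynomial cost of a scaled Poisson variable is uniformly integrable, consider the following worked example. It is only a sketch under an assumed quadratic cost $g(x) = x^2$, which is not necessarily the cost used in the paper; $\rho$ stands for $\sum_j \rho_j$. For $X \sim \mathrm{Poisson}(r\rho)$, the standard Poisson moment formulas give

\[
E\big[(X/r)^2\big] = \rho^2 + \frac{\rho}{r}, \qquad
E\big[(X/r)^4\big] = \rho^4 + \frac{6\rho^3}{r} + \frac{7\rho^2}{r^2} + \frac{\rho}{r^3}.
\]

The first expression converges to $\rho^2$ as $r \to \infty$. The second is bounded by $\rho^4 + 6\rho^3 + 7\rho^2 + \rho$ for all $r \ge 1$, so $g(X/r)$ has a second moment that is bounded uniformly in $r$, which is a sufficient condition for uniform integrability of the family $\{g(X/r)\}_{r \ge 1}$.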

Now consider any optimal algorithm for the optimization (3). It holds that

\[
F\big( E[ y^r_{opt}(\infty) ] \big) \le E\big[ F(y^r_{opt}(\infty)) \big] \le E\big[ F(y^r(\infty)) \big],
\]

where the first inequality is by Jensen's inequality. Taking the limit as $r \to \infty$, it follows by a squeeze argument that

\[
E\big[ F(Y^r_{opt}(\infty)/r) \big] \to F^\star. \tag{41}
\]

(40) and (41) imply (2) in view of the polynomial structure of $F$.

VII. CONCLUDING REMARKS

This paper presents a simple algorithm that dynamically adjusts the link weights as a function of the link utilizations and places any newly generated flow on a least weight path in the network, with no splitting or migration of existing flows. We demonstrate both theoretically and experimentally that this algorithm has good load balancing performance. In particular, we prove that the algorithm asymptotically minimizes a network cost and establish the relationship between the network cost and the corresponding weight construct. Although our theoretical result is asymptotic, our experimental results show that the algorithm in fact performs very well under a wide range of traffic conditions and different datacenter networks. While the algorithm has low complexity, a real implementation depends on how fast the weight updates and least weight paths can be computed in practical datacenters (e.g., based on SDN). One possible way to improve the computation timescale is to perform the computation periodically or only for long flows, while using the previously computed least weight paths for short flows or between the periodic updates.

REFERENCES

[1] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," ACM SIGCOMM Computer Communication Review, vol. 38, no. 4, pp. 63-74, 2008.
[2] T. Benson, A. Anand, A. Akella, and M. Zhang, "MicroTE: Fine grained traffic engineering for data centers," in Proceedings of the 7th Conference on Emerging Networking Experiments and Technologies. ACM, 2011, p. 8.
[3] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, "Hedera: Dynamic flow scheduling for data center networks," in NSDI, vol. 10, 2010, pp. 19-19.
[4] C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 266-277, 2011.
[5] S. Kandula, D. Katabi, S. Sinha, and A. Berger, "Dynamic load balancing without packet reordering," ACM SIGCOMM Computer Communication Review, vol. 37, no. 2, pp. 51-62, 2007.
[6] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "VL2: A scalable and flexible data center network," in ACM SIGCOMM Computer Communication Review, vol. 39, no. 4, 2009, pp. 51-62.
[7] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, "BCube: A high performance, server-centric network architecture for modular data centers," ACM SIGCOMM Computer Communication Review, vol. 39, no. 4, pp. 63-74, 2009.
[8] M. Bradonjić, I. Saniee, and I. Widjaja, "Scaling of capacity and reliability in data center networks," ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 2, pp. 46-48, 2014.
[9] A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey, "Jellyfish: Networking data centers randomly," in NSDI, vol. 12, 2012, pp. 17-17.
[10] S. Kandula, S. Sengupta, A. Greenberg, P. Patel, and R. Chaiken, "The nature of data center traffic: Measurements & analysis," in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, 2009, pp. 202-208.
[11] S. Even, A. Itai, and A. Shamir, "On the complexity of time table and multi-commodity flow problems," in 16th Annual Symposium on Foundations of Computer Science. IEEE, 1975, pp. 184-193.
[12] G. M. Guisewite and P. M. Pardalos, "Minimum concave-cost network flow problems: Applications, complexity, and algorithms," Annals of Operations Research, vol. 25, no. 1, pp. 75-99, 1990.
[13] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: A scalable fault-tolerant layer 2 data center network fabric," in ACM SIGCOMM Computer Communication Review, vol. 39, no. 4, 2009, pp. 39-50.
[14] J. Cao, R. Xia, P. Yang, C. Guo, G. Lu, L. Yuan, Y. Zheng, H. Wu, Y. Xiong, and D. Maltz, "Per-packet load-balanced, low-latency routing for Clos-based data center networks," in Proceedings of the 9th ACM Conference on Emerging Networking Experiments and Technologies. ACM, 2013, pp. 49-60.
[15] P. Gill, N. Jain, and N. Nagappan, "Understanding network failures in data centers: Measurement, analysis, and implications," in ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, 2011, pp. 350-361.
[16] S. Sen, D. Shue, S. Ihm, and M. J. Freedman, "Scalable, optimal flow routing in datacenters via local link balancing," in Proceedings of the 9th ACM Conference on Emerging Networking Experiments and Technologies, 2013, pp. 151-162.
[17] J. W. Jiang, T. Lan, S. Ha, M. Chen, and M. Chiang, "Joint VM placement and routing for data center traffic engineering," in Proceedings of IEEE INFOCOM, 2012, pp. 2876-2880.
[18] A. Dixit, P. Prakash, Y. C. Hu, and R. R. Kompella, "On the impact of packet spraying in data center networks," in Proceedings of IEEE INFOCOM, 2013, pp. 2130-2138.
[19] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: Enabling innovation in campus networks," ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69-74, 2008.
[20] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. Gude, N. McKeown, and S. Shenker, "Rethinking enterprise network control," IEEE/ACM Transactions on Networking (TON), vol. 17, no. 4, pp. 1270-1283, 2009.
[21] M. Chiesa, G. Kindler, and M. Schapira, "Traffic engineering with equal-cost-multipath: An algorithmic perspective," in Proceedings of IEEE INFOCOM, 2014, pp. 1590-1598.
[22] B. Fortz and M. Thorup, "Internet traffic engineering by optimizing OSPF weights," in Proceedings of the 19th Annual Joint Conference of the IEEE Computer and Communications Societies. INFOCOM 2000, vol. 2, pp. 519-528.
[23] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[24] D. Ersoz, M. S. Yousif, and C. R. Das, "Characterizing network traffic in a cluster-based, multi-tier data center," in ICDCS'07. 27th International Conference on Distributed Computing Systems, 2007. IEEE, pp. 59-59.
[25] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.1," http://cvxr.com/cvx, Mar. 2014.
[26] J. Díaz, M. J. Serna, and N. C. Wormald, "Bounds on the bisection width for random d-regular graphs," Theoretical Computer Science, vol. 382, no. 2, pp. 120-130, 2007.
[27] B. Bollobás, Random Graphs. Springer, 1998.
[28] A. L. Stolyar, "An infinite server system with general packing constraints," Operations Research, vol. 61, no. 5, pp. 1200-1217, 2013.
[29] A. L. Stolyar and Y. Zhong, "Asymptotic optimality of a greedy randomized algorithm in a large-scale service system with general packing constraints," Queueing Systems, vol. 79, no. 2, pp. 117-143, 2015.
[30] J. Ghaderi, Y. Zhong, and R. Srikant, "Asymptotic optimality of BestFit for stochastic bin packing," ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 2, pp. 64-66, 2014.
[31] P. Billingsley, Convergence of Probability Measures. John Wiley & Sons, 2013.
[32] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009, vol. 282.