Network Architecture for Joint Failure Recovery and Traffic Engineering

Martin Suchara, Dept. of Computer Science, Princeton University, NJ 08544, msuchara@princeton.edu
Dahai Xu, AT&T Labs Research, Florham Park, NJ 07932, dahaixu@research.att.com
Robert Doverspike, AT&T Labs Research, Florham Park, NJ 07932, rdd@research.att.com
David Johnson, AT&T Labs Research, Florham Park, NJ 07932, dsj@research.att.com
Jennifer Rexford, Dept. of Computer Science, Princeton University, NJ 08544, jrex@princeton.edu

ABSTRACT

Today's networks typically handle traffic engineering (e.g., tuning the routing-protocol parameters to optimize the flow of traffic) and failure recovery (e.g., pre-installed backup paths) independently. In this paper, we propose a unified way to balance load efficiently under a wide range of failure scenarios. Our architecture supports flexible splitting of traffic over multiple precomputed paths, with efficient path-level failure detection and automatic load balancing over the remaining paths. We propose two candidate solutions that differ in how the routers rebalance the load after a failure, leading to a trade-off between router complexity and load-balancing performance. We present and solve the optimization problems that compute the configuration state for each router. Our experiments with traffic measurements and topology data (including shared risks in the underlying transport network) from a large ISP identify a sweet spot that achieves near-optimal load balancing under a variety of failure scenarios, with a relatively small amount of state in the routers. We believe that our solution for joint traffic engineering and failure recovery will appeal to Internet Service Providers as well as the operators of data-center networks.

1. INTRODUCTION

To ensure uninterrupted data delivery, communication networks must distribute traffic efficiently even as links and routers fail and recover. By tuning routing to the offered traffic, traffic engineering [28] improves performance and allows network operators to defer expensive outlays for new capacity. Effective failure recovery [29, 35], adapting to failures by directing traffic over good alternate paths, is also important to avoid performance disruptions. However, today's networks typically handle failure recovery and traffic engineering independently, leading to more complex routers and less efficient paths after failures. In this paper, we propose an integrated solution with much simpler routers that balances load effectively under a range of failure scenarios.

We argue that traffic engineering and failure recovery can be achieved by the same underlying approach: dynamically rebalancing traffic across diverse end-to-end paths in response to individual failure events. This reduces the complexity of the routers by moving most functionality to the management system, an algorithm run by the network operator. Our network architecture has three key features:

Precomputed multipath routing: Traffic between each pair of edge routers is split over multiple paths that are configured in advance. The routers do not compute (or recompute) paths, reducing router overhead and improving path stability. Instead, the management system computes paths that offer sufficient diversity across a range of failure scenarios, including correlated failures of multiple links.

Path-level failure detection: The ingress routers perform failure recovery based only on which paths have failed. A minimalist control plane performs path-level failure detection and notification, in contrast to the link-level probing and network-wide flooding common in today's intradomain routing protocols. This leads to simpler, cheaper routers.
Local adaptation to path failures: Upon detecting path failures, the ingress router rebalances the traffic on the remaining paths, based only on which path(s) failed, not on load information. This avoids having the routers distribute real-time updates about link load, and prevents instability. Instead, the management system precomputes the reactions to path failures and configures the routers accordingly.

The first two features, multiple precomputed paths and path-level monitoring, are ideas that have been surfacing (sometimes implicitly) in the networking literature over the past few years (e.g., [4, 15, 25, 40], and many others). Our architecture combines these two ideas in a new way, through (i) a specific proposal for the division of labor between the routers and the management system and (ii) an integrated view of traffic engineering and failure recovery within a single administrative domain. To support the simple network elements, the management system makes network-wide decisions based on the expected traffic, the network topology, and the groups of links that can fail together. The management system does not need to make these decisions in real time; quite the contrary, offline algorithms can compute the paths and the adaptations to path failures to ensure good performance.

Our architecture raises important questions about (i) what configuration state the routers should have to drive their local reactions to path failures and (ii) how the management
system should compute this state, and the underlying paths, for good traffic engineering and failure recovery. In addressing these questions, we make four main contributions:

Simple architecture for joint TE and failure recovery (Section 2): We propose a joint solution for traffic engineering and failure recovery, in contrast to today's networks that handle these problems separately. Our minimalist control plane has routers balance load based only on path-failure information, in contrast to recent designs that require routers to disseminate link-load information and compute new path-splitting parameters in real time [19, 24].

Network-wide optimization across failure scenarios (Section 3): We formulate and solve network-wide optimization problems for configuring the routers. Our algorithms compute (i) multiple paths that distribute traffic efficiently under a range of failure scenarios and (ii) the state each ingress router needs to adapt to path failures. We present algorithms for two router designs that strike different trade-offs between router state and load-balancing performance.

Experiments with measurement data from a large ISP (Section 4): We evaluate our algorithms on measurement data from a tier-1 ISP network. Our simulations achieve a high degree of accuracy by utilizing the real topology, link capacities, link delays, hourly traffic matrices, and Shared Risk Link Groups (SRLGs) [14]. Our experiments show that one of our candidate router designs achieves near-optimal load balancing across a wide range of failure scenarios, even when the traffic demands change dynamically.

Deployability in ISP and data-center networks (Section 5): While our architecture enables simpler routers and switches, existing equipment can support our solution. ISP backbones can use RSVP to signal multiple MPLS [30] paths, with hash-based splitting of traffic over the paths. In data centers, the fabric controller can configure multiple paths through the network, and the server machines can encapsulate packets to split traffic in the desired proportions.

The paper ends with related work in Section 6, conclusions in Section 7, and supporting proofs in an Appendix.

2. SIMPLE NETWORK ARCHITECTURE

Our architecture uses simple, cheap routers to balance load before, during, and after failures, by placing most functionality in a management system that performs offline optimization. The network-management system computes multiple diverse paths between each pair of edge routers, and tells each ingress router how to split traffic over these paths under a range of failure scenarios. Each edge router simply detects path-level failures and uses this information to adjust the splitting of traffic over the remaining paths, as shown in Figure 1. The main novel feature of our architecture is the way routers split traffic over the working paths; we propose two approaches that introduce a trade-off between router state and load-balancing performance.

2.1 Precomputed Multipath Routing

Many existing routing protocols compute a single path between each pair of routers, and change that path in response to topology changes. However, dynamic routing has many downsides, including the overhead on the routers (to disseminate topology information and compute paths) and the transient disruptions during routing-protocol convergence. Techniques for making convergence faster tend to increase the complexity of the routing software and the overhead on the routers, by disseminating more information or updating it more quickly.

Figure 1: The management system calculates a fixed set of paths and splitting ratios, based on the topology, traffic demands, and potential failures. The ingress routers learn about path failures and split traffic over the remaining paths, based on preconfigured splitting ratios.
Rather than trying to reduce convergence time, or adding mechanisms to detect transient loops and blackholes, we avoid dynamic routing protocols entirely [4]. Our architecture uses multiple preconfigured paths between each pair of edge routers, allowing ingress routers to adapt to failures by shifting traffic away from failed path(s). With multiple paths through the network, the routers do not need to recompute paths dynamically; they simply stop using the failed paths until they start working again. This substantially reduces router software complexity and protocol overheads (e.g., bandwidth and CPU resources), while entirely side-stepping the problem of convergence. Instead, the management system computes these paths, based on both traffic-engineering and failure-recovery goals, and installs the paths in the underlying routers. The management system can select diverse paths that ensure connectivity in the face of failures, including multiple correlated failures.

Using multiple paths also leads to better load balancing, whether or not failures occur. Today's shortest-path routing protocols (like OSPF and IS-IS) use a single path, or (at best) only support even splitting of traffic over multiple shortest paths. Our architecture (like other recent proposals for multipath load balancing [7, 15, 18, 39]) allows flexible splitting of traffic over multiple paths. However, we do not require dynamic adaptation of the traffic splitting. Instead, the ingress router has a simple static configuration that determines the splitting of traffic over the available paths, while intermediate routers merely forward packets over pre-established paths. The management system optimizes this configuration in advance based on a network-wide view of the expected traffic and likely failures. This avoids the protocol overheads and stability challenges of distributed, load-sensitive routing protocols. Also, the management system can use knowledge about shared risks and anticipated traffic demands, information the routers do not have.

2.2 Path-Level Failure Detection

Most routing protocols detect failures by exchanging hello messages between neighboring routers and flooding the topology changes through the network.
This approach requires small timers for fast failure detection, imposing additional overhead on the routers. In addition, many failures are triggered by planned maintenance [22], leading to two convergence events, one for the link failure(s) and another for the recovery, that both cause transient disruptions. Moreover, hello messages do not detect all kinds of failures: some misconfigurations (e.g., having a maximum packet size that is too small) and attacks (e.g., an adversary selectively dropping packets) do not lead to lost hello messages.

Table 1: Properties of the candidate solutions. The solutions differ in the amount of configuration state that must be stored in the routers, the information the routers must obtain about each failure, and the achieved traffic-engineering performance.

                    | Optimal (Baseline)   | State-Dependent Splitting          | State-Independent Splitting
Router state        | Exponential in total | Exponential in # of pre-configured | Linear in # of pre-configured
                    | # of links           | paths between two routers          | paths between two routers
Failure information | Link level           | Path level                         | Path level
Optimality          | Optimal              | Nearly-optimal                     | Good

Instead, our architecture relies on path-level failure detection. Each ingress-egress router pair has a session to monitor each of its paths (e.g., as in BFD [16]). The probes can be piggybacked on existing data traffic, obviating the need for separate hello messages when the path is carrying regular data traffic. This enables fast failure detection without introducing extra probe traffic, and the implicit probes provide a more realistic view of the reliability of a path [3, 12], since the packets vary in size, addresses, and so on. Another advantage is that the packets are handled by the hardware interfaces and, as such, do not consume processing resources (or experience software processing delays) at intermediate routers. (Still, the propagation delay along a path does impose limits on detection time in large topologies, an issue we discuss in more detail in Section 5.)

Although the ingress router doesn't learn which link failed, knowledge of the path failure is sufficient to avoid the failed path. In fact, since the routers need not be aware of the topology, no control protocol is needed to exchange topology information. Moreover, only some of the ingress routers need to learn about the failure, namely the routers that have paths traversing the failed edge. The other ingress routers, and the intermediate routers, can remain unaware of the link failure. Of course, the management system ultimately needs to know about topology changes, so failed equipment can be fixed or replaced. But this detection problem can be handled on a much longer timescale, since it does not affect the failure-recovery time for data traffic.

2.3 Local Adaptation to Path Failures

In our architecture, a router is a simple device that does not participate in a routing protocol, collect congestion feedback, or solve any computationally difficult problems. Still, the routers do play an important role in adapting the distribution of traffic when paths fail or recover, at the behest of the management system. We propose two different ways the routers can split traffic over the working paths: (i) state-independent splitting, which has minimal router state, and (ii) state-dependent splitting, which introduces more state in exchange for near-optimal performance, as summarized (and compared to an idealized solution) in Table 1.

Optimal load balancing: This idealized solution calculates the optimal paths and splitting ratios separately for each possible failure state, i.e., for each combination of link failures.
This approach achieves the best possible load balancing by finding the optimal set of paths and splitting ratios. However, the approach is impractical because the routers must (i) store far too much state and (ii) learn about all link failures, even on links that the router's paths do not traverse. Therefore, this solution would violate our architecture. However, the solution is still interesting as an upper bound on the performance of the other two schemes.

State-dependent splitting: In this solution, each ingress router has a separate configuration entry with path-splitting weights for each combination of path failures to a particular egress router. For example, suppose a router has three paths to an egress router. Then, the router configuration contains seven entries, one for each of the 2^3 - 1 combinations of path failures. Each configuration entry, computed ahead of time by the management system, consists of three weights, one per path, with a 0 for any failed path. Upon detecting path failures, the ingress router inspects a pre-configured table to select the appropriate weights for splitting the traffic destined to the egress router. Our experiments in Section 4 show that, even in a large ISP backbone, having three or four paths is sufficient, leading to modest state requirements on the routers in exchange for near-optimal load balancing.

State-independent splitting: This solution further simplifies the router configuration by having a single set of weights across all failure scenarios. So, an ingress router with three paths to an egress router would have only three weights, one for each path. If any paths fail, the ingress router simply renormalizes the traffic on the remaining paths. As such, the management system must perform a robust optimization of the limited configuration parameters to achieve good load-balancing performance across a range of failure scenarios. Our experiments in Section 4 show that this simple approach can perform surprisingly well, but understandably not as well as state-dependent splitting.
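To make the contrast between the two designs concrete, the following minimal sketch (ours, not part of any router implementation; the table contents are made-up placeholders) shows the lookup an ingress router would perform for a three-path egress under each scheme, assuming path failures arrive as a boolean vector from the path-monitoring sessions.

    # Minimal sketch of the two splitting schemes at an ingress router.

    def state_dependent_weights(table, failed):
        """State-dependent splitting: look up the precomputed entry for the
        observed combination of failed paths.  For k paths the management
        system installs up to 2^k - 1 entries (all-failed is excluded)."""
        return table[frozenset(i for i, down in enumerate(failed) if down)]

    def state_independent_weights(alpha, failed):
        """State-independent splitting: a single weight vector alpha; on
        failure, renormalize the weights of the surviving paths."""
        live = [0.0 if down else a for a, down in zip(alpha, failed)]
        total = sum(live)
        if total == 0:
            raise RuntimeError("all paths to this egress have failed")
        return [a / total for a in live]

    # Hypothetical configuration for one egress with three paths.
    table = {
        frozenset():    [0.5, 0.3, 0.2],   # no failures
        frozenset({1}): [0.6, 0.0, 0.4],   # path 1 down
        # ... five more entries for the remaining failure combinations ...
    }
    print(state_dependent_weights(table, [False, True, False]))    # [0.6, 0.0, 0.4]
    print(state_independent_weights([0.5, 0.3, 0.2], [False, True, False]))
    # [0.714..., 0.0, 0.285...] -- 0.5 and 0.2 renormalized to sum to 1

Either way, the reaction is a constant-time local lookup; the intelligence lives in how the management system fills in the weights, which is the subject of the next section.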
3. NETWORK-WIDE OPTIMIZATION

In our architecture, the network-management system performs network-wide optimization to compute paths and traffic-splitting ratios that balance load effectively across a range of failure scenarios. In this section, we first discuss the information the management system has about the network topology, traffic demands, and shared risks. Then, we explain how the management system computes the multiple diverse paths and the traffic-splitting ratios, for both state-dependent and state-independent splitting. We solve all optimization problems either by formulating them as linear programs solvable in polynomial time, or by providing heuristics for solving NP-hard problems. Table 2 summarizes the notation.

3.1 Network-Wide Visibility and Control

The management system computes paths and splitting ratios based on a network-wide view:

Fixed topology: The management system makes decisions based on the designed topology of the network, i.e., the routers and links that have been deployed. The topology is represented by a graph G(V, E) with a set of vertices V and directed edges E. The capacity of edge e ∈ E is denoted by c_e, and the propagation delay on the edge is y_e.

Shared risk link groups: The management system knows which links share a common vulnerability, such as connecting to the same line card or router, or traversing the same optical fiber or amplifier [14]. The shared risks are denoted by the set S, where each s ∈ S consists of a set of edges that may fail together. For example, a router failure is represented by the set of its incident links, a fiber cut is represented by all links in the affected fiber bundle, and the failure-free case is represented by the empty set. Operators also have measurement data from past failures to produce estimates of the likelihood of different failures (e.g., an optical amplifier may fail less often than a line card). As such, each failure state s has a weight w_s that represents its likelihood or importance.

Expected traffic demands: The management system knows the anticipated traffic demands, based on past measurements and predictions of traffic changes. Each traffic demand d ∈ D is represented by a triple (u_d, v_d, h_d), where u_d ∈ V is the traffic source (ingress router), v_d ∈ V is the destination (egress router), and h_d is the flow requirement (measured traffic).

Table 2: Summary of notation.

Variable   | Description
G(V, E)    | network with vertices V and directed edges E
c_e        | capacity of edge e ∈ E
y_e        | propagation delay on edge e ∈ E
S          | family of network failure states
s          | network failure state (set of failed links)
w_s        | weight of network failure state s ∈ S
D          | set of demands
u_d        | source of demand d ∈ D
v_d        | destination of demand d ∈ D
h_d        | flow requirement of demand d ∈ D
P_d        | paths available to demand d ∈ D
α_p        | fraction of the demand assigned to path p
O_d        | family of observable failure states for node u_d
o_d(s)     | state observable by u_d in failure state s ∈ S
P_d^o      | paths available to u_d in failure state o ∈ O_d
f_p^s      | flow on path p in failure state s ∈ S
f_p^o      | flow on path p in failure state o ∈ O_d
l_e^s      | total flow on edge e in failure state s
l_{e,d}^s  | flow of demand d on edge e in failure state s

For simplicity, we assume that all demands remain connected in each failure scenario; alternatively, a demand can be omitted for each failure case that disconnects it. In practice, the management system may have a time sequence of traffic demands (e.g., for different hours in the day), and optimize the network configuration across all these demands, as we discuss in Section 4.3.

The management system's output is a set of paths P_d for each demand d and the splitting ratios for each path. In each failure state, the traffic splitting by ingress router u_d depends only on which paths have failed, not on which failure scenario has occurred; in fact, multiple failure scenarios may affect the same subset of paths in P_d. To reason about the handling of a particular demand d, we consider a set O_d of observable failure states, where each observable state o ∈ O_d corresponds to a particular P_d^o ⊆ P_d representing the available paths. For ease of expression, we let the function o_d(s) map s to the failure state observable by node u_d when the network is in failure state s ∈ S. The amount of flow assigned to path p in observable failure state o ∈ O_d is f_p^o.
The total flow on edge e in failure state s is l_e^s, and the flow on edge e corresponding to demand d is l_{e,d}^s.

The management system's goal is to compute paths and splitting ratios that minimize congestion over the range of possible failure states. A common traffic-engineering objective [10] is to minimize Σ_{e∈E} Φ(l_e/c_e), where l_e is the load on edge e and c_e is its capacity. Φ(·) could be a convex function of link load [10], to penalize the most congested links while still accounting for load on the remaining links. The final objective, minimizing congestion across failure scenarios, is

    obj(l_{e_1}^{s_1}/c_{e_1}, ...) = Σ_{s∈S} w_s Σ_{e∈E} Φ(l_e^s/c_e).    (1)

Minimizing this objective function is the goal of all the candidate solutions in the following sections. The constraints that complete the problem formulation differ depending on the functionality placed in the underlying routers.
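To illustrate how objective (1) is evaluated, the sketch below computes Σ_s w_s Σ_e Φ(l_e^s/c_e) for given per-state edge loads. The code is our illustration only; the Φ breakpoints shown are the ones from [10] that we also adopt in Section 4, and the dictionary-based inputs are an assumed encoding.

    # Evaluate objective (1) for given per-failure-state edge loads.

    # Piecewise-linear penalty: (start of segment, slope).  Phi(0) = 0.
    SEGMENTS = [(0.0, 1), (1/3, 3), (2/3, 10), (0.9, 70), (1.0, 500), (1.1, 5000)]

    def phi(u):
        """Penalty of one edge at utilization u = load / capacity."""
        cost = 0.0
        for i, (start, slope) in enumerate(SEGMENTS):
            end = SEGMENTS[i + 1][0] if i + 1 < len(SEGMENTS) else float("inf")
            if u <= start:
                break
            cost += slope * (min(u, end) - start)
        return cost

    def objective(loads, capacity, weights):
        """loads[s][e]: flow on edge e in failure state s; weights[s]: w_s."""
        return sum(w * sum(phi(loads[s][e] / capacity[e]) for e in capacity)
                   for s, w in weights.items())

    # Toy network: one edge, two failure states (no failure / one SRLG).
    capacity = {"e1": 10.0}
    loads = {"none": {"e1": 5.0}, "s1": {"e1": 9.5}}
    print(objective(loads, capacity, {"none": 0.5, "s1": 0.5}))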
3.2 Computing Multiple Diverse Paths

The management system must compute multiple diverse paths that ensure good load balancing and (most importantly) continued connectivity across a range of failure scenarios. However, computing the optimal paths for state-dependent and state-independent splitting is NP-hard. Instead, we propose a heuristic: using the collection of paths computed by the optimal solution that optimizes for each failure state independently. This guarantees that the paths are sufficiently diverse to ensure traffic delivery in all failure states, while also making efficient use of network resources.

The idealized optimal solution has a separate set of paths and splitting ratios in each failure state. To avoid introducing explicit variables for exponentially many paths, we formulate the problem in terms of the amount of flow l_{e,d}^s from demand d traversing edge e in failure state s. The optimal edge loads are obtained by solving a linear program:

    min   obj(l_{e_1}^{s_1}/c_{e_1}, ...)
    s.t.  l_e^s = Σ_{d∈D} l_{e,d}^s                                     ∀e, s
          0 = Σ_{i:e=(i,j)} l_{e,d}^s − Σ_{i:e=(j,i)} l_{e,d}^s         ∀d, s, j ≠ u_d, v_d
          h_d = Σ_{i:e=(u_d,i)} l_{e,d}^s − Σ_{i:e=(i,u_d)} l_{e,d}^s   ∀d, s
          0 ≤ l_{e,d}^s                                                 ∀d, s, e,    (2)

where l_e^s and l_{e,d}^s are variables. The first constraint defines the load on edge e, the second constraint ensures flow conservation, the third constraint ensures that the demands are met, and the last constraint guarantees flow non-negativity. An optimal solution can be found in polynomial time using conventional techniques for solving multi-commodity flow problems.

After obtaining the optimal flow on each edge for all the failure scenarios, we use a standard decomposition algorithm to determine the corresponding paths P_d and the flow f_p^s on each of them. The decomposition starts with a set P_d that is empty. New unique paths are added to the set by performing the following decomposition for each failure state s. First, annotate each edge e with the value l_{e,d}^s. Remove all edges that have 0 value. Then, find a path connecting u_d and v_d. Although we could choose any of the paths from u_d to v_d, our goal is to obtain paths that are as short as possible. So, if multiple such paths exist, we use the path p with the smallest propagation delay. Add this path p to the set P_d and assign to it flow f_p^s equal to the smallest value of the edges on path p. Reduce the values of these edges accordingly. Continue in this fashion, removing edges with zero value and finding new paths, until there are no remaining edges in the graph. Note that we can show by induction that this process completely partitions the flow l_{e,d}^s into paths.

The decomposition yields at most |E| paths for each network failure state because the value of at least one edge becomes 0 whenever a new path is found. Hence the total size of the set P_d is at most |E| · |S|. It is difficult to obtain a solution that restricts the number of paths; as we prove in the Appendix, it is NP-hard to solve problem (2) when the number of allowed paths is bounded by a constant J. In practice, the algorithm produces a relatively small number of paths between each pair of edge routers, as shown later in Section 4.
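The decomposition is easy to implement directly from this description. Below is a small sketch (ours; a dict of per-edge flow values stands in for the annotated graph, and Dijkstra over the edge delays picks the minimum-delay path among the edges that still carry flow).

    import heapq

    def decompose(flow, delay, src, dst, eps=1e-9):
        """Decompose one demand's edge flows {(i, j): l_{e,d}^s} in one
        failure state into paths (Section 3.2): repeatedly take the
        minimum-delay src->dst path over positive-flow edges, assign it
        the bottleneck value, and subtract.  Returns (path, flow) pairs."""
        paths = []
        while True:
            # Adjacency over edges that still carry flow.
            adj = {}
            for (i, j), v in flow.items():
                if v > eps:
                    adj.setdefault(i, []).append(j)
            # Dijkstra by propagation delay.
            dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
            while heap:
                d, x = heapq.heappop(heap)
                if d > dist.get(x, float("inf")):
                    continue
                for y in adj.get(x, []):
                    nd = d + delay[(x, y)]
                    if nd < dist.get(y, float("inf")):
                        dist[y], prev[y] = nd, x
                        heapq.heappush(heap, (nd, y))
            if dst not in dist:
                return paths        # remaining flow is exhausted
            # Reconstruct the path and pull out its bottleneck flow.
            path, x = [dst], dst
            while x != src:
                x = prev[x]
                path.append(x)
            path.reverse()
            edges = list(zip(path, path[1:]))
            f = min(flow[e] for e in edges)
            for e in edges:
                flow[e] -= f
            paths.append((path, f))

Each iteration zeroes at least one edge, which is exactly the argument above that at most |E| paths are produced per failure state.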
3.3 Optimizing the Traffic-Splitting Ratios

Once the paths are computed, the network-management system can optimize the path-splitting ratios for each ingress-egress router pair. The optimization problem and the resulting solution depend on whether the routers perform state-dependent or state-independent splitting.

3.3.1 State-Dependent Splitting

In state-dependent splitting, each ingress router u_d has a set of splitting ratios for each observable failure state o ∈ O_d. Since the path-splitting ratios depend on which paths in P_d have failed, the ingress router must store splitting ratios for min(|S|, 2^{|P_d|}) scenarios; fortunately, the number of paths |P_d| is typically small in practice. When the network performs such state-dependent splitting, the management system's goal is to find a set of paths P_d for each demand and the flows f_p^o on these paths in all observable states o ∈ O_d. If the paths P_d are known and fixed, the problem can be formulated as a linear program:

    min   obj(l_{e_1}^{s_1}/c_{e_1}, ...)
    s.t.  l_e^s = Σ_{d∈D} Σ_{p∈P_d^o, e∈p} f_p^o   ∀e, s, o = o_d(s)
          h_d = Σ_{p∈P_d^o} f_p^o                  ∀d, o ∈ O_d
          0 ≤ f_p^o                                ∀d, o ∈ O_d, p ∈ P_d^o,    (3)

where l_e^s and f_p^o are variables. The first constraint defines the load on edge e, the second constraint guarantees that the demand d is satisfied in all observable failure states, and the last constraint ensures non-negativity of the flow assigned to the paths. The solution of the optimization problem (3) can be found in polynomial time.

The problem becomes NP-hard if the sets of paths {P_d} are not known in advance. In fact, as we show in the Appendix, it is NP-hard even to tell if two paths that allow an ingress router to distinguish two network failure states can be constructed. Therefore, it is NP-hard to construct the optimal set of paths for all our formulations that assume the sources do not have information about the network failure state. Instead, we use the paths that are found by the decomposition of the optimal solution (2), as outlined in the previous subsection. Since these paths allow optimal load balancing for the optimal solution (2), they are also likely to enable good load balancing for the optimization problem (3).

3.3.2 State-Independent Splitting

In state-independent splitting, each ingress router has a single configuration entry containing the splitting ratios that are used under any combination of path failures. Each path p is associated with a splitting fraction α_p. When one or more paths fail, the ingress router u_d renormalizes the splitting parameters of the working paths to compute the fraction of traffic to direct to each of these paths. If the network elements implement such state-independent splitting, and the paths P_d are known and fixed, the management system needs to solve the following non-convex optimization problem:

    min   obj(l_{e_1}^{s_1}/c_{e_1}, ...)
    s.t.  f_p^o = h_d α_p / Σ_{q∈P_d^o} α_q        ∀d, o ∈ O_d, p ∈ P_d^o
          l_e^s = Σ_{d∈D} Σ_{p∈P_d^o, e∈p} f_p^o   ∀e, s, o = o_d(s)
          h_d = Σ_{p∈P_d^o} f_p^o                  ∀d, o ∈ O_d
          0 ≤ f_p^o                                ∀d, o ∈ O_d, p ∈ P_d^o,    (4)

where l_e^s, f_p^o, and α_p are variables. The first constraint ensures that the flow assigned to every available path p is proportional to α_p. The other three constraints are the same as in (3). Unfortunately, no standard optimization technique allows us to compute an optimal solution efficiently, even when the paths P_d are fixed. Therefore, we have to rely on heuristics to find both the candidate paths P_d and the splitting ratios α_p. To find the set of candidate paths P_d, we again use the optimal paths obtained by decomposing (2). To find the splitting ratios, we mimic the behavior of the optimal solution as closely as possible. We find the splitting ratio of each path p by letting

    α_p = Σ_{s∈S} w_s f_p^s / h_d,

where f_p^s is the flow assigned by the optimal solution to path p in network failure state s. Since Σ_{s∈S} w_s = 1, the calculated ratios are the weighted average of the splitting ratios used by the optimal solution (2).
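The weighted-average rule for α_p is a one-liner per path; the sketch below (ours, with dict-encoded inputs assumed) mirrors the formula, given the per-state flows f_p^s recovered from the decomposition of (2).

    def splitting_ratios(flows, weights, h_d):
        """alpha_p = (sum_s w_s * f_p^s) / h_d, as in Section 3.3.2.
        flows[s][p]: flow on path p in failure state s (absent means 0);
        weights[s]: w_s, with the w_s summing to 1; h_d: demand volume."""
        paths = {p for per_state in flows.values() for p in per_state}
        return {p: sum(w * flows[s].get(p, 0.0) for s, w in weights.items()) / h_d
                for p in paths}

    # Two paths, two equally weighted states (no failure, one SRLG).
    flows = {"none": {"p1": 8.0, "p2": 2.0}, "srlg": {"p1": 0.0, "p2": 10.0}}
    print(splitting_ratios(flows, {"none": 0.5, "srlg": 0.5}, 10.0))
    # {'p1': 0.4, 'p2': 0.6}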
4. EXPERIMENTAL EVALUATION

To evaluate the algorithms described in the previous sections, we wrote a simulator in C++ that calls the CPLEX linear program solver in AMPL and solves the optimization problems (2) and (3). We compare our two heuristics to the optimal solution, a simple equal-splitting configuration, and OSPF with the link weights set using state-of-the-art optimization techniques. We show that our two heuristics require few paths, resulting in compact routing tables, and that the round-trip propagation delay does not increase. Finally, using real traffic traces obtained during a 24-hour measurement in the network of a tier-1 ISP, we show that our solutions achieve excellent results without the need to perform any reoptimization, even in the presence of a changing traffic matrix.

Our experimental results show that the objective value of state-dependent splitting very closely tracks the optimal objective. For this reason, this solution is our favorite. Although state-independent splitting has a somewhat worse performance, especially as the network load increases beyond current levels, it is also attractive due to its simplicity.

4.1 Experimental Setup

Our simulations use a variety of synthetic topologies, the Abilene topology, as well as the city-level IP backbone topology of a tier-1 ISP with a set of failures provided by the network operator. The parameters of the topologies we used are summarized in Table 3.

Synthetic topologies: The synthetic topologies include 2-level hierarchical graphs, purely random graphs, and Waxman graphs. 2-level hierarchical graphs are produced using the generator GT-ITM [41]; for random graphs the probability of two edges being connected is constant, and the probability of having an edge between two nodes in the Waxman graphs decays exponentially with the distance of the nodes. These topologies also appear in [9].

Abilene topology: The topology of the Abilene network and a measured traffic matrix are used. We use the true edge capacities of 10 Gbps.

Tier-1 IP backbone: The city-level IP backbone of a tier-1 ISP is used. In our simulations, we use the real link capacities and measured traffic demands. We also obtained the link round-trip propagation delays.

Table 3: Synthetic and realistic network topologies.

Name    | Topology     | Nodes | Edges | Demands
hier50a | hierarchical | 50    | 148   | 2,450
hier50b | hierarchical | 50    | 212   | 2,450
rand50  | random       | 50    | 228   | 2,450
rand50a | random       | 50    | 245   | 2,450
rand100 | random       | 100   | 403   | 9,900
wax50   | Waxman       | 50    | 169   | 2,450
wax50a  | Waxman       | 50    | 230   | 2,450
abilene | backbone     | 11    | 28    | 253
tier-1  | backbone     | 50    | 180   | 625

The collection of network failures S for the synthetic topologies and Abilene contains single edge failures and the no-failure case. Two experiments with different collections of failures are performed on the tier-1 IP backbone. In the first experiment, single edge failures are used. In the second experiment, the collection of failures also contains Shared Risk Link Groups (SRLGs), link failures that occur simultaneously. SRLGs were obtained from the network operator's database, which contains 954 failures, with the largest failure affecting 20 links simultaneously. For each potential line card failure, complete router failure, or link cut there is a corresponding record in the SRLG database. Therefore, failures that do not appear in the database are rare. The weights w_s in the optimization objective (1) were set to 0.5 for the no-failure case, and all other failure weights are equal and sum to 0.5.

The sets of demands D in the Abilene and tier-1 networks were obtained by sampling Netflow data measured on Nov. 15, 2005 and May 22, 2009, respectively. For the synthetic topologies, we chose the same traffic demands as in [9]. To simulate the algorithms in environments with increasing congestion, we repeat all experiments several times while uniformly increasing the traffic demands. For the synthetic topologies we start with the original demands and scale them up to twice the original values.
As the average link utilization in Abilene and the tier-1 topology is lower than in the synthetic topologies, we scale the demands in these realistic topologies up to three times the original values.

In our experiments we use the piecewise-linear penalty function defined by Φ(0) = 0 and its derivative:

    Φ′(l) = 1      for 0 ≤ l < 0.333
            3      for 0.333 ≤ l < 0.667
            10     for 0.667 ≤ l < 0.9
            70     for 0.9 ≤ l < 1
            500    for 1 ≤ l < 1.1
            5000   for 1.1 ≤ l < ∞

This penalty function was introduced in [10]. The function can be viewed as modeling retransmission delays caused by packet losses. The cost is small for low utilization, and increases steeply as the utilization exceeds 100%.

Our simulations calculate the objective values of the optimal solution, state-independent and state-dependent splitting, and equal splitting. Equal splitting is a variant of state-independent splitting that splits the flow evenly on the available paths. We also calculate the objective achieved by the shortest-path routing of OSPF with optimized link weights. These link weights were calculated using the state-of-the-art optimization of [9], and these optimizations take into consideration the set of failure states S and the corresponding failure weights w_s.

Our simulations were performed using CPLEX version 11.2 on a 1.5 GHz Intel Itanium 2 processor. Solving the linear program (2) for a particular failure case in the tier-1 topology takes 4 seconds, and solving the linear program (3) takes about 16 minutes. A tier-1 network operator can perform the calculations for its entire city-level topology in less than 2 hours.

4.2 Performance with Static Traffic

Avoiding congestion and packet losses during planned and unplanned failures is the central goal of traffic engineering. Our traffic-engineering objective measures congestion across all the considered failure cases. The objective as a function of the scaled-up demands is depicted in Figure 2. The results, which were obtained on the hierarchical and tier-1 topologies, are representative; we made similar observations for all the other topologies. In Figure 2, the performance of state-dependent splitting and the optimal solution is virtually indistinguishable in all cases.
Figure 2: From top to bottom, the traffic-engineering objective as a function of an increasing traffic load in the hierarchical topology hier50a, the tier-1 topology with single edge failures, and the tier-1 topology with SRLGs, respectively. The performance of the optimal solution and state-dependent splitting is nearly identical.

Figure 4: Size of the compressed routing tables in the tier-1 topology with SRLGs. The largest and average routing table sizes (± one standard deviation) in the backbone routers are shown.

State-independent splitting is less sophisticated and does not allow custom load-balancing ratios for distinct failures, and therefore its performance is worse compared to the optimum. However, its performance compares well with that of OSPF. Unlike OSPF, state-independent splitting benefits from using the same set of paths as the optimal solution. It is not surprising that the equal-splitting algorithm achieves the worst performance.

We observe that OSPF achieves a somewhat worse performance than state-independent and state-dependent splitting as the load increases. We made this observation despite the fact that we obtained a custom set of OSPF link weights for each network load we evaluated. A possible explanation is that OSPF routing, in which each router splits the load evenly among the smallest-weight paths, does not allow enough flexibility in choosing routes and splitting ratios.

Solutions with few paths are preferred, as they decrease the number of tunnels that have to be managed and reduce the size of the router configuration. However, a sufficient number of paths must be available to route around failures and to reduce congestion. We observe that the number of paths used by our algorithms is small. We record the number of paths used by each demand, and plot the distributions in Figure 3. Not surprisingly, the number of paths is greater for larger and more diverse topologies. 92% of the demands in the hierarchical topology use 7 or fewer paths, and fewer than 10 paths are needed in the tier-1 backbone topology for almost all demands. Further, Figure 3 shows that the number of paths only increases slightly as we scale up the amount of traffic in the network. This small increase is caused by shifting some traffic to longer paths as the short paths become congested.

A practical solution uses few MPLS labels in order to reduce the size of the routing tables in the routers. Our experimental results reveal that when we use MPLS tunnels in the tier-1 topology, a few thousand tunnels can pass through a single router. However, a simple routing-table compression technique allows us to reduce the routing table size to a few hundred entries in each router. Such compression is important because it reduces the memory requirements imposed on the simple routers whose use we advocate, and it improves the route lookup time. Routing tables can be compressed by using the same MPLS label for routes with a common path to the destination. Specifically, if two routes to destination t pass through router r, and these routes share the same path between the router r and the destination t, the same outbound label should be used in the routing table of router r.
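The compression rule translates directly into code: group the tunnels traversing a router by their suffix toward the destination, and allocate one outbound label per group. The sketch below is our illustration (integer labels and list-encoded paths are assumptions, not the actual MPLS label allocation).

    def compress_labels(tunnels, router):
        """tunnels: list of node sequences (MPLS paths).  Tunnels that
        traverse `router` and share the same downstream path from `router`
        to their destination can reuse one outbound label, so the table
        needs one entry per distinct suffix rather than per tunnel."""
        labels = {}
        for path in tunnels:
            if router in path:
                suffix = tuple(path[path.index(router):])  # shared tail
                labels.setdefault(suffix, len(labels))     # next free label
        return labels

    # Tunnels from a and b to t share the tail r->x->t, so they share
    # label 0; the tunnel via y needs its own label.
    tunnels = [["a", "r", "x", "t"], ["b", "r", "x", "t"], ["c", "r", "y", "t"]]
    print(compress_labels(tunnels, "r"))
    # {('r', 'x', 't'): 0, ('r', 'y', 't'): 1}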
The resulting routing table sizes as a function of the network load are depicted in Figure 4.
The curves on the top show the size of the largest routing table, and the curves on the bottom show the average routing table size among all the backbone routers.

Figure 3: The number of paths used in the various topologies on the left, and in the tier-1 topology with SRLGs on the right. The cumulative distribution functions show that the number of paths is almost independent of the traffic load in the network, but is larger for bigger, more well-connected topologies.

Table 4: Round-trip propagation delay in ms (average ± one standard deviation) in the tier-1 backbone network for single edge failures and SRLG failures.

Algorithm              | Single edge  | SRLG
Optimal load balancing | 31.75 ± 0.26 | 31.80 ± 0.25
State dep. splitting   | 31.51 ± 0.17 | 31.61 ± 0.16
State indep. splitting | 31.76 ± 0.26 | 31.87 ± 0.25
Equal splitting        | 34.83 ± 0.33 | 40.85 ± 0.86
OSPF (optimized)       | 31.18 ± 0.40 | 31.23 ± 0.40
OSPF (current)         | 31.38 ± 0    | 31.38 ± 0

Minimizing the delay experienced by the users is another important goal of network operators. We calculated the average round-trip propagation delay for all the evaluated algorithms. The calculated delay includes the delays in all failure states weighted by the corresponding likelihood of occurrence, but excludes congestion delay, which is negligible. The delays are summarized in Table 4. We observe that the round-trip delay of all algorithms except equal splitting is almost identical, at around 31 ms. These values would satisfy the 37 ms requirement specified in the SLAs of the tier-1 network. Moreover, these values are not higher than those experienced by the network users today. To demonstrate this, we repeated our simulations on the tier-1 topology using the real OSPF weights which are used by the network operator. These weights are chosen to provide a trade-off between traffic engineering and shortest-delay routing. The results, which appear in Table 4 in the row titled OSPF (current), show that the current delays are 31.38 ms for each of the two tier-1 failure sets.

4.3 Robust Optimization for Dynamic Traffic

Solving the optimization problems repeatedly as the traffic matrix changes is undesirable due to the need to update the router configurations with new paths and splitting ratios.

Figure 5: The aggregate traffic volume in the tier-1 network has peaks at midnight GMT and 8 p.m. GMT. Examples of three demands show that their peaks occur at different times of the day.

We explore the possibility of using a single router configuration that is robust to diurnal changes of the demands. To perform this study we collected hourly Netflow traffic traces in the tier-1 network on September 29, 2009. We denote the resulting 24 hourly traffic matrices D_0, D_1, ..., D_23. Figure 5 depicts the aggregate traffic volume, as well as examples of the traffic between three ingress-egress router pairs. The aggregate traffic volume is lowest at 9 a.m. GMT and peaks with 2.5 times as much traffic at midnight and 8 p.m. GMT. Comparison to the three depicted ingress-egress router demands reveals that the traffic during a day cannot be obtained by simple scaling, as the individual demands peak at different times. This makes the joint optimization challenging.

The first step in the joint optimization is to calculate a single set of paths that guarantees failure resilience and load balancing for each of the 24 traffic matrices.
There are several approaches we can take. In the first approach, we solve linear program (2) for each traffic matrix D_i separately and use the union of the paths obtained for each matrix. The second approach is to calculate the average traffic matrix D̄ = (1/24) Σ_i D_i; the linear program (2) is then solved for the average traffic matrix. In the third approach we use the envelope of the 24 traffic matrices instead of the average, i.e., we let D̄_jk = max_i D_jk^i. In our simulations we chose the last method. Compared to the first method, it results in fewer paths. Compared to the second method, it allows better load balancing, because demands between ingress-egress pairs with high traffic variability throughout the day are represented by their peak traffic.
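Computing the envelope is an elementwise maximum over the hourly matrices; a short sketch (ours, assuming matrices keyed by ingress-egress pair, with illustrative units) follows.

    def envelope(matrices):
        """Envelope traffic matrix: D_bar[j, k] = max_i D_i[j, k], the
        third approach above.  Each matrix maps (ingress, egress) pairs
        to traffic volumes."""
        pairs = {pair for D in matrices for pair in D}
        return {pair: max(D.get(pair, 0.0) for D in matrices) for pair in pairs}

    # Demand a->b peaks in hour 0, b->a in hour 1; the envelope keeps both.
    D0 = {("a", "b"): 5.0, ("b", "a"): 1.0}
    D1 = {("a", "b"): 2.0, ("b", "a"): 4.0}
    print(envelope([D0, D1]))   # {('a', 'b'): 5.0, ('b', 'a'): 4.0}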
The second step is to calculate router configurations robust to traffic changes. We again use the envelope D̄_jk = max_i D_jk^i as the input traffic matrix and repeat the optimizations from the previous sections. Then we test the solutions by simulating the varying traffic demands during a one-day period. The resulting objective values of state-dependent splitting and state-independent splitting are depicted in Figure 6. The optimal objective in Figure 6 represents the performance of the best possible solution that uses a custom configuration updated hourly. We observe that state-dependent splitting with a single configuration is robust to diurnal traffic changes, and the value of its objective closely tracks the optimum. State-independent splitting is also close to optimal during low-congestion periods, but becomes suboptimal during the peak hours.

Figure 6: The traffic-engineering objective in the tier-1 topology with SRLGs. The state-dependent and state-independent splitting algorithms use a single configuration throughout the day. The optimal solution uses a custom configuration for each hour.

5. DEPLOYMENT SCENARIOS

Although our architecture enables the use of new, simpler routers, we can readily deploy our solution using existing protocols and equipment, as summarized in Table 5. An ISP can deploy our architecture using Multi-Protocol Label Switching (MPLS) [30]. Data centers could use the same solution, or leverage existing Ethernet switches and move some functionality into the end-host machines.

Table 5: Existing tools and protocols that can be used to deploy our architecture.

                  | ISP Backbone   | Data Center
Network element   | MPLS router    | Ethernet switch
Path installation | RSVP           | VLAN trunking
Traffic splitting | Ingress router | End host
Failure detection | BFD            | Host probing
Fast recovery     | Ingress router | End host
Traffic demands   | MPLS MIBs      | Host/VLAN counters

5.1 ISP Backbone Using MPLS

Installing MPLS paths with RSVP: MPLS is particularly suitable because ingress routers encapsulate packets with labels and direct them over pre-established Label-Switched Paths (LSPs). This enables flexible routing when multiple LSPs are established between each ingress-egress router pair. Our solution, then, could be viewed as a particular application of MPLS, where the management system computes the LSPs, instructs the ingress routers to establish the paths (say, using RSVP), and disables any dynamic recalculation of alternate paths when primary paths fail.

Hash-based splitting at ingress routers: Multipath forwarding is supported by commercial routers of both major vendors [2, 28]. The routers can be configured to hash packets based on port and address information in the header into several groups, and forward each group on a separate path. This provides path splitting with relatively fine granularity (e.g., at the 1/16th level), while preventing out-of-order packet delivery by ensuring that packets belonging to the same TCP or UDP flow traverse the same path.
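The hash-based splitting just described can be mimicked in a few lines; the sketch below (ours; real routers do this in hardware, and the 16-bucket granularity mirrors the 1/16th figure) maps each flow's 5-tuple to a bucket and buckets to paths in proportion to the configured weights.

    import zlib

    def pick_path(five_tuple, weights, buckets=16):
        """Hash a flow's 5-tuple into one of `buckets` buckets, and map
        buckets to paths in proportion to the splitting weights.  Every
        packet of a TCP/UDP flow hits the same bucket, hence the same
        path, which prevents reordering."""
        # Cumulative bucket boundaries, one per path (coarse rounding is
        # exactly the granularity limit mentioned above).
        bounds, cum = [], 0.0
        for w in weights:
            cum += w
            bounds.append(round(cum * buckets))
        b = zlib.crc32(repr(five_tuple).encode()) % buckets
        for path_index, bound in enumerate(bounds):
            if b < bound:
                return path_index
        return len(weights) - 1

    flow = ("10.0.0.1", "10.0.1.2", 6, 3345, 80)  # src, dst, proto, sport, dport
    print(pick_path(flow, [0.5, 0.25, 0.25]))      # same path for every packet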
Path-level failure detection using BFD: Fast failure detection can be done using Bidirectional Forwarding Detection (BFD) [16]. A BFD session can monitor each path between two routers, by piggybacking on the existing data traffic. (Backbones covering a large geographic region may also use existing link-level detection mechanisms for even faster recovery. For example, local path protection [29] installs a short alternate path between two adjacent routers, for temporary use after the direct link fails. However, local protection cannot fully exploit the available path diversity, leading to suboptimal load balancing; instead, local protection can be used in conjunction with our design.)

Failure recovery at ingress routers: The ingress routers adapt to path failures by splitting traffic over the remaining paths. In state-independent splitting, the ingress router has a single set of traffic-splitting weights, and automatically renormalizes to direct traffic over the working paths. State-dependent splitting requires modifications to the router software to switch to alternate traffic-splitting weights in the data plane; no hardware modifications are required.

Measuring traffic demands using SNMP: MPLS has SNMP counters (called Management Information Bases) that measure the total traffic traversing each Label-Switched Path. The management system can poll these counters to measure the traffic demands. Alternative measurement techniques, such as Netflow or tomography, may also be used.
5.2 Data Center Using Hosts and Switches

While a data center could easily use the same MPLS-based solution, control over the end hosts and the availability of cheaper commodity switches enable another solution.

End-host support for monitoring and traffic splitting: The server machines in data centers can perform many of the path-level operations in our architecture. As in the VL2 [13] and SPAIN [25] architectures, the end hosts can encapsulate the packets (say, using a VLAN tag) to direct them over a specific path. This enables much finer-grain traffic splitting. In addition, the end hosts can perform path-level probing in the data plane, by piggybacking on existing data traffic and sending additional active probes when needed. Upon detecting path failures, the end hosts can change to new path-splitting percentages based on the precomputed configuration installed by the controller. The end hosts could also measure the traffic demands by keeping counts of the traffic destined to each egress switch. These functions can be implemented in the hypervisor, such as the virtual switch that often runs on server machines in data centers.

Multiple VLANs or OpenFlow rules for forwarding: The remaining functions can be performed by the underlying switches. For example, the management system can configure multiple paths by merging these paths into a set of trees, where each tree corresponds to a different VLAN [25]. Or, if the switches support the emerging OpenFlow standard [1, 23], the management system could install a forwarding-table rule for each hop in each path, where the rule matches on the VLAN tag and forwards the packet to the appropriate output port. Since OpenFlow switches maintain traffic counters for each rule, the management system can measure the traffic demands by polling the switches, in lieu of the end hosts collecting these measurements.

6. RELATED WORK

Traffic engineering: Most of the related work treats failure recovery and traffic engineering independently. Traffic engineering without failure recovery in the context of MPLS is studied in [6, 7, 20, 33, 39]. The work in [6] utilizes traffic splitting to minimize end-to-end delay and loss rate; however, an algorithm for optimal path selection is not provided. The work in [20] and [33] minimizes the maximum link utilization while satisfying the requested traffic demands. Other papers [7, 15, 18, 39] prevent congestion by adaptively balancing the load among multiple paths based on measurements of congestion, whereas our solution precomputes traffic-splitting configurations based on both the offered traffic and the likely failures.

Failure recovery: Local and global path protection are popular failure recovery mechanisms in MPLS. In local protection the backup path takes the shortest path that avoids the outage location, from a point of local repair to the merge point with the primary path. The IETF RFC 4090 [29] focuses on defining signaling extensions to establish the backup paths, but leaves the issues of bandwidth reservation and optimal route selection open. In [37] the shortest path that avoids the failure is used. While [32] and [38] attempt to find optimal backup paths with the goal of reducing congestion, local path protection is less suitable for traffic engineering than global path protection, which allows rerouting on end-to-end paths [35]. Other works describe how to manage restoration bandwidth and select optimal paths [17, 21]. While our solution also uses global protection to reroute around failures, the biggest difference is that most of the related work distinguishes primary and backup paths and only uses a backup path when the primary path fails. In contrast, our solution balances the load across multiple paths even before failures occur, and simply adjusts the splitting ratios in response to failures.
Integrated failure recovery and TE: Previous results that integrate failure recovery with routing on multiple paths only use alternate paths when primary routes fail [31], or they require explicit congestion feedback and do not provide algorithms to find the optimal paths [19, 24]. YAMR [11] constructs a set of diverse paths in the interdomain routing setting that are resilient against a specified set of failures, but without regard to load balancing. The work in [42] integrates failure recovery with load balancing, but its focus is different: it guarantees delivery of a certain fraction of the traffic after a single edge failure, whereas our goal is to deliver all traffic for a known set of multi-edge failures. Proposals that optimize OSPF or IS-IS link weights with failures in mind, such as [9] and [27], must rely on shortest-path IGP routing and therefore cannot fully utilize the path diversity in the network.

Failure recovery and TE with multiple spanning trees: Enterprise and data-center networks often use Ethernet switches, which do not scale well because all traffic flows over a single spanning tree, even if multiple paths exist. Several papers propose more scalable Ethernet designs that use multiple paths. The work of Sharma et al. uses VLANs to exploit multiple spanning trees to improve link utilization, and achieves improved fault recovery [34]. Most of the designs, such as VL2 [13] and PortLand [26], rely on equal splitting of traffic on paths with the same cost. SPAIN [25] supports multipath routing through multiple spanning trees, with end hosts splitting traffic over the multiple paths. However, the algorithm for computing the paths does not consider the traffic demands, and the end hosts must play a stronger role in deciding which path to use for each individual flow based on the observed performance.

NP-hardness: Hardness proofs of optimization problems related to failure recovery appear, e.g., in [36] and [5].

7. CONCLUSION

In this paper we propose a mechanism that combines path protection and traffic engineering to enable reliable data delivery in the presence of link failures. We formalize the problem by providing several optimization-theoretic formulations that differ in the capabilities they require of the network routers. For each of the formulations, we present algorithms and heuristics that allow the network operator to find a set of optimal end-to-end paths and load-balancing rules. Our extensive simulations on the IP backbone of a tier-1 ISP and on a range of synthetic topologies demonstrate the attractive properties of our solution. First, state-dependent splitting achieves load-balancing performance close to the theoretical optimum, while state-independent splitting often offers comparable performance and a very simple setup. Second, using our solution does not significantly increase propagation delay compared to the shortest-path routing of OSPF. Finally, our solution is robust to diurnal traffic changes, and a single configuration suffices to provide good performance. In addition to failure resilience and favorable traffic-engineering properties, our architecture has the potential to simplify router design and reduce operation costs for ISPs as well as operators of data centers and enterprise networks.

8. REFERENCES

[1] OpenFlow Switch Consortium. http://www.openflowswitch.org/.
[2] JUNOS: MPLS fast reroute solutions, network operations guide, 2007.
[3] I. Avramopoulos and J. Rexford. Stealth probing: Securing IP routing through data-plane security. In Proc. USENIX Annual Technical Conference, June 2006.
[4] M. Caesar, M. Casado, T. Koponen, J. Rexford, and S. Shenker. Dynamic route computation considered harmful. SIGCOMM Comput. Commun. Rev., Apr. 2010.
[5] D. Coudert, P. Datta, S. Perennes, H. Rivano, and M.-E. Voge. Shared risk resource group: Complexity and approximability issues. Parallel Processing Letters, 17(2):169-184, 2007.
[6] E. Dinan, D. Awduche, and B. Jabbari. Analytical framework for dynamic traffic partitioning in MPLS networks. In IEEE International Conference on Communications, volume 3, pages 1604-1608, 2000.
[7] A. Elwalid, C. Jin, S. Low, and I. Widjaja. MATE: MPLS adaptive traffic engineering. In Proceedings of INFOCOM, volume 3, pages 1300-1309, 2001.
[8] S. Fortune, J. Hopcroft, and J. Wyllie. The directed subgraph homeomorphism problem. Theor. Comput. Sci., 10(2):111-121, 1980.
[9] B. Fortz and M. Thorup. Optimizing OSPF/IS-IS weights in a changing world. IEEE Journal on Selected Areas in Communications, 20(4):756-767, 2002.
[10] B. Fortz and M. Thorup. Increasing Internet capacity using local search. Computational Optimization and Applications, 29(1):13-48, 2004.
[11] I. Ganichev, B. Dai, B. Godfrey, and S. Shenker. YAMR: Yet another multipath routing protocol. SIGCOMM Comput. Commun. Rev., 40(5):14-19, 2010.
[12] S. Goldberg, D. Xiao, E. Tromer, B. Barak, and J. Rexford. Path-quality monitoring in the presence of adversaries. In Proc. ACM SIGMETRICS, June 2008.
[13] A. Greenberg et al. VL2: A scalable and flexible data center network. SIGCOMM Comput. Commun. Rev., 39:51-62, 2009.
[14] I. P. Kaminow and T. L. Koch. The Optical Fiber Telecommunications IIIA. Academic Press, New York, 1997.
[15] S. Kandula, D. Katabi, B. Davie, and A. Charny. Walking the tightrope: Responsive yet stable traffic engineering. In Proc. ACM SIGCOMM, pages 253-264, 2005.
[16] D. Katz and D. Ward. Bidirectional forwarding detection (BFD). IETF RFC 5880, 2010.
[17] M. Kodialam and T. V. Lakshman. Dynamic routing of restorable bandwidth-guaranteed tunnels using aggregated network resource usage information. IEEE/ACM Trans. Netw., 11(3):399-410, 2003.
[18] A. Kvalbein, C. Dovrolis, and C. Muthu. Multipath load-adaptive routing: Putting the emphasis on robustness and simplicity. In IEEE ICNP, 2009.
[19] C. M. Lagoa, H. Che, and B. A. Movsichoff. Adaptive control algorithms for decentralized optimal traffic engineering in the Internet. IEEE/ACM Trans. Netw., 12(3):415-428, 2004.
[20] Y. Lee, Y. Seok, Y. Choi, and C. Kim. A constrained multipath traffic engineering scheme for MPLS networks. In IEEE International Conference on Communications, volume 4, pages 2431-2436, 2002.
[21] Y. Liu, D. Tipper, and P. Siripongwutikorn. Approximating optimal spare capacity allocation by successive survivable routing. IEEE/ACM Trans. Netw., 13(1):198-211, 2005.
[22] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, Y. Ganjali, and C. Diot. Characterization of failures in an operational IP backbone network. IEEE/ACM Trans. Netw., 16(4):749-762, 2008.
[23] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev., 38:69-74, 2008.
[24] B. A. Movsichoff, C. M. Lagoa, and H. Che. End-to-end optimal algorithms for integrated QoS, traffic engineering, and failure recovery. IEEE/ACM Trans. Netw., 15(4):813-823, 2007.
[25] J. Mudigonda, P. Yalagandula, M. Al-Fares, and J. C. Mogul. SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies. In Proc. Networked Systems Design and Implementation, Apr. 2010.
[26] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A scalable fault-tolerant layer 2 data center network fabric. SIGCOMM Comput. Commun. Rev., 39:39-50, 2009.
[27] A. Nucci, S. Bhattacharyya, N. Taft, and C. Diot. IGP link weight assignment for operational tier-1 backbones. IEEE/ACM Trans. Netw., 15(4):789-802, 2007.
[28] E. Osborne and A. Simha. Traffic Engineering with MPLS. Cisco Press, Indianapolis, IN, 2002.
[29] P. Pan, G. Swallow, and A. Atlas. Fast reroute extensions to RSVP-TE for LSP tunnels. IETF RFC 4090, 2005.
[30] E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol label switching architecture. IETF RFC 3031, 2001.
[31] H. Saito, Y. Miyao, and M. Yoshida. Traffic engineering using multiple multipoint-to-point LSPs. In Proceedings of INFOCOM, volume 2, pages 894-901, 2000.
[32] H. Saito and M. Yoshida. An optimal recovery LSP assignment scheme for MPLS fast reroute. In International Telecommunication Network Strategy and Planning Symposium (Networks), pages 229-234, 2002.
[33] Y. Seok, Y. Lee, Y. Choi, and C. Kim. Dynamic constrained multipath routing for MPLS networks. In International Conference on Computer Communications and Networks, pages 348-353, 2001.
[34] S. Sharma, K. Gopalan, S. Nanda, and T.-c. Chiueh. Viking: A multi-spanning-tree Ethernet architecture for metropolitan area and cluster networks. In Proceedings of INFOCOM, volume 4, pages 2283-2294, 2004.
[35] V. Sharma and F. Hellstrand. Framework for
multi-protocol label switching (MPLS)-based recovery. IETF RFC 3469, 2003.
[36] A. Tomaszewski, M. Pioro, and M. Zotkiewicz. On the complexity of resilient network design. Networks, 55(2), 2010.
[37] J.-P. Vasseur, M. Pickavet, and P. Demeester. Network Recovery: Protection and Restoration of Optical, SONET-SDH, IP, and MPLS, pages 397-422. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2004.
[38] D. Wang and G. Li. Efficient distributed bandwidth management for MPLS fast reroute. IEEE/ACM Trans. Netw., 16(2):486-495, 2008.
[39] J. Wang, S. Patek, H. Wang, and J. Liebeherr. Traffic engineering with AIMD in MPLS networks. In IEEE International Workshop on Protocols for High Speed Networks, pages 192-210, 2002.
[40] D. Wendlandt, I. Avramopoulos, D. Andersen, and J. Rexford. Don't secure routing protocols, secure data delivery. In Proc. ACM SIGCOMM Workshop on Hot Topics in Networks, Nov. 2006.
[41] E. W. Zegura. GT-ITM: Georgia Tech internetwork topology models (software), 1996.
[42] W. Zhang, J. Tang, C. Wang, and S. de Soysa. Reliable adaptive multipath provisioning with bandwidth and differential delay constraints. In Proceedings of INFOCOM, pages 2178-2186, 2010.

APPENDIX

A. PROOFS

This Appendix shows that two problems are NP-hard:

Failure State Distinguishing
INSTANCE: A directed graph G = (V, E), source and destination vertices u, v ∈ V, and two sets s, s′ ⊆ E.
QUESTION: Is there a simple directed path P from u to v that contains edges from one and only one of the sets s and s′?

Bounded Path Load Balancing
INSTANCE: A directed graph G = (V, E) with a positive rational capacity c_e for each edge e ∈ E, a collection S of failure states s ⊆ E with a rational weight w_s for each s ∈ S, a set of triples (u_d, v_d, h_d), 1 ≤ d ≤ k, corresponding to demands, where h_d units of demand d need to be sent from source vertex u_d ∈ V to destination vertex v_d ∈ V, an integer bound J on the number of paths that can be used between any source-destination pair, a piecewise-linear increasing cost function Φ(l) mapping edge loads l to rationals, and an overall cost bound B.
QUESTION: Are there J (or fewer) paths between each source-destination pair such that the given demands can be assigned to the paths so that the cost (the sum of Φ(l) over all edges and weighted failure states, as described in the text) is B or less?

To prove that a problem X is NP-hard, we must show that for some known NP-hard problem Y, any instance y of Y can be transformed into an instance x of X in polynomial time, with the property that the answer for y is yes if and only if the answer for x is yes. Both our problems can be proved NP-hard by transformations from the following problem, proved NP-hard by Fortune, Hopcroft, and Wyllie [8].

Disjoint Directed Paths
INSTANCE: A directed graph G(V, E) and distinguished vertices u_1, v_1, u_2, v_2 ∈ V.
QUESTION: Are there directed paths P_1 from u_1 to v_1 and P_2 from u_2 to v_2 such that P_1 and P_2 are vertex-disjoint?

Theorem 1. The Failure State Distinguishing problem is NP-hard.

Proof. Suppose we are given an instance G = (V, E), u_1, v_1, u_2, v_2 of Disjoint Directed Paths. Our constructed instance of Failure State Distinguishing consists of the graph G′ = (V, E′), where E′ = E ∪ {(v_1, u_2)}, with u = u_1, v = v_2, s = ∅, and s′ = {(v_1, u_2)}. Given this choice of s and s′, a simple directed path from u to v that distinguishes the two states must contain the edge (v_1, u_2). We claim that such a path exists if and only if there are vertex-disjoint directed paths P_1 from u_1 to v_1 and P_2 from u_2 to v_2. Suppose a distinguishing path P exists.
Then it must consist of three segments: a path P_1 from u = u_1 to v_1, the edge (v_1, u_2), and then a path P_2 from u_2 to v = v_2. Since it is a simple path, P_1 and P_2 must be vertex-disjoint. Conversely, if vertex-disjoint paths P_1 from u_1 to v_1 and P_2 from u_2 to v_2 exist, then the path P that concatenates P_1, followed by (v_1, u_2), followed by P_2 is our desired distinguishing path.

Theorem 2. The Bounded Path Load Balancing problem is NP-hard even if there are only two commodities (k = 2), only one path is allowed for each (J = 1), and there is only one failure state.

Proof. For this result we use the variant of Disjoint Directed Paths in which we ask for edge-disjoint rather than vertex-disjoint paths. The NP-hardness of this variant is easy to prove, using a construction in which each vertex x of G is replaced by a pair of new vertices in_x and out_x joined by an edge (in_x, out_x), and each edge (x, y) is replaced by the edge (out_x, in_y).

Suppose we are given an instance G = (V, E), u_1, v_1, u_2, v_2 of the edge-disjoint variant of Disjoint Directed Paths. Our constructed instance of Bounded Path Load Balancing is based on the same graph, with each edge e given capacity c_e = 1, with the single failure state s = ∅ (i.e., the state with no failures), with w_s = 1, and with the demands represented by the triples (u_1, v_1, 1) and (u_2, v_2, 1). The cost function Φ has derivative Φ′(l) = 1 for 0 ≤ l ≤ 1, and Φ′(l) = |E| for l > 1. Our target overall cost bound is B = |E|.

If the desired disjoint paths exist, we can use P_1 to send the required unit of traffic from u_1 to v_1, and P_2 to send the required unit of traffic from u_2 to v_2. Since the paths are edge-disjoint, no edge will carry more than one unit of traffic, so the cost per edge used is 1, and the total number of edges used is at most |E|. Thus the specified cost bound B = |E| is met. On the other hand, if no such pair of paths exists, then we must choose paths P_1 and P_2 that share at least one edge, which will carry two units of flow, for an overall cost of at least |E| + 1, just for that edge. Thus if there is a solution with cost |E| or less, the desired disjoint paths must exist.

Adding more paths, failure states, or commodities cannot make the problem easier. Note, however, that this does not imply that the problem for the precise cost function Φ presented in the text is NP-hard. It does, however, mean that, assuming P ≠ NP, any efficient algorithm for that Φ would have to exploit the particular features of that function.