DRCN 15 - March 25, 2015. Detour planning for fast and reliable fault recovery in SDN with OpenState. Antonio Capone^, Carmelo Cascone^*, Alessandro Q.T. Nguyen^*, Brunilde Sansò^. Joint work with: Luca Pollini^, Davide Sanvito^. ^Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria. *Polytechnique Montréal, Département de génie électrique. This work has been partly funded by the EC in the context of the BEBA project.
Outline: Introduction. Software-defined Networking (SDN). OpenFlow. Failure recovery in OpenFlow. OpenState. Our approach to failure recovery. Backup path planning. Experimental validation & results.
Software-defined Networking (SDN). Traditional networking paradigm: closed-platform switches, each embedding both control-plane and data-plane. SDN paradigm: a remote controller hosts the control-plane and programs data-plane-only switches through an open standard API.
OpenFlow SDN API. The switch (1) sends packet notifications to the controller; the controller (2) installs/updates rules in the switch's flow table. Example flow table:

IP src      | IP dest     | TCP dest | Actions
192.168/16  | 10/8        | any      | Port 2
192.168/16  | any         | 80       | Rate limit, Port 13
any         | 192.168/16  | 22       | Drop
any         | any         | any      | Send to controller
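The flow table above can be sketched as a priority-ordered match/action lookup. This is a minimal illustration, not an OpenFlow implementation; the entry format and the "first match wins" convention are assumptions.

```python
# Minimal sketch of OpenFlow-style flow-table matching (illustrative only).
import ipaddress

FLOW_TABLE = [
    # (ip_src, ip_dst, tcp_dst, actions) -- checked in priority order,
    # None acts as the "any" wildcard.
    ("192.168.0.0/16", "10.0.0.0/8", None, ["output:2"]),
    ("192.168.0.0/16", None, 80, ["rate_limit", "output:13"]),
    (None, "192.168.0.0/16", 22, ["drop"]),
    (None, None, None, ["send_to_controller"]),  # table-miss entry
]

def match(pkt_src, pkt_dst, pkt_tcp_dst):
    """Return the action list of the first matching entry."""
    for src, dst, tcp, actions in FLOW_TABLE:
        if src and ipaddress.ip_address(pkt_src) not in ipaddress.ip_network(src):
            continue
        if dst and ipaddress.ip_address(pkt_dst) not in ipaddress.ip_network(dst):
            continue
        if tcp is not None and tcp != pkt_tcp_dst:
            continue
        return actions
    return []
```

The last wildcard entry reproduces the table-miss behavior on the slide: unmatched packets are sent to the controller.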
Failure recovery in OpenFlow. Fast-failover: local reroute to a backup path based on port status (OpenFlow 1.1+). Weak! What if a local reroute is not available?
Failure recovery in OpenFlow (2). The controller can react to a link status change by updating flow entries, but: long recovery latency (detection + signaling + flow update); failure of the control channel (controller unreachable); signaling congestion (controller unresponsive). The controller is a single point of failure!
Stateless vs. Stateful SDN. Stateless data-plane model (e.g. OpenFlow): the controller holds global + local states (smart), the switch is stateless (dumb); every adaptation requires event notifications up to the controller and control enforcement back down. Stateful data-plane model: the controller keeps global states and delegates control of local states to the switch, which becomes smart enough to auto-adapt.
OpenState. Stateful extension to OpenFlow: a finite-state machine (FSM) abstraction; forwarding based on flow-states. Packet headers are first matched in the state table (match key → state, DEFAULT on miss); the flow table then matches on fields + state and returns actions, possibly including a SET_STATE action that writes the next-state back into the state table. G. Bianchi, M. Bonola, A. Capone, and C. Cascone, "OpenState: Programming Platform-independent Stateful OpenFlow Applications Inside the Switch," SIGCOMM CCR, Apr. 2014.
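The two-table lookup/update cycle can be sketched as follows. Class and key names are illustrative assumptions, not the OpenState wire format; the point is the cycle: state lookup, combined (match, state) flow lookup, optional SET_STATE write-back.

```python
# Sketch of the OpenState lookup/update cycle.
DEFAULT = "DEFAULT"

class OpenStateStage:
    def __init__(self, flow_table):
        self.state_table = {}         # flow key -> state (DEFAULT on miss)
        self.flow_table = flow_table  # (match key, state) -> (actions, next-state)

    def process(self, flow_key, match_key):
        state = self.state_table.get(flow_key, DEFAULT)
        actions, next_state = self.flow_table.get(
            (match_key, state), (["drop"], None))
        if next_state is not None:    # SET_STATE action
            self.state_table[flow_key] = next_state
        return actions
```

A tagged packet can thus flip a flow's state once, and every later packet of that flow matches a different flow-table entry without any controller involvement.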
Approach sketch. Faults are signaled using the data packets themselves: a tag carrying the failed link ID is pushed onto the packet; packets are sent back along the primary path until a convenient redirect point; a state transition there updates the routing via flow-states. No extra signaling. No packet loss after failure detection. Controller not involved.
Running example. The detect node observes a port status change, pushes a tag with Fault_ID = 20 and bounces the packet backward. At the redirect node, the tagged packet triggers a state transition (flow state: DEF → 20); subsequent packets of the flow match state 20 and follow the detour directly.
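The detect/redirect roles in the example can be sketched end to end. Function names, the packet-as-dict representation and the tag format are assumptions for illustration; only the mechanism (tag push, backward bounce, state transition at the redirect node) follows the slide.

```python
# Toy walk-through of the detour mechanism (illustrative names/format).
FAULT_ID = 20

def detect_node(pkt, port_up):
    """On a failed output port, push the fault tag and bounce the packet."""
    if not port_up:
        pkt["tag"] = FAULT_ID
        pkt["direction"] = "backward"
    return pkt

def redirect_node(pkt, flow_states, flow_key):
    """Tagged packet: record the fault in the flow state (state transition),
    strip the tag, forward on the detour. Later packets follow the state."""
    if "tag" in pkt:
        flow_states[flow_key] = pkt.pop("tag")
        pkt["direction"] = "detour"
    elif flow_states.get(flow_key) == FAULT_ID:
        pkt["direction"] = "detour"
    return pkt
```

Note that the bounced packet itself is rerouted, not dropped, which is why no packets are lost after detection.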
Example on a larger network. Roles along the primary path: the detect node bounces tagged packets through the reverse-path nodes back to the redirect node, which forwards them onto the detour nodes.
Backup path planning. Key features: single-failure scenario; failure-type-agnostic characterization (link/node); failure detection event (n, m) from node n to m. Input: capacitated network graph, traffic demands, primary paths. Two MILP formulations: 1st with a 3-term objective function, 2nd for congestion avoidance.
1st formulation. Decision variables. Backup path: 1 if link (i, j) belongs to the backup path of demand d in case of failure detection event (n, m), 0 otherwise. Reverse path: number of backward hops that a tagged packet of demand d must perform in case of failure detection event (n, m) before reaching the reroute node. Link allocation: 1 if link (i, j) is used by at least one backup path for demand d, 0 otherwise.
1st formulation. Weighted 3-term objective function: length of the reverse path; length of the backup path; link capacity allocation. Tip! This formulation can also be used to compute OpenFlow fast-failover local reroutes (no reverse path).
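With the decision variables defined above, the 3-term objective can be sketched as follows. The symbols b, r, y and the weights α, β, γ are assumed names, since the slide gives only the term descriptions, not the notation:

```latex
\min \;
\alpha \sum_{d}\sum_{(n,m)} r_{d}^{nm}
\;+\;
\beta \sum_{d}\sum_{(n,m)}\sum_{(i,j)} b_{ij}^{d,nm}
\;+\;
\gamma \sum_{d}\sum_{(i,j)} y_{ij}^{d}
```

Here b_{ij}^{d,nm} is the binary backup-path variable, r_{d}^{nm} the reverse-path hop count, and y_{ij}^{d} the binary link-allocation variable; the weights trade off recovery latency (backward hops) against detour length and capacity usage.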
2nd formulation. Congestion-related objective function: link cost as a function of the link load w.r.t. all possible failures.
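One plausible reading of this objective, written out under assumed notation (the slide names only "link cost" and "link load w.r.t. all possible failures"): minimize the total link cost, where each link's cost bounds an increasing function of its load under every failure event:

```latex
\min \; \sum_{(i,j)\in E} \phi_{ij}
\qquad \text{s.t.} \quad
\phi_{ij} \;\ge\; f\!\left(\frac{\ell_{ij}^{\,nm}}{c_{ij}}\right)
\quad \forall (n,m)
```

with \ell_{ij}^{nm} the load on link (i, j) when the backup paths for failure event (n, m) are active, c_{ij} the link capacity, and f an increasing (e.g. piecewise-linear) cost function that penalizes highly utilized links.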
Computational results 16
Emulated testbed Mininet emulator Patched with OpenState support Topology Norway 27 switches, 51 links Out-of-band control channel Fixed to 12 ms delay Experimental results Packet loss OpenFlow vs. OpenState 17
Test topology: Norway. Worst link: 13 demands to be rerouted. Traffic generation details: 13 traffic demands; 2700 ping requests for each demand; rate 20-160 req/s.
Experimental results, OpenState vs. OpenFlow: number of packets lost vs. ping rate (req/s), for each of OpenState (ideal), OpenState (realistic), OpenFlow (ideal), OpenFlow (realistic). Ideal case: 0 ms failure detection delay. Realistic case: switch-embedded failure detection mechanism.
Conclusions. OpenFlow: weak support for fault tolerance; the fully stateful controller is a single point of failure. OpenState: stateful extension to OpenFlow; signaling using the data packets themselves; controller independence; no packet loss after detection. Two formulations for backup path planning, modeled on OpenState. Experimental results: minimal packet loss when using OpenState.
www.openstate-sdn.org Open-source Download & try Controller Switch Example applications Mininet emulation 21
Thanks! Q&A carmelo.cascone@polymtl.ca http://ccascone.net
FSM description. Flow-states are updated according to the tag value: DEFAULT = normal operation (forward on primary); S1 = forward on detour for a fault on link 1, etc.; Sn = forward on detour for a fault on link n. The controller can restore the primary path (DEF state) via a CTRL message once the fault has been fixed.
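The state machine above can be written as a plain transition table. State and event names follow the slide; the table encoding itself is an illustrative sketch, not an OpenState configuration.

```python
# Per-flow FSM from the slide as a transition table.
DEF = "DEF"

TRANSITIONS = {
    # (current state, event) -> next state
    (DEF, "tag=1"): "S1",           # tagged packet: detour for fault on link 1
    (DEF, "tag=n"): "Sn",           # ... one detour state per protected link
    ("S1", "ctrl_restore"): DEF,    # controller restores the primary path
    ("Sn", "ctrl_restore"): DEF,
}

FORWARDING = {DEF: "primary", "S1": "detour", "Sn": "detour"}

def step(state, event):
    """Apply one event; unknown (state, event) pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Staying in the current state on unknown events mirrors normal operation: untagged data packets are forwarded without touching the flow-state.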
Test instances Network topologies used in test instances: (a) Polska, (b) Norway, and (c) Fat tree 24