Longer is Better? Exploiting Path Diversity in Data Centre Networks Fung Po (Posco) Tso, Gregg Hamilton, Rene Weber, Colin S. Perkins and Dimitrios P. Pezaros University of Glasgow
Cloud Data Centres Are used to create Cloud services Require a significant investment in capital outlay Accommodate tens of thousands of machines
Google's Data Centre in Council Bluffs, Iowa $600 million
Microsoft Data Centre, Dublin. $500m
Facebook's Data Centre, North Carolina $606 million
Apple's Data Centre, Maiden $1 billion
Cost of DC Outages http://www.virtualhosting.com/blog/2013/outrageous-costs-data-center-downtime/
Cloud Data Centres Collocated processing, network, storage resources Network topologies built from commodity data comm. mechanism Static resource management Oversubscribed network bandwidth [Figure: tiered data centre network topology below the Internet, with core, aggregation and edge layers]
DC Traffic Engineering Bandwidth can be major DC performance bottleneck Extensive server-to-server communication Increase in latency can cause significant revenue loss (cf. Google, Amazon Reports)
DC Traffic Engineering Existing approaches have shortcomings Static hashing Only schedule large flows (Hedera) Require advance knowledge of traffic demand
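The "static hashing" shortcoming can be illustrated with a minimal ECMP-style sketch. The CRC32 hash, the 5-tuple layout and the example flows are assumptions made for illustration; real switches use vendor-specific hash functions. The point is only that the uplink is a fixed function of the header fields, so link load never influences the choice.

```python
import zlib

def ecmp_uplink(five_tuple, n_uplinks):
    """Pick an uplink by hashing the flow's 5-tuple; link load is never consulted."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % n_uplinks

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(8)]

# Each flow always maps to the same uplink, however congested that uplink is.
placement = [ecmp_uplink(flow, n_uplinks=4) for flow in flows]
```

Because the mapping is static, two large flows that happen to hash to the same uplink stay there for their lifetime.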
DC Traffic Engineering Opportunities for adaptive, measurement-based resource provisioning Software Defined Networking (SDN) Hardware-accelerated switches Centralised ownership
Baatdaat: Aims & Objectives Baatdaat (八達): Reachable in all directions Actively avoid congestion based on real-time direct measurement of network utilisation Use non-shortest but lightly-utilised paths (detours) to better exploit resource redundancy
Baatdaat: Architecture
OpenFlow Switches (Aggregation Switches; Network Switches, Hardware Space)
1. Measures link utilization locally
2. Places flows on to least utilized paths (uplinks)
3. Maintains hash table for multipathing
4. Reports local link utilization statistics to controller
OpenFlow Controller (Software Space)
1. Store link utilization statistics
2. Compute all possible detour paths
[Figure: switches and controller connected by arrows labelled Flow Entries and Statistics Report]
Baatdaat: Architecture Considerably alter the typical SDN paradigm Switches individually monitor their adjacent links and schedule flows independently Avoid bottleneck at the controller Avoid delaying flow admission Maintain flow-level scheduling (5-tuple)
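The switch-side behaviour described on this slide (local utilisation, least-utilised uplink, a hash table keyed by 5-tuple) can be sketched roughly as follows. This is an illustrative model only, not the authors' switch firmware; the class name, method names and port numbers are hypothetical.

```python
class SwitchLocalScheduler:
    """Toy model of switch-local, flow-level placement (hypothetical names)."""

    def __init__(self, uplinks):
        # Most recent locally measured utilisation per uplink, as a fraction of capacity.
        self.utilisation = {port: 0.0 for port in uplinks}
        # Hash table for multipathing: 5-tuple -> chosen uplink.
        self.flow_table = {}

    def update_utilisation(self, port, value):
        self.utilisation[port] = value

    def place(self, five_tuple):
        # Known flows stay on their uplink, preserving flow-level scheduling.
        if five_tuple in self.flow_table:
            return self.flow_table[five_tuple]
        # New flows go to the currently least-utilised uplink.
        port = min(self.utilisation, key=self.utilisation.get)
        self.flow_table[five_tuple] = port
        return port

sched = SwitchLocalScheduler(uplinks=[1, 2, 3])
sched.update_utilisation(1, 0.5)
sched.update_utilisation(2, 0.1)
sched.update_utilisation(3, 0.3)
first = sched.place(("10.0.0.1", "10.0.1.1", 6, 40000, 80))
```

The decision needs no round trip to the controller, which is how this design avoids delaying flow admission.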
Baatdaat: Path Computation
Detour constraints (empirical)
Only happen between aggregation and ToR layers
Downlink aggregation switches
Use detour if utilisation of shortest paths crosses the 30% threshold
Weighting factor to penalise non-shortest paths
Only allow two hops longer
Shortest Paths: 4→1→5, 4→2→5, 4→3→5
Additional Detour Paths: 4→1→6→2→5, 4→1→6→3→5, 4→2→6→1→5, 4→2→6→3→5, 4→3→6→1→5, 4→3→6→2→5
Path diversity increased by: k/2 × (k/2 − 1) × (k/2 − 2)
[Figure: pod with switches numbered 1 to 6, showing the shortest and detour paths between switches 4 and 5]
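The path lists on this slide can be reproduced with a short enumeration. The sketch assumes, based on the listed paths, that switches 1 to 3 are aggregation switches, switches 4 to 6 are ToR switches, and a detour bounces off one other ToR and returns through a different aggregation switch. The function name is hypothetical.

```python
from itertools import permutations

def pod_paths(src_tor, dst_tor, aggr, tors):
    """Enumerate shortest paths and two-hop-longer detours inside one pod."""
    # Shortest paths: up to any aggregation switch, then straight down.
    shortest = [(src_tor, a, dst_tor) for a in aggr]
    # Detours: up to a, down to another ToR t, up to a different b, then down.
    others = [t for t in tors if t not in (src_tor, dst_tor)]
    detours = [(src_tor, a, t, b, dst_tor)
               for a, b in permutations(aggr, 2)
               for t in others]
    return shortest, detours

shortest, detours = pod_paths(4, 5, aggr=[1, 2, 3], tors=[4, 5, 6])
for path in shortest + detours:
    print("→".join(map(str, path)))
```

With k/2 = 3 switches per layer, the detour count is 3 × 2 × 1 = 6, matching the k/2 × (k/2 − 1) × (k/2 − 2) expression on the slide.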
Baatdaat: Load-aware Scheduling [Figure: the 3 shortest paths from the source host to the destination host]
Baatdaat: Load-aware Scheduling
Start
At ToR: pick min(util)
At aggr: util(shortest) crosses the 30% threshold?
no: use shortest path
yes: use min(util(shortest), util(detour))
utilisation(detour_path) = max(x, y, z) × c, where c is between 1.5 and 2 and x, y, z are the utilisations labelled on the detour path in the figure
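The decision at the aggregation layer can be sketched as below. Several details are assumptions: the threshold comparison is taken as "at or above 30%" (the comparison symbol is not visible on the slide), c is fixed at 1.5 from the stated range, and the function names and path labels are hypothetical.

```python
THRESHOLD = 0.30  # detour threshold from the slide
C = 1.5           # weighting factor; the slide gives a range of 1.5 to 2

def detour_cost(link_utils, c=C):
    """Weighted utilisation of a detour: the busiest link, penalised by c."""
    return max(link_utils) * c

def choose_path(shortest, detours, threshold=THRESHOLD, c=C):
    """shortest: {path: utilisation}; detours: {path: [per-link utilisations]}."""
    best_short = min(shortest, key=shortest.get)
    # Assumption: detours are only considered at or above the threshold.
    if shortest[best_short] < threshold or not detours:
        return best_short
    best_detour = min(detours, key=lambda p: detour_cost(detours[p], c))
    if detour_cost(detours[best_detour], c) < shortest[best_short]:
        return best_detour
    return best_short

busy = {"4-1-5": 0.6, "4-2-5": 0.5}
choice = choose_path(busy, {"4-1-6-3-5": [0.1, 0.2, 0.1]})
```

In this example the detour's weighted utilisation is 0.2 × 1.5, which is below the best shortest path at 0.5, so the detour is selected. The weighting factor means a detour must be clearly less loaded before it wins.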
Baatdaat: Switch Implementation
Added switch-local multipath support to OpenFlow 1.0
Use wildcard table also as a forwarding table
Flow entry will be added to exact match table
[Figure: switch pipeline; blocks labelled Input Arbiter, Header Parser, Exact Match Lookup, Wildcard Lookup, Arbiter, Packet Editor, Output Queues, OpenFlow Firmware, Write Wildcard Table, Write Exact Match Table, Link Measurement, Signal Miss and OpenFlow Output Port Lookup; edges labelled hit and miss]
Baatdaat: Experimental Results ns-3: k=8 fat-tree (128 servers; 8 pods, 8 switches each) with latency sensitive 4, 8, 100 KB flows [Figure: three CDFs of Maximum Link Utilization (%) comparing Optimal, Baatdaat and ECMP; three CDFs of Flow Completion Time (ms) comparing Baatdaat and ECMP]
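The topology figures quoted above follow from the standard k-ary fat-tree construction (k pods, each with k/2 edge and k/2 aggregation switches, and (k/2)² servers per pod). A small sketch to check the arithmetic; the function name and dictionary keys are hypothetical.

```python
def fat_tree_size(k):
    """Element counts of a standard k-ary fat-tree (k must be even)."""
    if k % 2:
        raise ValueError("k must be even")
    half = k // 2
    return {
        "pods": k,
        "switches_per_pod": 2 * half,   # k/2 edge + k/2 aggregation
        "core_switches": half * half,   # (k/2)^2
        "servers": k * half * half,     # k^3 / 4
    }

size = fat_tree_size(8)
```

For k=8 this gives 8 pods, 8 switches per pod and 128 servers, in agreement with the simulated setup.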
Baatdaat: Experimental Results Impact of measurement interval and path length (4, 8, 100 KB flows) [Figure: CDF of Maximum Link Utilization (%) for Baatdaat with 1 ms, 10 ms and 100 ms measurement intervals versus ECMP; CDF of Maximum Link Utilization (%) for Baatdaat with 2 Hops and 4 Hops versus ECMP]
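To make the role of the measurement interval concrete, link utilisation can be derived from the change in a port's byte counter over one interval. This is a generic sketch, not the authors' measurement code; the function name, the 1 Gb/s capacity and the counter values are assumptions, and counter wrap-around is ignored.

```python
def link_utilisation(bytes_before, bytes_after, interval_s, capacity_bps):
    """Fraction of link capacity used between two byte-counter readings."""
    bits_sent = (bytes_after - bytes_before) * 8
    return bits_sent / (interval_s * capacity_bps)

# Hypothetical reading: 62,500 bytes sent in 1 ms on a 1 Gb/s link.
util = link_utilisation(0, 62_500, interval_s=0.001, capacity_bps=1e9)
```

A shorter interval reacts faster to bursts, while a longer one averages them out, which is the trade-off explored in the results above.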
Take Away Adaptive, measurement-based provisioning for data centre networks Opportunities due to collocation of resources, redundancy, short control timescales, and network programmability Baatdaat is a measurement-based flow scheduling system for Cloud DCs
Thank You posco.tso@glasgow.ac.uk