Multipath TCP in Data Centres (work in progress)
Costin Raiciu
Joint work with Christopher Pluntke, Adam Greenhalgh, Sebastien Barre, Mark Handley, Damon Wischik
Data Centre Trends
- Cloud services are driving the creation of more and more data centres
  - Leverage economies of scale
  - Ability to dynamically provision servers among services as workload changes
- Many diverse applications run in DCs: search, large-scale processing (MapReduce), storage, web hosting, etc.
- To be profitable, data centres must be highly utilized
  - Need to be able to assign any server to any service
  - Need high bandwidth and location independence
Tree DC Topology
[Diagram: a core switch connected by 10Gbps links to aggregation switches, which connect over 1Gbps links to top-of-rack switches and racks of servers]
Fat Tree Topology [Fares, 2008]
[Diagram: a k=4 fat tree with k pods of k switches each, aggregation switches, and racks of servers, all connected by 1Gbps links]
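The k=4 example generalizes. As a sketch (assuming the standard k-ary fat tree of [Fares, 2008], with even k; the function name is illustrative), the element counts are:

```python
def fat_tree_sizes(k):
    """Element counts for a k-ary fat tree [Fares, 2008]; k must be even."""
    assert k % 2 == 0
    core = (k // 2) ** 2      # (k/2)^2 core switches
    agg = k * (k // 2)        # k/2 aggregation switches per pod, k pods
    edge = k * (k // 2)       # k/2 top-of-rack (edge) switches per pod
    hosts = k ** 3 // 4       # (k/2)^2 hosts per pod, k pods
    return core, agg, edge, hosts
```

For k=4 this gives 16 hosts; for k=32 it gives 8192 hosts, the "8K hosts" network simulated later in the talk.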
Using the Fat Tree is Challenging
- Need to place flows on paths
- Obvious solution: Valiant Load Balancing (ECMP)
  - Suffers from collisions
- Alternative: a centralized scheduler
  - Knows the whole network
  - Places only large flows
  - Fundamentally limited by reaction speed and in-network knowledge
Multipath TCP
- Can naturally use the multiple paths and find available bandwidth
- Can use existing multipath routing inside DCs: ECMP, VLANs
- Memory usage: RTTs and RTOs are small enough that this doesn't matter
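A minimal sketch of how ECMP spreads subflows: switches hash flow headers to pick among equal-cost paths, so MPTCP subflows that differ only in source port usually land on different paths. The function and hash choice here are illustrative, not a real switch implementation:

```python
import hashlib

def ecmp_path(src, dst, sport, dport, n_paths):
    """Deterministically map a flow's 4-tuple to one of n_paths
    equal-cost paths, as an ECMP switch would (hash choice is arbitrary)."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % n_paths

# Two subflows of one MPTCP connection differ in source port,
# so they typically hash to different paths.
path_a = ecmp_path("10.0.0.1", "10.0.1.2", 4001, 80, 256)
path_b = ecmp_path("10.0.0.1", "10.0.1.2", 4002, 80, 256)
```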
Which Congestion Control?
- Linked Increases
  - Less aggressive: the connection behaves as one TCP
  - Better fairness
- Uncoupled
  - Strong correlation between the number of subflows and throughput (more subflows unfairly take more bandwidth)
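The "behaves as one TCP" property comes from coupling the additive increase across subflows. A simplified per-ACK sketch, with windows in packets and the aggressiveness parameter a taken as given (in the full algorithm it is derived from subflow rates and RTTs):

```python
def lia_increase(windows, r, a):
    """One ACK on subflow r under Linked Increases (simplified sketch):
    grow w_r by min(a / total_window, 1 / w_r), so the connection as a
    whole is no more aggressive than a single TCP flow, while the
    per-subflow cap keeps any one subflow from exceeding plain TCP."""
    total = sum(windows)
    windows[r] += min(a / total, 1.0 / windows[r])
    return windows
```

An uncoupled alternative would add 1/w_r on every ACK, which is exactly why its throughput grows with the number of subflows.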
Experiment Setup
- Flow-level simulator, up to 30K hosts
- Traffic: a permutation matrix in which each host does an infinite bulk transfer to a single other host
- Routing: choose one of the paths randomly and use it
- Measure:
  - Bisection bandwidth: sum of the average throughputs of all flows
  - Fairness
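The permutation traffic matrix can be sketched as follows (names are illustrative): every host sends to exactly one other host, every host receives exactly one flow, and no host sends to itself:

```python
import random

def permutation_matrix(n, seed=0):
    """Pick a destination dst[i] for each host i such that the
    destinations form a permutation with no fixed points."""
    rng = random.Random(seed)
    while True:
        dst = list(range(n))
        rng.shuffle(dst)
        if all(i != d for i, d in enumerate(dst)):  # reject self-sends
            return dst
```

Rejection sampling terminates quickly: a random permutation has no fixed points with probability about 1/e.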
Multipath TCP in the Fat Tree Topology (k=32: 8K hosts, 256 paths between endpoints)
Multipath TCP in the Fat Tree Topology
Other Topologies: VL2 Throughput
VL2: Fairness
Other Topologies: BCube
BCube: Fairness
How Many Subflows?
- To get 90% utilization:
  - 8 subflows are enough in Fat Tree (up to 30K hosts)
  - 2 subflows are enough in VL2
Are 8 Subflows Too Many?
- Increased the number of flows and subflows; measured loss rate and network utilization
- Linked Increases with 8 subflows has a loss rate similar to regular TCP's, and the same utilization
- It has proportionally more timeouts, which does affect app-to-app delay and burstiness
Centralized Scheduling [Fares, 2010]
- A centralized server that:
  - Polls all switches periodically to find large flows
  - Assigns large flows to non-conflicting paths
- Detecting large flows:
  - Set the speed threshold to 1/10 of the interface speed; anything over it is scheduled
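The detection rule reduces to a threshold test, sketched below (the flow names and rate map are illustrative; the real system derives rates from polled switch counters):

```python
def large_flows(flow_rates_bps, link_speed_bps, fraction=0.1):
    """Return the flows running faster than `fraction` of the interface
    speed (1/10 in [Fares, 2010]); only these are rescheduled, while the
    rest keep their ECMP-assigned paths."""
    threshold = fraction * link_speed_bps
    return [f for f, rate in flow_rates_bps.items() if rate > threshold]

rates = {"f1": 2e8, "f2": 5e7, "f3": 9e7}   # observed rates in bit/s
big = large_flows(rates, 1e9)               # 1Gbps interface
```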
Comparing Multipath TCP and Centralized Scheduling
- Closed-loop arrival of flows
- Permutation matrix, one flow per host at any time
- Flow size distribution taken from the VL2 paper:
  - Most flows (99%) transfer a mean of 10KB of data
  - 1% of flows transfer a mean of 100MB of data (1s at interface rate)
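Sampling from that two-class size distribution can be sketched as follows; using an exponential within each class is an assumption of this sketch, not something stated in the source:

```python
import random

def sample_flow_size(rng):
    """Draw one flow size in bytes: 99% short flows (mean 10KB),
    1% large flows (mean 100MB, about 1s at a 1Gbps interface)."""
    if rng.random() < 0.99:
        return rng.expovariate(1 / 10e3)   # short flow, mean 10 KB
    return rng.expovariate(1 / 100e6)      # large flow, mean 100 MB
```

Although 99% of flows are short, the rare large flows dominate the bytes, which is why scheduling only the large flows is plausible at all.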
Dynamic Scheduling
Centralized Scheduling: Setting the Threshold
[Graphs: throughput vs. detection threshold, from 100Mbps (flows app-limited) up to 1Gbps. Depending on the threshold, centralized scheduling is 5-25% worse than Multipath TCP; at the hoped-for settings it is still 10-15% worse.]
Practical Experiments
- VLB: 48%
- Centralized scheduling: 95%
- Multipath TCP: 88%
Practical Experiments (with bottleneck links)
- Centralized scheduling: 66%
- Multipath TCP: 88%
Simpler, Cheaper Data Centres with Multipath TCP
- Current requirement: 1Gbps between any two hosts
- How about at least 1Gbps always, but 10Gbps whenever the network is idle (most of the time)?
Changing the Fat Tree: Single Path
[Diagrams: fat-tree variants mixing 1Gbps and 10Gbps links]
Summary
- Multipath TCP seems the natural way to evolve data centre transport
  - Can use multiple paths automatically
  - Provides better fairness and more throughput
  - No obvious drawbacks
- Multipath TCP makes data centres simpler and cheaper, and gives more performance
- Ongoing work:
  - When should we open the subflows?
  - Practical study of the effects of multiple subflows