Multipath TCP in Practice (Work in Progress)
Mark Handley, Damon Wischik, Costin Raiciu, Alan Ford
The difference between theory and practice is, in theory, somewhat smaller than in practice. In theory, this quote actually originated somewhere. In practice, it is impossible for us to know...
Designing an effective Multipath TCP requires a balance between theory and pragmatics. There are three parts to a practical MP-TCP:
- Desirable packet-level dynamics.
- A deployable protocol design.
- Effective heuristics for using the protocol.
Desirable packet-level congestion control dynamics.
Selfish goals: improve throughput for applications; improve reliability for applications.
Safety goals: co-exist gracefully with existing single-path TCP flows; predictable behaviour.
Network goals: pool resources where possible; move traffic away from congestion.
Implications:
- A window-based protocol.
- Get at least as much throughput as a single-path TCP on the best of the MP-TCP paths.
- Through a single bottleneck, the subflows shouldn't get more than a single TCP flow.
- RTT dependency.
Starting point: a window-based version of a coupled congestion controller.
Coupling the subflows is fair, even if they all go through the same bottleneck: the subflow rates x_1 and x_2 satisfy x_1 + x_2 = y, where y is the rate of a competing single-path TCP flow. So coupled MP-TCP will be fair to normal TCP.
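A minimal sketch of what such a coupled window controller might look like. This is a toy model under our assumptions, not the implementation: the per-ACK increase is 1/w_total and the decrease is coupled through w_total, with all names illustrative.

```python
# Toy model of a fully coupled window controller (illustrative names,
# packet-level bookkeeping only; real code also handles timeouts etc.).

class CoupledController:
    def __init__(self, n_subflows):
        self.w = [1.0] * n_subflows   # per-subflow congestion windows (packets)

    def w_total(self):
        return sum(self.w)

    def on_ack(self, i):
        # The increase is shared: each ACK grows subflow i by 1/w_total,
        # so the aggregate grows like a single TCP flow.
        self.w[i] += 1.0 / self.w_total()

    def on_loss(self, i):
        # The decrease is coupled too: back off by half the *total*
        # window, taken from the subflow that saw the loss.
        self.w[i] = max(1.0, self.w[i] - self.w_total() / 2.0)
```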
So, how well does it work in practice? How much more congested is the red link than the blue one?
[Plots: coupled subflows vs. uncoupled TCP flows; ε controls the degree of coupling.]
But what if the load is dynamic? [Setup: 8 on/off TCP flows on one path, 3 long-lived TCP flows on the other, plus 1 multipath flow across both; parameters set so the mean loss rate is roughly the same on both paths.]
[Plots: coupled subflows vs. uncoupled TCP flows under dynamic load; ε controls the degree of coupling.]
Effect of dynamic load (8 on/off TCP flows vs. 3 long-lived TCP flows): coupling tries to equalize the loss, and with unequal loss it will move almost all traffic off the more congested path. The on/off flows briefly congest the top path, and the coupled algorithm rapidly moves almost all traffic off it. The on/off flows then briefly underload the top path, but the coupled algorithm is sending so little traffic there, and increases so slowly, that it doesn't notice until after the transient underload has passed.
Self-interference. Consider the ideal case where we have equal random loss on two identical paths. We might hope coupling would split traffic equally across both paths. [Plots of the subflow windows on paths 1 and 2: traffic flaps between one path and the other.]
Loss rates are never precisely equal: the coupled congestion controller gets stuck on one path, then after a random time flaps to the other path. [Plots: regular TCP flows vs. coupled subflows.]
What to do?
- Apply some sort of damping? But then it will be even slower to respond to real changes in load.
- Compromise on resource pooling a little?
- Build in some sense of history?
We want better resource pooling when conditions are stable, and more responsiveness when conditions are varying.
Compromise solution? ε = 1 is an interesting case: reasonable load balancing, good equipoise, and a very simple algorithm.
ε = 1 gives an algorithm that just links the increases.
The linked increases algorithm does not flap. With loss rate p_i and window w_i on path i, the stable fixed point is w_1 p_1 = w_2 p_2. It does not attempt to completely equalize losses, so it doesn't move all the traffic off a more congested path.
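A sketch of the ε = 1 rule under the same toy model as before (names illustrative): only the increase is coupled; the decrease is plain TCP halving of the subflow that saw the loss.

```python
# Toy model of the linked-increases (epsilon = 1) rule: the increase
# is coupled across subflows, the decrease is plain TCP halving.

class LinkedIncreases:
    def __init__(self, n_subflows):
        self.w = [1.0] * n_subflows   # per-subflow windows (packets)

    def on_ack(self, i):
        # Coupled increase: 1/w_total per ACK, whichever subflow the
        # ACK arrived on.
        self.w[i] += 1.0 / sum(self.w)

    def on_loss(self, i):
        # Uncoupled decrease: halve only the subflow that saw the loss.
        self.w[i] = max(1.0, self.w[i] / 2.0)
```

Balancing the per-RTT increase (w_i acks, each adding 1/w_total) against the expected per-RTT decrease (p_i w_i losses, each costing w_i/2) gives w_i p_i = 2/w_total on every path, which is the w_1 p_1 = w_2 p_2 fixed point above.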
The linked increases algorithm has nice properties, but isn't sufficient by itself:
- It's not naturally fair to TCP when multiple subflows go through the same bottleneck.
- When the RTTs of the subflows are dissimilar, it can get poor throughput.
But the dynamics are pretty good, so these are just a question of scaling the aggressiveness.
Differing round trip times can influence throughput significantly:
1. For the same loss rate, linked increases will equalize the windows of the subflows.
2. For the same bandwidth, an equal number of flows will give a lower loss rate on a higher-RTT path.
An example: consider first the throughput when the loss rates (p_1 = p_2) and RTTs are equal.
w_1 = 10, RTT_1 = 10 ms => 1000 pkts/s
w_2 = 10, RTT_2 = 10 ms => 1000 pkts/s
Total rate = 2000 pkts/s.
The linked increases algorithm gets low throughput when the round trip times are different.
w_1 = 10, RTT_1 = 10 ms => 1000 pkts/s
w_2 = 10, RTT_2 = 100 ms => 100 pkts/s
Total rate = 1100 pkts/s. But a regular TCP on path 1 alone would get 2000 pkts/s.
We can correct for this, but first we need to decide what a desirable outcome would be.
Goal 1: Improve throughput. Do at least as well as a single-path flow on the best path.
Goal 2: Do no harm. Affect other flows no more than a single-path flow on their path.
Goal 3: Balance congestion. Move the maximum amount of traffic off the more congested path.
We can make the linked increases algorithm fair by simply scaling the increase function, where a is the aggressiveness. Scaling doesn't change the relative window sizes, so it doesn't affect Goal 3 (move traffic away from congestion). By tuning a, we can also achieve Goal 1 (improve throughput) and Goal 2 (do no harm).
The value of a can be calculated from the window sizes and the RTTs of the subflows. Strictly, ŵ is the equilibrium window, but experiments show the instantaneous window can be used.
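A sketch of that computation, following the formula later standardized in RFC 6356 (windows in packets, RTTs in seconds; a model of the calculation, not kernel code):

```python
# Sketch of the aggressiveness computation, per the formula later
# standardized in RFC 6356; w in packets, rtt in seconds.

def alpha(w, rtt):
    w_total = sum(w)
    best = max(wi / ri ** 2 for wi, ri in zip(w, rtt))
    return w_total * best / sum(wi / ri for wi, ri in zip(w, rtt)) ** 2

def increase_per_ack(w, rtt, i):
    # Coupled term a/w_total, capped at 1/w_i so no subflow is ever
    # more aggressive than a regular TCP would be on its path.
    return min(alpha(w, rtt) / sum(w), 1.0 / w[i])

# The earlier example: w = [10, 10], RTTs of 10 ms and 100 ms.
print(alpha([10, 10], [0.01, 0.1]))   # ~1.65: more aggressive, to
                                      # compensate for the slow path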
RTT compensation is effective. [Figure: throughput of an MPTCP implementation over WiFi and 3G.]
Load balancing of a multihomed server: a minority of MP-TCP flows can balance the traffic of single-path TCP flows at a multihomed server. [Plots: 5 and 15 TCP flows, then add 10 MP-TCP flows; 15 and 25 TCP flows, then add 10 MP-TCP flows.]
Can we design a deployable MP-TCP protocol?
What does deployable MP-TCP actually mean?
- The protocol works at least as well as regular TCP.
- It always works when a regular TCP would work.
- It falls back to regular TCP when the path or endpoint is not MP-TCP capable.
- It plays nicely with all the strange middleboxes that are out there.
- It is simple and secure.
[Figure: TCP packet header. 20-byte fixed part: Source Port, Destination Port; Sequence Number; Acknowledgment Number; Header Length, Reserved, Code Bits, Receive Window; Checksum, Urgent Pointer. Then 0-40 bytes of Options, then Data.]
Negotiation: send an MP-CAPABLE option on the SYN packet. If MP-CAPABLE is received on the SYN/ACK packet, enable MPTCP and establish additional subflows as required; otherwise, fall back to regular TCP behaviour.
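A toy sketch of this negotiation. The `negotiate` function and the callback shape are illustrative, not a real socket API; real implementations live in the kernel.

```python
# Toy sketch of the MP-CAPABLE negotiation (names illustrative).

def negotiate(send_syn):
    """send_syn(options_set) -> set of options echoed on the SYN/ACK."""
    synack_options = send_syn({"MP-CAPABLE"})
    if "MP-CAPABLE" in synack_options:
        # Peer understands MPTCP: enable it; additional subflows can
        # be established as required later.
        return "mptcp"
    # Option absent (legacy peer, or a middlebox stripped it):
    # fall back to regular TCP behaviour.
    return "tcp"

# A peer that echoes the option, and one whose path strips options:
assert negotiate(lambda opts: opts & {"MP-CAPABLE"}) == "mptcp"
assert negotiate(lambda opts: set()) == "tcp"
```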
Sequence Numbers. Packets go over multiple paths, and we need sequence numbers both to put them back in sequence and to infer loss on a single path. Options:
- One sequence space shared across all paths?
- One sequence space per path, plus an extra one to put data back in the correct order at the receiver?
Sequence Numbers. One sequence space per path is preferable: loss inference is more reliable, and some firewalls/proxies expect to see all the sequence numbers on a path. The outer TCP header holds the subflow sequence numbers. Where do we put the data sequence numbers?
[Figure: the TCP header reinterpreted for MPTCP: Subflow Source/Destination Ports, Subflow Sequence Number, Subflow Acknowledgment Number. Where does the data sequence number go: in the Options, or in the payload?]
Data Sequence Number. It's a little cleaner to put the data sequence number in an option. We assume this here; it was actually quite a big debate though.
[Figure: subflow TCP header with the data sequence number carried as an option. But what does the Receive Window now mean?]
Receive Window. In regular TCP, the receive window is the amount of buffer the receiver has available beyond the cumulative ack. Flow control: the sender can't send data the receiver doesn't have buffering for. The receive window must therefore refer to data sequence space, not subflow sequence space. But relative to which ack? We really want a Data Acknowledgment field.
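A one-line model of the resulting flow-control check (illustrative, and ignoring sequence-number wraparound):

```python
# Flow control in data sequence space (illustrative sketch).

def may_send(data_seq, data_ack, recv_window):
    # A byte at connection-level sequence number data_seq may be sent
    # only if it fits in the buffer the receiver has advertised
    # beyond its cumulative *data* acknowledgment.
    return data_seq < data_ack + recv_window
```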
[Figure: subflow TCP header; the option now carries a Data Sequence Number and a Data Acknowledgment Number, and the Receive Window is in data sequence space, relative to the data ack.]
How big must the receive window be? We need to be able to lose a packet on the largest-delay subflow, resend it, and have the other subflows not fill the receive buffer in the meantime. If the sender cannot fast-retransmit, it must wait for a timeout, and the buffer must absorb the other subflows' traffic for that whole interval.
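A back-of-envelope sketch of the sizing argument, with illustrative numbers: the repair time is roughly two RTTs on the slowest path if fast retransmit works, or an RTO if we must wait for a timeout.

```python
# Back-of-envelope buffer sizing: while a loss on the slowest subflow
# is repaired, every other subflow keeps delivering, so the receive
# buffer must absorb their combined rate for the repair time.

def min_recv_buffer(bandwidths_bps, repair_time_s):
    # repair_time_s ~ 2 * RTT_max with fast retransmit,
    # ~ RTO of the lossy subflow if we must wait for a timeout.
    return sum(bandwidths_bps) / 8 * repair_time_s

# E.g. WiFi at 8 Mbit/s + 3G at 2 Mbit/s, 200 ms repair via fast
# retransmit vs. a 1 s timeout:
print(min_recv_buffer([8e6, 2e6], 0.2))   # 250000 bytes
print(min_recv_buffer([8e6, 2e6], 1.0))   # 1250000 bytes
```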
Problems, problems, problems...
- TCP offload engines may re-segment TCP packets, replicating options on all segments.
- Firewalls may drop packets with options, or remove options from packets.
- Proxies may ack data before it's received by the receiver, or report their own window rather than the receiver's.
- Proxies/NATs may rewrite/extend/shrink the payload and fix up sequence numbers accordingly.
- Normalizers may ensure retransmissions are consistent with the original data.
- Firewalls may rewrite sequence numbers in packets.
The Resegmentation Problem. We cannot guarantee that the segmentation of data into the packets we send is what is actually received. So an implicit mapping of subflow sequence numbers to data sequence numbers in an option is unreliable: we can't just supply a data sequence number; there needs to be an explicit data sequence number mapping. And because subflow sequence numbers get rewritten (e.g. by pf), the data seqno must be absolute, while the subflow seqno must be relative to the start of the subflow. The mapping also needs a length, to cope with resegmentation where only the first packet gets the option added.
[Figure: subflow TCP header; the option now carries an explicit mapping (subflow seqno, length -> data seqno) together with the data acknowledgment number.]
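A sketch of what such an explicit mapping might look like as a data structure (field and method names are illustrative; on the wire this travels in a TCP option):

```python
# Sketch of the explicit data sequence mapping (names illustrative).

from dataclasses import dataclass

@dataclass
class DsnMapping:
    subflow_seq: int  # relative to the subflow's initial seqno, so
                      # middleboxes that rewrite sequence numbers
                      # (e.g. pf) don't invalidate the mapping
    length: int       # bytes covered: resegmentation-safe even if
                      # only the first packet carried the option
    data_seq: int     # absolute data (connection-level) seqno

    def data_seq_for(self, sf_seq):
        # Map a subflow-relative byte offset to its data sequence
        # number, if this mapping covers it.
        if self.subflow_seq <= sf_seq < self.subflow_seq + self.length:
            return self.data_seq + (sf_seq - self.subflow_seq)
        return None
```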
Smart NATs. What happens if a NAT rewrites content, changing its length? We get a gap or overlap in the data sequence number mapping: a disaster we cannot recover from. So the DSN mapping must carry a checksum. If the checksum fails, drop the subflow. Unless it's the only subflow, in which case:
- Initiate a new subflow between the same addresses.
- Signal an infinite mapping, basically falling back to regular TCP with no further explicit mappings.
- Drop the failed subflow.
[Figure: subflow TCP header; the mapping option now also carries a checksum (subflow seqno, length -> data seqno, checksum).]
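A sketch of the fallback decision just described. It is purely illustrative: it returns the list of actions rather than driving a real stack.

```python
# Sketch of the checksum-failure fallback (illustrative only).

def on_bad_mapping_checksum(subflows, failed):
    if len(subflows) > 1:
        return ["drop subflow " + failed]   # other paths still carry data
    # It's the only subflow: reopen between the same addresses, signal
    # an infinite mapping (no further explicit mappings -- effectively
    # regular TCP), then drop the failed subflow.
    return ["open new subflow on same address pair",
            "signal infinite mapping",
            "drop subflow " + failed]

print(on_bad_mapping_checksum(["wifi", "3g"], "3g"))
print(on_bad_mapping_checksum(["3g"], "3g"))
```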
More issues. Lots. What do you do when you receive data for which there's no mapping?
- Silently drop it; the sender will retransmit?
- Ack at the subflow level, drop at the data level?
The wrong answer here leads to potential deadlocks when proxies manipulate the receive window. Etc, etc.
Heuristics and Open Issues
- When should we start an additional subflow? Which address pair should it use?
- What should slow-start behaviour be?
- When should you resend data on a different subflow?
- When is performance on a subflow so bad that we should give up on it? Or should we send redundant data on it, as a placeholder?
- How should costs (in $) factor into which subflows are used?
- How does multipath performance trade off against battery life on smartphones and similar devices?
- Can we factor latency into path choice for delay-sensitive apps? Or is that a recipe for instability?
Conclusions. Practical congestion control is never simple; multipath congestion control is multiply so. Designing deployable extensions to TCP in the presence of creative middleboxes is just painful: there is no way to verify design decisions short of widespread deployment, and the resulting changes are no longer simple. We think we have a design that falls back gracefully to regular TCP behaviour. It will still confuse intrusion detection systems and the like, which no longer see the whole connection.