Congestion Control and Active Queue Management

Outline:
- Congestion control, efficiency and fairness
- Analysis of TCP congestion control: a simple TCP throughput formula
- RED and Active Queue Management: how RED works
- Fluid model of TCP and RED interaction
- Other AQM mechanisms
- XCP: congestion control for large delay-bandwidth products
  - router-based mechanism
  - decoupling congestion control from fairness
- Readings: please do the required readings (and the optional readings if interested)

Why Congestion Control?

Inefficiency and congestion collapse: self-interest vs. social welfare.

Inefficiency: a simple artificial example
- source 1: rate λ1 = 100 kb/s; source 2: rate λ2 = 1000 kb/s
- link capacities: C1 = 100 kb/s (source 1's access), C2 = 1000 kb/s (source 2's access), C3 = 110 kb/s (shared), C4 = 100 kb/s, C5 = 10 kb/s
- Assumption: when total offered traffic exceeds link capacity, all sources see their traffic reduced in proportion to their offered traffic (e.g., when FIFO is used)
- Result: source 1 throughput µ1 = 10 kb/s, source 2 throughput µ2 = 10 kb/s!
- Had source 2 limited itself to its own downstream bottleneck of 10 kb/s, source 1 could have received 100 kb/s

BISS 200: FAN

Fairness

Consider a simple scenario: N users want to transmit data over a link of bandwidth C; each user i wants bandwidth r_i.
- If Σ r_i < C, no problem!
- But if Σ r_i > C, what should we do?
- Suppose all users are of equal importance: to be fair, allocate the same share, C/N, to each user
- OK if every r_i > C/N; but what if some r_i < C/N, i.e., some users want less than their fair share? How do we allocate the residual bandwidth left by those users?
- The Fair Queueing (FQ) algorithm handles this (w_i = 1 for all i)
- If not all users are equal, importance is denoted by weights w_i: weighted fair queueing

Max-Min Fairness

Network scenario: a simple line network example -- user 0 crosses every link i (bandwidth C_i), while user i uses only link i.
- First idea: maximize throughput at each router/link (or the total network throughput) -- may lead to unfair bandwidth allocation
- How do we allocate bandwidth fairly?
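The proportional-reduction assumption can be illustrated with a small sketch in the spirit of the example above (a 110 kb/s shared link followed by a 10 kb/s private link; all figures here are illustrative):

```python
def scale_at_link(rates, capacity):
    """FIFO-style proportional reduction: if offered traffic exceeds
    the link capacity, every flow is scaled by capacity/total."""
    total = sum(rates)
    if total <= capacity:
        return list(rates)
    return [r * capacity / total for r in rates]

# Two flows first share a link, then flow 2 continues alone through
# a much slower link.
s1, s2 = 100.0, 1000.0                     # offered rates (kb/s)
s1, s2 = scale_at_link([s1, s2], 110.0)    # shared link: s1 -> 10, s2 -> 100
(s2,) = scale_at_link([s2], 10.0)          # flow 2's private bottleneck
print(s1, s2)                              # both end up at 10.0
```

Flow 2's aggressive sending squeezed flow 1 on the shared link, yet most of flow 2's traffic was discarded downstream anyway: upstream capacity was wasted carrying doomed packets.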
- Let x_ij be a feasible bandwidth share of user i at link j
- Bandwidth allocation to user i: x_i = min_j x_ij
- Ideally, we want to maximize x_i = max min_j x_ij for all users simultaneously
- Max-min fairness: let {x_i} be a bandwidth allocation vector (bav); it is max-min fair if, for any other bav y, whenever y_i > x_i for some i there exists j with x_j <= x_i and y_j < x_j
- Unfortunately, such a max-min fair bav may not always exist!
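On a single link, the residual-bandwidth question above is answered by progressive filling (water-filling); a minimal sketch with hypothetical demands (the function name is mine):

```python
def max_min_share(demands, capacity):
    """Progressive filling on one link: users demanding less than the
    current fair share keep their demand; the leftover capacity is
    split among the remaining users, and the process repeats."""
    alloc = [0.0] * len(demands)
    active = list(range(len(demands)))
    remaining = capacity
    while active:
        share = remaining / len(active)
        bounded = [i for i in active if demands[i] <= share]
        if not bounded:
            for i in active:          # everyone left wants more than
                alloc[i] = share      # the fair share: split equally
            break
        for i in bounded:             # satisfy the small demands fully
            alloc[i] = demands[i]
            remaining -= demands[i]
        active = [i for i in active if i not in bounded]
    return alloc

print(max_min_share([1, 2, 10, 10], 10))   # -> [1, 2, 3.5, 3.5]
```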
Fairness (cont'd)

(Abstract) network model: S sources and L links (capacities c_l)
- A_{l,s} (routing matrix): fraction of traffic of source s on link l
- feasible (rate) allocation; formal definition of a bottleneck link (with respect to source s)

Some facts (theorems):
- A feasible rate allocation is max-min fair if and only if every source has a bottleneck link
- Under this network model, and assuming the routing matrix is fixed, there exists a unique max-min fair allocation!
- Fair Queueing implements max-min fairness

TCP Congestion Control Behavior

- congestion control: decrease the sending rate when loss is detected, increase it when there is no loss
- routers discard/mark packets when congestion occurs
- interaction between end systems (TCP) and routers? we want to understand (quantify) this interaction
- TCP runs at the end hosts; the congested router drops packets

Generic TCP CC Behavior: Additive Increase

- window algorithm (window W): up to W packets in the network
- the return of an ACK allows the sender to send another packet (cumulative ACKs)
- increase the window by one per RTT: W <- W + 1/W per ACK, i.e., W <- W + 1 per RTT
- seeks available network bandwidth
- (this ignores the slow-start phase, during which the window increases by one per ACK: W <- W + 1 per ACK, i.e., W <- 2W per RTT)
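The bottleneck-link characterization can be checked mechanically: link l is a bottleneck for source s if l is saturated and s has the maximal rate among sources crossing l. A sketch assuming a 0/1 routing matrix (function names are mine):

```python
def is_bottleneck(l, s, A, c, x, eps=1e-9):
    """A[l][t] = 1 iff source t uses link l; c[l] = capacity of l;
    x[t] = rate of source t. Link l is a bottleneck for s if it is
    saturated and x[s] is maximal among sources crossing l."""
    load = sum(A[l][t] * x[t] for t in range(len(x)))
    saturated = abs(load - c[l]) < eps
    uses_l = A[l][s] == 1
    maximal = all(x[s] >= x[t] - eps for t in range(len(x)) if A[l][t])
    return saturated and uses_l and maximal

def every_source_has_bottleneck(A, c, x):
    """Max-min fairness test via the theorem above."""
    return all(any(is_bottleneck(l, s, A, c, x) for l in range(len(c)))
               for s in range(len(x)))

# Line network: source 0 crosses both links; sources 1 and 2 each use
# one link (capacities hypothetical).
A = [[1, 1, 0],
     [1, 0, 1]]
c = [10, 10]
print(every_source_has_bottleneck(A, c, [5, 5, 5]))   # True
print(every_source_has_bottleneck(A, c, [4, 6, 6]))   # False: source 0
                                                      # has no bottleneck
```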
Generic TCP CC Behavior: Multiplicative Decrease

- window algorithm (window W): increase the window by one per RTT (W <- W + 1/W per ACK)
- loss is an indication of congestion
- decrease the window by half on detection of loss via triple duplicate ACKs (TD): W <- W/2

Generic TCP CC Behavior: After Time-Out (TO)

- window algorithm (window W): increase the window by one per RTT (W <- W + 1/W per ACK)
- halve the window on detection of loss: W <- W/2
- a timeout, due to lack of ACKs, reduces the window to one: W <- 1
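The increase/decrease rules above can be sketched as a per-RTT window update (the event labels are mine):

```python
def next_window(w, event):
    """One RTT of generic TCP behavior.
    event: None (no loss), 'td' (triple duplicate ACK), 'to' (timeout)."""
    if event == 'to':
        return 1.0                 # timeout: window back to one packet
    if event == 'td':
        return max(1.0, w / 2.0)   # fast recovery: halve the window
    return w + 1.0                 # additive increase: +1 per RTT

w = 1.0
trace = []
for event in [None, None, None, 'td', None, 'to', None]:
    w = next_window(w, event)
    trace.append(w)
print(trace)   # [2.0, 3.0, 4.0, 2.0, 3.0, 1.0, 2.0]
```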
Generic TCP Behavior: Summary

- window algorithm (window W)
- increase the window by one per RTT (or by 1/W per ACK): W <- W + 1/W
- halve the window on detection of loss: W <- W/2
- on timeout (lack of ACKs): W <- 1; successive timeout intervals grow exponentially long, up to six times

Understanding TCP Behavior

- simulation (ns-2): + faithful to the operation of TCP; - expensive, time consuming
- deterministic approximations: + quick; - ignore some TCP details, steady state only
- fluid models: + capture transient behavior; - ignore some TCP details

TCP Throughput/Loss Relationship

Idealized model: W is the maximum supportable window size (at which loss occurs).
- the TCP window starts at W/2, grows to W, then halves; grows to W again, then halves, ...
- one window's worth of packets is sent each RTT
- to find: throughput as a function of the loss rate and the RTT

(Figure: sawtooth of the TCP window between W/2 and W vs. time in RTTs; one period marked, with the number of packets sent per period to be computed.)
TCP Throughput/Loss Relationship (cont'd)

Number of packets sent per period (the window grows from W/2 to W over W/2 + 1 RTTs, one window's worth per RTT):

  sum_{n=0}^{W/2} (W/2 + n) = (W/2)(W/2 + 1) + (1/2)(W/2)(W/2 + 1)
                            = (3/2)(W/2)(W/2 + 1)
                            = 3W^2/8 + 3W/4
                            ~ 3W^2/8

One packet lost per period implies:

  p_loss = 8/(3W^2),   or   W = sqrt(8 / (3 p_loss))

The average throughput is:

  B = (3W^2/8 packets) / (W/2 rtt) = (3/4) W packets/rtt
    = sqrt(3/2) / (rtt sqrt(p_loss)) ~ 1.22 / (rtt sqrt(p_loss)) packets/sec

The throughput formula can be extended to model timeouts and slow start [PFTK98].

Drawbacks of FIFO with Tail Drop

(Figure: FIFO router with two TCP sessions.)
- sometimes too late a signal to end systems about network congestion, in particular when the RTT is large
- buffer lock-out by misbehaving flows
- synchronizing effect on multiple TCP flows
- bursts of multiple consecutive packet drops -- bad for TCP fast recovery
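The derivation above can be sanity-checked numerically; a sketch (W = 20 is an arbitrary example value):

```python
import math

def packets_per_period(W):
    """Exact packet count over one sawtooth period: the window takes
    the values W/2, W/2 + 1, ..., W, one window's worth per RTT."""
    return sum(W // 2 + n for n in range(W // 2 + 1))

def tcp_throughput(p_loss, rtt):
    """Square-root formula: B ~ 1.22 / (rtt * sqrt(p_loss)) pkts/sec."""
    return math.sqrt(3.0 / (2.0 * p_loss)) / rtt

W = 20
exact = packets_per_period(W)      # 3W^2/8 + 3W/4 = 165
approx = 3 * W * W / 8             # 150: the dominant term
print(exact, approx)
print(tcp_throughput(0.01, 0.1))   # ~122 pkts/sec at p=1%, rtt=100 ms
```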
Active Queue Management

Dropping/marking packets depends on the average queue length x: p = p(x).
Advantages:
- signals end systems earlier
- absorbs bursts better
- avoids synchronization
Examples: RED, REM.
(Figure: marking probability p rises from 0 at t_min to p_max at t_max as a function of the average queue length x, continuing up to 2 t_max in the gentle variant.)

RED: Parameters

- min_th: minimum threshold
- max_th: maximum threshold
- avg_len: average queue length, avg_len = (1 - w) * avg_len + w * sample_len

RED: Packet Dropping

- If (avg_len < min_th): enqueue the packet
- If (avg_len > max_th): drop the packet
- If (min_th <= avg_len < max_th): drop the packet with probability P, where

  P = max_p * (avg_len - min_th) / (max_th - min_th)

so the discard probability rises linearly from 0 at min_th to max_p at max_th.
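The dropping rules above, as a sketch (the threshold values in the example calls are hypothetical; the gentle 2*max_th ramp is omitted):

```python
def red_drop_prob(avg_len, min_th, max_th, max_p):
    """RED discard curve: 0 below min_th, a linear ramp to max_p at
    max_th, and certain drop above max_th."""
    if avg_len < min_th:
        return 0.0
    if avg_len >= max_th:
        return 1.0
    return max_p * (avg_len - min_th) / (max_th - min_th)

def update_avg(avg_len, sample_len, w=0.002):
    """EWMA of the queue length: avg = (1 - w) * avg + w * sample."""
    return (1 - w) * avg_len + w * sample_len

print(red_drop_prob(5, 10, 30, 0.1))    # 0.0  (below min_th)
print(red_drop_prob(20, 10, 30, 0.1))   # 0.05 (halfway up the ramp)
print(red_drop_prob(35, 10, 30, 0.1))   # 1.0  (above max_th)
```

The small default weight w means the average reacts slowly to bursts, which is exactly the burst-absorbing behavior RED aims for.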
RED Router with Two TCP Sessions (figure)

Dynamic (Transient) Analysis of TCP

Fluid models:
- model TCP traffic as fluid
- describe the behavior of flows and queues using ordinary differential equations (ODEs)
- solve the resulting ODEs numerically

Loss Model

Sender -> AQM router (drop/mark probability p(t)) -> receiver, with round-trip delay τ.
Loss rate as seen by the sender: λ(t) = B(t - τ) p(t - τ), where B(t) is the sending rate.

A Single Congested Router

- focus on a single bottlenecked AQM router: capacity C (packets/sec), queue length q(t), discard probability p(t)
- N TCP flows through the router:
  - window sizes W_i(t)
  - round-trip times R_i(t) = A_i + q(t)/C (A_i = propagation delay)
  - throughputs B_i(t) = W_i(t)/R_i(t)
Adding RED to the Model

- RED marks/drops based on the average queue length x(t)
- marking probability p: 0 below t_min, rising to p_max at t_max (and on to 2 t_max in the gentle variant)
- x(t) is a smoothed, time-averaged version of the instantaneous queue length q(t)

TCP Window Dynamic Model

- a TCP class is a set of TCP flows sharing the same route
- track the average window size W_i(t) of each TCP class
- the window evolves by additive increase (one per RTT) and multiplicative decrease (halving at the loss arrival rate λ_i(t)):

  dW_i/dt = 1/R_i(t) - (W_i(t)/2) λ_i(t)

Link Model: RED; Traffic Propagation Model

Track each TCP class's arrival/departure rates at each queue:
- x_k: average queue length at queue k; α: averaging parameter
- p_k: packet loss/marking probability at queue k
- A_i^k: arrival rate of TCP class i at the k-th queue
- D_i^k: departure rate of TCP class i from the k-th queue
Traffic Propagation Model (cont'd)

Track each TCP class's arrival/departure rates at each queue.

Putting It Together

TCP throughput -> traffic propagation -> queues -> queueing delay and loss -> AQM averaging -> loss probability -> back to the TCP source rate.
The coupled differential equations are solved numerically.

A Queue Is Not a Network

- link bandwidth constraints; queue equations per router
- network: a set V of AQM routers; session i traverses the sequence V_i
- round-trip time -- aggregate delay: R_i(t) = A_i + Σ_{v in V_i} q_v(t)/C_v
- loss/marking probability -- cumulative probability: p_i(t) = 1 - Π_{v in V_i} (1 - p_v(q_v(t)))

Steady-State Behavior

Let t -> ∞, so that d/dt -> 0, p(t) -> p, W(t) -> W, R(t) -> R. This yields

  0 = (1 - p)/R - (W^2/2)(p/R),   or   W = sqrt(2(1 - p)/p)

and the throughput is

  B = W/R = sqrt(2(1 - p)/p) / R ~ sqrt(2/p) / R for small p
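The steady-state formula can be checked by integrating the single-flow fluid ODE directly; a forward-Euler sketch with illustrative p and R, holding the loss probability and round-trip time fixed (no queueing or feedback delay, unlike the full model):

```python
import math

# dW/dt = (1 - p)/R - (W/2) * (W/R) * p
# increase: one per RTT (successful round);
# decrease: halve at the loss arrival rate (W/R) * p.
p, R, dt = 0.01, 0.1, 0.001
W = 1.0
for _ in range(200_000):               # integrate 200 simulated seconds
    dW = (1 - p) / R - (W / 2) * (W / R) * p
    W += dt * dW

predicted = math.sqrt(2 * (1 - p) / p)  # steady-state formula above
print(round(W, 2), round(predicted, 2)) # both ~ 14.07
```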
How Well Does It Work?

Simulation vs. fluid model on a topology of OC-12 and OC-48 links:
- RED with a target delay of 5 msec
- 2600 TCP flows; the flow population decreases to 300 at t = 30 sec and increases back to 2600 at t = 90 sec
- (Figure: instantaneous queueing delay over time, simulation vs. fluid model -- good queue-length match.)

Numerical Solution: Time-Stepped Simulation

(Figures: window size and average window size over time, simulation vs. fluid model -- the fluid model matches the average window size.)
- solving the ODEs with Matlab: low efficiency, poor flexibility
- C program: fixed-step-size Runge-Kutta method
- time-stepped simulation, per step:
  - update the windows of all TCP classes
  - calculate the departure/arrival rates at each queue
  - update the queue lengths and packet loss/marking probabilities
- computation cost depends on the step size, the number of TCP classes, and the number of links
Other Model Enhancements

- adjustments for TCP implementations: Reno, NewReno, SACK
- window backoff: sending rate vs. average window size
- different AQMs: PI controller, AVQ, REM
- adjustments for RED implementations: geometric and uniform dropping, the "wait" option

Accuracy: Transient Behavior

Compare with packet-level simulation in the network simulator (ns):
- three classes over a line of queues: class 1 on {queue 1}, class 2 on {queues 1, 2}, class 3 on {queues 1, 2, 3}
- load variations over time
- (Figures: queue length and window size over time -- the fluid model tracks the ns per-flow and average curves.)

Scalability: Link Bandwidth and Flow Population

Scale the bandwidth and flow population by K = 1, 10, 50, 100:
- link bandwidths: 10M x K and 100M x K over 8 links, 3 TCP classes
- flow population of each class: 40 x K
- (Figures: per-class rates over time for K = 1, 10, 100.)

  K     FFM        ns              speed-up
  1     0.766 sec  12.5 sec        16.32
  10    0.766 sec  2 min 2 sec     159.3
  50    0.766 sec  16 min 23 sec   1,283
  100   0.766 sec  27 min 56 sec   2,188
Issues with RED

- parameter sensitivity: how to set min_th, max_th, and max_p
  - goal: maintain the average queue size below the midpoint between min_th and max_th
  - max_th needs to be significantly smaller than the maximum queue size, to absorb transient peaks
  - max_p determines the drop rate
  - in reality, these parameters are hard to set
- RED uses the average queue length, which may introduce a large feedback delay and lead to instability

Other AQM Mechanisms

- Adaptive RED (ARED)
- BLUE
- Virtual Queue
- Random Exponential Marking (REM)
- Proportional Integral (PI) controller
- Adaptive Virtual Queue (AVQ)

Improved AQMs are designed based on control theory, to provide faster response to congestion and more stable systems.

Explicit Congestion Notification (ECN)

- standard TCP: losses are needed to detect congestion -- wasteful and unnecessary
- ECN (RFC 2481): routers mark packets instead of dropping them
- the receiver returns the marks to the sender in ACK packets; the sender adjusts its window accordingly
- two bits in the IP header:
  - ECT: ECN-capable transport (set to 1)
  - CE: congestion experienced (set to 1)

TCP in Large Bandwidth-Delay Networks

TCP congestion control performs poorly as bandwidth or delay increases -- shown analytically in [Low01] and via simulations (50 flows in both directions, buffer = bandwidth x delay; one experiment varies the bottleneck bandwidth at RTT = 80 ms, the other varies the round-trip delay at 155 Mb/s).
Because TCP lacks fast response:
- spare bandwidth may be available, but TCP increases by only 1 packet/RTT even if the spare bandwidth is huge
- when a TCP flow starts, it increases exponentially -- too many drops
- flows ramp up by 1 packet/RTT, taking forever to grab the large bandwidth
XCP: eXplicit Control Protocol

Solution: decouple congestion control from fairness.
- congestion control: high utilization, small queues, few drops
- fairness: the bandwidth allocation policy

Why Decoupling?

- they are coupled because a single mechanism controls both; example: in TCP, Additive-Increase Multiplicative-Decrease (AIMD) controls both
- how does decoupling solve the problem?
  1. To control congestion: use MIMD, which gives fast response
  2. To control fairness: use AIMD, which converges to fairness

Characteristics of the XCP Solution

1. Improved congestion control (in high bandwidth-delay as well as conventional environments): small queues, almost no drops
2. Improved fairness
3. Scalable (no per-flow state)
4. Flexible bandwidth allocation: max-min fairness, proportional fairness, differential bandwidth allocation, ...

XCP: An eXplicit Control Protocol

1. Congestion controller
2. Fairness controller
How Does XCP Work?

- each packet carries a congestion header: the flow's Congestion Window, Round-Trip Time, and a Feedback field
- the sender initializes the feedback (e.g., Feedback = 0.1 packet); routers along the path may reduce it (e.g., to Feedback = -0.3 packet)
- on receiving the ACK, the sender sets: Congestion Window = Congestion Window + Feedback

How Does an XCP Router Compute the Feedback?

Congestion controller
- goal: match the input traffic to the link capacity and drain the queue
- looks at the aggregate traffic and the queue; no per-flow state
- algorithm: MIMD; the aggregate traffic changes by Φ ~ spare bandwidth and ~ -queue size:

  Φ = α d_avg Spare - β Queue

Fairness controller
- goal: divide Φ between flows so as to converge to fairness
- looks at a flow's state in the congestion header (XCP uses ECN-style marking and a core-stateless mechanism, i.e., state carried in the packet header); routers compute feedback without any per-flow state
- algorithm: AIMD
  - if Φ > 0: divide it equally between flows
  - if Φ < 0: divide it between flows proportionally to their current rates
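The congestion controller's aggregate feedback can be sketched as follows (α = 0.4 and β = 0.226 are a pair satisfying β = α²√2; the traffic figures in the example calls are hypothetical):

```python
def aggregate_feedback(alpha, beta, d_avg, capacity, input_rate, queue):
    """XCP congestion controller: phi = alpha * d_avg * spare - beta * queue,
    where spare = capacity - input traffic and d_avg is the average RTT."""
    spare = capacity - input_rate
    return alpha * d_avg * spare - beta * queue

# Spare bandwidth -> positive feedback (windows told to grow):
phi_up = aggregate_feedback(0.4, 0.226, 0.1, 1000, 800, 0)
# Full link plus a standing queue -> negative feedback (windows shrink):
phi_down = aggregate_feedback(0.4, 0.226, 0.1, 1000, 1000, 50)
print(phi_up, phi_down)
```

The fairness controller then splits phi over the packets seen in the interval: equal shares when positive, shares proportional to current rates when negative.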
Getting the Devil out of the Details

Congestion controller: Φ = α d_avg Spare - β Queue.

Theorem: the system converges to optimal utilization (i.e., it is stable) for any link bandwidth, delay, and number of sources if

  0 < α < π / (4 sqrt(2))   and   β = α^2 sqrt(2)

(proof based on the Nyquist criterion). No parameter tuning needed.

Fairness controller: if Φ > 0, divide it equally between flows; if Φ < 0, divide it between flows proportionally to their current rates.
- needs to estimate the number of flows N -- with no per-flow state:

  N = Σ_{pkts in T} ( RTT_pkt / (T * Cwnd_pkt) )

  where RTT_pkt and Cwnd_pkt are the round-trip time and congestion window carried in the packet's congestion header, and T is the counting interval.

BISS 200: FAN
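The flow-count estimator works because a flow with window Cwnd and round-trip time RTT sends about T * Cwnd / RTT packets during T, so its packets contribute about 1 to the sum. A sketch (the flow parameters in the example are hypothetical):

```python
def estimate_flows(packets, T):
    """packets: (cwnd, rtt) pairs read from the congestion headers of
    all packets seen in an interval of length T. Each flow's packets
    sum to ~1, so the total is ~N without per-flow state."""
    return sum(rtt / (T * cwnd) for cwnd, rtt in packets)

# Two flows observed for T = 1 s: flow A (cwnd 10, rtt 0.1 s) sends
# ~100 packets; flow B (cwnd 20, rtt 0.2 s) also sends ~100 packets.
T = 1.0
packets = [(10, 0.1)] * 100 + [(20, 0.2)] * 100
print(estimate_flows(packets, T))   # ~2.0
```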