Part1: Lecture 2 TCP congestion control

Summary of last time: TCP headers and details of the flags; flow control; TCP sequence numbers; TCP connection establishment and termination; the end-to-end principle, layering and functionalities.

Performance

Flow control. Sliding window: the initial window covers the first packets of the sequence 1 2 3 4 5 6 7 8 9 10; as packets are acknowledged, the window slides to the right over the same sequence.

Link capacity. In TCP you are limited by the receive window (your upper bound). Imagine you don't have such buffer constraints. How fast can you put bits on the link? Your bandwidth, in bits/sec. How long do you have to wait for an ACK? Your RTT, in seconds. What about when you have links with different bandwidths and different RTTs?

BDP. The BDP (Bandwidth Delay Product) = bandwidth (bits per second) × round trip time (seconds). A network with a large BDP (> 10^5 bits, i.e. > 12.5 kbytes) is called an LFN, a long fat network. Example: your node in Amsterdam (1 Gbps) talking to a node in San Diego (1 Gbps), with an RTT of 162 msec: BDP = 1 Gbps × 162 msec = 1,000,000,000 bits/sec × 0.162 sec = 162,000,000 bits = 162,000,000 / 8 bytes = 20,250,000 bytes = 20.25 MB.
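The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names `bdp_bytes` and `is_lfn` are made up for illustration, and the RTT is passed in milliseconds so that the computation stays in exact integers.

```python
LFN_THRESHOLD_BITS = 10**5  # "large BDP" threshold quoted on the slide


def bdp_bits(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bits (bandwidth in bits/sec, RTT in msec)."""
    return bandwidth_bps * rtt_ms // 1000


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes."""
    return bdp_bits(bandwidth_bps, rtt_ms) // 8


def is_lfn(bandwidth_bps: int, rtt_ms: int) -> bool:
    """True when the path counts as a long fat network by the slide's rule."""
    return bdp_bits(bandwidth_bps, rtt_ms) > LFN_THRESHOLD_BITS


# Amsterdam <-> San Diego example: 1 Gbps, 162 msec RTT
print(bdp_bytes(1_000_000_000, 162))  # 20250000 bytes = 20.25 MB
```

The result is the amount of data that must be in flight to keep the path full, which is why it reappears later as the target size for the receive window.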

Problems with LFN: receive window size (a window that is too small wastes bandwidth); need for better RTT measurements (used for the timeout calculation); wrapping of sequence numbers (32 bits); packet loss dramatically reduces throughput. More information can be found at: Enabling High Performance Data Transfers.

Error recovery

Positive acknowledgements with retransmission. TCP uses a positive acknowledgement scheme: the ACKNOWLEDGEMENT NUMBER in the header specifies the sequence number of the next octet the receiver expects (for the stream flowing in the opposite direction of the segment). Sequence of events: the sender sends Packet 1; the receiver receives Packet 1 and sends ACK 1; the sender receives ACK 1 and sends Packet 2; the receiver receives Packet 2 and sends ACK 2; the sender receives ACK 2.

Error recovery How does TCP handle problems in the transmission? What to do when some segments are lost? And when can you actually say in TCP that a segment is actually lost?

Retransmission. TCP uses an adaptive retransmission algorithm to determine the timeout value before retransmission. Sequence of events when a packet is lost: the sender sends Packet 1 and starts a timer; the packet should arrive at the receiver and an ACK should be sent, but neither happens; the ACK would normally arrive at the sender; the timer expires; the sender retransmits Packet 1 and starts the timer again; the receiver receives Packet 1 and sends ACK 1; the sender receives ACK 1 and cancels the timer. How do you determine the ideal RTO (retransmission timeout)?

RTT. Round trip time (RTT): the time taken by the signal to travel from sender to receiver, plus the time for the acknowledgement of receipt to travel from receiver to sender. Speed of light in fiber: 200 km/ms.

Know more: Computing TCP's Retransmission Timer, RFC 6298, June 2011. RTT estimation. SampleRTT is measured once per RTT, and only for packets that have been transmitted once (not retransmitted). There is one RTT measurement per ACK if the timestamp option is on. SmoothedRTT (SRTT) is the weighted average of the SampleRTT values collected, an exponentially weighted moving average (EWMA): $SRTT = (1 - \alpha) \cdot SRTT + \alpha \cdot SampleRTT$. With $\alpha = 1/8 = 0.125$: $SRTT = 0.875 \cdot SRTT + 0.125 \cdot SampleRTT$.

Timeout interval. RTTVAR is the variation of the RTT: the EWMA of the difference between SampleRTT and SRTT. $RTTVAR = (1 - \beta) \cdot RTTVAR + \beta \cdot |SampleRTT - SRTT|$, with $\beta = 1/4 = 0.25$. $RTO = SRTT + \max(G, 4 \cdot RTTVAR)$, where $G$ is the clock granularity.
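The SRTT, RTTVAR and RTO updates can be put together in a small estimator. The sketch below follows the formulas above with α = 1/8 and β = 1/4; the class name `RtoEstimator` and the default granularity of 1 ms are assumptions made for the example. The handling of the very first sample (SRTT = sample, RTTVAR = sample/2) and the order of the updates (RTTVAR before SRTT) follow RFC 6298; the RFC's recommended 1-second lower bound on the RTO is left out to keep the numbers easy to follow.

```python
class RtoEstimator:
    """Sketch of an RFC 6298 style retransmission timeout estimator."""

    ALPHA = 1 / 8  # weight of the new sample in SRTT
    BETA = 1 / 4   # weight of the new deviation in RTTVAR
    K = 4          # multiplier applied to RTTVAR

    def __init__(self, clock_granularity: float = 0.001):
        self.g = clock_granularity  # G, in seconds (assumed value)
        self.srtt = None
        self.rttvar = None

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT (seconds) and return the new RTO (seconds)."""
        if self.srtt is None:
            # First measurement
            self.srtt = sample_rtt
            self.rttvar = sample_rtt / 2
        else:
            # RTTVAR is updated first, using the SRTT from before this sample
            deviation = abs(sample_rtt - self.srtt)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * deviation
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample_rtt
        return self.srtt + max(self.g, self.K * self.rttvar)


estimator = RtoEstimator()
for sample in (0.100, 0.200):
    print(f"sample={sample:.3f}s rto={estimator.update(sample):.4f}s")
```

Samples taken from retransmitted segments should not be fed to `update`, in line with the rule that SampleRTT is only measured for packets transmitted once.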

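Complex TCP retransmission scenarios (figure: two timelines between Host A and Host B). Premature timeout: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); the Seq=92 timer expires before ACK=100 arrives, so Host A retransmits Seq=92 (8 bytes of data) and Host B answers with ACK=120. Cumulative ACKs: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout and covers both segments.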

TCP ACK generation (event at receiver -> TCP receiver action):
- Arrival of in-order segment with expected seq #, all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for the next segment; if there is no next segment, send ACK.
- Arrival of in-order segment with expected seq #, one other segment has ACK pending -> Immediately send a single cumulative ACK, ACKing both in-order segments.
- Arrival of out-of-order segment with higher-than-expected seq #, gap detected -> Immediately send a duplicate ACK, indicating the seq # of the next expected byte.
- Arrival of segment that partially or completely fills the gap -> Immediately send ACK, provided that the segment starts at the lower end of the gap.

Refinements through options

TCP options (kind, length, contents):
- End of option list: Kind = 0
- No operation: Kind = 1
- Maximum segment size: Kind = 2, Len = 4, MSS
- Window scale factor: Kind = 3, Len = 3, shift count
- Timestamp: Kind = 8, Len = 10, timestamp value, timestamp echo reply
- SACK: Kind = 5, Len = 10 for a single block (8 more bytes for each additional block), left edge of 1st block, right edge of 1st block, ..., left edge of Nth block, right edge of Nth block

MTU. MTU (Maximum Transmission Unit): the largest packet size, in bytes, that can travel through the network. Ethernet: 1500 bytes. Ethernet with jumbo frames: 9000 bytes. Path MTU: the smallest MTU on an IP path, as discovered by Path MTU Discovery; in other words, the largest packet size that will traverse the network without fragmentation.

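Fragmentation. IP packets are encapsulated in frames: the datagram (DATAGRAM HEADER + DATAGRAM DATA) is carried as the FRAME DATA, behind the FRAME HEADER. IP packets are fragmented to fit within the Path MTU: the datagram data is split into DATA1 and DATA2, each carried behind its own header (FRAGMENT1 HEADER, FRAGMENT2 HEADER).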

Know more: Path MTU Discovery, RFC 1191, Nov. 1990. MSS. MSS (Maximum Segment Size): the largest amount of data, in bytes, that a device can handle in a single, unfragmented piece. It is announced at the start of the TCP connection, in the SYN packet. The resulting IP datagram will be MSS + 40 bytes (20 bytes TCP header and 20 bytes IP header). Layout of a frame: frame header, IP header, TCP header, TCP data. The MSS covers the TCP data only; the MTU covers the IP header, the TCP header and the TCP data.
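The relation between MTU and MSS is a single subtraction. As a minimal sketch, assuming the 20-byte IP and TCP headers quoted above (no IP or TCP options), and with `mss_for_mtu` as a made-up helper name:

```python
IP_HEADER_BYTES = 20   # IPv4 header without options
TCP_HEADER_BYTES = 20  # TCP header without options


def mss_for_mtu(mtu_bytes: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP datagram."""
    return mtu_bytes - IP_HEADER_BYTES - TCP_HEADER_BYTES


print(mss_for_mtu(1500))  # 1460, standard Ethernet
print(mss_for_mtu(9000))  # 8960, Ethernet with jumbo frames
```

The value for standard Ethernet, 1460, is the `mss 1460` that typically shows up in the options of a SYN packet captured on an Ethernet host.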

Window scaling option. The standard receive window on TCP systems is at most 65,535 bytes (64 KB), the limit of the 16-bit window field. RFC 1323, TCP Extensions for High Performance, introduced the WSCALE option: a scale factor for the receive window. It is negotiated at start up (in a SYN packet) and cannot be renegotiated. The window cannot exceed the maximum buffer size permitted by the system. The receive window should be equal to the BDP, or better: BDP < window < BDP + B (B = buffer size at intermediate routers). Example of a SYN packet carrying the option:
11:44:45.679928 IP u019857.1x.uva.nl.65295 > rembrandt0.uva.netherlight.nl.ssh: Flags [S], seq 3977286301, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 629245282 ecr 0,sackOK,eol], length 0
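The scale factor is a shift count: the advertised 16-bit window is shifted left by that many bits. The sketch below picks the smallest shift that lets the window reach a target size such as the BDP. The function names are illustrative; the cap of 14 on the shift count is the limit set by the window scale specification.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_SHIFT = 14               # largest shift count allowed for the option


def scaled_window(advertised: int, shift: int) -> int:
    """Real window size, in bytes, for an advertised value and a shift count."""
    return advertised << shift


def window_scale_shift(target_window_bytes: int) -> int:
    """Smallest shift count whose maximum window reaches the target size."""
    shift = 0
    while shift < MAX_SHIFT and scaled_window(MAX_UNSCALED_WINDOW, shift) < target_window_bytes:
        shift += 1
    return shift


# "win 65535 ... wscale 3" as in the trace above
print(scaled_window(65535, 3))         # 524280 bytes
# Window needed to fill a 20.25 MB BDP (1 Gbps, 162 msec)
print(window_scale_shift(20_250_000))  # 9
```

With a shift of 3 the window tops out at about 512 KB, far below the 20.25 MB needed to fill a 1 Gbps path with a 162 msec RTT.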

Timestamp option. A timestamp is placed in every segment and used for a more accurate RTT calculation, based on each received ACK. The receiver echoes back what it receives, so there is no need for clock synchronization. The option also provides Protection Against Wrapped Sequence Numbers (PAWS). Example of a segment and the ACK that echoes its timestamp:
15:10:01.802654 IP u019857.1x.uva.nl.55721 > rembrandt0.uva.netherlight.nl.ssh: Flags [P.], seq 1094:1110, ack 1609, win 65535, options [nop,nop,ts val 758648946 ecr 325477188], length 16
15:10:01.841480 IP rembrandt0.uva.netherlight.nl.ssh > u019857.1x.uva.nl.55721: Flags [.], ack 1110, win 283, options [nop,nop,ts val 325477199 ecr 758648946], length 0

SACKs. Know more: TCP Selective Acknowledgment Options, RFC 2018, Oct. 1996; An Extension to the Selective Acknowledgement (SACK) Option for TCP, RFC 2883, Jul. 2000. SACK allows the receiver to acknowledge out-of-order segments selectively. It can be combined with selective retransmission. DSACK acknowledges duplicate packets using the SACK field, in the first block.

Transmitted segment    Received segment        ACK sent (including SACK blocks)
3500-3999              3500-3999               4000
4000-4499              (data packet dropped)
4500-4999              4500-4999               4000, SACK=4500-5000
5000-5499              5000-5499               4000, SACK=4500-5500
(duplicated packet)    5000-5499               4000, SACK=5000-5500, 4500-5500
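The exchange in the table can be reproduced with a toy receiver. This is a simplified sketch, not a faithful implementation of RFC 2018 or RFC 2883: the class `SackReceiver` and its method are invented for the example, byte ranges are half-open (left edge included, right edge excluded, as in the SACK blocks above), blocks are reported in sequence order rather than most-recent-first, and only exact or fully covered duplicates are reported as DSACK.

```python
def merge(blocks):
    """Merge overlapping or adjacent (left, right) ranges."""
    merged = []
    for left, right in sorted(blocks):
        if merged and left <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


class SackReceiver:
    def __init__(self, next_expected: int):
        self.next_expected = next_expected  # cumulative ACK number
        self.blocks = []                    # out-of-order data held above it

    def receive(self, left: int, right: int):
        """Process bytes [left, right); return (ack, sack_blocks)."""
        dsack = None
        already_acked = right <= self.next_expected
        already_held = any(l <= left and right <= r for l, r in self.blocks)
        if already_acked or already_held:
            dsack = (left, right)  # duplicate: report it in the first block
        else:
            self.blocks = merge(self.blocks + [(left, right)])
            # Advance the cumulative ACK over any block that closes the gap
            while self.blocks and self.blocks[0][0] <= self.next_expected:
                self.next_expected = max(self.next_expected, self.blocks[0][1])
                self.blocks.pop(0)
        sack = ([dsack] if dsack else []) + list(self.blocks)
        return self.next_expected, sack


rx = SackReceiver(next_expected=3500)
for segment in [(3500, 4000), (4500, 5000), (5000, 5500), (5000, 5500)]:
    print(segment, "->", rx.receive(*segment))
```

Once the dropped segment 4000-4499 is finally retransmitted and received, the cumulative ACK jumps to 5500 and the SACK list becomes empty.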

Congestion control

One source of congestion (figure labels: rcwd1, rcwd2, rcwd3; three links of 1 Gbps).

What happens if the buffer is bloated? (Same figure: rcwd1, rcwd2, rcwd3; three links of 1 Gbps.)

Router with infinite buffer (figure: $\lambda_{out}$ and delay, each plotted against $\lambda_{in}$ up to $R/2$). Maximum per-connection throughput: $R/2$. Large delays as the arrival rate $\lambda_{in}$ approaches capacity.

Router with finite buffer. The sender retransmits timed-out packets. Application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$. Transport-layer input includes retransmissions: $\lambda'_{in} \geq \lambda_{in}$, where $\lambda_{in}$ is the original data and $\lambda'_{in}$ is the original data plus retransmitted data. (Figure: Host A and Host B, with finite shared output link buffers and output rate $\lambda_{out}$.)

Retransmissions (figure: $\lambda_{out}$ plotted against $\lambda_{in}$, both up to $R/2$). When sending at $R/2$, some packets are retransmissions, including duplicates that are delivered. The cost of congestion: more work (retransmissions) for a given goodput; unneeded retransmissions, where the link carries multiple copies of a packet, decreasing goodput.

Congestion: a problem in the network? Congestion indicates a problem in the network: long delays due to queueing in router buffers; lost packets and retransmissions due to buffer overflows; unneeded retransmissions by the sender when delays are large, leading to congestion collapse.

Is congestion bad? Not really, if you know how to manage it. TCP has a mechanism to handle it. Congestion is unavoidable, given that we want to use the network capacity as efficiently as possible.

Two possible approaches.
End-end congestion control: no explicit feedback from the network; congestion is inferred from loss and delay observed by the end system; this is the approach taken by TCP.
Network-assisted congestion control: routers provide feedback to end systems, either as a single bit indicating congestion (Explicit Congestion Notification) or as an explicit rate the sender should send at.

TCP congestion control. How does a TCP sender limit the rate at which it sends traffic into its connection? How does a TCP sender perceive that there is congestion on the path to the destination? What algorithm should the sender use to change its sending rate as a function of the perceived congestion?

Test Time

Pause

Congestion control algorithm. The TCP congestion control algorithm has four components: slow start, congestion avoidance, fast retransmit, fast recovery. Devised in 1988 by Van Jacobson: "Congestion avoidance and control", Proceedings of SIGCOMM 88, Stanford, CA, Aug. 1988, ACM. Continued evolution: RFC 5681, TCP Congestion Control, 2009.

TCP stack evolution TCP Tahoe was the original implementation; TCP Reno implemented fast recovery; TCP New Reno improves retransmission during the fast recovery phase of TCP Reno. Learn more: The NewReno Modification to TCP's Fast Recovery Algorithm RFC 3782 Apr. 2004

Congestion window. On the sender side, TCP also maintains a congestion window (cwnd), used to restrict the data flow to less than the receiver's buffer size when congestion occurs. Allowed window = min(rwnd, cwnd). Example with rwnd = 6 and cwnd = 8 (figure: segments 1 to 13, divided into sent and acked, sent but not acked, and OK to send).

Congestion detection. Two mechanisms indicate congestion: timeouts and duplicate ACKs. The congestion window is not static: it increases and decreases based on the arrival of ACKs. It increases slowly if the link has low bandwidth or high delay, and quickly otherwise. This is called self-clocking.

Know more: Increasing TCP's Initial Window, RFC 3390, October 2002. Slow start. At start, cwnd is equal to min(4*MSS, max(2*MSS, 4380 bytes)), roughly 4K bytes. To avoid wasting bandwidth the initial increase is exponential: cwnd doubles every RTT, because cwnd = cwnd + MSS each time an ACK arrives. (Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.)
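Both rules on this slide are easy to try out. The sketch below computes the initial window from the formula above and then applies "cwnd = cwnd + MSS per ACK" for a few round trips, under the simplifying assumptions that every segment is full sized, every segment is ACKed individually, and nothing is lost. The function names are made up for the example.

```python
def initial_window(mss: int) -> int:
    """Initial cwnd in bytes: min(4*MSS, max(2*MSS, 4380))."""
    return min(4 * mss, max(2 * mss, 4380))


def slow_start(cwnd: int, mss: int, rtts: int) -> list[int]:
    """cwnd at the start of each RTT, one ACK per full-sized segment."""
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss   # one ACK for every segment sent in this RTT
        cwnd += acks * mss   # cwnd grows by one MSS per ACK
        history.append(cwnd)
    return history


print(initial_window(1460))       # 4380
print(slow_start(1460, 1460, 3))  # [1460, 2920, 5840, 11680]
```

Starting from one segment, as in the figure, the window goes through one, two, four and eight segments: one MSS per ACK adds up to a doubling per RTT.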

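Slow start threshold. The Slow Start Threshold (ssthresh) determines whether cwnd should follow slow start or congestion avoidance: slow start while cwnd is below ssthresh, congestion avoidance above it. ssthresh is initially very high (equal to rwnd) and decreases after congestion. (Figure: cwnd, with the slow start phase below ssthresh and congestion avoidance above it.)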

Know more: Congestion Control Principles, RFC 2914, September 2000. Congestion avoidance. AIMD: Additive Increase, Multiplicative Decrease. Multiplicative decrease: halve the congestion window each time a loss is detected, cwnd = cwnd / 2 (it cannot decrease below 1 MSS). Additive increase: every time an ACK arrives, cwnd += MSS * MSS / cwnd, so that the congestion window increases by about 1 MSS every RTT.
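To see why the per-ACK rule amounts to roughly one MSS per RTT, the sketch below applies it once for every segment in a window. The MSS of 1460 bytes, the function names, and the assumption of one ACK per full-sized segment are choices made for the example.

```python
MSS = 1460  # bytes; assumed segment size


def on_ack(cwnd: float) -> float:
    """Additive increase: applied for every ACK in congestion avoidance."""
    return cwnd + MSS * MSS / cwnd


def on_loss(cwnd: float) -> float:
    """Multiplicative decrease: halve the window, never below 1 MSS."""
    return max(cwnd / 2, MSS)


def one_rtt(cwnd: float) -> float:
    """Window after one RTT, assuming one ACK per segment in the window."""
    for _ in range(int(cwnd // MSS)):
        cwnd = on_ack(cwnd)
    return cwnd


start = 10 * MSS
after = one_rtt(start)
print(f"growth over one RTT: {(after - start) / MSS:.3f} MSS")
print(f"after a loss: {on_loss(after) / MSS:.3f} MSS")
```

The growth comes out slightly under one MSS, because each ACK divides by a window that has already grown a little.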

Which rate can you achieve? AIMD sawtooth behavior: probing for bandwidth. (Figure: cwnd, the TCP sender congestion window size, plotted against time.) The sender additively increases the window size until loss occurs, then cuts the window in half. In an animation: http://guido.appenzeller.net/anims/ Courtesy of Guido Appenzeller and Nick McKeown (Stanford University).

TCP sending rate: $rate \approx cwnd / RTT$ bytes/sec. Or better said, given that cwnd and RTT vary with time: $rate(t) \approx cwnd(t) / RTT(t)$ bytes/sec.
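As a worked example of this formula, the sketch below computes the rate a sender can reach when its window is capped at the unscaled maximum of 65,535 bytes on a path with a 162 msec RTT (the Amsterdam to San Diego figures used earlier). The function name and the use of min(cwnd, rwnd) as the effective window are choices made for the illustration.

```python
def sending_rate_bps(cwnd_bytes: float, rwnd_bytes: float, rtt_seconds: float) -> float:
    """Approximate sending rate in bits/sec: one allowed window per RTT."""
    window = min(cwnd_bytes, rwnd_bytes)  # allowed window
    return window * 8 / rtt_seconds


# Large cwnd, but the receive window is stuck at 65,535 bytes
rate = sending_rate_bps(cwnd_bytes=10**9, rwnd_bytes=65535, rtt_seconds=0.162)
print(f"{rate / 1e6:.2f} Mbit/s on a 1 Gbps path")
```

The result is a little over 3 Mbit/s, well under one percent of a 1 Gbps link, which is the "wasting bandwidth" problem of long fat networks expressed in numbers.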

Reaction to timeouts. TCP reacts differently depending on the type of loss detected. In both cases ssthresh = cwnd/2 at the loss event. 1. After a timeout: slow start until cwnd > ssthresh (cwnd at loss / 2), then congestion avoidance. 2. After three duplicate ACKs: fast recovery (first implemented in TCP Reno), which gives the sawtooth behavior of congestion avoidance.

Fast Retransmit. If the sender receives 3 duplicate ACKs for the same data, it resends the segment before the timer expires, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. (Figure: timeline between Host A and Host B; a segment is lost (X) and the 2nd segment is resent before the timeout.)

Remember SACK? TCP New Reno. It is what is used in systems today. It is able to detect multiple losses. It behaves the same as Tahoe/Reno on timeouts, and improves further on the fast retransmit phase: it keeps track of the last un-acked packet when entering fast recovery; on every ACK it increases cwnd by one MSS; when the last ACK arrives, it returns to congestion avoidance and sets cwnd to the value it had when entering fast recovery.

Evolution of the algorithm: fast recovery, when receiving duplicate ACKs.

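Summary: TCP congestion control as a state machine with three states (slow start, congestion avoidance, fast recovery).

Initialization: cwnd = 4 Kbytes, ssthresh = rwnd, dupackcount = 0; start in slow start.

Slow start:
- new ACK: cwnd = cwnd + MSS, dupackcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupackcount++
- cwnd > ssthresh: go to congestion avoidance
- dupackcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupackcount = 0, retransmit missing segment, stay in slow start

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupackcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupackcount++
- dupackcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupackcount = 0, retransmit missing segment, go to slow start

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupackcount = 0, go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupackcount = 0, retransmit missing segment, go to slow start

(On the slide, one of the three timeout transitions is labelled cwnd = 4 KBytes rather than cwnd = 1 MSS.)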
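The state machine can be written down almost literally. The sketch below tracks only state, cwnd, ssthresh and the duplicate ACK counter; it does not send or retransmit anything, and it resets cwnd to 1 MSS on every timeout. The class name `RenoSender`, the MSS of 1460 bytes and the default initial values are assumptions for the example.

```python
MSS = 1460  # bytes; assumed segment size


class RenoSender:
    """Bookkeeping-only sketch of the slow start / congestion avoidance /
    fast recovery state machine."""

    def __init__(self, initial_cwnd: float = 4380, rwnd: float = 65535):
        self.state = "slow_start"
        self.cwnd = float(initial_cwnd)
        self.ssthresh = float(rwnd)  # initially very high (equal to rwnd)
        self.dupacks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh  # deflate the window, leave recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd
        self.dupacks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS  # each duplicate ACK signals a departed segment
            return
        self.dupacks += 1
        if self.dupacks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"  # the missing segment is resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(MSS)
        self.dupacks = 0
        self.state = "slow_start"  # the missing segment is resent here


sender = RenoSender()
for event in ["ack", "ack", "ack", "dup", "dup", "dup", "ack", "timeout", "ack"]:
    {"ack": sender.on_new_ack, "dup": sender.on_dup_ack, "timeout": sender.on_timeout}[event]()
    print(f"{event:8s} state={sender.state:21s} cwnd={sender.cwnd:8.0f} ssthresh={sender.ssthresh:8.0f}")
```

Running the short event sequence shows the three reactions side by side: growth by one MSS per ACK in slow start, a halved window after three duplicate ACKs, and a restart from one MSS after a timeout.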

Congestion control on LFN. General form of the window update: $cwnd = cwnd - a \cdot cwnd$ when a loss is detected, and $cwnd = cwnd + b / cwnd$ when an ACK arrives. Scalable TCP: a = 0.125 and b = 0.01; the congestion window does not oscillate as much and throughput increases slightly. High-speed TCP (HSTCP): a and b are functions of the window, a(w) and b(w); particularly suitable for large BDP networks. TCP BIC: used by default in Linux kernels 2.6.8 through 2.6.18. CUBIC: a less aggressive derivative of BIC; default in Linux kernels since version 2.6.19. Others: Fast TCP, Westwood TCP, H-TCP, TCP Vegas.

New TCP flavors. Want to know more?
Scalable TCP: http://www.deneholme.net/tom/scalable/
High-speed TCP: RFC 3649, HighSpeed TCP for Large Congestion Windows
TCP BIC/CUBIC: http://netsrv.csc.ncsu.edu/twiki/bin/view/main/bic
Fast TCP: http://netlab.caltech.edu/fast/
Westwood TCP: http://www.cs.ucla.edu/nrl/hpi/tcpw/
TCP Vegas: http://www.cs.arizona.edu/projects/protocols/

TCP fairness

TCP fairness. Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K. (Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.)

Why is TCP fair? Two competing sessions: additive increase gives a slope of 1 as throughput increases; multiplicative decrease decreases throughput proportionally. (Figure: connection 2 throughput plotted against connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between congestion avoidance, with additive increase, and loss, where the window decreases by a factor of 2, and moves towards the equal share line.)
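The convergence argument can be checked with a very small simulation. It is an idealized sketch: both connections have the same RTT, increase by one unit per step, and see every loss at the same time, which happens whenever their combined rate exceeds the capacity R. The function name, the units and the starting rates are arbitrary choices for the example.

```python
def simulate_aimd(x1: float, x2: float, capacity: float, steps: int):
    """Two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss seen by both flows: multiplicative decrease
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase, slope 1 in the (x1, x2) plane
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = simulate_aimd(x1=80.0, x2=10.0, capacity=100.0, steps=1000)
print(f"flow 1: {a:.3f}  flow 2: {b:.3f}  difference: {abs(a - b):.6f}")
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so the difference shrinks towards zero and the two flows end up with nearly equal shares.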

Fairness.
Fairness and UDP: multimedia apps often do not use TCP, because they do not want their rate throttled by congestion control; instead they use UDP, pump audio/video at a constant rate and tolerate packet loss.
Fairness and parallel TCP connections: nothing prevents an app from opening parallel connections between 2 hosts.

Home reading. For the test on Apr. 08 read: "A Comparison of SIP and H.323 for Internet Telephony", by H. Schulzrinne and J. Rosenberg. URL: http://www.cs.columbia.edu/~hgs/papers/Schu9807_Comparison.pdf

Literature. Chapter 20: TCP Bulk Data Flow. Chapter 21: TCP Timeout and Retransmission. Chapter 24: TCP Future and Performance. Chapter 3: Transport Layer. Few slides were adapted from: Computer Networking: A Top Down Approach, 5th edition, Jim Kurose and Keith Ross, Addison-Wesley, April 2009. Chapter 7: Transport Over IP.