Overview of TCP

- Connection-oriented, byte-stream service
  - The sending process writes some number of bytes
  - TCP breaks them into segments and sends them via IP
  - The receiving process reads some number of bytes
- Full duplex
- Implements both flow control and congestion control
  - Flow control: keep the sender from overrunning the receiver
  - Congestion control: keep the sender from overrunning the network
  - Flow control is an end-to-end issue; congestion control is concerned with how hosts and the network interact
- Based on the sliding window protocol used at the data link level, but the situation is very different
  - Potentially connects many different hosts: needs explicit connection establishment and termination
  - Potentially different RTTs: needs an adaptive timeout mechanism
  - Potentially long delays in the network: needs to be prepared for the arrival of very old packets
Overview of TCP

- Potentially different capacity at the destination: needs to accommodate different amounts of buffering
- Potentially different network capacity: needs to be prepared for network congestion

TCP header

- TCP data is encapsulated in an IP datagram
- The normal size of the TCP header is 20 bytes, unless options are present
- [Figure: TCP header]

TCP services

- TCP provides a byte-stream service
  - No record markers are inserted by TCP
  - If the sending application writes 10 bytes, 20 bytes, and 50 bytes, the receiving application may receive the 80 bytes in four reads of 20 bytes
  - TCP does not interpret the contents of the bytes: ASCII and binary data are treated the same
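The byte-stream example above (writes of 10, 20, and 50 bytes read back as four 20-byte chunks) can be sketched with a toy in-memory buffer. `ByteStream` is a hypothetical stand-in invented for this illustration; it is not a real socket and it ignores segmentation, loss, and timing.

```python
class ByteStream:
    """Toy in-memory stand-in for one direction of a TCP connection."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def write(self, data: bytes) -> None:
        # Bytes are appended with no marker recording where each write ended.
        self._buffer.extend(data)

    def read(self, max_bytes: int) -> bytes:
        # Return up to max_bytes from the front of the stream.
        chunk = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return chunk


stream = ByteStream()
for size in (10, 20, 50):          # three writes by the sending application
    stream.write(b"x" * size)

read_sizes = []
while True:
    chunk = stream.read(20)        # the receiver reads 20 bytes at a time
    if not chunk:
        break
    read_sizes.append(len(chunk))

print(read_sizes)  # [20, 20, 20, 20]
```

The receiver sees 80 bytes with no trace of the original 10/20/50 write boundaries, which is why applications that need records must add their own framing.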
TCP Segment format

- As mentioned earlier, TCP does not transmit individual bytes, although it is a byte-stream service
  - The source host buffers enough bytes from the sending process to fill a reasonably sized packet
  - It sends these packets, called segments, to the receiver
- What causes TCP to send a segment?
  - The buffered data reaches the maximum segment size (MSS)
  - Explicit action by the sending application
  - A timer that fires periodically; the segment then contains as many bytes as are currently buffered
- Recall that IP discards a packet after the packet's TTL expires
  - Each TCP segment is assumed to have a maximum lifetime, the maximum segment lifetime (MSL); the current recommended setting is 120 seconds
  - This value is not enforced by IP; it is a conservative estimate TCP makes of how long a packet might live
- The SrcPort and DestPort fields, along with the IP source and destination addresses, identify each TCP connection
  - TCP's demux key is <SrcPort, SrcIPAddr, DestPort, DestIPAddr>
  - Because TCP connections come and go, the same pair of ports and hosts can have different incarnations of a connection over time
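As a minimal sketch of the demux key above, the 4-tuple can serve directly as a dictionary key. The names `DemuxKey`, `connections`, and `demux`, and the example addresses, are hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple, Optional


class DemuxKey(NamedTuple):
    """<SrcPort, SrcIPAddr, DestPort, DestIPAddr> as a hashable tuple."""
    src_port: int
    src_ip: str
    dest_port: int
    dest_ip: str


# Hypothetical connection table: demux key -> connection label.
connections: Dict[DemuxKey, str] = {}


def demux(src_port: int, src_ip: str, dest_port: int, dest_ip: str) -> Optional[str]:
    # Look up the connection an arriving segment belongs to, if any.
    return connections.get(DemuxKey(src_port, src_ip, dest_port, dest_ip))


# Two connections from the same client host to the same server port differ
# only in the source port, yet they are distinct connections.
connections[DemuxKey(50000, "192.0.2.1", 80, "198.51.100.7")] = "conn-A"
connections[DemuxKey(50001, "192.0.2.1", 80, "198.51.100.7")] = "conn-B"

print(demux(50001, "192.0.2.1", 80, "198.51.100.7"))  # conn-B
```

Note that a later incarnation of a connection would reuse exactly the same key, which is why the key alone cannot tell incarnations apart.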
TCP Segment format

- The Acknowledgment, SequenceNumber, and AdvertisedWindow fields are all involved in TCP's sliding window algorithm
  - Each data byte has a sequence number
  - The SequenceNumber field contains the sequence number of the first byte of data in the segment
  - The Acknowledgment and AdvertisedWindow fields carry information about the flow in the opposite direction
- The 6-bit Flags field is used to relay control information between TCP peers
  - SYN, FIN: used to establish and terminate connections
  - ACK: set to indicate that the Acknowledgment field is valid
  - URG: set to indicate that the segment contains urgent data
  - RESET: the receiver wants to abort the connection

TCP Connection Establishment

- A TCP connection begins with two actions
  - The client (caller) does an active open: it is the party wanting to initiate a connection
  - The server (callee) has already done a passive open: it is the party willing to accept a connection
- Most connection setup is asymmetric
- TCP has an explicit connection setup phase in which both sides agree on a set of transmission parameters

Three-way handshake

- Why is the acknowledged sequence number one larger than the one sent?
  - It is the next sequence number expected, which implicitly acknowledges all earlier sequence numbers
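The "one larger" rule can be illustrated with a small sketch of the three handshake segments. The dictionary layout and the function name `three_way_handshake` are assumptions made for this example, not a real packet format; the modulo reflects the 32-bit sequence number field.

```python
from typing import Dict, List

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide


def three_way_handshake(client_isn: int, server_isn: int) -> List[Dict]:
    """Return the three segments exchanged, as simple dictionaries."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        # Acknowledge the next sequence number expected from the client.
        "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }
    ack = {
        "flags": {"ACK"},
        # Acknowledge the next sequence number expected from the server.
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(sorted(segment["flags"]), segment.get("seq"), segment.get("ack"))
```

With starting sequence numbers 100 and 300, the server acknowledges 101 and the client acknowledges 301.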
Three-way handshake

- Why should the client and server exchange starting sequence numbers with each other?
  - It would be simpler if each side started from a well-known sequence number such as 0
  - Reason: to protect against two incarnations of a connection reusing the same sequence numbers too soon

TCP state-transition diagram

- [Figure: TCP state-transition diagram]
- The states above ESTABLISHED are involved in setting up a connection
- The states below ESTABLISHED are involved in terminating a connection
- The sliding window algorithm is hidden in the ESTABLISHED state
- All connections start in the CLOSED state
- Each arc is labeled with a tag of the form event/action
- Opening a connection:
  - The server invokes a passive open operation, causing TCP to move to the LISTEN state
  - The client does an active open: it sends a SYN segment to the server and moves to the SYN_SENT state
  - When the SYN arrives at the server, the server moves to SYN_RCVD and responds with SYN+ACK
  - Arrival of the SYN+ACK moves the client to ESTABLISHED, and the client replies with an ACK
  - When that ACK arrives, the server also moves to ESTABLISHED, completing the three-way handshake
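The opening sequence above can be sketched as a small event/action table, mirroring the way arcs in the diagram are labeled. The table name `OPEN_TRANSITIONS` and the event strings are assumptions for illustration, and only the connection-opening arcs are included.

```python
from typing import Dict, Optional, Tuple

# (state, event) -> (next state, action sent), covering only the opening arcs.
OPEN_TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("CLOSED", "passive open"): ("LISTEN", None),
    ("CLOSED", "active open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "ACK"): ("ESTABLISHED", None),
}


def step(state: str, event: str) -> Tuple[str, Optional[str]]:
    """Follow one arc; raises KeyError for an arc not in the table."""
    return OPEN_TRANSITIONS[(state, event)]


# Walk both ends through the three-way handshake.
server, _ = step("CLOSED", "passive open")   # server: LISTEN
client, syn = step("CLOSED", "active open")  # client: SYN_SENT, sends SYN
server, syn_ack = step(server, syn)          # server: SYN_RCVD, sends SYN+ACK
client, ack = step(client, syn_ack)          # client: ESTABLISHED, sends ACK
server, _ = step(server, ack)                # server: ESTABLISHED

print(client, server)  # ESTABLISHED ESTABLISHED
```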
TCP state-transition diagram

- Closing a connection:
  - The application process on each side of the connection must independently close its half of the connection
  - If one side closes the connection, it has no more data to send, but it remains available to receive data
- Three possible combinations to go from ESTABLISHED to CLOSED
  - This side closes first: ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
  - Other side closes first: ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED
  - Both sides close at the same time: ESTABLISHED -> FIN_WAIT_1 -> CLOSING -> TIME_WAIT -> CLOSED
- A connection in the TIME_WAIT state cannot move to the CLOSED state until it has waited 2*MSL
- Reason:
  - The local side responds with an ACK to a FIN from the remote side
  - If that ACK is lost, the remote side retransmits the FIN after a timeout
  - If the connection were allowed to move straight to CLOSED, another incarnation of the connection might exist when the retransmitted FIN for the earlier connection arrives at the local side
  - That FIN would then close the new incarnation

TCP sliding window

- TCP's sliding window algorithm serves several purposes:
  - It guarantees reliable delivery of data
  - It ensures data is delivered in order
  - It enforces flow control between sender and receiver
- The receiver advertises the size of the sliding window using the AdvertisedWindow field in the TCP header
- The receiver selects a suitable value so that its buffer is not overflowed by a fast sender
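Returning to the three ESTABLISHED-to-CLOSED sequences listed above, they can be checked against a small transition table. This is a sketch: the event names (`close`, `FIN`, `ACK`, `2*MSL timeout`) are labels assumed for illustration, and the table covers only the closing arcs named in the text.

```python
from typing import Dict, List, Tuple

MSL_SECONDS = 120                     # recommended MSL from the text
TIME_WAIT_SECONDS = 2 * MSL_SECONDS   # how long TIME_WAIT lasts

# (state, event) -> next state, closing arcs only.
CLOSE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "FIN"): "TIME_WAIT",
    ("ESTABLISHED", "FIN"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "ACK"): "CLOSED",
    ("FIN_WAIT_1", "FIN"): "CLOSING",
    ("CLOSING", "ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL timeout"): "CLOSED",
}


def walk(events: List[str], state: str = "ESTABLISHED") -> List[str]:
    """Return the list of states visited while processing the events."""
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path


this_side_first = walk(["close", "ACK", "FIN", "2*MSL timeout"])
other_side_first = walk(["FIN", "close", "ACK"])
simultaneous = walk(["close", "FIN", "ACK", "2*MSL timeout"])

print(" -> ".join(this_side_first))
print(" -> ".join(other_side_first))
print(" -> ".join(simultaneous))
print(TIME_WAIT_SECONDS)  # 240
```

Only the side that sends the final ACK passes through TIME_WAIT, which matches the reasoning about a lost ACK and a retransmitted FIN.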
- On the sending side:
  - LastByteAcked <= LastByteSent: the receiver cannot acknowledge a byte that has not been sent
  - LastByteSent <= LastByteWritten: TCP can't send a byte the application has not written
- On the receiving side:
  - LastByteRead < NextByteExpected: a byte cannot be read by the application until it has been received and all preceding bytes have also been received; NextByteExpected points to the byte immediately after the last byte meeting this criterion
  - NextByteExpected <= LastByteRcvd + 1: if data has arrived in order, NextByteExpected points to the byte after LastByteRcvd; if data has arrived out of order, NextByteExpected points to the start of the first gap in the data
- Flow control:
  - The sending application is filling its local buffer
  - The receiving application is emptying its buffer
  - Both buffers have finite size: MaxSendBuffer and MaxRcvBuffer
  - The receiver throttles the sender by advertising a window no larger than the amount of data it can buffer
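A minimal sketch of the receive-side pointers, assuming bytes are numbered individually and that `received` holds the numbers of bytes that have arrived but not yet been read. The function name `next_byte_expected` is hypothetical; the pointer names follow the text.

```python
from typing import Set


def next_byte_expected(received: Set[int], last_byte_read: int) -> int:
    """First byte after last_byte_read that has not yet arrived."""
    candidate = last_byte_read + 1
    while candidate in received:
        candidate += 1
    return candidate


last_byte_read = 3

# In-order arrival: bytes 4..6 have arrived.
in_order = {4, 5, 6}
nbe_in_order = next_byte_expected(in_order, last_byte_read)

# Out-of-order arrival: bytes 6 and 7 are missing, 8 and 9 arrived early.
out_of_order = {4, 5, 8, 9}
nbe_gap = next_byte_expected(out_of_order, last_byte_read)

print(nbe_in_order, max(in_order) + 1)   # 7 7  (NextByteExpected == LastByteRcvd + 1)
print(nbe_gap, max(out_of_order) + 1)    # 6 10 (NextByteExpected points at the gap)
```

In both cases LastByteRead < NextByteExpected <= LastByteRcvd + 1 holds, with equality on the right only when there is no gap.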
- On the receive side, to avoid buffer overflow:
  - LastByteRcvd - LastByteRead <= MaxRcvBuffer
  - AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
  - This is the amount of free space remaining in the receive buffer
  - If data arrives faster than it is consumed, this value decreases over time; AdvertisedWindow eventually reaches 0 if the receiver keeps falling behind the sender
- On the sending side, TCP must adhere to the advertised window it gets from the receiver:
  - LastByteSent - LastByteAcked <= AdvertisedWindow
  - EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
  - EffectiveWindow must be greater than 0 before the sender can send data
- The send side must also make sure the local application does not overflow the send buffer:
  - LastByteWritten - LastByteAcked <= MaxSendBuffer
  - If the sending process tries to write y bytes to TCP and (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, TCP blocks the sending process
- TCP ensures that a slow receiving process can stop a fast sending process
  - When AdvertisedWindow becomes 0, the sender stops sending data
  - The receiver sends a segment only in response to a received segment, so how does the sender know when the receiver is ready again?
  - When the receiver advertises a window size of 0, the sender periodically sends a 1-byte segment
  - This triggers a response, which eventually reports a non-zero window size
  - This is called the smart sender/dumb receiver technique
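The window formulas above translate directly into arithmetic. The following sketch uses the pointer names from the text as snake_case parameters; the function names and the sample numbers are invented for illustration.

```python
def advertised_window(max_rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    # Free space remaining in the receive buffer.
    return max_rcv_buffer - (last_byte_rcvd - last_byte_read)


def effective_window(advertised: int, last_byte_sent: int, last_byte_acked: int) -> int:
    # How much more the sender may put in flight right now.
    return advertised - (last_byte_sent - last_byte_acked)


def write_would_block(max_send_buffer: int, last_byte_written: int,
                      last_byte_acked: int, y: int) -> bool:
    # True if writing y more bytes would overflow the send buffer.
    return (last_byte_written - last_byte_acked) + y > max_send_buffer


adv = advertised_window(max_rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
eff = effective_window(adv, last_byte_sent=3500, last_byte_acked=3000)

print(adv)  # 2096
print(eff)  # 1596
print(write_would_block(8192, last_byte_written=9000, last_byte_acked=3000, y=3000))  # True
print(write_would_block(8192, last_byte_written=9000, last_byte_acked=3000, y=2000))  # False
```

When `adv` drops to 0, `eff` cannot be positive, which is the situation the 1-byte probe segments are meant to resolve.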
TCP sliding Window Issues

- TCP's sequence number is 32 bits wide
- TCP's advertised window is 16 bits wide
- This satisfies the sliding window algorithm's requirement that the sequence number space be at least twice the window size: 2^32 >> 2 x 2^16
- However, the 32-bit sequence number field can wrap around: a packet with sequence number x can be sent, and after a while another packet with sequence number x can be sent
- A packet can survive for one MSL, which is 120 seconds
- If the sequence number wraps around within 120 seconds, we have a problem
- Sequence numbers wrap around sooner when they are consumed very fast, that is, when data is transmitted very fast
- An OC-12 (622 Mbps) link can wrap around the sequence numbers in 55 seconds
- The largest amount of data the sender can have in the pipe is determined by the 16-bit AdvertisedWindow
- The advertised window should be large enough to inject delay x bandwidth worth of data into the network
- Assuming an RTT of 100 ms, a T3 (45 Mbps) link needs 549 Kbytes
- The 16-bit AdvertisedWindow allows only 64 Kbytes!
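The two figures quoted above (55 seconds and 549 Kbytes) can be reproduced with a short calculation. This is a sketch under stated assumptions: one sequence number per byte, 622 Mbps taken as 622 x 10^6 bits/s, T3 taken as 45 x 10^6 bits/s, and 1 Kbyte = 1024 bytes. The function names are hypothetical.

```python
def wraparound_seconds(bandwidth_bps: float) -> float:
    """Time to consume the whole 32-bit sequence space, one number per byte."""
    return (2 ** 32 * 8) / bandwidth_bps


def pipe_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Delay x bandwidth product, in bytes."""
    return bandwidth_bps * rtt_seconds / 8


wrap_622 = wraparound_seconds(622e6)
t3_pipe_kbytes = pipe_bytes(45e6, 0.100) / 1024
max_window_kbytes = 2 ** 16 / 1024

print(round(wrap_622))           # 55
print(round(t3_pipe_kbytes))     # 549
print(max_window_kbytes)         # 64.0
print(wrap_622 < 120)            # True: wraps around within one MSL
```

Both problems get worse as link speed grows: the wraparound time shrinks while the delay x bandwidth product grows.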
Adaptive Retransmissions in TCP

- TCP sets the timeout for retransmission as a function of the estimated RTT
- TCP uses an adaptive mechanism to estimate the RTT
- Idea: keep a running average of the RTT and compute the timeout as a function of it
  - EstimatedRTT = α x EstimatedRTT + (1 - α) x SampleRTT
  - α is selected to smooth EstimatedRTT; the original TCP specification recommends a setting between 0.8 and 0.9
  - TimeOut = 2 x EstimatedRTT

Karn/Partridge Algorithm

- The RTT estimation process should not take a SampleRTT for a segment that has been retransmitted
- Precise measurement of SampleRTT is difficult in that case because of the ambiguity in matching ACKs with transmissions: the sender cannot tell whether an ACK belongs to the original transmission or to the retransmission

Jacobson/Karels Algorithm

- Only the aspect of the algorithm that deals with timeout and retransmission is discussed here
- Main problem with the original scheme: it does not take the variation of the SampleRTTs into account
  - If the variation is small, EstimatedRTT can be better trusted
  - If the variation is large, the timeout should not be tightly coupled to EstimatedRTT
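Before the Jacobson/Karels formulas, the original estimator and the Karn/Partridge rule described above can be sketched as follows. The class name `OriginalRttEstimator`, the `retransmitted` flag, and the choice α = 0.875 (a value inside the recommended 0.8 to 0.9 range) are assumptions for illustration.

```python
class OriginalRttEstimator:
    """Running average of the RTT with TimeOut = 2 x EstimatedRTT."""

    def __init__(self, initial_rtt: float, alpha: float = 0.875) -> None:
        self.estimated_rtt = initial_rtt
        self.alpha = alpha

    def update(self, sample_rtt: float, retransmitted: bool = False) -> None:
        if retransmitted:
            # Karn/Partridge: the ACK is ambiguous, so ignore this sample.
            return
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)

    @property
    def timeout(self) -> float:
        return 2 * self.estimated_rtt


estimator = OriginalRttEstimator(initial_rtt=100.0)   # milliseconds
estimator.update(200.0)                               # a clean sample
print(estimator.estimated_rtt, estimator.timeout)     # 112.5 225.0

estimator.update(900.0, retransmitted=True)           # ambiguous sample, skipped
print(estimator.estimated_rtt, estimator.timeout)     # 112.5 225.0
```

A large α makes the estimate stable but slow to react; the single 200 ms sample moves the estimate only from 100 ms to 112.5 ms.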
Adaptive Retransmissions in TCP

- Difference = SampleRTT - EstimatedRTT
- EstimatedRTT = EstimatedRTT + (δ x Difference)
- Deviation = Deviation + δ x (|Difference| - Deviation)
- δ is a fraction between 0 and 1
- Timeout = µ x EstimatedRTT + φ x Deviation
- µ is typically set to 1 and φ to 4

TCP Extensions

- Measuring RTT, sequence number wraparound, and keeping the pipe full are some of the issues with TCP
- Extensions have been proposed to address these issues
- TCP timeout estimation:
  - The sender reads the actual system clock and puts the value in the segment header
  - The receiver echoes the timestamp back to the sender
  - The sender can estimate the RTT by subtracting the echoed timestamp from the current time
- Sequence number wraparound:
  - Two segments can carry the same sequence number
  - Differentiate the two segments by putting the timestamp value in the option field
  - Timestamps are monotonically increasing, which helps in distinguishing the segments
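The Jacobson/Karels update and timeout formulas given above can be sketched as a small class. The class name, the starting Deviation of 0, and δ = 0.125 are assumptions made for this example (the text only requires δ to be between 0 and 1); µ = 1 and φ = 4 follow the text.

```python
class JacobsonKarelsEstimator:
    """Tracks both the mean RTT and its deviation."""

    def __init__(self, initial_rtt: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0) -> None:
        self.estimated_rtt = initial_rtt
        self.deviation = 0.0          # assumed starting value
        self.delta = delta
        self.mu = mu
        self.phi = phi

    def update(self, sample_rtt: float) -> None:
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * difference
        # The deviation tracks the magnitude of the difference.
        self.deviation += self.delta * (abs(difference) - self.deviation)

    @property
    def timeout(self) -> float:
        return self.mu * self.estimated_rtt + self.phi * self.deviation


estimator = JacobsonKarelsEstimator(initial_rtt=100.0)  # milliseconds
estimator.update(180.0)
print(estimator.estimated_rtt)  # 110.0
print(estimator.deviation)      # 10.0
print(estimator.timeout)        # 150.0
```

When samples vary little, Deviation stays small and the timeout sits close to EstimatedRTT; when they vary a lot, the φ x Deviation term dominates and the timeout loosens.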
TCP Extensions

- Larger pipe:
  - AdvertisedWindow may not be sufficient to fully utilize the pipe: the delay x bandwidth product may be very large compared to the AdvertisedWindow
  - We can use a scaling factor
  - AdvertisedWindow is left-shifted by that many bit positions before its contents are used
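A minimal sketch of the scaling idea, assuming the scaling factor is a shift count applied to the 16-bit AdvertisedWindow value. The function names are hypothetical, and the 562,500-byte target is the delay x bandwidth product of a 45 Mbps link with a 100 ms RTT.

```python
def scaled_window(advertised_window: int, shift: int) -> int:
    """Left-shift the 16-bit advertised window by the scaling factor."""
    if not 0 <= advertised_window < 2 ** 16:
        raise ValueError("AdvertisedWindow must fit in 16 bits")
    return advertised_window << shift


def min_shift_for(pipe_bytes: int) -> int:
    """Smallest shift that lets a full 16-bit window cover pipe_bytes."""
    shift = 0
    while scaled_window(2 ** 16 - 1, shift) < pipe_bytes:
        shift += 1
    return shift


print(scaled_window(65535, 0))   # 65535
print(scaled_window(65535, 4))   # 1048560
print(min_shift_for(562_500))    # 4
```

A shift of 3 gives at most 524,280 bytes, which is still short of the pipe, so a shift of 4 is the smallest that keeps this example link full.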