TCP: Reliable, In-Order Delivery
EE 122: Intro to Communication Networks
Fall 2006 (MW 4-5:30 in Donner 155)
Vern Paxson
TAs: Dilip Antony Joseph and Sukun Kim
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley

Announcements
- Sukun is away this week. Dilip will cover his section and office hours.
- Dilip's office has moved to 751 Soda (alcove). His office hours remain Fri 11-12.
- Today is the deadline for requesting discussion / possible regrading of midterm questions. Send email to do so.
- Project #3 out on Wednesday
  - Can do individually or in a team of 2 people
  - First phase due November 16, no slip days
  - Exercise good (better) time management

Today's Lecture
- How does TCP achieve correct operation?
  - Reliability in the face of IP's meager best-effort service
- 3-way handshake to establish connections
- 3-way or 4-way handshake to terminate connections
- Retransmission to recover from loss
  - We'll only look at timeout-based retransmission today
- State diagrams as a tool for understanding complex protocol operation

TCP Service Model
- Reliable, in-order, byte-stream delivery
  - and with good performance
- Challenges: the network can
  - drop packets
    - Even perhaps a large number
  - delay packets
    - Even perhaps for many seconds
  - deliver packets out-of-order
    - Follows from possibility of arbitrary delay
  - replicate packets
    - Weird, but it does sometimes happen
  - corrupt packets
- (What's missing?)

TCP Support for Reliable Delivery
- Checksum
  - Used to detect corrupted data at the receiver
  - leading the receiver to drop the packet
- Sequence numbers
  - Used to detect missing data
  - and for putting the data back in order
- Retransmission
  - Sender retransmits lost or corrupted data
  - Timeout based on estimates of round-trip time
  - Fast retransmit algorithm for rapid retransmission

TCP Header
  Source port | Destination port
  Sequence number
  Acknowledgment
  HdrLen | 0 | Flags | Advertised window
  Checksum | Urgent pointer

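To make the checksum bullet concrete, here is a minimal sketch of a 16-bit one's-complement checksum in the style TCP uses. It is an illustration only: real TCP also covers a pseudo-header and the TCP header itself, which this sketch leaves out, and the name `internet_checksum` is a hypothetical helper, not something from the lecture.

```python
def internet_checksum(data: bytes) -> int:
    """Complement of the 16-bit one's-complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF


payload = b"hello, tcp"  # even length, so the checksum stays word-aligned
csum = internet_checksum(payload)

# Receiver side: recomputing over data plus checksum gives 0 when intact.
intact = payload + csum.to_bytes(2, "big")
corrupted = b"jello, tcp" + csum.to_bytes(2, "big")
looks_ok = internet_checksum(intact) == 0
looks_bad = internet_checksum(corrupted) != 0
```

A receiver that sees a nonzero result simply drops the packet; recovery is left to the sender's retransmission.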
TCP Header
- Source port, Destination port: these should be familiar

TCP Header
- Sequence number: starting sequence number (byte offset) of data carried in this segment

TCP Header
- Acknowledgment: gives seq # just beyond highest seq. received in order
  - If sender sends N in-order bytes starting at seq S, then the ack for it will be S+N

TCP Header
- HdrLen: number of 4-byte words in TCP header; 5 = no options

TCP Header
- 0: "Must Be Zero", 6 bits reserved

TCP Header
- Flags: we will get to these shortly

TCP Header
- Advertised window: buffer space available for receiving data
  - Used for TCP's sliding window
  - Interpreted as offset beyond Acknowledgment field's value

TCP Header
- Urgent pointer: used with URG flag to indicate urgent data (not discussed further)

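The field-by-field walk through the header can be summarized as a small decoder. This is a sketch under stated assumptions: it handles only the fixed 20-byte part of the header (no options), uses the classic layout with 6 reserved bits and 6 flag bits as described above, and `parse_tcp_header` and `FLAG_BITS` are hypothetical names chosen for illustration.

```python
import struct

# Bit positions of the six flags in the low bits of the 16-bit HdrLen/0/Flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    hdr_len_words = off_flags >> 12  # top 4 bits: header length in 4-byte words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "hdr_len_bytes": hdr_len_words * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


# Build an example SYN header by hand (all values made up), then decode it.
syn = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0,
                  (5 << 12) | FLAG_BITS["SYN"], 65535, 0, 0)
hdr = parse_tcp_header(syn)
```

With HdrLen = 5 the decoded header length is 20 bytes, matching the "5 = no options" note.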
TCP "Stream of Bytes" Service
[Figure: Host A writes a stream of bytes (Byte 0, Byte 1, Byte 2, Byte 3, through Byte 80); Host B reads the same bytes in the same order.]

... Provided Using TCP "Segments"
[Figure: the same byte stream from Host A to Host B, carried in TCP segments.]
- TCP segment sent when:
  1. Segment full (Max Segment Size),
  2. Not full, but times out, or
  3. "Pushed" by application.

TCP Segment
[Figure: an IP packet consists of an IP Hdr followed by the TCP segment; the TCP segment consists of a TCP Hdr followed by data.]
- IP packet
  - No bigger than Maximum Transmission Unit (MTU)
  - E.g., up to 1,500 bytes on an Ethernet
- TCP packet
  - IP packet with a TCP header and data inside
  - TCP header is at least 20 bytes long
- TCP segment
  - No more than Maximum Segment Size (MSS) bytes
  - E.g., up to 1460 consecutive bytes from the stream

Sequence Numbers
[Figure: Host A sends a segment (TCP HDR plus data) whose sequence number is that of the 1st byte of the data, counting from the ISN (initial sequence number). Host B replies with a segment whose ACK sequence number is the next expected byte.]

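The MTU/MSS numbers and the sequence-number rule above can be tied together in a short sketch. It assumes 20-byte IP and TCP headers (no options), which is where 1500 - 20 - 20 = 1460 comes from; the function name `segment` and the starting sequence number 1000 are made up for illustration.

```python
MTU = 1500        # Ethernet example from the slide
IP_HDR = 20       # assumption: IP header without options
TCP_HDR = 20      # TCP header without options
MSS = MTU - IP_HDR - TCP_HDR  # 1460 bytes of stream data per segment


def segment(stream: bytes, first_seq: int, mss: int = MSS):
    """Yield (sequence_number, payload) pairs covering the byte stream.

    Each segment's sequence number is that of its first byte,
    modulo 2**32 because the field is 32 bits wide.
    """
    for offset in range(0, len(stream), mss):
        yield (first_seq + offset) % 2**32, stream[offset:offset + mss]


segments = list(segment(bytes(4000), first_seq=1000))
seqs = [seq for seq, _ in segments]
sizes = [len(payload) for _, payload in segments]
# The ack for a segment received in order is seq + length (S + N).
acks = [(seq + len(payload)) % 2**32 for seq, payload in segments]
```

Note how each ack equals the sequence number of the following segment, which is exactly the "next expected byte".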
Initial Sequence Number (ISN)
- Sequence number for the very first byte
  - E.g., why not just use ISN = 0?
- Practical issue
  - IP addresses and port #s uniquely identify a connection
  - Eventually, though, these port #s do get used again
  - So there is a chance an old packet is still in flight and might be associated with the new connection
- TCP requires (RFC 793) changing ISN over time
  - Set from 32-bit clock that ticks every 4 microseconds
  - which only wraps around once every 4.55 hours
- To establish a connection, hosts exchange ISNs

Connection Establishment: TCP's Three-Way Handshake

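Going back to the ISN clock mentioned above, the wrap-around time is easy to check by hand. The sketch below is just that arithmetic: a 32-bit counter ticking every 4 microseconds wraps after 2^32 x 4 microseconds, which works out to roughly 4.77 hours; the 4.55-hour figure is the approximate value quoted in RFC 793. The helper `isn_at` is a hypothetical name and models only the clock, not any real implementation's ISN selection.

```python
TICK_US = 4                      # one clock tick every 4 microseconds
WRAP_US = 2**32 * TICK_US        # microseconds until the 32-bit counter wraps
wrap_hours = WRAP_US / 1_000_000 / 3600


def isn_at(elapsed_us: int) -> int:
    """Value of the 32-bit ISN clock after elapsed_us microseconds."""
    return (elapsed_us // TICK_US) % 2**32


one_second_later = isn_at(1_000_000)   # 250,000 ticks in one second
after_full_wrap = isn_at(WRAP_US)      # back to 0
```

Either way the point of the slide stands: the wrap time is far longer than any packet is expected to survive in the network.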
Establishing a TCP Connection
[Figure: A sends SYN to B; B replies with SYN ACK; A replies with ACK. Each host tells its ISN to the other host.]
- Three-way handshake to establish connection
  - Host A sends a SYN (open; "synchronize sequence numbers") to host B
  - Host B returns a SYN acknowledgment (SYN ACK)
  - Host A sends an ACK to acknowledge the SYN ACK

TCP Header
- Flags: SYN, FIN, RST, PSH, URG, ACK
- See /usr/include/netinet/tcp.h on Unix systems

Step 1: A's Initial SYN Packet
  A's port | B's port
  Sequence number: A's Initial Sequence Number
  Acknowledgment: (irrelevant since ACK not set)
  HdrLen 5 = 20B | 0 | Flags: SYN set | Advertised window
  Checksum | Urgent pointer
- A tells B it wants to open a connection

Step 2: B's SYN-ACK Packet
  B's port | A's port
  Sequence number: B's Initial Sequence Number
  Acknowledgment = A's ISN plus 1
  HdrLen 20B | 0 | Flags: SYN and ACK set | Advertised window
  Checksum | Urgent pointer
- B tells A it accepts, and is ready to hear the next byte
- Upon receiving this packet, A can start sending data

Step 3: A's ACK of the SYN-ACK
  A's port | B's port
  Sequence number: A's Initial Sequence Number
  Acknowledgment = B's ISN plus 1
  HdrLen 20B | 0 | Flags: ACK set | Advertised window
  Checksum | Urgent pointer
- A tells B it's likewise okay to start sending
- Upon receiving this packet, B can start sending data

Timing Diagram: 3-Way Handshaking
  Client (initiator), active open: connect()
  Server, passive open: listen()
  1. Client to server: SYN, SeqNum = x
  2. Server to client: SYN + ACK, SeqNum = y, Ack = x + 1
  3. Client to server: ACK, Ack = y + 1
  Server: accept()

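The three steps can be replayed as a tiny simulation of the sequence and acknowledgment numbers. This is a sketch, not a network program: `Segment` and `three_way_handshake` are hypothetical names, x and y stand for the two ISNs as in the timing diagram, and the sequence number on the final ACK (x + 1) is standard TCP behavior rather than something the diagram spells out.

```python
from dataclasses import dataclass
from typing import Optional

MOD = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    flags: frozenset
    seq: int
    ack: Optional[int] = None  # None when the ACK flag is not set


def three_way_handshake(x: int, y: int) -> list:
    """Return the SYN, SYN+ACK, and ACK segments for client ISN x, server ISN y."""
    syn = Segment(frozenset({"SYN"}), seq=x)
    # The SYN occupies one sequence number, so the server acks x + 1.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=y, ack=(syn.seq + 1) % MOD)
    # Likewise the client acks y + 1 (its own seq is now x + 1).
    ack = Segment(frozenset({"ACK"}), seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]


segs = three_way_handshake(x=100, y=5000)
wrapped = three_way_handshake(x=2**32 - 1, y=7)  # ISN at the top of the range
```

The second call shows why the arithmetic is done modulo 2^32: an ISN of 2^32 - 1 is acknowledged with 0.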
What if the SYN Packet Gets Lost?
- Suppose the SYN packet gets lost
  - Packet is lost inside the network, or:
  - Server discards the packet (e.g., listen queue is full)
- Eventually, no SYN-ACK arrives
  - Sender sets a timer and waits for the SYN-ACK
  - and retransmits the SYN if needed
- How should the TCP sender set the timer?
  - Sender has no idea how far away the receiver is
  - Hard to guess a reasonable length of time to wait
  - SHOULD (RFCs 1122 & 2988) use default of 3 seconds
  - Other implementations instead use 6 seconds

SYN Loss and Web Downloads
- User clicks on a hypertext link
  - Browser creates a socket and does a "connect"
  - The "connect" triggers the OS to transmit a SYN
- If the SYN is lost
  - 3-6 seconds of delay: can be very long
  - User may become impatient and click the hyperlink again, or click "reload"
- User triggers an "abort" of the "connect"
  - Browser creates a new socket and another "connect"
  - Essentially, forces a faster send of a new SYN packet!
  - Sometimes very effective, and the page comes quickly

5 Minute Break
Questions Before We Proceed?

Tearing Down the Connection

Normal Termination, One Side At A Time
[Figure: timeline between A and B. After the SYN / SYN ACK exchange, A sends FIN and B acks it ("Connection now half-closed"); later B sends FIN and A acks it ("Connection now closed"). Annotations at A: "Timeout: avoid reincarnation" and "Can retransmit FIN if lost".]
- Finish (FIN) to close and receive remaining bytes
  - FIN occupies one octet in the sequence space
- Other host acks the octet to confirm
- Closes A's side of the connection, but not B's
  - Until B likewise sends a FIN
  - Which A then acks

Normal Termination, Both Together
[Figure: the same timeline, but B answers A's FIN with a single FIN + ACK, which A then acks ("Connection now closed"). Annotations: "Timeout: avoid reincarnation" and "Can retransmit FIN if lost".]
- Same as before, but B sets FIN with their ack of A's FIN

Sending/Receiving the FIN Packet
- Sending a FIN: close()
  - Process has finished sending data via the socket
  - Process calls close() to close the socket
  - Once TCP has sent all of the outstanding bytes, TCP sends a FIN
    - Even if bytes not yet acked, because the FIN has a seqno beyond all the bytes and thus won't be acked until all bytes are delivered
- Receiving a FIN: EOF
  - Process is reading data from the socket
  - Eventually, the attempt to read returns an EOF
    - All bytes prior to sender calling close() have been delivered

Abrupt Termination
[Figure: timeline between A and B showing the SYN exchange followed by RSTs sent from A to B.]
- A sends a RESET (RST) to B
  - E.g., because app. process on A crashed
- That's it
  - B does not ack the RST
  - Thus, RST is not delivered reliably
  - And: any data in flight is lost
  - But: if B sends anything more, will elicit another RST

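The close()/EOF behavior described under "Sending/Receiving the FIN Packet" can be observed from the application side. The sketch below uses `socket.socketpair()` as a local stand-in for a TCP connection (an assumption made so the example runs without a network); no real FIN is exchanged here, but the application sees the same pattern: all bytes sent before close() arrive, and then a read returns the empty byte string to signal EOF.

```python
import socket

a, b = socket.socketpair()   # two connected stream sockets, local to this host

a.sendall(b"last bytes")     # process A finishes sending its data
a.close()                    # on a TCP socket, this is what leads to the FIN

# Process B keeps reading until recv() returns b"", which means EOF.
chunks = []
while True:
    chunk = b.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
b.close()

received = b"".join(chunks)  # everything sent before the close was delivered
```

Reading in a loop matters: a stream socket may hand the data back in several pieces before the EOF shows up.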
Reliability: TCP Retransmission

Reasons for Retransmission
[Figure: three sender/receiver timelines, each showing a packet, a timeout, and a retransmitted packet.]
- Packet lost
- ACK lost: DUPLICATE PACKET
- Early timeout: DUPLICATE PACKETS

How Long Should Sender Wait?
- Sender sets a timeout to wait for an ACK
  - Too short: wasted retransmissions
  - Too long: excessive delays when packet lost
- TCP sets retransmission timeout (RTO) as function of RTT
  - Expect ACK to arrive an RTT after data sent
  - plus slop to allow for variations (e.g., queuing, MAC)
- But: how does the sender know the RTT?
- And: what's a good estimate for slop?

RTT Estimation
- Use exponential averaging:
  $\mathrm{SampleRTT} = \mathrm{AckRcvdTime} - \mathrm{SendPacketTime}$
  $\mathrm{EstimatedRTT} = \alpha \times \mathrm{EstimatedRTT} + (1 - \alpha) \times \mathrm{SampleRTT}$
  $\alpha = 7/8$ (for one measurement per flight)
[Figure: plot of EstimatedRTT and SampleRTT over time.]

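The exponential averaging rule for EstimatedRTT can be tried on a few numbers. This is a minimal sketch: the sample values (in milliseconds) are made up, seeding the estimate with an initial value of 100 is an assumption, and `update_estimated_rtt` is a hypothetical name.

```python
ALPHA = 7 / 8  # weight on the old estimate, as on the slide


def update_estimated_rtt(estimated: float, sample: float,
                         alpha: float = ALPHA) -> float:
    """EstimatedRTT = alpha * EstimatedRTT + (1 - alpha) * SampleRTT."""
    return alpha * estimated + (1 - alpha) * sample


estimate = 100.0          # assumed starting estimate, in ms
history = []
for sample in [100.0, 180.0, 100.0]:   # one delayed sample in the middle
    estimate = update_estimated_rtt(estimate, sample)
    history.append(estimate)
```

With alpha = 7/8, the single 180 ms outlier moves the estimate only from 100 to 110, and the estimate then decays slowly back toward the typical value.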
Jacobson/Karels Algorithm
- Compute slop in terms of observed variability
  - One solution: use standard deviation (requires expensive square root computation)
  - Use mean deviation instead
  $\mathrm{Difference} = \mathrm{SampleRTT} - \mathrm{EstimatedRTT}$
  $\mathrm{Deviation} = \mathrm{Deviation} + \alpha \times (|\mathrm{Difference}| - \mathrm{Deviation})$
  $\mathrm{RTO} = \mu \times \mathrm{EstimatedRTT} + \phi \times \mathrm{Deviation}$
  $\alpha = 1/4$ (again, for one measurement per flight), $\mu = 1$, $\phi = 4$
- Implementations often use a coarse-grained (500 msec) timer, so resulting value is large

Problem: Ambiguous Measurement
- How to differentiate between the real ACK, and ACK of the retransmitted packet?
[Figure: two sender/receiver timelines, each with an original transmission, a retransmission, and an ACK; in each it is unclear whether SampleRTT should be measured from the original transmission or from the retransmission.]

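The Jacobson/Karels formulas can be combined with the EstimatedRTT update into one small estimator. This is a sketch under stated assumptions: the class name `RtoEstimator` is hypothetical, the estimate is seeded with the first sample, the deviation starts at 0, and the two gains are kept as separate parameters (7/8 weight on the old RTT estimate, 1/4 gain for the deviation) even though the slides use the same symbol for both.

```python
class RtoEstimator:
    """Tracks EstimatedRTT and mean Deviation, and derives the RTO."""

    def __init__(self, first_sample: float, rtt_weight: float = 7 / 8,
                 dev_gain: float = 1 / 4, mu: float = 1.0, phi: float = 4.0):
        self.estimated_rtt = first_sample  # assumption: seed with first sample
        self.deviation = 0.0               # assumption: start with no deviation
        self.rtt_weight = rtt_weight
        self.dev_gain = dev_gain
        self.mu = mu
        self.phi = phi

    def update(self, sample: float) -> float:
        """Fold in one SampleRTT and return the new RTO."""
        difference = sample - self.estimated_rtt  # uses the old estimate
        self.estimated_rtt = (self.rtt_weight * self.estimated_rtt
                              + (1 - self.rtt_weight) * sample)
        self.deviation += self.dev_gain * (abs(difference) - self.deviation)
        return self.rto()

    def rto(self) -> float:
        return self.mu * self.estimated_rtt + self.phi * self.deviation


est = RtoEstimator(first_sample=100.0)    # values in ms, made up
rto_after_spike = est.update(180.0)       # a sample far from the estimate
rto_after_calm = est.update(110.0)        # a sample matching the estimate
```

The spike raises the RTO well above the RTT estimate (slop of 4 x Deviation), and the slop shrinks again once samples agree with the estimate.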
Karn/Partridge Algorithm
- Measure SampleRTT only for original transmissions
  - Once a segment has been retransmitted, do not use it for any further measurements
- Also, employ exponential backoff
  - Every time RTO timer expires, set $\mathrm{RTO} \leftarrow 2 \times \mathrm{RTO}$
  - (Up to maximum of 60 sec)
  - Every time new measurement comes in (= successful original transmission), collapse RTO back to computed value

State Diagrams
- For complicated protocols, operation depends critically on current "mode of operation"
- Important tool for capturing this: state diagram
- At any given time, protocol endpoint is in a particular state
  - Dictates its current behavior
- Endpoint transitions to other states on events
  - Interaction with lower layer
    - Reception of certain types of packets
  - Interaction with upper layer
    - New data arrives to send, or received data is consumed
  - Timers

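The two Karn/Partridge rules (ignore samples from retransmitted segments, and back off exponentially on timeouts) can be sketched as a small timer object. The class `KarnPartridgeTimer` and its method names are hypothetical, and the "computed value" is simply passed in rather than derived from RTT samples.

```python
MAX_RTO = 60.0  # seconds, the cap mentioned on the slide


class KarnPartridgeTimer:
    def __init__(self, computed_rto: float):
        self.computed_rto = computed_rto  # value from the RTT estimator
        self.rto = computed_rto           # value currently used for the timer

    def on_timeout(self) -> float:
        """RTO timer expired: double the RTO, up to the maximum."""
        self.rto = min(2 * self.rto, MAX_RTO)
        return self.rto

    def on_ack(self, was_retransmitted: bool) -> bool:
        """Return True if this ACK may be used for a SampleRTT measurement."""
        if was_retransmitted:
            return False                  # ambiguous ACK: no measurement
        self.rto = self.computed_rto      # collapse back to the computed value
        return True


timer = KarnPartridgeTimer(computed_rto=3.0)
backoff = [timer.on_timeout() for _ in range(6)]
usable_retx = timer.on_ack(was_retransmitted=True)    # backed-off RTO is kept
rto_kept = timer.rto
usable_orig = timer.on_ack(was_retransmitted=False)   # RTO collapses again
rto_reset = timer.rto
```

Starting from 3 seconds, repeated timeouts give 6, 12, 24, 48 and then stay at the 60-second cap until an unambiguous measurement arrives.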
TCP State Diagram
[Figure: TCP state diagram.]

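Since the diagram itself is not reproduced here, the idea of a state diagram can be sketched as a transition table. This is a simplified subset using the standard RFC 793 state names; it omits simultaneous open/close, resets, and most timers, and the event names (`active_open`, `recv_SYN_ACK`, and so on) are hypothetical labels chosen for illustration.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",        # client sends SYN
    ("CLOSED", "passive_open"): "LISTEN",         # server calls listen()
    ("LISTEN", "recv_SYN"): "SYN_RCVD",           # server sends SYN+ACK
    ("SYN_SENT", "recv_SYN_ACK"): "ESTABLISHED",  # client sends ACK
    ("SYN_RCVD", "recv_ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # this side sends FIN first
    ("FIN_WAIT_1", "recv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_FIN"): "TIME_WAIT",      # ack the peer's FIN
    ("TIME_WAIT", "timeout"): "CLOSED",
    ("ESTABLISHED", "recv_FIN"): "CLOSE_WAIT",    # peer closed first
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # now send our own FIN
    ("LAST_ACK", "recv_ACK"): "CLOSED",
}


def run(state: str, events: list) -> str:
    """Apply events in order; a KeyError marks an event invalid in that state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


client_end = run("CLOSED", ["active_open", "recv_SYN_ACK", "close",
                            "recv_ACK", "recv_FIN", "timeout"])
server_mid = run("CLOSED", ["passive_open", "recv_SYN", "recv_ACK", "recv_FIN"])
server_end = run(server_mid, ["close", "recv_ACK"])
```

Keeping behavior in a table like this makes it clear that the same event (for example, receiving a FIN) means different things depending on the current state.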
Summary
- Reliable, in-order, byte-stream delivery
  - Sequence numbers
  - ACKs
  - 3-way handshake to establish
  - 3-way or 4-way handshake to terminate
  - Timer-based retransmission
- State diagram to keep it all straight
- What's missing? Performance
- Next lecture
  - Congestion control