TCP: Transmission Control Protocol
Suguru Yamaguchi
2014 Information Network 1

Functions that the transport layer provides
- Model: inter-process communication
  - Identification of processes
  - Communication between a pair of processes
- Interfaces for upper layers
  - Connection-oriented (virtual circuit)
  - Connectionless (datagram)
- Contention and coordination of network resources
  - Flow control: maximizing peer benefit
  - Congestion control: maximizing network welfare

Transport protocols in the Internet protocol suite
- TCP
  - Connection-oriented
  - Used by almost all applications
  - Powerful functions
- UDP
  - Connectionless
  - Simple, less overhead
  - IP + process identification
- Others
  - Many implementations and standards: SCTP, RTP, DCCP, ...

Process and connection
- Identification of a process: (IP, port)
- Identification of a TCP connection: (source IP, source port, destination IP, destination port)
[Figure: two connections between hosts 163.221.52.100 and 203.178.136.36, one between ports 1040 and 25, the other between ports 80 and 3175. The first is identified by (163.221.52.100, 1040) and (203.178.136.36, 25).]
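
The 4-tuple identification above can be sketched as a lookup table. This is a minimal illustration only, not how any particular OS kernel stores connections. The function name `demux`, the labels, and the second entry (source port 1041) are hypothetical and do not come from the slides.

```python
from typing import Dict, Tuple

# (source IP, source port, destination IP, destination port)
ConnKey = Tuple[str, int, str, int]


def demux(table: Dict[ConnKey, str], key: ConnKey) -> str:
    """Return the label of the connection that owns this 4-tuple."""
    return table.get(key, "no such connection")


connections: Dict[ConnKey, str] = {
    # Connection taken from the slide figure.
    ("163.221.52.100", 1040, "203.178.136.36", 25): "mail session",
    # Hypothetical second connection to the same server port:
    # only the source port differs, yet it is a distinct connection.
    ("163.221.52.100", 1041, "203.178.136.36", 25): "second mail session",
}

print(demux(connections, ("163.221.52.100", 1040, "203.178.136.36", 25)))
```

Because the whole 4-tuple is the key, many clients can talk to the same server port without their connections being confused.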

Port
- Ports are defined separately for each transport protocol.
  - TCP/25 is NOT equal to UDP/25
- The number has meaning. IANA manages the numbers.
  - Well-known ports: 1-1023
    - www (world wide web) = 80
    - smtp (simple mail transfer protocol) = 25
  - Registered ports: 1024-49151
    - Registration with IANA
  - Private ports: 49152-65535
  - http://www.iana.org/assignments/port-numbers

TCP service model (1)
- Connection-oriented
- Byte-stream service
  - No explicit boundary between messages
  - Message structure is defined by applications
- Full duplex
  - Independent streams for sending and receiving
- Reliable
  - Manages message order, duplication, discarding, and bit errors
[Figure: TCP viewed as a byte-stream service; the bytes "OLLEH OK" travel through the stream with no message boundary.]

Reliable stream, how?
- ACK: acknowledgement
  - Active acknowledgement
  - Duplicate ACK = notification of packet drop
- Timeout and retransmission
  - If the sender does not receive an ACK from its receiver: TIMEOUT
  - The sender assumes the transmission did not complete because of some error, and retransmits to the receiver.
  - Exponential back-off

ACK
[Figure: the sender holds the byte stream "Nara Institute of Science and Technology", divided into a sent-and-acknowledged part and a sent-but-unacknowledged part (sequence numbers 10 and 16 are marked) while user data keeps arriving. One packet is in transit, and the receiver has "Nara Insti" so far.]
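
The exponential back-off mentioned under "Timeout and retransmission" can be sketched as follows. The initial timeout, the retry count, and the 60-second cap are illustrative assumptions, not values given on the slides.

```python
from typing import List


def retransmission_timeouts(initial_rto: float,
                            max_retries: int,
                            max_rto: float = 60.0) -> List[float]:
    """List the timeout used for each successive retransmission.

    The timeout doubles after every expiry (exponential back-off)
    and is clamped to max_rto (the cap is an assumption here).
    """
    rto = initial_rto
    schedule = []
    for _ in range(max_retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule


print(retransmission_timeouts(1.0, 5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Doubling the wait keeps a sender from flooding a path that is already dropping packets.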

Piggybacking: speed up for ACK
[Figure: sender and receiver each hold a byte stream ("Nara Institute of Science and Technology" and "Graduate School of Information Science"), marked as sent and acknowledged / sent but unacknowledged, with user data arriving and a packet in transit. Data flows in both directions, so an ACK can ride on a data packet going the other way.]

*** Not accurate

Duplicate ACK
[Figure: the sender holds "Nara Institute of Science and Technology" with sent-and-acknowledged and sent-but-unacknowledged parts while user data arrives. A packet is lost among the outstanding packets, and the acknowledgement number 16 is repeated. The receiver has "Nara Institute o" so far.]

TCP header
- A TCP segment is the TCP header plus TCP data, and follows the IP header.
- Header layout (20 octets without options):
    16-bit source port        | 16-bit destination port
    32-bit sequence number
    32-bit acknowledgment number
    4-bit hlen | reserved | flags | 16-bit window size
    16-bit TCP checksum       | 16-bit urgent pointer
    (options)
    (TCP data)

Nagle algorithm
- Q. The headers (20 bytes + 20 bytes) are too large for 1 byte of data. How can we deal with this?
- Nagle algorithm
  - Only one unacknowledged small segment is allowed per connection.
  - If the segment to send is smaller than the receiver buffer, wait until enough data accumulates, or wait a predefined time, before transmitting.
  - Small RTT: small waiting time
  - Large RTT: fill up the buffer for good throughput
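
The "only one unacknowledged small segment" rule can be sketched as a send decision. This follows the commonly described form of Nagle's rule (send a full-sized segment immediately, otherwise send only when nothing is outstanding). Using the MSS as the size threshold is an assumption that goes beyond the slide's wording, and `nagle_can_send` and its parameters are hypothetical names.

```python
def nagle_can_send(pending_bytes: int, unacked_bytes: int, mss: int) -> bool:
    """Decide whether buffered data may be sent now.

    pending_bytes: data the application has written but TCP has not sent
    unacked_bytes: data sent earlier and not yet acknowledged
    mss:           maximum segment size (assumed threshold for "small")
    """
    if pending_bytes <= 0:
        return False          # nothing to send
    if pending_bytes >= mss:
        return True           # a full-sized segment is never delayed
    # A small segment goes out only if no earlier data is outstanding;
    # otherwise it waits for the ACK and coalesces with later writes.
    return unacked_bytes == 0


print(nagle_can_send(pending_bytes=1, unacked_bytes=0, mss=1460))  # True
print(nagle_can_send(pending_bytes=1, unacked_bytes=1, mss=1460))  # False
```

With a small RTT the ACK returns quickly and the wait is short. With a large RTT more writes pile up while waiting, so each segment carries more data.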

TCP service model (2)
- Buffered transfer
  - Write messages as long as you want
  - No explicit synchronization needed in the application layer
  - The OS manages the status of processes.
- Virtual circuit
  - Connection setup & release
  - Detecting disconnection in communication

Buffered transfer
[Figure: two processes, each calling write() and read(). In each OS kernel a send buffer and a receive buffer sit between the process and the TCP connection.]

TCP header again
- Fields (20 octets): sender port #, receiver port #, sequence #, ACK #, header length, reserved, flags, window size, checksum, pointer to OOB data, TCP options
- Flags: FIN, SYN, RST, PSH, ACK, URG

TCP connection setup: 3-way handshake
    Client (active open)                        Server (passive open)
                                                LISTEN
    SYN_SENT      ---- SYN J ---------------->  SYN_RECEIVED
    ESTABLISHED   <--- SYN K, ACK J+1 --------
                  ---- ACK K+1 -------------->  ESTABLISHED

TCP connection release
    close   ---- FIN ------------>
            <--- ACK of FIN ------
            <--- FIN -------------   close
            ---- ACK of FIN ----->

TCP connection reset
- RST
  - Abortive release
  - Nonexistent port

Options
- TCP options in the 3-way handshake
  - Options are negotiated during the 3-way handshake
- MSS option
  - Maximum segment size negotiation
- Window scale option
  - For huge message buffers, larger than 64 KB, using a bit shift
  - High-speed networks
- Timestamp option
  - More accurate RTT measurement
  - With MSS option
- Many options available

TCP state transition
[Figure: TCP state transition diagram, with server and client paths. States: CLOSED (start), LISTEN (passive open), SYN_RCVD, SYN_SENT (active open), ESTABLISHED (data transmission), FIN_WAIT_1, FIN_WAIT_2, CLOSING (simultaneous close), TIME_WAIT (2MSL), CLOSE_WAIT, LAST_ACK. FIN_WAIT_1, FIN_WAIT_2, CLOSING, and TIME_WAIT form the active close; CLOSE_WAIT and LAST_ACK form the passive close. Edges carry an event and the segment sent, for example "appl: passive open, send: <nodata>", "recv: SYN, send: SYN, ACK", "appl: close or timeout", "close, send: FIN", "recv: FIN, send: ACK", "recv: ACK, send: <nodata>". A simultaneous open path is also shown.]
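
The state transition diagram can be written down as a table keyed by (state, event). This is a simplified sketch covering only the normal open and close paths. Simultaneous open, RST handling, and the timeout from SYN_SENT are left out, and the event names are hypothetical labels chosen for this example.

```python
from typing import Dict, List, Optional, Tuple

# (current state, event) -> (next state, segment sent or None)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv_syn"): ("SYN_RCVD", "SYN,ACK"),
    ("SYN_SENT", "recv_syn_ack"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "recv_ack"): ("ESTABLISHED", None),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),      # active close
    ("ESTABLISHED", "recv_fin"): ("CLOSE_WAIT", "ACK"),   # passive close
    ("FIN_WAIT_1", "recv_ack"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv_fin"): ("CLOSING", "ACK"),       # simultaneous close
    ("FIN_WAIT_2", "recv_fin"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "recv_ack"): ("TIME_WAIT", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ack"): ("CLOSED", None),
    ("TIME_WAIT", "timeout_2msl"): ("CLOSED", None),
}


def run(events: List[str], state: str = "CLOSED") -> List[str]:
    """Apply events in order and return the states visited."""
    trace = []
    for event in events:
        state, _segment = TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


client = run(["active_open", "recv_syn_ack", "close",
              "recv_ack", "recv_fin", "timeout_2msl"])
print(client)
```

An event that is not valid in the current state raises `KeyError`, which makes illegal sequences easy to spot when experimenting.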

Summary
- Functions in the transport layer (L4)
- Internet transport protocols
- TCP service model
- High performance: ACK, piggybacking, Nagle algorithm
- Connection management

Tcpdump: 3-way handshake
    # tcpdump tcp and host iplab.naist.jp
    15:26:50.965563 IP rm.naist.jp.64868 > iplab.naist.jp.http: S 2196338486:2196338486(0) win 32120 <mss 1460,nop,wscale 0,nop,nop,timestamp 234659186 0,sackOK,eol>
    15:26:51.013517 IP iplab.naist.jp.http > rm.naist.jp.64868: S 2951392133:2951392133(0) ack 2196338487 win 57344 <mss 1414,nop,wscale 0,nop,nop,timestamp 10980172 234659186>
    15:26:51.013634 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 1 win 32246 <nop,nop,timestamp 234659187 10980172>
- Line format: time src.port > dst.port flag [from:to(nbytes) ack #] win # opt
- from, to, and ack are the 32-bit sequence and acknowledgement numbers; the flag column shows the TCP flags.

Tcpdump: connection release
    15:26:51.149121 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 5857 win 30554 <nop,nop,timestamp 234659188 10980187>
    15:27:06.103280 IP iplab.naist.jp.http > rm.naist.jp.64868: F 5857:5857(0) ack 430 win 58296 <nop,nop,timestamp 10981679 234659188>
    15:27:06.103372 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 5858 win 32246 <nop,nop,timestamp 234659337 10981679>
    15:27:10.938811 IP rm.naist.jp.64868 > iplab.naist.jp.http: F 430:430(0) ack 5858 win 32246 <nop,nop,timestamp 234659385 10981679>
    15:27:10.961089 IP iplab.naist.jp.http > rm.naist.jp.64868: . ack 431 win 58296 <nop,nop,timestamp 10982169 234659385>

Play with tcpdump
- tcpdump: a microscope for TCP communication
  - Use of RST
  - Packet transmission order
  - TCP options
    - MSS option
    - Window scale option

TCP flow control & congestion control
Suguru Yamaguchi
2014 Information Network 1

Contention and coordination of resources
- Flow control
  - Negotiation of processing performance
  - Recovery from message disorder
  - Recovery from message duplication, discard, and bit errors
  - Maximizing performance of data transmission
- Congestion control
  - Sharing network bandwidth among connections, suppressing network congestion
  - Fair sharing
  - Maximizing network welfare

Flow control
- Stop-and-wait
- Go-back-N
- Selective repeat
- Many schemes
  - ARQ (Automatic Repeat reQuest)

Stop-and-wait ARQ
[Figure: timing diagram between sender and receiver showing the intervals t1, t2, t3, t4, t1, t5 of one send-and-acknowledge cycle.]
- t1: transmission delay
- t2: frame transmission time
- t3: frame processing time
- t4: ACK transmission time
- t5: ACK processing time

Go-back-N ARQ
[Figure: the sender transmits frames 1 2 3 4 5, times out on frame 3, then resends 3 4 5 and continues with 6. The receiver sees 1 2 4 5 3 4 5 6.]

TCP flow control
- End to end
  - No global coordination
  - Works with available-bandwidth estimation at individual hosts
  - No interference with intermediate routers
  - Implicit signaling through packet drops
- Scalable
  - Works at each end host
  - Autonomous
  - Less state management

End to end control in TCP
[Figure: data flow from sender to receiver and ACK flow back, with possible packet drops in intermediate routers (both data and ACK).]
- Sender side
  - Timer & retransmission
  - Packet interval handling
  - On-the-fly packet control
  - Buffering for retransmission
- Receiver side
  - Timer & duplicate ACK
  - Delayed ACK
  - Window size notification
  - Buffering for reordering packets

Many contributions for TCP
- Very simple algorithm
  - Macroscopic self-stabilization
- Assumes no greedy nodes
  - No global control system
  - No node greedily eats as much bandwidth as possible
  - Rejects the idea of an intermediate policing system
- For many data-links
  - General purpose
  - Modest performance on almost all data-links
- Long-term tuning over the last 20 years

TCP flow control
- Bandwidth usage coordination
  - Sliding window
- Sequence-number-based control
  - Window size
- Packet gap control
  - ACK clocking
- Others
  - Error detection: TCP checksum
  - Discard detection: duplicate ACK, timeout

Sliding window
[Figure: the sender holds "Nara Institute of Science and Technology" with a sent-and-acknowledged part and a sent-but-unacknowledged part; the window size spans the unacknowledged data and sequence numbers 10 and 16 are marked while user data arrives. Packets in transit are also called on-the-fly packets or outstanding packets. The receiver has "Nara Insti" so far.]
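
The sliding window above can be sketched from the sender's side. Sequence numbers count packets rather than bytes to keep the example short, and the class and method names are hypothetical.

```python
class SlidingWindowSender:
    """Sender that keeps at most window_size packets unacknowledged."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.base = 0        # oldest sent-but-unacknowledged sequence number
        self.next_seq = 0    # next sequence number to send

    def can_send(self) -> bool:
        # Outstanding packets = next_seq - base; must stay under the window.
        return self.next_seq - self.base < self.window_size

    def send(self) -> int:
        if not self.can_send():
            raise RuntimeError("window is full")
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, ack: int) -> None:
        """Cumulative ACK: everything below `ack` has been received."""
        if self.base < ack <= self.next_seq:
            self.base = ack   # slide the window forward


sender = SlidingWindowSender(window_size=3)
print([sender.send() for _ in range(3)], sender.can_send())
sender.on_ack(2)
print(sender.can_send())
```

The window slides only when an ACK arrives, so the sender's rate is tied to the rate at which the receiver acknowledges data.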

Advertised window size from receivers
- Flow control of classic TCP
- rwnd: advertised window
  - Notification from the receiver of the maximum amount of data it can receive
  - Coordinated with the sender's sliding window size
  - Too sensitive to the bottleneck link

ACK clocking
[Figure: data flow and ACK flow through a bottleneck link. Data packets leave the bottleneck with packet gap T, and ACKs are generated with a gap as well. The sender transmits at the speed at which ACKs arrive (the bottleneck speed), so the connection is self-clocking once it is in balance.]

TCP congestion control: TCP Tahoe
- Fair-share model: end to end
- Increase/decrease of window size
  - Additive increase
  - Multiplicative decrease
  - For self-stabilization (Jain et al.)
- Strategy for changing the window size
- Detect congestion through packet drops

More control parameters: TCP Tahoe
- Parameters in the sender
  - cwnd: congestion window. Init: 1
  - ssthresh: slow start threshold. Init: large
  - tcprecvthresh: number of duplicate ACKs that triggers fast retransmit. Init: 3 in many implementations

Increasing window size
- Increase the congestion window (cwnd) exponentially, up to the slow start threshold (ssthresh)
- Overview of the algorithm

    On receiving an ACK:
    if (cwnd < ssthresh) {
        /* slow start */
        send 2 packets on every ACK;      /* exponential growth */
        cwnd += 1;
    } else {
        /* congestion avoidance */
        send cwnd+1 packets per round of ACKs;
        cwnd += 1 / cwnd;                 /* linear behavior */
    }

Increasing window size
- Slow start
  - Exponential increase
- Congestion avoidance
  - Additive increase
  - Linear growth
[Figure: number of packets versus time T; exponential growth during slow start, then linear growth during congestion avoidance.]
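
The per-ACK rule above can be run for a few round trips to see the two growth phases. This sketch assumes no loss, one ACK per packet, and a cwnd counted in packets. The function names and the ssthresh value of 8 are chosen for illustration only.

```python
from typing import List


def on_ack(cwnd: float, ssthresh: float) -> float:
    """Return the new cwnd after one ACK arrives."""
    if cwnd < ssthresh:
        return cwnd + 1.0         # slow start: +1 per ACK, doubles per RTT
    return cwnd + 1.0 / cwnd      # congestion avoidance: about +1 per RTT


def cwnd_per_rtt(rtts: int, ssthresh: float, cwnd: float = 1.0) -> List[float]:
    """Record cwnd at the start and after each round trip."""
    history = [cwnd]
    for _ in range(rtts):
        # One ACK comes back for every packet sent in this round.
        for _ in range(int(cwnd)):
            cwnd = on_ack(cwnd, ssthresh)
        history.append(cwnd)
    return history


history = cwnd_per_rtt(rtts=4, ssthresh=8.0)
print([round(w, 2) for w in history])
```

The window doubles each round trip until it reaches ssthresh, then grows by slightly less than one packet per round trip.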

Reducing window size (idea)
- When transmission exceeds the maximum throughput
  - Packet drops may occur because of buffer overrun.
- In the case of a packet drop:
  - Duplicate ACK returned
    - Congested, but not seriously (because ACKs still travel back)
    - Retransmission is probably OK
  - Timeout: Retransmission Time Out (RTO)
    - ACKs cannot travel back, so the congestion is serious.
    - It is better to wait for a while.

Reducing window size (overview of the algorithm)

    On detecting a packet drop:
    if (dup ACK # == tcprecvthresh) {
        /* fast retransmit */
        retransmission;
        ssthresh = cwnd / 2;
        cwnd = 1;                 /* slow start again */
    }
    if (timeout) {
        retransmission;
        timeout *= 2;             /* exponential backoff */
        cwnd = 1;
    }
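
The two loss reactions above can be sketched as small state updates. The code follows the slide pseudocode as written; the class, field, and function names and the default values are hypothetical, and the actual retransmission is left to the caller.

```python
from dataclasses import dataclass

DUP_ACK_THRESHOLD = 3   # "tcprecvthresh" on the slide


@dataclass
class TahoeState:
    cwnd: float = 1.0       # congestion window, in packets
    ssthresh: float = 64.0  # slow start threshold (illustrative initial value)
    rto: float = 1.0        # retransmission timeout, in seconds
    dup_acks: int = 0       # duplicate ACKs seen in a row


def on_dup_ack(s: TahoeState) -> bool:
    """Count a duplicate ACK; return True when fast retransmit fires."""
    s.dup_acks += 1
    if s.dup_acks == DUP_ACK_THRESHOLD:
        s.ssthresh = s.cwnd / 2
        s.cwnd = 1.0            # back to slow start
        return True             # caller retransmits the missing packet
    return False


def on_timeout(s: TahoeState) -> None:
    """Retransmission timer expired: back off and restart from one packet."""
    s.rto *= 2                  # exponential backoff
    s.cwnd = 1.0
    s.dup_acks = 0


state = TahoeState(cwnd=16.0)
print([on_dup_ack(state) for _ in range(3)], state.ssthresh, state.cwnd)
```

In both cases the window falls to one packet, which is the penalty that Reno later softens.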
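
Overall, TCP behaves like this
[Figure: number of packets versus time T; the window climbs from slow start toward the maximum throughput (which may change) and is cut back after each packet drop.]

RTO calculation
- $Err = M - A$
- $A \leftarrow A + g \cdot Err$
- $D \leftarrow D + h \, (|Err| - D)$
- $RTO = A + 4D$
  - M: measured RTT
  - A: smoothed RTT
  - D: smoothed mean deviation
  - g: gain for the average (1/8)
  - h: gain for the deviation (1/4)
- Simply: RTO = (average RTT) + 4 × (smoothed mean deviation)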
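
The RTO calculation can be sketched as a small estimator updated once per RTT measurement. The gains g = 1/8 and h = 1/4 are the ones on the slide. The class name, the starting values for A and D, and the sample RTT of 1.8 seconds are assumptions made for illustration.

```python
class RtoEstimator:
    """Smoothed RTT (A) and smoothed mean deviation (D), giving RTO = A + 4D."""

    def __init__(self, a: float, d: float,
                 g: float = 1 / 8, h: float = 1 / 4) -> None:
        self.a = a   # smoothed RTT
        self.d = d   # smoothed mean deviation
        self.g = g   # gain for the average
        self.h = h   # gain for the deviation

    def rto(self) -> float:
        return self.a + 4 * self.d

    def update(self, m: float) -> float:
        """Feed one measured RTT `m` and return the new RTO."""
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        return self.rto()


est = RtoEstimator(a=1.0, d=0.0)     # starting values are assumptions
print(round(est.update(1.8), 3))      # one slow sample raises the RTO
```

A single late sample moves the average only slightly but raises the deviation term, so the RTO widens quickly when RTTs become variable.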

More improvement: TCP Reno
- Issues in Tahoe
  - Too much penalty from doing slow start after fast retransmit
  - Better control of cwnd is needed
- Fast recovery

    if (dup ACK # == tcprecvthresh) {
        retransmission;           /* fast retransmit */
        ssthresh = cwnd / 2;
        cwnd = cwnd/2 + tcprecvthresh;
    }
    if (dup ACK # > cwnd/2)
        send one new packet on every dup ACK;
    if (ACK on retransmission)
        cwnd = ssthresh;

Less penalty
[Figure: number of packets versus time T; after the initial slow start the window falls only to cwnd/2 after a drop and grows again toward the maximum throughput (which may change).]
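
Fast recovery can be sketched as follows. Note one swap: instead of the slide's "dup ACK # > cwnd/2" test, this sketch uses the window-inflation form, where cwnd grows by one for each further duplicate ACK, which also lets new packets out during recovery. All names here are hypothetical and the actual retransmission is left to the caller.

```python
from dataclasses import dataclass

DUP_ACK_THRESHOLD = 3   # "tcprecvthresh" on the slide


@dataclass
class RenoState:
    cwnd: float
    ssthresh: float
    dup_acks: int = 0
    in_fast_recovery: bool = False


def on_dup_ack(s: RenoState) -> str:
    """Handle one duplicate ACK and report the action taken."""
    s.dup_acks += 1
    if s.dup_acks == DUP_ACK_THRESHOLD and not s.in_fast_recovery:
        s.ssthresh = s.cwnd / 2
        s.cwnd = s.ssthresh + DUP_ACK_THRESHOLD
        s.in_fast_recovery = True
        return "retransmit"      # fast retransmit of the missing packet
    if s.in_fast_recovery:
        s.cwnd += 1              # each dup ACK means a packet left the network
        return "inflate"
    return "wait"


def on_new_ack(s: RenoState) -> None:
    """ACK for the retransmission: deflate the window, skip slow start."""
    if s.in_fast_recovery:
        s.cwnd = s.ssthresh
        s.in_fast_recovery = False
    s.dup_acks = 0


state = RenoState(cwnd=16.0, ssthresh=64.0)
print([on_dup_ack(state) for _ in range(4)], state.cwnd)
on_new_ack(state)
print(state.cwnd)
```

After recovery the window resumes from half its old size rather than from one packet, which is the reduced penalty compared with Tahoe.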

More improvement
- Selective Acknowledgement (SACK)
- Rate-based flow control: TCP Vegas
- TFRC: TCP Friendly Rate Control (RFC4828)
- Explicit Congestion Notification (ECN)
- Interaction with RED
- TCP extensions for wireless links
- ...

Summary
- Flow control
  - Stop-and-wait
  - Go-back-N
  - Sliding window
- Congestion control
  - Slow start
  - Congestion avoidance
  - Fast retransmit
  - Fast recovery