Computer Networking: Transport Layer and Sockets
Prof. Andrzej Duda
duda@imag.fr
http://duda.imag.fr
Transport layer and sockets
Chapter goals:
- introduction to the transport layer: multiplexing/demultiplexing, reliable data transfer, flow control
- introduction to sockets
Chapter overview:
- transport layer services
- multiplexing/demultiplexing
- connectionless transport: UDP
- connection-oriented transport: TCP (reliable transfer, flow control, connection management)
- sockets
Transport services and protocols
- provide logical communication between application processes running on different hosts
- transport protocols run in end systems
[Figure: two end hosts with full protocol stacks (application, transport, network, data link, physical) communicate through routers that implement only the network, data link and physical layers]
Transport vs. network layer services
- segment: unit of data exchanged between transport layer entities, aka TPDU (transport protocol data unit)
- IP service model: best-effort delivery service
- network layer: data transfer between end systems
- transport layer: data transfer between processes; relies on, and enhances, network layer services
[Figure: encapsulation from source to destination: the application message M becomes a transport segment (Ht M), a network datagram (Hn Ht M) and a link frame (Hl Hn Ht M); each layer's PDU is the SDU of the layer below]
Transport-layer protocols
Internet transport services:
- reliable, in-order unicast delivery (TCP): congestion control, flow control, connection setup
- unreliable ("best-effort"), unordered unicast or multicast delivery (UDP)
- services not available: real-time, bandwidth guarantees, reliable multicast
[Figure: end-to-end protocol stacks of two hosts connected through routers]
Multiplexing/demultiplexing
- multiplexing: gathering data at the source from different application-layer processes and putting them on the same link
- demultiplexing: delivering received segments to the correct application-layer processes
- multiplexing/demultiplexing is based on sender and receiver port numbers and IP addresses
[Figure: processes P1 to P6 on three hosts; application-layer data M is wrapped in a segment (header Ht) and then a datagram (header Hn); the sender multiplexes, the receiver demultiplexes to the right process]
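A minimal sketch of demultiplexing, assuming a toy lookup table rather than any real operating-system structure: UDP delivery here uses only the destination IP address and port, while a TCP connection is identified by the full (source IP, source port, destination IP, destination port) tuple. The `Demultiplexer` class, the `Segment` type and the addresses used with them are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str   # "udp" or "tcp"
    data: bytes

class Demultiplexer:
    """Hypothetical demultiplexing tables (a sketch, not an OS implementation)."""

    def __init__(self):
        self.udp = {}   # (dst_ip, dst_port) -> process
        self.tcp = {}   # (src_ip, src_port, dst_ip, dst_port) -> process

    def bind_udp(self, ip, port, process):
        self.udp[(ip, port)] = process

    def add_tcp_connection(self, src_ip, src_port, dst_ip, dst_port, process):
        self.tcp[(src_ip, src_port, dst_ip, dst_port)] = process

    def deliver(self, seg):
        """Return the process that should receive seg, or None if nothing matches."""
        if seg.protocol == "udp":
            # UDP: the destination address and port are enough
            return self.udp.get((seg.dst_ip, seg.dst_port))
        # TCP: the full 4-tuple selects the connection
        return self.tcp.get((seg.src_ip, seg.src_port, seg.dst_ip, seg.dst_port))
```

Two TCP connections to the same server port (for instance from client ports 1267 and 1268) therefore reach different processes, whereas all UDP datagrams sent to one port reach the same socket.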
UDP: User Datagram Protocol [RFC 768]
- "no frills", "bare bones" Internet transport protocol
- "best effort" service: UDP segments may be lost or delivered out of order to the application
- connectionless: no handshaking between UDP sender and receiver; each UDP segment is handled independently of the others
Why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender or receiver
- small segment header
- no congestion control: UDP can blast away as fast as desired
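UDP: more
- often used for streaming multimedia apps: loss tolerant, rate sensitive
- other UDP uses: DNS, SNMP
- reliable transfer over UDP: add reliability at the application layer (application-specific error recovery)
UDP segment format (32 bits wide):
+-----------------------+-----------------------+
|     source port #     |      dest port #      |
+-----------------------+-----------------------+
|        length         |       checksum        |
+-----------------------+-----------------------+
|          application data (message)           |
+-----------------------------------------------+
length: length in bytes of the UDP segment, including the header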
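The UDP header consists of four 16-bit fields in network byte order. This sketch packs and parses it with Python's `struct` module; the helper names are hypothetical, and the checksum is left at 0, which in UDP over IPv4 means "no checksum" (a real checksum also covers an IP pseudo-header, not shown here).

```python
import struct

# source port, dest port, length, checksum: four unsigned 16-bit fields, big-endian
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_segment(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prepend the 8-byte UDP header; length counts header + data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_segment(segment: bytes):
    """Return (src_port, dst_port, length, checksum, data)."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    return src_port, dst_port, length, checksum, segment[UDP_HEADER.size:length]
```

For example, a 5-byte DNS-style query from port 1267 to port 53 yields a 13-byte segment whose length field is 13.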
End-to-end UDP communication
[Figure: host A (IP addr=a) and host B (IP addr=b), each running several processes (pa, qa, ra, sa and pb, qb, rb, sb) over UDP and TCP on IP. A process on A using UDP port 1267 sends to port 53 on B. The IP datagram (IP header: SA=A, DA=B, prot=udp) encapsulates a UDP datagram: UDP source port=1267, UDP dest port=53, UDP message length, UDP checksum, data.]
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- connection-oriented: handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- point-to-point: one sender, one receiver
- full duplex data: bidirectional data flow in the same connection; MSS: maximum segment size
- send and receive buffers
- reliable, in-order byte stream: no message boundaries
- pipelined: TCP congestion and flow control set the window size
- flow controlled: the sender will not overwhelm the receiver
TCP: architecture
[Figure: the sending application writes data through the socket interface into the TCP send buffer; TCP segments carry it to the TCP receive buffer, from which the receiving application reads it through its socket interface]
Conceptual model: a reliable, flow-controlled channel between sender and receiver
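TCP segment structure (32 bits wide):
+-----------------------+-----------------------+
|     source port #     |      dest port #      |
+-----------------------+-----------------------+
|               sequence number                 |
+-----------------------------------------------+
|            acknowledgement number             |
+--------+--------+-----------+-----------------+
|head len|not used|U A P R S F| rcvr window size|
+--------+--------+-----------+-----------------+
|       checksum        |   ptr urgent data     |
+-----------------------+-----------------------+
|          options (variable length)            |
+-----------------------------------------------+
|      application data (variable length)       |
+-----------------------------------------------+
- sequence and acknowledgement numbers count bytes of data (not segments!)
- rcvr window size: # bytes the receiver is willing to accept
- flag field: URG: urgent data (generally not used); ACK: ACK # valid; PSH: push data now (generally not used); RST, SYN, FIN: connection establishment (setup, teardown commands)
- checksum: Internet checksum (as in UDP)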
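The Internet checksum used by TCP and UDP is the 16-bit one's complement of the one's complement sum of the data taken as 16-bit words (RFC 1071). A minimal sketch follows; for real TCP/UDP segments the sum would also include the IP pseudo-header, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in (end-around carry)
    return ~total & 0xFFFF
```

The receiver can check a segment by summing it together with the transmitted checksum: the result of `internet_checksum` over data plus checksum is then 0.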
TCP Connection Management
Recall: TCP sender and receiver establish a connection before exchanging data segments, to initialize TCP variables:
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
Client: connection initiator (socket connect primitive)
Server: contacted by the client (socket listen primitive, then accept for each incoming connection)
TCP Connection Management (cont.)
Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server; it specifies the initial seq # (socket connect primitive).
Step 2: the server end system receives the SYN and replies with a SYNACK control segment: it ACKs the received SYN, allocates buffers, and specifies the server-to-client initial seq # (socket accept primitive).
Step 3: the client end system sends a TCP ACK segment to the server and allocates buffers.
[Figure: timeline; the client (open) sends SYN; the server replies with SYN, ACK and allocates resources; the client replies with ACK and allocates resources]
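The sequence numbers exchanged in the three-way handshake can be sketched as follows, assuming arbitrary initial sequence numbers chosen only for illustration. A SYN consumes one sequence number, so each side acknowledges the other's initial seq # plus one. The function name and tuple layout are hypothetical.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return (sender, flags, seq, ack) for the three handshake segments."""
    return [
        ("client", "SYN", client_isn, None),                # step 1: client's initial seq #
        ("server", "SYN,ACK", server_isn, client_isn + 1),  # step 2: ACK the SYN, give server ISN
        ("client", "ACK", client_isn + 1, server_isn + 1),  # step 3: ACK the server's SYN
    ]
```

With initial sequence numbers 1000 and 5000, the SYNACK carries ACK 1001 and the final ACK carries seq 1001, ACK 5001.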
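TCP Connection Management (cont.)
Closing a connection: the client closes its socket (socket close primitive).
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends a FIN.
[Figure: timeline; the client (close) sends FIN; the server replies with ACK, then (close) sends FIN; the client replies with ACK and waits in timed wait before reaching closed]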
TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It enters timed wait, during which it will respond with an ACK to any received FIN.
Step 4: the server receives the ACK. The connection is closed.
[Figure: timeline; the client (closing) sends FIN, the server ACKs; the server (closing) sends FIN, the client ACKs and enters timed wait; both sides end closed]
Note: with a small modification, the protocol can handle simultaneous FINs.
TCP seq. #s and ACKs
- Seq. #s: byte-stream number of the first byte in the segment's data
- ACKs: seq # of the next byte expected from the other side; cumulative ACK
- Q: how does the receiver handle out-of-order segments? A: the TCP spec doesn't say; it is up to the implementation (e.g. accept out-of-order segments)
Simple telnet scenario (Host A, Host B, time flowing downwards):
- the user types 'C': Host A sends Seq=42, ACK=79, data='C'
- Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data='C'
- Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80
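The telnet scenario reduces to simple arithmetic on sequence numbers: each side ACKs the next byte it expects. This sketch (a hypothetical helper, one-character payload) reproduces the numbers above from the two starting sequence numbers.

```python
def telnet_echo(seq_a: int, seq_b: int, char: bytes = b"C"):
    """(direction, seq, ack, data) for: A sends char, B echoes it, A ACKs the echo."""
    n = len(char)
    return [
        ("A->B", seq_a, seq_b, char),         # user types the character
        ("B->A", seq_b, seq_a + n, char),     # B ACKs it (next byte expected) and echoes it
        ("A->B", seq_a + n, seq_b + n, b""),  # A ACKs the echoed character, no data
    ]
```

Calling `telnet_echo(42, 79)` gives the three segments of the figure: (42, 79), (79, 43) and (43, 80).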
TCP Flow Control
Flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast.
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer
- receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, in the RcvWindow field of the TCP segment
- sender: keeps the amount of transmitted, unACKed data no larger than the most recently received RcvWindow (unacked ≤ RcvWindow)
[Figure: receiver buffering]
Deadlock? When RcvWindow = 0, sender A continues to send 1-byte segments, so that it learns when the receiver's window opens again.
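A sketch of the flow control arithmetic. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are illustrative assumptions, not fields of the TCP header: the receiver advertises the spare room in its buffer, and the sender keeps its unACKed data within that window.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, advertised in the RcvWindow field."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def can_send(last_byte_sent: int, last_byte_acked: int, rcv_window_value: int) -> int:
    """Bytes the sender may still transmit while keeping unacked <= RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(rcv_window_value - unacked, 0)
```

With a 4096-byte buffer holding 2000 bytes not yet read by the application, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may then send 1096 more.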
Data transfer
- a TCP connection is an ordered sequence of bytes, carried in segments
- forming segments at the transport layer is independent of the application layer
- MSS (Maximum Segment Size): negotiated at connection establishment
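Because segmentation is independent of how the application writes, the transport layer can cut the byte stream anywhere. A minimal sketch, assuming `first_seq` is the sequence number of the first data byte (the hypothetical helper ignores the handshake and MSS negotiation):

```python
def segment_stream(data: bytes, mss: int, first_seq: int):
    """Cut a byte stream into (seq, payload) segments of at most MSS bytes.
    Each segment carries the seq # of its first data byte."""
    return [(first_seq + offset, data[offset:offset + mss])
            for offset in range(0, len(data), mss)]
```

Two application writes of "abc" and "defg" with an MSS of 3 bytes give segments "abc", "def" and "g": the write boundary between "c" and "d" is not preserved.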
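Data transfer
[Figure: bytes 1 to 11 of the stream divided into four regions: sent and ACKed | sent, not ACKed | may send | may not send; RcvWindow covers the "sent, not ACKed" and "may send" regions]
Sliding window:
- size announced by the receiver (RcvWindow)
- advances when ACKs are received
- closed when all bytes in the window have been sent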
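The four regions of the figure can be expressed as a classification of byte numbers. In this sketch `last_acked` and `last_sent` are hypothetical bookkeeping variables, the window is assumed to start right after the last ACKed byte, and the example boundaries are made up, since the figure's exact positions are not given.

```python
def classify(byte: int, last_acked: int, last_sent: int, rcv_window: int) -> str:
    """Place byte number `byte` in one of the four sliding-window regions."""
    if byte <= last_acked:
        return "sent and ACKed"
    if byte <= last_sent:
        return "sent, not ACKed"
    if byte <= last_acked + rcv_window:   # window = RcvWindow bytes after the last ACKed byte
        return "may send"
    return "may not send"
```

When a new ACK raises `last_acked`, every boundary after it shifts right: this is the window advancing.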
Example of data transfer (Reno)
Notation: first:next(length) is the range of data bytes carried (next = first + length), A is the acknowledgement number and W the advertised window.
8001:8501(500) A 101 W 6000
101:201(100) A 8501 W 14000
8501:9001(500) A 201 W 14247
9001:9501(500) A 201 W 14247
9501:10001(500) A 201 W 14247
(0) A 8501 W 13000
201:251(50) A 8501 W 12000
8501:9001(500) A 251 W 14247
reset timers
251:401(150) A 10001 W 12000
10001:10501(500) A 401 W 14247
Error recovery
- a segment carries the SEQ no. of its first data byte
- when sending a segment, start a timer
- the receiver acknowledges segments either immediately or after a small delay, sending the ACK when:
  - it has a data segment to send (piggybacking), or
  - it receives another segment (cumulative ACKs)
- general rule: ACK every other segment
- ACKs may be sent even if segments are received out of order
- when no ACK is received before the timeout, retransmit the first unACKed segment
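One implementation choice the text allows is a receiver that accepts out-of-order segments and keeps returning a cumulative ACK. The sketch below assumes segments never partially overlap and ACKs every segment immediately (no delayed ACKs); the class name is hypothetical.

```python
class CumulativeAckReceiver:
    """Buffers out-of-order segments; every ACK is the next byte expected."""

    def __init__(self, first_seq: int):
        self.expected = first_seq     # next in-order byte expected
        self.out_of_order = {}        # seq -> length of buffered segments

    def receive(self, seq: int, length: int) -> int:
        if seq == self.expected:
            self.expected += length
            # buffered segments that are now in order are delivered too
            while self.expected in self.out_of_order:
                self.expected += self.out_of_order.pop(self.expected)
        elif seq > self.expected:
            self.out_of_order[seq] = length   # keep it, but the ACK cannot advance
        # seq < expected: duplicate data, simply re-ACK
        return self.expected
```

If the segment starting at byte 101 is lost, later segments produce repeated ACKs of 101 (duplicate ACKs); once it arrives, a single cumulative ACK covers everything buffered.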
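Fast Retransmit
[Figure: the sender transmits P1 to P6 and P3 is lost; the receiver returns A1, A2, A2, A2, A2; on the duplicate ACKs the sender retransmits P3 (fast retransmit) before sending P7; the next ACK is A?]
- the timeout may be large
- fast retransmit adds the Selective Repeat behavior: if the sender receives 3 duplicate ACKs, it retransmits the missing segment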
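Fast retransmit can be sketched as a scan over the incoming ACKs, using the figure's convention that Ai acknowledges all segments up to Pi. The threshold of 3 duplicate ACKs comes from the text; the function itself is a hypothetical illustration.

```python
def fast_retransmit(acks, dup_threshold: int = 3):
    """Return (index of the triggering ACK, segment number to resend),
    or None if the ACKs never trigger a fast retransmit."""
    last_ack, dups = None, 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:
                return i, ack + 1          # resend the segment after the last one ACKed
        else:
            last_ack, dups = ack, 0        # new cumulative ACK: reset the duplicate count
    return None
```

For the figure's ACKs A1, A2, A2, A2, A2, the fifth ACK is the third duplicate of A2 and triggers the retransmission of P3.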
Congestion Control
Congestion: too many sources sending too much data too fast for the network (the bottleneck) to handle.
Manifestations:
- lost packets (buffer overflow at routers)
- long delays (queueing in router buffers)
Two broad approaches towards congestion control:
- end-end congestion control: no explicit feedback from the network; congestion is inferred from the loss and delay observed by end systems; the approach taken by TCP
- network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion or an explicit rate at which the sender should send
TCP Congestion Control
- end-end control (no network assistance)
- the transmission rate is limited by the congestion window size, Congwin, counted in segments: with Congwin = w segments, each with MSS bytes, sent in one RTT:
$$\text{throughput} = \frac{w \times \mathrm{MSS}}{\mathrm{RTT}} \ \text{bytes/sec}$$
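As a worked example of the formula, with hypothetical values w = 10 segments, MSS = 1460 bytes and RTT = 100 ms, the throughput is 10 × 1460 / 0.1 = 146,000 bytes/sec. The same computation as a small function:

```python
def tcp_throughput(congwin_segments: int, mss_bytes: int, rtt_seconds: float) -> float:
    """Bytes/sec when Congwin segments of MSS bytes are sent in each RTT."""
    return congwin_segments * mss_bytes / rtt_seconds
```

Doubling Congwin doubles the throughput, while doubling the RTT halves it for the same window.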
TCP congestion control: probing for usable bandwidth
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- loss: decrease Congwin, then begin probing (increasing) again
Two phases:
- slow start
- congestion avoidance
Important variables:
- Congwin
- threshold: defines the threshold between the slow start phase and the congestion avoidance phase
TCP Slowstart
Slowstart algorithm:
  initialize: Congwin = 1
  for (each ACK)
    Congwin++
  until (loss event OR Congwin > threshold)
[Figure: in successive RTTs Host A sends one segment, then two segments, then four segments to Host B]
- exponential increase (per RTT) in window size (not so slow!)
- loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
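The slowstart pseudocode translates almost line for line into Python. This sketch counts Congwin in segments, assumes one ACK per segment and no loss, and records the window at the start of each RTT to show the doubling.

```python
def slowstart(threshold: int):
    """Run slowstart from Congwin = 1 without loss.
    Returns (Congwin at the start of each RTT, final Congwin)."""
    congwin = 1
    windows = []
    while congwin <= threshold:
        windows.append(congwin)          # Congwin segments are sent in this RTT
        for _ in range(congwin):         # one ACK per segment sent in this RTT
            congwin += 1                 # for (each ACK) Congwin++
            if congwin > threshold:      # until Congwin > threshold
                break
    return windows, congwin
```

With threshold = 8 the window goes 1, 2, 4, 8 over four RTTs, and slowstart ends at Congwin = 9.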
TCP Congestion Avoidance
Congestion avoidance:
  /* slowstart is over */
  /* Congwin > threshold */
  until (loss event) {
    every w segments ACKed:
      Congwin++
  }
  threshold = Congwin/2
  Congwin = 1
  perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
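A per-RTT sketch of congestion avoidance and the reaction to a loss event. It assumes the loss is detected in the RTT that follows `loss_free_rtts` loss-free RTTs; the `triple_dupack` flag models the Reno footnote in simplified form (Congwin restarts at the new threshold instead of 1, skipping slowstart).

```python
def congestion_avoidance(congwin: int, threshold: int, loss_free_rtts: int,
                         triple_dupack: bool = False):
    """Returns (Congwin at the start of each loss-free RTT, new threshold, new Congwin)."""
    if congwin <= threshold:
        raise ValueError("congestion avoidance starts once Congwin > threshold")
    windows = []
    for _ in range(loss_free_rtts):
        windows.append(congwin)
        congwin += 1                 # every w segments ACKed: Congwin++ (+1 per RTT)
    # loss event, detected while the window is congwin
    threshold = congwin // 2         # threshold = Congwin/2
    congwin = threshold if triple_dupack else 1   # Reno-style fast recovery vs. slowstart
    return windows, threshold, congwin
```

Starting from Congwin = 9 and threshold = 8 (where slowstart with threshold 8 stops), three loss-free RTTs give windows 9, 10, 11; the loss at window 12 sets threshold = 6 and Congwin = 1 (or 6 with the Reno flag).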
TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity.
Why? Additive increase gives a slope of 1 as throughput increases; multiplicative decrease decreases throughput proportionally.
TCP congestion avoidance is AIMD (additive increase, multiplicative decrease):
- increase the window by 1 per RTT
- decrease the window by a factor of 2 on a loss event
Why is TCP fair?
Two competing sessions: additive increase gives a slope of 1 as throughput increases; multiplicative decrease decreases throughput proportionally.
Why does AI-MD work?
[Figure: source 1 and source 2 send through a router to a destination over a link of capacity C]
Simple scenario with two sources sharing a bottleneck link of capacity C.
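Throughput of sources
[Figure: throughput x1 of source 1 (horizontal axis, up to C) against throughput x2 of source 2 (vertical axis, up to C), with the fairness line x1 = x2; the operating point moves through 1. additive increase, 2. multiplicative decrease, 3. additive increase, 4. multiplicative decrease]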
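The geometric argument can be checked numerically. This sketch assumes two synchronized sources that both see a loss whenever x1 + x2 exceeds C; otherwise each adds 1 per step, and on loss each halves its rate. Additive increase keeps the gap x1 − x2 constant while every multiplicative decrease halves it, so the operating point converges towards the fairness line. The capacity and starting rates in the usage are hypothetical.

```python
def aimd(x1: float, x2: float, capacity: float, steps: int,
         increase: float = 1.0, decrease: float = 0.5):
    """Two synchronized AIMD sources on a link of capacity C: list of (x1, x2) points."""
    points = [(x1, x2)]
    for _ in range(steps):
        if x1 + x2 > capacity:                        # link overloaded: both see a loss
            x1, x2 = x1 * decrease, x2 * decrease     # multiplicative decrease
        else:
            x1, x2 = x1 + increase, x2 + increase     # additive increase (slope 1)
        points.append((x1, x2))
    return points
```

Starting far from fairness, at (1, 30) on a link with C = 40, the gap between the two rates is halved at each loss and becomes negligible after a couple of hundred steps.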
TCP Fairness
[Figure: loss: decrease window by factor of 2; congestion avoidance: additive increase]
Fairness of TCP
TCP differs from the pure AI-MD principle:
- window-based control, not rate-based
- the increase in rate is not strictly additive: the window is increased by 1/W for each ACK
The adaptation algorithm of TCP results in a negative bias against long round-trip times: it gives less throughput to sources having a larger RTT.
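Two small calculations illustrate these points. The first applies the per-ACK increase of 1/W over one RTT (W ACKs), which adds slightly less than one segment. The second assumes hypothetical RTTs of 50 ms and 150 ms and loss-free growth of one segment per RTT: the long-RTT source grows its window more slowly and also divides it by a larger RTT, so its throughput (w / RTT, in segments/sec) falls far behind.

```python
def window_after_acks(w: float, acks: int) -> float:
    """Congestion avoidance, per-ACK rule: W += 1/W for each ACK."""
    for _ in range(acks):
        w += 1.0 / w
    return w

def rate_after(duration_ms: int, rtt_ms: int, w0: int = 1) -> float:
    """Segments/sec after duration_ms of loss-free growth of one segment per RTT."""
    w = w0 + duration_ms // rtt_ms     # one increment per elapsed RTT
    return w * 1000 / rtt_ms           # throughput = w / RTT
```

After 1.5 s, the 50 ms source sends 620 segments/sec while the 150 ms source sends about 73, much less than a third of it.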
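Fairness of TCP
[Figure: S1 connects to the router over a 10 Mb/s, 20 ms link and S2 over a 10 Mb/s, 60 ms link; the router connects to the destination over a 1 Mb/s, 10 ms link; queues limited to 8 segments]
Example network with two TCP sources:
- links labeled with capacity and delay
- limited queues on the link (8 segments)
- NS simulation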
Throughput in time
[Figure: ACK numbers of S1 and S2 plotted against time]
Sockets
- interface between applications and the transport layer protocols
- socket: communication end-point
- network communication viewed as a file descriptor (socket descriptor)
Two main types of sockets:
- connectionless mode (or datagram, UDP protocol)
- connection mode (or stream, TCP protocol)
Connectionless mode
System calls in connectionless mode (UDP):
- socket: create a socket descriptor
- bind: associate it with a local address
- sendto: send a buffer of data
- recvfrom: receive data
- close: close the socket descriptor
Connectionless mode
client: socket(); bind(); sendto(); close();
server: socket(); bind(); recvfrom();
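A minimal connectionless exchange with Python's `socket` module, which exposes the same calls as the slide. Both endpoints run in one process on the loopback address (an assumption made to keep the example self-contained); binding to port 0 lets the OS choose free ports, and the server's reply, an upper-cased copy of the request, is a hypothetical service.

```python
import socket

def udp_exchange(message: bytes) -> bytes:
    """Send one datagram to a local server and return its reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # server: socket()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # client: socket()
    try:
        server.settimeout(5)               # datagrams may be lost: never wait forever
        client.settimeout(5)
        server.bind(("127.0.0.1", 0))      # bind(): port 0 lets the OS pick a free port
        client.bind(("127.0.0.1", 0))      # bind(): local address for the client
        client.sendto(message, server.getsockname())   # client: sendto()
        data, client_addr = server.recvfrom(2048)      # server: recvfrom() learns the sender
        server.sendto(data.upper(), client_addr)       # reply to the sender's address
        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        client.close()                     # close()
        server.close()
```

No connection is set up: each `sendto()` names its destination, and `recvfrom()` returns the source address needed to reply.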
Connectionless mode
[Figure: the application holds socket descriptors id=3 and id=4, each with its own buffer, bound to UDP ports 32456 and 32654; the host has IP address 129.88.38.84 and Ethernet address fe:a1:56:87:98:12]
Connection mode
System calls in connection mode (TCP); (S) = server, (C) = client:
- socket: create a socket descriptor
- bind: associate it with a local address
- listen: signal willingness to wait for incoming connections (S)
- accept: accept a new incoming connection (S)
- connect: ask to establish a new connection (C)
- send: send a buffer of data
- recv: receive data
- close: close the socket descriptor
Connection mode
client: socket(); bind(); connect(); send(); close();
server: socket(); bind(); listen(); accept(); recv(); close();
connect() and accept() perform connection establishment; send() and recv() perform data transfer.
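A minimal connection-mode exchange on the loopback address, with the server in a thread so the example is self-contained. The client skips the explicit bind() (the OS assigns a port on connect()), and the upper-casing server is a hypothetical service. Since TCP is a byte stream without message boundaries, the client half-closes with `shutdown(SHUT_WR)` and each side reads until the peer's FIN.

```python
import socket
import threading

def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending side (an empty read means FIN received)."""
    chunks = []
    while True:
        chunk = sock.recv(1024)            # recv(): may return any number of bytes
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def tcp_exchange(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # server: socket()
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))          # bind(): OS picks a free port
    server.listen(1)                       # listen(): backlog of 1 pending connection

    def serve_one():
        conn, _ = server.accept()          # accept(): new descriptor for this connection
        with conn:                         # close() when the block ends
            conn.sendall(recv_all(conn).upper())   # recv() the request, send() the reply

    worker = threading.Thread(target=serve_one)
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # client: socket()
    try:
        client.settimeout(5)
        client.connect(server.getsockname())   # connect(): three-way handshake
        client.sendall(message)                # send()
        client.shutdown(socket.SHUT_WR)        # no more data from the client: sends a FIN
        return recv_all(client)                # read the reply until the server closes
    finally:
        client.close()                         # close()
        worker.join()
        server.close()
```

Note that the listening socket and the socket returned by `accept()` are distinct descriptors: the first keeps accepting connections, the second carries the data of one connection.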
Connection mode
[Figure: the application holds socket descriptors id=3, id=4 and id=5, with a connection queue and two buffers, on TCP port 32456; the host has IP address 129.88.38.84 and Ethernet address fe:a1:56:87:98:12]
Summary
Principles behind the transport layer services:
- multiplexing/demultiplexing
- reliable data transfer
- flow control
- congestion control
Sockets: the application interface to network communication
- connectionless sockets (UDP)
- connection sockets (TCP)