Design and Modeling of Internet Protocols
Dmitri Loguinov
March 1, 2005
Agenda
- What is protocol scalability
- Why TCP does not scale
- Future high-speed applications
- AQM congestion control
- Other work at Texas A&M
Protocol Scalability
- The Internet is a complex large-scale distributed system
  - Over 285 million hosts advertised by DNS in July 2004 (ISC Internet Domain Survey)
  - More than 60 million websites in Jan 2005 (Netcraft.com)
  - Over 8 billion webpages crawled by Google
  - Millions of routers, switches, and other devices attached to the network
- The Internet continues to grow exponentially
  - Not only in size, but also in bandwidth, number of users, hosts, and amount of data transmitted
Protocol Scalability 2
- Any potential problems in the current Internet will become amplified in the future
- Scalability thus becomes a fundamental issue
  - Scalability determines how well the network handles increase in its size
- Network protocols provide communication between users and determine how the Internet sustains its current/future load
- As the size of the network grows, some protocols may not scale well, leading to noticeable problems
Congestion Control
- One of the fundamental problems in computer networks is how to manage congestion
- Special protocols called congestion control are designed to make sure that congestion is handled promptly and properly
- What is congestion? [figure: single-lane road analogy]
Congestion Control 2
- Instead of cars, the Internet has packets
  - A packet is a piece of data that travels over a network
- Too many packets sent into a given link cause congestion
- Congestion fills up the outgoing queues, eventually causing packet loss [figure: router with full outgoing queue]
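The queue-overflow mechanism on this slide can be illustrated with a small simulation (a sketch of my own, not from the talk): packets arrive at a router faster than its outgoing link can drain them, the finite buffer fills, and every additional packet is dropped.

```python
# Minimal congestion sketch: a router with a finite outgoing queue.
# All rates are in packets per time step.

def simulate_queue(arrival_rate, service_rate, buffer_size, duration):
    """Returns (delivered, dropped) packet counts."""
    queue = 0
    delivered = dropped = 0
    for _ in range(duration):
        # packets arriving this step
        for _ in range(arrival_rate):
            if queue < buffer_size:
                queue += 1       # buffered in the outgoing queue
            else:
                dropped += 1     # buffer full: packet lost to congestion
        # packets the outgoing link can transmit this step
        sent = min(queue, service_rate)
        queue -= sent
        delivered += sent
    return delivered, dropped

# offer 15 packets/step to a 10 packets/step link with a 50-packet buffer
delivered, dropped = simulate_queue(15, 10, 50, 100)
print(delivered, dropped)  # → 1000 460
```

Once the buffer fills (after 8 steps here), the excess 5 packets per step are lost, exactly the loss behavior TCP relies on as its congestion signal.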
Congestion Control 3
- TCP is the current standard for congestion control
- TCP was originally designed in the early 1980s
  - Early versions of TCP did not have proper congestion control
  - This resulted in congestion collapses throughout the mid-1980s
  - Network utilization was close to 100%, but the throughput an application was able to obtain was close to zero
- TCP was re-designed in 1988, 1990, and 1992
Congestion Control 4
- The same 12-year-old protocols run in the Internet today
- One of the main issues with TCP at this stage is its scalability in high-bandwidth networks
  - Can TCP effectively utilize terabit (10^12) and petabit (10^15) per second links?
- Even at gigabit/second (10^9) speeds, TCP exhibits difficulties in long-term transfers
  - It takes close to an hour for TCP to reach full link utilization
  - The protocol was not designed for such capacities
Congestion Control 5
- If network bandwidth continues to double every year, terabits and petabits per second will become mainstream in 10 and 20 years, respectively
- Assume 100 ms delay between hosts

  Link capacity   Time to reach full utilization
  10 gb/s         1.15 hours
  1 tb/s          4.8 days
  1 pb/s          13 years
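The table's numbers can be reproduced with a back-of-the-envelope calculation (my reconstruction, not the talk's own derivation): after a loss, TCP halves its congestion window and then grows it by one packet per round-trip time, so climbing back from W/2 to the full window W takes about W/2 RTTs.

```python
# Time for TCP's additive increase to refill a link after a window halving.
RTT = 0.1            # 100 ms delay between hosts, as on the slide
PKT_BITS = 1500 * 8  # assumed 1500-byte packets

def ramp_up_time(link_bps):
    full_window = link_bps * RTT / PKT_BITS  # packets in flight at 100% utilization
    return (full_window / 2) * RTT           # W/2 RTTs of one-packet-per-RTT growth

for name, bps in [("10 gb/s", 10e9), ("1 tb/s", 1e12), ("1 pb/s", 1e15)]:
    print(name, ramp_up_time(bps) / 3600, "hours")
```

With these assumptions the results are about 1.16 hours, 4.8 days, and 13 years, matching the table.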
Congestion Control 6
- A more subtle issue is that TCP cannot experience packet loss for this entire duration
- In fact, the table below shows that network loss must be substantially below all realistic values (10^-6 to 10^-8)

  Link capacity   Average loss probability
  10 gb/s         10^-10
  1 tb/s          10^-14
  1 pb/s          10^-20
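These loss bounds follow from the well-known TCP throughput approximation, rate ≈ 1.22·S / (RTT·√p), solved for the loss probability p needed to sustain a given rate (again my sketch of the arithmetic, under the same 1500-byte-packet assumption):

```python
# Loss probability p that TCP needs to sustain a given link rate,
# from the inverse of the throughput formula rate = 1.22*S/(RTT*sqrt(p)).
RTT = 0.1            # 100 ms delay between hosts
PKT_BITS = 1500 * 8  # assumed 1500-byte packets

def required_loss(link_bps):
    window = link_bps * RTT / PKT_BITS  # packets per RTT at full speed
    return (1.22 / window) ** 2         # solve rate equation for p

for name, bps in [("10 gb/s", 10e9), ("1 tb/s", 1e12), ("1 pb/s", 1e15)]:
    print(name, f"{required_loss(bps):.1e}")
```

The computed values (~2×10^-10, ~2×10^-14, ~2×10^-20) agree with the table's orders of magnitude.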
High Bandwidth Applications
- Scientific applications today require transfer of tera- and petabytes of data
  - Telescope data, various physics and sensor applications
  - Teragrid: 40 gb/s distributed infrastructure for open scientific research
- HDTV over the Internet
  - Uncompressed HD video at a basic (1024x768) resolution requires 1.4 gb/s
  - 125 channels of HD video = 200 gb/s
  - 1000 channels on the backbone = 1.2 tb/s
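One plausible derivation of the 1.4 gb/s figure (the talk does not state its assumptions; the 10-bit color depth and 60 fps below are my guesses):

```python
# Uncompressed video bitrate = pixels/frame × bits/pixel × frames/second.
width, height = 1024, 768
bits_per_pixel = 3 * 10  # three color channels at 10 bits each (assumed)
fps = 60                 # assumed frame rate

one_channel = width * height * bits_per_pixel * fps  # bits per second
print(one_channel / 1e9)  # ≈ 1.42 gb/s per uncompressed channel
```

Multiplying this per-channel rate by 125 or 1000 channels gives aggregates on the order of the slide's 200 gb/s and ~1 tb/s figures (which presumably include rounding or headroom).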
High Bandwidth Applications 2
- General requirements on future congestion control
  - High link utilization
  - Fast convergence
  - Low oscillations
  - Low packet loss
  - Low end-to-end delay
- Video streaming is more sensitive to the last three issues than simple data transfer
AQM Congestion Control
- A recent direction in congestion control relies on Active Queue Management (AQM)
  - Routers compute congestion information and insert it into passing packets
  - In contrast, TCP infers congestion from packet loss observed by the receiver
- AQM algorithms generally scale better and can exhibit many useful properties not available in TCP
- We recently developed an AQM algorithm with many desirable properties listed above
AQM Congestion Control 2
- The algorithm reaches full link utilization in the same number of steps for all links
  - With 200 ms delay, it takes 3 seconds to utilize a 1 mb/s, 1 tb/s, or 1 googol (10^100) bps link
- It exhibits fairness
  - All flows sharing a link receive an equal share of the link
  - TCP discriminates against flows with large delay
  - Convergence to fairness is within 3 seconds as well
- No oscillations in the steady state
  - Link utilization reaches 100% and stays there
AQM Congestion Control 3
- The method does not lose any packets
  - Link capacity is never exceeded
- It is stable for arbitrary (including time-varying) delays
  - Control theoretic stability means that rates always converge to the desired state
- It is low-overhead and can be implemented with four additions per arriving packet inside routers
- Implemented in the Linux kernel
  - Experiments with a 1 gb/s network at Texas A&M
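The flavor of explicit-feedback control behind these properties can be sketched in a few lines. This is NOT the talk's actual algorithm (the slides do not specify it), just a toy controller in the same spirit: the router computes each flow's fair share of the link and stamps it into passing packets, and every flow moves a fixed fraction of the way toward that share each RTT. Because the update is relative, convergence takes the same number of RTTs for any link capacity.

```python
# Toy explicit-feedback (AQM-style) rate control -- illustrative only.
def run(capacity, rates, alpha=0.5, steps=15):
    """Iterate 'steps' RTTs; with a 200 ms RTT, 15 steps is ~3 seconds."""
    for _ in range(steps):
        fair = capacity / len(rates)  # fair share computed by the router
        # each flow moves fraction alpha of the way toward the fair share
        rates = [r + alpha * (fair - r) for r in rates]
    return rates

# two flows with wildly different starting rates on a 1 tb/s link:
print(run(1e12, [1e3, 9e11]))  # both end within 0.1% of 5e11 (equal shares)
# same flows scaled to a 1 mb/s link: identical number of steps to converge
print(run(1e6, [10.0, 9e5]))
```

The per-flow deviation from the fair share shrinks by the factor (1 - alpha) every RTT regardless of the link speed, which is the intuition behind capacity-independent convergence time.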
AQM Congestion Control 4
- Linux topology (left) and sending rate of flows x1 and x2 (right) [figures]
AQM Congestion Control 5
- Three flows (x1, x2, x3) using the same topology [figures]
Other Work
- Besides congestion control, we also study Internet protocols and create models for the various Internet systems
- These include
  - P2P systems (low-diameter graphs, faster searches, analysis of resilience)
  - Evolution models (graph theoretic approaches to understanding the topology of the Internet)
  - Better video streaming and compression
  - Topology discovery, content distribution networks, performance analysis, traffic measurement, etc.