Congestion Notification in Ethernet: Part of the IEEE 802.1 Data Center Bridging standardization effort




Berk Atikoglu, Abdul Kabbani, Balaji Prabhakar (Stanford University); Rong Pan (Cisco Systems); Mick Seaman

Background
Switches and routers send congestion signals to end-systems to regulate the amount of network traffic. There are two types of congestion:
Transient: caused by random fluctuations in the arrival rate of packets, and effectively dealt with using buffers and link-level pausing (or dropping packets in the case of the Internet).
Oversubscription: caused by an increase in the applied load, either because existing flows send more traffic or (more likely) because new flows have arrived.
We have been developing QCN (Quantized Congestion Notification), an algorithm being studied by the IEEE 802.1 Data Center Bridging group for deployment in Ethernet.

Congestion management/control
In essence, a CM scheme induces a control loop between a switch (which sends congestion signals) and a source (which changes its sending rate in response). It is important to study this control loop from the following viewpoints:
Performance: stability, throughput, fairness, robustness
Implementation: simplicity, cost of deployment
The performance analysis problem is complicated because there are many interacting control loops: one switch to many sources, and many switches to many (common) sources.

Summary of what is to come
Congestion control algorithms aim to deliver high throughput, maintain low latencies/backlogs, be fair to all flows, be simple to implement and easy to deploy.
Performance is related to the stability of the control loop. Stability refers to the non-oscillatory, non-exploding behavior of congestion control loops; in concrete terms, it refers to the non-oscillatory behavior of the queues at the switch.
If the switch buffers are short, oscillating queues can overflow (hence drop packets or pause the link) or underflow (hence lose utilization). In either case, links cannot be fully utilized, throughput is lost, and flow transfers take longer. So stability is important to establish.

Unit step response of the network
The control loops are not easy to analyze: they are described by non-linear delay differential equations which are usually impossible to analyze directly. So linearized analyses are performed using Nyquist or Bode theory.
Is linearized analysis useful? Yes. It is not difficult to know whether a zero-delay non-linear system is stable. As the delay increases, linearization can be used to tell whether the system is stable for delay (or number of sources) in some range; i.e., we get sufficient conditions.
This stability theory is essentially studying the unit step response of a network: apply many infinitely long flows at time 0 and see how long the network takes to settle them to the correct collective and individual rates; the first is about throughput, the second is about fairness.
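As a toy illustration of the linearized approach (not the TCP--RED model itself), the sketch below checks stability of a first-order loop with feedback delay, L(s) = K e^(-s*tau)/(s + a), by locating the phase-crossover frequency and computing the gain margin. The values of K, a and tau are made-up illustrative numbers; the point is only that increasing the delay shrinks the margin, as the slide says.

```python
import math

def loop_gain_phase(omega, K, a, tau):
    # L(jw) = K * e^{-j w tau} / (j w + a): magnitude and phase
    mag = K / math.hypot(omega, a)
    phase = -omega * tau - math.atan2(omega, a)
    return mag, phase

def gain_margin(K, a, tau):
    # Phase decreases monotonically in omega; bisect for phase = -pi.
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2.0
        _, ph = loop_gain_phase(mid, K, a, tau)
        if ph > -math.pi:
            lo = mid
        else:
            hi = mid
    mag, _ = loop_gain_phase((lo + hi) / 2.0, K, a, tau)
    return 1.0 / mag   # > 1 means the linearized loop is stable

print(gain_margin(K=4.0, a=1.0, tau=0.1))   # small delay: margin above 1
print(gain_margin(K=4.0, a=1.0, tau=0.5))   # larger delay: margin below 1
```

This mirrors the sufficient-condition flavor of the analysis: for a fixed gain, the loop is stable up to some delay and unstable beyond it.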

Flow-level models
However, what users care about is flow (file) transfer times, not link utilization or buffer overflows. Recognizing this, the literature devotes substantial effort to rating an algorithm's ability to transfer flows quickly: flows of various sizes continually arrive at the network and are processed according to the underlying congestion control and transport algorithms. Metrics like average flow completion time and its variance reveal the goodness of the schemes.
The two parts mesh as follows: first use the unit step response to get the control gain parameters, then use the flow-level models to understand the dynamic performance of the network under more realistic loads. Winning algorithms score high on both metrics and are simple.
We're going to see how all this works at L3 and then at L2.

Overview of Internet research

A one-slide (extreme) summary
The basic control loop: an end-system--router pair.
End-systems either have simple reactions, e.g. cut the window by a factor of 1/2 (TCP), or elaborate reactions with various increase/decrease behaviors (e.g. High-speed TCP, FAST TCP).
The router can send simple signals, i.e. drop/mark based on queue size alone (e.g. RED, DropTail), or detailed signals, i.e. drop/mark based on queue size and link utilization (e.g. REM, PI controller, AVQ).
For high bandwidth-delay-product networks, the simple-simple combo doesn't work (e.g. RED-TCP); this is what has led to the development of the other algorithms.
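The "simple reaction" end-system behavior above is window AIMD, as in TCP congestion avoidance. A minimal sketch, with a synthetic periodic loss pattern purely to exhibit the sawtooth:

```python
def aimd_trace(n_rtts, loss_every, w0=1.0):
    """Evolve a congestion window: +1 per RTT, halved on loss."""
    w, trace = w0, []
    for t in range(1, n_rtts + 1):
        if t % loss_every == 0:     # synthetic periodic loss event
            w = max(w / 2.0, 1.0)   # multiplicative decrease
        else:
            w += 1.0                # additive increase
        trace.append(w)
    return trace

trace = aimd_trace(n_rtts=40, loss_every=10)
```

Running this produces the familiar sawtooth: the window climbs linearly between losses and halves at each loss.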

TCP--RED: A deep dive into a basic control loop
TCP: slow start + congestion avoidance. Congestion avoidance is AIMD: no loss, increase the window by 1 per RTT; packet loss, cut the window by half.
RED: the drop probability p increases as the congestion level goes up, rising from 0 at min_th to its maximum at max_th as a function of the average queue length q_avg.
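The RED side of this loop can be written down directly. A minimal sketch of the standard RED marking function (EWMA averaging of the queue plus a linear drop ramp between min_th and max_th); the parameter values are illustrative, not recommendations:

```python
def ewma(q_avg, q_sample, weight=0.002):
    """RED's low-pass filter on the instantaneous queue length."""
    return (1.0 - weight) * q_avg + weight * q_sample

def red_drop_prob(q_avg, min_th=5.0, max_th=15.0, max_p=0.1):
    """Drop probability rises linearly from 0 at min_th to max_p at max_th."""
    if q_avg < min_th:
        return 0.0
    if q_avg >= max_th:
        return 1.0   # force drops once the average queue exceeds max_th
    return max_p * (q_avg - min_th) / (max_th - min_th)
```

The EWMA is what makes RED respond to sustained congestion rather than transient bursts; the ramp is the p-vs-q_avg curve sketched on the slide.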

The simulation setup
[Figure: dumbbell topology with two RED routers connected by 100 Mbps links; three groups of TCP flows (grp1, grp2, grp3) whose populations vary between 0 and 1200 over a 200-second interval.]

TCP--RED: Analytical model 1/R W : +1 - W N R - C! q W 2 Time Delay p LPF TCP Control RED Control 11

TCP--RED: Analytical model
Users: dW_i(t)/dt = 1/RTT_i(t) - (W_i(t)/2) * (W_i(t)/RTT_i(t)) * p(t)
Network: dq(t)/dt = sum_i W_i(t)/RTT_i(t) - C
W: window size; RTT: round-trip time; C: link capacity; q: queue length; q_a: average queue length; p: drop probability.
Model by V. Misra, W.-B. Gong and D. Towsley, SIGCOMM 2000. The fluid-model concept originated with F. Kelly, A. Maulloo and D. Tan, Journal of the Operational Research Society, 1998.
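The coupled equations above can be integrated numerically. Below is a rough Euler-integration sketch of a single-bottleneck version with N identical flows, a RED-style drop ramp on the instantaneous queue, and the feedback delay ignored; all parameter values are made up for illustration, not taken from the paper:

```python
def simulate_fluid(N=50, C=9000.0, rtt=0.1, steps=20000, dt=0.001,
                   min_th=50.0, max_th=150.0, max_p=0.1):
    """Euler integration of dW/dt and dq/dt for N identical TCP flows."""
    W, q = 1.0, 0.0
    for _ in range(steps):
        # RED-style drop probability from the (instantaneous) queue
        if q < min_th:
            p = 0.0
        elif q > max_th:
            p = 1.0
        else:
            p = max_p * (q - min_th) / (max_th - min_th)
        dW = 1.0 / rtt - (W / 2.0) * (W / rtt) * p   # per-flow window dynamics
        dq = N * W / rtt - C                          # bottleneck queue dynamics
        W = max(W + dW * dt, 0.0)
        q = max(q + dq * dt, 0.0)
    return W, q

W, q = simulate_fluid()   # settles near the equilibrium W = C*rtt/N
```

At equilibrium the aggregate rate matches capacity (N*W/RTT = C, so W = 18 with these numbers), and the queue sits where the drop probability balances the window growth, which is what the mean-field accuracy plots on the following slides are checking against ns-2.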

Accuracy of analytical model
Recall the ns-2 simulation from earlier. [Plot: delay at Link 1, model vs simulation.]

Accuracy of analytical model (continued): further plots comparing the fluid model with the ns-2 simulation.

Why are the differential-equation models so accurate?
They were developed in physics, where they are called mean-field models. The main idea: it is very difficult to model large-scale systems exactly; there are simply too many events, too many random quantities. But it is quite easy to model the mean, or average, behavior of such systems. Interestingly, as the size of the system grows, its behavior gets closer and closer to that predicted by the mean-field model. Physicists have been exploiting this feature to model large magnetic materials, gases, etc. Just as a few electrons/particles don't have a very big influence on such a system, Internet resource usage is not heavily influenced by a few packets: aggregates matter more.

TCP--RED: Stability analysis
Given the differential equations, in principle one can figure out whether the TCP--RED control loop is stable. However, the differential equations are very complicated: 3rd or 4th order, nonlinear, with delays. There is no general theory; specific case treatments exist.
Linearize and analyze: linearize the equations around the (unique) operating point, then analyze the resulting linear delay-differential equations using Nyquist or Bode theory.
End result: design stable control loops, determine stability conditions (RTT limits, number of users, etc.), and obtain control loop parameters: gains, drop functions, etc.

Instability of TCP--RED
As the bandwidth-delay product increases, the TCP--RED control loop becomes unstable.
Parameters: 50 sources, link capacity = 9000 pkts/sec. Source: S. Low et al., Infocom 2002.

Layer 2 Congestion Control

Background
This is an ongoing effort in the IEEE 802.1 CM Working Group. The goal is to design a congestion management algorithm for Ethernet as part of the Data Center Bridging standards effort, with simulations and theory used for design and validation. We borrow from Internet research where possible; however, several significant differences exist.

Switched Ethernet vs Internet: some significant differences
1. There is no end-to-end signaling in Ethernet a la per-packet acks in the Internet. So congestion must be signaled to the source by switches; it is not possible to know the round-trip time, and the algorithm is not automatically self-clocked (unlike TCP).
2. Links can be paused; i.e., packets may not be dropped.
3. There is no sequence numbering of L2 packets.
4. Sources do not start transmission gently (like TCP slow-start); they can potentially come on at the full line rate of 10 Gbps.
5. Ethernet switch buffers are much smaller than router buffers (100s of KBs vs 100s of MBs).
6. Most importantly, the algorithm should be simple enough to be implemented completely in hardware.
An interesting environment in which to develop a congestion control algorithm. We have developed QCN (Quantized Congestion Notification), which is now being drafted (version 0 released). Its closest Internet relatives: BIC-TCP at the source, REM/PI controller at the switch.

Congestion management loop components
[Diagram: Source, Reaction Point, Congestion Point 1 through Congestion Point n, Destination.]
Reaction Point: where the rate of injection of a flow (or flows) is changed in response to congestion signals; usually the place where rate limiters reside.
Congestion Point: where resources (buffers/links) exist and can be congested, and where congestion signals are generated; usually switch buffers and the links they are attached to.
Congestion Management Domain: the Reaction Point together with the Congestion Points.

Basic QCN
2-point architecture: Reaction Point -- Congestion Point.
1. Congestion Points: sample packets, compute the feedback Fb, quantize Fb to 6 bits, and reflect only negative Fb values back to the Reaction Point with a probability proportional to |Fb|:
Fb = -(q_off + w * q_delta) = -(queue offset + w * rate offset)
[Plot: reflection probability rises from P_min to P_max as a function of |Fb|.]
2. Reaction Points: transmit regular Ethernet frames. When a congestion message arrives, perform multiplicative decrease, fast recovery and active probing. Fast recovery is similar to BIC-TCP: it gives high performance in high bandwidth-delay product networks while being very simple.
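A sketch of the Congestion Point side as described above. The Fb formula, the 6-bit quantization, and the linear reflection-probability ramp follow the slide; the weight w, the setpoint Q_EQ, and the probability endpoints P_MIN/P_MAX are illustrative values, not standardized ones:

```python
import random

W_GAIN = 2.0               # weight w on the rate (queue-change) term -- illustrative
Q_EQ = 22                  # equilibrium queue setpoint in packets (from the simulations)
FB_MAX = 63                # 6-bit quantization of |Fb|
P_MIN, P_MAX = 0.01, 0.1   # reflection-probability ramp endpoints -- illustrative

def compute_fb(q_len, q_old):
    """Fb = -(q_off + w * q_delta); only negative values signal congestion."""
    q_off = q_len - Q_EQ       # queue offset from the setpoint
    q_delta = q_len - q_old    # rate offset: queue change since last sample
    fb = -(q_off + W_GAIN * q_delta)
    return max(-FB_MAX, min(0, int(fb)))   # quantize; keep only Fb <= 0

def maybe_reflect(fb):
    """Reflect a congestion message with probability proportional to |Fb|."""
    if fb >= 0:
        return False
    p = P_MIN + (P_MAX - P_MIN) * (-fb) / FB_MAX
    return random.random() < p
```

Note the design: a queue at the setpoint and not growing yields Fb = 0 (no message sent), so the switch stays silent unless congestion is actually building.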

Fast Recovery and Active Probing
[Figure: rate vs time at the Reaction Point. When a congestion message is received, the rate drops by Rd to Rnew; fast recovery then closes the gap to the pre-drop rate in halving steps (remaining deficit Rd/2, Rd/4, Rd/8, ...), after which active probing searches for additional bandwidth.]

Basic QCN: Outcomes/results
Easy to deploy, light resource requirements: no header modifications, no tags, immediately deployable.
Can work with a single rate limiter: alias all flows which have received negative feedback onto the rate limiter. The RL becomes a meta-flow, with fast recovery + active probing ensuring good performance. The algorithm is well-defined; i.e., it does not rely on the existence of multiple rate limiters for correctness of specification, since it has no tags or probes.
Quantizing Fb simplifies implementation: the Fb value is used to index into a small table to find the decrease factor, so no potentially expensive hardware resources are needed for computations. The lookup table also makes the scheme easily reconfigurable (if the Fb --> rate relation changes), a useful workaround.
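One way to realize the quantized-Fb lookup described above: precompute a 64-entry table mapping each 6-bit |Fb| value to a multiplicative decrease factor. The table contents here (a linear gain chosen so the maximum decrease is 50%) are an assumption for illustration, not the standardized values:

```python
FB_MAX = 63
GD = 0.5 / FB_MAX   # gain chosen so |Fb| = 63 halves the rate -- illustrative

# 64-entry table: decrease factor indexed by |Fb|
DECREASE_TABLE = [1.0 - GD * fb for fb in range(FB_MAX + 1)]

def apply_decrease(current_rate, fb):
    """Multiplicative decrease via table lookup (no arithmetic in the fast path)."""
    return current_rate * DECREASE_TABLE[min(-fb, FB_MAX)]
```

Swapping in a different Fb-to-rate relation is then just a table rewrite, which is the reconfigurability point made on the slide.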

Timer-supported QCN
Byte-counter: 5 cycles of Fast Recovery (150 KB per cycle), Active Increase cycles afterwards (75 KB per cycle). Fb < 0 sends the byte-counter to FR.
Timer: 5 cycles of Fast Recovery (T msec per cycle), Active Increase cycles afterwards (T/2 msec per cycle). Fb < 0 sends the timer to FR.
The rate limiter (RL) is in FR if both the byte-counter and the timer are in FR; in AI if only one of the byte-counter and timer is in AI; in HAI if both are in AI.
Note: the RL goes to HAI only after 500 packets have been sent.

Rate Adjustments
When the RL is in FR: upon completion of a byte-counter or timer cycle, CR = (CR + TR)/2. EFR and target-rate reduction are enabled during the first cycle of the byte-counter.
When the RL is in AI: upon completion of a byte-counter or timer cycle, TR = TR + R_AI; CR = (CR + TR)/2. (We have used R_AI = 5 Mbps.)
When the RL is in HAI (meaning at least 500 packets have been transmitted since the last congestion message): events are completions of byte-counter or timer cycles, numbered i = 1, 2, ... At the end of event number i: TR = TR + i*R_HAI; CR = (CR + TR)/2. (We have used R_HAI = 50 Mbps.)
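The three update rules above can be sketched in a few lines; CR is the current rate, TR the target rate, and the R_AI / R_HAI values are the ones quoted on the slide. This is an illustrative sketch of the per-cycle arithmetic only, not the draft-standard state machine:

```python
R_AI = 5.0     # Mbps, active-increase step (value used on the slide)
R_HAI = 50.0   # Mbps, hyper-active-increase step (value used on the slide)

def on_cycle_complete(cr, tr, state, event_i=1):
    """Update (CR, TR) when a byte-counter or timer cycle completes."""
    if state == "FR":            # fast recovery: close half the gap to TR
        cr = (cr + tr) / 2.0
    elif state == "AI":          # active increase: probe gently for bandwidth
        tr += R_AI
        cr = (cr + tr) / 2.0
    elif state == "HAI":         # hyper-active increase: steps grow with event i
        tr += event_i * R_HAI
        cr = (cr + tr) / 2.0
    return cr, tr
```

Note how all three states share the averaging step CR = (CR + TR)/2; they differ only in how aggressively TR is pushed upward, which is what distinguishes recovery from probing.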

Simulations: OG Hotspot
Parameters: 10 sources share a 10 G link whose capacity drops to 0.5 G during seconds 2-4. Max offered rate per source: 1.05 G. RTT = 50 usec. Buffer size = 100 pkts (150 KB); Q_eq = 22. T = 10 msec. R_AI = 5 Mbps; R_HAI = 50 Mbps.
[Topology: Sources 1 through 10 feed 10 G links into a switch whose output link drops to 0.5 G.]

Recovery Time
Stability is not compromised; recovery time = 80 msec.

Stability (RTT = 50 usecs): [queue and rate plots]

Stability (long RTT = 500 usecs): [queue and rate plots]

Conclusion
Most of the basic tests have been presented at the IEEE (you have seen just the single-link simulations here): more complex L2 topologies, bursty traffic, multipathing, etc.; flow completion times; TCP interaction with QCN. Theory work (showing control-theoretic stability) exists and continues to develop and be refined.