Problem space for Ethernet congestion management

Hugh Barrass (Cisco Systems)
With significant contributions from Manoj Wadekar, Gary McAlpine, Tanmay Gupta, and Gopal Hegde (Intel)

Agenda
- Topology and components
- Layering
- Congestion management
- Notification
- Conclusions and proposals

Data center topology
[Diagram: multiple blade servers, each holding processor/memory blades behind a server switch fabric, connect through a data center switch or mesh network to a gateway to the corporate LAN or WAN.]

Data center components
- Network interface subsystems (in server blades and other nodes): include Ethernet encapsulation; may include network and transport layer acceleration.
- Blade server switch fabric: typically <20 blades supported; dedicated uplink ports.
- Data center switch or mesh network: a single fabric with up to 100s of ports, or multiple fabrics; may include multi-path switching (aggregated links etc.).
- Gateway to corporate LAN or WAN: connects to legacy networks; could be layer 3 or above.

Data center component options
- Network interface subsystems (in server blades and other nodes): tagging, rate shaping, flow control; maybe transport window adjustment and per-flow/session state information.
- Blade server switch fabric: priority queuing, buffer size optimization, congestion tagging, policing; maybe rate-limiting methods.
- Data center switch or mesh network: similar to the blade server switch fabric, but assumed to be more feature-rich.
- Gateway to corporate LAN or WAN: must accommodate wider application parameters.

ISO layering in data center components
[Diagram: a typical configuration mapping components onto the ISO stack. The transport layer (transport client) and network layer (IP client, router) are defined by the IETF; the link layer (DTE clients, bridge) is defined by IEEE 802.1; the MACs and physical layer are defined by IEEE 802.3. End stations/nodes, a gateway, a router, a bridge, and a network switch are shown spanning these layers.]

Congestion happens!
[Diagram: the transport layer creates network load; congestion happens; the transport layer reacts.]
The transport layer sends data into the network, congestion happens in the bridge, and this causes a reaction in the transport layer.

Congestion in the network cloud
[Diagram: several transport clients attached to an arbitrary network cloud.]
In an arbitrary network topology, connectivity cannot be assumed. Only by adjusting the affected transport can congestion be remedied without perturbing innocent conversations.

Problems with transport adjustment mechanisms
- Transport adjustment often relies on packet loss: retries are expensive, timeouts are disastrous! This is not only a problem with TCP.
- Transport adjustment mechanisms are generally optimized for internet-like topologies: transport windows are very large, requiring large network buffers, and reaction times are slow (see the worked example below).
- Traffic is bursty in time and space: clients typically send bursts to various destinations, which causes congestion points to move and demands fast reaction times in the transport to avoid misadjustment.
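
A rough bandwidth-delay-product comparison shows why internet-tuned windows are oversized for a data center. The RTT figures below are illustrative assumptions, not numbers from the slides:

\[
W = B \times RTT, \qquad
W_{dc} = 10^{10}\,\mathrm{b/s} \times 10^{-4}\,\mathrm{s} \approx 125\ \mathrm{KB}, \qquad
W_{inet} = 10^{10}\,\mathrm{b/s} \times 5\times10^{-2}\,\mathrm{s} \approx 62.5\ \mathrm{MB}
\]

A transport tuned for internet RTTs (tens of milliseconds) therefore tries to keep hundreds of times more data in flight than a microsecond-scale data center path can hold, which is what inflates the buffer requirements above.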

What is needed for congestion management?
- Lossless transport adjustment: notification to transport clients without causing retries or timeouts, for TCP and non-TCP transports.
- Fast reaction times for adjustment: low network latency, plus a change to the optimization mechanisms; this removes the need to pre-tune the network.
- A method for notifying transport clients of congestion: the transport client (at layer 4) must be made aware of congestion happening in a (layer 2) bridge. Ideally it should not rely on layer 4 implementations in network switches, and it should be as compatible as possible with legacy devices.

Congestion management solution
Example solution: a layer 2 ECN-like (Explicit Congestion Notification) mechanism.
- Marks a layer 2 packet in case of congestion: bridge buffer thresholds are set lower than the discard level, so the indication says "this packet would have been discarded if my buffer were smaller."
- The following example uses an arbitrary solution for ECN; it is NOT intended as a detailed proposal, but as an example.
- It uses TCP plus ECN in a data-center-type network and demonstrates the effectiveness of transport client adjustment in a tightly bounded environment.

L2 Congestion Indication
Issue: congestion due to oversubscription; reactive rate control in TCP.
Method: rate control is done at the end-points based on congestion information provided by the L2 network.
- Provide congestion information from the network devices to the edges; a standard notification allows end-station drivers to benefit.
- Various mechanisms are possible for congestion indication: marking, control packets, forward/backward/both.
- TCP applications can benefit: ECN can be triggered even by L2 congestion, so TCP acts proactively and packet drops are avoided.
- Non-TCP applications can leverage a new mechanism to respond to congestion (a sketch of one possible end-station response follows).
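
As a minimal sketch of what such a non-TCP end-station response could look like, assuming an AIMD-style back-off (the presentation does not prescribe any particular algorithm, and all names here are illustrative):

```python
# Minimal sketch of a non-TCP end station reacting to L2 congestion
# indications (CI). The AIMD-style response is an assumption for
# illustration; the presentation does not prescribe an algorithm.

class CongestionResponsiveRateLimiter:
    """Egress rate limiter that backs off when a CI-marked frame arrives."""

    def __init__(self, line_rate_bps: float, min_rate_bps: float = 1e6):
        self.line_rate = line_rate_bps
        self.min_rate = min_rate_bps
        self.rate = line_rate_bps              # current sending rate

    def on_congestion_indication(self) -> None:
        # Multiplicative decrease: halve the rate, mirroring TCP's
        # ECN reaction but applied below the transport layer.
        self.rate = max(self.min_rate, self.rate / 2)

    def on_quiet_interval(self, step_bps: float = 1e8) -> None:
        # Additive increase: recover toward line rate while the
        # network reports no congestion.
        self.rate = min(self.line_rate, self.rate + step_bps)
```

The same two hooks could be driven by marking, control packets, or backward indication; only the trigger differs.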

Model Implementation: L2 Congestion Indication
[Diagram: NICs attached to AQM-enabled switches; congestion at a switch triggers CI marking, which propagates back and triggers a TCP ECN response / TCP back-off at the NICs.]
- AQM = Active Queue Management; CI = congestion indication triggered by AQM.
- CI marking: frames get marked while being forwarded if AQM thresholds are exceeded. If between the early detection thresholds, the RED algorithm selects frames to mark. If between the early drop thresholds, more frames are marked but some frames are dropped. If the high drop threshold is exceeded, frames are dropped. (A sketch of this policy follows.)
- 802.3 support: an enhancement to the MAC client service interface and, if messaging is used, an enhancement to MAC Control messaging.
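
The marking policy in the second bullet can be sketched as follows. The threshold values and the drop probability in the early drop band are invented for illustration; the slide specifies only the RED-based selection between thresholds:

```python
import random

# Three-band AQM policy from the slide, as a sketch. Thresholds (bytes)
# and the 10% early-drop probability are illustrative assumptions.
MARK_LO = 32_000    # early detection threshold (hypothetical)
MARK_HI = 96_000    # early drop threshold (hypothetical)
DROP_HI = 128_000   # high drop threshold (hypothetical)

def classify_frame(avg_queue_bytes: float) -> str:
    """Decide the fate of one frame: 'forward', 'mark' (set CI), or 'drop'."""
    if avg_queue_bytes >= DROP_HI:
        return "drop"                          # high drop threshold exceeded
    if avg_queue_bytes >= MARK_HI:
        # Early drop band: mark most frames, drop a few.
        return "drop" if random.random() < 0.1 else "mark"
    if avg_queue_bytes >= MARK_LO:
        # Early detection band: RED-style marking probability that
        # ramps up with the average queue size.
        p = (avg_queue_bytes - MARK_LO) / (MARK_HI - MARK_LO)
        return "mark" if random.random() < p else "forward"
    return "forward"
```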

Simple topology
- All links are 10 Gb/s; switch shared memory = 150 KB.
- App = database entry over the full TCP/IP stack; workload distribution = exponential (8000); ULP packet sizes = 1 byte to ~85 KB.
- Client 1 sends to both servers; Clients 2 and 3 send to Server 1.
- TCP delay = DB entry request to completion; HOL blocking occurs at Client 1 for Client1-to-Server2 traffic.

Application Throughput & Response Time
[Charts (throughput in bytes/sec, response time in secs): throughput of ~2.29 GB/s vs. ~1.53 GB/s; response times of ~2175 µs, ~218 µs, and ~145 µs across the compared configurations.]
L2-CI with ECN improves TCP performance.

Shared Memory Utilization and Packet Drop at the Switch
[Charts: shared memory utilization and number of drops. Some initial drops occur with ECN while it is stabilizing its average queue size.]
L2-CI can significantly reduce packet drops and reduce buffer requirements.

Multi-stage system with mixed link speeds
- All links except one are 10 Gb/s; the exception is a 1 Gb/s link. Peak throughput = 2.434 GB/s.
- App = database entry over the full TCP/IP stack; workload distribution = exponential (8000); ULP packet sizes = 1 byte to ~85 KB; TCP window size = 64 KB.
- All clients send database entries to all servers.
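
One reason the 64 KB window matters here (a derived illustration, not a measurement from the slides): a window-limited flow cannot exceed W/RTT, so keeping even one 10 Gb/s link full with W = 64 KB requires

\[
RTT \le \frac{W}{B} = \frac{64 \times 1024 \times 8\ \mathrm{bits}}{10^{10}\ \mathrm{b/s}} \approx 52\ \mu\mathrm{s}.
\]

Once queues build and the RTT exceeds a few tens of microseconds, the window rather than the link becomes the bottleneck, which is why fast L2 notification is attractive compared with loss-driven back-off.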

Application Throughput & Response Time (buffer = 64 KB per switch port)

Configuration    Drops
NoFC_RED         2554
802.3x           0
NoFC_RED_ECN     72
DRC_RED_ECN      0

[Charts: throughput of ~2.4 GB/s vs. ~1 GB/s; response times of ~1.9 ms vs. ~750 µs; no delayed response and no response spike.]
L2-CI/ECN shows excellent characteristics for short-range TCP. The addition of DRC eliminates drops and response spikes.

Application Throughput & Response Time (buffer = 32 KB per switch port)

Configuration    Drops
NoFC_RED         2373
802.3x           0
NoFC_RED_ECN     105
DRC_RED_ECN      0

[Charts: throughput of ~2.4 GB/s vs. ~600 MB/s; response times of ~3 ms vs. ~750 µs.]
L2-CI/ECN maintains performance even with small switch buffers.

Summary
- The examples presented show the technical feasibility of congestion management in Ethernet.
- It can allow MAC clients to take proactive actions based on congestion information delivered via 802.3.
- It can facilitate, and take advantage of, higher layer CM mechanisms.
- Simulations show significant comparative improvements.

Notification
- Requires that layer 2 devices (bridges) mark packets.
- Must be orthogonal to the transport protocol: marking happens at layer 2, independent of transport.
- Should be transparent to legacy bridges and end stations; some network elements may not notify, or may not react to notification. Hippocratic oath: "First, do no harm."
- Transfer of information from L2 up to L4 must be the domain of L4 devices: either L4(+) switches or end stations. This requires new definitions for transport mechanisms to use the notification.

Ethertype stacking & frame extension
- Using the new definitions for generic encapsulation may provide an opportunity to add CN information; it will be ignored by non-cognizant devices (a hypothetical encoding is sketched below).
- Other options can be explored: markdown, extra header bits.
- Backward indication also needs to be considered so that the scheme works with any transport protocol, especially unidirectional transport protocols; this is significantly more complex and requires research.
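
To make the Ethertype-stacking option concrete, here is one hypothetical encoding of a CN tag modeled on 802.1Q tag stacking. The Ethertype value, field layout, and flag semantics are invented for this sketch; the presentation deliberately leaves the encoding undefined:

```python
import struct

CN_ETHERTYPE = 0xFFFF   # placeholder; a real tag would need an assigned value

def build_cn_tag(congested: bool, backward: bool) -> bytes:
    """Pack a 4-byte CN tag: a 2-byte tag Ethertype followed by 2 bytes
    of control info carrying the CN flags. As with 802.1Q stacking, the
    original frame's Ethertype and payload would follow the tag; legacy
    bridges forward on MAC addresses and so pass the tag through untouched."""
    flags = (0x8000 if congested else 0) | (0x4000 if backward else 0)
    return struct.pack("!HH", CN_ETHERTYPE, flags)
```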

(Probably) not needed in the definition
- Buffer management algorithms: early tail drop, RED, or variations; the relationship to priority queuing. Define congestion notification, NOT detection.
- Transport layer definitions: leave these to the IETF; possibly make recommendations from an 802 task force.
- Gateway (layer 3 or higher) device behavior: a gateway may choose to ignore, pass, or react to notification. Beware that the wider system is not tightly bound.

Work required for 802.1
- Define the congestion notification mechanism; it will be ignored by non-cognizant devices.
- Other options can be explored: markdown, extra header bits.
- Backward indication also needs to be considered, for unidirectional transport protocols; it is significantly more complex and requires research.

Possible 802.1 PAR (Purpose)
To improve the performance of 802.1 bridged networks in the presence of congestion. In bounded environments, higher layer protocols may significantly improve the management of congestion if they are notified of the congestion occurring at convergence points in the bridged network. This project will define a mechanism for notifying higher layers that congestion has been detected in the path that a packet has followed through the network.

Additional 802.1 work
- IEEE 802.1Q defines tagging for bridges; it is not clear that the definition applies to DTEs. Update 802-5 to make clear how the definitions apply: to bridges only, or to all MAC clients.
- Consider updating the overview & architecture to show the clear relationship between DTE and bridge, and the scope of the 802 standards.

Finally
- We also need some volunteers for IETF work: an RFC is needed for the changes to transport adjustment, and the general applicability of the mechanisms should be investigated (they may be applicable outside the data center).
- Other work necessary? More ideas are being entertained; they need to be examined with respect to 802.3 AND 802.1.

Question
Is there support for a new project and Task Force?
- A mechanism to enable congestion management in bridged networks (802.1~~).
- Share work between 802.1 and 802.3 members, including joint balloting.