Data Center Bridging IEEE 802 Tutorial, 12th November 2007

Contributors and Supporters
Hugh Barrass (Cisco), Jan Bialkowski (Infinera), Bob Brunner (Ericsson), Craig Carlson (QLogic), Mukund Chavan (Emulex), Rao Cherukuri (Juniper Networks), Uri Cummings (Fulcrum Micro), Norman Finn (Cisco), Anoop Ghanwani (Brocade), Mitchell Gusat (IBM), Asif Hazarika (Fujitsu Microelectronics), Zhi Hern-Loh (Fulcrum Micro), Mike Ko (IBM), Menu Menuchehry (Marvell), Joe Pelissier (Cisco), Renato Recio (IBM), Guenter Roeck (Teak Technologies), Ravi Shenoy (Emulex), John Terry (Brocade), Pat Thaler (Broadcom), Manoj Wadekar (Intel), Fred Worley (HP)

Agenda
- Introduction: Pat Thaler
- Background: Manoj Wadekar
- Gap Analysis: Anoop Ghanwani
- Solution Framework: Hugh Barrass
- Potential Challenges and Solutions: Joe Pelissier
- 802.1 Architecture for DCB: Norm Finn
- Q&A

Data Center Bridging Related Projects
- 802.1Qau Congestion Notification: in draft development
- 802.1Qaz Enhanced Transmission Selection: PAR submitted for IEEE 802 approval at this meeting
- Priority-Based Flow Control: the Congestion Management task group is developing a PAR

Background: Data Center I/O Consolidation Manoj Wadekar

Data Center Topology
[Figure: three-tier data center with clients at the front, NAS/DAS attached to the front and mid tiers, and a SAN behind the back end; traffic includes open client requests/responses, secured interactions, business transactions, and packet processing (load balancing, firewall, proxy, etc.).]
- Front End: intra-tier - load balancing, dynamic content creation, caching; inter-tier - business transactions
- Mid Tier: intra-tier - state preservation; inter-tier - business transactions, database query or commit
- Back End: intra-tier - lock mechanics, data sharing, consistency; inter-tier - database query/response; the SAN handles commits and data integrity
- Networking characteristics: application I/O size grows from smaller (front end) to larger (back end); the number of connections shrinks from many to few

Characteristics of Data Center Market Segments
- Internet Portal Data Centers: Internet-based services to consumers and businesses; a concentrated industry with a small number of high-growth customers
- Enterprise Server Data Centers: business workflow, database, Web 2.0/SOA; large enterprise to medium business
- HPC and Analytics Data Centers: large clusters of 1000s of nodes for HPC (DoD, seismic, ...); medium clusters of 100s of nodes for analytics (e.g. FSIs); complex scientific and technical workloads

Data Centers Today
- Internet Portal Data Center: SW-based enterprises, non-mission-critical HW base; 10K to 100K servers. Primary needs: low capital cost, reduced power and cooling, configuration/solution flexibility.
- Enterprise Data Center: large desktop client base; 100 to 10K servers. Primary needs: robust RAS, security and QoS control, simplified management.
- HPCC Data Center: non-mission-critical HW architecture; 100 to 10K servers. Primary needs: low latency, high throughput.
- Topology example: 100s-1000s of web servers (presentation logic), 1000s of servers (business logic), >100 DB servers (database), plus storage, connected by Ethernet routers, Ethernet switches and appliances, and Fibre Channel switches and appliances.
- Fabric preference is low-cost, standard high volume. Looks nice on the surface, but lots of cables (cost and power) underneath.

HPC Cluster Network Market Overview - Interconnects in Top 500
[Chart: share of Top500 machines by interconnect family, 1997-2007; proprietary interconnects (Cray, HiPPI, IBM SP Switch, Quadrics, InfiniPath, Myrinet) declining while InfiniBand and Gigabit Ethernet grow. Future directions: 20-40 Gb/s IB, Myri-10G (Enet), 1 & 10G Ethernet.]
- Standard networks Ethernet and InfiniBand (IB) are replacing proprietary networks: IB leads in aggregate performance, Ethernet dominates in volume
- Adoption of 10 Gbps Ethernet in the HPC market will likely be fueled by: 10 Gbps NICs on the motherboard, 10GBASE-T/10GBASE-KR, and storage convergence over Ethernet (iSCSI, FCoE)
- There will always be a need for special solutions at the high end (e.g. IB, Myrinet), but 10 GigE will also play a role in the HPC market in the future
- Challenges: Ethernet enhancements are needed - IB claims better technology (low latency, traffic differentiation, no-drop fabric, multi-path, bi-sectional bandwidth)

Data Center: Blade Servers
[Chart: projected blade units (thousands) and blade share of total servers, 2006-2012, with the blade attach rate growing from roughly 9% toward 30%+. Diagram: blade chassis with Ethernet, FC and IB mezzanines and embedded Ethernet, FC and IB switches serving LAN, storage and IPC.]
- Blade servers are gaining momentum in the market; second-generation blade servers are shipping
- They provide server and cable consolidation; Ethernet is the default fabric, with optional fabrics for SAN and HPC
- Challenges: I/O consolidation is a strong requirement for blade servers - multiple fabrics and mezzanine cards, power/thermal envelope, management complexity, backplane complexity
REF: Total Servers: Worldwide and Regional Server 2007-2012 Forecast, April 2007, IDC; Blade Servers: Worldwide and US Blade Server 2006-2010 Forecast, Oct 2006, IDC

Data Center: Storage Trends
[Chart: external terabytes of SAN capacity, 2005-2010, showing both FC and iSCSI growing. Source: IDC storage and FC HBA analyst reports, 2006.]
- Lots of people are using SANs: networked storage is growing 60% per year
- FC is incumbent and growing: entrenched enterprise customers
- iSCSI is taking off: SMB and greenfield deployments - choice driven by targets
- Storage convergence over Ethernet: iSCSI for new SAN installations, FC tunneled over Ethernet (FCoE) for expanding FC SAN installations
- Challenges: too many ports, fabrics, cables, power/thermal; need to address FC as well as iSCSI

SAN Protocols
- FCoE: FC tunneled through Ethernet. Addresses customer investment in legacy FC storage; expects FC-equivalent no-drop behavior in the underlying Ethernet interconnect; needs Ethernet enhancements for link convergence and no-drop performance.
- iSCSI: SCSI over TCP/IP, which provides reliability. High-speed protocol acceleration solutions benefit from reduced packet drops.
- Encapsulation stacks: an FCoE frame carries Ethernet headers, the FCoE encapsulation, the FC headers and payload (SCSI/FCP), the FC CRC, and the Ethernet FCS/pad; iSCSI layers SCSI over iSCSI over TCP over IP over Ethernet (DC enhanced).
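To make the two stacks concrete, here is a minimal Python sketch of the layering described above; the header byte strings and the encapsulate helper are illustrative placeholders, not real protocol implementations.

```python
def encapsulate(payload: bytes, *headers: bytes) -> bytes:
    """Wrap a payload in the given headers, outermost header first."""
    frame = payload
    for header in reversed(headers):
        frame = header + frame
    return frame

# FCoE: a complete FC frame (FC headers + SCSI/FCP payload + FC CRC) is
# tunneled directly in Ethernet. There is no TCP/IP layer, so the Ethernet
# fabric itself must supply FC-equivalent no-drop behavior.
fc_frame = b"FC-HDR" + b"SCSI-FCP-PAYLOAD" + b"FC-CRC"
fcoe_frame = encapsulate(fc_frame, b"ETH-HDR", b"FCOE-HDR") + b"ETH-FCS-PAD"

# iSCSI: SCSI is carried over TCP/IP, so loss recovery comes from TCP; a
# fabric with fewer drops mainly benefits accelerated (offloaded) implementations.
iscsi_frame = encapsulate(b"SCSI-CDB-AND-DATA",
                          b"ETH-HDR", b"IP-HDR", b"TCP-HDR", b"ISCSI-HDR")
```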

Fabric convergence needs
[Diagram: a server's processor, memory and I/O today attach to three separate fabrics for IPC, storage and LAN.]
- Traffic differentiation: LAN, SAN and IPC traffic need differentiation in a converged fabric
- Lossless fabric: FC does not have a transport layer - retransmissions happen at the SCSI layer! iSCSI acceleration may benefit from a lossless fabric too
- Seamless deployment: backward compatibility, plug and play
- Ethernet needs these enhancements to be a true converged fabric for the data center

Data Center Bridging
[Diagram: a server's processor, memory and I/O attach to a single DCB fabric.]
- What is DCB? Data Center Bridging provides Ethernet enhancements for data center needs (storage, IPC, blade servers, etc.). The enhancements apply to bridges as well as end stations.
- Why DCB? The data center market demands a converged fabric, and Ethernet needs enhancements to be the successful converged fabric of choice.
- Scope of DCB: should provide convergence capabilities for data center short-range networks.

Data Center Bridging Value Proposition
[Diagram: servers attached to multiple dedicated fabrics versus servers attached to a single converged DCB fabric.]
- Fabric convergence
- Simpler management: a single physical fabric to manage; simpler to deploy, upgrade and maintain
- Lower cost: fewer adapters, cables and switches; lower power/thermals
- Improved RAS: reduced failure points, time, misconnections, bumping

Comparison of convergence levels
[Diagram: three server I/O models - dedicated adapters and links per fabric; converged fabric management with separate links; and DCB with a single link carrying LAN, SAN and cluster traffic.]
                         No convergence (dedicated)   Converged fabric management   DCB
Number of fabric types   2-3                          1                             1
IO interference          No                           No                            Yes
Technologies managed     3                            1 to 2                        1 to 2
HW cost                  3x adapters                  3x adapters                   1x adapter
RAS                      More HW                      More HW                       Least HW
Cable mess               3-4x                         3-4x                          1x

Data Center Bridging Summary
- DCB provides I/O consolidation: lower CapEx, lower OpEx, unified management
- Needs standards-based Ethernet enhancements: support for multiple traffic types with traffic differentiation; a no-drop option for DC applications; deterministic network behavior for IPC
- Does not disrupt existing infrastructure: should allow plug-and-play for enhanced devices and maintain backward compatibility with legacy devices

Gap Analysis: Data Center Current Infrastructure Anoop Ghanwani

Overview
- Data Center Bridging (DCB) requirements
- What do 802.1 and 802.3 already provide?
- What can be done to make bridges more suited for the data center?

Recap of DCB Requirements
- Single physical infrastructure for different traffic types
- Traffic in existing bridged LANs is tolerant to loss - apps that care about loss use TCP; QoS is achieved by using traffic classes, e.g. voice vs. data traffic
- Data center traffic has different needs: some apps expect lossless transport, e.g. FCoE; this requirement cannot be satisfactorily met by existing standards
- Building a converged network: multiple apps with different requirements, e.g. voice (loss- and delay-sensitive), storage (lossless, delay-sensitive), email (loss-tolerant)
- Assign each app type to a traffic class and satisfy the loss/delay/bandwidth requirements for each traffic class (a sketch of such a mapping follows below)
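As an illustration of the last point, a minimal sketch of such an assignment follows; the class numbers and field names are assumptions for illustration, not values from any standard.

```python
# Hypothetical mapping of application types to traffic classes and the
# per-class requirements a converged DCB network would have to honor.
TRAFFIC_CLASSES = {
    "voice":   {"traffic_class": 5, "lossless": False, "delay_sensitive": True},
    "storage": {"traffic_class": 3, "lossless": True,  "delay_sensitive": True},
    "email":   {"traffic_class": 1, "lossless": False, "delay_sensitive": False},
}

def requirements_for(app: str) -> dict:
    """Look up the traffic class and service requirements for an application."""
    return TRAFFIC_CLASSES[app]
```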

Relevant 802.1/802.3 Standards
- For traffic isolation and bandwidth sharing: expedited traffic forwarding (802.1p/Q) with 8 traffic classes; the default transmission selection mechanism is strict priority (others are permitted but not specified); works well for traffic in existing LANs (Control > Voice > Data)
- For achieving lossless behavior:
  - Congestion Notification (P802.1Qau, in progress): the goal is to reduce loss due to congestion in the data center; works end to end - both ends must be within the L2 network; needed for apps that run directly over L2 with no native congestion control
  - PAUSE (802.3x): on/off flow control that operates at the link level
- Can we build a converged data center network using existing standards?

Transmission Selection for DCB
[Diagram: a port carrying LACP, STP, OSPF, VoIP, IPC, FCoE and HTTP traffic in separate priority queues.]
- 802.1p/Q's strict priority is inadequate: potential starvation of lower priorities; no minimum bandwidth guarantee or maximum bandwidth limit; the operator cannot manage bandwidth (1/3 for storage, 1/10 for voice, etc.) - see the starvation sketch below
- Transmission selection requirements: allow for minimal interference between traffic classes; congestion-managed traffic will back off during congestion, which should not result in non-congestion-managed traffic grabbing all the bandwidth; ideally, a virtual pipe for each class; benefits regular LAN traffic as well
- Many proprietary implementations exist; a standardized behavior with a common management framework is needed
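A minimal sketch of the starvation problem: with pure strict priority, a backlogged higher-priority queue locks out everything below it. The queue contents and frame sizes are arbitrary illustrations.

```python
from collections import deque

def strict_priority_select(queues):
    """Pick the next frame using strict priority (index 0 = highest)."""
    for index, queue in enumerate(queues):
        if queue:
            return index, queue.popleft()
    return None

# Priority 0 is kept permanently backlogged by bulk traffic, so priorities
# 1 and 2 never transmit: no minimum bandwidth guarantee for them.
queues = [deque([1500] * 1000), deque([1500] * 10), deque([64] * 10)]
selections = [strict_priority_select(queues) for _ in range(100)]
assert all(index == 0 for index, _ in selections)
```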

Achieving Lossless Transport Using P802.1Qau and 802.3x
[Diagram: end-station NICs connected through several bridges; a congestion point in one bridge sends congestion notifications back to a rate limiter at the ingress NIC, with link-level flow control on each hop. CN: Congestion Notification; RL: Rate Limiter; LLFC: Link Level Flow Control.]
- CN: a message is generated and sent to the ingress end station when a bridge experiences congestion
- RL: in response to CN, the ingress node rate-limits the flows that caused the congestion (a sketch of this reaction follows below)
- LLFC: simulation has shown P802.1Qau is effective at reducing loss, but packet drop is still possible if multiple high-bandwidth sources go active simultaneously; LLFC is the only way to guarantee zero loss
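A minimal sketch of the reaction-point behavior, assuming a BCN/QCN-style rate limiter that backs off multiplicatively on each congestion notification and recovers additively while none arrive; the gains and units are illustrative assumptions, not values from P802.1Qau.

```python
class ReactionPointRateLimiter:
    """Per-flow rate limiter at the ingress end station (reaction point)."""

    def __init__(self, line_rate_gbps: float = 10.0):
        self.line_rate = line_rate_gbps
        self.rate = line_rate_gbps          # current permitted sending rate

    def on_congestion_notification(self, feedback: float) -> None:
        """feedback in (0, 1]: congestion severity reported by the bridge."""
        self.rate *= max(0.1, 1.0 - 0.5 * feedback)   # multiplicative decrease

    def on_quiet_interval(self) -> None:
        """Called periodically while no notifications are received."""
        self.rate = min(self.line_rate, self.rate + 0.1)  # additive increase
```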

More on Link Level Flow Control Using 802.3x
[Diagram: a link carrying LACP, STP, OSPF, VoIP, IPC, FCoE and HTTP traffic with link-level flow control enabled.]
- 802.3x is an on/off mechanism: all traffic stops during the off phase
- 802.3x does not benefit some traffic: traffic that is tolerant to loss, e.g. data over TCP; low-bandwidth, high-priority traffic where loss is already relatively rare, e.g. voice
- 802.3x may be detrimental in some cases: control traffic, e.g. LACP and STP BPDUs; it also increases latency for interactive traffic
- As a result most folks turn 802.3x off
- Need priority-based link level flow control: it should only affect traffic that needs it, with the ability to enable it per priority - not simply 8 x 802.3x PAUSE! (a sketch of a per-priority pause frame follows below)
- Provides a complete solution when used together with P802.1Qau
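A minimal sketch of what a per-priority pause frame could look like, modeled on the 802.3x MAC Control PAUSE frame but carrying a priority-enable bitmap and one timer per priority; the opcode and field layout are assumptions for illustration, not a standardized format.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808        # MAC Control, as used by 802.3x PAUSE
PER_PRIORITY_PAUSE_OPCODE = 0x0101    # hypothetical opcode for illustration

def per_priority_pause(pause_quanta: dict) -> bytes:
    """Build the MAC Control payload: opcode, enable vector, 8 timers.

    pause_quanta maps priority (0-7) to a pause time in quanta; a value of
    zero resumes that priority. Unlisted priorities are left untouched.
    """
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        enable_vector |= 1 << priority
        timers[priority] = quanta
    return struct.pack("!HH8H", PER_PRIORITY_PAUSE_OPCODE, enable_vector, *timers)

# Pause only the FCoE/storage priority (say, priority 3) while VoIP, LAN and
# control traffic on the same link keep flowing.
payload = per_priority_pause({3: 0xFFFF})
```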

Summary
- The goal of DCB is to facilitate convergence in the data center: apps with differing requirements on a single infrastructure
- Improvements to existing standards are needed to be successful: flexible, standards-based transmission selection; end-to-end congestion management; enhanced link level flow control
- Networks may contain DCB-capable and non-DCB-capable devices: a discovery/management framework is needed for DCB-specific features
- The technology exists in many implementations today; standardization will promote interoperability and lower costs

Solution Framework: Data Center Bridging Hugh Barrass

Data Center Bridging Framework
- Solution must be bounded: no leakage to legacy or non-DCB systems; behavior can be optimized at the edges
- Aim for the no-drop ideal using reasonable architectures: no congestion spreading or latency inflation (covered in 802.1Qau; in discussion)
- Relies on congestion notification with flow control as backup: a two-pronged attack is required for corner cases (PAR under consideration)
- Support for multiple service architectures: priority queuing with transmission selection (covered in 802.1Qau and extensions)
- Discovery and capability exchange: forms and defines the cloud and the services supported; management views and control

DCB cloud
[Diagram: a cloud of DCB bridges and DCB end-stations surrounded by non-DCB bridges and non-DCB end-stations. The DCB service exists for some (but not all) traffic classes; edge behavior restricts what crosses the boundary, and non-DCB traffic classes pass through transparently.]

DCB cloud definition allows mixed topology
- Introducing DCB devices in key parts of the network offers significant advantages
- A DCB cloud is formed; only DCB devices are allowed inside. A function using LLDP defines the cloud and its operating parameters (a capability-exchange sketch follows below)
- If source, destination and path all use DCB, then behavior is optimal
- At the edge of the cloud, edge devices restrict access to CM classes: DCB-specific packets do not leak out
- Optimal performance for small, high-bandwidth networks, e.g. a data center core
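A minimal sketch of how DCB capabilities might be advertised, assuming an organizationally specific LLDP TLV (type 127); the OUI, subtype and payload fields are hypothetical placeholders, not a defined capability-exchange format.

```python
import struct

def lldp_org_specific_tlv(oui: bytes, subtype: int, info: bytes) -> bytes:
    """Encode an organizationally specific LLDP TLV (TLV type 127)."""
    assert len(oui) == 3
    length = 4 + len(info)                 # OUI (3) + subtype (1) + info
    type_and_length = (127 << 9) | length  # 7-bit type, 9-bit length
    return struct.pack("!H3sB", type_and_length, oui, subtype) + info

# Hypothetical payload: a bitmap of flow-controlled (lossless) priorities and
# a flag saying whether congestion notification is supported.
dcb_info = struct.pack("!BB", 0b00001000, 1)   # priority 3 lossless, CN capable
tlv = lldp_org_specific_tlv(b"\x00\x11\x22", 0x01, dcb_info)
```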

Sample system
[Diagram: a many-to-many network cloud with one congestion point and one reaction point considered. An edge data source (the reaction point) sends data toward an edge data destination through a congestion point inside the cloud; control packets flow back from the congestion point to the reaction point. Other data sources are assumed equivalent; other destinations exist.]

In principle - operation of congestion management
- CM is an end-to-end function: data from multiple sources hits a choke point; congestion is detected and the ingress rate is slowed; a control loop matches the net ingress rate to the choke point
- Simple cases: 2 or more similar ingress devices and 1 choke point - each end device gets 1/N of the bandwidth, with minimal buffer use at the choke point
- More complex cases with asymmetric sources, multiple choke points, etc.: CM is always better than no CM!
- Corner cases still cause problems, particularly for buffering: bandwidth spikes overfill buffers before CM can take control, which requires a secondary mechanism as a safety net

Priority based flow control
- In corner cases and edge conditions, CM cannot react quickly enough to prevent queue overflow; for certain traffic types the resulting packet loss is unacceptable
[Diagram: per-priority queues in a port buffer. Control/high-priority traffic doesn't fill its buffer; CM traffic generates flow control to avoid overflow when its buffer glitches full; best-effort traffic is dropped if its buffer overflows; transmission selection controls the egress.]
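A minimal sketch of the buffer behavior in the figure, assuming per-priority input buffers with XOFF/XON watermarks: a flow-controlled priority pauses its peer before overflowing, while best-effort priorities simply drop. The watermarks and the send_pause callback are illustrative assumptions.

```python
class PriorityBuffer:
    """Input buffer for one priority on one port."""

    def __init__(self, capacity: int, flow_controlled: bool):
        self.capacity = capacity
        self.flow_controlled = flow_controlled
        self.fill = 0
        self.paused = False

    def enqueue(self, frame_len: int, send_pause) -> bool:
        if self.fill + frame_len > self.capacity:
            return False                 # overflow: drop (expected only for best effort)
        self.fill += frame_len
        if self.flow_controlled and not self.paused and self.fill > 0.8 * self.capacity:
            send_pause(pause=True)       # XOFF: pause this priority only
            self.paused = True
        return True

    def dequeue(self, frame_len: int, send_pause) -> None:
        self.fill -= frame_len
        if self.paused and self.fill < 0.2 * self.capacity:
            send_pause(pause=False)      # XON: resume this priority
            self.paused = False
```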

Multiple service architectures using 802.1p
- The current standard defines only simplistic behavior: strict priority is defined; more complex behavior is ubiquitous but not standardized
- Enhancements are needed to support more sophisticated, converged networks
- Within the defined 8 code points: define grouping and bandwidth sharing; queue-draining algorithms defined to allow minimum and maximum bandwidth shares, weighted priorities, etc. (a scheduler sketch follows below)
- A common definition and management interface allow stable interoperability
- Sophisticated queue distribution within and between service groups; service types sharing bandwidth
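A minimal sketch of one possible queue-draining algorithm in this spirit: a deficit-weighted round robin across priority groups, where each group's weight approximates its bandwidth share. The group names, weights and quantum are illustrative assumptions, not the behavior being standardized.

```python
from collections import deque

class PriorityGroup:
    def __init__(self, name: str, weight: int):
        self.name = name
        self.weight = weight          # relative bandwidth share
        self.deficit = 0              # accumulated credit, in bytes
        self.queue = deque()          # pending frame lengths, in bytes

def dwrr_round(groups, quantum: int = 1500):
    """Run one DWRR round; return the frames sent as (group name, length) pairs."""
    sent = []
    for group in groups:
        if not group.queue:
            group.deficit = 0          # idle groups do not hoard credit
            continue
        group.deficit += group.weight * quantum
        while group.queue and group.queue[0] <= group.deficit:
            frame = group.queue.popleft()
            group.deficit -= frame
            sent.append((group.name, frame))
    return sent

# Roughly 1/3 of the link for storage, with LAN and IPC sharing the rest.
groups = [PriorityGroup("storage", 3), PriorityGroup("lan", 4), PriorityGroup("ipc", 2)]
for g in groups:
    g.queue.extend([1500] * 100)
print(dwrr_round(groups))   # roughly 3:4:2 bytes served per round
```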

DCB Framework
- Congestion management: persistent congestion - 802.1Qau (approved TF); transient congestion - priority-based flow control (under discussion)
- Traffic differentiation: Enhanced Transmission Selection - 802.1Qaz (proposed PAR)
- Discovery and capability exchange: covered for 802.1Qau; additional enhancements may be needed for other DCB projects

In summary
- A complete data center solution needs the whole framework: the components rely on each other, and utility is maximized only when all are available
- Fully functional Data Center Bridging solves the problem: it allows convergence of the data center network (currently discrete networks per function) and eliminates niche application networks
- A high-bandwidth, low-latency, no-drop network alongside the scalability and simplicity of 802.1 bridging supports rapid growth to meet data center demands - net benefits for users and producers

Challenges and Solutions: DCB Joe Pelissier

DCB: Challenges
- Congestion spreading: priority-based flow control causes congestion spreading and throughput melt-down
- Deadlocks: link-level flow control can lead to deadlocks
- DCB not required for all applications/products: DCB functionality is applicable to DC networks, but may become an unnecessary burden for some products
- Compatibility with existing devices: data centers contain applications that tolerate drop; legacy devices and DCB devices present interoperability challenges

Requirements of a Lossless Fabric
- Many IPC and storage protocols do not provide for low-level recovery of lost frames; recovery is done by a higher-level protocol (e.g. the class driver or application) and in certain cases requires 100s to 1000s of ms
- Excessive loss (e.g. loss due to congestion vs. bit errors) may result in link resets, redundant fabric failovers, and severe application disruption
- These protocols therefore require (and currently operate over) a flow control method; with 802.3x PAUSE this implies a separate fabric for these protocols, since traditional LAN/WAN traffic is best served without PAUSE
- These protocols are not broken: they are optimized for layer 2 data center environments such as those envisioned for DCB, and maintaining this optimization is critical for broad market adoption

Frame Loss Rate is not the Issue
- Many IPC and storage protocols are not congestion aware: they do not expect frame loss due to congestion; the potential for congestion-caused frame loss is highly application sensitive, and traffic may be very bursty - a very real potential for frame loss
- They do not respond appropriately: huge retransmission attempts, congestion collapse
- Storage and IPC applications can tolerate low frame loss rates - bit errors do occur
- Frame loss due to congestion requires different behavior than frame loss due to bit errors: back-off, slow restart, etc. to avoid congestion collapse

Congestion Notification and Frame Loss
- Tremendous effort has been expended in developing a congestion notification scheme for bridged LANs, and simulation indicates these schemes are likely to dramatically reduce frame loss
- However, frame loss is not sufficiently eliminated, especially under transitory congestion events and in topologies one would reasonably expect for storage and IPC traffic
- Congestion Notification does reduce the congestion spreading side effect of flow control
- Therefore a supplemental flow control mechanism that prevents frame loss is viewed as a requirement for successful deployment of storage and IPC protocols over bridged LANs
- These protocols operate over a small portion of the network (i.e. the portion to be supported by DCB), so a simple method is sufficient

Congestion Spreading
[Diagram: NICs and bridges in a larger network, with the flow-controlled DCB region shown as a bounded subset.]
- Multiple hops in a flow-controlled network can cause congestion that spreads throughout the network - a major issue with 802.3x PAUSE
- Effects are mitigated by: limiting flow control to the DCB region of the network; limiting it to traffic that traditionally operates over flow-controlled networks (e.g. IPC and storage); keeping it isolated from and independent of traffic that is prone to the negative impacts of congestion spreading
- Priority-based flow control (and selective transmission) create virtual pipes on the link; flow control is enabled (or not) on each pipe as appropriate for the traffic type

Deadlock (1)
[Diagram: four nodes connected in a ring, with every link stopped by flow control.]
- Flow control in conjunction with MSTP or SPB can cause deadlock; see "Requirements Discussion of Link Level Flow Control for Next Generation Ethernet" by Gusat et al., January 2007 (au-zrl-ethernet-ll-fc-requirements-r03)
- To create deadlock, all of the following conditions must occur: a cyclic flow control dependency exists; traffic flows across all corners of the cycle; and sufficient congestion occurs on all links in the cycle simultaneously, such that each bridge is unable to permit more frames to flow
- At this point, all traffic on the affected links halts until frames age out, generally after one second
- Feeder links also experience severe congestion and a probable halt to traffic flow - a form of congestion spreading

Deadlock (2)
- However, the probability of an actual deadlock is very small, and deadlock recovers due to the frame discard mandated by the existing Maximum Bridge Transit Delay (clause 7.7.3 in IEEE Std 802.1D-2004)
- Low deadlock probability: Congestion Notification renders sufficient congestion at all necessary points in the network highly unlikely
- Many data center topologies (e.g. core/edge or fat tree) do not contain cyclic flow control dependencies, and MSTP or SPB routing may be configured to eliminate deadlock - edges would not be routed as an intermediate hop between core bridges (see the dependency-cycle check sketched below)
- Traffic flow frequently is such that deadlock cannot occur; e.g. storage traffic generally travels between storage arrays and servers, so insufficient traffic passes through certain corners of the loop to create deadlock
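As an illustration of the "no cyclic dependency" argument, a minimal sketch follows; it models the flow-control dependency graph at the level of per-ingress-port buffers (an edge means "frames held here wait for space there") and checks it for a directed cycle. The buffer names and topologies are hypothetical.

```python
def has_cyclic_dependency(graph: dict) -> bool:
    """graph: {buffer: [downstream buffers whose space it waits for]}."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node) -> bool:
        color[node] = GREY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True                    # back edge: dependency cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(node, WHITE) == WHITE and visit(node) for node in graph)

# Four bridges in a ring with traffic crossing every corner: each ring-ingress
# buffer waits on the next one, so a deadlock-capable cycle exists.
ring = {"A.from_D": ["B.from_A"], "B.from_A": ["C.from_B"],
        "C.from_B": ["D.from_C"], "D.from_C": ["A.from_D"]}

# Core/edge wiring where edges never relay between cores: the dependency
# chains terminate at the downstream edge buffers, so no cycle exists.
core_edge = {"edge1.up": ["core.from_edge1"], "core.from_edge1": ["edge2.down"],
             "edge2.up": ["core.from_edge2"], "core.from_edge2": ["edge1.down"],
             "edge1.down": [], "edge2.down": []}

assert has_cyclic_dependency(ring) and not has_cyclic_dependency(core_edge)
```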

Deadlock (3)
[Diagram: the same four-node ring, with links slowed or stopped.]
- The previous assumptions are not without demonstration: IPC and storage networks are widely deployed in mission-critical environments, and most of these networks are flow controlled
- What happens if a deadlock does occur? Traffic not associated with the class (i.e. a different priority level) is unaffected; normal bridge frame lifetime enforcement frees the deadlock; Congestion Notification kicks in to prevent reoccurrence

Summary
- Congestion spreading: a two-pronged solution - end-to-end congestion management using 802.1Qau plus priority-based flow control; 802.1Qau reduces both congestion spreading and packet drop, and priority-based flow control provides no-drop where required
- Deadlocks: deadlock has been shown to be a rare event in existing flow-controlled networks; the addition of Congestion Notification further reduces its occurrence; normal Ethernet mechanisms resolve the deadlock; non-flow-controlled traffic classes are unaffected
- CM not required for all applications/products: work related to the data center should be identified as Data Center Bridging (following Provider Bridging, AV Bridging), which provides a clear message about the usage model
- Compatibility with existing devices: a capability exchange protocol and MIBs will be provided for backward compatibility

802.1 Architecture for DCB Norm Finn

IEEE Std. 802.1Q-2006 Subclause 8.6 The Forwarding Process 47

IEEE Std. 802.1ag-2007 Subclause 22 CFM in systems 48

IEEE Std. 802.1ag-2007 Subclause 22 CFM in systems 49

IEEE P802.1au/Draft 0.4: Bridges Subclause 31 Congestion notification entity operation 50

Proposed extension for PPP in Bridges - Subclause TBD, modified queuing entities. RED: PPP generation based on space available in per-port, per-controlled-priority input buffers. BLUE: PPP reaction, only for BCN-controlled priority queues.

IEEE P802.1au/Draft 0.4: Stations Subclause 31 Congestion notification entity operation 52

Proposed extension for PPP in Stations - Subclause TBD, modified queuing entities. RED: PPP generation based on space available in per-port, per-controlled-priority input buffers. BLUE: PPP reaction, only for BCN-controlled priority queues.

Thank You!

Questions and Answers
- 802.1Qau Congestion Notification: in draft development
- 802.1Qaz Enhanced Transmission Selection: PAR submitted for IEEE 802 approval at this meeting
- Priority-Based Flow Control: the Congestion Management task group is developing a PAR