Paul Doherty
Technical Marketing Engineer, Intel Corporation
Agenda (Dell Forum 2011)
- Introductions
- Unified Networking: Why do we need it? What is it? Why should I care about the end node?
- Platform I/O Changes & How They Impact Unified Networking
- CNA High-Level Architectural Approaches
- The Software Myth
- Ethernet & Virtualization Focus
The Data Center Evolution & the Influence of Storage
Evolution of the Datacenter
[Diagram: three stages of datacenter evolution]
- Discrete Datacenter: discrete compute, management, and networks; consolidation of servers and arrays
- Virtualized Datacenter: virtual machines with unified management, a 10G unified network, flexible management, and a simplified network
- Cloud Datacenter: cloud infrastructure spanning compute, unified network, security, and datacenter facilities (e.g. cooling, power); efficient, secure, and open architecture
Growth
- Despite the down economy, data growth remains strong at 40%
- Unstructured data is growing the fastest
- Replication, content depots, and cloud accelerate growth
- Personal data growth is still faster than enterprise
Source: IDC, 2010 Forecast
The data growth challenge requires storage innovation
Usage Models Dictate the Solutions
[Chart: performance (random small I/O vs. sequential large I/O) plotted against capacity (gigabytes through zettabytes)]
- Random small: business DB (OLTP, OLAP), content distribution network (CDN), application data store (e.g. web, e/v-mail, VM/boot)
- Sequential large: large relational DB (e.g. NoSQL, non-ACID), large analytics (e.g. Hadoop), high performance compute (e.g. pNFS), large object storage (e.g. SharePoint, Haystack), backup and archive (server and client)
A converged storage server enables tuning for the usage model
The Role Intel IA Platforms Play in the Data Center Evolution
Server-Platform Convergence
The next generation data center:
- Modular platform for DAS, NAS, and SAN systems
- Scalable architecture: performance, memory, and I/O with integrated storage features
- Unified network: 10Gb Ethernet & FCoE for storage and network
- Industry standards: volume economics achieved via industry-standard hardware rather than proprietary solutions
- Advanced technologies: powerful scale-out to support advanced data center efficiencies
Integrated Server Platform
Innovate and integrate: the Intel Xeon processor
- Based on the Intel Xeon microarchitecture
- PCI Express* (PCIe) integrated into the processor
- Integrated DMA engines
- Integrated Non-Transparent PCIe Bridging (NTB) for redundant systems
- Asynchronous DRAM Self-Refresh (ADR) capability to preserve critical data in RAM through a power failure
- Hardware RAID5/RAID6 acceleration to offload RAID calculations (XOR/P+Q)
- Integrated Serial Attached SCSI (SAS)
- Instruction set innovations targeted at key workloads
[Diagram: Intel Xeon processor with integrated memory controller and I/O controller]
Platform I/O Changes
- A Dell 10G-generation platform (Intel Xeon, 5100 chipset) with a dual-port 10GbE Intel Ethernet 82598 adapter running bi-directional traffic could not achieve line rate on both ports; it achieved ~13 Gb/s
- A Dell 11G-generation platform (Intel Xeon, 5500 chipset) with a dual-port 10GbE Intel Ethernet 82599 adapter running bi-directional traffic achieved ~18.4 Gb/s
- Active/Active vs. Active/Passive ports
- Platforms after the Intel Xeon 5500 chipset are capable of 50 Gb/s
Micro-architecture changes altered the I/O game. Platform micro-architecture changes provide significantly more CPU headroom, which opens up different approaches to traditionally compute-intense I/O applications such as iSCSI and FC.
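To put those numbers in context, a quick back-of-the-envelope sketch, assuming full-duplex dual-port 10GbE (so 40 Gb/s of theoretical bi-directional bandwidth), showing what fraction of line rate each platform generation reached. The throughput figures come from the slide above; the helper itself is purely illustrative.

```python
# Illustrative only: compares measured bi-directional throughput against the
# theoretical maximum of a dual-port 10GbE adapter (2 ports x 10 Gb/s x 2 directions).
THEORETICAL_GBPS = 2 * 10 * 2  # 40 Gb/s aggregate

measurements = {
    "Dell 10G / Xeon 5100 + 82598": 13.0,   # Gb/s, from the slide
    "Dell 11G / Xeon 5500 + 82599": 18.4,   # Gb/s, from the slide
}

for platform, gbps in measurements.items():
    print(f"{platform}: {gbps:.1f} Gb/s ({gbps / THEORETICAL_GBPS:.0%} of theoretical)")
```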
Unified Networking
Why Unified Networking?
Today, datacenters deploy multiple networks for different traffic types:
- Clustering network: InfiniBand / Ethernet, <5% attach
- Local area network: Ethernet, ~100% attach
- Storage area network: Fibre Channel or iSCSI SAN, <30% attach
A Unified Network consolidates traffic on a 10G Ethernet fabric:
- Simplifies the network by migrating to 10GbE
- Lowers TCO by consolidating data and storage networks
A flexible network is the foundation of cloud architecture.
What is Data Center Bridging?
Data Center Bridging (DCB) is an architectural collection of standards-based (IEEE) Ethernet extensions designed to improve Ethernet networking and management in the Data Center.
Sometimes also called:
- CEE = Converged Enhanced Ethernet
- DCE = Data Center Ethernet (Cisco trademark)
- EEDC = Enhanced Ethernet for Data Centers
Evolving Ethernet
- Priority Flow Control (PFC), IEEE 802.1Qbb: enables multiple traffic types to share a common Ethernet link without interfering with each other
- Bandwidth Management, IEEE 802.1Qaz: enables consistent management of QoS at the network level by providing consistent scheduling
- Congestion Management, IEEE 802.1Qau: end-to-end congestion management for L2 networks (future)
- Data Center Bridging Exchange Protocol (DCBX): management protocol for enhanced Ethernet capabilities
- L2 Multipath for Unicast and Multicast: increased bandwidth via multiple active paths, no spanning tree (future)
Enabling differentiated services in an Ethernet fabric
Priority-based Flow Control (PFC)
[Diagram: eight transmit queues (P0-P7) paired with eight receive queues across an Ethernet link; a PAUSE frame with UP=3 stops only the priority-3 queue]
- Up to 8 priorities
- PFC allows selective flow control of the traffic types that need lossless characteristics
- Defined in the IEEE 802.1Qbb specification
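As an illustration of how a PFC pause differs from a legacy 802.3x PAUSE, here is a minimal sketch that builds a PFC frame, assuming the standard MAC Control EtherType (0x8808), the PFC opcode (0x0101), and the reserved 01-80-C2-00-00-01 destination address. The field layout follows the published 802.1Qbb format, but treat the helper as illustrative rather than a production encoder.

```python
import struct

# Minimal sketch of a Priority-based Flow Control (802.1Qbb) frame.
# Assumes the MAC Control EtherType (0x8808), the PFC opcode (0x0101),
# and the reserved multicast destination 01-80-C2-00-00-01.
PFC_DST = bytes.fromhex("0180c2000001")
ETHERTYPE_MAC_CONTROL = 0x8808
PFC_OPCODE = 0x0101

def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """pause_quanta maps priority (0-7) -> pause time in 512-bit-time quanta."""
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in pause_quanta.items():
        enable_vector |= 1 << prio      # bit set = timer for this priority is valid
        times[prio] = quanta
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    # Real frames are padded to the 64-byte Ethernet minimum; omitted here.
    return PFC_DST + src_mac + struct.pack("!H", ETHERTYPE_MAC_CONTROL) + payload

# Pause only priority 3 (e.g. the lossless FCoE class) for 0xFFFF quanta,
# matching the "PAUSE UP=3" scenario in the diagram above.
frame = build_pfc_frame(bytes.fromhex("001b21aabbcc"), {3: 0xFFFF})
print(frame.hex())
```

The key contrast with 802.3x is visible in the payload: instead of one timer that halts the whole link, PFC carries a per-priority enable vector and eight timers, so only the paused class stops transmitting.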
Enhanced Transmission Selection (ETS)
[Diagram: three traffic types sharing a 10GbE link over intervals t1-t3, with ETS bandwidth shares of 20% (IPC), 30% (storage), and 50% (LAN)]
- Offered loads of 3G/3G/3G at t1 and 3G/3G/4G at t2 fit on the link unchanged
- At t3, offered loads of 2G/3G/6G yield 2G/3G/5G on the link: LAN is trimmed to its share once the link is oversubscribed
- ETS provides minimum bandwidth guarantees per traffic type
- Allows up to 8 traffic class groups
- Defined in the IEEE 802.1Qaz specification
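To make the t1-t3 example concrete, here is a minimal weighted max-min allocation sketch in the spirit of ETS: each traffic class is guaranteed its configured share, and bandwidth unused by one class is redistributed to the others in proportion to their weights. It reproduces the slide's numbers, but it is a behavioral model, not the 802.1Qaz hardware scheduler.

```python
# Behavioral model of ETS minimum-bandwidth sharing: classes get at least
# their configured share, and unused bandwidth is redistributed by weight.
def ets_allocate(link_gbps, offered, weights):
    alloc = {tc: 0.0 for tc in offered}
    active = {tc for tc in offered if offered[tc] > 0}
    remaining = link_gbps
    while active and remaining > 1e-9:
        total_w = sum(weights[tc] for tc in active)
        share = {tc: remaining * weights[tc] / total_w for tc in active}
        # Classes whose leftover demand fits within their share are satisfied.
        done = {tc for tc in active if offered[tc] - alloc[tc] <= share[tc]}
        if not done:
            for tc in active:           # all remaining classes are link-limited
                alloc[tc] += share[tc]
            break
        for tc in done:
            remaining -= offered[tc] - alloc[tc]
            alloc[tc] = offered[tc]
        active -= done
    return alloc

# The t3 scenario from the slide: 2G IPC, 3G storage, 6G LAN offered
# against 20/30/50 shares of a 10GbE link -> 2G, 3G, 5G.
print(ets_allocate(10, {"IPC": 2, "SAN": 3, "LAN": 6},
                   {"IPC": 20, "SAN": 30, "LAN": 50}))
```

Running the same function on the t1 and t2 loads returns 3/3/3 and 3/3/4, showing that ETS shares are floors, not caps: a class may exceed its percentage whenever other classes leave bandwidth unused.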
Congestion Notification (IEEE 802.1Qau)
[Diagram: NICs with rate limiters (RL) acting as reaction points, sending through switches; a congestion-point switch signals back to the ingress NIC, triggering a back-off]
- CN: a Congestion Notification is generated when a device experiences congestion; a request is sent to the ingress node to slow down
- RL: in response to the CN, the ingress node rate-limits the flows that caused the congestion
- Together with Priority-based Flow Control, this provides insurance against sharp spikes in converging traffic and avoids packet drops
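One way to picture the reaction-point behavior is a multiplicative-decrease, binary-search-recovery rate limiter, which is the general shape of the QCN algorithm associated with 802.1Qau. The constants and class structure below are assumptions for the sketch, not the normative state machine.

```python
# Illustrative reaction-point rate limiter in the general shape of QCN
# (802.1Qau). Constants are assumptions for this sketch, not normative values.
class ReactionPoint:
    GD = 1 / 128          # assumed feedback gain, scaled so max decrease is ~50%

    def __init__(self, line_rate_gbps):
        self.rate = line_rate_gbps        # current sending rate
        self.target = line_rate_gbps      # rate before the last decrease

    def on_congestion_notification(self, fb):
        """fb: congestion feedback from the congestion point (larger = worse)."""
        self.target = self.rate
        self.rate *= max(0.5, 1 - self.GD * abs(fb))   # multiplicative decrease

    def on_recovery_cycle(self):
        """With no further CNs, climb back toward the pre-decrease rate."""
        self.rate = (self.rate + self.target) / 2

rp = ReactionPoint(10.0)
rp.on_congestion_notification(fb=64)   # back-off triggered by the switch
for _ in range(5):
    rp.on_recovery_cycle()
print(f"{rp.rate:.2f} Gb/s")           # recovers most of the way back to 10G
```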
Configuration Management (DCBX)
[Diagram: Device A and Device B exchange announced parameters via LLDP messages over the Ethernet link; each device keeps local, announced, and operational parameters in its DCE MIB]
- Link-level capability and configuration exchange, similar to FLOGI and PLOGI in Fibre Channel
- Allows either full configuration or configuration checking
- Based on LLDP (Link Layer Discovery Protocol)
- DCBX TLVs: Priority Groups, Priority-based Flow Control, Congestion Notification, Application (frame priority usage)
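Since DCBX rides on LLDP, a small parser sketch helps show the mechanics: an LLDPDU is a sequence of TLVs with a 7-bit type and 9-bit length, and DCBX parameters travel in organizationally specific TLVs (type 127). The OUI and subtype values below follow the IEEE 802.1Qaz convention as I understand it; treat them as assumptions for the sketch.

```python
import struct

# Sketch of walking LLDP TLVs to find DCBX parameters. LLDP TLV header:
# 7-bit type + 9-bit length. DCBX uses org-specific TLVs (type 127) with
# the IEEE 802.1 OUI; the subtype map is an assumption for illustration.
IEEE_8021_OUI = bytes.fromhex("0080c2")
DCBX_SUBTYPES = {0x09: "ETS Configuration", 0x0A: "ETS Recommendation",
                 0x0B: "PFC Configuration", 0x0C: "Application Priority"}

def iter_lldp_tlvs(lldpdu: bytes):
    off = 0
    while off + 2 <= len(lldpdu):
        hdr, = struct.unpack_from("!H", lldpdu, off)
        tlv_type, tlv_len = hdr >> 9, hdr & 0x1FF
        if tlv_type == 0:                      # End-of-LLDPDU TLV
            return
        yield tlv_type, lldpdu[off + 2 : off + 2 + tlv_len]
        off += 2 + tlv_len

def dcbx_tlvs(lldpdu: bytes):
    """Yield (name, value) pairs for the DCBX TLVs found in an LLDPDU."""
    for tlv_type, value in iter_lldp_tlvs(lldpdu):
        if tlv_type == 127 and value[:3] == IEEE_8021_OUI:
            subtype = value[3]
            yield DCBX_SUBTYPES.get(subtype, f"subtype {subtype:#x}"), value[4:]
```

This is also where "configuration checking" comes from: each peer announces its parameters in these TLVs, and a device can compare what it hears against its own operational settings and raise a mismatch alarm, as the next slide's deployment model shows.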
Example DCBX Deployment Model
[Diagram: DCBX running on switches and servers; switch management and server management each receive conflict alarms; DCBX marks the boundary of the congestion-managed region]
- Detects configuration mismatches between link peers and notifies management
- Discovers DCB-related peer capabilities
- Detects the boundaries of the congestion management region
Shortest Path Bridging (IEEE 802.1aq): Another Approach to Shortest Path Bridging
What is it?
- An enhancement to 802.1Q to provide shortest path bridging (optimal bridging) in L2 Ethernet topologies
- Provides for each bridge to be the root of its own topology, and hence to use the best path to any destination
Benefits
- Resolves issues related to root disappearance
- Fast convergence, no count-to-infinity
- Does not require a link state protocol
Resources
http://www.ieee802.org/1/files/public/docs2005/aq-nfinn-shortest-path-0905.pdf
http://www.ieee802.org/1/files/public/docs2006/aq-nfinn-shortest-path-2-0106.pdf
Ecosystem Enabling: DCB
[Timeline, 2004-2010]
- Started working on Data Center Bridging with co-travelers
- Wrote the specifications for ETS and DCBX
- Launched the Intel 82598 (Oplin), the 1st product to support DCB
- CNAs launched in the market using the Intel 82598 (Oplin)
- Ethernet Alliance DCB Plugfest: 9 vendors participating, including Dell, Cisco, and Intel
- DCB code included in the Linux 2.6.29 kernel release
- Launched the Intel 82599 (Niantic) with DCB support and FCoE offloads
Leading the industry in lossless Ethernet
Preliminary iSCSI DCB Results
[Diagram: two test setups, each with a Windows Server 2008 x64 host (10GbE CNA) driving two 10GbE arrays; one through a DCB switch with PFC-enabled links, one through a non-DCB switch]
- iSCSI with DCB: balanced iSCSI throughput (600 MB/s, 600 MB/s); steady packet streams (no TCP burstiness)
- iSCSI without DCB: unbalanced iSCSI throughput (1100 MB/s, 100 MB/s); typical TCP burstiness
iSCSI & DCB Application TLV (Intel X520 / Brocade / Dell EqualLogic)
CNA Architectural Approaches
CNA Architectural Approaches
CNA vendors have their roots in either the Ethernet or the FC market.
Ethernet vendors attempted to address the limitations of the front-side bus (FSB) in a couple of ways:
- Stateful approach: some vendors relied on a TCP Offload Engine (TOE); although it was not widely deployed, there were some markets (iSCSI being one of them) where it did show improvements
- Stateless approach: other vendors relied on platform assists and OS improvements
FC vendors all attempted to address the limitations of the FSB using a stateful approach.
CNA Architectural Approaches
Strengths of the stateless approach:
- Scales well
- Less power
- Easier integration with add-on OS features such as teaming and multi-pathing
- More users exercising the native OS paths leads to less risk
Weaknesses of the stateless approach:
- Silicon changes require longer lead times
- Longer integration into the native OS means slower time to market (TTM)
Strengths of the stateful approach:
- Relatively easy to make changes, particularly to the ULP
- TTM, as these designs are not as dependent on the OS or platform for support
Weaknesses of the stateful approach:
- Power associated with the onboard microprocessor needed to keep state
- Does not scale with the platform; platform micro-architecture changes, the great equalizer, have largely negated the primary rationale for full-scale offload
- Challenges at 40Gb and beyond
- Inherent risk with an offloaded stack: fewer users, more error-prone
Best of Breed Unified Networking Adapter
The best-of-breed question:
- No CNA currently on the market has a best-of-breed solution for both Ethernet and FC; you need to determine what your primary needs are
- Per Cisco, ~20% of servers in a data center require block storage, while virtually all servers in today's data center are using Ethernet
- At first blush, since Ethernet is going to be the backbone of the new data center, it would be foolish not to strongly consider CNAs whose roots are Ethernet
- DCB flexibility should also be a consideration, as we see new Application TLVs on the horizon
10Gb Ethernet Growth: Significant Increase Expected with the LOM Transition
[Chart: ports (millions) by fabric: 1GbE, 10GbE Ethernet (including iSCSI and FCoE), InfiniBand, Fibre Channel]
Source: IDC Intel FCoE Assessment, Rick Villars; Dell'Oro Network Adapter Forecast Tables, Jul '09; IDC Networking Forecast 2010-14; IDC InfiniBand Volumes 2009; Intel market model estimates
The Software Myth: Protocols Require HW Offloads
- Many have claimed that HW offloads are required for optimal performance
- Many differentiate the two approaches as "HW" vs. "SW"
Fact: every commercial Unified Networking solution has some level of HW offload. The Intel solution uses the following:
- iSCSI: iSCSI CRC offload done in the microprocessor; RSS hashing on queues
- FCoE: FC CRC offload; DDP; TX/RX exchange offload
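The "CRC in the microprocessor" point refers to CRC32C (the Castagnoli polynomial used by the iSCSI data digest), which the CPUs discussed here compute with a dedicated SSE4.2 instruction. As a reference for what that instruction computes, here is a minimal bit-by-bit sketch; a real initiator would use the hardware instruction or a table-driven routine, not this loop.

```python
# Reference implementation of CRC32C (Castagnoli), the iSCSI data digest.
# The SSE4.2 `crc32` instruction computes the same function in hardware;
# this bit-by-bit loop only shows what is being offloaded.
CRC32C_POLY_REFLECTED = 0x82F63B78

def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C:
assert crc32c(b"123456789") == 0xE3069283
```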
FCoE Receive Direct Data Placement
- SW prepares the FC read context up front, including pinning the user buffers
- SW then sends the FC read request
- The FC read response may be split across multiple FC sequences
[Diagram: per-exchange contexts in host memory point to a receive buffer descriptor list (Buffer 0..N addresses) over the pinned user buffers; HW DDP contexts keyed by OX_ID carry filter and DMA state (valid/first/last, count, size, offset, buffer list pointer, Seq_ID, Seq_Cnt) across large FC receive contexts 0-511]
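To illustrate what direct data placement buys, here is a minimal software model (hypothetical names, not the 82599 register interface): a per-exchange context holds the pinned buffer list, and each arriving frame's payload is scattered straight to its final location using the frame's relative offset, so no intermediate copy is needed.

```python
# Hypothetical model of FCoE receive DDP: a per-exchange (OX_ID) context
# holds the pinned user buffer list; "HW" places each frame's payload at
# its final location by relative offset, avoiding a bounce-buffer copy.
class ExchangeContext:
    def __init__(self, ox_id: int, buffers: list):
        self.ox_id = ox_id
        self.buffers = buffers                 # pinned user buffers (bytearrays)
        self.buf_size = len(buffers[0])

    def place(self, rel_offset: int, payload: bytes) -> None:
        """Scatter payload into the buffer list starting at rel_offset."""
        pos, consumed = rel_offset, 0
        while consumed < len(payload):
            idx, off = divmod(pos, self.buf_size)
            n = min(self.buf_size - off, len(payload) - consumed)
            self.buffers[idx][off:off + n] = payload[consumed:consumed + n]
            pos += n
            consumed += n

ctx = ExchangeContext(ox_id=0x1234, buffers=[bytearray(4096) for _ in range(4)])
ctx.place(rel_offset=4096, payload=b"\xAB" * 2048)   # lands wholly in buffer 1
```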
Rx Exchange-ID Packet Filtering and a Multi-Core-Aligned Processing Model
- Once an I/O is queued from the originating core, all remaining processing and completion back to the SCSI layer goes via that queue's associated core
- HW receive logic routes frames to a specific queue based on the FC exchange ID (Rx filtering on OX-ID & 0x7)
- The exchange ID is assigned during the request queuing process to select a specific base driver queue pair
- Receive pools feed the NDIS base LAN driver's RSS queues
- A DMiX setting controls the number of active queues (1-8), with a specific per-queue interrupt core affinity
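A small sketch of that dispatch idea (names hypothetical): the low bits of the OX_ID pick the queue, so the core that issued an I/O also services its completions, keeping caches warm and avoiding cross-core locking.

```python
# Sketch of exchange-ID-aligned dispatch (hypothetical helper names): the
# low bits of the OX_ID select the queue pair, so completions for an I/O
# return to the core that issued it.
NUM_QUEUES = 8          # DMiX-style setting: 1-8 active queues

def assign_ox_id(core_id: int, seq_no: int) -> int:
    """Issue side: encode the originating core in the low bits of the OX_ID."""
    return ((seq_no << 3) | (core_id & 0x7)) & 0xFFFF

def rx_queue_for(ox_id: int) -> int:
    """Receive-side filter: the same mask the slide describes (OX-ID & 0x7)."""
    return ox_id & (NUM_QUEUES - 1)

ox = assign_ox_id(core_id=5, seq_no=42)
assert rx_queue_for(ox) == 5    # completion lands back on core 5's queue
```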
iSCSI Application Performance Data
Microsoft Exchange Jetstress, 5000 mailboxes
Source: Demartek testing. Microsoft Exchange Jetstress, Intel FCoE; size=200MB, IOPS=0.5, SG=8; NetApp FAS3170, using an Intel Xeon 5600 processor, Windows Server 2008 R2
iSCSI Application Performance Data
Microsoft SQL: SQLIO performance testing
Source: Demartek testing. Microsoft SQLIO, Intel FCoE and a competitor FCoE CNA; size=200MB, IOPS=0.5, SG=8; EMC CX4, using an Intel Xeon 5600 processor, Windows Server 2008 R2
Simplify Your Network with Ethernet
- Consolidate Gigabit Ethernet into one network connection
- Proven Intel Ethernet
- Best-in-class virtualization
- Price/performance optimized
- Deploy iSCSI, NFS, or FCoE
Intel Enterprise iSCSI
Trusted:
- Native initiators & targets integrated into operating systems (MSFT, Linux, ESX, etc.) with GUI, MPIO, etc.
- iSCSI boot with Intel iSCSI Remote Boot and a single PCI option ROM* via the Intel Ethernet Combo Option ROM
- Consistent solution across 1GbE/10GbE for maximum compatibility
Native iSCSI acceleration:
- TCP segmentation offload (LSO)
- Large receive offload (LRO)
- Multicore scaling and load balancing
Data integrity offloads:
- Xeon 5500 iSCSI CRC32c CPU instruction set
- IPsec encryption
- DCB for iSCSI traffic (in 2010)
Leading virtualization support:
- Virtual I/O acceleration with VMDq (VMware NetQueue and Hyper-V VMQ)
[Chart: Intel 82599 iSCSI performance, single port, Windows Server 2008 R2, Iometer R/W line rate @ 4k]
Source: Intel, April 2010, Intel Networking Performance Lab. Based on internal testing of Intel Xeon W5600 servers and Intel 82599 10GbE adapters running Windows Server 2008 R2 x86_64; Cisco Nexus 5020 & StarWind soft target.
*Contains setup, iSCSI, PXE, UEFI
Thank you