Data Centre Networking Evolution: Business Challenges, Architectural Principles, Impact of Cloud Computing, FCoE. Yves Haemmerli, IT/Network Architect, yha@ch.ibm.com. 2011 IBM Corporation
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE) 5 New Ethernet Developments
Migrating or building a new D.C. network is a challenging task that is often over-simplified. After all, the network federates the whole IS/IT! Business & Architecture: what should a new data centre architecture look like, to provide the required level of availability and resiliency for the business? Data Centre Security & Privacy: what level of security has to be designed to protect systems and critical resources in a fully virtualized data centre? Technological Decisions: given the broad offering in high-performance and virtualized systems, what are the best technologies to optimize the infrastructure? Skills & Expertise Crisis: complexity is clearly increasing, and data centre network designers are required to master multiple domains.
Requirements and challenges on the data centre are technical, organizational, architectural and environmental. Expertise Requirements: designing or migrating to a new data centre requires strong expertise in various areas; a «versatilist» profile is demanded instead of pure specialists and generalists. Mobility of Virtual Machines: this leads to significant network architecture changes, due to the Layer-2 extension requirement, with a direct impact on the security model and D.C. architecture. Operational Complexity: automation, management, consolidation, heterogeneity, multiple management systems; all of this relies heavily on skilled and experienced professionals. Fibre Channel & Ethernet Convergence: paradigm shift to FCoE, Unified Fabric, lossless behaviour, common infrastructure, traffic prioritization, bandwidth management, etc. High-Performance Server Proliferation: despite virtualization, many customers continue to deploy physical servers at a rapid pace; there is an increasing demand for server ports and blade connections. Power, Cooling, Cabling: multiplication of devices, low resource utilization, high cooling requirements, multitude of network appliances, cabling complexity, etc.
Analysts state that data centre and networking technologies follow parallel evolution paths. Servers/Applications evolution: resource islands, then consolidation, then application integration, then SOA, Web services and cloud computing, up to autonomic computing with adaptive performance and resource management. Networking evolution: disparate networks, then consolidation as a strategic approach, then intelligent network services, up to network flexibility, dynamic provisioning and automation. The common stages over time: Consolidation, Integration, Virtualization, Automation.
SAN and LAN convergence is a fairly new concept that has not yet gained fast adoption globally, but it will be a cornerstone in the near future. Network evolution: specialized network devices, then networking functions integration, then networking virtualization, then a Unified Fabric (Ethernet switching AND SAN switching). Storage evolution: dedicated servers and storage, then server and storage consolidation, then virtual storage networks (VSANs). Stages: Dedicated, Consolidated, Virtualized, Unified.
In many organizations, a new type of professional profile is required. We already have specialists and generalists; now we need versatilists! Server, network and storage skills increasingly overlap. Technologies: Application-Layer Networking influences application flows based on Layer-7 information. Server virtualization pushes Ethernet switching closer to virtual machines and creates new challenges in connectivity and network security. LAN/SAN convergence will merge Fibre Channel and Ethernet on the same Ethernet network. Skills: network specialists and engineers are required to significantly enlarge their skills spectrum into neighboring domains. Inter-team collaboration becomes a key success factor in the IT department!
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE) 5 New Ethernet Developments
Why do we speak about cloud computing in this presentation? Cloud computing is in almost all IT discussions today. Public, hybrid and private clouds are well-known keywords... In high-level discussions, we all understand the concepts. But when it comes to understanding how it works, it is not so easy. And when it comes to networking and security, it is not easy at all!
Cloud computing is a new style of IT consumption, in which applications, data and IT resources are provided as a service to users. The main objective is to increase flexibility and reduce costs, by presenting the data centre as a service. One important enabling factor for the cloud concept is a very dynamic infrastructure. The data centre network design is directly impacted by this new paradigm. Cloud infrastructure: servers, storage, network. Cloud services catalog; clients using cloud services; cloud development; cloud administrator. Virtualization, standardization, automation.
The network is the underlying element of the Cloud Reference Architecture that enables cloud computing. [Cloud Reference Architecture diagram: cloud service consumers (end users, in-house IT, partner clouds) reach the cloud service provider through user interfaces and APIs; the provider delivers Business-Process-, Software-, Platform- and Infrastructure-as-a-Service on a virtualized infrastructure (server, storage, network, facilities); a Common Cloud Management Platform provides Business Support Services (offering, order, contract, customer, subscriber and entitlement management, accounting and billing, pricing and rating, invoicing, peering and settlement, SLA reporting, metering, analytics and reporting) and Operation Support Services (service catalog and templates, service request management, provisioning, monitoring and event management, incident/problem/change management, configuration management, IT asset and license management, virtualization management, image lifecycle management, IT service level management, capacity and performance management, service automation management); cloud services developers use service development portals with service definition and image creation tools; business managers and service transition/operations/security managers use the service provider portal.]
Cloud-driven business is emerging, with an expected time to broad acceptance of two to five years.
Each cloud delivery model has different characteristics, and the impact on the data centre design differs significantly. In the public and shared models, data resides outside the client's firewall! Public cloud services: service provider's cloud; standard services; users are individuals; resources are shared; access to the cloud through the public Internet. Shared cloud services: service provider's cloud; standard services; users are enterprises; mix of shared and dedicated resources; access to the cloud through VPNs or a WAN service provider. Hosted private cloud: cloud owned and operated by a service provider; dedicated resources for the client enterprise; access through dedicated connections from a WAN service provider. Managed private cloud: cloud operated by a third party but owned by the enterprise; implemented in the client data centre; uses the client enterprise network; high compliance. Private cloud: cloud owned and operated by the enterprise; implemented in the client data centre; uses the client enterprise network; high compliance.
Cloud computing data centres require a new set of network design attributes when compared with traditional architectures. Traditional network design: optimized for availability; slow to adapt; costly; infrastructure silos; dedicated; device sprawl; location dependence. Cloud network design: optimized for flexibility; new approach to availability; cost is more important; variable, metered; integrated infrastructure; shared; consolidation/virtualization; location independence.
Cloud computing is enabled by taking advantage of all possible virtualization capabilities in the infrastructure. Data centre infrastructure virtualization: systems and applications, special appliances, a pool of virtual servers. A fully virtualized and unified network infrastructure: pools of virtual LANs, virtual routers, virtual firewalls, virtual load balancers, virtual switches, virtual SAN fabrics, virtual SANs and virtual storage.
For cloud computing data centres, and in any flexible design, virtualization of network devices is a mandatory concept. Sample network, part of a cloud-enabled data centre network: virtual load balancers created on demand; a DMZ VLAN not used yet, but ready to host a new service.
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE) 5 New Ethernet Developments
An appropriate security concept is the cornerstone and the first step of any data centre design. Security zoning concept; sample data centre security model. An appropriate network security concept is a cornerstone of a modern data centre and should be the driver of a new design. The data centre network has to be based on a zoning architecture, to render a break-out from one zone more difficult to achieve. A data centre security model not only applies to the network design itself, but also influences the way applications are split into distinct zones and enclaves.
High Availability (HA) is often confused with Disaster Recovery (DR). Both concepts have very different characteristics. High Availability: distance 50-100 km over the corporate WAN; up to 0.5 ms latency; Layer-2 connectivity; network convergence < 2 s; synchronous data mirroring; RTO = 0 to 4 hrs. Disaster Recovery: virtually unlimited distance; up to 350 ms latency; Layer-3 connectivity (new Layer-2 extensions are emerging); network convergence < 30 s; asynchronous data replication; RTO = 8 to 72 hrs.
A two-site concept is generally the best practice to guarantee the level of availability required by the business. Active and hot-standby WAN and Internet connections to the corporate WAN and the Internet. Each site provides security zones, IP routing, Ethernet switching, server load balancing, server clustering and a storage area network, with synchronization, mirroring and failover between Site A and Site B. Typical distance = 50-80 km (50 km = about 250 µs latency = about 32 KB in flight). xWDM optical network; encryption may be required.
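A quick sanity check of those latency and in-flight figures, sketched below in Python; the propagation speed of roughly 200,000 km/s in fibre and the 1 Gbit/s link rate are assumptions, not slide content:

```python
# Back-of-the-envelope check of the latency and in-flight figures above.
# Assumptions: ~200,000 km/s propagation speed in fibre, 1 Gbit/s link rate.

FIBRE_SPEED_KM_PER_S = 200_000   # roughly 2/3 of c, typical for single-mode fibre
LINK_RATE_BPS = 1_000_000_000    # 1 Gbit/s

def one_way_latency_s(distance_km: float) -> float:
    """Propagation delay over the fibre, one way."""
    return distance_km / FIBRE_SPEED_KM_PER_S

def in_flight_bytes(distance_km: float, rate_bps: float = LINK_RATE_BPS) -> float:
    """Bandwidth-delay product: the data 'on the wire' at any instant."""
    return rate_bps * one_way_latency_s(distance_km) / 8

print(one_way_latency_s(50) * 1e6)   # ~250 µs, as on the slide
print(in_flight_bytes(50))           # ~31,250 bytes, i.e. the ~32 KB on the slide
```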
The anatomy of a modern data centre must be modular, allowing the relocation of a site without impacting the overall architecture. With the massive arrival of virtualized servers, Layer-2 and Layer-3 boundaries have to be re-evaluated, in order to keep segmentation and modularity while satisfying the Layer-2 requirement. Server mobility and server clustering both require a Layer-2 domain to work. Layer-3 routing within each site; Layer-2 extension between Site A and Site B.
In a two-site concept, Layer-2 domains have to be extended across both sites to comply with server mobility and server clustering technologies. Virtual machine mobility radically changes the way we think about network design, segmentation, addressing and security. Even if the Layer-3 segmentation paradigm is still valid in the data centre network design, Layer-2 (a flat network) is required between the two data centre sites. Example: VM1, VM2 and VM3 (IP1/MAC1, IP2/MAC2, IP3/MAC3) on a production VLAN and a management VLAN extended between Data Centre Site A and Data Centre Site B.
Optical multiplexing between the two sites transports 1G and 10G Ethernet as well as 4G and 8G Fibre Channel over the same fibre optic link. CWDM: up to 8 channels, 20 nm between lambdas, 1470 to 1610 nm; allows transport of 1 Gigabit Ethernet and 1, 2, 4 Gigabit FC; CWDM is based on PASSIVE devices. DWDM: up to 80 channels, 0.8 nm between lambdas, 1530.33 to 1560.61 nm; enables the transport of 10 Gigabit links; DWDM is based on ACTIVE devices. EWDM: up to 8 DWDM channels between CWDM channels, 1538.98 to 1560.61 nm; enables a CWDM infrastructure to transport 10 Gigabit links; EWDM is based on PASSIVE devices (but amplification is possible). Redundant optical paths run over two separate physical paths.
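As a hedged aside, the 0.8 nm DWDM spacing quoted above corresponds to the classic 100 GHz ITU frequency grid; the sketch below derives this with the approximation Δf ≈ c·Δλ/λ², assuming a 1550 nm centre wavelength (an assumption, not slide content):

```python
# Relating the channel spacings quoted above to the frequency grid, using
# delta_f ~ c * delta_lambda / lambda^2 around an assumed 1550 nm centre.

C = 299_792_458  # speed of light, m/s

def spacing_ghz(delta_lambda_nm: float, centre_nm: float = 1550.0) -> float:
    return C * (delta_lambda_nm * 1e-9) / (centre_nm * 1e-9) ** 2 / 1e9

print(spacing_ghz(0.8))   # ~100 GHz -> the classic 100 GHz DWDM grid
print(spacing_ghz(20.0))  # ~2500 GHz -> the much coarser CWDM grid
```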
There are four main reasons to build a data centre with 10G or even 40G Ethernet. Multi-core CPUs allow faster execution of multiple applications on the same processor; multi-socket and multi-core server technology supports higher workload levels, which, in turn, demand greater network throughput. Server consolidation in blade chassis aggregates server NICs, which together generate higher demand on the data centre network access layer. Virtualization allows the consolidation of multiple applications on a server, thus driving the bandwidth requirement; server virtualization enables workload consolidation, which contributes additionally to network throughput demand. FCoE requires 10 Gigabit Ethernet as the enabling technology for I/O convergence; it offers the bandwidth and underlying enhancements to support multiple I/O streams on a single cable.
Traditional data centre installations rapidly face a dense cabling problem, due to the growing number of servers (despite virtualization). At least 3 x 1G Ethernet copper cables per server (2 x production + 1 x management). Network rack; server racks.
Top of Rack modules are not switches, but remote line cards associated with parent switches in the network rack. A massive reduction in cabling is achieved: 2 x 10G Ethernet uplinks per Top of Rack module; 1G Ethernet local connections. Network rack; server racks.
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE) 5 New Ethernet Developments
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE): What are the main differences between LAN and SAN?
The major difference consists in the way congestion is handled. Ethernet (lossy): Ethernet frame drops are allowed when an input queue becomes overloaded; upper layers are responsible for retransmitting lost packets. SAN (lossless): frame drops would have a catastrophic impact on SAN performance; with the buffer-to-buffer credit mechanism, the sender knows how much data it can send without creating congestion. Traditional Ethernet networks are not compatible with FCoE!
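A minimal sketch of the buffer-to-buffer credit idea follows; the class and method names are illustrative, not taken from any FC stack:

```python
# Minimal sketch of Fibre Channel buffer-to-buffer credit flow control:
# the sender may only transmit while it holds credits; the receiver returns
# one R_RDY per freed buffer, so frames are never dropped for lack of space.

from collections import deque

class CreditedLink:
    def __init__(self, bb_credit: int):
        self.credits = bb_credit        # credit count negotiated at login
        self.rx_buffers: deque = deque()

    def send(self, frame) -> bool:
        if self.credits == 0:
            return False                # sender must wait; no frame drop
        self.credits -= 1
        self.rx_buffers.append(frame)
        return True

    def receiver_processes_one(self):
        self.rx_buffers.popleft()
        self.credits += 1               # R_RDY returned to the sender

link = CreditedLink(bb_credit=3)
print([link.send(f"frame{i}") for i in range(5)])  # [True, True, True, False, False]
link.receiver_processes_one()
print(link.send("frame3"))                          # True again
```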
Sharing an Ethernet link between LAN and SAN traffic requires a link-level flow control mechanism to stop the sender in case of congestion. In the Ethernet world, there is a similar flow control mechanism, specified in the IEEE 802.3x standard: a special packet called the PAUSE frame can be used to inform the sender to stop sending data for a specified period of time. Unfortunately, the PAUSE frame caused a lot of problems due to incompatible implementations and is therefore not universally used. Another issue with the current PAUSE frame mechanism is that all traffic from a port is stopped, not only the traffic generating the congestion; this negatively impacts the other traffic.
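To make the mechanism concrete, here is a minimal sketch of the 802.3x PAUSE frame layout; the source MAC address is an illustrative placeholder:

```python
# Sketch of an IEEE 802.3x PAUSE frame: a MAC control frame (EtherType 0x8808,
# opcode 0x0001) sent to the reserved address 01-80-C2-00-00-01; pause time is
# in units of 512 bit times and applies to ALL traffic on the port, which is
# exactly the limitation noted above.

import struct

def pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    dst = bytes.fromhex("0180c2000001")          # reserved multicast address
    ethertype = struct.pack("!H", 0x8808)        # MAC Control
    opcode = struct.pack("!H", 0x0001)           # PAUSE
    quanta = struct.pack("!H", pause_quanta)     # 0 = resume immediately
    frame = dst + src_mac + ethertype + opcode + quanta
    return frame + b"\x00" * (60 - len(frame))   # pad to the minimum frame size

print(pause_frame(bytes.fromhex("020000000001"), 0xFFFF).hex())
```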
Another difference resides in the usage of available links. Data centre Ethernet LANs are based on the Spanning Tree protocol (802.1w): half of the access-layer uplinks are actually used, while the others are in blocking state to avoid loops; there is no multipathing at all. This will no longer be acceptable in converged data centres. In a data centre Fibre Channel SAN, all available links are in use, and even edge nodes have multipathing capabilities. Multipathing is enabled by the Fabric Shortest Path First (FSPF) protocol, which is similar to its OSPF counterpart in IP networks.
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE): The FCoE standard and its implementation
From a Fibre Channel point of view, nothing changes with traditional SAN connections: it is just the encapsulation of FC in Ethernet. The best use case for FCoE is still customers who are building a completely new data centre or refreshing their entire data centre network; in such situations, it is obvious to deploy a 10GbE infrastructure. For customers with a 10GbE infrastructure already in place, it will most certainly make sense to leverage FCoE functionality; the uplift to implement FCoE should be relatively minimal. One important caveat to consider before implementing a converged infrastructure is to hold organizational discussions about management responsibility for the switch infrastructure. This particularly applies to environments where the network team is separate from the storage team.
Fibre Channel over Ethernet (FCoE) is now a stable standard, even if extensions are currently in a pre-standard phase. FCoE is a cooperative effort of Broadcom, Brocade, Cisco, EMC, Emulex, Finisar, HP, IBM, Intel, Mellanox, NetApp, Nuova and QLogic in the ANSI T11 workgroup (www.t11.org/fcoe). FCoE is an approved standard from the ANSI T11 FC-BB-5 working group. FCoE requires a lossless Ethernet fabric. FCoE is not IP-routable; it is a Layer-2 protocol. However, it is routable at the FC layer, like traditional FC switching, using FSPF as the routing protocol. FCoE is just another type of wire that preserves the FC semantics and tools. Mini-jumbo Ethernet frames (2.5 KB) are required to transport FC frames (up to 2148 bytes). The frame efficiency of FCoE is about 1% less compared to standard FC.
With FCoE, the original link-level flow control PAUSE frame mechanism has been redefined and is now selective for specific traffic classes. 8 input queues: lossy queues assigned to LAN traffic, lossless queues assigned to SAN traffic. An 802.1Q Ethernet link carries 8 virtual lanes, identified by the three 802.1Q priority bits.
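For contrast with plain PAUSE, a hedged sketch of a Priority Flow Control frame (IEEE 802.1Qbb) follows, pausing only one lossless lane; the source MAC is a placeholder, and the choice of priority 3 for FCoE is a common convention rather than slide content:

```python
# Sketch of a Priority Flow Control (IEEE 802.1Qbb) frame: same MAC Control
# EtherType as PAUSE, but opcode 0x0101 with an 8-bit priority-enable vector
# and one timer per priority, so only the lossless (SAN) lanes get paused.

import struct

def pfc_frame(src_mac: bytes, timers: dict[int, int]) -> bytes:
    dst = bytes.fromhex("0180c2000001")
    header = struct.pack("!HH", 0x8808, 0x0101)           # MAC Control / PFC
    enable_vector = sum(1 << p for p in timers)            # which priorities
    body = struct.pack("!H", enable_vector)
    body += b"".join(struct.pack("!H", timers.get(p, 0)) for p in range(8))
    frame = dst + src_mac + header + body
    return frame + b"\x00" * (60 - len(frame))             # pad to minimum size

# Pause only priority 3 (commonly used for FCoE) for the maximum time:
print(pfc_frame(bytes.fromhex("020000000002"), {3: 0xFFFF}).hex())
```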
FCoE is the most efficient encapsulation method to transport Fibre Channel over an Ethernet link. SCSI cable, SCSI plug, SCSI commands (TEST UNIT READY, FORMAT UNIT, READ BLOCK LIMITS): the common goal of all these technologies (SCSI, FCP over Fibre Channel, iSCSI over TCP/IP, FCIP over TCP/IP, FCoE over Ethernet) is to transport SCSI commands as reliably as the good old flat SCSI cable! Nothing changes with FCoE with regard to SAN configuration: you can see FCoE as just another link type. FCoE has less overhead compared to iSCSI and FCIP; FCoE delivers about 96% of native Fibre Channel efficiency. FCoE is not a replacement for FCIP or iSCSI, which will still be used for long-distance SAN connections.
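The rough calculation below reproduces the efficiency figure; the individual header byte counts are the author's assumptions about the common FCoE encapsulation, not slide content:

```python
# Rough check of the "~96% efficiency" figure, under common FCoE encapsulation
# assumptions: Ethernet header + 802.1Q tag + FCoE header, the full FC frame,
# EOF padding and the Ethernet FCS.

FC_PAYLOAD   = 2112    # max FC data field, bytes
FC_OVERHEAD  = 24 + 4  # FC header + CRC
ETH_OVERHEAD = 14 + 4  # Ethernet header + 802.1Q tag
FCOE_HEADER  = 14      # version/reserved bits + SOF
EOF_PAD      = 4       # EOF byte + reserved padding
FCS          = 4       # Ethernet frame check sequence

fcoe_frame = ETH_OVERHEAD + FCOE_HEADER + FC_OVERHEAD + FC_PAYLOAD + EOF_PAD + FCS
print(fcoe_frame)              # ~2180 bytes -> why mini-jumbo frames are needed
print(FC_PAYLOAD / fcoe_frame) # ~0.97, i.e. roughly the 96% quoted above
```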
FCoE versus 8 Gig Fibre Channel shows better performance for larger block sizes. FCoE technology is capable of providing up to 250,000 IOPS. As 8G FC reaches its throughput limits, FCoE can still provide additional bandwidth, up to 25% more for larger block sizes.
FCoE: it is just the encapsulation of a Fibre Channel frame in an Ethernet frame (an Ethernet envelope around the Fibre Channel traffic).
Converged Network Adapters (CNAs) are PCI Express cards seen by the operating system as Ethernet and Fibre Channel adapters. A CNA is seen by the operating system as a dual-port Ethernet adapter plus a separate dual-port Fibre Channel adapter (HBA). A CNA installed in a storage array connects to an FCoE switch and transmits FC frames encapsulated in a 10 Gb/s Ethernet link. A CNA installed in a host connects to an FCoE switch and transmits Fibre Channel frames as well as IP frames (including iSCSI or NFS) over a 10 Gb/s Ethernet link. QLogic CNA for IBM blade servers: generation-2 CNAs now have one chipset for both Ethernet and Fibre Channel support; 2 x 10 Gigabit Ethernet ports, each carrying both LAN and SAN; PCI Express for high-speed communication.
FCoE is slowly being adopted as an interesting option for data centre renewals, to solve cabling issues and leverage 10GE and 40GE. Traditional design: parallel LAN and SAN infrastructures; inefficient use of available links; 5+ connections per server. FCoE design, phase 1: reduction of server adapters; link multipathing at the access layer; still distinct networks at the aggregation layer.
Phase 2 of an FCoE implementation allows storage arrays to be connected to a converged data centre network. FCoE design, phase 2: the core becomes FCoE-capable; storage arrays are connected over FCoE; traditional FC remains for non-FCoE devices.
Traditional data centre installations rapidly face a dense cabling problem, due to the growing number of servers. At least 3 x 1G Ethernet copper cables per server (2 x production + 1 x management), plus 2 x 4G/8G Fibre Channel optical cables per server. Network rack; SAN rack; server racks.
Top of Rack modules, associated with FCoE parent switches in the network rack, transport both LAN and SAN traffic. 10G Ethernet, 4G/8G Fibre Channel and 10G Converged Ethernet (FCoE) Top of Rack modules; CX4 cables. Network rack; SAN rack; server racks.
Cisco Nexus 4000 BladeCenter modules significantly reduce the cabling requirement, while providing higher bandwidth and FCoE. [Diagram: with separated LAN and SAN, IBM BladeCenter chassis (BC1 with FCoE CNAs, BC2 with separate Ethernet and FC switch modules) need several 10G Ethernet and 8G Fibre Channel uplinks per chassis towards the data centre LAN and the two SAN fabrics; with FCoE, a Nexus 4000 FCoE bridge in the chassis (BC3) uplinks over 10G Converged Ethernet to Nexus 5000 FCoE switches, which in turn connect to the LAN and both SAN fabrics, reducing the minimum number of uplinks per chassis.]
Agenda 1 Business Challenges in the Data Centre Architecture 2 Impact of Cloud Computing on the Data Centre Network 3 Data Centre Network Architectural Principles 4 Unified Fabric and Fibre Channel over Ethernet (FCoE) 5 New Ethernet Developments
As of today, multipathing for Ethernet is based on proprietary mechanisms. On the Cisco Nexus 7000, Virtual Port Channel (vPC) allows Multi-Chassis EtherChannel: two switches are linked together with a dedicated 10G link, but both switches remain independent (two management points). Virtual Switch System (VSS) is a proprietary mechanism from Cisco on the Catalyst 6500: two switches are seen as one unique device (one management point), and Multi-Chassis EtherChannel is then possible.
TRILL (Transparent Interconnection of Lots of Links) is in its early adoption phase but will replace the well-known Spanning Tree. TRILL packet format: an outer Ethernet header and TRILL header around the original Ethernet frame. TRILL is an IETF informational standard as of June 2009 (RFC 5556). TRILL uses IS-IS (traditionally a Layer-3 routing protocol) at Layer-2 to replace Spanning Tree. Ethernet frames are encapsulated in another Ethernet frame, which is then routed at Layer-2 by IS-IS. TRILL is based on a topology database and allows traffic to be load-balanced among all available paths. A pre-standard of TRILL has been implemented by Cisco (FabricPath).
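A toy illustration of the underlying mechanism: a Dijkstra computation over a shared topology database that keeps every equal-cost next hop, which is what lets a link-state fabric use all available paths; the four-switch fabric below is invented for the example:

```python
# Toy illustration of what a link-state protocol such as IS-IS gives TRILL:
# every switch computes shortest paths from a shared topology database and can
# load-balance across all equal-cost paths instead of blocking links.

import heapq

def dijkstra(topology, root):
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for neigh, cost in topology.get(node, []):
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    return dist

def ecmp_next_hops(topology, src, dst):
    """Neighbours of src lying on SOME shortest path to dst (symmetric costs)."""
    dist_to_dst = dijkstra(topology, dst)
    best = dist_to_dst[src]
    return [n for n, c in topology[src]
            if c + dist_to_dst.get(n, float("inf")) == best]

# Two access switches connected through two spines with equal-cost links:
fabric = {
    "access1": [("spine1", 1), ("spine2", 1)],
    "access2": [("spine1", 1), ("spine2", 1)],
    "spine1":  [("access1", 1), ("access2", 1)],
    "spine2":  [("access1", 1), ("access2", 1)],
}
print(ecmp_next_hops(fabric, "access1", "access2"))  # ['spine1', 'spine2']
```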
The Data Center Bridging Exchange protocol (DCBX), carried over LLDP (IEEE 802.1AB), is used for multi-hop FCoE configuration. DCBX is the management protocol of the Data Center Bridging extensions. DCBX is itself an extension of the vendor-neutral Link Layer Discovery Protocol (LLDP): network devices or adapters advertise their identity and capabilities on the local network, including PFC, ETS, BCN/QCN, FCoE and Logical Link Down. DCBX is used to discover FCoE devices and related configuration parameters. DCBX is not mandatory to run FCoE, but it is required in multi-hop configurations where pure DCB switches sit between two FCoE end nodes.
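A hedged sketch of the LLDP encoding that DCBX rides on follows: an organizationally specific TLV (type 127) carrying a 3-byte OUI and a subtype; the subtype and payload bytes below are illustrative only, not a faithful DCBX parameter encoding:

```python
# Sketch of an LLDP organizationally specific TLV (type 127): a 7-bit type and
# 9-bit length header, then a 3-byte OUI, a subtype and the advertised value.
# DCBX uses TLVs of this shape to carry PFC/ETS/application parameters.

import struct

def org_specific_tlv(oui: bytes, subtype: int, value: bytes) -> bytes:
    tlv_type = 127
    length = 3 + 1 + len(value)                  # OUI + subtype + payload
    header = struct.pack("!H", (tlv_type << 9) | length)
    return header + oui + bytes([subtype]) + value

# e.g. a TLV under the IEEE 802.1 OUI; subtype/payload here are placeholders
print(org_specific_tlv(bytes.fromhex("0080c2"), 0x0B, b"\x00").hex())
```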
802.1Qaz: Enhanced Transmission Selection (ETS) is a standard scheduling mechanism at port level. Traffic is first grouped into priority groups with a first level of scheduling; the priority groups are then scheduled by a second-level scheduler. With this structure, it is possible to assign bandwidth to each priority group; inside each priority group, multiple traffic classes are allowed to share the bandwidth (ETS scheduling model).
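A minimal sketch of the two-level idea, assuming a simple percentage split between priority groups and an equal split inside each group; real implementations typically use weighted round-robin schedulers rather than a static split:

```python
# Minimal sketch of the ETS idea: first allocate link bandwidth between
# priority groups by configured percentage, then let the traffic classes
# inside each group share that group's allocation.

def ets_allocation(link_gbps: float, groups: dict[str, dict]) -> dict[str, float]:
    """groups maps group name -> {'share': percent, 'classes': [class names]}."""
    out = {}
    for name, g in groups.items():
        group_bw = link_gbps * g["share"] / 100   # first-level scheduler
        per_class = group_bw / len(g["classes"])  # second level: equal split
        for cls in g["classes"]:
            out[cls] = per_class
    return out

# e.g. a 10G converged link: 50% lossless SAN, 40% LAN, 10% management
print(ets_allocation(10, {
    "SAN":  {"share": 50, "classes": ["fcoe"]},
    "LAN":  {"share": 40, "classes": ["prod", "backup"]},
    "MGMT": {"share": 10, "classes": ["mgmt"]},
}))
# {'fcoe': 5.0, 'prod': 2.0, 'backup': 2.0, 'mgmt': 1.0}
```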
802.1Qau: Congestion Notification can selectively slow down a sender that generates congestion in the network. 802.1Qau is a congestion notification mechanism that allows a data centre switch experiencing congestion on one of its ports (the congestion point) to inform the source of the congestion (the reaction point). This mechanism reacts faster than PAUSE frame propagation for congestion caused by long-lived flows; it is therefore a complement to the PAUSE frame mechanism. If you have FCoE nodes at the edges of a large LAN (with many DCB-enabled aggregation/core switches in the middle), it is quite possible to get congestion somewhere in the core, and QCN is the only thing that can help.
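A hedged sketch of the congestion-point arithmetic commonly described for QCN follows: feedback combines how far the queue is from its set-point with how fast it is growing; the weight and queue values below are illustrative assumptions:

```python
# Hedged sketch of a QCN-style congestion-point calculation (IEEE 802.1Qau):
# feedback combines the queue's offset from its equilibrium set-point (q_off)
# with its growth since the last sample (q_delta); a negative value triggers
# a congestion notification back to the reaction point, which reduces its rate.

W = 2.0  # weight on the queue-growth term (implementation-dependent)

def qcn_feedback(q_len: int, q_old: int, q_eq: int, w: float = W) -> float:
    q_off = q_len - q_eq        # distance from the equilibrium set-point
    q_delta = q_len - q_old     # growth since the last sample
    return -(q_off + w * q_delta)

fb = qcn_feedback(q_len=120, q_old=90, q_eq=64)
if fb < 0:
    print(f"send a congestion notification to the source, feedback {fb}")  # -116.0
```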
Overlay Transport Virtualization (OTV) will extend Layer-2 domains through a WAN, allowing server mobility and global cloud computing. OTV is an encapsulation of Ethernet frames in IP and provides a mechanism to avoid loops and frame flooding over the WAN. Within a site, conventional MAC switching is used; in the core, MAC routing is used via an overlaid IGP (IS-IS). Edge routers maintain a MAC table with additional information for destinations in another data centre.
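A toy sketch of that MAC-table idea, with invented MAC and IP addresses: remote entries point at the IP address of the remote site's edge device, so inter-site frames are encapsulated in IP rather than flooded across the WAN:

```python
# Toy sketch of the OTV forwarding decision: the edge device keeps a MAC table
# in which remote MACs resolve to the IP address of the remote site's edge
# router (learned via the overlaid IS-IS), while local MACs resolve to a port.

mac_table = {
    "00:11:22:33:44:01": ("local", "Eth1/1"),        # server in this site
    "00:11:22:33:44:02": ("overlay", "192.0.2.20"),  # learned from remote edge
}

def forward(dst_mac: str) -> str:
    entry = mac_table.get(dst_mac)
    if entry is None:
        return "unknown destination: flood locally only, never across the WAN"
    kind, next_hop = entry
    if kind == "local":
        return f"switch out of port {next_hop}"
    return f"encapsulate in IP and send to remote edge device {next_hop}"

print(forward("00:11:22:33:44:02"))
```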
In summary, designing a data centre is very challenging and requires expertise and experience in multiple domains. Business & Architecture: designing a data centre requires experience and expertise in multiple domains. Data Centre Security & Privacy: a security model is the first phase to go through in any data centre design. Technological Decisions: technologies are evolving immensely rapidly; specialized consulting is required. Skills & Expertise Crisis: in many organizations, specialists have to become versatilists.
Thank you!