Blade Switches Don't Cut It in a 10 Gig Data Center

Zeus Kerravala, Senior Vice President and Distinguished Research Fellow, zkerravala@yankeegroup.com
September 2011

This custom publication has been sponsored by Juniper.

Table of Contents
Introduction: Virtualization Drives Data Center Evolution 1
Defining a Fabric 2
The Role of the Blade Switch 3
Extending the Fabric to the Blade Switching Tier 4
Benefits of Integrating the Blade Switch into the Fabric 7
Conclusion and Recommendations 10

Introduction: Virtualization Drives Data Center Evolution

The enterprise data center has gone through several major technological shifts over the past several decades. The primary computing platform has moved from mainframes to client/server computing to Internet computing, with each major transition lowering the overall cost of computing and increasing the strategic importance of the data center network. Now the industry is in the middle of the next major computing evolution: the shift to a fully virtualized data center (see Exhibit 1).

Virtualization has already transformed many parts of the data center. It has changed the way software is licensed and built, it has had a significant impact on server design, and it has radically changed the way IT resources are provisioned and allocated. Now, for organizations to maximize their investments in virtualization technology, the network must transform as well.

[Exhibit 1: Computing Through the Ages. A chart plotting the value and cost of computing across four eras: mainframe (1960-1980), client/server (1980-1995), Internet computing (1995-2010) and virtual computing (2010+).]
However, the transformation of the data center is not without its challenges. The following obstacles must be overcome before the required network transformation can occur:

• The current multi-tier data center network architecture is not built to efficiently handle today's traffic patterns. Current data center networks were designed almost a decade ago, when the majority of traffic flowed between clients and servers (i.e., north to south). The dynamic nature of today's virtualized traffic, which flows between machines within the data center itself (i.e., east to west), requires a simpler, more efficient network design.

• The silo-like nature of data center operations leads to inefficient use of resources. Network, storage and compute operations are still typically isolated within their own silos. While this was adequate in legacy data centers, virtualization and convergence have forged a level of co-dependence among the network, compute and storage tiers, requiring the operations teams to come together and collaborate as well.

• The next wave of virtualization is designed to enable greater mobility of virtual workloads across racks, rows or any location in the data center. The network, compute and storage infrastructure must be aware of one another to ensure consistent application performance as virtual workloads are launched.

These challenges are significant barriers to evolving the data center. However, they can be overcome with a high-performance network fabric that can act as the backplane for the virtual data center.

Defining a Fabric

The requirements of a virtual data center have driven the need for network evolution. There can be no compromise among network scalability, reliability and performance. A network fabric can be thought of as a network that scales seamlessly and is very easy to manage, enabling business agility and providing a solid network infrastructure for long-term evolution of the data center.
Fabrics already exist in many data centers today, but they have typically been limited to storage networks, where the real-time demands of storage required the evolution of a no-compromise, fabric-like network. These same demands are now present in Ethernet networks, which require a network fabric with the following characteristics:

• Optimized for east-west (server-to-server) traffic flow: Legacy data center networks are designed for north-south (client-to-server) traffic flows, from the edge of the network through each tier and back again. This adds significant latency to traffic flows and causes congestion on inter-switch links. A network fabric can be thought of as a high-speed transport with a single, logical domain where traffic can move north-south and east-west with equal ease, as applications demand. This is critical to the movement of virtual workloads, where latency can impact application performance and disrupt the business.

• Single-hop network: In traditional multi-tier networks, traffic can take several hops, passing through many network devices, before reaching its destination. Each hop requires individual packet processing and adds latency to the traffic flow. A network fabric virtually connects every port to every other port, meaning all traffic is no more than a single hop away from its destination.

• Spanning Tree Protocol (STP)-free network: STP was developed to keep traffic from getting caught in forwarding loops and never reaching its destination. STP prevents these loops by disabling all possible paths between any two points except for the one determined to be the best; an alternate path only becomes active when the primary path fails. Since all traffic is forced to travel the same path, the network must be over-provisioned to accommodate these demands. As a result, many ports (up to 50 percent in some cases) may be inactive at any given time, creating a highly inefficient network.
STP was a great innovation for its time, but it was developed in the client/server era; now the network needs to evolve past this inefficient use of network resources.

In essence, a network fabric can be thought of as a single, flat network where every port is directly connected to every other port (see Exhibit 2 on the next page). This network fabric should provide high performance, scalability and a simple management model so that it can efficiently interconnect data center resources and become the backplane of the virtual data center.

To date, network fabric solutions from leading vendors have focused on collapsing the current three physical network tiers in the data center (top-of-rack, aggregation and core) down to two or even one tier. However, an oft-neglected fourth tier does exist: the blade-switching tier. This tier is often embedded in server clusters and therefore has not been addressed by many solution providers. As the network continues to grow, however, the data center network fabric will need to extend to this tier as well.

Copyright 1997-2011, Yankee Group Research, Inc. All rights reserved. Page 2
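The port-idling behavior described above can be sketched in a few lines of code. The snippet below is a simplification, not the actual protocol (real STP elects a root bridge and exchanges BPDUs): it simply keeps one loop-free spanning tree over a hypothetical redundant topology and blocks everything else, which is the source of the inefficiency.

```python
# Minimal sketch (hypothetical topology, simplified logic): why loop
# prevention idles links. A topology with S switches and L links keeps
# only S - 1 links active; the remaining L - (S - 1) links are blocked.

def blocked_links(switches, links):
    """Partition links into one active spanning tree and the blocked rest."""
    parent = {s: s for s in switches}

    def find(s):                       # union-find root lookup
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    active, blocked = [], []
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra == rb:                   # link would create a loop: block it
            blocked.append((a, b))
        else:                          # link joins two separate islands: keep it
            parent[ra] = rb
            active.append((a, b))
    return active, blocked

# Four switches with fully redundant interconnects (6 links total).
switches = ["core1", "core2", "agg1", "agg2"]
links = [("core1", "agg1"), ("core1", "agg2"),
         ("core2", "agg1"), ("core2", "agg2"),
         ("core1", "core2"), ("agg1", "agg2")]
active, blocked = blocked_links(switches, links)
print(len(active), len(blocked))  # 3 3 -- half the links sit idle
```

In this small example, exactly half of the installed links carry no traffic, matching the "up to 50 percent" figure cited above.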
[Exhibit 2: Traditional Networks and a Fabric. In the traditional design, traffic between server racks flows from the access tier up through the aggregation tier to the core and back down. In the fabric design, traffic flows directly between server racks across a single tier.]

The Role of the Blade Switch

The blade switch was introduced as an efficient way to collectively connect the blade servers in a chassis to the external network. Rather than individually connecting multiple blade servers directly to the physical network, a blade switch built into the blade server chassis provides a single network connection (see Exhibit 3). The blade switch provides the following benefits to data centers:

• Simplified cabling. Instead of each blade server having its own cable run from the back of the server to the end-of-row switch, all blade servers connect to the blade switch in the rack. A single long cable run, or two for redundancy purposes, connects the blade switch to the physical network.

• Minimized power, cooling and space. Without the blade switch, the data center network would require many more switching ports to accommodate the increased number of connections. This in turn would require more aggregation switches and perhaps a larger core switch.

• Ability to integrate with traditional data centers. The latency characteristics and network connectivity speeds of blade switches are on par with legacy servers, so they integrate well into traditional data centers.

[Exhibit 3: Blade Chassis Architecture. Blade servers 1 through n connect over GigE links to a blade switch inside the chassis; the blade switch uplinks to the data center network with active connections plus a redundant connection disabled by STP.]
While blade switches met the challenges of legacy data centers, they do not meet the demands of the virtual data center due to the following limitations:

• Limited network feature set. Since blade switches are used to connect blade servers, server vendors often manufacture them, resulting in limited feature sets. While they may meet the minimum requirements in most cases, advanced capabilities such as QoS, multicast, high availability and access control lists (ACLs), which are table stakes in most network designs, are unavailable.

• Designed for cable simplicity, not for solving complex network problems. Blade switches were developed to consolidate and reduce the number of network connections exiting the blade server chassis. As a result, blade switches lack the ability to solve more complex network problems such as virtualization and convergence. Many advanced features are moving to the network edge, and blade switches, with their minimal feature sets, merely add complexity to virtualized data center networks.

• Bandwidth limitations that inhibit data center evolution. Storage-network convergence is in its infancy today. As it becomes more commonplace, blade servers will need to offer 10 GigE connections, a feature not commonly available. The primary purpose of blade switches was to simplify cabling; their bandwidth limitations restrict the network and add cost and complexity.

While blade switches played a significant role in helping blade servers reach their current high penetration rate, they will not meet the demands of today's virtualized data center. Instead, a better solution is needed. Extending the network fabric into the blade-switching tier will create the necessary foundation on which to build the virtual data center.

Extending the Fabric to the Blade Switching Tier

The evolution of the data center is driving the need for network change. Migrating legacy networks to a network fabric will simplify and improve the performance of the data center.
However, a number of trends will require extending the network fabric into the blade-switching tier. These trends include:

• Density of server virtualization. Organizations have been migrating more workloads to virtual servers. Based on our ongoing research, Yankee Group estimates 58 percent of all workloads are now virtualized, up from 24 percent in 2007. As the density of server virtualization increases, low-latency, high-performance, feature-rich networks will be needed to accommodate the accompanying increase in network traffic.

• Virtual machine (VM) mobility. While virtualization has typically been used to consolidate servers, mobilizing virtual workloads has become critical for maintenance and disaster recovery as well. Moving a workload across the network can drive bandwidth requirements that reinforce the need for a network fabric.

• Large application clusters. The increasing size of application clusters drives the need for faster network performance. A flat, low-latency fabric is needed to optimize the performance of application clusters.

• Converged I/O. The convergence of Fibre Channel and Ethernet creates new challenges for the network. Migrating to a network fabric will enable Ethernet to support the low-latency, high-bandwidth, feature-rich requirements that storage demands.

Extending the fabric to the blade-switching tier will provide consistency from server to server.
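The bandwidth pressure from VM mobility is easy to see with back-of-the-envelope arithmetic. The numbers below (VM memory size, link speeds, sustained efficiency) are hypothetical illustrations, not figures from the report:

```python
# Hypothetical sketch: time to copy a VM's memory image across the network
# during a live migration, at two different link speeds.

def migration_seconds(vm_memory_gb, link_gbps, efficiency=0.8):
    """One full memory-copy pass, assuming the link sustains `efficiency`
    of its line rate after protocol and congestion overhead (an assumption)."""
    bits = vm_memory_gb * 8 * 1e9               # GB -> bits (decimal units)
    return bits / (link_gbps * 1e9 * efficiency)

# A 32 GB VM over a legacy 1 GigE uplink vs. a 10 GigE fabric link:
print(round(migration_seconds(32, 1)))   # 320 seconds
print(round(migration_seconds(32, 10)))  # 32 seconds
```

Minutes-long copy times on legacy links are what make low-latency, high-bandwidth fabric links a prerequisite for routine workload mobility.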
Extending the fabric to the blade server requires removing the entire blade-switching tier. Such an architecture, which combines the simplified cabling and lower power and cooling requirements of blade architectures with the agility, performance and manageability of a network fabric, provides servers with a redundant pair of connections to the fabric (see Exhibit 4). This architecture works well in both Ethernet-only deployments and converged I/O deployments.

[Exhibit 4: Server Connectivity to the Fabric. Blade chassis connect to the fabric edge through a quad small form-factor pluggable (QSFP) interconnect.]

Ethernet-Only Deployment

Blade switches in an Ethernet network present several challenges that impede the evolution to the virtual data center. The insertion of blade switches between servers and the fabric adds an extra network tier, which in turn adds an extra hop and unwanted latency. Depending on the size of the environment, an aggregation layer may be required, adding yet another hop and even more latency. In addition, since the blade switches also need to be managed, each one represents yet another element to monitor and maintain. In large networks, this can translate into hundreds of devices adding to the operational overhead of the data center. Lastly, each blade switch requires additional transceivers and uplink connectors, which raises the overall total cost of ownership (TCO) of running the data center.

When the blade switching functionality is integrated into the network fabric, however, the result is a much simpler, flatter network that is in line with the goals of next-generation data center networks. The network fabric acts as a single tier, minimizing the end-to-end latency of the overall network. Additionally, the network has wire-speed performance from each server to every other server, a key requirement for accommodating large workloads and VM mobility.

Copyright 1997-2011, Yankee Group Research, Inc. All rights reserved. Page 5
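The hop-count argument above can be made concrete with simple multiplication. The per-hop delays below are assumptions chosen only to mirror the magnitudes discussed in this report, not measurements of any product:

```python
# Illustrative only: end-to-end latency grows with each switching tier a
# packet must cross. Per-hop figures are assumptions, not measured values.

def end_to_end_us(per_hop_us, hops):
    """Total one-way latency: per-hop switching delay times hop count."""
    return per_hop_us * hops

blade_path = end_to_end_us(1.7, 3)   # blade switch -> aggregation -> core
fabric_path = end_to_end_us(0.9, 1)  # single-hop fabric
print(round(blade_path, 1), round(fabric_path, 1))  # 5.1 0.9
```

With three store-and-forward tiers in the path, latency on the order of 5 microseconds is plausible, while a single-hop fabric stays below 1 microsecond.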
Direct integration of blade switching into the fabric is also a lower-cost model, since there is no extra hardware to buy and there are fewer devices to manage. Exhibit 5 shows the cost comparison between three blade chassis in a rack deployed with six blade switches versus the same setup using a pass-through solution. The example uses actual prices gathered through secondary research.

Exhibit 5: Sample TCO: Blade Switch vs. Pass-Through Configuration

Blade Switch Configuration
  Blade switch: 6 x $11,199 = $67,194
  Small form-factor pluggable (SFP+) optics: 120 x $1,500 = $180,000
  Total: $247,194

Pass-Through Configuration
  Pass-through module: 6 x $4,999 = $29,994
  Direct-attached copper (DAC) cables: 84 x $210 = $17,640
  Top-of-rack switch: 2 x $30,000 = $60,000
  Total: $107,634

Note: Estimates are based on leading suppliers' list prices.

As the table shows, the use of a pass-through model results in a greater than 50 percent capex reduction. Additionally, the pass-through configuration has only two switches to manage instead of six; if a blade system is used to connect to the fabric, there are no additional switches to manage. The blade-switch configuration is over-subscribed and has approximately 5 microseconds of latency, compared with the pass-through configuration's sub-1-microsecond latency and wire-speed performance. Therefore, the pass-through configuration offers better performance at a lower price.

Converged I/O Deployment

Organizations that want to migrate to a converged I/O network should consider a pass-through architecture for their network. When blade switches are used, the additional hops and related latency impede the performance of the converged I/O and will negatively impact the performance of storage systems and virtualized servers. Also, each blade server requires both an Ethernet network interface card (NIC) and a Fibre Channel host bus adapter (HBA).
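The Exhibit 5 line items above multiply out as claimed; a few lines of Python (item names abbreviated) make the arithmetic explicit:

```python
# Cross-check of the Exhibit 5 totals, using the quantities and unit
# prices from the table.

blade_switch = {"blade switch": (6, 11_199),
                "SFP+ optics": (120, 1_500)}
pass_through = {"pass-through module": (6, 4_999),
                "DAC cable": (84, 210),
                "top-of-rack switch": (2, 30_000)}

def total(config):
    """Sum quantity times unit price over all line items."""
    return sum(qty * price for qty, price in config.values())

blade_total, pt_total = total(blade_switch), total(pass_through)
print(blade_total, pt_total)                      # 247194 107634
print(f"{1 - pt_total / blade_total:.0%} saved")  # 56% saved
```

The 56 percent figure confirms the greater-than-50-percent capex reduction cited in the text.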
In the blade-switching configuration, in addition to the overhead imposed by the additional blade switches, the organization would need to deploy Fibre Channel switches as well, driving up costs and creating management challenges for both network managers and server administrators.
If the network fabric is extended to remove the blade tier, converged traffic needs just a single hop to reach its destination with minimal latency, a critical factor for storage and virtualization. Each blade server would require only a single converged network adapter (CNA), or two for redundancy, with a connection to a converged top-of-rack switch. This has the obvious benefit of being much simpler to deploy, and it is significantly easier to manage as well. Exhibit 6 examines the same size configuration as Exhibit 5, but adds a converged infrastructure. Just like the Ethernet-only example, the pass-through solution with converged I/O provides a greater than 50 percent capex reduction while adding wire-speed performance and sub-1-microsecond latency.

Exhibit 6: Sample TCO: Blade Switch vs. Pass-Through With Converged I/O

Blade Switch Configuration
  Blade switch: 6 x $11,199 = $67,194
  SFP+ optics: 120 x $1,500 = $180,000
  Network interface card: 42 x $599 = $25,158
  Fibre Channel switch: 6 x $9,779 = $58,674
  8 Gbps Fibre Channel SFP: 72 x $299 = $21,528
  Fibre Channel host bus adapter: 42 x $849 = $35,658
  Total: $388,212

Pass-Through Configuration
  Pass-through module: 6 x $4,999 = $29,994
  DAC cables: 84 x $210 = $17,640
  Top-of-rack switch: 2 x $30,000 = $60,000
  Converged network adapter: 42 x $1,199 = $50,358
  Top-of-rack uplink Fibre Channel SFP: 24 x $750 = $18,000
  Total: $175,992

Note: Estimates are based on leading suppliers' list prices.

Benefits of Integrating the Blade Switch into the Fabric

Extending the network fabric into the blade-switching tier offers many benefits. Exhibit 7 on the next page shows the difference between a blade switch-connected environment and a pass-through configuration in which the fabric has been extended to the blade servers. The blade switch configuration uses SFP+ connectors in the access tier, terminated into the end-of-row switch.
Similarly, all Fibre Channel connections are terminated at the end-of-row storage-area network (SAN) switch. The pass-through design is architected with a pass-through module and requires 75 percent fewer cables, using the more efficient QSFP connectors (see Exhibit 8 on the next page). In this configuration, there is no need for an end-of-row switch, as the blade chassis and top-of-rack switch can connect directly to the core switch.

Copyright 1997-2011, Yankee Group Research, Inc. All rights reserved. Page 7
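The 75 percent cabling figure follows directly from the QSFP form factor, which bundles four 10 GigE lanes into a single cable. The rack size in the sketch below is hypothetical:

```python
# Sketch of the cabling arithmetic: one QSFP cable carries four 10 GigE
# lanes, so it replaces four individual cable runs. Rack size is assumed.

LANES_PER_QSFP = 4

individual_cables = 96                               # 96 x 10 GigE runs
qsfp_cables = individual_cables // LANES_PER_QSFP    # bundled into QSFP
reduction = 1 - qsfp_cables / individual_cables
print(qsfp_cables, f"{reduction:.0%}")               # 24 75%
```

Whatever the rack size, the 4:1 lane bundling yields the same 75 percent reduction as long as the link count divides evenly.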
[Exhibit 7: Blade Switch vs. a Pass-Through Design. A diagram comparing the two configurations: in the blade switch design, chassis connect through blade switches to a top-of-rack switch and on to the core and storage-area network; in the pass-through design, chassis connect through pass-through modules to the top-of-rack switch, which uplinks to the core and storage-area network.]

[Exhibit 8: Blade Switch vs. a Pass-Through Design With QSFP. The same comparison, with the pass-through design using a QSFP interconnect between the blade chassis and the switch.]
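The Exhibit 6 line items can be cross-checked the same way as Exhibit 5 (a sketch; item names abbreviated):

```python
# Cross-check of the Exhibit 6 converged I/O totals, using the quantities
# and list prices from the table.

blade_switch = [(6, 11_199),   # blade switch
                (120, 1_500),  # SFP+ optics
                (42, 599),     # network interface card
                (6, 9_779),    # Fibre Channel switch
                (72, 299),     # Fibre Channel SFP
                (42, 849)]     # Fibre Channel host bus adapter
pass_through = [(6, 4_999),    # pass-through module
                (84, 210),     # DAC cable
                (2, 30_000),   # top-of-rack switch
                (42, 1_199),   # converged network adapter
                (24, 750)]     # top-of-rack uplink Fibre Channel SFP

def total(config):
    """Sum quantity times unit price over all line items."""
    return sum(qty * price for qty, price in config)

blade_total, pt_total = total(blade_switch), total(pass_through)
print(blade_total, pt_total)                      # 388212 175992
print(f"{1 - pt_total / blade_total:.0%} saved")  # 55% saved
```

Again the savings exceed 50 percent, consistent with the Ethernet-only comparison.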
This network simplification, performance improvement and dramatic reduction in cabling lead to the following benefits:

• Capex reduction. The network fabric creates a single-tier network extended all the way to the blade server chassis. Capex is lowered through the elimination of the end-of-row aggregation tier, resulting in fewer devices. In addition, the number of adapters is reduced by up to 50 percent, the number of physical devices in the rack is reduced by a factor of six, and cabling costs are cut through the use of QSFP connectors.

• Opex reduction. The network fabric acts as a single distributed switch and integrates seamlessly into the computing infrastructure. This means no overlap between server and network management functions. The network fabric has fewer devices to manage, and overall operational expenses for power, space and cooling are reduced. Additionally, orchestration is much easier with the integrated and simplified design.

• Optimized performance. The flat, single-tier network means no over-subscription within the network (including inside the rack) as well as high bandwidth and low latency across the data center. This provides the best possible performance of any network configuration.

• Enhanced feature set and scale for the virtual data center. The fabric architecture has many features optimized for the virtual data center. The simplified configuration has no interoperability issues, with Fibre Channel over Ethernet (FCoE) providing a simplified migration path. Additionally, a network fabric can provide significant high-availability options to ensure maximum uptime (see the July 2010 Yankee Group Report "Evolution to a Virtual Data Center Requires a Fabric").

• Unified architecture. The unified network topology creates a consistent architecture across all blade server vendors and rack-and-stack 1U servers. In addition, the network fabric is able to adapt as the data center evolves, providing superior future-proofing advantages.
Conclusion and Recommendations

The shift to virtualization is transforming the data center faster than at any time in the history of computing. Virtualization has already had a significant impact on the software and server industries; now, the network is at the precipice of change as well. The virtualized data center makes the network a key point of competitive differentiation for companies looking to capitalize on the flexibility and efficiencies of virtualization. This creates new demands and changes both the requirements of the network and the choice of solution provider. To realize the full potential of virtual data centers, the network must undergo a significant transformation. With this understanding, the following recommendations can help companies begin their transition to a virtualized data center:

• Evaluate network infrastructure on criteria that are relevant to the future vision of the data center and the network. Decision criteria should no longer be based on brand, vendor incumbency or even traditional measuring sticks such as port density. Instead, evaluation criteria should be based on the number of server-facing ports the network can support, end-to-end latency and overall network throughput.

• Simplify the network architecture. Network managers should move away from three- and four-tier architectures and strive to migrate to a single tier, if possible. The less complex a network is, the easier it is to manage and troubleshoot.

• Make power efficiency a key part of the decision criteria. Power and cooling requirements can vary widely among solution providers. Choose a network infrastructure vendor that includes power and cooling efficiency as part of its overall solution design.

About the Author

Zeus Kerravala, Senior Vice President and Distinguished Research Fellow

Zeus Kerravala leads the Research Council and is chartered with the responsibility of providing thought leadership to the research organization.
Comprising senior research leaders, the Research Council provides outreach to clients and the broader Yankee Group community, and ensures that the company's research agenda addresses the needs of business leaders. Kerravala drives the strategic thinking of the research organization and helps shape the research direction. Much of Kerravala's expertise involves working with customers to solve their business issues through the deployment of infrastructure technology.

Corporate Headquarters
One Liberty Square, 7th Floor
Boston, Massachusetts 02109
617-598-7200 phone
617-598-7400 fax

European Headquarters
30 Artillery Lane
London E1 7LS
United Kingdom
44-20-7426-1050 phone
44-20-7426-1051 fax

Copyright 2011. Yankee Group Research, Inc. Yankee Group published this content for the sole use of Yankee Group subscribers. It may not be duplicated, reproduced or retransmitted in whole or in part without the express permission of Yankee Group, One Liberty Square, 7th Floor, Boston, MA 02109. All rights reserved. All opinions and estimates herein constitute our judgment as of this date and are subject to change without notice.