Simplifying Data Center Network Architecture: Collapsing the Tiers

Abstract: This paper outlines the impact that the adoption of virtualization and blade switches has on data center network architecture, and how Extreme Networks can address these changes and bring to market differentiated solutions that meet the growing needs of the data center.
Overview

Traditional data center architectures have revolved around a Top-of-Rack (ToR) or an End-of-Row (EoR) switch that connects into an aggregation or core switch, typically yielding a 2-tier or 3-tier architecture. With the adoption of virtualization, a new switching tier has been introduced into the network: the virtual switch. The virtual switch is a software switch that sits inside the hypervisor and allows Virtual Machines (VMs) to communicate with each other.

In addition, with the adoption of blade servers, it is becoming increasingly common to use a blade switch within the blade chassis enclosure. The blade switch allows communication between blade servers within the enclosure and provides uplinks from the enclosure to the rest of the physical network infrastructure, such as a top-of-rack or end-of-row switch.

The net effect of adding the virtual switch and the blade switch to the data center switching infrastructure is that the network architecture of the data center is moving from a 2- or 3-tier architecture to a 4- or 5-tier architecture (see Figure 1).

This paper outlines some of the impacts of these changes and how Extreme Networks can address them with differentiated solutions that meet the growing needs of the data center.

[Figure 1. 5-Tier Architecture: Core (Tier 1), Aggregation (Tier 2), ToR (Tier 3), Blade (Tier 4) and Virtual (Tier 5) switching tiers, reaching down to the blade servers within each blade chassis enclosure.]
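To make the tier inflation concrete, the following minimal Python sketch counts the switching hops a frame takes between two VMs in different racks under a traditional 3-tier design versus the 5-tier architecture of Figure 1. This is our illustration, not a model from the paper; tier names follow Figure 1 and the hop counts are illustrative only.

```python
# Sketch: switching hops between two VMs in different racks. The frame
# climbs every tier to the core and descends the same tiers on the far
# side; the core itself is traversed once. Illustrative only.

THREE_TIER = ["ToR", "aggregation", "core"]
FIVE_TIER  = ["virtual switch", "blade switch", "ToR", "aggregation", "core"]

def switching_hops(tiers):
    # up through len(tiers) devices, down through len(tiers) - 1
    return 2 * len(tiers) - 1

for name, tiers in (("3-tier", THREE_TIER), ("5-tier", FIVE_TIER)):
    print(f"{name}: {switching_hops(tiers)} switching hops rack to rack")

# 3-tier: 5 switching hops rack to rack
# 5-tier: 9 switching hops rack to rack
```

Every additional hop is a device that must be bought, powered, managed and traversed, which is the core of the argument developed in the sections that follow.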
The Impact of Virtual Switches

Virtual switches allow virtual machines within a server to communicate with each other locally. As the number of VMs on a server increases, so does the functionality in the virtual switch. Today virtual switches support tagging, rate limiting, Access Control Lists (ACLs) and other sophisticated functionality found in Layer 2 network switches.

The impact of the virtual switch is significant in many ways. First, the virtual switch resides within the server. This raises the question of who will manage it: the network administrator or the server administrator? Troubleshooting, configuration and similar tasks require coordination between the server and network administrators, as well as between the tools used to manage, configure and troubleshoot the physical and virtual switches. Compounding this problem, different virtualization technology providers have different capabilities and models for the functionality in the virtual switch, which adds a further dimension of operational overhead in heterogeneous virtualization environments. Effectively there is heterogeneity across the virtual switch layer, and heterogeneity between the virtual switch and physical switch tiers, since the network switch functionality is in many cases different from (and more sophisticated than) that of the virtual switch tier.

Secondly, the virtual switch tier gives rise to a problem of scale. With a virtual switch in every server, the number of actively managed switches in the network has grown by a factor of 10 or more. The more functionality that is enabled in the virtual switches, the greater the management overhead. Additionally, since virtual switches today run mostly in software in the hypervisor, they take CPU resources away from the actual applications. As more functionality is enabled in the virtual switches, this CPU impact becomes more pronounced.

Lastly, with the number of virtual machines running on a server increasing significantly, inter-VM traffic is also increasing. Switching between these VMs locally on the server via the virtual switch provides very little visibility into that traffic for troubleshooting purposes.

The Blade Tier

Blade servers are rapidly gaining in popularity. Where earlier 8 blade servers in an enclosure were the norm, today 16 and even 32 blade servers are being accommodated within the blade chassis. Each blade server connects to the network via Ethernet ports, sometimes via 2 or more ports per server. These connections are made available through the front panel via two options: a pass-through blade or a blade switch. The pass-through blade simply brings all the Ethernet connections from each of the blade servers out the front panel. In a dense deployment, the number of Ethernet ports within a blade chassis, and within a rack housing two or more blade chassis, is high enough to create significant cabling problems. The blade switch addresses this issue by locally switching traffic between blade servers and presenting a smaller set of uplink ports, thus reducing the cabling nightmare. However, the blade switch introduces yet another switching tier in the data center, and once again, as in the case of the virtual switch, another layer of management overhead.
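The scale problem created by the two new tiers is easy to quantify with a back-of-the-envelope count. The sketch below is ours, and all of the input numbers are assumed, illustrative values rather than figures from the paper; it simply shows how a virtual switch per hypervisor and a blade switch per enclosure multiply the managed-device count.

```python
# Back-of-the-envelope count of actively managed switches once the
# blade and virtual tiers are added. All inputs are illustrative.
racks              = 20
chassis_per_rack   = 2    # blade chassis enclosures per rack
blades_per_chassis = 16   # blade servers per enclosure

servers          = racks * chassis_per_rack * blades_per_chassis  # 640
tor_switches     = racks                      # one ToR switch per rack
blade_switches   = racks * chassis_per_rack   # one blade switch per enclosure
virtual_switches = servers                    # one vSwitch per hypervisor

physical_only  = tor_switches
with_new_tiers = tor_switches + blade_switches + virtual_switches

print(f"edge devices to manage, physical only:       {physical_only}")
print(f"edge devices to manage, with blade + virtual: {with_new_tiers}")
print(f"growth factor: {with_new_tiers / physical_only:.0f}x")
# 20 -> 700 managed edge devices: a 35x growth in this example
```

Even with conservative assumptions the managed-device count grows by an order of magnitude or more, consistent with the "factor of 10 or more" observation above.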
As in the case of the virtual switch, the ambiguity over who owns and manages the blade switch tier (server or network administrators) applies here as well.

More importantly, the blade switch also introduces oversubscription into the network. The move to greater virtualization is driving higher and higher bandwidth requirements toward the edge of the data center, and many data centers are moving to reduce the oversubscription that was commonly designed into the edge. The blade switch, while addressing the cabling issue, does so at the expense of reintroducing oversubscription.

Another key impact of the blade switch is that, in addition to oversubscription, it introduces latency into the data center. While many switch vendors are introducing new switches that are purpose-built for low latency, adding another switching tier that increases latency may not be acceptable to customers running applications that benefit from lower network latency. Both effects are illustrated in the sketch below.

Clearly, while virtualization is a powerful and disruptive enabling technology, the addition of the virtual switch layer has added complexity to the network.
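The following sketch puts rough numbers on the two effects just described. It is our illustration: the port counts, uplink capacities and per-hop latencies are assumptions chosen for the example, not vendor specifications or measurements.

```python
# Oversubscription at a hypothetical blade switch: 16 blades, each with
# 2 x 1GbE downlinks, sharing 8 Gbps of uplink capacity. Assumed values.
blades        = 16
gbe_per_blade = 2
uplinks_gbps  = 8

downlink_gbps = blades * gbe_per_blade
print(f"blade switch oversubscription: {downlink_gbps / uplinks_gbps:.0f}:1")
# 4:1 in this example

# Latency accumulates on every store-and-forward hop. Per-hop figures
# below are assumed, illustrative values only.
per_hop_us = {"virtual switch": 25, "blade switch": 4, "ToR": 4,
              "aggregation": 5, "core": 5}
path_5_tier = ["virtual switch", "blade switch", "ToR", "aggregation",
               "core", "aggregation", "ToR", "blade switch",
               "virtual switch"]
print(f"one-way latency, 5-tier rack-to-rack path: "
      f"{sum(per_hop_us[hop] for hop in path_5_tier)} us")
# 81 us in this example; removing the blade tier saves two hops
```

The exact figures matter less than the structure: every tier added at the edge both divides the available uplink bandwidth and adds two hops to most paths.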
The Extreme Networks Approach: Reduce the Tiers

While cost and virtualization are driving higher consolidation, the network architecture of the data center is evolving toward a more complex model. There are several ways that this additional complexity can be avoided, and in fact eliminated, while still retaining the benefits of virtualization and consolidation.

The first step in simplifying the data center network is to eliminate the blade switch tier. Getting rid of the blade switch eliminates oversubscription at that layer, reduces end-to-end latency and eliminates heterogeneity between switching layers in the network. However, the alternative to the blade switch, the pass-through blade, introduces wiring and cabling complexity that is hard to deal with.

Extreme Networks has introduced solutions that address the wiring challenges of the pass-through blade without introducing the oversubscription, latency and management overhead of a blade switch. The BlackDiamond 8900-G96T-C blade incorporates 96 Gigabit copper connections on a single I/O switch module via MRJ21 connectors. Each MRJ21 connector accepts an MRJ21 cable, which aggregates 6 Ethernet cables into one. The other end of the MRJ21 cable can plug directly into the front-panel ports of a pass-through blade using RJ45 connectors, or into a passive patch panel in a ToR configuration using an MRJ21 connector. In this model a pass-through blade can be used in the blade server enclosure. By virtue of the MRJ21 cable, a 6:1 cable consolidation ratio is achieved (the 96 ports of the G96T module, for example, can be served by 16 MRJ21 cables rather than 96 individual runs), significantly reducing the pain of cable sprawl. Furthermore, by connecting the blade servers via the pass-through module directly to the Ethernet ports of the G96T module in an EoR configuration, we eliminate not only the blade switch but also the active ToR switch. Effectively, two distinct switching tiers in the network have been collapsed.

The advantages of this solution are significant:

1. Directly attaching the servers to the EoR switch eliminates oversubscription at the blade switch and ToR layers.
2. It eliminates the management overhead of dealing with blade switches and heterogeneity across switching layers. It also eliminates the conflict between server and network management organizations when it comes to managing and troubleshooting problems related to the blade switch.
3. It eliminates the additional switching latencies associated with the blade switch and ToR switch.
4. It reduces cabling overhead and management issues.
5. It reduces power consumption in the data center.

The second step where complexity can be reduced is the virtual switch tier. As described earlier, the virtual switch introduces a new level of complexity in terms of management, scaling and performance. Multiple approaches are being examined in the industry to address these issues, including eliminating the virtual switch and replacing it with a Virtual Ethernet Port Aggregator (VEPA). The VEPA is a layer in the hypervisor that takes traffic from all the VMs and simply forwards it out to the external switch. The external network switch then forwards traffic between VMs on the same server as well as between servers, as the sketch following this section illustrates.
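One way to see the difference between a virtual switch and a VEPA is in where the forwarding decision lives. The toy model below is our sketch; the class and method names are hypothetical and do not represent any real hypervisor or switch API. It shows that a VEPA sends everything, including local VM-to-VM traffic, to the adjacent physical switch, which must then reflect ("hairpin") such frames back out the port they arrived on, which is why hardware support on the external switch is needed.

```python
# Toy forwarding models; all names are hypothetical illustrations.

class VirtualSwitch:
    """Classic vSwitch: switches local VM-to-VM traffic in the host CPU."""
    def __init__(self, local_vms):
        self.local_vms = set(local_vms)

    def forward(self, src, dst):
        if dst in self.local_vms:
            return "delivered locally in the hypervisor (consumes host CPU)"
        return "sent to the uplink toward the physical switch"

class Vepa:
    """VEPA: no local switching; every frame goes to the external switch."""
    def forward(self, src, dst):
        return "sent to the uplink; the external switch decides"

class ExternalSwitch:
    """Adjacent physical switch. Hairpin forwarding reflects a frame back
    out its ingress port so two VMs on the same server can talk."""
    def forward(self, ingress_port, egress_port):
        if ingress_port == egress_port:
            return "hairpin: reflected out the ingress port (needs hardware support)"
        return "normal forwarding between ports"

vswitch, vepa, phys = VirtualSwitch({"vm1", "vm2"}), Vepa(), ExternalSwitch()
print(vswitch.forward("vm1", "vm2"))  # local, in software
print(vepa.forward("vm1", "vm2"))     # always out to the wire
print(phys.forward("p1", "p1"))       # hairpin on the physical switch
```

Pushing the decision to the physical switch is what restores the network administrator's visibility and tooling, at the cost of requiring hairpin-capable hardware at the edge.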
There are several advantages to the VEPA approach: it brings networking between VMs back into the network administrator's domain and out of the server administrator's domain; it leverages the advanced capabilities of the network switch, such as ACLs, QoS and rate limiting, without any degradation in performance; and it allows traditional network functionality, such as port mirroring, to be used to troubleshoot traffic forwarding issues between VMs on the same server. However, the VEPA approach is not yet standardized, and it requires hardware support on the external network switch to forward traffic between VMs on the same server at wire speed. The Extreme Networks switch portfolio supports VEPA-type functionality in hardware, which may be enabled in the future through a firmware upgrade.

For now, however, the recommended network approach is to limit the virtual switch to basic forwarding functionality and not use it for advanced functionality such as ACLs, QoS, rate limiting, etc. By limiting the virtual switch to simple forwarding, CPU performance is kept predictable and CPU cycles remain available for application performance. In conjunction, any advanced network functionality associated with VMs should be implemented in the physical network infrastructure.

Extreme Networks has developed capabilities to address virtualization requirements within its network switches. For example, the products support the ability to assign various traffic and security parameters to a virtual machine and to automatically move those parameters as the virtual machine moves from server to server. By moving the burden of enforcing advanced switching functionality from the server CPU to the network, the server CPU is freed up to provide more cycles and predictable performance to VMs, and this functionality can be enforced in the network at wire speed with no performance degradation. Latency in the virtual switch tier can also be reduced by limiting the virtual switch to basic forwarding. In addition, by leveraging the network, network administrators have greater control over the performance, management and troubleshooting of the network. Finally, in a heterogeneous virtualization environment, moving functionality from the virtual switch to the network switch allows more uniform enforcement of policies with no additional training, cost or overhead.
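The "parameters follow the VM" capability described above can be sketched as a policy table keyed on the VM's identity rather than on a fixed physical port. The sketch below is ours and uses hypothetical names throughout; it does not represent Extreme Networks' actual implementation or CLI, only the general technique.

```python
# Minimal sketch of "policy follows the VM": profiles are keyed on the
# VM's MAC address, not on a physical port, so a live migration simply
# re-binds the same profile to a new port. Hypothetical names throughout.

policies = {
    # VM MAC address    -> profile applied wherever the VM attaches
    "00:50:56:aa:01:01": {"vlan": 10, "acl": "web-dmz", "rate_mbps": 500},
    "00:50:56:aa:01:02": {"vlan": 20, "acl": "db-only", "rate_mbps": 200},
}

port_bindings = {}  # physical switch port -> set of VM MACs seen on it

def vm_seen(port, mac):
    """Called when a VM's MAC appears on a switch port (for example,
    after a live migration): bind its profile to that port."""
    port_bindings.setdefault(port, set()).add(mac)
    profile = policies.get(mac)
    if profile:
        print(f"port {port}: apply {profile} for {mac}")

vm_seen("1:12", "00:50:56:aa:01:01")  # VM boots on port 1:12
vm_seen("2:07", "00:50:56:aa:01:01")  # same VM migrates; policy follows
```

Because the policy is anchored to the VM rather than the port, enforcement stays consistent across migrations and across heterogeneous hypervisors, which is the uniformity argument made above.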
Summary

By leveraging an Extreme Networks infrastructure, the data center network architecture can be considerably simplified, reducing cost and improving performance. Multiple layers in the network architecture can be eliminated, leading to more bandwidth available to servers, reduced end-to-end latency and simplified management. In addition, by relying on an Extreme Networks infrastructure rather than adding more functionality to the virtual switch tier, advanced switching functionality can be applied consistently in a heterogeneous environment with no performance impact. The server CPU is freed from work that the network can carry out at wire speed, and the network takes on an agile, dynamic character that complements the dynamism virtualization brings to the server space. Lastly, the Extreme Networks switch portfolio offers a smooth migration to standards-based switching for virtual machines without requiring forklift upgrades or tying the network to any proprietary architecture.

www.extremenetworks.com

Corporate and North America: Extreme Networks, Inc., 3585 Monroe Street, Santa Clara, CA 95051 USA, Phone +1 408 579 2800
Europe, Middle East, Africa and South America: Phone +31 30 800 5100
Asia Pacific: Phone +852 2517 1123
Japan: Phone +81 3 5842 4011

© 2009 Extreme Networks, Inc. All rights reserved. Extreme Networks, the Extreme Networks logo and BlackDiamond are either registered trademarks or trademarks of Extreme Networks, Inc. in the United States and/or other countries. All other trademarks are the trademarks of their respective owners. Specifications are subject to change without notice. 1618_01 12/09