VMware Virtual SAN 6.2 Network Design Guide
TECHNICAL WHITE PAPER
APRIL 2016
Contents
Intended Audience
Overview
Virtual SAN Network
Physical network infrastructure
Data center network
Oversubscription considerations
Host network adapter
Virtual network infrastructure
VMkernel network
Virtual Switch
NIC teaming
Multicast
Network I/O Control
Jumbo Frames
Switch Discovery Protocol
Network availability
Conclusion
About the Author
Appendix
Multicast configuration examples
References
Intended Audience

This document is targeted toward virtualization, network, and storage architects interested in deploying VMware Virtual SAN solutions.

Overview

Virtual SAN is a hypervisor-converged, software-defined storage solution for the software-defined data center. It is the first policy-driven storage product designed for VMware vSphere environments that simplifies and streamlines storage provisioning and management. Virtual SAN is a distributed, shared storage solution that enables the rapid provisioning of storage within VMware vCenter Server as part of virtual machine creation and deployment operations.

Virtual SAN uses the concept of disk groups to pool locally attached flash devices and magnetic disks into management constructs. Disk groups are composed of at least one cache device and one or more magnetic or flash capacity devices. In hybrid architectures, flash devices are used as a read cache and write buffer in front of the magnetic disks to optimize virtual machine and application performance. In all-flash architectures, the endurance of the cache device is leveraged to allow lower cost capacity devices. The Virtual SAN datastore aggregates the disk groups across all hosts in the Virtual SAN cluster to form a single shared datastore for all hosts in the cluster.

Virtual SAN requires a correctly configured network for virtual machine I/O as well as for communication among cluster nodes. Because the distributed storage architecture means the majority of virtual machine I/O travels across the network, a high-performance and highly available network configuration is critical to a successful Virtual SAN deployment.

This paper gives a technology overview of Virtual SAN network requirements and provides Virtual SAN network design and configuration best practices for deploying a highly available and scalable Virtual SAN solution.

Virtual SAN Network

The hosts in a Virtual SAN cluster must be part of a Virtual SAN network and must be on the same subnet regardless of whether or not the hosts contribute storage.
Virtual SAN requires a dedicated VMkernel port type and uses a proprietary transport protocol for Virtual SAN traffic between the hosts. The Virtual SAN network is an integral part of the overall vSphere network configuration and therefore cannot work in isolation from other vSphere network services. Virtual SAN utilizes either the VMware vSphere Standard Switch (VSS) or the VMware vSphere Distributed Switch (VDS) to construct a dedicated storage network. However, Virtual SAN and other vSphere workloads commonly share the underlying virtual and physical network infrastructure. Therefore, the Virtual SAN network must be carefully designed following general vSphere networking best practices in addition to its own.

The following sections review general guidelines that should be followed when designing the Virtual SAN network. These recommendations do not conflict with general vSphere network design best practices.

Physical network infrastructure

Data center network

The traditional access-aggregation-core, three-tier network model was built to serve north-south traffic in and out of a data center. While the model offers great redundancy and resiliency, it limits overall bandwidth by as much as 50% because critical network links are blocked by the Spanning Tree Protocol (STP) to prevent network loops. As virtualization and cloud computing evolve, more data centers have adopted the leaf-spine topology for data center fabric simplicity, scalability, bandwidth, fault tolerance, and quality of service (QoS). Virtual SAN is compatible with both topologies regardless of how the core switch layer is constructed.

Oversubscription considerations

East-west traffic and throughput concerns

VMware Virtual SAN requires low latency and ample throughput between the hosts, as reads may come from any host in the cluster and writes must be acknowledged by two hosts. For simple configurations utilizing modern, wire-speed, top-of-rack switches, this is a relatively simple consideration because all ports can communicate at wire speed with all other ports. As clusters are stretched across
data centers (perhaps using the Virtual SAN fault domains feature), the potential for oversubscription becomes a concern. Typically, the largest demand for throughput is during a host rebuild or host evacuation, as potentially all hosts may be sending and receiving traffic at wire speed to reduce the time of the operation. The larger the capacity consumed on each host, the more important the oversubscription ratio becomes. A host with only 1Gbps of bandwidth and 12TB of capacity would take over 24 hours to refill with data.

Leaf-spine

In a traditional leaf-spine architecture, due to the full mesh topology and port density constraints, leaf switches are normally oversubscribed for bandwidth. For example, a fully utilized 10GbE uplink used by the Virtual SAN network may in reality achieve only 2.5Gbps of throughput per node when the leaf switches are oversubscribed at a 4:1 ratio and Virtual SAN traffic needs to cross the spine, as illustrated in Figure 1. The impact of network topology on available bandwidth should be considered when designing your Virtual SAN cluster.

The leaf switches are fully meshed to the spine switches with links that can be either switched or routed; these are referred to as Layer 2 and Layer 3 leaf-spine architectures, respectively. Virtual SAN over Layer 3 networks is currently supported.

VMware Recommends: Consider using Layer 2 multicast for simplicity of configuration and operations.
Figure 1. Bandwidth oversubscription for the Virtual SAN network in a leaf-spine architecture

Here is an example of how oversubscription can impact rebuild times (the arithmetic is sketched in the example below). Let us assume the above design is used with 3 fault domains, and data is being mirrored between cabinets. In this example each host has 10TB of raw capacity, with 6TB of it used for virtual machines protected by FTT=1. We will also assume that three quarters (or 30Gbps) of the available bandwidth is available for rebuild. Assuming no disk contention bottlenecks, it would take approximately 26 minutes to rebuild over the oversubscribed link. If the capacity needing to be rebuilt increased to 12TB and the bandwidth was reduced to only 10Gbps, the rebuild would take at a minimum 156 minutes. Any time capacity increases or bandwidth between hosts decreases, rebuild times become longer.

VMware Recommends: Minimize oversubscription to reduce opportunities for congestion during host rebuilds or high throughput operations.
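The rebuild arithmetic above generalizes easily. The following Python sketch is illustrative only; the capacities, bandwidths, and 4:1 oversubscription ratio are assumptions taken from the examples in this section, and real rebuild times also depend on disk contention and other cluster activity.

# Minimal sketch: estimate effective bandwidth and rebuild time (assumptions noted above).

def effective_bandwidth_gbps(uplink_gbps, oversubscription_ratio):
    # A 10GbE uplink behind a 4:1 oversubscribed leaf yields roughly 2.5Gbps per host.
    return uplink_gbps / oversubscription_ratio

def rebuild_minutes(capacity_tb, bandwidth_gbps):
    # Convert terabytes to bits, divide by available bits per second, report minutes.
    bits = capacity_tb * 8 * 10**12
    seconds = bits / (bandwidth_gbps * 10**9)
    return seconds / 60

print(effective_bandwidth_gbps(10, 4))   # ~2.5 Gbps per host
print(rebuild_minutes(6, 30))            # ~27 minutes, matching the ~26 minute example
print(rebuild_minutes(12, 10))           # ~160 minutes, in line with the ~156 minute example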
ECMP

A number of vendors have implemented Ethernet fabrics that eliminate the need for Spanning Tree to prevent loops and that employ Layer 2 routing mechanisms to use the shortest paths, as well as supplemental paths, for added throughput. SPB (Shortest Path Bridging) or TRILL (Transparent Interconnection of Lots of Links) are commonly used, often with proprietary extensions. Virtual SAN is compatible with these topologies, but be sure to design for adequate east-west bandwidth within each Virtual SAN cluster.

Cisco FEX/Nexus 2000

It should be noted that fabric extender devices such as the Cisco Nexus 2000 product line have unique considerations. These devices cannot switch traffic directly from port to port on the same device; all traffic must travel up the uplink to the parent Nexus 5000 or 7000 series device and back down. While this increases port-to-port latency, the larger concern is that high-throughput operations (such as a host rebuild) can put pressure on the oversubscribed uplinks back to the parent switch.

Non-stacked top-of-rack switches and Cisco Fabric Interconnects

VMware Recommends: Deploy all hosts within a fault domain to a low-latency, wire-speed switch or switch stack. When multiple switches are used, pay attention to the throughput of the links between switches. Deployments with limited or heavily oversubscribed inter-switch throughput should be carefully considered.

Flow Control

Pause frames are part of Ethernet flow control and are used to manage the pacing of data transmission on a network segment. Sometimes, a sending node (ESXi/ESX host, switch, etc.) may transmit data faster than another node can accept it. In this case, the overwhelmed network node can send
pause frames back to the sender, pausing the transmission of traffic for a brief period of time. Virtual SAN manages congestion by introducing artificial latency to prevent cache/buffer exhaustion. Since Virtual SAN has built-in congestion management, disabling flow control on VMkernel interfaces tagged for Virtual SAN traffic is recommended. Note that flow control is enabled by default on all physical uplinks. For further information on flow control, see the related VMware Knowledge Base article.

VMware Recommends: Disable flow control for Virtual SAN traffic.

Security considerations

VMware Virtual SAN traffic, like other IP storage traffic, is not encrypted and should be deployed on isolated networks. VLANs can be leveraged to securely separate Virtual SAN traffic from virtual machine and other networks. Security can also be added at a higher layer by encrypting data in-guest in order to meet security and compliance requirements.

Host network adapter

On each Virtual SAN cluster node, the following practices should be applied:

• At least one physical NIC must be used for the Virtual SAN network. One or more additional physical NICs are recommended to provide failover capability.
• The physical NIC(s) can be shared with other vSphere networks such as the virtual machine network and the vMotion network. Logical Layer 2 separation of Virtual SAN VMkernel traffic (VLANs) is recommended when physical NIC(s) carry multiple traffic types. QoS can be provided for the different traffic types via Network I/O Control (NIOC).
• A 10GbE or faster NIC is strongly recommended for Virtual SAN and is a requirement for all-flash Virtual SAN. If a 1GbE NIC is used for hybrid configurations, VMware recommends that it be dedicated to Virtual SAN. Speeds greater than 10Gbps, such as 25/40/100Gbps, are supported as long as your edition of vSphere supports them.
Virtual network infrastructure

VMkernel network

A new VMkernel traffic type called Virtual SAN traffic is introduced in vSphere for Virtual SAN. Each cluster node must have this VMkernel port configured in order to participate in a Virtual SAN cluster. This is true even for nodes that do not contribute storage to Virtual SAN. For each cluster, a VMkernel port group for Virtual SAN should be created on the VSS or VDS, and the same port group network label should be used on every host to ensure labels are consistent across the cluster. Unlike multiple-NIC vMotion, Virtual SAN does not support multiple VMkernel adapters on the same subnet.

Virtual Switch

VMware Virtual SAN supports both VSS and VDS virtual switches. Note that VDS licensing is included with VMware Virtual SAN, so licensing should not be a consideration when choosing a virtual switch type. Because VDS is required for dynamic LACP (Link Aggregation Control Protocol), LBT (Load Based Teaming), LLDP (Link Layer Discovery Protocol), bidirectional CDP (Cisco Discovery Protocol), and Network I/O Control (NIOC), VDS is preferred for its superior performance, operational visibility, and management capabilities.

VMware Recommends: Deploy VDS for use with VMware Virtual SAN.

vCenter and VDS considerations

VMware fully supports deploying the vCenter Server that manages a Virtual SAN cluster on top of that same cluster's storage. Starting with vSphere 5.x, static binding became the default port binding type for VDS port groups, and a virtual machine's port assignment persists through a reboot. In the event the vCenter Server virtual machine cannot be bound to a VDS port, a pre-created ephemeral port group or a VSS can be leveraged to restore access to the vCenter Server.
NIC teaming

The Virtual SAN network can use a teaming and failover policy to determine how traffic is distributed between physical adapters and how traffic is rerouted in the event of adapter failure. NIC teaming is used mainly for high availability rather than load balancing when the team is dedicated to Virtual SAN. However, additional vSphere traffic types sharing the same team can still leverage the aggregated bandwidth by distributing different types of traffic to different adapters within the team. Virtual SAN supports all NIC teaming options supported by VSS and VDS.

Load Based Teaming

Route based on physical NIC load, also known as Load Based Teaming (LBT), allows vSphere to balance load across multiple NICs without a custom switch configuration. It initially assigns traffic in the same way as the route based on originating virtual port ID policy, but it dynamically reassesses physical-to-virtual NIC bindings every 30 seconds based on congestion thresholds. To prevent impact when bindings change, settings such as Cisco's PortFast or HP's admin-edge-port should be configured on the physical switch ports facing the ESXi hosts. With these settings, network convergence on the switch ports happens quickly after a failure because the port enters the Spanning Tree forwarding state immediately, bypassing the listening and learning states. Additional information on the different teaming policies can be found in the vSphere networking documentation.

IP Hash Policy

Another option is the IP hash-based policy. Under this policy, Virtual SAN, either alone or together with other vSphere workloads, is capable of balancing load between adapters within a team, although there is no guarantee of performance improvement for all configurations. While Virtual SAN does initiate multiple connections, there is no deterministic balancing of traffic (a simplified illustration of how the hash maps traffic to uplinks follows this section). This policy requires the physical switch ports to be configured for a port link aggregation technology or port-channel architecture such as Link Aggregation Control Protocol (LACP) or EtherChannel. Only static mode EtherChannel is supported with the vSphere Standard Switch. LACP is supported only with the vSphere Distributed Switch.

VMware Recommends: Use Load Based Teaming for load balancing, and ensure appropriate Spanning Tree port configurations are taken into account.
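To illustrate why IP hash provides no deterministic balancing, the following simplified Python sketch shows how a hash of the source and destination addresses selects an uplink for each connection. This is an illustration of the general technique, not the exact algorithm ESXi uses; the addresses are hypothetical, and with a small number of hosts many address pairs can map to the same uplink.

# Simplified illustration of IP-hash uplink selection (not the exact ESXi algorithm).
import ipaddress

def select_uplink(src_ip, dst_ip, uplink_count):
    # Hash the source and destination addresses together, then pick an uplink index.
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count

# Two connections from the same host on an illustrative Virtual SAN subnet.
print(select_uplink("192.168.10.11", "192.168.10.12", 2))  # lands on one of the two uplinks
print(select_uplink("192.168.10.11", "192.168.10.13", 2))  # may land on the same uplink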
Multicast

IP multicast sends source packets to multiple receivers as a group transmission. Packets are replicated in the network only at points of path divergence, normally switches or routers, resulting in the most efficient delivery of data to a number of destinations with minimum network bandwidth consumption. For examples of multicast configuration, please see the Layer 2/Layer 3 network topologies white paper.

Virtual SAN uses multicast to deliver metadata traffic among cluster nodes for efficiency and bandwidth conservation. Multicast is required for the VMkernel ports utilized by Virtual SAN. While Layer 3 is supported, Layer 2 is recommended to reduce complexity. All VMkernel ports on the Virtual SAN network subscribe to a multicast group using the Internet Group Management Protocol (IGMP); a short sketch of what an IGMP group join looks like follows this section. IGMP snooping configured with an IGMP snooping querier can be used to limit the physical switch ports participating in the multicast group to only the Virtual SAN VMkernel port uplinks. The need to configure an IGMP snooping querier to support IGMP snooping varies by switch vendor; consult your specific switch vendor/model best practices for IGMP snooping configuration. If deploying a Virtual SAN cluster across multiple subnets, be sure to review best practices and limitations in scaling Protocol Independent Multicast (PIM) dense or sparse mode.

A default multicast address is assigned to each Virtual SAN cluster at the time of creation. When multiple Virtual SAN clusters reside on the same Layer 2 network, the default multicast address should be changed within the additional Virtual SAN clusters to prevent multiple clusters from receiving all multicast streams. Similarly, multicast address ranges must be carefully planned in environments where other network services such as VXLAN also utilize multicast. The related VMware Knowledge Base article can be consulted for the detailed procedure for changing the default Virtual SAN multicast address. More simply, isolating each cluster's traffic on its own VLAN removes the possibility of conflict.

VMware Recommends: Isolate each Virtual SAN cluster's traffic on its own VLAN when using multiple clusters.
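To make the IGMP group membership described above concrete, the short Python sketch below joins a multicast group on a chosen interface, which is essentially what each Virtual SAN VMkernel port does when it subscribes via IGMP. The group address, interface address, and port are illustrative assumptions only; Virtual SAN joins its multicast groups automatically and requires no manual socket configuration.

# Minimal sketch of IGMP group membership (illustrative assumptions; Virtual SAN
# handles its own group membership and needs no manual configuration like this).
import socket
import struct

GROUP = "224.1.2.3"          # assumed example multicast group address
INTERFACE = "192.168.10.11"  # assumed address of the local VMkernel-facing interface
PORT = 23451                 # assumed example UDP port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Joining the group causes the host to send an IGMP membership report; switches
# running IGMP snooping use these reports to forward the group only to member ports.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton(INTERFACE))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, sender = sock.recvfrom(4096)  # blocks until a datagram arrives for the group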
Network I/O Control

vSphere Network I/O Control (NIOC) can be used to set quality of service (QoS) for Virtual SAN traffic when it shares a NIC uplink on a VDS with other vSphere traffic types, including iSCSI traffic, vMotion traffic, management traffic, vSphere Replication (VR) traffic, NFS traffic, Fault Tolerance (FT) traffic, and virtual machine traffic. General NIOC best practices apply with Virtual SAN traffic in the mix (a sketch of how shares divide bandwidth under contention follows these lists):

• For bandwidth allocation, use shares instead of limits, as shares offer greater flexibility for redistributing unused capacity.
• Always assign a reasonably high relative share to the Fault Tolerance resource pool because FT is a very latency-sensitive traffic type.
• Use NIOC together with NIC teaming to maximize network capacity utilization.
• Leverage the VDS port group and traffic shaping policy features for additional bandwidth control on different resource pools.

Specifically for Virtual SAN, we make the following recommendations:

• Do not set a limit on Virtual SAN traffic; by default, it is unlimited.
• Set a relative share for the Virtual SAN resource pool based on application storage performance requirements, while also holistically taking into account other workloads such as bursty vMotion traffic that is required for business mobility and availability.
• Avoid reservations, as reserved but unused bandwidth is redistributed only to other system traffic types (vMotion, storage, and so on) and not to virtual machine networking.
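The following Python sketch illustrates the share mechanism described above: under contention, each active traffic type receives bandwidth in proportion to its share value. The share numbers and traffic names used here are assumptions for illustration, not VMware-recommended values.

# Minimal sketch of proportional bandwidth allocation by NIOC shares under contention.

def bandwidth_split_gbps(link_gbps, shares):
    # Each active traffic type gets link bandwidth in proportion to its shares
    # relative to the total shares of all active traffic types.
    total = sum(shares.values())
    return {name: link_gbps * value / total for name, value in shares.items()}

# Illustrative share values on a fully contended 10GbE uplink.
print(bandwidth_split_gbps(10, {"virtual_san": 100, "vmotion": 50, "virtual_machine": 50}))
# -> {'virtual_san': 5.0, 'vmotion': 2.5, 'virtual_machine': 2.5}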
Jumbo Frames

Virtual SAN supports jumbo frames but does not require them. VMware testing finds that jumbo frames can reduce CPU utilization and improve throughput, but both gains are minimal because vSphere already uses TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) to deliver similar benefits. In data centers where jumbo frames are already enabled in the network infrastructure, jumbo frames are recommended for Virtual SAN deployments. If jumbo frames are not currently in use, Virtual SAN alone should not be the justification for deploying them.

VMware Recommends: Use the existing MTU/frame size you would otherwise use in your environment.

Switch Discovery Protocol

Switch discovery protocols allow vSphere administrators to determine which switch port is connected to a given VSS or VDS uplink. vSphere supports Cisco Discovery Protocol (CDP) and Link Layer Discovery Protocol (LLDP). CDP is available for vSphere Standard Switches and vSphere Distributed Switches connected to Cisco physical switches; LLDP is available for vSphere Distributed Switches. When CDP or LLDP is enabled for a particular vSphere Distributed Switch or vSphere Standard Switch, you can view properties of the peer physical switch such as device ID, software version, and timeout from the vSphere Client.

VMware Recommends: Enable LLDP or CDP in both send and receive mode.

Network availability

For high availability, the Virtual SAN network should have redundancy in both physical and virtual network paths and components to avoid single points of failure. The architecture should configure all port groups or distributed port groups with at least two uplink paths using different NICs configured with NIC teaming, set a failover policy specifying the appropriate active-active or active-standby mode, and connect each NIC to a different physical switch for an additional level of redundancy.

VMware Recommends: Use redundant uplinks for Virtual SAN and all other traffic.

Conclusion

Virtual SAN network design should be approached in a holistic fashion, taking into account the other traffic types utilized in the vSphere cluster in addition to the Virtual SAN network. Other factors to consider include the physical network topology and the oversubscription posture of your physical switch infrastructure.
Virtual SAN requires at minimum a 1GbE network for hybrid clusters and a 10Gbps network for all-flash clusters. As a best practice, VMware strongly recommends a 10GbE network for Virtual SAN to avoid the possibility of network congestion leading to degraded performance. A 1GbE network can easily be saturated by Virtual SAN traffic, and teaming multiple NICs provides only availability benefits in most cases. If a 1GbE network is used, VMware recommends restricting it to smaller clusters and dedicating it to Virtual SAN traffic.

To implement a highly available network infrastructure for Virtual SAN, redundant hardware components and network paths are recommended. Switches can be configured either in uplink or stack mode, depending on switch capability and your physical switch configuration. Virtual SAN supports both vSphere Standard Switches and vSphere Distributed Switches. However, VMware recommends the use of vSphere Distributed Switches in order to realize the network QoS benefits offered by vSphere NIOC. When various vSphere network traffic types must share the same NICs as Virtual SAN, separate them onto different VLANs and use shares as a quality of service mechanism to guarantee the level of performance expected for Virtual SAN in possible contention scenarios.

About the Author

John Nicholson is a Senior Technical Marketing Manager in the Storage and Availability Business Unit. He focuses on delivering technical guidance around VMware Virtual SAN solutions. John previously worked on architecting and implementing enterprise storage and VMware solutions. Follow John on
Appendix

Multicast configuration examples

These multicast configuration examples should be used only as a reference. Consult with your switch vendor, as configuration commands may change between platforms and versions.

Cisco IOS (default is IGMP snooping on):

switch# configure terminal
switch(config)# vlan 500
switch(config-vlan)# no ip igmp snooping
switch(config-vlan)# do write memory

Brocade ICX (default is IGMP snooping off):

Switch# configure
Switch(config)# vlan 500
Switch(config-vlan-500)# multicast disable igmp snoop
Switch(config-vlan-500)# do write memory

Brocade VDX (see the Brocade VDX guide for Virtual SAN VDX configuration)

HP ProCurve (default is IGMP snooping on):

switch# configure terminal
switch(config)# vlan 500 ip igmp
switch(config)# no vlan 500 ip igmp querier
switch(config)# write memory
References

1. Virtual SAN Product Page
2. VMware Virtual SAN Hardware Guidance, Hardware-Guidance.pdf
3. VMware NSX Network Virtualization Design Guide
4. VMware Network Virtualization Design Guide
5. Understanding IP Hash Load Balancing, VMware Knowledge Base Article
6. Sample configuration of EtherChannel / Link Aggregation Control Protocol (LACP) with ESXi/ESX and Cisco/HP switches, VMware Knowledge Base Article
7. Changing the multicast address used for a VMware Virtual SAN Cluster, VMware Knowledge Base Article
8. Understanding TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) in a VMware environment, VMware Knowledge Base Article
9. IP Multicast Technology Overview, papers/mcst_ovr.pdf
10. Essential Virtual SAN: Administrator's Guide to VMware Virtual SAN, by Cormac Hogan and Duncan Epping
11. VMware Network I/O Control: Architecture, Performance and Best Practices