WHITE PAPER
Infrastructure Performance Analytics
Private Cloud Migration
Infrastructure Performance Validation Use Case
October 2012
Table of Contents

Introduction
Model of the Private Cloud Infrastructure
Load DynamiX Pre-Deployment Testing Strategy
    Pre-Test Workloads
    Boot Storm Workload Test
    Run Time Workload Test
Private Cloud Analysis
    Computational resources
    Network resources and configuration
    Storage resources and configuration
    Authentication
Examples of Test Results
    VM Boot Process
    Run Time Scenarios
Conclusions
Introduction

Private cloud infrastructures are gaining wide acceptance among enterprises and service providers. One example of this adoption is the consolidation of multiple servers and storage devices into private clouds based on Vblock (Cisco UCS + VMware vSphere + EMC VPLEX) and FlexPod (Cisco UCS + VMware vSphere + NetApp FAS). The behavior of networked storage components in such virtualized contexts must be investigated so that customers can establish the performance limits of their private clouds and optimize their configuration.

This white paper is based on actual Load DynamiX customer experiences. Users have demonstrated that a pre-production testing approach using Load DynamiX provides critical configuration information and reveals important infrastructure behavior issues before deployment. These insights save customers valuable time and help them design more robust solutions for greater performance and uptime.

This white paper describes how Load DynamiX is used to investigate a boot storm scenario as well as the run-time behavior of a virtualized application. The process generates insight into the configuration of not only the networked storage components but also the related network and compute components.

Background on Load DynamiX

Load DynamiX provides an all-in-one test solution for network storage (File, Block and Object storage) designed for storage engineers and architects in business-critical data center environments. Load DynamiX allows users to investigate new storage configurations and test the limits of their infrastructures. Users leverage the Load DynamiX test development environment (TDE) and tools to emulate their production workloads; they can then generate realistic traffic using the Load DynamiX load generation appliances. Users can thus continually evaluate new configurations or solutions.
Load DynamiX empowers customers with the insight they need to make better design and configuration decisions, so they can control their cost, performance, and risk.

Model of the Private Cloud Infrastructure

Leaving Firewall and Load Balancer out of scope, a simplified representation of the Private Cloud can be shown as follows:

[Figure: UCS chassis, Switch Fabric, and Unified Network Storage. Link labels: (NFS/CIFS/iSCSI/FCoE)/TCP/IP/Ethernet; (NFS/CIFS/iSCSI)/TCP/IP/Ethernet; FC.]
The UCS chassis can host multiple blades; each of them may host a vSphere ESXi hypervisor and multiple VMs. The hypervisors and VMs can use various network storage protocols, such as NFS, iSCSI, CIFS, and FC/FCoE. The switch connects the UCS to the Network Storage over multiple VLANs and FC interfaces. The Network Storage is often a cluster with multiple physical ports, IP addresses, and FC interfaces on the network side, providing access to the storage arrays through multiple layers of protocols and software. Performance thresholds are usually set up in the monitoring system, which is part of the Private Cloud management.

Configuring a complex system like this is no small task, so the probability of a mistake is high. Testing and verifying the private cloud, even partially, before moving it into production saves considerable effort later.

The Load DynamiX approach to pre-deployment testing is to emulate the computational resources (effectively replacing the UCS resources) with the Load DynamiX appliance and to run synthetic network storage workloads resembling the production workloads against the Network Storage and Switch Fabric.

[Figure: Load DynamiX appliance, Switch Fabric, and Unified Network Storage. Link labels: (NFS/CIFS/iSCSI)/TCP/IP/Ethernet; FC.]

The Load DynamiX Appliance can generate feature-rich, scalable, complex network storage workloads that emulate a large number of network storage clients over a set of networks and interfaces.

Load DynamiX Pre-Deployment Testing Strategy

The Load DynamiX solution is provided with a suite of pre-built test workloads designed for testing the Private Cloud.
Pre-Test Workloads

Prior to running any tests, Load DynamiX creates:
1. An emulation of the VM images. Depending on the storage type, the VM images are represented as sets of directories and files or as block ranges on the LUNs
2. Directories and files on the volumes/shares representing the objects used by VMs
3. Ranges of blocks on the LUNs representing logical disks used by VMs

Boot Storm Workload Test

Boot storm is an industry term for the simultaneous boot of multiple VMs from a single storage device. A boot storm creates an extraordinarily high load of read requests on the network storage. This load is many times greater than normal production loads.

Load DynamiX models the boot storm as a set of concurrent clients reading a significant amount of data (1 GB) from the designated parts of the Network Storage. In the case of NFS, these are previously created files representing the VM images; in the case of iSCSI or FC, these are the sub-areas of the LUNs filled with the VM image data.
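The boot storm model described above can be approximated outside the appliance for illustration. The following Python sketch is not the Load DynamiX implementation. It assumes that the read amount applies per client, uses local files as stand-ins for VM images on an NFS volume, and introduces hypothetical names (`make_images`, `read_image`, `boot_storm`) and an arbitrary request size.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # bytes per read request; an arbitrary choice for this sketch


def make_images(directory, count, size_bytes):
    """Pre-test step: create placeholder image files (hypothetical naming)."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"vm{i}.img")
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
        paths.append(path)
    return paths


def read_image(path, limit_bytes):
    """Read up to limit_bytes sequentially from one emulated VM image."""
    done = 0
    with open(path, "rb") as f:
        while done < limit_bytes:
            data = f.read(min(CHUNK, limit_bytes - done))
            if not data:  # the image is shorter than the limit
                break
            done += len(data)
    return done


def boot_storm(image_paths, limit_bytes):
    """Run one reader thread per image; return (total bytes read, seconds).

    Expects at least one path, since the pool needs at least one worker.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(image_paths)) as pool:
        totals = list(pool.map(lambda p: read_image(p, limit_bytes), image_paths))
    return sum(totals), time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        images = make_images(d, count=8, size_bytes=1024 * 1024)
        total, seconds = boot_storm(images, limit_bytes=1024 * 1024)
        print(f"read {total} bytes from {len(images)} images in {seconds:.3f} s")
```

Reads of freshly written local files are served largely from the operating system's page cache, so the timing printed here says nothing about networked storage. The sketch only shows the shape of the workload: many readers started together, each streaming its own image.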
[Figure: An example of the Boot Storm Workload Test run from the Load DynamiX Appliance. The VM images are stored on the NFS volumes.]

The test provides the following insights:
- Reveals hidden bottlenecks in the switch/storage configuration
- Allows for evaluation of the optimal cache and boot concurrency settings in order to optimize the boot process

There is a special kind of boot storm known as a VDI boot storm. Its specifics are that the VM images are very similar to each other, and therefore deduplicable, and that the boot is often accompanied by an authentication process. Load DynamiX accommodates these specifics in its workloads.
Run Time Workload Test

This test exercises the Network Storage with a mix of Read/Write/Meta Data operations. The operations are directed both at the VM images, through the Hypervisors, and at the objects accessed by the guest OSs: the files previously created over CIFS and NFS, or the logical iSCSI/FC disks. The Run Time Workload tests are also designed to test and stress the authentication infrastructure.

An example of a Private Cloud utilizing NFS, iSCSI, and CIFS protocols is shown in the figure below. The Authentication Servers are usually located on a UCS blade and run as VMs.

[Figure: Run Time Workload Test. The Load DynamiX Appliance emulates the Hypervisors and Guest OSs and runs scenarios over the application networks (iSCSI, NFS, CIFS) against the Network Storage and the Authentication Server. Scenarios: iSCSI Read (blocks); iSCSI Write (blocks); NFS Read (swap); NFS Write (swap); NFS Read (image); NFS Write (image); CIFS Session Setup, Read, Write, Create, Delete, etc. (App Files). Storage objects: iSCSI LUNs/blocks; NFS VM*.vmdk and VM*.vswp; CIFS shares F_App1, F_App2, ... holding File1, File2, ...]
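One way to think about such a run-time mix is as a weighted table of scenarios from which each emulated client draws its next operation. The sketch below is illustrative only: the scenario names follow the figure, the weights are made-up placeholders rather than measured production ratios, and `pick_operations` and `observed_mix` are hypothetical helpers, not part of the Load DynamiX TDE.

```python
import random
from collections import Counter

# Scenario names follow the figure; the weights are placeholders, not measurements.
SCENARIO_WEIGHTS = {
    "iSCSI Read (blocks)": 20,
    "iSCSI Write (blocks)": 10,
    "NFS Read (swap)": 10,
    "NFS Write (swap)": 10,
    "NFS Read (image)": 25,
    "NFS Write (image)": 15,
    "CIFS (app files)": 10,
}


def pick_operations(weights, count, seed=None):
    """Draw `count` scenario names with probability proportional to weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=count)


def observed_mix(operations):
    """Return the fraction of draws that fell on each scenario."""
    counts = Counter(operations)
    return {name: counts[name] / len(operations) for name in counts}


if __name__ == "__main__":
    ops = pick_operations(SCENARIO_WEIGHTS, count=10_000, seed=1)
    for name, share in sorted(observed_mix(ops).items()):
        print(f"{name:24s} {share:.3f}")
```

Fixing the seed makes a run repeatable, which is convenient when two storage configurations are to be compared under the same sequence of operations.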
The ratio of the I/O operations, their sizes, and their content are configurable in order to emulate the production workloads. The number of concurrently active VMs can be scaled up to thousands.

This test reveals:
- Performance limits
- Potential misconfigurations of the network and/or the network storage
- Potential interference between the multiple interfaces
- Acceptable number of simultaneously running VMs
- Functioning of monitoring system alarms

Private Cloud Analysis

The prebuilt workloads outlined above provide a number of open parameters that allow customization for the specific Private Cloud under test. These parameters and settings are used throughout the test suite and can be broken down into three major groups:
- Computational resources parameters and settings
- Network resources parameters and settings
- Configuration parameters and settings

Computational resources

- Number of UCS blades
- Number of VMs
- Distribution of VM size
- Type of storage used for VM images (FC, NFS, iSCSI)
- Types of external storage used by VMs
    - FC
    - iSCSI drives
    - NFS mounts
    - CIFS shares

An example of the VM size distribution is shown in the figure below. The horizontal axis represents the file size on a logarithmic scale.
[Figure: VM File Size Distribution. Horizontal axis: VM size (GB), logarithmic scale from 1 to 1000. Vertical axis: Number of files, 0 to 70. Annotations: typical VM sizes; large size outliers.]

Network resources and configuration

- Network configuration and IP addresses allocated to the Hypervisors
- IP addresses of the guest OSs running on the VMs. These IP addresses will be used by the Load DynamiX clients representing VMs and hypervisors

Storage resources and configuration

The IP addresses of the Network Storage servers, the volumes and shares, and the initiator and target names are open parameters of the Load DynamiX workload tests. Additional information on the specifics of the production workloads can be used as an input to the Load DynamiX workload:
- Topology of the directory structure
- Number of files and their size distribution
- Ratio of the Read/Write/Meta Data operations
- Degree of compressibility and de-duplicability of the file contents

Authentication

In order to test the authentication infrastructure as a part of the Private Cloud, user names and passwords should be created for testing purposes and configured in Load DynamiX.

Examples of Test Results

Executing the Load DynamiX workloads in the Private Cloud environment reveals many issues that would otherwise be overlooked and uncovered only later, in production. The first two charts present an example of optimizing the VM boot process and the discovery of a frame size misconfiguration during testing.

VM Boot Process

Fig. R1 shows the number of active VMs (VMs that have completed the boot process) as a function of time for several values of Startup Delay. The Startup Delay is the pause between the starts of two consecutive boot processes. Time is counted from the start of the boot of the first VM.
[Fig. R1. VM Boot Process for Various Startup Delays. Horizontal axis: Time (min), 0 to 140. Vertical axis: Number of Active VMs, 0 to 450. Curves: Startup Delay = 0.1 s, 10 s, 13 s, 20 s.]

Configuring an optimal Startup Delay: Fig. R1 illustrates that a Startup Delay that is too short (0.1 s, red line) leads to a longer overall boot process than the optimal delay of 13 s (green line). A Startup Delay that is too long (20 s, orange line) also results in a suboptimal completion time for the boot process.
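Part of the cost of a long Startup Delay follows from simple arithmetic: with a fixed pause between consecutive boot starts, the last of N VMs cannot start before (N - 1) × delay has elapsed, so the launch schedule alone puts a floor under the total boot time. The sketch below computes that floor for the delays named in Fig. R1. The VM count of 400 is an assumption chosen only to be of the same order as the vertical axis of the chart, and the function name is hypothetical. The sketch does not model the storage contention that makes a very short delay slow.

```python
def launch_window_minutes(n_vms, startup_delay_s):
    """Minutes between the first and the last boot start.

    The last VM cannot finish booting earlier than this, so the value is a
    lower bound on the total boot time implied by the schedule alone.
    """
    if n_vms < 1:
        raise ValueError("n_vms must be at least 1")
    return (n_vms - 1) * startup_delay_s / 60.0


if __name__ == "__main__":
    N_VMS = 400  # assumed for illustration; not a figure taken from the test
    for delay in (0.1, 10, 13, 20):
        print(f"delay {delay:>4} s -> last boot starts after "
              f"{launch_window_minutes(N_VMS, delay):6.1f} min")
```

Under that assumption, a 20 s delay keeps the launch schedule open for 133 minutes no matter how fast the storage is, whereas at 0.1 s the schedule takes under a minute and the storage itself becomes the limit.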
[Fig. R2. VM Boot Process for different frame sizes. Horizontal axis: Time (min), 0 to 60. Vertical axis: Number of active VMs, 0 to 2000. Curves: MTU = 1500 B, MTU = 8000 B.]

Configuring frame size: Fig. R2 shows the boot process over NFS of a large number of VMs using regular Ethernet frames (green line) and jumbo frames (red line). At around 43 min into the process, traffic slows to a crawl in the jumbo frame case. This test revealed a misconfiguration of the switch, which was caught prior to release to production.

Run Time Scenarios

In these scenarios, Load DynamiX is used to test and configure the responsiveness of the virtualized application. Load DynamiX workloads emulate VMs updating business data over networked storage. Load DynamiX measures the server response times to commands initiated by the VMs. These response times are directly related to the user experience and characterize the performance of the network storage better than a simple measurement of IOPS on the storage device would.

The response time for Write/Read operations over NFS depends on the number of simultaneously working VMs and on the size of the VM image file. Fig. R3 shows that the larger the VM image, the more significant the slowdown experienced by a VM as the number of VMs grows.
[Fig. R3. NFS Write/Read (32 KB) Response Time vs Number of Files of different sizes S. Horizontal axis: Number of active VMs, 0 to 250. Vertical axis: Read response time (ms), 0 to 6. Curves: S > 100 GB; 20 GB < S < 40 GB; 2 GB < S < 10 GB.]

The information presented in Fig. R3 allows for proper configuration of the VM image size and the number of concurrently running VMs in the Private Cloud against the expected performance benchmarks.
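Reading a chart such as Fig. R3 against a response-time target amounts to finding, for each image size class, the largest number of active VMs whose measured response time still meets the target. A minimal sketch of that lookup follows. The sample points and the 3 ms target are invented placeholders, not values read from Fig. R3, and `max_vms_within_target` is a hypothetical helper.

```python
def max_vms_within_target(samples, target_ms):
    """Return the largest VM count whose response time meets the target.

    `samples` is a list of (active_vms, response_ms) pairs. Counts are
    checked in ascending order and the scan stops at the first violation,
    so an isolated good reading beyond a bad one is not trusted.
    Returns 0 if even the smallest measured count misses the target.
    """
    best = 0
    for vms, response_ms in sorted(samples):
        if response_ms > target_ms:
            break
        best = vms
    return best


if __name__ == "__main__":
    # Placeholder measurements per image size class (not data from the paper).
    measurements = {
        "2 GB < S < 10 GB": [(50, 1.0), (100, 1.2), (150, 1.5), (200, 1.9)],
        "20 GB < S < 40 GB": [(50, 1.1), (100, 1.8), (150, 2.7), (200, 3.6)],
        "S > 100 GB": [(50, 1.3), (100, 2.6), (150, 4.1), (200, 5.5)],
    }
    for size_class, samples in measurements.items():
        print(size_class, "->", max_vms_within_target(samples, target_ms=3.0))
```

Stopping at the first violation is a deliberately conservative choice: it treats the response-time curve as something that should not be trusted again once it has crossed the target.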
Conclusions

The Load DynamiX testing platform can be applied to pre-deployment testing of a Private Cloud environment. Load DynamiX workloads can be customized to capture the specifics of a particular Private Cloud and to emulate a variety of production loads (nominal, peak, limits). The insights into the performance bottlenecks of the Private Cloud obtained using Load DynamiX allow customers to make important design and configuration decisions before deployment, saving valuable time and resulting in greater performance and uptime.