Expert Reference Series of White Papers

Visions of My Datacenter Virtualized

1-800-COURSES
www.globalknowledge.com
John A. Davis, VMware Certified Instructor (VCI), VCP5-DV, VCP5-DT, VCAP5-DCA, VCAP5-DCD, VCP-vCloud, VCAP-CID

Introduction

Today, 98 percent of the Fortune 500 companies have virtualized some portion of their datacenter using VMware products (http://www.vmware.com/company/customers/), but some small- and medium-size businesses have yet to follow their lead. Some businesses are comfortable with their current datacenter implementations and hesitate to make a major change. Others are concerned that virtualization will cause performance or reliability issues. Most of these businesses, however, are harming themselves and limiting their potential by resisting virtualization. All businesses should take a long look at the benefits of virtualizing their datacenters.

This paper is aimed at datacenter stakeholders in businesses that still use a traditional, physical environment and have not yet adopted server virtualization. It is intended to help the reader visualize how virtualization could benefit their unique environment and their lives. It provides a specific scenario for reference as well as a few other examples.

Scenario Overview

Consider a scenario in which a particular business (the customer) is using a traditional environment consisting of 50 physical Windows servers. They expect their business and IT needs to grow rapidly over the next few years, and they estimate that they will grow to 200 Windows servers during the next three years. Today, they are mostly satisfied with their current environment, but they expect great challenges due to scaling. They are aware of the buzz in the community concerning virtualization and how it may address some of their challenges. They are interested in migrating their Windows servers to a virtualized environment built on VMware vSphere to achieve server consolidation. They expect such a move would also improve high availability and ease of management.
The customer decides to engage a VMware Certified Professional (VCP) to lead a virtualization project that assesses the current environment, performs capacity planning, proposes a solution, and carries out proof-of-concept testing.

Assessment

The first step of the customer's virtualization journey is to assess the current environment. The assessment includes identifying the current configuration and capacity of the existing hardware, software, network, and storage. It also includes measuring the current workload over a specific period of time and identifying the current peak usage of each resource. It involves gathering resource usage data, performing customer interviews, and utilizing various assessment tools. It identifies the service level agreements (SLAs) for the customer's applications and the current obstacles to compliance.

Copyright 2013 Global Knowledge Training LLC. All rights reserved.
Various tools, such as VMware Capacity Planner or Microsoft Performance Monitor, could be used to collect the necessary resource usage statistics. The assessment should at least measure the combined usage of each resource type during periods of peak activity, but could also measure resource usage during periods of normal and idle activity. The main resource usage statistics to measure include CPU usage, RAM usage, disk I/O operations per second (IOPS), network I/O, and disk space usage. Although this paper focuses mostly on capacity planning and the benefits of server consolidation, an actual assessment should cover much more.

Current Capacity and Configuration Details

For this white paper, only a subset of the data that would be collected during a full assessment is provided as an example. A full assessment would require more details. Here are the details of the capacity and configuration of the customer's current environment as revealed during the assessment.

Server Configuration

- 50 physical Windows servers (2003 and 2008) are used to run infrastructure and business applications.
- Each server has redundant connections to the LAN, utilizing two NICs per server, each connected to a different Ethernet switch.
- Each server has redundant connections to the SAN, utilizing two FC host bus adapters (HBAs) per server, each connected to a different FC switch.
- Each server has a single remote access card to allow access to the console via a web browser.
- 20 small servers are configured with one 2.5 GHz CPU, 2 GB RAM, and a 20 GB operating system (OS) drive.
- 30 large servers are configured with two 2.5 GHz CPUs, 4 GB RAM, and a 40 GB OS drive. (However, ten of these servers actually boot from SAN and do not utilize the local disk.)
- Each server occupies one unit of rack space.
- Each server has redundant power supplies and redundant fans.
- Each server has an OS drive (C:) and an application and data drive (E:). The application data drives are SAN-based.
- Ten of the large servers boot from SAN, utilizing 40 GB LUNs.

Storage Area Network (SAN) Configuration

- A Fibre Channel (FC) SAN is utilized to store all application data and some boot disks.
- 500 GB of space is configured into 50 LUNs to provide a single LUN to each server for application data storage.
- 400 GB of space is configured into ten LUNs to provide boot disks for ten of the servers.
- Two 64-port FC switches are used to provide the storage network.
- 50 ports on each FC switch are used to connect to the Windows servers.

Network Configuration

- Three 64-port Ethernet switches are used to provide the local area network (LAN).
- 50 Ethernet switch ports on one switch are used to connect the Windows servers' remote access cards.
- 50 Ethernet switch ports on each of the other two switches are used to connect the Windows servers' NICs.
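The capacity totals implied by the server configuration above can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original assessment; the `ServerModel` class and `summarize` function are hypothetical names introduced only for illustration, and the figures are the ones listed in the configuration details. Local disk is counted as installed, so it includes the unused local disks of the ten boot-from-SAN servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerModel:
    """One class of physical server (hypothetical helper type)."""
    count: int
    cpus: int
    ghz_per_cpu: float
    ram_gb: int
    os_disk_gb: int


# Figures taken from the server configuration list in the text.
SMALL = ServerModel(count=20, cpus=1, ghz_per_cpu=2.5, ram_gb=2, os_disk_gb=20)
LARGE = ServerModel(count=30, cpus=2, ghz_per_cpu=2.5, ram_gb=4, os_disk_gb=40)
NICS_PER_SERVER = 2
HBAS_PER_SERVER = 2


def summarize(models):
    """Total the installed capacity across all server classes."""
    servers = sum(m.count for m in models)
    return {
        "servers": servers,
        "cpu_ghz": sum(m.count * m.cpus * m.ghz_per_cpu for m in models),
        "ram_gb": sum(m.count * m.ram_gb for m in models),
        "local_disk_gb": sum(m.count * m.os_disk_gb for m in models),
        "nics": servers * NICS_PER_SERVER,
        "fc_adapters": servers * HBAS_PER_SERVER,
    }


current = summarize([SMALL, LARGE])
for name, value in current.items():
    print(f"{name}: {value}")
```

The same function can be reused for other server mixes by changing the `count` fields.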
Summary of Total Current Capacity

The details from above can be summarized as follows:

- 1600 GB local disk space
- 500 GB SAN space for application data
- 400 GB SAN space for boot disks
- CPU capacity = 200 GHz
- RAM capacity = 160 GB
- 100 NICs (plus 50 remote access cards)
- 100 FC adapters

Summary of Other Assessed Items

The customer currently utilizes a backup system that involves copying some OS and application data to SAN-based storage and tape. Tapes are stored offsite for data recovery.

The current expected time to fully restore the OS drive of a failed server to a spare server is two hours. The current expected time to configure a spare server to replace one of the ten servers that boot from SAN is 30 minutes.

Currently, administrators can access the console of the Windows servers remotely by connecting a web browser to the server's remote access card. However, each server has just a single remote access card with a single network connection to one specific switch. Console access to a single server is vulnerable to the failure of a remote access card, network connection, or switch. A single switch failure would cause the loss of console access to all servers.

The current time to deploy a new Windows server after final approval is typically at least one week. This allows time for procurement, diagnostics, hardware configuration, and software configuration.

Traditional Capacity Planning

The VCP performed capacity planning to calculate the resources that would be required to support the customer's environment as it grows over three years, utilizing the current traditional (physical) model.
Here are the details for the target environment using the current, traditional method:

- 200 Windows servers (120 large servers, 80 small servers)
- Total CPU capacity = 800 GHz
- Total RAM capacity = 640 GB
- 400 network adapters
- 200 remote access cards
- 600 network switch ports (for NICs and remote access card connections)
- 400 FC adapters and FC switch ports
- 200 rack units for servers
- 6400 GB local disk space
- 2000 GB SAN space for application data
- 1600 GB SAN space for boot-from-SAN drives
- 240 LUNs (200 application data LUNs and 40 boot LUNs)

Capacity Planning for Virtualized Environments

One major benefit of virtualization is a lower total cost of ownership (TCO) through server consolidation, which implies that the target environment will allow the provisioning of virtual resources that exceed the actual, physical resources. For example, the total number of provisioned virtual CPUs will likely exceed the total number of physical CPUs. For a successful deployment, the key is to ensure that the physical resources are sufficient to meet the expected peak concurrent demand. If a set of virtual machines (VMs) concurrently demands more CPU resources than are physically available, the VMs will perform poorly.

During an assessment, a critical step is to identify the peak usage of each resource, including CPU, RAM, disk, and network. To keep this paper brief, the following details of the capacity planning focus on peak CPU usage, peak RAM usage, data space usage, and expected growth.

Current Usage Details

In this scenario, a monitoring tool is used to continuously collect resource usage statistics from each of the 50 Windows servers for 30 days, at five-minute intervals. At each sample, a combined usage value is calculated for each resource by summing the values from all of the servers.

- The combined peak usage of a particular resource is defined as the highest combined value of that resource at any interval during the collection period.
- The combined idle usage of a particular resource is defined as the lowest combined value of that resource at any interval during the collection period.
- The combined normal usage of a particular resource is defined as the average of the combined values of that resource measured only during business hours (between 8 a.m. and 6 p.m.).
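The three definitions above translate directly into a small calculation. The sketch below is illustrative only: the `combined_usage` function and the sample values are hypothetical, and it assumes business hours mean any sample whose hour falls from 8 up to but not including 18, on any day of the week (the text does not say whether weekends are excluded).

```python
from datetime import datetime


def combined_usage(samples, business_start=8, business_end=18):
    """Return (peak, idle, normal) for one resource.

    `samples` maps each sample timestamp to the list of per-server
    values measured at that timestamp.
    """
    # Combined value at each interval: sum across all servers.
    combined = {ts: sum(values) for ts, values in samples.items()}

    peak = max(combined.values())   # highest combined value
    idle = min(combined.values())   # lowest combined value

    # Normal usage: average of combined values during business hours only.
    business = [
        value for ts, value in combined.items()
        if business_start <= ts.hour < business_end
    ]
    normal = sum(business) / len(business) if business else None
    return peak, idle, normal


# Made-up CPU samples (GHz) for three servers at four points in one day.
example = {
    datetime(2013, 1, 7, 3, 0): [1.0, 0.5, 0.5],
    datetime(2013, 1, 7, 9, 0): [2.0, 1.5, 1.0],
    datetime(2013, 1, 7, 14, 0): [2.5, 2.0, 1.5],
    datetime(2013, 1, 7, 20, 0): [1.0, 1.0, 1.0],
}
peak, idle, normal = combined_usage(example)
print(peak, idle, normal)
```

Summing before taking the maximum matters: the combined peak is usually lower than the sum of each server's individual peak, because servers rarely peak at the same moment, and that gap is what makes consolidation possible.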
The combined peak usage values for CPU, RAM, and disk space are:

- Combined peak CPU usage: 50 GHz (25 percent of capacity)
- Combined peak RAM usage: 80 GB (50 percent of capacity)
- Combined peak OS disk space usage: 640 GB (40 percent of capacity)
- Combined peak application data disk space usage: 200 GB (40 percent of capacity)

Table 1. Combined Peak Usage of CPU, RAM, and Disk Space
Capacity Planning for Potential Target Virtual State

For this scenario, the following details are used for the capacity planning for CPU, RAM, and disk space for the potential target virtual state.

The target environment must provide sufficient resources to meet the estimated peak demand of the future environment after its expected growth. Assuming the percentage of resources actually used during peak activity stays the same in the future as in the current environment, the future peak usage can be estimated by applying those percentages to the expected traditional target capacity (see Traditional Capacity Planning above):

- Future peak CPU usage = 25 percent x 800 GHz = 200 GHz
- Future peak RAM usage = 50 percent x 640 GB = 320 GB
- Future peak disk space usage = 40 percent x (OS disk space + application data disk space) = 40 percent x (6400 GB + 2000 GB) = 3360 GB

The target environment must also provide additional resources for virtualization overhead, which is estimated as follows for this scenario:

- CPU overhead = 10 percent x future peak CPU usage = 200 GHz x 10 percent = 20 GHz
- RAM overhead = 100 MB per VM = 100 MB x 200 VMs = 20 GB
- Disk space overhead = total provisioned RAM + 20 percent x future peak disk usage = 640 GB + 20 percent x 3360 GB = 1312 GB

The minimum capacity needs for each resource, based on expected future peak usage plus overhead, are:

- Future peak CPU usage + CPU overhead = 200 GHz + 20 GHz = 220 GHz
- Future peak RAM usage + RAM overhead = 320 GB + 20 GB = 340 GB
- Future peak disk space usage + disk space overhead = 3360 GB + 1132 GB = 4492 GB

The customer's plan is never to use more than 80 percent of the CPU and RAM of the host hardware, even during periods of peak activity.
This can be achieved by scaling the minimum capacity by a factor of 1.25 (that is, dividing by 0.80), as follows:

- Minimum CPU capacity required = 220 GHz x 1.25 = 275 GHz
- Minimum RAM capacity required = 340 GB x 1.25 = 425 GB

The environment must provide spare capacity to support both planned and unplanned host downtime. In this scenario, the customer decided to plan for an N+2 design, in which they add two more hosts to the environment than would otherwise be needed, as spare capacity. The goal is to provide sufficient capacity even if one host unexpectedly fails while another host is under scheduled maintenance.

The customer wants to continue utilizing the existing SAN, but plans to add more storage space to it. The customer also wants to always leave 10 percent to 20 percent spare capacity in each LUN, plus available provisioned storage space in the SAN for flexibility. To meet this goal, the customer decides to add an additional 40 percent to the 4492 GB minimum disk space previously calculated. The total planned usable disk space = 6288 GB.
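The chain of calculations in this section can be checked with a short script. This is a sketch under the scenario's stated assumptions, not a general sizing tool; the variable names and the `percent_of` helper are hypothetical. It applies each formula exactly as written. Note that the disk overhead formula evaluates to 1312 GB, so the script arrives at 4672 GB of minimum disk space, whereas the running totals in the text carry 1132 GB (giving 4492 GB and 6288 GB); the CPU and RAM results match the text.

```python
# Traditional target environment from the text: 80 small and 120 large servers.
SMALL, LARGE = 80, 120
target_cpu_ghz = SMALL * 1 * 2.5 + LARGE * 2 * 2.5   # 800 GHz
target_ram_gb = SMALL * 2 + LARGE * 4                # 640 GB
target_disk_gb = (SMALL * 20 + LARGE * 40) + 2000    # OS disks + application data


def percent_of(value, percent):
    """Return `percent` percent of `value`."""
    return value * percent / 100


# Future peak usage, assuming today's peak percentages still hold.
peak_cpu = percent_of(target_cpu_ghz, 25)
peak_ram = percent_of(target_ram_gb, 50)
peak_disk = percent_of(target_disk_gb, 40)

# Virtualization overhead as estimated in the scenario.
cpu_overhead = percent_of(peak_cpu, 10)
ram_overhead = 200 * 100 / 1000   # 100 MB per VM, treating 1 GB as 1000 MB
disk_overhead = target_ram_gb + percent_of(peak_disk, 20)

# Keep peak usage at or below 80 percent of host CPU and RAM capacity.
min_cpu = (peak_cpu + cpu_overhead) * 100 / 80
min_ram = (peak_ram + ram_overhead) * 100 / 80

# Storage: peak usage plus overhead, then the customer's 40 percent margin.
min_disk = peak_disk + disk_overhead
planned_disk = percent_of(min_disk, 140)

print(f"Minimum CPU capacity: {min_cpu} GHz")
print(f"Minimum RAM capacity: {min_ram} GB")
print(f"Disk overhead: {disk_overhead} GB, minimum disk: {min_disk} GB")
print(f"Planned usable disk space: {planned_disk} GB")
```

Keeping each step as a named variable makes it easy to rerun the plan with different peak percentages or a different headroom target.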
Details for Potential Virtual Target State

Here are the details for the proposed virtualized environment. For a full assessment, more than one configuration is typically proposed and considered. For this paper, the following configuration is chosen.

- 200 virtual Windows servers
- Ten VMware ESXi hosts (new hardware)
  - CPU: 36 GHz per host (two sockets x six cores per socket x 3 GHz)
  - RAM: 64 GB per host
  - Network adapter ports: six per host (three dual-port NICs per host)
  - FC adapter ports: two per host (two one-port FC adapters per host)
  - Rack units: two per host
- Total CPU capacity: 36 GHz x 10 hosts = 360 GHz
- Total RAM capacity: 64 GB x 10 hosts = 640 GB
- Total rack space for servers: 2 units x 10 hosts = 20 units
- Total Ethernet ports for servers: 6 ports x 10 hosts = 60 ports
- Total FC ports for servers: 2 ports x 10 hosts = 20 ports
- 6288 GB disk space (2688 GB new storage plus 3600 GB existing)
- 20 FC LUNs (assuming 10 VMs per LUN)

Planned vSphere Functionalities

In addition to consolidation, the customer plans to implement the following vSphere features:

- VMware vMotion: provides the customer with the ability to migrate running Windows VMs from one ESXi host to another without user disruption. This allows the administrator to perform scheduled maintenance of a host without affecting the availability of the VMs. It also allows the balancing of workloads across all the hosts.
- VMware Distributed Resource Scheduler (DRS): automatically balances the workload, based on CPU and RAM usage, across all the ESXi hosts by utilizing vMotion.
- VMware High Availability (HA): automatically detects the failure of an ESXi host and restarts the VMs that were running on that host on the surviving hosts. The customer expects that whenever a host failure occurs, the affected VMs will automatically restart and respond to users within a few minutes.
- VMware vCenter: provides central management of all ESXi hosts and VMs.
vCenter includes performance charts for monitoring resource usage, event-related alarms, resource usage alarms, and logs.
- Virtual Machine File System (VMFS) datastores: LUNs that are formatted with the VMFS file system and used to store multiple VMs. Each VMFS datastore allows concurrent access from multiple ESXi hosts and VMs at close to native performance. The customer expects to store about ten VMs per VMFS datastore, which should greatly decrease the effort needed to configure and manage FC LUNs.
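The choice of ten hosts and 20 datastores can be checked against the N+2 goal and the ten-VMs-per-datastore assumption. The sketch below uses the required capacities (275 GHz and 425 GB) and the per-host figures given in the text; the `hosts_needed` helper and constant names are hypothetical. It only checks raw capacity and does not model how vSphere HA admission control would actually reserve resources.

```python
import math

# Per-host and required figures from the text.
CPU_PER_HOST_GHZ = 2 * 6 * 3      # sockets x cores per socket x GHz per core
RAM_PER_HOST_GB = 64
REQUIRED_CPU_GHZ = 275
REQUIRED_RAM_GB = 425
SPARE_HOSTS = 2                   # the "+2" in the N+2 design
VMS = 200
VMS_PER_DATASTORE = 10


def hosts_needed(required, per_host):
    """Smallest whole number of hosts whose capacity covers `required`."""
    return math.ceil(required / per_host)


# N is driven by whichever resource needs more hosts.
n_hosts = max(
    hosts_needed(REQUIRED_CPU_GHZ, CPU_PER_HOST_GHZ),
    hosts_needed(REQUIRED_RAM_GB, RAM_PER_HOST_GB),
)
total_hosts = n_hosts + SPARE_HOSTS
datastores = math.ceil(VMS / VMS_PER_DATASTORE)

print(f"Hosts needed for load (N): {n_hosts}")
print(f"Hosts with spares (N+2): {total_hosts}")
print(f"VMFS datastores: {datastores}")
```

In this scenario CPU is the constraining resource: eight hosts are needed for CPU and seven for RAM, so adding two spares gives the ten hosts proposed.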
Main Benefits of Virtualization: Comparing the Virtualized Target State with the Traditional Target State

The following list identifies some of the major benefits of the proposed migration to virtualization. The numbers are intended to be reasonable estimates, focused on just what the ESXi servers would need. For example, the calculation for network ports only includes the connections to the ESXi hardware, although a few other ports may be utilized for other purposes. Some of the data provided was generated by the VCP using the VMware Return on Investment (ROI) Calculator.

- Reduced physical servers from 200 to ten
- Improved peak CPU utilization from 25 percent to 80 percent (not including spare capacity)
- Improved peak RAM utilization from 50 percent to 80 percent (not including spare capacity)
- Reduced utilized rack space from 200 units to 20
- Reduced electrical power usage for servers and cooling, from 1,266,000 kilowatt hours per year to 118 kilowatt hours per year (estimated using the VMware ROI Calculator)
- Reduced network connections, including network adapters, remote access cards, cables, and ports, from 600 to 60
- Reduced SAN connections, including FC adapters, cables, and ports, from 400 to 20
- Improved storage utilization from 40 percent to 71 percent
- Reduced number of LUNs from 240 to 20
- Added central management and monitoring of VMs, ESXi hosts, virtual switches, VMFS datastores, and resource usage
- Improved, automated recovery times after hardware failure:
  - For the ten boot-from-SAN servers, from 30 minutes to five minutes
  - For all other servers, from two hours to five minutes
- Reduced the required support labor hours. Reductions are expected in overall hardware, software, network, storage, and operations support.
- Added redundancy for console access to the Windows servers via the remote console in the vSphere Client and NIC teaming on the management network ports on each host
- Decreased deployment time for new Windows servers from one week to one day
- Total cost of ownership (TCO) savings over three years, as estimated using the VMware ROI Calculator, is $1,087,710, which includes these main elements:
  - Server savings: $192,000
  - Power and cooling savings: $237,579
  - Storage savings: $159,635
  - Network savings: $56,000
  - Rack space savings: $67,000
  - IT administration savings: $330,241
- ROI = 135 percent (payback = 1.9 years)
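The before-and-after figures in the list above can be turned into percentage reductions for easier comparison. The sketch below is a simple illustration using only numbers quoted in the list; the dictionary and function names are hypothetical. It also totals the itemized savings, which come to slightly less than the quoted three-year TCO figure, consistent with the list describing them as the main elements rather than a complete breakdown.

```python
# (traditional target, virtualized target) pairs quoted in the benefits list.
comparisons = {
    "physical servers": (200, 10),
    "rack units": (200, 20),
    "network connections": (600, 60),
    "SAN connections": (400, 20),
    "LUNs": (240, 20),
}

# Itemized three-year savings in dollars, as quoted from the ROI Calculator.
savings = {
    "server": 192_000,
    "power and cooling": 237_579,
    "storage": 159_635,
    "network": 56_000,
    "rack space": 67_000,
    "IT administration": 330_241,
}
QUOTED_TCO_SAVINGS = 1_087_710


def reduction_percent(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) * 100 / before


for item, (before, after) in comparisons.items():
    print(f"{item}: {before} -> {after} "
          f"({reduction_percent(before, after):.1f} percent reduction)")

itemized_total = sum(savings.values())
print(f"Itemized savings: ${itemized_total:,} of ${QUOTED_TCO_SAVINGS:,} quoted")
```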
Details on Additional Potential Benefits

The customer could choose to implement VMware Data Protection or a third-party backup solution that integrates with the VMware vSphere Storage APIs. Potentially, such solutions could:

- Utilize the storage area network rather than the local area network for data transfers
- Perform full VM backups, focused on virtual disk files
- Eliminate backup agents in each VM, thus removing backup workload from the VMs and allowing the work to be offloaded
- Improve the performance, reliability, and manageability of backups and restores

The customer could choose to implement storage replication and disaster recovery (DR) software to automate DR. The replication could be SAN-based, which requires implementing a similar SAN at the recovery site. Alternatively, the replication could be VM-based, such as VMware vSphere Replication, which does not have any specific storage requirements. The DR software could be VMware Site Recovery Manager, which works with both types of replication to automatically fail VMs over to the recovery site when a disaster occurs.

The customer could use the reduction in power and cooling to promote itself as a green company.

A new method for troubleshooting production applications may present itself. A running VM can be copied to produce a VM that can be started and analyzed on a test network.

Other Scenarios and Examples

A potential scenario is a business with just 15 traditional production servers that struggles to find a means to improve business continuity. They currently have no automatic high availability. By implementing vSphere on two ESXi hosts, they could provide automatic high availability in the event of server hardware failure. This allows all the VMs on a host that fails to automatically reboot within a few minutes on the surviving host. They could potentially add automatic fault tolerance for one or two of these servers by utilizing VMware vSphere Fault Tolerance.
This allows a protected VM to fail over to the surviving host immediately, with its state intact and no user interruption.

Another potential scenario is a business with 90 Windows and Linux servers that refreshes its hardware in three-year cycles. In the first year, they could implement a new vSphere environment with two hosts and migrate the 30 servers that were due to be refreshed. They could immediately create all new application servers as VMs. They could migrate the remaining, existing servers over the next two years.

Another potential scenario is a small company that has never had a test environment. Currently, they have no viable means to test new software, OS updates, and application software updates. They rely mostly on buying new hardware to test new software. They typically utilize full backups and scheduled downtime to test changes to existing software. They could implement an affordable vSphere environment for test purposes, where they use VMware Converter to clone each traditional, physical server to create an identical test VM. The test VMs could be connected to a separate test network. Although the company may still elect to continue utilizing physical servers for production, they will likely eventually realize that the counterpart test servers run as well or better, and decide to migrate the production servers to VMs.
Finally, another potential scenario is a mid-size company with no viable DR solution. They may decide to utilize a virtualized environment for recovery. In this case, they may choose to use a backup solution to back up the OS, application, and data files from their traditional, physical servers and use VMware Converter to migrate the data into VMs in the event of a failure.

Conclusion

Although many small- and mid-size businesses have not yet invested in server virtualization, they are limiting themselves by forgoing dramatic TCO savings and improved business continuity. VMware vSphere is a reliable technology that is utilized by almost all Fortune 500 companies and is proven to reduce costs and simplify administration. All companies should at least assess their specific situation, consider various virtualized configurations, and visualize their virtualized datacenter running on VMware vSphere.

Learn More

To learn more about how you can improve productivity, enhance efficiency, and sharpen your competitive edge, Global Knowledge suggests the following courses:

- VMware vSphere: Fast Track [V5.1]
- VMware vSphere: Install, Configure, Manage [V5.1]
- VMware vSphere: What's New [V5.1]
- VMware vSphere: Optimize and Scale [V5.1]
- VMware vSphere: Troubleshooting Workshop [V5.1]

Visit www.globalknowledge.com or call 1-800-COURSES (1-800-268-7737) to speak with a Global Knowledge training advisor.

About the Author

John A. Davis has been a VMware Certified Instructor (VCI) and VMware Certified Professional (VCP) since 2004, when only a dozen or so VCIs existed in the U.S. He has traveled to many cities in the U.S., Canada, Singapore, Japan, Australia, and New Zealand to teach. He splits his time between teaching and delivering professional consulting services that are 100 percent focused on VMware technology. He is a VMware Certified Advanced Professional (VCAP) on VMware vSphere (VCAP5-DCA, VCAP5-DCD) and VMware vCloud (VCAP5-CID).