VBLOCK™ SOLUTION FOR APPLICATION LIFECYCLE PLATFORM
August 2011
© 2011 VCE Company, LLC. All rights reserved.
Table of Contents
Executive Summary
Business Challenges
Solution
Benefits
Scope
Audience
Technology Overview
VCE Vblock Infrastructure Platforms
VMware vFabric Platform
Testing Tools and Application
Solution Architecture Overview
Use Cases
Use Case 1: Elasticity of applications on Virtual Servers (VMs)
Use Case 2: Elasticity of the application on the underlying physical servers
Conclusion
References
Executive Summary

Enterprises engaged in developing, testing, and deploying their applications must deal with many layers of the targeted platform. These layers include application services, virtualization, and hardware infrastructure with compute, network, storage, and management at all levels. IT and engineering must also focus their energy on acquiring, deploying, and managing these layers on an ongoing basis, as well as making sure all the layers work seamlessly together. This situation increases an enterprise's initial and ongoing costs, extends the development cycle, and reduces the flexibility needed to respond to changes in the market.

The Vblock Solution for Application Lifecycle Platform (ALP) provides application developers and owners full control over the lifecycle of applications. The solution provides and streamlines the requisite application and infrastructure resources to develop, test, provision, run, and manage applications in their environment. It enables them to dynamically adjust application and infrastructure resources according to real-time demand. Application owners set the rules for when to scale resources up or down. This flexibility enables application owners to meet peak demand and maintain performance standards, or to scale down during slow times in order to free IT resources for other applications.

This document describes the Solution Architecture for the Vblock Solution for Application Lifecycle Platform with a complete stack, from application development services and infrastructure, to system infrastructure, application scalability, and management. ALP provides rapid deployment and sustainable operation for a virtualized converged infrastructure. It leverages the VCE Vblock Infrastructure Platforms and the VMware vFabric cloud application platform. The solution architecture shows how automatic elasticity can be built using VMware vFabric Hyperic and Vblock management.
Business Challenges

Traditional enterprise IT roles, such as server administrator, UNIX administrator, and network, storage, or Exchange administrator, have been far too static. The roles operate in silos, which causes friction within IT organizations, as well as between developers and IT. The approach offered by the Vblock Solution for Application Lifecycle Platform provides a more efficient operational model for accelerated delivery of new high-value IT services.

Before this solution, it was extremely difficult for developers to play the role of IT administrator when their applications were deployed as cloud applications. This difficulty was due not only to various access-control, authentication, and authorization complexities, but also to a model of organizational separation between developers and IT, where the default behavior is to deny access.

Enterprises face the following challenges when developing applications for the cloud:

• Software is increasingly being offered as a service and deployed through the cloud
• A new breed of software, called Web-apps, is being developed by many organizations. Web-apps have an unpredictable traffic workload.
• Enterprise software is increasingly developed using languages such as Java and Ruby/Rails, and frameworks such as Spring
• Development cycles are shrinking and products are frequently being deployed on clouds. Consequently, application owners and software developers must ensure that the new features that they develop are rapidly taken through the dev-build-test-stage-deploy cycle, and most importantly:
  o Traditional IT roles are being turned on their heads when enterprises adopt cloud computing and move to delivery of applications via the software-as-a-service (SaaS) model on a public cloud or private cloud deployment environment
Solution

ALP combines the Vblock platforms and the VMware vFabric cloud application platform. It combines best-of-breed Platform as a Service (PaaS) with Infrastructure as a Service (IaaS) to provide a complete platform for modern applications. It provides customers with what is needed to develop and build their modern applications, as well as the infrastructure and management needed for testing, deploying, and dynamically scaling these applications. It provides IT with control over quality of service and security, while providing application owners and developers with flexibility and with instant access to the resources needed to address fluctuations in application demand.

The solution enables application owners to dynamically adjust application and infrastructure resources according to real-time demand. Application owners set the rules for when to scale resources up or down. The solution architecture provides customers with an approach to implement automatic elasticity with ALP. The approach is demonstrated through two use cases.
Both use cases provide proactive elasticity and reactive (just-in-time) elasticity:

• Use Case 1: Elastic and automated policy-driven expansion of Virtual Infrastructure, specifically virtual machines which run different application components, in response to increased application workload
• Use Case 2: Elastic and automated policy-driven expansion of Physical Infrastructure, specifically physical machines which run the Hypervisor to provide virtual infrastructure, in response to increased application workload that cannot be handled by Use Case 1 alone

Benefits

The Solution Architecture for Application Lifecycle Platform as described in this paper provides the following benefits to enterprises developing for the cloud:

• Simplified and streamlined acquisition of:
  o Hardware - network, compute, storage
  o Software - application development tools, and an application management tool
• Management software that enables automation, deployment, and elastic control over the hardware and software
• Turnkey, ready-to-use application development components in the front end, middle tier, and back end for customers to rapidly build and deploy web applications
• Reduced platform total cost of ownership (TCO)
• Proactive business agility through application and infrastructure monitoring

Scope

The paper describes a Solution Architecture that is a design and provisioning strategy for the Application Lifecycle Platform. The specific goals of the Solution Architecture are to:

• Describe end-to-end application and infrastructure provisioning
• Demonstrate running custom Java applications
• Show how customers can build their application elasticity on the Vblock Solution for Application Lifecycle Platform
• Show how to use VMware vFabric Hyperic to monitor and manage their application performance on the platform
• Provide elastic scaling on the private cloud with the ability to tap into a public cloud for excess capacity needs

These goals are demonstrated through two use cases. Both use cases provide proactive elasticity and reactive (just-in-time) elasticity:

• Use Case 1: Elastic and automated policy-driven expansion of Virtual Infrastructure, specifically virtual machines which run different application components, in response to increased application workload
• Use Case 2: Elastic and automated policy-driven expansion of Physical Infrastructure, specifically physical machines which run the Hypervisor to provide virtual infrastructure, in response to increased application workload that cannot be handled by Use Case 1 alone

Audience

This document is intended for technical engineering staff, managers, IT planners, administrators, development operations, and other IT professionals involved in evaluating, managing, operating, or designing end-to-end provisioning and deployment on Vblock platforms.

Table 1. Terminology

Term: Vblock Infrastructure Platforms
Definition: Vblock Infrastructure Platforms by VCE are enterprise- and service provider-class IT infrastructure. Vblock platforms are pre-engineered, tested, and validated units that have a defined performance, capacity, and availability Service Level Agreement (SLA). Vblock platforms streamline IT infrastructure acquisition, deployment, and operations. The platforms accelerate organizations' migration to private clouds.

Term: tc Server and Hyperic server and agent
Definition: This project incorporates the VMware vFabric components: tc Server, Hyperic server and agent.

Term: Monitoring
Definition: Monitoring, for example with Hyperic, tracks application health based on predefined metrics such as response time, in order to detect issues before users notice.

Term: Elasticity
Definition: Computing resources are pooled and allocated/deallocated to different projects or running application instances as needed, without a disruption to the running system. Elasticity's main objective: maximize resource utilization and reduce costs.

Term: Automatic Elasticity
Definition: The allocation and deallocation of resources is unintrusive and automatic, based on monitoring of performance and preset policies. Its main objective: uninterrupted business in a volatile context at a usage-based cost.
Technology Overview

The Vblock Solution for Application Lifecycle Platform architecture enables automation of cross-functional operations on physical elements, such as servers, network and storage devices, and on virtualization layers. This automation applies at each lifecycle step and during runtime (post-deployment), through proactive and reactive elasticity, and uses various configuration elements including hardware, software, tools, components, and management elements.

The solution uses the following major hardware and software components and technologies. A customer's environment may include additional components based on the application needs.

VCE Vblock Infrastructure Platforms

Vblock platforms are enterprise- and service provider-class IT infrastructure units that are pre-engineered, tested, and validated with pre-defined performance, capacity, and availability service levels. The standardized converged infrastructure of the Vblock platform is a foundational building block for cloud computing that helps customers to realize the benefits of applications running in a virtualized environment.

Vblock platforms are characterized by:

• Repeatable units of construction based on matched performance, operational characteristics, and discrete requirements of power, space, and cooling
• Repeatable design patterns that facilitate rapid deployment, integration, and scalability
• An architecture that can be scaled for the highest efficiencies in virtualization and workload mobility
• An extensible management and orchestration model based on industry-standard tools, APIs, and methods
• A design that contains, manages, and mitigates failure scenarios in hardware and software environments

Note: Refer to the Vblock Infrastructure Platform Architecture Overview for detailed information on the Vblock architecture. Use the link provided in the References section of this paper.
EMC Ionix Unified Infrastructure Manager (UIM)

UIM is included in every Vblock Infrastructure Platform to manage the configuration, provisioning, and compliance of aggregated Vblock Infrastructure Platforms. UIM simplifies deployment and integration into IT service catalogs and workflow engines, and dramatically simplifies Vblock platform deployment by abstracting the overall provisioning while offering granular access to individual components for troubleshooting and fault management.

ESX 4.1

VMware ESX is the basis of the virtual environment with VMware vCenter. It provides the foundation for building and managing a virtualized IT infrastructure. This market-leading, production-proven hypervisor abstracts processor, memory, storage, and networking resources into multiple virtual machines that run unmodified operating systems and applications.

VMware vSphere 4.1 and its subsequent update and patch releases are the last releases to include both ESX and ESXi hypervisor architectures. Future major releases of VMware vSphere will include only the VMware ESXi architecture.

VMware ESXi is the latest VMware hypervisor architecture. In addition to the performance, reliability, and consolidation capabilities you are accustomed to with ESX, VMware ESXi improves hypervisor management in the areas of security, deployment and configuration, and ongoing administration. VMware recommends that deployments of vSphere 4.x also utilize the ESXi hypervisor architecture.

For more information, visit the VMware ESXi and VMware ESX Info Center by using the link provided in the References section of this paper.

VMware vCenter

VMware vCenter is the virtualization management platform used in this solution to provide seamless end-to-end datacenter management through its rich set of APIs that enable integration with third-party management tools. It simplifies virtual datacenter operations across virtual and physical environments with "set and forget" policy-driven administration and automated IT processes for greater efficiency across your deployment.

VMware vFabric Platform

The VMware vFabric cloud application platform is a complete solution that fills IT's need for a fast, efficient, and lightweight approach to building applications and running them on a virtualized and cloud-based infrastructure. VMware vFabric leapfrogs the limitations of traditional platforms, drawing on the expertise developers already have with popular frameworks and tools. VMware vFabric also works seamlessly with the world's most trusted and widely used virtualization engine, VMware vSphere, making it ideally suited for applications that need to scale dynamically to address unpredictable spikes in user demand.
VMware vFabric integrates all the essentials of a modern application platform:

• A proven development framework that bypasses the complexity of overweight platforms such as Java Platform, Enterprise Edition (JEE), to simplify and accelerate the development of modern applications
• A lean runtime platform optimized for both the development framework and virtual infrastructure
• A set of runtime services tailored to the needs of modern applications

For informational purposes, the following additional VMware vFabric Platform components are of interest but were not used in the tested use cases. Customer applications may include some or all of these components. In the future, we may include some of these components. This solution architecture paper provides an approach for how customers can implement their own automatic elasticity. The approach is valid even if the application has all these components.

VMware vFabric GemFire - High Performance Data Management

Get elastic data management for the speed and dynamic scalability you need for today's data-intensive applications, including:

• HTTP session management for Tomcat and VMware vFabric tc Server
• L2 caching for Hibernate
• Enhanced parallel disk persistence

VMware vFabric Web Server - Fast and Secure Apache HTTP

VMware vFabric Web Server is the HTTP server and load-balancing component of VMware's vFabric Cloud Platform, and provides high performance, scalability, and security while reducing the cost and complexity of sophisticated web infrastructure. vFabric Web Server is easy to deploy, tuned for performance, and fully supported by VMware.

• Simplified deployment and maintenance
• High performance
• High security
VMware vFabric RabbitMQ - Open Source Enterprise Messaging

Route data to distributed applications throughout the cloud with this robust and reliable open source inter-system messaging.

• Fully extensible via plug-ins to meet the needs of any use case and application environment
• Eliminates your dependency on proprietary commercial messaging technologies
• Proven platform and open standard protocols for portable and interoperable messaging

VMware vFabric SQLFire - Memory-Oriented Data Management Software

Get high performance data access with horizontal scale.

• Operate at memory speed
• Dynamically grow or decrease cluster size
• Leverage existing SQL knowledge for accelerated application development

Enterprise Ready Server: Apache Web Server with Load-Balancing

This was not used in the tested use case, but it could be used instead of the hardware load balancer. Deploy and maintain multiple instances of Apache Web Server with the customizations you need.

• Quick installation and setup
• Up to 100% performance improvements with reduced deployment time
• Optimized SSL management

For further information about the VMware vFabric Platform components, see the link provided in the References section.

Below are the VMware vFabric components used in the use case testing and required for the tested applications. Other applications may require other components.

VMware vFabric tc Server

vFabric tc Server Spring Edition (Spring Framework) is an enterprise version of Apache Tomcat, the widely adopted application server. Optimized for Java Spring users, with a lightweight footprint, vFabric tc Server is ideally suited for use in modern virtual environments. As a lightweight application server optimized for virtual environments, vFabric tc Server provides a lean platform for running modern applications and is ideally suited for the virtualized datacenter.
Due to its very small footprint and lean approach, vFabric tc Server generally requires significantly fewer computing resources than typical application servers, enabling greater application server density within a virtual environment. An integrated experience with VMware tools means applications can be easily deployed and managed.

vFabric tc Server is a Tomcat-compatible enterprise application server that is ideally suited for virtual environments, with features including:

• Secure remote server administration via web portal and command line
• Application configuration management
• Advanced diagnostics including advanced error reporting and application thread lock detection and alerting
• Unparalleled visibility into the performance of Spring applications with Spring Insight
• Optimizations to allow for reduced memory consumption on vSphere
For more information about vFabric tc Server, use the link provided in the References section of this paper.

VMware vFabric Hyperic Application Monitoring

Continuously monitor web applications on physical, virtual, or cloud infrastructures with Hyperic and get:

• Auto-discovery of over 75 common web application technologies
• Advanced alerting to reduce duplicate and irrelevant alerts while providing concise information on a wide range of performance metrics
• Scheduled control for administrative actions like restarting servers and running garbage collection routines

VMware vFabric Hyperic has two components:

Hyperic 4.5 Server

VMware vFabric Hyperic Server is used as the principal monitoring tool of this solution. Specifically, the Hyperic 4.5 Server is the application management component of the VMware vFabric Cloud Application Platform. Hyperic enables system administrators to find, fix, and prevent performance problems in custom web apps, whether running on physical, virtual, or cloud infrastructures. With Hyperic's automatic discovery of infrastructure changes, complete visibility into the entire virtualized application stack, effortless handling of high volumes of performance metrics, and automated remediation capabilities, it helps resolve application problems quickly, reduce app downtime, and improve app performance - even for highly dynamic and elastic cloud applications.

For more information about the Hyperic server, use the link provided in the References section of this paper.

Hyperic Agent

The Hyperic Agent is part of the monitoring function of vFabric Hyperic. Hyperic Agents can be configured to monitor numerous servers and custom web applications wherever they may reside: physical machines, a virtual infrastructure environment, or public, private, or hybrid clouds.
By providing immediate notification of application performance degradation or unavailability, Hyperic enables system administrators to ensure availability and reliability of critical business applications. For more information about Hyperic, use the link provided in the References section of this paper.
Testing Tools and Application

Hotel Booking Application

The Hotel Booking application is used in this solution as the example application on which usage is monitored in the Vblock Solution for Application Lifecycle Platform architecture. Refer to Figure 1 and the use case discussions. For more information about the Hotel Booking application and available scripts, use the link to Xentric, which is provided in the References section of this paper.

Synthetic Load Creation

In this solution, tools are used to create synthetic workloads, which cause the need for elasticity and trigger allocation or deallocation of resources. The following tools are used to create synthetic workloads:

• An in-house developed program that accepts as input the amount of RAM it needs to use up, for example 300 MB. It then keeps creating dummy objects in the Java heap until it reaches approximately that much RAM. Once the objects are created, the program runs in a loop, accesses the objects, and modifies them. This is used to consume memory in a server
• An in-house developed program that accepts as input a number of files to create. It then creates as many threads and keeps accessing and updating the files in an infinite loop. This is used to generate disk I/O load on the system
• An in-house developed program that simulates CPU usage

JMeter

Apache JMeter is open source software, a pure Java desktop application designed to load-test functional behavior and measure performance. It was originally designed for testing web applications, but has since expanded to other test functions. Apache JMeter may be used to test performance on both static and dynamic resources (files, Servlets, Perl scripts, Java objects, databases and queries, FTP servers, and more). It can be used to simulate a heavy load on a server, network, or object to test its strength or to analyze overall performance under different load types.
You can use it to make a graphical analysis of performance or to test your server/script/object behavior under a heavy concurrent load. For more information about JMeter software, use the link to Apache JMeter, which is provided in the References section of this paper.
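The memory-consuming load program described under Synthetic Load Creation can be approximated as follows. This is a minimal sketch in Python, not the in-house Java program used in the tests: the function names `consume_memory` and `touch` and the 256 KB chunk size are assumptions made for illustration.

```python
def consume_memory(target_mb, chunk_kb=256):
    """Allocate dummy buffers until roughly target_mb megabytes are held.

    chunk_kb is an illustrative allocation granularity, not a value
    taken from the tested program.
    """
    chunk_size = chunk_kb * 1024
    target_bytes = target_mb * 1024 * 1024
    chunks = []
    held = 0
    while held < target_bytes:
        chunks.append(bytearray(chunk_size))
        held += chunk_size
    return chunks


def touch(chunks, passes=1):
    """Read and modify every buffer so the memory stays in active use."""
    for n in range(passes):
        for chunk in chunks:
            chunk[0] = (chunk[0] + 1) % 256  # read, then modify, the first byte
            chunk[-1] = n % 256              # overwrite the last byte


if __name__ == "__main__":
    buffers = consume_memory(300)  # 300 MB, the example size given in the text
    while True:                    # keep accessing and modifying, as described
        touch(buffers)
```

The caller must keep the returned list alive for as long as the load is wanted; releasing it lets the interpreter free the memory.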
Solution Architecture Overview

The Vblock Application Lifecycle Platform solution architecture is described for enterprise web application development teams who are building customer-facing cloud applications to be deployed on a public or private enterprise cloud. The application deployments share component management with IT, are self-service and automated, and are located on an elastic platform across application services, infrastructure services, and resources, managed with a service level agreement (SLA).

The Vblock Solution for Application Lifecycle Platform architecture provides elasticity in its infrastructure that enables enterprises to:

• Create an application
• Create a policy
• Choose a development, test, or deployment environment
• Install, use, and monitor the application
• Automatically scale the application and environment up or down, based on application and environmental performance
• Manage virtual and physical resources

The following figure displays the virtual and physical architecture of the Vblock Solution for Application Lifecycle Platform:

Figure 1. Virtual and Physical Architecture - Vblock Solution for Application Lifecycle Platform
This diagram shows a sample application consisting of various components in a multi-tiered arrangement. The virtual components are at the top tier, and the physical components are at the bottom tier. This architecture provides scalability at both levels, based on demand metrics reported through monitoring.

Monitoring in this multi-tier scaling architecture includes:

• The use of vFabric Hyperic to monitor different application tiers via Hyperic agents and then to aggregate the collected data
• Based on the aggregated data, the use of Hyperic alerts to trigger various provisioning and remediation actions at both the virtual and the physical layers

This solution architecture provides elasticity to both the virtual environment and the physical environment. Automating the elasticity requires writing scripts or programs; the metrics for both environments can be set up in the same policy.
Use Cases

The use cases in this solution architecture describe how to automate application and infrastructure elasticity, both on virtual servers and on the underlying physical servers. The two use cases employ a concept of automated application deployment. The use cases are driven by application-to-component characteristics such as static mapping and dynamic tuning, scaling, and elasticity. That definition determines when to increase the size, resources, and capability of the environment to accommodate applications, and also addresses the following automated application deployment input considerations:

• Application virtual provisioning
  o Base configuration
  o Elasticity configuration
• Application placement
  o Testing
  o Production
• Automated setup of monitoring policies
  o Which Hyperic metrics to monitor
  o What threshold levels of these metrics are critical
  o How long the system should tolerate the critical condition before taking a scaling action

Use Case 1: Elasticity of applications on Virtual Servers (VMs)

Hyperic is configured with monitoring policies. For our testing, these policies are created by Java code using the Hyperic API to automate and enable this use case. The policies are driven by the application metrics for scaling the application. The application policy configures Hyperic with the alerts that monitor the application and scale the application's group of VMs when the policy's thresholds are exceeded.

The Hyperic monitoring framework invokes scripts to create a new virtual application server (VM) from the application VM template. A script calls vCenter components through a set of APIs to create the additional VM as a clone of the existing VMs. The script uses special naming conventions for the new clones, and for the hosts running on them, so that it can separate the management of multiple groups of application VMs managed by Hyperic.
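The monitoring-policy inputs listed above (metric, critical thresholds, and tolerated duration) and the clone naming convention can be captured in a small data structure. The following is a hypothetical Python sketch: the paper's own policies were created in Java through the Hyperic API, and every name here (`ScalingPolicy`, `clone_name`, `group_of`, the `-clone-NNN` suffix, the metric label) is an assumption for illustration rather than the convention used in the tested scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Illustrative elasticity policy for one group of application VMs."""
    group: str              # name of the application VM group
    metric: str             # which monitored metric drives scaling
    high_threshold: float   # scale up when the metric stays at or above this
    low_threshold: float    # scale down when the metric stays below this
    tolerate_seconds: int   # how long a breach must last before acting
    base_vm_count: int      # base configuration; never scale below this

    def __post_init__(self):
        if not self.low_threshold < self.high_threshold:
            raise ValueError("low_threshold must be below high_threshold")
        if self.base_vm_count < 1:
            raise ValueError("base_vm_count must be at least 1")


def clone_name(policy, index):
    """Build a clone name that encodes the owning group (assumed format)."""
    return f"{policy.group}-clone-{index:03d}"


def group_of(vm_name):
    """Recover the group from a clone name, or None if it is not a clone."""
    prefix, sep, suffix = vm_name.rpartition("-clone-")
    return prefix if sep and suffix.isdigit() else None
```

Encoding the group in the VM name lets a script decide which clones belong to which application group without keeping separate state.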
Once the VM is cloned, the Hyperic agent on the new VM notifies the Hyperic server of its existence, so Hyperic discovers and inventories the new VM. The monitoring policies now monitor the expanded group of VMs. Hyperic monitoring shows that application response time improved after the new VM was added. The process of deprovisioning servers in response to reduced workload conditions uses similar methods to reduce the number of VMs.

Example for Use Case 1

In this example, a Hotel Booking application is running. It is a web-based, load-balanced application receiving data from users who are using it to make hotel reservations. The same application is running on an initial number of virtual machines (VMs), two in this case, for scaling and performance reasons. As requests are received by the application, a load balancer hands off the requests (data) to the two front-end servers. A Hyperic Server is employed, and Hyperic Agents are located on each of the two front-end servers. The agents report performance data to the Hyperic Server, and Hyperic then aggregates the data, in this case CPU performance data.

Following is an illustration of the initial environment configuration.
Figure 2. Initial environment configuration

Alerts have been set on the aggregated metric that combines performance data for the two application servers. Initially the workload may fluctuate within the specified critical threshold, and the system does not react to this. If the application workload increases to a level that cannot be handled by the initial two VMs (the base configuration), this load requires the creation of additional VMs.

A threshold CPU load level and a time duration are set programmatically by the configuration that you implement. For example, suppose the CPU threshold is set at 60% utilization with a duration of 5 minutes. When the monitored data shows that utilization has stayed at 60% or above for 5 minutes, this condition indicates that load growth has caused the system to run out of resources to handle additional load.

Following is an illustration that shows the group's CPU utilization operating under normal load and then above the threshold load.
Figure 3. CPU Utilization - Normal load and above threshold load

At this point, a script is run which calls the vCenter API to clone an additional VM server. This is the remediation step for the increased load. The next illustration shows that with the addition of the third VM the load has leveled off and the CPU utilization has dropped to acceptable levels. A recovery alert was fired to instruct Hyperic to resume monitoring the aggregated metrics after the clone operation.

Figure 4. Addition of third VM - load levels off
Monitoring the new situation may reveal that the load levels off but then begins to grow again. If the load repeatedly grows, the number of tc Servers continues to expand, and soon the system will need more physical servers to provision future tc Servers. At that point Use Case 2 is set into action to provision more physical servers to manage the increased load.

After some time the workload tapers off, well below the high threshold. This is acceptable behavior for the system, and it does not yet react to the reduced workload. But when the workload falls below a low threshold, 20% in this example, and stays there for a user-specified period of time, for example 1 or 5 minutes, the system takes the action of deprovisioning VMs. The illustration below shows a normal reduction in load, then a reduction below the low threshold, and then the load for the reduced number of VMs. Refer to the following Use Case 2 section for a discussion of the physical resources.

Figure 5. Reduction below low threshold - VM deprovisioned
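The scale-up and scale-down rules in this example can be sketched as a single decision function. This is a minimal illustration in Python, assuming the aggregated CPU metric arrives as timestamped samples; the function name `scaling_decision`, its parameters, and the returned action strings are hypothetical and are not part of Hyperic or the tested scripts. The defaults mirror the example values: a 60% high threshold, a 20% low threshold, a 5-minute duration, and a base configuration of two VMs.

```python
def scaling_decision(samples, vm_count, *, high=60.0, low=20.0,
                     hold_seconds=300, base_vm_count=2):
    """Decide whether to scale, given aggregated CPU samples.

    samples: list of (timestamp_seconds, cpu_percent), oldest first.
    Returns "scale_up", "scale_down", or "hold".
    """
    if not samples:
        return "hold"
    latest = samples[-1][0]
    # Require the history to span the full tolerated duration before acting.
    if latest - samples[0][0] < hold_seconds:
        return "hold"
    # Only samples inside the most recent hold_seconds window count.
    window = [v for t, v in samples if latest - t <= hold_seconds]
    if all(v >= high for v in window):
        return "scale_up"
    # Never deprovision below the base configuration.
    if vm_count > base_vm_count and all(v < low for v in window):
        return "scale_down"
    return "hold"
```

Load that sits between the two thresholds, or that breaches one only briefly, yields "hold", which matches the described behavior of tolerating fluctuations without reacting.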
The complete chart of the load test follows:

Figure 6. Complete load test

Below is a report of the cloning and recovery alerts fired. Hyperic triggered the remediation alerts "On high Cloud CPU" and "On low Cloud CPU" when a threshold condition was breached, and then stopped monitoring the condition to prevent triggering duplicate clone operations. The recovery alert "On upscaling complete" fired after the new clones were detected, and Hyperic resumed monitoring the aggregated metrics for the new group.

Figure 7. Cloning - Recovery alerts
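The interplay between remediation alerts and recovery alerts amounts to a small gate: once a remediation fires, further breaches are ignored until the inventory reflects the change. The sketch below illustrates that idea in Python under stated assumptions. The class `ScalingAlertGate` and its methods are hypothetical names, not Hyperic API calls, and the sketch assumes each remediation adds or removes exactly one VM.

```python
class ScalingAlertGate:
    """Suppress duplicate remediation while a scaling operation is pending."""

    def __init__(self, initial_vm_count):
        self.known_vm_count = initial_vm_count
        self.expected_vm_count = None  # set while an operation is in flight

    @property
    def monitoring_active(self):
        return self.expected_vm_count is None

    def on_threshold_breach(self, direction):
        """Return True if remediation should run, False if it is suppressed."""
        if direction not in ("up", "down"):
            raise ValueError("direction must be 'up' or 'down'")
        if not self.monitoring_active:
            return False  # an operation is already pending; ignore the breach
        delta = 1 if direction == "up" else -1
        self.expected_vm_count = self.known_vm_count + delta
        return True

    def on_inventory_update(self, discovered_vm_count):
        """Return True (recovery) once the expected VM count is discovered."""
        if (self.expected_vm_count is not None
                and discovered_vm_count == self.expected_vm_count):
            self.known_vm_count = discovered_vm_count
            self.expected_vm_count = None  # resume monitoring
            return True
        return False
```

Without such a gate, a threshold that stays breached during the minutes a clone takes to come online would trigger one clone operation per evaluation.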
Result

This use case demonstrates the elasticity of application virtual provisioning under increased load. Based on the implemented policy, Hyperic monitors the workload against an application-specific threshold and identifies the need for additional virtual application servers (VMs) when the application workload increases to a level that cannot be handled by the initial two VMs. The additional application servers are then created and provisioned, and the application response time improves to the normal level.

When the application load falls below the low threshold, the system waits a user-specified time period to confirm the reduced load and then takes the reverse action of deprovisioning VMs. If needed, the demonstrated cycle of monitoring and deprovisioning can continue until the environment is deprovisioned down to the initial VM count of two, the base configuration.
Use Case 2: Elasticity of the application on the underlying physical servers

Use Case 2 describes how to provide elasticity at the physical layer when the virtual application workload in Use Case 1 increases to the extent that more physical servers are required to handle the corresponding increase in physical workload and its storage needs. In that event, additional physical servers, and ESX servers on top of those physical servers, need to be provisioned. The newly provisioned physical servers, with the ESX servers on them, are then added to storage clusters.

Use Case 2 uses policies to react to aggregated monitoring metrics from workload conditions, but the Use Case 2 metrics are gathered by periodically monitoring the physical workload state of one or more hosts. This could be a vCenter DRS cluster, as in the Use Case 2 example described later in this section.

Provisioning of physical resources may require an extended period of time. If you are operating in a VMware vCenter environment, initiate any provisioning with a request to UIM to also install ESX on the new system. The reactive state is entered when there is an immediate demand for additional resources. You will need to write a program to activate the host that you provisioned in the proactive state. If you are running a DRS cluster, you will need to add the new host and its resources to the existing DRS cluster so that the cluster can rebalance your workload across the new number of hosts.

The Application Lifecycle Platform is driven by a common set of policies. A user configures the policy when setting up the environment. The automation code parses the physical elasticity policy section, which describes the performance metrics that trigger remediation actions, to create rules that decide when additional physical resources are needed.
vCenter DRS automatically load balances and migrates parts of the applications to the new physical server and its associated ESX server, thus balancing the load. When the infrastructure tasks are complete, the load of the cluster is back below the threshold.

Overview of Tasks for Use Case 2

Use Case 2 illustrates elastic scaling of the underlying physical infrastructure provided by the Vblock Solution for Application Lifecycle Platform. Such automatic elasticity can be implemented in a variety of ways and is a fundamental component of the private cloud. It can provide monitoring and remediation capabilities to elastically expand the physical resources of a vSphere cluster based on application demand.

The automatic elasticity leverages vFabric Hyperic and vSphere vCenter capabilities to track the real-time performance of a cluster. When the Elastic Infrastructure recognizes a condition that requires remediation, it automatically coordinates the elastic expansion of the cluster. It orchestrates this expansion, using UIM and the storage array controller, just in time to satisfy application needs.

vFabric Hyperic and vSphere vCenter can be instrumented in a variety of ways to implement monitoring and provisioning capabilities for the infrastructure. This instrumentation can form a library that monitors Hyperic and selected vCenter metrics to determine when it should proactively provision new CPU, networking, and storage for a cluster. The instrumentation attempts to keep both a proactive and a reactive system in place, in parallel. In proactive mode it has resources provisioned and ready for activation when the point of most urgent need arrives. In reactive mode, when the need becomes immediate, the infrastructure immediately provisions (if needed) and activates a new system.
Metrics

You must decide which metrics work best in your environment. This depends on the type of workload you run, and whether that workload is CPU intensive, memory intensive, I/O intensive, or some combination. Choose a set of metrics that identifies two conditions related to the host or hosts your workload runs on:
1. A proactive condition, which indicates that your workload is growing and, if it continues to grow at the same rate, will at some point require additional CPU, memory, or I/O resources to keep operating at the same level of performance. This condition results in the creation, via UIM, of additional physical servers with ESX installed, which are then placed in a standby pool.
2. A reactive condition, which indicates that your workload has grown to the point that it requires additional CPU, memory, or I/O resources.

Metrics should be defined with a time period over which the metric threshold is compared. A CPU metric set to 90% over 8 hours, for example, is exceeded when CPU usage averages over 90% across an 8-hour period. By setting up proactive and reactive metrics, either directly in Hyperic or indirectly by writing a program that queries the Hyperic metrics, you can detect when either state occurs. The proactive state enables you to begin physical provisioning through UIM before your workload actually requires additional resources. You will need to write a program that calls the UIM API and is invoked when the proactive state is detected.

Your system is capable of using any Hyperic metric in monitoring a host. Hyperic metrics trigger remediation in the context of a time interval: a metric must exceed its defined threshold as an average over a well-defined period of time. If CPU usage is configured to 90% and the time interval is set to 8 hours, then average CPU usage for the entire cluster of hosts must exceed 90% over the last eight hours before remediation is triggered. These values can be set for individual metrics according to what works best for a given environment.
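The time-window rule above (a metric counts as exceeded only when its average over the configured interval is above the threshold) might be sketched as follows. This is an illustrative helper, not part of Hyperic or UIM: the class name `WindowedMetric` and its methods are hypothetical, and the samples are assumed to arrive in time order from whatever program queries the Hyperic metrics.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class WindowedMetric:
    """Hypothetical helper: threshold compared against a rolling average."""

    def __init__(self, threshold_pct: float, window_seconds: float) -> None:
        self.threshold_pct = threshold_pct
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, float]] = deque()  # (time, value)
        self._first_seen: Optional[float] = None

    def add(self, timestamp: float, value_pct: float) -> None:
        """Record one sample and drop samples older than the window."""
        if self._first_seen is None:
            self._first_seen = timestamp
        self._samples.append((timestamp, value_pct))
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def exceeded(self) -> bool:
        """True only once a full window is observed and its average is high."""
        if not self._samples or self._first_seen is None:
            return False
        latest = self._samples[-1][0]
        if latest - self._first_seen < self.window_seconds:
            return False  # not enough history to cover the interval yet
        average = sum(v for _, v in self._samples) / len(self._samples)
        return average > self.threshold_pct
```

For the 90% over 8 hours example, a monitor would create `WindowedMetric(90.0, 8 * 3600)`, call `add` on every polling cycle, and start remediation when `exceeded()` returns true. Requiring a full window before reporting a breach is a design choice made here so that a short burst at startup does not trigger provisioning.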
You can use several cluster-level metrics maintained by vSphere vCenter to trigger elastic growth. These include:

Cluster-level metric | Definition
Cluster status, either red or yellow | This is the same status seen through the vSphere client.
Cluster CPU usage | A high-level percentage threshold of the total effective MHz of CPU usage by a cluster's physical hosts
Cluster memory usage | A high-level percentage threshold of the total effective memory usage by a cluster's physical hosts
Cluster hosts usage | A high-level percentage threshold of the effective usage of all available hosts
Datastore usage | A high-level percentage threshold of the total use of the datastores available to the cluster
Remediation

When a metric threshold is exceeded, the Vblock platform configured with Hyperic and vCenter can begin remediation. The metrics defined above can be specified separately for each level. Two levels of remediation are supported, as described in the following table:

Level of remediation | Description
Proactive remediation | Your configured system can proactively recognize the future need to expand a cluster. In this type of remediation, CPU, network, and storage hardware resources are provisioned from the Vblock platform but not activated. Once provisioned, those resources remain in a standby pool and are activated reactively, when an immediate demand is detected.
Reactive remediation | Your configured system searches for pre-provisioned resources and attempts to use them to meet an immediate demand to expand a cluster. If such resources are available, they are immediately activated and added to the cluster in need of additional hardware resources. If pre-provisioned resources are not available, your configured system attempts to immediately provision and activate those resources.

Your Configured System Architecture

Your system addresses two key system requirements for enterprises:

1. Infrastructure and IT elasticity needs for enterprise application development teams during the application lifecycle stages prior to production deployment of an application. Specifically, these are the development, build, integrate, and test steps for an application that is built by the enterprise and made available via software as a service (SaaS) to the enterprise's customers or employees.
2. Elasticity needs for cloud-deployed enterprise web applications in production, typically written in Java or Ruby/Rails using Spring Frameworks.

Your system has two main parts:

Performance Monitor: One Performance Monitor runs for each host or cluster of hosts monitored.
The Performance Monitor periodically communicates with the Hyperic Server and vCenter to collect real-time data about the state and status of a host and/or cluster. If any of the metrics defined above are met or exceeded, the Remediator is triggered into action.

Remediator: Remediation is proactive or reactive, depending on the definition of the metric exceeded. For proactive remediation, the goal is to provision a system consisting of CPU, network connectivity, and storage, but not to activate it. For reactive remediation, the goal is to activate a system and add it to an existing cluster known to a vCenter. The result of remediation is either an additional host or a larger cluster with additional compute, network, and storage resources.

The Remediator uses the following components:

The Unified Infrastructure Manager (UIM) is used to provision and activate new systems.
The storage controller is used to make each cluster storage volume visible to each host. A server component installed on the controller receives commands from the Remediator to perform LUN mapping when needed. You will need to write this component to execute the appropriate storage commands on the storage controller that you are using, such as Symmetrix.

Your remediation is done through a custom script or code written by you, the professional services group, or a partner.
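The two remediation paths handled by the Remediator can be outlined in a few lines. The sketch below is only a skeleton under stated assumptions: `provision` and `activate` are hypothetical callables that stand in for the custom code you would write against UIM, the storage controller, and vCenter, and no real UIM or vCenter API calls are shown.

```python
from collections import deque
from typing import Callable, Deque


class Remediator:
    """Hypothetical skeleton of proactive and reactive remediation."""

    def __init__(self,
                 provision: Callable[[], str],
                 activate: Callable[[str], None]) -> None:
        self._provision = provision  # stand-in for provisioning through UIM
        self._activate = activate    # stand-in for activation and cluster join
        self.standby: Deque[str] = deque()  # provisioned, not yet activated

    def proactive(self) -> str:
        """Provision a system and park it in the standby pool."""
        host = self._provision()
        self.standby.append(host)
        return host

    def reactive(self) -> str:
        """Activate a standby system, provisioning one first if none exists."""
        host = self.standby.popleft() if self.standby else self._provision()
        self._activate(host)
        return host
```

In this outline the Performance Monitor would call `proactive()` when the proactive metric is exceeded and `reactive()` when the reactive metric is exceeded. Falling back to immediate provisioning in `reactive()` reflects the behavior described for the case where no pre-provisioned resources are available.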
If you are using a DRS cluster, you should also add the new host system and create a new datastore on the newly provisioned storage. See the figure below:

Figure 8. Vblock platforms with Hyperic Server and vCenter

Assumptions

Your system is intended for vCenter clusters running with DRS active and set to run fully automatically. When there are performance issues with a cluster, DRS attempts its own remediation by moving existing virtual machines and their data from one host or datastore to another. Your configured system provides additional functionality: the elastic growth of a cluster after DRS has taken its own steps to alleviate performance or status issues.

Prerequisites

The Vblock platform and its accompanying software, including the Unified Computing System, the Symmetrix array controller, and UIM, must be installed, configured, and running before you configure and run your system. A vSphere vCenter must also be installed, configured, and running, along with a Hyperic Server and a Hyperic Agent for the vCenter.

Grading of Hardware

The blades, storage, and networking to be used in your system must be graded before use. This grading is a requirement of UIM. Only similar model blades should be used. These need to be graded as EXAMPLE_POOL using the UIM user interface. Similarly, the storage and network intended for your system should also be graded EXAMPLE_POOL. You can create your own _POOL name. For our example we used ALM_POOL.
Service Offering Definition

Before such a system can successfully provision and activate storage, a service offering named EXAMPLE_SERVICE_OFFERING must be created using the UIM user interface. You can create your own name. For our example we used ALM_SERVICE_OFFERING:

1. The service offering must be defined to use one blade from the ALM_POOL and two storage volumes from the ALM_POOL: a boot device and a data device.
2. The sizes of these devices can be set to what works best in the environment.
3. A single NIC must be defined that provides network access to the vCenter that manages the hardware.
4. The service offering must also be defined to install ESX 4.1 onto a newly provisioned system.

UIM Configuration

A list of IP addresses and system credentials must be supplied to your system so that it can assign them to newly provisioned systems. The following section provides an example of using your system.
Example for Use Case 2: Using the Vblock Platform with Hyperic, UIM, vCenter, and VMware ESXi

This section describes a Use Case 2 example that uses automatic elasticity to expand the physical size of a vCenter DRS managed cluster based on CPU performance. The Vblock platform operates with any Hyperic metric applicable to physical systems. It also supports a limited number of status metrics from vCenter. For simplicity, however, the example below shows how CPU load can influence elastic growth.

Initial Environment

In the initial environment, as illustrated in the following figure:

1. A vCenter DRS managed cluster named ALM_POOL is set to load balance automatically using an aggressive policy.
2. vMotion is enabled.
3. The cluster is backed by one physical host running ESXi 4.1.
4. There are two virtual machines running on the physical host. These machines run identical test code, which creates a CPU load on the virtual machines and the physical host. Combined, the test code running on both virtual machines contributes approximately 70% to the CPU load of the physical host.
5. The physical host runs on IP address 192.168.152.102.

Figure 9. Initial environment

Elastic system

For the elastic system running in the initial environment, the following two charts show the CPU utilization for the cluster, and then for the single available physical host, prior to the remediation.
This chart shows CPU utilization for the cluster (ALM_POOL) prior to remediation:
This chart shows CPU utilization for the single available physical host prior to remediation:

Action

Your system continually monitors the performance metrics of physical hosts and the status of clusters. The policy configured for this example specifies that standby provisioning is initiated when CPU load averages over 50% over an 8-hour period. For this example, this is assumed to have already occurred, so there is one physical host system on standby. When CPU load averages over 70% over an 8-hour period, activation provisioning is initiated.

Your configured system automatically adds a new physical host to the vCenter. It also configures a Distributed Virtual Switch, activates vMotion, and sets the time and date to match the single host already in the cluster. After a few moments, DRS automatically rebalances the load across the increased number of physical hosts. One of the two identical virtual machines is migrated to the new physical host.
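The two-level policy used in this example (standby provisioning above a 50% average, activation above a 70% average, both over 8 hours) reduces to a simple comparison once the 8-hour average is known. The function below is a hypothetical illustration of that mapping; its name and return values are not part of any product, and it assumes the caller has already computed the average CPU load for the interval.

```python
def remediation_level(avg_cpu_pct: float,
                      standby_pct: float = 50.0,
                      activate_pct: float = 70.0) -> str:
    """Map an interval-average CPU load to a remediation level.

    The defaults are the thresholds from this example. The return values
    ('none', 'proactive', 'reactive') are labels chosen for illustration.
    """
    if avg_cpu_pct > activate_pct:
        return "reactive"   # activate a standby host and add it to the cluster
    if avg_cpu_pct > standby_pct:
        return "proactive"  # provision a host into the standby pool
    return "none"
```

Checking the higher threshold first means a load that is over both thresholds is always treated as reactive, which matches the requirement that an immediate demand takes priority over standby provisioning.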
Final State

After adding the new host:

1. The vCenter DRS managed cluster named ALM_POOL is elastically extended to consist of two physical hosts with increased shared storage.
2. The initial physical host continues to run on IP address 192.168.152.102.
3. The newly provisioned and activated host runs on IP address 192.168.152.103.
4. There continue to be two virtual machines running on the cluster. One runs on one host and the second on the other.

Figure 10. DRS Cluster

The following charts show the CPU usage before and after the addition of this new host, as well as the decrease in activity on the first host once the load has been balanced across both hosts.
This chart shows CPU usage (in MHz) before and after the addition of the new host. The total available MHz of CPU increases as the new host is added, while the total CPU usage in MHz remains constant but is balanced across two hosts instead of one.
This chart shows CPU utilization for the original physical host. It demonstrates the drop in utilization caused by the migration of one of the two virtual machines to the new host:
This chart shows CPU utilization for the newly activated second physical host. It shows little CPU activity after initialization, up to the point at which one of the two virtual machines is migrated:

Result

The configured system is able to monitor physical hosts operating in the context of a vCenter DRS managed cluster. When performance metrics, such as CPU load, exceed the desired thresholds, your system automatically and elastically expands the compute environment to meet demand. In the example provided, monitoring indicated a need to elastically add another host.
Conclusion

The Vblock Solution for Application Lifecycle Platform combines the VCE Vblock Infrastructure Platforms and the VMware vFabric cloud application platform to provide the platform that modern applications need, from design through development, testing, staging, deployment, and management. It gives IT control over quality of service and security, while providing application owners and developers with flexibility and instant access to resources for addressing fluctuations in application demand.

Automatic elasticity provides applications with additional capacity as needed and removes it when it is no longer needed. The freed capacity can be used by other applications, which enables balanced use of resources among applications. Automatic elasticity lowers the operational cost per application. An Application Lifecycle Platform with automatic elasticity can be built using the Vblock platform with Hyperic, vCenter, and UIM.

The objectives of this solution architecture paper were to show how to automate provisioning steps using the Vblock platform and VMware management tools in order to provide automatic elasticity:

1. Monitoring running applications for increased workload that triggers a proactive or reactive response
2. Taking the necessary remediation action when workloads exceed the thresholds set in policies

These objectives were met by demonstrating the automatic elasticity of the Vblock Solution for Application Lifecycle Platform through the Use Case 1 and Use Case 2 examples, which illustrate how metrics gathered from monitoring application demand fluctuation can be used, based on configured policies, to provide automatic elasticity. When the system needs additional resources, the elastic environment provides them, and it also manages the virtual and physical server resources.

The Vblock Application Lifecycle Platform solution helps enterprise IT solve its capacity planning issues. It reduces the manual interventions IT must make to adjust to growing demand.
ABOUT VCE

VCE, the Virtual Computing Environment Company formed by Cisco and EMC with investments from VMware and Intel, accelerates the adoption of converged infrastructure and cloud-based computing models that dramatically reduce the cost of IT while improving time to market for our customers. VCE, through the Vblock platform, delivers the industry's first completely integrated IT offering with end-to-end vendor accountability. VCE's prepackaged solutions are available through an extensive partner network, and cover horizontal applications, vertical industry offerings, and application development environments, allowing customers to focus on business innovation instead of integrating, validating and managing IT infrastructure. For more information, go to www.vce.com.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." VCE MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright 2011 VCE Company, LLC. All rights reserved. Vblock and the VCE logo are registered trademarks or trademarks of VCE Company, LLC. and/or its affiliates in the United States or other countries. All other trademarks used herein are the property of their respective owners.