White Paper
Top Purchase Considerations for Virtualization Management

One Burlington Woods Drive
Burlington, MA 01803 USA
Phone: (781) 373-3540
© 2012. All Rights Reserved.
CONTENTS
Executive Summary
Challenges of Virtualization 2.0
Key Requirements
  Performance
  Efficiency
  Planning for the Future
Selection Criteria
  Real-Time Operation
  Closed-Loop Control
  Holistic View of the Entire IT Stack
  Rich Set of Corrective/Preventative Actions
  Planning Tools to Support Growth and Change
  Real-Time Monitoring
  Historical Reporting
  Integrated Suite for a Consistent, Systemic Workflow
  Proactive vs. Reactive: True Optimization
  Deployment Considerations
  Scaling to the Enterprise and Cloud
Conclusion
About VMTurbo

All trademark names are property of their respective companies.
VMT-WP-PURCHASE-612
EXECUTIVE SUMMARY
Many organizations have hit the critical scaling stage of their virtualization initiative, moving past simple consolidation projects and taking a virtualization-first stance on IT expansion. Chances are that size, complexity, and dynamic workload demands are creating new challenges for operations staff, making it difficult to keep pace. Virtualization management tools are available to regain control of the situation, but which one should you choose? Can legacy infrastructure management solutions adapt to virtualization- and cloud-era requirements? Or is there a new breed of virtualization management, purpose-built to meet today's unique requirements? This paper proposes purchase criteria for virtualization management solutions and details how VMTurbo Operations Manager measures up.

CHALLENGES OF VIRTUALIZATION 2.0
Virtual and cloud workloads fluctuate by season, quarter, day of the month, day of the week, time of day, and from second to second. At the same time, workload priorities fluctuate by application, business unit, time, and user. These conditions make it increasingly difficult for infrastructure managers to keep pace; they need agility. Cloud IT environments are increasingly shared, dynamic, constrained, and complex, and they need agility as well.

Today's IT management is struggling to meet these challenges with physical-era tools. Yesterday's tools fall short: they are siloed by technology and function, encumbered by collecting too much detailed data, and lack the intelligence to automate decision making. As a result, IT management cannot adapt to the demands of virtual and cloud environments. Yesterday's tools are also unable to orchestrate across multiple layers of services and infrastructure, leaving the heavy lifting to system administrators. Invariably, ad hoc approaches are used to address exceedingly complex problems.
Large, complex, dynamic environments are managed manually, resulting in performance degradation, inefficiencies, waste, and operations that cannot scale. This situation compromises IT's ability to deliver on the agility promise of virtualization and cloud computing.

Virtualization Infrastructure Management Checklist
- Real-time identification and execution of rich sets of actions to resolve and prevent problems, and maintain the environment in a healthy state
- Proactive analysis to avoid degradations and anomalies (vs. reacting to or detecting threshold violations)
- Planning tools (predictive analysis for a wide range of scenarios related to planning growth and change of virtual workloads and infrastructure)
- Closed-loop control
- True integration for a consistent and systemic workflow
- Single-pane-of-glass view that includes holistic analysis of the entire IT stack
- Real-time monitoring of infrastructure health, performance and utilization
- Historical reporting on key performance and efficiency metrics
- Rapid deployment
- Scaling to the enterprise and cloud
The good news is that a new generation of management tools has emerged to address these challenges. They fall broadly into two categories:
- Focused tools that provide a deep dive into a particular aspect of the IT stack (e.g., troubleshooting storage performance issues) or a portion of the closed-loop control process (e.g., monitoring tools or workload placement mechanisms).
- Integrated platforms (or "suites") that provide a broader view of the entire IT stack and attempt to close the control loop through monitoring, analysis, and control.

Deep-dive point tools are indispensable for troubleshooting the increasingly complex IT stack. At the same time, cloud environments call for a new paradigm of closed-loop management that ties viewing to doing through holistic, proactive, end-to-end management with built-in intelligence to automate management decisions.

Which management solution is right? The answer depends on many factors: the size and complexity of the environment, the diversity and relative importance of the virtualized applications, workload fluctuation, business goals, and more.
KEY REQUIREMENTS
There are three central challenges in keeping a virtualized environment running optimally:
- Assuring application performance (SLAs)
- Optimizing resource utilization and operating expenses
- Planning for the future

Performance
The top priority for any corporate IT environment is to deliver application performance that meets the required Quality of Service (QoS) goals for each application. To accomplish this, it's critical to know, as close to real time as possible, whether there are performance problems or infrastructure bottlenecks and what actions are needed to resolve them. The ultimate objective, however, is to prevent issues that impact QoS in the first place.

Efficiency
Of course, if application performance were the only requirement, organizations would simply over-provision resources to ensure QoS. However, this would return the environment to the low utilization associated with non-virtualized infrastructure and dilute the benefits of virtualization. Today, many organizations that have deployed virtual infrastructure are challenged to optimize the efficiency of the environment without compromising the service levels of virtual machines (VMs) and the applications they support.

Planning for the Future
Virtualization is an ongoing project for most organizations. Some of these projects are in their early stages; others are well on the way to the level of virtualization the organization desires. Tools are required to assist in growth planning and change management within a virtualized environment. Questions about necessary capital equipment investments need to be explored in terms of the behavior of VMs in the existing environment. The load profiles and capacity profiles of virtualization components (taking all resource needs and resource contention into account) should be used to produce accurate models that guide infrastructure build-out and optimization.
Assuring Application Performance
- Are there problems that are impacting applications?
- How can problems be prevented?
- How can key metrics be shared with stakeholders?

Optimizing the Environment
- How can resource capacity be maximized?
- How can operating expenses be reduced?
- How can more classes of workloads be supported?

Planning for the Future
- How can more applications be virtualized?
- What is the impact of changes in demand?
- What is the impact of hardware changes?
SELECTION CRITERIA
No two virtual data centers are the same. The optimal choice of management solution must be driven by the application requirements, virtual infrastructure, business requirements, and other factors specific to the environment. Some of the criteria to consider when choosing a virtualization management solution are identified here.

Real-Time Operation
Virtualized environments are dynamic, subject to rapid changes in workloads and infrastructure configurations. The ability to react rapidly to those changes therefore makes better efficiency and performance more likely.

Virtualized environments can have wildly fluctuating dynamics, at levels previously unseen in siloed IT environments. Usage spikes, whether from rapidly changing market or news events or from more regular daily, weekly, or seasonal surges, can severely tax the data center. If not properly managed, these fluctuations can cause performance bottlenecks, SLA violations, and even uptime failures.

In this increasingly dynamic world, what does it take to implement changes with a VM performance optimization tool? How frequently are the recommendations or corrective actions updated? What is entailed in making the changes? Is it necessary to review and implement them manually? What is the time between analysis and action: seconds, minutes, hours, or days? After all, some solutions are far from real-time, polling data as infrequently as once every 30 minutes from offline databases because of the scalability challenges of collecting large numbers of parameters across many entities.

VMTurbo is the only real-time solution that:
- Recommends (and automates execution of) a broad set of specific actions based on a wide range of performance metrics.
- Enables the IT agility required to address the continuously fluctuating demands of today's data centers.
Unlike other solutions that rely on a database to perform analysis, VMTurbo uses an in-memory analytics model, a key requirement for responding in real time. A database-driven approach lacks the performance required to manage thousands of entities (or more) on a real-time basis.

Closed-Loop Control
Collecting and reporting on performance metrics is fairly straightforward, which is one reason the marketplace is saturated with vendors providing monitoring and reporting tools. Arguably, the real value lies in the next step, keeping the data center in a healthy state: providing the specific workload management and infrastructure optimization actions that must be taken to prevent performance bottlenecks, maximize utilization, and minimize operating and capital costs.

The VMTurbo suite uses patented analytics to automate key virtualization management operational procedures, including complex workload balancing, real-time and continuous rightsizing, and virtual infrastructure performance and capacity management. Through powerful data correlation capabilities, VMTurbo provides line-of-sight into problems in real time. Only VMTurbo:
- Pinpoints the problems
- Identifies the problems' impact
- Recommends corrective actions
- Executes remedies to ensure optimal operations
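The monitor, analyze, recommend, execute sequence behind closed-loop control can be sketched in a few functions. This is a minimal, hypothetical illustration, not VMTurbo's patented analytics: the `Host` class, the 80% utilization limit, and the heuristic of moving the smallest VM to the least-loaded host that can absorb it are all assumptions chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                          # hypothetical unit, e.g. GHz of CPU
    vms: dict = field(default_factory=dict)  # VM name -> current demand (same unit)

    def utilization(self) -> float:
        return sum(self.vms.values()) / self.capacity


def monitor(hosts):
    """Monitor: take a utilization snapshot of every host."""
    return {h.name: h.utilization() for h in hosts}


def analyze(snapshot, limit):
    """Analyze: pinpoint hosts whose utilization exceeds the limit."""
    return [name for name, u in snapshot.items() if u > limit]


def recommend(hosts, problem_hosts, limit):
    """Recommend: move the smallest VM off each congested host to the
    least-loaded host that stays within the limit after the move."""
    by_name = {h.name: h for h in hosts}
    projected = {h.name: sum(h.vms.values()) for h in hosts}  # load after planned moves
    actions = []
    for name in problem_hosts:
        vm, demand = min(by_name[name].vms.items(), key=lambda kv: kv[1])
        fits = [h for h in hosts
                if h.name != name and (projected[h.name] + demand) / h.capacity <= limit]
        if fits:
            dst = min(fits, key=lambda h: projected[h.name] / h.capacity)
            projected[dst.name] += demand
            projected[name] -= demand
            actions.append((vm, name, dst.name))
    return actions


def execute(hosts, actions):
    """Execute: apply the recommended moves to the model."""
    by_name = {h.name: h for h in hosts}
    for vm, src, dst in actions:
        by_name[dst].vms[vm] = by_name[src].vms.pop(vm)


def control_cycle(hosts, limit=0.8):
    """One pass of the closed loop; a real system would repeat this continuously."""
    actions = recommend(hosts, analyze(monitor(hosts), limit), limit)
    execute(hosts, actions)
    return actions
```

For example, with `h1` at 90% (VMs with demands of 5 and 4 on a capacity of 10) and `h2` at 20%, one cycle moves the smaller VM to `h2`, leaving both hosts at or below 60%; a second cycle finds nothing to do.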
Holistic View of the Entire IT Stack
Bottlenecks and other challenges can pop up throughout the entire IT stack, so the virtualization management approach needs to be just as broad:
- Examining virtual machines, physical machines, and network and storage infrastructure.
- Monitoring a sufficiently broad set of parameters, including CPU and memory congestion, co-scheduling congestion, IO and network bottlenecks, over- and under-utilization, and more.

VMTurbo is the only solution taking a holistic approach across the IT stack, analyzing a broad range of performance metrics, with a rich set of actions to keep your environment healthy and efficient. As shown in Figure 1, VMTurbo monitors the entire IT stack, including VMs, physical machines, storage infrastructure, and more. VMTurbo detects problems and bottlenecks such as CPU and memory congestion, co-scheduling congestion, IO and network bottlenecks, over- and under-utilization, and others.

Figure 1. VMTurbo Monitors the Entire IT Stack
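One way to picture how a holistic view ties a low-level bottleneck to the workloads it affects is to model the stack as a graph of providers and consumers and walk it downstream. The sketch below is a hypothetical illustration: the entity names, the `CONSUMERS` map, and the breadth-first traversal are assumptions, not a description of how VMTurbo represents the environment.

```python
from collections import deque

# Hypothetical topology: each provider maps to the entities that consume its resources
# (datastores and hosts serve VMs; VMs serve applications).
CONSUMERS = {
    "datastore-1": ["vm-web", "vm-db"],
    "datastore-2": ["vm-batch"],
    "host-a": ["vm-web", "vm-batch"],
    "host-b": ["vm-db"],
    "vm-web": ["app-storefront"],
    "vm-db": ["app-storefront", "app-reporting"],
}


def impacted(consumers, problem_entity):
    """Return every entity downstream of problem_entity, in breadth-first order."""
    seen = {problem_entity}
    order = []
    queue = deque([problem_entity])
    while queue:
        entity = queue.popleft()
        for consumer in consumers.get(entity, []):
            if consumer not in seen:  # each entity is reported once
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order
```

With this topology, a latency problem on `datastore-1` surfaces as an impact on `vm-web` and `vm-db` and, through them, on both applications, which is the kind of cross-layer answer a siloed storage tool cannot give on its own.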
Rich Set of Corrective/Preventative Actions
Problems can arise in any layer of the virtual infrastructure, and resolving and preventing them requires a rich set of actions that the management solution can execute. A management solution must be able to take action across physical, virtual, storage, and network components.

As shown in Figure 2, VMTurbo recommends actions that maintain the optimal health of the environment. Actions can be executed automatically, or manually by clicking Resolve for suggested activities. To continuously maintain the health of the environment, VMTurbo considers real-time data, historical trends, and transient spikes. This provides the basis for a rich set of corrective and preventive actions across the entire infrastructure.

Figure 2. VMTurbo Recommends and Can Execute Actions
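Considering historical trends alongside transient spikes matters because a single high sample should not trigger the same action as sustained demand. A minimal sketch of that distinction follows; the 95th-percentile window, the 85% and 30% bounds, and the action names are hypothetical assumptions, not VMTurbo's actual decision logic.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def resize_recommendation(history, current, allocated, high=0.85, low=0.30, pct=95):
    """Suggest an action for one VM resource (hypothetical policy).

    history:   recent demand samples, in the same unit as allocated
    current:   the latest real-time sample
    allocated: the capacity currently allocated to the VM
    """
    sustained = percentile(history, pct) / allocated
    if sustained > high:
        return "resize up"                    # demand is persistently high
    if current / allocated > high:
        return "transient spike: no action"   # high right now, but not typical
    if sustained < low:
        return "resize down"                  # persistently over-provisioned
    return "no action"
```

Using a high percentile of history rather than the mean keeps regular peaks in view, while the separate check on the current sample stops a momentary spike from being mistaken for growth.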
Planning Tools to Support Growth and Change
Virtualization redefines IT fundamentals. Planning decisions can no longer be partitioned across silo boundaries: actions taken by one part of a functionally delineated organization can result in waste and inefficiency or, worse, performance problems. Manual, ad hoc management processes are too limited because:
- Changes may require reassignment of workloads.
- Complexity and waste can grow dramatically when the number of VMs increases, physical machines vary, constraints exist (e.g., storage access, security policies), or the rate of change is high.
- They can lead to costly inefficiencies.

VMTurbo Operations Manager's Planner is a wizard-based tool that aids in planning workload changes and hardware transformation, provisioning, and/or decommissioning. The results of a planning scenario show a summary of the target environment, utilization charts for resources, and the recommended actions to perform to achieve the target results (see Figure 3).

Figure 3. VMTurbo Operations Manager Planner
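A typical what-if question for a planner is "how many hosts will this workload mix need after growth?" It can be approximated with a classic bin-packing heuristic. The sketch below uses first-fit decreasing, a generic technique swapped in for illustration; it is not the algorithm behind VMTurbo's Planner, and it assumes a single resource dimension, identical hosts, and a hypothetical 25% headroom policy.

```python
def hosts_needed(vm_demands, host_capacity, headroom=0.25):
    """Estimate the number of identical hosts required, using first-fit
    decreasing bin packing on one resource dimension."""
    usable = host_capacity * (1 - headroom)  # capacity left after reserving headroom
    if any(d > usable for d in vm_demands):
        raise ValueError("a VM does not fit on an empty host")
    free = []  # remaining usable capacity on each host opened so far
    for demand in sorted(vm_demands, reverse=True):
        for i, space in enumerate(free):
            if demand <= space:          # first host with room wins
                free[i] = space - demand
                break
        else:
            free.append(usable - demand)  # no existing host fits: open a new one
    return len(free)
```

Comparing the result for today's demands with the result after adding projected workloads (for example, five VMs fitting on two hosts versus seven VMs needing three) gives a rough capital-planning answer; a real planner would also weigh memory, storage, IO, and placement constraints together.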
Holistic capacity planning involves the seven use cases and five key questions regarding workload placement and resource allocation illustrated in Figure 4.

Figure 4. Key Planning Questions

Real-Time Monitoring
A proper monitoring solution must give system administrators complete visibility into their virtual infrastructure by providing:
- Key metrics for managing virtual infrastructure health and assuring application performance
- Views that allow administrators to share the relevant subset of monitoring data with IT managers and application owners
- A problem log highlighting performance bottlenecks and their related elements at a glance
- A grouping capability to organize views of resources by criteria aligned to each user's technical or business needs

VMTurbo Operations Manager logs events for risks and opportunities that arise in the environment, as shown in Figure 5; these events justify the actions in the ToDo list. The log shows an icon for the severity of each risk or opportunity: critical, major, or minor.
Figure 5. VMTurbo Operations Manager Risks and Opportunities Log

As shown in Figure 6, VMTurbo Operations Manager provides a real-time, holistic view of all key physical and virtual resources (not just CPU and memory, but also IO, network, swapping, ballooning, and CPU ready queues). The Operations Manager Monitor provides insight into resource problems and bottlenecks, including CPU and memory congestion, co-scheduling congestion, IO and network bottlenecks, and over-/under-utilization.

Figure 6. VMTurbo Operations Manager Monitoring
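CPU ready is one of the less familiar metrics in the list above. vCenter reports it as a summation: the milliseconds a VM's vCPUs spent ready to run but waiting for a physical CPU during each sampling interval. The conversion below follows the formula VMware publishes for its performance charts (ready % = summation / (interval_seconds * 1000) * 100); dividing by the vCPU count to get a per-vCPU average is a common convention. It is offered as background on the metric, not as a description of how VMTurbo computes it, and the function name is hypothetical.

```python
REALTIME_INTERVAL_S = 20  # vCenter real-time performance charts sample every 20 seconds


def cpu_ready_percent(summation_ms, interval_s=REALTIME_INTERVAL_S, vcpus=1):
    """Convert a CPU ready summation (milliseconds of ready time accumulated
    over one sampling interval) into a percentage of that interval,
    averaged across the VM's vCPUs."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return summation_ms / (interval_s * 1000) * 100 / vcpus
```

For example, 1,000 ms of ready time in a 20-second real-time sample is 5%; a two-vCPU VM reporting 4,000 ms in the same interval averages 10% per vCPU. Longer chart intervals (for example, 300 seconds for the past-day view) must be passed in explicitly, or the percentage will be badly overstated.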
VMTurbo Operations Manager's heatmap quickly identifies over-utilized or under-utilized resources. Heat charts are available for applications, VMs, physical hosts, and datastores. While lots of green is a good sign, red (critically over-utilized), yellow (over-utilized), dark blue (under-utilized), and light blue (critically under-utilized) indicators show opportunities for improvement.

Figure 7. VMTurbo Operations Manager Heatmap

Drill down to see trends and inspect problem details in VMTurbo Operations Manager's Inventory Monitoring (see Figure 8). Inventory monitoring provides real-time answers to key questions: What is the health of the environment right now? Are there any problems, and what is their impact?

Figure 8. VMTurbo Operations Manager Inventory Monitoring Drill Down
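The heatmap's five colors amount to banding each entity's utilization. A minimal sketch follows; the band boundaries are hypothetical, since the actual thresholds behind Operations Manager's colors are not given here, and the function names are illustrative only.

```python
# Hypothetical band boundaries (fraction of capacity); values below each
# upper bound take that band's color.
BANDS = [
    (0.10, "light blue"),  # critically under-utilized
    (0.30, "dark blue"),   # under-utilized
    (0.75, "green"),       # healthy
    (0.90, "yellow"),      # over-utilized
]


def heat_color(utilization):
    """Map a utilization fraction to a heatmap color; anything at or above
    the last boundary is red (critically over-utilized)."""
    for upper, color in BANDS:
        if utilization < upper:
            return color
    return "red"


def heatmap(utilization_by_entity):
    """Color every entity (application, VM, host, or datastore)."""
    return {name: heat_color(u) for name, u in utilization_by_entity.items()}
```

Keeping the bands in one ordered table makes it easy to tune the boundaries per entity type without touching the classification logic.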
Historical Reporting
Managing a virtual infrastructure environment can be complex and time-consuming. It's often a challenge to assess which resources will run out of capacity as workloads continue to grow; when and where mission-critical applications and virtual machines are disrupted by high storage latencies; what storage can be reclaimed from sprawling dormant VMs; and which datastores will run out of space, and when. A reporting tool must enable you to:
- Determine when workload demands will saturate resources and require capacity expansion
- Rapidly identify and eliminate resource congestion and virtual infrastructure bottlenecks
- Explore key resources (CPU, memory, IO, network, swapping, ballooning, CPU ready queues, and datastore IOPS, latency, and space)
- Analyze storage usage, including key categories such as VMDK files, snapshots, logs, and configurations
- Identify wasted resources from dormant VM sprawl, wasted storage, and VMs that are rightsizing candidates
- Create management reports to justify plans for virtual infrastructure growth and change

The reporting function in VMTurbo Operations Manager gives administrators the ability to track, analyze, and trend the virtual infrastructure. This allows the IT organization to understand relevant performance, demand, and utilization trends; detect emerging bottlenecks; identify resource waste; and estimate capacity expansion needs. Figures 9 and 10 show examples of standard reports.

Figure 9. VMTurbo Operations Manager Offers Standard and Customizable Reports
Figure 10. VMTurbo Operations Manager identifies over- and under-utilized resources, performance bottlenecks, and trending patterns. The analysis leverages VMTurbo's patent-pending utilization index, which reflects both resource-specific and infrastructure-wide utilization patterns.
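Determining when workload demands will saturate a resource, or when a datastore will run out of space, is at heart a trend-extrapolation problem. A minimal sketch follows, assuming daily samples and a simple least-squares linear trend; real reports would likely also account for seasonality and spikes, and the function name and sample data are hypothetical.

```python
def days_until_full(daily_usage, capacity):
    """Fit a least-squares line to daily usage samples and extrapolate when
    it reaches capacity.

    Returns the number of days after the last sample, 0.0 if the fitted
    trend is already at or over capacity, or None if usage is flat or shrinking.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)                      # day index 0 .. n-1
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage))
    slope = sxy / sxx                  # growth per day
    intercept = mean_y - slope * mean_x
    last_fit = intercept + slope * (n - 1)
    if last_fit >= capacity:
        return 0.0
    if slope <= 0:
        return None
    return (capacity - last_fit) / slope
```

For a datastore that grew from 100 GB to 140 GB over five days against 200 GB of capacity, the trend is 10 GB per day and the forecast is six more days.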
Integrated Suite for a Consistent, Systemic Workflow
It is key that management tools and functions take consistent actions. For example, a performance optimizer's recommendation to shift workloads to an available host should not compromise the energy minimization provided by a capacity management tool. Similarly, actions recommended by a problem resolution tool should not compromise the performance improvements executed by the optimizer.

As organizations scale out virtualization and support mission-critical applications, it is imperative that they adopt consistent, repeatable, and automated service delivery processes to ensure SLAs are consistently met while containing operational costs. The VMTurbo suite supports a systemic life-cycle management process by helping administrators and IT leadership organize their workflow into a cycle of activities:
- Monitor system behaviors
- Resolve and eliminate problems
- Plan changes and growth
- Optimize utilization and performance

When evaluating operations management solutions, organizations should consider how broadly each can address these key steps in an integrated, scalable, and repeatable process. The steps are illustrated in Figure 11.

Figure 11. Key Steps in an Integrated, Scalable and Repeatable Process
Proactive vs. Reactive: True Optimization
Some VM management systems respond to problems, the implication being that the problem has already occurred, analogous to firefighters breaking down the front door and dousing a burning house with fire extinguishers. There are two limitations with this approach. First, waiting for a problem to occur before initiating action may result in a state far from optimal: by the time the problem has occurred, some damage may already have been done. For example, waiting for a threshold setting to trigger a VM move may leave the environment substantially unbalanced before the move. Second, and worse, the corrective action may be messy and cause other problems, resulting in a cascade effect. Moving a VM to another server may result in unexpected interactions among VMs, triggering a set of other VM moves and reactions that drive the environment into a worse state.

VMTurbo's Virtualization Management Suite not only resolves problems but also works proactively to keep the data center in a healthy state. It continuously adjusts workload placement and tunes the managed entities, keeping the performance of all the entities at similar levels. Unlike a threshold-based system that reacts to abnormalities when performance violates thresholds, VMTurbo maintains the data center in a much more stable and predictable state.

Deployment Considerations
An important topic that is sometimes overlooked is deployment. For example, does the application require setting up agents and authentication access across multiple stacks and regions? If it does, you may have to factor in a potentially complex deployment process involving multiple parties with different needs and priorities. Once deployed, this framework will have to be properly maintained over time, which may increase maintenance costs. Another important factor is how much burden the management solution imposes on the existing infrastructure. For example, it may add load to already overloaded corporate or management databases, or impose a heavy load on the SAN. A further factor is how much tweaking (e.g., setting up thresholds or custom policies) is needed before the management application begins to deliver value.

The VMTurbo suite is packaged in a single virtual appliance, making it easy to deploy, configure, and operate. The appliance is self-managing: it installs automatically and supports in-place software upgrades. Installed in minutes, the appliance automatically discovers and then monitors and analyzes the virtual infrastructure.

Scaling to the Enterprise and Cloud
Virtualization increases the importance of scalability by orders of magnitude, as it introduces thousands (or more!) of dynamically changing entities that need to be tracked holistically. When considering a management solution, explore the following metrics:
- Maximum number of managed entities (hosts, virtual machines, datastores, networks, etc.) a solution can support in a single application instance
- Time required to discover and present the entities
- Amount of storage needed
- Average time required to perform daily tasks such as exploring bottlenecks and suggested resolutions
- Ability to manage the rapidly growing environment from a single pane of glass

The VMTurbo suite is architected as an enterprise-class application, supporting distributed deployment and consolidating multiple data centers, clusters, applications, vCenters, and cloud-scale environments. It can scale operations to 10,000 VMs and beyond.
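The contrast between a threshold-based trigger and proactive balancing that keeps entities at similar levels can be made concrete with a toy example. The sketch below is a generic greedy heuristic, not VMTurbo's economic scheduling engine; host names, VM loads (expressed as fractions of one host's capacity), the 90% threshold, and the 10% tolerance are all hypothetical.

```python
def host_loads(placement, vm_load, hosts):
    """Sum VM loads per host; loads are fractions of one host's capacity."""
    loads = {h: 0.0 for h in hosts}
    for vm, host in placement.items():
        loads[host] += vm_load[vm]
    return loads


def threshold_trigger(loads, threshold=0.9):
    """Reactive style: flag only hosts that have already crossed the threshold."""
    return [h for h, u in loads.items() if u > threshold]


def rebalance(placement, vm_load, hosts, tolerance=0.1, max_moves=10):
    """Proactive style: while the gap between the hottest and coolest hosts
    exceeds the tolerance, move the VM that best narrows that gap."""
    placement = dict(placement)  # leave the caller's placement untouched
    moves = []
    for _ in range(max_moves):
        loads = host_loads(placement, vm_load, hosts)
        hot = max(loads, key=loads.get)
        cool = min(loads, key=loads.get)
        gap = loads[hot] - loads[cool]
        if gap <= tolerance:
            break
        # Moving a VM of size s from hot to cool changes this pair's gap to |gap - 2s|.
        candidates = [vm for vm, h in placement.items() if h == hot]
        best = min(candidates, key=lambda vm: abs(gap - 2 * vm_load[vm]))
        if abs(gap - 2 * vm_load[best]) >= gap:
            break  # no single move narrows the gap
        placement[best] = cool
        moves.append((best, hot, cool))
    return placement, moves
```

With three VMs (loads 0.4, 0.3, and 0.1) all on `h1` and `h2` idle, the threshold rule sees nothing wrong with a host at 80%, while the balancing pass moves the 0.4 VM and leaves both hosts at roughly 40%.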
CONCLUSION
Virtualization management selection criteria should be anchored around the three central challenges in keeping a virtualized environment running optimally: assuring application performance, optimizing resource utilization and operating expenses, and planning for the future. While many tools collect information about the environment and present it to operators or administrators, they do little to offer actionable advice or take proactive steps to meet these challenges.

VMTurbo Operations Manager delivers:
- Real-time identification and execution of rich sets of actions to resolve and prevent problems, and maintain the environment in a healthy state
- Proactive analysis to avoid degradations and anomalies (vs. reacting to or simply detecting threshold violations)
- Planning tools (predictive analysis for a wide range of scenarios related to planning growth and change of virtual workloads and infrastructure)
- Closed-loop control
- True integration for a consistent and systemic workflow
- Single-pane-of-glass view that includes holistic analysis of the entire IT stack
- Real-time monitoring of infrastructure health, performance and utilization
- Historical reporting on key performance and efficiency metrics
- Rapid deployment
- Scaling to the enterprise and cloud

IT organizations now require tools that are designed, developed, and delivered to meet the unique requirements of today's virtualization and cloud workloads. VMTurbo stands apart in providing the intelligence, automation, and orchestration capabilities to improve IT service delivery processes and deliver greater financial returns.

ABOUT VMTURBO
VMTurbo delivers an Intelligent Workload Management solution for cloud and enterprise virtualization environments. VMTurbo uses an economic scheduling engine to dynamically adjust resource allocation to meet business goals.
The VMTurbo platform first launched in August 2010, and since then more than 4,000 cloud service providers and enterprises worldwide have deployed it, including British Telecom, Omnicare, and L-3 Communications. Using VMTurbo, our customers ensure that applications get the resources they need to operate reliably while utilizing infrastructure and human resources in the most efficient way. VMTurbo is headquartered in Massachusetts, with offices in New York, California, the United Kingdom, and Israel.