New Management Stack for the Software Defined Data Center
Bernd Harzog, Analyst, Virtualization Performance and Capacity Management
bharzog@virtualizationpractice.com
#splunkconf
Bernd Harzog: Virtualization Performance and Capacity Management Analyst
Analyst and Consultant
Focused upon:
- Infrastructure Performance and Capacity Management of Virtualized Systems
- Application Performance Management
- Transaction Performance Management
- End User Experience Management
Clients include:
- Enterprises seeking virtualization performance management solutions
- Vendors offering solutions
Key Findings
- Virtualization introduces sharing and dynamic behavior
- Agile Development produces rapidly changing applications
- Both combine to require new tools, organizations, and management processes
Key Trends
- The demand for business functionality implemented in software is infinite (therefore so is the backlog)
- More and more software from different sources, from different tools, in different languages, on different run times
- Scaled out commodity deployment platforms
- Distribution of applications across data centers and private/public clouds
- Virtualization of business critical and performance critical applications
- More than one hypervisor in the enterprise
- Rapidly changing applications running on dynamic platforms
- The Software Defined Data Center will deliver dramatic benefits and create significant management challenges
Virtualization is Progressing: Business Critical Apps are Now VMware's Focus
Your New World
- Agile Development creates rapidly changing applications
- Built in diverse languages and running on diverse language runtimes
- Running on next generation deployment platforms
- Deployed on multiple virtualization platforms
- Running on scaled out commodity hardware
- Located in multiple clouds with multiple owners
[Diagram: Your Cloud, Hybrid Cloud, Public Cloud]
The Principles of the SDDC
Network Virtualization
What's Different about a Software Defined Data Center?
- Configuration, management, and some of the functional execution of CPU, memory, networking, and storage is done in the SDDC software
  Example: configuration of a virtual tunnel between two VMs across clusters or even virtual data centers
- Some of the services currently performed by dedicated hardware appliances will be performed by software plug-ins to the SDDC
  Example: load balancing and security
- Almost all of the configuration for a VM or for an N-tier application system can be done in one place, and will follow that workload around
- Since all of this configuration will be done in the SDDC software layer, and since it will all be exposed via APIs, configuration changes will occur much more frequently and easily
- Private clouds will be able to address a broad range of business critical applications, since the Cloud Management platform will be able to marshal the required resources automatically from the SDDC
- An SDDC supporting a private cloud will be a highly dynamic computing platform that continuously executes a variety of actions with a high degree of automation
Management Principles for the Software Defined Data Center
- Start Over
- Start with a new Reference Architecture: do not assume that any tool you have purchased automatically makes the cut
- Insist upon easy to try, easy to buy, easy to manage, and results in production before purchase
- Organize for the successful virtualization of business critical applications
- Define Performance as Latency and Response Time, not Resource Utilization
- Manage every application for performance, not just the 5% most painful and important ones
- Get Real Time, Deterministic and Comprehensive about Data Collection
- Design your management architecture for the distributed cloud case even if you are not there yet
If You Don't Believe Me!
Statistics Collection & Telemetry
- Another area of focus for an open networking ecosystem should be defining a framework for common storage and query of real time and historical performance data and statistics gathered from all devices and functional blocks participating in the network
- This is an area that doesn't exist today
- Similar to Quantum, the framework should provide for vendor specific extensions and plug-ins
- For example, a fabric vendor might be able to provide telemetry for fabric link utilization, failure events and the hosts affected, and supply a plug-in for a Tool vendor to query that data and subscribe to network events
http://cto.vmware.com/open-source-open-interfaces-and-open-networking/
The Worse than Useless Test
Apply this test to every single management product in your company
1. Does it operate on a real-time, continuous, and deterministic basis?
2. Does it support workloads distributed across data centers (yours and ones you rent (cloud))?
3. Does it work across your virtualization and cloud vendor environments?
4. Can it re-configure itself every time you change something in the environment or in the applications?
5. Can you support it and use it without the continuous presence of on-premises consultants from the vendor of the tool?
6. If it is a monitoring tool, does it focus upon response time and latency?
7. Can you try it, for free, in production, before you buy it or more of it?
If the answer is not Yes to all seven, junk the tool and start over
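The seven questions can be recorded as a simple pass/fail checklist. The following is a minimal sketch in Python, not part of the original presentation; the class name, field names, and the use of `None` for "not a monitoring tool" on question 6 are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ToolAssessment:
    """Hypothetical record of answers to the seven questions (names are illustrative)."""
    real_time_continuous_deterministic: bool
    supports_distributed_workloads: bool
    works_across_vendor_environments: bool
    reconfigures_itself_on_change: bool
    usable_without_resident_consultants: bool
    # Question 6 only applies to monitoring tools; None means "not a monitoring tool".
    focuses_on_response_time_and_latency: Optional[bool]
    free_production_trial: bool


def failed_criteria(assessment: ToolAssessment) -> List[str]:
    """Return the names of every question answered No (False)."""
    return [f.name for f in fields(assessment) if getattr(assessment, f.name) is False]


def keep_tool(assessment: ToolAssessment) -> bool:
    """A tool is kept only if no applicable question was answered No."""
    return not failed_criteria(assessment)
```

A tool that cannot re-configure itself after a change fails the test even if every other answer is Yes, and `failed_criteria` then reports which question caused the failure.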
Starting Over: Rethink ITIL and the CMDB
- ITIL is designed to get you to document and slow down the rate of change
- Don't tell the Change Control Committee about vMotion!
- Your CMDB will never be able to keep up with the rate of change in a Software Defined Data Center
- Every configuration change needs to be tracked in real time, and cross-correlated with performance degradations and resource contention
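One simple way to picture the cross-correlation of configuration changes with performance degradations is a time-window lookup: when a degradation is detected, list the changes that landed shortly before it. This is a minimal sketch under stated assumptions, not a description of any vendor's product; the `ConfigChange` type, the `changes_preceding` function, the five-minute window, and the sample events are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class ConfigChange(NamedTuple):
    """Hypothetical configuration change event captured in real time."""
    when: datetime
    description: str


def changes_preceding(
    degradation_at: datetime,
    changes: List[ConfigChange],
    window: timedelta = timedelta(minutes=5),  # assumed look-back window
) -> List[ConfigChange]:
    """Return changes made within `window` before the degradation, newest first."""
    candidates = [
        c for c in changes
        if timedelta(0) <= degradation_at - c.when <= window
    ]
    return sorted(candidates, key=lambda c: c.when, reverse=True)


# Illustrative events only; none of these come from the presentation.
changes = [
    ConfigChange(datetime(2013, 10, 1, 9, 0), "vMotion of VM app01 to another host"),
    ConfigChange(datetime(2013, 10, 1, 9, 58), "virtual tunnel reconfigured"),
    ConfigChange(datetime(2013, 10, 1, 10, 1), "load balancer rule changed"),
]
suspects = changes_preceding(datetime(2013, 10, 1, 10, 0), changes)
```

Here only the 9:58 change is returned for a degradation at 10:00: the 9:00 change falls outside the window and the 10:01 change happened after the degradation. The lookup is only useful if changes are recorded as they happen, which is the point the slide makes about the CMDB.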
Starting Over: Rethink ITIL Business Service Management
- There will be no time to "Design Services"; they will need to be discovered automatically as they are put into production
Starting Over: Legacy Management Solutions Will Never Be Able to Cope With the SDDC
[Image: Blind Dinosaur]
- A Software Defined Data Center changes too frequently for legacy management solutions to be able to keep up
- Legacy solutions cannot be incrementally modified to be able to cope with the SDDC
- Gluing a new product from an acquired startup to the side of a legacy management solution cannot fix a fundamentally broken approach
- Put the dino in a cage and do not let him out: build a new management stack for your SDDC, and isolate the dino to your legacy physical environment
Gartner is Not Going to Be Much Help Either
- Gartner used to cover Operations Management tools in its IT Event Correlation and Analysis Magic Quadrant
- That MQ was last published in December 2012, and was retired in 2013
- Gartner has not yet come up with a replacement MQ that includes legacy vendors like IBM, BMC, HP and CA, as well as newcomers like VMware vCenter Operations, Dell vFoglight, Microsoft SCVMM, VMTurbo, etc.
Insist upon the New Way of Trying, Implementing, and Buying Management Software
The Old Way
- Rep takes the CIO to play golf
- Enterprise software deal gets signed
- Some products work, others don't
- People go around the ELA to get the tools they need
The New Way
- You get to download and use the software in production first
- You prove to yourself that it really does work and adds value in your environment
- Then (and only then) do you buy it
Organize for Virtualization of Critical Applications, Agility, and Success
Virtualization is Just One Team
Virtualization and Application Operations are THE Teams
[Org chart: Data Center Operations]
- Virtual Operations: Systems Engineering, WAN Team, LAN Team, Windows Server Team, Linux Server Team, SAN Team, Storage Team
- Application Operations: Programmer/Analyst Team, Java Server Team, Web Server Team, Database Team
- Support: Tier 3 Support, Tier 2 Support, Tier 1 Help Desk
Notes
- The existing IT Operations Organization will not be able to cope with the SDDC or the clouds that run on it
- Virtualization pervades IT Operations, and becomes Virtual Operations
- Application Operations is responsible for the performance of every application in production (purchased and custom developed)
Performance ≠ Resource Utilization
Performance = Response Time & Latency
The Root of All Evil
- CPU and Memory are horrible indicators of performance
- Latency is the appropriate measure of infrastructure performance
- Response Time is the appropriate measure of application performance
Pick The Right Vendors
A Reference Architecture for your SDDC Management Stack
The SDDC Management Stack
- Self-Learning Analytics
- Big Data Repository
- Cloud Management
- App Performance Mgmt
- Infrastructure Perf. Mgmt
- Operations Mgmt
- Security*
- Data Protection*
- Automation & Orchestration
* Not Covered in this Presentation
Surgeon General's Warning
Trying to use software products that do not exist (or do not work yet) is bad for your health
Big Data Repository: We Need a Multi-Vendor Management Data Store!
Key Functionality
- All management products should feed one data store
- One version of the truth as to the state of the SDDC
- Since the SDDC is one Domain:
  - The only feasible way to do entire-domain root cause and reporting
  - The only feasible way to do entire-domain analytics
Potential Vendors
- VMware (Log Insight)
- Splunk
- Cloudera (Hadoop)
- 10gen (MongoDB)
- NuoDB
- Pivotal (HVE)
[Diagram: the SDDC Management Stack]
Operations Management
Key Features
1. Host and guest resource utilization monitoring
2. Capacity Mgmt & Planning
3. Used by IT Operations
Example Vendors: Cirba, CloudPhysics, HP (VPV), ManageEngine, Quest (vOperations), Reflex Systems, SolarWinds, Splunk, Veeam, VMTurbo, VMware vC Ops, Zenoss
Key Criteria for Resource Based Performance and Capacity Monitoring
- Out of the box value: if it is not providing value in 10 minutes, junk it and find something else (auto-discovery is key)
- Collect data from vCenter AND the other virtualization platforms that you support or plan to support
- Look for the integration of performance management, capacity management, and configuration management
- Collecting, dashboarding, alerting, and reporting on vCenter data is commodity functionality; look for value in analytics and automation
Infrastructure Performance (Latency) Management
Key Features
1. Understanding of end-to-end infrastructure performance
2. Capacity management and planning
3. Infrastructure response time is the key metric
4. Used by the team supporting the virtual infrastructure
Example Vendors: AppNeta, ExtraHop Networks, Riverbed, SevOne, Virtual Instruments, Xangati
[Diagram: Servers, Network Fabric, SAN Fabric, Storage]
Key Criteria for Infrastructure Response Time Solutions
- Measure IRT: Monitor how long it takes the infrastructure to respond to requests for work, not how much resource it takes
- Deterministic: Get the real data, not a synthetic transaction, or an average
- Real Time: Get the data when it happens, not seconds or minutes later
- Comprehensive: Get all of the data, not a periodic sample of the data
- Zero-Configuration (Discovery): Discover the environment and its topology, and keep this up to date in real time
- Application (or VM) Aware: Understand where the load is coming from and where it is going
- Application Agnostic: Work for every workload or VM type in the environment irrespective of how the application is built or deployed
Example: Infrastructure Performance Management & Real Time Metrics
- Knowing whether performance is good or not all of the time requires measuring performance in a comprehensive, deterministic, and real time manner
- Averaging good transactions with bad transactions obscures the true nature and impact of the bad transactions
[Figure: VMware vCenter 5 Minute Average Data compared with Virtual Instruments VirtualWisdom Real Time Data]
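A small worked example shows how an average hides bad transactions. The numbers below are invented for illustration and are not taken from the vCenter or VirtualWisdom data on the slide: assume one latency sample per second over five minutes, with 290 samples at 2 ms and 10 samples at 500 ms, and an assumed "slow" threshold of 100 ms.

```python
from statistics import mean

# Hypothetical data: 300 one-second latency samples (milliseconds) in a five-minute interval.
latencies_ms = [2.0] * 290 + [500.0] * 10

SLOW_MS = 100.0  # assumed threshold for a "bad" sample

five_minute_average = mean(latencies_ms)                  # what an averaged view reports
worst_sample = max(latencies_ms)                          # what a real-time view can see
slow_samples = sum(1 for v in latencies_ms if v > SLOW_MS)

print(f"5-minute average: {five_minute_average:.1f} ms")
print(f"worst sample:     {worst_sample:.1f} ms")
print(f"slow samples:     {slow_samples} of {len(latencies_ms)}")
```

The average works out to 18.6 ms, which looks healthy against a 100 ms threshold, even though ten samples took 500 ms each. Only the per-sample view reveals them.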
Application Performance Management
Key Features
1. Understanding of app response time across the application system
2. Used by Operations and Application Support
Example Vendors: AppEnsure, AppDynamics, AppFirst, AppNeta, BlueStripe, Boundary, Confio Software, Correlsense, Compuware (dynatrace), ExtraHop Networks, HP (Performance Anywhere), New Relic, Quest (Foglight), Riverbed
[Diagram: agents deployed across the application system]
APM is Not Just for Custom Applications
AppOps = Every Application!
Legacy
- Custom Developed Apps (DevOps): CA/Wily, HP Diagnostics, IBM ITCAM, Precise
- Every App (AppOps): BMC Patrol, NetIQ, HP BAC, CA Unicenter/Spectrum
Modern
- Custom Developed Apps (DevOps): AppDynamics, AppNeta (TraceView), Compuware, HP (Perf. Anywhere), New Relic, Quest (Foglight)
- Every App (AppOps): AppEnsure, AppFirst, BlueStripe, Boundary, Confio Software, Correlsense, ExtraHop, Riverbed
Key Criteria for Application Response Time Solutions
- Measure Actual Application Response Time: How long did it take, not how much resource it used
- Breadth of Application Support: Ideally support every application running in the environment automatically (conflicts with depth)
- Depth of Root Cause Diagnostics: Provide deep analysis into the application stack for root cause (conflicts with breadth)
- Deterministic: Get the real data, not a synthetic transaction, or an average
- Real Time: Get the data when it happens, not seconds or minutes later
- Comprehensive: Get all of the data, not a periodic sample of the data
- Application Discovery and Topology Mapping: Automatically discover new applications and their topology, and keep this up to date automatically and continuously
- Analytics and Baselining: Avoid manual thresholds, learn normal behavior, and alarm based upon deviations from normal
- Public Cloud Ready: Allow applications to be distributed across organizational boundaries, and have monitoring work with no firewall work
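The "Analytics and Baselining" criterion can be pictured with a very simple statistical baseline: learn the mean and standard deviation of recent response times, then alarm when a new measurement deviates by more than a few standard deviations. This is only a sketch of the general idea, not how any listed vendor implements it; the function name, the choice of three standard deviations, and the sample history are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_baseline(
    history_ms: Sequence[float],
    observed_ms: float,
    k: float = 3.0,  # assumed sensitivity: number of standard deviations tolerated
) -> bool:
    """Return True when the observation falls outside mean +/- k * stdev of the history."""
    if len(history_ms) < 2:
        return False  # not enough data to have learned "normal" yet
    baseline = mean(history_ms)
    spread = stdev(history_ms)
    if spread == 0:
        return observed_ms != baseline
    return abs(observed_ms - baseline) > k * spread


# Hypothetical response times (ms) learned during normal operation.
normal_response_times = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
```

With this history the learned baseline is 100 ms with a standard deviation of 2 ms, so a 101 ms response is treated as normal and a 250 ms response raises an alarm, without anyone having set a manual threshold.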
Examples: Dynamic, Continuous, Real-Time Application Response Time
[Screenshots: AppEnsure, dynatrace, AppDynamics, BlueStripe]
Cloud Management
Key Features
1. Automated Provisioning of Services
2. Presentation of Services in a Service Catalog
Example Vendors: BMC CLM, Cisco (Cloupia), Citrix (Cloud.com), CloudBolt Software, Embotics, Eucalyptus, FluidOps, Piston Cloud (OpenStack), ServiceMesh, VirtuStream, VMware vCAC
The Three Phases of Cloud Management
1) AWS Clone Phase (Self-Service from IT)
- Let IT offer what AWS offers
- Probably not as easy
- Probably not as flexible
- Probably not as cheap
- Why the first generation of Cloud Management failed
2) Tactical IT Agility Phase (Automated Provisioning)
- Automates provisioning of tactical and simple production applications
- Does not address anything that really matters to the business
- Where we are now
3) Enterprise Application Phase (Lifecycle Management)
- Automate the management of the applications that matter (DevOps, SAP)
- Address the core of what IT does day in and day out
- The strategy for the enterprise capable Cloud Management vendors
IT Automation in Your SDDC
[Diagram labels: Puppet, Chef, Populate the Image, Legacy Automation Process, vFabric AppDirector, Assemble the Application]
Self-Learning Analytics: The Only Way to Keep up with your SDDC
- The right organization, the right tools, and the right data
- Combined with the right self-learning analytics
- Leads to an automated, entire-stack Root Cause Analysis process
Vendors shown: Prelert, VMware vC Ops, Netuitive
[Diagram: the SDDC Management Stack, with Real Time, Deterministic and Comprehensive Data feeding Self-Learning Analytics]
Before You Try to be Predictive
- Instrument your infrastructure for end-to-end latency (Infrastructure Performance Management)
- Implement a real-time operational data store that can keep up with the rate of change in your virtual environment
- Implement a modern Developer focused APM solution for your critical custom developed applications
- Implement an Operations focused APM solution to measure response time for every application
- Get as real time, deterministic, and comprehensive as possible with all of your response time and latency metrics
- Reorganize and implement an Application Operations function staffed with application domain experts
- Operationalize finding and fixing problems in real time
- Then and only then try to get truly predictive
Evaluation Criteria for Performance Analytics
- How automated is the learning (really)?
- Diversity of accepted data (time series, events)
- Frequency and quantity of data inputs
- Breadth of plug-ins to the monitoring products you own, or are going to own
- Process for learning (handling) normal events
- Tradeoffs between false positives (false alarms) and false negatives (you missed something)
- Ease of implementation (time and cost)
- Quality of the Analysis (can you trust it?)
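The tradeoff between false positives and false negatives can be made concrete by replaying labelled history through an analytics product at different alarm sensitivities and counting both kinds of error. The sketch below assumes the product emits a numeric anomaly score per event and that past events have been labelled as real incidents or not; the function name, scores, and labels are invented for illustration.

```python
from typing import Sequence, Tuple


def count_errors(
    scores: Sequence[float],
    was_incident: Sequence[bool],
    threshold: float,
) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when alarming at score >= threshold."""
    false_positives = sum(
        1 for s, real in zip(scores, was_incident) if s >= threshold and not real
    )
    false_negatives = sum(
        1 for s, real in zip(scores, was_incident) if s < threshold and real
    )
    return false_positives, false_negatives


# Hypothetical anomaly scores and whether each event turned out to be a real incident.
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
was_incident = [False, False, True, False, True, True]

sensitive = count_errors(scores, was_incident, threshold=0.30)  # alarms often
strict = count_errors(scores, was_incident, threshold=0.85)     # alarms rarely
```

With these sample values the sensitive setting produces two false alarms and misses nothing, while the strict setting produces no false alarms but misses two real incidents. Running a candidate product against your own labelled history in this way gives a concrete basis for the "can you trust it?" question.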
The Reference Architecture with VMware Management Solutions
The SDDC Management Stack: The VMware Implementation
- Self-Learning Analytics: vC Ops & Log Insight Analytics
- Big Data Repository: Log Insight
- Cloud Management: vCloud Automation Center
- App Performance Mgmt: Partner Solutions
- Infrastructure Perf. Mgmt: Future Networking Instrumentation
- Operations Mgmt: vCenter Operations Manager
- Security: vShield
- Data Protection: VDP & SRM
- Automation & Orchestration: vC Orchestrator and Puppet
The first vendor of an SDDC (VMware) will be the first vendor of an SDDC Management Stack (VMware)
A Reference Architecture for your SDDC Management Stack
The SDDC Management Stack
- Self-Learning Analytics: Netuitive, Prelert
- Big Data Repository: Splunk
- Cloud Management: CloudBolt, Embotics, FluidOps, ServiceMesh, VirtuStream
- App Performance Mgmt: AppDynamics, AppEnsure, AppFirst, AppNeta, BlueStripe, Boundary, Compuware, Correlsense, ExtraHop, INETCO, New Relic, Riverbed
- Infrastructure Perf. Mgmt: Confio, ExtraHop, GigaMon, Virtual Instruments, Xangati
- Operations Mgmt: Cirba, CloudPhysics, Dell, HP, Hotlink, VMTurbo, Zenoss
- Security*
- Data Protection*
- Automation & Orchestration: Puppet, Chef, Cloud Sidekick, Intigua
* Not Covered in this Presentation
Candidate Vendors to Manage Your SDDC
- Perf. & Cap Mgmt: Cirba, CloudPhysics, Hotlink, HP (VPV), Intigua, ManageEngine, Quest, Reflex Systems, SolarWinds, Splunk, Veeam, VMTurbo, Zenoss, vC Operations
- Infr. Perf. Mgmt: AppNeta, ExtraHop Networks, Riverbed, SevOne, Virtual Instruments, Xangati
- App Perf. Mgmt: AppEnsure, AppDynamics, AppFirst, AppNeta, BlueStripe, Boundary, Confio Software, Correlsense, Compuware, ExtraHop, HP (Perf. Anywhere), New Relic, Quest (Foglight), Riverbed
- Cloud Management: BMC CLM, Cisco/Cloupia, Citrix (Cloud.com), CloudBolt, Embotics, Eucalyptus, FluidOps, Piston Cloud, ServiceMesh, VirtuStream, vCAC
- Automation: Puppet, Opscode (Chef), ScaleXtreme, App Director
- Self-Learning Analytics: Netuitive, Prelert, vCenter Operations
Virtualization Platform (vSphere, vCloud, Hyper-V, KVM, XenServer)
One Final Point (Wrap Up)
- In this industry we are great at inventing things to solve problems that we did not know that we had
- The PC, the LAN, Client/Server, the Internet, Java, Server Virtualization, VDI, Clouds and Smartphones are all innovations that targeted previously unknown problems
- We are very good at propagating these innovations throughout enterprise organizations worldwide
- Every time we do this we forget about managing the innovation before we deploy it
- If you buy the right management products at the right time you can avoid repeating this mistake with your SDDC
Thank You