1 Splunk for VMware Virtualization Marco Bizzantino Vmug - 05/10/2011
2 Collect, index, organize and correlate to gain visibility into all IT data. Using Splunk you can identify problems, patterns, threats and opportunities that help you improve IT and business decisions.
3 Real Time indexing
4 Search and Investigate
5 Interact with search results
6 Correlate complex events
7 Analyze and Report
8 Build custom Dashboards
9 Deploy IT Apps
10 Centralizes Data Across the Environment
- Universal Forwarder sends data to Splunk from remote systems
- Uses minimal system resources, easy to install and deploy
- Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
11 Scales to TBs/day and Thousands of Users
- Automatic load balancing linearly scales indexing
- Distributed search and MapReduce linearly scale search and reporting
12 Runs Across Datacenters
- Distributed search unifies the view across locations
- Role-based access controls how far a given user's search will span
13 Getting Data into Splunk: Agent and Agent-less Approach for Flexibility
[Diagram: agent-less data input and Splunk Forwarder]
- syslog TCP/UDP: syslog compatible hosts and network devices
- Mounted File Systems: \\hostname\mount
- WMI: Event Logs, Performance, Active Directory
- Local File Monitoring: log files, config files, dumps and trace files
- Windows Inputs: Event Logs, performance counters, registry monitoring, Active Directory monitoring
- Scripted Inputs: shell scripts, custom parsers, batch loading
- Sources: Unix, Linux and Windows hosts; Windows hosts; virtual hosts; custom apps and scripted API connections
14 Universal Data Forwarder
Forward data without negatively impacting production performance.
- Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
- Extends Splunk data fabric to large scale private cloud and desktop environments
- Uses minimal system resources, easy to install and deploy
- Monitor files, changes and the system registry; capture metrics and status
[Diagram labels: Logs, Messages, Configurations, Metrics, Scripts; Universal Forwarder Deployment; Central Deployment Management]
15 IT Infrastructures Are Complex, Expensive and Mission Critical
- Explosion in the number of servers, their differences and interdependencies
- Growing volumes of IT data, captive in silos, all managed separately
- Diagnosing and fixing issues is too time-consuming and manual
- Relentless focus on IT efficiency
16 Virtualization Increases Complexity
- Virtualization separates applications from hardware
- Virtualization allows sharing of infrastructure, but problems also get shared!
- Troubleshooting issues with application service levels now needs one more layer of visibility
17 Why Splunk Is Unique for Complex, Distributed Environments
- Universal. Works with ALL IT infrastructure data from a single place, without complex parsers or adapters. Centralized views spanning virtual and physical environments provide visibility and correlation across all layers.
- Proactive. Monitor and alert for early warning signs to preempt performance degradation and catch issues before they affect services.
- Flexible. Analyze any type of problem with rapid drilldown into source data to precisely pinpoint root cause; adapts quickly to any type of change in your environment.
- Massively Scalable. Scale linearly with commodity hardware; scale horizontally across Splunk instances.
- Easily Customizable. Create reports, dashboards and views on the fly and in minutes. Customize views depending on the type of reporting needed (example: business-level vs device-level metrics).
18 Splunk for VMware
- Gain deep insights into virtual environments with correlation across the application, hypervisor and hardware tiers.
- Scalable and extensible monitoring for all elements of the virtual infrastructure.
19 What Splunk Monitors in Virtual Environments
Splunk natively works with data generated by every layer of the stack:
- ESX/vSphere logs can be sent over syslog to Splunk
- VC logs and events can be directly tailed by Splunk
- Collect from within virtual machines
- VMware app to pull cluster, host and virtual machine metrics
[Diagram: VMware vCenter Server and VMware vSphere; data sources: VC logs and events, application configurations/logs/metrics, OS logs/metrics, kernel logs, network device logs, storage access logs, host level logs]
20 Only Splunk Can:
- Find errors and issues hidden in host log files and correlate them with application performance slowdowns or outages
- Detect hypervisor functionality failures (e.g. HA/DRS failures for VMware environments)
- Persist all events for compliance and security
Coming in the future:
- Display critical host and virtual machine metrics/monitor configurations and correlate with application behaviour
- Persist VC data and use it for historical analysis and trending
21 Features
Index
- Remotely indexes all of the logs, metrics and configurations from all the applications and operating systems, hypervisors and the underlying infrastructure
Search
- Pre-defined searches accelerate troubleshooting across dynamic virtual environments
- Instantaneous free form search across all IT data: apps, guests, VMs, physical host and the network
- Find information hidden in logs without having to log in to multiple, individual hosts or virtual machines
22 Features
Alert
- Pre-defined alerts notify administrators of common performance and resource contention issues
- Root cause investigation searches can be saved as new alerts to improve monitoring coverage over time
- Automated actions using management APIs
Report
- Pre-defined reports and dashboards provide management visibility into workload and service levels within virtualized environments
- Custom and ad-hoc reports can be created easily
- No schema to maintain. Identify fields and report on identified fields on the fly
- Persist transient data and flexibly report on it to meet compliance requirements
23 Example Scenario 1:
Symptom: Application performance slows down
Diagnosis:
- Splunk dashboards for the application show normal CPU/memory availability
- Splunk dashboards for ESX indicate SCSI aborts on the host running the application
Root cause:
- Virtual machine is connected on the backend to a shared storage LUN where many other busy VMs reside
- Virtual machine running the application is encountering storage conflicts
24 Example Scenario 2: HA Heartbeat network
Symptom: Applications on a particular host suddenly get powered off
Diagnosis: Splunk shows syslog entries for the particular IP address as missing for more than 12 seconds but less than 15 seconds
Root cause: Sometimes VMware features (ironically, High Availability) will cause applications to get powered off. The HA heartbeat network goes down; 12 seconds later the VMs get powered off; if the network resumes within 3 seconds after that, they don't get restarted.
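The diagnosis in Scenario 2 comes down to finding silences of a particular length in one host's syslog stream. Below is a minimal sketch of that check in plain Python (not Splunk search syntax). The function name, the sample timestamps and the default thresholds are illustrative assumptions based only on the 12 and 15 second figures on the slide.

```python
from datetime import datetime, timedelta


def find_heartbeat_gaps(timestamps,
                        low=timedelta(seconds=12),
                        high=timedelta(seconds=15)):
    """Return (start, end) pairs where two consecutive events are
    more than `low` but less than `high` apart."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        delta = curr - prev
        if low < delta < high:
            gaps.append((prev, curr))
    return gaps


# Hypothetical syslog event times for a single host IP address.
base = datetime(2011, 10, 5, 9, 0, 0)
events = [base + timedelta(seconds=s) for s in (0, 5, 10, 23, 28, 60)]

gaps = find_heartbeat_gaps(events)
for start, end in gaps:
    print(f"silence from {start} to {end} ({(end - start).seconds}s)")
```

Only the 13-second silence is reported; the 32-second silence falls outside the window and would point to a different kind of outage.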
25 Example Scenario 3: HA Heartbeat network
Symptom: Storage accesses by virtual machines are very slow
Diagnosis: Splunk dashboard indicates a huge increase in entries in vmware-x log files
Root cause: VMware HA wrongly tries to restart some virtual machines. Logs are flooded with the warning messages below:
WARNING: Swap: vm 13001: 1480: Failed to create swap file '/volumes/datastorename/vmfolder/vmname-6bc43c2b.vswp': Out of resources
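To show how a log flood like the one in Scenario 3 could be quantified, here is a rough Python sketch that counts swap-file failures per virtual machine. The regular expression is derived only from the single warning line quoted on the slide, so treat it as an assumption about the message layout; the function and variable names are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one warning line shown on the slide (assumption).
SWAP_FAILURE = re.compile(
    r"WARNING: Swap: vm (?P<vm_id>\d+): \d+: "
    r"Failed to create swap file '(?P<path>[^']+)': (?P<reason>.+)"
)


def count_swap_failures(lines):
    """Count swap-file creation failures per VM id."""
    counts = Counter()
    for line in lines:
        match = SWAP_FAILURE.search(line)
        if match:
            counts[match.group("vm_id")] += 1
    return counts


warning = (
    "WARNING: Swap: vm 13001: 1480: Failed to create swap file "
    "'/volumes/datastorename/vmfolder/vmname-6bc43c2b.vswp': Out of resources"
)
sample_lines = [warning, "some unrelated log line", warning]

failures = count_swap_failures(sample_lines)
for vm_id, n in failures.most_common():
    print(f"vm {vm_id}: {n} swap file failures")
```

A sudden jump in these counts for a handful of VM ids is the kind of signal the dashboard in this scenario would surface.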
26 Other Nuggets in Hypervisor Logs:
Host level logs contain critical events like:
- VMotion failed due to virtual hardware misconfiguration
- connectivity lost: for networking and storage (vSphere events have vprob preceding them)
- vmfs volume locked: possibly because the host crashed while accessing a volume
- APD: all paths are dead
- vprob.net.migrate.vmknic: migration failures due to NIC misconfiguration
- Storage misconfiguration issues
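The event types listed above could be tagged with a simple keyword classifier, sketched here in Python. The match strings are taken from the phrases on the slide; real hypervisor log messages may be worded differently, so both the patterns and the sample lines are assumptions for illustration.

```python
import re

# (category, pattern) pairs; patterns are guesses based on the slide wording.
EVENT_PATTERNS = [
    ("migration_nic_misconfig", re.compile(r"vprob\.net\.migrate\.vmknic")),
    ("vmotion_failed", re.compile(r"vmotion failed", re.IGNORECASE)),
    ("connectivity_lost", re.compile(r"connectivity lost", re.IGNORECASE)),
    ("vmfs_volume_locked", re.compile(r"volume locked", re.IGNORECASE)),
    ("all_paths_dead", re.compile(r"\bAPD\b|all paths are dead", re.IGNORECASE)),
]


def classify(line):
    """Return the first matching event category, or None."""
    for category, pattern in EVENT_PATTERNS:
        if pattern.search(line):
            return category
    return None


# Hypothetical log lines, not real hypervisor output.
sample_lines = [
    "host1 vmkernel: vmfs volume locked on datastore1",
    "host2 vprob.net.migrate.vmknic reported",
    "host3 storage connectivity lost",
    "host4 APD detected on device",
    "host5 heartbeat ok",
]

for line in sample_lines:
    print(classify(line), "<-", line)
```

The most specific pattern is listed first so that a `vprob.net.migrate.vmknic` event is not swallowed by a broader match.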
27 Coming in the Future: Metrics in Splunk!
Example Uses:
- View CPU utilization or CPU ready time by virtual machines on the same host, by virtual machines in the cluster and other views
- View memory swap rates to determine if memory is being allocated appropriately
- View network and storage stats to proactively discover contention
- Persist your metrics for historical views or analyses without overwhelming vCenter Server
28 Getting Data In
ESX logs
- Forwarding host logs through syslog to Splunk
- Edited ESX host config file and restarted syslog server, details at below URL
Pulling Virtual Center logs and events
- Splunk can directly index the VC log files at C:\Documents and Settings\All users\application Data\VMware\VMware VirtualCenter\Logs
29 Getting Data In: The new APP
The Splunk for VMware solution collects and harnesses data from the virtualization layer to enable true end-to-end visibility in virtualized environments
1. Splunk App for VMware. This app has the views, dashboards and saved searches that provide insights into your virtualization layer
2. Splunk Forwarder Virtual Appliance for VMware. This VM image (.ova) is a data collector that you deploy using vCenter (VC)
3. Splunk Add-on for vCenter. This is used to collect vCenter log data and is installed into Splunk Forwarders running on vCenter machines
4. Perl API package
30 Virtualization management using Splunk Index metrics, configurations, status and logs from the hypervisor via the VMware ESX, Xen and other APIs as well as logs and other data from the guest OS and applications. This indexed data repository will survive guest power-down, critical for compliance-mandated log retention.
31 Virtualization management using Splunk Systems administrators and developers will initially use Splunk to troubleshoot problems with apps deployed in the virtual infrastructure, with the ability to navigate from the application tier to the guest OS and underlying hypervisor, and to identify cross-guest issues. Security analysts will use Splunk to investigate incidents and identify zero-day attack footprints across both running and powered-down systems.
32 Virtualization management using Splunk Over time, everyone will enrich the indexed data with knowledge of the environment and the data it produces, breaking down the even more severe silos of knowledge endemic to virtualized environments.
33 Virtualization management using Splunk Virtualized infrastructure managers will come to automate Splunk searches with alert triggers to easily monitor for new problems they identify as they adopt virtualization and lack adequate monitoring coverage.
34 Virtualization management using Splunk Splunk reports and dashboards will provide a quick way to build visibility into utilization, performance and faults across all tiers of the virtualized environment, even where existing management tools haven't kept up. Ultimately, operations staff will realize that proactively searching and visualizing machine data across the stack with Splunk is one of the best approaches to navigating the unknowns of new virtualized infrastructure.
35 Typical Saved Searches/Alerts
Alerts for:
- Host re-boots
- Hardware or machine check errors
- Predict HA failures by watching for a memory leak
- Hosts exceeding soft memory limits
- SCSI Aborts
AdHoc reports such as:
- Quick scan to see where VMware tools are out of date
- What percentage of Win 64 bit vs. Win 32 bit?
- How many Red Hat Linux VMs do we have?
- Who logged into VM environment? What did they do?
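The ad hoc inventory questions above (Windows 64 bit vs. 32 bit share, number of Red Hat Linux VMs) reduce to a group-and-count over VM records. The following Python sketch illustrates only that logic; it is not Splunk search syntax, and the record layout, field names and sample inventory are hypothetical.

```python
from collections import Counter


def guest_os_share(vms):
    """Return the percentage of VMs per guest OS label."""
    counts = Counter(vm["guest_os"] for vm in vms)
    total = sum(counts.values())
    return {os_name: 100.0 * n / total for os_name, n in counts.items()}


# Hypothetical inventory records; field names are assumptions.
inventory = [
    {"name": "vm-a", "guest_os": "Windows 64-bit"},
    {"name": "vm-b", "guest_os": "Windows 64-bit"},
    {"name": "vm-c", "guest_os": "Windows 32-bit"},
    {"name": "vm-d", "guest_os": "Red Hat Linux"},
]

share = guest_os_share(inventory)
red_hat_count = sum(1 for vm in inventory if vm["guest_os"] == "Red Hat Linux")

for os_name, pct in sorted(share.items()):
    print(f"{os_name}: {pct:.1f}%")
print(f"Red Hat Linux VMs: {red_hat_count}")
```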