Keeping Splunk in Check: Tools to Better Manage Your Investment

Copyright 2015 Splunk Inc.
Keeping Splunk in Check: Tools to Better Manage Your Investment
Aaron Kornhauser, Sr. Professional Services Consultant, Splunk, Inc.
Vladimir Skoryk, Sr. Professional Services Consultant, Splunk, Inc.

Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Agenda
- Introduction
- Reference Hardware
- Available Tools
- Common Questions
- Scenarios/Troubleshooting
- Resources
- Q&A

Who Are We?
Hello, I'm:
- Aaron Kornhauser, Sr. PS Consultant, akorn@splunk.com
- Vladimir Skoryk, Sr. PS Consultant, vs@splunk.com

Reference Hardware
Indexer
- Core Splunk*: 12 CPU cores, 12GB of RAM, 800 IOPS/indexer, RAID 1+0, data ingest: 150-200GB/day
- Enterprise Security (ES): 12 CPU cores, 12GB of RAM, 800 IOPS/indexer, RAID 1+0, data ingest: 100GB/day
Search Head
- Core Splunk*: 16 CPU cores, 12GB of RAM, 2x 300GB 10k rpm SAS in RAID1
- Enterprise Security (ES): 16 CPU cores, 16GB of RAM, 2x 300GB 10k rpm SAS in RAID1
All instances x64, CPU > 2GHz per core
* http://docs.splunk.com/documentation/splunk/latest/capacity/referencehardware
ES: http://docs.splunk.com/documentation/es/latest/install/deploymentplanning
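The reference numbers above lend themselves to a simple automated comparison. The following is a minimal sketch, not an official Splunk tool: the `Spec` class, the `REFERENCE` table layout, and the `shortfalls` helper are hypothetical names introduced here, and only the CPU and RAM minimums from the slide are checked (disk and IOPS are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    cpu_cores: int
    ram_gb: int


# Minimums copied from the reference hardware slide.
# Keys are (role, product); the key naming is an assumption of this sketch.
REFERENCE = {
    ("indexer", "core"): Spec(cpu_cores=12, ram_gb=12),
    ("search_head", "core"): Spec(cpu_cores=16, ram_gb=12),
    ("indexer", "es"): Spec(cpu_cores=12, ram_gb=12),
    ("search_head", "es"): Spec(cpu_cores=16, ram_gb=16),
}


def shortfalls(role, product, actual):
    """Return a list of human-readable gaps versus the reference spec."""
    ref = REFERENCE[(role, product)]
    gaps = []
    if actual.cpu_cores < ref.cpu_cores:
        gaps.append(f"cpu_cores {actual.cpu_cores} < {ref.cpu_cores}")
    if actual.ram_gb < ref.ram_gb:
        gaps.append(f"ram_gb {actual.ram_gb} < {ref.ram_gb}")
    return gaps


if __name__ == "__main__":
    # An 8-core ES search head falls short on CPU only.
    print(shortfalls("search_head", "es", Spec(cpu_cores=8, ram_gb=16)))
```

In practice the actual values could come from the `| rest /services/server/info` output mentioned later in the deck; how those fields are collected is left open here.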

Available Tools
So what's out there and what's the difference?
Distributed Management Console (DMC)
- Built in and only available on v6.2+
- http://docs.splunk.com/documentation/splunk/latest/admin/configurethemonitoringconsole
- Splunk supported and focuses on all facets of the deployment
- New feature preso with Patrick/Octavio: make sure you go see it!
FireBrigade
- https://splunkbase.splunk.com/app/1632/
- Detailed look at index/bucket activity and capacity
SoS (Splunk on Splunk)
- https://splunkbase.splunk.com/app/748/
- Legacy Splunk troubleshooting tool
Our health app: Splunk Health Overview
- https://splunkbase.splunk.com/app/1919/
- Combination of views found to be helpful in the field
Note: The Deployment Monitor app is deprecated; try to stay away from it. Many of these apps' functionalities are being rolled into the DMC.

How Are Things, Overall?
High-level environment status: a quick view of what's up/down/not reporting:
- Forwarder health - finding forwarders that we haven't seen for a while
- Data source health - how are our data feeds doing?
- REST endpoints (| rest /services/server/info) - looking at system information, spotting possibly under-provisioned instances
Spotting warnings and errors within Splunk's _internal index:
    index=_internal sourcetype=splunkd (log_level=error OR log_level=warn) | cluster showcount=t | table cluster_count host log_level message | sort cluster_count | rename cluster_count AS count, log_level AS level
    index=_internal sourcetype=splunkd log_level!=info | timechart count by component
Track resource usage:
- Say hello to _introspection (Splunk 6.1+)
- Captures disk and other resource metrics (by default on full installs)
- http://docs.splunk.com/documentation/splunk/latest/troubleshooting/abouttheplatforminstrumentationframework
Dashboards to help save the day:
- Health Status - Splunk Health Overview
- Instance - Distributed Management Console
- Indexing Performance - Distributed Management Console
- Resource Usage - Splunk Health Overview
- License Usage - Splunk Health Overview

Coming up!
Scenario-based discussions around health topics:
- Environment overview
- Data health
- Configuration
- Usage
- Search insights

Scenario 1: Environment Overview
How to use the tools available to check overall health
What are we reporting on?
- _internal
- _introspection
- metadata and using tstats: http://docs.splunk.com/documentation/splunk/latest/searchreference/Tstats
- REST endpoints:
    | rest /services/server/info
    | rest /services/server/roles
    | rest /services/server/status/resource-usage
No need for additional add-ons

Scenario 1: Environment Overview
Splunk Health Overview - Health Status
Doesn't meet reference hardware

Scenario 1: Environment Overview
Splunk Health Overview - Health Status
Looks like a source stopped sending data! Quite a few errors and warnings!

Scenario 1: Environment Overview
DMC - Instances
Issues accessing instance

Scenario 1: Environment Overview
DMC - Indexing Performance
Slight ingestion imbalance

Scenario 1: Environment Overview
Splunk Health Overview - Resource Usage

Scenario 2: Data Imbalance
Splunk Health Overview - License Usage
The admin is doing their daily routine checks and inspects the license usage. Things seem normal license-wise, and they have historically seen a drop after updating syslog filters, but not all indexers are getting an even stream of data.
Chart annotations: limit; week-over-week license usage helps detect anomalies and growth.

Scenario 2: Data Imbalance Continued
Splunk Health Overview - License Usage
- Ideally, each indexer should hold an equal amount of data
- Not all indexers are receiving data

Scenario 2: Data Imbalance - Troubleshoot/Wrap-up
Troubleshooting:
- Validate firewall rules are in place
- Check that all forwarders have the correct outputs
- Ensure indexers are all listening on the proper port
- Does splunkd.log have anything to say?
- Use the Indexing Overview and Configuration Overview (btool saves the day)
Possible causes:
- Simple misconfiguration
- Data processing queues filling up, with forwarders timing out and jumping to the next indexer
  - Check Distributed Indexing Performance in the DMC for queue filling - a typical sign of disk performance issues
- Indexer affinity - the forwarders get stuck to one indexer because EOF is never met
  - forceTimebasedAutoLB can help! http://blogs.splunk.com/2014/03/18/time-based-load-balancing/
- Updating syslog files - each file <1GB, host in the path, broken out by sourcetype, cron job/logrotate to remove stale files

Scenario 3: Data Health Checkup
How's your data feeling?
- Feed still working
  - Seeing recent data
  - Gaps in data
- Ingest issues
  - Line breaking, time parsing, truncation
- Indexing latency
  - _time - _indextime
- Predictive analytics: events in the future!
  - Incorrect time zone
  - Timestamp parsing issues
  - Time drift (NTP not set)
Make sure to see the Onboarding Data Into Splunk presentation!

Scenario 3: Data Quality - Greetings from the future!
Your data is hours ahead of system time!
    index=* earliest=+5m latest=+20y | eval ahead=abs(now() - _time) | stats avg(ahead) by host, sourcetype, index | eval avg(ahead)=tostring('avg(ahead)', "duration")

Scenario 3: Data Quality - Linebreaking and Timestamping
    index=_internal sourcetype=splunkd component=linebreakingprocessor OR component=dateparserverbose OR component=aggregator* | timechart count by component

Scenario 3: Data Quality - Indexing Delay
We have latency!
    | tstats earliest(_time) AS t earliest(_indextime) AS i WHERE index=* AND (earliest=-1d) BY host, sourcetype, index, _time span=1h | eval delay=abs(t - i) | where delay>5 | stats avg(delay) BY host, sourcetype, index
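To make the arithmetic in the indexing delay search concrete, here is a minimal Python sketch of the same idea outside Splunk. It is a simplification: it works per event rather than per hourly bucket as the tstats search does, and the `average_delay` function and the dictionary-shaped events are assumptions made for illustration only.

```python
from collections import defaultdict


def average_delay(events, threshold_s=5):
    """Average |_time - _indextime| per (host, sourcetype, index).

    Only delays above threshold_s seconds are counted, mirroring
    the `where delay>5` step of the search.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [total delay, event count]
    for event in events:
        delay = abs(event["_time"] - event["_indextime"])
        if delay > threshold_s:
            key = (event["host"], event["sourcetype"], event["index"])
            sums[key][0] += delay
            sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


if __name__ == "__main__":
    sample = [
        {"host": "web01", "sourcetype": "syslog", "index": "main",
         "_time": 1000, "_indextime": 1002},   # 2s delay, below threshold
        {"host": "web01", "sourcetype": "syslog", "index": "main",
         "_time": 1000, "_indextime": 1030},   # 30s delay
        {"host": "web01", "sourcetype": "syslog", "index": "main",
         "_time": 2000, "_indextime": 2010},   # 10s delay
    ]
    print(average_delay(sample))
```

Because the delay is taken as an absolute value, events timestamped in the future also show up as "delay", which is why the future-events check is worth running separately.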

Scenario 4: Consistency Is Key
File order precedence
- http://docs.splunk.com/documentation/splunk/latest/Admin/Wheretofindtheconfigurationfiles
- Don't put configs in /etc/system/local
Are like instances of Splunk uniformly configured?
- Indexer A knows about more than Indexer B
- Forwarder A knows about Indexer A & B
Use configuration management tools
- Deployment server, Chef, Puppet, SCCM, etc.
Meaningful Splunk app naming conventions
- org_group_application_configuration (acme_all_search_base)
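A naming convention like the one above is easy to enforce with a small check. The sketch below is one possible interpretation, not a Splunk rule: it assumes exactly four lowercase alphanumeric segments joined by underscores, and the `follows_convention` helper is a hypothetical name.

```python
import re

# Assumption: org_group_application_configuration means exactly four
# lowercase alphanumeric segments separated by underscores.
APP_NAME = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+){3}$")


def follows_convention(app_name):
    """Return True if app_name matches the assumed four-segment pattern."""
    return APP_NAME.fullmatch(app_name) is not None


if __name__ == "__main__":
    for name in ("acme_all_search_base", "search", "Acme_all_search_base"):
        print(name, follows_convention(name))
```

A check like this could run in whatever configuration management pipeline distributes the apps, so that badly named apps are caught before deployment.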

Scenario 4: Consistency Is Key
Why do we have an extra app?
Configuration Overview - Splunk Health Overview
Comparing btool output across like instances shows configuration inconsistencies

Scenario 5: Splunk Usage
Inventories
- Reports, dashboards, apps
Search Activity
- Are users running efficient searches? http://docs.splunk.com/documentation/splunk/latest/search/writebettersearches
- How are the scheduled jobs doing? Deferred/Skipped?
User activity monitoring
- What views are being accessed
Who has access to data
- Roles and permissions
Useful dashboards
- Search Activity - Splunk Health Overview
- Scheduler Activity - Splunk Health Overview
- Search Activity: Instance - DMC
Useful app: Data Governance on apps.splunk.com

Scenario 5: Inventory Check
Saved Search Inventory - Splunk Health Overview
    | rest splunk_server_group=dmc_group_search_head /servicesNS/-/-/saved/searches

Scenario 5: User Activity
View and Dashboard Audit - Splunk Health Overview
    index="_internal" sourcetype=splunk_web_access GET app | rex "GET /[^/]+/app/(?<app>[^/?]+)/" | search app!="search" app=* AND user=* AND user!="-" | timechart limit=100 count by app
Can always spot weekends!

Scenario 6: Search Performance
Review search activity to ensure system and users are happy
Tools
- Search Activity - Splunk Health Overview
- Scheduler Activity - Splunk Health Overview
- http://docs.splunk.com/documentation/splunk/latest/search/writebettersearches
- Search Activity: Instance - DMC
What to look for
- Long-running searches
- Real-time searches
- Concurrency
- Inefficient inline regular expressions
- Streaming commands before searching commands
- Scheduling - frequently executed searches over long time ranges, e.g. running a search over the last day every minute

Scenario 6: Knowing What Is Being Searched
Search range by index
Looks like the bulk of the searches cover 45 days
Search Activity - Splunk Health Overview

Scenario 6: Search Performance - Understanding Concurrency
Search Activity - Splunk Health Overview
- Total number of concurrent (historical) searches = base_max_searches + (#CPUs x max_searches_per_cpu)
- Max real-time searches = max_rt_search_multiplier x max historical searches
- Set in limits.conf
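The concurrency formula above can be worked through in a few lines. This is a minimal sketch: the function names are hypothetical, and the default values used for base_max_searches, max_searches_per_cpu, and max_rt_search_multiplier are example assumptions only, so check limits.conf on your own version for the real settings.

```python
def max_historical_searches(cpus, base_max_searches=6, max_searches_per_cpu=1):
    """base_max_searches + #cpus * max_searches_per_cpu, as on the slide.

    The default argument values are illustrative assumptions.
    """
    return base_max_searches + cpus * max_searches_per_cpu


def max_realtime_searches(cpus, max_rt_search_multiplier=1, **limits):
    """max_rt_search_multiplier x max historical searches."""
    return max_rt_search_multiplier * max_historical_searches(cpus, **limits)


if __name__ == "__main__":
    # A 16-core search head with the example values above.
    print(max_historical_searches(16))
    print(max_realtime_searches(16))
```

With these example values, a 16-core search head allows 6 + 16 x 1 = 22 concurrent historical searches, and the same number of real-time searches when the multiplier is 1.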

Scenario 6: Search Performance - Inspecting Searches
Search Activity - Splunk Health Overview
Other helpful views:
- Job inspector - http://docs.splunk.com/documentation/splunk/latest/knowledge/viewsearchjobpropertieswiththejobinspector
- Job Viewer
- Search Activity: Instance - DMC

Wrap up: Other Sanity Checks
Validate ulimit settings:
  -n  open files (>2048)
  -f  file size (unlimited?)
  -d  data seg size (>1GB)
Ensure THP is disabled on Linux distros: http://docs.splunk.com/documentation/splunk/latest/releasenotes/SplunkandTHP
Index sizing:
- Ensure that higher volume indexes (>10GB/day) are tuned with maxDataSize = auto_high_volume and have the appropriate number of maxHotBuckets
- Using Fire Brigade can help determine index bucket sizing
- More buckets = more scanning = slower searches
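The ulimit checks above can be scripted. The following is a rough sketch using Python's standard `resource` module (Unix only). It inspects the limits of the Python process itself, so it only approximates splunkd's limits when run as the same user from the same kind of session; the `CHECKS` table, `limit_ok`, and `report` are hypothetical names, and the thresholds are the ones from the slide.

```python
import resource

GIB = 1024 ** 3

# (label, resource constant, minimum soft limit).
# A minimum of None means the slide suggests "unlimited".
CHECKS = [
    ("open files (-n)", resource.RLIMIT_NOFILE, 2048),
    ("file size (-f)", resource.RLIMIT_FSIZE, None),
    ("data seg size (-d)", resource.RLIMIT_DATA, GIB),
]


def limit_ok(soft, minimum):
    """True if a soft limit satisfies the guidance from the slide."""
    if soft == resource.RLIM_INFINITY:
        return True           # unlimited always passes
    if minimum is None:
        return False          # expected unlimited, found a finite limit
    return soft > minimum


def report():
    """Check the current process's soft limits against CHECKS."""
    results = {}
    for label, res, minimum in CHECKS:
        soft, _hard = resource.getrlimit(res)
        results[label] = limit_ok(soft, minimum)
    return results


if __name__ == "__main__":
    for label, ok in report().items():
        print(f"{label}: {'ok' if ok else 'CHECK'}")
```

Note that `getrlimit` reports RLIMIT_DATA and RLIMIT_FSIZE in bytes, whereas the shell's `ulimit -d` and `ulimit -f` print scaled units, so the raw numbers will not match the shell output directly.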

Scaling Splunk - Knowing What To Look For
Key things to look for:
- Meeting the reference hardware specs
- Indexing volume
  - 150-200GB/day/indexer non-ES / ~100GB ES
  - Talk to your friendly sales rep!
- Data retention
  - Can you meet your retention SLA?
- System load
  - Is your system load > # cores?
- Number of users/searches
  - Check search concurrency - real-time and historical
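As a back-of-the-envelope illustration of the indexing volume guideline above, the sketch below estimates an indexer count from daily ingest. It is deliberately naive: the `indexers_needed` and `load_exceeds_cores` helpers are hypothetical, the 1000GB/day figure is an invented example, and real sizing also depends on search load, retention, and replication, which is why the slide suggests talking to your sales rep.

```python
import math


def indexers_needed(daily_gb, per_indexer_gb):
    """Smallest number of indexers that keeps each at or under per_indexer_gb/day."""
    return math.ceil(daily_gb / per_indexer_gb)


def load_exceeds_cores(load_average, cores):
    """The slide's system load check: is load greater than the core count?"""
    return load_average > cores


if __name__ == "__main__":
    daily_gb = 1000  # hypothetical daily ingest
    print("non-ES, 150GB/day/indexer:", indexers_needed(daily_gb, 150))
    print("non-ES, 200GB/day/indexer:", indexers_needed(daily_gb, 200))
    print("ES, 100GB/day/indexer:", indexers_needed(daily_gb, 100))
    print("load 14.5 on 12 cores too high:", load_exceeds_cores(14.5, 12))
```

Running both ends of the 150-200GB range gives a bracket rather than a single answer, which is usually the more honest way to present an estimate like this.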

Q&A

THANK YOU