Best Practices: Deploying Splunk on Physical, Virtual, and Cloud Infrastructure

Copyright 2013 Splunk Inc. Best Practices: Deploying Splunk on Physical, Virtual, and Cloud Infrastructure. Sean Blake & Simeon Yep. #splunkconf

Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. 2013 Splunk Inc. All rights reserved.

Introduction

About Us
- Simeon Yep
  - 5+ years @ Splunk
  - Experience: Customer Success Manager, On-site Consultant (20 TB/day), Technical Sales, Strategic Accounts
  - Based in HQ (San Francisco)
  - Currently: Sales Engineering Manager, Business Development
- Sean Blake
  - 2+ years @ Splunk
  - Experience: Professional Services, multitude of customers, development background
  - Not at HQ (Washington DC)
  - Currently: Professional Services Manager, Public Sector

Agenda
- Refresher
- Platforms: Physical, Virtual, Cloud
- Scaling
- Expert's Tool Bag

Technical Refresher

Splunk: Indexer
- Processes raw data and stores it onto disk
- Input processing
  - Parsing (character set determination, line breaking)
  - Merging (line merging, time extraction)
  - Typing (punctuation, anonymization)
- Indexer pipe
  - Internal stuff
  - Write to disk (compressed)
- Performs HEAVY lifting for searches

Splunk: Searcher
- Spawns search process (splunkd-search)
  - 1:1 ratio of search process to CPU core
  - Communicates via REST API (https)
[Diagram: a splunkd search process communicating with several splunkd instances]

Splunk: Forwarder
- Sends data to a Splunk indexer in Splunk format
- Install onto the remote system for data ingestion
- Low impact: basically reads in data for transmission
- Full vs. Light vs. Universal?

Splunk: Universal Forwarder
- A brief history
- Lightweight = forwarding only
- Python/Splunkweb removed
- Searching/indexing removed
- Deployment server removed
- LWF (4.1 and earlier) ~ UF

Brief Summary
- Indexers: heavy lifting (index AND search)
- Searchers: spawn the initial search, distribute as necessary
- Forwarders: send data to the indexer for indexing

Platform: Physical

Infrastructure: Best Practices
- Rule of thumb: more is better
  - Distributed search
  - Splunk scales horizontally
  - Higher quantities will parallelize CPU/IO
  - Map-reduce
- Rule of thumb: 100 GB/day indexing volume per reference server
  - Reference server: 2x4 core CPU, 16GB RAM, fast disks in RAID 1+0
  - Exceptions and qualifications
- Leverage deployment server to manage configuration
  - Alternatives: Opscode's Chef, Puppet, Solaris Package Manager, etc.
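
The 100 GB/day rule of thumb lends itself to a quick sizing calculation. The following is a minimal sketch, not a Splunk tool: the function name and the rounding policy (always round up, never fewer than one indexer) are assumptions made for illustration, and the slide itself notes that exceptions and qualifications apply.

```python
import math

# Rule of thumb from the slide: 100 GB/day of indexing volume per reference server.
REFERENCE_GB_PER_DAY = 100


def indexers_needed(daily_volume_gb: float,
                    per_indexer_gb: float = REFERENCE_GB_PER_DAY) -> int:
    """Estimate how many reference-class indexers a daily volume calls for.

    Hypothetical helper: rounds partial servers up and never returns
    fewer than one indexer.
    """
    if daily_volume_gb < 0 or per_indexer_gb <= 0:
        raise ValueError("daily volume must be >= 0 and per-indexer volume > 0")
    return max(1, math.ceil(daily_volume_gb / per_indexer_gb))


if __name__ == "__main__":
    for volume in (40, 250, 1000):
        print(f"{volume} GB/day -> {indexers_needed(volume)} indexer(s)")
```

Treat the result as a starting point only; search load and storage speed can move the real number in either direction.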

Infrastructure: Best Practices
- Use commodity servers (reference server)
  - Reference server: 2x4 core CPU, 16GB RAM, 4x300GB SAS drives (RAID 1+0)
  - A 2 vCPU indexer is a poor choice
- Use fast persistent storage, don't skimp
  - RAID 1+0 arrays, SAN; NFS is not ideal
  - This will affect the user experience and limit growth if too slow
  - We are often constrained by this first, so make it a high priority
- Distributed search considerations
  - Indexers require good IO performance
  - Searchers are not as IO dependent as indexers
  - Leverage blades or virtual machines for intermediate forwarders

Data Collection: Review
- How do we get data into Splunk?
  - File or directory monitoring (e.g. access log files from web servers)
  - Network input over TCP/UDP (e.g. syslog from a router)
  - Scripted input (e.g. text output from running a shell script)
  - Modular inputs (5.x)
- Textual machine data

Data Inputs: Best Practices
- Rule of thumb: persist raw data to disk
  - Splunk tracks files very well (CRC checks)
  - Recovery and reliability
- Network inputs
  - Stream to syslog-ng or similar
  - Output to a file
- Forwarders vs. network stream or NFS
  - File input is more steady state than network stream
  - Data distribution: load balance to many indexers
  - Pre-processing: anonymize data or route it to a 3rd party

Data Inputs: Best Practices
- Rule of thumb: set index and sourcetype on forwarder
  - It's the easiest method
  - Improves indexer efficiency
- Rule of thumb: sourcetype your data
  - Use built-in sourcetypes or create new ones
  - Examples: csv, log4j, access_combined, iis, syslog
  - sourcetype is an indexed field
  - Organizes your data for efficient retrieval
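
Setting index and sourcetype on the forwarder usually means adding both keys to a monitor stanza in the forwarder's inputs.conf. The sketch below only renders such a stanza as text. The helper name, the log path, and the index name `web` are hypothetical placeholders; `access_combined` is one of the sourcetypes listed on the slide.

```python
def monitor_stanza(path: str, index: str, sourcetype: str) -> str:
    """Render an inputs.conf monitor stanza that sets index and sourcetype.

    Hypothetical helper for illustration; it only builds text and does not
    touch any Splunk installation.
    """
    return "\n".join([
        f"[monitor://{path}]",        # file or directory to watch
        f"index = {index}",           # destination index, set at the forwarder
        f"sourcetype = {sourcetype}", # sourcetype, set at the forwarder
    ])


if __name__ == "__main__":
    # Path and index name are made-up examples.
    print(monitor_stanza("/var/log/httpd/access_log", "web", "access_combined"))
```

Generating stanzas from a template like this fits the earlier advice to manage configuration centrally, whether through the deployment server or a tool such as Chef or Puppet.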

Platform: Virtual

Virtual Machine Specs
- Virtual machine considerations (VMware, Citrix XenServer)
  - Follow standard guidelines
    - Bigger != Better
    - Many = Better
  - Always set vCPU, RAM for full reservation
- Indexer
  - 8 vCPU recommended; 4 vCPU per VM minimum; 8GB RAM
    - Full reservation: if it's 14000MHz worth of CPU, then set it that way
- Search head
  - 12 vCPU recommended; 8 vCPU per VM minimum; 12GB RAM
- Maintain expected performance = full reservation
  - Constantly read/write from disk, this is intensive and demanding
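
The "14000MHz worth of CPU" remark can be made concrete with a little arithmetic. This is a rough sketch under stated assumptions: the 1750 MHz core clock is a hypothetical value chosen so that 8 vCPUs come to 14000 MHz, the helper names are invented, and the RAM figures on the slide are read here as minimums.

```python
# Minimum (vCPU, RAM in GB) per role, as read from the slide.
MINIMUMS = {
    "indexer": (4, 8),
    "search_head": (8, 12),
}


def full_cpu_reservation_mhz(vcpus: int, core_mhz: int) -> int:
    """CPU reservation that covers every vCPU at the host's full core clock."""
    if vcpus <= 0 or core_mhz <= 0:
        raise ValueError("vcpus and core_mhz must be positive")
    return vcpus * core_mhz


def meets_minimum(role: str, vcpus: int, ram_gb: int) -> bool:
    """Check a proposed VM size against the slide's minimums for its role."""
    min_vcpus, min_ram_gb = MINIMUMS[role]
    return vcpus >= min_vcpus and ram_gb >= min_ram_gb


if __name__ == "__main__":
    # Assumed host clock of 1750 MHz per core (hypothetical).
    print(full_cpu_reservation_mhz(8, 1750), "MHz to reserve for an 8 vCPU indexer")
    print("4 vCPU / 12 GB search head OK?", meets_minimum("search_head", 4, 12))
```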

Storage Considerations
- Splunk recommends 800 IOPS
  - Don't use NFS for primary storage
  - Thick provisioned disks: Eager Zeroed Thick
    - Avoid double I/O when writing to disk
    - One write to zero it, then another to actually write
    - Does not pertain to NFS
    - Common mistake and why we have caution on VMFS
- Performance testing
  - SplunkIT app tests indexing and searching
  - bonnie++ (blog information available)
  - iozone (contact for app, still in development)
- Use raw volumes
  - VMFS experiences lower performance, but see above

Virtual Machine Notes
- Snapshots will degrade performance
  - All writes to the filesystem are in turn written to the snapshot: I/O hit
  - Consolidating also requires I/O to move blocks around along with incoming data
- Distributed Resource Scheduler (DRS)
  - Great tool, hurts Splunk
  - If avoidable, pin the VM to the host
  - If not, ensure anti-affinity so indexers are not overloading a single host
- Highest priority for CPU and memory shares
- Do NOT set Max Resources to less than the assigned value for the VM
- No memory overcommit
- Possible 20-30% overhead against data volumes
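
One way to read the 20-30% overhead figure is as a reduction in the daily volume a virtual indexer can handle compared with the physical reference server. That reading is an assumption, as are the function names below; the sketch simply shows how the indexer count shifts once the overhead is applied to the 100 GB/day rule of thumb.

```python
import math


def virtual_capacity_gb(physical_gb_per_day: float, overhead_pct: int) -> float:
    """Daily volume left after subtracting virtualization overhead.

    overhead_pct is a whole-number percentage, e.g. 20 or 30.
    """
    if not 0 <= overhead_pct < 100:
        raise ValueError("overhead_pct must be between 0 and 99")
    return physical_gb_per_day * (100 - overhead_pct) / 100


def virtual_indexers_needed(daily_volume_gb: float, overhead_pct: int,
                            physical_gb_per_day: float = 100) -> int:
    """Indexer count once each VM's capacity is reduced by the overhead."""
    capacity = virtual_capacity_gb(physical_gb_per_day, overhead_pct)
    return max(1, math.ceil(daily_volume_gb / capacity))


if __name__ == "__main__":
    for pct in (0, 20, 30):
        print(f"{pct}% overhead -> {virtual_indexers_needed(500, pct)} indexers for 500 GB/day")
```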

Platform: Cloud

Cloud Providers
- Splunk Storm; Enterprise SaaS
- Amazon Web Services (AWS)
  - Large market share
  - Many best practices
  - Splunk friendly
- Azure
  - Splunk runs on Azure VMs
  - Azure app data is not trivial to retrieve
- Other
  - Rackspace

AWS Overview
- Availability Zones
  - Concept of regions
- Amazon Machine Image (AMI)
  - Amazon Linux based
  - Best performance
  - Cost effective (extra $$ for Windows)
- Instance type
  - Spot vs. On-demand vs. Reserved
- Instance size
  - Small, Medium, Large, Extra Large (2-8 EC2 Compute Units, 1-8 GB RAM)
  - Standard vs. Cluster compute vs. GPU (varying CPU and RAM)
  - XL standard behaves similar to a reference server (50-100 GB/day)

AWS Instance Selection
- Which instance do I want?
- Splunk test results
  - c1.xlarge (High-CPU Extra Large) is most cost effective

AWS Storage Selection
- Storage: Elastic Block Store (EBS), Simple Storage Service (S3)
- EBS considerations
  - Behaves like a volume
  - RAID 1+0 improved performance
  - Zone limited
  - Provisioned IOPS testing underway
- Use snapshots
- S3 optimal for long term due to zone availability

AWS Infrastructure Practices
- After selection process is done
  - Create your own custom AMI
  - Use your configuration tool to push AMI or bits
- Authentication and authorization
  - Managing users and roles
  - SSO or LDAP (SSL tunnel)
- Security
  - Create SSL tunnels or similar for distributed environments
  - Enable SSL and use your own certificates

AWS Infrastructure Practices
- Search head pooling requires NFS
- EBS != NFS; must build NFS server
- Example configuration (500+ GB/day)
  - m1.small as forwarder (N)
  - m1.large as indexer (10)
  - c1.xlarge as search head (1)
- Who does it?
  - Best Buy scales 1000s of systems with Chef
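
The example configuration can be checked against the sizing guidance with simple division. In this sketch the `Tier` class and function name are invented for illustration, and the forwarder tier is left out because the slide gives its count only as N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    role: str
    instance_type: str
    count: int


# Indexer and search head tiers from the slide's 500+ GB/day example.
EXAMPLE_TOPOLOGY = [
    Tier("indexer", "m1.large", 10),
    Tier("search_head", "c1.xlarge", 1),
]


def gb_per_indexer(daily_volume_gb: float, tiers: list[Tier]) -> float:
    """Average daily indexing volume each indexer instance must absorb."""
    indexers = sum(t.count for t in tiers if t.role == "indexer")
    if indexers == 0:
        raise ValueError("topology has no indexers")
    return daily_volume_gb / indexers


if __name__ == "__main__":
    load = gb_per_indexer(500, EXAMPLE_TOPOLOGY)
    print(f"{load:.0f} GB/day per indexer")
```

At 500 GB/day this works out to 50 GB/day per indexer, the low end of the 50-100 GB/day range quoted for an XL standard instance.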

AWS Content
- AWS usage app
  - Splunk developed
  - Track usage, cost, and capacity of your AWS instances with improved granularity
- AWS S3 add-on
  - Splunk developed
  - Modular input for S3 data
- AWS CloudFormation
  - Allows easy creation of a multi-node Splunk distributed environment

Azure Review
- Three ways to compute in Azure
  - Use Virtual Machines with Splunk

Azure H/W Platform
- What is a Role
  - VM Role (Virtual Machine)
  - Worker Role
  - Web Role
- Operating systems
  - Windows and Linux based
- Sizing
  - XS, S, M, L, XL
  - Medium or higher (CPU = 1.6GHz, recommended minimum = 3 GHz)

Azure Storage
- Windows Azure Storage
  - Blobs = Block Blobs and Page Blobs
- VHD (Virtual Hard Disk/Drive)
  - VHD = base storage unit (exists as a Page Blob)
  - Drives, disks, and images are all VHDs that exist in Blob storage
- Windows Azure Drive
  - AKA: X drive
  - Persistent storage (network attached durable drive)
  - RAID?

Azure Data Collection
- Use forwarders + standard collection methods
  - File/directory monitoring
  - Network input
  - Scripted input
  - Azure Apps
- Azure Apps
  - Output typically written to Blob Storage (must write a scripted input)
  - Alternative is to natively (within app) send events to a Splunk instance
    - Some content on how this can be done

Azure Infrastructure Summary
- Use Extra Large instances
- Azure App
- Use persistent storage
  - Azure (X) Drives can be moved if the VM dies

Azure Infrastructure Summary
- Leverage deployment server + scripting to manage distributed environment
- PowerShell scripts to automate deployment

Cloud Summary
- Great information for major cloud providers
- Reference architecture and automation templates publicly available
- Consider best fit for your scenario
  - Security requirements
  - Pricing model
  - Topology fit

Scaling

Plan for Growth
- Splunk is very flexible, but ensure you have enough at all tiers (forwarders, indexers, search)
- A bottleneck today can be remedied, but something else will take its place
- Use more nodes to scale up, not bigger machines (when in doubt = reference architecture)
[Diagram: growth path from a single indexer, to multiple indexers, to search head & indexers, to search head, deployment server & indexers]

Scaling Up
- Use off-the-shelf dual-socket machines with direct-attached storage
- Use more nodes to scale up, not bigger machines
- More cores are more expensive and don't scale aggregate IO as much
  - You can work with this (up to a point) with multiple instances
- Number of nodes depends mostly on search requirements
  - Ignore the 100GB/day/instance rule of thumb at scale
  - Indexers can index 250GB/day safely, and over 500GB/day possibly
    - Search requirements drive this heavily; lots of read I/O will take away from the writes

Scaling Indexing
- Make sure you have enough at every point:
- Readers/forwarders to read the data?
  - May need several for large syslog volumes
- Indexers to receive the data?
- Forwarders to spread data over indexers?
  - AutoLB by default sends at least 30 seconds of data to a single indexer
  - So a single indexer can be backed up while others are idle
  - Decrease the AutoLB interval, and increase the input queue size to contain it
- Try to avoid Heavy Forwarders and bottlenecks
  - Don't funnel (if you don't have to); go directly from UF to indexers, parse/filter on indexers
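
The effect of the AutoLB interval is easy to see in a toy model. The sketch below is not how the forwarder is implemented: it assumes a single forwarder sending at a constant rate and switching indexers in strict round-robin order, and the 5 MB/s rate and function name are made up for illustration.

```python
def distribute(total_s: int, interval_s: int, indexers: int,
               rate_mb_s: float) -> list[float]:
    """MB received by each indexer over a time window.

    Simplified model (an assumption): one forwarder, constant send rate,
    round-robin switch to the next indexer after every interval.
    """
    if interval_s <= 0 or indexers <= 0:
        raise ValueError("interval_s and indexers must be positive")
    received = [0.0] * indexers
    elapsed = 0
    slot = 0
    while elapsed < total_s:
        # The last slot may be shorter than a full interval.
        span = min(interval_s, total_s - elapsed)
        received[slot % indexers] += span * rate_mb_s
        elapsed += span
        slot += 1
    return received


if __name__ == "__main__":
    # One minute of traffic at a hypothetical 5 MB/s across 4 indexers.
    print("30 s interval:", distribute(60, 30, 4, 5))
    print("10 s interval:", distribute(60, 10, 4, 5))
```

With the 30-second interval, two of the four indexers sit idle for the whole minute while the others absorb 150 MB each; a shorter interval spreads the same traffic across all four.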

Scaling Search
- Use parallel dispatch from search head to indexer peers
  - Requires disabling SSL on splunkd
- Use job servers: isolate jobs and users from each other on different search heads
- Partition groups of users from each other on different search heads
- Search head pooling (SHP) can be used if you have fast shared NFS between search heads

Ongoing Operations
- Software configuration management and deployment
- Backups, retention, archiving, disaster recovery
  - You can attend Dritan's session: Architect Splunk for High Availability and Disaster Recovery
  - Lower RTO = lots more $; lower RPO = lots more $; lower both = $$$$$$
  - 5.x = clustering
  - Align data retention policy with search use cases
- Service resiliency
  - Splunk is fine for this re: indexing; forwarder LB handles it
  - In 5.x+, use index replication (clustering) for HA; however, if you need DR a conversation still needs to take place
- Capacity monitoring, review, and planning
  - Perform over time, especially when data and users are onboarded

Expert's Tool Bag

Closing Thoughts
- Follow hardware guidelines
- Virtual != Physical

Expert's Tool Bag
- Metadata searches
  - Hosts, sources, sourcetypes
  - Latest, oldest, and last event time
  - Total event count
  - Fast!!!
- Field optimization
  - Use advanced charting (disables "Preview")
  - Use fields <relevant_field>
  - Turn off field discovery
- Inspect search
  - Displays statistics about your search
  - Data is from $SPLUNK_HOME/var/run/splunk/dispatch/<job_id>

Expert's Tool Bag
- Configuration check
  - ./splunk cmd btool <config_file> list --debug
  - What Splunk currently thinks (may not be what is loaded)
- Remote commands
  - ./splunk <command_set> -uri https://remoteserver:8089
  - Good for searching
  - Clean your history!
- Splunk logs
  - Tune logging level via management UI
  - $SPLUNK_HOME/etc/log.cfg
  - Not ideal for production

Expert's Tool Bag
- Debugging forwarders with Splunk
  - Forwarder connections and stats
  - index=_internal source=*metrics.log group=tcpin_connections
  - Data transferred, connected?, 30 second intervals
- Debugging timestamp issues
  - Check MAX_TIMESTAMP_LOOKAHEAD and TIME_FORMAT
  - Leverage _indextime, timestartpos, timeendpos fields to debug
- Bundle replication (distributed environments)
  - What is it?
  - Optimize bundles so large ones are NOT transferred all the time
  - MySQL App for lookups

More Information
- Contact: syep@splunk.com or sblake@splunk.com
- Applications: apps.splunk.com
- Answers: answers.splunk.com
- Education: www.splunk.com/view/education/sp-CAAAAH9
- Professional Services: www.splunk.com/view/professional-services/sp-CAAABH9
- Videos: www.splunk.com/videos

Other Sessions You Can Attend
- Architect Splunk for High Availability and Disaster Recovery: Dritan Bitincka, Wed @ 9:00
- Architecting and Sizing Your Splunk Deployments: Simeon Yep, Wed @ 3:00
- Onboard Data into Splunk, Correctly: Matthew Settipane, Wed @ 4:30
- The S.o.S App: All Splunk on Splunk Action, All The Time: Octavio DiSciullo, Thurs @ 9:00
- Planning and Execution for Successful Deployments: Chris Olson & Pete Sicilia, Thurs @ 10:15

Next Steps
1. Download the .conf2013 Mobile App
   - If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014
   - Or one of these bags!
3. View the other Deploying sessions
   - All sessions are available on the Mobile App
   - Videos will be available shortly

Q & A

THANK YOU