Copyright 2013 Splunk Inc.
Best Practices: Deploying Splunk on Physical, Virtual, and Cloud Infrastructure
Sean Blake & Simeon Yep
#splunkconf
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2013 Splunk Inc. All rights reserved.
Introduction
About Us
- Simeon Yep
  - 5+ years @ Splunk
  - Experience: Customer Success Manager, On-site Consultant (20 TB/day), Technical Sales, Strategic Accounts
  - Based in HQ (San Francisco)
  - Currently: Sales Engineering Manager, Business Development
- Sean Blake
  - 2+ years @ Splunk
  - Experience: Professional Services, multitude of customers, development background
  - Not at HQ (Washington DC)
  - Currently: Professional Services Manager, Public Sector
Agenda
- Refresher
- Platforms
  - Physical
  - Virtual
  - Cloud
- Scaling
- Expert's Tool Bag
Technical Refresher
Splunk: Indexer
- Processes raw data and stores it onto disk
- Input processing
  - Parsing (character set determination, line breaking)
  - Merging (line merging, time extraction)
  - Typing (punctuation, anonymization)
- Indexer pipe
  - Internal stuff
  - Write to disk (compressed)
- Performs HEAVY lifting for searches
Splunk: Searcher
- Spawns search process (splunkd-search)
  - 1:1 ratio of search process to CPU core
  - Communicates via REST API (https)
[Diagram labels: splunkd search, splunkd (x4)]
Splunk: Forwarder
- Sends data to a Splunk indexer in Splunk format
- Install onto the remote system for data ingestion
- Low impact: basically reads in data for transmission
- Full vs. Light vs. Universal?
Splunk: Universal Forwarder
- A brief history
- Lightweight = forwarding only
- Python/Splunkweb removed
- Searching/indexing removed
- Deployment server removed
- LWF (4.1 and earlier) ~ UF
Brief Summary
- Indexers: heavy lifting (index AND search)
- Searchers: spawn the initial search, distribute as necessary
- Forwarders: send data to the indexer for indexing
Platform: Physical
Infrastructure: Best Practices
- Rule of thumb: more is better
  - Distributed search
  - Splunk scales horizontally
  - Higher quantities will parallelize CPU/IO
  - Map-reduce
- Rule of thumb: 100 GB/day indexing volume per reference server
  - Reference server: 2x4 core CPU, 16GB RAM, fast disks in RAID 1+0
  - Exceptions and qualifications
- Leverage deployment server to manage configuration
  - Alternatives: Opscode's Chef, Puppet, Solaris Package Manager, etc.
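The 100 GB/day rule of thumb above turns into simple arithmetic. The following is a minimal sketch only: the function name and the rounding policy (always round up, never fewer than one indexer) are assumptions for illustration, and real sizing also depends on search load and the exceptions the slide mentions.

```python
import math


def indexers_needed(daily_volume_gb: float, per_indexer_gb: float = 100.0) -> int:
    """Estimate how many reference-server indexers a daily volume calls for.

    per_indexer_gb defaults to the 100 GB/day rule of thumb from the slide;
    the function itself is a hypothetical helper, not a Splunk tool.
    """
    if daily_volume_gb < 0 or per_indexer_gb <= 0:
        raise ValueError("volumes must be non-negative and capacity positive")
    # Round up: a partially loaded indexer is still a whole server.
    return max(1, math.ceil(daily_volume_gb / per_indexer_gb))


print(indexers_needed(450))  # 5
```

Rounding up rather than to the nearest integer leaves headroom, which fits the "more is better" guidance.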
Infrastructure: Best Practices
- Use commodity servers (reference server)
  - Reference server: 2x4 core CPU, 16GB RAM, 4x300GB SAS drives (RAID 1+0)
  - A 2 vCPU indexer is a poor choice
- Use fast persistent storage, don't skimp
  - RAID 1+0 arrays, SAN; NFS is not ideal
  - This will affect the user experience and limit growth if too slow
  - We are often constrained by this first, so make it a high priority
- Distributed search considerations
  - Indexers require good IO performance
  - Searchers are not as IO dependent as indexers
  - Leverage blades or virtual machines for intermediate forwarders
Data Collection: Review
- How do we get data into Splunk?
  - File or directory monitoring (e.g. access log files from web servers)
  - Network input over TCP/UDP (e.g. syslog from a router)
  - Scripted input (e.g. text output from running a shell script)
  - Modular inputs (5.x)
Textual Machine Data
Data Inputs: Best Practices
- Rule of thumb: persist raw data to disk
  - Splunk tracks files very well (CRC checks)
  - Recovery and reliability
- Network inputs
  - Stream to syslog-ng or similar
  - Output to a file
- Forwarders vs. network stream or NFS
  - File input is more steady state than network stream
  - Data distribution: load balance to many indexers
  - Pre-processing: anonymize data or route it to a 3rd party
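To illustrate the idea behind CRC-based file tracking mentioned above, here is a rough sketch that fingerprints the first bytes of a file's content. It is not Splunk's implementation: the use of `zlib.crc32`, the 256-byte window, and the helper names are all assumptions chosen for illustration.

```python
import zlib

HEAD_BYTES = 256  # assumption: size of the leading window that gets fingerprinted


def head_fingerprint(data: bytes, length: int = HEAD_BYTES) -> int:
    """CRC of the first `length` bytes of a file's content."""
    return zlib.crc32(data[:length])


def looks_like_same_file(a: bytes, b: bytes) -> bool:
    """Two inputs look identical to the tracker if their leading bytes match."""
    return head_fingerprint(a) == head_fingerprint(b)


# A long, identical header makes two different logs look like the same file.
header = b"# rotating log header\n" * 20  # 440 bytes, longer than the window
rotated = header + b"event from yesterday\n"
current = header + b"event from today\n"
print(looks_like_same_file(rotated, current))
```

The practical point is that a renamed or rotated file can still be recognized by its content, while files that share a long common header may need a distinguishing setting on the input.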
Data Inputs: Best Practices
- Rule of thumb: set index and sourcetype on forwarder
  - It's the easiest method
  - Improves indexer efficiency
- Rule of thumb: sourcetype your data
  - Use built-in sourcetypes or create new ones
  - Examples: csv, log4j, access_combined, iis, syslog
  - sourcetype is an indexed field
  - Organizes your data for efficient retrieval
Platform: Virtual
Virtual Machine Specs
- Virtual machine considerations (VMware, Citrix XenServer)
  - Follow standard guidelines
    - Bigger != Better
    - Many = Better
  - Always set vCPU, RAM for full reservation
- Indexer: 8 vCPU recommended; 4 vCPU per VM minimum; 8GB RAM
  - Full reservation: if it's 14000MHz worth of CPU, then set it that way
- Search head: 12 vCPU recommended; 8 vCPU per VM minimum; 12GB RAM
- Maintain expected performance = full reservation
  - Splunk constantly reads from and writes to disk; this is intensive and demanding
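The "14000MHz worth of CPU" remark above is a product of vCPU count and per-core clock speed. A minimal sketch, assuming a hypothetical host with 1750 MHz cores (the slide does not state the clock speed) and illustrative helper names:

```python
def full_reservation_mhz(vcpus: int, core_mhz: int) -> int:
    """CPU reservation (MHz) that covers every assigned vCPU at full clock."""
    if vcpus <= 0 or core_mhz <= 0:
        raise ValueError("vcpus and core_mhz must be positive")
    return vcpus * core_mhz


def is_fully_reserved(reserved_mhz: int, vcpus: int, core_mhz: int) -> bool:
    """True when the configured reservation is not below the full amount."""
    return reserved_mhz >= full_reservation_mhz(vcpus, core_mhz)


# Assumed example: the recommended 8 vCPU indexer on 1750 MHz cores.
print(full_reservation_mhz(8, 1750))  # 14000
```

A check like `is_fully_reserved` could be run against exported VM settings to spot indexers whose reservation was left below the assigned capacity.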
Storage Considerations
- Splunk recommends 800 IOPS
  - Don't use NFS for primary storage
  - Thick provisioned disks: Eager Zeroed Thick
    - Avoid double I/O when writing to disk
    - One write to zero it, then another to actually write
    - Does not pertain to NFS
    - Common mistake and why we have caution on VMFS
- Performance testing
  - SplunkIT app tests indexing and searching
  - bonnie++ (blog information available)
  - iozone (contact for app, still in development)
- Use raw volumes
  - VMFS experiences lower performance, but see above
Virtual Machine Notes
- Snapshots will degrade performance
  - All writes to the filesystem are in turn written to the snapshot: I/O hit
  - Consolidating also requires I/O to move blocks around along with incoming data
- Distributed Resource Scheduler (DRS)
  - Great tool, hurts Splunk
  - If avoidable, pin the VM to the host
  - If not, ensure anti-affinity so indexers are not overloading a single host
- Highest priority for CPU and memory shares
- Do NOT set Max Resources to less than the assigned value for the VM
- No memory overcommit
- Possible 20-30% overhead against data volumes
Platform: Cloud
Cloud Providers
- Splunk Storm; Enterprise SaaS
- Amazon Web Services (AWS)
  - Large market share
  - Many best practices
  - Splunk friendly
- Azure
  - Splunk runs on Azure VMs
  - Azure app data is not trivial to retrieve
- Other
  - Rackspace
AWS Overview
- Availability Zones: concept of regions
- Amazon Machine Image (AMI)
  - Amazon Linux based
  - Best performance
  - Cost effective (extra $$ for Windows)
- Instance type
  - Spot vs. On-demand vs. Reserved
- Instance size
  - Small, Medium, Large, Extra Large (2-8 EC2 Compute Units, 1-8 GB RAM)
  - Standard vs. Cluster compute vs. GPU (varying CPU and RAM)
  - XL standard behaves similar to a reference server (50-100 GB/day)
AWS Instance Selection
- Which instance do I want?
- Splunk test results
  - c1.xlarge is most cost effective (High-CPU Extra Large)
AWS Storage Selection
- Storage
  - Elastic Block Store (EBS)
  - Simple Storage Service (S3)
- EBS considerations
  - Behaves like a volume
  - RAID 1+0 improved performance
  - Zone limited
  - Provisioned IOPS testing underway
- Use snapshots
- S3 optimal for long term due to zone availability
AWS Infrastructure Practices
- After selection process is done
  - Create your own custom AMI
  - Use your configuration tool to push AMI or bits
- Authentication and authorization
  - Managing users and roles
  - SSO or LDAP (SSL tunnel)
- Security
  - Create SSL tunnels or similar for distributed environments
  - Enable SSL and use your own certificates
AWS Infrastructure Practices
- Search head pooling requires NFS
- EBS != NFS; must build NFS server
- Example configuration (500+ GB/day)
  - m1.small as forwarder (N)
  - m1.large as indexer (10)
  - c1.xlarge as search head (1)
- Who does it?
  - Best Buy scales 1000s of systems with Chef
AWS Content
- AWS usage app
  - Splunk developed
  - Track usage, cost, and capacity of your AWS instances with improved granularity
- AWS S3 add-on
  - Splunk developed
  - Modular input for S3 data
- AWS CloudFormation
  - Allows easy creation of a multi-node Splunk distributed environment
Azure Review
- Three ways to compute in Azure
  - Use Virtual Machines with Splunk
Azure H/W Platform
- What is a Role
  - VM Role (Virtual Machine)
  - Worker Role
  - Web Role
- Operating systems
  - Windows and Linux based
- Sizing
  - XS, S, M, L, XL
  - Medium or higher (CPU = 1.6GHz, recommended minimum = 3 GHz)
Azure Storage
- Windows Azure Storage
  - Blobs = Block Blobs and Page Blobs
- VHD (Virtual Hard Disk/Drive)
  - VHD = base storage unit (exists as a Page Blob)
  - Drives, disks, and images are all VHDs that exist in Blob storage
- Windows Azure Drive
  - AKA: X drive
  - Persistent storage (network attached durable drive)
  - RAID?
Azure Data Collection
- Use forwarders + standard collection methods
  - File/directory monitoring
  - Network input
  - Scripted input
  - Azure apps
- Azure apps
  - Output typically written to Blob Storage (must write a scripted input)
  - Alternative is to natively (within app) send events to a Splunk instance
    - Some content on how this can be done
Azure Infrastructure Summary
- Use Extra Large instances
- Azure App
- Use persistent storage
  - Azure (X) Drives can be moved if the VM dies
Azure Infrastructure Summary
- Leverage deployment server + scripting to manage distributed environment
- PowerShell scripts to automate deployment
Cloud Summary
- Great information for major cloud providers
- Reference architecture and automation templates publicly available
- Consider best fit for your scenario
  - Security requirements
  - Pricing model
  - Topology fit
Scaling
Plan for Growth
- Splunk is very flexible, but ensure you have enough at all tiers (forwarders, indexers, search)
- A bottleneck today can be remedied, but something else will take its place
- Use more nodes to scale up, not bigger machines (when in doubt = reference architecture)
[Diagram, growth stages: indexer; indexers; search head & indexers; search head, deployment server & indexers]
Scaling Up
- Use off-the-shelf dual-socket machines with direct-attached storage
- Use more nodes to scale up, not bigger machines
- More cores are more expensive and don't scale aggregate IO as much
  - You can work with this (up to a point) with multiple instances
- Number of nodes depends mostly on search requirements
  - Ignore the 100GB/day/instance rule of thumb at scale
  - Indexers can index 250GB/day safely, and over 500GB/day possibly
    - Search requirements drive this heavily; lots of read I/O will take away from the writes
Scaling Indexing
- Make sure you have enough at every point:
- Readers/forwarders to read the data?
  - May need several for large syslog volumes
- Indexers to receive the data?
- Forwarders to spread data over indexers?
  - AutoLB by default sends at least 30 seconds of data to a single indexer
  - So a single indexer can be backed up while others are idle
  - Decrease the AutoLB interval, and increase the input queue size to contain it
- Try to avoid heavy forwarders and bottlenecks
  - Don't funnel (if you don't have to); go directly from UF to indexers, parse/filter on indexers
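The AutoLB point above can be made concrete with two small calculations: how much data lands on one indexer before a forwarder switches, and how many indexers must be idle at any instant when forwarders are few. This is a simplified sketch under stated assumptions: each forwarder sends to exactly one indexer at a time, the rates are made-up numbers, and the function names are illustrative.

```python
def data_per_switch_mb(forwarder_rate_mb_s: float, interval_s: float = 30.0) -> float:
    """Data one forwarder delivers to a single indexer before switching."""
    if forwarder_rate_mb_s < 0 or interval_s <= 0:
        raise ValueError("rate must be non-negative and interval positive")
    return forwarder_rate_mb_s * interval_s


def min_idle_indexers(num_forwarders: int, num_indexers: int) -> int:
    """Lower bound on indexers receiving nothing at a given instant.

    Assumes each forwarder targets exactly one indexer at a time.
    """
    return max(0, num_indexers - num_forwarders)


# Assumed example: 2 heavy syslog forwarders at 5 MB/s feeding 10 indexers.
print(data_per_switch_mb(5))       # 150.0 MB per 30-second window
print(data_per_switch_mb(5, 10))   # 50.0 MB with a shorter interval
print(min_idle_indexers(2, 10))    # 8
```

Shortening the interval shrinks each burst, which is why the slide pairs it with a larger input queue to absorb what remains.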
Scaling Search
- Use parallel dispatch from search head to indexer peers
  - Requires disabling SSL on splunkd
- Use job servers: isolate jobs and users from each other on different search heads
- Partition groups of users from each other on different search heads
- Search head pooling (SHP) can be used if you have fast shared NFS between search heads
Ongoing Operations
- Software configuration management and deployment
- Backups, retention, archiving, disaster recovery
  - You can attend Dritan's session: Architect Splunk for High Availability and Disaster Recovery
  - Lower RTO = lots more $; lower RPO = lots more $; lower both = $$$$$$
  - 5.x = clustering
  - Align data retention policy with search use cases
- Service resiliency
  - Splunk is fine for this re: indexing; forwarder LB handles it
  - In 5.x+, use index replication (clustering) for HA; however, if you need DR, a conversation still needs to take place
- Capacity monitoring, review, and planning
  - Perform over time, especially when data and users are onboarded
Expert's Tool Bag
Closing Thoughts
- Follow hardware guidelines
- Virtual != Physical
Expert's Tool Bag
- Metadata searches
  - Hosts, Sources, Sourcetypes
  - Latest, oldest, and last event time
  - Total event count
  - Fast!!!
- Field optimization
  - Use Advanced charting (disables "Preview")
  - Use fields <relevant_field>
  - Turn off field discovery
- Inspect search
  - Displays statistics about your search
  - Data is from $SPLUNK_HOME/var/run/splunk/dispatch/<job_id>
Expert's Tool Bag
- Configuration check
  - ./splunk cmd btool <config_file> list --debug
  - What Splunk currently thinks (may not be what is loaded)
- Remote commands
  - ./splunk <command_set> -uri https://remoteserver:8089
  - Good for searching
  - Clean your history!
- Splunk logs
  - Tune logging level via management UI
  - $SPLUNK_HOME/etc/log.cfg
  - Not ideal for production
Expert's Tool Bag
- Debugging forwarders with Splunk
  - Forwarder connections and stats
  - index=_internal source=*metrics.log group=tcpin_connections
  - Data transferred, connected?, 30 second intervals
- Debugging timestamp issues
  - Check MAX_TIMESTAMP_LOOKAHEAD and TIME_FORMAT
  - Leverage _indextime, timestartpos, timeendpos fields to debug
- Bundle replication (distributed environments)
  - What is it?
  - Optimize bundles so large ones are NOT transferred all the time
  - MySQL App for lookups
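When debugging timestamp issues, it can help to test a candidate TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD against a sample event outside Splunk first. The sketch below uses Python's `datetime.strptime` as a stand-in; Splunk's parser is not identical to Python's, and the helper name, sample event, and format string are assumptions for illustration.

```python
from datetime import datetime
from typing import Optional


def parse_within_lookahead(event: str, time_format: str,
                           lookahead: int, prefix_len: int = 0) -> Optional[datetime]:
    """Try to parse a timestamp using only `lookahead` characters of the event.

    prefix_len is the number of characters to skip before the timestamp starts.
    Returns None when no prefix of the window matches the whole format.
    """
    window = event[prefix_len:prefix_len + lookahead]
    # strptime requires an exact match, so try the longest candidate first
    # and shrink until one parses or the window is exhausted.
    for end in range(len(window), 0, -1):
        try:
            return datetime.strptime(window[:end], time_format)
        except ValueError:
            continue
    return None


sample = "2013-10-01 12:34:56 INFO service started"
fmt = "%Y-%m-%d %H:%M:%S"
print(parse_within_lookahead(sample, fmt, lookahead=19))  # full timestamp fits
print(parse_within_lookahead(sample, fmt, lookahead=10))  # window too short
```

A lookahead that cuts the timestamp short is a common cause of events landing at the wrong time, so checking the window length against real samples is worthwhile before changing the configuration.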
More Information
- Contact: syep@splunk.com or sblake@splunk.com
- Applications: apps.splunk.com
- Answers: answers.splunk.com
- Education: www.splunk.com/view/education/sp-CAAAAH9
- Professional Services: www.splunk.com/view/professional-services/sp-CAAABH9
- Videos: www.splunk.com/videos
Other Sessions You Can Attend
- Architect Splunk for High Availability and Disaster Recovery: Dritan Bitincka, Wed @ 9:00
- Architecting and Sizing Your Splunk Deployments: Simeon Yep, Wed @ 3:00
- Onboard Data into Splunk, Correctly: Matthew Settipane, Wed @ 4:30
- The S.o.S App: All Splunk on Splunk Action, All The Time: Octavio DiSciullo, Thurs @ 9:00
- Planning and Execution for Successful Deployments: Chris Olson & Pete Sicilia, Thurs @ 10:15
Next Steps
1. Download the .conf2013 Mobile App
   - If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014
   - Or one of these bags!
3. View the other Deploying sessions
   - All sessions are available on the Mobile App
   - Videos will be available shortly
Q & A
THANK YOU