Multisite Clustering and Search Affinity
Mustafa Ahamed, Director, Product Management
Da Xu, Senior Software Engineer
Copyright 2014 Splunk Inc.

Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Agenda
- What is Clustering?
- Business Benefits of Multisite Clustering
- Multisite Configuration
- Search Affinity
- Tips and Tricks, Migration
- Q&A

Why We Need Clustering?
Work-arounds:
1. Index and forward: additional licensing costs
2. Simultaneous forward: data sync issue
[Diagram: Forwarders]

Four Ideal Data Availability Product Requirements
- Data Recovery: auto failure detection and recovery
- Data Fidelity: correctness of data at all times
- Data Redundancy: auto back-up of data
- Management Console: single graphical interface to manage cluster
Ease of Use + Lower TCO

Major Components of Clustering
- Cluster Master
    - One master per cluster
    - Metadata only
- Cluster Peer(s)
    - Stores and indexes the actual data
    - Replicates data to other peers
- Search Head(s)
    - Coordinates the searches

Clustering: The Search Process Flow
1. Search Head gets the peer list from Cluster Master
2. Search Head sends the search queries to peers
3. Redundant copies of raw data are available

Enterprise Readiness in Splunk
- High Availability: Indexer Tier Index Replication
    - Commodity hardware based
    - Recommended for single site
    - Flexible replication policies
    - Available since Splunk 5.0
- Disaster Recovery: Multisite Clustering
    - Can withstand entire site failure
    - Support for active-active configuration
    - Search affinity
MISSION CRITICAL ENTERPRISE

Multisite Clustering

Multisite Clustering
- Released in Splunk 6.1
- Previous Splunk clusters gave us data redundancy
    - RF-1 indexers (replication factor minus one) can fail without data loss
- Multisite allows for an extra layer of partitioning
    - Indexers are grouped into sites
    - Now, an entire site of indexers (or multiple sites) can fail without data loss
    - This partitioning allows for better real-world redundancy, e.g. a failure in a rack or office location (one site) will not result in data loss, because redundant sites hold copies
[Diagram: Los Angeles (site1), San Jose (site2)]

Multisite Configuration
Multisite configurations are in splunk.conf

Cluster Master splunk.conf
    [general]
    site = site1

    [clustering]
    mode = master
    multisite = 1
    available_sites = site1,site2,site3
    site_replication_factor = origin:2,total:3
    site_search_factor = origin:1,total:2

Cluster Indexer splunk.conf
    [general]
    site = site1

    [clustering]
    mode = slave

Multisite Configuration
Configuration and Explanation/Rules:

multisite
    Turns multisite on or off [0/1]
site
    Which site this Splunk instance belongs to
    Master/peers/search heads all require a site if multisite is enabled
    Valid sites are site1 through site63
available_sites
    List of all sites that will be part of this cluster
    Splunk instances with a site not listed here will not be able to join the cluster
site_replication_factor
    Multisite replication policy: specifies how many copies of a bucket per site
    [required] origin refers to # of replicated copies for the original site
    sitex refers to # of replicated copies for a specific site
    [required] total refers to total # of replicated copies for each bucket
site_search_factor
    Multisite replication policy for searchable copies, similar to site_replication_factor
    None of the values can be larger than their corresponding site_replication_factor values

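Two of the rules in this table lend themselves to a mechanical check: the valid range of site names, and the requirement that no site_search_factor value exceeds its site_replication_factor counterpart. The following is a minimal Python sketch of those two checks only, not Splunk code. The function names `is_valid_site`, `parse_factor`, and `search_factor_fits` are hypothetical, and treating a search-factor key with no replication-factor counterpart as a failure is an assumption.

```python
import re


def is_valid_site(name):
    """Return True for site1 through site63, the range given on the slide."""
    match = re.fullmatch(r"site([1-9][0-9]?)", name)
    return match is not None and int(match.group(1)) <= 63


def parse_factor(value):
    """Turn a string such as 'origin:2,total:3' into {'origin': 2, 'total': 3}."""
    result = {}
    for part in value.split(","):
        key, count = part.strip().split(":")
        result[key.strip()] = int(count)
    return result


def search_factor_fits(search_factor, replication_factor):
    """Check that no search-factor value exceeds its replication-factor value.

    Assumption: a key that appears only in the search factor has no
    counterpart to compare against, so it is reported as not fitting.
    """
    sf = parse_factor(search_factor)
    rf = parse_factor(replication_factor)
    return all(key in rf and count <= rf[key] for key, count in sf.items())
```

For example, `search_factor_fits("origin:1,total:2", "origin:2,total:3")` passes, matching the Cluster Master sample configuration, while a search factor of `origin:3,total:3` against the same replication factor does not.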
site_replication_factor and site_search_factor
Configuration and Explanation/Rules:

origin:2, total:3
    Default value. Origin site has 2 copies, 3 copies cluster-wide.
    Splunk will put the extra copy in a site that doesn't have a copy.
origin:2, total:4
    Similar to above, but Splunk will try to put a single copy into any site that doesn't have one.
origin:2, site1:2, total:4
    Both site1 and origin will require a minimum of 2 copies.
    If origin==sitex, then we require a minimum of the max of the 2 values (in our case, still 2).
origin:2, site1:1, total:4
    If origin==site1, Splunk will put 2 copies of the bucket in site1.
origin:2, site1:2, site2:2, total:3
    Invalid: the individual sites add up to more than total!

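The rows above can be expressed as a small validator. This is a hedged Python sketch that covers only the rules stated on this slide (origin and total are required, explicit site counts may not exceed total, and the origin site needs the larger of origin and its explicit count). The names `parse_factor`, `check_replication_factor`, and `copies_required_on` are hypothetical and do not correspond to any Splunk API.

```python
def parse_factor(value):
    """Turn a string such as 'origin:2,total:3' into {'origin': 2, 'total': 3}."""
    result = {}
    for part in value.split(","):
        key, count = part.strip().split(":")
        result[key.strip()] = int(count)
    return result


def check_replication_factor(value):
    """Return a list of problems; an empty list means the slide's checks pass."""
    counts = parse_factor(value)
    problems = []
    for required in ("origin", "total"):
        if required not in counts:
            problems.append("missing required key: " + required)
    # Sum only the explicitly named sites (site1, site2, ...).
    explicit = sum(n for key, n in counts.items() if key.startswith("site"))
    if "total" in counts and explicit > counts["total"]:
        problems.append("explicit site counts add up to more than total")
    return problems


def copies_required_on(site, origin_site, value):
    """Minimum copies a site must hold for a bucket that originated on origin_site."""
    counts = parse_factor(value)
    explicit = counts.get(site, 0)
    if site == origin_site:
        # origin==sitex: the requirement is the max of the two values.
        return max(counts["origin"], explicit)
    return explicit
```

With `origin:2, site1:1, total:4`, `copies_required_on("site1", "site1", ...)` gives 2, which is the case described in the fourth row.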
Multisite Configuration
- Sample configuration: these two configurations are equivalent

    available_sites = site1,site2
    site_replication_factor = origin:1,total:2

    available_sites = site1,site2
    site_replication_factor = origin:1,site1:1,site2:1,total:2

[Diagram: Site1 Indexers, Site2 Indexers]

Search Affinity
- Before multisite clustering, each bucket had a single primary searchable copy that would respond to searches
- With multisite, each bucket now has a primary per site
    - An individual copy can be primary for multiple sites (search affinity)
    - If a searchable copy of a bucket exists on a site, Splunk will make that copy the primary for that site
- Search heads also have a site (search affinity)
    - Searches will get as many events as possible from indexers that share the same site

Search Affinity
[Diagram: Site1 search head with Site1 Indexers; Site2 search head with Site2 Indexers; numbered bucket copies on each indexer]
- When a searchable copy becomes available on a site, Splunk will move the primary for that site to its local copy
- Buckets on a site will return events to a search head with the same site
- If a peer goes down, the master will move the primaries that peer had to another copy
- If the entire site goes down, copies on the other site(s) will become primaries

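The per-site primary rule described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not how the Cluster Master is implemented: the function name `assign_primaries` and the (peer, site) tuple layout are hypothetical, only copies that are currently up and searchable are passed in, and picking the first matching copy is an arbitrary tie-break that the slides do not specify.

```python
def assign_primaries(searchable_copies, sites):
    """Pick one primary copy per site for a single bucket.

    searchable_copies: list of (peer, site) tuples for copies that are
    currently up and searchable.
    sites: list of all site names in the cluster.
    Returns a dict mapping each site to the peer holding its primary.
    """
    primaries = {}
    for site in sites:
        local = [peer for peer, peer_site in searchable_copies if peer_site == site]
        if local:
            # A local searchable copy exists, so it serves this site.
            primaries[site] = local[0]
        elif searchable_copies:
            # No local copy (for example, the site is down): a copy on
            # another site becomes primary for this site as well.
            primaries[site] = searchable_copies[0][0]
    return primaries
```

With copies on `idx1` (site1) and `idx3` (site2), each site is served by its own peer. If the site2 copy is removed from the input, `idx1` becomes primary for both sites, which is the "an individual copy can be primary for multiple sites" case.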
Where Do the Buckets Go?
When a new hot bucket is created, Splunk will choose replication peers as follows:
1. For specific site counts (origin:n and sitex:n), randomly choose peers from that site to be targets
    - Example (site2:2): Splunk will find 2 random peers in site2 to be the replication partners
2. For the remaining unspecified counts (what is left of total after subtracting the specific counts):
    - We target at least 1 copy into sites that have no copies yet
        - Example (origin:2, total:4) with (available_sites=site1,site2,site3,site4): there are two unspecified counts here and 3 sites that have no copies yet, so Splunk will randomly target a copy into two of those sites
    - If every unspecified site has at least 1 copy, Splunk will then choose sites with the lowest number of copies (which leads to an even distribution, number of peers permitting)
        - Example (origin:2, total:6) with (available_sites=site1,site2,site3,site4): there must be 2 copies in the origin site; Splunk will then distribute the remaining 4 copies over all sites

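The two steps above can be sketched as a per-site copy count. This Python sketch models only how many copies land on each site, not which peers are chosen, and it ignores the "number of peers permitting" limit. The names `parse_factor` and `site_copy_counts` are hypothetical, and computing the remainder against copies already placed (rather than against the raw sum of the listed values) is an assumption.

```python
import random


def parse_factor(value):
    """Turn a string such as 'origin:2,total:4' into {'origin': 2, 'total': 4}."""
    result = {}
    for part in value.split(","):
        key, count = part.strip().split(":")
        result[key.strip()] = int(count)
    return result


def site_copy_counts(origin_site, available_sites, factor, rng=random):
    """Return {site: number of copies} for one new bucket."""
    counts = parse_factor(factor)
    # Step 1: specific counts. The origin site needs the larger of
    # origin:n and its own sitex:n value.
    placed = {site: counts.get(site, 0) for site in available_sites}
    placed[origin_site] = max(placed[origin_site], counts["origin"])
    # Step 2: hand out the unspecified remainder one copy at a time,
    # always to a site with the fewest copies. Sites with no copies are
    # therefore filled first, then the distribution evens out.
    remaining = counts["total"] - sum(placed.values())
    for _ in range(max(remaining, 0)):
        lowest = min(placed.values())
        candidates = [s for s in available_sites if placed[s] == lowest]
        placed[rng.choice(candidates)] += 1
    return placed
```

Under this sketch, with `available_sites = site1,site2`, both `origin:1,total:2` and `origin:1,site1:1,site2:1,total:2` yield one copy per site regardless of which site is the origin, which is consistent with the sample configuration slide. Passing a seeded `random.Random` instance as `rng` makes the random choices repeatable.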
Some Things to Note
- The Cluster Master coordinates all primary changes
    - If it fails, primaries will no longer change, and thus we may lose site affinity if a site goes down
    - Solution is to bring up another cluster master (can be in a separate site)
- Buckets created before multisite is turned on follow slightly different rules:
    1. They follow the old replication_factor and search_factor rules instead of multisite rules
    2. These buckets will also not replicate across sites. Splunk will try to keep these old buckets on their origin site, and perform replications between peers of that site

Migration

Migration
- 6.0 (non-clustering) to 6.1 (multisite)
    - Multisite policies will be applied to new data
    - Pre-multisite buckets will have a single copy of the data until they age out
- 6.0 (clustering) to 6.1 (multisite)
    - Multisite policies will be applied to new data
    - Pre-multisite buckets will follow the legacy rep_factor / search_factor policies until they age out

Scaling & UI Enhancements
Multisite Clustering Scaling
- Indexers: tested a 1000-node cluster
    Largest cluster tested in-house:
    Splunk 5.0    10 nodes
    Splunk 6.0    150 nodes
    Splunk 6.1    450 nodes
    Splunk 6.2    1000 nodes
- Indexes: tested 100+ indexes, 200,000 buckets
- Sites: tested a 200-node cluster with 63 sites, 3 nodes in each site

Clustering UI: New Bucket Status Page

Clustering UI: Fixup Tasks
- In-progress fixup activities
- Pending fixup tasks

Clustering UI: Manage Excess Buckets

Summary
Key Benefits of Multisite Clustering
1. Faster Recovery from Disastrous Events
2. Intelligent Search Routing through Search Affinity
3. Continuous Data Availability

Q & A
THANK YOU