Copyright 2013 Splunk Inc. Architec;ng Splunk for High Availability and Disaster Recovery Dritan Bi;ncka Professional Services #splunkconf
Legal No;ces During the course of this presenta;on, we may make forward- looking statements regarding future events or the expected performance of the company. We cau;on you that such statements reflect our current expecta;ons and es;mates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward- looking statements, please review our filings with the SEC. The forward- looking statements made in this presenta;on are being made as of the ;me and date of its live presenta;on. If reviewed auer its live presenta;on, this presenta;on may not contain current or accurate informa;on. We do not assume any obliga;on to update any forward- looking statements we may make. In addi;on, any informa;on about our roadmap outlines our general product direc;on and is subject to change at any ;me without no;ce. It is for informa;onal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obliga;on either to develop the features or func;onality described or to include any such feature or func;onality in a future release. Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respeccve owners. 2013 Splunk Inc. All rights reserved. 2
About Me! Member of Professional Services! Several mul;- TB deployments! 50+ projects/customers! Third.conf 3
Agenda Problems! Disaster recovery Recover in the event of a disaster! High availability Data collection Searching Indexing! Top takeaways Maintain an acceptable level of continuous service 4
Disaster Recovery (DR)
What is Disaster Recovery? Set of processes necessary to ensure recovery of service after a disaster 6!
Disaster Recovery Steps 1 Backup necessary data Local backup vs. remote Backup to a medium at least as resilient as source 2 Restore Ensure this works Backup is worthless without restore 7!
Backup 1 a Configurations $SPLUNK_HOME/etc/* b Indexes Buckets: Hot*, warm, cold, frozen 8!
Backup Configurations Splunk instance $SPLUNK_HOME/etc/* 9
Backup: Bucket Lifecycle Events [Hot bucket is full] [Too many warms] [Out of space or bucket is old] $ Home path $ Cold path [Cheaper storage] [Explicit user action] $ Thawed path $ Frozen path or deleted 10!
Backup Indexes Bucket type State Can backup? Read + write No* Read only Yes Read only Yes 11
What if your backup fails? 12
13
Restore Configurations Splunk instance New Splunk instance $SPLUNK_HOME/etc/* $SPLUNK_HOME/etc/* 14
Restore Data Splunk instance New Splunk instance $Indexes_Location ($SPLUNK_HOME/var/lib/splunk) $Indexes_Location ($SPLUNK_HOME/var/lib/splunk) 15
Putting Restore Together a New Splunk instance 2 b Configurations C Indexes 16!
Things to Think About:! Recovery )me vs. complexity and cost Ex. Backup to tape vs. warm standby! Tolerable loss vs. complexity and cost Ex. Backup warm/cold buckets vs. mirror hot bucket writes 17!
High Availability (HA)
What is High Availability? A design methodology whereby a system is continuously operational, bounded by a set of predetermined tolerances Note: high availability!= complete availability 19!
Splunk High Availability 1 Data collection 2 Searching 3 Indexing 20!
Data Collection A Indexers... B Forwarder Forwarder... Forwarder 21!
Data Collection A Indexers... B outputs.conf:!! [tcpout]! defaultgroup = mygroup!! [tcpout:mygroup]! server = A:9997, B:9997! autolb = true!... Forwarder Forwarder Forwarder 22!
Searching 2 a b Multiple, Independent Search Heads Splunk Native Search Head Pooling 23!
Searching Typical search hierarchy Indexer A Indexer B...! Indexer N 24!
Searching: Independent SH Typical search hierarchy Indexer A Indexer B...! Indexer N 25!
Searching: Independent SH Search head A Search head B Sync configs Sync job artifacts Independent search heads Indexer A Indexer B...! Indexer N 26!
Search Head Pooling 27!
Searching: Pooling Use NFS to share - Configurations - Job artifacts Search head A Search head B NFS Indexer A Indexer B...! Indexer N 28!
Searching: Pooling Use NFS to share - Configurations - Job artifacts Search head A Search head B NFS Indexer A Indexer B...! Indexer N 29!
Steps to SHP 1 Configure the shared location (NFS/SMB) 2 3 Enable pooling on each (stopped) search head./splunk pooling enable <path_to_share> Copy $SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users to $path_to_share/etc/ 4 Start all search heads 30!
Indexing a Reliable storage (SAN) 3 b Mirrored indexers C Index replication 31!
Indexing: Reliable Storage Indexer A Use the underlying SAN infrastructure to provide HA SAN 32!
Indexing: Reliable Storage Indexer A When indexer A fails, mount SAN to another indexer instance SAN Indexer B 33!
Indexing: Mirrored Indexers Search head Primary indexers X.0 index (locally) and forward to their respective secondaries X.1 Indexer A.0 Indexer B.0 Indexer A.1 Indexer B.1 34!
Indexing: Mirrored Indexers Search head Point search head to mirrored indexer B.1 Indexer A.0 Indexer B.0 Indexer A.1 Indexer B.1 35!
Index Replication
Indexing: Index Replication! Introduced with Splunk 5! Cluster = a group of search peers (indexers) that replicate each others' data! Mul;ple copies of all data exist throughout the cluster! The process that achieves this is known as index replica;on! Bucket based replica;on i.e. the unit of replication is a bucket 37!
Index Replication Benefits! Data availability. An indexer is always available to handle incoming data, and the indexed data is always available for searching! Disaster tolerance. System can tolerate downed indexers without losing data or losing access to data 38!
Index Replication Trade Offs! Extra storage! To a lesser degree, increased processing load More disaster tolerance (i.e., more copies of data) requires higher storage requirements Governed the replication factor (RF) attribute 39!
3 node Cluster Replication factor of 3 40
Ok guys, pair up in threes! Yogi Berra 41
Cluster Components! Master node Orchestrates replication process Informs the SH where to find data Helps manage peer configurations! Peer nodes Receive and index data (normal peer behavior) Replicate data to/from other peers Peer nodes number RF 42!
Cluster Components! At least one search head Must use one to manage searches across the cluster Can t run searches directly on an indexer, as you can with nonclustered indexers! Auto load- balanced forwarders Capable of indexer acknowledgement ê Ensuring data gets reliably indexed Note that other input methods can be used as well, but no indexer acknowledgement is available 43!
Cluster Concepts: RF! Replica;on factor :: RF defaults to 3 Number of copies of data in the cluster Cluster can tolerate RF-1 node failures! Cluster can tolerate 2 node failures 44!
Cluster Concepts: SF! Searchable factor :: SF defaults to 2 Number of searchable copies the cluster maintains SF RF Requires more storage and more processing than replicated copies Replicated copy vs. searchable copy SF = 1 vs. SF = 2 45!
Clustered Indexing! Origina;ng peer node streams copies of data to other clustered peers They store those copies in their own buckets Some peers may also index the copy! Master determines which peer nodes get replicated data Currently not configurable! Master manages all peer- to- peer interac;ons! Master keeps track of which peers have searchable data Ensures that there are always SF copies of searchable data available When a peer goes down, the master coordinates remedial activities 46!
Clustered Searching! Search head coordinates all searches in the cluster! SH relies on master to tell it who its peers are! Only one replicated bucket is searchable (primary) Searches occur over primary buckets, only! Primary buckets (searchable) may change over ;me Peers know their status and therefore know where to search 47!
Putting HA All Together NFS! Search head pooling HA Index replication or SAN based mirrorring..!..! Forwarders autolb HA 48!
Top Takeways! DR Process of backing-up and restoring service in case of disaster Configuration files Copy of $SPLUNK_HOME/etc/ folder Indexed data Backup and restore buckets ê Hot, warm, cold, frozen ê Can t backup hot but can safely backup warm and cold! HA Continuously operational system bounded by a set of tolerances Data collection ê Autolb from forwarders to multiple indexers ê Use indexer acknowledgement to protect in flight data Searching ê Search head pooling (SHP) Indexing ê Use index replication ê Highly reliable storage, mirrored set of indexers 49!
Questions?
Next Steps 1 2 Download the.conf2013 Mobile App If not iphone, ipad or Android, use the Web App Take the survey & WIN A PASS FOR.CONF2014 Or one of these bags! 3 Go to the sessions listed on the next slide 51
More Info Other Sessions You May Like:! Architec)ng and Sizing Your Splunk Deployments Brera 2&3, Level 3 Today, 3-4pm! Planning and Execu)on for Successful Deployments Brera 2&3, Level 3 October 3rd, 10:15-11:15am 52