Copyright 2014 Splunk Inc.
The Jiffy Lube Quick Tune-Up for Your Splunk Environment
Sean Delaney, Client Architect, Splunk
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Introduction
About Me
! Splunk Client Architect
! Splunker for 3+ years
! Using Splunk for 4+ years
! Large-scale deployment specialist
! Previously Splunk Professional Services
! 10+ years production services for a large Internet security company
Agenda
! Time Issues
! Indexer Health
! Search Performance
! New Kids on the Block
What's a Splunk Tune-Up?
! Identifying common performance issues in a distributed environment: time issues, lagging events, monitoring indexer queues, slow and resource-intensive searches
! Providing steps to improve performance via configuration changes
Splunk Architecture
Splunk on Splunk
http://apps.splunk.com/app/748/
Time
Time is Important
! Splunk indexes all events by time
! Splunk searches are bound by a time range
! When you run a search, Splunk retrieves events based on their indexed time
Common Time Issues
! Extracting/assigning the correct timestamp for the event
  - Multiple timestamps in an event
  - Time but no date
  - Custom timestamp formats
! Line breaking
  - Line breaking occurring in the middle of multi-line events
  - Orphaned log lines with no timestamps
! Time zones
! Servers not synchronized to an authoritative time source (NTP)
Assigning Time
! Timestamps are determined and set at index time
! Splunk will first attempt to extract the timestamp out of the raw event
! The timestamp is stored in the _time field in UTC time format

10.10.241.162 - - [29/Mar/2014:17:43:09 -0700] "GET /includes/js/combined-javascript.js HTTP/1.1" 200 5067 0.003
10.10.241.162 - - [29/Mar/2014:17:43:09 -0700] "GET /favicon.ico HTTP/1.1" 200 2494 0.001
10.10.241.162 - - [29/Mar/2014:17:43:17 -0700] "POST /login.jsp HTTP/1.1" 302 - 0.028
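To sanity-check how the extracted timestamp landed in _time, a quick search can render _time alongside the raw event; a sketch only, and the index and sourcetype names are hypothetical.

index=web sourcetype=access_combined
| eval event_time=strftime(_time, "%Y-%m-%d %H:%M:%S %z")
| table event_time _raw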
The Side Effects
! Incorrectly timestamped events can:
  - Exclude key events from the result set
  - Exclude events from a real-time window
  - Miss correlation of events
  - Cause missed alerts
  - Throw off summary indexing results
  - Produce incorrect reports
  - Increase the time window of index buckets
  - Cause heartburn at audit time
Finding Timestamping Issues
! Where are my events? Possibly in the future.
! Run a real-time, All Time search
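An alternative to watching a real-time window is to search explicitly for events stamped in the future; this is a sketch, and the all-indexes scope and the five-year horizon are arbitrary assumptions.

index=* earliest=+1s latest=+5y
| stats count by index sourcetype host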
TZ Issues
! Splunk will use a time zone offset reported in the _raw time field:
29/Jan/2014:17:43:09 -0700
29/Jan/2014:17:43:09 PST
! Otherwise, if your events come from systems in different time zones, you may need to account for this, as the event will be set with the TZ of the indexer
! Set TZ offsets in props.conf:
[host::chi*]
TZ = US/Central
! Default forwarder time zone: the timezone supplied by the forwarder as its system timezone (both forwarder and indexer must be Splunk 6 or later)
Other Time Errors
! Check splunkd.log for DateParserVerbose warnings:
index=_internal source=*splunkd.log WARN DateParserVerbose

08-10-2014 05:59:14.488 +0000 WARN DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event. Context="source::/log/applog.log host::appserver appserver_log remoteport::58689" Text="columnCount = 6;...

! These errors are normally due to incorrect TIME_FORMAT or LINE_BREAKER settings
Use the log file viewer in SoS
TIME_FORMAT in props.conf
[source::transaction.log]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = ^Processing
TIME_PREFIX = \sat\s
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 70

! Verify the strptime() format with the date command:
% date "+%Y-%m-%d %H:%M:%S"
2014-08-19 11:06:24
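As a hypothetical illustration of the stanza above, an event like the following would start a new multi-line event at "Processing" and pick up its timestamp from the text after " at "; the field names are invented for the example.

Processing order id=1234 at 2014-08-19 11:06:24 status=OK
  item=widget qty=3
  item=gadget qty=1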
Where to apply props.conf
Verify events are matching TIME_FORMAT
! 08-07-2014 14:33:09.566 -0700
! Use regex in your search to isolate events which don't have a matching TIME_FORMAT:
index=x sourcetype=y | regex _raw!="^\d{2}-\d{2}-\d{4}\s\d{2}:\d{2}:\d{2}\.\d{3}\s-\d{4}"
! A similar approach using punct:
index=x sourcetype=y punct!="--_::._-"
Getting Data In on Time
! Events are slow getting into Splunk?
! Look at the indexing time lag, for example:
source=<your delayed source> | eval delay=_indextime-_time | fields delay
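To summarize the lag rather than inspect it per event, the same calculation can feed a stats command; a sketch using the same placeholder source.

source=<your delayed source>
| eval delay=_indextime-_time
| stats avg(delay) max(delay) by host sourcetype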
SoS Real-Time Indexing Lag
S.o.S - Splunk on Splunk > Distributed Indexing Performance
Causes for Reported Indexing Lag
! Network throughput issues
! Universal/lightweight forwarder throttling (limits.conf):
[thruput]
maxKBps = 256
! Blocked indexer queues
! Buffered writing to log files
! Batch log files or historical log file loads
! Incorrect timestamp extraction
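If forwarder throttling turns out to be the cause, the limit can be raised or removed in limits.conf on the forwarder; treat the value below as an assumption to size for your own network.

[thruput]
# 0 removes the thruput limit entirely; a higher figure (e.g. 1024) raises it
maxKBps = 0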
Indexer Health
Indexer Server Health
! Monitor the indexer's server resources:
  - Memory usage
  - Processes
  - System load
  - I/O throughput
! System commands: top, free/swap, iostat
! *nix/Windows apps
SoS Splunk CPU/Memory Usage
Indexer Queues
! Indexer queues provide insight into an indexer's internal state
! Blocked queues indicate a processing bottleneck in the indexing process and will delay the indexing of events
! Default max queue size = 1000; when this limit is reached the queue will be blocked
! It's normal for queues to be temporarily blocked; it indicates that your indexer is doing work
! Persistently blocked queues indicate a performance issue and will block upstream queues as they fill
Important Indexer Queues
! parsingqueue: UTF encoding, line breaking (single line), header processing
! aggqueue: line merging (identify single vs. multi-line events), timestamp extraction
! typingqueue: regex replacement (field extraction, routing), punct extraction
! indexqueue: index and forwarding, block signature, writing to disk
Looking for Bottlenecks
! View the current queue size by searching the metrics log:
index=_internal source=*metrics.log group=queue | timechart median(current_size) by name
! Find blocked queue events:
index=_internal source=*metrics.log group=queue blocked
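To see which queue is blocking and when, the blocked events can be charted by queue name; a sketch, assuming the metrics.log events carry the blocked=true marker as in recent Splunk versions.

index=_internal source=*metrics.log group=queue blocked=true
| timechart count by name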
SoS Indexer Queues
S.o.S - Splunk on Splunk > Indexing Performance
SoS Indexer Queues
S.o.S - Splunk on Splunk > Distributed Indexing Performance
Tuning Configs
! Performance can be improved by modifying the configuration file settings related to each queue's operation (see the combined sketch after this list)
1. parsingqueue (props.conf)
   LINE_BREAKER
   TRUNCATE
2. aggqueue (props.conf)
   SHOULD_LINEMERGE = false
   TIME_PREFIX = <something>
   TIME_FORMAT = <something>
   MAX_TIMESTAMP_LOOKAHEAD = X
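Put together, a props.conf stanza for a hypothetical single-line sourcetype might look like this; the sourcetype name and timestamp format are assumptions, not a prescription.

[my_single_line_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRUNCATE = 10000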
Tuning Configs
3. typingqueue
   props.conf: TRANSFORMS-xxx, SEDCMD
   transforms.conf: SOURCE_KEY, DEST_KEY, REGEX, FORMAT
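One common use of this props/transforms pair is dropping noisy events before they are indexed; the stanza and sourcetype names below are hypothetical, and the DEBUG pattern is only an example.

# props.conf
[my_sourcetype]
TRANSFORMS-null = drop_debug

# transforms.conf
[drop_debug]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue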
Tuning Configs
4. indexqueue (index and forwarding, block signing)
   indexes.conf settings:
   maxHotBuckets = 10
   maxDataSize = auto_high_volume
! In most cases a blocked indexqueue is due to more data being written to disk than can be serviced
! If the disk performance cannot be improved, consider adding additional indexers to offload the indexing workload
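A hypothetical indexes.conf stanza for a high-volume index might combine these settings as below; the index name and paths are placeholders.

[my_high_volume_index]
homePath   = $SPLUNK_DB/my_high_volume_index/db
coldPath   = $SPLUNK_DB/my_high_volume_index/colddb
thawedPath = $SPLUNK_DB/my_high_volume_index/thaweddb
maxHotBuckets = 10
maxDataSize = auto_high_volume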
Search Performance
Search Performance
! Search performance: identifying slow searches, re-factoring searches to take advantage of map-reduce (example below)
! Bundle replication white/blacklists
! Where summary indexing/search acceleration helps
! Spreading out scheduled searches using cron syntax to avoid search hot spots at midnight and at the top of the hour
! Using time snaps effectively and reducing the risk of missing delayed events
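As an illustration of re-factoring, pushing a filter into the base search lets Splunk use the index to skip non-matching events, rather than reading every event and filtering afterwards; the index, sourcetype, and field names here are hypothetical.

Slower (filtering after retrieval):
index=web sourcetype=access_combined | where status=500 | stats count by host

Faster (filter pushed into the base search):
index=web sourcetype=access_combined status=500 | stats count by host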
Search Performance Reporting
! Splunk Server Activity Reports
! SoS Search Activity Reports
SoS Search Performance
SoS - Scheduler Activity
SoS - Skipped Scheduled Searches
Concurrent Searches
! SoS > Search Activity > Search concurrency
! System Activity Report > Search Activity Overview
Concurrent Searches
! The system-wide number of concurrent searches is computed as:
base_max_searches + (max_searches_per_cpu x number_of_cpus)
! Splunk 5 (limits.conf):
[search]
base_max_searches = 6
max_searches_per_cpu = 1
! number_of_cpus = number of CPUs reported by the OS; with hyperthreading enabled this is 2 x physical cores
! Beware: increasing the number of concurrent searches may increase the CPU load and the time taken for any search to complete
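As a worked example on a hypothetical box: an indexer reporting 16 CPUs with the default settings above allows 6 + (1 x 16) = 22 concurrent searches system-wide.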
Scheduled Search Limit
! The ratio of jobs that the scheduler can use (versus the manual/dashboard ones)
! By default, 25% of your quota is for scheduled searches (limits.conf):
[scheduler]
max_searches_perc = 25
! If you have a large number of scheduled searches, increasing this percentage may help with skipped searches
! The searches are NOT reserved for the scheduler - this is just a limit - ad-hoc user searches take priority over the scheduler
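Continuing the hypothetical 16-CPU example from the previous slide: 25% of the 22-search quota leaves roughly 5 concurrent searches for the scheduler (22 x 0.25 = 5.5).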
Summary Indexing
! Use time modifier offsets to snap-to-time
! For the last hour: -1h@h to @h
! Offset the scheduled time from the top of the hour to 5 minutes after the hour: 5 * * * *
! This offset method can be applied to other scheduled searches

Search   Start Time   Finish Time   Cron Schedule
sidata   -1h@h        @h            5 * * * *

The scheduled saved search runs every hour at 5 minutes past the hour, with a time range of the last whole hour. The results from the summary search populate the summary index for the last whole hour.
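In savedsearches.conf terms, such a summary-populating search might look like the sketch below; the search name, summary index name, and base search are all hypothetical.

[Hourly Web Summary]
enableSched = 1
cron_schedule = 5 * * * *
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
action.summary_index = 1
action.summary_index._name = my_summary
search = index=web sourcetype=access_combined | sistats count by host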
New Kids on the Block
_introspection Index
! In 6.1 the _introspection index was added
! Collects Splunk performance data:
  - Per-process resource usage data: splunkd, splunkweb, splunk-optimize
  - Search process data
  - Host resource usage: CPU utilization, paging
  - Disk object information: server-info, indexes, volumes, search-artifacts, fishbucket
! http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Abouttheplatforminstrumentationframework
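A quick way to see what the index is collecting on a given instance is to break the events out by sourcetype and component; a sketch, and the exact component values vary by Splunk version.

index=_introspection | stats count by sourcetype component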
Distributed Management Console
! New in 6.2: visualizes data collected in the _introspection index
Wrapping Up
In Review
! It's important to ensure that events are indexed with the correct time, otherwise you risk missing data in your result sets
! Tracking down time zone issues can be difficult; all your systems running in UTC is Utopia
! Correct setting of TIME_FORMAT can improve indexer throughput
! Monitor Splunk's indexer queues to stay ahead of indexer load/throughput issues
! Tuning settings in props and transforms can increase your indexer throughput
! Monitor search activity, fix expensive searches, and modify scheduled search times to avoid search load hot spots
Answers
! http://answers.splunk.com
Troubleshooting Manual
http://docs.splunk.com/Documentation/Splunk/latest/
Resources
! Get educated: http://www.splunk.com/view/education/sp-CAAAAH9
! Download Splunk applications: http://apps.splunk.com
! Hire Splunk Professional Services: http://www.splunk.com/view/professional-services/sp-CAAABH9
! Watch some videos: http://www.splunk.com/videos
Splunk on Splunk
http://apps.splunk.com/app/748/
Questions?
THANK YOU