Using Splunk to Monitor the Customer Experience
  Justin Brown
  Pacific Northwest National Laboratory
  NLIT Summit 2015
About Me
  Justin Brown (justin@pnnl.gov)
  IT Engineer, Automation & Monitoring Team
  15 years at PNNL
  Lead Engineer for Splunk
The Challenge
  Traditional monitoring
    Servers & Services
  Customer Focused
    Outside looking in
Why Splunk
  Pulls together logs from several sources
  Scripted inputs
  Database connectivity
  Visualization
  Splunk 6.x Dashboard Examples: https://splunkbase.splunk.com/app/1603/
The Targets
  Accounts
  Workstations
  Lync
  Email
  Websites
  Network
The Plan
Accounts
  Account Lockouts
  Bad Password Attempts
  Calls to the Help Desk
Accounts: Bad Passwords
  Source: Domain Controller Event Logs
    index=os source=wls:security host=dcpn* EventID=4771 Status=0x18
    | timechart span=1h dc(user) as perhour
Accounts: Account Lockouts
  Source: Domain Controller Event Logs
    index=os source=wls:security host=dcpn* EventID=4740 process=security
    | timechart span=1h dc(user) as perhour
Accounts: Help Desk Calls
  Source: Help Desk Ticket Database
    | dbquery "MAXIMO_PROD" "SELECT TICKETID, DESCRIPTION, COMMODITYGROUP, COMMODITY, REPORTDATE FROM MAXIMO.TICKET WHERE REPORTDATE > SYSDATE - 1"
    | search (DESCRIPTION=*password* AND COMMODITY=ADACCESS) OR DESCRIPTION=*account*lock*
    | rename REPORTDATE as _time
    | timechart span=1h count(TICKETID) as perhour
Workstations
  Reliability Score
  Calls to the Help Desk
Workstations: Reliability Score
  Source: Workstation Event Logs
    `wls` EventID=2005 ProviderName=Microsoft-Windows-Reliability-Analysis-Engine Stability=*
    | timechart span=1d eval(round(avg(Stability),2)) as perday

    `wls` EventID=2005 ProviderName=Microsoft-Windows-Reliability-Analysis-Engine Stability=*
    | timechart span=1d dc(host) as perday
Lync
  SCOM Synthetic Transactions
  Application Crashes and Hangs
  Calls to the Help Desk
Lync: Synthetic Transactions
  Source: SCOM Synthetic Transactions in Event Logs
    index=os source=wls host=<server name> EventID=334
    | timechart span=1h count as perhour
Lync: Crashes & Hangs
  Source: Workstation Event Logs
    `wls` EventID=1001 process=application Data1=APPCRASH Data4=lync.exe
    | timechart span=1h count as perhour
Email
  SCOM Synthetic Transactions
  Application Crashes and Hangs
  Calls to the Help Desk
Email: Synthetic Transactions
  Source: SCOM Synthetic Transaction Logs
    index=scom sourcetype=scom_input DistApp=Exchange MaintenanceMode=False Status=Error
    | timechart span=1h count as perhour
Web Applications
  Selenium Synthetic Transactions
  SCOM SharePoint monitoring
  .NET Application Errors on Workstations
  Errors from IIS logs
  Calls to the Help Desk
Web Applications: Selenium
  http://www.seleniumhq.org/projects/webdriver/
  https://selenium-python.readthedocs.org/
  Source: Selenium Synthetic Transactions
    index=web sourcetype=synthetic:transaction
    | transaction execution_id transaction_name startswith="transaction_start" endswith="transaction_end" keepevicted=true maxspan=5m
    | search closed_txn=0
    | timechart span=1h count as perhour
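The search on this slide only works if each synthetic run logs a start event and an end event that share an execution_id and a transaction_name. A run with no end event is what the transaction command reports as closed_txn=0. The deck does not show the Selenium script itself, so the following is only a minimal sketch of how such events might be written. The function name run_transaction, the key=value line layout, and the sample transaction name portal_login are all assumptions for illustration, and the step callable stands in for the real WebDriver calls.

```python
import io
import time
import uuid


def run_transaction(name, step, out, execution_id=None):
    """Run one synthetic transaction and write start/end events to `out`.

    Returns True if `step` finished without raising, otherwise False.
    The event layout is an assumption, not the format used in the deck.
    """
    execution_id = execution_id or uuid.uuid4().hex

    def emit(marker):
        # key=value pairs, so the fields can be extracted at search time
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
        out.write(
            f"{stamp} execution_id={execution_id} "
            f"transaction_name={name} {marker}\n"
        )

    emit("transaction_start")
    try:
        step()  # in practice: Selenium WebDriver calls for the page under test
    except Exception:
        # No transaction_end is written, so this run stays an open transaction
        return False
    emit("transaction_end")
    return True


def failing_step():
    raise RuntimeError("simulated page failure")


log = io.StringIO()
ok_result = run_transaction("portal_login", lambda: None, log, execution_id="a1")
bad_result = run_transaction("portal_login", failing_step, log, execution_id="a2")
```

In this sketch the failed run leaves only a transaction_start event behind, which is the condition the closed_txn=0 filter counts per hour.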
Web Applications: .NET Errors
  Source: Workstation Event Logs
    `wls` EventID=1309 RequestURL=http*://* Eventmessage="An unhandled exception has occurred."
    | timechart span=1h dc(user) as perhour
Network
  SolarWinds via SCOM Alerts
  Calls to the Help Desk
Building Each Row
    index=os source=wls:security host=dcpn* EventID=4740
    | timechart span=1h dc(user) as perhour
    | stats sparkline(max(perhour),1h) as Trend, max(perhour) as Highest, latest(perhour) as Now
    | eval Section="Account Lockouts"
    | table Section, Trend, Now
    | rename Now as "Current Count"
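For readers less familiar with stats, the reduction on this slide collapses a series of hourly counts into one dashboard row: the series itself feeds the sparkline, the maximum becomes Highest, and the most recent value becomes Now. The following is a rough Python sketch of that reduction. The function name summarize and the hourly values are made up for illustration and do not come from the deck.

```python
def summarize(section, perhour):
    """Collapse hourly counts (oldest first) into one dashboard row."""
    return {
        "Section": section,
        "Trend": list(perhour),   # the series a sparkline would draw
        "Highest": max(perhour),  # stats max(perhour)
        "Now": perhour[-1],       # stats latest(perhour): most recent hour
    }


# Illustrative values only: distinct locked-out users per hour
hourly_lockouts = [2, 5, 12, 7, 3]
summary_row = summarize("Account Lockouts", hourly_lockouts)
```

Each section of the dashboard would produce one such row, which is why the later slides can stack them into a single table.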
Adding the Status
    index=os source=wls:security host=dcpn* EventID=4740
    | timechart span=1h dc(user) as perhour
    | stats sparkline(max(perhour),1h) as Trend, max(perhour) as Highest, latest(perhour) as Now
    | rangemap field=Now low=0-10 elevated=11-20 default=severe
    | rename range as "Current Status"
    | rangemap field=Highest low=0-10 elevated=11-20 default=severe
    | rename range as "Past 24 Hours"
    | eval Section="Account Lockouts"
    | table Section, Trend, Now, "Past 24 Hours", "Current Status"
    | rename Now as "Current Count"
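The status columns come from mapping a count onto named ranges, with a default for anything outside them. As a rough illustration of the thresholds on this slide (low=0-10, elevated=11-20, default=severe), the following Python sketch applies the same mapping to one row. The function name classify and the sample counts are assumptions for illustration; this mimics the threshold logic only and is not Splunk's implementation.

```python
def classify(value, ranges, default="severe"):
    """Return the name of the first inclusive range containing `value`."""
    for name, (low, high) in ranges.items():
        if low <= value <= high:
            return name
    return default


# Thresholds taken from the slide; both columns use the same ones
THRESHOLDS = {"low": (0, 10), "elevated": (11, 20)}

# Illustrative counts only
status_row = {"Section": "Account Lockouts", "Now": 4, "Highest": 23}
status_row["Current Status"] = classify(status_row["Now"], THRESHOLDS)
status_row["Past 24 Hours"] = classify(status_row["Highest"], THRESHOLDS)
```

With these sample counts the current status comes out as low while the 24-hour status is severe, which is the kind of contrast the two columns are meant to surface.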
Combining Queries
    | eval Section="Account Lockouts"
    | table Section, Trend, Now
    | rename Now as "Current Count"
    | append [ search index=os source=wls:security host=dcpn* EventID=4771 Status=0x18
        | timechart span=1h dc(user) as perhour
        ...
        | eval Section="Bad Passwords" ]
Adding Icons
  Custom JavaScript & CSS
Custom Drilldowns
    index=os source=wls:security host=dcpn* EventID=4740
    | timechart span=1h dc(user) as perhour
    | stats sparkline(max(perhour),1h) as Trend, max(perhour) as Highest, latest(perhour) as Now
    | eval Section="Account Lockouts"
    | eval Drilldown="ced_account_dashboard"
    | table Section, Trend, Now
    | rename Now as "Current Count"
Questions?