Using Big Data to Align IT Security with Business Risk
  Mark Seward, Senior Director, Security and Compliance
  Copyright 2013 Splunk, Inc.
Legal Notices
  During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make.
  In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
  Splunk and The Engine for Machine Data are registered trademarks or trademarks of Splunk Inc. and/or its subsidiaries and/or affiliates in the United States and/or other jurisdictions. All other brand names, product names or trademarks belong to their respective holders. 2013 Splunk Inc. All rights reserved.
A Little About Splunk
  Over 5,200 paying customers
  Over 300 free apps and templates
  Over 2,300 security use case customers
  Half of all Fortune 100 companies are customers
  Named #1 in Big Data and #4 Most Innovative Company in the World by Fast Company magazine
According to the Verizon Data Breach Investigations Report (DBIR), 92% of breaches are made public by someone other than the one who's been breached. An indictment of SIEM technologies: only 1% of breaches are detected by log analysis. WHY?
The Way Cyber Adversaries Think
  Where is the most important and valuable data?
  What's the typical patch cycle for applications and operating systems?
  What are the typical security defenses?
  How does the IT team prioritize vulnerabilities?
  What structural information silos exist for the security team?
  Who in the organization has access to the most valuable data and credentials I can steal?
  Are normal IT service user activities routinely monitored and correlated?
Attack Vectors Have Gotten Personal
  50% of attacks are based on compromised passwords.
  Are your security systems able to detect evil authenticated users?
Where's the Evidence of the Attack?
  In log data most companies already have
  Many of these companies had a SIEM
  So if I have log data and a SIEM, why am I still breached?
  Source: Verizon Data Breach Report 2012
The Way Some Security Folks Think
  "I hope my AV, IPS, firewall, (name your technology) vendor catches these guys."
  "I have 300 rules on my SIEM. One of them will catch the attacker."
  Attackers know that if you have a static correlation engine, you are likely trusting it, because often "no news is good news."
Why the Disconnect Between Attacker and Security Professional?
  Vendors have convinced security folks -- the reactive approach is the only approach
  Solutions don't reinforce skills -- SIEMs don't nourish the security person's inner hacker
  Security persons have been taught a data reduction strategy -- it compromises data fidelity and limits investigations
  Current SIEMs can't accept enough data for long-term pattern analysis
  Security folks see an attack as an event when it's actually a process
Security Has Outgrown the Traditional SIEM
  Security-relevant data: IT infrastructure logs / physical security / communication systems logs / application data / non-traditional data sources
  Normal user and machine-generated data: credentialed activities; unknown threats are behavior based and require analytics
  SIEM: known events as seen by the current security architecture; vendor supplied; hampered by lack of context
Current Architecture Issues
  Traditional SIEM: data reduction model
  Diagram labels: must fit a schema; selective, supported raw data (mostly traditional security data); correlation; reporting; data discard; leak
The Amount of Data Generated Only Gets Bigger
  Volume, Velocity, Variety, Variability
  Fastest growing, most complex, most valuable area of big data
  Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
  Stuxnet and Zeus mark the beginning of industrial, vertical attacks coming from machine data
What Does Machine Data Look Like?
  Sources: Patient Portal, Middleware Error, Care IVR, Community
  Fields highlighted across the sources: Patient ID, Time Waiting On Hold, Community ID
The Love Child of Google and Excel
  Text-based search + statistical analysis: over 150 data manipulation and visualization commands
  Text-based search: time-index ingestion, text-based search, nested search, cross data-type search
  Search commands: Append, Abstract, Cluster, Bucket, Multikv, Scrub, Join, Rare
  Statistical analysis commands: Cluster, Associate, Stats, AVG, Transaction, Addtotals, Delta, Eval, Stddev, Rare, Outlier, Streamstats, Timechart
  Copyright 2012, Splunk Inc.
Moving to a Data Inclusion Model
  Specific behavior-based pattern modeling for humans and machines
  Based on combinations of: location, role, data/asset type, data/asset criticality, time of day, action type, action length of time
  No up-front normalization
  Data inclusion model: time-indexed data, analytics and statistics commands, correlation, pattern analysis
New Way of Thinking: The Big Data Security Process
  "The true sign of intelligence is not knowledge but imagination." A. Einstein
The New Weapons of a Security Warrior
  Creativity + + Imagination
Using Statistical Analysis
  SQL Injection
    Phase: Infiltration
    Source: Web logs
    Search type: Outlier and exception
    Splunk search: len(_raw) +2.5stddev
    Why: Hacker puts SQL commands in the URL; URL length is standard deviations higher than normal
  Password Brutes
    Phase: Infiltration
    Source: Auth logs
    Search type: Outlier and exception
    Splunk search: short delta _time
    Why: Automated password guessing tools enter credentials much faster than humanly possible
  DNS Exfil
    Phase: Exfiltration
    Source: DNS logs / FW logs
    Search type: Outlier and exception
    Splunk search: count +2.5stddev
    Why: Hackers exfiltrate the data in DNS packets; standard deviations more DNS requests from a single IP
  Web Crawling
    Phase: Reconnaissance
    Source: Web/FTP logs
    Search type: Outlier and exception
    Splunk search: count(src_ip) +2.5stddev
    Why: Web crawlers (copying the web site for comments, passwords, email addresses, etc.) will be the source IP behind page requests standard deviations higher than normal
  Port Knocking
    Phase: Exfil/CnC
    Source: Firewall
    Search type: Outlier and exception
    Splunk search: Count outbound (deny) by ip
    Why: Threat does inside-out port scan to identify exfiltration paths
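The outlier idea behind the SQL injection and password brute-force rows can be sketched outside of Splunk. The Python below is a minimal illustration, not the searches from the slide: the function names, the sample URLs, and the one-second gap threshold are all assumptions made for the example. Only the "mean plus 2.5 standard deviations" cutoff comes from the slide.

```python
from statistics import mean, pstdev


def flag_length_outliers(urls, threshold=2.5):
    """Return URLs longer than mean + threshold * stddev of all URL lengths."""
    lengths = [len(u) for u in urls]
    mu = mean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        return []  # every URL has the same length, so nothing stands out
    cutoff = mu + threshold * sigma
    return [u for u in urls if len(u) > cutoff]


def count_fast_gaps(timestamps, min_gap=1.0):
    """Count consecutive login attempts less than min_gap seconds apart.

    min_gap is an assumed stand-in for "faster than humanly possible".
    """
    ordered = sorted(timestamps)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a < min_gap)


# Hypothetical web log: 50 ordinary requests plus one injected query string.
normal = [f"/products/item?id={i}" for i in range(100, 150)]
suspicious = (
    "/products/item?id=1' UNION SELECT username, password "
    "FROM users WHERE '1'='1"
)
outliers = flag_length_outliers(normal + [suspicious])

# Hypothetical auth log timestamps in seconds: a burst, then a long pause.
fast_attempts = count_fast_gaps([0.0, 0.2, 0.4, 0.6, 30.0])
```

A fixed cutoff like this only works if the baseline is computed over a window that is mostly clean traffic; a sustained attack would raise the mean and hide itself.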
A Process for Using Big Data for Security: Identify the Business Issue
  What does the business care about?
  What could cause loss of service or financial harm?
    Performance degradation
    Unplanned outages (security related)
    Intellectual property access
    Data theft
A Process for Using Big Data for Security: Construct a Hypothesis
  How could someone gain access to data that should be kept private?
  What could cause a mass system outage that the business cares about?
  What could cause performance degradation resulting in an increase in customer dissatisfaction?
A Process for Using Big Data for Security: It's About the Data
  Where might our problem be in evidence?
  For data theft, start with unauthorized access issues
  Facility access data, VPN, AD, wireless, applications, others
  Beg, borrow, SME from system owners
A Process for Using Big Data for Security: Data Analysis
  For data theft, start with what's normal and what's not (create a statistical model)
  How do we normally behave?
  What patterns would we see to identify outliers?
  Patterns based on time of day (ToD), length of time, who, organizational role, IP geo-lookups, the order in which things happen, how often a thing normally happens, etc.
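One way to picture "what's normal and what's not" is a per-user baseline built from time of day. The sketch below is only an illustration under stated assumptions: the event format (user, hour of day), the function names, the 5% rarity threshold, and the choice to treat users with no history as unusual are all hypothetical and not taken from the presentation.

```python
from collections import Counter, defaultdict


def build_hour_baseline(events):
    """Build {user: Counter(hour -> count)} from (user, hour_of_day) pairs."""
    baseline = defaultdict(Counter)
    for user, hour in events:
        baseline[user][hour] += 1
    return baseline


def is_unusual(baseline, user, hour, min_share=0.05):
    """True if this hour makes up less than min_share of the user's history."""
    history = baseline.get(user)
    if not history:
        return True  # assumption: no history at all counts as unusual
    total = sum(history.values())
    return history[hour] / total < min_share


# Hypothetical history: alice works office hours, with one stray 3 a.m. access.
history = [("alice", h) for h in (9, 10, 11, 14, 15)] * 20 + [("alice", 3)]
baseline = build_hour_baseline(history)

office_hours_flag = is_unusual(baseline, "alice", 10)
night_flag = is_unusual(baseline, "alice", 3)
unknown_user_flag = is_unusual(baseline, "bob", 10)
```

The same shape extends to the other pattern dimensions on the slide (role, geo-lookup, order of events) by changing what is counted per user.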
A Process for Using Big Data for Security: Interpret and Identify
  What are the mitigating factors?
  Does the end of the quarter cause increased access to financial data?
  Does our statistical model need to change due to network architecture changes, employee growth, etc.?
  Can we gather vacation information to know when it is appropriate for HPA users to access data from foreign soil?
  What are the changes in attack patterns?
Short Form - Example
  Step: Business issue
    Response: Service degradation causes monetary damage and customer satisfaction issues. Unwanted bots can degrade service and steal content.
  Step: Construct one or more hypotheses (team creativity required)
    Response: What combinations of data would be considered definitive evidence? What might be the first signs of trouble?
  Step: Gather data sources and expertise
    Response: List all data in which this might be reflected.
  Step: Determine the analysis to be performed
    Response: Determine the types of data searches appropriate and automation requirements.
  Step: Interpret the results
    Response: Do the results represent false positives or false negatives? Are there good bots and bad bots?
Big Data Platform: Insight for Business Risk
  Data: App Monitoring Data, Security Data, IT Operations Data, LDAP, AD, Watch Lists, Distribution System Data, Business Process Data
  Business Risk and Security
  Uses: Security & Compliance, IT Operations Management, Business Analytics, Web Intelligence, Application Monitoring
Looking Beyond IT for Business Risk
  Manufacturing Parts/Ingredients (RFID) Data
  Raw Materials Data
  Shipping Data (when/who loaded the truck)
  Facility Security Data
  Personnel Data
  Industrial Control System Data
  HVAC Data
  Distribution Monitoring Data (GPS)
  Point of Sale Data
  Traditional IT Data
What Manufacturing Questions Could You Ask of Splunk?
  Is the product quality compromised due to an increase in ambient temperature in the plant?
  What pattern of user activity did we see before they attacked the website?
  Who is accessing company data from outside the company but is sitting at their desk?
  Are the large file exchanges between these two employees normal?
  Who is stealing company property?
  What's the real-time ongoing drop-off rate in sales after a specific sales promotion ends?
  Are there employees that surf to the same website at exactly the same time every day?
  Are terms like virus, bot, and Anonymous trending upward on Twitter?
Why Big Data and Analytics Are the Future of Security
  Confidentiality, Integrity, and Availability: a holistic view of business security and risk mitigation, growing beyond traditional IT data sources
  Security is being redefined: monitoring and mitigating threats that compromise business reputation, service delivery, or confidential data, or result in loss of intellectual property
  Security folks need more data, not less, for accurate root cause analysis
  Complexity of threats will continue to grow and cross from IT to less traditional data / devices / sources
  A single investigation will include data from all parts of the business, beyond IT data
  Using statistical analysis for baselining and understanding outliers is the way to detect advanced threats
Thank You Questions?