Getting the Most Out of SIEM Data in Big Data
Presented by: Dr. Char Sample, CERT
Acknowledgements
Dr. Ben Shneiderman, UMD: Big Data, Big Insights. George Jones and John Stogoski, CERT: Alternatives to Signatures. Gartner: Reality Check for Cybersecurity and Fraud. Caltagirone, Pendergast & Betz: The Diamond Model of Intrusion Analysis. Bhattacharya & Mitra: Analytics on Big Fast Data Using Real Time Stream Data Processing Architecture.
Introduction
Defining Big Data; Big Data issues; what feeds BD; the signal-to-noise problem; gaining insights.
What is Big Data?
Lots of noise! "We are drowning in information but starved for knowledge." (John Naisbitt) There is so much noise that finding the signal is difficult; data scientists and data visualization help. The goal is to find ways of making sense of the noise that work for you!
What is Big Data?
What is different? Volume, variety, and velocity. What is common? Flow data and date/time stamps.
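Because date/time stamps are one of the few things the varied sources share, a common first step is normalizing them onto a single clock before any correlation. The sketch below is hypothetical: it assumes only two input formats (epoch seconds, as flow records often carry, and ISO 8601 strings with a UTC offset, as some proxies log), and that naive strings are already UTC. Real feeds will need more parsers.

```python
from datetime import datetime, timezone

def normalize_ts(raw):
    """Convert a raw timestamp to a timezone-aware UTC datetime.

    Assumed inputs (hypothetical): int/float epoch seconds, or an
    ISO 8601 string such as '2015-03-01T10:00:05-05:00'.
    """
    if isinstance(raw, (int, float)):
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: strings without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

def merge_sources(*sources):
    """Merge records from several sources into one time-ordered list.

    Each record is a dict whose 'ts' field may be in any supported format;
    the merged copies carry a normalized UTC datetime instead.
    """
    merged = []
    for source in sources:
        for rec in source:
            merged.append({**rec, "ts": normalize_ts(rec["ts"])})
    return sorted(merged, key=lambda r: r["ts"])
```

Once every record shares one clock, sorting and windowing work the same way regardless of where the data came from.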
BD - Components
Security BD comes from many sources: proxy logs, e-mail metadata, security logs (firewall, IDS, IPS, authentication, HBSS), DNS and DHCP logs, query logs, badge data, NetFlow, crash-dump analysis, SIEM data, and news events*. Other contributors: hunt teams (intuitive engineers), threat intelligence, and news sources.
BD Characteristics - Events
Events: methods for handling them tend to be intuitive. Event types: temporal, network, tree. Event processing: with signatures, the challenge is that they miss too much (false negatives); with anomaly detection (AD), the challenge is informing decisions in a timely manner (false positives). Probabilistic vs. possibilistic approaches.
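The trade-off between signatures and anomaly detection can be sketched in a few lines. Everything here is hypothetical: the signature strings, the login-count baseline, and the z-score threshold of 3. The signature check misses anything not on its list (false negatives), while the AD check flags anything unusual, malicious or not (false positives).

```python
import statistics

# Hypothetical signature list: exact strings treated as known-bad.
SIGNATURES = {"cmd.exe /c whoami", "' OR 1=1 --"}

def signature_alert(payload):
    """Alert only when the payload contains a known signature."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_alert(history, value, threshold=3.0):
    """Alert when value is more than `threshold` sample standard deviations
    from the baseline mean (a simple z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Flat baseline: any deviation at all is unusual.
        return value != mean
    return abs(value - mean) / stdev > threshold
```

A trivially modified injection string such as `' OR 2>1 --` slips past `signature_alert`, while `anomaly_alert` would fire on a burst of logins whether it came from an attacker or from a legitimate batch job.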
BD Characteristics - Provenance
Provenance: how do we find ground-truth data? What is the quality of the data you are examining? Where is the data from? How did it get there? Why is data provenance important?
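One way to keep these provenance questions answerable is to attach provenance metadata to each record at ingest. The wrapper below is a hypothetical format, not a standard: it records the source ("where is the data from?"), the collector and ingest time ("how did it get there?"), and a SHA-256 digest of the raw bytes so later corruption or tampering can be detected. It assumes the raw data is UTF-8 text.

```python
import hashlib
from datetime import datetime, timezone

def wrap_with_provenance(raw_bytes, source, collector):
    """Return a record carrying the raw data plus provenance fields.

    Assumption: raw_bytes is UTF-8 text (decoding is strict).
    """
    return {
        "data": raw_bytes.decode("utf-8"),
        "provenance": {
            "source": source,
            "collector": collector,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        },
    }

def verify(record):
    """Check that the stored data still matches its recorded digest."""
    digest = hashlib.sha256(record["data"].encode("utf-8")).hexdigest()
    return digest == record["provenance"]["sha256"]
```

A digest only shows that data changed after ingest; it says nothing about whether the source was trustworthy to begin with, which is why the quality question remains a human judgment.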
Signal / Noise
How do we separate the two? Shneiderman, UMD: overview, zoom, filter, details on demand. Gartner: start small, then infuse the data with contextual data and analytics. Regardless of approach, the role of the human continues to grow, along with the complexity of that role.
Signal / Noise - Shneiderman
Start large: understand the entire picture, then look for something amiss. Zoom in on items of interest.
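Shneiderman's overview, zoom, filter, details-on-demand sequence maps naturally onto successive queries over event data. The sketch assumes events are hypothetical (hour, host, action) tuples; the function names are illustrative only.

```python
from collections import Counter

def overview(events):
    """Overview: count events per hour to see the entire picture."""
    return Counter(hour for hour, _host, _action in events)

def zoom(events, start_hour, end_hour):
    """Zoom: keep only events inside the hour range of interest."""
    return [e for e in events if start_hour <= e[0] <= end_hour]

def filter_events(events, action):
    """Filter: keep only events with the given action."""
    return [e for e in events if e[2] == action]

def details(events, host):
    """Details on demand: every remaining event for one host."""
    return [e for e in events if e[1] == host]

events = [(1, "a", "ALLOW"), (1, "b", "ALLOW"), (2, "a", "ALLOW"),
          (3, "c", "DENY"), (3, "c", "DENY"), (3, "d", "ALLOW"),
          (3, "c", "DENY")]

peak = overview(events).most_common(1)[0][0]       # the busiest hour
denies = filter_events(zoom(events, peak, peak), "DENY")
suspect_events = details(denies, "c")
```

The order matters: the overview is what tells the analyst where to zoom, so the human stays in the loop at every step.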
Signal / Noise - Gartner
Start small and pick a project where you can see results. Eventually, broaden adoption of big data analytics across multiple applications.
BD Architecture
BD Processing Architecture
BD Architecture - Gartner
Signal / Noise
Visualization tools: time (Palantir, TimeSearcher, LifeLines), network (Tom Sawyer, Pajek, Gephi), trees (SpaceTree, Treemap). Uses: search, clusters, anomalies.
Signal / Noise
Efficient storage algorithms exist. Retrieval algorithms: why does this problem exist? Other security issues. Use of BD to predict events: clusters, Markov models.
Signal / Noise
Retrieval issues: the query broker, architectural concerns, data node security.
Hadoop Cluster Architecture
Signal / Noise
Data Clusters
Why do they matter? What is their application in BD?
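Clusters matter because most hosts fall into a few behavioral groups, and a point far from every group deserves a closer look. Below is a minimal k-means sketch over hypothetical two-feature host profiles (for example, connections per hour and distinct destination ports). k-means is one common clustering choice, not necessarily the method the talk had in mind, and seeding with the first k points is a simplification chosen to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; returns (centroids, labels).

    Seeds centroids with the first k points: deterministic, but not
    robust in practice (k-means++ seeding is the usual improvement).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

def distance_to_nearest(point, centroids):
    """Outlier score: distance from a point to its closest cluster center."""
    return min(math.dist(point, c) for c in centroids)
```

In security data, the clusters themselves are often less interesting than `distance_to_nearest`: a host that fits no group is a natural starting point for a hunt.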
Markov
Hidden Markov models (HMMs) seem to be emerging as the method of choice. They rely on learning algorithms trained on the data, so corruption of the collected data is a concern. Kafka and Storm: Kafka handles message data (it originated at LinkedIn); Storm handles streamed data. Probability models for anomalies in BD.
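The probability-model idea can be illustrated with a first-order Markov chain over event sequences. This is deliberately simpler than an HMM, which adds unobserved states and is typically trained with algorithms such as Baum-Welch. The sketch learns transition counts from a hypothetical baseline of session events and scores new sequences by log-likelihood; the event names and the add-alpha smoothing constant are assumptions. It also shows the corruption concern: whatever appears in the baseline is learned as "normal".

```python
import math
from collections import defaultdict

def train(sequences):
    """Count state-to-state transitions in the baseline sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts, states

def log_likelihood(seq, counts, states, alpha=1.0):
    """Sum of log transition probabilities with add-alpha smoothing,
    so unseen transitions get a small but nonzero probability."""
    n = len(states) + 1  # +1 reserves mass for states never seen in training
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        row = counts.get(a, {})
        prob = (row.get(b, 0) + alpha) / (sum(row.values()) + alpha * n)
        total += math.log(prob)
    return total

baseline = ([["login", "read", "logout"]] * 5
            + [["login", "read", "write", "logout"]] * 2)
counts, states = train(baseline)
normal_score = log_likelihood(["login", "read", "logout"], counts, states)
odd_score = log_likelihood(["login", "write", "delete"], counts, states)
```

Lower scores are less likely under the learned model; in a streaming deployment the same scoring could run per event as data arrives, with the threshold tuned against the false-positive rate the analysts can tolerate.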
Signal / Noise
Current security analysis methods rely on a combination of techniques: hunt teams, log files, attack trees, reputation lists, and threat intelligence.
Fusing
Much of the work in security relies on human insight to fuse the data. What is lacking: a process, methods, and a framework.
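While no process or framework exists yet, one small, mechanical part of fusion can be automated: lining up events from different sources for the same host within a time window. The sketch below is a hypothetical correlation step, not a fusion framework. The field names (`host`, `ts` in epoch seconds, `source`) and the 300-second window are assumptions. Deciding what a grouped burst means remains the analyst's job.

```python
from collections import defaultdict

def fuse(events, window=300):
    """Group events by host, split each host's timeline into bursts where
    consecutive events are at most `window` seconds apart, and return only
    bursts that involve more than one distinct source."""
    by_host = defaultdict(list)
    for e in events:
        by_host[e["host"]].append(e)
    bursts = []
    for evs in by_host.values():
        evs.sort(key=lambda e: e["ts"])
        burst = [evs[0]]
        for e in evs[1:]:
            if e["ts"] - burst[-1]["ts"] <= window:
                burst.append(e)
            else:
                bursts.append(burst)
                burst = [e]
        bursts.append(burst)
    return [b for b in bursts if len({e["source"] for e in b}) > 1]
```

Requiring multiple sources is the point of the exercise: an IDS alert, a proxy hit, and an odd DNS query on the same host within minutes say more together than any one of them alone.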
Conclusions
BD, particularly as it pertains to security data, can provide deep insights. Many issues with BD remain, but certain areas, such as BD architecture, appear to be starting to stabilize. Risk modeling for BD architectures will need to focus both inward and outward. An essential component will be the mix of personnel working together. Do NOT be intimidated: no one knows everything about BD, and if they say they do, they are lying.
Q&A Questions & Answers
Backup Slides
Hunt Team Semantic Engines
DNS Mined Data
Detecting newly active domains: these often exhibit fast-flux behavior and are usually gray-listed. Check them to learn more about the address space and who allocates it. Sometimes information about the owner of the space can provide insights.
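Detecting newly active domains can start as a set difference: any domain queried today that never appeared in a baseline window of DNS logs is "newly observed" and a candidate for the checks above. The log format, (day number, queried name) pairs, is hypothetical, and the two-label reduction is a crude stand-in for a proper Public Suffix List lookup.

```python
def registered_part(domain, labels=2):
    """Crudely reduce a name to its last `labels` labels.

    Assumption: good enough for illustration only; it mishandles names
    such as example.co.uk, which need the Public Suffix List.
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-labels:])

def newly_active(dns_log, today, baseline_days):
    """Return domains queried on `today` that were never queried during
    the `baseline_days` days before it.

    dns_log is an iterable of (day_number, queried_name) pairs.
    """
    seen, current = set(), set()
    for day, name in dns_log:
        d = registered_part(name)
        if today - baseline_days <= day < today:
            seen.add(d)
        elif day == today:
            current.add(d)
    return current - seen
```

A longer baseline lowers false positives from rarely visited but benign sites, at the cost of more storage, which is exactly where the BD storage and retrieval questions come back in.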
Reputation lists
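A reputation list is usually applied as a lookup of indicators found in the logs. The sketch assumes a hypothetical list mixing single IP addresses and CIDR blocks (the addresses are from documentation ranges) and uses Python's standard `ipaddress` module for network membership.

```python
import ipaddress

def load_reputation(entries):
    """Parse reputation entries (single IPs or CIDR blocks) into networks.
    A bare address becomes a /32 (IPv4) or /128 (IPv6) network."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]

def lookup(ip, networks):
    """Return the first listed network containing ip, or None.
    Membership tests across IP versions simply return False."""
    addr = ipaddress.ip_address(ip)
    for net in networks:
        if addr in net:
            return net
    return None
```

Returning the matching network rather than a bare boolean lets the analyst see whether a hit came from a single listed host or from a whole block, which matters when weighing a reputation hit against other evidence.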
Network Profiling
Understanding the network. Flow tools and Wireshark can assist. What about the use of deception technologies and techniques?
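Profiling typically starts by summarizing flow records per host. The sketch assumes hypothetical flow tuples of (source IP, destination IP, destination port, bytes), not the output format of any particular flow tool, and builds a per-source profile of bytes sent, distinct peers, and distinct ports.

```python
from collections import defaultdict

def profile(flows):
    """Summarize flows per source host: total bytes, peers, and ports."""
    prof = defaultdict(lambda: {"bytes": 0, "peers": set(), "ports": set()})
    for src, dst, dport, nbytes in flows:
        p = prof[src]
        p["bytes"] += nbytes
        p["peers"].add(dst)
        p["ports"].add(dport)
    return dict(prof)

def top_talkers(prof, n=3):
    """Hosts ranked by total bytes sent, largest first."""
    return sorted(prof, key=lambda h: prof[h]["bytes"], reverse=True)[:n]
```

Byte volume and fan-out capture different behavior: a host sending few bytes to many peers on one port (for example, a scan) may never appear among the top talkers, so both views are worth keeping.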