Data Mining in Incident Response Challenges and Opportunities

Similar documents
Fight fire with fire when protecting sensitive data

A perspective to incident response or another set of recommendations for malware authors

An Introduction to Incident Detection and Response Memory Forensic Analysis

McAfee Web Gateway Administration Intel Security Education Services Administration Course Training

Eight Essential Elements for Effective Threat Intelligence Management May 2015

Cyber Security. BDS PhantomWorks. Boeing Energy. Copyright 2011 Boeing. All rights reserved.

Splunk: Using Big Data for Cybersecurity

Environment. Attacks against physical integrity that can modify or destroy the information, Unauthorized use of information.

Modern Approach to Incident Response: Automated Response Architecture

XPROBE. Building Efficient Network Discovery Tools. Fyodor Yarochkin

Enterprise Organizations Need Contextual- security Analytics Date: October 2014 Author: Jon Oltsik, Senior Principal Analyst

Analysis of Network Beaconing Activity for Incident Response

The Third Rail: New Stakeholders Tackle Security Threats and Solutions

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Separating Signal from Noise: Taking Threat Intelligence to the Next Level

Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My

Active Response: Automated Risk Reduction or Manual Action?

Machine-to-Machine Exchange of Cyber Threat Information: a Key to Mature Cyber Defense

The SIEM Evaluator s Guide

Know Your Foe. Threat Infrastructure Analysis Pitfalls

Endpoint Threat Detection without the Pain

SIEM Optimization 101. ReliaQuest E-Book Fully Integrated and Optimized IT Security

The Cyber Threat Profiler

2. From a control perspective, the PRIMARY objective of classifying information assets is to:

WHITE PAPER: THREAT INTELLIGENCE RANKING

Second-generation (GenII) honeypots

Network Based Intrusion Detection Using Honey pot Deception

Network Security Monitoring

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Solera Networks, A Blue Coat Company SOLERA NETWORKS BIG DATA SECURITY ANALYTICS

Niara Security Intelligence. Overview. Threat Discovery and Incident Investigation Reimagined

Indexing Full Packet Capture Data With Flow

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

Clavister InSight TM. Protecting Values

An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus

Oracle Database In-Memory The Next Big Thing

Data Science Transforming Security Operations

User Documentation Web Traffic Security. University of Stavanger

Learn How to Defend Your Online Marketplace from Unwanted Traffic

Spark: Cluster Computing with Working Sets

Splunk Enterprise Log Management Role Supporting the ISO Framework EXECUTIVE BRIEF

Symantec Cyber Threat Analysis Program Program Overview. Symantec Cyber Threat Analysis Program Team

LOG AND EVENT MANAGEMENT FOR SECURITY AND COMPLIANCE

QRadar SIEM and Zscaler Nanolog Streaming Service

Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation

Big Data in Action: Behind the Scenes at Symantec with the World s Largest Threat Intelligence Data

D&D of malware with exotic C&C

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

with Managing RSA the Lifecycle of Key Manager RSA Streamlining Security Operations Data Loss Prevention Solutions RSA Solution Brief

Ty Miller. Director, Threat Intelligence Pty Ltd

An In-Depth Look at In-Memory Predictive Analytics for Developers

What s New in Security Analytics Be the Hunter.. Not the Hunted

Fostering Incident Response and Digital Forensics Research

POLIWALL: AHEAD OF THE FIREWALL

The Big Data Paradigm Shift. Insight Through Automation

Threat Advisory: Accellion File Transfer Appliance Vulnerability

GRC & Cyber Security Conference - Bringing the Silos Together ISACA Ireland 3 Oct 2014 Fahad Ehsan

FROM INBOX TO ACTION AND THREAT INTELLIGENCE:

New ways to a secure IT Management

EXTENDING NETWORK SECURITY: TAKING A THREAT CENTRIC APPROACH TO SECURITY

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

SAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SESSION CODE: 603

XPROBE-NG. What s new with upcoming version of the tool. Fyodor Yarochkin Armorize Technologies

LOG INTELLIGENCE FOR SECURITY AND COMPLIANCE

Automating Attack Analysis Using Audit Data. Dr. Bruce Gabrielson (BAH) CND R&T PMO 28 October 2009

First Look Trend Micro Deep Discovery Inspector

Practical Threat Intelligence. with Bromium LAVA

How to Choose Between Hadoop, NoSQL and RDBMS

Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse

ASSUMING A STATE OF COMPROMISE: EFFECTIVE DETECTION OF SECURITY BREACHES

Memory Forensics & Security Analytics: Detecting Unknown Malware

Security Operation Center Architecture for E-government based on Big Data Analysis

A New Era Of Analytic

Information Technology Policy

Reporting and Incident Management for Firewalls

WHEN THE HUNTER BECOMES THE HUNTED HUNTING DOWN BOTNETS USING NETWORK TRAFFIC ANALYSIS

Symantec Global Intelligence Network 2.0 Architecture: Staying Ahead of the Evolving Threat Landscape

ProTrack: A Simple Provenance-tracking Filesystem

ENABLING FAST RESPONSES THREAT MONITORING

POLIWALL: AHEAD OF THE FIREWALL

DRIVE-BY DOWNLOAD WHAT IS DRIVE-BY DOWNLOAD? A Typical Attack Scenario

Cloud Computing at Google. Architecture

Defending Against Cyber Attacks with SessionLevel Network Security

Concierge SIEM Reporting Overview

Achieving Actionable Situational Awareness... McAfee ESM. Ad Quist, Sales Engineer NEEUR

High End Information Security Services

Rethinking Information Security for Advanced Threats. CEB Information Risk Leadership Council

Advanced SOC Design. Next Generation Security Operations. Shane Harsch Senior Solutions Principal, MBA GCED CISSP RSA

GOOD GUYS VS BAD GUYS: USING BIG DATA TO COUNTERACT ADVANCED THREATS. Joe Goldberg. Splunk. Session ID: SPO-W09 Session Classification: Intermediate

RSA Security Analytics Certified Administrator (CA) Certification Examination Study Guide

Transformation of honeypot raw data into structured data

L evoluzione del Security Operation Center tra Threat Detection e Incident Response & Management

Transcription:

Data Mining in Incident Response Challenges and Opportunities Alexandre Dulaunoy - TLP:WHITE Information Security Education Day 1 of 22

CIRCL The Computer Incident Response Center Luxembourg (CIRCL) is a government-driven initiative designed to provide a systematic response facility to computer security threats and incidents. CIRCL is the CERT for the private sector, communes and non-governmental entities in Luxembourg. 2 of 22

Figures at CIRCL 1.4GB of compressed malware sample in a day. An average of 2-4TB per evidence acquisition (disk, memory,...) including analysis artefacts or duplicate analysis information. 1.2GB of compressed network capture from the operational honeypot network (HoneyBot). 10-20 million records added or updated in the Passive DNS in a day. 500 million of X.509 certificates in the Passive SSL. 3 of 22

Do we have an issue with such volume of data? Storage price goes down and it will probably follow this trend. Storing huge amount of data is still practical and CSIRT can usually handle it. Write-speed on disk is still the main limitation (e.g. wire speed increased faster than the I/O). 4 of 22

Where are the real challenges in a day-to-day CSIRT operation? 12000 requests per second to lookup records in the Passive DNS. Collections (network, disk, memory) by CSIRTs are often unstructured, sources of data are uncontrolled and unstrusted and incomplete. 5 of 22

Homogeneous data versus heterogeneous data 45TB of normalized and homogeneous network capture is fundamentaly different than 45TB of black-hole network capture. Discarding is easy in normalized traffic. In incident response, protocol errors or incomplete packets are part of the potential attacks. Parser errors and exceptions are more common on an untrusted and uncontrolled data sources. Data mining capabilities highly depend of the data structuration (e.g. exfiltration channels are rarely respecting the network layers). If the structuration is close to zero, more human pre-analysis is required. 6 of 22

What are the key factors in incident response? Reduce workload for the analysis (e.g. a full file-system forensic analysis of a standard system can take up to 10 days). Allow fast lookup in the data collected and processed. Easier the access of correlation is, faster is the exclusion or inclusion of data. Dynamic feedback on the data from the users (what are the most queried records?). Reduce false positive but false negative reduction is more important (e.g. can you miss an evidence in a critical case?). 7 of 22

How do we try to improve? Data-structure allowing fast lookup and fast update/counting. Bitindex, Bloom filters, HyperLogLog... Space efficient in-memory key/value store. Parallel processing of large datasets introduces challenges in checkpointing and updates (e.g. a crash of a parser is not uncommon from untrusted datasets). Simple parallel processing frameworks versus complex frameworks (e.g. limiting the cost of bootstrapping, memory usage and overhead of a framework). 8 of 22

Improving with the feedback loop The greatest benefit for data mining is to introduce human feedback early. Analysts discover outliers, errors or even missing data. Feedback can be used to improve algorithms, data structuration (e.g. 4th iteration of the CIRCL Passive SSL data structure) or query interfaces. 9 of 22

How to get analysts feedback? Integrate lookup services in the tools used by the analysts. Provide multiple UI to promote the reuse of the datasets. Support the classification of the results (e.g. a source of classified dataset). MISP, malware information and threat sharing platform, is developed to support this. 10 of 22

Quick MISP introduction MISP 1 is an IOC and threat indicators sharing free software. MISP has many functionalities e.g. flexible sharing groups, automatic correlation, expansion and enrichment modules, free-text import helper, event distribution and collaboration. CIRCL operates multiple MISP instances with a significant user base (around 400 organizations and more than 1000 users). After some years of trial-and-error, we explain the background behind current and new MISP features. 1 https://github.com/misp/misp 11 of 22

MISP core distributed sharing functionality MISP s core functionality is sharing where everyone can be a consumer and/or a contributor/producer. Quick benefit without the obligation to contribute. Low barrier access to get acquainted to the system. 12 of 22

Development based on practical user feedback There are many different types of users of an information sharing platform like MISP: Malware reversers willing to share indicators of analysis with respective colleagues. Security analysts searching, validating and using indicators in operational security. Intelligence analysts gathering information about specific adversary groups. Law-enforcement relying on indicators to support or bootstrap their DFIR cases. Risk analysis teams willing to know about the new threats, likelyhood and occurences. Fraud analysts willing to share financial indicators to detect financial frauds. 13 of 22

Events and Attributes in MISP MISP attributes 2 initially started with a standard set of cyber security indicators. MISP attributes are purely based on usage (what people and organizations use daily). Evolution of MISP attributes is based on practical usage and users (e.g. recent addition of the financial indicators in 2.4). In version 3.0, MISP objects will be added to give the freedom to the community to create new and combined attributes and share them. 2 attributes can be anything that helps describe the intent of the event package from indicators, vulnerabilities or any relevant information 14 of 22

Helping Contributors in MISP Contributors can use the UI, API or using the freetext import to add events and attributes. Modules existing in Viper (a binary framework for malware reverser) to populate and use MISP from the vty or via your IDA. Contribution can be direct by creating an event but users can propose attributes updates to the event owner. Users should not be forced to use a single interface to contribute. 15 of 22

From Tagging to Flexible Taxonomies Tagging is a simple way to attach a classification to an event. In the early version of MISP, tagging was local to an instance. After evaluating different solutions of classification, we build a new scheme using the concept of machine tags. 16 of 22

Machine Tags Triple tag or machine tag was introduced in 2004 to extend geotagging on images. A machine tag is just a tag expressed in way that allows systems to parse and interpret it. Still have a human-readable version: admiralty-scale:source Reliability= Fairly reliable 17 of 22

Sightings support Sightings allow users to notify the community about the activities related to an indicator. Refresh time-to-live of an indicator. Sightings can be performed via API, and UI including import of STIX sighting documents. Many research opportunities in scoring indicators based on users sighting. 18 of 22

MISP modules - extending MISP with Python scripts 19 of 22 Extending MISP with expansion modules with zero customization in MISP. A simple ReST API between the modules and MISP allowing auto-discovery of new modules with their features. Benefit from existing Python modules in Viper or any other tools. Current modules include: Passive Total, Passive SSL/DNS (CIRCL), CVE expansion...

MISP modules - How it s integrated in the UI? 20 of 22

Conclusion Data mining is a core activity in incident response and forensic analysis. Many challenges remain in order to reduce the time-to-process and improve the accessibility of the data-sets. Information sharing is a way to couple the cross-validation of datasets, improving usage and finally improving the data mining processes (collection, filtering, aggregation and query). Ongoing research ideas on the operational MISP platforms like: Gamification of information sharing (more people contribute...). Sharing of privacy-aware data structure. Scoring models of information correlation. 21 of 22

Q&A https://github.com/circl/ https://github.com/misp/ - https://www.circl.lu/ info@circl.lu - research projects and partnerships PGP key fingerprint: CA57 2205 C002 4E06 BA70 BE89 EAAD CFFC 22BD 4CD5 22 of 22