From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tidatasci)



Similar documents
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing (#ddti)

Data- Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing (#ddti)

Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath)

Evolution Of Cyber Threats & Defense Approaches

Applying Machine Learning to Network Security Monitoring. Alex Pinto Chief Data Scien2st

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre

Cymon.io. Open Threat Intelligence. 29 October 2015 Copyright 2015 esentire, Inc. 1

Threat Intelligence Buyer s Guide

FROM INBOX TO ACTION AND THREAT INTELLIGENCE:

All about Threat Central

WHITE PAPER: THREAT INTELLIGENCE RANKING

Separating Signal from Noise: Taking Threat Intelligence to the Next Level

Modern Approach to Incident Response: Automated Response Architecture

Threat Intelligence Platforms: The New Essential Enterprise Software

The MANTIS Framework Cyber-Threat Intelligence Mgmt. for CERTs Siemens AG All rights reserved

After the Attack: RSA's Security Operations Transformed

Open Source Threat Intelligence. Kyle R Maxwell (@kylemaxwell) Senior Researcher, Verizon RISK Team

What s New in Security Analytics Be the Hunter.. Not the Hunted

Operational Lessons from the RSA/EMC CIRC: People, Process, & Threat Intel

ESG Threat Intelligence Research Project

WHEN THE HUNTER BECOMES THE HUNTED HUNTING DOWN BOTNETS USING NETWORK TRAFFIC ANALYSIS

Enabling Security Operations with RSA envision. August, 2009

Security Analytics for Smart Grid

How To Use Data Analysis And Machine Learning For Information Security

BIG DATA CALLS FOR BIG SECURITY!

Ty Miller. Director, Threat Intelligence Pty Ltd

Hunting for the Undefined Threat: Advanced Analytics & Visualization

Logs and Tactical Defence. Allan Stojanovic David Auclair University of Toronto #include <disclaimer.h>

Advanced Visibility. Moving Beyond a Log Centric View. Matthew Gardiner, RSA & Richard Nichols, RSA

THE EVOLUTION OF SIEM

Intrusion Detection in AlienVault

First Step Guide for Building Cyber Threat Intelligence Team. Hitoshi ENDOH (NTT-CERT) Natsuko INUI (CDI-CIRT)

Unified Security, ATP and more

Threat Intelligence: Friend of the Enterprise

A Primer on Cyber Threat Intelligence

Active Response: Automated Risk Reduction or Manual Action?

We Know It Before You Do: Predicting Malicious Domains

Concierge SIEM Reporting Overview

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre

2012 SaaS Conversions Benchmark

Take the Red Pill: Becoming One with Your Computing Environment using Security Intelligence

Using SIEM for Real- Time Threat Detection

Comprehensive Understanding of Malicious Overlay Networks

Achieving Actionable Situational Awareness... McAfee ESM. Ad Quist, Sales Engineer NEEUR

Intrusion Along the Kill Chain

Threat Intelligence: STIX and Stones Will Break Your Foes

SORTING OUT YOUR SIEM STRATEGY:

Using Big Data to Align IT Security with Business Risk Mark Seward, Senior Director, Security and Compliance

Rashmi Knowles Chief Security Architect EMEA

Integrating cloud services with Polaris. Presented by: Wes Osborn

Anatomy of Cyber Threats, Vulnerabilities, and Attacks

Endpoint Threat Detection without the Pain

Network Security Monitoring

Analytics: The Future of Security

The Third Rail: New Stakeholders Tackle Security Threats and Solutions

Reduce Your Network's Attack Surface

Eight Essential Elements for Effective Threat Intelligence Management May 2015

Sorting out SIEM strategy Five step guide to full security information visibility and controlled threat management

Good Guys vs. the Bad Guys: Can Big Data Tools Counteract Advanced Threats?

5 Reasons Why Your Security Education Program isn t Working (and how to fix it)

The Big Data Paradigm Shift. Insight Through Automation

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

ThreatSTOP Technology Overview

Indicator Expansion Techniques Tracking Cyber Threats via DNS and Netflow Analysis

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION

Intelligence Driven Security

Threat Intelligence is Like Three Day Potty Training

Machine-to-Machine Exchange of Cyber Threat Information: a Key to Mature Cyber Defense

Continuous Network Monitoring

The session is about to commence. Please switch your phone to silent!

Threat Intelligence: An Essential Component of Cyber Incident Response. Jeanie M Larson, CISSP-ISSMP, CISM, CRISC

Big Data and Security: At the Edge of Prediction

Cyber Security Metrics Dashboards & Analytics

THREAT VISIBILITY & VULNERABILITY ASSESSMENT

CALNET 3 Category 7 Network Based Management Security. Table of Contents

Why Your SIEM Isn t Adding Value And Why It May Not Be The Tool s Fault. Best Practices Whitepaper June 18, 2014

GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"

Getting the Most Out of SIEM. Presentation Title. Data in Big Data. Presented By: Dr. Char Sample, CERT

This Symposium brought to you by

Enterprise Organizations Need Contextual- security Analytics Date: October 2014 Author: Jon Oltsik, Senior Principal Analyst

Traffic analysis with NetFlow

Transcription:

From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tidatasci) Alex Pinto Chief Data Scientist Niddel / MLSec Project @alexcpsec @MLSecProject

() { :; }; whoami Alex Pinto That guy that started MLSec Project Chief Data Scientist at Niddel Machine Learning Researcher focused on Security Data Network security and incident response aficionado Tortured by Log Management / SIEMs as a child A BIG FAN of Attack Maps #pewpew Does not know who hacked Sony Has nothing to do with attribution in Operation Capybara

Agenda Cyber War Threat Intel What is it good for? Combine and TIQ-test Using TIQ-test Novelty Test Overlap Test Population Test Aging Test Uniqueness Test Use case: Feed Comparison

What is TI good for anyway?

What is TI good for anyway? 1) Attribution

What is TI good for anyway?

What is TI good for anyway? 2) Cyber Threat Maps (https://github.com/hrbrmstr/pewpew)

What is TI good for anyway? 3) How about actual defense? Use it as blacklists? As research data? Thing is RAW DATA is hard to work with

(Semi-)Required Reading #tiqtest slides: http://bit.ly/tiqtest RPubs Page: http://bit.ly/tiqtest-rpubs

Combine and TIQ-Test Combine (https://github.com/mlsecproject/combine) Gathers TI data (ip/host) from Internet and local files Normalizes the data and enriches it (AS / Geo / pdns) Can export to CSV, tiq-test format and CRITs Coming soon: CybOX / STIX (ty @kylemaxwell) TIQ-Test (https://github.com/mlsecproject/tiq-test) Runs statistical summaries and tests on TI feeds Generates charts based on the tests and summaries Written in R (because you should learn a stat language)

Using TIQ-TEST Available tests and statistics: NOVELTY How often do they update themselves? OVERLAP How do they compare to what you got? POPULATION How does this population distribution compare to another one? AGING How long does an indicator sit on a feed? UNIQUENESS How many indicators are found in only one feed?

Using TIQ-TEST New dataset! https://github.com/mlsecproject/tiq-test-winter2015

Using TIQ-TEST Feeds Selected Dataset was separated into inbound and outbound

Using TIQ-TEST Data Prep Extract the raw information from indicator feeds Both IP addresses and hostnames were extracted

Using TIQ-TEST Data Prep Convert the hostname data to IP addresses: Active IP addresses for the respective date ( A query) Passive DNS from Farsight Security (DNSDB) For each IP record (including the ones from hostnames): Add asnumber and asname (from MaxMind ASN DB) Add country (from MaxMind GeoLite DB) Add rhost (again from DNSDB) most popular PTR

Using TIQ-TEST Data Prep Done

Novelty Test measuring added and dropped indicators

Novelty Test - Inbound

Overlap Test More data is better, but make sure it is not the same data

Overlap Test - Inbound

Overlap Test - Outbound

Population Test Let us use the ASN and GeoIP databases that we used to enrich our data as a reference of the true population. But, but, human beings are unpredictable! We will never be able to forecast this!

Is your sampling poll as random as you think?

Can we get a better look? Statistical inference-based comparison models (hypothesis testing) Exact binomial tests (when we have the true pop) Chi-squared proportion tests (similar to independence tests)

Aging Test Is someone cleaning this up eventually?

INBOUND

OUTBOUND

Uniqueness Test

Uniqueness Test Domain-based indicators are unique to one list between 96.16% and 97.37% IP-based indicators are unique to one list between 82.46% and 95.24% of the time

Intermission

OPTION 1: Cool Story, Bro! Some of you are probably like: You Data Scientists and your algorithms, how quaint. Why aren t you doing some useful research like nation-state attribution?

OPTION 2: How can I use this awesomeness on my data?

Use Case: Comparing Private Feeds How about using TIQ-TEST to evaluate a private intel feed? Trying stuff before you buy is usually a good idea. Just sayin Let s compare a new feed, private1, against our combined outbound indicators

Population Test

Population Test

Population Test

Aging Test Mostly DGA Related Churn I guess most DGAs rotate every 24 hours, right? Rotation means the private data is still fresh, from research or DGA generation procedures

A++ WOULD THREAT INTEL AGAIN

MLSec Project Both projects are available as GPLv3 by MLSec Project Doing ML research on Security? Let us know! Putting together a trust group to share experiences and develop open-source tools to help with data gathering and analysis Liked TIQ-TEST? We can help benchmark your private feeds using these and other techniques Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.

Don t want to do all this work? Come talk to me about Niddel! Private Beta of Magnet, the Machine Learning-powered Threat Intelligence Platform. Our models extrapolate the knowledge of existing threat intelligence feeds as experienced analysis would. Models make use of the same data analyst would have. Automatically triages and hunts on pivots of enriched information

Take Aways Analyze your data. Extract value from it! Try before you buy! Different test results mean different things to different orgs. Try the sample data, replicate the experiments: https://github.com/mlsecproject/tiq-test-winter2015 http://rpubs.com/alexcpsec/tiq-test-winter2015

Greets and Thanks! @kylemaxwell, @paul4pc for helping with COMBINE @bfist for his work on http://sony.attributed.to @hrbrmstr for IPEW and chart revisions for TIQ- TEST All the MLSec Project community peps!

Thanks! Q&A? Feedback! Alex Pinto @alexcpsec @MLSecProject @NiddelCorp The measure of intelligence is the ability to change." - Albert Einstein