Big Data for Security: Challenges, Opportunities, and Experiments





Big Data for Security: Challenges, Opportunities, and Experiments. Pratyusa K. Manadhata, HP Labs. Joint work with Stuart Haber, William Horne, Prasad Rao, and Sandeep Yadav.

Big data is everywhere

Enterprises collect big data. Storage is cheaper. Compliance: audit trails (CFR 11), HIPAA, and SOX. Source: Hilbert and López, "The world's technological capacity to store, communicate, and compute information," Science, 2011.

From "more is less" to "more is more". What can we do with the data? Algorithms and systems to identify actionable security events from big data.

Big data for security. Traditional approach: point products. Big data: holistic view of an enterprise. Big data: global view of enterprises. [Figure: data sources AV, DLP, firewall, IDS, syslog, DHCP, DNS, and HTTP; big data]

Challenges. Data collection and storage: technical, legal, privacy, etc. Analysis infrastructure. Scalable algorithms. Limitations: what works and what doesn't.

Example: malicious domain detection. Scalable identification of malware-infected hosts in an enterprise and of malicious domains accessed by the enterprise's hosts.

State of the art. Commercial blacklists. Traffic analysis. Machine learning and statistical analysis.

Our approach: scalable graph inference. Host-domain graph (hosts on one side, domains on the other). Malfeasance inference as marginal probability estimation. Minimal ground truth.
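A minimal sketch of how a host-domain graph could be assembled from log events. The function name `build_host_domain_graph`, the `(host, domain)` event format, and the choice to collapse repeated requests into a single edge are all assumptions made for illustration; the slides do not describe the authors' data structures.

```python
from collections import defaultdict


def build_host_domain_graph(events):
    """Return adjacency sets for a bipartite host-domain graph.

    events: iterable of (host, domain) pairs, e.g. parsed from proxy logs.
    Nodes are tagged ("host", name) or ("domain", name) so that a host
    and a domain with the same name never collide.
    """
    neighbors = defaultdict(set)
    for host, domain in events:
        h, d = ("host", host), ("domain", domain)
        neighbors[h].add(d)
        neighbors[d].add(h)
    return dict(neighbors)


# Hypothetical events; repeated requests collapse into one edge (assumption).
events = [
    ("h1", "example.com"),
    ("h1", "bad.example"),
    ("h2", "bad.example"),
    ("h1", "example.com"),
]
graph = build_host_domain_graph(events)
# Every edge is stored in both directions, so halve the total degree.
num_edges = sum(len(v) for v in graph.values()) // 2
print(len(graph), "nodes,", num_edges, "edges")
```

Storing neighbor sets in both directions keeps neighbor lookups cheap, which is what an iterative message-passing scheme needs.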

Belief propagation [P82, YFW01]. Marginal probability estimation in graphs is NP-complete. Belief propagation is fast and approximate. Iterative message passing.

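Message passing. Message(i → j) is computed from the prior, the edge potential, and the incoming messages:
$m_{ij}(x_j) = \sum_{x_i \in S} \phi(x_i)\, \psi(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{ki}(x_i)$
where $\phi(x_i)$ is the prior, $\psi(x_i, x_j)$ is the edge potential, and $m_{ki}(x_i)$ are the incoming messages.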
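The message update can be sketched for a two-state model as below. The state names, the dictionary layout, and the per-message normalisation are assumptions for illustration; the prior (0.99 for a known bad domain) and the edge potential (0.51/0.49) are the values quoted in the HTTP experiment.

```python
STATES = ("benign", "malicious")


def message(prior_i, psi, incoming_to_i):
    """Compute m_ij(x_j) = sum_{x_i} phi(x_i) psi(x_i, x_j) prod_k m_ki(x_i).

    prior_i: dict state -> phi(x_i)
    psi: dict (x_i, x_j) -> edge potential
    incoming_to_i: list of messages (dict state -> value) from N(i) excluding j
    """
    out = {}
    for xj in STATES:
        total = 0.0
        for xi in STATES:
            term = prior_i[xi] * psi[(xi, xj)]
            for m in incoming_to_i:
                term *= m[xi]
            total += term
        out[xj] = total
    # Normalising each message is an assumption; it keeps values from shrinking.
    z = sum(out.values())
    return {x: v / z for x, v in out.items()}


psi = {
    ("benign", "benign"): 0.51, ("benign", "malicious"): 0.49,
    ("malicious", "benign"): 0.49, ("malicious", "malicious"): 0.51,
}
# A known bad domain with no other neighbours sends this message to a host.
bad_domain_prior = {"benign": 0.01, "malicious": 0.99}
msg = message(bad_domain_prior, psi, incoming_to_i=[])
print(msg)
```

With an edge potential this close to uniform, even a strongly malicious sender nudges its neighbour only slightly, so evidence has to accumulate over many edges.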

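Belief computation. Belief(i) is computed from the prior and the incoming messages:
$b_i(x_i) = K\, \phi(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$
where $K$ is the normalization constant, $\phi(x_i)$ is the prior, and $m_{ji}(x_i)$ are the incoming messages.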
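A short sketch of the belief computation for a two-state model. The state names and dictionary layout are assumptions; the example messages are what a known bad domain (prior 0.99) would send across an edge with the 0.51/0.49 potential from the HTTP experiment.

```python
STATES = ("benign", "malicious")


def belief(prior, incoming):
    """Compute b_i(x_i) = K * phi(x_i) * prod_j m_ji(x_i).

    prior: dict state -> phi(x_i)
    incoming: list of messages (dict state -> value) from all neighbours of i
    """
    unnormalised = {}
    for x in STATES:
        value = prior[x]
        for m in incoming:
            value *= m[x]
        unnormalised[x] = value
    k = 1.0 / sum(unnormalised.values())  # normalization constant K
    return {x: k * v for x, v in unnormalised.items()}


# An unknown host (prior 0.5) that contacted two known bad domains.
unknown_prior = {"benign": 0.5, "malicious": 0.5}
from_bad_domain = {"benign": 0.4902, "malicious": 0.5098}
b = belief(unknown_prior, [from_bad_domain, from_bad_domain])
print(b)
```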

Our approach

Experimental evaluation. Graph completeness. Size of ground truth data. Two-class vs. multi-class classification. Homogeneous vs. heterogeneous data. One enterprise vs. multiple enterprises.

HTTP proxy logs and DHCP logs. Logs from a large enterprise: 98 HTTP proxy servers and 6 DHCP servers worldwide. One day's logs: 2 billion events. 144K hosts, 1.28M domains, and 12M edges. Priors from ground truth (0.4% of nodes): 3K known bad domains: 0.99 (TippingPoint); 3K known good domains: 0.01 (Alexa); unknown hosts and domains: 0.5. Edge potential:

            Benign   Malicious
Benign      0.51     0.49
Malicious   0.49     0.51
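The setup above can be exercised end to end on a toy graph. This is a minimal sketch, not the authors' Java implementation: the function `run_bp`, the node names, the synchronous update schedule, and the per-message normalisation are assumptions. The priors (0.99, 0.01, 0.5) and the edge potential are the values listed above, and the default of 13 iterations mirrors the convergence figure reported for the HTTP data.

```python
from collections import defaultdict

STATES = (0, 1)  # 0 = benign, 1 = malicious
PSI = ((0.51, 0.49), (0.49, 0.51))  # PSI[x_i][x_j]


def run_bp(edges, prior_malicious, iterations=13):
    """Loopy belief propagation; returns P(malicious) for every node."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def phi(node):
        p = prior_malicious.get(node, 0.5)  # unknown nodes get 0.5
        return (1.0 - p, p)

    # One message per directed edge, initialised to uniform.
    msgs = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}
    for _ in range(iterations):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for xj in STATES:
                total = 0.0
                for xi in STATES:
                    term = phi(i)[xi] * PSI[xi][xj]
                    for k in neighbors[i]:
                        if k != j:
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = out[0] + out[1]
            new_msgs[(i, j)] = (out[0] / z, out[1] / z)
        msgs = new_msgs  # synchronous update (assumption)

    beliefs = {}
    for i in neighbors:
        b0, b1 = phi(i)
        for j in neighbors[i]:
            b0 *= msgs[(j, i)][0]
            b1 *= msgs[(j, i)][1]
        beliefs[i] = b1 / (b0 + b1)
    return beliefs


# Hypothetical toy graph: h1 visits a known bad and an unknown domain,
# h2 visits only a known good domain.
edges = [("h1", "bad.example"), ("h1", "unknown.example"), ("h2", "good.example")]
priors = {"bad.example": 0.99, "good.example": 0.01}
beliefs = run_bp(edges, priors)
for node, p in sorted(beliefs.items()):
    print(node, round(p, 4))
```

On this toy graph, the host that touched the known bad domain and the unknown domain it also visited both end slightly above 0.5, while the host that only visited the known good domain ends below 0.5.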

BP scales to enterprise settings. Java implementation of BP. 12-core 2.67 GHz desktop with 48 GB RAM. Each iteration takes 2-4 minutes. Converges in 13 iterations.

Malicious domain detection: ROC plot. FR: 2%-5%. (4%, 83%) and (6%, 90%).
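One way to obtain a single ROC operating point from per-node beliefs is sketched below. It assumes the pairs quoted above are read as (false positive rate, true positive rate), which the slide does not state explicitly; the function name, scores, and labels are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) at one threshold.

    scores: dict node -> estimated probability of being malicious
    labels: dict node -> True if the node is actually malicious
    """
    tp = fp = tn = fn = 0
    for node, score in scores.items():
        predicted = score >= threshold
        if predicted and labels[node]:
            tp += 1
        elif predicted and not labels[node]:
            fp += 1
        elif labels[node]:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical held-out domains with known labels.
scores = {"a": 0.9, "b": 0.6, "c": 0.4, "d": 0.2}
labels = {"a": True, "b": False, "c": True, "d": False}
strict = roc_point(scores, labels, threshold=0.5)
loose = roc_point(scores, labels, threshold=0.3)
print(strict, loose)
```

Sweeping the threshold from 1 down to 0 and plotting each resulting pair traces out the full ROC curve.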

DNS request logs. Collected from a medium-sized ISP. 2 Gbps packet captures. One week's data: 1.1 billion DNS requests. 927K hosts, 1.32M domains, and 12M edges. Priors and edge potential similar to the HTTP data.

DNS ROC plot: (7%, 84%) and (10%, 86%).

IDS alert logs. Collected from 916 enterprises worldwide. Five years' data: 15.5 billion alerts. 3.1M internal nodes, 3.69M external nodes, and 21.4M edges. Classify nodes into 4 classes. An IDS expert annotates 400 IDS signatures. Priors are assigned according to alert classes (6.6% of nodes in ground truth). Edge potential is set according to the homophilic relationship.

IDS logs ROC plot

Summary of results. Works well when the graph is complete (IDS), does poorly when the graph is incomplete (DNS). Requires minimal ground truth (HTTP), but more ground truth data is better (IDS). Works in multi-class settings (IDS). Can handle heterogeneous data (DNS) and data from multiple enterprises (IDS). Discovers genuine anomalies, manually verified (HTTP).

Future work. Combine different data sources, e.g., use IDS logs as ground truth for DNS logs. Root cause analysis.

Thank you. manadhata@hp.com