Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My

Similar documents
Six Days in the Network Security Trenches at SC14. A Cray Graph Analytics Case Study

What s New in Security Analytics Be the Hunter.. Not the Hunted

SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK

THE GLOBAL EVENT MANAGER

Viete, čo robia Vaši užívatelia na sieti? Roman Tuchyňa, CSA

D. Grzetich 6/26/2013. The Problem We Face Today

Analysis of Network Beaconing Activity for Incident Response

Best Practices for Hadoop Data Analysis with Tableau

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

LOG INTELLIGENCE FOR SECURITY AND COMPLIANCE

Unified Batch & Stream Processing Platform

SolarWinds Log & Event Manager

Cray: Enabling Real-Time Discovery in Big Data

VMware vcenter Log Insight Getting Started Guide

Oracle Big Data SQL Technical Update

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

RSA Security Analytics Security Analytics System Overview

Dell* In-Memory Appliance for Cloudera* Enterprise

Massive Cloud Auditing using Data Mining on Hadoop

Niara Security Analytics. Overview. Automatically detect attacks on the inside using machine learning

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

Detect & Investigate Threats. OVERVIEW

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Hillstone Intelligent Next Generation Firewall

Niara Security Intelligence. Overview. Threat Discovery and Incident Investigation Reimagined

Case Study: Instrumenting a Network for NetFlow Security Visualization Tools

Combat Sophisticated Threats

Situational Awareness Through Network Visualization

XpoLog Center Suite Log Management & Analysis platform

SolarWinds. Understanding SolarWinds Charts and Graphs Technical Reference

GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"

Spatio-Temporal Networks:

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Indicator Expansion Techniques Tracking Cyber Threats via DNS and Netflow Analysis

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian

RSA Security Analytics

EnCase Analytics Product Overview

Using LYNXeon with NetFlow to Complete Your Cyber Security Picture

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

SES / CIF. Internet2 Combined Industry and Research Constituency Meeting April 24, 2012

Hunting for the Undefined Threat: Advanced Analytics & Visualization

The Big Data Paradigm Shift. Insight Through Automation

Big Fast Data Hadoop acceleration with Flash. June 2013

Topology Aware Analytics for Elastic Cloud Services

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

NetFlow Analytics for Splunk

Exercise 7 Network Forensics

Palo Alto Networks and Splunk: Combining Next-generation Solutions to Defeat Advanced Threats

Know Your Foe. Threat Infrastructure Analysis Pitfalls

Transformation of honeypot raw data into structured data

Introduction to Apache Kafka And Real-Time ETL. for Oracle DBAs and Data Analysts

NSC E

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

HADOOP AND MAINFRAMES CRAZY OR CRAZY LIKE A FOX? Mike Combs, VP of Marketing mcombs@veristorm.com

A New Approach to Network Visibility at UBC. Presented by the Network Management Centre and Wireless Infrastructure Teams

CHANGING THE SECURITY MONITORING STATUS QUO Solving SIEM problems with RSA Security Analytics

Cisco OpenSOC Hadoop Design with Bro

plixer Scrutinizer Competitor Worksheet Visualization of Network Health Unauthorized application deployments Detect DNS communication tunnels

Installation Guide Avi Networks Cloud Application Delivery Platform Integration with Cisco Application Policy Infrastructure

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Challenges in NetFlow based Event Logging

Analyzing HTTP/HTTPS Traffic Logs

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Cisco IOS Flexible NetFlow Technology

A New Era Of Analytic

RSA envision. Platform. Real-time Actionable Security Information, Streamlined Incident Handling, Effective Security Measures. RSA Solution Brief

IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop

AlienVault Unified Security Management (USM) 4.x-5.x. Deployment Planning Guide

Objectives of Lecture. Network Architecture. Protocols. Contents

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION

Network Security Deployment (NSD)

Sicurezza & Big Data: la Security Intelligence aiuta le aziende a difendersi dagli attacchi

Getting the Most Out of SIEM. Presentation Title. Data in Big Data. Presented By: Dr. Char Sample, CERT

Concierge SIEM Reporting Overview

AlienVault Unified Security Management Solution Complete. Simple. Affordable Life Cycle of a log

locuz.com Big Data Services

Big Data Management and Security

Cyber Security. BDS PhantomWorks. Boeing Energy. Copyright 2011 Boeing. All rights reserved.

Enterprise SysLog Manager (ESM)

CDH AND BUSINESS CONTINUITY:

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

Network Visiblity and Performance Solutions Online Demo Guide

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Application Discovery Manager User s Guide vcenter Application Discovery Manager 6.2.1

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

Apache CloudStack 4.x (incubating) Network Setup: excerpt from Installation Guide. Revised February 28, :32 pm Pacific

AccelOps NOC and SOC Analytics in a Single Pane of Glass Date: March 2016 Author: Tony Palmer, Senior ESG Lab Analyst

Building a Scalable Big Data Infrastructure for Dynamic Workflows

Service Description DDoS Mitigation Service

Transcription:

Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My

The Proverbial Needle In A Haystack Problem The Nuclear Option

Problem Statement and Proposed Solutions The Spock Option

Problem Statement and Proposed Solutions The how we ve been doing it Option

Problem Statement and Proposed Solutions We would like to humbly suggest bringing more workers to the party

Problem Statement and Proposed Solutions Prefer a less recent pop-culture reference?

Background Technologies Urika GD RDF triple store proprietary architecture (XMT, XMT2) Urika XA Hadoop appliance x86 based architecture Next? Customer needs Massive scale Flexibility to develop different use cases on one platform Prevent cluster sprawl (e.g. dense racks) Example Use Case: Network security Near-real-time ingest Machine learning applied to streaming and static data (e.g. IR and Forensic investigations) Flexible framework easy to extend and modify bag of tools, not a bag of hammers (e.g. complementary technology stack to address different workloads) Support novice to expert users (e.g. easy button, if you want it; spin all the knobs if you don t)

High-Level Architecture file-based input Apache Phoenix Threat feeds Transform: Kafka Urika GD Interactive analysis Network streams Statistical anomaly detection IR ticketing Other data sources Machine learning

Architecture Highlights Credit where credit is due Architecture is heavily based off of and influenced by Cisco OpenSOC Changes made to take advantage of newer technologies (e.g. Apache Phoenix) Ingest Apache Kafka selected for high throughput Kafka development is relatively language agnostic (i.e. lower learning curve) Kafka handles streaming and file-based input well (assuming sufficient IO to/from disk) Processing and machine learning Still evaluating Kafka and Apache Storm, bulk of processing is done with Kafka for now Existing algorithms are leveraged, new ones implemented trivially Queries can be directed to the most appropriate tool, taking advantage of both traditional row/column and graph store strengths to answer questions The end result Nearly raw data stored in Phoenix for maximum flexibility Automated and manual analytic results aggregated and used for confidence scoring Automated alerts used to create tickets past a certain threshold Near-real-time and forensic use cases can be supported on a single platform seamlessly Most of the pipeline can be extended in any programming language and potentially re-use existing code bases, lowering the bar to entry in a new environment

Input Data Off the wire, from files, or both Kafka Producers used to efficiently manage and add new data sources Currently have parsers for the following: Netflow Cisco ASA Passive DNS (collected from internal DNS servers) Publicly available black/white lists (fetched at regular intervals based on the data source) WHOIS Active directory GeoIP DHCP PCAP Many more supported by Cisco OpenSOC

Scoring Suspicious behavior Anomaly detection Track both internal and external entities on a per-entity basis Examples of dimensions tracked Temporal patterns (e.g. time of day, day of week, etc.) Traffic volume TCP/UDP port usage Protocol usage Existing threat data Black/white lists Firewall/IDS/IPS/SIEM logs Pulling it all together Scores are transient in the sense that they apply for a given window of time (e.g. arbitrarily by hour or by day) Calculated across all alerting mechanisms; use weighting Weighted entity or traffic (depending on context) score crossing a threshold is flagged for analysis/verification Automated analytics can run side by side with ad-hoc queries Ad-hoc analysis can be integrated into the automated workflow including replay of past traffic Difference from standard IDS/IPS/SIEM More complex pattern and behavior-based risk scoring based on multiple dimensions Risk score s temporal aspect can be used to potentially block traffic dynamically and in a more fine-grained fashion

Scoring Example Time Anomaly Weight Score 2016-01-02 13:10:02.223657 Abnormal SSH activity 2 0.2 2016-01-02 13:14:33.114538 Abnormal UDP port usage 2 0.3 2016-01-02 13:36:21.685934 Blocked traffic to blacklisted IP/domain 4 0.7 Weighted score for 2016-01-02 13:00:00.000000 0.6 2016-01-03 08:44:55.300978 Unusual temporal activity (compared to baseline) 1 0.3 Weighted score for 2016-01-03 08:00:00.000000 0.3 2016-01-03 10:02:31.000494 IDS alert 5 0.8 2016-01-03 10:03:01.756002 Allowed transfer to domain closely associated with blacklisted IP (badrank) 4 0.6 Weighted score for 2016-01-03 10:00:00.000000 0.7

Graphs BadRank Essentially a seeded PageRank score Allows for determining guilt by association; Specifically, uses passive DNS and/or WHOIS Centrality Identifies bridge nodes between clusters/groups Enables Identification of chokepoints for blocking traffic Can be used to analyze botnet C 2 structure Community detection Flexible multi-dimensional similarity Can be used to classify traffic patterns and/or hosts Can be used to identify additional compromised/malicious entities Summary Graph algorithms provide a distinct class of tools not able to be easily implemented with relational data Compliments statistical anomaly detection by providing additional dimensions Handles joining disparate and complex datasets for enrichment

User Interface Graphs

User Interface Or tabular data in one UI

Questions, Contact, and Further Details M. Aaron Bossert bossert@cray.com Cray Inc.