SGT Technology Innovation Center Dasvis Project



SGT Technology Innovation Center: Dasvis Project
12 March 2015
Rohit Mital, Jay Ellis, Ashton Webster, Grant Orndorff
www.sgt-inc.com
© 2015 SGT Inc.

Introduction
- About SGT Technology Innovation Center
- Genesis for Dasvis project

Purpose
Project Goals
- Develop a real-time distributed processing framework for big data
- Determine how tools like Dasvis (built upon this framework) can fit in with other tools in the marketplace
- Design and develop a tool suite complementary to SGT's Cyber Security capabilities to ensure the security of SGT and our infrastructure
Dasvis is designed to be a customizable network monitoring tool
- It will mirror the capabilities of standard SIEM, Network IDS/IPS, and other tools
- It can accept a variety of inputs

Data Exfiltration in the News
Sony Pictures Entertainment (2014)
- Attack on Sony by the Guardians of Peace (with suspected nation-state involvement) in retaliation for the release of the movie The Interview
- Exfiltration of PII of Sony employees and family members, emails, executive salaries, and previously unreleased Sony movies
- Cancellation of the movie's wide-scale theatrical release
Edward Snowden (2013)
- Former NSA contractor and former CIA and DIA employee who released thousands of classified documents about the NSA's global surveillance programs
- Charged by the US DOJ with espionage and theft of government property (charges carrying up to 30 years in prison); currently living in Russia
WikiLeaks (2006 to present)
- 1.2 million documents published in the first year after the website's launch
- Disclosures to WikiLeaks by PFC Manning (currently serving a 35-year prison term), considered to be the largest leak of classified information in history, included:
  - 500,000+ US Army reports (Afghan and Iraq War Logs)
  - 250,000+ unredacted US State Department cables

Real-World Applications
- Large-scale data exfiltration from both the government and commercial sectors is becoming all too common
- Loss of sensitive and classified data is occurring both to and by corporations and nation states
- This indicates a need for companies to monitor network and/or user activity to protect against these types of threats
- Tools and frameworks are needed to process the volume of information necessary to thwart these types of attacks

System Architecture
- Cloud-based, real-time distributed processing framework
- Developed using standard, open-source tools with an available labor pool to support future maintenance and expansion
- Designed with flexibility and portability in mind

Dasvis Architecture / Tools
Configuration Processing
- Apache 2.4 Web Server
Capture and Processing
- Packet Captures: Pcap4j
- Data Transfer: Apache Kafka Queue
- Distributed/Real-Time Processing: Apache Storm/Trident
Data Storage (NoSQL Databases)
- Primary Packet Store: MongoDB
- Aggregate/Time Series DB: Cube DB
Reporting/Graphing
- Apache 2.4 Web Server
- PHP Web Framework: Laravel
- Graphing/Visualizations: Google Visualizations
Post Processing (Future)
- Integration with HDFS/Hadoop, with queries using HQL

Apache Kafka Queue
Kafka is a distributed messaging system used to transfer large amounts of data between processes. It acts as a queue with producers and consumers:
- Producers push data to a Kafka queue
- Consumers pull data from a Kafka queue
In short, it is a reliable way to send large volumes of data from one place to another in virtually any format.
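To make the producer/consumer model concrete, here is a minimal in-memory sketch. It is not Kafka's client API; the `ToyTopic` and `ToyConsumer` classes are hypothetical. It mimics one real property of Kafka: messages are appended to a log and are not removed when read, while each consumer (in Kafka, each consumer group) tracks its own offset, so several consumers can pull the same data independently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical stand-in for a single-partition Kafka topic (an append-only log)."""
    log: list = field(default_factory=list)

    def push(self, message: bytes) -> int:
        """Producer side: append a message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1


@dataclass
class ToyConsumer:
    """Consumer side: pulls messages starting from its own offset."""
    topic: ToyTopic
    offset: int = 0

    def pull(self, max_messages: int = 10) -> list:
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)  # advance only past what was actually read
        return batch


topic = ToyTopic()
for payload in (b"pkt-1", b"pkt-2", b"pkt-3"):
    topic.push(payload)

tracker = ToyConsumer(topic)
auditor = ToyConsumer(topic)  # a second, independent consumer of the same data
first = tracker.pull(max_messages=2)
rest = tracker.pull()
everything = auditor.pull()
```

Because reading does not empty the log, a consumer that restarts can resume from its last offset; real Kafka adds partitioning, replication, and consumer groups on top of this idea.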

MongoDB and CubeDB
MongoDB is a NoSQL database.
- It has collections (analogous to tables in SQL) that can accept documents of varying structures
- Documents use JavaScript Object Notation (JSON) style for a more flexible format; a document is similar to a row in SQL (MongoDB stores documents internally as BSON, a binary form of JSON)
- This is unlike relational databases (e.g. MySQL), which require every row inserted into a table to match the table's fixed structure/schema
CubeDB is a time series database that sits on top of MongoDB.
- A time series database is highly optimized for queries based on timestamps (e.g. all events within a given time range)
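The sketch below illustrates both ideas with plain Python data structures rather than MongoDB or Cube themselves: a "collection" that accepts packet documents with different fields, and a time series roll-up that groups those documents into fixed time buckets so that queries by time range are cheap. The field names and the 60-second bucket size are assumptions for illustration only.

```python
from collections import Counter

# A "collection" of schemaless documents: each dict may have different keys,
# much as documents in one MongoDB collection can differ in structure.
packets = [
    {"ts": 1000, "src": "10.0.0.5", "proto": "TCP", "dst_port": 443},
    {"ts": 1010, "src": "10.0.0.7", "proto": "UDP"},  # no dst_port field
    {"ts": 1075, "src": "10.0.0.5", "proto": "ICMP", "icmp_type": 8},
    {"ts": 1130, "src": "10.0.0.9", "proto": "TCP", "dst_port": 22},
]

BUCKET_SECONDS = 60  # hypothetical aggregation step


def bucket_start(ts: int, step: int = BUCKET_SECONDS) -> int:
    """Round a timestamp down to the start of its time bucket."""
    return ts - ts % step


def time_series_counts(docs, step: int = BUCKET_SECONDS) -> dict:
    """Aggregate documents into per-bucket counts, keyed by bucket start time."""
    return dict(sorted(Counter(bucket_start(d["ts"], step) for d in docs).items()))


def range_query(series: dict, start: int, end: int) -> dict:
    """Return the buckets whose start time falls in [start, end)."""
    return {t: n for t, n in series.items() if start <= t < end}


series = time_series_counts(packets)
with_port = [d for d in packets if "dst_port" in d]  # query on an optional field
```

Answering a time-range question from the pre-aggregated buckets touches a handful of small records instead of every stored packet, which is the trade-off a time series store is built around.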

Apache Storm/Trident
Storm allows one to process large amounts of data in real time by providing an abstraction for writing distributed processing programs:
- Spout: a unit that creates a stream of data to be processed
- Bolt: a unit that accepts a stream of data, performs an operation on it, and optionally passes on more data
- Topology: a collection of spouts and bolts connected by the streams of data passed between them
Storm bolts and spouts can be run as multiple tasks (threads), and even on different machines, in parallel.
Trident is a further abstraction on top of Storm that handles the creation of spouts and bolts in what it deems the most efficient topology.
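The spout/bolt/topology vocabulary can be illustrated with a few lines of single-process Python. This is not Storm's API (Storm topologies are normally written in Java, wired together with `TopologyBuilder`); the `run_topology` helper and the sample bolts are hypothetical, and everything here runs sequentially in one thread.

```python
from typing import Callable, Iterable, Iterator

# A bolt takes one tuple and yields zero or more output tuples.
Bolt = Callable[[dict], Iterable[dict]]


def packet_spout(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Spout: turns an external source into a stream of tuples."""
    for line in raw_lines:
        yield {"raw": line}


def split_bolt(tup: dict) -> Iterator[dict]:
    """Bolt: parses 'src,dst' into fields; malformed input emits nothing."""
    parts = tup["raw"].split(",")
    if len(parts) == 2:
        yield {"src": parts[0], "dst": parts[1]}


def tag_bolt(tup: dict) -> Iterator[dict]:
    """Bolt: enriches each tuple and passes it on."""
    yield {**tup, "internal": tup["dst"].startswith("10.")}


def _apply(bolt: Bolt, stream: Iterable[dict]) -> Iterator[dict]:
    """Feed every tuple of a stream through one bolt."""
    for tup in stream:
        yield from bolt(tup)


def run_topology(spout: Iterable[dict], bolts: list) -> list:
    """Topology: connects the spout to each bolt in turn (a linear chain)."""
    stream: Iterable[dict] = spout
    for bolt in bolts:
        stream = _apply(bolt, stream)
    return list(stream)


result = run_topology(
    packet_spout(["10.0.0.5,10.0.0.9", "garbage", "10.0.0.5,8.8.8.8"]),
    [split_bolt, tag_bolt],
)
```

In Storm, the same wiring would also declare how many parallel tasks each component gets and how tuples are routed between them, which is what lets a topology scale out across threads and machines.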

How It All Fits Together

Dasvis Storm Topologies: Tracking and Comparing
- The Tracking Topology looks at incoming data and aggregates the data that we want to track
- Aggregated data is stored in the time series database and sent to the Comparing Topology
- The Comparing Topology compares the incoming data to the Baseline Data to look for anomalies
[Figure: Raw Data enters the Tracking Topology, which asks "Do we want to track this data?"; if yes, the incoming data is aggregated, otherwise it is discarded. The Aggregated Data passes to the Comparing Topology, which compares it to the baseline data and produces comparison information.]
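A minimal sketch of the two-topology flow, assuming a hypothetical tracking rule (only TCP traffic is tracked), aggregation by destination port within one window, and a simple relative-tolerance comparison against baseline counts. None of these rules come from the slides; they only show where the track/discard decision, the aggregation, and the comparison sit. A plain `deque` stands in for the Kafka queue connecting the two topologies.

```python
from collections import Counter, deque

TRACKED_PROTOCOLS = {"TCP"}  # hypothetical tracking configuration
TOLERANCE = 0.5              # hypothetical: flag counts more than 50% off baseline


def tracking_topology(packets: list, queue: deque) -> Counter:
    """Keep only data we want to track, aggregate it, and forward the aggregate."""
    tracked = [p for p in packets if p["proto"] in TRACKED_PROTOCOLS]  # the rest is discarded
    aggregate = Counter(p["dst_port"] for p in tracked)
    queue.append(aggregate)  # stand-in for forwarding via a Kafka queue
    return aggregate


def comparing_topology(queue: deque, baseline: dict) -> dict:
    """Compare each forwarded aggregate to the baseline and report deviations."""
    findings = {}
    while queue:
        aggregate = queue.popleft()
        for port in set(aggregate) | set(baseline):
            observed, expected = aggregate.get(port, 0), baseline.get(port, 0)
            if expected == 0:
                anomalous = observed > 0  # traffic on a port absent from the baseline
            else:
                anomalous = abs(observed - expected) / expected > TOLERANCE
            if anomalous:
                findings[port] = (observed, expected)
    return findings


queue = deque()
baseline = {443: 4, 22: 1}
window = (
    [{"proto": "TCP", "dst_port": 443}] * 4
    + [{"proto": "UDP", "dst_port": 53}]
    + [{"proto": "TCP", "dst_port": 4444}] * 3
)
aggregate = tracking_topology(window, queue)
findings = comparing_topology(queue, baseline)
```

Here port 443 matches the baseline, port 4444 appears without any baseline, and the expected port 22 traffic is missing, so the last two are reported.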

A Closer Look at the Tracking Topology
Data flow through the spout and bolts:
- Packet Spout: the packet is retrieved from the Kafka queue
- Packet Parse: the packet is parsed to JSON
- Packet Match: the packet is matched against configurations
- Packet Aggregation: the packet is aggregated over time with other packets
- Single Insertion: the packet is inserted into MongoDB
- Aggregate Forward: aggregated packets are sent to the Comparing Topology via a Kafka queue
- Aggregate Insertion: packet aggregate data is stored in the time series database
Spouts and bolts make for simple programming abstractions:
- Spouts start the data processing
- Bolts are operations on those packets
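The bolt sequence can be sketched as plain Python functions, one per bolt, assuming a hypothetical raw packet format (`ts src dst dst_port`, space-separated), a hypothetical configuration that lists the ports to watch, and lists standing in for MongoDB, the time series database, and the outbound Kafka queue. In Dasvis these steps run as separate Storm bolts; here they are simply called in order.

```python
import json
from collections import defaultdict

CONFIG = {"watched_ports": {22, 3389}, "window_seconds": 60}  # hypothetical configuration

packet_store = []    # stand-in for the MongoDB packet collection
time_series_db = []  # stand-in for the time series database
forward_queue = []   # stand-in for the Kafka queue feeding the Comparing Topology


def packet_parse(raw: str) -> str:
    """Packet Parse: turn a raw record 'ts src dst dst_port' into a JSON document."""
    ts, src, dst, port = raw.split()
    return json.dumps({"ts": int(ts), "src": src, "dst": dst, "dst_port": int(port)})


def packet_match(doc_json: str):
    """Packet Match: return the packet if it matches the configuration, else None."""
    doc = json.loads(doc_json)
    return doc if doc["dst_port"] in CONFIG["watched_ports"] else None


def single_insertion(doc: dict) -> None:
    """Single Insertion: store the individual packet."""
    packet_store.append(doc)


def packet_aggregation(docs: list) -> dict:
    """Packet Aggregation: count packets per (window start, destination port)."""
    counts = defaultdict(int)
    for doc in docs:
        window_start = doc["ts"] - doc["ts"] % CONFIG["window_seconds"]
        counts[(window_start, doc["dst_port"])] += 1
    return dict(counts)


def aggregate_insertion(aggregate: dict) -> None:
    """Aggregate Insertion: store the aggregate in the time series database."""
    time_series_db.append(aggregate)


def aggregate_forward(aggregate: dict) -> None:
    """Aggregate Forward: hand the aggregate on to the Comparing Topology."""
    forward_queue.append(aggregate)


raw_packets = ["100 10.0.0.5 10.0.0.9 22", "110 10.0.0.5 10.0.0.9 443", "130 10.0.0.6 10.0.0.9 22"]
matched = [doc for doc in (packet_match(packet_parse(r)) for r in raw_packets) if doc is not None]
for doc in matched:
    single_insertion(doc)
aggregate = packet_aggregation(matched)
aggregate_insertion(aggregate)
aggregate_forward(aggregate)
```

Keeping each step as a small, single-purpose function mirrors why the bolt abstraction is convenient: each stage can later be given its own parallelism without changing its logic.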

A Closer Look at the Tracking Topology (continued)
- Bolts can run as multiple tasks
- Tasks can be thought of as threads
[Figure: the Packet Spout and the Packet Parse, Packet Match, Packet Aggregation, Single Insertion, Aggregate Insertion, and Aggregate Forward bolts, with bolts shown running as multiple tasks.]
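One way to picture a bolt running as several tasks is a set of worker threads, each with its own input queue and state, where tuples are routed to a task by hashing a key so that all packets from one source land on the same task. Storm offers this kind of routing as a "fields grouping"; whether Dasvis uses it is not stated, and the `AggregationTask` class, the task count, and the `route` function below are hypothetical.

```python
import queue
import threading
import zlib
from collections import Counter

NUM_TASKS = 3  # hypothetical parallelism for the Packet Aggregation bolt


class AggregationTask(threading.Thread):
    """One task of an aggregation bolt: a thread with its own input queue and state."""

    def __init__(self, task_id: int):
        super().__init__(daemon=True)
        self.task_id = task_id
        self.inbox: "queue.Queue" = queue.Queue()
        self.counts = Counter()  # per-task state, never shared between threads

    def run(self) -> None:
        while True:
            packet = self.inbox.get()
            if packet is None:  # sentinel: no more input
                break
            self.counts[packet["src"]] += 1


def route(packet: dict) -> int:
    """Fields-grouping-style routing: the same src always maps to the same task."""
    return zlib.crc32(packet["src"].encode()) % NUM_TASKS


tasks = [AggregationTask(i) for i in range(NUM_TASKS)]
for task in tasks:
    task.start()

packets = [{"src": f"10.0.0.{i % 4}"} for i in range(20)]
for packet in packets:
    tasks[route(packet)].inbox.put(packet)
for task in tasks:
    task.inbox.put(None)
    task.join()

merged = sum((task.counts for task in tasks), Counter())
```

Routing by key means no two threads ever update the same counter, so the tasks need no locks; `zlib.crc32` is used instead of Python's built-in `hash` because string hashing is randomized between runs.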

A Closer Look at the Tracking Topology (continued)
- Bolts can run on multiple nodes in a cluster
- Each bolt can still run as multiple tasks
- This greatly improves performance
[Figure: the spout and bolts distributed across Nodes 1 to 5, with the Packet Spout, Packet Parse, and Packet Match components appearing on more than one node.]
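Spreading tasks over a cluster can be sketched as a round-robin assignment of (component, task) pairs to nodes. This is only loosely similar to how Storm's default scheduler spreads executors evenly across worker slots, not its actual algorithm; the parallelism numbers and node names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical parallelism hints: how many tasks each component runs.
PARALLELISM = {
    "packet_spout": 2,
    "packet_parse": 2,
    "packet_match": 2,
    "packet_aggregation": 3,
    "single_insertion": 2,
    "aggregate_insertion": 1,
    "aggregate_forward": 1,
}
NODES = ["node1", "node2", "node3", "node4", "node5"]


def assign_round_robin(parallelism: dict, nodes: list) -> dict:
    """Assign every (component, task index) pair to a node, cycling through the nodes."""
    placement = defaultdict(list)
    node_cycle = cycle(nodes)
    for component, n_tasks in parallelism.items():
        for task_index in range(n_tasks):
            placement[next(node_cycle)].append((component, task_index))
    return dict(placement)


placement = assign_round_robin(PARALLELISM, NODES)
load = {node: len(assigned) for node, assigned in placement.items()}
aggregation_nodes = {
    node for node, assigned in placement.items()
    if any(component == "packet_aggregation" for component, _ in assigned)
}
```

Because the three aggregation tasks land on three different nodes, they can work on separate packets at the same time; this is the sense in which scaling out can improve throughput.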

Episodes and Baseline Data
- Baseline Data is the data that represents what the incoming data to Dasvis should look like
- If the incoming data is significantly different from the Baseline Data, then we have an anomaly
- An Episode is a set of Baseline Data associated with a set of Conditions
- This allows the user to have different sets of Baseline Data for different times
[Figure: Normal Baseline Data shown alongside Episodes of Baseline Data.]
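Selecting which Baseline Data applies at a given moment can be sketched as matching the current time against each Episode's Conditions and falling back to the normal baseline when no Episode matches. The condition format (allowed weekdays plus an hour range), the first-match rule, and the baseline values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Episode:
    """A hypothetical Episode: Baseline Data plus the Conditions under which it applies."""
    name: str
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start_hour: int      # inclusive
    end_hour: int        # exclusive
    baseline: dict       # e.g. expected packets per minute by destination port

    def applies_to(self, moment: datetime) -> bool:
        return moment.weekday() in self.weekdays and self.start_hour <= moment.hour < self.end_hour


NORMAL_BASELINE = {443: 120, 22: 2}

EPISODES = [
    Episode("business hours", frozenset(range(5)), 9, 17, {443: 900, 22: 10}),
    Episode("monday backups", frozenset({0}), 1, 4, {443: 50, 22: 40}),
]


def select_baseline(moment: datetime, episodes=EPISODES, default=NORMAL_BASELINE) -> dict:
    """Return the first matching Episode's baseline, else the normal baseline."""
    for episode in episodes:
        if episode.applies_to(moment):
            return episode.baseline
    return default
```

With first-match ordering, overlapping Episodes are resolved by list order; a production tool would likely need an explicit precedence rule instead.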

Review of Dasvis-Specific Concepts
Tracking vs. Comparing Topologies
- The Tracking Topology records and aggregates the incoming data we want to track
- The Comparing Topology decides whether there are anomalies in the incoming data by comparing it against the baseline data
Baseline Data
- Past data aggregated by Dasvis that represents the normal distribution of data
Episode
- A set of Baseline Data that is only used at specific times (e.g. only on Mondays, or only during business hours)
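Since Baseline Data is described as past aggregates representing the normal distribution of the data, one simple way to build and use it is to keep the mean and standard deviation of each tracked metric and flag new values by z-score. The z-score test, the threshold of 3, and the metric names are assumptions swapped in for illustration; the slides do not say which comparison Dasvis uses, and the history values are illustrative only.

```python
from statistics import mean, pstdev

Z_THRESHOLD = 3.0  # hypothetical: flag values more than 3 standard deviations out


def build_baseline(history: dict) -> dict:
    """Summarize past per-window counts for each metric as (mean, standard deviation)."""
    return {metric: (mean(values), pstdev(values)) for metric, values in history.items()}


def compare(current: dict, baseline: dict, threshold: float = Z_THRESHOLD) -> dict:
    """Return the metrics whose current value deviates from the baseline by more than `threshold` sigmas."""
    anomalies = {}
    for metric, value in current.items():
        if metric not in baseline:
            anomalies[metric] = float("inf")  # never seen before: treat as anomalous
            continue
        mu, sigma = baseline[metric]
        if sigma == 0:
            z = 0.0 if value == mu else float("inf")
        else:
            z = abs(value - mu) / sigma
        if z > threshold:
            anomalies[metric] = z
    return anomalies


# Illustrative past packets-per-minute counts, as the Tracking Topology might aggregate them.
history = {"port_443": [100, 110, 90, 105, 95], "port_22": [2, 2, 2, 2, 2]}
baseline = build_baseline(history)
normal = compare({"port_443": 108, "port_22": 2}, baseline)
suspicious = compare({"port_443": 104, "port_22": 40, "port_4444": 7}, baseline)
```

A zero-variance metric, such as a constant count, makes any change infinitely surprising under this rule; in practice a minimum sigma or a relative tolerance would usually be added.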

Demo
- Mini tutorial
  - Creating a baseline
  - Setting Baseline Data
- Example scenario and expected output
  - Normal data that matches the baseline well
  - Potentially malicious activity

Summary
Challenges / Issues
- Need to clarify the current use of open-source tools and the potential costs of deploying Dasvis as a COTS product
Future Plans
- Adding new inputs, such as NetFlow and application logs, in addition to packet capture
- Adherence to the NIST Cyber Security Situational Awareness specification

Comments/Questions? Your Feedback Is Appreciated!