The Big Data Paradigm Shift. Insight Through Automation



Similar documents
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Transforming the Telecoms Business using Big Data and Analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Beyond Watson: The Business Implications of Big Data

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Anatomy of Cyber Threats, Vulnerabilities, and Attacks

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

How To Make Sense Of Data With Altilia

The session is about to commence. Please switch your phone to silent!

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

Using Tableau Software with Hortonworks Data Platform

THE 2014 THREAT DETECTION CHECKLIST. Six ways to tell a criminal from a customer.

Mastering Big Data. Steve Hoskin, VP and Chief Architect INFORMATICA MDM. October 2015

Niara Security Intelligence. Overview. Threat Discovery and Incident Investigation Reimagined

Big Data. White Paper. Big Data Executive Overview WP-BD Jafar Shunnar & Dan Raver. Page 1 Last Updated

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Oracle Big Data Building A Big Data Management System

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

SECURITY MEETS BIG DATA. Achieve Effectiveness And Efficiency. Copyright 2012 EMC Corporation. All rights reserved.

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

Separating Signal from Noise: Taking Threat Intelligence to the Next Level

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

How To Make Data Streaming A Real Time Intelligence

A New Era Of Analytic

Ramesh Bhashyam Teradata Fellow Teradata Corporation

How To Handle Big Data With A Data Scientist

The Future of Data Management

Big Data for Big Intel

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

IBM QRadar Security Intelligence April 2013

Big Data Integration: A Buyer's Guide

Detect & Investigate Threats. OVERVIEW

Fight fire with fire when protecting sensitive data

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

A Scalable Data Transformation Framework using the Hadoop Ecosystem

How To Create An Insight Analysis For Cyber Security

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

Analyzing HTTP/HTTPS Traffic Logs

Big Data Use Cases Update

Big Data for Public Safety: 4 use cases for intelligence and law enforcement agencies to leverage Big Data for crime prevention.

locuz.com Big Data Services

Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation

Are You Ready for Big Data?

Big Data in Telecom value chain. Presented by: Gurjot S Sandhu Director Sales Xalted Information Systems Pvt. Ltd.

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

The webinar will begin shortly

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Big Data and Analytics: Challenges and Opportunities

Discover & Investigate Advanced Threats. OVERVIEW

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"

Leveraging Machine Data to Deliver New Insights for Business Analytics

Cyber Security. BDS PhantomWorks. Boeing Energy. Copyright 2011 Boeing. All rights reserved.

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence

Are You Ready for Big Data?

Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Session 9: Changing Paradigms and Challenges Tools for Space Systems Cyber Situational Awareness

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

White Paper: Leveraging Web Intelligence to Enhance Cyber Security

Big Data Are You Ready? Thomas Kyte

Data Loss Prevention with Platfora Big Data Analytics

Evolution Of Cyber Threats & Defense Approaches

TUT NoSQL Seminar (Oracle) Big Data

The Enterprise Data Hub and The Modern Information Architecture

How does Big Data disrupt the technology ecosystem of the public cloud?

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

Data Refinery with Big Data Aspects

Evolution from Big Data to Smart Data

5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK

IBM Big Data in Government

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Understanding the Value of In-Memory in the IT Landscape

Six Days in the Network Security Trenches at SC14. A Cray Graph Analytics Case Study

WHITE PAPER: THREAT INTELLIGENCE RANKING

HDP Hadoop From concept to deployment.

Big Data & Analytics. The. Deal. About. Jacob Büchler jbuechler@dk.ibm.com Cand. Polit. IBM Denmark, Solution Exec IBM Corporation

Predictive Analytics: Turn Information into Insights

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

Big Data-Challenges and Opportunities

DATAMEER WHITE PAPER. Beyond BI. Big Data Analytic Use Cases

IDC MaturityScape Benchmark: Big Data and Analytics in Government. Adelaide O Brien Research Director IDC Government Insights June 20, 2014

Take the Red Pill: Becoming One with Your Computing Environment using Security Intelligence

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Transcription:

The Big Data Paradigm Shift Insight Through Automation

Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 2

The Problem Exabytes Data is growing at an unprecedented rate Less than 1% of data is analyzed 40000 30000 20000 10000 50 Fold growth from 2010 to 2020 0 2010 2012 2014 2016 2018 2020 Source: IDC s Digital Universe Study, sponsored by EMC, December 2012 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 3

Old Paradigm: Manually Intensive Analysis Unpredictable Slow Expensive Collect Analyze Report 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 4

New Paradigm: Automation of Analysis PREDICTABLE FAST ECONOMICAL Collect Solve Review & Act 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 5

Emcien s Unique Value Proposition Emcien s automatic pattern-detection platform delivers timely mission critical insights from data Automated analysis for fast, predictable, accurate insight Applicable across all data types: Structured & Unstructured data, Text or Numeric Algorithms designed to solve business problems 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 6

Types of Data: Structured, Unstructured, Static, Streaming Machine Data Network log files Social, Blogs, Newsfeeds Email Data Click stream data Marketing Data Sales Data Corporate Data 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 7

Limitations of Current Solutions Manually Intensive Very slow and unreliable Search or query based Visualization as a means for discovery High error Only certain data types Numerical analysis only Text only, NLP methods, very high set up cost Data staging Streaming data and recent analysis At-rest data and historic analysis Lack of Scalability Current approaches focus too much on storage methods 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 8

Another View of the Big Data Stack Our Focus Value Layer Sectors Analysis Layer Use Cases (Industry specific or cross industry) Banking Insurance Manufacturing Retail Internet Telco Intel/Security Healthcare Algorithms and Analytics Infrastructure Processor Storage Data Click stream Machine Data Corporate Data 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 9

How Does Our Solution Work? Big Data problems need graph analysis Framework for analyzing relationships Highly scalable representation Data values Tokens Graph Tokens are linked if they occur together Unstructured Structured Closed CARS Americana Flashter, Inc 01/01/11 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 10

Algorithms Solve to Extract Patterns Algorithms surface the highly relevant dependencies Defocus the redundant/noise to surface the signal 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 11

Data Patterns That Reveal The Insight Algorithms designed to reveal graph constructs that solve a business problem Solving a Graph Problem Loosely Federated Communities Cliques People Network Substitute Nodes Results in Solving a Business Problem - Reveals groups that behave similarly - Reveals dimensions that bind the group - Impossible to detect in a typical querying system - Highly correlated elements - Optimal query that would lead to insight - Reveals influence network of individual - Highly predictive for adoption behaviors - Nodes that behave very similarly - ID theft or product substitutes 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 12

Algorithms Are Highly Scalable For Big Data Traditional Data Storage: Linear growth with transactions Very large storage requirements are Increases response time Graph Data Storage: Size of total number of entities E.g. Store has 500,000 items graph has 500,000 nodes Weights updated with transactions Delivers a global view of the data 200 200 150 150 100 100 50 50 0 1 3 5 7 9 11 13 15 17 19 0 1 3 5 7 9 11 13 15 17 19 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 13

Speed of Data Answers Access Time vs. Processing Speed Traditional Data Storage: Graph Data Storage: Limit is query speed In-memory, hadoop cluster approaches Highly dimensional data is a problem Unstructured content is a problem All results are Pre-computed (like Google) Pre-computing speed: 50K trans/sec compute on 1-core 8GB RAM system Speed of response is access time 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 14

The Business Problems We Solve Across Different Types of Data Automatically Extract Dependencies Web click-stream - Reveal click patterns & market segments Sales data - Reveal consumption patterns and propensity Clinical trials - extract hypothesis for testing Social Patterns & People Network Marketing - Reveal conversation patterns, people communities For Intel - Reveal bad actors based on conversation patterns Surprising Streaming Content Machine Network traffic Reveal network intrusion Sales transactions Reveal fraud based on unusual patterns Entity Resolution / Cleansing Patterns automatically clusters similar entities. Example credit card transactions, insurance claims with varying merchant names 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 15

Intel Case Study (1/4) Cyber Threat Monitoring with Open Source Data Customer Overview And Current Situation Federal agency is failing to keep up with the activity and data in open source Open Source (social, IRC, blogs, etc.) are a key source of communication for underworld Link analysis leads to people of interest network which is key for intelligence Customer Objective Federal agency requires fast methods to process high volume open source data Need automated methods to highlight conversations of interest Need automated link analysis to focus on people of interest Fast and continuous data processing to keep up with the speed of crime 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 16

Intel Case Study (2/4) Layers of Analysis for Cyber Threat Monitoring Influential people are ranked based on conversation relevance z People Graph: Seeds to Reveal network of People of Interest Bomb attack next week Helicopter ready for take-off Tanks turn on crowd Pattern Detection to Isolate Conversations of Interest Open Source Photos meta data Chat rooms Twitter Social Streams emails Maps/ geo -location Data Silos 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 17 17

Intel Case Study (3/4) Cyber Threat Monitoring with Open Source Data Influential people ranked based on conversations Overview 250,000 Accounts Over 1,000,000 Connections Highly relevant People of Interest network 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 18

Intel Case Study (4/4) Cyber Threat Monitoring with Open Source Data The silent signal Automatically detecting a sleeper cell Overview 250,000 Accounts Analyzed Over 1,000,000 Connections 1 Account of Interest 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 19

Network Traffic Log Files (1/6) Revealing Patterns In Machine-to-Machine Data Customer Overview Research Institute has thousands of users on their network Must provide controlled safe access for the internal working labs and the outside network Control illegal intrusions, malicious malware and illegal data transmissions Customer Objective Automate Process of Intrusion Detection Scan streaming machine-to-machine log file output Detect surprising/interesting anomalies/beacons Automatically send short list of top ranked questions to ask of the data into existing tools (such as CA, Sumologic, Splunk, etc.) 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 20

Network Traffic Log Files (2/6) Example Use Cases Example Use Cases 1. Summarize and Rank Log File data based on Surprising flow patterns 2. Determine Machine network based on flow patterns. Rank Machines based on their influence in the network 3. Detect communities of machines based on how they talk to each other 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 21

Network Traffic Log Files (3/6) Reveal Surprising Patterns In Network flow Data Network Log Data Surprising network activity within the data flow 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 22

Network Traffic Log Files (4/6) Ranked Summary of Surprising Events Full Log File Details Big Data Example data size: 1,000 transactions/sec Emcien Scout 1 2 3 1 2 3 4 1 2 3 4 Last 24 hours Last 6 hours CA, Sumologic, Splunk 4 Last 30 mins Top Surprising Clusters 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 23

Network Traffic Log Files (5/6) Most Influential Nodes on network Audience size for each machine Emcien Scout Most influential machines on the network based on communication patterns 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 24

Network Traffic Log Files (6/6) Machine Communities based on how they talk Lab A Lab B Physical Connections Lab A 101011 Lab B 101011 Machine Communication Communities 101011 101011 101011 101011 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 25

Intel Case Study (1/6) Reveal Conversation Patterns & Network of Actors in Email Data Customer Overview And Current Situation Federal agency is failing to keep up with the activity and data in email Too much data and current tools are manually intensive Customer Objective Federal agency requires fast methods to process high volume of email data Need automated methods to highlight conversations of interest Need automated link analysis to focus on people of interest based on emails Fast and continuous data processing to keep up with the speed of crime 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 26

Intel Case Study (2/6) Automatic Data Collectors Content extracted from emails Addresses extracted and linked Text Summary Email Content Email Data EmcienScout Extraction Program People Network Analysis Email Addresses And Phone nums 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 27

Intel Case Study (3/6) Automatic Email Extraction Extracts all Addresses in Header AND Body To & From addresses extracted Messages extracted, each word tokenized and connected into graph. We are working to gather as much information as possible. We are working to gather 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 28

Intel Case Study (4/6) Automatic Email Summarization Summarize content from emails to better understand group conversations 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 29

Intel Case Study (5/6) People Graph (1/2) Program extracts To/From email addresses and phone numbers from suspects email account Newly created contacts file is loaded into Scout People Graph Large complex graph is created using emails and phone number connections joe@hotmail.com jane@ofc.com 404 555-1212 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 30

Intel Case Study (6/6) Algorithm Computes People Graph (2/2) Initial bad actor seed accounts (emails addresses or phone numbers) are selected or entered Ranked List of other accounts that are likely involved with seed accounts based on their connections. 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 31

How Emcien Fits Into Your Ecosystem Feed downstream systems with data output Production Servers API UI UI for Analyst who wants to review results Sales Trans Bank Trans Insurance Claims Pattern Detection Platform Compressed Graph Data Representation Social Media News / Blogs Emails / Chat Server Logs Web logs Security Logs Structured Data Unstructured Data Unstructured (Machine Data) 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 32

Questions?

Types of Data Many types of Data Structured, Unstructured Text, Numeric, Machine In many states Static (slow batch) Streaming or fast batch Marketing Data Sales Data Corporate Data Social, Blogs, Newsfeeds Email Data Machine Data Network log files Click stream data 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 34

Limitations of Current Solutions Manually Intensive: Very slow and unreliable Search or query based Visualization as a means for discovery High error Limitation based on data types Numerical analysis only Text only, NLP methods, very high set up cost Limitation based on data staging Streaming data and recent analysis At-rest data and Historic analysis Scalability Current approaches focus too much on storage methods 2013 Emcien, Inc. All rights reserved worldwide. www.emcien.com 35