Combat Sophisticated Threats
Transcription
1 Combat Sophisticated Threats How Big Data and OpenSOC Could Help SESSION ID: SEC-T11 James Sirota Big Data Architect/Data Scientist Cisco Security Solutions
2 In the next few minutes Introduction to Data-Driven Security Overview of OpenSOC Overview of the Analytics Pipeline Survey of Algorithms used by OpenSOC Probabilistic Structures for Stream Estimation Q&A
3 Introduction to Data-Driven Security
4 Traditional Approach to Security Reactive Security Model [1,2,3,4,5] 80% of spending on perimeter defenses Average security tools per organization Collectively generate ~20,000 events per second Tools are highly specialized 80% of alerts require additional follow-up
5 Traditional Approach to Security Investigation and Incident Response [5,6] 66% take an hour or longer to identify 91% take a day or longer to discover root cause 81% take a day or longer to fix 84% of fixes take a week or longer to validate
6 Where things break down Challenges with perimeter defense [2] Most organizations support BYOD 47% of organizations use cloud services Global Workplace Challenges with Point Tools [4, 5] Too many tools and manual processes Too many false positive alerts Lack of integrated architecture Not enough data sources
7 Where things break down Challenges with Adding Data Sources Volume: from Terabytes to Zettabytes Variety: structured to unstructured Velocity: everything is a sensor, generates logs Veracity: data quality issues Value: What to keep? Purge? Summarize? Skills Challenge [4,6,7] Staff spend ~40% less time on tasks than ideal 35% of organizations use contractors
8 Cisco Approach to Security Data-Driven Security Model Moving from reactive to predictive Moving from point tools to a unified platform Shifting skills from specialists to data scientists Emphasis on quality and not quantity of alerts Emphasis on closed-loop analytics
9 Overview of OpenSOC
10 Introducing OpenSOC Supports Data-Driven Security Model Unified platform for ingest, storage, analytics Provides multiple views/access patterns for data Interactive Analytics and Predictive Modeling Provides contextual real-time alerts Rapid deployment and scoring
11 Emergence of Big Data Systems that scale out, not up Parallel and scalable computation tools Cheap, massively-scalable storage Stream computation + stream analysis Scaling + approximation
12 Technology Behind OpenSOC Telemetry Capture Layer: Apache Flume Data Bus: Apache Kafka Stream Processor: Apache Storm Real-Time Index and Search: Elastic Search Long-Term Data Store: Apache Hive Long-Term Packet Store: Apache HBase Visualization Platform: Kibana
13 Introducing OpenSOC Intersection of Big Data and Security Analytics Real-Time Alerts Anomaly Detection Data Correlation Hadoop Scalable Compute Multi Petabyte Storage Interactive Query Real-Time Search OpenSOC Big Data Platform Rules and Reports Predictive Modeling UI and Applications Unstructured Data Scalable Stream Processing Data Access Control
14 OpenSOC in a Nutshell Threat Intelligence Feeds Raw Network Stream Network Metadata Stream Netflow Syslog Raw Application Logs Parse + Format Enrich Alert Log Mining and Analytics Elastic Search Applications + Analyst Tools Network Packet Mining and PCAP Reconstruction HBase Big Data Exploration, Predictive Modeling Hive Other Streaming Telemetry Real-Time Index Raw Packet Store Long-Term Store Enrichment Data
15 OpenSOC Journey Sept 2013: First Prototype; Dec 2013: Hortonworks joins the project; March 2014: Platform development finished; April 2014: First beta test at customer site; May 2014: CR Work off; Sept 2014: General Availability
16 Overview of the Analytics Pipeline
17 Analytics Pipeline Flume Kafka Storm OpenSOC-Streaming RAW Transform Enrich Alert (Rules-Based) Big Data Stores Elastic Search Real-Time Index and Search HIVE Long-Term Data Store OpenSOC-Aggregation Filter Aggregators HBase Windowed Rollup Store Enriched OpenSOC-ML Router Model 1 Model 2 Scorer HIVE Alerts Store Model n Alerts UI UI UI SOC Alert Consumers UI UI Web Services Secure Gateway Services External Alert Consumers Remedy Ticketing System
18 OpenSOC-Streaming
KAFKA TOPIC: RAW
[TIMESTAMP] Log message: "This is a test log message", generated from mydomain.com, :123 -> :456 {TCP}
OpenSOC-Streaming 1:1
KAFKA TOPIC: ENRICHED
{ msg_content: This is a test log message, processed: [TIMESTAMP], src_ip: , src_port: 123, dest_ip: , dest_port: 456, domain: mydomain.com, protocol: "tcp",
geo_enrichment: { source: { postalcode: 95134, areaCode: 408, metroCode: 807, longitude: , latitude: 37.425, locid: 4522, city: "san Jose", country: "US" }, dest: { postalcode: 95134, areaCode: 408, metroCode: 807, longitude: , latitude: 37.425, locid: 4522, city: "san Jose", country: "US" } },
whois_enrichment: { source: { "OrgId": "CISCOS", "Parent": "NET ", "OrgAbuseName": "Cisco Systems Inc", "RegDate": " ", "OrgName": "Cisco Systems", "Address": "170 West Tasman Drive", "NetType": "Direct Assignment" }, dest: { "OrgId": "CISCOS", "Parent": "NET ", "OrgAbuseName": "Cisco Systems Inc", "RegDate": " ", "OrgName": "Cisco Systems", "Address": "170 West Tasman Drive", "NetType": "Direct Assignment" } } }
19 Streaming Topology Message Stream Shuffle Grouping Parsers Enrich Geo Enrich Whois Control Stream ALL Grouping Parser Enrich Enrich KAFKA Topic: Enriched Kafka Spout MESSAGE Parser Enrich Enrich KAFKA Topic: Enriched Parser Enrich Enrich KAFKA Topic: Enriched Parser Enrich Enrich KAFKA Topic: Enriched Parser Plugin Geo Plug-In Whois Plug-In SQL Store K-V Store
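The parse and enrich stages of the streaming topology can be sketched in a few lines of Python. This is only an illustration of the RAW to ENRICHED step, not OpenSOC code: the regular expression, the function names `parse` and `enrich_geo`, the in-memory `GEO_TABLE`, the timestamp, and the IP addresses (taken from the reserved documentation ranges, since the slide shows none) are all assumptions made for the example.

```python
import json
import re

# Hypothetical pattern for the sample log line; a real deployment would use
# one parser plug-in per telemetry source.
RAW_PATTERN = re.compile(
    r'^\[(?P<ts>[^\]]+)\] Log message: "(?P<msg>[^"]*)", generated from '
    r'(?P<domain>\S+), (?P<src_ip>[\d.]+):(?P<src_port>\d+) -> '
    r'(?P<dest_ip>[\d.]+):(?P<dest_port>\d+) \{(?P<proto>\w+)\}$'
)

# Stand-in for the geo enrichment store (hypothetical contents).
GEO_TABLE = {
    "192.0.2.10": {"city": "San Jose", "country": "US", "postalcode": "95134"},
}


def parse(raw):
    """Turn one raw log line into a flat dictionary of named fields."""
    m = RAW_PATTERN.match(raw)
    if m is None:
        raise ValueError("unparseable message")
    return {
        "msg_content": m["msg"],
        "processed": m["ts"],
        "src_ip": m["src_ip"],
        "src_port": int(m["src_port"]),
        "dest_ip": m["dest_ip"],
        "dest_port": int(m["dest_port"]),
        "domain": m["domain"],
        "protocol": m["proto"].lower(),
    }


def enrich_geo(event, geo_table=GEO_TABLE):
    """Attach geo data for source and destination; unknown IPs get {}."""
    enriched = dict(event)
    enriched["geo_enrichment"] = {
        "source": geo_table.get(event["src_ip"], {}),
        "dest": geo_table.get(event["dest_ip"], {}),
    }
    return enriched


raw = ('[2014-04-01T12:00:00Z] Log message: "This is a test log message", '
       'generated from mydomain.com, 192.0.2.10:123 -> 198.51.100.7:456 {TCP}')
event = enrich_geo(parse(raw))
print(json.dumps(event, indent=2))
```

Each raw message yields exactly one enriched message, which matches the 1:1 relation shown for OpenSOC-Streaming. A whois enrichment step would follow the same shape with a different lookup table.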
20 OpenSOC-Aggregation
KAFKA TOPIC: ENRICHED
{ msg_content: This is a test log message, processed: [TIMESTAMP], src_ip: , src_port: 123, dest_ip: , dest_port: 456, domain: mydomain.com, protocol: "tcp",
geo_enrichment: { source: { postalcode: 95134, areaCode: 408, metroCode: 807, longitude: , latitude: 37.425, locid: 4522, city: "san Jose", country: "US" }, dest: { postalcode: 95134, areaCode: 408, metroCode: 807, longitude: , latitude: 37.425, locid: 4522, city: "san Jose", country: "US" } },
whois_enrichment: { source: { "OrgId": "CISCOS", "Parent": "NET ", "OrgAbuseName": "Cisco Systems Inc", "RegDate": " ", "OrgName": "Cisco Systems", "Address": "170 West Tasman Drive", "NetType": "Direct Assignment" }, dest: { "OrgId": "CISCOS", "Parent": "NET ", "OrgAbuseName": "Cisco Systems Inc", "RegDate": " ", "OrgName": "Cisco Systems", "Address": "170 West Tasman Drive", "NetType": "Direct Assignment" } } }
1:* OpenSOC-Aggregation RULES OpenSOC-Streaming
HBASE
yyyy:mm:dd:hh:[bucket_id]: { total-messages: 1003, protocol-tcp: 305, protocol-udp: 201, city-from-phoenix: 11, city-from-sanjose: 405, country-from-usa: 10, country-from-ca: 11, ip-from-sketch(a): 301, ip-from-sketch(n): 55, port-from-sketch(a): 44, port-from-sketch(n): 12 }
21 Aggregation Topology Message Stream Shuffle Grouping Control Stream ALL Grouping Filters Filter Pulse Spout KEY Aggregators Aggregator HBase KEY:CF:COUNT TIMER Kafka Spout MESSAGE Filter Filter TUPLE TUPLE TUPLE TUPLE Aggregator Aggregator HBase KEY:CF:COUNT HBase KEY:CF:COUNT Filter Aggregator HBase KEY:CF:COUNT MAPPER FILTER AGGREGATE
22 HBase Aggregation Table Key Total- Messages Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature n
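The windowed rollup behind the aggregation table can be illustrated with an in-memory stand-in for HBase. The sketch below is an assumption-heavy example, not the OpenSOC implementation: the 10-minute bucket size, the `timestamp` field, and the helper names `row_key`, `features`, and `aggregate` are all hypothetical. It only shows how enriched events might be folded into per-bucket feature counters keyed as yyyy:mm:dd:hh:[bucket_id].

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

BUCKETS_PER_HOUR = 6  # assumption: 10-minute buckets


def row_key(epoch_seconds):
    """Build a yyyy:mm:dd:hh:bucket_id key from a UTC epoch timestamp."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    bucket_id = t.minute * BUCKETS_PER_HOUR // 60
    return t.strftime("%Y:%m:%d:%H") + f":{bucket_id}"


def features(event):
    """Yield the counter names that one enriched event contributes to."""
    yield "total-messages"
    yield f"protocol-{event['protocol']}"
    city = event.get("geo_enrichment", {}).get("source", {}).get("city")
    if city:
        yield "city-from-" + city.lower().replace(" ", "")


def aggregate(events):
    """Fold events into {row_key: Counter}; the dict stands in for HBase."""
    table = defaultdict(Counter)
    for e in events:
        table[row_key(e["timestamp"])].update(features(e))
    return table


events = [
    {"timestamp": 0, "protocol": "tcp",
     "geo_enrichment": {"source": {"city": "San Jose"}}},
    {"timestamp": 60, "protocol": "udp"},
    {"timestamp": 1500, "protocol": "tcp"},
]
table = aggregate(events)
for key in sorted(table):
    print(key, dict(table[key]))
```

One enriched message updates several counters, which is consistent with the 1:* relation on the aggregation slide.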
23 OpenSOC-ML
KAFKA TOPIC: ENRICHED
{ msg_content: This is a test log message, processed: [TIMESTAMP], src_ip: , src_port: 123, dest_ip: , dest_port: 456, domain: mydomain.com, protocol: "tcp",
geo_enrichment: { source: { postalcode: 95134, areaCode: 408, metroCode: 807, longitude: , latitude: 37.425, locid: 4522, city: "san Jose", country: "US" }, dest: { postalcode: 95134, areaCode: 408, metroCode: 807, longitude: , latitude: 37.425, locid: 4522, city: "san Jose", country: "US" } },
whois_enrichment: { source: { "OrgId": "CISCOS", "Parent": "NET ", "OrgAbuseName": "Cisco Systems Inc", "RegDate": " ", "OrgName": "Cisco Systems", "Address": "170 West Tasman Drive", "NetType": "Direct Assignment" }, dest: { "OrgId": "CISCOS", "Parent": "NET ", "OrgAbuseName": "Cisco Systems Inc", "RegDate": " ", "OrgName": "Cisco Systems", "Address": "170 West Tasman Drive", "NetType": "Direct Assignment" } } }
OpenSOC-Streaming KAFKA OpenSOC-ML *:*
alert: { anomaly-type: irregular pattern, models-triggered: [m1,m2,m4] }
24 ML Topology Pulse Spout Event ID Online Models Model Event ID TIMER Model Kafka Router Spout MESSAGE Offline Models Scorer Alert KAFKA Topic: Alerts Message Stream ALL Grouping Control Stream ALL Grouping Model Model Eval Algo HBase MR JOB Spark JOB HIVE Long-Term Data Store
25 Survey of Algorithms Used by OpenSOC
26 Types of Algorithms Offline: analyze entire data set at once; generally a good fit for Apache Hadoop/MapReduce; model compiled via batch, scored via stream processor. Online: start with an initial state and analyze each piece of data serially, one at a time; generally require a chain of MapReduce jobs; good fit for Apache Spark, Tez, Storm; primarily batch, good for Lambda architectures. Online Streaming: real-time, in-memory, limited analysis; generally a good fit for Apache Storm, Spark Streaming; probabilistic, random structures, approximations.
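To make the "initial state, one data point at a time" idea concrete, here is a minimal sketch of an online computation: Welford's algorithm for a running mean and variance. It is a generic textbook technique chosen for illustration, not something the slides attribute to OpenSOC, and the class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Running mean and sample variance, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stddev(self):
        return math.sqrt(self.variance)


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
print(stats.mean, stats.variance)
```

The state is three numbers regardless of how many values have been seen, which is what makes this style of algorithm suitable for a stream processor.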
27 Online Models Stream Processing Bolt Bolt Stream Bolt Bolt Bolt Bolt Result Batch Workflow Hadoop
28 Examples: Stream Clustering StreamKM++: computes a small weighted sample of the data stream and uses the k-means++ algorithm as a randomized seeding technique to choose the first values for the clusters. CluStream: micro-clusters are temporal extensions of cluster feature vectors. The micro-clusters are stored at snapshots in time following a pyramidal pattern. ClusTree: a parameter-free algorithm that automatically adapts to the speed of the stream and is capable of detecting concept drift, novelty, and outliers in the stream. DenStream: uses dense micro-clusters (named core-micro-clusters) to summarize clusters. To maintain and distinguish the potential clusters and outliers, this method presents core-micro-cluster and outlier micro-cluster structures. D-Stream: maps each input data record into a grid and computes the grid density. The grids are clustered based on the density. This algorithm adopts a density decaying technique to capture the dynamic changes of a data stream. CobWeb: uses a classification tree. Each node in a classification tree represents a class (concept) and is labeled by a probabilistic concept that summarizes the attribute-value distributions of objects classified under the node.
29 Example: Stream Classification Hoeffding Tree (VFDT): incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Half-Space Trees: ensemble model that randomly splits data into half spaces. They are created online and detect anomalies by their deviations in placement within the forest relative to other data from the same window.
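The split decision in a Hoeffding tree rests on the Hoeffding bound: after n observations of a quantity with range R, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability 1 - delta. The sketch below shows only that decision rule, not a full tree; the function names and the default delta are assumptions for the example, and the range of information gain is taken as log2 of the number of classes.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the observed
    mean with probability 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, num_classes=2, delta=1e-7):
    """Split once the best attribute beats the runner-up by more than
    epsilon. For information gain the range is log2(num_classes)."""
    value_range = math.log2(num_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


print(hoeffding_bound(1.0, 1e-7, 1000))
print(should_split(0.30, 0.15, n=100))
print(should_split(0.30, 0.15, n=1000))
```

With the same gain difference, the node is not split after 100 examples but is split after 1000, because the bound shrinks as more of the stream is seen.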
30 Examples: Outlier Detection [10] Median Absolute Deviation: Telemetry is anomalous if the deviation of its latest datapoint with respect to the median is X times larger than the median of deviations. Standard Deviation from Average: Telemetry is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than three standard deviations of the average. Standard Deviation from Moving Average: Telemetry is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than three standard deviations of the moving average.
31 Examples: Outlier Detection [10] Mean Subtraction Cumulation: Telemetry is anomalous if the value of the next datapoint in the series is farther than three standard deviations out in cumulative terms after subtracting the mean from each data point. Least Squares: Telemetry is anomalous if the average of the last three datapoints on a projected least squares model is greater than three sigma. Histogram Bins: Telemetry is anomalous if the average of the last three datapoints falls into a histogram bin with less than x
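Two of the outlier tests above are simple enough to sketch directly. The code below is a hedged illustration written from the slide descriptions, not a copy of any library: the function names are hypothetical, the MAD multiplier of 6 stands in for the unspecified "X", and the whole series mean is used in place of a moving average for brevity.

```python
import statistics


def median_absolute_deviation(series, threshold=6.0):
    """Anomalous if the latest point's deviation from the median exceeds
    `threshold` times the median of all deviations (threshold is assumed)."""
    median = statistics.median(series)
    deviations = [abs(x - median) for x in series]
    mad = statistics.median(deviations)
    if mad == 0:
        return False  # no spread to compare against
    return deviations[-1] / mad > threshold


def stddev_from_average(series, sigmas=3.0):
    """Anomalous if the mean of the last three points is more than
    `sigmas` standard deviations away from the series mean."""
    mean = statistics.mean(series)
    stdev = statistics.stdev(series)
    tail_avg = statistics.mean(series[-3:])
    return abs(tail_avg - mean) > sigmas * stdev


steady = [10, 12, 9, 11, 13, 10, 8, 12, 11, 9]
print(median_absolute_deviation(steady))
print(median_absolute_deviation(steady + [40]))
print(stddev_from_average(steady * 6))
print(stddev_from_average(steady * 6 + [40, 40, 40]))
```

Averaging the last three datapoints makes the second test less sensitive to a single spike; it reacts to a short sustained shift instead.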
32 Offline Models Stream Processing Stream Bolt Result Batch Workflow Stream Stream Stream Hadoop Batch JOB Model
33 Offline Algorithms Hypothesis Tests Chi2 Test (Goodness of Fit): A feature is anomalous if the data for the latest micro batch (for the last 10 minutes) comes from a different distribution than the historical distribution for that feature. Grubbs Test: telemetry is anomalous if the Z score is greater than the Grubbs score. Kolmogorov-Smirnov Test: check if data distribution for last 10 minutes is different from last hour. Simple Outliers test: telemetry is anomalous if the number of outliers for the last 10 minutes is statistically different than the historical number of outliers for that time frame. Decision Trees/Random Forests; Association Rules (Apriori); BIRCH/DBSCAN Clustering; Auto Regressive (AR) Moving Average (MA)
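As an example of the hypothesis tests listed above, the two-sample Kolmogorov-Smirnov comparison of "last 10 minutes" against "last hour" can be sketched in plain Python. The function names and the sample data are hypothetical, and the critical value uses the common large-sample approximation c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2), so treat it as a rough screen rather than an exact test.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_differs(sample_a, sample_b, alpha=0.05):
    """True if the samples look drawn from different distributions
    (large-sample approximation of the critical value)."""
    n, m = len(sample_a), len(sample_b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical


last_hour = list(range(100))               # hypothetical feature values
similar_window = list(range(0, 100, 5))    # same spread, fewer points
shifted_window = [x + 60 for x in range(20)]
print(ks_differs(last_hour, similar_window))
print(ks_differs(last_hour, shifted_window))
```

In a batch setting the historical sample would come from the long-term store and the recent window from the latest micro batch.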
34 Approximation for Streaming
35 Approximation for Streaming Skip Lists Trade memory for lookup time increase Hierarchical layered indexing on top of lists Lookup improvement: O(n) -> O(log n) Best Uses: Search for element in a list Approximate the number of distinct elements in a list
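A skip list is short enough to show in full. The following is a minimal sketch of the layered-index idea, with a hypothetical class name, a fixed maximum of 16 levels, and a promotion probability of 0.5 chosen for the example; it supports insertion, membership lookup, and in-order iteration only.

```python
import random


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # next node at each level


class SkipList:
    MAX_LEVEL = 16  # assumption: enough for tens of thousands of keys

    def __init__(self, p=0.5, seed=None):
        self._p = p
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self._p:
            level += 1
        return level

    def insert(self, key):
        # update[i] is the last node before the insert position at level i.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        # Descend from the sparsest level, moving right while keys are smaller.
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=7)
for k in [30, 10, 50, 20, 40]:
    sl.insert(k)
print(20 in sl, 25 in sl, list(sl))
```

The upper levels act as express lanes over the sorted bottom list, which is where the expected O(log n) lookup comes from; the extra forward pointers are the memory cost.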
36 Approximation for Streaming HyperLogLog Turns each feature to a hash value Keeps track of the longest run of a feature Approximates the number of occurrences of a feature based on its longest runs Best Uses: Estimate how rare an occurrence of a feature is Approximate count of distinct objects over time Estimate entropy of a data set
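A compact HyperLogLog can be written as follows. In the standard algorithm the "longest run" is the longest run of leading zero bits in each item's hash, tracked per register, and the result is an estimate of the number of distinct items. This is a simplified sketch under stated assumptions: SHA-1 truncated to 64 bits as the hash, 2^12 registers, the usual bias constant for 128 or more registers, and only the small-range (linear counting) correction. The class name is hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # valid for m >= 128

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        idx = h >> (64 - self.p)                       # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        # rank = leading zeros in `rest` plus one
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # small-range correction (linear counting)
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(50000):
    hll.add(f"10.0.{i // 256}.{i % 256}")   # hypothetical source addresses
print(round(hll.count()))
```

With 4096 registers the typical relative error is around 1.6 percent, and memory stays fixed no matter how many items are added.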
37 Approximation for Streaming Bloom Filters Trade memory for fast set membership check Hash all incoming elements to hash sets Use multiple hash functions and multiple hash sets of varying sizes Best Uses: Check element membership in a set
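A Bloom filter can be sketched in a few lines. Note the simplification: instead of several hash sets of varying sizes as described on the slide, this example uses the more common single bit array with k index positions derived from one SHA-256 digest by double hashing. The class name and default sizes are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
for i in range(50):
    seen.add(f"bad-domain-{i}.example")     # hypothetical blocklist entries
print("bad-domain-7.example" in seen)
```

A positive answer means "possibly in the set" and a negative answer means "definitely not in the set", so the structure suits fast pre-checks in front of an exact lookup.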
38 Approximation for Streaming Sketches Top-K: count of top elements in the stream Count-Min: histograms over large feature sets (similar to Bloom Filters with counts) Digest algorithm for computing approximate quantiles on a collection of integers
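The Count-Min sketch mentioned above can be illustrated as a small table of counters with one hash function per row. The width, depth, hashing scheme, and class name below are assumptions for the example; the estimate is the minimum over the rows, so it can overcount because of collisions but never undercounts.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get one function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


cms = CountMinSketch()
true_counts = {"192.0.2.1": 301, "192.0.2.2": 55, "192.0.2.3": 12}  # hypothetical
for ip, n in true_counts.items():
    cms.add(ip, n)
for ip in true_counts:
    print(ip, cms.estimate(ip))
```

Keeping a small heap of the items with the largest estimates alongside the sketch is one common way to obtain a Top-K view of the stream.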
39 Thank You Follow us on MTD Data Science Team James Sam Nate
40 Sources
[1] RSA Incident Response Teams are the New (Security) Black
[2] PWC - The Global State of Information Security Survey
[3] Tripwire 2014 IT Security Budget Forecast Roundup for CIOs and CISOs/CSOs
[4] NetworkWorld: Real-Time Big Data Security Analytics for Incident Detection
[5] CAS Static Analysis Tool Study
[6] TIBCO Cyber Security Platform
[7] Reuters
[8] Staffing the Information Security Organization
[9] SBT Global Survey us.westcon.com/documents/.../stbglobalsurveywpen11nov2013web.pdf
[10] Etsy Kale
Cisco OpenSOC Hadoop Design with Bro
Cisco OpenSOC Hadoop Design with Bro Kurt Grutzmacher (@grutz) Solutions Architect, Cisco Security Services 19 August 2014 What Does OpenSOC Address? SOC Challenges: Volume: from Terabytes to Zettabytes
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet)
HUAWEI Advanced Data Science with Spark Streaming Albert Bifet (@abifet) Huawei Noah s Ark Lab Focus Intelligent Mobile Devices Data Mining & Artificial Intelligence Intelligent Telecommunication Networks
Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My
Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My The Proverbial Needle In A Haystack Problem The Nuclear Option Problem Statement and Proposed
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
What s New in Security Analytics 10.4. Be the Hunter.. Not the Hunted
What s New in Security Analytics 10.4 Be the Hunter.. Not the Hunted Attackers Are Outpacing Detection Attacker Capabilities Time To Discovery Source: VERIZON 2014 DATA BREACH INVESTIGATIONS REPORT 2 TRANSFORM
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
The 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
Towards Smart and Intelligent SDN Controller
Towards Smart and Intelligent SDN Controller - Through the Generic, Extensible, and Elastic Time Series Data Repository (TSDR) YuLing Chen, Dell Inc. Rajesh Narayanan, Dell Inc. Sharon Aicler, Cisco Systems
Internet of Things. Opportunity Challenges Solutions
Internet of Things Opportunity Challenges Solutions Copyright 2014 Boeing. All rights reserved. GPDIS_2015.ppt 1 ANALYZING INTERNET OF THINGS USING BIG DATA ECOSYSTEM Internet of Things matter for... Industrial
Upcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC [email protected] Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
locuz.com Big Data Services
locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
HADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH
Real-time Data Analytics mit Elasticsearch Bernhard Pflugfelder inovex GmbH Bernhard Pflugfelder Big Data Engineer @ inovex Fields of interest: search analytics big data bi Working with: Lucene Solr Elasticsearch
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel
Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined
BIG DATA ANALYTICS For REAL TIME SYSTEM
BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage
HDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP
Operates more like a search engine than a database Scoring and ranking IP allows for fuzzy searching Best-result candidate sets returned Contextual analytics to correctly disambiguate entities Embedded
Transforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.
Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
Architectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet [email protected] October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
Getting the Most Out of SIEM. Presentation Title. Data in Big Data. Presented By: Dr. Char Sample, CERT
Getting the Most Out of SIEM Presentation Title Data in Big Data Presented By: Dr. Char Sample, CERT Acknowledgements Dr. Ben Shniederman, UMD Big Data Big Insights George Jones, John Stogoski, CERT Alternatives
PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management
PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
The basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
LOG AND EVENT MANAGEMENT FOR SECURITY AND COMPLIANCE
PRODUCT BRIEF LOG AND EVENT MANAGEMENT FOR SECURITY AND COMPLIANCE The Tripwire VIA platform delivers system state intelligence, a continuous approach to security that provides leading indicators of breach
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Hadoop's Advantages for Machine Learning and Predictive Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis
Webinar will begin shortly Hadoop's Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd's 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83–84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21–22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104–105 Application architecture, big data analytics
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
Creating Big Data Applications with Spring XD
Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these
Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop
1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal's Full Approach It's More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap
LOG INTELLIGENCE FOR SECURITY AND COMPLIANCE
PRODUCT BRIEF Given today's environment of sophisticated security threats, big data security intelligence solutions and regulatory compliance demands, the need for a log intelligence solution has become
The session is about to commence. Please switch your phone to silent!
The session is about to commence. Please switch your phone to silent! 1 Defend with Confidence Against Advanced Threats Nicholas Chia SE Manager, SEA RSA 2 TRUST? Years to earn, seconds to break 3 Market
Big Data & Security. Aljosa Pasic 12/02/2015
Big Data & Security Aljosa Pasic 12/02/2015 Welcome to Madrid!!! Big Data AND security: what is there on our minds? Big Data tools and technologies Big Data T&T chain and security/privacy concern mappings
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
Detect & Investigate Threats. OVERVIEW
Detect & Investigate Threats. OVERVIEW HIGHLIGHTS Introducing RSA Security Analytics, Providing: Security monitoring Incident investigation Compliance reporting Providing Big Data Security Analytics Enterprise-wide
The Next Big Thing in the Internet of Things: Real-time Big Data Analytics
The Next Big Thing in the Internet of Things: Real-time Big Data Analytics Dale Skeen CTO and Co-Founder 2014. VITRIA TECHNOLOGY, INC. All rights reserved. Internet of Things (IoT) Devices > People In
TORNADO Solution for Telecom Vertical
BIG DATA ANALYTICS & REPORTING TORNADO Solution for Telecom Vertical Overview Last decade has seen a rapid growth in wireless and mobile devices such as smartphones, tablets and netbooks is becoming very
Lambda Architecture for Batch and Real-Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real-Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
Sunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
Big Data and Advanced Analytics Technologies for the Smart Grid
1 Big Data and Advanced Analytics Technologies for the Smart Grid Arnie de Castro, PhD SAS Institute IEEE PES 2014 General Meeting July 27-31, 2014 Panel Session: Using Smart Grid Data to Improve Planning,
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22nd, 2014 1 What This Is; What This Is Not It's not specific to IoT It's not about any specific
Next Gen Hadoop Gather around the campfire and I will tell you a good YARN
Next Gen Hadoop Gather around the campfire and I will tell you a good YARN Akmal B. Chaudhri* Hortonworks *about.me/akmalchaudhri My background ~25 years experience in IT Developer (Reuters) Academic (City
Streamdrill: Analyzing Big Data Streams in Realtime
Streamdrill: Analyzing Big Data Streams in Realtime Mikio L. Braun [email protected] @mikiobraun th 6 Realtime Big Data: Sources Finance Gaming Monitoring Advertisement Sensor Networks Social Media
Openbus Documentation
Openbus Documentation Release 1 Produban February 17, 2014 An open source architecture able to process the massive amount of events that occur in a banking IT Infrastructure. Contents:
SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK
SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK Simple Machine Heuristic (SMH) Intelligent Agent (IA) Framework Tuesday, November 20, 2011 Randall Mora, David Harris, Wyn Hack Avum, Inc. Outline Solution
TIBCO Cyber Security Platform. Atif Chaughtai
TIBCO Cyber Security Platform Atif Chaughtai 2 TABLE OF CONTENTS 1 Introduction/Background... 3 2 Current Challenges... 3 3 Solution...4 4 CONCLUSION...6 5 A Case in Point: The US Intelligence Community...7
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
Time-Series Databases and Machine Learning
Time-Series Databases and Machine Learning Jimmy Bates November 2017 1 Top-Ranked Hadoop 1 3 5 7 Read Write File System World Record Performance High Availability Enterprise-grade Security Distribution
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute's Government
Advanced SOC Design. Next Generation Security Operations. Shane Harsch Senior Solutions Principal, MBA GCED CISSP RSA
Advanced SOC Design Next Generation Security Operations Shane Harsch Senior Solutions Principal, MBA GCED CISSP RSA 1 Why/How security investments need to shift Key functions of a Security Operations
CRITEO INTERNSHIP PROGRAM 2015/2016
CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with
Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today's Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.
Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
Where is... How do I get to...
Big Data, Fast Data, Spatial Data Making Sense of Location Data in a Smart City Hans Viehmann Product Manager EMEA ORACLE Corporation August 19, 2015 Copyright 2014, Oracle and/or its affiliates. All rights
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
Real Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
Streaming Analytics A Framework for Innovation
Streaming Analytics A Framework for Innovation Jan Humble Solutions Architect 1 Volume and Scale of Sensing Data Can you TURN IT ON? Can you Identify Insights in REAL-TIME? Can you REACT and ENGAGE in
Kafka & Redis for Big Data Solutions
Kafka & Redis for Big Data Solutions Christopher Curtin Head of Technical Research @ChrisCurtin About Me 25+ years in technology Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)
Big Data Management and Security
Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value
Case Study: Instrumenting a Network for NetFlow Security Visualization Tools
Case Study: Instrumenting a Network for NetFlow Security Visualization Tools William Yurcik* Yifan Li SIFT Research Group National Center for Supercomputing Applications (NCSA) University of Illinois at
Security Analytics for Smart Grid
Security Analytics for Smart Grid Dr. Robert W. Griffin Chief Security Architect RSA, the Security Division of EMC [email protected] blogs.rsa.com/author/griffin @RobtWesGriffin 1 No Shortage of Hard
WHITE PAPER SPLUNK SOFTWARE AS A SIEM
SPLUNK SOFTWARE AS A SIEM Improve your security posture by using Splunk as your SIEM HIGHLIGHTS Splunk software can be used to operate security operations centers (SOC) of any size (large, med, small)
SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS
Enterprise Data Problems in Investment Banks BigData History and Trend Driven by Google CAP Theorem for Distributed Computer System Open Source Building Blocks: Hadoop, Solr, Storm.. 3548 Hypothetical
BIG DATA IN THE CLOUD: CHALLENGES AND OPPORTUNITIES MARY-JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD: CHALLENGES AND OPPORTUNITIES MARY-JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
Big Data and Industrial Internet
Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University [email protected] 16.6-2015
How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 Twenty-first
HiBench Introduction. Carson Wang ([email protected]) Software & Services Group
HiBench Introduction Carson Wang ([email protected]) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
SAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SAP @cgadalla SESSION CODE: 603
SAP Predictive Analytics: An Overview and Roadmap Charles Gadalla, SAP @cgadalla SESSION CODE: 603 Advanced Analytics SAP Vision Embed Smart Agile Analytics into Decision Processes to Deliver Business
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli, VP Product, In-Memory Compute Summit, June 30, 2015
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli, VP Product In-Memory Compute Summit June 30, 2015 Streaming Analytics S S S Scale-up Database Data And Compute Grid
Big Data Patterns. Ron Bodkin Founder and President, Think Big
Big Data Patterns Ron Bodkin Founder and President, Think Big 1 About Me Ron Bodkin Founder and President, Think Big I have 9 years experience working with Big Data and Hadoop. In 2010, I founded Think
Infrastructures for big data
Infrastructures for big data Rasmus Pagh 1 Today's lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)
