Cisco OpenSOC Hadoop Design with Bro

Size: px
Start display at page:

Download "Cisco OpenSOC Hadoop Design with Bro"

Transcription

1 Cisco OpenSOC Hadoop Design with Bro Kurt Grutzmacher Solutions Architect, Cisco Security Services 19 August 2014

2 What Does OpenSOC Address? SOC Challenges: Volume: from Terabytes to Zettabytes Variety: structured to unstructured Velocity: everything is a sensor, generates logs Veracity: data quality issues Value: What to keep? Purge? Summarize? Cisco Public 2

3 Introducing OpenSOC Supports Data-Driven Security Model Unified platform for ingest, storage, analytics Provides multiple views/access patterns for data Interactive Analytics and Predictive Modeling Provides contextual real-time alerts Rapid deployment and scoring Cisco Public 3

4 Emergence of Big Data Systems that scale out, not up Parallel and scalable computation tools Cheap, massively-scalable storage Stream computation + stream analysis Scaling + approximation Cisco Public 4

5 Technology Behind OpenSOC Telemetry Capture Layer: Data Bus: Stream Processor: Real-Time Index and Search: Long-Term Data Store: Long-Term Packet Store: Visualization Platform: Apache Flume Apache Kafka Apache Storm Elastic Search Apache Hive Apache Hbase Kibana Cisco Public 5

6 Introducing OpenSOC Intersection of Big Data and Security Analytics Real-Time Alerts Anomaly Detection Data Correlation Hadoop Scalable Compute Multi Petabyte Storage Interactive Query Real-Time Search OpenSOC Rules and Reports Predictive Modeling UI and Applications Elastic Search Unstructured Data Scalable Stream Processing Data Access Control Big Data Platform Cisco Public 6

7 OpenSOC in a Nutshell Threat Intelligence Feeds Raw Network Stream Applications + Analyst Tools Network Metadata Stream Netflow Syslog Raw Application Logs Parse + Format Enrich Alert Log Mining and Analytics Elastic Search Network Packet Mining and PCAP Reconstruction HBase Big Data Exploration, Predictive Modeling Hive Other Streaming Telemetry Real-Time Index Raw Packet Store Long-Term Store Enrichment Data Cisco Public 7

8 Overview of the Analytics Pipeline Cisco Public 8

9 Analytics Pipeline Flume Kafka Storm OpenSOC-Streaming RAW Transform Enrich Alert (Rules-Based) Big Data Stores Elastic Search Real-Time Index and Search HIVE Long-Term Data Store OpenSOC-Aggregation Filter Aggregators HBase Windowed Rollup Store Enriched OpenSOC-ML Router Model 1 Model 2 Scorer HIVE Alerts Store Model n Alerts UI UI UI SOC Alert Consumers UI UI Web Services Secure Gateway Services External Alert Consumers Remedy Ticketing System Cisco Public 9

10 OpenSOC-Streaming KAFKA TOPIC: RAW [TIMESTAMP] Log message: "This is a test [TIMESTAMP] log message", Log generated message: from "This is a mydomain.com, test [TIMESTAMP] log message", :123 Log generated message: -> from "This is a :456 mydomain.com, test log message", {TCP} :123 generated -> from :456 mydomain.com, {TCP} :123 -> :456 {TCP} OpenSOC-Streaming 1:1 KAFKA TOPIC: ENRICHED { msg_content : This is a test log message, processed: { msg_content : [TIMESTAMP], This is src_ip: a test log , message, src_port: 123, dest_ip: , dest_port: processed: { msg_content : 456, [TIMESTAMP], domain: This mydomain.com, is src_ip: a test log , message, protocol:"tcp" src_port: 123, dest_ip: , geo_enrichment:{ dest_port: processed: 456, [TIMESTAMP], domain: mydomain.com, src_ip: , protocol:"tcp" src_port: 123, dest_ip: , geo_enrichment:{ dest_port: 456, source:{ domain: mydomain.com, protocol:"tcp" geo_enrichment:{ source:{ postalcode: 95134,areaCode: 408,metroCode: 807,longitude: , source:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ postalcode: latitude:37.425,locid:4522,city:"san 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"}} 807,longitude: , whois_enrichment:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"}} 807,longitude: , whois_enrichment:{ source: { latitude:37.425,locid:4522,city:"san Jose",country:"US"}} whois_enrichment:{ source: OrgId":"CISCOS","Parent":"NET ", { source: OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", { Systems Inc", "RegDate":" ","OrgName":"Cisco OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", Systems Inc", Systems", "Address":"170 "RegDate":" ","OrgName":"Cisco OrgAbuseName":"Cisco West Tasman Drive", Systems Inc", Systems", "NetType":"Direct "Address":"170 "RegDate":" ","OrgName":"Cisco Assignment"} West Tasman Drive", Systems", dest: { "NetType":"Direct "Address":"170 Assignment"} West Tasman Drive", dest: { OrgId":"CISCOS","Parent":"NET ", "NetType":"Direct Assignment"} dest: "OrgAbuseName":"Cisco { OrgId":"CISCOS","Parent":"NET ", Systems Inc", "RegDate":" ","OrgName":"Cisco "OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", Systems Inc", Systems", "Address":"170 "RegDate":" ","OrgName":"Cisco "OrgAbuseName":"Cisco West Tasman Drive", Systems Inc", Systems", "NetType":"Direct "Address":"170 "RegDate":" ","OrgName":"Cisco Assignment"}}} West Tasman Drive", Systems", "NetType":"Direct "Address":"170 Assignment"}}} West Tasman Drive", "NetType":"Direct Assignment"}}} Cisco Public 10

11 Streaming Topology Message Stream Shuffle Grouping Parsers Enrich Geo Enrich Whois Control Stream ALL Grouping Parser Enrich Enrich KAFKA Topic: Enriched Kafka Spout MESSAGE Parser Enrich Enrich KAFKA Topic: Enriched Parser Enrich Enrich KAFKA Topic: Enriched Parser Enrich Enrich KAFKA Topic: Enriched Parser Plugin Geo Plug-In Whois Plug-In SQL Store K-V Store Cisco Public 11

12 OpenSOC-Aggregation KAFKA TOPIC: ENRICHED { msg_content : This is a test log message, processed: { msg_content : [TIMESTAMP], This is src_ip: a test log , message, src_port: 123, dest_ip: , dest_port: processed: { msg_content : 456, [TIMESTAMP], domain: This mydomain.com, is src_ip: a test log , message, protocol:"tcp" src_port: 123, dest_ip: , geo_enrichment:{ dest_port: processed: 456, [TIMESTAMP], domain: mydomain.com, src_ip: , protocol:"tcp" src_port: 123, dest_ip: , geo_enrichment:{ dest_port: 456, source:{ domain: mydomain.com, protocol:"tcp" geo_enrichment:{ source:{ postalcode: 95134,areaCode: 408,metroCode: 807,longitude: , source:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ postalcode: latitude:37.425,locid:4522,city:"san 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"}} 807,longitude: , whois_enrichment:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"}} 807,longitude: , whois_enrichment:{ source: { latitude:37.425,locid:4522,city:"san Jose",country:"US"}} whois_enrichment:{ source: OrgId":"CISCOS","Parent":"NET ", { source: OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", { Systems Inc", "RegDate":" ","OrgName":"Cisco OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", Systems Inc", Systems", "Address":"170 "RegDate":" ","OrgName":"Cisco OrgAbuseName":"Cisco West Tasman Drive", Systems Inc", Systems", "NetType":"Direct "Address":"170 "RegDate":" ","OrgName":"Cisco Assignment"} West Tasman Drive", Systems", dest: { "NetType":"Direct "Address":"170 Assignment"} West Tasman Drive", dest: { OrgId":"CISCOS","Parent":"NET ", "NetType":"Direct Assignment"} dest: "OrgAbuseName":"Cisco { OrgId":"CISCOS","Parent":"NET ", Systems Inc", "RegDate":" ","OrgName":"Cisco "OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", Systems Inc", Systems", "Address":"170 "RegDate":" ","OrgName":"Cisco "OrgAbuseName":"Cisco West Tasman Drive", Systems Inc", Systems", "NetType":"Direct "Address":"170 "RegDate":" ","OrgName":"Cisco Assignment"}}} West Tasman Drive", Systems", "NetType":"Direct "Address":"170 Assignment"}}} West Tasman Drive", "NetType":"Direct Assignment"}}} OpenSOC-Aggregation RULES 1:* HBASE yyyy:mm:dd:hh:[bucket_id]: { total-messages:1003 protocol-tcp:305 protocol-udp:201 city-from-phoenix:11 city-from-sanjose:405 country-from-usa:10 country-from-ca:11 ip-from-sketch(a):301 ip-from-sketch(n):55 port-from-sketch(a):44 port-from-sketch(n):12 } OpenSOC-Streaming Cisco Public 12

13 Aggregation Topology Message Stream Shuffle Grouping Control Stream ALL Grouping Filters Filter Pulse Spout KEY Aggregators Aggregator HBase KEY:CF:COUNT TIMER Kafka Spout MESSAGE Filter Filter TUPLE TUPLE TUPLE TUPLE Aggregator Aggregator HBase KEY:CF:COUNT HBase KEY:CF:COUNT Filter Aggregator HBase KEY:CF:COUNT MAPPER FILTER AGGREGATE Cisco Public 13

14 OpenSOC-ML KAFKA TOPIC: ENRICHED { msg_content : This is a test log message, processed: { msg_content : [TIMESTAMP], This is src_ip: a test log , message, src_port: 123, dest_ip: , dest_port: processed: { msg_content : 456, [TIMESTAMP], domain: This mydomain.com, is src_ip: a test log , message, protocol:"tcp" src_port: 123, dest_ip: , geo_enrichment:{ dest_port: processed: 456, [TIMESTAMP], domain: mydomain.com, src_ip: , protocol:"tcp" src_port: 123, dest_ip: , geo_enrichment:{ dest_port: 456, source:{ domain: mydomain.com, protocol:"tcp" geo_enrichment:{ source:{ postalcode: 95134,areaCode: 408,metroCode: 807,longitude: , source:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ postalcode: latitude:37.425,locid:4522,city:"san 95134,areaCode: 408,metroCode: Jose",country:"US"} 807,longitude: , dest:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"}} 807,longitude: , whois_enrichment:{ latitude:37.425,locid:4522,city:"san postalcode: 95134,areaCode: 408,metroCode: Jose",country:"US"}} 807,longitude: , whois_enrichment:{ source: { latitude:37.425,locid:4522,city:"san Jose",country:"US"}} whois_enrichment:{ source: OrgId":"CISCOS","Parent":"NET ", { source: OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", { Systems Inc", "RegDate":" ","OrgName":"Cisco OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", Systems Inc", Systems", "Address":"170 "RegDate":" ","OrgName":"Cisco OrgAbuseName":"Cisco West Tasman Drive", Systems Inc", Systems", "NetType":"Direct "Address":"170 "RegDate":" ","OrgName":"Cisco Assignment"} West Tasman Drive", Systems", dest: { "NetType":"Direct "Address":"170 Assignment"} West Tasman Drive", dest: { OrgId":"CISCOS","Parent":"NET ", "NetType":"Direct Assignment"} dest: "OrgAbuseName":"Cisco { OrgId":"CISCOS","Parent":"NET ", Systems Inc", "RegDate":" ","OrgName":"Cisco "OrgAbuseName":"Cisco OrgId":"CISCOS","Parent":"NET ", Systems Inc", Systems", "Address":"170 "RegDate":" ","OrgName":"Cisco "OrgAbuseName":"Cisco West Tasman Drive", Systems Inc", Systems", "NetType":"Direct "Address":"170 "RegDate":" ","OrgName":"Cisco Assignment"}}} West Tasman Drive", Systems", "NetType":"Direct "Address":"170 Assignment"}}} West Tasman Drive", "NetType":"Direct Assignment"}}} KAFKA Topic: ML Alerts OpenSOC-Streaming OpenSOC-ML *:* alert: { } anomaly-type: irregular pattern, models-triggered: [m1,m2,m4] Cisco Public 14

15 ML Topology Pulse Spout Event ID Online Models Model Event ID TIMER Model Kafka Router Spout MESSAGE Offline Models Scorer Alert KAFKA Topic: Alerts Message Stream ALL Grouping Control Stream ALL Grouping Model Model Eval Algo HBase MR JOB Spark JOB HIVE Long-Term Data Store Cisco Public 15

16 Packet Metadata Cisco Public 16

17 OpenSOC - Stitching Things Together Source Systems Data Collection Messaging System Real Time Processing Storage Access Passive Tap Traffic Replicator PCAP Kafka PCAP Topic Storm PCAP Topology Hive Raw Data Analytic Tools R / Python Telemetry Sources Syslog HTTP File System Flume Agent A Agent B DPI Topic A Topic B Topic DPI Topology Deeper Look A Topology B Topology ORC Elastic Search Index HBase Power Pivot Tableau Web Services Search Other Agent N N Topic N Topology PCAP Table PCAP Reconstruction Cisco Public 17

18 PCAP Topology Real Time Processing Storm Storage Hive Raw Data HDFS Bolt ORC Elastic Search Kafka Spout Parser Bolt ES Bolt Index HBase Bolt HBase PCAP Table Cisco Public 18

19 DPI Topology & Telemetry Enrichment Real Time Processing Storm Storage Hive HDFS Bolt Raw Data ORC Kafka Spout GEO Enrich CIF Enrich Elastic Search ES Bolt Index Parser Bolt Whois Enrich HBase PCAP Table Cisco Public 19

20 Enrichments {! msg_key1 : msg value1,! src_ip : ,! dest_ip : ,! domain : mydomain.com! }! "geo":[ {"region":"ca",! "postalcode":"95134",! "areacode":"408",! "metrocode":"807",! "longitude": ,! "latitude":37.425,! "locid":4522,! "city":"san Jose",! "country":"us"! }]! "whois":[ {! "OrgId":"CISCOS",! "Parent":"NET ",! "OrgAbuseName":"Cisco Systems Inc",! "RegDate":" ",! "OrgName":"Cisco Systems",! "Address":"170 West Tasman Drive",! "NetType":"Direct Assignment"! } ],! cif : Yes! RAW Message Parser Bolt GEO Enrich Cache Whois Enrich Cache Threat Intel Enrich Cache Enriched Message MySQL HBase HBase Geo Lite Data Whois Data CIF Data Cisco Public 20

21 Producing Bro Messages to OpenSOC Bro 2.2/2.3: Writer::LogKafka addition (will not be merged) Not fully baked but spits out a ton of messages Awaiting new Logging process (BIT-1222) Let Bro do its DPD functionality Let the Hadoop cluster do the enrichment Off-loading capture cards help let the CPU better do its chores Tie your Bro capture system to HDFS and extract files to it! Cisco Public 21

22 What OpenSOC is NOT easy to install and get working quickly Remember Bro 1.x? It s sorta like that right now A good framework but really tough to put together but getting better a killer of your existing SIEM At least not yet that silver bullet which will solve all your security problems More like a rusty nail which you re just about to step on with Java Cisco Public 22

23 Algorithms Cisco Public 23

24 Types of Algorithms Offline analyze entire data set at once Generally a good fit for Apache Hadoop/Map Reduce Model compiled via batch, scored via stream processor Online start with an initial state and analyze each piece of data serially one at a time Generally require a chain of Map Reduce jobs Good fit for Apache Spark, Tez, Storm Primarily batch, good for Lambda architectures Online Streaming real-time, in-memory, limited analysis Generally a good fit for Apache Storm, Spark-Streaming Probabilistic, random structures, approximations Cisco Public 24

25 Online Models Stream Processing Bolt Bolt Stream Bolt Bolt Bolt Bolt Result Batch Workflow Hadoop Cisco Public 25

26 Examples: Stream Clustering StreamKM++: computes a small weighted sample of the data stream and it uses the k-means++ algorithm as a randomized seeding technique to choose the first values for the clusters. CluStream: micro-clusters are temporal extensions of cluster feature vectors. The micro-clusters are stored at snapshots in time following a pyramidal pattern. ClusTree: It is a parameter free algorithm automatically adapting to the speed of the stream and it is capable of detecting concept drift, novelty, and outliers in the stream. DenStream: uses dense micro-clusters (named core-micro-cluster) to summarize clusters. To maintain and distinguish the potential clusters and outliers, this method presents core-micro-cluster and outlier micro-cluster structures. D-Stream: maps each input data record into a grid and it computes the grid density. The grids are clustered based on the density. This algorithm adopts a density decaying technique to capture the dynamic changes of a data stream. CobWeb: uses a classification tree. Each node in a classification tree represents a class (concept) and is labeled by a probabilistic concept that summarizes the attribute-value distributions of objects classified under the node Cisco Public 26

27 Example: Stream Classification Hoeffding Tree (VFDT) incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time Half-Space Trees ensemble model that randomly spits data into half spaces. They are created online and detect anomalies by their deviations in placement within the forest relative to other data from the same window Cisco Public 27

28 Examples: Outlier Detection[10] Median Absolute Deviation: Telemetry is anomalous if the deviation of its latest datapoint with respect to the median is X times larger than the median of deviations Standard Deviation from Average: Telemetry is anomalous if the absolute value of the average of the latest three datapoint minus the moving average is greater than three standard deviations of the average. Standard Deviation from Moving Average: Telemetry is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than three standard deviations of the moving average. Cisco Public 28

29 Examples: Outlier Detection [10] Mean Subtraction Cumulation: Telemetry is anomalous if the value of the next datapoint in the series is farther than three standard deviations out in cumulative terms after subtracting the mean from each data point Least Squares: Telemetry is anomalous if the average of the last three datapoints on a projected least squares model is greater than three sigma Histogram Bins: Telemetry is anomalous if the average of the last three datapoints falls into a histogram bin with less than x Cisco Public 29

30 Offline Models Stream Processing Stream Bolt Result Batch Workflow Stream Stream Hadoop Batch JOB Model Cisco Public 30

31 Offline Algorithms Hypothesis Tests Chi2 Test (Goodness of Fit): A feature is anomalous if it the data for the latest micro batch (for the last 10 minutes) comes from a different distribution than the historical distribution for that feature Grubbs Test: telemetry is anomalous if the Z score is greater than the Grubb's score. Kolmogorov-Smirnov Test: check if data distribution for last 10 minutes is different from last hour Simple Outliers test: telemetry is anomalous if the number of outliers for the last 10 minutes is statistically different then the historical number of outliers for that time frame Decision Trees/Random Forests Association Rules (Apriori) BIRCH/DBSCAN Clustering Auto Regressive (AR) Moving Average (MA) Cisco Public 31

32 Approximation for Streaming Cisco Public 32

33 Approximation for Streaming Skip Lists Trade memory for lookup time increase Hierarchical layered indexing on top of lists Lookup improvement: O(n) -> O(log n) Best Uses: Search for element in a list Approximate the number of distinct elements in a list Cisco Public 33

34 Approximation for Streaming HyperLogLog Turns each feature to a hash value Keeps track of the longest run of a feature Approximates the number of occurrences of a feature based on it s longest runs Best Uses: Estimate how rare an occurrence of a feature is Approximate count of distinct objects over time Estimate entropy of a data set Cisco Public 34

35 Approximation for Streaming Bloom Filters Trade memory for fast set membership check Hash all incoming elements to hash sets Use multiple hash functions and multiple hash sets of varying sizes Best Uses: Check element membership in a set Cisco Public 35

36 Approximation for Streaming Sketches Top-K: count of top elements in the stream Count-Min: histograms over large feature sets (similar to Bloom Filters with counts) Digest algorithm for computing approximate quantiles on a collection of integers Cisco Public 36

37 @ProjectOpenSOC Cisco Public 37

38 Thank you.

Combat Sophisticated Threats

Combat Sophisticated Threats Combat Sophisticated Threats How Big Data and OpenSOC Could Help SESSION ID: SEC-T11 James Sirota Big Data Architect/Data Scientist Cisco Security Solutions Practice @JamesSirota In the next few minutes

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My

Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My The Proverbial Needle In A Haystack Problem The Nuclear Option Problem Statement and Proposed

More information

HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet)

HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet) HUAWEI Advanced Data Science with Spark Streaming Albert Bifet (@abifet) Huawei Noah s Ark Lab Focus Intelligent Mobile Devices Data Mining & Artificial Intelligence Intelligent Telecommunication Networks

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

HADOOP. Revised 10/19/2015

HADOOP. Revised 10/19/2015 HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...

More information

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015 1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

HiBench Introduction. Carson Wang ([email protected]) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang ([email protected]) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation

More information

What s New in Security Analytics 10.4. Be the Hunter.. Not the Hunted

What s New in Security Analytics 10.4. Be the Hunter.. Not the Hunted What s New in Security Analytics 10.4 Be the Hunter.. Not the Hunted Attackers Are Outpacing Detection Attacker Capabilities Time To Discovery Source: VERIZON 2014 DATA BREACH INVESTIGATIONS REPORT 2 TRANSFORM

More information

Towards Smart and Intelligent SDN Controller

Towards Smart and Intelligent SDN Controller Towards Smart and Intelligent SDN Controller - Through the Generic, Extensible, and Elastic Time Series Data Repository (TSDR) YuLing Chen, Dell Inc. Rajesh Narayanan, Dell Inc. Sharon Aicler, Cisco Systems

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Openbus Documentation

Openbus Documentation Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:

More information

Next Gen Hadoop Gather around the campfire and I will tell you a good YARN

Next Gen Hadoop Gather around the campfire and I will tell you a good YARN Next Gen Hadoop Gather around the campfire and I will tell you a good YARN Akmal B. Chaudhri* Hortonworks *about.me/akmalchaudhri My background ~25 years experience in IT Developer (Reuters) Academic (City

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015 Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH Real-time Data Analytics mit Elasticsearch Bernhard Pflugfelder inovex GmbH Bernhard Pflugfelder Big Data Engineer @ inovex Fields of interest: search analytics big data bi Working with: Lucene Solr Elasticsearch

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"

More information

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the

More information

Creating Big Data Applications with Spring XD

Creating Big Data Applications with Spring XD Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these

More information

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,

More information

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Big Data & Security. Aljosa Pasic 12/02/2015

Big Data & Security. Aljosa Pasic 12/02/2015 Big Data & Security Aljosa Pasic 12/02/2015 Welcome to Madrid!!! Big Data AND security: what is there on our minds? Big Data tools and technologies Big Data T&T chain and security/privacy concern mappings

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP Operates more like a search engine than a database Scoring and ranking IP allows for fuzzy searching Best-result candidate sets returned Contextual analytics to correctly disambiguate entities Embedded

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap

More information

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches. Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

Data Challenges in Telecommunications Networks and a Big Data Solution

Data Challenges in Telecommunications Networks and a Big Data Solution Data Challenges in Telecommunications Networks and a Big Data Solution Abstract The telecom networks generate multitudes and large sets of data related to networks, applications, users, network operations

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet [email protected] October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

Detect & Investigate Threats. OVERVIEW

Detect & Investigate Threats. OVERVIEW Detect & Investigate Threats. OVERVIEW HIGHLIGHTS Introducing RSA Security Analytics, Providing: Security monitoring Incident investigation Compliance reporting Providing Big Data Security Analytics Enterprise-wide

More information

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora {mbalassi, gyfora}@apache.org The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC [email protected] Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

Time-Series Databases and Machine Learning

Time-Series Databases and Machine Learning Time-Series Databases and Machine Learning Jimmy Bates November 2017 1 Top-Ranked Hadoop 1 3 5 7 Read Write File System World Record Performance High Availability Enterprise-grade Security Distribution

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

CRITEO INTERNSHIP PROGRAM 2015/2016

CRITEO INTERNSHIP PROGRAM 2015/2016 CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Designing Agile Data Pipelines. Ashish Singh Software Engineer, Cloudera

Designing Agile Data Pipelines. Ashish Singh Software Engineer, Cloudera Designing Agile Data Pipelines Ashish Singh Software Engineer, Cloudera About Me Software Engineer @ Cloudera Contributed to Kafka, Hive, Parquet and Sentry Used to work in HPC @singhasdev 204 Cloudera,

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid

More information

Search and Real-Time Analytics on Big Data

Search and Real-Time Analytics on Big Data Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its

More information

Streamdrill: Analyzing Big Data Streams in Realtime

Streamdrill: Analyzing Big Data Streams in Realtime Streamdrill: Analyzing Big Data Streams in Realtime Mikio L. Braun [email protected] @mikiobraun th 6 Realtime Big Data: Sources Finance Gaming Monitoring Advertisment Sensor Networks Social Media

More information

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5 Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem

More information

Real Time Data Processing using Spark Streaming

Real Time Data Processing using Spark Streaming Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O

More information

How Companies are! Using Spark

How Companies are! Using Spark How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made

More information

Streaming items through a cluster with Spark Streaming

Streaming items through a cluster with Spark Streaming Streaming items through a cluster with Spark Streaming Tathagata TD Das @tathadas CME 323: Distributed Algorithms and Optimization Stanford, May 6, 2015 Who am I? > Project Management Committee (PMC) member

More information

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Kafka & Redis for Big Data Solutions

Kafka & Redis for Big Data Solutions Kafka & Redis for Big Data Solutions Christopher Curtin Head of Technical Research @ChrisCurtin About Me 25+ years in technology Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)

More information

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily

More information

Innovative, High-Density, Massively Scalable Packet Capture and Cyber Analytics Cluster for Enterprise Customers

Innovative, High-Density, Massively Scalable Packet Capture and Cyber Analytics Cluster for Enterprise Customers Innovative, High-Density, Massively Scalable Packet Capture and Cyber Analytics Cluster for Enterprise Customers The Enterprise Packet Capture Cluster Platform is a complete solution based on a unique

More information

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis Webinar will begin shortly Hadoop s Advantages for Machine Learning and Predictive Analytics Presented by Hortonworks & Zementis September 10, 2014 Copyright 2014 Zementis, Inc. All rights reserved. 2

More information

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011 Real-time Analytics at Facebook: Data Freeway and Puma Zheng Shao 12/2/2011 Agenda 1 Analytics and Real-time 2 Data Freeway 3 Puma 4 Future Works Analytics and Real-time what and why Facebook Insights

More information

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS Enterprise Data Problems in Investment Banks BigData History and Trend Driven by Google CAP Theorem for Distributed Computer System Open Source Building Blocks: Hadoop, Solr, Storm.. 3548 Hypothetical

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress

More information

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

AlienVault Unified Security Management Solution Complete. Simple. Affordable Life Cycle of a log

AlienVault Unified Security Management Solution Complete. Simple. Affordable Life Cycle of a log Complete. Simple. Affordable Copyright 2014 AlienVault. All rights reserved. AlienVault, AlienVault Unified Security Management, AlienVault USM, AlienVault Open Threat Exchange, AlienVault OTX, Open Threat

More information

http://glennengstrand.info/analytics/fp

http://glennengstrand.info/analytics/fp Functional Programming and Big Data by Glenn Engstrand (September 2014) http://glennengstrand.info/analytics/fp What is Functional Programming? It is a style of programming that emphasizes immutable state,

More information

Case Study: Instrumenting a Network for NetFlow Security Visualization Tools

Case Study: Instrumenting a Network for NetFlow Security Visualization Tools Case Study: Instrumenting a Network for NetFlow Security Visualization Tools William Yurcik* Yifan Li SIFT Research Group National Center for Supercomputing Applications (NCSA) University of Illinois at

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System [email protected] Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

TORNADO Solution for Telecom Vertical

TORNADO Solution for Telecom Vertical BIG DATA ANALYTICS & REPORTING TORNADO Solution for Telecom Vertical Overview Last decade has see a rapid growth in wireless and mobile devices such as smart- phones, tablets and netbook is becoming very

More information