Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!



Similar documents
Unified Batch & Stream Processing Platform

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

XpoLog Competitive Comparison Sheet

Luncheon Webinar Series May 13, 2013

Real Time Data Processing using Spark Streaming

Big Data Analytics Nokia

Big Data and Analytics in Government

HDP Hadoop From concept to deployment.

SAP and Hortonworks Reference Architecture

WHITE PAPER DataTorrent RTS: Real-Time Streaming Analytics for Big Data

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

The Future of Data Management

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

The 4 Pillars of Technosoft s Big Data Practice

Comprehensive Analytics on the Hortonworks Data Platform

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Jitterbit Technical Overview : Microsoft Dynamics CRM

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

The Lab and The Factory

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

HDP Enabling the Modern Data Architecture

The Enterprise Data Hub and The Modern Information Architecture

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Upcoming Announcements

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Ganzheitliches Datenmanagement

IBM BigInsights for Apache Hadoop

Cisco Data Preparation

Databricks. A Primer

WHAT S NEW IN SAS 9.4

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Oracle Big Data SQL Technical Update

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Real Time Big Data Processing

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Databricks. A Primer

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Jitterbit Technical Overview : Salesforce

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Oracle Database 12c Plug In. Switch On. Get SMART.

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Are You Ready for Big Data?

Predictive Analytics with Storm, Hadoop, R on AWS

BIG DATA TRENDS AND TECHNOLOGIES

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

Using Tableau Software with Hortonworks Data Platform

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Embracing the Cloud, Mobile, Social & Big Data

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Architecting for the Internet of Things & Big Data

How Companies are! Using Spark

Getting Started Practical Input For Your Roadmap

A Modern Data Architecture with Apache Hadoop

Big Data Use Case: Business Analytics

tuplejump The data engineering platform

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Real-time Big Data Analytics with Storm

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Search and Real-Time Analytics on Big Data

How To Use Big Data For Telco (For A Telco)

IBM Big Data Platform

Virtualizing Apache Hadoop. June, 2012

Big Data Management and Security

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization

BAO & Big Data Overview Applied to Real-time Campaign GSE. Joel Viale Telecom Solutions Lab Solution Architect. Telecom Solutions Lab

Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics. An Oracle White Paper October 2013

Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop

Analyzing Big Data with AWS

the missing log collector Treasure Data, Inc. Muga Nishizawa

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Transcription:

Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Streaming Analy.cs S S S Scale- up Database Data And Compute Grid Clustered Database D G G G D D D General- purpose data processing cluster B B B

DataTorrent enables enterprises to process data in motion and take action in real-time! Faster Time to Insight! Faster Time to Action! Trend: Batch and streaming use cases!

Data Processing Categories in Big Data Use Cases! Data Velocity Streaming Stream Processing N/A Sta.c Batch Processing Ad- hoc Query Known Unknown Ques.ons known?

Data Sources! Transactional Data! Web Click Stream! Mobile Devices! Operational Log Files! Public Data! Sensor Data!!

Customer Uses! Real-Time Advertising! Customer Service! Operational! Fraud Detection! Predictive Maintenance!

Processing Data In Motion! Ingest! Archive! Transform Normalize! Analyze Business Logic! Alert! Action! Visualize! Persist!

Online advertising dynamic inventory purchases! Ad Server 1! High volume auto-scaling fault tolerant event stream. Dimensional computing to identify performing ads.! Ad Placement Strategy! Fault-Tolerant Flume! In-memory analytic cube! Campaign Analysis! Real-time Dashboard! Ad Server 800! Oracle DB!

SmartGrid Connected Home! Home Sensor(s)! SmartGrid Sensors! Smart Grid provider with many partners has heterogeneous network sources, provides analytics to utilities & customers and provide ISV platform! Normalization! Enrichment Data! Analytics! ISV Applications! Tableau! Visualizations! Alert on error! Consumer Energy Audit! Operational Safety/Costs!

Batch Data! Customer Information! Historical Sales! Support Data! Product Configuration! Corporate Info!

Batch Processing Data! Ingest! Archive! Transform Normalize! Visualize! Persist! Analyze Business Logic! Alert! Action!

The Enterprise Data Processing Problem! BI & Analytics! Platform! Ingest! Archive! Transform Normalize! Analyze Business Logic! Alert! Action! Visualize! Persist! ETL! ETL! Business Analytics! Complex Event/! Event Streaming!

The Enterprise Data Processing Problem! BI & Analytics! Platform! Ingest! Archive! Transform Normalize! Analyze Business Logic! Alert! Action! Visualize! Persist! ETL! ETL! Business Analytics! Complex Event/! Event Streaming!

The Enterprise Data Processing Problem! Transmission team - Credit, Debit, ACH over Secure FTP! Transformation team - Parse, Dedup, Transform, Encrypt! Distribution team - Hadoop, MPP, DB! Reports team! Dashboards & Alerts! BI & Analytics! Platform! Ingest! Archive! Transform Normalize! Analyze Business Logic! Alert! Action! Visualize! Persist! Separate applications for each step in the end to end process.! ETL! o 4 to 5 batch jobs to complete the process end to end! o 1 to 2 runs a day. So typical time to value is around 12 hours! ETL! Business Analytics! Complex Event/! Event Streaming!

Financial services big data fabric! SMTP Logs! Financial Data! Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application processing! Encrypt! Historical! Compliance! Archive! Alert on error! Application 1! Application n! Persistent!

Satellite Television Provider! Package Package Data! Data! Automated, faster time to insight, driving accurate payment, auditing and business planning! Payment System! Rate Rules! Join! Data Prep! Rate Rules! Data Compliance! Audit Reporting! Subscription! Subscription! Dated Archive! Business Projection!

DataTorrent RTS Architectural Overview! Ingestion & Distribution for Hadoop! Graphical Application Assembly! Real-Time Data Visualization! Management & Monitoring! Ingest! Archive! Transform Normalize! Analyze Business Logic! Re-Usable Java Operator Library! Scalable, High Performance, Fault Tolerant In-Memory Data Processing Platform! Hadoop 2.0 YARN + HDFS! Alert! Action! Visualize! Persist! Physical Virtual Cloud!

DataTorrent - Project Apex! Industry s first open source enterprise-class unified stream and! batch processing platform! DataTorrent RTS 3 Core engine! Key features requested by open source developer community! ᵒ Event processing guarantees! ᵒ In-memory performance & scalability! ᵒ Fault tolerance and state management! ᵒ Native rolling and static window support! ᵒ Hadoop-native YARN & HDFS implementation! Apache 2.0 License! Complemented by open source Malhar operator library! https://www.datatorrent.com/product/project-apex/!

DataTorrent enables enterprises to process data in motion and take action in real-time! Faster Time to Insight! Faster Time to Action! Trend: Batch and streaming use cases!

Big Data. Big Actions. Now.!