Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Transcription:

1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

2 Pivotal's Full Approach: It's More Than Just Hadoop
Pivotal Data Labs

3 Why Pivotal Exists
First Movers Solve the Big Data Utility Gap
Chart values: ~7X $290B, ~20X $120B, ~30X $4B, ~20X $2.9B
70% of data generated by customers
80% of data being stored
3% being prepared for analysis
0.5% being analyzed
<0.5% being operationalized
Chart labels: Average Enterprises, Smart Enterprises

4 Journey To Data Driven Enterprise
Rows: Steps, Technology, Target
Archive: Realize cost efficiencies and extend life of existing systems (Data migration)
Insights: Integrate all existing data to generate business insights (Data Analysis, Data Lake)
Apps: Build apps to assist/take (automated) actions from the insights generated (Data Driven Apps)
Business Models: Create new revenue streams leveraging new data and new insights (Business Transformation, Platform as a Service)
Repeatable Framework: Platform for experimenting with data driven business models and innovation (Experimentation Platform)
Target: Manager, IT Leaders, Business Leader, CEO
Copyright 2014 EMC Corporation. All rights reserved.

5 Data Driven: Harder Than It Sounds
Three tiers: Real Time, Near Real Time, Batch
Each tier spans Analytical and Transactional workloads and the same stages: Operationalize, Ingest, Interface, Distill, Process
Real Time: Predictive Call Routing, Fraud Prediction, Dynamic Pricing, Re-Marketing, Stream Analytics
Near Real Time: Analytic Model Designs, Transaction Analysis, Trend Analysis
Batch: ETL, Archive, Trending, Monthly and Weekly Jobs

6 Data Driven: Impossible In Silos
Data Growth Over 60% Floods These Silos
Silos: Finance, Manufacturing, Marketing, IT

7 Pivotal Business Data Lake Architecture
Unified Sources: Real-time ingestion, Micro batch ingestion, Batch ingestion
Centralized Management: System monitoring, System management
Unified Data Management Tier: Data mgmt. services, MDM, RDM
Workflow Management
Processing Tier: In-memory MPP database
Distillation Tier
Audit and policy mgmt.
HDFS storage: Unstructured and structured data
Flexible Actions: Real-time insights, Interactive insights, Batch insights
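
The slide names micro batch ingestion alongside real-time and batch ingestion without showing what it looks like. As a minimal sketch, assuming events arrive as a plain Python iterable, micro batching can be read as grouping a stream into small fixed-size chunks that are handed downstream one chunk at a time. The function name `micro_batches` and the `batch_size` parameter are hypothetical and are not part of any Pivotal product API.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield lists of up to batch_size events from any iterable.

    Hypothetical helper for illustration only: real-time ingestion would
    forward each event immediately, and batch ingestion would wait for the
    whole input before processing it.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(events)
    while True:
        # Pull the next chunk; an empty chunk means the stream is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With `batch_size=1` this degenerates to per-event delivery, and with a very large `batch_size` it behaves like a single batch load, which is one way to see the three ingestion modes as points on the same scale.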

8 Pivotal HD: Center Of Data Lake
Pivotal HD Enterprise
HAWQ Advanced Database Services: Xtension Framework, ANSI SQL + Analytics, MADlib Algorithms, Catalog Services, Dynamic Pipelining, Query Optimizer
Pivotal GemFire XD Real-Time Database Services: Distributed In-memory Store, ANSI SQL + In-Memory, Query Transactions, Ingestion Processing, Hadoop Driver Parallel with Compaction
Resource Management & Workflow: YARN, ZooKeeper, Oozie
Virtual Extensions
Processing and storage: HBase, Spring, GraphLab, Open MPI, Pig, Hive, Mahout, MapReduce, HDFS
Ingestion: Sqoop, Spring XD, Flume
Command Center: Configure, Deploy, Monitor, Manage
Legend: Apache, Pivotal

9 Pivotal HD Value
Capabilities: OLAP, HDFS, SQL, OLTP
Cost-based Query Optimizer
ANSI SQL Compliant
Linear, incremental scalability on COTS hardware
Deep Analytic OLAP Queries
Petabyte Data Storage & Management
Low latency updates and transactions
Partitioned Events in situ w/ data
Active-active deployment across WAN

10 Data Lake In Action: Real-time, Closed Loop Analytics With Pivotal HD
Inputs: Market Data, Trades/Bids
Loop: Detect Threshold, Monte Carlo Simulation, Recalculate Model, Send Correction
Adaptive model development based on long term trends
Real-time model scoring
Continuous queries
Fast Active Data + Big Historical Data
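
The loop on this slide (detect threshold, Monte Carlo simulation, recalculate model, send correction) can be sketched in a few lines. This is a toy illustration under stated assumptions, not Pivotal's implementation: the model is a single reference price, the simulation is a simple multiplicative random walk, and "send correction" is represented by returning the corrections. The names `monte_carlo_price`, `closed_loop`, and the 5% default `threshold` are all hypothetical.

```python
import random
import statistics


def monte_carlo_price(spot, drift, volatility, steps=20, paths=2000, seed=0):
    """Average the end price of simulated random-walk paths (toy model)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    finals = []
    for _ in range(paths):
        price = spot
        for _ in range(steps):
            # Each step applies a normally distributed relative return.
            price *= 1.0 + rng.gauss(drift, volatility)
        finals.append(price)
    return statistics.fmean(finals)


def closed_loop(ticks, history, threshold=0.05):
    """Score each tick against the model and recalculate when it drifts.

    ticks:   fast active data (incoming prices), assumed positive
    history: big historical data (at least two past prices)
    Returns a list of (tick, new_model_price) corrections.
    """
    history = list(history)
    model_price = statistics.fmean(history)
    corrections = []
    for tick in ticks:
        history.append(tick)
        # Detect threshold: relative deviation of the tick from the model.
        deviation = abs(tick - model_price) / model_price
        if deviation > threshold:
            # Recalculate model: re-estimate drift and volatility from history,
            # then run the Monte Carlo simulation from the current tick.
            returns = [b / a - 1.0 for a, b in zip(history, history[1:])]
            model_price = monte_carlo_price(
                tick, statistics.fmean(returns), statistics.pstdev(returns)
            )
            # Send correction: collected here instead of pushed to a system.
            corrections.append((tick, model_price))
    return corrections
```

For example, `closed_loop([100.0, 101.0, 120.0], [99.0, 100.0, 101.0])` leaves the first two ticks alone and produces one correction for the 120.0 tick, since only that tick deviates from the initial model price of 100.0 by more than 5%.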

11 The Scenario Yesterday
#  Application Type                    Pivotal Component  Pricing Metric
1  Database                            Greenplum DB       Data storage: tiered terabytes
2  Hadoop Distributed File System      Pivotal HD         Nodes
3  Parallel Query Engine               HAWQ               Nodes
4  In-Memory Data Grid                 GemFire            CPUs and Add Ons with restrictions
5  In-Memory Data Grid for Hadoop      GemFire XD         TBD
6  In-Memory Data Grid with SQL Layer  SQLFire            CPUs and Add Ons with restrictions
* GemFire XD will be included upon GA. Est. Q2-2014
Other add-on products: Pivotal Data Dispatch, Alpine Chorus

12 The Scenario Today
#  Application Type                    Pivotal Component
1  Database                            Greenplum DB
2  Hadoop Distributed File System      Pivotal HD
3  Parallel Query Engine               HAWQ
4  In-Memory Data Grid                 GemFire
5  In-Memory Data Grid for Hadoop      GemFire XD*
6  In-Memory Data Grid with SQL Layer  SQLFire
Pricing Metric: SKU, Unit of Measure, Price
* GemFire XD will be included upon GA. Est. Q2-2014

13 World's Leading Experts: Pivotal Labs, Pivotal Data Labs
On Demand Services: Pivotal Data Dispatch
REAL TIME: GemFire, GemFire XD
NEAR TIME: Greenplum DB, HAWQ
BATCH: Pivotal HD

14 Customer Centric Model
Subscription Based
Software Only
Core Based
Customer Incentives
Flexible Licensing
UNLIMITED PIVOTAL HD INCLUDED

15 How Does This Work In Practice?
Store Everything
  Obsessively collect data
  Keep it forever
  Put the data in one place
Analyze Anything
  Cleanse, organize, and manage your data lake
  Make the right tools available
  Use the resources wisely to compute, analyze, and understand data
Build the Right Thing
  Use insights to iteratively improve your product