The Future of Data Management with Hadoop and the Enterprise Data Hub




Amr Awadallah, Cofounder & CTO, Cloudera, Inc. (Twitter: @awadallah)


Cloudera Snapshot
- Founded: 2008, by former employees of …
- Employees today: 900+
- World-class support: 24x7 global staff; proactive and predictive support programs
- Mission critical: thousands of enterprise users; ~600 paying subscription customers
- The largest ecosystem: 1,600+ partners
- Cloudera University: 100,000+ trained
- Open source leaders: Cloudera employees are leading developers and contributors
- Total capital raised: $1B+ (from Intel, Google, Dell, T. Rowe Price, Accel, Greylock)
- Mission: help organizations leverage the power of all their data to ask bigger questions.
© 2014 Cloudera, Inc. All rights reserved.

A Big Data Revolution Is Happening as We Speak
Industrial Revolution vs. Data Revolution

Data Drives Industries
- Telecommunications: optimize network performance
- Financial services: money laundering detection
- Public sector: cyber security detection
- Retail: product recommendations
- Healthcare: personalized medicine

Data Drives Business
- Marketing: increase conversions by 2%
- Sales: convert 5% more leads
- Operations: reduce fraud by 3%
- Customer satisfaction: reduce churn by 1%
- Product: increase user adoption by 10%

Why Is Big Data Happening Now?
- Instrumentation: everything that can be measured will be measured.
- Personalization: employees and customers expect more personal interactions, but not at the cost of their privacy. This is the age of the segment of one.
- Advanced analytics: the most innovative companies embrace experimentation, predictive analytics, and agility.

Data Is Fueling This Opportunity
Sources: web/mobile, clickstream, social media, sensor networks, audio, image and video.

Access to Diverse Analysis Techniques
SQL, video and voice processing, text sentiment analysis, social graph analysis.

People Require Analytics
80% of CEOs cite data mining and analytics as strategically important. (Source: 2015 PwC CEO Survey)

Big Data Is Getting Bigger and More Multi-Structured
- 1.8 trillion gigabytes of data were created in 2011 (source: IDC 2011).
- More than 90% of it is unstructured data.
- Data volume doubles every year.
[Chart: GB of data (in billions, axis up to 10,000), 2005 to 2015, split into structured and unstructured data]
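
The "doubles every year" claim is compound growth: V(y) = V_2011 · 2^((y − 2011) / T), with a doubling period T of one year. The minimal sketch below works that arithmetic from the slide's 2011 base of 1.8 trillion GB. The function name and the `doubling_years` parameter are illustrative assumptions, and the printed values are an extrapolation under the slide's stated rate, not reported measurements.

```python
BASE_YEAR = 2011
BASE_GB = 1.8e12  # 1.8 trillion GB created in 2011, as quoted on the slide

def projected_gb(year: int, doubling_years: float = 1.0) -> float:
    """Extrapolate annual data volume, assuming it doubles every `doubling_years` years."""
    return BASE_GB * 2 ** ((year - BASE_YEAR) / doubling_years)

for y in range(2011, 2016):
    # Doubling yearly: 1.8, 3.6, 7.2, 14.4, 28.8 trillion GB
    print(y, f"{projected_gb(y) / 1e12:.1f} trillion GB")
```

Changing `doubling_years` shows how sensitive any such projection is to the assumed growth rate: with a two-year doubling period, the 2015 value is 7.2 trillion GB instead of 28.8 trillion GB.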

Hadoop Changes the Game: Storage & Compute Together
The old way: compute (RDBMS, EDW) and data storage (SAN, NAS) are separate systems connected over the network.
- Expensive, special-purpose, reliable servers
- Expensive licensed software
- Hard to scale; the network is a bottleneck
- Only handles relational data; difficult to add new fields and data types
- Expensive and unattainable: $30,000+ per TB
The Hadoop way: every node combines compute (CPU), memory, and storage (disk), connected by the network.
- Commodity, unreliable servers
- Hybrid open source software
- Scales out forever; no bottlenecks
- Easy to ingest any data; agile data access
- Affordable and attainable: $300-$1,000 per TB
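
Because the slide quotes per-terabyte prices, the gap between the two approaches grows linearly with data volume. The sketch below multiplies the slide's figures out for a hypothetical 100 TB. The volume is an assumption, and a flat per-TB price ignores replication overhead, hardware refresh, and staffing, so treat the result as back-of-the-envelope arithmetic only.

```python
OLD_WAY_PER_TB = 30_000       # "$30,000+ per TB": lower bound quoted on the slide
HADOOP_PER_TB = (300, 1_000)  # "$300-$1,000 per TB": range quoted on the slide

def storage_cost(terabytes: float, cost_per_tb: float) -> float:
    """Total storage cost at a flat per-terabyte price."""
    return terabytes * cost_per_tb

tb = 100  # hypothetical data volume in TB
old = storage_cost(tb, OLD_WAY_PER_TB)
low, high = (storage_cost(tb, price) for price in HADOOP_PER_TB)

print(f"Old way: ${old:,.0f}+; Hadoop way: ${low:,.0f} to ${high:,.0f}")
print(f"Cost ratio: {old / high:.0f}x to {old / low:.0f}x")
```

At these prices the old way costs at least 30 to 100 times more per terabyte, whatever the volume.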

Expanding Data Requires a New Approach
- What we do: copy data to applications. Process-centric businesses use structured data mainly, internal data only, and important data only, with multiple copies of data.
- What we should do: bring applications to data. Information-centric businesses use all data: multi-structured, internal and external data of all types.
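
"Bring applications to data" is what Hadoop calls data locality: the scheduler tries to run each task on a node that already stores a replica of that task's input block, so the code travels instead of the data. The Python sketch below is a simplified, hypothetical greedy scheduler. The block map, node names, and slot counts are made up for illustration, and real Hadoop schedulers also weigh rack locality and queue capacity.

```python
from typing import Dict, List, Tuple

def schedule_tasks(
    block_locations: Dict[str, List[str]],
    free_slots: Dict[str, int],
) -> Tuple[Dict[str, str], int]:
    """Assign one task per block, preferring a node that already stores the block.

    Returns the block -> node assignment and the number of data-local tasks.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    local = 0
    for block in sorted(block_locations):
        # 1) Data-local: a node holding a replica that still has spare capacity.
        node = next((n for n in block_locations[block] if slots.get(n, 0) > 0), None)
        if node is not None:
            local += 1
        else:
            # 2) Fallback: any node with spare capacity; the block must cross the network.
            node = next((n for n in sorted(slots) if slots[n] > 0), None)
            if node is None:
                raise RuntimeError(f"no free slot for {block}")
        slots[node] -= 1
        assignment[block] = node
    return assignment, local

# Hypothetical cluster: three blocks, three nodes with one free slot each.
blocks = {"blk_1": ["node1", "node2"], "blk_2": ["node1", "node3"], "blk_3": ["node1"]}
plan, local_tasks = schedule_tasks(blocks, {"node1": 1, "node2": 1, "node3": 1})
print(plan, f"{local_tasks}/{len(blocks)} tasks data-local")
```

Here `blk_3` lives only on `node1`, whose slot is already taken, so it falls back to a remote read on `node2`; the other two tasks run where their data lives.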

The Legacy Way: Bringing Data to Applications
- (4) Can't get a 360° view: many special-purpose systems; moving data around; no complete views.
- (3) Can't ask new questions: existing systems strained; no agility; BI backlog.
- (2) Can't meet ETL SLAs: up-front modeling; transforms are slow; transforms lose data.
- (1) Can't retain valuable data: leaving data behind; risk and compliance; high cost of storage.
Systems: EDWs, marts, servers, documents, storage, search, archive.
Sources: ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources.

The Agile Way: Bringing Applications to Data
- (4) Consolidated architecture: bring applications to data; combine different workloads on common data (e.g., SQL + search); true analytic agility.
- (3) Agile exploration: simple search + BI tools; schema-on-read agility; fewer BI user backlog requests.
- (2) Scalable transformations: one source of data for all analytics; persist the state of transformed data; significantly faster and cheaper.
- (1) Active archive: full-fidelity original data; indefinite retention, any source; lowest-cost storage.
Systems: servers, marts, EDWs, documents, storage, search, archive.
Sources: ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources.
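
Schema on read means raw data is stored exactly as it arrives, and a schema is applied only when a query reads it. A new question therefore needs a new projection, not a new load or remodel. The minimal Python sketch below illustrates the idea; the sample clickstream lines, the column layout, and the `read_with_schema` helper are all hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterator

# Raw clickstream lines kept exactly as ingested: no up-front modeling, nothing dropped.
# Hypothetical sample data; columns are timestamp, user, page, browser.
RAW = """2014-06-01T10:00:00,u1,/home,chrome
2014-06-01T10:00:05,u2,/cart,firefox
2014-06-01T10:01:00,u1,/checkout,chrome
"""

def read_with_schema(raw: str, schema: Dict[str, int]) -> Iterator[Dict[str, str]]:
    """Parse raw CSV at read time, projecting only the fields named in `schema` (field -> column)."""
    for row in csv.reader(io.StringIO(raw)):
        yield {name: row[idx] for name, idx in schema.items()}

# Today's question: page views per user.
views = Counter(r["user"] for r in read_with_schema(RAW, {"user": 1, "page": 2}))

# A new question later: which browsers? Same raw data, new schema, no reload.
browsers = Counter(r["browser"] for r in read_with_schema(RAW, {"browser": 3}))

print(views)     # Counter({'u1': 2, 'u2': 1})
print(browsers)  # Counter({'chrome': 2, 'firefox': 1})
```

With schema on write, a browser column left out of the original model would have been discarded at load time ("transforms lose data"); here it was never thrown away.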

Core Benefits of the Enterprise Data Hub
- Full-fidelity active archive
- Accelerate time to insight (scale)
- Unlock agility and exploration
- Consolidate silos for a 360° view
- Enable pervasive analytics


Apache Hadoop Is More Than Just Hadoop
The ecosystem has grown around core Hadoop over time:
- 2006: Core Hadoop (HDFS, MapReduce)
- 2008: + HBase, ZooKeeper
- 2009: + Hive, Pig, Mahout
- 2010: + Sqoop, Whirr, Avro
- 2011: + Flume, Bigtop, Oozie, MRUnit, HCatalog; YARN added to core Hadoop
- 2012: + Spark, Impala, Solr, Kafka
- Present: + Parquet, Sentry
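
MapReduce, the processing model in core Hadoop, splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. The single-process Python sketch below walks through the classic word-count example. The function names are illustrative and are not the Hadoop Java API; on a real cluster, map tasks run in parallel next to the HDFS blocks they read, and the shuffle moves intermediate data between nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["Data drives business", "data drives industries"]))
```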

Cloudera Enterprise, Powered by Apache Hadoop
A new kind of data platform:
- One place for unlimited, any-type data
- Unified, multi-framework data access (Process, Discover, Model, Serve), with security and administration on top of unlimited storage
Key advantages: high performance; enterprise system and data management; secure by default; open source, open standards.
Deployment flexibility: on-premises, appliances, engineered systems, public cloud, private cloud, hybrid cloud.

The Modern Information Architecture
- Users: data architects, system operators, engineers, data scientists, analysts, business users
- Tools and applications: metadata / ETL tools, Cloudera Manager, converged applications, machine learning, BI / analytics, enterprise reporting
- Data platforms: enterprise data hub, enterprise data warehouse, online serving systems
- Sources: system logs, web logs, files, RDBMS
- Web/mobile applications serving customers and end users

Largest Ecosystem: More Than 1,600 Partners
Partner categories: applications, operational tools, data systems, system integration, and infrastructure, all built around the enterprise data hub (Process, Discover, Model, Serve; security and administration; unlimited storage).

A High-Level View of the Journey
- Operational efficiency (faster, bigger, cheaper), driven by IT: cheap storage, ETL acceleration, EDW optimization
- Transformative applications (new business value), driven by the business: agile exploration, Not Only SQL, pervasive analytics

Data Drives Travel/Leisure: customer segmentation, marketing campaign testing, regulatory compliance.

Data Drives Social

Data Drives Healthcare: population health, patient monitoring, chronic disease management.

Data Drives Manufacturing: predictive maintenance, goods classification.

Data Drives Financial Services: regulatory compliance, InfoSec, fraud detection.

Ask Bigger Questions: How Do We Feed the World?
A Fortune 500 company specializing in agriculture and genomics can automate data-driven R&D decisions to reduce time to market from years to months.

What's Ahead
- Security and privacy of sensitive data
- Lower total cost of ownership
- Time to value (ease of use and real time)


Data Will Drive the Modern World
Amr Awadallah (@awadallah), Cofounder/CTO, Cloudera, Inc.

Thank you!
Amr Awadallah, Cofounder & CTO (Twitter: @awadallah)

Why Cloudera?
- Enterprise-grade Hadoop: differentiated performance, security, management, and governance.
- Expertise: no one knows Hadoop better than Cloudera.
- Enablement: support, training, and professional services enable and deliver success.
- Ecosystem: Cloudera ensures that Hadoop works with the platforms, tools, and integrators you rely on.
- Sustainable innovation: our hybrid open source model delivers the benefits of open source and what the enterprise requires, while enabling us to invest in the future for our customers.

One Platform, Many Workloads
- Process: ingest (Sqoop, Flume, Kafka); transform (MapReduce, Hive, Pig, Spark)
- Discover: analytic database (Impala); search (Solr)
- Model: machine learning (SAS, R, Spark, Mahout, Oryx)
- Serve: NoSQL database (HBase); streaming (Spark Streaming)
- Security and administration: YARN, Cloudera Manager, Cloudera Navigator
- Unlimited storage: HDFS, HBase
Batch, interactive, and real-time: leading performance and usability in one platform. End-to-end analytic workflows let you access more data, work with data in new ways, and enable new users.
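
Spark Streaming, listed under Serve, handles real-time data by cutting the incoming stream into small time-based batches and processing each batch with the same engine used for batch jobs. The sketch below illustrates only that micro-batch idea in plain Python. It is not the Spark API, and the `(timestamp, key)` event format and one-second interval are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def micro_batches(events: Iterable[Tuple[float, str]], interval: float) -> List[Counter]:
    """Assign each (timestamp, key) event to batch floor(timestamp / interval), counting keys per batch.

    Assumes non-negative timestamps in seconds from the start of the stream.
    """
    batches: List[Counter] = []
    for ts, key in events:
        idx = int(ts // interval)
        while len(batches) <= idx:  # open empty batches for idle intervals
            batches.append(Counter())
        batches[idx][key] += 1
    return batches

# Hypothetical event stream: (seconds since start, event type)
events = [(0.5, "login"), (1.2, "click"), (1.9, "click"), (3.1, "login")]
for i, batch in enumerate(micro_batches(events, interval=1.0)):
    print(f"batch {i}: {dict(batch)}")
```

Each batch is a small, bounded dataset, which is what lets one engine serve both batch and near-real-time workloads; the trade-off is latency of at least one batch interval.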

Hadoop Administration Made Easy: Cloudera Manager
Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.
Unique capabilities:
- Unified configuration, management, and monitoring across all services
- Online installation and upgrades
- Direct connection to Cloudera Support
- Third-party extensibility

Big Data Meets Data Governance: Cloudera Navigator
Minimize risk and maintain compliance with the only native end-to-end data governance solution for Apache Hadoop.
Unique capabilities:
- Auditing
- Lineage
- Metadata tagging and discovery
- Lifecycle management
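
Lineage answers "where did this dataset come from?" by recording which inputs each dataset was derived from and then walking that graph upstream. The minimal Python sketch below assumes a hypothetical in-memory lineage map with made-up dataset names; it illustrates the graph traversal only and is not Cloudera Navigator's API or data model.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each dataset maps to the datasets it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "raw_weblogs": [],
    "crm_export": [],
    "sessions": ["raw_weblogs"],
    "customer_360": ["sessions", "crm_export"],
    "churn_report": ["customer_360"],
}

def upstream(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that `dataset` was directly or indirectly derived from (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(lineage.get(parent, []))
    return seen

print(sorted(upstream("churn_report", LINEAGE)))
```

The same traversal run over the reversed edges answers the compliance question in the other direction: which reports are affected if a source dataset turns out to be wrong.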

The Cloudera Community
- Enterprise expertise: security, compliance, cloud
- Training expertise: 30K trained, 100K online training, 10K certified
- Hadoop expertise: [chart of contributions; "Other": 56%; source: Apache JIRA, January 2012 to October 2014]

And It Isn't Just About Web 2.0 / Social
- Automotive: auto sensors reporting location, problems
- High technology / industrial manufacturing: manufacturing quality, warranty analysis
- Consumer packaged goods: sentiment analysis of what's hot, customer service
- Utilities: smart meter analysis for network capacity
- Education & research: experiment sensor analysis
- Communications: location-based advertising
- Life sciences: clinical trials, genomics
- Media / entertainment: viewers / advertising effectiveness
- Online services / social media: people & career matching, website optimization
- Health care: patient sensors, monitoring, EHRs; quality of care
- Oil & gas: drilling exploration sensor analysis
- Retail: consumer sentiment, optimized marketing
- Travel & transportation: sensor analysis for optimal traffic flows, customer sentiment
- Financial services: risk & portfolio analysis, new products
- Law enforcement & defense: threat analysis, social media monitoring, photo analysis