More Data in Less Time

Leveraging Cloudera CDH as an Operational Data Store
Daniel Tydecks, Systems Engineering DACH & CE

Goals of an Operational Data Store
[Diagram: Traditional Architecture. Column headings: Data Sources, Operational Data Store, Enterprise Data Warehouse, Applications. Labels: Structured, Unstructured; Ingest, Process, ETL, ELT, Storage #1, Storage #2, Storage N, Archive; Load, Serve; BI System, Modeling, Reporting.]
Goals: Ingest Data, Prepare Data, Store Data

Challenges with a Traditional Architecture
[Diagram: Traditional Architecture. Column headings: Data Sources, Operational Data Store, Enterprise Data Warehouse, Applications. Labels: Structured, Unstructured; Ingest, Process, ETL, ELT, Storage #1, Storage #2, Storage N, Archive; Load, Serve; BI System, Modeling, Reporting. Numbered callouts 1 to 3 correspond to the points below.]
1) Limited Data Ingest: Unstructured Data Challenge; Data Siloes Limit Data Collection
2) Inefficient Data Processing: Resource Intensive ELT; Transforming Unstructured Data; Meeting SLAs
3) Data Archived: Decrease Data Returns; Archive is offline; Data Deleted

A New Way Forward
[Diagram: Modern Architecture. Column headings: Data Sources, Operational Data Store, Applications. Labels: Structured, Unstructured; Ingest, ETL, ELT, EDH, Archive, Active Structured Data; Load, Serve, Enterprise Data Warehouse; BI System, Modeling, Reporting. Numbered callouts 1 to 3 correspond to the points below.]
1) Ingest More Data: Collect Any Data Volume; Collect Data in Full Fidelity; Diverse Data
2) Optimize Data Processing: ELT Offload; Parallel Processing; Scalable Storage
3) Automated Secure Archive: Historic Data Access; Cost Effective Data Storage; Compliance-Ready

Customer Spotlight
Challenge: Traditional system could not process omni-channel data fast enough, limiting customers to monthly reports, forcing decisions to be made with stale data, and leading to poor consumer experience due to latency.
Solution: Cloudera provided a landing zone where Experian could process and store large amounts of disparate data at scale.
Benefit: Process 28K records per second; process data 50X faster; increase consumer report frequency from monthly to weekly.
"We needed to leap forward in our processing ability. We wanted to process data orders of magnitude faster so we could react to tomorrow's consumer." - Jeff Hassemer, VP of Product Strategy
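To put the quoted benefit figures in perspective, the short calculation below converts them into daily and yearly terms. It is only a back-of-the-envelope sketch: it assumes the 28K records per second rate is sustained around the clock, which the slide does not state, and it counts 12 monthly versus 52 weekly reports per year.

```python
# Figures quoted on the slide.
RECORDS_PER_SECOND = 28_000

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumption (not from the slide): the quoted rate is sustained all day.
records_per_day = RECORDS_PER_SECOND * SECONDS_PER_DAY

# Report frequency: monthly before, weekly after.
reports_per_year_before = 12
reports_per_year_after = 52
frequency_gain = reports_per_year_after / reports_per_year_before

print(f"{records_per_day:,} records per day at a sustained rate")
print(f"{frequency_gain:.1f}x more consumer reports per year")
```

At a sustained rate this comes to roughly 2.4 billion records per day, and moving from monthly to weekly reporting gives a little over four times as many reports per year.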

How Cloudera Helps
1. Scalable Storage & Ingest
2. ETL Tool Integration
3. Data Modeling
4. Parallel Processing
5. Data Security & Governance
6. High Availability Administration
[Diagram: Cloudera's Enterprise Data Hub. Labels: Batch, Analytic SQL, Search Engine, Machine Learning, Stream, 3rd Party Apps; Workload, Data, System; Filesystem, Online NoSQL; Storage for any type of data: unified, elastic, resilient, secure.]
Companies that are more data driven are 5 percent more productive and 6 percent more profitable than other companies.

Store and Ingest More Data
Data Storage: Store any volume or type of data in full fidelity; storage for replay.
Data Ingestion: Easily integrate data from existing systems (relational, EDW, NoSQL, etc.); quickly ingest multiple data types (schema on read vs. schema on write).
[Diagram labels: Structured, Unstructured.]
"The NetApp Open Solution for Hadoop system offers us the scalability and flexibility we need to effectively support our growing client base and rapidly expanding data stores." - Marty Mayer, Director of Customer Tools
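The difference between schema on write and schema on read can be illustrated with a small sketch. This is not Cloudera code: the sample records, field names, and both helper functions are hypothetical, and plain Python stands in for the storage layer. Schema on write validates at load time and rejects records that do not fit; schema on read keeps every raw record and applies a schema only when the data is queried.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (full fidelity).
RAW_LINES = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5", "coupon": "SPRING"}',
    '{"user": "c3", "channel": "store"}',
]

def write_with_schema(line, columns):
    """Schema on write: validate at load time, reject records that do not fit."""
    record = json.loads(line)
    if set(record) != set(columns):
        raise ValueError(f"record does not match schema: {sorted(record)}")
    return tuple(record[c] for c in columns)

def read_with_schema(line, schema):
    """Schema on read: keep the raw line, project and cast only when queried."""
    record = json.loads(line)
    return {name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()}

# Schema on write: only records matching the fixed columns are loaded.
loaded, rejected = [], []
for line in RAW_LINES:
    try:
        loaded.append(write_with_schema(line, ["user", "amount", "channel"]))
    except ValueError:
        rejected.append(line)

# Schema on read: every raw line is kept; each query brings its own schema.
sales_view = [read_with_schema(line, {"user": str, "amount": float})
              for line in RAW_LINES]

print(len(loaded), "loaded,", len(rejected), "rejected on write")
print(sales_view)
```

In this toy data set the fixed schema accepts one record and rejects two, while the read-time schema can still answer a question about all three, returning None where a field is absent.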

Integrate with Existing Tools
ETL Partners: Integrate with ETL tools to complement existing investments and skills.

Model Structured & Unstructured Data Faster
Data Management: Use lineage to discover, track, and validate new and old data to ensure proper use.
Analytic SQL: Quickly discover patterns in new data to facilitate large-scale processing.

Parallel Process Data Volumes
Batch Processing: Fault-tolerant processing of large volumes of diverse data.
Stream Processing: Process data as it's made available.
"The Orbitz Worldwide sites process millions of searches and transactions every day... Hadoop was selected to provide a solution to the problem of long-term storage and processing." - Jonathan Seifman, Lead Engineer for the Intelligent Marketplace Team
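The two processing styles on this slide can be contrasted in a few lines. The sketch below is an illustration only, not a Hadoop or Cloudera API: the event data is hypothetical, a thread pool stands in for cluster workers, and fault tolerance is not modeled. Batch processing counts each partition independently and merges the partial results; stream processing updates running totals as each event arrives.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event data, already split into partitions (e.g. one per file).
PARTITIONS = [
    ["search", "search", "booking"],
    ["search", "cancel"],
    ["booking", "search"],
]

def count_partition(events):
    """Map step: count event types within one partition, independently."""
    return Counter(events)

def batch_count(partitions, workers=3):
    """Batch: process all partitions in parallel, then merge partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def stream_count(events):
    """Stream: update running totals as each event becomes available."""
    running = Counter()
    for event in events:
        running[event] += 1
        yield dict(running)  # snapshot after every event

batch_totals = batch_count(PARTITIONS)
stream_snapshots = list(stream_count(e for part in PARTITIONS for e in part))

print(dict(batch_totals))
print(stream_snapshots[-1])
```

Both approaches end with the same totals; the difference is that the stream version has an up-to-date answer after every event, while the batch version produces one answer once all partitions are done.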

Protect and Govern Your Data
Enterprise Security & Governance:
End-to-end protection with integrated authentication, role-based authorization, encryption, key management, audit, and lineage.
Native platform solution ensures unified data management for easy reporting and discovery of data.
Compliance-ready to meet stringent regulatory requirements, out of the box.
"We selected Cloudera because of its short deployment time and breadth of mission-critical features, which satisfy the strict security and reliability requirements of our business." - Stefan Apitz, VP of Operations

Manage Overall System Performance
High Availability Administration:
Simple, centralized system view from ingest to analysis.
Supports mission-critical workloads with necessary enterprise features (BDR, Proactive Support, Security).
Zero-downtime rolling upgrades.
Natively deploy and manage ETL tools.
"Cloudera Enterprise gives our operations team the confidence that we are ahead of the curve in terms of keeping our cluster running with peak performance." - Nick Halstead, Founder

Keep Services Running
Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.
Unique Capabilities: Unified configuration, management, and monitoring across all services; online installation and upgrades; direct connection to Cloudera Support; 3rd-party extensibility.

Traditional vs Modern Architectures
[Diagram: side-by-side comparison. Traditional Architecture: column headings Data Sources, Operational Data Store, Archive, Enterprise Data Warehouse, Applications; labels Structured, Unstructured, Ingest, Process, ETL, ELT, Storage #1, Storage #2, Storage N, Load, Serve, BI System, Modeling, Reporting. Modern Architecture: column headings Data Sources, Operational Data Store, Applications; labels Structured, Unstructured, Ingest, ETL, ELT, EDH, Archive, Active Structured Data, Load, Serve, Enterprise Data Warehouse, BI System, Modeling, Reporting.]
Ingest More Data. Optimize Data Processing. Automated Secure Archive.

The Road to Success
Administrator Training: Configure, install, and monitor clusters for optimal performance; implement security measures and multi-user functionality.
Security Integration: Audit architecture in light of security policies and best practices; implement custom security to authenticate users, admins, and apps.
Data Analyst Training: Apply SQL to much larger data sets with Impala, Hive, and Pig; master advanced techniques that boost Hadoop accessibility.
ETL Ingestion Pilot: Reference implementation to 3 sources, 5 transforms, 1 target; create, execute, test, and review a custom ingestion/ETL plan.
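The shape of the ETL Ingestion Pilot (3 sources, 5 transforms, 1 target) can be sketched as a small pipeline. Everything concrete here is hypothetical: the source names, the rows, the particular transforms, and the in-memory list standing in for the target table are chosen for illustration and are not taken from the pilot itself.

```python
# Three hypothetical sources, each yielding raw rows as dictionaries.
def crm_source():
    yield {"id": "1", "name": " Ada ", "country": "de"}

def web_source():
    yield {"id": "2", "name": "Grace", "country": "US"}
    yield {"id": "2", "name": "Grace", "country": "US"}  # duplicate row

def billing_source():
    yield {"id": "", "name": "no id", "country": "ch"}  # missing id

SOURCES = [crm_source, web_source, billing_source]

# Five hypothetical transforms; each takes and returns an iterable of rows.
def drop_missing_id(rows):
    return (r for r in rows if r["id"])

def cast_id(rows):
    return ({**r, "id": int(r["id"])} for r in rows)

def trim_name(rows):
    return ({**r, "name": r["name"].strip()} for r in rows)

def upper_country(rows):
    return ({**r, "country": r["country"].upper()} for r in rows)

def deduplicate(rows):
    seen = set()
    for r in rows:
        if r["id"] not in seen:
            seen.add(r["id"])
            yield r

TRANSFORMS = [drop_missing_id, cast_id, trim_name, upper_country, deduplicate]

def run_pipeline(sources, transforms):
    """Ingest every source, apply the transforms in order, load one target."""
    rows = (row for source in sources for row in source())
    for transform in transforms:
        rows = transform(rows)
    return list(rows)  # the single target; a list stands in for a table

target_table = run_pipeline(SOURCES, TRANSFORMS)
print(target_table)
```

Keeping sources and transforms as plain lists makes the plan easy to review and test step by step, which matches the "create, execute, test, and review" wording of the pilot.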

Disrupt the Industry, Not Your Business
Proposed Evolution of Cloudera Enterprise Deployment (chart axis: Estimated Data in Production): Implement Full Governance, Privacy, and Compliance; Enable Big Data Processing and Applications Development; Activate All Your Data in One Place; Align Systems, Operations, & Strategy to Best-in-Class.
Proposed Services Timeline:
Administrator Training: 4 Days
Cluster Setup & Certification: 1 Week
Security Integration: 1-2 Weeks
Data Analyst Training: 3 Days
ETL Ingestion Pilot: 2 Weeks
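The services timeline can be added up to get a rough overall duration. The calculation below is a sketch under two assumptions that the slide does not state: a week means 5 working days, and the services run one after another rather than in parallel.

```python
DAYS_PER_WEEK = 5  # assumption: durations are counted in working days

# (service, minimum days, maximum days), taken from the slide's timeline.
SERVICES = [
    ("Administrator Training", 4, 4),
    ("Cluster Setup & Certification", 1 * DAYS_PER_WEEK, 1 * DAYS_PER_WEEK),
    ("Security Integration", 1 * DAYS_PER_WEEK, 2 * DAYS_PER_WEEK),
    ("Data Analyst Training", 3, 3),
    ("ETL Ingestion Pilot", 2 * DAYS_PER_WEEK, 2 * DAYS_PER_WEEK),
]

# Assumption: services are delivered sequentially, so durations simply add.
min_days = sum(low for _, low, _ in SERVICES)
max_days = sum(high for _, _, high in SERVICES)

print(f"{min_days} to {max_days} working days")
```

Under these assumptions the full programme takes 27 to 32 working days, or roughly five and a half to six and a half weeks; running the training sessions alongside other services would shorten that.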

Thank you.

Why Cloudera?
Enterprise-Grade Hadoop: Differentiated performance, security, management, and governance.
Expertise: No one knows Hadoop better than Cloudera.
Enablement: Support, Training, and Professional Services enable and deliver success.
Ecosystem: Cloudera ensures that Hadoop works with the platforms, tools, and integrators you rely on.
Sustainable Innovation: Our hybrid open source model delivers the benefits of open source and what the enterprise requires, while enabling us to invest in the future for our customers.

The Most Complete Ecosystem
More than 1,200 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.
[Diagram: partner categories Applications, Data Systems, System Integration, Operational Tools, and Infrastructure around the Enterprise Data Hub. Hub labels: Process, Discover, Model, Serve; Security and Administration; Unlimited Storage.]

The Journey to a Data Strategy
Operational Efficiency, New Business Value
Optimize your architecture: IT
Discover the value in your data: analysts and data scientists
Empower users directly: everyone
[Diagram: Enterprise Data Hub. Labels: Process, Discover, Model, Serve; Security and Administration; Unlimited Storage.]