Saving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.



MapR Technologies Overview
- Open, enterprise-grade distribution for Hadoop
  - Easy, dependable and fast
  - Open source with standards-based extensions
- Background of team
  - Deep management bench with extensive analytic, storage, virtualization, and open source experience
  - Google, Microsoft, EMC, Network Appliance, Transarc, Apache
- MapR is deployed at 1000s of companies
  - From small Internet startups to the world's largest enterprises
- MapR customers analyze massive amounts of data:
  - Hundreds of billions of events daily
  - 90% of the world's Internet population monthly
  - $1 trillion in retail purchases annually

Arrival of Big Data Impacts Data Warehouse
- Variety: Inability to process unstructured formats
- Volume: Prohibitively expensive storage costs
- Velocity: Faster arrival and processing needs

Top Concern for Big Data Too many different types, sources & formats of critical data Multiple Data sources Multiple technologies Multiple copies of data

The Hadoop Advantage
- Fueling an industry revolution by providing infinite capability to store and process big data
- Expanding analytics across data types
- Compelling economics: 20 to 100X more cost effective than alternatives

Important Drivers for Hadoop
- Compute on data drives efficiencies
- You don't need to know what questions to ask beforehand
- More data beats better models
- Powerful ability to analyze unstructured data

Hadoop Growth

How Do You Lower & Control Data Warehouse Costs?
Source Data: Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific
Traditional Targets (loaded by batch ETL): Enterprise Data Warehouse; Datamarts; ODS
- Databases and data warehouses are exceeding their capacity too quickly
- Batch windows hitting their limits, putting SLAs at risk
- Raw data or infrequently used data consuming capacity

Lower Data Management Costs
Source Data: Transactions, OLTP, OLAP; Documents and Emails; Social Media, Web Logs; Machine Device, Scientific
Processing steps: Archive, Profile, Parse, Transform, Cleanse, Match
Traditional Targets: Enterprise Data Warehouse; RDBMS; MDM

Leverage Familiar Tools With MapR Cluster No-code visual development environment Preview results at any point in the data flow

Maximize Your Return on Data
1. Lower Costs
   - Order of magnitude savings with MapR
   - Optimized end-to-end performance
   - Rich pre-built connectors, library of transforms for ETL, data quality, parsing, profiling
2. Increase Productivity
   - Up to 5x productivity gains with no-code visual development environment
   - No need for Hadoop expertise for data integration
3. Increase ROI
   - Ability to scale platform easily with demand
   - Leverage for additional analytics
Diagram labels: Archive, Profile, Parse, Transform, Cleanse, Match; Hadoop

Customer Case Study
Business Impact:
- Saved millions in TCO
- 10x faster, 100x cheaper
- Maintain the same SLAs
- Implemented the change without impacting users

Data Warehouse Offload: Cost Savings + Analytics
Diagram: RDBMS, Sensor Data, Web Logs -> Hadoop (ETL + Long Term Storage) -> DW (Query + Present)
Benefits:
- Both structured and unstructured data
- Expanded analytics with MapReduce, NoSQL, etc.

Solution                       Cost / Terabyte   Hadoop Advantage
Hadoop                         $333
Teradata Warehouse Appliance   $16,500           50x savings
Oracle Exadata                 $14,000           42x savings
IBM Netezza                    $10,000           30x savings
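The "Hadoop Advantage" multiples in the table follow from dividing each appliance's cost per terabyte by the Hadoop figure. A minimal sketch of that arithmetic, using only the dollar amounts quoted on the slide; the function names and the `offload_savings` helper are illustrative assumptions, not part of the original deck:

```python
# Cost per terabyte in USD, as quoted on the slide.
COST_PER_TB = {
    "Hadoop": 333,
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}


def savings_multiple(solution: str, baseline: str = "Hadoop") -> float:
    """How many times more a solution costs per TB than the baseline."""
    return COST_PER_TB[solution] / COST_PER_TB[baseline]


def offload_savings(solution: str, terabytes: float) -> float:
    """Hypothetical helper: dollars saved by keeping `terabytes` on Hadoop
    instead of on `solution`, at the per-TB prices above."""
    return (COST_PER_TB[solution] - COST_PER_TB["Hadoop"]) * terabytes


def report() -> None:
    """Print the multiple for every non-Hadoop row of the table."""
    for name in COST_PER_TB:
        if name != "Hadoop":
            print(f"{name}: about {savings_multiple(name):.1f}x")
```

Rounded to whole numbers, the multiples match the slide's 50x, 42x, and 30x. The per-TB prices are the presenter's figures and are taken here at face value.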

What is the Best Way to Deploy Hadoop?
Transitory Data Store:
- No long-term scale advantages
- Unprotected data
- ETL tool focus
vs.
Permanent Data Store (Enterprise Data Hub):
- Highly available and fully protected data
- Works with existing tools
- Real-time ingestion and extraction
- Archive data from data warehouse

An Enterprise Data Hub Location Click Streams Sensor Data Enterprise Data Hub Production Data Web Logs Public Social Media Sales SCM CRM Billing Combine different data sources Minimize data movement One platform for analytics

Key Elements of Enterprise Data Hub
Enterprise-grade data platform for the long-term
- 99.999% HA: Reliability to support stringent SLAs
- Data Protection: Protection from data loss and user or application errors
- Disaster Recovery: Support business continuity and meet recovery objectives
- Scalability & Performance
- Enterprise Integration
- Multitenancy

High Availability and Dependability
Reliable Compute:
- Automated stateful failover
- Automated re-replication
- Self-healing from HW and SW failures
- Load balancing
- Rolling upgrades
- No lost jobs or data
- 99.999% uptime (five 9s)
Dependable Storage:
- Business continuity with snapshots and mirrors
- Recover to a point in time
- End-to-end checksumming
- Strong consistency
- Data safe
- Mirror across sites to meet Recovery Time Objectives
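To put the 99.999% availability figure in concrete terms, the downtime budget can be worked out directly. A small sketch, assuming a 365-day year; the function name is illustrative:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def report() -> None:
    """Compare a few common availability targets."""
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} min/year")
```

At five 9s the budget is a little over five minutes per year, which is why the slide pairs the number with automated failover and self-healing rather than manual recovery.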

Data Protection with Snapshots
Diagram: Hadoop / HBase applications and NFS applications read and write through storage services; data blocks (A, B, C, C', D) are shared by Snapshot 1, Snapshot 2, Snapshot 3; writes are redirected for snapshots (REDIRECT ON WRITE FOR SNAPSHOT)
- Snapshots without data duplication
- Save space by sharing blocks
- Lightning fast
- Zero performance loss on writing to original
- Scheduled, or on-demand
- Easy recovery by user
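The redirect-on-write idea named on this slide can be illustrated with a toy in-memory model. This is a sketch for intuition only, not MapR's implementation; the `Volume` class and its methods are hypothetical:

```python
class Volume:
    """Toy volume: a snapshot copies only the block map, never the blocks."""

    def __init__(self):
        self.store = {}      # block id -> bytes (physical blocks, shared)
        self.live = {}       # logical name -> block id (current view)
        self.snapshots = {}  # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, name, data):
        # Redirect on write: new data always lands in a fresh block, so
        # blocks referenced by existing snapshots are never overwritten.
        block_id = self._next_id
        self._next_id += 1
        self.store[block_id] = data
        self.live[name] = block_id

    def read(self, name, snapshot=None):
        # Read from the live view, or from a named snapshot's block map.
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[name]]

    def snapshot(self, snap_name):
        # Copying the map is cheap; unchanged blocks stay shared.
        self.snapshots[snap_name] = dict(self.live)
```

Writing blocks A, B, C, taking a snapshot, and then rewriting C leaves four physical blocks: the snapshot still sees the old C while A and B are shared by both views. This toy never frees blocks that no view references any longer; a real system would reclaim them.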

Disaster Recovery with Mirroring
Business Continuity and Efficiency
Diagram labels: Production, Research, WAN, Datacenter 1, Datacenter 1, EC2
- Efficient design
  - Differential deltas
  - Compressed and check-summed
- Easy to manage
  - Scheduled or on-demand
  - WAN, Remote Seeding
  - Consistent point-in-time
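"Differential deltas, compressed and check-summed" can be sketched at block level: ship only the blocks whose checksum differs on the mirror, compress them in transit, and verify each one on arrival. The following is an illustrative toy using the standard library, not MapR's mirroring protocol; the block size and all function names are assumptions:

```python
from __future__ import annotations

import hashlib
import zlib

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(block: bytes) -> str:
    """Checksum used to detect changed or corrupted blocks."""
    return hashlib.sha256(block).hexdigest()


def make_delta(source: bytes, mirror_digests: list[str]) -> dict[int, tuple[bytes, str]]:
    """Return {block index: (compressed block, checksum)} for changed blocks."""
    delta = {}
    for i, block in enumerate(split_blocks(source)):
        checksum = digest(block)
        if i >= len(mirror_digests) or mirror_digests[i] != checksum:
            delta[i] = (zlib.compress(block), checksum)
    return delta


def apply_delta(mirror: bytes, delta: dict[int, tuple[bytes, str]], new_length: int) -> bytes:
    """Patch the mirror with the delta, verifying every block's checksum."""
    blocks = split_blocks(mirror)
    for i in sorted(delta):
        payload, checksum = delta[i]
        block = zlib.decompress(payload)
        if digest(block) != checksum:
            raise ValueError(f"checksum mismatch on block {i}")
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]
```

With a 16-byte source that differs from its mirror in one 4-byte block, the delta carries a single block instead of the whole file, which is the efficiency the slide is pointing at for WAN links.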

Key Elements of Enterprise Data Hub
Enterprise-grade platform for the long-term
- 99.999% HA
- Data Protection
- Disaster Recovery
- Scalability & Performance: Consistent performance with large horizontal scalability
- Enterprise Integration: Integrate with existing applications and tools
- Multitenancy: Support multiple users, groups & applications securely

World Record Performance and Scale
MinuteSort World Record: 1.5 TB sorted in 1 minute on 2103 nodes
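For a sense of scale, the MinuteSort figures can be turned into throughput numbers by simple division. This sketch assumes decimal terabytes and counts only the volume of sorted data, not the extra reads, writes, and network shuffle a real sort performs:

```python
TOTAL_BYTES = 1.5e12  # 1.5 TB, assuming decimal units
NODES = 2103
SECONDS = 60

# Share of the data set each node handles, and the implied rates.
per_node_bytes = TOTAL_BYTES / NODES
per_node_mb_per_s = per_node_bytes / SECONDS / 1e6
cluster_gb_per_s = TOTAL_BYTES / SECONDS / 1e9

print(f"per node: {per_node_bytes / 1e6:.0f} MB total, {per_node_mb_per_s:.1f} MB/s")
print(f"cluster:  {cluster_gb_per_s:.0f} GB/s")
```

Each node is responsible for roughly 713 MB of the data set, and the cluster as a whole sustains about 25 GB/s of sorted output.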

Direct Integration with Existing Applications
- 100% POSIX compliant
- Industry standard APIs: NFS, ODBC, LDAP, REST
- More 3rd party solutions
- Proprietary connectors unnecessary
- Language neutral

When Hadoop Looks Like Enterprise Storage
- Data ingestion is easy: Popular online gaming company changed data ingestion from a complex Flume cluster to a 17-line Python script
- Database bulk import/export with standard vendor tools: Large telco saved tens of millions on Teradata costs by leveraging MapR to pre-process data prior to loading into Teradata
- Application servers, logs
- 1000s of applications/tools: Large credit card company uses MapR volumes as the user home directories on the Hadoop gateway servers
Standard commands work against the cluster:
$ find . | grep log
$ cp
$ vi results.csv
$ scp
$ tail -f part-00000
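The gaming company's 17-line script is not reproduced in the deck. As a hedged illustration of why ingestion becomes simple once the cluster is mounted over NFS, here is a hypothetical sketch that copies log files into a mounted directory with ordinary file operations; the function name and the example paths are assumptions, not the customer's code:

```python
import shutil
from pathlib import Path


def ingest_logs(source_dir, cluster_dir, pattern="*.log"):
    """Copy matching files into an NFS-mounted cluster directory.

    Returns the list of destination paths that were written.
    """
    source, target = Path(source_dir), Path(cluster_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.glob(pattern)):
        dest = target / path.name
        if dest.exists() and dest.stat().st_size == path.stat().st_size:
            continue  # same name and size: treat as already ingested
        tmp = dest.with_name(dest.name + ".tmp")
        shutil.copy2(path, tmp)  # copy under a temporary name first
        tmp.replace(dest)        # then rename so readers never see a partial file
        copied.append(dest)
    return copied
```

A call such as `ingest_logs("/var/log/game", "/mapr/my.cluster.com/logs")` would be the whole pipeline, with both paths being placeholders. The size comparison is a deliberately crude "already copied" check; a production script would more likely track checksums or offsets.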

Multi-tenant Capabilities to Share a Cluster Successfully
- Isolation
  - Data placement control
  - Label based job scheduling
- Quotas
  - Storage, CPU, Memory
- Security and delegation
  - ACLs
  - AD, LDAP, Linux PAM
- Reporting
  - About 70 resource usage metrics
  - REST API integration

Enterprise Data Hub Supports a Range of Applications
Batch, Interactive, Real-time
- 99.999% HA: Self-healing; instant recovery
- Data Protection: Snapshots for point in time recovery from user or application errors
- Disaster Recovery: Mirroring across clusters and the WAN
- Scalability & Performance: Unlimited files & tables; record setting performance
- Enterprise Integration: Direct data ingestion and access; fully compliant ODBC access and SQL-92 support
- Multitenancy: Secure access to multiple users and groups

Expand Data For Existing Applications
- Network security: Network IDS with a 3-day window instead of a 10-minute window
- Trade Surveillance: Rogue trader detection on intraday instead of end-of-day market data
- Insurance: Calculate risk triangles for individual properties instead of neighborhoods
Advantages:
- 1T files and tables
- Real-time data ingestion with streaming writes
- 24x7 operations with automated failure recovery
- Better hardware utilization with 2x performance

Combine Different Data Sources
Diagram: POS/Online Data and Retail purchase Info -> streaming writes to Hadoop -> Real-time offers
Advantages:
- Exponential decrease in time to market
- Real-time data ingestion with streaming writes
- 1T files and tables
- 24x7 operations with automated failure recovery

New Analytics
- Enhanced search
- Real-time event processing
- MapReduce-enabled machine learning algorithms
Advantages:
- Increased ROI with 2x performance
- Highly available, fully data-protected environment
- Multiple users running different jobs on one cluster

One Platform with a Range of Functionality Batch Interactive Real-time Map Reduce File-Based Applications SQL Database Search Stream Processing 99.999% HA Data Protection Disaster Recovery Scalability & Performance Enterprise Integration Multitenancy

Apache Drill Overview
- Apache Drill Alpha now available
- Community-driven project
- Apache Drill supports Centralized and Self Describing
  - Centralized Schema Management
    - HCatalog - Hive metastore
    - Centralized data structure
    - DBA defines and maintains structure
  - Self Describing Data
    - Structure stored with data
    - Document database orientation
    - Like MongoDB
- Project Contributors: MapR, Pentaho, Oracle, VMWare, Microsoft, Thoughtworks, UT Austin, UW Madison, RJMetrics, XingCloud

NoSQL Applications Batch Interactive Real-time Map Reduce File-Based Applications SQL Database Search Stream Processing 99.999% HA Data Protection Disaster Recovery Scalability & Performance Enterprise Integration Multitenancy

M7 Enterprise-Grade NoSQL on Hadoop HBase JVM NoSQL Columnar Store HBase API Integrated with Hadoop HDFS JVM ext3/ext4 Disks Other Distros Tables/Files Disks MapR M7

Tradeoffs with NoSQL Solutions
- Reliability: 24x7 applications with strong data consistency
- Performance: Continuous low latency with horizontal scaling
- Easy Administration: Easy day-to-day management with minimal learning curve

High Performance with Consistent Low Latency

Search Applications Batch Interactive Real-time Map Reduce File-Based Applications SQL Database Search Stream Processing 99.999% HA Data Protection Disaster Recovery Scalability & Performance Enterprise Integration Multitenancy

The Need for Integrated Search
Hadoop Advantages:
- Scale
- Distributed processing
- Extended analytics such as machine learning
Search Requirements for Hadoop:
- Ability to randomly read and write indices over Hadoop
- Store and process large number of small files for indices
- High availability for data as well as indices
Diagram labels: Search, Analytics, Discovery

Integration Issues for Search
- Direct Data Ingestion
  - Support for streaming writes
  - Capability to easily integrate with other Hadoop components
- Snapshots
  - Point-in-time recovery from errors
  - Data versioning for modeling
- Search Index Storage
  - Store index directly on Hadoop instead of local file system
  - Leverage scale, replication and data management
Diagram labels: Enterprise Grade Search, Enterprise-grade Hadoop

Streaming Applications Batch Interactive Real-time Map Reduce File-Based Applications SQL Database Search Stream Processing 99.999% HA Data Protection Disaster Recovery Scalability & Performance Enterprise Integration Multitenancy

Other Distributions Require Separate Clusters Twitter TwitterLogger Twitter API Kafka Kafka Kafka Cluster Kafka Cluster Cluster Kafka API Storm Flume HDFS Data Hadoop Web Data NAS http Web-server

Enterprise Hub with MapR
Diagram labels: Twitter, Twitter API, TwitterLogger, MapR DFS, MapR Spout, Storm, Optional MapReduce
Other Distributions:
- 8+ separate clusters
- 20-25 nodes: 3 Kafka nodes; >2 TwitterLogger; 5-10 Hadoop; >3 Storm; 3 zookeepers; NAS for web storage; >2 web servers
MapR:
- One Platform
- 5-10 nodes total, any node does any job
- Full HA included
- Snapshots included
- Disaster recovery included

Advantages for Enterprise Data Hub
- Enterprise Grade Platform
  - 99.999% HA
  - Full data protection
  - Disaster recovery
- Easiest Integration
  - Industry-standard interfaces: NFS, ODBC, LDAP, REST
  - Streaming writes
- Best ROI
  - Faster time to market
  - Eliminate risk
  - Reuse existing apps and tools

MapR: Open Source + Innovation Management Innovations Management Innovations Apache Hadoop Apache Hadoop Apache Hadoop Infrastructure Innovations

Q & A
Engage with us! @mapr, maprtechnologies, john@maprtech.com, maprtech