Hadoop in the Enterprise

Modern Architecture with Hadoop 2
Jeff Markham, Technical Director, APAC, Hortonworks

Hadoop Wave ONE: Web-Scale Batch Apps (2006 to 2012)
[Chart: technology adoption curve, relative % of customers over time. Segments: innovators (technology enthusiasts), early adopters (visionaries), the chasm, early majority (pragmatists), late majority (conservatives), laggards (skeptics). Customers before the chasm want technology and performance; customers after it want solutions and convenience. Source: Geoffrey Moore, Crossing the Chasm.]

Hadoop Wave TWO: Broad Enterprise Apps (2013 and beyond)
Batch, interactive, online, streaming, and more.
[Chart: the same adoption curve, from innovators and early adopters, across the chasm, to the early majority, late majority, and laggards. Customers before the chasm want technology and performance; customers after it want solutions and convenience. Source: Geoffrey Moore, Crossing the Chasm.]

Hadoop 2.0 Key Highlights
Architected for the broad enterprise: a single cluster running many workloads (batch, interactive, online, streaming).

Enterprise requirement      HDP 2.0 feature
Mixed workloads             YARN
Interactive query           Hive on Tez
Reliability                 Full-stack HA
Point-in-time recovery      Snapshots
Multi-data-center           Disaster recovery
Zero downtime               Rolling upgrades

The 1st Generation of Hadoop: Batch
Hadoop 1.0 was built for web-scale batch apps. All other usage patterns (interactive, online) must leverage that same batch infrastructure, which forces the creation of silos for managing mixed workloads.
[Diagram: separate silos, each running a single app (interactive, online, or batch) on its own HDFS.]

A Transition From Hadoop 1 to 2
HADOOP 1.0: MapReduce (cluster resource management & data processing) runs on HDFS (redundant, reliable storage).
HADOOP 2.0: MapReduce (data processing) and other data processing frameworks run on YARN (cluster resource management), which runs on HDFS (redundant, reliable storage).

The Enterprise Requirement: Beyond Batch
To become an enterprise-viable data platform, customers have told us they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS, simultaneously and with predictable levels of service: batch, interactive, online, streaming, graph, in-memory, HPC MPI, and other workloads, all over HDFS (redundant, reliable storage).

YARN: Taking Hadoop Beyond Batch
- Created to manage resource needs across all uses.
- Ensures predictable performance and QoS for all apps.
- Enables apps to run IN Hadoop rather than ON it.
- Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
Applications run natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (Storm, S4, ...), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), OTHER (Search, Weave, ...), all on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).

Old School Hadoop: MapReduce
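
In Hadoop 1, MapReduce was both the programming model and the cluster's resource manager. As a reminder of what the model itself does, here is a minimal single-process sketch of word count in Python, assuming in-memory input. The function names are hypothetical and this is not Hadoop's Java MapReduce API; on a real cluster each phase runs as distributed tasks and the shuffle moves data between nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group intermediate values by key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```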

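New School Hadoop with YARN
[Diagram: clients submit jobs to the Resource Manager (job submission); Node Managers report node status to the Resource Manager; each application's App Master runs on a Node Manager, sends resource requests to the Resource Manager, and receives MapReduce status from the containers running its tasks on the Node Managers.]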
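
The interaction among the components named on this slide (client, Resource Manager, Node Manager, App Master, containers) can be modeled as a toy, single-process Python sketch. It assumes memory is the only resource and uses simple first-fit placement; the class and method names are hypothetical and do not reflect YARN's actual Java APIs, its pluggable schedulers, heartbeats, or data locality.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    node: str
    memory_mb: int


@dataclass
class NodeManager:
    name: str
    free_mb: int  # capacity this node reports to the Resource Manager


@dataclass
class ResourceManager:
    nodes: List[NodeManager] = field(default_factory=list)

    def register(self, node: NodeManager) -> None:
        """Node status: a Node Manager announces its capacity."""
        self.nodes.append(node)

    def allocate(self, count: int, memory_mb: int) -> List[Container]:
        """Resource request: grant up to `count` containers, first fit by node."""
        granted: List[Container] = []
        for node in self.nodes:
            while len(granted) < count and node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                granted.append(Container(node.name, memory_mb))
        return granted

    def submit_application(self, app_id: str, am_memory_mb: int) -> "ApplicationMaster":
        """Job submission: the RM starts the app's master in its own container."""
        am_containers = self.allocate(1, am_memory_mb)
        if not am_containers:
            raise RuntimeError(f"no capacity to start the master for {app_id}")
        return ApplicationMaster(app_id, self, am_containers[0])


@dataclass
class ApplicationMaster:
    app_id: str
    rm: ResourceManager
    home: Container  # the container the master itself runs in

    def request_containers(self, count: int, memory_mb: int) -> List[Container]:
        # The per-application master, not the RM, decides how many task
        # containers it needs; the RM only arbitrates cluster capacity.
        return self.rm.allocate(count, memory_mb)
```

The design point this mirrors is the Hadoop 2 split: the Resource Manager only tracks and hands out capacity, while framework-specific logic (MapReduce or otherwise) lives in each application's master.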

5 Key Benefits of YARN
1. Scale!
2. Compatibility with MapReduce
3. Improved cluster utilization
4. New programming models
5. Agility
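
Benefit 3 (improved cluster utilization) follows largely from replacing Hadoop 1's fixed, typed map and reduce slots with generic containers. A back-of-the-envelope Python sketch, assuming a single node, equally sized containers, and made-up slot counts chosen only for illustration:

```python
def hadoop1_concurrency(map_tasks: int, reduce_tasks: int,
                        map_slots: int, reduce_slots: int) -> int:
    """Hadoop 1: slots are typed, so idle reduce slots cannot run map tasks."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def yarn_concurrency(map_tasks: int, reduce_tasks: int, containers: int) -> int:
    """YARN (simplified): any free container can host any task type."""
    return min(map_tasks + reduce_tasks, containers)


def utilization(running: int, capacity: int) -> float:
    return running / capacity


if __name__ == "__main__":
    # Hypothetical node: 8 map slots + 4 reduce slots, or 12 equal-sized containers.
    maps, reduces = 20, 0  # a map-heavy phase with no reducers running yet
    h1 = hadoop1_concurrency(maps, reduces, map_slots=8, reduce_slots=4)
    y = yarn_concurrency(maps, reduces, containers=12)
    print(f"Hadoop 1: {h1} tasks running, {utilization(h1, 12):.0%} busy")
    print(f"YARN:     {y} tasks running, {utilization(y, 12):.0%} busy")
```

In this toy case the four reduce slots sit idle under Hadoop 1 while map tasks wait, whereas the container model lets all twelve units of capacity run map tasks.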

Apache Tez
An alternative data processing framework to MapReduce that improves the performance of low-latency applications.
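
One way Tez reduces latency is by running a multi-stage query as a single DAG of processing vertices instead of a chain of separate MapReduce jobs, each of which writes its output to HDFS for the next job to read. A simplified Python model, assuming a hypothetical query plan and treating every shuffle vertex as the reduce side of its own MapReduce job (real query planners can merge stages or perform some joins map-side):

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical query plan: vertex -> the vertices it consumes from.
PLAN: Dict[str, Set[str]] = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "group_by": {"join"},
    "order_by": {"group_by"},
}
# Vertices that need a shuffle (data redistributed by key).
SHUFFLE_VERTICES = {"join", "group_by", "order_by"}


def execution_order(plan: Dict[str, Set[str]]) -> List[str]:
    """A valid schedule: every vertex runs after all of its inputs."""
    return list(TopologicalSorter(plan).static_order())


def hdfs_writes(plan: Dict[str, Set[str]], shuffles: Set[str], engine: str) -> int:
    """Count how many times results are materialized to HDFS."""
    if engine == "mapreduce":
        # One MR job per shuffle stage; every job persists its output to HDFS.
        return sum(1 for vertex in plan if vertex in shuffles)
    if engine == "tez":
        # One DAG: intermediate edges move data between vertices without an
        # HDFS round trip; only sink vertices (no consumers) write to HDFS.
        consumed = set().union(*plan.values())
        return len(set(plan) - consumed)
    raise ValueError(f"unknown engine: {engine}")
```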

SQL-IN-Hadoop with Apache Hive
[Diagram: business analytics tools and custom apps issue SQL to Hive, which runs on MapReduce or Tez over YARN and HDFS2.]
- Apache Hive: first application to use YARN. Hive on Tez optimizes resource use for Hive queries to improve performance.
- Apache Hive is the standard for SQL interaction in Hadoop (most applications claim Hive compatibility today).
- Apache Tez: a general-purpose processing framework, optimized for YARN, for existing Hadoop applications.
Stinger Initiative: a simple focus on two goals.
1. 100x performance improvement, to enable Hive to support interactive workloads.
2. Increased SQL compatibility, to improve existing tools and preserve investments.
Stinger Phase 1: base optimizations, SQL analytics, ORCFile format.
Stinger Phase 2: YARN resource management, Hive on Apache Tez, query service (always on).
Stinger Phase 3: vector query, buffer cache, query planner.
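
The "vector query" item in Stinger Phase 3 refers to vectorized execution: operators process a batch of rows stored column-wise, rather than one row per call. The Python sketch below shows only the shape of the idea, assuming a hypothetical two-column table and a batch size of 1,024 rows chosen for illustration; pure Python will not show the speedup, which in Hive comes from tight Java loops over primitive column arrays.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, int]  # (price_cents, quantity) in a hypothetical table
BATCH_SIZE = 1024      # assumption: batch size used for illustration


@dataclass
class ColumnBatch:
    """A batch of rows stored column-wise."""
    price: List[int]
    qty: List[int]
    size: int


def revenue_row_at_a_time(rows: Sequence[Row], min_qty: int) -> int:
    # SUM(price * qty) WHERE qty >= min_qty, evaluated one row at a time.
    total = 0
    for price, qty in rows:
        if qty >= min_qty:
            total += price * qty
    return total


def to_batches(rows: Sequence[Row], batch_size: int = BATCH_SIZE) -> Iterator[ColumnBatch]:
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield ColumnBatch([p for p, _ in chunk], [q for _, q in chunk], len(chunk))


def revenue_vectorized(rows: Sequence[Row], min_qty: int) -> int:
    total = 0
    for batch in to_batches(rows):
        # Filter: one pass over the qty column builds a selection vector.
        selected = [i for i in range(batch.size) if batch.qty[i] >= min_qty]
        # Project + aggregate: one pass over the selected positions only.
        total += sum(batch.price[i] * batch.qty[i] for i in selected)
    return total
```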

Hive: More SQL & 100X Faster
Stinger Phase 1 (done in Hive 0.11): base optimizations, SQL analytics, ORCFile format.
Stinger Phase 2 (we are here): YARN resource management, Hive on Apache Tez, query service.
Stinger Phase 3 (work started): vector query, buffer cache, query planner.
SQL compliance highlights: ROLLUP and CUBE; windowing functions (OVER, RANK, etc.); DECIMAL, CHAR, VARCHAR, and DATE types; UNION DISTINCT and UNION outside of a subquery; subqueries for IN/NOT IN and HAVING; EXISTS/NOT EXISTS; INTERSECT, EXCEPT.
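
To illustrate one of the listed windowing functions: RANK() gives tied rows the same rank and then skips ahead, so ranks can have gaps. A small Python sketch of what a query such as SELECT dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary) FROM emp computes, assuming a hypothetical emp table held as a list of dicts and ascending order:

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def rank_over(rows: List[Row], partition_by: str, order_by: str) -> List[Row]:
    """Emulate RANK() OVER (PARTITION BY partition_by ORDER BY order_by)."""
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    result: List[Row] = []
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        rank = 0
        previous = object()  # sentinel that never equals a real value
        for position, row in enumerate(partition, start=1):
            if row[order_by] != previous:
                rank = position  # a new value takes its 1-based position
                previous = row[order_by]
            result.append({**row, "rank": rank})  # ties keep the earlier rank
    return result


if __name__ == "__main__":
    emp = [
        {"dept": "a", "salary": 100},
        {"dept": "a", "salary": 90},
        {"dept": "a", "salary": 100},
        {"dept": "a", "salary": 110},
        {"dept": "b", "salary": 50},
    ]
    for row in rank_over(emp, "dept", "salary"):
        print(row)
```

Swapping the `rank = position` rule for a counter that increments only on a new value would give DENSE_RANK() semantics instead.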

Hive's Performance Trajectory: http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/

Making Hadoop Enterprise Ready

Thank You! http://hortonworks.com/sandbox