HDP Hadoop: From Concept to Deployment




Ankur Gupta, Senior Solutions Engineer, Rackspace
27 January 2015


Poll: Where are you in your Hadoop journey? (66 votes)
A. Researching our options: 40.9%
B. Currently evaluating some software: 7.6%
C. Deep in a trial: 9.1%
D. In production with a Hadoop cluster: 9.1%
E. What's Hadoop?: 33.3%

Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Customer momentum: 230+ customers (as of Q3 2014).
Founded in 2011 with the original 24 architects, developers, and operators of Hadoop from Yahoo!
600+ employees and 800+ ecosystem partners.
Hortonworks Data Platform: a completely open, multi-tenant platform for any app and any data, with a centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
Partner for customer success: open source community leadership with a focus on enterprise needs, and unrivaled world-class support.

Hadoop: A Modern Storage and Data Processing Platform

Traditional systems under pressure
Challenges: they constrain data to the app, can't manage new data, and are costly to scale.
Data volume: 2.8 zettabytes in 2012, projected to reach 40 zettabytes by 2020.
Chart: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs), contrasting industry leaders with laggards.
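The two data-volume figures imply a compound annual growth rate. The quick check below assumes the slide's endpoints (2.8 ZB in 2012 and 40 ZB in 2020) and eight years of compounding:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` steps."""
    return (end / start) ** (1 / years) - 1

# Endpoints quoted on the slide: 2.8 ZB (2012) and a projected 40 ZB (2020).
rate = cagr(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")  # prints: Implied growth: 39.4% per year
```

In other words, the projection assumes the world's data grows by roughly 39% every year over that period.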

Modern Data Architecture emerges to unify data and processing
Modern Data Architecture: enable applications to access all your enterprise data through an efficient, centralized platform.
It is supported by a centralized approach to governance, security, and operations.
It is versatile enough to handle any application and dataset, no matter the size or type.
Diagram: analytics (data marts, business analytics, visualization and dashboards, data applications) sit on top. Existing MPP and EDW systems sit alongside HDP, where YARN (the data operating system) runs batch, interactive, and real-time workloads, including partner ISV engines, over HDFS (Hadoop Distributed File System). Sources feed in from existing systems (ERP, CRM, SCM) and new sources (clickstream, web and social, geolocation, sensor and machine, server logs, unstructured).
Schema on read. Complements rather than replaces existing systems.
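Schema on read means raw records are stored exactly as they arrive, and a structure is applied only when a query reads them. The minimal, self-contained Python sketch below illustrates the idea. The records and field names are hypothetical. In HDP, this role is played by engines such as Hive reading files in HDFS, not by code like this.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw data is landed as-is: nothing is validated or parsed at write time.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": "120"}',
    '{"user": "u2", "page": "/cart"}',  # a missing field is tolerated on write
    '{"user": "u1", "page": "/checkout", "ms": "340", "ref": "email"}',
]

# A schema maps each field name to a function that casts the raw value.
Schema = Dict[str, Callable[[Any], Any]]

def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema at read time: project the requested fields and cast them.

    Fields absent from a record become None; fields not in the schema are ignored.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }

# Two different "views" over the same stored records.
latency_view: Schema = {"user": str, "ms": int}
referral_view: Schema = {"page": str, "ref": str}

for row in read_with_schema(RAW_EVENTS, latency_view):
    print(row)
```

Because no schema is fixed at load time, a new analysis only needs a new read-time schema, not a reload of the data.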

HDP IS Apache Hadoop
There is ONE Enterprise Hadoop: everything else is a vendor derivation.
Hortonworks Data Platform 2.2 component versions*, listed newest first (HDP 2.2, October 2014; HDP 2.1, April 2014; HDP 2.0, October 2013):

Data Management
  Hadoop & YARN        2.6.0, 2.4.0, 2.2.0
Data Access
  Pig                  0.14.0, 0.12.1, 0.12.0
  Hive & HCatalog      0.14.0, 0.13.0, 0.12.0
  HBase                0.98.4, 0.98.0, 0.96.1
  Phoenix              4.2, 4.0.0
  Accumulo             1.6.1, 1.5.1
  Storm                0.9.3, 0.9.1
  Spark                1.2.0
  Solr                 4.10.0, 4.7.2
  Tez                  0.6.0, 0.4.0
  Slider               0.5.1
Governance & Integration
  Falcon               0.6.0, 0.5.0
  Kafka                0.8.1
  Sqoop                1.4.5, 1.4.4
  Flume                1.5.0, 1.4.0, 1.3.1
Operations
  Ambari               1.7.0, 1.5.1, 1.4.4
  Oozie                4.1.0, 4.0.0, 3.3.2
  ZooKeeper            3.4.5, 3.4.5
Security
  Knox                 0.5.0, 0.4.0
  Ranger               0.4.0

* Version numbers are targets and subject to change at the time of general availability, in accordance with the ASF release process.

HDP delivers a comprehensive data management platform
Hortonworks Data Platform 2.2: YARN is the architectural center of HDP.
Governance and integration (data workflow, lifecycle, and governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS.
Batch, interactive, and real-time data access, all running on YARN, the data operating system (cluster resource management):
  Script: Pig on Tez
  SQL: Hive on Tez
  Java/Scala: Cascading on Tez
  NoSQL: HBase and Accumulo on Slider
  Stream: Storm on Slider
  In-memory: Spark
  Search: Solr
  Others: ISV engines
Security: authentication, authorization, accounting, and data protection across storage (HDFS), resources (YARN), access (Hive), pipeline (Falcon), and cluster (Knox, Ranger).
Operations: provision, manage, and monitor with Ambari and ZooKeeper; scheduling with Oozie.
Storage: HDFS (Hadoop Distributed File System).
The platform enables batch, interactive, and real-time workloads and provides comprehensive enterprise capabilities.
Deployment choice: Linux or Windows, on-premises or cloud. It offers the widest range of deployment options and is delivered completely in the open.
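WebHDFS, listed above as an ingestion path, exposes HDFS over REST at `http://<host>:<port>/webhdfs/v1/<path>?op=...`. The small Python helper below only builds request URLs and makes no network calls. It assumes a hypothetical NameNode host and user, plus the Hadoop 2.x default NameNode HTTP port of 50070; check your cluster's actual port.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host: str, path: str, op: str, port: int = 50070, **params: str) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&...

    50070 is the Hadoop 2.x default NameNode HTTP port; adjust for your cluster.
    Extra keyword arguments (e.g. "user.name") become query parameters.
    """
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = urlencode({"op": op.upper(), **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

# Hypothetical NameNode host, directory, and user.
url = webhdfs_url("namenode.example.com", "/data/raw/logs", "liststatus", **{"user.name": "etl"})
print(url)
```

An HTTP GET on a `LISTSTATUS` URL returns a JSON `FileStatuses` object describing the directory. The `user.name` parameter applies to simple authentication; Kerberos-secured clusters authenticate with SPNEGO instead.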

Hadoop adoption follows a predictable journey: from cost optimization, to new analytic apps, and ultimately to a data lake.

Hadoop Driver: Cost optimization
HDP helps you reduce costs and optimize the value associated with your EDW.
Archive data off the EDW: move rarely used data to Hadoop as an active archive, and store more data for longer.
Offload costly ETL processes: free your EDW to perform high-value functions like analytics and operations, not ETL.
Enrich the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context.
Diagram: analytics (data marts, business analytics, visualization and dashboards) are served by data systems in which MPP, in-memory, and the enterprise data warehouse hold hot data, while HDP 2.2 holds cold data, the deeper archive, and new sources (ELT). Sources are existing systems (ERP, CRM, SCM) plus clickstream, web and social, geolocation, sensor and machine, server logs, and unstructured data.
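"Move rarely used data to Hadoop as an active archive" needs some rule for what counts as rarely used. The sketch below shows one simple, hypothetical policy: any EDW partition not read within a cutoff window becomes an archive candidate. The partition names, sizes, dates, and the 90-day threshold are all illustrative assumptions, not part of HDP.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

@dataclass
class Partition:
    name: str            # hypothetical EDW partition name
    size_gb: float       # storage it occupies in the EDW
    last_accessed: date  # most recent query against it

def plan_archive(partitions: List[Partition], today: date,
                 cold_after_days: int = 90) -> Tuple[List[str], float]:
    """Select partitions not read within `cold_after_days` as archive candidates.

    Returns the candidate names and the EDW capacity (GB) moving them would free.
    """
    cutoff = today - timedelta(days=cold_after_days)
    cold = [p for p in partitions if p.last_accessed < cutoff]
    return [p.name for p in cold], sum(p.size_gb for p in cold)

parts = [
    Partition("sales_2012", 500.0, date(2014, 1, 10)),
    Partition("sales_2013", 650.0, date(2014, 6, 1)),
    Partition("sales_2014", 800.0, date(2015, 1, 20)),
]
names, freed = plan_archive(parts, today=date(2015, 1, 27))
print(names, freed)
```

Because the archive stays queryable in Hadoop, the threshold can be tuned aggressively without losing access to the data.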

Financial Drivers
Hadoop enables scalable compute and storage at a compelling cost structure.
Cost efficiencies: reduce costs associated with expensive archive systems, utilize existing relationships with hardware vendors, and use open source software.
Active archive: provide access to archived data. It's not there to collect dust.
Chart: fully loaded cost per raw TB of data (minimum to maximum cost) for cloud storage, Hadoop, NAS, engineered systems, MPP, and SAN. Storage and compute costs range from $19/GB down to $0.23/GB.
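The conversion below restates the slide's two per-GB endpoints as per-TB totals. It assumes decimal units (1 TB = 1,000 GB) and treats both quoted rates as fully loaded costs. The 100 TB volume is an arbitrary illustration.

```python
GB_PER_TB = 1000  # decimal units; use 1024 if your vendor quotes binary terabytes

def storage_cost(tb: float, usd_per_gb: float) -> float:
    """Total cost of storing `tb` terabytes at a flat per-GB rate."""
    return tb * GB_PER_TB * usd_per_gb

high, low = 19.00, 0.23  # $/GB endpoints quoted on the slide

for tb in (1, 100):
    print(f"{tb:>4} TB: ${storage_cost(tb, high):>14,.2f} vs ${storage_cost(tb, low):>12,.2f}")
print(f"Cost ratio: {high / low:.1f}x")
```

At these endpoints, the same data costs about 83 times more at the high end of the range than at the low end.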

Hadoop Driver: Today's Data Architectures Inhibit a Single View
1. Data silos: disparate views of each customer.
2. Volume limitations: cannot store and process all customer data in the EDW.
3. New data sources: unable to capture and use new data to complete the view.
Diagram: analytics (apps, data marts, visualization) sit over data systems (the enterprise data warehouse), fed by systems of record (RDBMS, CRM, ODS) and new sources (clickstream, web and social, geolocation, sensor and machine, server logs, unstructured).

Hadoop Driver: Single View: Consolidating the Silos
HDP provides a centralized architecture for any application and any data.
Single data repository: resolve customer data across repositories, and provide analysts with a single view of the data.
Optimized storage: eliminate unnecessary silos to reduce costs, and store more data about each customer.
Analytical flexibility: dynamic schema on read removes the limitations of other single-view applications.
Diagram: the modern data architecture, with YARN running batch, interactive, and real-time workloads over HDFS alongside MPP and EDW systems, fed by existing systems (ERP, CRM, SCM) and new sources (clickstream, web and social, geolocation, sensor and machine, server logs, unstructured).
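"Resolve customer data across repositories" can be sketched as a key-based merge. The minimal Python sketch below assumes, hypothetically, that a normalized email address is a reliable match key; real entity resolution usually needs fuzzier matching. The source names and records are illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, str]

def normalize_key(email: str) -> str:
    """Hypothetical match key: email compared without case or surrounding spaces."""
    return email.strip().lower()

def single_view(sources: Iterable[Tuple[str, List[Record]]]) -> Dict[str, Dict[str, Any]]:
    """Merge customer records from several silos into one profile per customer.

    The first source to supply an attribute wins; later sources only fill gaps.
    Each profile also lists the sources that contributed to it.
    """
    profiles: Dict[str, Dict[str, Any]] = defaultdict(lambda: {"sources": []})
    for source_name, records in sources:
        for rec in records:
            profile = profiles[normalize_key(rec["email"])]
            profile["sources"].append(source_name)
            for field, value in rec.items():
                if field != "email":
                    profile.setdefault(field, value)
    return dict(profiles)

crm = [{"email": "Ana@Example.com", "name": "Ana Diaz", "segment": "gold"}]
erp = [{"email": "ana@example.com ", "last_order": "2015-01-12"}]
web = [{"email": "ana@example.com", "last_page": "/pricing"},
       {"email": "bo@example.com", "last_page": "/home"}]

view = single_view([("crm", crm), ("erp", erp), ("clickstream", web)])
print(view["ana@example.com"])
```

Keeping the list of contributing sources lets analysts trace each attribute of the single view back to its silo.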

Hadoop Driver: Today's Data Architectures Limit Predictive Capabilities
1. Data silos: difficult to find predictive correlations.
2. Data volumes: cannot store enough data to find patterns.
3. New data sources: unable to capture and use new data for real-time analysis.
Diagram: analytics (data marts, business analytics, visualization and dashboards) sit over data systems (MPP, in-memory, and the enterprise data warehouse holding hot data), fed by systems of record (RDBMS, CRM, ERP) and new sources (clickstream, web and social, geolocation, sensor and machine, server logs, unstructured).

Hadoop Driver: Predictive Analytics: Capture Opportunity with HDP
Future-state analysis: capture and combine large data sets; understand patterns, model outcomes, and forecast accurately to guide action.
Flow: existing and new data (RDBMS, MPP, EDW, other) → HDP 2.2 (consolidate data, run iterative analytics) → predictive insight.
Real-time insight: analyze streaming data from new sources on the fly, and deliver timely insights to the right people and systems to take action.
Flow: streaming data → HDP 2.2 (process in real time, store to HDFS) → actionable insight.
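The real-time path ("process in real time, store to HDFS") does two things with each event: it analyzes the event on the fly and keeps it for later batch analysis. The plain-Python sketch below uses a sliding-window average with a threshold alert. The readings, window size, and threshold are hypothetical, and an in-memory list stands in for HDFS. In HDP, the streaming step would run on an engine such as Storm, not on a loop like this.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Event = Tuple[int, float]  # (timestamp, reading)

def process_stream(readings: Iterable[Event], window: int = 3,
                   threshold: float = 80.0) -> Tuple[List[Event], List[Event]]:
    """Analyze readings as they arrive and keep every raw event.

    An alert (timestamp, moving average) is emitted whenever the average of the
    last `window` readings exceeds `threshold`. The returned archive list stands
    in for raw events written to HDFS for later iterative analysis.
    """
    recent: Deque[float] = deque(maxlen=window)
    alerts: List[Event] = []
    archive: List[Event] = []
    for ts, value in readings:
        archive.append((ts, value))  # simulated "store to HDFS"
        recent.append(value)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg > threshold:
                alerts.append((ts, round(avg, 2)))
    return alerts, archive

readings = [(1, 70.0), (2, 75.0), (3, 82.0), (4, 88.0), (5, 90.0), (6, 60.0)]
alerts, archive = process_stream(readings)
print(alerts)
```

Keeping the raw events alongside the alerts is what lets the batch path later re-run the same data through different models.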

New requirements to shift from reactive to proactive
From "break, then fix" to preventative maintenance.
From static resource planning to resource optimization.
From reacting to human activity to behavioral insight.
New requirements: analyze extremely large data sets to find patterns; combine disparate data and process it in multiple ways; capture unstructured data from a variety of new sources; and perform real-time analysis on streaming data.

Hadoop Driver 3: Today's Data Architectures Limit Data Discovery
1. Data silos: insights are missed because data is isolated.
2. Data volumes: data that has value is thrown away.
3. New data sources: unable to mine new data sources.
Diagram: analytics (data marts, business analytics, visualization and dashboards) sit over data systems (the enterprise data warehouse), fed by systems of record (RDBMS, CRM, ERP) and new sources (clickstream, web and social, geolocation, sensor and machine, server logs, unstructured).

Hadoop Driver 3: Data Discovery: Unlock New Insights with HDP
HDP provides a centralized architecture for any application and any data.
Combine: combine data from many systems and of many different types, and take advantage of schema on read to define new analyses and seek new answers.
Explore: explore large volumes of data together in their many forms, and answer a wide range of questions by applying multiple processing techniques.
Diagram: the modern data architecture, with YARN running batch, interactive, and real-time workloads over HDFS alongside MPP and EDW systems, fed by existing systems (ERP, CRM, SCM) and new sources (clickstream, web and social, geolocation, sensor and machine, server logs, unstructured).

Hadoop Driver: Enabling the data lake
Journey to the data lake with Hadoop: as scale and scope grow, adoption moves from the initial drivers (1. cost optimization, 2. advanced analytic apps) toward the goal of a centralized architecture. That goal is a data lake that supports systems of insight and a data-driven business.
Data lake definition:
Centralized architecture: multiple applications on a shared data set with consistent levels of service.
Any app, any data: multiple applications accessing all data, affording new insights and opportunities.
Unlocks systems of insight: advanced algorithms and applications used to derive new value and optimize existing value.
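One way YARN gives "multiple applications on a shared data set" consistent levels of service is with CapacityScheduler queues, configured in `capacity-scheduler.xml`. The sketch below generates those properties for queues directly under `root`. The queue names and percentage shares are hypothetical; the property names follow the Hadoop CapacityScheduler conventions. Sibling queue capacities must add up to 100.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape

def capacity_scheduler_properties(shares: Dict[str, float],
                                  max_caps: Optional[Dict[str, float]] = None) -> Dict[str, str]:
    """Build CapacityScheduler properties for queues directly under root.

    `shares` maps queue name to guaranteed capacity (percent of the cluster);
    `max_caps` optionally lets a queue borrow idle capacity up to a ceiling.
    """
    total = sum(shares.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(shares)}
    for queue, share in shares.items():
        props[f"yarn.scheduler.capacity.root.{queue}.capacity"] = f"{share:g}"
        if max_caps and queue in max_caps:
            props[f"yarn.scheduler.capacity.root.{queue}.maximum-capacity"] = f"{max_caps[queue]:g}"
    return props

def to_xml(props: Dict[str, str]) -> str:
    """Render properties in the Hadoop <configuration> XML layout."""
    body = "\n".join(
        f"  <property>\n    <name>{escape(k)}</name>\n    <value>{escape(v)}</value>\n  </property>"
        for k, v in props.items()
    )
    return f"<configuration>\n{body}\n</configuration>"

# Hypothetical tenants sharing one cluster.
props = capacity_scheduler_properties({"etl": 50, "analytics": 30, "streaming": 20},
                                      max_caps={"analytics": 60})
print(to_xml(props))
```

Guaranteed shares keep one tenant's batch jobs from starving another's interactive queries, while `maximum-capacity` lets a queue borrow idle resources.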

Deployment

Deployment Options
1. Online IaaS: Rackspace Managed Big Data
2. Online elastic service: Rackspace Cloud Big Data
3. Laptop: Sandbox, a single-node Hadoop distribution: http://hortonworks.com/products/hortonworks-sandbox/
4. On-premises: HDP, the complete Hadoop distribution: http://docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.2.0/hdp_man_install_v22/index.html

Summary: Any Data, Any Application, Anywhere
Any data: deploy applications fueled by clickstream, sensor, social, mobile, geolocation, server log, and other new-paradigm datasets alongside existing legacy datasets (ERP, CRM, SCM, files, emails).
Any application: deep integration with ecosystem partners extends existing investments and skills. Over 70 Hortonworks Certified YARN apps provide the broadest set of applications through the stable of YARN-Ready applications.
Anywhere: implement HDP naturally across the complete range of deployment options: commodity, appliance, cloud, and hybrid.

Questions?