A modern, flexible approach to Hadoop implementation incorporating innovations from HP Vertica & IDOL




Gilles Noisette, HP EMEA Big Data CoE
London 2015

Agenda
- Hadoop in the HP Big Data picture
- HP Platforms for Hadoop
- HP Reference Architectures for Hadoop
- HP Big Data Reference Architecture
- HP Haven & Hadoop
  - HP Vertica: Fast analytics on Hadoop
  - HP IDOL: Smart Hadoop Data Lake
  - HP SecureData for Hadoop
  - Trafodion: SQL DBMS on Hadoop
- HP Big Data Services

The HP Haven Big Data Platform: Powering Big Data Analytics to Applications
Turn 100% of your data into action.
Data sources: Human Data, Machine Data, Business Data
Haven Big Data Platform (Insight):
- Haven Enterprise: SQL / BI / Reporting, Predictive Analytics, Machine Learning, Log Analytics, Search, Image / Audio / Video
  Products: HP Vertica, HP IDOL, KeyView, HP Distributed R Predictive Analytics
- Haven OnHadoop: Secure Data Lake, Exploration, Open Data Format, Governance, native support for MapR, Hortonworks & Cloudera
  Products: HP Vertica SQL on Hadoop, HP IDOL for Hadoop
- Haven OnDemand: Open APIs, Rapid POCs & deployment, Elastic / Multi-tenant, Private Cloud-ready, Pay-as-you-go
  Products: HP Vertica OnDemand & HP IDOL OnDemand

HP Big Data platform: Hadoop centric view
- Analytics: HP Vertica
- Data Intelligence: HP IDOL
- Security: HP SecureData
- SQL DBMS: Trafodion (Open Source)
- Hadoop Ecosystem (Open Source)
- HP ProLiant / Converged Infrastructure: DL380, Apollo 4200, Apollo 4530, Moonshot 1500, Network
- Cluster Operation: HP BSM / HP DSM / HP CMU

HP Reference Architectures for Hadoop

HP Reference Architecture(s) for Hadoop
Flexible, pre-approved & optimized configurations
- Scaling from 4 to thousands of HP ProLiant servers
- Sized to customer's workload and storage needs
- Impressive processor and storage density
- A set of pre-tested hardware components: processor, drives, network, 1TB/8TB disk size, etc.
- Breakthrough economics, density, simplicity

HP Apollo 4000 example
- Network switches: HP 5900 10GbE, HP 5930 10GbE x 2
- 3 x DL360 Gen9 head nodes
- 24 x HP ProLiant Apollo 4530 worker nodes

Full-rack figures, in the order the platforms appear on the slide
- DL380: 2.46 PB raw storage, 630 TB Hadoop usable, 756 Xeon E5 cores
- Apollo 4530: 3.5 PB raw storage, 900 TB Hadoop usable, 960 Xeon E5 cores
- Apollo 4200: 4.26 PB raw storage, 1 PB Hadoop usable, 756 Xeon E5 cores
- Moonshot 1500: 1620 Xeon E3 cores, 3240 Linux CPUs
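The raw and usable figures quoted for a full rack can be related with a little arithmetic. The sketch below is only an illustration: it assumes decimal units (1 PB = 1000 TB) and HDFS 3-copy replication, and the helper names are hypothetical. It shows that usable capacity is roughly a quarter of raw capacity, which is what one replica out of three gives once some space is held back for other uses.

```python
# Full-rack (raw TB, Hadoop-usable TB) pairs as quoted on the slide.
# Assumption: decimal units, 1 PB = 1000 TB.
RACK_FIGURES = [
    (2460, 630),
    (3500, 900),
    (4260, 1000),
]

REPLICATION = 3  # assumption: HDFS 3-copy replication


def usable_ratio(raw_tb, usable_tb):
    """Fraction of raw capacity that is quoted as Hadoop usable."""
    return usable_tb / raw_tb


def post_replication_fraction(raw_tb, usable_tb, replication=REPLICATION):
    """Share of the raw capacity accounted for by usable data once every
    block is stored `replication` times; the rest is headroom."""
    return usable_tb * replication / raw_tb


for raw, usable in RACK_FIGURES:
    print(
        f"{raw} TB raw -> {usable} TB usable: "
        f"ratio {usable_ratio(raw, usable):.3f}, "
        f"replicated share {post_replication_fraction(raw, usable):.3f}"
    )
```

On these figures the replicated data accounts for roughly 70% to 77% of raw capacity; what the remainder is reserved for is not stated on the slide.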

HP Apollo 4200 - Bringing Big Data storage server density to enterprise
The enterprise bridge to Big Data - Available June 1, 2015
Storage density / Plug and play / Performance and efficiency
- Leadership storage density: 28 LFF or 50 SFF HDD
- Enterprise bridge: fits traditional enterprise/SME rack server data centers; deploy today, no cost of change
- Configuration flexibility: balanced capacity, performance and throughput with flexible options (disks, CPUs, I/O and interconnects)
Highest storage density in a traditional 2U rack server - 224 TB

HP Apollo 4530 System - Massive density for Hadoop and Big Data Analytics
Purpose-built for Hadoop and Big Data analytics - Available June 1, 2015
Analytics / At scale / Versatile performance
- Hadoop optimized: 3 servers in 4U chassis, ideal for Hadoop-based analytics with 3-copy data replication
- Efficient analytics scaling: up to 30 servers with 15 HDDs/SSDs each and 3.6 PB capacity per 42U rack
- For Big Data variety: customize for Hadoop workload variety and NoSQL analytics with disk, CPU, I/O and interconnect options
Unleash the full value of Big Data with Hadoop
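The density claims for the Apollo 4530 (30 servers with 15 drives each, 3.6 PB per rack) and for the Apollo 4200 (28 LFF drives, 224 TB per server) both imply the same per-drive capacity. A minimal check, assuming decimal units (1 PB = 1000 TB) and a hypothetical helper name:

```python
def implied_drive_tb(total_tb, drives):
    """Per-drive capacity implied by a total capacity and a drive count."""
    return total_tb / drives


# Apollo 4530: servers per 42U rack x drives per server
apollo_4530_drives = 30 * 15

print(apollo_4530_drives)                          # drives per rack
print(implied_drive_tb(3600, apollo_4530_drives))  # TB per drive, Apollo 4530 rack
print(implied_drive_tb(224, 28))                   # TB per drive, Apollo 4200 with LFF drives
```

Both results come out at 8 TB per drive, consistent with the 8TB disk size listed among the pre-tested components.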

You need more than good servers to get a good cluster
It's also about networking and cluster operation

HP Networking
- Network matters for Hadoop clusters
- HP's perfect Top of Rack and Aggregation switch offer
- Hadoop likes the HP Deep Buffer caching feature
- HP IRF simplifies the architecture of server access networks and enables massive scalability
- HP FlexFabric 5930 Switch Series: 32 x 40GbE + 6 x 40G uplink ports; family of high-density, ultra-low-latency aggregation switches
- HP FlexFabric 5900 Switch Series: 48 x 10GbE + 4 x 40GbE ports; family of low-latency Top of Rack (ToR) switches
- 1GbE, 10GbE or 40GbE

HP Insight Cluster Management Utility
- Designed to operate Top500 clusters
- Provision thousands of nodes in minutes
- Monitor clusters of any size (2D instant view, 3D time view)
- Control thousands of servers like one
- Perfectly fits Hadoop cluster operation needs
- Real-time analysis of Hadoop cluster behavior

HP Big Data Reference Architecture

Interesting released Hadoop feature: architecture trends
YARN Node Labelling (Node-labels / JIRA YARN-796)
Capability to create groups of similar nodes, so that applications with different workloads each run on the most appropriate group of nodes.
- Admin tags nodes with labels (e.g. GPU, Storm)
- One node can have more than one label (e.g. GPU, m710)
- Applications can include labels in container requests ("I want a GPU")
Example: an Application Master places containers on Node Manager [Storm], Node Manager [Analytic, XL230a] (HP Apollo 6000 blade) or Node Manager [GPU, m710] (Moonshot cartridge)
Enabling the next generation of Hadoop applications...
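The label mechanism described on this slide can be pictured with a toy model. This is not the YARN API: the `Node` class, the `place` function and the node names are hypothetical, and the matching rule is simplified to "first node that carries the requested label and still has room".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a node manager with admin-assigned labels."""
    name: str
    labels: set = field(default_factory=set)
    free_containers: int = 1


def place(nodes, required_label):
    """Return the name of the first node that carries `required_label`
    and has a free container slot, or None if no node qualifies."""
    for node in nodes:
        if required_label in node.labels and node.free_containers > 0:
            node.free_containers -= 1
            return node.name
    return None


# Labels taken from the slide; node names are made up for illustration.
cluster = [
    Node("nm-storm", {"Storm"}),
    Node("nm-analytic", {"Analytic", "XL230a"}),
    Node("nm-gpu", {"GPU", "m710"}),
]

first = place(cluster, "GPU")      # "I want a GPU"
second = place(cluster, "GPU")     # the only GPU node is now full
third = place(cluster, "XL230a")   # a second label on the analytic node
print(first, second, third)
```

The point of the sketch is only that the request names a capability and the scheduler, not the application, picks the node.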

Interesting released Hadoop features: architecture trends
HDFS Tiering / Heterogeneous Storage Tiers (HDFS-2832)
- For example, HBase can request that its data files (HFiles) be stored on SSD. HBase reads and writes against HDFS then hit SSD, which provides the latency HBase needs to support near-real-time applications.
- Phase 2:
  - HDFS-5682 - Application APIs for heterogeneous storage
  - HDFS-7228 - SSD storage tier
  - HDFS-5851 - Memory as a storage tier (beta)
HDFS Archival Storage Design (HDFS-6584)
- Introduces a new concept of storage policies.
- To accommodate future storage technologies and different cluster characteristics, cluster administrators will be able to modify the predefined storage policies and/or define custom ones.
- Data policy names: Very Hot, Hot, Warm, Lukewarm, Cold
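A storage policy can be thought of as a rule that maps each replica of a block to a storage type. The sketch below is a toy model, not HDFS code. Its policy table is an assumption based on the Hot, Warm, Cold, One_SSD and All_SSD policies as documented for Apache Hadoop, which differ from the draft names on the slide, and the function name is hypothetical.

```python
def replica_storage_types(policy, replication=3):
    """Return the storage type chosen for each replica under `policy`.

    Assumed rules (not taken from the slide):
      Hot     - every replica on DISK
      Warm    - one replica on DISK, the rest on ARCHIVE
      Cold    - every replica on ARCHIVE
      One_SSD - one replica on SSD, the rest on DISK
      All_SSD - every replica on SSD
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    rules = {
        "Hot": ("DISK", "DISK"),
        "Warm": ("DISK", "ARCHIVE"),
        "Cold": ("ARCHIVE", "ARCHIVE"),
        "One_SSD": ("SSD", "DISK"),
        "All_SSD": ("SSD", "SSD"),
    }
    if policy not in rules:
        raise KeyError(f"unknown policy: {policy}")
    first, rest = rules[policy]
    return [first] + [rest] * (replication - 1)


for name in ("Hot", "Warm", "Cold", "One_SSD", "All_SSD"):
    print(name, replica_storage_types(name))
```

Under this model an HBase deployment that wants SSD latency would pick an SSD policy for its HFile directory, while rarely read data would move towards the archive tier.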

New approach to address Big Data demands: modern and flexible

Current traditional Big Data approach
- Compute and storage are always collocated
- All servers are identical
- Data is partitioned across servers on direct-attached storage (DAS)
- Two-socket, 2U servers run everything: YARN applications, HDFS, ORC files, Parquet, HBase, Cassandra

New HP Big Data approach
- Separate compute and storage tiers connected by Ethernet networking
- Standard Hadoop installed asymmetrically, with storage components on the storage servers and YARN applications on the compute servers
- Compute-optimized servers: YARN applications
- Storage-optimized servers: HDFS, ORC files, Parquet, HBase, Cassandra

Benefits of HP Big Data Reference Architecture
HP Moonshot and HP Apollo servers address a variety of enterprise big data needs
Compute: HP Moonshot. Storage: HP Apollo. Interconnect: Ethernet (RoCE)
- Cluster consolidation: multiple big data environments can directly access a shared pool of data
- Flexibility to scale: scale compute and storage independently
- Maximum elasticity: rapidly provision compute without affecting storage
- Breakthrough economics: significantly better density, cost and power through workload-optimized components

HP Apollo and Moonshot - HP Big Data Reference Architecture
Big Data Reference Architecture versus traditional architecture:
- 2X Hadoop MapReduce performance with the same footprint
- 2.5X HBase performance with the same footprint
- 2X higher density
- 20% more memory
- 46% less power (Watts)
Note: Comparison configuration is ProLiant DL380 Gen9 servers

Maximum Elasticity for Big Data workloads
Hadoop Labels feature (JIRA YARN-796): HP contributed IP into the Hadoop trunk
- Specifying labels on nodes allows YARN containers to be scheduled on specific pools of nodes, so admins can target workloads at optimized platforms
- Combined with the HP Big Data Reference Architecture, compute nodes can be dynamically assigned with no data repartitioning
[Diagram: two time windows, 12am to 6am and 6am to 12am, each showing compute divided among Hadoop Cluster 1, Hadoop Cluster 2, Vertica Analytics and Spark on top of the same storage tier]

HP Haven & Hadoop

HP IDOL for Hadoop: To build a smarter data lake

HP Intelligent Data Operating Layer (IDOL): The OS for human information
Single processing layer to handle the continuum of human information
- Connect: access virtually any source of information
- Understand: form an understanding of information, including docs, emails, databases, social media, rich media, etc.
- Act & Automate: over 500 functions to derive actionable insights
Also known as HP Autonomy IDOL

A Smarter Data Lake Needs HP IDOL

Features
- Break down information silos across the enterprise
- Understand myriad file formats and types
- Improved, intuitive visibility into contents
- Automatically analyse rich media

Integration points with Hadoop
- Connectors & Policies
- KeyView + IDOL to Vertica
- IDOL Server (incl. HDFS Sync)
- Image Server & Video Server
- Knowledge Graph
- Advanced Speech-to-Text

HP Vertica SQL on Hadoop: Fast analytics on Hadoop

HP Vertica Analytics Platform 7
High-performance data analytics platform purpose-built for big data - columnar database engine
- Blazing fast analytics: gain insight into your data in near-real time by running queries 50x-1,000x faster than legacy products
- Massive scalability (PBs): infinitely scale your solution by adding an unlimited number of industry-standard servers
- Open architecture: protect and embrace your investment in hardware and software with built-in support for Hadoop, R, and a range of ETL and BI tools
- Optimized data storage: store 10x-30x more data per server than row databases with patented columnar compression
- Load & analyze growing forms of semi-structured data: quickly and easily load, explore and analyze emerging and rapidly growing forms of semi-structured data
- Easy set-up and administration: get to market quickly with your analytics initiatives at low cost of administration and maintenance
Speed, scalability, and openness at lower TCO

HP Vertica Data Storage Options and Performance
HP Vertica SQL on Hadoop

Query engine          Format           File system
Vertica ANSI SQL-99   Vertica ROS      EXT4
Vertica ANSI SQL-99   Vertica ROS      HDFS
Vertica ANSI SQL-99   Hadoop format*   HDFS
Vertica ANSI SQL-99   Flex Tables      HDFS
Vertica ANSI SQL-99   Flat files       HDFS

Rows are ordered from fastest to slowest performance, from analytics to discovery use, and from structured to semi-structured data.
*Supported Hadoop file formats: Parquet, ORC

HP SecureData for Hadoop: To secure your data

HP SecureData: Data-Centric Encryption and Tokenization

Components
- HP SecureData Key Servers
- HP SecureData Central Management Console
- HP SecureData Web Services API
- HP SecureData Command Line and Automated Parsers
- HP SecureData Native APIs (C, Java, C#, .NET)

Capabilities
- HP Stateless Key Management: no key database to store or manage; high performance, unlimited scalability
- Both encryption & tokenization technologies: Format-Preserving Encryption (FPE) for de-identification; Secure Stateless Tokenization (SST) for Payment Card Industry; customize the solution to meet your exact requirements
- Broad platform support: on-premise / cloud / Big Data; structured / unstructured; Linux, Hadoop, Windows, AWS, IBM z/OS, HP NonStop, Teradata, etc.
- Quick time-to-value: complete end-to-end protection within a common platform; format preservation dramatically reduces implementation effort

Sample values shown in the diagram
- FPE: 345-753-5772
- AES: 8juYE%Uks&dDFa2345^WFLERG
- Clear record: First Name: Gunther, Last Name: Robertson, SSN: 934-72-2356, DOB: 20-07-1966
- Protected record: First Name: Uywjlqo, Last Name: Muwruwwbp, SSN: 253-67-2356, DOB: 18-06-1972
- Other ciphertext samples: Ija&3k24kQotugDF2390^32, 0OWioNu2(*872weW, Oiuqwriuweuwr%oIUOw1@
- Credit Card 1234 5678 8765 4321: SST 8736 5533 4678 9453; Partial SST 1234 5633 4678 4321; Obvious SST 1234 56AZ UYTZ 4321
- Tax ID 934-72-2356: SST 347-982-8309; Partial SST 347-982-2356; Obvious SST AZS-UXD-2356
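What "format preservation" means can be shown with a toy transformation: digits are replaced by other digits, separators stay where they are, and the last few digits can be left in the clear, as in the SSN sample on the slide. This is emphatically not HP SecureData's algorithm, not a standardized FPE mode, and not secure. The function names, the key and the HMAC-derived digit shifts are all assumptions made for illustration.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _shifts(key, count):
    """Derive `count` pseudo-random digit shifts from the key (toy keystream)."""
    stream = b""
    counter = 0
    while len(stream) < count:
        block = hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        stream += block
        counter += 1
    return [byte % 10 for byte in stream[:count]]


def protect(value, key, keep_last=0, decrypt=False):
    """Shift each digit modulo 10, leaving separators and the last
    `keep_last` digits untouched. Illustrative only, NOT secure."""
    total_digits = sum(1 for ch in value if ch in DIGITS)
    shifts = _shifts(key, total_digits)
    sign = -1 if decrypt else 1
    out = []
    index = 0  # position among the digits only
    for ch in value:
        if ch in DIGITS:
            if index < total_digits - keep_last:
                ch = str((int(ch) + sign * shifts[index]) % 10)
            index += 1
        out.append(ch)
    return "".join(out)


demo_key = b"demo-key-for-illustration"   # hypothetical key
ssn = "934-72-2356"                       # sample value from the slide
token = protect(ssn, demo_key, keep_last=4)
restored = protect(token, demo_key, keep_last=4, decrypt=True)
print(token, restored)
```

Because the output has the same length, separators and character classes as the input, it fits existing column types and validation rules, which is the implementation saving the slide refers to.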

Options for Securing Data in Hadoop with HP Security Voltage
[Diagram: numbered HP Security Voltage interface points around a Hadoop cluster, covering source applications & data, the landing zone (ETL & batch), HDFS, Hadoop jobs & analytics, the egress zone (ETL & batch), and BI tools & downstream applications. Legend: unprotected data; de-identified data; application with HP Security Voltage interface point; standard application.]

HP Trafodion v1.0.0 (open source since June 2014)
- Forrester, Mike Gualtieri (October 22nd, 2013): "The Future of Hadoop is real time and transactional"
- Doug Cutting (October 30th, 2013): "We're in the middle of a revolution in data processing ... it is inevitable that we will see just about every kind of workload be moved to this platform ... even OnLine Transaction Processing (OLTP)"
Copyright 2013-2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Addresses an under-served Hadoop market segment
Workload spectrum, from sub-second response time to hours:
- Operational (Trafodion focus): real-time insights; SQL DBMS = OLTP + summary BI
- Interactive: parameterized reports, drilldown visualization, exploration
- Non-interactive: data preparation, incremental batch processing, dashboards, scorecards
- Batch: operational batch processing, enterprise reports, data mining
Current market focus: data warehousing and analytics
Trafodion adds value to Hadoop: transaction support, data integrity, real-time performance, operational optimizations, workload management

Trafodion
Trafodion is a joint HP Labs and HP-IT research project to develop operational SQL-on-Hadoop database capabilities.
- Complete: full-function SQL. Reuse existing SQL skills and improve developer productivity
- Protected: distributed ACID transactions. Guarantees data consistency across multiple rows, tables and SQL statements
- Efficient: optimized for low-latency read and write transactions. Supports real-time, high-concurrency transaction processing applications
- Interoperable: standard ODBC/JDBC access. Works with existing tools and applications
- Open: Hadoop and Linux distribution neutral. Easy to add to your existing infrastructure, with no vendor lock-in
Hadoop + Operational SQL
Open source project sponsorship and investment from HP
Production-ready version 1.0 release available at www.trafodion.org

HP Big Data Services

Advisory and Discovery Services for Big Data
- Advisory: our industry and technical experts can support people in technology assessments and strategy development
- Big Data TW (Transformation Workshop): used to define Big Data strategy; workshop format
- Discovery Workshop: used to identify/prioritize use cases; validate functional and technical viability
- Discovery Experience / Discovery Lab: time-boxed engagement to run a pilot; based on use cases from the workshop; run on Haven cloud environment
- Insert a Haven lab in the customer ecosystem: platform, platform management and lab function management (on-premise or cloud)

HP Services for Hadoop: Bringing value to the customer
Technical Services and Analytics Services
- Hadoop Roadmap Service
- Enterprise Design Services
- Advisory & Discovery Services
- Information Management Services
- Hadoop Proof of Concept
- Cluster Implementation Services
- Hadoop Solutions & Applications Development
- Data Science Services
Support/Management Services
- Cluster Support
- Managed Services
- As-a-Service

Summary

HP offers industry-leading capability for Hadoop
- Open systems
- Deep expertise
- Complete support
- Ongoing innovation
- Leading partnerships
- Contribution to Apache community
- Collaboration with Hortonworks
- Full portfolio of consulting services
- Projects
- Moonshot, HP ProLiant Gen9, HP Apollo 4200-4530
- Industry Standard Solutions
- HP Insight CMU, HP BSM, HP DSM
- Global Solution Center
- Haven Big Data Platform: designed for Big Data

Thank You

Learn more about HP Haven: www.hp.com/go/haven
- Solution brochure
- Technical white paper
- HP Vertica SQL on Hadoop FAQ
- Customer analytics use case

HP Big Data Reference Architecture: External Collateral

White papers
- HP Big Data Reference Architecture: A Modern Approach
  http://h20195.www2.hp.com/v2/getdocument.aspx?docname=4aa5-6141enw&cc=us&lc=en
- HP Big Data Reference Architecture: Cloudera Enterprise reference architecture implementation
  http://h20195.www2.hp.com/v2/getdocument.aspx?docname=4aa5-6137enw&cc=us&lc=en
- HP Big Data Reference Architecture: Hortonworks Data Platform reference architecture implementation
  http://h20195.www2.hp.com/v2/getdocument.aspx?docname=4aa5-6136enw&cc=us&lc=en

Blog posts
- HP blog post (from Greg Battas)
  http://h30507.www3.hp.com/t5/hyperscale-computing-blog/the-future-of-big-data-platforms-bringing-order-to-chaos-and/ba-p/178209#.vh91wkpna9i
- Hortonworks blog post
  http://hortonworks.com/blog/want-new-ways-optimize-big-data-workloads/
- Joseph George's blog post (The HP Big Data Reference Architecture: It's Worth Taking a Closer Look)
  http://hp.nu/i20rn
- Silicon Angle blog post
  http://siliconangle.com/blog/2014/12/23/hp-thinks-its-got-a-better-way-to-run-hadoop-hpdiscover/
- Forrester blog post
  http://blogs.forrester.com/richard_fichera/15-01-28-rethinking_analytics_infrastructure

Videos
- Steve Tramack interview on The Cube at Discover
  https://www.youtube.com/watch?v=x2ymmuhzxas&list=plenh213llmcbdrkaihfw9ue9zkxdygkxs

Monitoring Hadoop with HP Insight Cluster Management Utility
[Screenshot: Hadoop worker nodes, timed view. Real-time analysis of Hadoop cluster behavior]