HPE Vertica & Hadoop. Tapping Innovation to Turbocharge Your Big Data. #SeizeTheData



Similar documents
SEIZE THE DATA SEIZE THE DATA. 2015

HDP Hadoop From concept to deployment.

How To Use Hp Vertica Ondemand

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH

The Future of Data Management

Hadoop & Spark Using Amazon EMR

Unified Big Data Processing with Apache Spark. Matei

Moving From Hadoop to Spark

Luncheon Webinar Series May 13, 2013

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

BIG DATA TRENDS AND TECHNOLOGIES

From Spark to Ignition:

HDP Enabling the Modern Data Architecture

The Future of Data Management with Hadoop and the Enterprise Data Hub

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Oracle Big Data SQL Technical Update

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Next-Generation Cloud Analytics with Amazon Redshift

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Upcoming Announcements

Please give me your feedback

Real Time Big Data Processing

Constructing a Data Lake: Hadoop and Oracle Database United!

The Internet of Things and Big Data: Intro

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Why Big Data in the Cloud?

Virtualizing Apache Hadoop. June, 2012

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Hadoop Ecosystem B Y R A H I M A.

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

Information Architecture

The Inside Scoop on Hadoop

Cisco IT Hadoop Journey

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

Building your Big Data Architecture on Amazon Web Services

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Big Data on Microsoft Platform

Bringing Big Data to People

Big Data and Market Surveillance. April 28, 2014

Ganzheitliches Datenmanagement

Advanced In-Database Analytics

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Roadmap Talend : découvrez les futures fonctionnalités de Talend

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader

Comprehensive Analytics on the Hortonworks Data Platform

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Scaling Out With Apache Spark. DTL Meeting Slides based on

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Modernizing Your Data Warehouse for Hadoop

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

#TalendSandbox for Big Data

Big Data Analytics Nokia

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

APPROACHABLE ANALYTICS MAKING SENSE OF DATA

EMC GREENPLUM DATABASE

Trafodion Operational SQL-on-Hadoop

SAP and Hortonworks Reference Architecture

CitusDB Architecture for Real-Time Big Data

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

2015 Ironside Group, Inc. 2

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

INTRODUCING APACHE IGNITE An Apache Incubator Project

Investor Presentation. Second Quarter 2015

Dell In-Memory Appliance for Cloudera Enterprise

How To Handle Big Data With A Data Scientist

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

Big Data at Cloud Scale

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Protecting Big Data Data Protection Solutions for the Business Data Lake

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Analytics on Spark &

The Enterprise Data Hub and The Modern Information Architecture

A Modern Data Architecture with Apache Hadoop

Data Integration Checklist

Big Data & Cloud Computing. Faysal Shaarani

Dominik Wagenknecht Accenture

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Information Builders Mission & Value Proposition

Databricks. A Primer

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Deploying an Operational Data Store Designed for Big Data

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Turning Data Into Answers With HP Vertica

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Oracle Database 12c Plug In. Switch On. Get SMART.

Transcription:

HPE Vertica & Hadoop Tapping Innovation to Turbocharge Your Big Data #SeizeTheData

The HPE Vertica portfolio One Vertica Engine running on Cloud, Bare Metal, or Hadoop Data Nodes HPE Vertica OnDemand & AMI Get up and running quickly in the cloud Amazon AWS Flexible, enterprise-class cloud deployment options Core HPE Vertica SQL Engine Advanced Analytics Cloud, Bare Metal, & Data Lake Open ANSI SQL Standards ++ R, Python, Java, Spark, Scala HPE Vertica Enterprise Columnar storage and advanced compression Maximum performance and scalability Flex Tables for schema-on-read HPE Vertica for SQL on Hadoop Native support for ORC, more Support for industry-leading distributions No helper node or single point of failure 2

Vertica on the Data Lake The same Vertica Engine Now using Open File Formats on Shared Storage Not a Data Mart or EDW 2.0 The Data Lake is Open and Elastic at Enabling Economics ORC, Parquet, AVRO, JSON: Stage, Retain, and use data in original formats, at high velocity ETL Process & BI Tools: Lift & Shift to access Enabling Economics without implementation barriers Leverage the Tools, Training, and Expertise within lines of business ETL / Transformation Throughput Leverage Vertica efficiency and rack aware networking to supercharge legacy ETL Discovery & Ad-Hoc HPE Vertica for SQL on Hadoop Native support for ORC, more Support for industry-leading distributions No helper node or single point of failure Combine data sets which previously were too big to house together cost effectively Understand the value of data without a Procurement Process 3

Vertica Enterprise on Bare Metal The same Vertica Engine Now using Open File Formats on Shared Storage Transition Highly Leveraged Data Sets Traditional Vertica, managing fault tolerant storage on rack servers with SAS disks Enterprise features: backup, replication, data lifecycle management HPE Vertica Enterprise Perfect for BI Tools and Application APIs which Columnar require storage consistent, and advanced predictable compression response Maximum time performance and scalability Flex Tables for schema-on-read HPE Vertica for SQL on Hadoop Native support for ORC, more Support for industry-leading distributions No helper node or single point of failure Agility and Elasticity Discover Value in the Lake / Curate to Production Right Size server count with elastic provisioning 4

Vertica OnDemand & Cloud The same Vertica Engine With the Infrastructure When & Where you need it HPE Vertica OnDemand & AMI Get up and running quickly in the cloud Amazon AWS Flexible, enterprise-class cloud deployment options HPE Vertica Enterprise Columnar storage and advanced compression Maximum performance and scalability Flex Tables for schema-on-read HPE Vertica for SQL on Hadoop Native support for ORC, more Support for industry-leading distributions No helper node or single point of failure Spin up Low Volume Production Discover on Lake / Deploy on cost effective Virtual Private Cloud or Hybrid Cloud Sunset legacy workloads Variety of Supported AMI Images / Machine types appropriate for any workload Where you want it Your data is in S3? No problem. Run Vertica where your data lives Data Governance requirements? No problem. OnDemand available in the geographies you need 5

HPE Vertica for SQL on Hadoop features and benefits Query data, no matter where it is located Install HPE Vertica directly on your Hadoop infrastructure ORC, Parquet, Avro, Vertica ROS and JSON supported Full-functionality ANSI SQL 100% of TPC-DS queries No helper node or single point of failure Competitive price point Analytical Applications R Java Python SQL HPE Vertica Core Engine Store: ROS Ingest: AVRO, JSON, etc. Query: ORC & Parquet 6

HPE Vertica Enterprise Proven Advanced ANSI SQL Analytics Massively Parallel Processing Column Orientation Application Integration Highly Available ANSI SQL ENGINE Columnar Formats (ROS, Flex) ANSI SQL ENGINE Columnar Formats (ROS, Flex) ANSI SQL ENGINE Columnar Formats (ROS, Flex) ANSI SQL ENGINE Columnar Formats (ROS, Flex) Automatic DB Designer File System (EXT 4) File System (EXT 4) File System (EXT 4) File System (EXT 4) Advanced Compression Management Console Commodity Hardware (x86) Commodity Hardware (x86) Commodity Hardware (x86) Commodity Hardware (x86) 7

HPE Vertica for SQL on Hadoop Same Vertica MPP Columnar Architecture Base ANSI SQL ANSI SQL ENGINE ANSI SQL ENGINE ANSI SQL ENGINE ANSI SQL ENGINE Co-Located with Hadoop Data Query Across Parquet, ORC, JSON, and many other formats Hadoop Agnostic Open Formats + ROS/Flex HDFS Open Formats + ROS/Flex HDFS Open Formats + ROS/Flex HDFS Open Formats + ROS/Flex HDFS Commodity Hardware (x86) Commodity Hardware (x86) Commodity Hardware (x86) Commodity Hardware (x86) 8

Relative speed-up VSQLH is reliable and performant TPC-DS performance vs Impala 8x 7x 6x 5x 4x Up to 2x slower Comparable Up to 7x faster 43 queries more! 3x 2x 1x 0x Queries In this industry-standard benchmark: 43/99 queries will not run in other solutions that work unchanged in VSQLOH Some queries run up to 7X faster on HPE SQL on Hadoop Concurrency is better supported Security with Kerberos

New ORCFile Reader Open Source Project jointly developed by Vertica and Hortonworks Engineers Query close to where data resides Column Pruning, Predicate Pushdown Hadoop Data Nodes Hadoop Data Nodes Before VSQLH SP1 After VSQLH SP2 Hcat Connector ORC Reader SerDes External Table WebHCat WebHDFS HDFS ORCFiles HDFS ORCFiles 10

Vertica Management Console Manage multiple clusters from a single web-based console Real-time view of database activity and cluster status Correlate system and database activity Browser Access Vertica Management Console Cluster 1 Cluster 2 Cluster 3

Third-party integrations HPE Vertica HPE Vertica HPE Vertica HPE Vertica ANSI SQL ENGINE ANSI SQL ENGINE ANSI SQL ENGINE ANSI SQL ENGINE

Key differentiators No limits data exploration Richest and proven SQL query engine Dramatically reduce complexity of your data architecture with a time-tested, unified query engine Enterprise-ready, stable, trusted Improve productivity with a robust, highly optimized, enterprise-ready SQL engine and first-class support Simplicity and fast insights in data exploration Leverage our Database Designer to simplify the management of your queries Easy path to Vertica Enterprise Perform data discovery in Hadoop and boost analytics performance with Vertica EE Rich ecosystem

Trial: Take a free test drive http://goo.gl/rp4fzi

HPE Vertica Connector for Apache Spark Beta download at https://saas.hpe.com/marketplace/haven/hpe-vertica-connector-apache-spark Ingest terabytes of Spark in-memory data into HPE Vertica for interactive SQL analytics Operationalize Spark machine learning models using HPE Vertica extensible Big Data platform Ingest Spark streaming data to HPE Vertica for low-latency interactive SQL analytics

HPE Vertica Connector for Apache Spark Operationalize Spark Machine learning models in HPE Vertica HPE Vertica Operationalize Business Problem HPE Vertica Vertica standalone cluster HPE Vertica and Explore Data Monitor Vertica SQL on Hadoop HPE Vertica Model and Evaluate Prepare Data Vertica Cloud Deployments Vertica OnDemand HPE Vertica

HPE Vertica Connector for Apache Spark Apache Kafka + Spark + HPE Vertica for both Batch and Streaming Analytics ETL MapReduce replacement Data Generation OLTP/ODS Logs (Apps, Web, Devices) User tracking Distributed Messaging System Apache Kafka Apache Spark ETL Stream processing HPE Vertica Analytics Platform Analytics / Reporting Operational Metrics

Save Apache Spark RDD and DataFrame in HPE Vertica HPE Vertica native efficient parallel bulk loading Parallel Copy Stream HPE Vertica WORKER NODE Executor RDD Dataframe Node 1 Vertica Table Data Segment Partition Partition Partition Node 2 Vertica Table Data Segment WORKER NODE Executor Partition Partition Node 3 Vertica Table Data Segment Partition

HPE Vertica Native Data Source for Apache Spark Simple and fast HPE Vertica data access in Spark in-memory analytics libraries HPE Vertica Parallel Copy Stream Node 1 Vertica Table Data Segment WORKER NODE Executor RDD Dataframe Partition Partition Node 2 Partition Vertica Table Data Segment WORKER NODE Node 3 Vertica Table Data Segment Executor Partition Partition Partition

Operationalize Predictive Models Deploy Spark ML Models in PMML for in-database scoring ML Lib Node 1 HPE Vertica PMML MODELS Classification models Regression models Clustering models PMML Node 2 PMML MODELS Node 3 PMML MODELS

HPE Vertica Connector for Spark WORKER NODE Executor Store data in RDDs in HPE Vertica native columnar format for highspeed interactive SQL analytics HPE Vertica Node 1 Node 2 WORKER NODE Executor Fast access to HPE Vertica data in Spark RDDs and data frames with filter and predicate push-down Node 3

On-premise data access Streaming Kafka, Trickle (Insert/Update) Schema on read Flex Zone: JSON, CSV, Text, Social Media Batch ODBC/JDBC, Bulk COPY, LCOP, ETL: Pentaho, Attunity, Informatica, Talend, ET AL Unstructured IDOL: Video, Audio, Voice Recognition, Facial Recognition Hadoop ORC Reader, MapR NFS, HIVE Serializer: HDFS, Parquet, AVRO Vertica Cluster 22

Start with the Event Bus Operational Analytics On-the-fly observations, complex events, and heads-up analytics Live Data Streams Web Logs Application Data IOT data Messaging Bus Batch Analytics Publish-Subscribe services for sharing data Longer term analysis, discovery, complex predictions and deep analytics Databases Hadoop Cloud

Take the Insights to Production Operational Analytics Spark, MemSQL, etc. On-the-fly observations, complex events and heads-up analytics Live Data Streams Messaging Bus Kafka. Also Talend ESB, Mulesoft, Informatica, other proprietary solutions Publish-Subscribe services for sharing data Web Logs Application Data IOT data Batch Analytics HPE Vertica Longer term analysis, discovery, complex predictions and deep analytics Databases Hadoop Cloud

Trends & HPE Big Data Platform

Obvious: structured data analytics trends continue Data continues to grow Hardware gets more capable Those who demand the highest levels of performance can generally forecast their capacity demands, and are willing to dedicate hardware will continue to choose high-performance MPP databases Cloud offerings of MPP databases will continue to mature, but will not fully displace dedicated offerings In-memory processing of small big data will continue The market will continue to consolidate

HPE s Big Data portfolio You should consider HPE for all your Big Data needs because: Big Data is everything HPE has something for ALL of your data The HPE portfolio has unmatched breadth vs. any other vendor Hardware Software IDOL Partners & FOSS Services It s all shipping, and it all has customer proof points HPE is a solid company and is committed to Big Data HPE has the technology, people, and partners to get you on track Who has a stronger portfolio?

The data lake paradigm will build Data will first be kept in cheap file storage (HDFS, at worst) It isn t necessarily unstructured, just not yet structured A variety of tools will be available to structure & analyze data Dumb storage isn t enough Unstructured, self-structured, structured, graph, aggregated, all on one platform The variety of software and tools to treat data will continue to expand A pyramid of choices will align costs of keeping and curating the data with the value of the data Enlightenment on democratization will be reached but we are not there yet Collaboration, sharing, etc. Tools will be available to prioritize critical processes & data Resource provisioning Break out dedicated marts, etc. The new one size fits all?

Ingest Store & explore Govern & protect Serve IDOL Connectors (KeyView) Archiving, ECM, Records Management, ControlPoint Virtualization tools ArcSight Data Lake Smarter, Safer Vertica SQL Web portal EL (Structured, PolyStructured) UDX/SDK/API ediscovery CEP/streaming IDOL APIs/ODBC/JDBC/R Vertica connectors Search Hadoop connectors (Flume, Storm, Sqoop, Scribe) R Reports Custom SDK/API Hadoop Tools Dashboard On Premise/Cloud Infrastructure - Hardware, Cloudburst, BackupData Protector

Try HPE Vertica Today! HPE Start-Up Essentials Program Grow your start-up business with Big Data analytics free to companies with <10 employees for the initial years. http://go.saas.hpe.com/bigdata/accelerator HPE Vertica Community Edition Download and install community edition. Manage and analyze up to 1 TB of data across three nodes for an unlimited time. https://my.vertica.com/community HPE Vertica OnDemand & AMI Get up and running with proven Data Warehouse as a Service in minutes. Install your Vertica license on select cloud hardware to create cluster on AWS. Vertica AMI: http://amzn.to/1olve4c VOD: https://saas.hpe.com/buy/vertica-on-demand HPE Vertica for SQL on Hadoop Apply to join the Early Adopter Program for a 90-day trial with premium level technical presales and implementation support directly from Vertica R&D. https://hp.orbitera.com/c2m/customer/testdrives/index

Thank you bryan.whitmore@hpe.com Month day, year #SeizeTheData

#SeizeTheData