Hortonworks Data Platform for Hadoop and SAP HANA
  Prasad Illapani, Big Data & SAP HANA Product Management & Strategy, SAP Labs LLC, Bellevue, WA
  Bob Page, VP Partner Products, Hortonworks Inc., Palo Alto, CA
Legal Disclaimer
  The information in this document is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP's strategy and possible future developments, product and/or platform directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this document is not a commitment, promise or legal obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or through gross negligence. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
Agenda
  Hortonworks Data Platform - HDP 2.1 Enterprise Hadoop
  SAP HANA Platform for Big Data
  SAP HANA Platform and Hortonworks Data Platform - Solution Use Cases and Patterns
  Key Takeaways
A Modern Data Architecture
  Applications: Business Analytics, Custom Applications, Packaged Applications
  Dev & Data Tools: Build & Test
  Data System: Repositories (RDBMS, EDW, MPP); Enterprise Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations)
  Operations Tools: Provision, Manage & Monitor
  Sources: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
HDP 2.1: Enterprise Hadoop
  HDP 2.1 Hortonworks Data Platform
  Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
  Data Access: Batch (MapReduce); Script (Pig); SQL (Hive/Tez, HCatalog); NoSQL (HBase, Accumulo); Stream (Storm); Search (Solr); Others (In-Memory Analytics, ISV engines)
  Data Management: YARN (Data Operating System); HDFS (Hadoop Distributed File System)
  Security: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Knox
  Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
  Deployment Choice: Linux, Windows, On-Premise, Cloud
HDP: Open, Reliable, & Current
  HDP certifies the most recent and stable community innovation.
  Releases charted: HDP 1.3 (May 2013), HDP 2.0 (October 2013), HDP 2.1 (April 2014)
  Components charted: Hadoop & YARN, Tez, Pig, Hive & HCatalog, HBase, Storm, Mahout, Solr, Sqoop, Flume, Falcon, Ambari, Oozie, Zookeeper, Knox
  Areas: Data Management, Data Access, Governance & Integration, Operations, Security
  Version labels as extracted from the chart: 0.13.0, 1.5.1, 2.4.0, 0.4, 0.12.1, 0.12.0, 0.98.1, 0.9.1, 0.9.0, 4.8.0, 1.4.5, 1.4.0, 0.5, 4.0.0, 0.4, 2.2.0, 1.1.2*, 0.12.0, 0.11, 0.11.0, 0.96.0, 0.94.6, 0.8.0, 0.7.0, 1.4.4, 1.4.3, 1.3.1, 1.4.1, 1.2.3, 3.3.2, 3.4.5
HDP: Interactive SQL-in-Hadoop
  Stinger Initiative: DELIVERED. Next-generation SQL-based interactive query in Hadoop.
  Speed: Hive query performance has increased by 100X, allowing interactive query times (seconds)
  Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
  SQL: Support for the broadest range of SQL semantics for analytic applications running against Hadoop
  Stack: Business Analytics and Custom Apps; SQL on Apache Hive; Apache MapReduce and Apache Tez; Apache YARN; HDFS (Hadoop Distributed File System)
  Stinger Project
    Stinger Phase 1: Base Optimizations; SQL Types; SQL Analytic Functions; ORCFile Modern File Format
    Stinger Phase 2: SQL Types; SQL Analytic Functions; Advanced Optimizations; Performance Boosts via YARN
    Stinger Phase 3: Hive on Apache Tez; Query Service (always on); Buffer Cache; Cost Based Optimizer (Optiq)
  An open community at its finest: Apache Hive contribution
    1,672 Jira tickets closed; 145 developers; 44 companies; ~330,000 lines of code added (2.5x); 13 months
HDP: Governance and Integration (HDP 2.1)
  Stack areas: Governance & Integration, Data Access, Data Management, Security, Operations
  Apache Falcon: Simplified Data Governance for Enterprise Hadoop
    First time included in HDP
    Provides a key governance framework for:
      Acquisition & processing of datasets
      Replication & retention of datasets
      Redirecting datasets to non-Hadoop extensions
    Provides audit trail & lineage
  Another great example of open community innovation
    Originally built and contributed to Apache by InMobi
    Fastest path to innovation is the open community
    14 months in the making
    Tested in production
    Vibrant community of developers building
  Investment Phases
    Phase 1: Incubate Apache Falcon; Dataset replication & retention; Falcon tech preview
    Phase 2: Basic dashboard for pipeline viewing; Kerberos security support; Ambari integration for management; Hive/HCatalog integration
    Phase 3: Advanced dashboard for pipeline definition & management; Audit; Lineage; Data tagging; File import SSH & SCP
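The retention idea on this slide can be illustrated in a few lines. This is a minimal sketch in plain Python, not the Falcon API: the function name, the sample instance dates and the 30-day limit are all hypothetical, and a real Falcon deployment would express retention declaratively in a feed definition rather than in code like this.

```python
from datetime import date, timedelta


def expired_instances(instance_dates, today, retention_days):
    """Return dataset instances older than the retention limit, oldest first."""
    # Anything strictly older than the cutoff is eligible for deletion.
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in instance_dates if d < cutoff)


# Hypothetical daily dataset instances and a hypothetical 30-day policy.
instances = [date(2014, 1, 1), date(2014, 3, 15), date(2014, 4, 1)]
to_delete = expired_instances(instances, today=date(2014, 4, 10), retention_days=30)
print(to_delete)
```

Keeping the policy as a pure function of "today" and the limit makes it easy to check which instances a given retention setting would remove before anything is deleted.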
HDP: Apache Knox (HDP 2.1)
  Important note: Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks. For a full description of what is available in Enterprise Hadoop today across authentication, authorization, accountability and encryption, please visit our security labs page.
  Apache Knox: Perimeter security for Hadoop
    A common place to perform authentication across Hadoop and all related projects
    Integrated with LDAP and AD
    Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
    Broad community effort: incubated with Microsoft, broad set of developers involved
  Security Investments
    Phase 1: Strong AuthN with Kerberos; HBase, Hive, HDFS basic AuthZ; Encryption with SSL for NN, JT, etc.; Wire encryption with Shuffle, HDFS, JDBC
    Phase 2: ACLs for HDFS; Knox: Hadoop REST API security; SQL-style Hive AuthZ (GRANT, REVOKE); SSL support for Hive Server 2; SSL for DN/NN UI & WebHDFS; PAM support for Hive
    Phase 3: Audit event correlation and audit viewer; Support token-based AuthN beyond Kerberos; Data encryption in HDFS, Hive & HBase; Knox for HDFS HA, Ambari & Falcon
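Because Knox fronts the Hadoop REST APIs, a client addresses one gateway URL instead of individual cluster hosts. The sketch below shows how such a WebHDFS URL might be assembled, assuming the commonly documented layout https://{host}:{port}/{gateway-path}/{topology}/webhdfs/v1/{path}. The host name, port 8443, the "gateway" path and the "default" topology name are illustrative assumptions, not values from this deck, and the helper function itself is hypothetical.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, hdfs_path, op, port=8443,
                     gateway_path="gateway", topology="default", **params):
    """Build a WebHDFS URL routed through a Knox gateway (illustrative only)."""
    # WebHDFS expects the operation name in the `op` query parameter.
    query = urlencode({"op": op, **params})
    # Keep "/" separators in the HDFS path but escape other special characters.
    path = quote(hdfs_path.lstrip("/"))
    return (f"https://{host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1/{path}?{query}")


# Hypothetical gateway host and HDFS directory.
url = knox_webhdfs_url("knox.example.com", "/data/web logs", "LISTSTATUS")
print(url)
```

The function only builds the URL; authentication against LDAP or AD, as described on the slide, would happen when the request is actually sent to the gateway.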
New: Stream Processing (HDP 2.1)
  Apache Storm: Real-time event processing for sensor and business activity monitoring
    Unlocks new business cases for Hadoop
    Key component of a data lake architecture
    Scale: Ingest millions of events per second; fast query on petabytes of data
    Integrated with Ambari for management
  Investment Phases
    Phase 1: Install, start & stop via Ambari; Kafka, HBase & HDFS connectors; Ganglia & Nagios based monitoring
    Phase 2: Storm-on-YARN; Ingest & notification for JMS; Data persistence: EDWs, RDBMS, Cassandra
    Phase 3: High availability management with Ambari; AD/LDAP plugin for authentication; Declarative wiring; Hive update support; Advanced scheduler
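To make "real-time event processing for sensor monitoring" concrete, here is a minimal sketch of one typical stream computation: counting events per sensor in fixed (tumbling) time windows. It is plain Python and does not use the Storm API; the function name, the sensor ids and the 10-second window are hypothetical, and a real topology would process an unbounded stream rather than a list.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per sensor in fixed, non-overlapping time windows."""
    counts = defaultdict(Counter)
    for timestamp, sensor_id in events:
        # Align each event to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        counts[window_start][sensor_id] += 1
    # Return plain dicts, ordered by window start, for easy inspection.
    return {start: dict(c) for start, c in sorted(counts.items())}


# Hypothetical (timestamp in seconds, sensor id) events.
events = [(0, "pump-1"), (3, "pump-1"), (7, "pump-2"), (12, "pump-1")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

In a monitoring scenario, each window's counts could be compared against a threshold to raise an alert.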
HDP: Search (HDP 2.1)
  Apache Solr: Open source enterprise search for Hadoop and HDP
    Open architecture: in the community, for the community
    Simple, powerful UI for advanced search applications
    High-performance indexing & sub-second search times over billions of documents
    Deep integration roadmap with HDP
  Partnership with LucidWorks
    LucidWorks provides tier 3 & 4 support
    Alignment with the strategy of working within the community and with the core committers
    9 committers total (7 PMC)
HDP: Operating Enterprise Hadoop (HDP 2.1)
  Apache Ambari is the only 100% open source framework for provisioning, managing and monitoring Apache Hadoop clusters.
  Integration with existing operations tools: Viewpoint, others
  New in HDP 2.1: Support for new data access engines; Stack extensibility; Cluster Blueprints; Rolling restarts; Maintenance mode; more...
  Architecture: Ambari Web and REST APIs on top of Ambari Server, which provisions, manages and monitors the compute & storage nodes
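The REST APIs mentioned on this slide are what external operations tools would integrate with. A minimal sketch of preparing such a call, assuming Ambari's commonly documented /api/v1 base path, HTTP basic authentication and port 8080: the server name and the admin/admin credentials are placeholders, the helper function is hypothetical, and the request is only constructed here, not sent.

```python
import base64
from urllib.request import Request


def ambari_request(server, resource, user, password, port=8080):
    """Prepare (but do not send) a GET request against the Ambari REST API."""
    url = f"http://{server}:{port}/api/v1/{resource.lstrip('/')}"
    # HTTP basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        # Ambari expects this header on modifying calls; it is harmless on GET.
        "X-Requested-By": "ambari",
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical server and placeholder credentials.
req = ambari_request("ambari.example.com", "clusters", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call against a reachable Ambari server.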
SAP HANA Platform
  Supports any device, any apps, any app server
  Open connectivity: SQL, MDX, R, JSON
  Consumers: Analytics (Visualize, Predict, Engage); SAP Business Suite and BW on the ABAP App Server
  Offerings: SAP HANA Platform, HANA One, HEC
  Languages: SQL, SQLScript, JavaScript
  Engines and libraries: Spatial; Search/Graph; Text Mining; Stored Procedure & Data Models; Business Function Library; Predictive Analysis Library; Planning Engine; Rules Engine
  Services: Application & UI Services; Database Services; Integration Services / Security / Governance / LCM / Landscape Management
  Data sources: Transaction, Unstructured, Machine, Hadoop, Real-time, Locations, Other Apps
  The SAP HANA Platform converges database, data processing and application platform capabilities and provides libraries for predictive, planning, text, spatial, graph and business analytics to enable businesses to operate in real time.
  The SAP HANA Platform is available as an on-premise appliance or via cloud offerings: SAP HANA One on AWS and SAP HANA Enterprise Cloud (HEC).
SAP HANA Platform is expanding frontiers
  Cloud: Hosting (HEC); Platform-aaS (HCP); HANA-aaS
  Big Data: Elastic Storage; Deep Exploration and Analysis; Data-as-a-Service; Internet of Everything; Customer / Audience Behavior; Corporate Functions
  SAP HANA Platform: Logical Data Warehouse; Data Aging / Archiving
SAP HANA + Hortonworks Data Platform (HDP)
  Sources: ERP Apps, Mobile Apps, Custom Apps, SAP Analytics, Sensor, Geo, Logs, Text, Structured, Weather, Social, Other
  Data Acquisition, Ingestion & Provisioning
  SAP HANA Platform: In-memory processing platform for real-time transactions + end-to-end analytics
    Application Development; Extended Application Services; Processing Engine; Database Services (OLTP + OLAP); Application Function Libraries & Data Models; Integration Services; Unified Administration
  Hortonworks Data Platform (HDP)
    Workloads: Batch; Interactive; Online; Streaming; HANA In-Memory (HANA Engine); ISV Apps; Others
    YARN: Cluster Resource Management
    HDFS: Redundant, Reliable Storage
Generic pattern 1: Machine Data Insight
  Prototypical machine data case
  Data Sources: SAP enterprise data; Non-SAP enterprise data; Mobile data; Machine data (sensors, SCADA, logs, etc.)
  Real-Time Replication; Stream Processing; Synchronization
  SAP HANA Data Platform
    HANA In-Memory: Transactional, Analytical; Predictive Analysis; Planning & Simulation; Graph; Spatial
    Extended Storage (IQ); Tiered Storage (hot-warm-cold); Smart Data Access
    Large Low-Cost Data Platform (Hadoop / IQ): Historical data, offline batch processes, model training, etc.
  Analytics & Applications: Dashboard / reporting in real time; Real-time operations, analysis and actions
  Volumes: Real-time data stream (billions of events/day); millions of events/day correlated with enterprise data
  Use Cases: Energy optimization; Predictive maintenance; Remote asset management; Supply/demand forecast; Inventory management; Route optimization
  Transform high-volume, high-velocity data into high-value data. Enable real-time analytics.
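The hot-warm-cold tiering in this pattern can be sketched as a simple age-based routing rule. This is an illustration only: the slide does not define tier boundaries, so the 7-day and 90-day thresholds are hypothetical, as is the mapping suggested in the comments (hot to HANA in-memory, warm to extended storage, cold to Hadoop).

```python
def storage_tier(age_days, hot_days=7, warm_days=90):
    """Pick a storage tier for a record based on its age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= hot_days:
        return "hot"    # assumed mapping: HANA in-memory
    if age_days <= warm_days:
        return "warm"   # assumed mapping: extended storage (IQ)
    return "cold"       # assumed mapping: Hadoop


# Hypothetical record ages in days.
tiers = [storage_tier(age) for age in (0, 7, 30, 365)]
print(tiers)
```

In practice the rule could also weigh access frequency or business relevance; age alone is used here to keep the sketch short.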
Generic pattern 2: Customer Insight
  Prototypical customer behavior analysis case
  Data Sources: SAP enterprise data; Mobile data; Clickstream data; Social data; Historical data
  Data Movement: Real-Time Replication; Stream Processing
  SAP HANA Data Platform
    HANA In-Memory: Transactional, Analytical; Predictive Analysis; Planning & Simulation; Graph; Spatial
    Extended Storage (IQ); Tiered Storage (hot-warm-cold); Smart Data Access
    Large Low-Cost Data Platform (Hadoop / IQ): Historical data, offline batch processes, model training, etc.
  Analytics & Applications: Real-time offers; Dashboard / reporting in real time; Enable actionable insight for targeted applications
  Volumes: Terabytes of data/month; millions of events/day correlated with enterprise data
  Use Cases: Customer behavior; Customer segmentation; Customer loyalty; Customer churn; Online consumer habits; Campaign performance; Predictive maintenance
  Enable real-time analytics and actionable insight.
SAP/Hortonworks Retail Big Data Architecture
  Real-Time Data Acquisition: Streaming data events; replicated data tables from transactional applications
    SAP IS-Retail (Retail ERP System); Sybase Event Stream Processor; SAP Replication Server; SAP SLT (real-time, near real-time)
  Real-Time Multichannel & Application Platform: SAP Customer Activity Repository; Predictive Engine; Spatial Engine; OLAP Engine
  Consuming Applications: Multichannel Sales; 360 Degree Customer View; Customer Mobile Applications (Sybase Unwired Platform); SAP Business Objects BI Suite (Exploration, Reporting, Dashboarding, Predictive, Mobile)
  Links to the data lake: Federated Smart Data Access; Transfer Datasets
  Hortonworks Data Platform Data Lake: Large-scale data capture; generate analytical datasets; train/validate predictive models
  Batch Data Acquisition: SAP Data Services, from transactional systems, databases, flat files and batch data feeds
SAP HANA / HDP: Retail Big Data Solution
  Real-Time Data Acquisition: SAP IS-Retail (Retail ERP System); SAP Event Stream Processor; SAP Replication Server; SAP SLT
  Real-time: Multichannel Sales; 360 Degree Customer View; Custom Mobile Applications
  SAP HANA Platform: In-memory processing platform for real-time transactions + end-to-end analytics
    Application Development; Processing Engine; Database Services (OLTP + OLAP); Application Function Libraries & Data Models; SAP Customer Activity Repository; Extended Application Services; Integration Services; Unified Administration
  SAP BI Suite: Exploration, Reporting, Dashboarding, Predictive, Mobile
  Links to the data lake: Federated Smart Data Access; Transfer Datasets
  Hortonworks Data Platform Data Lake: Large-scale data capture; generate analytical datasets; train/validate predictive models
  Batch / Near Real-Time: SAP Data Services
SAP HANA Platform for Big Data
  Pillars: Real-time, Simplicity, Trusted
  SAP HANA remains the only truly in-memory business technology platform in the market today.
  SAP HANA is designed for real-time performance to process streaming, transactional and analytical data.
  SAP HANA Platform extends these boundaries with special-purpose, best-of-breed engines.
  A high-performance single store brings radical simplification by eliminating the need for multi-staged, persisted data processing.
  Deep integration between engines offers the broadest coverage of data processing with minimal data movement and shared information architectures.
  Enterprise-class data and service level guarantees offer the highest levels of trusted data access.
  Single platform for modeling and processing a wide range of data forms.
  Full spectrum of transactional consistency options over very large data sets across the platform.
Key Takeaways
  Learn Hortonworks Data Platform
  Understand SAP HANA Platform
  SAP HANA and Hortonworks Data Platform Solution
Further Information
  Experience SAP Big Data: http://www.sapbigdata.com/
  SAP HANA and Hadoop: http://www.sapbigdata.com/platform/hadoop/#sthash.bs6twlb9.dpbs
  Hortonworks Data Platform: www.hortonworks.com
  Hortonworks Sandbox: www.hortonworks.com/sandbox
  Hortonworks & SAP: www.hortonworks.com/partners/sap
Thank you
  Prasad Illapani, Big Data Product Management & Strategy, SAP Labs LLC, Bellevue, WA. Email: prasad.illapani@sap.com
  Bob Page, VP Partner Products, Hortonworks Inc. Email: bpage@hortonworks.com
© 2014 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.