Lofan Abrams Data Services for Big Data Session # 2987



Big Data: Are you ready for blast-off?

Big Data, for better or worse: 90% of the world's data was generated over the last two years. ScienceDaily, May 22, 2013.

Barriers to operational effectiveness
- Scattered information
- Heterogeneous / complex sources
- Data explosion
- Trustworthiness of information
- Handling unstructured content
Stored data: structured 15%, unstructured 85% (SAP 2008)

SAP Solutions for Enterprise Information Management: Information Ready for Action
[Diagram labels: Analytics; Business Processes; Before; After; Govern; Data Quality Management; Data Integration; Master Data Management; Big Data & IoT; Data Discovery; Information Lifecycle Management; Content Management; Compliance]

SAP SOLUTIONS FOR TRUSTED DATA: PROVEN LEADER IN EVERY CATEGORY

Data Services and Big Data Sources: Hadoop, MongoDB, Google BigQuery

Big Data in SAP Data Services

Value Proposition
- Use one single ETL tool to move data (structured and unstructured) to big data stores and data warehouses.
- Simple to use: the same dataflow designer serves all types of sources and targets, with data preview capabilities to enhance developers' productivity.

Use Cases
- Extract data (with the right filters pushed down to the source) from MongoDB or Hadoop into a DWH for analytics (HANA, Teradata, Google BigQuery, ...).
- ETL experts often do not know languages like Pig script or MongoDB syntax and need a code-free UI.

How it works
- Native datastores for MongoDB, Google BigQuery and Hadoop (HDFS, Hive), plus an adapter SDK open to partners to build more adapters.
- Data preview and data profiling for Hadoop sources are built into the Designer user interface.

Hadoop Datastore
Certified with the top two Hadoop distributions:
- Hortonworks 2.2 (source, target)
- Cloudera CDH 5.3 (source, target)

Hadoop/Hive Support since Data Services 4.1
[Diagram labels: Files; Databases; Web & Others; Data Services; Hadoop; HANA, IQ, other Target Systems]
- Same familiar, easy-to-use UI design paradigm for Hadoop/Hive as for other database systems, but with specific behind-the-scenes extensions to leverage the power, scale and unique functionality of Hadoop.
- High-performance reading from and loading into both Hadoop (HDFS) and Hive.
- Makes use of Hadoop capabilities by delegating operations to Hadoop/Hive systems (T-E-L).
- The extended optimizer is fully HiveQL- and Pig-aware and generates optimized scripts for Hive and Hadoop.

Hive Support
- Full metadata support via JDBC; browse and explore Hive tables
- DS generates HiveQL and pushes down operations to Hive: joins, sorting, filters, and functions (including aggregation functions)
- High-performance, scalable reading from Hive
  - Multi-threaded, parallel reading of Hive results (not JDBC)
  - All types of column partitioning are supported
- High-performance loading into Hive
  - Support for inserts and updates
  - Support for both static and dynamic partitioning
  - Multi-threaded (parallel) loading
[Diagram labels: Reading/Loading; Hive; Metadata (JDBC); Data Services; HDFS Files]
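To make the difference between static and dynamic partitioning concrete, here is a minimal sketch that builds the two kinds of HiveQL load statement as strings. The helper name `hive_insert` and the table and column names are hypothetical, and this is not the HiveQL that Data Services actually generates; it only illustrates the two Hive loading styles named above.

```python
def hive_insert(target, source, columns, partition_col, partition_value=None):
    """Return the HiveQL statements that load `source` into a partitioned `target`.

    Illustrative only: table and column names are placeholders and no
    quoting or escaping is attempted.
    """
    cols = ", ".join(columns)
    if partition_value is not None:
        # Static partitioning: the partition value is fixed in the statement.
        return [
            f"INSERT INTO TABLE {target} "
            f"PARTITION ({partition_col}='{partition_value}') "
            f"SELECT {cols} FROM {source}"
        ]
    # Dynamic partitioning: Hive derives each row's partition from the
    # trailing column of the SELECT list.
    return [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"INSERT INTO TABLE {target} "
        f"PARTITION ({partition_col}) "
        f"SELECT {cols}, {partition_col} FROM {source}",
    ]
```

With a static partition the caller names one partition per load; with a dynamic partition a single statement can fan rows out over many partitions, at the cost of the two session settings shown.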

HDFS Support
- Access to metadata and structure of files in HDFS
- DS generates Pig and pushes down operations to Hadoop, including:
  - Joins
  - Sorting
  - Filters and projections
  - Functions, including aggregation functions
- Reading from Hadoop
  - High-performance, parallel reading of the files produced by the generated Pig script
  - Ability to invoke pre-defined or custom Pig scripts
- High-performance, file-based loading into Hadoop

Data Preview for Hadoop Hive Tables
Hive table preview includes:
- Data preview
- Profile preview
- Column profile preview
- Filtering

Data Preview for Hadoop HDFS Files
View Data (no profiling) is offered for Hadoop HDFS files:
- In the datastore
- When used as a source or target in a dataflow
- Including filtering and sorting pushed down to HDFS

Enable SSL Certificate for Hadoop
To enable SSL in the Hive adapter, set SSL Enabled = yes.
SSL Trusted Store and Password
- The name of the trust store you are using to verify credentials and store certificates. A trust store holds certificates from third parties that your Java application communicates with, or certificates signed by certificate authorities (such as Verisign, Thawte or GeoTrust), which can be used to identify the third party.
- The password associated with the trust store.
Additional Properties
- Specifies any additional connection properties. Property value pairs must be separated by a semicolon.
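As an illustration of the semicolon-separated format described for Additional Properties, the sketch below splits such a string into a dictionary. It assumes each pair is written as `name=value`, which the slide does not state, and the property names in the usage note are made up for the example; it is not part of the adapter.

```python
def parse_properties(text):
    """Split 'name=value;name=value' into a dict (assumed pair syntax)."""
    props = {}
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            # Tolerate a trailing semicolon or empty segments.
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed property pair: {pair!r}")
        props[key.strip()] = value.strip()
    return props
```

For example, `parse_properties("prop_one=1; prop_two=abc;")` yields a dictionary with the two hypothetical keys `prop_one` and `prop_two`, while a segment without `=` raises `ValueError` so that a typo is caught before a connection attempt.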

Support SQL() function for Hadoop
- Added support for Hive datastores
- Used for Data Definition Language (DDL) and Data Manipulation Language (DML) statements on Hive databases
- Useful for managing database objects as a precursor to DS code execution
- Can also be used for post-process retrieval of database information

Support SQL Transform for Hadoop
- The SQL transform supports a single SELECT statement only
- Used for standard SQL selects from existing scripts outside DS
- SELECT statements can be parameterized

Support Join pushdown operation for Hadoop
Why pushdown? Pushing transforms and functions down to the source or target database leverages the power of the database instead of performing these operations in the Data Services engine. Especially when the source and target tables are in the same database, this gives the best performance, since no data is extracted from the database.
Join push-down operations are supported for Hadoop, e.g. by using a Data Transfer transform to stage data from a non-Hive source to Hive.
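The effect of pushdown can be shown with a small, self-contained analogy. The sketch below uses an in-memory SQLite database as a stand-in for Hive (an assumption made purely so the example runs anywhere) and computes the same result twice: once with the join, filter and aggregation executed by the database, and once by extracting both tables and doing the work on the client side. Table names and data are invented.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with two related tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APJ'), (3, 'EMEA');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 3, 25.0), (13, 1, 5.0);
    """)
    return con


def totals_pushed_down(con):
    """Join, filter and aggregate inside the database; only results leave it."""
    rows = con.execute("""
        SELECT c.region, SUM(o.amount)
        FROM orders o JOIN customers c ON c.id = o.customer_id
        WHERE c.region = 'EMEA'
        GROUP BY c.region
    """).fetchall()
    return rows, len(rows)  # (result, number of rows transferred)


def totals_in_engine(con):
    """Extract both tables in full, then join and aggregate on the client."""
    customers = con.execute("SELECT id, region FROM customers").fetchall()
    orders = con.execute("SELECT customer_id, amount FROM orders").fetchall()
    region_of = dict(customers)
    totals = {}
    for customer_id, amount in orders:
        region = region_of[customer_id]
        if region == "EMEA":
            totals[region] = totals.get(region, 0.0) + amount
    return sorted(totals.items()), len(customers) + len(orders)
```

Both functions return the same total, but the pushed-down version transfers one row while the engine-side version transfers every row of both tables; on real data volumes that difference is the performance argument made above.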

MongoDB Datastore

What is MongoDB? MongoDB is a popular, open-source, document-oriented database. It is a NoSQL database with dynamic schemas that stores data in a (nested) JSON-like format. MongoDB ranked #4 among the most popular databases in February 2015 (http://db-engines.com/en/ranking).

MongoDB use case for Data Services: enable customers to extract data from MongoDB as a source and load it to a target for analytics (co-innovation with a US customer).

MongoDB adapter in the Management Console Implemented as a new adapter leveraging the Data Services adapter SDK. Adapter needs to be added and started in Management Console before it can be used in a datastore.

MongoDB datastore
The MongoDB adapter supports:
- Deployment types: single (primary), replica set (secondary) and sharded cluster. Sharding is the process of storing data across multiple machines; MongoDB uses this approach to support deployments with large data sets and high-throughput operations.
- Authentication: MongoDB credential, LDAP and Kerberos
- SSL certificate
Since MongoDB does not have a schema definition, Data Services scans a sample set of documents ("Rows to scan") in the collection and creates a schema based on the superset of all fields.
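The "superset of all fields" idea can be sketched in a few lines. The function below scans the first `rows_to_scan` documents (plain Python dictionaries standing in for MongoDB documents) and merges their fields, including nested ones, into one schema. The function names and the type labels are assumptions for illustration; the slide does not describe how Data Services implements the scan or resolves conflicting types.

```python
def infer_schema(documents, rows_to_scan):
    """Build a schema from the union of fields in the first N documents."""
    schema = {}
    for doc in documents[:rows_to_scan]:
        _merge(schema, doc)
    return schema


def _merge(schema, doc):
    """Add the fields of `doc` to `schema`; the first-seen type of a field wins."""
    for field, value in doc.items():
        if isinstance(value, dict):
            child = schema.setdefault(field, {})
            if isinstance(child, dict):
                # Recurse so that nested fields are merged as well.
                _merge(child, value)
        else:
            schema.setdefault(field, type(value).__name__)
```

Because only a sample is scanned, a field that first appears after the sampled documents is missing from the result, which is why the number of rows to scan matters for collections with irregular documents.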

MongoDB documents as source in a dataflow
- Collections are imported as Documents in the repository.
- The nested structure is preserved; you can manipulate the data with XML_Map in Data Services.
- Filters defined in the WHERE clause are pushed down to the database.
- More advanced filter conditions can be defined in the adapter parameter "Query criteria" using the MongoDB syntax.
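For readers unfamiliar with the MongoDB syntax mentioned above, this sketch converts simple WHERE-style comparisons into a MongoDB filter document using the standard comparison operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`) and `$and`. The helper `to_query_criteria` and the field names are hypothetical, and the sketch does not claim to mirror how Data Services translates a WHERE clause internally.

```python
import json

# SQL-style comparison operators mapped to MongoDB query operators.
_OPERATORS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}


def to_query_criteria(conditions):
    """AND together (field, operator, value) triples as a MongoDB filter document."""
    clauses = [{field: {_OPERATORS[op]: value}} for field, op, value in conditions]
    if not clauses:
        return {}  # An empty filter matches every document.
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


def to_json_text(conditions):
    """Render the filter as JSON text, e.g. for pasting into a text parameter."""
    return json.dumps(to_query_criteria(conditions))
```

A condition such as `age > 30 AND country = 'US'` becomes a document with an `$and` array holding one clause per comparison; dotted field names like `address.city` can be used to reach into nested documents.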

Google BigQuery Datastore

What is Google BigQuery? Google BigQuery uses Google's data storage in the cloud for fast, interactive analysis of huge amounts of data. It enables super-fast, SQL-like queries against append-only tables, using the processing power of Google's infrastructure. The main use case for Data Services is to load data into BigQuery for analytics. Note: in this release, BigQuery can be used as a target only, not as a source.

Google BigQuery Datastore
- Native Google BigQuery datastore (in the Applications category)
- Certificate-based login: import the private key file and provide the password
- The private key is generated from the Google BigQuery account page
- When exporting a Google BigQuery datastore, the private key is NOT exported and needs to be imported again in the target repository (with the correct passphrase)

Google BigQuery as target in a dataflow
- Browse metadata and import tables
- Tables contain nested data
- Google BigQuery can be a target table only
Note: template tables are not supported, but from a query you can generate a JSON structure which can be used to create the target via the BigQuery web console.
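To show what a JSON structure for a nested target table can look like, the sketch below builds a schema in BigQuery's JSON schema layout (a list of fields with `name`, `type`, `mode`, and nested `fields` for `RECORD` columns). The table layout and the helper names are invented for illustration, and whether Data Services emits exactly this layout is not stated in the slides.

```python
import json


def bq_field(name, field_type, mode="NULLABLE", fields=None):
    """Describe one column in BigQuery's JSON schema layout."""
    field = {"name": name, "type": field_type, "mode": mode}
    if fields:
        # Nested columns are only meaningful for RECORD-typed fields.
        field["fields"] = fields
    return field


def example_schema():
    """A hypothetical order table with a repeated, nested line-item record."""
    return [
        bq_field("order_id", "INTEGER", mode="REQUIRED"),
        bq_field("customer", "STRING"),
        bq_field(
            "items",
            "RECORD",
            mode="REPEATED",
            fields=[
                bq_field("sku", "STRING"),
                bq_field("quantity", "INTEGER"),
                bq_field("price", "FLOAT"),
            ],
        ),
    ]


def schema_as_json(schema):
    """Serialize the schema to JSON text."""
    return json.dumps(schema, indent=2)
```

The text returned by `schema_as_json(example_schema())` is the kind of structure that can be supplied as a table schema when creating the target table manually.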

Roadmap: Current, Planned Innovations and Future

SAP Data Services product road map overview: key themes and capabilities

Today (Release 4.2 SP5)
Simple
- Enhanced runtime troubleshooting process by introducing the Bypass dataflows and workflows feature
- Enabled Switch repositories capability in Designer
Big Data
- Enhanced support for IQ, HANA, and other Big Data sources
- Simplified real-time CDC with SAP Replication Server
- New connectivity for OData, JSON, REST, MongoDB, Google BigQuery and JDBC
- Certified Hadoop Cloudera and Hortonworks
- Support for DDL, DML and data preview for Hadoop
- Added sharded cluster support for MongoDB
- Security enhancements for Hadoop and MongoDB: SSL certificate and role-based authentication (LDAP, Kerberos)
Enterprise Support
- Support pattern variance in the Data Masking transform
- Added Secured Remote File Adapter
- Built-in functions for file transfer (SFTP) and file manipulation

Planned Innovations
Simple
- Simplify Data Services software upgrades
- Improve Substitution Parameters management
- Add preview and select capability for importing objects into the DS repository from a file
- Merge DS Workbench capabilities into DS Designer
Big Data
- Support Hadoop on the Windows platform
- Enhance existing connectivity (source/target)
- Token-based security
Enterprise Support
- Integrate comprehensive runtime statistics of DS batch/real-time jobs with SAP Solution Manager
- Native integration with SAP NetWeaver CTS+ to deliver a single transport tool for DS, SAP and other applications
- Integrate TA 5.x to enhance the Text Data Processing engine
- Data Quality global expansion in Asia Pacific

Future Direction
Simple
- Show a graphical dataflow monitor and identify bottlenecks
- Self-guiding user interfaces to enhance the user experience
Big Data
- Expand support for new sources/targets based on market traction
- Data Model advisor for the HANA database
- Tight integration with Big Data solutions (Spark, YARN, ...)
Enterprise Support
- Resource Advisor to provide clarity of system usage
- Data Services components health monitor with proactive job alerts and analysis
- Data Services datastore as a service

Demo

Why SAP?

SAP Solutions for Enterprise Information Management
- Proven and trusted: 12,000+ SAP EIM customers worldwide
- Winner: Swiss Re, Kraft Foods Inc. and Lexmark Intl. are winners of Gartner MDM Excellence Awards
- Leader in every EIM category: Master Data Management, Data Quality, Data Integration, Enterprise Architecture, Enterprise Content Management, Enterprise Data Virtualization
- 90% customer satisfaction rating
- #2 market share for data integration and data quality

STAY INFORMED Follow the ASUGNews team: Tom Wailgum: @twailgum Chris Kanaracus: @chriskanaracus Craig Powers: @Powers_ASUG
