SAP Agile Data Preparation
Internal
Legal disclaimer
The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation and SAP's strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this document is not a commitment, promise or legal obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP's willful misconduct or gross negligence. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
© 2015 SAP SE or an SAP affiliate company. All rights reserved.
Internal
Traditional BI vs. Self-Service BI
Traditional BI solutions:
- Expensive
- Heavy IT involvement
- Ideal when oversight trumps insight
- IT builds dashboards, business consumes them (but maybe not the right information)
- Long project cycles for new reports
- Require intensive SQL skills / external ETL
Self-service BI:
- Business users can access real-time data when they need it
- Short development cycles, less IT involvement
- Easy analysis
- No SQL required
- Ad hoc reporting
- Fast insight, little oversight
So businesses (analysts, data scientists) benefit from self-service BI tools. BUT: what about preparing data for analysis? Gartner says 80% of an analyst's time is spent on data preparation, not analysis!
Data Preparation Is Challenging and Time-Consuming
Today: Data Preparation → Analysis → Publish/Share
Business analysts and data scientists spend most of their time on data preparation activities:
- Discovery
- Data profiling
- Annotating
- Transformation
- Modeling
- Curation
- De-duplication
- Cleansing
- Enrichment
This has created the need for end-user-oriented tools that can shorten the overall time for data preparation and improve the productivity of data scientists and business analysts.
With self-service data preparation: Data Preparation → Analysis → Publish/Share
Adapted from "Self-Service Data Preparation: The Next Big Market Disruption," Rita Sallam, Gartner, March 2015.
Self-Service Data Preparation Tools Reduce the Time and Complexity of Preparing the Data
Gartner predicts that by 2018, most business users will have access to self-service tools to prepare data for analytics.
Source: Gartner
Information Management Powered by SAP HANA
- SAP HANA IM: Simplified landscape, lower TCO, in-memory speed, common metadata repository, integrated modeling environment, simple UI, for cloud or on premise.
- SAP Agile Data Preparation: Self-service data preparation with a consumer-grade UX, IT oversight, and full data stewardship.
- SAP HANA smart data quality: Addresses information management requirements (cleanse, match, best record, metadata and semantics, enrichment, etc.) with built-in HANA services, putting the power to manage data within the platform and making it accessible to applications.
- SAP HANA smart data integration: Supports real-time replication, physical bulk/batch data movement, and federation in a unified framework. Supports both on-premise and cloud sources, with built-in adapters for common sources and an open, easily extensible SDK that lets the ecosystem offer custom adapters.
Typical Complex Data Landscape
Traditional IT landscapes compartmentalize data and require too many moving parts.
[Diagram: transactions, operational systems, streams, and multiple data sources pass through redundant ETL steps and a staging DB, raising the questions "Aggregate?" and "Is data accurate? Fresh? Reliable?"]
Simplified Landscape with SAP HANA
SAP HANA Enterprise Information Management simplifies your data handling:
- Simplified landscapes, fewer moving parts
  - Data management services built into the platform, consumable by partner and customer applications
  - One common modeling environment for provisioning and consuming data
- Open and extensible
  - Supports any shape, size, or style of data integration
  - Open framework for new data sources
- Accelerated in-memory performance
Result: current, accurate, reliable data.
Real-Time, High-Volume Data Integration
- Real-time replication, physical bulk/batch movement, and federation (virtual tables), with transformation
- On-premise or cloud sources
- Built-in adapters for common sources
- Open SDK for custom adapters
[Diagram: HANA EIM Data Provisioning Server with its adapter framework, metadata, HANA tables, and virtual tables, connected over HTTP(S) and TCP/IP to built-in adapters, third-party adapters (partners, open source, etc.), and cloud sources.]
SAP Agile Data Preparation Overview
Data preparation for everyone
- Empower: Empower business users to instantly improve the value of data by discovering, preparing, and sharing data.
- Optimize: Optimize IT's ability to govern how business users are preparing data by monitoring and operationalizing data usage.
- Accelerate: Accelerate business efficiency with trusted data by helping data stewards rapidly define, assess, and improve data.
Data preparation for every purpose
- Analytics: Prepare and use data within any data visualization or analytics solution, including SAP Lumira.
- Data migration: Provision data and adhere to compliance needs before distributing data to end users.
- Master data management: Instill confidence in the data with users across the enterprise.
Features
- Increases agility by enabling self-service data preparation for the business
- Simplifies information governance for IT
- Enables data stewards to collaborate, assess, define, monitor, remediate, and improve data quality
SAP Agile Data Preparation
Powered by HANA | Available in the cloud and on premise | For business users, data stewards, and IT
- Discover: Search, explore, and profile local, ungoverned, and enterprise datasets
- Define: Collaborate with business users to define shared rules, business terms, policies, and ownership
- Prepare: Refine, merge, cleanse, enrich, and analyze datasets (smart data preparation)
- Assess & Improve: Data quality and proactive governance
- Share: Share and reuse datasets
- Govern Access: Manage access to enterprise datasets, semantics, and the collective knowledge base
- Automate: Proactively deliver frequently used datasets
- Monitor: Track dataset usage and data preparation actions; anticipate demand
Benefits:
- Increases business agility by enabling self-service data preparation for the business
- Simplifies information governance for IT
- Enables data stewards to collaborate, assess, define, monitor, remediate, and improve data quality
Harvest Content Generated by Business Users: Promote to Enterprise Integration and Governance
1. Is time to insight more important than data quality?
   - No: Traditional data integration and governance (metadata, data lineage, and auditability are key).
   - Yes: Go to question 2.
2. Ad hoc or recurring analysis?
   - Ad hoc: Self-service data preparation.
   - Recurring: Go to question 3.
3. Multiple and frequently changing sources?
   - No: Promote the self-service model to enterprise integration.
   - Yes: Self-service data preparation with governance rules.
Legend: highlighted boxes = ADP capability/use case.
Adapted from "Self-Service Data Preparation: The Next Big Market Disruption," Rita Sallam, Gartner, March 2015.
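The decision flow above boils down to three yes/no questions. As a hypothetical illustration only (the enum and function names are invented here and are not part of SAP Agile Data Preparation), it could be encoded in Python like this:

```python
from enum import Enum


class Approach(Enum):
    """Outcomes of the decision flow (names are illustrative)."""
    TRADITIONAL = "Traditional data integration and governance"
    SELF_SERVICE = "Self-service data preparation"
    PROMOTE = "Promote self-service model to enterprise integration"
    SELF_SERVICE_GOVERNED = "Self-service data preparation with governance rules"


def choose_approach(insight_over_quality: bool, recurring: bool,
                    changing_sources: bool) -> Approach:
    """Walk the three questions in order and return the recommended approach."""
    # Question 1: if data quality matters more than speed, stay traditional.
    if not insight_over_quality:
        return Approach.TRADITIONAL
    # Question 2: one-off (ad hoc) analysis goes straight to self-service.
    if not recurring:
        return Approach.SELF_SERVICE
    # Question 3: recurring analysis on volatile sources keeps self-service,
    # but with governance rules; stable sources are promoted to IT.
    if changing_sources:
        return Approach.SELF_SERVICE_GOVERNED
    return Approach.PROMOTE
```

For example, a recurring report over a stable set of sources, `choose_approach(True, True, False)`, lands on `Approach.PROMOTE`: the business-built model is a candidate for operationalization by IT.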
Product Features/Demo
Simple, Intuitive Spreadsheet UI
Easy, Push-Button Cleanse and De-Duplicate
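The deck does not describe how cleansing and de-duplication work internally. A minimal sketch of the general idea, assuming exact matching on normalized fields (production matching engines typically add fuzzy and address-aware comparison), might look like this; `normalize`, `deduplicate`, and the record fields are hypothetical:

```python
import re


def normalize(text: str) -> str:
    """Toy cleanse step: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def deduplicate(records: list[dict], keys: tuple[str, ...]) -> list[dict]:
    """Keep the first record for each combination of normalized key fields."""
    seen = set()
    unique = []
    for record in records:
        match_key = tuple(normalize(record[k]) for k in keys)
        if match_key not in seen:
            seen.add(match_key)
            unique.append(record)
    return unique


records = [
    {"name": "Acme Corp.", "city": "Walldorf"},
    {"name": "ACME  corp", "city": "walldorf"},  # same company after cleansing
    {"name": "Globex Inc.", "city": "Palo Alto"},
]
print(deduplicate(records, ("name", "city")))
```

Keeping the first occurrence is the simplest survivorship rule; a real "best record" step would instead merge the most complete or most trusted values from each duplicate group.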
Wizard-Driven Data Set Merging
Easily Undo Actions
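One common way to implement an undoable activity history is to snapshot the worksheet before each action and keep separate undo and redo stacks. The sketch below assumes that design; the `History` class is hypothetical and is not how the product is documented to work:

```python
from copy import deepcopy


class History:
    """Toy activity history with undo/redo, storing one snapshot per action."""

    def __init__(self, state):
        self.state = state
        self._undo = []  # (description, state before the action)
        self._redo = []  # (description, state after the action)

    def apply(self, description, action):
        """Run `action` on a copy of the state and record it in the history."""
        self._undo.append((description, self.state))
        self._redo.clear()  # a new action invalidates anything that was undone
        self.state = action(deepcopy(self.state))

    def undo(self):
        if self._undo:
            description, before = self._undo.pop()
            self._redo.append((description, self.state))
            self.state = before

    def redo(self):
        if self._redo:
            description, after = self._redo.pop()
            self._undo.append((description, self.state))
            self.state = after

    def log(self):
        """Descriptions of the currently applied actions, oldest first."""
        return [description for description, _ in self._undo]


h = History(["a", " b "])
h.apply("trim", lambda rows: [r.strip() for r in rows])
h.apply("uppercase", lambda rows: [r.upper() for r in rows])
h.undo()
print(h.state, h.log())
```

Full snapshots are easy to reason about but cost memory on large datasets; storing an inverse operation per action is the usual alternative.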
IT Can Operationalize Frequently Used or Crucial Datasets
Product Scope: First Version (H1 2015)
- Standalone smart data preparation HANA application
- BI discovery tool agnostic
- Cloud and on-premise deployment
Data acquisition support
- MS Excel, CSV files, SAP BW, local HANA (tables, analytic/calculated views), third-party databases (Oracle, DB2, SQL Server)
- Browsing for remote datasets
- Dataset search on local HANA
- Acquired data seen as tables in SDP
Data quality and cleansing
- Data quality assessment with data and content type profiling
- Address and party data cleansing with out-of-the-box (OOTB) reference data for All World, U.S., and Canada (cloud only)
- Record de-duplication
Planned Innovations
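Data and content type profiling usually starts with simple per-column statistics. A simplified Python sketch, assuming raw string cells and only a handful of type categories (the helper names are hypothetical, and the product's profiler is certainly richer):

```python
from collections import Counter
from datetime import date


def infer_type(value: str) -> str:
    """Classify a raw cell as 'empty', 'integer', 'decimal', 'date' or 'text'."""
    v = value.strip()
    if not v:
        return "empty"
    for parse, label in ((int, "integer"), (float, "decimal"),
                         (date.fromisoformat, "date")):
        try:
            parse(v)
            return label
        except ValueError:
            continue
    return "text"


def profile_column(values: list[str]) -> dict:
    """Count rows, empty cells, distinct non-empty values and inferred types."""
    types = Counter(infer_type(v) for v in values)
    non_empty = [v.strip() for v in values if v.strip()]
    return {
        "count": len(values),
        "empty": types.get("empty", 0),
        "distinct": len(set(non_empty)),
        "types": dict(types),
    }


print(profile_column(["42", "3.5", "", "2015-03-01", "abc", "42"]))
```

A column whose `types` mix several categories, as in this deliberately messy example, is a typical candidate for cleansing before analysis.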
Product Scope: First Version (H1 2015, cont.)
Data manipulation
- Wizard-driven string, numeric, and date manipulations, formula editor, filter, etc.
- Combine worksheets (union, and merge with inner, left-outer, and full-outer join)
- Search within a worksheet
- Activity history and undo/redo
User management
- Sharing projects
- Export dataset to Excel, CSV, and HANA calculation view
- Data usage analytics for IT monitoring
- Promotion of datasets to production (operationalization) by IT (on premise only)
Planned Innovations
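The difference between the inner, left-outer, and full-outer merges listed above is easiest to see on two small worksheets. This is a plain-Python sketch with hypothetical names, not the product's merge wizard; values missing on the non-matching side are filled with `None`, much like SQL NULL, and `union_all` assumes union means stacking rows (whether the product's union removes duplicates is not stated here):

```python
def merge(left, right, key, how="inner"):
    """Join two worksheets (lists of dicts) on `key`.

    how: "inner" keeps matching rows only, "left" also keeps unmatched left
    rows, "full" also keeps unmatched rows from both sides.
    """
    if how not in ("inner", "left", "full"):
        raise ValueError(f"unsupported join type: {how}")
    # Index the right worksheet by key so each lookup is a dict access.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)
    joined, matched_keys = [], set()
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        for rrow in matches:
            joined.append({**lrow, **rrow})  # right value wins on a name clash
        if matches:
            matched_keys.add(lrow[key])
        elif how in ("left", "full"):
            joined.append(dict(lrow))
    if how == "full":
        joined.extend(dict(r) for r in right if r[key] not in matched_keys)
    # Give every row the same columns, using None for missing values.
    columns = []
    for row in left + right:
        for col in row:
            if col not in columns:
                columns.append(col)
    return [{col: row.get(col) for col in columns} for row in joined]


def union_all(*worksheets):
    """Stack rows from several worksheets; duplicates are kept."""
    return [dict(row) for sheet in worksheets for row in sheet]


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 10}, {"id": 3, "total": 7}]
for how in ("inner", "left", "full"):
    print(how, merge(customers, orders, "id", how))
```

The inner join returns only customer 1, the left-outer join adds Bob with no total, and the full-outer join also adds the order for the unknown customer 3.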
Roadmap
SAP Agile Data Preparation Roadmap
Today (SAP Agile Data Preparation 1.0.1)
- Interactively discover, search, manipulate, profile, cleanse, and share datasets
- Monitoring capabilities for data lineage
- Job scheduling of projects
- Operationalization by IT (exchange to Web IDE)
Planned Innovations
- Stewardship capabilities (rules management, DQ scorecards)
- Distribution of the validated data
- Filter management, aggregations
- Metadata management, data protection support
- Data enrichment, best records
- Workflow support
- Excel amenities such as multicolumn sorting, formatting, and pivoting
Future Direction
- Additional domains for cleansing
- Additional stewardship capabilities (approval workflows, DQ reporting)
- Hadoop/Spark support
This is the current state of planning and may be changed by SAP at any time.
Please provide feedback on this session by completing a short survey via the event mobile application.
SESSION CODE: 2793
For ongoing education on this area of focus, visit www.asug.com.
Contact information:
- Paul Médaille, Director, EIM Solutions GTM, paul.medaille@sap.com, Twitter: @PaulMedaille
- Lynne Lintelman, Sr. Product Manager, lynne.lintelman@sap.com
Thank you