Data Aging Strategies in SAP Business Warehouse (BW) 7.3
Rainer Uhle, SAP Product Manager
Dr. Peter Zimmerer, SAP Development Architect
Mannheim, Rosengarten, June 22, 2011
Disclaimer
This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP's strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or through gross negligence.
2011 SAP AG. All rights reserved.
You Need Complete and Trusted Information to Make Good Business Decisions
- 90% of upper-level managers feel they don't have the necessary information for critical business decisions; 50% of them are afraid they are making poor decisions because of it.
- BI strategies are doomed to fail without a trusted data foundation.
- The #1 risk for building a data mart or data warehouse is data quality.
How Good is the Data Behind My Dashboard?
- Where did these numbers come from?
- Are we considering all our relevant sources?
- Are these terms consistent with our business definitions?
- How current is this data? When was it last updated?
- Can I trust this data enough to make my critical decisions?
- Has the data passed all our business rule checks?
Enterprise Data Warehouse (EDW): Characteristics and Requirements
SAP NetWeaver Business Warehouse: Strong EDW Capabilities
Integrated, scalable Enterprise Data Warehouse (EDW) platform. EDW = DBMS + X
Capability areas: Business Content, Reliable Data Acquisition, Streamlined Operations, Lifecycle Management
- Fast, sustainable implementation through:
  - Modeling patterns
  - Business Content
- Openness and data quality through:
  - Out-of-the-box integration for data originating in SAP systems
  - Integration with SAP BusinessObjects Data Services (Data Integrator and Data Quality Management)
- Efficient data management through:
  - Management of data consistency, database abstraction, database neutrality
  - Sophisticated security, authorization, and identity handling
  - High availability
- Sophisticated lifecycle management at different levels:
  - System
  - Meta data
  - Data (near-line storage, archiving)
What does BW know about my Business?
Introduction to the Term Layered, Scalable Architecture (LSA)
- Layered, Scalable Architecture (LSA) is a term standardized by SAP to establish a common, uniform understanding.
- The LSA is a reference architecture, not just a data model.
- At its center is the service concept of the reference architecture: each layer offers a service that can be used.
- Layered: a layer-based data model in which each layer performs a specific task.
- Scalable: the data model is scalable and can easily be extended, for example by additional source systems, regions, and scenarios.
- Architecture: the LSA is an architecture that is applied throughout the entire BW system.
The Layers of the LSA Reference Architecture
- BI Applications (Architected Data Mart Layer)
  - Reporting Layer: layer optimized for reporting (consisting of InfoCubes and MultiProviders)
  - Business Transformation Layer: application of business logic for the applications
  - Operational Data Store: near-real-time reporting, close to operational reporting
- EDW Layer (single point of truth, reusable, granular, complete history)
  - Data Propagation Layer: digestible, consumable, integrated, and independent data
  - Harmonisation Layer: harmonization, data quality assurance, plausibility checks
  - Data Acquisition Layer: extractor inbox, 1:1 mapping, temporary storage
  - Corporate Memory: structure close to the source system, complete storage of the history at the finest possible granularity ("Master the Unknown")
- Data sources
LSA Data Flow Templates as Content
SAP NetWeaver BW Adoption: Productive SAP NetWeaver BW Systems, Constant Growth
- Adoption of SAP NetWeaver BW is constantly growing
- Unaffected by the economic downturn in 2009
- More than 12,000 customers, accounting for more than 15,000 productive systems
[Chart: productive SAP NetWeaver BW systems per quarter, Q1 09 to Q4 10: 13,359; 13,728; 13,910; 14,214; 14,446; 14,687; 14,948; 15,238]
Stable product, large installed base, constant growth
Analyst Opinions: Forrester 2011
SAP BW EDW and Reality: 60 TB Proof of Concept on RDBMS (IBM DB2)
Discussions about corporate DWH architectures (EDW) are frequently driven by fears and prejudices. This results in vague questions such as: Can BW handle 30, 40, ..., 100 terabytes?
The answer: SAP BW 60 TB Proof of Concept
SAP NetWeaver BW Accelerator (SAP NetWeaver 7.0 Business Intelligence)
[Diagram labels: BW Analytical Engine, Query & Response, BW Accelerator, InfoCube, Indexing]
- Query run time in the BW Accelerator: merging and results preparation for BI queries; information aggregation on the fly
- (*) Load the index into main memory via the property setting ("load index into main memory") or by scheduling program RSDDTREX_INDEX_LOAD_UNLOAD
BWA Linear Scalability: Data Volume vs. Resources (25 TB Showcase 2009)

Total DB size | BWA resources | Index creation throughput | Multiuser reporting throughput | Avg. report response time | Avg. records touched per report
5 TB          | 27 blades     | 0.6 TB/h                  | 100,000 reports/h              | 4.5 sec                   | 6 M
15 TB         | 81 blades     | 1.1 TB/h                  | 101,000 reports/h              | 4.2 sec                   | 22 M
25 TB         | 135 blades    | 1.2 TB/h                  | 101,000 reports/h              | 4.2 sec                   | 37 M
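The linear-scaling claim on this slide can be checked with simple arithmetic. The sketch below assumes that the three blade counts on the chart axis pair with the three database sizes in ascending order (27 blades for 5 TB, 81 for 15 TB, 135 for 25 TB); that pairing is inferred from the chart, not stated explicitly on the slide.

```python
# Assumed pairing of the chart axes: (total DB size in TB, number of BWA blades).
showcase = [(5, 27), (15, 81), (25, 135)]


def tb_per_blade(points):
    """Return the data volume handled per blade for each measurement."""
    return [round(tb / blades, 3) for tb, blades in points]


ratios = tb_per_blade(showcase)

# A constant ratio means resources grow in proportion to the data volume.
is_linear = max(ratios) - min(ratios) < 0.001

print(ratios, is_linear)
```

Under that assumption every configuration carries the same volume per blade, which is what "linear scalability" means here: tripling the data requires tripling the blades, while response time stays roughly constant.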
Bill Inmon's Corporate Information Factory & Near-Line Storage
[Diagram: Corporate Information Factory. Labeled components: corporate applications (ERP, CRM, eComm., Bus. Int., Internet), changed data, staging area, ETL, EDW, local ODS, global ODS, operational mart, departmental data marts (Marketing, Accounting, Finance, Sales), DSS applications, exploration warehouse / data mining, dialogue manager, cookie cognition, preformatted dialogues, session analysis, web logs, granularity manager, cross-media storage management, near-line storage, archives]
Source: Bill Inmon
Data-Aging Strategies for Volume Performance
Information lifecycle according to importance/age:

Data category                            | Storage type
Frequently read / changed data (current) | Online Database
Infrequently read data (mature)          | Near-Line Storage (read only)
Very rarely read data (aged)             | Classic Archive (read only)
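As a minimal sketch of the tiering idea on this slide, the function below assigns a record to one of the three storage types by its age. The two thresholds are hypothetical; the slide gives no cut-off values, and in practice they follow from business requirements rather than from a fixed rule.

```python
from datetime import date

# Hypothetical thresholds, not taken from the presentation.
MATURE_AFTER_YEARS = 2
AGED_AFTER_YEARS = 7


def storage_tier(record_date: date, today: date) -> str:
    """Map a record's age to one of the three storage types named on the slide."""
    age_years = (today - record_date).days / 365.25
    if age_years < MATURE_AFTER_YEARS:
        return "online database"                # frequently read / changed data
    if age_years < AGED_AFTER_YEARS:
        return "near-line storage (read only)"  # infrequently read data
    return "classic archive (read only)"        # very rarely read data


today = date(2011, 6, 22)
for d in (date(2010, 6, 1), date(2007, 1, 1), date(2000, 1, 1)):
    print(d, "->", storage_tier(d, today))
```

Age is used as the only criterion for simplicity; the slide speaks of importance as well as age, so a real policy would likely combine both.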
Key Facts about SAP NLS
- NLS should be part of an Information Lifecycle Management (ILM) strategy
- Based on well-established SAP / SAP BW archiving concepts
- Data archived in NLS can be incorporated into reporting
- Data consistency is guaranteed before the data is deleted from the source
- High compression rate (up to 95%)
- Supports archiving of InfoCubes and DataStore Objects
- Saves storage costs and other system resources
- NLS is an application from a third-party vendor, running on a separate system
- Mainly time-based archiving, yet it can also be based on other characteristics
- The archived data slice is locked in the original InfoProviders
- Increases the retention period for analysis data
- Scheduling and monitoring of archiving sessions from the SAP BW system
- Included in the query statistics data collection (RSRT)
- Process chain support
- Copes with metadata changes to the BW objects of the archived data
Evolution by SAP NetWeaver BW Releases
SAP NetWeaver BW 7.00
- Enhanced look-up API
- Suspension and selective continuation of archiving processes within process chains
- Restore of an archiving request with all successors
- Smaller data object size for the ADK-based near-line solution without semantic grouping
SAP NetWeaver BW 7.01 (EhP1)
- Support of write-optimized DataStore Objects for ADK archiving and the Near-Line Storage interface
- Request-based archiving
- Enhanced status and job monitoring within the InfoProvider management view
SAP NetWeaver BW 7.30
- Support for accessing near-line storage data for MultiProviders
- Archiving from uncompressed InfoCubes
- Archiving of Semantically Partitioned Objects (SPO) with SP1
- Automatic rebuild of the BW Accelerator index possible
The Nearline Storage Solution for SAP NetWeaver BW
Based on the Near-Line Storage interface, development partners can implement their solutions for archiving and NLS in SAP BW.
Third-party NLS solutions:
- are implemented within the SAP BW ABAP stack in partner-specific namespaces
- have to pass a certification process
- can offer a specific application area in the SAP Support Portal
- have to be licensed in addition to SAP licenses
- can have a different release cycle compared to SAP NetWeaver BW
Present development partners (in alphabetical order of their products):
- CBW (PBS Software)
- Dynamic NearLine Access (SAND Technology)
- DB2 Viper 9.5 (IBM)
- OutBoard 1.0 (DataVard)
"Certified since SAP BW 7.0" column, values in slide order: yes, yes, 7.01 SP6, yes
(see also http://www.sap.com/ecosystem/customers/directories/searchsolution.epx )
Customer Adoption: BW Archiving and Near-line Storage (based on 895 customer messages)
Data Analysis and Assistance for ROI Analysis
Sizing of near-line storage solutions:
- Hardware sizing of the near-line storage solution has to be done by the vendor
- Different near-line storage technologies are on the market, from database solutions to file-based solutions to column-based storage solutions
Data volume services by SAP Active Global Support (AGS), http://service.sap.com/dvm
- Deliver a thorough analysis of the distribution of BW objects
- Can help to estimate the data volume that may be archived / transferred to NLS for the largest InfoProviders within the system
- Consider only technical facts (and not the customer's business requirements)
Data Management with Near-line Storage: Implementation Aspects
1. Create a Data Archiving Process (DAP)
2. Create and schedule archiving requests
3. Restore archiving requests
4. Load data to subsequent data targets
5. Look-up during transformation
6. Query settings
7. MultiProvider settings
[Diagram: LSA data flow from DataSource and PSA via InfoPackage, InfoSource, and DTPs through the Data Acquisition Layer, Corporate Memory, and Data Propagation Layer up to the Reporting Layer (SAP Sales InfoCube, Architected Data Marts, MultiProvider); near-line storage is attached via DAPs, and the numbers 1 to 7 mark where each step applies]
Design Aspects: Near-line Storage (NLS) vs. BW Accelerator (BWA)
[Diagram labels: BI InfoMarts (InfoCube), RDBMS, BWA, Near-line Storage, ADK Archive; processes: Acquisition, Acceleration, Archiving; access frequency scale: very frequently, frequently, not frequently, rarely]
Data Management at Query Runtime
The Data Manager identifies the availability of alternative data storage of any kind, such as:
1. Data resides in the InfoProvider in the database
2. Data resides in a classical aggregate
3. Data resides in the BW Accelerator index
4. Data resides in an NLS partition
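The following is a rough sketch of how such a storage selection could work, not the actual Data Manager logic. The class `ProviderState`, the function `access_paths`, the year-based time slice, and the preference order (BWA index before aggregate before database table) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderState:
    """Hypothetical description of where an InfoProvider's data is available."""
    has_bwa_index: bool
    has_matching_aggregate: bool
    archived_up_to_year: Optional[int]  # None means nothing has been near-lined


def access_paths(state: ProviderState, from_year: int, to_year: int,
                 read_nls: bool = True) -> List[str]:
    """Return the storages a query on [from_year, to_year] would have to read."""
    paths = []
    boundary = state.archived_up_to_year

    # Online part: years after the archived time slice (or everything if none).
    if boundary is None or to_year > boundary:
        if state.has_bwa_index:
            paths.append("bwa_index")
        elif state.has_matching_aggregate:
            paths.append("aggregate")
        else:
            paths.append("infoprovider_db")

    # Near-line part: only touched if the selection overlaps the archived slice.
    if boundary is not None and from_year <= boundary and read_nls:
        paths.append("nls_partition")

    return paths


state = ProviderState(has_bwa_index=True, has_matching_aggregate=False,
                      archived_up_to_year=2007)
print(access_paths(state, 2006, 2010))  # selection spans online and near-line data
```

The point of the sketch is that the near-line partition is only read when the query selection actually overlaps the archived time slice, so queries on current data pay no extra cost.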
NLS-Related MultiProvider Settings
Near-line read mode:
- Disabled for all InfoProviders
- Enabled for all InfoProviders
- According to the InfoProvider settings
MultiProvider: Query Runtime Statistics
Listing of basis providers and NLS partitions used during query execution
NLS-Related Query Designer Settings: Reporting
Fixed NLS settings:
- Read NLS
- Do not read NLS
- See InfoProvider settings
NLS-Related Query Designer Settings: Variable
Variable NLS settings (dialog):
- Read NLS
- Do not read NLS
- See InfoProvider settings
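The three query-level options can be read as a simple override rule: an explicit query setting wins, otherwise the InfoProvider setting applies. The sketch below illustrates that reading; the enum `NlsReadMode` and the function `effective_nls_read` are hypothetical names, not BW identifiers.

```python
from enum import Enum


class NlsReadMode(Enum):
    """The three options offered in the Query Designer, per the slides."""
    READ = "read NLS"
    DO_NOT_READ = "do not read NLS"
    INFOPROVIDER = "see InfoProvider settings"


def effective_nls_read(query_mode: NlsReadMode, infoprovider_reads_nls: bool) -> bool:
    """Decide whether a query touches near-line data.

    Assumption: an explicit query setting overrides the InfoProvider flag,
    and INFOPROVIDER simply defers to that flag.
    """
    if query_mode is NlsReadMode.INFOPROVIDER:
        return infoprovider_reads_nls
    return query_mode is NlsReadMode.READ


print(effective_nls_read(NlsReadMode.INFOPROVIDER, infoprovider_reads_nls=True))
```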
InfoCube: Archiving of Uncompressed Data
- Central setting in the Data Archiving Process (DAP): "Allow archiving for non-compressed data"
- Valid for all archiving requests and DAP variants
- Can be changed during operation
- Prerequisite: only requests that have already been processed (aggregates, delta DTP)
Data Management at Archiving Runtime
- During the delete phase of the archiving request, a rebuild of the BWA index is offered in the dialog.
- BWA consistency is reflected during DAP processing.
Optimized Support for Navigational Attributes
Optimized support for navigational attributes during query processing on NLS
- Navigational attributes are master data attributes that can be used to navigate/filter in queries.
- Master data attributes are located outside the InfoCube persistence in the extended star schema and thus are not a component of the NLS data stock.
Previous solution:
- Selections for navigational attributes were not transferred to NLS as selections
- The attribute values were assigned subsequently and filtered in the result set
- Performance problems for highly selective attribute values
Improvement:
- Selections for navigational attributes are first converted to a selection on the characteristic that bears the attribute (max. 100 characteristic values)
- The attribute selection is replaced by this characteristic selection in the query selection
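To illustrate the improvement, the sketch below converts a filter on a navigational attribute into a filter on the characteristic that carries it, using a master data table held in memory. The data layout, the function name, and the fallback of returning `None` above the limit are assumptions; only the limit of 100 values comes from the slide.

```python
MAX_CHARACTERISTIC_VALUES = 100  # limit quoted on the slide


def attribute_to_characteristic_selection(master_data, attribute, wanted_values):
    """Translate an attribute filter into a list of characteristic values.

    master_data maps each characteristic value (e.g. a customer number) to a
    dict of its attributes. Returns None when the list would exceed the limit;
    in that case the caller is assumed to fall back to filtering the result
    set afterwards, as in the previous solution.
    """
    keys = sorted(
        key for key, attrs in master_data.items()
        if attrs.get(attribute) in wanted_values
    )
    if len(keys) > MAX_CHARACTERISTIC_VALUES:
        return None
    return keys


customers = {
    "C1": {"region": "EMEA"},
    "C2": {"region": "APJ"},
    "C3": {"region": "EMEA"},
}
print(attribute_to_characteristic_selection(customers, "region", {"EMEA"}))
```

Because the near-line store holds only the characteristic values, pushing down a selection on the characteristic lets it discard rows early instead of returning everything for later filtering.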
DSO Lookup for Near-Lined Partitions
- SAP NetWeaver BW 7.30 will introduce a separate transformation rule type, the DSO lookup.
- If an NLS solution is attached to the BW system, the lookup will automatically read from both the online and the near-lined data partitions.
Data Access within the APD
- With SAP NetWeaver BW 7.30, the Analysis Process Designer (APD) will also be able to read from near-line storage for the source type "Read data from InfoProvider".
- Option to allow reading from NLS for InfoProvider sources
Reload Data from Both Online and Nearline Partitions for InfoCubes
Option to extract data from both the online and the nearline partition in a single DTP
Transaction LISTCUBE
Read data from NLS combined
Archiving of Semantically Partitioned Objects
Facts:
- Semantic partitioning is possible for InfoCubes (standard InfoCubes only) and DSOs (standard and write-optimized).
- There is no DAP per PartProvider, only one DAP for the entire SPO.
- As a consequence, no set of tables / files is created in the NLS system per PartProvider, only one set of tables / files per SPO.
- The DAP itself has the same options / settings as for a regular InfoProvider.
- However, the DAP must contain the logical partitioning criterion as an additional archiving criterion so that data can be archived, reloaded, or restored for a dedicated semantic partition.
[Screenshot label: Semantic partitioning criterion]
Archiving of Semantically Partitioned Objects
- Since archiving is not carried out per PartProvider, there is no Archive tab within the administration user interface.
- Instead, an archiving request can be scheduled by means of a dedicated / global button.
[Screenshot label: Maintain Archiving]
Archiving of Semantically Partitioned Objects
- An archiving request can be scheduled to archive data from all available partitions or only from a dedicated partition (which is equal to an archiving run restricted to the semantic partition).
[Screenshot label: Cross-partition archiving or only for a specific partition]
Reading Data from SPOs
- Query: In SAP NetWeaver BW 7.30, data held in a near-line storage system can be read by a query that is directly flagged to read data from NLS (the query properties for reading NLS data no longer have to be maintained via transaction RSRT).
- A query can be set to read or not to read data from NLS. The same can also be specified at InfoProvider level, which can then be taken into consideration.
Summary and Outlook
Latest enhancements
- Enhanced lookup support, especially for temporal lookups (non-equal lookup conditions)
- Request-based archiving for InfoCubes (avoids compression before archiving) (BW 7.30)
- Combined DTP extraction from the online and archive partitions of an InfoCube (BW 7.30)
- Enhanced NLS support for Semantically Partitioned Objects (SPO) based on standard InfoCubes and standard DSOs (BW 7.30 SP 1). NLS support for SPOs based on write-optimized DSOs is available with SP3.
- NLS support for DSO lookup within transformations (DSO lookup feature to be released with SAP NetWeaver BW 7.30 with lookup for online data only)
- Master data deletion to consider data within NLS
Medium term
- NLS support for BW 7.3 running on HANA In-Memory
- Physical deletion of NLS requests from near-line storage (BW 7.30 SP5)
Long term
- Archiving of InfoCubes with non-cumulative key figures, as well as InfoSets and HybridProviders
- Archiving of master data and hierarchies
- Archiving with free selection criteria (not only time slice archiving)
Planned Roadmap: HANA & SAP NetWeaver BW
Timeline: 2006, 2009, 2010, 2011, future direction
SAP NetWeaver BW evolving to a fully in-memory enabled EDW solution on top of HANA:
- BW 7.0 / BWA 7.0: major release; BW Accelerator; new features and improvements across all components
- BW 7.0 EhP1 (7.01): go-to release for integration with SAP BusinessObjects BI; BW Accelerator: additional performance
- BW 7.3 / BWA 7.2: major step on Enterprise Data Warehousing scalability and flexibility; integration improvements with SAP BusinessObjects Data Services
- BW 7.3 SPnn: BW running on HANA as the underlying in-memory DB platform; in-memory for Enterprise Data Warehousing; Integrated Planning in-memory enabled
HANA:
- HANA V1.0: real-time operational analytics on mass data; rapid creation of agile data marts; non-disruptive deployments of HANA side by side with ERP and/or BW
- HANA V1.0 SPSnn: additional calculation capabilities; primary persistence layer under BW, eliminating the need for a separate database; models for SAP business content enabling new applications
Data-Aging Strategies: Near-Line Storage Only
Information lifecycle according to importance/age:
- Storage types: Online Database, Near-Line Storage (read only), Classic Archive (read only)
- Data categories: frequently read / changed data (current), infrequently read data (mature), very rarely read data (aged)
[Diagram label: Archive]
Current situation:
- Near-line storage is the leading and only persistence
- No isolated delete from near-line storage possible
- Workaround: restore to the online database and delete from there
Data-Aging Strategies: Classic Archive + Near-Line Storage
Information lifecycle according to importance/age:
- Storage types: Online Database, Near-Line Storage (read only), Classic Archive (read only)
- Data categories: frequently read / changed data (current), infrequently read data (mature), very rarely read data (aged)
[Diagram label: Archive (ADK + NLS)]
Current situation:
- The ADK (classic) archive is the leading persistence
- Near-line storage is filled from the ADK archive during the verification phase
- Near-line storage is strictly coupled to the ADK archive (no independent delete)
Details for the Planned NLS Deletion Features (for SAP BW 7.3, SP05)
1) Data resides in NLS only (without ADK)
- First step: "logical" deletion of NLS data (set the NLS request to "Invalid")
- The NLS status in the NLS archiving request list will be set to "Marked for Deletion" / "Deleted"
- NLS data will be deleted asynchronously using a clean-up job or (later) a process chain
- Time slices will remain locked
2) Data resides in NLS and ADK
- The request can only be deleted from NLS; data in ADK stays untouched
- ADK delete is not supported from the NLS dialog (see SAP Data Life Cycle / Retention concepts in ERP)
- Later restore from ADK to NLS is supported
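The two-step deletion described for case 1 (logical deletion first, physical deletion later by a clean-up job) can be sketched as a small state model. The class, function names, and status strings below are illustrative assumptions and not the BW implementation.

```python
from dataclasses import dataclass


@dataclass
class NlsRequest:
    """Hypothetical model of a near-line archiving request."""
    request_id: int
    status: str = "active"
    time_slice_locked: bool = True  # the slide states time slices stay locked


def mark_for_deletion(request: NlsRequest) -> None:
    """Step 1: logical deletion; the data is still physically present in NLS."""
    if request.status != "active":
        raise ValueError(f"request {request.request_id} is not active")
    request.status = "marked for deletion"


def cleanup_job(requests) -> list:
    """Step 2: asynchronous physical deletion of all marked requests."""
    deleted = []
    for request in requests:
        if request.status == "marked for deletion":
            request.status = "deleted"
            deleted.append(request.request_id)
    return deleted


requests = [NlsRequest(1), NlsRequest(2)]
mark_for_deletion(requests[0])
print(cleanup_job(requests), [r.status for r in requests])
```

Separating the logical from the physical step keeps the dialog responsive: marking is immediate, while the potentially long-running delete in the near-line system runs in the background.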
Data Resides in NLS (Only)
(Final) deletion of near-line request
Data Resides in NLS Only
Three alternatives lead to the near-line request status "Deleted":
- Finally deleted from NLS (after successful archiving)
- Restored (deleted from NLS but stored in the online DB again)
- Invalidated (never deleted from the online DB)
Data Resides in ADK and NLS
Restore deleted near-line request from ADK
Data Resides in ADK and NLS
New near-line request after restore from ADK
Thank You!
Contact information: rainer.uhle@sap.com
SAP NW BW PM, SAP AG, Walldorf