September 9-11, 2013, Anaheim, California
Understanding and Leveraging Improvements in SAP Data Integration and Data Services Platform 4.2
Tanya Milanovic
Enterprise Information Management with SAP
- Understand the big picture of SAP's enterprise information management offerings
- Explore step-by-step instructions for working with SAP Data Services
- Learn how to perform the most important tasks in SAP Information Steward, SAP NetWeaver Information Lifecycle Management, SAP Master Data Governance, and more
Reviews are in! A consistent Top 10 best-seller with SAP Press. All royalties donated to Doctors Without Borders.
Learning Points
Major improvements in Data Integration between Data Services 3.2 and 4.2 in the areas of:
- Security and Supportability
- BIG DATA and Performance
- Usability
What is SAP Data Services?
SAP Data Services unifies leading data integration, data quality management, and data profiling solutions in a single enterprise-class product.
Features: one development UI, metadata repository, data connectivity layer, run-time environment, and management console.
3.2 was good; 4.2 is much, much better.
DS 4.0 Security and Supportability
Improved security in architecture and user management through tight integration with SAP BusinessObjects Enterprise (BI).
DS 4.0 Security and Supportability
Deployment scenarios:
- Wiki: http://wiki.sdn.sap.com/wiki/display/eim/data+services+common+install+scenarios
- KBA/SAP Note 1740516: SAP Data Services 4.x and SAP Information Steward 4.x compatibility with SAP BusinessObjects BI Platform and Information Platform Services
DS 4.0 Security and Supportability
Improved security in the architecture:
- SSL is used for all TCP/IP communication channels in DS; SSL encryption can be disabled for improved performance
- HTTPS is now supported for the Management Console
- BMC (BusinessObjects cryptographic module) is used for export/import of ATL files
DS 4.0 Security and Supportability
New security model for user management:
- One system to manage all users (BI and DS)
- A common set of users for Designer, the Management Console (MC), and the central repository (in 3.2 this required three different username/password pairs)
- Support for LDAP and Active Directory for user authentication
- Control over which repositories a user can access when logged on to the MC (in 3.2, a logged-on user could access all repositories)
- Enforcement of password policies
DS 4.0 Security and Supportability
SAP Solution Manager 7.1 integration:
- Centralized monitoring, troubleshooting, and performance statistics at a single glance
- CPU/memory usage and total execution time at the job level, as well as aggregated for the job server and host
DS 4.1 Security and Supportability
A pop-up message appears when a user who has not been granted the 'retrieve password' right tries to use a repository, either when logging on to Designer or when opening a DQ report in the Management Console.
DS 4.1 Security and Supportability
- Enhanced ABAP function modules now have their own namespace, /BODS/, instead of the custom Z namespace.
- Only the DS user is now allowed to call the DS functions. The authorization objects ZDSAUTH, ZDSDEV, ZPGMCHK, and ZSDS (part of the DS function transport) need to be added.
DS 4.1 Security and Supportability
New locations for user-specific and common configuration files and logs:
- %DS_COMMON_DIR%: %SYSTEMDRIVE%\ProgramData, for all common configuration files and log files, including DSConfig.txt
- %DS_USER_DIR%: %SYSTEMDRIVE%\Users\<user>, for user-specific configuration files, including DSUserConfig.txt
Setting up DS on Citrix is simpler: there is no need to manually copy files and create user-specific folders.
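The split between one shared location and one per-user location can be illustrated with a small path resolver. This is a sketch, not DS code: the helper names are hypothetical, the defaults simply restate the slide, and %USERNAME% is an assumed stand-in for <user>.

```python
import ntpath
import re

# Default locations as listed on the slide; %USERNAME% stands in for <user>.
DEFAULTS = {
    "DS_COMMON_DIR": r"%SYSTEMDRIVE%\ProgramData",
    "DS_USER_DIR": r"%SYSTEMDRIVE%\Users\%USERNAME%",
}


def expand_percent_vars(path: str, env: dict) -> str:
    """Expand Windows-style %VAR% references; unknown variables are left as-is."""
    return re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1), m.group(0)), path)


def config_file_paths(env: dict) -> dict:
    """Hypothetical helper: where the shared and per-user config files would live."""
    merged = {**DEFAULTS, **env}  # values from the environment override the defaults
    common_dir = expand_percent_vars(merged["DS_COMMON_DIR"], merged)
    user_dir = expand_percent_vars(merged["DS_USER_DIR"], merged)
    return {
        "DSConfig.txt": ntpath.join(common_dir, "DSConfig.txt"),
        "DSUserConfig.txt": ntpath.join(user_dir, "DSUserConfig.txt"),
    }
```

Every user on a shared Citrix host resolves the same DSConfig.txt, while DSUserConfig.txt lands in each user's own folder, which is why no manual copying is needed.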
DS 4.1 Security and Supportability
The monitor sample rate changed from a row-based trigger to a time-based trigger:
- Before: by default, a sample was written every 1,000 rows, which had a performance impact.
- Now: the status of all threads is written every five seconds.
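A quick worked calculation shows why the switch matters for large jobs. The throughput figure below is an assumption chosen for illustration; the intervals (1,000 rows vs. five seconds) come from the slide.

```python
def monitor_samples(total_rows: int, rows_per_second: float,
                    row_interval: int = 1000, seconds_interval: float = 5.0) -> tuple:
    """Return (row-based, time-based) sample counts for one job run."""
    duration_s = total_rows / rows_per_second
    row_based = total_rows // row_interval            # one sample every N rows
    time_based = int(duration_s // seconds_interval)  # one sample every T seconds
    return row_based, time_based


# 10 million rows at an assumed 100,000 rows/s: a 100-second run.
print(monitor_samples(10_000_000, 100_000))  # (10000, 20)
```

With a row-based trigger the number of monitor writes grows with data volume; with a time-based trigger it grows only with elapsed time.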
DS 4.1 Security and Supportability
Improved monitor log information to identify performance bottlenecks:
- CPU utilization of the entire process
- CPU utilization of the thread
- Input buffer rows used and input buffer row size
DS 4.2 Security and Supportability
Data Services Job Lifecycle Management:
- Web-based tool to manage job promotion securely from development to production
- New three-step wizard in the Management Console for exporting and importing objects
- New user rights in the CMC
DS 4.2 Security and Supportability
Export:
1. Select the source repository and object types
2. Select the objects to promote and the target
3. Export
Supports FTP, SFTP, and shared directories.
DS 4.2 Security and Supportability
Import:
1. Select the target repository
2. Select the objects to import
3. Import
DS 4.0 BIG DATA and Performance
- DS is the engine for loading data into SAP HANA.
- SAP HANA Modeler uses DS to browse and import external metadata and to generate initial flows (which can be further modified in DS Designer).
- Support for Teradata FastExport (fast extraction) and an increased number of pushdown functions
DS 4.0 BIG DATA and Performance
New SAP interface support: Extractors
- Supported SAP interfaces: ABAP, IDocs, RFCs/BAPIs, RFC_READ_TABLE, and now Extractors
- Use case: if you have an SAP application but not BW, or you want to integrate data from a source other than ECC, you can now use DS with Extractors
- Delta queue support through the ODP API
- 200+ extractors officially released for DS by the Business Suite
DS 4.0 BIG DATA and Performance
- Support for the new BW 7.3 staging BAPI for native loading of BW 7.x datastores (no need for BW 3.5 emulation mode)
- Better Unicode data handling through the updated SAP NetWeaver RFC SDK
- The RFC server now allows parallel processing for BW loading and for extraction via Open Hub
- Sources and targets: synonym support (Oracle, DB2), ODBC improvements, web service datastore configurations
DS 4.1 BIG DATA and Performance
Enhanced HANA support:
- Improved load performance and support for bulk updates (also part of DS 4.0 SP2)
- Support for stored procedures
- Advanced pushdown that leverages HANA's capabilities for transformations
- HANA can be used to store the DS repository
DS 4.1 BIG DATA and Performance
Hadoop support:
- Reading from and loading to HDFS and Hadoop
- Text Data Processing and query processing as MapReduce functions, using the Hive add-on and Pig scripts
- Pushdown support (joins, sorting, filtering and projection, aggregate functions, text data processing)
DS 4.1 BIG DATA and Performance
- Sybase IQ loading performance enhancements: bulk update support and use of the binary data format
- Teradata bulk load enhancements: support for bulk deletes
- Microsoft SQL Server 2008 CDC support: support for CDC and Change Tracking
DS 4.1 BIG DATA and Performance
- Data streaming (no intermediate data files needed) for ABAP data flows via a new RFC transport method
- RFC_READ_TABLE now supports large amounts of data when the transport method is RFC (joins are not supported)
- Parallel reading from Business Content extractors
- Load balancing support in SAP datastores
DS 4.2 BIG DATA and Performance
HANA is the only database that has a script language supporting parallel processing. Hence DS can push down tasks that cannot be expressed efficiently in SQL, e.g.:
- Validation transforms with certain rules
- Loading multiple tables in parallel
- Merge transform
You can now turn an entire data flow into a calculation view.
DS 4.2 BIG DATA and Performance
You can build a data flow and switch back and forth between materializing the data (loading the physical target table) and keeping it virtual (the target object being the calculation view).
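The difference between a materialized and a virtual target can be shown by analogy with an ordinary SQL table versus a view. The sketch below uses SQLite, not HANA, and a plain view is much simpler than a HANA calculation view; the table and column names are hypothetical. It only illustrates that the same transformation logic either stores a snapshot or is recomputed on every read.

```python
import sqlite3

LOGIC = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"


def build_target(materialize: bool) -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("EU", 10), ("EU", 5), ("US", 7)])
    # Same transformation logic; only the kind of target object changes.
    kind = "TABLE" if materialize else "VIEW"
    conn.execute(f"CREATE {kind} sales_by_region AS {LOGIC}")
    # New source data arrives after the target was built.
    conn.execute("INSERT INTO sales VALUES ('US', 3)")
    rows = conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
    conn.close()
    return dict(rows)


print(build_target(materialize=True))   # {'EU': 15, 'US': 7}  (snapshot until reloaded)
print(build_target(materialize=False))  # {'EU': 15, 'US': 10} (computed on read)
```

The trade-off is the usual one: a materialized target is fast to read but must be reloaded, while a virtual one is always current but pays the computation cost at query time.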
DS 4.2 BIG DATA and Performance
Social media and the Adapter SDK:
- Allows customers, partners, and SAP to develop and deploy connectors to technical and application systems
- SQL pushdown based on data source capabilities
- Change Data Capture (CDC) framework for real-time data movement
- Built-in SQL parser to help convert data access statements into SQL-like languages
- Use case: connect to NoSQL databases like Cassandra and social media sources like Twitter
DS 4.2 BIG DATA and Performance
Real-time data integration:
- Real-time and continuous CDC updates with complex transformations and data quality for any data-intensive project, for both ERP and non-ERP sources
- Guaranteed data delivery, zero fault tolerance, and zero operational downtime
DS 4.2 BIG DATA and Performance
Continuous Workflow (first introduced in DS 4.1 SP1):
- Runs its child data flows in a loop, keeping only minimal memory for the next iteration
- Connecting to the repository, parsing/optimizing/compiling the ATL, and opening database connections are performed only once, improving performance
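The performance benefit comes from hoisting one-time work out of the loop. A minimal sketch of that control flow, assuming a `setup` callable that stands in for the repository connection, ATL compilation, and database connections; all names are hypothetical and this is not how DS implements the feature.

```python
from typing import Callable, Dict, List


def run_continuous(setup: Callable[[], Dict],
                   child_flows: List[Callable[[Dict, int], None]],
                   should_stop: Callable[[int], bool]) -> int:
    """Run child flows in a loop after a single, shared setup step.

    `setup` stands in for the one-time work (repository connection,
    ATL parse/optimize/compile, opening database connections).
    Returns the number of completed iterations.
    """
    context = setup()  # executed once, not once per iteration
    iteration = 0
    while not should_stop(iteration):
        for flow in child_flows:
            flow(context, iteration)  # every iteration reuses the same context
        iteration += 1
    return iteration
```

Running three iterations calls `setup` exactly once; a conventional job scheduled three times would repeat the setup on every run.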
DS 4.2 BIG DATA and Performance
- Over 5,000 extractors certified with DS
- Reading an ABAP table directly (RFC_READ_TABLE) can be partitioned and can be faster than an ABAP data flow
- DS-provided functions were added to the SAP stack shipped with the SAP software components PI_BASIS, SAP_APPL, and SAP_BW
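Partitioned reading generally means splitting the source into disjoint key ranges and reading them concurrently. The sketch below assumes a numeric key and a hypothetical `read_range(lo, hi)` callable that would wrap one RFC_READ_TABLE call restricted by a WHERE condition; it does not call SAP and is not DS's documented partitioning scheme.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def key_partitions(lo: int, hi: int, n: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lo, hi] into up to n contiguous ranges."""
    size, extra = divmod(hi - lo + 1, n)
    parts, start = [], lo
    for i in range(n):
        end = start + size + (1 if i < extra else 0) - 1
        if end >= start:  # skip empty ranges when n exceeds the number of keys
            parts.append((start, end))
        start = end + 1
    return parts


def parallel_read(read_range: Callable[[int, int], list],
                  lo: int, hi: int, n: int) -> list:
    """Read each key range concurrently and concatenate results in key order."""
    ranges = key_partitions(lo, hi, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        chunks = list(pool.map(lambda r: read_range(*r), ranges))
    return [row for chunk in chunks for row in chunk]
```

Because the ranges do not overlap and `Executor.map` preserves input order, the concatenated result contains each row once, in key order.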
DS 4.0 Usability
Query transform:
- Improved support for outer joins, now supporting ANSI SQL-92 joins (using the ON clause instead of the WHERE clause)
- More predictable results in complex outer joins, and better performance due to more pushdown
- New, more intuitive UI: all join properties on one tab; join ranks and cache settings defined in the FROM tab
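Why the ON clause gives more predictable outer-join results can be seen in standard SQL. The example below runs in SQLite with hypothetical tables; it is not SQL generated by DS, but it shows the same semantic difference: a filter in ON keeps every outer row, while the same filter in WHERE discards the NULL-extended rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 'OPEN'), (2, 'CLOSED');
""")

# Filter in the ON clause: restricts which orders are joined,
# but every customer is still returned (outer-join semantics preserved).
on_clause = conn.execute("""
    SELECT c.name, o.status FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id AND o.status = 'OPEN'
    ORDER BY c.id""").fetchall()

# Same filter in WHERE: applied after the join, it drops Bob's
# NULL-extended row and silently turns the outer join into an inner join.
where_clause = conn.execute("""
    SELECT c.name, o.status FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'OPEN'
    ORDER BY c.id""").fetchall()

print(on_clause)     # [('Ann', 'OPEN'), ('Bob', None)]
print(where_clause)  # [('Ann', 'OPEN')]
```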
DS 4.0 Usability
- Validation transform: define multiple rules for one column, and bind a rule to multiple columns
- Hierarchy_Flattening transform supports circular dependencies
- New functions: encryption/decryption, UUID generation
DS 4.1 Usability
Data Services Workbench 1.0
Replicate data and metadata from third-party databases into targets via a simple wizard:
1. Set up connections to the source and target systems
2. Select the required tables from the source (or simply select all tables)
3. Generate and execute the job to move all data with one mouse click
Bulk-load: SAP HANA, Sybase IQ, Teradata
DS 4.1 Usability
XML_Map transform:
- Easily nest and un-nest hierarchical data structures
- Use iteration rules for repeatable object construction
- Target columns can be used in WHERE clauses, ORDER BY, GROUP BY, DISTINCT, and aggregate functions
DS 4.1 Usability
Design-Time Data Viewer:
- To invoke the Design-Time Data Viewer, select Debug > View Design-Time Data
- Select 'View Automatically' to refresh the data automatically whenever the transform design changes
- Output display data can be filtered (default: 50 rows) to improve display performance
DS 4.2 Usability
Data Services Workbench 2.0:
- Supports tables from multiple sources, including delimited text files, in a single replication job
- Each replication job can be organized into multiple groups that are executed in parallel in a DS workflow
DS 4.2 Usability
- One-step query creation
- Automatic table joins
- Automatic column mappings
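Automatic column mapping is typically driven by matching names. The slide does not state Workbench's matching rules, so the case-insensitive exact-name match below is an assumption, and `auto_map` is a hypothetical helper.

```python
from typing import Dict, List


def auto_map(source_cols: List[str], target_cols: List[str]) -> Dict[str, str]:
    """Map each target column to the source column with the same name.

    Matching is case-insensitive (an assumption); unmatched target columns
    are left out so they can be mapped manually.
    """
    by_name = {col.lower(): col for col in source_cols}
    return {tgt: by_name[tgt.lower()] for tgt in target_cols if tgt.lower() in by_name}


print(auto_map(["CUST_ID", "Name", "City"], ["cust_id", "NAME", "ZIP"]))
# {'cust_id': 'CUST_ID', 'NAME': 'Name'}
```

ZIP has no counterpart in the source, so it stays unmapped and would need a manual expression.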
DS 4.2 Usability
Apply expression macros to selected columns to reduce repetitive manual mappings (example: Basic Cleanse).
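An expression macro is essentially one template expanded once per selected column. The slide does not give the definition of Basic Cleanse, so the macro body below is a hypothetical example built from the DS functions ltrim_blanks, rtrim_blanks, and upper; the code only generates expression text and evaluates nothing.

```python
from typing import Dict, List

# Hypothetical macro body; {col} is replaced with each selected column name.
BASIC_CLEANSE = "upper(rtrim_blanks(ltrim_blanks({col})))"


def apply_macro(macro: str, columns: List[str]) -> Dict[str, str]:
    """Generate one mapping expression per selected column from a single template."""
    return {col: macro.format(col=col) for col in columns}


for column, expression in apply_macro(BASIC_CLEANSE, ["NAME", "CITY"]).items():
    print(f"{column}: {expression}")
```

Changing the macro in one place updates every generated mapping, which is where the reduction in repetitive manual work comes from.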
DS 4.2 Usability
- Analyze data flow lineage and transform metadata to improve the quality of data integration
- View transformation metadata
DS 4.2 Usability
Supported sources, targets, and transforms
DS 4.2 Usability
Enhanced Map_Operation transform: supports mapping expressions, allowing complex data transformations for rows with non-normal row types.
DS 4.2 Usability
- Different mappings can be used for insert (= normal), update, and delete rows
- If a mapping is left empty, a 1:1 mapping is used, just as in previous releases of DS
- The mapping used depends on the input row's opcode
- before_image(): a new function that is available only in the Map_Operation update mapping
DS 4.2 Usability
UC1: The target should have two timestamps, INSERT_DATE and LAST_UPDATE_DATE.
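The per-opcode mappings can be modeled as a dictionary of column expressions for each row type, with the before image passed only to update mappings. This is an illustrative model, not the transform's implementation. For UC1 it assumes that the update row's INSERT_DATE should be preserved via the before image; NOW is a fixed, hypothetical timestamp used for reproducibility.

```python
from datetime import datetime
from typing import Callable, Dict, Optional

Row = Dict[str, object]
# A column mapping sees the current row and, for update rows only, the
# before image (mirroring that before_image() is update-mapping-only).
Mapping = Callable[[Row, Optional[Row]], object]


def map_operation(row: Row, opcode: str, before: Optional[Row],
                  mappings: Dict[str, Dict[str, Mapping]]) -> Row:
    """Apply the mapping set configured for the row's opcode (illustrative model)."""
    key = "insert" if opcode == "normal" else opcode  # normal rows use the insert mapping
    image = before if key == "update" else None
    out = dict(row)  # columns without a mapping pass through 1:1
    for column, expr in mappings.get(key, {}).items():
        out[column] = expr(row, image)
    return out


NOW = datetime(2013, 9, 10, 12, 0)  # fixed "current time" so results are reproducible

# UC1: INSERT_DATE is set once; LAST_UPDATE_DATE is refreshed on every update.
UC1 = {
    "insert": {"INSERT_DATE": lambda r, b: NOW, "LAST_UPDATE_DATE": lambda r, b: NOW},
    "update": {"INSERT_DATE": lambda r, b: b["INSERT_DATE"],
               "LAST_UPDATE_DATE": lambda r, b: NOW},
}
```

Delete rows have no entry in UC1, so they pass through unchanged, matching the 1:1 default for empty mappings.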
DS 4.2 Usability
UC2: The input has a BALANCE column; we want to know the BALANCE_CHANGE.
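For update rows the change is the new value minus the before image. What an insert or delete row should contribute is not stated on the slide; the choices below (full balance for inserts, negated balance for deletes) are assumptions, and `balance_change` is a hypothetical helper.

```python
from typing import Dict, Optional


def balance_change(opcode: str, row: Dict[str, int],
                   before: Optional[Dict[str, int]] = None) -> int:
    """BALANCE_CHANGE for one row, keyed by its opcode (illustrative model)."""
    if opcode == "update":
        # Update mapping: BALANCE - before_image(BALANCE)
        return row["BALANCE"] - before["BALANCE"]
    if opcode in ("insert", "normal"):
        # Assumption: for a new account, the whole balance counts as the change.
        return row["BALANCE"]
    if opcode == "delete":
        # Assumption: deleting an account reverses its remaining balance.
        return -row["BALANCE"]
    raise ValueError(f"unknown opcode: {opcode}")
```

An update from 100 to 150 yields a BALANCE_CHANGE of 50.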
DS 4.2 Usability
- UC3: Delta data flow for aggregation (some sources only provide the change amount)
- UC4: Slowly Changing Dimension Type 3: the table has a current and a previous column for each attribute. CURRENT_NAME is the CUSTOMER_NAME; PREVIOUS_NAME is before_image(customer_name).
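UC4 can be sketched as a row builder that fills the current column from the input and the previous column from the before image. CUSTOMER_ID is a hypothetical key column, and leaving PREVIOUS_NAME empty for new customers is an assumption not stated on the slide.

```python
from typing import Dict, Optional


def scd3_row(opcode: str, row: Dict[str, object],
             before: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Type 3 target row: current and previous value of CUSTOMER_NAME."""
    out = {"CUSTOMER_ID": row["CUSTOMER_ID"], "CURRENT_NAME": row["CUSTOMER_NAME"]}
    if opcode == "update":
        # Update mapping: PREVIOUS_NAME = before_image(customer_name)
        out["PREVIOUS_NAME"] = before["CUSTOMER_NAME"]
    elif opcode in ("insert", "normal"):
        # Assumption: a newly inserted customer has no previous name yet.
        out["PREVIOUS_NAME"] = None
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    return out
```

Unlike Type 2, only one level of history is kept: a second rename overwrites PREVIOUS_NAME.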
DS 4.2 Usability
Enhanced XML_Map transform: batch mode allows you to break the input data set into subsets.
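Batch mode can be pictured as chunking the flat input and building one nested document per chunk, so no single document has to hold the whole data set. The batch size and the element names below are hypothetical, and this is only an analogy using Python's standard library, not the XML_Map transform itself.

```python
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batches(rows: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive subsets of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def batch_to_xml(batch: List[Tuple[int, str]]) -> str:
    """Nest one batch of flat (id, name) rows under a single root element."""
    root = ET.Element("Customers")
    for cust_id, name in batch:
        ET.SubElement(root, "Customer", id=str(cust_id)).text = name
    return ET.tostring(root, encoding="unicode")


rows = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
for batch in batches(rows, 2):
    print(batch_to_xml(batch))
```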
DS 4.2 Usability
Entity_Extraction transform improved to simplify Text Data Processing data flows and improve results:
- Single transform for multiple languages
- Automatic language identification and selection of language-specific custom dictionaries and rules
- Expanded Dutch and Portuguese extraction
- New Simplified Chinese Voice of the Customer rules
- Expanded emoticon and profanity extraction in French, German, and Spanish
Key Learnings
Understand the key benefits of upgrading to Data Services 4.2:
- Access and analyze diverse BIG DATA sources
- Drive real-time decision making based on reliable real-time data
- Gain efficiencies in ETL development and cut costs
- Ensure the security and robustness of your ETL
- Speed up your ETL processes to meet the batch window
Submit your Information Governance project for the 2nd Annual IGgie Award Presented at the ASUG Data Governance SIG Conference 10 October 2013, Atlanta, Georgia
Thank you for participating. Please provide feedback on this session by completing a short survey via the event mobile application. SESSION CODE: 0207 Learn more year-round at www.asug.com