SAP Data Services Hacks: Auto-Generating Data Migration Jobs
Shobhit Acharya
Session #3507
Learning Points
Improve data migration efficiency using SAP Data Services by implementing a few custom approaches that speed up the extraction and load of source data. These approaches cover:
Programmatically generating data migration jobs to replace labor-intensive and monotonous job design
Using the XML import mechanism to create job templates
Using datastore configurations to ingest multiple instances of identical source databases
Agenda
Introductions & Overview
Data Migration & SAP EIM tools
Data Services: Designer & Workbench
Information Steward
Review a baseline framework
The use case for DS Workbench
Alternative solutions for efficiency
Understanding the DS job XML structure
Developing job generation programs for automation
Datastore configurations
When to use these programs
Demo
Data Migration
Assess > Extract > Consolidate > Cleanse > Load > Reconcile
Data migrations at an enterprise scale need a focus on migrating data efficiently, quickly, repeatably, and reliably. The goal is not just to move and convert data; it's to ensure that the data is of high quality and supports the business processes and the target system's operational needs.
Data Migration
"80% of organizations will underestimate the costs related to data acquisition tasks by an average of 50 percent." (Gartner)
SAP Data Services & Information Steward Overview
SAP Data Services
One solution that provides:
Data Integration
Data Quality
Text Data Processing
One server to execute all capabilities
One design environment to manage all development
One administration console to monitor all functions
SAP Information Steward
Business: measure and compare against information governance rules and standards
IT: share data quality metrics and problems with the business
Case Study: Review a baseline Data Migration Framework
[Diagram: source systems > Legacy/Staging > Transformed > Relevant > Load staging > Target System, orchestrated by SAP BODS, SAP IS, and BOBJ (Reporting)]
1-2. Ingestion / Extraction into Stage
3. Initial Data Profiling
4. Transform to common structure
5. Initial Health Check (IVM)
6. Auto De-dup & Cleanse
7. Apply Relevancy Rules
8. Secondary Health Check (BoA)
9. Facilitated Cleansing (BoA)
10. Consolidate cleansed data
11. Reference Data validation
12. Fix Reference Data issues
13. Pre-load sign off
14. Load to Target System
15. Reconciliation
16. Post-load sign off
Case Study: Ingestion Scope
25+ distinct source systems
Multiple source product versions
Need separate job streams
Need separate job control
Over 400 databases to ingest
Hadoop as staging
Multiple waves of migration
[Diagram: 1. Source Extraction > 2. Ingestion into Stage (SAP BODS, Legacy/Staging) > 3. Initial Data Profiling]
The use case for DS Workbench
Quick to build data replication projects
Data flow design
Additional customizations in DS Designer
Additional functionality added progressively with each release
The use case for DS Workbench Data Replication Design
The use case for DS Workbench Data flows
The use case for DS Workbench: goodies
Monitor performance
View data
The use case for DS Workbench: where it falls short
No big data sources or targets
Little or no workflow customization
A small list of supported transforms
Additional work may be required in DS Designer for job control and customizations
Alternative solutions: generate your own jobs in XML
Dataflows: source tables > query transforms > target (including HDFS)
Workflows
Custom script stages
Jobs
Datastores and configurations
Flat file formats
Import the generated XML into DS Designer as jobs, workflows, and dataflows
Configure the datastore for multiple deployments of the same source product database
Data Services job export in XML
Understanding the structure of a simple job
Example: one dataflow in a job > Export
Data Services job export in XML
Understanding the structure of a simple job
Example: one dataflow in a job exports to roughly 350 lines of XML. Let's look closer.
Data Services job export in XML
Understanding the structure of a simple job
DIDatabaseDatastore
  DIAttributes
  DSConfigurations (*variables*, repeated for each configuration: <odbc_data_source>*datastore*</odbc_data_source>)
DITable (name="*table_name*" owner="*dbowner*" datastore="*datastore*")
  DIProperties
  DIColumn (list all columns and column properties: *COLUMN_NAME*)
DIDataflow (name="*DATAFLOW*")
  DITransforms
  DIAttributes (array sizes, static parameters)
DIDatabaseTableSource (datastorename="*datastore*" tablename="*table_name*")
  DIOutputView (name="*table_name*")
DIFileTarget (formatname="*datastore*_*table_name*" filename="*table_name*")
  DIAttributes (HDFS file location/path + static parameters)
DIQuery
  DIAttribute (name="*table_name*" value="*table_name*")
  DISchema
    DIElement (list all columns and column properties: *COLUMN_NAME*)
  DISelect
    DIProjection
      DIExpression (column="*column_name*")
    DIFrom
DIFlatFileDatastore (name="*DATASTORE*_*TABLE_NAME*")
  DISchema
    DIElement (list all columns and column properties: *COLUMN_NAME*)
  DIAttributes (datastore input and output file store attributes, 1 per job)
  DIUIOptions (name="*datastore*_*table_name*" value="*table_name*")
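The element names above come straight from a Designer export, so a generator can assemble the same skeleton per table. A minimal sketch in Python, assuming the nesting shown on this slide; the exact attribute set of a real export is richer, and the DF_ naming convention here is hypothetical:

```python
# Sketch: emit a minimal DIDataflow skeleton per source table, using the
# element names from the export structure above. Attribute details are
# illustrative; a real generator would start from an exported job and
# fill in every attribute Data Services expects.
import xml.etree.ElementTree as ET

def build_dataflow(datastore: str, owner: str, table: str) -> ET.Element:
    # One dataflow: database table source -> file (HDFS) target
    df = ET.Element("DIDataflow", name=f"DF_{datastore}_{table}")
    ET.SubElement(df, "DIAttributes")
    transforms = ET.SubElement(df, "DITransforms")
    src = ET.SubElement(transforms, "DIDatabaseTableSource",
                        datastorename=datastore, tablename=table, owner=owner)
    ET.SubElement(src, "DIOutputView", name=table)
    tgt = ET.SubElement(transforms, "DIFileTarget",
                        formatname=f"{datastore}_{table}", filename=table)
    ET.SubElement(tgt, "DIAttributes")
    return df

xml_text = ET.tostring(build_dataflow("QAD01", "dbo", "CUSTOMER"),
                       encoding="unicode")
```

Generating one such fragment per table, then wrapping them in workflows and a job, is what turns 350 lines of exported XML into a repeatable template.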
Generating the XML programmatically: understanding what you need
Programmers for your code: SQL programming skills, or Java/Python/.NET
A Data Services sandbox repository
Source database table and column definitions:
Oracle: all_tab_columns
SQL Server: information_schema
Progress DB: sysprogress.syscolumns_full
DB schema for code and column definitions
+ true grit
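The table and column definitions drive everything else, so the generator's first step is a metadata query per source platform. A small sketch of those queries: the Oracle and SQL Server catalog views and columns are standard; the column names used for the Progress view are an assumption based on the view named on this slide.

```python
# Sketch: column-metadata queries for each source platform listed above.
# Oracle and SQL Server catalogs are standard; the Progress column names
# (tbl, col, coltype, width) are an assumption and should be verified
# against the target DB version.
CATALOG_QUERIES = {
    "oracle": (
        "SELECT table_name, column_name, data_type, data_length "
        "FROM all_tab_columns WHERE owner = :owner"
    ),
    "sqlserver": (
        "SELECT table_name, column_name, data_type, character_maximum_length "
        "FROM information_schema.columns WHERE table_schema = ?"
    ),
    "progress": (
        "SELECT tbl, col, coltype, width "
        "FROM sysprogress.syscolumns_full"
    ),
}

def metadata_query(platform: str) -> str:
    """Return the catalog query for a given source platform."""
    return CATALOG_QUERIES[platform.lower()]
```

The result set of each query feeds the DIColumn / DIElement lists in the generated XML, one row per column.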
Generating the XML programmatically: applying the understanding to complex designs
Auto-generated, imported XML. This example:
1 source datastore
20+ configurations
400+ tables
400+ HDFS formats
400+ dataflows
400+ workflows
1 job
Generating the xml programmatically Applying the understanding for complex designs Workflow Auto generated Scripts Dataflow Datastores
Data Services Datastore Configurations
Contain alternate connection parameters for the datastore
Typically used for promotions to new environments
Can also be leveraged to run template jobs against multiple databases with identical schemas (e.g., QAD Progress databases)
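In the generated XML this amounts to emitting one DSConfiguration per database instance inside the datastore. A minimal sketch: the <odbc_data_source> child matches the export structure shown earlier, while the DSConfiguration name attribute and Config_ prefix are assumptions.

```python
# Sketch: emit one DSConfiguration per source database instance so one
# template job can run against many identical schemas. The
# <odbc_data_source> child follows the export structure shown earlier;
# the name attribute / Config_ prefix are illustrative assumptions.
import xml.etree.ElementTree as ET

def build_configurations(instances: list[str]) -> ET.Element:
    configs = ET.Element("DSConfigurations")
    for dsn in instances:
        cfg = ET.SubElement(configs, "DSConfiguration", name=f"Config_{dsn}")
        ET.SubElement(cfg, "odbc_data_source").text = dsn
    return configs

cfg_xml = ET.tostring(build_configurations(["QAD01", "QAD02", "QAD03"]),
                      encoding="unicode")
```

Switching the active configuration at execution time then points the same generated job at a different database, with no redesign.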
Best practices
Always isolate custom XML imports in a sandbox repository
Use datastore configurations to maximum effect
Before importing, export (to XML) the intended source and destination datastores without any tables included
After importing, re-import these datastores to override the generated datastores
Generating the XML programmatically
When could you need this?
Return on Investment
Initial assessment at JCI for developing the custom programs needed (codename: ATLGEN)
Effort invested in ATLGEN development: 1 person-week
Typical pre-ATLGEN effort: 2 weeks per source
Potential post-ATLGEN efficiency: 10x per source
Number of sources: 25-50+ distinct sources
Live demo
STAY INFORMED
Follow the ASUGNews team:
Tom Wailgum: @twailgum
Chris Kanaracus: @chriskanaracus
Craig Powers: @Powers_ASUG
SESSION CODE 3507