InfoSphere CDC to DataStage Integration Options
© IBM Corporation
Business Challenges Driving Real-Time Data Integration
- Dynamic Warehousing & Business Intelligence and Reporting: yesterday's data is inadequate for inventory and purchasing decisions
- Data Synchronization and Replication: up-to-date information must flow between applications, and an up-to-date version must always be available
- Real-Time Event Detection: the business needs to proactively monitor and respond to business changes
All without impacting the performance of production systems.
Accelerate capture and delivery of data changes for ETL optimization or event-driven data quality
- InfoSphere Change Data Capture provides low-impact, log-based change data capture and rapid delivery of changes
- Direct integration with InfoSphere DataStage and InfoSphere QualityStage through flat files, direct connection, message queues, or staging tables
- Extremely low impact on source systems for ETL processing into the data warehouse
- Leverages existing ETL and data cleansing investments
[Diagram: source databases feed Change Data Capture, which delivers data changes for ETL and data cleansing into IBM Information Server]
Differentiators and Benefits
- Integrated with InfoSphere Information Server: technology integrated to feed real-time changed data into InfoSphere Information Server. Benefit: extends existing InfoSphere Information Server functionality with real-time data feeds.
- High Performance: optimized native, log-based change data capture without staging on the source; less invasive to data sources and network bandwidth than alternative solutions. Benefit: fast and efficient, with no additional hardware, no changes to systems/applications, and low impact on the performance of source systems.
- Transactional Integrity: fault-tolerant architecture maintains consistency and recovery. Benefit: lowers risk by ensuring data integrity.
- Breadth of Coverage: DB2 (z/OS, LUW, iSeries), Oracle, Sybase, SQL Server, Informix, IMS, VSAM, ADABAS, IDMS. Benefit: leverages existing investments.
Four Different Integration Options
- Database staging
- MQ Series integration
- Flat file integration
- Direct connect
Greater flexibility to choose whichever option best fits your environment and business requirements.
InfoSphere CDC & InfoSphere DataStage (ETL)
[Diagram: InfoSphere CDC reads the native database log of a source (e.g. an Oracle point-of-sale system) and continuously feeds IBM Information Server / DataStage, which loads the EDW (Teradata, DB2, Oracle, SQL Server, Sybase), including BalOp (ELT). Four delivery paths:]
- Direct connect: TCP via a DataStage operator
- Staging table: out of the box
- Message queue: out of the box
- Flat file: DataStage DSX file format
DataStage Option 1: Database Staging
1. DataStage extracts data for the initial load using standard ETL functions.
2. InfoSphere CDC continuously captures changes made to the source.
3. InfoSphere CDC continuously writes changes to a set of staging tables using Live Audit mappings.
4. DataStage reads the changes from the staging tables, transforming and cleansing the data as needed.
5. DataStage updates the target with the changes.
6. DataStage updates its internal tracking with the last bookmark processed.
Ideal for:
- Low latency (minutes)
- High data volumes (thousands of rows per second)
- Any number of tables
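The staged-changes loop in steps 4 through 6 can be sketched as follows. This is a minimal illustration using SQLite; the table layout (bookmark column, I/U/D operation codes, tracking table) is a hypothetical stand-in for CDC's actual Live Audit schema, not the product's real format.

```python
import sqlite3

# Hypothetical staging layout: CDC appends one row per change with a
# monotonically increasing bookmark, and the ETL job records the last
# bookmark it has applied so each batch resumes where the last one ended.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stg_customer (bookmark INTEGER, op TEXT, id INTEGER, name TEXT);
    CREATE TABLE tgt_customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE etl_tracking (job TEXT PRIMARY KEY, last_bookmark INTEGER);
    INSERT INTO etl_tracking VALUES ('customer_job', 0);
""")

# CDC would populate the staging table; simulate a few captured changes.
db.executemany("INSERT INTO stg_customer VALUES (?, ?, ?, ?)", [
    (1, "I", 100, "Alice"), (2, "I", 101, "Bob"),
    (3, "U", 100, "Alicia"), (4, "D", 101, None),
])

def run_batch(conn):
    """Apply all staged changes past the last processed bookmark."""
    (last,) = conn.execute(
        "SELECT last_bookmark FROM etl_tracking WHERE job='customer_job'").fetchone()
    rows = conn.execute(
        "SELECT bookmark, op, id, name FROM stg_customer "
        "WHERE bookmark > ? ORDER BY bookmark", (last,)).fetchall()
    for bookmark, op, cid, name in rows:
        if op == "I":
            conn.execute("INSERT INTO tgt_customer VALUES (?, ?)", (cid, name))
        elif op == "U":
            conn.execute("UPDATE tgt_customer SET name=? WHERE id=?", (name, cid))
        elif op == "D":
            conn.execute("DELETE FROM tgt_customer WHERE id=?", (cid,))
        last = bookmark
    # Step 6: record the last bookmark processed for the next batch.
    conn.execute(
        "UPDATE etl_tracking SET last_bookmark=? WHERE job='customer_job'", (last,))
    conn.commit()

run_batch(db)
print(db.execute("SELECT id, name FROM tgt_customer").fetchall())  # [(100, 'Alicia')]
```

Because the bookmark is tracked, re-running the batch applies nothing twice, which is what makes periodic polling of the staging area safe.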
DataStage Option 2: MQ-Based Integration
1. DataStage extracts data for the initial load using standard ETL functions.
2. InfoSphere CDC continuously captures changes made to the remote source.
3. InfoSphere CDC continuously writes change messages to MQ via the event server.
4. DataStage (via the MQ connector) processes the messages and passes the data to downstream stages.
5. Updates are written to the target.
Ideal for:
- Near real-time integration (seconds)
- Low data volumes (hundreds of changes per second)
- Infrastructures that already use MQ Series
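The message-per-change flow above can be sketched as follows. In this minimal illustration, Python's `queue.Queue` stands in for a WebSphere MQ queue, and the JSON message layout is an assumed format for the sake of the example, not CDC's actual wire format.

```python
import json
import queue

# In-memory queue standing in for an MQ queue.
mq = queue.Queue()

# Simulate CDC's event server publishing one message per captured change.
for msg in [
    {"table": "orders", "op": "I", "row": {"id": 1, "qty": 5}},
    {"table": "orders", "op": "U", "row": {"id": 1, "qty": 7}},
]:
    mq.put(json.dumps(msg))

target = {}  # stands in for the target table, keyed by id

def consume(q, tgt):
    """Drain the queue, applying each change message in arrival order."""
    while not q.empty():
        msg = json.loads(q.get())
        row = msg["row"]
        if msg["op"] in ("I", "U"):
            tgt[row["id"]] = row
        elif msg["op"] == "D":
            tgt.pop(row["id"], None)

consume(mq, target)
print(target)  # {1: {'id': 1, 'qty': 7}}
```

Applying messages strictly in arrival order is what preserves the source's change sequence, which is why this option suits lower volumes where per-message processing overhead is acceptable.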
DataStage Option 3: File-Based Integration
1. DataStage extracts data for the initial load using standard ETL functions, or InfoSphere CDC can be used for the refresh.
2. InfoSphere CDC continuously captures changes made to the source.
3. InfoSphere CDC writes one file per table and periodically hardens the files.
4. DataStage reads the changes from the completed files.
5. DataStage updates the target with the changes.
Ideal for:
- Medium latency (a few minutes or more between periodic batches)
- Very high data volumes requiring parallel loading
- Up to hundreds of tables
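The harden-then-read handoff in steps 3 and 4 can be sketched as follows. The rename-on-harden convention and the `.tmp`/`.done` file names are illustrative assumptions; the point is only that the reader never touches a file the writer may still be appending to.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()

def append_change(table, line):
    """Capture side: append one change record to the table's open file."""
    with open(os.path.join(workdir, table + ".tmp"), "a") as f:
        f.write(line + "\n")

def harden(table, batch_no):
    """Harden the current file (close the batch) so the reader may consume it."""
    src = os.path.join(workdir, table + ".tmp")
    if os.path.exists(src):
        os.rename(src, os.path.join(workdir, f"{table}.{batch_no}.done"))

def read_hardened():
    """Reader side: process only completed (.done) files, then delete them."""
    records = []
    for name in sorted(os.listdir(workdir)):
        if name.endswith(".done"):
            path = os.path.join(workdir, name)
            with open(path) as f:
                records.extend(f.read().splitlines())
            os.remove(path)
    return records

append_change("customer", "I,100,Alice")
append_change("customer", "U,100,Alicia")
harden("customer", 1)
append_change("customer", "D,100,")   # lands in the next, still-open batch
print(read_hardened())  # ['I,100,Alice', 'U,100,Alicia']
```

Because each table gets its own file, many tables can be loaded in parallel, which is why this option fits very high volumes at medium latency.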
DataStage Option 4: Direct Connect
[Diagram: Source → Transaction Stage → DS/QS job → Database Connector Stage → Target, with the bookmark flowing back from target to source]
1. DataStage extracts data for the initial load using standard ETL functions, or InfoSphere CDC can be used for the refresh.
2. InfoSphere CDC continuously captures changes made to the source and flows them over TCP/IP to the Transaction Stage.
3. The Transaction Stage passes the data to downstream stages.
4. The Database Connector Stage updates the target with the changed data; the bookmark is persisted in the target along with the client data to maintain end-to-end transactional integrity.
5. The bookmark flows back to the source periodically, and at the start of replication.
Ideal for:
- Near real-time integration (seconds)
- Medium data volumes (hundreds to low thousands of rows per second)
- Fewer than 50 tables
Should not be used when targeting Netezza.
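The bookmark handshake in steps 4 and 5 can be sketched as follows: a minimal SQLite illustration of committing each change and its bookmark in the same transaction, so a crash can never leave data applied without the matching bookmark (or vice versa). The schema, subscription name, and change format are assumptions made for the example, not the stage's actual design.

```python
import sqlite3

# Target database holds both the replicated data and the bookmark.
tgt = sqlite3.connect(":memory:")
tgt.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE bookmark (sub TEXT PRIMARY KEY, pos INTEGER);
    INSERT INTO bookmark VALUES ('sub1', 0);
""")

def apply_change(conn, pos, cid, balance):
    """Apply one change and persist its bookmark atomically."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO account VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET balance=excluded.balance",
            (cid, balance))
        conn.execute("UPDATE bookmark SET pos=? WHERE sub='sub1'", (pos,))

def restart_position(conn):
    """Position reported back to the source at the start of replication."""
    return conn.execute("SELECT pos FROM bookmark WHERE sub='sub1'").fetchone()[0]

for pos, cid, bal in [(1, 7, 100), (2, 7, 250), (3, 8, 40)]:
    apply_change(tgt, pos, cid, bal)

print(restart_position(tgt))  # 3
```

On restart, the source simply resumes replication from the persisted position, which is how end-to-end transactional integrity is maintained without a separate tracking step.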
Questions?