EII - ETL - EAI What, Why, and How!



Similar documents
MDM and Data Warehousing Complement Each Other

Service Oriented Data Management

Chapter 5. Learning Objectives. DW Development and ETL

Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs

Integrating Netezza into your existing IT landscape

An Oracle White Paper March Best Practices for Real-Time Data Warehousing

Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise. Colin White Founder, BI Research TDWI Webcast October 2005

EAI vs. ETL: Drawing Boundaries for Data Integration

CitusDB Architecture for Real-Time Big Data

Chapter 3 - Data Replication and Materialized Integration

Building Your EDI Modernization Roadmap

Managing Data in Motion

Master Data Management and Data Warehousing. Zahra Mansoori

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

CHAPTER 1 INTRODUCTION

JOURNAL OF OBJECT TECHNOLOGY

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Understanding and Selecting Integration Approaches

Virtual Operational Data Store (VODS) A Syncordant White Paper

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Enterprise Application Integration - An Overview. Prepared By

Implementing Oracle BI Applications during an ERP Upgrade

Gradient An EII Solution From Infosys

Data Virtualization and ETL. Denodo Technologies Architecture Brief

Data Integration for the Real Time Enterprise

Big data management with IBM General Parallel File System

Is ETL Becoming Obsolete?

Pervasive Software + NetSuite = Seamless Cloud Business Processes

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

High-Volume Data Warehousing in Centerprise. Product Datasheet

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise

A Service-oriented Architecture for Business Intelligence

A roadmap to enterprise data integration.

Oracle Warehouse Builder 10g

Exploring the Synergistic Relationships Between BPC, BW and HANA

Attunity Integration Suite

Data virtualization: Delivering on-demand access to information throughout the enterprise

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

A Tipping Point for Automation in the Data Warehouse.

Data Virtualization Usage Patterns for Business Intelligence/ Data Warehouse Architectures

Oracle Architecture, Concepts & Facilities

BUSINESSOBJECTS DATA INTEGRATOR

OpenText Output Transformation Server

Patrick Firouzian, ebay

Enterprise Application Integration

Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center

SOA REFERENCE ARCHITECTURE: SERVICE TIER

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

Implementing efficient system i data integration within your SOA. The Right Time for Real-Time

Enterprise Data Integration for Microsoft Dynamics CRM

Data Integration Checklist

<Insert Picture Here> Operational Reporting for Oracle Applications with Oracle GoldenGate

What does SAS Data Management do? Why is SAS Data Management important? For whom is SAS Data Management designed? Key Benefits

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

Innovate and Grow: SAP and Teradata

IT Workload Automation: Control Big Data Management Costs with Cisco Tidal Enterprise Scheduler

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

BUSINESSOBJECTS DATA INTEGRATOR

Oracle Data Integrator Technical Overview. An Oracle White Paper Updated December 2006

Elastic Application Platform for Market Data Real-Time Analytics. for E-Commerce

TIBCO ActiveSpaces Use Cases How in-memory computing supercharges your infrastructure

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

EAI OVERVIEW OF ENTERPRISE APPLICATION INTEGRATION CONCEPTS AND ARCHITECTURES. Enterprise Application Integration. Peter R. Egli INDIGOO.

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Real-time Data Replication

PROCESS AUTOMATION FOR DISTRIBUTION OPERATIONS MANAGEMENT. Stipe Fustar. KEMA Consulting, USA

Business Intelligence and Service Oriented Architectures. An Oracle White Paper May 2007

SQL Server Master Data Services A Point of View

Traditional BI vs. Business Data Lake A comparison

The HP Neoview data warehousing platform for business intelligence

Data Integration Overview

Trafodion Operational SQL-on-Hadoop

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

purexml Critical to Capitalizing on ACORD s Potential

Lection 3-4 WAREHOUSING

ENTERPRISE EDITION ORACLE DATA SHEET KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR

ENABLING OPERATIONAL BI

Business Intelligence In SAP Environments

Jitterbit Technical Overview : Salesforce

ETL Overview. Extract, Transform, Load (ETL) Refreshment Workflow. The ETL Process. General ETL issues. MS Integration Services

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

Overview Western Mariusz Gieparda

Unified Data Integration Across Big Data Platforms

White Paper. Unified Data Integration Across Big Data Platforms

Persistent, Reliable JMS Messaging Integrated Into Voyager s Distributed Application Platform

Introduction to Oracle Business Intelligence Standard Edition One. Mike Donohue Senior Manager, Product Management Oracle Business Intelligence

Simplified Management With Hitachi Command Suite. By Hitachi Data Systems

Data Warehouse Overview. Srini Rengarajan

Enterprise Service Bus Defined. Wikipedia says (07/19/06)

Service Oriented Architecture and the DBA Kathy Komer Aetna Inc. New England DB2 Users Group. Tuesday June 12 1:00-2:15

A Near Real-Time Personalization for ecommerce Platform Amit Rustagi

IBM Software Enabling business agility through real-time process visibility

Transcription:

IBM Software Group EII - ETL - EAI What, Why, and How! Tom Wu 巫 介 唐, wuct@tw.ibm.com Information Integrator Advocate Software Group IBM Taiwan 2005 IBM Corporation

Agenda Data Integration Challenges and IBM Vision Definitions and Patterns Data Integration Approaches ETL vs. EII vs. EAI 2

Today s World: Complex and Costly Heterogeneous, distributed data Inconsistent islands of information underlie applications Complications from M&A and departmental purchases Complex & costly copy synchronization Inconsistent and poor quality data No feedback on quality of service Impossible to support business transformation CRM Order Proc Supply Chain Procurement 3

IBM Information Integration Vision Delivering accurate, consistent, timely, and coherent business information Marketing e-commerce Fulfillment Business Applications Integrated Information Get the right information, in the form you want, whenever you need it. Isolate applications from information complexity. 4

Integration Challenges Lack of confidence in the correctness of information Lack of interface design principles and common formats Definition of correct format and semantic layer to be able to merge two or more disparate repositories/applications data Definition of an Integration Governance Model Creation of a methodology to document technical items such as record definitions, structures, interfaces, and flows across the whole organization Defining rules and methods to maintain consistency across all data integration projects 5

Agenda Data Integration Challenges and IBM Vision Definitions and Patterns Data Integration Approaches ETL vs. EII vs. EAI 6

EII - Enterprise Information Integration Optimized & transparent data access and transformation layer providing a single relational interface across all enterprise data It enables the integration of structured and unstructured data to provide real-time read and write access, to transform data for business analysis and data interchange, and to manage data placement for performance, currency, and availability. 7

EAI - Enterprise Application Integration Message-based, transaction-oriented, point-to-point (or pointto-hub) brokering and transformation for application-toapplication integration The core benefits offered by Enterprise Application Integration are: A focus on integrating both business-level processes and data A focus on reuse and distribution of business processes and data A focus on simplifying application integration by reducing the amount of detailed, application specific knowledge required by the users 8

EAI Types There are four types of Enterprise Application Integration: User Interface Level Method Level Application Interface Level Data Level Many organizations choose Data Level EAI as a starting point Middleware and EAI leverage message brokers, application servers, distributed objects, and distributed agents. Transactional Middleware is becoming a more popular solution that relies on method sharing to provide a reliable messaging framework. 9

ETL - Extract - Transform - Load ETL tool Set-oriented point-in-time transformation for migration, consolidation, and data warehousing. Designed to process very large amounts of data, ETL provides a suitable platform for: Improved productivity by reuse of objects and transformations Strict methodology Better Metadata support, including impact analysis 10

Role of ETL tools Scheduling Parallel and concurrent workloads Message based designs Impact analysis Metadata gathering and management 11

ETL Architecture Based on metadata repositories and ETL engines Workflow Manager Workflow Monitor Repository Manager Designer Parallel ETL engines offer performance and scalability Schedule or Event Driven Workflow engine Integration with business processes ETL Server Sources Object Repository Targets 12

Information Integration Data Patterns EII EAI ETL SQL (or Content) Application Target / Data Warehouse Structured Data Source Data Virtualization Legacy Data Source unstructured Application Interpret Transform Route Application Data Source load transform extract Data Source Real-time information access Federation of data from multiple sources Dynamic drill down Semi-structured & unstructured data Process based integration of application data Message-based, transactionoriented processing Workflow and data orchestration, content-based routing Bulk data integration Set-based & hierarchical transformations High scale, batch-oriented data delivery 13

Unique Characteristics query oriented dynamic, real-time access transparency across different sources metadata support less a tool than a SQL access layer and methodology EII ETL batch oriented latency tolerant complex transformations massive volumes code-less specification deep metadata support bi-tool integration EAI transaction-oriented message-based transactional volumes hierarchical structures developer audiences C, Java, etc. EDI, SWIFT, HIPPA lower emphasis on metadata integration ensured delivery 24 by 7 security encryption restart/recovery 14

Information Integration Data Patterns EAI Replication ETL Application Database Target / Data Warehouse Application Interpret Transform Route Database capture, apply load transform extract Application Database Data Source Data Source Process based integration of application data Message-based, transactionoriented processing Workflow and data orchestration, content-based routing Distribution, consolidation, or synchronization between databases Change capture provides event basis Apply controls batch or transactional and adds transformation Bulk data integration Set-based & hierarchical transformations High scale, batch-oriented data delivery 15

Replication, EAI or ETL? Replication or EAI Both can be used to accomplish the same task: Capturing an event Transferring it to another system Applying it to the target application Reasons to use replication Replication handles the end-to-end delivery process through declarative specification, including recovery processes. The change capture (if log-based) occurs asynchronously from the originating application, reducing the performance impact on that application The application is similarly shielded from any loss of availability of the message queue or transfer service New or existing applications can be built without special coding for application messaging Reasons to use EAI: The event being replicated is not written to a database The event being replicated is not within the context of a single database The target application requires invocation of the application interface to preserve integrity Replication and/or ETL Both can be used to accomplish the same task: Copying data from one or more databases to one or more databases Reasons to use replication Want to capture only the changes (and the data does not contain sufficient information to for the extract process to identify changes) Want lower latency between source and target systems Reasons to use ETL Supports richer transformations Easier to manage and maintain Change volume is high, so full extract is easier and cheaper Reasons to use replication with ETL Lower latency copies with richer transformations (typically limited to single row) 16

Information Integration Data Patterns EAI Data Event Publishing Replication Application Database Database Application Interpret Transform Route Capture Publish Database capture, apply EAI Repl ETL RYO Database Application Process based integration of application data Message-based, transactionoriented processing Workflow and data orchestration, content-based routing Message-based publishing based on event capture from single database Add-on to EAI, ETL, or Replication Distribution, consolidation, or synchronization between databases Change capture provides event basis, Apply controls batch or transactional and adds transformation 17

Event Publishing, EAI or ETL? Event Publishing or EAI Both accomplish the same task: Capturing an event Putting a message about the event onto a queue Reasons to use event publishing The messages are created asynchronously from the originating application, reducing the performance impact on that application The application is similarly shielded from any loss of availability of the message queue or service New or existing applications can be built without special coding for application messaging Reasons to use EAI: Event is not written to a database Database does not contain sufficient information for full message context Message routing is based on message content Reasons to use event publishing with EAI Simplest change capture for initiating a process or passing a message Event Publishing and ETL Reasons to use event publishing with ETL Deliver changes to transformation engine and data flow Event Publishing and Replication Reasons to use event publishing with Replication Deliver changes to replication engine 18

Business Intelligence Data Patterns ETL EII Data Warehouse Application load transform Enterprise Data Warehouse extract unstructured Data Source Data Source Capture Publish /Hub Structured Data Source Legacy Data Source Bulk data integration Set-based & hierarchical transformations High scale, batch-oriented data delivery Augmentation of Existing DW Real-time joins with data from multiple sources Dynamic drill down 19

Three types of data services Real-time Near Real-time Historical & analytical SQL SQL SQL Federation Replication Data Warehouse ETL 20

Distributed Access vs. Consolidated Access Distributed access Primary requirements: Very current data Dynamic joining of data Structured and unstructured data Mixed relational and non-relational data Not practical to copy data Small amounts of data in result set Consolidated access Primary requirements: Local performance Structured data Extract, Transform and Load Extensive transformations User metadata Large volumes of data Replication Small amount of changed data Up to near real time updates 21

Agenda Data Integration Challenges and IBM Vision Definitions and Patterns Data Integration Approaches ETL vs. EII vs. EAI 22

ETL vs. EII vs. EAI Strengths and Challenges ETL tool ETL Major Strengths Optimized for data structures Periodic, batch-oriented (not intended for real-time) Can move large volumes of data in one step Enables complex data transformations requiring calculations, aggregations or multiple stages Scheduling controlled by the administrator Several GUI based tools available to increase productivity High level of reuse of objects and transformations 23

ETL vs. EII vs. EAI Strengths and Challenges ETL tool ETL Main Challenges Time to market Change management Data moved regardless of real need Consumes storage systems Data out-of-synch with the original source when it arrives in the DW Large requirements for staging areas Unidirectional Lack of multi-site update support (2 phase commit) 24

ETL vs. EII vs. EAI Strengths and Challenges EII Major Strengths Relational access to non-relational sources Ability to explore data before a formal data model and metadata are created Quicker deployment Can be reused by ETL and/or EAI further developments Access in place data, meaning it avoids unnecessary movement of data. Optimized for global access to remote sources Event publishing technology provides a non-intrusive means to listen for particular changes (insert, update or deletes) that defined as being of interest. 25

ETL vs. EII vs. EAI Strengths and Challenges EII Main Challenges Need Matching keys across sources Data types mismatch Data reconciliation Possibly high resource utilization on the source system Limited to hundreds of thousands of rows for remote result sets Performance degradation when query pushdown is not used Limited transformation bounded by SQL capability and system capacity May consume network bandwidth during peak hours Multi-site updates require transactional control (2PC) 26

ETL vs. EII vs. EAI Strengths and Challenges EAI Major Strengths Optimized for API-based applications Real-time (or near) Move/send individual events or transactions Some capability for simple and basic transformation and rules Workflow controlled Broker capabilities (subscriptions) 27

ETL vs. EII vs. EAI Strengths and Challenges EAI Main Challenges Limited transformation capability Limited support for data aggregation Limited to tens of records per transaction Complexity for development Longer Time to market Limited reuse for transformations Limited support for metadata (use, import and export) Semantic integrity May consume network bandwidth during peak hours 28

ETL vs. EII vs. EAI A Technical Comparison Data Flow ETL EII EAI Unidirectional from source to target Bidirectional Bidirectional Data Movement Scheduled batch Process managed Query time Query (SQL) managed Transaction triggered asynchronous Transaction managed Latency Daily - Monthly Real-time Near real-time Transformation, cleansing/enrichm ent Metadata process reuse Best Generally high reusability of objects and processes Medium Transformations embedded in views and database objects. Low Transformations are done with ESQL. Metadata import limited with DB catalog information 29

ETL vs. EII vs. EAI A Technical Comparison Transport ETL EII EAI FTP, direct database connection Direct database connection Messaging Data Volume Processing Very large (millions, billions of records) Medium access to 100 s of thousands or few millions of remote records Small (few records) can handle several parallel pipes of few records Complexity of Transformation Any complexity Transformations that can be expressed with SQL Simple syntax transformations. Limited semantic transformations can be implemented via a broker. 30

ETL vs. EII vs. EAI A Technical Comparison Support for Event Monitoring ETL EII EAI Very limited with high latency Limited to data events and depended on trigger capability of data sources Best logic can be added to support true event propagation and not only data transaction movement. Versioning Full support Limited support custom build Limited support custom build Workflow Control Scheduling, dependencies and error or exception handling None Extensive rules based 31

ETL Best Practices ETL is usually heavily I/O bounded. Avoid unnecessary staging steps Use faster storage Avoid I/O contention Careful with lookup processing Do not rely on ESS disk subsystems for file placement Use a tool and methodology for ETL, both for productivity and data consistency. Avoid populating data marts from data marts Avoid excessive locking Key to running many concurrent processes in parallel Allows backup at same time as query and load ETL tool 32

EII Best Practices In general: Don t allow unregulated ad-hoc access Plan ahead to manage costly queries by caching frequently-used data at the DB2 II instance for best performance Control the kinds and cost of queries that are submitted to DB2 II using Application controls: parameterized reports DB2 Query Patroller Expect operations that move a lot of data between remote sources and DB2 II to take a long time Don t try to use DB2 II to implement a virtual warehouse on a permanent basis, especially if ad-hoc access is desired 33

EII Best Practices Be mindful of the impact of federated queries on remote sources Aim for targeted access to remote data Data flows from remote sources to the federated server Be careful with joins of large tables residing in two or more remote systems Create and maintain a federated semantic layer 34

EAI Best Practices Avoid point-to-point integration Use hub and spoke brokers for better reuse Design with deployment in mind Build a deployment plan upfront Establish SLAs and monitoring procedures Understand the impact in all involved systems Understand the data flow scenarios and contingencies Profile and monitor Performance Be prepare to track and trace data consistency and performance bottlenecks in the workflow 35

EII vs EAI vs ETL When to use EII Usually connecting a large repository with selected data from other sources Selectively as a tool to extend existing well-designed EDW May be indicated when source data: Volatility is high Selectivity is granular Connectivity is reliable Service levels are compatible Transformations are minimal and can be expressed as SQL When to use EAI Integration of transactions and not large data sets Questions can be answered by joining small amounts of data Data sources repositories cannot be directly accessed When to use ETL Data consolidation Complex transformations Combination is normally used 36