Data Services for Enterprise SOA MPower Information Co., LTD. System Service Division Edward Chen
Agenda Data Services for SOA Data Delivery Styles Data Integration as a Service Simple Data Services for SOA Platform 5 Patterns for Successful Data Services Caveats and Concluding Thoughts Understanding Performance Choices Data Service Pattern Tradeoffs Key Recommendation Go Hybrid
Data Services for SOA: Analyst Coverage Highlights Importance and Confusion Analysts Writing About Data Services or IaaS or Information Fabric: Gartner, Forrester, IDC, AMR, Burton Group, Bloor Research, ZapThink, etc. One Thing They All Seem to Agree On: They're Crucial to SOA Success! One Thing They Can't: What are they?
Where Does Data Integration Fit? Essential Ingredient for Information Agility Data Services Information services SOA Business Intelligence Data Integration for BI applications Event driven BI Process Integration Event driven data integration Data Integration Heterogeneous Data Access Information-based analytics Governance and Impact Analysis Data Warehousing Report to Source Data Lineage Extract, Transform, Load Data Migration, Bulk Data Data Quality, Profiling
So What Are Data Services? Casting a Wide Net A General Software Pattern? Using contract-based development patterns for data distribution A Java Software Pattern? Pick your favorite TLA: DTO, DAS, SDO, JPA, ORM, JAXB, OXM A Web Services Stereotype? Dedicated SOAP services for reusable data manipulation and access An Enterprise Class of Conventional IT Services? The same tools we used to work with data before Java & SOA ALL OF THE ABOVE!
Data Delivery Styles: A Key Enabler for Data Services is Flexible Service Styles
Application Data Delivery Patterns: Object-Oriented, Bulk-Oriented, Process-Oriented, Event-Oriented
[Diagram labels: Process Bus, Object Grid, Batch, Master Data, Data Quality]
Object-Oriented
  Use Case: Hard Realtime Composite Applications, e.g. Trade Data Cache
  Fast Data Cache; Object-based API
Bulk-Oriented
  Use Case: Realtime and Batch BI/BAM Applications, e.g. BI/DW Refresh
  Fast Bulk Transforms; Set-Based Mgmt.
Process-Oriented
  Use Case: Long-lived or Workflow-oriented Transactional Systems, e.g. Supply Chain Data
  Multi-step XA Flow; Approvals Workflow
Event-Oriented
  Use Case: Event-Action Business Rule-driven Transactional Systems, e.g. Sales/Cust Data Hub
  Publish and Subscribe; Changed Data Capture
Data Integration as a Service: SOA using ODI Data Capabilities
For every enterprise-sized SOA solution
  SOA can't ignore/avoid/replace the enterprise data architecture
  To optimize transformation of large data payloads
  To orchestrate DB-to-DB data movement, including for ODS, DW, BI Cubes, MDM and any DB Hubs
  To optimize transformation of large B2B/EDI files
  To load, or work with, Data Warehouse appliances
Event driven / Active Business Intelligence
  To watch events (using CDC) inside databases
  Trap DB events and propagate to SOA tier
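The last two bullets (watching database events with CDC and propagating them to the SOA tier) can be sketched in a few lines. This is a minimal illustration only: it assumes snapshot comparison as the capture technique, which is just one way to implement CDC and is not claimed to be how ODI does it. The names `capture_changes` and `propagate` and the row layout are hypothetical.

```python
def capture_changes(before, after):
    """Compare two table snapshots keyed by primary key and return change events."""
    events = []
    for key, row in after.items():
        if key not in before:
            events.append(("INSERT", key, row))
        elif before[key] != row:
            events.append(("UPDATE", key, row))
    for key, row in before.items():
        if key not in after:
            events.append(("DELETE", key, row))
    return events


def propagate(events, subscribers):
    """Deliver every captured event to every subscriber (the 'SOA tier' stand-in)."""
    for event in events:
        for handler in subscribers:
            handler(event)


# Hypothetical order table before and after a unit of work.
before = {1: {"status": "NEW"}, 2: {"status": "NEW"}}
after = {1: {"status": "SHIPPED"}, 3: {"status": "NEW"}}

received = []  # a list's append method acts as a trivial subscriber
propagate(capture_changes(before, after), [received.append])
```

In a real deployment the subscriber would be a JMS or SOAP endpoint rather than a list, and the capture would usually read a database log or trigger table instead of diffing snapshots.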
BPEL Process Manager and ODI: Add Bulk Data Transformation to Business Processes
[Diagram labels: Oracle SOA Suite, BPEL Process Manager, Oracle Data Integrator, ODI Knowledge Modules, Transform. Services, Bulk Data Processing, Changed Data Capture, ODI Connectivity Framework, Business Activity Monitoring, Web Services Manager, Business Rules Engine, Enterprise Service Bus, Data Services]
Oracle SOA Suite - BPEL
  BPEL Process Manager for Business Process Orchestration & Workflow
  Standards-based Composite Apps and Integration Processes
  Oracle's Strategic BPM Technology
Oracle Data Integrator
  Efficient bulk data processing as part of Business Process
  Interact via Data Services and Transformation Services
  Unified modeling, monitoring, metadata, error mgmt, auditing
Example use cases
  Trigger DW update off-schedule based on Business Rules
  Drive workflow notifications to handle errors from Data Integration or Quality
Simple Data Services for SOA Platform
Transformation Services for Large Documents
  XML Document Transformation
  EDI (B2B) Document Transformation
  File-to-DB Loading Services
Batch Data Services & DB Orchestration
  DB Orchestration, for DW Refresh
  Heterogeneous DB Replication
Data Access Services & Data Virtualization
  DB Web Service Wrappers, with CDC
  Canonical Data Hub Lifecycle Management
Data Services for Large XML/EDI Docs (Data Transformation Services)
1. Some typical event [SOAP or JMS] arrives at BPEL PM or ESB
2. Execution starts (BPEL or Mediator etc.); some demand for transforming a large document payload (>10MB) occurs
3. Pass XML payload, by reference, to ODI
4. ODI requests payload
5. ODI loads payload
6. ODI transforms payload
7. ODI sends payload wherever instructed
8. Core BPEL/ESB processing completes
[Diagram labels: Trigger (SOAP or JMS), Oracle BPEL PM or ESB, ODI Invoke, Reference, Data Integrator, Transform, XML documents <yxz> and <zyx>, Order DB, Product, Suppliers]
Example: Major domestic retailer
  BPEL, ESB, ODI: processing of large book catalog files and retail supplier data
  Conversion of 30-70MB flat files, which expand to 200-400MB XML docs
  Contacts/background: Jeff.Pollock at Oracle.com
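Step 3, passing the payload by reference rather than by value, is the part of this flow that keeps the large document out of the orchestration engine's memory. The sketch below illustrates the idea with plain files. It is an assumption-laden stand-in, not ODI's or BPEL's actual API: the function names, the pipe-delimited input, and the `<item>` output layout are all hypothetical.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def stage_payload(data: bytes) -> str:
    """Write the payload to shared storage and return only a reference (its path)."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def transform_by_reference(src_path: str, dst_path: str) -> int:
    """Stream the referenced flat file record by record into XML; return the record count."""
    count = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            sku, title = line.rstrip("\n").split("|", 1)
            dst.write(f"<item><sku>{escape(sku)}</sku><title>{escape(title)}</title></item>\n")
            count += 1
    return count


# The orchestration tier only ever handles the short reference string.
reference = stage_payload(b"100|Dune\n200|Emma & Co\n")
output_path = reference + ".xml"
records = transform_by_reference(reference, output_path)
```

The message that travels through the process engine stays a few bytes long however large the staged file grows, which is the point of steps 3 to 5.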
ODI for Loading XML/Flat File to DBMS
1. File arrives, detected by BPEL File Adapter
2. Execution starts (BPEL/ESB); some demand for XML to DB load occurs
3. Pass XML payload, by reference, to ODI
4. ODI requests payload
5. ODI inserts payload to DB
6. ODI transforms payload
7. ODI notifies BPEL/ESB that job is complete
8. Core BPEL/ESB processing completes
[Diagram labels: Event, Oracle BPEL PM or ESB, Shared Container/JVM, ODI Invoke, Reference, Data Integrator, File System, XML document <yxz>, Shared Metadata Repository, Transform, Any DB, Product, Suppliers, Records A, Records B]
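Steps 5 and 6 put the load before the transform: the raw payload is inserted into the database first and is then transformed there. A minimal sketch of that ordering follows, using an in-memory SQLite database as a stand-in for "Any DB". The table names and sample rows are hypothetical, and this is not ODI-generated code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (supplier TEXT, amount TEXT)")
conn.execute("CREATE TABLE supplier_totals (supplier TEXT PRIMARY KEY, total REAL)")

# Step 5: insert the raw payload as-is, with no per-row processing in the loader.
raw_rows = [("acme", "10.5"), ("acme", "4.5"), ("globex", "7")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", raw_rows)

# Step 6: transform inside the database with one set-based statement.
conn.execute(
    """
    INSERT INTO supplier_totals (supplier, total)
    SELECT UPPER(supplier), SUM(CAST(amount AS REAL))
    FROM staging
    GROUP BY UPPER(supplier)
    """
)

totals = dict(conn.execute("SELECT supplier, total FROM supplier_totals"))
```

The transformation work (casting, normalizing, aggregating) is expressed as SQL and executed by the database engine, so the loading process never iterates over individual records.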
Data Service for BI/DW Control (Batch Data Service)
1. BAM, CEP, Workflow, BPEL or Mediator triggers a rule requiring a DW refresh for BI
2. Command issued to ODI
3. ODI orchestrates data tier loading and insertion into staging area (Target DB)
4. ODI transforms aggregate data and loads Target tables
5. Oracle Business Intelligence or Oracle Hyperion Applications can report on SOA Suite triggered data
[Diagram labels: Trigger (SOAP or JMS or BAM), BPEL/Mediator, Human Workflow, BAM/CEP, Oracle BPEL PM or ESB, Command, ODI Invoke, Data Integrator, Transform, Product, Suppliers, B2B Data, XML document <yxz>, Records A, Records B, Mart A, Mart B, Mart n, Warehouse, BI EE or Hyperion]
Example: Major mail order retail co.
  BPEL, ODI, BI-EE: processing for Core Data Replication Systems
  Unification of Oracle EBS + Legacy to feed OBI-EE
  Contacts/background: Jeff.Pollock at Oracle.com
ODI for SOA-based ERP Integration
1. A business process for Migration (initial bulk data load) or Replication (ongoing synch) is invoked
2. BPEL/ESB sends instruction to ODI
3. ODI performs E-LT
   a) ODI creates Unique ID for new ERP data objects
   b) ODI updates Unique ID for existing objects
4. ODI confirms job
5. BPEL/ESB begins processing ERP business transactions
   c) BPEL/ESB leverage same Unique IDs for canonical XML
6. All SOA and ETL jobs keep business data aligned with Unique IDs
Messaging and bulk data may leverage the same unique object IDs, thereby ensuring uniform ERP data objects (e.g. PK123 = Pkxyz)
[Diagram labels: ERP App 1 (UI, App, Data), ERP App 2 (UI, App, Data), DDL, XML, Oracle BPEL PM or ESB, Shared Container/JVM, Instruction, ODI Invoke, Confirmation, Shared Canonical ID XREF Lookups, Data Integrator, Transform]
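The shared canonical ID cross-reference (XREF) that both the bulk jobs and the messaging tier consult can be pictured as a small lookup table. The sketch below is an assumption about how such a table might behave, not the product's actual XREF implementation. The `XrefTable` class, its method names, and the system names `ERP1` and `ERP2` are hypothetical; the keys `PK123` and `Pkxyz` come from the slide.

```python
import uuid


class XrefTable:
    """Maps (system, local key) pairs to one canonical ID shared by all systems."""

    def __init__(self):
        self._by_local = {}  # (system, local_key) -> canonical_id

    def register(self, system, local_key, canonical_id=None):
        """Create a canonical ID for a new object, or attach a local key to an existing one."""
        key = (system, local_key)
        if key in self._by_local:
            return self._by_local[key]
        if canonical_id is None:
            canonical_id = str(uuid.uuid4())
        self._by_local[key] = canonical_id
        return canonical_id

    def lookup(self, system, local_key):
        """Return the canonical ID for a local key, or None if it is unknown."""
        return self._by_local.get((system, local_key))

    def translate(self, from_system, local_key, to_system):
        """Find the key that to_system uses for the same business object."""
        canonical = self.lookup(from_system, local_key)
        if canonical is None:
            return None
        for (system, key), cid in self._by_local.items():
            if system == to_system and cid == canonical:
                return key
        return None


xref = XrefTable()
canonical = xref.register("ERP1", "PK123")   # bulk load creates the ID (step 3a)
xref.register("ERP2", "Pkxyz", canonical)    # same object as known to the second ERP
translated = xref.translate("ERP1", "PK123", "ERP2")
```

Because the bulk job and the message flow resolve keys through the same table, a record loaded in bulk and a later transactional message about that record end up with the same canonical identity.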
5 Patterns for Successful Data Services The key is to utilize the right technology and approach for the job SOA business benefits are transforming the industry Data Services are not a product, but an architecture pattern Data services are emerging for improved re-use and flexibility Complexity of data implementations calls for new approaches and special handling apart from traditional WS*
Data Services for Bulk Data: Decouple, Recombine, Orchestrate
  Decouple business apps from data as services on a bus
  Recombine existing services and app components instead of new development
  Orchestrate services to create custom views
  Enables efficiencies in transformation, data loading, and data synchronization, all at improved performance
[Diagram labels: Bulk Data Services, Data Integrator (ETL & Bulk Data), Apps, BPEL, SOA, Java, BIEE, WSDL, ODIEE, JDBC, Bulk Data, Any]
Data Access Services: Real-time data access
  Provide a unified offering for accessing data services
  Provides improved performance, scalability
  Standards support: WS*, SOA, Java, SDO
  Can be deployed on Application Grid platforms for near real-time support
[Diagram labels: Data Access Services, SOA (solo) DB Adapter, Java w/ TopLink, Events and Grid Coherence, BPEL, SOA, Java, WSDL, DB Adapter, JDBC, SDO, Custom Java, Any]
Data Federation: BEA adds Data Services using Data Federation to the mix
  Data Service Integrator: formerly known as AquaLogic Data Services Platform
  Manage federated queries and distributed views
  Optimized performance, highly scalable
  Real-time access to operational data
  Open, standards based
[Diagram labels: Data Federation (Transaction Queries), Apps, BPEL, SOA, Java, WSDL, SQL/XQuery, Data Service Integrator, Data API (Object), JDBC, Any]
Data Quality Services: For clean, accurate, up-to-date information
  Data Quality Rules invoked as a service in a BPEL process
  Invoke quality steps: parsing, cleansing, standardization, and matching
  Data Quality is part of the loading process
  Manage exceptions in data as a workflow "error hospital"
[Diagram labels: Data Quality Services, Data Integrator (ETL & Bulk Data), Apps, BPEL, SOA, Java, BIEE, WSDL, ODIEE, JDBC, Quality Steps, Any]
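The combination of quality steps inside the load and an error hospital for the exceptions can be sketched as follows. This is a minimal illustration under stated assumptions: the cleansing rules, the record fields, and the function names are hypothetical and do not represent any product's data quality rules.

```python
def standardize(record):
    """Cleanse and standardize one record; raise ValueError if it cannot be repaired."""
    name = " ".join(record.get("name", "").split()).title()
    email = record.get("email", "").strip().lower()
    if not name:
        raise ValueError("missing name")
    if "@" not in email:
        raise ValueError("invalid email")
    return {"name": name, "email": email}


def run_quality_step(records):
    """Apply the quality step during loading; route failures to the error hospital."""
    clean, hospital = [], []
    for record in records:
        try:
            clean.append(standardize(record))
        except ValueError as exc:
            # Keep the original record and the reason so a human workflow can fix it.
            hospital.append({"record": record, "reason": str(exc)})
    return clean, hospital


incoming = [
    {"name": "  jane   DOE ", "email": " Jane@Example.COM "},
    {"name": "Bob", "email": "bob-at-example"},
]
clean, hospital = run_quality_step(incoming)
```

Good records continue into the load while rejected ones wait in the hospital list, which in a workflow setting would become tasks assigned to a data steward.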
Master Data Services: Foundation for Data Relationship Management
  Where to find or assemble the best (trusted) high quality business data records, hierarchies, and policies?
  Connects MDM data to multiple systems, data warehouses, and BI applications
  Pre-built Knowledge Modules for rapid deployment
  Better access to authoritative information through MDM
[Diagram labels: Master Data Services, Master Data Management & Data Integration, Apps, BPEL, SOA, Java, BIEE, WSDL, ODIEE KMs, JDBC, MDM, DRM, CRM, ERP, App, Load, DW, Any]
Product Mapping for Data Services The key is to utilize the right technology and approach for the job Oracle Data Integration Suite
Understanding Performance Choices: When you need to transform data at large size

Less than 10MB (rows: source, columns: target)
            XML      File     DB
  XML       ESB      ESB      ESB
  File      ESB      ESB      depends
  DB        ESB      depends  ODI
  "depends": on whether an intermediary XML format is useful for other processing (use ESB), or if joining File data to tabular RDB data is required (use ODI)

Between 10-50MB (rows: source, columns: target)
            XML      File     DB
  XML       depends  depends  ODI
  File      depends  ODI      ODI
  DB        ODI      ODI      ODI
  "depends": on how much cross-referencing among the data values and rows is required during transformation; the more there is, the faster ODI will perform relative to ESB

Greater than 50MB (rows: source, columns: target)
            XML      File     DB
  XML       depends  ODI      ODI
  File      ODI      ODI      ODI
  DB        ODI      ODI      ODI
  "depends": if the source and target are both XML, and there is no cross-referencing of data among rows, then a streaming-type or parallel-engine-type approach might scale
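The three size bands above amount to a lookup keyed by payload size, source format, and target format. The sketch below encodes the slide's cells directly. The function and constant names are hypothetical, and treating exactly 10MB and exactly 50MB as part of the middle band is an assumption read from the band labels.

```python
ESB, ODI, DEPENDS = "ESB", "ODI", "depends"
FORMATS = ("XML", "File", "DB")

# One 3x3 matrix per size band; rows are the source format, columns the target.
MATRICES = {
    "small": [[ESB, ESB, ESB],
              [ESB, ESB, DEPENDS],
              [ESB, DEPENDS, ODI]],
    "medium": [[DEPENDS, DEPENDS, ODI],
               [DEPENDS, ODI, ODI],
               [ODI, ODI, ODI]],
    "large": [[DEPENDS, ODI, ODI],
              [ODI, ODI, ODI],
              [ODI, ODI, ODI]],
}


def size_band(size_mb):
    """Map a payload size in MB to the slide's band: <10, 10-50, or >50."""
    if size_mb < 10:
        return "small"
    if size_mb <= 50:
        return "medium"
    return "large"


def recommend(size_mb, source, target):
    """Return 'ESB', 'ODI', or 'depends' for the given size and formats."""
    matrix = MATRICES[size_band(size_mb)]
    return matrix[FORMATS.index(source)][FORMATS.index(target)]


small_xml_to_db = recommend(5, "XML", "DB")
large_xml_to_xml = recommend(200, "XML", "XML")
```

Each matrix is symmetric, so swapping source and target never changes the recommendation. A "depends" result still needs the human judgment described in the slide notes.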
Data Service Pattern Tradeoffs: Each Pattern Class Carries Assumptions and Limitations
Pattern classes: J2EE/Java, SOA/XML-Centric, Conventional DI

J2EE Pattern
  Assumption
    Business Object tier instantiated in Java
    The master reference point is marshaled to Java at some process point
  Limitation #1: Scalability
    RAM memory model via JVM
    Query performance across JDO/SDOs
    Non-declarative access methodology (API vs. SQL)
    Transformation overhead (SQL to Java to XML vs. SQL to XML)
  Limitation #2: Redundancy
    Replicating data instances across compute tiers is inelegant
    Synch consistency to persistence becomes a challenge at scale
  Limitation #3: Granularity
    Specific patterns solve specific issues (e.g. Object to RDB)
    These (DTO, DAS, SDO, JPA, ORM, JAXB, OXM, etc.) are programming patterns, not business or system service-level patterns

SOA Service Pattern
  Assumption
    Business Object tier typically defined with Canonical XML
    The master reference point is on the Bus or is XML document-oriented
  Limitation #1: Bloat
    Data + XML Markup = 3 to 6 times original data size
    This can take a 30MB product catalog to >100MB of XML
  Limitation #2: Performance
    XML processing is inherently disadvantaged (DOM/Infoset model)
    Data sets are typically navigated with XPath and/or in-memory
    Streaming XML approaches have limited applicability in complex data sets
  Limitation #3: Expressivity
    XML does not guarantee precise modeling semantics (relative to RDB)
    This limits automated capabilities
    It also limits overall transferability of common exchange models (models must be accompanied by an explanatory human-readable text specification)

Conventional Pattern
  Assumption
    Business Object tier may be defined within any technology (typically RDB)
    The master reference point is typically (data+metadata) a proprietary API
  Limitation #1: Ownership Cost
    Different core technology components (ETL, MDM, IdM, etc.) may have radically different underlying technology, architecture, and integration styles
    Impact: Training Overhead
    Impact: Data Center Overhead
    Impact: Duplicate Hardware
    Impact: Integration Costs
  Limitation #2: Flexibility and Reuse
    Once conventional (non-SOA) integrations are tightly coupled, data bindings often result in the classical n² mapping nightmare
  Limitation #3: Lock-in Tendencies
    Monolithic conventional architectures create incentives to buy more from existing vendors, which leads to a repetitive cycle and vendor lock-in (aka: the SAP approach)
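Two of the limitations above are simple arithmetic and can be checked with a back-of-envelope sketch. The comparison against a canonical (hub) model is an added assumption for contrast, since the slide only names the n² problem; the function names are hypothetical.

```python
def point_to_point_mappings(n):
    """Directed mappings needed when each of n systems binds directly to every other."""
    return n * (n - 1)


def canonical_mappings(n):
    """Mappings needed when each system maps only to and from one canonical model."""
    return 2 * n


def xml_size_range_mb(raw_mb, low=3, high=6):
    """Apply the slide's 3x to 6x markup factor to a raw data size in MB."""
    return raw_mb * low, raw_mb * high


direct = point_to_point_mappings(10)   # grows roughly as n squared
via_hub = canonical_mappings(10)       # grows linearly
catalog_xml = xml_size_range_mb(30)    # the slide's 30MB product catalog
```

With ten systems, direct bindings need 90 mappings against 20 through a canonical model, and the 30MB catalog lands between 90MB and 180MB as XML, which is consistent with the slide's ">100MB" figure.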
Key Recommendation: Go Hybrid
EG: Select a hybrid platform for Java SOA Conventional Core

Hybrid Pattern
  Assumption
    Best-of-breed platform enables customer choice: the right tools for the job
    The master reference points, Business/Data Objects, may be instantiated as Java, XML, or manipulated solely as RDB structures, depending on the use case
  Requirement #1: Leverage Native Architecture Patterns
    Avoid bolt-on Java: API-only will not lower TCO, only access layers
    Avoid bolt-on SOA: a SOAP wrapper will not improve performance or data decoupling
    Avoid Java or XML bulk data processing: limited power when used in isolation
  Requirement #2: Demand Runtime Integration
    Avoid having to integrate the integration platforms!
    Look for cross-system Human Workflow, common Hardware and Platform runtime arch.
  Requirement #3: Demand Design-time Integration
    Avoid having to train developers on four or five different tools for the same use cases
    Look for integration tools that use the same Design Platform across runtime engines