BUS05: The Evolution of Data Integration
John Motler, Principal Sales Consultant, Informatica
What Is Data Integration?
Integration platform use cases: Data Warehouse, Data Migration, Test Data Management & Archiving, Data Consolidation, Master Data Management, Data Synchronization, B2B Data Exchange
Industry standards: SWIFT, NACHA, HIPAA
Sources: Cloud Computing, Application, Database, Unstructured, Partner Data
Breadth of Data Ever Increasing
Messaging and Web Services: WebSphere MQ, JMS, MSMQ, SAP NetWeaver XI, Web Services, TIBCO, webMethods
Packaged Applications: JD Edwards, SAP NetWeaver, Lotus Notes, SAP NetWeaver BI, Oracle E-Business, SAS, PeopleSoft, Siebel
Relational and Flat Files: Oracle, DB2 UDB, DB2/400, SQL Server, Sybase, Informix, Teradata, Netezza, ODBC, JDBC
SaaS/BPO: Salesforce CRM, Force.com, RightNow, NetSuite, ADP, Hewitt, SAP By Design, Oracle OnDemand
Mainframe and Midrange: ADABAS, Datacom, DB2, IDMS, IMS, VSAM, C-ISAM, Binary Flat Files, Tape Formats
Industry Standards: EDI X12, EDIFACT, RosettaNet, HL7, HIPAA, AST, FIX, Cargo IMP, MVR
Unstructured Data and Files: Word, Excel, PDF, StarOffice, WordPerfect, Email (POP, IMAP), HTTP, Flat files, ASCII reports, HTML, RPG, ANSI, LDAP
XML Standards: XML, LegalXML, IFX, cXML, ebXML, HL7 v3.0, ACORD (AL3, XML)
Evolution of Data Integration
1960s and 1970s: Databases and Applications
The Database
1960s: Network / hierarchical databases
1970s: Relational databases
  IBM: E.F. Codd, System R, SQL (1974); DB2 (1983, mainframe)
  Ingres: 1973 (first product 1979)
  Oracle: 1978 (first release)
  Sybase: 1980s
  MS SQL Server: 1990s
Others
  Integrated hardware and software: Teradata, System/38
  Object-oriented databases
  NoSQL databases
1980s: Data Integration, Data Warehouses, ETL
Early Application/Data Integration
Diagram: Payroll, Customer Service, Sales and Shipping systems connected by "Extract & Split" and "Join & Load" steps
Integration needs increased as the number of data repositories grew
Tools emerged to generate code that pulled and pushed data
Scripts to transfer data proliferated: COBOL for mainframe data, C for open systems
The approach became known as ETL, but the growth of tools was driven by the emergence of what?
Data Warehouse
1970s: Bill Inmon defines the term "data warehouse"
1983: Teradata releases a decision support DBMS
1990: Red Brick (Ralph Kimball) released
1990s: Informatica releases PowerMart, a GUI-based ETL tool
ETL Capabilities
Adapters
Graphical development environment
Transformation library:
  Joining tables and files
  Pivoting for normalization
  Aggregating
  Slowly changing dimensions
  Lookups
  Parsing
  Expressions
Metadata architecture
Object principles
Performance tuning
High availability
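Of the transformations listed above, slowly changing dimensions are the least self-explanatory. The following is a minimal Python sketch of a Type 2 slowly changing dimension, assuming a hypothetical customer dimension keyed by customer_id with a single tracked attribute (city). It is illustrative only and is not how PowerMart or any other ETL tool implements the transformation: instead of overwriting a changed value, the current row is closed with an end date and a new current row is added, so history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerDim:
    customer_id: str                  # natural (business) key from the source system
    city: str                         # attribute whose history we want to keep
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(dim: List[CustomerDim], customer_id: str, city: str,
               as_of: date) -> List[CustomerDim]:
    """Apply one incoming source record to the dimension as a Type 2 change."""
    result: List[CustomerDim] = []
    current: Optional[CustomerDim] = None
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            current = row             # the open (current) version for this key
        else:
            result.append(row)        # history rows and other keys pass through
    if current is not None and current.city == city:
        result.append(current)        # no change: keep the current version as-is
        return result
    if current is not None:
        # Expire the old version rather than overwriting it.
        result.append(replace(current, valid_to=as_of))
    # Add the new current version.
    result.append(CustomerDim(customer_id, city, valid_from=as_of))
    return result


dim = apply_scd2([], "C1", "Denver", date(2010, 1, 1))
dim = apply_scd2(dim, "C1", "Boulder", date(2012, 6, 1))
for row in dim:
    print(row)
```

A Type 1 dimension would simply overwrite city; Type 2 trades extra rows for the ability to report on the data as it was at any point in time.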
1990s: Data Quality and EAI
Data Quality
Emerging need for data quality
  Many data warehouse and data migration projects failed because of data quality issues
  Business decisions were being made from the data warehouse using incorrect data; the warehouse was not the cause, but it became the focus
  Government-led initiatives such as the USPS National Change of Address (NCOA) program provided postal data to avoid the huge cost of wasted postage
Initially service-based, later expanded to server-based offerings
Initial focus on name and address, later expanded to all data types and domains
Data quality tools were added to ETL capabilities
Data Quality Capabilities
Data profiling: initially assessing the data to understand its quality challenges
Data standardization: a rules engine that ensures data conforms to quality rules
Address validation and geocoding: for name and address data; corrects data to US and worldwide postal standards
Matching or linking: comparing data so that similar, but slightly different, records can be aligned; matching may use "fuzzy logic" to find duplicates in the data
Monitoring: tracking data quality over time and reporting variations in quality
Batch and real-time: once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean
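A minimal sketch of two of these capabilities, standardization and fuzzy matching, using only Python's standard library. The suffix rules, the 0.9 threshold and the function names are assumptions for illustration; commercial data quality tools use much richer rule sets and matching algorithms.

```python
import re
from difflib import SequenceMatcher

# Hypothetical standardization rules: map street-suffix variants to one form.
SUFFIX_RULES = {
    "street": "st", "st.": "st",
    "avenue": "ave", "ave.": "ave",
    "road": "rd", "rd.": "rd",
}


def standardize(address: str) -> str:
    """Lower-case, collapse runs of whitespace and apply the suffix rules."""
    collapsed = re.sub(r"\s+", " ", address.strip().lower())
    return " ".join(SUFFIX_RULES.get(token, token) for token in collapsed.split(" "))


def is_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy match: similarity ratio (0.0 to 1.0) of the standardized strings."""
    ratio = SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold


print(standardize("  42 Oak  Road "))                 # 42 oak rd
print(is_match("123 Main Street", "123  MAIN ST."))   # True
print(is_match("123 Main St", "9 Elm Avenue"))        # False
```

Standardizing before matching matters: the two spellings of 123 Main Street only become identical after the rules are applied. A simple ratio threshold can also link records that differ in meaningful ways (for example, "123 Main St" and "124 Main St" score above 0.9), which is why matching rules are usually tuned against profiled data.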
Enterprise Application Integration
An integration framework composed of technologies and services that form a middleware layer, enabling integration of systems and applications across the enterprise
Extends from data alone to business data and processes
Topologies
  Hub-and-spoke: central message broker
  Bus: central messaging backbone
Emergence of EAI offerings: IBM WebSphere MQ, TIBCO, Vitria, SeeBeyond, webMethods
Early promise but many failures: constant change, competing standards, conflicting requirements, scale of initiatives
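A toy, single-process illustration of the hub-and-spoke topology: spokes never call each other directly; they publish messages to topics on a central broker, which delivers them to every subscriber. The class, topic and inbox names are hypothetical; products such as WebSphere MQ or TIBCO add persistence, transactions and network transport that this sketch omits.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBroker:
    """Toy in-process hub: spokes publish to topics, the hub fans out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber of the topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Hypothetical spokes: billing and shipping both react to new orders.
broker = MessageBroker()
billing_inbox: List[dict] = []
shipping_inbox: List[dict] = []
broker.subscribe("order.created", billing_inbox.append)
broker.subscribe("order.created", shipping_inbox.append)
delivered = broker.publish("order.created", {"order_id": 7})
print(delivered, billing_inbox, shipping_inbox)
```

Adding a new spoke means one more subscription at the hub rather than new point-to-point links to every other application, which is the main appeal of the topology; the hub also becomes a single point of failure and a bottleneck.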
2000s: XML and Web Services, Data Exchanges, ESB, Data Federation, Real-Time and MDM
XML
Extensible Markup Language: a set of rules for encoding documents in a format that is both human-readable and machine-readable
Versions: 1.0 (first edition 1998, fifth edition 2008); 1.1 (first edition 2004, limited use)
Importance
  Drives much of B2B communication
  Many standard XML formats
  U.S. standard for EDI: ASC X12 (finance, healthcare, insurance exchanges)
  Together with SOAP and HTTP, forms the backbone of the next key integration technology: web services
Criticism
  Verbose and complex, hence the emergence of competing data interchange formats (JSON)
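To make the verbosity criticism concrete, here is a small sketch that encodes the same record as XML (with Python's xml.etree.ElementTree) and as JSON. The record layout and the element and attribute names are hypothetical, not taken from any standard XML schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record layout; element and attribute names are not from any standard schema.
record = {"id": "A1", "name": "Pat Lee", "state": "CO"}


def to_xml(rec: dict) -> str:
    """Encode the record as XML: id as an attribute, other fields as child elements."""
    root = ET.Element("member", attrib={"id": rec["id"]})
    for field in ("name", "state"):
        ET.SubElement(root, field).text = rec[field]
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse the XML produced by to_xml back into a dict."""
    root = ET.fromstring(text)
    return {"id": root.get("id"), "name": root.findtext("name"), "state": root.findtext("state")}


xml_text = to_xml(record)
json_text = json.dumps(record)
print(xml_text)    # <member id="A1"><name>Pat Lee</name><state>CO</state></member>
print(json_text)   # {"id": "A1", "name": "Pat Lee", "state": "CO"}
print(len(xml_text), len(json_text))   # the XML form is longer for the same content
```

Every XML field name appears twice (opening and closing tag), which accounts for most of the size difference; in exchange XML offers namespaces, attributes and schema validation, which is why it remains common in B2B standards.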
Web Services and SOA
Key concepts
  Less about data integration, more about business service or process integration
  A web service is a loosely coupled service that provides a single action or business function
  Multiple web services are orchestrated to provide application functionality
  Services should be discoverable and have a simple interface
Key principles
  Reuse, granularity, modularity, componentization and interoperability
  Standards compliance (both common and industry-specific)
  Service identification and categorization, provisioning and delivery, and monitoring and tracking
Web services alone, as SOA, cannot handle the complex, secure, SLA-based applications of an enterprise
B2B Data Exchanges and EDI
Diagram: a B2B data exchange sits between internal systems and external partners, combining monitoring, partner management, managed file transfer, data transformation and data integration
EDI: "Computer-to-computer interchange of strictly formatted messages. EDI implies a sequence of messages between two parties, either of whom may serve as originator or recipient. The formatted data representing the documents may be transmitted from originator to recipient via telecommunications or physically transported on electronic storage media."
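"Strictly formatted" is easiest to see in ASC X12, where a message is a flat string of segments, each a list of delimited elements. The sketch below splits X12 text into segments, assuming the commonly used "~" segment terminator and "*" element separator; real interchanges declare their delimiters in the ISA envelope header, and this sketch skips envelopes and validation entirely. The sample is an illustrative fragment in the style of a 270 eligibility inquiry, not a complete or valid transaction, and the function names are hypothetical.

```python
from typing import List


def parse_x12(raw: str, segment_sep: str = "~", element_sep: str = "*") -> List[List[str]]:
    """Split X12 text into segments, each a list of elements (element 0 is the segment ID).

    Assumes fixed delimiters; a production parser would read them from the ISA header.
    """
    return [seg.strip().split(element_sep)
            for seg in raw.split(segment_sep) if seg.strip()]


def find_segments(segments: List[List[str]], segment_id: str) -> List[List[str]]:
    """Return all segments with the given ID (for example, 'NM1' name segments)."""
    return [seg for seg in segments if seg[0] == segment_id]


# Illustrative fragment only; not a complete or valid 270 transaction.
sample = "ST*270*0001~BHT*0022*13*10001234*20130101*1319~NM1*IL*1*DOE*JOHN~SE*4*0001~"
segments = parse_x12(sample)
for seg in segments:
    print(seg[0], seg[1:])
print(find_segments(segments, "NM1")[0][3])   # DOE
```

Because meaning is carried entirely by position (element 3 of an NM1 segment is a last name), trading partners must agree on implementation guides up front; this is the kind of mapping work the data transformation layer of a B2B exchange takes on.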
Data Exchange Functional Architecture
Data Exchange: trading partner management, onboarding, endpoint management, monitoring, events, searching
Managed File Transfer: multi-protocol support, certificate management, profiles, scheduling, reprocessing, archiving, encryption, reporting, store and forward, security, dashboards, preconfigured connections
Data Flow: graphical flow design, reconciliation
Data Transformation: visual mapping design, universal data transformation, logging, routing, industry standards, Excel mapping specification, transformation validations
Enterprise Service Buses
Key concepts
  Backbone supporting SOA architectures
  Supports the move to service integration
  Provides connectivity, application adapters, rule-based message routing and limited data transformation
  A newer wave of message-based EAI
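"Rule-based routing of messages" usually means content-based routing: the bus inspects each message and forwards it to an endpoint according to an ordered rule list. A minimal sketch follows; the message fields, the 10,000 threshold and the endpoint names are all hypothetical, and a real ESB would express the rules in its own configuration rather than in code.

```python
from typing import Callable, Dict, List, Tuple

# A rule pairs a predicate on the message with a destination endpoint.
Rule = Tuple[Callable[[Dict], bool], str]

# Hypothetical endpoints and rules; the first matching rule wins, so order matters.
RULES: List[Rule] = [
    (lambda m: m.get("type") == "claim" and m.get("amount", 0) > 10_000, "claims-review"),
    (lambda m: m.get("type") == "claim", "claims-auto"),
    (lambda m: m.get("type") == "eligibility", "eligibility-service"),
]


def route(message: Dict, rules: List[Rule], default: str = "dead-letter") -> str:
    """Content-based routing: return the endpoint of the first rule whose predicate matches."""
    for predicate, endpoint in rules:
        if predicate(message):
            return endpoint
    return default   # unroutable messages are parked rather than dropped


print(route({"type": "claim", "amount": 25_000}, RULES))   # claims-review
print(route({"type": "claim", "amount": 50}, RULES))       # claims-auto
print(route({"type": "unknown"}, RULES))                   # dead-letter
```

Keeping routing rules in the bus rather than in each sender is what lets services stay loosely coupled: a sender does not need to know which service will handle its message.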
Data Federation or Virtualization
Key concepts
  Query data without physically moving it
  Supported the early need for a web presence
  Addresses changing latency requirements for data and reports
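A small local stand-in for the federation idea, using Python's built-in sqlite3 module: two separate databases (one attached under the hypothetical name agency_b) are joined through a view, so the query reads both sources in place and no rows are copied into a central store. The table names, columns and data are assumptions for illustration; federation products push queries out to remote, heterogeneous systems, which this single-process sketch does not attempt.

```python
import sqlite3
from contextlib import closing


def federated_query() -> list:
    """Join two separate databases in place through a view, without copying rows."""
    with closing(sqlite3.connect(":memory:")) as conn:
        # Stand-in for a second source: a separate in-memory database.
        conn.execute("ATTACH DATABASE ':memory:' AS agency_b")
        conn.execute("CREATE TABLE main.clients (client_id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE agency_b.benefits (client_id INTEGER, program TEXT)")
        conn.executemany("INSERT INTO main.clients VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
        conn.executemany("INSERT INTO agency_b.benefits VALUES (?, ?)",
                         [(1, "Medicaid"), (1, "CHIP"), (2, "TANF")])
        # A TEMP view may reference tables in attached databases; it stores no data itself.
        conn.execute(
            "CREATE TEMP VIEW client_benefits AS "
            "SELECT c.name, b.program FROM main.clients AS c "
            "JOIN agency_b.benefits AS b ON c.client_id = b.client_id"
        )
        return conn.execute(
            "SELECT name, program FROM client_benefits ORDER BY name, program"
        ).fetchall()


print(federated_query())
```

The trade-off is latency and load: every query against the view hits the underlying sources, which is why federation suits fresh, low-volume access better than heavy historical reporting.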
Real-Time Integration
Requirements
  Support the need for lower latency in reporting and business intelligence
  Application synchronization
Approaches
  Data federation
  Change data capture (CDC): identification, capture and delivery of the changes made to enterprise data sources; allows transformation of the data
  Data replication: a copy of production data or subsets of it, used mostly for reliability or fault tolerance
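The output of change data capture is a stream of insert, update and delete records rather than full copies of a table. The sketch below produces such records by comparing two keyed snapshots of a hypothetical case table. Note that this is a simpler, swapped-in technique: log-based CDC products typically read the database transaction log instead of diffing snapshots, which avoids rescanning the whole table.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Change = Tuple[str, int, Optional[Row]]


def diff_snapshots(before: Dict[int, Row], after: Dict[int, Row]) -> List[Change]:
    """Compare two keyed snapshots of a table and emit insert/update/delete change records."""
    changes: List[Change] = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("insert", key, after[key]))
        elif key not in after:
            changes.append(("delete", key, None))
        elif before[key] != after[key]:
            changes.append(("update", key, after[key]))
        # Unchanged rows produce no change record.
    return changes


# Hypothetical case table captured at two points in time, keyed by case_id.
yesterday = {1: {"status": "active"}, 2: {"status": "active"}, 3: {"status": "pending"}}
today = {1: {"status": "active"}, 2: {"status": "closed"}, 4: {"status": "pending"}}
for change in diff_snapshots(yesterday, today):
    print(change)
```

Downstream systems apply only these three change records instead of reloading the table, which is what makes CDC a fit for lower-latency reporting and application synchronization.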
Master Data Management
Diagram: Statewide Automated Welfare Systems (SAWS), MMIS and other programs (CHIP, TANF, etc.) each hold their own copies of client, provider, eligibility and benefits data, drawn from cloud, application, legacy, unstructured and third-party sources
No single view of master data?
Data governance
Master Data Management
A set of processes and tools that consistently defines and manages the non-transactional data entities of an organization (which may include reference data). MDM provides processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information.
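The matching and consolidating steps in this definition can be sketched as building one "golden record" per real-world person from several source systems. The match rule (same normalized name and date of birth), the survivorship rule (newest non-empty value wins) and the sample records are all assumptions for illustration; MDM products use configurable, usually fuzzy, match rules and richer survivorship policies. The source names SAWS and MMIS are borrowed from the diagram above purely as labels.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FIELDS = ("name", "dob", "address", "phone")


def match_key(rec: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical match rule: same case/whitespace-normalized name and same date of birth."""
    return (" ".join(rec["name"].lower().split()), rec["dob"])


def golden_records(records: List[Dict[str, str]]) -> List[Dict]:
    """Group matching records and build one consolidated (golden) record per group."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    golden = []
    for recs in groups.values():
        newest_first = sorted(recs, key=lambda r: r["updated"], reverse=True)
        merged: Dict = {"sources": sorted(r["source"] for r in recs)}
        for field in FIELDS:
            # Survivorship rule (an assumption): keep the most recently updated non-empty value.
            merged[field] = next((r[field] for r in newest_first if r.get(field)), None)
        golden.append(merged)
    return golden


records = [
    {"source": "SAWS", "name": "MARIA  LOPEZ", "dob": "1980-02-01",
     "address": "12 Elm St", "phone": "", "updated": "2012-01-05"},
    {"source": "MMIS", "name": "Maria Lopez", "dob": "1980-02-01",
     "address": "", "phone": "555-0100", "updated": "2013-03-09"},
    {"source": "MMIS", "name": "John Roe", "dob": "1975-07-30",
     "address": "9 Oak Ave", "phone": "555-0199", "updated": "2011-08-02"},
]
for rec in golden_records(records):
    print(rec)
```

Neither source alone has both the address and the phone number for Maria Lopez; the consolidated record does, and it keeps a list of contributing systems so changes can be traced back to their origin.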
Data Integration and Master Data Management
Sources: applications, relational and flat files, legacy systems, third-party data
Data integration feeds the master data foundation
Master data management: recognize, resolve, relate
Data services deliver master data to operational applications and to analytical / business intelligence systems
Supporting capabilities: data profiling, data quality, dashboard
2010s: Cloud, Big Data
Cloud Computing
Cloud computing is the use of computing resources (hardware and software) that are available in a remote location and accessible over a network (typically the Internet). Cloud computing entrusts remote services with a user's data, software and computation.
Big Data
Big transaction data: online transaction processing (OLTP); online analytical processing (OLAP) and DW appliances
Big interaction data: social media data; machine/device data (device sensor data); call detail records, image and clickstream data; scientific and genomic data
Big data processing
Government Use Cases
Health Insurance Exchange
Background
  Affordable Care Act ("ObamaCare"): states need to offer central eligibility and enrollment services for both state-run and commercial health care plans
HIE integration challenges
  State welfare systems and other eligibility systems
  State healthcare systems (MMIS)
  Federal Hub, providing eligibility and income services
  Commercial healthcare providers (qualified health plans)
Health Insurance Exchange
Systems: SAWS, MMIS, Federal Hub, payers
Database: multiple data sources from multiple agencies
ETL: data for current eligibility and reporting
Data quality: cleansing and standardizing disparate data
XML / web services: integration with the Federal Hub
Data exchanges: X12 HIPAA transactions between payers and the state
ESB / real-time: eligibility determination, message-based deployments
Data federation: deployments that don't centralize data
MDM: linking people recorded in different systems
Statewide Longitudinal Data Systems
How well are we preparing students for college?
Are college graduates prepared to enter the workforce?
What types of preschool programs are most effective?
Are college students taking remedial classes?
Data must be shared and exchanged across multiple agencies (human services, K-12, higher education, labor, corrections) and levels (district, state, federal) to promote accountability, inform policy and ensure a holistic view of student success.
Statewide Longitudinal Data Systems
Database: multiple data sources from multiple agencies
ETL: demographic data for students held in multiple systems
Data quality: cleansing and standardizing disparate data
Data federation: access to data from corrections and other agencies to support policy and summarization without PII
MDM: linking people recorded in different systems
CO SLDS Video
Summary
Define an enterprise integration strategy
Incorporate message, process and data integration architectures
The landscape will change, so abstraction and modularity are important
Define an SOA strategy, but it will be some time before SOA is a universal data integration strategy
MDM, in concert with data integration and data quality, is integral to many government data-sharing initiatives