Managing Data in Motion


Managing Data in Motion
Data Integration Best Practice Techniques and Technologies

April Reeve

ELSEVIER
AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO
Morgan Kaufmann is an imprint of Elsevier

Contents

Foreword
Acknowledgements
Biography
Introduction

PART 1 INTRODUCTION TO DATA INTEGRATION

Chapter 1 The Importance of Data Integration
    The natural complexity of data interfaces
    The rise of purchased vendor packages
    Key enablement of big data and virtualization

Chapter 2 What Is Data Integration?
    Data in motion
    Integrating into a common format: transforming data
    Migrating data from one system to another
    Moving data around the organization
    Pulling information from unstructured data
    Moving process to data

Chapter 3 Types and Complexity of Data Integration
    The differences and similarities in managing data in motion and persistent data
    Batch data integration
    Real-time data integration
    Big data integration
    Data virtualization

Chapter 4 The Process of Data Integration Development
    The data integration development life cycle
    Inclusion of business knowledge and expertise

PART 2 BATCH DATA INTEGRATION

Chapter 5 Introduction to Batch Data Integration
    What is batch data integration?
    Batch data integration life cycle

Chapter 6 Extract, Transform, and Load
    What is ETL?
    Profiling
    Extract
    Staging
    Access layers
    Transform
    Simple mapping
    Lookups
    Aggregation and normalization
    Calculation
    Load

Chapter 7 Data Warehousing
    What is data warehousing?
    Layers in an enterprise data warehouse architecture
    Operational application layer
    External data
    Data staging areas coming into a data warehouse
    Data warehouse data structure
    Staging from data warehouse to data mart or business intelligence
    Business intelligence layer
    Types of data to load in a data warehouse
    Master data in a data warehouse
    Balance and snapshot data in a data warehouse
    Transactional data in a data warehouse
    Events
    Reconciliation
    Interview with an expert: Krish Krishnan on data warehousing and data integration

Chapter 8 Data Conversion
    What is data conversion?
    Data conversion life cycle
    Data conversion analysis
    Best practice data loading
    Improving source data quality
    Mapping to target
    Configuration data
    Testing and dependencies
    Private data
    Proving
    Environments

Chapter 9 Data Archiving
    What is data archiving?
    Selecting data to archive
    Can the archived data be retrieved?
    Conforming data structures in the archiving environment
    Flexible data structures
    Interview with an expert: John Anderson on data archiving and data integration

Chapter 10 Batch Data Integration Architecture and Metadata
    What is batch data integration architecture?
    Profiling tool
    Modeling tool
    Metadata repository
    Data movement
    Transformation
    Scheduling
    Interview with an expert: Adrienne Tannenbaum on metadata and data integration

PART 3 REAL TIME DATA INTEGRATION

Chapter 11 Introduction to Real-Time Data Integration
    Why real-time data integration?
    Why two sets of technologies?

Chapter 12 Data Integration Patterns
    Interaction patterns
    Loose coupling
    Hub and spoke
    Synchronous and asynchronous interaction
    Request and reply
    Publish and subscribe
    Two-phase commit
    Integrating interaction types

Chapter 13 Core Real-Time Data Integration Technologies
    Confusing terminology
    Enterprise service bus (ESB)
    Interview with an expert: David S. Linthicum on ESB and data integration
    Service-oriented architecture (SOA)
    Extensible markup language (XML)
    Interview with an expert: M. David Allen on XML and data integration
    Data replication and change data capture
    Enterprise application integration (EAI)
    Enterprise information integration (EII)

Chapter 14 Data Integration Modeling
    Canonical modeling
    Interview with an expert: Dagna Gaythorpe on canonical modeling and data integration
    Message modeling

Chapter 15 Master Data Management
    Introduction to master data management
    Reasons for a master data management solution
    Purchased packages and master data
    Reference data
    Masters and slaves
    External data
    Master data management functionality
    Types of master data management solutions: registry and data hub

Chapter 16 Data Warehousing with Real-Time Updates
    Corporate information factory
    Operational data store
    Master data moving to the data warehouse
    Interview with an expert: Krish Krishnan on real-time data warehousing updates

Chapter 17 Real-Time Data Integration Architecture and Metadata
    What is real-time data integration metadata?
    Modeling
    Profiling
    Metadata repository
    Enterprise service bus: data transformation and orchestration
    Technical mediation
    Business content
    Data movement and middleware
    External interaction

PART 4 BIG, CLOUD, VIRTUAL DATA

Chapter 18 Introduction to Big Data Integration
    Data integration and unstructured data
    Big data, cloud data, and data virtualization

Chapter 19 Cloud Architecture and Data Integration
    Why is data integration important in the cloud?
    Public cloud
    Cloud security
    Cloud latency
    Cloud redundancy

Chapter 20 Data Virtualization
    A technology whose time has come
    Business uses of data virtualization
    Business intelligence solutions
    Integrating different types of data
    Quickly add or prototype adding data to a data warehouse
    Present physically disparate data together
    Leverage various data and models triggering transactions
    Data virtualization architecture
    Sources and adapters
    Mappings and models and views
    Transformation and presentation

Chapter 21 Big Data Integration
    What is big data?
    Big data dimension: volume
    Massive parallel processing: moving process to data
    Hadoop and MapReduce
    Integrating with external data
    Visualization
    Big data dimension: variety
    Types of data
    Integrating different types of data
    Interview with an expert: William McKnight on Hadoop and data integration
    Big data dimension: velocity
    Streaming data
    Sensor and GPS data
    Social media data
    Traditional big data use cases
    More big data use cases
    Health care
    Logistics
    National security
    Leveraging the power of big data: real-time decision support
    Triggering action
    Speed of data retrieval from memory versus disk
    From data analytics to models, from streaming data to decisions
    Big data architecture
    Operational systems and data sources
    Intermediate data hubs
    Business intelligence tools
    Data virtualization server
    Batch and real-time data integration tools
    Analytic sandbox
    Risk response systems/recommendation engines
    Interview with an expert: John Haddad on Big Data and data integration

Chapter 22 Conclusion to Managing Data in Motion
    Data integration architecture
    Why data integration architecture?
    Data integration life cycle and expertise
    Security and privacy
    Data integration engines
    Operational continuity
    ETL engine
    Enterprise service bus
    Data virtualization server
    Data movement
    Data integration hubs
    Master data
    Data warehouse and operational data store
    Enterprise content management
    Data archive
    Metadata management
    Data discovery
    Data profiling
    Data modeling
    Data flow modeling
    Metadata repository
    The end

References
Index