Master Data Management
David Loshin

AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO
MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER

Contents
Preface xvii
Acknowledgments xxiii
About the Author xxv

CHAPTER 1 Master Data and Master Data Management 1
1.1 Driving the Need for Master Data 1
1.2 Origins of Master Data 3
1.2.1 Example: Customer Data 4
1.3 What Is Master Data? 5
1.4 What Is Master Data Management? 8
1.5 Benefits of Master Data Management 10
1.6 Alphabet Soup: What about CRM/SCM/ERP/BI (and Others)? 12
1.7 Organizational Challenges and Master Data Management 15
1.8 MDM and Data Quality 17
1.9 Technology and Master Data Management 18
1.10 Overview of the Book 18
1.11 Summary 20

CHAPTER 2 Coordination: Stakeholders, Requirements, and Planning 23
2.1 Introduction 23
2.2 Communicating Business Value 24
2.2.1 Improving Data Quality 25
2.2.2 Reducing the Need for Cross-System Reconciliation 25
2.2.3 Reducing Operational Complexity 25
2.2.4 Simplifying Design and Implementation 26
2.2.5 Easing Integration 27
2.3 Stakeholders 27
2.3.1 Senior Management 27
2.3.2 Business Clients 28
2.3.3 Application Owners 28
2.3.4 Information Architects 29
2.3.5 Data Governance and Data Quality 29
2.3.6 Metadata Analysts 30
2.3.7 System Developers 30
2.3.8 Operations Staff 30
2.4 Developing a Project Charter 31
2.5 Participant Coordination and Knowing Where to Begin 32
2.5.1 Processes and Procedures for Collaboration 33
2.5.2 RACI Matrix 33
2.5.3 Modeling the Business 34
2.5.4 Consensus Driven through Metadata 35
2.5.5 Data Governance 36
2.6 Establishing Feasibility through Data Requirements 36
2.6.1 Identifying the Business Context 37
2.6.2 Conduct Stakeholder Interviews 38
2.6.3 Synthesize Requirements 39
2.6.4 Establishing Feasibility and Next Steps 41
2.7 Summary 41

CHAPTER 3 MDM Components and the Maturity Model 43
3.1 Introduction 43
3.2 MDM Basics 44
3.2.1 Architecture 45
3.2.2 Master Data Model 45
3.2.3 MDM System Architecture 46
3.2.4 MDM Service Layer Architecture 46
3.3 Manifesting Information Oversight with Governance 47
3.3.1 Standardized Definitions 47
3.3.2 Consolidated Metadata Management 48
3.3.3 Data Quality 49
3.3.4 Data Stewardship 49
3.4 Operations Management 49
3.4.1 Identity Management 50
3.4.2 Hierarchy Management and Data Lineage 50
3.4.3 Migration Management 51
3.4.4 Administration/Configuration 51
3.5 Identification and Consolidation 51
3.5.1 Identity Search and Resolution 52
3.5.2 Record Linkage 52
3.5.3 Merging and Consolidation 52
3.6 Integration 53
3.6.1 Application Integration with Master Data 53
3.6.2 MDM Component Service Layer 53
3.7 Business Process Management 54
3.7.1 Business Process Integration 54
3.7.2 Business Rules 55
3.7.3 MDM Business Component Layer 55
3.8 MDM Maturity Model 56
3.8.1 Initial 56
3.8.2 Reactive
3.8.3 Managed 59
3.8.4 Proactive 60
3.8.5 Strategic Performance 62
3.9 Developing an Implementation Road Map 63
3.10 Summary 65

CHAPTER 4 Data Governance for Master Data Management 67
4.1 Introduction 67
4.2 What Is Data Governance? 68
4.3 Setting the Stage: Aligning Information Objectives with the Business Strategy 69
4.3.1 Clarifying the Information Architecture 70
4.3.2 Mapping Information Functions to Business Objectives 71
4.3.3 Instituting a Process Framework for Information Policy 71
4.4 Data Quality and Data Governance 72
4.5 Areas of Risk 72
4.5.1 Business and Financial 72
4.5.2 Reporting 73
4.5.3 Entity Knowledge 73
4.5.4 Protection 74
4.5.5 Limitation of Use 74
4.6 Risks of Master Data Management 74
4.6.1 Establishing Consensus for Coordination and Collaboration 74
4.6.2 Data Ownership 75
4.6.3 Semantics: Form, Function, and Meaning 76
4.7 Managing Risk through Measured Conformance to Information Policies 77
4.8 Key Data Entities 78
4.9 Critical Data Elements 78
4.10 Defining Information Policies 79
4.11 Metrics and Measurement 80
4.12 Monitoring and Evaluation 81
4.13 Framework for Responsibility and Accountability 82
4.14 Data Governance Director 83
4.15 Data Governance Oversight Board 84
4.16 Data Coordination Council 84
4.17 Data Stewardship 85
4.18 Summary 86

CHAPTER 5 Data Quality and MDM 87
5.1 Introduction 87
5.2 Distribution, Diffusion, and Metadata 88
5.3 Dimensions of Data Quality 89
5.3.1 Uniqueness 90
5.3.2 Accuracy 90
5.3.3 Consistency 90
5.3.4 Completeness 91
5.3.5 Timeliness 92
5.3.6 Currency 92
5.3.7 Format Compliance 92
5.3.8 Referential Integrity 93
5.4 Employing Data Quality and Data Integration Tools 93
5.5 Assessment: Data Profiling 94
5.5.1 Profiling for Metadata Resolution 94
5.5.2 Profiling for Data Quality Assessment 96
5.5.3 Profiling as Part of Migration 96
5.6 Data Cleansing 97
5.7 Data Controls 99
5.7.1 Data and Process Controls 100
5.7.2 Data Quality Control versus Data Validation 100
5.8 MDM and Data Quality Service Level Agreements 101
5.8.1 Data Controls, Downstream Trust, and the Control Framework 101
5.9 Influence of Data Profiling and Quality on MDM (and Vice Versa) 102
5.10 Summary 103

CHAPTER 6 Metadata Management for MDM 105
6.1 Introduction 105
6.2 Business Definitions 108
6.2.1 Concepts 109
6.2.2 Business Terms 109
6.2.3 Definitions 110
6.2.4 Semantics 110
6.3 Reference Metadata 111
6.3.1 Conceptual Domains 111
6.3.2 Value Domains 112
6.3.3 Reference Tables 113
6.3.4 Mappings 114
6.4 Data Elements 115
6.4.1 Critical Data Elements 116
6.4.2 Data Element Definition 116
6.4.3 Data Formats 117
6.4.4 Aliases/Synonyms 117
6.5 Information Architecture 118
6.5.1 Master Data Object Class Types 118
6.5.2 Master Entity Models 119
6.5.3 Master Object Directory 120
6.5.4 Relational Tables 120
6.6 Metadata to Support Data Governance 120
6.6.1 Information Usage 120
6.6.2 Information Quality 121
6.6.3 Data Quality SLAs 121
6.6.4 Access Control 122
6.7 Services Metadata 122
6.7.1 Service Directory 123
6.7.2 Service Users 123
6.7.3 Interfaces 123
6.8 Business Metadata 124
6.8.1 Business Policies 125
6.8.2 Information Policies 126
6.8.3 Business Rules 126
6.9 Summary 126

CHAPTER 7 Identifying Master Metadata and Master Data 129
7.1 Introduction 129
7.2 Characteristics of Master Data 131
7.2.1 Categorization and Hierarchies 131
7.2.2 Top-Down Approach: Business Process Models 133
7.2.3 Bottom-Up Approach: Data Asset Evaluation 134
7.3 Identifying and Centralizing Semantic Metadata 135
7.3.1 Example 135
7.3.2 Analysis for Integration 137
7.3.3 Collecting and Analyzing Master Metadata 137
7.3.4 Resolving Similarity in Structure 138
7.4 Unifying Data Object Semantics 139
7.5 Identifying and Qualifying Master Data 140
7.5.1 Qualifying Master Data Types 140
7.5.2 The Fractal Nature of Metadata Profiling 141
7.5.3 Standardizing the Representation 142
7.6 Summary 142

CHAPTER 8 Data Modeling for MDM 143
8.1 Introduction 143
8.2 Aspects of the Master Repository 144
8.2.1 Characteristics of Identifying Attributes 144
8.2.2 Minimal Master Registry 144
8.2.3 Determining the Attributes Called "Identifying Attributes" 145
8.3 Information Sharing and Exchange 146
8.3.1 Master Data Sharing Network 146
8.3.2 Driving Assumptions 146
8.3.3 Two Models: Persistence and Exchange 149
8.4 Standardized Exchange and Consolidation Models 149
8.4.1 Exchange Model 150
8.4.2 Using Metadata to Manage Type Conversion 151
8.4.3 Caveat: Type Downcasting 152
8.5 Consolidation Model 152
8.6 Persistent Master Entity Models 153
8.6.1 Supporting the Data Life Cycle 153
8.6.2 Universal Modeling Approach 154
8.6.3 Data Life Cycle 155
8.7 Master Relational Model 156
8.7.1 Process Drives Relationships 156
8.7.2 Documenting and Verifying Relationships 156
8.7.3 Expanding the Model 157
8.8 Summary 157

CHAPTER 9 MDM Paradigms and Architectures 159
9.1 Introduction 159
9.2 MDM Usage Scenarios 160
9.2.1 Reference Information Management 160
9.2.2 Operational Usage 162
9.2.3 Analytical Usage 164
9.3 MDM Architectural Paradigms 165
9.3.1 Virtual/Registry 166
9.3.2 Transaction Hub 168
9.3.3 Hybrid/Centralized Master 169
9.4 Implementation Spectrum 171
9.5 Applications Impacts and Architecture Selection 172
9.5.1 Number of Master Attributes 173
9.5.2 Consolidation 174
9.5.3 Synchronization 174
9.5.4 Access 174
9.5.5 Service Complexity 175
9.5.6 Performance 175
9.6 Summary 176

CHAPTER 10 Data Consolidation and Integration 177
10.1 Introduction 177
10.2 Information Sharing 178
10.2.1 Extraction and Consolidation 178
10.2.2 Standardization and Publication Services 179
10.2.3 Data Federation 179
10.2.4 Data Propagation 180
10.3 Identifying Information 181
10.3.1 Indexing Identifying Values 181
10.3.2 The Challenge of Variation 182
10.4 Consolidation Techniques for Identity Resolution 183
10.4.1 Identity Resolution 184
10.4.2 Parsing and Standardization 185
10.4.3 Data Transformation 186
10.4.4 Normalization 186
10.4.5 Matching/Linkage 187
10.4.6 Approaches to Approximate Matching 188
10.4.7 The Birthday Paradox versus the Curse of Dimensionality 189
10.5 Classification 190
10.5.1 Need for Classification 191
10.5.2 Value of Content and Emerging Techniques 191
10.6 Consolidation 192
10.6.1 Similarity Thresholds 193
10.6.2 Survivorship 193
10.6.3 Integration Errors 195
10.6.4 Batch versus Inline 196
10.6.5 History and Lineage 196
10.7 Additional Considerations 197
10.7.1 Data Ownership and Rights of Consolidation 197
10.7.2 Access Rights and Usage Limitations 198
10.7.3 Segregation Instead of Consolidation 199
10.8 Summary 199

CHAPTER 11 Master Data Synchronization 201
11.1 Introduction 201
11.2 Aspects of Availability and Their Implications 202
11.3 Transactions, Data Dependencies, and the Need for Synchrony 203
11.3.1 Data Dependency 204
11.3.2 Business Process Considerations 205
11.3.3 Serializing Transactions 206
11.4 Synchronization 207
11.4.1 Application Infrastructure Synchronization Requirements 208
11.5 Conceptual Data Sharing Models 209
11.5.1 Registry Data Sharing 209
11.5.2 Repository Data Sharing 210
11.5.3 Hybrids and Federated Repositories 211
11.5.4 MDM, the Cache Model, and Coherence 212
11.6 Incremental Adoption 215
11.6.1 Incorporating and Synchronizing New Data Sources 215
11.6.2 Application Adoption 216
11.7 Summary 216

CHAPTER 12 MDM and the Functional Services Layer 217
12.1 Collecting and Using Master Data 218
12.1.1 Insufficiency of ETL 218
12.1.2 Replication of Functionality 219
12.1.3 Adjusting Application Dependencies 219
12.1.4 Need for Architectural Maturation 219
12.1.5 Similarity of Functionality 219
12.2 Concepts of the Services-Based Approach 220
12.3 Identifying Master Data Services 222
12.3.1 Master Data Object Life Cycle 222
12.3.2 MDM Service Components 224
12.3.3 More on the Banking Example 224
12.3.4 Identifying Capabilities 225
12.4 Transitioning to MDM 227
12.4.1 Transition via Wrappers 228
12.4.2 Maturation via Services 228
12.5 Supporting Application Services 230
12.5.1 Master Data Services 230
12.5.2 Life Cycle Services 231
12.5.3 Access Control 232
12.5.4 Integration 232
12.5.5 Consolidation 233
12.5.6 Workflow/Rules 233
12.6 Summary 234

CHAPTER 13 Management Guidance for MDM 237
13.1 Establishing a Business Justification for Master Data Integration and Management 238
13.2 Developing an MDM Road Map and Rollout Plan 240
13.2.1 MDM Road Map 240
13.2.2 Rollout Plan 241
13.3 Roles and Responsibilities 244
13.4 Project Planning 245
13.5 Business Process Models and Usage Scenarios 245
13.6 Identifying Initial Data Sets for Master Integration 246
13.7 Data Governance 246
13.8 Metadata 247
13.9 Master Object Analysis 248
13.10 Master Object Modeling 249
13.11 Data Quality Management 249
13.12 Data Extraction, Sharing, Consolidation, and Population 250
13.13 MDM Architecture 252
13.14 Master Data Services 253
13.15 Transition Plan 255
13.16 Ongoing Maintenance 256
13.17 Summary: Excelsior! 257

Bibliography and Suggested Reading 259
Index 261