THE PRACTITIONER'S GUIDE TO DATA QUALITY IMPROVEMENT
DAVID LOSHIN
ELSEVIER
AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO
Morgan Kaufmann Publishers is an Imprint of Elsevier
CONTENTS
Foreword xi
Preface xiii
Acknowledgments xxi
About the Author xxiii
Chapter 1 Business Impacts of Poor Data Quality 1
1.1 Information Value and Data Quality Improvement 3
1.2 Business Expectations and Data Quality 4
1.3 Qualifying Impacts 5
1.4 Some Examples 7
1.5 More on Impact Classification 11
1.6 Business Impact Analysis 13
1.7 Additional Impact Categories 14
1.8 Impact Taxonomies and Iterative Refinement 15
1.9 Summary: Translating Impact into Performance 16
Chapter 2 The Organizational Data Quality Program 17
2.1 The Virtuous Cycle of Data Quality 17
2.2 Data Quality Processes 19
2.3 Stakeholders and Participants 27
2.4 Data Quality Tools 30
2.5 Summary 34
Chapter 3 Data Quality Maturity 35
3.1 The Data Quality Strategy 35
3.2 A Data Quality Framework 38
3.3 A Data Quality Capability/Maturity Model 42
3.4 Mapping Framework Components to the Maturity Model 44
3.5 Summary 49
Chapter 4 Enterprise Initiative Integration 53
4.1 Planning Initiatives 53
4.2 Framework Initiatives 60
4.3 Operational and Application Initiatives 62
4.4 Scoping Issues 64
4.5 Summary 66
Chapter 5 Developing a Business Case and a Data Quality Road Map 67
5.1 Return on the Data Quality Investment 68
5.2 Developing the Business Case 69
5.3 Finding the Business Impacts 69
5.4 Researching Costs 72
5.5 Correlating Impacts and Causes 73
5.6 The Impact Matrix 74
5.7 Problems, Issues, Causes 75
5.8 Mapping Impacts to Data Flaws 75
5.9 Estimating the Value Gap 76
5.10 Prioritizing Actions 79
5.11 The Data Quality Road Map 81
5.12 Practical Steps for Developing the Road Map 84
5.13 Accountability, Responsibility, and Management 84
5.14 The Life Cycle of the Data Quality Program 86
5.15 Summary 90
Chapter 6 Metrics and Performance Improvement 91
6.1 Performance-Oriented Data Quality 92
6.2 Developing Data Quality Metrics 93
6.3 Measurement and Key Data Quality Performance Indicators 96
6.4 Statistical Process Control 99
6.5 Control Charts 101
6.6 Kinds of Control Charts 105
6.7 Interpreting Control Charts 109
6.8 Finding Special Causes 111
6.9 Maintaining Control 112
6.10 Summary 112
Chapter 7 Data Governance 115
7.1 The Enterprise Data Quality Forum 116
7.2 The Data Quality Charter 116
7.3 Mission and Guiding Principles 117
7.4 Roles and Responsibilities 118
7.5 Operational Structure 122
7.6 Data Stewardship 122
7.7 Data Quality Validation and Certification 125
7.8 Issues and Resolution 127
7.9 Data Governance and Federated Communities 127
7.10 Summary 128
Chapter 8 Dimensions of Data Quality 129
8.1 What Are Dimensions of Data Quality? 130
8.2 Categorization of Dimensions 131
8.3 Describing Data Quality Dimensions 134
8.4 Intrinsic Dimensions 135
8.5 Contextual Dimensions 138
8.6 Qualitative Dimensions 142
8.7 Finding Your Own Dimensions 146
8.8 Summary 146
Chapter 9 Data Requirements Analysis 147
9.1 Business Uses of Information and Business Analytics 148
9.2 Business Drivers and Data Dependencies 151
9.3 What Is Data Requirements Analysis? 152
9.4 The Data Requirements Analysis Process 154
9.5 Defining Data Quality Rules 160
9.6 Summary 164
Chapter 10 Metadata and Data Standards 167
10.1 Challenges 168
10.2 Data Standards 169
10.3 Metadata Management 171
10.4 Business Metadata 173
10.5 Reference Metadata 176
10.6 Data Elements 179
10.7 Business Metadata 183
10.8 A Process for Data Harmonization 185
10.9 Summary 189
Chapter 11 Data Quality Assessment 191
11.1 Planning 192
11.2 Business Process Evaluation 194
11.3 Preparation and Data Analysis 197
11.4 Data Profiling and Analysis 199
11.5 Synthesis of Analysis Results 202
11.6 Review with Business Client 205
11.7 Summary: Rapid Data Assessment Results 206
Chapter 12 Remediation and Improvement Planning 207
12.1 Triage 208
12.2 The Information Flow Map 212
12.3 Root Cause Analysis 215
12.4 Remediation 216
12.5 Execution 218
12.6 Summary 218
Chapter 13 Data Quality Service Level Agreements 219
13.1 Business Drivers and Success Criteria 220
13.2 Identifying Data Quality Rules 223
13.3 Establishing Data Quality Control 227
13.4 The Data Quality Service Level Agreement 228
13.5 Inspection and Monitoring 230
13.6 Data Quality Metrics and a Data Quality Scorecard 232
13.7 Data Quality Incident Reporting and Tracking 232
13.8 Automating the Collection of Metrics 234
13.9 Reporting the Scorecard 235
13.10 Taking Action for Remediation 239
13.11 Summary: Managing Using the Data Quality Scorecard 239
Chapter 14 Data Profiling 241
14.1 Application Contexts for Data Profiling 242
14.2 Data Profiling: Algorithmic Techniques 245
14.3 Data Reverse Engineering 248
14.4 Analyzing Anomalies 249
14.5 Data Quality Rule Discovery 251
14.6 Metadata Compliance and Data Model Integrity 254
14.7 Coordinating the Participants 256
14.8 Selecting a Data Set for Analysis 257
14.9 Summary 259
Chapter 15 Parsing and Standardization 261
15.1 Data Error Paradigms 262
15.2 The Role of Metadata 264
15.3 Tokens: Units of Meaning 266
15.4 Parsing 268
15.5 Standardization 270
15.6 Defining Rules and Recommending Transformations 272
15.7 The Proactive versus Reactive Paradox 275
15.8 Integrating Data Transformations into the Application Framework 277
15.9 Summary 277
Chapter 16 Entity Identity Resolution 279
16.1 The Lure of Data Correction 280
16.2 The Dual Challenge of Unique Identity 281
16.3 What Is an Entity? 282
16.4 Identifying Attributes 283
16.5 Similarity Analysis and the Matching Process 285
16.6 Matching Algorithms 286
16.7 False Positives, False Negatives, and Thresholding 289
16.8 Survivorship 291
16.9 Monitoring Linkage and Survivorship 293
16.10 Entity Search and Match and Computational Complexity 293
16.11 Applications of Identity Resolution 294
16.12 Evaluating Business Needs 296
16.13 Summary 296
Chapter 17 Inspection, Monitoring, Auditing, and Tracking 299
17.1 The Data Quality Service Level Agreement Revisited 300
17.2 Instituting Inspection and Monitoring: Technology and Process 300
17.3 Data Quality Business Rules 304
17.4 Automating Inspection and Monitoring 307
17.5 Incident Reporting, Notifications, and Issue Management 309
17.6 Putting It Together 312
Chapter 18 Data Enhancement 313
18.1 The Value of Enhancement 314
18.2 Approaches to Data Enhancement 315
18.3 Examples of Data Enhancement 316
18.4 Enhancement through Standardization 319
18.5 Enhancement through Context 320
18.6 Enhancement through Data Merging 321
18.7 Summary: Qualifying Data Sources for Enhancement 324
Chapter 19 Master Data Management and Data Quality 327
19.1 What Is Master Data? 328
19.2 What Is Master Data Management? 330
19.3 "Golden Record" or "Unified View"? 331
19.4 Master Data Management as a Tool 332
19.5 MDM: A High-Level Component Approach 333
19.6 Master Data Usage Scenarios 336
19.7 Master Data Management Architectures 339
19.8 Identifying Master Data 343
19.9 Master Data Services 344
19.10 Summary: Approaching MDM and Data Quality 349
Chapter 20 Bringing It All Together 351
20.1 Organization and Management 351
20.2 Building the Information Quality Program 360
20.3 Techniques and Tools 373
20.4 Summary 383
Index 385