THE PRACTITIONER'S GUIDE TO DATA QUALITY IMPROVEMENT. DAVID LOSHIN




THE PRACTITIONER'S GUIDE TO DATA QUALITY IMPROVEMENT
DAVID LOSHIN
ELSEVIER
AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO
Morgan Kaufmann Publishers is an imprint of Elsevier

CONTENTS

Foreword
Preface
Acknowledgments
About the Author

Chapter 1 Business Impacts of Poor Data Quality
1.1 Information Value and Data Quality Improvement
1.2 Business Expectations and Data Quality
1.3 Qualifying Impacts
1.4 Some Examples
1.5 More on Impact Classification
1.6 Business Impact Analysis
1.7 Additional Impact Categories
1.8 Impact Taxonomies and Iterative Refinement
1.9 Summary: Translating Impact into Performance

Chapter 2 The Organizational Data Quality Program
2.1 The Virtuous Cycle of Data Quality
2.2 Data Quality Processes
2.3 Stakeholders and Participants
2.4 Data Quality Tools
2.5 Summary

Chapter 3 Data Quality Maturity
3.1 The Data Quality Strategy
3.2 A Data Quality Framework
3.3 A Data Quality Capability/Maturity Model
3.4 Mapping Framework Components to the Maturity Model
3.5 Summary

Chapter 4 Enterprise Initiative Integration
4.1 Planning Initiatives
4.2 Framework Initiatives
4.3 Operational and Application Initiatives
4.4 Scoping Issues
4.5 Summary

Chapter 5 Developing a Business Case and a Data Quality Road Map
5.1 Return on the Data Quality Investment
5.2 Developing the Business Case
5.3 Finding the Business Impacts
5.4 Researching Costs
5.5 Correlating Impacts and Causes
5.6 The Impact Matrix
5.7 Problems, Issues, Causes
5.8 Mapping Impacts to Data Flaws
5.9 Estimating the Value Gap
5.10 Prioritizing Actions
5.11 The Data Quality Road Map
5.12 Practical Steps for Developing the Road Map
5.13 Accountability, Responsibility, and Management
5.14 The Life Cycle of the Data Quality Program
5.15 Summary

Chapter 6 Metrics and Performance Improvement
6.1 Performance-Oriented Data Quality
6.2 Developing Data Quality Metrics
6.3 Measurement and Key Data Quality Performance Indicators
6.4 Statistical Process Control
6.5 Control Charts
6.6 Kinds of Control Charts
6.7 Interpreting Control Charts
6.8 Finding Special Causes
6.9 Maintaining Control
6.10 Summary

Chapter 7 Data Governance
7.1 The Enterprise Data Quality Forum
7.2 The Data Quality Charter
7.3 Mission and Guiding Principles
7.4 Roles and Responsibilities
7.5 Operational Structure
7.6 Data Stewardship
7.7 Data Quality Validation and Certification
7.8 Issues and Resolution
7.9 Data Governance and Federated Communities
7.10 Summary

Chapter 8 Dimensions of Data Quality
8.1 What Are Dimensions of Data Quality?
8.2 Categorization of Dimensions
8.3 Describing Data Quality Dimensions
8.4 Intrinsic Dimensions
8.5 Contextual
8.6 Qualitative Dimensions
8.7 Finding Your Own Dimensions
8.8 Summary

Chapter 9 Data Requirements Analysis
9.1 Business Uses of Information and Business Analytics
9.2 Business Drivers and Data Dependencies
9.3 What Is Data Requirements Analysis?
9.4 The Data Requirements Analysis Process
9.5 Defining Data Quality Rules
9.6 Summary

Chapter 10 Metadata and Data Standards
10.1 Challenges
10.2 Data Standards
10.3 Metadata Management
10.4 Business Metadata
10.5 Reference Metadata
10.6 Data Elements
10.7 Business Metadata
10.8 A Process for Data Harmonization
10.9 Summary

Chapter 11 Data Quality Assessment
11.1 Planning
11.2 Business Process Evaluation
11.3 Preparation and Data Analysis
11.4 Data Profiling and Analysis
11.5 Synthesis of Analysis Results
11.6 Review with Business Client
11.7 Summary: Rapid Data Assessment - Tangible Results

Chapter 12 Remediation and Improvement Planning
12.1 Triage
12.2 The Information Flow Map
12.3 Root Cause Analysis
12.4 Remediation
12.5 Execution
12.6 Summary

Chapter 13 Data Quality Service Level Agreements
13.1 Business Drivers and Success Criteria
13.2 Identifying Data Quality Rules
13.3 Establishing Data Quality Control
13.4 The Data Quality Service Level Agreement
13.5 Inspection and Monitoring
13.6 Data Quality Metrics and a Data Quality Scorecard
13.7 Data Quality Incident Reporting and Tracking
13.8 Automating the Collection of Metrics
13.9 Reporting the Scorecard
13.10 Taking Action for Remediation
13.11 Summary - Managing Using the Data Quality Scorecard

Chapter 14 Data Profiling
14.1 Application Contexts for Data Profiling
14.2 Data Profiling: Algorithmic Techniques
14.3 Data Reverse Engineering
14.4 Analyzing Anomalies
14.5 Data Quality Rule Discovery
14.6 Metadata Compliance and Data Model Integrity
14.7 Coordinating the Participants
14.8 Selecting a Data Set for Analysis
14.9 Summary

Chapter 15 Parsing and Standardization
15.1 Data Error Paradigms
15.2 The Role of Metadata
15.3 Tokens: Units of Meaning
15.4 Parsing
15.5 Standardization
15.6 Defining Rules and Recommending Transformations
15.7 The Proactive versus Reactive Paradox
15.8 Integrating Data Transformations into the Application Framework
15.9 Summary

Chapter 16 Entity Identity Resolution
16.1 The Lure of Data Correction
16.2 The Dual Challenge of Unique Identity
16.3 What Is an Entity?
16.4 Identifying Attributes
16.5 Similarity Analysis and the Matching Process
16.6 Matching Algorithms
16.7 False Positives, False Negatives, and Thresholding
16.8 Survivorship
16.9 Monitoring Linkage and Survivorship
16.10 Entity Search and Match and Computational Complexity
16.11 Applications of Identity Resolution
16.12 Evaluating Business Needs
16.13 Summary

Chapter 17 Inspection, Monitoring, Auditing, and Tracking
17.1 The Data Quality Service Level Agreement Revisited
17.2 Instituting Inspection and Monitoring: Technology and Process
17.3 Data Quality Business Rules
17.4 Automating Inspection and Monitoring
17.5 Incident Reporting, Notifications, and Issue Management
17.6 Putting It Together

Chapter 18 Data Enhancement
18.1 The Value of Enhancement
18.2 Approaches to Data Enhancement
18.3 Examples of Data Enhancement
18.4 Enhancement through Standardization
18.5 Enhancement through Context
18.6 Enhancement through Data Merging
18.7 Summary: Qualifying Data Sources for Enhancement

Chapter 19 Master Data Management and Data Quality
19.1 What Is Master Data?
19.2 What Is Master Data Management?
19.3 "Golden Record" or "Unified View"?
19.4 Master Data Management as a Tool
19.5 MDM: A High-Level Component Approach
19.6 Master Data Usage Scenarios
19.7 Master Data Management Architectures
19.8 Identifying Master Data
19.9 Master Data Services
19.10 Summary: Approaching MDM and Data Quality

Chapter 20 Bringing It All Together
20.1 Organization and Management
20.2 Building the Information Quality Program
20.3 Techniques and Tools
20.4 Summary

Index