Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality
Jay Zaidi and Bonnie O'Neil (Fannie Mae)
Data Governance Winter Conference, Ft. Lauderdale, Florida, November 16-18, 2011
Agenda
1. Introduction
2. Data Quality Challenges and Opportunities
3. Holistic Data Quality (HDQ)
4. Enterprise Data Quality Solutions Architecture
5. Enterprise Data Quality Dashboard Example
Meet the Authors: Jay Zaidi
Enterprise Data Quality Program Lead, Fannie Mae
15+ years in Enterprise Data Management and Solution Architecture
Specialized in Financial Services and Healthcare domains
Meet the Authors: Bonnie O'Neil
Technical Data Architect, Fannie Mae
20+ years as a Data Architect
Author of 3 books; most recent: Business Metadata
Author of over 50 articles and white papers
Data Quality Management Challenges and Opportunities
Challenge -> Opportunity
Data Silos -> Holistic Data Quality (HDQ)
Data Volumes and Velocity -> Data Optimization and Scalability
Complex Data Architectures -> Simplify Data Architecture
Real Time Enterprise Requirements -> Real Time Data Quality Monitoring
Lack of Accountability -> Strong Data Governance
Reactive Mode -> Proactive Data Quality Controls
Lack of Straight Through Processing -> Automated Controls and Monitoring
Structured and Unstructured Data (email, video, logs, system events, etc.) -> Leverage Big Data Solutions
A high level of maturity in Data Quality Management is required to address operational challenges.
The Data Quality Maturity Journey
Step One: Foundation & Framework
Step Two: Constructing the Railroad
Step Three: Execution
Activities across the three steps: DQ Use Cases; Solution Architecture; Industry Tool Selection; Consistent DQ Definitions; Tool Deployment; Reporting Capabilities; Training & Communication; Change Management Awareness; Proactive DQ Controls; DQ Continuous Improvement
Robust data quality management is required to support Regulatory Compliance, Risk Management, Accounting, Financial Reporting and other business functions.
The Data Architecture Spaghetti
[Diagram by Arnon Rotem-Gal-Oz, April 2007. Labels: Department One, Department Two, Department Three, Transactional Store, Operational Data Store, Data Mart, Data Warehouses]
How do you manage the quality of business-critical data in a dynamic and highly complex environment?
The Information Supply Chain
Transparency into quality across the supply chain
[Diagram by George Marinos, "The Information Supply Chain: Achieving Business Objectives by Enhancing Critical Business Processes", April 2005]
Each link of the information supply chain depends on the others, so strong controls are needed to manage business-critical data.
Typical Current State Data Flow
[Diagram labels: External Data Feeds; Transactional and Operational Stores; Data Warehouse; Data Marts. Legend: potential data quality problem]
The current siloed approach to data management is wasteful and doesn't provide transparency into systemic issues.
Future State Data Flow: Continuous Data Quality Monitoring
[Diagram labels: External Data Feeds; Transactional and Operational Stores; Data Warehouse; Data Marts; DQ Monitoring]
Enterprise Data Architecture should enable straight through processing and offer operational efficiencies.
Typical Business Scenario
Analyze data and conduct forensics (Data Quality Tool)
Implement real-time data quality using DQ Services (Data Quality Tool)
Identify anomalies and remediate issues (Data Quality Tool and EDQ Dashboard)
[Diagram labels: Internally or Externally Supplied Data; Enterprise Applications; Reports & Executive Dashboards; Enterprise Data Stores (Transactional, Operational, Marts and Warehouses)]
The Enterprise Data Quality Platform provides the tools, methodologies and best practices to identify and remediate data quality issues.
Holistic Data Quality
"Our focus should be on addressing systemic issues. This requires a switch from reactive to proactive approaches to data quality and quality that is not evaluated or managed in silos, but addressed using a holistic cross-silo approach. Holistic Data Quality (HDQ) is the term that I have coined to address this need." -Jay Zaidi
Implementing HDQ at the enterprise level is a strategic, multi-year effort for mid- to large-sized firms. If done right, the return on investment is manyfold.
Do Not Boil the Ocean
Narrowing the scope of the effort will ensure success. Identify the data that is critical for the enterprise.
General population of data elements*: 10,000 to 20,000
Critical data for a line of business* ("LOB Critical"): 2,000 to 3,000
Critical data for the enterprise* ("Enterprise Critical"): 400 to 500
Initial focus should be on Enterprise Critical data.
* Estimates only
Enterprise-level governance and quality efforts should focus on Enterprise Critical data. Lines of business should govern and manage the quality of their business-critical data.
Dimensions of Data Quality
The concept of Dimensions of Data Quality has been established by many authors in the industry, such as David Loshin and Danette McGilvray:
"To be able to correlate data quality issues to business impacts, we must be able to both classify our data quality expectations as well as our business impact criteria." -David Loshin
Dimensions are facets or specific measurements of data quality, pertaining to specific data elements.
The authors propose many variations, but the main ones that most agree on are:
Accuracy
Conformity
Completeness
Consistency/Duplication
Timeliness (sometimes called Currency)
Integrity
Data Quality Dimensions facilitate the consistent definition of data quality requirements and metrics across various organizations.
Dimensions of Data Quality - Explanation
Accuracy: How closely does the data conform to the real world?
Completeness: How much required data is missing?
Conformity: How closely does the data conform to formats and domain values?
Duplication: Does the same data exist in multiple systems? If so, is it represented the same way?
Integrity: Does the data conform to integrity rules appropriately? Are relationships between elements retained?
Currency: How current is the data? When was it last entered or refreshed?
There are a dozen or more Data Quality Dimensions that can be defined, but organizations should pick the ones that best meet their needs.
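As a rough illustration of how the dimensions above can be turned into numbers, the following minimal Python sketch computes completeness, conformity and duplication scores for a small in-memory data set. The records, field names and the five-digit ZIP pattern are hypothetical and are not taken from the presentation. The duplication measure here only looks for repeated keys within one data set, which is a simplification of the cross-system question posed on the slide.

```python
import re

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"loan_id": "L001", "zip": "33301", "rate": 4.25},
    {"loan_id": "L002", "zip": "3330", "rate": None},
    {"loan_id": "L002", "zip": "20016", "rate": 3.90},
    {"loan_id": "L004", "zip": None, "rate": 5.10},
]


def completeness(rows, field):
    """Share of rows where the field is populated (not None or empty)."""
    populated = sum(1 for r in rows if r.get(field) not in (None, ""))
    return populated / len(rows)


def conformity(rows, field, pattern):
    """Share of populated values that fully match the expected format."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0  # nothing to check, so nothing violates the format
    matches = sum(1 for v in values if re.fullmatch(pattern, str(v)))
    return matches / len(values)


def duplication(rows, key):
    """Share of rows whose key value also appears on another row."""
    counts = {}
    for r in rows:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return sum(1 for r in rows if counts[r[key]] > 1) / len(rows)


zip_completeness = completeness(records, "zip")
zip_conformity = conformity(records, "zip", r"\d{5}")
loan_id_duplication = duplication(records, "loan_id")
```

Accuracy and currency are left out of the sketch because they need an external reference (the real-world value, or a load timestamp) that this toy data set does not carry.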
Replace Paper Reports with Business Intelligence
Operational Incidents
Audit Findings
Data Quality Issues Report
Regulatory Compliance Issues
Weekly Data Management Status Reports
Replace mounds of paper with a business intelligence solution to gain on-demand access to summary and detailed information on key quality indicators.
Business Intelligence for Enterprise Data Quality
SOLUTION COMPONENTS
Business intelligence tool (COTS)
Data quality commercial-off-the-shelf (COTS) product
Data profiling, standardization, cleansing, normalization, etc.
Data quality rules repository
Data quality rules engine
Data quality results repository
Data quality data mart (custom)
Data quality issue management system
Extract, Transform and Load (ETL) product
Enterprise Service Bus (SOA and Data Quality Services)
[Diagram labels: Data Stores; Files; Data Quality Tool (Profiling/Rule Execution); Data Quality Rules; Data Quality Results; ETL; Data Quality Mart; Business Intelligence Tool; Enterprise Dashboard]
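To make the relationship between the rules repository, rules engine and results repository listed above more concrete, here is a minimal Python sketch. It is not the COTS product's API: the `Rule` class, the `run_rules` function, the rule identifiers and the sample rows are all hypothetical, and in-memory lists stand in for the repositories.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One entry in a (hypothetical) data quality rules repository."""
    rule_id: str
    dimension: str                  # e.g. "Completeness", "Conformity"
    element: str                    # data element the rule applies to
    check: Callable[[dict], bool]   # returns True when a row passes


# Stand-in for the rules repository.
RULES = [
    Rule("R1", "Completeness", "borrower_name",
         lambda row: bool(row.get("borrower_name"))),
    Rule("R2", "Conformity", "state",
         lambda row: row.get("state") in {"FL", "VA", "MD"}),
]


def run_rules(rules, rows):
    """Stand-in for the rules engine: apply every rule to every row."""
    results = []
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row))
        results.append({
            "rule_id": rule.rule_id,
            "dimension": rule.dimension,
            "element": rule.element,
            "rows_checked": len(rows),
            "rows_passed": passed,
            "pass_rate": passed / len(rows) if rows else 1.0,
        })
    return results


# Hypothetical rows pulled from a data store or file.
rows = [
    {"borrower_name": "A. Smith", "state": "FL"},
    {"borrower_name": "", "state": "VA"},
    {"borrower_name": "C. Jones", "state": "XX"},
    {"borrower_name": "D. Lee", "state": "MD"},
]

# Stand-in for the results repository that ETL would load into the mart.
results = run_rules(RULES, rows)
```

Keeping each result keyed by rule, dimension and data element is one plausible shape for records that an ETL job could load into a data quality mart and a BI tool could aggregate.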
Enterprise Data Quality Dashboard (Enterprise View)
[Dashboard mock-up. Panels: Overall Health; Health Indicators; Data Quality Maturity; Quality by Line of Business (Wholesale, Retail, Commercial); Critical Data Breakdown (Release 1, Release 2); Trending of Data Quality (Product Data, Customer Data); Regional Trend; Quality Rating for Each Data Element]
Enterprise Data Quality Dashboard (Retail Business View)
[Dashboard mock-up. Panels: Overall Health; Health Indicators; Critical Data Breakdown (Release 1, Release 2); Trending of Data Quality (Borrower Data, Loan Data); Data Store Trend; Quality Rating for Each LOB Data Element; Data Quality Server Utilization]
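The health indicators and per-element quality ratings on the dashboard views above could be derived from rule pass rates along the following lines. This is only a sketch: the green/yellow/red thresholds, the simple averaging used for overall health, and the data element names and scores are assumptions, not values from the presentation.

```python
from statistics import mean

# Hypothetical thresholds; a real program would agree these with governance.
GREEN_MIN = 0.98
YELLOW_MIN = 0.90


def health_indicator(score):
    """Map a 0..1 quality score to a traffic-light indicator."""
    if score >= GREEN_MIN:
        return "GREEN"
    if score >= YELLOW_MIN:
        return "YELLOW"
    return "RED"


# Hypothetical quality scores per line of business and data element.
element_scores = {
    "RETAIL": {"borrower_name": 0.99, "loan_amount": 0.99},
    "WHOLESALE": {"counterparty_id": 0.80, "trade_date": 0.90},
    "COMMERCIAL": {"property_type": 0.95, "appraisal_value": 0.93},
}


def overall_health(scores_by_lob):
    """Roll element scores up to one score and indicator per line of business."""
    summary = {}
    for lob, elements in scores_by_lob.items():
        lob_score = mean(elements.values())  # unweighted average (assumption)
        summary[lob] = {
            "score": round(lob_score, 4),
            "indicator": health_indicator(lob_score),
            "elements": {name: health_indicator(s)
                         for name, s in elements.items()},
        }
    return summary


summary = overall_health(element_scores)
```

An unweighted mean is the simplest roll-up; weighting Enterprise Critical elements more heavily would be a natural variation.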
Continuously Measure and Improve Quality
Step 1 - Define: Define the scope, goal, budget, duration and the data quality problem to be addressed.
Step 2 - Measure: All relevant data quality statistics and measures important to the enterprise are collected at this stage.
Step 3 - Analyze and Improve: The data collected in the previous phase is analyzed and the root cause(s) identified. Data remediation is implemented to improve the quality of data.
Step 4 - Control: Monitor the quality after remediation to ensure that data is defect-free. If further changes are needed, the team makes them and measures the quality again.
The Enterprise Data Quality dashboard provides transparency into data quality hotspots that must be addressed proactively.
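The Control step, combined with an alerting mechanism, can be sketched as a check that runs after each measurement cycle. The function name `evaluate`, the threshold of 0.95, the allowed drop of 0.02 and the score history below are all hypothetical values chosen for illustration.

```python
def evaluate(history, threshold=0.95, max_drop=0.02):
    """Return alert messages based on the latest score of each data element.

    history maps a data element name to its quality scores, oldest first.
    An alert is raised when the latest score is below the threshold, or
    when it fell by more than max_drop since the previous measurement.
    """
    alerts = []
    for element, scores in history.items():
        if not scores:
            continue  # nothing measured yet for this element
        latest = scores[-1]
        if latest < threshold:
            alerts.append(
                f"{element}: score {latest:.2f} is below threshold {threshold:.2f}"
            )
        if len(scores) >= 2 and scores[-2] - latest > max_drop:
            alerts.append(
                f"{element}: score dropped by {scores[-2] - latest:.2f} since the last run"
            )
    return alerts


# Hypothetical measurement history from three monitoring runs.
history = {
    "borrower_ssn": [0.99, 0.98, 0.99],
    "loan_amount": [0.97, 0.96, 0.91],
    "property_zip": [0.90, 0.93, 0.96],
}

alerts = evaluate(history)
for message in alerts:
    print(message)
```

With this sample history only `loan_amount` triggers alerts, once for falling below the threshold and once for the size of the drop, while `property_zip` stays quiet because it has recovered above the threshold.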
Summary
Effective data management provides order out of chaos.
Implementing Holistic Data Quality provides transparency into data quality issues across the information supply chain and helps in identifying systemic issues.
Focus must be on Enterprise Critical data initially. Do not try to boil the ocean.
The solution architecture's core components are the data quality COTS product, a data quality data mart and a business intelligence tool.
Proactive monitoring and measurement of data quality, coupled with an alerting mechanism, significantly reduces operational incidents.
Implementing HDQ is a strategic initiative and requires C-level sponsorship and support.
Questions?