Data Governance
David Loshin
Knowledge Integrity, Inc.
www.knowledge-integrity.com
(301) 754-6350
Risk and Governance
Objectives of governance:
- Identify explicit and hidden risks associated with data expectations
- Actualize implementation of business policy
- Provide framework for auditing compliance
- Oversee definition of critical data elements
- Manage enterprise data ownership and stewardship
- Provide management oversight for organizational observance of different kinds of information policies
Aligning Information Objectives and Business Strategy
- Clarify and understand the existing Information Architecture
- Create an inventory of data assets
  - Applications, data assets, documentation, metadata, usage
  - Inventory of data elements and owning application
[Figure labels: Sales, Human Resources, Marketing, Customer Service, Finance, Compliance, Legal]
Map Information Functions to Business Objectives
- Document the activities that support a business activity
  - Example: a website privacy policy specifies age limits for data sharing based on parent's permission
  - Implies the existence of child birth date and parent permission data elements
  - Function is to verify compliance with privacy constraints by checking those data elements
- Standardize mapping from business activity to application function
- Associate all related data elements with each application function
- Bottom-up assessment describes how information policy is implemented across application silos
- Objective: Correlate application functionality, business policy, and data life cycle
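The privacy-policy example above can be sketched as a small compliance check. This is only an illustration: the `UserRecord` type, the field names, and the age threshold of 13 are assumptions made for the sketch, not values taken from the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed threshold for illustration only; the real limit comes from the privacy policy.
MIN_AGE_WITHOUT_CONSENT = 13


@dataclass
class UserRecord:
    # The two data elements the policy implies must exist (names are hypothetical).
    child_birth_date: Optional[date]
    parent_permission: Optional[bool]


def age_on(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def may_share_data(record: UserRecord, today: date) -> bool:
    """Application function: verify the privacy constraint from the data elements."""
    if record.child_birth_date is None:
        # A missing birth date means compliance cannot be demonstrated.
        return False
    if age_on(record.child_birth_date, today) >= MIN_AGE_WITHOUT_CONSENT:
        return True
    # Below the age limit, sharing requires an explicit parental permission.
    return record.parent_permission is True
```

Note that a missing data element is treated as non-compliant, which is one way the mapping from policy to data elements exposes hidden risk.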
Areas of Information Risks
- Business/Financial: consistency across internal reports
- Regulatory Reporting: Sarbanes-Oxley, Basel II, 21 CFR 11, FAS 133
- Customer Knowledge: GLB, USA PATRIOT Act, BSA, Anti-Kickback Statute
- Protection of Private Information: HIPAA, GLB
- Collaboration: delays in straight-through processing, delayed settlement
- Limitation of Use: Digital Millennium Copyright Act
- Consensus and Collaboration: data ownership, semantics
Data Governance, Information, and Risks
- Missing or replicated data
- Nonstandard or complex data transformations
- Failed identity management processes
- Undocumented, incorrect, or misleading metadata
Missing or Replicated Data
- Absent or unfindable data leads to
  - Incomplete reporting
  - Inability to accurately calculate risk
- Many distributed databases feeding many financial applications lead to
  - Variant approaches to report generation
  - Untracked copying of reports into desktop applications
- Examples:
  - Basel II: Inaccurate or missing credit assessment data will impact correct calculation of credit risk
  - DoD Guidelines on Data Quality:
    - The inability to match payroll records to the official employment record can cost millions in payroll overpayments to deserters, prisoners, and ghost soldiers.
    - The inability to correlate purchase orders to invoices is a major problem in unmatched disbursements.
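The payroll example above amounts to finding records in one data set that have no counterpart in another. A minimal sketch of that check follows; the record layouts, the `employee_id` key, and the sample values are hypothetical.

```python
def unmatched(left, right, key):
    """Return the rows of `left` whose `key` value does not appear in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


# Hypothetical sample data: payroll payments and the official employment record.
payroll = [
    {"employee_id": "E1", "amount": 3200},
    {"employee_id": "E2", "amount": 2950},
    {"employee_id": "E9", "amount": 3100},
]
employment = [
    {"employee_id": "E1"},
    {"employee_id": "E2"},
]

# Payments with no matching employment record are candidates for review.
suspect_payments = unmatched(payroll, employment, "employee_id")
```

The same function could be pointed at purchase orders and invoices, assuming both carry a shared identifier.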
Nonstandard or Complex Data Transformations
- Original data definition and intent may reflect application dependencies and semantics
- Integration across multiple applications and organizational boundaries introduces numerous opportunities for transformation inconsistencies
- Complex data (e.g., semi-structured and unstructured documents) must be transformed into usable formats before processing
Failed Identity Management Processes
- Inability to uniquely identify entities (people, organizations, products, etc.)
- Inability to link multiple records representing the same entity
- Example: In 2004, Senator Ted Kennedy was subjected to extra screening when boarding a plane in Boston
  - A DHS spokesman said that Kennedy was misidentified as someone who was mistakenly identified as someone on a watch list
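Linking records that represent the same entity can be illustrated with a deliberately simple matching key. The sketch below assumes hypothetical `id`, `name`, and `birth_date` fields; real record linkage typically uses more attributes and approximate matching.

```python
import re
from collections import defaultdict


def match_key(record):
    """Normalize a name (case, punctuation, token order) and pair it with birth date."""
    tokens = re.findall(r"[a-z]+", record["name"].lower())
    return (" ".join(sorted(tokens)), record["birth_date"])


def link_records(records):
    """Group record ids that share a match key; return only groups with duplicates."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records.
people = [
    {"id": 1, "name": "Smith, John", "birth_date": "1970-01-01"},
    {"id": 2, "name": "John Smith", "birth_date": "1970-01-01"},
    {"id": 3, "name": "John Smith", "birth_date": "1982-05-09"},
]

linked = link_records(people)  # records 1 and 2 are linked; record 3 stays separate
```

Dropping `birth_date` from the key would link all three records, which is the kind of false match the watch-list example describes.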
Undocumented, Incorrect, or Misleading Metadata
- Laxity in enterprise metadata management leads to:
  - Assumptions about meanings of commonly used business terms
  - Implied qualification of data element meanings
  - Inconsistency across application and enterprise information architectures
  - Reduced trust in the correctness of the data
  - Limitations in resolving trade settlement and counterparty transactions
- Consolidation, integration, and migration are all impacted when variant definitions are assumed to mean the same thing
- Example: PWC estimates that 90% of the top 100 world banks are deficient in credit risk data management, including maintenance of clean counterparty static data repositories, common counterparty identifiers, staff dedicated to data quality, and consistent data standards.
Review: Challenges for Critical Data Elements
- Absence of clarity makes it difficult to determine semantics
- Ambiguity in definition introduces conflict into the process
- Lack of precision leads to inconsistency in representation and reporting
- Variant source systems and frameworks encourage turf-oriented biases
- Flexibility of data motion mechanisms leads to a multitude of approaches for data movement
Governance Commonalities
Information policies differ depending on related business risks, but share commonalities:
- Federation
- Defined Policy
- Transparency
- Auditability
Objectives
- Identify critical data elements
- Define/Refine information policies
- Describe metrics and measurements
- Create process for monitoring and evaluation
Critical Data Elements
Identify enterprise metadata in use across the organization and:
- Clarify unambiguous definitions, formats, and semantics
- Facilitate agreement to those definitions and semantics from all stakeholders
- Absorb replicated reference sets into a single managed repository
Define/Refine Information Policies
- Embody the specification of management objectives associated with data governance
- Relate assertions to related data sets
- Articulate how business policy is integrated with information asset
- Example: Anti-money laundering
  - Establishing policies and procedures to detect and report suspicious transactions
  - Ensuring compliance with the Bank Secrecy Act
  - Providing for independent testing for compliance to be conducted by outside parties
Metrics and Measurement
- Decompose information policies into specific measurable data rules
- Apply tools and techniques for measuring conformance to data rules (think: data profiling)
- Metrics can be rolled up from data rules defined as a byproduct of analyzing the information policy
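One way to picture "specific measurable data rules" is as named predicates over records, with conformance measured as the fraction of records that pass each rule. The rule names, fields, and sample records below are hypothetical, chosen only to show the measurement step.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Hypothetical data rules derived from an information policy.
RULES: Dict[str, Rule] = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def conformance(records: List[dict], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Fraction of records satisfying each rule (1.0 for an empty data set)."""
    total = len(records)
    return {
        name: (sum(1 for r in records if rule(r)) / total if total else 1.0)
        for name, rule in rules.items()
    }


# Hypothetical sample records with a missing id, a negative amount, and a null amount.
transactions = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "C2", "amount": -5.0},
    {"customer_id": "", "amount": 7},
    {"customer_id": "C4", "amount": None},
]

scores = conformance(transactions, RULES)
# scores == {"customer_id_present": 0.75, "amount_non_negative": 0.5}
```

Each score can then be compared against an acceptance threshold agreed with the business client.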
Monitoring and Evaluation
- One business policy can encompass multiple information policies
- Each information policy may encompass multiple data rules
- Each data rule, therefore, contributes to monitoring compliance with business policy!
[Figure: hierarchy with one Business Policy at the top, three Information Policies beneath it, and Data rules beneath the Information Policies]
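The hierarchy above suggests that rule-level scores can be rolled up to the information policy and then to the business policy. The sketch below assumes a roll-up by minimum (a policy is only as compliant as its weakest rule); that choice, the policy names, the rule names, and the scores are all illustrative assumptions.

```python
# Hypothetical hierarchy: business policy -> information policies -> data rule names.
POLICY_TREE = {
    "anti_money_laundering": {
        "report_suspicious_transactions": ["amount_present", "counterparty_identified"],
        "customer_identification": ["customer_id_present"],
    }
}

# Hypothetical conformance scores per data rule (fraction of records passing).
RULE_SCORES = {
    "amount_present": 1.0,
    "counterparty_identified": 0.5,
    "customer_id_present": 0.75,
}


def roll_up(tree, rule_scores):
    """Roll rule scores up to information policies, then to business policies."""
    result = {}
    for business_policy, info_policies in tree.items():
        info_scores = {
            name: min(rule_scores[rule] for rule in rules)
            for name, rules in info_policies.items()
        }
        result[business_policy] = {
            "score": min(info_scores.values()),
            "information_policies": info_scores,
        }
    return result


report = roll_up(POLICY_TREE, RULE_SCORES)
# report["anti_money_laundering"]["score"] == 0.5
```

A weighted average would be an equally reasonable roll-up; the point is only that every data rule feeds a business-level measure.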
A Repeatable Data Quality Process
- Identify actual problems with the data as they relate to business client expectations
- Identify specific business impacts attributable to those problems
- Quantify the size of those impacts for prioritization
- Evaluate the costs to reconcile the data quality problems
- Once these details have been identified, the value of improved data quality can be quantified
- Prioritize and select projects for improvement
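The quantify-then-prioritize steps can be sketched as a simple ranking by net value, taken here as estimated impact minus remediation cost. The issue names and figures are invented for illustration, and net value is only one possible ranking criterion.

```python
# Hypothetical issues with an estimated annual impact and a one-time fix cost.
issues = [
    {"name": "duplicate customer records", "annual_impact": 400_000, "fix_cost": 150_000},
    {"name": "missing credit assessments", "annual_impact": 900_000, "fix_cost": 300_000},
    {"name": "inconsistent product codes", "annual_impact": 120_000, "fix_cost": 100_000},
]


def net_value(issue):
    """Value of improved data quality: impact avoided minus cost to reconcile."""
    return issue["annual_impact"] - issue["fix_cost"]


def prioritize(candidates):
    """Order candidate improvement projects by descending net value."""
    return sorted(candidates, key=net_value, reverse=True)


ranked = [issue["name"] for issue in prioritize(issues)]
```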
DQ Management Goals
- Evaluate business impact of poor data quality and develop ROI models for Data Quality activities
- Document the information architecture showing data models, metadata, information usage, and information flow throughout the enterprise
- Identify, document, and validate Data Quality expectations
- Educate your staff in ways to integrate Data Quality as an integral component of the system development lifecycle
- Establish a governance framework for Data Quality event tracking and ongoing Data Quality measurement, monitoring, and reporting of compliance with customer expectations
- Consolidate current and planned Data Quality guidelines, policies, and activities
Technical Data Governance Framework
[Figure: framework diagram with the labels Policies and Procedures, Roles & Responsibilities, Ongoing Monitoring, Audit & Compliance, Standards Oversight, Performance Metrics, Data Definitions, Master Reference Data, Taxonomies, Enterprise Architecture, Exchange Standards, Data Quality, Data Profiling, Data Cleansing, Auditing & Monitoring, Parsing & Standardization, Record Linkage, Data Integration, Data Access, Transformation, Delivery, Discovery & Assessment, Metadata Management]
Roles and Responsibilities
- Executive Sponsorship: Provides senior management support at the C-level, warrants the enterprise adoption of measurably high quality data, and negotiates quality SLAs with external data suppliers.
- Data Governance Oversight: Strategic committee composed of business clients to oversee the governance program, ensure that governance priorities are set and abided by, and delineate data accountability.
- Data Steering Committee: Tactical team tasked with ensuring that data activities have defined metrics and acceptance thresholds for quality meeting business client expectations; manages governance across lines of business, sets priorities for LOBs, and communicates opportunities to the Governance Oversight committee.
- LOB Data Governance (one per line of business): Data governance structure at the line of business level; defines data quality criteria for LOB applications, delineates stewardship roles, and reports activities and issues to the Data Coordination Council.
Metadata Consensus: Embedded in the Program
Workflow incorporates both consensus and governance
- Step One: Initial Request Submitted
- Step Two: Workgroup Formed
- Step Three: Completed Candidate Proposed
- Step Four: Public Comment
- Step Five: Steering Committee Approval
- Step Six: Data Governance Oversight Board Endorsement
[Figure: flowchart connecting the six steps through the nodes Review by Metadata Coordinator, Submission Development, Review by Steering Committee, Form Workgroup, and Review by Technical Committee; at each "Approved?" decision, "yes" proceeds and "no" is returned with explanation]
Data Governance Roles
- Data Governance Oversight Board
- Metadata Coordinator
- Data Steering Committee
- Technical Advisory Group
- Workgroup Member
- Data Quality Representative (Data Steward)
- Data Registrar
Data Governance Oversight Board
- Guides data quality management activities
- Oversees compliance with information policies and governance directives
- Approves governance policies
- Reviews and Endorses/Approves standards
- Institutes organizational data quality scorecard
Workgroups
- Cross-group collection of relevant stakeholders
- Involve representation from both the technical and business sides
- Act as interface to general user community
- Tasked with:
  - Developing proposed definitions and standards
  - Ensuring community collaboration
  - Ongoing maintenance of definitions and standards
The Steering Committee
Provides direction to those tasked with data quality and metadata management:
- Authorizes workgroup activities
- Provides direction for development of semantics, taxonomies, and ontologies
- Recommends standards to the Data Governance Oversight Board
- Ensures that data quality controls are in place
- Ensures that key data quality indicators are communicated to stakeholders and data owners
Technical Advisors
Tasked with:
- Providing technical input to workgroup definitions and standards development
- Identifying technical and infrastructure issues with standard definitions and expected uses
- Assessing business needs for tools and technology
- Updating and maintaining technical specs
- Providing guidance on implementation
- Identifying and documenting existence of source of truth data sets
Metadata Developers
- Encapsulate data element definitions, format specification, and semantics in a formal representation
- Facilitate development of:
  - Enterprise data definitions
  - Exchange/sharing schemas (e.g., fixed-format, XML)
  - Exchange application support (e.g., class definitions, code development, application objects)
  - Functional support for shared application capabilities for information life cycle
Metadata Registrar
- Provides support and configuration management for standards within the Metadata Registry
- Manages access to the Metadata Registry
- Facilitates and manages data standards activity workflows
- Helps develop procedures
- Promotes reuse across applications
Data Steward
Tasked with:
- Determining the relevant data sets to be subjected to data quality management
- Managing data quality
- Documenting, communicating, and tracking issues and concerns to relevant stakeholders
- Verifying the metadata
- Assuming accountability for managing the quality of data
- Establishing data quality service level agreements
Coordinating the Data Governance Processes
- Manages the various data quality activities of data owners and workgroups
- Compiles, maintains, and monitors data quality performance indicators in process
- Supports the metadata and data quality rules definition, registration, and development processes
- Develops policies and procedures
- Provides training and knowledge transfer
Engineering Data Quality into the System
[Figure: process diagram. Data sources: Flat File, RDBMS, IMS, VSAM, Application. Activities: Analyze/profile data; Assess data quality dimensions; Data quality, Validity, & Transformation rules; Create monitoring system; Recommend data transformations; Generate data quality reports; Send data quality reports to data owners. Outcome: Improved enterprise data quality]
Data Quality Life Cycle
- Initially, many new issues will be exposed
- Over time, identifying root causes and eliminating the source of problems will significantly reduce failure load
- Change from an organization that is fighting fires to one that is building data quality firewalls
- Transition from a reactive environment to a proactive one facilitates change management among data quality clients
[Figure: chart with axes labeled Errors and Time]
Data Quality and the SDLC
How can data quality become part of the system development lifecycle?
- Emphasize value of high quality information in business context
- Develop metrics and processes for measurement
- Extract implementation of validation from embedded sources and expose as business knowledge
- Integrate automated, business rule-based data quality testing and validation as part of system design
Stewardship: Remediation and Manual Intervention
Issues with addressing data quality events:
- Immediate remediation of flawed data: does this imply data correction?
- Not all data flaws can be captured via automated processes: this implies manual reviews
  - Accuracy may only be measured by comparing values directly
- Carefully integrate manual intervention when necessary in a controlled manner
Data Quality and Data Governance
Develop high level data quality management framework incorporating:
- Methods to evaluate business impact of poor data quality
- Technical requirements of data quality as part of SDLC
- Operational guidelines for ongoing monitoring, reporting, tracking, and management
- Knowledge capture, including the coordination of data modeling, data standards, metadata, and information usage modeling efforts
Pulling it All Together
- Review baseline of current business and information policies
- Develop a business case process for evaluating value of data quality improvement and risk mitigation
- Build an inventory of enterprise metadata
- Manage critical data elements
- Define/refine information policies and data rules
- Establish processes for measurements and monitoring
- Make accountability actionable
Questions?
If you have questions, comments, or suggestions, please contact me:
David Loshin
301-754-6350
loshin@knowledge-integrity.com