DATA GOVERNANCE AND DATA QUALITY
Kevin Lewis, Partner, Enterprise Data Management COE
Barb Swartz, Account Manager, Teradata Government Systems
Objectives of the Presentation
> Show that Data Governance and Data Quality are part of a larger EDM function
> Provide a process framework for effective Data Quality Management
> Explain the role of Governance and Stewardship in a Data Quality function
> Provide advice on aligning a Data Governance program to business value
2/28/12 Teradata Confidential
EDM Framework: A Path to Integrated and Trusted Information
> Data Governance: The practice of organizing and implementing principles, policies, procedures, and standards for the effective use of data
> Data Stewardship: Continual, day-to-day activities of creating, using, and retiring data
> Data Quality: Ensuring data is fit for its intended use
> Data Integration: Includes acquisition (ETL/ELT) processing to combine transaction and master data to provide a consistent, meaningful, and trusted view of the data across business units and subject areas
> Security and Privacy: Information security, data privacy, and regulatory compliance across data subject areas, including monitoring and audit capabilities
> Metadata Management: The people, processes, and technical components necessary to ensure that metadata is easily accessible, consistent, current, accurate, timely, and complete
> Master Data Management: Management of master data domains, such as Product and Customer data, that provide context for transactional data
> Data Architecture: The logical and physical data modeling, plus other activities needed to understand business information needs and design for effective database usage
[Figure: EDM framework. Master Data Management, Architecture, Metadata Management, Governance, Stewardship, Data Quality, Security and Privacy, and Integration surround Integrated and Trusted Information, resting on People, Processes, and Technology.]
Governance, Stewardship, and Enterprise Data Management
> Data Governance provides oversight for Enterprise Data Management (EDM)
> Data Stewardship provides the day-to-day business involvement in EDM activities
[Figure: EDM framework showing Governance and Stewardship in relation to the other EDM functions.]
Data Quality
The core dimensions of data quality are:
> Accuracy: data represents reality correctly
> Completeness: data gaps are minimized and data subjects are covered adequately
> Timeliness: data is stored in the system within an acceptable time after the business event
> Consistency: data is defined and reported with the same meaning and values across the enterprise
Governance determines the focus of data quality improvements based on business value.
Stewards provide business understanding of assigned data subjects.
[Figure: EDM framework with Data Quality highlighted.]
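Accuracy and consistency need an outside reference (reality, or a second system), but completeness and timeliness can be scored from the data itself. The sketch below is one minimal way to do that in Python, assuming each record carries a business-event timestamp and a load timestamp; the field names, sample records, and 24-hour threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical customer records: field names and values are illustrative only.
RECORDS = [
    {"name": "Ann Lee", "occupation": "Nurse",
     "event_time": datetime(2012, 2, 27, 9, 0), "load_time": datetime(2012, 2, 27, 18, 0)},
    {"name": "Bob Ray", "occupation": None,
     "event_time": datetime(2012, 2, 27, 9, 0), "load_time": datetime(2012, 2, 29, 9, 0)},
    {"name": "Cy Diaz", "occupation": "Teacher",
     "event_time": datetime(2012, 2, 28, 8, 0), "load_time": datetime(2012, 2, 28, 20, 0)},
    {"name": "Di Fox", "occupation": "",
     "event_time": datetime(2012, 2, 28, 8, 0), "load_time": datetime(2012, 2, 29, 8, 0)},
]


def completeness(records, field):
    """Share of records in which `field` is populated (None and "" count as gaps)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def timeliness(records, max_lag):
    """Share of records stored within `max_lag` of the business event."""
    if not records:
        return 0.0
    on_time = sum(1 for r in records if r["load_time"] - r["event_time"] <= max_lag)
    return on_time / len(records)


print(f"occupation completeness: {completeness(RECORDS, 'occupation'):.0%}")
print(f"timeliness (24-hour threshold): {timeliness(RECORDS, timedelta(hours=24)):.0%}")
```

With these sample records the sketch reports 50% occupation completeness and 75% timeliness; the threshold itself is a business decision, not something the code can infer.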
Dimensions of Data Quality: A Longer List
(Each dimension is followed by a conformance and a non-conformance example.)

Accuracy: A measure of information correctness.
  Conformance: A balance of $10,000 is stored as a balance of $10,000.
  Non-conformance: A balance of $10,000 is stored as a balance of $12,500.
Consistency: A measure of the degree of conflicts that exist in situations with redundant data.
  Conformance: A balance of $10,000 in the ABC system is also stored as $10,000 in the XYZ system.
  Non-conformance: A balance of $10,000 in the ABC system is stored as $12,500 in the XYZ system.
Entirety: A measure of the quantity of entities created versus the real world or the number of actual events.
  Conformance: All phone calls that were made were recorded and stored for billing.
  Non-conformance: Calls to a particular NPA-NNX were not recorded due to a switch profile problem. Revenue for these calls will be lost.
Breadth: A measure of the amount of information captured about an object or event.
  Conformance: All information about a specific call is captured, including duration, start and stop time, origination and termination information, billing information, network information, etc.
  Non-conformance: None of the network-related information for a specific call is captured. Nothing is known about how the call was handled by the network.
Completeness: A measure of information gaps within a specific entity occurrence.
  Conformance: Name, age, and occupation are known for all customers.
  Non-conformance: Name and age are known for all customers, but occupation is known for only 50% of the customers.
Uniqueness: A measure of unnecessary information replication.
  Conformance: Customer information is stored once for each customer.
  Non-conformance: Certain customer records are duplicated due to variations in the spelling of the name, alternate addresses, etc. The records are not linked in any way.
Interpretability: A measure of semantic standards being applied.
  Conformance: A date is stored as 11 June 2002.
  Non-conformance: A date stored as 11062002 is interpreted as November 06, 2002.
Timeliness: A measure of how current a record is.
  Conformance: All customer addresses represent the current place of dwelling.
  Non-conformance: Many customers have changed their address without informing the company.
Precision: A measure of exactness.
  Conformance: The amount of tax due for this specific transaction is $0.104.
  Non-conformance: The amount of tax due for this specific transaction is stored as $0.10.
Depth: A measure of the amount of entity or event history that is retained.
  Conformance: A complete history of orders, bills, and payments is retained for all customers.
  Non-conformance: Order, bill, and payment information is retained for only one year. Each month, the prior year's records for that month are deleted to make room for the new information.
Integrity: A measure of validity with respect to another item of related information.
  Conformance: A call detail record contains a "from" number of (404) 240-9999.
  Non-conformance: The Terminating Point Master table indicates that, due to an area code split, the 240 NNX is now in the 770 NPA.
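Two of the non-conformance examples above can be checked mechanically. This is a minimal Python sketch, assuming balances are available as dictionaries keyed by a hypothetical account ID: it reports accounts whose ABC and XYZ balances disagree (Consistency) and shows how an unlabeled date string such as 11062002 yields two different dates depending on the assumed layout (Interpretability).

```python
from datetime import datetime

# Hypothetical balances for the same accounts held in two systems.
ABC = {"A-100": 10000.00, "A-200": 5000.00, "A-300": 750.00}
XYZ = {"A-100": 12500.00, "A-200": 5000.00}


def consistency_conflicts(left, right, tolerance=0.005):
    """Keys held in both systems whose values disagree by more than `tolerance`.

    Keys held in only one system (A-300 here) are an entirety/completeness
    problem rather than a consistency conflict, so they are not reported.
    """
    shared = left.keys() & right.keys()
    return sorted(k for k in shared if abs(left[k] - right[k]) > tolerance)


def interpretations(raw):
    """Read one unlabeled 8-digit date under two plausible layouts."""
    return {
        "DDMMYYYY": datetime.strptime(raw, "%d%m%Y").date(),
        "MMDDYYYY": datetime.strptime(raw, "%m%d%Y").date(),
    }


print(consistency_conflicts(ABC, XYZ))
for layout, value in interpretations("11062002").items():
    print(layout, value.strftime("%d %B %Y"))
```

The conflict check reports only A-100, and the same string reads as 11 June 2002 or 06 November 2002, which is why storing dates in an unambiguous format matters.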
Data Quality Business Example
> Business objective: Control out-of-stocks and inventory carrying costs
> Action: Provide order suggestions to grocery stock clerks based on forecasted sales and current inventory balances
> Problem: Incorrect inventory balances in the system
> Root cause (example): Cashiers not correctly identifying produce items
> Fix: Label loose produce items with a lookup code and a GS1 DataBar
> Finding the problem (profiling): Find an unusual percentage breakdown in sales data for certain produce categories
> Monitoring (scorecarding): Establish a rule and threshold for the expected versus actual percentage breakdown
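The profiling and scorecarding steps in this example can be sketched in Python as follows. Everything here is assumed for illustration: the item names, the scan counts, the expected shares, and the 10-point threshold. In practice the expected breakdown would come from a trusted baseline such as deliveries or a known-clean period.

```python
from collections import Counter

# Hypothetical produce scans for one store and week: each entry is the item
# the cashier rang up.
SCANS = ["banana"] * 48 + ["red apple"] * 40 + ["gala apple"] * 4 + ["fuji apple"] * 8

# Hypothetical expected share of each item within the produce category.
EXPECTED = {"banana": 0.45, "red apple": 0.20, "gala apple": 0.20, "fuji apple": 0.15}

THRESHOLD = 0.10  # flag items whose share is off by more than 10 percentage points


def breakdown(scans):
    """Profiling: percentage breakdown of scans by item."""
    counts = Counter(scans)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: n / total for item, n in counts.items()}


def scorecard(actual, expected, threshold):
    """Monitoring: items whose actual share deviates from expected beyond the threshold."""
    flagged = {}
    for item, expected_share in expected.items():
        deviation = actual.get(item, 0.0) - expected_share
        if abs(deviation) > threshold:
            flagged[item] = round(deviation, 2)
    return flagged


print(scorecard(breakdown(SCANS), EXPECTED, THRESHOLD))
```

With these numbers, red apple is flagged as over-represented and gala apple as under-represented, the pattern you would expect if cashiers were ringing gala apples up as generic red apples.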
Data Quality Improvement Process Model
Step 1: Select & Define
Step 2: Profile
Step 3: Analyze
Step 4: Trace Root Causes
Step 5: Fix Root Causes
Step 6: Monitor & Trend
[Figure: the six steps arranged as a cycle, with a chart of number of errors versus value at Step 3, a chart of error count over time at Step 6, and the labels People, Process, Information, Technology.]
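Step 6 plots error count against time. One simple way to summarize that trend is the least-squares slope. The sketch below assumes weekly error counts for a single rule (invented numbers) and a hypothetical trigger that restarts the improvement cycle when errors stop falling.

```python
from statistics import mean

# Hypothetical weekly error counts for one monitored data quality rule.
ERROR_COUNTS = [120, 110, 104, 90, 85, 70]


def trend_slope(counts):
    """Least-squares slope of error count against period number (errors per period)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(counts)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def needs_attention(counts):
    """Hypothetical trigger: a flat or rising error trend starts a new improvement cycle."""
    return trend_slope(counts) >= 0


slope = trend_slope(ERROR_COUNTS)
print(f"slope: {slope:.1f} errors/week, needs attention: {needs_attention(ERROR_COUNTS)}")
```

Here the slope is about -9.7 errors per week, so the hypothetical trigger does not fire. A slope is a deliberately coarse summary; a real scorecard would usually also compare each period against a fixed threshold.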
Governance and Stewardship Roles for Data Quality
> Step 1, Select & Define: the Governance Council determines the appropriate focus; the Steward brings the business meaning of the data
> Steps 2 and 3, Profile and Analyze: the Steward helps interpret profiling results
> Step 4, Trace Root Causes: the Steward helps determine business root causes
> Step 5, Fix Root Causes: the Steward approves IT fixes and facilitates business change
> Step 6, Monitor & Trend: the Steward monitors data quality and initiates improvement
Technology Enablers for Data Quality
> Profile and Analyze (Steps 2 and 3): profiling tools
> Fix Root Causes (Step 5): match / merge tools, MDM, enrichment, and input controls
> Monitor & Trend (Step 6): data quality scorecarding / monitoring tools
[Figure: the six-step improvement cycle annotated with these tool categories.]
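Match/merge tools address the Uniqueness problem of duplicate customer records caused by spelling variations. The sketch below uses Python's standard `difflib.SequenceMatcher` for fuzzy name matching. The match rule (same ZIP code, similarity of at least 0.9) and the survivorship rule (keep the lower ID, fill its gaps from the duplicate) are assumptions; commercial match/merge and MDM tools use considerably richer matching.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; records 1 and 2 differ only by a spelling variation.
CUSTOMERS = [
    {"id": 1, "name": "Katherine Smith", "zip": "30301", "phone": None},
    {"id": 2, "name": "Katharine  Smith", "zip": "30301", "phone": "404-555-0100"},
    {"id": 3, "name": "John Doe", "zip": "10001", "phone": "212-555-0199"},
]


def normalize(name):
    """Lowercase and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def is_match(a, b, threshold=0.9):
    """Match rule (assumed): same ZIP code and name similarity at or above threshold."""
    if a["zip"] != b["zip"]:
        return False
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold


def merge(a, b):
    """Survivorship rule (assumed): keep the lower-ID record, fill its gaps from the other."""
    keep, other = (a, b) if a["id"] < b["id"] else (b, a)
    return {k: keep[k] if keep[k] is not None else other.get(k) for k in keep}


def match_and_merge(records):
    """Collapse matching records into one surviving record each."""
    golden = []
    for rec in records:
        for i, survivor in enumerate(golden):
            if is_match(survivor, rec):
                golden[i] = merge(survivor, rec)
                break
        else:
            golden.append(dict(rec))
    return golden


for row in match_and_merge(CUSTOMERS):
    print(row)
```

The two Smith records collapse into one record that keeps ID 1 and picks up the phone number from ID 2; John Doe is untouched.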
The Role of the Data Warehouse in Data Quality Improvement
[Figure: business processes and source databases feed the data warehouse (DW), which supports DW uses (CRM, Mining, etc.). Labeled flows: Cleanse; DQ Scorecard; Awareness, Training, Motivation; System & Process changes.]
Data Management Organization
[Figure: organization chart with the Executive Steering Committee, Governance Council, Stewards (Business and IT), and Business Intelligence Competency Center.]
> Executive Steering Committee: Provides the ultimate authority needed to unify information across the organization
> Governance Council: Represents the entire organization to facilitate efforts that unify information
> Stewards: Work across business areas and systems to ensure the integrity of assigned data subjects
> Business Intelligence Competency Center (BICC): Provides information and analytical services to the enterprise
The structures shown here are primarily business-focused. IT supports these organizations by ensuring that IT solutions are in place to enable each area of EDM.
Stewardship Matrix
[Figure: matrix template for assigning people. Domains: Sales, Customer, Asset, Finance, Location, Campaign, etc. Primary roles: Owner, Steward, IT Steward. Business areas: BICC, Marketing, Purchasing, Operations, Sales, Accounting, Customer Service, Europe, South America, etc. Names go in the boxes.]
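A stewardship matrix is easy to keep as data, which makes empty boxes easy to find. A minimal Python sketch keyed by (domain, role), using the domains and primary roles listed on the slide; the business-area columns are left out, and the assignments are placeholders, not real names.

```python
# Domains and primary roles as listed on the stewardship matrix slide.
DOMAINS = ["Sales", "Customer", "Asset", "Finance", "Location", "Campaign"]
ROLES = ["Owner", "Steward", "IT Steward"]

# Hypothetical assignments keyed by (domain, role); values are placeholders.
MATRIX = {
    ("Customer", "Owner"): "Marketing executive (placeholder)",
    ("Customer", "Steward"): "Customer Service analyst (placeholder)",
    ("Customer", "IT Steward"): "CRM data architect (placeholder)",
    ("Sales", "Owner"): "Sales executive (placeholder)",
}


def unassigned(matrix, domains=DOMAINS, roles=ROLES):
    """List (domain, role) cells that still have no name in them."""
    return [(d, r) for d in domains for r in roles if not matrix.get((d, r))]


def stewardship_for(matrix, domain):
    """Everyone assigned to one domain, by role."""
    return {r: matrix[(domain, r)] for r in ROLES if (domain, r) in matrix}


gaps = unassigned(MATRIX)
print(f"{len(gaps)} of {len(DOMAINS) * len(ROLES)} cells unassigned")
```

Here 14 of 18 cells are still empty, which is the kind of gap list a Governance Council could review when staffing stewardship.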
Building Data Governance
Adding Business Value by Resolving Issues and Enabling Projects
> Identify projects that will benefit from DG
> Develop a process to link DG to projects
> Capture data issues for DG
> Develop a process to resolve data issues
Building Capability to Sustain and Increase Business Value
> Assess current capabilities (people, process, and technology)
> Prioritize and plan capability improvements
> Implement capability improvements
Data and Capabilities Are Deployed Incrementally to Support Business Initiatives
> Projects that use data (e.g., Supply Chain Management, Personnel, Maintenance): Applications 1, 2, and 3
> Projects that deploy capability (e.g., DQ, MDM, Stewardship): Capabilities 1, 2, and 3
> Projects that deploy data (e.g., sales data, inventory data): Data Domains 1, 2, and 3
Each data domain supports one or more functional projects while simultaneously providing more data to BI users, who access the integrated data warehouse.
[Figure: Projects 1 to 3 shown alongside the applications, capabilities, and data domains they deliver, with data domains feeding the integrated data warehouse.]
Integration with the System Development Life Cycle (SDLC)
Data quality and related activities should be embedded in projects. These are just a few examples, by SDLC phase:
> Plan: Roadmaps and PPM help us plan for each project
> Analyze: Ensure the proposed solution architecture meets standards
> Design: Design data quality rules and include them in the SLA
> Build: Build data quality monitoring with thresholds
> Implement: Implement the complete solution, including DQ, MDM, etc.
> Manage: Support the ongoing data quality program; maintain metadata
Data quality activities that run across these phases include:
> Perform high-level data profiling on proposed sources
> Perform detailed data profiling on required elements
> Capture business metadata and design a mechanism to deliver it
> Prioritize, resolve, and communicate data issues
> Communicate changes using the Stewardship Network
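The Design and Build activities (data quality rules in the SLA, monitoring with thresholds) can be sketched as rules that pair a row-level check with a minimum pass rate. The `DQRule` structure, rule names, fields, and thresholds below are hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DQRule:
    """A data quality rule with an SLA threshold (structure assumed for illustration)."""
    name: str
    check: Callable[[dict], bool]  # returns True when a row passes the rule
    min_pass_rate: float           # SLA threshold, e.g. 0.98 = 98% of rows must pass


def evaluate(rules, rows):
    """Return {rule name: (pass rate, meets SLA)} for one batch of rows."""
    results = {}
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row))
        rate = passed / len(rows) if rows else 1.0  # an empty batch passes vacuously
        results[rule.name] = (rate, rate >= rule.min_pass_rate)
    return results


# Hypothetical rules and batch for an order feed.
RULES = [
    DQRule("order_id present", lambda r: r.get("order_id") is not None, 1.0),
    DQRule("quantity positive", lambda r: isinstance(r.get("qty"), int) and r["qty"] > 0, 0.95),
]
ROWS = [
    {"order_id": 1, "qty": 5},
    {"order_id": 2, "qty": 0},
    {"order_id": 3, "qty": 2},
    {"order_id": 4, "qty": 1},
]

for name, (rate, ok) in evaluate(RULES, ROWS).items():
    print(f"{name}: {rate:.0%} {'OK' if ok else 'BREACH'}")
```

This batch meets the order_id rule (100%) but breaches the quantity rule (75% against a 95% threshold). Keeping rules as data lets the same definitions be reviewed by stewards at design time and executed by monitoring at run time.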
AF Global Combat Support Systems Data Services
Overview
> Supports information sharing across all domains, services, and DoD agencies
> Offers role-based, on-demand access to data
> Provides a designated authoritative data repository of current and historical data
> Provides data transformation and integration
> Utilizes a Commercial Off-the-Shelf (COTS) based solution
> Net-centric environment
The Environment
> Over 19 TB of user data spread across more than 95 databases
> Acquires data from over 108 sources, processing over 50 million rows of data daily
> Mostly batch interfaces, but Change Data Capture is supported to meet near-real-time requirements
> Analytics: Business Objects, Cognos, and 19 high-profile Rich Internet Applications supported by Web Services
> Provides access to multiple USAF communities and commodities
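The slide notes mostly batch interfaces, with change capture for near-real-time needs, but does not say how changes are captured. One simple technique is snapshot comparison, sketched below in Python; many production CDC tools instead read the source database's transaction log. The part numbers and quantities are invented.

```python
def capture_changes(previous, current):
    """Classify row-level changes between two snapshots keyed by primary key."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    return inserts, updates, deletes


# Hypothetical inventory snapshots: part number -> (description, quantity on hand).
YESTERDAY = {"P1": ("wrench", 10), "P2": ("bolt", 500), "P3": ("nut", 300)}
TODAY = {"P1": ("wrench", 8), "P2": ("bolt", 500), "P4": ("washer", 50)}

inserts, updates, deletes = capture_changes(YESTERDAY, TODAY)
print("insert:", inserts)
print("update:", updates)
print("delete:", deletes)
```

Only the changed rows (P4 inserted, P1 updated, P3 deleted) need to flow downstream; the unchanged P2 is skipped. Snapshot comparison is easy to reason about but must read both full snapshots, which is why log-based capture is usually preferred at high volumes.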
Teradata Corporation
We invented Data Warehousing
> Global Leader in Enterprise Data Warehousing
> Positioned in Gartner's Leaders Quadrant in data warehousing since 1999
We pioneered the Active Data Warehouse market
> Extending traditional data warehousing for operational intelligence
Global presence and world-class customer list
> More than 1,000 customers
> 10 years at USAF
> More than 2,500 installations
7,000 associates
Traded on NYSE (TDC)
[Figure: integrated solution combining Professional Services, Software, and Hardware. Components shown: Business Consulting Services, Architecture Consulting Services, Implementation Services, Analytic Applications, Logical Data Models, Database Software (incl. Tools and Utilities), Server, Storage, Support Services.]
Questions?
Thank you!