TDWI Checklist Report
Integrating Data Governance into Your Operational Processes

By David Loshin

Sponsored by Trillium Software
August 2011

TABLE OF CONTENTS

FOREWORD
NUMBER ONE: Define and approve a data governance charter.
NUMBER TWO: Clarify accountability for the operational responsibilities of data governance roles.
NUMBER THREE: Connect observance of business policies to information requirements.
NUMBER FOUR: Assess the cross-functional uses of data.
NUMBER FIVE: Formalize information requirements by defining data rules.
NUMBER SIX: Standardize services for inspecting and monitoring conformance to data rules.
NUMBER SEVEN: Design and implement data remediation workflows.
NUMBER EIGHT: Institute notifications, incident reporting, and resolution tracking.
NUMBER NINE: Drive technology acquisition based on operational governance needs.
ABOUT OUR SPONSOR
ABOUT THE AUTHOR
ABOUT TDWI RESEARCH

© 2011 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to info@tdwi.org. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
FOREWORD

Although much effort has been spent on describing organizational structures for data governance committees, their effectiveness is limited by the absence of well-defined policies and operational procedures that guide the integration of data governance and control within day-to-day processes. Individuals tasked with data stewardship responsibilities may hesitate to embark on a list of specific actions if those actions begin to interfere with existing responsibilities. Periodic meetings are planned, but agendas are unstructured in the absence of clearly defined performance objectives for operationalization. Data governance meetings eventually peter out, leaving data stewards to operate in a silo, making assumptions and guessing at data policies and processes.

Attaining a sustainable data governance capability relies on cross-organizational processes for defining data oversight policies, monitoring adherence to those policies, and enforcing compliance. This TDWI Checklist Report details recommendations that connect high-level policies to specific actions, and shows the ways that data governance tasks can be integrated into existing operational processes. Following the items on this Checklist will energize your data governance program and ensure its integration within operational workflows.

NUMBER ONE
Define and approve a data governance charter.

To be effective, a data governance program should be a business initiative. A data governance charter provides the program's business justification. It anchors the program within the organization by formalizing the value proposition and specifying the program objectives, desired results, and groundwork for collaboration and oversight. The charter describes the logical organization for oversight, such as a data governance council, as well as the council's authority to define, review, implement, and enforce information policies. The charter should clearly state:

• Business problems to be addressed
• Program goals, metrics, and success criteria
• General approaches for institutionalization
• Definition of specific roles and responsibilities

Only after the data governance charter is approved can the organizational structure be fleshed out and the presiding data governance roles filled. At this point, the structure can be designed, key data policies can be mapped to processes, and roles such as data governors, data custodians, data stewards, and subject matter experts can be assigned.
NUMBER TWO
Clarify accountability for the operational responsibilities of data governance roles.

The first meetings to initiate data governance are marked by great enthusiasm, especially when there is a perception that concrete steps are being taken to address chronic data quality problems. With agenda items focusing on data issues, data standards, and clarifying business terms to create a business glossary, these meetings have no shortage of topics. The challenge is not in setting the agenda for what happens during the meeting, but in creating an operational framework for next steps and action items. Such a framework provides the ability to manage and take effective action between meetings.

Participating in a weekly discussion about data standards and data models is one thing, but being assigned accountability for ensuring high-quality data can lead to inaction or ineffectiveness among newly designated data stewards if they don't know the operational processes for governing data. This is particularly evident in executing the day-to-day operational tasks of monitoring data rule conformance, identifying data errors or process issues, and remediating data failures. Accountability is more clearly defined when a data governance program incorporates:

• A framework for defining, proposing, and agreeing to data policies
• Defined processes for inspecting data and monitoring adherence to defined policies
• A set of operational workflows for investigating the root causes of a problem
• Processes for identifying, prioritizing, and recommending remediation alternatives

Ensuring a comprehensive understanding of the value of data sharing and repurposing means defining the processes and procedures required for operationalization. One of the most important aspects is assessing data use. Data governance team members will need to follow defined processes to consider potential use cases. This will help define the workflow processes so that when data stewards are assigned their roles, they have the proper training and methods to evaluate and remediate emerging data problems. Providing designated stewards with a data governance operational run book detailing roles and responsibilities will delineate accountability, guide specific activities, and alleviate the uncertainty regarding the operational aspects of data governance.

NUMBER THREE
Connect observance of business policies to information requirements.

Every organization is both directly and indirectly driven by business policies, which are meant to direct behavior and practices within the business context. Sometimes policies are imposed from outside the organization, such as privacy regulations that direct the protection of personally identifiable information. Other times policies are defined internally, such as meeting projections for quarterly earnings targets or observing directives regarding preferred customer discounts. Either way, operational processes evolve and applications are developed to support the requirements dictated by these business policies.

The data that supports business information is often overlooked when business policies are enforced. Operationalizing data governance requires making the connection between business policies and data rules. To make this connection, business policies should be defined using natural language that indicates specific implications for the use of data.
Here are some examples:

• Protection of private information implies the ability to tag data elements as private and apply the proper access controls.
• Statutory reporting implies managing data lineage along with the ability to create reports as directed by the law to ensure auditability.
• Compliance with industry standards implies the use of agreed-to data specifications for data exchange.
• Corporate dashboards rely on the ability to collect data for the reporting of key performance indicators (KPIs) with a predictable level of trust.
• The proper application of discounting rules implies capturing and managing preferred customer status.

Ensuring observance of a business policy implies ensuring compliance with the data rules derivable from the associated information policies. Conversely, sets of rules regarding the use of accurate, consistent, complete, and timely data are then mapped to the corresponding information and business policies. Operational data governance methods for validating compliance with data rules are then integrated into the business processes and workflows.
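As a minimal sketch of this connection, the following Python fragment records which data rules a policy implies, so each rule stays traceable to the policy it enforces. The class names, fields, and the privacy rule itself are illustrative assumptions, not any particular tool's model.

```python
from dataclasses import dataclass, field

@dataclass
class DataRule:
    rule_id: str
    dimension: str      # e.g., completeness, consistency, timeliness
    description: str

@dataclass
class BusinessPolicy:
    name: str
    statement: str                                  # natural-language policy text
    data_rules: list = field(default_factory=list)  # rules derived from the policy

# The privacy example from the list above: the policy implies a tagging
# and access-control rule on the data.
privacy = BusinessPolicy(
    name="PII protection",
    statement="Personally identifiable information must be protected.",
)
privacy.data_rules.append(DataRule(
    rule_id="PII-01",
    dimension="validity",
    description="Every data element tagged 'private' must carry an access-control list.",
))

# Traceability: which data rules enforce this policy?
for rule in privacy.data_rules:
    print(f"{privacy.name} -> {rule.rule_id} ({rule.dimension}): {rule.description}")
```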
NUMBER FOUR
Assess the cross-functional uses of data.

Many, if not all, organizations are structured around functional areas of the business, such as marketing, sales, finance, fulfillment, manufacturing, or human resources. Accordingly, an organic approach to system development has allowed applications to be designed and developed to meet specific functional needs, such as logging sales transactions or managing customer service centers. When processes are self-contained within a single business function, their success can be completely monitored within the context of that function.

Yet the tasks associated with many corporate business processes span more than one functional area. Examples include managing compliance with privacy regulations, ERP processes such as order-to-cash or procure-to-pay, business intelligence, and predictive analytics. The success of these cross-functional processes is measured in relation to corporate KPIs. Communication across the areas of the business is done through data sharing: the results of each phase are stored in preparation for consumption by subsequent process phases. The quality of each data set might be sufficient to meet the needs of the originating business application. However, cross-functional process success depends on shared information; each phase may repurpose data sourced from different areas of the business, and may impose new data quality requirements. The absence of oversight over aspects such as collection, enrichment, and modification of that information (often accumulated from applications supporting different areas of the business) can lead to process errors, causing delays that reduce effectiveness and predictability and ultimately contribute to a cumulative negative effect on achieving key performance objectives.

Effective management of operational processes that span the enterprise relies on the availability of high-quality data. Ensuring the success of cross-functional processes requires practical methods for building operational data governance into the application infrastructure. This begins with an understanding of the information flow for cross-functional processes:

• Identify the critical data elements consumed downstream
• Solicit data quality expectations
• Map the information flow
• Ensure that compliance with the collected data requirements is inspected and monitored along the way
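A sketch of what such an information flow map might look like, assuming a hypothetical order-to-cash process: each phase declares the critical data elements it consumes and the quality expectations solicited from its owners. The phases, owners, elements, and expectations shown are all illustrative.

```python
# Each phase of the cross-functional flow records its critical data
# elements and the downstream quality expectations to inspect.
order_to_cash_flow = [
    {"phase": "order capture", "owner": "sales",
     "critical_elements": ["customer_id", "item_id", "quantity"],
     "expectations": ["customer_id resolves to a master customer record"]},
    {"phase": "fulfillment", "owner": "shipping",
     "critical_elements": ["delivery_address"],
     "expectations": ["delivery_address is complete and standardized"]},
    {"phase": "invoicing", "owner": "finance",
     "critical_elements": ["po_id", "unit_price"],
     "expectations": ["po_id matches an issued purchase order"]},
]

# Walking the map lists the inspection points for operational governance:
# which expectations each phase must satisfy before handing data downstream.
for phase in order_to_cash_flow:
    print(f"{phase['phase']} ({phase['owner']}):")
    for expectation in phase["expectations"]:
        print(f"  inspect: {expectation}")
```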
NUMBER FIVE
Formalize information requirements by defining data rules.

While policies expressed in natural language convey general business expectations, it is necessary to establish a connection between a high-level business policy, the information used to support compliance with the policy, and the specific data rules implied by those information dependencies. When ungoverned approaches to specifying information requirements are introduced, a number of challenges to ensuring business policy observance arise, including:

• Subjectivity. Since policies are interpreted in different ways, each approach to validating policy conformance is inherently subjective.
• Scalability. Without a standard way to validate that information requirements are observed, the typical methods of hard-coded tests and occasional manual checks will not scale. The sets of requirements will expand as data sets are repurposed multiple times, and in each instance the requirements must be mapped to different data models.
• Consistency. With no agreed-to method for specifying information requirements, there cannot be consistency in their validation.

Each information requirement involves reviewing metadata characteristics: identifying business terms that map to known data concepts and specifying facts that relate those data concepts. In turn, defined data validity rules can be monitored to demonstrate that information requirements are met. A framework of data rule categories corresponding to recognized dimensions of data quality (such as completeness, consistency, or uniqueness) helps reduce the severity of these issues. Data rules directly link observance of quality assertions to information compliance, and these rules can be uniformly operationalized using automated inspection and monitoring services.

As an example, consider a typical business policy for supplier payments: each vendor invoice must correspond to an issued purchase order (PO). A corresponding information requirement states that invoices will not be processed if any invoiced line items are not linked to a purchase order identifier issued by the PO system. Derived data rules may include:

1. Invoices are incomplete without a PO identifier for each line item
2. PO identifiers must conform to the PO identifier format
3. PO identifiers must match PO identifiers in the purchase order data system
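As a sketch of how these three rules could be operationalized as automated checks, the Python fragment below assumes each line item is a record with a po_id field and that valid identifiers follow a "PO-" plus six digits pattern; both are assumptions, so substitute your PO system's actual conventions.

```python
import re

# Assumed PO identifier format: "PO-" followed by six digits.
PO_FORMAT = re.compile(r"^PO-\d{6}$")

def validate_invoice(line_items, issued_po_ids):
    """Return (rule_violated, line_index) pairs for one invoice."""
    violations = []
    for i, item in enumerate(line_items):
        po_id = item.get("po_id")
        if not po_id:                            # Rule 1: completeness
            violations.append(("missing PO identifier", i))
        elif not PO_FORMAT.match(po_id):         # Rule 2: format conformance
            violations.append(("malformed PO identifier", i))
        elif po_id not in issued_po_ids:         # Rule 3: match against the PO system
            violations.append(("unknown PO identifier", i))
    return violations

# Usage: one valid line item, one malformed, one missing.
issued = {"PO-000123"}
items = [{"po_id": "PO-000123"}, {"po_id": "po123"}, {}]
print(validate_invoice(items, issued))
# -> [('malformed PO identifier', 1), ('missing PO identifier', 2)]
```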
NUMBER SIX
Standardize services for inspecting and monitoring conformance to data rules.

In the past, it was not uncommon for application programmers to insert validations and controls directly into their programs. However, variations in interpretation existed within application development silos, and management of the data quality rules fell completely into the realm of the technologist; there was little interaction with business consumers. With few standards for data governance, even as rules were integrated into the program, no process existed for reviewing the rules, verifying their correctness, and, most critically, ensuring rule consistency.

In today's environment, much of the effort for data integration and repurposing has been automated, which not only addresses the need for scalability (both for data volume and data set repurposing), but also allows compliance monitoring to be streamlined through the use of tools. Instead of pushing the responsibility for data quality compliance to technical programmers, standardize processes for documenting and sharing data rules, and standardize services for automated inspection, monitoring, and notification when data exceptions occur.

A simple example is a defined rule for validating data completeness. If someone in shipping needs the sales agents to capture components of the delivery address at the point of sale, a rule verifying that the data values are not missing can invoke a data quality service at transaction processing time. A triggered alert pushes incomplete records back to the sales application for proper data capture.

Some organizations rely on an agreed-to set of query criteria applied to different data sets for data validation. This is satisfactory within a limited context, confined to a specific data domain and use cases in a stable source system. However, as business policies affect cross-functional uses and processes, continuous monitoring of adherence to commonly recognized assertions can be streamlined by using a services approach. Specific tools such as data profilers can be coupled with reporting systems to use a common set of data rules for inspection and monitoring. Layering data profiling inside a service-oriented architecture allows applications within different business contexts to validate different data instances using a common set of data rules, and to generate notifications to data stewards when noncompliant data items require remediation.
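The completeness rule in the shipping example might be packaged as a reusable service along these lines; the field names and the notification hook are illustrative assumptions.

```python
# Fields shipping needs captured at the point of sale (an assumption).
REQUIRED_ADDRESS_FIELDS = ("street", "city", "postal_code", "country")

def missing_fields(record, required):
    """Return the names of required fields that are absent or blank."""
    return [f for f in required if not str(record.get(f, "")).strip()]

def on_sale_captured(record, notify):
    """Invoked at transaction processing time by the sales application."""
    missing = missing_fields(record, REQUIRED_ADDRESS_FIELDS)
    if missing:
        # Triggered alert: push the incomplete record back to the
        # sales application for proper data capture.
        notify(record, missing)
        return False
    return True

# Usage with a stand-in notifier.
sale = {"street": "1 Main St", "city": "Renton", "postal_code": ""}
on_sale_captured(sale, lambda rec, m: print(f"Incomplete record; missing: {m}"))
# -> Incomplete record; missing: ['postal_code', 'country']
```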
NUMBER SEVEN
Design and implement data remediation workflows.

As suggested by the directive of accountability within the data governance charter, practical suggestions for operationalizing data governance focus on refining discrete data rules from business policies and standardizing scalable methods for monitoring data rule validation across business process flows. With proper data management policies in place, defined processes ensure that when data rules are violated, there is a clear line of accountability directing the right set of individuals to proactively investigate the root causes of the issue as well as evaluate alternatives for addressing those root causes. These specific procedures implement agreed-to data governance policies which, when incorporated into a service-level agreement (SLA), establish lines of responsibility and escalation strategies when issues are not resolved in a timely manner.

Specific aspects of the data remediation workflows include triage, assessment, and remediation:

• Triage is performed to understand where and how the data issue manifests itself, the business impact, the size of the problem, and the number of individuals or systems affected. It incorporates the evaluation of issue criticality, issue frequency, and the feasibility of correction, as well as what steps can immediately be taken to prevent the issue from recurring.
• Assessment involves the synthesis of what was discovered during triage. The data steward prioritizes each identified issue and considers alternatives for eliminating the root cause, additional inspections, and whether specific corrective measures are to be taken.
• Remediation can comprise different tactics depending upon the location and root cause of the data issue. Cascaded data errors may require backing out changes, rolling back processing stages, and restarting the process. Failed processes can be adjusted to prevent the introduction of data errors, and human errors can be reduced through user training.

Most critically, the key data governance team members must consider potential use cases to define the workflow processes before data stewards are put in a position to evaluate emerging data problems.
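One way to make the three stages explicit is to model them as workflow states, as in this sketch; the stage names, the priority scoring, and the issue fields are assumptions layered onto the stages described above.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    NEW = "new"
    TRIAGED = "triaged"
    ASSESSED = "assessed"
    REMEDIATED = "remediated"

@dataclass
class DataIssue:
    issue_id: str
    description: str
    stage: Stage = Stage.NEW
    priority: int = 0
    tactic: str = ""

def triage(issue, impact, systems_affected):
    """Capture the scale and criticality of the issue as it manifests."""
    issue.priority = impact * systems_affected   # a crude scoring assumption
    issue.stage = Stage.TRIAGED

def assess(issue, tactic):
    """The steward synthesizes triage findings and selects a tactic."""
    issue.tactic = tactic
    issue.stage = Stage.ASSESSED

def remediate(issue):
    """Apply the chosen tactic (roll back, fix the process, train users)."""
    issue.stage = Stage.REMEDIATED

# Usage: an issue moves through the three stages in order.
issue = DataIssue("DQ-101", "Cascaded nulls in the customer address feed")
triage(issue, impact=3, systems_affected=4)
assess(issue, tactic="back out changes, roll back the load stage, restart")
remediate(issue)
print(issue.stage.value, issue.priority)   # -> remediated 12
```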
NUMBER EIGHT
Institute notifications, incident reporting, and resolution tracking.

Standardized inspection and monitoring employ tools to streamline the data validation process. Remediation workflows operationalize accountability for data by describing the methods by which data issues are reviewed and resolved, how individuals investigate root causes, and how they evaluate alternatives for addressing those root causes. A means for triggering the remediation workflow is critical, both when an exception is identified and when monitoring the progress of issue resolution. Key components of an incident management framework include:

• Notifying data stewards of emerging issues
• Logging all discovered data issues for the responsible parties
• Providing a collaborative framework for tracking the progress of the remediation and resolution activities
• Identifying the point of failure in the process for root cause analysis

With this framework in place, data stewards can create an audit trail for aligning data quality and governance performance with business policy compliance. They also have the foundation to better report on and manage remediation by understanding the mean time to resolve issues, the frequency and types of issues, the sources of issues, and common approaches for correcting or eliminating problems. This links operational data governance to business process success.
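A sketch of resolution tracking over such an incident log, computing the metrics named above (mean time to resolve, frequency by type and by source); the log schema is an illustrative assumption, not a specific product's format.

```python
from collections import Counter
from datetime import datetime

# An illustrative incident log entry: issue type, originating source,
# and open/resolve timestamps.
incidents = [
    {"type": "completeness", "source": "sales app",
     "opened": datetime(2011, 8, 1, 9), "resolved": datetime(2011, 8, 1, 17)},
    {"type": "consistency", "source": "ERP feed",
     "opened": datetime(2011, 8, 2, 9), "resolved": datetime(2011, 8, 4, 9)},
]

def mean_time_to_resolve(log):
    """Average resolution time in hours across resolved incidents."""
    hours = [(i["resolved"] - i["opened"]).total_seconds() / 3600
             for i in log if i.get("resolved")]
    return sum(hours) / len(hours) if hours else 0.0

print(f"Mean time to resolve: {mean_time_to_resolve(incidents):.1f} hours")  # 28.0
print("Issues by type:  ", Counter(i["type"] for i in incidents))
print("Issues by source:", Counter(i["source"] for i in incidents))
```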
NUMBER NINE
Drive technology acquisition based on operational governance needs.

These Checklist items suggest that practical approaches to data governance can be integrated directly into operational processes, from the refinement of data rules from business policies, to the solicitation of requirements from the pool of downstream consumers, to standardized services for inspection, monitoring, notification, and resolution of data exceptions. This ultimately suggests two fundamental changes in the organization:

1. Requirements analysis, data quality inspection and monitoring, notification, tracking, and reporting must all be incorporated into the organization's system development life cycle (SDLC).
2. Much of the complexity of operational data governance can be reduced by employing the right kinds of automated tools and accompanying methodologies.

Data governance is operationalized through the definition of policies and corresponding processes for assuring policy compliance. Although the processes for data governance are the most important components for operationalization, the procedures can be implemented in a number of ways, including the incremental introduction of tools and technology to supplement the defined processes. For example, data management tools such as data profiling, data cleansing, metadata management, data standards management, and incident management can be adapted to streamline operational activities. The results of inspections and measurements can be presented using existing end-user reporting techniques.

As the policies, roles, and responsibilities are formulated and the processes are put in place, institute a technology adoption plan that corresponds to the staged rollout of operational data governance procedures. Data profiling, modeling, and metadata tools help to capture business data quality rules. Data profiling is also helpful in developing data assessment processes, as well as inspection and monitoring services and embedded controls. Any data corrections, standardization, or enhancements performed within a resolution workflow should rely on standardized technologies to ensure consistency. Review the existing uses of reporting frameworks for developing performance dashboards and data quality metrics reporting.

By aligning technology acquisition with the rollout of operational governance processes, the team can more effectively articulate business needs, evaluate training requirements, and speed time to value. The combined set of business needs and requirements will also scope the vendor selection process, and perhaps even point to specific vendors whose capabilities map directly to the organization's expectations.
ABOUT OUR SPONSOR

Be certain that all your information is reliable and available to manage your business. Trillium Software solutions and services bring certainty to your enterprise data for customer, product, financial, and supplier information. The Trillium Software System enables the business teams who understand the data and the IT teams who manage it to discover and quantify data anomalies, put data controls in place, and manage data conditions to optimize information performance. The Trillium Software System is recognized as a global leader in enterprise data quality and is critical to the success of data integration, data migration, data stewardship, and data governance initiatives. We deliver solutions for data profiling, cleansing, enhancement, linking, geocoding, and governance for global enterprise applications, business intelligence, and data management platforms. Ensure that your information becomes the trusted asset that your employees rely on to make business-critical decisions every day. Don't just be certain about your data. Be Trillium Certain! For more information, visit www.trilliumsoftware.com.

ABOUT THE TDWI CHECKLIST REPORT SERIES

TDWI Checklist Reports provide an overview of success factors for a specific project in business intelligence, data warehousing, or a related data management discipline. Companies may use this overview to get organized before beginning a project or to identify goals and areas of improvement for current projects.

ABOUT THE AUTHOR

David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader, TDWI instructor, and expert consultant in the areas of data management and business intelligence. David is a prolific author on business intelligence best practices, including numerous books and papers on data management such as The Practitioner's Guide to Data Quality Improvement, with additional content provided at www.dataqualitybook.com. David is a frequent speaker at conferences, Web seminars, and sponsored Web sites and channels such as www.b-eye-network.com. His best-selling book, Master Data Management, has been endorsed by data management industry leaders, and his MDM insights can be reviewed at www.mdmbook.com. He can be reached at loshin@knowledge-integrity.com.

ABOUT TDWI RESEARCH

TDWI Research provides research and advice for business intelligence and data warehousing professionals worldwide. TDWI Research focuses exclusively on BI/DW issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence and data warehousing solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences, as well as strategic planning services to user and vendor organizations.