Three Fundamental Techniques To Maximize the Value of Your Enterprise Data



Prepared for Talend by David Loshin, Knowledge Integrity, Inc.
October 2010

© 2010 Knowledge Integrity, Inc.

Introduction

Organizations have always sought to leverage the knowledge embedded within the data assets that already exist across the corporate business landscape. This knowledge empowers a corporation to identify and take advantage of new business opportunities. Yet this voracious appetite for repurposing data clashes with the fact that the organic evolution of the modern data environment is fraught with inconsistencies and disparate knowledge centers, and this persistent conflict makes data repurposing difficult at best.

This paper explores three key methods used to improve and extend enterprise data utilization, and then examines technical considerations that suggest implementation techniques for addressing data reuse needs. Maximizing data utilization across the enterprise relies on three fundamental capabilities:

- Provide accessibility and availability to the data;
- Ensure transparency and visibility of data concepts across the application infrastructure; and
- Enable trust in the reliability and consistency of the data.

Many organizations have already invested in one or more of these capabilities; however, it is the interdependence of all three that provides the synergy that truly maximizes data utility across the enterprise.

The Origins of Disparate Knowledge

Business application development is driven by the desire to automate or improve specific business processes. The intent within each system is to collect just enough data to complete a workflow or transaction process. The creation or acquisition of data in support of an application is often scoped to meet the acute needs of that specific application, and each application relies on its own data silo or operational data store. This is particularly evident in organizations lacking a data governance strategy. Consequently, for each data silo, the expectations for completeness, consistency, and availability are limited to those required for achieving the objectives of the specific corresponding business process.

One consistent trend in data architectures is to centralize data from a variety of sources across the organization in order to derive and extract value through enterprise visibility, reporting, and analysis. These valuable concepts suggest the notion of data reuse, in which data acquired or created during one business process is discovered and then repurposed to satisfy the needs of alternate business processes. Data reuse has great potential, especially as it attempts to extract additional value from an established data asset. Yet the absence of documented metadata or underlying data meanings often forces the managers of these alternate business processes to interpret the structures as well as reinterpret the semantics of the shared data sets.

The Three Fundamental Techniques

With this growing demand for centralizing data, most organizations have invested significant resources to provide accessibility and availability to data, to ensure the transparency and visibility of data concepts across the application infrastructure, and to enable trust in the reliability and consistency of the data. These are the three fundamental capabilities for maximizing the utilization of enterprise data. The implementation of any one of these aspects is often tied directly to a specific project, with little consideration of the general applicability of the technique to other business processes. It is therefore worth reviewing the value proposition of each area of capability within the context of enterprise data utilization. Here we drill down into these concepts in greater detail and explore what they really mean in the context of an enterprise.

Data Availability

Years of decentralized business application development suited the operational needs of each line of business, but the virtual wall put up between groups complicated any enterprise-wide reporting and analysis. The emergence of the data warehouse as a repository for data assimilated from across the enterprise reintroduced centralized data management. However, the typical approaches to data consolidation suffered because of two assumptions:

- Availability - the assumption that the data needed in the warehouse would be accessible from the source systems
- Compatibility - the assumption that data from multiple data sets were aligned both structurally and semantically and could be easily combined

The most significant challenges associated with centralizing data in a data warehouse involve collecting the original source data and making that data available both for today's information processing needs and for as many future data needs as possible. This translates into a number of directives (the two alignment directives are illustrated in a brief sketch at the end of this section):

- Ensure timely access to the different data sets from different source applications;
- Align the different structures associated with the same data concepts; and
- Align the different meanings associated with similar structures.

Different data sets are typically designed and engineered by a variety of individuals for a range of specific business purposes, without recognition that there would be an eventual opportunity for repurposing. This corporate history has made the data consolidation process much more complex. Although tools for extracting, transforming, and loading the data warehouse have emerged, data centralization requires much more than simplistic mappings and transforms. These challenges, combined with the increased demand for reuse, have led to greater investment in more comprehensive strategies for data accessibility and availability.
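To make the alignment directives concrete, the following minimal sketch (in Python) shows how two source layouts describing the same customer concept might be mapped onto a single target schema. The field names, code tables, and helper functions are hypothetical, invented purely for illustration; they are not taken from this paper or from any specific tool.

# Hypothetical illustration only: field names, code mappings, and helpers
# are invented for this sketch.
from datetime import datetime

TARGET_FIELDS = ("customer_name", "birth_date", "gender")
GENDER_CODES_A = {"M": "male", "F": "female"}   # source A encodes gender as M/F
GENDER_CODES_B = {"0": "male", "1": "female"}   # source B encodes gender as 0/1

def from_source_a(rec: dict) -> dict:
    """Structural alignment: map source A's layout onto the shared target schema."""
    return {
        "customer_name": rec["cust_name"].strip().title(),
        "birth_date": datetime.strptime(rec["dob"], "%m/%d/%Y").date().isoformat(),
        "gender": GENDER_CODES_A.get(rec["sex"], "unknown"),
    }

def from_source_b(rec: dict) -> dict:
    """Semantic alignment: same target fields, but different names, formats, and codes."""
    return {
        "customer_name": rec["CUSTOMER_NM"].strip().title(),
        "birth_date": rec["BIRTH_DT"],                 # already ISO yyyy-mm-dd
        "gender": GENDER_CODES_B.get(rec["GENDER_CD"], "unknown"),
    }

if __name__ == "__main__":
    a = {"cust_name": "  jane smith ", "dob": "02/14/1975", "sex": "F"}
    b = {"CUSTOMER_NM": "JANE SMITH", "BIRTH_DT": "1975-02-14", "GENDER_CD": "1"}
    print(from_source_a(a))
    print(from_source_b(b))   # both yield the same target-schema record

The point of the sketch is simply that both structure (field names, date formats) and meaning (code domains such as gender) must be reconciled before records from different silos can be treated as the same concept.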

Transparency and Visibility

Transaction systems are designed to successfully execute business operations, but transaction data is rarely engineered to support the types of reporting and analysis that business analysts use. Informed decision-making relies on more comprehensive views of data; these views are materialized through the data marts and other analytical environments populated via a data warehouse.

Many business decisions hinge on having a large degree of visibility into the knowledge that resides in both operational and analytical systems. For example, understanding customer behavior is a prelude to developing predictive models for cross-selling, upselling, and targeted marketing. The same holds true for analyses of other commonly used concepts, such as spend analysis, which requires comprehensive visibility into all procurement, purchasing, and acquisition transactions involving vendors and their parts, products, and services, or staff productivity analysis, which focuses on analyzing data to assess employee performance.

Enabling a high degree of visibility into any commonly used data concept means more than just data accessibility. Integrating similar data concepts from different data sources requires transparency processes that ensure the sources are compatible from a structural and semantic perspective. It is through this characteristic that one can determine whether two customer data sets refer to the same core concept and can therefore be combined.

Reliability and Consistency

When evaluating data transparency, the concept (and context) of primary use must also be considered. While the primary use of a data set can be defined as first in order, it can also be defined as first or highest in rank of importance. Naturally, the business application that originates the data is the primary consumer; however, that application may not represent the most important use of the data. If alternate uses are high in rank of importance, they are also primary consumers. Therefore, it is critical to ensure that measured levels of data reliability and consistency are sufficient to meet all business process needs.

This suggests a different way of thinking about soliciting, documenting, and adhering to a data set's quality requirements. The typical approach to defining data quality requirements looks only at the functional needs of the business process application being designed; in turn, the data quality requirements are defined only to meet an acute functional need. In most cases, no one considers how data created by one application will be used by other applications. But if data sets are actually intended to be used by additional, alternate business applications, ensuring trust in the reliability and consistency of the data becomes an organizational imperative. It is incumbent upon system designers to talk to every potential data consumer, to identify their information needs, and to implement the inspection, monitoring, and corrective actions associated with enterprise data quality expectations. Establishing good data quality practices, and supporting those practices with the right tools and techniques, is imperative to prevent confused semantics, inconsistency, and incoherence.
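One hedged way to picture "soliciting and documenting" requirements from every consumer is a registry of expectations keyed by consuming process, checked against the measured characteristics of a shared data set. The consumer names, quality dimensions, and thresholds below are assumptions made for illustration, not values prescribed by this paper.

# Hypothetical sketch: per-consumer quality expectations for one shared data set.
CONSUMER_EXPECTATIONS = {
    "order_entry":        {"completeness": 0.95, "consistency": 0.90},  # originating application
    "spend_analysis":     {"completeness": 0.99, "consistency": 0.97},  # downstream analytics
    "customer_reporting": {"completeness": 0.98, "consistency": 0.95},
}

def unmet_expectations(measured: dict) -> dict:
    """Return, per consumer, the dimensions whose measured level falls short."""
    gaps = {}
    for consumer, needs in CONSUMER_EXPECTATIONS.items():
        short = {dim: need for dim, need in needs.items() if measured.get(dim, 0.0) < need}
        if short:
            gaps[consumer] = short
    return gaps

if __name__ == "__main__":
    # Good enough for the originating application, but not for the alternate
    # (and equally "primary") consumers.
    measured_levels = {"completeness": 0.96, "consistency": 0.93}
    print(unmet_expectations(measured_levels))

The design point is that the data set is judged against the most demanding documented consumer, not merely against the acute needs of the application that created it.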

Technical Considerations

The conceptual discussion of each technique demonstrates the value that can be added when there is a focus on data reuse. Each technique is enabled by technologies that have been refined and implemented as commonplace tools: data integration, master data management (MDM), and data quality. While these types of tools may already be established for discrete projects, considering them in the context of enterprise utilization exposes the potential for economies of scale across the organization.

Many organizations already understand the value of one or more of the key technologies necessary for enterprise data utilization. For example, the key stakeholders in an organization that has implemented a data warehouse will have already invested in data integration and data cleansing tools, while business analysts focusing on customer behavior may have initiated a master data management program for customer data integration. Yet organizations that have invested in one technology (such as data integration) may not be fully satisfying the growing demands of the range of alternate data purposes:

- Data integration may help to achieve availability and accessibility, but it requires data quality to ensure the availability of high-quality information, and master data management to provide visibility.
- Master data management enables transparency and visibility, but consumers of master data need data integration services to enable accessibility and availability, along with data quality services to ensure high quality.
- Data quality tools support assessment, correction, and inspection, but these techniques enable the data integration tools to provide a high-quality view into the shared, unified view enabled via master data management. In turn, data quality tools need the data access capabilities typically provided by data integration tools.

Figure 1: The synergy of data integration, master data management, and data quality enhances enterprise data utilization.

Any one of these capabilities adds value, and all three are necessary; but the value of each is compounded through the synergy of all three working together. By instituting the proper data stewardship and management policies and procedures at the corporate and line-of-business levels, these three capabilities together provide methods for maximizing the utilization of a high-quality, unified view of real-world data objects across a wide variety of operational and analytical business applications.

Data Integration for Accessibility and Availability

As the data centralization needs for reporting and analysis have grown, so has the technical environment that supports more generalized data sharing. Operational data stores, data warehouses, data marts, mashups, federated operational systems, self-service reporting, data exchanges, alerts and notifications, and other analytic and operational applications are all examples of techniques or applications that rely on shared data. Satisfying their data demands depends on the ability to move data from its origination point(s) to specific target data stores. Those needs can be satisfied using data integration.

Data sources that are subject to reuse are often managed in a number of different formats, file and/or table structures, and sometimes even different underlying character encodings. Facilitating data integration therefore requires at least two capabilities: the ability to access data from a wide variety of sources, and the ability to transform the accessed data into a format suitable for sharing.

The most common approach to data integration relies on a combination of data extraction and data access processes: specially engineered routines employed to fetch data from the sources. Before data can be merged into a downstream target system, it must first be normalized; data integration therefore requires specially designed transformations that apply a series of functions to normalize, cleanse, standardize, derive, and translate the data into a format that is consistent with the target systems. At that point, the data is ready to be propagated and loaded into the target destination, either overwriting the existing data or periodically augmenting the existing target data set.

There are alternatives to the traditional extract/transform/load (ETL) approach. One, referred to as extract, load, and transform (ELT), bypasses a staging area and applies the transformations after the data has been loaded into the destination data set. Another is data federation, sometimes referred to as data virtualization, which enables direct access to data sources through an abstraction layer that standardizes data accessibility across a variety of platforms.
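To make the extract/transform/load sequence above more tangible, here is a minimal, hypothetical sketch in Python. The sample data, field names, and transformation rules are invented for illustration; a production pipeline would typically use dedicated integration tooling rather than hand-written code like this.

# Minimal, hypothetical ETL sketch. Sample data, fields, and rules are
# illustrative assumptions, not part of the paper.
import csv
import io
import sqlite3

SAMPLE_EXPORT = """customer_name,order_amount,region
 jane smith ,120.5,ne
BOB JONES,,sw
"""

def extract(source) -> list[dict]:
    """Extract: fetch raw records from a source (a file path or file-like object)."""
    return list(csv.DictReader(source))

def transform(records: list[dict]) -> list[tuple]:
    """Transform: normalize, cleanse, and standardize into the target layout."""
    rows = []
    for r in records:
        name = r["customer_name"].strip().title()          # standardize case
        amount = round(float(r["order_amount"] or 0), 2)    # cleanse missing amounts
        region = r["region"].upper()[:2]                    # normalize region codes
        rows.append((name, amount, region))
    return rows

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: propagate the transformed rows into the target data store."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(io.StringIO(SAMPLE_EXPORT))), conn)
    print(conn.execute("SELECT * FROM orders").fetchall())
    # In an ELT variant, the raw rows would be loaded first and the same
    # transformations applied inside the target database afterwards.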

Master Data Management for Transparency and Visibility

Data integration establishes availability. That availability is enhanced with transparency and visibility, which are enabled by the techniques that usually comprise an effective master data management practice. Master data management is intended to enable the development of an accurate and reliable view of the business entities, common reference data concepts, and dimensional data that are vital to the operations of the enterprise, among a variety of other potential master data concepts. Essentially, those data sets that:

- Exemplify common data concepts,
- Maintain replicated data elements,
- Are subject to multiple business purposes, and
- Can be used by multiple applications

can be considered master data. Master data provides either a persistent repository or a (potentially virtual) registry or index of uniquely identified entities, with their critical data attributes synchronized from the contributing original data sources. With the proper governance and oversight, the data in the master data system (or repository, or registry) can be qualified as a unified and coherent data asset that all applications can rely on for consistent, high-quality information.

Master data management is a collection of data management best practices associated with both the technical oversight and the governance requirements for facilitating the sharing of commonly used master data concepts. MDM incorporates the people, policies, procedures, and technology needed to orchestrate key stakeholders, participants, and business clients in managing business applications, information management methods, and data management tools.

Master data management tools support an organization's business needs by allowing business modelers to develop a standardized view of the uniquely identifiable master data entities across the enterprise application infrastructure, along with their corresponding master data services. Master data management governs the methods, tools, information, and services to:

- Identify core business-relevant data concepts used in different application data sets;
- Assess the use of commonly used data concepts and valid commonly used data domains;
- Create a standardized model for integrating and managing those master data concepts;
- Manage collected and discovered metadata as an accessible, browsable resource, and use it to facilitate consolidation;
- Collect data from originating data sources, evaluate how different data instances refer to the same real-world entities, and facilitate a unified view of each real-world entity (a minimal sketch of this idea follows the list);
- Enforce role-based access controls to secure and provision the master data; and
- Establish common master data services to maintain consistent transactions across the collection of data consumers.
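The following sketch shows the registry idea in miniature: records from contributing sources are matched on a simple identity key and consolidated into a single unified entry per real-world entity. The matching key, attribute names, and survivorship rule are assumptions made purely for illustration; they are not features of any particular MDM product, and real identity resolution uses far richer comparison logic.

# Hypothetical registry-style sketch: key, attributes, and survivorship rule
# are illustrative assumptions only.
from collections import defaultdict

def identity_key(record: dict) -> tuple:
    """Naive matching key used to decide when two records describe the same entity."""
    return (record["name"].strip().lower(), record["birth_date"])

def build_registry(sources: dict[str, list[dict]]) -> dict:
    """Index uniquely identified entities and note which sources contributed."""
    registry = defaultdict(lambda: {"attributes": {}, "contributors": []})
    for source_name, records in sources.items():
        for rec in records:
            entry = registry[identity_key(rec)]
            entry["contributors"].append(source_name)
            # Simple survivorship: the first non-empty value for each attribute wins.
            for attr, value in rec.items():
                if value and attr not in entry["attributes"]:
                    entry["attributes"][attr] = value
    return dict(registry)

if __name__ == "__main__":
    sources = {
        "crm":     [{"name": "Jane Smith", "birth_date": "1975-02-14", "email": "jane@example.com", "phone": ""}],
        "billing": [{"name": "JANE SMITH ", "birth_date": "1975-02-14", "email": "", "phone": "555-0100"}],
    }
    for key, entry in build_registry(sources).items():
        print(key, entry)   # one unified entity, attributes drawn from both sources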

Data Quality Management for Reliability and Consistency

In the context of enterprise data utilization, operational processes for data quality management combine best practices with tools and technologies that can ensure reliability and consistency, such as:

- Defining Data Validity Rules - Rules measuring compliance with identified data quality expectations are used as data controls whose implementation is incorporated directly into the application development process, so that data errors can be identified and addressed as they occur.
- Defining Acceptability Thresholds - Acceptability threshold scores can be defined; a measured score below the acceptability threshold indicates that the data does not meet business expectations (see the sketch at the end of this section).
- Defining Data Quality Metrics and Thresholds - Data quality analysts can work with business data consumers to define data quality metrics used to baseline and then continuously inspect and monitor levels of data quality.
- Inspection and Monitoring - Data quality rules are used for data quality inspection, monitoring, and notification of the appropriate people when data quality issues requiring remediation are identified.
- Data Quality Incident and Performance Reporting - A set of management processes for reporting and tracking the status of data quality issues and the corresponding activities.
- Managing Data Remediation - The mechanism for remedying data issues, including triage, classification, prioritization, and preparation for root cause analysis.
- Root Cause Analysis - A set of processes used to isolate the location at which errors are introduced, enabling drill-down into the data to identify the root cause.
- Data Correction - A governed remediation process for correcting data to meet acceptability thresholds when the source of the errors cannot be fixed.

Data quality technology such as data profiling, parsing and standardization, cleansing, identity resolution, and data enhancement can be used to support the analysis, documentation, and inspection and monitoring of adherence to enterprise data quality expectations, as well as the corrective measures taken when data flaws need immediate attention.
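As a hedged illustration of the first two practices above (validity rules used as data controls, plus an acceptability threshold), the sketch below scores a batch of records against a few invented rules. The rule set, field names, and the 95% threshold are assumptions made for this example, not recommendations from the paper.

# Hypothetical data-control sketch: validity rules scored against an
# acceptability threshold. Rules, fields, and threshold are invented.
import re

# Each rule returns True when a record complies with the expectation.
VALIDITY_RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "email_well_formed":   lambda r: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email", "") or "") is not None,
    "amount_non_negative": lambda r: float(r.get("amount", 0) or 0) >= 0,
}

ACCEPTABILITY_THRESHOLD = 0.95  # at least 95% of records must pass each rule

def score_batch(records: list[dict]) -> dict:
    """Measure per-rule compliance and flag rules that fall below the threshold."""
    report = {}
    for name, rule in VALIDITY_RULES.items():
        passed = sum(1 for r in records if rule(r))
        score = passed / len(records) if records else 1.0
        report[name] = {"score": round(score, 3), "acceptable": score >= ACCEPTABILITY_THRESHOLD}
    return report

if __name__ == "__main__":
    batch = [
        {"customer_id": "C001", "email": "jane@example.com", "amount": "120.50"},
        {"customer_id": "",     "email": "not-an-email",     "amount": "-5"},
    ]
    # In an operational control, an unacceptable score would trigger
    # notification and remediation rather than just a printed report.
    print(score_batch(batch))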

Conclusion

When coupled with best practices for operational data governance, the synergy of data integration, master data management, and data quality enhances the utilization of data from across the organization and benefits all data consumers. As you consider data integration, data quality, or master data management, recall that the success of each individual set of methods and techniques is greatly enhanced when it is supported by the others. The stage for successful data repurposing can be set early in the process by taking some immediate, concrete steps:

1) Know your data consumers and assess their needs. Because the range of data consumers for any data set goes beyond the originating business application's users, identify the community of data consumers and make sure there are well-defined processes in place for soliciting and documenting their requirements for availability, visibility, and quality, and for ensuring those requirements are engineered into the application infrastructure.

2) Clarify roles and accountability for data management. Since the context of information expands beyond the originating application, the roles, responsibilities, and accountabilities associated with data stewardship must be clearly defined to ensure that each business application owner is charged with guaranteeing that all alternate data consumer needs are met.

3) Assess and inventory existing methods and tools. Take an objective inventory of all the technology and tools available within the organization for any of the three fundamental techniques, to determine whether their capabilities satisfy the directives for enterprise utilization suggested in this paper.

4) Harmonize semantics when possible. Establish procedures for analyzing semantics to recognize and harmonize business terms, data elements, and concept domains that share the same underlying meaning.

5) Differentiate when necessary. At the same time, note when similarly named business terms, data elements, and concept domains do not share meanings, and ensure that those are effectively differentiated before they are inadvertently combined in an inconsistent way.

6) Take the long view on technology acquisition. Whenever opportunities for improving data utilization arise, keep these three fundamental techniques in mind as the organization frames its business needs assessment and defines its functional requirements for technology procurement.

About the Author

David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader and expert consultant in the areas of data quality, master data management, and business intelligence. David is a prolific author on BI best practices, via the expert channel at www.b-eye-network.com and numerous books and papers on BI and data quality. His book Business Intelligence: The Savvy Manager's Guide (June 2003) has been hailed as a resource that allows readers to gain an understanding of business intelligence, business management disciplines, data warehousing, and how all of the pieces work together. His book Master Data Management has been endorsed by data management industry leaders, and his MDM insights can be reviewed at www.mdmbook.com. David can be reached at loshin@knowledge-integrity.com.

About the Sponsor

Talend (www.talend.com) is the recognized market leader in open source data management, a fast-growing market according to analyst consensus. Many large organizations around the globe use Talend's products and services to optimize the costs of data integration, data quality, and master data management (MDM). All Talend products are built on a unified Eclipse-based development environment, which provides users with consistent ergonomics, a fast learning curve, and a high level of reusability. This offers unrivaled benefits in terms of resource optimization and utilization, and project consistency. With an ever-growing number of product downloads and paying customers, Talend's solutions are the most widely used and deployed data management solutions in the world. The company breaks the traditional proprietary model by supplying open, innovative, and powerful software solutions with the flexibility to meet the data management needs of all types of organizations.