Data Quality and Cost Reduction




A White Paper by David Loshin

Table of Contents

Introduction: Data Quality as a Cost-Reduction Technique
Understanding Expenses
Establishing the Connection: Data Flaws and Increased Costs
    Bids and Proposals
    Product Research and Development
    Human Resources
    Customer Relationship Management
    Overhead
Considerations: Common Data Failures
Data Quality Services to Reduce Costs
    Empirical Data Quality Assessment Using Data Profiling
    Entity Name Harmonization Using Parsing and Standardization
    Entity Record Consolidation Using Identity Resolution, Matching, and Linkage
    Address Standardization and Correction
    Establishing a Unified View Using Master Data Management
Summary
About the Author

Introduction: Data Quality as a Cost-Reduction Technique

There is no arguing that information technology (IT) is typically a cost center in which many different types of operating costs are incurred, accumulated and offset against organizational revenues and profits. As a core component of IT, the staff, hardware, software and support dedicated to data management are often directly accountable for many of those costs. Although data management contributes to corporate expenses, there are many opportunities to apply best practices in data quality management to reduce expenses.

There are different aspects to the meaning of cost reduction. To some, it focuses solely on eliminating or reducing operational expenses. But reduced costs are also often linked to increased efficiencies in day-to-day operations, as well as improved performance for revenue-generating activities. In other words, within the scope of a performance-oriented organization, data quality management can be used to seek operational efficiencies for cost reduction, leading to increased margins and, consequently, increased profits.

This paper reviews aspects of cost reduction by examining some typical financial accounting expense categories. The paper then selects some specific examples and assesses their reliance on high-quality data. In turn, the paper looks at how data quality services can be applied in those examples to reduce expenses. Lastly, we reiterate the potential for managing data quality as a way to control and reduce organizational expenses.

Understanding Expenses

The objective of a program to identify operational efficiency may be to take a slash-and-burn approach to reducing expenses. However, eliminating staff or services necessary to keep the business running will increase the workload for the remaining staff members, while reducing their effectiveness and, ultimately, detracting from the employee experience. This cycle often leads to increased turnover and an overall reduction in organizational knowledge.

Reducing expenses is more about being smart in understanding where costs have exceeded reasonable levels and determining ways to identify and eliminate excessive costs. And this is where good data quality comes in: if the source of waste can be attributed to successful business operations that are nonetheless negatively affected by poor data quality, then (logic would suggest) improving data quality will help identify ways to streamline processes and reduce costs. More importantly, the effects of applying these techniques during weaker economic times will train people to work smarter in the future, helping the organization improve competitiveness and agility during economic recovery and expansion.

The challenge for the data professional lies not in the knowledge of good data management techniques, but in understanding the financial language that describes how money is spent to run a business, usually encapsulated within the finance department's chart of accounts. This chart lists the channels through which money flows into and out of the organization.

In most cases, a business spends money as a prerequisite to making money. That spending is broken into standard categories, such as:

- Direct costs: labor, materials or subcontractor costs associated with fulfilling contractual obligations.
- General overhead and administrative (GOA) costs: rent, maintenance, asset purchase, asset utility, licensing, utilities, administrative staff and general procurement.
- Staff overhead, including the staff necessary to run the business: clerical, sales management, field supervision, bids and proposals (B&P), recruiting and training.
- Business overhead: bank fees, service charges, commissions, legal fees, accounting, penalties and fines, bad debt, merger and acquisition costs.
- Cost of goods sold (COGS), which refers to the costs associated with creating and selling a product: design of the products, raw materials, production, cost of inventory, inventory planning, marketing, sales, customer management, advertising, lead generation, promotional events, samples, customer acquisition, customer retention, order replacement, order fulfillment and shipping, among others.

With a better understanding of the accounting concepts associated with how the organization spends money, it's easier to reframe cost reduction away from simply slashing the budget and toward actually creating opportunities. For example, cutting the costs associated with selling the product may result in selling fewer items, so that might not be the wisest choice. But examining the supply chain to reduce held inventory may free up cash so that money becomes available for other activities. This is a good example for two reasons. First, it provides a reasonable context for cost reduction as a way of improving a business process. Second, it is a situation that can be affected by data flaws (e.g., actual held inventory may be much less than what the inventory systems tell you). Establishing good data management and data governance practices, such as identifying and eliminating data errors, will ultimately help prevent inefficiencies from creeping into the system in the first place.

Establishing the Connection: Data Flaws and Increased Costs

This last statement establishes the role data management plays in many cost-saving initiatives. Despite the perceived tenuous connection between data quality management and the bottom line, business processes may fail to operate at peak efficiency due to poor data quality. Assessing the quality of data is a critical first step. If opportunities for improvement depend on access to high-quality data, then improving data quality management becomes a necessity for analyzing and optimizing operational activities. To illustrate, let's consider some common scenarios in which data issues negatively affect business expenses.

Bids and Proposals

Competitive sourcing requires that the actual costs of delivering a manufactured product, a service or a combination of the two be determined. Once those costs are evaluated, the proposal cost is determined by adding an acceptable margin on top of the cost. The bids and proposals (B&P) team then considers the context: whether the proposal is within a range that would be acceptable to the customer and any potential aggravating circumstances that could put the project at risk.

The B&P process needs reliable data to understand the operational costs for delivery, such as the cost of raw materials, manufacturing and just-in-time inventory requirements. In turn, analyzing the actual costs attributable to the customer (from estimates of direct costs and materials, as well as the logistics and delivery of materials to the right places at the right times) depends on a level of precision and detail that can be derailed if the right data is missing. Better data means better predictions for proposed projects, which means lower direct costs. Reduced costs and more efficient project planning increase the likelihood of delivering the results early and under budget, thereby increasing the margin.

Product Research and Development

Businesses often depend on a cyclical understanding of the relationships between their customers and the products those customers buy (or don't buy) in order to plan, design and market the next iteration of products and services. Knowing which customers buy which products greatly influences the next design cycle. High-quality transaction data about customers (and customer profiles) and their product purchases must be available, because invalid or incorrect data will skew the perception of customer-product affinity.

Designers must also consider which raw materials and components are necessary for any proposed product design, since any new ones require additional sourcing and procurement. However, large organizations with different design teams may not be aware of each other's plans, which could lead to inefficiencies in the design process, such as:

- Different teams may attempt to purchase the same components from different suppliers and end up paying different prices.
- Different teams may attempt to purchase the same components from the same suppliers and end up paying different prices (i.e., out-of-contract or maverick spending).
- Some teams may end up building their own components while others acquire them from suppliers.

High-quality data will narrow the focus of new product design and, consequently, reduce the costs of developing, marketing and managing product lines that are not successful. In addition, sharing master data about available materials and components will reduce the costs and time associated with new product design. The result is a more focused line of goods with lower operational costs for design.

Human Resources

Larger companies often grow by acquisition, but each acquired company may have its own organizational resource and staffing systems for managing employees. Inconsistency in capturing an enterprisewide view of employees exposes certain risks and issues. One significant issue is managing talent and skills so that employees gain the additional skills they need during their careers. But incomplete or incorrect information about employees may lead to increased hiring costs and turnover.

For example, the need for an employee with a specific skill set may arise. If the staffing personnel don't know that there are people within the organization who possess that skill set, they may recruit a new employee for that role. As a result, existing staff members may not be used to their full potential, while recruitment and hiring costs increase. Additionally, employees passed over for a plum spot may become disenfranchised, leading to increased turnover, which again increases recruiting and hiring costs. These increased costs are attributable to the lack of high-quality information about employees, but the costs can be reduced if employee information is managed better.

Customer Relationship Management

Poor customer data quality can wreak havoc with any kind of customer relationship management program. A common issue, such as inadvertent duplication of customer records, can obscure the full view of any one customer's interactions with the organization. Once a customer loses trust in the company's ability to serve its customer base, the costs for customer retention, the requirements for sales and marketing, and customer attrition rates all go up. If the costs for new customer acquisition outweigh those of customer retention, then failing to stem customer churn means that the average cost of doing business with each customer increases, with a proportional increase in spending. For instance, a simple exercise of identifying and eliminating duplicate customer data can help maintain a complete view of the customer and potentially increase retention. In this way, improved data quality leads to reduced COGS.

Overhead

All organizations must incur overhead: the indirect expenses that are necessary to run the business but don't contribute to its profitability. While the costs of employees working on client projects are direct costs, other costs that are necessary so those employees can do their jobs (office rent, telephone services, electricity and liability insurance) are overhead. Other examples of overhead include non-reimbursable travel expenses, postage, office supplies, professional services (such as accounting), office equipment leases and taxes.

As the organization grows, so does the need to manage overhead, especially when there are few controls over spending. Often people in one part of the company are unaware of what others are doing when it comes to engaging vendors, negotiating prices and managing the relationships with different suppliers. Multiple, unsynchronized supplier systems compound the problem. When the same suppliers appear multiple times across different systems, they benefit from establishing many relationships, contracts, prices and so forth. In addition, the same types of products and services may be purchased at different prices from different vendors.

Spending analysis is a process that helps evaluate these potential sources of overhead leakage. Yet the inability to determine that two different records refer to the same vendor makes it very difficult to get a comprehensive view of all supplier relationships. Once supplier records have been consolidated into a unified view, the company can better analyze the ways it interacts with those suppliers.

Considerations: Common Data Failures

While the reasons for increased costs in each of these examples differ, there are some commonalities (generic data issues that occur with a degree of regularity) that affect the organization's expenses. Essentially, each issue involves common data failures associated with defined dimensions of data quality:

- Completeness: Information necessary to complete transactions, to perform operations, or to make the proper decisions within the business context is missing or defaulted to an unusable value.
- Accuracy: Information that is inaccurate or is not at the proper level of precision leads to increased reconciliation and manual intervention, thereby reducing throughput and increasing the completion time.
- Uniqueness: Variance in party or entity data (particularly customer, supplier, employee, product and materials data) affects inventory, staffing and effective customer management, leading to increased effort and leakage.

These types of data failures magnify gaps in business processes, but assessing those data failures and eliminating their root causes will remove the barriers to analyzing corporate spending and lead to greater efficiencies; a brief sketch of how these dimensions can be quantified follows the list of services below.

Data Quality Services to Reduce Costs

From an operational standpoint, costs can be reduced using the following data quality management processes and techniques:

- Data quality assessment using data profiling.
- Entity name harmonization and standardization.
- Entity record consolidation using identity resolution, matching and linkage.
- Address standardization and correction.
- Establishing a unified view using master data management.
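To make the three data-failure dimensions concrete, the following is a minimal sketch (Python with pandas, using a hypothetical supplier table and hypothetical column names) of how completeness, a simple validity rule standing in for accuracy, and uniqueness might be quantified. The rules and thresholds an organization would actually apply come from its own business context.

```python
# Minimal sketch: quantifying completeness, a simple validity proxy for
# accuracy, and uniqueness on a hypothetical supplier table (pandas).
import pandas as pd

suppliers = pd.DataFrame({
    "supplier_name": ["Acme Corp", "ACME Corporation", "Globex", None],
    "tax_id":        ["12-3456789", "12-3456789", "98-7654321", ""],
    "postal_code":   ["20850", "20850", "ABCDE", "10001"],
})

# Completeness: share of rows with a usable (non-null, non-empty) value.
completeness = (suppliers.replace("", pd.NA).notna().mean() * 100).round(1)

# Accuracy proxy: share of postal codes matching a simple 5-digit pattern
# (a stand-in for whatever validity rule the business actually defines).
valid_zip_pct = suppliers["postal_code"].str.fullmatch(r"\d{5}").mean() * 100

# Uniqueness: duplicate rate on an identifying attribute such as tax_id.
dup_rate = suppliers.duplicated(subset=["tax_id"]).mean() * 100

print("Completeness (% populated):\n", completeness)
print(f"Postal codes passing the validity rule: {valid_zip_pct:.1f}%")
print(f"Rows sharing a tax_id with an earlier row: {dup_rate:.1f}%")
```

Tracking even simple percentages like these over time gives the spending-analysis effort an objective baseline for prioritizing which data failures to fix first.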

Empirical Data Quality Assessment Using Data Profiling

If poor data quality can create barriers to comprehensive spending analysis, then the first steps to breaking through those barriers are identifying, evaluating and prioritizing potential data anomalies. Empirical analysis of the data provides the first insights that can motivate cost reduction, and that empirical analysis is typically performed using data profiling.

Assessing data quality is a process of analysis and discovery requiring an objective review of the actual data values. Those values are assessed using statistical analysis of data value frequencies, followed by analyst review. This review helps pinpoint instances of flawed data and identifies potential anomalies that are impeding optimal business processes and increasing expenses.

Data profiling uses a set of algorithms for statistical analysis and assessment of the quality of data values within a data set. Data profiling helps explore relationships that exist within and across data sets. For each column in a table, a data profiling tool will provide a frequency distribution of the different values, shedding insight into the type and use of each column. Cross-column analysis can expose embedded value dependencies, while inter-table analysis explores overlapping value sets that may represent foreign key relationships between entities. In this way, profiling can analyze and assess anomalies and build the groundwork for data improvement efforts.

Entity Name Harmonization Using Parsing and Standardization

When analysts are able to describe the different component and format patterns used to represent a data object (person name, supplier name, raw material, employee name, product description, etc.), data quality tools can parse data values that conform to any of those patterns and even transform them into a single, normalized format.

Parsing uses defined patterns managed within a rules engine to distinguish valid and invalid data values. When patterns are recognized, other rules and actions can be triggered, either to standardize the representation (presuming a valid representation) or to correct the values (if known errors are identified). Automated pattern-based parsing will recognize variant string tokens and subsequently reorder those tokens into a standardized format.
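As an illustration of pattern-based parsing and standardization (not the rules of any particular data quality tool), the following minimal Python sketch recognizes supplier-name values that conform to an expected pattern, flags those that do not, and rewrites recognized tokens into a standard form. The pattern and the abbreviation table are hypothetical stand-ins for the rules an organization would manage in its rules engine.

```python
# Minimal sketch of pattern-based parsing and standardization for supplier
# names. The token pattern and abbreviation table are hypothetical examples.
import re

# Map recognized variant tokens to a standard representation.
STANDARD_TOKENS = {
    "inc": "INC", "inc.": "INC", "incorporated": "INC",
    "corp": "CORP", "corp.": "CORP", "corporation": "CORP",
    "co": "CO", "co.": "CO", "company": "CO",
    "ltd": "LTD", "ltd.": "LTD", "limited": "LTD",
}

# A value is treated as "valid" if it is made up of ordinary name characters.
VALID_NAME = re.compile(r"^[A-Za-z0-9&.,' -]+$")

def standardize_supplier_name(raw: str):
    """Return (is_valid, standardized_name) for a raw supplier name."""
    value = raw.strip()
    if not value or not VALID_NAME.match(value):
        return False, value              # unrecognized pattern: flag for review
    tokens = re.split(r"[\s,]+", value)
    normalized = [STANDARD_TOKENS.get(t.lower(), t.upper()) for t in tokens]
    return True, " ".join(normalized)

for name in ["Acme Corporation", "ACME corp.", "Acme Corp", "A©me ###"]:
    print(name, "->", standardize_supplier_name(name))
# The first three variants all standardize to "ACME CORP"; the last is flagged.
```

Once variant names collapse to a single standardized form, downstream matching and consolidation become far simpler, which is exactly why parsing and standardization precede identity resolution in the sequence below.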

Entity Record Consolidation Using Identity Resolution, Matching, and Linkage

The primary challenge of entity identification and resolution is twofold. Sometimes multiple data instances actually refer to the same real-world entity, while at other times it may appear that a record does not exist for an entity when, in fact, it does. Both of these conditions are related to the same fundamental issues. In the first situation, similar, yet slightly variant, representations of data values may have been inadvertently introduced into the system. In the second situation, a slight variation in representation prevents the identification of an exact match of the existing record in the data set.

Both of these issues are addressed through identity resolution, which is another name for approximate matching and record linkage. In this process, the degree of similarity between any two records is scored using weighted approximate matching between a corresponding set of mapped attribute values. When the match score exceeds a specific threshold, the two records are assumed to be a match; if the score is below an established threshold, the two records are assumed not to match. Scores that fall between the two thresholds are sent to subject-matter experts for review. Identity resolution is used to recognize when only slight variations suggest that different records are connected and where values may be cleansed.

There are different approaches to matching: a deterministic approach relies on a broad knowledge base for matching, while a probabilistic approach uses statistical techniques that contribute to the weighting and similarity scoring process. Identifying similar records within the same data set probably means that the records are duplicates and may be cleansed or removed. Identifying similar records in different sets may help resolve duplicated data across applications, thus eliminating variant representations of products, customers, employees or any other set of entities.

Address Standardization and Correction

Address data quality is critical for businesses that depend on the delivery of products to specific locations or for traditional marketing outreach to customers and prospects via mailings. Many countries have postal standards, and increasingly these standards are being used for typical mailings and for cleansing and identity resolution. The United States Postal Service's standards are well-defined and provide an easy template for address correction. An address standardization process must incorporate:

- A method to validate whether an address is already in standard form.
- Parsing and standardization tools to identify each address token.
- A method to disambiguate data elements that might cause a conflict (e.g., directional words such as "West" or "NW" as part of a street name, or the difference between "ST" as an abbreviation for "street" or for "saint").
- A method to map strings to standard abbreviations.
- Data normalization to reformulate the address tokens into a standard form.

This process builds on earlier capabilities and integrates corrective actions to resolve inconsistent or extraneous address tokens. The result: a clean address and a lower likelihood of incurring repeat delivery costs after attempts to deliver to the wrong location.
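The following is a minimal sketch of address token standardization along the lines of the steps listed above. The abbreviation tables and the rule for disambiguating "ST" are greatly simplified, illustrative stand-ins for a real postal standard such as the USPS's, not an implementation of it.

```python
# Minimal sketch of address token standardization. The abbreviation tables
# and the "ST" disambiguation rule are simplified, illustrative examples.
import re

SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
              "NORTHWEST": "NW", "NORTHEAST": "NE",
              "SOUTHWEST": "SW", "SOUTHEAST": "SE"}

def standardize_street(line: str) -> str:
    """Uppercase, strip periods, and map tokens to standard abbreviations."""
    tokens = re.split(r"\s+", line.strip().upper().replace(".", ""))
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "ST" and nxt and nxt.isalpha() \
                and nxt not in SUFFIXES and nxt not in DIRECTIONS:
            # "ST" followed by a name token reads as "SAINT" (e.g., ST PAUL);
            # a trailing "ST" is kept as the street-suffix abbreviation.
            out.append("SAINT")
        elif tok in SUFFIXES:
            out.append(SUFFIXES[tok])
        elif tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])
        else:
            out.append(tok)
    return " ".join(out)

for addr in ["123 West Elm Street", "1200 St Paul St NW"]:
    print(addr, "->", standardize_street(addr))
# 123 West Elm Street -> 123 W ELM ST
# 1200 St Paul St NW  -> 1200 SAINT PAUL ST NW
```

In practice, the validation, disambiguation and abbreviation rules come from the postal authority's reference data rather than from hand-written tables like these.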

Establishing a Unified View Using Master Data Management

Not only are identity resolution failures among the most common data issues and pervasive across all industries, they also lead to the greatest business impact, especially when attempting to use customer data, both operationally and from an analytic standpoint. By applying these data quality techniques together, you can configure an environment with a unified view of the customer data (or product data, employee data, or any master data object) that is currently managed by different applications using different data stores across the enterprise.

Master data objects are those core business objects that are used in different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies. Master data objects are the things upon which successful business operations rely: the things that are logged in our transaction systems, drive operations, are measured and reported on in our reporting systems, and are analyzed in our analytical systems.

Master data management (MDM) is essentially a data quality management program that incorporates the business applications, methods and tools to implement the policies, procedures and infrastructure that support the capture, integration and subsequent shared use of accurate, timely, consistent and complete master data. While the scope of the infrastructure to implement MDM may be relatively broad, it cannot be executed without data quality techniques.

For example, to create a unified view of the customer, it is necessary to find all records associated with a unique entity that reside in different systems. Variance in representations, spelling and formats means that the unification process cannot be limited to exact duplicate matching. Instead, it relies on parsing, standardization and normalization of name strings, as well as approximate matching and linkage to collate identifying attributes and to connect similar records. The MDM architecture then allows similar records to be mapped into a single virtual representation. Extending the process allows you to map customers to products, or employees to skills, which is the first step in building the ever-elusive 360-degree view. This unified view can help reduce costs for sales and marketing, improve customer self-service, and reduce recruitment and hiring costs.
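To give a flavor of how these capabilities combine (an illustrative sketch, not a description of any particular MDM product), the following Python example normalizes customer names from two hypothetical source systems, scores pairwise similarity, and applies the two-threshold approach described earlier to decide which records can be linked into a unified view. The attributes, weights and thresholds are assumptions made for the example.

```python
# Minimal sketch of building a unified customer view from two hypothetical
# source systems: normalize names, score similarity, and link records whose
# score clears a match threshold. Thresholds and weights are illustrative.
from difflib import SequenceMatcher

crm_records = [
    {"src": "CRM",     "id": "C-101", "name": "Jon A. Smyth", "zip": "20850"},
]
billing_records = [
    {"src": "BILLING", "id": "B-977", "name": "SMYTH, JON",   "zip": "20850"},
    {"src": "BILLING", "id": "B-978", "name": "Jane Doe",     "zip": "10001"},
]

def normalize(name: str) -> str:
    """Crude name normalization: uppercase, strip punctuation, sort tokens."""
    tokens = name.upper().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(t for t in tokens if len(t) > 1))

def similarity(a: dict, b: dict) -> float:
    """Weighted score: mostly name similarity, plus a small boost for ZIP."""
    name_score = SequenceMatcher(None, normalize(a["name"]),
                                 normalize(b["name"])).ratio()
    zip_score = 1.0 if a["zip"] == b["zip"] else 0.0
    return 0.8 * name_score + 0.2 * zip_score

MATCH, REVIEW = 0.85, 0.65   # two thresholds: auto-match vs. expert review

for a in crm_records:
    for b in billing_records:
        score = similarity(a, b)
        if score >= MATCH:
            print(f"Linked {a['id']} <-> {b['id']} (score {score:.2f})")
        elif score >= REVIEW:
            print(f"Send {a['id']} / {b['id']} to a data steward "
                  f"(score {score:.2f})")
        else:
            print(f"No link between {a['id']} and {b['id']} "
                  f"(score {score:.2f})")
```

The linked pairs are what an MDM hub would then map into a single virtual representation of the customer, which is the unified view the preceding section describes.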

Summary

Organizations that rely on information to successfully run their businesses should be aware of the ways that flawed data can increase costs. When times are tough and the boss is looking to reduce expenses, or when times are good and management is seeking greater margins and increased profits, having a process to establish reliable data will reveal opportunities where low investments can lead to high returns.

Whether you seek to reduce COGS, overhead, general and administrative costs or even direct costs, be prepared to see how data errors wreak havoc with the desired results. Data errors are not uncommon, and there are well-defined approaches to analyzing their root causes and eliminating their sources. Applying data quality best practices to address commonly occurring data issues will ultimately alleviate the pain those errors cause.

About the Author

David Loshin, President of Knowledge Integrity Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. Loshin is a prolific author regarding data management best practices and has written numerous books, white papers and Web seminars on a variety of data management topics. His book Business Intelligence: The Savvy Manager's Guide has been hailed as a resource that allows readers to gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together. His book Master Data Management has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. Loshin is also the author of the recent book The Practitioner's Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.

About SAS

SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Through innovative solutions, SAS helps customers at more than 65,000 sites improve performance and deliver value by making better decisions faster. Since 1976, SAS has been giving customers around the world THE POWER TO KNOW.

SAS Institute Inc. World Headquarters: +1 919 677 8000
To contact your local SAS office, please visit: sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2013, SAS Institute Inc. All rights reserved. 106044_S118298_1213