How To Improve Product Data Quality




Three Critical Steps to Improving Product Data Quality
A DataFlux White Paper
Prepared by Jim Harris

Introduction

Convincing your organization to view data as a strategic corporate asset and, by extension, data quality as a strategic corporate discipline can be challenging. The relationship between business processes and the data used and created by those processes is not always obvious and tangible. In other words, how does the organization's data affect its business decisions and its ability to succeed? Since the strategic importance of one corporate asset has never been in question, namely the products your organization sells, the data describing those products must be of sufficient quality to support optimal business performance, right?

Let's imagine you work for Acme Foods and are making a presentation to executive management about the need for improvements in product data quality. You tell the eight executives in the room that each has on the table in front of him a different product from a current list of Acme Foods Top 100 Best Selling Products. The executives are confused because they all have the same kind of candy bar in front of them. Each has a card attached with a number and some text written on it. You explain that the number is the sales rank and the text is the product description, which was copied directly from the Acme Foods master product catalog. They pass their candy bars around the room, pausing to read the attached cards.

After a few minutes, you display the following chart as your only presentation slide:

You briefly point out a few of the obvious product data quality issues:

- Numerous variations in the official brand name (E<3MC²), which stands for Everybody Loves Milk Chocolate Squared
- Six duplicate records describe one product (excluding #15 and #55), meaning (at least) six of the top 100 best sellers are actually the same product
- #15 is not a duplicate because of a different unit count based on packaging; #55 is not a duplicate because of a different unit size (i.e., it is a bag of ten smaller chocolate squares instead of one larger chocolate square candy bar)

Business Impacts of Poor Product Data Quality

Confronted with a tangible demonstration of the product data quality issues plaguing Acme Foods, the executives begin discussing some of the business impacts:

- Sales Forecasting: Incorrect sales numbers negatively impact the ability to predict sales trends as well as plan future product marketing and promotions
- Spend Analysis: Incorrect sales numbers also negatively impact the procurement planning for purchasing the raw materials to make the products being sold
- Supply Chain Optimization: Incorrect procurement levels trigger manufacturing disruptions and inefficiencies throughout the supply chain
- Inventory Management: Incorrect inventory levels cause order fulfillment delays in distribution channels, leading to delayed revenues or lost sales

In far more general terms, the bottom line is that poor product data quality:

- Increases costs
- Decreases revenue
- Increases risks
- Disrupts daily operations
- Causes bad tactical business decisions
- Undermines strategic corporate planning

Even though Acme Foods prides itself on excellent business process management as well as hiring and then investing in great people and implementing the latest technology, none of these best practices can save it from the havoc wreaked by poor data quality. Data must be viewed as a strategic corporate asset and, by extension, data quality as a strategic corporate discipline, because high-quality data serves as a solid foundation for success, enabling better business decisions and optimal business performance.

Congratulations! The Acme Foods executives just approved a product data quality improvement project. Now what? How will you approach this daunting challenge?

Using Acme Foods as a fictional case study, this white paper will describe a general approach for planning your organization's efforts to improve product data quality. It will provide a data-example-driven perspective of some of the unique challenges of product data quality, as well as discuss and demonstrate the three critical steps to improving product data quality.

Unique Challenges of Product Data Quality

Product data presents some challenges that are different from other data domains. The first unique challenge of product data quality is that "product" is a generic term that can mean many different things. For example, a product could refer to:

- Raw materials used to manufacture products, e.g., the cocoa beans that Acme Foods purchases as a raw material for manufacturing chocolate
- Semi-finished goods from an intermediate stage of product development, e.g., the couverture chocolate that Acme Foods uses to make candy bars
- Finished goods, i.e., stock-keeping units (SKUs), which may be a single product, a package containing several products, or multiple products within the same brand based on packaging variations in the unit size and unit type

[Figure: Example of Packaging Variations]

Other data domains, such as Customer Name and Postal Address, have a relatively small set of easily defined and recognized data attributes and data quality standards. (However, these definitions and standards are not always consistently enforced.) But the complex product supply chain includes manufacturers, distributors, suppliers, wholesalers, retailers and other vendors. All of these organizations typically maintain their own product catalogs, often with inconsistent data quality standards.

There are some standards for product data quality, but they are not yet as widely adopted as standards for other data domains. Examples of these standards include:

- United Nations Standard Products and Services Code (UNSPSC): defines over twenty thousand categories of common commodities and services
- Uniform Code Council (UCC): specializes in data standards for bar codes and electronic data interchange (EDI), primarily for North America
- European Article Numbering (EAN): European standards similar to UCC
- EPCglobal: collectively established by UCC and EAN to develop standards for electronic product codes (EPC) and radio-frequency identification (RFID)
- Universal Product Code (UPC): worldwide bar code standard for the electronic identification of containers, pallets, cases, products and SKUs

These standards can assist with establishing consistent product descriptions and assigning unique product identifiers. However, these identifiers can suffer from the same data entry errors and data formatting variations as identifying attributes for other data domains. Also, these identifiers may not always be available and could be replaced with proprietary product identifiers, or even database surrogate keys.

Therefore, effectively implementing these or other product data standards often requires matching based on product description, which is usually unstructured, meaning that most product data attributes are buried within a free-form text field. And when you are creating your own product data standards, or receiving third-party product data that follows a different standard (or none at all), recognizing and extracting product data attributes from a free-form text field will be your primary task.

Therefore, categorizing, standardizing and matching product descriptions are three fundamental challenges to overcome when improving product data quality. Data quality tools provide considerable assistance with these challenges. However, compared to other data domains, a product data quality project will typically require more customization of what the data quality tool provides out of the box. Most of the customization effort is teaching the tool how to understand what are essentially the vocabulary, spelling and grammar of the product data language.
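As a small illustration of how data entry errors in a product identifier can be caught, the sketch below validates the check digit of a 12-digit UPC-A code. The check-digit rule is the standard UPC-A one; the function names are hypothetical and are not part of any data quality tool discussed here.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit for the first 11 digits of a code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_eleven]
    # Positions 1, 3, 5, ... (1-based) are weighted by 3; the rest by 1.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Return True if the code is 12 digits and its last digit is the check digit."""
    cleaned = code.replace(" ", "").replace("-", "")
    if len(cleaned) != 12 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    return upc_a_check_digit(cleaned[:11]) == int(cleaned[11])


print(is_valid_upc_a("036000291452"))  # True
print(is_valid_upc_a("036000219452"))  # False: two adjacent digits transposed
```

A check like this catches many single-digit typos and adjacent transpositions, but it says nothing about whether the identifier is attached to the right product, which is why description-based matching is still needed.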

Improving Product Data Quality

The three critical steps to improving product data quality are:

1. Categorization
2. Standardization
3. Matching

The remainder of this white paper will discuss and demonstrate these concepts from a data-example-driven perspective using the fictional products of Acme Foods.

Categorization

Determining the product category is an important first step because the category provides context for the product description, where the same words, abbreviations and symbols can mean something different within different product categories. For example, consider the following product descriptions:

Many large organizations have diverse product catalogs using a complex taxonomy or hierarchy of product categories, which are often managed by different groups of subject matter experts (SMEs). Categories are sometimes keywords that are found within the product description, but most often the category must be extrapolated from a semantic understanding of the product description.

By determining the category for these product descriptions, we can begin to divide and conquer the challenge of improving product data quality by using category as a filter to route records to category-specific standardization processes. Data quality tools provide assistance by parsing the free-form product description to search for the key words, phrases and other logic necessary for categorization.

For simplicity, the data examples we are working with only represent two categories, Candy and Beverage. But simply categorizing all product descriptions containing the word Chocolate as Candy and Sugar as Beverage would improperly categorize both the Chocolate Energy Drink and the Sugar Chewing Gum. Therefore, the automated categorization process provided by the data quality tool has to use natural language processing and instantiate the knowledge of data SMEs. The Acme Foods SMEs have helped us properly categorize the product descriptions:

Please Note: It is a recommended best practice to design your categorization process as a separate function so that the technical processes are aligned naturally with the category-specific business rules provided by the product data SMEs.
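To make the Chocolate Energy Drink and Sugar Chewing Gum pitfall concrete, here is a minimal sketch that contrasts single-keyword categorization with rules keyed on product-type phrases. The rule table and function names are assumptions made for illustration; a real data quality tool would use a much richer vocabulary supplied by the SMEs.

```python
import re

# Hypothetical SME-supplied rules: (category, pattern for product-type phrases).
# Rules are checked in order and the first match wins.
CATEGORY_RULES = [
    ("Beverage", re.compile(r"\b(energy drink|drink|water|soda|juice)\b", re.I)),
    ("Candy", re.compile(r"\b(chewing gum|gum|candy bar|chocolate squares?|candy)\b", re.I)),
]


def naive_categorize(description: str) -> str:
    """Single-keyword approach that the text warns against."""
    text = description.lower()
    if "chocolate" in text:
        return "Candy"
    if "sugar" in text:
        return "Beverage"
    return "Unknown"


def categorize(description: str) -> str:
    """Return the first category whose product-type pattern matches."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(description):
            return category
    return "Unknown"


for text in ("Chocolate Energy Drink", "Sugar Chewing Gum"):
    print(text, "|", naive_categorize(text), "|", categorize(text))
# Chocolate Energy Drink | Candy | Beverage
# Sugar Chewing Gum | Beverage | Candy
```

Keeping `categorize` as its own function mirrors the recommendation above: its result can be used to route each record to a category-specific standardization routine.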

Standardization

Free-form fields often contain numerous variations resulting from data entry errors, different conventions for representing the same value and a general lack of data quality standards. Additional variations are introduced by multiple data sources, each with its own unique data characteristics and data quality challenges. Standardization parses free-form fields to break them down into smaller individual fields to gain improved visibility of the available input data, create a more consistent representation, apply standard values and, when possible, populate missing values.

However, it is important to note that sometimes what appear to be semantic inconsistencies in product data are intentional variations to accommodate such aspects as regional and linguistic differences, as well as special promotions. Therefore, the standardization process should be designed to be as modular as possible to support a plug-and-play approach for various components, just as categorization and standardization were recommended to be separate processes.

The quality of data is determined by evaluating its fitness for the purpose of business use. However, in the vast majority of cases, data has multiple business uses, and data of sufficient quality for one use may not be of sufficient quality for other valid business uses. When the standardization process has a flexible architecture, it is easier to convert among various product data standards and support a wider range of business purposes.

Most of the product attributes in our data examples are stored within the overloaded description field, such as unit count, unit size, unit measure and unit type. Even when the data source contains these attributes as separate fields, they can be sparsely populated or contain defaults or other values conflicting with the content of the product description field.

Our product data standardization process is going to create the following fields:

- Brand: the brand name of the Acme Foods product
- Unit Count: the number of units in the packaged product
- Unit Size: the number associated with the unit of measurement
- Unit Measure: the unit of measurement for the product
- Unit Type: the packaging type of the product
- Product Description: remaining description not covered by the above fields

Please Note: Many additional fields are commonly created when standardizing product data, especially to facilitate improved matching, but this white paper focuses on the above fields for the purposes of demonstrating standardization concepts.

Candy Brands

Let's begin by focusing on only the products in the Candy category:

Our Candy SMEs have highlighted in bold the contents of the product description that are appropriate for the new Brand field we are creating in this two-step process. The first step is to separate the brand name content from the product description:

The second step is to standardize the representation of the brand names:

Please Note: Implement these steps separately to make it easier to apply different standards when appropriate (e.g., using regional brand names in a local language).
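The two-step brand process can be sketched roughly as follows. The brand spellings in the pattern are hypothetical variations invented for illustration (the white paper's example table is not reproduced in this text), and the locale lookup is only an assumption about how regional brand names might be plugged in.

```python
import re

# Hypothetical spellings of the brand that might appear in descriptions.
BRAND_PATTERN = re.compile(
    r"E\s*<\s*3\s*MC\s*(?:\^\s*2|2|²)"
    r"|Everybody\s+Loves\s+Milk\s+Chocolate\s+Squared",
    re.IGNORECASE,
)

# Step 2 lookup: standard brand name per locale (keys are hypothetical).
STANDARD_BRAND = {"default": "E<3MC²"}


def separate_brand(description: str):
    """Step 1: split the raw brand text out of the product description."""
    match = BRAND_PATTERN.search(description)
    if match is None:
        return None, " ".join(description.split())
    remainder = description[:match.start()] + " " + description[match.end():]
    return match.group(0), " ".join(remainder.split())


def standardize_brand(raw_brand, locale: str = "default"):
    """Step 2: map any recognized raw brand text to the standard value."""
    if raw_brand is None:
        return None
    return STANDARD_BRAND.get(locale, STANDARD_BRAND["default"])


raw, rest = separate_brand("E <3 MC2 Candy Bar 2 oz")
print(repr(raw), repr(rest), standardize_brand(raw))
# 'E <3 MC2' 'Candy Bar 2 oz' E<3MC²
```

Because step 2 is a separate function, swapping in a different standard (for example a regional brand name) only requires a different lookup table, not a change to the parsing in step 1.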

Beverage Units

Now let's focus on only the products in the Beverage category, which have already been branded following the same process described in the previous section:

Our Beverage SMEs have highlighted in bold the contents of the product description that are appropriate for the new Unit fields we are creating in this two-step process. The first step is to separate the unit information from the product description:

The second step is to standardize the representation of the unit information:
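A rough sketch of the same two steps applied to unit information is shown here. The abbreviation tables, regular expressions and sample descriptions are all assumptions made for illustration; only the target fields (unit count, size, measure and type) and the default unit count of 1 come from the white paper.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical abbreviation tables; real ones would come from the Beverage SMEs.
MEASURES = {"oz": "OZ", "fl oz": "OZ", "ounce": "OZ", "ounces": "OZ",
            "ml": "ML", "l": "L", "liter": "L", "liters": "L"}
TYPES = {"can": "CAN", "cans": "CAN", "btl": "BOTTLE",
         "bottle": "BOTTLE", "bottles": "BOTTLE"}

COUNT_RE = re.compile(r"\b(\d+)\s*-?\s*(?:pk|pack|ct|count)\b", re.I)
SIZE_RE = re.compile(
    r"\b(\d+(?:\.\d+)?)\s*(fl[.\s]+oz|oz|ounces?|ml|liters?|l)\b\.?", re.I)
TYPE_RE = re.compile(r"\b(cans?|btl|bottles?)\b", re.I)


@dataclass
class Units:
    unit_count: int
    unit_size: Optional[float]
    unit_measure: Optional[str]
    unit_type: Optional[str]


def separate_units(description: str):
    """Step 1: pull the raw unit text out of the description."""
    raw, remainder = {}, description
    for name, pattern in (("count", COUNT_RE), ("size", SIZE_RE), ("type", TYPE_RE)):
        match = pattern.search(remainder)
        if match:
            raw[name] = match.groups()
            remainder = remainder[:match.start()] + " " + remainder[match.end():]
    return raw, " ".join(remainder.split())


def standardize_units(raw: dict) -> Units:
    """Step 2: apply standard values; a missing unit count defaults to 1."""
    count = int(raw["count"][0]) if "count" in raw else 1
    size = measure = unit_type = None
    if "size" in raw:
        size = float(raw["size"][0])
        key = re.sub(r"[.\s]+", " ", raw["size"][1].lower()).strip()
        measure = MEASURES[key]
    if "type" in raw:
        unit_type = TYPES[raw["type"][0].lower()]
    return Units(count, size, measure, unit_type)


raw, rest = separate_units("Sugar Water 6pk 12 fl. oz. cans")
print(rest, standardize_units(raw))
# Sugar Water Units(unit_count=6, unit_size=12.0, unit_measure='OZ', unit_type='CAN')
```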

Please Note: Missing Unit Counts were populated with 1 as their default value, and the remaining content of the original product description has also been standardized.

Before and After Standardization

After applying all of the standardization logic described above, we can easily see the dramatic improvement in the data quality of our product data examples:

Matching

Matching for product data is usually performed either to compare records within and across data sources in order to evaluate whether they correspond to the same product (i.e., are duplicates) or to match records against a standard product reference (e.g., UNSPSC, in order to obtain the product commodity classification code).

Matching often uses standardization to prepare its input. This facilitates a direct evaluation of comparable fields (e.g., brand name to brand name) and more reliable comparisons based on standardized values. It also reduces failures to match records because of data variations, and increases the probability of effective match results.

The standardization of our data examples has normalized the product descriptions to the point that the six duplicate records in the Candy category, which were highlighted in the introduction, can now be easily identified as exact matches:
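Exact matching on standardized fields can be sketched as grouping records by a key built from those fields. The record layout and the sample rows below are hypothetical stand-ins (the ids play the role of sales ranks), not the white paper's actual data.

```python
from collections import defaultdict

MATCH_FIELDS = ("brand", "unit_count", "unit_size",
                "unit_measure", "unit_type", "description")


def match_key(record: dict) -> tuple:
    """Build an exact-match key from the standardized fields."""
    return tuple(record[field] for field in MATCH_FIELDS)


def group_duplicates(records: list) -> list:
    """Return lists of record ids that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical standardized records.
base = {"brand": "E<3MC²", "unit_count": 1, "unit_size": 2.0,
        "unit_measure": "OZ", "unit_type": "BAR", "description": "CANDY BAR"}
records = [
    {**base, "id": 7},
    {**base, "id": 15, "unit_count": 6},   # different unit count: not a duplicate
    {**base, "id": 23},
]
duplicates = group_duplicates(records)
print(duplicates)  # [[7, 23]]
```

This only works because standardization has already removed the spelling and formatting variations; on raw descriptions the same grouping would find few or no duplicates.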

If the six duplicates were consolidated into a single record, then the E<3MC² brand could be properly represented as the following three unique Acme Foods products:

Data quality tools support the advanced duplicate consolidation logic often necessary for selecting or constructing the consolidated record (aka "survivor" or "golden copy").

Obviously, exact matching on rigorously standardized data is neither a recommended best practice nor a limitation imposed by data quality tools, which provide advanced matching techniques for overcoming data variations and other data quality issues. Although those techniques are beyond the scope of this white paper, standardization will still play an important supporting role, especially for improving candidate selection for automated and interactive matching, as well as for searching the product catalog.

Data quality tools also provide some way to rank their match and search results (e.g., numeric probabilities, weighted percentages, odds ratios or confidence levels) as a primary method in differentiating automatic matches, automatic non-matches and potential matches requiring manual review and verification by an SME.

After Matching

After matching has performed duplicate identification and consolidation, the updated Acme Foods product catalog now has dramatically improved product data quality:

Searching and matching against this new internal standard product reference can prevent future duplicates from being added to the Acme Foods product catalog.

Summary

Product data presents some challenges that are different from other data domains. The root cause is often the product description, which is usually unstructured, meaning that most product data attributes are buried within a free-form text field.

This white paper provided a data-example-driven perspective of some of the unique challenges of product data quality, as well as discussed and demonstrated the three critical steps to improving product data quality:

1. Categorization: Organizes product descriptions by category, aligning technical processes and business rules with subject matter experts (SMEs), and routes product descriptions to category-specific standardization rules
2. Standardization: In a two-step process, first separates the content of the product description into new fields, and second applies standard values. Implementing these steps separately makes it easier to apply different standards when appropriate (e.g., regional standards in a local language)
3. Matching: Identifies and consolidates duplicate products within a source, facilitates improved search capability and supports matching against an internal or external standard product reference

The fictional data examples from the Acme Foods product catalog demonstrated that I love Sugar Water and Everybody Loves Milk Chocolate Squared (E<3MC²). But if there is only one fact that you take away from this white paper, let it be this one: Everybody Loves High Quality Product Data.

To learn more about data quality, visit: dataflux.com/knowledgecenter/dq

www.dataflux.com

Corporate Headquarters
DataFlux Corporation
940 NW Cary Parkway, Suite 201
Cary, NC 27513-2792 USA
877 846 3589 (USA & Canada)
919 447 3000 (Direct)
info.us@dataflux.com

DataFlux United Kingdom
Enterprise House
1-2 Hatfields
London SE1 9PG
+44 (0)20 3176 0025
info.uk@dataflux.com

DataFlux Germany
In der Neckarhelle 162
69118 Heidelberg
Germany
+49 (0) 69 66 55 42 04
info.de@dataflux.com

DataFlux France
Immeuble Danica B
21, avenue Georges Pompidou
Lyon Cedex 03
69486 Lyon
France
+33 (0) 4 72 91 31 42
info.fr@dataflux.com

DataFlux Australia
300 Burns Bay Road
Lane Cove, NSW 2066
Australia
+61 2 9428 0553
info.au@dataflux.com

DataFlux and all other DataFlux Corporation LLC product or service names are registered trademarks or trademarks of, or licensed to, DataFlux Corporation LLC in the USA and other countries. Copyright 2010 DataFlux Corporation LLC, Cary NC, USA. All Rights Reserved. Other brand and product names are trademarks of their respective companies.