Best Practices for Maximizing Data Performance and Data Quality in an MDM Environment
Today's Speakers
- Ed Wrazen, VP Product Marketing, Trillium Software
- Rich Pilkington, Director Product Marketing, Syncsort Inc.
About Syncsort and Trillium

Syncsort
- 40+ years of software development and performance innovation
- Worldwide offices in Europe and 200+ resellers/distributors globally in 68 countries
- 3 product families:
  - Data Integration: DMExpress
  - Data Protection: Backup Express (BEX)
  - Data Sorting: SyncSort
- 39 years of profitability
- 11,500 product licenses worldwide
- Recent press:
  - World record: 5.4 TB in 1 hr, broke gb/s barrier
  - "Customer references commonly report that DMExpress significantly decreases processing times for transformation tasks, especially sorts, joins & aggregations." Ted Friedman, Gartner
  - "Payback from the deployment of DMExpress was less than two months and over 100% ROI." Kevin Hagedorn, DB Architect, Merkle (#1 Database Marketing Service Provider, 2008 Forrester Wave Report)

Trillium Software
- A business unit of Harte-Hanks: NYSE listed (HHS), $1+ B revenue, 6,000 employees worldwide
- Worldwide offices in Europe, Asia-Pacific, Americas
- Went to market in 1993
- Double-digit profitable growth for 15 years
- Trillium Software System v12 for Data Profiling, Data Quality and Data Quality Dashboard
- Recent press:
  - Forrester Wave: Leader
  - Gartner Magic Quadrant: Leader
  - Current Analysis: "Very threatening to competition"
  - Bloor Research (UK): Champion for Data Profiling, Data Discovery, Data Quality Platform and Data Cleansing
Agenda
- Common Challenges of MDM and Data Management
- Proposed Best Practices
  - Data Quality
  - Data Performance
- Overview of Syncsort & Trillium Software Joint Solution
- Case Study: Insurance Industry
- Conclusions
Your Business Challenge
If you can't access the information your customer provided on the last call, why do you believe they will keep doing business with you?
- Minimize wastage and overpayment in your supply chain
- Ensure complete and timely billing
- Meet compliance and privacy regulations
- Improve accuracy, transparency and consistency of financial reports
- Avoid business dealings with fraudulent or risky customers
- Improve processes for selling to, servicing and interacting with customers
You need complete real-time views of people, organizations, products and assets from data dispersed across multiple data sources and applications.
Understanding MDM
Master Data: controlling values that uniquely define the objects (people, places, and things) that provide context for transactions
Master Data Management (MDM): processes and technologies to (1) support the global identification, linking and synchronization of master data across heterogeneous data sources via semantic reconciliation of reference data, and (2) create and manage a central database system of record*
Implies developing a system that provides:
- Data validation, standardization and semantic reconciliation
- Automated identification & transformation
- Support for manual exception processing
- Data movement and synchronization
- Persistent data storage
* Adapted from Gartner Research 2006
The MDM Ecosystem
Common Challenges Market Trends Exploding Data Volumes Service Levels Staffing Limitations Increasing Data Complexity Enterprise Solutions Best of Breed Economic Impact Deployment Timelines Ease of Use = staff productivity TCO / ROI
Impact of Exploding Data Volumes
[Chart: Increasing Data Volumes vs. Hardware Speed/Capacity. THE GAP IS GROWING!!!]
- Hardware (and other resources) simply cannot keep up.
- Due to business and operational requirements, batch windows are shrinking.
- SO, next quarter's window has to deal with both MORE DATA and a smaller window.
Mark Madsen, TDWI (Source: Winter Top 10 and Customer Reports)
Increasing Data Complexity and we wonder why!!!
Economic Impact: Total Cost of Ownership (TCO)
[Chart: COST over TIME, comparing "Typical Reality" with "Typical Goal"; labelled regions: Deployment Timelines, Scalability Cost]
- Operational Efficiency: to provide nimble solutions with low deployment timelines and costs
- Performance: to address immediate requirements to consolidate and cleanse large, complex amounts of data at high speed
- Scalability: to meet the performance needs of growing data volumes, increasing data complexity, and looming economic challenges
Proposed Best Practices - Data Quality
Getting By or At Risk?
Data Quality Services
Is Data Quality Included in MDM?

Profiling
  Trillium Software: Highly interactive data analysis workbench; rich views of patterns, relationships and anomalies in the data
  Typical MDM Platform: Not included

Automated Identification & Routing
  Trillium Software: Deductive approach based on all available data; out-of-box rules
  Typical MDM Platform: Deterministic logic based on a limited set of fields; rules must be created

Context-Sensitive Data Cleansing
  Trillium Software: Context-sensitive; out-of-box rules
  Typical MDM Platform: One-to-one substitution based on literal values; rules must be created

Verification & Enrichment
  Trillium Software: Address verification & geo-coding out-of-box; easy to append other enrichment data
  Typical MDM Platform: Not included

Matching
  Trillium Software: Robust matching based on tunable & auditable rules; highly granular, multi-matching options
  Typical MDM Platform: Capabilities vary from very basic (exact) matching to fairly robust fuzzy logic

Merging & Survivorship
  Trillium Software: Robust selection of surviving values on field-by-field basis
  Typical MDM Platform: Capabilities vary from very basic logic to select surviving record to fairly robust fuzzy logic
Trillium Software System
- Discover anomalies, rules, relationships and meanings in existing datasets
- Define data metrics and targets
- Relate metrics to business impact
- Trending and Scorecard reporting: Red, Amber, Green conditions
- Apply robust out-of-the-box rules to standardize data from around the globe
- Enhancement with additional attributes such as geocodes, product classifications, etc., including using external sources
- Automatic de-duplication, relationship linking and merging based on transparent, tunable rules
Country-Specific Standardization
How to repair and make sense of legacy data

Legacy record:
  Name1: Flugtaggen GMBH
  Name 2: rhamer strasse 20
  Address: dus
  City/Town: 40489
  Post Code: Werner Schmidt
  Country:

Value Added for MDM
- Fully automate data cleansing
- Apply country intelligence (names, geographic, etc.)
- Standardize critical data elements
- Context-sensitive data interpretation
- Enrich data (geocoding, etc.)

Standardized record:
  Business Name: Flugtaggen GMBH
  Contact Name: Werner Schmidt
  Street Name: Rhamer
  Street Type: Str.
  Street Number: 20
  City/Town: Düsseldorf
  Post Code: 40489
  Country: DE

Increased accuracy = better business processes & better matching
How Standardization Helps Matching

Original Record 1
  Name: Peggy Smith
  Address: 345 6th Ave
  City: NY
  State: NY
  Zip: 01012
  Country:

Original Record 2
  Name: Margaret Smith
  Address: 345 Avenue of the Americas
  City: Manhattan
  State: NY
  Zip: 1012
  Country: USA

Standardized Record 1
  Root First Name: Margaret
  Last Name: Smith
  Address: 345 Ave of the Americas
  City: New York
  State: NY
  Post Code: 01012-3821
  Country: USA

Standardized Record 2
  Root First Name: Margaret
  Last Name: Smith
  Address: 345 Ave of the Americas
  City: New York
  State: NY
  Post Code: 01012-3821
  Country: USA
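The effect on matching can be shown with a minimal Python sketch. It is not Trillium's rule engine: the lookup tables (`NICKNAMES`, `STREET_ALIASES`, `CITY_ALIASES`) and the `standardize` helper are hypothetical names invented for this illustration, and the sketch does not append the ZIP+4 suffix shown on the slide.

```python
# Toy lookup tables: illustrative assumptions only, not Trillium rule data.
NICKNAMES = {"peggy": "margaret"}
STREET_ALIASES = {
    "6th ave": "ave of the americas",
    "avenue of the americas": "ave of the americas",
}
CITY_ALIASES = {"ny": "new york", "manhattan": "new york"}


def standardize(record):
    """Return a new dict with each field mapped to a canonical form."""
    first, last = record["name"].lower().split(maxsplit=1)
    number, street = record["address"].lower().split(maxsplit=1)
    city = record["city"].lower()
    return {
        "first": NICKNAMES.get(first, first),
        "last": last,
        "address": f"{number} {STREET_ALIASES.get(street, street)}",
        "city": CITY_ALIASES.get(city, city),
        "state": record["state"].upper(),
        "zip": record["zip"].zfill(5),  # restore a dropped leading zero
    }


# The two original records from the slide (Country omitted for brevity).
record_1 = {"name": "Peggy Smith", "address": "345 6th Ave",
            "city": "NY", "state": "NY", "zip": "01012"}
record_2 = {"name": "Margaret Smith", "address": "345 Avenue of the Americas",
            "city": "Manhattan", "state": "NY", "zip": "1012"}

raw_match = record_1 == record_2
standardized_match = standardize(record_1) == standardize(record_2)
print(raw_match, standardized_match)
```

Compared field by field, the raw records differ in name, address, city and zip; after standardization an exact comparison is enough to link them.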
Matching, Merging & Survivorship
Intelligently identify links and relationships, consolidate data with precision

Source records:
  Date       | First  | Last    | Phone          | Email                 | Source
  08/02/00   | Art    | Barrios |                | bigwheels@hotmail.com | WEB
  12/02/2005 | A.     | Barros  | 908-845-1234   | abarrios@accen.com    | CRM
  6/17/2003  | Arthur | Barrios | (902)-845-4417 | abarrios@accen.com    | SAP

Specific matching routines (one per column): Date, First Name, Last Name, Ignore Punctuation, Absolute
Distinct survivorship routines (one per column): Most Recent, Complete, Most Common, Most Recent, Best Source

Surviving record:
  12/2/2005  | Arthur | Barrios | 908-845-1234   | abarrios@accen.com

Flexibility for creating cleansed, standardized, consolidated views from multiple sources
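Field-by-field survivorship can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Trillium implementation: the pairing of each routine with a column follows the order in which the slide lists them, "Complete" is read as "longest non-empty value", and the `SOURCE_RANK` ordering is a hypothetical trust ranking.

```python
from collections import Counter
from datetime import date

# The three source records from the slide; None marks the missing phone number.
records = [
    {"date": date(2000, 8, 2), "first": "Art", "last": "Barrios",
     "phone": None, "email": "bigwheels@hotmail.com", "source": "WEB"},
    {"date": date(2005, 12, 2), "first": "A.", "last": "Barros",
     "phone": "908-845-1234", "email": "abarrios@accen.com", "source": "CRM"},
    {"date": date(2003, 6, 17), "first": "Arthur", "last": "Barrios",
     "phone": "(902)-845-4417", "email": "abarrios@accen.com", "source": "SAP"},
]

# Hypothetical trust ranking of sources (lower is better); not from the slide.
SOURCE_RANK = {"CRM": 0, "SAP": 1, "WEB": 2}


def most_recent(recs, field):
    """Value of `field` from the newest record that has one."""
    candidates = [r for r in recs if r[field] is not None]
    return max(candidates, key=lambda r: r["date"])[field]


def most_complete(recs, field):
    """Longest non-empty value of `field`."""
    return max((r[field] for r in recs if r[field]), key=len)


def most_common(recs, field):
    """Most frequently occurring non-empty value of `field`."""
    return Counter(r[field] for r in recs if r[field]).most_common(1)[0][0]


def best_source(recs, field):
    """Value of `field` from the most trusted source that has one."""
    candidates = [r for r in recs if r[field] is not None]
    return min(candidates, key=lambda r: SOURCE_RANK[r["source"]])[field]


surviving = {
    "date": most_recent(records, "date"),
    "first": most_complete(records, "first"),
    "last": most_common(records, "last"),
    "phone": most_recent(records, "phone"),
    "email": best_source(records, "email"),
}
print(surviving)
```

With these rules the surviving values are the ones shown on the slide: 12/2/2005, Arthur, Barrios, 908-845-1234, abarrios@accen.com.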
Enterprise Data Quality Services
Proposed Best Practices - Data Performance
Data Performance and MDM
Why Data Performance is important

Where do we find Data Performance solutions?
- Data Integration / ETL
- Component of an MDM Package
- Part of a BI Tool
- Hand coding
- OTHER?

Requirement: Solution is able to
- Data Consolidation: Handle high batch data volumes that support large-scale data consolidation as data volumes expand while MDM efforts kick off new projects.
- Flexibility: Support open standards and be configurable to unique business processes, but not mandate changes to applications and processes to accommodate a product suite.
- Time to Value and Risk: Be easily customized based on project requirements, provide predefined integration points to complementary technologies such as EAI, ERP, CRM, etc. as well as third-party reference databases.
- Evaluation of TCO: Offer the lowest TCO with out-of-the-box functionality and capability to manage customer, product, account and location as well as other key sources and targets.
Key Elements of Data Performance
What makes a solution fast at the lowest TCO

HOW? Accomplished through:
- Algorithm Design: library of algorithms, including performance expertise
- Architecture Exploitation: knowledge of platform-specific optimizations
- Dynamic Optimization: leverage proprietary Syncsort technology to monitor and optimize performance
- Constant Benchmarking: rigorous tests against new platforms for continued improvement

IMPACT: enable customers to process massive data volumes on inexpensive, commodity hardware with:
- minimal elapsed time
- minimal resource footprint

[Diagram: Performance Acceleration Engine. Job Specifics: data characteristics, available resources, platform architecture]
Where Data Performance Fits
Production/Data Management Life Cycle: Incumbent Data -> Fit For Purpose Data
[Diagram. Source labels: CICS/IMS, Operational, Flat, Customer File, SQL Server. Data Performance steps: Extract, Sort, Join, Process & Call, Post Process, Aggregate, Load. Total Data Quality (Trillium Software) steps: Parse, Enrich, Match, Survive, with Data Exceptions and MONITORING. Target label: Our Client(s) Server]
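Three of the steps named on this slide (Sort, Join, Aggregate) can be illustrated with a toy pipeline in plain Python. This is a teaching sketch only: it does not use or imitate DMExpress, and the sample data and the `sort_merge_join` helper are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical extracted data: (customer_id, name) and (customer_id, amount).
customers = [(3, "Schmidt"), (1, "Smith"), (2, "Barrios")]
payments = [(2, 40.0), (1, 15.5), (2, 10.0), (3, 7.25), (1, 4.5)]


def sort_merge_join(left, right):
    """Sort both inputs on the key, then join them in a single forward pass.

    Assumes keys are unique on the left side.
    """
    left = sorted(left, key=itemgetter(0))
    right = sorted(right, key=itemgetter(0))
    joined, i = [], 0
    for key, value in right:
        # Advance the left cursor until it reaches or passes the right key.
        while i < len(left) and left[i][0] < key:
            i += 1
        if i < len(left) and left[i][0] == key:
            joined.append((key, left[i][1], value))
    return joined


# Sort + Join
joined = sort_merge_join(customers, payments)

# Aggregate: total amount per customer (input is already sorted by key).
totals = {
    (key, name): sum(amount for _, _, amount in rows)
    for (key, name), rows in groupby(joined, key=itemgetter(0, 1))
}
print(totals)
```

Because both inputs are sorted first, the join and the aggregation each need only one pass over the data, which is why sort performance tends to dominate this kind of batch job.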
Overview of Syncsort & Trillium Software Joint Solution
Positioning of Joint Solution
[Diagram. Inputs: Web and E-mail; Retail and wholesale; Phone, Fax and Mail. Flow: Data Profiling -> DMExpress -> Data Cleansing and Consolidating -> DMExpress (Fastest DI). Output: Consolidated Customer, Sales, Marketing Data Marts, updated weekly / daily or more often]
Syncsort DMExpress and Trillium Software give businesses an end-to-end data transformation process that builds best-of-breed data quality into best-of-breed data performance to create accurate, unified views of business entities, built from multiple data formats and sources.
Technical Snapshot of Joint Solution
- A DMX Task invokes Trillium. Then, a DMX Extract Function moves data into the final transformation.
- Enables customers to develop within each UI, but process the data as if they had one tool.
[Diagram labels: Function, Task, Job]
Case Study
Case Study: Insurance Industry

The Company:
- Largest personal and group insurance provider in region, over 500,000 members
- Full range of insurance products
- Award-winning service/call center platform

The Problem:
- No Data Governance process or Data Stewardship
- The source data was not fully understood
- Metadata was incomplete, inaccurate
- Complexity was underestimated, rework unpredictable, manual analyses
- Integrate over 10 corporate and over 50 department-specific applications
- Ensure that their country Post standards were met
- Reduce the number of ID cards and duplicate names polluting the system
- Decrease and control the exploding operational costs

Desired Solution:
To implement a solution that would integrate and build an enterprise-wide insurance processing system to support the award-winning service/call center platform
After applying best practices

Results:
They are now able to cleanse and standardize data as it enters their enrollment application. Because duplicates never enter their environment, they have shorter, more effective service calls with their customers while maintaining clean data in their CRM and processing systems.

As a result:
- Identify and append missing names
- Locate consolidated customer benefit information
- Real-time cleansing and matching: cleansed once, at the source
- Data seamlessly delivered to targets well within business service levels
- Facilitate conversion of legacy data into their processing system
- Over 96% address accuracy, associated with postal discounts
- Provide a sound, straightforward governance and stewardship process
- Establish a scalable solution to handle today's and tomorrow's data volumes
- Lower total cost of ownership (TCO) than any other solution considered
Conclusion
In conclusion
Best Practices: blend of data quality and performance
- Ensure you have what you need to control exponentially increasing data volumes
- Follow a straightforward process that identifies, monitors and ultimately simplifies data complexity
- Leverage solutions that can be deployed and maintained with low TCO and high ROI
Questions?
Contact Us
- Trillium Software: 978-436-8900, www.trilliumsoftware.com
- Syncsort: 877-FAST-951 (877-327-8951), www.syncsort.com