Mastering Data Management Mark Cheaney Regional Sales Manager, DataFlux
Today, the amount of technical information doubles every two years
It is forecast to double every three days
There are over 31 billion searches on Google every month. Source: Did You Know 3.0 (Fisch, McLeod, Brenman)
In 2006, this number was 2.7 billion. Source: Did You Know 3.0 (Fisch, McLeod, Brenman)
Times Are Changing: 1 out of 4 workers has been in their job less than one year; 1 out of 2, less than five years. The top 10 in-demand jobs today didn't exist in 2004.
Are We Keeping Up? Over 80 US banks failed in 2009. The US government has taken majority ownership of General Motors, Freddie Mac, Fannie Mae, and AIG. Valparaiso, Indiana had an $8M budget shortfall in 2007. US health care is now 17% of personal income. Société Générale lost $7.5B in a 2008 derivatives trading fiasco.
What Does This Have to Do with Data?
Mastering Data Management
Mastering Data Management Is Data a Trusted Business Asset? Is Data Managed Across Your Enterprise?
Data Governance Maturity Model: Sales Force Automation; Data Warehouse; Customer MDM; Business Process Automation; Database Marketing; ERP; Product MDM; MDM; CRM; IT-driven projects; Duplicate, inconsistent data; Line of business influences IT projects; Little cross-functional collaboration; IT and business groups collaborate; Enterprise view of certain domains; Business requirements drive IT projects; Repeatable, automated business processes; Inability to adapt to business changes; High cost to maintain multiple applications; Data is a corporate asset; Personalized customer relationships and optimized operations
How Do We Master Data? Establish the people and policies for data governance. Focus data management on business process improvement. Standardize on a data management technology platform.
Data Governance People and Policies
Data Governance: IT and Business Collaboration. Business; IT; Executive Sponsorship; Data Governance Council; Data Steering (business experts); LOB Data Governance; Data Stewards; Data Management; Data Architecture; Data Administration; Security and Privacy
Data Governance: Executive Support, Management Support, Collaboration [Chart labels: Little to No Support; 58%; No Noticeable or Better Support; Little Collaboration; 83; No Collaboration]
Data Governance Regimes. Participants: Data Governance Council, Sales, Customer Service, Finance, Marketing, Human Resources, Procurement. Core business processes: Campaign Management, Hiring, Order Management, Billing, Trouble Ticket Tracking. Legend: Accountable, Consulted, Informed. Originally published in A Data Governance Manifesto by Jill Dyché. Used with permission from Baseline Consulting.
Data Governance Policy: Creation, documentation (including business vocabulary), approval process and maintenance of data standards for form, function, meaning and versioning. Quality and stewardship for data elements, business rules, hierarchies, taxonomies and content tagging. Creation and maintenance of enterprise data model and enterprise data services. Metrics, monitoring and evaluation of standards.
Business Process Improvement
Manage Data for Business
Traditional Data Management Approach [Diagram: Data Domain; Data Sources; Data Rules; Trusted, Integrated Data]
Emerging Data Management Approach: Mastering Data for Business [Diagram: Business Domain; Business Policies; Business Info; Data Sources; Data Rules; Trusted, Integrated Data]
Data Management Platform
DataFlux UnityPlatform Business Process Automation
Reporting and Dashboards Business Rule and Event Processing and Monitoring Data Archiving Data Privacy and Security Metadata Management Search and Navigation Data Access Business Vocabulary/Data Definitions Design and Development Environment
Identity Resolution Business Rule Creation and Management Verification, Normalization, Standardization, Transformation Data Exploration and Profiling Unstructured Data Discovery Hierarchy and Reference Data Definition Metadata Discovery Data Enrichment
Business Process Integration Merging and Clustering Business Rules Execution Grid Computing Data Federation ETL/ELT Data Synchronization Data Services and SOA
Business Data Services Entity Definition/Management and Search Best Record Selection Master Data History/Auditing and Exception Reporting Domain Data Models
How Do We Master Data? Establish the people and policies for data governance. Focus data management on business process improvement. Standardize on a data management technology platform.
5 Steps to Improving Data Management
DataFlux Approach
Five Steps to Better Data
Data Profiling: Identify data quality issues. Determine if data fits requirements. Identify business process issues.
Real-Life Profiling Exercises. A financial services company knew of 3 genders: M, F, and blank. They did not know about X and C. A home care products company discovered shipments slated for 16 x 16 pallets. The IS manager wondered what kind of truck they would go on. Prior to a VA audit, a cross-check of medical billings by a healthcare provider showed it was performing open heart surgeries in ambulances. A consumer products manufacturer learned that one of its products was railroad boxcars.
Analyze - Profiling Table, Column, & Relationship Metrics Metadata Analysis Visualization Pattern Recognition
Data Profiling: Uncover Problematic or Inconsistent Data. View detailed information on the accuracy, completeness, consistency, structure, uniqueness and validity of data. Create and share reports to build consensus on data quality and data governance efforts.
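A minimal profiling sketch in Python, assuming a single column held as a plain list. It is an illustration of the completeness, uniqueness, frequency and pattern metrics named above, not the DataFlux implementation; the function names `char_pattern` and `profile_column` and the sample values are hypothetical.

```python
import re
from collections import Counter


def char_pattern(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    letters_masked = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", letters_masked)


def profile_column(values):
    """Return simple completeness, uniqueness, frequency and pattern metrics."""
    total = len(values)
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    blanks = total - len(present)
    return {
        "count": total,
        "blank": blanks,
        "completeness": (total - blanks) / total if total else 0.0,
        "distinct": len(set(present)),
        "frequencies": Counter(present),
        "patterns": Counter(char_pattern(v) for v in present),
    }


# Illustrative values only, modeled on the gender-code exercise.
genders = ["M", "F", "", "F", "X", "M", "C", None]
report = profile_column(genders)

# Values outside the domain the business believed it had (M, F, blank).
unexpected = sorted(set(report["frequencies"]) - {"M", "F"})
print(report["completeness"], report["distinct"], unexpected)
```

The frequency table is what surfaces codes nobody expected; the pattern counts do the same job for formatted fields such as postal codes or phone numbers.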
Data Quality: Correct identified data quality issues. Normalize inconsistent data. Correct address information.
Types of Data Quality Problems. Standards: ambiguous business rules; multiple formats for the same data elements; different meanings for the same code value; multiple code values with the same meaning; field overuse (used for an unintended purpose). Data content: missing and invalid data; data domain outliers; illogical combinations of data. Data structure and storage: uniqueness; referential integrity; data in filler. Migration/integration: normalization inconsistencies; duplicate or lost data.
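One of the problems listed above, multiple code values with the same meaning, can be sketched as a lookup-based standardization step. This is an assumption-laden illustration: the synonym table, the `UNKNOWN` placeholder and the function name `standardize_code` are hypothetical, and a real rule set would come from the data stewards.

```python
# Hypothetical synonym table: every known spelling maps to one standard code.
GENDER_SYNONYMS = {"m": "M", "male": "M", "f": "F", "female": "F"}


def standardize_code(raw, synonyms, missing="UNKNOWN"):
    """Return (value, status) where status is 'ok', 'missing' or 'invalid'."""
    if raw is None:
        return missing, "missing"
    key = raw.strip().lower()
    if key == "":
        return missing, "missing"
    if key in synonyms:
        return synonyms[key], "ok"
    # Unknown codes are passed through untouched and flagged for review.
    return raw.strip(), "invalid"


for sample in [" Male ", "f", "", "X"]:
    print(repr(sample), standardize_code(sample, GENDER_SYNONYMS))
```

Flagging unknown codes instead of silently coercing them keeps outliers visible to whoever owns the business rule.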
Data Quality & Deployment Styles
Data Integration: Identify and eliminate duplicates. Identify and link households. Move data from source to target.
Data Integration [Diagram: variant records for one company across source systems (Call Center, Data Warehouse, SFA, ERP): "Apex Equipment, Pittsburgh PA"; "Apex LLC, Pittsburgh, Penn"; "Apex Equipment & Construction, LLC, Pittsburgh PA 15233"; "Apex Equip & Const, Pitt PA"; "Apex Construction, Pittsburgh PA". Platform components: Data Profiling, Metadata Discovery, Business Rule Definition, Entity Definition, Data Integration, Data Quality, Data Model, Business Services, Stewardship Console, Business User Interface, Data Governance, Identity Management, Reporting]
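The Apex records on this slide can serve as input for a rough duplicate-detection sketch. It normalizes each record to a token set and groups records by Jaccard similarity. This is a swapped-in, simplified technique, not the matching logic of any DataFlux product; the abbreviation table, noise words, threshold and the extra "Summit Tools" record are all assumptions made for illustration.

```python
import re

# Hypothetical normalization tables for this example only.
ABBREVIATIONS = {"equip": "equipment", "const": "construction",
                 "pitt": "pittsburgh", "penn": "pa"}
NOISE = {"llc", "inc", "co"}


def match_tokens(name):
    """Lowercase, keep alphabetic words, expand abbreviations, drop noise."""
    words = re.findall(r"[a-z]+", name.lower())
    return {ABBREVIATIONS.get(w, w) for w in words if w not in NOISE}


def jaccard(a, b):
    """Shared tokens divided by all tokens across both sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster(records, threshold=0.5):
    """Greedy grouping: join the first cluster with any similar member."""
    clusters = []
    for rec in records:
        tokens = match_tokens(rec)
        for group in clusters:
            if any(jaccard(tokens, match_tokens(m)) >= threshold for m in group):
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


records = [
    "Apex Equipment Pittsburgh PA",
    "Apex LLC Pittsburgh, Penn",
    "Apex Equipment & Construction, LLC Pittsburgh PA 15233",
    "Apex Equip & Const Pitt PA",
    "Apex Construction Pittsburgh PA",
    "Summit Tools Cleveland OH",  # made-up unrelated record for contrast
]
clusters = cluster(records)
for group in clusters:
    print(group)
```

With these tables the five Apex variants fall into one cluster and the unrelated record stands alone. Choosing a surviving "best record" from each cluster is a separate step that this sketch does not attempt.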
Data Enrichment: Make data more useful. Add postal information to improve customer outreach. Append product codes to speed procurement and materials management efforts.
Data Enrichment: Validate and Verify. Data validation and verification ensure data accuracy. Test data against other data sources (internal or external) known to be correct or current. Product code verification (industry-standard codes, UPC, ISDN). Address verification (ZIP codes, geocoding). Input: 940 Cary Pkw, Cary NC 27503. Validated data: 940 NW Cary Pkwy Ste 201, Cary NC 27513-4355; County: Wake; Census Tract: 452.2.
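Testing data against a trusted reference source might look like the sketch below: a format check followed by a lookup. The reference table here is a toy stand-in containing only the ZIP code from the validated address on the slide; a real deployment would use a licensed postal reference file. The names `ZIP_RE`, `REFERENCE_ZIPS` and `verify_zip` are hypothetical.

```python
import re

# Five digits, optionally followed by a hyphen and four more (ZIP+4).
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

# Toy reference data for illustration only, not a complete postal file.
REFERENCE_ZIPS = {("CARY", "NC"): {"27513"}}


def verify_zip(city, state, zip_code, reference=REFERENCE_ZIPS):
    """Return 'malformed', 'unknown_city', 'mismatch' or 'verified'."""
    if not ZIP_RE.match(zip_code):
        return "malformed"
    known = reference.get((city.strip().upper(), state.strip().upper()))
    if known is None:
        return "unknown_city"
    # Compare on the five-digit prefix so ZIP+4 values also match.
    return "verified" if zip_code[:5] in known else "mismatch"


print(verify_zip("Cary", "NC", "27503"))       # input from the slide
print(verify_zip("Cary", "NC", "27513-4355"))  # validated value from the slide
```

A mismatch only says the value disagrees with the reference; proposing the corrected address, as the slide shows, requires the full reference data.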
Data Monitoring: Data integrity checks and balances. Business rule development by business analysts. Data stewards empowered through dashboard monitoring.
Data Monitoring: Maintain High-Quality Data Over Time. Ensure clean data stays clean. Validate data against your business rules. Automatically identify invalid data.
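Validating data against business rules on a recurring basis can be sketched as a small rule runner whose output could feed a steward's dashboard. The rules, field names and sample records below are hypothetical and are not taken from any DataFlux product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named business rule; check returns True when a record passes."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules a business analyst might define.
RULES = [
    Rule("gender_in_domain", lambda r: r.get("gender") in {"M", "F"}),
    Rule("zip_present", lambda r: bool(r.get("zip"))),
]


def monitor(records, rules):
    """Return, per rule, the indexes of the records that violate it."""
    violations = {rule.name: [] for rule in rules}
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                violations[rule.name].append(index)
    return violations


def failure_rates(violations, total):
    """Share of records failing each rule, suitable for trend tracking."""
    return {name: len(idx) / total if total else 0.0
            for name, idx in violations.items()}


# Made-up sample batch.
batch = [
    {"gender": "M", "zip": "27513"},
    {"gender": "X", "zip": ""},
    {"gender": "F"},
]
violations = monitor(batch, RULES)
rates = failure_rates(violations, len(batch))
print(violations)
print(rates)
```

Running the same rules on every load and tracking the failure rates over time is one way to see whether clean data is staying clean.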
About DataFlux: Recognized as a leading provider of data quality, data integration, and MDM solutions. Provides a unique single platform to analyze, improve and control enterprise data. Over 1,200 customers worldwide. Offices in the US, the UK, France and Germany. Founded in 1997. Acquired by SAS, the world's largest privately owned software company, in 2000. Operates as a wholly-owned subsidiary.
Questions DataFlux Midwest Manager Mark Cheaney mark.cheaney@dataflux.com 630-799-8058 For more information, visit: www.dataflux.com