Drive Business Process Improvement and Performance with High-Quality Data
Adam Bracey, Solutions Architect
abracey@informatica.com | (317) 218-7661
Impact of Poor Data Quality: Lack of Trust or Confidence in Data for BI and DW

"Most BI and DW users have absolutely no control over the operational systems and processes that capture the majority of the data required within their environments. They can only wait until this dirty data flows downstream throughout the enterprise and comes to rest in the polluted lakes that fill most DW/BI environments."
Rob Karel, Forrester, January 2008, "Information Managers: Deliver Trusted Data with a Focus on Data Quality"
Agenda
- Benefits of Trusted Data in the Data Warehouse
- Where Does Data Quality Fit?
- Difficulties in Implementing Trusted Data
- The Informatica Approach
Benefits of Trusted Data in the Data Warehouse
- Improve confidence in data warehouse and BI reporting
- Fulfill compliance requirements (reducing risk)
- Reduce rework caused by poor data quality
- Reuse data quality services, built for the data warehouse, across other projects
Delivering Value to Our Customers
Solution areas: Data Migration and Consolidation; BI and DW; Operational Data Quality; MDM; Data Governance; Cloud Data Quality; Regulatory Compliance.
Customer results:
- Data migration: reduced costs through a standard data migration process for more than 70 legacy systems
- Regulatory compliance: reduced global risk with timely and trusted data for IAS/IFRS
- Data Quality Center of Excellence: delivered a 2,000 percent return on investment
- All Payor Database: understand the true cost of healthcare by delivering a single view of the patient to the public and to healthcare providers
- Saved $1.4M in mailing costs
- Improved supply chain: reduced SKUs by 50 percent through visibility of inactive parts
- Rapid business value: maximized online sales globally by geocoding the correct location
Data Quality: Where Does It Fit?
[Architecture diagram, from sources up to consumers:]
- Data sources: application, database, unstructured, partner data (SWIFT, NACHA, HIPAA), cloud computing
- Data profiling: analyze and align data sources
- Data integration: extract, transform, load, with data quality (cleansing, enrichment, matching, scorecarding) acting as a data quality firewall
- Data storage: ODS, EDW, data marts
- Front end: BI tools, enterprise applications, single view of X, regulatory reporting
- Data intelligence: DQ reporting and metrics
Data Quality Dimensions
Data profiling:
- Column profiling: What are the data's physical characteristics?
- Relationship: What relationships exist in the data set? Across multiple tables?
- Redundancy: What data is redundant? Across multiple tables?
- Orphan analysis
Data quality:
- Completeness: What data is missing or unusable?
- Conformity: What data is stored in a non-standard format?
- Consistency: What data gives conflicting information?
- Accuracy: What data is incorrect or out of date?
- Duplication: What data records are duplicated?
- Integrity: What data is missing important relationship linkages?
- Range: What scores, values, or calculations are outside of range?
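Several of these dimensions reduce to simple per-column measurements. The Python sketch below computes completeness, conformity, duplication, and range for a few hypothetical records. The field names, the ZIP pattern, and the 0 to 120 age range are assumptions chosen for illustration, not rules taken from any product.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "zip": "77024", "age": 34},
    {"id": 2, "zip": "7702", "age": 41},
    {"id": 3, "zip": None, "age": 230},
    {"id": 1, "zip": "77024", "age": 34},
]

# Assumed conformity rule: a US ZIP or ZIP+4 code.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")
# Assumed valid range for the age column.
AGE_RANGE = (0, 120)


def profile(records):
    """Return simple completeness, conformity, duplication and range measures."""
    zips = [r["zip"] for r in records if r["zip"] not in (None, "")]
    completeness = len(zips) / len(records) if records else 1.0
    conforming = sum(1 for z in zips if ZIP_PATTERN.fullmatch(z))
    conformity = conforming / len(zips) if zips else 1.0
    id_counts = Counter(r["id"] for r in records)
    duplicated_ids = sorted(i for i, n in id_counts.items() if n > 1)
    lo, hi = AGE_RANGE
    out_of_range_ids = [r["id"] for r in records if not lo <= r["age"] <= hi]
    return {
        "completeness": completeness,
        "conformity": conformity,
        "duplicated_ids": duplicated_ids,
        "out_of_range_ids": out_of_range_ids,
    }


print(profile(RECORDS))
```

Here one of four ZIP values is missing (completeness 0.75), one of the three present values is non-standard, record 1 is duplicated, and record 3 has an out-of-range age.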
Difficulties in Implementing Trusted Data
- Hard to find the problems, hard to fix them
- Lack of comprehensive tools
- End-user tools not appropriate for business users
- Unable to apply and standardize data quality rules across applications
Bad data flows from application to application, causing projects and processes to fail. Poor data quality costs millions.
The Informatica Approach: The Right People, Process and Tools
- Unified role-specific tools for all stakeholders: business analysts/data stewards, line-of-business managers, and IT
- Comprehensive support for all data and all purposes
- Open to all applications, with centralized data quality rules applied to Customer, Order, and Product data
Continuous Data Quality Improvement for All Users
1. Profile
2. Establish metrics and define targets
3. Design and implement data quality rules
4. Deploy data quality services
5. Review exceptions
6. Monitor data quality versus targets
Roles and tools: data stewards work in a browser-based tool, IT developers in an Eclipse-based development environment, and line-of-business managers follow scorecards.
Requirements to Deliver Trusted Data
- Data analysis and discovery
- Parsing and standardization
- Address validation
- Matching and de-duplication
- Monitoring and reporting
And do this for all data types.
Data Analysis and Discovery
- Identify patterns, formats, schemas, and data quality issues
- Drill down into the actual data
- Create rules by example as you profile the data
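Pattern profiling is one common way to surface formats: each value is reduced to a character-class pattern, the patterns are counted, and any pattern can be drilled into to see the actual values. A minimal Python sketch; the 9/A/a notation and the sample phone numbers are assumptions for illustration.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Replace digits with 9, upper-case letters with A, lower-case letters with a."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # punctuation and spaces are kept as-is
    return "".join(out)


def pattern_frequencies(values):
    """Count how many values share each pattern, most common first."""
    return Counter(to_pattern(v) for v in values).most_common()


def drill_down(values, pattern):
    """Return the actual values behind one pattern."""
    return [v for v in values if to_pattern(v) == pattern]


PHONES = ["415.555.1212", "(415) 555-1212", "415-555-1212", "415.555.1213"]
print(pattern_frequencies(PHONES))
print(drill_down(PHONES, "999-999-9999"))
```

Three different phone layouts appear in four values, which is the kind of conformity issue a standardization rule would then target.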
Parsing & Standardization
The key objectives in data standardization are:
- to transform and parse data into multiple fields
- to correct completeness, conformity, and consistency problems
- to standardize field formats
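Standardizing field formats often comes down to mapping variants onto one preferred form using reference data. The sketch below expands street abbreviations from a small lookup table and normalizes the case of every other token; the `STREET_TERMS` table is a hypothetical stand-in for maintained reference data.

```python
# Hypothetical reference table of abbreviations; a real project would maintain
# this as managed reference data rather than a hard-coded dict.
STREET_TERMS = {
    "FRWY": "Freeway",
    "FWY": "Freeway",
    "RD": "Road",
    "ST": "Street",
    "AVE": "Avenue",
    "STE": "Suite",
    "SUITE": "Suite",
}


def standardize_street(raw: str) -> str:
    """Expand known abbreviations and normalize the case of every other token."""
    out = []
    for tok in raw.split():
        key = tok.upper().rstrip(".")  # "rd." and "RD" share one lookup key
        out.append(STREET_TERMS.get(key, tok.capitalize()))
    return " ".join(out)


print(standardize_street("7887 KATY FRWY SUITE 333"))  # 7887 Katy Freeway Suite 333
print(standardize_street("117 jackson rd."))           # 117 Jackson Road
```

A bare token lookup cannot tell "ST" meaning Street from "St" meaning Saint; production standardization uses context and position, which this sketch does not attempt.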
Address Validation
- Validate or correct addresses for over 240 countries
- Use reference data from international postal agencies
- Validate worldwide data in one environment
- Keep reference data continuously maintained with worldwide post offices and databases
Match and De-Duplicate All Data Types
- Highly accurate matching requires considering multiple attributes with multiple rule sets
- Use confidence levels to automate life cycle processes
- Consider over 60 cultural variations for name matching
- Match data in spite of poor quality
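Matching on several attributes with confidence levels can be sketched as a weighted similarity score mapped onto action bands (merge automatically, send to a steward for review, or treat as distinct). This is a generic technique using Python's `difflib`, not Informatica's matching algorithm; the weights, thresholds, and field names are hypothetical, and it does not handle nicknames or cultural name variants.

```python
import re
from difflib import SequenceMatcher

# Assumed attribute weights and confidence thresholds (hypothetical values).
WEIGHTS = {"name": 0.5, "dob": 0.3, "zip": 0.2}
AUTO_MERGE = 0.90
REVIEW = 0.70


def norm_name(name: str) -> str:
    """Lower-case, drop punctuation, and sort tokens so word order does not matter."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


def norm_zip(zip_code: str) -> str:
    """Keep the 5-digit ZIP, restoring a dropped leading zero (e.g. '6984' -> '06984')."""
    digits = re.sub(r"\D", "", zip_code)[:5]
    return digits.zfill(5) if digits else ""


def match_score(a: dict, b: dict) -> float:
    """Weighted sum of per-attribute similarities, between 0.0 and 1.0."""
    name_sim = SequenceMatcher(None, norm_name(a["name"]), norm_name(b["name"])).ratio()
    dob_sim = 1.0 if a["dob"] and a["dob"] == b["dob"] else 0.0
    za, zb = norm_zip(a["zip"]), norm_zip(b["zip"])
    zip_sim = 1.0 if za and za == zb else 0.0
    return WEIGHTS["name"] * name_sim + WEIGHTS["dob"] * dob_sim + WEIGHTS["zip"] * zip_sim


def classify(score: float) -> str:
    """Map a score to a life cycle action using confidence bands."""
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= REVIEW:
        return "review"
    return "no_match"


a = {"name": "William Stuart Harison", "dob": "1967-01-03", "zip": "06987"}
b = {"name": "Harison, William Stuart", "dob": "1967-01-03", "zip": "06987-4573"}
print(classify(match_score(a, b)))  # auto_merge
```

Scores between the two thresholds are the records a data steward would review; tuning those bands trades automation against the risk of false merges.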
Monitoring Quality
Stakeholders need to be aware of:
- current quality metrics
- alerts when quality thresholds are not being met
Reports and alerts must be delivered through the web.
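Threshold-based alerting compares each current metric with its agreed target and raises a message only for the ones that fall short. A minimal Python sketch; the `Metric` class and the sample values are hypothetical, and web delivery of the messages is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    score: float      # observed share of records passing, 0.0 to 1.0
    threshold: float  # target agreed with the business, 0.0 to 1.0


def alerts(metrics):
    """Return one alert message per metric that is below its threshold."""
    return [
        f"ALERT: {m.name} at {m.score:.1%} is below target {m.threshold:.1%}"
        for m in metrics
        if m.score < m.threshold
    ]


current = [Metric("Completeness", 0.97, 0.95), Metric("Conformity", 0.88, 0.95)]
for message in alerts(current):
    print(message)  # ALERT: Conformity at 88.0% is below target 95.0%
```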
Difficulties in Implementing Trusted Data
- Hard to find the problems, hard to fix them
- Lack of comprehensive tools (addressed by comprehensive support for all data and all purposes)
- End-user tools not appropriate for business users
- Unable to apply and standardize data quality rules across applications
Bad data flows from application to application, causing projects and processes to fail. Poor data quality costs millions.
Business Empowerment: Simple-to-Use Browser-Based Tools
Designed for the tasks and skills of business data stewards and analysts:
- Purpose-built, web-based UI for fast ramp-up
- Scorecarding and trending
- Business, not technical, representations of the data
- Direct interaction with data through profiling, rule validation, and scorecarding
Business managers, analysts, and stewards work with relevant data to meet business needs while reducing reliance on IT.
Business Empowerment: Interactive Specification by Example
Ease of use for the business through specification by example:
- Access data and data profiles
- Specify rules by example
- Immediately validate rules
- Specify changes and corrections
Roles involved: analysts and stewards, developers and architects. Goal: work with relevant data to meet business needs while reducing reliance on IT.
Unified Role-Specific Tools for All Stakeholders
Productive development environment with mid-stream profiling for IT developers:
- Full palette of data quality transformations
- One click from profiling to rule configuration
- Mid-stream profiling
- Reusable rules
- Seamless integration with PowerCenter and Data Services
Mid-stream profiling enables developers to rapidly profile the output of any transformation at any stage of any mapping to instantly test and debug their logic.
Mid-Stream Profiling: Profile at Any Point in the Data Flow
Developers and architects can profile the source, the target, and anywhere in between.
For IT: accelerate the deployment of data quality projects.
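The idea of profiling "anywhere in between" can be illustrated with a plain Python pipeline that records a small profile after every transformation step. This is an analogy, not PowerCenter or Data Services code; the step names, the null-count profile, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def null_counts(rows: List[Row]) -> Dict[str, int]:
    """Count missing (None or empty-string) values per column."""
    counts: Dict[str, int] = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (1 if val is None or val == "" else 0)
    return counts


def run_with_profiling(rows: List[Row], steps: List[Step]):
    """Run each transformation in turn and profile its output."""
    profiles = {"source": null_counts(rows)}
    for name, step in steps:
        rows = step(rows)
        profiles[name] = null_counts(rows)  # profile the output of this step
    return rows, profiles


def trim(rows):
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


def drop_missing_email(rows):
    return [r for r in rows if r.get("email")]


source = [
    {"id": 1, "email": " a@example.com "},
    {"id": 2, "email": "   "},
    {"id": 3, "email": None},
]
final_rows, profiles = run_with_profiling(
    source, [("trim", trim), ("drop_missing_email", drop_missing_email)]
)
print(profiles)
```

Profiling only the source reports one missing email; profiling after `trim` reports two, exposing the whitespace-only value that a source-and-target-only check would hide inside the row count.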
Difficulties in Implementing Trusted Data
- Hard to find the problems, hard to fix them
- Lack of comprehensive tools (addressed by comprehensive support for all data and all purposes)
- End-user tools not appropriate for business users (addressed by unified role-specific tools for all stakeholders)
- Unable to apply and standardize data quality rules across applications
Bad data flows from application to application, causing projects and processes to fail. Poor data quality costs millions.
Centralized, Reusable Rules
Enforce data quality standards across the organization: centralized data quality rules for Customer, Order, Product, and Invoice data are shared by the BI application, the customer service portal, and the sales automation application.
Used Across All Interactions
Centralized data quality rules (Customer, Order, Product, Invoice) apply to operational integration, at the point of entry, and to batch feeds and the data warehouse.
- For the business: support data governance by enforcing consistent data quality rules across all applications.
- For IT: accelerate the deployment of common data quality rules across all applications, and reduce costs through reuse.
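The reuse argument can be sketched as one rule registry consumed by two callers: a point-of-entry check that rejects a bad record immediately, and a batch check that routes bad records to an exception list. The registry, rule names, and entity fields below are hypothetical; in practice the rules would live in a shared repository or service rather than a Python dict.

```python
import re
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately simple format check

# Hypothetical central rule registry, keyed by business entity.
RULES: Dict[str, Dict[str, Rule]] = {
    "Customer": {
        "email_format": lambda r: EMAIL_RE.fullmatch(r.get("email") or "") is not None,
        "name_present": lambda r: bool((r.get("name") or "").strip()),
    },
    "Order": {
        "positive_quantity": lambda r: isinstance(r.get("qty"), int) and r["qty"] > 0,
    },
}


def violations(entity: str, record: dict) -> List[str]:
    """Names of the central rules that this record breaks."""
    return [name for name, rule in RULES[entity].items() if not rule(record)]


def validate_at_entry(entity: str, record: dict) -> dict:
    """Point-of-entry use: reject a bad record immediately."""
    broken = violations(entity, record)
    if broken:
        raise ValueError(f"{entity} rejected: {', '.join(broken)}")
    return record


def validate_batch(entity: str, records: List[dict]) -> Tuple[List[dict], List[Tuple[dict, List[str]]]]:
    """Batch/data warehouse use: same rules, bad records routed to exceptions."""
    passed, exceptions = [], []
    for record in records:
        broken = violations(entity, record)
        if broken:
            exceptions.append((record, broken))
        else:
            passed.append(record)
    return passed, exceptions
```

Because both callers read the same `RULES`, changing a rule once changes it for every application, which is the governance and cost point made on the slide.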
Solution: Informatica Data Quality
- Lack of comprehensive tools: comprehensive support for all data and all purposes
- End-user tools not appropriate for business users: unified role-specific tools for all stakeholders
- Unable to apply and standardize data quality rules across applications: open to all applications
Bad data flows from application to application, causing projects and processes to fail. Poor data quality costs millions. Using Informatica Data Quality ensures that your organization is using the most trusted, timely, and relevant data.
How to Overcome the Challenges of Implementing Data Quality
Challenges:
- Lack of clear ownership of data quality between IT and the business
- Lack of data governance processes
- Lack of technical integration between DI and DQ
- Lack of understanding of how best to implement data quality processes
Responses:
- Role-based tools to empower the business
- A business/IT collaboration framework
- Unified DI and DQ with the Informatica Platform
- Expand the existing PowerCenter team's experience by upskilling for data quality
Summary: A Long-Term Approach to Data Quality
- You can't fix it just once
- You won't be successful just writing a few scripts
- Create a partnership between business and IT
- Support quality for ALL data types
- Enable proactive monitoring
- Empower business users and stewards to do more
- Provide an exception management facility
The Data Integration Company
Frequent Requirements
Using Informatica Analyst Tools to Profile Your Data
- 100% browser-based
- Drill-through analysis
- Right-click to create data quality scorecards
For data stewards: increase productivity and efficiency by enabling the business to proactively take responsibility for data quality and reduce its reliance on IT.
Parsing & Standardization: Product Data
Before:
Product ID | Brand | Description
90017 | ipod | 4GB, Red ipod Nano //Special Edt.
After:
Product_ID | Brand | Size | Color | Description
90017 | IPOD | 4GB | Red | 4 Gigabyte Nano Special Edition (Red)
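The before/after transformation above can be approximated with a few regular expressions and two small lookup tables. This is a sketch, not the product's parser: `COLORS` and `EXPANSIONS` are hypothetical reference lists, and the size rule only recognizes values written as a number followed by "GB".

```python
import re

# Hypothetical reference data; real projects keep these in managed reference tables.
COLORS = {"red", "blue", "black", "white", "silver"}
EXPANSIONS = {"edt": "Edition"}
SIZE_RE = re.compile(r"\b(\d+)\s*GB\b", re.IGNORECASE)


def parse_product(product_id: str, brand: str, description: str) -> dict:
    """Split a free-text product description into size, color and a clean description."""
    size_match = SIZE_RE.search(description)
    # Tokenize the remaining text into words, dropping punctuation such as "//" and ".".
    words = re.findall(r"[A-Za-z]+", SIZE_RE.sub(" ", description))
    color = next((w.capitalize() for w in words if w.lower() in COLORS), None)
    rest = [
        EXPANSIONS.get(w.lower(), w)
        for w in words
        if w.lower() not in COLORS and w.lower() != brand.lower()
    ]
    parts = [f"{size_match.group(1)} Gigabyte"] if size_match else []
    text = " ".join(parts + rest)
    if color:
        text += f" ({color})"
    return {
        "Product_ID": product_id,
        "Brand": brand.upper(),
        "Size": f"{size_match.group(1)}GB" if size_match else None,
        "Color": color,
        "Description": text,
    }


print(parse_product("90017", "ipod", "4GB, Red ipod Nano //Special Edt."))
```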
Parsing & Standardization: Names and Contact Info
Before:
ContactName | Phone
Judy Dent // Bob's Assistant | 415.555.1212
After:
FirstName | MiddleName | LastName | Title | Phone
Judy | | Dent | Bob's Assistant | +1 (415)555-1212
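A naive version of this split can be written in a few lines, assuming the "//" separator marks a title and phone numbers are North American. The function name and rules are hypothetical; real name parsing also has to handle prefixes, suffixes, and cultural naming conventions, which this sketch ignores.

```python
import re


def parse_contact(contact_name: str, phone: str) -> dict:
    """Split 'First [Middle] Last // Title' and standardize a North American phone number."""
    name_part, _, title = contact_name.partition("//")
    tokens = name_part.split()
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:  # assume a North American number without country code
        digits = "1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        phone = f"+1 ({digits[1:4]}){digits[4:7]}-{digits[7:]}"
    # Numbers that do not fit the assumption are passed through unchanged.
    return {"FirstName": first, "MiddleName": middle, "LastName": last,
            "Title": title.strip(), "Phone": phone}


print(parse_contact("Judy Dent // Bob's Assistant", "415.555.1212"))
```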
Address Validation (Before and After)
Before:
Address1 | Address2 | Address3 | Address4 | Address5
7887 KATY FRWY | SUITE 333 | HOUSTEN | TX | 99999
After:
Street | City | County | StateCode | StateName | ZIP | ZIP4 | Latitude | Longitude
7887 Katy Freeway Suite 333 | Houston | Harris | TX | Texas | 77024 | 2005 | 29.283427 | -95.46802
Valid addresses keep costs down and help ensure compliance.
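The validation flow behind this example can be sketched as: fuzzy-correct the city against the cities known for the state, normalize the street, then look the street up in postal reference data to obtain the correct ZIP and county. The reference tables below are hypothetical toys containing only this slide's example; real validation depends on maintained postal reference files. The suite (Address2) is assumed to be carried through separately.

```python
from difflib import get_close_matches

# Toy reference data (hypothetical); real validation uses postal reference files.
CITIES_BY_STATE = {"TX": ["HOUSTON", "AUSTIN", "DALLAS"]}
STREET_REFERENCE = {
    ("7887 KATY FREEWAY", "HOUSTON", "TX"): {"zip": "77024", "county": "Harris"},
}
SUFFIXES = {"FRWY": "FREEWAY", "FWY": "FREEWAY"}


def _norm_street(street: str) -> str:
    return " ".join(SUFFIXES.get(t, t) for t in street.upper().split())


def validate_address(street: str, city: str, state: str, zip_code: str) -> dict:
    """Correct the city, then confirm or correct the ZIP from reference data."""
    state = state.upper()
    candidates = get_close_matches(city.upper(), CITIES_BY_STATE.get(state, []), n=1, cutoff=0.8)
    if not candidates:
        return {"status": "invalid", "reason": "unknown city"}
    fixed_city = candidates[0]
    ref = STREET_REFERENCE.get((_norm_street(street), fixed_city, state))
    if ref is None:
        return {"status": "invalid", "reason": "street not found", "city": fixed_city.title()}
    corrected = fixed_city != city.upper() or ref["zip"] != zip_code
    return {
        "status": "corrected" if corrected else "valid",
        "street": _norm_street(street).title(),
        "city": fixed_city.title(),
        "county": ref["county"],
        "state": state,
        "zip": ref["zip"],
    }


print(validate_address("7887 KATY FRWY", "HOUSTEN", "TX", "99999"))
```

The 0.8 similarity cutoff for city correction is an assumed value; too low a cutoff would "correct" cities to the wrong place.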
Using Data Regardless of Its Format or Correctness
Intrinsically wrong (and potentially uncorrectable) data can still be valuable for matching purposes.
SKU | Description | Size | Price
AP-2199 | Sailors Desk Lamp | 12 in | 27.99
AP2199 | Nautical Lamp | 12 inch | 27.99
PA-2119 | Sailors Lamp | 12 inch | 34.99

The records below show alternate names or nicknames, misspellings, and invalid data:
Name | DOB | Address | City | State | Zip
W. S. Harrison II PhD | 1/33/1967 | Medical Center,117/2A #17497 Jackson | E. Hartford | NY | 16987
William Stuart Harison | 1/3/1967 | 117-2a Jacksen Rd. | Easthartford | CT | 06987
William Stewart Harison | 9/9/99 | 117 Jackson Road. Suite 2A | Hartford East | CT | 06987
Doctor Bill Harisen jr | 1/13/1967 | 117 Jacson Room 2a | Hartford | CT | 6984
Harrisen William Doctor | | 2a Jackson Rd #174978 | Hartford | CT | 06987-4573
Highly accurate matching ensures the minimum number of duplicate master records.
Informatica Confidential
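For the SKU table, a simple way to use imperfect keys is to normalize them (drop punctuation, upper-case) and then flag pairs whose normalized keys are within a small edit distance. The sketch uses optimal string alignment distance, which counts an adjacent transposition such as "AP" vs "PA" as one edit; the `max_distance=2` tolerance is an assumption. A flagged pair is only a candidate for review: PA-2119 also has a different price and may be a genuinely different product.

```python
import re


def normalize_sku(sku: str) -> str:
    """Drop punctuation and spaces and upper-case, so 'AP-2199' and 'AP2199' agree."""
    return re.sub(r"[^A-Za-z0-9]", "", sku).upper()


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute, adjacent swap."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def candidate_pairs(skus, max_distance=2):
    """Return (sku_a, sku_b, distance) for every pair close enough to review."""
    keys = {s: normalize_sku(s) for s in skus}
    pairs = []
    for i, a in enumerate(skus):
        for b in skus[i + 1:]:
            dist = osa_distance(keys[a], keys[b])
            if dist <= max_distance:
                pairs.append((a, b, dist))
    return pairs


print(candidate_pairs(["AP-2199", "AP2199", "PA-2119"]))
```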
Monitoring the Quality: Easy-to-Share Browser-Based Scorecards for Line-of-Business Managers
Browser-based scorecards enable you to:
- View and share data quality scorecards
- Drill down to the actual records
- Take action to reduce the business impact
Zero learning curve for business users to review and track data quality metrics, enabling data quality for the masses.
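A scorecard with drill-down can be modeled as one pass rate per rule plus the list of failing records behind it. A minimal Python sketch; the rules and records are hypothetical, and sharing or rendering the scorecard in a browser is not shown.

```python
import re


def scorecard(records, rules):
    """Score each rule as the share of records passing, keeping failures for drill-down."""
    card = {}
    for name, rule in rules.items():
        failing = [r for r in records if not rule(r)]
        score = 1 - len(failing) / len(records) if records else 1.0
        card[name] = {"score": score, "failing": failing}
    return card


# Hypothetical rules and records for illustration.
RULES = {
    "zip_present": lambda r: bool(r.get("zip")),
    "phone_10_digits": lambda r: len(re.sub(r"\D", "", r.get("phone") or "")) == 10,
}
RECORDS = [
    {"id": 1, "zip": "77024", "phone": "415.555.1212"},
    {"id": 2, "zip": "", "phone": "555-1212"},
    {"id": 3, "zip": "06987", "phone": "(415) 555-1213"},
    {"id": 4, "zip": "10001", "phone": "415 555 121"},
]

card = scorecard(RECORDS, RULES)
for name, entry in card.items():
    print(f"{name}: {entry['score']:.0%}, failing ids {[r['id'] for r in entry['failing']]}")
```

Keeping the failing records next to each score is what makes "drill down to the actual records" possible without re-running the rule.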
Business/IT Collaboration to Support Data Quality within the Data Warehouse
Business users, data quality stewards, and data analysts work alongside developers and architects on the Informatica Platform, which provides a shared repository and a shared engine. Rules defined by the business side correspond to mapplets on the developer side.
Shared assets: profiling, reference data, rules, notes, results, and scorecards.