Data Integrity and Integration: How They Can Complement Your WebFOCUS Project. Vincent Deeney, Solutions Architect
After-Lunch Brain Teaser: This is a Data Quality Problem!
Problem Defining a Member: How do I determine which way this person is looking? Problem: Organizations have multiple definitions of what a Member is. Copyright 2007, Information Builders. Slide 3
Agenda: Why are Data Integrity and Integration important? What is the impact on business processes? What are the types of data integrity issues? Technology.
Why is Data Quality Important?
  More than 50 percent of data warehouse projects will have limited acceptance, or will be outright failures, as a result of a lack of attention to data quality issues. (Gartner, 2005)
  Of businesses working to improve their CRM processes, only 38% have evaluated the impact of poor-quality data. Of those working on customer experience for external-facing processes, only 30% proactively monitor data quality impacts. (Forrester survey, 2011)
  Over the next two years, more than 25 percent of critical data in Fortune 1000 companies will continue to be flawed; that is, the information will be inaccurate, incomplete or duplicated. (Gartner, 2007)
Why is Data Quality Important? Potential Energy = m * h * g. There will not be a test after the talk!
Why is Data Quality Important? [Chart; axis labels: Data Quality, Potential of Data, Time.] Optimal data usage gives continual energy to your organization.
Data Quality -> Fitness for Use
  Fields: Name, Email, Billing Address, Delivery Address, Phone Number, SSN
  Record: John Smith; Jsmith@gmail.com; 2 Penn Plaza NYC NY; 610-940-070; 532-12-1251
  Users of the data: Billing Department, Collections, Shipping Department
Data Quality -> Fitness for Use
  Fields: Name, Email, Billing Address, Delivery Address, Phone Number, Purchase
  Record: John Smith; Jsmith@gmail.com; 2 Penn Plaza NYC NY; 610-940-070; Seed Spreader
  Record: Jane Smith; Janesmith@gmail.com; 2 Penn Plaza NYC NY; 2 Penn Plaza, NYC; 610-940-0790; Electric Tiller
  Users of the data: Sales & Marketing
Types of Issues. Data Quality Dimensions: metrics that relate the state of data to its fitness for use by the business. Accuracy, Consistency, Sufficiency, Comparability, Completeness, Reliability, Precision, Scope, Level of detail, Timeliness.
Types of Issues: Data Quality. Issue types: Incomplete.
  Name            Birthday    Gender     Date       SSN
  Vincent Deeney  (missing)   (missing)  6/27/2012  123121234
  Name            Birthday    Gender     Date       SSN
  Vincent Deeney  11/25/1974  Male       6/27/2012  123121234
  Name            Birthday    SSN
  Vince Deeney    11/26/1953  321-21-4321
  Vince Deeney    11/25/1974  123-12-1233
Types of Issues: Data Quality. Issue types: Incomplete, Invalid.
  Name            Birthday    Ethnicity  Date       Action
  Vincent Deeney  11/25/2074  Klingon    6/27/2012  Admission to ER
Types of Issues: Data Quality. Issue types: Incomplete, Invalid.
  Name            Birthday    Ethnicity  Date       Action
  Vincent Deeney  11/25/2074  Klingon    6/27/2012  Admission to ER
  Rule: Birth date cannot be in the future or more than 105 years ago.
  Valid Ethnicity values: Asian, Black, Caucasian, Hispanic.
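The two rules on this slide can be expressed as a small validity check. The following is only a sketch: the field names, the function names, and the choice to pass "today" in explicitly are assumptions made for illustration and are not taken from any iWay product.

```python
from datetime import date

# Allowed values copied from the slide; the field names are assumptions.
VALID_ETHNICITIES = {"Asian", "Black", "Caucasian", "Hispanic"}
MAX_AGE_YEARS = 105


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def validate_record(record: dict, today: date) -> list[str]:
    """Return the rule violations for one record (an empty list means valid)."""
    problems = []
    birthday = record.get("birthday")
    if birthday is None:
        problems.append("birthday: missing")
    elif birthday > today:
        problems.append("birthday: in the future")
    elif birthday < years_before(today, MAX_AGE_YEARS):
        problems.append("birthday: more than 105 years ago")
    if record.get("ethnicity") not in VALID_ETHNICITIES:
        problems.append("ethnicity: not in the allowed list")
    return problems


# The record shown on the slide, checked as of the admission date.
record = {
    "name": "Vincent Deeney",
    "birthday": date(2074, 11, 25),
    "ethnicity": "Klingon",
}
print(validate_record(record, today=date(2012, 6, 27)))
# ['birthday: in the future', 'ethnicity: not in the allowed list']
```

Returning a list of violations rather than a single pass/fail flag keeps every broken rule visible for the same record.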
Types of Issues: Data Quality. Issue types: Incomplete, Invalid, Inaccurate.
  Name            Birthday  Gender  Date       Action
  Vincent Deeney  11/25/75  Male    6/27/2012  Admission to ER
Types of Issues: Data Quality. Issue types: Incomplete, Invalid, Inaccurate.
  Name            Birthday    Gender  Date       Action
  Vincent Deeney  11/25/75    Male    6/27/2012  Admission to ER
  Callout: Ok, I'm not sure. What is the impact & how do I resolve?
  Name            Birthday    Gender  Date       Action
  Vincent Deeney  11/25/197?  Male    6/26/2012  Admission to ER
  Name            Birthday    SSN          Bed Assigned
  Vincent Deeney  11/25/1974  123-12-1234  6/26/2012
Impact on Projects: Identifying Ideal Metrics (Dimensions). BI: Drop Down. ETL: Dimension Tables. Marketing: Identifying Demographics for Customers.
Business Impact. Haug, A., Zachariassen, F., & van Liempd, D. (2011). The cost of poor data quality. Journal of Industrial Engineering and Management, 4(2), 168-193. doi:10.3926/jiem.2011.v4n2.p168-193
Impact to the Business
  Business impact categories: Productivity, Financial, Risk and Compliance, Perspective
  Impacts: Increased Workloads; Increased Operational Costs; Fraud; Decreased Confidence; Increased Times to Resolution; Decreased Revenues; Government Fines; Frustration; Decreased Throughput; Regulatory Fines; Competitive Risk
Global Manufacturing Company: United Kingdom, China, United States
A 125 Million Dollar Mistake: Mars Climate Orbiter. Earth to Mars: 400 million miles. Multiply by 4.4482216152605 (pound-force to newtons).
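The multiplier on this slide is the pound-force to newton conversion. A minimal sketch of the underlying data quality point: a number that carries its unit can be converted safely, while a bare number cannot. The unit labels, the function name, and the sample reading below are assumptions for illustration only.

```python
# Exact definitions: 1 lb = 0.45359237 kg, standard gravity = 9.80665 m/s^2.
KG_PER_POUND = 0.45359237
STANDARD_GRAVITY = 9.80665
NEWTONS_PER_POUND_FORCE = KG_PER_POUND * STANDARD_GRAVITY  # about 4.448


def to_newton_seconds(value: float, unit: str) -> float:
    """Convert an impulse reading to newton-seconds; reject unknown units."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * NEWTONS_PER_POUND_FORCE
    raise ValueError(f"unknown impulse unit: {unit!r}")


reading = (10.0, "lbf*s")  # hypothetical value, for illustration only
print(round(to_newton_seconds(*reading), 3))  # 44.482
```

Rejecting an unlabelled or unknown unit, instead of assuming one, is the design choice that matters here.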
Customer Satisfaction: Customer Service. Jane Smith, 123 Martin Way ??? John Smith, 521 Harbor Rd
TECHNOLOGY
Holistic View of Entities for Business Analysts
  [Architecture diagram]
  Inlets: Databases, ERPs, CRMs, Fin systems, Data Entry
  360 Viewing Tools; Data Issues Management; Matching Engine; Rules, CEP Engine; Integration Services; Security Services; B2A Interfaces
  Business Rules: SEARCH, CREATE, MATCH, MERGE
  Group A Model Registry Repository; Group B Model Repository; Group C Model Repository; EMS/ERS Repository; Data Marts
  Outlets: Gov Agencies, Consuming Systems, Customers, Others
Methodology: KPI Definition & Refinement; Ongoing Monitoring; Deviance Identification; Feedback into Processes; Profiling; Business Rule Development; Reporting; Data Understanding; Data Enrichment & Movement; Data Standardization & Transform; Data Sync; Content Enrichment; Unification of Duplicates; Relationship Association; Parsing; Format Correction; Content Standardization; Content Based Cleansing
iWay Data Governance Manager: Platform for measuring data governance performance. Measure, analyze, and manage DG projects on an enterprise scale. Promotes materiality, accountability, and actionability in an enterprise. Helps management create an actionable data governance roadmap.
Managing Data Governance: Automate Your DQ Scorecard Process. The web-based Data Governance Manager gives executives and managers insight into the business impact of their organization's data governance compliance, helping them improve processes and build a data governance roadmap. Value: Measure key policies and strategies with impact on the business. Identify the root cause of compliance issues. Prioritize projects for process improvement. Justify projects and costs.
Data Governance Manager
Materiality: Mapping DGM to Your Enterprise
iWay Data Governance Manager Component Architecture. [Diagram; labels: Data Governance Manager; Viewer; Reports & Analytics; User Management; Designer; Governance Framework; Input; Dimensions Manager; Rules Manager; Action Manager; Output; Data Governance Model; Source Systems.]
1: Data Profiling. Results: basic statistics (duplicates, distinct values, minimum, maximum); patterns; values and their counts; relationships between fields; statistics.
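The basic statistics named on this slide can be sketched for a single column as follows. This is an illustration using only the Python standard library, not how the iWay profiler is implemented; the function name and the sample SSN values are assumptions.

```python
from collections import Counter


def profile_column(values: list) -> dict:
    """Compute basic profile statistics for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        # Every occurrence beyond the first of a value counts as a duplicate.
        "duplicates": sum(n - 1 for n in counts.values()),
        "minimum": min(present) if present else None,
        "maximum": max(present) if present else None,
        "value_counts": counts.most_common(),
    }


# Hypothetical SSN column mixing two formats and one missing value.
ssn_column = ["123-12-1234", "321-21-4321", "123-12-1234", None, "123121234"]
for name, value in profile_column(ssn_column).items():
    print(f"{name}: {value}")
```

Note that "123-12-1234" and "123121234" are counted as distinct values here; spotting that they differ only in format is the job of pattern analysis and standardisation.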
Profiling: Easy as 1, 2, 3. 1) Select File or Data Source. 2) Configure Profiling Options. 3) Generate Profile.
Data Profiling: Visualize Data Challenges
Data Profiling: Core Analysis. Frequency Analysis; Domain & Mask Analysis; Quantile Sampling
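Mask analysis and frequency analysis can be combined in a few lines: each value is reduced to its character pattern, and the patterns are then counted. This is a sketch under assumptions; the mask alphabet ("9" for digits, "A" for letters) is a common convention rather than something stated in the deck, and the third phone value is made up for illustration.

```python
from collections import Counter


def mask(value: str) -> str:
    """Replace digits with '9' and letters with 'A'; keep other characters as they are."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def mask_frequencies(values: list[str]) -> list[tuple[str, int]]:
    """Count how often each mask occurs, most frequent first."""
    return Counter(mask(v) for v in values).most_common()


phones = ["610-940-070", "610-940-0790", "6109400790", "610-940-0790"]
for pattern, count in mask_frequencies(phones):
    print(pattern, count)
# 999-999-9999 2
# 999-999-999 1
# 9999999999 1
```

Rare masks, such as the nine-digit pattern above, are usually the first place to look for invalid or incomplete values.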
Data Profiling: Custom Business Rules
2: Accessing the Data
  Complete Offering: Extract, Transform and Load (ETL); Enterprise Service Bus (ESB); Managed File Transfer (MFT).
  Flat file, database, mainframe, protocol and application adapters. Capabilities for handling full and change extracts.
  VSAM, IMS, DB2, Oracle, etc.
  SAP, Oracle Applications, Siebel, JD Edwards, etc.
  XML, EDI, HL7, SWIFT, custom files and messages.
  Key Facts:
  Batch load: a batch load interface loads a set of data on a managed schedule or when triggered by an event.
  Real time: any of the 300 adapters covered by iWay can feed a canonical interface.
  Adding one source system means configuring one load process, nothing else.
iWay Product & Architecture. [Architecture diagram; labels: iWay Tools; Design Time: Explore (System Exploration), Compose (Service Composition), Deploy; Run Time: Service Client, Request/Response, iWay Service Manager (SOAP/TCPIP/FTP/etc.), iWay Listener, Message, Message Transformation, Data Quality Service, Composite Service (e.g. getrecord), iWay Adapter, Legacy DB.]
3: Cleaning the Data. Result: purified and standardised values; scoring; explanation.
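The three outputs listed here (a standardised value, a score, and an explanation) can be illustrated with one cleansing rule for phone numbers. The 0/80/100 scale, the target format, and the function name are assumptions chosen for this sketch and do not describe the scoring used by any iWay product.

```python
import re


def clean_phone(raw: str) -> dict:
    """Standardise a 10-digit phone number and report what was done to it."""
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    if len(digits) != 10:
        # Nothing safe to repair: keep the raw value and explain why.
        return {
            "value": raw,
            "score": 0,
            "explanation": f"expected 10 digits, found {len(digits)}",
        }
    standard = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    if standard == raw:
        return {"value": standard, "score": 100, "explanation": "already in standard form"}
    return {"value": standard, "score": 80, "explanation": "reformatted to NNN-NNN-NNNN"}


for raw in ["610-940-0790", "(610) 940 0790", "610-940-070"]:
    print(raw, "->", clean_phone(raw))
```

Keeping the explanation next to the score lets a data steward see why a value was changed or rejected without rerunning the rule.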
Data Cleansing & Enrichment
4: Match and Merge. Result: candidate groups and subgroups; golden record; mapping between golden record and source system records.
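A minimal sketch of the three results named on this slide: records are grouped into candidate groups by a match key, one golden record is built per group, and the mapping back to the source records is kept. The matching rule (same SSN once punctuation is removed), the survivorship rule (longest non-empty value wins), and the sample records are all assumptions for illustration; real matching engines use far richer rules.

```python
from collections import defaultdict


def match_key(record: dict) -> str:
    """Matching rule for this sketch: the SSN with everything but digits removed."""
    return "".join(ch for ch in record["ssn"] if ch.isdigit())


def match_and_merge(records: list[dict]) -> list[dict]:
    """Group records by match key and build one golden record per group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    results = []
    for key, members in groups.items():
        golden = {"ssn": key}
        for field in ("name", "birthday"):
            # Survivorship rule (an assumption): the longest non-empty value wins.
            values = [m[field] for m in members if m.get(field)]
            golden[field] = max(values, key=len) if values else None
        results.append({
            "golden": golden,
            # Mapping from the golden record back to its source system records.
            "sources": [(m["system"], m["id"]) for m in members],
        })
    return results


# Hypothetical source records, loosely modelled on the earlier slides.
records = [
    {"system": "ER", "id": 1, "name": "Vincent Deeney", "birthday": "", "ssn": "123121234"},
    {"system": "Billing", "id": 7, "name": "Vince Deeney", "birthday": "11/25/1974", "ssn": "123-12-1234"},
    {"system": "Billing", "id": 9, "name": "Vince Deeney", "birthday": "11/26/1953", "ssn": "321-21-4321"},
]
for result in match_and_merge(records):
    print(result)
```

In this run the first two records fall into one candidate group and the third forms a group of its own, so two golden records are produced.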
Match, Merge, and Master: Agile Master Data Model; Configuration-Based Matching and Merging
Master Data Management
  Support the global identification, linking and synchronization of information across heterogeneous data sources.
  Create and manage a central system of record.
  Enable the delivery of a single view for all stakeholders.
  Improve and protect your most important data.
  Complete view of: the citizen, the customer, the product, the vendor, the household.
  [Diagram of implementation styles, each linking Source systems to a Master: Coexistence, Registry, Consolidated, Centralized.]
Data Quality Portal: A web portal/dashboard interface provides a single, easy-to-access location for all activities related to the monitoring, auditing, workflow management, and resolution of data quality issues. Flexible issue handling, including: automatic routing; system-proposed correction; manual correction and override; manual merge of duplicate records; customizable workflow.
Data Quality Portal: Human Workflow Support for Ambiguous Cases. Field Enabled. Native Workload.
IB Enterprise Information Management Framework
  Business, Process, Technology
  Consulting: Strategy, Roadmap, Education, Implementation, Mentor, Advocacy, Best Practices, Experience
  Data Governance: Data Policy, Standards, Business Rules, Roles & Responsibilities, Stewardship, Data Ownership
  Business Intelligence (Analytics/Operations)
  Master Data Management (Single view of the business)
  Data Quality: Profile, Cleanse, Match, Remediate
  Data Integration: Data Access, Data Movement
  System, Data, & Intellectual Fragmentation (Costly Business & Technical Problems)
  MD Center, DQ Center, DQ Profiler, DQ Portal, Service Manager
  Real Time & Batch, Scalable, Reusable, DG Enabled