Information Governance and Data Quality
Stefano Mino, EU Sales Leader, InfoSphere Information Integration
stefano.mino@it.ibm.com
2012 IBM Corporation
Most organizations continue to struggle with data
Increasing complexity: 1 trillion connected devices in the world.
Declining quality: $8.2 million annual loss by the average organization due to poor data quality.
Protecting privacy: $204 cost per compromised record.
Ensuring compliance: $29.8 billion U.S. spending on governance, risk and compliance.
"CIOs must determine whether your Information Governance strategy adequately reflects the relationship to your overall information management initiative. If the relationship is unclear, or the stated goals are different, work with the business to refactor your strategy."
Source: Gartner Research, Q&A: Information Governance, Anne Lapkin, Debra Logan, Jan 201219
Organizations continue to have data quality challenges
Compliance and transparency pressures increasingly highlight data quality issues.
No method to maintain high quality data.
Unreliable insights: low data quality leads to lack of trust and results in poor business decisions.
Inability to identify the source of quality issues: unreliable insights are persisted to other strategic initiatives, which then base key business decisions on bad data.
High costs and negative customer satisfaction: organizations are recognizing that there can be both direct (missed revenue opportunity) and indirect (low customer satisfaction and high churn) financial costs from poor data quality.
Survey: Data quality software is viewed as critical technology...
A good ROI: 79% of survey respondents indicated they had deployed their tools of choice in more than one project or deployment, as compared to only 58% in 2008.
Source: 2009 Survey on Data Quality Tools Highlights Broadening Deployments With Focus on Proven Functionality, Gartner, 14 August 2009, ID Number: G00170331
The Cost of Dirty Data
83% of data integration projects either overrun or fail.
Inaccurate or incomplete data is a leading cause of failure in business-intelligence and CRM projects.
25% of time is spent resolving bad data.
Undetected defects will cost 10 to 100 times as much to fix upstream.
Low data quality costs companies $611 billion annually.
Consequences: scrap and rework, increased costs, lack of consumer confidence.
IBM Information Governance creates order out of information chaos
Information Governance is the exercise of decision rights to optimize, secure and leverage data as an enterprise asset.
Orchestrate people, process and technology toward a common goal: promotes collaboration.
Derive maximum value from information: leverage information as an enterprise asset to drive opportunities.
Safeguard information: ensure highest quality and manage it throughout its lifecycle.
Governing the creation, management and usage of enterprise data is no longer optional. It is expected by your customers, demanded by your executives, and enforced by regulators and auditors.
Success requires governance across the Information Supply Chain
[Figure: the Information Supply Chain. Sources and consumers: Transactional & Collaborative Applications, Business Analytics Applications, External Information Sources, Streaming Information. Stages: Manage, Integrate, Analyze. Information stores: Master Data, Big Data, Data Warehouses, Content, Cubes, Content Analytics. Govern: Quality, Lifecycle, Security & Privacy, Standards.]
What is Information Governance?
IBM defines Information Governance as a holistic approach to managing and leveraging information for business benefits. It encompasses information quality, information protection and information life cycle management.
Proactively leveraging information... to unlock value and manage risk
People, Process, Technology
[Figure: extracts of a data governance organization chart (executive-level sponsorship, Data Governance Council, Data Governance Office, project teams, stewardship community) and of the process flows 2.1.1 Build Team, 2.1.2 Build Common Definition, and 2.1.3 Build Data Quality Rules and Standards. Note on the flow: "Steady State: DG Council and Data Steward Committee* are established. * If the Data Steward Committee is not yet established, LOB Coordinators (who will eventually be on it) will serve this function."]
Ensure information is understood and consistently defined.
Increase the use and trust of information as an enterprise asset.
Protect information, reduce risk and comply.
Results from good Information Governance
Understand your information: know what exists and how it is related; ensure common understanding and definitions.
Contain costs: manage costs with continuous growth; retain information without growing retention costs.
Maximize value from your information: make decisions that you can trust; increase revenues; reduce costs.
Secure and protect: keep information safe from internal and external threats; know who is accessing what information and why.
Comply with regulatory requirements: retention, security, filings, audits.
Good governance requires process and accountability
IBM Information Governance Unified Process
1) Define Business Problem
2) Obtain Executive Sponsorship
3) Conduct Maturity Assessment
4) Build Roadmap
5) Establish Organizational Blueprint
6) Build Business Glossary
7) Understand Data
8) Create Metadata Repository
9) Define Metrics
10) Govern Data Quality
11) Govern Master Data
12) Govern Lifecycle of Information
13) Govern Security & Privacy
14) Govern Big Data
15) Measure Results
Legend in the diagram: Required Steps, Optional Steps
Conduct a maturity assessment
What is Trusted Information?
Insightful: derive meaning from information challenges.
In context: real-time delivery of relevant information when and where it's needed.
Complete: related information reconciled into a single and holistic view.
Accurate: complex and disparate data transformed, cleansed and delivered.
Align business and IT objectives using a single platform that creates trusted information for use in key initiatives
[Figure labels. Sources: legacy apps, dbs, SAP, xls, xml, flat files, z/OS, custom, ERP. Business initiatives: BI, Warehouse, MDM, App Consolidation. Roles: Business Analysts, Enterprise Architects, Executives, Data Analysts & Architects, Subject Matter Experts, Data Steward, DBA, Developer, System Architect, System Manager.]
Example business case for data quality in marketing
A. Total number of customers in the marketing list: 950,000
B. Number of individual party matches: 40,000
C. Additional duplicate individuals who are double-counted as part of a household: 50,000
D. Total number of duplicate matches: 90,000
E. Number of annual marketing mailings per customer: 2
F. Cost per mailing: $3.25
G. Total avoidable cost of duplicate mailings (D x E x F): $585,000
H. Outbound telemarketing calls per customer per year: 4
I. Cost per outbound telemarketing call: $1.50
J. Total avoidable cost of outbound telemarketing calls (D x H x I): $540,000
K. Total avoidable cost of duplicate matches (G + J): $1,125,000
L. Cost to implement data quality tools: $500,000
M. Annual cost of full-time customer data steward: $200,000
N. Total cost of data quality solution (L + M): $700,000
O. Payback period: 7.5 months
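The arithmetic behind rows G through O can be checked with a short sketch. The class and field names below are illustrative, not part of the original deck, and the payback formula (N divided by K, expressed in months) is an assumption inferred from the figures on the slide.

```python
from dataclasses import dataclass


@dataclass
class DuplicateCostCase:
    """Inputs taken from the business case rows (names are hypothetical)."""
    duplicate_matches: int        # row D
    mailings_per_customer: int    # row E
    cost_per_mailing: float       # row F
    calls_per_customer: int       # row H
    cost_per_call: float          # row I
    tool_cost: float              # row L
    steward_cost: float           # row M

    def avoidable_mailing_cost(self) -> float:
        # Row G = D x E x F
        return self.duplicate_matches * self.mailings_per_customer * self.cost_per_mailing

    def avoidable_call_cost(self) -> float:
        # Row J = D x H x I
        return self.duplicate_matches * self.calls_per_customer * self.cost_per_call

    def total_avoidable_cost(self) -> float:
        # Row K = G + J
        return self.avoidable_mailing_cost() + self.avoidable_call_cost()

    def solution_cost(self) -> float:
        # Row N = L + M
        return self.tool_cost + self.steward_cost

    def payback_months(self) -> float:
        # Assumed formula for row O: first-year cost over annual savings, in months
        return 12 * self.solution_cost() / self.total_avoidable_cost()


case = DuplicateCostCase(
    duplicate_matches=90_000,
    mailings_per_customer=2,
    cost_per_mailing=3.25,
    calls_per_customer=4,
    cost_per_call=1.50,
    tool_cost=500_000,
    steward_cost=200_000,
)
print(case.total_avoidable_cost())       # annual avoidable cost, row K
print(round(case.payback_months(), 1))   # payback period in months, row O
```

Under that assumed formula the result is about 7.47 months, which rounds to the 7.5 months shown in row O.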
Put the right standards in place
6) Build Business Glossary
7) Understand Data
8) Create Metadata Repository
9) Define Metrics
15) Measure Results
Legend in the diagram: Required Steps, Optional Steps
Define a common vocabulary
For example, define "Active Subscriber". Different people give different definitions:
Mobile user who has used any service in the mobile network.
User who paid for the service at least 1 time in the past 90 days.
Mobile user who has a phone plan, but not SMS.
Only post-paid customers, not prepaid customers.
User who makes at least 1 call over the period of 90 days.
Roles shown: Financial Officer, Business Analyst, Compliance Officer, Sales Lead, Business Intelligence Manager, CRM Project Manager, Marketing Manager, ERP Project Manager, IT Architect, Support Rep.
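A small sketch shows why the definition matters: two of the definitions above, applied to the same subscribers, give different counts. The record layout (`payment_dates`, `call_dates`) and the sample data are invented for illustration only.

```python
from datetime import date, timedelta


def active_by_payment(subscriber, as_of, window_days=90):
    """Definition: paid for the service at least once in the past 90 days."""
    cutoff = as_of - timedelta(days=window_days)
    return any(cutoff <= d <= as_of for d in subscriber["payment_dates"])


def active_by_calls(subscriber, as_of, window_days=90):
    """Definition: made at least one call over the period of 90 days."""
    cutoff = as_of - timedelta(days=window_days)
    return any(cutoff <= d <= as_of for d in subscriber["call_dates"])


as_of = date(2012, 6, 30)
subscribers = [
    # Hypothetical sample records
    {"id": "A", "payment_dates": [date(2012, 6, 20)], "call_dates": []},
    {"id": "B", "payment_dates": [date(2011, 12, 1)], "call_dates": [date(2012, 6, 25)]},
    {"id": "C", "payment_dates": [date(2012, 5, 15)], "call_dates": [date(2012, 6, 1)]},
    {"id": "D", "payment_dates": [date(2012, 6, 1)], "call_dates": [date(2011, 1, 1)]},
]

count_by_payment = sum(active_by_payment(s, as_of) for s in subscribers)
count_by_calls = sum(active_by_calls(s, as_of) for s in subscribers)
print(count_by_payment, count_by_calls)
```

With this sample data the payment-based definition counts three active subscribers and the call-based definition counts two, so a report labelled "active subscribers" is ambiguous until one definition is agreed in the glossary.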
Understand your information
Data can be distributed over multiple applications, databases and platforms.
Relationships are complex and poorly documented.
Relationships are not well understood.
[Figure: Distributed Data Landscape]
Determine lineage of data
Raw value 1212454565253092 is a Credit Card Number: a unique identification number issued to each card holder and unique to each card printed. Displayed as 1212 4545 6525 3092.
Raw value 0000000085426938 is a Profit Amount: a currency value that is calculated by combining data from the Customer Master database and Wholesale Inventory applications... Calculation included on monthly report. Displayed as $85,426,938.
View end-to-end lineage including design metadata, operational metadata and user-defined metadata.
IBM Data Quality
InfoSphere Discovery: discover.
InfoSphere Information Analyzer: validate & monitor.
InfoSphere QualityStage: cleanse & enrich.
[Figure: value over the life cycle. Stages: Align with Business Objectives, Assess & Discover, Specialized validation, Cleanse & Enrich, Master, Monitor / Track, report & deliver insight.]
Shared metadata, connectivity & infrastructure.
InfoSphere Information Analyzer
Analyze source data quality and monitor adherence to integration and quality rules.
Requirements: perform data quality assessment; define business rules to monitor data quality; establish stewards for governance of data quality.
Benefits: identify data quality issues early to reduce project risks; monitor quality metrics over time for compliance; create business confidence with trusted information.
Applying Information Analyzer
The solution perspective in a variety of use cases.
Monitor your trusted systems and their consistency with sources through transformations.
Report status & progress to the business.
Supply metrics to governance initiatives.
Monitor quality at the source to address issues where information originates.
[Figure labels: BI Applications, Master Data Management, Packaged Applications, External Sources, Data Warehouse, Data Source, Integrate, Analyze, Govern, Information Analyzer.]
Data Quality: Pervasive, Progressive, Continuous
Information Analyzer supports the full spectrum across all levels.
Common Measurements (test): tax-id field has many Nulls.
Data Rules (deploy): Rule: tax-id not Null And not default.
Metrics (define, business-driven): Threshold=95% for tax-id rule.
DQ Dashboard + Reports.
Levels shown: Generic, Business Aligned, Business Driven, Business Measured.
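The tax-id example can be sketched as a rule, a measured pass rate, and a threshold check. This is a plain Python illustration, not Information Analyzer rule syntax; the default placeholder value `"000000000"` and treating blank strings as Null are assumptions.

```python
def tax_id_rule(value, default_values=("000000000",)):
    """Rule: tax-id not Null And not default (blank is treated as Null here)."""
    if value is None or value.strip() == "":
        return False
    return value not in default_values


def pass_rate(values, rule):
    """Share of records that satisfy the rule, between 0.0 and 1.0."""
    values = list(values)
    if not values:
        raise ValueError("no records to measure")
    passed = sum(1 for v in values if rule(v))
    return passed / len(values)


def meets_threshold(rate, threshold=0.95):
    """Metric: the rule's pass rate must reach the business-defined threshold."""
    return rate >= threshold


# Hypothetical sample: 18 populated tax ids, one Null, one default placeholder
sample = ["123456789"] * 18 + [None, "000000000"]
rate = pass_rate(sample, tax_id_rule)
print(rate, meets_threshold(rate))
```

Here 18 of 20 records pass, so the measured rate of 90% falls short of the 95% threshold and the metric would be reported as missed on a dashboard.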
Common Data Quality Dimensions and Measurements
Domain quality: completeness, validity, length & format.
Cross-domain fitness.
Redundancy.
Inconsistency.
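The domain quality measurements (completeness, validity, length and format) can be illustrated with a minimal column profile. The function names and the format mask convention (digits as `9`, letters as `A`) are assumptions made for this sketch and do not reproduce how any IBM product computes its statistics.

```python
import re
from collections import Counter


def format_mask(value):
    """Generalize a value into a format mask: digits become 9, letters become A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values, valid_pattern):
    """Return completeness, validity, length and format statistics for one column."""
    total = len(values)
    populated = [v for v in values if v not in (None, "")]
    valid = [v for v in populated if re.fullmatch(valid_pattern, v)]
    return {
        "completeness": len(populated) / total if total else 0.0,
        "validity": len(valid) / len(populated) if populated else 0.0,
        "lengths": dict(Counter(len(v) for v in populated)),
        "formats": dict(Counter(format_mask(v) for v in populated)),
    }


# Hypothetical column of identifiers with one Null and one blank
column = ["123-45-6789", "987654321", None, "", "111-22-3333"]
profile = profile_column(column, r"\d{3}-\d{2}-\d{4}")
print(profile)
```

Redundancy and inconsistency are cross-column or cross-table checks and are not covered by this single-column sketch.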
Data Rules
Specify consistent & re-usable data rules, driven by business.
[Diagram: "The account number must meet the following condition: ..." is expressed as a Data Rule, which is driven by business users and validated against data.]
Examples of rules:
The Gender field must be populated and must be in the list of accepted values.
The Social Security Number must be numeric and in the format 999-99-9999.
If Date of Birth exists AND Date of Birth > 1900-01-01 and < TODAY, then Customer Type equals P.
The Bank Account Branch ID is valid in the Branch Reference master list.
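The four example rules can be written as small predicates over a record. This is a plain Python sketch, not Information Analyzer rule syntax; the field names, the accepted gender codes and the branch reference list are all hypothetical.

```python
import re
from datetime import date

ACCEPTED_GENDERS = {"M", "F"}           # assumed list of accepted values
VALID_BRANCH_IDS = {"B001", "B002"}     # assumed Branch Reference master list


def gender_rule(record):
    """Gender must be populated and in the list of accepted values."""
    return record.get("gender") in ACCEPTED_GENDERS


def ssn_rule(record):
    """Social Security Number must be numeric in the format 999-99-9999."""
    return re.fullmatch(r"\d{3}-\d{2}-\d{4}", record.get("ssn") or "") is not None


def dob_customer_type_rule(record, today):
    """If Date of Birth exists and lies between 1900-01-01 and today,
    then Customer Type must equal P. Otherwise the rule does not apply."""
    dob = record.get("date_of_birth")
    if dob is not None and date(1900, 1, 1) < dob < today:
        return record.get("customer_type") == "P"
    return True


def branch_rule(record):
    """Bank Account Branch ID must exist in the branch reference list."""
    return record.get("branch_id") in VALID_BRANCH_IDS


def evaluate(record, today):
    """Run every rule and report pass or fail per rule name."""
    return {
        "gender": gender_rule(record),
        "ssn_format": ssn_rule(record),
        "dob_customer_type": dob_customer_type_rule(record, today),
        "branch_reference": branch_rule(record),
    }


today = date(2012, 1, 1)
good = {"gender": "F", "ssn": "123-45-6789", "date_of_birth": date(1985, 3, 2),
        "customer_type": "P", "branch_id": "B001"}
bad = {"gender": "", "ssn": "123456789", "date_of_birth": date(1985, 3, 2),
       "customer_type": "O", "branch_id": "B999"}
print(evaluate(good, today))
print(evaluate(bad, today))
```

Keeping each rule as a separate named function mirrors the idea of re-usable rule definitions: the same predicate can be applied to several sources and its pass rate tracked as a metric.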
Measure results vs. targets
View Metric & Benchmark summaries.
Organize Metrics and Rules within user-defined folders.
Create Metrics across single or multiple Data Rules.
Comprehensive reporting and tracking environment
From high-level dashboard to flexible views.
Quickly assess the health of your information in the summary dashboard view.
Drill into specific data quality assessment results.
Understand the details from multiple perspectives, based on flexible configuration.
Validating Data Rules in InfoSphere QualityStage/DataStage
Embed Information Analyzer Data Rule Definitions in DataStage/QualityStage jobs.
Create new data rules through the DataStage/QualityStage Designer.
Enables an integrated and comprehensive development environment across QualityStage, DataStage and Information Analyzer.
InfoSphere QualityStage
Standardize, cleanse and deduplicate data, ensuring a complete, accurate view of information.
Requirements: resolution of data quality issues; standardization of data formats; cleanse data; manage duplicate data; enable ongoing quality.
Benefits: removes duplicates; cross-references matching records; survives a single, complete record; validates and enriches data; highly accurate for fast ROI.
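The standardize, match and survive sequence can be sketched in a few lines. This is a deliberately simple deterministic illustration: records are matched on an exact key after standardization, whereas a product such as QualityStage is considerably more sophisticated. The field names and the choice of match key are assumptions.

```python
from collections import defaultdict


def standardize(record):
    """Trim, collapse whitespace and upper-case every string field."""
    return {
        field: " ".join(value.split()).upper() if isinstance(value, str) else value
        for field, value in record.items()
    }


def match_key(record):
    """Assumed match key: standardized name plus postal code."""
    return (record.get("name"), record.get("postal_code"))


def survive(group):
    """Build one surviving record, keeping the first populated value per field."""
    survivor = {}
    for record in group:
        for field, value in record.items():
            if value not in (None, "") and not survivor.get(field):
                survivor[field] = value
    return survivor


def deduplicate(records):
    """Standardize, group records sharing a match key, and survive one per group."""
    groups = defaultdict(list)
    for record in records:
        clean = standardize(record)
        groups[match_key(clean)].append(clean)
    return [survive(group) for group in groups.values()]


# Hypothetical customer records: the first two describe the same person
customers = [
    {"name": "john  smith", "postal_code": "20100", "phone": ""},
    {"name": "John Smith ", "postal_code": "20100", "phone": "555-0100"},
    {"name": "Ann Lee", "postal_code": "00100", "phone": None},
]
golden = deduplicate(customers)
print(golden)
```

The two John Smith records collapse into one survivor that carries the phone number from the second record, which is the sense in which a single, complete record is survived.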
IBM InfoSphere: Your Trusted Platform for Trusted Information
Intelligent: prebuilt, automated, proactive.
Integrated: integrated capabilities designed to address enterprise use cases.
Comprehensive: covering the full information supply chain.
InfoSphere is a market leader in every category of Information Integration and Governance.
Next steps in Information Governance
IBM Information Governance Council: established the Information Governance Council over five years ago; developed the Maturity Model for Information Governance, leveraged by over 250 customers; community now exceeds 1500 members.
Join the community: www.infogovcommunity.com
Self assessment, workshops and assessments.
For more information: www.ibm.com/informationgovernance
BACKUP WHAT IS NEW IN INFOSPHERE INFORMATION SERVER FOR DATA QUALITY 9.1
Key Data Quality Enhancements
New Information Governance Rules & Policies.
New Data Quality Console.
Extended platform support.
Data Validation Rule Impact Analysis.
Data Validation Rule Sequencing.
New Address Verification Module.
New Standardization Rules Designer.
[Figure: enhancements mapped to the life cycle stages define objectives, assess & discover, validate, cleanse & enrich, master, monitor / track.]
Data Validation Rules: Flexible Output Table Configuration, Sequencing & Impact Analysis
Flexible configuration of output tables for Data Validation Rules (naming, append/overwrite).
Registration & reuse of output tables.
Sequencing of Data Validation Rules.
Advanced web-based Data Validation Rule display, including lineage and impact analysis.
Data Validation Rules: User-named output table configuration & sequencing
Define the name of output tables for Data Validation Rules.
Simple user-named tables: a single table for a single rule.
Advanced user-named tables: one or more rules can update the same table (common format required).
Configure whether to append or overwrite values in the output table.
[Workflow example: Source DB, Data Validation Rule 1, Data Validation Rule 2, Data Validation Rule 3, Output Table 1.]
Data Validation Rules: Search, browse and view Data Validation Rules & associated assets
Users may browse, search and display details of published Rule Definitions, including usage by DataStage and Glossary assignments.
Data Validation Rules: View Data Validation Rules in lineage displays
Stage details include a reference to the Data Rule Definition and the changed Rule Logic.
Lineage displays data flow through the Job.
Data Validation Rules: Drill down from job level to Data Validation Rules details
Expanding the details of a Job previews the data flow within the Job.
Data is pushed into and out of the Rule Stage via its connecting links.
New Address Verification Module
Provide traceability and auditability to the data steward role.
Capabilities: superior GeoCoding support for 240 countries / territories; improved verification, suggestion and correction results; bi-directional transliteration support; tightly integrated into QualityStage; support for most Information Server versions; extensible framework to support other features in the future, such as Address Certification.
Benefits: reduced errors in shipping, mailing and other activity, resulting in lower cost; better customer service and increased revenue; increased business confidence when using enterprise data for critical decision making.
New InfoSphere Data Quality Console
Unified environment to proactively increase Data Quality awareness.
[Figure labels. Roles: Steward, Business Analyst, Data Analyst, DQ/ETL Developer. Life cycle stages: define objectives, assess & discover, validate, cleanse & enrich, master, monitor / track, report. Components: Discovery / Information Analyzer, Exception Manager, DataStage, Information Analyzer, QualityStage.]
New InfoSphere Data Quality Console: Dashboard view displaying most critical information at a glance
New InfoSphere Data Quality Console: Exception summary display with advanced filtering options
Backup. New InfoSphere Data Quality Console: Assigning ownership for exception summaries
Backup. New InfoSphere Data Quality Console: View summary / metadata information for Data Validation Rules and exceptions
New InfoSphere Data Quality Console: View exception records for Data Validation Rules
New InfoSphere Data Quality Console: View summary / metadata information for Matching Rules and exceptions
New InfoSphere Data Quality Console: View exception records (clerical records) for Match Rules
New InfoSphere Standardization Rules Designer
Simplifying & accelerating the speed of cleansing data.
Knowledge holders looking at the data: what they want to see, and what they will see in the new user interface.
[Figure: life cycle stages define objectives, assess & discover, validate, cleanse & enrich, master, monitor / track.]
New InfoSphere Standardization Rules Designer
Data-driven standardization when cleansing data.
Intuitive framework to design, maintain and execute standardization rules for data quality.
Web-based user interface allows users to quickly begin the Classification process by changing or adding value definitions to their data.
Drag and drop features allow users to easily manage rules that handle their records without needing to hand write any pattern action language (PAL) code.
Allows team collaboration with the ability to work on any revision of the rule.
IBM Data Quality: other features and enhancements
Discovery: complete globalization; RHEL + AIX support for engine (client: Windows), 64-bit; enhancements for life cycle & test data management (vol. projection).
Information Analyzer: pre-defined data quality rules delivered with product.