An Oracle White Paper January 2013. Comprehensive Data Quality with Oracle Data Integrator and Oracle Enterprise Data Quality





Executive Overview

Poor data quality impacts almost every company. In fact, according to research from Gartner, companies routinely make decisions based on remarkably inaccurate or incomplete data, a leading cause of the failure of high-profile and high-cost IT projects such as business-intelligence and customer-relationship management deployments. Inconsistent, inaccurate, incomplete, and out-of-date data are often the root cause of expensive business problems such as operational inefficiencies, faulty analysis for business optimization, unrealized economies of scale, and dissatisfied customers. These data quality issues and the business-level problems associated with them can be solved by committing to a comprehensive data quality effort across the enterprise. Oracle Data Integrator offers a complete data integration solution to meet any data quality challenge for any type of data domain with a single, well-integrated technology package.

Figure 1 - Comprehensive Data Quality Process

Oracle's solution for comprehensive data quality includes two products: Oracle Data Integrator and Oracle Enterprise Data Quality. These best-of-breed technologies work seamlessly together to solve the most challenging enterprise data quality problems.

Introduction

The first step in a comprehensive data quality program is to assess the quality of your data through data profiling. Profiling data means analyzing metadata from various data stores, detecting patterns in the data so that additional metadata can be inferred, and comparing the actual data values to expected data values with full drill-down capabilities. Profiling provides an initial baseline for understanding the ways in which actual data values in your systems fail to conform to expectations. When used prior to designing integration processes, data profiling helps reduce the implementation time and lowers the associated risks. In addition, advanced profiling capabilities ensure data assessment is not a one-time activity, but an ongoing practice that ensures trusted data over time.

Once data problems are well understood, the rules to repair those problems can be created and executed by data quality engines. An initial set of rules can be generated based on the results of profiling, then users that understand the data can refine and extend those rules. Data quality rules range from ensuring data integrity to sophisticated parsing, cleansing, standardization, matching, validation and de-duplication.

After data quality rules have been generated, fine-tuned, and tested against data samples, those rules must be added to data integration processes to ensure a pervasive data quality framework is in place across the enterprise. Data can be repaired either statically in the original systems or as part of a data flow. Flow-based control minimizes disruption to existing systems and ensures that downstream analysis and processing works on reliable, trusted data.

Figure 2 - Profile data, generate data quality rules, add to ETL flow, and execute overall data integration jobs

Finally, the data integration processes including data quality rules are placed into production. The runtime performance and reliability of the data quality servers used to process these rules is of utmost importance. Data profiling creates a closed loop of continuous data quality monitoring and increasingly refined data repair.

Any data quality problem should be solved using these basic steps. Some data quality challenges can be solved with the standard quality features included with Oracle Data Integrator. More troublesome problems will require the advanced capabilities available with the optional Oracle Enterprise Data Quality technology, which is integrated with Oracle Data Integrator. The following sections explain the profiling and quality functions available with the core Oracle Data Integrator technology, along with the more advanced features available with Oracle Enterprise Data Quality.

Standard Data Quality with Oracle Data Integrator

Oracle Data Integrator enables application designers and business analysts to define declarative rules for data integrity directly in the centralized Oracle Data Integrator metadata repository. These rules are applied to application data inline with batch or real-time extract, transform, and load (ETL) jobs to guarantee the overall integrity, consistency, and quality of enterprise information.

Defining Business Rules

Oracle Data Integrator can automatically retrieve existing rules that have been defined at the data level (such as database constraints) using a customizable reverse-engineering process. Developers can also create new declarative rules without coding by using the graphical user interface in Oracle Data Integrator Studio. These rules can be inferred by looking at the data within Oracle Data Integrator. Developers can immediately test the new declarative rules against the data by performing a synchronous check.

Figure 3 - Data quality rules can be checked against any ETL data inline with Oracle Data Integrator
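The idea of retrieving rules that already exist at the data level can be illustrated outside of Oracle Data Integrator. The following is a minimal sketch, not Oracle Data Integrator's reverse-engineering process: it assumes a SQLite database and reads the declared constraints from SQLite's catalog through its standard PRAGMA statements. The table names, column names, and the `reverse_engineer_rules` helper are hypothetical.

```python
import sqlite3

def reverse_engineer_rules(conn, table):
    """Collect constraints declared on `table` as simple rule tuples.

    `table` is assumed to be a trusted identifier (it is interpolated
    into the PRAGMA statements).
    """
    rules = []
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for row in conn.execute(f"PRAGMA table_info({table})").fetchall():
        name, notnull, pk = row[1], row[3], row[5]
        if notnull:
            rules.append(("not_null", name))
        if pk:
            rules.append(("primary_key", name))
    # index_list rows start with: (seq, name, unique, ...)
    for row in conn.execute(f"PRAGMA index_list({table})").fetchall():
        index_name, unique = row[1], row[2]
        if unique:
            # index_info rows: (seqno, cid, name)
            info = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
            rules.append(("unique", tuple(r[2] for r in info)))
    # foreign_key_list rows: (id, seq, table, from, to, ...)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})").fetchall():
        rules.append(("foreign_key", row[3], row[2], row[4]))
    return rules

# Hypothetical schema used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_rep (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,
        email  TEXT UNIQUE,
        zip    TEXT NOT NULL,
        rep_id INTEGER NOT NULL REFERENCES sales_rep(id)
    );
""")
rules = reverse_engineer_rules(conn, "customer")
for rule in rules:
    print(rule)
```

Rules recovered this way form a starting set that can then be extended with new declarative rules that were never declared in the database.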

Type of Rules

Rules for data integrity can include the following:

- Uniqueness rules
  - Different customers must not have the same e-mail address
  - Different products must have different product and family codes
- Referential integrity rules
  - All customers must have a sales representative
  - Orders must not be linked to customers marked as Invalid
- Validation rules that enforce consistency at the record level
  - Customers must not have an empty zip code
  - Web contacts must have a valid e-mail address

Enforcing Business Rules

Oracle Data Integrator's customizable Check Knowledge Modules (CKMs) help developers automatically enforce the data integrity of their applications based on declarative rules that have been captured by Oracle Data Integrator. These CKMs generate the code necessary for static or dynamic data checks and also for any error recycling that is performed as part of the integration process.

Audits provide statistics on the integrity of application data. They also isolate data that is detected as erroneous by applying the business rules. Once erroneous records have been identified and isolated in error tables, they can be accessed from Oracle Data Integrator Studio, or from any other front-end application.

Figure 4 - Erroneous data can be easily reviewed using Oracle Data Integrator Studio
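The three rule types and the isolation of erroneous records can be sketched in a few lines. This is an illustration of the concept only and is not the code that a Check Knowledge Module generates. The rule names, the sample rows, the simplified e-mail pattern, and the `check_customers` function are all assumptions made for the example.

```python
import re
from collections import Counter

# Deliberately simple e-mail pattern; real validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_customers(customers, sales_rep_ids):
    """Split rows into valid rows and an error table, and count violations."""
    email_counts = Counter(c["email"] for c in customers if c["email"])
    valid, errors, audit = [], [], Counter()
    for row in customers:
        violations = []
        # Uniqueness rule: different customers must not share an e-mail.
        if row["email"] and email_counts[row["email"]] > 1:
            violations.append("UNIQUE_EMAIL")
        # Referential integrity rule: the sales representative must exist.
        if row["rep_id"] not in sales_rep_ids:
            violations.append("FK_SALES_REP")
        # Validation rules at the record level.
        if not row["zip"]:
            violations.append("ZIP_NOT_EMPTY")
        if not row["email"] or not EMAIL_RE.match(row["email"]):
            violations.append("VALID_EMAIL")
        if violations:
            errors.append({**row, "violations": violations})
            audit.update(violations)
        else:
            valid.append(row)
    return valid, errors, audit

customers = [
    {"id": 1, "email": "ann@example.com", "zip": "94065", "rep_id": 10},
    {"id": 2, "email": "ann@example.com", "zip": "10001", "rep_id": 10},
    {"id": 3, "email": "bob@example.com", "zip": "", "rep_id": 10},
    {"id": 4, "email": "not-an-email", "zip": "60601", "rep_id": 99},
    {"id": 5, "email": "eve@example.com", "zip": "02139", "rep_id": 11},
]
valid, errors, audit = check_customers(customers, sales_rep_ids={10, 11})
print([r["id"] for r in valid])
print([(r["id"], r["violations"]) for r in errors])
print(dict(audit))
```

Here the `errors` list plays the role of an error table, and `audit` holds the per-rule statistics that an audit would report.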

This extensive audit information on data integrity makes it possible to perform a detailed analysis, so that erroneous data can be handled according to information technology strategies and best practices. For example, the following are four ways erroneous data might be handled:

- Automatically correct data: Oracle Data Integrator offers a set of tools to simplify the creation of data cleansing interfaces that can be scheduled to run at predetermined intervals.
- Accept erroneous data (for the current project): In this case, interface developers need precise rules for filtering out erroneous data later, using Oracle Data Integrator filters.
- Correct the invalid records: In this situation, the invalid data is sent to application end users via various text formats or distribution modes, such as human workflow, e-mail, HTML, XML, flat text files, and so on, using Oracle Data Integrator packages.
- Recycle data: Erroneous data from an audit can be recycled into the integration process.

All these strategies can be automated using Oracle Data Integrator interfaces and packages without any additional data quality components. Therefore, Oracle Data Integrator puts data quality at the very heart of integration processes with robust standard data quality capabilities.
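Two of the strategies above, automatic correction and recycling, can be combined into a small loop. The sketch below is hypothetical and does not represent Oracle Data Integrator interfaces or packages. It assumes a single rule (a zip code must be exactly five digits) and a naive cleansing step that keeps only the digits.

```python
def is_valid(row):
    """Assumed rule for this example: zip must be exactly five digits."""
    return len(row["zip"]) == 5 and row["zip"].isdigit()

def cleanse(row):
    """Hypothetical automatic correction: drop every non-digit character."""
    digits = "".join(ch for ch in row["zip"] if ch.isdigit())
    return {**row, "zip": digits}

def run_with_recycling(rows):
    """Load valid rows, cleanse rejected rows, and recycle them once."""
    loaded, rejected = [], []
    for row in rows:
        (loaded if is_valid(row) else rejected).append(row)
    still_invalid = []
    for row in rejected:
        fixed = cleanse(row)
        # Recycled rows go through the same check as first-time rows.
        (loaded if is_valid(fixed) else still_invalid).append(fixed)
    return loaded, still_invalid

rows = [
    {"id": 1, "zip": "94065"},
    {"id": 2, "zip": " 10001 "},
    {"id": 3, "zip": "6060"},
    {"id": 4, "zip": "02139."},
]
loaded, still_invalid = run_with_recycling(rows)
print([r["id"] for r in loaded])
print([r["id"] for r in still_invalid])
```

Rows that remain invalid after cleansing are the candidates for the third strategy, where the records are sent to end users for manual correction.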

Advanced Data Quality and Data Profiling with Oracle Enterprise Data Quality

In situations where the business requirements demand the most advanced data quality capabilities, Oracle Data Integrator can meet those demands with optional functionality available in Oracle Enterprise Data Quality. Oracle Enterprise Data Quality provides an end-to-end solution to measure, improve, and manage the quality of data from any domain, including customer and product data. The combination of Oracle Data Integrator's best-of-breed E-LT capabilities with the Oracle Enterprise Data Quality platform makes for an unbeatable solution to enterprise-scale data quality issues.

Oracle Enterprise Data Quality provides advanced features such as:

- Profiling: Data profiling capabilities help users to analyze and understand their data, highlighting key areas of data discrepancy and the business impact of these problems. Users can learn from historical analysis and define business rules directly from the data thanks to an integrated data quality user interface.
- Parsing and Standardization: A rich palette of functions to parse, cleanse and standardize any type of data is provided. Using easily managed reference data and a simple graphical configuration, users can quickly configure, package, share and deploy rules specific to their data and industry without any coding.
- Matching and Merging: Powerful matching capabilities allow users to identify matching records and optionally link or merge matched records together based on survivorship rules. Flexible and intuitive configuration capabilities allow matching and merging rules to be easily tuned to suit your needs.
- Address Validation: Oracle Enterprise Data Quality offers an optional address standardization and enhancement module which can be used in a data quality process. This component supports more than 240 countries worldwide and has built-in geocoding capabilities.
Oracle Enterprise Data Quality not only provides these features for global data, with built-in rule sets for different countries and support for Unicode and double-byte data, but its cleansing features can also be used against product data, brand data, financial data, and other types of non-customer party data.
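To make the profiling, standardization, and matching features more concrete, here is a generic sketch of each technique using only the Python standard library. It does not reproduce Oracle Enterprise Data Quality's processors or algorithms. The abbreviation table (standing in for reference data), the 0.9 similarity threshold, the survivorship rule, and all function names are assumptions chosen for the example.

```python
from collections import Counter
from difflib import SequenceMatcher

def pattern_of(value):
    """Profiling: reduce a value to its shape (digit -> 9, letter -> A)."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )

def profile(values):
    """Count how often each pattern occurs in a column."""
    return Counter(pattern_of(v) for v in values)

# Hypothetical reference data for standardization.
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated"}

def standardize(name):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    words = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def is_match(a, b, threshold=0.9):
    """Matching: compare standardized names with a similarity ratio."""
    ratio = SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold

def merge(a, b):
    """Survivorship (assumed): newest record wins, gaps filled from older."""
    newer, older = (a, b) if a["updated"] >= b["updated"] else (b, a)
    return {key: newer[key] or older[key] for key in newer}

phones = ["415-555-0100", "415-555-0101", "4155550102", "n/a"]
patterns = profile(phones)
print(patterns.most_common())

rec_a = {"name": "Acme Corp.", "phone": "", "updated": "2012-08-01"}
rec_b = {"name": "ACME Corporation", "phone": "415-555-0100", "updated": "2011-01-15"}
if is_match(rec_a["name"], rec_b["name"]):
    merged = merge(rec_a, rec_b)
    print(merged)
```

The pattern counts show at a glance which values deviate from the dominant format, which is the kind of finding that can be turned into a validation or standardization rule.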

Figure 5 - Profiling and cleansing data using Oracle Enterprise Data Quality

Oracle Data Integrator and Oracle Enterprise Data Quality are well integrated, and users can leverage the data parsing, standardization, enrichment, and matching features of Oracle Enterprise Data Quality in their ETL processes.

Choosing the Right Tools

Not every data integration project requires advanced data quality and data profiling abilities, but how do you choose the right tool for each project? Certain trade-offs should be considered along functional abilities, while others should be along performance and architectural implications on overall quality of service (QoS) and service-level agreements (SLA). Here are some questions to consider when selecting a comprehensive data quality solution:

- What is the acceptable balance between high quality and low latency? (Typically, the higher the quality your data must be, the more time is required to introspect that data, apply cleansing algorithms, compare to trusted sources, and finally insert to a warehouse or operational system.)
- Can data quality be enforced at point-of-entry, or only in batch? Often, the best way to improve data quality is to prevent bad data from the outset, but sometimes this can be impractical if it slows down the end-user application or entails a major front-office upgrade.
- Is standard data quality good enough, or do I need advanced abilities?

The following table details some of the differences between the standard data quality features of Oracle Data Integrator and the more advanced quality features of Oracle Enterprise Data Quality.

The following table details some of the differences between the standard data profiling features of Oracle Data Integrator and the more advanced features of Oracle Enterprise Data Quality.

Conclusion

Comprehensive data quality should be a key enabling technology for any IT infrastructure, and it is critical to solving a range of expensive business problems. Comprehensive data quality is particularly important in the context of any data integration process to prevent data quality problems from proliferating. Oracle Data Integrator's inline, stepped approach to comprehensive data quality ensures that data is adequately verified, validated, and cleansed at every point of the integration process. Oracle Data Integrator has both standard and advanced data quality capabilities, which feature the same high performance and simplicity that are characteristic of the entire Oracle Fusion Middleware technology stack.

Comprehensive Data Quality with Oracle Data Integrator and Oracle Enterprise Data Quality
August 2012
Author: Julien Testut

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright 2013, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.