An Oracle White Paper March 2014. Best Practices for Real-Time Data Warehousing


Executive Overview

Today's integration project teams face a daunting challenge: while data volumes are growing exponentially, the need for timely and accurate business intelligence is also constantly increasing. Batches for data warehouse loads used to be scheduled daily to weekly; today's businesses demand information that is as fresh as possible. Because the value of this real-time business data decreases as it gets older, low data integration latency is essential to the business value of the data warehouse. At the same time, the concept of business hours is vanishing for a global enterprise, as data warehouses are in use 24 hours a day, 365 days a year. This means that the traditional nightly batch windows are becoming harder to accommodate, and interrupting or slowing down sources is not acceptable at any time during the day. Finally, integration projects have to be completed in shorter release timeframes, while fully meeting functional, performance, and quality specifications on time and within budget. These processes must be maintainable over time, and the completed work should be reusable for further, more cohesive, integration initiatives.

Conventional Extract, Transform, Load (ETL) tools closely intermix data transformation rules with integration process procedures, requiring the development of both data transformations and data flow. Oracle Data Integrator (ODI) takes a different approach to integration by clearly separating the declarative rules (the "what") from the actual implementation (the "how"). With Oracle Data Integrator, declarative rules describing mappings and transformations are defined graphically, through a drag-and-drop interface, and stored independently from the implementation. Oracle Data Integrator automatically generates the data flow, which can be fine-tuned if required. This innovative approach for declarative design has also been applied to Oracle Data Integrator's framework for Changed Data Capture (CDC). Oracle Data Integrator's Change Data Capture framework makes it possible to move only changed data to the target systems and can be integrated with Oracle GoldenGate, thereby enabling the kind of real-time integration that businesses require.

This technical brief describes several techniques available in Oracle Data Integrator to adjust data latency from scheduled batches to continuous real-time integration.

Introduction

The conventional approach to data integration involves extracting all data from the source system and then integrating the entire set, possibly using an incremental strategy, in the target system. This approach, which is suitable in most cases, can be inefficient when the integration process requires real-time data integration. In such situations, the amount of data involved makes data integration impossible in the given timeframes.

Basic solutions, such as filtering records according to a timestamp column or changed flag, are possible, but they might require modifications in the applications. In addition, they usually do not sufficiently ensure that all changes are taken into account. Oracle Data Integrator's Change Data Capture identifies and captures data as it is being inserted, updated, or deleted from datastores, and it makes the changed data available for integration processes.

Real-Time Data Integration Use Cases

Integration teams require real-time data integration with low or no data latency for a number of use cases. While this white paper focuses on data warehousing, it is useful to differentiate the following areas:

- Real-time data warehousing: Aggregation of analytical data in a data warehouse using continuous or near real-time loads.
- Operational reporting and dashboards: Selection of operational data into a reporting database for Business Intelligence tools and dashboards.
- Query offloading: Replication of high-cost or legacy OLTP servers to secondary systems to ease query load.
- High availability / disaster recovery: Duplication of database systems in active-active or active-passive scenarios to improve availability during outages.
- Zero-downtime migrations: Ability to synchronize data between old and new systems with potentially different technologies to allow for switch-over and switch-back without downtime.
- Data federation / data services: Provision of virtual, canonical views of data distributed over several systems through federated queries over heterogeneous sources.

Oracle has various solutions for different real-time data integration use cases. Query offloading, high availability/disaster recovery, and zero-downtime migrations can be handled through the Oracle GoldenGate product, which provides heterogeneous, non-intrusive, and highly performant changed data capture, routing, and delivery.
To provide loads with low or no latency, Oracle Data Integrator offers various alternatives for real-time data warehousing through the use of Change Data Capture mechanisms, including the integration with Oracle GoldenGate. This integration also provides seamless operational reporting. Data federation and data service use cases are covered by Oracle Data Service Integrator (ODSI).

Architectures for Loading Data Warehouses

Various architectures for collecting transactional data from operational sources have been used to populate data warehouses. These techniques vary mostly in the latency of data integration, from daily batches to continuous real-time integration. The capture of data from sources is performed either through incremental queries that filter based on a timestamp or flag, or through a Change Data Capture mechanism that detects changes as they happen. Architectures are further distinguished by pull and push operation: a pull operation polls for new data at fixed intervals, while in a push operation data is loaded into the target as soon as a change appears.

A daily batch mechanism is most suitable if intra-day freshness is not required for the data, such as longer-term trends or data that is only calculated once daily, for example financial close information. Batch loads might be performed in a downtime window if the business model doesn't require 24-hour availability of the data warehouse. Different techniques, such as real-time partitioning or trickle-and-flip [1], exist to minimize the impact of a load on a live data warehouse without downtime.

               | Batch | Mini-Batch | Micro-Batch | Real-Time
Description    | Data is loaded in full or incrementally using an off-peak window. | Data is loaded incrementally using intra-day loads. | Source changes are captured and accumulated to be loaded in intervals. | Source changes are captured and immediately applied to the DW.
Latency        | Daily or higher | Hourly or higher | 15 minutes or higher | Sub-second
Capture        | Filter query | Filter query | CDC | CDC
Initialization | Pull | Pull | Push, then pull | Push
Target load    | High impact | Low impact, load frequency is tunable | Low impact, load frequency is tunable | Low impact, load frequency is tunable
Source load    | High impact | Queries at peak times necessary | Some to none, depending on CDC technique | Some to none, depending on CDC technique

[1] See also: Real-Time Data Warehousing: Challenges and Solutions by Justin Langseth (http://dssresources.com/papers/features/langseth/langseth02082004.html)
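The filter-query capture used by the Batch and Mini-Batch architectures can be illustrated with a minimal sketch. It assumes a hypothetical `orders` table with an `updated_at` column and uses an in-memory SQLite database purely for illustration; none of these names come from Oracle Data Integrator. The sketch also shows one way such a filter can miss changes: a deleted row leaves nothing for the query to select.

```python
import sqlite3

# Hypothetical source table with a last-modified column (illustrative names only).
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 105)]
)


def pull_changes(conn, watermark):
    """Return rows modified after the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = max((r[2] for r in rows), default=watermark)
    return rows, new_watermark


# First poll: everything newer than the initial watermark.
first_batch, wm = pull_changes(src, 0)

# Between polls the source sees an update, an insert, and a delete.
src.execute("UPDATE orders SET amount = 25.0, updated_at = 110 WHERE id = 2")
src.execute("INSERT INTO orders VALUES (3, 30.0, 112)")
src.execute("DELETE FROM orders WHERE id = 1")

# Second poll: the update and insert are found, the delete is invisible.
second_batch, wm = pull_changes(src, wm)
print(first_batch)
print(second_batch)
```

In this sketch the target would never learn that order 1 was removed, which is the kind of gap a Change Data Capture mechanism is meant to close.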

IMPLEMENTING CHANGE DATA CAPTURE WITH ORACLE DATA INTEGRATOR

Change Data Capture as a concept is natively embedded in Oracle Data Integrator. It is controlled by the modular Knowledge Module concept and supports different methods of Change Data Capture. This chapter describes the details and benefits of the Oracle Data Integrator Change Data Capture feature.

Modular Framework for Different Load Mechanisms

Oracle Data Integrator supports each of the described data warehouse load architectures with its modular Knowledge Module architecture. Knowledge Modules enable integration designers to separate the declarative rules of data mapping from the selection of a best-practice mechanism for data integration. Batch and Mini-Batch strategies can be defined by selecting Loading Knowledge Modules (LKM) for the appropriate incremental load from the sources. Micro-Batch and Real-Time strategies use the Journalizing Knowledge Modules (JKM) to select a Change Data Capture mechanism that gives immediate access to changes in the data sources. Mapping logic can be left unchanged when switching Knowledge Module strategies, so that a change in loading patterns and latency does not require a rewrite of the integration logic.

Methods for Tracking Changes using Change Data Capture

Oracle Data Integrator has abstracted the concept of Change Data Capture into a journalizing framework with a Journalizing Knowledge Module and journalizing infrastructure at its core. By isolating the physical specifics of the capture process from the processing of detected changes, it is possible to support a number of different techniques that are represented by individual Journalizing Knowledge Modules:

Non-invasive Change Data Capture through Oracle GoldenGate

Figure 1: GoldenGate-based CDC (diagram labels: Source, Staging, Target, Log, GoldenGate, J$, ODI Load, Real-Time Reporting)

Oracle GoldenGate provides a Change Data Capture mechanism that can process source changes non-invasively by processing log files of completed transactions and storing these captured changes into external Trail Files independent of the database. Changes are then reliably transferred to a staging database. The Journalizing Knowledge Module uses the metadata managed by Oracle Data Integrator to generate all Oracle GoldenGate configuration files and deploy them into the GoldenGate managers. It then processes all GoldenGate-detected changes in the staging area. These changes will be loaded into the target data warehouse using Oracle Data Integrator's declarative transformation mappings. This architecture enables separate real-time reporting on the normalized staging area tables in addition to loading and transforming the data into the analytical data warehouse tables.

Database Triggers

Figure 2: Trigger-based CDC (diagram labels: Source, Target, Trigger, J$, ODI Load)

Journalizing Knowledge Modules based on database triggers define procedures that are executed inside the source database when a table change occurs. Based on the wide availability of trigger mechanisms in databases, Journalizing Knowledge Modules based on triggers are available for a wide range of sources such as Oracle DB, IBM DB2/400 and UDB, Microsoft SQL Server, Sybase, and others. The disadvantage is the limited scalability and performance of trigger procedures, making them optimal for use cases with light to medium loads.
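The trigger-based approach can be illustrated with a minimal sketch in which triggers on a source table record each change in a journal table. This is not the code a Journalizing Knowledge Module generates: the table `j_orders`, its columns, the trigger names, and the flag values are assumptions made for illustration, and SQLite stands in for the source database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);

-- Hypothetical journal table: one row per captured change.
CREATE TABLE j_orders (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,  -- capture order
    flag TEXT NOT NULL,                      -- 'I', 'U' or 'D'
    id   INTEGER NOT NULL                    -- key of the changed source row
);

CREATE TRIGGER orders_ai AFTER INSERT ON orders
BEGIN INSERT INTO j_orders (flag, id) VALUES ('I', NEW.id); END;

CREATE TRIGGER orders_au AFTER UPDATE ON orders
BEGIN INSERT INTO j_orders (flag, id) VALUES ('U', NEW.id); END;

CREATE TRIGGER orders_ad AFTER DELETE ON orders
BEGIN INSERT INTO j_orders (flag, id) VALUES ('D', OLD.id); END;
""")

# Ordinary application statements; the triggers journal them as a side effect.
db.execute("INSERT INTO orders VALUES (1, 10.0)")
db.execute("UPDATE orders SET amount = 12.5 WHERE id = 1")
db.execute("DELETE FROM orders WHERE id = 1")

journal = db.execute("SELECT flag, id FROM j_orders ORDER BY seq").fetchall()
print(journal)
```

Every source statement now performs an additional write inside the source database, which is one way to picture the scalability limits of trigger procedures mentioned above.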

Source databases supported for Oracle Data Integrator Change Data Capture

Table columns: Database, JKM Oracle GoldenGate, Trigger-based CDC. Databases listed:

- Oracle
- MS SQL Server
- Sybase ASE
- DB2/UDB
- DB2/400 [2]
- DB2/390 [2]
- Teradata, Enscribe, MySQL, SQL/MP, SQL/MX [2]

[2] Requires customization of Oracle GoldenGate configuration generated by JKM

Publish-and-Subscribe Model

The Oracle Data Integrator journalizing framework uses a publish-and-subscribe model. This model works in three steps:

1. An identified subscriber, usually an integration process, subscribes to changes that might occur in a datastore. Multiple subscribers can subscribe to these changes.
2. The Change Data Capture framework captures changes in the datastore and then publishes them for the subscriber.
3. The subscriber (an integration process) can process the tracked changes at any time and consume these events. Once consumed, events are no longer available for this subscriber.

Oracle Data Integrator processes datastore changes in two ways:

- Regularly in batches (pull mode): for example, it processes new orders from the Web site every five minutes and loads them into the operational datastore (ODS)

- In real time (push mode) as the changes occur: for example, when a product is changed in the enterprise resource planning (ERP) system, it immediately updates the online catalog

Figure 3: The ODI Journalizing Framework uses a publish-and-subscribe architecture (diagram labels: Orders, Capture/Publish, CDC, Subscribe, Consume, Order #5A32, Integration Process 1, Target 1, Integration Process 2, Target 2)

Processing the Changes

Oracle Data Integrator employs a powerful declarative design approach, Extract-Load, Transform (E-LT), which separates the rules from the implementation details. Its out-of-the-box integration interfaces use and process the tracked changes. Developers define the declarative rules for the captured changes within the integration processes in the Oracle Data Integrator Designer graphical user interface without having to code. With the Oracle Data Integrator Designer, customers declaratively specify set-based maps between sources and targets, and then the system automatically generates the data flow from the set-based maps.

The technical processes required for processing the captured changes are implemented in Oracle Data Integrator's Knowledge Modules. Knowledge Modules are scripted modules that contain database and application-specific patterns. The runtime then interprets these modules and optimizes the instructions for targets.

Ensuring Data Consistency

Changes frequently involve several datastores at one time. For example, when an order is created, updated, or deleted, it involves both the orders table and the order lines table. When processing a new order line, the new order to which this line is related must be taken into account. Oracle Data Integrator provides a mode of tracking changes, called Consistent Set Changed Data Capture, for this purpose. This mode allows you to process sets of changes that guarantee data consistency.
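The three publish-and-subscribe steps described earlier in this chapter can be sketched as a small in-memory model. The `Journal` class and its method names are hypothetical and are not part of Oracle Data Integrator; the sketch only shows that each subscriber keeps its own view of unconsumed events, so consuming a change for one integration process leaves it available for another.

```python
class Journal:
    """Toy publish-and-subscribe journal (illustrative only)."""

    def __init__(self):
        self._pending = {}  # subscriber name -> list of unconsumed events

    def subscribe(self, subscriber):
        # Step 1: register a subscriber for changes on the datastore.
        self._pending.setdefault(subscriber, [])

    def publish(self, event):
        # Step 2: a captured change is published to every subscriber.
        for queue in self._pending.values():
            queue.append(event)

    def consume(self, subscriber):
        # Step 3: hand over pending events and clear them for this subscriber only.
        events = self._pending[subscriber]
        self._pending[subscriber] = []
        return events


journal = Journal()
journal.subscribe("process_1")
journal.subscribe("process_2")
journal.publish({"op": "I", "order": "5A32"})

first = journal.consume("process_1")   # process_1 receives the order
again = journal.consume("process_1")   # nothing left for process_1
second = journal.consume("process_2")  # process_2 still receives its copy
```

Whether `consume` is called on a schedule or immediately after `publish` is what distinguishes pull mode from push mode in this toy model.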

Best Practices using Oracle Data Integrator for Real-Time Data Warehousing

There is no one-size-fits-all approach to real-time data warehousing. Much depends on the latency requirements, the overall data volume and the daily change volume, the load patterns on sources and targets, and the structure and query requirements of the data warehouse. As covered in this paper, Oracle Data Integrator supports all approaches of loading a data warehouse.

In practice there is one approach that satisfies the majority of real-time data warehousing use cases: the micro-batch approach using GoldenGate-based Change Data Capture with Oracle Data Integrator. In this approach, one or more tables from operational databases are used as sources for GoldenGate Change Data Capture into a staging area database. This staging area provides a real-time copy of the transactional data for real-time reporting using Business Intelligence tools and dashboards. The operational sources are not additionally stressed, as GoldenGate capture is non-invasive and performant, and the separate staging area handles operational Business Intelligence queries without adding load to the transactional system. Oracle Data Integrator loads the changed records into the real-time data warehouse at frequent intervals of 15 minutes or more. This pattern has demonstrated the best combination of providing fresh, actionable data to the data warehouse without introducing inconsistencies in aggregates calculated in the data warehouse.

Figure 4: Micro-Batch Architecture using ODI and GoldenGate (diagram labels: Operational Source(s), Log, GoldenGate, Staging Area for Operational BI, Operational BI / Real-Time Reporting, Periodical ODI Load, Real-Time Data Warehouse)
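The periodic load step of the micro-batch pattern can be sketched as follows. This is a simplified model under stated assumptions, not ODI or GoldenGate code: `staging_journal` stands in for the changes accumulated in the staging area, `warehouse_totals` for an aggregate kept in the data warehouse, and `apply_micro_batch` is a hypothetical function that a scheduler would call at each load interval.

```python
def apply_micro_batch(aggregate, journal):
    """Apply all accumulated changes in one step and empty the journal.

    aggregate: dict mapping customer -> total order amount (warehouse side)
    journal:   list of (customer, amount_delta) changes since the last load
    Returns the number of changes applied.
    """
    batch = list(journal)  # snapshot the changes accumulated so far
    journal.clear()        # consumed changes are no longer pending
    for customer, delta in batch:
        aggregate[customer] = aggregate.get(customer, 0.0) + delta
    return len(batch)


warehouse_totals = {}
staging_journal = []

# Changes arrive continuously in the staging area between loads.
staging_journal += [("acme", 100.0), ("acme", -20.0), ("globex", 50.0)]

# The periodic load (for example every 15 minutes) applies them together.
applied = apply_micro_batch(warehouse_totals, staging_journal)
print(warehouse_totals)
```

Because the aggregate only changes at load boundaries in this model, queries between loads see totals computed from a complete set of changes rather than a partially applied stream.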

Conclusion

Integrating data and applications throughout the enterprise, and presenting a consolidated view of them, is a complex proposition. Not only are there broad disparities in data structures and application functionality, but there are also fundamental differences in integration architectures. Some integration needs are data oriented, especially those involving large data volumes. Other integration projects lend themselves to an event-oriented architecture for asynchronous or synchronous integration.

Changes tracked by Change Data Capture constitute data events. The ability to track these events and process them regularly in batches or in real time is key to the success of an event-driven integration architecture. Oracle Data Integrator provides rapid implementation and maintenance for all types of integration projects.

Best Practices for Real-time Data Warehousing
March 2014
Author: Alex Kotopoulis

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.