Oracle Data Integrator for Big Data
Alex Kotopoulis, Senior Principal Product Manager




Hands on Lab - Oracle Data Integrator for Big Data

Abstract: This lab highlights to developers, DBAs, and architects some of the best practices for implementing a Big Data Reservoir using E-LT techniques to improve performance and reduce data integration costs with Oracle Data Integrator. Participants will walk through the steps needed to load source data into a Hadoop cluster, transform it, and load it into a relational target. The lessons below cover the Oracle Data Integrator mappings, packages, and Oracle GoldenGate processes required to load and transform the data.

Contents:
- Architecture Overview
- Task 0: Preparation Steps
- Task 1: Review Topology and Model Setup
- Task 2: Load Hive Tables using Sqoop
- Task 3: Transforming Data within Hive
- Task 4: Load Oracle from Hive Tables using Oracle Loader for Hadoop
- Task 5: Creating a New ODI Package to Execute an End-to-End Load
- Task 6: Replicating New Records to Hive using Oracle GoldenGate
- Summary

Last Updated: 29-Aug-14

Architecture Overview

This hands-on lab is based on a fictional movie-streaming company that provides online access to movie media. The goal of this lab is to load customer activity data, which includes movie-rating actions, together with a movie database sourced from a MySQL DB into Hadoop Hive, aggregate and join average ratings per movie, and load this data into an Oracle DB target.

[Architecture diagram: log streams flow through Flume into an HDFS file that backs the Hive external table activity. The MySQL Movie table is loaded into the Hive movie table via Sqoop (Task 2) and kept in sync by Oracle GoldenGate (Task 6). A Hive mapping (Task 3) joins activity and movie and computes average movie ratings into the Hive table movie_rating, which an OLH mapping (Task 4) loads into the Oracle table MOVIE_RATING. Task 1 covers topology and models; Task 5 the ODI package.]

We are distributing the work into 6 tasks:
1. Review the prepared ODI topology and models connecting to MySQL, Hadoop, and Oracle DB.
2. Create a mapping that uses Apache Sqoop to load movie data from MySQL into Hive tables.
3. Create a mapping that joins customer activity data with movie data and aggregates average movie ratings into a target Hive table.
4. Load the movie rating information from Hive to Oracle DB using Oracle Loader for Hadoop.
5. Create a package workflow that orchestrates the mappings of tasks 2, 3, and 4 in one end-to-end load.
6. Create Oracle GoldenGate processes that detect inserts in the MySQL movie database and add them to the Hive movie table in real time.

Time to Complete: 60 minutes (all 6 tasks)

Prerequisites: Before you begin this tutorial, you should:
- Have a general understanding of RDBMS and Hadoop concepts.
- Have a general understanding of ETL concepts.

Task 0: Preparation Steps

In these steps you will clean up and set up the environment for this exercise.
1. Double-click Start/Stop Services on the desktop.
2. In the Start/Stop Services window, scroll down with the arrow keys to ORCL Oracle Database 12c and select it. Press OK.
Note: The ORCL option is initially not visible; you need to scroll down.

Task 1: Review Topology and Model Setup

The connectivity information has already been set up for this hands-on lab within the Topology Manager of ODI. The next steps walk you through how to review this information.
1. Start ODI Studio: on the toolbar, single-click (no double-click!) the ODI Studio icon.
2. Go to the Topology Navigator and press Connect to Repository.
3. In the ODI Login dialog, press OK.
4. Within the Physical Architecture accordion on the left, expand the Technologies folder.
Note: For this HOL the setting Hide Unused Technologies has been set, hiding all technologies without a configured data server.

5. For this HOL, connectivity information has already been set up for Hive, MySQL, and Oracle sources and targets. Expand these technologies to see the configured data servers.
Info: A technology is a type of data source that can be used by ODI as a source, target, or other connection. A data server is an individual server of a given technology, for example a database server, and can have multiple schemas. ODI uses a concept of logical and physical schemas to allow execution of the same mapping in different environments, for example development, QA, and production.
6. Double-click on the Hive data server to review its settings.

7. Click on the JDBC tab on the left to view the Hive connection information.
8. Switch to the Designer navigator and open the Models accordion. Expand all models.
Info: A model is a set of metadata definitions for a source such as a database schema or a set of files. A model can contain multiple datastores, which follow the relational concept of columns and rows and can be database tables, structured files, or XML elements within an XML document.

Task 2: Load Hive Tables using Sqoop

In this task we use Apache Sqoop to load data from an external DB into Hive tables. Sqoop starts parallel MapReduce processes in Hadoop to load chunks of the DB data with high performance. ODI can generate Sqoop code transparently from a mapping when the correct knowledge module is selected.
1. The first mapping to be created will load the MySQL MOVIE table into the Hive movie table. To create a new mapping, open the Project accordion within the Designer navigator.

2. Expand the Big Data HOL > First Folder folder.
3. Right-click on Mappings and click New Mapping.
Info: A mapping is a data flow to move and transform data from sources into targets. It contains declarative and graphical rules about data joining and transformation.
4. In the New Mapping dialog, change the name to A - Sqoop Movie Load and press OK.

5. For this mapping we will load the table MOVIE from the model MySQL into the table movie within the model HiveMovie. To view the models, open the Models accordion.
6. Drag the datastore MOVIE from model MySQL as a source and the datastore movie from model HiveMovie as a target onto the mapping diagram panel.

7. Drag from the output port of the source MOVIE to the input port of the target movie.
8. Click OK in the Attribute Matching dialog. ODI will map all same-name fields from source to target.
9. The logical flow has now been set up. To set the physical implementation, click on the Physical tab of the editor.

10. The Physical tab shows the actual systems involved in the transformation, in this case the MySQL source and the Hive target. In the Physical tab users can choose the Load Knowledge Module (LKM) that controls data movement between systems, as well as the Integration Knowledge Module (IKM) that controls the transformation of data. Select the access point MOVIE_AP to select an LKM.
Note: The KMs that will be used have already been imported into the project.
Info: A knowledge module (KM) is a template that represents best practices to perform an action in an interface, such as loading from/to a certain technology (Load Knowledge Module, LKM), integrating data into the target (Integration Knowledge Module, IKM), or checking data constraints (Check Knowledge Module, CKM). Knowledge modules can be customized by the user.
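To make the "template" idea concrete: a KM step mixes static SQL with ODI substitution methods that are resolved at run time from the mapping metadata. The following is only a simplified, hypothetical sketch in the style of ODI's substitution API, not the literal content of any KM used in this lab:

```
-- Sketch of an IKM "insert" step (hypothetical, abridged)
insert into <%=odiRef.getTable("L", "TARG_NAME", "A")%>
(
  <%=odiRef.getColList("", "[COL_NAME]", ",\n  ", "", "")%>
)
select
  <%=odiRef.getColList("", "[EXPRESSION]", ",\n  ", "", "")%>
from <%=odiRef.getFrom()%>
where (1=1)
<%=odiRef.getJoin()%>
<%=odiRef.getFilter()%>
```

Because the table names, column lists, joins, and filters are all injected from the mapping, the same KM can serve any mapping that targets its technology.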

11. Go to the Properties editor underneath the mapping editor. There is a section Loading Knowledge Module; you might have to scroll down to see it. Open this section and pick LKM SQL Multi-Connect.GLOBAL. This LKM lets the IKM perform the loading activities.
Note: If the Properties editor is not visible in the UI, go to the menu Window > Properties to open it. Depending on the available size of the Properties editor, the sections within it (such as General) might be shown as titles or as tabs on the left.
12. Select the target datastore MOVIE.

13. In the Properties editor, open the section Integration Knowledge Module and pick IKM SQL to Hive-HBase-File (SQOOP).GLOBAL.
Note: If this IKM is not visible in the list, make sure that you performed the previous tutorial step and chose the LKM SQL Multi-Connect.
14. Review the list of IKM options for this KM. These options are used to configure and tune the Sqoop process that loads the data. Change the option TRUNCATE to true.

15. The mapping is now complete. Press the Run button on the toolbar above the mapping editor. When asked to save your changes, press Yes.
16. Click OK in the Run dialog. We will use all defaults and run this mapping on the local agent that is embedded in the ODI Studio UI. After a moment a Session started dialog will appear; press OK there as well.
17. To review the execution, go to the Operator navigator and expand the All Executions node to see the current execution. If the execution has not finished, it will show the icon for an ongoing task. Using the toolbar buttons you can refresh the view once or have it refresh automatically every 5 seconds.

18. Once the load is complete, a warning icon will be displayed. A warning icon is OK for this run and still means the load was successful. You can expand the execution tree to see the individual tasks of the execution.
19. Go to the Designer navigator and the Models accordion, and right-click HiveMovie.movie. Select View Data from the menu to see the loaded rows.

20. A data editor appears with all rows of the movie table in Hive.
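For background on what the generated Sqoop job does under the hood: Sqoop parallelizes the extract by splitting the source table on a key column and running one mapper per contiguous range. The sketch below mimics that splitting logic in Python; the key range and mapper count are made-up illustration values, not taken from the lab environment:

```python
def split_ranges(min_id, max_id, num_mappers):
    """Mimic Sqoop's split-by behavior: divide the key interval
    [min_id, max_id] into num_mappers roughly equal, contiguous ranges.
    Each returned (lo, hi) pair is inclusive on both ends."""
    span = max_id - min_id + 1
    base, extra = divmod(span, num_mappers)
    ranges, lo = [], min_id
    for i in range(num_mappers):
        # the first `extra` mappers get one extra row each
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi - 1))  # mapper i runs WHERE key BETWEEN lo AND hi-1
        lo = hi
    return ranges

# e.g. 4 mappers over MOVIE_ID values 1..1000
print(split_ranges(1, 1000, 4))  # [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

This is why the IKM exposes tuning options: the mapper count and split column directly determine how evenly the load is spread across the Hadoop cluster.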

Task 3: Transforming Data within Hive

In this task we will design a transformation in an ODI mapping that will be executed in Hive. Note that with ODI you can create logical mappings declaratively, without considering any implementation details; those can be added later in the physical design.

For this mapping we will use the two Hive tables movie and movieapp_log_avro as sources and the Hive table movie_rating as the target.
1. To create a new mapping, open the Project accordion within the Designer navigator, expand the Big Data HOL > First Folder folder, right-click on Mappings, and click New Mapping.

2. In the New Mapping dialog, change the name to B - Hive Calc Ratings and press OK.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastores movie and movieapp_log_avro as sources and movie_rating as the target into the new mapping.
4. First we would like to filter the movie activities to include only rating activities (activity ID 1). To do this, drag a Filter from the Component Palette behind the movieapp_log_avro source.

5. Drag the attribute activity from movieapp_log_avro onto the FILTER component. This will connect the components and use the attribute activity in the filter condition.
6. Select the FILTER component and go to the Properties editor. Expand the section Condition and complete the condition to movieapp_log_avro.activity = 1.

7. We now want to aggregate all activities by the movie watched and calculate an average rating. Drag an Aggregate component from the palette onto the mapping.
8. Drag and drop the attributes movieid and rating from movieapp_log_avro directly onto AGGREGATE in order to map them. They are automatically routed through the filter.

9. Select the attribute AGGREGATE.rating and go to the Properties editor. Expand the section Target and complete the expression to AVG(movieapp_log_avro.rating).
Note: The Expression Editor (the icon to the right of the Expression field) can be used to edit expressions and provides lists of the available functions.
10. Now we would like to join the aggregated ratings with the movie table to obtain enriched movie information. Drag a Join component from the Component Palette onto the mapping.

11. Drop the attributes movie.movie_id and AGGREGATE.movieid onto the JOIN component. These two attributes will be used to create an equijoin condition.
Note: The join condition can also be changed in the Properties editor.
12. Highlight the JOIN component and go to the Properties editor. Expand the Condition section and check the property Generate ANSI Syntax.

13. Drag from the output port of JOIN to the input port of the target movie_rating.
14. Click OK in the Attribute Matching dialog. ODI will map all same-name fields from source to target.
15. Drag the remaining unmapped attribute AGGREGATE.rating over to movie_rating.avg_rating.

16. The logical flow has now been set up. Compare the diagram below with your actual mapping to spot any differences. To set the physical implementation, click on the Physical tab of the editor.
17. The Physical tab shows that in this mapping everything is performed in the same system, the Hive server. Because of this, no LKM is necessary. Select the target MOVIE_RATING to select an IKM.

18. Go to the Properties editor and expand the section Integration Knowledge Module. The correct IKM Hive Control Append.GLOBAL has already been selected by default; no change is necessary. In the IKM options, change TRUNCATE to true and leave all other options at their defaults.
19. The mapping is now complete. Press the Run button on the toolbar above the mapping editor. When asked to save your changes, press Yes.
20. Click OK in the Run dialog. After a moment a Session started dialog will appear; press OK there as well.
21. To review the execution, go to the Operator navigator and expand the All Executions node to see the current execution.

22. Once the load is complete, expand the execution tree to see the individual tasks of the execution. Double-click on the task 50 Insert (new) rows to see details of the execution.
23. In the Session Task editor that opens, click on the Code tab on the left. The generated SQL code will be shown. The code is generated from the mapping logic and contains a WHERE condition, a JOIN, and a GROUP BY clause that relate directly to the mapping components.
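In essence, the generated statement is a filter-aggregate-join insert. The sketch below is not the literal HiveQL that ODI generates (schemas are simplified and the sample rows are invented), but it has the same WHERE / GROUP BY / JOIN shape, demonstrated here against an in-memory SQLite database so it can be run anywhere:

```python
import sqlite3

# Tiny stand-ins for the Hive tables used in the mapping (schemas simplified).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE movie (movie_id INTEGER, title TEXT);
CREATE TABLE movieapp_log_avro (movieid INTEGER, activity INTEGER, rating REAL);
CREATE TABLE movie_rating (movie_id INTEGER, title TEXT, avg_rating REAL);
INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
INSERT INTO movieapp_log_avro VALUES
  (1, 1, 4.0), (1, 1, 5.0),  -- two rating activities for movie 1
  (1, 2, 3.0),               -- activity 2 is not a rating; the filter drops it
  (2, 1, 2.0);
""")

# WHERE (filter), GROUP BY (aggregate), JOIN: the same shape as the generated code.
con.execute("""
INSERT INTO movie_rating (movie_id, title, avg_rating)
SELECT m.movie_id, m.title, agg.avg_rating
FROM movie m
JOIN (SELECT movieid, AVG(rating) AS avg_rating
      FROM movieapp_log_avro
      WHERE activity = 1
      GROUP BY movieid) agg
  ON m.movie_id = agg.movieid
""")
print(con.execute("SELECT * FROM movie_rating ORDER BY movie_id").fetchall())
# [(1, 'Movie A', 4.5), (2, 'Movie B', 2.0)]
```

Note how the non-rating activity row never reaches the aggregate, exactly as the FILTER component guarantees in the mapping.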

24. Go to the Designer navigator and the Models accordion and right-click HiveMovie.movie_rating. Select View Data from the menu to see the loaded rows.
25. A data view editor appears with all rows of the movie_rating table in Hive.

Task 4: Load Oracle from Hive Tables using Oracle Loader for Hadoop

In this task we load the results of the prior Hive transformation from the resulting Hive table into the Oracle DB data warehouse. We are using the Oracle Loader for Hadoop (OLH) data loader, which uses load mechanisms specifically optimized for Oracle DB.
1. To create a new mapping, open the Project accordion within the Designer navigator, expand the Big Data HOL > First Folder folder, right-click on Mappings, and click New Mapping.

2. In the New Mapping dialog, change the name to C - OLH Load Oracle and press OK.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastore movie_rating as a source into the new mapping. Then open the model OracleMovie and drag in the datastore MOVIE_RATING_ODI as the target.
4. Drag from the output port of the source movie_rating to the input port of the target MOVIE_RATING_ODI.

5. Click OK in the Attribute Matching dialog. ODI will map all same-name fields from source to target.
6. The logical flow has now been set up. To set the physical implementation, click on the Physical tab of the editor.

7. Select the access point MOVIE_RATING_AP (only MOVIE_RA is visible) to select an LKM. Go to the Properties editor and choose LKM SQL Multi-Connect.GLOBAL, because the IKM will perform the load.
8. Select the target datastore MOVIE_RATING_ODI, then go to the Properties editor to select an IKM. Choose IKM File-Hive to Oracle (OLH-OSCH).GLOBAL.
Note: If this IKM is not visible in the list, make sure that you performed the previous tutorial step and chose the LKM SQL Multi-Connect.

9. Review the list of IKM options for this KM. These options are used to configure and tune the OLH or OSCH process that loads the data. We will use the default setting of OLH through JDBC. Change the option TRUNCATE to true.
10. The mapping is now complete. Press the Run button on the toolbar above the mapping editor. When asked to save your changes, press Yes.
11. Click OK in the Run dialog. We will use all defaults and run this mapping on the local agent that is embedded in the ODI Studio UI. After a moment a Session started dialog will appear; press OK there as well.

12. To review the execution, go to the Operator navigator and expand the All Executions node to see the current execution. Wait until the execution is finished, refreshing the view to check.
13. Go to the Designer navigator and the Models accordion and right-click OracleMovie.MOVIE_RATING_ODI. Select View Data from the menu to see the loaded rows.

14. A data view editor appears with all rows of the table MOVIE_RATING_ODI in Oracle.

Task 5: Creating a New ODI Package to Execute an End-to-End Load

Now that the mappings have been created, we can create a package within ODI that will execute all of them in order.
1. To create a new package, open the Designer navigator and the Project accordion on Big Data HOL > First Folder, then right-click on Packages and select New Package.
Info: A package is a task flow to orchestrate the execution of multiple mappings and to define additional logic, such as conditional execution, and actions such as sending emails, calling web services, uploads/downloads, file manipulation, and event handling.

2. Name the package Big Data Load and press OK.
3. Click the Diagram tab. Drag and drop the mappings from the left onto the diagram panel, starting with A - Sqoop Movie Load. Notice the green arrow on this mapping, which marks it as the first step.
4. Drag the mappings B - Hive Calc Ratings and C - OLH Load Oracle onto the panel.
5. Click the OK arrow toolbar button to select the order of precedence.

6. Drag and drop from A - Sqoop Movie Load to B - Hive Calc Ratings to set the link. Then drag and drop from B - Hive Calc Ratings to C - OLH Load Oracle.
Note: If you need to rearrange steps, switch back to the select mode.
7. The package is now set up and can be executed. To execute it, click the Execute button in the toolbar. When prompted to save, click Yes.
8. Click OK in the Run dialog. After a moment a Session started dialog will appear; press OK there as well.
9. To review the execution, go to the Operator navigator and open the latest session execution. The 3 steps are shown separately and contain the same tasks as the mapping executions in the prior tutorials.

Task 6: Replicating New Records to Hive using Oracle GoldenGate

Oracle GoldenGate captures completed transactions from a source database and replicates these changes to a target system. In this tutorial we will replicate inserts into the MOVIE table in MySQL to the corresponding movie table in Hive. Oracle GoldenGate provides this capability through the GoldenGate Adapters, with implemented examples for Hive, HDFS, and HBase.

The GoldenGate processes in detail: the capture extract EMOV (configured in EMOV.prm) reads inserts on the MySQL table MOVIE and writes them to the trail file TM; the pump extract PMOV (PMOV.prm) reads the trail and hands the changes to a Java adapter (PMOV.properties, myhivehandler.jar), which appends them to the HDFS file ogg_movie backing the Hive table movie.
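For orientation, a capture parameter file of the kind EMOV.prm represents typically looks like the sketch below. This is an illustrative fragment only; the names and credentials mirror the lab narrative, and the actual files under /u01/ogg/dirprm may differ:

```
-- EMOV.prm (sketch): capture changes on MOVIE into a local trail
EXTRACT EMOV
SOURCEDB odidemo, USERID root, PASSWORD welcome1
EXTTRAIL ./dirdat/tm
TABLE odidemo.MOVIE;
```

The pump's PMOV.prm is structured similarly but reads the trail and routes records to the Java adapter instead of a remote database.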

1. Start a terminal window from the menu bar by single-clicking the Terminal icon.
2. In the terminal window, execute the commands:
cd /u01/ogg
ggsci
3. Start the GoldenGate manager process by executing: start mgr
4. Add and start the GoldenGate extract processes by executing: obey dirprm/bigdata.oby
Note: Ignore any errors shown for the stop and delete commands at the beginning of the script.
5. See the status of the newly added processes by executing: info all

6. Start a second terminal window from the menu bar and enter the command:
mysql --user=root --password=welcome1 odidemo
7. Insert a new row into the MySQL table MOVIE by executing the following command:
insert into MOVIE (MOVIE_ID,TITLE,YEAR,BUDGET,GROSS,PLOT_SUMMARY) values (1, 'Sharknado 2', 2014, 500000, 20000000, 'Flying sharks attack city');
Note: Alternatively, you can execute the following command:
source ~/movie/moviework/ogg/mysql_insert_movie.sql;
8. Go to ODI Studio and open the Designer navigator and the Models accordion. Right-click on the datastore HiveMovie.movie and select View Data.

9. In the View Data window, choose the Move to last row toolbar button. The inserted row with movie_id 1 should be the last row. You might have to scroll all the way down to see it; refresh the screen if you don't see the entry.

Summary

You have now successfully completed the hands-on lab and performed an end-to-end load through a Hadoop Data Reservoir using Oracle Data Integrator and Oracle GoldenGate. The strength of these products is that they provide an easy-to-use approach to developing performant data integration flows that use the strengths of the underlying environments without adding proprietary transformation engines. This is especially relevant in the age of Big Data.