Implementing the Logical Data Warehouse with Oracle Big Data SQL

Matthias Fuchs, DWH Architect, ISE Information Systems Engineering GmbH

ISE Information Systems Engineering
- Founded in 1991
- Employees: 60
- Headquarters in Gräfenberg, branch offices in Munich and Nuremberg
- Focus areas:
  - Oracle Engineered Systems (Exadata / Exalogic / Exalytics)
  - Data Warehousing & Business Intelligence
  - Oracle DB migrations, optimization, high availability
  - Managed services for databases, BI and middleware applications
- Oracle Partner Engineered Systems Award 2013
Copyright (C) ISE GmbH - All Rights Reserved

ISE Oracle Technology Center
- First and only Exastack Technology Center in Germany, located in Nuremberg
- Coming soon: ODA X5

Agenda
- LDW - Logical Data Warehouse
- Big Data SQL infrastructure
- Sqoop - the beginning
- Customer case

LDW - Logical Data Warehouse

Logical Data Warehouse
According to Gartner's Hype Cycle for Information Infrastructure, 2012, the Logical Data Warehouse (LDW) is a new data management architecture for analytics that combines the strengths of traditional repository warehouses with alternative data management and access strategies. The LDW will form a new best practice by the end of 2015.

Gartner: Logical Data Warehouse
- Repository Management: various types, including metadata consolidation
- Data Virtualization: virtual data layer
- Distributed Processes: calls to external processes, e.g. image or content analysis, but also MapReduce and cloud
- Auditing statistics and performance evaluation: statistics on the performance of end users, applications or connections
- SLA Management: metadata set on expected execution times etc.; monitoring and, where necessary, changing the execution
- Taxonomy - Ontology resolution: a taxonomy tree in an ontological forest
- Metadata Management

Gartner: Logical Data Warehouse (continued)
- Faster data-to-insight cycle
- Higher flexibility
- Cost-effective framework for incorporating new content

Gartner: Overview
Source: Gartner newsletter "Logical Data Warehousing for Big Data"

Information Management Reference Architecture
Oracle Data Reservoir & Enterprise Information Store - complete view
- Data Sources
  - Structured data sources: operational data, COTS data, master & reference data sources, streaming & BAM
  - Poly-structured sources: content, docs, SMS, web & social media
- Data Ingestion
- Data Engines
  - Raw Data Reservoir: immutable raw data reservoir; raw data at rest is not interpreted
  - Foundation Data Layer: immutable modelled data in business-process-neutral form, abstracted from business process changes
  - Access & Performance Layer: past, current and future interpretation of enterprise data, structured to support agile access & navigation
- Information Interpretation
  - Virtualisation & Query Federation
- Information Services: Enterprise Performance Management, pre-built & ad-hoc BI assets, Data Science
- Discovery Lab Sandboxes: project-based data stores to support specific discovery objectives
- Rapid Development Sandboxes: project-based data stores to facilitate rapid content / presentation delivery
- LDW components marked on the diagram: auditing statistics / performance evaluation, SLA Management
http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/2297765.pdf

Big Data SQL Infrastructure

Big Data SQL - Overview
- Cloudera Hadoop
- NoSQL
- R, Advanced Analytics
- Oracle Big Data SQL
- Connectors
- ODI
- Exadata: Advanced Analytics, Advanced Security
- Or: Big Data Lite VM

Big Data system overview
- Processing layer: Big Data SQL
- Resource management: YARN + MapReduce
- Storage layer: file system (HDFS)

Big Data and DB in the LDW
- Repository Management
- Oracle Big Data Appliance
- Data Virtualization
- Distributed Processes
- Auditing statistics and performance
- SLA Management
- ODI, BPM, SOA
- Enterprise Metadata Management
- Taxonomy - Ontology resolution

Sqoop - the beginning

Sqoop
- Sqoop = SQL-to-Hadoop
- Parallel copying between JDBC sources and HDFS, in both directions
- MapReduce jobs load and write the data
(Diagram: DB <-> MapReduce <-> HDFS)
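The parallel copy can be pictured as follows. For a generic JDBC import, Sqoop determines the minimum and maximum of a split column and hands each mapper one slice of that range. The sketch below is a simplified model of this idea, not Sqoop's own code; the function name `split_ranges` and the column name `id` are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Split the inclusive key range [lo, hi] into contiguous slices, one per mapper."""
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` mappers take one additional key so that all keys are covered.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than keys: skip empty slices
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    # Each slice becomes the WHERE clause of one mapper's query (hypothetical column `id`).
    for lo, hi in split_ranges(1, 1000, 4):
        print(f"WHERE id BETWEEN {lo} AND {hi}")
```

Without a usable split column (for example, no primary key), there is nothing to slice on, and the import falls back to a single mapper.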

Sqoop with Oracle
- OraOop, from Guy Harrison's team at Quest (Dell)
- Available as of version 1.4.5 (CDH 5.1)
- Oracle direct path (non-buffered) I/O for all reads
- A number of blocks is distributed to each mapper
- For partitioned tables, a mapper can work per partition
(Diagram: Oracle table -> Oracle sessions -> Hadoop mappers -> HDFS)
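As a rough illustration, a direct-mode import could be assembled like this. The sketch only builds the argument list for the `sqoop import` command; the JDBC URL, user, password file, table and target directory are placeholder values and not taken from the talk. In Sqoop 1.4.5 and later, `--direct` is the switch that enables the Oracle direct connector.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir,
                      num_mappers=4, direct=True):
    """Build the argument list for a `sqoop import` call (nothing is executed here)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if direct:
        # With an Oracle JDBC URL, --direct selects the Oracle direct connector.
        args.append("--direct")
    return args


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    args = sqoop_import_args(
        jdbc_url="jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL",
        username="SCOTT",
        password_file="/user/scott/.oracle.pwd",
        table="OEM_DATA",
        target_dir="/data/oem_data",
        num_mappers=8,
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with the JDBC URL.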

Real Time
- Oracle Change Data Capture
  - Supported in 11.2 but not recommended by Oracle
  - Desupported in 12.1
- Oracle GoldenGate
  1. RDBMS to Hive
  2. RDBMS to Flume
  3. RDBMS to HDFS
- Other vendors:
  - (Dell) Quest SharePlex: reads the redo logs
  - (VMware) Continuent Tungsten: uses CDC in the background
  - Libelle

Customer case

Analysis of infrastructure data
Goal
- Evaluate data from service calls (OSB)
- Keep a history of the data
- Detect anomalies
- Map structured and unstructured data
- Import from tables/views and from files
- Analysis with selected tools
(Diagram: weblogs -> Flume -> HDFS; CC RDBMS -> Sqoop -> HDFS; YARN/MR on top of HDFS; analytic output via R and Elasticsearch)

Preparation
- Choice of Hadoop distribution
  - Cloudera: supported by Oracle
  - Without a distribution: very laborious
- File data
  - Flume
  - WebLogic and Apache logs
  - Well documented on the web
  - Optionally real-time analysis with Elasticsearch or Solr
- Hive
  - CDH 5.1
  - ORCFile format

Hive ORCFile
- Optimized Row Columnar file format
- Lightweight indexes built into the file format
- Block-mode compression based on the data type
Size comparison across formats (TPC-DS Scale 500 dataset, sizes in GB; source: Hortonworks):
  Encoded text (CSV file): 585
  RCFile (Record Columnar File): 505
  Parquet (columnar storage format, Impala): 221
  ORCFile (Hive): 131
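To put the size comparison into proportion, the figures can be expressed relative to the text baseline. This small sketch assumes the four numbers on the slide map, in order, to encoded text, RCFile, Parquet and ORCFile; the names `SIZES_GB` and `relative_sizes` are illustrative.

```python
# Sizes in GB for the TPC-DS Scale 500 dataset, as quoted on the slide
# (the assignment of numbers to formats is an assumption based on their order).
SIZES_GB = {
    "Encoded text": 585,
    "RCFile": 505,
    "Parquet": 221,
    "ORCFile": 131,
}


def relative_sizes(sizes, baseline="Encoded text"):
    """Return each format's size as a percentage of the baseline format's size."""
    base = sizes[baseline]
    return {name: round(100.0 * size / base, 1) for name, size in sizes.items()}


if __name__ == "__main__":
    for name, pct in relative_sizes(SIZES_GB).items():
        print(f"{name:<13} {pct:5.1f} % of text size")
```

Under that reading, ORCFile needs a bit less than a quarter of the space of the encoded text files.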

Data integration workflow
- Part 1: Load the data (DB -> HDFS -> Hive)
- Part 2: Create the Big Data SQL layer (Oracle Big Data SQL)

Process part 1 (DB -> HDFS -> Hive)
1. Start the Sqoop job from the DB to HDFS
2. Create an external table on the HDFS files
3. INSERT AS SELECT into the Hive ORC data table
Notes
- Import with parallelism 1, because the data comes from a view
- No primary key, therefore no parallel MapReduce processes
- Direct read is required, otherwise the temp tablespace is too small
- Started with Sqoop2, finished with Sqoop1, including optimization
- ODI instead of Oozie

Process part 2
Look up the Hive table from the DB:

    SELECT table_name, input_format, location
    FROM   all_hive_tables
    WHERE  table_name LIKE '%oem%';

Process part 2
Create the table in the DB (test VM only)
1. Generate the DDL with CREATE_EXTDDL_FOR_HIVE
2. Execute the DDL

Generate the DDL:

    DECLARE
      DDLout VARCHAR2(4000);  -- receives the generated DDL text
    BEGIN
      dbms_hadoop.create_extddl_for_hive(
        CLUSTER_ID      => 'bigdatalite',
        DB_NAME         => 'default',
        HIVE_TABLE_NAME => 'oem_data',
        HIVE_PARTITION  => FALSE,
        TABLE_NAME      => 'oem_data',
        PERFORM_DDL     => FALSE,   -- only generate the DDL, do not execute it
        TEXT_OF_DDL     => DDLout
      );
      dbms_output.put_line(DDLout);
    END;
    /

Execute the DDL:

    CREATE TABLE OEM_DATA (
      target_name          VARCHAR2(4000),
      target_guid ..                         -- further columns elided on the slide
      key_value6           VARCHAR2(4000),
      collection_timestamp VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_HIVE
      DEFAULT DIRECTORY DEFAULT_DIR
      ACCESS PARAMETERS (
        com.oracle.bigdata.cluster=bigdatalite
        com.oracle.bigdata.tablename=default.oem_data
      )
    );

Execution plan

Results: loading the data
- Data for one day: ~239,634,928 rows / 12 columns
- TXT files: ~100 G uncompressed; load time approx. 1 h from the CC DB
- ORC files in Hive: ~27 M compressed; load time approx. 30 minutes

Table | Type | Size | Rows | SELECT COUNT | WHERE
oem_data | Big Data SQL | 2.8 MB | 2.1 million | 11 s | 8 s
oem_data (copied locally) | Oracle | 558 MB | 2.1 million | 0.5 s | 0.5 s
oem_data | Hive | | | 57 s | 50 s
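The figures above translate into throughput and relative query times as follows. This is plain arithmetic on the slide's numbers, assuming the timings read as 11 s / 8 s for Big Data SQL, 0.5 s / 0.5 s for the local Oracle copy and 57 s / 50 s for Hive; all names in the sketch are illustrative.

```python
ROWS_PER_DAY = 239_634_928   # rows loaded for one day
LOAD_SECONDS = 60 * 60       # approx. 1 h load time from the CC DB

# Query times in seconds, as read from the results table (assumed column order).
QUERY_SECONDS = {
    "Big Data SQL": {"count": 11.0, "where": 8.0},
    "Oracle (local copy)": {"count": 0.5, "where": 0.5},
    "Hive": {"count": 57.0, "where": 50.0},
}


def rows_per_second(rows, seconds):
    """Average load throughput."""
    return rows / seconds


def time_ratio(baseline, other, query):
    """How many times longer `other` takes than `baseline` for the given query."""
    return round(QUERY_SECONDS[other][query] / QUERY_SECONDS[baseline][query], 2)


if __name__ == "__main__":
    print(f"load throughput: {rows_per_second(ROWS_PER_DAY, LOAD_SECONDS):,.0f} rows/s")
    for query in ("count", "where"):
        print(f"{query}: Hive takes {time_ratio('Big Data SQL', 'Hive', query)}x "
              f"the time of Big Data SQL")
```

On these numbers, Big Data SQL answers roughly five to six times faster than Hive on the same data, while the locally copied Oracle table remains faster still.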

Load distribution with Big Data SQL
- Only data retrieval (TABLE ACCESS FULL and filters) is offloaded!
- Data processing happens in the DB layer: GROUP BY, ORDER BY, JOIN, PL/SQL etc.
- Big Data SQL 2.0 (aggregation in Hadoop?)
- Alternative: connect via ODBC

Tool | Description | Decompress CPU | Filtering CPU | Datatype conversion
Sqoop | | Hadoop | Oracle | Oracle
Oracle SQL Connector for HDFS | Text files on HDFS or Data Pump files on HDFS | Oracle | Oracle |
Big Data SQL | 12c, Exadata & BDA | Hadoop | Hadoop | Hadoop
ODBC | | Hadoop | Hadoop | Oracle
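The division of labour can be modelled in a few lines. The sketch below is a toy model and not Big Data SQL code: one function stands in for the work offloaded to the Hadoop nodes (full scan, filter, column projection), the other for the aggregation that stays in the database layer. The sample rows and the column names `metric` and `value` are made up for illustration.

```python
from collections import Counter


def hadoop_side_scan(rows, predicate, columns):
    """Offloaded part in this model: full scan, row filter, column projection."""
    for row in rows:
        if predicate(row):
            yield {c: row[c] for c in columns}


def db_side_group_count(rows, key):
    """Database-layer part in this model: GROUP BY key with COUNT(*)."""
    return dict(Counter(row[key] for row in rows))


if __name__ == "__main__":
    # Hypothetical sample data.
    table = [
        {"target_name": "db01", "metric": "cpu", "value": 91},
        {"target_name": "db01", "metric": "cpu", "value": 40},
        {"target_name": "db02", "metric": "cpu", "value": 95},
        {"target_name": "db02", "metric": "io", "value": 99},
    ]
    shipped = list(hadoop_side_scan(
        table,
        predicate=lambda r: r["metric"] == "cpu" and r["value"] > 90,
        columns=["target_name"],
    ))
    print(len(shipped), "of", len(table), "rows shipped to the database")
    print(db_side_group_count(shipped, "target_name"))
```

The point of the model is the data volume: only filtered and projected rows cross the boundary, while grouping, sorting and joins still consume database CPU.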

Summary
- Before: Exadata DB/EMC
- After: Hadoop integration layer plus Exadata DB/EMC

Q & A