Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich





Agenda
- Introduction
- Old Times
- Exadata
- Big Data
- Oracle In-Memory
- Headquarters
- Conclusions

sumit AG
- Consulting and implementation services in Switzerland
- Experts for Data Warehousing, Business Intelligence, and Big Data solutions
- Focussed on Oracle technology
  - BI Foundation specialized partner
  - Data Warehousing specialized partner
- Our motto: Get Value From Data
- Visit our web site: www.sumit.ch (in German)
2013 sumit AG

Holger Friedrich
- Computer Science diploma from Karlsruhe Institute of Technology (KIT)
- Ph.D. in Robotics and Machine Learning
- More than 16 years of experience with Oracle technology
- Expert for Data Integration, Data Warehousing, Data Mining, and Business Intelligence
- Technical Director of sumit AG
- First Oracle ACE for DWH/BI in Switzerland


DB Architecture - Old Times
Old times = 1977-2008
- SGA - System Global Area
  - Shared Pools (Library Cache etc.)
  - Redo Log Buffer
  - Buffer Cache
- Persistent Storage
  - Disk & Tape
  - serve database blocks
- PGA - Program Global Area
  - Query-specific processing and storage
- Query processing done in PGA by query-specific server processes

Query Processing - Old Times
[Figure labels: Server Process, Block Buffer, Disk]


2008 - Times Are a-Changing

Exadata - Architecture
- Databases and applications deployed and configured without any adaptations
- Fast network via InfiniBand
- Regular compute servers
- Dedicated storage servers
  - organised in cells
  - disks & flash attached
  - run Exadata Storage Software

Exadata - The Secret Sauce
Three reasons for outstanding Exadata performance
- Hardware engineering
- Local query processing functionality in storage layer
- Database engine aware of intelligent storage layer
  - extended optimizer costing model and transformations
  - extended SW to use Exadata Storage APIs
Divide and conquer for query processing
- not just with slave processes (PARALLEL)
- not just between compute nodes (RAC)
- but between compute and storage nodes

Exadata - Storage Software Evolution
- Smart Scanning
  - execute sub-query in storage cells
  - project results in storage already
- Keep hot data in Flash Cache
- Storage Indexes
  - collect min/max column values
  - reduce disk access
- Smart scanning directly on HCC data
  - no decompression required
- Offload mining tasks like scoring
- Additional data caching in columnar format in Flash Cache


Information Mgmt Reference Architecture - Big Data

The HADOOP Zoo

Information Management Data Flow

Big Data - Challenges
- Dynamic ecosphere
  - Pre-packaged distributions
  - Oracle Big Data Appliance
- Analytics
  - Tools of Hadoop ecosphere
  - Oracle Big Data Analytics
- Data Integration
  - Ever changing Hadoop tool set
  - Oracle Data Integrator
  - Big Data SQL

Big Data Appliance - The Secret Sauce
Three reasons for outstanding BDA performance
- Hardware engineering
- Local query processing functionality in storage layer
  - Big Data SQL = Exadata Storage Software on HADOOP
  - Added as process engine to the HADOOP process layer
  - BDS agents run independently on HADOOP nodes
- Database engine aware of intelligent big data layer
  - extended and enhanced External Table API
  - extended optimizer costing model and transformations
Exadata success and performance on Big Data
Big Data transparently available for DB queries

Big Data SQL - Smart Scan
1. Read data from HDFS data node
   - Direct-path reads
   - C-based readers when possible
   - native HADOOP classes otherwise
2. Translate bytes to Oracle format
3. Smart scan on Oracle format
   - apply storage indexes (BDS 2.0)
   - filtering
   - column projection
   - parsing JSON/XML
   - scoring models
- High compression benefits (except cols with distinct values)
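The filtering and column projection of step 3 can be pictured with a minimal Python sketch. It is not Oracle's implementation: `smart_scan`, the row layout, and the sample values are hypothetical, and the rows are assumed to be already read and translated (steps 1 and 2).

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]


def smart_scan(
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    columns: list[str],
) -> Iterator[Row]:
    """Filter and project rows where the data lives (hypothetical helper).

    Only rows that pass the predicate, reduced to the requested columns,
    leave the storage node.
    """
    for row in rows:
        if predicate(row):                       # filtering
            yield {c: row[c] for c in columns}   # column projection


# Hypothetical rows held by one storage node.
node_rows = [
    {"cust_num": "C1", "order_num": "O1", "item_cnt": 3, "order_total": 120.0},
    {"cust_num": "C2", "order_num": "O2", "item_cnt": 1, "order_total": 15.5},
    {"cust_num": "C1", "order_num": "O3", "item_cnt": 7, "order_total": 410.0},
]

result = list(
    smart_scan(node_rows, lambda r: r["order_total"] > 100, ["cust_num", "order_total"])
)
print(result)
```

In this sketch two of three rows and two of four columns travel back to the database, which is the point of pushing the scan down to the node.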

Big Data SQL 2.0 - Storage Indexes
- New feature of Big Data SQL 2.0
- Avoid unnecessary disk access on HADOOP nodes
- Index built during first full scan
- Granularity in HDFS blocks (256MB)
- Index application
  - receive filter predicate
  - check storage index for blocks where the predicate value lies between min and max
  - only smart scan matching blocks
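The index application steps above can be sketched as a small min/max lookup in Python. This is an illustration under assumptions, not the Big Data SQL code: `BlockStats`, `build_storage_index`, and `blocks_to_scan` are hypothetical names, the blocks hold a handful of integers instead of 256MB of HDFS data, and only an equality predicate is handled.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    block_id: int
    min_value: int
    max_value: int


def build_storage_index(blocks: dict[int, list[int]]) -> list[BlockStats]:
    """Record min/max of one column per block, as a first full scan would."""
    return [
        BlockStats(block_id, min(values), max(values))
        for block_id, values in blocks.items()
        if values  # an empty block has nothing to index or scan
    ]


def blocks_to_scan(index: list[BlockStats], value: int) -> list[int]:
    """Return the blocks whose [min, max] range could contain `value`."""
    return [s.block_id for s in index if s.min_value <= value <= s.max_value]


# Hypothetical column values per block.
blocks = {0: [3, 8, 5], 1: [20, 25, 22], 2: [7, 30, 12]}
index = build_storage_index(blocks)
candidates = blocks_to_scan(index, 21)
print(candidates)  # block 0 is skipped, blocks 1 and 2 must still be scanned
```

Block 2 shows the limit of the technique: its range covers 21 although the value is absent, so the index can only rule blocks out, never confirm a match.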

Big Data SQL - Query Execution

Extended External Tables - HIVE

    -- Note: ORDER is a reserved word in Oracle; a real table needs another name.
    CREATE TABLE order (
      cust_num     VARCHAR2(10),
      order_num    VARCHAR2(20),
      order_date   DATE,
      item_cnt     NUMBER,
      description  VARCHAR2(100),
      order_total  NUMBER(8,2))
    ORGANIZATION EXTERNAL (
      TYPE oracle_hive                       -- new type ORACLE_HIVE
      -- optional settings
      ACCESS PARAMETERS (
        com.oracle.bigdata.tablename: order_db.order_summary
        com.oracle.bigdata.colmap:    {"col":"item_cnt", \
                                       "field":"order_line_item_count"}
        com.oracle.bigdata.overflow:  {"action":"truncate", \
                                       "col":"description"}
        com.oracle.bigdata.erroropt:  [{"action":"replace", \
                                        "value":"invalid_num", \
                                        "col":["cust_num","order_num"]},\
                                       {"action":"reject", \
                                        "col":"order_total"}]
      )
    )
    PARALLEL 4;

Extended External Tables - HDFS

    -- Note: ORDER is a reserved word in Oracle; a real table needs another name.
    CREATE TABLE order (
      cust_num     VARCHAR2(10),
      order_num    VARCHAR2(20),
      order_date   DATE,
      item_cnt     NUMBER,
      description  VARCHAR2(100),
      order_total  NUMBER(8,2))
    ORGANIZATION EXTERNAL (
      TYPE oracle_hdfs                       -- new type ORACLE_HDFS
      -- optional settings
      ACCESS PARAMETERS (
        com.oracle.bigdata.rowformat: \
          SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
        com.oracle.bigdata.fileformat: \
          INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'\
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
        com.oracle.bigdata.colmap:   {"col":"item_cnt", \
                                      "field":"order_line_item_count"}
        com.oracle.bigdata.overflow: {"action":"truncate", \
                                      "col":"description"}
      )
      LOCATION ('hdfs:/usr/cust/summary/*')  -- location on HDFS
    );


Columnar Stores - Oracle's Flavour
- transparent column store managed next to the row store
  - not either/or
- persistent storage row-based as before
- column store DML-synched in real-time
- the entire Oracle DB-ecosphere remains unchanged
  - security
  - backup
  - disaster recovery
  - RAC
- NO application changes required!

Advantages
- Best for queries that
  - scan large quantities of data
  - on a rather small set of columns
  - compute aggregates on the results
- High compression benefits on most columns (except ones containing distinct values)
- Well suited for OLAP/BI

Technology Gems
1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background refresh of columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. In-memory aggregation
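Gem 2, filtering on compressed data, can be illustrated with plain dictionary encoding. The slides do not describe Oracle's in-memory format, so this Python sketch is an assumption about the general technique only; `dictionary_encode`, `filter_equals`, and the sample column are hypothetical.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Return (sorted dictionary, per-row integer codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def filter_equals(dictionary: list[str], codes: list[int], wanted: str) -> list[int]:
    """Return row positions matching `wanted`, comparing codes only."""
    if wanted not in dictionary:
        return []  # the value occurs nowhere, no row needs to be touched
    target = dictionary.index(wanted)
    return [pos for pos, code in enumerate(codes) if code == target]


# Hypothetical low-cardinality column.
country = ["CH", "DE", "CH", "FR", "CH", "DE"]
dictionary, codes = dictionary_encode(country)
matches = filter_equals(dictionary, codes, "CH")
print(dictionary, codes, matches)
```

The predicate value is translated to a code once, and the scan then compares small integers without decoding any row. With a column of all-distinct values the dictionary would be as long as the column itself, which matches the compression caveat on the Advantages slide.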

Example - In-Memory Aggregation
New optimizer transformation: Vector Group By
- Resembles well-known star transformation
- Two-phase, six-step process
Phase 1 - preparation
1. Scan dimensions
2. Build key vectors
3. Prepare accumulator
4. Build tmp-tables for dim select attributes
Phase 2 - computation
5. Scan facts w.r.t. key vectors
6. Join filtered facts with tmp-tables
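The six steps can be walked through on a toy star schema. The Python sketch below is a simplified reading of the slide, not Oracle's algorithm: there is a single dimension, the key vector is a plain dictionary, and all table contents are hypothetical.

```python
# Hypothetical star schema: one dimension (products) and one fact list (sales).
products = [  # (product_id, category)
    (1, "Books"), (2, "Games"), (3, "Books"), (4, "Music"),
]
sales = [  # (product_id, amount)
    (1, 10.0), (2, 25.0), (3, 5.0), (4, 8.0), (1, 2.5), (2, 4.0),
]
wanted_categories = {"Books", "Games"}

# Phase 1, steps 1-2: scan the dimension and build the key vector.
# It maps each join key to a dense group id; filtered-out keys are absent.
group_ids: dict[str, int] = {}
key_vector: dict[int, int] = {}
for product_id, category in products:
    if category in wanted_categories:
        gid = group_ids.setdefault(category, len(group_ids))
        key_vector[product_id] = gid

# Step 3: prepare one accumulator slot per group id.
accumulator = [0.0] * len(group_ids)

# Step 4: temporary table holding the dimension attributes to select.
tmp_table = {gid: category for category, gid in group_ids.items()}

# Phase 2, step 5: scan the facts once, filtering and aggregating
# through the key vector.
for product_id, amount in sales:
    gid = key_vector.get(product_id)
    if gid is not None:
        accumulator[gid] += amount

# Step 6: join the aggregated values back to the selected attributes.
result = {tmp_table[gid]: total for gid, total in enumerate(accumulator)}
print(result)
```

The fact list is read exactly once, and the join in step 6 touches one row per group rather than one row per fact, which is where the approach gains over joining first and grouping afterwards.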

In-Memory - The Secret Sauce
Many reasons for outstanding In-Memory performance
- Conceptual advantage of columnar format
- Speed of processing in DRAM
- Sum of technology gems (see earlier)
- Database engine aware of columnar store's capabilities
  - extended optimizer costing model and transformations
  - extended SW to use columnar store's APIs
Unprecedented performance for analytics
Transparently available for DB queries


Headquarters
Wikipedia: "Headquarters (HQ) denotes the location where most, if not all, of the important functions of an organization are coordinated."
[Figure labels: Big Data Storage, Exadata Storage, Query Process in DB, HQ, Columnar Store, Block Buffer, Disks]

The Database Kernel Rules Them All
Query Franchising in action
- optimizer generates execution plan
- partial queries are sent out to other engines
  - Big Data (SQL)
  - Columnar in-memory store
  - Exadata storage
- partial results are received & further processed
- security policies are applied
- final results are delivered
Divide and conquer between data management technologies
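The franchising flow above can be sketched as a coordinator that fans a predicate out to several engines and post-processes what comes back. This is a conceptual Python illustration only: `make_engine`, `franchise_query`, the engine names, and the rows are hypothetical, and there is no optimizer or execution plan in it.

```python
from typing import Callable

Row = dict[str, object]
Predicate = Callable[[Row], bool]
Engine = Callable[[Predicate], list[Row]]


def make_engine(rows: list[Row]) -> Engine:
    """Wrap a list of rows as an engine that evaluates a predicate locally."""
    def run(predicate: Predicate) -> list[Row]:
        return [r for r in rows if predicate(r)]
    return run


def franchise_query(
    engines: dict[str, Engine], predicate: Predicate, allowed: Predicate
) -> list[Row]:
    """Send the partial query to every engine, then finish centrally."""
    partial_results: list[Row] = []
    for engine in engines.values():
        partial_results.extend(engine(predicate))          # partial results received
    secured = [r for r in partial_results if allowed(r)]   # security policies applied
    return sorted(secured, key=lambda r: r["id"])          # further processing


# Hypothetical engines standing in for the three storage technologies.
engines = {
    "big_data_sql": make_engine([
        {"id": 3, "region": "EU", "amount": 40},
        {"id": 4, "region": "US", "amount": 90},
    ]),
    "in_memory_column_store": make_engine([{"id": 1, "region": "EU", "amount": 75}]),
    "exadata_storage": make_engine([{"id": 2, "region": "EU", "amount": 10}]),
}

final = franchise_query(
    engines,
    predicate=lambda r: r["amount"] >= 40,
    allowed=lambda r: r["region"] == "EU",
)
print([r["id"] for r in final])
```

Each engine filters its own data, while the security policy and the final ordering are applied in one place, mirroring the slide's point that the database kernel stays the single point of control.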

The Key Lies in The Kernel
Database optimizer and execution engine make it happen
- Transformer:
  - new transformations
- Estimator:
  - new cost estimation models
- Execution engine:
  - extended calls and APIs
Only possible because Oracle owns all implementations and APIs involved

Crucial Part - The Dictionary
- The optimizer's estimates rely on
  - the data dictionary
  - statistics
- Data Dictionary knows all objects
  - Exadata: regular db objects
  - In-memory: regular db objects
  - Big Data: defined through External Table declaration
- Estimating statistics about Big Data objects is challenging


Conclusions
- Exadata - boosts execution for traditional applications and analytics
- Big Data - provides affordable data management for large volumes of data and for unstructured data
- In-Memory - serves mighty fast scans, joins, and aggregations for analytics
- With other vendors these technologies are either
  - not available in the desired quality
  - or not tightly integrated, if at all
- Data silos & isolated solutions are being built again
- But: Oracle provides top solutions for each
- In fact: Oracle provides the only portfolio with
  - all three technologies tightly integrated
  - and central data management through the Oracle Database