Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Similar documents

Big Data Are You Ready? Thomas Kyte

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Big Data: Are You Ready? Kevin Lancaster

An Oracle White Paper June Oracle: Big Data for the Enterprise

An Oracle White Paper October Oracle: Big Data for the Enterprise

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Constructing a Data Lake: Hadoop and Oracle Database United!

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Oracle Big Data Essentials

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Oracle Big Data Handbook

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Connecting Hadoop with Oracle Database

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

An Oracle White Paper September Oracle: Big Data for the Enterprise

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Introducing Oracle Exalytics In-Memory Machine

Oracle Big Data SQL Technical Update

Apache Hadoop: Past, Present, and Future

Architecting for the Internet of Things & Big Data

<Insert Picture Here> Big Data

Trafodion Operational SQL-on-Hadoop

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Hadoop and Map-Reduce. Swati Gore

ORACLE BIG DATA APPLIANCE X3-2

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Can High-Performance Interconnects Benefit Memcached and Hadoop?

TUT NoSQL Seminar (Oracle) Big Data

Implement Hadoop jobs to extract business value from large and varied data sets

Oracle Database 12c Plug In. Switch On. Get SMART.

High Performance IT Insights. Building the Foundation for Big Data

Big Data Technologies Compared June 2014

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

An Approach to Implement Map Reduce with NoSQL Databases

How To Use Big Data For Telco (For A Telco)

How To Scale Out Of A Nosql Database

Data processing goes big

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Enabling High performance Big Data platform with RDMA

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Certified Big Data and Apache Hadoop Developer VS-1221

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

I/O Considerations in Big Data Analytics

Hadoop IST 734 SS CHUNG

NoSQL and Hadoop Technologies On Oracle Cloud

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Using OBIEE for Location-Aware Predictive Analytics

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Using RDBMS, NoSQL or Hadoop?

Making Sense of Big Data in Insurance

Hadoop implementation of MapReduce computational model. Ján Vaňo

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Internals of Hadoop Application Framework and Distributed File System

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Big Data Big Data/Data Analytics & Software Development

How To Create A Business Intelligence (Bi)

High Performance Data Management Use of Standards in Commercial Product Development

Hadoop Ecosystem B Y R A H I M A.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Big Data Analytics - Accelerated. stream-horizon.com

Data Warehouse design

Safe Harbor Statement

Workshop on Hadoop with Big Data

Big Data Introduction

Integrating Big Data into the Computing Curricula

Testing Big data is one of the biggest

So What s the Big Deal?

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

How To Handle Big Data With A Data Scientist

BIG DATA TRENDS AND TECHNOLOGIES

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

Big Data and Analytics in Government

Luncheon Webinar Series May 13, 2013

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Can the Elephants Handle the NoSQL Onslaught?

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Data Center Op+miza+on

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Oracle Database Cloud Service Rick Greenwald, Director, Product Management, Database Cloud

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

This article is the second

Scalable Architecture on Amazon AWS Cloud

Hadoop Meets Exadata. Presented by: Kerry Osborne. DW Global Leaders Program Decemeber, 2012

Transcription:

<Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for s products remain at the sole discretion of. Agenda Big Data Solution Spectrum Inside the Big Data Appliance Big Data Applications Software Big Data Analytics Conclusions 2

Big Data Why Everyone Should Care <Insert Picture Here> Big Data: Acting on New Data I found it Catalog/ Call Center Stores 60% Web Potential increase in Looking back PAST Retail retailers operating margins Decisions possible with Big Data McKinsey Global Institute: Big DataThe next frontier for innovation, competition and Looking productivity (May ahead 2011) Social Search Networks FUTURE I think I want 3

Tapping into Diverse Data Sets Big Data: Decisions based on all your data Video and Images Social Data Documents Machine-Generated Data Information Architectures Today: Decisions based on database data Transactions Drive Value from Big Data Building a Big Data Platform <Insert Picture Here> 4

Big Data: Infrastructure Requirements Acquire Organize Analyze Low, predictable Latency High Transaction Count Flexible Data Structures High Throughput In-Place Preparation All Data Sources/Structures Deep Analytics Agile Development Massive Scalability Real Time Results Divided Solution Spectrum Data Variety Unstructured Schema-less Distributed File Systems Transaction (Key-Value) Stores MapReduce Solutions NoSQL Flexible Specialized Developer Centric Schema DBMS (OLTP) ETL DBMS (DW) Advanced Analytics SQL Trusted Secure Administered Acquire Organize Analyze 5

Hadoop to Bridging the Gap Data Variety Unstructured Schema-less HDFS Cassandra Hadoop MapReduce Loader for Hadoop Schema RDBMS (OLTP) ETL RDBMS (DW) Advanced Analytics Acquire Organize Analyze Integrated Software Solution Data Variety Unstructured Schema-less Schema HDFS NoSQL DB (OLTP) Hadoop Loader for Hadoop Data Integrator (DW) Analytics: Data Mining R Spatial Graph mapreduce OBI EE Acquire Organize Analyze 6

Inside the Big Data Appliance Overview <Insert Picture Here> Engineered Solutions Data Variety Unstructured Schema Big HDFSData Appliance Hadoop Hadoop NoSQL Database Loader NoSQL Loader for hadoop Hadoop DB Data Integrator Data Integrator Database (OLTP) Exadata OLTP & DW Database Data Mining & (DW) R Semantics Spatial In-DB Analytics R Mining Text Graph Spatial BI EE Exalytics Speed of Thought Analytics Information Density Acquire Organize Analyze 14 Copyright 2011, and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8 7

Big Data Appliance Usage Model Big Data Appliance Exadata Exalytics InfiniBand InfiniBand Stream Acquire Organize Analyze & Visualize Why build a Hadoop Appliance? Time to Build? Required Expertise? Cost and Difficulty Maintaining? 8

Big Data Appliance Hardware 18 Sun X4270 M2 Servers 48 GB memory per node = 864 GB memory 12 Intel cores per node = 216 cores 24 TB storage per node = 432 TB storage 40 Gb p/sec InfiniBand 10 Gb p/sec Ethernet Big Data Appliance Cluster of industry standard servers for Hadoop and NoSQL Database Focus on Scalability and Availability at low cost InfiniBand Network Redundant 40Gb/s switches IB connectivity to Exadata Compute and Storage 18 High-performance low-cost servers acting as Hadoop nodes 10GigE Network 8 10GigE ports Datacenter connectivity 24 TB Capacity per node 2 6-core CPUs per node Hadoop triple replication NoSQL Database triple replication 9

Scale Out to Infinity Scale out by connecting racks to each other using Infiniband Expand up to eight racks without additional switches Scale beyond eight racks by adding an additional switch Big Data Appliance Software Enterprise Linux 5.6 Hotspot Java VM Cloudera s Distribution including Apache Hadoop Cloudera Manager Open Source Distribution of R NoSQL Database Community Edition 10

Why Open-Source Apache Hadoop? Fast evolution in critical features Built by the Hadoop experts in the community Practical instead of esoteric Focus on what is needed for large clusters Proven at very large scale In production at all the large consumers of Hadoop Extremely stable in those environments Well-understood by practitioners Software Layout Node 1: M: Name Node, Balancer & HBase Master S: HDFS Data Node, NoSQL DB Storage Node Node 2: M: Secondary Name Node, Management, Zookeeper, MySQL Slave S: HDFS Data Node, NoSQL DB Storage Node Node 3: M: JobTracker, MySQL Master, ODI Agent, Hive Server S: HDFS Data Node, NoSQL DB Storage Node Node 4 18: S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes Your MapReduce runs here! 11

Big Data Application Software <Insert Picture Here> Acquire New Information Key-Value Store Workloads Large dynamic schema based data repositories Data capture Web applications (click-through capture) Online retail Sensor/statistics/network capture (factory automation for example) Backup services for mobile devices Data services Scalable authentication Real-time communication (MMS, SMS, routing) Personalization Social Networks 12

NoSQL DB A distributed, scalable key-value database Simple Data Model Key-value pair with major+sub-key paradigm Read/insert/update/delete operations Scalability Dynamic data partitioning and distribution Optimized data access via intelligent driver High availability One or more replicas Disaster recovery through location of replicas Resilient to partition master failures No single point of failure Transparent load balancing Reads from master or replicas Driver is network topology & latency aware Elastic (Planned for Release 2) Online addition/removal of Storage Nodes Automatic data redistribution Application NoSQLDB Driver Storage Nodes Data Center A Application NoSQLDB Driver Storage Nodes Data Center B NoSQL DB Differentiation Commercial Grade Software and Support General-purpose Reliable Based on proven Berkeley DB JE HA Easy to install and configure Scalable throughput, bounded latency Simple Programming and Operational Model Simple Major + Sub key and Value data structure ACID transactions Configurable consistency & durability Easy Management Web-based console, API accessible Manages and Monitors: Topology; Load; Performance; Events; Alerts Completes large scale data storage offerings 13

Big Data Application Software Organizing Data for Analysis <Insert Picture Here> Loader for Hadoop Features Load data into a partitioned or non-partitioned table Single level, composite or interval partitioned table Support for scalar datatypes of Database Load into Database 11g Release 2 Runs as a Hadoop job and supports standard options Pre-partitions and sorts data on Hadoop Online and offline load modes 28 Copyright 2011, and/or its affiliates. All rights reserved. 14

Loader for Hadoop INPUT 1 ORACLE LOADER FOR HADOOP SHUFFLE /SORT SHUFFLE /SORT INPUT 2 SHUFFLE /SORT SHUFFLE /SORT SHUFFLE /SORT 29 Copyright 2011, and/or its affiliates. All rights reserved. Loader for Hadoop: Online Option Read target table metadata from the database Perform partitioning, ORACLE LOADER FOR HADOOP sorting, and data conversion Connect to the database from reducer nodes, load into database partitions in parallel SHUFFLE /SORT SHUFFLE /SORT 30 Copyright 2011, and/or its affiliates. All rights reserved. 15

Loader for Hadoop: Offline Option Read target table metadata from the database Perform partitioning, ORACLE LOADER FOR HADOOP sorting, and data conversion Write from reducer nodes to Data Pump files Import into the database in parallel using external table mechanism SHUFFLE /SORT SHUFFLE /SORT 31 Copyright 2011, and/or its affiliates. All rights reserved. Loader for Hadoop Advantages Offload database server processing to Hadoop: Convert input data to final database format Compute table partition for row Sort rows by primary key within a table partition Generate binary datapump files Balance partition groups across reducers 32 Copyright 2011, and/or its affiliates. All rights reserved. 16

Selection Output Option for Use Case Loader for Hadoop Output Option Online load with JDBC Online load with Direct Path Offline load with datapump files On Big Data Appliance Direct HDFS Use Case Characteristics The simplest use case for non partitioned tables Fast online load for partitioned tables Fastest load method for external tables Leave data on HDFS Parallel access from database Import into database when needed 33 Copyright 2011, and/or its affiliates. All rights reserved. Streaming Access to HDFS HDFS Datafile_part_1 HDFS Datafile_part_2 Database External Table Query HDFS Datafile_part_m HDFS FUSE View Or Table Function Datafile_part_n HDFS Datafile_part_x Map Reduce 17

Automate Usage of Loader for Hadoop Data Integrator (ODI) ODI has knowledge modules to Generate data transformation code to run on Hive/Hadoop Invoke Loader for Hadoop Use the drag-and-drop interface in ODI to Include invocation of Loader for Hadoop in any ODI packaged flow 35 Copyright 2011, and/or its affiliates. All rights reserved. 36 Copyright 2011, and/or its affiliates. All rights reserved. 18

Big Data Analytics Real Time Analytics Platform <Insert Picture Here> R Statistical Programming Language Open source language and environment Used for statistical computing and graphics Strength in easily producing publication-quality plots Highly extensible with open source community R packages 19

R Enterprise Data and statistical analysis are stored and run in-database Same R user experience & same R clients Embed in operational systems Complements Data Mining Drive Value from Big Data Conclusions <Insert Picture Here> 20

Big Data Appliance Big Data for the Enterprise Optimized and Complete Everything you need to store and integrate your lower information density data Integrated with Exadata Analyze all your data Easy to Deploy Risk Free, Quick Installation and Setup Single Vendor Support Full support for the entire system and software set Integrated Solution Stack for Big Data HDFS NoSQL Database Enterprise Applications Hadoop (MapReduce) Loader for Hadoop Data Integrator Data Warehouse In-Database Analytics Analytic Applications ACQUIRE ORGANIZE ANALYZE DECIDE 21

: Big Data for the Enterprise The most comprehensive solution Includes everything needed to acquire, organize and analyze all your data Optimized for Extreme Analytics Deepest analytics portfolio with access to all data Engineered to Work Together Eliminate deployment risk and support risk Enterprise Ready Deliver extreme performance and scalability Questions 22

23