Big Data Analytics: Scaling R to Enterprise Data (useR! 2013, Albacete, Spain, #user2013)




Luis Campos, Big Data Solutions Lead, Oracle EMEA (@luigicampos)
Mark Hornick, Director, Oracle Database Advanced Analytics (@MarkHornick)


The girl with all the questions!
"The real innovation here is that we can ask questions and get the answer back before we have forgotten why we asked the question in the first place."
Hilary Mason, Chief Scientist at Bit.ly and member of NYC Mayor Bloomberg's Technology and Innovation Advisory Council

Nexus of Forces, Platform 3.0, Four Pillars: what are analysts and industry groups saying?

New Information Challenges
  Data Explosion: a decade of digital universe growth, storage in exabytes (Source: IDC's Digital Universe Study, June 2011)
  Combinatory Explosion
  Dimension Explosion

Big Data Solution = Data + Analytics + Tools
Source: McKinsey study "Big data: What's your plan?" (March 2013) http://www.mckinsey.com/insights/business_technology/big_data_whats_your_plan
  DATA: Any Data, Any Source
  ANALYTICS: Out-of-the-box Analytics, New Models
  TOOLS: Self-Service Discovery
  On Premise, On Cloud, On Mobile

Oracle Complete Business Analytics Solution
  Big Data Appliance, Big Data Connectors, NoSQL DB
  Oracle Advanced Analytics (Data Mining, Oracle R Enterprise), Spatial, Graph
  Real Time Decisions (RTD), OBIEE, Endeca, Collective Intellect (CI)
  On Premise, Oracle Cloud, On Mobile

Apply Advanced Analytics on All Data, Visualise it with any BI Tool
(Diagram: Hadoop HDFS and relational data, BI tools.)

Oracle R Advantages
1. Keep the R tools
2. Keep the data where it sits (relational or HDFS)
3. Keep the SQL-based BI tools
4. Scale to LARGE data sets
(Diagram labels: R workspace console; function push-down of data transformation and statistics; Oracle statistics engine; OBIEE, Web Services. Stages: Development, Production, Consumption.)
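The "function push-down" idea can be illustrated without any Oracle software. The sketch below is not ORE code and does not reflect how ORE is implemented; `proxy_table`, `select_cols`, `filter_rows` and `to_sql` are hypothetical names invented here. It only shows the general pattern: an R object stands in for a database table, R-side operations are recorded, and a single SQL statement is generated so the data stays where it sits.

```r
# Hypothetical proxy class: holds a table name plus pending operations,
# never the rows themselves. Not part of ORE; all names are invented here.
proxy_table <- function(name, columns) {
  structure(list(name = name, columns = columns, where = NULL),
            class = "proxy_table")
}

# Column projection: narrows the column list, still no data movement.
select_cols <- function(x, cols) {
  stopifnot(all(cols %in% x$columns))
  x$columns <- cols
  x
}

# Row filter: records a SQL predicate as text.
filter_rows <- function(x, predicate) {
  x$where <- c(x$where, predicate)
  x
}

# Translate the pending operations into one SQL statement.
to_sql <- function(x) {
  sql <- paste("SELECT", paste(x$columns, collapse = ", "), "FROM", x$name)
  if (length(x$where) > 0) {
    sql <- paste(sql, "WHERE", paste(x$where, collapse = " AND "))
  }
  sql
}

# Overloaded generic: head() on the proxy yields SQL text, not rows.
head.proxy_table <- function(x, n = 6L, ...) {
  paste(to_sql(x), "FETCH FIRST", n, "ROWS ONLY")
}

tab <- proxy_table("MY_TABLE", c("X", "Y", "Z"))
tab <- filter_rows(select_cols(tab, c("X", "Z")), "Z <= 4.3")
query <- to_sql(tab)
print(query)
print(head(tab, 10))
```

In a real deployment the generated statement would be sent to the database and only the (small) result would return to the R client.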

Oracle's Advanced Analytics Strategic Offerings
Deliver enterprise-level advanced analytics in the database
Oracle in-database mining algorithms
  Access through free GUI from SQL Developer or programmatically from SQL, PL/SQL, R or Java
  Predictive model APIs for Oracle R Enterprise
  Exadata architecture advantages for up to 5x improvement with Smart Scan
Oracle R Distribution
  Free download, pre-installed on Oracle Big Data Appliance, bundled with Oracle Linux
  Enhanced linear algebra performance: Intel's Math Kernel Library, AMD's Core Math Library (Windows and Linux), SUN Solaris and IBM AIX
  Enterprise support for customers of Oracle Advanced Analytics, Big Data Appliance, and Oracle Linux
Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle's Advanced Analytics Strategic Offerings
Deliver enterprise-level R in the database or Hadoop
Oracle R Enterprise
  Transparent access to database-resident data from R
  Embedded R script execution through database-managed R engines
  Statistics engine
  Enhanced support for high-speed Exadata scoring
Oracle R Connector for Hadoop [ORCH] (part of Oracle Big Data Connectors)
  R interface to Oracle Hadoop Cluster on BDA and non-Oracle Hadoop clusters
  Access and manipulate data in HDFS, database, and file system
  Write MapReduce functions using R and execute through natural R interface
  Predictive models with execution in-cluster against Hadoop-stored data

Oracle R Components: component layout
  Analyst Laptop: Oracle R Distribution, Oracle R Connector for Hadoop Client, Oracle R Enterprise Client Packages
  Big Data Appliance: Oracle R Distribution, Oracle R Connector for Hadoop, Oracle R Enterprise Client Packages
  Oracle Database (Exadata): Oracle R Distribution, Oracle R Enterprise Server Components, Oracle R Enterprise Client Packages
(Diagram label: "Optional with ORCH".)

Knowledge Exploitation Process
Typical stages in a Big Data project, centred on the Data Scientist: Business Understanding, Data Selection, Data Discovery, Data Preparation, Model Building, Evaluation, Deployment

Data Loading with Oracle R Enterprise

    R> library(ORE)
    R> df <- data.frame(A=1:26, B=letters[1:26])
    R> dim(df)
    [1] 26  2
    R> class(df)
    [1] "data.frame"
    R> ore.create(df, table="DF_TABLE")
    R> ore.ls()
    [1] "DF_TABLE"
    R> class(DF_TABLE)
    [1] "ore.frame"
    attr(,"package")
    [1] "OREbase"
    R> dim(DF_TABLE)
    [1] 26  2

Data Discovery with Oracle R: in-DB and HDFS

    library(ORE)
    ore.ls()              # list tables in DB
    class(my_table)       # ore.frame
    dim(my_table)         # overloaded R functions
    head(my_table)
    sample(my_table)
    summary(my_table)

    library(ORCH)
    hdfs.ls()
    hdfs.dim("myhdfsdata")
    hdfs.head("myhdfsdata")
    hdfs.sample("myhdfsdata")
    hdfs.toHive("myhdfsdata", tablename="my_hive_data")
    summary(my_hive_data)

Data Prep with Oracle R: in-DB and HDFS

    # requires library(ORE) / library(ORCH)
    # join
    merge(MY_TABLE1, MY_TABLE2, by.x="x1", by.y="x2")
    # project columns
    df <- MY_TABLE[, c("X","Y","Z")]
    # filter rows
    df <- df[df$Z <= 4.3 | df$X == "b", 1:3]
    # binning
    IRIS_TAB <- ore.push(iris[1:4])
    IRIS_TAB$PetalBins <- ifelse(IRIS_TAB$Petal.Length < 2.0, "SMALL PETALS",
                          ifelse(IRIS_TAB$Petal.Length < 4.0, "MEDIUM PETALS",
                                 "LARGE PETALS"))
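The binning rule on this slide can be tried locally before pushing anything to the database. The following is a minimal sketch in base R on the built-in iris data frame, not on an ore.frame; whether `cut()` is overloaded for ore.frame objects is not stated in the slides, so it is used here only as an in-memory cross-check of the nested `ifelse()` logic.

```r
# Local, in-memory analogue of the binning step, using base R only.
iris_tab <- iris[1:4]

# cut() with right = FALSE gives the intervals [-Inf, 2), [2, 4), [4, Inf),
# matching the "< 2.0" and "< 4.0" tests in the nested ifelse().
iris_tab$PetalBins <- cut(iris_tab$Petal.Length,
                          breaks = c(-Inf, 2.0, 4.0, Inf),
                          labels = c("SMALL PETALS", "MEDIUM PETALS", "LARGE PETALS"),
                          right = FALSE)

# The same bins computed with nested ifelse(), for comparison.
nested <- ifelse(iris_tab$Petal.Length < 2.0, "SMALL PETALS",
          ifelse(iris_tab$Petal.Length < 4.0, "MEDIUM PETALS", "LARGE PETALS"))

bin_counts <- table(iris_tab$PetalBins)
print(bin_counts)
```

Both approaches assign every row to the same bin; `cut()` keeps the thresholds in one place, which is easier to maintain when there are more than three bins.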

Densifying data: custom MapReduce jobs
Count occurrence of hash tags in tweets per customer for select tags

    maphashtags <- function(k, v) {
      x <- strsplit(v$text, " ")
      # drop empty tokens within each tweet (keeps one list element per tweet)
      x <- lapply(x, function(words) words[words != ''])
      importanttags <- tolower(importanttags)
      for (twt in seq_along(x)) {
        for (tag in x[[twt]]) {
          if (substr(tag, 1, 1) == "#") {
            tagl <- tolower(tag)
            if (tagl %in% importanttags) {
              orch.keyval(v[twt, "screenname"], tagl)
            }
          }
        }
      }
    }

    reducehashtags <- function(k, vals) {
      # k = screenname, vals = vector(tags)
      importanttags <- tolower(importanttags)
      vals <- factor(vals$val, levels = importanttags)
      x <- as.data.frame(t(as.matrix(table(vals))))
      orch.keyval(k, x)   # k = screenname, x = df(importanttags as cols) with counts
    }

ORCH: Create your own MapReduce jobs
Count occurrence of hash tags in tweets per customer for select tags

    importanttags <- c("#bigdata", "#database", "#oracle", "#sql")
    tag.summary <- hadoop.exec(tweets.id,
        mapper   = maphashtags,
        reducer  = reducehashtags,
        export   = orch.export(importanttags = importanttags),
        config   = new("mapred.config",
            job.name      = "TwitterScreenNameHashTags",
            reduce.tasks  = 5,
            map.output    = data.frame(key='a', val='a'),
            reduce.output = data.frame(key='a', bigdata=0, database=0,
                                       oracle=0, sql=0)))

    > hdfs.get(tag.summary)
                 key bigdata database oracle sql
    1 twitter.user.1       4        7     37  91
    2 twitter.user.2      15       19      1  32
    3 twitter.user.3     104       57      8   0
    4 twitter.user.4       0       64    549   0
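To see what the map, shuffle and reduce steps of this hash tag job compute, the logic can be simulated on a laptop in base R with no Hadoop cluster. This is a sketch only: the toy tweets, the helper names `map_hashtags` and `reduce_hashtags`, and the use of `split()` as the shuffle step are assumptions made for illustration, and none of the ORCH functions (`hadoop.exec`, `orch.keyval`) are called.

```r
# Toy tweets; the data and column names are made up for illustration.
tweets <- data.frame(
  screenname = c("user.a", "user.a", "user.b"),
  text = c("Loving #Oracle and #SQL", "more #sql  tips", "#BigData meets #R"),
  stringsAsFactors = FALSE
)
important_tags <- tolower(c("#bigdata", "#database", "#oracle", "#sql"))

# Map step: emit one (screenname, tag) pair per matching hash tag.
map_hashtags <- function(v) {
  words <- strsplit(v$text, " ", fixed = TRUE)
  pairs <- lapply(seq_along(words), function(i) {
    tags <- tolower(words[[i]])
    tags <- tags[tags %in% important_tags]
    data.frame(key = rep(v$screenname[i], length(tags)), val = tags,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)
}

# Reduce step: per key, count tags into one row with a column per tag.
reduce_hashtags <- function(k, vals) {
  counts <- table(factor(vals, levels = important_tags))
  data.frame(key = k, t(as.matrix(counts)), check.names = FALSE,
             stringsAsFactors = FALSE)
}

pairs <- map_hashtags(tweets)
# Shuffle step: group values by key, then reduce each group.
grouped <- split(pairs$val, pairs$key)
tag_summary <- do.call(rbind, Map(reduce_hashtags, names(grouped), grouped))
rownames(tag_summary) <- NULL
print(tag_summary)
```

The result has one row per screen name and one count column per tag, which is the "densified" shape that the reducer's output schema describes.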

Modelling with Oracle R: in-DB and HDFS

    # Clustering with ORE
    X <- ore.push(data.frame(x))
    km.mod1 <- ore.odmKMeans(~., X, num.centers=2, num.bins=5)
    summary(km.mod1)
    rules(km.mod1)
    clusterhists(km.mod1)

    # Regression with ORCH
    mod.lm <- orch.lm(myformula, my, nreducers = 2)
    summary(mod.lm)
    pred <- predict.orch.lm(mod.lm, newdata = my)
    res.pred <- hdfs.get(pred)
    head(res.pred)
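For comparison, the same two modelling steps can be run entirely in client memory with base R. This is a hedged sketch of the conventional baseline that the in-database and in-cluster functions are meant to scale beyond; it uses the built-in iris and cars data sets and `stats::kmeans` and `stats::lm`, which are stand-ins chosen here and not the algorithms behind `ore.odmKMeans` or `orch.lm`.

```r
# Client-side baseline: everything below runs in local R memory.
set.seed(42)

# Clustering: two centres on the four numeric iris measurements.
x <- iris[1:4]
km <- kmeans(x, centers = 2)
print(km$size)

# Regression: stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Scoring: predictions for new data held in a local data frame.
new_data <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_data)
print(pred)
```

The workflow has the same shape as on the slide (build, summarise, predict); the difference is that here both the data and the model fitting are limited by the memory of the R client.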

In-database performance advantage
R lm vs. ORE ore.lm: 500k to 1.5m records, 3 predictors
Performance: 2x-3x improvement for build, 4x improvement for scoring

In-database performance advantage: lm
More tests at http://blogs.oracle.com/r/entry/oracle_r_enterprise_1_32

Deploying with Oracle R Enterprise (Production)
  Load R scripts into ORE script repository
  Invoke R scripts by name from SQL
  Store R objects directly in Oracle Database (no separate files)
  Optional return values:
    Data frame consumable by any SQL-ready application
    XML containing structured data, complex R objects, PNG images
    PNG table with BLOB column containing images for immediate consumption
  Schedule for automatic execution

Oracle Advanced Analytics: Embedded R Execution, SQL interface
rqEval: generate XML string for graphic output
(Diagram labels: Oracle PL/SQL, R Language, Oracle SQL, Oracle BI Publisher.)

    -- Oracle PL/SQL: register the R function in the script repository
    begin
      sys.rqScriptCreate('example6',
        'function(){
           res <- 1:10
           plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)
           res
         }');
    end;
    /

    -- Oracle SQL: run the script by name and return XML
    select value from table(rqEval(NULL, 'XML', 'example6'));
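Before registering a function in the script repository, it can be useful to dry-run the same function body in a local R session. The sketch below does only that: it assumes nothing about the database side and uses a null PDF device so that the `plot()` call succeeds on a machine without a display.

```r
# The same function body that the PL/SQL block registers as 'example6'.
example6 <- function() {
  res <- 1:10
  plot(1:100, rnorm(100), pch = 21, bg = "red", cex = 2)
  res
}

# Draw to a null PDF device so the sketch also runs without a display.
pdf(file = NULL)
result <- example6()
invisible(dev.off())

print(result)
```

The function returns the vector 1 to 10 as its value and produces the scatter plot as a side effect, which corresponds to the two kinds of content (structured data and an image) that the slide says the XML output carries.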

Summary
Oracle R Enterprise (ORE)
  A comprehensive, database-centric environment for end-to-end analytical processes in R, with immediate deployment to production environments
  Wide range of in-database advanced analytics algorithms exposed through R
  Eliminate R client memory limits
Oracle R Connector for Hadoop (ORCH)
  A collection of R packages enabling Big Data analytics from an R environment
  Allows R users to leverage a Hadoop cluster with HDFS and MapReduce from R
  Prepackaged advanced analytics algorithms
  Transparent manipulation of Hive data
Enable R users to conduct Big Data projects from R
  Eliminate client R engine memory barrier
  Scale to large data sets
  Deploy R-based solutions without translation to other languages or environments

Resources
  http://www.oracle.com/goto/r
  Blog: https://blogs.oracle.com/r/
  Forum: https://forums.oracle.com/forums/forum.jspa?forumid=1397
  Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
  ROracle: http://cran.r-project.org/web/packages/roracle
  Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
  Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview
