extreme Datamining with Oracle R Enterprise


Oliver Bracht, Managing Director, eoda
Matthias Fuchs, Senior Consultant, ISE Information Systems Engineering GmbH

extreme Datamining with Oracle R Enterprise
- About R
- In-database data mining
- R with Oracle database
- R on Oracle Exadata
- R example implementation
- Outlook R
Copyright (C) ISE GmbH - All Rights Reserved

ISE & eoda
ISE
- Oracle partner since 1992
- Test center: Exadata, Exalogic, Exalytics
- Gräfenberg, Nürnberg, München
eoda
- R expertise since 2009
- Analysis of structured and unstructured data
- Kassel

About R

About R - Packages

About R - Relevance

About R - Example

In-database data mining

Traditional analytics
- Data extraction
- Data preparation and transformation
- Model building
- Model scoring
- Data preparation and transformation
- Data import

Oracle Data Mining
- Data preparation
- Model building
- Model scoring with embedded data preparation

Results (savings)
- Faster time from data to insights
- Lower TCO: eliminates data movement and data duplication
- Maintains security

Cutting-edge machine learning algorithms run inside the SQL kernel of the database.

R with Oracle database

Using Oracle DB, calculation local
- R engine with Oracle R packages
- Calculation runs on the R client
- Data is transferred out of the DB to the client

Using Oracle DB, calculation in Oracle DB
- SQL: in-database statistical and data mining functions (Oracle Data Mining)
- Calculation runs in the DB
- Data stays in the database
- Use of cell storage

Using Oracle DB, calculation on DB server
- Embedded R: R engine with Oracle R packages
- Calculation runs on the DB server
- Data is read out of the DB
- Spawns several R processes

R with Oracle database
Comparison between the Oracle R database methods: R client, R in database, R in DB server

CRAN packages
- R client: yes
- R in database: ORE packages and ODM

Parallel
- R client: no, only parallel in R
- R in database: in packages, spawn parallel R processes
- R in DB server: yes, spawn parallel R processes

Performance limitation
- R client: network, CPU, RAM on client
- R in database: I/O, CPU, RAM on DB server
- R in DB server: I/O, CPU, RAM of DB server

Start
- R client: R client
- R in database: out of SQL, R client
- R in DB server: out of SQL, R client

R with Oracle database - Oracle Data Mining
Mapping of ODM packages to R through the CRAN package RODM

RODM function               Description
RODM_create_ai_model        Attribute Importance
RODM_create_assoc_model     Association Rules
RODM_create_dt_model        Decision Tree
RODM_create_glm_model       Generalized Linear Model
RODM_create_kmeans_model    Hierarchical k-means
RODM_create_nb_model        Naive Bayes
RODM_create_nmf_model       Non-Negative Matrix Factorization
RODM_create_oc_model        O-cluster
RODM_create_svm_model       Support Vector Machine

http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html

R with Oracle database
Routines in package ORE

Significance tests
- Chi-square, McNemar, Bowker
- Simple and weighted kappas
- Cochran-Mantel-Haenszel correlation
- Cramer's V
- Binomial, KS, t, F, Wilcoxon

Distribution functions (density function, probability function, and quantile for each)
- Beta, Binomial, Cauchy, Chi-square, Exponential, F, Gamma, Geometric
- Log Normal, Logistic, Negative Binomial, Normal, Poisson
- Sign Rank, Student t, Uniform, Weibull

Other functions
- Gamma function
- Natural logarithm of the Gamma function
- Digamma function
- Trigamma function
- Error function
- Complementary error function

Base SAS equivalents
- Freq, Summary, Sort
- Rank, Corr, Univariate
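To make the function families on this slide concrete, here is a minimal sketch in base R. It assumes no database connection and does not use the ORE package; it only shows the local counterparts (density, probability, quantile, special functions, one significance test) that the in-database routines correspond to. The 2x2 counts are made-up illustration data, and `erf`/`erfc` are helper names defined here because base R does not ship them.

```r
# Base R only: no database connection and no ORE package is assumed here.
x <- 1.5

# Density, cumulative probability, and quantile of the standard normal.
d <- dnorm(x)
p <- pnorm(x)
q <- qnorm(p)            # qnorm inverts pnorm, so q recovers x

# Special functions listed on the slide that base R provides directly.
g  <- gamma(5)           # gamma(n) equals factorial(n - 1)
lg <- lgamma(5)          # natural logarithm of the Gamma function
dg <- digamma(1)         # minus the Euler-Mascheroni constant
tg <- trigamma(1)        # pi^2 / 6

# Base R has no erf(); these are the usual definitions via pnorm.
erf  <- function(z) 2 * pnorm(z * sqrt(2)) - 1
erfc <- function(z) 2 * pnorm(z * sqrt(2), lower.tail = FALSE)

# Chi-square test of independence on a small 2x2 table (made-up counts).
tab <- matrix(c(20, 15, 30, 35), nrow = 2)
res <- chisq.test(tab)
```

The point of the in-database versions is that the same kind of call is evaluated where the data lives, so only the small result object travels to the client.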

R on Oracle Exadata
- Oracle Database Servers (clustered database servers): compute-intensive processing
- High-bandwidth interconnect between database and storage servers
- Oracle Exadata Storage Servers (massively parallel storage): data-intensive processing

R on Oracle Exadata

Database servers
- Up to 256 GB memory
- Up to 2 x 8 cores
- 8 per full rack

Exadata cell servers
- Flash cache up to 1.6 TB per cell
- InfiniBand connections to the DB servers
- Offloading
- 14 per full rack

R
- Offloading for ORE and ODM packages, using the cells
- Spawning many R processes across all database servers

R example implementation
The customer
- is a wholly owned subsidiary of the Axel Springer corporation and part of the media group's extremely successful digital strategy
- is one of the three major digital real estate marketplaces in Germany
- runs a complete Oracle solution with Exadata and Exalytics

R Example implementation - Starting on R client

    op <- options(digits.secs = 2)
    Sys.time()
    # Loading libraries
    require(party)
    # Connecting to Exadata
    .exa()
    # Loading data out of the database
    dat <- ore.pull(immonet_data)
    # Building regression tree
    ct <- ctree(data = dat, control = ctree_control(maxdepth = 3),
                formula = rexa.calc ~ rpqm.calc + auss2.calc +
                  flaechen.wohnflaeche + flaechen.anzahl_zimmer +
                  freitexte.objekttitel.nchar)
    # Plot tree
    plot(ct, terminal_panel = node_boxplot(ct, id = FALSE, cex = 0))

Run time: 21:33:03.85 CET to 21:33:58.31 CET
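The slide brackets the client-side run with `Sys.time()` readings. As a small sketch of how two such readings turn into an elapsed time, the snippet below parses the two times of day shown on the slide and also times a stand-in workload. The calendar date is a placeholder (the slide shows only the times), and the workload is a hypothetical substitute for the pull-and-fit step, which needs the database.

```r
# The date is a placeholder; the slide shows only the times of day.
t_start <- as.POSIXct("2000-01-01 21:33:03.85", tz = "UTC")
t_end   <- as.POSIXct("2000-01-01 21:33:58.31", tz = "UTC")
elapsed_slide <- as.numeric(difftime(t_end, t_start, units = "secs"))

# The same pattern around a stand-in workload (hypothetical; the real
# example pulls the table and fits the tree between the two readings).
t0 <- Sys.time()
invisible(sum(sqrt(seq_len(1e6))))
elapsed_local <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
```

For the client-side run this gives roughly 54 seconds between the two readings, which includes transferring the data out of the database.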


R Example implementation - Working with R on server

R Example implementation - Starting on R remote

    op <- options(digits.secs = 2)
    Sys.time()
    # Connect
    .exa()
    # Calculation on the database server
    mod <- ore.doEval(
      function(param) {
        require(party)
        dat <- ore.pull(immonet_data)
        ct <- ctree(data = dat, control = ctree_control(maxdepth = 3),
                    formula = rexa.calc ~ rpqm.calc + auss2.calc +
                      flaechen.wohnflaeche + flaechen.anzahl_zimmer +
                      freitexte.objekttitel.nchar)
        # Write the plot to a PDF on the server, then return the model
        pdf("2_client.pdf")
        plot(ct)
        dev.off()
        ct
      })
    op <- options(digits.secs = 2)
    Sys.time()

R Example implementation - Working embedded

R Example implementation - Working embedded - Detail

rq*Eval() table functions: rqEval(), rqTableEval(), rqGroupEval(), rqRowEval()
- Output only parts of the calculation (number of rows)
- Output table definition: a query specifying the format of the result; if NULL, the output is a serialized BLOB
- Group name (optional): name of the grouping column
- Number of rows (optional): number of rows to provide to the function at one time
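The grouping column and the rows-per-call parameter can be pictured with a local analogy in base R. This is only a sketch of the partitioning idea, not the ORE or SQL interface: it assumes no database, and the data frame, column names, and `summarise_part` function are hypothetical.

```r
# Local, base-R analogy only: no ORE, no database. All names are hypothetical.
dat <- data.frame(
  region = c("north", "south", "north", "south", "north"),
  price  = c(100, 200, 120, 180, 140)
)

# User function applied to each partition; returns one result row per call.
summarise_part <- function(part) {
  data.frame(n = nrow(part), mean_price = mean(part$price))
}

# Group evaluation: one function call per value of the grouping column.
by_group <- lapply(split(dat, dat$region), summarise_part)
group_result <- do.call(rbind, by_group)

# Row evaluation: hand the function a fixed number of rows at a time.
rows_per_call <- 2
chunk_id <- ceiling(seq_len(nrow(dat)) / rows_per_call)
by_chunk <- lapply(split(dat, chunk_id), summarise_part)
chunk_result <- do.call(rbind, by_chunk)
```

In the embedded setting the database does the partitioning and can run the calls in separate R processes; the output table definition then tells SQL which columns the combined result has.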

Outlook - R in Big Data - Overall picture
- Systems: Big Data Appliance, Exadata, Exalytics
- Phases: Acquire, Organize, Analyze, Decide

Outlook - R, Hadoop and database

Outlook - R on ExaStack

More information
- OTN: http://www.oracle.com/technetwork/database/options/advancedanalytics/index.html
- Blog, Oracle R packages: https://blogs.oracle.com/r/entry/introduction_to_the_ore_statistics
- Rittman Mead: http://www.rittmanmead.com/2012/10/oracle-exalytics-oracle-renterprise-and-endeca-part-1-oracles-analytics-engineered-systemsand-big-data-strategy/

Questions