Extreme Data Mining with Oracle R Enterprise

Oliver Bracht, Managing Director, eoda
Matthias Fuchs, Senior Consultant, ISE Information Systems Engineering GmbH

Extreme Data Mining with Oracle R Enterprise
- About R
- In-database data mining
- R with Oracle database
- R on Oracle Exadata
- R example implementation
- Outlook R

Copyright (C) ISE GmbH - All Rights Reserved

ISE & eoda

ISE
- Oracle partner since 1992
- Test center for Exadata, Exalogic, Exalytics
- Gräfenberg, Nürnberg, München

eoda
- R expertise since 2009
- Analysis of structured and unstructured data
- Kassel

About R

About R - Packages

About R - Relevance

About R - Example

In-database data mining

Traditional analytics
- Data extraction
- Data preparation and transformation
- Model building
- Data preparation and transformation
- Model scoring
- Data import

Oracle Data Mining
- Data preparation
- Model building
- Model scoring with embedded data preparation

Savings and results
- Faster time from data to insights
- Lower TCO: eliminates data movement and data duplication
- Maintains security

Cutting-edge machine learning algorithms inside the SQL kernel of the database

R with Oracle database

Using Oracle DB, calculation local
- R engine and Oracle R packages on the client
- Calculating on the R client
- Data out of the DB, transferred to the client

Using Oracle DB, calculation in Oracle DB
- SQL: in-database statistics and data mining
- Calculating in the DB with Oracle Data Mining
- Data stay in the database
- Use of cell storage

Using Oracle DB, calculation on DB server
- R embedded: Oracle R packages and R engine on the DB server
- Calculating on the DB server
- Data out of the DB
- Spawning several R processes

R with Oracle database - Comparison between the Oracle R database methods

R client
- CRAN packages: yes
- Parallel: no, only in R
- Performance limitation: network, CPU, RAM on the client
- Start: R client

R in database
- CRAN packages: ORE packages and ODM
- Parallel: in packages, spawn parallel R processes
- Performance limitation: I/O, CPU, RAM on the DB server
- Start: out of SQL, R client

R in DB server
- CRAN packages: yes
- Parallel: yes, spawn parallel R processes
- Performance limitation: I/O, CPU, RAM of the DB server; parallel in R
- Start: out of SQL, R client

R with Oracle database - Oracle Data Mining

Mapping of ODM packages to R (CRAN package RODM)

RODM function               Description
RODM_create_ai_model        Attribute Importance
RODM_create_assoc_model     Association Rules
RODM_create_dt_model        Decision Tree
RODM_create_glm_model       Generalized Linear Model
RODM_create_kmeans_model    Hierarchical k-means
RODM_create_nb_model        Naive Bayes
RODM_create_nmf_model       Non-Negative Matrix Factorization
RODM_create_oc_model        O-cluster
RODM_create_svm_model       Support Vector Machine

http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html

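How one of the mapped functions might be used, as a minimal sketch: it assumes the CRAN package RODM is installed, an ODBC data source for the database exists, and both tables are already in the database. The wrapper name build_and_score_dt and all of its arguments are hypothetical; only the RODM_* calls come from the package. The package is not loaded at the top level, so the definition itself runs without a database.

```r
# Sketch only: needs the CRAN package RODM, an ODBC data source and an Oracle
# database with the Data Mining option. Nothing touches the database until
# the function is called.
build_and_score_dt <- function(dsn, uid, pwd,
                               train_table, score_table,
                               target_col, model_name = "DT_MODEL") {
  # Open the connection and make sure it is closed again on exit
  db <- RODM::RODM_open_dbms_connection(dsn = dsn, uid = uid, pwd = pwd)
  on.exit(RODM::RODM_close_dbms_connection(db), add = TRUE)

  # Build the decision tree inside the database
  RODM::RODM_create_dt_model(
    database = db,
    data_table_name = train_table,
    target_column_name = target_col,
    model_name = model_name
  )

  # Score a second table with the stored model, also inside the database
  scored <- RODM::RODM_apply_model(
    database = db,
    data_table_name = score_table,
    model_name = model_name,
    supplemental_cols = target_col
  )

  # Remove the model from the database and return the scoring results
  RODM::RODM_drop_model(db, model_name)
  scored$model.apply.results
}
```

Only the scoring results travel back to the R client; the training data stay in the database.
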
R with Oracle database - Routines in package ore

Significance tests
- Chi-square, McNemar, Bowker
- Simple and weighted kappas
- Cochran-Mantel-Haenszel correlation
- Cramer's V
- Binomial, KS, t, F, Wilcoxon

Distribution functions (density function, probability function and quantile for each)
- Beta distribution
- Binomial distribution
- Cauchy distribution
- Chi-square distribution
- Exponential distribution
- F-distribution
- Gamma distribution
- Geometric distribution
- Log Normal distribution
- Logistic distribution
- Negative Binomial distribution
- Normal distribution
- Poisson distribution
- Signed rank distribution
- Student t distribution
- Uniform distribution
- Weibull distribution

Other functions
- Gamma function
- Natural logarithm of the Gamma function
- Digamma function
- Trigamma function
- Error function
- Complementary error function

Base SAS equivalents
- Freq, Summary, Sort
- Rank, Corr, Univariate

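The density / probability / quantile triple follows the naming scheme of base R, where each distribution has a d, p and q function. A minimal sketch in plain R on local data, with no database involved; whether a given ORE release offers the same calls on database-resident columns is not stated on the slide and is not assumed here. The 2x2 table is made up for illustration.

```r
# Density, probability (CDF) and quantile function of the normal distribution
x <- 1.96
density_at_x <- dnorm(x)      # height of the density curve at x
prob_below_x <- pnorm(x)      # P(X <= x)
quantile_975 <- qnorm(0.975)  # value below which 97.5 % of the mass lies

# The quantile function is the inverse of the probability function
round_trip <- qnorm(pnorm(x))

# One of the listed significance tests: chi-square test on a made-up 2x2 table
tab <- matrix(c(20, 15, 30, 35), nrow = 2)
chi <- chisq.test(tab)
print(chi$p.value)
```
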
R on Oracle Exadata
- Oracle Database Server: compute-intensive processing
- Oracle Exadata Storage Server: data-intensive processing
- Clustered database servers
- High-bandwidth interconnect
- Massively parallel storage

R on Oracle Exadata

Database server
- Up to 256 GB memory
- Up to 2x8 cores
- 8 times in a full rack

Exadata cell servers
- Flash cache up to 1.6 TB per cell
- InfiniBand connections to the DB server
- Offloading
- 14 times in a full rack

- Offloading for ore and ODM packages, cell use
- Spawning many R processes over all database servers

R example implementation

Immonet
- is a wholly owned subsidiary of the Axel Springer corporation and forms part of the media group's extremely successful digital strategy
- is one of the three major digital real estate marketplaces in Germany
- has a complete Oracle solution with Exadata and Exalytics

R Example implementation - Starting on R client

    # Show fractional seconds in timestamps
    op <- options(digits.secs = 2)
    Sys.time()

    # Load libraries (ORE provides ore.pull)
    library(ORE)
    library(party)

    # Connect to Exadata; connect.exa() is the presenters' own helper and is
    # not defined on the slides
    connect.exa()

    # Load the data out of the database into client memory
    dat <- ore.pull(immonet_data)

    # Build the regression tree
    ct <- ctree(
      formula = rexa.calc ~ rpqm.calc + auss2.calc + flaechen.wohnflaeche +
        flaechen.anzahl_zimmer + freitexte.objekttitel.nchar,
      data = dat,
      control = ctree_control(maxdepth = 3)
    )

    # Plot the tree
    plot(ct, terminal_panel = node_boxplot(ct, id = FALSE, cex = 0))
    Sys.time()

Timestamps: start 21:33:03.85 CET, end 21:33:58.31 CET (about 54.5 seconds).

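The two timestamps give the run time of the client-side variant. A small sketch of that arithmetic in base R: the calendar date is a placeholder (the slide shows only the time of day), UTC is used instead of CET because only the difference matters, and time_it is a hypothetical helper that is not part of ORE or party.

```r
# Placeholder date; only the time of day comes from the slide.
start <- as.POSIXct("2000-01-01 21:33:03.85", tz = "UTC")
end   <- as.POSIXct("2000-01-01 21:33:58.31", tz = "UTC")

# 58.31 - 3.85 = 54.46 seconds
elapsed <- as.numeric(difftime(end, start, units = "secs"))
print(round(elapsed, 2))

# Hypothetical helper: evaluate an expression and report the wall-clock time.
# The argument is a lazy promise, so it is evaluated only by force(),
# after the start time has been taken.
time_it <- function(expr) {
  t0 <- Sys.time()
  value <- force(expr)
  seconds <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  list(value = value, seconds = seconds)
}

res <- time_it(sum(sqrt(1:1e5)))
```

Wrapping the same model-building call with such a helper on the client and on the server would make the two variants directly comparable.
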
R Example implementation - Starting on R client

R Example implementation - Working with R on server

R Example implementation - Starting on R remote

    # Show fractional seconds in timestamps
    op <- options(digits.secs = 2)
    Sys.time()

    # connect.exa()

    # Run the whole calculation in an R engine on the database server
    mod <- ore.doEval(
      function(param) {
        library(party)
        dat <- ore.pull(immonet_data)
        ct <- ctree(
          formula = rexa.calc ~ rpqm.calc + auss2.calc + flaechen.wohnflaeche +
            flaechen.anzahl_zimmer + freitexte.objekttitel.nchar,
          data = dat,
          control = ctree_control(maxdepth = 3)
        )
        # Write the plot to a PDF file on the server side
        pdf("2_client.pdf")
        plot(ct)
        dev.off()
        # Return the fitted tree
        ct
      }
    )

    Sys.time()

R Example implementation - Working embedded

R Example implementation - Working embedded - Detail

rq*Eval() table functions: rqEval(), rqTableEval(), rqGroupEval(), rqRowEval()

- Output: only parts of the calculation, number of rows
- Output table definition: a query specifying the format of the result. If NULL, the output is a serialized BLOB.
- Group name (optional): name of the grouping column
- Number of rows (optional): number of rows to provide to the function at one time

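The group name and number-of-rows parameters describe how the input table is partitioned before the R function is called. The sketch below mimics those two partitioning schemes in plain R on a small made-up data frame. It does not call the rq*Eval functions themselves; the data, the column names, the function summarise_part and the chunk size are all hypothetical.

```r
# Made-up input table standing in for a database table
listings <- data.frame(
  city  = c("Kassel", "Kassel", "Munich", "Munich", "Munich"),
  price = c(100, 140, 300, 320, 280)
)

# The user function: receives one partition, returns a one-row data frame
summarise_part <- function(part) {
  data.frame(n = nrow(part), mean_price = mean(part$price))
}

# Group style: one call per value of the grouping column
by_group <- lapply(split(listings, listings$city), summarise_part)

# Row style: one call per chunk of at most `rows` rows
rows <- 2
chunk_id <- ceiling(seq_len(nrow(listings)) / rows)
by_chunk <- lapply(split(listings, chunk_id), summarise_part)

# Combine the per-partition results into one table with a fixed column layout
group_result <- do.call(rbind, by_group)
chunk_result <- do.call(rbind, by_chunk)
print(group_result)
print(chunk_result)
```

Because every call returns the same columns, the combined result has a fixed format, which is what an output table definition would declare up front.
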
Outlook - R in Big Data - Overall picture
- Big Data Appliance, Exadata, Exalytics
- Acquire, Organize, Analyze, Decide

Outlook - R, Hadoop and database

Outlook - R on ExaStack

More information
- OTN: http://www.oracle.com/technetwork/database/options/advancedanalytics/index.html
- Blog Oracle R Packages: https://blogs.oracle.com/r/entry/introduction_to_the_ore_statistics
- Rittmanmead: http://www.rittmanmead.com/2012/10/oracle-exalytics-oracle-renterprise-and-endeca-part-1-oracles-analytics-engineered-systemsand-big-data-strategy/

Questions