R / TERR
Ana Costa e Silva, PhD
Senior Data Scientist, TIBCO
Copyright 2000-2013 TIBCO Software Inc.
Tower of Big and Fast Data
  - Capabilities: Visual Data Discovery; Key performance indicators; Data Mining; Real Time Analytics
  - Data scale: Hundreds of Records; Millions of Records; Billions of Records (Big Data); Trillions of Records (Fast Data)
Copyright 2000-2014 TIBCO Software Inc.
Tower of Big and Fast Data
  - Visual Data Discovery: Spotfire Analyst, Spotfire Business Author, Spotfire Consumer
  - Key performance indicators: Spotfire Mobile Metrics
  - Data Mining: TIBCO Enterprise Runtime for R
  - Real Time Analytics: Spotfire Event Analytics
  - Data scale: Hundreds of Records; Millions of Records; Billions of Records (Big Data); Trillions of Records (Fast Data)
TERR: TIBCO Enterprise Runtime for R
  - Latest in the family of statistics scripting engines: S, S-PLUS, R, TERR
  - Commercial releases: v1.0 Nov 2012, v2.0 Nov 2013, v2.1 Feb 2014
  - Developer Edition: www.tibcommunity.com/community/products/analytics/terr
  - Engine internals rebuilt from scratch
  - Redesigned data object representation
  - Redesigned memory management facilities
  - Addresses long-standing problems with the S language
  - Fast and scalable engine
TERR Performance
  - Model Fitting: 5 Million Rows
  - Model Scoring: 20 Million Rows
  - [Chart labels: TERR 7X faster; 84X]
TERR: The Fastest Road to Big Data
TERR: TIBCO Enterprise Runtime for R
  - Most stable and performant access to analytics
  - Zero learning curve for R programmers
  - Supports in-database, in-Hadoop functionality: Teradata, Oracle, ...; Apache, Horton, Cloudera, MapR, ...
Deployment
  - TERR Server execution: TIBCO Spotfire Statistics Services
  - CEP Integration: TIBCO Business Events, Streambase
  - Grid Integration: TIBCO GridServer
  - Infrastructure Integration: TIBCO Business Works, ...
RStudio integration
  - TERR is now compatible with the most popular IDE in the R Community
  - Professional-quality development environment to use with TERR
Features
  - Syntax highlighting, code completion, and smart indentation
  - Execute R code directly from the source editor
  - Manage multiple working directories using projects
  - Quickly navigate code
[Figure: TERR integration with RStudio IDE]
Demo 1
Hadoop / TERR: Write Your Mapper
Use Standard R Syntax; Run using TERR
If you can understand this, you can write mapreduce:

    cat input | mapper | sort | reducer

    mapper <- function(d) {
      # split on punctuation and spaces
      words <- strsplit(paste(d, collapse = ' '), '[[:punct:][:space:]]+')[[1]]
      # get rid of empty words caused by whitespace at beginning of lines
      words <- words[!(words == '')]
      df <- data.frame(word = words)
      df$cnt <- 1
      # hsWriteTable() comes from the HadoopStreaming package
      hsWriteTable(df, sep = "\t")
    }
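The two splitting steps in the mapper can be tried on their own, without Hadoop. The sketch below is illustrative only: `tokenize()` is a hypothetical helper name, not something from the slides, and the sample sentence is made up. It shows why the mapper needs the "get rid of empty words" line: an indented input line produces a leading empty string.

```r
# tokenize() is a hypothetical name wrapping the mapper's two splitting steps.
tokenize <- function(lines) {
  # split on runs of punctuation and whitespace
  words <- strsplit(paste(lines, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  # drop the empty strings produced by leading whitespace
  words[words != ""]
}

sample_line <- "  Hello, world! Hello"   # made-up input with leading spaces

# Raw split: the leading whitespace yields an empty first element.
raw <- strsplit(sample_line, "[[:punct:][:space:]]+")[[1]]
print(raw)

# After filtering: only real words remain.
clean <- tokenize(sample_line)
print(clean)
```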
Write Your Reducer
Use Standard R Syntax; Run using TERR
If you can understand this, you can write mapreduce:

    cat input | mapper | sort | reducer

    reducer <- function(d) {
      # d$word holds a single value per reducer call
      # sep (not collapse) puts the tab between the word and its count
      cat(paste(d$word[1], sum(d$cnt), sep = "\t"), "\n", sep = "")
    }
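The whole "cat input, mapper, sort, reducer" chain can be rehearsed in a single R session before touching a cluster. The sketch below is a local stand-in under stated assumptions: `map_lines()`, `reduce_group()` and `run_local()` are hypothetical names, the input text is made up, and records are passed as in-memory strings rather than through Hadoop Streaming or the HadoopStreaming package.

```r
# Local stand-in for the streaming pipeline: input -> mapper -> sort -> reducer.
# All function names here are hypothetical helpers, not part of TERR or Hadoop.

map_lines <- function(lines) {
  words <- strsplit(paste(lines, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  words <- words[words != ""]
  # one "word<TAB>1" record per word, mirroring the mapper's tab-separated output
  if (length(words) == 0) character(0) else paste(words, 1, sep = "\t")
}

reduce_group <- function(d) {
  # d holds the records for a single word
  paste(d$word[1], sum(d$cnt), sep = "\t")
}

run_local <- function(lines) {
  mapped <- sort(map_lines(lines))          # the "sort" stage
  if (length(mapped) == 0) return(character(0))
  fields <- strsplit(mapped, "\t", fixed = TRUE)
  d <- data.frame(
    word = vapply(fields, `[`, character(1), 1),
    cnt  = as.numeric(vapply(fields, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
  # hand each word's records to the reducer, one group at a time
  groups <- split(d, d$word)
  unname(vapply(groups, reduce_group, character(1)))
}

input <- c("the quick brown fox", "  jumps over the lazy dog.", "The end")
result <- run_local(input)
cat(result, sep = "\n")
```

Word matching is case-sensitive, as in the slide's mapper, so "the" and "The" are counted separately. The order of the output lines depends on the locale's sort order.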
TERR Map Reduce
From the command line:

    $ hadoop-streaming -map mapper.r -reduce reducer.r -input inputfile -output outputfile

From TERR (optionally called remotely via TIBCO Spotfire Statistics Services):

    Return.code <- system("hadoop-streaming -map mapper.r -reduce reducer.r -input inputfile -output outputfile")
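When the job is launched from an R session, it can help to build the command string in one place and check the exit status that `system()` returns. The following is a minimal sketch, not the presenter's code: `build_streaming_cmd()` and `run_checked()` are hypothetical helpers, and the program name and flag spellings (including the leading dashes) are assumptions taken from the slide's schematic command, not verified against any Hadoop installation.

```r
# build_streaming_cmd() and run_checked() are hypothetical helpers.
# The default program name and the flag spellings mirror the slide's schematic
# command and are assumptions; adjust them to match the actual installation.
build_streaming_cmd <- function(mapper, reducer, input, output,
                                program = "hadoop-streaming") {
  paste(program,
        "-map", shQuote(mapper),
        "-reduce", shQuote(reducer),
        "-input", shQuote(input),
        "-output", shQuote(output))
}

run_checked <- function(cmd) {
  return.code <- system(cmd)   # system() returns the command's exit status
  if (return.code != 0) {
    stop("command failed with exit status ", return.code, ": ", cmd)
  }
  invisible(return.code)
}

cmd <- build_streaming_cmd("mapper.r", "reducer.r", "inputfile", "outputfile")
print(cmd)
# run_checked(cmd)   # uncomment on a machine where the command is available
```

Quoting each argument with `shQuote()` keeps file names containing spaces from being split by the shell.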
Hadoop Big Data Tools
  - Complex
  - Technical
  - Confusing
TIBCO Approach
  - Authors and Consumers
  - Hide Complexity, Empower Users
  - Visual
  - Query data on demand
  - Fit interface to User skills
TERR Map Reduce
[Diagram: Spotfire, via Statistics Services, passes Mapper.R and Reducer.R (each run via TERRscript) to Hadoop Streaming]

    $ hadoop-streaming -map mapper.r -reduce reducer.r -input inputfile -output outputfile

HDFS: each Data Node processes its own data using TERR
Demo 2
TERR MapReduce from Spotfire
  - Parameterize MapReduce, Generate and Edit MapReduce code, Test Locally, I/O from Spotfire
  - Deploy through Hadoop Streaming
  - MapReduce Interface from/to Spotfire
  - Receive analysis results directly back into Spotfire for visualisation and further analysis
Contact
Thank you!
Ana Costa e Silva, PhD
Senior Data Scientist
ansilva@tibco.com
TERR Developer Edition: www.tibcommunity.com/community/products/analytics/terr