R / TERR. Ana Costa e Silva, PhD, Senior Data Scientist, TIBCO. Copyright 2000-2013 TIBCO Software Inc.




Transcription:

R / TERR. Ana Costa e Silva, PhD, Senior Data Scientist, TIBCO. Copyright 2000-2013 TIBCO Software Inc.

Tower of Big and Fast Data
[Diagram labels]
Capabilities: Visual Data Discovery; Key performance indicators; Data Mining; Real Time Analytics
Data scale: Hundreds of Records; Millions of Records; Billions of Records (Big Data); Trillions of Records (Fast Data)
Copyright 2000-2014 TIBCO Software Inc.

Tower of Big and Fast Data
[Diagram labels, products shown next to capabilities]
- Visual Data Discovery: Spotfire Analyst, Spotfire Business Author, Spotfire Consumer
- Real Time Analytics: Spotfire Event Analytics
- Key performance indicators: Spotfire Mobile Metrics
- Data Mining: TIBCO Enterprise Runtime for R
Data scale: Hundreds of Records; Millions of Records; Billions of Records (Big Data); Trillions of Records (Fast Data)

TERR: TIBCO Enterprise Runtime for R
- Latest in a family of statistics scripting engines: S, S-PLUS, R, TERR
- Commercial releases: v1.0 Nov 2012, v2.0 Nov 2013, v2.1 Feb 2014
- Developer Edition: www.tibcommunity.com/community/products/analytics/terr
- Engine internals rebuilt from scratch
  - Redesigned data object representation
  - Redesigned memory management facilities
  - Addresses long-standing problems with S language
- Fast and scalable engine!

TERR Performance
[Charts: Model Fitting, 5 Million Rows; Model Scoring, 20 Million Rows. Chart annotations: "TERR 7X faster", "84X".]

TERR: The Fastest Road to Big Data
TERR: TIBCO Enterprise Runtime for R
- Most stable and performant access to analytics
- Zero learning curve for R programmers
- Supports in-database, in-Hadoop functionality: Teradata, Oracle, ...; Apache, Horton, Cloudera, MapR, ...
Deployment
- TERR Server execution: TIBCO Spotfire Statistics Services
- CEP Integration: TIBCO Business Events, Streambase
- Grid Integration: TIBCO GridServer
- Infrastructure Integration: TIBCO Business Works, ...

RStudio integration
- TERR is now compatible with the most popular IDE in the R community
- Professional-quality development environment to use with TERR
Features
- Syntax highlighting, code completion, and smart indentation
- Execute R code directly from the source editor
- Manage multiple working directories using projects
- Quickly navigate code
[Figure: TERR integration with RStudio IDE]

Demo 1

Hadoop / TERR: Write Your Mapper
Use Standard R Syntax; Run using TERR
If you can understand this, you can write mapreduce:

    cat input | mapper | sort | reducer

    # hsWriteTable() is assumed to come from the HadoopStreaming package
    mapper <- function(d) {
      # split on punctuation and spaces
      words <- strsplit(paste(d, collapse = ' '), '[[:punct:][:space:]]+')[[1]]
      # get rid of empty words caused by whitespace at beginning of lines
      words <- words[!(words == '')]
      df <- data.frame(word = words)
      df$cnt <- 1
      hsWriteTable(df, sep = "\t")
    }
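The mapper logic can be checked locally before any Hadoop job is submitted. The sketch below is a minimal base-R version, assuming the mapper should emit one tab-separated "word, 1" line per token. The helper name map_lines and the sample input are hypothetical, and base R's write.table stands in for the table-writing helper used on the slide.

```r
# Hypothetical helper: tokenizes lines like the slide's mapper, but returns
# the data frame instead of writing it, so it can be inspected locally.
map_lines <- function(d) {
  words <- strsplit(paste(d, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  words <- words[words != ""]          # drop empty tokens
  data.frame(word = words, cnt = rep(1, length(words)),
             stringsAsFactors = FALSE)
}

sample_lines <- c("  the quick brown fox,", "the lazy dog.")  # made-up input
pairs <- map_lines(sample_lines)

# Emit tab-separated key/value lines, as a streaming mapper would on stdout.
write.table(pairs, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Returning the data frame rather than printing it inside the function keeps the tokenizing step easy to test in isolation.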

Write Your Reducer
Use Standard R Syntax; Run using TERR
If you can understand this, you can write mapreduce:

    cat input | mapper | sort | reducer

    reducer <- function(d) {
      # d$word holds a single value per reducer call
      # sep (not collapse) puts the tab between the word and its count
      cat(paste(d$word[1], sum(d$cnt), sep = "\t"), "\n", sep = "")
    }
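The whole map, sort, reduce chain on this slide can be imitated in a single R session. The following is a minimal sketch in base R, assuming word counting as the job; the input lines are made up and no Hadoop or TERR-specific function is involved.

```r
# Made-up input standing in for `cat input`.
input <- c("to be or not to be", "that is the question")

# Map: one (word, 1) pair per token.
words <- strsplit(paste(input, collapse = " "), "[[:punct:][:space:]]+")[[1]]
words <- words[words != ""]
pairs <- data.frame(word = words, cnt = 1, stringsAsFactors = FALSE)

# Sort: bring identical keys together, as the sort step between
# mapper and reducer does.
pairs <- pairs[order(pairs$word), ]

# Reduce: sum the counts within each key.
counts <- vapply(split(pairs$cnt, pairs$word), sum, numeric(1))

# One "word<TAB>count" line per key.
cat(paste(names(counts), counts, sep = "\t"), sep = "\n")
```

In a real streaming job the reducer sees its keys already grouped by the sort step; here split() plays that role.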

TERR Map Reduce
From the command line:

    $ hadoop-streaming -map mapper.r -reduce reducer.r -input inputfile -output outputfile

From TERR (optionally called remotely via TIBCO Spotfire Statistics Services):

    Return.code <- system("hadoop-streaming -map mapper.r -reduce reducer.r -input inputfile -output outputfile")
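When the job is launched from R, it helps to build the command string separately and check the return code of system(). The sketch below assumes the command name and option names as they appear on the slide, with leading dashes assumed; an actual Hadoop Streaming installation may use a different invocation. The helper build_streaming_cmd is hypothetical.

```r
# Hypothetical helper: assembles the command line shown on the slide.
# Command name and option spellings are assumptions taken from the slide.
build_streaming_cmd <- function(mapper, reducer, input, output) {
  paste("hadoop-streaming",
        "-map", shQuote(mapper),
        "-reduce", shQuote(reducer),
        "-input", shQuote(input),
        "-output", shQuote(output))
}

cmd <- build_streaming_cmd("mapper.r", "reducer.r", "inputfile", "outputfile")
print(cmd)

# Only attempt the call when the executable is actually on the PATH.
if (nzchar(Sys.which("hadoop-streaming"))) {
  return_code <- system(cmd)
  if (return_code != 0) stop("hadoop-streaming exited with code ", return_code)
}
```

Quoting each argument with shQuote() guards against file names that contain spaces.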

Hadoop
Big Data Tools
- Complex
- Technical
- Confusing
TIBCO Approach
- Authors and Consumers
- Hide Complexity, Empower Users
- Visual
- Query data on demand
- Fit interface to User skills

TERR Map Reduce
[Diagram: Spotfire, via Statistics Services, runs Mapper.R and Reducer.R via TERRscript through Hadoop Streaming:]

    $ hadoop-streaming -map mapper.r -reduce reducer.r -input inputfile -output outputfile

[HDFS: each node processes its own data using TERR. Four Data Nodes shown.]
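The idea that each node processes only its own data can be illustrated without a cluster. This is a minimal sketch in base R, assuming a word-count job; the list node_data is a hypothetical stand-in for the blocks held on separate data nodes, and lapply() stands in for the parallel map step.

```r
# Hypothetical stand-in for data blocks held on three data nodes.
node_data <- list(
  node1 = c("big data", "fast data"),
  node2 = c("big engine"),
  node3 = c("fast engine", "big data")
)

# Map step, run independently on each node's own lines.
map_node <- function(lines) {
  words <- unlist(strsplit(lines, "[[:punct:][:space:]]+"))
  words <- words[words != ""]
  data.frame(word = words, cnt = rep(1, length(words)),
             stringsAsFactors = FALSE)
}
mapped <- lapply(node_data, map_node)

# Shuffle: gather the pairs from all nodes into one table.
all_pairs <- do.call(rbind, mapped)

# Reduce: one total per word.
totals <- tapply(all_pairs$cnt, all_pairs$word, sum)
print(totals)
```

Because the map step never looks across nodes, the same function works unchanged whether the data sits in one block or many.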

Demo 2

TERR MapReduce from Spotfire
- Parameterize MapReduce, generate and edit MapReduce code, test locally, I/O from Spotfire
- Deploy through Hadoop Streaming
- MapReduce interface from/to Spotfire
- Receive analysis results directly back into Spotfire for visualisation and further analysis

Contact
Thank you!
Ana Costa e Silva, PhD, Senior Data Scientist
ansilva@tibco.com
TERR Developer Edition: www.tibcommunity.com/community/products/analytics/terr