Revolution R Enterprise: Efficient Predictive Analytics for Big Data
1 Revolution R Enterprise: Efficient Predictive Analytics for Big Data Prepared for The Bloor Group August 2014 Bill Jacobs Director Product Marketing / Field CTO - Big Data Products
2 Revolution R Enterprise: Why R? Who is Revolution Analytics?
3 What is R?
- Most widely used data analysis software: used by 2M+ data scientists, statisticians and analysts
- Most powerful statistical programming language: flexible, extensible and comprehensive for productivity
- Create beautiful and unique data visualizations: as seen in New York Times, Twitter and Flowing Data
- Thriving open-source community: leading edge of analytics research
- Fills the talent gap: new graduates prefer R
White paper: R is Hot, bit.ly/r-is-hot
4 Graphics and Data Visualization
- Functions for standard graphs: scatterplot, time series, histogram, smoothing, bar plot, pie chart, dot chart, image plot, 3-D surface, map
- Influences from Cleveland, Tufte etc.: conditioning, small multiples, use of color
- Customize without limits: combine graph types, create entirely new graphics
5 Breadth of R Community Collaboration: CRAN Task Views (subset of packages) Source:
6 Exploding growth and demand for R
R Usage Growth: Rexer Data Miner Survey, % of data miners report using R
R is the first choice of more data miners than any other software. Source:
- R is the highest paid IT skill (Dice.com, Jan 2014)
- R most-used data science language after SQL (O'Reilly, Jan 2014)
- R is used by 70% of data miners (Rexer, Sep 2013)
- R is #15 of all programming languages (RedMonk, Jan 2014)
- R growing faster than any other data science language (KDnuggets, Aug 2013)
- More than 2 million users worldwide
7 OUR COMPANY: The leading provider of advanced analytics software and services based on open source R, since 2007 (Mountain View, London, Singapore)
OUR SOFTWARE: The only Big Data, Big Analytics software platform based on the data science language R
SOME KUDOS: Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms,
8 Technical Support for Open Source R: AdviseR from Revolution Analytics
Technical support for open source R, from the R experts.
- 24x7 and phone support
- On-line case management and knowledgebase
- Access to technical resources, documentation and user forums
- Exclusive on-line webinars from community experts
- Guaranteed response times
R Support: 12 incidents, 12 months, $3000 per user
9 Big Data Big Analytics is different
10 R: Open Source that Drives Innovation, but It Has Some Limitations for Enterprises
- Big Data: In-memory bound / Hybrid memory & disk scalability / Operates on bigger volumes & factors
- Speed of Analysis: Single threaded / Parallel threading / Shrinks analysis time
- Enterprise Readiness: Community support / Commercial support / Delivers full service production support
- Analytic Breadth & Depth: Innovative analytic packages / Leverage open source packages plus Big Data ready packages / Supercharges R
- Commercial Viability: Risk of deployment of open source / Commercial license / Eliminate risk with open source
11 The Goal: Desktop Users with Analytical Access to Huge Data in Big Data Platforms. Predictive Modeling Algorithms, Big Data
12 Goal #1: Simplicity. One Call Does It All
rxSetComputeContext(RxHadoopMR( ))
arrDelayLm1 <- rxLinMod( )
In Revolution R Enterprise:
1. Starts a Master Process
2. Distributes Work
   - Threading? Cores? Sockets? Nodes?
   - Available RAM?
   - Location of Data?
3. Master Tasks for Each Node
4. Master Initiates Distributed Work
   - Hadoop Schedules Mapper for Each Split
   - Algorithm Computes Intermediate Result
   - Reducer Combines Intermediate Results
5. Master Process Evaluates Completion
6. Returns Consolidated Answer to Script
Diagram labels: Initiator, Task, Task, Task, Finalizer, Big Data
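The map, combine and finalize flow listed on this slide can be sketched for a one-predictor linear regression. This is a language-neutral illustration in plain Python, not Revolution's implementation and not the RevoScaleR API; the function names (`map_split`, `combine`, `finalize`, `fit_distributed`) and the toy data are assumptions made for the example. Each split is reduced to a handful of sums, the sums are added together, and the coefficients are solved once at the end.

```python
from functools import reduce


def map_split(rows):
    """Mapper: reduce one data split to its sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reducer: add two sets of intermediate results element by element."""
    return tuple(p + q for p, q in zip(a, b))


def finalize(stats):
    """Finalizer: solve the normal equations from the combined sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_distributed(splits):
    """Master: run one mapper per split, combine, then finalize."""
    partials = [map_split(split) for split in splits]
    return finalize(reduce(combine, partials))


# Hypothetical data lying exactly on y = 1 + 2x, stored as three splits.
splits = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
intercept, slope = fit_distributed(splits)
```

Only the five sums travel between the mappers and the master, so the amount of data moved does not grow with the number of rows in each split.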
13 Goal #2: Scale. Analyze Data That Far Exceed Memory Available. Predictive Modeling, Big Data
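One common way to analyze data larger than memory is to process it in chunks and merge per-chunk summaries. The sketch below shows this for the mean and sample variance using the standard pairwise merge formula; it is an assumption-laden illustration in plain Python, not how ScaleR is implemented, and `read_in_chunks` is a hypothetical stand-in for reading blocks of rows from disk.

```python
def chunk_stats(chunk):
    """Summarize one chunk as (count, mean, sum of squared deviations)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((v - mean) ** 2 for v in chunk)
    return n, mean, m2


def merge(a, b):
    """Merge two summaries without revisiting the underlying rows."""
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2


def streaming_mean_variance(chunks):
    """Fold over chunks, holding only one chunk in memory at a time."""
    total = None
    for chunk in chunks:
        if not chunk:
            continue
        part = chunk_stats(chunk)
        total = part if total is None else merge(total, part)
    if total is None:
        raise ValueError("no data")
    n, mean, m2 = total
    variance = m2 / (n - 1) if n > 1 else 0.0
    return mean, variance


def read_in_chunks(values, size):
    """Hypothetical reader: yields fixed-size blocks, as a file reader would."""
    for i in range(0, len(values), size):
        yield values[i:i + size]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance = streaming_mean_variance(read_in_chunks(data, 3))
```

Because `merge` is associative, the same summaries could also be computed on separate nodes and combined afterwards.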
14 Goal #3: Manageability: Big Data Means One Repository with Many Users. Other Workloads, Predictive Modeling, YARN, Teradata, Other Workloads
15 Goal #5: Stability: Invest in R with Confidence ?
16 Goal #6: Reach: Deploy Analytics to Enterprise Audiences. Information Users, Business Analysts, IT Pros
17 Goal #7: Portability. Write Once, Deploy Anywhere (well, almost anywhere)
18 Revolution R Enterprise What is in Revolution R Enterprise? How does it perform?
19 Introducing Revolution R Enterprise (RRE): The Big Data Big Analytics Platform
Big Data Big Analytics Ready: Enterprise readiness, High performance analytics, Multi-platform architecture, Data source integration, Development tools, Deployment tools
Components: R+CRAN, RevoR, ConnectR, ScaleR, DistributedR, DevelopR, DeployR
20 The Platform Step by Step: R Capabilities
R+CRAN
- Open source R interpreter (UPDATED R)
- Freely-available R algorithms
- Algorithms callable by RevoR
- Embeddable in R scripts
- 100% compatible with existing R scripts, functions and packages
RevoR
- Performance enhanced R interpreter
- Based on open source R
- Adds high-performance math
Available on: IBM Platform LSF Linux, Microsoft HPC Clusters, Windows & Linux Servers, Windows & Linux Workstations, IBM Netezza, NEW Cloudera Hadoop, NEW Hortonworks Hadoop, NEW Teradata Database, Intel Hadoop, IBM BigInsights
21 Revolution R Enterprise: RevoR - Performance Enhanced R
Customers report 3-50x performance improvements with Revolution R Enterprise compared to Open Source R, without changing any code
Computation (4-core laptop): Open Source R / Revolution R / Speedup
Linear Algebra [1]
- Matrix Multiply: 176 sec / 9.3 sec / 18x
- Cholesky Factorization: 25.5 sec / 1.3 sec / 19x
- Linear Discriminant Analysis: 189 sec / 74 sec / 3x
General R Benchmarks [2]
- R Benchmarks (Matrix Functions): 22 sec / 3.5 sec / 5x
- R Benchmarks (Program Control): 5.6 sec / 5.4 sec / Not appreciable
22 The Platform Step by Step: Parallelization & Data Sourcing
ScaleR: Ready-to-use high-performance big data big analytics
- Fully-parallelized analytics
- Data prep & data distillation
- Descriptive statistics & statistical tests
- Correlation & covariance matrices
- Predictive models: linear, logistic, GLM
- Machine learning
- Monte Carlo simulation
- Tools for distributing customized algorithms across nodes
ConnectR: High-speed & direct connectors. Available for:
- High-performance XDF
- SAS, SPSS, delimited & fixed format text data files
- Hadoop HDFS (text & XDF)
- Teradata Database & Aster
- EDWs and ADWs
- ODBC
DistributedR: Distributed computing framework that delivers portability across platforms. Available on:
- Windows Servers
- Red Hat and SuSE Linux Servers
- IBM Platform LSF Linux
- Microsoft HPC Clusters
- Teradata Database
- Cloudera Hadoop
- Hortonworks Hadoop
- NEW MapR Hadoop
23 Revolution R Enterprise ScaleR: High Performance Big Data Analytics. R Data Step, Descriptive Statistics, Statistical Tests, Sampling, Predictive Modeling, Data Visualization, Machine Learning, Simulation
24 Revolution R Enterprise 7.0 ScaleR Parallelized Algorithms
Data Step
- Data import: Delimited, Fixed, SAS, SPSS, ODBC
- Variable creation & transformation
- Recode variables
- Factor variables
- Missing value handling
- Sort, Merge, Split
- Aggregate by category (means, sums)
Descriptive Statistics
- Min / Max, Mean, Median (approx.)
- Quantiles (approx.)
- Standard Deviation
- Variance
- Correlation
- Covariance
- Sum of Squares (cross product matrix for set variables)
- Pairwise Cross tabs
- Risk Ratio & Odds Ratio
- Cross-Tabulation of Data (standard tables & long form)
- Marginal Summaries of Cross Tabulations
Statistical Tests
- Chi Square Test
- Kendall Rank Correlation
- Fisher's Exact Test
- Student's t-test
Sampling
- Subsample (observations & variables)
- Random Sampling
Predictive Models
- Sum of Squares (cross product matrix for set variables)
- Multiple Linear Regression
- Generalized Linear Models (GLM). Exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions.
- Covariance & Correlation Matrices
- Logistic Regression
- Classification & Regression Trees
- Predictions/scoring for models
- Residuals for all models
Variable Selection
- Stepwise Regression: Linear, Logistic and GLM
Simulation
- Monte Carlo
- Parallel Random Number Generation
Cluster Analysis
- K-Means
Classification
- Decision Trees
- Decision Forests
Combination
- Using Revolution rxDataStep and rxExec functions to combine open source R with Revolution R
- Works with Open Source R v3
Confidential to Revolution Analytics.
25 Revolution R Enterprise ScaleR Outperforms SAS HPA at a Fraction of the Cost
Logistic Regression: SAS HPA* / Comparison / Revolution R Enterprise
- Rows of data: 1 billion / (same) / 1 billion
- Parameters: just a few / Double / 7
- Time: 80 seconds / 45% / 44 seconds
- Data location: In memory / On disk
- Nodes: 32 / 1/6th / 5
- Cores: 384 / 5% / 20
- RAM: 1,536 GB / 5% / 80 GB
Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.
Bottom Line: Revolution R Enterprise Performance = Greatly Reduced TCO
*As published by SAS in HPC Wire, April 21,
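The ratios quoted on this slide can be checked with a few lines of arithmetic. The figures below are copied from the slide; the dictionary names are illustrative only, and the mapping of the first column to SAS and the second to Revolution R Enterprise follows the slide's summary sentence.

```python
# Figures as transcribed from the slide (first column vs. second column).
sas = {"nodes": 32, "cores": 384, "ram_gb": 1536, "seconds": 80}
rre = {"nodes": 5, "cores": 20, "ram_gb": 80, "seconds": 44}

# Resource use of the second system as a fraction of the first.
ratios = {key: rre[key] / sas[key] for key in sas}

# Fraction of run time saved: 1 - 44/80.
time_saved = 1 - ratios["seconds"]

for key, value in ratios.items():
    print(f"{key}: {value:.3f}")
print(f"time saved: {time_saved:.2f}")
```

The cores and RAM ratios both come to about 0.052, which matches the slide's "5%" and "a 20th"; nodes come to about 0.156, close to one sixth; and the run time is 45% shorter.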
26 DistributedR: Platform Abstraction for ScaleR
- DistributedR Distributed Files: Cloudera Hadoop, Hortonworks Hadoop, MapR Hadoop. Under development: Spark
- DistributedR MPI Based: IBM Platform LSF Linux, Microsoft HPC Clusters
- DistributedR Local Files: Red Hat Linux, SuSE Linux Servers, Windows Server
- DistributedR SQL: Teradata Database
27 Transparent Scaling of R Analytics on Hadoop
- Run RRE Analytics In Hadoop Without Change
- Eliminate Need To Design Parallel Software or Think In MapReduce
- Leverage All Revolution R Enterprise Pre-Parallelized Algorithms
- Enable Users To Build Custom Apps That Leverage Hadoop's Parallelism
- Slash Data Movement by Analyzing HDFS Data In Place
- Expand Deployment and Integration Options
Diagram labels: Revolution R Enterprise Desktops & Servers, Hadoop Cluster, Revolution R Enterprise direct web services, Corporate Applications, Edge or Worker Node, Data Nodes
28 Simplicity Goal: Leverage Teradata Database As A Parallel R Engine
- Run RRE Code Without Change with Teradata as Engine
- Eliminate Need To Design Parallel Software or Write UDFs
- Leverage All Revolution R Enterprise Pre-Parallelized Algorithms
- Enable Users To Build Custom Apps That Leverage Teradata MPP
- Slash Data Movement by Analyzing Data Inside the Database
- Expand Deployment and Integration Options
29 Support Multiplatform Deployments
- Data Repository: Refining, Exploration & Profiling
- Development Environment: Model Development & Testing
- Operations: Application Deployment
30 Write Once. Deploy Anywhere.
- Hadoop: Hortonworks, Cloudera
- EDW: IBM Netezza, Teradata
- Clustered Systems: IBM Platform LSF, Microsoft HPC
- Workstations & Servers: Desktop, Server
- In the Cloud: Amazon AWS
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
31 Goal #6: Reach: Enterprise Integration & Deployment. Information Users, Business Analysts, IT Pros
32 The Platform Step by Step: Tools & Deployment
DevelopR
- Integrated development environment for R
- Visual step-into debugger
- Available on: Windows
DeployR
- Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
- Integrates R into application infrastructures
Capabilities:
- Invokes R scripts from web services calls
- RESTful interface for easy integration
- Works with web & mobile apps, leading BI & visualization tools and business rules engines
33 DevelopR Integrated Development Environment
- Script with type ahead and code snippets
- Solutions window for organizing code and data
- Sophisticated debugging with breakpoints, variable values etc.
- Objects loaded in the R Environment
- Packages installed and loaded
- Object details
34 RRE DeployR: Integration with 3rd Party Software
Diagram labels: R / Statistical Modeling Expert, DeployR, Deployment Expert, Data Analysis, Business Intelligence, Mobile, Web Apps, Cloud / SaaS
- Seamless: Bring the power of R to any web enabled application
- Simple: Leverage common APIs including JS, Java, .NET
- Scalable: Robustly scale user and compute workloads
- Secure: Manage enterprise security with LDAP & SSO
35 The Power of Revolution R Enterprise: Performance & Scalability In-Hadoop Execution ScaleR Moves computation to data In-Database Execution ScaleR Moves computation to data Parallelized User Code ScaleR Leverage CRAN Value Parallelized Algorithms Grid Processing Multi-Threaded Execution Memory Management ScaleR DistributedR DistributedR DistributedR Labor saving power Maximizes computation Powerful divide & conquer Effective memory utilization Fast Math Libraries RevoR 3-50X faster R + CRAN Open Source Leverage latest innovation
36 Revolution R Enterprise Pre-Parallelized Algorithms vs. Manual Approaches
37 Parallelizing Algorithms By Hand: Illustrated Using K-Means Clustering. The Goal: Compute the Algorithm on Distributed Data. Compute Algorithm
38 Parallelizing Algorithms By Hand: Illustrated Using K-Means Clustering. The Goal: Compute the Algorithm on Distributed Data. Compute Algorithm. The Fun Part: Re-Factor the Algorithm For Distributed Data. Distributed K-Means Requires About 6 Steps
39 Parallelizing Algorithms By Hand: Illustrated Using K-Means Clustering. The Goal: Compute the Algorithm on Distributed Data. Compute Algorithm. The Fun Part: Re-Factor the Algorithm For Distributed Data. Distributed K-Means Requires About 6 Steps. Then Knit Everything Together
40 Parallelizing Algorithms By Hand: Illustrated Using K-Means Clustering. The Goal: Compute the Algorithm on Distributed Data. Compute Algorithm. The Fun Part: Re-Factor the Algorithm For Distributed Compute. Distributed K-Means Requires About 6 Steps. The Tedious Part: Then Knit Everything Together. Counting All the Steps, Efficient Distributed Computation of the K-Means Clustering Algorithm Requires Approximately 13 Steps: 1 Initiator, 6 Steps, 5 Combiners and 1 Finalizer. And the result runs on a single platform.
41 Parallelizing Algorithms By Hand: Illustrated Using K-Means Clustering
The Goal: Compute the K-Means Clustering Algorithm
Manual Parallelization: RHadoop, IOTools, Hadoop Streaming
1) Refactor algorithm math for parallelism
2) Write 13 components
3) Build the Algorithm
4) Test and Debug
5) Document
6) Optimize
7) Use
- Or -
Automatic Parallelization: With Revolution R Enterprise
Use the algorithm from day one
+ Custom Algorithms. + Multiple Platforms.
And the result runs on all ScaleR platforms
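To give a feel for what the manual route involves, the sketch below refactors K-means into initiator, mapper, combiner and finalizer pieces that work on data held in separate splits. It is a simplified single-process illustration in plain Python, not the 13-component design the slides describe and not Revolution's implementation; all function names and the toy data are assumptions.

```python
import math


def initiator(points, k):
    """Choose starting centroids (first k points, deterministic for the demo)."""
    return [tuple(p) for p in points[:k]]


def mapper(split, centroids):
    """Per split: assign points to the nearest centroid, keep sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in split:
        j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def combiner(a, b):
    """Add the partial sums and counts from two splits."""
    sums_a, counts_a = a
    sums_b, counts_b = b
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sums_a, sums_b)]
    counts = [x + y for x, y in zip(counts_a, counts_b)]
    return sums, counts


def finalizer(partial, old_centroids):
    """Turn combined sums into new centroids; keep the old one if a cluster is empty."""
    sums, counts = partial
    return [
        tuple(s / c for s in row) if c else old_centroids[j]
        for j, (row, c) in enumerate(zip(sums, counts))
    ]


def distributed_kmeans(splits, k, max_iter=20, tol=1e-9):
    """Master loop: map every split, combine, finalize, repeat until stable."""
    centroids = initiator(splits[0], k)
    for _ in range(max_iter):
        partials = [mapper(split, centroids) for split in splits]
        total = partials[0]
        for part in partials[1:]:
            total = combiner(total, part)
        new_centroids = finalizer(total, centroids)
        shift = max(math.dist(a, b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


# Hypothetical data: two well separated groups spread over three splits.
splits = [
    [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0)],
    [(1.0, 0.0), (10.0, 11.0)],
    [(11.0, 10.0), (1.0, 1.0), (11.0, 11.0)],
]
centroids = distributed_kmeans(splits, k=2)
```

Even this toy version needs four cooperating pieces plus a driver loop, and it still leaves out scheduling, data locality and failure handling, which is the overhead the slide contrasts with calling a pre-parallelized algorithm.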
42 Revolution R Enterprise and Big Data Platforms: How Does RRE Play Inside Hadoop? How Does RRE Play Inside the Teradata Database?
43 Parallel Algorithms Running Hadoop: Transparent Distribution of Computation. Diagram labels: Revolution R Enterprise Desktops & Servers, Hadoop Cluster, Data Nodes, Edge Node, Master Process, Reducer, Mapper, Mapper, Mapper, Mapper
44 How Does It Work? RRE in Teradata Database. Diagram labels: Revolution R Enterprise Desktops & Servers, ODBC, Teradata Database, Parse Engine, External Stored Procedure, Database Nodes, AMPs, Table Operator, Table Operator, Table Operator, Table Operator, Hybrid Storage, Data Segments
45 Revolution Analytics Support Offerings Support and Training Products Quick Start Programs
46 Customer Support: Customer Support is included with Subscription
Standard Support
- Today: 9-5pm PT telephone, , web. Also, via a support forum
- Q4 '13: Standard 9-6pm Regional Hours
Premium Support
- Today: Premium Support as Blocks On-Demand Consulting
- Q1 '14: 24x7x365 Support for Severity 1 Issues
- Q1 '14: Technical Account Management
47 Revolution Analytics Professional Services
- Training: Comprehensive Topics; Self Paced & Classroom; Customizable
- Consulting: Remote & On site; Projects & Staff Aug
- Quick Start Programs: Entire project lifecycle support; Bundle them together
48 Training: Onsite, Online, Custom
49 Quick Start Options
50 Revolution R Enterprise Ecosystem: Power of Integration. SI / Service, Deployment / Consumption, MSP / DSP, Advanced Analytics, ETL, Corios, Data / Infrastructure
51 Big Data Analytics Applications Our Customers and Prospects Are Attacking
- Marketing: Clickstream & Campaign Analyses
- Commerce: Recommendation Engines
- Social Media: Sentiment Analysis
- Behavior Prediction: Purchase Prediction
- Outlier Detection: Fraud, Waste and Abuse
- Risk Analysis: Insurance Underwriting
- Machine Telemetry: Predictive Maintenance
- Operations: Automotive Warranty Optimization
- Econometrics: Market Prediction
- Product Marketing: Mix and Price Optimization
- Life Sciences: Treatment Outcome Prediction
- Drug Development: Pharmacogenetics
- Transportation: Asset Utilization
52 Revolution Analytics - Overview We are the only provider of a commercial analytics platform based on the open source R statistical computing language. Stable, scalable multi-platform with world-wide support Enterprise Readiness Power Productivity Distributed, high performance analytical algorithms Easier to build and deploy analytic applications Professional services enablement World Wide Support Teams Standard and Premium Programs Technical Account Managers Customer Success Managers Professional Services Architecture planning Systems Integration Advanced analytic applications Full life cycle projects
53 Why Customers Buy Revolution R Enterprise: Powers R for the Enterprise; Supercharge R with Big Data Scale; Enabling Analytics Across the Big Data Platforms; Lower Cost & Risk of Big Analytics
High Performance Predictive Analytics in R and Hadoop:
High Performance Predictive Analytics in R and Hadoop: Achieving Big Data Big Analytics Presented by: Mario E. Inchiosa, Ph.D. US Chief Scientist August 27, 2013 1 Polling Questions 1 & 2 2 Agenda Revolution
Decision Trees built in Hadoop plus more Big Data Analytics with Revolution R Enterprise
Decision Trees built in Hadoop plus more Big Data Analytics with Revolution R Enterprise Revolution Webinar April 17, 2014 Mario Inchiosa, US Chief Scientist [email protected] All
In-Database Analytics Deep Dive with Teradata and Revolution R
In-Database Analytics Deep Dive with Teradata and Revolution R Mario Inchiosa Chief Scientist, Revolution Analytics Tim Miller Partner Integration Lab, Teradata Agenda Introduction Revolution R Enterprise
Revolution R Enterprise
Revolution R Enterprise Michele Chambers Chief Strategy Officer & VP Product Management @ Revolution Analytics Bill Franks Chief Analytics Officer @ Teradata Agenda Emerging Big Data Analytic Patterns
Using Microsoft R Server to Address Scalability Issues
Using Microsoft R Server to Address Scalability Issues February 4th, 2016 - Welcome! R What is it? Open Source lingua franca Global Community Ecosystem Can be Scaled to Big Data, Big Analytics Analytics,
R and Hadoop: Architectural Options. Bill Jacobs VP Product Marketing & Field CTO, Revolution Analytics @bill_jacobs
R and Hadoop: Architectural Options Bill Jacobs VP Product Marketing & Field CTO, Revolution Analytics @bill_jacobs Polling Question #1: Who Are You? (choose one) Statistician or modeler who uses R Other
R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015
R Tools Evaluation A review by Analytics @ Global BI / Local & Regional Capabilities Telefónica CCDO May 2015 R Features What is? Most widely used data analysis software Used by 2M+ data scientists, statisticians
Find the Hidden Signal in Market Data Noise
Find the Hidden Signal in Market Data Noise Revolution Analytics Webinar, 13 March 2013 Andrie de Vries Business Services Director (Europe) @RevoAndrie [email protected] Agenda Find the Hidden
SQL Server 2016. Everything built-in. Csom Gergely Microsoft Adat platform szakértő
SQL Server 2016 Everything built-in Csom Gergely Microsoft Adat platform szakértő SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230 80 70 60 50 43 69 49 SQL Server
Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud
Laurence Liew General Manager, APAC Economics Is Driving Big Data Analytics to the Cloud Big Data 101 The Analytics Stack Economics of Big Data Convergence of the 3 forces Big Data Analytics in the Cloud
Delivering Value from Big Data with Revolution R Enterprise and Hadoop
Executive White Paper Delivering Value from Big Data with Revolution R Enterprise and Hadoop Bill Jacobs, Director of Product Marketing Thomas W. Dinsmore, Director of Product Management October 2013 Abstract
Building and Deploying Customer Behavior Models
Building and Deploying Customer Behavior Models February 20, 2014 David Smith, VP Marketing and Community, Revolution Analytics Paul Maiste, President and CEO, Lityx In Today s Webinar About Revolution
How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) (
White Paper Revolution R Enterprise: Faster Than SAS Benchmarking Results by Thomas W. Dinsmore and Derek McCrae Norton In analytics, speed matters. How much? We asked the director of analytics from a
Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
Modern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
Greenplum Database. Getting Started with Big Data Analytics. Ofir Manor Pre Sales Technical Architect, EMC Greenplum
Greenplum Database Getting Started with Big Data Analytics Ofir Manor Pre Sales Technical Architect, EMC Greenplum 1 Agenda Introduction to Greenplum Greenplum Database Architecture Flexible Database Configuration
Technical Paper. Performance of SAS In-Memory Statistics for Hadoop. A Benchmark Study. Allison Jennifer Ames Xiangxiang Meng Wayne Thompson
Technical Paper Performance of SAS In-Memory Statistics for Hadoop A Benchmark Study Allison Jennifer Ames Xiangxiang Meng Wayne Thompson Release Information Content Version: 1.0 May 20, 2014 Trademarks
APPROACHABLE ANALYTICS MAKING SENSE OF DATA
APPROACHABLE ANALYTICS MAKING SENSE OF DATA AGENDA SAS DELIVERS PROVEN SOLUTIONS THAT DRIVE INNOVATION AND IMPROVE PERFORMANCE. About SAS SAS Business Analytics Framework Approachable Analytics SAS for
HDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Understanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Platfora Big Data Analytics
Platfora Big Data Analytics ISV Partner Solution Case Study and Cisco Unified Computing System Platfora, the leading enterprise big data analytics platform built natively on Hadoop and Spark, delivers
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Cisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
HIGH PERFORMANCE ANALYTICS FOR TERADATA
F HIGH PERFORMANCE ANALYTICS FOR TERADATA F F BORN AND BRED IN FINANCIAL SERVICES AND HEALTHCARE. DECADES OF EXPERIENCE IN PARALLEL PROGRAMMING AND ANALYTICS. FOCUSED ON MAKING DATA SCIENCE HIGHLY PERFORMING
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
Apache Hadoop's Role in Your Big Data Architecture
Apache Hadoop's Role in Your Big Data Architecture Chris Harris EMEA, Hortonworks [email protected] Twi
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics
Predictive Analytics Powered by SAP HANA Cary Bourgeois Principal Solution Advisor Platform and Analytics Agenda Introduction to Predictive Analytics Key capabilities of SAP HANA for in-memory predictive
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
Architectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
whitepaper Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R
Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R Table of Contents 3 Predictive Analytics with TIBCO Spotfire 4 TIBCO Spotfire Statistics Services 8 TIBCO Enterprise Runtime
IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.
Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major
Data Integration Checklist
The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media
An In-Depth Look at In-Memory Predictive Analytics for Developers
September 9 11, 2013 Anaheim, California An In-Depth Look at In-Memory Predictive Analytics for Developers Philip Mugglestone SAP Learning Points Understand the SAP HANA Predictive Analysis library (PAL)
Introducing Oracle Exalytics In-Memory Machine
Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle
Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R
Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R PREDICTIVE ANALYTICS WITH TIBCO SPOTFIRE TIBCO Spotfire is the premier data discovery and analytics platform, which provides
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
Hadoop & SAS Data Loader for Hadoop
Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle
Presenters: Luke Dougherty & Steve Crabb
Presenters: Luke Dougherty & Steve Crabb About Keylink Keylink Technology is Syncsort s partner for Australia & New Zealand. Our Customers: www.keylink.net.au 2 ETL is THE best use case for Hadoop. ShanH
Netezza and Business Analytics Synergy
Netezza Business Partner Update: November 17, 2011 Netezza and Business Analytics Synergy Shimon Nir, IBM Agenda Business Analytics / Netezza Synergy Overview Netezza overview Enabling the Business with
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
Building your Big Data Architecture on Amazon Web Services
Building your Big Data Architecture on Amazon Web Services Abhishek Sinha @abysinha AWS Services Deployment & Administration Application Services Compute Storage Database Networking
Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary Introduction Defining Big Data The Importance of Big Data Building a Big Data Platform Infrastructure Requirements Solution Spectrum Oracle's Big Data
Open Source in Financial Services: Meet the challenges of new business models and disruption
Open Source in Financial Services: Meet the challenges of new business models and disruption Speakers Vamsi Chemitiganti, General Manager Financial Services, Hortonworks Josh West, Senior Solutions Architect,
Harnessing the power of advanced analytics with IBM Netezza
IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced
ANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyright 2012, SAS Institut
ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE
ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE Big Data Big Data What tax agencies are or will be seeing! Big Data Large and increased data volumes New and emerging
A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda: Mapping clients' needs to cloud technologies; Addressing your pain
WHAT'S NEW IN SAS 9.4
WHAT'S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT'S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support
How to Use HP Vertica OnDemand
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform
SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform David Lawler, Oracle Senior Vice President, Product Management and Strategy Paul Kent, SAS Vice President, Big Data What
SAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SAP @cgadalla SESSION CODE: 603
SAP Predictive Analytics: An Overview and Roadmap Charles Gadalla, SAP @cgadalla SESSION CODE: 603 Advanced Analytics SAP Vision Embed Smart Agile Analytics into Decision Processes to Deliver Business
Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.
Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC's Big Data Analytics (July 2010) EMC ACQUIRES GREENPLUM For three years,
From Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What's in Store For This Presentation? 1. MemSQL: A real-time database for
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?
HPC2012 Workshop Cetraro, Italy Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy? Bill Blake CTO Cray, Inc. The Big Data Challenge Supercomputing minimizes data
The Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
Using Tableau Software with Hortonworks Data Platform
Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data
The Enterprise Data Hub and The Modern Information Architecture
The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
2015 Ironside Group, Inc. 2
2015 Ironside Group, Inc. 2 Introduction to Ironside What is Cloud, Really? Why Cloud for Data Warehousing? Intro to IBM PureData for Analytics (IPDA) IBM PureData for Analytics on Cloud Intro to IBM dashdb
Addressing Open Source Big Data, Hadoop, and MapReduce limitations
Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing Hadoop distributions Going enterprise with Hadoop 2 How Big are Data?
2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 Hadoop has been widely embraced for
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
Data Warehouse as a Service. Lot 2 - Platform as a Service. Version: 1.1, Issue Date: 05/02/2014. Classification: Open
Data Warehouse as a Service Version: 1.1, Issue Date: 05/02/2014 Classification: Open ii MDS Technologies Ltd 2014. Other than for the sole purpose of evaluating this Response, no
The Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. @OrionGM The Inside Scoop
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Using an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 In today's competitive world, businesses
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
Making big data simple with Databricks
Making big data simple with Databricks We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Data 75% Share of Spark code contributed by Databricks in 2014 Value Created
Bringing the Power of SAS to Hadoop. White Paper
White Paper Bringing the Power of SAS to Hadoop Combine SAS World-Class Analytic Strength with Hadoop's Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities Contents Introduction... 1 What
Ganzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
Cisco IT Hadoop Journey
Cisco IT Hadoop Journey Srini Desikan, Program Manager IT 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop's place in IT Data Platforms Use Cases
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
Datenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
