Technical Paper: Performance of SAS In-Memory Statistics for Hadoop: A Benchmark Study
Allison Jennifer Ames, Xiangxiang Meng, Wayne Thompson


Release Information
Content Version: 1.0, May 20, 2014

Trademarks and Patents
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Contents
Executive Summary
Introduction
Construction of Proxy Data
Benchmark Methods
    Computing Environment
    Benchmark Tasks
Results
Conclusion
References

Executive Summary

A recent benchmark study by Revolution Analytics included claims such as "ScaleR outperformed SAS on every task" and "ScaleR ran the tasks 42 times faster than SAS" (Dinsmore & Norton, 2014). However, that study compared Revolution R Enterprise's (RRE) Parallel External Memory Algorithms, a distributed process, with SAS procedures that were not run in distributed mode. To make a fairer comparison, this benchmark study ran the tasks in a distributed analytic environment. That is, we constructed a data set of identical size to the one used in the Revolution Analytics benchmark and ran the same tasks using SAS In-Memory Statistics for Hadoop™ (PROC IMSTAT) on a cluster with the same number of nodes as the hardware used in the Revolution Analytics benchmark. Results indicate:

- With 5 million observations and 134 columns, PROC IMSTAT took a total of […] seconds to complete all tasks. In comparison, RRE7 completed in […] seconds. Thus, Revolution Analytics RRE7 took 8.7 times as long to run the same set of tasks as PROC IMSTAT.
- The individual tasks took from 2.8 times to 40 times as long to run in RRE7 as in PROC IMSTAT.
- In all instances, PROC IMSTAT outperformed the reported RRE7 timings for both the 1 million and the 5 million observation data sets.
- Scoring a 50 million observation data set completed in 1.34 seconds. The comparable task in RRE7 took 21.5 times as long to complete.

Introduction

The context for this study begins at the Strata Conference on October 25, 2012, where the research and planning division of a large insurance corporation presented various methods that they used to model 150 million observations of insurance data. A summary of their presentation is available in Smith (2012). In this performance benchmark, Revolution Analytics asserted that their Parallel External Memory Algorithms (PEMA) resulted in vastly better performance for advanced analytics (Dinsmore & Norton, 2014). However, several readers voiced concern about the methodology used, and the validity of the claims made, by Revolution Analytics. These readers pointed out that the Revolution Analytics tests were run on clustered computing environments, but that the SAS benchmark tests were not.

In March 2014, Revolution Analytics undertook a follow-up benchmark study to make a fairer comparison by running the tests on the same hardware. The 2014 benchmark included hiring a SAS consultant to review the programs and enable them for Grid computing. The findings of this second Revolution Analytics benchmark included claims such as "ScaleR outperformed SAS on every task" and "ScaleR ran the tasks 42 times faster than SAS" (Dinsmore & Norton, 2014).

However, Dinsmore and Norton (2014) deployed SAS Release 9.4 with Base SAS, SAS/STAT, and SAS Grid Manager as the major components. They used a desktop machine running SAS Management Console and SAS Enterprise Guide as the Grid Client. Despite enabling the Grid, they compared SAS procedures running on a single node with distributed Revolution Analytics algorithms. In the one instance in which a distributed SAS procedure was compared (PROC HPREG), the SAS High-Performance Analytics Server was not used, so the benefits of the High-Performance procedures could not be fully realized.

While we applaud the attempt to make a fairer comparison between Revolution Analytics and SAS products, and Revolution Analytics' transparency in posting the SAS code used to run the procedures (posted at […]RevolutionAnalytics/Benchmark), the benchmark is still not an evaluation using comparable computing environments. The computing environments used in the 2014 Revolution Analytics benchmark remain dramatically different despite the intention to provide a fairer comparison.

Dinsmore and Norton (2014) concluded that SAS/STAT software was slower than RRE because of the way in which SAS/STAT swaps data between memory and disk when a data set is larger than memory, a process that can be slower than in-memory operations. In contrast, RRE uses Parallel External Memory Algorithms (PEMA) to distribute operations over multiple machines in a clustered architecture. When a data set is larger than the memory on any single machine, RRE distributes the data across all available computing resources rather than swapping to disk. This, Dinsmore and Norton (2014) claim, is the reason behind the vastly different timings.

A more fruitful and fairer comparison can be made between SAS distributed procedures and RRE distributed algorithms. The purpose of this benchmark is to make such a comparison. We generated a data set comparable to the one described in the 2014 Revolution Analytics benchmark and performed a set of tests using SAS LASR Analytic Server and SAS In-Memory Statistics for Hadoop™. The remainder of the paper discusses the construction of the proxy data, the SAS LASR Analytic Server and SAS In-Memory Statistics for Hadoop™, the benchmark procedures, the results, and the conclusions.

Construction of Proxy Data

Three data sets were generated to mimic the properties of those used in the Dinsmore and Norton (2014) study in terms of row and column counts. The data sets contain one million, five million, and 50 million rows, respectively. Each table contains 134 columns. All data generation was performed using the IMSTAT procedure on the SAS LASR Analytic Server.

Benchmark Methods

Computing Environment

SAS LASR Analytic Server is an in-memory analytics engine designed to address advanced analytics in a scalable manner. It provides secure, multiuser, concurrent access to data of any size. The SAS LASR Analytic Server is a dedicated, multipass analytical server. The SAS In-Memory Statistics for Hadoop™ procedure (PROC IMSTAT) moves all of the data into dedicated memory. The main advantage is being able to analyze all of the data in the shortest amount of time. The software is optimized for distributed, multithreaded architectures and scalable processing, so requests to run new scenarios or complex analytical computations are handled very quickly. This benchmark demonstrates just how fast some common analytical procedures can be performed. PROC IMSTAT uses in-memory analytics technology to perform analyses that range from data exploration, visualization, and descriptive statistics to model building with advanced statistical and machine learning algorithms and scoring new data.

Revolution Analytics used a clustered computing environment consisting of five four-core machines running CentOS, all networked using Gigabit Ethernet connections, and a separate NFS server. Revolution R Enterprise Release 7 (RRE7) was installed on each node. To make a valid comparison, all tasks run within PROC IMSTAT on the SAS LASR Analytic Server also used five nodes (one name node and four data nodes).

Benchmark Tasks

The set of tasks included in the benchmark is provided in Table 1.

Task | RRE 7 Capability | SAS PROC IMSTAT
Descriptive statistics (n, min, max, mean, std) on 1 numeric variable | rxsummary | summary
Median and deciles for 1 numeric variable | rxquantile | percentile
Frequency distribution for 1 text variable | rxcube | frequency
Linear regression with 1 numeric response and 20 numeric predictors, with score code generated | rxlinmod | glm
Linear regression with 1 numeric response and 10 numeric predictors and 10 categorical predictors | rxlinmod | glm
Stepwise linear regression with 100 numeric predictors | rxlinmod | --
Logistic regression with 1 binary response variable and 20 numeric predictors | rxlogit | logistic
Generalized linear model with numeric response variable, 20 numeric predictors, gamma distribution and link function | rxglm | genmodel
k-means clustering with 20 active variables | rxkmeans | cluster
k-means clustering with 100 active variables | rxkmeans | cluster

Table 1 Benchmark Tasks

For a more comprehensive discussion of the SAS LASR Analytic Server and SAS In-Memory Statistics for Hadoop™, please see the SAS LASR Analytic Server reference guide and the PROC IMSTAT documentation (SAS Institute Inc., 2014). An example script for computing frequencies in PROC IMSTAT is shown below.

    /* Start a SAS LASR Analytic Server instance on four nodes */
    proc lasr create port=&myport path="/tmp";
       performance nodes=4;
    run;

    /* Connect to the server through the SASIOLA engine */
    libname lasr sasiola port=&myport tag='work';

    /* Load the one-million-row table into memory */
    data lasr.data1m;
       set &data1m.;
    run;

    /* Compute a frequency distribution for one variable */
    proc imstat;
       table lasr.organics;
       frequency DemTVReg;
    run;

A distributioninfo statement provides information about how the data are spread across the nodes. Table 2 shows how the 5,000,175 rows of data are distributed across the nodes.

Nodes | Number of Partitions | Number of Records
node[…] | […] | […]
node[…] | […] | […]
node[…] | […] | […]
node[…] | […] | […]

Table 2 Distribution of 5 Million Observations Across 4 Nodes

Results

Table 3 shows the complete time-to-run results, in seconds, using the larger data set of five million records. PROC IMSTAT took a total of […] seconds to complete. This is in comparison to RRE7, which took […] seconds to complete. The RRE7 total is the sum of all times reported in Dinsmore and Norton (2014) minus the time for the stepwise linear regression task, because SAS In-Memory Statistics for Hadoop™ has yet to implement stepwise regression. Thus, Revolution Analytics RRE7 took 8.73 times as long to run the same set of tasks as PROC IMSTAT. The individual tasks took from 2.8 times to 40 times as long to run in RRE7 as in PROC IMSTAT. In all instances, PROC IMSTAT outperformed the reported RRE7 timings across a set of tasks representative of end-to-end life cycle analytics.

Task | RRE 7 | SAS PROC IMSTAT | How Much Faster is SAS?
Descriptive statistics (n, min, max, mean, std) on 1 numeric variable | […] | […] | […]x
Median and deciles for 1 numeric variable | […] | […] | […]x
Frequency distribution for 1 text variable | […] | […] | […]x
Linear regression with 1 numeric response and 20 numeric predictors, with score code generated | […] | […] | […]x
Linear regression with 1 numeric response and 10 numeric predictors and 10 categorical predictors | […] | […] | […]x
Stepwise linear regression with 100 numeric predictors | […] | |
Logistic regression with 1 binary response variable and 20 numeric predictors | […] | […] | […]x
Generalized linear model with numeric response variable, 20 numeric predictors, gamma distribution and link function | […] | […] | […]x
k-means clustering with 20 active variables | […] | […] | […]x
k-means clustering with 100 active variables | […] | […] | […]x

Table 3 Time to Run (Seconds)

Table 4 provides the overall time to run for both the 5 million observation and the 1 million observation data sets. Using the first linear regression model (with 20 numeric predictors), 50 million observations were scored using PROC IMSTAT in 1.34 seconds. A comparable task in RRE7 took 28.8 seconds, over 21 times as long.

Data Set Size | Total Time for Tasks
1 Million rows | […]
5 Million rows | […]

Table 4 Total Time to Run (Seconds)
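The "times as long" figure for the scoring task can be rechecked from the two timings quoted in the text (1.34 seconds for PROC IMSTAT and 28.8 seconds for RRE7). The Python sketch below is illustrative only: the helper name times_as_long is hypothetical and is not part of either product, and only the two quoted timings are used.

```python
# Scoring timings for 50 million observations, as quoted in the text (seconds).
IMSTAT_SCORING_SECONDS = 1.34
RRE7_SCORING_SECONDS = 28.8


def times_as_long(slower_seconds: float, faster_seconds: float) -> float:
    """Return how many times as long the slower run took (hypothetical helper)."""
    if faster_seconds <= 0:
        raise ValueError("faster_seconds must be positive")
    return slower_seconds / faster_seconds


ratio = times_as_long(RRE7_SCORING_SECONDS, IMSTAT_SCORING_SECONDS)
print(f"RRE7 took {ratio:.1f} times as long as PROC IMSTAT")
# prints "RRE7 took 21.5 times as long as PROC IMSTAT"
```

The result agrees with the "21.5 times as long" stated in the Executive Summary and with "over 21 times as long" here. The same ratio, applied to the two task totals, is how a figure such as 8.73 would be obtained.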

Conclusion

This study has attempted to make a benchmark comparison between SAS In-Memory Statistics for Hadoop™, a distributed computing environment, and the Revolution Analytics Grid distributed computing environment. Results show that SAS In-Memory Statistics for Hadoop™ ran every reported task faster than its Revolution Analytics counterpart. These results are in contrast to those reported in the 2014 benchmark by Dinsmore and Norton (2014). One reason for the conflicting results between the two benchmarks is that the Dinsmore and Norton (2014) benchmark used the Revolution Analytics distributed computing environment, PEMA, but contrasted its results with (a) SAS High-Performance procedures not run on the SAS High-Performance Analytics Server or (b) non-distributed procedures. This severely limited the comparability of the procedures.

One limitation of this study is that we were only able to use a proxy for the data set used in the Revolution Analytics benchmark. However, the data sizes (number of rows and columns) in the two studies were identical. A next step may include ensuring that the exact data generated by Revolution Analytics is used. Despite this limitation, we feel that the results of this study provide a clearer comparison between the two analytics solutions. If speed matters, as claimed by Dinsmore and Norton (2014), then SAS In-Memory Statistics for Hadoop™ provides a clear advantage for advanced analytics customers.

We would like to thank the SAS Enterprise Excellence Center and Business Intelligence Research and Development teams for their assistance in securing hardware assets and installing software for the tests performed in this benchmark study.

References

Dinsmore, Thomas, & Norton, Derek (2014). Revolution R Enterprise: Faster than SAS. Available at […]
SAS Institute Inc. (2014). SAS LASR Analytic Server 2.3: Reference Guide. Cary, NC: SAS Institute Inc. Available at […]
SAS Institute Inc. (2014). IMSTAT Procedure (Analytics). Cary, NC: SAS Institute Inc. Available at […]htm.
SAS Institute Inc. (2014). IMSTAT Procedure (Data and Server Management). Cary, NC: SAS Institute Inc. Available at […]opk.htm.
Smith, David (2012). Allstate compares SAS, Hadoop and R for Big-Data Insurance Models. Available at […]

To contact your local SAS office, please visit: sas.com/offices

Copyright 2014, SAS Institute Inc. All rights reserved.


Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Decision Trees built in Hadoop plus more Big Data Analytics with Revolution R Enterprise

Decision Trees built in Hadoop plus more Big Data Analytics with Revolution R Enterprise Decision Trees built in Hadoop plus more Big Data Analytics with Revolution R Enterprise Revolution Webinar April 17, 2014 Mario Inchiosa, US Chief Scientist mario.inchiosa@revolutionanalytics.com All

More information

R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015

R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015 R Tools Evaluation A review by Analytics @ Global BI / Local & Regional Capabilities Telefónica CCDO May 2015 R Features What is? Most widely used data analysis software Used by 2M+ data scientists, statisticians

More information

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software

Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse

More information

Working Together to Promote Business Innovations with Grid Computing

Working Together to Promote Business Innovations with Grid Computing IBM and SAS Working Together to Promote Business Innovations with Grid Computing A SAS White Paper Table of Contents Executive Summary... 1 Grid Computing Overview... 1 Benefits of Grid Computing... 1

More information

Revolution R Enterprise: Efficient Predictive Analytics for Big Data

Revolution R Enterprise: Efficient Predictive Analytics for Big Data Revolution R Enterprise: Efficient Predictive Analytics for Big Data Prepared for The Bloor Group August 2014 Bill Jacobs Director Product Marketing / Field CTO - Big Data Products bill.jacobs@revolutionanalytics.com

More information

Make Better Decisions with Optimization

Make Better Decisions with Optimization ABSTRACT Paper SAS1785-2015 Make Better Decisions with Optimization David R. Duling, SAS Institute Inc. Automated decision making systems are now found everywhere, from your bank to your government to

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment Technical Paper Moving SAS Applications from a Physical to a Virtual VMware Environment Release Information Content Version: April 2015. Trademarks and Patents SAS Institute Inc., SAS Campus Drive, Cary,

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances IBM Software Business Analytics Cognos Business Intelligence IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances 2 IBM Cognos 10: Enhancing query processing performance for

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

In-Database Analytics

In-Database Analytics Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing

More information

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5

More information

High-Performance Analytics

High-Performance Analytics High-Performance Analytics David Pope January 2012 Principal Solutions Architect High Performance Analytics Practice Saturday, April 21, 2012 Agenda Who Is SAS / SAS Technology Evolution Current Trends

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC

Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Paper AA08-2013 Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Robert G. Downer, Grand Valley State University, Allendale, MI ABSTRACT

More information

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

How to Optimize Your Data Mining Environment

How to Optimize Your Data Mining Environment WHITEPAPER How to Optimize Your Data Mining Environment For Better Business Intelligence Data mining is the process of applying business intelligence software tools to business data in order to create

More information

SUGI 29 Systems Architecture. Paper 223-29

SUGI 29 Systems Architecture. Paper 223-29 Paper 223-29 SAS Add-In for Microsoft Office Leveraging SAS Throughout the Organization from Microsoft Office Jennifer Clegg, SAS Institute Inc., Cary, NC Stephen McDaniel, SAS Institute Inc., Cary, NC

More information

Numerix CrossAsset XL and Windows HPC Server 2008 R2

Numerix CrossAsset XL and Windows HPC Server 2008 R2 Numerix CrossAsset XL and Windows HPC Server 2008 R2 Faster Performance for Valuation and Risk Management in Complex Derivative Portfolios Microsoft Corporation Published: February 2011 Abstract Numerix,

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

Fast, Low-Overhead Encryption for Apache Hadoop*

Fast, Low-Overhead Encryption for Apache Hadoop* Fast, Low-Overhead Encryption for Apache Hadoop* Solution Brief Intel Xeon Processors Intel Advanced Encryption Standard New Instructions (Intel AES-NI) The Intel Distribution for Apache Hadoop* software

More information

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

UNIX Operating Environment

UNIX Operating Environment 97 CHAPTER 14 UNIX Operating Environment Specifying File Attributes for UNIX 97 Determining the SAS Release Used to Create a Member 97 Creating a Transport File on Tape 98 Copying the Transport File from

More information

Technical Paper. Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication

Technical Paper. Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication Technical Paper Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication Release Information Content Version: 1.0 October 2015. Trademarks and Patents SAS Institute

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 25 The MathWorks, Inc. 빅 데이터 및 다양한 데이터 처리 위한 MATLAB의 인터페이스 환경 및 새로운 기능 엄준상 대리 Application Engineer MathWorks 25 The MathWorks, Inc. 2 Challenges of Data Any collection of data sets so large and complex

More information

AcademyR Course Catalog

AcademyR Course Catalog AcademyR Course Catalog Table of Contents Our Philosophy...3 Courses Listed by Role Data Analyst...4 Data Scientist...6 R Programmer...9 Statistician.... 10 BI Developer... 11 System Administrator... 12

More information

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks

More information

Technology Brochure New Technology for the Digital Consumer

Technology Brochure New Technology for the Digital Consumer Technology Brochure New Technology for the Digital Consumer Redefining how Service Providers deliver the Digital Experience 7 August 2014 New Technology for the Digital Consumer Consumer attitudes and

More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

An In-Depth Look at In-Memory Predictive Analytics for Developers

An In-Depth Look at In-Memory Predictive Analytics for Developers September 9 11, 2013 Anaheim, California An In-Depth Look at In-Memory Predictive Analytics for Developers Philip Mugglestone SAP Learning Points Understand the SAP HANA Predictive Analysis library (PAL)

More information

HP reference configuration for entry-level SAS Grid Manager solutions

HP reference configuration for entry-level SAS Grid Manager solutions HP reference configuration for entry-level SAS Grid Manager solutions Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput Technical white paper Table of contents Executive summary... 2

More information

The HPSUMMARY Procedure: An Old Friend s Younger (and Brawnier) Cousin Anh P. Kellermann, Jeffrey D. Kromrey University of South Florida, Tampa, FL

The HPSUMMARY Procedure: An Old Friend s Younger (and Brawnier) Cousin Anh P. Kellermann, Jeffrey D. Kromrey University of South Florida, Tampa, FL Paper 88-216 The HPSUMMARY Procedure: An Old Friend s Younger (and Brawnier) Cousin Anh P. Kellermann, Jeffrey D. Kromrey University of South Florida, Tampa, FL ABSTRACT The HPSUMMARY procedure provides

More information

ANALYTICS MODERNIZATION TRENDS, APPROACHES, AND USE CASES. Copyright 2013, SAS Institute Inc. All rights reserved.

ANALYTICS MODERNIZATION TRENDS, APPROACHES, AND USE CASES. Copyright 2013, SAS Institute Inc. All rights reserved. ANALYTICS MODERNIZATION TRENDS, APPROACHES, AND USE CASES STUNNING FACT Making the Modern World: Materials and Dematerialization - Vaclav Smil Trends in Platforms Hadoop Microsoft PDW COST PER TERABYTE

More information

Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013

Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 Hadoop Hardware : Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3 About us Joep Rottinghuis Software Engineer @ Twitter Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay

More information