In-Memory Analytics for Big Data



Game-changing technology for faster, better insights

SAS White Paper

Table of Contents

Introduction: A New Breed of Analytics
SAS In-Memory Overview
SAS In-Memory Architectures
  SAS LASR Analytic Server Running Within Hadoop
  SAS LASR Analytic Server Running Alongside Greenplum or Teradata
Conclusion: Boosted Performance for Faster, More Precise Insights

Introduction: A New Breed of Analytics

In today's era of big data, organizations depend on increasingly sophisticated analysis of ever-growing volumes and varieties of data. They count on having reliable access to massive volumes of data, and they are performing advanced analytics that traditional relational technology is unsuited for and even incapable of.

While other vendors are recycling relational and OLAP technology from 20 years ago and marketing it as new and improved, SAS has taken a different approach. SAS LASR Analytic Server is the world's first system specifically engineered to address diverse advanced analytic use cases. It is a read-mostly, stateless, distributed in-memory server that provides secure, multiuser, concurrent access to data in a distributed computing environment.

Built upon core design principles set forth by Jim Goodnight, CEO of SAS, SAS LASR Analytic Server is a direct-access, NoSQL, NoMDX server that is engineered for maximum analytic performance through multithreading and distributed computing. Built on industry-standard hardware and distributed parallel architectures for big data, SAS LASR Analytic Server is a new breed of distributed in-memory analytic architecture for the next generation of high-performance analytics.

SAS LASR Analytic Server's unmatched performance and scalability provide the ability to answer questions that previously were computationally infeasible. It enables users to explore data and analyze a variety of big-data problems, including areas such as risk management, customer intelligence, revenue optimization, and assortment and merchandise planning.

SAS In-Memory Overview

SAS provides a distributed in-memory computing environment configured for the demands of interactive, advanced analytic workloads. It is specifically engineered to address the computational complexity of large-scale exploratory data analysis and visualization as well as predictive analytics and data mining.

Performing near-instantaneous query and reporting on sums and counts of billions of records has become one of the most common demonstrations from some vendors of their high-performance analytic capabilities. What you won't find is other vendors doing this on hundreds or thousands of variables with the types of distribution analysis and predictive modeling required by today's more sophisticated analytic community.

For example, take something as simple as a box plot. It is a convenient way of graphically depicting groups of numerical data through summary calculations: minimum, maximum, upper and lower quartile, and median. Box plots display different populations without making assumptions about the underlying distribution (see Figure 1).
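The five summary calculations behind a box plot can be sketched in a few lines of Python. This is only an illustration using the standard library, not how SAS LASR Analytic Server computes them; the function name five_number_summary and the sample values are made up for the example, and the "inclusive" quartile method is one of several common conventions.

```python
import statistics


def five_number_summary(values):
    """Return the five box-plot statistics for a list of numbers.

    Hypothetical helper for illustration only.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" treats the data as the whole population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


if __name__ == "__main__":
    sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # made-up values
    print(five_number_summary(sample))
```

Each statistic needs the values in sorted order, which is part of why quartiles over hundreds or thousands of variables cost more than plain sums and counts.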

When it comes to the most common descriptive statistics calculations, SQL-based solutions have a number of limitations, including column limits, storage constraints and limited data type support. In addition, the iterative nature of exploratory data analysis and data mining operations, such as variable selection, dimension reduction, visualization, complex analytic data transformations and model training, requires multiple concurrent passes through the data, operations that SQL and relational technology are not well suited for. By creating an in-memory engine that is designed to speed the tasks of data exploration, predictive modeling, forecasting and optimization, SAS bypasses these issues that relational technologies face.

Figure 1: SAS LASR Analytic Server enables you to produce box plots based on huge numbers of variables in just seconds.

Take the next example (Figure 2). This is a simple heat map overlaid with a regression model. Most vendors would send data back to the front-end reporting tools to serially perform complex calculations. But when huge amounts of computations are needed to analyze and produce information, bottlenecks can occur. SAS in-memory technology performs the calculations on the server on the fly and in parallel. As a result, computations are very fast because you are not moving large amounts of data elsewhere for processing, and you have many computational blades to take advantage of. With SAS, processing can take place on the analytic server with the thin results sent back to the client for presentation, rather than for computation.
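The idea of computing next to the data and returning only thin results can be sketched with a simple least-squares line. In this hedged example, threads stand in for server nodes and each in-memory list stands in for one node's share of the data; the names partial_stats and fit_line are hypothetical and do not reflect any SAS interface.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition):
    """Runs beside one partition and returns five numbers instead of the rows."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)


def fit_line(partitions):
    """Combine per-partition statistics into a slope and an intercept."""
    # Threads are only a stand-in for nodes working in parallel.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, partitions))
    # Element-wise sum of the five statistics across all partitions.
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*parts))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # Made-up (x, y) rows split across two partitions.
    partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
    print(fit_line(partitions))
```

However many rows a partition holds, only five numbers leave it, so the client receives a result sized for presentation rather than for computation.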

Figure 2: A correlation map overlaid with a regression model that uses millions of variables can be produced very quickly using SAS in-memory technology.

SAS In-Memory Architectures

SAS LASR Analytic Server is new thin-layer technology that enables SAS to run within distributed computing environments such as Hadoop, or alongside distributed relational databases such as Teradata and Greenplum. It provides applications with the responsiveness and extremely high throughput required by large analytic workloads and analytic-intensive applications. Applications access the SAS LASR Analytic Server using direct SAS connections and standard interfaces. The following is an overview of the two SAS architecture options.

SAS LASR Analytic Server Running Within Hadoop

It is easy to get caught up in the big data craze by focusing on data measured in the hundreds of terabytes and tens of petabytes. In the past, it would have cost millions of dollars to store even a few terabytes of data. However, Hadoop has changed that game. Hadoop can aid in the storage of the data as well as in exploring and analyzing it by allowing businesses to deploy distributed applications running on thousands of nodes and sifting through petabytes of data. Now, instead of using specialized hardware and software to scale applications, Hadoop enables you to use industry-standard commodity hardware and open-source software. For SAS solutions, Hadoop provides an open, simple and robust architecture that addresses the needs for fault tolerance, redundancy and scalability.

Within each Hadoop node, a thin-layer SAS process takes incoming SAS commands and returns results (see Figure 3). It's that simple. You don't need to install SAS on every node because the Hadoop system takes care of distributing the processing, managing memory, controlling the job and managing the workload.

Figure 3: SAS provides two options for loading data into the SAS LASR Analytic Server. Depending on your needs and the size of your data, you can either take data from the Hadoop storage layer and load it into the server (as shown in the bottom portion of the diagram) or you can bypass the Hadoop storage layer and load data directly into the SAS LASR Analytic Server (as shown on the left side of the diagram).

Drilling a little deeper (Figure 4), you can see that SAS LASR Analytic Server operates directly on the Hadoop Distributed File System (HDFS), bypassing traditional MapReduce tasks. MapReduce is a Hadoop-based method for performing data manipulation and simple summary analysis tasks in the Hadoop environment. However, it lacks important internode communication capabilities that are critical for the high-end analytic work performed with SAS. In addition, the MapReduce paradigm does not provide the in-memory environment that is critical for analytical tasks that make multiple passes through data. On the other hand, SAS LASR Analytic Server optimizes data homogeneity in-memory for fast, steady processing even under concurrent, multiple-user access.

Note that SAS/ACCESS Interface to Hadoop and new capabilities in Base SAS and the SAS Data Integration Studio component of SAS Enterprise Data Integration Server do use Hadoop's Hive and Pig languages along with MapReduce to access data stored in HDFS, thus providing several options for integrating SAS with Hadoop.
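The point above about analytic tasks that make multiple passes through data can be illustrated with a small iterative algorithm. The sketch below uses one-dimensional k-means purely as a stand-in for such a task; it is not a description of any SAS algorithm, and load_partitions, one_pass and kmeans_1d are hypothetical names. The data is loaded once and then scanned repeatedly in memory.

```python
def load_partitions():
    """Stand-in for a one-time load from distributed storage into memory."""
    return [[1.0, 2.0, 9.0], [10.0, 1.5, 11.0]]  # made-up values


def one_pass(partitions, centers):
    """One scan of the in-memory data: per-center sums and counts."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for part in partitions:
        for value in part:
            # Index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            sums[nearest] += value
            counts[nearest] += 1
    return sums, counts


def kmeans_1d(partitions, centers, max_passes=20):
    """Repeat passes over the same in-memory data until centers stop moving."""
    passes = 0
    for _ in range(max_passes):
        sums, counts = one_pass(partitions, centers)
        passes += 1
        # A center with no assigned values keeps its previous position.
        updated = [s / c if c else old for s, c, old in zip(sums, counts, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, passes


if __name__ == "__main__":
    data = load_partitions()  # loaded once, reused on every pass
    print(kmeans_1d(data, [0.0, 5.0]))
```

Every pass reuses the partitions already held in memory. If each pass instead began by reading the data back from disk, as in a chain of separate batch jobs, the cost of the load would be paid once per iteration.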

Figure 4: SAS provides two options for accessing and operating on data stored in Hadoop's HDFS, which is the primary storage system used by Hadoop applications.

SAS LASR Analytic Server Running Alongside Greenplum or Teradata

Data warehousing, first popularized in the mid-1990s, is now a mainstream technology that has moved from back-office, query-and-reporting style analysis to become an aid in operational decision making. The hardware architecture and enterprise-class features of Teradata and Greenplum data warehouse appliances make them excellent vehicles to deliver SAS in-memory technology.

During the last several years, SAS has worked closely with RDBMS partners such as Teradata and Greenplum so that SAS can now run alongside their distributed computing architecture, bringing SAS processing to the data rather than bringing data to the SAS process. Massively parallel shared-nothing relational databases, such as Teradata and Greenplum, are good choices for SAS in-memory technology. They provide the data distribution, process management and memory management needed to run analytics alongside relational processing.

Figure 5: Greenplum and Teradata, like most MPP data warehouse appliances, use a shared-nothing MPP architecture. On each node, SAS thin-layer, in-memory technology is used to communicate with distributed data.

Conclusion: Boosted Performance for Faster, More Precise Insights

Data-intensive business applications use large, wide and deep data sets. They can be both memory-intensive and CPU-intensive, while requiring little or no persistence or having big-data persistence needs. Traditional relational technology is not suited for the varied computing demands of real-time advanced analytics, but SAS LASR Analytic Server provides a new distributed, in-memory analytic computing environment specifically engineered for the next generation of real-time advanced analytics. It enables users to explore data and analyze a variety of big-data problems, including areas such as risk management, customer intelligence, revenue optimization, and assortment and merchandise planning.

Built using industry-standard hardware, market-leading advanced analytics software and in-memory technology, SAS provides an optimized system that delivers highly precise answers to all your business questions with unmatched speed, intelligence and simplicity while providing enterprise-class manageability and security.

About SAS

SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Through innovative solutions, SAS helps customers at more than 55,000 sites improve performance and deliver value by making better decisions faster. Since 1976, SAS has been giving customers around the world THE POWER TO KNOW. For more information on SAS Business Analytics software and services, visit sas.com.

SAS Institute Inc. World Headquarters: +1 919 677 8000
To contact your local SAS office, please visit: sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright 2012, SAS Institute Inc. All rights reserved. 105645_S89872_0312