EMC Greenplum Driving the Future of Data Warehousing and Analytics. Tools and Technologies for Big Data



Similar documents
Greenplum Database. Getting Started with Big Data Analytics. Ofir Manor Pre Sales Technical Architect, EMC Greenplum

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Advanced In-Database Analytics

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

Big Data and the Data Lake. February 2015

WHAT S NEW IN SAS 9.4

Universal PMML Plug-in for EMC Greenplum Database

High-Performance Analytics

OpenChorus: Building a Tool-Chest for Big Data Science

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis

The Potential of Big Data in the Cloud. Juan Madera Technology Consultant

BIG DATA-AS-A-SERVICE

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Extend your analytic capabilities with SAP Predictive Analysis

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

SEIZE THE DATA SEIZE THE DATA. 2015

In-Memory Analytics for Big Data

ANALYTICS MODERNIZATION TRENDS, APPROACHES, AND USE CASES. Copyright 2013, SAS Institute Inc. All rights reserved.

SAP Solution Brief SAP HANA. Transform Your Future with Better Business Insight Using Predictive Analytics

ANALYTICS CENTER LEARNING PROGRAM

Tax Fraud in Increasing

This Symposium brought to you by

An In-Depth Look at In-Memory Predictive Analytics for Developers

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

MADlib. An open source library for in-database analytics. Hitoshi Harada PGCon 2012, May 17th

2015 Ironside Group, Inc. 2

EMC GREENPLUM DATABASE

BIG DATA What it is and how to use?

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Navigating Big Data business analytics

How to Optimize Your Data Mining Environment

Advanced Big Data Analytics with R and Hadoop

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Integrating a Big Data Platform into Government:

Intelligence-Driven Security

Oracle Advanced Analytics Oracle R Enterprise & Oracle Data Mining

An Oracle White Paper June Oracle: Big Data for the Enterprise

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Native Connectivity to Big Data Sources in MSTR 10

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

Data Warehouse design

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Achieving Business Value through Big Data Analytics Philip Russom

How To Write A Data Analysis Project

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

Find the Hidden Signal in Market Data Noise

Big Data Analytics Best Practices

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.

Big Data Executive Survey

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Hadoop & SAS Data Loader for Hadoop

The Use of Open Source Is Growing. So Why Do Organizations Still Turn to SAS?

Revolution R Enterprise

Big Data and Data Science: Behind the Buzz Words

Data Science And Big Data Analytics Course

SAS and Teradata Partnership

APPROACHABLE ANALYTICS MAKING SENSE OF DATA

Bringing the Power of SAS to Hadoop. White Paper

SEIZE THE DATA SEIZE THE DATA. 2015

ANALYTICS IN BIG DATA ERA

Bringing the Power of SAS to Hadoop

In-Database Analytics

An Oracle White Paper October Oracle: Big Data for the Enterprise

Integrated Grid Solutions. and Greenplum

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

The 4 Pillars of Technosoft s Big Data Practice

Interactive data analytics drive insights

IBM Data Warehousing and Analytics Portfolio Summary

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Predictive Analytics Certificate Program

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

Big Data Explained. An introduction to Big Data Science.

Predictive Analytics: Seeing the Whole Picture

SAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SESSION CODE: 603

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

and Hadoop Technology

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

PDF PREVIEW EMERGING TECHNOLOGIES. Applying Technologies for Social Media Data Analysis

Big Data Technologies Compared June 2014

Oracle Big Data Strategy Simplified Infrastrcuture

W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract

Transcription:

EMC Greenplum Driving the Future of Data Warehousing and Analytics Tools and Technologies for Big Data Steven Hillion V.P. Analytics EMC Data Computing Division 1

Big Data Size: The Volume Of Data Continues To Explode The Digital Universe 2009-2020 2009: 0.8 Zettabytes Data Volume Growing 44x 2020: 35.2 Zettabytes Source: IDC Copyright Digital 2011 Universe EMC Study, Corporation. sponsored All rights by EMC, reserved. May 2010 2

Big Data Significance: Not Just For Google and Facebook Just as search engines have transformed how we access information, other forms of big data computing can and will transform the activities of companies, scientific researchers, medical practitioners, and our nation's defense and intelligence operations. Randal E. Bryant Carnegie Mellon University Randy H. Katz UC Berkeley Edward D. Lazowska University of Washington 3

Many Requirements, Many Technologies ETL for structured and unstructured data Data storage Computational infrastructure Reporting and dashboards Data mining Model deployment Visualization Collaboration and Development e.g. MapReduce, ETL tools, SQL e.g. HDFS, RDBMS e.g. RDBMS, Hadoop, SAS Grid e.g. Excel, BI Tools e.g. SAS, SPSS, R, MADlib e.g. PMML, In-DB scoring e.g. Excel, Tableau e.g. Wiki, Sharepoint 4

Scalability SQL BI mahout In-DB analytics (MPP) SAS HPA In-DB analytics (SMP) SAS, SPSS Data Miners Matlab R Excel Mathematica, MAPLE Depth of Analysis 5

The Big Data Analytics Stack Knowledge Worker Data Scientist Integration Tools Data Science Collaboration Analytic Tools Data Computing Platform Virtualization Storage, Information Management & Security 6

Analytics Leadership SAS/ACCESS SAS Scoring Accelerator SAS High Performance Analytics In-DB MapReduce Greenplum HD sql# SELECT cid, sum(tfxidf)/count(*) AS centroid FROM ( SELECT id, tfxidf, cid, row_number() OVER ( PARTITION BY id ORDER BY distance, cid) rank FROM blog_distance ) blog_rank WHERE rank = 1 GROUP BY cid; In-database Analytics MAD Alpine Miner lib PL/R, Tools integrations Chorus 7

MADlib is an open-source library for scalable in-database analytics, jointly developed by EMC and UC Berkeley. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. The MADlib mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development. 8

A Strategic Partnership for High-Performance Computing and MAD Analytics Access relational data-sets for agile analysis SAS/ACCESS provides fast, transparent and secure access to Greenplum data. Leverage database scalability for rapid model deployment SAS Scoring Accelerator publishes models for execution in parallel across the Greenplum cluster. Build complex models at massive scales The SAS/Greenplum Analytics Appliance combines SAS In-Memory Analytics with Greenplum parallelism to produce record-breaking scalability and performance. 9

Large-Scale, Real-World Analytics Question How can I identify fraudulent activity? How do I segment my customers? Does this product appeal to some segments more than others? Which campaign is working better? How is product ownership distributed across customer segments? How do I target my marketing efforts towards customers most likely to churn? What new products should I offer my customers? What are my customers saying about the new product launch? Method Variable Selection, Logistic Regression K-means Clustering Log-likelihood Mann-Whitney U Test SQL, Cumulative Distribution Functions Logistic Regression Cosine similarity, k-nearest Neighbors MapReduce, NLP, sparse vectors 10

Q&A 11