Big Data Analytics with EMC Greenplum and Hadoop
Ofir Manor, Pre-Sales Technical Architect, EMC Greenplum

Big Data and the Data Warehouse

Potential:
- All internal operational data
- External website traffic
- Mobile app traffic
- Customer interactions from Facebook, Twitter, etc.
- Sensor data
- Deeper customer insights
- Better analytics: better offerings, retention, fraud detection, etc.
- Increased profit and growth
- Less risk

Reality:
- The DW is slow to adapt
- Loads are hard to fit into the nightly window
- Real-time loading is not supported
- Long-running queries are killed
- A lot of hand-tuning: hints, indexes, materialized views, etc.
- Sprawl of duplicated data and shadow systems
- Analytics done offline in small silos
- Cannot integrate with newer Big Data sources

Building the Industry's Only Complete Big Data Analytics Stack
- Analytic toolsets (business analytics, BI, statistics, etc.)
- Greenplum Chorus: enterprise collaboration platform for data
- Greenplum Data Computing Appliances: purpose-built for Big Data analytics
- Greenplum Database (Enterprise & Community Editions): world's most scalable MPP database platform
- Greenplum HD (Hadoop Enterprise & Community Editions): enterprise analytics platform for unstructured data

GREENPLUM DATABASE: Industry-Leading Massively Parallel Processing (MPP) Performance

Database Architecture Matters: Scale-Out vs. Scale-Up
- Greenplum uses a scale-out, cloud-style architecture on standard commodity hardware (capacity grows by adding servers)
- Others use a mainframe-style scale-up architecture on proprietary hardware (capacity grows by buying a bigger machine)

Greenplum Database: Extreme Performance on Commodity Hardware

Optimized for BI and analytics:
- Provides automatic parallelization
- Just load and query like any database
- Tables are automatically distributed across nodes
- No need for manual partitioning or tuning

Extremely scalable MPP shared-nothing architecture:
- All nodes can scan and process in parallel
- Linear scalability by adding nodes
- Nodes communicate over an interconnect

Flexible physical layout:
- Column-oriented or row-oriented storage, with various levels of compression
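
The slide says tables are automatically distributed across nodes and every node scans in parallel. Shared-nothing MPP databases commonly do this by hashing a distribution-key column to choose the segment that owns each row. Greenplum lets you choose that key with a `DISTRIBUTED BY` clause. The Python sketch below is a simplified, single-process illustration under that assumption. `crc32` is an arbitrary stand-in, not Greenplum's actual hash function, and `distribute` and `parallel_sum_by_key` are hypothetical names.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Pick the owning segment for a distribution-key value.

    crc32 is only an illustrative stand-in for the database's own hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments):
    """Split rows into per-segment lists by hashing the key column."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(str(row[key_column]), num_segments)].append(row)
    return segments


def parallel_sum_by_key(segments, key_column, value_column):
    """Each segment aggregates only its local rows, then results are merged.

    All rows sharing a key land on the same segment, so the per-segment
    results have disjoint keys and can simply be combined.
    """
    partials = []
    for local_rows in segments:  # sequential here; concurrent in a real MPP system
        local = defaultdict(float)
        for row in local_rows:
            local[row[key_column]] += row[value_column]
        partials.append(local)
    merged = {}
    for partial in partials:
        merged.update(partial)  # safe: keys never repeat across segments
    return merged


if __name__ == "__main__":
    rows = [
        {"customer_id": "c1", "amount": 10.0},
        {"customer_id": "c2", "amount": 5.0},
        {"customer_id": "c1", "amount": 2.5},
        {"customer_id": "c3", "amount": 7.0},
    ]
    segs = distribute(rows, "customer_id", num_segments=4)
    print(parallel_sum_by_key(segs, "customer_id", "amount"))
```

Because every row with a given key lands on the same segment, grouping on the distribution key needs no data movement between nodes. Grouping on a different column would first require redistributing rows.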

Greenplum Database: Most Powerful Data Loading Capabilities
- Industry-leading performance: more than 10 TB per hour per rack
- Innovative parallel-everything architecture: Scatter-Gather Streaming provides true linear scaling
- Supports both large-batch and continuous real-time loading strategies
- Enables complex data transformations in-flight
- Transparent loading interfaces through support for files, applications, and services
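
For scale, 10 TB per hour is roughly 2.8 GB/s of sustained ingest per rack. The slide gives no internals for Scatter-Gather Streaming. As a hedged illustration of the general idea only, the sketch below runs one loader per input source and applies an in-flight transformation. Each row then goes straight to its owning segment, so throughput can grow with the number of sources and segments instead of funnelling through a single master. The `key,value` input format, `NUM_SEGMENTS`, and all function names are hypothetical. Python threads stand in for independent loader hosts.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

NUM_SEGMENTS = 4  # hypothetical number of database segments


def transform(line):
    """In-flight transformation for a hypothetical "key,value" text format:
    parse the line, normalise the key, and cast the value to an integer."""
    key, value = line.split(",", 1)
    return key.strip().lower(), int(value)


def segment_for(key):
    # crc32 is an illustrative stand-in for the database's own hash function.
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def load_source(lines, buffers, locks):
    """Scatter: one loader reads its own source and sends each transformed
    row directly to the segment that owns it, with no central funnel."""
    for line in lines:
        key, value = transform(line)
        seg = segment_for(key)
        with locks[seg]:
            buffers[seg].append((key, value))
    return len(lines)


def parallel_load(sources):
    """Run one loader per source concurrently, then gather the row counts."""
    buffers = [[] for _ in range(NUM_SEGMENTS)]
    locks = [Lock() for _ in range(NUM_SEGMENTS)]
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        counts = list(pool.map(lambda src: load_source(src, buffers, locks), sources))
    return buffers, sum(counts)


if __name__ == "__main__":
    sources = [["A,1", "b,2"], ["c,3", "a,4"]]
    buffers, loaded = parallel_load(sources)
    print(loaded, [len(b) for b in buffers])
```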

Platform Independence Delivers Choice and Flexibility
- Data Computing Appliance: optimized price/performance, minimum time-to-value; ideal for production environments
- Software-only, on your own x86 hardware: flexibility for any workload; ideal for QA or DR
- Virtualized infrastructure: pooled resources, elastic scalability; ideal for test and development

EMC GREENPLUM HD: Delivering Enterprise-Ready Apache Hadoop

What Is Hadoop?
- Open source Apache project (written in Java)
- Provides distributed data storage and processing over commodity servers for unstructured data
- Hadoop core components:
  - Distributed File System: distributes data
  - Map/Reduce: distributes computation (near the data)

Hadoop ecosystem projects:
- HDFS: Hadoop Distributed File System
- MapReduce: framework for writing scalable data applications
- Pig: procedural language that abstracts lower-level MapReduce
- ZooKeeper: highly reliable distributed coordination
- Hive: data warehouse infrastructure built on top of Hadoop
- HBase: database for random, real-time read/write access
- Oozie: workflow/coordination to manage jobs
- Mahout: scalable machine learning libraries
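
The map/shuffle/reduce split described above can be illustrated with the classic word-count example. This is a single-process Python simulation of the programming model, not Hadoop's API. Hadoop jobs are normally written in Java against its `Mapper` and `Reducer` classes. In a real cluster, each map task runs on the node holding its HDFS block, and the framework performs the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data big insight", "data near compute"]
    print(word_count(splits))
    # {'big': 2, 'data': 2, 'insight': 1, 'near': 1, 'compute': 1}
```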

Hadoop Example: Yahoo! Search Assist
- Insight: related concepts appear close together in a text corpus
- Input: web pages; 1 billion pages at 10 KB each, about 10 TB of input data
- Output: List(word, List(related words))
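
At the stated scale, 1 billion pages × 10 KB ≈ 10^13 bytes, which is the 10 TB of input on the slide. The slide does not show how Yahoo! computed related words. One minimal MapReduce-style approach, under assumed parameters, works in two steps. The map step emits (word, neighbour) pairs for words within a small window. The reduce step keeps each word's most frequent neighbours. `WINDOW`, `TOP_N`, and the function names are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

WINDOW = 2  # hypothetical: words within 2 positions count as "close together"
TOP_N = 3   # hypothetical: keep each word's 3 most frequent neighbours


def map_cooccurrences(page_text):
    """Map: emit (word, neighbour) for every pair of distinct words within WINDOW."""
    words = page_text.lower().split()
    for i, word in enumerate(words):
        lo, hi = max(0, i - WINDOW), min(len(words), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i and words[j] != word:
                yield word, words[j]


def reduce_related(word, neighbours):
    """Reduce: rank a word's neighbours by how often they co-occur with it."""
    counts = Counter(neighbours)
    return word, [w for w, _ in counts.most_common(TOP_N)]


def related_words(pages):
    """Produce List(word, List(related words)), the output shape on the slide."""
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle
    for page in pages:
        for word, neighbour in map_cooccurrences(page):
            grouped[word].append(neighbour)
    return [reduce_related(w, ns) for w, ns in grouped.items()]


if __name__ == "__main__":
    pages = ["hadoop cluster hadoop jobs", "hadoop cluster nodes"]
    for word, related in related_words(pages):
        print(word, related)
```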

Greenplum HD: Enterprise Edition
Enterprise-Ready Hadoop Platform for Unstructured Data
- Faster: 2 to 5x faster than Apache Hadoop
- Reliable: high availability, mirroring, snapshots
- Easier to use: NFS-mountable, system management

Hadoop and Database Co-Processing
- Analytic productivity: applications, tools, Chorus
- Data computing interfaces: SQL, MapReduce, in-database analytics, parallel data loading (batch or real-time)
- Greenplum Database (SQL DB engine over compute and storage) and Hadoop (MapReduce engine over compute and storage), connected by parallel data exchange over the network
- All data types: structured, unstructured, temporal, geospatial, spatial, and sensor data

Greenplum Data Computing Appliances: Application-Specific Configurations
- Database: purpose-built, highly scalable data warehousing appliance that delivers leading price/performance
- Greenplum Database combined with SAS high-performance computing to enable analytics on all the data
- Hadoop: Greenplum Database combined with Hadoop to enable co-processing of structured and unstructured data

EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, "Roadmap Information"). Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.

Connecting Functional Modules
- GP DB module (Greenplum Database module): 4 servers optimally preconfigured with GP DB software for simple plug-and-play expansion of the database cluster
- GP HD module (Greenplum HD module): 4 servers optimally preconfigured with GP HD software for simple plug-and-play expansion of the HDFS cluster
- DIA module (Data Integration Accelerator module): 4 servers available for third-party software that benefits from being on the shared interconnect for high-speed data access

Example 3-Rack Configuration
[Figure: three racks populated with a mix of modules, labeled HD, HD, DIA, GPDB, GPDB, HD, GPDB, HD, HD]

Sample Configurations with Greenplum Database Modules

Module type          | Modules | Racks | Usable capacity, uncompressed | Usable capacity, compressed | Scan rate | Data load rate
GP DB Standard       | 4       | 1     | 36 TB                         | 144 TB                      | 24 GB/s   | 10 TB/hour
GP DB Standard       | 24      | 6     | 216 TB                        | 864 TB                      | 144 GB/s  | 60 TB/hour
GP DB High Capacity  | 4       | 1     | 124 TB                        | 496 TB                      | 14 GB/s   | 10 TB/hour
GP DB High Capacity  | 24      | 6     | 744 TB                        | 2,976 TB                    | 84 GB/s   | 60 TB/hour
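
The sample configurations scale linearly. Six racks (24 modules) deliver exactly six times the capacity, scan rate, and load rate of one rack (4 modules), and every compressed figure is 4× the uncompressed one. The sketch below turns those observations into a rough sizing helper. It assumes perfectly linear scaling and a fixed 4:1 compression ratio, both read off these sample figures. Real compression depends on the data. `ModuleSpec`, `estimate`, and `full_scan_seconds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleSpec:
    usable_tb: float         # uncompressed usable capacity per module
    scan_gb_per_s: float     # scan rate per module
    load_tb_per_hour: float  # data load rate per module


# Per-module figures: the one-rack (4-module) values divided by 4.
# Assumes scaling is exactly linear, as the sample configurations show.
STANDARD = ModuleSpec(usable_tb=36 / 4, scan_gb_per_s=24 / 4, load_tb_per_hour=10 / 4)
HIGH_CAPACITY = ModuleSpec(usable_tb=124 / 4, scan_gb_per_s=14 / 4, load_tb_per_hour=10 / 4)

COMPRESSION_RATIO = 4  # assumption: the compressed figures are 4x uncompressed


def estimate(spec, modules):
    """Linear estimate of capacity and throughput for a given module count."""
    usable = spec.usable_tb * modules
    return {
        "usable_tb": usable,
        "usable_compressed_tb": usable * COMPRESSION_RATIO,
        "scan_gb_per_s": spec.scan_gb_per_s * modules,
        "load_tb_per_hour": spec.load_tb_per_hour * modules,
    }


def full_scan_seconds(spec, modules):
    """Time to scan all uncompressed usable data once (decimal units, 1 TB = 1000 GB)."""
    est = estimate(spec, modules)
    return est["usable_tb"] * 1000 / est["scan_gb_per_s"]


if __name__ == "__main__":
    for name, spec in (("Standard", STANDARD), ("High Capacity", HIGH_CAPACITY)):
        for modules in (4, 24):
            print(name, modules, estimate(spec, modules))
```

Capacity and scan rate grow together, so the time to scan all usable data stays constant as modules are added. For both Standard configurations it is 36 TB / 24 GB/s = 1,500 seconds.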

Greenplum Data Computing Appliances: Seamless Infrastructure Integration
- EMC Data Domain: efficient backup and restore
- Isilon scale-out storage: Big Data staging
- EMC VMAX SAN mirror: advanced storage management
- EMC VMAX SRDF and EMC Data Domain replication: disaster recovery

GREENPLUM CHORUS: The World's First Enterprise Data Cloud Platform

Greenplum Chorus: Self-Service Analytic Infrastructure
- Self-service provisioning
- Data services
- Collaborative analytics

How Do You Get Started?
Unlock the business value in big data. Our advanced analytics services will help you combine new, rich big data sources in powerful ways to discover new business insights:
- Analytics Assessment
- Greenplum Analytics Lab
- Vision Workshop
- Big Data Advisory Service

Powerful Big Data Partner Ecosystem

Greenplum: Current Success and Market Momentum
- Leaders quadrant, Gartner DW 2011
- Mission-critical deployments across multiple industries
- Installations ranging from small (terabytes) to very large (petabytes)
- Scalable analytics platform to complement the EDW

Customer Examples: Sample Use Cases Across Industries with Greenplum Database
- Telecom, Media & Entertainment: analyze user behavior to eliminate network abuse
- Retail: direct marketing/CRM
- Financial Services: detect and prevent fraud; credit scoring and analysis to reduce credit risk
- Pharmaceutical: analytics for drug discovery and development
- Internet: clickstream analytics for ad targeting and market research

THANK YOU
