Greenplum Database. Getting Started with Big Data Analytics. Ofir Manor Pre Sales Technical Architect, EMC Greenplum

Agenda
- Introduction to Greenplum
- Greenplum Database Architecture
- Flexible Database Configuration
- Beyond SQL: Flexible Analytics
- Flexible Deployment
- Other considerations

THE ERA OF BIG DATA IS HERE
- "Big Data Is Less About Size, And More About Freedom" (Techcrunch)
- "Findings: Big Data Is More Extreme Than Volume" (Gartner)
- "Total data: bigger than big data" (451 Group)
- "Big Data! It's Real, It's Real-time, and It's Already Changing Your World" (IDC)

Industries Are Broadly Embracing Big Data
- Retail: CRM, Customer Scoring, Store Siting and Layout, Fraud Detection / Prevention, Supply Chain Optimization
- Advertising & Public Relations: Demand Signaling, Ad Targeting, Sentiment Analysis, Customer Acquisition
- Financial Services: Algorithmic Trading, Risk Analysis, Fraud Detection, Portfolio Analysis
- Media & Telecommunications: Network Optimization, Customer Scoring, Churn Prevention, Fraud Prevention
- Manufacturing: Product Research, Engineering Analytics, Process & Quality Analysis, Distribution Optimization
- Energy: Smart Grid, Exploration
- Government: Market Governance, Counter-Terrorism, Econometrics, Health Informatics
- Healthcare & Life Sciences: Pharmaco-Genomics, Bio-Informatics, Pharmaceutical Research, Clinical Outcomes Research

The Power of Data Co-Processing

GREENPLUM DATABASE: Extreme Performance for Analytics
- Optimized for BI and analytics
  - Deep integration with statistical packages
  - High-performance parallel implementations
- Simple and automatic
  - Just load and query like any database
  - Tables are automatically distributed across nodes
- Extremely scalable
  - MPP shared-nothing architecture
  - All nodes can scan and process in parallel
  - Linear scalability by adding nodes
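
The point that tables are automatically distributed across nodes can be pictured with a small sketch. This is only an illustration of hash distribution in general: the segment count, the customer IDs, and the use of CRC32 as the hash are all assumptions made for the example, not Greenplum's actual hashing scheme.

```python
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # assumed cluster size, for illustration only


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment using a stable hash.

    CRC32 is a stand-in here; it is not the hash Greenplum uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


# Hypothetical customer IDs standing in for a table's distribution key.
customer_ids = [f"cust-{i}" for i in range(10_000)]
rows_per_segment = Counter(segment_for(cid) for cid in customer_ids)

for seg in sorted(rows_per_segment):
    print(f"segment {seg}: {rows_per_segment[seg]} rows")
```

Because the same key always hashes to the same segment, every node can scan its own share of the rows without coordinating with the others.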

GREENPLUM DATABASE: A Mature Enterprise Platform
CLIENT ACCESS & TOOLS
- Client access: ODBC, JDBC, OLEDB, MapReduce, etc.
- 3rd-party tools: BI tools, ETL tools, data mining, etc.
- Admin tools: Greenplum Command Center, Greenplum Package Manager
PRODUCT FEATURES
- Loading & external access: Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access
- Storage & data access: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning, Indexes (Btree, Bitmap, etc.), External Table Support
- Language support: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics, Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)
ADAPTIVE SERVICES
- Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost)
- Online System Expansion
- Workload Management
CORE MPP ARCHITECTURE
- Shared-Nothing MPP
- Parallel Query Optimizer
- Polymorphic Data Storage
- Parallel Dataflow Engine
- gNet Software Interconnect
- Scatter/Gather Streaming Data Loading

Extremely Scalable MPP Shared-Nothing Architecture
[Diagram: a SQL client connects to the master, which reaches four segments over a high-speed interconnect]
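
The master-and-segments layout can be sketched as a scatter/gather computation: the master hands the same piece of work to every segment, each segment processes only its local rows, and the master merges the partial results. The data values and function names below are hypothetical, and threads stand in for separate segment servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-segment data: each segment holds its own slice of a table.
segments = [
    [12.5, 7.0, 3.5],
    [20.0, 1.0],
    [4.0, 4.0, 4.0, 4.0],
    [9.0],
]


def segment_partial(rows):
    """Work done locally on one segment: a partial SUM and COUNT."""
    return sum(rows), len(rows)


def master_average(all_segments):
    """Scatter the work to all segments, then gather and merge the partials."""
    with ThreadPoolExecutor(max_workers=len(all_segments)) as pool:
        partials = list(pool.map(segment_partial, all_segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(master_average(segments))
```

Only the small partial results travel back to the master, which is what lets the segments do the heavy scanning in parallel.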

Linear Scalability
- Each node has its own CPU and I/O resources
- Add nodes to scale
- Rebalance happens in the background
[Diagram: SQL client, master, and eight segments joined by a high-speed interconnect]

GREENPLUM DATABASE: High Availability
Master Server Data Protection
- Replicated transaction logs for server failure
- Optional RAID protection for drive failures
- Upon server failure:
  - Standby server activated
  - Administrator alerted
  - Orchestrated failover
Segment Server Data Protection
- Mirrored segments for server failures
- Optional RAID protection for drive failures
- Upon server failure:
  - Mirrored segments take over with no loss of service
  - Fast online differential recovery
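
The idea of mirrored segments taking over after a server failure can be modelled in a few lines. This is a toy model under stated assumptions: the host names, the placement of each mirror on the next host, and the class and function names are all invented for illustration and do not reflect Greenplum's internal catalog or failover code.

```python
from dataclasses import dataclass


@dataclass
class SegmentPair:
    """One slice of the data, stored as a primary copy plus a mirror copy."""
    content_id: int
    primary_host: str
    mirror_host: str
    primary_up: bool = True
    mirror_up: bool = True

    def active_host(self) -> str:
        """Return the host currently able to serve this slice."""
        if self.primary_up:
            return self.primary_host
        if self.mirror_up:
            return self.mirror_host
        raise RuntimeError(f"segment {self.content_id} has no live copy")


def fail_host(pairs, host):
    """Mark every copy that lives on the failed host as down."""
    for pair in pairs:
        if pair.primary_host == host:
            pair.primary_up = False
        if pair.mirror_host == host:
            pair.mirror_up = False


# Hypothetical four-host layout; each mirror lives on the next host over.
hosts = ["sdw1", "sdw2", "sdw3", "sdw4"]
pairs = [
    SegmentPair(i, hosts[i], hosts[(i + 1) % len(hosts)])
    for i in range(len(hosts))
]

fail_host(pairs, "sdw2")
for pair in pairs:
    print(f"segment {pair.content_id} served from {pair.active_host()}")
```

Keeping each mirror on a different host from its primary is what allows a single server failure to leave every slice of data reachable.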

GREENPLUM DATABASE: Most Powerful Data Loading Capabilities
- Industry-leading performance at 10+ TB per hour per rack
- Scatter-Gather Streaming provides true linear scaling
- Support for both large-batch and continuous real-time loading strategies
- Enable complex data transformations in-flight
- Transparent interfaces to loading via support files, application, and services
[Chart: single-rack comparison of Greenplum, Oracle Exadata, Netezza, and Teradata]
Greenplum load rates scale linearly with the number of racks; others do not. For example, two racks = >20 TB/h.
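
The linear-scaling claim reduces to simple arithmetic, sketched below. The per-rack rate is the 10 TB/h floor quoted on the slide; the 100 TB data volume is a made-up example, and the model assumes perfectly linear scaling as the slide states.

```python
PER_RACK_TB_PER_HOUR = 10.0  # lower bound quoted on the slide ("10+ TB per hour per rack")


def load_rate_tb_per_hour(racks: int) -> float:
    """Aggregate load rate, assuming it scales linearly with rack count."""
    if racks < 1:
        raise ValueError("need at least one rack")
    return racks * PER_RACK_TB_PER_HOUR


def hours_to_load(data_tb: float, racks: int) -> float:
    """Time to load a data set at the aggregate rate."""
    return data_tb / load_rate_tb_per_hour(racks)


# Hypothetical 100 TB data set loaded on one, two, and four racks.
for racks in (1, 2, 4):
    print(racks, load_rate_tb_per_hour(racks), hours_to_load(100.0, racks))
```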

GREENPLUM DATABASE: Polymorphic Table Storage(TM)
[Diagram: TABLE CUSTOMER partitioned by month, Mar 11 through Nov 11; column-oriented for cold data, row-oriented for hot data]
- Enable Information Lifecycle Management (ILM)
- Storage types can be mixed within a table or database
- Four table types: heap, row-oriented AO, column-oriented, external
- Block compression: Gzip (levels 1-9), QuickLZ
- Provide the choice of processing model for any table or partition
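
The hot/cold split across monthly partitions can be expressed as a small policy function. This is a sketch of the lifecycle idea only: the three-month "hot" window, the reference date, and the dictionary labels are assumptions chosen for the example, and they are not Greenplum storage options or DDL.

```python
from datetime import date


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer`."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def storage_for(partition_month: date, today: date, hot_months: int = 3) -> dict:
    """Pick a storage layout for one monthly partition.

    Recent (hot) partitions stay row-oriented; older (cold) partitions
    become column-oriented and compressed. `hot_months` is an assumed cutoff.
    """
    if months_between(partition_month, today) < hot_months:
        return {"orientation": "row", "compression": None}
    return {"orientation": "column", "compression": "gzip"}


# Monthly partitions Mar 2011 through Nov 2011, as in the slide's diagram.
today = date(2011, 11, 15)  # hypothetical "now"
for month in range(3, 12):
    partition = date(2011, month, 1)
    print(partition.strftime("%b %y"), storage_for(partition, today))
```

With these assumptions, September through November stay row-oriented and March through August are treated as cold.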

GREENPLUM DATABASE: In-Database Analytics
Bringing the power of parallelism to commonly used modeling and analytics functions
- In-database analytics: SAS HPA, Access, and Scoring Accelerator
- MADlib: an open-source library of advanced analytics functions
- Analytics extensions supported, including PostGIS (geospatial support), PL/R (statistical computing), PL/Java, PL/Perl, etc.

GREENPLUM PARTNERS: SAS and Greenplum
A Strategic Partnership for High-Performance Computing
- Access relational data sets for agile analysis: SAS/ACCESS provides fast, transparent, and secure access to Greenplum data.
- Leverage database scalability for rapid model deployment: SAS Scoring Accelerator publishes models for execution in parallel across the Greenplum cluster.
- Build complex models at massive scale: the SAS High-Performance Analytics Appliance combines SAS In-Memory Analytics with Greenplum parallelism to produce record-breaking scalability and performance.

GREENPLUM DATABASE: MADlib
Scalable in-database analytics
- Data-parallel:
  - Mathematical algorithms
  - Statistical algorithms
  - Machine learning algorithms
- Supports structured and unstructured data
Delivered via open source
- Accessibility
- Skill development
- Converge business, academic, and open-source communities
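
What "data-parallel" means for a statistical algorithm can be shown with simple linear regression: each segment reduces its local rows to a handful of sums, and the final fit needs only the merged sums. This is a sketch of the general pattern, not MADlib's implementation; the data and function names are hypothetical.

```python
from functools import reduce

# Hypothetical (x, y) rows spread over three segments; here y = 2x + 1 exactly.
segment_rows = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)],
    [(6.0, 13.0), (7.0, 15.0), (8.0, 17.0)],
]


def partial_sums(rows):
    """Per-segment step: reduce local rows to (n, sum x, sum y, sum x^2, sum xy)."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def merge(a, b):
    """Combine two partial states; addition makes the order irrelevant."""
    return tuple(u + v for u, v in zip(a, b))


def fit(partials):
    """Final step on the merged state: ordinary least-squares slope and intercept."""
    n, sx, sy, sxx, sxy = reduce(merge, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


slope, intercept = fit([partial_sums(rows) for rows in segment_rows])
print(slope, intercept)
```

Because the partial states merge by addition, the result is the same regardless of how the rows are split across segments.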

MADlib In-Database Analytical Functions
Descriptive Statistics
- Quantile
- Profile
- CountMin (Cormode-Muthukrishnan) Sketch-based Estimator
- FM (Flajolet-Martin) Sketch-based Estimator
- MFV (Most Frequent Values) Sketch-based Estimator
- Frequency
- Histogram
- Bar Chart
- Box Plot Chart
Modeling
- Latent Dirichlet Allocation Topic Modeling
- Correlation Matrix
- Association Rule Mining
- K-Means Clustering
- Naïve Bayes Classification
- Linear Regression
- Logistic Regression
- Support Vector Machines
- SVD Matrix Factorisation
- Decision Trees/CART
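
To give a feel for the sketch-based estimators in the list, here is a minimal Count-Min sketch, which approximates item frequencies in fixed memory and never underestimates. It is a generic textbook version written for illustration, not MADlib's code; the table dimensions and the SHA-256-based hashing are assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator; estimates are never below the true count."""

    def __init__(self, width: int = 256, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent-looking hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        """Add another sketch of the same shape, e.g. one built on another segment."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two hypothetical segments each summarize their local values, then merge.
left, right = CountMinSketch(), CountMinSketch()
for word in ["a", "b", "a", "c"]:
    left.add(word)
for word in ["a", "c", "c"]:
    right.add(word)
left.merge(right)
print(left.estimate("a"), left.estimate("b"), left.estimate("c"))
```

Sketches of the same shape merge by element-wise addition, which is why this kind of estimator suits a setting where each segment summarizes its own data.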

Greenplum Analytics Labs
Packaged solutions that produce business value and actionable results
- Accelerate analytics capabilities on your data with your analysts
- Leverage the expertise of Greenplum's Data Scientists
- Establish a strategic vision for analytics development

Greenplum Delivers Choice & Flexibility
Greenplum Data Computing Appliance
- Choose Greenplum Database and/or Hadoop modules in ¼-rack increments
- Scale up by adding your choice of additional modules
- Minimal time to value
Greenplum Software Solutions
- Greenplum Database, Hadoop, & Chorus on your x86 hardware
- Flexibility for any workload or environment
- Perpetual or subscription licenses

GREENPLUM DCA: Seamless Infrastructure Integration
- EMC Data Domain: efficient backup & restore
- Isilon: scale-out storage for Big Data staging
- EMC VMAX or VNX: SAN Mirror for advanced storage management
- EMC VMAX SRDF, EMC Data Domain Replication: for disaster recovery

GREENPLUM DATABASE: Simple To Manage
Greenplum Command Center
- Complete platform management and control
Greenplum Package Manager
- Automates install, uninstall, update, and query for analytics extensions
- Supports package migration during upgrade, segment recovery, expansion, and standby initialization

Innovative Companies Using Greenplum

Powerful Partner Ecosystem
[Partner logos, including Discovix]

Thank you
ofir.manor@emc.com
Downloads, documentation, whitepapers, etc.: http://www.greenplum.com
A copy of this presentation will be available on the event's web site.
Next Greenplum workshop in Hungary: 04 July, 2012. Register now at EMC Hungary or Avnet Hungary.