Big + Fast + Safe + Simple = Lowest Technical Risk


The Synergy of Greenplum and Isilon Architecture in HP Environments
Steffen Thuemmel (Isilon)
Andreas Scherbaum (Greenplum)

Our problem

What is Big Data?
- Big Data Storage
- Big Data Analytics
- Big Data Storage that supports Big Data Analytics
Source: http://news.cnet.com/8301-21546_3-20029378-10253464.html
Big Data is always Big Data, no matter what the company size is.

The Digital Universe 2009-2020
- 2009: 0.8 ZB
- 2020: 35.2 ZB (zettabytes)
- Growing by a factor of 44
Source: IDC Digital Universe Study, sponsored by EMC, May 2010
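The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below multiplies the 2009 size by the quoted factor and also derives the yearly rate that would compound to that factor. The constant-rate model is an assumption made for illustration; the slide only states the two endpoints.

```python
# Figures quoted on the slide (IDC Digital Universe Study, May 2010).
SIZE_2009_ZB = 0.8
GROWTH_FACTOR = 44
START_YEAR, END_YEAR = 2009, 2020


def projected_size(start_size: float, factor: float) -> float:
    """Size after growing by a multiplicative factor."""
    return start_size * factor


def implied_annual_growth(factor: float, years: int) -> float:
    """Constant yearly rate that compounds to `factor` over `years`.

    Assumes smooth compound growth, which the slide does not claim.
    """
    return factor ** (1 / years) - 1


size_2020 = projected_size(SIZE_2009_ZB, GROWTH_FACTOR)
rate = implied_annual_growth(GROWTH_FACTOR, END_YEAR - START_YEAR)

if __name__ == "__main__":
    print(f"2020 estimate: {size_2020:.1f} ZB")
    print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, a factor of 44 over 11 years corresponds to roughly 41% growth per year.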

What's a Zettabyte?
Name       Size
Terabyte   2^40 ≈ 10^12 = 1,000,000,000,000
Petabyte   2^50 ≈ 10^15 = 1,000,000,000,000,000
Exabyte    2^60 ≈ 10^18 = 1,000,000,000,000,000,000
Zettabyte  2^70 ≈ 10^21 = 1,000,000,000,000,000,000,000
Description:
- 1 TB: Tape robot
- 10 TB: U.S. Library of Congress
- 1 PB: 3 years of EOS data (2001)
- 20 PB: 1995 production of hard disk drives
- 5 EB: all words ever spoken by human beings
- ? Can you help me?
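The powers of two and powers of ten listed for each unit are close but not equal, and the gap widens with each step. A minimal sketch that computes the ratio for every unit on the slide:

```python
# Binary exponent and decimal exponent for each unit on the slide.
UNITS = {
    "Terabyte": (40, 12),
    "Petabyte": (50, 15),
    "Exabyte": (60, 18),
    "Zettabyte": (70, 21),
}


def binary_vs_decimal(bin_exp: int, dec_exp: int) -> float:
    """Ratio of the binary size (2**bin_exp) to the decimal size (10**dec_exp)."""
    return 2**bin_exp / 10**dec_exp


ratios = {name: binary_vs_decimal(b, d) for name, (b, d) in UNITS.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: binary size is {ratio - 1:.1%} larger than decimal")
```

The binary terabyte is about 10% larger than the decimal one, and the binary zettabyte about 18% larger, so the unit convention matters when sizing storage at this scale.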

Data Formats in Big Data
Definition | Description | Solution
Structured | Relational database (i.e., full atomicity, consistency, isolation, and durability [ACID] support, referential integrity, strong type, and schema support). | Greenplum Database
Semistructured | Structured data files that include metadata and are self describing (e.g., netCDF and HDF5). | Greenplum Database, Greenplum Hadoop
Semistructured | XML data files that are self describing and defined by an XML schema. | Greenplum Database, Greenplum Hadoop
Quasistructured | Web clickstream data that contains some inconsistencies in data values and format. | Greenplum Hadoop, Greenplum Database
Unstructured | Text documents amenable to text analytics. | Greenplum Hadoop
Unstructured | Images and video, blobs. | Isilon

Big Data
EMC's Big Data Stack
- Knowledge Worker
- Data Scientist
- Integration Tools
- Data Science Collaboration
- Data Computing Platform
- Virtualization
- Analytic Tools
- Security, Information Management & Storage

Big Data
Big Data Storage:
- Founded 2000
- Rapidly growing
- Since 2010 a division of EMC
- Leader in Scale Out NAS
- 2000 customers worldwide
Big Data Analytics:
- Founded 2003
- Rapidly growing
- Since 2010 a division of EMC
- (not yet)
- 300+ customers worldwide

Big Data Storage
Isilon

Understand BIG DATA Workloads
Shared Infrastructure:
- Home Directories
- Image Stores (Legal, Financial, Medical)
- Business Specific Performance Applications (HPC, LS, CAD)
- Rich Media Content Creation (Marketing)
- Surveillance and Audio (Security)
- Application Logging (Security)
- Business Specific Analytics Applications (Trading, Quants)
- Web Properties / Web Logs
- Internal Development
Virtual File Stores:
- VMDKs
- VHDs
- VMs
- VDIs
Data Repositories:
- Active Archives
- Disk Based Backup
- Rich Media Repository
- Business Archives

Isilon Solution for BIG DATA Workloads
Workloads: Virtualization, Home Directories, Archive, New Applications
- Dynamic, Virtual File Stores
- Departmental Collaboration
- Active Archive
- Snapshot and D2D Repositories
- Fast Growing, Performance Sensitive
- Auto-Tiering of Files for Performance and Operational Efficiency
Platforms: S200, X200, 108NL, Blended Platform
- Blazing performance: 1.6 million SPECsfs ops from a single cluster
- Balanced throughput and capacity: up to 50 GB/s from a single cluster
- Over 15 PB in a single cluster
- Disaster recovery
- Industry leading economics
- Parallel IO performance
- Max Throughput
- Large Scale Access
- Integrated and In Depth Analytics
- Simple and Efficient Provisioning

Isilon Innovations: Meeting the Challenges of BIG DATA
- Operational Simplicity
- Scalable Reliability
- Application Accountability

Core Innovation
Isilon's OneFS Scale-Out Operating System

Isilon Storage Evolution Starts with Simplicity
- Enable rapid deployment of latest technology
- Greener, faster every year
- No server or client mount point or application changes
- Co-existence of generations of Isilon nodes, not migration
- Heterogeneous nodes: mix and match
- Push-button retire
- No manual data migration
- InfiniBand core network
Nodes are self contained:
- Storage
- Controller
- IP connectivity
- Connect via IB

Isilon - Linear Predictable Scalability
- Grow capacity and performance as needed
- Don't over-buy and overprovision
- No change in admin time or effort as storage grows
- Reduce management time and costs
- Results in high business efficiency
- NO DATA MIGRATION
- Improve productivity with new insights

Big Data Storage Analytics

Big Data Analytics
- Structured Data: Greenplum Database
- Unstructured Data: Greenplum Hadoop

Greenplum Database
- Massively parallel, shared-nothing database
- Based on PostgreSQL
- Scales linearly, everywhere
- Runs on commodity hardware
- SQL + MapReduce
- Free Community Edition
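To make "shared-nothing" concrete, here is a minimal sketch of the general idea: rows are assigned to segments by hashing a distribution key, each segment aggregates only its own rows, and a coordinator merges the partial results. This is an illustration of the technique, not Greenplum's actual hashing or executor; the segment count, the `crc32` hash, and all function names are assumptions.

```python
from collections import defaultdict
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment with a stable hash."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Place each (key, value) row on exactly one segment."""
    segments = defaultdict(list)
    for key, value in rows:
        segments[segment_for(key, num_segments)].append((key, value))
    return segments


def local_sum(segment_rows):
    """Work one segment does on its own rows, with no shared state."""
    totals = defaultdict(int)
    for key, value in segment_rows:
        totals[key] += value
    return dict(totals)


def parallel_sum(rows, num_segments: int = NUM_SEGMENTS):
    """Merge per-segment partial results, as a coordinator would."""
    result = {}
    for segment_rows in distribute(rows, num_segments).values():
        # A key never spans segments, so partial results merge without conflicts.
        result.update(local_sum(segment_rows))
    return result


sales = [("de", 10), ("us", 7), ("de", 5), ("fr", 3), ("us", 1)]
totals = parallel_sum(sales)
```

Because every row for a given key lands on the same segment, the grouping step needs no communication between segments, which is what lets this style of system scale by adding nodes.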

Greenplum Database: Cluster Extensibility
- If you need to add 2 TB, add a node; no need to add a rack [1]
- gnet Software Interconnect
[1] Currently requires an RPQ

Fast Data Load
- We can demonstrate 10 TB/hour loading from 10 ETL servers to a single rack
- Diagram labels: ETL Servers, gnet Software Interconnect
- Plus the power of SQL as you load, to perform parallel transformations in flight via external tables
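The quoted load rate is easier to reason about as a per-second and per-server figure. The sketch below converts it, assuming decimal terabytes (10^12 bytes) and an even spread across the ETL servers; neither assumption is stated on the slide.

```python
TB = 10**12  # decimal terabyte, assumed; the slide does not name the convention

LOAD_RATE_TB_PER_HOUR = 10  # figure from the slide
ETL_SERVERS = 10            # figure from the slide


def bytes_per_second(tb_per_hour: float) -> float:
    """Convert a TB/hour rate to bytes per second."""
    return tb_per_hour * TB / 3600


def hours_to_load(dataset_tb: float, tb_per_hour: float = LOAD_RATE_TB_PER_HOUR) -> float:
    """Time to load a dataset at a sustained rate."""
    return dataset_tb / tb_per_hour


aggregate_gb_s = bytes_per_second(LOAD_RATE_TB_PER_HOUR) / 10**9
# Assumes the load is spread evenly over all ETL servers.
per_server_mb_s = bytes_per_second(LOAD_RATE_TB_PER_HOUR / ETL_SERVERS) / 10**6

if __name__ == "__main__":
    print(f"Aggregate: {aggregate_gb_s:.2f} GB/s")
    print(f"Per ETL server: {per_server_mb_s:.0f} MB/s")
    print(f"50 TB dataset: {hours_to_load(50):.1f} hours")
```

Under these assumptions each ETL server has to sustain a little under 280 MB/s, which is a useful sanity check against the network links feeding the rack.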

Polymorphic Storage
Time axis (month partitioning): Today, -3 months, -12 months, -24 months, -120 months
Compression strategy and impacts, from newest to oldest data:
- No compression: updates possible
- Row compression: average compression rate = 2.5; append only; better performance because of reduced I/O
- Column compression (low level: QuickLZ or zlib-1): average compression rate = 10; append only; better performance if not all columns are selected
- Column compression (high level: zlib-9): average compression rate = 20; append only; lower performance than low-level compression because of the decompression step
The way you access the facts changes as the data ages, so we should change the way the data is stored.
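The idea of changing storage as data ages can be sketched as a lookup from partition age to a storage policy. The compression rates below are the averages quoted on the slide; the mapping of each strategy to an age band (3, 12, 24, and 120 months) is an assumption, since the transcript does not tie each time marker to a strategy. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    max_age_months: int       # assumed upper bound of the age band
    compression_ratio: float  # average rate quoted on the slide
    append_only: bool


# Age bands are an assumption; only the ratios come from the slide.
POLICIES = [
    StoragePolicy("no compression", 3, 1.0, False),
    StoragePolicy("row compression", 12, 2.5, True),
    StoragePolicy("column compression, low level", 24, 10.0, True),
    StoragePolicy("column compression, high level", 120, 20.0, True),
]


def policy_for(age_months: int) -> StoragePolicy:
    """Pick the first policy whose age band covers the partition."""
    for policy in POLICIES:
        if age_months < policy.max_age_months:
            return policy
    return POLICIES[-1]  # older than every band: keep the strongest compression


def stored_size_gb(raw_gb: float, age_months: int) -> float:
    """Estimated on-disk size of a partition at the average compression rate."""
    return raw_gb / policy_for(age_months).compression_ratio
```

For example, `stored_size_gb(100, 6)` estimates 40 GB for a six-month-old 100 GB partition under row compression, while the same partition at 36 months would be estimated at 5 GB.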

Greenplum Hadoop
- Implementation in C
- 100% compatible
- Includes HDFS (plus NFS)
- Includes MapReduce, Hive
- Fault tolerance: Name Node and Job Tracker
- Free Community Edition

Hadoop Pain Points
Monitoring:
- Poor job and application monitoring solution
- Non-existent performance monitoring
Operability and Manageability:
- Complex system configuration and manageability
- No data format interoperability & storage abstractions
Performance:
- Poor dimensional lookup performance
- Very poor random access and serving performance

Conclusion
- EMC provides tools for Big Data
- Continues supporting communities
- Provides support and service

Q&A

THANK YOU