IBM General Parallel File System (GPFS) 3.5 File Placement Optimizer (FPO)


IBM General Parallel File System (GPFS) 3.5 File Placement Optimizer (FPO)
Rick Koopman, IBM Technical Computing Business Development Benelux
Rick_koopman@nl.ibm.com

Enterprise-class replacement for HDFS: GPFS 3.5 compared with HDFS across five areas.
Performance: TeraSort (large reads), HBase (small writes), metadata-intensive workloads
Enterprise readiness: POSIX compliance, metadata replication, distributed name node
Protection & Recovery: snapshots, asynchronous replication, backup
Security & Integrity: access control lists
Ease of Use: policy-based ingest

A typical HDFS environment
Users submit jobs to a MapReduce cluster; filers feed data in over NFS, and HDFS sits between the Map and Reduce stages.
- Uses disk local to each server
- Aggregates the local disk space into a single, redundant shared file system
- The open-source standard file system used in partnership with Hadoop MapReduce

MapReduce environment using GPFS-FPO (File Placement Optimizer)
As before, users submit jobs to the MapReduce cluster and filers feed data in over NFS, but GPFS-FPO takes the place of HDFS.
- Uses disk local to each server
- Aggregates the local disk space into a single, redundant shared file system
- Designed for MapReduce workloads
- Unlike HDFS, GPFS-FPO is POSIX compliant, so data maintenance is easy
- Intended as a drop-in replacement for open-source HDFS (the IBM BigInsights product may be required)

GPFS-FPO: advanced storage for MapReduce data
- HDFS: NameNode is a single point of failure. GPFS: no single point of failure; distributed metadata.
- HDFS: large block sizes, poor support for small files. GPFS: variable block sizes suited to multiple types of data and data access patterns.
- HDFS: non-POSIX file system with obscure commands. GPFS: POSIX file system, easy to use and manage.
- HDFS: difficult to ingest data, special tools required. GPFS: policy-based data ingest.
- HDFS: single-purpose, Hadoop MapReduce only. GPFS: versatile and multi-purpose.
- HDFS: not recommended for critical data. GPFS: enterprise-class advanced storage features.
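Because GPFS-FPO presents a POSIX interface, ordinary file APIs operate directly on the mount with no special client. A minimal sketch of what that means in practice, assuming a GPFS-FPO file system mounted at a path such as /gpfs/bigdata (the path is hypothetical; the snippet uses a temporary directory as a stand-in so it runs anywhere):

```python
import os
import tempfile

# Stand-in for a GPFS-FPO mount point such as /gpfs/bigdata (hypothetical path).
# On a real GPFS-FPO cluster these same calls work unchanged because the file
# system is POSIX compliant; on HDFS they would require the hadoop CLI or libhdfs.
mount = tempfile.mkdtemp()

path = os.path.join(mount, "ingest", "part-00000.csv")
os.makedirs(os.path.dirname(path), exist_ok=True)

# Plain POSIX writes, renames, and stat -- operations HDFS cannot do in place.
with open(path, "w") as f:
    f.write("id,value\n1,42\n")

os.rename(path, path + ".done")          # atomic rename, standard POSIX
size = os.stat(path + ".done").st_size   # ordinary stat
print(size)                              # prints 14
```

The same property is what makes maintenance tasks (backup tools, rsync, policy scripts) work against GPFS-FPO without Hadoop-specific tooling.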

IBM Storage Next-Generation Archiving Solutions: LTFS Storage Platforms

Data Protection / Operational / Technical Computing: Powerful. Comprehensive. Intuitive.
The problem: network disk growth
- Manageability and cost
- Data mix: rich media, databases, etc.
- Usage mix: active, time-sensitive access alongside static, immutable data
- A single user-defined namespace that is large and growing bigger
- Difficult to protect and back up: cost, backup windows, time to recovery
- The data mix reduces the effectiveness of compression and dedupe

The solution: tiered network storage
- Single file system view over a user-defined namespace
- High-use data (databases, email, etc.) stays on disk; policy-based tier migration moves static data, rich media, unstructured and archive data to LTFS tape
- Smaller, scalable, and easier to protect: faster time to recovery and a smaller backup footprint
- Time-critical applications and data remain on fast storage; lower-cost, scalable tape holds the data types suited to it (static data, rich media, etc.)
- Replication-based backup strategies

Smarter storage across sites (Los Angeles, London, Tokyo), accessed over NFS/CIFS:
- Distributed data namespace with a single file view
- Load balancing and policy-based migration
- Storage distribution: reduced storage cost and data monetization
- Each node (Node 1 through Node 4) runs GPFS, DSM, and LTFS LE over tiers of SSD, disk, and LTFS tape

IBM System x GPFS Storage Server: A Revolution in HPC. Intelligent Cluster Management!

A scalable building-block approach to storage
A complete storage solution: data servers, disk (SSD and NL-SAS), software, InfiniBand and Ethernet, built from x3650 M4 servers with twin-tailed JBOD disk enclosures.
- Model 24 (light and fast): 4 enclosures, 20U, 232 NL-SAS, 6 SSD, 10 GB/s
- Model 26 (HPC workhorse): 6 enclosures, 28U, 348 NL-SAS, 6 SSD, 12 GB/s
- High-density HPC option: 18 enclosures in two standard 42U racks, 1044 NL-SAS, 18 SSD, 36 GB/s

Mean time to data loss: 8+2 vs. 8+3 parity

         50 disks           200 disks          50,000 disks
  8+2    200,000 years      50,000 years       200 years
  8+3    250 billion years  60 billion years   230 million years

These figures assume uncorrelated failures and hard read errors. Simulation assumptions: disk capacity = 600 GB, MTTF = 600k hours, hard-error rate = 1 in 10^15 bits, 47-HDD declustered arrays, uncorrelated failures. These MTTDL figures are due to hard errors; AFR (2-fault-tolerant) = 5 x 10^-6, AFR (3-fault-tolerant) = 4 x 10^-12.
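The orders of magnitude in this table can be sanity-checked with the textbook Markov-chain approximation for an n-disk array tolerating f concurrent failures, MTTDL ~= MTTF^(f+1) / (n*(n-1)*...*(n-f) * MTTR^f). This is a simplification of the slide's simulation (it ignores hard read errors), and the 24-hour rebuild time below is an assumption, so only the relative gap between 8+2 and 8+3 should be read from it:

```python
import math

def mttdl_hours(n_disks, faults_tolerated, mttf_h=600_000.0, mttr_h=24.0):
    """Approximate mean time to data loss (hours) for an n-disk array
    tolerating `faults_tolerated` concurrent failures, via the classic
    Markov-chain formula under uncorrelated failures:
        MTTDL ~= MTTF^(f+1) / (n*(n-1)*...*(n-f) * MTTR^f)
    Hard read errors are ignored, so absolute values differ from the
    slide's simulation; the 8+2 vs. 8+3 gap is the point.
    """
    f = faults_tolerated
    denom = math.prod(range(n_disks - f, n_disks + 1)) * mttr_h ** f
    return mttf_h ** (f + 1) / denom

HOURS_PER_YEAR = 8766
years_8p2 = mttdl_hours(10, 2) / HOURS_PER_YEAR   # 8 data + 2 parity strips
years_8p3 = mttdl_hours(11, 3) / HOURS_PER_YEAR   # 8 data + 3 parity strips
print(f"8+2: {years_8p2:.3g} years, 8+3: {years_8p3:.3g} years")
```

Even this crude model shows each additional parity strip buying several orders of magnitude of MTTDL, matching the thousands-of-years vs. billions-of-years contrast in the table.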

De-clustering: bringing parallel performance to disk maintenance

Traditional RAID: narrow data+parity arrays
- 20 disks, 5 disks per traditional RAID array; 4x4 RAID stripes (data plus parity)
- After a disk fails, the rebuild uses the I/O capacity of only the array's 4 surviving disks
- With striping across all arrays, all file accesses are throttled by the rebuilding array's overhead

Declustered RAID: data+parity distributed over all disks
- 20 disks in 1 declustered array; 16 RAID stripes (data plus parity)
- After a disk fails, the rebuild uses the I/O capacity of the array's 19 surviving disks
- Load on file accesses is reduced by 4.8x (= 19/4) during the array rebuild
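The 4.8x figure follows directly from the number of surviving disks that share the rebuild I/O; a small sketch of the arithmetic, using the slide's 20-disk example:

```python
def rebuild_speedup(disks_total, disks_per_group):
    """Ratio of surviving disks sharing rebuild I/O: a declustered array
    spreads the rebuild over all remaining disks, while traditional RAID
    confines it to the remaining disks of the one affected group."""
    declustered_survivors = disks_total - 1      # 19 of 20 disks help rebuild
    traditional_survivors = disks_per_group - 1  # only 4 of the group's 5 disks
    return declustered_survivors / traditional_survivors

print(rebuild_speedup(20, 5))  # 4.75, i.e. the ~4.8x on the slide
```

The same ratio scales with array size: the wider the declustered array, the smaller the per-disk rebuild burden.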

Low-penalty disk rebuild
[Figure: rebuild overhead over time after a disk failure, comparing a traditional rebuild (separate heavy read and write phases on one array) with a declustered rebuild (small read-write operations spread across all disks).] Declustering reduces rebuild overhead by 3.5x.