Object Storage in Cloud Computing and Embedded Processing
Jan Jitze Krol, Systems Engineer
DDN: We Accelerate Information Insight
DDN is a leader in massively scalable platforms and solutions for Big Data and cloud applications.
Established: 1998
Revenue: $226M (2011); profitable, fast growth
Main office: Sunnyvale, California, USA
Employees: 600+ worldwide
Worldwide presence: 16 countries
Installed base: 1,000+ end customers in 50+ countries
Go to market: global partners, resellers, direct
World-renowned and award-winning
Fact: the amount of data is growing, fast.
[Chart: data stored worldwide in exabytes for 1986, 1993, 2000, 2007, 2010 and 2011, rising steeply toward roughly 2,000 EB]
Disk drives grow bigger, not faster or better
Disk drives haven't changed that much over the last decade. They just store more: ~40 GB in 2002, 4,000 GB in 2012.
Access times are about the same, ~5-10 ms.
Write/read speeds are about the same, ~50-100 MB/s.
The read error rate is about the same: 1 error per 10^14 bits read, or one guaranteed read error per ~12 TB read.
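The ~12 TB figure follows directly from the quoted error rate; as a quick check of the arithmetic:

\[
\frac{10^{14}\ \text{bits}}{8\ \text{bits per byte}} = 1.25 \times 10^{13}\ \text{bytes} \approx 12.5\ \text{TB}
\]

So statistically, reading a 4 TB drive end to end about three times is enough to hit one unrecoverable read error.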
Agenda
Highlight two of DDN's initiatives for dealing with large repositories of data:
The Web Object Scaler (WOS)
Embedded data processing, aka in-storage processing
Challenges of exabyte-scale storage
Exponential growth of data, and the expectation that all data will be available everywhere on the planet.
Management of this tidal wave of data becomes increasingly difficult with regular NAS:
Introduced in the 90s (when Exabyte was a tape drive vendor), with the 16 TB file system size limits that many still have
Management intensive: LUNs, volumes, aggregates; heroically management intensive at scale
Antiquated resiliency techniques that don't scale: RAID (a disk is the unit in RAID, whereas drive vendors consider a sector the unit), cluster failover, standby replication, backup, file allocation tables, extent lists
Focused on structured data transactions (IOPS): file locking overhead adds cost and complexity
Hyperscale storage: the Web Object Scaler (WOS). No file system; hyperscale; distributed collaboration.
What is WOS?
A data store for immutable data, which means we don't need to worry about locking: no two systems will write to the same object.
Data is stored in objects, written with policies; policies drive replication.
Objects live in zones. Data protection is achieved by replication or erasure coding, either within a zone or between zones.
Data is available from every WOS node in the cluster.
Only three operations are possible: PUT, GET and DELETE (a sketch of what that looks like to a client follows below).
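To make the three-operation model concrete, here is a minimal Python sketch of a PUT/GET/DELETE round trip over an HTTP/REST interface like the ones listed on the next slide. The node address, the endpoint paths, and the policy and OID headers are illustrative assumptions, not the documented WOS wire protocol.

```python
import requests

WOS_NODE = "http://wos-node.example.com:8085"   # hypothetical WOS node address
POLICY = "replicate-twice"                      # hypothetical policy name

def put_object(data: bytes) -> str:
    """Store an immutable object; the cluster returns its object ID (OID)."""
    r = requests.post(f"{WOS_NODE}/cmd/put",           # endpoint path is an assumption
                      data=data,
                      headers={"x-ddn-policy": POLICY})
    r.raise_for_status()
    return r.headers["x-ddn-oid"]                      # OID header name is an assumption

def get_object(oid: str) -> bytes:
    """Fetch an object by OID from any WOS node in the cluster."""
    r = requests.get(f"{WOS_NODE}/cmd/get", headers={"x-ddn-oid": oid})
    r.raise_for_status()
    return r.content

def delete_object(oid: str) -> None:
    """Delete the object; since objects are immutable, this is the only change they ever see."""
    requests.post(f"{WOS_NODE}/cmd/delete", headers={"x-ddn-oid": oid}).raise_for_status()

if __name__ == "__main__":
    oid = put_object(b"immutable payload")
    assert get_object(oid) == b"immutable payload"
    delete_object(oid)
```

Because objects are immutable, the client never needs read/write locks; it only ever creates, reads or removes whole objects.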
Universal access to support a variety of applications
WOS native object interface: C++, Python, Java, PHP, HTTP/REST interfaces; PUT, GET, DELETE
WOS Cloud: S3 and WebDAV APIs; secure file sharing; multi-tenancy
WOS Access: NFS, CIFS; LDAP/AD; HA
irods: rules engine, rich metadata, automation
All on top of the WOS hyperscale geo-distributed storage cloud.
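Since WOS Cloud exposes an S3-compatible API, standard S3 tooling can talk to it. Below is a minimal sketch using boto3; the endpoint URL, bucket name and credentials are placeholders, and the exact behaviour depends on how a given S3 gateway is configured.

```python
import boto3

# All names below are illustrative placeholders for an S3-compatible WOS Cloud gateway.
s3 = boto3.client(
    "s3",
    endpoint_url="http://wos-cloud.example.com",  # assumed gateway address
    aws_access_key_id="TENANT_KEY",
    aws_secret_access_key="TENANT_SECRET",
)

# Write an object through the S3 API, then read it back.
s3.put_object(Bucket="sequencer-runs", Key="run42/sample.fastq", Body=b"ACGT...")
obj = s3.get_object(Bucket="sequencer-runs", Key="run42/sample.fastq")
print(obj["Body"].read()[:10])
```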
What can you build with this?
[Diagram: a sequencer center (IonTorrent PGM, Illumina Hi-Seq) and a library feeding one shared storage system, accessed from remote sites 1, 2 and 3.]
Note: multiple namespaces and multiple storage protocols, but ONE shared storage system.
Storing exabytes, zettabytes or yottabytes of data is only part of the story. The data needs to be processed too, which means access must be as fast as possible.
What is embedded processing? And why?
Do data-intensive processing as close to the storage as possible: bring computing to the data instead of bringing data to the computing. HADOOP is an example of this approach.
Why embedded processing? Moving data is a lot of work and needs a lot of infrastructure: the client sends a request to the storage, and the storage sends the data back across the network. What we really want is to run the processing where the data already is, so the data does not have to move. So how do we do that?
Storage Fusion Architecture (SFA)
[Diagram: 8 x IB QDR or 16 x FC8 host ports; interface virtualization across eight interface processors; each controller has system memory, RAID processors and high-speed cache, with a cache link between controllers; internal SAS switching connects to the drive enclosures; data protection via RAID 5/6 (P and Q parity) across the disks.]
Up to 1,200 disks in an SFA 10K, or 1,680 disks in an SFA 12K.
Repurposing interface processors
In the block-based SFA10K platform, the IF processors are responsible for mapping virtual disks to LUNs on FC or IB. In the SFA10KE platform, the IF processors run virtual machines instead.
The RAID processors place data (or read data) directly in (or from) the VM's memory: one hop from disk to VM memory.
The storage is no longer just a block device; it is a storage appliance with processing capabilities.
One SFA10KE controller
[Diagram: 8 x IB QDR/10GbE host ports (no Fibre Channel); interface virtualization hosting four virtual machines; system memory, RAID processors and high-speed cache with a cache link to the other controller; internal SAS switching to the drives.]
Example configuration
Now we can put irods inside the RAID controllers. This gives irods the fastest possible access to the storage, because it doesn't have to go onto the network to reach a file server: it lives inside the file server.
We can put the irods catalogue, icat, on a separate VM with lots of memory and SSDs for database storage.
The following example is a mix of irods with GPFS. The same file system is also visible from an external compute cluster via GPFS running on the remaining VMs.
This is only one controller; there are 4 more VMs in the other controller. They see the same storage and can access it at the same speed.
On the SFA-12K we will have 16 VMs available, running on Intel Sandy Bridge processors (available Q3 2012).
Example configuration
[Diagram: 8 x 10GbE host ports; interface virtualization hosting four Linux virtual machines: one running icat with the SFA driver (16 GB memory allocated) and three running GPFS with the SFA driver (8 GB memory allocated each); system memory, RAID processors and high-speed cache with a cache link; internal SAS switching to RAID sets with 2 TB of SSD, 300 TB of SATA and 30 TB of SAS.]
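To give a feel for what a client-side interaction with the embedded irods server might look like in a configuration like this, here is a small sketch using the python-irodsclient package. The host, zone, credentials, paths and metadata attributes are all illustrative assumptions.

```python
from irods.session import iRODSSession

# All connection details below are hypothetical for an embedded irods/icat VM.
with iRODSSession(host="sfa10ke-vm1.example.com", port=1247,
                  user="rods", password="secret", zone="seqZone") as session:
    local_file = "sample.fastq"                              # local sequencer output
    dest = "/seqZone/home/rods/incoming/sample.fastq"        # hypothetical incoming collection

    # Upload the sequencer output into the incoming collection on the appliance.
    session.data_objects.put(local_file, dest)

    # Attach metadata so rules and queries can find and classify the run later.
    obj = session.data_objects.get(dest)
    obj.metadata.add("run_id", "run-42")
    obj.metadata.add("instrument", "Illumina Hi-Seq")
```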
Running microservices inside the controller
Since irods runs inside the controller, we can now run irods microservices right on top of the storage. The storage has become an irods appliance that speaks irods natively.
We could, for example, create hot directories that kick off processing depending on the type of incoming data (a sketch of the idea follows below).
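In the appliance itself this is done with irods rules and microservices firing on incoming data; purely as an illustration of the hot-directory idea, here is a Python sketch that polls an incoming irods collection and launches a hypothetical processing command for each new FASTQ object. The collection path, polling interval and the `process_run.sh` command are assumptions.

```python
import time
import subprocess
from irods.session import iRODSSession

HOT_COLLECTION = "/seqZone/home/rods/incoming"   # hypothetical hot directory
POLL_SECONDS = 30

def watch(session: iRODSSession) -> None:
    """Poll the hot collection and kick off processing for each new FASTQ object."""
    seen = set()
    while True:
        coll = session.collections.get(HOT_COLLECTION)
        for obj in coll.data_objects:
            # Only FASTQ files trigger processing; everything else is ignored.
            if obj.path not in seen and obj.name.endswith(".fastq"):
                seen.add(obj.path)
                # Hypothetical downstream step, e.g. submitting a cluster job.
                subprocess.Popen(["process_run.sh", obj.path])
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    with iRODSSession(host="sfa10ke-vm1.example.com", port=1247,
                      user="rods", password="secret", zone="seqZone") as session:
        watch(session)
```

The point of the slide is that none of this polling has to leave the controller: the equivalent logic runs as rules and microservices on the storage itself, triggered the moment data arrives.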
DDN SFA10KE with irods and the GRIDScaler parallel file system embedded in the system
Here is a configuration with irods for data management and GRIDScaler for fast scratch space: eight virtual machines across the two controllers host the irods/icat server and DDN GRIDScaler NSD servers, giving massive throughput to the backend storage.
Data can come in from clients or from devices such as sequencers. Clients send data to the built-in irods server, which registers the data and its metadata with irods. irods was configured with a rule that triggers on the incoming data and starts a conversion; after the conversion, another microservice submits a job on the compute cluster to process the uploaded, pre-processed data. The cluster reads the data via parallel paths through DDN GRIDScaler and writes back the result of the processing. A further irods rule requires that a copy of the results is sent to a remote irods server.
Faster access means faster processing and faster results. All of this happened because of a few rules in irods that triggered on the incoming data; in other words, the incoming data drives the processing. Note that the data management is done on the storage itself.
Thank You
Backup slides
Object storage explained
Object storage stores data into containers called objects. Each object has both data and user-defined and system-defined metadata (a set of attributes describing the object).
File systems were designed for individual computers, and later for limited shared concurrent access, not for storing billions of files globally.
Objects are stored in an infinitely large flat namespace that can contain billions of files without file system complexity.
Intelligent WOS objects
Sample object ID (OID): ACuoBKmWW3Uw1W2TmVYthA
Each object carries:
Signature: a random 64-bit key to prevent unauthorized access to WOS objects
Full file or sub-object
Policy, e.g. "replicate twice; zones 1 & 3"
Checksum: a robust 64-bit checksum to verify data integrity during every read
User metadata: key/value pairs or binary, e.g. Object = Photo, Tag = Beach, thumbnails
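As a minimal sketch of what such an object record could hold, here is a Python stand-in with the fields named on the slide. The field names, the OID encoding and the checksum choice (the slide does not name an algorithm, so a 64-bit truncation of SHA-256 is used) are assumptions, not the WOS on-disk format.

```python
import os
import base64
import hashlib
from dataclasses import dataclass, field

def _random_oid() -> str:
    # Hypothetical OID encoding, merely shaped like the sample on the slide.
    return base64.urlsafe_b64encode(os.urandom(16)).decode().rstrip("=")

@dataclass
class WosObjectSketch:
    """Illustrative stand-in for a WOS object: data plus the fields named on the slide."""
    data: bytes
    policy: str                                   # e.g. "replicate twice; zones 1 & 3"
    user_metadata: dict = field(default_factory=dict)
    signature: int = field(default_factory=lambda: int.from_bytes(os.urandom(8), "big"))
    oid: str = field(default_factory=_random_oid)

    def checksum(self) -> int:
        # The slide only says "robust 64-bit checksum"; a truncated SHA-256 stands in here.
        return int.from_bytes(hashlib.sha256(self.data).digest()[:8], "big")

obj = WosObjectSketch(data=b"...photo bytes...",
                      policy="replicate twice; zones 1 & 3",
                      user_metadata={"Object": "Photo", "Tag": "Beach"})
print(obj.oid, hex(obj.checksum()))
```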
WOS performance metrics

                 Performance per node                              Max system performance (256 nodes)
                 Large objects           Small objects (~20 KB)    Small objects (~20 KB)
Network          Write MB/s  Read MB/s   Writes/s    Reads/s       Writes/day          Reads/day
1 GbE            200         300         1,200       2,400         25,214,976,000      50,429,952,000
10 GbE           250         500         1,200       2,400         25,214,976,000      50,429,952,000

Benefits: greater data ingest capabilities, faster application response, fewer nodes to obtain equivalent performance.
WOS: architected for Big Data

Hyper-scale:
3U, 16-drive WOS node (SAS/SATA) or 4U, 60-drive WOS node (SAS/SATA)
256 billion objects per cluster; 5 TB maximum object size; scales to 23 PB
Network and storage efficient; 64 zones and 64 policies; 2 PB / 11 units per rack

Global reach and data locality:
Up to 4-way replication across sites (e.g. San Francisco, New York, London, Tokyo)
Global collaboration; access the closest copy of the data; no risk of data loss

Resiliency with near-zero administration:
WOS data protection (replicated or Object Assure); self-healing; all drives fully utilized
50% faster recovery than traditional RAID; reduce or eliminate service calls

Dead simple to deploy and administer, with scalable HA and DR protection.

Universal access:
WOS Access: NAS protocols (NFS, CIFS)
Cloud platform: S3-compatible and WebDAV APIs; multi-tenancy; reporting and billing; personal storage; federation
irods store interface: custom rules (a rule for every need); rich metadata
Native object store interface: C++, Python, Java, PHP, HTTP REST; simple interface