SDS: Object or File: How to Choose?




Storage Industry is Transforming like Compute Virtualization

From monolithic appliances:
- Performance and reliability designed for ten years ago
- Embedded software tied to hardware
- Disks and components marked up multiple times along the value chain

To pure software on standard servers:
- Hardware-agnostic software on industry-standard servers
- Better performance, reliability, flexibility, and economics

SDS Replaces Traditional File Storage Silos

HW+SW acquisition costs, from low-latency/terabyte systems to high-bandwidth/exabyte systems:
- Flash: $5/GB
- SAN: $2/GB (EMC)
- NAS: $1/GB (EMC/NetApp)
- Object: $0.70/GB (Cleversafe)
- Tape: $0.10/GB (Quantum)

Software-defined storage on standard x86 spans the $0.70 to $0.10/GB range.

Scality Key Facts
- Founded: 2009; first product release: July 2010
- $85M venture capital; 150+ people
- San Francisco (HQ), Washington, Paris (engineering), Tokyo, Boston, Singapore
- Product: Scality RING software
- Markets: service providers, banks, media, government
- 250% global sales growth in 2014
- DNA: webmail storage for 200M users, requiring high durability, 24x7 availability, instant access, and low cost
- 85+ cloud-scale customers in production worldwide

Scality RING: Software for Storing the Information Age
- Throughput performance scales from 15GB/s upward, with no architectural ceiling
- 100% reliability with minimal intervention, at massive scale
- Pure software: runs on any x86 Linux-based servers, even mixed hardware generations
- Just works, as proven in the most demanding environments

85+ Cloud-scale Production Customers Worldwide
- Consumer service providers, web & cloud services: two of Japan's largest telcos
- Media: largest post-production house, largest cable network
- Enterprise: very large retail, multiple large healthcare, manufacturing, and financial organizations
- Government/research

What issues does Exascale raise?

Assumptions: 8TB drives, 60 drives per 4U server, 2% drive failure per year, 1MB average file size.

                     1 Exabyte            100PB             1PB
Drives               125,000              12,500            125
Daily disk losses    6.8                  0.7               0.01
Servers              2,083                208               2
Number of files      1,000,000,000,000    100,000,000,000   1,000,000,000

At 100MB/s it takes 22 hours to fill an 8TB drive, yet over a 1Tb/s network it takes roughly 100 days to move 1 Exabyte, and only if the load is well balanced across all 125,000 drives.
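A quick back-of-the-envelope check of these figures; the drive size, server density, failure rate, and file size are the slide's own assumptions:

```python
# Sanity-check the exascale arithmetic from the slide above.
DRIVE_TB = 8                # 8 TB drives
DRIVES_PER_SERVER = 60      # 60 drives per 4U server
ANNUAL_FAILURE_RATE = 0.02  # 2% of drives fail per year
AVG_FILE_MB = 1             # 1 MB average file size

for label, tb in [("1 Exabyte", 1_000_000), ("100PB", 100_000), ("1PB", 1_000)]:
    drives = tb // DRIVE_TB
    print(f"{label}: {drives:,} drives, "
          f"{drives / DRIVES_PER_SERVER:,.0f} servers, "
          f"{drives * ANNUAL_FAILURE_RATE / 365:.2f} disk losses/day, "
          f"{tb * 1_000_000 // AVG_FILE_MB:,} files")

# The fill and transfer times quoted on the slide:
fill_hours = DRIVE_TB * 1e12 / 100e6 / 3600   # 8TB drive filled at 100MB/s
move_days = 1e18 * 8 / 1e12 / 86400           # 1EB moved over a 1Tb/s link
print(f"Fill one 8TB drive at 100MB/s: {fill_hours:.0f} hours")
print(f"Move 1EB over 1Tb/s: {move_days:.0f} days")
```

The computed values (22 hours, ~93 days) match the slide's rounded figures.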

Where is Scality Today?

Against the exascale yardstick above (125,000 drives, 2,083 servers, and one trillion files per Exabyte), Scality today runs production systems of 300 servers and more than 80 billion files, storing 50PB with hundreds of PB in view.

Object Storage isn't a Grail
- Object storage doesn't save space
- Most applications expect POSIX file systems, so there is much work to be done (e.g., Whamcloud/Intel DAOS). We will need to do the work!
- Hierarchical models are a good way to organize data
- File locking: multiple writers and sync issues don't just go away

But, the Grass is Greener on Object Storage!
- Object storage manages 1000s of times fewer things
- Only useful data is rebuilt in the case of drive loss
- Massively parallel reads and writes are straightforward
- Each object or collection has meaning; the blocks of traditional systems have no intrinsic meaning
- Maintaining system-wide consistency is not required
- Geo-distribution becomes realistic

What about Data Integrity?
- One-year durability: RAID 5 < 4 nines, RAID 6 < 5 nines, 3 copies ~ 7 nines
- With 7 nines, the chance of losing any given object in a year is about 1 in 10,000,000, but with billions of files the likelihood of losing something is very high
- ARC(8,4) gives > 11 nines of durability, similar to 4 copies, with only 1.5x overhead
- Silent data corruption: CERN found 38,000 corrupted files in 15PB of data
- Tape: an unrecoverable error roughly every 1PB
- Disk: a latent sector error rate of ~10^-14 per bit implies a bit error every ~10TB read, accounting for ~5-10% of the risk compared to drive failures
- Scality stores and checks a CRC on every object
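To make the billions-of-files point concrete, a minimal sketch of the compounding arithmetic, assuming independent per-object loss probabilities (which real systems only approximate):

```python
# Probability of losing at least one object in a year, assuming each
# object independently survives with the stated durability.
def p_any_loss(nines: float, objects: int) -> float:
    p_loss_per_object = 10 ** -nines
    return 1 - (1 - p_loss_per_object) ** objects

for nines in (7, 11):
    p = p_any_loss(nines, 1_000_000_000)   # one billion objects
    print(f"{nines} nines, 1B objects: P(any loss) = {p:.4f}")
```

At 7 nines the loss of something in a billion-object system is near-certain (P ≈ 1.0); at 11 nines it drops to about 1%, which is why the ARC(8,4) durability figure matters at this scale.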

High-level RING Architecture

Scality RING software-defined storage components
- Connectors: file-based, object-based, and OpenStack-based access for multiple applications, with parallel, high-performance access to data and linear performance scaling
- Data protection and geo-redundancy
- Management
- Core RING on x86 servers, with limitless infrastructure scaling

Scality RING Technology: Architectural Deep Dive

Core RING Design Principles
- 100% parallel system: fully distributed data & metadata, no single point of failure
- Durable & resilient: location- and geo-aware placement for durability (disks, servers, racks, sites); self-healing for component failures; always available, even during upgrades
- Hardware agnostic: user-space software with no kernel modifications; runs on any hardware without requiring certification

Core RING topology: an extremely resilient system designed to tolerate failure

What is Replication?
- Optimal for small objects & files; class of service can be set from 0 to 5 replicas (1 to 6 copies)
- Location-aware data placement across failure domains: replicas are located across disks, servers, racks, and sites
- No data transformation: clear/native data format for very fast access; simple projection (no need to store replica keys)
- Self-healing: auto-heals missing replicas or parity blocks; auto-rebalances data onto new nodes and away from dropped nodes, with a transparent proxy for objects being rebalanced; permanent CRC of all contents (no silent data corruption)
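As an illustration of failure-domain-aware placement, a toy sketch; the data model and algorithm here are invented for illustration, while the RING's actual placement is driven by its keyspace and Sprov, described later:

```python
# Toy illustration of location-aware replica placement: walk a list of
# disks from a hash-derived start point, never reusing a rack.
disks = [(rack, server, disk)              # (rack, server, disk) IDs
         for rack in range(3)
         for server in range(2)
         for disk in range(2)]

def place_replicas(key_hash: int, n_replicas: int):
    start = key_hash % len(disks)
    rotated = disks[start:] + disks[:start]
    placed, used_racks = [], set()
    for rack, server, disk in rotated:
        if rack in used_racks:             # keep replicas in distinct racks
            continue
        placed.append((rack, server, disk))
        used_racks.add(rack)
        if len(placed) == n_replicas:
            break
    return placed

print(place_replicas(hash("my-object"), 3))  # three replicas, three racks
```

The same idea generalizes to any hierarchy of failure domains (disk, server, rack, site).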

Data Protection and Geo-Redundancy: Intelligent Data Protection
Three complementary mechanisms: replication, ARC erasure coding, and geo-distribution.

What is Erasure Coding?
Scality ARC (Advanced Resiliency Configuration) is erasure coding for optimal data protection of large files and objects:
- Avoids the overhead of multiple replicas (data copies)
- Dynamically configurable schema (data + parity chunks) protects against a variable number of failures with improved efficiency versus replication
- Protects against disk, server, rack, or data-center-level failures

Flexible & efficient:
- Select a replication or ARC strategy per object/file, or policy-based
- Data chunks are stored in the clear to avoid read performance penalties
- ARC data and correction chunks are always contiguous
- Example: a 9+3 schema (9 data chunks + 3 parity chunks) provides three-disk failure protection with ~33% space overhead
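A small sketch contrasting the space overhead and failure tolerance of replication and data+parity schemas; the 3-copy, 9+3, and 8+4 numbers correspond to the figures on these slides:

```python
# Space overhead vs. failure tolerance: replication against data+parity
# erasure-coding schemas such as Scality ARC.
def replication(copies: int):
    return copies - 1, f"{(copies - 1) * 100:.0f}%"   # tolerates copies-1 losses

def erasure(data: int, parity: int):
    return parity, f"{parity / data * 100:.0f}%"       # tolerates `parity` losses

for name, (failures, overhead) in [
    ("3 copies", replication(3)),
    ("ARC 9+3", erasure(9, 3)),
    ("ARC 8+4", erasure(8, 4)),
]:
    print(f"{name}: survives {failures} chunk losses, {overhead} space overhead")
```

This shows the slide's claim numerically: ARC(8,4) survives four losses, like keeping four extra copies would, at 1.5x total footprint instead of 4x.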

Zoom on the Access

The Data Path
How does a key such as BCEEE8B99EB9E96E71AF3731704356EFA8B35D2011 reach its data?
1. The application/connector performs a Chord lookup to resolve the key to the IP:PORT of the storage node responsible for it.
2. The storage node performs a skip-chain lookup in a RAM-based hash table, which returns the object's existence, its biziod reference (which HDD), version, and status flag.
3. A bizobj lookup on the biziod attached to that HDD determines the .dat file and offset; the bizobj.bin header maintains version, times, and status.
4. The node reads the stored length from the .dat file at that offset and returns the data.
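Step 1 is classic consistent hashing; a minimal illustrative sketch follows (node addresses are made up, and the RING's Chord implementation differs in detail):

```python
# Toy Chord-style lookup: each node owns the arc of the keyspace ending
# at its position; a key is served by the first node at or after it.
import bisect
import hashlib

nodes = {}  # ring position -> node address
for addr in ["10.0.0.1:4244", "10.0.0.2:4244", "10.0.0.3:4244"]:
    pos = int(hashlib.sha1(addr.encode()).hexdigest(), 16)
    nodes[pos] = addr
positions = sorted(nodes)

def lookup(key_hex: str) -> str:
    """Return the IP:PORT responsible for a 160-bit key."""
    key = int(key_hex, 16)
    i = bisect.bisect_left(positions, key) % len(positions)  # wrap the ring
    return nodes[positions[i]]

print(lookup("BCEEE8B99EB9E96E71AF3731704356EFA8B35D2011"))
```

Because every client can compute this mapping locally, no central metadata server sits on the data path.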

New Storage Strategies: biziod with HSE (Release M)
- Pluggable architecture separating bizobj control from the data strategy
- Multiple indexing strategies: HSE index storage (LSM), SQLite (B-tree), containers
- Multiple object storage strategies, including IP drives (Kinetic API)
- The Kinetic interface is a plug-in replacement for bizobj

Sprov: part of Scality's sauce
Data placement optimization:
- Data wisely spread across servers, sites & racks to survive site, rack, and server failures
- Data distributed equally by disk space, even with heterogeneous server capacities
- Minimizes the impact of server failures
- Minimizes data movement while growing the ring
- Determines data security

Multi-Geo: Single, Mirrored & Stretched RING (Data Protection and Geo-Redundancy)
- Single RING: one RING in one datacenter; replication or ARC erasure coding (within the RING) for higher durability
- Mirrored RING: multiple RINGs (2 or more) in one or multiple datacenters; multiple copies stored in multiple rings for higher availability; mirroring replication to erasure coding is possible
- Stretched RING across multiple sites: active/active, managed as a single RING; location-aware allocation lets both replication and ARC spread data across disparate domains for site failure tolerance; synchronous operations, best suited to lower-latency WAN environments

RING Management: Easy-to-use GUI & sophisticated CLI
- New, improved GUI and CLI
- Easy integration with Puppet/Chef/SaltStack
- Integration with standard SNMP tools
- All statistics can be exported via HTTP

Scality RING Management UI

RING 5.0: Volume Provisioning through the Supervisor UI
Simplified SOFS configuration & provisioning:
- Current SOFS connector configuration is manual and requires knowledge of many parameters
- New Volume concept for SOFS connectors
- The Supervisor UI enables easy selection and configuration of NFS, FUSE, CDMI, and SMB (planned) connectors
- Fully automates the startup of the connectors, ready to be accessed by applications

Fast rebuilds with minimum performance impact
Scality RING on six servers, sustaining 1.8GB/s with continuous I/O during a rebuild test:
- A failed disk was repaired in only 16 minutes
- After a storage server failure, self-healing rebalanced/rebuilt 60TB in only two hours
- The performance impact is limited to the offline server
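A quick check of the aggregate rebuild rate implied by those numbers; only the 60TB and two-hour figures come from the slide, the per-server split is inferred arithmetic:

```python
# Implied aggregate rebuild bandwidth: 60TB rebuilt in two hours.
rebuilt_tb, hours = 60, 2
gb_per_s = rebuilt_tb * 1000 / (hours * 3600)
print(f"~{gb_per_s:.1f} GB/s aggregate rebuild rate "
      f"(~{gb_per_s / 5:.1f} GB/s per surviving server on a 6-node ring)")
```

Rebuilds run in parallel across all surviving servers, which is why the rate far exceeds what any single replacement disk or server could absorb.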

MESA: a Distributed, ACID-Compliant Database

RING Connectors: A wide choice of native object & file interfaces
Object: sproxyd, RS2, OpenStack Swift, CDMI. File: NFS, sfused (FUSE), SMB.
How do we recommend the optimal connector(s) for customers to use?
- File versus object is usually an application-driven choice
- Some partner applications support both types
- Some customers are ready to migrate their application from files to objects due to growth, performance, or sometimes a negative experience with legacy products

RING Object and File Connectors

Object connectors:
- sproxyd — Strengths: stateless, lightweight, native REST API; highly scalable; supports geo-distributed deployments. Tradeoffs: no container mechanism, no authentication.
- RS2 — Strengths: S3-compatible REST API, with buckets, authentication & object indexing support. Tradeoffs: buckets cannot be geo-distributed; performance overhead for authentication & MD5 verification.
- RS2 light — Strengths: subset of RS2; highly scalable; supports geo-distributed deployments. Tradeoffs: eliminates the S3 bucket containers & authentication mechanism.
- CDMI — Strengths: REST API namespace compatible with SOFS (NFS, SMB, FUSE) data. Tradeoffs: containers may present a scaling or concurrency limitation.

File connectors:
- NFS — Strengths: NFSv3-compatible server; supports Kerberos, advisory locking (NLM), and user/group quotas. Tradeoffs: multiple concurrent readers are fine, but multiple writers serialize on single directories/files.
- FUSE — Strengths: local Linux file system driver, great for application servers; fast for big files via parallel IO to multiple back-end storage servers. Tradeoffs: requires a driver installed on the client/app server; same concurrency behavior as NFS.
- SMB — Strengths: SMB 2.x and subset-of-SMB-3.x compliant server. Tradeoffs: runs on top of FUSE; does not yet support SMB 3.0 multi-channel IO.

Scality RING: Unified Storage Management for OpenStack
- RING provides a single system to manage: simplified, autonomous management; self-healing
- Unifies OpenStack Swift, Cinder & Glance: eliminates multiple storage silos and consolidates 80% of OpenStack data storage; in the future, Manila for shared files
- Scalable to petabytes and beyond
- Enables cloud economics: runs on your choice of hardware; organically scalable, with no downtime and no forklift upgrades

The Next Step in Flexibility: IP Drives
- Metadata access is handled by redundant, geo-distributed metadata RING farms (x86 servers or VMs with SSDs, available also at Kinetic sites) holding the file system directories, small objects, and cache
- Direct data access (object/file/block) goes to Seagate Kinetic drive pools spread across data centers (e.g., Data Center #0, #1, #2)
- Erasure coding and/or replication protects the data across sites

Scality File Connectors: NFS, wide platform support
- NFSv3 is the commonly used and widely available version of the popular Network File System protocol originally developed by Sun Microsystems
- Supports Linux, Mac, and Microsoft Windows clients
- Includes support for NFS quotas and NFS advisory locking within a single connector
- Authentication for NFS clients via the Kerberos mechanism
- Support for security server solutions such as Microsoft Active Directory (AD)

Scality File Connectors: sfused, local file system with parallel IO capabilities
- FUSE is a POSIX-compliant local file system supported across all major Linux distributions (CentOS, Ubuntu, Debian)
- Typically deployed as local file system access on a Linux application server
- Local file and range locking; performs parallel backend I/O to the same file, to optimize access to very large files
- Support for user and group quotas

Scality File Connectors: SMB Connector (RING 5.0)
- SMB 2.x and SMB 3.x compliant (multi-channel IO planned in 2015)
- Support for basic Windows ACLs
- Runs on the FUSE connector (sfused)

Scality Object Connectors: CDMI, REST interface with file compatibility
- CDMI is an industry-standard REST protocol sponsored by the Storage Networking Industry Association (SNIA)
- Object REST API (PUT, GET, DELETE) based on the SNIA CDMI v1.0 specification
- Namespace compatible with the Scality file system (SOFS) to enable object & file sharing

Scality Object Connectors: sproxyd, high performance & scalability
- Pure object REST API with simple key/value semantics
- Designed to meet extreme scalability and performance requirements
- Provides a single, scalable, flat namespace, but no container or bucket concept
- Great fit for distributed (multi-site) deployments
- The application must therefore manage: a catalog of keys & metadata, security (authentication), and metering & reporting
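For a feel of the key/value semantics, a hedged sketch of a PUT/GET/DELETE round trip over HTTP; the host, port, and path layout are placeholders rather than a documented sproxyd URL scheme:

```python
# Hypothetical sproxyd-style key/value round trip using plain HTTP.
# ENDPOINT and the key below are made-up placeholders.
import requests

ENDPOINT = "http://ring-node.example.com:81/proxy/chord"
key = "BCEEE8B99EB9E96E71AF3731704356EFA8B35D2011"  # 160-bit hex key

# PUT: the application chooses the key and supplies the payload.
requests.put(f"{ENDPOINT}/{key}", data=b"hello object world").raise_for_status()

# GET: retrieve the object; range reads use the standard HTTP Range header.
resp = requests.get(f"{ENDPOINT}/{key}", headers={"Range": "bytes=0-4"})
print(resp.status_code, resp.content)

# DELETE: the application is responsible for tracking which keys exist.
requests.delete(f"{ENDPOINT}/{key}").raise_for_status()
```

Note that nothing here lists keys or authenticates the caller; as the slide says, that bookkeeping is the application's job.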

Scality Object Connectors: RS2, public cloud compatibility
RS2:
- Object REST API (PUT, GET, DELETE) based on AWS S3, designed mainly for public cloud deployments
- Mandatory MD5 signature on every operation (impacts performance)
- Includes the bucket concept, with commands for bucket operations
- Potential bucket-list update contention has performance and deployment impacts
- Not ideal for distributed, multi-site deployments

RS2 light:
- Subset of the RS2 REST API
- Eliminates the use of buckets; no authentication
- Higher performance & more scalable
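Since RS2 follows the AWS S3 API, a standard S3 client pointed at the RS2 endpoint should work; a sketch with boto3, where the endpoint URL and credentials are placeholders:

```python
# Talking to an S3-compatible endpoint such as RS2 with boto3.
# The endpoint URL and credentials below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rs2.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo")
s3.put_object(Bucket="demo", Key="hello.txt", Body=b"hello RS2")
print(s3.get_object(Bucket="demo", Key="hello.txt")["Body"].read())
```

This endpoint-swap pattern is exactly what "public cloud compatibility" buys: existing S3 tooling runs against the RING unchanged.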

Scality Object Connectors: OpenStack Swift
- OpenStack Swift-compatible API that uses Scality sproxyd (REST API) to store objects on the Scality RING (by-path method)
- Enables replicated & erasure-coded storage
- Highly scalable, high-performance storage
- Preserves Swift functionality: accounts & containers, Keystone authentication

Example Deployments

The Easiest Examples: Sensor Data vs. Simulation Data

Sensor data:
- Data is captured through innovative sensors
- Output can be graphical (images, film) but also numerical: temperature, humidity, DNA sequences, SKA, etc.
- Sensor data is often processed in a truly parallel fashion, also called embarrassingly parallel

Simulation data:
- Data is generated with computational devices
- Output is computational; typically many writers to one file
- Data computation requires very low latency and high random I/O loads
- A less clear use case for object storage

Example: Medical Research
- Multi-petabyte system holding high-value physical data: ultra-high-resolution images
- Multi-site usage
- Adopted an object-based model

Examples: RAM Flush (Simulation) or Sensor Burst (Seismic & Particle Physics)
- 1000s-10,000s of writers issue parallel PUT Key / GET Range operations against 100s-1000s of storage servers
- Parallel data access delivers the throughput to offload ~1PB in less than one hour
- Global namespace: one single view of content across the Scality RING
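A sketch of the many-writers pattern, reusing the placeholder endpoint from the sproxyd example; the key scheme and writer count are illustrative only, and a real burst would span thousands of client processes:

```python
# Many parallel writers streaming chunks to the object store, each under
# its own key; parallel PUTs are what make the burst offload feasible.
from concurrent.futures import ThreadPoolExecutor
import requests

ENDPOINT = "http://ring-node.example.com:81/proxy/chord"  # placeholder

def flush_chunk(writer_id: int, chunk: bytes) -> int:
    resp = requests.put(f"{ENDPOINT}/ram-flush-{writer_id:08d}", data=chunk)
    return resp.status_code

with ThreadPoolExecutor(max_workers=32) as pool:
    statuses = list(pool.map(flush_chunk,
                             range(256),
                             (b"x" * 1_048_576 for _ in range(256))))
print(f"{statuses.count(200)} of {len(statuses)} 1MB chunks stored")
```

Because every key hashes to its own location on the ring, writers never contend on a shared directory or inode, which is what makes this pattern scale.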

Example: 2nd Tier using Scality RING & Lustre Integration
- An HPC compute cluster runs Lustre 2.5 (MDSs and OSSs with their OSTs) as the parallel filesystem for home directories
- The Lustre 2.5 copy tool moves data to the RING as a second tier: objects are stored through the HTTP/REST (sproxyd) interface, keyed by Lustre FID
- Scality object connectors run directly on the storage servers; this is Scality's recommended method for best performance
- Behind the REST interface, the RING provides management, security, replication, erasure coding, and geo-redundancy on x86 servers with limitless infrastructure scaling

Example: European Car Manufacturer
A leading European car manufacturer ("Cars Inc.") uses a 5PB Scality platform for a video data analysis project. The system will hold video analysis of 1 million km of car journeys for crash avoidance. Highest-valued features:
- Granular and gradual scalability
- Hardware-agnostic flexibility to upgrade hardware in the future without impact on the business
- Scality's support for distributed & parallel computing
- Performance: the more data can be processed, the more safety improvements can be achieved

THANK YOU! www.scality.com — Try it today: www.scality.com/trial