Data Storage in Clouds



Data Storage in Clouds
Jan Stender, Zuse Institute Berlin
contrail is co-funded by the EC 7th Framework Programme

Overview
- Introduction
  - Motivation
  - Challenges
  - Requirements
  - Cloud Storage Systems
- XtreemFS: a Cloud File System
  - Distribution
  - Replication
  - Security
  - Customization
- Future Work and Research

Motivation
Data Storage in the Cloud: Why?
- Access to infinite storage on demand
- No need for expensive dedicated hardware
- No need for over-provisioning

Challenges
Sounds good, but what about...
- Availability?
- Privacy?
- Performance?
- Interfaces?
- Data Safety?
- Access Control?
- Flexibility?

User Requirements
- Data Access
  - Through well-known, standardized interfaces (POSIX, HDFS, S3, ...)
  - From everywhere (inside and outside the cloud)
- Data Protection
  - From loss due to corruption or device failures
  - From unauthorized access
- Data Availability
  - At any time (regardless of hardware failures)
  - With high throughput and low latency

System Requirements
- Elasticity / Scalability
  - Scale-out rather than scale-up: add new servers to increase capacity
- Maintainability
  - Self-management capabilities
  - Monitoring support (SNMP, Ganglia, ...)

Cloud Storage Systems
Amazon S3, Oxygen, Google Storage, OpenStack Object Storage, Rackspace Cloud Files, CloudFS, Windows Azure Storage, Walrus, Ubuntu One, Dropbox
However, most do not offer file system semantics!
- Typically vendor-specific storage models and interfaces
- Sometimes restricted to write-once semantics
- Only weak consistency guarantees for replicas

Cloud Storage Systems
Why a cloud file system?
- Compatible with any (legacy) application
- No source code adaptation necessary
- Familiar semantics and behavior (e.g. POSIX)

XtreemFS: a Cloud File System
- Distributed
- Replicated
- Secure
- Customizable

[Figure: file systems arranged by scope. PC: ext3, ZFS, NTFS. Network FS / Centralized: NFS, SMB, AFS/Coda. Cluster FS / Datacenter: Lustre, Panasas, GPFS, CEPH, ... Internet: Grid File System (GFarm, GDM, "gridftp") and XtreemFS.]

XtreemFS Architecture

XtreemFS Replication
- Why?
  - Availability
  - Data safety
  - Performance
- What?
  - Files
  - Metadata
  - System and configuration information
- How?
  - Different replication modes

Read-only File Replication
- Only for write-once files
- Fast data distribution
  - P2P data transfer with different strategies
- Locality-aware data access
  - Clients preferentially access the closest replicas
- Use cases: data archive, CDN
- Load balancing, data safety
- No consistency issues

Read-write File Replication
- Any replica may be read or written
- Primary-backup model to ensure consistency
- Primary fail-over guaranteed through leases
- Primary enforces total order on all updates
- Lease timeout revokes primary status
- Decentralized lease coordination algorithm (Flease)
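The lease idea on this slide can be illustrated with a minimal sketch. It is a toy, centralized lease table, not the decentralized Flease algorithm, and the names `LeaseTable`, `acquire`, and `primary` are hypothetical. Time is passed in explicitly so the behavior is easy to follow: a replica is primary only while its lease is valid, and a timeout revokes that status so another replica can take over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str     # replica that currently acts as primary
    expires: float  # point in time at which the lease times out


class LeaseTable:
    """Toy, centralized stand-in for lease coordination (NOT Flease)."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.lease: Optional[Lease] = None

    def acquire(self, replica: str, now: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already held by replica."""
        free = self.lease is None or now >= self.lease.expires
        if free or self.lease.holder == replica:
            self.lease = Lease(replica, now + self.duration)
            return True
        return False

    def primary(self, now: float) -> Optional[str]:
        """Return the current primary, or None once the lease has timed out."""
        if self.lease is not None and now < self.lease.expires:
            return self.lease.holder
        return None


table = LeaseTable(duration=15.0)
first = table.acquire("replica-A", now=0.0)     # A becomes primary
second = table.acquire("replica-B", now=5.0)    # rejected, A still holds the lease
after_timeout = table.primary(now=20.0)         # lease expired, no primary
failover = table.acquire("replica-B", now=20.0) # B takes over
```

Because at most one replica holds a valid lease at a time, that replica can assign the order of all updates.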

Metadata Replication
- Metadata stored in key-value database
- BabuDB: key-value store optimized for file system metadata
- FS directory tree mapped to flat key-value pairs
- Metadata update = group of key-value pair inserts
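One way to picture a directory tree mapped to flat key-value pairs is sketched below. The key layout `(parent directory id, entry name)` and the record fields are assumptions made for illustration; the slides do not describe the key schema BabuDB actually uses.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]      # hypothetical layout: (parent directory id, entry name)
Value = Dict[str, object]  # hypothetical metadata record
ROOT_ID = 1


class FlatMetadata:
    def __init__(self) -> None:
        self.kv: Dict[Key, Value] = {}
        self.next_id = ROOT_ID + 1

    def create(self, parent_id: int, name: str, is_dir: bool) -> int:
        """A metadata update is expressed as a group of key-value inserts."""
        new_id = self.next_id
        group: List[Tuple[Key, Value]] = [
            ((parent_id, name), {"id": new_id, "is_dir": is_dir}),
        ]
        for key, value in group:
            self.kv[key] = value
        self.next_id += 1
        return new_id

    def resolve(self, path: str) -> Optional[int]:
        """Walk the path one component at a time using flat lookups."""
        current = ROOT_ID
        for part in [p for p in path.split("/") if p]:
            record = self.kv.get((current, part))
            if record is None:
                return None
            current = record["id"]
        return current

    def listdir(self, dir_id: int) -> List[str]:
        """List the entry names whose key starts with the given directory id."""
        return sorted(name for (parent, name) in self.kv if parent == dir_id)


md = FlatMetadata()
home = md.create(ROOT_ID, "home", is_dir=True)
report = md.create(home, "report.txt", is_dir=False)
```

In a sorted key-value store, keys that share a parent id sit next to each other, so listing a directory becomes a range scan rather than the full scan used in this sketch.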

Metadata Replication
- Replication at database level
- Primary-backup with failover (Flease)
- Updates directed to primary
- Primary propagates key-value pair insert groups to backups
- Same scheme for service and volume registry replication
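The propagation of insert groups from a primary to its backups might look roughly like the following sketch. The class names, the sequence numbers, and the synchronous propagation are assumptions; the slides do not say whether backups are updated synchronously or asynchronously.

```python
from typing import Dict, List


class Replica:
    """A database replica that applies insert groups strictly in order."""

    def __init__(self) -> None:
        self.kv: Dict[str, str] = {}
        self.applied_seq = 0

    def apply(self, seq: int, group: Dict[str, str]) -> None:
        if seq != self.applied_seq + 1:
            raise ValueError("insert group received out of order")
        self.kv.update(group)  # the whole group is applied together
        self.applied_seq = seq


class Primary(Replica):
    """Receives all updates and forwards each insert group to the backups."""

    def __init__(self, backups: List[Replica]) -> None:
        super().__init__()
        self.backups = backups

    def update(self, group: Dict[str, str]) -> int:
        seq = self.applied_seq + 1
        self.apply(seq, group)
        for backup in self.backups:  # assumption: synchronous propagation
            backup.apply(seq, group)
        return seq


backups = [Replica(), Replica()]
primary = Primary(backups)
primary.update({"/home": "dir", "/home/a.txt": "file"})
primary.update({"/home/b.txt": "file"})
```

Numbering the groups lets a backup detect a gap instead of silently diverging from the primary.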

XtreemFS Security
- Authentication
  - X.509 certificates
  - Mutual client-server authentication
- Encryption
  - Optional traffic encryption with SSL

XtreemFS Security
- Authorization
  - UID + GID extracted from certificate DN
  - Authorization enforced by the MRC (Metadata and Replica Catalog) only!
- Capabilities
  - OSD (Object Storage Device) checks signed authorization token (capability) issued by MRC
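The capability flow (the MRC issues a signed token, the OSD verifies it without asking the MRC again) can be sketched as follows. The token layout, the use of HMAC-SHA256 with a secret shared between MRC and OSD, and the function names are all assumptions for illustration; the slide only states that the token is signed.

```python
import hashlib
import hmac


def issue_capability(secret: bytes, file_id: str, mode: str, expires: int) -> str:
    """MRC side: sign (file id, access mode, expiry). file_id must not contain '|'."""
    payload = f"{file_id}|{mode}|{expires}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def check_capability(secret: bytes, token: str, file_id: str, mode: str, now: int) -> bool:
    """OSD side: verify signature, file, mode, and expiry before serving a request."""
    try:
        t_file, t_mode, t_expires, signature = token.split("|")
        expires = int(t_expires)
    except ValueError:
        return False  # malformed token
    payload = f"{t_file}|{t_mode}|{t_expires}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(signature, expected)
        and t_file == file_id
        and t_mode == mode
        and now < expires
    )


secret = b"shared-mrc-osd-secret"  # hypothetical shared secret
token = issue_capability(secret, "volume1:42", "r", expires=1000)
ok = check_capability(secret, token, "volume1:42", "r", now=500)
expired = check_capability(secret, token, "volume1:42", "r", now=1000)
wrong_mode = check_capability(secret, token, "volume1:42", "w", now=500)
```

This keeps access control decisions in one place (the MRC) while letting OSDs reject forged or expired requests on their own.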

XtreemFS Customization
- Policies define XtreemFS behavior
  - Authorization
  - Authentication
  - Replica placement and selection
  - Selection of OSDs for new files
  - Mapping between local and global users
  - Striping
- Plug-in mechanism for user-defined policies
- Examples
  - Only use OSDs located in France for new files
  - Enforce access control with POSIX ('rwx') semantics
  - Read-only-replicate new files three times on close
  - Preferentially access replicas that are close to the client
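Two of the example policies (restrict new files to OSDs in France, prefer replicas close to the client) can be sketched as composable filter and sort functions. This is not the XtreemFS plug-in interface; the `OSD` record, its `country` and `distance_ms` attributes, and the `chain` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class OSD:
    uuid: str
    country: str        # hypothetical attribute used for placement
    distance_ms: float  # hypothetical client-to-OSD distance metric


Policy = Callable[[List[OSD]], List[OSD]]


def only_country(code: str) -> Policy:
    """Filtering policy: keep only OSDs located in the given country."""
    def policy(osds: List[OSD]) -> List[OSD]:
        return [o for o in osds if o.country == code]
    return policy


def closest_first(osds: List[OSD]) -> List[OSD]:
    """Sorting policy: order OSDs by distance to the client."""
    return sorted(osds, key=lambda o: o.distance_ms)


def chain(*policies: Policy) -> Policy:
    """Apply several policies one after another."""
    def run(osds: List[OSD]) -> List[OSD]:
        for policy in policies:
            osds = policy(osds)
        return osds
    return run


osds = [
    OSD("osd-berlin", "DE", 5.0),
    OSD("osd-paris", "FR", 20.0),
    OSD("osd-lyon", "FR", 12.0),
]
placement = chain(only_country("FR"), closest_first)
selected = [o.uuid for o in placement(osds)]
```

Treating each policy as a function from a candidate list to a candidate list makes user-defined policies easy to plug in and combine.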

XtreemFS in Contrail
- Global Autonomous File System (GAFS)
- Storage repository
  - For user data
  - For VM images
  - For system data (e.g. system logs)

Future Work and Research
- Autonomous replication
  - Automatic creation and deletion of replicas
  - Replacement of unavailable replicas
- Service levels
  - Offer different service classes, e.g. prefer user X over user Y, user I/O over replication I/O, etc.
- Support for cloud storage interfaces
  - HDFS, S3, CDMI, ...
- Other features
  - End-to-end data checksums and encryption
  - Client-side data caching
  - HSM
  - Deduplication

contrail is co-funded by the EC 7th Framework Programme
- Funded under: FP7 (Seventh Framework Programme)
- Area: Internet of Services, Software & virtualization (ICT2009.1.2)
- Project reference: 257438
- Total cost: 11.29 million euro
- EU contribution: 8.3 million euro
- Execution: from 2010-10-01 to 2013-09-30
- Duration: 36 months
- Contract type: Collaborative project (generic)

XtreemFS: Overview
What is XtreemFS?
- a distributed and replicated POSIX-compliant file system
- runs on off-the-shelf servers, no expensive hardware needed
- servers in Java, run on any Java-enabled platform
- client in C++, runs on Linux / OS X / Windows
- secure (X.509 and SSL)
- easy to install and maintain
- open source (GPL)

Open Source
- License: currently GPLv2, next release BSD
- Development team: 5 developers at ZIB (3 full-time + 2 students)
- Community: users and bug reporters, mailing list with ~100 subscribers
- User projects: MOSGRID (D-Grid), VDZ (AIP)
XtreemFS Overview, Jan Stender/Björn Kolbeck

Features
Current Version (1.2)
- Striping
- Read-only replication and partial replicas
- SSL, X.509 support
- Linux, Windows, Mac OS X
- Asynchronous metadata backups
- Automatic replica selection
- POSIX compliant (interface & semantics)
- Tools for consistency checks
- Graphical management and monitoring tool

Limits (theoretical) / Operating Systems
- Limits
  - file size: 2^94, but Linux limits file sizes to 2^64
  - max. files/directories per volume: 2^63 - 1
  - max. files per directory: same as max. files per volume
  - max. number of volumes: 2^31 - 1
  - max. size of the metadata database: 2^63 on 64-bit systems, 2 GB on 32-bit systems (version 1.3 will also support larger databases on 32-bit machines)
- Supported Operating Systems
  - Servers: any platform with Java 1.6 (Linux, Solaris, Windows, OS X)
  - Client: any platform with FUSE (Linux, OS X, FreeBSD) and Windows (DOKAN)
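To get a feel for these magnitudes, the sketch below takes the limits as powers of two (2^94, 2^64, 2^63 - 1, 2^31 - 1) and assumes, since the slide gives no unit, that the sizes are counted in bytes. The helper `human` is a hypothetical formatter written for this example.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def human(n: int) -> str:
    """Format an exact byte count using binary (power-of-1024) units."""
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(UNITS) - 1:
        n //= 1024
        i += 1
    return f"{n} {UNITS[i]}"


limits = {
    "file size (XtreemFS, assumed bytes)": human(2**94),
    "file size (Linux, assumed bytes)": human(2**64),
    "metadata database, 64-bit (assumed bytes)": human(2**63),
    "metadata database, 32-bit": human(2 * 1024**3),
    "files/directories per volume": 2**63 - 1,
    "volumes": 2**31 - 1,
}

for name, value in limits.items():
    print(f"{name}: {value}")
```

Under the byte assumption, the Linux file size limit is the binding one in practice, since it is a factor of 2^30 below the XtreemFS figure.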

Metadata Management
- LSM-Tree based backend for MRC
- key-value store, non-transactional
- optimized for MRC and file system workloads
- asynchronous checkpoints and snapshots
- short recovery and start-up times
- performance: 300,000 lookups/sec (30M entries)
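The general LSM-tree idea behind such a backend can be sketched in a few lines: writes go to an in-memory table, a checkpoint freezes that table into an immutable sorted run, and lookups consult the newest data first. This is a generic illustration, not BabuDB's implementation; the class `TinyLSM` is hypothetical, and the checkpoint here runs synchronously, whereas the slide describes asynchronous checkpoints.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[str, str] = {}
        # Immutable sorted runs, newest first: (sorted keys, matching values).
        self.runs: List[Tuple[List[str], List[str]]] = []
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.checkpoint()

    def checkpoint(self) -> None:
        """Freeze the in-memory table into a sorted, immutable run."""
        if not self.memtable:
            return
        keys = sorted(self.memtable)
        values = [self.memtable[k] for k in keys]
        self.runs.insert(0, (keys, values))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        """Check the memtable first, then the runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in self.runs:
            i = bisect.bisect_left(keys, key)  # binary search in the sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")  # reaches the limit and triggers a checkpoint
db.put("a", "3")  # newer value shadows the one in the older run
```

Because runs are never modified after a checkpoint, they can be written out and read back without locking, which fits the short recovery and start-up times the slide mentions.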

Metadata: Mapping in Detail
File and Metadata Replication in XtreemFS, Björn Kolbeck