Data Storage in Clouds
Jan Stender, Zuse Institute Berlin
contrail is co-funded by the EC 7th Framework Programme
Overview
- Introduction: Motivation, Challenges, Requirements
- Cloud Storage Systems
- XtreemFS, a Cloud File System: Distribution, Replication, Security, Customization
- Future Work and Research
Motivation
Data storage in the cloud: why?
- access to virtually unlimited storage on demand
- no need for expensive dedicated hardware
- no need for over-provisioning
Challenges
Sounds good, but what about...
- availability?
- privacy?
- performance?
- interfaces?
- data safety?
- access control?
- flexibility?
User Requirements
- Data access
  - through well-known, standardized interfaces (POSIX, HDFS, S3, ...)
  - from everywhere (inside and outside the cloud)
- Data protection
  - from loss due to corruption or device failures
  - from unauthorized access
- Data availability
  - at any time (regardless of hardware failures)
  - with high throughput and low latency
System Requirements
- Elasticity / scalability
  - scale out rather than scale up: add new servers to increase capacity
- Maintainability
  - self-management capabilities
  - monitoring support (SNMP, Ganglia, ...)
Cloud Storage Systems
- Amazon S3, Google Storage, Rackspace Cloud Files, Windows Azure Storage, Ubuntu One, Dropbox, Oxygen, OpenStack Object Storage, CloudFS, Walrus
However, most do not offer file system semantics!
- typically vendor-specific storage models and interfaces
- sometimes restricted to write-once semantics
- only weak consistency guarantees for replicas
Cloud Storage Systems
Why a cloud file system?
- compatible with any (legacy) application
- no source code adaptation necessary
- familiar semantics and behavior (e.g. POSIX)
XtreemFS, a Cloud File System
- Distributed
- Replicated
- Secure
- Customizable
XtreemFS, a Cloud File System: Distributed
XtreemFS
[Diagram: evolution of file systems, from centralized file systems on a single PC (ext3, ZFS, NTFS) via network file systems (NFS, SMB, AFS/Coda) and cluster/datacenter file systems (Lustre, Panasas, GPFS, Ceph) to grid file systems (GFarm, GDM, GridFTP) and Internet-scale file systems such as XtreemFS]
XtreemFS Architecture
[architecture diagram]
XtreemFS, a Cloud File System: Replicated
XtreemFS Replication
- Why? Availability, data safety, performance
- What? Files, metadata, system and configuration information
- How? Different replication modes
Read-only File Replication
- only for write-once files, hence no consistency issues
- fast data distribution: P2P data transfer with different strategies
- locality-aware data access: clients can preferably access the closest replicas
- use cases: data archives, CDNs, load balancing, data safety
Read-write File Replication
- any replica may be read or written
- primary-backup model to ensure consistency
  - primary enforces a total order on all updates
- primary fail-over guaranteed through leases
  - a lease timeout revokes primary status
- decentralized lease coordination algorithm (Flease)
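The lease mechanism above can be illustrated with a minimal sketch. This is not the Flease algorithm itself (which coordinates leases among replicas without a central party); it only shows the core invariant: a replica may act as primary only while it holds an unexpired lease, and a lease timeout automatically revokes primary status. All names are illustrative.

```java
public class LeaseDemo {
    // A lease grants primary status to one replica until it expires.
    static class Lease {
        final String holder;
        final long expiresAtMillis;
        Lease(String holder, long expiresAtMillis) {
            this.holder = holder;
            this.expiresAtMillis = expiresAtMillis;
        }
        boolean isValid(long nowMillis) { return nowMillis < expiresAtMillis; }
    }

    // Core invariant: only the current, unexpired lease holder is primary.
    static boolean mayActAsPrimary(Lease lease, String replicaId, long now) {
        return lease != null && lease.holder.equals(replicaId) && lease.isValid(now);
    }

    public static void main(String[] args) {
        Lease lease = new Lease("osd1", 1000);                     // valid until t = 1000
        System.out.println(mayActAsPrimary(lease, "osd1", 500));   // true: holder, not expired
        System.out.println(mayActAsPrimary(lease, "osd2", 500));   // false: not the holder
        System.out.println(mayActAsPrimary(lease, "osd1", 1500));  // false: lease timed out
    }
}
```

Because the old primary loses its status purely by the passage of time, another replica can safely acquire the lease after the timeout, which is what makes fail-over possible without a central coordinator.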
Metadata Replication
- metadata stored in a key-value database
  - BabuDB: key-value store optimized for file system metadata
- FS directory tree mapped to flat key-value pairs
- metadata update = group of key-value pair inserts
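A minimal sketch of how a directory tree can be flattened into key-value pairs. The key scheme used here (parent directory ID plus entry name) and the string-valued metadata are illustrative assumptions; BabuDB's actual index layout differs.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataMapping {
    // key: "<parentDirId>/<name>", value: serialized metadata (here: a plain string)
    static final Map<String, String> index = new HashMap<>();

    static String key(long parentDirId, String name) {
        return parentDirId + "/" + name;
    }

    public static void main(String[] args) {
        // /home (id 1) and /home/user (id 2) under the root directory (id 0)
        index.put(key(0, "home"), "dir id=1 mode=0755");
        index.put(key(1, "user"), "dir id=2 mode=0700");
        index.put(key(2, "notes.txt"), "file id=3 size=4096");

        // Resolving /home/user/notes.txt becomes three point lookups:
        System.out.println(index.get(key(0, "home")));
        System.out.println(index.get(key(1, "user")));
        System.out.println(index.get(key(2, "notes.txt")));
    }
}
```

With this representation, a metadata update such as a rename turns into a small group of key-value inserts and deletes, matching the slide's "metadata update = group of key-value pair inserts".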
Metadata Replication
- replication at the database level
- primary-backup with fail-over (Flease)
  - updates directed to the primary
  - primary propagates key-value pair insert groups to the backups
- same scheme for service and volume registry replication
XtreemFS, a Cloud File System: Secure
XtreemFS Security
- Authentication
  - X.509 certificates
  - mutual client-server authentication
- Encryption
  - optional traffic encryption with SSL
XtreemFS Security
- Authorization
  - UID + GID extracted from the certificate DN
  - authorization enforced by the MRC only!
- Capabilities
  - OSD checks a signed authorization token (capability) issued by the MRC
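The capability idea can be sketched as follows: the MRC issues a token that binds a file ID, an access mode, and an expiry time, and signs it; an OSD can then verify the token locally without contacting the MRC. The field layout, the shared-secret HMAC scheme, and all names here are illustrative assumptions, not XtreemFS's actual wire format.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CapabilityDemo {
    static String hmac(String payload, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return Base64.getEncoder().encodeToString(
                mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // MRC side: issue a token binding file ID, access mode and expiry time.
    static String issue(String fileId, String accessMode, long expiresAt, byte[] secret) {
        String payload = fileId + "|" + accessMode + "|" + expiresAt;
        return payload + "|" + hmac(payload, secret);
    }

    // OSD side: re-compute the MAC locally; no callback to the MRC needed.
    static boolean verify(String token, byte[] secret) {
        int sep = token.lastIndexOf('|');   // Base64 never contains '|'
        if (sep < 0) return false;
        return hmac(token.substring(0, sep), secret).equals(token.substring(sep + 1));
    }

    public static void main(String[] args) {
        byte[] secret = "mrc-osd-shared-secret".getBytes(StandardCharsets.UTF_8);
        String token = issue("vol1:17", "rw", 1700000000L, secret);
        System.out.println(verify(token, secret));                            // true: untampered
        System.out.println(verify(token.replace("|rw|", "|rwx|"), secret));   // false: tampered
    }
}
```

This is why authorization decisions can stay on the MRC while the OSDs only check signatures: a client cannot forge or upgrade a capability without the signing secret.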
XtreemFS, a Cloud File System: Customizable
XtreemFS Customization
- Policies define XtreemFS behavior
  - authentication, authorization
  - replica placement and selection
  - selection of OSDs for new files
  - mapping between local and global users
  - striping
- Plug-in mechanism for user-defined policies
- Examples
  - only use OSDs located in France for new files
  - enforce access control with POSIX ('rwx') semantics
  - read-only-replicate new files three times on close
  - preferably access those replicas that are close to the client
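The plug-in mechanism can be sketched as a policy interface that narrows down the set of candidate OSDs, using the slide's "only OSDs in France" example. The interface and class names are hypothetical, not the actual XtreemFS policy API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PolicyDemo {
    static class Osd {
        final String uuid;
        final String country;
        Osd(String uuid, String country) { this.uuid = uuid; this.country = country; }
    }

    // A placement policy filters the candidate OSDs for a new file.
    interface PlacementPolicy {
        List<Osd> filter(List<Osd> candidates);
    }

    // The slide's example: "only use OSDs located in France for new files".
    static final PlacementPolicy FRANCE_ONLY = candidates -> {
        List<Osd> result = new ArrayList<>();
        for (Osd osd : candidates)
            if ("FR".equals(osd.country)) result.add(osd);
        return result;
    };

    public static void main(String[] args) {
        List<Osd> all = Arrays.asList(
            new Osd("osd1", "FR"), new Osd("osd2", "DE"), new Osd("osd3", "FR"));
        System.out.println(PolicyDemo.FRANCE_ONLY.filter(all).size()); // 2: osd1 and osd3
    }
}
```

Because each policy is a small, self-contained unit behind a fixed interface, user-defined policies can be loaded as plug-ins without changing the file system core.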
XtreemFS in Contrail
- Global Autonomous File System (GAFS)
- storage repository
  - for user data
  - for VM images
  - for system data (e.g. system logs)
Future Work and Research
- Autonomous replication
  - automatic creation and deletion of replicas
  - replacement of unavailable replicas
- Service levels
  - offer different service classes, e.g. prefer user X over user Y, or user I/O over replication I/O
- Support for cloud storage interfaces
  - HDFS, S3, CDMI, ...
- Other features
  - end-to-end data checksums and encryption
  - client-side data caching
  - HSM
  - deduplication
Project Facts
- Funded under: FP7 (Seventh Framework Programme)
- Area: Internet of Services, Software & Virtualization (ICT-2009.1.2)
- Project reference: 257438
- Total cost: 11.29 million euro
- EU contribution: 8.3 million euro
- Execution: from 2010-10-01 till 2013-09-30
- Duration: 36 months
- Contract type: Collaborative project (generic)
XtreemFS: Overview
What is XtreemFS?
- a distributed and replicated POSIX-compliant file system
- runs on off-the-shelf servers, no expensive hardware
- servers in Java, run on any Java-enabled platform
- client in C++, runs on Linux / OS X / Windows
- secure (X.509 and SSL)
- easy to install and maintain
- open source (GPL)
Open Source
- license: currently GPLv2, next release BSD
- development team: 5 developers at ZIB (3 full-time + 2 students)
- community: users and bug reporters, mailing list with ~100 subscribers
- user projects: MOSGRID (D-Grid), VDZ (AIP)
XtreemFS Overview, Jan Stender/Björn Kolbeck
Features (Current Version 1.2)
- striping
- read-only replication and partial replicas
- SSL and X.509 support
- Linux, Windows, Mac OS X
- asynchronous metadata backups
- automatic replica selection
- POSIX compliant (interface & semantics)
- tools for consistency checks
- graphical management and monitoring tool
Limits (theoretical) / Operating Systems
Limits
- max. file size: 2^94 bytes, but Linux limits file sizes to 2^64 bytes
- max. files/directories per volume: 2^63 - 1
- max. files per directory: same as max. files per volume
- max. number of volumes: 2^31 - 1
- max. size of the metadata database: 2^63 bytes on 64-bit systems, 2 GB on 32-bit systems (version 1.3 will also support larger databases on 32-bit machines)
Supported operating systems
- servers: any platform with Java 1.6 (Linux, Solaris, Windows, OS X)
- client: any platform with FUSE (Linux, OS X, FreeBSD) and Windows (Dokan)
Metadata Management
- LSM-tree based backend for the MRC
  - key-value store, non-transactional
  - optimized for MRC and file system workloads
- asynchronous checkpoints and snapshots
- short recovery and start-up times
- performance: 300,000 lookups/sec (30M entries)
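A toy sketch of the LSM idea behind this backend: updates go to an in-memory table and are periodically folded into persistent state by a checkpoint, while lookups consult the freshest data first. BabuDB additionally keeps a write-ahead log and immutable on-disk indices, and checkpoints asynchronously so updates never block; all of that is omitted here, and the class is purely illustrative.

```java
import java.util.TreeMap;

public class LsmSketch {
    private TreeMap<String, String> memtable = new TreeMap<>();     // recent updates, in memory
    private final TreeMap<String, String> onDisk = new TreeMap<>(); // stand-in for on-disk state

    void put(String key, String value) { memtable.put(key, value); }

    // Lookups check the memtable first, then the checkpointed state.
    String get(String key) {
        String v = memtable.get(key);
        return v != null ? v : onDisk.get(key);
    }

    // A checkpoint folds the memtable into the persistent state and
    // starts a fresh, empty memtable.
    void checkpoint() {
        onDisk.putAll(memtable);
        memtable = new TreeMap<>();
    }

    public static void main(String[] args) {
        LsmSketch db = new LsmSketch();
        db.put("1/user", "dir id=2");
        db.checkpoint();
        db.put("1/user", "dir id=2 mtime=99");  // newer version shadows the checkpointed one
        System.out.println(db.get("1/user"));   // dir id=2 mtime=99
    }
}
```

Keeping writes in memory and checkpointing in the background is what makes the short recovery and start-up times and the high lookup rates quoted above plausible for file system metadata workloads.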
Metadata: Mapping in Detail
(from: File and Metadata Replication in XtreemFS, Björn Kolbeck)