SCFS: A Shared Cloud-backed File System




USENIX Annual Technical Conference 2014
SCFS: A Shared Cloud-backed File System
Alysson Bessani, Ricardo Mendes, Tiago Oliveira, Nuno Neves, Miguel Correia, Marcelo Pasin*, Paulo Veríssimo
University of Lisbon, Portugal
* now at University of Neuchatel, Switzerland

Cloud-backed Storage
  - Personal storage services (e.g., Dropbox, OneDrive)
    Diagram: Dropbox system running on Amazon EC2 and S3
  - Cloud-backed file systems (e.g., BlueSky, S3FS)
    Diagram: proxy in front of cloud storage

State of the Art

Shared Cloud-backed File System (SCFS)
  - Client-based: uses existing cloud services
  - Pay-per-ownership: each client pays for its own files
  - Strong consistency
  - Controlled sharing: access control for security and concurrency
  - Redundant cloud services
  Diagram: three SCFS agents sharing DATA through cloud storage

SCFS Design

Design Choices
  Data layout/access pattern
    - Each file is an object (single-block file)
    - Multiple versions of the files are maintained
    - Always write, avoid reading (exploiting free writes)
  Cache
    - Persistent file cache
        - Local storage is used to hold copies of all/most client files
        - Opened files are also maintained in main memory
    - Short-lived main-memory metadata cache
        - To deal with bursts of metadata requests

Design Choices
  Consistency
    - Consistency-on-close semantics
    - Control of durability and consistency
    - Locks used to avoid write-write conflicts
  Modular coordination
    - Separate data from metadata
    - Metadata is stored in a coordination service, e.g., ZooKeeper [ATC 10], DepSpace [EuroSys 08]
    - Also used for managing file locks

SCFS Architecture
  - Computing clouds: coordination service (lock service, access control, metadata)
  - Storage clouds: cloud storage
  - SCFS agents (three shown), each with its own cache

Consistency

Strong vs Weak Consistency
  Strong consistency (e.g., Paxos-based systems, Azure Storage):
    x.write(v)    x.read() -> v    x.read() -> v
  Weak consistency (e.g., Amazon S3, Rackspace Files):
    x.write(v)    x.read() -> ?    x.read() -> v

Consistency Anchor
  Problem: how to provide strong consistency on top of weakly consistent storage clouds?
  - Composite storage (strong consistency): an algorithm that exposes read and write operations
  - Each operation reads or writes both a storage service (weak consistency) and a consistency anchor (strong consistency)
  Key property: the consistency of the composite storage is the same as that of the consistency anchor
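The composition described on this slide can be sketched in a few lines. The slide does not spell out the algorithm, so the version below is one plausible reading rather than the SCFS implementation: data is written to the weak store under a key that includes its hash, the hash is then published in the anchor, and a reader retries the weak store until it sees data matching the anchored hash. All class names (`WeakStore`, `Anchor`, `CompositeStore`) and the `delay` and `max_tries` parameters are hypothetical.

```python
import hashlib


class WeakStore:
    """Toy eventually consistent store (hypothetical): a written value
    becomes visible only on the `delay`-th read of its key."""

    def __init__(self, delay=2):
        self.delay = delay
        self.pending = {}  # key -> [reads still needed, value]
        self.visible = {}  # key -> value

    def write(self, key, value):
        self.pending[key] = [self.delay, value]

    def read(self, key):
        if key in self.pending:
            self.pending[key][0] -= 1
            if self.pending[key][0] <= 0:
                self.visible[key] = self.pending.pop(key)[1]
        return self.visible.get(key)


class Anchor:
    """Strongly consistent register, standing in for a coordination service."""

    def __init__(self):
        self.digests = {}

    def write(self, name, digest):
        self.digests[name] = digest

    def read(self, name):
        return self.digests.get(name)


class CompositeStore:
    """Reads and writes that inherit the consistency of the anchor."""

    def __init__(self, anchor, store, max_tries=10):
        self.anchor = anchor
        self.store = store
        self.max_tries = max_tries

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Write the data first, under a version-specific key ...
        self.store.write(f"{name}|{digest}", data)
        # ... and only then publish the new version in the anchor.
        self.anchor.write(name, digest)

    def read(self, name):
        digest = self.anchor.read(name)
        if digest is None:
            return None  # nothing was ever written under this name
        for _ in range(self.max_tries):
            data = self.store.read(f"{name}|{digest}")
            # Accept only data that matches the anchored hash.
            if data is not None and hashlib.sha256(data).hexdigest() == digest:
                return data
        raise TimeoutError(f"version {digest[:8]} of {name!r} not visible yet")


composite = CompositeStore(Anchor(), WeakStore(delay=2))
composite.write("x", b"v1")
composite.write("x", b"v2")
latest = composite.read("x")  # retries until the anchored version is visible
```

Because the reader asks the weak store for one specific version, a stale or missing reply is detected and retried instead of being returned, which is why the composite behaves like the anchor.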

Consistency Anchor in SCFS
  Call sequence shown on the slide:
    fd = open(x, ...);
    read(fd, ...);
    write(fd, ...);
    fsync(fd);
    write(fd, ...);
    close(fd);
  - READ and WRITE are each drawn as two numbered steps (1, 2)
  - If x is cached, it is only validated
  Diagram labels: Main Memory, Local Storage, Persistent Storage, Coordination Service, Cloud(-of-Clouds)

Implementation and Evaluation

SCFS Backends
  - SCFS can use different backends, i.e., different cloud storage and a coordination service plugin
  - Operation: blocking, non-blocking and non-sharing
  Figure 5: SCFS with Amazon Web Services (AWS) and Cloud-of-Clouds (CoC) backends.
    - AWS backend: SCFS Agent, DS, EC2, S3
    - CoC backend: SCFS Agent, BFT-SMaRt, DepSky, DS (x4), RS, WA, GS, S3
  Paper excerpt shown on the slide:
    [...] is that it removes any dependence on a single cloud provider, relying instead on a quorum of providers. It means that data security is ensured even if f out of 3f+1 of the cloud providers suffer arbitrary faults, which encompasses unavailability and data deletion, corruption or creation [15]. Although cloud providers have their means to ensure the dependability of their services, the recurring occurrence of outages, security incidents (with internal or external origins) and data corruptions [19, 24] justifies the need for this sort of backend in several scenarios.
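The "f out of 3f+1" bound in the excerpt is easy to turn into numbers. The two helpers below are illustrative only (their names are not from SCFS or DepSky); they simply restate the bound in both directions.

```python
def providers_needed(f: int) -> int:
    """Number of cloud providers required to tolerate f arbitrary faults."""
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest number of arbitrarily faulty providers tolerated among n."""
    return (n - 1) // 3


# The four storage clouds of the CoC backend tolerate one faulty provider.
coc_tolerance = max_faulty(4)
```

With three providers or fewer the bound gives zero tolerated faults, which is why the CoC backend in the figure uses four storage clouds.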

The Cloud-of-Clouds Backend
  - Does not require trust in any single cloud provider
  - SCFS works correctly as long as fewer than a third of the providers misbehave
  Diagram: SCFS Agent with cache; DepSpace/BFT-SMaRt [EuroSys 08, DSN 14]; DepSky CoC storage [ACM ToS 13]
  Paper excerpt shown on the slide:
    Figure 6 shows how a file is securely stored in the cloud-of-clouds backend of SCFS using DepSky (see [15] for details). The procedure works as follows: (1) a random key K is generated, (2) this key is used to encrypt the file and (3) the encrypted file is encoded and each block is stored in different clouds together with (4) a share of K, obtained through secret sharing. Stored data security (confidentiality, integrity and availability) is ensured by the fact that no single cloud alone has access to the data, since K can only be recovered with two or more shares, and that quorum reasoning is applied to discover the last version written. In the example of the figure, where a single faulty cloud is tolerated, two clouds need to be accessed to recover the file data.
  Figure 6: A write in SCFS using the DepSky protocols.
    Steps on the client: 1. gen. key, 2. encrypt, 3. erasure coding, 4. secret sharing
    Storage services: RS (1), WA (2), GS (3), S3 (4)
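Steps (1) and (4) of this procedure can be illustrated with a small secret-sharing sketch. It uses Shamir's scheme over a prime field, which is an assumption made here for illustration; the slide only says "secret sharing", and DepSky's actual scheme and parameters may differ. Encryption (step 2) and erasure coding (step 3) are left out. The function names and the choice of prime are hypothetical.

```python
import secrets

# Field modulus: the Mersenne prime 2**521 - 1, comfortably larger than a 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, n: int, k: int):
    """Split `secret` into n shares such that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at point x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the given (x, y) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Step 1: generate a random 256-bit key K.
key = secrets.randbits(256)
# Step 4: one share per cloud (4 clouds); any 2 shares recover K,
# matching the single-faulty-cloud example in the excerpt.
shares = split_secret(key, n=4, k=2)
recovered = recover_secret([shares[0], shares[2]])
```

A single share on its own reveals nothing about K, which is the property the excerpt relies on when it says that no single cloud alone has access to the data.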

Sharing Latency: SCFS vs Dropbox
  Diagram: DATA shared through Rackspace Files, Google Storage, Windows Azure Blob and Amazon S3
  Figure: sharing latency in seconds (axis from 0 to 120), 50% and 90% values, for CoC-B, CoC-NB, AWS-B, AWS-NB and Dropbox at data sizes 256K, 1M, 4M and 16M; the SCFS variants are grouped as non-blocking and blocking

Benchmarking (Unmodified) Desktop Applications
OpenOffice Writer
  Paper excerpt shown on the slide:
    The benchmark follows the behavior observed in traces of a real system, which are similar to other modern desktop applications [25]. Typically, the files managed by the cloud-backed file system are just copied to a temporary directory on the local file system where they are manipulated as described in [25]. Nonetheless, as can be seen in the benchmark definition (Figure 7), these actions (especially save) still impose a lot of work on the file system.
  Figure 7: File system operations invoked in the file synchronization benchmark, simulating an OpenOffice document open, save and close actions (f is the odt file and lf is a lock file).
    Open Action: 1 open(f,rw), 2 read(f), 3-5 open-write-close(lf1), 6-8 open-read-close(f), 9-11 open-read-close(lf1)
    Save Action: 1-3 open-read-close(f), 4 close(f), 5-7 open-read-close(lf1), 8 delete(lf1), 9-11 open-write-close(lf2), 12-14 open-read-close(lf2), 15 truncate(f,0), 16-18 open-write-close(f), 19-21 open-fsync-close(f), 22-24 open-read-close(f), 25 open(f,rw)
    Close Action: 1 close(f), 2-4 open-read-close(lf2), 5 delete(lf2)
  Figure 8 shows the average latency of each of the three actions of our benchmark for SCFS, S3QL and S3FS, considering a file of 1.2MB, which corresponds to the average file size observed in 2004 (189KB) scaled up 15% per year to reach the expected value for 2013 [11].
  Figure 8 panels: (a) Non-blocking, (b) Blocking. Latency (s) of the Open, Save and Close actions for S3QL, S3FS, CoC-NS, CoC-NB, AWS-NB, CoC-B and AWS-B, each with and without the (L) variant.
  Annotated values on the slide: 55%, 40%, 80%

Financial Evaluation
  (a) Operation costs/day and expected coordination service capacity.
    VM Instance    EC2       EC2 x4    CoC       Capacity
    Large          $6.24     $24.96    $39.60    7M files
    Extra Large    $12.96    $51.84    $77.04    15M files
  (b) Cost per operation (log scale): cost/op in microdollars against file size (0 to 30 MB) for CoC read, AWS read, CoC write and AWS write.
  (c) Cost per file per day: cost/day in microdollars against file size (0 to 30 MB) for CoC and AWS; annotated "+50%".
  Figure 11: The (fixed) operation and (variable) usage costs of SCFS. The costs include outbound traffic generated by the coor[...]

Wrap Up
  - SCFS is a cloud-backed file system that can be used for backup, disaster recovery and sharing data
  - Key design principles:
      - Always write, avoid reading (very cheap in terms of $$$)
      - Strong consistency (despite the weak consistency of storage clouds)
  - Experience so far:
      - Multi-cloud replication is feasible (CoC is not slower!)
      - This is a case for BFT: a crash-only solution will not make things better
      - Being employed for sharing dataset metadata among biobanks

Timeline of this work
  - Summer 2011: initial idea for two MScs
  - October 2012: MScs concluded
  - November 2012: demo at the 2nd TClouds FP7 project review
  - April 2013: first submission (C2FS)
  - September 2013: second submission
  - January 2014: third submission

Thanks!
  - SCFS code available at http://code.google.com/p/depsky/wiki/scfs
  - DepSky and DepSpace/BFT-SMaRt also available:
      http://code.google.com/p/depsky/
      http://code.google.com/p/bft-smart/