Cloud File System

Liberios Vokorokos, Anton Baláž, Branislav Madoš and Ján Radušovský
Faculty of Electrical Engineering and Informatics
Technical University of Košice
liberios.vokorokos@tuke.sk, anton.balaz@tuke.sk, branislav.mados@tuke.sk, jan.radusovsky@tuke.sk

Abstract

Cloud computing has become mainstream in the field of information technology. It represents a distributed computing paradigm that focuses on providing a wide range of users with distributed access to scalable, virtualized hardware and/or software infrastructure over the Internet. This paper proposes a new file system delivered as a SaaS service of a cloud. It presents the architecture and implementation details, together with a comparison with present data storage methods.

I. INTRODUCTION

Cloud computing represents a new model for developing and using applications, software platforms and hardware infrastructure, which the user accesses through a web browser or client applications and uses as a service. All data is stored on an external server, and a user can access it from any device. A cloud represents a group of IT resources that are usually located in one place. The physical infrastructure is hidden from the user, who only uses its services. The end user employs only the cloud functionality and does not need to own the hardware infrastructure or software licenses. The general cloud computing architecture is depicted in Fig. 1. A cloud is a virtual set of services that a user or an organization can access from anywhere and from any device.

[Fig. 1. Cloud architecture. Labels: Internet, Cloud Provider, SaaS, PaaS, IaaS.]

Cloud computing advantages:

Lower TCO - the user pays only for the services used
Better mobility - access to corporate information from any device and the Internet
Flexibility - the ability to respond dynamically to changes
Reliability and data security - data is stored on external devices and regularly backed up

With increasing Internet usage, the methods of storing and using data also change. In the conventional model,
information is stored through the file system on a local data medium. With rising persistent connectivity and user mobility, users inevitably need constant and efficient access to their data. The present cloud services are oriented toward web browsers or other dedicated applications, which were not developed for working with large amounts of data in the form of files and directories. In this context, a need emerges to link the conventional model of storing data at the operating system level with the present trend of using cloud services. The rest of the paper deals with the architecture and implementation of a cloud file system as Software as a Service (SaaS), utilizing Google Docs services.

II. ANALYSIS

The main goal of the proposed file system is to employ present cloud services as data storage through the conventional model of data manipulation, utilizing the operating system API. The proposal maps individual OS functions to implementations built on the cloud service API.

First, we need to choose a suitable cloud service. The most suitable candidates are the services of Google. Google allows access to its services through a number of APIs that are available in multiple programming languages, such as Java, Python, .NET, PHP and Objective-C. We considered a few appropriate APIs:

Google Documents List
Picasa Web Albums
YouTube

The last two services were excluded mainly because they do not support direct uploading of raw data. Using either of them would require a special representation of the data in the supported formats, such as photo or video formats, which would strongly complicate the implementation of the cloud file system.

Google Documents List exists in three versions: 1.0, 2.0 and 3.0. Version 1.0 is deprecated, version 2.0 is stable but recently unsupported, and version 3.0 is in an advanced stage of development and is currently used in the Google Docs environment. The fundamental difference between versions 2.0 and 3.0 is the support for collections (folders) and, in particular, for arbitrary files (files with any content). The disadvantage of version 2.0 is the necessity to represent data in Google Docs formats, which have their own maximum size limits. Given the major changes made in version 3.0, using arbitrary files, which may contain up to 10 GB of data, is an obvious
choice. With such capacity in a single file, there is no need to implement a splitting mechanism.

III. ARCHITECTURE

Connecting the Google Documents List API with an operating system is easier through a userspace file system API than through a direct implementation of the file system as a kernel module. There are a number of language bindings for the FUSE technology; we chose the Java programming language, mainly because of its greater platform independence. FUSE-J is a technology that makes it possible to write a file system in user space in the Java programming language thanks to the Java Native Interface (JNI). It creates a bridge between the FUSE libraries and our implementation of the file system using a generated shared library. In our design, where we represent data in the form of virtual media, the FUSE technology is used as a kernel module. It allows non-privileged users to create their own file systems in user space without changing the kernel source code (Fig. 2).

[Fig. 2. FUSE technology. Labels: Program, VFS, glibc, libfuse, /dev/fuse, Fuse Module.]

A. cfs Modules

The core of the design lies in the application programming interface of Google Documents List in combination with particular file system methods of the FUSE or FUSE-J technology. This divides the file system into four components (Fig. 3):

File system - FUSE/FUSE-J operations
Google Documents List Handler (Gdocs Handler)
Database Handler
FUSE-J component

[Fig. 3. Cloud file system architecture. Labels: ls -l /path/cfs, VFS, cfs (getattr(), read(), write()), glibc, gdoc Handler, DB Handler, FUSE-J, libfuse, /dev/fuse, Fuse Module, DB, SQLiteJDBC, Userspace, Filesystem, Google SaaS.]

B. Initialization

The first step in the implementation was to ensure communication with the Google cloud. The Gdocs Handler is the part of the file system that contains all the logic for communication between the application and the server. After receiving the login and password of the user's Google account, the handler creates a new service instance based on these parameters. If the login and password are correct, the Gdocs Handler initializes two feeds. First,
Metadata is used to obtain information about the Google account, mainly the account size. Second, DocumentList contains information about the entities stored on the server. It contains a collection of metadata such as resource id, description, link, MD5 checksum, size and others.

After successful initialization, the file system needs to set its number of blocks and block size, so that the user can see the available space and other statistics on the mounted system. Unfortunately, the statfs (statistics) method in FUSE-J works only with the integer primitive data type (maximum value 2 147 483 647, i.e. 2^31 - 1). A method is therefore needed to ensure that the integer capacity is not exceeded. This is not a problem with a standard Google account, but a user might upgrade the account up to 16 TB. With a 16 TB account and a block size of 4096 bytes, the number of blocks would exceed the integer capacity. The solution was to double the block size repeatedly until the integer capacity is no longer exceeded.

Google Documents List does not support certain features, so we needed to create a module that helps the application adapt better to the file system interface. We created an interface that abstracts the creation of new directories, the storing of access rights and other properties of the file system. This handler improves the performance of the implementation, because it allows many operations to be carried out in the local database, which takes much less time. However, this data must be synchronized with the Google cloud. The synchronization part is implemented in the Google Documents List Handler. Finally, an SQLite database file is used for storing the data required by the proposed architecture.

C. Database

The next step is to initialize the database through the Database Handler. If the database file already exists on the server, the Gdocs Handler downloads this file, which holds the file system metadata, based on a special id. If the database file does not exist, the Database Handler creates a new database file in the /tmp folder (of the local OS file system), creates the designed table, and fills it first with a
special root node and then with the other entities. After that, the application uploads the whole content to the cloud server. Initializing the size column in the database is not very efficient because of the native Google formats. Native formats do not support quota, and their metadata size returns zero, so it is necessary to read the whole content of a file, count the bytes and discard the data.

D. Mapping Google File System with FUSE

GFS is hidden in a class that implements the Filesystem3 interface from the FUSE-J framework. This part of the system is the intersection of all the technologies used. All data from the Internet cloud is interpreted into the file system format. FUSE/FUSE-J defines many methods, such as getattr, getdir, mknod, unlink, rename, statfs, open, read and write. All methods return an error number (from 0 to 124, where 0 indicates success), which determines further application behavior, for example: Permission denied, I/O error or No such file or directory.

The first method called is getattr(). It is one of the most important and most frequently called methods of the FUSE-J interface, and it provides important information about each node. Besides the metadata from the database, getattr() requires other information, such as the hash, block count, etc. This led to the creation of a separate class describing the structure required by the getattr() method. It serves as a bridge between the database structure and the file system method interface. An instance of this class, named GfsNode, is created by the Database Handler. The getdir() method also uses a collection of GfsNodes.

E. File Descriptor and Node Creation

The open() method creates an important object, the file header, which stores certain information about the opened node. Mainly the read() and write() methods use this header to determine their further behavior. The header provides information such as the absolute path, id, flags, and a logical variable that indicates a pending file (for writing). Before we can
read data from files, it is necessary to create empty files or folders, for which the mknod() and mkdir() methods are used. If a user created a new node directly on the Google cloud in the form of metadata and renamed it immediately, these two operations would take too long because of the communication involved and would cause the file system to freeze. For this reason, files and folders are created in the local database, which dramatically speeds up the file system. Renaming a record in the database is then no longer a problem. Note that the database is not yet synchronized at this point. The id in the database has the following format:

in case of a file: tmp:<T>:hash:<H>
in case of a folder: gfs dir:<T>:hash:<H>

where <T> is the current system time and <H> is a hash code based on the path.

F. Reading

The most important part of the file system is reading files. The read() method receives the file header passed from the open() method. Several reading cases may occur:

Reading an empty file
Reading a file while its content is being uploaded
Reading a file with uploaded content

In the first case, read() checks whether the id stored in the file header instance matches a particular pattern (tmp:<T>:hash:<H>). If so, the opened file is newly created or is entering the uploading state. In this case, the method simply returns 0 and the user gets an empty file (no data is presented). The file system stores this information only in the local database, as there is no need to upload an empty node.

In the second case, when a user wants to open a file whose content is not fully uploaded to the cloud, the read method checks the entry id (from the file header) and then checks whether the write thread is still running. If the content is still being uploaded, the file system returns the EBUSY error.

Otherwise, the entry is stored on the Google cloud. The method needs to send a request to the Gdocs Handler, which fills a byte buffer with the requested media content. Due to frequent method calls (image or text previews, etc.), this leads to a
partial stream reading. In turn, during the next read it is necessary to discard the previous stream and create another media request. It is also important to handle buffer offset mismatches, in which the stream offset differs from the requested offset. Unfortunately, requesting entry media content through the Google API is a relatively time-consuming operation, probably because of the enormous number of queries sent to the cloud by other users or because of an inefficient API implementation.

G. Writing

The write method was harder to implement, considering the various file system method calls involved. Like read(), the write method has the file header available. The first problem lies in the resumable upload mechanism, which we had to use because of Google requirements. The mechanism supports only MediaFileSource, a class from the Google API whose instance requires a file located on disk. When a user copies data to the file system, the system needs to create a file in some known path (such as /tmp) and upload it. The drawback is the temporary redundancy of the file. The write method does not upload data itself; it only prepares the file for upload. The actual upload runs in a new thread started in the release() or fsync() method. This happens, for example, when the file system creates a temporary hidden file to which data is written, after which the original file is removed and the temporary file is renamed to the original name. The write thread uploads the file to the server, updates the id in the database (so it is no longer temporary), adds the metadata of the newly uploaded entry to the local file system memory, and deletes the swap file from its temporary location on the local disk. Because of this implementation, the file system is unable to track the upload speed and lacks an upload progress bar. Google is, in fact, working
on a better implementation of the resumable upload mechanism that would enable support for byte arrays.

H. Renaming

Rename is one of the ambiguous methods of the file system interface. It covers not only the simple renaming of a node or directory but also moving or even removing files or directories (moving to trash). Basically, it works by changing the path of a given file or directory, and it handles the several renaming cases that may occur. Many operations are done locally (e.g. local renaming of empty nodes), so that communication with the server is reduced because of its time cost.

Considering the needs of cfs, we did not implement methods such as flush() or truncate(); these methods simply return 0. As the file system does not support reading and creating links, all operations associated with links return the ENOSYS error (Function not implemented).

I. Other Operations

The other file system operations followed simply from the designed architecture. Examples are unlink(), which works in two steps (database entry deletion and server entry deletion), and rmdir(), which is a recursive form of unlink() applied to a whole directory. The utime() operation simply changes the database record of the access time and modification time; likewise, chmod() and chown() change only database records, followed by synchronization.

IV. FILE SYSTEMS COMPARISON

We compare the designed cloud file system with the standard UNIX file system ext3 and with the database file system DBFS, which has been developed as part of the research at the Faculty of Electrical Engineering and Informatics in Košice. First, we compared the general properties of the file systems (Table I) and their operating system support (Table II).

TABLE I
FILE SYSTEM PROPERTIES

File system  Max file size  Max disk space  File permission  Link support  File recovery  Local / remote access
ext3         2 TB           16 TB           yes              yes           yes            local
DBFS         220 EB (1)     248 EB (1)      yes              no            yes            local and remote
gfs          10 GB          16 TB (2)       yes              no            no             remote

TABLE II
OPERATING SYSTEMS SUPPORT

File system  Linux  MS Windows     Mac OS         Solaris
ext3         yes    external apps  external apps  yes
DBFS         yes    no             with FUSE      with FUSE
gfs          yes    no             with FUSE      with FUSE

(1) Restriction resulting from the local file system on which the DB system is running.
(2) Google capacity depends on the given account, which can have 10 GB, 20 GB, 80 GB, 200 GB, 400 GB, 1 TB, 2 TB, 4 TB, 8 TB or 16 TB of total capacity.

V. CONCLUSION

Cloud computing is now an established part of the IT environment. In this paper we proposed a cloud file system in which the data is stored in the Google SaaS cloud. Its main advantage is accessibility: a user can work with the same data and hierarchical structure on different machines that have the cloud FS installed, just as with a portable permanent medium. The FS allows reading, writing and deleting files, and even changing file permissions. On the other hand, finding a mapping between the API, the communication protocols and the file system requirements was not straightforward, so several problems occurred. The architecture tried to remain compatible with the web interface for data access, which complicated the development of certain methods. Because of the inexplicably long response times of the Google Documents List API, we needed to solve some problems locally, with subsequent database synchronization. The bigger the database file, the greater the load on the connection channel, which may reduce application performance during reading or writing.

The file system has several possible improvements. If the file system managed data on the server through a simple database service, we would be able to use DDL and DML queries and no database synchronization would be necessary. Other improvements may be found in the use of hash tables or B-trees, which bring better performance in file searching or in more complex hierarchical structures. It is also possible to drop web interface support, focus more on the behavior of the file system, and represent files as blocks in the cloud, so no stream mismatches will
occur.

ACKNOWLEDGMENT

This work was supported by the Slovak Research and Development Agency under contract No. APVV-0008-10 and by KEGA 008TUKE-4/2013, Microlearning environment for education of information security specialists.
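
The block-size adjustment described in the Initialization subsection can be sketched as follows. This is a minimal illustration in Python, not the authors' Java code: the function name fit_block_size and the rounding up of the block count are assumptions, and only the 4096-byte starting block size and the signed 32-bit limit come from the text.

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer (Java int)


def fit_block_size(capacity_bytes, block_size=4096, max_blocks=INT_MAX):
    """Double block_size until the number of blocks fits into max_blocks.

    Returns the pair (block_size, block_count). The name and signature are
    illustrative only and do not come from FUSE-J.
    """
    if capacity_bytes < 0 or block_size <= 0:
        raise ValueError("capacity must be >= 0 and block size must be > 0")
    blocks = -(-capacity_bytes // block_size)  # ceiling division
    while blocks > max_blocks:
        block_size *= 2  # the doubling step described in the text
        blocks = -(-capacity_bytes // block_size)
    return block_size, blocks
```

Under these assumptions, a 16 TB account taken as 2^44 bytes ends up with 16384-byte blocks, because both 4096-byte and 8192-byte blocks give a block count above 2^31 - 1.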
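
The temporary id scheme and the three reading cases from the Reading subsection can be illustrated with a short Python sketch. Several details are assumptions made only for this example: <T> is taken as milliseconds since the epoch, <H> as a CRC32 of the path (the text only says "hash code based on path"), errors are returned as negative errno values, and upload_running and fetch are hypothetical callables standing in for the write-thread check and the Gdocs Handler media request.

```python
import errno
import re
import time
import zlib

# Temporary file id pattern as given in the text: tmp:<T>:hash:<H>
TMP_FILE_ID = re.compile(r"^tmp:\d+:hash:\d+$")


def make_tmp_file_id(path, now_ms=None):
    """Build a temporary id for a file that exists only in the local database."""
    # Assumption: <T> is epoch time in milliseconds, <H> is a CRC32 of the path.
    t = int(time.time() * 1000) if now_ms is None else now_ms
    h = zlib.crc32(path.encode("utf-8"))
    return f"tmp:{t}:hash:{h}"


def read(entry_id, offset, size, upload_running, fetch):
    """Return the requested bytes, or a negative errno value on failure."""
    if TMP_FILE_ID.match(entry_id):
        if upload_running(entry_id):
            return -errno.EBUSY  # content is still being uploaded
        return b""  # newly created node: empty file, nothing on the server
    # The id is no longer temporary, so the content is stored in the cloud.
    return fetch(entry_id, offset, size)
```

Keeping the decision in one function makes the ordering explicit: the cheap local id check comes first, and the slow remote request is issued only for entries that are known to be uploaded.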
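
The offset mismatch mentioned in the Reading subsection can be illustrated with a small wrapper around a forward-only stream. This is a hedged Python sketch: open_stream is a hypothetical callable standing in for a new media request, and the class name is invented for the example. Unlike the behavior described in the text, which discards the stream on the next read, the sketch reuses the stream whenever the requested offset lies at or ahead of the current position; that reuse is a design assumption, not the authors' method.

```python
import io


class ForwardStreamReader:
    """Serve read(offset, size) calls from a stream that cannot seek backwards."""

    def __init__(self, open_stream):
        self._open_stream = open_stream  # hypothetical: returns a stream at byte 0
        self._stream = None
        self._pos = 0
        self.requests = 0  # number of media requests issued so far

    def read(self, offset, size):
        if self._stream is None or offset < self._pos:
            # The stream cannot rewind: discard it and issue a new request.
            if self._stream is not None:
                self._stream.close()
            self._stream = self._open_stream()
            self._pos = 0
            self.requests += 1
        # Skip forward until the stream offset matches the requested offset.
        while self._pos < offset:
            skipped = self._stream.read(min(65536, offset - self._pos))
            if not skipped:
                return b""  # requested offset lies beyond the end of the content
            self._pos += len(skipped)
        data = self._stream.read(size)
        self._pos += len(data)
        return data
```

For example, wrapping `lambda: io.BytesIO(payload)` lets sequential preview reads share one request, while a read at a smaller offset triggers a second one.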