SCALABLE CLUSTER BASED CLOUD STORAGE




Parinaz Eskandarian Miyandoab 1 and Jaber Karimpour 2

1 Department of Computer Engineering, Islamic Azad University, Zanjan Branch, Zanjan, Iran
parinazeskandarian@yahoo.com
2 Department of Computer Science, University of Tabriz, Tabriz, Iran
karimpour@tabrizu.ac.ir

DOI:10.5121/ijfcst.2013.3101

ABSTRACT

We consider a cloud system that must store a large number of files across hundreds of computers. Existing cloud storage designs are not scalable enough to support such a large number of nodes. In this paper, we propose a novel cloud storage system containing thousands of virtual file servers on hundreds of computers. We group these virtual servers into clusters. This design is scalable because the system load is divided among the clusters. Our simulation experiments show that our cloud storage system achieves lower file read/write latency and lower traffic and processing overhead than existing systems.

KEYWORDS

Cloud Storage, Data Center, Virtual, Cluster, File Server

1. INTRODUCTION

Cloud storage [1] is a model of networked online storage in which data is stored in virtualized pools of storage that are generally hosted by third parties. Hosting companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them. The data center operators virtualize the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects. Physically, the resource may span multiple servers. Cloud storage services such as Amazon S3 [2], cloud storage products such as EMC Atmos [3], and distributed storage research projects such as OceanStore [4] are all examples of object storage.

In this paper, we propose a novel cloud storage system containing thousands of virtual file servers on hundreds of computers. We group these virtual servers into clusters. This design is scalable because the system load is divided among the clusters. Our simulation experiments show that our cloud storage system achieves lower file read/write latency and lower traffic and processing overhead than existing systems.

The rest of this paper is organized as follows. We review related work in Section 2. We propose our cloud system in Section 3. Section 4 contains our simulation results, and Section 5 concludes the paper.

2. RELATED WORKS

In this section, we review related research that focuses on designing cloud systems.

The authors of [5] design a scalable architecture called vstore that provides reliable virtual disks for virtual machines (VMs) in a cloud environment. vstore uses the host's limited local disks as a block-level cache for network-attached storage.

One challenge for cloud storage systems is balancing the provision of huge, elastic storage capacity against the high cost of investing in it. To address this issue in the cloud storage infrastructure, [6] configures a low-cost, PC-cluster-based storage server to serve large amounts of data to cloud users.

BlueSky [7] is a network file system backed by cloud storage. BlueSky stores data persistently in a cloud storage provider, allowing users to take advantage of the reliability and large storage capacity of cloud providers and avoid the need for dedicated server hardware.

The authors of [8] address the problem of building a secure cloud storage system that supports dynamic users and data provenance.

Gecko [9] is a design for storage arrays in which a single log structure is distributed across a chain of drives, physically separating the tail of the log from its body. This design provides the benefits of logging, namely fast, sequential writes for any number of contending applications, while eliminating the disruptive effect of log cleaning activity on application I/O.

The authors of [10] present a power-lean storage system in which racks of servers, or even entire data center shipping containers, can be powered down to save energy.

MetaStorage [11] is a federated cloud storage system that can integrate diverse cloud storage providers. MetaStorage is a highly available and scalable distributed hash table that replicates data on top of diverse storage services.

The authors of [12] present an architecture for a secure data repository service designed on top of a public cloud infrastructure to support multi-disciplinary scientific communities dealing with personal and human subject data, motivated by the smart power grid domain.

ecstore [13] is an elastic cloud storage system that supports automated data partitioning and replication, load balancing, efficient range queries, and transactional access.

Cloudy [14] is a modular cloud storage system. Cloudy provides a highly flexible architecture for distributed data storage and is designed to operate with multiple workloads.

3. CLOUD STORAGE DESIGN

In this section, we propose a cloud storage system. We call it CCS (Cluster-based Cloud Storage).

3.1. General Description

Figure 1 shows the general view of the CCS system. VMs are grouped into clusters. Each cluster has a cluster controller that manages the cluster. There is one central controller in the system that manages the cluster controllers. To reduce the load on the central controller, we assign as many tasks as possible to the cluster controllers.

Figure 1. Cloud system design

In the remaining part of this section, we describe the system's behaviour under various conditions, assuming the clustered design depicted in Figure 1.

3.2. Clustering Algorithm

We cluster the VMs according to the physical computer on which they are installed, using the following metric.

Clustering metric: Being on the same physical computer.

Clustering is performed when the cloud system starts and whenever a VM is added or removed. The clustering algorithm contains the following steps:

I. The system manager adds one physical computer to the system as the central controller.
II. The system manager decides the number of clusters for the system.
III. The system manager adds one physical cluster controller computer for each cluster.
IV. The system manager configures the central controller to identify the cluster controllers by IP address.
V. The central controller assigns a unique cluster id to each cluster controller.
VI. While there is a non-clustered VM vm1 in the system, do:
   a. The central controller assigns vm1 to the cluster (called Cluster cid) with the fewest VMs that satisfies the following condition:
      i. There is no VM vm2 in Cluster cid such that vm1 and vm2 are on the same physical computer.
   b. The central controller informs vm1 of its cluster id (cid).

3.3. Periodic Operations

The central controller itself distributes tasks among the cluster controllers. It keeps the loads of the cluster controllers in a table and updates this table after sending each request to a cluster controller. The load of a cluster controller equals the sum of the loads on the VMs in its cluster.

Each cluster controller itself distributes tasks among the VMs of its cluster. It keeps the loads of its VMs in a table and updates this table after sending each request to a VM. The load of a VM equals the number of requests on the VM.

The central controller checks the cluster controllers for failure periodically and before sending a request to them. Each cluster controller checks its VMs for failure periodically and before sending a request to them.

3.4. File Write Request

When an Internet user requests to write a file, the cloud system follows these steps:

I. The central controller receives the request from the Internet.
II. The central controller sends the request to the cluster controller with the least load.
III. The cluster controller sends the request to the VM with the least load.
IV. The VM saves the file on its disk.

3.5. File Read Request

When an Internet user requests to read a file, the cloud system follows these steps:

I. The central controller receives the request from the Internet.
II. The central controller sends the request to the cluster controller whose cluster contains the file.
III. The cluster controller sends the request to the VM that contains the file.
IV. The VM sends the file to the user.

3.6. VM Failure

If a VM fails, the cluster controller stops writing files to that VM and informs the system manager so that the VM can be repaired.

4. SIMULATION

We implemented CCS in the CloudSim [15] simulator. In this section, we evaluate the performance and the overhead of CCS. To do this, we define the simulation scenario presented in Table 1. In this scenario, we vary the number of VMs across executions while keeping the other parameters fixed, in order to evaluate the scalability of CCS.
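As a rough illustration of the logic being simulated, the clustering loop of Section 3.2 (step VI) and the request handling of Sections 3.3 to 3.5 can be sketched in Python as follows. This is not the authors' CloudSim code. The names `cluster_vms`, `ClusterController`, `CentralController`, the tie-breaking rules, the file indexes used to locate a file on a read, and the choice never to retire a counted request are all assumptions made for the sketch; the paper does not specify them.

```python
def cluster_vms(vm_hosts, num_clusters):
    """Step VI: place each VM in the least-populated eligible cluster.

    vm_hosts maps a VM id to its physical computer (hypothetical input shape).
    A cluster is eligible if it holds no VM from the same physical computer.
    """
    members = {cid: [] for cid in range(num_clusters)}
    hosts_used = {cid: set() for cid in range(num_clusters)}
    assignment = {}
    for vm, host in vm_hosts.items():
        eligible = [cid for cid in members if host not in hosts_used[cid]]
        if not eligible:
            # The paper does not cover this case; failing loudly is an assumption.
            raise ValueError(f"no eligible cluster for {vm}")
        # Ties are broken by the lowest cluster id (assumption).
        cid = min(eligible, key=lambda c: (len(members[c]), c))
        members[cid].append(vm)
        hosts_used[cid].add(host)
        assignment[vm] = cid  # step VI.b: the VM learns its cid
    return assignment


class ClusterController:
    def __init__(self, vm_ids):
        self.vm_load = {vm: 0 for vm in vm_ids}  # load table: requests sent per VM
        self.file_vm = {}                        # file name -> VM (assumed index)

    def write(self, name):
        # Section 3.4, step III: pick the VM with the least load.
        vm = min(self.vm_load, key=lambda v: (self.vm_load[v], v))
        self.vm_load[vm] += 1  # table is updated after sending the request
        self.file_vm[name] = vm
        return vm

    def read(self, name):
        # Section 3.5, step III: forward to the VM that holds the file.
        vm = self.file_vm[name]
        self.vm_load[vm] += 1
        return vm


class CentralController:
    def __init__(self, clusters):
        self.clusters = clusters                          # cid -> ClusterController
        self.cluster_load = {cid: 0 for cid in clusters}  # load table per cluster
        self.file_cluster = {}                            # file name -> cid (assumed index)

    def write(self, name):
        # Section 3.4, step II: pick the cluster controller with the least load.
        cid = min(self.cluster_load, key=lambda c: (self.cluster_load[c], c))
        self.cluster_load[cid] += 1
        self.file_cluster[name] = cid
        return cid, self.clusters[cid].write(name)

    def read(self, name):
        # Section 3.5, step II: forward to the cluster that contains the file.
        cid = self.file_cluster[name]
        self.cluster_load[cid] += 1
        return cid, self.clusters[cid].read(name)


# Usage: four VMs on two computers, grouped into two clusters.
vm_hosts = {"vm1": "pc1", "vm2": "pc1", "vm3": "pc2", "vm4": "pc2"}
assignment = cluster_vms(vm_hosts, num_clusters=2)
by_cluster = {}
for vm, cid in assignment.items():
    by_cluster.setdefault(cid, []).append(vm)
central = CentralController(
    {cid: ClusterController(vms) for cid, vms in by_cluster.items()}
)
print(assignment)
print(central.write("a.txt"), central.read("a.txt"))
```

In this sketch the central controller only ever touches one table entry per cluster, which is the property the design relies on: per-VM bookkeeping stays inside the cluster controllers.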

We compare the following systems:

CCS: The cluster-based cloud storage system proposed in Section 3.
SCS: The Standard Cloud System, with a central controller and without clustering.

4.1. Simulation Results

In SCS, the central controller has to handle all the tasks of every request. Thus, it takes a long time to process a read/write request. As the request rate grows, response latency rises dramatically. This shows that SCS is not scalable and causes large read/write delays when the request rate increases.

Table 1. Simulation Parameters.

Parameter               Value
Number of VMs (nv)      From 100 to 12500
Number of computers     50
File Write Rate         nv/100 requests per second
File Read Rate          nv/100 requests per second
File Size               100 Kbytes
Number of Files (nf)    100 * nv
Simulation Duration     1 hour

Figure 2. Average file read latency versus number of VMs (plot of average read latency in seconds for CCS and SCS at 100, 500, 2500, and 12500 VMs)

Figure 3. Average file write latency versus number of VMs (plot of average write latency in seconds for CCS and SCS at 100, 500, 2500, and 12500 VMs)
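To make the scale of the scenario concrete, the workload implied by Table 1 can be worked out for the four VM counts shown on the figure axes. The short Python sketch below only applies the formulas from Table 1; the function name `workload` and the derived quantities (total requests issued in one run, total data volume if every one of the nf files is 100 Kbytes) are illustrative assumptions, not figures reported in the paper.

```python
def workload(nv, duration_s=3600, file_kb=100):
    """Apply the Table 1 formulas for a given number of VMs (nv)."""
    return {
        "write_rate": nv / 100,                 # requests per second
        "read_rate": nv / 100,                  # requests per second
        "num_files": 100 * nv,                  # nf
        # Derived values below are assumptions, not reported results.
        "writes_in_run": nv * duration_s // 100,
        "reads_in_run": nv * duration_s // 100,
        "data_kb": 100 * nv * file_kb,
    }


for nv in (100, 500, 2500, 12500):
    w = workload(nv)
    print(nv, w["write_rate"], w["num_files"], w["writes_in_run"], w["data_kb"])
```

Because both request rates and the number of files grow linearly with nv, the largest configuration issues 125 times the request rate of the smallest one, which is what stresses a single central controller.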

Figure 4. Load on the central controller versus number of VMs (plot of load on the central controller in tasks per second for CCS and SCS at 100, 500, 2500, and 12500 VMs)

In CCS, the tasks of requests are distributed among the cluster controllers. Thus, the number of tasks assigned to the central controller increases slowly. Using this technique, CCS can add VMs and clusters to handle more requests. This shows that CCS is scalable and keeps read/write delays small when the request rate increases.

Figures 2 and 3 illustrate how much delay user requests experienced in our experiment. These results show that CCS handles user requests on average 182 percent faster than the standard cloud system. Figure 4 illustrates how much load the central controller experienced in our experiment. These results show that CCS assigns on average 507 percent fewer tasks to the central controller than the standard cloud system.

5. CONCLUSIONS

In this paper, we propose a novel cloud storage system containing thousands of virtual file servers on hundreds of computers. We group these virtual servers into clusters. This design is scalable because the system load is divided among the clusters. Our simulation experiments show that our cloud storage system achieves lower file read/write latency and lower traffic and processing overhead than existing systems.

REFERENCES

[1] Cloud Storage, http://en.wikipedia.org/wiki/cloud_storage
[2] Amazon S3, http://en.wikipedia.org/wiki/amazon_s3
[3] EMC Atmos, http://en.wikipedia.org/wiki/emc_atmos
[4] Sean Rhea, Chris Wells, Patrick Eaton, Dennis Geels, Ben Zhao, Hakim Weatherspoon, and John Kubiatowicz, Maintenance-Free Global Data Storage, IEEE Internet Computing, Vol. 5, No. 5, September/October 2001, pp. 40-49.
[5] Byung Chul Tak, Chunqiang Tang, and Rong N. Chang, Designing a Storage Infrastructure for Scalable Cloud Services, Technical Report, The Pennsylvania State University, 2011.
[6] Tin Tin Yee, Thinn Thu Naing, PC-Cluster based Storage System Architecture for Cloud Storage, International Journal on Cloud Computing: Services and Architecture, Vol. 1, No. 3, November 2011.
[7] Michael Vrable, Stefan Savage, and Geoffrey M. Voelker, BlueSky: A Cloud-Backed File System for the Enterprise, Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST), San Jose, CA, February 2012.
[8] Sherman S. M. Chow, Cheng-Kang Chu, Xinyi Huang, Jianying Zhou, Robert H. Deng, Dynamic Secure Cloud Storage with Provenance, Cryptography and Security, pp. 442-464, 2012.
[9] Ji-Yong Shin, Mahesh Balakrishnan, Lakshmi Ganesh, Tudor Marian, Hakim Weatherspoon, Gecko: A Contention-Oblivious Design for Cloud Storage, Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), Boston, MA, U.S.A., June 2012.
[10] Lakshmi Ganesh, Hakim Weatherspoon, Ken Birman, Beyond Power Proportionality: Designing Power-Lean Cloud Storage, NCA 2011, pp. 147-154, 2011.
[11] David Bermbach, Markus Klems, Stefan Tai, Michael Menzel, MetaStorage: A Federated Cloud Storage System to Manage Consistency-Latency Tradeoffs, IEEE CLOUD, pp. 452-459, 2011.
[12] A. Kumbhare, Y. Simmhan, V. Prasanna, Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds, Proceedings of the Second International Workshop on Data Intensive Computing in the Clouds, pp. 31-40, 2011.
[13] H. T. Vo, C. Chen, B. C. Ooi, Towards Elastic Transactional Cloud Storage with Range Query Support, Int'l Conference on Very Large Data Bases (VLDB), 2010.
[14] Donald Kossmann, Tim Kraska, Simon Loesing, Stephan Merkli, Raman Mittal, Flavio Pfaffhauser, Cloudy: A Modular Cloud Storage System, PVLDB 3(2), pp. 1533-1536, 2010.
[15] Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, César A. F. De Rose, Rajkumar Buyya, CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms, Software Practice & Experience, Vol. 41, No. 1, pp. 23-50, January 2011.

AUTHORS

Parinaz Eskandarian was born in 1987 in Iran. Ms. Eskandarian received her B.Engr. degree from University of Tabriz Jahad Daneshgahi, and her M.S. degree in computer engineering from Islamic Azad University, Zanjan Branch (Zanjan, Iran) in 2012.

Jaber Karimpour was born in 1975 in Iran. Dr. Karimpour received his B.Engr. degree and his M.S. degree in computer science from University of Tabriz. He also received his Ph.D. degree in computer science from University of Tabriz (Tabriz, Iran) in 2009.