Study on Redundant Strategies in Peer to Peer Cloud Storage Systems



Similar documents
UPS battery remote monitoring system in cloud computing

DESIGN AND IMPLEMENTATION OF A SECURE MULTI-CLOUD DATA STORAGE USING ENCRYPTION

ISSN Index Terms Cloud computing, outsourcing data, cloud storage security, public auditability

Data Mining for Data Cloud and Compute Cloud

Data Security Strategy Based on Artificial Immune Algorithm for Cloud Computing

Recent Advances in Cloud Security

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

A Survey on Cloud Storage

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

Design of Electric Energy Acquisition System on Hadoop

Secrecy Maintaining Public Inspecting For Secure Cloud Storage

Improving data integrity on cloud storage services

SHARED DATA & INDENTITY PRIVACY PRESERVING IN CLOUD AND PUBLIC AUDITING

Proof of Retrivability: A Third Party Auditor Using Cloud Computing


A Digital Fountain Approach to Reliable Distribution of Bulk Data

New Cloud Computing Network Architecture Directed At Multimedia

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) PERCEIVING AND RECOVERING DEGRADED DATA ON SECURE CLOUD

Efficient Data Replication Scheme based on Hadoop Distributed File System

How To Encrypt Data With A Power Of N On A K Disk

An Architecture Model of Sensor Information System Based on Cloud Computing

Scalable Multiple NameNodes Hadoop Cloud Storage System

A Comprehensive Data Forwarding Technique under Cloud with Dynamic Notification

RSA BASED CPDP WITH ENCHANCED CLUSTER FOR DISTRUBED CLOUD STORAGE SERVICES

Near Sheltered and Loyal storage Space Navigating in Cloud

Cloud Storage Solution for WSN Based on Internet Innovation Union

Self-Compressive Approach for Distributed System Monitoring

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Modeling for Web-based Image Processing and JImaging System Implemented Using Medium Model

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

International Journal of Engineering Research ISSN: & Management Technology November-2015 Volume 2, Issue-6

Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching

Telecom Data processing and analysis based on Hadoop

NLSS: A Near-Line Storage System Design Based on the Combination of HDFS and ZFS

A Hybrid Electrical and Optical Networking Topology of Data Center for Big Data Network

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Verifying Correctness of Trusted data in Clouds

A Scheme for Implementing Load Balancing of Web Server

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Big Data Storage Architecture Design in Cloud Computing

An Efficient Load Balancing Technology in CDN

AN APPLICATION OF INFORMATION RETRIEVAL IN P2P NETWORKS USING SOCKETS AND METADATA

Ensuring Data Storage Security in Cloud Computing By IP Address Restriction & Key Authentication

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2

Cloud Storage Solution for WSN in Internet Innovation Union

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Processing of Hadoop using Highly Available NameNode

Cyber Forensic for Hadoop based Cloud System

Data Mining with Hadoop at TACC

Study on Cloud Computing Resource Scheduling Strategy Based on the Ant Colony Optimization Algorithm

A Hybrid Load Balancing Policy underlying Cloud Computing Environment

2. Research and Development on the Autonomic Operation. Control Infrastructure Technologies in the Cloud Computing Environment

The Construction of Seismic and Geological Studies' Cloud Platform Using Desktop Cloud Visualization Technology

Keywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing

International Journal of Advance Research in Computer Science and Management Studies

A P2P SERVICE DISCOVERY STRATEGY BASED ON CONTENT

MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM

HadoopRDF : A Scalable RDF Data Analysis System

Cloud computing doesn t yet have a

Open Access Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform

Design of Data Archive in Virtual Test Architecture

Secure Way of Storing Data in Cloud Using Third Party Auditor

Cloud Computing for Agent-based Traffic Management Systems

R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

Analysis of Information Management and Scheduling Technology in Hadoop

Method of Fault Detection in Cloud Computing Systems

Research on Storage Techniques in Cloud Computing

A Novel Load Balancing Optimization Algorithm Based on Peer-to-Peer

CONCEPTUAL MODEL OF MULTI-AGENT BUSINESS COLLABORATION BASED ON CLOUD WORKFLOW

Reduction of Data at Namenode in HDFS using harballing Technique

Integration of Hadoop Cluster Prototype and Analysis Software for SMB

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

IMPROVED SECURITY MEASURES FOR DATA IN KEY EXCHANGES IN CLOUD ENVIRONMENT

A Secure Decentralized Access Control Scheme for Data stored in Clouds

Realizing a Vision Interesting Student Projects

Quality of Service Routing Network and Performance Evaluation*

DATA SECURITY MODEL FOR CLOUD COMPUTING

Distributed Framework for Data Mining As a Service on Private Cloud

Cloud Based E-Government: Benefits and Challenges

The Study of a Hierarchical Hadoop Architecture in Multiple Data Centers Environment

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Design and Implementation of IaaS platform based on tool migration Wei Ding

Analysis of Secure Cloud Data Sharing Within a Group

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

A Network Simulation Experiment of WAN Based on OPNET

Signature Amortization Technique for Authenticating Delay Sensitive Stream

Transcription:

Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 235S-242S Study on Redundant Strategies in Peer to Peer Cloud Storage Systems Wu Ji-yi 1, Zhang Jian-lin 1, Wang Tong 2 and Shen Qian-li 1 1 Key Lab of E-Business and Information Security, Hangzhou Normal University,Hangzhou 310036,PR China 2 Information and Communication Engineering College, Harbin Engineering University, Harbin 150001,PR China Email Address: wangtong@hrbeu.edu.cn Received June 22, 2010; Revised March 5, 2011 Abstract: Relative to the current Master/Slave computing model cloud storage file systems, including GFS, HDFS, Sector, we proposed a general model of Peer-to-Peer based Cloud Storage file system, and constructed a prototype system named MingCloud based on Kademlia algorithm. Focused on redundant strategy selection and design of the storage system, our experiment used the Cauchy Codes as coding algorithm to assess the system from many aspects. Compared with the redundant mode of complete copy, Erasure Code model provided more satisfactory system availability, and more suitable to apply to our P2P Cloud Storage System. Keywords: cloud computing, cloud storage, redundancy, peer-to-peer. 1 Introduction As the latest hotspot in distributed storage nearly 3 years, there are many valuable research on cloud storage from academic circles. Robert L. Grossman from University of Illinois put forward and implemented a wide area network based high performance computing and storage cloud, Sector/Sphere [1,2], and experimental results show it s better performance than Hadoop. MetaCDN were designed and proposed by Australia University of Melbourne s James Broberg [3], to integrate cloud storage service from different service providers, and provide a unified highperformance and low-cost content distributed storage and distribution services for content creators. Kevin D. Bowers et al of RSA Laboratories [4] proposed a high reliability, completeness cloud storage model HAIL, and carried out experiments on security and efficiency. Knowledge storage model to dynamic fuse local storage and cloud storage were put forward by David Tarrant, etc [5] from University of Southampton, UK. Relative to the current Master/Slave model cloud storage file systems, including

Wu Ji-yi et al 236 GFS, HDFS, Sector, we proposed a general model of Peer-to-Peer based Cloud Storage file system, and constructed a prototype system named MingCloud based on Kademlia algorithm. The results from a simulation experiment show that the system has high availability and performance, which can be actually applied to the Internet dynamic and open environment after improving and optimizing, and to provide higher quality of cloud storage service. 2 Redundant strategies in P2P Cloud Storage System 2.1. P2P Cloud Storage System Highlighting the system security and reliability of data, our system can provide basic data storage, read, delete, search and other storage services. To adopt highly scalable P2P system architecture, the basic idea of the MingCloud is to build a high performance, high reliability, low-cost distributed cloud storage system by integrating these idle storage resources in Internet connected desktop computers. According to the scale of storage nodes number, the system need to be divided into different Domains or Clusters, configured one Master Server to store user's directory information, file index information and other meta-data, and complete domain user authentication. With a hierarchical architecture, as shown in Fig. (2.1), MingCloud can be divided into five layers: Physical Layer, Router Layer, Data Layer, Session Layer, Application Layer, from a functional point of view. Figure 2.1: Layered architecture in MingCloud P2P cloud storage system 2.2. Redundant strategies selection

Study on Redundant Strategies in Peer 237 At present, there are many coding algorithms, such as Streaming Codes, Shamir Threshold Scheme, Forward Error Correction, Reed-Solomon, Vandermonde, Cauchy Codes [6] and Tornado Codes [7]. Are these algorithms suitable for P2P cloud storage system? How to choose and realize an adaptive algorithm for our MingCloud P2P cloud system, is critical. After experimental analysis, combined with related research on encoding algorithm in paper [8], Vandermonde, Cauchy Codes and Tornado codes algorithms are more suitable for our P2P cloud storage systems. The paper [8] also experimental proved that Tornado Codes have advantages and high efficiency compared with Cauchy Codes more early. More than 100 times in execution speed of Cauchy Codes; Tornado Codes operate and process through pair of graphs, as simple, fast and practical algorithms. Taking into account algorithm protection by the Digital Fountain Company, our system temporarily adopt the relatively good performance Cauchy Codes as the core algorithm. In future study, our team also hopes to design a coding algorithm close to Tornado in performance. 3 Experiments and performance analysis 3.1. Experimental Environment and Purpose Our experiments will compare the influence on system availability by using different redundancy strategy. A Java based P2P simulator PeerSim is adopted, which is run in JDK1.6 and eclipse on a PC with dual-core CPU 1.8 GHZ and 2G memory whose operation system is Windows XP. The purpose of our experiments is to verify the correctness of Eq. (3.1) and Eq. (3.2). First, we compare the system availabilities under different k-bucket sizes, in order to choose an appropriate k value; Second, we compare the system availabilities under different block number of file for each redundant strategy, in order to show the applicable scene of each redundant strategy; At last, we compare the system availabilities under different redundant strategies with the same redundant rate, in order to compare these two redundant strategies directly. r A= (1 (1 p) ) n (3.1) mr mr i A= p (1 p) i= m i mr i (3.2)

Wu Ji-yi et al 238 3.2. Experimental results and analysis As shown in Fig. (3.1), when the parameters value is assigned as followed: p=0.6, r=2.0, n=1, m=1, system availability increases as the size of k-buckets increase, whichever the adopted redundant strategy is. It is because some data may be much closer to a new node, so other nodes may tend to request data from this new node, which leads to reading failure. When the size of k-bucket k is little, other nodes request data from few nods, which make it more likely to fail. When k 4, system availability tends to be stable, because there tends to be at least one effective node among these k nodes when k is large enough. Figure 3.1: The influence of different k-bucket sizes on system availability According to the Eq. (3.1), we can see that the theoretical value of system availability is 0.84 when full replica redundant strategy is adopted. According to the Eq. (3.2), we can see that the theoretical value of system availability is 0.84 when error correction code redundant strategy is adopted. As shown in Fig. (3.1), the experiment result is close to the theoretical value when k 4. When file is large, it is wise to split the file into small blocks, which can reduce the extra spending when transmission failure happens. This experiment will measure the influence of different block number on system availability, where p=0.6, r=2.0 and k=8. When error code redundant strategy is adopted, file is first divided into m

Study on Redundant Strategies in Peer 239 blocks, and then the m blocks are encoded into mr new blocks. The experiment result is shown in Fig. (3.2). As the result shows, the system availability decreases rapidly as the block number increases when full replica redundant strategy is adopted, it may because the probability that all of the blocks exists in the network decreases as the block number increase. So it is wise not to split the file when full replica redundant strategy is adopted in order to keep high system availability. The experiment result is close to the theoretical value, for example, the theoretical system availability is 0.21 when block number equals to 9, and the system availability is 0.175 when block number is 10. System availability remains stable as the block number increases when error code redundancy is adopted, it is because the error code redundancy strategy does not require all of the blocks exist in the network. The error code redundancy strategy can restore the original file even if only some of the file blocks exist. Furthermore, system availability increases slowly as the block number increases according to Eq. (3.2), and it remains stable when m 5. Figure 3.2: The influence of different block number on system availability Comparing these two redundancy strategies, we knows that full replica redundant strategy is only suitable to save small files which don t need to be split, but error replica redundant strategy is suitable to save any size of files, which make it more suitable to P2P file storage system.

Wu Ji-yi et al 240 As shown is Fig. (3.3), the system availability increases as the redundant rate increases, whichever the adopted redundant strategy is. When the redundant rate is small ( r < 2 ), full replica redundant strategy obtains a higher system availability, which can be derived from Eq. (3.1) and Eq. (3.2) too. When the redundant rate is large ( r 2 ), error code redundant strategy obtains a higher system availability, which can also be derived from Eq. (3.1) and Eq. (3.2). Figure 3.3: The influence of different redundant strategies on system availability When redundant rate is 1, system availability is poor, because in this case, error code redundant strategy has degenerated to full replica redundant strategy, and what's more important, the block number is 5, so the system availability becomes poor according to Eq. (3.1). In summary, erasure codes redundant strategy will obtains higher system availability than full replica redundant strategy when redundant rate is high. 4 Conclusions Compared with the redundant mode of complete copy, the Erasure Code model provided more satisfactory system availability, and more suitable to apply to our P2P Cloud Storage System. Therefore, the availability of our systems needs to be further validated and tested. Based on the work in this paper, we will have the further indepth study in routing algorithm optimization and design, redundancy algorithm optimization and design, data security, trust and reputation mechanism, incentives mechanism.

Study on Redundant Strategies in Peer 241 Acknowledgements Funding for this research was provided in part by the Scientific Research Program of Zhejiang Educational Department under Grant No.20071371, National Natural Science Foundation of China (Grant No.61070153). We like to thank anonymous reviewers for their valuable comments. References [1] Yunhong Gu and Robert L.Grossman. Sector and Sphere: the design and implementation of a high-performance data cloud. Philosophical Trans. of the Royal Society. A367: 2429,2009. [2] Robert L Grossman,Yunhong Gu. Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere. In Proc. of the 14th ACM SIGKDD,920,2008. [3] James Broberg and Zahir Tari. MetaCDN: Harnessing Storage Clouds for High Performance Content Delivery. In: Proc. of ICSOC 2008,LNCS 5364, 730,2008. [4] Kevin D.Bowers,Ari Juels, and Alina Oprea. HAIL: A High-Availability and Integrity Layer for Cloud Storage. Cryptology eprint Archive, Report 2008/489, Retrieved from http://eprint.iacr.org/,2011. [5] David Tarrant,Tim Brody, and Leslie Carr. From the Desktop to the Cloud: Leveraging Hybrid Storage Architectures in your Repository,2011. [6] James.S.Plank.Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications.Technical Report CS-OS-569 Deparment of CoScience University oftenessee.december,2005. [7] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman.Efficient erasure correcting codes [J]. IEEE Transactions on Information Theory, 47(2):569,2001. [8] John W.Byers, Michael Luby, Michael Mitzenmacher. A Digital Fountain Approach to Reliable Distribution of Bulk Data. ACM SIGCOMM Computer Communication Review, 28(4):56,1998. Wu Jiyi received M.Eng. degree in 2005 from Zhejiang University, Hangzhou China, in computer science. He is currently a PhD candidate in the School of Computer Science and Technology, Zhejiang University, and associate professor of the Key Lab of E-Business and Information Security, Hangzhou Normal University. His research interests include services computing, trust and reputation.

Wu Ji-yi et al 242 Zhang Jianlin is a professor at Alibaba Business School, Hangzhou Normal University. He received Master's degree in Computer Application from Zhejiang University in 1998. His research interests include Cloud Computing, SaaS and information security. Prof. Zhang was born in 1966. He received B.S degree from Zhejiang University of Technology in 1989. Wang Tong is an associate professor at Information and Communication Engineering College, Harbin Engineering University. He received Doctor's degree in Computer Application from Harbin Engineering University in 2006. His research interests include Cloud Computing, SaaS and information security. He was born in 1977. He received Master's degree from Harbin Engineering University in 2003. Shen Qianli is a lecturer at Alibaba Business School, Hangzhou Normal University. His research interests include SaaS and information security.