Peer-to-Peer Replication



Similar documents
P2P Networking - Advantages and Disadvantages of Virtualization

DUP: Dynamic-tree Based Update Propagation in Peer-to-Peer Networks

Varalakshmi.T #1, Arul Murugan.R #2 # Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam

Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities

Load Balancing in Structured Overlay Networks. Tallat M. Shafaat

A NEW FULLY DECENTRALIZED SCALABLE PEER-TO-PEER GIS ARCHITECTURE

Krunal Patel Department of Information Technology A.D.I.T. Engineering College (G.T.U.) India. Fig. 1 P2P Network

IPTV AND VOD NETWORK ARCHITECTURES. Diogo Miguel Mateus Farinha

GRIDB: A SCALABLE DISTRIBUTED DATABASE SHARING SYSTEM FOR GRID ENVIRONMENTS *

SCALABLE RANGE QUERY PROCESSING FOR LARGE-SCALE DISTRIBUTED DATABASE APPLICATIONS *

Tornado: A Capability-Aware Peer-to-Peer Storage Network

Load Balancing in Structured P2P Systems

A PROXIMITY-AWARE INTEREST-CLUSTERED P2P FILE SHARING SYSTEM

Join and Leave in Peer-to-Peer Systems: The DASIS Approach

LOOKING UP DATA IN P2P SYSTEMS

Data Storage Requirements for the Service Oriented Computing

New Structured P2P Network with Dynamic Load Balancing Scheme

Load Balancing in Distributed Systems: A survey

Clustering in Peer-to-Peer File Sharing Workloads

GISP: Global Information Sharing Protocol a distributed index for peer-to-peer systems

Discovery and Routing in the HEN Heterogeneous Peer-to-Peer Network

Exploring the Design Space of Distributed and Peer-to-Peer Systems: Comparing the Web, TRIAD, and Chord/CFS

LOAD BALANCING FOR OPTIMAL SHARING OF NETWORK BANDWIDTH

Trust and Reputation Management in P2P Networks

Database Replication with Oracle 11g and MS SQL Server 2008

Towards a scalable ad hoc network infrastructure

Distributed Hash Tables in P2P Systems - A literary survey

Chord. A scalable peer-to-peer look-up protocol for internet applications

Efficient Search in Gnutella-like Small-World Peerto-Peer

Improving Availability with Adaptive Roaming Replicas in Presence of Determined DoS Attacks

Acknowledgements. Peer to Peer File Storage Systems. Target Uses. P2P File Systems CS 699. Serving data with inexpensive hosts:

RESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT

Clustering in Peer-to-Peer File Sharing Workloads

A Peer-to-Peer File Sharing System for Wireless Ad-Hoc Networks

Calto: A Self Sufficient Presence System for Autonomous Networks

Effective Load Balancing in P2P Systems

A P2P SERVICE DISCOVERY STRATEGY BASED ON CONTENT

Enhancing Secure File Transfer by Analyzing Repeated Server Based Strategy using Gargantuan Peers (G-peers)

A Reputation Management System in Structured Peer-to-Peer Networks

Identity Theft Protection in Structured Overlays

Magnus: Peer to Peer Backup System

New Algorithms for Load Balancing in Peer-to-Peer Systems

Peer-VM: A Peer-to-Peer Network of Virtual Machines for Grid Computing

An Introduction to Peer-to-Peer Networks

Identity Theft Protection in Structured Overlays

Database Replication with MySQL and PostgreSQL

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

A Self-Managing SIP-based IP Telephony System based on a P2P approach using Kademlia

Distributed Software Development with Perforce Perforce Consulting Guide

File sharing using IP-Multicast

Modeling Peer-Peer File Sharing Systems

High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching

I. Middleboxes No Longer Considered Harmful II. A Layered Naming Architecture for the Internet

LOAD BALANCING WITH PARTIAL KNOWLEDGE OF SYSTEM

A Survey of Peer-to-Peer Storage Techniques for Distributed File Systems

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Serving DNS using a Peer-to-Peer Lookup Service

PAST: A large-scale, persistent peer-to-peer storage utility

A Hybrid Keyword Search across Peer-to-Peer Federated Databases

Quantitative Analysis of 2-tier P2P- SIP Architecture with ID-based Signature

An Efficient Strategy for Data Recovery in Wi-Fi Systems

Distributed Consistency Method and Two-Phase Locking in Cloud Storage over Multiple Data Centers

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD

A Highly Scalable and Efficient Distributed File Storage System

Optimal Proactive Caching in Peer-to-Peer Network: Analysis and Application

The Case for a Hybrid P2P Search Infrastructure

GROUPWARE. Ifeoluwa Idowu

A Survey of Peer-to-Peer Networks

P2P Storage Systems. Prof. Chun-Hsin Wu Dept. Computer Science & Info. Eng. National University of Kaohsiung

Internet Indirection Infrastructure

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Chord - A Distributed Hash Table

BAJI SHAHEED BABA SHAIK 1, G CHINNA BABU 2, UPPE NANAJI 3

DATA REPLICATION STRATEGIES IN WIDE AREA DISTRIBUTED SYSTEMS

How To Build A Cooperative Service On Post

Secure Communication in a Distributed System Using Identity Based Encryption

Eventually Consistent

Balanced Reputation Detective System (BREDS): Proposed Algorithm

Nom: Resource Location and Discovery for Ad Hoc Mobile Networks

Mobile File-Sharing over P2P Networks

Storage Systems Autumn Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Availability Digest. MySQL Clusters Go Active/Active. December 2006

A Secure Distributed Search System

Appendix A Core Concepts in SQL Server High Availability and Replication

Decentralized supplementary services for Voice-over-IP telephony

Enhance Load Rebalance Algorithm for Distributed File Systems in Clouds

Data Distribution with SQL Server Replication

How To Build A Training System On A Network Of Nodes

IAX-Based Peer-to-Peer VoIP Architecture

Architectures and protocols in Peer-to-Peer networks

A Survey of Peer-to-Peer Content Distribution Technologies

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi

Information Searching Methods In P2P file-sharing systems

EMC DOCUMENTUM MANAGING DISTRIBUTED ACCESS

1. Introduction. Keywords: content distribution, overlay, peer-to-peer

<Insert Picture Here> Getting Coherence: Introduction to Data Grids South Florida User Group

Technological Trend. A Framework for Highly-Available Cascaded Real-Time Internet Services. Service Composition. Service Composition

Middleware and Distributed Systems. Peer-to-Peer Systems. Martin v. Löwis. Montag, 30. Januar 12

Multicast vs. P2P for content distribution

Transcription:

Peer-to-Peer Replication Matthieu Weber September 13, 2002 Contents 1 Introduction 1 2 Database Replication 2 2.1 Synchronous Replication..................... 2 2.2 Asynchronous Replication.................... 2 2.3 Conflict Avoidance, Detection and Resolution......... 3 3 Peer-to-Peer Systems 3 3.1 Peer-to-Peer File Sharing Systems................ 4 4 Peer-to-peer Database Replication 4 4.1 Quorum Calculation....................... 4 4.2 Non-Unitary Weighting...................... 5 5 Peer-to-Peer File Replication 5 6 Conclusion 6 1 Introduction Research on the W.W.W. about peer-to-peer replication have given two kind of results: replication of databases in a peer-to-peer way, data replication in an unstructured peer-to-peer network. 1

The two approaches, if leading to the same results (i.e. data is replicated on different nodes of a network), are rather different. Database replication aims to replicate data along with its metadata (e.g. a phone number and the fact that this piece of data is a phone number and it belongs to Mr Jones) across a well-known set of database in a controlled environment, whereas peer-to-peer network allow the users to share (i.e. allow other users to copy, therefore replicate) files without much checking about metadata, with whoever wants to copy the files. The consequence is that data is replicated, but the user might get the recipe of pea soup instead of the phone number of Mr Jones, due to inconsistency in metadata. 2 Database Replication The fundamental issue in data replication is maintaining data integrity in across multiple database images. Two primary solutions exist, synchronous and asynchronous replication[1]. 2.1 Synchronous Replication Synchronous replication requires that every image of the database be written at once. Data integrity is maintained in traditional database solutions with a two-phase commit within a transaction that accesses every image of the database element to be written, and processes only when every image is available for update. In traditional database solutions, synchronous replication eliminates the distinction between master/slave and peer-to-peer configurations. However, traditional synchronous solutions are vulnerable to system failure. If one image of the database is unavailable due to server failure, the transaction is prevented from completing. While synchronous replication assures data integrity, it does it so at the expense of availability. System availability is greatly reduced with synchronous replication if the links between the database images are fragile. This vulnerability to network failures also prevents synchronously replicated database servers from being geographically distributed, leaving them vulnerable to location specific disasters, like earthquakes. 2.2 Asynchronous Replication Asynchronous data replication provides two benefits, improved read performance and continuous availability. The failure of a particular server does not 2

generally affect the operation of he other servers, but does force more work to be done in the application to maintain data integrity. When changes are made, a single image of the database is updated, and then the changes are propagated to the remaining images. The two popular mechanisms for propagating changes to the other databases images are embedded triggers and passing change logs. Triggers are implemented inside the database and add to the overhead associated with performing transactions in the database. Passing change logs also increases network overhead during the replication of the update, but is less intrusive on the local database. Asynchronous replication also allows database images to be isolated geographically, since each site has its own version of the data. This fact is of growing importance, since more and more e-commerce companies grow overseas, and need to provide access points (i.e. the company s web page) in different countries, thus allowing efficient and uninterrupted access to the company s database, which can be greatly improved with the help of peerto-peer database replication[12]. 2.3 Conflict Avoidance, Detection and Resolution Conflict resolution is required when two images if the same database are updated at the same time, without knowledge of the other image. The simplest solution for this type of conflict is to have the application avoid such conflict by ensuring that each data element belongs to one database image or another. This forces a master/slave relationship between database images. Once the conflict exists, traditional databases provide mechanisms to detect and resolve them. Conflict resolution algorithms assign priorities to the conflicting updates based on update requester status (master or slave), timestamps, or some type of application dependent algorithm. 3 Peer-to-Peer Systems Peer-to-peer networks are born because of the need to share files among users, which didn t have the possibility to publish their files in the traditional client/server way. Typically, peer-to-peer network users have huge amounts of files, that cannot be fit on public shared servers (like for example free web hosting services). The idea of peer-to-peer computing is that each user becomes at the same time server and client, thus having an equal role than other users. 3

This paradigm can be extended to whatever interaction where all the protagonists have the same rank. There is no notion anymore of master/slave or client/server. 3.1 Peer-to-Peer File Sharing Systems The main problem in peer-to-peer file sharing network is the advertisement of shared files. It has been solved first with the help of a central index (Napster[9]). This structure has proved to be easy to tear down, since each node of the network depends on the index, which is a potential point of failure. Other index-based solutions exist, using multiple indexes loosely replicated (edonkey[10], FastTrack[11]). Another solution has been to build a completely distributed network. This kind of peer-to-peer network might be either strongly structured (Chord[6], CAN[5], Past[7], Tapestry[8]) loosely structured (Freenet[3]) or unstructured (Gnutella[4]). Unstructured peer-to-peer network don t rely on any central index for searching. The nodes instead send search queries and ask who owns the file one is looking for. Structured network additionally rely on their implicit structure to make the search faster. 4 Peer-to-peer Database Replication In Objectivity/DB[1], peer-to-peer replication is achieved as follows: multiple database images are set up, and a a transaction is completed only when enough images are available to form a quorum. 4.1 Quorum Calculation The number of database images required to complete a transaction is calculated implicitly at runtime. This is called the quorum calculation. Simply put, the database images vote and the majority wins. When a database image is accessed, and a lock is implicitly requested as part of that access, the application process running on the local computer contacts each lock server to determine which database images are available to vote. If a majority of the images of a database is available, called a quorum, the access is permitted. The images which are not available, for whatever reason, will be automatically resynchronized when they come back on line. 4

4.2 Non-Unitary Weighting There are many replication scenarios in which some database images require special access or ownership of particular data, regardless of server or network failures elsewhere in the system. This is addressed by assigning voting weights to each database image. A master/slave configuration can thus be achieved by granting the master more weight than the sum of the weights of the slaves. Thus the master must always be on line in order to have a quorum of available images. 5 Peer-to-Peer File Replication Peer-to-peer file sharing systems don t provide, nowadays, any version control system nor fail-proof data identification system through the use of metadata. The consequence is that data replication in peer-to-peer networks is done at file level and is not very useful for a completely automated system: conflict resolution is entirely left to the users. Today s peer-to-peer file sharing systems allow only to replicate pieces of data, not to manage update, different versions nor concurrent write access to the data. Whereas data in a database is well organized, distributed peer-to-peer file sharing systems are completely chaotic. File replication is usually done only when a user is requesting it, therefore creating a distinction between popular files (files which are often requested) and unwanted files. Because there is no index of all available data, searching is required before copying a file. The performance of searching data depends greatly on how much that piece of data has been replicated in the network[2]. Various file sharing systems have different policies: in Gnutella[4], a file is replicated when the requester node gets a copy of it; in Freenet[3], a file is proactively replicated at node which have note explicitly requested the file, but which are involved in the retrieval process. In[2] are compared three replication strategies: Uniform replication, for which a fixed number of copies is made for each file, Proportional replication, where a fixed number of copies is made after each request of a file, Square-Root replication, for which the number of copies made is proportional to the number of probes which have been made when searching for the file, once the file has been found. 5

The Square-Root replication is proved as the most efficient one. It is, however, the most difficult to implement. 6 Conclusion Data replication in a database environment and in a file sharing system is rather different, although the final goal is to make a copy of some data. Whereas the constraints on database replication are rather strong (control over the number of nodes, their behavior, consistency of data regarding metadata), they are rather loose in file sharing systems, which allow a greater freedom of behavior, at the cost of numerous organizational problems. Given the little amount of information found on the Internet about peerto-peer database replication, this technology seems not yet ready to be sold to customers. References [1] Data Replication in Objectivity/DB An Objectivity, Inc White Paper. Objectivity, Inc. 2001. [2] Qin Lv, Pei Cao, Edith Cohen, Kai Li and Scott Shenker. Search and Replication in Unstructerd Peer-to-Peer Networks. [3] Ian Clarke. A Distributed Decentralised Information Storage and Retrieval System. Master s Thesis, Division of Informatics, University of Edinburgh, 1999. [4] Open Source Community. Gnutella. In http://gnutella.wego.com, 2001. [5] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp and Scott Shenker. A scalable content addressable network. In Proceedings of SOSP 01, 2001. [6] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of SIGCOMM 2001, August 2001. [7] A. Rowstron and P. Druschel. Storage management and caching in past, a large-scale, persistent peer-to-peer storage utility. In Proceedings of SOSP 01, 2001. 6

[8] Ben Y. Zhao, john Kubiatowicz and Anthony Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical report UCB/CSD-01-1141, University of California at Berkeley, Computer Science Department, 2001. [9] In http://www.napster.com. [10] In http://www.edonkey2000.com. [11] In http://www.fasttrack.com. [12] Michael C. Morrison The Growing Importance of Peer-to-Peer Replication In DM Direct, September 2001. 7