How To Create A P2P Network



Similar documents
Distributed Systems Peer-to-Peer

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

An Introduction to Peer-to-Peer Networks

Acknowledgements. Peer to Peer File Storage Systems. Target Uses. P2P File Systems CS 699. Serving data with inexpensive hosts:

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

5. Peer-to-peer (P2P) networks

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

GISP: Global Information Sharing Protocol a distributed index for peer-to-peer systems

CS5412: TIER 2 OVERLAYS

Denial of Service Resilience in Peer to Peer. D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica, W. Zwaenepoel Presented by: Ahmet Canik

Slides for Chapter 10: Peer-to-Peer Systems

Overlay Networks. Slides adopted from Prof. Böszörményi, Distributed Systems, Summer 2004.

P2P Storage Systems. Prof. Chun-Hsin Wu Dept. Computer Science & Info. Eng. National University of Kaohsiung

A Survey and Comparison of Peer-to-Peer Overlay Network Schemes

How To Understand The Power Of Icdn

Calto: A Self Sufficient Presence System for Autonomous Networks

8 Conclusion and Future Work

Tornado: A Capability-Aware Peer-to-Peer Storage Network

Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet

Anonymous Communication in Peer-to-Peer Networks for Providing more Privacy and Security

LOOKING UP DATA IN P2P SYSTEMS

Peer-to-Peer Networks Organization and Introduction 1st Week

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING

RESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT

Application Layer. CMPT Application Layer 1. Required Reading: Chapter 2 of the text book. Outline of Chapter 2

Adapting Distributed Hash Tables for Mobile Ad Hoc Networks

HPAM: Hybrid Protocol for Application Level Multicast. Yeo Chai Kiat

T he Electronic Magazine of O riginal Peer-Reviewed Survey Articles ABSTRACT

Definition. A Historical Example

Decentralized supplementary services for Voice-over-IP telephony

Peer-to-peer systems and overlays

Architectures and protocols in Peer-to-Peer networks

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS

Security in Structured P2P Systems

PSON: A Scalable Peer-to-Peer File Sharing System Supporting Complex Queries

query enabled P2P networks Park, Byunggyu

Peer-to-Peer Networks. Chapter 6: P2P Content Distribution

A Peer-to-Peer Approach to Content Dissemination and Search in Collaborative Networks

Middleware and Distributed Systems. Peer-to-Peer Systems. Martin v. Löwis. Montag, 30. Januar 12

Load Balancing in Structured Overlay Networks. Tallat M. Shafaat

Peer-to-Peer Networks 02: Napster & Gnutella. Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg

PEER-TO-PEER (P2P) systems have emerged as an appealing

Peer to peer networks: sharing between peers. Trond Aspelund

2015 Internet Traffic Analysis

Plaxton routing. Systems. (Pastry, Tapestry and Kademlia) Pastry: Routing Basics. Pastry: Topology. Pastry: Routing Basics /3

Domain Name System for PeerHosting

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Scalable Source Routing

Internetworking and Internet-1. Global Addresses

Unit 3 - Advanced Internet Architectures

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

Approximate Object Location and Spam Filtering on Peer-to-Peer Systems

P2P: centralized directory (Napster s Approach)

Improving Availability with Adaptive Roaming Replicas in Presence of Determined DoS Attacks

The p2pweb model: a glue for the Web

LITERATURE REVIEW. Chapter 2

Varalakshmi.T #1, Arul Murugan.R #2 # Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam

A Survey of Peer-to-Peer File Sharing Technologies

1. The Web: HTTP; file transfer: FTP; remote login: Telnet; Network News: NNTP; SMTP.

A P2P SERVICE DISCOVERY STRATEGY BASED ON CONTENT

Exploring the Design Space of Distributed and Peer-to-Peer Systems: Comparing the Web, TRIAD, and Chord/CFS

CS514: Intermediate Course in Computer Systems

Peer-to-Peer Computing

Distributed Hash Tables in P2P Systems - A literary survey

Methods & Tools Peer-to-Peer Jakob Jenkov

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

SplitStream: High-Bandwidth Multicast in Cooperative Environments

PROPOSAL AND EVALUATION OF A COOPERATIVE MECHANISM FOR HYBRID P2P FILE-SHARING NETWORKS

Assignment #3 Routing and Network Analysis. CIS3210 Computer Networks. University of Guelph

A Topology-Aware Relay Lookup Scheme for P2P VoIP System

Per-Flow Queuing Allot's Approach to Bandwidth Management

Dr Markus Hagenbuchner CSCI319. Distributed Systems

A PROXIMITY-AWARE INTEREST-CLUSTERED P2P FILE SHARING SYSTEM

Identity Theft Protection in Structured Overlays

Network Architecture and Topology

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc

PEER-TO-PEER NETWORK

Routing in packet-switching networks

RVS-Seminar Overlay Multicast Quality of Service and Content Addressable Network (CAN)

Computer Networking Networks

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Tools for Peer-to-Peer Network Simulation

D1.1 Service Discovery system: Load balancing mechanisms

Fault-Tolerant Framework for Load Balancing System

AN EFFICIENT DISTRIBUTED CONTROL LAW FOR LOAD BALANCING IN CONTENT DELIVERY NETWORKS

CHAPTER 7 SUMMARY AND CONCLUSION

CS268 Exam Solutions. 1) End-to-End (20 pts)

Research on P2P-SIP based VoIP system enhanced by UPnP technology

Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols

SOFT 437. Software Performance Analysis. Ch 5:Web Applications and Other Distributed Systems

Transcription:

Peer-to-peer systems INF 5040 autumn 2007 lecturer: Roman Vitenberg INF5040, Frank Eliassen & Roman Vitenberg 1 Motivation for peer-to-peer Inherent restrictions of the standard client/server model Centralised design lacks scalability & fault-tolerance Processing Network traffic P2P systems take care of distributing processing load and network traffic between all nodes that participate in a distributed information system Solve the bottelneck but must pay in the form of considerably more complex mechanisms and lack of control INF5040, Frank Eliassen & Roman Vitenberg 2 INF 5040 høst 2005 1

What is P2P? In a P2P system, each participating node behaves as both client and server, and pays for participation by offering access to some of its own resources Typically processing power and storage resources But it can also be a logical resource (a service) An application-level network on top of the Internet (overlay network) INF5040, Frank Eliassen & Roman Vitenberg 3 Essential characteristics of P2P systems Each participant contributes its own resources to the system All nodes have the same functional capabilities and responsibility No dependency on a central entity for administration of the system (self-organising) The effectiveness critically depends on algorithms for data placement over many nodes and for subsequent access to them Unpredictable availability of processes and nodes INF5040, Frank Eliassen & Roman Vitenberg 5 INF 5040 høst 2005 2

The evolution of P2P systems and applications First generation Napster Sharing/exchange of music files Hybrid Client/Server og P2P (central index server) Second generation Gnutella, Freenet, Kazaa,... Decentralised file-sharing system Third generation P2P middleware Application-independent middleware layer for management of distributed resources in the global scale Pastry, Tapestry, CAN, Chord,... INF5040, Frank Eliassen & Roman Vitenberg 6 P2P middleware characterisation The main objectives are to Place resources (data objects and files) on participating nodes that are widely spread over the Internet Route messages to them on behalf of the clients Hide the location of resources from the clients (transparency) Provide performance guarantees (number of hops) Place resources in a structured fashion to satisfy requirements of availability, trust, load-balancing and locality Resources are identified by GUIDs (derived from secure digest function see the textbook chapter 7.4.3). Randomised distribution of resources over nodes in different organisations in the entire world INF5040, Frank Eliassen & Roman Vitenberg 7 INF 5040 høst 2005 3

The difference between IP and overlay routing for P2P applications IP Application-level routing overlay Scale Load balancing Network dynamics (addition/deletion of objects/nodes) Fault tolerance Target identification Security and anonymity IPv4 is limited to 2 32 addressable nodes. The Peer-to-peer systems can address more objects. IPv6 name space is much more generous (2 128 ), The GUID name space is very large and flat but addresses in both versions are hierarchically (>2 128 ), allowing it to be much more fully structured and much of the space is preallocated according to administrative occupied. requirements. Loads on routers are determined by network Object locations can be randomized and hence topology and associated traffic patterns. traffic patterns are divorced from the network topology. IP routing tables are updated asynchronously on Routing tables can be updated synchronously or a best-efforts basis with time constants on the asynchronously with fractions of a second order of 1 hour. delays. Redundancy is designed into the IP network by Routes and object references can be replicated its managers, ensuring tolerance of a single n-fold, ensuring tolerance of n failures of nodes router or network connectivity failure. n-fold or connections. replication is costly. Each IP address maps to exactly one target Messages can be routed to the nearest replica of node. a target object. Addressing is only secure when all nodes are Security can be achieved even in environments trusted. Anonymity for the owners of addresses with limited trust. A limited degree of is not achievable. anonymity can be provided. INF5040, Frank Eliassen & Roman Vitenberg 8 Napster peers Napster server Index 1. File location request 2. List of peers offering the file 3. File request Napster server Index 5. Index update 4. File delivered INF5040, Frank Eliassen & Roman Vitenberg 9 INF 5040 høst 2005 4

P2P middleware (1 of 2) Challenge: offer a mechanism that gives fast and reliable access to resources in a locationtransparent fashion Functional requirements Facilitate construction of services that are implemented over many nodes in a distributed network Make it possible to locate and communicate with all available resources Possible to add new resources and remove old ones Possible to add new nodes and remove old ones Simple application- and resource-independent API INF5040, Frank Eliassen & Roman Vitenberg 10 P2P middleware (2 of 2) Non-functional requirements Global scalability Load-balancing Optimisation for local interaction between neighbour peers Coping with high node and object churn Security of data in an environment with heterogeneous trust Anonymity and resilience to censorship INF5040, Frank Eliassen & Roman Vitenberg 11 INF 5040 høst 2005 5

Distribution of information in a routing overlay A s routing knowledge D s routing knowledge C A D B Object: Node: B s routing knowledge C s routing knowledge INF5040, Frank Eliassen & Roman Vitenberg 12 Routing overlay Application-level algorithm that locates nodes and stored data objects (independently of network routing) Possible to implement at the middleware level Ensures that each node can access every object by routing requests through a sequence of nodes and exploiting the knowledge of each of them to locate the object Responsible for managing the lifecycle of objects and nodes INF5040, Frank Eliassen & Roman Vitenberg 13 INF 5040 høst 2005 6

Essential API for a Distributed Hash- Table (Pastry) Object GUID is derived from all or part of its state using a secure digest function (e.g., SHA-1). GUIDs are used to place objects and to locate them (hence called distributed hash-table) put(guid, data) The data is stored in replicas at all nodes responsible for the object identified by GUID. remove(guid) Deletes all references to GUID and the associated data. value = get(guid) The data associated with GUID is retrieved from one of the nodes responsible it. INF5040, Frank Eliassen & Roman Vitenberg 14 Case study: Pastry Nodes and objects are assigned a 128-bit GUID By applying a secure digest function on node public key and object name or (part of) its state In a network with N nodes, Pastry routing algorithm delivers a message addressed to any GUID in O(log(N)) steps If the GUID maps to an active node, the message is delivered to it. Otherwise, the message is delivered to the node with numerically closest GUID. Fully self-organising O(log(N)) messages when a participant joins, leaves, or fails INF5040, Frank Eliassen & Roman Vitenberg 16 INF 5040 høst 2005 7

Routing algorithm in Pastry Includes two mechanisms: Simple routing mechanism that uses information about neighbours that provides correct routing but may be inefficient More complex mechanism that efficiently routes requests to an arbitrary node (using at most O(log(N)) messages) but that may be temporarily unreliable during periods of instability INF5040, Frank Eliassen & Roman Vitenberg 17 Routing algorithm in Pastry: using the leaf set Each active node maintains an array L ( the leaf set ) of length 2l, that includes GUID and IP addresses of the nodes with numerically closest GUID l predecessor nodes l successor nodes Pastry maintains L in presence of node joins, leaves, and failures INF5040, Frank Eliassen & Roman Vitenberg 18 INF 5040 høst 2005 8

Circular routing: Correct but inefficient 0 FFFFF...F (2 128-1) D471F1 D467C4 D46A1C The dots depict live nodes. The space is considered circular: node 0 is adjacent to node (2 128-1). The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l = 4). D13DA3 65A1FC INF5040, Frank Eliassen & Roman Vitenberg 19 Routing algorithm in Pastry: using the routing table Improves the leaf set algorithm Every Pastry node maintains a treestructured routing table that includes GUIDs and IP-addresses for some nodes spread over all the address space of GUID values. The table is not uniform: Dense coverage of GUIDs that are numerically close to the node own GUID Density decreases with distance from the node INF5040, Frank Eliassen & Roman Vitenberg 20 INF 5040 høst 2005 9

Example: first four rows in a Pastry routing table INF5040, Frank Eliassen & Roman Vitenberg 21 Pastry routing example 0 FFFFF...F (2 128-1) Routing a message from node 65A1FC to D46A1C. With the aid of a well-populated routing table the message can be delivered in ~ log 16 (N ) hops. D471F1 D46A1C D467C4 D462BA D4213F D13DA3 65A1FC INF5040, Frank Eliassen & Roman Vitenberg 22 INF 5040 høst 2005 10

Pastry routing algorithm When node A receives message M addressed to GUID D (R[p,i] is the element of the routing table at row p, column i) 1. if L -l < D < L l { // the destination is within the leaf set 2. Forward M to leaf set element with GUID closest to D 3. } else { // use the routing table 4. Find p, the length of the longest common prefix of D and A and i, the (p+1)th hexadecimal digit of D 5. if (R[p,i] null) { 6. Forward M to R[p,i] // common-prefix routing 7. } else { // there is no entry in the routing table 8. Forward M to any node in R or L that is numerically closer to D than A 9. } 10. } INF5040, Frank Eliassen & Roman Vitenberg 24 Pastry: addition of a new node Join protocol that constructs the routing table & leaf set X Routing info Join(x) Z C B A Join(x) Join(x) Join(x) X: new node to join A: closest neighbour Z: GUID closest numerically to X (routed in the usual way) B, C,...: nodes the join message is routed via A, Z, B, C,... transmits relevants parts of their routing tables and leaf sets to X. X uses this info to build its initial routing table and leaf set. X then transmits its routing table and leaf set to A, Z, B, C,... INF5040, Frank Eliassen & Roman Vitenberg 25 INF 5040 høst 2005 11

Pastry: handling leaves and failures Pastry node is considered failed when its immediate neighbours (in the GUID space) cannot communicate with it any longer All nodes send heartbeat messages to neighbour nodes (in their own leaf set) When it occurs, it is necessary to repair all leaf sets that include GUID of the node that left or failed A node repairs its leaf set L by asking a node close to the failed one to send its leaf set L, removing the failed node, and adding a node from L Routing tables are repaired upon discovery (when a routing request fails) INF5040, Frank Eliassen & Roman Vitenberg 26 Pastry: fault-tolerance and reliability Routing failure may occur Because of delays in spreading the info about failed nodes A Pastry application should retransmit routing requests in absence of response In the meantime, the failure can possibly become repaired Randomisation of routing choice (line 6 in the routing algorithm) In some cases, choose a node in R[p,j] instead of R[p,i] (routing choice that occasionally diverges from the standard algorithm) If some node blocks the route, a different path will be chosen sooner or later due to retransmissions MSPastry: extension of Pastry with additional dependability mechanisms Ack after each hop in the routing algorithm and selection of an alternative route upon timeout heartbeat -messages Other miscellaneous improvements INF5040, Frank Eliassen & Roman Vitenberg 27 INF 5040 høst 2005 12

Evaluation (MSPastry) Based on simulations [Castro et al 2004] Good performance and high reliability with thousands of nodes Gracefully degrading as the failure rate increases Reliability Upon 0% loss rate of IP-messages, MSPastry was not delivering 1,5 of 100.000 routing requests; none were delivered to a wrong node Upon 5% loss rate of IP-messages, MSPastry was not delivering 3,3 of 100.000 routing requests, and 1,5 of 100.000 were delivered to a wrong node Performance Measured relative delay penalty: a ratio between the delay of request delivery via MSPastry and the corresponding delay when using UDP/IP Relative delay penalty varies between about 1,8 (0% loss rate of IPmessages) and about 2,2 (5% loss rate of IP-messages) Overhead Control-traffic accounts for approximately 2 messages per minute per node in the long run (initial cost of setup is relatively high) INF5040, Frank Eliassen & Roman Vitenberg 28 Example of a Pastry-based application: Squirrel Web-caching system that makes use of storage and computational resources that are already available on desktop-machines in a local network GUID: applying SHA-1 on the URL gives a 128 bits Pastry-GUID. The node whose GUID is numerically closest to the calculated GUID becomes the home node for the object The home node is responsible for maintaining a cached copy of the object (acts as a proxy-server for this object) Client nodes use Squirrel to route GET or cget requests to the home node of the web object Evaluation shows that the performance is comparable with the performance of a typical centralised cache (measurements including (1) reduction in the use of extern bandwidth, (2) latency perceived by the user, (3) storage and processing load on client nodes) INF5040, Frank Eliassen & Roman Vitenberg 29 INF 5040 høst 2005 13

Summary P2P systems distribute processing load and network traffic between all nodes that participate in the system P2P systems are not dependent on a central entity for administration of the system (and self-organisation) The effectiveness critically depends on algorithms for placement of data over many nodes and for subsequent access to the data P2P middleware is an application-independent software layer that implements a routing overlay Study and evaluation of an implementation: Pastry A Pastry-based application: Squirrel web-cache INF5040, Frank Eliassen & Roman Vitenberg 30 INF 5040 høst 2005 14