11 Overlay networks for searching

Similar documents
Chord - A Distributed Hash Table

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Chord. A scalable peer-to-peer look-up protocol for internet applications

Bounding Communication Cost in Dynamic Load Balancing of Distributed Hash Tables

New Algorithms for Load Balancing in Peer-to-Peer Systems

Optimizing and Balancing Load in Fully Distributed P2P File Sharing Systems

Load Balancing in Structured Overlay Networks. Tallat M. Shafaat

Distributed Computing over Communication Networks: Maximal Independent Set

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

Algorithms and Methods for Distributed Storage Networks 9 Analysis of DHT Christian Schindelhauer

Approximation Algorithms

Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, Chapter 7: Digraphs

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Async: Secure File Synchronization

Lecture 1: Course overview, circuits, and formulas

Tornado: A Capability-Aware Peer-to-Peer Storage Network

Distributed Hash Tables in P2P Systems - A literary survey

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

MIDAS: Multi-Attribute Indexing for Distributed Architecture Systems

New Structured P2P Network with Dynamic Load Balancing Scheme

CMPSCI611: Approximating MAX-CUT Lecture 20

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

Computational Geometry. Lecture 1: Introduction and Convex Hulls

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Plaxton routing. Systems. (Pastry, Tapestry and Kademlia) Pastry: Routing Basics. Pastry: Topology. Pastry: Routing Basics /3

Join and Leave in Peer-to-Peer Systems: The DASIS Approach

Social Media Mining. Graph Essentials

Acknowledgements. Peer to Peer File Storage Systems. Target Uses. P2P File Systems CS 699. Serving data with inexpensive hosts:

1 Introduction to Internet Content Distribution

A Denial-of-Service Resistant DHT

Scalable P2P Overlays of Very Small Constant Degree: an Emerging Security Threat

R-Chain: A Self-Maintained Reputation Management System in P2P Networks

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

The Butterfly, Cube-Connected-Cycles and Benes Networks

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

A Peer-to-Peer File Sharing System for Wireless Ad-Hoc Networks

Plaxton Routing. - From a peer - to - peer network point of view. Lars P. Wederhake

Load Balancing in Structured P2P Systems

Data Structures Fibonacci Heaps, Amortized Analysis

Dynamic TCP Acknowledgement: Penalizing Long Delays

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

SCALABLE RANGE QUERY PROCESSING FOR LARGE-SCALE DISTRIBUTED DATABASE APPLICATIONS *

GISP: Global Information Sharing Protocol a distributed index for peer-to-peer systems

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

LOOKING UP DATA IN P2P SYSTEMS

Heterogeneity and Load Balance in Distributed Hash Tables

Self-Chord: a Bio-Inspired Algorithm for Structured P2P Systems

Lecture 23: Interconnection Networks. Topics: communication latency, centralized and decentralized switches (Appendix E)

Graph Theory Problems and Solutions

Locality-Aware Randomized Load Balancing Algorithms for DHT Networks

1 Review of Least Squares Solutions to Overdetermined Systems

Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

3. Mathematical Induction

Multi-layer Structure of Data Center Based on Steiner Triple System

36 CHAPTER 1. LIMITS AND CONTINUITY. Figure 1.17: At which points is f not continuous?

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Topological Properties

PEER TO PEER CLOUD FILE STORAGE ---- OPTIMIZATION OF CHORD AND DHASH. COEN283 Term Project Group 1 Name: Ang Cheng Tiong, Qiong Liu

CHAPTER 5 Round-off errors

query enabled P2P networks Park, Byunggyu

Lecture 22: November 10

The Goldberg Rao Algorithm for the Maximum Flow Problem

Energy Benefit of Network Coding for Multiple Unicast in Wireless Networks

A Robust and Secure Overlay Storage Scheme Based on Erasure Coding

Krunal Patel Department of Information Technology A.D.I.T. Engineering College (G.T.U.) India. Fig. 1 P2P Network

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

Topology-based network security

A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

Lecture 5 - CPA security, Pseudorandom functions

Computational Puzzles as Sybil Defenses

Lecture 2: Universality

JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

(67902) Topics in Theory and Complexity Nov 2, Lecture 7

Polynomial Invariants

1 Problem Statement. 2 Overview

Analysis of Algorithms I: Binary Search Trees

Math 4310 Handout - Quotient Vector Spaces

Some questions... Graphs

Data Structure [Question Bank]

Find-The-Number. 1 Find-The-Number With Comps

Method To Solve Linear, Polynomial, or Absolute Value Inequalities:

Offline sorting buffers on Line

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching

Connectivity and cuts

Fault-Tolerant Routing Algorithm for BSN-Hypercube Using Unsafety Vectors

Decentralized supplementary services for Voice-over-IP telephony

arxiv: v1 [math.pr] 5 Dec 2011

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Lecture 4 Online and streaming algorithms for clustering

IPTV AND VOD NETWORK ARCHITECTURES. Diogo Miguel Mateus Farinha

Lecture Notes on Binary Search Trees

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

Representation of functions as power series

6.852: Distributed Algorithms Fall, Class 2

Transcription:

11 Overlay networks for searching In this section we will present two overlay networks useful for data management where peers can join and leave without the aid of a supervisor. Certainly, a supervisor may always have to exist because otherwise new peers are not able to find out who is currently in the system. However, the role of the supervisor would simply reduce to providing a pointer (such as the IP address) to some peer in the system. Everything else will be handled by the peers themselves. An overlay network has to offer the following operations to be suitable for data management: p.join(q): peer p is contacted by peer q in order to join the system. p.leave(): peer p wants to leave the system. p.search(key): peer p searches for the peer responsible for key We will describe two overlay networks that provide this functionality in an efficient way: Chord [3] and Koorde [1, 2]. 11.1 Chord Suppose that we have a (pseudo-)random hash function h assigning to each node an ID representing a real number in [0, 1). Given a set of nodes V and a real number x [0, 1), we define: succ(x) = argmin{v V h(v) > x}, i.e. succ(x) is the node whose hash value is the closest successor of x, and pred(x) = argmax{v V h(v) < x}, i.e. pred(x) is the node whose hash value is the closest predecessor of x. Notice that we view [0, 1) as a ring here. This means for succ(x) that if there is no node v with h(v) > x, then we associate succ(x) with the node with smallest h(v) among all nodes in V. In the Chord network, the following invariants have to be kept at any time. Invariant 11.1 For any set of nodes V currently in the system, it holds for every v V that v is connected to (a) pred(h(v)), succ(h(v)), and (b) succ(h(v) + 1/2 i ) for all i 1 with succ(h(v) + 1/2 i ) succ(h(v)). Invariant 11.1(a) requires that the nodes are organized in a sorted, doubly linked cycle, the socalled Chord ring, and Invariant 11.1(b) makes sure that messages can move quickly from any node to any other node on the cycle. The first type of edges are called ring edges and the second type of edges are called shortcut edges. These shortcut edges actually form a hypercubic structure. To see this, recall the definition of the hypercube: Definition 11.2 (Hypercube) The d-dimensional hypercube H(d) is an undirected graph G = (V, E) with node set V = [2] d and edge set E that contains all edges {v, w} with the property that v and w differ in exactly one bit. 1

1 0 Figure 1: An example of a Chord network (only short-cut pointers for two nodes are given). Now, view any node v = (v d 1,..., v 0 ) as a real number x v = d i=1 v i 2 d i. Obviously, x v [0, 1), so it is a legal number in our domain. Using this encoding, it holds for every node w that differs from v only in the ith largest bit that x w = x v + 1/2 i or x w = x v 1/2 i (depending on v d i ). This is very similar to the requirement for our shortcut edges in Invariant 11.1. The Chord overlay network has the following properties, where n is the current number of nodes in the system: Theorem 11.3 If Invariant 11.1 is true and h assigns random numbers to nodes, then Chord has a maximum outdegree of O(log n), a maximum indegree of O(log 2 n), a diameter of O(log n), and a node expansion of Ω(1/logn) with high probability. Proof. We start with the maximum outdegree. Certainly, every node only has two ring edges. Thus, it remains to bound the number of shortcut edges. Consider any fixed node v V and let I v = [h(v), h(v)+1/n 3 ]. For any node w V \{v}, the probability that h(w) I v is equal to 1/n 3. Hence, the expected number of nodes in w V \ {v} with h(w) I v is at most (1/n 3 ) (n 1) 1/n 2. Using this in the standard Markov Inequality (i.e. Pr[X k] E[X]/k) it follows that the probability of having at least one node w V \ {v} with w I v is at most 1/n 2. Hence, the expected number of nodes v V that have at least one other node in I v is at most 1/n, and therefore, again from the Markov Inequality, the probability that there is at least one node v V with another node in I v is at most 1/n. Thus, with high probability it holds that for all nodes v V, succ(h(v) + 1/2 3 log n ) = succ(h(v)). Thus, with high probability every node only has O(log n) shortcut pointers. Next we consider the indegree. We only sketch the proof. For any node v V, let I v = [h(v) (ɛ ln n)/n, h(v)] for some constant ɛ (0, 1). The probability that for some fixed node v there is no other node w with w I v is equal to (1 I v ) n 1 = ( 1 ɛ ln n n ) n 1 e (n 1) ɛ ln n n ɛ log n 2 e 2ɛ ln n = n 2ɛ

because it is known that (1 1/k) k 1 1/e for all k > 1. Hence, the expected number of nodes v with no other node in I v is at least n n 2ɛ = n 1 2ɛ. Using this, one can also show that with high probability, there will be at least one node v with no other node in I v, if ɛ 1/3. For such a node v, all nodes in the intervals [h(v) (ɛ log n)/n 1/2 i, h(v) 1/2 i ) would have to have a shortcut pointer to v, and there are on expectation, and with high probability, Θ(log 2 n) of these nodes. So there will be a node with indegree Θ(log 2 n) with high probability. Next we bound the diameter. Consider any two nodes v and w, and let δ v,w = h(w) h(v) if h(w) h(v) and otherwise δ v,w = 1 + h(w) h(v), i.e. δ v,w is the distance between v and w when walking clockwise along the Chord ring. If we take a shortcut pointer to succ(h(v) + 1/2 i ) where i is the largest value so that h(v) + 1/2 i h(w) (resp. h(v) + 1/2 i 1 + h(w) if h(w) < h(v)), then we end up at a node u with δ u,w 1δ 2 v,w. Hence, shortcut pointers always allow us to cut the distance a message has to travel by at least a factor of 2. Hence, in O(log n) hops a message can be sent between any pair of nodes, proving the diameter bound. The proof for the expansion bound is quite involved and will therefore not be discussed here. In order to maintain Invariant 11.1, we need proper rules for joining and leaving the network. As a subroutine for the JOIN method, we need a routing mechanism to send a message to any peer in the network. Routing in Chord Consider the following routing strategy: Suppose that node u is the current location of a message with destination x [0, 1). As long as x (h(pred(h(u))), h(u)] (in which case the message has not yet reached succ(x)), u sends the request to the node succ(h(u) + 1/2 i ) with maximum i so that h(u) + 1/2 i x (treating [0, 1) here as a ring). Using the diameter proof in Theorem 11.3, we obtain the following result: Lemma 11.4 For any node v V and any real number x [0, 1), it takes at most O(log n) hops to send a message from v to succ(x). Joining and leaving the network Suppose that a new node v contacts node w V to join the system. Then w will forward v s request to succ(v) using the Chord routing strategy with x = h(v). succ(v) will then integrate v between succ(v) and pred(h(succ(v))). Afterwards, succ(v) will send all of its outgoing shortcut pointers over to v. v then sends requests to these pointers, using the Chord routing strategy, to establish its own outgoing shortcut pointers. Furthermore, succ(v) will move all of the incoming shortcut pointers relevant for v according to Invariant 11.1 to v and notify the origins of these pointers about that. Once v has received information about all of its outgoing and incoming shortcut pointers, the join operation terminates. Theorem 11.5 Inserting a new node requires O(log n) time and message transmissions on expectation and O(log 2 n) time and message transmissions with high probability. Proof. We only sketch the proof. Due to symmetry reasons, the expected fraction of the [0, 1) ring owned by any particular node v is 1/n (i.e. all edges with an endpoint in that area will be assigned to 3

v). Furthermore, the largest possible interval in the [0, 1) ring without a node is at most O((log n)/n) with high probability, because the probability that no node chooses a value in some fixed interval of size (c ln n)/n is equal to ( 1 c ln n ) n e c ln n n n = n c. n Hence, the maximum fraction of the [0, 1) ring owned by a node is O((log n)/n) with high probability. If a new node receives an area of size k, then on expectation and with high probability, O(log n) new edges from v have to be created and O(k log n) edges to v have to be changed, resulting in the theorem. If a node v wants to leave the system, it first notifies all nodes having pointers to it to point to v s successor, and then it removes itself from the doubly linked cycle by connecting its predecessor and successor. Also here we get: Theorem 11.6 Deleting a node requires O(log n) time and message transmissions on expectation and O(log 2 n) time and message transmissions with high probability. Data management The data management works very similar to the nearest neighbor strategy presented in Section 8: Also data items are hashed to random values in [0, 1), and any data item d is placed at the node v with v = succ(h(d)). This strategy shares all the properties of the nearest neighbor strategy. That is, it is faithful, spaceefficient, time-efficient, and 2-competitive concerning adaptivity. Hence, using this strategy, data will on expectation be evenly distributed among the nodes, and on expectation, at most a factor of 2 more data than necessary has to be replaced if a node joins or leaves. Searching When a new data item has to be inserted, or when a data item has to be located, we can use the Chord routing strategy above to learn about the node responsible for it and then contact it directly to insert or retrieve the data item. This strategy achieves the following result, which follows immediately from Lemma 11.4. Theorem 11.7 Any search operation can be executed in O(log n) hops, with high probability. Fault tolerance In order to achieve a high fault tolerance, every node maintains pointers to its O(log n) successors on the doubly linked cycle. In this case it holds: Claim 11.8 If nodes have random IDs, then even if ɛn nodes leave at the same time for some constant ɛ < 1, the probability that a node loses all of its successors is polynomially small in n. 4

Proof. If ɛn nodes leave the system, the probability that all c log n successors of a node are among them is ɛn n ɛn 1 ɛn (c log n 1)... ɛ c log n = n c log(1/ɛ). n 1 n (c log n 1) This is polynomially small if c is sufficiently large compared to ɛ. 11.2 Koorde The basic approach of Koorde [1] is similar Chord. Also here, every node is given a random value in [0, 1) via some hash function h. However, instead of using a hypercubic structure for interconnecting the nodes, Koorde uses a DeBruijn-like structure. More precisely, the following invariant has to be maintained at any time. Invariant 11.9 For any set of nodes V currently in the system, it holds for every v V that v is connected to (a) pred(h(v)), succ(h(v)), (b) succ(h(v)/2), and succ((1 + h(v))/2). 1 0 Figure 2: An example of a Koorde network (only short-cut pointers for two nodes are given). To see that this indeed forms a DeBruijn-like structure, recall the definition of the DeBruijn graph: Definition 11.10 (DeBruijn) The d-dimensional DeBruijn graph DB(d) is an undirected graph G = (V, E) with node set V = [2] d and edge set E that contains all edges {v, w} with the property that v = (v d 1,..., v 0 ) and w {(x, v d 1,..., v 1 ) : x {0, 1}}. 5

Now, view any node v = (v d 1,..., v 0 ) as a real number x v = d i=1 v i 2 d i. Obviously, x v [0, 1), so it is a legal number in our domain. Using this encoding, (0, v d 1,..., v 1 ) is approximately equal to x v /2 and (1, v d 1,..., v 1 ) is approximately equal to (1 + x v )/2, so a DeBruijn-like structure can be observed in Invariant 11.9. The Koorde overlay network has the following properties, where n is the current number of nodes in the system: Theorem 11.11 If Invariant 11.9 is true and h assigns random numbers to nodes, then Koorde has a maximum outdegree of 4, a maximum indegree of O(log n), a diameter of O(log n), and a node expansion of Ω(1/ log n) with high probability. Proof. The bound on the maximum outdegree is obvious. For the maximum indegree, notice that there will be a node v in the system with high probability where the interval I v = [h(v) (ɛ log n)/n, h(v)] is not occupied by any other node if the constant ɛ (0, 1) is sufficiently small (see the proof of Theorem 11.3). Hence, all nodes in the intervals [2(h(v) (ɛ log n)/n), 2h(v)) and [2(h(v) (ɛ log n)/n) 1, 2h(v) 1) would have to have a shortcut pointer to v, and there are on expectation, and with high probability, Θ(log n) of these nodes. So there will be a node with indegree Θ(log n) with high probability. Next we bound the diameter. Consider any node v. If we take a shortcut pointer to succ((1 + h(v))/2), then we arrive at a node w with h(w) 1/2 + h(v)/2. Hence, no matter at which node v we start, after t hops to succ((1 + h(v))/2), we arrive at a node w with h(w) t i=1 1/2 t. Once we arrive at a node w with smallest h(w) (i.e. we passed the 0/1 border), we stop. According to the proof of Theorem 11.3, this takes only O(log n) hops. Hence, there must be a path from any node to any other node of length O(log n), which proves the diameter bound. Again, the proof of the expansion bound is quite involved and left out here. Thus, compared to Chord, Koorde reduces the degree bounds by a logarithmic factor while preserving diameter and expansion. In order to maintain Invariant 11.9, we use the following rules when joining or leaving the network. Routing in Koorde Consider the following routing strategy for sending a message from a node v with h(v) = 0.x 1 x 2 x 3... in binary notation (i.e. h(v) = i 1 x i /2 i ) to the virtual destination z = 0.z 1 z 2 z 3... in binary notation: Forward the message along the nodes succ(0.y 1 x 1 x 2 x 3...), succ(0.y 2 y 1 x 1 x 2...), succ(0.y 3 y 2 y 1 x 1...), etc., where the bits y 1, y 2, y 3,... are chosen at random, until a node succ(0.y k... y 1 x 1 x 2...) is reached with succ(0.y k... y 1 x 1 x 2...) = succ(0.y k... y 1 z 1 z 2...). At that point, forward the message along the nodes succ(0.y k 1... y 1 z 1 z 2...), succ(0.y k 2... y 1 z 1 z 2...), etc., until node succ(z 1 z 2 z 3...) is reached. Notice that each virtual hop above may require more than just moving along a shortcut edge. Also ring edges may have to be used once in a while to adapt the position. However, on expectation, only O(1) ring edges have to be used for each virtual hop above. The reason why this routing strategy is more complicated than the Chord strategy is that routing in the DeBruijn graph requires knowledge 6

about the size of the graph to accurately simulate the rotation operations defining the edges. Since a node may not necessarily be aware of the size of the network, this more complicated strategy, that proceeds towards a random intermediate destination until the two trajectories from the source and destination meet, has to be used. The Koorde routing strategy achieves the following result (which will not be shown here): Lemma 11.12 For any node v V and any real number z [0, 1), it takes at most O(log n) hops, with high probability, to send a message from v to succ(z). Joining and leaving the network Suppose that a new node v contacts node w V to join the system. Then w will forward v s request to succ(v), using the Koorde routing strategy above. succ(v) will then integrate v between succ(v) and pred(succ(v)). Afterwards, succ(v) will send all of its outgoing shortcut pointers over to v. v then sends requests to these pointers, using the Koorde routing strategy, to establish its own outgoing shortcut pointers. Furthermore, succ(v) will move all of the incoming shortcut pointers relevant for v according to Invariant 11.9 to v and notify the origins of these pointers about that. Once v has received information about all of its outgoing and incoming shortcut pointers, the join operation terminates. Theorem 11.13 Inserting a new node requires O(log n) time and message transmissions with high probability. If a node v wants to leave the system, it first notifies all nodes having pointers to it to point to v s successor, and then it removes itself from the doubly linked cycle by connecting its predecessor and successor. Also here we get: Theorem 11.14 Deleting a node requires O(log n) time and message transmissions with high probability. Data management and searching Data and search requests are handled in the same way as in Chord, with the only difference that search requests will use the Koorde routing strategy to reach the node responsible for a data item. References [1] M. F. Kaashoek and D. R. Karger. Koorde: A simple degree-optimal distributed hash table. In 2nd Intl. Workshop on Peer-to-Peer Systems (IPTPS), 2003. [2] M. Naor and U. Wieder. Novel architectures for P2P applications: the continuous-discrete approach. In Proc. of the 15th ACM Symp. on Parallel Algorithms and Architectures (SPAA), pages 50 59, 2003. [3] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In SIGCOMM 01, pages 149 160, 2001. 7