Peer-to-peer (P2P) Systems
Chapter 3: Case study: Content-Addressable Network

Content-Addressable Network vs. Distributed Hash Table
- One of the first structured DHT-based P2P overlays
- d-torus or hypercube topology
- Authors (SIGCOMM 2001): Sylvia Ratnasamy (her PhD topic, 2002), Paul Francis, Mark Handley, Richard Karp, Scott Shenker

Dmitry G. Korzun, 2010-2011

Styles of Content Distribution in the Internet
- The Web is the traditional style; it is relatively slow for large dynamic communities.
- P2P allows content to be published easily and quickly, extending the ability to publish to individual end users.
- P2P publishers have direct control over making their content available to search algorithms; newly published content is rapidly made available to keyword-based searches.
- P2P relies entirely on the participation of end users' machines, not on any deployed infrastructure.

Data ID space (keys)
- A d-dimensional Cartesian coordinate space with hypercube geometry $[0,1]^d$, treated as a d-torus
- Logical: no relation to any physical coordinate system
Figure: 1-torus with 4 nodes

Example: 2-d space with 5 nodes
- The coordinate space is dynamically partitioned among all the nodes; every node owns its distinct zone.
- Remember that in a d-torus the coordinate space wraps around.

Partition of the ID space
- The entire space is dynamically partitioned among all nodes.
- Every node owns its individual zone; zone = hyperrectangle.
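The zone model can be made concrete with a small Python sketch, assuming each zone is a half-open box [lo, hi) in the unit d-torus. The class `Zone`, the function `torus_distance`, and the five-zone layout are hypothetical illustrations chosen for this sketch; they are not taken from the slide figures.

```python
from dataclasses import dataclass
from math import isclose, prod, sqrt


@dataclass(frozen=True)
class Zone:
    """Hypothetical helper: the half-open box [lo[i], hi[i]) in the unit d-torus."""
    lo: tuple[float, ...]
    hi: tuple[float, ...]

    def contains(self, p: tuple[float, ...]) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def volume(self) -> float:
        return prod(h - l for l, h in zip(self.lo, self.hi))


def torus_distance(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    """Euclidean distance in [0,1]^d where every coordinate wraps around at 1."""
    total = 0.0
    for a, b in zip(p, q):
        diff = abs(a - b)
        diff = min(diff, 1.0 - diff)  # going "around" the torus may be shorter
        total += diff * diff
    return sqrt(total)


# Hypothetical 5-zone partition of [0,1]^2 (not the exact layout of the slide figure).
zones = {
    "A": Zone((0.0, 0.0), (0.5, 0.5)),
    "B": Zone((0.5, 0.0), (1.0, 0.5)),
    "C": Zone((0.0, 0.5), (0.5, 1.0)),
    "D": Zone((0.5, 0.5), (0.75, 1.0)),
    "E": Zone((0.75, 0.5), (1.0, 1.0)),
}

# The zones tile the space: volumes sum to 1 and a point has exactly one owner.
assert isclose(sum(z.volume() for z in zones.values()), 1.0)
assert [n for n, z in zones.items() if z.contains((0.8, 0.9))] == ["E"]
# Points near opposite edges are close on the torus.
assert isclose(torus_distance((0.05, 0.5), (0.95, 0.5)), 0.1)
```

Half-open boxes make ownership unambiguous: a point on a shared boundary belongs to exactly one zone.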
Partitioning as 5 nodes join in succession
- An example for your intuition; more details follow later.

Assigning keys to nodes
- Nodes keep pairs (k, data), where $k = (k_1, \dots, k_d)$ and $k = h(\text{dataname})$ for a hash function h.
- (k, data) is stored at the node that owns the zone within which k lies.

Node ID space
- Node IDs: how can a global ID be constructed?
- Node ~ its zone -> node ID
- Node IDs are virtual: the partition tree
- Think of each node as a leaf of a binary partition tree, which reflects the partition of the key space.

Neighbors
- Neighbors are adjoining nodes: their zones overlap along d-1 dimensions and abut along one dimension.
- Purely local neighbor state: only short-distance contacts, no long-distance contacts.
- These immediate neighbors in the coordinate space serve as a coordinate routing table that enables routing between arbitrary points in the space.

Routing
- Follow the straight-line path through the d-torus.
- Greedy routing: the next hop is the neighbor whose coordinates are closest to the destination (Cartesian distance).
- Many alternative paths exist.
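Key assignment, the neighbor rule, and greedy forwarding fit into one short sketch. Assumptions: SHA-1 stands in for the hash h, zones are (lo, hi) boxes in the unit torus, and closeness is measured from the destination point to each neighbor's zone (one reasonable reading of "closest to the destination"; the slides do not fix the exact metric). All function names and the five-zone layout are hypothetical.

```python
import hashlib
from math import sqrt

# A zone is (lo, hi): the half-open box [lo[i], hi[i]) in the unit d-torus.


def key_point(name: str, d: int) -> tuple[float, ...]:
    """k = h(dataname): hash a name to a point in [0,1)^d (SHA-1 is an assumed choice of h)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()  # 20 bytes
    width = min(6, len(digest) // d)  # <= 48 bits per coordinate keeps each float below 1.0
    assert width > 0, "d is too large for one SHA-1 digest"
    return tuple(int.from_bytes(digest[i * width:(i + 1) * width], "big") / 256 ** width
                 for i in range(d))


def contains(zone, p) -> bool:
    return all(l <= x < h for l, x, h in zip(zone[0], p, zone[1]))


def owner(zones: dict, p) -> str:
    """(k, data) is stored at the node whose zone contains k."""
    return next(node for node, z in zones.items() if contains(z, p))


def _abut(a_lo, a_hi, b_lo, b_hi) -> bool:
    # End-to-end contact, including across the wrap-around where 1.0 meets 0.0.
    return a_hi == b_lo or b_hi == a_lo or (a_hi == 1.0 and b_lo == 0.0) or (b_hi == 1.0 and a_lo == 0.0)


def are_neighbors(a, b) -> bool:
    """Zones overlap along d-1 dimensions and abut along exactly one."""
    overlap = abut = 0
    for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]):
        if max(a_lo, b_lo) < min(a_hi, b_hi):
            overlap += 1
        elif _abut(a_lo, a_hi, b_lo, b_hi):
            abut += 1
    return abut == 1 and overlap == len(a[0]) - 1


def _gap(x, lo, hi) -> float:
    """Wrapped distance from coordinate x to the interval [lo, hi)."""
    if lo <= x < hi:
        return 0.0
    return min(min(abs(x - e), 1.0 - abs(x - e)) for e in (lo, hi))


def zone_distance(zone, p) -> float:
    return sqrt(sum(_gap(x, l, h) ** 2 for x, l, h in zip(p, zone[0], zone[1])))


def greedy_route(zones: dict, start: str, target) -> list[str]:
    """Forward hop by hop to the neighbor whose zone is closest to the target."""
    path = [start]
    while not contains(zones[path[-1]], target):
        here = path[-1]
        nbrs = [n for n in zones if n != here and are_neighbors(zones[here], zones[n])]
        done = [n for n in nbrs if contains(zones[n], target)]  # a neighbor owns the target
        nxt = done[0] if done else min(nbrs, key=lambda n: zone_distance(zones[n], target))
        if not done and zone_distance(zones[nxt], target) >= zone_distance(zones[here], target):
            raise RuntimeError("greedy step made no progress")
        path.append(nxt)
    return path


zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 0.5), (0.75, 1.0)),
    "E": ((0.75, 0.5), (1.0, 1.0)),
}
k = key_point("song.mp3", 2)
print(k, "is stored at", owner(zones, k))
print(greedy_route(zones, "A", (0.8, 0.9)))  # ['A', 'B', 'E']
```

Each forwarding decision uses only the neighbors' zones, which is the purely local routing state described above. The no-progress check is a safety net of this sketch for corner cases of irregular partitions, not part of the CAN description.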
Alternative paths
- Fault tolerance
- Security
- Multi-path routing

Joining (1)
1. A new node u must find a node w already in the CAN.
2. Using the routing mechanism, w must find a node v whose zone will be split.
3. The neighbors of the split zone must be notified so that routing can include u.
- u must be allocated its own portion of the coordinate space: v splits its zone in half, retaining one half and handing the other half to the new node.

Joining (2)
- u chooses a random point in the ID space, $u_c = (u_1, \dots, u_d)$.

Joining (3)
- u asks w to find the node v whose zone contains $u_c$ (JOIN request).
- v splits its zone in half and assigns one of the halves to u.
- The dimensions along which zones are split follow a fixed order, so that zones can be re-merged when nodes leave.
- u's node ID is derived from the partition tree.

Splitting the space (1)
- 5 nodes join sequentially, partitioning the space; order of dimensions: X, Y.

Splitting the space (2)
Figure: the space before and after node 7 joins
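The join steps and the partition tree can be sketched together. Here a node ID is the bit string of its leaf in the binary partition tree, and depth l of the tree splits dimension l mod d, which realizes the fixed split order (X, Y, ...). The names `zone_of`, `join`, and `numeric_id`, and the choice to give u the half that contains its random point, are assumptions of this sketch; the owner v is found by scanning rather than by routing a JOIN request.

```python
def zone_of(node_id: str, d: int):
    """Decode a partition-tree ID (a bit string) into its box; tree depth l splits dimension l % d."""
    lo, hi = [0.0] * d, [1.0] * d
    for depth, bit in enumerate(node_id):
        dim = depth % d
        mid = (lo[dim] + hi[dim]) / 2
        if bit == "0":
            hi[dim] = mid  # lower half
        else:
            lo[dim] = mid  # upper half
    return tuple(lo), tuple(hi)


def join(ids: set, d: int, point) -> tuple[str, str]:
    """New node u picked `point`; its owner v splits in half along the next dimension in order.
    Returns (new ID of v, ID of u)."""
    def contains(node_id):
        lo, hi = zone_of(node_id, d)
        return all(l <= x < h for l, x, h in zip(lo, point, hi))

    v = next(i for i in ids if contains(i))  # in a real CAN: found by routing the JOIN request
    dim = len(v) % d                         # fixed order X, Y, ... so zones can be re-merged later
    lo, hi = zone_of(v, d)
    upper = point[dim] >= (lo[dim] + hi[dim]) / 2
    u_id, v_id = (v + "1", v + "0") if upper else (v + "0", v + "1")
    ids.discard(v)
    ids.update((u_id, v_id))
    return v_id, u_id


def numeric_id(node_id: str, k: int) -> int:
    """k-bit numeric ID: the l-bit tree prefix followed by k - l zero bits."""
    return int(node_id.ljust(k, "0"), 2)


ids = {""}                               # the first node owns the whole space (tree root)
print(join(ids, 2, (0.7, 0.3)))          # ('0', '1'): split along X
print(join(ids, 2, (0.2, 0.8)))          # ('00', '01'): zone '0' is split along Y
print(sorted(ids), numeric_id("01", 4))  # ['00', '01', '1'] 4
```

Because every split follows the tree, two sibling leaves (IDs differing only in the last bit) always re-form their parent's box. `numeric_id` produces the zero-padded k-bit format described under Node Departure (9).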
Selecting neighbors (1)
- The new node learns the IP addresses of its coordinate neighbor set from the previous occupant of the zone.
- The previous occupant updates its neighbor set to eliminate the nodes that are no longer its neighbors.
- The neighbors of both the new and the old node are informed of this reallocation of space: an immediate update message, followed by periodic refreshes in which each node sends its currently assigned zone to all its neighbors.
- These updates ensure that all the affected neighbors quickly learn about the change and update their own neighbor sets accordingly.

Selecting neighbors (2)
- No freedom in selection: the strategy is deterministic and just finds the existing neighbors.
- All neighbors of u are among v's neighbors; u becomes v's neighbor (and vice versa); v shares some of its neighbors with u.
- The neighbors of both the old and the new node are notified about the reallocation (UPDATE message).
- Only a small number of existing nodes are affected: O(d) nodes, i.e., O(1) as N grows.

Selecting neighbors (3)
Figure: before node 7 joins, the neighbors of node 1 are 2, 3, 4, 5; after node 7 joins, the neighbors of node 1 are 2, 3, 4, 7 and the neighbors of node 7 are 1, 2, 4, 5.
- Periodic refreshes with the currently assigned zone are sent to all neighbors (maintenance, or coordination, traffic).
- Nodes quickly learn about changes (node joins and node departures) and update their own neighbor sets accordingly.

Node Departure (1)
- When nodes leave the CAN, their zones are taken over by the remaining nodes.
- Two cases: correct departures and faulty nodes.

Node Departure (2)
- Correct departure (with notification): if the zone of one of the neighbors can be merged with the departing node's zone to produce a valid single zone, then this is done.
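The neighbor-update rule after a split can be checked in code: u's neighbors are looked up only among v's old neighbors plus v, and only those old neighbors receive an update. This is a minimal sketch assuming zones are (lo, hi) boxes in the unit torus and that v keeps the lower half; `split_half`, `join_and_update`, and the node names are hypothetical.

```python
def _abut(a_lo, a_hi, b_lo, b_hi) -> bool:
    # End-to-end contact, including across the wrap-around where 1.0 meets 0.0.
    return a_hi == b_lo or b_hi == a_lo or (a_hi == 1.0 and b_lo == 0.0) or (b_hi == 1.0 and a_lo == 0.0)


def are_neighbors(a, b) -> bool:
    """Overlap along d-1 dimensions, abut along exactly one."""
    overlap = abut = 0
    for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]):
        if max(a_lo, b_lo) < min(a_hi, b_hi):
            overlap += 1
        elif _abut(a_lo, a_hi, b_lo, b_hi):
            abut += 1
    return abut == 1 and overlap == len(a[0]) - 1


def split_half(zone, dim):
    lo, hi = list(zone[0]), list(zone[1])
    mid = (lo[dim] + hi[dim]) / 2
    lower = (tuple(lo), tuple(hi[:dim] + [mid] + hi[dim + 1:]))
    upper = (tuple(lo[:dim] + [mid] + lo[dim + 1:]), tuple(hi))
    return lower, upper


def join_and_update(zones, neighbors, v, u, dim):
    """v keeps the lower half, u gets the upper half; only v's old neighbors are re-examined.
    Returns the set of nodes involved in the update."""
    old = set(neighbors[v])
    zones[v], zones[u] = split_half(zones[v], dim)
    # All neighbors of u are among v's old neighbors, plus v itself.
    neighbors[u] = {n for n in old | {v} if are_neighbors(zones[u], zones[n])}
    neighbors[v] = {n for n in old | {u} if are_neighbors(zones[v], zones[n])}
    for n in old:  # UPDATE messages to v's old neighbors
        neighbors[n].discard(v)
        neighbors[n] |= {m for m in (u, v) if are_neighbors(zones[n], zones[m])}
    return old | {u, v}


zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 0.5), (0.75, 1.0)),
    "E": ((0.75, 0.5), (1.0, 1.0)),
}
neighbors = {n: {m for m in zones if m != n and are_neighbors(zones[n], zones[m])} for n in zones}
affected = join_and_update(zones, neighbors, "A", "F", dim=0)
print(sorted(affected), sorted(neighbors["F"]))  # ['A', 'B', 'C', 'F'] ['A', 'B', 'C']
```

Recomputing every neighbor set by brute force after the join gives the same result, while the local update only touches v, u, and v's old neighbors (here D and E are untouched).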
Node Departure (3)
- If not, the zone is handed to the neighbor whose current zone is smallest, and that node then temporarily handles both zones.

Node Departure (4)
- The departing node x looks for a neighbor y whose zone can be merged correctly with x's zone (the merged zone must again be box-shaped).
- If there is no such neighbor, x hands its zone to a neighbor z, which then temporarily owns two different zones at the same time.
- x's zone is passed to the chosen neighbor, and updates are sent to all of x's neighbors.

Node Departure (5)
Figure: departure with notification

Node Departure (6)
- Faults (no notification): each node sends periodic update messages to each of its neighbors, giving its zone coordinates and a list of its neighbors with their zone coordinates.
- The prolonged absence of update messages from a neighbor signals its failure, and the takeover mechanism starts.

Node Departure (7)
- Takeover mechanism (see the next slide).
- The pairs (k, data) owned by the failed node are lost.
- Periodic updates to each neighbor give the node's zone coordinates and the list of its neighbors with their coordinates.
- If no update message arrives from a neighbor for a long time, this signals its failure.

Node Departure (8)
TAKEOVER mechanism (when a neighbor of u has died):
1. u starts a timer in proportion to its zone volume. Each neighbor u of the failed node does this independently!
2. When the timer expires, the node sends TAKEOVER to all of the failed node's neighbors; the message contains the volume of its own zone.
3. On receipt of TAKEOVER, u cancels its timer if the zone volume in the message is smaller than its own zone volume; otherwise u sends its own TAKEOVER.
4. As a result, a neighboring node (the takeover node) that is still alive and has a small zone volume is chosen, and the zones are reallocated.
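The TAKEOVER race can be simulated with a small event queue. Assumptions of this sketch: every timer is `alpha * volume` for one shared constant `alpha`, every message takes the same `delay`, and ties are broken by node name; `takeover` is a hypothetical helper, not a CAN API.

```python
import heapq


def takeover(volumes: dict, alpha: float = 1.0, delay: float = 0.01):
    """Simulate TAKEOVER among the live neighbors of a failed node.

    volumes: zone volume of each live neighbor. Each starts a timer of alpha * volume;
    a TAKEOVER message reaches every other neighbor after `delay`.
    Returns (takeover node, set of nodes that broadcast TAKEOVER).
    """
    events = [(alpha * vol, "timer", node, node) for node, vol in volumes.items()]
    heapq.heapify(events)
    running = set(volumes)  # nodes whose timer has neither fired nor been cancelled
    senders = set()

    def broadcast(t, node):
        running.discard(node)
        senders.add(node)
        for other in volumes:
            if other != node:
                heapq.heappush(events, (t + delay, "msg", other, node))

    while events:
        t, kind, node, sender = heapq.heappop(events)
        if kind == "timer":
            if node in running:
                broadcast(t, node)   # step 2: timer expired, claim the zone
        elif volumes[sender] < volumes[node]:
            running.discard(node)    # step 3: a smaller zone claimed it, cancel own timer
        elif node in running:
            broadcast(t, node)       # step 3: otherwise answer with own TAKEOVER
    # The claim with the smallest zone volume wins (ties broken by name, an assumption).
    winner = min(senders, key=lambda n: (volumes[n], n))
    return winner, senders


vols = {"A": 0.25, "B": 0.125, "C": 0.0625}
print(takeover(vols))             # ('C', {'C'})
print(takeover(vols, delay=0.1))  # B's timer fires before C's message arrives
```

Because timers are proportional to volume, the neighbor with the smallest zone usually fires first and silences the rest with a single broadcast; extra broadcasts appear only when the message delay exceeds the gap between timers.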
Node Departure (9)
- The ID is a k-bit string (e.g., $k = \log N$): its first l bits are the path in the partition tree from the root, and the other k-l bits are 0s.
- ID distance is numerical closeness: the takeover node w is the one numerically closest to the dead node v.

Extreme cases (1)
- Under certain failure scenarios, e.g., the simultaneous failure of adjacent nodes, the state can become inconsistent.
- An expanding search for any nodes residing beyond the failure region eventually rebuilds sufficient neighbor state to initiate a takeover safely.
- Stabilization procedure (as in Chord).

Extreme cases (2)
- Both the normal leaving procedure and the immediate takeover algorithm can result in a node holding more than one zone.
- To prevent repeated further fragmentation of the space, a background zone-reassignment algorithm is used.

Analysis (1)
- Local routing state is 2d neighbors: 2 neighbors in each dimension (the topology graph is a d-dimensional grid).
- Due to fragmentation there can be more than 2d neighbors, but still O(d), i.e., O(1) as N grows.

Analysis (2)
- For N equal-sized zones, the average routing path length is $\frac{d}{4} N^{1/d}$ hops.
- Each dimension has $N^{1/d}$ zones; because the space wraps around, the distance along one dimension is at most $\frac{1}{2} N^{1/d}$ and averages $\frac{1}{4} N^{1/d}$. Summing over the d dimensions gives the total.
- The maximal routing path length is $\frac{d}{2} N^{1/d}$ hops.

Analysis (3)
- Worst-case example: N = 64, d = 2, so there are 8 zones per dimension.
- The maximal length in each dimension is 4, so the total length is 8 = 2·4.
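The path-length formulas can be checked numerically for the idealized case the analysis assumes: N = n^d equal zones forming a regular d-torus grid, where the hop count of a route is the sum of the wrapped offsets per dimension. The helpers `hop_stats` and `predicted` are hypothetical names for this check.

```python
from itertools import product


def hop_stats(n: int, d: int) -> tuple[float, int]:
    """Average and maximum hop count on a d-torus grid of N = n**d equal zones.

    By symmetry it suffices to measure from one source zone to every zone (itself included).
    """
    ring = [min(k, n - k) for k in range(n)]  # wrapped distance along one dimension
    hops = [sum(offsets) for offsets in product(ring, repeat=d)]
    return sum(hops) / len(hops), max(hops)


def predicted(N: int, d: int) -> tuple[float, float]:
    """Slide formulas: average (d/4) N^(1/d), maximum (d/2) N^(1/d)."""
    side = N ** (1 / d)
    return d / 4 * side, d / 2 * side


print(hop_stats(8, 2), predicted(64, 2))  # (4.0, 8) (4.0, 8.0)
```

For even n the match is exact, since the wrapped offsets 0, 1, ..., n/2, ..., 1 average to n/4. For odd n the average per dimension is (n^2 - 1)/(4n), slightly below n/4.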
Analysis (4)
- Fault-tolerant routing
- Nodes affected when joining: O(d)
- Scalability: as N grows, the routing path length grows as $O(d N^{1/d})$
- Many alternate paths between any two nodes

Discussion
- Play the role of nodes (each student acts as a node): joining, routing, leaving.
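Two quick computations illustrate this slide under the idealized equal-zone grid assumption: the number of distinct minimum-hop routes between two zones (a multinomial coefficient; it counts only monotone routes and ignores wrap-around alternatives), and the trade-off between per-node state (2d neighbors) and average path length. The helper names are hypothetical.

```python
from math import factorial, prod


def shortest_path_count(offsets: list[int]) -> int:
    """Distinct minimum-hop routes on a regular grid when the destination is offsets[i]
    zones away along dimension i: the multinomial coefficient (sum k_i)! / prod(k_i!)."""
    return factorial(sum(offsets)) // prod(factorial(k) for k in offsets)


def tradeoff(N: int, d: int) -> tuple[int, float]:
    """Per-node routing state (2d neighbors) vs. average path length (d/4) N^(1/d)."""
    return 2 * d, d / 4 * N ** (1 / d)


print(shortest_path_count([4, 4]))  # 70 routes for the N = 64, d = 2 worst case
for d in (2, 4, 10):
    state, hops = tradeoff(2 ** 20, d)
    print(f"d={d}: {state} neighbors, about {hops:.0f} hops")
```

Raising d shortens routes and multiplies the alternative paths, at the price of a linearly larger neighbor set per node.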