CS 268: Structured P2P Networks: Pastry and Tapestry


Sean Rhea
April 29, 2003

Introduction
- Structured peer-to-peer overlay networks
  - Sometimes called DHTs, DOLRs, CASTs, ...
  - Examples: CAN, Chord, Pastry, Tapestry, ...
- Contrasted with unstructured P2P networks
  - Gnutella, Freenet, etc.
- Today talking about Pastry and Tapestry

Service Model
- Let I be a set of identifiers
  - Such as all 160-bit unsigned integers
- Let N be a set of nodes in a P2P system
  - Some subset of all possible (IP, port) tuples
- Structured P2P overlays implement a mapping owner: I -> N
  - Given any identifier, deterministically map it to a node
- Properties
  - Should take O(log |N|) time and state per node
  - Should be roughly load balanced

Service Model (cont.)
- Owner mapping exposed in a variety of ways
  - In Chord, have function n = find_successor (i)
  - In Pastry and Tapestry, have function route_to_root (i, m)
- In general, can be iterative or recursive
  - Iterative = directed by querying node
  - Recursive = forwarded through network
- May also expose owner^-1: N -> P(I)
  - Which identifiers a given node is responsible for
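The owner mapping above can be sketched concretely. This is a minimal illustration, not any system's actual code: identifiers are shrunk to 16 bits, and owner (i) is taken to be the node whose id is numerically closest on the ring, as Pastry defines it.

```python
# A minimal sketch of the structured-overlay service model: owner(i) maps any
# identifier deterministically to the node whose node_id is numerically
# closest around the ring. 16-bit ids stand in for the 160-bit space.

ID_SPACE = 2 ** 16

def owner(i, node_ids):
    """Return the node id numerically closest to identifier i.

    Distance is measured around the circular identifier space; ties are
    broken low, one of the two tie-breaking choices the slides mention.
    """
    def ring_dist(a, b):
        d = abs(a - b)
        return min(d, ID_SPACE - d)
    return min(sorted(node_ids), key=lambda n: ring_dist(n, i))

nodes = [0x1000, 0x2000, 0x3600, 0x3800]  # the example network used later
print(hex(owner(0x3701, nodes)))          # deterministic: closest node wins
```

Because the function depends only on i and the node set, every querier computes the same owner, which is exactly the determinism property the service model demands.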

Other Service Models
- Other models can be implemented on top of owner
- Example: Distributed hash table (DHT)

  void put (key, data) {
    n = owner (key)
    n.hash_table.insert (key, data)
  }

  data get (key) {
    n = owner (key)
    return n.hash_table.lookup (key)
  }

Lecture Overview
- Introduction
- PRR Trees
  - Overview
  - Locality Properties
- Pastry
  - Routing in Pastry
  - Joining a Pastry network
  - Leaving a Pastry network
- Tapestry
  - Routing in Tapestry
  - Object location in Tapestry
- Multicast in PRR Trees
- Conclusions

PRR Trees
- Work by Plaxton, Rajaraman, Richa (SPAA 97)
  - Interested in a distributed publication system
    - Similar to Napster in interface
  - Only for static networks (set N does not change)
  - No existing implementation (AFAIK)
- Pastry and Tapestry both based on PRR trees
  - Extend to support dynamic node membership
  - Several implementations

PRR Trees: The Basic Idea
- Basic idea: add injective function node_id: N -> I
  - Gives each node a name in the identifier space
- owner (i) = node whose node_id is closest to i
  - Definition of closest varies, but always over I
- To find owner (i) from node with identifier j:
  1. Let p = longest matching prefix between i and j
  2. Find node k with longest matching prefix of p + 1 digits
  3. If no such node, j is the owner (root)
  4. Otherwise, forward query to node k
- Step 2 is the tricky part
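The four-step forwarding rule above can be sketched as follows. This is an illustrative toy with a global view of all node ids (a real node only consults its local routing table), using 4-digit hex identifiers from the routing example that follows.

```python
# A sketch of the PRR forwarding rule: at each step, move to a node sharing
# at least one more digit of the target identifier. Global node list assumed
# for brevity; real systems consult a per-node routing table.

def shared_prefix_len(a, b):
    """Number of leading hex digits that two equal-length ids share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(i, j, all_nodes):
    """One PRR routing step from node j toward identifier i.

    Returns a node sharing at least one more digit with i than j does,
    or None when j is already the root (owner) for i.
    """
    p = shared_prefix_len(i, j)
    candidates = [k for k in all_nodes if shared_prefix_len(i, k) >= p + 1]
    return min(candidates) if candidates else None  # min() for determinism

nodes = ["3AF2", "4633", "47DA", "47EC", "9CD0"]
hop1 = next_hop("47E2", "3AF2", nodes)  # resolves the first digit
hop2 = next_hop("47E2", hop1, nodes)    # resolves the second digit
print(hop1, hop2)
```

Each hop fixes at least one more digit, so a lookup takes at most one hop per digit, which is where the O(log N) hop count comes from.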

PRR Trees: The Routing Table
- Each node n has O(b log_b |N|) neighbors
  - Each Lx neighbor shares x digits with n
  - Set of neighbors forms a routing table
[Figure: routing table of node 3AF2, with an L0 neighbor 9CD0, an L2 neighbor 3A01, and L3 neighbors 3AFC and 3AF1]

PRR Trees: Routing
- To find owner (47E2):
  - Query starts at node 3AF2
  - Resolve first digit by routing to 4633
  - Resolve second digit by routing to 47DA, etc.
[Figure: routing path 3AF2 -> 4633 -> 47DA -> 47EC through a network also containing 2974, 45B3, 47C1, 443E, 3C57, 5A8F, 4889, and 47F7]

PRR Trees: Routing (cont.)
- Problem: what if no exact match?
  - Consider the following network: nodes 1000, 2000, 3600, 3800
  - Who is the owner of identifier 3701?
- Network is well formed
  - Every routing table spot that can be filled is filled
  - Can route to all node identifiers
- Owner of 3701 not well defined
  - Starting from 1000, it's node 3800
  - Starting from 2000, it's node 3600
- Violation of service model

PRR Trees: Handling Inexact Matches
- Want owner function to be deterministic
  - Must have a way to resolve inexact matches
- Solved different ways by each system
  - I have no idea what PRR did
  - Pastry chooses the numerically closest node
    - Can break ties high or low
  - Tapestry performs surrogate routing
    - Chooses next highest match on a per-digit basis
    - More on this later
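The nondeterminism problem can be demonstrated in a few lines. This sketch uses hypothetical routing tables for the four-node example: both tables are legal (every fillable slot is filled), yet two query sources reach different "owners" for 3701.

```python
# A sketch of the inexact-match problem: pure prefix routing with per-node
# routing tables. Node 1000's level-0 entry for digit '3' happens to be 3800,
# node 2000's is 3600 -- both valid -- so the "owner" of 3701 is ambiguous.

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(i, start, tables):
    """Follow prefix routing from `start` until no longer-prefix node exists."""
    node = start
    while True:
        p = shared_prefix_len(i, node)
        nxt = tables[node].get((p, i[p]))  # neighbor matching one more digit
        if nxt is None:
            return node  # no such node: current node declares itself root
        node = nxt

tables = {
    "1000": {(0, "2"): "2000", (0, "3"): "3800"},
    "2000": {(0, "1"): "1000", (0, "3"): "3600"},
    "3600": {(0, "1"): "1000", (0, "2"): "2000"},
    "3800": {(0, "1"): "1000", (0, "2"): "2000"},
}
print(route("3701", "1000", tables), route("3701", "2000", tables))
```

No routing table entry can match the "37" prefix (no node has it), so each query dead-ends at whichever "3" node its source happened to know, which is exactly the service-model violation the slide describes.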

Locality in PRR Trees
- Consider a node with id=1000 in a PRR network
  - At the lowest level of its routing table, node 1000 needs neighbors with prefixes 2-, 3-, 4-, etc.
  - In a large network, there may be several of each
- Idea: choose the best neighbor for each prefix
  - Best can mean lowest latency, highest bandwidth, etc.
- Can show that this choice gives good routes
  - For certain networks, the routing path from query source to owner is no more than a constant factor worse than the routing path in the underlying network
  - I'm not going to prove this today; see PRR97 for details

Locality in PRR Trees: Experiments
[Figure: experimental results omitted]

Lecture Overview (repeated; next section: Pastry)

Pastry
- A PRR tree combined with a Chord-like ring
  - Each node has PRR-style neighbors
  - And each node knows its predecessor and successor
    - Called its leaf set
- To find owner (i), node n does the following:
  - If i is in n's leaf set range, choose the numerically closest node
  - Else, if there is an appropriate PRR-style neighbor, choose that
  - Finally, choose the numerically closest node from the leaf set
- A lot like Chord
  - Only the leaf set is necessary for correctness
  - PRR neighbors are like the finger table, only for performance
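Pastry's three-step forwarding decision can be sketched as below. This is a simplified illustration, not FreePastry code: ids are 4 decimal digits, and the "leaf set" is just the predecessor and successor.

```python
# A sketch of Pastry's forwarding decision: leaf set first, then a
# longer-prefix routing-table neighbor, else the numerically closest
# known node. Simplified: leaf set = {predecessor, successor} only.

ID_SPACE = 10 ** 4

def ring_dist(a, b):
    d = abs(int(a) - int(b))
    return min(d, ID_SPACE - d)

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pastry_forward(i, node, leaf_set, routing_neighbors):
    """Decide where `node` sends a query for identifier i.

    Returns (done, next_node): done=True means next_node is the owner.
    """
    # 1. If i falls within the leaf set range, the numerically closest
    #    member of {node} + leaf_set is the owner.
    members = sorted(leaf_set + [node], key=int)
    if int(members[0]) <= int(i) <= int(members[-1]):
        return True, min(members, key=lambda n: ring_dist(n, i))
    # 2. Otherwise prefer a routing-table neighbor with a longer prefix match.
    p = shared_prefix_len(i, node)
    better = [n for n in routing_neighbors if shared_prefix_len(i, n) > p]
    if better:
        return False, max(better, key=lambda n: shared_prefix_len(i, n))
    # 3. Fall back to the numerically closest node known at all.
    return False, min(leaf_set + routing_neighbors,
                      key=lambda n: ring_dist(n, i))

# Node 3600 with leaf set {2000, 3800}: 3701 is inside the leaf set range,
# and 3800 is numerically closer, so the query ends at 3800.
print(pastry_forward("3701", "3600", ["2000", "3800"], ["1000"]))
```

Note how case 1 resolves the 3701 ambiguity from the earlier example: once a query reaches any node whose leaf set spans 3701, the answer is forced to be the numerically closest node, the same from every starting point.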

Pastry Routing Example
- PRR neighbors in black, leaf set neighbors in blue
- Owner of 3701 is now well-defined
- From 1000:
  - Resolve first digit, routing to 3800
  - At 3800, see that we're done
    - (Numerically closer than 3600)
- From 2000:
  - Resolve first digit, routing to 3600
  - At 3600, 3701 is in leaf set
    - In range 2000-3800
  - Route to 3800 b/c numerically closer
[Figure: ring of nodes 1000, 2000, 3600, 3800 with PRR and leaf set links]

Notes on Pastry Routing
- Leaf set is great for correctness
  - Need not get PRR neighbors correct, only leaf set
  - If you believe the Chord work, this isn't too hard to do
- Leaf set also gives an implementation of owner^-1 (n)
  - All identifiers from half-way between n and its predecessor to half-way between n and its successor
- Can store k predecessors and successors
  - Gives further robustness as in Chord

Joining a Pastry Network
- Must know of a gateway node, g
- Pick new node's identifier, n, uniformly at random from I
- Ask g to find m = owner (n)
  - And ask that it record the path that it takes to do so
- Ask m for its leaf set
- Contact m's leaf set and announce n's presence
  - These nodes add n to their leaf sets and vice versa
- Build routing table
  - Get level i of routing table from node i in the join path
  - Use those nodes to make level i of our routing table

Join Example
- Node 3701 wants to join
  - Has 1000 as gateway
- Join path is 1000 -> 3800
  - 3800 is the owner
- 3701 ties itself into the leaf set
- 3701 builds routing table
  - L0 neighbors from 1000: 1000, 2000, and 3800
  - L1 neighbors from 3800: 3600
- Existing nodes on the join path consider 3701 as a neighbor
[Figure: join path from gateway 1000 to owner 3800; new node 3701]

Join Notes
- Join is not perfect
  - A node whose routing table needs the new node should learn about it
    - Necessary to prove O(log |N|) routing hops
  - Also not guaranteed to find close neighbors
- Can use routing table maintenance to fix both
  - Periodically ask neighbors for their neighbors
  - Use them to fix routing table holes; replace existing distant neighbors
- Philosophically very similar to Chord
  - Start with the minimum needed for correctness (leaf set)
  - Patch up performance later (routing table)

Join Optimization
- Best if gateway node is close to joining node
  - Gateway joined earlier, should have close neighbors
  - Recursively, gateway's neighbors' neighbors are close
  - Join path intuitively provides a good initial routing table
  - Less need to fix up with routing table maintenance
- Pastry's optimized join algorithm
  - Before joining, find a good gateway, then join normally
- To find a good gateway, refine a set of candidates
  - Start with the original gateway's leaf set
  - Keep only a closest few, then add their neighbors
  - Repeat (more or less -- see paper for details)

Leaving a Pastry Network
- Pastry does not distinguish between leaving and crashing
  - A good design decision, IMHO
- Remaining nodes notice leaving node n is down
  - Stops responding to keep-alive pings
- Fix leaf sets immediately
  - Easy if 2+ predecessors and successors known
- Fix routing table lazily
  - Wait until needed for a query
  - Or until routing table maintenance
  - Arbitrary decision, IMHO

Dealing with Broken Leaf Sets
- What if no nodes are left in the leaf set due to failures?
- Can use the routing table to recover (MCR93)
  - Choose closest nodes in routing table to own identifier
  - Ask them for their leaf sets
  - Choose closest of those, recurse
- Allows use of smaller leaf sets

Lecture Overview (repeated; next section: Tapestry)

Tapestry Routing
- Only different from Pastry when there is no exact match
  - Instead of using the next numerically closer node, use the node with the next higher digit at each hop
- Example:
  - Given a 3-node network: nodes 0700, 0F00, and FFFF
  - Who owns identifier 0000?
    - In Pastry, FFFF does (numerically closest)
    - In Tapestry, 0700 does
      - From FFFF, route to 0700 or 0F00 (doesn't matter)
      - From 0F00, route to 0700 (7 is the next highest digit after 0)
      - From 0700, route to itself (no node with a digit between 0 and 7)

Notes on Tapestry Routing
- Mostly same locality properties as PRR and Pastry
- But compared to Pastry, very fragile
- Consider the previous example: 0700, 0F00, FFFF
  - What if 0F00 doesn't know about 0700?
  - 0F00 will think it is the owner of 0000
  - 0700 will still think it is the owner
  - Mapping won't be deterministic throughout the network
- Tapestry join algorithm guarantees this won't happen
  - All routing table holes that can be filled will be
  - Provably correct, but tricky to implement
  - (Pastry's leaf set links are bidirectional, easier to keep consistent)

Object Location in Tapestry
- Pastry was originally just a DHT
  - Support for multicast added later (RKC+01)
- PRR and Tapestry are DOLRs
  - Distributed Object Location and Routing
- Service model:
  - publish (name)
  - route_to_object (name, message)
- Like Napster, Gnutella, and DNS
  - Service does not store data, only pointers to it
  - Manages a mapping of names to hosts
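The per-digit "next higher" rule can be sketched with a global view of the node set. This toy computes the surrogate root directly rather than hop by hop, but it picks the same node the hop-by-hop example above converges on.

```python
# A sketch of Tapestry-style surrogate routing: at each digit position,
# keep nodes matching the identifier's digit if any exist, otherwise the
# next higher digit present (wrapping around). Global view assumed.

HEX = "0123456789ABCDEF"

def surrogate_root(i, all_nodes):
    candidates = list(all_nodes)
    for pos in range(len(i)):
        want = HEX.index(i[pos])
        present = sorted({HEX.index(n[pos]) for n in candidates})
        # choose the wanted digit if available, else next higher (with wrap)
        digit = next((d for d in present if d >= want), present[0])
        candidates = [n for n in candidates if HEX.index(n[pos]) == digit]
    assert len(candidates) == 1  # injective node ids guarantee a unique root
    return candidates[0]

nodes = ["0700", "0F00", "FFFF"]
print(surrogate_root("0000", nodes))  # Tapestry's owner, vs. FFFF in Pastry
```

Because the choice at each position depends only on the digits present in the (globally agreed) node set, every node computes the same root; the fragility the next slide describes arises when routing tables disagree about which digits are present.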

A Simple DOLR Implementation
- Can implement a DOLR on the owner service model:

  publish (name) {
    n = owner (name)
    n.add_mapping (name, my_addr, my_port)
  }

  route_to_object (name, message) {
    n = owner (name)
    m = n.get_mapping (name)
    m.send_msg (message)
  }

Problems with Simple DOLR Impl.
- No locality
  - Even if the object is stored nearby, the owner might be far away
  - Bad for performance
  - Bad for availability (owner might be behind a partition)
- No redundancy
  - Easy to fix if the underlying network has leaf/successor sets
    - Just store pointers on the owner's whole leaf set
    - If the owner fails, the replacement already has the pointers
  - But Tapestry doesn't have leaf/successor sets

Tapestry DOLR Implementation
- Insight: leave bread crumbs along the publish path
  - Not just at the owner

  publish (name) {
    foreach n in path_to_owner (name)
      n.add_mapping (name, my_addr, my_port)
  }

  route_to_object (name, message) {
    foreach n in path_to_owner (name) {
      if ((m = n.get_mapping (name)) != null) {
        m.send_msg (message)
        break
      }
    }
  }

Tapestry DOLR Impl.: Experiments
[Figure: experimental results omitted]
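The bread-crumb pseudocode above can be made runnable with a faked path_to_owner: the paths below are hypothetical fixed hop lists chosen so a publisher's and a querier's routes converge before the owner, as prefix routing makes likely for nearby nodes.

```python
# A runnable sketch of bread-crumb object location. path_to_owner() is
# replaced by fixed per-node hop lists so the example is self-contained;
# in Tapestry it would be the prefix-routing path toward owner(name).

pointers = {}  # node -> {object name: publisher address}

def publish(name, publisher, path_to_owner):
    # leave a bread crumb (object pointer) at every hop, not just the owner
    for n in path_to_owner:
        pointers.setdefault(n, {})[name] = publisher

def route_to_object(name, path_to_owner):
    # walk toward the owner; stop at the first bread crumb found
    for hop, n in enumerate(path_to_owner):
        if name in pointers.get(n, {}):
            return pointers[n][name], hop
    return None, len(path_to_owner)

# The publisher's and querier's paths converge at 47DA, before owner 47EC.
publish("obj", "publisher-addr", ["4633", "47DA", "47EC"])
loc, hops = route_to_object("obj", ["45B3", "47DA", "47EC"])
print(loc, hops)
```

The query finds the pointer at the convergence hop rather than at the owner, which is precisely where the locality benefit of the scheme comes from.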

Tapestry DOLR Impl. Notes
- Bread crumbs are called object pointers
- PRR show that the overlay path from query source to object is no worse than a constant factor longer than the underlying network path
  - Just like routing in a PRR tree
- True for two reasons:
  1. Hops early in the path are short (in network latency)
  2. Paths converge early
- Path convergence is a little subtle
  - Two nearby nodes often have the same early hops
  - Because the next hop is based on destination, not source
  - And because neighbor choice is weighted by latency

Path Convergence Examples
[Figure: two 8-node rings (identifiers 000-111) comparing paths in Chord vs. a PRR tree; nodes 001, 100, and 110 are "close"]

Lecture Overview (repeated; next section: Multicast in PRR Trees)

Multicast in PRR Trees
- A PRR tree gives efficient paths between all nodes
  - Uses application nodes as routers
- Since we control the routers, we can implement multicast
- Seems to have been thought of simultaneously by:
  - the Pastry group, with the SCRIBE protocol (RKC+01)
  - the Tapestry group, with the Bayeux protocol (ZZJ+01)
- I'll talk about SCRIBE

SCRIBE Protocol
- Like Tapestry object location, use bread crumbs
- To subscribe to multicast group i:
  - Walk up the tree towards owner (i), leaving bread crumbs
  - If you find a pre-existing crumb, leave one more and stop
- To send a message to group i:
  - Send the message to owner (i)
  - Owner sends the message to each bread crumb it has
    - Its children who are subscribers or their parents
  - Each child recursively does the same

SCRIBE Example
[Figure: multicast tree rooted at owner 370; sender 74D; subscribers 3A4 and BCE below node 37F; other nodes 56B, 301; the second subscription's message stops at the first pre-existing crumb]

SCRIBE Notes
- Multicast evaluation points
  - Stretch, also called Relative Delay Penalty (RDP)
  - Network stress
- SCRIBE has constant RDP
  - Assuming Pastry is a good PRR tree
- SCRIBE has low network stress
  - Harder to see, but due to choosing neighbors by latency
  - Demonstrated in simulations (CJK03)

Conclusions
- PRR trees are a powerful P2P primitive
  - Can be used as a DHT
    - Path to owner has low RDP
    - Chord can be hacked to do the same
  - Can be used as a DOLR
    - Finds close objects when available
    - No clear way to get this from Chord
  - Can be used for application-level multicast
    - Also no clear way to get this from Chord
- More work to support dynamic membership
  - Pastry uses leaf sets like Chord
  - Tapestry has its own algorithm
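The subscribe and forward rules above can be sketched as follows. The tree shape here is hypothetical (each node's parent toward the group owner is hard-coded so the example runs stand-alone); real SCRIBE derives parents from Pastry's route toward owner(group).

```python
# A sketch of SCRIBE's subscribe/forward rules. parent_of fakes the Pastry
# route toward the group owner; real SCRIBE gets it from prefix routing.

children = {}  # node -> set of bread-crumb children for one group

def subscribe(node, parent_of, owner):
    # walk toward the owner leaving crumbs; stop at a pre-existing crumb
    child = node
    while child != owner:
        parent = parent_of[child]
        crumb_existed = bool(children.get(parent))
        children.setdefault(parent, set()).add(child)  # leave one more crumb
        if crumb_existed:
            return  # someone already built the rest of the path to the owner
        child = parent

def deliver(node):
    # the owner (and, recursively, each interior node) forwards a message
    # to every bread-crumb child; returns all nodes reached
    reached = [node]
    for child in children.get(node, ()):
        reached.extend(deliver(child))
    return reached

parent_of = {"3A4": "37F", "BCE": "37F", "37F": "370"}
subscribe("3A4", parent_of, "370")
subscribe("BCE", parent_of, "370")   # stops at 37F: crumb already there
print(sorted(deliver("370")))
```

The second subscription touches only one node, which is why subscription cost stays low and interior nodes fan out to many subscribers without the owner seeing every join.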

For more information
- Pastry: http://research.microsoft.com/~antr/pastry/pubs.htm
- Tapestry: http://oceanstore.cs.berkeley.edu/publications/index.html