Distributed Hash Tables


DHTs: Like it sounds, a distributed hash table: Put(key, value), Get(key) -> value.

Interface vs. Implementation: Put/Get is an abstract interface. Very convenient to program to. It doesn't require a DHT in today's sense of the word: e.g., Amazon's S3 storage service maps /bucket name/object id -> data. We'll mostly focus on the back end: log(n) lookup systems like Chord. But researchers have proposed alternate architectures that may work better, depending on assumptions!

Last time: Unstructured Lookup. Pure flooding (Gnutella), TTL-limited: send the message to all nodes. Supernodes (KaZaA): flood to supernodes only. Adaptive supernodes and other tricks (GIA). None of these scales well for searching for needles.

Alternate Lookups. Keep in mind contrasts to flooding (unstructured), from last time, and to hierarchical lookups like DNS. Properties? The root is critical: today's DNS root is widely replicated, run in serious secure datacenters, etc. Load is asymmetric. Not always bad: DNS works pretty well. But it's not fully decentralized, if that's your goal.

P2P Goal (general): Harness storage & computation across (hundreds, thousands, millions of) nodes across the Internet. In particular: can we use them to create a gigantic, hugely scalable DHT?

P2P Requirements: Scale to those sizes. Be robust to faults and malice. Specific challenges: node arrival and departure vs. system stability; freeloading participants; malicious participants; understanding the bounds of what systems can and cannot be built on top of p2p frameworks.

DHTs: Two options: lookup(key) -> node ID, or lookup(key) -> data. When you know the node ID, you can ask it directly for the data, but specifying the interface as -> data provides more opportunities for caching and computation at intermediaries. Different systems do either. We'll focus on the problem of locating the node responsible for the data; the solutions are basically the same.

Algorithmic Requirements: Every node can find the answer. Keys are load-balanced among nodes. (Note: we're not talking about popularity of keys, which may be wildly different; addressing that is a further challenge.) Routing tables must adapt to node failures and arrivals. How many hops must lookups take? A trade-off is possible between state/maintenance traffic and the number of lookup hops.

Consistent Hashing: How can we map a key to a node? Consider ordinary hashing: func(key) % N -> node ID. What happens if you add or remove a node? Nearly every key gets remapped. Consistent hashing: map node IDs to a (large) circular space, and map keys to the same circular space. A key belongs to the nearest node.
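The remapping problem with ordinary mod-N hashing is easy to see in a few lines. This is a hedged sketch: the key names and node counts are made up, and SHA-1 stands in for "some hash function".

```python
import hashlib

def h(key: str) -> int:
    # stable hash so the result doesn't vary across runs
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

keys = [f"key{i}" for i in range(1000)]

def assign(n_nodes: int) -> dict:
    # ordinary hashing: func(key) % N -> node ID
    return {k: h(k) % n_nodes for k in keys}

before, after = assign(10), assign(11)
moved = sum(before[k] != after[k] for k in keys)
# going from 10 to 11 nodes remaps roughly N/(N+1), i.e. ~90% of all keys
print(moved / len(keys))
```

With consistent hashing, only about 1/(N+1) of the keys would move when the (N+1)th node arrives, which is the point of the next slide.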

DHT: Consistent Hashing. [Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80.] A key is stored at its successor: the node with the next higher ID. (Slide credit: 15-441, Spring 2004, Jeff Pang.)

Consistent Hashing: A very useful algorithmic trick outside of DHTs too: any time you want to avoid greatly changing object distribution upon bucket arrival/departure. Detail: to get good load balance, you must represent each physical bucket by log(n) virtual buckets.
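A minimal consistent-hashing ring with virtual buckets might look like the sketch below. The class and node names are invented, and 32 virtual buckets per node is an arbitrary choice (the slide's log(n) guidance would suggest fewer for 10 nodes; more just smooths the balance further).

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=32):
        # each physical node appears as `vnodes` virtual buckets on the circle
        self._points = sorted((h(f"{n}#{v}"), n)
                              for n in nodes for v in range(vnodes))
        self._hashes = [p for p, _ in self._points]

    def owner(self, key: str) -> str:
        # a key belongs to the first (virtual) bucket clockwise of its hash
        i = bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]

nodes = [f"node{i}" for i in range(10)]
keys = [f"key{i}" for i in range(1000)]
old, new = Ring(nodes), Ring(nodes + ["node10"])
moved = sum(old.owner(k) != new.owner(k) for k in keys)
# only keys that now land on node10 move: roughly 1/11 of them
print(moved / len(keys))
```

Contrast this with the ~90% of keys that move under plain `hash(key) % N` when one node is added.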

DHT: Chord Basic Lookup. [Figure: ring with nodes N10, N32, N60, N90, N105, N120; the query "Where is key 80?" is forwarded around the ring and resolves to N90, which has K80.]

DHT: Chord Finger Table. [Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring.] Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i. In other words, the ith finger points 1/2^(n-i) of the way around the ring.
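Finger-table construction follows directly from that definition. A sketch, using a hypothetical 3-bit id space and the node set {0, 1, 2, 6} that appears in the join example on the following slides:

```python
def successor(ident: int, nodes, m: int = 3) -> int:
    # first node at or clockwise after `ident` in a 2^m id space
    space = 2 ** m
    return min(nodes, key=lambda n: (n - ident) % space)

def finger_table(n: int, nodes, m: int = 3):
    # entry i is the first node that succeeds or equals n + 2^i
    space = 2 ** m
    return [(i, (n + 2 ** i) % space, successor((n + 2 ** i) % space, nodes, m))
            for i in range(m)]

# with nodes 0, 1, 2, 6 present, node 1's fingers target ids 2, 3, 5
print(finger_table(1, [0, 1, 2, 6]))  # [(0, 2, 2), (1, 3, 6), (2, 5, 6)]
```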

DHT: Chord Join. Assume an identifier space [0..8]. Node n1 joins. [Figure: ring positions 0-7.]
Succ. table for n1 (i | id+2^i | succ): 0 | 2 | 1; 1 | 3 | 1; 2 | 5 | 1.

DHT: Chord Join. Node n2 joins. [Figure: ring positions 0-7.]
Succ. table for n1 (i | id+2^i | succ): 0 | 2 | 2; 1 | 3 | 1; 2 | 5 | 1.
Succ. table for n2 (i | id+2^i | succ): 0 | 3 | 1; 1 | 4 | 1; 2 | 6 | 1.

DHT: Chord Join. Nodes n0 and n6 join. [Figure: ring positions 0-7.]
Succ. table for n0 (i | id+2^i | succ): 0 | 1 | 1; 1 | 2 | 2; 2 | 4 | 6.
Succ. table for n1: 0 | 2 | 2; 1 | 3 | 6; 2 | 5 | 6.
Succ. table for n6: 0 | 7 | 0; 1 | 0 | 0; 2 | 2 | 2.
Succ. table for n2: 0 | 3 | 6; 1 | 4 | 6; 2 | 6 | 6.

DHT: Chord Join. Nodes n1, n2, n0, n6; items f7 and f2 are inserted. Each item is stored at its successor: f7 at n0 (the first node at or after id 7, wrapping around), f2 at n2. [Figure: ring with the succ. tables from the previous slide and the item placements.]

DHT: Chord Routing. Upon receiving a query for item id, a node: checks whether it stores the item locally; if not, forwards the query to the largest node in its successor table that does not exceed id. [Figure: query(7) is forwarded around the ring until it reaches n0, which stores item 7.]

DHT: Chord Summary. Routing table size? Log N fingers. Routing time? Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
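The O(log N) claim is easy to check empirically with a toy simulator. A sketch under stated assumptions: ideal finger tables, greedy forwarding to the farthest finger that does not pass the target, and made-up node/key names.

```python
import hashlib
from bisect import bisect_left

M = 16                       # bits in the identifier space
SPACE = 2 ** M

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % SPACE

ids = sorted({h(f"node{i}") for i in range(256)})

def succ(x: int) -> int:
    # first node id at or clockwise after x
    i = bisect_left(ids, x % SPACE)
    return ids[i] if i < len(ids) else ids[0]

def lookup_hops(start: int, key: int) -> int:
    target, cur, hops = succ(key), start, 0
    while cur != target:
        fingers = [succ(cur + 2 ** i) for i in range(M)]
        # greedy: farthest finger that does not overshoot the target
        cur = max((f for f in fingers
                   if (f - cur) % SPACE <= (target - cur) % SPACE),
                  key=lambda f: (f - cur) % SPACE)
        hops += 1
    return hops

hops = [lookup_hops(ids[0], h(f"key{i}")) for i in range(200)]
print(max(hops), sum(hops) / len(hops))  # max stays near log2(256) = 8
```

Each hop at least halves the remaining clockwise distance, so no lookup can take more than M hops; with ~256 nodes the observed counts cluster around log2(256).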

Alternate structures: Chord is like a skip list: each hop takes you half of the remaining way towards the destination. Other topologies do this too.

Tree-like structures: Pastry, Tapestry, Kademlia. Pastry: nodes maintain a leaf set of size L: the L/2 nodes above & below the node's ID (like Chord's successors, but bidirectional). Plus pointers to roughly log_2(n) nodes, one for each level i of bit-prefix sharing with the node's ID, with bit i+1 different: e.g., node id 01100101 stores pointers to neighbors with prefixes 1, 00, 010, 0111, ...
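Prefix routing in this style can be sketched as follows. Hypothetical 4-bit string ids; real Pastry keeps one routing-table entry per prefix level, which this sketch mimics by extending the shared prefix one step per hop.

```python
def shared_prefix(a: str, b: str) -> int:
    # length of the common leading prefix of two id strings
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def prefix_route(cur: str, key: str, nodes):
    path = [cur]
    while True:
        p = shared_prefix(cur, key)
        # candidates sharing a strictly longer prefix with the key
        better = [n for n in nodes if shared_prefix(n, key) > p]
        if not better:
            return path          # cur is the best match reachable
        # smallest improvement ~ one routing-table level per hop
        cur = min(better, key=lambda n: shared_prefix(n, key))
        path.append(cur)

nodes = ["0110", "1010", "1100", "1111"]
# each hop shares one more leading bit with key 1110
print(prefix_route("0110", "1110", nodes))  # ['0110', '1010', '1100', '1111']
```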

Hypercubes: the CAN DHT. Each node has an ID and maintains pointers to neighbors who differ in one bit position. There is only one possible neighbor in each direction, but you can route towards the receiver by changing any differing bit.
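Bit-fixing routing on a hypercube is a few lines. A sketch: ids and dimension are made up, and correcting the differing bits in any order would work equally well.

```python
def hypercube_route(src: int, dst: int, d: int):
    # route by correcting one differing bit per hop, low bit first
    path, cur = [src], src
    for bit in range(d):
        mask = 1 << bit
        if (cur ^ dst) & mask:   # this bit still differs from dst
            cur ^= mask          # hop to the neighbor that fixes it
            path.append(cur)
    return path

# 0000 -> 1011 differs in three bits, so the route takes three hops
print(hypercube_route(0b0000, 0b1011, 4))  # [0, 1, 3, 11]
```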

So many DHTs... Compare along two axes: How many neighbors can you choose from when forwarding? (Forwarding selection.) How many nodes can you choose from when selecting neighbors? (Neighbor selection.) Failure resilience: forwarding choices help. Picking low-latency neighbors: both help.

Proximity. Ring: forwarding: log(n) choices for the next hop when going around the ring; neighbor selection: pick from ~2^i nodes at level i (great flexibility). Tree: forwarding: 1 choice; neighbor selection: 2^(i-1) choices for the ith neighbor.

Hypercube: neighbor selection: 1 choice (the neighbors who differ in exactly one bit); forwarding: you can fix any differing bit you want, giving log(n)/2 (expected) ways to forward. So: neighbor selection: hypercube 1, others ~2^i; forwarding selection: tree 1, hypercube log(n)/2, ring log(n).

How much does it matter? For failure resilience without re-running the routing protocol: the tree is much worse; the ring appears best. But all protocols can use multiple neighbors at various levels to improve these numbers. For proximity: neighbor selection matters more than route selection, and it draws from a large space in everything but the hypercube.

Other approaches: Instead of log(n), you can do: Direct routing (everyone knows the full routing table): can scale to tens of thousands of nodes; lookups may fail and retry to recover from failures/additions. One-hop routing with sqrt(n) state instead of log(n) state. What's best for real applications? Still up in the air.

DHT: Discussion. Pros: guaranteed lookup; O(log N) per-node state and search scope. Cons (or otherwise): a hammer in search of a nail? Now becoming popular in p2p: BitTorrent's distributed tracker. But still waiting for massive uptake. Or not: many services (like Google) are scaling to huge sizes without DHT-like log(n) techniques.

Further Information: We didn't talk about Kademlia's XOR structure (like a generalized hypercube). See "The Impact of DHT Routing Geometry on Resilience and Proximity" for more detail on DHT comparison. No silver bullet: DHTs are very nice for exact match, but not for everything (next few slides).

Writable, persistent p2p: Do you trust your data to 100,000 monkeys? Node availability hurts. Ex: store 5 copies of data on different nodes; when someone goes away, you must re-replicate the data they held. Hard drives are *huge*, but cable-modem upload bandwidth is tiny: perhaps 10 GB/day. It takes many days (about 20 at that rate) to upload the contents of a 200 GB hard drive: a very expensive leave/replication situation!

When are p2p / DHTs useful? Caching and soft-state data: works well! BitTorrent, KaZaA, etc. all use peers as caches for hot data. Finding read-only data: limited flooding finds hay; DHTs find needles. BUT...

A peer-to-peer Google? Complex intersection queries ("the" + "who"): billions of hits for each term alone. Sophisticated ranking: must compare many results before returning a subset to the user. Very, very hard for a DHT / p2p system: you need high inter-node bandwidth. (This is exactly what Google does with massive clusters.)