Peer-to-Peer Data Management Wolf-Tilo Balke Sascha Tönnies Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
11. Content Distribution
1. Reliability in Distributed Hash Tables
2. Storage Load Balancing in Distributed Hash Tables
   1. Power of Two Choices
   2. Virtual Servers
3. Content Distribution
   1. Swarming
   2. BitTorrent
VDMS und P2P Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig
11.1 Stabilize Function
The stabilize function corrects inconsistent connections.
Remember: it is run periodically by each node n.
n asks its successor for its predecessor p.
n checks whether p equals n.
n also periodically refreshes a random finger x by (re)locating its successor.
The successor list is used to find a new successor: if the successor is not reachable, use the next node in the successor list, then start the stabilize function.
But what happens to the data in case of a node failure?
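The stabilize steps above can be sketched in a few lines of Python. This is a minimal in-memory sketch, not the Chord reference implementation: the class `Node`, the constant `R` (length of the successor list), the `alive` flag standing in for reachability, and the `notify` call are illustrative assumptions, and the finger refresh is left out.

```python
R = 2  # assumed length of the successor list


def in_open_interval(x, a, b):
    """True if x lies strictly between a and b when walking clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps around zero


class Node:
    def __init__(self, ident):
        self.id = ident
        self.alive = True            # stands in for "reachable over the network"
        self.predecessor = None
        self.successor_list = [self]

    @property
    def successor(self):
        # If the successor is not reachable, use the next node in the successor list.
        for candidate in self.successor_list:
            if candidate.alive:
                return candidate
        return self

    def stabilize(self):
        succ = self.successor
        p = succ.predecessor         # ask the successor for its predecessor
        if p is not None and p.alive and in_open_interval(p.id, self.id, succ.id):
            succ = p                 # a node has joined between us and our successor
        others = [s for s in succ.successor_list if s is not succ]
        self.successor_list = [succ] + others[:R - 1]
        succ.notify(self)

    def notify(self, candidate):
        pred = self.predecessor
        if pred is None or not pred.alive or in_open_interval(candidate.id, pred.id, self.id):
            self.predecessor = candidate
```

Running `stabilize` on every live node for a few rounds lets a newly joined node be picked up by its neighbors, and lets a node skip over a failed successor.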
11.1 Reliability of Data in Chord
Original Chord: no reliability of data.
Recommendation: use the successor list.
The reliability of data is an application task.
Replicate inserted data to the next f other nodes.
Chord informs the application of arriving or failing nodes.
11.1 Properties
Advantages
After the failure of a node, its successor already has the data stored.
Disadvantages
Each node stores f intervals: more data load.
After the breakdown of a node, a new successor has to be found and the data replicated to the next node: more message overhead at breakdown.
The stabilize function has to check every successor list to find inconsistent links: more message overhead.
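To make the successor-list replication concrete, here is a small sketch of which nodes hold a key when data is replicated to the next f nodes. The function name `replica_nodes` and the use of a sorted list of node IDs as a stand-in for the ring are assumptions for illustration only.

```python
import bisect


def replica_nodes(node_ids, key, f):
    """Return the owner of key (its successor on the ring) plus the next f nodes."""
    ring = sorted(node_ids)
    start = bisect.bisect_left(ring, key) % len(ring)  # first node with id >= key
    count = min(f + 1, len(ring))                      # never more copies than nodes
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

With nodes 100, 200, 300, 400 and f = 2, key 250 is held by 300, 400 and 100. If node 300 fails, the new owner 400 already stores the data, and only one further copy has to be pushed to the next node (200) to restore f replicas.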
11.1 Multiple Nodes in One Interval
A fixed positive number f indicates how many nodes have to act within one interval at least.
Procedure
The first node takes a random position.
A new node is assigned to any existing node.
The new node is announced to all other nodes in the same interval.
[Figure: nodes 1 to 10 grouped into intervals]
11.1 Multiple Nodes in One Interval
Effects of the algorithm
Reliability of data
Better load balancing
Higher security
[Figure: nodes 1 to 10 grouped into intervals]
11.1 Reliability of Data
Insertion
Copying documents is always necessary for replication.
Low additional cost
Nodes only have to store pointers to nodes from the same interval.
Nodes store only the data of one interval.
11.1 Reliability of Data
Reliability
Failure: no copy of data needed, because the data are already stored within the same interval.
Use the stabilization procedure to correct fingers, as in original Chord.
[Figure: nodes 1 to 10 grouped into intervals]
11.1 Properties
Advantages
Failure: no copy of data needed
Rebuild intervals with neighbors only if critical
Requests can be answered by f different nodes
Disadvantages
Fewer intervals than in original Chord
Solution: virtual servers
11.1 Fault Tolerance
Replication
Each data item is replicated K times.
The K replicas are stored on different nodes.
Redundancy
Each data item is split into M fragments.
K redundant fragments are computed using an erasure code (see e.g. V. Pless: Introduction to the Theory of Error-Correcting Codes. Wiley-Interscience, 1998).
Any M of the M + K fragments suffice to reconstruct the original data.
For each fragment we compute its key, so the M + K different fragments have different keys.
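The redundancy idea can be illustrated with the simplest possible erasure code: a single XOR parity fragment, i.e. the special case K = 1. This is only a sketch under that assumption; practical systems with K > 1 need a real erasure code such as Reed-Solomon, which is not shown here. The function names `encode` and `decode` are illustrative.

```python
def encode(data, m):
    """Split data into m equally sized fragments plus one XOR parity fragment (K = 1)."""
    size = -(-len(data) // m)                 # ceiling division
    padded = data.ljust(size * m, b"\0")      # pad so all fragments have equal length
    fragments = [padded[i * size:(i + 1) * size] for i in range(m)]
    parity = bytes(size)
    for frag in fragments:
        parity = bytes(a ^ b for a, b in zip(parity, frag))
    return fragments + [parity]


def decode(fragments, length):
    """Rebuild the data from m + 1 fragments, at most one of which is None (lost)."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can repair only one loss")
    fragments = list(fragments)
    if missing:
        size = len(next(f for f in fragments if f is not None))
        rebuilt = bytes(size)
        for frag in fragments:
            if frag is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, frag))
        fragments[missing[0]] = rebuilt       # XOR of all survivors equals the lost one
    return b"".join(fragments[:-1])[:length]  # drop parity and padding
```

Here any M of the M + 1 fragments are enough to recover the data, which mirrors the "any M fragments" property on the slide for K = 1.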
11.2 Storage Load Balancing in DHT
Suitable hash function (easy to compute, few collisions)
Standard assumption 1: uniform key distribution
Every node has equal load.
No load balancing is needed.
Standard assumption 2: equal distribution
Nodes across the address space
Data across the nodes
But is this assumption justifiable?
Analysis of the distribution of data using simulation
11.2 Storage Load Balancing in DHT
Analysis of the distribution of data
Example parameters: 4,096 nodes, 500,000 documents
Optimum: ~122 documents per node (optimal distribution of documents across nodes)
No optimal distribution in Chord without load balancing
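The optimum follows from 500,000 / 4,096 ≈ 122 documents per node. A small simulation in the spirit of the analysis above shows how far a plain successor-based assignment can deviate from it. This is a sketch, not the original experiment: it assumes uniformly random node and document IDs in a 22-bit space, and the function name `simulate_loads` and the seed are illustrative.

```python
import bisect
import random


def simulate_loads(num_nodes=4096, num_docs=500_000, bits=22, seed=1):
    """Place nodes and documents at random IDs; each document goes to its successor node."""
    rng = random.Random(seed)
    space = 1 << bits
    node_ids = sorted(rng.sample(range(space), num_nodes))  # distinct node IDs
    loads = [0] * num_nodes
    for _ in range(num_docs):
        doc_id = rng.randrange(space)
        idx = bisect.bisect_left(node_ids, doc_id) % num_nodes  # wrap around the ring
        loads[idx] += 1
    return loads


if __name__ == "__main__":
    loads = simulate_loads()
    print("optimum per node:", 500_000 / 4096)
    print("max load:", max(loads), "nodes without documents:", loads.count(0))
```

Because interval lengths between random node IDs vary widely, some nodes end up with many times the optimum while others store little or nothing.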
11.2 Storage Load Balancing in DHT
Number of nodes that do not store any document
Parameters: 4,096 nodes, 100,000 to 1,000,000 documents
Some nodes are without any load.
Why is the load unbalanced?
We need load balancing to keep the complexity of DHT management low.
11.2 Definitions
System with N nodes
The load is optimally balanced if the load of each node is around 1/N of the total load.
A node is overloaded (heavy) if it has a significantly higher load compared to the optimal distribution of load.
Otherwise the node is light.
11.2 Load Balancing Algorithms
Problem: significant difference in the load of nodes
Several techniques to ensure an equal data distribution:
Power of Two Choices (Byers et al., 2003)
Virtual Servers (Rao et al., 2003)
Thermal-Dissipation-based Approach (Rieche et al., 2004)
A Simple Address-Space and Item Balancing (Karger et al., 2004)
11.2 Overview
Algorithms
Power of Two Choices (Byers et al., 2003)
Virtual Servers (Rao et al., 2003)
John Byers, Jeffrey Considine, and Michael Mitzenmacher: Simple Load Balancing for Distributed Hash Tables. In Second International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, USA, 2003.
11.2 Power of Two Choices
Idea
One hash function for all nodes: h0
Multiple hash functions for data: h1, h2, h3, ..., hd
Two options
Data is stored at one node only.
Data is stored at one node and the other nodes store a pointer.
11.2 Power of Two Choices
Inserting data
The results of all hash functions are calculated: h1(x), h2(x), h3(x), ..., hd(x).
The data is stored on the retrieved node with the lowest load.
Alternative: the other nodes store a pointer.
The owner of the item has to re-insert the document periodically to prevent removal of the data after a timeout (soft state).
11.2 Power of Two Choices
Retrieving
Without pointers
The results of all hash functions are calculated.
All of the possible nodes are requested in parallel.
One node will answer.
With pointers
Only one of the possible nodes is requested.
That node can forward the request directly to the final node.
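The insertion and retrieval steps above (the variant without pointers) can be sketched as follows. This is an illustrative in-memory model, not the algorithm as published: the class name `PowerOfTwoChoicesDHT`, the salted SHA-1 used to derive h1 ... hd, the 22-bit ID space, and the use of the item count as "load" are all assumptions.

```python
import bisect
import hashlib


class PowerOfTwoChoicesDHT:
    def __init__(self, node_ids, d=2, bits=22):
        self.ring = sorted(node_ids)
        self.d = d
        self.bits = bits
        self.store = {n: {} for n in self.ring}   # per-node key/value storage

    def _hash(self, i, key):
        # h_i(key): SHA-1 over a salted key, reduced to the ID space (assumed construction)
        digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
        return int.from_bytes(digest, "big") % (1 << self.bits)

    def _candidates(self, key):
        nodes = []
        for i in range(1, self.d + 1):
            idx = bisect.bisect_left(self.ring, self._hash(i, key)) % len(self.ring)
            nodes.append(self.ring[idx])
        return nodes

    def insert(self, key, value):
        candidates = self._candidates(key)
        for node in candidates:                    # periodic re-insert: refresh in place
            if key in self.store[node]:
                self.store[node][key] = value
                return node
        target = min(candidates, key=lambda n: len(self.store[n]))  # lowest load wins
        self.store[target][key] = value
        return target

    def lookup(self, key):
        # Without pointers: ask all d candidate nodes (sequentially here, in parallel in practice).
        for node in self._candidates(key):
            if key in self.store[node]:
                return self.store[node][key]
        return None
```

The sketch makes the trade-off visible: every insert and every lookup touches up to d nodes, which is the message overhead the approach pays for its simplicity.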
11.2 Power of Two Choices
Advantages
Simple
Disadvantages
Message overhead when inserting data
With pointers: the additional administration of pointers leads to even more load.
Without pointers: message overhead for every search
11.2 Overview
Algorithms
Power of Two Choices (Byers et al., 2003)
Virtual Servers (Rao et al., 2003)
Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, and Ion Stoica: Load Balancing in Structured P2P Systems. In Second International Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, USA, 2003.
11.2 Virtual Server
Each node is responsible for several intervals ("virtual servers").
Example: Chord
[Figure: Chord ring with virtual servers, from Rao 2003]
11.2 Rules
Rules for transferring a virtual server from a heavy node to a light node:
1. The transfer of a virtual server must not make the receiving node heavy.
2. The virtual server is the lightest virtual server that makes the heavy node light.
3. If there is no virtual server whose transfer can make the node light, the heaviest virtual server from this node is transferred.
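The three transfer rules above can be written as a short selection function. This is a sketch under stated assumptions: loads are plain numbers, a node counts as heavy when its load exceeds its target, the function name `pick_virtual_server` and its parameters are illustrative, and rule 1 is assumed to stay in force when rule 3 applies.

```python
def pick_virtual_server(vs_loads, heavy_target, light_load, light_target):
    """Return the load of the virtual server to move from a heavy to a light node, or None."""
    total = sum(vs_loads)
    # Rule 1: the transfer must not make the receiving node heavy.
    acceptable = [v for v in vs_loads if light_load + v <= light_target]
    if not acceptable:
        return None
    # Rule 2: the lightest virtual server that makes the heavy node light.
    relieving = [v for v in acceptable if total - v <= heavy_target]
    if relieving:
        return min(relieving)
    # Rule 3: otherwise move the heaviest virtual server (assumed: among the acceptable ones).
    return max(acceptable)
```

For example, a heavy node with virtual servers of load 10, 30 and 50 and a target of 60 would hand over the server with load 30: it is the lightest one whose removal brings the node down to its target.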
11.2 Virtual Server
Each node is responsible for several intervals: log(n) virtual servers.
Load balancing
Different possibilities to exchange servers: one-to-one, one-to-many, many-to-many
Moving an interval is like removing and inserting a node in a DHT.
[Figure: Chord ring]
11.2 Scheme 1: One-to-One
A light node picks a random ID.
It contacts the node x responsible for that ID.
It accepts load if x is heavy.
[Figure: ring of heavy (H) and light (L) nodes, from Rao 2003]
11.2 Scheme 2: One-to-Many
Light nodes report their load information to directories.
A heavy node H gets this information by contacting a directory.
H contacts the light node which can accept the excess load.
[Figure: light nodes L1 to L5, directories D1 and D2, heavy nodes H1 to H3, from Rao 2003]
11.2 Scheme 3: Many-to-Many
Many heavy and light nodes rendezvous at each step.
Directories periodically compute the transfer schedule and report it back to the nodes, which then do the actual transfer.
[Figure: light nodes L1 to L5, directories D1 and D2, heavy nodes H1 to H3, from Rao 2003]
11.2 Virtual Server
Advantages
Easy shifting of load: whole virtual servers are shifted.
Disadvantages
Increased administrative and message overhead: maintenance of all finger tables
Much load is shifted.
[Rao 2003]
11.2 Simulation
Scenario
4,096 nodes (for comparison with other measurements)
100,000 to 1,000,000 documents
Chord with m = 22 bits; consequently, an identifier space of 2^22 = 4,194,304 for nodes and documents
Hash function: SHA-1 (mod 2^m), random
Analysis
Up to 25 runs per test
11.2 Results
Without load balancing: + simple, + original; - bad load balancing
Power of Two Choices: + simple, + lower load; - nodes without load
Virtual server: + no nodes without load; - higher maximum load than Power of Two Choices
11.3 Content Distribution
Sometimes large amounts of data have to be distributed over networks: software updates, video on demand, etc.
Early approaches: Napster/Gnutella/FastTrack
Download the whole file from one peer.
If the download fails: repeat the search, resume the download from an alternative source.
Issues
No load distribution
Poor performance due to asymmetric uplink/downlink bandwidth (ADSL)
Low reliability (except for small files)
11.3 Swarming Approach
Idea: chunks
Split large files into small chunks.
Identify/protect chunks via hash values.
Parallelization
Download different chunks from different sources.
Utilize the upload capacity of multiple sources.
[Figure: chunks with hash values 0x9A3C, 0x7C23, 0x194F, 0xDE6A flowing from several sources to one destination]
11.3 Swarming Properties
Advantages
Peer failures: no loss of files, only of chunks
Increased throughput
Strategies
Chunk selection: avoid scarcity; best overall availability?
Fairness: free-riding, bandwidth allocation
Systems
BitTorrent
Microsoft Avalanche
11.3 BitTorrent Overview
Spelled Bittorrent or BitTorrent; torrent = big stream
Author: Bram Cohen, 2003
Only for file distribution, no search features
Designed for content providers and flash crowds
Central components
Web server for search
Tracker for peer coordination
11.3 BitTorrent Definitions
Peers
Torrent
Contains metadata about the files
Contains the address of a tracker
Specification of backup trackers is possible
Swarm
All peers sharing a torrent are called a swarm.
Tracker
Keeps track of which peers are in a swarm
Coordinates communication between the peers
11.3 BitTorrent Joining a Torrent
Peers are divided into
seeds: have the entire file
leechers: still downloading
Steps for a new leecher
1. Obtain the torrent (from the website)
2. Contact the tracker (join)
3. Obtain a peer list (contains seeds and leechers)
4. Contact peers from that list for data (data request)
11.3 BitTorrent Exchanging Data
Verify pieces using hashes.
Download sub-pieces in parallel.
Advertise received pieces to the entire peer list ("I have!").
Look for the rarest pieces.
[Figure: a seed and leechers A, B, C exchanging pieces]
11.3 Torrent
A torrent file
Passive component
Files are typically fragmented into 256 KB pieces.
Typically hosted on a web server
Metadata file structure
Describes the files in the torrent
URL of the tracker
File name
File length
Piece length
SHA-1 hashes of the pieces (allow peers to verify integrity)
Creation date
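The role of the piece hashes can be shown with Python's standard `hashlib`. This sketch only demonstrates splitting data into 256 KB pieces and checking a received piece against its SHA-1 hash; it does not produce or parse a real torrent file, and the function names `piece_hashes` and `verify_piece` are illustrative.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # typical piece length mentioned above


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Return the SHA-1 digest of every piece of data, in order."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def verify_piece(index, piece, hashes):
    """Check a downloaded piece against the hash list from the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A peer that receives a piece from an untrusted source can thus discard it and request it again elsewhere if the hash does not match.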
11.3 Tracker
Peer cache
IP, port, peer ID
State information: completed or downloading
Clients report their status periodically to the tracker.
Returns a random list
50 random leechers/seeds
The client first contacts 20 to 40 of them, and more if some do not respond.
11.3 Tracker-less Approaches
Tracker issues
Single point of failure
Scalability: the Pirate Bay tracker is nearly overloaded (> 5 million peers)
Decentralized tracker
Replace the tracker with a DHT (Kademlia)
Does not tackle distributed search
Currently not widely used
11.3 Chunk Selection
Which chunk next?
1. Strict Priority: finish active chunks.
2. Rarest First: improves the availability of rare chunks; delays the download of common chunks.
3. Random First Chunk: get the first chunk quickly (the rarest chunk is probably slow to get).
4. Endgame Mode: send requests for the last sub-chunks to all known peers, so the end of the download is not stalled by slow peers.
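Two of the policies above, Random First Chunk and Rarest First, can be sketched in a single function. This is an illustration rather than the behavior of any particular client: strict priority and endgame mode are left out, and the function name `next_piece` and its parameters (`have`, `peer_pieces`, `num_pieces`) are assumptions.

```python
import random
from collections import Counter


def next_piece(have, peer_pieces, num_pieces, rng=random):
    """Pick the next piece index to request, or None if nothing useful is available.

    have: set of piece indices already downloaded
    peer_pieces: dict mapping a peer name to the set of piece indices it advertises
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)
    wanted = [p for p in range(num_pieces) if p not in have and availability[p] > 0]
    if not wanted:
        return None
    if not have:
        return rng.choice(wanted)          # Random First Chunk: get something quickly
    rarest = min(availability[p] for p in wanted)
    return rng.choice([p for p in wanted if availability[p] == rarest])  # Rarest First
```

With peers A = {0, 1, 2}, B = {0, 1} and C = {0, 3}, a client that already holds piece 0 would request piece 2 or 3, since each is held by only one peer.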
11.3 Game Theory
Basic ideas of game theory
Studies situations where players choose different actions in an attempt to maximize their returns.
Studies the ways in which strategic interactions among rational players produce outcomes with respect to the players' preferences.
The outcomes might not have been intended by any of them.
Game theory offers a general theory of strategic behavior, described in mathematical form.
Plays an important role in modern economics, decision theory, and multi-agent systems.
11.3 Game Theory
Developed to explain the optimal strategy in two-person interactions.
von Neumann and Morgenstern
Initially: zero-sum games
John Nash
Works in game theory and differential geometry
Nonzero-sum games
Nash equilibrium
1994 Nobel Prize in Economics
Harsanyi, Selten
Incomplete information
11.3 Definitions
Games: situations are treated as games.
Rules: the rules of the game state who can do what, and when they can do it.
Player's strategies: a plan of actions for each possible situation in the game.
Player's payoffs: the amount that the player wins or loses in a particular situation.
Dominant strategy: a player's best strategy that does not depend on what the other players do.
11.3 Prisoner's Dilemma
Famous example of game theory
A and B are arrested by the police.
They are questioned in separate cells, unable to communicate with each other.
They know how it works:
If they both resist interrogation and proclaim their mutual innocence, they will get off with a three-year sentence for robbery.
If one of them confesses to the entire string of robberies and the other does not, the confessor will be rewarded with a light, one-year sentence and the other will get a severe eight-year sentence.
If they both confess, then the judge will sentence both to a moderate four years in prison.
11.3 Prisoner's Dilemma
                   B: Confess                        B: Not Confess
A: Confess         4 years each                      1 year for A and 8 years for B
A: Not Confess     8 years for A and 1 year for B    3 years each
11.3 A's Decision Tree
There are two cases to consider.
If B confesses: A confesses: 4 years in prison (best strategy); A does not confess: 8 years in prison.
If B does not confess: A confesses: 1 year in prison (best strategy); A does not confess: 3 years in prison.
The dominant strategy for A is to confess.
No matter what B does, confessing is the better choice.
Nash equilibrium: both A and B will confess.
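The reasoning in the decision tree can be checked mechanically from the sentences given in the payoff table. The sketch below uses the years in prison from the slides; the names `YEARS`, `best_response_of_a`, `dominant_strategy_of_a` and `is_nash_equilibrium` are illustrative.

```python
# (A's action, B's action) -> (years in prison for A, years in prison for B)
YEARS = {
    ("confess", "confess"): (4, 4),
    ("confess", "silent"): (1, 8),
    ("silent", "confess"): (8, 1),
    ("silent", "silent"): (3, 3),
}
ACTIONS = ("confess", "silent")


def best_response_of_a(b_action):
    """A's action with the fewest years in prison, given what B does."""
    return min(ACTIONS, key=lambda a: YEARS[(a, b_action)][0])


def dominant_strategy_of_a():
    """Return A's dominant strategy, or None if the best response depends on B."""
    responses = {best_response_of_a(b) for b in ACTIONS}
    return responses.pop() if len(responses) == 1 else None


def is_nash_equilibrium(a, b):
    """Neither player can reduce their own sentence by deviating alone."""
    a_ok = all(YEARS[(a, b)][0] <= YEARS[(alt, b)][0] for alt in ACTIONS)
    b_ok = all(YEARS[(a, b)][1] <= YEARS[(a, alt)][1] for alt in ACTIONS)
    return a_ok and b_ok
```

Both players confessing is an equilibrium even though mutual silence (3 years each) would leave both better off than mutual confession (4 years each), which is exactly the dilemma.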
11.3 Repeated Games
A repeated game
A game that the same players play more than once
Differs from one-shot games because people's current actions can depend on the past behavior of other players
Cooperation is encouraged.
Book recommendation
Thinking Strategically by A. Dixit and B. Nalebuff
German translation: Spieltheorie für Einsteiger
11.3 Tit for Tat
Tit for tat
Highly effective strategy
An agent using this strategy will initially cooperate, then respond in kind to an opponent's previous action.
If the opponent previously was cooperative, the agent is cooperative. If not, the agent is not.
Dependent on four conditions
Unless provoked, the agent will always cooperate.
If provoked, the agent will retaliate.
The agent is quick to forgive.
The agent must have a good chance of competing against the opponent more than once.
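A repeated game with the tit-for-tat strategy is easy to simulate. In this minimal sketch a strategy is a function that sees only the opponent's past moves; the names `tit_for_tat`, `always_defect` and `play` are illustrative.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then repeat the opponent's previous move."""
    return "cooperate" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """A free-rider that never cooperates."""
    return "defect"


def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return both move histories."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_b)   # each side only sees the other's past moves
        move_b = strategy_b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b
```

Two tit-for-tat players cooperate in every round, while against a permanent defector the tit-for-tat player is exploited only once and retaliates from the second round on.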
11.3 Choking
Choking
Temporary refusal to upload
Downloading occurs as normal.
The connection is kept open: no setup costs, TCP congestion control
Choking mechanism
Ensures that nodes cooperate
Eliminates the free-rider problem
Cooperation involves uploading sub-pieces that you have on your peer.
Based on game-theoretic concepts: tit-for-tat strategy in repeated games
11.3 Unchoking
Periodically calculate data-receiving rates.
Upload to (unchoke) the fastest downloaders.
Optimistic unchoking
Each BitTorrent peer has a single optimistic unchoke, to which it uploads regardless of the current download rate from it.
This peer rotates every 30 seconds.
Reason: to discover currently unused connections that are better than the ones being used
[Figure: a seed and leechers A, B, C, D]
11.3 Choking Details
BitTorrent details
A peer always unchokes a fixed number of its peers (default of 4).
The choking decision is based on current download rates, evaluated on a rolling 20-second average.
The choking evaluation is performed every 10 seconds; this prevents wastage of resources by rapidly choking/unchoking peers.
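The choking decision described above can be sketched with a rolling rate meter and a selection function. This is a simplified illustration, not a client implementation: the class `RollingRate`, the function `choose_unchoked`, and the choice to count the optimistic unchoke in addition to the 4 regular slots are assumptions.

```python
from collections import deque


class RollingRate:
    """Average download rate from one peer over a rolling window (default 20 seconds)."""

    def __init__(self, window=20.0):
        self.window = window
        self.samples = deque()  # (timestamp in seconds, bytes received)

    def add(self, timestamp, num_bytes):
        self.samples.append((timestamp, num_bytes))

    def rate(self, now):
        # Drop samples that have fallen out of the window, then average the rest.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()
        return sum(num_bytes for _, num_bytes in self.samples) / self.window


def choose_unchoked(rates, slots=4, optimistic=None):
    """Unchoke the peers we download from fastest, plus an optional optimistic unchoke."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    unchoked = set(ranked[:slots])
    if optimistic is not None:
        unchoked.add(optimistic)  # assumed to be in addition to the regular slots
    return unchoked
```

A client would call `choose_unchoked` every 10 seconds with the current rolling rates, and pick a new `optimistic` peer every 30 seconds.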
11.3 Anti-Snubbing
Choking policy
When over a minute has gone by without receiving a single sub-piece from a particular peer, do not upload to it except as an optimistic unchoke.
Problem
A peer might find itself being simultaneously choked by all the peers that it was just downloading from.
The download will lag until an optimistic unchoke finds better peers.
Solution
If choked by everyone, increase the number of simultaneous optimistic unchokes to more than one.
11.3 Choking for Seeds
Open issue: upload-only choking
Once the download is complete, a peer has no download rates to use for comparison, nor any need to use them.
The question is: which nodes to upload to?
Policy
Upload to those with the best upload rate.
Advantages
Ensures that pieces get replicated faster
Peers that have good upload rates are probably not being served by others.
11.3 BitTorrent Summary
Optimized file transfer system
No file search, no fancy GUI, etc.
Very effective
High throughput and scalability
Nearly perfect utilization of bandwidth
Fairness and load distribution not optimal, but good enough
Commercially successful
Distribution of the Red Hat distribution
BBC evaluates the distribution of TV content (not in real time)
Centralized
Easier to take down than other approaches
11.3 Swarming Summary
Solves the problem of efficient file distribution
Scalable
Handles flash crowds
Areas for optimization
Incentive models
Tracker-less approaches
Further endgame improvements
Next step: content streaming
Real-time constraints
Chunk order