Lesson 14 APPLICATION OF COMPLEX NETWORKS: CHORD, SYMPHONY, FREENET
23/11/2015
GRAPH ANALYSIS: APPLICATIONS
In this lesson we use the tools of complex network analysis to study some properties of structured and unstructured P2P overlays. We consider three applications of complex network analysis to P2P systems:
- topological analysis of the Chord overlay
- Symphony: a Kleinberg-based DHT
- Freenet 0.7: a Friend-to-Friend (F2F) network, or darknet
GRAPH ANALYSIS: APPLICATIONS
The overlay is described by a directed graph:
- one vertex for each peer
- an edge (p,q) if p has a reference to q in its routing table
- the graph is directed: the existence of an edge (p,q) does not imply the existence of the edge (q,p)
We analyse the graph with the complex network tools of the previous lessons:
- average path length
- node degree distribution (indegree and outdegree, since the graph is directed)
- clustering coefficient
CHORD OVERLAY ANALYSIS
Chord ring and finger tables:
- vertices: the peers
- edges: an edge from p to q if p has a reference to q in its finger table
CHORD OVERLAY ANALYSIS
- m = 28, number of bits for the identifiers
- n = 100 nodes
- edge orientation is not shown
CHORD OVERLAY ANALYSIS
n = 32
- each vertex is connected to the opposite vertex on the ring
- all nodes have approximately the same number of outgoing edges
- the number of incoming edges varies considerably from vertex to vertex
- hence indegree and outdegree differ
VERTEX DEGREE DISTRIBUTION
Indegree (number of incoming edges) for a Chord network of 10000 nodes with 28-bit identifiers:
- the distribution can be described by an exponential curve
- some vertices have a high number of incoming edges
- the corresponding peers handle a high number of messages
- this introduces load-balancing problems
VERTEX DEGREE DISTRIBUTION
Outdegree (number of outgoing edges) for a Chord network of 10000 nodes with 28-bit identifiers:
- the values are distributed symmetrically around 13.5
- outdegree = number of entries in the finger table
- the outdegree is approximately the same for every node
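The two degree distributions can be reproduced with a small experiment. The sketch below is only an illustration under stated assumptions: identifiers are drawn uniformly at random, finger i of peer p points to the successor of p + 2^i, and duplicate fingers count once. The function name `chord_degrees` and the parameter values are hypothetical, not taken from the slides.

```python
import bisect
import random
from collections import Counter

def chord_degrees(n, m, seed=0):
    """Return (outdegree, indegree) dicts for a Chord overlay with n peers and m-bit identifiers."""
    rng = random.Random(seed)
    ids = sorted(rng.sample(range(2 ** m), n))   # distinct random identifiers

    def successor(point):
        # First peer whose identifier is >= point, wrapping around the ring.
        pos = bisect.bisect_left(ids, point)
        return ids[pos % n]

    out_deg, in_deg = {}, Counter()
    for p in ids:
        # Finger i points to the successor of p + 2^i; duplicates and self-references are dropped.
        fingers = {successor((p + 2 ** i) % 2 ** m) for i in range(m)} - {p}
        out_deg[p] = len(fingers)
        in_deg.update(fingers)
    return out_deg, {p: in_deg[p] for p in ids}

out_deg, in_deg = chord_degrees(n=1000, m=20)
print("outdegree range:", min(out_deg.values()), max(out_deg.values()))
print("indegree range:", min(in_deg.values()), max(in_deg.values()))
```

Comparing the two printed ranges gives a quick feel for how much more the indegree spreads than the outdegree.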
CHORD OVERLAY ANALYSIS
n = 32
Let n be the number of identifiers on the ring, and let the distance between two vertices i and j on the ring be
$d_n(i,j) = \min\{|i-j|,\ n-|i-j|\}$
Each vertex is connected to the vertices that are approximately at distance n/2, n/4, ..., 1.
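As a small worked example of the ring metric, the sketch below computes the distance between two identifiers and lists the finger distances for a ring whose size is a power of two. The function names are illustrative only.

```python
def ring_distance(i, j, n):
    """Shortest distance between identifiers i and j on a ring with n identifiers."""
    diff = abs(i - j) % n
    return min(diff, n - diff)

def finger_distances(n):
    """Ring distances n/2, n/4, ..., 1 reached by the fingers when n is a power of two."""
    d, result = n // 2, []
    while d >= 1:
        result.append(d)
        d //= 2
    return result

print(ring_distance(3, 30, 32))   # 5
print(finger_distances(32))       # [16, 8, 4, 2, 1]
```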
CHORD: AVERAGE PATH LENGTH
- at each step the distance to the searched node is halved
- curve fitting on the experimental results: the average path length grows as the logarithm of the size of the network
- the diameter of the network is low
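The logarithmic growth can be checked on an idealised case. The sketch below assumes a fully populated ring of 2^k identifiers, so that every finger p + 2^i exists, and a greedy lookup that always takes the largest finger not overshooting the target. Real Chord rings are sparse, so this only illustrates the trend; the function names are hypothetical.

```python
def chord_hops(source, target, k):
    """Count greedy hops from source to target on a full ring of 2^k identifiers."""
    n = 2 ** k
    hops, current = 0, source
    while current != target:
        remaining = (target - current) % n          # clockwise distance still to cover
        step = 2 ** (remaining.bit_length() - 1)    # largest finger not overshooting the target
        current = (current + step) % n
        hops += 1
    return hops

def average_path_length(k):
    """Average number of hops from node 0 to every identifier of the ring."""
    n = 2 ** k
    return sum(chord_hops(0, t, k) for t in range(n)) / n

for k in (4, 8, 12):
    print(k, average_path_length(k))
```

In this idealised setting the average comes out as k/2, that is (log2 n)/2 hops.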
CHORD: CLUSTERING COEFFICIENT
- the neighbours of node 11 (14, 18, 20, 28) are linked in a sequential list
- the neighbours of node 21 (28, 1, 9) are linked in a sequential list
- this property holds for the neighbours of every node
CHORD: CLUSTERING COEFFICIENT
Theorem: Consider the directed graph describing the Chord overlay, where an edge (a,b) exists if and only if node a has a reference to b in its finger table. The clustering coefficient of the Chord overlay is $2/\log_2 n$, where n is the number of nodes in the overlay.
Proof:
- each node in the Chord ring has the same connectivity pattern with its neighbours (fingers)
- hence, without loss of generality, we can consider the node v = 0 and its k neighbours $1, 2, 4, \dots, 2^{k-1}$, with $k = \log_2 n$
- take two neighbours of v, $x = 2^i$ and $y = 2^j$, with i < j
- x and y are neighbours of each other if and only if $2^j - 2^i$ is a power of 2: $2^j - 2^i = 2^m$, which can be rewritten as $2^i(2^{j-i} - 1) = 2^m$ for some integer m
(continues in the next slide...)
CHORD: CLUSTERING COEFFICIENT
Proof (continued):
- x and y are neighbours if and only if $2^j - 2^i$ is a power of 2, that is $2^i(2^{j-i} - 1) = 2^m$ for some integer m
- since $2^{j-i} - 1$ is odd, $2^i(2^{j-i} - 1) = 2^m$ holds if and only if $2^{j-i} - 1 = 1$, therefore j = i + 1
- the k neighbours of v are connected in a sequential list, where each neighbour points to the neighbour defined by the next power of 2
- the links are unidirectional, from the lower-index nodes to the higher-index ones
- there are therefore k - 1 links among the k neighbours, out of k(k-1)/2 possible pairs, and it follows that
$CC(G) = \dfrac{k-1}{k(k-1)/2} = \dfrac{2}{k} = \dfrac{2}{\log_2 n}$
CHORD: CLUSTERING COEFFICIENT
- consider a Chord overlay G with n = 1024 nodes: $CC(G) = 2/\log_2 1024 = 2/10 = 0.2$
- the clustering coefficient decreases as the number of nodes increases, but the logarithm in the denominator makes the decrease slow
- in any case, the clustering coefficient is high compared with a random network with the same number of nodes
- navigability: routing finds its paths through the finger tables
- a Chord overlay may be considered, to a good degree of approximation, a small world network
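The value 2/log2 n can be checked numerically. The sketch below assumes, as in the proof, a fully populated ring with n = 2^k nodes and fingers at p + 2^i, and it normalises by the k(k-1)/2 unordered pairs of neighbours. The function name `chord_clustering` is illustrative, and k is assumed to be at least 2.

```python
def chord_clustering(k):
    """Average clustering coefficient of a full Chord ring with n = 2^k nodes (k >= 2)."""
    n = 2 ** k
    total = 0.0
    for v in range(n):
        fingers = [(v + 2 ** i) % n for i in range(k)]
        finger_set = set(fingers)
        # Directed edges (x, y) between two fingers of v: y must be a finger of x.
        links = sum(
            1
            for x in fingers
            for i in range(k)
            if (x + 2 ** i) % n in finger_set
        )
        # As in the proof, normalise by the k(k-1)/2 unordered pairs of neighbours.
        total += links / (k * (k - 1) / 2)
    return total / n

for k in (5, 10):
    print(2 ** k, chord_clustering(k), 2 / k)
```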
SYMPHONY: A KLEINBERG BASED DHT
Symphony: Distributed Hashing in a Small World, by Manku, Bawa and Raghavan (Stanford).
The structure of Symphony is similar to that of Chord:
- an identifier in the logical space is assigned to each node and to each data item
- identifiers are taken from the interval [0,1], arranged as a ring of unit length
- the nodes are placed on this ring according to their identifier
- segment of responsibility: each node is responsible for all the data whose ID is greater than or equal to its own identifier (clockwise ordering) and less than or equal to the identifier of the next node
But: long range links are placed at random, according to Kleinberg's distribution.
SYMPHONY: A KLEINBERG BASED DHT
Each node defines:
- a link with its predecessor and one with its successor
- possibly some links with close neighbours
- k (k ≥ 1) long distance links
SYMPHONY: A KLEINBERG BASED DHT
A node defines a long range link in the following way:
- it chooses a number x according to a harmonic distribution
- it finds the point y at clockwise distance x from itself
- it contacts the manager of y and tries to establish a long range link with it
The value x is chosen according to the harmonic distribution
$p_n(x) = \dfrac{1}{x \log n}$
The probability of placing a long range link depends on the inverse of the distance (Kleinberg) and on the number of nodes.
SYMPHONY: A KLEINBERG BASED DHT
Harmonic probability distribution $p_n(x) = \dfrac{1}{x \log n}$:
- n is the number of nodes in the ring
- x ∈ [1/n, 1] is the ring distance of the target from the source of the long range link
In the harmonic distribution the probability of a long range link depends on the distance and also on the number of nodes: for a given distance, the probability decreases as the number of nodes increases.
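One way to draw a distance from this distribution is inverse transform sampling. The sketch below assumes the logarithm is natural, so that the density integrates to 1 over [1/n, 1]; the cumulative distribution is then F(x) = 1 + ln(x)/ln(n), and inverting it gives x = n^(u-1) for a uniform u. The function names are illustrative.

```python
import math
import random

def harmonic_pdf(x, n):
    """Density p_n(x) = 1 / (x ln n) on [1/n, 1], zero elsewhere."""
    return 1.0 / (x * math.log(n)) if 1.0 / n <= x <= 1.0 else 0.0

def draw_link_distance(n, rng=random):
    """Draw a long range link distance by inverting the CDF F(x) = 1 + ln(x)/ln(n)."""
    u = rng.random()            # uniform in [0, 1)
    return n ** (u - 1.0)       # lies in [1/n, 1)

rng = random.Random(42)
samples = [draw_link_distance(1000, rng) for _ in range(10000)]
short = sum(1 for x in samples if x < 0.01) / len(samples)
print("fraction of links shorter than 0.01:", short)
```

With n = 1000, F(0.01) = 1/3, so roughly a third of the sampled links should be shorter than 0.01: short links are strongly favoured, but long ones still occur.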
SYMPHONY: A KLEINBERG BASED DHT
- in Chord the computation of the long range links is deterministic and complex
- Symphony exploits a statistical approximation to obtain similar results in a more efficient way, with a lower number of messages
SYMPHONY: A KLEINBERG BASED DHT
Greedy routing algorithm (inspired by Kleinberg): when a node looks for a key, it forwards the request through the link (short or long range) whose target minimizes the clockwise distance to the key.
Theorem: consider a Symphony DHT with n nodes, where each node has k long range links. The average number of nodes that have to be contacted before reaching the node managing a key is $O((\log^2 n)/k)$.
- if k = log n, routing requires O(log n) hops, like Chord: a sort of probabilistic Chord
- the result is valid only for the harmonic distribution: if we choose a uniform distribution, the complexity grows as the square root of n
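The greedy choice made at each hop can be sketched as follows. This is a minimal illustration, assuming identifiers on the unit ring and a node that knows the identifiers of its link targets; the link values in the example are invented.

```python
def clockwise_distance(a, b):
    """Clockwise distance from point a to point b on the unit ring [0, 1)."""
    return (b - a) % 1.0

def next_hop(links, lookup_key):
    """Pick the link (short or long range) whose target is clockwise-closest to the key."""
    return min(links, key=lambda target: clockwise_distance(target, lookup_key))

# Hypothetical node with two short links and two long range links.
links = [0.12, 0.15, 0.40, 0.85]
print(next_hop(links, lookup_key=0.47))   # 0.4
```

Measuring the distance clockwise from the link target to the key means a link that overshoots the key is penalised with almost a full turn of the ring.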
SYMPHONY: A KLEINBERG BASED DHT
- the value K is an upper limit on the number of connections managed by each node; it may be defined at configuration time
- a node chosen as the target of a long range link may refuse the connection if it has already reached the limit of K open connections
- in this case, the node that requested the connection (the one trying to define the long range link) draws a new value of x from the harmonic distribution
- Symphony also checks that multiple links are not defined between the same pair of nodes
SYMPHONY: A KLEINBERG BASED DHT
To evaluate the probability distribution, each node needs to know the total number of nodes in the overlay.
- computing the exact number of nodes is not simple in a distributed setting; it may require a gossip algorithm
- Symphony exploits a heuristic based on the following observation: consider s (generally 3) distinct close nodes on the ring, with s << n
- if the identifiers are uniformly distributed on the ring, each node manages a segment of length approximately 1/n
- let $X_s$ be the sum of the lengths of the segments of the Symphony ring managed by the s nodes (the length of each segment is < 1)
SYMPHONY: A KLEINBERG BASED DHT
Consider s nodes:
- if the distribution is uniform, each node manages a segment of approximately the same length
- then the proportion $s : X_s = n : 1$ holds, where 1 is the total length of the Symphony ring
- this implies $n = s / X_s$
- for s = 3, a node generally considers its predecessor, its successor and itself: each node knows the length of the segment it manages and the lengths of the segments managed by its neighbours
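As a worked example of the estimate n = s / X_s, the sketch below takes the segment lengths known to a node and returns the estimated network size. The function name and the three lengths are invented for illustration.

```python
def estimate_network_size(segment_lengths):
    """Estimate n as s / X_s from the segment lengths managed by s nearby nodes."""
    s = len(segment_lengths)
    x_s = sum(segment_lengths)      # X_s: total ring length covered by the s nodes
    return s / x_s

# Predecessor, the node itself and its successor (s = 3).
print(estimate_network_size([0.0012, 0.0008, 0.0010]))
```

Here the three segments cover 0.003 of the ring, so the estimate is about 1000 nodes.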
SYMPHONY: A KLEINBERG BASED DHT
A new node joining the ring:
- chooses (by a hash function) its identifier id in the interval [0,1]
- contacts a bootstrap node B whose address is known
- finds the node that manages id, by greedy routing
- connects to its close neighbours on the ring (local contacts)
- estimates the number of nodes on the Symphony ring through the heuristic
- connects to k neighbours chosen at random (remote contacts): for each of them it chooses a value x ∈ [1/n, 1] according to the harmonic probability distribution $p_n(x) = \dfrac{1}{x \log n}$, where n is the number of nodes in the overlay, and tries to define a remote connection with the node managing the point at distance x
The long range links are then periodically updated to cope with churn.
SYMPHONY: A KLEINBERG BASED DHT
Voluntary leave of a node n from the ring:
- n removes all its long range links
- for each long range link pointing to n and starting from a node y, n notifies y that it is leaving
- each such y must find a new target for its long range link
- the neighbours of n update their short range links
- each neighbour of n recomputes its estimate of the total number of nodes in the network
SYMPHONY: CONCLUSIONS
Greedy routing (à la Kleinberg): each request is routed toward the node that manages the segment closest to the key in the request.
Theorem: with k = O(1) remote connections defined by each node, the average number of steps of the Symphony routing algorithm is inversely proportional to k and proportional to $(\log n)^2$.
FREENET: INTRODUCTION
Freenet, by Ian Clarke, Oskar Sandberg, et al.: a censorship-resistant, secure, adaptive, unstructured peer-to-peer network.
https://freenetproject.org/index.html
- a distributed storage and retrieval system
- we will focus on the topology, not on security and anti-censorship
- version 0.5: the classic idea, an adaptive P2P network
- version 0.7: new version released in 2006, with a new network strategy combining the darknet (friend-to-friend) idea with small world graphs and Kleinberg's model
- we will focus only on 0.7
FREENET 0.5 ROUTING AT A GLANCE
Routing in 0.7 is based on routing in 0.5, so we briefly recall the latter.
Routing in 0.5 shares some characteristics with Gnutella and some with a DHT:
- it is guided by the key, like a DHT: key-based routing, as a bounded depth-first search with backtracking
- it is TTL-based, but does not flood the network as Gnutella does
- like Gnutella, it may return false negatives
- like a DHT, the routing table stores pointers to the GUIDs of other nodes
- each key is propagated to the node with the closest GUID
FREENET 0.5 ROUTING AT A GLANCE
- the routing table stores pairs (content ID, node that stores that content)
- unique message identifiers are used to detect loops (like Gnutella)
- depth-first search with backtracking
- content is cached along the reply path, in order to improve subsequent accesses (popular data spreads) and to improve fault tolerance by replication
FREENET 0.5 ROUTING AT A GLANCE
Request:
- the request is forwarded to the node whose GUID is closest to the requested key; if that node reports a failure, the 2nd best (steps 4 and 9 in the example), then the 3rd best, and so on, is chosen
- messages have their own message ID, so the predecessor is known and loops (steps 6/7 in the example) can be detected
- the TTL (Time To Live counter) limits the number of hops and is decremented at each hop
FREENET 0.5 ROUTING AT A GLANCE
Reply:
- the message with the data and the source locator (used for connecting to nodes holding data with close GUIDs, see last slide) is sent to the predecessor in the path
- the closer the predecessor is to the target, the higher the probability that it caches the data
- anonymity: the predecessor may also alter the source (step 11 in the example, from Bob to Dave)
FREENET 0.5 ROUTING AT A GLANCE
Cache maintenance: when new content arrives, the least popular cache entries are deleted if necessary.
FREENET 0.5 ROUTING AT A GLANCE
- routing tables contain pairs (key, GUID of a node)
- routing adaptively modifies the routing table: when a node learns that a key k is owned by node m, it inserts the pair (k, m) in its routing table
- links with new nodes are defined during the lifetime of the system
- content may be replicated on the way back of the queries
- routing tables and content stores have limited size: an LRU strategy keeps their size bounded
FRIEND TO FRIEND NETWORKS
- P2P networks provide decentralization, which avoids a single central entity controlling the users' data
- decentralization is not enough to guarantee data privacy: several attacks may tamper with the data
- P2P networks are vulnerable to harvesting: people you don't know can easily discover whether you are part of the network
- a new philosophy, Friend-to-Friend networks or darknets: limit the connections to trusted friends only
FRIEND TO FRIEND NETWORKS
Friend-to-Friend networks or darknets: limit the connections to trusted friends only.
- advantage: only your trusted friends know you are part of the network
- disadvantage: traditional F2F networks are small, with only a few participants that form a closed group and thus a closed network; they may not scale to a large number of friends
FRIEND TO FRIEND NETWORKS
- the overlay reflects the social trust graph: connections exist only between friends and are based on trust, so the topology is fixed
- the main reason is to obscure the participation of a node in the network: a node is directly visible only to its friends
- hidden participation (no 3rd party disclosure: hidden friendships)
- compared with popular anonymous communication systems like Tor, restricting identity disclosure to trusted participants aims at providing additional protection against network attacks, identification and prosecution
- an intuitive solution for censorship-resilient and privacy-preserving online social networks
FRIEND TO FRIEND NETWORKS
- communication between untrusted nodes takes place indirectly, via friend-to-friend links
- the content injected by a user can still be stored (encrypted) in the data store of a non-friend node, but links between peers are always trusted
- a distributed storage system is still possible, as in Freenet 0.5: assign GUIDs to nodes and to content, and store the content on the node with the closest ID
- ...but this needs efficient routing, which is hard to achieve in such overlays because links cannot be chosen freely: they are constrained by the trust relationships
FREENET 0.7
Main goal: to build a global darknet.
- no links to any node other than our predefined friends!
- no self-adaptive algorithm for connecting to new nodes after replies
- no need to propagate the source to optimize the graph (cluster close GUIDs)
- the darknet is expected to be a small world, Kleinberg-like graph
- focus on privacy AND efficiency
- a modified routing algorithm (different routing tables)
FREENET 0.7: BASIC IDEAS
- an F2F darknet is essentially a social network of people linked by trust relationships
- we know that (real) social networks have a Kleinberg-like behaviour (remember the Milgram experiment)
- people are able to find paths in the social graph: there exists a short path, of length O(log N), between any pair of nodes
- Jon Kleinberg explained in 2000 how a small world network can be navigable
- if people can route in a social network, then it should also be possible for computers connected by F2F links, which reflect the social relationships
FREENET 0.7: BASIC IDEAS
The main problem: social routing is based on social notions that are not easily transferred to a routing algorithm.
- is Alice closer to Harry than Bob is?
- in real life, people presumably use a large number of factors to decide this: where do they live? what are their jobs? what are their interests?
- one cannot, in practice, expect a computer to route based on such things
- but remember: Kleinberg's model tells us there should be a few long range connections and many short ones
- we can assign numerical identities to peers so that this is fulfilled
- nodes belonging to a community of friends should have numerically close identifiers, and communities should be connected by long range links
FREENET 0.7: BASIC IDEAS
Idea: reverse engineer the node identifiers from the friend-to-friend connections in the network.
Addressing and routing:
- in structured overlays (DHTs): choose an ID, then choose the neighbours according to the chosen ID
- in F2F overlays: nodes cannot choose their neighbours, which are restricted to their friends
- to guarantee logarithmic routing: adapt the node ID!
THE FREENET 0.7 OVERLAY
- each node is assigned a unique location key, a number chosen at random between 0 and 1
- the key values between 0 and 1 are to be thought of as real numbers arranged on a circle (like the Chord ring)
- any arithmetic on the key values is carried out modulo 1
- links between nodes correspond to friend-to-friend relations
- messages can only travel on these links
THE FREENET 0.7 OVERLAY
- under the trusted connection model, we have to assume that the graph of nodes and links between them is fixed and cannot be optimized for routing purposes
- this is a very different model from that of previous versions of Freenet, and of other DHT systems that explicitly construct a graph so as to make efficient routing possible
WHY A NEW ROUTING ALGORITHM?
- the darknet philosophy does not allow nodes to define connections with untrusted nodes (long range links)
- the routing algorithm must exploit only connections to friends
Some major modifications are needed to guarantee Kleinberg-based (small world) routing:
- the efficiency of routing depends on the structure of the fixed network
- if there is no short path between two nodes, content inserted at one node cannot be reached from the other in a small number of steps, regardless of how well we route
- but...the graph is a subgraph of the world's social network, and this has been shown to be a small world!
- so it must be possible to define an efficient routing algorithm!
FREENET 0.7: LOCATIONS SWAPPING
- Kleinberg's model suggests that there should be few long connections and many short ones
- we can assign numerical identities that place the nodes on a circle, and do it in such a way that this is fulfilled
- in other words, we reverse engineer the nodes' positions from the connections in the network
- each node will have many connections to close neighbours (close with respect to the location key): most linked friends will have close location keys
- each node will also have a set of long range links: some linked friends will have distant location keys
- then route greedily with respect to these numerical identities
FREENET 0.7: LOCATIONS SWAPPING
- when nodes join the network, they choose a position on the circle at random
- then nodes periodically switch positions with other nodes, so as to minimize the product of the edge distances
- each connected pair of nodes (u,v) computes the ratio R between two products: the product of the distances of u and v from their own neighbours with the current locations, divided by the same product after swapping, that is with u placed at the location of v and v at the location of u
- if R > 1 the two nodes swap their location keys, otherwise they swap with probability R
- switching improves routing performance by giving close friends close identifiers
- some long range links are still present, due to the probabilistic choice
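The swap decision can be sketched in a few lines. This is an illustration, not Freenet's actual code: it assumes locations in [0, 1) measured with the circular distance, distinct locations for all nodes (so no distance is zero), and hypothetical dictionaries `location` and `neighbours` describing the friend graph.

```python
import math
import random

def circular_distance(a, b):
    """Distance between two location keys on the unit circle."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def distance_product(node, location, neighbours):
    """Product of the distances between a node and each of its friends."""
    return math.prod(circular_distance(location[node], location[f]) for f in neighbours[node])

def swap_ratio(u, v, location, neighbours):
    """R = product of distances before the swap / product of distances after it."""
    before = distance_product(u, location, neighbours) * distance_product(v, location, neighbours)
    swapped = dict(location)
    swapped[u], swapped[v] = location[v], location[u]
    after = distance_product(u, swapped, neighbours) * distance_product(v, swapped, neighbours)
    return before / after

def maybe_swap(u, v, location, neighbours, rng=random):
    """Swap the location keys if R > 1, otherwise swap with probability R."""
    r = swap_ratio(u, v, location, neighbours)
    if r > 1 or rng.random() < r:
        location[u], location[v] = location[v], location[u]
        return True
    return False

# u sits far from its friends a, b; v sits far from its friends c, d.
location = {"u": 0.10, "v": 0.81, "a": 0.80, "b": 0.82, "c": 0.11, "d": 0.12}
neighbours = {"u": ["a", "b"], "v": ["c", "d"]}
print(swap_ratio("u", "v", location, neighbours) > 1)   # True
```

In the example each node starts close to the other's friends, so swapping shrinks every edge and R is far above 1. Only the location keys move; the `neighbours` dictionary is never modified.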
FREENET 0.7: SWITCHING EXAMPLE
[figure]
AN ADVANTAGEOUS POSITIONS SWAP
The red node swaps its location with the green node, but its neighbours do not change!
FREENET 0.7: LOCATIONS SWITCHING
- swapping location keys does not alter the physical connectivity of the network: with regard to who is whose neighbour, the network is the same after a swap as it was before
- from an operational standpoint, a major consequence of swapping the location keys is the possible migration of data objects from one node to another
- data objects are stored according to the closeness of their hash values to the location key of a node, so two nodes that swap their location keys would also need to swap their data objects
- switching is an ongoing process: as the network grows and shrinks, it would be difficult to maintain permanent positions
FREENET 0.7: LOCATIONS SWITCHING
Theorem [Sandberg]: the swapping action at every pair of nodes will eventually cause the location keys to converge to a state in which the routing needed for GET and PUT requests takes only O(log N) steps with high probability.
FREENET 0.7 ROUTING
- similar to 0.5 routing, but with no adaptivity (links to friends only)
- data is identified by keys
- data is stored at the node with the closest identifier (the closest library)
- depth-first routing in order of proximity to the key (similar to 0.5): the request is routed to the neighbour whose location is closest to the key
- forwarding stops when the data is found, when the HTL (hops to live) reaches zero, or when an identical request was recently forwarded (to avoid circular routing)
- note that a node may reset the HTL to its original value in order to extend a path
FREENET 0.7 GET REQUESTS
- let's consider a GET request issued at F, with HTL equal to 2, for a data object whose hash key is 0.10
- F is only allowed to talk to D, so D receives the request with HTL = 1
- D does not hold the object and forwards the request to G, whose location key (0.73) is the closest to the data key
FREENET 0.7 GET REQUESTS
- G does not find the data object in its store; before responding negatively to the GET request, it checks the location keys of all its neighbours (excluding the one from which the request was received)
- G discovers that B's location key is closer to the requested data key 0.10 than its own location key
- so it sends the GET request to B, after resetting the HTL to its original value of 2
FREENET 0.7 GET REQUESTS
- the search continues in this way until the HTL becomes zero and the location keys of all the neighbours are further away from the data key than that of the node currently holding the request
- at that point, the current node either responds with the data object, if it exists in its store, or reports that the data object does not exist
FREENET 0.7 GET REQUESTS
- note that the data object with key 0.10 is stored at node A, but the search path fails to reach that node
- there is no theoretical guarantee that a data object will be found, or that a data object will be stored at its globally best node
FREENET 0.7 PUT REQUESTS
PUT requests are routed in the same way as GET requests:
- the client initiates the PUT request
- the request is routed to the neighbour closest to the key
- if the receiver has any peer whose location is closer to the key, the request is forwarded to it
- if not, the node resets the HTL to the maximum and sends the PUT request to all of its neighbours
- routing continues until the HTL reaches zero (or the node has already seen the request)
- once the item is inserted at a node, the node resends the request to all its known peers (replication)
FREENET 0.7 PUT REQUESTS
[figures]
IS FREENET 0.7 A SMALL WORLD?
- defining a small world network is now more complex than in Freenet 0.5: no link to untrusted nodes can be established
- basic idea: each node must have a set of short range links and some long range links
- since the connections between nodes cannot be modified, the location keys of the nodes are modified dynamically, so as to define a Kleinberg-based overlay
APPLICATIONS BUILT ON FREENET 0.7
- Freesites: internal websites, the equivalent of the WWW; FProxy: freesite browser; jSite: freesite creator
- Frost: message board/chat system; feature rich, used for file sharing; convenient access to FS API
- Thaw: GUI for file sharing; upload/download/search
- Freemail: email between users; uses a normal email client
All applications are usable ONLY on the Freenet network.
CONCLUSIONS
Structured overlays (DHTs) and semi-structured overlays:
- a node chooses an ID, then chooses its neighbours according to the chosen ID
- neighbours can change: for instance, connections with new neighbours can be defined during the routing of queries
Friend-to-friend overlays:
- the neighbourhood is restricted to the friends and changes only when a new trusted friend joins the network
- the node ID may change during the computation: this is exploited to obtain logarithmic bounds