Load balancing mechanism in range query enabled P2P networks 2009. 08. 27 Park, Byunggyu
Contents
- Background
  - DHT (Distributed Hash Table)
- Motivation
- Proposed scheme
  - Compression based Hashing
  - Load balancing Technique using FRI (Fixed Routing Identifier)
- Summary
Background: What is DHT?
- A DHT provides the object lookup service for P2P applications
- Provides two primitives: put(k, v) and get(k)
- Scalable: O(log N) routing cost
- Provides load balancing
  - Consistent hash function: O(log N) imbalance
- Supports only point queries
- E.g. Chord, CAN, Pastry, Tapestry, etc.
[Figure: a distributed application calls put(key, data) and get(key) on the distributed hash table, which holds the data on its nodes]
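The two primitives can be illustrated with a minimal single-process sketch. It is not any particular DHT: the class name `ToyDHT`, the helper `ring_id`, the use of SHA-1, and the 16-bit ring are all assumptions made for illustration, and real systems such as Chord route between machines in O(log N) hops instead of looking up a local list.

```python
import bisect
import hashlib


def ring_id(key, bits=16):
    """Hash a string onto a 2**bits identifier ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical stand-in for a DHT: one process, no network routing."""

    def __init__(self, node_names):
        # A node is responsible for keys between its predecessor's ID and its own ID.
        self.ring = sorted((ring_id(name), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def owner(self, key):
        ids = [node_id for node_id, _ in self.ring]
        # First node at or after the key's ID, wrapping around the ring.
        index = bisect.bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.store[self.owner(key)][key] = value

    def get(self, key):
        return self.store[self.owner(key)].get(key)


dht = ToyDHT([f"node-{i}" for i in range(8)])
dht.put("printer", "laser")
print(dht.get("printer"))   # laser
print(dht.get("printers"))  # None
```

The second lookup returns nothing because hashing scatters similar keys: only an exact key (point query) can be resolved.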
Background
- Because DHT query support is inflexible, applications can be restricted
- Many current applications require complex query support
  - Multiple keyword
  - Range
  - Semantic
[Figure: example query with the attributes Device: printer, Type: laser and a PPM condition, matched against a printing device described as Device: printer, Type: laser]
Background: How to enable flexible query support in DHT?
- Hashing the object namespace onto the node ID space (DHT)
  - Loses order semantics
  - Supports only point queries
  - O(log N) load imbalance
- Order preserving mapping of the object namespace onto the node ID space
  - Supports complex queries
  - Serious load imbalance
Motivation
1) Order-preserving mapping can provide flexible queries
2) It causes a load imbalance problem because the data is clustered
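A small sketch may make the trade-off concrete. Everything in it is hypothetical: the four-node network, the keyword sample, and the naive order-preserving mapping that slices the alphabet by first letter.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical network size


def hashed_node(key):
    """Ordinary hashing: spreads keys out but destroys their order."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NODES


def ordered_node(key):
    """Naive order-preserving mapping: split a..z into NODES contiguous slices."""
    return (ord(key[0]) - ord("a")) * NODES // 26


def nodes_for_range(low_key, high_key):
    """A range query only touches the contiguous nodes between its two ends."""
    return list(range(ordered_node(low_key), ordered_node(high_key) + 1))


# Hypothetical clustered sample: most keys start with the same letter.
keys = ["paper", "parse", "peer", "print", "printer", "proxy", "put", "query"]

print(Counter(ordered_node(k) for k in keys))  # Counter({2: 8})
print(nodes_for_range("pa", "pz"))             # [2]
print(Counter(hashed_node(k) for k in keys))   # spread depends on the hash
```

The order-preserving mapping answers the range query by contacting a single node, but that node also holds every key in the sample, which is the load imbalance named in point 2.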
Proposed Scheme
- Compression based Hashing
  - Based on arithmetic coding
  - Order-preserving mapping that balances load
- Load balancing Technique using FRI (Fixed Routing Identifier)
  - Based on virtual servers
Compression based hashing
- Maps the object namespace onto the node ID space
  - Order preserving mapping
  - Supports complex queries
  - Relaxed load imbalance

Symbol  Probability  Range
a       0.80         [0.00, 0.80)
b       0.02         [0.80, 0.82)
c       0.18         [0.82, 1.00)

<Arithmetic coding> Encoding "acb": a narrows [0.00, 1.00) to [0.00, 0.80), c narrows that to [0.656, 0.80), and b narrows that to [0.7712, 0.77408). The compressed value given for "acb" is 0.773504.
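The interval narrowing in the arithmetic coding example can be reproduced with a short sketch. The symbol table is taken from the slide; the function name `encode_interval` and the use of exact fractions are illustrative choices, and the sketch computes only the interval, not how a single value inside it is picked.

```python
from fractions import Fraction

# Symbol table from the slide: symbol -> (start of its range, probability).
TABLE = {
    "a": (Fraction(0), Fraction(80, 100)),
    "b": (Fraction(80, 100), Fraction(2, 100)),
    "c": (Fraction(82, 100), Fraction(18, 100)),
}


def encode_interval(text):
    """Narrow [0, 1) once per symbol and return the final [low, high) interval."""
    low, width = Fraction(0), Fraction(1)
    for symbol in text:
        start, probability = TABLE[symbol]
        low += width * start      # move to the symbol's sub-range
        width *= probability      # shrink by the symbol's probability
    return low, low + width


low, high = encode_interval("acb")
print(float(low), float(high))  # 0.7712 0.77408

# Lexicographic order of the strings matches the order of their lower bounds.
words = ["ca", "acb", "b", "ab"]
print(sorted(words) == sorted(words, key=lambda w: encode_interval(w)[0]))  # True
```

Frequent symbols get wide sub-ranges, so a crowded part of the namespace is stretched over more of the ID space while order is kept.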
Compression based Hashing
- System pre-processing
  - Sample data (e.g. Politic, Sports, Society, Finance, Culture, ...; Books, Music, Law, education, ...)
  - Construct a trie and calculate the frequency of each symbol
  - Get the compressed value of the corpus using arithmetic coding
- Peer processing
  - Binary representation of the compressed value
[Figure: trie built over the sample data, and compressed values placed on the interval from 0.00 to 1.00]
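The two stages can be sketched roughly as follows. This is a simplification under stated assumptions: symbols are counted per character instead of through a trie, and the names `build_table` and `to_binary` as well as the sample words are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def build_table(samples):
    """Pre-processing: estimate symbol probabilities from sample data.

    Simplification: counts single characters instead of building a trie.
    """
    counts = Counter("".join(samples))
    total = sum(counts.values())
    table, start = {}, Fraction(0)
    for symbol in sorted(counts):  # sorted symbols keep the mapping order-preserving
        probability = Fraction(counts[symbol], total)
        table[symbol] = (start, probability)
        start += probability
    return table


def to_binary(value, bits):
    """Peer processing: first `bits` binary digits of a value in [0, 1)."""
    digits = []
    for _ in range(bits):
        value *= 2
        bit = int(value >= 1)
        digits.append(str(bit))
        value -= bit
    return "".join(digits)


table = build_table(["books", "music", "law"])  # hypothetical sample data
print(sum(p for _, p in table.values()))        # 1
print(to_binary(Fraction(3, 4), 4))             # 1100
```

The resulting table has the same shape as the symbol, probability, and range table used for arithmetic coding, so it could feed an interval encoder directly.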
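Compression based Hashing: Lookup process
1) Lookup(K_D): the lookup starts from the metadata K_D
2) Compression(K_D): produces the compressed value K_D(C)
3) Translate(K_D(C)): produces the binary representation K_D(2)
4) Get(K_D(2)): issued on the ring
[Figure: Chord ring with nodes N1, N8, N14, N21, N32, N38, N42, N48]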
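A minimal sketch of the last two steps, assuming 6-bit identifiers (IDs 0 to 63) for the ring in the figure; the slide does not state the identifier length, and the function names are illustrative. The compressed value 0.773504 is the one from the arithmetic coding example.

```python
import bisect
from fractions import Fraction

RING_BITS = 6                              # assumption: identifiers run from 0 to 63
NODE_IDS = [1, 8, 14, 21, 32, 38, 42, 48]  # node IDs shown on the ring


def translate(compressed, bits=RING_BITS):
    """Keep the first `bits` binary digits of a value in [0, 1) as an integer ID."""
    return int(compressed * 2 ** bits)


def successor(identifier):
    """First node at or after the identifier, wrapping around the ring."""
    index = bisect.bisect_left(NODE_IDS, identifier) % len(NODE_IDS)
    return NODE_IDS[index]


key_id = translate(Fraction(773504, 1000000))
print(key_id, successor(key_id))  # 49 1
```

Under these assumptions the key lands past N48, so the Get wraps around the ring to N1. In Chord the successor would be found by routing over finger tables, not by searching a global list.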
Evaluation: Data uniformity
- Training data set: ACM keywords
Evaluation
- Unbalance factor $(L_i - E)^2 / E$ of the Brown corpus
- Training data set: ACM keywords
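Reading the unbalance factor as (L_i - E)^2 / E, it can be computed per node as in the sketch below. The slide does not define the symbols, so taking L_i as the load on node i and E as the mean load over all nodes is an assumption, as are the function name and the sample loads.

```python
def unbalance_factors(loads):
    """Per-node (L_i - E)^2 / E, assuming E is the mean load over all nodes."""
    expected = sum(loads) / len(loads)
    return [(load - expected) ** 2 / expected for load in loads]


print(unbalance_factors([10, 10, 10, 10]))  # [0.0, 0.0, 0.0, 0.0]
print(unbalance_factors([0, 20]))           # [10.0, 10.0]
```

A perfectly uniform distribution scores zero on every node, and the score grows quadratically with the distance from the expected load.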
Compression based Hashing
- Advantages of CBH
  - Obtains a uniform data distribution
  - Order-preserving mapping: supports complex queries
  - Flexible: can be applied to different types of data model
- Disadvantages
  - Requires a training data set
  - Performance depends on the accuracy of the sample data
Load balancing Techniques in DHT
- Selective node join
- Node migration
- Support dynamic load balancing
- Virtual server
  - Provides fine-grained load balancing
  - No change in the underlying DHT
  - High maintenance cost
    - O(log N) virtual servers per physical node
    - O(log^2 N) routing entries per node
  - Long query routing length
  - Unstable routing under dynamic node leave/join
- Simple, but hard to provide fine-grained load balancing
- Replication/Cache
Load balancing Techniques in DHT: Virtual server
- Logical node in the DHT
- Transfer unit used to balance load
[Figure: <Logical view of V.S>, <Physical view of V.S>, <Chord with V.S>, showing Node 1 to Node 7, the physical nodes Node A, Node B, Node C, and the network]
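The idea of a virtual server as the unit of transfer can be sketched as one balancing step. The selection rule, the function name `rebalance_once`, the assignment of virtual servers to physical nodes, and all load values are hypothetical; the slides do not specify them.

```python
def rebalance_once(hosts):
    """Move one virtual server from the heaviest to the lightest physical node.

    hosts: physical node name -> {virtual server name: load}.
    Returns (virtual server, source, target), or None if no move helps.
    """
    total = {name: sum(servers.values()) for name, servers in hosts.items()}
    heavy = max(total, key=total.get)
    light = min(total, key=total.get)
    gap = total[heavy] - total[light]
    # Only a virtual server lighter than the gap reduces the imbalance.
    candidates = {vs: load for vs, load in hosts[heavy].items() if 0 < load < gap}
    if not candidates:
        return None
    # A load close to gap / 2 evens out the two nodes best.
    chosen = min(candidates, key=lambda vs: abs(candidates[vs] - gap / 2))
    hosts[light][chosen] = hosts[heavy].pop(chosen)
    return chosen, heavy, light


hosts = {  # hypothetical placement and loads
    "Node A": {"Node 1": 45, "Node 2": 30, "Node 3": 15},
    "Node B": {"Node 4": 10, "Node 5": 10},
    "Node C": {"Node 6": 15, "Node 7": 25},
}
print(rebalance_once(hosts))  # ('Node 2', 'Node A', 'Node B')
print({name: sum(servers.values()) for name, servers in hosts.items()})
# {'Node A': 60, 'Node B': 50, 'Node C': 40}
```

Because whole virtual servers move, the granularity of balancing depends on how many of them each physical node hosts.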
Load balancing Technique using FRI
- Nodes are classified into two types
  - One physical node has one routing node and several storage nodes
- Routing node
  - Has a fixed routing identifier
  - Maintains O(log N) routing entries
- Storage node (virtual server)
  - Has a storage identifier and shares the routing identifier
  - Maintains a constant number of routing entries: predecessor of routing node + successor of routing node
  - Can be migrated to another node
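The two node types could be modelled as in the sketch below. The field and function names are hypothetical, and one behaviour is assumed and not stated on the slide: after migration a storage node takes over the routing identifier, predecessor, and successor of its new routing node.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    storage_id: int
    shared_fri: int      # routing identifier shared with the hosting routing node
    predecessor: int     # predecessor of the routing node
    successor: int       # successor of the routing node
    items: dict = field(default_factory=dict)


@dataclass
class RoutingNode:
    fri: int             # fixed routing identifier
    predecessor: int
    successor: int
    fingers: list = field(default_factory=list)        # O(log N) routing entries
    storage_nodes: list = field(default_factory=list)

    def attach(self, node):
        """Host a storage node; it keeps only a constant number of entries."""
        node.shared_fri = self.fri
        node.predecessor = self.predecessor
        node.successor = self.successor
        self.storage_nodes.append(node)


def migrate(node, source, target):
    """Move a storage node, with its data, to another routing node (assumed semantics)."""
    source.storage_nodes.remove(node)
    target.attach(node)


n1 = RoutingNode(fri=1, predecessor=48, successor=8)
n8 = RoutingNode(fri=8, predecessor=1, successor=14)
snode = StorageNode(storage_id=5, shared_fri=0, predecessor=0, successor=0, items={"k": "v"})
n1.attach(snode)
migrate(snode, n1, n8)
print(snode.shared_fri, snode.predecessor, snode.successor)  # 8 1 14
```

Only the storage node moves; the routing identifiers of N1 and N8 stay fixed, so other nodes' finger tables are unaffected by the migration.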
Load balancing Technique using FRI
- Load balancing based on V.S
[Figure: physical nodes on the network, each with an R.Node and several S.Nodes; one node is marked "Overloaded!"]
Routing in Chord using FRI: Routing table structure

<Routing table of Routing node N1>
Interval  F.R.I
1         N8
2         N8

Nodes    S.R.I
Snode 0  S.R.I 0
Snode 1  S.R.I 1

<Routing table of Storage node>
Type  Shared F.R.I  Successor
R     N1            N8

[Figure: Chord ring with N1, N8, N14, N21, N32, N38, N42, N48; legend: routing node, storage node]
Routing in Chord using FRI: Routing example
[Figure: Chord ring with N1, N8, N14, N21, N32, N38, N42, N48; legend: routing node, storage node]
Load balancing in Chord using FRI
- Load information gathering
  - log(N) pieces of load information from the finger table
  - Used to reassign storage nodes

<Finger table of N1>
Interval  F.R.I  L
1         N8     L
2         N8     L

[Figure: Chord ring with N1, N8, N14, N21, N32, N38, N42, N48; legend: routing node, storage node]
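One way to picture the gathering step: the routing node already knows its finger entries, so it can ask each of them for a load value and hand a storage node to the least loaded one. The sketch assumes 6-bit identifiers and the standard Chord finger rule; the load values and function names are hypothetical, and the slide does not say how the reassignment target is chosen.

```python
import bisect

RING_BITS = 6                              # assumption: identifiers run from 0 to 63
NODE_IDS = [1, 8, 14, 21, 32, 38, 42, 48]  # node IDs shown on the ring


def finger_table(node_id):
    """Standard Chord fingers: successor of node_id + 2**i for each i."""
    rows = []
    for i in range(RING_BITS):
        start = (node_id + 2 ** i) % 2 ** RING_BITS
        index = bisect.bisect_left(NODE_IDS, start) % len(NODE_IDS)
        rows.append((i + 1, NODE_IDS[index]))
    return rows


def least_loaded(rows, reported_load):
    """Pick the finger with the smallest reported load (ties go to the lower ID)."""
    candidates = {fri for _interval, fri in rows}
    return min(candidates, key=lambda fri: (reported_load[fri], fri))


rows = finger_table(1)
reported_load = {8: 70, 14: 20, 21: 55, 38: 35}  # hypothetical load reports
print(rows)  # [(1, 8), (2, 8), (3, 8), (4, 14), (5, 21), (6, 38)]
print(least_loaded(rows, reported_load))  # 14
```

The first two rows match the finger table of N1 on the slide (intervals 1 and 2 both point at N8). The sample covers only the nodes the fingers reach, which is why how those fingers are spread over the ring matters.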
Load balancing in Chord using FRI
- Clustered routing nodes vs uniform routing nodes
  - Skewed finger pointers vs highly popular region
- Clustered routing nodes
  - Cannot guarantee log N routing hops
  - Hard to get overall load information
- Uniform routing nodes
  - Guarantee log N routing hops
  - Easy to get overall load information
[Figure legend: routing node, storage node]
Load balancing in Chord using FRI: Node join
- Random join (h_c(IP) = F.R.I)
- Sequential join
- Routing node join
  - Uniform routing node distribution
  - Efficient load sampling
  - Optimal routing
[Figure: joining nodes marked J on the ring]
Load balancing Technique using FRI
- Advantages of FRI
  - Includes all advantages of the virtual server scheme
    - Fine-grained load balancing
    - General and flexible
  - Solves inherent problems of virtual servers
    - Still provides O(log N) maintenance cost per physical node
    - Stable routing
    - Shorter query routing path
- Disadvantage
  - Change in the underlying DHT: routing algorithm
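A back-of-the-envelope comparison of the routing state per physical node, as a rough sketch. The constants are assumptions: big-O factors are taken as 1, each storage node is charged two entries (predecessor and successor of the routing node), and the function names and network size are hypothetical.

```python
import math


def virtual_server_entries(n):
    """About log N virtual servers, each with its own log N routing entries."""
    k = math.ceil(math.log2(n))
    return k * k


def fri_entries(n, storage_nodes, per_storage_node=2):
    """One routing node with log N entries plus a constant per storage node."""
    return math.ceil(math.log2(n)) + per_storage_node * storage_nodes


n = 1024  # hypothetical network size
print(virtual_server_entries(n))         # 100
print(fri_entries(n, storage_nodes=10))  # 30
```

With the same number of virtual servers per physical node, the state grows linearly in log N under these assumptions instead of quadratically.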
Evaluation
- Load distribution as the number of storage nodes increases
[Figure: number of data items (0 to 100) against node distribution (0 to 500); series as extracted: FRI with 500 Rnodes 3 Snodes, FRI with 500 Rnodes 2 Snodes, FRI with 500 Rnodes 3 Snodes, Original chord with 500 nodes]
Evaluation
- Average routing path length of FRI vs Virtual Server
[Figure: average routing path length (0 to 9) against number of V.S (2, 4, 8); series: Original Chord, Chord with FRI, Chord with V.S]
Summary
- To satisfy various application requirements, complex queries should be supported in the DHT layer
  - Order-preserving mapping
  - Load balancing problem due to skewed data distribution
- Compression based hashing
  - Provides order-preserving mapping
  - Provides uniform data distribution
- Load balancing Technique using FRI
  - Dynamic load balancing based on virtual servers
  - Reduces maintenance overhead: O(log N) + C routing entries
  - Fine-grained balancing
  - Shorter routing path
Future Work
- C.B.H
  - How to guarantee the accuracy of sample data?
  - Comparison with other order-preserving hashing functions
- Load Balancing using F.R.I
  - Comparison with the V.S scheme
    - Balancing ratio (number of nodes, storage nodes, data)
    - Balancing overhead (dynamicity of P2P...)
    - Maintenance cost
Thank you