1 Shuang Kai, 2 Qu Zheng
*1 Shuang Kai, Beijing University of Posts and Telecommunications, shuangk@bupt.edu.cn
2 Qu Zheng, Beijing University of Posts and Telecommunications, buptquzheng@gmail.com

Abstract

Compared to a flat DHT-based P2P network, a hierarchical DHT-based P2P network can take advantage of stable, high-performance super nodes to improve performance in a dynamic environment. However, node load imbalance is one of the basic problems faced by hierarchical DHT-based P2P networks. In this paper, we propose a load balancing method for the Single-connection intra-group structure (SiCo) of the hierarchical DHT-based P2P network. The method predicts the future status of a node by using an improved LMS algorithm and includes the prediction in the node's feedback. The simulation results show that in the SiCo hierarchical DHT-based P2P network, the average load of the system is greatly reduced, the predicted results reflect the state of the node more accurately, and the probability of successful task completion is greatly increased.

Keywords: Hierarchical DHT-based P2P Network, Load Balancing, LMS, P2P

1. Introduction

In distributed hash table (DHT) based P2P networks (also referred to as structured P2P networks or DHT networks), all nodes, regardless of their computing capacity and stability, bear the same functional role: each is responsible for the registration and discovery of keywords in the hash space. Typical systems are Chord [1] and KAD [2]. However, it has been shown that (1) the computing capabilities of nodes in a P2P network (including storage, bandwidth, and CPU performance) differ widely, and (2) because nodes can join and leave at any time, their stability also differs widely. As a P2P network grows, weak nodes, which have poor computing ability or change dynamically, severely restrict its performance.
To solve this problem, researchers have proposed many kinds of hierarchical P2P networks. Typical systems are Chord2 [3], HIERAS [4], and structured super-peers [5]. The main advantages of the hierarchical P2P network are: (1) stable, high-performance nodes are selected as super nodes; (2) compared to the flat DHT-based P2P network, the maintenance cost and the search count are smaller; (3) it can deal effectively with churn, the serious performance degradation caused by nodes frequently joining and leaving. Other advantages of the hierarchical P2P network are discussed in more detail in [6].

A hierarchical DHT-based P2P network logically divides nodes into an upper and a lower layer. Lower-layer nodes are often referred to as normal nodes or leaf nodes. All nodes participate in the lower layer at first. Stable, high-performance nodes are then selected as super nodes and organized into the upper layer in accordance with the DHT protocol. In addition to carrying out the upper-layer DHT protocol, every super node manages a set of leaf nodes, including registering and finding keywords on their behalf.

Hierarchical DHT-based P2P networks come in many structures, each with different advantages and disadvantages. Details are given in Section 2 (Related Work). This paper discusses only the structure shown in Figure 1 in Section 2, and the proposed load balancing method is designed for this structure. In it, each leaf node connects only with its super node, and the super nodes are organized into the upper layer in accordance with the DHT protocol.

The remainder of the paper is organized as follows: Section 2 discusses the related work. Section 3 describes the improved load balancing algorithm.
Section 4 simulates and compares the improved and the traditional load balancing algorithms. Finally, Section 5 gives the conclusion.

Advances in Information Sciences and Service Sciences (AISS), Volume 5, Number 10, May 2013, doi:10.4156/aiss.vol5.issue10.127
2. Related Work

So far, researchers have proposed many structures for hierarchical DHT-based P2P networks, all of which logically divide nodes into an upper and a lower layer. Super nodes are in the upper layer and are organized in accordance with the DHT protocol. In most hierarchical DHT-based P2P networks, super nodes are organized in accordance with the Chord protocol, but in structured super-peers, super nodes are organized as a full graph. The organizational structure of the lower layer falls into two categories. In the first, the leaf nodes are connected to each other in accordance with the DHT protocol. In the second, leaf nodes managed by different super nodes are disjoint, and each leaf node connects only with its super node, as shown in Figure 1. In this paper, we discuss only this second kind of hierarchical DHT-based P2P network, the Single-connection intra-group structure (SiCo).

Figure 1. The SiCo Hierarchical DHT-based Network

In the ideal case, load balancing means that each node bears load in proportion to its carrying capacity [7]. So far, researchers have proposed many load balancing methods for flat DHT-based P2P networks, but load balancing methods for hierarchical DHT-based P2P networks are very limited, although [8][9][10] point out the importance of such networks. In flat DHT-based P2P networks, many load balancing methods use virtual servers, which were first proposed in [11] and extended in [12][13]. The basic idea is to redistribute the hash space and keywords of a node by means of virtual servers. In essence, load balancing is reached by virtual servers frequently joining and leaving the network, which increases network churn and maintenance cost. This method is therefore not suitable for the SiCo hierarchical DHT-based P2P network. This paper intends to solve the load balancing problem for the structure in Figure 1.

3. The Improved Load Balancing Algorithm

3.1. Node Selection

Normally, when the load in SiCo is calculated, only a node's previous information is considered, such as the completion status of its tasks or its processing capability, and this information lags behind the node's actual state. By the next round of load calculation, the status of the normal nodes has changed, so the information that the super node holds cannot reflect the nodes' real-time status. To make the system more real-time, when the super node collects information from normal nodes, it can state specific requirements, such as the number of currently queued tasks or the processing capacity. When a normal node submits its status information, it should tell the super node its processing capacity, the number of currently queued tasks, and other relevant information.

In the traditional hierarchical DHT-based P2P network, only the current information of a node is considered when its load is calculated. However, calculating the load and transmitting the data take some time, and the status of a node may change in the meantime.
Therefore, we predict the future status of the node and add the prediction to the feedback, so that the feedback reflects the node's information more fully.

3.2. Load Calculation Algorithm

In this paper, normal nodes use the LMS algorithm to predict their condition and then submit the prediction to the super node. We make some improvements to the traditional LMS algorithm to make it better suited to the characteristics of the hierarchical DHT-based P2P network.

The LMS (Least Mean Square) algorithm [14] was proposed by Widrow and Hoff in 1960. It is an algorithm based on error-correction learning. Because of its simple structure, good stability, and ease of implementation, it has been widely used. In LMS, $X(n)$ is the input sample, $W(n)$ is the weight applied to $X(n)$, $d(n)$ is the desired output, and $e(n)$ is the output error. The goal of the algorithm is to adjust $W(n)$ continuously using $X(n)$ and $d(n)$ so that $e(n)$ becomes as small as possible.

When calculating the load of a normal node, we usually consider the processing capacity and the number of queued tasks. Suppose that the data transmission process takes time $\Delta t$ and that the $n$th task begins at time $t_n$. Then $X(n)$ = (the processing capacity at time $t_n - \Delta t$, the number of queued tasks at time $t_n - \Delta t$), $d(n)$ is the node's condition at time $t_n$, and $W(n)$ represents the correspondence between $X(n)$ and the actual output.

3.2.1. The Weight Adjustment Mechanism Based on Forgotten Degree

From the traditional weight correction formula

$$W(n+1) = W(n) + \eta X^{T}(n) e(n),$$

where $\eta$ is the learning rate, we can see that the $(n+1)$th weight depends only on the $n$th weight and the input sample. In a DHT network, however, the future condition is related to a series of earlier conditions. If we consider only the $n$th condition, we get inaccurate results. We use a constant $\lambda$ as the forgotten degree, which represents the effect of the previous $n-1$ conditions on the $(n+1)$th condition.
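As a point of reference, the traditional LMS update stated above can be sketched in pure Python. This is a minimal sketch, not the simulation code of this paper: the names `lms_step` and `lms_fit`, the learning rate symbol `eta` and its value, the zero initial weights, and the toy samples are all illustrative assumptions. The error is the standard LMS error, the desired output minus the weighted input, and the update is written component-wise, so each weight moves by `eta * x_j * e`.

```python
def lms_step(w, x, d, eta):
    """One traditional LMS update for a single sample.

    w: current weights W(n); x: input sample X(n); d: desired output d(n);
    eta: learning rate (symbol and value are assumptions of this sketch).
    Returns (W(n+1), e(n)).
    """
    y = sum(wj * xj for wj, xj in zip(w, x))   # weighted input X^T(n) W(n)
    e = d - y                                   # error e(n) = d(n) - X^T(n) W(n)
    w_next = [wj + eta * xj * e for wj, xj in zip(w, x)]
    return w_next, e


def lms_fit(samples, eta=0.1, epochs=200):
    """Apply lms_step repeatedly over (x, d) pairs, starting from zero weights."""
    w = [0.0] * len(samples[0][0])   # zero start is an assumption of this sketch
    for _ in range(epochs):
        for x, d in samples:
            w, _ = lms_step(w, x, d, eta)
    return w


if __name__ == "__main__":
    # Hypothetical samples: x = (processing capacity, queued tasks), scaled to
    # small values; d is generated here as 0.5 * x[0] + 0.25 * x[1].
    samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.25), ([1.0, 1.0], 0.75)]
    print(lms_fit(samples))
```

Because each update uses only the current weights and the current sample, this rule has no memory of earlier weights, which is the limitation the forgotten degree is meant to address.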
The improved weight adjustment formula is as follows:

$$W(n+1) = (1-\lambda)\,W(n) + \lambda\,S(n-1) + \eta\,X^{T}(n)\,e(n), \quad 0 < \lambda < 1 \qquad (1)$$

Here $\lambda$ is the forgotten degree, $\eta$ is the learning rate, and $S(n-1) = \sum_{i=1}^{n-1} \mathrm{e}^{\,i-n}\,W(i)$, where $\mathrm{e}$ is the base of the natural logarithm; $S(n-1)$ represents the influence of the previous $n-1$ weights on the current result. Since

$$S(n) = \sum_{i=1}^{n} \mathrm{e}^{\,i-n-1}\,W(i) = \mathrm{e}^{-1}\bigl(S(n-1) + W(n)\bigr),$$

calculating the weight requires only $S(n-1)$ rather than every earlier weight. The influence of $W(i)$ on $W(n)$ decreases as time increases.

Assume that in the DHT network the requests issued by the super node follow a Poisson distribution with parameter $\theta$. The tasks allocated to each normal node then approximately follow a Poisson distribution with parameter $\theta/N$, where $N$ is the total number of normal nodes. The larger $\theta$ is, the larger the interval between two tasks and the less the last task influences the current task; the smaller $\theta$ is, the opposite holds. We set the forgotten degree $\lambda$ as a function of $\theta/N$ so that the influence of the $(n-1)$th condition on the $(n+1)$th value is negatively correlated with $\theta$ and, at the same time, the influence of $S(n-1)$ on $W(n+1)$ is much smaller than that of $W(n)$.

3.2.2. Adaptive Learning Rate

Since the processing capacity of nodes in a DHT network is limited, it is very important to reduce the run time and the load. In LMS, the learning rate $\eta$ determines the convergence speed. The smaller $\eta$ is, the more slowly the algorithm converges but the more stable it is; the larger $\eta$ is, the opposite holds. To achieve both faster convergence and better stability, [15] proposed an adaptive learning rate LMS algorithm, whose core formula is as follows:
$$\eta(n+1) = \begin{cases} \eta_{\min}, & \eta'(n+1) < \eta_{\min} \\ \eta'(n+1), & \text{else} \\ \eta_{\max}, & \eta'(n+1) > \eta_{\max} \end{cases} \qquad (2)$$

$$\eta'(n+1) = \alpha\,\eta(n) + \gamma\,e^{2}(n), \quad 0 < \alpha < 1,\ \gamma > 0 \qquad (3)$$

In this algorithm, when $e(n)$ is large, $\eta$ also becomes large, which ensures fast convergence; when $e(n)$ becomes small, $\eta$ is also small, which ensures good stability. On the basis of the algorithm in [15], this paper puts forward an improved adaptive learning rate LMS algorithm:

$$\eta'(n+1) = \alpha\,\eta(n) + \gamma\,\mathrm{e}^{\,e(n)-\delta}\,e^{2}(n) \qquad (4)$$

Here $\mathrm{e}$ is the base of the natural logarithm, $\delta$ is a constant, and $\alpha$, $\gamma$, $\eta_{\min}$, and $\eta_{\max}$ are the same as the parameters in [15]. From formulas (4) and (3), we can see that for a fixed $\eta(n)$, if $e(n) > \delta$, the learning rate in (4) is larger than that in (3), which makes convergence faster and saves task running time; if $e(n) < \delta$, the learning rate in (4) is smaller than that in (3), and since the curve of $\mathrm{e}^{\,e(n)-\delta}\,e^{2}(n)$ is smoother than that of $e^{2}(n)$, the algorithm is more stable than the basic adaptive learning rate algorithm.

3.2.3. The LMS-based Load Calculation Steps

a) Initialize, including assigning each $W_j(0)$ a small random nonzero value.
b) For an input $X(n)$ and the desired output $d(n)$, calculate $e(n) = d(n) - X^{T}(n)\,W(n)$ and $W(n+1) = (1-\lambda)\,W(n) + \lambda\,S(n-1) + \eta(n)\,X^{T}(n)\,e(n)$, where $\lambda$ is the forgotten degree of Section 3.2.1 and $\eta(n)$ is the adaptive learning rate of Section 3.2.2.
c) Check whether the end conditions are met ($e(n) < \varepsilon$ for a given error threshold $\varepsilon$, or the maximum number of iterations $M$ is reached). If so, the iteration ends; otherwise, set $n$ to $n+1$ and return to b).
d) Before $M$ tasks have been completed, the load calculation uses only the current status of the nodes. After $M$ tasks have been completed, each normal node applies the improved LMS algorithm to its $M$ latest data points to obtain a set of weights $W$, multiplies its current status by the weights, and submits the result to the super node.

4. Simulation

4.1. Load of System

Suppose that the system has reached a steady state.
To reflect the heterogeneity of the nodes' loading capacity, we use the Zipf distribution to generate it (this distribution is used in [16][17] to generate the loading capacity of P2P nodes). In the simulation, the loading capacity of node $i$ is $c_i = 1000\,i^{-\beta}$ with $\beta = 1.2$, and the total loading capacity of the system is $C = c_1 + c_2 + \cdots + c_n$. The load of each node is generated from two different random distributions: the Normal distribution $N(\mu, \sigma^2)$ and the Pareto distribution (these two distributions are used in [12] to generate node load). In the simulation, the mean of the Normal distribution is $\mu = [0.3 : 0.05 : 0.8]\,C/n$, which is defined relative to the average carrying capacity $C/n$; here $[0.3 : 0.05 : 0.8]$ means taking values from 0.3 to 0.8 at intervals of 0.05, 11 values in total. The standard deviation of the Normal distribution is $\sigma = 0.1\mu$. The density function of the Pareto distribution is $f(x) = a\,b^{a} / x^{a+1}$ ($a, b > 0$; $b \le x < \infty$), where $a$ is the shape parameter and $b$ is the scale parameter, with $b = \mu(a-1)/a$ (obtained by setting the expectation $ab/(a-1)$ equal to $\mu$). In this paper, $a = 1.5$; the mean of the Pareto distribution is the same as that of the Normal distribution, and its standard deviation is infinite. Since the Pareto distribution is a skewed, heavy-tailed distribution, it can reflect the load distribution when the network fluctuates severely [18]. Assume that the load of node $i$ is $f_i$; then at a given moment the total load of the system is $F = f_1 + f_2 + \cdots + f_n$, and the average load is $F/n$.
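The workload described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the simulator used for the reported results: the function names, the fixed seed, and the network size `n = 100` are hypothetical (the paper does not state $n$). `random.paretovariate(a)` samples a Pareto variable with scale 1, so it is multiplied by the scale $b$ here.

```python
import random

BETA = 1.2       # Zipf exponent beta from the text
SHAPE_A = 1.5    # Pareto shape parameter a from the text


def zipf_capacities(n, beta=BETA):
    """Loading capacity of node i (1-based): c_i = 1000 * i**(-beta)."""
    return [1000.0 * i ** (-beta) for i in range(1, n + 1)]


def load_means(total_capacity, n):
    """The 11 mean values mu = k * C / n for k = 0.30, 0.35, ..., 0.80."""
    return [(0.30 + 0.05 * j) * total_capacity / n for j in range(11)]


def normal_loads(n, mu, rng):
    """Node loads drawn from N(mu, sigma^2) with sigma = 0.1 * mu."""
    return [rng.gauss(mu, 0.1 * mu) for _ in range(n)]


def pareto_loads(n, mu, rng, a=SHAPE_A):
    """Node loads drawn from a Pareto distribution whose mean is mu."""
    b = mu * (a - 1.0) / a                  # scale b = mu * (a - 1) / a
    return [b * rng.paretovariate(a) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)                  # fixed seed, an assumption
    n = 100                                 # hypothetical network size
    capacities = zipf_capacities(n)
    C = sum(capacities)                     # total loading capacity
    for mu in load_means(C, n):
        f_normal = normal_loads(n, mu, rng)
        f_pareto = pareto_loads(n, mu, rng)
        # Average load F / n under each distribution
        print(round(mu, 2), round(sum(f_normal) / n, 2), round(sum(f_pareto) / n, 2))
```

With $a = 1.5$ the Pareto variance is infinite, so the sample average of the Pareto loads can deviate noticeably from $\mu$ from run to run, while the Normal loads stay close to it.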
Figure 2. The Average Load of the System

In this section, we test the average load of the improved hierarchical DHT network and of the traditional hierarchical DHT network. The results are shown in Figure 2. In the improved hierarchical DHT network, the system's average load is greatly reduced, which brings the system to a more stable state.

4.2. Node's Status When Allocated Tasks

We select 5 tasks at random when the system is stable and compare, for the normal node, the load at the time it submits feedback, the predicted condition, and the load at the time it is allocated the task.

Table 1. Load at Different Times

Table 2. Queued Tasks at Different Times

From the two tables, we can see that the predicted load and the predicted number of queued tasks are closer to the actual situation at task allocation than the values at the time feedback is submitted, which means that predicting a node's condition reflects the node's capacity more realistically.

4.3. The Number of Completed Tasks

Figure 3. The Task Completion Number
We assume that normal nodes may fail to complete the tasks allocated to them. The closer the feedback is to the actual status, the higher the probability that a task is completed. In this section, we run the system for 20 minutes and compare the number of completed tasks in the improved and in the traditional hierarchical DHT network with the number of generated tasks. The results are shown in Figure 3. In the improved hierarchical DHT network, the predicted value reflects the node's status more realistically, so the number of successfully completed tasks increases significantly.

5. Conclusion

In this paper, we introduced the prediction of node states into the feedback, which reflects the condition of nodes at the time tasks are allocated more accurately and increases the task success rate. We also put forward some improvements to the LMS algorithm that make it better suited to the Single-connection intra-group structure (SiCo) of the hierarchical DHT-based P2P network. With the improved LMS algorithm, compared with the traditional hierarchical DHT network, the system's average load is reduced significantly, the task completion probability increases, and the system becomes steady more quickly.

6. Acknowledgements

National Key Basic Research Program of China (973 Program) (2009CB320504); Innovative Research Groups of the National Natural Science Foundation of China (61121061); Important National Science & Technology Specific Projects: Next-generation broadband wireless mobile communications network (2011ZX03002-002-01).

7. References

[1] Stoica I, Morris R, Karger D, Kaashoek M F, Balakrishnan H, Chord: A scalable peer-to-peer lookup service for internet applications, In Proceedings of the ACM SIGCOMM, pp. 149-160, 2001.
[2] Maymounkov P, Mazieres D, Kademlia: A peer-to-peer information system based on the XOR metric, In Proceedings of the International Workshop on Peer-to-Peer Systems, pp. 53-65, 2002.
[3] Joung Y J, Wang J C, Chord2: A two-layer Chord for reducing maintenance overhead via heterogeneity, Computer Networks, vol. 3, no. 51, pp. 712-731, 2007.
[4] Xu Z, Min R, Hu Y, HIERAS: A DHT based hierarchical P2P routing algorithm, In Proceedings of the International Conference on Parallel Processing, pp. 187-194, 2003.
[5] Mizrak A T, Cheng Y, Kumar V, Savage S, Structured super peers: Leveraging heterogeneity to provide constant time lookup, In Proceedings of the IEEE Workshop on Internet Application, pp. 104-111, 2003.
[6] Garces-Erice L, Biersack E W, Ross K W, Felber P A, Urvoy-Keller G, Hierarchical peer-to-peer systems, Parallel Processing Letters, vol. 4, no. 13, pp. 643-657, 2003.
[7] Huang Jie, Huang Bei, Huang Qiucen, An Improved Dynamic Load Balancing Algorithm for a Distributed System in LAN, JCIT: Journal of Convergence Information Technology, vol. 5, no. 10, pp. 91-98, 2010.
[8] Zoels S, Despotovic Z, Kellerer W, On hierarchical DHT systems - an analytical approach for optimal designs, Computer Communications, vol. 3, no. 31, pp. 576-590, 2008.
[9] Zoels S, Despotovic Z, Kellerer W, Load balancing in a hierarchical DHT-based P2P system, In Proceedings of the International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 353-361, 2007.
[10] Wang Bin, Shen Qing-guo, An Effective Algorithm for Hierarchical P2P Load Balancing, JCIT: Journal of Convergence Information Technology, vol. 6, no. 5, pp. 231-236, 2011.
[11] Rao A, Lakshminarayanan K, Surana S, Karp R, Stoica I, Load balancing in structured P2P systems, In Proceedings of the 2nd International Workshop on Peer-to-Peer Systems, pp. 68-79, 2003.
[12] Godfrey B, Lakshminarayanan K, Surana S, Karp R, Stoica I, Load balancing in dynamic structured P2P systems, In Proceedings of the IEEE INFOCOM, pp. 2253-2262, 2004.
[13] Godfrey B, Stoica I, Heterogeneity and load balance in Distributed Hash Tables, In Proceedings of the IEEE INFOCOM, pp. 596-606, 2005.
[14] Widrow B, Thinking about thinking: the discovery of the LMS algorithm, IEEE Signal Processing Magazine, pp. 100-106, 2005.
[15] Kwong R H, Johnston E W, A Variable Step Size LMS Algorithm, IEEE Transactions on Signal Processing, vol. 7, no. 40, pp. 1633-1642, 1992.
[16] Zhu Y, Hu Y, Efficient, proximity-aware load balancing for DHT-based P2P systems, IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 16, pp. 349-361, 2005.
[17] Lu Q, Ratnasamy S, Shenker S, Can heterogeneity make Gnutella scalable?, In Proceedings of the 1st International Workshop on Peer-to-Peer Systems, pp. 94-103, 2002.
[18] Ledlie J, Seltzer M, Distributed, secure load balancing with skew, heterogeneity, and churn, In Proceedings of the IEEE INFOCOM, pp. 1419-1430, 2005.