DYNAMIC LOAD BALANCING IN A DECENTRALISED DISTRIBUTED SYSTEM

1 Introduction

In a parallel distributed computing system, load imbalance between lightly loaded and overloaded nodes can increase the time needed to complete a task. Moreover, because the distributed system is shared by multiple users, each with their own computing tasks, load imbalance can affect other computing tasks too. Lightly loaded nodes, which are capable of computing more jobs, may finish their tasks early and sit idle. As a result, the utilisation of the distributed system is not optimised. To address this, a load balancing algorithm either transfers jobs to lightly loaded nodes or offloads jobs from heavily loaded nodes.

Load balancing algorithms can be further categorised as static or dynamic (Casavant & Kuhl, 1988). Both types have the same goal, which is to optimise system utilisation by distributing jobs to nodes effectively. A static load balancing algorithm formulates the job distribution decision before the program executes, that is, at compilation time, whereas a dynamic load balancing algorithm distributes jobs during the execution of the program. The static approach is therefore most effective in a homogeneous environment, because each node knows the structure of the system. In contrast, the dynamic approach is best suited to a heterogeneous environment, because the structure of the system and its nodes is not known until the program executes. Nodes in this approach use the current system state information during execution to formulate the decision. At any point in time, the decision regarding a job distribution may change because the state of the system varies.

Dynamic load balancing can be implemented in a centralised or a decentralised model. In the former, a single node decides on the job distribution; in the latter, at least two nodes formulate the decision.
For the decentralised model, Stankovic and Sidhu (1984) proposed a bidding algorithm that allows an overloaded node to initiate a bid request to other nodes in the system. Around the same time, Ni, Xu, and Gendreau (1985) proposed a drafting algorithm in which each lightly loaded node sends out a draft request to search for more jobs to execute. The bidding and drafting approaches are categorised as sender-initiated and receiver-initiated respectively. As described by Casavant and Kuhl (1988), in a sender-initiated approach an overloaded node initiates a job transfer request, whereas in a receiver-initiated approach a lightly loaded node does so. In dynamic load balancing, both sender-initiated and receiver-initiated approaches frequently request load information from other nodes to formulate a job distribution decision. Hence, information exchange between nodes is an important factor in the job distribution decision.

Acker and Kulkarni (2007) proposed an algorithm that allows nodes to join and leave the distributed system. To do so, each node uses a neighbour list to store and keep track of node information, and periodically exchanges load information with its neighbours. As more nodes join the system, both the neighbour lists and the overall number of communication messages in the distributed system grow. Lu, Subrata, and Zomaya (2006) introduced a mutual information feedback technique for updating the neighbour list to reduce the overall number of communication messages.

Apart from that, the formation of the neighbour list is also an important factor in the decentralised approach, as it determines which neighbours a node has. Nandagopal, Gokulnath, and Uthariaraj (2010, 2011) proposed the Sender Initiated Decentralised Dynamic Load Balancing (SIDDLB) algorithm, in which a node's neighbour list is formed from nodes that have higher computing power than the node and a relatively low network delay.
This way, whenever a node is overloaded, it can make a better job distribution decision because it will always escalate a job to a node with higher computing power.
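
The distinction between sender-initiated and receiver-initiated approaches drawn in this introduction can be illustrated with a minimal Python sketch. The `Node` class, the `initiator_role` function, and the two queue-length thresholds are hypothetical names and values chosen for illustration; the cited papers do not prescribe them, and each algorithm defines "overloaded" and "lightly loaded" in its own way.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the source does not state how a node decides
# that it is overloaded or lightly loaded.
HIGH_WATERMARK = 8
LOW_WATERMARK = 2


@dataclass
class Node:
    name: str
    queue_length: int  # number of jobs currently waiting on this node


def initiator_role(node: Node) -> str:
    """Return which kind of job transfer request, if any, the node initiates."""
    if node.queue_length > HIGH_WATERMARK:
        # Overloaded node asks others to take jobs (sender-initiated, bidding style).
        return "sender"
    if node.queue_length < LOW_WATERMARK:
        # Lightly loaded node asks others for jobs (receiver-initiated, drafting style).
        return "receiver"
    return "none"


for n in (Node("n1", 10), Node("n2", 0), Node("n3", 5)):
    print(n.name, initiator_role(n))
```

In either case the initiating node still needs load information about other nodes before a transfer can be decided, which is why information exchange matters for both approaches.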

2 Problem Statement & Research Objectives

Because of the way the neighbour list is formed, nodes in the SIDDLB approach suffer from load imbalance when some nodes have no nodes in their neighbour list, in particular the node with the highest computing power. Assuming that the network delay is negligible, the neighbour list of a node contains only nodes with relatively higher computing power. As a result, the neighbour list of the node with the highest computing power is empty. When this node is overloaded and a job arrives, it has to accept the job because it has no neighbour to offload the job to.

The objectives of this research are as follows:

• To evaluate the performance impact of different workload sizes on a distributed system.
• To analyse and design a decentralised dynamic load balancing algorithm that focuses on the neighbour list.
• To minimise the overall response time of computing tasks and maximise system utilisation.
• To simulate the proposed decentralised dynamic load balancing algorithm in a distributed system using a network simulator.
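
The empty neighbour list described in the problem statement can be reproduced with a short Python sketch. It is a simplification rather than the SIDDLB implementation: network delay is assumed negligible, as in the text above, so a node's neighbours are simply the nodes with strictly higher computing power. The function name `build_neighbour_lists` and the node names and power values are hypothetical.

```python
from typing import Dict, List


def build_neighbour_lists(computing_power: Dict[str, float]) -> Dict[str, List[str]]:
    """Give each node the nodes with strictly higher computing power as neighbours.

    Assumption: network delay is negligible, so it plays no part in the selection.
    """
    return {
        node: sorted(
            (other for other, power in computing_power.items() if power > own),
            key=lambda other: computing_power[other],
        )
        for node, own in computing_power.items()
    }


# Hypothetical three-node system; the values are relative computing power.
powers = {"A": 1.0, "B": 2.0, "C": 4.0}
lists = build_neighbour_lists(powers)
for node, neighbours in lists.items():
    print(node, neighbours)
# A ['B', 'C']
# B ['C']
# C []
```

Node C has the highest computing power and therefore ends up with no neighbour, so any job that arrives while C is overloaded has nowhere to go.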

3 Literature Review

The literature review focuses on the construction of the neighbour list and on the method used to obtain and update load information, namely information exchange. In a decentralised model there is no centralised view of the nodes in a distributed system, so each node's knowledge of existing nodes is based on its own neighbour list. Hence, it is crucial to keep the neighbour list up to date.

3.1 Neighbour List

Acker and Kulkarni (2007) proposed a control protocol that dynamically constructs the neighbour list of a node. This gives the algorithm the ability to accommodate nodes joining and leaving the system at any point in time. Acker and Kulkarni assume that the system supports multicast messages. When a node receives a multicast message from another node, it inserts the sender into its neighbour list if the sender is not already there. Otherwise, it updates the sender's load information in the list. Each node periodically sends load information updates to its neighbours.

Several studies (Lu et al., 2006; Lu & Zomaya, 2007) construct the neighbour list from nodes that are relatively nearby in terms of network delay, and then sort the list by network delay. Consequently, the neighbour lists of different nodes in the distributed system may not be identical. Nandagopal et al. (2010, 2011) applied a similar approach, forming the neighbour list based on both the computing power and the network delay of a node. Intuitively, apart from selecting the nearest nodes as neighbours, this also yields a set of neighbours with higher computing power. So, when a node is overloaded, it always escalates the load to a neighbour with higher computing power. This algorithm assumes that if a job is escalated all the way to the node with the highest computing power, no other node is able to execute the job. Hence, this approach leaves the node with the highest computing power without any neighbour.
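
The insert-or-update rule attributed to Acker and Kulkarni (2007) in Section 3.1 can be sketched in Python as follows. This is only an illustration of the rule as summarised here, not their protocol: the class name `NeighbourList`, the method `on_multicast`, and the representation of load as a single number are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NeighbourList:
    # Maps a neighbour's identifier to the last load value it reported.
    # Representing load as one float is an assumption made for illustration.
    entries: Dict[str, float] = field(default_factory=dict)

    def on_multicast(self, sender: str, load: float) -> bool:
        """Handle a multicast message; return True if the sender was newly inserted."""
        is_new = sender not in self.entries
        # Insert an unknown sender, or refresh the load of a known one.
        self.entries[sender] = load
        return is_new


neighbours = NeighbourList()
print(neighbours.on_multicast("n1", 0.5))  # True: n1 was not in the list
print(neighbours.on_multicast("n1", 0.7))  # False: n1 is known, load refreshed
print(neighbours.entries)                  # {'n1': 0.7}
```

Because every node builds its list from the messages it happens to receive, no node needs a global view of the system.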

3.2 Information Exchange

In the neighbour list approach proposed by Acker and Kulkarni (2007), the load information update interval depends on the total number of neighbours. Let T_i be the total number of neighbours of node i and let k be a fixed interval. The update interval is then T_i k, that is, it grows in proportion to the number of neighbours. As a result, the overall number of communication messages is reduced. However, as the number of neighbours of a node increases, the update interval lengthens and the stored load information may no longer be valid.

To mitigate this issue, instead of updating the load information of all neighbours, partial information can be exchanged, as in the mutual information feedback approach (Lu et al., 2006; Lu & Zomaya, 2007; Nandagopal et al., 2010, 2011). During the information exchange process, a node randomly selects some of the load information it holds about its neighbours and sends it to a neighbour by piggybacking it onto a job transfer. In return, the receiver does the same. When a node receives the information update, it updates the load information only for the neighbours the two nodes have in common; the remaining information is discarded. In addition, load information is not updated if the received information is outdated.

4 Simulations

Simulations were mainly conducted using the OMNeT++ network simulation library and framework (Varga & Hornig, 2008) and the OverSim framework library (Baumgart, Heep, & Krause, 2007). In general, two types of simulation were conducted. The first simulation analyses the impact of various workload sizes on a distributed system. In this simulation, a structured tree-based distributed system is used, because any change in the time taken to compute a job at a leaf node affects the time taken to complete the job at its parent node, and so forth. Various scenarios were designed in such a way that each branch of the tree receives a different amount of load. The workload requires a computational application that can be partitioned into smaller pieces, so the numerical integration of π is used. The overall response time and waiting time are measured.

The second simulation concerns the proposed decentralised dynamic load balancing algorithm. The proposed approach was compared against the approach proposed by Nandagopal et al. (2010, 2011). Various scenarios were set up to simulate an overloaded node, ranging from the node with the lowest computing power to the node with the highest computing power. The overall response time of the two algorithms is evaluated. The system utilisation, or the performance of the load balancing algorithm, can be measured by using the standard deviation to study the variation in overall response time. A smaller standard deviation indicates a more effective algorithm.

5 References

Acker, D. S., & Kulkarni, S. (2007, May). A dynamic load dispersion algorithm for load-balancing in a heterogeneous grid system. In Sarnoff symposium, 2007 IEEE (pp. 1–5). doi: 10.1109/SARNOF.2007.4567375

Baumgart, I., Heep, B., & Krause, S. (2007, May). OverSim: A flexible overlay network simulation framework. In IEEE global internet symposium, 2007 (pp. 79–84). doi: 10.1109/GI.2007.4301435

Casavant, T. L., & Kuhl, J. G. (1988, February). A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, 14(2), 141–154. doi: 10.1109/32.4634

Lu, K., Subrata, R., & Zomaya, A. Y. (2006). Towards decentralized load balancing in a computational grid environment. In Advances in grid and pervasive computing (Vol. 3947, pp. 466–477). Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/11745693_46 doi: 10.1007/11745693_46

Lu, K., & Zomaya, A. Y. (2007). A hybrid policy for job scheduling and load balancing in heterogeneous computational grids. In Sixth international symposium on parallel and distributed computing, 2007. ISPDC '07 (pp. 19–27).
doi: 10.1109/ISPDC.2007.4

Nandagopal, M., Gokulnath, K., & Uthariaraj, V. R. (2010, September). Sender initiated decentralized dynamic load balancing for multi cluster computational grid environment. In Proceedings of the 1st Amrita ACM-W celebration on women in computing in India (pp. 63:1–63:4). New York, NY, USA: ACM. doi: 10.1145/1858378.1858441

Nandagopal, M., Gokulnath, K., & Uthariaraj, V. R. (2011, March). Load distribution through optimal neighbor selection in decentralized grid environment. European Journal of Scientific Research, 50(4), 575–585. Retrieved from http://www.eurojournals.com/ejsr_50_4_13.pdf

Ni, L. M., Xu, C.-W., & Gendreau, T. B. (1985, October). A distributed drafting algorithm for load balancing. IEEE Transactions on Software Engineering, SE-11(10), 1153–1161. doi: 10.1109/TSE.1985.231863

Stankovic, J. A., & Sidhu, I. S. (1984). An adaptive bidding algorithm for processes, clusters and distributed groups. In ICDCS (pp. 49–59).

Varga, A., & Hornig, R. (2008). An overview of the OMNeT++ simulation environment. In Proceedings of the 1st international conference on simulation tools and techniques for communications, networks and systems & workshops (pp. 60:1–60:10). Brussels, Belgium: ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering). Retrieved from http://dl.acm.org/citation.cfm?id=1416222.1416290