A Comparison of Dynamic Load Balancing Algorithms

Toufik Taibi 1, Abdelouahab Abid 2 and Engku Fariez Engku Azahan 2

1 College of Information Technology, United Arab Emirates University, P.O. Box 17555, Al Ain, United Arab Emirates; 2 Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor, Malaysia

Received: 25/9/2006; Accepted: 24/12/2006

Taibi, Toufik, Abid, Abdelouahab, and Azahan, Engku (2007) A Comparison of Dynamic Load Balancing Algorithms. J.J. Appl. Sci: Natural Sciences 9 (2): 125-132.

Abstract: Distributed computing has the potential for running large-scale applications using heterogeneous and geographically distributed resources. However, a number of major technical issues must be handled before the full potential of distributed computing can be realized. Efficient job scheduling is a major prerequisite for the effective utilization of resources, and dynamic load balancing is the core of an efficient job scheduler. This paper describes the features of a Java-based simulator intended to analyze the performance of three load balancing algorithms, namely the sender-initiated, receiver-initiated and symmetrically-initiated algorithms, using Average Waiting Time (AWT) and Average Turnaround Time (ATT) as criteria. Simulation results revealed that the symmetrically-initiated algorithm performs better in almost all cases.

Keywords: Dynamic load balancing, Sender-initiated algorithm, Receiver-initiated algorithm, Symmetrically-initiated algorithm, Simulation.

Introduction

One of the primary goals of distributed computing is to share access to geographically distributed heterogeneous resources in a transparent manner. In such an environment, applications whose computational requirements exceed local resources can be executed. Moreover, the average job turnaround time is reduced through workload balancing across multiple computing facilities.
Distributed computing has evolved from Networks of Workstations (NOW) [5] [6] to computational grids [4] in a bid to become a viable alternative to expensive dedicated parallel machines [3]. However, a number of major technical issues must be handled before the full potential of distributed computing can be realized. Efficient job scheduling is a major prerequisite for the effective utilization of resources, and dynamic load balancing is the core of an efficient job scheduler. Although numerous researchers have proposed scheduling algorithms for parallel architectures [2], the problem of scheduling jobs in a heterogeneous distributed environment is fundamentally different [4]. This paper describes the features of a Java-based simulator intended to analyze the performance of three load balancing algorithms, namely the sender-initiated, receiver-initiated and symmetrically-initiated algorithms, using Average Waiting Time (AWT)
* Principal author's e-mail address: toufikt@uaeu.ac.ae
and Average Turnaround Time (ATT) as criteria. Repeated simulation trials revealed that the symmetrically-initiated algorithm performs better in almost all cases. The rest of the paper is organized as follows. Section 2 describes the three load balancing algorithms. Section 3 describes the features of the simulator. Section 4 presents the simulation results comparing the performance of the three algorithms, while Section 5 concludes the paper.

Dynamic Load Balancing Algorithms

In static scheduling, once a job is assigned to a node (processing site), it remains there until its execution is completed. Static scheduling requires prior knowledge of the execution times and the communication behaviours of the jobs; the latter is used to co-locate communication-dependent jobs in the same node. The assumption of prior knowledge of jobs is not realistic for most distributed applications. As such, we have to rely on an ad hoc scheduling strategy that is adaptive (dynamic) and allows assignment decisions to be made locally (decentralized). The target performance goals for scheduling are system utilization and fairness to the user jobs. A simple heuristic for achieving higher system utilization is to avoid idle nodes as much as possible. Assume that a controller job (running in a designated node) maintains information about the queue size of each node in the system (centralized approach). Since jobs arrive at and depart from the system asynchronously, an arriving job makes a request to the controller for assignment to a node, and the controller schedules the job to the node with the shortest queue. To keep the queue size information up to date, each node must inform the controller whenever a job completes and departs from the node. Joining the job to the shortest queue is a static load sharing strategy that attempts to reduce node idling and to equalize node queue sizes (load balancing).
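The centralized join-shortest-queue strategy described above can be sketched as follows. This is a minimal illustration, not the paper's simulator code; the class and method names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a centralized controller that assigns each arriving job to the
// node with the shortest queue. Names are illustrative, not from the paper.
class Controller {
    private final Map<Integer, Integer> queueSizes = new HashMap<>();

    Controller(int numNodes) {
        for (int node = 0; node < numNodes; node++) {
            queueSizes.put(node, 0);
        }
    }

    // An arriving job requests assignment; the controller picks the node
    // with the shortest queue and increments its recorded queue size.
    int assignJob() {
        int best = -1;
        int bestSize = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : queueSizes.entrySet()) {
            if (e.getValue() < bestSize) {
                best = e.getKey();
                bestSize = e.getValue();
            }
        }
        queueSizes.put(best, bestSize + 1);
        return best;
    }

    // Each node informs the controller when a job completes and departs,
    // keeping the controller's queue size information up to date.
    void jobCompleted(int node) {
        queueSizes.put(node, queueSizes.get(node) - 1);
    }
}
```

Note that this sketch captures only the bookkeeping; in a real system the controller would be a separate process and the updates would arrive as messages.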
Load balancing is a stronger requirement than load sharing, as it improves utilization, achieves a degree of fairness in terms of equal workload for each node and reduces the ATT of the jobs [1]. Load balancing can be made adaptive by allowing jobs to migrate dynamically from a longer queue to a shorter one. If a central controller is not used for transferring a job from one node (the sender) to another (the receiver), the job transfer must be initiated by the sender, the receiver or both. Figure 1 shows the opportunities for job distribution in a distributed system. In a lightly loaded system, there is little opportunity for job distribution since most nodes are underutilized. In a heavily loaded system, there is little opportunity for job distribution since most nodes are not free to accept new jobs. In a moderately loaded system, there are good opportunities to distribute jobs from overutilized to underutilized nodes.

[Figure 1. Opportunities for Task Distribution: probability of job distribution plotted against node utilization.]
1. Sender-Initiated Algorithm

The sender-initiated algorithm, as the name implies, is activated by a sender that wishes to off-load some of its computation. This algorithm facilitates job migration from a heavily loaded node to a lightly loaded node. Three basic decisions need to be made before a job transfer can take place:

Transfer policy: When does a node become a sender?
Selection policy: How does a sender choose a job for transfer?
Location policy: Which node should be the target receiver?

If the queue size is the only indicator of the workload (as is the case in our simulation), a sender can use a transfer policy that initiates the algorithm upon the arrival of a new job, when it detects that its queue length (SQ) has exceeded a certain threshold (ST). The location policy requires knowledge of the load distribution to locate a suitable receiver. The sender can send a multicast message to all other nodes asking for a reply about their queue sizes. Upon receiving this information, the sender can select the node with the smallest queue length (RQ) as the target receiver, provided that the queue length of the sender is greater than that of the target receiver (i.e. SQ > RQ). Figure 2 depicts the flowchart of the sender-initiated algorithm.

[Figure 2. Flowchart of the Sender-Initiated Algorithm: on job arrival, if SQ+1 > ST, the node multicasts to potential receivers, collects their RQs, selects the smallest RQ, and migrates the job to that receiver if SQ > RQ; otherwise the job is queued locally.]

Multicasting from a sender, receiving replies from receivers and migrating jobs between senders and receivers incur additional communication overhead, which increases the actual load of the system. In an already heavily loaded system, the problem could be worsened by a ping-pong effect among senders trying fruitlessly to off-load jobs if all nodes initiate the algorithm simultaneously.
The sender-initiated algorithm, however, performs very well in a lightly loaded system, as it is easy to find a receiver and the communication overhead has little effect on system performance.
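The sender's decision logic above can be sketched as follows, assuming the queue sizes of the other nodes have already been collected via the multicast exchange; the class and parameter names are illustrative.

```java
import java.util.List;

// Sketch of the sender-initiated decision logic. A node whose queue length
// would exceed the threshold ST on a job arrival examines the other nodes'
// queue lengths and migrates the job to the node with the smallest queue,
// provided SQ > RQ. Names and the polling step are assumptions.
class SenderInitiated {
    // Returns the index of the chosen receiver, or -1 if the job stays local.
    static int chooseReceiver(int senderQueue, int senderThreshold,
                              List<Integer> otherQueues) {
        // Transfer policy: act only if the arrival pushes SQ+1 past ST.
        if (senderQueue + 1 <= senderThreshold) {
            return -1;
        }
        // Location policy: pick the node with the smallest queue (RQ).
        int best = -1;
        int bestRQ = Integer.MAX_VALUE;
        for (int i = 0; i < otherQueues.size(); i++) {
            if (otherQueues.get(i) < bestRQ) {
                bestRQ = otherQueues.get(i);
                best = i;
            }
        }
        // Migrate only if the sender's queue is longer than the receiver's.
        return (best >= 0 && senderQueue > bestRQ) ? best : -1;
    }
}
```

For example, a node with SQ = 5 and ST = 4 facing queues of 2, 1 and 3 would migrate the arriving job to the node with queue length 1.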
2. Receiver-Initiated Algorithm

The sender-initiated algorithm is a push model, in which jobs are pushed from one node to other nodes. Conversely, an underutilized receiver can pull a job from other nodes into its queue. The receiver-initiated algorithm can use a transfer policy similar to that of the sender-initiated algorithm, activating the pull operation upon the departure of a job, when the queue length falls below a certain threshold (RT). Similarly, multicasting can be used to implement the location policy that identifies a heavily loaded sender. However, the selection policy requires pre-emption, since the jobs at the sender node have already started their execution. The decision about which job to remove is not as obvious as in the sender-initiated algorithm, and the benefit of load sharing must outweigh the pre-emption and migration communication overhead. In our simulator, we remove the last job in the queue. At high system load, job migrations are few and a sender can be found easily, so load sharing is effectively accomplished with little overhead. When the system load is low, although there will be many migration initiations, the degradation of performance due to the additional network traffic is not significant. As such, on average the receiver-initiated algorithm performs better than the sender-initiated algorithm. Figure 3 shows a node queue with sender/receiver thresholds, while Figure 4 depicts the flowchart of the receiver-initiated algorithm.

[Figure 3. A Node Queue with Sender/Receiver Thresholds: the queue length (SQ or RQ) is measured against the thresholds ST and RT.]

[Figure 4. Flowchart of the Receiver-Initiated Algorithm: on job departure, if RQ-1 < RT, the node multicasts to potential senders, collects their SQs, selects the biggest SQ, and migrates a job from that sender if RQ < SQ; otherwise it simply executes the next job.]

3. Symmetrically-Initiated Algorithm

Since the sender-initiated and receiver-initiated algorithms work well at different system loads, it seems logical to combine them.
A node can activate the sender-initiated algorithm when its queue size exceeds one threshold, ST, and can activate the receiver-initiated algorithm when its queue size falls below another threshold, RT. As such, each node may dynamically play the role of either a sender or a receiver.
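The role-selection rule above can be sketched as follows; the enum and method names are illustrative, not taken from the paper's simulator.

```java
// Sketch of how a node in the symmetrically-initiated algorithm decides
// which role to play after a queue-length change. ST and RT are the sender
// and receiver thresholds from the text; everything else is illustrative.
class SymmetricallyInitiated {
    enum Role { SENDER, RECEIVER, NEUTRAL }

    static Role roleFor(int queueLength, int senderThreshold,
                        int receiverThreshold) {
        if (queueLength > senderThreshold) {
            return Role.SENDER;    // run the sender-initiated algorithm
        }
        if (queueLength < receiverThreshold) {
            return Role.RECEIVER;  // run the receiver-initiated algorithm
        }
        return Role.NEUTRAL;       // queue within [RT, ST]: do nothing
    }
}
```

Because the two triggering conditions are disjoint (RT < ST), a node is never a sender and a receiver at the same time, which is why the combined overhead stays low.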
The Simulator

The simulator was coded in Java. Figure 5 shows the simulator's main window. The Stop and Start/Pause buttons control the simulator, while the progress bar on top shows the simulation completion time. The simulator has four menus: File, Simulator, Results and Help. The Simulator menu allows the control of the simulator and the setting of the simulation parameters. It also allows the running of a comparative simulation, which runs all three types of simulation one after another using the same settings and shows a message window. The Results menu allows the display of the simulation log file and the result graph. Using the simulator, the user is able to perform the following actions:

Controlling the simulator, which involves starting, stopping and pausing the simulation.

Running the three algorithms in sequence with the same parameters and comparing their performance using AWT and ATT as criteria. This is what we call a comparative simulation.

Changing basic simulation parameters such as the number of nodes, the range of the number of generated jobs and the simulation type (selecting which of the three algorithms to run). (See Figure 6.)

Changing advanced simulation parameters such as queue length, ST, RT, job arrival time (fixed or random, in which case a range of values is entered) and job burst time (fixed or random, in which case a range of values is entered). (See Figure 7.)

Displaying graphs. In the case of a single simulation run using one of the three algorithms, time is plotted against AWT and ATT. In the case of a comparative simulation, time is plotted against the AWT and ATT of each of the three algorithms.

Figure 5. Simulator's Main Window
Figure 6. Simulator's Basic Settings

Figure 7. Simulator's Advanced Settings

Figure 8 shows a running simulation of the sender-initiated algorithm in a distributed system of nine nodes.

Figure 8. A Running Simulation
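The comparisons that follow use AWT and ATT as criteria. The paper does not spell out the formulas, so the sketch below assumes the usual definitions: waiting time is the interval from a job's arrival until it starts executing, and turnaround time is the interval from arrival until completion. Array names are illustrative.

```java
// Sketch of the AWT/ATT computation, assuming the standard definitions:
// waiting time = start - arrival, turnaround time = completion - arrival.
// These definitions are an assumption; the paper does not state them.
class Metrics {
    static double averageWaitingTime(long[] arrival, long[] start) {
        double sum = 0;
        for (int i = 0; i < arrival.length; i++) {
            sum += start[i] - arrival[i];  // time spent queued
        }
        return sum / arrival.length;
    }

    static double averageTurnaroundTime(long[] arrival, long[] completion) {
        double sum = 0;
        for (int i = 0; i < arrival.length; i++) {
            sum += completion[i] - arrival[i];  // queueing plus execution
        }
        return sum / arrival.length;
    }
}
```

Under these definitions ATT is always at least AWT, since turnaround time adds the job's execution time on top of its waiting time.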
Comparative Simulation Results

Figure 9. Graph Results for Single Simulation Run

Figure 9 shows the graph results of a single simulation run. The graph plots the AWT and ATT of the whole system against time. As can be seen, both AWT and ATT steadily increase with time as more and more jobs are created. At a certain point, the graph levels off; this is when no new jobs are created.

Figure 10. Graph Results for Comparative Simulation

Figure 10 shows the graph results for a comparative simulation. Here, two separate graphs are shown, one for AWT and one for ATT. In both, all three algorithms perform similarly at first. As time passes, the symmetrically-initiated algorithm shows an improvement over the other two, while the receiver-initiated algorithm performs better than the sender-initiated algorithm. As with the previous graph, after a certain period of time the graphs level off, as no new jobs are introduced into the system.

Conclusion

This paper described the features of a simulator for comparing the performance of three dynamic load balancing algorithms, namely the sender-initiated, receiver-initiated and symmetrically-initiated algorithms. The expectation that the symmetrically-initiated algorithm generally works better held true for almost all cases. This is because a node can apply either the sender-initiated or the receiver-initiated algorithm depending on its queue length. Restricting a node to either one algorithm
causes the node to lose the benefits of dynamic load balancing when the system load changes frequently. The symmetrically-initiated algorithm does carry additional overhead, since it utilizes two algorithms. However, since the two algorithms are almost complementary (sender-initiated acts only when the queue is long, and receiver-initiated only when the queue is short), this effect is negligible. As a future enhancement, the simulator could run on multiple nodes (i.e., multiple computers, each running an instance of the simulator). This would better reflect the nature of a distributed system.

References

1- Chow, R. & Johnson, T. (1998) Distributed Operating Systems and Algorithms, Reading, MA, Addison-Wesley.

2- Krallmann, J., Schwiegelshohn, U. & Yahyapour, R. (1999) On the design and evaluation of job scheduling algorithms. Proceedings of 5th Workshop on Job Scheduling Strategies for Parallel Processing, San Juan, Puerto Rico, 17-42.

3- Overeinder, B.J. & Sloot, P.M.A. (1996) A dynamic load balancing system for parallel cluster computing. Future Generation Computer Systems, 12: 101-105.

4- Shan, H., Oliker, L. & Biswas, R. (2003) Job superscheduler architecture and performance in computational grid environments. Proceedings of Super Computing Conference, Phoenix, USA, 44-45.

5- Piotrowski, A. & Dandamudi, S.P. (1997) A comparative study of load sharing on networks of workstations. Proceedings of International Conference on Parallel and Distributed Computing Systems, New Orleans, USA, 458-465.

6- Zaki, M., Li, W. & Parthasarathy, S. (1996) Customized dynamic load balancing for a network of workstations. Proceedings of 5th IEEE International Symposium on High-Performance Distributed Computing (HPDC), Syracuse, USA, 282-291.