A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS

Mihai Horia Zaharia, Florin Leon, Dan Galea (2003) A Simulator for Load Balancing Analysis in Distributed Systems, in A. Valachi, D. Galea, A. M. Florea, M. Craus (eds.), Tehnologii informationale, Editura Universitatii Suceava, pp. 53-59, ISBN 973-666-59-

Mihai Horia ZAHARIA, Florin LEON, Dan GÂLEA
"Gh. Asachi" University of Iasi, Department of Computer Engineering
Bd-ul D. Mangeron Nr. 53 A
mike@cs.tuiasi.ro, fleon@cs.tuiasi.ro, dgalea@cs.tuiasi.ro

Abstract. Load balancing in distributed systems has been studied extensively at every level. Unfortunately, most models consider only homogeneous clusters. This assumption simplifies the station and communication models, which widens the range of load balancing algorithms that can be proposed and sometimes makes them simpler. Recent developments in Internet technology, however, require clusters to be both dynamic and heterogeneous. This paper presents a simulator for a heterogeneous cluster used in distributed computing.

Keywords: distributed systems, network topologies, load balancing, simulator.

1. Introduction

The computing power required by advanced research is always insufficient [3,5]. Until the last few decades, computing followed two clearly distinct directions: parallel computing and distributed systems. Owing to massive advances in both computing power and communications, a new direction has appeared. High Performance Computing is a mixture of parallel and distributed computing, which implies the use of heterogeneous dynamic clusters []. Two approaches dominate, following the operating systems market. One is the DotNet framework, introduced by Microsoft to provide a foundation for distributed applications and language-independent development.
The other is the grid computing direction, derived from the GLOBUS international project. This approach is common on UNIX/LINUX based computers and supercomputers and has application-level granularity. Some software vendors do offer a grid-style approach on Microsoft platforms, but these products are still at an early stage [,4]. There are also approaches that use JAVA to build the core required for a grid. Whichever approach is preferred, the main idea is to use various architectures distributed across the network as a single parallel supercomputer.

Unfortunately, most of the research and the available solutions address homogeneous clusters, because the required model is much simpler than the heterogeneous one. This led us to propose a general model of a station in a heterogeneous cluster. A graphical simulator was created in order to analyze how the system behaves under various loads and various types of node connections.

2. Cluster Model

One disadvantage of current approaches to distributed systems is the oversimplification caused by assuming that all the stations in the network are homogeneous [6]. In a real cluster, workstations differ in computing power and capabilities. In our model, a station is characterized by the following elements:
- unique ID;
- computing power;
- availability;
- task list.

The unique ID can have different meanings: an arbitrarily assigned number, the IP address, or the MAC address of the network interface. We assumed that the computing power of a machine can range from […] to […]. This does not cause a loss of generality, because computing power can be expressed relative to a standard machine by means of benchmark tests. For simulation purposes, we generated the computing power of the workstations in the network from a uniform random distribution.

The availability represents the amount of free resources. A running computer is never entirely free, since the operating system itself consumes memory and hard-disk space. For a particular OS, however, this consumption can be treated as approximately constant, so we focused on the remaining free resources. A computer with no user tasks was considered to have an availability of […]. As more tasks arrive, from the user or from other stations, the availability decreases until the computer becomes overloaded. An overloaded workstation can send tasks to its neighbors.

We modeled the tasks explicitly. A task is characterized by the physical space needed for temporary storage or transport over the network, the computing power a machine needs to actually solve it (the power request), and the time to solve. For consistency, these values are expressed using the same convention as for stations.
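As an illustration, the station and task attributes listed above could be held in two small record types. The following is only a sketch in Python, not the simulator's actual code: the class and field names are hypothetical, and the rule that a station counts as overloaded once its availability drops below zero is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A task; all values are expressed relative to the reference station."""
    space: float          # storage / transport size
    power_request: float  # computing power needed to solve the task
    time_to_solve: float  # time needed to solve the task


@dataclass
class Station:
    """A workstation in a heterogeneous cluster."""
    station_id: int         # could equally be an IP or MAC address
    computing_power: float  # relative to the reference station
    availability: float     # remaining free resources
    tasks: List[Task] = field(default_factory=list)

    @property
    def overloaded(self) -> bool:
        # Assumption: overloaded means the free resources are used up.
        return self.availability < 0
```

Keeping the task list on the station mirrors the four elements of the model, so a load balancing policy only needs to inspect `availability` and `tasks` to decide what to move.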
The values are stated for the unit station, which serves as the reference. If a task with a certain power request arrives on a computer with a given computing power, the actual power request is the default one divided by that computing power. This ratio is the availability needed to solve the task:

$$\mathrm{ActualDisponibilityNeed} = \frac{\mathrm{TaskPowerRequest}}{\mathrm{StationComputingPower}} \tag{1}$$

The task parameters were randomly generated using different distributions. The task space was:

$$\mathrm{TaskSpace} = \begin{cases} [\ldots] + \mathrm{ExpNeg}(3,5), & \text{if } [\ldots] + \mathrm{ExpNeg}(3,5) < 5 \\ 5, & \text{otherwise} \end{cases} \tag{2}$$

where ExpNeg is a function that generates random numbers with a negative exponential distribution:

$$\mathrm{ExpNeg}(\lambda, x) = \lambda e^{-\lambda x} \tag{3}$$

The power request was generated as a uniform random number between […] and 8. The time to solve was generated as the absolute value of a normal (Gaussian) random number:

$$\mathrm{TimeToSolve} = \left| \mathrm{Normal}(5,3) \right|, \tag{4}$$

where Normal is a function that generates Gaussian random numbers with a given mean and standard deviation:

$$\mathrm{Normal}(\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(\mu - \omega)^2}{2\sigma^2}} \tag{5}$$

In our simulator, we modeled the network connections explicitly. They can have a bandwidth of […] Mb/s or […] Mb/s, as in real networks. The space required to transfer a task is taken into account whenever the task must be sent from one station to another over a network connection. A larger task takes longer to transfer, and the available bandwidth of the connection has a significant influence on routing performance.

3. Simulator overview

In this section, several network topologies are presented in the simulator, along with a description of the system behavior when tasks are injected into the network. The simulation platform we considered had 64 workstations. Their parameters and the connections between them can be customized.

The simulation is time-based. Every second, the simulator recalculates the loads on stations and network connections. A station first tries to solve its tasks in FIFO order. When its availability is exceeded, it becomes overloaded and tries to move the tasks it cannot process to its neighbors. If all the neighbors are overloaded too, the tasks are put in a waiting queue. If there are free neighbors, the station chooses one neighbor for every extra task and sends the task to it. The corresponding network connection then begins its own processing, transporting a part of the task every second until the transfer is complete.

In the graphical simulation, we used the following coloring conventions:
- a blue station has an availability over 8;
- a yellow station has an availability between […] and 8;
- a red station is overloaded (has a negative availability);
- a black network connection is free or still available;
- a magenta network connection is full.

Figure 1 displays a mesh network, where each station is connected to all its neighbors.
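The per-second station behavior described in this section can be sketched roughly as follows. This is a simplified illustration rather than the simulator's implementation: the names `Station`, `step`, `queue` and `running` are hypothetical, task transfer is treated as instantaneous (link bandwidth is ignored), a task that does not fit is never started so availability stays non-negative, and a neighbor is assumed to be "free" when its availability is above zero.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Station:
    power: float                                  # computing power relative to the reference station
    free: float                                   # current availability
    queue: deque = field(default_factory=deque)   # waiting tasks as (power_request, seconds)
    running: list = field(default_factory=list)   # tasks being solved as [need, seconds_left]


def step(stations, neighbors, rng=random):
    """Advance every station by one simulated second."""
    moves = []                                    # (target_id, task) pairs, delivered after the loop
    for sid, st in stations.items():
        # Age the running tasks and give back the availability of finished ones.
        for job in st.running:
            job[1] -= 1
        st.free += sum(need for need, left in st.running if left <= 0)
        st.running = [job for job in st.running if job[1] > 0]

        # Examine waiting tasks in FIFO order and start the ones that fit.
        extra = []
        while st.queue:
            request, seconds = st.queue.popleft()
            need = request / st.power             # power request scaled by the station's computing power
            if need <= st.free:
                st.free -= need
                st.running.append([need, seconds])
            else:
                extra.append((request, seconds))

        # Send each extra task to one free neighbor, or keep it waiting.
        free_neighbors = [n for n in neighbors[sid] if stations[n].free > 0]
        for task in extra:
            if free_neighbors:
                moves.append((rng.choice(free_neighbors), task))
            else:
                st.queue.append(task)

    for target, task in moves:
        stations[target].queue.append(task)
```

Collecting the transfers in `moves` and delivering them only after all stations have been visited means a task sent during one second is first considered by its new station in the next second, regardless of the order in which stations are processed.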
In the running scenario, the first station (in the upper left corner of the LAN) receives a certain number of tasks and distributes them in the network. An interesting phenomenon takes place: at the beginning, the overloading of stations propagates in waves.

Figure 1. Mesh network
Figure 2. Tree network

On receiving all the tasks, the first station immediately becomes overloaded. It then overloads its immediate neighbors by sending the majority of its tasks to them. After that, it becomes again

available, but it in turn receives tasks from its overloaded neighbors. The same happens for stations situated at equal distances from the initiator, which gives the impression of a wave.

Figure 2 shows a tree network. Here, tasks are injected on the first station, but then they are transmitted to four other stars. One can notice that nodes on the upper levels are overloaded more often than the others. This simulation result agrees well with theoretical considerations: a well-known disadvantage of the tree topology is the single point of failure, also known as the root bottleneck. In fact, the overload appears on the higher levels of the tree, with a maximum at the root. A FAT tree topology may decrease the overload in tightly coupled architectures, but in a cluster another problem appears [7]: a simple workstation or dedicated server is simply not enough to solve the overload. That is why powerful wired structures are used in the management of communications centers. Unfortunately, in a distributed system where the cluster is dynamically elected by any self-elected workstation, the static structure of the Internet may or may not be helpful, depending on the relative position of the initiator.

A very popular network topology is the hyper-cube. Figure 3 shows a 5D hyper-cube. It has only 32 vertices, so half of the 64 stations are not connected. This topology shows the distribution of tasks in a cube configuration in a more intuitive way.

Figure 3. 5D hyper-cube network
Figure 3. 6D hyper-cube network

With 64 stations, a 6D hyper-cube is a more efficient approach (figure 4). In this case, the graphical configuration is not as obvious as the one in figure 3. To generate this topology, each station received an ID from 0 to 63, and connections were then automatically generated between every two stations whose IDs, written in binary, differed by exactly one bit.
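The one-bit rule used to generate the hyper-cube connections can be written in a few lines. The sketch below is an illustration in Python, not the simulator's code; the function name `hypercube_edges` is hypothetical. Flipping each bit of a station ID with XOR yields exactly the IDs that differ from it in one binary position.

```python
def hypercube_edges(dimension):
    """Return the links of a hyper-cube with stations numbered 0 .. 2**dimension - 1."""
    station_count = 1 << dimension
    edges = []
    for a in range(station_count):
        for bit in range(dimension):
            b = a ^ (1 << bit)      # ID that differs from a in exactly this bit
            if a < b:               # record each undirected link only once
                edges.append((a, b))
    return edges
```

For example, `hypercube_edges(6)` covers all 64 stations with 192 links and gives every station six neighbors, while `hypercube_edges(5)` only reaches IDs 0 to 31.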
Following the same scenario, one can see how quickly tasks are distributed throughout the hyper-cube. Within a very short time, practically no station remains free. This shows the high efficiency of this topology.

4. Performance study for different network topologies

Next, we studied the performance of task distribution in three scenarios, for three network topologies: mesh, tree and hyper-cube.

The first scenario was the one presented above: 5 tasks were introduced in the LAN through only one station. The simulation results for 5 trials are listed in table 1. The tree configuration proves to be the least efficient. When only a few big tasks are left to solve, they have little chance to leave their star and enter another, because to do so they must first traverse the next two nodes, starting with the root.

Table 1. First scenario: single task generator
Columns: Mesh, Tree, 6D Hyper-cube
66 574 348 5 58 8 3 49 478 33 3 36 49 7 4

Figures 4 and 5 display the system performance for a mesh topology in this scenario. At the beginning, all tasks are injected simultaneously into one station. The system is therefore overloaded at first, and many tasks are transmitted through the network connections. By the end, the majority of tasks have been solved, and only a few big tasks are still looking for a workstation powerful enough to solve them.

Figure 4. Mesh topology performance for single task generator: total availability
Figure 5. Mesh topology performance for single task generator: total network load

In the second scenario, we assumed that the stations had users who generated local tasks. For 5 […], on every station […] tries were attempted to generate a task, each with a probability of .5% (table 2).

Table 2. Second scenario: multiple task generators
Columns: Mesh, Tree, 6D Hyper-cube
3 47 5 53 57 47 6 96 37 74 3 4 6 77 35

In this scenario, the hyper-cube seems to have the best performance. Each station has six connections to its neighbors, so the locally generated tasks can be distributed and solved within an acceptably small vicinity of the initial vertex. The system as a whole is never overloaded (figures 6 and 7).

Figure 6. Hyper-cube topology performance for multiple task generators: total availability
Figure 7. Hyper-cube topology performance for multiple task generators: total network load

In the third scenario, only one huge task was injected, with a power request of 95, a needed space of 5 and a time to solve of […]. Obviously, only a powerful computer (with a computing power of […]) can solve such a task, so it moves through the system until it reaches a powerful station.

Table 3. Third scenario: single huge task
Columns: Mesh, Tree, 6D Hyper-cube
5 8 5 6 46 38 8 46 97 5 8 4

Again, the hyper-cube seems to have the best performance, followed by the mesh topology. Figures 8 and 9 show the worst case, the tree configuration, in which the task is transported through the network for a long time. Every computer that receives it becomes overloaded and has to send it on to its neighbors. Network connections transport it faster or slower, depending on their own speed ([…] Mb/s or […] Mb/s). Finally, a powerful workstation receives it and is able to solve it. One way to speed up this process is to introduce an intelligent routing algorithm, so that such a task can easily find the type of computer it requires.

Figure 8. Tree topology performance for single huge task: total availability
Figure 9. Tree topology performance for single huge task: total network load

5. Conclusions

The results suggest a new approach to routing in dynamic clusters, based on n-cube connection techniques. This is suitable, of course, only if the network has enough bandwidth and availability. From a network administrator's point of view, it is easy enough to assign IPs to stations so that they follow an n-cube connection pattern. The next series of analyses will also model network overload using a random or non-linear variation. The impact of heuristic algorithms for load balancing and cluster-level routing will be studied as well. The simulator can also be used as an educational instrument that helps students better understand the problems discussed here.

References

[1] Myers, D. S., Cummings, M. P. (2003) Necessity is the mother of invention: a simple grid computing system using commodity tools, Vol. 63 Issue 5, p578, p, Journal of Parallel & Distributed Computing.
[2] Lesyng, B., Bala, P., Erwin, D. (2003) EUROGRID: European computational grid testbed, Vol. 63 Issue 5, p59, 7p, Journal of Parallel & Distributed Computing.
[3] Chervenak, A., Deelman, E. et al. (2003) High-performance remote access to climate simulation data: a challenge problem for data grid technologies, Vol. 9 Issue, p335, p, Parallel Computing.
[4] Chien, A., Calder, B., Elbert, S., Bhatia, K. (2003) Entropia: architecture and performance of an enterprise desktop grid system, Vol. 63 Issue 5, p597, 4p, Journal of Parallel & Distributed Computing.
[5] Schissel, D. P., Finkelstein, A. et al. (2002) Data management, code deployment, and scientific visualization to enhance scientific discovery in fusion research through advanced computing, Vol. 6 Issue 3, p48, 6p, Fusion Engineering & Design.
[6] Evans, D. J., Butt, W. U. N. (1993) Dynamic Load Balancing using Task Transfer Probabilities, Vol. 9, pp 897-96, Parallel Computing.
[7] Hwang, K. (1993) Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, Inc.