Partition And Load Balancer on World Wide Web


JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 17, (2001)

UMPAL: An Unstructured Mesh Partitioner and Load Balancer on World Wide Web

WILLIAM C. CHU*, DON-LIN YANG, JEN-CHIH YU AND YEH-CHING CHUNG

*Department of Computer and Information Science, TungHai University, Taichung, Taiwan 407, R.O.C. chu@cis.thu.edu.tw
Department of Information Engineering, Feng Chia University, Taichung, Taiwan 407, R.O.C. {jcyu, dlyang, ychung}@iecs.fcu.edu.tw

The finite element method (FEM) has been widely used for the structural modeling of physical systems. Because of its computation-intensiveness and computation-locality, it is attractive to implement the finite element method on distributed memory multicomputers. Many research efforts have already provided solid algorithms for mesh partitioning and load balancing. However, without proper support, mesh partitioning and load balancing are labor intensive and tedious. In this paper, we present an unstructured mesh partitioner and load balancer (UMPAL) on the World Wide Web (WWW). UMPAL is an integrated tool that consists of five components: a partitioner, a load balancer, a simulator, a visualization tool, and a Web interface. The partitioner provides three partitioning methods, Jostle/DDM, Metis/DDM, and Party/DDM. The load balancer provides two load-balancing methods, prefix code matching parallel load-balancing and binomial tree based parallel load-balancing. The simulator provides a performance simulation environment for a partitioned mesh. By inputting the parameters of a target distributed memory multicomputer, one can obtain the simulated execution result of a partitioned mesh. The visualization tool provides a way for users to view a partitioned mesh. The Web interface provides a means for users to use UMPAL via the Internet and integrates the other four parts. Through the Web interface, the other four components can be operated independently or together. Additionally, UMPAL provides several demonstrations and their corresponding mesh models, which beginners can download and experiment with. UMPAL is designed with ease of use, efficiency, and transparency in mind. The experimental results demonstrate the practicality and usefulness of UMPAL.

Keywords: World Wide Web, partitioner, load balancer, unstructured mesh, Internet

Received September 14, 1999; revised May 25, 2000; accepted June 27. Communicated by Gen-Huey Chen.

1. INTRODUCTION

The finite element method has been widely used for structural modeling of physical systems. To solve a problem using the finite element method on a distributed memory multicomputer, in general, we first need to establish a finite element model for

the problem. Usually, the model is a 2D or 3D mesh, which is a connected and undirected graph that consists of a number of finite elements, each of which is composed of a number of nodes. The number of nodes of a finite element is determined by the application. In Fig. 1, an example of a 21-node 2D mesh of 24 finite elements is shown. Due to the properties of computation-intensiveness and computation-locality, it is very attractive to implement the finite element method on distributed memory multicomputers.

Fig. 1. An example of a 21-node 2D mesh with 24 finite elements (the circled and uncircled numbers denote the node numbers and finite element numbers, respectively).

To efficiently execute a finite element application program on a distributed memory multicomputer, we need to map the nodes of the corresponding mesh to the processors of the multicomputer such that each processor has the same amount of computational load and the communication among processors is minimized. Since this mapping problem is known to be NP-complete [9], many heuristics have been proposed to find satisfactory sub-optimal solutions [1, 7-8, 10, 12-13, 16-19, 21-26]. Based on these heuristics, many graph partitioners were developed [12, 16-17, 21-22, 24, 26]. Among them, Jostle [24], Metis [16], and Party [21] are regarded as the best graph partitioners available to date.

If the number of nodes in a mesh does not increase during the execution of a finite element application program, the mapping algorithm needs to be performed only once. For an adaptive mesh application program, the number of nodes increases during the execution because some finite elements are refined. This results in a load imbalance on the processors. A load-balancing algorithm then has to be performed many times in order to balance the computational load on processors while also keeping the communication cost among processors as low as possible. To deal with the load imbalance problem of an adaptive mesh computation, many load-balancing methods have been proposed in the literature [2-6, 11, 14-15, 20, 22, 24-25].

Without tool support, mesh partitioning and load balancing are labor intensive and tedious. In this paper, we present an unstructured mesh partitioner and load balancer

(UMPAL) on WWW. UMPAL is an integrated tool that consists of five components: a partitioner, a load balancer, a simulator, a visualization tool, and a Web interface. The partitioner provides three partitioning methods, Jostle/DDM, Metis/DDM, and Party/DDM. The load balancer provides two load-balancing methods, the prefix code matching parallel load-balancing method [3] and the binomial tree based parallel load-balancing method [4]. The simulator provides a performance simulation environment for a partitioned mesh. By inputting the parameters of a target distributed memory multicomputer, one can simulate the execution of a partitioned mesh. The visualization tool provides a way for users to view the partitioned mesh. The Web interface provides a means for users to use UMPAL via the Internet and integrates the other four parts. Through the Web interface, the other four components can be operated independently or in cooperation with the others. Additionally, UMPAL provides several demonstration examples and their corresponding models, which beginners can download and experiment with. The design of UMPAL is based on ease of use, efficiency, and transparency. Our experimental results demonstrate the practicality and usefulness of UMPAL.

The rest of this paper is organized as follows. Relevant work is given in Section 2. In Section 3, UMPAL is described in detail. In Section 4, some experimental results of using UMPAL are presented.

2. RELATED WORK

Many methods have been proposed to deal with the partitioning/mapping problems of irregular graphs on distributed memory multicomputers. In general, they can be divided into five classes: orthogonal section [23, 25], min-cut [7-11, 18], spectral [1, 13, 23], multilevel [1, 16-17, 22, 24], and other miscellaneous approaches [10, 19, 25]. These methods were implemented in several graph partition libraries, such as Chaco [12], DIME [26], Jostle [24], Metis [16], and Party [21].
In the orthogonal section approach, an irregular graph is partitioned into modules by recursively cutting the graph into two subgraphs according to the node coordinates on the x- and y-axes in turn. Each partitioned module has the same amount of computational load. These modules are then mapped to processors. Although this approach does not consider any connectivity information in the graph, it tries to group nodes that are close together in the graph into the same modules.

In the min-cut approach, the Kernighan-Lin (KL) heuristic [18] is the most frequently used method for local bisection. It uses a sequence of logical vertex pair exchanges to determine the sets to be physically exchanged. Several heuristics have been proposed to improve the performance of the KL heuristic [8]. In [7], a recursive min-cut bipartitioning algorithm was proposed to map graphs on hypercubes.

The spectral approach is based on algebraic graph theory. In this method, a matrix similar to the adjacency matrix of the graph is constructed and some specific eigenvectors of this matrix are computed. The determination of the eigenvectors is the major computational task of this method. Nodes of the graph are distributed to the corresponding processors according to the values of these eigenvectors. Spectral methods are effective for graph partitioning. However, the time and space they require to partition a graph are quite high.
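The orthogonal section approach described above can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the implementation used by any of the cited libraries: the function name `orthogonal_bisection` is hypothetical, every node is assumed to carry unit load, and the number of modules is assumed to be a power of two.

```python
def orthogonal_bisection(coords, num_parts):
    """Assign each node to a module by recursive coordinate bisection.

    coords    -- list of coordinate tuples, one per node (index = node id)
    num_parts -- number of modules; assumed to be a power of two
    Returns a list `part` where part[n] is the module of node n.
    """
    if num_parts < 1 or num_parts & (num_parts - 1) != 0:
        raise ValueError("num_parts must be a power of two")
    dims = len(coords[0]) if coords else 1
    part = [0] * len(coords)

    def split(nodes, parts, first_id, axis):
        if parts == 1:
            for n in nodes:
                part[n] = first_id
            return
        # Sort along the current axis (ties broken by node id) and cut in half,
        # so both halves carry (almost) the same number of unit-load nodes.
        ordered = sorted(nodes, key=lambda n: (coords[n][axis], n))
        half = len(ordered) // 2
        next_axis = (axis + 1) % dims  # alternate x, y, ... in turn
        split(ordered[:half], parts // 2, first_id, next_axis)
        split(ordered[half:], parts // 2, first_id + parts // 2, next_axis)

    split(list(range(len(coords))), num_parts, 0, 0)
    return part


# A 4 x 4 grid of nodes cut into 4 modules of 4 nodes each.
grid = [(x, y) for x in range(4) for y in range(4)]
modules = orthogonal_bisection(grid, 4)
```

Note that the sketch never looks at the edges of the graph, which matches the remark above that this approach ignores connectivity and relies on geometric proximity alone.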

The multilevel approach is based on a coarsening strategy that decreases the size of a graph over several levels using matching techniques. After coarsening, it uses the spectral method, a k-way partitioning method, or another partitioning method to partition the coarsened graph. After partitioning, the partition of the coarse graph is extrapolated to the original one. Then the partitioned modules are assigned to processors.

Other partition methods, such as index-based mapping [19], projection based mapping [10], and simulated annealing (SA) [25], use other heuristics to do graph partitioning and do not belong to the approaches described above.

To solve the load imbalance problem of adaptive mesh computations, many load-balancing algorithms can be used to balance the load on processors. The dimension exchange method (DEM) is applied to application programs without geometric structure [6]. Ou and Ranka [20] proposed a linear programming-based method to solve the incremental graph partitioning problem. Since their method restricts the scope of the nodes that can be transferred, it may sometimes yield no solution. Hu and Blake [15] proposed a direct diffusion method that computes the diffusion solution by using an unsteady heat conduction equation, while optimally minimizing the Euclidean norm of the data movement. They proved that a diffusion solution can be found by solving a linear equation. Heirich and Taylor [11] proposed a direct diffusive load-balancing method for scalable multicomputers. They derived a reliable and scalable load-balancing method based on properties of the parabolic heat equation $u_t - \alpha \nabla^2 u = 0$. Horton [14] proposed a multilevel diffusion method that recursively bisects a graph into two subgraphs and balances the load of the two subgraphs. This method assumes that the graph can be recursively bisected into two connected graphs. Schloegel et al. [22] also proposed a multilevel diffusion scheme to construct a new partition of the graph incrementally. It contains three phases: a coarsening phase, a multilevel diffusion phase, and a multilevel refinement phase. These algorithms perform diffusion in a multilevel framework and minimize data movement without compromising the edge-cut. Their methods also include parameterized heuristics to specifically optimize edge-cut, total data migration, and the maximum amount of data migrated in and out of each processor. Walshaw et al. [24] implemented a parallel partitioner and a direct diffusion repartitioner in Jostle that is based on the diffusion solver proposed by Hu and Blake [15]. They also developed a multilevel diffusion repartitioner in Jostle.

Although several graph partitioning and load balancing methods have been implemented as tools or libraries [12, 16, 21, 24, 26], none of them offers a Web interface or high-level support to users.

3. THE SYSTEM STRUCTURE OF UMPAL

The system structure of UMPAL is shown in Fig. 2. It consists of five components: a partitioner, a load balancer, a simulator, a visualization tool, and a Web interface. Users can upload the unstructured mesh data and get the running results using any Web browser. Through the Web interface, the other four components can be operated independently or run cooperatively. In the following, we describe them in detail.

Fig. 2. The system structure of UMPAL (components shown: user, Web interface, visualization tool, simulator, load balancer, partitioner).

3.1 The Partitioner

In the partitioner, we provide three partitioning methods, Jostle/DDM, Metis/DDM, and Party/DDM. They were implemented based on the best algorithms provided in Jostle, Metis, and Party, respectively, combined with the dynamic diffusion optimization method (DDM) [5]. The partitioner of UMPAL has the following advantages:

1. In Jostle, Metis, and Party, a 3% to 5% load imbalance among partitioned modules is allowed. The dynamic diffusion optimization method can efficiently balance the 3% to 5% load imbalance allowed by these three methods and, in doing so, reduce the total cut-edges of the partitioned modules. Therefore, the partition methods provided in the partitioner perform better than their regular counterparts, i.e., Jostle, Metis, and Party.

2. The partition results of Jostle, Metis, and Party depend on the shapes of unstructured meshes. It is difficult to tell which one performs best for a given unstructured mesh. To get the best result among these three partitioners, we need to run them separately. Since the parameters used in these three partitioners are different, it may take some time to get the desired results. By integrating the Jostle/DDM, Metis/DDM, and Party/DDM methods in one partitioner, one can try each method once and take the best partitioning result, because the parameters for these three methods are uniform in UMPAL.

The flow chart of the partitioner is given in Fig. 3. From Fig. 3, we can see that the inputs of the partitioner are the number of processors and a file of an unstructured mesh connection model. The number of processors specifies how many processors will be involved in the partitioning process. The file of an unstructured mesh connection model can be uploaded from a user's Web browser or can be specified by using a demo model. In UMPAL, we provide five 2D and two 3D unstructured demo meshes. In an

unstructured mesh connection model file, the first line specifies the numbers of nodes and edges of an unstructured mesh. The second line describes the neighbors of node 1. The third line describes the neighbors of node 2, and so on. Fig. 4 shows an example file of the connection model of an unstructured mesh.

Fig. 3. The flow chart of the partitioner (inputs: unstructured mesh connection model, number of processors; partition methods: Metis/DDM, Party/DDM, and Jostle/DDM; outputs: partitioned unstructured mesh file, total cut-edges, and load balancing degree).

Fig. 4. Format of the unstructured mesh connection model file (annotations in the example: the model contains 100 nodes and 300 edges; the neighbor nodes of node 1, which are nodes 2, 3, 4 and ...; the neighbor nodes of node 2, which are nodes 1, 3, and 4; ...).

The outputs of the partitioner are a partitioned unstructured mesh file and the partitioned results. In a partitioned unstructured mesh file, a number j in line i indicates that node i belongs to processor j. Users can download the partitioned unstructured mesh file for further use and see the partitioned results on a Web browser. The partitioned results include the load balancing degree and the total cut-edges of a partitioned unstructured mesh. Figs. 5 and 6 show the Web page of the partitioner and the partitioned results of an unstructured mesh Truss, respectively.

3.2 The Load Balancer

In the load balancer, we provide two load-balancing methods, the prefix code matching parallel load-balancing (PCMPLB) method [3] and the binomial tree based parallel load-balancing (BINOTPLB) method [4]. From the flow chart of the load balancer given in Fig. 7, we can see that the inputs of the load balancer are the number of processors and the files of the connection model, the element model, and the partitioned model of an unstructured mesh. The number of processors specifies how many processors will be involved in the load balancing process. The data format of the connection

model file of an unstructured mesh is the same as that described in the partitioner. In the element model file of an unstructured mesh, the first line specifies the number of elements. The second line describes the nodes of element 1. The third line describes the nodes of element 2, and so on. Fig. 8 gives an example file showing the format of the element model of an unstructured mesh. The data format of the partitioned model file of an unstructured mesh is the same as that of the output file of the partitioner. In the load balancer, users can also use the partitioned unstructured demo mesh model provided by UMPAL. In this case, the inputs are the load imbalance degree and the number of processors.

Fig. 5. The Web page of the partitioner.

Fig. 6. The partitioned results of Truss.

Fig. 7. The flow chart for the load balancer (inputs: unstructured mesh element model, unstructured mesh connection model, partitioned unstructured mesh file, number of processors; load balancing methods: PCMPLB and BINOTPLB; outputs: load-balanced unstructured mesh file, total cut-edges, and load balancing degree).

Fig. 8. Format of the unstructured mesh element model file (annotations in the example: the model contains 300 elements; nodes 2, 3, and 4 form element 1; nodes 1, 3, and 4 form element 2; ...).
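To make the file formats above concrete, the following sketch reads a connection model and a partitioned mesh file and derives the two reported quantities. It is a hedged illustration, not UMPAL's code: the function names are hypothetical, tokens are assumed to be whitespace-separated, processors are assumed to be numbered from 0, and, since the text does not define the load balancing degree, the sketch reports the ratio of the maximum processor load to the average load as one plausible stand-in.

```python
def parse_connection_model(text):
    """Line 1: number of nodes and edges; line i+1: neighbors of node i."""
    lines = text.splitlines()
    num_nodes, num_edges = (int(tok) for tok in lines[0].split()[:2])
    neighbors = {}
    for node in range(1, num_nodes + 1):
        row = lines[node] if node < len(lines) else ""
        neighbors[node] = [int(tok) for tok in row.split()]
    return num_nodes, num_edges, neighbors


def parse_partition(text):
    """The number j on line i means node i belongs to processor j."""
    return {node: int(tok) for node, tok in enumerate(text.split(), start=1)}


def total_cut_edges(neighbors, owner):
    """Count undirected edges whose two end nodes lie on different processors."""
    return sum(
        1
        for u, nbrs in neighbors.items()
        for v in nbrs
        if u < v and owner[u] != owner[v]
    )


def load_imbalance(owner, num_procs):
    """Assumed metric: maximum processor load divided by the average load."""
    loads = [0] * num_procs
    for proc in owner.values():
        loads[proc] += 1
    return max(loads) / (len(owner) / num_procs)


# Hypothetical 4-node ring 1-2-3-4-1 split across two processors.
connection_text = "4 4\n2 4\n1 3\n2 4\n1 3\n"
partition_text = "0\n0\n1\n1\n"
n_nodes, n_edges, nbrs = parse_connection_model(connection_text)
owner = parse_partition(partition_text)
cut = total_cut_edges(nbrs, owner)
imbalance = load_imbalance(owner, 2)
```

In this toy mesh the edges 2-3 and 4-1 cross the processor boundary, so the cut is 2 and both processors hold two nodes each.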

The outputs of the load balancer are a load-balanced unstructured mesh file and the load balancing results. The data format of a load-balanced unstructured mesh file is the same as that of the output file of the partitioner. Users can download the load-balanced unstructured mesh file for further use and see the load balancing results on a Web browser. The load balancing results include the load balancing degree and the total cut-edges. Fig. 9 shows the Web page of the load balancer. Fig. 10 shows the load balancing results of a partitioned unstructured mesh Truss with a 5% load imbalance on 10 processors.

Fig. 9. The Web page of the load balancer.

Fig. 10. The load-balanced results of Truss.

3.3 The Simulator

The simulator provides a simulated distributed memory multicomputer for the performance evaluation of a partitioned unstructured mesh. The execution time of an unstructured mesh on a P-processor distributed memory multicomputer under a particular mapping/load-balancing method $L_i$ can be defined as follows:

$T_{par}(L_i) = \max_{0 \le j \le P-1} \{ T_{comp}(L_i, P_j) + T_{comm}(L_i, P_j) \}$, (1)

where $T_{comp}(L_i, P_j)$ is the computation cost of processor $P_j$ under $L_i$, and $T_{comm}(L_i, P_j)$ is the communication cost of processor $P_j$ under $L_i$, for $j = 0, \ldots, P-1$. The cost model used in Equation 1 assumes a synchronous communication mode in which each processor goes through a computation phase followed by a communication phase. Therefore, the computation cost of processor $P_j$ under a mapping/load-balancing method $L_i$ can be defined as follows:

$T_{comp}(L_i, P_j) = S \times load_i(P_j) \times T_{task}$, (2)

where $S$ is the number of iterations performed by a finite element method, $load_i(P_j)$ is the number of nodes of an unstructured mesh assigned to processor $P_j$, and $T_{task}$ is the time for a processor to execute the tasks of a node. For the communication model, we assume a synchronous communication mode and

that every pair of processors can communicate with each other in one step. In general, it is possible to overlap communication with computation. In this case, $T_{comm}(L_i, P_j)$ may not always reflect the true communication cost since it would partially overlap the computation. However, $T_{comm}(L_i, P_j)$ should provide a good estimate for the communication cost. Since we use a synchronous communication mode, $T_{comm}(L_i, P_j)$ can be defined as follows:

$T_{comm}(L_i, P_j) = S \times (\delta \times T_{setup} + \phi \times T_c)$, (3)

where $S$ is the number of iterations performed by a finite element method, $\delta$ is the number of processors that processor $P_j$ sends data to in each iteration, $T_{setup}$ is the setup time of the I/O channel, $\phi$ is the total amount of data that processor $P_j$ sends out in each iteration, and $T_c$ is the data transmission time of the I/O channel per byte.

Fig. 11. Simulator flow chart (inputs: partitioned or load-balanced unstructured mesh file, communication setup time, data transmission time, and execution time for one task; output: maximum processing time among all processors).

The simulator flow chart is given in Fig. 11. To use the simulator, users need to input the partitioned or load-balanced unstructured mesh file and the values of $S$, $T_{setup}$, $T_c$, $T_{task}$, and the number of bytes sent by a finite element node to its neighbors. The partitioned or load-balanced unstructured mesh file can be uploaded from a user's browser or it can be a demo file provided by UMPAL. The data format of the partitioned or load-balanced unstructured mesh file is the same as that described for the partitioner and the load balancer. The outputs of the simulator are the execution time of the unstructured mesh on a simulated distributed memory multicomputer and the total cut-edges of a partitioned unstructured mesh. Fig. 12 shows the Web page for the simulator, and Fig. 13 presents the simulation results of Truss on a simulated 10-processor SP2.

3.4 The Visualization Tool

UMPAL also provides a visualization tool for visualizing the partitioned unstructured mesh. The working flow for the visualization tool is shown in Fig. 14. The inputs of the visualization tool are the coordinate model file, the element model file, the partitioned unstructured mesh file, and the image size. In the coordinate model file for an unstructured mesh, line 1 specifies the number of nodes, line 2 specifies the x, y, z

coordinates of node 1, line 3 specifies the coordinates of node 2, and so on. Fig. 15 illustrates the file format for an unstructured mesh coordinate model. The data formats of the element model and the partitioned model of an unstructured mesh are the same as those described in the load balancer. After rendering, a Web browser displays the unstructured mesh in different colors, with each color representing one processor. Currently, the visualization tool can only display partitioned 2D meshes. The visualization of partitioned 3D meshes is still under development. Fig. 16 shows the visualization tool Web page. Fig. 17 shows the rendering result of Letter_S.

Fig. 12. Simulator Web page.

Fig. 13. Simulation results.

Fig. 14. Visualization tool flow chart (inputs: unstructured mesh element model, unstructured mesh coordinate model, partitioned or load-balanced unstructured mesh file, width and height of image; output: image of unstructured mesh).

Fig. 15. Data format of an unstructured mesh coordinate model (annotations in the example: the model contains 300 nodes; x, y, z coordinates of node 1; x, y, z coordinates of node 2; ...).
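The cost model of Section 3.3, Equations (1) to (3), can be sketched in a few lines. This is an illustration under assumptions rather than the simulator's actual code: the function name `simulate_execution_time` is hypothetical, and the way the per-iteration data volume is derived from the mesh (each node sends `bytes_per_node` once for every neighbor that lives on another processor) is an assumption, since the text only says how many bytes a node sends to its neighbors.

```python
def simulate_execution_time(neighbors, owner, num_procs, iterations,
                            t_setup, t_c, t_task, bytes_per_node):
    """Return max over processors of T_comp + T_comm (Equations 1 to 3).

    neighbors -- dict: node -> list of neighbor nodes
    owner     -- dict: node -> processor id in 0..num_procs-1
    All times must use the same unit (e.g., microseconds).
    """
    load = [0] * num_procs                        # load_i(P_j): nodes on P_j
    dests = [set() for _ in range(num_procs)]     # processors P_j sends to
    sent = [0] * num_procs                        # bytes P_j sends per iteration
    for u, nbrs in neighbors.items():
        pu = owner[u]
        load[pu] += 1
        for v in nbrs:
            pv = owner[v]
            if pv != pu:
                dests[pu].add(pv)
                sent[pu] += bytes_per_node        # assumed: one message per cut edge
    times = []
    for j in range(num_procs):
        t_comp = iterations * load[j] * t_task                          # Eq. (2)
        t_comm = iterations * (len(dests[j]) * t_setup + sent[j] * t_c)  # Eq. (3)
        times.append(t_comp + t_comm)
    return max(times)                                                   # Eq. (1)


# Hypothetical 4-node ring on two processors, one iteration, times in microseconds.
ring = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
ring_owner = {1: 0, 2: 0, 3: 1, 4: 1}
t_par = simulate_execution_time(ring, ring_owner, 2, iterations=1,
                                t_setup=46.0, t_c=0.035, t_task=350.0,
                                bytes_per_node=40)
```

With these inputs each processor computes two nodes (700 µs), pays one channel setup (46 µs), and sends 80 bytes (2.8 µs), giving roughly 748.8 µs. The timing constants are the SP2 values quoted in Section 4.3; the iteration count of 1 is chosen only for the example.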

Fig. 16. Visualization tool Web page.

Fig. 17. Result of rendering Letter_S.

3.5 The Web Interface

The Web interface allows users to try the various components of UMPAL. The interface consists of two parts, an HTML interface and a CGI interface. The HTML interface provides Web pages for users to input requests from Web browsers. The CGI interface is responsible for handling these requests. Through the Web interface, the other four components can be operated independently or run in cooperation. The Web interface flow chart is shown in Fig. 18. When a user operates a component independently, the Web interface passes the requests to that component. The component then processes the requests and produces an output. When a request involves more than one component, the Web interface has to control the data flow between the requested components. In this case, the partitioner is always executed before the load balancer, the load balancer before the simulator, and the simulator before the visualization tool. Fig. 19 gives an example of specifying a cooperative process, while Fig. 20 presents the computed solution for the specified problem.

3.6 Implementation of UMPAL

In order to support standard WWW browsers, the front end is coded in HTML using CGI. The CGI interface is implemented in the Perl language. The CGI interface receives the data and parameters from the forms of the HTML interface, and then it calls external tools to handle the requests. The UMPAL tools (the partitioner, the load balancer, and the simulator) are coded in the C programming language. They receive parameters from the CGI interface and use the specified methods to process user requests.

To support an interactive visualization tool, a client/server software architecture is used in UMPAL. On the client side, a Java Applet is implemented to display images rendered by the server. On the server side, a Java server-let is implemented as a Java Application. The Java server-let renders an image from the specified size and unstructured mesh models. When the server finishes its rendering work, it sends the final image to the client side so that users can see it.

Fig. 18. The Web interface flow chart (the user uploads the required unstructured mesh models through the HTML interface; the CGI interface routes the connection, element, coordinate, partitioned, and load-balanced models among the partitioner, load balancer, simulator, and visualization tool; the total cut-edges, simulation time, and image of the unstructured mesh model are returned through the service provider's Web pages).

Fig. 19. An example of specifying a cooperative process through the Web interface: (a) the upper half; (b) the lower half.

Fig. 20. The solution for the problem given in Fig. 19.

4. EXPERIENCE AND EXPERIMENTAL RESULTS

In this section, we present some experimental results for unstructured meshes obtained by using the partitioner, the load balancer, and the simulator of UMPAL through a Web browser.

4.1 Experimental Results for the Partitioner

To evaluate the performance of Jostle/DDM, Metis/DDM, and Party/DDM, three 2D and two 3D unstructured meshes are used as test samples. The initial 2D unstructured meshes, Hook, Letter_S, and Truss, were created by using the distributed irregular mesh environment (DIME) [26] and then refined by our mesh refinement algorithm. The 3D unstructured meshes, Femur and Tibia, were produced by using our automatic mesh generation program on source images obtained from CT (computed tomography). These five unstructured meshes are part of a set of demo meshes provided in UMPAL and are shown in Fig. 21. The number of nodes, the number of elements, and the number of edges of these five unstructured meshes are given in Table 1. For presentation purposes, the number of nodes, number of elements, and number of edges of the irregular finite element graphs shown in Fig. 17 are less than those shown in Table 1.

Table 2 shows the total cut-edges of Jostle/DDM, Metis/DDM, and Party/DDM with their counterparts for the three 2D and two 3D unstructured meshes on 70 processors. The total cut-edges of Jostle, Metis, and Party were obtained by running these three partitioners with default values. The load imbalance degrees allowed by Jostle, Metis, and Party are 3%, 5%, and 5%, respectively. The total cut-edges of Jostle/DDM,

Fig. 21. Unstructured meshes used to evaluate performance: (a) Hook (1849 nodes, 3411 elements); (b) Letter_S (6075 nodes, elements); (c) Truss (7325 nodes, elements); (d) Femur (6141 nodes, 7448 elements); (e) Tibia (973 nodes, 1168 elements).

Metis/DDM, and Party/DDM were obtained by applying the dynamic diffusion optimization method (DDM) [5] to the partitioned results of Jostle, Metis, and Party, respectively. Jostle/DDM, Metis/DDM, and Party/DDM guarantee that the load among partitioned modules is fully balanced. From Table 2, we can see that the methods provided in the partitioner produce fewer total cut-edges than their counterparts.

Table 1. The number of nodes, elements, and edges in the test samples. Columns: Samples, #node, #element, #edges. Rows: Hook, Letter_S, Truss, Femur, Tibia.

Table 2. The total cut-edges of the methods provided in the partitioner and their counterparts. Columns: Model, Jostle, Jostle/DDM, Metis, Metis/DDM, Party, Party/DDM. Rows: Truss, Letter_S, Hook, Tibia, Femur.

4.2 Experimental Results for the Load Balancer

To evaluate the performance of the prefix code matching parallel load-balancing method (PCMPLB) [3] and the binomial tree based parallel load-balancing method (BINOTPLB) [4] provided in the load balancer, we compare these two methods with the direct diffusion method (DD) and the multilevel diffusion method (MD). In the experiment, 3%, 5%, and 10% load imbalance cases for the 2D and 3D unstructured meshes were tested. We modified the multilevel k-way partitioning (MLkP) program provided in Metis to generate the desired test samples. The methods provided in the load balancer guarantee that the load among partitioned modules will be fully balanced, whereas the DD and MD methods do not. Table 3 shows the total cut-edges produced by DD, MD, PCMPLB, and BINOTPLB for three 2D unstructured meshes on 50 processors. We can see that the methods provided in the load balancer outperform the DD and MD methods in most cases. The load balancing results of PCMPLB and BINOTPLB depend on the test samples. It is difficult to tell which performs better for a given partitioned unstructured mesh. However, one can run both methods in the load balancer, check the results, and choose the better one.

Table 3. The total cut-edges produced by DD, MD, PCMPLB, and BINOTPLB for three 2D unstructured meshes on 50 processors. Columns: Model, Load imbalance degree, DD, MD, PCMPLB, BINOTPLB. Rows: Truss, Letter_S, Hook, Tibia, Femur, each at 3%, 5%, and 10% load imbalance.

4.3 Experience with the Simulator

In this experimental test, we simulate the execution of a parallel Laplace solver on a 70-processor SP2 parallel machine. According to [3], the values of $T_{setup}$, $T_c$, and $T_{task}$ are 46 µs, 0.035 µs, and 350 µs, respectively. Each finite element node needs to send 40 bytes to its neighbor nodes. The number of iterations performed by the Laplace solver is set to [...]. Table 4 shows the simulator output for the test samples shown in Fig. 21 under different partitioning methods. For comparison, we also include the simulation results of the test samples under Jostle, Metis, and Party. From Tables 2 and 4, we can see that, in general, the fewer the total cut-edges, the lower the execution time. This simulation may provide a reference to help in choosing the right method for a given unstructured mesh.

Table 4. The simulator output for the test samples shown in Fig. 21 (time in seconds). Columns: Model, Jostle, Jostle/DDM, Metis, Metis/DDM, Party, Party/DDM. Rows: Truss, Letter_S, Hook, Tibia, Femur.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented a software tool, UMPAL, for handling partitioning and load balancing problems for unstructured meshes on the World Wide Web. Users can try UMPAL by accessing its Internet address.

UMPAL is an integrated tool that consists of five components: a partitioner, a load balancer, a simulator, a visualization tool, and a Web interface. It was designed to be easy to use, efficient, and transparent. The experimental results presented here demonstrate the practicality and usefulness of UMPAL.

There are several advantages of using UMPAL over the Web. First, users do not need to obtain licenses for any of the software packages. Also, there is no need to install, maintain, or upgrade the software. All users use the latest version, which brings standardization to the tools. The integration of different methods into UMPAL has made experiments and simulations of parallel programs simple and cost effective. UMPAL offers a high-level and user-friendly interface. Furthermore, the demonstrations can teach beginners how to apply FEM to solve problems in parallel.

In UMPAL, we only offer a simulator to execute the partitioned/load-balanced results produced by the partitioner/load balancer. It is possible to generate parallel code for real machines, e.g., IBM SP2 or PC clusters, according to the partitioner/load balancer results. In the future, we plan to add a parallel PDE code generator to UMPAL.

One typical drawback of tools on the Web is that performance degrades when there are multiple simultaneous requests. To solve this problem, UMPAL can either be executed on a more powerful computer or be executed on a cluster of machines. For the current implementation of UMPAL, execution on a more powerful computer is the only possible way. In the future, we will implement a parallel/distributed version of UMPAL to enhance its performance.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Robert Preis, Professor G. Karypis, and Professor Chris Walshaw for providing the Party, Metis, and Jostle software packages, respectively.

REFERENCES

1. S. T. Barnard and H. D. Simon, "Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems," Concurrency: Practice and Experience, Vol. 6, No. 2, 1994.
2. Y. C. Chung and C. J. Liao, "Tree-based parallel load-balancing methods for solution-adaptive unstructured finite element models on distributed memory multicomputers," IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 4, 1999.
3. Y. C. Chung, C. J. Liao, and D. L. Yang, "A prefix code matching parallel load-balancing method for solution-adaptive unstructured finite element graphs on distributed memory multicomputers," The Journal of Supercomputing, Vol. 15, No. 1, 2000.
4. Y. C. Chung and C. J. Liao, "A binomial tree-based parallel load-balancing methods for solution-adaptive unstructured finite element graphs on distributed memory multicomputers," in Proceedings of 1998 International Conference on Parallel CFD, 1998.

5. Y. C. Chung, D. L. Yang, C. C. Chen, and C. J. Liao, A dynamic diffusion optimization method for irregular finite element graph partitioning, The Journal of Supercomputing, Vol. 17, No. 1, 2000, pp.
6. G. Cybenko, Dynamic load balancing for distributed memory multiprocessors, Journal of Parallel and Distributed Computing, Vol. 7, No. 2, 1989, pp.
7. F. Ercal, J. Ramanujam, and P. Sadayappan, Task allocation onto a hypercube by recursive mincut bipartitioning, Journal of Parallel and Distributed Computing, Vol. 10, No. 1, 1990, pp.
8. C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network partitions, in Proceedings of the 19th IEEE Design Automation Conference, 1982, pp.
9. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, CA: Freeman.
10. J. R. Gilbert, G. L. Miller, and S. H. Teng, Geometric mesh partitioning: implementation and experiments, in Proceedings of 9th International Parallel Processing Symposium, 1995, pp.
11. A. Heirich and S. Taylor, A parabolic load balancing method, in Proceedings of International Conference on Parallel Processing '95, 1995, pp.
12. B. Hendrickson and R. Leland, The Chaco User's Guide: Version 2.0, Technical Report SAND, Sandia National Laboratories, Albuquerque, NM.
13. B. Hendrickson and R. Leland, An improved spectral graph partitioning algorithm for mapping parallel computations, SIAM Journal on Scientific Computing, Vol. 16, No. 2, 1995, pp.
14. G. Horton, A multi-level diffusion method for dynamic load balancing, Parallel Computing, Vol. 19, No. 2, 1993, pp.
15. Y. F. Hu and R. J. Blake, An Optimal Dynamic Load Balancing Algorithm, Technical Report DL-P, Daresbury Laboratory, Warrington, UK.
16. G. Karypis and V. Kumar, Multilevel k-way partitioning scheme for irregular graphs, Journal of Parallel and Distributed Computing, Vol. 48, No. 1, 1998, pp.
17. G. Karypis and V. Kumar, A parallel algorithm for multilevel graph partitioning and sparse matrix ordering, Journal of Parallel and Distributed Computing, Vol. 48, No. 1, 1998, pp.
18. B. W. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal, Vol. 49, No. 2, 1970, pp.
19. C. W. Ou, S. Ranka, and G. Fox, Fast and parallel mapping algorithms for irregular problems, The Journal of Supercomputing, Vol. 10, No. 2, 1996, pp.
20. C. W. Ou and S. Ranka, Parallel incremental graph partitioning, IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 8, 1997, pp.
21. R. Preis and R. Diekmann, The PARTY Partitioning Library User Guide Version 1.1, Heinz Nixdorf Institute, Universität Paderborn, Germany.
22. K. Schloegel, G. Karypis, and V. Kumar, Multilevel diffusion schemes for repartitioning of adaptive meshes, Journal of Parallel and Distributed Computing, Vol. 47, No. 2, 1997, pp.
23. H. D. Simon, Partitioning of unstructured problems for parallel processing, Computing Systems in Engineering, Vol. 2, No. 2/3, 1991, pp.

24. C. H. Walshaw, M. Cross, and M. G. Everett, Parallel dynamic graph partitioning for adaptive unstructured meshes, Journal of Parallel and Distributed Computing, Vol. 47, No. 2, 1997, pp.
25. R. D. Williams, Performance of dynamic load balancing algorithms for unstructured mesh calculations, Concurrency: Practice and Experience, Vol. 3, No. 5, 1991, pp.
26. R. D. Williams, DIME: Distributed Irregular Mesh Environment, California Institute of Technology.

William Cheng-Chung Chu is an Associate Professor in the Department of Computer and Information Science at Tung-Hai University, Taiwan. From 1994 to 1998, he was an Associate Professor in the Department of Information Engineering at Feng-Chia University, Taiwan. Prior to that, he was a research scientist at the Software Technology Center of the Palo Alto Research Laboratories of Lockheed Missiles and Space Company, Inc., where he received special contribution awards from Lockheed in both 1992 and [year missing]. In 1992, he was a Visiting Scholar in the Department of Engineering Economic Systems at Stanford University, where he was involved in projects related to intelligent knowledge-based expert systems. His current research interests include software reengineering, maintenance, reuse, software quality, and e-commerce. He received his M.S. and Ph.D. degrees from Northwestern University in Evanston, Illinois, in 1987 and 1989, respectively, both in Computer Science. His e-mail address is chu@cis.thu.edu.tw.

Don-Lin Yang received a B.E. degree in Computer Science from Feng Chia University in 1973, an M.S. degree in Applied Science from the College of William and Mary in 1979, and a Ph.D. degree in Computer Science from the University of Virginia in [year missing]. Prior to joining the Department of Information Engineering at Feng Chia University in 1991, he was a staff programmer at IBM Santa Teresa Laboratory from 1985 to 1987 and a member of technical staff at AT&T Bell Laboratories from 1987 to [year missing]. Dr. Yang is currently an associate professor. His research interests include distributed and parallel computing, data mining, image processing, and network management. He is also a member of the IEEE Computer Society and the ACM.

Jen-Chih Yu received his B.S. and M.S. degrees in Information Engineering from Feng Chia University, Taichung, Taiwan, in 1997 and 1999, respectively. His research interests include parallel volume rendering design, computer graphics, visualization, parallel processing, and parallel algorithms.

Yeh-Ching Chung was born in [year missing]. He received a B.S. degree in computer science from Chung Yuan Christian University in 1983, and M.S. and Ph.D. degrees in computer and information science from Syracuse University in 1988 and 1992, respectively. Currently, he is a Professor and the Chair of the Department of Information Engineering at Feng Chia University, where he directs the Parallel and Distributed Processing Laboratory. His research interests include parallel compilers, parallel programming tools, mapping, scheduling, load balancing, embedded systems, and virtual reality.


More information

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

DYNAMIC LOAD BALANCING SCHEME FOR ITERATIVE APPLICATIONS

DYNAMIC LOAD BALANCING SCHEME FOR ITERATIVE APPLICATIONS Journal homepage: www.mjret.in DYNAMIC LOAD BALANCING SCHEME FOR ITERATIVE APPLICATIONS ISSN:2348-6953 Rahul S. Wankhade, Darshan M. Marathe, Girish P. Nikam, Milind R. Jawale Department of Computer Engineering,

More information

Computer Science. General Education Students must complete the requirements shown in the General Education Requirements section of this catalog.

Computer Science. General Education Students must complete the requirements shown in the General Education Requirements section of this catalog. Computer Science Dr. Ilhyun Lee Professor Dr. Ilhyun Lee is a Professor of Computer Science. He received his Ph.D. degree from Illinois Institute of Technology, Chicago, Illinois (1996). He was selected

More information

Dynamic load balancing in computational mechanics

Dynamic load balancing in computational mechanics Comput. Methods Appl. Mech. Engrg. 184 (2000) 485±500 www.elsevier.com/locate/cma Dynamic load balancing in computational mechanics Bruce Hendrickson *, Karen Devine Parallel Computing Sciences Department,

More information

THE DESIGN OF AN EFFICIENT LOAD BALANCING ALGORITHM EMPLOYING BLOCK DESIGN. Ilyong Chung and Yongeun Bae. 1. Introduction

THE DESIGN OF AN EFFICIENT LOAD BALANCING ALGORITHM EMPLOYING BLOCK DESIGN. Ilyong Chung and Yongeun Bae. 1. Introduction J. Appl. Math. & Computing Vol. 14(2004), No. 1-2, pp. 343-351 THE DESIGN OF AN EFFICIENT LOAD BALANCING ALGORITHM EMPLOYING BLOCK DESIGN Ilyong Chung and Yongeun Bae Abstract. In order to maintain load

More information

Three Effective Top-Down Clustering Algorithms for Location Database Systems

Three Effective Top-Down Clustering Algorithms for Location Database Systems Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr

More information

The Construction of Seismic and Geological Studies' Cloud Platform Using Desktop Cloud Visualization Technology

The Construction of Seismic and Geological Studies' Cloud Platform Using Desktop Cloud Visualization Technology Send Orders for Reprints to reprints@benthamscience.ae 1582 The Open Cybernetics & Systemics Journal, 2015, 9, 1582-1586 Open Access The Construction of Seismic and Geological Studies' Cloud Platform Using

More information

UPS battery remote monitoring system in cloud computing

UPS battery remote monitoring system in cloud computing , pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

More information

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 ParFUM: A Parallel Framework for Unstructured Meshes Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 What is ParFUM? A framework for writing parallel finite element

More information

Novel Hierarchical Interconnection Networks for High-Performance Multicomputer Systems

Novel Hierarchical Interconnection Networks for High-Performance Multicomputer Systems JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 20, 1213-1229 (2004) Short Paper Novel Hierarchical Interconnection Networks for High-Performance Multicomputer Systems GENE EU JAN, YUAN-SHIN HWANG *, MING-BO

More information

Study on Redundant Strategies in Peer to Peer Cloud Storage Systems

Study on Redundant Strategies in Peer to Peer Cloud Storage Systems Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 235S-242S Study on Redundant Strategies in Peer to Peer Cloud Storage Systems Wu Ji-yi 1, Zhang Jian-lin 1, Wang

More information

Performance Comparison of Database Access over the Internet - Java Servlets vs CGI. T. Andrew Yang Ralph F. Grove

Performance Comparison of Database Access over the Internet - Java Servlets vs CGI. T. Andrew Yang Ralph F. Grove Performance Comparison of Database Access over the Internet - Java Servlets vs CGI Corresponding Author: T. Andrew Yang T. Andrew Yang Ralph F. Grove yang@grove.iup.edu rfgrove@computer.org Indiana University

More information

A Review of Dynamic Load Balancing in Distributed Virtual En quantities

A Review of Dynamic Load Balancing in Distributed Virtual En quantities 6 Dynamic Load Balancing in Distributed Virtual Environments using Heat Diffusion YUNHUA DENG and RYNSON W. H. LAU, City University of Hong Kong Distributed virtual environments (DVEs) are attracting a

More information

Load Balancing on a Grid Using Data Characteristics

Load Balancing on a Grid Using Data Characteristics Load Balancing on a Grid Using Data Characteristics Jonathan White and Dale R. Thompson Computer Science and Computer Engineering Department University of Arkansas Fayetteville, AR 72701, USA {jlw09, drt}@uark.edu

More information

Load Balancing between Computing Clusters

Load Balancing between Computing Clusters Load Balancing between Computing Clusters Siu-Cheung Chau Dept. of Physics and Computing, Wilfrid Laurier University, Waterloo, Ontario, Canada, NL 3C5 e-mail: schau@wlu.ca Ada Wai-Chee Fu Dept. of Computer

More information

Load Balancing Techniques

Load Balancing Techniques Load Balancing Techniques 1 Lecture Outline Following Topics will be discussed Static Load Balancing Dynamic Load Balancing Mapping for load balancing Minimizing Interaction 2 1 Load Balancing Techniques

More information

Master Degree Program in Computer Science (CS)

Master Degree Program in Computer Science (CS) Master Degree Program in Computer Science (CS) Students holding Bachelor s degree in Computer Science are accepted as graduate students, after meeting the general requirements stated below. Applicants

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed

More information

Radar Image Processing with Clusters of Computers

Radar Image Processing with Clusters of Computers Radar Image Processing with Clusters of Computers Alois Goller Chalmers University Franz Leberl Institute for Computer Graphics and Vision ABSTRACT Some radar image processing algorithms such as shape-from-shading

More information

Optimizing Load Balance Using Parallel Migratable Objects

Optimizing Load Balance Using Parallel Migratable Objects Optimizing Load Balance Using Parallel Migratable Objects Laxmikant V. Kalé, Eric Bohm Parallel Programming Laboratory University of Illinois Urbana-Champaign 2012/9/25 Laxmikant V. Kalé, Eric Bohm (UIUC)

More information

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden smakadir@csc.kth.se,

More information

Survey on Load Rebalancing for Distributed File System in Cloud

Survey on Load Rebalancing for Distributed File System in Cloud Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 9, September 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Experimental

More information

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing

A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,

More information

Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets-

Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets- Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.603-608 (2011) ARTICLE Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets- Hiroko Nakamura MIYAMURA 1,*, Sachiko

More information