Dynamic Load Balancing for Cluster Computing Jaswinder Pal Singh, Technische Universität München. singhj@in.tum.de


Abstract: In parallel simulations, partitioning and load-balancing algorithms compute the distribution of application data and work to processors. The effectiveness of this distribution greatly influences the performance of a parallel simulation. Decompositions that balance processor loads while keeping the application's communication costs low are preferred. Although a wide variety of partitioning and load-balancing algorithms have been developed, the load-balancing problem is not yet completely solved, since the effectiveness of an algorithm depends on the characteristics of the application using it. New applications and architectures require new partitioning features. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. This paper introduces the topic, proposes an algorithm to parallelize adaptive grid generation on a cluster, and closes with a brief look at the future prospects of DLB.

Keywords: Distributed Systems, Dynamic Load Balancing, Adaptive Grids, Refinement Tree Bisection

1. Introduction

Modern scientific computations are becoming increasingly large, irregular, and computationally intensive, demanding more computing power than a conventional sequential computer can provide. The computing power of a single processor is bounded by the fastest processor available at any given time, but it can be increased dramatically by integrating a set of processors together. The latter are known as Parallel Systems (Fig 1). An in-depth discussion of existing systems is presented in [1], [2], [3].
A cluster, the architecture relevant to our discussion, is a collection of heterogeneous workstations connected by a dedicated high-performance network; it may present a Single System Image (SSI) spanning its nodes. The Message Passing programming model, in which each processor (or process) is assumed to have its own private data space and data must be explicitly moved between spaces by sending messages, is inherent to clusters. The principal challenge of parallel programming is to decompose the program into subcomponents that can be run in parallel; this can be achieved by exploiting either functional or data parallelism. In Functional Parallelism, the problem is decomposed into a large number of smaller tasks, which are assigned to the processors as they become available. Processors that finish quickly are simply assigned more work. It is typically implemented in a Client-Server paradigm: the tasks are allocated to a group of slave processes by a master process that may also perform some of the tasks.

Fig 1. Flynn's classification: SISD (uniprocessor), SIMD (vector and array processors), MISD, and MIMD (shared-memory SMP and NUMA; distributed-memory clusters).

Data parallelism, also known as Domain Decomposition, represents the most common strategy for scientific programs. The application is decomposed by subdividing the data space over which it operates and assigning different processors to the work associated with different data subspaces. This leads to data sharing at the boundaries, and the programmer is responsible for ensuring that these data are correctly synchronized. Domain decomposition methods are useful in two contexts. First, the division of a problem into smaller problems through usually artificial subdivisions of the domain is a means of introducing parallelism into the problem. Second, many problems involve more than one mathematical model, each posed on a different domain, so that domain decomposition occurs naturally; fluid-structure interactions are an example of the latter.
Associated with MIMD architectures, data parallelism originated the single program, multiple data (SPMD) programming model [4]: the same program is executed on different processors, over distinct data sets.

Load Balancing: Motivation

Consider, for example, an application that after domain decomposition can be mapped onto the processors of a parallel architecture. In our case the underlying hardware system is a cluster, and we run into a problem because a static resource problem is mapped to a system with dynamic resources, resulting in a potentially unbalanced execution. Things get even more complicated if we run an application with dynamic run-time behavior on a cluster, i.e. a mapping of a dynamic resource problem onto a dynamic resource machine. System changes such as variation in the availability of individual processor power, variation in the number of

processors, or dynamic changes in the run-time behavior of the application lead to load imbalance among processing elements. These factors contribute to inefficient use of resources and an increase in total execution time, which is a major concern in parallel programming. Load balancing/sharing is a policy that takes advantage of the communication facility between the nodes of a cluster, exchanging status information and jobs between nodes, in order to find the appropriate granularity of tasks and partition them so that each node is assigned load in proportion to its performance. It aims at improving the performance of the system and decreasing the total execution time. A load balancing algorithm can be either static or dynamic. Static load balancing uses only information about the average system behavior at the initialization phase, i.e. it is done at compile time, while dynamic load balancing uses runtime state information to manage task allocation. The paper is outlined as follows. The steps involved in a dynamic load balancing algorithm and its classification criteria are presented in Section 2. Section 3 introduces some popular domain decomposition techniques. Parallelization of adaptive grid generation on a cluster is discussed in Section 4 as an application. Section 5 mentions current challenges and future prospects in domain decomposition methods.

2. Dynamic Load Balancing

Dynamic load balancing is carried out through task migration, the transfer of tasks from overloaded nodes to underloaded nodes. To decide when and how to perform task migration, information about the current workload must be exchanged among nodes. The communication and the computation required to make the balancing decision consume processing power and may result in worse overall performance if the algorithm is not efficient enough.
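As a concrete illustration, one common local scheme has an overloaded node push load toward its less-loaded neighbors until it reaches the neighborhood average (local averaging, revisited below). The following is a minimal sketch; the load values, neighbor lists, and threshold are hypothetical and would come from the runtime system in practice.

```python
def local_average_step(loads, neighbors, node, threshold=None):
    """One local-averaging step: the (possibly overloaded) `node` sends
    load packets to its less-loaded neighbors until its own load drops
    to the neighborhood average (or a given threshold)."""
    target = threshold if threshold is not None else (
        sum(loads[n] for n in neighbors[node] + [node]) / (len(neighbors[node]) + 1))
    transfers = []  # (src, dst, amount) records describing the migrations
    for dst in sorted(neighbors[node], key=lambda n: loads[n]):
        if loads[node] <= target:
            break
        # never push a neighbor above the target level
        amount = min(loads[node] - target, max(target - loads[dst], 0))
        if amount > 0:
            loads[node] -= amount
            loads[dst] += amount
            transfers.append((node, dst, amount))
    return transfers
```

The returned transfer records would drive the actual task migration; the load values themselves only model the balancing decision.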
In the context of SPMD applications, a task migration between two nodes corresponds to transferring all the data associated with this task and necessary to its execution, hence the need for an appropriate DLB algorithm. To understand a DLB algorithm completely, its four main components (initiation, location, information exchange, and load movement) have to be understood.

Initiation: The initiation strategy specifies the mechanism which invokes the load balancing activities. This may be periodic or event-driven initiation. The latter is load dependent, based upon monitoring of the local load, and thus more responsive to load imbalances. It can be either sender- or receiver-initiated. In sender-initiated policies, congested servers attempt to transfer work to lightly loaded ones; the opposite takes place in receiver-initiated policies.

Load-balancer location: Specifies the location at which the algorithm itself is executed. In centralized algorithms, a single processor computes the necessary reassignments and informs the involved processors. A distributed algorithm runs locally within each processor. The former may lead to a bottleneck, while the latter requires load information to be propagated to all the processors, leading to higher communication costs.

Information exchange: Specifies the information and load flow through the system, based upon whether the information used by the algorithm for decision-making is local or global, and upon the communication policy. All processors take part in global schemes, whereas local schemes use only information on the processor or gathered from the surrounding neighborhood. Local schemes involve lower communication costs, but global information exchange strategies tend to give more accurate decisions. The communication policy determines the neighborhood of each processor: it specifies the connection topology, which does not have to represent the actual physical topology.
A uniform topology indicates a fixed set of neighbors to communicate with, while in a randomized topology each processor randomly chooses another processor to exchange information with. The communication policy also specifies the task/load exchange between different processors. In global strategies, task/load transfers may take place between any two processors, while local strategies define groups of processors and allow transfers only between two processors within the same group.

Load movement: Specifies the appropriate load items to be moved or exchanged, as there is a trade-off between the benefits of moving work to balance load and the cost of data movement. Local averaging is one of the common techniques: an overloaded processor sends load packets to its neighbors until its own load drops to a specific threshold or to the average load.

3. Domain Decomposition or Partitioning Algorithms

3.1 The partitioning problem

At its simplest, a partitioning algorithm attempts to assign equal numbers of objects to partitions while minimizing communication costs between partitions. A partition's subdomain, then, consists of the data uniquely assigned to the partition; the union of subdomains is equal to the entire problem domain (Fig 2). Objects may have weights proportional to their computational costs. These nonuniform costs may

result from, e.g., variances in computation time due to different physics being solved on different objects, more degrees of freedom per element in adaptive p-refinement [5], or more small time steps taken on smaller elements to enforce timestep constraints in local mesh-refinement methods [6]. Similarly, nonuniform communication costs may be modeled by assigning weights to connections between objects. Partitioning then has the goal of assigning equal total object weight to each subdomain while minimizing the weighted communication cost.

3.2 Dynamic Repartitioning and Load Balancing Problem

Workloads in dynamic computations evolve in time; for example, in finite element methods with adaptive mesh refinement, process workloads can vary dramatically as elements are added and/or removed from the mesh. Dynamic repartitioning of mesh data, often called dynamic load balancing, becomes necessary. It is also needed to maintain geometric locality in applications like crash simulations, where high parallel efficiency is obtained when subdomains are constructed of geometrically close elements [7].

Fig. 2. An example of a two-dimensional mesh (left) and a decomposition of the mesh into four subdomains (right).

In our case dynamic load balancing has the same goals as partitioning, but with the additional constraints that procedures (i) must operate in parallel on already distributed data, (ii) must execute quickly, as dynamic load balancing may be performed frequently, and (iii) should be incremental (i.e., small changes in workloads produce only small changes in the decomposition), as the cost of redistributing mesh data is often the most significant part of a dynamic load-balancing step. While a more expensive procedure may produce a higher-quality result, it is sometimes better to use a faster procedure to obtain a lower-quality decomposition if the workloads are likely to change again after a short time.
3.3 Partition Quality Assessment

The most obvious measure of partition quality is computational load balance, but load balance alone does not ensure efficient parallel computation. Communication costs must also be minimized, which corresponds to minimizing the cost of sharing data across subdomain boundaries. For mesh-based applications, this cost is often approximated by the number of element faces on boundaries between two or more subdomains. To estimate the cost of interprocess communication, the following metrics have proved to give better results. A subdomain's surface index is the percentage of all element faces within a subdomain that lie on the subdomain boundary. The maximum local surface index is the largest surface index over all subdomains and approximates the maximum communication needed by any one subdomain, while the global surface index measures the percentage of all element faces that are on subdomain boundaries [8] and approximates the total communication volume. Minimizing only the edge cut or global surface index statistics is not enough [8], for the following reasons. First, the number of faces shared by subdomains is not necessarily equal to the communication volume between the subdomains [8]; an element could easily share two or more faces, but the element's data would be communicated only once to the neighbor. Second, interprocess connectivity, i.e. the number of processes with which each process must exchange information during the solution phase, is a significant factor due to its dependence upon interconnection network latency [8]. Third, communication should be balanced, not necessarily minimized [9]. A balanced communication load often corresponds to a small maximum local surface index. Internal connectivity of the subdomains has also proved to be a measure of partition quality.
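The surface-index metrics above can be computed directly from the face list of a partitioned mesh. A minimal sketch follows; it considers only interior (shared) faces, and the data layout (element pairs and a part map) is a hypothetical simplification.

```python
def surface_indices(interior_faces, part):
    """Global and maximum local surface index of a partition.
    interior_faces: list of (elem_a, elem_b) element pairs sharing a face;
    part: dict mapping element id -> subdomain id."""
    faces_of = {}   # subdomain -> number of faces of its elements
    cut_of = {}     # subdomain -> number of its faces on inter-subdomain boundaries
    cut = 0
    for a, b in interior_faces:
        faces_of[part[a]] = faces_of.get(part[a], 0) + 1
        faces_of[part[b]] = faces_of.get(part[b], 0) + 1
        if part[a] != part[b]:
            cut += 1
            cut_of[part[a]] = cut_of.get(part[a], 0) + 1
            cut_of[part[b]] = cut_of.get(part[b], 0) + 1
    global_si = cut / len(interior_faces)       # fraction of faces on boundaries
    max_local_si = max(cut_of.get(p, 0) / faces_of[p] for p in faces_of)
    return global_si, max_local_si
```

For a real mesh, exterior (domain-boundary) faces would be excluded from the boundary counts exactly as here, but included in the per-subdomain face totals.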
Having multiple disjoint connected components within a subdomain (also known as subdomain splitting) can be undesirable, as the solution of the linear systems will converge slowly for partitions with this property [11]. Additionally, if a relatively small disjoint part of one subdomain can be merged into a neighboring subdomain, the boundary size will decrease, thereby improving the surface indices. Subdomain aspect ratio is the ratio of the square of the radius of the smallest circle that contains the entire subdomain to the subdomain's area [10]. It has also been reported as an important factor in partition quality [11], particularly when iterative methods such as Conjugate Gradient (CG) or Multigrid are used to solve the linear systems. The authors of [11] show that the number of iterations needed for a

preconditioned CG procedure grows with the subdomain aspect ratio. Furthermore, large aspect ratios are likely to lead to larger boundary sizes. Geometric locality of elements is an important indicator of partition effectiveness for some applications. While mesh connectivity provides a reasonable approximation to geometric locality in some simulations, it does not represent geometric locality in all simulations. (In a simulation of an automobile crash, for example, the windshield and bumper are far apart in the mesh, but can be quite close together geometrically.) Quality metrics based on connectivity are not appropriate for these types of simulations.

3.4 Partitioning and Dynamic Load Balancing Taxonomy

A variety of partitioning and dynamic load balancing procedures have been developed. Since no single procedure is ideal in all situations, many of these alternatives are commonly used. This section describes many of the approaches, grouping them into geometric methods, global graph-based methods, and local graph-based methods. Geometric methods examine only the coordinates of the objects to be partitioned. Graph-based methods use the topological connections among the objects. Most geometric or graph-based methods operate as global partitioners or repartitioners. Local graph-based methods, however, operate among neighborhoods of processes in an existing decomposition to improve load balance.

Geometric Methods

Geometric methods use only objects' spatial coordinates and computational weights to compute a decomposition that balances the total weight of objects assigned to each partition. They are effective for applications in which objects interact only if they are geometrically close to each other. Examples of such methods are:
1. Recursive Bisection: Recursive bisection methods divide the simulation's objects into two equally weighted sets; the algorithm is then applied recursively to obtain the desired number of partitions. In Recursive Coordinate Bisection (RCB) [12], the two sets are computed by cutting the problem geometry with a plane orthogonal to a coordinate axis. The plane's direction is selected to be orthogonal to the longest direction of the geometry; its position is computed so that half of the object weight is on each side of the plane. RCB is incremental and suitable for DLB. Like RCB, Recursive Inertial Bisection (RIB) [13] uses cutting planes to bisect the geometry; however, the direction of the plane is computed to be orthogonal to the principal axis of inertia. It is not incremental and may not be suitable for dynamic load balancing.

2. Space-Filling Curves: A space-filling curve (SFC) maps n-dimensional space to one dimension [11]. In SFC partitioning, an object's coordinates are converted to an SFC key representing the object's position along an SFC through the physical domain. Sorting the keys gives a linear ordering of the objects. This ordering is cut into appropriately weighted pieces that are assigned to processors.

Global Graph-Based Partitioning

A popular and powerful class of partitioning procedures makes use of connectivity information rather than spatial coordinates. These methods use the fact that the partitioning problem in Section 3.1 can be viewed as the partitioning of an induced graph G = (V, E), where objects serve as the graph vertices (V) and connections between objects serve as the graph edges (E). For example, Figure 3 shows an induced graph for the mesh in Figure 2; here, elements are the objects to be partitioned and thus serve as vertices in the graph, while shared element faces define graph edges.

Fig 3. Two-dimensional mesh and its induced graph (left); its four-way partitioning (right).

A k-way partition of the graph G is obtained by dividing the vertices into subsets V_1, ..., V_k, where V = V_1 ∪ ... ∪ V_k and V_i ∩ V_j = ∅ for i ≠ j. Figure 3 (right) shows one possible decomposition of the graph induced by the mesh. Vertices and edges may have weights associated with them, representing computation and communication costs, respectively. The goal of graph partitioning, then, is to create subsets V_i with equal vertex weights while minimizing the total weight of the edges cut by subset boundaries. An edge e_ij between vertices v_i and v_j is cut when v_i belongs to one subset and v_j belongs to a different one. Algorithms that provide an optimal partitioning are NP-complete [14], so heuristic algorithms are generally used.
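The two objectives of graph partitioning, balanced vertex weight and small weighted edge cut, can be evaluated directly. A minimal sketch follows, assuming a simple dictionary/list graph representation; the vertex weights, edge weights, and partition map are illustrative.

```python
def partition_metrics(vertex_weight, edges, part, k):
    """Edge cut and load imbalance of a k-way partition.
    vertex_weight: vertex -> computation weight; edges: (u, v, comm_weight)
    triples; part: vertex -> subset index in 0..k-1."""
    load = [0.0] * k
    for v, w in vertex_weight.items():
        load[part[v]] += w
    # weighted edge cut: edges whose endpoints lie in different subsets
    cut = sum(w for u, v, w in edges if part[u] != part[v])
    imbalance = max(load) * k / sum(load)   # 1.0 means perfectly balanced
    return cut, imbalance
```

A heuristic partitioner would search for a `part` map that keeps `imbalance` near 1.0 while minimizing `cut`.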

Greedy algorithms, spectral partitioning, and multilevel partitioning are some of the static partitioners, intended for use as a preprocessing step rather than as a dynamic load balancing procedure. Some of the multilevel procedures do operate in parallel and can be used for dynamic load balancing.

Local Graph-Based Methods

In an adaptive computation, dynamic load balancing may be required frequently. Applying global partitioning strategies after each adaptive step can be costly relative to solution time. Thus, a number of dynamic load balancing techniques that are intended to be fast and to incrementally migrate data from heavily to lightly loaded processes have been developed. These are often referred to as local methods. Unlike global partitioning methods, local methods work with only a limited view of the application workloads. They consider workloads within small, overlapping sets of processors to improve balance within each set. Heavily loaded processors within a set transfer objects to less heavily loaded processors in the same set. Sets can be defined by the parallel architecture's processor connectivity or by the connectivity of the application data [16]. Sets overlap, allowing objects to move between sets through several iterations of the local method. Thus, when only small changes in application workloads occur through, say, adaptive refinement, a few iterations of a local method can correct imbalances while keeping the amount of data migrated low. For dramatic changes in application workloads, however, many iterations of a local method are needed to correct load imbalances; in such cases, invocation of a global partitioning method may result in a better, more cost-effective decomposition. Local methods typically consist of two steps: (i) computing a map of how much work (nodal weight) must be shifted from heavily loaded to lightly loaded processors, and (ii) selecting objects (nodes) that should be moved to satisfy that map.
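The map-computation step (i) is commonly based on iterative diffusion. A minimal sketch of one first-order diffusion step follows; the per-process loads, neighbor lists, and diffusion coefficient are hypothetical.

```python
def diffusion_step(load, neighbors, alpha=0.25):
    """One step of first-order diffusive load balancing: each process
    moves a fraction `alpha` of every pairwise load difference toward
    each neighbor.  Total load is conserved; the iteration converges
    to a uniform distribution for sufficiently small alpha."""
    new = load[:]
    for p, nbrs in enumerate(neighbors):
        for q in nbrs:
            new[p] += alpha * (load[q] - load[p])
    return new
```

The per-step changes `new[p] - load[p]` define the work-flow map; step (ii) then picks concrete objects whose weights realize that flow.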
Many different strategies can be used for each step. Most strategies for computing a map of the amount of data to be shifted among processes are based on the diffusive algorithm of Cybenko [15]. Using processor connectivity or application communication patterns to describe a computational mesh, an equation representing the work flow is solved using a first-order finite-difference scheme. Since the stencil of the scheme is compact (using information only from neighboring processes), the method is local. Hu and Blake [16] take a more global view of load distributions, computing a diffusion solution while minimizing work flow over the edges of a graph of the processes.

4. Parallelizing an Adaptive Mesh Generator using Refinement Tree Partitioning and SFC

Adaptive computational techniques provide a reliable, robust, and efficient means of solving problems involving PDEs by finite difference, finite volume, or finite element technologies. With an adaptive approach, the initial mesh used to discretize the computational domain and the numerical method used to discretize the PDEs are enhanced during the course of the solution procedure in order to optimize, e.g., the computational effort for a given level of accuracy. Enhancement typically involves h-refinement, where a mesh is refined or coarsened, respectively, in regions of low or high accuracy; r-refinement, where a mesh of fixed topology is moved to follow evolving dynamic phenomena; and p-refinement, where the method order is increased or decreased, respectively, in regions of low or high accuracy. Unfortunately, parallelism greatly complicates an adaptive computation. The unique feature of the presented algorithm is the integrated use of space-filling curve (SFC) techniques for ordering and of refinement tree partitioning to parallelize the problem at hand.

4.1 The Adaptive Grid and Meshing Strategy

The domain Ω is triangulated automatically by a grid generator based on h-refinement.
Assuming that some sort of error estimator is available, elements with errors larger than some suitable tolerance are subdivided into smaller elements of the same type. The same procedure is repeated until either the error for each element in the newly constructed mesh is no larger than the tolerance or the highest allowable level of mesh refinement is reached. The refinement strategy is based on bisecting a marked edge and a marking algorithm that prevents small angles in the refined mesh. With uniform refinement, every two levels all edges are halved. Note that the algorithm describes only one refinement level and can be applied recursively for each grid level.

Fig 4. Edge marking strategy for 2D refinement by bisecting a marked edge.

Algorithm 4.1. Let each element τ of the triangulation have a marked refinement edge, and let Σ be the set of elements flagged for refinement.

I. bisect each τ ∈ Σ, obtaining τ_1 and τ_2, the daughter elements;
II. mark the edges opposite to the newly inserted node in τ_i (i = 1 : 2), as depicted in figure 4;
III. now, set Σ to be the set of triangles τ with hanging nodes;
IV. IF Σ = ∅ THEN stop, ELSE go to step I.

4.2 Refinement Tree Partitioning

This method is based on the refinement tree that is generated during the process of adaptive grid refinement. It is not as generally applicable as the other fast algorithms, which use only information contained in the final grid, but in the context of adaptive multilevel methods it is able to produce higher quality partitions by taking advantage of the additional information about how the grid was generated. The refinement-tree partitioning algorithm [20] is a recursive bisection method. This means that the core of the algorithm partitions the data into two sets, i.e., bisects the data. The algorithm then bisects those two sets to produce four sets, and so forth until the desired number of sets is produced. The refinement tree of an adaptive triangular grid generated by bisection refinement is a binary tree containing one node for each triangle that appears during the refinement process. (It may actually be a forest, but the individual trees can be connected into a single tree by adding artificial nodes above the roots.) The two children of a node correspond to the two triangles that are formed by bisecting the triangle corresponding to the parent node. In Fig. 5, the numbering of the triangles in the grid and of the nodes in the tree indicates the relationship. The nodes of the tree have two weights associated with them: a personal weight and a subtree weight. The personal weight is a representation of the computational work associated with the corresponding triangle.
For example, a smaller weight can be used for elements containing Dirichlet boundary equations, which require less computation than interior equations. The interior nodes, i.e., those that are not leaves, correspond to triangles in the coarser grids. These nodes can be assigned nonzero weights to represent the computation on the coarser grids of the multigrid algorithm, which is not possible with partitioning algorithms that only consider the finest grid. For simplicity, in this paper a weight of 0 is assigned to the interior nodes, whereas leaves have different weights.

Fig 5. Refinement trees.

The subtree weight of a node is the sum of the personal weights in the subtree rooted at that node and can be computed in O(N) operations for N triangles, using a depth-first post-order traversal of the tree. The algorithm for bisecting the grid into two equally sized sets is given below. For scalability, the refinement tree structure used for dynamic repartitioning must be distributed across the cooperating processes [17]. It must also be constructed automatically in parallel. Each node maintains information about its region of space (bounding box), process ownership, parent and offspring links, and attached objects and their costs for weighted load balancing. In a distributed tree, links may cross process boundaries and must include both a process id and a pointer. A distributed tree also increases the complexity and overhead of interprocess communication, node refinement and pruning, and the insertion of new objects (e.g., elements created or removed by adaptive h-refinement) into the correct node. Objects to be inserted may reside on any process, and some objects will likely reside on processes other than those owning their destination nodes. Such objects are called orphans and must be migrated to the appropriate process.
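The subtree-weight computation and the recursive bisection can be sketched in a few lines of sequential code. This is a simplified, hypothetical rendering (child lists and per-leaf personal weights as plain dictionaries, greedy assignment of whole subtrees to the lighter set), not the distributed implementation described in the text.

```python
def subtree_weights(children, personal, node, subtree):
    """Depth-first post-order traversal: a node's subtree weight is its
    personal weight plus the subtree weights of its children (O(N))."""
    subtree[node] = personal.get(node, 0) + sum(
        subtree_weights(children, personal, c, subtree)
        for c in children.get(node, []))
    return subtree[node]

def bisect_tree(children, subtree, node, sets, weights):
    """Greedy bisection: assign whole subtrees to the currently lighter
    of two sets, recursing only where a subtree must be split."""
    kids = children.get(node, [])
    if not kids:                          # a leaf goes to the smaller set
        smaller = 0 if weights[0] <= weights[1] else 1
        sets[smaller].append(node)
        weights[smaller] += subtree[node]
        return
    if len(kids) == 1:
        bisect_tree(children, subtree, kids[0], sets, weights)
        return
    a, b = kids
    smaller = 0 if weights[0] <= weights[1] else 1
    lighter_child = a if subtree[a] <= subtree[b] else b
    sets[smaller].append(lighter_child)   # whole subtree assigned, no recursion
    weights[smaller] += subtree[lighter_child]
    bisect_tree(children, subtree, b if lighter_child == a else a, sets, weights)
```

Because whole subtrees are assigned wherever possible, each resulting set corresponds to a contiguous region of the refined grid, which is the point of the refinement-tree approach.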
4.3 Space-Filling Curves for Ordering the Unstructured Grid

The first part of the refinement-tree partitioning algorithm sums the personal weights in the tree to compute the subtree weights, using a depth-first post-order traversal. The traversal induces an ordering, or linearization, of the leaf nodes of the refinement tree. Since partitions are formed from contiguous segments of this linearization, its form has a direct effect on the quality of the resulting partitions. Space-filling curves provide a continuous mapping from one-dimensional to d-dimensional space [11] and have been used to linearize spatially distributed data for partitioning [11], [18], storage and memory management [11], and

computational geometry. Herein, we regard space-filling curves as a way of organizing the refinement tree traversals and, hence, linearizing the leaf nodes of the distributed refinement tree.

Algorithm 4.2
algorithm bisect
    compute subtree weights
    bisect_subtree(root)
end algorithm bisect

algorithm bisect_subtree(node)
    if node is a leaf then
        assign node to the smaller set
    else if node has one child then
        bisect_subtree(child)
    else (node has two children)
        select a set for each child
        for each child, examine the sum of the subtree weight with the accumulated weight of the selected set
        for the smaller of the two sums, assign the subtree rooted at that child to the selected set, and add the subtree weight to the weight of the set
        bisect_subtree(other child)
    endif
end algorithm bisect_subtree

In this context we construct a discrete SFC that is fine enough to have a curve node in each element, thus inducing a consecutive numbering of elements. Because an SFC preserves data locality, it guarantees connected and locally compact partitions, where the consecutive numbering is used for partitioning the computational triangulated domain. We use a bitmap-based algorithm [19] for indexing the generated SFC. The following data have to be known a priori: a) the number of triangles in the initial triangulation, N_0; b) the maximum number of refinement levels, l. With these data, each element needs a bit structure of length b = ⌈log_2(N_0)⌉ + l. While the first b - l bits are used for consecutively numbering the initial elements arbitrarily, each level is then represented by one additional bit. To illustrate the algorithm, observe the series of steps in fig 6. The construction of the space-filling curve is given by the following algorithm.

Fig 6. Series of steps in the construction of a space-filling curve in a locally bisection-refined mesh.
Algorithm 4.3. Let τ^k be an element on level k of the grid, and denote by τ^k_i (i = 1 : 2) the two daughters of element τ^(k-1).

(1) The algorithm starts with a zero bitmap of length b in τ^0.
(2) Then, FOR each level (k = 1 : l) DO:
    a. copy the mother's (τ^(k-1)) bitmap to both daughter elements τ^k_i;
    b. determine the left or right side element τ^k_e according to the level: τ^k_e = left if mod(k,2) = 0; τ^k_e = right if mod(k,2) = 1;
    c. set the k-th bit of daughter τ^k_e to 1.
(3) END DO
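The bitmap indexing can be sketched per element in a few lines. This is a hypothetical rendering of the idea in Algorithm 4.3: the element is represented only by the index of its level-0 ancestor plus its left/right descent path, and the interpretation of "k-th bit" as bit position l - k from the least significant end is an assumption of this sketch.

```python
import math

def element_key(initial_index, n0, l, daughters):
    """Bitmap SFC key of an element, in the spirit of Algorithm 4.3.
    initial_index: number of the element's level-0 ancestor (0 .. n0-1);
    daughters: 'left'/'right' choices on the path down to the element;
    n0: size of the initial triangulation; l: maximum refinement level."""
    b0 = max(1, math.ceil(math.log2(n0)))   # first b - l bits: initial numbering
    assert initial_index < (1 << b0)
    key = initial_index << l                # refinement bits start as zeros
    for k, side in enumerate(daughters, start=1):
        # step 2b: level k marks the left daughter on even levels,
        # the right daughter on odd levels
        chosen = 'left' if k % 2 == 0 else 'right'
        if side == chosen:
            key |= 1 << (l - k)             # step 2c: set the k-th bit
    return key
```

Sorting the keys of all leaf elements then yields the consecutive SFC numbering that the partitioner cuts into weighted pieces.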

5. Current Challenges

As parallel simulations and environments become more sophisticated, partitioning algorithms must address new issues and application requirements. Software design that allows algorithms to be compared and reused is an important first step; carefully designed libraries that support many applications benefit application developers while serving as test-beds for algorithmic research. Existing partitioners need additional functionality to support new applications. Partitioning models must more accurately represent a broader range of applications, including those with non-symmetric, non-square, and/or highly connected relationships. And partitioning algorithms should perform resource-aware balancing, i.e. they need to be sensitive to state-of-the-art, heterogeneous computer architectures, adjusting work assignments relative to processing, memory and communication resources. New simulation areas such as electrical systems, computational biology, linear programming and nanotechnology differ from traditional mesh-based PDE simulations: they include high connectivity, heterogeneity in topology, and matrices that are rectangular or non-symmetric. While graph models (see Section 3.4) are often considered the most effective models for mesh-based PDE simulations, they show limitations for such problems. As an alternative to graphs, hypergraphs can be used to model application data. A hypergraph HG = (V, HE) consists of a set of vertices V representing the data objects to be partitioned and a set of hyperedges HE, each connecting two or more vertices of V. By allowing larger sets of vertices to be associated through edges, the hypergraph model overcomes many of the limitations of the graph model. In the hypergraph model, the number of hyperedge cuts is equal to the communication volume, providing a more effective partitioning metric. Another challenge is multi-phase simulations, e.g. crash simulations, as they have different workloads in each phase of a simulation, i.e.
computation of forces and contact detection problem. Often, separate decompositions are used for each phase; data is communicated from one decomposition to the other between phases [21]. Obtaining a single decomposition that is good with respect to both phases would remove the need for communication between phases. Each object would have multiple loads, corresponding to its workload in each phase. The challenge would be computing a single decomposition that is balanced with respect to all loads. Such a multicriteria partitioner could be used in other situations as well, such as balancing both computational work and memory usage. References 1. Rajkumar Buyya et.al. High Performance Cluster Computing Vol 1, Prentice Hall PTR, G. Pfister. In Search of Clusters. Prentice Hall, 2 nd Edition, Grama, Gupta, Karypis, Kumar. Introduction to Parallel Computing, Addision Wesley, 2 nd Edition, Alexandre Plastino, Celso C. Ribeiro, Noemi de La Rocque Rodriguez. Developing SPMD applications with load balancing. Parallel Computing, 2003,Vol: 29, Issue: 6, p James D. Teresco, Karen D. Devine, and Joseph E. Flaherty. Partitioning and Dynamic Load Balancing for the Numerical Solution of PDE 6. Flaherty, Loy, Shephard, Szymanski, Teresco, Ziantz: Adaptive local refinement with octree loadbalancing for the parallel solution of three-dimensional conservation laws. J. Parallel Distrib. Comput., 47: , (1997) 7. Plimpton et. al. Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics. J. Parallel Distrib. Comput., 50: , (1998) 8. Flaherty et. al. The quality of partitions produced by an iterative load balancer. Proc. Third Workshop on Languages, Compilers, and Runtime Systems, pages , Troy, (1996) 9. Pinar,Hendrickson, B.: Graph partitioning for complex objectives. 15th I PDPS, San Francisco, CA, (2001) 10. Diekmann et. al: Shape-optimized mesh partitioning and load balancing for parallel adaptive fem. 
Parallel Comput., 26(12): , (2000) 11. Flaherty et. al. Dynamic Octree Load Balancing using SFC. Williams College Department of Computer Science Technical Report CS-03-01, Berger, Bokhari,: A partitioning strategy for nonuniform problems on multiprocessors. IEEE Trans. Computers, 36: , (1987).

9 13. Simon, H. D.: Partitioning of unstructured problems for parallel processing. Comp. Sys. Engg., 2: (1991) 14. Garey, M., Johnson, D., and Stockmeyer, L.: Some simplified NP-complete graph problems. Theoretical Computer Science, 1(3): , (1976) 15. Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput., 7: , (1989) 16. Hu, Blake: An optimal dynamic load balancing algorithm. Preprint DL-P , Daresbury Laboratory, Warrington, WA4 4AD, UK, (1995) 17. Simone, Shephard, Flaherty, and Loy. A distributed octree and neighbor-finding algorithms for parallel mesh generation. Tech. Report , Rensselaer Polytechnic Institute, Scientific Computation Research Center, Troy, S. Aluru and F. Sevilgen. Parallel domain decomposition and load balancing using space-filling curves. In Proc. International Conference on High-Performance Computing, pages , J. Behrens et. al.: Amatos: Parallel adaptive mesh generator for atmospheric and oceanic simulation, Technical Report TR 02-03, BremHLR { Competence Center of High Performance Computing Bremen, Bremen, Germany, 2003, 20. W. F. Mitchell, Refinement Tree Based Partitioning for Adaptive Grids, in Proceedings of the 7th SIAM Conference on Parallel Processing for Scientific Computing, SIAM, Philadelphia (1995) pp S. Plimpton, Attaway, Hendrickson, Swegle, Vaughan, Gardner : Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics, J. Parallel Distrib. Comput. 50 (1998)

Improved Hybrid Dynamic Load Balancing Algorithm for Distributed Environment International Journal of Scientific and Research Publications, Volume 3, Issue 3, March 2013 1 Improved Hybrid Dynamic Load Balancing Algorithm for Distributed Environment UrjashreePatil*, RajashreeShedge**

More information

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster , pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906

More information

Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

More information

Cloud Computing is NP-Complete

Cloud Computing is NP-Complete Working Paper, February 2, 20 Joe Weinman Permalink: http://www.joeweinman.com/resources/joe_weinman_cloud_computing_is_np-complete.pdf Abstract Cloud computing is a rapidly emerging paradigm for computing,

More information

Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity

Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity Jaswinder Pal Singh, Chris Holt, Takashi Totsuka, Anoop Gupta and John L. Hennessy Computer

More information

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

Finite Element Method (ENGC 6321) Syllabus. Second Semester 2013-2014

Finite Element Method (ENGC 6321) Syllabus. Second Semester 2013-2014 Finite Element Method Finite Element Method (ENGC 6321) Syllabus Second Semester 2013-2014 Objectives Understand the basic theory of the FEM Know the behaviour and usage of each type of elements covered

More information

Load Balancing on a Grid Using Data Characteristics

Load Balancing on a Grid Using Data Characteristics Load Balancing on a Grid Using Data Characteristics Jonathan White and Dale R. Thompson Computer Science and Computer Engineering Department University of Arkansas Fayetteville, AR 72701, USA {jlw09, drt}@uark.edu

More information

AN APPROACH FOR SECURE CLOUD COMPUTING FOR FEM SIMULATION

AN APPROACH FOR SECURE CLOUD COMPUTING FOR FEM SIMULATION AN APPROACH FOR SECURE CLOUD COMPUTING FOR FEM SIMULATION Jörg Frochte *, Christof Kaufmann, Patrick Bouillon Dept. of Electrical Engineering and Computer Science Bochum University of Applied Science 42579

More information

Multiphase Flow - Appendices

Multiphase Flow - Appendices Discovery Laboratory Multiphase Flow - Appendices 1. Creating a Mesh 1.1. What is a geometry? The geometry used in a CFD simulation defines the problem domain and boundaries; it is the area (2D) or volume

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information