Dynamic load balancing of parallel cellular automata
Marc Mazzariol, Benoit A. Gennart, Roger D. Hersch
Ecole Polytechnique Fédérale de Lausanne (EPFL)
Proceedings SPIE Conference, Volume 4118, Parallel and Distributed Methods for Image Processing IV, 30 July 2000, San Diego, USA, 21-29
ABSTRACT

We are interested in running cellular automata in parallel. We present an algorithm which explores the dynamic remapping of cells in order to balance the load between the processing nodes. The parallel application runs on a cluster of PCs connected by Fast Ethernet. A general cellular automaton can be described as a set of cells where each cell is a state machine. To compute its next state, each cell needs some information from neighbouring cells. There are no limitations on the kind of information exchanged nor on the computation itself. Only the automaton topology, which defines the neighbours of each cell, remains unchanged during the automaton's life. As a typical example of a cellular automaton we consider the image skeletonization problem. Skeletonization requires spatial filtering to be applied repetitively to the image. Each step erodes a thin part of the original image. After the last step, only the image skeleton remains. Skeletonization algorithms require vast amounts of computing power, especially when applied to large images. Therefore, the skeletonization application can potentially benefit from parallel processing. Two different parallel algorithms are proposed: one with a static load distribution, consisting in splitting the cells over several processing nodes, and one with a dynamic load balancing scheme capable of remapping cells during program execution. Performance measurements show that cell migration does not reduce the speedup if the program is already load balanced, and that it greatly improves performance if the parallel application is not well balanced.

Keywords: parallel cellular automata, dynamic load balancing, image filtering, skeletonization, Computer-Aided Parallelization (CAP), cluster of PCs

1 INTRODUCTION

We are interested in running cellular automata in parallel. We present an algorithm which explores the dynamic remapping of cells in order to balance the load between the processing nodes. The parallel application runs on a cluster of PCs (Windows NT) connected by Fast Ethernet (100 Mbit/s). A general cellular automaton [1,2] can be described as a set of cells where each cell is a state machine. To compute its next state, each cell needs some information from neighbouring cells. There are no limitations on the kind of information exchanged nor on the computation itself. Only the automaton topology, which defines the neighbours of each cell, remains unchanged during the automaton's life. Let us describe a simple solution for the parallel execution of a cellular automaton. The cells are distributed over several threads running on different computers. Each thread is responsible for running several automaton cells. Every thread applies successively to all its cells a three-step algorithm: (1) send neighbouring information, (2) receive neighbouring information, (3) compute the next cell state. If communications are based on synchronous message passing, the whole system is synchronized at exchange time because of the neighbourhood dependencies. Due to the serial execution of communications, computations and multiple synchronizations, some processors remain partly idle and the achievable speedup does not scale when increasing the number of processors. Improved performance can be obtained by running communications asynchronously. One can then overlap data exchange with computation.
Neighbouring information is received during the computation of the previous step and sent during the computation of the next step. This solution offers improved performance, but still does not achieve a linear speedup. As in the skeletonization problem discussed in Section 2, the computation load may be highly data dependent and may vary considerably from cell to cell. Furthermore, the parallel application may run on heterogeneous processors, which induces a severe load balancing problem. Due to the neighbouring dependencies, cells consuming more computation time slow down the whole system. To reach an optimal solution we need a flexible load balancing scheme.
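Before describing this scheme, the generic cell model introduced above can be summarised as a small interface. The sketch below is illustrative only (the actual application is generated from CAP specifications); class and member names are assumptions, not the paper's API. Each cell exposes the information it must send to a neighbour, accepts the information received from a neighbour, and computes its next state; the neighbour list, i.e. the automaton topology, is fixed at construction time.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Opaque neighbouring information exchanged between cells (illustrative).
struct NeighbourInfo {
    std::vector<unsigned char> data;
};

// Generic automaton cell: a state machine whose topology never changes.
class Cell {
public:
    explicit Cell(std::vector<std::size_t> neighbourIds)
        : neighbourIds_(std::move(neighbourIds)) {}

    // Information this cell must send to neighbour 'n' for the current step.
    virtual NeighbourInfo infoFor(std::size_t n) const = 0;

    // Information received from neighbour 'n', needed for the next step.
    virtual void receiveFrom(std::size_t n, const NeighbourInfo& info) = 0;

    // Compute the next cell state once all neighbouring information is available.
    virtual void computeNextState() = 0;

    const std::vector<std::size_t>& neighbours() const { return neighbourIds_; }
    virtual ~Cell() = default;

private:
    std::vector<std::size_t> neighbourIds_;  // topology, fixed during the automaton's life
};
```

In the skeletonization application described below, a cell is an image tile and the neighbouring information consists of the tile borders.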
One solution is to allow each cell to be dynamically remapped during program execution. One or more cells may be displaced from overloaded threads to partly idle threads. Remapping a cell requires three steps after the computation of the cell to be remapped has terminated: (1) notify every thread about the decision to remap the cell, (2) wait for acknowledgement from all threads, and (3) remap the cell. Step (2) ensures that the neighbourhood information for the remapped cell is redirected towards the target thread. In the applications we consider, the overhead for remapping a cell is insignificant compared with the computation time. For the sake of load balancing, we present in Section 4 a strategy for cell remapping. As a typical example of a cellular automaton, we consider the image skeletonization problem [3,4]. Skeletonization requires spatial filtering to be applied repetitively to the image. Each step erodes a thin part of the original image. After the last step, only the image skeleton remains. Skeletonization algorithms require vast amounts of computing power, especially when applied to large images. Therefore, the skeletonization application can potentially benefit from the use of parallel processing. To parallelize image skeletonization, we divide the original image into tiles. These tiles are distributed across several threads. Each thread applies successively the skeletonization algorithm to all its tiles. Threads are mapped onto several processors according to a configuration file. Tiles cannot be processed independently from their neighbouring tiles. Before each computation step, neighbouring tiles need to exchange their borders. In addition, each computation step depends on the preceding step. Section 2 presents the image skeletonization algorithm. Section 3 develops a parallelization scheme. Section 4 shows how to load balance the application by cell remapping. The performance analysis is presented in Section 5.

2 IMAGE SKELETONIZATION ALGORITHM

Image skeletonization consists of extracting the skeleton from an input black and white image. The algorithm repeatedly erodes the image until only the skeleton remains. The erosion is performed by applying a 5x5 thinning filter to the whole image. The thinning filter is applied repeatedly, thinning the input image pixel by pixel. The algorithm ends once the thinning process leaves the image unchanged. Figure 1 shows a skeletonized image.

Figure 1. Skeletonization algorithm example (original image and skeletonized image)

Since several skeletonization algorithms exist, let us describe the one providing excellent results [4]. Let TR(P1) be the number of white-to-black (0 -> 1) transitions in the ordered set of pixels P2, P3, P4, ..., P9, P2 describing the neighbourhood of pixel P1 (Fig. 2). Let NZ(P1) be the number of black neighbours of P1 (black = 1). P1 is deleted, i.e. set to background (white = 0), if:

    2 <= NZ(P1) <= 6  and  TR(P1) = 1  and  (P2·P4·P8 = 0 or TR(P2) ≠ 1)  and  (P2·P4·P6 = 0 or TR(P4) ≠ 1)    (1)
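To make the deletion condition concrete, the sketch below implements the test of equation (1) for a single pixel, assuming 0/1 pixel values and the neighbour ordering P2, P3, ..., P9 of Fig. 2. Function names are illustrative; this is not the paper's implementation.

```cpp
#include <array>

// Pixel values: 1 = black (object), 0 = white (background).
// Neighbour order follows Fig. 2: p[0..7] = P2, P3, P4, P5, P6, P7, P8, P9.

// TR: number of 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2.
static int transitions(const std::array<int, 8>& p) {
    int t = 0;
    for (int i = 0; i < 8; ++i)
        if (p[i] == 0 && p[(i + 1) % 8] == 1) ++t;
    return t;
}

// NZ: number of black neighbours of P1.
static int blackNeighbours(const std::array<int, 8>& p) {
    int n = 0;
    for (int v : p) n += v;
    return n;
}

// Deletion test of equation (1); trP2 and trP4 are the TR values computed
// at the neighbours P2 and P4 with the same rule.
static bool deletePixel(const std::array<int, 8>& p, int trP2, int trP4) {
    const int P2 = p[0], P4 = p[2], P6 = p[4], P8 = p[6];
    const int nz = blackNeighbours(p);
    return nz >= 2 && nz <= 6
        && transitions(p) == 1
        && (P2 * P4 * P8 == 0 || trP2 != 1)
        && (P2 * P4 * P6 == 0 || trP4 != 1);
}
```

Whether an image region still changes under this test is what later decides, per cell, if the cell is alive or dead.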
The process is repeated as long as changes occur. This algorithm is highly data dependent. One thinning filter step modifies only small parts of the input image and leaves the major part unchanged. In the next section we take advantage of this fact to improve the algorithm.

    P3 P4 P5
    P2 P1 P6
    P9 P8 P7

Figure 2. Neighbourhood of pixel P1

2.1 Improvement of the image skeletonization algorithm

To improve the image skeletonization algorithm, we divide the input image into cells (or tiles). The program maintains a list of living and dead cells. A cell is dead if further applications of the thinning filter leave the cell unchanged. A cell is alive if it is not dead. The algorithm applies the thinning filter to each living cell, decides whether the cell is still alive and, if necessary, updates the list of living and dead cells. This considerably improves the performance of the skeletonization, since a significant part of the image is removed from the computation. What should the cell size be? A small cell size ensures a fine grain selection of the living and dead parts of the image, but increases the cell management overhead. The cell size should be chosen so as to keep the management overhead small compared with the average computation time.

Figure 3. Division of the input image into cells (original image and split image; cell columns C0-C7, cell lines L0-L7)

3 PARALLEL SKELETONIZATION WITH STATIC LOAD DISTRIBUTION

To parallelize the skeletonization algorithm, the cells are uniformly distributed over N processing nodes. Each processing node applies the thinning filter repeatedly to all its living cells. Since the cells cannot be processed independently from their neighbours, the processing nodes may need to exchange neighbouring information before applying the thinning filter to a given cell. The program ends once all the cells of every processing node are dead. This parallelization scheme ensures that all processing nodes are performing the same task.
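The per-node bookkeeping of living and dead cells can be sketched as follows. The neighbour-border exchange is omitted here (it is detailed below), and all names are illustrative rather than taken from the actual CAP program; this is only a minimal sketch of the living-cell loop.

```cpp
#include <list>

// Illustrative cell: applies one thinning pass to its tile and reports
// whether any pixel changed. The counter stands in for the real tile data.
struct TileCell {
    int remainingPasses;
    bool applyThinningFilter() { return --remainingPasses > 0; }  // true if the tile changed
};

// One processing node repeatedly thins its living cells; a cell that no
// longer changes is dead and is removed from the living-cell list.
void runNode(std::list<TileCell>& livingCells) {
    while (!livingCells.empty()) {
        for (auto it = livingCells.begin(); it != livingCells.end(); ) {
            // (Borders would be exchanged with neighbouring cells here, see below.)
            if (it->applyThinningFilter())
                ++it;                        // still alive
            else
                it = livingCells.erase(it);  // dead: drop it from the list
        }
    }
}
```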
Initially the cells are distributed in a round-robin fashion over the N processing nodes. For example, if there are 4 processing nodes, the cells (Fig. 3) are distributed in row-major order modulo the number of nodes ((L0, C0) -> P0, (L0, C1) -> P1, ...). More generally, if there are LMax lines, CMax columns and N processing nodes, the mapping of the cells onto the processing nodes as a function of the line and column numbers (L, C) is given by:

    NodeIndex(L, C) = (L · CMax + C) mod N    (2)

In the parallel algorithm, the overhead for the exchange of information between neighbouring cells increases, since communication and synchronization are needed between processing nodes responsible for adjacent cells. The parallel program therefore requires larger cell sizes. To develop the parallel application, we use the Computer-Aided Parallelization (CAP) framework, which allows us to manage the neighbourhood dependencies and the data flow synchronization. The CAP Computer-Aided Parallelization framework [5,6] is especially well suited for the parallelization of applications having significant communication and I/O bandwidth requirements. Application programmers specify at a high level of abstraction the set of threads present in the application, the processing operations offered by these threads, and the flow of data and parameters between operations. Such a specification is precompiled into a C++ source program which can be compiled and run on a cluster of distributed memory PCs. A configuration file specifies the mapping between CAP threads and operating system processes possibly located on different PCs. The compiled application is executed in a completely asynchronous manner: each thread has a queue of input tokens (serializable C++ data structures) containing the operation execution requests and their parameters. Network I/O operations are executed asynchronously, i.e. while data is being transferred to or from the network, other operations can be executed concurrently by the corresponding processing node. If the application is compute bound, in a pipeline of network communication and processing operations, CAP allows the time taken by the network communications to be hidden. After initialization of the pipeline, only the processing time, i.e. the cell state computation, determines the overall processing time. Each processing node contains two threads (IOThread, ComputeThread) running in one shared address space. The address space stores the data relative to all its cells. The IOThread runs the asynchronously called ReceiveNeighbouringInfo and SendNeighbouringInfo functions. The ComputeThread runs the ComputeNextCellStep function. The ReceiveNeighbouringInfo function receives the neighbouring information from the other cells and puts it in the local processing node memory. The SendNeighbouringInfo function sends the neighbouring information from the local cells to the neighbouring cells. The ComputeNextCellStep function applies the skeletonization filter to the given cell. Before starting the computation, ComputeNextCellStep waits for the required neighbouring information and until the current neighbouring information has been sent. Once the filter is applied, ComputeNextCellStep determines whether the cell is still alive and, if necessary, updates the list of living and dead cells.

Figure 4. Parallel skeletonization algorithm schedule for one cell (the IOThread runs SendNeighbouringInfo and ReceiveNeighbouringInfo; the ComputeThread waits until the send and receive complete, then runs ComputeNextCellStep)
Each processing node runs the same schedule. The schedule comprises a main loop executing one running step for all the living cells of the local address space. A running step consists of executing asynchronously the SendNeighbouringInfo and ReceiveNeighbouringInfo functions and, in parallel, the ComputeNextCellStep function for the given cell. Figure 4 shows a schematic view of the schedule.
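The following sketch gives the flavour of this schedule in plain C++ threads. It is only an approximation: the real program is generated from a CAP specification, and the futures and stub bodies used here to express the send/receive/compute overlap are assumptions of the sketch, not part of CAP.

```cpp
#include <future>
#include <vector>

// Illustrative cell: an image tile plus its current step value.
struct Cell {
    int step = 0;
};

// Stubs standing in for the asynchronous border exchange done by the IOThread.
std::future<void> sendNeighbouringInfo(Cell&)    { return std::async(std::launch::async, [] {}); }
std::future<void> receiveNeighbouringInfo(Cell&) { return std::async(std::launch::async, [] {}); }

// Thinning step executed by the ComputeThread (stub).
void computeNextCellStep(Cell& c) { ++c.step; }

// One running step over all living cells of the local address space: the
// border exchange of a cell is issued asynchronously, and the cell is
// computed only once its own send and receive have completed.
void runningStep(std::vector<Cell*>& livingCells) {
    for (Cell* c : livingCells) {
        auto sent     = sendNeighbouringInfo(*c);
        auto received = receiveNeighbouringInfo(*c);
        sent.wait();
        received.wait();
        computeNextCellStep(*c);
    }
}
```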
The CAP tool allows the programmer to specify this schedule using appropriate high-level language constructs [6]. When the program starts, all the cells are at step zero. Each time the thinning filter is applied to a cell, the cell step is incremented by one. Because of the neighbouring dependencies, the difference between the step of a given cell and the steps of its neighbours is at most one. Therefore, during the program execution, some cells are waiting for their neighbours before performing the computation of the next step. If all the cells of a processing node are waiting, the processor becomes idle, reducing the overall performance. To avoid such a situation as much as possible, the parallel algorithm is improved by computing first, on each processing node, the cell with the smallest step value. While some cells are sending their neighbouring information or waiting for the reception of neighbouring information, other cells can keep the processor busy. This argument fails if there is just one cell per processing node or if the cellular automaton topology implies that every cell depends on all other cells. In order to run computations in parallel with communications, one may also partially compute a cell without knowing the neighbouring information: cell computation may start while the neighbouring information is being received from other cells [7]. If the total computation load is evenly distributed over the processing nodes, the parallel algorithm can potentially keep all the processors busy. However, in the case of the skeletonization algorithm, the computation time is highly data dependent. To keep all processors busy, we need to balance the computation load dynamically.

4 DYNAMIC LOAD BALANCED PARALLEL SCHEME

For load balancing, we need to remap the cells during program execution. In order to migrate a cell from one address space to another, we need to maintain the load (or the inverse of the load, i.e. an idle factor) for each processing node. A simple way of computing the load is presented here. Let A be the average step value of all cells:

    A = (1 / NumberOfCells) · Σ_AllCells CellStepValue    (3)

For each processing node, we compute an IdleFactor by adding the signed differences between the processing node's cell step values and the average step value A. When the processor is idle, the IdleFactor is set to a MaxIdleFactorValue minus the number of cells in the processing node:

    IdleFactor = Σ_NodeCells (CellStepValue − A),       if the processor is not idle
    IdleFactor = MaxIdleFactorValue − NumberOfCells,    if the processor is idle    (4)

(A processing node having no cell to process should have a higher IdleFactor than processing nodes with cells waiting for neighbouring information.)

A negative IdleFactor indicates that the cell step values of the corresponding processing node are behind those of the other processing nodes. A processing node with a strongly negative IdleFactor is overloaded and slows down the other processing nodes which run its neighbouring cells. A positive IdleFactor indicates that the corresponding processing node is ahead of the others. The processor of such a processing node may soon become idle, since the neighbouring dependencies with the cells of other processing nodes will put it into a wait state. To balance the load, a cell from the processing node having the most negative IdleFactor should be remapped to the processing node having the largest positive IdleFactor. The IdleFactor is evaluated periodically, every time a new cell migration is performed. Between two cell migrations, a specific IntegrationTime allows the system to take advantage of the previous cell migration.
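A direct transcription of equations (3) and (4) is sketched below; parameter names mirror the text (MaxIdleFactorValue, cell step values), but the function signatures are illustrative assumptions.

```cpp
#include <vector>

// Equation (3): average step value A over the cells of all processing nodes.
double averageStep(const std::vector<int>& allCellSteps) {
    long sum = 0;
    for (int s : allCellSteps) sum += s;
    return allCellSteps.empty() ? 0.0 : static_cast<double>(sum) / allCellSteps.size();
}

// Equation (4): IdleFactor of one processing node, given the step values of
// its own cells, the global average A, and whether its processor is idle.
double idleFactor(const std::vector<int>& nodeCellSteps, double a,
                  bool processorIdle, double maxIdleFactorValue) {
    if (processorIdle)
        return maxIdleFactorValue - static_cast<double>(nodeCellSteps.size());
    double f = 0.0;
    for (int s : nodeCellSteps) f += s - a;  // signed differences to the average
    return f;
}
```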
In order to compute the IdleFactor, one additional thread, called MigrationThread, is added to each processing node. Periodically, a new token is generated and traverses the MigrationThreads of all processing nodes. The token is generated in processing node P0, then visits all processing nodes in the order P1, P2, ..., PN and back to P0. The migration token makes three full traversals in order to allow the parallel system to decide which cell to remap. During the first traversal, the migration token collects the number of living cells of each processing node and the sum of their step values. This information is distributed to all the processing nodes during the second traversal. During this same traversal, every node computes its IdleFactor. This IdleFactor is collected by the migration token and distributed to all processing nodes during the third and last traversal. Then every node decides in a distributed manner which processing nodes are involved in the migration.
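The token circulating through the MigrationThreads can be pictured as a small record accumulating per-node statistics over the three traversals. The layout below is an assumption made for illustration; the actual CAP token type is not given in the paper.

```cpp
#include <vector>

// Illustrative migration token, circulating P0 -> P1 -> ... -> PN -> P0 three times.
struct MigrationToken {
    int traversal = 1;                 // 1, 2 or 3

    // Filled during the first traversal, one entry per processing node.
    std::vector<int>  livingCells;     // number of living cells of each node
    std::vector<long> stepSums;        // sum of the cell step values of each node

    // Filled during the second traversal, once every node knows the global
    // average A and can evaluate equation (4).
    std::vector<double> idleFactors;

    // After the third traversal, every node holds all IdleFactors and decides,
    // in a distributed but identical way, the source and destination nodes
    // of the next cell migration.
};
```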
The processing node from which the migration starts migrates the cell with the smallest step value. In order to perform the migration, the IOThread broadcasts the migration cell destination to every processing node and waits for acknowledgement. Once the IOThread has received acknowledgements from every processing node, no further information for the migrating cell will be received on the current processing node. The IOThread then sends the cell data and all the previously received neighbouring information to the destination processing node. The migration is done. The time period between migration cycles is set by the IntegrationTime parameter. If the IntegrationTime is too short, the processing nodes waste time performing useless cell migrations. In the worst case, a too short IntegrationTime results in migrating all the cells of a processing node, leaving it without any cell. If the IntegrationTime is too large, the processors may become idle before receiving a migrated cell. Experiments show that it is difficult to find a good IntegrationTime. In order to improve the cell remapping strategy, let us introduce the notion of stability. A processing node is stable if the difference between the CellStep values within the processing node is at most one:

    Processing node is stable ⟺ Max(CellStepValue) − Min(CellStepValue) <= 1    (5)

A processing node is unstable if it is not stable. Since the ComputeNextCellStep function processes first the cells with the smallest CellStep value, the stable state is a permanent state if no cell migration occurs. Without cell migration, each unstable processing node will sooner or later reach the stable state. The emitting and receiving processing nodes of a migration are determined by the IdleFactor. In order to improve the cell migration strategy, we take into account two migration rules which, in some special cases, prevent the migration of cells. We do not migrate a cell if the receiving processing node is in an unstable state. This rule avoids carrying out consecutive migrations to the same receiving processing node. We also do not migrate if the migrated cell would leave the receiving processing node in a stable state. This rule avoids a migration if the receiving processing node is not significantly ahead of the emitting processing node. These two rules are not applied if the receiving processing node is detected to be idle. The stability information is exchanged in the same way as the IdleFactor.

5 PERFORMANCE MEASUREMENT

The performance measurements were carried out on three input images: a balanced input image, a highly unbalanced input image and a slightly unbalanced input image. The balanced input image (Fig. 5) consists of a repetitive pattern ensuring an evenly distributed computation load. In the highly unbalanced input image (Fig. 6), according to the cell distribution (eqn. 2), the non-empty cells are distributed unevenly across the processing nodes: one out of two processing nodes receives empty cells, which require only one computation step. The slightly unbalanced input image (Fig. 7) is an intermediate case between the balanced and the highly unbalanced input images. The balanced and highly unbalanced input images are of size 2048x2048 pixels (8 bits/pixel) and are split into 16x16 cells of 128x128 pixels each. The slightly unbalanced input image is of size 1024x1536 pixels (8 bits/pixel) and is split into 8x12 cells of 128x128 pixels each.
Figure 5. Example (a) of a balanced input image, (b) after segmentation into cells and (c) after skeletonization
Figure 6. Example (a) of a highly unbalanced input image, (b) after segmentation into cells and (c) after skeletonization

Figure 7. Example (a) of a slightly unbalanced input image, (b) after segmentation into cells and (c) after skeletonization

The N processing nodes are all dual Pentium Pro 200 MHz PCs running under Windows NT 4.0. They are connected through a Fast Ethernet switch. Figure 8 shows the speedup for the balanced input image, the highly unbalanced input image and the slightly unbalanced input image. The measurements are done for 1 to 10 processors. The performance of the algorithms with and without the cell migration scheme is compared. The measurements for the dynamically load balanced algorithm are done with an IntegrationTime of 0.5 sec between cell migrations. In the case of the balanced input image, there is no significant performance difference between the two algorithms. For such an input image, cell migration is useless since the load is perfectly balanced between the processing nodes. The results show that the overhead induced by the management of the cell migration is low. The parallelization does not provide a linear speedup because the neighbouring information exchange consumes processing resources (CPU power for the TCP/IP communication protocol). In the case of the highly unbalanced input image, the performance is considerably improved by dynamic load balancing. Without cell migration the efficiency (speedup/N) is approximately 50%, since one processor out of two becomes idle. Implementing cell migration allows the parallel program to reach approximately the same speedup with a balanced or an unbalanced input image. In the case of the slightly unbalanced input image, the performance is improved by using the dynamically load balanced algorithm.
Since about one out of four cells of the input image is empty, the theoretical maximum performance improvement factor obtainable by the dynamically load balanced algorithm is 4/3 ≈ 1.33. Our algorithm reaches an improvement factor somewhat below this bound; the difference is due to the fact that some time is needed until the system reaches a load balanced state.

Figure 8. Speedup for (a) the balanced input image, (b) the highly unbalanced input image and (c) the slightly unbalanced input image, for 1 to 10 processors, comparing the ideal speedup with the measured speedup with and without migration.

Globally, performance is improved by dynamic cell remapping. No improvement is expected in the case of a well balanced input image. A major improvement is achieved in the case of an unbalanced input image. The presented results closely match the expected results.

6 CONCLUSIONS

We are interested in the parallelization of cellular automata. Our experiment is based on a particular image skeletonization method. We have developed a parallelization algorithm which can easily be applied to other cellular automata. We explore two parallelization methods, one with a static load distribution consisting in splitting the cells over several processing nodes, and the other with a dynamic load balancing scheme capable of remapping cells during program execution. Performance measurements show that cell migration doesn't reduce the speedup if the application is already load balanced, and that it improves the performance if the parallel application is not well balanced. Cellular automata have a wide range of applications: matrix computation, state machines, Von Neumann automata, etc. Many problems can be expressed as cellular automata. Developing a custom parallel application from scratch requires a large effort. This paper shows the possibility of developing first a generic parallel cellular automaton and, on top of it, parallel applications making use of the cellular automaton program interface. This approach reduces the programming effort without losing efficiency.

ACKNOWLEDGEMENT

We thank Gilles Ritter for having written the first version of a simple parallel skeletonization algorithm. This research is supported by the Swiss SPP-ICS research program, grant
REFERENCES

1. Moshe Sipper, Evolution of Parallel Cellular Machines, LNCS 1194, Springer Verlag, 1997.
2. T. Toffoli, N. Margolus, Cellular Automata Machines: A New Environment for Modeling, MIT Press, 1987.
3. Robert J. Schalkoff, Digital Image Processing and Computer Vision, Wiley, 1989.
4. Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989.
5. V. Messerli, R. D. Hersch, Parallelizing I/O intensive Image Access and Processing Applications, IEEE Concurrency, Vol. 7, No. 2, April/June 1999.
6. B. Gennart, R. D. Hersch, Computer-Aided Synthesis of Parallel Image Processing Applications, Proceedings Conf. Parallel and Distributed Methods for Image Processing III, SPIE Int. Symp. on Optical Science, Denver, July 1999, SPIE Vol. 3817.
7. B. Gennart, R. D. Hersch, Synthesizing parallel imaging applications using the CAP Computer-Aided Parallelization tool, Proceedings Conf. Storage & Retrieval for Image and Video Databases VI, San Jose, Jan. 1998, SPIE Vol. 3312.
