An Unconventional Method for Load Balancing

Yuefan Deng,* R. Alan McCoy,* Robert B. Marr,† Ronald F. Peierls†

Abstract

A new method of load balancing is introduced, based on the idea of dynamically relocating virtual processes corresponding to computations on an abstract system with a larger number of processors. The algorithm preserves the locality of nearest-neighbor interactions and has been tested on simulated data and on a molecular dynamics code.

1 Introduction

A general approach to the problem of load balance for distributed-memory MIMD architectures is developed. It is targeted at those computations whose parallel structure is obtained by decomposing a task into components which run mostly independently, each component computation involving its own private data for the most part, and with only occasional synchronization points. For such "mostly local" computations, there is often considerable flexibility in how the work is allocated between processors, leading to very different degrees of load balance. In many cases, complete balance should be possible in principle.

A number of factors make such a balanced computation difficult to achieve. A decomposition that is uniform in terms of the problem definition may not correspond to a uniform distribution of load, and non-uniform decompositions may be much harder to program. The load distribution may not even be predictable a priori or, worse still, may vary during the course of the calculation. For this reason dynamic load balancing techniques, preferably requiring minimal work from the applications programmer and minimal extra computer resources, are of great interest. We have developed a general method for attacking this problem.

Load balancing is in fact an optimization problem.
Suppose that for a given task distributed among $p$ processors with a particular decomposition the computing times are $t_i$, $\forall i \le p$, with maximum $t_{max}$ and average $\bar{t}$. Processor $i$ is therefore idle for time $(t_{max} - t_i)$, so that the total waste of resources in this computation is

$$W = \sum_{i=1}^{p} (t_{max} - t_i) = p\,(t_{max} - \bar{t}).$$

It is useful to define a normalized imbalance ratio $I$, which measures the percentage waste of resources due to load imbalance. Ignoring, for the moment, any overhead computing time introduced by the decomposition, the load balancing problem involves finding a decomposition which minimizes $I$.

*Center for Scientific Computing, The University at Stony Brook, Stony Brook, NY
†Department of Applied Sciences, Brookhaven National Laboratory, Upton, NY 11973
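The waste and imbalance measures of the Introduction can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' code. The function names are hypothetical, and the normalization $I = W / (p\,\bar{t}) = (t_{max} - \bar{t}) / \bar{t}$ is an assumption, since the defining equation for the ratio is not reproduced in the text. It is chosen because the imbalances quoted in Fig. 1 exceed 100%, which a normalization by $t_{max}$ could not produce.

```python
from statistics import mean


def total_waste(times):
    """Total idle time W = sum_i (t_max - t_i), as defined in the text."""
    t_max = max(times)
    return sum(t_max - t for t in times)


def imbalance_ratio(times):
    """Normalized imbalance; ASSUMED form I = (t_max - t_bar) / t_bar."""
    t_bar = mean(times)
    return (max(times) - t_bar) / t_bar


# Hypothetical per-processor computing times for p = 4 processors.
times = [4.0, 6.0, 5.0, 9.0]
W = total_waste(times)
I = imbalance_ratio(times)
print(f"W = {W}, I = {I:.0%}")
```

For these sample times the two forms of the waste agree: the sum of idle times and $p\,(t_{max} - \bar{t})$ both give 12.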

Our approach is to write the application code as if the number of available processors were some multiple of the actual number, and then run many such virtual processors (VPs) on each physical one. Concurrently with the application we analyze the load on each processor and then move entire VPs from one physical processor to another. This relocation is transparent to the application code being executed. The strategy for such relocation must depend on the particular problem. There are, in fact, three overhead costs introduced by the load balancing procedure:

(1) Additional computation introduced by partitioning the problem into a larger number of pieces.
(2) Extra communication costs in moving VPs away from their originally assigned physical processors, when the problem decomposition is by spatial domain and the communication requirements are spatially local.
(3) Costs to analyze and determine the desired relocation and to actually move the data associated with a VP.

Solving the optimization completely, taking the variation of these overhead factors into account, is prohibitively expensive. Even an approximate solution may cost more than the load imbalance itself unless the load distribution changes slowly relative to the interval between synchronization points in the computation, so that many time steps can be allowed for achieving improved balance. The approach we have adopted is to minimize $I$, or equivalently $t_{max}$, constraining the decomposition choices to keep the other costs small without formally including them in the optimization.

2 Virtual Processor Approach

The simplest way of implementing the virtual processor approach is to run a single process on each processor. The address space of that process is partitioned into blocks corresponding to the virtual processes, and the process repeatedly executes the computational code, pointing successively to the data blocks of the virtual processes associated with that processor.
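The single-process arrangement just described can be sketched as follows. This is an illustrative model only: the class and function names (`VirtualProcessor`, `PhysicalProcessor`, `relocate`, `kernel`) are hypothetical, the "processors" are plain Python objects in one address space, and no real data movement or message passing is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualProcessor:
    vp_id: int
    data: list  # private data block owned by this VP


@dataclass
class PhysicalProcessor:
    rank: int
    vps: dict = field(default_factory=dict)  # vp_id -> VirtualProcessor

    def step(self, kernel):
        """Run the same computational code once per resident VP."""
        for vp in self.vps.values():
            kernel(vp)


def relocate(vp_id, src, dst):
    """Move an entire VP (its data block) from src to dst."""
    dst.vps[vp_id] = src.vps.pop(vp_id)


def kernel(vp):
    """Stand-in for the application code: increment every value."""
    vp.data = [x + 1 for x in vp.data]


# Two physical processors; processor 0 starts with both VPs.
p0 = PhysicalProcessor(0, {0: VirtualProcessor(0, [0]), 1: VirtualProcessor(1, [10])})
p1 = PhysicalProcessor(1)

p0.step(kernel)       # both VPs advance on processor 0
relocate(1, p0, p1)   # VP 1 moves; the kernel itself is unchanged
p0.step(kernel)
p1.step(kernel)
```

The point of the design is that `kernel` never refers to the physical processor, so moving a VP only changes which loop visits it.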
At intervals, such as after a certain number of time steps in a simulation, or whenever a processor is forced to wait at a synchronization point for a more heavily loaded processor to catch up, the algorithm discussed below is executed. Once a better decomposition has been determined, the data blocks corresponding to the relocated VPs are moved to their new processors, and the lists of pointers are adjusted correspondingly. If the code uses explicit message passing for communication, it is important to trap the message passing calls and replace them with simple assignment statements when the communicating virtual processors are on the same physical processor.

In implementing the tests discussed here, we have made extensive use of the IPX package [3], which allows one processor to asynchronously execute procedures operating on the data of another processor. This execution does not require waiting for any explicit action by the destination processor, and can interrupt an ongoing computational thread. The destination processor can, however, block such preemption during critical sections of code. This enables the computation of the redistribution, and the actual movement of data, to be completely transparent to the underlying computational code being balanced.

3 Algorithm for Redistributing Load

In a reasonably large problem, with many processors and many virtual processors (VPs) for each physical one, there are generally many different rearrangements which would lead to the same imbalance if it were not for communication costs. The algorithm introduced here is based on the assumption that communication costs are significant, and dominated by local interactions.

Assume we are dealing with a computational region which is decomposed by a rectangular grid into cells, each VP executing the computation associated with a single cell. For simplicity, we discuss here the case of a two-dimensional decomposition, though the algorithm can be generalized straightforwardly to higher-dimensional decompositions and to grids that are not rectangular. We consider the processors to be also represented as a (coarser) rectangular grid, each grid cell corresponding to a processor. The two grids are related by the fact that each grid point of the coarse (processor) grid is mapped to a grid point of the fine (virtual process) grid, and that the lines bounding the processor cells are mapped into the lines connecting the mapped grid points. We then assign any VP to the processor whose mapped cell contains the center of the VP cell.

We introduce the concept of a pressure associated with each processor cell, proportional to its wasted resources: $p_i = \bar{t} - t_i$. As a result of this pressure, cells associated with light loads will tend to expand, while cells associated with heavy loads (negative pressure) will contract. This expansion or contraction is achieved by computing the net force at each mapped coarse grid point resulting from the pressures in the cells which share that point as a vertex, the boundaries being constrained to remain straight. We proceed iteratively, at each step allowing the mapping to change by zero or one fine grid step, depending on the magnitude and direction of the resulting force.

It is clear, by construction, that the algorithm preserves locality and tends to process neighboring regions on the same processor. We assert, without proof, that the configuration will tend to relax towards one with improved load balance. We have carried out a series of simulations for a variety of cases, and the results tend to confirm the hypothesis.
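The pressure idea can be illustrated with a one-dimensional analogue. The sketch below is a deliberate simplification and not the two-dimensional algorithm described above: the VPs lie along a line, each processor owns a contiguous run of them, and each interior boundary moves by zero or one VP per iteration. All names, the `threshold` parameter, and the rule that every processor keeps at least one VP are assumptions made for illustration.

```python
def processor_loads(vp_loads, bounds):
    """Load of processor k is the sum over VPs in [bounds[k], bounds[k+1])."""
    return [sum(vp_loads[bounds[k]:bounds[k + 1]]) for k in range(len(bounds) - 1)]


def relax_step(vp_loads, bounds, threshold):
    """One iteration: each interior boundary moves by at most one VP."""
    loads = processor_loads(vp_loads, bounds)
    t_bar = sum(loads) / len(loads)
    pressure = [t_bar - t for t in loads]  # p_i = t_bar - t_i
    new = list(bounds)
    moved = False
    for k in range(1, len(bounds) - 1):
        # Net rightward force on the boundary between cells k-1 and k.
        force = pressure[k - 1] - pressure[k]
        if force > threshold and new[k] + 1 < new[k + 1]:
            new[k] += 1   # lightly loaded left cell expands
            moved = True
        elif force < -threshold and new[k] - 1 > new[k - 1]:
            new[k] -= 1   # lightly loaded right cell expands
            moved = True
    return new, moved


def rebalance(vp_loads, bounds, threshold, max_iters=100):
    """Iterate until no boundary moves or max_iters is reached."""
    for _ in range(max_iters):
        bounds, moved = relax_step(vp_loads, bounds, threshold)
        if not moved:
            break
    return bounds


# Eight VPs on two processors, heavy load on the right half.
vp_loads = [1, 1, 1, 1, 5, 5, 5, 5]
final = rebalance(vp_loads, [0, 4, 8], threshold=5.0)
print(final, processor_loads(vp_loads, final))
```

In this example the boundary moves from 4 to 6, changing the processor loads from (4, 20) to (14, 10). With `threshold=0` the same input alternates between boundaries 5 and 6 until `max_iters` is reached, because a boundary can only move in whole VP units.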
We have also applied the algorithm to a real molecular dynamics calculation. The full details will be reported elsewhere.

3.1 Simulation Results

A number of cases were studied, assigning arbitrary loads to the fine grid cells. We examined cases where the heavy load was concentrated in a single region, in several localized regions, or along one or more lines. We studied varying grid shapes and multiplicities. In every case the algorithm generated a significant improvement in the load distribution, though the approach to the best solution was not always monotonic. To avoid instability, we introduced a threshold, inhibiting any rearrangement if the force at a given vertex was too small. This is made necessary by the fact that the movement of a vertex is quantized in units of one fine grid step. If the threshold is too small, instability can occur; if the threshold is too large, the algorithm terminates before the best solution is reached.

In the figures we show the results of two examples. In the first, the imbalance is very large and represents a smooth function peaking in the upper right-hand corner. In the second case, we took an alternating configuration. Initially the algorithm results in zero net force at the interior vertices, but the reflecting boundary breaks the symmetry and the interior points in turn relax after a very few iterations.

FIG. 1. These figures show the simulated results of applying the proposed algorithm to two different load distributions. In both cases there were 9 processors with 4 x 4 VPs on each. Figures A1 and B1 show the loads on each processor before (dashed bars) and after (solid bars) the load balancing. Figures A2 and B2 show the distribution of VPs among processors. The grey scale regions represent the initial VP distribution on the processors, with darker regions corresponding to more heavily loaded processors. Within each processor, the VP loads are randomly assigned. The dark lines indicate the final distribution of VPs after balancing. In case A the imbalance was reduced from 198% to 9% and in case B from 67% to 10%.

4 Application to Molecular Dynamics Code

We have implemented the load balance algorithm as part of a molecular dynamics (MD) code and observe significant gains in efficiency when load balancing is used. There is considerable interest in load balancing MD algorithms, especially in the study of particles in non-equilibrium phases. Previous efforts to load balance MD codes are based on adjusting the size of regions and thus are not easily extended beyond one-dimensional partitions [1]. We show the results of the proposed algorithm for a two-dimensional partition below.

FIG. 2. Figure C1 shows CPU time for each step of a molecular dynamics code with load balancing (solid line) and with no load balancing (dashed line), for the same simulation of particles on 25 processors. Figure C2 shows the final configuration of VPs (fine grid) on each processor. A strong attracting point near the origin created an imbalance across the periodic boundaries; the load balance algorithm redistributed the VPs accordingly.

The molecular dynamics algorithm we used for testing is a short-range, link-cell method for distributed-memory computers, similar to those in [4, 5, 2]. Our implementation simulates particles interacting in a three-dimensional parallelepiped with periodic boundary conditions.

To create an imbalanced case for comparison, we start with particles equally distributed in a three-dimensional domain (with periodic boundaries), having a uniform (reduced) density $\rho$. We place an attracting point near the origin and allow the particles to interact for 7500 steps. As the simulation progresses, the attracting point causes the particles to converge toward the origin.

We partition the 3D domain according to a two-dimensional grid of 5 x 5 processors. We evenly distribute 225 VPs across the 25 processors, so that each processor initially has an array of 3 x 3 VPs. The load balance algorithm is performed once every 200 steps, and the VPs are redistributed only when $I > 10\%$. For comparison, we also executed the same simulation and partitioning with no load balancing. The tests were performed on a Paragon XP/S-4 parallel computer.

Figure 2 shows the efficiency gained by using our load balancing algorithm in an MD code.
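The relocation policy used in this test (examine the load every 200 steps, move VPs only when the imbalance exceeds 10%) amounts to a simple predicate. The sketch below is a hypothetical helper, not taken from the MD code. It assumes the imbalance is normalized by the average time, $(t_{max} - \bar{t}) / \bar{t}$, and that step 0 is never checked; neither detail is stated in the text.

```python
def imbalance_ratio(times):
    """ASSUMED normalization: (t_max - t_bar) / t_bar."""
    t_bar = sum(times) / len(times)
    return (max(times) - t_bar) / t_bar


def should_relocate(step, times, interval=200, threshold=0.10):
    """True when this step is a balancing step and the imbalance is too large.

    interval and threshold default to the values quoted in the text;
    skipping step 0 is an assumption of this sketch.
    """
    if step == 0 or step % interval != 0:
        return False
    return imbalance_ratio(times) > threshold


# Hypothetical per-processor step times.
print(should_relocate(200, [1.0, 1.0, 1.3]))   # balancing step, about 18% imbalance
print(should_relocate(150, [1.0, 1.0, 1.3]))   # not a balancing step
print(should_relocate(400, [1.0, 1.0, 1.03]))  # balancing step, about 2% imbalance
```

Gating the move on a threshold means the imbalance can drift upward between relocations, which is one way a non-monotonic time per step can arise.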
The non-balanced case shows a steady increase in the amount of time for each MD step; when the region around the attracting point becomes saturated, the rate of increase falls off. The load-balanced case shows a great gain in efficiency due to the automatic adjustment of VPs; the time per step does not rise substantially beyond that of the initial configuration. The non-monotonic behavior of the load-balanced case occurs because the VPs are only moved when the imbalance $I$ is above the threshold.

There is minimal overhead due to the use of multiple VPs per processor. In our implementation, the time required to relocate a VP was less than 2 CPU seconds, and we observe that this time is recovered in just a few time steps by the increase in load balance efficiency.

5 Future Development

There are a number of ways in which the algorithm as developed and implemented to date can be extended and improved.

(1) It should be generalized to three-dimensional decompositions and to non-rectangular space-filling grids.
(2) The approach should be enhanced with a good graphical interface to allow interactive control of the rearrangement strategy.
(3) The pressure concept might be generalized to allow curved boundaries, to achieve better final distributions.
(4) To improve convergence for large problems, a nested approach could be taken in which groups of processors are merged into supercells to which the same algorithm is applied, the individual processor mappings then being refined.
(5) The algorithm should be extended to allow for heterogeneous programming environments.
(6) Some method should be developed to allow for possible memory constraints restricting the number of VPs which a single processor can handle.
(7) Other virtual processor algorithms should be developed for cases where the communication pattern is other than local.

6 Conclusions

We have proposed a fairly robust approach to a class of load balancing problems which can be implemented with very little impact on the details of the application code being balanced. Preliminary results indicate that, although crude, the method can significantly reduce the load imbalance in many cases with very little effort on the programmer's part, and very little dependence on the details of the architecture.

Acknowledgements

YFD and RAM thank Professor James Glimm for encouragement and the National Science Foundation for partial financial support (grant DMS ). The work at BNL was supported by the U.S. Department of Energy under contract number DE-AC02-76CH

References

[1] F. Brugg and S. L. Fornili, A distributed dynamic load balancer and its implementation on multi-transputer systems for molecular dynamics simulation, Computer Physics Comm., 60 (1990), pp.
[2] P. S. Lomdahl, P. Tamayo, N. Gronbech-Jensen, and D. M. Beazley, 50 GFlops Molecular Dynamics on the Connection Machine 5, Proceedings of SUPERCOMPUTING 1993, IEEE Press.
[3] R. B. Marr, J. E. Pasciak, and R. Peierls, IPX - Preemptive remote procedure execution for concurrent applications, Brookhaven National Laboratory report BNL-60632 (1994).
[4] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, pre-print SAND91-1144, Sandia National Laboratories (1993).
[5] D. C. Rapaport, Multi-million particle molecular dynamics II, Computer Physics Comm., 62 (1991), pp.

DISCLAIMER

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.


How To Balance In Cloud Computing

How To Balance In Cloud Computing A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi johnpm12@gmail.com Yedhu Sastri Dept. of IT, RSET,

More information

IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications

IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications Open System Laboratory of University of Illinois at Urbana Champaign presents: Outline: IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications A Fine-Grained Adaptive

More information

Objective Criteria of Job Scheduling Problems. Uwe Schwiegelshohn, Robotics Research Lab, TU Dortmund University

Objective Criteria of Job Scheduling Problems. Uwe Schwiegelshohn, Robotics Research Lab, TU Dortmund University Objective Criteria of Job Scheduling Problems Uwe Schwiegelshohn, Robotics Research Lab, TU Dortmund University 1 Jobs and Users in Job Scheduling Problems Independent users No or unknown precedence constraints

More information

The International Journal Of Science & Technoledge (ISSN 2321 919X) www.theijst.com

The International Journal Of Science & Technoledge (ISSN 2321 919X) www.theijst.com THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE Efficient Parallel Processing on Public Cloud Servers using Load Balancing Manjunath K. C. M.Tech IV Sem, Department of CSE, SEA College of Engineering

More information
