1 Dynamic Mapping and Load Balancing on Scalable Interconnection Networks

Alan Heirich, California Institute of Technology, Center for Advanced Computing Research

The problems of mapping and load balancing arbitrary programs and data structures on networks of computers are considered. A novel diffusion algorithm is presented to solve these problems. It complements the diffusion algorithms for load balancing which have enjoyed success on massively parallel processors. Like these algorithms it is infinitely scalable and adapts in real time to the requirements of dynamic problems. This is in contrast to existing mapping strategies which are either not dynamic, not scalable, and/or inapplicable to irregular networks.

Many recursive bisection algorithms have been proposed for mapping and load balancing on regular and irregular networks. Recursive spectral bisection (RSB) has been very popular because it has a solid theoretical foundation. RSB solves a series of constrained one dimensional quadratic minimization subproblems in order to obtain a locally optimal solution to a multidimensional problem. The constraint prevents the algorithm from converging on a trivial solution. Incorporating the constraint into each subproblem leads to a series of eigenvalue problems which are typically solved by (expensive) Lanczos iterations.

The diffusion algorithm presented here solves the same quadratic minimization problem as RSB but solves it in multiple dimensions in a single step. As in RSB it is necessary to incorporate constraints in order to avoid the trivial solution. These constraints are incorporated directly from the problem instance and have a natural representation in the problem semantics. The resulting algorithm is scalable, dynamic, and applicable to any interconnection topology which can be contiguously embedded in R^n.

2 DIFFUSING COMPUTATIONS

Diffusion has been a metaphor for concurrent computation from the earliest days of parallel computing. Early papers introduced the metaphor, considered the problem of termination detection, and discussed mapping and process management.

- Computation: a form of energy which diffuses through a (parallel) computer system.
- Challenges: manage this (dynamic) energy for peak efficiency.
  - Mapping: to minimize communication.
  - Load balancing: to minimize idleness.
- Detect termination.

Dijkstra & Scholten, Termination detection for diffusing computations (1980).
Martin, A distributed implementation method for parallel programming (1980).
Chandy & Misra, Termination detection of diffusing computations in communicating sequential processes (1982).

3 DIFFUSION ALGORITHMS

With the advent of readily accessible parallel computers several practical diffusion algorithms were proposed. The algorithms are based on computing an equilibrium of a dynamical system.

- "Solve" a Laplace system $\nabla^2 x = 0$ by iteration.
- Converge to a fixed point (equilibrium) from any initial condition.
- Cybenko (1989) showed correctness for the load balancing problem.
- Heirich (1994), Heirich & Taylor (1995) showed infinite scalability.
- Heirich (1996) solves the mapping problem.

Cybenko, Dynamic load balancing for distributed memory multiprocessors (1989).
Heirich, Scalable load balancing by diffusion (1994).
Heirich & Taylor, A parabolic load balancing method (1995).
Heirich, Dynamic mapping and load balancing on scalable interconnection networks (1996).
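The idea of converging to an equilibrium from any initial condition can be sketched in a few lines. This is only an illustration under stated assumptions: the ring topology, the coefficient `alpha`, the tolerance, and the function name `diffuse_ring` are all hypothetical choices and are not taken from the papers cited above.

```python
def diffuse_ring(loads, alpha=0.25, tol=1e-6, max_iters=100_000):
    """Diffusive load balancing on a ring of processors (illustrative sketch).

    Each step, every processor moves alpha times the load difference
    toward each of its two neighbours. Total load is conserved.
    """
    x = [float(v) for v in loads]
    n = len(x)
    for it in range(max_iters):
        if max(x) - min(x) < tol:
            return x, it
        # All processors update simultaneously from the previous values.
        x = [x[i] + alpha * ((x[i - 1] - x[i]) + (x[(i + 1) % n] - x[i]))
             for i in range(n)]
    return x, max_iters


# All work starts on one processor; diffusion spreads it evenly.
loads = [100, 0, 0, 0, 0, 0, 0, 0]
balanced, iters = diffuse_ring(loads)
```

Every processor only ever talks to its neighbours, which is what makes schemes of this kind attractive on large machines.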

4 BIBLIOGRAPHY: DIFFUSION IN PARALLEL COMPUTING

At least a dozen closely related papers have been published since [year missing in source]. All of these apply diffusion to the load balancing problem.

1. Baden, SIAM J Sci Stat Comp, 12:1 (1991).
2. Boillat, Concurrency, 2 (1990).
3. Boillat, Bruge & Kropf, J Comp Phys, 96:1 (1991).
4. Bruge & Fornili, Comp Phys Comm, 60 (1990).
5. Chandy & Misra, ACM TOPLAS, 4:1 (1982).
6. Conley, Argonne Natl Lab Tech Rep ANL-93/40 (1993).
7. Cybenko, J Par Dist Comp, 7 (1989).
8. Dijkstra & Scholten, Inf Proc Lett, 11:1 (1980).
9. Heirich, Caltech Comp Sci Dept Tech Rep CS-TR (1994).
10. Heirich & Taylor, Proc 24th Intl Conf Par Proc, (1995) v. III.
11. Hong, Tan & Chen, Proc ACM Sigmetric Conf, (1988).
12. Horton, Par Comp, 19 (1993).
13. Hosseini, Litow, Malwaki, Nadella & Vairavan, J Par Dist Comp, 10 (1990).
14. Martin, Inf Proc, 80 (1980).
15. Muniz & Zaluska, Par Comp, 21 (1995).
16. Willabeek-LeMair & Reeves, IEEE Tr Par Dist Sys, 4 (1993).
17. Xu & Lau, J Par Dist Comp, 16 (1992).
18. Xu & Lau, J Par Dist Comp, 24 (1995).

5 PROPERTIES OF GSL ITERATIONS

A Gauss-Seidel iteration on a Laplace equation (GSL iteration) is a general algorithmic paradigm for equilibrium computations. From a practical standpoint it has properties which make it robust and scalable in distributed systems. From a theoretical standpoint it fits a general framework of equilibrium computations in dynamical systems.

- Gauss-Seidel iteration on the Laplace equation $\nabla^2 x = 0$.
- Discrete Laplacian matrix $Q$, split into upper ($U$), lower ($L$), and diagonal ($D$) parts.
- Discrete iteration $\vec{x} \leftarrow -(D + L)^{-1} U \vec{x}$.
- Properties: concurrent, asynchronous, fault tolerant, scalable, fast.
- Scalability and convergence:
  - Model problem (degree 4 lattice) has known eigenvalues.
  - Time dependent amplitude of a point disturbance is a linear superposition of modal amplitudes.
  - Convergence is exponential: the time needed to damp a disturbance is logarithmic in its initial amplitude.
  - An arbitrary disturbance can be modeled as a composition of point disturbances.
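The matrix form of the iteration and the familiar in-place sweep are the same computation, which a short sketch can make concrete. The path graph, the starting vector, and the helper names below are assumptions chosen for illustration only.

```python
import numpy as np


def laplacian(C):
    """Discrete Laplacian Q = D - C for a symmetric weight matrix C."""
    return np.diag(C.sum(axis=1)) - C


def gs_matrix_step(Q, x):
    """One Gauss-Seidel step in matrix form: x <- -(D + L)^{-1} U x."""
    D = np.diag(np.diag(Q))
    L = np.tril(Q, k=-1)   # strictly lower triangle
    U = np.triu(Q, k=1)    # strictly upper triangle
    return np.linalg.solve(D + L, -U @ x)


def gs_sweep(Q, x):
    """The same step written as an in-place sweep over the unknowns."""
    x = x.copy()
    for i in range(len(x)):
        # Uses already-updated values for j < i and old values for j > i.
        off_diagonal = Q[i] @ x - Q[i, i] * x[i]
        x[i] = -off_diagonal / Q[i, i]
    return x


# Path graph on five vertices with unit edge weights (illustrative).
C = np.zeros((5, 5))
for i in range(4):
    C[i, i + 1] = C[i + 1, i] = 1.0
Q = laplacian(C)
x0 = np.array([1.0, 0.0, 0.0, 0.0, 4.0])

one_step_matrix = gs_matrix_step(Q, x0)
one_step_sweep = gs_sweep(Q, x0)
```

In the sweep form each unknown is replaced by the weighted average of its neighbours, so each update needs only local information.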

6 SCALING FOR POINT DISTURBANCES

The scalability of GSL iterations comes from the near scale invariance of the Laplacian spectrum. The figures below show the eigenvalues of two GSL iterations on a model problem. Both model problems are a degree-4 lattice. On the left are three spectral components from a lattice, and on the right the corresponding components from a lattice.

[Figure: spectral components. Left panel series: ev0_64, ev16_64, ev31_64. Right panel series: ev0_1024, ev256_1024, ev511_1024.]

The eigenvalues of a GSL iteration on an $n \times n$ model problem are

$$\lambda_{i,j} = \left( \cos\frac{i\pi}{n} + \cos\frac{j\pi}{n} \right)^{2}$$

7 The time dependent amplitude of a point disturbance after $\tau$ iterates is

$$x^{(\tau)}_{0,0} = \sum_{i,j} c\,(\lambda_{i,j})^{\tau}$$

where the $\lambda_{i,j}$ are the eigenvalues of the GSL iteration, built from the terms $\cos\frac{i\pi}{n} + \cos\frac{j\pi}{n}$. [The expanded form of this sum is not fully legible in the source.]

Convergence to equilibrium occurs exponentially, i.e. time is logarithmic with respect to the height of the disturbance.

[Figure, left: height of the point disturbance during 32 successive GSL iterates. Right: number of iterates required to decrease the disturbance by 90% for increasing problem sizes. This number is constant above a certain size.]

Young, Iterative solution of large linear systems (1971).
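The size-independence of the damping time can be checked numerically. The sketch below is an assumption-laden illustration rather than the experiment behind the figure: it uses a lexicographic Gauss-Seidel sweep, holds the lattice border at zero, places a unit disturbance at the centre, and measures the largest value on the lattice. The function name and the lattice sizes are hypothetical.

```python
import numpy as np


def sweeps_to_reduce(n, fraction=0.1, max_sweeps=1000):
    """Count Gauss-Seidel sweeps until a unit point disturbance at the
    centre of an n x n lattice (border held at zero) drops below
    `fraction` of its initial height everywhere on the lattice."""
    x = np.zeros((n, n))
    c = n // 2
    x[c, c] = 1.0
    for sweep in range(1, max_sweeps + 1):
        # In-place update: each interior point becomes the average of
        # its four neighbours, using the newest available values.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                x[i, j] = 0.25 * (x[i - 1, j] + x[i + 1, j]
                                  + x[i, j - 1] + x[i, j + 1])
        if x.max() < fraction:
            return sweep
    return max_sweeps


# Number of sweeps needed for a 90% reduction at several lattice sizes.
counts = {n: sweeps_to_reduce(n) for n in (9, 17, 33, 65)}
```

Because the disturbance is damped long before it can interact with the distant border, the count stops changing once the lattice is large enough.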

8 GRAPH LAYOUT BY QUADRATIC MINIMIZATION

Many NP-hard and NP-complete problems can be characterized as graph layout problems: given a (capacitated) graph, find an arrangement of vertices which minimizes the sum of a metric over pairs of connected points.

- Minimize (nontrivially) aggregate distance $z$ among a (weighted) set of connected points.
- One dimension: $z = \sum_{i,j} (x_i - x_j)^2 c_{ij}$.
  - Matrix bandwidth reduction.
  - Mapping on a ring.
- Two dimensions: $z = \sum_{i,j} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] c_{ij}$.
  - Mapping on a mesh, hierarchical, or irregular network.
  - Detecting implicit control structures in compiled programs.
  - VLSI placement.
  - Graph isomorphism, many others.

Kung & Stevenson, A software technique for reducing the routing time on a parallel computer with a fixed interconnection network (1977).
Bokhari, On the mapping problem (1981).
Read & Corneil, The graph isomorphism disease (1977).

9 MAPPING PROGRAMS BY GRAPH EMBEDDING

The mapping problem can be described as a graph layout problem in which a graph of communicating processes is embedded into a graph of connected computers.

- "Guest" graph G: vertices are processes, edges are communication channels.
- "Host" graph H: vertices are computers, edges are network links.
- Graph embedding problem (Rosenberg): map vertices of G onto vertices of H to minimize dilation (average distance) and equalize density.
- Mapping problem (Martin): map processes onto computers to minimize aggregate communication and equalize workload.

Rosenberg, Issues in the study of graph embeddings (1981).
Martin, A distributed implementation method for parallel programming (1980).
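Dilation and density are easy to measure for a given mapping. The following sketch assumes unweighted graphs, hop count as the distance in H, and a small ring example; the function names and the two sample mappings are hypothetical.

```python
from collections import Counter, deque


def hop_distances(host, source):
    """Breadth-first hop counts from `source` to every computer in H."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in host[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_dilation(guest_edges, host, mapping):
    """Mean host distance between the endpoints of each guest edge."""
    total = 0
    for a, b in guest_edges:
        total += hop_distances(host, mapping[a])[mapping[b]]
    return total / len(guest_edges)


def density(mapping):
    """Number of guest vertices placed on each host vertex."""
    return Counter(mapping.values())


# Host H: ring of 4 computers. Guest G: ring of 8 processes.
host = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
guest_edges = [(i, (i + 1) % 8) for i in range(8)]

blocked = {i: i // 2 for i in range(8)}   # neighbours share a computer
cyclic = {i: i % 4 for i in range(8)}     # neighbours always separated

dilation_blocked = average_dilation(guest_edges, host, blocked)
dilation_cyclic = average_dilation(guest_edges, host, cyclic)
```

Both mappings equalize density at two processes per computer, but the blocked mapping halves the average dilation, which is the kind of difference the mapping problem is about.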

10 COORDINATE BISECTION ALGORITHMS

In specialized problems, such as solving partial differential equations, a mapping is implicit in the problem instance. The coordinates of a PDE grid provide a solution which can be found by repeated coordinate bisection.

- Repeatedly bisect along alternating axes using intrinsic problem coordinates.
- Example: partitioning an unstructured PDE grid.
- Advantages: scalable, O(n log n/p); probably the best method for PDE problems.
- Problems: restricted to special cases in which vertices of G possess intrinsic coordinates. Inapplicable to irregular networks or dynamic problems. No provision for capacitated problems (weighted edges or vertices).

Williams, Performance of dynamic load balancing algorithms for unstructured mesh calculations (1991).
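A minimal sketch of repeated coordinate bisection follows, assuming unweighted points that are split at the median along alternating axes. The random point set and the function name are illustrative assumptions, not part of the cited work.

```python
import numpy as np


def coordinate_bisection(points, depth):
    """Split points into 2**depth parts by recursive median bisection
    along alternating coordinate axes. Returns a list of index arrays."""
    def recurse(idx, depth, axis):
        if depth == 0:
            return [idx]
        # Order this part along the current axis and cut at the median.
        order = idx[np.argsort(points[idx, axis], kind="stable")]
        half = len(order) // 2
        next_axis = (axis + 1) % points.shape[1]
        return (recurse(order[:half], depth - 1, next_axis)
                + recurse(order[half:], depth - 1, next_axis))

    return recurse(np.arange(len(points)), depth, 0)


# 64 grid points with intrinsic 2-D coordinates, split across 8 computers.
points = np.random.default_rng(0).random((64, 2))
parts = coordinate_bisection(points, depth=3)
```

The method needs nothing but the coordinates, which is both its strength for PDE grids and the reason it does not apply when vertices have no intrinsic position.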

11 GRAPH CONTRACTION ALGORITHMS

Any problem instance can be solved by repeatedly coalescing vertices of a graph until it equals a standard topology. Efficient, scalable.

- Advantages: finds optimal solutions for standard host topologies.
- Problems: suboptimal solutions for irregular networks, inapplicable to dynamic problems, no provision for capacitated problems.

Ben-Natan & Barak, Parallel contraction of grids for task assignment to processor networks (1992).
Karabeg, Process partitioning through graph compaction (1995).

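12 THE LAPLACIAN SPECTRUM

Call $C$ the matrix of edge weights of G. Then optimal bisections of G can be found by solving a one dimensional eigenvalue problem for $C$.

Minimize $z$,

$$
\begin{aligned}
z &= \frac{1}{2} \sum_i \sum_j (x_i - x_j)^2 c_{ij} \\
  &= \frac{1}{2} \sum_i \sum_j \left( x_i^2 - 2 x_i x_j + x_j^2 \right) c_{ij} \\
  &= \frac{1}{2} \Big( \sum_i x_i^2 c_{ii} - 2 \sum_i \sum_j x_i x_j c_{ij} + \sum_j x_j^2 c_{jj} \Big) \\
  &= \sum_i x_i^2 c_{ii} - \sum_{i \neq j} x_i x_j c_{ij} \\
  &= \vec{x}^T (D - C)\, \vec{x} = \vec{x}^T Q \vec{x}
\end{aligned}
$$

Here the double sums run over pairs $i \neq j$, $c_{ii}$ abbreviates the weighted degree $\sum_{j \neq i} c_{ij}$, $D = \mathrm{diag}(c_{ii})$, and $C$ holds the off-diagonal weights $c_{ij}$.

To avoid triviality ($\vec{x} = 0$) append a constraint $\vec{x}^T \vec{x} = 1$. Minimize the resulting Lagrangian

$$
\mathcal{L} = \vec{x}^T Q \vec{x} - \lambda \left( \vec{x}^T \vec{x} - 1 \right), \qquad
\frac{\partial \mathcal{L}}{\partial \vec{x}} = 0 \;\Rightarrow\; 2 Q \vec{x} - 2 \lambda \vec{x} = 0 \;\Rightarrow\; Q \vec{x} = \lambda \vec{x}
$$

In two dimensions minimize $\vec{x}^T Q \vec{x} + \vec{y}^T Q \vec{y}$, and similarly in higher dimensions.

Hall, An r-dimensional quadratic placement algorithm (1970).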
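The identity between the pairwise sum and the quadratic form, and the fact that unit-norm stationary points are eigenvectors of $Q$, can be checked numerically. The four-vertex weight matrix and the test vector below are arbitrary values chosen for illustration.

```python
import numpy as np

# Symmetric edge weights of a small connected graph (illustrative values).
C = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]], dtype=float)
D = np.diag(C.sum(axis=1))   # weighted degrees on the diagonal
Q = D - C                    # Laplacian matrix

# Any placement of the four vertices on a line.
x = np.array([0.3, -1.2, 0.5, 2.0])

# z as half the weighted sum of squared differences over all pairs ...
z_pairs = 0.5 * sum(C[i, j] * (x[i] - x[j]) ** 2
                    for i in range(4) for j in range(4))
# ... and as the quadratic form x^T Q x.
z_quad = x @ Q @ x

# Unit-norm stationary points of the Lagrangian are eigenvectors of Q,
# and the objective value at each one is its eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Q)
objective_at_eigvecs = np.array([v @ Q @ v for v in eigvecs.T])
```

The smallest eigenvalue is zero and belongs to the constant vector, which places every vertex at the same point. This is why, in practice, the first useful solution is the eigenvector of the second-smallest eigenvalue.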

13 RECURSIVE SPECTRAL BISECTION

Over 100 papers have been written in recent years about the RSB algorithm, most of which address its high cost. More recently it has been challenged on the grounds that equivalent quality solutions can be obtained by heuristics (Karypis).

- Recursively bisect using eigenvalues for optimal splitting at each step.
- Implementations by the Lanczos algorithm can be expensive and non-robust.
- Ultimate result can be suboptimal.
- Advantages: solid theoretical foundation; allows edge weights in G.
- Problems: no edge weights in H; no vertex weights in G or H; suboptimal solutions; expensive algorithm; unscalable; inapplicable to irregular networks or dynamic problems.

Barnard & Simon, A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes (1995).
Karypis & Kumar, Multilevel graph partitioning schemes (1995).
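A single bisection step can be sketched as follows. This is not the Lanczos-based implementation discussed in the literature: it assumes a small dense weight matrix, uses `numpy.linalg.eigh` to obtain the eigenvector of the second-smallest eigenvalue of $Q$ (commonly called the Fiedler vector), and splits at its median. The example graph and the function name are hypothetical.

```python
import numpy as np


def spectral_bisection(C):
    """Split a graph in two using the eigenvector of the second-smallest
    eigenvalue of the Laplacian Q = D - C. Returns a boolean mask."""
    Q = np.diag(C.sum(axis=1)) - C
    _, eigvecs = np.linalg.eigh(Q)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return fiedler >= np.median(fiedler)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
C = np.zeros((6, 6))
for a, b in edges:
    C[a, b] = C[b, a] = 1.0

part = spectral_bisection(C)
```

The split separates the two triangles and cuts only the joining edge. A recursive scheme would apply the same step to each half in turn, solving a fresh eigenvalue problem every time, which is where the cost accumulates.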

14 A VISUAL METAPHOR

In 1963 Tutte proposed an algorithm to draw a planar graph in the plane. The proposal was to solve a Laplace equation on the graph vertices while constraining the values of some vertices in order to avoid the trivial solution. This proposal can be extended to general graphs in R^n if the algorithm is based on a GSL iteration.

- Equalizing edge lengths of a planar graph gives a (nonunique) optimal placement in the plane (Tutte).
- Metaphor extends to nonplanar graphs under a least-squares minimization.
- Graph embedding can be accomplished by identifying local regions of the plane (or R^n) with vertices of H.
- Natural interpretation of capacities:
  - Process weights: areas of vertices in G.
  - Communication capacity: weight of edges in G.
  - Computer work capacities: areas of regions associated with vertices in H.

Tutte, How to draw a graph (1963).
Heirich, Dynamic mapping and load balancing on scalable interconnection networks (1996).
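A small sketch of the idea, under assumptions that are not in the slides: the guest graph is a 3 x 3 grid, its four corners are pinned to the corners of the unit square, and every other vertex is repeatedly moved to the weighted average of its neighbours. The helper names `grid_graph` and `gsl_layout` are hypothetical.

```python
def grid_graph(rows, cols):
    """Adjacency of a rows x cols grid as {vertex: {neighbour: weight}}."""
    adj = {(r, c): {} for r in range(rows) for c in range(cols)}
    for (r, c) in adj:
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)][nb] = 1.0
                adj[nb][(r, c)] = 1.0
    return adj


def gsl_layout(adj, fixed, sweeps=200):
    """Gauss-Seidel iteration on vertex positions in the plane.

    Pinned vertices never move; every other vertex is replaced by the
    weighted average of its neighbours' current positions."""
    pos = {v: list(fixed[v]) if v in fixed else [0.0, 0.0] for v in adj}
    for _ in range(sweeps):
        for v in adj:
            if v in fixed:
                continue
            total = sum(adj[v].values())
            pos[v] = [sum(w * pos[u][k] for u, w in adj[v].items()) / total
                      for k in range(2)]
    return pos


adj = grid_graph(3, 3)
# Pin the four corner vertices to the corners of the unit square.
fixed = {(0, 0): (0.0, 0.0), (0, 2): (1.0, 0.0),
         (2, 0): (0.0, 1.0), (2, 2): (1.0, 1.0)}
pos = gsl_layout(adj, fixed)
```

Without the pinned vertices every position would collapse to a single point, which is the trivial solution the constraints exist to prevent. Raising the weight of an edge pulls its two endpoints closer together.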

15 DIFFUSIVE MAPPING AND LOAD BALANCING

A robust, scalable and dynamic scheme for mapping and load balancing can be constructed from GSL iterations. In the first step solve the mapping problem by solving a Laplace equation on a graph of communicating processes or a connected data structure. In the second step solve the load balancing problem by solving a Laplace equation on the workloads of the computers.

1. Define the mapping space R^n. R^n represents the network of connected computers. The dimension n is chosen so that R^n can be partitioned according to the connectivity of the network. For example, R^2 provides a natural partitioning for a two dimensional mesh.

2. Partition R^n according to the network connectivity. In R^2 this can be done by embedding the network in a square region of the plane, bisecting each network link, and using the bisectors to define the regions associated with each computer. The area of each region represents the workload capacity of each computer. Equivalent procedures can be used for R^n.

3. Obtain the guest graph. In many problems this is the form of the problem instance, for example a connected data structure or a graph of communicating processes. In other problems it may be necessary to construct this graph from a data set, for example in ray tracing disconnected polygons.

4. Place a (small) set of "distinguished" vertices of the guest graph. This requires an oracle, or knowledge specific to the problem instance. For example, a small number of processes are typically associated with I/O. These nodes will constrain the algorithm and prevent it from finding the trivial solution. The placement of these nodes is critical.

16

5. Perform a GSL iteration on the vertices of the guest graph. The vertices which were placed in the previous step do not move. The work requirement of guest vertices can be represented by the areas they occupy in R^n. This requires a trivial modification to the basic GSL iteration. The communication requirements of guest channels can be represented by nonuniform weights in Q. The GSL iteration will take these nonuniform weights into account in the desired way, by grouping more closely those vertices which are connected by higher weighted edges.

6. Perform a GSL iteration on the workloads of the computers. The first GSL iteration will approach a fixed point and then converge very slowly to the ultimate configuration. At this point the vertices of G will be properly ordered and further iterates are only refining the solution. Shortcut the final iterates by directly fluxing vertices of G across boundaries between the computers. A vertex of G is a candidate for fluxing only if it is connected by an edge to a vertex on a different computer.

The resulting algorithm is scalable, fault tolerant, delay insensitive, fast, and dynamic. It finds a locally minimal solution to the quadratic minimization problem just as RSB does. It naturally incorporates the capacities of computers, processes, and communication requirements. It does not address limited communication capacities of network links, nor network topologies which cannot be embedded in R^n for some n. The constraints result from proper initial placement of guest vertices, and these placements are critical to the quality of the resulting solution.

Application of these techniques to the problem of Monte Carlo ray tracing on parallel computers is currently under way.
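The bookkeeping behind steps 2 and 6 can be sketched briefly. The code below assumes a p x p mesh whose computers own equal square regions of the unit square, guest vertices of unit weight, and positions already produced by a layout step. The function names and the eight-vertex example are hypothetical, and the load-driven transfer of vertices itself is left out.

```python
from collections import Counter


def owner(position, p):
    """Computer (row, col) of a p x p mesh whose square region of the
    unit square contains the given position."""
    x, y = position
    col = min(int(x * p), p - 1)
    row = min(int(y * p), p - 1)
    return row, col


def workloads(assignment):
    """Number of guest vertices held by each computer (unit weights)."""
    return Counter(assignment.values())


def flux_candidates(adj, assignment):
    """Guest vertices with at least one neighbour on another computer."""
    return {v for v in adj
            if any(assignment[u] != assignment[v] for u in adj[v])}


# A chain of 8 guest vertices laid out left to right across the square.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
positions = {i: ((i + 0.5) / 8, 0.5) for i in range(8)}

assignment = {v: owner(positions[v], p=2) for v in adj}
loads = workloads(assignment)
candidates = flux_candidates(adj, assignment)
```

Only the two vertices on either side of the region boundary are candidates for fluxing, so moving work between computers never separates a vertex from all of its neighbours at once.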

17 APPLICATION TO PHOTOREALISTIC ANIMATION

(Figure and model courtesy of Greg Ward, Lawrence Berkeley Laboratories)

This model of an imaginary office comprises 18,579 polygons. The polygons are assembled into a graph and dynamically mapped across processors of the 512 node IBM SP2 at the Cornell Theory Center. A Euclidean metric in R^3 is used to cluster the polygons in a 2 dimensional mapping space. Dynamic requirements in animation result from the movement of objects, including light sources.


More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Graph Visualization U. Dogrusoz and G. Sander Tom Sawyer Software, 804 Hearst Avenue, Berkeley, CA 94710, USA info@tomsawyer.com Graph drawing, or layout, is the positioning of nodes (objects) and the

More information

A Novel Switch Mechanism for Load Balancing in Public Cloud

A Novel Switch Mechanism for Load Balancing in Public Cloud International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) A Novel Switch Mechanism for Load Balancing in Public Cloud Kalathoti Rambabu 1, M. Chandra Sekhar 2 1 M. Tech (CSE), MVR College

More information

Big Graph Processing: Some Background

Big Graph Processing: Some Background Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Job Allocation Scheme. (a) FCFS P/M 12/06 07/16 12/32 01/10 FCFS/FF. Jobs 0, 1 2 3, 4 0,1,4,5 14/28 2 07/16 3 11/20. (b)

Job Allocation Scheme. (a) FCFS P/M 12/06 07/16 12/32 01/10 FCFS/FF. Jobs 0, 1 2 3, 4 0,1,4,5 14/28 2 07/16 3 11/20. (b) Extended Abstract - Submitted to SC'99 Job Scheduling in the presence of Multiple Resource Requirements William Leinberger, George Karypis, Vipin Kumar Department of Computer Science and Engineering, University

More information

Dynamic Load Balancing of SAMR Applications on Distributed Systems y

Dynamic Load Balancing of SAMR Applications on Distributed Systems y Dynamic Load Balancing of SAMR Applications on Distributed Systems y Zhiling Lan, Valerie E. Taylor Department of Electrical and Computer Engineering Northwestern University, Evanston, IL 60208 fzlan,

More information

Technology White Paper Capacity Constrained Smart Grid Design

Technology White Paper Capacity Constrained Smart Grid Design Capacity Constrained Smart Grid Design Smart Devices Smart Networks Smart Planning EDX Wireless Tel: +1-541-345-0019 I Fax: +1-541-345-8145 I info@edx.com I www.edx.com Mark Chapman and Greg Leon EDX Wireless

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 slides7-1 Load Balancing and Termination Detection slides7-2 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination

More information

Mean Value Coordinates

Mean Value Coordinates Mean Value Coordinates Michael S. Floater Abstract: We derive a generalization of barycentric coordinates which allows a vertex in a planar triangulation to be expressed as a convex combination of its

More information

Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations

Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations Richard Procassini, Matthew O'Brien and Janine Taylor Lawrence Livermore National Laboratory Joint Russian-American Five-Laboratory

More information

A New Nature-inspired Algorithm for Load Balancing

A New Nature-inspired Algorithm for Load Balancing A New Nature-inspired Algorithm for Load Balancing Xiang Feng East China University of Science and Technology Shanghai, China 200237 Email: xfeng{@ecusteducn, @cshkuhk} Francis CM Lau The University of

More information

Principles and characteristics of distributed systems and environments

Principles and characteristics of distributed systems and environments Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single

More information

Notation: - active node - inactive node

Notation: - active node - inactive node Dynamic Load Balancing Algorithms for Sequence Mining Valerie Guralnik, George Karypis fguralnik, karypisg@cs.umn.edu Department of Computer Science and Engineering/Army HPC Research Center University

More information

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

More information

An Ecient Dynamic Load Balancing using the Dimension Exchange. Ju-wook Jang. of balancing load among processors, most of the realworld

An Ecient Dynamic Load Balancing using the Dimension Exchange. Ju-wook Jang. of balancing load among processors, most of the realworld An Ecient Dynamic Load Balancing using the Dimension Exchange Method for Balancing of Quantized Loads on Hypercube Multiprocessors * Hwakyung Rim Dept. of Computer Science Seoul Korea 11-74 ackyung@arqlab1.sogang.ac.kr

More information

Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths

Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths Satyajeet S. Ahuja, Srinivasan Ramasubramanian, and Marwan Krunz Department of ECE, University of Arizona, Tucson,

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information

A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS

A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS SIAM J. SCI. COMPUT. Vol. 20, No., pp. 359 392 c 998 Society for Industrial and Applied Mathematics A FAST AND HIGH QUALITY MULTILEVEL SCHEME FOR PARTITIONING IRREGULAR GRAPHS GEORGE KARYPIS AND VIPIN

More information

Interconnection Network Design

Interconnection Network Design Interconnection Network Design Vida Vukašinović 1 Introduction Parallel computer networks are interesting topic, but they are also difficult to understand in an overall sense. The topological structure

More information

A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs

A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs A RDT-Based Interconnection Network for Scalable Network-on-Chip Designs ang u, Mei ang, ulu ang, and ingtao Jiang Dept. of Computer Science Nankai University Tianjing, 300071, China yuyang_79@yahoo.com.cn,

More information

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

More information

Static Load Balancing of Parallel PDE Solver for Distributed Computing Environment

Static Load Balancing of Parallel PDE Solver for Distributed Computing Environment Static Load Balancing of Parallel PDE Solver for Distributed Computing Environment Shuichi Ichikawa and Shinji Yamashita Department of Knowledge-based Information Engineering, Toyohashi University of Technology

More information

Feature Point Selection using Structural Graph Matching for MLS based Image Registration

Feature Point Selection using Structural Graph Matching for MLS based Image Registration Feature Point Selection using Structural Graph Matching for MLS based Image Registration Hema P Menon Department of CSE Amrita Vishwa Vidyapeetham Coimbatore Tamil Nadu - 641 112, India K A Narayanankutty

More information

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption

More information

solution flow update flow

solution flow update flow A High Performance wo Dimensional Scalable Parallel Algorithm for Solving Sparse riangular Systems Mahesh V. Joshi y Anshul Gupta z George Karypis y Vipin Kumar y Abstract Solving a system of equations

More information

Mesh Partitioning and Load Balancing

Mesh Partitioning and Load Balancing and Load Balancing Contents: Introduction / Motivation Goals of Load Balancing Structures Tools Slide Flow Chart of a Parallel (Dynamic) Application Partitioning of the initial mesh Computation Iteration

More information

Evaluating partitioning of big graphs

Evaluating partitioning of big graphs Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed

More information

Sparse Matrix Decomposition with Optimal Load Balancing

Sparse Matrix Decomposition with Optimal Load Balancing Sparse Matrix Decomposition with Optimal Load Balancing Ali Pınar and Cevdet Aykanat Computer Engineering Department, Bilkent University TR06533 Bilkent, Ankara, Turkey apinar/aykanat @cs.bilkent.edu.tr

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

The Liquid Model Load Balancing Method

The Liquid Model Load Balancing Method P APER ACCEPTED FOR THE JOURNAL OF P ARALLEL ALGORITHMS AND APPLICATIONS, SPECIAL ISSUE ON ALGORITHMS FOR ENHANCED MESH ARCHITECTURES The Liquid Model Load Balancing Method Dominik Henrich Institute for

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

Oracle8i Spatial: Experiences with Extensible Databases

Oracle8i Spatial: Experiences with Extensible Databases Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH-03062 {sravada,jsharma}@us.oracle.com 1 Introduction

More information

On computer algebra-aided stability analysis of dierence schemes generated by means of Gr obner bases

On computer algebra-aided stability analysis of dierence schemes generated by means of Gr obner bases On computer algebra-aided stability analysis of dierence schemes generated by means of Gr obner bases Vladimir Gerdt 1 Yuri Blinkov 2 1 Laboratory of Information Technologies Joint Institute for Nuclear

More information

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

Load Distribution in Large Scale Network Monitoring Infrastructures

Load Distribution in Large Scale Network Monitoring Infrastructures Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu

More information

Beyond the Stars: Revisiting Virtual Cluster Embeddings

Beyond the Stars: Revisiting Virtual Cluster Embeddings Beyond the Stars: Revisiting Virtual Cluster Embeddings Matthias Rost Technische Universität Berlin September 7th, 2015, Télécom-ParisTech Joint work with Carlo Fuerst, Stefan Schmid Published in ACM SIGCOMM

More information

A New Unstructured Variable-Resolution Finite Element Ice Sheet Stress-Velocity Solver within the MPAS/Trilinos FELIX Dycore of PISCEES

A New Unstructured Variable-Resolution Finite Element Ice Sheet Stress-Velocity Solver within the MPAS/Trilinos FELIX Dycore of PISCEES A New Unstructured Variable-Resolution Finite Element Ice Sheet Stress-Velocity Solver within the MPAS/Trilinos FELIX Dycore of PISCEES Irina Kalashnikova, Andy G. Salinger, Ray S. Tuminaro Numerical Analysis

More information

Master of Science in Computer Science

Master of Science in Computer Science Master of Science in Computer Science Background/Rationale The MSCS program aims to provide both breadth and depth of knowledge in the concepts and techniques related to the theory, design, implementation,

More information

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

NETZCOPE - a tool to analyze and display complex R&D collaboration networks The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.

More information

Parallel Architectures and Interconnection

Parallel Architectures and Interconnection Chapter 2 Networks Parallel Architectures and Interconnection The interconnection network is the heart of parallel architecture. Feng [1] - Chuan-Lin and Tse-Yun 2.1 Introduction You cannot really design

More information

Solving a 2D Knapsack Problem Using a Hybrid Data-Parallel/Control Style of Computing

Solving a 2D Knapsack Problem Using a Hybrid Data-Parallel/Control Style of Computing Solving a D Knapsack Problem Using a Hybrid Data-Parallel/Control Style of Computing Darrell R. Ulm Johnnie W. Baker Michael C. Scherger Department of Computer Science Department of Computer Science Department

More information

Why the Network Matters

Why the Network Matters Week 2, Lecture 2 Copyright 2009 by W. Feng. Based on material from Matthew Sottile. So Far Overview of Multicore Systems Why Memory Matters Memory Architectures Emerging Chip Multiprocessors (CMP) Increasing

More information

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG

FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite

More information

MATHEMATICS (MATH) 3. Provides experiences that enable graduates to find employment in sciencerelated

MATHEMATICS (MATH) 3. Provides experiences that enable graduates to find employment in sciencerelated 194 / Department of Natural Sciences and Mathematics MATHEMATICS (MATH) The Mathematics Program: 1. Provides challenging experiences in Mathematics, Physics, and Physical Science, which prepare graduates

More information

Efficient Load Balancing by Adaptive Bypasses for the Migration on the Internet

Efficient Load Balancing by Adaptive Bypasses for the Migration on the Internet Efficient Load Balancing by Adaptive Bypasses for the Migration on the Internet Yukio Hayashi yhayashi@jaist.ac.jp Japan Advanced Institute of Science and Technology ICCS 03 Workshop on Grid Computing,

More information

A Review of Dynamic Load Balancing in Distributed Virtual En quantities

A Review of Dynamic Load Balancing in Distributed Virtual En quantities 6 Dynamic Load Balancing in Distributed Virtual Environments using Heat Diffusion YUNHUA DENG and RYNSON W. H. LAU, City University of Hong Kong Distributed virtual environments (DVEs) are attracting a

More information