Load balancing in a heterogeneous computer system by self-organizing Kohonen network



Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69-74
© 2006 NCC Publisher

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Mikhail S. Tarkov, Yakov S. Bezrukov

Abstract. An effective algorithm is proposed for balancing the computation load in a heterogeneous multicomputer system by a self-organizing Kohonen neural network. The algorithm takes the different performances of the system processors into consideration.

1. Introduction

In the general case, a computer cluster is a set of computer units (CUs) connected by a network. Each unit has a processor, a memory, and possibly a hard disk. The CUs can differ in performance, communication capabilities, and operating systems. The structure of a computer system (CS) can be arbitrary and depends on several factors, such as the available hardware, the goal for which the CS is created, and financial constraints.

Unlike CSs with shared memory, computer clusters have difficulties with inter-computer communications because of a relatively low network throughput. To minimize the idle time of all CUs and the communications between processes, it is necessary to load all CUs with computations optimally. In the general case, this problem reduces to the optimal mapping of a program graph onto a graph of the computer system [1, 2]. Because of the complexity of the problem (it is NP-complete), various heuristics are used to search for an optimal mapping. Currently popular are methods based on analogies to physics and biology, for example, simulated annealing, genetic algorithms, and neural networks [3]. A self-organizing Kohonen network [3, 4] is of particular interest for solving the mapping problem because its operation is highly parallelizable.

2. The problem of computation load balancing

The goal of the optimal mapping of parallel program processes (branches) onto processors is to minimize the program execution time.
This is equivalent to minimizing the idle time of each processor and, simultaneously, minimizing the communications between processes placed onto different processors [1, 2]. Let $G_p(V_p, E_p)$ be the graph of a parallel program, where:

- $V_p$ is the set of program branches, $|V_p| = n$;

- $E_p$ is the set of logical (virtual) channels implementing communications between branches;
- $\tau_p$ is the weight of a program branch, equal to the number of operations in branch $p$;
- $v_{pq}$ is the weight of an edge $(p, q) \in E_p$, equal to the number of information units transferred between branches $p$ and $q$.

Let $G_s(V_s, E_s)$ be the graph of a computer system, where:

- $V_s$ is the set of elementary computers (ECs), $|V_s| = m \le n$;
- $E_s$ is the set of connections between ECs;
- $\vartheta_i$ is the performance of the $i$-th computer;
- $\delta_{ij}$ is the time for transferring one information unit between the neighboring computers $i$ and $j$.

Let $f_m : G_p \to G_s$ be a mapping of the program graph $G_p(V_p, E_p)$ onto the graph $G_s(V_s, E_s)$ of the computer system. The distance (the number of edges of a shortest path) between the nodes $i = f_m(p)$ and $j = f_m(q)$ in the graph $G_s$ is denoted by $d_{pq}$. The quality of the mapping $f_m$ is evaluated by the objective function

$$\Phi(f_m) = \alpha \Phi_C(f_m) + (1 - \alpha)\Phi_T(f_m), \qquad (1)$$

where $\Phi_C(f_m)$ evaluates the balance of the computation load, $\Phi_T(f_m)$ evaluates the total time of inter-computer communications, and $\alpha \in (0, 1)$ is the relative weight of $\Phi_C(f_m)$ in the objective function.

For the mapping $f_m$, the total computation time of the $i$-th computer is

$$t_i = \frac{1}{\vartheta_i} \sum_{f_m(p) = i} \tau_p.$$

Then

$$\Phi_C(f_m) = \sum_{i=1}^{m} (t_i - t_{\min})^2,$$

where $t_{\min}$ is the running time of the system computers under ideal load balancing. In [4], the value

$$t_{\min}^{(1)} = \frac{1}{m} \sum_{i=1}^{m} t_i \qquad (2)$$

has been proposed as an estimate of $t_{\min}$. On the other hand, the minimal (ideal) execution time of a parallel program is equal to the value

$$t_{\min}^{(2)} = \frac{\sum_{i=1}^{m} \sum_{f_m(p)=i} \tau_p}{\sum_{i=1}^{m} \vartheta_i} = \frac{\sum_{p=1}^{n} \tau_p}{\sum_{i=1}^{m} \vartheta_i}, \qquad (3)$$

which can be attained on a computer whose performance equals the total performance of the CS, with no communication overhead. The communication overhead is estimated by the function

$$\Phi_T(f_m) = \sum_{(p,q)} v_{pq} \sum_{k=1}^{d_{pq}} \delta_k,$$

where the outer summation runs over all pairs $(p, q)$ of communicating branches of the parallel program, the inner summation runs over all edges of the shortest path between the nodes $i = f_m(p)$ and $j = f_m(q)$, and $\delta_k$ is the weight of the $k$-th edge of this path, $k = 1, \ldots, d_{pq}$.

3. The self-organizing Kohonen network

A self-organizing Kohonen network consists of a single layer of neurons. The number of inputs of each neuron equals the dimension $L$ of the input data space. The number of neurons is determined by the number of classes in the set of objects processed by the network. The network is defined by the weight matrix $W$, where each neuron corresponds to a row $W_p$, and $W_{pl}$ is the weight of the $l$-th input of the $p$-th neuron, $p = 1, \ldots, n$, $l = 1, \ldots, L$. Training starts from a random initialization of the weights in $W$ and proceeds by the following iterative algorithm:

1. At step $t = 1, 2, \ldots$, feed a vector $x_k$, $k = 1, \ldots, K$, from the training set to the network input. Calculate the distance $d(x_k, W_p) = \|x_k - W_p\|$ between the input vector $x_k$ and the weight vector $W_p$ of each neuron, $p = 1, \ldots, n$.
2. Choose the winner neuron $W_{win}$, whose weight vector is nearest to the input vector $x_k$. Modify the weights of the winner and of the neurons in its neighborhood by the formula $W_p^{t+1} = W_p^t + \eta(x_k - W_p^t)$, where $\eta \in (0, 1)$ is a training coefficient.
3. Go to the next iteration ($t \leftarrow t + 1$).

As the training proceeds, the neighborhood radius of the winner and the training step are reduced. As a result of training, the neuron weight vectors are shifted toward the cluster centers of the training set [3, 4].
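The training algorithm above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the chain (1-D) neighborhood and the linear decay schedules for the training coefficient and the radius are assumptions, since the paper does not fix them.

```python
import random

# Minimal sketch of Kohonen training. Chain neighborhood and linear
# decay of eta and the radius are illustrative assumptions.
def train_som(data, n_neurons, dim, epochs=50, eta0=0.5, r0=1, seed=0):
    rng = random.Random(seed)
    # random initialization of the weight matrix W (one row per neuron)
    W = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(epochs):
        eta = eta0 * (1.0 - t / epochs)          # shrinking training coefficient
        radius = round(r0 * (1.0 - t / epochs))  # shrinking neighborhood radius
        for x in data:
            # Steps 1-2: find the winner (nearest weight vector;
            # squared Euclidean distance gives the same winner)
            win = min(range(n_neurons),
                      key=lambda p: sum((xi - wi) ** 2
                                        for xi, wi in zip(x, W[p])))
            # Step 2: move the winner and its chain neighbors toward x
            for p in range(max(0, win - radius),
                           min(n_neurons, win + radius + 1)):
                W[p] = [wi + eta * (xi - wi) for xi, wi in zip(x, W[p])]
    return W

# Two 1-D clusters; each update is a convex move toward a data point,
# so the trained weights stay inside [0, 1].
W = train_som([[0.0], [0.1], [0.9], [1.0]], n_neurons=2, dim=1)
```

Because each update replaces a weight by a convex combination of itself and a training point, the weight vectors remain inside the convex hull of the initialization and the data.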
4. Load balancing by the Kohonen network

The Kohonen network neurons correspond to the branches of the parallel program. The Kohonen network is embedded into a topological space in which the computers are interpreted as regions. The neighboring regions correspond to the

adjacent nodes of the CS graph, and the processes are interpreted as points in the space. A point neighborhood is introduced in accordance with the graph $G_p$ of the parallel program: a process lies in the neighborhood of radius $r$ of another process if the distance between them in the graph $G_p$ is less than $r$. As a result of training, the processes, described by the neuron vectors, are mapped onto the regions corresponding to the system computers.

Let us consider the computation load balancing by the Kohonen network in a heterogeneous CS. The neurons are put in correspondence with the branches of the parallel program and are described by the vectors $W_s$, $s \in V_p$, in the topological space.

The sequential mapping algorithm:

1. Initially, let the branches of the parallel program be arbitrarily mapped onto the computers of the system. Compute the initial value $\Phi_t$ of the objective function for this distribution.
2. Choose a computer at random and calculate the new values of the objective function obtained by mapping each branch, together with its neighborhood, onto this computer. If some of the new values $\Phi_i$, $i = 1, \ldots, n$, are less than $\Phi_t$, then choose the minimal value $\Phi_t = \min_i \Phi_i$ and save the corresponding mapping.
3. Repeat Step 2 while it is possible to diminish the objective function.

A parallel version of the mapping algorithm is as follows: partition the set of computers into subsets, find a new minimal value of the objective function for each subset independently, and then combine the results over all subsets.

5. Experiments

The experiments were carried out for the optimal mapping of typical program graphs onto typical computer system graphs. The following mappings were investigated: a line and a mesh onto a line; a ring and a torus onto a ring; and a line, a ring, a mesh, and a torus onto a complete graph.
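The sequential mapping algorithm of Section 4 can be sketched as follows. This is a simplified illustration, not the paper's code: it moves single branches rather than branches with their neighborhoods, uses only the load-balance term (as with $\alpha = 1$ in the experiments) with $t_{\min}$ taken from estimate (3), and the toy instance is made up.

```python
import random

def phi(mapping, tau, theta):
    """Load-balance term Phi_C with t_min taken from estimate (3)."""
    t = [0.0] * len(theta)
    for p, i in enumerate(mapping):
        t[i] += tau[p] / theta[i]      # t_i = (1/theta_i) * sum of tau_p
    t_min = sum(tau) / sum(theta)      # estimate (3): total work / total performance
    return sum((ti - t_min) ** 2 for ti in t)

def sequential_mapping(tau, theta, seed=0):
    rng = random.Random(seed)
    n, m = len(tau), len(theta)
    mapping = [rng.randrange(m) for _ in range(n)]  # Step 1: arbitrary initial mapping
    best = phi(mapping, tau, theta)
    improved = True
    while improved:                                 # Step 3: repeat while Phi decreases
        improved = False
        for i in rng.sample(range(m), m):           # Step 2: computers in random order
            for p in range(n):                      # try remapping each branch onto i
                old = mapping[p]
                if old == i:
                    continue
                mapping[p] = i
                cost = phi(mapping, tau, theta)
                if cost < best:
                    best, improved = cost, True     # keep the improving move
                else:
                    mapping[p] = old                # undo
    return mapping, best

# Toy heterogeneous instance: 4 branches, 3 computers.
tau = [10, 20, 10, 20]       # branch weights tau_p
theta = [1.0, 2.0, 1.0]      # computer performances theta_i
mapping, cost = sequential_mapping(tau, theta)
```

The loop terminates because every accepted move strictly decreases the objective over a finite state space; at termination no single-branch move improves the mapping.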
As the mapping criterion, the objective function

$$\Phi = \sum_{i=1}^{m} (t_i - t_{\min})^2$$

has been used, where $t_i$ is the real processor load and $t_{\min}$ is the processor load under a balanced mapping of processes onto processors (here we set

$\alpha = 1$ in expression (1)). The values $t_{\min}^{(1)}$ from (2) and the $t_{\min}^{(2)}$ proposed here in (3) are used as $t_{\min}$. A performance analysis of modern computers shows that their relative performances can be approximately estimated as random integers uniformly distributed on the segment $[1, 40]$. The relative weights of the program branches in the experiments are distributed on the segment $[1, 20]$.

Figure 1. Comparison of mapping versions with (T1 and T2) and without (M) accounting for the heterogeneity of the computer system

Figure 2. Comparison of mapping versions with the T1 and T2 estimates of the minimal processing time

Figures 1 and 2 show examples of the mean values of the function $\Phi$ in the experiments for different versions of mapping a ring onto a complete graph with $m = 10$ nodes. The curves T1 are calculated for Wyler's estimate (2), and the curves T2 are calculated for estimate (3). Figure 1 shows that taking the CS heterogeneity into account essentially diminishes the non-uniformity of the computation load (approximately by a factor of 10). The experiments also show that estimate (3) provides a better mapping than estimate (2) of K. Wyler [4] (see Figure 2), and that the mapping performance depends only slightly on the choice of the program graph and the CS graph.

6. Conclusion

A new algorithm is proposed for mapping a parallel program graph onto the graph of a heterogeneous computer system. The algorithm is based on the self-organizing Kohonen neural network and can be easily parallelized. The experiments show that the developed algorithm implements an effective mapping of typical program structures (a line, a ring, a mesh, and a torus) onto different structures (a line, a ring, a mesh, a torus, and a complete graph) of a heterogeneous computer system. The parallel version of the mapping algorithm can be used for dynamic load balancing in heterogeneous computer systems.
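The two estimates of $t_{\min}$ compared in the experiments can be computed directly from their definitions. The instance below is a made-up illustration, not the paper's data.

```python
# Illustrative comparison of the two t_min estimates from Section 2.
# Branch weights, performances, and the mapping are toy values.
tau = [12, 7, 9, 20]           # branch weights tau_p
theta = [4.0, 1.0]             # computer performances theta_i
mapping = [0, 0, 1, 1]         # f_m: branches 0,1 -> computer 0; 2,3 -> computer 1

m = len(theta)
t = [0.0] * m
for p, i in enumerate(mapping):
    t[i] += tau[p] / theta[i]  # t_i = (1/theta_i) * sum_{f_m(p)=i} tau_p

t_min_1 = sum(t) / m           # estimate (2): mean of the actual loads
t_min_2 = sum(tau) / sum(theta)  # estimate (3): ideal time on the pooled CS

print(t)        # -> [4.75, 29.0]
print(t_min_1)  # -> 16.875
print(t_min_2)  # -> 9.6
```

Note that estimate (2) depends on the current mapping, whereas estimate (3) is a fixed target determined only by the problem instance, which is one way to read the advantage reported for T2 in Figure 2.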

References

[1] Bokhari S.H. On the mapping problem // IEEE Trans. Comp. 1981. Vol. C-30, No. 3. P. 207-214.
[2] Tarkov M.S. Mapping parallel program structures onto structures of distributed computer systems // Optoelectronics, Instrumentation and Data Processing. 2003. Vol. 39, No. 3. P. 72-83.
[3] Osowski S. Neural Networks for Information Processing. Moscow: Finansi i Statistika, 2002 (in Russian).
[4] Wyler K. Self-organizing mapping in a multiprocessor system // IAM-93-001. 1993.