Cellular Computing on a Linux Cluster



Similar documents
Dynamic load balancing of parallel cellular automata

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Distributed Systems. REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1

A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Client/Server Computing Distributed Processing, Client/Server, and Clusters

CHAPTER 1 INTRODUCTION

Advances in Smart Systems Research : ISSN : Vol. 3. No. 3 : pp.

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup

Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin.

High Performance Computing

High Performance Computing. Course Notes HPC Fundamentals

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

QosCosGrid Grid Technologies and Complex System Modelling

Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

Middleware and Distributed Systems. Introduction. Dr. Martin v. Löwis

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

SERVER CLUSTERING TECHNOLOGY & CONCEPT

Parallel Algorithm for Dense Matrix Multiplication

DECENTRALIZED LOAD BALANCING IN HETEROGENEOUS SYSTEMS USING DIFFUSION APPROACH

Spring 2011 Prof. Hyesoon Kim

Distributed communication-aware load balancing with TreeMatch in Charm++

Principles and characteristics of distributed systems and environments

COS 318: Operating Systems. Virtual Machine Monitors

Parallel Simplification of Large Meshes on PC Clusters

How To Understand The Concept Of A Distributed System

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE

The Methodology of Application Development for Hybrid Architectures

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Symmetric Multiprocessing

A Flexible Cluster Infrastructure for Systems Research and Software Development

Performance Monitoring of Parallel Scientific Applications

A Comparative Performance Analysis of Load Balancing Algorithms in Distributed System using Qualitative Parameters

Control 2004, University of Bath, UK, September 2004

A Review of Customized Dynamic Load Balancing for a Network of Workstations

MOSIX: High performance Linux farm

Distributed RAID Architectures for Cluster I/O Computing. Kai Hwang

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Operating System for the K computer

Virtual machine interface. Operating system. Physical machine interface

DISTRIBUTED AND PARALLELL DATABASE

Introduction to Cloud Computing

GPUs for Scientific Computing

High Performance Computing in CST STUDIO SUITE

Chapter 2 Parallel Architecture, Software And Performance

OpenMosix Presented by Dr. Moshe Bar and MAASK [01]

Client/Server and Distributed Computing

CS550. Distributed Operating Systems (Advanced Operating Systems) Instructor: Xian-He Sun

Overlapping Data Transfer With Application Execution on Clusters

MULTI-SPERT PHILIPP F ARBER AND KRSTE ASANOVI C. International Computer Science Institute,

Multilevel Load Balancing in NUMA Computers

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

Chapter 18: Database System Architectures. Centralized Systems

- An Essential Building Block for Stable and Reliable Compute Clusters

CMS Tier-3 cluster at NISER. Dr. Tania Moulik

Parallel Programming

Recommended hardware system configurations for ANSYS users

LinuxWorld Conference & Expo Server Farms and XML Web Services

Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware

Load Balancing using Potential Functions for Hierarchical Topologies

Cluster, Grid, Cloud Concepts

Dynamic Load Balancing in a Network of Workstations

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

Optimizing Shared Resource Contention in HPC Clusters

Storage Systems Autumn Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Introduction to DISC and Hadoop

Parallel Programming Survey

Lecture 2 Parallel Programming Platforms

Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC

Parallel Computing with MATLAB

Performance of the JMA NWP models on the PC cluster TSUBAME.

Improved LS-DYNA Performance on Sun Servers

Workshare Process of Thread Programming and MPI Model on Multicore Architecture

A Comparison of Distributed Systems: ChorusOS and Amoeba

Introduction to High Performance Cluster Computing. Cluster Training for UCL Part 1

Distributed Systems LEEC (2005/06 2º Sem.)

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

System requirements for MuseumPlus and emuseumplus

Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing

Online Remote Data Backup for iscsi-based Storage Systems

Transcription:

Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture

Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results 4. Conclusion 2

1. Cellular Computing Extension of cellular automata: n-dimensional regular grid of connected cells Each cell: State Algorithm Major types: Synchronous vs. asynchronous Uniform vs. non-uniform cellular computing = simplicity + vast parallelism + locality (Sipper) 3

Sipper s Scheme Cellular computing Distributed computing Shared-memory computing General-purpose architectures Complex Serial Parallel Global Local Partially connected neural networks Fully connected neural networks Simple Finite-state machines Moshe Sipper: The Emergence of Cellular Computing. in: IEEE Computer, July 1999, pp. 18-26 4

Benefits and Examples Benefits: Scalability Robustness Simple approach to parallel programming Examples: Image processing Pseudorandom numbers Optimizations 5

Cellular Computers Real cellular computer: Direct hardware implementation Highly homogenous chip structure Algorithm fixed or loadable Virtual cellular computer: Simulation of a cellular structure Runs on single processor or multiprocessor Algorithm loadable 6

2. The Experiment Virtual cellular computer, implemented on a workstation cluster Parts: Distributed implementation of a virtual cellular computer Benchmark application: Conway s Game of Life Questions: Performance benefits from coarse-grain parallelism Cellular computing as approach to parallel programming for non-cellular distributed architectures 7

Conway s Game of Life: Rules A living cell with 0 or 1 neighbours dies from isolation. A living cell with 4 or more neighbours dies from overcrowding. A dead cell with exactly 3 neighbours becomes alive. All other cells remain unchanged. 8

Conway s Game of Life: Sample Patterns 9

Distributed Implementation Communicating by message passing Cutting the cellular field into equal parts Correcting border columns by communicating results of overlapping parts 1 master node - n slave nodes 10

Cutting the Field One cell The cell's neighbourhood Border columns "Next" columns 11

Technical Detail 11 node PCs (PIII/500, 512Mb) Linux OS Gigabit Ethernet network, optical media (star topology, fully switched) MPI middleware (lam 6.2b) 12

3. Experimental Results 10 nodes 3 nodes stand-alone 10 nodes 3 nodes stand-alone 15 10 5 400 300 200 100 a) field size: 100 x 10 cells c) field size: 1.000 x 100 cells 10 nodes 3 nodes stand-alone 10 nodes 3 nodes stand-alone 40 30 20 10 40.000 30.000 20.000 10.000 b) field size: 100 x 100 cells d) field size: 10.000 x 1.000 cells Runtimes in seconds, for 10.000 iterations 13

Relative Performance relative speed (cells per second per node) 250 200 150 100 50 0 stand-alone 3 nodes 10 nodes 10 3 10 4 10 5 10 6 10 7 total number of cells 14

4. Conclusion Distributed implementation works For large fields speedup approachs ideal values Overhead comes from OS functions rather then node communication: communication amount is proportional to number of rows but: overhead per row proves to decrease when number of rows is increasing 15

Further Work Generalize from 2-dimensional to n-dimensional Universal application interface Benchmarking more applications Comparing to non-cellular distributed solutions of same problems 16