Distributed Optimization of Fiber Optic Network Layout using MATLAB. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl



Similar documents
Observations on Data Distribution and Scalability of Parallel and Distributed Image Processing Applications

Cellular Computing on a Linux Cluster

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.

Client/Server Computing Distributed Processing, Client/Server, and Clusters

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Simplification of Large Meshes on PC Clusters

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

Video Encryption Exploiting Non-Standard 3D Data Arrangements. Stefan A. Kramatsch, Herbert Stögner, and Andreas Uhl

Scientific Computing Programming with Parallel Objects

Chapter 18: Database System Architectures. Centralized Systems

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Chapter 15: Distributed Structures. Topology

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

A Distributed Render Farm System for Animation Production

Big Graph Processing: Some Background

COMP5426 Parallel and Distributed Computing. Distributed Systems: Client/Server and Clusters

Universidad Simón Bolívar

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

System requirements for MuseumPlus and emuseumplus

Building an Inexpensive Parallel Computer

A Theory of the Spatial Computational Domain

Bringing Big Data Modelling into the Hands of Domain Experts

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)

Technical Research Paper. Performance tests with the Microsoft Internet Security and Acceleration (ISA) Server

Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

Optimization of Cluster Web Server Scheduling from Site Access Statistics

Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds

A Peer-to-peer Extension of Network-Enabled Server Systems

Control 2004, University of Bath, UK, September 2004

2015 The MathWorks, Inc. 1

Load balancing model for Cloud Data Center ABSTRACT:

Advanced Computational Software

Windows Compute Cluster Server Miron Krokhmal CTO

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

AdminToys Suite. Installation & Setup Guide

wu.cloud: Insights Gained from Operating a Private Cloud System

Big Data and Cloud Computing for GHRSST

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Building a Private Cloud with Eucalyptus

High Performance Computing for Operation Research

On-Demand Supercomputing Multiplies the Possibilities

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

Parallel Algorithm for Dense Matrix Multiplication

CGMgraph/CGMlib: Implementing and Testing CGM Graph Algorithms on PC Clusters

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

LOAD BALANCING FOR MULTIPLE PARALLEL JOBS

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

Denis Caromel, CEO Ac.veEon. Orchestrate and Accelerate Applica.ons. Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst Capacity

High Performance Computing

Remote Graphical Visualization of Large Interactive Spatial Data

System Models for Distributed and Cloud Computing

System Requirements for Microsoft Dynamics GP 9.0

Dynamic Load Balancing Strategy for Grid Computing

1 Bull, 2011 Bull Extreme Computing

MOSIX: High performance Linux farm

Faculty of Computer Science Computer Graphics Group. Final Diploma Examination

Adaptive Time-Dependent CFD on Distributed Unstructured Meshes

Dell High-Performance Computing Clusters and Reservoir Simulation Research at UT Austin.

Hadoop on a Low-Budget General Purpose HPC Cluster in Academia

LaPIe: Collective Communications adapted to Grid Environments

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis

An Empirical Study and Analysis of the Dynamic Load Balancing Techniques Used in Parallel Computing Systems

Cluster, Grid, Cloud Concepts

Scheduling and Resource Management in Computational Mini-Grids

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

Understanding the Benefits of IBM SPSS Statistics Server

Overlapping Data Transfer With Application Execution on Clusters

A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster

Reliable Systolic Computing through Redundancy

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing

Speeding up MATLAB and Simulink Applications

How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Scalability and Classifications

PRODUCT INFORMATION. Insight+ Uses and Features

High Performance Computing in CST STUDIO SUITE

Using Peer to Peer Dynamic Querying in Grid Information Services

Grid based Integration of Real-Time Value-at-Risk (VaR) Services. Abstract

Mesh Generation and Load Balancing

Local-Area Network -LAN

Chapter 7. Using Hadoop Cluster and MapReduce

Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations

Recommended hardware system configurations for ANSYS users

KnowledgeSEEKER Marketing Edition

High Performance Computing with Linux Clusters

Planning the Installation and Installing SQL Server

IBM Deep Computing Visualization Offering

Visualization Cluster Getting Started

Using In-Memory Computing to Simplify Big Data Analytics

Desktop Virtualization in the Educational Environment

LCMON Network Traffic Analysis

WIRELESS TRAINING SOLUTIONS. by vlogic, Inc. L a b 0.3 Remote Access Labs

Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

Performance Analysis and Optimization Tool

Report Paper: MatLab/Database Connectivity

Transcription:

Distributed Optimization of Fiber Optic Network Layout using MATLAB R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl uhl@cosy.sbg.ac.at R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 1 Carinthia Tech Institute & Salzburg University

Outline Introduction MATLAB Custer Computing: MDICE Optimization of Fiber Optic Network Layout Sequential Approach Distribution Strategy Experimental Results Conclusions R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 2 Carinthia Tech Institute & Salzburg University

Introduction About 95% of the total costs for the implementation of fiber optic networks may be expected for the area-wide realization of the last mile (access networks) planning tools currently are employed for the core net domain only. NETQUEST-OPT is a MATLAB-based planning tool for the computation of cost optimized and real world laying for fiber optic access networks, it is based on cluster strategies, exact and approximative graph theoretical algorithms, combinatorial optimization, and ring closure heuristics. NETQUEST-OPT currently requires computation times of several hours on a single workstation even for medium sized access domains. For an efficient planning process interactivity is a desired property, e.g. for rapid prototyping of different clustering strategies high performance computing systems are required. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 3 Carinthia Tech Institute & Salzburg University

Cluster Computing and MATLAB Emerge of cluster computing a cheap and highly available and yet powerful architecture for image processing applications MATLAB provides users with easy access to an extensive library of high quality numerical routines which can be used in a dynamical way and may be easily extended and integrated into existing applications. MATLAB in HPC Developing a high performance interpreter (MPI/PVM based communication routines) or coarse grained parallelization by splitting up work among multiple MATLAB sessions. Calling high performance numerical libraries (e.g. SCALAPACK) Compiling MATLAB to another language (e.g. C, HPF) R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 4 Carinthia Tech Institute & Salzburg University

MDICE MDICE: MATLAB-based DIstributed Computing Environment Goal: in contrast to previous approaches the goal is to get along with a single MATLAB client (instead of a MATLAB client at each participating compute node) The main idea was to change the client program in a way that it can be compiled to a standalone application. The compiled client program library and the datasets for the job are sent to the compute node, where a background client service is running with low priority. For this reason the involved client machines may be used as desktop machines by other users during the computation. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 5 Carinthia Tech Institute & Salzburg University

Client-Server Concept of MDICE Download http://www.fh-kaernten.at/nocms/studiengaenge/ma-tel/mdice/ R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 6 Carinthia Tech Institute & Salzburg University

Optimization of Fiber Optic Network Layout: Sequential I We map the real world geometries to a set of nodes V of a graph G = (V, E). Penalty Grid: A spatially balanced score card combines all relevant land classes with typical implementation costs. The access net domain is regarded as a regular grid [x i, x i+1 ] [y j, y j+1 ], i and j are indices of finite index sets. Each entry of the penalty matrix P describes the specific implementation costs for the grid pixel [x i, x i+1 ] [y j, y j+1 ]. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 7 Carinthia Tech Institute & Salzburg University

Optimization of Fiber Optic Network Layout: Sequential II For an edge [u, v], u V, v V, u v, we define the entry W u,v of the symmetric cost matrix W Mat ( V V ) according to W u,v := K 1 k=0 s k+1 s k 2 P k+1. The computation of W is one of the most time demanding procedures in the entire laying optimization process and requires ( V 1) V /2 calls of computing W u,v. In real world geometries, the network planning problem leads to dimensions up to V = 20000 nodes, therefore we first determine local clusters, compute a local solution in each cluster and find cluster shortest spanning trees. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 8 Carinthia Tech Institute & Salzburg University

Optimization of Fiber Optic Network Layout: Sequential III 1677 C (2) 1477 C (3) 1277 northing [m] 1077 877 677 C (1) 477 277 77 69 269 469 669 869 1069 1269 easting [m] 37 access nodes (triangles) Arial view of target area and three clusters (with overlayed computed solution) R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 9 Carinthia Tech Institute & Salzburg University

Optimization of Network Layout: Distribution Strategy Inter-cluster approach: parallelization by segmentation of the spatial domain into independent clusters is efficient but leads to suboptimal global umbrella networks only, especially in case of a larger number of clusters which would be required for scalable parallel or distributed execution. Intra-cluster approach: does not exploit the parallelism introduced by the clustering process at all, therefore an optimal global solution may be obtained if clustering can be avoided. In this context, the computations associated with the evaluation of the cost matrix W (which are covered by our experiments) may be distributed in a simple fashion by domain decomposition without the requirement of inter-process communication. This simple partitioning approach leads to entirely deterministic behaviour since the computational effort of evaluating parts of the matrix W is not data dependent, however, load balancing functionalities are provided in order to be able to cope with the possible interference of other applications and heterogeneous environments. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 10 Carinthia Tech Institute & Salzburg University

Experimental Settings We consider cost matrices W from real-world problems associated with a different number of edges (i.e. 36950, 114350, 394650, and 1227714) and vary the number of jobs distributed among the workers after fixing the edges to 394650. Computational task is split into a certain number of equally sized jobs N to be distributed by the server among the M clients in a dynamic fashion ( asynchronous single task pool method ) Server machine (1.99 GHz Intel Pentium 4, 504 MB RAM) and client machines (996 MHz Intel Pentium 3, 128 MB RAM and 730 MHz Intel Pentium 3, 128 MB RAM), all types under Windows XP Prof., 100 MBit/s Ethernet network. Sequential reference times have been achieved on a 996 MHz client machine with a compiled (not interpreted) version of the application to allow a fair comparison We use MATLAB 6.5.0 with the MATLAB compiler 3.0 and the LCC C compiler 2.4. Homogenous environment (fast clients only) vs. Heterogenous environment (six 996 MHz and four 730 MHz clients) R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 11 Carinthia Tech Institute & Salzburg University

Experimental Results I: Homogenous Environment It is clearly exhibited that speedup saturates for the smaller problems at 4 or 8 clients, respectively. On the other hand, reasonable scalability is shown for the larger problems up to 20 clients. The number of clients is set to 10, and 30 jobs are used. The staggered start of computation at different clients which is due to the expensive initial communication phase. 16 14 36,950 edges (25 jobs) 114,350 edges (25 jobs) 394,650 edges (25 jobs) 1,227,714 edges (73 jobs) 10 9 Doing nothing Communicating Processing 8 12 7 Speedup 10 8 Worker 6 5 6 4 4 3 2 2 1 2 4 6 8 10 12 14 16 18 20 Number of worker nodes 0 100 200 300 400 500 600 Time in seconds Speedup for varying problem-size Visualization of execution R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 12 Carinthia Tech Institute & Salzburg University

Experimental Results II: Heterogenous Environment We vary the number of jobs distributed among the clients. Clearly, a high number of jobs leads to an excellent load distribution, but on the other hand the communication effort is increased thereby reducing the overall efficiency. The optimal number of jobs is identified to be 75. With 10 jobs on 10 clients, the performance is worse as compared to the analogous homogenous case and the slower clients (numbers 3,6,7,8) are immediately identified. 1000 950 10 9 Doing nothing Communicating Processing 8 900 7 Time in seconds 850 Worker 6 5 800 4 3 750 2 700 10 15 18 25 30 45 50 75 90 150 225 Number of jobs 1 0 100 200 300 400 500 600 700 800 Time in seconds Execution time (varying number of jobs) Visualization of execution (10 jobs) R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 13 Carinthia Tech Institute & Salzburg University

Experimental Results III: Execution Visualization Execution behaviour for 25 and 75 jobs: visual appearance of the two cases is quite different, but we result in almost identical execution times. Whereas for N = 25 we have little communication effort but highly unbalanced load, the opposite is true for N = 75. 10 Doing nothing Communicating Processing 10 Doing nothing Communicating Processing 9 9 8 8 7 7 Worker 6 5 Worker 6 5 4 4 3 3 2 2 1 1 0 100 200 300 400 500 600 700 Time in seconds 0 100 200 300 400 500 600 700 Time in seconds 25 Jobs 75 Jobs R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 14 Carinthia Tech Institute & Salzburg University

Conclusions 1. The custom MATLAB toolbox MDICE may take advantage of the large number of Windows NT-architecture based machines available in companies and universities at low licensing costs. 2. The distributed approach suggested in this work allows for an interactive network planning process provided enough PCs are available. 3. Additionally, our approach may lead to a better quality of the computed network solution as compared to a sequential clustering approach in case the clustering can be avoided in the distributed execution. R. Pfarrhofer, M. Kelz, P. Bachhiesl, H. Stögner, and A. Uhl 15 Carinthia Tech Institute & Salzburg University