A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
Acta Technica Jaurinensis, Vol. 3.

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

G. Molnárka, N. Varjasi
Széchenyi István University, H-9026 Győr, Hungary, Egyetem tér 1.
[email protected]

Abstract: There are several iterative models for solving general, large linear equations. In this paper a parallel algorithm with slow convergence speed is presented in order to study the speed-up effect of parallel algorithms. The difference between the ring and the hierarchical topology with respect to running speed and efficiency is also explored. Detailed numerical test results of the algorithm, including the speedup of parallel execution, are shown.

Keywords: parallel programming, linear equation, cluster computing

1. The minimal residual algorithm

The main goal of this article is to study the speed-up effect obtained on a parallel computer with a parallel algorithm for solving a full-rank, general, symmetric, positive definite, not sparse (but dense) linear equation with a high condition number (n = 1000, cond = 10^10; n = 5000, cond = 4·10^…; n = 10000, cond = 7·10^…; n = 15000, cond = 10^16), where cond(A) = ||A||·||A^-1|| and the Euclidean norm was used (see the short sketch below). The condition number shows the difficulty of the linear equation: the higher the condition number, the more difficult the equation becomes for numerical methods.

The algorithms for solving the general equation Ax = b are well known [1], [2], [3]. In the case of large systems direct algorithms are inefficient; only iterative methods can produce results with the desired precision [4], [10], since otherwise floating point arithmetic causes several rounding errors.

The basis of the presented numerical algorithm for the solution of linear systems of equations is a generalisation of the classical one-step iterative algorithms (such as the gradient method). The generalisation does not improve the convergence speed of the algorithm, but it highly improves parallel execution. In the sequential case the base algorithm [5], [6] has slow convergence speed. The minimal residual algorithm is a widely known algorithm, but the versions suggested as Algorithm 1 and Algorithm 2 have been created by the authors. The results of the parallel realisation of the algorithms and the measured data are the results of the present research.
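As a quick illustration of the definition of cond(A) above (our addition, not part of the original paper), the condition number of a small symmetric positive definite matrix can be computed directly with NumPy; the matrix generator here is a hypothetical stand-in, not the paper's test-case generator.

```python
import numpy as np

# Illustrative sketch: cond(A) = ||A|| * ||A^-1|| in the Euclidean (2-)norm.
# The test matrix below is a hypothetical stand-in, not the paper's generator.
n = 1000
rng = np.random.default_rng(0)
M = rng.random((n, n))
A = M @ M.T + np.eye(n)                        # symmetric positive definite

cond = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(cond)                                    # agrees with np.linalg.cond(A, 2)
```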
The suggested algorithms give a good opportunity to study the effects of parallel processing. If a cluster or a multiprocessor computer is used, a considerable speed-up effect can be expected.

2. Methods for parallelisation on homogeneous and heterogeneous systems

The aim of parallel processing is to break a large problem down into several smaller components or calculations that can be solved in parallel by different processors at the same time. The most efficient tools for scientific computations would be massively parallel computers with a large shared memory, but this hardware is expensive and unattainable for the research team. Other solutions are distributed systems and cluster computing. A small cluster of 16 PCs with a normal network connection and an interconnected cluster machine with 88 processors (HP BladeSystem c3000) were used. On the cluster computing model a message passing software (MPI) was used to solve the tasks.

2.1. Ring and hierarchical topologies

In parallel solutions there are two bottlenecks for optimisation: the communication and the calculations. These aspects have been studied with two topologies.

For linear equations based on numerical models the ring model is often used [8]. In this case every node has a connection with its two neighbouring nodes, or possibly with further nodes. This solution works efficiently on homogeneous systems, and the heartbeat algorithm is useful on this model (see Figure 1a). First an initialization procedure runs, then a loop starts. In the first phase of the loop a data sending and receiving process is accomplished (synchronization); this is the data exchange period between the computation nodes. After that, every node runs the computation algorithm; this is the cpu period of the work. The loop runs until the stopping criterion is met. In this model every node has an equivalent role. The model needs a homogeneous network and the same type of processors, because each synchronization step is paced by the slowest node, and the fast nodes have to wait for it.

Figure 1. Ring (a) and hierarchical (b) topology
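The heartbeat loop described above can be sketched roughly as follows with mpi4py. This is our illustration of the structure only, not the authors' code: the computation phase is a placeholder update and the stopping criterion is reduced to a fixed iteration count.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

rng = np.random.default_rng(rank)
x = rng.random(100)                     # initialization phase
for _ in range(1000):                   # stopping criterion simplified
    # synchronization phase: exchange the current vector with the neighbours
    x_recv = np.empty_like(x)
    comm.Sendrecv(x, dest=right, recvbuf=x_recv, source=left)
    # cpu phase: the computation step of the node (placeholder update)
    x = 0.5 * (x + x_recv)
```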
In other cases hierarchical topologies are useful [9]. The master-worker model is based on a distributed, large, heterogeneous cluster (see Figure 1b). The master node controls the running processes, assigns problems to the workers and manages the partial solutions. The role of the worker nodes is to solve smaller parts of the problem. This model works well with asynchronous methods too, because the worker nodes are not connected to each other. Every worker node can reach the maximum performance of its processors.

2.2. Algorithm 1

For the ring topology the following algorithm has been used:

I. Let P be the number of nodes and let eps be the tolerated error value.
II. Let A be the matrix of the system and let b be the right-hand side vector.
III. For every node p ∈ P do in parallel: generate a random vector x1.
IV. For every node p ∈ P do in parallel: run Operation() until a result arrives or the iteration converges.
V. The result of the solution is x1 on the master node.

The algorithm uses the Operation() function on every node:

1. do
2. let x2 be a new random vector
3. let r1 = Ax1 − b and r2 = Ax2 − b, where r1 − r2 ≠ 0
4. let c1 := ⟨r2, r2 − r1⟩ / ⟨r1 − r2, r1 − r2⟩
5. let x1 := c1·x1 + (1 − c1)·x2
6. let r1 := c1·r1 + (1 − c1)·r2
7. let x2 := x1 and r2 := r1
8. if ||r1|| < min or iteration_num > n then send x1 to the next node and wait for a new x2 vector
9. while eps < ||r1||
Return x1, the solution with the desired precision.

Remarks: with the choice of c1 in step 4 the combination in step 6 has the minimal residual norm among all combinations c·r1 + (1 − c)·r2. From the vector exchange it is expected that the received result is better; when the algorithm reaches a local minimum, the x1 solution is sent to another node, which continues the computation with a new random vector coming from another node (see Figure 2).
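One pass of the Operation() loop can be written out as follows. This is a minimal NumPy sketch, assuming our reconstruction of step 4, namely that c1 is the coefficient that minimises the residual norm of the combination; all names are illustrative.

```python
import numpy as np

def operation_step(A, b, x1, rng):
    """One minimal-residual pass: combine x1 with a fresh random vector."""
    x2 = rng.random(len(b))                # step 2: new random vector
    r1, r2 = A @ x1 - b, A @ x2 - b        # step 3: the two residuals
    d = r1 - r2                            # assumed nonzero (step 3)
    c1 = (r2 @ (r2 - r1)) / (d @ d)        # step 4: minimises ||c*r1 + (1-c)*r2||
    x1 = c1 * x1 + (1.0 - c1) * x2         # step 5
    r1 = c1 * r1 + (1.0 - c1) * r2         # step 6
    return x1, r1
```

Since c1 minimises ||c·r1 + (1 − c)·r2|| over all c, and the choices c = 1 and c = 0 reproduce r1 and r2 respectively, the new residual norm never exceeds min(||r1||, ||r2||); the residual sequence is therefore monotonically non-increasing, which is why the sequential iteration converges, if only slowly.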
In the implementation of the algorithm the Mersenne-twister pseudorandom number generator has been applied [7], and the independence of the iteration sequences is based on the independent clocks of the computing nodes.

Two error measures have been used in the algorithm. The first is the general residual error: err_r = ||r1|| = ||Ax1 − b||. When the exact solution x* of the test case is known, there is a chance to compute the absolute error value: err_x = ||x − x*||, where x is the approximate solution vector (a short sketch of both measures follows below).

Figure 2. The convergence of Algorithm1 (n = 100)

In the case of large-sized, badly conditioned linear equations these error values are relatively high numbers. With cond = 10^16, err = 1000 already means a close solution, as shown in Figure 4; in detail, err_x = ||x − x*|| = sqrt(Σ_{i=1..n} (x_i − x*_i)^2), so the error of each member of the solution vector is only about err_x / sqrt(n). When a problem whose correct solution was already known was solved and the residual and absolute errors were compared, it has to be noted that the absolute error of the solution was always better (smaller) than the residual.

The efficiency of the algorithm depends on load balancing: the operation can be repeated several times with slow convergence speed, or the result vector can be exchanged between the nodes to give extra speedup. In this case, and referring to Amdahl's law, the ring model has a theoretical maximum number of nodes: if more nodes are used and the best solution is sent to the next node, a larger ring will increase the running time, as the solution propagates only slowly to the other nodes.
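The two error measures mentioned above translate directly into code (a minimal sketch; x_star stands for the known exact solution of a test case and is available for test problems only):

```python
import numpy as np

def error_measures(A, b, x, x_star=None):
    err_r = np.linalg.norm(A @ x - b)        # residual error ||Ax - b||
    err_x = None
    if x_star is not None:                   # absolute error ||x - x*||,
        err_x = np.linalg.norm(x - x_star)   # computable for test cases only
    return err_r, err_x
```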
Figure 3. The convergence of Algorithm1 (n = 500)

Algorithm1 works well on a homogeneous cluster (see Figure 3). But if there is a slower or a loaded computer in the ring, the send-receive method becomes slow, several traffic jams are to be expected and the running times also grow. Algorithm1 has therefore been revised and a new, hierarchical model has been composed.

2.3. Algorithm 2

I. Let P be the number of nodes and let eps be the tolerated error value.
II. Let A be the matrix of the system and let b be the right-hand side vector.
III. The master node generates a random vector x2 and sends it to every worker node.
IV. For every worker node p ∈ P do in parallel: generate a random vector x1.
V. On the master node do Control() while eps < ||r1||.
VI. For every worker node p ∈ P do in parallel: run Operation(x1) while new vectors arrive.
VII. The result of the solution is x1 on the master node.

On the worker nodes the Operation() function uses the same residual approach as Algorithm1; the difference is that the function sends data to and receives data from the master node only.
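The skeleton of the master-worker exchange might look as follows with mpi4py. This is our schematic of the structure of Algorithm 2, not the authors' implementation: the test system is a trivial placeholder and the stop signal to the workers is omitted.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n, eps = 100, 1e-6
rng = np.random.default_rng(rank)        # independent stream on every node
A, b = np.eye(n), np.ones(n)             # placeholder system (illustration)

if rank == 0:                            # master node: Control()
    best, best_err = rng.random(n), np.inf
    while best_err >= eps:
        x, err, worker = comm.recv(source=MPI.ANY_SOURCE)
        if err < best_err:               # keep the best approximation so far
            best, best_err = x, err
        comm.send(best, dest=worker)     # hand the current best vector back
    # a real implementation would now signal the workers to stop
else:                                    # worker nodes: Operation(x1)
    x1 = rng.random(n)
    while True:
        x2 = rng.random(n)               # fresh random candidate
        r1, r2 = A @ x1 - b, A @ x2 - b
        d = r1 - r2
        c1 = (r2 @ (r2 - r1)) / (d @ d)  # minimal residual combination
        x1 = c1 * x1 + (1.0 - c1) * x2
        comm.send((x1, float(np.linalg.norm(A @ x1 - b)), rank), dest=0)
        x1 = comm.recv(source=0)         # continue from the master's best
```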
Figure 4. Residual and absolute error of the solution vector (n = 15000, P = 88 cpus, log-log scale)

The Control() function of the master node distributes data to and collects data from every worker node. The problem is not solved on the master node, but the result is presented there. The master node controls the data exchanges and presents the best approximate result vector to every node (see Figure 4). This model is flexible because the number of worker nodes can grow dynamically [8]. The model can also be used on heterogeneous clusters, because the worker nodes are independent and communicate only with the master node. Every node works on the master's best solution.

Figure 5. Convergence speed between topologies (n = 500, P = 16 cpus)
The difference between the ring and the hierarchical algorithm is that the hierarchical algorithm results in smaller computational time and better convergence. As can be seen in Figure 5, the ring solution (solid line) has a smaller gradient, whereas the hierarchical solution (dotted line) has a steeper one. This is because in the ring model the corrective effect of a new solution reaches the previous node only in P − 1 steps, while in the hierarchical model the corrective effect is achieved in one step. Moreover, in the ring model only the results of one or two neighbours can be taken into consideration.

Figure 6. The results of working nodes (dotted) and the minimal residual error (solid line) (n = 10000, P = 88, log-log scale)

The hierarchical model has better convergence features. At every iteration loop the master node sends the best solution vector to the workers. This adds some genetic features to the algorithm [5]. If the details of Figure 3 and Figure 4 are examined, steps in the curves can be seen. It has to be noted that in a P-processor master-worker model only P − 1 processors solve the linear equation.

Let us focus now on this hierarchical solution. If the convergence curves are observed, it can be noticed that all of the solutions are similar, and the final solution always has the same order of error in every case. The differences between the exact values of the errors are unfortunately caused by the pseudo-random number generator. In this algorithm only an approximate solution is achieved, not the exact vector. If the problem is examined from another point of view and the execution time of the algorithm is recorded, the results shown in Figure 7 are obtained: if more computing nodes are used, the running time is expected to decrease.
Figure 7. Time of execution and processor numbers (n = 5000)

In these cases only parallel execution provides results in an acceptable time. At some points a larger than linear relative speed-up can be achieved, as shown in Figure 8. But when the number of processors grows, the effective speedup and efficiency decrease, because the worker nodes keep reporting their own solutions and the load of the master node grows. More precise load balancing could be used, but the size of the problem and the communication delays prevent further advances. For better results another method has to be used.

Figure 8. Algorithm 2 relative speedup (n = 500)
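For reference (our addition), the relative speed-up and efficiency discussed above, and the Amdahl-type bound referred to in section 2.2, can be computed as:

```python
def speedup(t1, tp, p):
    """Relative speed-up S_p = T_1 / T_p and efficiency E_p = S_p / p."""
    s = t1 / tp
    return s, s / p

def amdahl_bound(f, p):
    """Amdahl's law: speed-up limit for serial fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

print(speedup(100.0, 8.0, 16))   # (12.5, 0.78125)
print(amdahl_bound(0.05, 16))    # about 9.1x at 5% serial work
```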
3. Results

Above, a new type of algorithm for the solution of linear equation systems on heterogeneous clusters has been presented. The algorithm is based on a residual minimisation technique with a master-worker structure. The algorithm has some genetic features, because the new, better vectors are built from a group of good vectors, which gives extra speed-up (see Figure 8). Computer tests have confirmed the theoretical results: the parallel implementation is much better than the sequential one, and a considerable speed-up effect is obtained using a parallel computer.

The goal of creating these algorithms has been basic research, but the solution of badly conditioned linear equations is a useful method for several practical and realistic problems. Optimization, finite element methods, control and simulation problems are often based on large, dense linear equations. It has to be noted that only the simplest algorithm has been tested; tests with more effective algorithms will be the subject of a forthcoming work.

4. References

[1] L. A. Hageman, D. M. Young: Applied Iterative Methods, Computer Science and Applied Mathematics, Academic Press, (1981).
[2] P. G. Ciarlet: Introduction à l'analyse numérique matricielle et à l'optimisation, Masson, Paris, (1982).
[3] G. Golub, A. Greenbaum, M. Luskin, eds.: Recent Advances in Iterative Methods, The IMA Volumes in Mathematics and its Applications, Vol. 60, Springer Verlag, (1994).
[4] G. Molnárka, N. Varjasi: Parallel algorithm for solution of general linear systems of equations, Informatika a felsőoktatásban 2005, Debrecen, (2005), pp. 176.
[5] G. Molnárka: A scalable parallel algorithm for solving general linear system equations, 77th GAMM Annual Meeting 2006, Berlin, (2006), pp. 441.
[6] N. Varjasi: Parallel algorithm for linear equations with different network topologies, Proceedings of the International e-Conference on Computer Science (IeCCS) 2006, Lecture Series on Computer and Computational Sciences, Brill Academic Publishers, (2007).
[7] M. Matsumoto, T. Nishimura: Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, (1998), pp. 3-30.
[8] A. Basermann, B. Reichel, C. Schelthoff: Preconditioned CG methods for sparse matrices on massively parallel machines, Parallel Computing 23, (1997).
[9] E. J. H. Yero, M. A. A. Henriques: Speedup and scalability analysis of Master-Slave applications on large heterogeneous clusters, Journal of Parallel and Distributed Computing 67, (2007).
[10] I. S. Duff, H. A. van der Vorst: Developments and trends in the parallel solution of linear systems, Parallel Computing 25, (1999).