MPI Hands-On List of the exercises

Size: px
Start display at page:

Download "MPI Hands-On List of the exercises"

Transcription

1 MPI Hands-On List of the exercises 1 MPI Hands-On Exercise 1: MPI Environment MPI Hands-On Exercise 2: Ping-pong MPI Hands-On Exercise 3: Collective communications and reductions MPI Hands-On Exercise 4: Matrix transpose MPI Hands-On Exercise 5: Matrix-matrix product MPI Hands-On Exercise 6: Communicators MPI Hands-On Exercise 7: Read an MPI-IO file MPI Hands-On Exercise 8: Poisson s equation...17

2 2/26 1 MPI Hands-On Exercise 1: MPI Environment MPI Hands-On Exercise 1: MPI Environment All the processes print a different message, depending on their odd or even rank. For example, for the odd-ranked processes, the message will be: I am the odd-ranked process, my rank is M For the even-ranked processes: I am the even-ranked process, my rank is N Remark: You could use the Fortran intrinsic function mod to test if the rank is even or odd. The function mod(n,m) gives the remainder of n divided by m.

3 3/26 2 MPI Hands-On Exercise 2: Ping-pong MPI Hands-On Exercise 2: Ping-pong Point to point communications: ping-pong between two processes 1 In the first sub-exercise, we will do only a ping (sending a message from process 0 to process 1). 2 In the second sub-exercise, after the ping we will do a pong (process 1 sends the message received from process 0). 3 In the last sub-exercise, we will do a ping-pong with different message sizes. This means: 1 Send a message of 1000 reals from process 0 to process 1 (this is only a ping). 2 Create a ping-pong version where process 1 sends the message received from process 0 and measures the communication with the MPI_WTIME() function. 3 Create a version where the message size vary in a loop and which measures communication durations and bandwidths.

4 4/26 2 MPI Hands-On Exercise 2: Ping-pong Remarks: The generation of random numbers uniformly distributed in the range [0., 1.[ is made by calling the Fortran random_number subroutine: call random_number(variable) variable can be a scalar or an array The time duration measurements could be done like this:... time_begin=mpi_wtime()... time_end=mpi_wtime() print ( ("... in",f8.6," seconds.") ),time_end-time_begin...

5 5/26 3 MPI Hands-On Exercise 3: Collective communications and reductions MPI Hands-On Exercise 3: Collective communications and reductions By simulating a toss up on each process, loop until all the processes make the same choice, or until reaching a maximum number of tests. The Fortran function nint(a) returns the nearest integer of the float a. A loop with an unknown number of iterations is written in Fortran with the do while syntax: do while (condition(s))... end do

6 6/26 3 MPI Hands-On Exercise 3: Collective communications and reductions 0 1 Play Play 0 P0 0 P2 Play P0 Play P2 P1 P1 P3 P3 1 0 Play Play 0 P0 1 P2 Play P0 Play P2 P1 P1 P3 P3 1 1 Stop Stop 1 P0 1 P2 Stop P0 Stop P2 P1 P1 P3 P3 Figure 1: Do toss up until there is unanimity

7 7/26 3 MPI Hands-On Exercise 3: Collective communications and reductions If each process generates a pseudo-random number using the random_number subroutine, all will generate the same at the first draw and there will therefore have unanimity at the outset, making the problem irrelevant. It is therefore necessary to change the default behavior (legitimate for a reproduction of similar executions of a code on different machines). To do this, we need to fix on each process a different seed value used to initialize the pseudo-random number generator, by calling the random_seed subroutine. As the values must be different on each process, we use the clock time (although the precision is not sufficient on some machines) and the rank. In addition, the size of the seed for the pseudo random number generator is not the same depending on the algorithms and compilers used. To be portable, we need to obtain the size of the seed, by calling the random_seed subroutine with the size argument, then with this size we allocate an array and initializes it. This array is given at the next call to random_seed with the put argument in order to fix the seed for future sequences of pseudo random-number generation.

8 8/26 4 MPI Hands-On Exercise 4: Matrix transpose MPI Hands-On Exercise 4: Matrix transpose The goal of this exercise is to practice with the derived datatypes. A is a matrix with 5 lines and 4 columns defined on the process 0. Process 0 sends its A matrix to process 1 and transposes this matrix during the send Process 0 Figure 2: Matrix transpose Process 1 To do this, we need to create two derived datatypes, a derived datatype type_line and a derived datatype type_transpose.

9 9/26 5 MPI Hands-On Exercise 5: Matrix-matrix product MPI Hands-On Exercise 5: Matrix-matrix product Collective communications: matrix-matrix product C = A B The matrixes are square and their sizes are a multiple of the number of processes. The matrixes A and B are defined on process 0. Process 0 sends a horizontal slice of matrix A and a vertical slice of matrix B to each process. Each process then calculates its diagonal block of matrix C. To calculate the non-diagonal blocks, each process sends to the other processes its own slice of A (see figure 3). At the end, process 0 gathers and verifies the results.

10 10/26 5 MPI Hands-On Exercise 5: Matrix-matrix product B A C Figure 3: Distributed matrix product

11 11/26 5 MPI Hands-On Exercise 5: Matrix-matrix product The algorithm that may seem the most immediate and the easiest to program, consisting of each process sending its slice of its matrix A to each of the others, does not perform well because the communication algorithm is not well-balanced. It is easy to seen this when doing performance measurements and graphically representing the collected traces. See the files produit_matrices_v1_n3200_p4.slog2, produit_matrices_v1_n6400_p8.slog2 and produit_matrices_v1_n6400_p16.slog2, using the jumpshot of MPE (MPI Parallel Environment).

12 12/26 5 MPI Hands-On Exercise 5: Matrix-matrix product Figure 4: Parallel matrix product on 4 processes, for a matrix size of 3200 (first algorithm)

13 13/26 5 MPI Hands-On Exercise 5: Matrix-matrix product Figure 5: Parallel matrix product on 16 processes, for a matrix size of 6400 (first algorithm)

14 14/26 5 MPI Hands-On Exercise 5: Matrix-matrix product Changing the algorithm in order to shift slices from process to process, we obtain a perfect balance between calculations and communications and have a speedup of 2 compared to the naive algorithm. See the figure produced by the file produit_matrices_v2_n6400_p16.slog2. Figure 6: Parallel matrix product on 16 processes, for a matrix size of 6400 (second algorithm)

15 15/26 6 MPI Hands-On Exercise 6: Communicators MPI Hands-On Exercise 6: Communicators Using the Cartesian topology defined below, subdivide in 2 communicators following the lines by calling MPI_COMM_SPLIT() v(:)=1,2,3,4 1 w=1. w=2. w=3. w= v(:)=1,2,3,4 0 w=1. w=2. w=3. w= Figure 7: Subdivision of a 2D topology and communication using the obtained 1D topology

16 16/26 7 MPI Hands-On Exercise 7: Read an MPI-IO file MPI Hands-On Exercise 7: Read an MPI-IO file We have a binary file data.dat with 484 integer values. With 4 processes, it consists of reading the 121 first values on process 0, the 121 next on the process 1, and so on. We will use 4 different methods: Read via explicit offsets, in individual mode Read via shared file pointers, in collective mode Read via individual file pointers, in individual mode Read via shared file pointers, in individual mode To compile and execute the code, use make, and to verify the results, use make verification which runs a visualisation program corresponding to the four cases.

17 17/26 8 MPI Hands-On Exercise 8: Poisson s equation MPI Hands-On Exercise 8: Poisson s equation Resolution of the following Poisson equation : 2 u x u y 2 = f(x,y) in [0,1]x[0,1] u(x,y) = 0. on the boundaries f(x,y) = 2. ( x 2 x+y 2 y ) We will solve this equation with a domain decomposition method : The equation is discretized on the domain with a finite difference method. The obtained system is resolved with a Jacobi solver. The global domain is split into sub-domains. The exact solution is known and is u exact(x,y) = xy(x 1)(y 1).

18 18/26 8 MPI Hands-On Exercise 8: Poisson s equation To discretize the equation, we define a grid with a set of points (x i,y j) x i = i h x for i = 0,...,ntx+1 y j = j h y for j = 0,...,nty +1 h x = h y = h x : h y : ntx : nty : 1 (ntx+1) 1 (nty +1) x-wise step y-wise step number of x-wise interior points number of y-wise interior points In total, there are ntx+2 points in the x direction and nty+2 points in the y direction.

19 19/26 8 MPI Hands-On Exercise 8: Poisson s equation Let u ij be the estimated solution at position x i = ih x and x j = jh y. The Jacobi solver consist of computing : u n+1 ij = c 0(c 1(u n i+1j +u n i 1j)+c 2(u n ij+1 +u n ij 1) f ij) with: c 0 = 1 h 2 xh 2 y 2 h 2 x +h 2 y c 1 = 1 h 2 x c 2 = 1 h 2 y

20 20/26 8 MPI Hands-On Exercise 8: Poisson s equation In parallel, the interface values of subdomains must be exchanged between the neighbours. We use ghost cells as receive buffers.

21 21/26 8 MPI Hands-On Exercise 8: Poisson s equation N W S Figure 8: Exchange points on the interfaces E

22 22/26 8 MPI Hands-On Exercise 8: Poisson s equation y x u(sx-1,sy) u(sx,sy-1) u(sx,sy) sy sy-1 u(sx,ey+1) u(sx,ey) ey ey+1 sx-1 sx ex ex+1 u(ex+1,sy) u(ex,sy) Figure 9: Numeration of points in different sub-domains

23 23/26 8 MPI Hands-On Exercise 8: Poisson s equation y x Figure 10: Process rank numbering in the sub-domains

24 24/26 8 MPI Hands-On Exercise 8: Poisson s equation Process 0 Process 1 File Process 2 Process 3 Figure 11: Writing the global matrix u in a file You need to : Define a view, to see only the owned part of the global matrix u; Define a type, in order to write the local part of matrix u(without interfaces); Apply the view to the file; Write using only one call.

25 25/26 8 MPI Hands-On Exercise 8: Poisson s equation Initialisation of the MPI environment. Creation of the 2D Cartesian topology/ Determination of the array indexes for each sub-domain. Determination of the 4 neighbour processes for each sub-domain. Creation of two derived datatypes, type_line and type_column. Exchange the values on the interfaces with the other sub-domains. Computation of the global error. When the global error is lower than a specified value (machine precision for example), we consider that we have reached the exact solution. Collecting of the global matrix u (the same one as we obtained in the sequential) in an MPI-IO file data.dat.

26 26/26 8 MPI Hands-On Exercise 8: Poisson s equation Directory: tp8/poisson A skeleton of the parallel version is proposed: It consists of a main program (poisson.f90) and several subroutines. All the modifications have to be done in the module_parallel_mpi.f90 file. To compile and execute the code, use make and to verify the results, use make verification which runs a reading program of the data.dat file and compares it with the sequential version.

CHM 579 Lab 1: Basic Monte Carlo Algorithm

CHM 579 Lab 1: Basic Monte Carlo Algorithm CHM 579 Lab 1: Basic Monte Carlo Algorithm Due 02/12/2014 The goal of this lab is to get familiar with a simple Monte Carlo program and to be able to compile and run it on a Linux server. Lab Procedure:

More information

Iterative Solvers for Linear Systems

Iterative Solvers for Linear Systems 9th SimLab Course on Parallel Numerical Simulation, 4.10 8.10.2010 Iterative Solvers for Linear Systems Bernhard Gatzhammer Chair of Scientific Computing in Computer Science Technische Universität München

More information

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE

P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE 1 P013 INTRODUCING A NEW GENERATION OF RESERVOIR SIMULATION SOFTWARE JEAN-MARC GRATIEN, JEAN-FRANÇOIS MAGRAS, PHILIPPE QUANDALLE, OLIVIER RICOIS 1&4, av. Bois-Préau. 92852 Rueil Malmaison Cedex. France

More information

Parallel and Distributed Computing Programming Assignment 1

Parallel and Distributed Computing Programming Assignment 1 Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

5: Magnitude 6: Convert to Polar 7: Convert to Rectangular

5: Magnitude 6: Convert to Polar 7: Convert to Rectangular TI-NSPIRE CALCULATOR MENUS 1: Tools > 1: Define 2: Recall Definition --------------- 3: Delete Variable 4: Clear a-z 5: Clear History --------------- 6: Insert Comment 2: Number > 1: Convert to Decimal

More information

OpenFOAM Optimization Tools

OpenFOAM Optimization Tools OpenFOAM Optimization Tools Henrik Rusche and Aleks Jemcov h.rusche@wikki-gmbh.de and a.jemcov@wikki.co.uk Wikki, Germany and United Kingdom OpenFOAM Optimization Tools p. 1 Agenda Objective Review optimisation

More information

Learn CUDA in an Afternoon: Hands-on Practical Exercises

Learn CUDA in an Afternoon: Hands-on Practical Exercises Learn CUDA in an Afternoon: Hands-on Practical Exercises Alan Gray and James Perry, EPCC, The University of Edinburgh Introduction This document forms the hands-on practical component of the Learn CUDA

More information

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster

A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906

More information

THE NAS KERNEL BENCHMARK PROGRAM

THE NAS KERNEL BENCHMARK PROGRAM THE NAS KERNEL BENCHMARK PROGRAM David H. Bailey and John T. Barton Numerical Aerodynamic Simulations Systems Division NASA Ames Research Center June 13, 1986 SUMMARY A benchmark test program that measures

More information

A Pattern-Based Approach to. Automated Application Performance Analysis

A Pattern-Based Approach to. Automated Application Performance Analysis A Pattern-Based Approach to Automated Application Performance Analysis Nikhil Bhatia, Shirley Moore, Felix Wolf, and Jack Dongarra Innovative Computing Laboratory University of Tennessee (bhatia, shirley,

More information

OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware

OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware OpenMP & MPI CISC 879 Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware 1 Lecture Overview Introduction OpenMP MPI Model Language extension: directives-based

More information

CUDAMat: a CUDA-based matrix class for Python

CUDAMat: a CUDA-based matrix class for Python Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based

More information

HPC Deployment of OpenFOAM in an Industrial Setting

HPC Deployment of OpenFOAM in an Industrial Setting HPC Deployment of OpenFOAM in an Industrial Setting Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd, United Kingdom PRACE Seminar: Industrial Usage of HPC Stockholm, Sweden, 28-29 March 2011 HPC Deployment

More information

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit

More information

Sources: On the Web: Slides will be available on:

Sources: On the Web: Slides will be available on: C programming Introduction The basics of algorithms Structure of a C code, compilation step Constant, variable type, variable scope Expression and operators: assignment, arithmetic operators, comparison,

More information

Introduction to Matlab

Introduction to Matlab Introduction to Matlab Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course provides

More information

6 Scalar, Stochastic, Discrete Dynamic Systems

6 Scalar, Stochastic, Discrete Dynamic Systems 47 6 Scalar, Stochastic, Discrete Dynamic Systems Consider modeling a population of sand-hill cranes in year n by the first-order, deterministic recurrence equation y(n + 1) = Ry(n) where R = 1 + r = 1

More information

Glossary of Object Oriented Terms

Glossary of Object Oriented Terms Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction

More information

CUDA programming on NVIDIA GPUs

CUDA programming on NVIDIA GPUs p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

More information

Evaluation of CUDA Fortran for the CFD code Strukti

Evaluation of CUDA Fortran for the CFD code Strukti Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center

More information

Computer programming course in the Department of Physics, University of Calcutta

Computer programming course in the Department of Physics, University of Calcutta Computer programming course in the Department of Physics, University of Calcutta Parongama Sen with inputs from Prof. S. Dasgupta and Dr. J. Saha and feedback from students Computer programming course

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

POISSON AND LAPLACE EQUATIONS. Charles R. O Neill. School of Mechanical and Aerospace Engineering. Oklahoma State University. Stillwater, OK 74078

POISSON AND LAPLACE EQUATIONS. Charles R. O Neill. School of Mechanical and Aerospace Engineering. Oklahoma State University. Stillwater, OK 74078 21 ELLIPTICAL PARTIAL DIFFERENTIAL EQUATIONS: POISSON AND LAPLACE EQUATIONS Charles R. O Neill School of Mechanical and Aerospace Engineering Oklahoma State University Stillwater, OK 74078 2nd Computer

More information

Vector storage and access; algorithms in GIS. This is lecture 6

Vector storage and access; algorithms in GIS. This is lecture 6 Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector

More information

Poisson Equation Solver Parallelisation for Particle-in-Cell Model

Poisson Equation Solver Parallelisation for Particle-in-Cell Model WDS'14 Proceedings of Contributed Papers Physics, 233 237, 214. ISBN 978-8-7378-276-4 MATFYZPRESS Poisson Equation Solver Parallelisation for Particle-in-Cell Model A. Podolník, 1,2 M. Komm, 1 R. Dejarnac,

More information

Mathematical Libraries on JUQUEEN. JSC Training Course

Mathematical Libraries on JUQUEEN. JSC Training Course Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries on JUQUEEN JSC Training Course May 10, 2012 Outline General Informations Sequential Libraries, planned Parallel Libraries and Application Systems:

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008

Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008 A tutorial on: Iterative methods for Sparse Matrix Problems Yousef Saad University of Minnesota Computer Science and Engineering CRM Montreal - April 30, 2008 Outline Part 1 Sparse matrices and sparsity

More information

Illustration 1: Diagram of program function and data flow

Illustration 1: Diagram of program function and data flow The contract called for creation of a random access database of plumbing shops within the near perimeter of FIU Engineering school. The database features a rating number from 1-10 to offer a guideline

More information

Performance Results for Two of the NAS Parallel Benchmarks

Performance Results for Two of the NAS Parallel Benchmarks Performance Results for Two of the NAS Parallel Benchmarks David H. Bailey Paul O. Frederickson NAS Applied Research Branch RIACS NASA Ames Research Center NASA Ames Research Center Moffett Field, CA 94035

More information

Visualization of 2D Domains

Visualization of 2D Domains Visualization of 2D Domains This part of the visualization package is intended to supply a simple graphical interface for 2- dimensional finite element data structures. Furthermore, it is used as the low

More information

SOLUTIONS FOR PROBLEM SET 2

SOLUTIONS FOR PROBLEM SET 2 SOLUTIONS FOR PROBLEM SET 2 A: There exist primes p such that p+6k is also prime for k = 1,2 and 3. One such prime is p = 11. Another such prime is p = 41. Prove that there exists exactly one prime p such

More information

Making the Monte Carlo Approach Even Easier and Faster. By Sergey A. Maidanov and Andrey Naraikin

Making the Monte Carlo Approach Even Easier and Faster. By Sergey A. Maidanov and Andrey Naraikin Making the Monte Carlo Approach Even Easier and Faster By Sergey A. Maidanov and Andrey Naraikin Libraries of random-number generators for general probability distributions can make implementing Monte

More information

ALLIED PAPER : DISCRETE MATHEMATICS (for B.Sc. Computer Technology & B.Sc. Multimedia and Web Technology)

ALLIED PAPER : DISCRETE MATHEMATICS (for B.Sc. Computer Technology & B.Sc. Multimedia and Web Technology) ALLIED PAPER : DISCRETE MATHEMATICS (for B.Sc. Computer Technology & B.Sc. Multimedia and Web Technology) Subject Description: This subject deals with discrete structures like set theory, mathematical

More information

Jan F. Prins. Work-efficient Techniques for the Parallel Execution of Sparse Grid-based Computations TR91-042

Jan F. Prins. Work-efficient Techniques for the Parallel Execution of Sparse Grid-based Computations TR91-042 Work-efficient Techniques for the Parallel Execution of Sparse Grid-based Computations TR91-042 Jan F. Prins The University of North Carolina at Chapel Hill Department of Computer Science CB#3175, Sitterson

More information

Random-Number Generation

Random-Number Generation Random-Number Generation Raj Jain Washington University Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse574-08/ 26-1

More information

Mathematical Libraries and Application Software on JUROPA and JUQUEEN

Mathematical Libraries and Application Software on JUROPA and JUQUEEN Mitglied der Helmholtz-Gemeinschaft Mathematical Libraries and Application Software on JUROPA and JUQUEEN JSC Training Course May 2014 I.Gutheil Outline General Informations Sequential Libraries Parallel

More information

GPU Acceleration of the SENSEI CFD Code Suite

GPU Acceleration of the SENSEI CFD Code Suite GPU Acceleration of the SENSEI CFD Code Suite Chris Roy, Brent Pickering, Chip Jackson, Joe Derlaga, Xiao Xu Aerospace and Ocean Engineering Primary Collaborators: Tom Scogland, Wu Feng (Computer Science)

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

1 Bull, 2011 Bull Extreme Computing

1 Bull, 2011 Bull Extreme Computing 1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance

More information

Fast Multipole Method for particle interactions: an open source parallel library component

Fast Multipole Method for particle interactions: an open source parallel library component Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

A new binary floating-point division algorithm and its software implementation on the ST231 processor

A new binary floating-point division algorithm and its software implementation on the ST231 processor 19th IEEE Symposium on Computer Arithmetic (ARITH 19) Portland, Oregon, USA, June 8-10, 2009 A new binary floating-point division algorithm and its software implementation on the ST231 processor Claude-Pierre

More information

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Load Balancing on a Non-dedicated Heterogeneous Network of Workstations Dr. Maurice Eggen Nathan Franklin Department of Computer Science Trinity University San Antonio, Texas 78212 Dr. Roger Eggen Department

More information

Code Generation Tools for PDEs. Matthew Knepley PETSc Developer Mathematics and Computer Science Division Argonne National Laboratory

Code Generation Tools for PDEs. Matthew Knepley PETSc Developer Mathematics and Computer Science Division Argonne National Laboratory Code Generation Tools for PDEs Matthew Knepley PETSc Developer Mathematics and Computer Science Division Argonne National Laboratory Talk Objectives Introduce Code Generation Tools - Installation - Use

More information

OpenFOAM: Computational Fluid Dynamics. Gauss Siedel iteration : (L + D) * x new = b - U * x old

OpenFOAM: Computational Fluid Dynamics. Gauss Siedel iteration : (L + D) * x new = b - U * x old OpenFOAM: Computational Fluid Dynamics Gauss Siedel iteration : (L + D) * x new = b - U * x old What s unique about my tuning work The OpenFOAM (Open Field Operation and Manipulation) CFD Toolbox is a

More information

HSL and its out-of-core solver

HSL and its out-of-core solver HSL and its out-of-core solver Jennifer A. Scott j.a.scott@rl.ac.uk Prague November 2006 p. 1/37 Sparse systems Problem: we wish to solve where A is Ax = b LARGE Informal definition: A is sparse if many

More information

Partitioning and Divide and Conquer Strategies

Partitioning and Divide and Conquer Strategies and Divide and Conquer Strategies Lecture 4 and Strategies Strategies Data partitioning aka domain decomposition Functional decomposition Lecture 4 and Strategies Quiz 4.1 For nuclear reactor simulation,

More information

Design and Implementation of a Massively Parallel Version of DIRECT

Design and Implementation of a Massively Parallel Version of DIRECT Design and Implementation of a Massively Parallel Version of DIRECT JIAN HE Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA ALEX VERSTAK Department

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information

Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization

Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03018-1 Hash-Storage Techniques for Adaptive Multilevel Solvers and Their Domain Decomposition Parallelization Michael Griebel and Gerhard Zumbusch

More information

NEW MEXICO Grade 6 MATHEMATICS STANDARDS

NEW MEXICO Grade 6 MATHEMATICS STANDARDS PROCESS STANDARDS To help New Mexico students achieve the Content Standards enumerated below, teachers are encouraged to base instruction on the following Process Standards: Problem Solving Build new mathematical

More information

Introduction to the Finite Element Method

Introduction to the Finite Element Method Introduction to the Finite Element Method 09.06.2009 Outline Motivation Partial Differential Equations (PDEs) Finite Difference Method (FDM) Finite Element Method (FEM) References Motivation Figure: cross

More information

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms

Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

Optimization on Huygens

Optimization on Huygens Optimization on Huygens Wim Rijks wimr@sara.nl Contents Introductory Remarks Support team Optimization strategy Amdahls law Compiler options An example Optimization Introductory Remarks Modern day supercomputers

More information

Information technology Programming languages Fortran Enhanced data type facilities

Information technology Programming languages Fortran Enhanced data type facilities ISO/IEC JTC1/SC22/WG5 N1379 Working draft of ISO/IEC TR 15581, second edition Information technology Programming languages Fortran Enhanced data type facilities This page to be supplied by ISO. No changes

More information

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

More information

Notes on Factoring. MA 206 Kurt Bryan

Notes on Factoring. MA 206 Kurt Bryan The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

More information

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m. Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we

More information

CS 4204 Computer Graphics

CS 4204 Computer Graphics CS 4204 Computer Graphics Computer Animation Adapted from notes by Yong Cao Virginia Tech 1 Outline Principles of Animation Keyframe Animation Additional challenges in animation 2 Classic animation Luxo

More information

Performance Tuning of a CFD Code on the Earth Simulator

Performance Tuning of a CFD Code on the Earth Simulator Applications on HPC Special Issue on High Performance Computing Performance Tuning of a CFD Code on the Earth Simulator By Ken ichi ITAKURA,* Atsuya UNO,* Mitsuo YOKOKAWA, Minoru SAITO, Takashi ISHIHARA

More information

OpenMP Programming on ScaleMP

OpenMP Programming on ScaleMP OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign

More information

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

More information

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions

More information

AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS

AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS AN INTERFACE STRIP PRECONDITIONER FOR DOMAIN DECOMPOSITION METHODS by M. Storti, L. Dalcín, R. Paz Centro Internacional de Métodos Numéricos en Ingeniería - CIMEC INTEC, (CONICET-UNL), Santa Fe, Argentina

More information

Advanced Operating Systems CS428

Advanced Operating Systems CS428 Advanced Operating Systems CS428 Lecture TEN Semester I, 2009-10 Graham Ellis NUI Galway, Ireland DIY Parallelism MPI is useful for C and Fortran programming. DIY Parallelism MPI is useful for C and Fortran

More information

Compliance and Requirement Traceability for SysML v.1.0a

Compliance and Requirement Traceability for SysML v.1.0a 1. Introduction: Compliance and Traceability for SysML v.1.0a This document provides a formal statement of compliance and associated requirement traceability for the SysML v. 1.0 alpha specification, which

More information

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

More information

Cellular Computing on a Linux Cluster

Cellular Computing on a Linux Cluster Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results

More information

Performance Monitoring of Parallel Scientific Applications

Performance Monitoring of Parallel Scientific Applications Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure

More information

Binary Image Reconstruction

Binary Image Reconstruction A network flow algorithm for reconstructing binary images from discrete X-rays Kees Joost Batenburg Leiden University and CWI, The Netherlands kbatenbu@math.leidenuniv.nl Abstract We present a new algorithm

More information

Systolic Computing. Fundamentals

Systolic Computing. Fundamentals Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW

More information

Programming Languages & Tools

Programming Languages & Tools 4 Programming Languages & Tools Almost any programming language one is familiar with can be used for computational work (despite the fact that some people believe strongly that their own favorite programming

More information

Performance Evaluation of Amazon EC2 for NASA HPC Applications!

Performance Evaluation of Amazon EC2 for NASA HPC Applications! National Aeronautics and Space Administration Performance Evaluation of Amazon EC2 for NASA HPC Applications! Piyush Mehrotra!! J. Djomehri, S. Heistand, R. Hood, H. Jin, A. Lazanoff,! S. Saini, R. Biswas!

More information

Fast Arithmetic Coding (FastAC) Implementations

Fast Arithmetic Coding (FastAC) Implementations Fast Arithmetic Coding (FastAC) Implementations Amir Said 1 Introduction This document describes our fast implementations of arithmetic coding, which achieve optimal compression and higher throughput by

More information

G.H. Raisoni College of Engineering, Nagpur. Department of Information Technology

G.H. Raisoni College of Engineering, Nagpur. Department of Information Technology Practical List 1) WAP to implement line generation using DDA algorithm 2) WAP to implement line using Bresenham s line generation algorithm. 3) WAP to generate circle using circle generation algorithm

More information

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and Breaking The Code Ryan Lowe Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and a minor in Applied Physics. As a sophomore, he took an independent study

More information

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit Data Structures Page 1 of 24 A.1. Arrays (Vectors) n-element vector start address + ielementsize 0 +1 +2 +3 +4... +n-1 start address continuous memory block static, if size is known at compile time dynamic,

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

Figure 1: Graphical example of a mergesort 1.

Figure 1: Graphical example of a mergesort 1. CSE 30321 Computer Architecture I Fall 2011 Lab 02: Procedure Calls in MIPS Assembly Programming and Performance Total Points: 100 points due to its complexity, this lab will weight more heavily in your

More information

Random graphs with a given degree sequence

Random graphs with a given degree sequence Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

More information

Parallel Algorithm for Dense Matrix Multiplication

Parallel Algorithm for Dense Matrix Multiplication Parallel Algorithm for Dense Matrix Multiplication CSE633 Parallel Algorithms Fall 2012 Ortega, Patricia Outline Problem definition Assumptions Implementation Test Results Future work Conclusions Problem

More information

Large-Scale Reservoir Simulation and Big Data Visualization

Large-Scale Reservoir Simulation and Big Data Visualization Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises

Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Pierre-Yves Taunay Research Computing and Cyberinfrastructure 224A Computer Building The Pennsylvania State University University

More information

Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

More information

Differentiating a Time-dependent CFD Solver

Differentiating a Time-dependent CFD Solver Differentiating a Time-dependent CFD Solver Presented to The AD Workshop, Nice, April 2005 Mohamed Tadjouddine & Shaun Forth Engineering Systems Department Cranfield University (Shrivenham Campus) Swindon

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

How High a Degree is High Enough for High Order Finite Elements?

How High a Degree is High Enough for High Order Finite Elements? This space is reserved for the Procedia header, do not use it How High a Degree is High Enough for High Order Finite Elements? William F. National Institute of Standards and Technology, Gaithersburg, Maryland,

More information

Performance Improvement of Application on the K computer

Performance Improvement of Application on the K computer Performance Improvement of Application on the K computer November 13, 2011 Kazuo Minami Team Leader, Application Development Team Research and Development Group Next-Generation Supercomputer R & D Center

More information

The Pointless Machine and Escape of the Clones

The Pointless Machine and Escape of the Clones MATH 64091 Jenya Soprunova, KSU The Pointless Machine and Escape of the Clones The Pointless Machine that operates on ordered pairs of positive integers (a, b) has three modes: In Mode 1 the machine adds

More information

University of Amsterdam - SURFsara. High Performance Computing and Big Data Course

University of Amsterdam - SURFsara. High Performance Computing and Big Data Course University of Amsterdam - SURFsara High Performance Computing and Big Data Course Workshop 7: OpenMP and MPI Assignments Clemens Grelck C.Grelck@uva.nl Roy Bakker R.Bakker@uva.nl Adam Belloum A.S.Z.Belloum@uva.nl

More information

Load Balancing Techniques

Load Balancing Techniques Load Balancing Techniques 1 Lecture Outline Following Topics will be discussed Static Load Balancing Dynamic Load Balancing Mapping for load balancing Minimizing Interaction 2 1 Load Balancing Techniques

More information

FINITE DIFFERENCE METHODS

FINITE DIFFERENCE METHODS FINITE DIFFERENCE METHODS LONG CHEN Te best known metods, finite difference, consists of replacing eac derivative by a difference quotient in te classic formulation. It is simple to code and economic to

More information

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share. LING115 Lecture Note Session #4 Python (1) 1. Introduction As we have seen in previous sessions, we can use Linux shell commands to do simple text processing. We now know, for example, how to count words.

More information

FACTORING SPARSE POLYNOMIALS

FACTORING SPARSE POLYNOMIALS FACTORING SPARSE POLYNOMIALS Theorem 1 (Schinzel): Let r be a positive integer, and fix non-zero integers a 0,..., a r. Let F (x 1,..., x r ) = a r x r + + a 1 x 1 + a 0. Then there exist finite sets S

More information