MPI Hands-On List of the exercises
- Annabelle Elliott
- 8 years ago
- MPI Hands-On Exercise 1: MPI Environment
- MPI Hands-On Exercise 2: Ping-pong
- MPI Hands-On Exercise 3: Collective communications and reductions
- MPI Hands-On Exercise 4: Matrix transpose
- MPI Hands-On Exercise 5: Matrix-matrix product
- MPI Hands-On Exercise 6: Communicators
- MPI Hands-On Exercise 7: Read an MPI-IO file
- MPI Hands-On Exercise 8: Poisson's equation
MPI Hands-On Exercise 1: MPI Environment

All the processes print a different message, depending on whether their rank is odd or even. For example, for the odd-ranked processes, the message will be:

    I am the odd-ranked process, my rank is M

For the even-ranked processes:

    I am the even-ranked process, my rank is N

Remark: you could use the Fortran intrinsic function mod to test whether the rank is even or odd. The function mod(n,m) gives the remainder of n divided by m.
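The choice of message can be sketched as follows. This is a serial Python illustration of the logic only, not the Fortran/MPI solution: the loop over four ranks stands in for four MPI processes, each of which would obtain its own rank from MPI_COMM_RANK, and the function name rank_message is hypothetical.

```python
def rank_message(rank):
    """Message printed by one process, chosen from the parity of its rank.

    rank % 2 plays the role of the Fortran intrinsic mod(rank, 2)."""
    if rank % 2 == 0:
        return f"I am the even-ranked process, my rank is {rank}"
    return f"I am the odd-ranked process, my rank is {rank}"


if __name__ == "__main__":
    # Serial stand-in for 4 processes; in MPI each process prints one line.
    for rank in range(4):
        print(rank_message(rank))
```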
MPI Hands-On Exercise 2: Ping-pong

Point-to-point communications: ping-pong between two processes.
1. In the first sub-exercise, we do only a ping (sending a message from process 0 to process 1).
2. In the second sub-exercise, after the ping we do a pong (process 1 sends back the message received from process 0).
3. In the last sub-exercise, we do a ping-pong with different message sizes.

This means:
1. Send a message of 1000 reals from process 0 to process 1 (this is only a ping).
2. Create a ping-pong version where process 1 sends back the message received from process 0, and measure the communication time with the MPI_WTIME() function.
3. Create a version where the message size varies in a loop, and measure the communication durations and bandwidths.
Remarks:
- The generation of random numbers uniformly distributed in the range [0., 1.[ is done by calling the Fortran random_number subroutine:

      call random_number(variable)

  where variable can be a scalar or an array.
- The time duration measurements could be done like this:

      ...
      time_begin=mpi_wtime()
      ...
      time_end=mpi_wtime()
      print '("... in",f8.6," seconds.")',time_end-time_begin
      ...
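Turning the measured durations into bandwidths for sub-exercise 3 can be sketched like this. It is a serial Python helper, not part of the exercise code, and it rests on assumptions not stated in the exercise: each real is taken as 8 bytes (double precision; use 4 for default-kind Fortran reals), 1 MB is taken as 10**6 bytes, and the message-size schedule in message_sizes is hypothetical.

```python
BYTES_PER_REAL = 8  # assumption: double-precision reals; use 4 for default-kind reals


def bandwidth_mb_s(n_reals, round_trip_seconds, bytes_per_real=BYTES_PER_REAL):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one ping-pong.

    The message crosses the link twice (ping, then pong), so the bytes moved
    during the measured time_end - time_begin are 2 * n_reals * bytes_per_real."""
    if round_trip_seconds <= 0:
        raise ValueError("the round-trip time must be positive")
    return 2 * n_reals * bytes_per_real / round_trip_seconds / 1.0e6


def message_sizes(max_exponent=6):
    """Hypothetical size schedule for the loop: 1, 10, ..., 10**max_exponent reals."""
    return [10 ** k for k in range(max_exponent + 1)]
```

Counting both directions of the exchange is one common convention; dividing by the one-way time (half the round trip) gives the same figure.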
MPI Hands-On Exercise 3: Collective communications and reductions

By simulating a toss-up (a coin toss) on each process, loop until all the processes make the same choice, or until a maximum number of tries is reached. The Fortran function nint(a) returns the nearest integer to the real a. A loop with an unknown number of iterations is written in Fortran with the do while syntax:

    do while (condition(s))
      ...
    end do
Figure 1: Do toss-ups until there is unanimity (processes P0 to P3 each draw 0 or 1 and play again while their draws differ; all stop once every process has drawn the same value)
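The loop and its stopping test can be sketched serially in Python; nothing here calls MPI, and the function names are hypothetical. Each simulated process draws nint(x) for x in [0, 1[, and unanimity is detected from the sum of the 0/1 choices, which in the MPI version would plausibly come from a reduction such as MPI_ALLREDUCE with MPI_SUM, so that every process takes the same decision to stop.

```python
import random


def nint01(x):
    """Nearest integer for x in [0, 1[, rounding halves up like Fortran nint."""
    return int(x + 0.5)


def unanimous(choices):
    """True when every process made the same 0/1 choice: the sum of the
    choices is then either 0 or the number of processes."""
    total = sum(choices)
    return total == 0 or total == len(choices)


def toss_until_unanimity(generators, max_tries):
    """generators[r] is the random generator of simulated process r.

    Returns (number of tries, choices of the last try)."""
    tries, choices = 0, []
    while tries < max_tries:  # the Fortran version would use do while
        tries += 1
        choices = [nint01(g.random()) for g in generators]
        if unanimous(choices):
            break
    return tries, choices


if __name__ == "__main__":
    gens = [random.Random(1000 + rank) for rank in range(4)]
    print(toss_until_unanimity(gens, 100))
```

Giving every simulated process the same seed makes the first try unanimous, which is the problem the next paragraph addresses.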
If each process generates a pseudo-random number using the random_number subroutine, they will all generate the same value at the first draw, so there will be unanimity from the outset and the problem becomes pointless. It is therefore necessary to change the default behavior (which is legitimate, since it allows similar executions of a code to be reproduced on different machines). To do this, we need to set a different seed value on each process to initialize the pseudo-random number generator, by calling the random_seed subroutine. As the values must differ from one process to another, we use the clock time (although its precision is not sufficient on some machines) and the rank. In addition, the size of the seed of the pseudo-random number generator depends on the algorithm and compiler used. To be portable, we therefore first obtain the size of the seed by calling random_seed with the size argument; with this size, we allocate an array and initialize it. This array is then passed in the next call to random_seed with the put argument, in order to fix the seed for future sequences of pseudo-random number generation.
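The seeding recipe can be mirrored in Python as a sketch. The Fortran steps are random_seed(size=n), allocating and filling an array of length n, then random_seed(put=seed); here make_seed plays the role of the filling step, and its mixing formula (clock + 37*i + 104729*rank) is a hypothetical choice, not the course's. Python's Random accepts a single seed, so the array is joined into a string.

```python
import random
import time


def make_seed(size, rank, clock=None):
    """Seed array of the requested size, different on every rank.

    size corresponds to the value returned by random_seed(size=n);
    the mixing formula is hypothetical."""
    if clock is None:
        clock = time.time_ns()  # like system_clock: resolution varies between machines
    return [clock + 37 * i + 104729 * rank for i in range(size)]


def generator_for(rank, size=8, clock=None):
    """Generator of one simulated process, seeded from the clock and its rank."""
    seed = make_seed(size, rank, clock)
    return random.Random(",".join(str(s) for s in seed))
```

Passing an explicit clock value makes runs reproducible, which is the default behavior the paragraph above describes.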
MPI Hands-On Exercise 4: Matrix transpose

The goal of this exercise is to practice with derived datatypes. A is a matrix with 5 rows and 4 columns defined on process 0. Process 0 sends its matrix A to process 1 and transposes it during the send.

Figure 2: Matrix transpose (from process 0 to process 1)

To do this, we need to create two derived datatypes: type_line and type_transpose.
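One possible construction, offered as an assumption rather than the course's solution: in Fortran (column-major) storage, a row of the 5×4 matrix is 4 elements separated by a stride of 5, which MPI_TYPE_VECTOR(4, 1, 5, ...) describes; type_transpose can then stack 5 such rows, each shifted by one element, with MPI_TYPE_CREATE_HVECTOR using a byte stride of one real. The serial Python sketch below lists the element offsets these types select and checks that storing them contiguously on the receiver yields a 4×5 matrix equal to the transpose.

```python
NROWS, NCOLS = 5, 4  # A is 5 x 4 on process 0, stored column-major as in Fortran


def colmajor_index(i, j, nrows):
    """0-based offset of element (i, j) in a column-major array."""
    return i + j * nrows


def type_line_offsets(nrows=NROWS, ncols=NCOLS):
    """One row: ncols elements with stride nrows (a vector type)."""
    return [j * nrows for j in range(ncols)]


def type_transpose_offsets(nrows=NROWS, ncols=NCOLS):
    """nrows copies of type_line, each shifted by one element (an hvector of type_line)."""
    line = type_line_offsets(nrows, ncols)
    return [k + off for k in range(nrows) for off in line]


def send_transposed(a_storage, nrows=NROWS, ncols=NCOLS):
    """Elements in the order type_transpose sends them. The receiver stores
    them contiguously, so its ncols x nrows column-major matrix is A^T."""
    return [a_storage[off] for off in type_transpose_offsets(nrows, ncols)]
```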
MPI Hands-On Exercise 5: Matrix-matrix product

Collective communications: matrix-matrix product C = A × B. The matrices are square and their size is a multiple of the number of processes. The matrices A and B are defined on process 0. Process 0 sends a horizontal slice of matrix A and a vertical slice of matrix B to each process. Each process then computes its diagonal block of matrix C. To compute the non-diagonal blocks, each process sends its own slice of A to the other processes (see Figure 3). At the end, process 0 gathers and verifies the results.
Figure 3: Distributed matrix product (matrices A, B and C)
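The block decomposition can be checked with a serial Python simulation; it is not MPI code, and the function names are hypothetical. Process r holds horizontal slice r of A and vertical slice r of B, so it produces column slice r of C: the block C(r, r) from its own A slice, and the block C(s, r) from the A slice received from process s.

```python
def matmul(a, b):
    """Reference product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def distributed_product(a, b, nprocs):
    """Serial simulation of the slice-based algorithm with nprocs processes."""
    n = len(a)
    if n % nprocs != 0:
        raise ValueError("the matrix size must be a multiple of the number of processes")
    w = n // nprocs
    a_slices = [a[r * w:(r + 1) * w] for r in range(nprocs)]                   # horizontal slices of A
    b_slices = [[row[r * w:(r + 1) * w] for row in b] for r in range(nprocs)]  # vertical slices of B
    c = [[0] * n for _ in range(n)]
    for r in range(nprocs):          # work of process r: column slice r of C
        for s in range(nprocs):      # s == r: diagonal block; otherwise A slice from process s
            block = matmul(a_slices[s], b_slices[r])
            for i in range(w):
                c[s * w + i][r * w:(r + 1) * w] = block[i]
    return c                         # in MPI, process 0 would gather the column slices
```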
The algorithm that may seem the most immediate and the easiest to program, in which each process sends its slice of matrix A to each of the others, does not perform well because its communication pattern is not well balanced. This is easy to see when taking performance measurements and displaying the collected traces graphically. See the files produit_matrices_v1_n3200_p4.slog2, produit_matrices_v1_n6400_p8.slog2 and produit_matrices_v1_n6400_p16.slog2, viewed with Jumpshot, the trace viewer of MPE (Multi-Processing Environment).
Figure 4: Parallel matrix product on 4 processes, for a matrix size of 3200 (first algorithm)
Figure 5: Parallel matrix product on 16 processes, for a matrix size of 6400 (first algorithm)
By changing the algorithm so that the slices are shifted from process to process, we obtain a perfect balance between computation and communication, and a speedup of 2 compared with the naive algorithm. See the figure produced from the file produit_matrices_v2_n6400_p16.slog2.

Figure 6: Parallel matrix product on 16 processes, for a matrix size of 6400 (second algorithm)
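The shifted algorithm can be described by a schedule. In this serial Python sketch (a plausible ring ordering, not necessarily the exact one in the course solution), at step k process r holds the A slice s = (r + k) mod p, computes block C(s, r), then sends its slice to process (r - 1) mod p and receives the next one from process (r + 1) mod p, for instance with MPI_SENDRECV_REPLACE. The helper is_balanced is hypothetical; it checks that no slice is needed by two processes at the same step and that each process sees every slice exactly once.

```python
def ring_schedule(nprocs):
    """schedule[k][r] = index of the A slice held by process r at step k."""
    return [[(r + k) % nprocs for r in range(nprocs)] for k in range(nprocs)]


def is_balanced(schedule):
    """At every step the slices held form a permutation of 0..p-1, and over
    all steps each process sees every slice exactly once."""
    p = len(schedule)
    per_step = all(sorted(step) == list(range(p)) for step in schedule)
    per_proc = all(sorted(step[r] for step in schedule) == list(range(p))
                   for r in range(p))
    return per_step and per_proc
```

Step 0 is the diagonal block; because every process sends exactly one slice and receives exactly one per step, no process becomes a communication bottleneck.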
MPI Hands-On Exercise 6: Communicators

Using the Cartesian topology defined below, subdivide it into 2 communicators, one per row, by calling MPI_COMM_SPLIT().

Figure 7: Subdivision of a 2D topology and communication using the obtained 1D topology (in each of rows 0 and 1, the figure shows an array v(:)=1,2,3,4 and the values w=1., w=2., w=3., ... on the processes of that row)
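The semantics of MPI_COMM_SPLIT can be modelled serially: processes passing the same color end up in the same new communicator, ranked by key, with ties broken by their rank in the parent communicator. The Python sketch below uses the row coordinate as color and the column coordinate as key; the 2 × 4 grid is a hypothetical reading of Figure 7, the function names are illustrative, and MPI_UNDEFINED colors are not modelled.

```python
from collections import defaultdict


def comm_split(colors, keys):
    """Serial model of MPI_COMM_SPLIT: process r of the parent communicator
    passes colors[r] and keys[r].

    Returns {parent_rank: (color, new_rank, new_size)}."""
    groups = defaultdict(list)
    for rank, color in enumerate(colors):
        groups[color].append(rank)
    result = {}
    for color, members in groups.items():
        members.sort(key=lambda r: (keys[r], r))
        for new_rank, r in enumerate(members):
            result[r] = (color, new_rank, len(members))
    return result


def cart_coords(rank, dims):
    """Row-major coordinates: MPI's default rank numbering for Cartesian topologies."""
    return divmod(rank, dims[1])


if __name__ == "__main__":
    dims = (2, 4)  # hypothetical grid: 2 rows of 4 processes
    coords = [cart_coords(r, dims) for r in range(8)]
    rows = comm_split([c[0] for c in coords], [c[1] for c in coords])
    for r in range(8):
        print(r, coords[r], rows[r])
```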
MPI Hands-On Exercise 7: Read an MPI-IO file

We have a binary file data.dat containing 484 integer values. With 4 processes, the exercise consists of reading the first 121 values on process 0, the next 121 on process 1, and so on. We will use 4 different methods:
- read via explicit offsets, in individual mode;
- read via shared file pointers, in collective mode;
- read via individual file pointers, in individual mode;
- read via shared file pointers, in individual mode.

To compile and execute the code, use make; to verify the results, use make verification, which runs a visualisation program corresponding to the four cases.
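The explicit-offset method boils down to an offset computation. With the default file view (whose elementary type is a byte), the offset given to a routine such as MPI_FILE_READ_AT would be rank × 121 × (size of an integer). The serial Python sketch below assumes 4-byte integers and builds a hypothetical stand-in for data.dat holding 1..484, so that each simulated process can read its block with a seek.

```python
import os
import struct
import tempfile

NVALUES, NPROCS = 484, 4
INT_SIZE = 4  # assumption: 4-byte integers in data.dat


def write_test_file(path, nvalues=NVALUES):
    """Hypothetical stand-in for data.dat: the integers 1..nvalues."""
    with open(path, "wb") as fh:
        fh.write(struct.pack(f"={nvalues}i", *range(1, nvalues + 1)))


def explicit_offset(rank, nvalues=NVALUES, nprocs=NPROCS):
    """Byte offset at which process `rank` starts reading."""
    return rank * (nvalues // nprocs) * INT_SIZE


def read_block(path, rank, nvalues=NVALUES, nprocs=NPROCS):
    """Read the block of one process via an explicit offset."""
    count = nvalues // nprocs
    with open(path, "rb") as fh:
        fh.seek(explicit_offset(rank, nvalues, nprocs))
        return list(struct.unpack(f"={count}i", fh.read(count * INT_SIZE)))


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    write_test_file(path)
    for rank in range(NPROCS):
        block = read_block(path, rank)
        print(rank, block[0], block[-1])
    os.remove(path)
```

The two shared-pointer methods need no offset arithmetic; in the collective case (for example MPI_FILE_READ_ORDERED) the accesses happen in rank order, which produces the same blocks.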
MPI Hands-On Exercise 8: Poisson's equation

Solution of the following Poisson equation:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x,y) \quad \text{in } [0,1]\times[0,1]$$
$$u(x,y) = 0 \quad \text{on the boundaries}$$
$$f(x,y) = 2\left(x^2 - x + y^2 - y\right)$$

We will solve this equation with a domain decomposition method:
- the equation is discretized on the domain with a finite difference method;
- the resulting system is solved with a Jacobi solver;
- the global domain is split into sub-domains.

The exact solution is known and is $u_{exact}(x,y) = x\,y\,(x-1)(y-1)$.
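As a quick check of the statement, substituting the exact solution into the equation recovers f:

```latex
\begin{aligned}
u_{exact}(x,y) &= (x^2 - x)(y^2 - y)\\
\frac{\partial^2 u_{exact}}{\partial x^2} &= 2\,(y^2 - y), \qquad
\frac{\partial^2 u_{exact}}{\partial y^2} = 2\,(x^2 - x)\\
\frac{\partial^2 u_{exact}}{\partial x^2} + \frac{\partial^2 u_{exact}}{\partial y^2}
  &= 2\,(x^2 - x + y^2 - y) = f(x,y)
\end{aligned}
```

u_exact also vanishes on x = 0, x = 1, y = 0 and y = 1. Because u_exact is quadratic in each variable, centered finite differences reproduce its second derivatives exactly, so the discrete solution coincides with u_exact at the grid points. This is why reaching machine precision can be treated as reaching the exact solution.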
To discretize the equation, we define a grid with a set of points $(x_i, y_j)$:

$$x_i = i\,h_x \quad \text{for } i = 0,\dots,ntx+1, \qquad y_j = j\,h_y \quad \text{for } j = 0,\dots,nty+1,$$
$$h_x = \frac{1}{ntx+1}, \qquad h_y = \frac{1}{nty+1},$$

where:
- $h_x$: x-wise step;
- $h_y$: y-wise step;
- ntx: number of x-wise interior points;
- nty: number of y-wise interior points.

In total, there are ntx+2 points in the x direction and nty+2 points in the y direction.
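Let $u_{ij}$ be the estimated solution at position $x_i = i\,h_x$ and $y_j = j\,h_y$. Writing the five-point finite difference approximation of the Laplacian and solving it for $u_{ij}$, the Jacobi solver consists of computing:

$$u^{n+1}_{ij} = c_0\left(c_1\left(u^n_{i+1,j} + u^n_{i-1,j}\right) + c_2\left(u^n_{i,j+1} + u^n_{i,j-1}\right) - f_{ij}\right)$$

with:

$$c_0 = \frac{1}{2}\,\frac{h_x^2\,h_y^2}{h_x^2 + h_y^2}, \qquad c_1 = \frac{1}{h_x^2}, \qquad c_2 = \frac{1}{h_y^2}$$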
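A sequential reference solver follows directly from these formulas. The Python sketch below (a stand-in for the course's Fortran sequential version, with hypothetical names) stores u[i][j] ≈ u(x_i, y_j) on the (ntx+2) × (nty+2) grid, keeps the boundary values at 0, and stops when the largest change between two iterates falls below a tolerance; using that change as the "error" is an assumption.

```python
def jacobi_coefficients(hx, hy):
    """c0, c1, c2 of the Jacobi update."""
    c0 = 0.5 * hx**2 * hy**2 / (hx**2 + hy**2)
    return c0, 1.0 / hx**2, 1.0 / hy**2


def f(x, y):
    return 2.0 * (x * x - x + y * y - y)


def u_exact(x, y):
    return x * y * (x - 1.0) * (y - 1.0)


def solve_serial(ntx, nty, tol=1e-12, max_iter=100_000):
    """Jacobi iterations until the largest change drops below tol."""
    hx, hy = 1.0 / (ntx + 1), 1.0 / (nty + 1)
    c0, c1, c2 = jacobi_coefficients(hx, hy)
    fv = [[f(i * hx, j * hy) for j in range(nty + 2)] for i in range(ntx + 2)]
    u = [[0.0] * (nty + 2) for _ in range(ntx + 2)]
    for it in range(1, max_iter + 1):
        new = [row[:] for row in u]
        diff = 0.0
        for i in range(1, ntx + 1):
            for j in range(1, nty + 1):
                new[i][j] = c0 * (c1 * (u[i + 1][j] + u[i - 1][j])
                                  + c2 * (u[i][j + 1] + u[i][j - 1]) - fv[i][j])
                diff = max(diff, abs(new[i][j] - u[i][j]))
        u = new
        if diff < tol:
            break
    return u, it, hx, hy
```

Since the discrete system is satisfied exactly by u_exact at the grid points, the converged iterate should agree with u_exact to within a small multiple of the tolerance.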
In parallel, the interface values of the sub-domains must be exchanged between neighbouring processes. Ghost cells are used as receive buffers.
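The role of the ghost cells can be shown with a serial Python simulation, simplified to strips along x only (the exercise uses a 2D decomposition). Each simulated process owns rows sx..ex plus one ghost row on each side. Before every Jacobi step, the interface rows are copied into the neighbours' ghost rows, which is what an exchange such as MPI_SENDRECV would do, and the result is bitwise identical to the sequential iteration. All names are hypothetical.

```python
def jacobi_step(u, f, c0, c1, c2):
    """One Jacobi update of the interior of a local array; the first and last
    rows/columns (ghost cells or physical boundary) are left untouched."""
    new = [row[:] for row in u]
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            new[i][j] = c0 * (c1 * (u[i + 1][j] + u[i - 1][j])
                              + c2 * (u[i][j + 1] + u[i][j - 1]) - f[i][j])
    return new


def split(ntx, nprocs):
    """Block distribution of the interior rows 1..ntx: one (sx, ex) per process."""
    base, extra = divmod(ntx, nprocs)
    bounds, sx = [], 1
    for p in range(nprocs):
        n = base + (1 if p < extra else 0)
        bounds.append((sx, sx + n - 1))
        sx += n
    return bounds


def parallel_jacobi(f, coeffs, ntx, nty, nprocs, iters):
    """Serial simulation of nprocs strips, each with one ghost row per side."""
    bounds = split(ntx, nprocs)
    # Local arrays cover global rows sx-1 .. ex+1; their first and last rows are
    # ghosts (or the physical boundary for the first and last strip).
    local_u = [[[0.0] * (nty + 2) for _ in range(ex - sx + 3)] for sx, ex in bounds]
    local_f = [[f[i][:] for i in range(sx - 1, ex + 2)] for sx, ex in bounds]
    for _ in range(iters):
        # Interface exchange: each strip copies its last owned row into the next
        # strip's first ghost row, and its first owned row into the previous
        # strip's last ghost row.
        for p in range(nprocs - 1):
            local_u[p + 1][0] = local_u[p][-2][:]
            local_u[p][-1] = local_u[p + 1][1][:]
        local_u = [jacobi_step(u, fl, *coeffs) for u, fl in zip(local_u, local_f)]
    # Gather the owned rows back into a global array.
    glob = [[0.0] * (nty + 2) for _ in range(ntx + 2)]
    for (sx, ex), u in zip(bounds, local_u):
        for k, i in enumerate(range(sx, ex + 1), start=1):
            glob[i] = u[k][:]
    return glob
```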
Figure 8: Exchange of points on the interfaces with the neighbours N, S, E and W
Figure 9: Numbering of points in the different sub-domains (a sub-domain owns the points u(sx:ex, sy:ey); values from neighbouring sub-domains are stored at indices sx-1, ex+1, sy-1 and ey+1, e.g. u(sx-1,sy), u(ex+1,sy), u(sx,sy-1) and u(sx,ey+1))
Figure 10: Process rank numbering in the sub-domains (x and y axes)
Figure 11: Writing the global matrix u to a file (processes 0 to 3 each write their own part of the file)

You need to:
- define a view, so that each process sees only its own part of the global matrix u;
- define a type, in order to write the local part of matrix u (without the interfaces);
- apply the view to the file;
- write using only one call.
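Both the view and the type can be built as 2D subarrays, for example with MPI_TYPE_CREATE_SUBARRAY and MPI_ORDER_FORTRAN (one possible choice, not necessarily the course's). The serial Python sketch below lists the element offsets each subarray selects and pairs them in order, as a single write would. It assumes that the file stores only the ntx × nty interior points in Fortran order; the function names are hypothetical.

```python
def subarray_offsets(sizes, subsizes, starts):
    """Element offsets selected by a 2D subarray in Fortran (column-major) order,
    as MPI_TYPE_CREATE_SUBARRAY(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, ...)
    would describe them; starts are 0-based, as MPI requires."""
    n1 = sizes[0]
    s1, s2 = subsizes
    st1, st2 = starts
    return [(st1 + i) + (st2 + j) * n1 for j in range(s2) for i in range(s1)]


def flatten_colmajor(a):
    """a[i][j] listed in Fortran storage order (first index varies fastest)."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def write_block(file_buf, ntx, nty, sx, ex, sy, ey, local):
    """Place the owned part of one sub-domain into the global file image.

    local holds u(sx-1:ex+1, sy-1:ey+1), i.e. one layer of ghost cells.
    Assumption: the file stores only the ntx x nty interior points."""
    nx, ny = ex - sx + 1, ey - sy + 1
    mem = subarray_offsets((nx + 2, ny + 2), (nx, ny), (1, 1))      # type: skip the interfaces
    view = subarray_offsets((ntx, nty), (nx, ny), (sx - 1, sy - 1))  # view: owned part of u
    flat = flatten_colmajor(local)
    # Both lists follow the same element order, so the k-th element in memory
    # lands at the k-th offset of the view.
    for m, v in zip(mem, view):
        file_buf[v] = flat[m]
```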
- Initialisation of the MPI environment.
- Creation of the 2D Cartesian topology.
- Determination of the array indexes for each sub-domain.
- Determination of the 4 neighbour processes for each sub-domain.
- Creation of two derived datatypes, type_line and type_column.
- Exchange of the values on the interfaces with the other sub-domains.
- Computation of the global error. When the global error is lower than a specified value (machine precision, for example), we consider that we have reached the exact solution.
- Collection of the global matrix u (the same one as obtained in the sequential version) in an MPI-IO file data.dat.
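Two of these steps can be illustrated with a serial Python model: finding the neighbours in a non-periodic Cartesian grid, in the way MPI_CART_SHIFT returns a (source, destination) pair with MPI_PROC_NULL at the edges, and combining local errors into the global error, as MPI_ALLREDUCE with MPI_MAX would. The mapping of directions to N, S, E and W depends on Figure 10 and is not assumed here; PROC_NULL = -1 and the function names are hypothetical.

```python
PROC_NULL = -1  # stand-in for MPI_PROC_NULL (no neighbour)


def cart_coords(rank, dims):
    """Row-major coordinates, MPI's default numbering for Cartesian topologies."""
    return divmod(rank, dims[1])


def cart_rank(coords, dims):
    return coords[0] * dims[1] + coords[1]


def cart_shift(rank, dims, direction, periods=(False, False)):
    """(source, dest) for a displacement of 1 along `direction`,
    modelled on MPI_CART_SHIFT(comm, direction, 1, source, dest)."""
    result = []
    for disp in (-1, 1):
        c = list(cart_coords(rank, dims))
        c[direction] += disp
        if 0 <= c[direction] < dims[direction]:
            result.append(cart_rank(c, dims))
        elif periods[direction]:
            c[direction] %= dims[direction]
            result.append(cart_rank(c, dims))
        else:
            result.append(PROC_NULL)
    return tuple(result)


def global_error(local_errors):
    """The value MPI_ALLREDUCE with MPI_MAX would return on every process."""
    return max(local_errors)
```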
Directory: tp8/poisson

A skeleton of the parallel version is provided: it consists of a main program (poisson.f90) and several subroutines. All the modifications have to be made in the file module_parallel_mpi.f90. To compile and execute the code, use make; to verify the results, use make verification, which runs a program that reads the data.dat file and compares it with the sequential version.
More informationProgramming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
More informationFigure 1: Graphical example of a mergesort 1.
CSE 30321 Computer Architecture I Fall 2011 Lab 02: Procedure Calls in MIPS Assembly Programming and Performance Total Points: 100 points due to its complexity, this lab will weight more heavily in your
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More informationParallel Algorithm for Dense Matrix Multiplication
Parallel Algorithm for Dense Matrix Multiplication CSE633 Parallel Algorithms Fall 2012 Ortega, Patricia Outline Problem definition Assumptions Implementation Test Results Future work Conclusions Problem
More informationLarge-Scale Reservoir Simulation and Big Data Visualization
Large-Scale Reservoir Simulation and Big Data Visualization Dr. Zhangxing John Chen NSERC/Alberta Innovates Energy Environment Solutions/Foundation CMG Chair Alberta Innovates Technology Future (icore)
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationParallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises
Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Pierre-Yves Taunay Research Computing and Cyberinfrastructure 224A Computer Building The Pennsylvania State University University
More informationLecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
More informationDifferentiating a Time-dependent CFD Solver
Differentiating a Time-dependent CFD Solver Presented to The AD Workshop, Nice, April 2005 Mohamed Tadjouddine & Shaun Forth Engineering Systems Department Cranfield University (Shrivenham Campus) Swindon
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationHow High a Degree is High Enough for High Order Finite Elements?
This space is reserved for the Procedia header, do not use it How High a Degree is High Enough for High Order Finite Elements? William F. National Institute of Standards and Technology, Gaithersburg, Maryland,
More informationPerformance Improvement of Application on the K computer
Performance Improvement of Application on the K computer November 13, 2011 Kazuo Minami Team Leader, Application Development Team Research and Development Group Next-Generation Supercomputer R & D Center
More informationThe Pointless Machine and Escape of the Clones
MATH 64091 Jenya Soprunova, KSU The Pointless Machine and Escape of the Clones The Pointless Machine that operates on ordered pairs of positive integers (a, b) has three modes: In Mode 1 the machine adds
More informationUniversity of Amsterdam - SURFsara. High Performance Computing and Big Data Course
University of Amsterdam - SURFsara High Performance Computing and Big Data Course Workshop 7: OpenMP and MPI Assignments Clemens Grelck C.Grelck@uva.nl Roy Bakker R.Bakker@uva.nl Adam Belloum A.S.Z.Belloum@uva.nl
More informationLoad Balancing Techniques
Load Balancing Techniques 1 Lecture Outline Following Topics will be discussed Static Load Balancing Dynamic Load Balancing Mapping for load balancing Minimizing Interaction 2 1 Load Balancing Techniques
More informationFINITE DIFFERENCE METHODS
FINITE DIFFERENCE METHODS LONG CHEN Te best known metods, finite difference, consists of replacing eac derivative by a difference quotient in te classic formulation. It is simple to code and economic to
More informationWe will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.
LING115 Lecture Note Session #4 Python (1) 1. Introduction As we have seen in previous sessions, we can use Linux shell commands to do simple text processing. We now know, for example, how to count words.
More informationFACTORING SPARSE POLYNOMIALS
FACTORING SPARSE POLYNOMIALS Theorem 1 (Schinzel): Let r be a positive integer, and fix non-zero integers a 0,..., a r. Let F (x 1,..., x r ) = a r x r + + a 1 x 1 + a 0. Then there exist finite sets S
More information