High performance computing systems. Lab 1




Dept. of Computer Architecture, Faculty of ETI, Gdansk University of Technology
Paweł Czarnul

For this exercise, study basic MPI functions such as:

1. MPI management: MPI_Init(...), MPI_Finalize()

Each MPI program should start with MPI_Init(...) and finish with MPI_Finalize(). Each process can fetch the number of processes in the default communicator MPI_COMM_WORLD (the application) by calling MPI_Comm_size() (see the example below). Processes in an MPI application are identified by so-called ranks, ranging from 0 to n-1, where n is the number of processes returned by MPI_Comm_size(). Based on its rank, each process can perform a part of the required computations so that all processes contribute to the final goal and together process all required data.

2. Point-to-point communication: MPI_Send(...), MPI_Recv(...)

   int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm)

MPI_Send sends the data pointed to by buf to the process with rank dest. There should be count elements of data type dtype. For instance, when sending 5 doubles, count should be 5 and dtype should be MPI_DOUBLE. tag can be any number that additionally describes the message, and comm can be MPI_COMM_WORLD for the default communicator.

   int MPI_Recv(void *buf, int count, MPI_Datatype dtype, int src, int tag, MPI_Comm comm, MPI_Status *stat)

MPI_Recv is a blocking receive which waits for a message with tag tag from the process with rank src in communicator comm. dtype and count denote the type and the number of elements to be received and stored in buf. stat holds information about the received message.

3. Collective communication: MPI_Barrier(...), MPI_Gather(...), MPI_Scatter(...), MPI_Allgather(...).

As an example,

   int MPI_Reduce(void *sbuf, void *rbuf, int count, MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm)

reduces the values given by all processes in communicator comm to a single value in the process with rank root. See the code below for adding numbers contributed by all processes into a single value in process 0.

Study the following tutorial on MPI: http://www.lam-mpi.org/tutorials/

The following example computes pi in parallel using an old method from the 17th century:

   Pi/4 = 1/1 - 1/3 + 1/5 - 1/7 + 1/9 - ...   (1)

Note that the program works for any number of processes requested. Successive elements of (1) are assigned to successive processes with ranks from 0 to (proccount-1), in a round-robin fashion. For 2 processes:

   Pi/4    = 1/1 - 1/3 + 1/5 - 1/7 + 1/9 - ...
   process:   0     1     0     1     0

For 3 processes:

   Pi/4    = 1/1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 - ...
   process:   0     1     2     0     1     2

etc. This is a simple load balancing technique. For example, checking whether successive numbers are prime takes more time for larger numbers; this strategy balances the execution time among the processes quite well. Note that in reality we only consider a predefined number of elements of (1). In general, we should make sure that the data types used for adding the numbers can store the resulting subsums.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    double precision = 1000000000;
    int myrank, proccount;
    double pi, pi_final;
    int mine, sign;
    int i;

    // Initialize MPI
    MPI_Init(&argc, &argv);
    // find out my rank
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    // find out the number of processes in MPI_COMM_WORLD
    MPI_Comm_size(MPI_COMM_WORLD, &proccount);

    // now distribute the required precision
    if (precision < proccount) {
        printf("precision smaller than the number of processes - try again.");
        MPI_Finalize();
        return -1;
    }

    // each process performs computations on its part
    pi = 0;
    mine = myrank * 2 + 1;
    sign = (((mine - 1) / 2) % 2) ? -1 : 1;
    for (; mine < precision;) {
        // printf("\nprocess %d %d %d", myrank, sign, mine);
        // fflush(stdout);
        pi += sign / (double) mine;
        mine += 2 * proccount;
        sign = (((mine - 1) / 2) % 2) ? -1 : 1;
    }

    // now merge the numbers to rank 0
    MPI_Reduce(&pi, &pi_final, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (!myrank) {
        pi_final *= 4;
        printf("pi=%f", pi_final);
    }

    // Shut down MPI
    MPI_Finalize();
    return 0;
}

Assuming the code was saved in file program.c, we have to:

1. compile the code:

   mpicc program.c

2. run it:

1 process:

   [klaster@n01 1]$ time mpirun -np 1 ./a.out
   real 0m9.286s
   user 0m9.244s
   sys  0m0.037s

2 processes:

   [klaster@n01 1]$ time mpirun -np 2 ./a.out
   real 0m4.706s
   user 0m9.286s
   sys  0m0.063s

4 processes:

   [klaster@n01 1]$ time mpirun -np 4 ./a.out
   real 0m2.420s
   user 0m9.380s
   sys  0m0.118s

Note the smaller execution times for larger numbers of processes used for computations.

Lab 527: for this lab, you can use the default MPI implementation - Open MPI - on the desXX computers in the lab (XX ranges from 01 to 18).

Compile the code:

   student@des01:~> mpicc program.c

Create a configuration for the virtual machine - in this case just 2 nodes (des01 and des02):

   student@des01:~> cat > machinefile
   des01
   des02

then invoke the application for 1 process (running on des01):

   student@des01:~> mpirun -machinefile ./machinefile -np 1 time ./a.out
   9.25user 0.01system 0:09.27elapsed 99%CPU (0avgtext+0avgdata 13008maxresident)k
   0inputs+0outputs (0major+1009minor)pagefaults 0swaps

and 2 processes (running on des01 and des02):

   student@des01:~> mpirun -machinefile ./machinefile -np 2 time ./a.out
   4.63user 0.01system 0:04.65elapsed 99%CPU (0avgtext+0avgdata 13072maxresident)k
   0inputs+0outputs (0major+1013minor)pagefaults 0swaps
   4.63user 0.01system 0:04.67elapsed 99%CPU (0avgtext+0avgdata 13312maxresident)k
   0inputs+0outputs (0major+1023minor)pagefaults 0swaps

You can create a larger virtual machine and test the scalability of the application.

Lab 527: you can also use MPICH on the desXX computers:

   student@des01:~> /opt/mpich/ch-p4/bin/mpicc program.c
   program.c: In function 'main':
   program.c:12:7: warning: unused variable 'i'
   student@des01:~> scp a.out des02:~
   a.out                  100% 1427KB   1.4MB/s   00:00
   student@des01:~> scp a.out des03:~
   a.out                  100% 1427KB   1.4MB/s   00:00
   student@des01:~> scp a.out des04:~

Now run the code:

1 process:

   student@des01:~> /opt/mpich/ch-p4/bin/mpirun -np 1 -machinefile ./machinefile ./a.out

2 processes:

   student@des01:~> /opt/mpich/ch-p4/bin/mpirun -np 2 -machinefile ./machinefile ./a.out

4 processes:

   student@des01:~> /opt/mpich/ch-p4/bin/mpirun -np 4 -machinefile ./machinefile ./a.out

Cluster KASK: reach the cluster by ssh studentX@n01.eti.pg.gda.pl, where X is a number from 1 to 18.

The following MPI implementations are available on cluster KASK (use a full path for running mpicc and mpirun):

1. MPICH - executables such as mpicc and mpirun available in /opt/mpich2/gnu/bin/
2. Open MPI - executables in /opt/sun-ct/bin/
3. MVAPICH - executables in /usr/mpi/gcc/mvapich-1.2.0/bin/

Note: the following nodes are available on the cluster:

   n01 - access node
   compute-0-0, compute-0-1, ..., compute-0-8 - compute nodes

Bibliography

MPI Docs: http://www.mpi-forum.org/docs/docs.html