An Introduction to Parallel Computing With MPI
Computing Lab I


The purpose of the first programming exercise is to become familiar with the operating environment on a parallel computer, and to create and run a simple parallel program using MPI. In code development, it is always a good idea to start simple and develop/debug each piece of code before adding more complexity. This first code will implement the basic MPI structure, query communicator info, and output the process rank: the classic hello world program. You will learn how to compile parallel programs and submit batch jobs using the scheduler.

Write a basic hello world code which creates an MPI environment, determines the number of processes in the global communicator, and writes the rank of each process to standard output. You will have to use the correct MPI language binding depending on which programming language you are using: FORTRAN, C, or C++. The general program structure for each language is shown below. You can write your code in the editor TextWrangler, which is installed on the lab computers. This program allows you to edit and save source code files either locally on the lab computers or remotely on socrates, and transfer files as needed using sftp.

FORTRAN 90

    program MPIhelloworld
    implicit none
    include 'mpif.h'                                  ! Include the MPI header file
    integer :: ierr, pid, np

    call MPI_INIT(ierr)                               ! Initialize MPI environment
    call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)      ! Get number of processes (np)
    call MPI_COMM_RANK(MPI_COMM_WORLD, pid, ierr)     ! Get local rank (pid)
    write(*,*) 'I am process:', pid
    call MPI_FINALIZE(ierr)                           ! Terminate MPI environment

    stop
    end program MPIhelloworld

C

    #include <mpi.h>      /* Include the MPI header file */
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int ierr, pid, np;

        ierr = MPI_Init(&argc, &argv);        /* Initialize MPI environment */
        MPI_Comm_size(MPI_COMM_WORLD, &np);   /* Get number of processes (np) */
        MPI_Comm_rank(MPI_COMM_WORLD, &pid);  /* Get local rank (pid) */
        printf("I am process: %d\n", pid);
        MPI_Finalize();                       /* Terminate MPI environment */

        return 0;
    }

C++

    #include <mpi.h>      // Include the MPI header file
    #include <iostream>

    int main(int argc, char **argv)
    {
        int pid, np;

        MPI::Init(argc, argv);                // Initialize MPI environment
        np  = MPI::COMM_WORLD.Get_size();     // Get number of processes (np)
        pid = MPI::COMM_WORLD.Get_rank();     // Get local rank (pid)
        std::cout << "I am process: " << pid << std::endl;
        MPI::Finalize();                      // Terminate MPI environment

        return 0;
    }

Save your source code to your home directory on socrates (from the TextWrangler File menu select Save to FTP/SFTP Server and log in). Now open a terminal program (such as Terminal or X11) and ssh to socrates. You should be able to log in with your NSID account. If you are not familiar with the UNIX command line environment, you can consult the attached document explaining all the basic commands you need to know.

Your home directory is the location where you will keep all your source code, the executables, and your input and output data files. Your parallel code is submitted from the home directory and you can read and write files from there. Most parallel computers provide a different directory with additional disk space should your program use very large data files.

socrates

Information about socrates is available on the site. Your account has been set up to use the OpenMPI implementation of the MPI standard. Socrates also has MPICH and LAM MPI installed. Socrates has the compilers gcc, g77, and gfortran available. To compile a parallel MPI program you need to use the compiler scripts provided by OpenMPI, which link the native compilers to the proper MPI libraries. The compiler scripts are mpif77 or mpif90 for FORTRAN programs, mpicc for C programs, and mpicxx for C++ programs. They can be passed any flag accepted by the underlying compilers. To do a basic build, use one of the following commands:

    []$ mpif90 -o executable sourcecode.f90
    []$ mpicc -o executable sourcecode.c
    []$ mpicxx -o executable sourcecode.cpp

Socrates uses the TORQUE/Moab batch system to manage the load distribution on the cluster. This load-leveling program creates a queuing system to manage the cluster, and users must submit their batch jobs to the queue. An outline of basic TORQUE commands (TORQUE evolved from software called PBS, the Portable Batch System) is given below. To submit a parallel job, you will need to create a job script. Using a text editor (TextWrangler), create a new file named myjobscript.pbs and type in all the necessary commands required to submit your parallel job to the queue. A sample job script is shown below. Note that PBS commands are preceded by #PBS and comment lines are inserted with a single #.

    #!/bin/sh
    # Sample PBS Script for use with OpenMPI on Socrates
    # Jason Hlady May 2010
    #
    # Specify the number of processors to use in the form of
    # nodes=X:ppn=Y, where X = number of computers (nodes),
    # Y = number of processors per computer
    #PBS -l nodes=1:ppn=1

    # Job name which will show up in queue, job output
    #PBS -N <my job name>

    # Optional: join error and output into one stream
    #PBS -j oe

    # Show what node the app started on--useful for serial jobs
    echo `hostname`

    cd $PBS_O_WORKDIR
    echo "Current working directory is `pwd`"
    echo "Starting run at: `date`"
    echo " "

    # Run the application
    mpirun <my program name>

    echo "Program finished with exit code $? at: `date`"

    exit 0
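The sample script requests a single processor (nodes=1:ppn=1). To run your program on more processes you would enlarge the resource request and launch the corresponding number of MPI processes. A possible variant is sketched below; the counts and the job name hello_mpi are only an illustration, and on many OpenMPI/TORQUE installations mpirun detects the allocated processors automatically so the -np flag can be omitted.

    # Request 2 nodes with 4 processors each
    #PBS -l nodes=2:ppn=4
    #PBS -N hello_mpi
    #PBS -j oe

    cd $PBS_O_WORKDIR

    # 2 nodes x 4 processors per node = 8 MPI processes
    mpirun -np 8 ./executable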

When you submit the batch job, TORQUE will assign a job ID number. The standard output and standard error of the job will be stored in the file myjobname.ojob_id# in your working directory. To submit your batch job simply enter

    qsub myjobscript.pbs

The job ID number will be output to screen. To observe the status of your job in the queue, type

    qstat

To kill a job enter

    qdel JOB_ID#

You can view the man pages of any of these commands for more information and options.

Computing Lab II

Option 1: Jacobi Iteration on a Two-Dimensional Mesh

This is a classic problem for learning the basics of building a parallel Single Program Multiple Data (SPMD) code with a domain decomposition approach and data dependency between processes. These issues are common to many parallel algorithms used in scientific programs. We will keep the algorithm as simple as possible so that you can focus on implementing the parallel communication and thinking about program efficiency.

Consider solving the temperature distribution on a two-dimensional grid with fixed temperature values on the boundaries.

Figure 1: Uniform grid of temperature values. Boundary values indicated by grey nodes.

The temperature values at all grid points can be stored in a two-dimensional data array, T(i,j). Starting from an initial guess for the temperature distribution (say T = 0 at all interior nodes (white squares)), we can calculate the final temperature distribution by repeatedly applying the calculation

    Tnew(i,j) = ( Told(i+1,j) + Told(i-1,j) + Told(i,j+1) + Told(i,j-1) ) / 4

over all interior nodes until the temperature values converge to the final solution. This is not a very efficient solver and it may take hundreds (or thousands) of sweeps of the grid before convergence, but it is the simplest algorithm you can use. An example FORTRAN 90 sequential program is given below.

    program jacobi
    ! A program solving the 2D heat equation using Jacobi iteration
    implicit none
    integer, parameter :: id=100, jd=100
    integer :: i, j, n, nmax
    real(kind=8), dimension(0:id+1,0:jd+1) :: Tnew, Told
    character(6) :: filename

    ! Initialize the domain
    Told = 0.0_8              ! initial condition
    Told(0,:)    = 80.0_8     ! right boundary condition
    Told(id+1,:) = 50.0_8     ! left boundary condition
    Told(:,0)    = 0.0_8      ! bottom boundary condition
    Told(:,jd+1) = 100.0_8    ! top boundary condition
    Tnew = Told

    ! Perform Jacobi iterations (nmax sweeps of the domain)
    nmax = 1000
    do n = 1, nmax
       ! Sweep interior nodes
       do i = 1, id
          do j = 1, jd
             Tnew(i,j) = ( Told(i+1,j) + Told(i-1,j) + Told(i,j+1) + Told(i,j-1) )/4.0_8
          end do
       end do
       ! Copy Tnew to Told and sweep again
       Told = Tnew
    end do

    ! Output field data to file
    50 format(102f6.1)
    filename = "t.dat"
    open(unit=20, file=filename, status="replace")
    do j = jd+1, 0, -1
       write(20,50) (Tnew(i,j), i=0,id+1)
    end do
    close(20)

    stop
    end program jacobi

Now parallelize the Jacobi solver. Use a simple one-dimensional domain decomposition as shown below.

Figure 2: 1D decomposition of the grid among processes 0, 1, ..., n.

Each process will perform iterations only on its subdomain, and will have to exchange temperature values with neighboring processes at the subdomain boundaries. You should create a row of ghost points to store these communicated values. The external row of boundary values around the global domain can also be considered ghost points. If you keep things basic, you should be able to write the parallel program in less than 70 lines of code!

Some tips and hints:

- To keep things simple, directly program the mapping of the domain to the processes, i.e. process 0 is on the left boundary, process n on the right boundary, and the rest in the middle. You can also directly specify the different boundary conditions for each process.
- After every process sweeps its local nodes once, you will have to communicate the updated temperature values at subdomain boundaries before the next sweep. This can be accomplished in two communication shifts: first everyone sends data to the process on the right and receives from the left, then everyone sends to the left and receives from the right. Make sure the communication pattern doesn't block (see the exchange sketch after this list).
- Since the data values you need to communicate may not be in contiguous memory locations in your 2D temperature data array, you can create a 1D buffer array, explicitly copy the data values in/out of the buffer, and use the buffer array in the MPI_SEND and MPI_RECV calls.
- You may want to look at the data field when the computation is done, and the easiest way to do this is to have every process write its local data array to a separate data file. You will have to use a different file name for every process, and one way to automatically generate file names (in FORTRAN 90) with the process id as the file name is with ASCII number-to-character conversion:

      filename = achar((pid-mod(pid,10))/10+48) // achar(mod(pid,10)+48) // ".dat"

  which gives the file name 12.dat for pid = 12.
- Try using MPI_SENDRECV instead of separate blocking send and receive calls. This will allow you to solve the case when the domain is periodic in the x-direction (roll the domain into a cylindrical shell with the two x-faces joined together) and process 0 communicates with process n.
- You can implement a grid convergence measure such as the rms of the difference between Tnew and Told on the global grid, and then stop the outer loop when the convergence measure is acceptably small (say 10^-5). To do this you will need to use collective communication calls to calculate the global convergence of the grid and to broadcast this value to all processes so that they stop at the same time (see the second sketch after this list).
- If you have the 1D domain decomposition working, you can try a 2D domain decomposition which subdivides the domain into squares instead of strips. This is a more efficient decomposition since the number of subdomain ghost points is reduced.
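A minimal sketch of the ghost exchange with MPI_SENDRECV is shown below, assuming the strip decomposition is along i, each process owns columns i = 1..nloc of a local array Told(0:nloc+1,0:jd+1), and pid/np have been obtained as in the hello world program. The names nloc, left, right, sendbuf and recvbuf are invented here for illustration, not taken from the handout.

    ! Ghost exchange sketch (illustrative names; assumes MPI is initialized)
    integer :: left, right, ierr, status(MPI_STATUS_SIZE)
    real(kind=8), dimension(0:jd+1) :: sendbuf, recvbuf

    left  = pid - 1                          ! neighbour ranks; MPI_PROC_NULL makes the
    right = pid + 1                          ! calls do nothing at the physical boundaries
    if (pid == 0)    left  = MPI_PROC_NULL   ! (for the periodic case use np-1 and 0 here)
    if (pid == np-1) right = MPI_PROC_NULL

    ! Shift 1: send my last column to the right, receive my left ghost column.
    ! Told(nloc,:) is not contiguous in memory, hence the copy into a 1D buffer.
    sendbuf = Told(nloc,0:jd+1)
    call MPI_SENDRECV(sendbuf, jd+2, MPI_DOUBLE_PRECISION, right, 1, &
                      recvbuf, jd+2, MPI_DOUBLE_PRECISION, left,  1, &
                      MPI_COMM_WORLD, status, ierr)
    if (left /= MPI_PROC_NULL) Told(0,0:jd+1) = recvbuf

    ! Shift 2: send my first column to the left, receive my right ghost column.
    sendbuf = Told(1,0:jd+1)
    call MPI_SENDRECV(sendbuf, jd+2, MPI_DOUBLE_PRECISION, left,  2, &
                      recvbuf, jd+2, MPI_DOUBLE_PRECISION, right, 2, &
                      MPI_COMM_WORLD, status, ierr)
    if (right /= MPI_PROC_NULL) Told(nloc+1,0:jd+1) = recvbuf

Because MPI_SENDRECV pairs the send and receive internally, the same two calls work whether the neighbours are real processes, MPI_PROC_NULL, or the wrap-around neighbours of the periodic case.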

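For the global convergence test, one possibility is to accumulate the local sum of squared changes after each sweep (before Told is overwritten by Tnew) and combine it with MPI_ALLREDUCE, which both reduces and broadcasts, so every process sees the same value and exits the outer loop together. This is only a sketch; dTsum_local, dTsum_global, dT and nglobal (the total number of interior points, id*jd) are illustrative names.

    ! Convergence check sketch, placed inside the outer iteration loop
    real(kind=8) :: dTsum_local, dTsum_global, dT

    ! Sum of squared changes over this process's interior nodes
    dTsum_local = sum( (Tnew(1:nloc,1:jd) - Told(1:nloc,1:jd))**2 )

    ! ALLREDUCE = reduce + broadcast: every process gets the same global sum
    call MPI_ALLREDUCE(dTsum_local, dTsum_global, 1, MPI_DOUBLE_PRECISION, &
                       MPI_SUM, MPI_COMM_WORLD, ierr)

    dT = sqrt(dTsum_global/dble(nglobal))    ! rms change over the global grid
    if (dT < 1.0e-5_8) exit                  ! all processes leave the loop together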
Option 2: Numerical Integration of a Set of Discrete Data

This problem uses a master-worker model where the master process divides up the data and sends it to the workers, who perform local computations on the data and communicate results back to the master. There is no data dependency between workers (they don't need to communicate with each other). This is an example of what is called an embarrassingly parallel problem.

Consider the numerical integration of a large set of discrete data values, which could represent points sampled from a function.

Figure 3: Discrete data values, f(x_i), where i = 1, 2, 3, ..., n.

To approximate the integral, we can fit straight lines between each pair of points and then compute the sum of the areas under each line segment. This is the trapezoid formula:

    integral = sum over i = 1 to n-1 of ( x(i+1) - x(i) ) * ( f(i) + f(i+1) ) / 2

The locations may not be evenly spaced. An example FORTRAN 90 code is given below.

    program integrate
    ! A program to numerically integrate discrete data from the file ptrace.dat
    implicit none
    integer, parameter :: n=960000        ! Number of points in data file
    integer :: i
    real(kind=8) :: integral
    real(kind=8), dimension(n) :: x, f

    ! Open data file and read in data
    open(unit=21, file="ptrace.dat", status="old")
    do i = 1, n
       read(21,*) x(i), f(i)
    end do
    close(21)

    ! Now compute global integral
    integral = 0.0_8
    do i = 1, n-1
       integral = integral + (x(i+1)-x(i))*(f(i)+f(i+1))/2.0_8   ! trapezoidal formula
    end do

    ! Output result
    write(*,*) "The integral of the data set is: ", integral

    stop
    end program integrate

Now parallelize this program using the master-worker model. The master process (choose process 0, which is always present) reads in data from the file, divides it up evenly and distributes it to the workers (all other processes). The workers compute the integral of their portions of the data and return the results to the master. The master sums the results to find the global integral and outputs the result. If you keep things simple, you should be able to write the parallel program in less than 60 lines of code.

Some tips and hints (sketches of the two communication approaches follow this list):

- If the data array is very large, the master process may not have enough local memory to store the entire array. In this case it would be better to read in only part of the data set at a time and send it to a worker (or workers), before reading in more data (overwriting previous values) and sending it to other workers, and so on.
- In order to make this algorithm efficient, we need to minimize the idle time of the workers (and the master) and balance the computational work as evenly as possible. If the number of processes is small and the data set is large, we may want the master process to help compute part of the integral while it is waiting for the workers to finish. Also, if the amount of data communicated to each worker is large (lots of bandwidth-related communication overhead), other workers will be idling while they wait for their data. Would it be more efficient to send smaller parcels of data to each worker so that they all get to work quickly, and then repeatedly send more data when they finish until all the work is done? But if the number of messages gets too large, then we will have increased latency-related overhead.
- You can try using non-blocking communication calls on the master process so that it can do other tasks while waiting for results from workers.
- You can also try using the scatter and reduce collective communication routines to implement the parallel program.
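A minimal sketch of the point-to-point master-worker version is given below. It assumes MPI is initialized, pid and np are known, rank 0 has already read x(1:n) and f(1:n) as in the sequential code, and the n-1 intervals divide evenly among the np-1 workers; the names nchunk, xloc, floc, partial, total and iw are invented here for illustration. Blocks overlap by one point so that no interval between neighbouring blocks is lost.

    ! Master-worker distribution sketch (illustrative names)
    real(kind=8), allocatable :: xloc(:), floc(:)
    real(kind=8) :: partial, total
    integer :: nchunk, iw, i1, i, ierr, status(MPI_STATUS_SIZE)

    nchunk = (n-1)/(np-1)                    ! intervals handled by each worker
    allocate(xloc(nchunk+1), floc(nchunk+1))

    if (pid == 0) then
       ! Master: send each worker its block of nchunk+1 points, then sum the partial results
       do iw = 1, np-1
          i1 = (iw-1)*nchunk + 1
          call MPI_SEND(x(i1), nchunk+1, MPI_DOUBLE_PRECISION, iw, 1, MPI_COMM_WORLD, ierr)
          call MPI_SEND(f(i1), nchunk+1, MPI_DOUBLE_PRECISION, iw, 2, MPI_COMM_WORLD, ierr)
       end do
       total = 0.0_8
       do iw = 1, np-1
          call MPI_RECV(partial, 1, MPI_DOUBLE_PRECISION, iw, 3, MPI_COMM_WORLD, status, ierr)
          total = total + partial
       end do
       write(*,*) "The integral of the data set is: ", total
    else
       ! Worker: receive a block, integrate it with the trapezoid rule, send the result back
       call MPI_RECV(xloc, nchunk+1, MPI_DOUBLE_PRECISION, 0, 1, MPI_COMM_WORLD, status, ierr)
       call MPI_RECV(floc, nchunk+1, MPI_DOUBLE_PRECISION, 0, 2, MPI_COMM_WORLD, status, ierr)
       partial = 0.0_8
       do i = 1, nchunk
          partial = partial + (xloc(i+1)-xloc(i))*(floc(i)+floc(i+1))/2.0_8
       end do
       call MPI_SEND(partial, 1, MPI_DOUBLE_PRECISION, 0, 3, MPI_COMM_WORLD, ierr)
    end if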

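Alternatively, as the last tip suggests, the distribution and the final sum can be written with collectives. The sketch below assumes for simplicity that n divides evenly by np, that every process (master included) integrates an equal block, and that rank 0 holds the full x and f arrays; nloc, xloc, floc, partial and total are again illustrative names. The intervals that straddle block boundaries fall inside no block, so the master adds those few contributions itself.

    ! Scatter/reduce sketch (illustrative names)
    real(kind=8), allocatable :: xloc(:), floc(:)
    real(kind=8) :: partial, total
    integer :: nloc, i, ierr

    nloc = n/np
    allocate(xloc(nloc), floc(nloc))
    call MPI_SCATTER(x, nloc, MPI_DOUBLE_PRECISION, xloc, nloc, MPI_DOUBLE_PRECISION, &
                     0, MPI_COMM_WORLD, ierr)
    call MPI_SCATTER(f, nloc, MPI_DOUBLE_PRECISION, floc, nloc, MPI_DOUBLE_PRECISION, &
                     0, MPI_COMM_WORLD, ierr)

    ! Each process integrates the intervals inside its own block
    partial = 0.0_8
    do i = 1, nloc-1
       partial = partial + (xloc(i+1)-xloc(i))*(floc(i)+floc(i+1))/2.0_8
    end do

    ! The np-1 intervals between neighbouring blocks are added by the master,
    ! which still holds the full arrays
    if (pid == 0) then
       do i = nloc, n-1, nloc
          partial = partial + (x(i+1)-x(i))*(f(i)+f(i+1))/2.0_8
       end do
    end if

    ! Sum all partial integrals onto the master
    call MPI_REDUCE(partial, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
    if (pid == 0) write(*,*) "The integral of the data set is: ", total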
Investigate Parallel Performance

Measure the parallel performance of your code and examine how the efficiency varies with process count and problem size. Implement timing routines in your parallel code as well as in a sequential version, and write the run time to standard output. When submitting timed parallel jobs to the queue, you want to make sure that resources are used exclusively for your job (i.e. other applications are not running at the same time on the same CPU). Also, the run time of your code may be affected by the mapping of processes to cores/sockets/nodes on the machine, so experiment with this. It might be a good idea to launch the code several times and average the run time results.

Measure the parallel efficiency and speedup of your code on different numbers of processes (speedup on p processes is S_p = T_1/T_p, and efficiency is E_p = S_p/p). You may also want to repeat the measurements on larger/smaller domains to examine the effects of problem size. The single-process run time T_1 can be used to calculate speedup, or a tougher measure is to use the sequential code run time T_s. Plot a curve of speedup versus number of processes used. Also plot efficiency versus number of processes. How well does your code scale? How does the problem size affect the efficiency? Are there ways that the parallel performance of your code can be improved? You may want to consider operation count in critical loops, memory usage, compiler optimization, communication overhead, etc. as ways to improve the speed of your code.
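A simple way to time the parallel code is to call MPI_WTIME around the section of interest and report the result from one process; the sketch below is one reasonable arrangement, with barriers used only to give all processes a common start and stop point.

    ! Timing sketch using MPI_WTIME
    real(kind=8) :: t_start, t_end

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)   ! line everyone up before timing
    t_start = MPI_WTIME()

    ! ... the part of the program being timed (e.g. the main iteration loop) ...

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    t_end = MPI_WTIME()
    if (pid == 0) write(*,*) "Run time (s): ", t_end - t_start

In the sequential version, the FORTRAN intrinsic system_clock (or cpu_time) can play the same role as MPI_WTIME.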
