An Introduction to Parallel Computing With MPI. Computing Lab I
Oliver Sutton
The purpose of the first programming exercise is to become familiar with the operating environment on a parallel computer, and to create and run a simple parallel program using MPI. In code development it is always a good idea to start simple and develop/debug each piece of code before adding more complexity. This first code will implement the basic MPI structure, query communicator information, and output the process rank: the classic "hello world" program. You will also learn how to compile parallel programs and submit batch jobs using the scheduler.

Write a basic hello world code which creates an MPI environment, determines the number of processes in the global communicator, and writes the rank of each process to standard output. You will have to use the correct MPI language binding depending on which programming language you are using: FORTRAN, C, or C++. The general program structure for each language is shown below. You can write your code in the editor TextWrangler, which is installed on the lab computers. This program allows you to edit and save source code files either locally on the lab computers or remotely on socrates, and to transfer files as needed using sftp.

FORTRAN90

program MPIhelloworld
  implicit none
  include 'mpif.h'                               ! Include the MPI header file
  integer :: ierr, pid, np
  call MPI_INIT(ierr)                            ! Initialize MPI environment
  call MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)   ! Get number of processes (np)
  call MPI_COMM_RANK(MPI_COMM_WORLD, pid, ierr)  ! Get local rank (pid)
  write(*,*) 'I am process:', pid
  call MPI_FINALIZE(ierr)                        ! Terminate MPI environment
  stop
end program MPIhelloworld

C

#include <mpi.h>    /* Include the MPI header file */
#include <stdio.h>

int main(int argc, char *argv[])
{
  int ierr, pid, np;
  ierr = MPI_Init(&argc, &argv);        /* Initialize MPI environment */
  MPI_Comm_size(MPI_COMM_WORLD, &np);   /* Get number of processes (np) */
  MPI_Comm_rank(MPI_COMM_WORLD, &pid);  /* Get local rank (pid) */
  printf("I am process: %d\n", pid);
  MPI_Finalize();                       /* Terminate MPI environment */
  return 0;
}

C++

#include <mpi.h>      // Include the MPI header file
#include <iostream>

int main(int argc, char **argv)
{
  int pid, np;
  MPI::Init(argc, argv);                // Initialize MPI environment
  np = MPI::COMM_WORLD.Get_size();      // Get number of processes (np)
  pid = MPI::COMM_WORLD.Get_rank();     // Get local rank (pid)
  std::cout << "I am process: " << pid << std::endl;
  MPI::Finalize();                      // Terminate MPI environment
  return 0;
}

Save your source code to your home directory on socrates (from the TextWrangler File menu select Save to FTP/SFTP Server and log in). Now open a terminal program (such as Terminal or X11) and ssh to socrates. You should be able to log in with your NSID account. If you are not familiar with the UNIX command line environment, you can consult the attached document explaining all the basic commands you need to know. Your home directory is the location where you will keep all your source code, the executables, and your input and output data files. Your parallel code is submitted from the home directory and you can read and write files from there. Most parallel computers provide a different directory with additional disk space should your program use very large data files.

socrates

Information about socrates is available on the site. Your account has been set up to use the OpenMPI implementation of the MPI standard. Socrates also has MPICH and LAM MPI installed. Socrates has the compilers gcc, g77, and gfortran available. To compile a parallel MPI program you need to use the compiler scripts provided by OpenMPI, which link the native compilers to the proper MPI libraries. The compiler scripts are mpif77 or mpif90 for FORTRAN programs, mpicc for C, and mpicxx (or mpiCC) for C++ programs.
They can be passed any flag accepted by the underlying compilers. To do a basic build, use one of the following commands:
[]$ mpif90 -o executable sourcecode.f90
[]$ mpicc -o executable sourcecode.c
[]$ mpicxx -o executable sourcecode.cpp

Socrates uses the TORQUE/Moab batching system to manage the load distribution on the cluster. This load-leveling program creates a queuing system to manage the cluster, and users must submit their batch jobs to the queue. An outline of basic TORQUE commands is given below (TORQUE evolved from software called PBS, the Portable Batch System). To submit a parallel job, you will need to create a job script. Using a text editor (TextWrangler), create a new file named myjobscript.pbs and type in all the necessary commands required to submit your parallel job to the queue. A sample job script is shown below. Note that PBS directives are preceded by #PBS and comment lines are inserted with a single #.

#!/bin/sh
# Sample PBS Script for use with OpenMPI on Socrates
# Jason Hlady May 2010
#
# Specify the number of processors to use in the form of
# nodes=x:ppn=y, where x = number of computers (nodes),
# y = number of processors per computer
#PBS -l nodes=1:ppn=1
# Job name which will show up in queue, job output
#PBS -N <my job name>
# Optional: join error and output into one stream
#PBS -j oe

# Show what node the app started on--useful for serial jobs
echo `hostname`
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Starting run at: `date`"
echo " "

# Run the application
mpirun <my program name>

echo "Program finished with exit code $? at: `date`"
exit 0
When you submit the batch job, TORQUE will assign a job ID number. The standard output and standard error of the job will be stored in the file myjobname.ojob_id# in your working directory. To submit your batch job, simply enter

qsub myjobscript.pbs

The job ID number will be output to the screen. To observe the status of your job in the queue, type

qstat

To kill a job, enter

qdel JOB_ID#

You can view the man pages of any of these commands for more information and options.

Computing Lab II

Option 1: Jacobi Iteration on a Two-Dimensional Mesh

This is a classic problem for learning the basics of building a parallel Single Program Multiple Data (SPMD) code with a domain decomposition approach and data dependency between processes. These issues are common to many parallel algorithms used in scientific programs. We will keep the algorithm as simple as possible so that you can focus on implementing the parallel communication and thinking about program efficiency.

Consider solving for the temperature distribution on a two-dimensional grid with fixed temperature values on the boundaries.

Figure 1: Uniform grid of temperature values. Boundary values indicated by grey nodes.
The temperature values at all grid points can be stored in a two-dimensional data array, T(i,j). Starting from an initial guess for the temperature distribution (say T = 0 at all interior nodes (white squares)), we can calculate the final temperature distribution by repeatedly applying the calculation

  Tnew(i,j) = ( Told(i+1,j) + Told(i-1,j) + Told(i,j+1) + Told(i,j-1) ) / 4

over all interior nodes until the temperature values converge to the final solution. This is not a very efficient solver and it may take hundreds (or thousands) of sweeps of the grid before convergence, but it is the simplest algorithm you can use. An example FORTRAN 90 sequential program is given below.

program jacobi
! A program solving the 2D heat equation using Jacobi iteration
  implicit none
  integer, parameter :: id=100, jd=100
  integer :: i, j, n, nmax
  real(kind=8), dimension(0:id+1,0:jd+1) :: tnew, told
  character(6) :: filename

! Initialize the domain
  told = 0.0_8             ! initial condition
  told(0,:) = 80.0_8       ! left boundary condition
  told(id+1,:) = 50.0_8    ! right boundary condition
  told(:,0) = 0.0_8        ! bottom boundary condition
  told(:,jd+1) = 100.0_8   ! top boundary condition
  tnew = told

! Perform Jacobi iterations (nmax sweeps of the domain)
  nmax = 1000
  do n = 1, nmax
    ! Sweep interior nodes
    do i = 1, id
      do j = 1, jd
        tnew(i,j) = (told(i+1,j) + told(i-1,j) + told(i,j+1) + told(i,j-1))/4.0_8
      end do
    end do
    ! Copy tnew to told and sweep again
    told = tnew
  end do

! Output field data to file
50 format(102f6.1)
  filename = "t.dat"
  open(unit=20, file=filename, status="replace")
  do j = jd+1, 0, -1
    write(20,50) (tnew(i,j), i=0,id+1)
  end do
  close(20)
  stop
end program jacobi
Now parallelize the Jacobi solver. Use a simple one-dimensional domain decomposition as shown below.

Figure 2: 1D decomposition of the grid among processes 0, 1, ..., n.

Each process will perform iterations only on its subdomain, and will have to exchange temperature values with neighboring processes at the subdomain boundaries. You should create a row of ghost points to store these communicated values. The external row of boundary values around the global domain can also be considered ghost points. If you keep things basic, you should be able to write the parallel program in less than 70 lines of code!

Some tips and hints:

To keep things simple, directly program the mapping of the domain to the processes, i.e. process 0 is on the left boundary, process n on the right boundary, and the rest in the middle. You can also directly specify the different boundary conditions for each process.

After every process sweeps its local nodes once, you will have to communicate the updated temperature values at the subdomain boundaries before the next sweep. This can be accomplished in two communication shifts: first everyone sends data to the process on the right and receives from the left, then everyone sends to the left and receives from the right. Make sure the communication pattern doesn't block.

Since the data values you need to communicate may not be in contiguous memory locations in your 2D temperature data array, you can create a 1D buffer array, explicitly copy the data values in/out of the buffer, and use the buffer array in the MPI_SEND and MPI_RECV calls.

You may want to look at the data field when the computation is done, and the easiest way to do this is to have every process write its local data array to a separate data file.
You will have to use a different file name for every process. One way to automatically generate file names (in FORTRAN 90) with the process id as the file name is with ASCII number-to-character conversion:

filename = achar((pid - mod(pid,10))/10 + 48) // achar(mod(pid,10) + 48) // ".dat"

which gives the file name 12.dat for pid = 12.

Try using MPI_SENDRECV instead of separate blocking send and receive calls. This will allow you to solve the case where the domain is periodic in the x-direction (roll the domain into a cylindrical shell with the two x-faces joined together) and process 0 communicates with process n.

You can implement a grid convergence measure such as the rms of the difference between Tnew and Told on the global grid, and then stop the outer loop when the convergence measure is acceptably small (say 10^-5). To do this you will need to use collective communication calls to calculate the global convergence measure and to broadcast this value to all processes so that they stop at the same time.

If you have the 1D domain decomposition working, you can try a 2D domain decomposition which subdivides the domain into squares instead of strips. This is a more efficient decomposition since the number of subdomain ghost points is reduced.

Option 2: Numerical Integration of a Set of Discrete Data

This problem uses a master-worker model where the master process divides up the data and sends it to the workers, who perform local computations on the data and communicate results back to the master. There is no data dependency between workers (they don't need to communicate with each other). This is an example of what is called an embarrassingly parallel problem.

Consider the numerical integration of a large set of discrete data values, which could represent points sampled from a function.

Figure 3: Discrete data values, f(x_i), where i = 1, 2, 3, ..., n.

To approximate the integral, we can fit straight lines between each pair of points and then compute the sum of the areas under each line segment. This is the trapezoid formula:

  integral ≈ sum over i = 1, ..., n-1 of (x_(i+1) - x_i) * ( f(x_i) + f(x_(i+1)) ) / 2
The locations may not be evenly spaced. An example FORTRAN 90 code is given below.

program integrate
! A program to numerically integrate discrete data from the file ptrace.dat
  implicit none
  integer, parameter :: n=960000   ! Number of points in data file
  integer :: i
  real(kind=8) :: integral
  real(kind=8), dimension(n) :: x, f

! Open data file and read in data
  open(unit=21, file="ptrace.dat", status="old")
  do i = 1, n
    read(21,*) x(i), f(i)
  end do
  close(21)

! Now compute the global integral
  integral = 0.0_8
  do i = 1, n-1
    integral = integral + (x(i+1) - x(i))*(f(i) + f(i+1))/2.0_8   ! trapezoid formula
  end do

! Output result
  write(*,*) "The integral of the data set is: ", integral
  stop
end program integrate

Now parallelize this program using the master-worker model. The master process (choose process 0, which is always present) reads in the data from the file, divides it up evenly, and distributes it to the workers (all other processes). The workers compute the integral of their portions of the data and return the results to the master. The master sums the results to find the global integral and outputs the result. If you keep things simple, you should be able to write the parallel program in less than 60 lines of code.

Some tips and hints:

If the data array is very large, the master process may not have enough local memory to store the entire array. In this case it would be better to read in only part of the data set at a time and send it to a worker (or workers), before reading in more data (over-writing previous values) and sending it to other workers, and so on.

In order to make this algorithm efficient, we need to minimize the idle time of the workers (and the master) and balance the computational work as evenly as possible. If the number of processes is small and the data set is large, we may want the master process to help compute part of the integral while it is waiting for the workers to finish.
Also, if the amount of data communicated to each worker is large (lots of bandwidth-related communication overhead), other workers will be idling while they wait for their data. Would it be more efficient to send smaller parcels of data to each worker so that they all get to work quickly, and then repeatedly send more data when they finish until all the work is done? But if the number of messages gets too large, then we will have increased latency-related overhead.

You can try using non-blocking communication calls on the master process so that it can do other tasks while waiting for results from workers. You can also try using the scatter and reduce collective communication routines to implement the parallel program.

Investigate Parallel Performance

Measure the parallel performance of your code and examine how the efficiency varies with process count and problem size. Implement timing routines in your parallel code as well as in a sequential version, and write the run time to standard output. When submitting timed parallel jobs to the queue, you want to make sure that resources are used exclusively for your job (i.e. other applications are not running at the same time on the same CPU). Also, the run time of your code may be affected by the mapping of processes to cores/sockets/nodes on the machine, so experiment with this. It is a good idea to launch the code several times and average the run time results.

Measure the parallel efficiency and speedup of your code on different numbers of processes. You may also want to repeat the measurements on larger/smaller domains to examine the effects of problem size. The single-process run time T1 can be used to calculate speedup, or a tougher measure is to use the sequential code run time Ts. Plot a curve of speedup versus number of processes used. Also plot efficiency versus number of processes. How well does your code scale? How does the problem size affect the efficiency? Are there ways that the parallel performance of your code can be improved? You may want to consider operation count in critical loops, memory usage, compiler optimization, communication overhead, etc. as ways to improve the speed of your code.
More informationInstalling and running COMSOL on a Linux cluster
Installing and running COMSOL on a Linux cluster Introduction This quick guide explains how to install and operate COMSOL Multiphysics 5.0 on a Linux cluster. It is a complement to the COMSOL Installation
More informationOn-demand (Pay-per-Use) HPC Service Portal
On-demand (Pay-per-Use) Portal Wang Junhong INTRODUCTION High Performance Computing, Computer Centre The Service Portal is a key component of the On-demand (pay-per-use) HPC service delivery. The Portal,
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationAgenda. Using HPC Wales 2
Using HPC Wales Agenda Infrastructure : An Overview of our Infrastructure Logging in : Command Line Interface and File Transfer Linux Basics : Commands and Text Editors Using Modules : Managing Software
More informationSLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt.
SLURM: Resource Management and Job Scheduling Software Advanced Computing Center for Research and Education www.accre.vanderbilt.edu Simple Linux Utility for Resource Management But it s also a job scheduler!
More informationIntroduction to Parallel Programming and MapReduce
Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant
More informationJuropa. Batch Usage Introduction. May 2014 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de
Juropa Batch Usage Introduction May 2014 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de Batch System Usage Model A Batch System: monitors and controls the resources on the system manages and schedules
More informationMartinos Center Compute Clusters
Intro What are the compute clusters How to gain access Housekeeping Usage Log In Submitting Jobs Queues Request CPUs/vmem Email Status I/O Interactive Dependencies Daisy Chain Wrapper Script In Progress
More informationMS SQL Performance (Tuning) Best Practices:
MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware
More informationParallel and Distributed Computing Programming Assignment 1
Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong
More informationHigh Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
More informationDebugging with TotalView
Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich
More informationNorduGrid ARC Tutorial
NorduGrid ARC Tutorial / Arto Teräs and Olli Tourunen 2006-03-23 Slide 1(34) NorduGrid ARC Tutorial Arto Teräs and Olli Tourunen CSC, Espoo, Finland March 23
More informationDebugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu
Debugging and Profiling Lab Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu Setup Login to Ranger: - ssh -X username@ranger.tacc.utexas.edu Make sure you can export graphics
More informationMFCF Grad Session 2015
MFCF Grad Session 2015 Agenda Introduction Help Centre and requests Dept. Grad reps Linux clusters using R with MPI Remote applications Future computing direction Technical question and answer period MFCF
More informationParallel Processing using the LOTUS cluster
Parallel Processing using the LOTUS cluster Alison Pamment / Cristina del Cano Novales JASMIN/CEMS Workshop February 2015 Overview Parallelising data analysis LOTUS HPC Cluster Job submission on LOTUS
More informationHigh-Performance Computing
High-Performance Computing Windows, Matlab and the HPC Dr. Leigh Brookshaw Dept. of Maths and Computing, USQ 1 The HPC Architecture 30 Sun boxes or nodes Each node has 2 x 2.4GHz AMD CPUs with 4 Cores
More informationWorkflow-Management with flowguide
Technical White Paper May 2003 flowguide IT-Services Workflow-Management with flowguide (TM) Improving Quality and Efficiency by Automating your Engineering Processes Betriebskonzepte Sicherheitslösungen
More informationUntil now: tl;dr: - submit a job to the scheduler
Until now: - access the cluster copy data to/from the cluster create parallel software compile code and use optimized libraries how to run the software on the full cluster tl;dr: - submit a job to the
More informationNotes on the SNOW/Rmpi R packages with OpenMPI and Sun Grid Engine
Notes on the SNOW/Rmpi R packages with OpenMPI and Sun Grid Engine Last updated: 6/2/2008 4:43PM EDT We informally discuss the basic set up of the R Rmpi and SNOW packages with OpenMPI and the Sun Grid
More informationIntroduction to Matlab Distributed Computing Server (MDCS) Dan Mazur and Pier-Luc St-Onge guillimin@calculquebec.ca December 1st, 2015
Introduction to Matlab Distributed Computing Server (MDCS) Dan Mazur and Pier-Luc St-Onge guillimin@calculquebec.ca December 1st, 2015 1 Partners and sponsors 2 Exercise 0: Login and Setup Ubuntu login:
More informationCHM 579 Lab 1: Basic Monte Carlo Algorithm
CHM 579 Lab 1: Basic Monte Carlo Algorithm Due 02/12/2014 The goal of this lab is to get familiar with a simple Monte Carlo program and to be able to compile and run it on a Linux server. Lab Procedure:
More informationResource Management and Job Scheduling
Resource Management and Job Scheduling Jenett Tillotson Senior Cluster System Administrator Indiana University May 18 18-22 May 2015 1 Resource Managers Keep track of resources Nodes: CPUs, disk, memory,
More informationHigh Performance Computing
High Performance Computing at Stellenbosch University Gerhard Venter Outline 1 Background 2 Clusters 3 SU History 4 SU Cluster 5 Using the Cluster 6 Examples What is High Performance Computing? Wikipedia
More informationStoring Measurement Data
Storing Measurement Data File I/O records or reads data in a file. A typical file I/O operation involves the following process. 1. Create or open a file. Indicate where an existing file resides or where
More informationCS10110 Introduction to personal computer equipment
CS10110 Introduction to personal computer equipment PRACTICAL 4 : Process, Task and Application Management In this practical you will: Use Unix shell commands to find out about the processes the operating
More informationHPC at IU Overview. Abhinav Thota Research Technologies Indiana University
HPC at IU Overview Abhinav Thota Research Technologies Indiana University What is HPC/cyberinfrastructure? Why should you care? Data sizes are growing Need to get to the solution faster Compute power is
More informationCSC230 Getting Starting in C. Tyler Bletsch
CSC230 Getting Starting in C Tyler Bletsch What is C? The language of UNIX Procedural language (no classes) Low-level access to memory Easy to map to machine language Not much run-time stuff needed Surprisingly
More informationGrid Engine. Application Integration
Grid Engine Application Integration Getting Stuff Done. Batch Interactive - Terminal Interactive - X11/GUI Licensed Applications Parallel Jobs DRMAA Batch Jobs Most common What is run: Shell Scripts Binaries
More informationHow To Visualize Performance Data In A Computer Program
Performance Visualization Tools 1 Performance Visualization Tools Lecture Outline : Following Topics will be discussed Characteristics of Performance Visualization technique Commercial and Public Domain
More informationCUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
More informationIntroduction to the SGE/OGS batch-queuing system
Grid Computing Competence Center Introduction to the SGE/OGS batch-queuing system Riccardo Murri Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich Oct. 6, 2011 The basic
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best
More informationSetting up PostgreSQL
Setting up PostgreSQL 1 Introduction to PostgreSQL PostgreSQL is an object-relational database management system based on POSTGRES, which was developed at the University of California at Berkeley. PostgreSQL
More informationWorking with HPC and HTC Apps. Abhinav Thota Research Technologies Indiana University
Working with HPC and HTC Apps Abhinav Thota Research Technologies Indiana University Outline What are HPC apps? Working with typical HPC apps Compilers - Optimizations and libraries Installation Modules
More informationRunning on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF)
Running on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF) ALCF Resources: Machines & Storage Mira (Production) IBM Blue Gene/Q 49,152 nodes / 786,432 cores 768 TB of memory Peak flop rate:
More information