Parallel Programming in C/C++ - OpenMP versus MPI




Parallel Programming (Multi/cross-platform)
Why Choose C/C++ as the programming language?
Compiling C/C++ on Windows (for free)
Compiling C/C++ on other platforms for free is not an issue
Parallel Programming in C/C++ - OpenMP versus MPI
MPI Examples
OpenMP Examples
Project Assessed Work (50%)

Why Choose C/C++ as the parallel programming language? It is the language that most computer science students and scientists are familiar with. A large number of parallel algorithm collections/sources contain descriptions (implementations) written in C or in C-like languages. C/C++ is closer to the real machine. It is nice to use something different from Java. NOTE: there exists support for parallel programming using a wide range of languages on a variety of OSs.

Compiling C/C++ on Windows (for free) If you want to compile C/C++ on a Windows machine without paying for it, you have a number of (reasonable) choices:
Microsoft Visual C++ 2010 Express - limited libraries (especially for parallel/concurrent development)
Borland C++ Compiler 5.5 - limited IDE (basic command line)
GNU C/C++ Compiler, and associated development systems:
DJGPP - DJ Delorie's GNU Programming Platform - limited to executing inside the DOS box as it is a DOS compiler
Cygwin - Cygnus's gnuwin32 cross compiler and emulator (GNU is a recursive acronym for "GNU's Not Unix!")
MinGW32 - Minimalistic GNU for Windows 32 bit
The most interesting choice is between Cygwin and MinGW32, where there are trade-offs in terms of portability and licensing.
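For example, once the MinGW32 (or Cygwin) tools are installed and gcc is on the PATH, a single source file can be compiled from the command prompt in the usual GNU way (hello.c is a hypothetical file name):

gcc -Wall -O2 -o hello.exe hello.c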

Cygwin versus MinGW From Cygwin's web site (http://www.cygwin.com/): Cygwin is a Linux-like environment for Windows. It consists of two parts: a DLL (cygwin1.dll) which acts as a Linux API emulation layer providing substantial Linux API functionality, and a collection of tools which provide Linux look and feel. Cygwin uses a DLL, cygwin1.dll (or possibly a set of DLLs), to provide a POSIX-like runtime on Windows. If you build something with Cygwin, any system you install it to will also need the Cygwin DLL(s). From MinGW's website (http://www.mingw.org/wiki/mingw): MinGW ("Minimalistic GNU for Windows") is a collection of freely available and freely distributable Windows-specific header files and import libraries combined with GNU toolsets that allow one to produce native Windows programs that do not rely on any 3rd-party C runtime DLLs. MinGW compiles to a native Win32 application. A MinGW application does not need any special runtime.

Cygwin versus MinGW Compile something in Cygwin and you are compiling it for Cygwin. Compile something in MinGW and you are compiling it for Windows. Cygwin is good when your app absolutely needs a POSIX environment to run - it is sometimes easier to port something to Cygwin than it is to port it to Windows, because Cygwin is a layer on top of Windows that emulates a POSIX environment. If you compile something for Cygwin then it will need to be run within the Cygwin environment, as provided by cygwin1.dll. For portability, you could distribute this DLL with your project, if you were willing to comply with the relevant license. MinGW is a Windows port of the GNU compiler tools, like GCC, Make, Bash, etc., which run directly in Windows without any emulation layer. By default it will compile to a native Win32 target, complete with .exe and .dll files, though you could also cross-compile with the right settings. It is, in a way, an alternative to the Microsoft Visual C compiler and its associated linking/make tools.

POSIX - Why would we need this in multi-platform development? POSIX - Portable Operating System Interface for Unix: a family of related standards specified by the IEEE to define the application programming interface (API), along with shell and utilities interfaces, for software compatible with variants of the Unix operating system, although the standard can apply to any operating system. POSIX for Windows:
Cygwin - near full compliance (the most free/open choice)
Microsoft POSIX subsystem - partial compliance
Microsoft Windows Services for UNIX - full POSIX compliance for certain Microsoft Windows products
UWIN from AT&T Research - near full compliance
MKS Toolkit - near full compliance

Portability Issues Case 1: I want to create an application where I write source code once, compile it once and run it on any platform (e.g. Windows, Linux and Mac OS X). Best (?) solution: perhaps C/C++ is not the best choice. Why not write your source code in Java? Compile the source code once and run it anywhere. Case 2: I want to create an application where I write source code once, and compile the source code separately for each platform. Best (?) solution: write your source code in C or C++. Use standard header files only. Use any suitable compiler for any platform. Do not use advanced, machine-dependent language features.

Portability Issues Case 3: I want to create an application where I write source code once, and compile the source code separately for each platform. However, I need some advanced cross-platform features. Best (?) solution: use GCC, as it is a cross-platform compiler. On Windows, the simplest solution is MinGW. Do not use programming features from the Windows API! Case 4: I want to create an application where I write source code once, and compile the source code separately for each platform. However, I need advanced features (like multi-threading) not supported by MinGW. Best (?) solution: use the POSIX (Portable Operating System Interface [for UNIX]) standard. It provides many advanced programming features (including multi-threading) and tools. On Windows, Cygwin provides a largely POSIX-compliant development and run-time environment.
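As an illustration of Case 4, here is a minimal POSIX threads sketch (written for this handout, not taken from the course code; the thread count and message are arbitrary). It builds with gcc -pthread on Linux and, unchanged, under Cygwin:

#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

/* each thread runs this function */
static void *worker(void *arg)
{
    long id = (long) arg;
    printf("Hello from POSIX thread %ld\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    /* create the threads ... */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);

    /* ... and wait for all of them to finish */
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}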

Distributed versus Shared: MPI versus OpenMP?
Distributed Memory - why MPI?
Sockets (standardized but low level)
PVM - Parallel Virtual Machine (obsolete?)
MPI - Message Passing Interface (de-facto standard)
Shared Memory - why OpenMP?
POSIX Threads (standardized, low level)
OpenMP - Open Multi-Processing (de-facto standard)
Automatic parallelization (the compiler does it for you)
ENGINEERING ADVICE: if in doubt, stick to the standards

Distributed versus Shared: MPI versus OpenMP? Note: OpenMP and MPI are often combined, in a hybrid system, to provide parallelism at different levels of granularity. It is often said that MPI is more complicated than OpenMP, but that it scales better; however, this is not accepted by all parallel programmers/engineers.
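A minimal sketch of what such a hybrid program can look like (an illustration, not part of the course code; it assumes an MPI wrapper compiler that accepts the GCC -fopenmp flag): MPI distributes work across processes, and each process opens an OpenMP parallel region for its own threads.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* coarse-grained parallelism: MPI processes (e.g. one per node)  */
    /* fine-grained parallelism:   OpenMP threads (e.g. one per core) */
    #pragma omp parallel
    printf("process %d of %d, thread %d of %d\n",
           rank, nprocs, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}

Compiled with something like mpicc -fopenmp -o hybrid hybrid.c and launched with mpirun -np 2 ./hybrid, each of the 2 processes reports one line per thread.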

MPI Examples First check that you have, at the minimum, the following installed: mpicc and mpirun. Check where these are on your file system, and make sure they are in your path. For example, on Linux systems you can use the which command:
gibson@gibson-laptop:~/teaching/openmpi$ which mpicc
/usr/bin/mpicc
gibson@gibson-laptop:~/teaching/openmpi$ which mpirun
/usr/bin/mpirun
If you wish to compile MPI code using an IDE (like Eclipse) you may have to configure the build to be able to find these executables. Your IDE may provide a specific plug-in to support development in parallel (using MPI and/or other libraries). For example, Eclipse has a parallel tools platform at: http://www.eclipse.org/ptp/downloads.php

mpicc The mpicc command can be used to compile and link MPI programs written in C. It provides the options and any special libraries that are OS dependent. The MPI library may be used with any compiler that uses the same lengths for basic data objects (such as long double) and that uses compatible run-time libraries. On many systems, the various compilers are compatible and may be used interchangeably. The environment variable MPICH_CC may be used to select a different C compiler and linker. Note that changing the compilers used can cause problems; use this only if you can intermix code compiled with the different compilers and you know what you are doing. Combining compilation and linking in a single command is usually done as follows: mpicc -o code code.c The full list of options is found in the man page.
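For example, assuming an MPICH-derived implementation and that clang is installed, the wrapper can be pointed at another compiler for a single build:

MPICH_CC=clang mpicc -o code code.c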

mpirun mpirun is a shell script which is required to run other MPI-based programs. It attempts to: hide the differences in starting jobs for various devices from the user; determine what kind of machine it is running on and start the required number of jobs on that machine. On clusters, you must supply a file that lists the different machines that mpirun can use to run remote jobs, or specify this file every time you run mpirun with the -machinefile option. mpirun is typically invoked as follows: mpirun -np <number of processes> <program name and arguments> If mpirun cannot determine what kind of machine you are on, and it is supported by the MPI implementation, you can use the -machine and -arch options to tell it what kind of machine you are running on. The full list of options is found in the man page.
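For example, a machine file is simply a plain-text list of host names, one per line (node01, node02 and node03 are hypothetical names); note that the exact flag varies between implementations (MPICH accepts -machinefile, Open MPI uses -hostfile):

machines.txt:
node01
node02
node03

mpirun -np 6 -machinefile machines.txt testcommstarmpi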

MPI Examples HelloWorldMPI.c

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Hello World, from process %d on %s out of %d\n",
           rank, processor_name, numprocs);

    MPI_Finalize();
    return 0;
}

MPI Examples HelloWorldMPI.c
COMPILE
gibson@gibson-laptop:~/teaching/openmpi$ mpicc -o HelloWorldMPI HelloWorldMPI.c
EXECUTE ON 4 PROCESSORS
gibson@gibson-laptop:~/teaching/openmpi$ mpirun -np 4 HelloWorldMPI
Hello World, from process 0 on gibson-laptop out of 4
Hello World, from process 1 on gibson-laptop out of 4
Hello World, from process 2 on gibson-laptop out of 4
Hello World, from process 3 on gibson-laptop out of 4

MPI Examples testcommstarmpi.c
[Figure: star communication topology - processes P1 to P5 each linked to the central process P0]

MPI Examples testcommstarmpi.c
COMPILE
mpicc -o testcommstarmpi testcommstarmpi.c
EXECUTE ON 6 PROCESSORS
mpirun -np 6 testcommstarmpi
Process 0 will try to receive messages from 6 processes.
Try to receive message from process 1
Message received from process 1 is - Greetings from process 1!
Try to receive message from process 2
Message received from process 2 is - Greetings from process 2!
Try to receive message from process 3
Message received from process 3 is - Greetings from process 3!
Try to receive message from process 4
Message received from process 4 is - Greetings from process 4!
Try to receive message from process 5
Message received from process 5 is - Greetings from process 5!

MPI Examples testcommstarmpi.c

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int my_rank;          /* rank of process           */
    int p;                /* number of processes       */
    int source;           /* rank of sender            */
    int dest;             /* rank of receiver          */
    int tag = 0;          /* tag for messages          */
    char message[100];    /* storage for message       */
    MPI_Status status;    /* return status for receive */

    /* start up MPI */
    MPI_Init(&argc, &argv);

    /* find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

MPI Examples testcommstarmpi.c

    if (my_rank != 0) {
        /* create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;
        /* use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    else {
        printf("Process 0 will try to receive messages from %d processes.\n", p);
        for (source = 1; source < p; source++) {
            printf("Try to receive message from process %d\n", source);
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("Message received from process %d is - %s\n", source, message);
        }
    }

    /* shut down MPI */
    MPI_Finalize();
    return 0;
}

MPI Examples testcommstarmpi.c We could/should restructure the program to separate the master and slave code:

main() {
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    if (id == 0)
        Master();
    else
        Slave();
    ...
}

TO DO: Download the program testcommstarmpi.c from the web site http://www-public.int-evry.fr/~gibson/teaching/csc5021/code/testcommstarmpi.c and restructure it to the master-slave architecture, as above.
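One possible shape for the restructured program is sketched below (the bodies of Master() and Slave() are deliberately left empty; they should be filled in with the receive and send code from testcommstarmpi.c):

#include <stdio.h>
#include "mpi.h"

static int id, np;   /* rank of this process, number of processes */

void Master(void)
{
    /* receive one greeting from every slave (ranks 1 .. np-1) */
}

void Slave(void)
{
    /* build a greeting and send it to the master (rank 0) */
}

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (id == 0)
        Master();
    else
        Slave();

    MPI_Finalize();
    return 0;
}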

MPI Examples testcommringmpi.c
[Figure: ring communication topology linking processes P0, P1, P2, P3, P4 and P5]
TO DO: Write a program that establishes a communication ring architecture between processes.
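One possible starting point (a sketch, not the reference solution): each process sends a greeting to its right-hand neighbour, rank (my_rank + 1) % p, and receives from its left-hand neighbour, rank (my_rank - 1 + p) % p. MPI_Sendrecv is used here to avoid the deadlock that can occur if every process calls a blocking MPI_Send before anyone has posted a receive.

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank, p, tag = 0;
    char out[100], in[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int next = (my_rank + 1) % p;       /* right-hand neighbour */
    int prev = (my_rank - 1 + p) % p;   /* left-hand neighbour  */
    sprintf(out, "Greetings from process %d!", my_rank);

    /* send to next and receive from prev in one combined call */
    MPI_Sendrecv(out, strlen(out) + 1, MPI_CHAR, next, tag,
                 in, 100, MPI_CHAR, prev, tag,
                 MPI_COMM_WORLD, &status);

    printf("Process %d received: %s\n", my_rank, in);

    MPI_Finalize();
    return 0;
}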

MPI Examples parallelmergesortmpi.c
Original work: http://penguin.ewu.edu/~trolfe/parallelmerge/index.html
Timothy J. Rolfe. 2010. A specimen of parallel programming: parallel merge sort implementation. ACM Inroads 1, 4 (December 2010), 72-79. http://www-public.int-evry.fr/~gibson/teaching/csc5021/readingmaterial/rolfe10.pdf
[Figure: process ranks in a four-level sorting tree for 8 processes]
Download the source code directory ParallelMerge.zip from the web site: http://www-public.int-evry.fr/~gibson/teaching/csc5021/code/parallelmerge.zip

MPI Examples parallelmergesortmpi.c
mpirun -np 1 parallelmergesortmpi
1 processes mandates root height of 0
Size: 2671396 Sorting succeeds. Parallel: 1.072 Sequential: 1.082 Speed-up: 1.009
gibson@gibson-laptop:~/teaching/openmpi$ mpirun -np 2 parallelmergesortmpi
2 processes mandates root height of 1
Size: 14295844 Sorting succeeds. Parallel: 6.327 Sequential: 6.177 Speed-up: 0.976
gibson@gibson-laptop:~/teaching/openmpi$ mpirun -np 4 parallelmergesortmpi
4 processes mandates root height of 2
Size: 3494692 Sorting succeeds. Parallel: 1.553 Sequential: 1.379 Speed-up: 0.888
gibson@gibson-laptop:~/teaching/openmpi$ mpirun -np 6 parallelmergesortmpi
6 processes mandates root height of 3
Size: 2949924 Sorting succeeds. Parallel: 1.440 Sequential: 1.156 Speed-up: 0.803
gibson@gibson-laptop:~/teaching/openmpi$ mpirun -np 8 parallelmergesortmpi
8 processes mandates root height of 3
Size: 16315172 Sorting succeeds. Parallel: 8.446 Sequential: 7.444 Speed-up: 0.881
QUESTION: can you explain the speed-up values?
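HINT: the speed-up column is the sequential time divided by the parallel time (for example, with 4 processes: 1.379 / 1.553 ≈ 0.888), so a value below 1 means the parallel sort was slower than the sequential one; think about communication and process start-up overheads, and about the fact that the array size differs between runs.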

MPI Examples Checking for prime numbers
Download the program parallelprimecountmpi.c from the web site: http://www-public.int-evry.fr/~gibson/teaching/csc5021/code/parallelprimecountmpi.c
TO DO:
Write test code to verify that this program functions correctly.
Try to understand how the code works.
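One way to start the verification (a sketch; it assumes nothing about how parallelprimecountmpi.c is structured) is to write a small sequential reference program and compare its count with the MPI program's output for the same upper limit:

#include <stdio.h>

/* trial division: 1 if n is prime, 0 otherwise */
static int is_prime(long n)
{
    if (n < 2) return 0;
    for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

int main(void)
{
    long limit = 100000;
    long count = 0;

    for (long n = 2; n <= limit; n++)
        count += is_prime(n);

    /* there are 9592 primes up to 100000;
       compare this value with the MPI program's output */
    printf("primes up to %ld: %ld\n", limit, count);
    return 0;
}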

OpenMP (Open Multi-Processing) is an API (application programming interface) that consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior. In OpenMP with C/C++ the programmer uses #pragmas to control the parallel aspects.

OpenMP - resources
The first port of call should be: http://openmp.org/wp/
There is a good tutorial presentation at http://www.openmp.org/mp-documents/omp-hands-on-sc08.pdf
A summary of OpenMP 3.0 C/C++ syntax can be found at: http://www.openmp.org/mp-documents/openmp3.0-summaryspec.pdf
There is a good web site on OpenMP with C++ at: http://bisqwit.iki.fi/story/howto/openmp/#exampleinitializingatableinparallel
There is a good guide at: http://geco.mines.edu/workshop/class2/examples/openmp/openmp.pdf
One of the first published articles was: Leonardo Dagum and Ramesh Menon. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput. Sci. Eng. 5, 1 (January 1998), 46-55. http://www-public.int-evry.fr/~gibson/teaching/csc5021/readingmaterial/dagummenon98.pdf

OpenMP Examples HelloWorldOMP.c

#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int iam = 0, np = 1;

    #pragma omp parallel default(shared) private(iam, np)
    {
#if defined (_OPENMP)
        np = omp_get_num_threads();
        iam = omp_get_thread_num();
#endif
        printf("Hello from thread %d out of %d\n", iam, np);
    }
    return 0;
}
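To build and run this example with GCC, the -fopenmp flag enables the OpenMP directives (without it the #pragma lines are simply ignored and the program runs with a single thread), and the OMP_NUM_THREADS environment variable sets the number of threads:

gcc -fopenmp -o HelloWorldOMP HelloWorldOMP.c
OMP_NUM_THREADS=4 ./HelloWorldOMP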

OpenMP Examples Question: what do you think this code is doing?

#pragma omp parallel for
for (int x = 0; x < width; x++) {
    for (int y = 0; y < height; y++) {
        finalimage[x][y] = RenderPixel(x, y, &scenedata);
    }
}

TO DO: Compile and execute some OpenMP examples. Look for code that does parallel sorting to compare with the MPI merge sort.
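For comparison with the MPI merge sort, here is a sketch of a merge sort parallelized with OpenMP tasks (an illustration written for this handout, not the course's reference code; the cut-off value of 10000 is an arbitrary choice to keep task-creation overhead down):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* merge the sorted ranges a[lo..mid) and a[mid..hi) using tmp as scratch */
static void merge(int *a, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

static void merge_sort(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;

    /* sort the left half in a separate task for large ranges only,
       the current thread sorts the right half */
    #pragma omp task if (hi - lo > 10000)
    merge_sort(a, tmp, lo, mid);
    merge_sort(a, tmp, mid, hi);
    #pragma omp taskwait
    merge(a, tmp, lo, mid, hi);
}

int main(void)
{
    int n = 1000000;
    int *a = malloc(n * sizeof(int));
    int *tmp = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) a[i] = rand();

    #pragma omp parallel
    #pragma omp single          /* one thread starts the recursion,   */
    merge_sort(a, tmp, 0, n);   /* the whole team executes the tasks  */

    for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i]) { printf("NOT sorted\n"); return 1; }
    printf("sorted\n");

    free(a);
    free(tmp);
    return 0;
}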

Project Assessed Work (50%) You are to design, implement, test and compare 2 separate solutions to a string search problem: 1. Sequential 2. Parallel
Problem Overview: Scientists have discovered different types of alien genetic material, sharing a common structure: a 3-dimensional lattice of cubes of distinctive components (letters). Each lattice has a finite number of letters in its alphabet, from 2 up to 100. The lattice sizes range from 4*4*4 up to 1000*1000*1000. Scientists have also discovered dictionaries of words from the genetic alphabet: the dictionary size ranges from 3 up to 1000 words, and all words in a common dictionary share the same length, from 2 up to 100. The scientists hypothesise that dictionaries are associated to some of the genetic material, provided all words in the dictionary can be found in sequence (horizontal, vertical or diagonal) in the material lattice. You are to write a program that takes as input any lattice material and any dictionary, and returns a boolean stating whether there is an association between the two.

Project Assessed Work (50%) Example
Dictionary1 = abcb, bcca, aabc
Dictionary2 = abcde, ebcda, edbac
Dictionary3 = abbba, bbbba, aabaa
Which of these dictionaries are associated to the 5*5 lattice of genetic material (if any)?