Tutorial-4a: Parallel (multi-cpu) Computing




HTTP://WWW.HEP.LU.SE/COURSES/MNXB01
Introduction to Programming and Computing for Scientists (2015 HT)
Tutorial-4a: Parallel (multi-cpu) Computing
Balazs Konya (Lund University) Programming for Scientists Tutorial4a 1 / 16

Outline
- Parallel computing in a nutshell: motivation, terminology, solutions
- How to ride on "big iron":
  - Login to remote computers
  - The practical basics of working with batch systems
  - Multi-task jobs
http://arstechnica.com/information-technology/2013/07/creating-a-99-parallel-computing-machine-is-just-as-hard-as-it-sounds

What is parallel computing?
- Traditional computing: serial execution of a single stream of instructions on a single processing element
- Parallel computing: simultaneous execution of stream(s) of instructions on multiple processing elements
  - Non-sequential execution of a computational task: (part of) the problem is solved by simultaneous subtasks (processes)
  - Relies on the assumption that problems can be divided (decomposed) into smaller, ideally independent ones that can be solved in parallel

What is parallel computing?
Parallelism levels (the "distance" among the processing elements):
- Bit and instruction level: inside the processors (e.g. a 64-bit processor can execute two 32-bit operations at once)
- Multicore/multi-CPU level: inside the same chip/computer. The processing elements share the memory, system bus and OS.
- Network-connected computers: clusters, distributed computing. Each processing element has its own memory space, OS, application software and data.
  - Huge difference depending on the interconnects: e.g. High Performance Computing (supercomputers) vs. High Throughput Computing (seti@home)

Some classifications
Flynn's taxonomy:

                  Single Instruction    Multiple Instruction
  Single Data     SISD                  MISD
  Multiple Data   SIMD                  MIMD

- SISD: sequential "normal" programs
- MIMD: most of the parallel programs
- SIMD: data chewing by the same algorithm
- MISD: rarely exists

SMP vs. MPP (or the shared memory vs. distributed memory debate):
- SMP: Symmetric Multi-Processor system: shared memory approach, single-box machines, OpenMP programming family
- MPP: Massively Parallel Processor system: distributed memory, network-connected CPUs, clusters, MPI programming family (message passing)
- SMPs are easier to program but scale worse than MPPs

Why parallel computing?
- It is cool
- Sometimes the problem does not fit into a single box: you need more resources than you can get from a single computer
  - To obtain at least 10 times more power than is available on your desktop
  - To get exceptional performance from computers
  - To be a couple of years ahead of what is possible with the current (hardware) technology
- The frequency-scaling approach to increasing performance does not work any longer (power consumption issues): the new approach is to stuff more and more processing units into machines, introducing parallelism everywhere

Measuring performance gain: the Speedup
In an ideal scenario a program running on P processing elements would execute P times faster, giving us a linear speedup.
Speedup S(n,P): the ratio of the execution time of the program on a single processor (T_1) and the execution time of the parallel version of the program on P processors (T_P):

  S(n,P) = T_1 / T_P

In practice, the performance gain depends on the way the problem was divided among the processing elements and on the system characteristics.
Amdahl's law gives an upper estimate for the maximum theoretical speedup and states that it is limited by the non-parallelized part of the code:

  S_max(P) = 1 / (alpha + (1 - alpha) / P)

where alpha is the sequential fraction of the program. E.g. if 10% of the code is non-parallelizable, then the maximum speedup is limited to 10, independent of the number of processors used (!)
source: wikipedia
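Amdahl's law is easy to check numerically. A minimal shell sketch (the `amdahl` helper function is a hypothetical name, not from the tutorial material; awk does the floating-point arithmetic):

```shell
# Amdahl's law: S_max(P) = 1 / (alpha + (1 - alpha)/P), where alpha is the
# serial (non-parallelizable) fraction of the program and P the number of
# processing elements.
amdahl() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 1 / (a + (1 - a) / p) }'
}

amdahl 0.10 10      # 10% serial code on 10 processors -> 5.26
amdahl 0.10 1000    # even 1000 processors stay below the 1/alpha = 10 limit -> 9.91
```

Note how quickly the curve flattens: going from 10 to 1000 processors buys less than a factor of two.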

The dark side
- "...the bearing of a child takes nine months, no matter how many women are assigned"
- Not everything is suitable for parallelization
- Complexity increases as more and more communication is involved: embarrassingly parallel -> coarse-grained -> fine-grained problem domains
- Parallel computing opens up a new set of problems:
  - Communication overheads
  - Concurrency problems
  - Synchronization delays
  - Race conditions and deadlocks
  - Nobody wants to debug parallel code...
- Developing & deploying parallel code usually consumes more time than the expected speedup
Practical advice for parallelization:
- Unless you have an embarrassingly parallel problem, forget it
- If you are stubborn, then at least use an available parallel (numerical) library and start with the profiling (understanding) of your program
- Or wait for the holy grail of computational science: automatic parallelization by compilers

Accessing remote computers
- Secure Shell (SSH) is a secure way of accessing remote computers, executing commands remotely or moving data between computers. All network traffic is encrypted.
- The de-facto protocol for remote login & computer access
- Available on non-Linux platforms too (PuTTY and WinSCP on Windows)
- SSH servers listen for incoming connections on the standard TCP port 22
- Login is done with username/password or using keypairs (advanced topic)
Exercise 1: use the Linux ssh and scp commands; remote computer: pptest-iridium.lunarc.lu.se

  ssh -X remote_user@machine
  scp localfile user@machine:remote_dir

Working on a remote computer: screen
IMAGINE that:
- You are logged in to a remote computer
- In the middle of a long task (e.g. compilation, download, etc.)
- Then, suddenly, the network connection dies, or you'd like to go home and continue the same work from your home desktop
Is there a way to avoid losing all your work? How can one disconnect & reconnect to the same session without the need to restart everything from scratch?
SOLUTION: use screen! The utility allows you to:
- Keep a session active even through network disruptions
- Disconnect and re-connect to a session from multiple locations (computers)
- Run a long-running remote process without maintaining an active remote login session

Working on a remote computer: screen
Exercise 2: use the Linux screen utility to manage remote screen sessions: connect and reconnect to an active session, survive a network failure.
Screen is started from the command line just like any other command:
  [iridium ~]$ screen
You can create new windows inside screen with ctrl+a c, and rotate/switch between windows with ctrl+a n.
Listing your screens:
  [iridium ~]$ screen -list
Disconnecting from your active session (your task keeps running!):
  [iridium ~]$ screen -d     (or ctrl+a d)
Re-connecting to an active screen session (re-attach to screen):
  [iridium ~]$ screen -r
Terminating (logging out of) screen: type exit from inside all your active screen sessions.
Using screen to log your activity:
  [iridium ~]$ screen -L     (or ctrl+a H to turn logging on/off during a screen session)

Working with a cluster
Goal: understand the basic concepts of a cluster, a workload management system, queues, jobs
- Cluster: Iridium cluster at LUNARC
- Frontend: pptest-iridium.lunarc.lu.se
- Batch system: SLURM
Exercise 3: Look around on the front-end (e.g. inspect CPU and memory details):
  cat /proc/cpuinfo; cat /proc/meminfo
  top; who; pwd
Check the man pages for the SLURM commands: sbatch, sinfo, squeue, scontrol, scancel
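The /proc files in Exercise 3 are long; a small sketch that pulls out just the headline numbers (this works on any Linux machine, not only the Iridium front-end):

```shell
# Shortcut to eyeballing the full /proc/cpuinfo and /proc/meminfo output:
grep -c '^processor' /proc/cpuinfo   # one "processor" line per logical core
grep '^MemTotal' /proc/meminfo       # total physical memory in kB
nproc                                # core count again, via coreutils
```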

Working with a cluster
Exercise 4: simple jobs with SLURM
- List SLURM queues (partitions):
    > sinfo
- Create the file myscript (use the provided examples)
- Submit simple jobs and check their status:
    > sbatch myscript
    > cat slurm-<jobid>.out
    > squeue
    > scontrol show job <jobid>
- Repeat with multi-core/node jobs:
    > sbatch -N4 myscript
    > sbatch -n6 myscript
- In the multi-core advanced example, pay attention to how jobs are distributed across nodes and cores.

Simple myscript:
  #!/bin/sh
  #SBATCH -J simple_job
  #SBATCH --time=1
  echo "we are on the node"
  hostname
  who
  sleep 2m

Multi-core/node myscript:
  #!/bin/sh
  #SBATCH -J multi_job
  #SBATCH --time=1
  srun hostname | sort
  sleep 5m

Working with a cluster: task farming
Exercise 5: With the help of a master script you are going to execute X subtasks on Y processing units.
- The master script (master.sh) takes care of launching (new) subtasks as soon as a processing element becomes available
- The worker.sh script imitates a payload execution that corresponds to a subtask
Steps:
1. Download and copy the scripts to a new directory on pptest-iridium
2. Set the problem size (NB_of_subtasks) and the number of processing elements (#SBATCH -n) in master.sh, and the payload size (i.e. how long a subtask runs) in worker.sh
3. Launch the taskfarm (sbatch master.sh), monitor the execution of the subtasks (squeue -j <jobid> -s) and finally check how much time the taskfarm processing required (check the output files of the subtasks and the SLURM job)
4. Repeat the taskfarming with modified parameters. What is the speedup?
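The farming idea can be imitated locally without SLURM. A simplified sketch (run_farm, worker, NB_OF_SUBTASKS and NPROC are our names, not the provided master.sh/worker.sh; and it launches subtasks in batches of NPROC rather than refilling a slot the moment one subtask finishes, as the real master script does):

```shell
#!/bin/sh
# Toy task farm: run NB_OF_SUBTASKS payloads with at most NPROC running
# at a time, using background jobs and wait.
NB_OF_SUBTASKS=6
NPROC=2

worker() {
    # Imitates the payload; the real worker.sh would sleep/compute instead.
    echo "subtask $1 done"
}

run_farm() {
    i=1
    while [ "$i" -le "$NB_OF_SUBTASKS" ]; do
        n=0
        while [ "$n" -lt "$NPROC" ] && [ "$i" -le "$NB_OF_SUBTASKS" ]; do
            worker "$i" &          # launch a subtask in the background
            n=$((n + 1))
            i=$((i + 1))
        done
        wait                       # let the whole batch finish first
    done
}

run_farm
```

On the cluster the same pattern is driven by sbatch/srun instead of background shell jobs, but the bookkeeping (problem size, number of slots, waiting for free slots) is the same.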

Further reading
- Introduction to Parallel Computing (by Lawrence Livermore National Laboratory): https://computing.llnl.gov/tutorials/parallel_comp/ (most of the images are taken from this tutorial)
- SLURM: https://computing.llnl.gov/linux/slurm/quickstart.html

Homework 4a
Prepare and execute a simple workflow on the Iridium cluster as a batch job.
Prepare:
- The workflow should be a series of Linux commands (e.g. ls, mkdir, cd, pwd, cat, ...)
- It should contain a sleep of a few minutes
- Save the workflow as a SLURM job description in a file homework4a.sh
Execute:
- Submit several jobs with sbatch
- List the queue status
- List the job status
- Cancel some of the jobs
Record your session on Iridium using the logging feature of screen:
- Log in to Iridium with ssh
- Edit your homework4a.sh SLURM script
- Then start up a screen session with logging enabled: screen -L
- Execute all the SLURM commands, then exit screen
Send the screenlog.0 file and the SLURM script as the solution of the homework.