Introduction to parallel computing and UPPMAX




Introduction to parallel computing and UPPMAX
Intro part of course in Parallel Image Analysis
Elias Rudberg, elias.rudberg@it.uu.se
March 22, 2011

Parallel computing
Parallel computing is becoming increasingly important nowadays, for example in:
- Physics
- Chemistry
- Biology
- Weather forecasts
- Image analysis!
- ...

Supercomputers and computing centers
Many computing centers / supercomputers exist in the world today. A few of the largest ones:
- National Supercomputing Center of Tianjin (China)
- Oak Ridge National Laboratory (USA)
- GSIC Center, Tokyo Institute of Technology (Japan)
- Commissariat à l'énergie atomique (CEA) (France)
- Forschungszentrum Jülich (Germany)
In Sweden, there are six computing centers organized through the Swedish National Infrastructure for Computing (SNIC).

Six computing centers in Sweden
SNIC centers:
- HPC2N in Umeå
- UPPMAX in Uppsala. Here!
- PDC in Stockholm
- NSC in Linköping
- C3SE in Göteborg
- LUNARC in Lund

UPPMAX
Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX). UPPMAX is Uppsala University's resource for high-performance computers, large-scale storage, and know-how of high-performance computing (HPC).


Kalkyl
The compute cluster Kalkyl was delivered in November 2009 and named after Professeur Tryphon Tournesol (Professor Karl Kalkyl in Swedish, Professor Cuthbert Calculus in English) from the comics by Hergé.

How does a compute cluster like Kalkyl work?
- Many compute nodes connected via a fast interconnect (InfiniBand).
- A few nodes act as login nodes.
- Each node is a multi-core computer running Linux.
- Users log in to the login nodes using ssh (Secure Shell protocol).
- Users run jobs on compute nodes through a queueing system.


How does each compute node work?
Each compute node is a multi-core computer. On Kalkyl, each node has 8 cores.
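Once logged in, there is an easy way to check the core count yourself. This is a generic Linux command (part of GNU coreutils), not specific to Kalkyl:

```shell
# Print the number of processing units (cores) available on this machine.
nproc
# On a Kalkyl compute node this would print 8.
```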

Linux
Each node on Kalkyl is running the Linux operating system; the version currently installed is Scientific Linux SL release 5.5. You can use Kalkyl even if your local computer is not running Linux:
- On a Mac, you can use ssh directly at the command prompt.
- On Windows, you can install and use an ssh program, for example PuTTY.
- Other operating systems most likely also provide some way of using ssh.

Ways to use a compute cluster like Kalkyl
There are several different ways of using a compute cluster:
- Run a serial program as a single-core job. Possibly run many such independent jobs in parallel. No parallel programming needed.
- Run a threaded program as a single-node job. The same program then uses all 8 cores in one compute node. Possibly 8 times faster compared to a single-core run.
- Run a program using distributed-memory parallelization with the Message Passing Interface (MPI). The same program uses N compute nodes. Possibly N × 8 times faster compared to a single-core run.

Ways to use a compute cluster like Kalkyl: single-core job
A serial program (a program not using threads) may run as a single-core job. Other users may have jobs running on other cores on the same compute node where your job runs.

Ways to use a compute cluster like Kalkyl: single-node job
A threaded program may run as a single-node job.

Ways to use a compute cluster like Kalkyl: multi-node job
A distributed-memory parallelized (typically MPI) program may run on several compute nodes.
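As a sketch of what submitting such a multi-node job could look like: the script below follows the same #SBATCH pattern as the single-node job script example later in this talk, but the program name and process count are made up for illustration.

```shell
#!/bin/bash
# Hypothetical job script for a 4-node MPI job on Kalkyl.
# The job name, project ID (-A), and program name are placeholders.
#SBATCH -p node            # request whole nodes
#SBATCH -N 4               # number of compute nodes
#SBATCH -t 01:00:00        # wall-time limit (hh:mm:ss)
#SBATCH -J my_mpi_job      # job name
#SBATCH -A g2011040        # project to charge

# With 4 nodes of 8 cores each, up to 32 MPI processes can run.
mpirun -np 32 ./my_mpi_program
```

The script would be submitted with sbatch, exactly like a single-node job.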

Queueing systems
As a user, you typically run your program on compute node(s) by submitting a job script through a queueing system. The queueing system assigns different priorities to different jobs depending on which project they belong to and how much of each project's time has been used. There are several different queueing systems, for example:
- SLURM (used on Kalkyl at UPPMAX!)
- Grid Engine (used on Grad)
- PBS/Torque/Maui (used at HPC2N)
- EASY scheduler (used at PDC)
We focus on SLURM.

Using the SLURM queueing system on Kalkyl
The Simple Linux Utility for Resource Management (SLURM) is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Useful commands for working with SLURM on Kalkyl:
- sbatch (submit a job)
- squeue (get info about previously submitted jobs)
- scancel (cancel a job)
- sinfo (get info about available resources)
- projinfo (get info about project, hours used etc.)

Example of job script
Example of a job script (filename jobscript.sh):

#!/bin/bash
#SBATCH -p node
#SBATCH -N 1
#SBATCH -t 00:30:00
#SBATCH -J eliasjob1
#SBATCH -A g2011040
./ergo -m 0002.xyz < egonparams.ego

To submit the job:

$ sbatch jobscript.sh
Submitted batch job 417772

Using the SLURM queueing system on Kalkyl
After having submitted job(s), use the squeue command to check the status of your jobs:

$ squeue -u eliasr
  JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
 417778      node eliasjob eliasr PD  0:00     1 (Priority)

If you have more than one job, squeue gives a list of jobs:

$ squeue -u eliasr
  JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
 417781     devel eliasjob eliasr  R  0:18     1 q33
 417778      node eliasjob eliasr  R  0:28     1 q144

Kalkyl User Guide
For more information about how to run jobs, see the Kalkyl User Guide available at the UPPMAX web page:
http://www.uppmax.uu.se/
http://www.uppmax.uu.se/support/user-guides/kalkyl-user-guide

Distributed file systems
- The same file system is accessible from all compute nodes.
- The same file system is also accessible from all UPPMAX clusters (currently Grad and Kalkyl).
- A local /scratch directory on each node can provide faster file operations, useful for temporary files.
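A common pattern is to stage data into the node-local scratch area, do the work there, and copy results back to the shared file system. Here is a minimal sketch; the SNIC_TMP variable (assumed to point at a per-job directory under /scratch) and the file names are illustrative assumptions, and the script falls back to a temporary directory so it can be tried on any Linux machine:

```shell
#!/bin/bash
# Sketch: use node-local scratch for temporary files, then copy results
# back to the shared file system. SNIC_TMP is an assumed per-job scratch
# directory; outside the cluster we fall back to mktemp.
WORKDIR="${SNIC_TMP:-$(mktemp -d)}"

# Stand-in for real input data on the shared file system.
echo "input data" > input.dat

# Stage in, compute locally (uppercasing as a stand-in for real work),
# then stage the result back out.
cp input.dat "$WORKDIR/input.dat"
tr 'a-z' 'A-Z' < "$WORKDIR/input.dat" > "$WORKDIR/output.dat"
cp "$WORKDIR/output.dat" output.dat

cat output.dat
```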

UPPMAX accounts and projects
Each person using UPPMAX needs a personal user account. The type and amount of resources accessible to each user depends on which project(s) the user belongs to. When submitting a job to the queueing system, you must specify which project should be charged with the run time for the job. When logged in to Kalkyl, you can check which projects you are a member of using the projinfo command:

$ projinfo g2011040
(Counting the number of core hours used since 2011-03-01/00:00:00 until now.)
Project     Used[h]   Current allocation [h/month]
-----------------------------------------------------
g2011040      11.49   2000
User
  cris         1.32
  eliasr      10.17

What will happen during this course?
- This week: start using the queueing system on Kalkyl to run single-core jobs.
- Week 2: Shared-memory parallelization for multi-core computers. OpenMP.
- Week 3: Parallelization on GPUs using OpenCL.
- Week 4: Distributed-memory parallelization. MPI.
- Week 5: Your own projects.