Juropa. Batch Usage Introduction. May 2014 Chrysovalantis Paschoulas

Size: px
Start display at page:

Download "Juropa. Batch Usage Introduction. May 2014 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de"

Transcription

1 Juropa Batch Usage Introduction May 2014 Chrysovalantis Paschoulas

2 Batch System Usage Model A Batch System: monitors and controls the resources on the system manages and schedules the jobs enforces limits and policies according to the Batch Model allocates the resources, sets up the environment and then runs the jobs Juropa Cluster (3289 compute nodes) Juropa JSC Partition 3033 compute nodes 64 nodes are dedicated to interactive jobs A few nodes are dedicated to software or system tests Rest of the compute nodes are used for normal batch jobs 256 compute nodes for a special partition

3 Batch System On Juropa for Batch System we use the combination of Moab and Torque Moab is the Workload Manager - Batch Scheduler Torque is the Resource Manager Moab/Torque Batch System: Manages policies, priorities, limits Starts jobs, manages output Provides features for advanced reservations, backfilling etc. Job statistics and accounting User commands for job submission, job query, job canceling etc.

4 Batch System Configuration & Limits In order to implement and manage the scheduling, the Batch System uses the abstraction of queues (classes), like jsc, inter_jsc etc. Class configuration: allowed users, list of nodes, max. limits, default values etc. Interactive jobs 64 compute nodes available No node Sharing Max. number of nodes: 16 (default: 1) Max. wall-clock time: 10h (default: 30 min) Max. running jobs: 20 per user (including batch jobs) Accounting: (number of nodes) x (connect-time)

5 Batch System Configuration & Limits Batch jobs 3033 compute nodes available No node sharing Max. number of nodes per job: 1024 Max. wall-clock time: 24h Max. number of running jobs: 20 per user Max. number of eligible jobs is limited to 20 per user Default values: Number of nodes: 1 Walltime limit: 30min Tasks per node: 8 Accounting: (number of nodes) x (wall-clock time) Jobs requesting more nodes than the limits can be run on special request Not included in normal scheduling Will be run e.g. once per week if needed during nonprime time Please contact sc@fz-juelich.de

6 Compiling Programs In order to compile and execute parallel programs on Juropa, there are available MPI wrappers for the Intel compilers. These wrappers build up the MPI environment for the compilation task. The current wrappers are: mpicc, mpicxx, mpif77, mpif90 Some useful compiler options: -openmp: enables OpenMP -g: creates debugging information -L: path to libraries for the linker -O[0-3]: optimization levels The wrappers (as many tools) are provided as modules and it is very easy for the users to choose the version of the compiler they want. module load parastation/mpi2 intel Compile example: mpicxx O2 program.cpp o mpi_program Execute MPI program: mpiexec <options> mpi_program

7 Job Submission After the compilation, the users can submit a job with the command: msub <options> <executable> or <jobscript> For example: msub l nodes=16:ppn=8 l walltime=04:00:00 e /lustre/jhome/zam/err o /lustre/jhome5/zam/out myscript Useful options of msub: l nodes=<num> # number of nodes l ppn=<num> # procs per node l walltime=<hh:mm:ss> # requested wall clock time j oe # combine stderr and stdout M < address> # send to this address m eab # send on end, abort or begin N <name> # name of job I # run an interactive job v tpt=<num threads> # number of OpenMP threads q <queue name> # destination queue (class) Instead of using msub with all this options in a single command line, it is possible for the users include all the job submission options in the batch script.

8 Batch Script The users in order to define the msub options in their batch scripts, they have to use #MSUB directives. For parallel jobs the users have to include the MPI execution command mpiexec. Example 1 In this example we will have an MPI application that will start 64 MPI tasks on 8 nodes using 8 cores per node. Each MPI task runs on a single core. #!/bin/bash x #MSUB l nodes=8:ppn=8 #MSUB l walltime=04:00:00 #MSUB e /home/jhome3/test_user/my error.txt #MSUB o /home/jhome3/test_user/my out.txt ### start of jobscript cd $PBS_O_WORKDIR echo "workdir: $PBS_O_WORKDIR" # NSLOTS = nodes * ppn = 8 * 8 = 64 NSLOTS=64 echo "running on $NSLOTS cpus " mpiexec np $NSLOTS./mpi_program

9 Batch Script Example 2 In the following example, the application will be started on 32 nodes using 1 MPI tasks per node, 32 tasks in total, and only one core will be used per node. #!/bin/bash x #MSUB N MPI_32x1_job #MSUB l nodes=32:ppn=1 ### start of jobscript ### mpiexec np 32 mpi_program >> $PBS_O_WORKDIR/out.$PBSJOBID Example 3 In this example we have a Hybrid job that uses MPI and OpenMP. The application will be started on 4 nodes, on each node 1 MPI tasks will be created, 4 tasks in total, and each task will have 8 OpenMP threads. #!/bin/bash x #MSUB N hybrid_8x8_job #MSUB l nodes=4:ppn=8 #MSUB v tpt=8 ### start of jobscript ### export OMP_NUM_THREADS=8 mpiexec np 4 exports=omp_num_threads mpi_omp_program

10 Task Allocation & SMT Task Allocation SMT To calculate the total number of MPI tasks for a job: total number of MPI tasks = (number of nodes)x(procs per node) When we have hybrid MPI/OpenMp jobs (option -tpt is used): total tasks = (number of nodes)x((procs per node)/(threads per task)) To calculate the number of MPI tasks per node for hybrid jobs: MPI tasks per node = (procs per node)/(threads per task) The compute nodes on Juropa support the SMT technology (Intel Xeon CPUs Nehalem Arch.). In order to use SMT for the jobs, the users have to set the msub option ppn=16. For example: #!/bin/bash x #MSUB N SMT_hybrid_8x8_job #MSUB l nodes=4:ppn=16 #MSUB v tpt=8 ### start of jobscript ### export OMP_NUM_THREADS=8 mpiexec np 8 exports=omp_num_threads application.exe

11 More Options Interactive Jobs In order to start an interactive job on Juropa, the users have to submit a job with the option I I of msub. If the resources are free, the users will automatically have access to the nodes and can start the application. msub I l nodes=2:ppn=8,walltime=00:15:00 Job dependencies & Job chains Users can submit a job defining dependencies msub W depend=<jobid> <jobscript> or even submit job chains #!/bin/bash NO_OF_JOBS=<no of jobs > JOB_SCRIPT=<jobscript> i=0 JOBID=$(msub $JOB_SCRIPT 2>&1 grep v e '^$' sed e 's/\s*//') while [ $i le $NO_OF_JOBS ]; do JOBID=$(msub W depend=afterok:$jobid $JOB_SCRIPT 2>&1 grep v e '^$' sed e 's/\s*//') let i=$i+1 done

12 Batch System - Commands msub Submit a job Returns job ID on success Note: during times of heavy load Moab might run into a timeout showq [-r -i -b] [-u <userid>] Shows all, running, idle or blocked jobs of all or specified users mjobctl -c <jobid> Cancel queued or running job mjobctl -c -w USER=<userID> Cancel all jobs of specified user checkjob [-v] <jobid> Display detailed information on a specified job

13 Batch System -Commands showstart <jobid> Shows estimated start-time of specified job. Estimated starttime can change while jobs with higher priority get scheduled mjobctl --help Shows all options, e.g. how to hold or resume holds on jobs showbf -c jsc Shows available resources for immediate use For more detailed information on Moab commands please see: Graphical view of usage, jobs, distribution of jobs, etc. llview Was developed by W.Frings, member of JSC

14 Batch System Job Scheduling Some comments on job scheduling Jobs are scheduled by priority Priority increases with number of nodes Priority increases during waiting (aging) No node sharing Backfilling mode Please specify the requested wall-time as exact as possible Jobs of users (groups) who ran out of cpu quota will get very low priority Please be aware of the fact that estimated start-time change very often

15 Job life-cycle 6. When there are enough free resources, Moab converts the job script into PBS syntax and tells the RM to start the job on the set of the associated nodes(calls qsub). Juropa Master Node MOAB Server 5. msub uses the submission filter to put the jobs into the proper queue and the job goes into the Moab queue. TORQUE Server 7. Torque starts the prologue on the compute nodes associated to the job. When all the required resources are available AND the nodes are in healthy condition then it starts the jobscript on the Mother Superior. Compute Nodes Login Nodes Compute node Users develop their Software. Mother Superior Compute node 02.. Compute node N 8. Mother Superior runs the jobscript and execute the mpiexec command. This communicates via psid daemons to start the MPI tasks on all compute nodes. And when is completed, the RM daemons run the epilogue on all nodes to clean up resources. 2. Compile with MPI Wrappers. 3. Create the batch script. 4. Submit their jobs with msub.

16 Batch System CPU quota & Accounting Query current status of cpu quota q_cpuquota <options> q_cpuquota -? # shows all available options Types of CPU quota Fixed: : a fixed amount of cpu quota can be used during the allocation period. (refers to small quota amounts) Monthly: : jobs will be scheduled with normal priority until current, previous and next monthly quota is exhausted. CPU quota not used in this time frame is lost. Charging mode Users will be charged for wall-clock time of their jobs on the set of nodes 3 states of contingent: normal, low-cont, no-cont (monthly quotas) CPU Quotas are defined per group All members of a group will be informed by mail if the group runs out of cpu quota or if new quota is assigned Jobs will get low priority (<0) and reduced wall-time limit (6 hours) when CPU quota is used up. They will only run if no other waiting jobs fit into free resources

17 Further Information Regular preventive maintenance every second Thursdays See Message of today at login Get recent status updates by subscribing to the system highmessages as described at the bottom of this page: Juropa on-line documentation User support at FZJ Phone:

Batch Scripts for RA & Mio

Batch Scripts for RA & Mio Batch Scripts for RA & Mio Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Jobs are Run via a Batch System Ra and Mio are shared resources Purpose: Give fair access to all users Have control over where jobs

More information

Miami University RedHawk Cluster Working with batch jobs on the Cluster

Miami University RedHawk Cluster Working with batch jobs on the Cluster Miami University RedHawk Cluster Working with batch jobs on the Cluster The RedHawk cluster is a general purpose research computing resource available to support the research community at Miami University.

More information

Job Scheduling with Moab Cluster Suite

Job Scheduling with Moab Cluster Suite Job Scheduling with Moab Cluster Suite IBM High Performance Computing February 2010 Y. Joanna Wong, Ph.D. yjw@us.ibm.com 2/22/2010 Workload Manager Torque Source: Adaptive Computing 2 Some terminology..

More information

Ra - Batch Scripts. Timothy H. Kaiser, Ph.D. tkaiser@mines.edu

Ra - Batch Scripts. Timothy H. Kaiser, Ph.D. tkaiser@mines.edu Ra - Batch Scripts Timothy H. Kaiser, Ph.D. tkaiser@mines.edu Jobs on Ra are Run via a Batch System Ra is a shared resource Purpose: Give fair access to all users Have control over where jobs are run Set

More information

Batch Usage on JURECA Introduction to Slurm. Nov 2015 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de

Batch Usage on JURECA Introduction to Slurm. Nov 2015 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de Batch Usage on JURECA Introduction to Slurm Nov 2015 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de Batch System Concepts (1) A cluster consists of a set of tightly connected "identical" computers

More information

Running applications on the Cray XC30 4/12/2015

Running applications on the Cray XC30 4/12/2015 Running applications on the Cray XC30 4/12/2015 1 Running on compute nodes By default, users do not log in and run applications on the compute nodes directly. Instead they launch jobs on compute nodes

More information

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt.

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt. SLURM: Resource Management and Job Scheduling Software Advanced Computing Center for Research and Education www.accre.vanderbilt.edu Simple Linux Utility for Resource Management But it s also a job scheduler!

More information

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt.

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt. SLURM: Resource Management and Job Scheduling Software Advanced Computing Center for Research and Education www.accre.vanderbilt.edu Simple Linux Utility for Resource Management But it s also a job scheduler!

More information

Job scheduler details

Job scheduler details Job scheduler details Advanced Computing Center for Research & Education (ACCRE) Job scheduler details 1 / 25 Outline 1 Batch queue system overview 2 Torque and Moab 3 Submitting jobs (ACCRE) Job scheduler

More information

Hodor and Bran - Job Scheduling and PBS Scripts

Hodor and Bran - Job Scheduling and PBS Scripts Hodor and Bran - Job Scheduling and PBS Scripts UND Computational Research Center Now that you have your program compiled and your input file ready for processing, it s time to run your job on the cluster.

More information

NYUAD HPC Center Running Jobs

NYUAD HPC Center Running Jobs NYUAD HPC Center Running Jobs 1 Overview... Error! Bookmark not defined. 1.1 General List... Error! Bookmark not defined. 1.2 Compilers... Error! Bookmark not defined. 2 Loading Software... Error! Bookmark

More information

JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert

JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert Mitglied der Helmholtz-Gemeinschaft JUROPA Linux Cluster An Overview 19 May 2014 Ulrich Detert JuRoPA JuRoPA Jülich Research on Petaflop Architectures Bull, Sun, ParTec, Intel, Mellanox, Novell, FZJ JUROPA

More information

Resource Management and Job Scheduling

Resource Management and Job Scheduling Resource Management and Job Scheduling Jenett Tillotson Senior Cluster System Administrator Indiana University May 18 18-22 May 2015 1 Resource Managers Keep track of resources Nodes: CPUs, disk, memory,

More information

Job Scheduling Explained More than you ever want to know about how jobs get scheduled on WestGrid systems...

Job Scheduling Explained More than you ever want to know about how jobs get scheduled on WestGrid systems... Job Scheduling Explained More than you ever want to know about how jobs get scheduled on WestGrid systems... Martin Siegert, SFU Cluster Myths There are so many jobs in the queue - it will take ages until

More information

NEC HPC-Linux-Cluster

NEC HPC-Linux-Cluster NEC HPC-Linux-Cluster Hardware configuration: 4 Front-end servers: each with SandyBridge-EP processors: 16 cores per node 128 GB memory 134 compute nodes: 112 nodes with SandyBridge-EP processors (16 cores

More information

Grid 101. Grid 101. Josh Hegie. grid@unr.edu http://hpc.unr.edu

Grid 101. Grid 101. Josh Hegie. grid@unr.edu http://hpc.unr.edu Grid 101 Josh Hegie grid@unr.edu http://hpc.unr.edu Accessing the Grid Outline 1 Accessing the Grid 2 Working on the Grid 3 Submitting Jobs with SGE 4 Compiling 5 MPI 6 Questions? Accessing the Grid Logging

More information

Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research

Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St

More information

Getting Started with HPC

Getting Started with HPC Getting Started with HPC An Introduction to the Minerva High Performance Computing Resource 17 Sep 2013 Outline of Topics Introduction HPC Accounts Logging onto the HPC Clusters Common Linux Commands Storage

More information

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014

Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014 Using WestGrid Patrick Mann, Manager, Technical Operations Jan.15, 2014 Winter 2014 Seminar Series Date Speaker Topic 5 February Gino DiLabio Molecular Modelling Using HPC and Gaussian 26 February Jonathan

More information

RA MPI Compilers Debuggers Profiling. March 25, 2009

RA MPI Compilers Debuggers Profiling. March 25, 2009 RA MPI Compilers Debuggers Profiling March 25, 2009 Examples and Slides To download examples on RA 1. mkdir class 2. cd class 3. wget http://geco.mines.edu/workshop/class2/examples/examples.tgz 4. tar

More information

Parallel Debugging with DDT

Parallel Debugging with DDT Parallel Debugging with DDT Nate Woody 3/10/2009 www.cac.cornell.edu 1 Debugging Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece

More information

Work Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015

Work Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015 Work Environment David Tur HPC Expert HPC Users Training September, 18th 2015 1. Atlas Cluster: Accessing and using resources 2. Software Overview 3. Job Scheduler 1. Accessing Resources DIPC technicians

More information

Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu

Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu Debugging and Profiling Lab Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma carlos@tacc.utexas.edu Setup Login to Ranger: - ssh -X username@ranger.tacc.utexas.edu Make sure you can export graphics

More information

PBS Tutorial. Fangrui Ma Universit of Nebraska-Lincoln. October 26th, 2007

PBS Tutorial. Fangrui Ma Universit of Nebraska-Lincoln. October 26th, 2007 PBS Tutorial Fangrui Ma Universit of Nebraska-Lincoln October 26th, 2007 Abstract In this tutorial we gave a brief introduction to using PBS Pro. We gave examples on how to write control script, and submit

More information

The Moab Scheduler. Dan Mazur, McGill HPC daniel.mazur@mcgill.ca Aug 23, 2013

The Moab Scheduler. Dan Mazur, McGill HPC daniel.mazur@mcgill.ca Aug 23, 2013 The Moab Scheduler Dan Mazur, McGill HPC daniel.mazur@mcgill.ca Aug 23, 2013 1 Outline Fair Resource Sharing Fairness Priority Maximizing resource usage MAXPS fairness policy Minimizing queue times Should

More information

Quick Tutorial for Portable Batch System (PBS)

Quick Tutorial for Portable Batch System (PBS) Quick Tutorial for Portable Batch System (PBS) The Portable Batch System (PBS) system is designed to manage the distribution of batch jobs and interactive sessions across the available nodes in the cluster.

More information

Using Parallel Computing to Run Multiple Jobs

Using Parallel Computing to Run Multiple Jobs Beowulf Training Using Parallel Computing to Run Multiple Jobs Jeff Linderoth August 5, 2003 August 5, 2003 Beowulf Training Running Multiple Jobs Slide 1 Outline Introduction to Scheduling Software The

More information

Installing and running COMSOL on a Linux cluster

Installing and running COMSOL on a Linux cluster Installing and running COMSOL on a Linux cluster Introduction This quick guide explains how to install and operate COMSOL Multiphysics 5.0 on a Linux cluster. It is a complement to the COMSOL Installation

More information

Linux für bwgrid. Sabine Richling, Heinz Kredel. Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim. 27.

Linux für bwgrid. Sabine Richling, Heinz Kredel. Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim. 27. Linux für bwgrid Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 27. June 2011 Richling/Kredel (URZ/RUM) Linux für bwgrid FS 2011 1 / 33 Introduction

More information

The RWTH Compute Cluster Environment

The RWTH Compute Cluster Environment The RWTH Compute Cluster Environment Tim Cramer 11.03.2013 Source: D. Both, Bull GmbH Rechen- und Kommunikationszentrum (RZ) How to login Frontends cluster.rz.rwth-aachen.de cluster-x.rz.rwth-aachen.de

More information

Advanced PBS Workflow Example Bill Brouwer 05/01/12 Research Computing and Cyberinfrastructure Unit, PSU wjb19@psu.edu

Advanced PBS Workflow Example Bill Brouwer 05/01/12 Research Computing and Cyberinfrastructure Unit, PSU wjb19@psu.edu Advanced PBS Workflow Example Bill Brouwer 050112 Research Computing and Cyberinfrastructure Unit, PSU wjb19@psu.edu 0.0 An elementary workflow All jobs consuming significant cycles need to be submitted

More information

Martinos Center Compute Clusters

Martinos Center Compute Clusters Intro What are the compute clusters How to gain access Housekeeping Usage Log In Submitting Jobs Queues Request CPUs/vmem Email Status I/O Interactive Dependencies Daisy Chain Wrapper Script In Progress

More information

SLURM Workload Manager

SLURM Workload Manager SLURM Workload Manager What is SLURM? SLURM (Simple Linux Utility for Resource Management) is the native scheduler software that runs on ASTI's HPC cluster. Free and open-source job scheduler for the Linux

More information

Tutorial: Using WestGrid. Drew Leske Compute Canada/WestGrid Site Lead University of Victoria

Tutorial: Using WestGrid. Drew Leske Compute Canada/WestGrid Site Lead University of Victoria Tutorial: Using WestGrid Drew Leske Compute Canada/WestGrid Site Lead University of Victoria Fall 2013 Seminar Series Date Speaker Topic 23 September Lindsay Sill Introduction to WestGrid 9 October Drew

More information

Guillimin HPC Users Meeting. Bryan Caron

Guillimin HPC Users Meeting. Bryan Caron November 13, 2014 Bryan Caron bryan.caron@mcgill.ca bryan.caron@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Outline Compute Canada News October Service Interruption

More information

Streamline Computing Linux Cluster User Training. ( Nottingham University)

Streamline Computing Linux Cluster User Training. ( Nottingham University) 1 Streamline Computing Linux Cluster User Training ( Nottingham University) 3 User Training Agenda System Overview System Access Description of Cluster Environment Code Development Job Schedulers Running

More information

Mitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform

Mitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform Mitglied der Helmholtz-Gemeinschaft System monitoring with LLview and the Parallel Tools Platform November 25, 2014 Carsten Karbach Content 1 LLview 2 Parallel Tools Platform (PTP) 3 Latest features 4

More information

How to Run Parallel Jobs Efficiently

How to Run Parallel Jobs Efficiently How to Run Parallel Jobs Efficiently Shao-Ching Huang High Performance Computing Group UCLA Institute for Digital Research and Education May 9, 2013 1 The big picture: running parallel jobs on Hoffman2

More information

An introduction to compute resources in Biostatistics. Chris Scheller schelcj@umich.edu

An introduction to compute resources in Biostatistics. Chris Scheller schelcj@umich.edu An introduction to compute resources in Biostatistics Chris Scheller schelcj@umich.edu 1. Resources 1. Hardware 2. Account Allocation 3. Storage 4. Software 2. Usage 1. Environment Modules 2. Tools 3.

More information

Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research

Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research ! Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research! Cynthia Cornelius! Center for Computational Research University at Buffalo, SUNY! cdc at

More information

The Asterope compute cluster

The Asterope compute cluster The Asterope compute cluster ÅA has a small cluster named asterope.abo.fi with 8 compute nodes Each node has 2 Intel Xeon X5650 processors (6-core) with a total of 24 GB RAM 2 NVIDIA Tesla M2050 GPGPU

More information

Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises

Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Pierre-Yves Taunay Research Computing and Cyberinfrastructure 224A Computer Building The Pennsylvania State University University

More information

Introduction to SDSC systems and data analytics software packages "

Introduction to SDSC systems and data analytics software packages Introduction to SDSC systems and data analytics software packages " Mahidhar Tatineni (mahidhar@sdsc.edu) SDSC Summer Institute August 05, 2013 Getting Started" System Access Logging in Linux/Mac Use available

More information

A highly configurable and efficient simulator for job schedulers on supercomputers

A highly configurable and efficient simulator for job schedulers on supercomputers Mitglied der Helmholtz-Gemeinschaft A highly configurable and efficient simulator for job schedulers on supercomputers April 12, 2013 Carsten Karbach, Jülich Supercomputing Centre (JSC) Motivation Objective

More information

Using NeSI HPC Resources. NeSI Computational Science Team (support@nesi.org.nz)

Using NeSI HPC Resources. NeSI Computational Science Team (support@nesi.org.nz) NeSI Computational Science Team (support@nesi.org.nz) Outline 1 About Us About NeSI Our Facilities 2 Using the Cluster Suitable Work What to expect Parallel speedup Data Getting to the Login Node 3 Submitting

More information

Mitglied der Helmholtz-Gemeinschaft UNICORE. Uniform Access to JSC Resources. Michael Rambadt, unicore-info@fz-juelich.de. 20.

Mitglied der Helmholtz-Gemeinschaft UNICORE. Uniform Access to JSC Resources. Michael Rambadt, unicore-info@fz-juelich.de. 20. Mitglied der Helmholtz-Gemeinschaft UNICORE Uniform Access to JSC Resources 20. Mai 2014 Michael Rambadt, unicore-info@fz-juelich.de Outline Introduction Features UNICORE Portal UNICORE Rich Client UNICORE

More information

8/15/2014. Best Practices @OLCF (and more) General Information. Staying Informed. Staying Informed. Staying Informed-System Status

8/15/2014. Best Practices @OLCF (and more) General Information. Staying Informed. Staying Informed. Staying Informed-System Status Best Practices @OLCF (and more) Bill Renaud OLCF User Support General Information This presentation covers some helpful information for users of OLCF Staying informed Aspects of system usage that may differ

More information

Submitting and Running Jobs on the Cray XT5

Submitting and Running Jobs on the Cray XT5 Submitting and Running Jobs on the Cray XT5 Richard Gerber NERSC User Services RAGerber@lbl.gov Joint Cray XT5 Workshop UC-Berkeley Outline Hopper in blue; Jaguar in Orange; Kraken in Green XT5 Overview

More information

Using the Windows Cluster

Using the Windows Cluster Using the Windows Cluster Christian Terboven terboven@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster

More information

Introduction to HPC Workshop. Center for e-research (eresearch@nesi.org.nz)

Introduction to HPC Workshop. Center for e-research (eresearch@nesi.org.nz) Center for e-research (eresearch@nesi.org.nz) Outline 1 About Us About CER and NeSI The CS Team Our Facilities 2 Key Concepts What is a Cluster Parallel Programming Shared Memory Distributed Memory 3 Using

More information

Submitting batch jobs Slurm on ecgate. Xavi Abellan xavier.abellan@ecmwf.int User Support Section

Submitting batch jobs Slurm on ecgate. Xavi Abellan xavier.abellan@ecmwf.int User Support Section Submitting batch jobs Slurm on ecgate Xavi Abellan xavier.abellan@ecmwf.int User Support Section Slide 1 Outline Interactive mode versus Batch mode Overview of the Slurm batch system on ecgate Batch basic

More information

LoadLeveler Overview. January 30-31, 2012. IBM Storage & Technology Group. IBM HPC Developer Education @ TIFR, Mumbai

LoadLeveler Overview. January 30-31, 2012. IBM Storage & Technology Group. IBM HPC Developer Education @ TIFR, Mumbai IBM HPC Developer Education @ TIFR, Mumbai IBM Storage & Technology Group LoadLeveler Overview January 30-31, 2012 Pidad D'Souza (pidsouza@in.ibm.com) IBM, System & Technology Group 2009 IBM Corporation

More information

Grid Engine Basics. Table of Contents. Grid Engine Basics Version 1. (Formerly: Sun Grid Engine)

Grid Engine Basics. Table of Contents. Grid Engine Basics Version 1. (Formerly: Sun Grid Engine) Grid Engine Basics (Formerly: Sun Grid Engine) Table of Contents Table of Contents Document Text Style Associations Prerequisites Terminology What is the Grid Engine (SGE)? Loading the SGE Module on Turing

More information

To connect to the cluster, simply use a SSH or SFTP client to connect to:

To connect to the cluster, simply use a SSH or SFTP client to connect to: RIT Computer Engineering Cluster The RIT Computer Engineering cluster contains 12 computers for parallel programming using MPI. One computer, cluster-head.ce.rit.edu, serves as the master controller or

More information

Batch Systems. provide a mechanism for submitting, launching, and tracking jobs on a shared resource

Batch Systems. provide a mechanism for submitting, launching, and tracking jobs on a shared resource PBS INTERNALS PBS & TORQUE PBS (Portable Batch System)-software system for managing system resources on workstations, SMP systems, MPPs and vector computers. It was based on Network Queuing System (NQS)

More information

1.0. User Manual For HPC Cluster at GIKI. Volume. Ghulam Ishaq Khan Institute of Engineering Sciences & Technology

1.0. User Manual For HPC Cluster at GIKI. Volume. Ghulam Ishaq Khan Institute of Engineering Sciences & Technology Volume 1.0 FACULTY OF CUMPUTER SCIENCE & ENGINEERING Ghulam Ishaq Khan Institute of Engineering Sciences & Technology User Manual For HPC Cluster at GIKI Designed and prepared by Faculty of Computer Science

More information

OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware

OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware OpenMP & MPI CISC 879 Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware 1 Lecture Overview Introduction OpenMP MPI Model Language extension: directives-based

More information

Running on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF)

Running on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF) Running on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF) ALCF Resources: Machines & Storage Mira (Production) IBM Blue Gene/Q 49,152 nodes / 786,432 cores 768 TB of memory Peak flop rate:

More information

Cluster@WU User s Manual

Cluster@WU User s Manual Cluster@WU User s Manual Stefan Theußl Martin Pacala September 29, 2014 1 Introduction and scope At the WU Wirtschaftsuniversität Wien the Research Institute for Computational Methods (Forschungsinstitut

More information

Using the Yale HPC Clusters

Using the Yale HPC Clusters Using the Yale HPC Clusters Stephen Weston Robert Bjornson Yale Center for Research Computing Yale University Oct 2015 To get help Send an email to: hpc@yale.edu Read documentation at: http://research.computing.yale.edu/hpc-support

More information

High Performance Computing Facility Specifications, Policies and Usage. Supercomputer Project. Bibliotheca Alexandrina

High Performance Computing Facility Specifications, Policies and Usage. Supercomputer Project. Bibliotheca Alexandrina High Performance Computing Facility Specifications, Policies and Usage Supercomputer Project Bibliotheca Alexandrina Bibliotheca Alexandrina 1/16 Topics Specifications Overview Site Policies Intel Compilers

More information

A Performance Data Storage and Analysis Tool

A Performance Data Storage and Analysis Tool A Performance Data Storage and Analysis Tool Steps for Using 1. Gather Machine Data 2. Build Application 3. Execute Application 4. Load Data 5. Analyze Data 105% Faster! 72% Slower Build Application Execute

More information

Biowulf2 Training Session

Biowulf2 Training Session Biowulf2 Training Session 9 July 2015 Slides at: h,p://hpc.nih.gov/docs/b2training.pdf HPC@NIH website: h,p://hpc.nih.gov System hardware overview What s new/different The batch system & subminng jobs

More information

OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group

OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may

More information

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Goals of the session Overview of parallel MATLAB Why parallel MATLAB? Multiprocessing in MATLAB Parallel MATLAB using the Parallel Computing

More information

HPC-Nutzer Informationsaustausch. The Workload Management System LSF

HPC-Nutzer Informationsaustausch. The Workload Management System LSF HPC-Nutzer Informationsaustausch The Workload Management System LSF Content Cluster facts Job submission esub messages Scheduling strategies Tools and security Future plans 2 von 10 Some facts about the

More information

PBSPro scheduling. PBS overview Qsub command: resource requests. Queues a7ribu8on. Fairshare. Backfill Jobs submission.

PBSPro scheduling. PBS overview Qsub command: resource requests. Queues a7ribu8on. Fairshare. Backfill Jobs submission. PBSPro scheduling PBS overview Qsub command: resource requests Queues a7ribu8on Fairshare Backfill Jobs submission 9 mai 03 PBS PBS overview 9 mai 03 PBS PBS organiza5on: daemons frontend compute nodes

More information

Manual for using Super Computing Resources

Manual for using Super Computing Resources Manual for using Super Computing Resources Super Computing Research and Education Centre at Research Centre for Modeling and Simulation National University of Science and Technology H-12 Campus, Islamabad

More information

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University HPC at IU Overview Abhinav Thota Research Technologies Indiana University What is HPC/cyberinfrastructure? Why should you care? Data sizes are growing Need to get to the solution faster Compute power is

More information

A High Performance Computing Scheduling and Resource Management Primer

A High Performance Computing Scheduling and Resource Management Primer LLNL-TR-652476 A High Performance Computing Scheduling and Resource Management Primer D. H. Ahn, J. E. Garlick, M. A. Grondona, D. A. Lipari, R. R. Springmeyer March 31, 2014 Disclaimer This document was

More information

Debugging with TotalView

Debugging with TotalView Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich

More information

SGE Roll: Users Guide. Version @VERSION@ Edition

SGE Roll: Users Guide. Version @VERSION@ Edition SGE Roll: Users Guide Version @VERSION@ Edition SGE Roll: Users Guide : Version @VERSION@ Edition Published Aug 2006 Copyright 2006 UC Regents, Scalable Systems Table of Contents Preface...i 1. Requirements...1

More information

Introduction to the SGE/OGS batch-queuing system

Introduction to the SGE/OGS batch-queuing system Grid Computing Competence Center Introduction to the SGE/OGS batch-queuing system Riccardo Murri Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich Oct. 6, 2011 The basic

More information

General Overview. Slurm Training15. Alfred Gil & Jordi Blasco (HPCNow!)

General Overview. Slurm Training15. Alfred Gil & Jordi Blasco (HPCNow!) Slurm Training15 Agenda 1 2 3 About Slurm Key Features of Slurm Extending Slurm Resource Management Daemons Job/step allocation 4 5 SMP MPI Parametric Job monitoring Accounting Scheduling Administration

More information

Parallel Processing using the LOTUS cluster

Parallel Processing using the LOTUS cluster Parallel Processing using the LOTUS cluster Alison Pamment / Cristina del Cano Novales JASMIN/CEMS Workshop February 2015 Overview Parallelising data analysis LOTUS HPC Cluster Job submission on LOTUS

More information

Moab and TORQUE Highlights CUG 2015

Moab and TORQUE Highlights CUG 2015 Moab and TORQUE Highlights CUG 2015 David Beer TORQUE Architect 28 Apr 2015 Gary D. Brown HPC Product Manager 1 Agenda NUMA-aware Heterogeneous Jobs Ascent Project Power Management and Energy Accounting

More information

The CNMS Computer Cluster

The CNMS Computer Cluster The CNMS Computer Cluster This page describes the CNMS Computational Cluster, how to access it, and how to use it. Introduction (2014) The latest block of the CNMS Cluster (2010) Previous blocks of the

More information

Job Scheduling on a Large UV 1000. Chad Vizino SGI User Group Conference May 2011. 2011 Pittsburgh Supercomputing Center

Job Scheduling on a Large UV 1000. Chad Vizino SGI User Group Conference May 2011. 2011 Pittsburgh Supercomputing Center Job Scheduling on a Large UV 1000 Chad Vizino SGI User Group Conference May 2011 Overview About PSC s UV 1000 Simon UV Distinctives UV Operational issues Conclusion PSC s UV 1000 - Blacklight Blacklight

More information

HPCC - Hrothgar Getting Started User Guide MPI Programming

HPCC - Hrothgar Getting Started User Guide MPI Programming HPCC - Hrothgar Getting Started User Guide MPI Programming High Performance Computing Center Texas Tech University HPCC - Hrothgar 2 Table of Contents 1. Introduction... 3 2. Setting up the environment...

More information

Introduction to Sun Grid Engine (SGE)

Introduction to Sun Grid Engine (SGE) Introduction to Sun Grid Engine (SGE) What is SGE? Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions. Sponsored by Sun Microsystems

More information

Stanford HPC Conference. Panasas Storage System Integration into a Cluster

Stanford HPC Conference. Panasas Storage System Integration into a Cluster Stanford HPC Conference Panasas Storage System Integration into a Cluster David Yu Industry Verticals Panasas Inc. Steve Jones Technology Operations Manager Institute for Computational and Mathematical

More information

OLCF Best Practices. Bill Renaud OLCF User Assistance Group

OLCF Best Practices. Bill Renaud OLCF User Assistance Group OLCF Best Practices Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may differ from

More information

Batch Scheduling on the Cray XT3

Batch Scheduling on the Cray XT3 Batch Scheduling on the Cray XT3 Chad Vizino, Nathan Stone, John Kochmar, J. Ray Scott {vizino,nstone,kochmar,scott}@psc.edu Pittsburgh Supercomputing Center ABSTRACT: The Pittsburgh Supercomputing Center

More information

Broadening Moab/TORQUE for Expanding User Needs

Broadening Moab/TORQUE for Expanding User Needs Broadening Moab/TORQUE for Expanding User Needs Gary D. Brown HPC Product Manager CUG 2016 1 2016 Adaptive Computing Enterprises, Inc. Agenda DataWarp Intel MIC KNL Viewpoint Web Portal User Portal Administrator

More information

CNAG User s Guide. Barcelona Supercomputing Center Copyright c 2015 BSC-CNS December 18, 2015. 1 Introduction 2

CNAG User s Guide. Barcelona Supercomputing Center Copyright c 2015 BSC-CNS December 18, 2015. 1 Introduction 2 CNAG User s Guide Barcelona Supercomputing Center Copyright c 2015 BSC-CNS December 18, 2015 Contents 1 Introduction 2 2 System Overview 2 3 Connecting to CNAG cluster 2 3.1 Transferring files...................................

More information

Best Practice mini-guide "Stokes"

Best Practice mini-guide Stokes SGI Altix ICE at ICHEC Michael Lysaght, ICHEC Niall Wilson, ICHEC Eoin McHugh, ICHEC Michael Browne, ICHEC Gilles Civario, ICHEC February 2013 1 Table of Contents 1. Introduction... 3 2. System architecture

More information

Grid Engine Users Guide. 2011.11p1 Edition

Grid Engine Users Guide. 2011.11p1 Edition Grid Engine Users Guide 2011.11p1 Edition Grid Engine Users Guide : 2011.11p1 Edition Published Nov 01 2012 Copyright 2012 University of California and Scalable Systems This document is subject to the

More information

Allinea Performance Reports User Guide. Version 6.0.6

Allinea Performance Reports User Guide. Version 6.0.6 Allinea Performance Reports User Guide Version 6.0.6 Contents Contents 1 1 Introduction 4 1.1 Online Resources...................................... 4 2 Installation 5 2.1 Linux/Unix Installation...................................

More information

Grid Engine 6. Troubleshooting. BioTeam Inc. info@bioteam.net

Grid Engine 6. Troubleshooting. BioTeam Inc. info@bioteam.net Grid Engine 6 Troubleshooting BioTeam Inc. info@bioteam.net Grid Engine Troubleshooting There are two core problem types Job Level Cluster seems OK, example scripts work fine Some user jobs/apps fail Cluster

More information

DiskPulse DISK CHANGE MONITOR

DiskPulse DISK CHANGE MONITOR DiskPulse DISK CHANGE MONITOR User Manual Version 7.9 Oct 2015 www.diskpulse.com info@flexense.com 1 1 DiskPulse Overview...3 2 DiskPulse Product Versions...5 3 Using Desktop Product Version...6 3.1 Product

More information

Informationsaustausch für Nutzer des Aachener HPC Clusters

Informationsaustausch für Nutzer des Aachener HPC Clusters Informationsaustausch für Nutzer des Aachener HPC Clusters Paul Kapinos, Marcus Wagner - 21.05.2015 Informationsaustausch für Nutzer des Aachener HPC Clusters Agenda (The RWTH Compute cluster) Project-based

More information

Introduction to Linux and Cluster Basics for the CCR General Computing Cluster

Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY 14203 Phone: 716-881-8959

More information

On-demand (Pay-per-Use) HPC Service Portal

On-demand (Pay-per-Use) HPC Service Portal On-demand (Pay-per-Use) Portal Wang Junhong INTRODUCTION High Performance Computing, Computer Centre The Service Portal is a key component of the On-demand (pay-per-use) HPC service delivery. The Portal,

More information

High-Performance Reservoir Risk Assessment (Jacta Cluster)

High-Performance Reservoir Risk Assessment (Jacta Cluster) High-Performance Reservoir Risk Assessment (Jacta Cluster) SKUA-GOCAD 2013.1 Paradigm 2011.3 With Epos 4.1 Data Management Configuration Guide 2008 2013 Paradigm Ltd. or its affiliates and subsidiaries.

More information

MATLAB Distributed Computing Server System Administrator's Guide

MATLAB Distributed Computing Server System Administrator's Guide MATLAB Distributed Computing Server System Administrator's Guide R2015b How to Contact MathWorks Latest news: www.mathworks.com Sales and services: www.mathworks.com/sales_and_services User community:

More information

Until now: tl;dr: - submit a job to the scheduler

Until now: tl;dr: - submit a job to the scheduler Until now: - access the cluster copy data to/from the cluster create parallel software compile code and use optimized libraries how to run the software on the full cluster tl;dr: - submit a job to the

More information

GC3: Grid Computing Competence Center Cluster computing, I Batch-queueing systems

GC3: Grid Computing Competence Center Cluster computing, I Batch-queueing systems GC3: Grid Computing Competence Center Cluster computing, I Batch-queueing systems Riccardo Murri, Sergio Maffioletti Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich

More information

High Performance Computing at the Oak Ridge Leadership Computing Facility

High Performance Computing at the Oak Ridge Leadership Computing Facility Page 1 High Performance Computing at the Oak Ridge Leadership Computing Facility Page 2 Outline Our Mission Computer Systems: Present, Past, Future Challenges Along the Way Resources for Users Page 3 Our

More information

Cluster Computing With R

Cluster Computing With R Cluster Computing With R Stowers Institute for Medical Research R/Bioconductor Discussion Group Earl F. Glynn Scientific Programmer 18 December 2007 1 Cluster Computing With R Accessing Linux Boxes from

More information

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative

More information