Using the general purpose computing clusters at EPFL
|
|
- Hugh Stewart Kennedy
- 7 years ago
- Views:
Transcription
1 Using the general purpose computing clusters at EPFL scitas.epfl.ch March 5, 2015
2 1 / 1 Bellatrix Frontend at bellatrix.epfl.ch 16 x 2.2 GHz cores per node 424 nodes with 32GB Infiniband QDR network The batch system is SLURM RedHat 6.5
3 Castor I Frontend at castor.epfl.ch I 16 x 2.6 GHz cores per node I 50 nodes with 64GB I 2 nodes with 256GB I For sequential jobs (Matlab etc.) I The batch system is SLURM I RedHat 6.5 2/1
4 3 / 1 Deneb Frontend at deneb.epfl.ch 16 x 2.6 GHz cores per node 376 nodes with 64GB 8 nodes with 256GB 2 nodes with 512GB and 32 cores 16 nodes with 4 Nvidia K40 GPUs Infiniband QDR network The batch system is SLURM RedHat 6.5
5 4 / 1 Storage /home filesystem has per user quotas will be backed up for important things (source code, results and theses) shared by all clusters /scratch high performance temporary space is not backed up is organised by laboratory local to cluster
6 5 / 1 Connection Start the X server (automatic on a Mac) Open a terminal ssh -Y username@castor.epfl.ch Try the following commands: id pwd quota ls /scratch/<group>/<username>
7 6 / 1 The batch system Goal: to take a list of jobs and execute them when appropriate resources become available SCITAS uses SLURM on its clusters: The configuration depends on the purpose of the cluster (serial vs parallel)
8 7 / 1 sbatch The fundamental command is sbatch sbatch submits jobs to the batch system Suggested workflow: create a short job-script submit it to the batch system
9 8 / 1 sbatch - exercise Copy the first two examples to your home directory cp /scratch/examples/ex1.run. cp /scratch/examples/ex2.run. Open the file ex1.run with your editor of choice
10 9 / 1 ex1.run #!/bin/bash #SBATCH --workdir /scratch/<group>/<username> #SBATCH --nodes 1 #SBATCH --ntasks 1 #SBATCH --cpus-per-task 1 #SBATCH --mem 1024 sleep 10 echo "hello from $(hostname)" sleep 10
11 10 / 1 ex1.run #SBATCH is a directive to the batch system --nodes 1 the number of nodes to use - on Castor this is limited to 1 --ntasks 1 the number of tasks (in an MPI sense) to run per job --cpu-per-task 1 the number of cores per aforementioned task --mem 4096 the memory required per node in MB --time 12:00:00 --time 2-6 the time required # 12 hours # two days and six hours
12 11 / 1 Running ex1.run The job is assigned a default runtime of 15 minutes $ sbatch ex1.run Submitted batch job 439 $ cat /scratch/<group>/<username>/slurm-439.out hello from c03
13 12 / 1 What went on? sacct -j <JOB_ID> sacct -l -j <JOB_ID> Or more usefully: Sjob <JOB ID>
14 13 / 1 Cancelling jobs To cancel a specific job: scancel <JOB_ID> To cancel all your jobs: scancel -u <username>
15 14 / 1 ex2.run #!/bin/bash #SBATCH --workdir /scratch/<group>/<username> #SBATCH --nodes 1 #SBATCH --ntasks 1 #SBATCH --cpus-per-task 8 #SBATCH --mem #SBATCH --time 00:30:00 #SBATCH --mail-type=all #SBATCH --mail-user=< _address> /scratch/examples/linpack/runme_1_45k
16 15 / 1 What s going on? squeue squeue -u <username> Squeue Sjob <JOB_ID> scontrol -d show job <JOB_ID> sinfo
17 16 / 1 squeue and Squeue squeue Job states: Pending, Resources, Priority, Running squeue grep <JOB_ID> squeue -j <JOB_ID> Squeue <JOB_ID>
18 17 / 1 Sjob $ Sjob <JOB_ID> JobID JobName Cluster Account Partition Timelimit User Group ex1.run castor scitas-ge serial 00:15:00 jmenu scitas-ge batch batch castor scitas-ge Submit Eligible Start End T15:55: T15:55: T15:55: T15:56: T15:55: T15:55: T15:55: T15:56:08 Elapsed ExitCode State :00:20 0:0 COMPLETED 00:00:20 0:0 COMPLETED NCPUS NTasks NodeList UserCPU SystemCPU AveCPU MaxVMSize c04 00:00:00 00: c04 00:00:00 00: :00: K
19 18 / 1 scontrol $ scontrol -d show job <JOB_ID> $ scontrol -d show job 400 obid=400 Name=s1.job UserId=user(123456) GroupId=group(654321) Priority=111 Account=scitas-ge QOS=normal JobState=RUNNING Reason=None Dependency=(null) Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0 DerivedExitCode=0:0 RunTime=00:03:39 TimeLimit=00:15:00 TimeMin=N/A SubmitTime= T09:45:27 EligibleTime= T09:45:27 StartTime= T09:45:27 EndTime= T10:00:27 PreemptTime=None SuspendTime=None SecsPreSuspend=0 Partition=serial AllocNode:Sid=castor: ReqNodeList=(null) ExcNodeList=(null) NodeList=c03 BatchHost=c03 NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:* Nodes=c03 CPU IDs=0 Mem=1024 MinCPUsNode=1 MinMemoryCPU=1024M MinTmpDiskNode=0 Features=(null) Gres=(null) Reservation=(null) Shared=OK Contiguous=0 Licenses=(null) Network=(null) Command=/home/<user>/jobs/s1.job WorkDir=/scratch/<group>/<user>
20 19 / 1 Modules Modules make your life easier module avail module show <take your pick> module load <take your pick> module list module purge module list
21 20 / 1 ex3.run - Mathematica Copy the following files to your chosen directory: cp /scratch/examples/ex3.run. cp /scratch/examples/mathematica.in. Submit ex3.run to the batch system and see what happens...
22 21 / 1 ex3.run #!/bin/bash #SBATCH --ntasks 1 #SBATCH --cpus-per-task 1 #SBATCH --nodes 1 #SBATCH --mem 4096 #SBATCH --time 00:05:00 echo STARTING AT date module purge module load mathematica/9.0.1 math < mathematica.in echo FINISHED at date
23 22 / 1 Compiling ex4.* source files Copy the following files to your chosen directory: /scratch/examples/ex4_readme.txt /scratch/examples/ex4.c /scratch/examples/ex4.cxx /scratch/examples/ex4.f90 /scratch/examples/ex4.run Then compile them with: module load intelmpi/4.1.3 mpiicc -o ex4 c ex4.c mpiicpc -o ex4 cxx ex4.cxx mpiifort -o ex4 f90 ex4.f90
24 23 / 1 The 3 methods to get interactive access 1/3 In order to schedule an allocation use salloc with exactly the same options for resources as sbatch You will then arrive in a new prompt which is still on the submission node but by using srun you can get access to the allocated resources eroche@castor:hello > salloc -N 1 -n 2 salloc: Granted job allocation 1234 bash-4.1$ hostname castor bash-4.1$ srun hostname c03 c03
25 24 / 1 The 3 methods to get interactive access 2/3 To get a prompt on the machine one needs to use the --pty option with srun and then bash -i (or tcsh -i ) to get the shell: eroche@castor > salloc -N 1 -n 1 salloc: Granted job allocation 1235 eroche@castor > srun --pty bash -i bash-4.1$ hostname c03
26 25 / 1 The 3 methods to get interactive access 3/3 This is the least elegant but it is the method by which one can run X11 applications: eroche@bellatrix > salloc -n 1 -c 16 -N 1 salloc: Granted job allocation 1236 bash-4.1$ srun hostname c04 bash-4.1$ ssh -Y c04 eroche@c04 >
27 26 / 1 The SLURM jobs environment SLURM defines a number of variables, among them: jmenu@castor: > srun env grep SLURM SLURM SUBMIT DIR=/home/jmenu SLURM SUBMIT HOST=castor SLURM JOB ID=90771 SLURM JOB NUM NODES=1 SLURM JOB NODELIST=c05 SLURM JOB CPUS PER NODE=1 SLURM MEM PER CPU=1024 SLURM JOBID= SLURM NODELIST=c05 SLURM TASKS PER NODE=1 SLURM NTASKS=1 SLURM NPROCS=1... SLURM CPUS ON NODE=1 SLURM TASK PID= SLURMD NODENAME=c05
28 27 / 1 Dynamic libs used in an application ldd displays the libraries an executable file depends on: jmenu@castor: ~ /COURS > ldd ex4 f90 linux-vdso.so.1 => (0x00007fff4b905000) libmpigf.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpigf.so.4 (0x00007f556cf88000) libmpi.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpi.so.4 (0x00007f556c91c000) libdl.so.2 => /lib64/libdl.so.2 (0x e00000) librt.so.1 => /lib64/librt.so.1 (0x a00000) libpthread.so.0 => /lib64/libpthread.so.0 (0x ) libm.so.6 => /lib64/libm.so.6 (0x ) libc.so.6 => /lib64/libc.so.6 (0x a00000) libgcc s.so.1 => /lib64/libgcc s.so.1 (0x c600000) /lib64/ld-linux-x86-64.so.2 (0x )
29 28 / 1 ex4.run #!/bin/bash... module purge module load intelmpi/4.1.3 module list echo LAUNCH DIR=/scratch/scitas-ge/jmenu EXECUTABLE="./ex4 f90" echo "--> LAUNCH DIR = ${LAUNCH DIR}" echo "--> EXECUTABLE = ${EXECUTABLE}" echo echo "--> ${EXECUTABLE} depends on the following dynamic libraries:" ldd ${EXECUTABLE} echo cd ${LAUNCH DIR} srun ${EXECUTABLE}...
30 29 / 1 The debug partition In order to have priority access for debugging sbatch --partition=debug ex1.run Limits on Castor: 1 hours walltime 16 cores between all users
31 30 / 1 General: cgroups (Castor) cgroups ( control groups ) is a Linux kernel feature to limit, account, and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups SLURM: Linux cgroups apply contraints to the CPUs and memory that can be used by a job They are automatically generated using the resource requests given to SLURM They are automatically destroyed at the end of the job, thus releasing all resources used Even if there is physical memory available a task will be killed if it tries to exceed the limits of the cgroup!
32 31 / 1 System process view Two tasks running on the same node with ps auxf root slurmstepd: [1072] user \_ /bin/bash /var/spool/slurmd/job01072/slurm_script user \_ sleep 10 root slurmstepd: [1073] user \_ /bin/bash /var/spool/slurmd/job01073/slurm_script user \_ sleep 10 Check memory, thread and core usage with htop
33 32 / 1 Fair share 1/3 The scheduler is configured to give all groups a share of the computing power Within each group the members have an equal share by default: jmenu@castor:~ > sacctmgr show association where account=lacal format=account,cluster,user,grpnodes, QOS,DefaultQOS,Share tree Account Cluster User GrpNodes QOS Def QOS Share lacal castor normal 1 lacal castor aabecker debug,normal normal 1 lacal castor kleinjun debug,normal normal 1 lacal castor knikitin debug,normal normal 1 Priority is based on recent usage this is forgotten about with time (half-life) fair share comes into play when the resources are heavily used
34 33 / 1 Fair share 2/3 Job priority is a weighted sum of various factors: jmenu@castor:~ > sprio -w JOBID PRIORITY AGE FAIRSHARE QOS Weights jmenu@bellatrix:~ > sprio -w JOBID PRIORITY AGE FAIRSHARE JOBSIZE QOS Weights To compare jobs priorities: jmenu@castor:~ > sprio -j80833,77613 JOBID PRIORITY AGE FAIRSHARE QOS
35 34 / 1 Fair share 3/3 FairShare values range from 0.0 to 1.0: Value Meaning 0.0 you used much more resources that you were granted 0.5 you got what you paid for 1.0 you used nearly no resources jmenu@bellatrix:~ > sshare -a -A lacal Accounts requested: : lacal Account User Raw Shares Norm Shares Raw Usage Effectv Usage FairShare lacal lacal boissaye lacal janson lacal jetchev lacal kleinjun lacal pbottine lacal saltini More information at:
36 35 / 1 Helping yourself man pages are your friend! man sbatch man sacct man gcc module load intel/ man ifort
37 36 / 1 Getting help If you still have problems then send a message to: 1234@epfl.ch Please start the subject with HPC for automatic routing to the HPC team Please give as much information as possible including: the jobid the directory location and name of the submission script where the slurm-*.out file is to be found how the sbatch command was used to submit it the output from env and module list commands
38 37 / 1 Appendix Change your shell at: Scitas web site:
SLURM Workload Manager
SLURM Workload Manager What is SLURM? SLURM (Simple Linux Utility for Resource Management) is the native scheduler software that runs on ASTI's HPC cluster. Free and open-source job scheduler for the Linux
More informationSubmitting batch jobs Slurm on ecgate. Xavi Abellan xavier.abellan@ecmwf.int User Support Section
Submitting batch jobs Slurm on ecgate Xavi Abellan xavier.abellan@ecmwf.int User Support Section Slide 1 Outline Interactive mode versus Batch mode Overview of the Slurm batch system on ecgate Batch basic
More informationUntil now: tl;dr: - submit a job to the scheduler
Until now: - access the cluster copy data to/from the cluster create parallel software compile code and use optimized libraries how to run the software on the full cluster tl;dr: - submit a job to the
More informationGeneral Overview. Slurm Training15. Alfred Gil & Jordi Blasco (HPCNow!)
Slurm Training15 Agenda 1 2 3 About Slurm Key Features of Slurm Extending Slurm Resource Management Daemons Job/step allocation 4 5 SMP MPI Parametric Job monitoring Accounting Scheduling Administration
More informationIntroduction to Running Computations on the High Performance Clusters at the Center for Computational Research
! Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research! Cynthia Cornelius! Center for Computational Research University at Buffalo, SUNY! cdc at
More informationSLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt.
SLURM: Resource Management and Job Scheduling Software Advanced Computing Center for Research and Education www.accre.vanderbilt.edu Simple Linux Utility for Resource Management But it s also a job scheduler!
More informationSLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education www.accre.vanderbilt.
SLURM: Resource Management and Job Scheduling Software Advanced Computing Center for Research and Education www.accre.vanderbilt.edu Simple Linux Utility for Resource Management But it s also a job scheduler!
More informationAn introduction to compute resources in Biostatistics. Chris Scheller schelcj@umich.edu
An introduction to compute resources in Biostatistics Chris Scheller schelcj@umich.edu 1. Resources 1. Hardware 2. Account Allocation 3. Storage 4. Software 2. Usage 1. Environment Modules 2. Tools 3.
More informationBiowulf2 Training Session
Biowulf2 Training Session 9 July 2015 Slides at: h,p://hpc.nih.gov/docs/b2training.pdf HPC@NIH website: h,p://hpc.nih.gov System hardware overview What s new/different The batch system & subminng jobs
More informationIntroduction to Linux and Cluster Basics for the CCR General Computing Cluster
Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY 14203 Phone: 716-881-8959
More informationThe Asterope compute cluster
The Asterope compute cluster ÅA has a small cluster named asterope.abo.fi with 8 compute nodes Each node has 2 Intel Xeon X5650 processors (6-core) with a total of 24 GB RAM 2 NVIDIA Tesla M2050 GPGPU
More informationNEC HPC-Linux-Cluster
NEC HPC-Linux-Cluster Hardware configuration: 4 Front-end servers: each with SandyBridge-EP processors: 16 cores per node 128 GB memory 134 compute nodes: 112 nodes with SandyBridge-EP processors (16 cores
More informationMatlab on a Supercomputer
Matlab on a Supercomputer Shelley L. Knuth Research Computing April 9, 2015 Outline Description of Matlab and supercomputing Interactive Matlab jobs Non-interactive Matlab jobs Parallel Computing Slides
More informationIntroduction to Supercomputing with Janus
Introduction to Supercomputing with Janus Shelley Knuth shelley.knuth@colorado.edu Peter Ruprecht peter.ruprecht@colorado.edu www.rc.colorado.edu Outline Who is CU Research Computing? What is a supercomputer?
More informationManaging GPUs by Slurm. Massimo Benini HPC Advisory Council Switzerland Conference March 31 - April 3, 2014 Lugano
Managing GPUs by Slurm Massimo Benini HPC Advisory Council Switzerland Conference March 31 - April 3, 2014 Lugano Agenda General Slurm introduction Slurm@CSCS Generic Resource Scheduling for GPUs Resource
More informationMiami University RedHawk Cluster Working with batch jobs on the Cluster
Miami University RedHawk Cluster Working with batch jobs on the Cluster The RedHawk cluster is a general purpose research computing resource available to support the research community at Miami University.
More informationIntroduction to parallel computing and UPPMAX
Introduction to parallel computing and UPPMAX Intro part of course in Parallel Image Analysis Elias Rudberg elias.rudberg@it.uu.se March 22, 2011 Parallel computing Parallel computing is becoming increasingly
More informationSLURM Resources isolation through cgroups. Yiannis Georgiou email: yiannis.georgiou@bull.fr Matthieu Hautreux email: matthieu.hautreux@cea.
SLURM Resources isolation through cgroups Yiannis Georgiou email: yiannis.georgiou@bull.fr Matthieu Hautreux email: matthieu.hautreux@cea.fr Outline Introduction to cgroups Cgroups implementation upon
More informationIntroduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research
Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St
More informationAn introduction to Fyrkat
Cluster Computing May 25, 2011 How to get an account https://fyrkat.grid.aau.dk/useraccount How to get help https://fyrkat.grid.aau.dk/wiki What is a Cluster Anyway It is NOT something that does any of
More informationThe RWTH Compute Cluster Environment
The RWTH Compute Cluster Environment Tim Cramer 11.03.2013 Source: D. Both, Bull GmbH Rechen- und Kommunikationszentrum (RZ) How to login Frontends cluster.rz.rwth-aachen.de cluster-x.rz.rwth-aachen.de
More informationLinux für bwgrid. Sabine Richling, Heinz Kredel. Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim. 27.
Linux für bwgrid Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 27. June 2011 Richling/Kredel (URZ/RUM) Linux für bwgrid FS 2011 1 / 33 Introduction
More informationTutorial-4a: Parallel (multi-cpu) Computing
HTTP://WWW.HEP.LU.SE/COURSES/MNXB01 Introduction to Programming and Computing for Scientists (2015 HT) Tutorial-4a: Parallel (multi-cpu) Computing Balazs Konya (Lund University) Programming for Scientists
More informationBatch Usage on JURECA Introduction to Slurm. Nov 2015 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de
Batch Usage on JURECA Introduction to Slurm Nov 2015 Chrysovalantis Paschoulas c.paschoulas@fz-juelich.de Batch System Concepts (1) A cluster consists of a set of tightly connected "identical" computers
More informationTo connect to the cluster, simply use a SSH or SFTP client to connect to:
RIT Computer Engineering Cluster The RIT Computer Engineering cluster contains 12 computers for parallel programming using MPI. One computer, cluster-head.ce.rit.edu, serves as the master controller or
More informationUsing the Yale HPC Clusters
Using the Yale HPC Clusters Stephen Weston Robert Bjornson Yale Center for Research Computing Yale University Oct 2015 To get help Send an email to: hpc@yale.edu Read documentation at: http://research.computing.yale.edu/hpc-support
More informationParallel Processing using the LOTUS cluster
Parallel Processing using the LOTUS cluster Alison Pamment / Cristina del Cano Novales JASMIN/CEMS Workshop February 2015 Overview Parallelising data analysis LOTUS HPC Cluster Job submission on LOTUS
More informationUsing NeSI HPC Resources. NeSI Computational Science Team (support@nesi.org.nz)
NeSI Computational Science Team (support@nesi.org.nz) Outline 1 About Us About NeSI Our Facilities 2 Using the Cluster Suitable Work What to expect Parallel speedup Data Getting to the Login Node 3 Submitting
More informationGetting Started with HPC
Getting Started with HPC An Introduction to the Minerva High Performance Computing Resource 17 Sep 2013 Outline of Topics Introduction HPC Accounts Logging onto the HPC Clusters Common Linux Commands Storage
More informationCNAG User s Guide. Barcelona Supercomputing Center Copyright c 2015 BSC-CNS December 18, 2015. 1 Introduction 2
CNAG User s Guide Barcelona Supercomputing Center Copyright c 2015 BSC-CNS December 18, 2015 Contents 1 Introduction 2 2 System Overview 2 3 Connecting to CNAG cluster 2 3.1 Transferring files...................................
More informationUMass High Performance Computing Center
.. UMass High Performance Computing Center University of Massachusetts Medical School October, 2014 2 / 32. Challenges of Genomic Data It is getting easier and cheaper to produce bigger genomic data every
More informationHPC-Nutzer Informationsaustausch. The Workload Management System LSF
HPC-Nutzer Informationsaustausch The Workload Management System LSF Content Cluster facts Job submission esub messages Scheduling strategies Tools and security Future plans 2 von 10 Some facts about the
More informationGrid Engine Users Guide. 2011.11p1 Edition
Grid Engine Users Guide 2011.11p1 Edition Grid Engine Users Guide : 2011.11p1 Edition Published Nov 01 2012 Copyright 2012 University of California and Scalable Systems This document is subject to the
More informationSlurm Workload Manager Architecture, Configuration and Use
Slurm Workload Manager Architecture, Configuration and Use Morris Jette jette@schedmd.com Goals Learn the basics of SLURM's architecture, daemons and commands Learn how to use a basic set of commands Learn
More informationUsing WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014
Using WestGrid Patrick Mann, Manager, Technical Operations Jan.15, 2014 Winter 2014 Seminar Series Date Speaker Topic 5 February Gino DiLabio Molecular Modelling Using HPC and Gaussian 26 February Jonathan
More informationAgenda. Using HPC Wales 2
Using HPC Wales Agenda Infrastructure : An Overview of our Infrastructure Logging in : Command Line Interface and File Transfer Linux Basics : Commands and Text Editors Using Modules : Managing Software
More informationIntroduction to the SGE/OGS batch-queuing system
Grid Computing Competence Center Introduction to the SGE/OGS batch-queuing system Riccardo Murri Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich Oct. 6, 2011 The basic
More informationHodor and Bran - Job Scheduling and PBS Scripts
Hodor and Bran - Job Scheduling and PBS Scripts UND Computational Research Center Now that you have your program compiled and your input file ready for processing, it s time to run your job on the cluster.
More informationLSN 10 Linux Overview
LSN 10 Linux Overview ECT362 Operating Systems Department of Engineering Technology LSN 10 Linux Overview Linux Contemporary open source implementation of UNIX available for free on the Internet Introduced
More informationCHEOPS Cologne High Efficient Operating Platform for Science Brief Instructions
CHEOPS Cologne High Efficient Operating Platform for Science Brief Instructions (Version: 07.10.2013) Foto: Thomas Josek/JosekDesign Viktor Achter Dr. Stefan Borowski Lech Nieroda Dr. Lars Packschies Volker
More informationIntroduction to HPC Workshop. Center for e-research (eresearch@nesi.org.nz)
Center for e-research (eresearch@nesi.org.nz) Outline 1 About Us About CER and NeSI The CS Team Our Facilities 2 Key Concepts What is a Cluster Parallel Programming Shared Memory Distributed Memory 3 Using
More information8/15/2014. Best Practices @OLCF (and more) General Information. Staying Informed. Staying Informed. Staying Informed-System Status
Best Practices @OLCF (and more) Bill Renaud OLCF User Support General Information This presentation covers some helpful information for users of OLCF Staying informed Aspects of system usage that may differ
More informationHow to Run Parallel Jobs Efficiently
How to Run Parallel Jobs Efficiently Shao-Ching Huang High Performance Computing Group UCLA Institute for Digital Research and Education May 9, 2013 1 The big picture: running parallel jobs on Hoffman2
More informationWork Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015
Work Environment David Tur HPC Expert HPC Users Training September, 18th 2015 1. Atlas Cluster: Accessing and using resources 2. Software Overview 3. Job Scheduler 1. Accessing Resources DIPC technicians
More informationAdvanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011
Advanced Techniques with Newton Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011 Workshop Goals Gain independence Executing your work Finding Information Fixing Problems Optimizing Effectiveness
More informationManual for using Super Computing Resources
Manual for using Super Computing Resources Super Computing Research and Education Centre at Research Centre for Modeling and Simulation National University of Science and Technology H-12 Campus, Islamabad
More informationAn Introduction to High Performance Computing in the Department
An Introduction to High Performance Computing in the Department Ashley Ford & Chris Jewell Department of Statistics University of Warwick October 30, 2012 1 Some Background 2 How is Buster used? 3 Software
More informationGetting Started with Amazon EC2 Management in Eclipse
Getting Started with Amazon EC2 Management in Eclipse Table of Contents Introduction... 4 Installation... 4 Prerequisites... 4 Installing the AWS Toolkit for Eclipse... 4 Retrieving your AWS Credentials...
More informationTutorial: Using WestGrid. Drew Leske Compute Canada/WestGrid Site Lead University of Victoria
Tutorial: Using WestGrid Drew Leske Compute Canada/WestGrid Site Lead University of Victoria Fall 2013 Seminar Series Date Speaker Topic 23 September Lindsay Sill Introduction to WestGrid 9 October Drew
More informationThe CNMS Computer Cluster
The CNMS Computer Cluster This page describes the CNMS Computational Cluster, how to access it, and how to use it. Introduction (2014) The latest block of the CNMS Cluster (2010) Previous blocks of the
More informationBright Cluster Manager 5.2. User Manual. Revision: 3324. Date: Fri, 30 Nov 2012
Bright Cluster Manager 5.2 User Manual Revision: 3324 Date: Fri, 30 Nov 2012 Table of Contents Table of Contents........................... i 1 Introduction 1 1.1 What Is A Beowulf Cluster?..................
More informationOLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group
OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may
More informationStreamline Computing Linux Cluster User Training. ( Nottingham University)
1 Streamline Computing Linux Cluster User Training ( Nottingham University) 3 User Training Agenda System Overview System Access Description of Cluster Environment Code Development Job Schedulers Running
More informationCluster@WU User s Manual
Cluster@WU User s Manual Stefan Theußl Martin Pacala September 29, 2014 1 Introduction and scope At the WU Wirtschaftsuniversität Wien the Research Institute for Computational Methods (Forschungsinstitut
More informationUsing the Windows Cluster
Using the Windows Cluster Christian Terboven terboven@rz.rwth aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster
More informationCompute Cluster Documentation
Drucksachenkategorie Compute Cluster Documentation Compiled by Martin Geier (DLR FA-STM) Authors Michael Schäfer (INIT GmbH) Martin Geier (DLR FA-STM) Date 22.05.2015 Table of contents Table of contents...
More informationRunning on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF)
Running on Blue Gene/Q at Argonne Leadership Computing Facility (ALCF) ALCF Resources: Machines & Storage Mira (Production) IBM Blue Gene/Q 49,152 nodes / 786,432 cores 768 TB of memory Peak flop rate:
More informationJob Scheduling with Moab Cluster Suite
Job Scheduling with Moab Cluster Suite IBM High Performance Computing February 2010 Y. Joanna Wong, Ph.D. yjw@us.ibm.com 2/22/2010 Workload Manager Torque Source: Adaptive Computing 2 Some terminology..
More informationPartek Flow Installation Guide
Partek Flow Installation Guide Partek Flow is a web based application for genomic data analysis and visualization, which can be installed on a desktop computer, compute cluster or cloud. Users can access
More informationRWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky
RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative
More informationOLCF Best Practices. Bill Renaud OLCF User Assistance Group
OLCF Best Practices Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may differ from
More informationA High Performance Computing Scheduling and Resource Management Primer
LLNL-TR-652476 A High Performance Computing Scheduling and Resource Management Primer D. H. Ahn, J. E. Garlick, M. A. Grondona, D. A. Lipari, R. R. Springmeyer March 31, 2014 Disclaimer This document was
More informationJob Scheduling Using SLURM
Job Scheduling Using SLURM Morris Jette jette@schedmd.com Goals Learn the basics of SLURM's architecture, daemons and commands Learn how to use a basic set of commands Learn how to build, configure and
More informationParallel Computing using MATLAB Distributed Compute Server ZORRO HPC
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Goals of the session Overview of parallel MATLAB Why parallel MATLAB? Multiprocessing in MATLAB Parallel MATLAB using the Parallel Computing
More informationHPC at IU Overview. Abhinav Thota Research Technologies Indiana University
HPC at IU Overview Abhinav Thota Research Technologies Indiana University What is HPC/cyberinfrastructure? Why should you care? Data sizes are growing Need to get to the solution faster Compute power is
More informationUsing Parallel Computing to Run Multiple Jobs
Beowulf Training Using Parallel Computing to Run Multiple Jobs Jeff Linderoth August 5, 2003 August 5, 2003 Beowulf Training Running Multiple Jobs Slide 1 Outline Introduction to Scheduling Software The
More informationGC3: Grid Computing Competence Center Cluster computing, I Batch-queueing systems
GC3: Grid Computing Competence Center Cluster computing, I Batch-queueing systems Riccardo Murri, Sergio Maffioletti Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich
More informationGrid Engine Basics. Table of Contents. Grid Engine Basics Version 1. (Formerly: Sun Grid Engine)
Grid Engine Basics (Formerly: Sun Grid Engine) Table of Contents Table of Contents Document Text Style Associations Prerequisites Terminology What is the Grid Engine (SGE)? Loading the SGE Module on Turing
More informationHPCC USER S GUIDE. Version 1.2 July 2012. IITS (Research Support) Singapore Management University. IITS, Singapore Management University Page 1 of 35
HPCC USER S GUIDE Version 1.2 July 2012 IITS (Research Support) Singapore Management University IITS, Singapore Management University Page 1 of 35 Revision History Version 1.0 (27 June 2012): - Modified
More informationCaltech Center for Advanced Computing Research System Guide: MRI2 Cluster (zwicky) January 2014
1. How to Get An Account CACR Accounts 2. How to Access the Machine Connect to the front end, zwicky.cacr.caltech.edu: ssh -l username zwicky.cacr.caltech.edu or ssh username@zwicky.cacr.caltech.edu Edits,
More informationLinux command line. An introduction to the Linux command line for genomics. Susan Fairley
Linux command line An introduction to the Linux command line for genomics Susan Fairley Aims Introduce the command line Provide an awareness of basic functionality Illustrate with some examples Provide
More information1.0. User Manual For HPC Cluster at GIKI. Volume. Ghulam Ishaq Khan Institute of Engineering Sciences & Technology
Volume 1.0 FACULTY OF CUMPUTER SCIENCE & ENGINEERING Ghulam Ishaq Khan Institute of Engineering Sciences & Technology User Manual For HPC Cluster at GIKI Designed and prepared by Faculty of Computer Science
More informationSGE Roll: Users Guide. Version @VERSION@ Edition
SGE Roll: Users Guide Version @VERSION@ Edition SGE Roll: Users Guide : Version @VERSION@ Edition Published Aug 2006 Copyright 2006 UC Regents, Scalable Systems Table of Contents Preface...i 1. Requirements...1
More informationParallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises
Parallel Programming for Multi-Core, Distributed Systems, and GPUs Exercises Pierre-Yves Taunay Research Computing and Cyberinfrastructure 224A Computer Building The Pennsylvania State University University
More informationCommand Line Crash Course For Unix
Command Line Crash Course For Unix Controlling Your Computer From The Terminal Zed A. Shaw December 2011 Introduction How To Use This Course You cannot learn to do this from videos alone. You can learn
More informationSetting up PostgreSQL
Setting up PostgreSQL 1 Introduction to PostgreSQL PostgreSQL is an object-relational database management system based on POSTGRES, which was developed at the University of California at Berkeley. PostgreSQL
More information10 STEPS TO YOUR FIRST QNX PROGRAM. QUICKSTART GUIDE Second Edition
10 STEPS TO YOUR FIRST QNX PROGRAM QUICKSTART GUIDE Second Edition QNX QUICKSTART GUIDE A guide to help you install and configure the QNX Momentics tools and the QNX Neutrino operating system, so you can
More informationBackground and introduction Using the cluster Summary. The DMSC datacenter. Lars Melwyn Jensen. Niels Bohr Institute University of Copenhagen
Niels Bohr Institute University of Copenhagen Who am I Theoretical physics (KU, NORDITA, TF, NSC) Computing non-standard superconductivity and superfluidity condensed matter / statistical physics several
More informationHPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
More informationInformationsaustausch für Nutzer des Aachener HPC Clusters
Informationsaustausch für Nutzer des Aachener HPC Clusters Paul Kapinos, Marcus Wagner - 21.05.2015 Informationsaustausch für Nutzer des Aachener HPC Clusters Agenda (The RWTH Compute cluster) Project-based
More informationRunning applications on the Cray XC30 4/12/2015
Running applications on the Cray XC30 4/12/2015 1 Running on compute nodes By default, users do not log in and run applications on the compute nodes directly. Instead they launch jobs on compute nodes
More informationJUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert
Mitglied der Helmholtz-Gemeinschaft JUROPA Linux Cluster An Overview 19 May 2014 Ulrich Detert JuRoPA JuRoPA Jülich Research on Petaflop Architectures Bull, Sun, ParTec, Intel, Mellanox, Novell, FZJ JUROPA
More informationThirty Useful Unix Commands
Leaflet U5 Thirty Useful Unix Commands Last revised April 1997 This leaflet contains basic information on thirty of the most frequently used Unix Commands. It is intended for Unix beginners who need a
More informationIntroduction to Operating Systems
Introduction to Operating Systems It is important that you familiarize yourself with Windows and Linux in preparation for this course. The exercises in this book assume a basic knowledge of both of these
More informationMartinos Center Compute Clusters
Intro What are the compute clusters How to gain access Housekeeping Usage Log In Submitting Jobs Queues Request CPUs/vmem Email Status I/O Interactive Dependencies Daisy Chain Wrapper Script In Progress
More informationINF-110. GPFS Installation
INF-110 GPFS Installation Overview Plan the installation Before installing any software, it is important to plan the GPFS installation by choosing the hardware, deciding which kind of disk connectivity
More informationPart I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
More informationWorking with HPC and HTC Apps. Abhinav Thota Research Technologies Indiana University
Working with HPC and HTC Apps Abhinav Thota Research Technologies Indiana University Outline What are HPC apps? Working with typical HPC apps Compilers - Optimizations and libraries Installation Modules
More informationUsing Google Compute Engine
Using Google Compute Engine Chris Paciorek January 30, 2014 WARNING: This document is now out-of-date (January 2014) as Google has updated various aspects of Google Compute Engine. But it may still be
More informationSlurm at CEA. Site Report CEA 10 AVRIL 2012 PAGE 1. 16 septembre 2014. SLURM User Group - September 2014 F. Belot, F. Diakhaté, A. Roy, M.
Site Report SLURM User Group - September 2014 F. Belot, F. Diakhaté, A. Roy, M. Hautreux 16 septembre 2014 CEA 10 AVRIL 2012 PAGE 1 Agenda Supercomputing projects Developments evaluation methodology Xeon
More informationHRSK-II Nutzerschulung
HRSK-II Nutzerschulung 2013-05-30 Jan Wender Senior IT Consultant 1 2 science + computing ag Founded 1989 Offices Tübingen München Düsseldorf Berlin Employees ~300 Turnover 2012 ~30 Mio. Euro Shareholder
More informationKiko> A personal job scheduler
Kiko> A personal job scheduler V1.2 Carlos allende prieto october 2009 kiko> is a light-weight tool to manage non-interactive tasks on personal computers. It can improve your system s throughput significantly
More informationSimulation of batch scheduling using real production-ready software tools
Simulation of batch scheduling using real production-ready software tools Alejandro Lucero 1 Barcelona SuperComputing Center Abstract. Batch scheduling for high performance cluster installations has two
More informationUsing the Yale HPC Clusters
Using the Yale HPC Clusters Stephen Weston Robert Bjornson Yale Center for Research Computing Yale University Dec 2015 To get help Send an email to: hpc@yale.edu Read documentation at: http://research.computing.yale.edu/hpc-support
More informationUnix Sampler. PEOPLE whoami id who
Unix Sampler PEOPLE whoami id who finger username hostname grep pattern /etc/passwd Learn about yourself. See who is logged on Find out about the person who has an account called username on this host
More informationDo it Yourself System Administration
Do it Yourself System Administration Due to a heavy call volume, we are unable to answer your call at this time. Please remain on the line as calls will be answered in the order they were received. We
More informationVisualization Cluster Getting Started
Visualization Cluster Getting Started Contents 1 Introduction to the Visualization Cluster... 1 2 Visualization Cluster hardware and software... 2 3 Remote visualization session through VNC... 2 4 Starting
More informationInstallation Guide for FTMS 1.6.0 and Node Manager 1.6.0
Installation Guide for FTMS 1.6.0 and Node Manager 1.6.0 Table of Contents Overview... 2 FTMS Server Hardware Requirements... 2 Tested Operating Systems... 2 Node Manager... 2 User Interfaces... 3 License
More informationHow to Use NoMachine 4.4
How to Use NoMachine 4.4 Using NoMachine What is NoMachine and how can I use it? NoMachine is a software that runs on multiple platforms (ie: Windows, Mac, and Linux). It is an end user client that connects
More information