NYUAD HPC Center Running Jobs
1 Running your job

Submitting and running jobs on any cluster involves these three major steps:

1. Set up and launch the job.
2. Monitor its progress.
3. Retrieve and analyze the results.

Before you start, please read the document for information and policies on your data allocation. Of all the available space, only /scratch should be used for computational purposes.

NYUAD HPC Services strongly recommends that you use restart and checkpointing techniques to safeguard your computational results in the event of a crash or outage. It is the responsibility of the researcher to develop these techniques, and the larger your jobs are and the longer they are scheduled to run, the more important this becomes. At a minimum, be sure to build restart support into your source code. A restart file enables you to restart the job from regular intervals. The main purpose is to divide a large job into sections, so that each section runs within the scheduled time and, if there is an unplanned outage, the entire job is not lost.
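As an illustrative sketch only (not an NYUAD-specific recipe), one way to implement this is to test for a checkpoint file at the start of the job script and pass a restart option to your program when one is found. The program name (./mycode), the checkpoint file name (checkpoint.dat) and the --restart flag below are placeholders for whatever restart mechanism your own code provides:

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=5:00:00
#PBS -N restartable_job

cd /scratch/netid/jobdirectory/
# Resume from the last checkpoint if one exists, otherwise start from the beginning.
# ./mycode, checkpoint.dat and --restart are hypothetical placeholders.
if [ -f checkpoint.dat ]; then
  ./mycode --restart checkpoint.dat &> output
else
  ./mycode &> output
fi
exit 0;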
The following sections describe how to submit and run your jobs.

2 PBS Scripts

You will need to use PBS (Portable Batch System) scripts to set up and launch jobs on any cluster. While it is possible to submit batch requests using an elaborate command line invocation, it is much easier to use PBS scripts, which are more transparent and can be reused for sets of slightly different jobs.

A PBS script performs two key jobs:

1. It tells the scheduler about your job, such as: the name of the program executable; how many CPUs you need and how long the job should run; and what to do if something goes wrong.
2. The scheduler 'runs' your script when it comes time to launch your job.

A typical PBS script looks like this:

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=5:00:00
#PBS -N jobname
#PBS -e localhost:/scratch/netid/${PBS_JOBNAME}.e${PBS_JOBID}
#PBS -o localhost:/scratch/netid/${PBS_JOBNAME}.o${PBS_JOBID}

cd /scratch/netid/jobdirectory/
./command &> output
exit 0;

The first "#PBS -l" line tells the scheduler to use one node with one processor per node (1 CPU in total), and that the job will be aborted if it has not completed within 5 hours. Put your job's name after "#PBS -N". If you would like to receive emails regarding this job, you may put your email address after "#PBS -M"; adding "#PBS -m abe" asks the system to email you when the job Aborts, Begins, and Ends. Two kinds of files, error files and output files, are usually generated when the job is executed. The paths where these files are stored are controlled by the "#PBS -e" and "#PBS -o" directives; change these paths as needed.
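For example, to be emailed at a placeholder address (substitute your own) when the job aborts (a), begins (b), or ends (e), the two email-related directives would read:

#PBS -M [email protected]
#PBS -m abe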
Every line that starts with "#PBS" passes a directive to the scheduler, while inserting a white space after the "#" turns it into an ordinary comment. For example, of the two lines below, the first sets a PBS walltime, whereas the second is ignored.

#PBS -l walltime=4:0:0
# PBS -l walltime=4:0:0

After setting up all the parameters as above, you tell the scheduler how to execute your job by listing the commands to run. You may also set environment variables right before these commands.

Estimating Resources Requested

Estimating walltime as accurately as possible helps MOAB/Torque schedule your job more efficiently. If your job only needs a few hours to finish, do not ask for a much longer walltime. Please review the available queues and queue parameters offered by NYUAD HPC.

Estimating the number of nodes and the number of CPU cores is equally important. Requesting more nodes or more CPU cores than the job needs removes those resources from the available pool. Serial jobs should use one CPU core (ppn=1) unless there are higher than usual memory requirements. If a higher memory requirement is essential, send an email to [email protected] to arrive at the best core-to-memory distribution for a given job. There are occasions when using an entire compute node for a single process is required, in which case all 12 CPU cores should be requested for the serial job (ppn=12).

Invoking Interactive Sessions

Use the following PBS command to initiate an interactive session on a compute node of the cluster:

$ qsub -I -q interactive -l nodes=1:ppn=12,walltime=04:00:00

Use the following PBS command to initiate an interactive session with an X session to a compute node of the cluster:

$ qsub -I -X -q interactive -l nodes=1:ppn=12,walltime=04:00:00

Submitting Serial Jobs

A serial job on NYUAD HPC is defined as a job that requires no more than one node and does not involve any inter-node data communication.

Submitting Single-Core Jobs

A serial job usually takes one CPU core on a node. We specify this in the "#PBS -l" line. The PBS script should look like this:

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=5:00:00
#PBS -N jobname
#PBS -e localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.e${PBS_JOBID}
#PBS -o localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.o${PBS_JOBID}

cd /scratch/netid/jobdirectory/
./serialtest &> output
exit 0;

We then save this script in a text file, say job.pbs, and submit the job by running:

$ qsub job.pbs

Submitting OpenMP Serial Jobs

Although OpenMP (not OpenMPI) jobs can use more than one CPU core, all such cores are within a single node. OpenMP jobs are therefore serial jobs and cannot be submitted to Bowery. To submit an OpenMP job to 1 node and 12 CPU cores:

#!/bin/bash
#PBS -l nodes=1:ppn=12,walltime=5:00:00
#PBS -N jobname
#PBS -e localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.e${PBS_JOBID}
#PBS -o localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.o${PBS_JOBID}

cd /scratch/netid/jobdirectory/
export OMP_NUM_THREADS=12
./omptest &> output
exit 0;
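As an aside (a sketch, not part of the original example): rather than hard-coding the thread count, you can derive it from the allocation that Torque records in $PBS_NODEFILE, which contains one line per allocated core. The executable name ./omptest is the same placeholder used above.

# Set the OpenMP thread count from the PBS allocation instead of hard-coding 12.
# $PBS_NODEFILE lists one line per core allocated by Torque on this job.
export OMP_NUM_THREADS=$(wc -l < $PBS_NODEFILE)
./omptest &> output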
Submitting Parallel Jobs

Parallel jobs use more than one node and usually involve cross-node message/data communication. MPI is widely used for parallel jobs, and MPI wrappers are available on all the NYU HPC clusters. However, it is highly recommended to launch parallel jobs on the Bowery cluster. You are also encouraged to submit p12 jobs (jobs with a walltime of 12 hours or less), as there are many more p12 nodes available. Additionally, 96 nodes from chassis 4 to 9 on Bowery are all 12-hour nodes and have 12 CPUs per node. By declaring ppn=12, you can make sure your jobs go to these often less busy compute nodes. Using all 12 CPU cores on each node also avoids wasting resources.

Submitting MPI Parallel Jobs

To submit an MPI job to 2 nodes and 24 CPU cores:

#!/bin/bash
#PBS -l nodes=2:ppn=12,walltime=5:00:00
#PBS -N jobname
#PBS -e localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.e${PBS_JOBID}
#PBS -o localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.o${PBS_JOBID}

cd /scratch/netid/jobdirectory/
/share/apps/mpiexec/0.84/gnu/bin/mpiexec -comm ib -np 24 ./mvatest &> output
exit 0;

Submitting MPI Jobs with Fewer CPU Cores

It is also possible to claim fewer CPU cores than a node actually has for an MPI job. To submit a serial MPI job to 1 node and 4 CPU cores:

#!/bin/bash
#PBS -l nodes=1:ppn=4,walltime=5:00:00
#PBS -N jobname
#PBS -e localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.e${PBS_JOBID}
#PBS -o localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.o${PBS_JOBID}

cd /scratch/netid/jobdirectory/
/share/apps/mpiexec/0.84/gnu/bin/mpiexec -np 4 env VIADEV_ENABLE_AFFINITY=0 ./mpitest &> output
exit 0;

You must specify "env VIADEV_ENABLE_AFFINITY=0" in your script for serial MPI jobs because otherwise MPI tends to bind your processes to a particular group of CPUs (the first four CPUs in a node, for example). If you or someone else then submits another serial MPI job to the same node, that job may also be bound to the same CPUs, and both calculations will be slowed down.

Submitting bigmem jobs on BuTinah

The bigmem queue on BuTinah has been created for jobs with memory requirements of more than 48 GB. If the memory usage is less than 48 GB, please use the other compute nodes.

#!/bin/sh
#PBS -V
#PBS -N PBS_JOB_NAME
#PBS -l nodes=1:ppn=12,walltime=12:00:00
#PBS -e localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.e${PBS_JOBID}
#PBS -o localhost:$PBS_O_WORKDIR/${PBS_JOBNAME}.o${PBS_JOBID}
#PBS -q bigmem
#PBS -l mem=64gb

Submit many similar runs (serial & parallel) with mpiexec

Because of the overhead the scheduler incurs in processing each submitted job, which is particularly serious when the jobs are small and/or short in duration, it is generally not a good idea simply to run a loop over qsub and inject a large set of small jobs into the queue. It is often far more efficient to package such small jobs into larger 'super-jobs', provided that each small job in the larger job is expected to finish at about the same time (so that cores are not left allocated but idle).
Assuming this condition is met, what follows is a recommended method of aggregating a number of smaller jobs into a single Torque/PBS job that can be scheduled as a unit. Note that, in principle, there is another approach to this problem using a Torque/PBS feature called job arrays, but that feature still incurs significant scheduling overhead (because an array of N jobs is still handled internally as N ordinary jobs), in contrast to the method described here.

For simplicity, this example assumes the small jobs are serial (one-core) jobs. First, group the small jobs into sets of similar runtime (for queues p12 and p48, choose the largest multiple of 12 jobs that will end close together), and package each set of N separate similar-runtime jobs as a single N-core job as follows. The PBS directives at the top of the submission script should specify, for queues p48 and p12:

#PBS -l nodes=<N/12>:ppn=12

where the <> above should be replaced by the result of the trivial calculation enclosed. Then, instead of the usual executable line

$ ./$executable $arguments

at the end of the submission script to launch one serial job, launch N jobs via something like the following. Note that this makes use of the Ohio SC version of mpiexec; it will not work with OpenMPI.

Launch Commands:

source /etc/profile.d/env-modules.sh
module load mpiexec/gnu/0.84
cd directory_for_job1
mpiexec -comm none -n 1 $executable arguments_for_job1 > output 2> error &
cd directory_for_job2
mpiexec -comm none -n 1 $executable arguments_for_job2 > output 2> error &
...
cd directory_for_jobN
mpiexec -comm none -n 1 $executable arguments_for_jobN > output 2> error &
wait
In the above, note:

1. the use of mpiexec (not mpirun) with the options -comm none -n 1, which mean that in this case the application is not using MPI and just needs one core (being serial). We are simply using the job-launch functionality of mpiexec in this example, but we could alter the arguments to launch parallel MPI 'small' jobs instead of serial ones (in the case of MVAPICH2, add the -comm pmi option to select the correct parallel launch protocol);
2. the redirections > output 2> error, which send stdout and stderr to files called output and error respectively in each job's directory (you can of course change these names, and even have the jobs run in the same directory if they are really independent; see the example below);
3. the & at the end of each mpiexec line, which allows them to run simultaneously (the mpiexecs cooperate and take different cores out of the set allocated by Torque/PBS);
4. the wait command at the end, which prevents the job script from finishing before the mpiexecs do.

3 Many Serial Tasks in One Batch Job with PBSDSH

Often it is necessary to run tasks in parallel without using MPI. Torque provides a tool called pbsdsh to facilitate this; it makes the best use of available resources for embarrassingly parallel tasks that do not require MPI. pbsdsh is run from your submission script and spawns a script or application across all the CPUs and nodes allocated by the batch server. This means that the script must be smart enough to figure out its role within the group. Here is an example of a job on 24 cores.

PBS Script:

#!/bin/sh
#PBS -l nodes=2:ppn=12
pbsdsh $PBS_O_WORKDIR/myscript.sh

Since the same shell script myscript.sh is executed on each core, the script needs to be clever enough to decide what its role is. Unless all processes are meant to do the same thing, we have to distinguish the cores or processes. The environment variable PBS_VNODENUM helps here: when n cores are requested, it takes a value from 0 to n-1 and thus numbers the requested cores. You can use PBS_VNODENUM to pass it to the same program as an argument, to start a different program in each process, or to read different input files. The following three examples show the shell script myscript.sh for these three cases.

Example: Submit PBS_VNODENUM as Argument
$ cat myscript.sh
#!/bin/sh
cd $PBS_O_WORKDIR
PATH=$PBS_O_PATH
./myprogram $PBS_VNODENUM

Setting the current directory and the environment variable PATH is necessary since only a very basic environment is defined by default.

Example: Start Different Programs

$ cat myscript.sh
#!/bin/sh
cd $PBS_O_WORKDIR
PATH=$PBS_O_PATH
./myprogram.$PBS_VNODENUM

Example: Read Different Input Files

$ cat myscript.sh
#!/bin/sh
cd $PBS_O_WORKDIR
PATH=$PBS_O_PATH
./myprogram < mydata.$PBS_VNODENUM

Link to PBSDSH examples

Monitoring Your Job

You can monitor your job's progress while it is running. There are various ways to do this. One way is to use this PBS command:

$ showq

You will see all current jobs on the cluster. To see only the lines relevant to your jobs, you can use this command:

$ showq | grep NetID

You should see a line like this:

$ showq | grep NetID
NetID Running 16 23:26:49 Wed Feb 13 14:38:33
The above result indicates the job number, the owner name, the job status, the number of CPUs, the time remaining to run, and the date and time the job was submitted. You can also see the same information by using the following command:

$ showq -u NetID

or simply by typing "myq":

$ myq

If the cluster is busy, your job may have to wait in the queue, in which case the status of the job will be Idle. If you are interested in the current cluster usage, you may run:

$ pbstop

Each column represents a node; a node is busy if its column is filled with letter blocks. You will typically need to wait longer before your jobs are executed if fewer nodes are available. To see where your jobs are running, type:

$ pbstop -u NetID

Be sure to substitute your own NetID for "NetID".

Deleting a Job

If you want to stop your job before it has finished running, you can do so using the qdel command:

$ qdel jobid

To stop/delete all of your jobs:

$ qdel all

"qdel all" deletes all the jobs owned by a user, irrespective of the state of each job.

4 Questions?

Please read the FAQs on our website first. If you have more questions about running jobs, please send an email to [email protected].