High Performance Computing Facility Specifications, Policies and Usage. Supercomputer Project. Bibliotheca Alexandrina

High Performance Computing Facility Specifications, Policies and Usage
Supercomputer Project
Bibliotheca Alexandrina

Topics
- Specifications Overview
- Site Policies
- Intel Compilers
- Intel MPI
- Sun Grid Engine (SGE) Job Scheduler

Specifications Overview
Hardware
- 130 compute nodes
- ~12 TFLOPS peak performance
- 9.1 TFLOPS LINPACK benchmark
- ~1 TByte main memory
- 10 Gbit/s InfiniBand interconnect
- 36 TBytes shared scratch storage (raw)
- High-density, highly automated tape storage

Specifications Overview (cont'd)
Software
- Operating System: RHEL 5.2
- Compilers: GNU Compiler Collection 4.1.2 (gcc, g++, etc.), Intel Compiler 11.0
- MPI: Intel MPI 3.2, OpenMPI, MPICH, etc.

Specifications Overview (cont'd)
Software (cont'd)
- Management
  - Provisioning: ROCKS
  - Monitoring: Ganglia
- Shared Scratch Storage: Lustre File System
- Backup Software: Veritas NetBackup

Site Policies
Home Directory
- NFS
- Hosted on a separate node
- Shared among all users
- Very limited space: 100 MB/user

Site Policies (cont'd)
Working Directory
- Lustre File System
- Hosted on a Sun Fire X4500 server
- Shared among all users
- Quota per user not yet decided

Site Policies (cont'd)
Backup Policy (archive of the Working Directory)
- Full backup
  - Frequency: once per week
  - Retention period: 2 weeks
- Differential-incremental backup
  - Frequency: once per day
  - Retention period: 2 weeks

Intel Compilers
Load the Intel Compiler 11 module:
  $ module load intel/compiler/11.0
Fortran:
  ifort [options] file1 [file2 ...]
C/C++:
  icc/icpc [options] file1 [file2 ...]
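A minimal compile sketch; the source file names and the -O2 flag are illustrative and not taken from the slides:

  $ module load intel/compiler/11.0
  $ ifort -O2 -o hello_f hello.f90
  $ icc -O2 -o hello_c hello.c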

Intel MPI
Load the Intel MPI module:
  $ module load intel/mpi/3.2
This also loads the underlying intel/compiler/11.0 module.
Fortran MPI wrappers:
  mpif77 / mpif90 [options] <files>
C/C++ MPI wrappers:
  mpicc / mpic++ [options] <files>
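A minimal sketch of building an MPI program with the wrappers; the source file names and flags are illustrative:

  $ module load intel/mpi/3.2
  $ mpicc -O2 -o mpi_hello mpi_hello.c
  $ mpif90 -O2 -o mpi_hello_f mpi_hello.f90

The resulting binaries are normally launched through the SGE job scheduler described on the following slides rather than run directly on the login node.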

SGE Job Scheduler
Load the sge6.2 module:
  $ module load sge6.2
Most useful SGE commands:
- qsub / qdel: submit and delete jobs
- qstat / qhost: status information about queues, hosts and jobs
- qacct: summary information on completed jobs
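A short example session as a sketch; the job script name and the job ID are hypothetical:

  $ module load sge6.2
  $ qsub myjob.sh        # submit a job script; SGE prints the assigned job ID
  $ qstat                # list your pending and running jobs
  $ qhost                # show the state and load of the execution hosts
  $ qdel 12345           # delete job 12345
  $ qacct -j 12345       # accounting summary once the job has finished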

SGE Job Scheduler (cont'd)
qsub
General format:
  $ qsub <qsub options> program <program options>
Useful qsub options:
- -b y[es]|n[o]               Indicate explicitly whether the command should be treated as a binary or as a script.
- -cwd                        Execute the job from the current working directory.
- -j y[es]|n[o]               Specify whether the standard error stream is merged into the standard output.
- -l resource=value,...       Launch the job in a queue meeting the given resource request list.
- -pe parallel_environment n  Parallel programming environment (PE) to instantiate.
- -r y[es]|n[o]               Specify whether the job may be rerun if the node it is running on crashes.
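For illustration only; the parallel environment name mpi, the h_rt runtime resource and the script name are hypothetical and site-specific:

  $ qsub -cwd -j y -pe mpi 16 -l h_rt=01:00:00 myjob.sh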

SGE Job Scheduler (cont'd)
How to provide options to qsub:
- In the default request file ($HOME/.sge_request)
- In the job script (each option preceded by #$)
- On the command line
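A minimal job script sketch using embedded #$ directives; the parallel environment name, slot count, program name and the availability of the module command inside the batch shell are assumptions for illustration:

  #!/bin/bash
  #$ -cwd                 # run from the submission directory
  #$ -j y                 # merge stderr into stdout
  #$ -pe mpi 16           # hypothetical parallel environment with 16 slots
  #$ -r y                 # allow the job to be rerun after a node crash

  module load intel/mpi/3.2          # assumes the module command is available in batch jobs
  mpirun -np $NSLOTS ./mpi_hello     # $NSLOTS is set by SGE to the number of granted slots

Submit it with, e.g., $ qsub myjob.sh; the same options could instead be given on the command line or in $HOME/.sge_request.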

SGE Job Scheduler (cont'd)
qstat
General format:
  $ qstat <qstat options>
Useful qstat options:
- -explain a|A|c|E   Display the reason for the state of a queue instance: 'a' shows the reason for the alarm state, 'A' for the suspend alarm state, and 'E' for a queue instance error state.
- -f                 Show a full summary of information on all queues along with the queued job list.
- -j [job_list]      Print detailed information for the listed jobs, or for all jobs if no list is given.
- -ne                In combination with -f, suppress the display of empty queues.
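A few illustrative invocations; the job ID is hypothetical:

  $ qstat -f -ne          # full queue listing, hiding empty queues
  $ qstat -j 12345        # detailed information, e.g. why job 12345 is still pending
  $ qstat -f -explain E   # show why queue instances are in the error state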

SGE Job Scheduler (cont'd)
qacct
General format:
  $ qacct <qacct options>
Useful qacct options:
- -j [ID|name]   Print the accounting information for a finished job given its name or ID; if neither is given, all jobs are listed.
- -o [owner]     The owner of the jobs for which accounting statistics are assembled.
- -P [project]   The project for which usage is summarized.
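For example; the job ID, user name and project name are hypothetical:

  $ qacct -j 12345        # accounting record for a single finished job
  $ qacct -o someuser     # usage summary for one owner
  $ qacct -P myproject    # usage summary for one project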

SGE Job Scheduler (cont'd)
Debugging your jobs
- qstat -j <job_id> (problems with pending jobs): tells you why the job is still pending.
- qsub -w v <full job request> (verify submit): tells you whether the job would be able to run.
- Watch your STDERR and STDOUT.
- qacct -j <job_id>: check the exit status to see whether the job completed successfully.
- Ask for help if you get stuck.
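A sketch of a typical troubleshooting sequence; the job ID, parallel environment and script name are hypothetical:

  $ qstat -j 12345                       # why is job 12345 still pending?
  $ qsub -w v -cwd -pe mpi 16 myjob.sh   # verify only: reports whether the request could ever be scheduled
  $ qacct -j 12345 | grep exit_status    # a non-zero exit status indicates the job failed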

Thanks
supercomputer@bibalex.org
Bibliotheca Alexandrina