NEC HPC-Linux-Cluster




Hardware configuration:
- 4 front-end servers, each with SandyBridge-EP processors: 16 cores per node, 128 GB memory
- 134 compute nodes:
  - 112 nodes with SandyBridge-EP processors (16 cores per node) and 128 GB memory
  - 4 nodes with SandyBridge-EP processors (16 cores per node) and 256 GB memory
  - 18 nodes with Intel Haswell processors (24 cores per node) and 128 GB memory (at the moment only available for members of the excellence cluster Future Ocean and for Geomar users)
- all compute nodes are connected with an FDR InfiniBand network
- all nodes have access to a 1.5 PB global fast parallel file system (ScaTeFS)

Access to the NEC Linux cluster:
- login to the front-ends:
    ssh nesh-fe.rz.uni-kiel.de -l username
  or, if you need X-forwarding (graphical interface):
    ssh -X nesh-fe.rz.uni-kiel.de -l username
- the name nesh-fe is an alias for the login servers nesh-fe1 to nesh-fe3, i.e. with the name nesh-fe you will automatically be redirected to one of the login nodes

Operating system: Red Hat Linux (6.4); bash is the standard shell
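
A minimal example session based on the commands above: log in to a front-end node and check where your file systems (described in the next section) are located. Replace username with your own account name.

    ssh -X nesh-fe.rz.uni-kiel.de -l username   # log in with X-forwarding enabled
    echo $HOME                                  # absolute path of your home directory
    echo $WORK                                  # absolute path of your work directory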

Available file systems: each user has access to four different file systems.

home directory:
- available via the environment variable $HOME
- path: /sfs/fs5/home-sh/username (CAU users)
- path: /sfs/fs6/geomar-sh/username (Geomar users)
- files stored in this directory have an unlimited lifetime
- backed up daily
- available on all nodes
- only 32 TB for all university users and 32 TB for all Geomar users; no individual quota at the moment
- slower access time compared to the $WORK directory

work directory:
- available via the environment variable $WORK
- 9 different directories: please check your $WORK variable for the absolute pathname with the command echo $WORK
- available on all nodes
- no backup of this directory
- unlimited lifetime
- no individual quota at the moment
- faster access time than the $HOME directory

local disk space on each node:
- size: 500 TB on each cluster node

tape_cache directory:
- available via the environment variable $TAPE_CACHE
- access to our tape library
- path: /nfs/tape_cache/username
- for files not needed for current calculations
- only available on the login node and in the batch class feque
- please store, if possible, only larger tar files in this directory (max. size of one file: 500 GB)
- slow access time --> please do not work directly with files stored on the tape library; copy them back to your $WORK directory first
- please do not use the rsync command for copying and synchronizing files and directories to or from your $TAPE_CACHE directory

Available compilers:
- gnu compiler: version 4.4.7 in the standard search path: gcc, gfortran and g++
  newer versions are available via modules:
    module load gcc_4.8.2 (version 4.8.2)
    module load gcc_5.1.0 (version 5.1.0)
- Intel compiler: versions 14.0.0 and 15.0.3: ifort, icc and icpc
  please use one of the following commands to set the correct environment variables before using the Intel compiler:
    module load intel (version 14.0.0)
    module load intel15.0.3 (version 15.0.3)

MPI implementation for parallel programs: Intel MPI
- please use the following command to set the correct environment variables before compiling or running an MPI program:
    module load intelmpi
- compiling parallel MPI programs:
  gnu compiler: mpif90, mpicc or mpicxx
  Intel compiler: mpiifort, mpiicc, mpiicpc

For using more than one node in a parallel calculation, ssh must be configured for password-less access between the batch nodes. The following steps are necessary (a command sketch follows below):
1. type the command ssh-keygen -t dsa (confirm the prompted questions only with the enter key)
2. copy the file $HOME/.ssh/id_dsa.pub into the file $HOME/.ssh/authorized_keys
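
A minimal command sketch of these two steps; appending with cat is one way to do the copy in step 2 (an assumption, chosen so that an existing authorized_keys file is not overwritten):

    ssh-keygen -t dsa                                         # generate a DSA key pair; confirm all prompts with Enter
    cat $HOME/.ssh/id_dsa.pub >> $HOME/.ssh/authorized_keys   # add the public key to the authorized keys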

Environment Modules

On our NEC HPC system we use the module concept to manage the environments for different software packages, libraries and compiler versions. With the modules approach it is no longer necessary to explicitly specify paths for different software packages, and it is easy to switch between different versions of the same software package. The main commands related to the environment modules are (a short example session follows after the batch queue overview below):
- module avail: lists all available modules
- module load name: adds the module named name to your environment
- module unload name: removes the module name and clears all corresponding settings
- module list: lists all currently loaded modules
- module show name: shows all settings which are performed by the module name
- man module: provides details about the module command and all its subcommands

Available libraries (only a selection):
- Intel MKL library
- HDF5 library
- netcdf library

Available software (only a selection):
- matlab
- R (version 3.1.1; no module load command necessary)
- perl (version 5.10.1)
- python: versions 2.6.6, 2.7 and 3.3.6
- trinity
- samtools
- cufflinks

Batch system and batch queues:
- please use the login node nesh-fe for interactive work only
- for longer calculations the following batch classes are available on the NEC cluster:
  - clexpress: max. walltime 1 hour; 2 nodes available
  - clmedium: max. walltime 48 hours (76 nodes available at the moment)
  - cllong: max. walltime 100 hours at the moment (40 nodes available at the moment)
  - clbigmem: max. walltime 100 hours; 4 nodes with up to 256 GB memory are available
  - clfocean: max. walltime 100 hours (4 nodes, each with 16 cores and 128 GB memory); extra authorisation is necessary for this queue
  - clfo2: max. walltime 100 hours (18 nodes, each with 24 cores and 128 GB memory); extra authorisation is necessary for this queue
  - feque: one node available; e.g. for transferring data to the tape library via batch
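
A short example session with the module commands listed above, using the intel module mentioned in the compiler section (module avail shows which module names are actually available):

    module avail          # list all available modules
    module load intel     # add the Intel compiler environment (version 14.0.0)
    module list           # show all currently loaded modules
    module show intel     # show the settings performed by the intel module
    module unload intel   # remove the module and clear all corresponding settings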

Commands for submitting and monitoring batch jobs:
- qsub nec_cluster.nqs: send the batch script with the name nec_cluster.nqs to the batch system
- qstatall: information about all waiting and running batch jobs on the whole system (Linux cluster and vector nodes)
- qstatcl and qstatace: information about all batch jobs on the Linux cluster nodes (qstatcl) or the vector nodes (qstatace)
- qstat: information about your own batch jobs
- qdel jobid: delete one of your jobs
- qcl: overview of the currently available free nodes in the different batch queues

Example batch script for a serial calculation:

#!/bin/bash
#PBS -q clmedium                     --> name of the batch class
#PBS -l cpunum_job=1                 --> number of cores per node (here 1 core)
#PBS -b 1                            --> number of nodes (here 1 node)
#PBS -l cputim_job=32:00:00          --> CPU time (= at least core number * walltime)
#PBS -l elapstim_req=2:00:00         --> walltime requested
#PBS -l memsz_job=10gb               --> memory request per node
#PBS -j o                            --> redirect standard and error output into the same file
#PBS -o clustertest.out              --> name of the standard output file
#PBS -m bea                          --> send an email notification when the job begins (b), ends (e) or aborts (a)
#PBS -M mustermann@xxx.uni-kiel.de   --> send status information to this email address
cd $PBS_O_WORKDIR
. /opt/modules/modules/3.2.6/init/bash   --> initialisation of the module concept
module load intel                        --> set the correct environment variables for the Intel compiler
module load trinity                      --> set environment variables for special software (here e.g. trinity)
./name_of_the_executable                 --> start a serial program

Explanation for the line ". /opt/modules/modules/3.2.6/init/bash": be aware that there is a space between the dot and /opt.
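
A minimal sketch of submitting and monitoring this serial job with the commands listed above (the job id 12345 is just a placeholder):

    qsub nec_cluster.nqs   # send the batch script to the batch system
    qstat                  # information about your own batch jobs
    qcl                    # overview of the currently available free nodes in the batch queues
    qdel 12345             # delete the job with this job id if it is no longer needed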

Example batch script for a parallel calculation (this example batch script runs on four nodes, each with 16 cores):

#!/bin/bash
#PBS -q clmedium               --> name of the batch class
#PBS -l cpunum_job=16          --> number of cores per node
#PBS -b 4                      --> number of nodes (here 4 nodes, each with 16 cores)
#PBS -l cputim_job=32:00:00    --> CPU time (= at least core number * walltime)
#PBS -l elapstim_req=2:00:00   --> walltime requested
#PBS -l memsz_job=10gb         --> memory request per node
#PBS -T intmpi                 --> this line is necessary for running MPI programs
#PBS -j o                      --> redirect standard and error output into the same file
#PBS -o clustertest.out        --> name of the standard output file
cd $PBS_O_WORKDIR
. /opt/modules/modules/3.2.6/init/bash    --> initialisation of the module concept
module load intel intelmpi                --> set the correct environment variables for the Intel compiler and Intel MPI
module load trinity                       --> set environment variables for special software (here e.g. trinity)
mpirun $NQSII_MPIOPTS -np 64 ./prog.exe   --> start a parallel program on 64 cores (4 nodes * 16 cores)

Explanation for the line ". /opt/modules/modules/3.2.6/init/bash": be aware that there is a space between the dot and /opt.
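
A minimal sketch of how such an MPI program might be built with the Intel compiler wrappers before submitting the script above; prog.f90 is a hypothetical Fortran source file and prog_parallel.nqs a hypothetical script name:

    module load intel intelmpi      # set up the Intel compiler and Intel MPI environment
    mpiifort -o prog.exe prog.f90   # build the MPI program with the Intel Fortran wrapper mpiifort
    qsub prog_parallel.nqs          # submit the parallel batch script shown above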