The CNMS Computer Cluster




This page describes the CNMS Computational Cluster, how to access it, and how to use it.

Introduction
(2014) The latest block of the CNMS Cluster
(2010) Previous blocks of the CNMS Cluster
(2008) Second blocks of the CNMS Cluster
(2006) First block of the CNMS cluster

The computer cluster is a Beowulf-style machine purchased in 2010 for the CNMS. It is provided with a light level of support; it is NOT provided with the level of support of a user facility like NERSC or NCCS. Please take care to:
1. make suggestions and report problems: you might be the first person to spot an issue;
2. not abuse the resource: check with your CNMS contact if you need, e.g., to run an extraordinarily large number or size of jobs.

Hardware
The CNMS Computer Cluster is composed of one rack of the 2014 additions, two racks of the 2010 additions to the Oak Ridge Institutional Cluster (http://oic.ornl.gov), two racks of the 2008 additions, and one rack of the original units.

The software and hardware of the 2014 additions are configured very similarly to the earlier racks, so experience, codes, and scripts should carry over. Briefly, the 2014 unit has 896 cores and 3.5 TB of memory (4 GB of RAM per core), giving approximately 7.89 TFLOPS theoretical peak on the compute nodes.

The 2010 units are an Intel/Linux/PBS/MPI/InfiniBand system with 336 processor cores in total and 3 GB of memory per core, delivering nearly 3 million CPU hours per year. CNMS has two of these units.

The 2008 blocks are two units of SGI XE310 boxes (3.0 GHz) that contain one login node, one storage node, and 28 compute nodes within 14 chassis; each node has eight cores and 16 GB of memory, and the login and storage nodes are SGI XE240 boxes. There are also two units of SGI XE240 nodes that contain one login node, one storage node, and 20 compute nodes within 20 separate chassis; each node has eight cores and 16 GB of memory. These nodes have larger node-local scratch space and much higher I/O to that scratch space, because the scratch space is a volume built from four disks.

The 2006 block is one unit of the original blocks, which consist of a bladed architecture from Ciara Technologies called VXRACK. Each VXRACK contains two login nodes, three storage nodes, and 80 compute nodes. Each compute node has dual Intel 3.4 GHz Xeon EM64T processors, 4 GB of memory, and dual Gigabit Ethernet interconnects.

Compute nodes
2014: 28 compute nodes, each with two 16-core Interlagos Opteron 6274 processors (2.2 GHz, 16 MB L3 cache), 128 GB RAM (DDR3 registered, dual rank, 1333 MHz, 1.5 V), and a 1 TB HDD.
2010: 2x42 nodes, each with two quad-core 2.26 GHz Intel Xeon E5520 (Nehalem-EP) processors, 24 GB of 1066 MHz DDR3 memory per node, and 500 GB of local disk.
2008: 2x48 nodes, each with two quad-core 3.0 GHz Xeon processors.
2006: 1x80 nodes, each with dual-core 3.4 GHz Xeon EM64T processors.
The interconnect is quad data rate (4X) InfiniBand for the 2014, 2010, and 2008 units and Gigabit Ethernet for the 2006 unit.

Head node
2014: one 12-core Interlagos Opteron 6234 processor (2.4 GHz, 16 MB L3 cache), 64 GB RAM (DDR3 registered, dual rank, 1333 MHz, 1.5 V), DVD-RW, and two 1 TB HDDs.
2010: two quad-core 2.26 GHz Intel Xeon E5520 (Nehalem-EP) processors with 24 GB of 1066 MHz DDR3 memory and redundant power supplies.
2008: two quad-core 3.0 GHz Intel Xeon processors (XE340) and two quad-core 3.0 GHz Intel Xeon processors (XE240).
2006: dual-core 3.4 GHz Xeon EM64T processors.

Access
The machine is behind the ORNL firewall. A SecurID token and an OIC account are required for access. You must have an active CNMS user project or be CNMS staff. If you have a project or are staff and do not have ORNL computer access, write to Erica Lohman (lohmanem@ornl.gov), provide the CNMS user project number and PI, and ask for access to the CNMS computer cluster.

Obtaining Help
Account problems should be sent to the main ORNL helpline, oic_admin@ornl.gov. Be sure to state clearly that you are referring to the OIC, and how to reproduce the problem. For simulation software issues, the OIC administrators might be able to help if the problem seems hardware related; contacting other CNMS staff members or OIC users is likely to be more productive for more general issues.

Logging in
The OIC is heterogeneous, so where you log in depends on the kind of account you have. Your login nodes are actually a farm of similar computers running Linux behind a load balancer. If you have issues with a login node, please send email to oic_admin@ornl.gov and include the hostname of the login node that you are assigned after logging in.

The OIC can be reached by direct ssh ONLY from inside ORNL on the ORNL Admin network. This is the network where most workstations and desktops are located on site at the lab (but not the visitor network). You cannot ssh directly into the OIC login nodes from outside ORNL or from any other protection zone or enclave (such as supercomputing or open research).

If you are attempting access from outside ORNL, first use VPN or log in via the gateway machine login1.ornl.gov; you need a SecurID token and a valid UCAMS userid and password for this step. From inside ORNL or from the gateway machine:
ssh myuserid@oicphase2.ornl.gov (for access to nodes on the 2008 cluster)
ssh myuserid@oicphase3.ornl.gov (for access to nodes on the 2010 cluster)
ssh myuserid@oicphase5.ornl.gov (for access to nodes on the 2014 cluster)
Use your UCAMS password. An example session is sketched at the end of this section.

Using the head nodes
The head nodes are intended for compiling and submitting jobs. Do not run long simulations or analysis jobs on the head nodes; these should be submitted as PBS jobs.
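As an example of the login sequence above, a session from outside ORNL might look like this (a sketch; myuserid is a placeholder for your UCAMS userid):

ssh myuserid@login1.ornl.gov      # gateway machine: SecurID token required
ssh myuserid@oicphase3.ornl.gov   # gateway to the 2010 cluster: UCAMS password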

Software
The CNMS unit shares software with the other OIC units. See https://docs.oic.ornl.gov/wiki/doku.php

Most significant software packages (compilers, MPI, some applications) have been configured via the modules environment. The same environment is also installed at, e.g., NERSC and NCCS. The commands:
module list (shows currently loaded modules)
module avail (shows available modules)
module load abc (loads the abc module)
module unload abc (unloads/removes the abc module)
These commands can be added to shell startup files if you wish.

Recommended software
Paul Kent recommends using the following setup unless you have technical reasons to use otherwise. Be sure to unload other loaded MPI and compiler modules (see the setup sketch before the sample job below). Many older compiler and MPI versions are also available, but understand that they are more buggy and help will be harder to obtain if you are doing something exotic.

Please use the latest Intel C++ and Fortran compilers:
module load icce/11.1
Notice that this includes the Fortran compiler.

Please use the latest OpenMPI:
module load mpi/openmpi-1.6.5-pgi
If you have used the OIC before and changed the defaults, you might also need:
switcher mpi = openmpi-1.3.2-intel
If you choose to use the GNU compilers, you will need to use, e.g., mpi/openmpi-1.6.5-gcc4.

Please use the Intel MKL BLAS/LAPACK. It is linked by specifying -mkl on the compile/link line with the v11 Intel compilers (see below for more options, including ScaLAPACK). The actual libraries are under /opt/intel/Compiler/11.1/072/mkl/. Unfortunately, the link lines can be quite complex if you opt to link manually.

By default, the MKL library uses threads. For single-core jobs this setting can improve performance considerably; however, if you are using all the cores on a node for MPI tasks, it risks actually reducing performance. Set OMP_NUM_THREADS=1 to avoid this.

Running jobs
Jobs are submitted and monitored via the standard PBS queueing commands: qsub, qstat, qdel, qhold, qrls.

Abuse
Respect other users. We aim for a permissive configuration of this cluster so that the resource can be utilized maximally and flexibly. If you see problems, please let the computational thrust leaders know.

Note that the main userspace filesystem is mounted via NFS. Large amounts of I/O will flood the network, slowing access for everyone. Instead, write to the local scratch space $PBS_SCRATCH and copy large files back, or change/reconfigure your software to write infrequently. This scratch space is not shared between nodes.

(October 2010) The 2010 units feature a shared/global scratch filespace for each job. Files written in this space can be read by all nodes with high performance. Use $GLOBAL_SCRATCH within your job for access. These directories are deleted at the end of each job, so be sure to copy back the files you need. While a job is running, you can access these directories on the head nodes via /global_scratch/your_job_id/

A very good way to end up in the black books of the administrators and other users is to run a quantum chemistry code (e.g., NWChem) and write a large integrals file to your homespace. Please don't do this.
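As an illustration of the scratch policy above, a job-script fragment using the per-job global scratch on the 2010 units might look like this (a sketch; input.dat, output.dat, and my_simulation are placeholders):

cd $GLOBAL_SCRATCH                          # per-job scratch, readable by all nodes
cp $PBS_O_WORKDIR/input.dat .               # stage input from the NFS-mounted homespace
mpirun -np 16 $PBS_O_WORKDIR/my_simulation  # heavy I/O now stays off the NFS network
cp output.dat $PBS_O_WORKDIR/               # scratch is deleted at job end: copy results back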

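Before compiling or submitting the sample job below, a typical environment setup might look like this (a sketch using the module names listed above; the exact module to unload depends on what is already loaded in your environment):

module list                            # check what is currently loaded
module unload mpi/openmpi-1.3.2-intel  # remove other MPI/compiler modules first
module load icce/11.1                  # latest Intel C++ and Fortran compilers
module load mpi/openmpi-1.6.5-pgi      # latest OpenMPI
export OMP_NUM_THREADS=1               # stop threaded MKL from oversubscribing MPI runs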
Sample job
The job below requests 2 hours on 16 cores.

#PBS -N TESTJOB
#PBS -j oe
#PBS -M my_email@somewhere.inter.net
#PBS -m abe
#PBS -q cnms08fq
#PBS -l walltime=2:00:00,nodes=2:ppn=8
NCORES=16
EXEC=./supercapacitor_simulation
# In case of unresolved symbols, good candidates are: wrong mpi module loaded and/or missing
# LD_LIBRARY_PATH to compiler runtime libraries or the Intel MKL library. Check your shell
# startup files or, e.g.:
# export LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/:${LD_LIBRARY_PATH}
#echo $LD_LIBRARY_PATH
cd $PBS_O_WORKDIR
mpirun -v --mca mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 --mca btl openib,self -np ${NCORES} ${EXEC}

Queue structure

Queue name | Core count | Nodes | Priority | Run time  | Unit | Global scratch  | Local scratch (not shared between nodes)
cnms14q    | 16         | none  | 3        | 72 hours  | 2014 | $GLOBAL_SCRATCH | -
cnms10fq   | 8          | 2     | 3        | 72 hours  | 2010 | $GLOBAL_SCRATCH | -
cnms10tq   | None       | None  | 2        | None      | 2010 | $GLOBAL_SCRATCH | -
cnms08fq   | 8          | 1     | 3        | 72 hours  | 2008 | None            | $PBS_SCRATCH
cnms08tq   | None       | None  | 2        | None      | 2008 | None            | $PBS_SCRATCH
cnmsq      | None       | None  | 2        | None      | 2006 | None            | $PBS_SCRATCH

The queue configuration will be adjusted based on experience and feedback.

Specifying the number of cpus, nodes
Please choose a processor core count appropriate for your calculation. The quantum of allocation is a single node. Each node has eight cores, so a single-core job locks up the resources of an entire eight-core node; even a barely parallel job will make better use of the resources than a serial run. Also note that each 2010 node has 24 GB of memory, more than machines at NERSC or NCCS, so large jobs might fit on fewer cores than on other machines. To check the parallelization efficiency of your simulation on the cluster you will have to run benchmarks.

Efficiencies should be reasonable, but not as good as, e.g., a Cray XT, which has a better interconnect and a tuned software stack.

For jobs requiring 8 or fewer cores, vary the ppn (processors per node) request, e.g. for a four-core run:
#PBS -l nodes=1:ppn=4
mpirun -np 4 ./a.out

For jobs requiring multiples of 8 cores, e.g. a sixty-four-core run:
#PBS -l nodes=8:ppn=8
mpirun -np 64 ./a.out

Submitting jobs
qsub myjob.job

Monitoring jobs
qstat
Note that the default queue status report includes every job in every queue on the cluster. To see only your jobs:
qstat -u $USER

Deleting jobs
qdel -W 0 jobnumber

Queue dependencies
Standard PBS job dependencies are supported, enabling very long running simulations or automatic running of analysis:
qsub first.job
12345.b15l01.oic.ornl.gov
qsub -W depend=afterok:12345 second.job
12346.b15l01.oic.ornl.gov
qsub -W depend=afterok:12346 third.job
The second job will only start after the first one has completed with no errors. Similarly, the third will only start after the second has completed with no errors. With a (very) little scripting, VASP, LAMMPS, NAMD, etc. jobs can be automatically restarted and continued for long simulation times, so there is no effective limit on the length of simulation that can be run. A sketch of such a script follows.
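For example, extending the sample job script above, the end of a self-continuing job might look like this (a sketch; the restart file restart.dat and the script name myjob.job are hypothetical and code specific):

# ... run one segment of the simulation ...
mpirun -np ${NCORES} ${EXEC}
# If the code wrote a restart file, queue the next segment to start
# once this job has exited cleanly.
if [ -f restart.dat ]; then
    qsub -W depend=afterok:${PBS_JOBID} myjob.job
fi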

Building codes
The OpenMPI library provides wrappers for the C, C++, and Fortran compilers. If you are building a standard MPI application, look for an Intel/MPI configuration.

The following should compile and link, producing a hello_world executable:

cat >hello_world.f <<EOF
      program hello
      include 'mpif.h'
      integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      print*, 'node', rank, ': Hello world'
      call MPI_FINALIZE(ierror)
      end
EOF
mpif90 -o hello_world hello_world.f

Numerical libraries
See the available modules. The latest (v11) Intel compilers can link against the Intel Math Kernel Library (MKL) by specifying -mkl on the link line:
mpif90 -mkl -o numerical.exe numerical.f90
The libraries can also be linked directly from:
MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
To link the parallel ScaLAPACK library in combination with OpenMPI, use:
-L${MKLPATH} -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
(A combined link command is sketched at the end of this page.)

Applications
LAMMPS
NWChem
VASP
Paul Kent has compiled versions of VASP 4.6 and 5.2.8 for the OIC unit using the latest compilers, MKL, the MKL FFTW, and MKL ScaLAPACK. Access is only provided to authorized license holders of VASP.
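As promised in the Numerical libraries section above, a combined ScaLAPACK link command might look like this (a sketch; scalapack_test.f90 is a placeholder source file):

MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
mpif90 -mkl -o scalapack_test.exe scalapack_test.f90 \
    -L${MKLPATH} -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64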