
Research Computing: The Apollo HPC Cluster
Tom Armour, Jeremy Maris
Research Computing, IT Services, University of Sussex

Apollo Cluster - Aims
- Computing provision beyond the capability of the desktop
- Priority access determined by the HPC Advisory Group
- HPC User Group to be formed next term
- Shared infrastructure and support from IT Services
- Extension by departments: storage (adding to Lustre, access to SAN), CPU, software licenses
- Expansion by IT Services as budgets allow

High Performance Computing?
- Generically, computing systems comprising multiple processors linked together in a single system, used to solve problems beyond the scope of the desktop
- High performance computing: maximising the number of cycles per second, usually parallel
- High throughput computing: maximising the number of cycles per year, usually serial
- Facilitating the storage, access and processing of data
- Coping with the massive growth in data

High Performance Computing?
- A single problem split across many processors
- Tasks must run quickly; tightly coupled, task parallel, communication between threads
- Examples: weather forecasting, theoretical chemistry, image processing (3D image reconstruction, 4D visualisation), sequence assembly, whole genome analysis

High Throughput Computing
- A lot of work done over a long time frame
- One program run many times, e.g. searching a large data set
- Loosely coupled (data parallel, embarrassingly parallel)
- Examples: ATLAS analysis, genomics (sequence alignment, BLAST etc.), virtual screening (e.g. in drug discovery), parameter exploration (simulations), statistical analysis (e.g. bootstrap analysis)

Apollo Cluster - Hardware (as of 14/12/2011)
- 488 cores in total:
  - 22 x 12-core 2.67 GHz Intel nodes - 264 cores
  - 2 x 48-core 2.2 GHz AMD nodes - 96 cores
  - 17 blades, GigE (Informatics) - 128 cores
- 48 GB RAM per Intel node, 256 GB RAM per AMD node
- 20 TB home NFS file system (backed up)
- 80 TB Lustre scratch file system (not backed up)
- QDR (40 Gb/s) InfiniBand interconnect

Apollo Cluster - Filesystems
- Home: 20 TB, a RAID 6 set of 12 x 2 TB disks
  - Exported via NFS
  - Backed up - keep your valuable data here
  - Easily overloaded if many cores read/write at the same time
- Lustre parallel file system: 80 TB, at /mnt/lustre/scratch
  - Redundant metadata server
  - Three object storage servers, each with 3 x 10 TB RAID 6 OSTs
  - Data striping configured per file or directory; can stripe across all 108 disks
  - Aggregate data rate ~3.8 GB/s
  - NOT backed up - for temporary storage only
- Local 400 GB scratch disk per node
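Striping is controlled with the standard Lustre lfs utility. A minimal sketch (the directory and file names are hypothetical, and the stripe counts are examples rather than site recommendations):

# show the current striping of a directory on the scratch area
lfs getstripe /mnt/lustre/scratch/$USER/mydata

# new files created in this directory will be striped over 4 OSTs with a 1 MB stripe size
lfs setstripe -c 4 -s 1M /mnt/lustre/scratch/$USER/mydata

# create an (empty) file striped across all OSTs (-c -1) before writing a very large file to it
lfs setstripe -c -1 /mnt/lustre/scratch/$USER/bigfile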

Apollo Cluster - Lustre Performance
[IOZONE benchmark summary table for the Lustre filesystem - write, re-write, read, re-read, random read, random write, reverse read, stride read and mixed throughput figures; the numbers did not survive transcription]
IOZONE Benchmark Summary (c) 2010 Alces Software Ltd

Apollo Cluster - Software
- Module system used for defining paths and libraries
- You need to load the software you require: module avail, module add XX, module unload XX
- Gives access to optimised maths libraries, MPI stacks, compilers etc.
- Latest versions of packages, e.g. python and gcc, easily installed
- Intel Parallel Studio suite: C, C++, Fortran, MKL, performance analysis etc.
- gcc and Open64 compilers
- Jeremy or Tom will compile/install software for users
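For example, a typical session with the module system looks like this (gcc/4.3.4 and qlogic/openmpi/gcc are the module names used in the parallel job script later in these slides; other module names will differ):

module avail                      # list all software provided through modules
module add gcc/4.3.4              # put gcc 4.3.4 on the PATH and set its library paths
module add qlogic/openmpi/gcc     # add the matching Open MPI stack
module list                       # show currently loaded modules
module unload gcc/4.3.4           # remove a module again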

Apollo Cluster - Software
Compilers/tools: ant, gcc suite, Intel (C, C++, Fortran), git, jdk 1.6_024, mercurial, open64, python 2.6.6, sbcl, MVAPICH, MVAPICH2, openmpi
Libraries: acml, atlas, blas, gotoblas, cloog, fftw2, fftw3, gmp, gsl, hpl, lapack, scalapack, mpfr, nag, netcdf, ppl
Programs: adf, aimpro, alberta, FSL, gadget, gap, Gaussian 09, hdf5, idl, matlab, mricron, paraview, stata, WRF

Apollo Cluster - Queues
- Sun Grid Engine is used for the batch system (soon UNIVA)
- parallel.q for MPI and OpenMP jobs - Intel nodes; slot limit of 36 cores per user at present
- serial.q for serial, OpenMP and MPI jobs - AMD nodes; slot limit of 36 cores per user
- Informatics-only queues: inf.q + others
- No other job limits - we need user input for configuration
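The queue and parallel environment configuration can be inspected with the usual Grid Engine query commands, for example:

qconf -sql          # list the queues defined on the cluster (parallel.q, serial.q, inf.q, ...)
qconf -spl          # list the available parallel environments (openmpi, openmp, mvapich2)
qstat -g c          # summary of used/available slots per cluster queue
qstat -u $USER      # show your own pending and running jobs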

Apollo Cluster - serial job script
The queue is serial.q by default.

#!/bin/sh
#$ -N sleep
#$ -S /bin/sh
#$ -cwd
#$ -q serial.q
#$ -M user@sussex.ac.uk
#$ -m bea

echo Starting at: `date`
sleep 60
echo Now it is: `date`
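Saved as, say, sleep.sh (a hypothetical filename), the script is submitted and monitored along these lines:

qsub sleep.sh       # submit; Grid Engine prints the job ID
qstat               # check its state (qw = waiting, r = running)
qdel JOB_ID         # delete the job if necessary
# output and errors appear in the submission directory as sleep.oJOB_ID and sleep.eJOB_ID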

Apollo Cluster - parallel job script
For parallel jobs you must specify the pe (parallel environment). Parallel environments: openmpi, openmp, mvapich2 (often less efficient).

#!/bin/sh
#$ -N JobName
#$ -M user@sussex.ac.uk
#$ -m bea
#$ -cwd
#$ -pe openmpi NUMBER_OF_CPUS   # eg 12-36
#$ -q parallel.q
#$ -S /bin/bash

# source modules environment:
. /etc/profile.d/modules.sh
module add gcc/4.3.4 qlogic/openmpi/gcc

mpirun -np $NSLOTS -machinefile $TMPDIR/machines /path/to/exec
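An OpenMP (shared-memory) job can use the openmp parallel environment in much the same way. A minimal sketch, assuming the openmp PE keeps all granted slots on a single node (the module version and executable path are placeholders):

#!/bin/sh
#$ -N OpenMPJob
#$ -cwd
#$ -q parallel.q
#$ -pe openmp 12                 # number of threads (slots) requested
#$ -S /bin/bash

# source modules environment and load a compiler runtime:
. /etc/profile.d/modules.sh
module add gcc/4.3.4

export OMP_NUM_THREADS=$NSLOTS   # match the OpenMP thread count to the granted slots
/path/to/openmp_exec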

Apollo Cluster - Monitoring
Ganglia statistics for apollo and feynman (the EPP cluster) are at http://feynman.hpc.susx.ac.uk

Apollo Cluster - Historic load

Apollo Cluster - Accounting
- Simply: qacct -o
- Accounting is monitored to show fair usage, etc.
- UNIVA's UniSight reporting tool coming soon
- Largest users of CPU are Theoretical Chemistry and Informatics
- Other usage growing, e.g. Engineering (CFD), weather modelling (Geography) and fMRI image analysis (BSMS and Psychology)
- Use the AMD nodes (serial.q and inf.q)!
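qacct summarises the Grid Engine accounting records; a few illustrative invocations:

qacct -o            # total usage (wallclock, CPU, memory) summarised per job owner
qacct -o $USER      # the same summary restricted to your own jobs
qacct -j JOB_ID     # the full accounting record for a single finished job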

Apollo Cluster - Documentation
- For queries and accounts on the cluster, email researchsupport@its.sussex.ac.uk - Jeremy and Tom monitor this, so your email won't get lost!
- User guides and example scripts are available on the cluster in /cm/shared/docs/userguide
- Other documentation is under /cm/shared/docs
- Workshop slides are available from the wiki and http://www.sussex.ac.uk/its/services/research
- NEW: wiki at https://www.hpc.sussex.ac.uk

Apollo Cluster - Adding nodes
- C6100 chassis + four servers, each 2.67 GHz, 12-core, 48 GB RAM, IB card, licences: ~14,600
- R815: 48-core 2.3 GHz AMD, 128 GB: ~9,300
- C6145: new AMD Interlagos, 2 x 64-core nodes, each with 128 GB: ~15,500
- Departments are guaranteed 90% exclusive use of their nodes, with 10% shared with others, plus backfill of idle time
- Contact researchsupport@its.sussex.ac.uk for pricing; we can get quotes for your research proposals

Feynman Cluster
- Initially a Tier-3 local facility for Particle Physics
- Same system architecture as Apollo, but logically separate
- Used to analyse data from the ATLAS experiment at the Large Hadron Collider at CERN
- Serial workload with high I/O and storage requirements
- 96 cores and 2/3 of the Lustre storage (54 TB)
- Joining GridPP as a Tier-2 national facility, with expansion of storage and CPU
- System managed by Emyr James, in collaboration with IT Services

Future Developments
- Wiki for collaborative documentation
- UNIVA Grid Engine and Hadoop/MapReduce
- Moving the cluster to new racks to allow for expansion
- Adding new InfiniBand switches
- Updating Lustre to 1.8.7
- Adding 60 TB more Lustre storage for EPP
- Consolidating clusters (Feynman, Zeus)
- Two days' downtime sometime in February or early March

Questions?

Questions? Problems and Issues
- Access to resources: who should have priority access? Who should have a lesser priority? Fair-share access for all?
- Queue design: time limits? Minimum and/or maximum slots on parallel.q? Requests?

Questions?
- Is this a useful day? Do you want another one?
- What should we cover? When?
- Shorter hands-on training sessions?
- A session on compilers and code optimisation?