JUROPA Linux Cluster: An Overview. 19 May 2014, Ulrich Detert


JuRoPA
Jülich Research on Petaflop Architectures
- Partners: Bull, Sun, ParTec, Intel, Mellanox, Novell, FZJ
JUROPA (Juropa-JSC)
- FZJ production system
- NIC and VSR projects
- Commercial customers
- PRACE Tier 1 system
HPC-FF: High Performance Computing For Fusion
- Dedicated to the European Fusion Research Community
- Operation of the HPC-FF partition ceased on June 30, 2013; all HPC-FF compute resources have been moved to the JSC partition.

Juropa Components (1)
JUROPA hardware
- Sun Constellation System + Bull NovaScale R422-E2
- InfiniBand QDR (40 Gb/s per link and direction), full fat-tree topology
- 3288 compute nodes, each with:
  - 2 Intel Nehalem-EP quad-core processors (Xeon X5570), 2.93 GHz
  - 24 GB memory (DDR3, 1066 MHz)
  - InfiniBand QDR HCA
- 26304 cores, 308 TFlop/s peak

Juropa Components (2)
Lustre storage pool
- 4 Meta Data Servers (MDS): Bull NovaScale R423-E2 (Nehalem-EP 4-core / Westmere 6-core), 100 TB of metadata for home and work (EMC² CX4-240)
- 14 Object Storage Servers (OSS) for home: Sun Fire X4170, 500 TB user data
- 8 Object Storage Servers (OSS) for home: Bull NovaScale R423-E2, 500 TB user data
- 8 Object Storage Servers (OSS) for work: Bull NovaScale R423-E2, 800 TB user data
- Aggregated data rate: ~40 GB/s

Juropa Architecture

Infiniband Topology (Fat Tree)
[Diagram: full fat tree built from 648-port switches connecting the JUROPA nodes; 92 building blocks with 24 nodes each and 60 building blocks with 18 nodes each]

Infiniband Topology (Fat Tree)
- Sun nodes: 23 x 4 QNEM modules, 24 ports each; 6 x M9 switches, 648 ports max. each, 468/276 links used
- Service nodes: Mellanox MTS3600 (Shark) switches, 36 ports
- Bull nodes: 4 Compute Sets (CS) with 15 Compute Cells (CC) each; each CC with 18 Compute Nodes (CN) and 1 Mellanox MTS3600 (Shark) switch
- Virtual 648-port switches constructed from 54x/44x Mellanox MTS3600

Software (1)
- Operating system: SUSE SLES 11 SP1
- Cluster management: ParaStation, GridMonitor, Jumpmon
- Batch system:
  - Torque Resource Manager (starts jobs, returns output, etc.)
  - Moab Workload Manager (priorities, accounting, job chains)
  - user command line interface (job start, status, cancel, etc.)
- Compilers: Intel Professional Fortran, C/C++
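The slides do not show a job script; as an illustration, a minimal batch script for the Torque/Moab setup above might look like the sketch below. The resource requests, the job name and the program name are assumptions based on standard Torque/Moab and ParaStation MPI usage, not JUROPA-specific values.

    #!/bin/bash
    #PBS -N example_job            # job name (placeholder)
    #PBS -l nodes=2:ppn=8          # 2 nodes with 8 tasks per node
    #PBS -l walltime=01:00:00      # 1 hour wall-clock limit
    cd $PBS_O_WORKDIR              # directory the job was submitted from
    mpiexec -np 16 ./my_prog       # ParaStation MPI launcher; my_prog is a placeholder

Such a script is typically submitted with msub (Moab) or qsub (Torque); showq and checkjob are the usual Moab commands for inspecting the queue and the status of a job.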

Software (2)
- Libraries:
  - Intel Math Kernel Library (MKL): BLAS, LAPACK, ScaLAPACK, sparse solvers, Fast Fourier Transforms, vector math, etc.; highly optimized for the Intel CPU architecture
  - /usr/local: ~90 packages, from adf and amber to wsmp and zlib
- MPI (Message Passing Interface): ParTec MPI (ParaStation, based on MPICH2)
- OpenMP (shared-memory multi-threading): supported by the Intel compilers
- Access software: UNICORE, PRACE tools (gsissh, gridftp)

Modules
Modules allow switching between versions of a specific piece of software or library.
- module avail: shows the available modules
- module list: lists the loaded modules, e.g.
    Currently loaded modulefiles:
      1) parastation/mpi2-intel-5.0.26-1
      2) mkl/10.2.5.035
      3) intel/11.1.072
- module load / module unload: load or unload a module
- module help: list usage information
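A typical interactive session combining the commands above could look as follows; the module versions are the ones listed on the slide and may differ on the actual system.

    module avail                     # list all installed modules
    module list                      # show which modules are currently loaded
    module unload intel              # remove the current Intel compiler module
    module load intel/11.1.072       # load a specific compiler version instead
    module help parastation          # show usage information for that module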

What is Simultaneous Multi-Threading (SMT)?
[Diagram: compute nodes 1 to 3288, each with its own memory, coupled via MPI (distributed memory); within a node, two sockets (Socket 0, Socket 1) are connected by QPI, each with its own memory controller and local memory]
- Compute nodes: mostly independent of each other, local memory, shared network access
- Socket: a full-featured quad-core Nehalem-EP CPU with its own memory controller and QuickPath Interconnect (QPI)

What is SMT?
[Diagram: one socket with four cores (Core 0 to Core 3), each with its own cache and two hardware threads (HWT 0, HWT 1), sharing the memory controller]
- Core: full-featured processor with its own register set, functional units and caches; memory access is shared among the cores.
- HWT (Hardware Thread): collection of registers and functional units ("virtual core") that is replicated in each core; non-replicated units like caches are shared among the HWTs of a core.
- Process (Task): computational entity that possesses its own copy of program code and data.
- Thread: computational entity that shares program code and data with other threads; threads may have (a limited amount of) non-shared, local data.
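As a quick sanity check (an illustration, not from the slides), the effect of SMT is visible in the number of logical CPUs the operating system reports on a compute node: the 8 physical cores show up as 16 hardware threads.

    # count the logical CPUs (expected: 16 = 8 cores x 2 HWTs with SMT enabled)
    grep -c ^processor /proc/cpuinfo
    nproc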

Node Types (1)
Compute nodes
- Intended for batch jobs; this includes interactive batch (64 nodes max., charged for connect time)
- No direct login on compute nodes (except interactive batch)
- Exclusive usage by one user/job (no node sharing):
  - smallest reservation unit is one node (= 8 cores, 16 with SMT)
  - charged for wall-clock time
  - unrestricted access to the node's resources: ~22 GB memory, 8 processors (16 with SMT)
- Wall-clock time limit: 24 h
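How an interactive batch session is requested is not spelled out on this slide; with Moab it would typically be started along the following lines (node count and wall-clock time are arbitrary example values within the limits above).

    # request 2 nodes for 30 minutes as an interactive batch job
    msub -I -l nodes=2:ppn=8,walltime=00:30:00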

Compute Nodes - Available Memory
Communication scalability
- Memory consumption for static InfiniBand communication buffers depends on the number of communicating tasks.
- Example: with the default requirement of 0.5 MB per connection, an 8-core node whose 8 tasks each connect to 8 x 512 = 4096 tasks consumes 8 x 4096 x 0.5 MB = 16 GB just for buffers.
Solutions
- Smaller / fewer buffers:
    PSP_OPENIB_SENDQ_SIZE=3..16 (default: 16)
    PSP_OPENIB_RECVQ_SIZE=3..16 (default: 16)
- Buffer allocation on demand:
    export PSP_ONDEMAND=1
  or
    mpiexec --ondemand ...
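For illustration, the measures above could be combined in a job script as follows; the queue size of 4 is an arbitrary value within the documented range 3..16, and my_prog is a placeholder application.

    # shrink the per-connection send/receive queues and create buffers only on demand
    export PSP_OPENIB_SENDQ_SIZE=4
    export PSP_OPENIB_RECVQ_SIZE=4
    export PSP_ONDEMAND=1
    mpiexec -np 4096 ./my_prog   # with PSP_ONDEMAND=1 set, equivalent to mpiexec --ondemand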

Node Types (2)
Login nodes (juropa, juropa01..07, 09)
- Intended for interactive work:
  - program development (edit, compile, test)
  - pre- and postprocessing
  - access to the home and work file systems (Lustre)
- No production jobs here! (CPU time limit: 30 min.)
- Use ulimit -Sa or ulimit -Ha to display the soft/hard limits
HPC-FF login node (hpcff, hpcff01)
- Provided access to HPC-FF data; this access ceased on April 30, 2014

Node Types (3)
GPFS nodes (juropagpfs, juropagpfs01..05)
- Access to the GPFS file systems (also mounted on JUQUEEN)
- Intended for data manipulation:
  - copy data to and from GPFS/Lustre
  - restore data from the TSM backup
  - import/export data to/from external sources (PRACE)
- Same limits as on the login nodes (except CPU time limit: 360 min.)
- GPFS nodes 04 and 05 for big-memory requests: 192 GB memory
- Interactive performance may be degraded on the GPFS nodes due to heavy data traffic.
- Recommendation: use the login nodes if you do not need the GPFS file systems, large memory, the higher CPU time limit or the connection to the PRACE network.

Accessing the System
Login nodes
- ssh [-X] <user>@juropa.fz-juelich.de
  - login nodes 01..07, 09 are selected round-robin
  - hostnames: jj28l01..07, 09
- ssh [-X] <user>@juropa01.fz-juelich.de
  - accesses the specific node juropa01

Accessing the System
GPFS nodes
- ssh [-X] <user>@juropagpfs.fz-juelich.de
  - GPFS nodes 01..05 are selected round-robin
  - hostnames: jj28g01..05
- ssh [-X] <user>@juropagpfs01.fz-juelich.de
  - accesses the specific node juropagpfs01
Access to all nodes requires an SSH key
- to be provided when applying for an account (JSC Dispatch)
- do provide a passphrase for your SSH key!
- password access is not possible!
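Creating the key itself is not covered on the slides; a common way to generate a key pair with a passphrase on the local workstation uses the standard OpenSSH tools, for example (the file name id_rsa_juropa is a placeholder):

    # create a key pair and enter a non-empty passphrase when prompted
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_juropa
    # hand in the public key (~/.ssh/id_rsa_juropa.pub) with the account application,
    # then log in with the matching private key
    ssh -X -i ~/.ssh/id_rsa_juropa <user>@juropa.fz-juelich.de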

File Systems Overview
[Diagram: login nodes, HPC-FF login node, GPFS nodes and compute nodes with their mounts of the Lustre home and work file systems and the GPFS home/work and arch file systems]

File Systems (1)
Lustre: mounted on the login, GPFS and compute nodes
$WORK = $LUSTREWORK = /lustre/jwork
- 800 TB
- default group quota: 3 TB, 2 million files
- no backup; files older than 28 days will be deleted
- recommended for large temporary files and high performance requirements
$HOME = $LUSTREHOME = /lustre/jhome1..24
- 29 to 62 TB per file system, distributed among the user groups
- default group quota: 3 TB, 2 million files
- daily backup
- recommended for permanent program data with low performance requirements (e.g. program sources, input files, configuration data)
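The slides do not show how to check usage against these quotas; on a Lustre client this is typically done with the lfs tool, for example (the group name is a placeholder):

    # show the group quota and current usage on the work file system
    lfs quota -g <group> /lustre/jwork
    # rough disk usage of one's own home directory
    du -sh $HOME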

File Systems (2)
GPFS: mounted on the GPFS nodes only
- shared with JUQUEEN, mounted from the JUST file server
- only available with a valid JUQUEEN user ID
$GPFSWORK = /gpfs/work
- 3.6 PB, mounted from the JUST file server
- details on sizes, quota, backup etc. => tomorrow
$GPFSHOME = /gpfs/homea.. homec
- the mounted JUQUEEN home file systems
- details on sizes, quota, backup etc. => tomorrow
$GPFSARCH = /gpfs/arch, ..1, ..2
- automatic data migration to/from the tape library

Backup
- Backup is done for the Lustre home file systems, GPFS/home and GPFS/arch
- Daily backup with TSM
- Restore of user data can only be done on the GPFS nodes:
    ssh -X <user>@juropagpfs.fz-juelich.de
    adsmback
  - select home, arch or gpfshome; this opens a panel for interactive restore
  - Select => Restore => File Level
  - choose your files/directories to restore => Restore

Further Information
- Regular preventive maintenance on Thursdays; see the "Message of today" at login
- Juropa online documentation: http://www.fz-juelich.de/ias/jsc/juropa/
- User support at FZJ: sc@fz-juelich.de, phone: 02461 61-2828