Large scale HPC workflow management using PBS Professional in a National Academic Computing Center

Large scale HPC workflow management using PBS Professional in a National Academic Computing Center. Gérard GIL (CINES), 29/10/2010

Outline: Presentation of CINES (Missions / activities; Organisation of French HPC; CINES's HPC resources and services). Workflow Management (Existing environment; PBSpro implementation; Statistics; Next steps)

Outline: Presentation of CINES (Missions / activities; Organisation of French HPC; CINES's HPC resources and services)

Presentation of CINES: Missions / activities. CINES is supervised and funded by the Ministry of Higher Education and Research, and is located in Montpellier (South of France). With 30 years serving national academic research, CINES provides the French public research community with computing resources and services. Staff of 50 (technicians, engineers and administrative personnel).

Presentation of CINES: Missions / activities. Two missions: High Performance Computing and Digital preservation.

Presentation of CINES: Missions / activities. Digital preservation: since 2004, CINES has had the mandate to provide long-term preservation capabilities for digital objects related to scientific and technical information. National accreditation process: CINES is being accredited by the Archives Nationales. International certification process: CINES is one of the three pilot sites (with UKDA and DNB) testing the European Certification Framework for long-term preservation, supported by the European Commission (ISO certification).

Presentation of CINES: Missions / activities. Digital preservation: electronic PhD theses. The French Ministry of Higher Education has designated CINES as the national center for long-term preservation of digital theses.

Presentation of CINES: Missions / activities. Digital preservation: electronic PhD theses; digitized publications. TGE-ADONIS: multimedia documents for the Research Center on Oral Resources (CRDO). Liber floridus: medieval manuscripts from the university libraries Mazarine, St Geneviève, IRH, ...

Presentation of CINES: Missions / activities. Digital preservation: electronic PhD theses; digitized publications; pedagogic multimedia. CANAL_U: videos of university channels (for CERIMES).

Presentation of CINES: Missions / activities. Digital preservation: electronic PhD theses; digitized publications; pedagogic multimedia; scientific datasets, a new domain directly linked to our mission in HPC.

Presentation of CINES: Organization of French HPC. High Performance Computing: National HPC Center since 1980. Scalar, vector and parallel processing. Accelerators: GPU, CELL, FPGA.

Presentation of CINES: Organization of French HPC. French HPC coordination.

Presentation of CINES: Organization of French HPC. Main missions of GENCI (www.genci.fr): coordinate the national academic supercomputing centers (civil activities); promote European HPC and participate in its organization (e.g. the European Exascale Software Initiative); promote simulation and HPC in fundamental and industrial research; give access to its equipment to promote HPC (HPC-PME initiative: GENCI - OSEO - INRIA).

Presentation of CINES: Organization of French HPC. French academic equipment for researchers: national computing centers in a pyramidal organization. European: PRACE. National: CCRT, CINES, IDRIS. Regional: mid-range centers. Very large equipment for solving extremely complex problems; resources free of charge, allocated to scientific projects after evaluation by thematic national committees (www.edari.fr). From 2007 to 2010: 20 to 700 Tflops.

Presentation of CINES: Organization of French HPC. Main national equipment. CCRT (CEA): Bull Novascale Xeon system, 143 Tflops; NVIDIA Tesla GPU system, 192 Tflops. CINES (universities): SGI Altix ICE Xeon system, 267 Tflops. IDRIS (CNRS): IBM SP / Power6 system, 68 Tflops; IBM Blue Gene/P system, 139 Tflops.

Presentation of CINES's HPC resources and services. CINES national HPC resources. JADE: SGI Altix ICE 8200 EX; Linpack 237 Tflops (#14 on the Top500, June 2010); Linux SuSE; SGI Tempo for cluster administration; Altair PBS Professional 10.1.4 (patched) as job scheduler; Lustre for scratch files; NFS / DMF for /home and archives.

Presentation of CINES's HPC resources and services. JADE.

Presentation of CINES's HPC resources and services. JADE InfiniBand hypercube topology: increases the bisection bandwidth within the same rack (standard vs. enhanced 4-dimensional hypercube).

Presentation of CINES's HPC resources and services. CINES HPC systems.
Computing resources: SGI ICE 8200 EX, 267 Tflops: 1536 dual-socket quad-core nodes (12288 cores), Xeon 3 GHz (Harpertown), 32 GB/node; 1344 dual-socket quad-core nodes (10752 cores), Xeon 2.8 GHz (Nehalem), 36 GB/node; InfiniBand DDR/QDR; 700 TB (Lustre). Bull Novascale: 20 R422 nodes, 2 quad-core sockets / 8 GB; 24 R422-E1 nodes + Tesla S1070 GPU; InfiniBand DDR + Lustre.
Pre/post-processing resources: IBM P1600: 5 P575 nodes, 16 cores Power5 / 32 GB, InfiniBand DDR + GPFS, 0.5 Tflops; 4 P755 nodes, 32 cores Power7 / 128 GB, InfiniBand DDR + GPFS, 3 Tflops.
Backbone: 10 GbE.
Storage resources: file servers SGI Altix 450, 16 Montecito cores / 64 GB; 500 TB LSI disk storage (DMF); 2 NAS FAS250 for /home. Library: 2 x IBM TS3500, 2000 cartridges, 7 Jaguar 3 drives, 7 LTO-4 drives.

Presentation of CINES's HPC resources and services. Annual national resource allocation campaign (edari), 2010 thematic committees: 1 - ENVIRONMENT; 2 - COMPUTATIONAL FLUID DYNAMICS; 3 - BIOMEDICAL SIMULATION & HEALTH APPLICATIONS; 4 - ASTROPHYSICS & GEOPHYSICS; 5 - THEORETICAL PHYSICS & PLASMA PHYSICS; 6 - COMPUTER SCIENCES & MATHEMATICS; 7 - MOLECULAR SYSTEMS & BIOLOGY; 8 - QUANTUM CHEMISTRY & MOLECULAR DYNAMICS; 9 - PHYSICS & MATERIAL SCIENCE; 10 - NEW APPLICATIONS & TRANSVERSE APPLICATIONS.

Presentation of CINES's HPC resources and services. Services for users: user support and HPC expertise (CINES offers expertise in profiling, optimisation and parallelization); data visualization (hardware, software and expertise); training and workshops (programming, MPI, OpenMP, ...); participation in projects (European projects, national and international collaborations).

Outline: Presentation of CINES (Missions; Organisation of French HPC; CINES HPC resources and services). Workflow Management (Existing environment; PBSpro implementation; Statistics; Next steps)

Workflow Management: Existing environment. Existing multi-platform tools: project management; workload monitoring; job monitoring; job statistics; accounting process. Today only PBSpro is installed from the «PBS GridWorks suite».

Workflow Management: Existing environment. Project management: edari.

Workflow Management: Existing environment. Machine workload and availability (admin).

Workflow Management: Existing environment. Machine workload and availability (users), available on http://www.cines.fr.

Workflow Management: Existing environment. Graphical job monitoring: llview (Jülich Supercomputing Centre) (admin). 29/10/2010 EHTC 2010 - Gérard GIL (CINES)

Workflow Management: PBSpro implementation. «PBS Professional» on JADE: selected as job scheduler within the JADE procurement; installed since 2008 (v 9.1 to v 10.1). Several adjustments, RFEs and bug fixes to fit our needs: node selection based on topology; accounting feature; large-job management. PBS is configured to make resource access easy for users: no need to know the «queue» configuration; only «walltime» and «number of nodes/cores» are required; users specify more only to enforce specific selection or placement.
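Under that policy, a submission script needs only the two mandatory limits; everything below except the walltime and select line (job name, module, binary) is a hypothetical illustration, not an actual CINES script:

```shell
#!/bin/bash
#PBS -N my_simulation          # job name (hypothetical)
#PBS -l walltime=24:00:00      # mandatory per the slides: wall-clock limit
#PBS -l select=64:ncpus=8      # mandatory per the slides: nodes x cores
# No queue is specified: PBS routes the job to the right queue/partition
# itself; additional -l terms would only be used to force a specific
# node selection or placement.

cd "$PBS_O_WORKDIR"
mpiexec ./my_app               # hypothetical application binary
```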

Workflow Management: PBSpro implementation. «PBS Professional» on JADE: two logical partitions to manage the machine's heterogeneity; individual casting of applications, based on efficiency (profiling); the largest and most efficient jobs are routed to «Nehalem», all others to «Harpertown». One PBSpro «hook» adapts PBS to our needs: automatic allocation of job resources and priority; batch access control for users (machine selection, user authorization control, parameter controls); dynamic node pools. User/job priority: backfill (Top1), fairshare, formula; large jobs are privileged.
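The partition routing described above could be sketched as the plain-Python decision a PBS Professional submission hook would apply; the threshold, function name and partition labels here are illustrative assumptions, not CINES's actual policy values:

```python
# Illustrative sketch of the routing decision a PBS Professional
# "queuejob" hook could make. The 2048-core cut-off is a hypothetical
# value, not CINES's real threshold.

NEHALEM_MIN_CORES = 2048  # assumed cut-off for the "large job" partition


def choose_partition(ncores: int, efficient_on_nehalem: bool) -> str:
    """Route the largest jobs, and applications profiled as efficient
    on Nehalem, to the Nehalem partition; everything else to Harpertown."""
    if ncores >= NEHALEM_MIN_CORES or efficient_on_nehalem:
        return "nehalem"
    return "harpertown"


# In a real hook this result would be written back to the job via the
# server-side pbs module (e.g. pbs.event().job), which only exists
# inside the PBS server and is therefore not imported here.
```

The point of doing this in a hook rather than in per-queue limits is that the decision can combine job size with per-application profiling data, as the slide describes.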

Workflow Management: Statistics. PBSpro usage, some key numbers: almost 1 million jobs processed since September 2008; up to 8000 cores per job routinely managed by PBSpro; job walltime up to 24 hours (standard) or 120 hours (only if checkpointed); most CPU hours consumed by jobs over 512 cores on the «Harpertown» partition and jobs over 2048 cores on the «Nehalem» partition; cores waiting in the queue = 10 times the machine size; machine workload close to 90% every month.

Workflow Management. PBSpro: next steps. Upgrade to v 11. Improve job priority management: fairshare + formula. Improve workload handling around «scheduled machine maintenance». Manage «Service Level Agreements» for users.
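The planned fairshare + formula combination might be configured along these lines; `job_sort_formula` and the `sched_config` fairshare keys are standard PBS Professional knobs, but the weights and resource choices below are purely illustrative assumptions:

```shell
# Hypothetical scheduler tuning for a FAIRSHARE + FORMULA setup.
# The formula weights are illustrative, not CINES's actual policy.
qmgr -c 'set server job_sort_formula = "10 * ncpus + queued_time / 3600"'

# In the scheduler's sched_config file (assumed settings):
#   fair_share: true ALL
#   fairshare_usage_res: cput
#   fairshare_decay_time: 24:00:00
```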

Questions?
