Technical Computing Suite Job Management Software

Toshiaki Mikamo, Fujitsu Limited
Supercomputer PRIMEHPC FX10 / PRIMERGY x86 cluster

Outline
- System Configuration and Software Stack
- Features
- The major functions of the job scheduler:
  - Efficient Resource Usage
  - Fair Share Scheduling
  - System-optimal Resource Assignment
- Summary and Future

Hybrid System Configuration
- Supercomputer PRIMEHPC FX10: 6D mesh/torus interconnect (Tofu), local file system (temporary area occupied by jobs)
- PRIMERGY x86 cluster: Fat-Tree interconnect (InfiniBand)
- Global file system (data storage area), reached over the IO network (IB); management network (GbE)
- Login nodes: login, compilation, and job submission by users
- Management nodes: job management nodes, file management nodes, and control nodes, providing system operations management and job operations management for the administrator

System Software Stack
Technical Computing Suite sits between user/ISV applications (with the HPC Portal / System Management Portal) and the operating systems: a Linux-based enhanced OS on the PRIMEHPC FX10 and Red Hat Enterprise Linux on the PRIMERGY x86 cluster. Its components:
- System operations management: system configuration management, system control, system monitoring, system installation & operation
- Job operations management: job manager, job scheduler, resource management, parallel execution environment
- High-performance file system: Lustre-based distributed file system; high scalability; IO bandwidth guarantee; high reliability & availability
- VISIMPACT(TM): shared L2 cache on a chip; hardware intra-processor synchronization
- Compilers: hybrid parallel programming; sector cache support; SIMD / register file extensions
- Support tools: IDE, profiler & tuning tools, interactive debugger
- MPI library: scalability of high-function barrier communication

Features
- Same job operations on FX10 and PRIMERGY
- Efficient, fair, and system-optimal job scheduling (detailed on the following slides)
- Resource / access control (a generic sketch follows this list):
  - Elapsed time limit / CPU time limit / physical memory limit
  - Enable / disable execute permission of job operation commands
- Reduced OS jitter / power saving control
- Job statistical information:
  - Amount of CPU time / memory / IO
  - SIMD rate / MIPS / MFLOPS
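
Technical Computing Suite enforces these limits itself; as a generic illustration only (not the product's interface), a launcher can impose CPU-time and memory limits on a job process with POSIX rlimits. The command and limit values below are hypothetical.

```python
import resource
import subprocess

def run_with_limits(cmd, cpu_seconds, mem_bytes):
    """Run a job command under CPU-time and address-space limits."""
    def set_limits():
        # Terminate the job once it exceeds its CPU time limit.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap the job's virtual memory.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    # preexec_fn runs in the child process, so the limits apply to the job only.
    return subprocess.run(cmd, preexec_fn=set_limits)

# Hypothetical job: replace "/bin/sleep" with the real job command.
run_with_limits(["/bin/sleep", "1"], cpu_seconds=3600, mem_bytes=4 * 2**30)
```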

Job Scheduler
We renewed our job scheduler for the large-scale system. Its features:
- Multi-process: multiple schedulers can coexist in a cluster.
- Multi-thread: the scheduling load is balanced across threads (see the sketch below).
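
A minimal sketch of the multi-thread idea, assuming the scheduling work can be partitioned so that each thread schedules one slice of the cluster independently; this is not the actual scheduler's design, and all job and node names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_partition(partition_id, jobs, free_nodes):
    """Greedily assign each job's node request within one cluster partition."""
    plan = []
    for job in jobs:
        need = job["nodes"]
        if need <= len(free_nodes):
            plan.append((job["name"], free_nodes[:need]))
            free_nodes = free_nodes[need:]
    return partition_id, plan

# Hypothetical partitions: each scheduler thread owns one slice of the cluster,
# so scheduling decisions for different slices proceed in parallel.
partitions = {
    0: ([{"name": "jobA", "nodes": 2}], ["n0", "n1", "n2", "n3"]),
    1: ([{"name": "jobB", "nodes": 3}], ["n4", "n5", "n6", "n7"]),
}

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    futures = [pool.submit(schedule_partition, pid, jobs, nodes)
               for pid, (jobs, nodes) in partitions.items()]
    for future in futures:
        print(future.result())
```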

Efficient Resource Usage
Backfill scheduling keeps the resources busy. Our scheduler manages space (compute nodes) and time: low-priority jobs are backfilled into idle slots as long as they do not delay high-priority jobs (a sketch follows the figure note).
[Figure: two timelines from now through t1-t3 comparing the schedule without and with backfilling; with backfilling, Job D runs early in a gap ahead of Jobs B and C without delaying them.]
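
A minimal sketch of backfilling as described above, under these assumptions (not the product's actual algorithm): each job carries a node count and an elapsed-time limit, the highest-priority waiting job receives a reservation, and lower-priority jobs start early only if they fit the currently free nodes and finish before that reservation.

```python
def backfill(running, waiting, total_nodes, now):
    """Return jobs that may start now without delaying the top waiting job.

    running: list of (nodes, end_time).
    waiting: priority-ordered list of (name, nodes, time_limit).
    """
    free = total_nodes - sum(nodes for nodes, _ in running)
    if not waiting:
        return []
    head_name, head_nodes, _ = waiting[0]
    if head_nodes <= free:
        return [head_name]  # the head job starts immediately; no backfill needed
    # Reserve the head job's start: the earliest time enough nodes are free.
    reserve_at = now
    avail = free
    for nodes, end in sorted(running, key=lambda r: r[1]):
        avail += nodes
        if avail >= head_nodes:
            reserve_at = end
            break
    started = []
    for name, nodes, limit in waiting[1:]:
        # Backfill rule: the job must fit in the free nodes now and must
        # end (by its time limit) before the head job's reserved start.
        if nodes <= free and now + limit <= reserve_at:
            started.append(name)
            free -= nodes
    return started

# A running job holds 8 of 12 nodes until t=10; Job B (6 nodes) must wait,
# but Job D (4 nodes, limit 8) fits now and finishes before t=10.
print(backfill(running=[(8, 10)], waiting=[("B", 6, 20), ("D", 4, 8)],
               total_nodes=12, now=0))   # -> ['D']
```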

Fair Share Scheduling
Resources are shared fairly between users/groups based on past usage:
1. A fair share value is issued in advance for each user/group.
2. The value changes according to actual resource usage.
3. Job execution priority is determined dynamically from the value.
The fair share value works like money: a job costs a payment when it is allocated, the value recovers over time like a deposit, and the overpaid part is returned when a job finishes earlier than its limit (sketched in code below).
- Payment: P = (#nodes allocated) x (elapsed time limit of job)
- Deposit: D = (elapsed time) x (recovery rate)
- Return of overpaid: r = P - (#nodes allocated) x (actual elapsed time of job)
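
The money analogy maps onto a small bookkeeping routine. This sketch applies the three formulas above per job; the class name, the initial value, and the recovery rate are invented, and the product's actual priority calculation may differ.

```python
class FairShareAccount:
    """Track a user's fair share value using the payment/deposit/return model."""

    def __init__(self, initial_value, recovery_rate):
        self.value = initial_value          # fair share "money"
        self.recovery_rate = recovery_rate  # deposit earned per unit of time

    def submit(self, nodes, time_limit):
        # Payment: P = (#nodes allocated) x (elapsed time limit of job)
        self.value -= nodes * time_limit

    def tick(self, elapsed):
        # Deposit: D = (elapsed time) x (recovery rate)
        self.value += elapsed * self.recovery_rate

    def finish(self, nodes, time_limit, actual_elapsed):
        # Return of overpaid: r = P - (#nodes) x (actual elapsed time)
        self.value += nodes * (time_limit - actual_elapsed)

# Jobs of users/groups with a higher fair share value run first.
acct = FairShareAccount(initial_value=1000, recovery_rate=2.0)
acct.submit(nodes=4, time_limit=10)                     # pay 40 up front
acct.finish(nodes=4, time_limit=10, actual_elapsed=6)   # ended early: 16 returned
acct.tick(elapsed=6)                                    # value recovers: +12
print(acct.value)                                       # 988.0
```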

Optimal Job Scheduling for FX10: Interconnect Topology-Aware Resource Assignment
- One interconnect unit: 12 nodes (2 x 3 x 2)
- Job assignment rule: a rectangular solid shape, guaranteeing neighbor communication and avoiding interference with other jobs
- The scheduler rotates the rectangular solid of interconnect units to reduce fragmentation (see the sketch below).
[Figure: a 3D space of interconnect units (x, y, z axes) with in-use and unoccupied regions; rectangular jobs with dimensions such as 8 x 4 x 6 are rotated to fit the free space.]
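
A minimal sketch of the rotation idea, not the actual assignment algorithm: given a job requesting an a x b x c block of interconnect units, try the axis permutations of the requested shape and scan a 3D occupancy grid first-fit for a free rectangular solid. The grid size and scan order are illustrative.

```python
from itertools import permutations

def find_block(grid, shape):
    """First-fit search for a free rectangular solid, trying all rotations.

    grid: 3D nested lists of booleans (True = in use), indexed [x][y][z].
    shape: requested (a, b, c) in interconnect units.
    Returns (origin, rotated_shape) or None if nothing fits.
    """
    X, Y, Z = len(grid), len(grid[0]), len(grid[0][0])
    for dx, dy, dz in set(permutations(shape)):   # rotate the solid
        for x in range(X - dx + 1):
            for y in range(Y - dy + 1):
                for z in range(Z - dz + 1):
                    if all(not grid[x + i][y + j][z + k]
                           for i in range(dx)
                           for j in range(dy)
                           for k in range(dz)):
                        return (x, y, z), (dx, dy, dz)
    return None

# 4x4x4 units, with a 4x4x2 slab already in use at z = 0..1.
grid = [[[z < 2 for z in range(4)] for _ in range(4)] for _ in range(4)]
# The request (4, 2, 4) only fits after rotation to (4, 4, 2).
print(find_block(grid, (4, 2, 4)))   # -> ((0, 0, 2), (4, 4, 2))
```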

Optimal Job Scheduling for FX10: Asynchronous File Staging
- Stage IN: files are transferred asynchronously from the global to the local file system before the job starts.
- Stage OUT: files are transferred asynchronously from the local to the global file system after the job ends.
Computation and file transfer are co-scheduled: while one job runs on the compute nodes, the IO nodes stage in the next job's files and stage out the previous job's results (see the sketch below).
[Figure: FX10 compute nodes, IO nodes, and local file system connected by the interconnect, with login nodes and the global file system (data storage area) reached over the IO network (IB) and management network (GbE); a timeline (now, t1-t3) shows stage IN/OUT overlapped with running Jobs B and C.]
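
A sketch of the co-scheduling idea, with a thread pool standing in for the IO nodes: the next job's stage-in overlaps the current job's computation, and stage-out does not block the compute nodes. Job names and timings are invented; the real staging mechanism belongs to the product.

```python
from concurrent.futures import ThreadPoolExecutor
import time

io_nodes = ThreadPoolExecutor(max_workers=2)   # stands in for the IO nodes

def stage_in(job):
    print(f"stage IN  {job['name']} (global FS -> local FS)")
    time.sleep(job["stage_sec"])               # simulated transfer

def stage_out(job):
    print(f"stage OUT {job['name']} (local FS -> global FS)")
    time.sleep(job["stage_sec"])

def co_schedule(jobs):
    """Overlap each job's computation with the next job's stage-in
    and the previous job's stage-out."""
    staged = io_nodes.submit(stage_in, jobs[0])
    for i, job in enumerate(jobs):
        staged.result()                        # inputs must be local to start
        if i + 1 < len(jobs):                  # prefetch the next job's inputs
            staged = io_nodes.submit(stage_in, jobs[i + 1])
        print(f"run {job['name']} on compute nodes")
        time.sleep(job["run_sec"])             # simulated computation
        io_nodes.submit(stage_out, job)        # does not block the compute nodes
    io_nodes.shutdown(wait=True)               # drain the remaining stage-outs

co_schedule([{"name": "Job B", "stage_sec": 0.2, "run_sec": 0.5},
             {"name": "Job C", "stage_sec": 0.2, "run_sec": 0.5}])
```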

Optimal Job Scheduling for PRIMERGY
Fine-grained node assignment:
- Node selection method: balancing / concentration
- Rank placement policy: pack / unpack (see the sketch below)
- Priority control of allocated nodes
- Execution mode: whether a node is occupied exclusively by one job or shared
- Strict core assignment: processes are bound to cores inside their job's territory, and no process can move to a core in another job's territory.
[Figure: Node#0-Node#2 showing node concentration of Jobs B, C, and D; rank pack vs. rank unpack placement of ranks R0/R1; cores 0-7 on one node bound to Job B's processes.]
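
A sketch of the two rank placement policies, assuming each node exposes a fixed number of cores: pack fills one node's cores before moving to the next node, while unpack spreads ranks round-robin across nodes. Function and node names are invented.

```python
def place_ranks(num_ranks, nodes, cores_per_node, policy):
    """Map MPI ranks to (node, core) slots under a pack or unpack policy."""
    placement = {}
    if policy == "pack":       # fill each node's cores before the next node
        for rank in range(num_ranks):
            placement[rank] = (nodes[rank // cores_per_node],
                               rank % cores_per_node)
    elif policy == "unpack":   # round-robin: one rank per node per pass
        for rank in range(num_ranks):
            placement[rank] = (nodes[rank % len(nodes)],
                               rank // len(nodes))
    return placement

nodes = ["node0", "node1", "node2"]
print(place_ranks(4, nodes, cores_per_node=2, policy="pack"))
# {0: ('node0', 0), 1: ('node0', 1), 2: ('node1', 0), 3: ('node1', 1)}
print(place_ranks(4, nodes, cores_per_node=2, policy="unpack"))
# {0: ('node0', 0), 1: ('node1', 0), 2: ('node2', 0), 3: ('node0', 1)}
```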

Summary and Future
We developed the job management software:
- Unified operability on PRIMEHPC FX10 and PRIMERGY
- A new job scheduler: efficiency, fairness, and system optimization
- Practical resource control and job statistical information
Future work: an operation simulator, so that administrators can simulate how operations will behave after changing operation parameters.
