Slurm Native Workload Management on Cray Systems

Morris Jette, jette@schedmd.com

Outline
- Slurm operation with the Cray ALPS resource manager
- Native Slurm design
- New Slurm capabilities (for most system types)

ALPS and BASIL
- ALPS: Application Level Placement Scheduler, Cray's resource manager
  - Six daemons plus a variety of tools
  - One daemon runs on each compute node to launch user tasks
  - The other daemons run on service nodes
  - Rudimentary scheduling software; dependent upon an external scheduler (e.g. Slurm) for workload management
- BASIL: Batch Application Scheduler Interface Layer, the XML interface to ALPS
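To make the BASIL exchange concrete, here is a minimal sketch of the kind of XML request an external workload manager sends to ALPS; the protocol version shown is an assumption and the exact schema varies by ALPS release:

    <?xml version="1.0"?>
    <!-- BASIL request asking ALPS for its node and reservation inventory.
         Other BASIL methods include RESERVE, CONFIRM and RELEASE. -->
    <BasilRequest protocol="1.2" method="QUERY" type="INVENTORY"/>

Requests of this form are handed to the apbasil client, which returns the corresponding BasilResponse XML document.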

Slurm and ALPS Functionality
- Slurm
  - Manages resources and jobs
  - Prioritizes queues and enforces limits
  - Schedules and accounts for jobs
- ALPS
  - Allocates and releases reservations for jobs (as directed by Slurm)
  - Launches tasks
  - Monitors node health

Slurm Architecture for Cray
- slurmctld runs on the sdb node
- slurmd runs on the service node(s)
- No Slurm daemons run on the compute nodes
- Slurm communicates with ALPS through BASIL; ALPS manages the compute nodes

Job Launch Process
1. The user submits a job script
2. slurmctld creates an ALPS reservation
3. slurmctld sends the job script to slurmd
4. slurmd claims the reservation for the session ID
5. slurmd launches the user script
6. aprun (an ALPS tool) launches the tasks on the compute nodes (invoked directly by the user or run by srun)
7. When the job finishes, the reservation is released
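As a concrete illustration of this flow under ALPS, a minimal job script simply calls aprun; the executable name and the node/task counts below are placeholders:

    #!/bin/bash
    #SBATCH --job-name=alps-demo      # placeholder job name
    #SBATCH --nodes=2                 # size of the ALPS reservation Slurm creates
    #SBATCH --time=00:10:00

    # aprun (ALPS) places the tasks on the compute nodes of the claimed
    # reservation; -n is the total task count (placeholder values).
    aprun -n 48 ./my_mpi_app

Submitting this with sbatch walks through the steps above: slurmctld creates the ALPS reservation, slurmd claims it and runs the script on a service node, aprun launches the tasks, and the reservation is released when the job ends.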

Outline
- Slurm operation with the Cray ALPS resource manager
- Native Slurm design
- New Slurm capabilities (for most system types)

Motivation for Native Slurm
- The current architecture has limitations due to the translation from Slurm to ALPS; not all Slurm features are supported by ALPS:
  - Spawning multiple concurrent jobs per login session
  - Running multiple applications (job steps) per job allocation
  - Running multiple jobs per node
  - Job profiling
- Improved performance
- Native Slurm functionality for scheduling, resource management and reporting
- The majority of MPI implementations already interface to Slurm as a launcher via the srun command
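One of the capabilities blocked by the ALPS translation, multiple concurrent job steps in a single allocation, looks like the following sketch under native Slurm; the application names and sizes are placeholders:

    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --time=00:30:00

    # Two applications (job steps) run side by side inside one job
    # allocation, each launched natively by srun.
    srun --nodes=2 --ntasks=48 ./app_one &
    srun --nodes=2 --ntasks=48 ./app_two &
    wait    # return only after both steps have completed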

ALPS Refactored
- ALPS was divided into:
  - A library with the underlying functions (network management, node health check)
  - Daemons, commands, etc.
- Previous functionality is preserved

Slurm Native Implementation
Cray and SchedMD developed plugins to provide the following services:
- Dynamic node state change information
- System topology information
- MPMD (Multiple-Program Multiple-Data) support
- Node Health Check support (can be disabled in Slurm)
- Network performance counter management
- Congestion management information for the Cray Hardware Supervisory System

Slurm Native Architecture
- slurmctld and the Slurm Cray plugins interface with alpscomm, a C library module providing Cray-specific information
- slurmd daemons run on the compute nodes

Slurm Cray-Specific Features
- Network Performance Counters (NPC)
  - To access Cray's NPC, use the --network option of the sbatch/salloc/srun commands
  - --network=system for the system-wide NPC
  - --network=blade for the blade NPC
- Core Specialization
  - The ability to reserve a number of the cores allocated to the job so they are not used by the application
  - To specify the count, use the -S/--core-spec=# option of sbatch/salloc/srun
  - All non-application processes are migrated to the specialized cores to reduce application jitter (system noise)
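A short usage sketch of both features (the application and script names are placeholders):

    # Collect blade-level network performance counters for this application
    srun --network=blade -N 2 -n 48 ./my_app

    # Reserve 2 specialized cores per node for system processes so they
    # stay off the cores running the application
    sbatch --core-spec=2 job_script.sh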

Slurm Configuration for Cray
Configure the plugins to use Cray without ALPS:
- CoreSpec (binds system programs to the specialized cores): set CoreSpecPlugin=core_spec/cray
- Job Submit (sets gres=craynetwork to limit the number of simultaneous applications): set JobSubmitPlugin=job_submit/cray; also set the craynetwork GRES count on each node to 4 (or 2 if the node includes a Xeon Phi)
- Process tracking (uses the Cray job container to purge files): set ProctrackType=proctrack/cray
- Select (manages network counters, plus a wrapper for select/cons_res): set SelectType=select/cray
- Switch: set SwitchType=switch/cray
- Task (configures some environment variables): set TaskPlugin=cray (other task plugins can also be used)
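Collected into slurm.conf, a minimal sketch of these settings might look as follows; the node name, GRES counts and the GresTypes line are illustrative assumptions, and the keyword spellings (JobSubmitPlugins, TaskPlugin=task/cray) follow slurm.conf conventions rather than the abbreviated slide text:

    # slurm.conf excerpt for native Slurm on a Cray system (no ALPS)
    CoreSpecPlugin=core_spec/cray
    JobSubmitPlugins=job_submit/cray
    ProctrackType=proctrack/cray
    SelectType=select/cray
    SwitchType=switch/cray
    TaskPlugin=task/cray

    # craynetwork GRES caps the number of simultaneous applications per node
    GresTypes=craynetwork
    NodeName=nid000[01-64] Gres=craynetwork:4   # use craynetwork:2 on Xeon Phi nodes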

Outline
- Slurm operation with the Cray ALPS resource manager
- Native Slurm design
- New Slurm capabilities (for most system types)

New Slurm Functionality
- Core specialization (extended to generic Linux clusters by Bull using cgroups)
- CPU governor under user control
- Gang scheduling support for the user-controlled CPU governor and frequency
- New function calls added to several plugins (greater flexibility)
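For the user-controlled governor and frequency, srun exposes the --cpu-freq option; a hedged example follows (accepted values depend on the Slurm version and on what the site permits):

    # Request the "performance" CPU governor for this job step
    srun --cpu-freq=Performance -n 24 ./my_app

    # Or request a frequency range in kHz together with a governor
    srun --cpu-freq=1200000-2400000:OnDemand -n 24 ./my_app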

Slurm Limitation on Cray
- The number of running applications per node is limited due to network constraints: 2 or 4 simultaneous applications (depending upon the hardware)