A highly configurable and efficient simulator for job schedulers on supercomputers



Transcription:

Carsten Karbach, Jülich Supercomputing Centre (JSC), member of the Helmholtz Association
April 12, 2013

Motivation: Objective
Simulation of supercomputer job schedulers to predict job start times.
Features of the simulation program:
- Efficient and configurable
- Extensible and generic
Use cases:
- User: predict the start time of their own jobs
- Administrator: configurable simulation of the supercomputer, throughput optimization
Problem: the scheduling algorithms are unpublished, and job schedulers do not provide a global on-line prediction.

Motivation: Solution
- JuFo (Jülich Forecast): C++ application using the data format LML
- Based on the prediction component of LLview
- LLview provides status information of supercomputers in LML
- Abstraction of the scheduling systems Moab and Loadleveler

Part I: Problem definition

1. Problem definition
What is the objective of JuFo?
- On-line simulation of global job schedulers
- LML as interface to LLview, and as input and output of JuFo
- Prediction of job dispatch time and used resources
Global job scheduler:
- Prioritizes and places jobs
- A fixed CPU set is allocated to each job
- Handles reservations, queues and dependencies
- Examples: Moab (JUROPA), Loadleveler (JUQUEEN)

1. Problem definition: The scheduling problem
- A supercomputer offers resources, e.g. CPUs, GPUs, memory
- Jobs request resources and wait in queues until resources become available
- Problem: generate an optimal schedule for all waiting jobs
- Optimization over time and resources
- Approximation with FCFS, List-Scheduling and Backfilling

Part II: Embedding JuFo

2. Embedding JuFo
[Figure: data flow. Supercomputer, (step 1) data gatherer LML_da produces raw LML, (step 2) scheduler simulator JuFo extends the raw LML, (step 3) the resulting LML goes to the clients llview-client and PTP]

2. Embedding JuFo: Data format
Large-scale system Markup Language (LML):
- Data format for status information of supercomputers
- Based on XML, specified by an XML Schema
- Interface for LML_da, JuFo and clients
- The LLview client visualizes JuFo's outputs
Input data required by JuFo:
- Compute nodes: number of processors per node
- Queues: grouping of jobs, scheduling rules
- Jobs: running and waiting; requested resources

Part III: Scheduling algorithms

3. Scheduling algorithms: Constraints
- No job migration
- No preemption
- A wall clock limit (WCL) is mandatory for each job
- The WCL is used as the actual job run-time in JuFo

3. Scheduling algorithms
- FCFS: the job with the longest queue time starts first
- List-Scheduling: prioritization by an arbitrary formula; lower ranked jobs can run before higher ranked jobs
- Backfilling: reservations for top dogs; idling resources are filled with smaller jobs
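The List-Scheduling rule above can be illustrated with a minimal sketch. It assumes a single pool of identical CPUs, uses the WCL as the run-time, and invents the `Job` record and the `listSchedule` function purely for illustration; none of these names come from JuFo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical job record for this sketch; not JuFo's data model.
struct Job {
    int id;
    int cpus;         // requested CPUs
    long wcl;         // wall clock limit in seconds, used as run-time
    double priority;  // higher value means higher rank
    long start = -1;  // dispatch time, -1 while the job is waiting
};

// List scheduling on one pool of totalCpus CPUs: at every event time,
// walk the priority-sorted list and start every waiting job that fits.
// A lower ranked job may therefore start before a higher ranked one.
void listSchedule(std::vector<Job>& jobs, int totalCpus) {
    assert(totalCpus > 0);
    std::stable_sort(jobs.begin(), jobs.end(),
                     [](const Job& a, const Job& b) { return a.priority > b.priority; });
    long now = 0;
    std::size_t waiting = jobs.size();
    while (waiting > 0) {
        // CPUs occupied by jobs that are running at time now.
        int used = 0;
        for (const Job& j : jobs)
            if (j.start >= 0 && j.start + j.wcl > now) used += j.cpus;
        // Start every waiting job that still fits, in priority order.
        for (Job& j : jobs) {
            if (j.start < 0 && j.cpus <= totalCpus - used) {
                j.start = now;
                used += j.cpus;
                --waiting;
            }
        }
        if (waiting == 0) break;
        // Advance to the next job completion.
        long next = -1;
        for (const Job& j : jobs) {
            const long end = j.start + j.wcl;
            if (j.start >= 0 && end > now && (next < 0 || end < next)) next = end;
        }
        if (next < 0) break;  // remaining jobs request more CPUs than exist
        now = next;
    }
}
```

With three jobs of 4, 8 and 4 CPUs on an 8-CPU system, the lowest ranked 4-CPU job starts immediately next to the top-ranked one, while the 8-CPU job in between has to wait.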

3. Scheduling algorithms: Further requirements
- Prioritization on JUROPA: systemprio + userprio + 10 nodes + qtime + qtime/wall 300 actjobs
- Job dependencies
- Reservations
- Resource requests: nodes, CPUs, GPUs, global/per node, network topology
- Nodesharing
- Top dogs per queue
- A queue defines scheduling constraints:
  - Allowed nodes
  - Limits on the number of active/waiting jobs

Part IV: Design of JuFo

4. Design of JuFo: Package overview

4. Design of JuFo
Target:
- Packages are encapsulated
- Interaction only via interfaces; the actual implementation is unknown to foreign packages
- Reference implementation for each interface
Result:
- Extensible basis for job scheduler simulation
- Well-defined extension points for new sorting/scheduling algorithms and resource managers
- Arbitrary combination of available sorting/scheduling algorithms and resource managers

Part V: Validation

5. Validation: How to validate results?
1. Log all events for a given time span (new jobs, early job completion, canceling of waiting jobs)
2. Run JuFo for this time span using exact WCLs
3. Compare reality with prediction
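Step 3 needs some measure of the difference between prediction and reality. The slides do not say which metric was used; the sketch below assumes a mean absolute error over dispatch times, and the function name `meanAbsoluteError` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute difference between predicted and actual dispatch times.
// Both vectors hold one time stamp (in seconds) per job, in the same job order.
// Assumption: the slides do not name the error metric; this is one plausible choice.
double meanAbsoluteError(const std::vector<long>& predicted,
                         const std::vector<long>& actual) {
    assert(predicted.size() == actual.size() && !predicted.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        sum += std::abs(static_cast<double>(predicted[i] - actual[i]));
    return sum / static_cast<double>(predicted.size());
}
```

For predicted starts of 0 s, 600 s and 1200 s against actual starts of 60 s, 600 s and 900 s, the function returns 120 s, i.e. an average error of two minutes.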

5. Validation: Validation results for JUROPA
Validation framework:
- JuFo is best configured for JUROPA
- Validation run on 8 different days
Results:
- Time span: 2-5 hours
- Jobs: 100-280 per day
- Average error: 1-13 minutes
- High variance due to missing information

Part VI: Outlook

6. Outlook
- Configuration and validation for JUQUEEN, JUDGE
- Parallelization for higher efficiency
- Visualization: in PTP, new visualization methods

Thank you for your attention!

Summary
- JuFo: extensible simulation for global job schedulers
- LML as data format
- LML_da collects input data
- Scheduling algorithms: FCFS, List-Scheduling, Backfilling
- Analysis of Loadleveler and Moab
- Simulation design
- Validation framework, successful on JUROPA

Backfilling implementation
[Figure: resources plotted over time [s], showing the resource limit, the current position, the jobs Job 1 and Job 2, the reservations Res 1 and Res 2, and the backfill windows BW 1 to BW 3]
Backfill window: a block of free resources over a time span
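A backfill window, read as "free resources over a time span", suggests a simple fit test for a waiting job. The sketch below is an assumption about how such a test could look; the types `BackfillWindow` and `Request` and the functions `fits` and `firstFit` are hypothetical and model resources as a plain CPU count.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: a window offers freeCpus CPUs from start to end (seconds).
struct BackfillWindow { int freeCpus; long start; long end; };
// Hypothetical job request: CPUs and wall clock limit (used as run-time).
struct Request { int cpus; long wcl; };

// A job fits if the window has enough CPUs and the job, started at the
// beginning of the window, finishes before the window closes.
bool fits(const BackfillWindow& w, const Request& r) {
    return r.cpus <= w.freeCpus && w.start + r.wcl <= w.end;
}

// Index of the first window that can hold the request, if there is one.
std::optional<std::size_t> firstFit(const std::vector<BackfillWindow>& windows,
                                    const Request& r) {
    assert(r.cpus > 0 && r.wcl > 0);
    for (std::size_t i = 0; i < windows.size(); ++i)
        if (fits(windows[i], r)) return i;
    return std::nullopt;
}
```

Placing a job then amounts to one pass over the windows instead of a forward simulation for that job.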

Scheduling problem: mathematical definition
- Schedule $s = (s_1, \dots, s_n)^T$ with $s_j$ the dispatch time of job $j$
- $V = \{ s \in \mathbb{R}^n_{\geq 0} \mid s \text{ is valid} \}$, the set of all allowed schedules
- $f$: objective function, e.g. $f(s) := \max_{j \in J} (s_j + w_j)$, where $w_j$ is the WCL of job $j$
- Find the optimal schedule $s_{opt} \in V$, i.e. $f(s_{opt}) = \min_{s \in V} f(s)$ for a cost such as the completion time above (or $\max_{s \in V} f(s)$ if $f$ measures a benefit)
- The problem is derived from the resource-constrained project scheduling problem (RCPSP)
- RCPSP is NP-hard, so no polynomial-time algorithm is known
- Instead, approximation with:
  - First-Come-First-Served (FCFS)
  - List-Scheduling
  - Backfilling
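The example objective, the completion time of the last job, is easy to evaluate for a given schedule. A minimal sketch, with the function name `makespan` chosen here for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates f(s) = max over all jobs j of (s_j + w_j), where s holds the
// dispatch times and w the wall clock limits, both indexed by job.
// Returns 0 for an empty schedule.
long makespan(const std::vector<long>& s, const std::vector<long>& w) {
    assert(s.size() == w.size());
    long f = 0;
    for (std::size_t j = 0; j < s.size(); ++j)
        f = std::max(f, s[j] + w[j]);
    return f;
}
```

For dispatch times (0, 0, 100) and WCLs (100, 10, 50) the value is 150, set by the job that starts last.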

Implementation: FCFS
void fcfs(waitingjobs)
    sort waitingjobs by queue date
    timepos = 0
    while timepos < timeline.size() && waitingjobs.size() > 0
        react on event at timepos
        while waitingjobs.size() > 0
            if waitingjobs[0] is insertable
                insert job into system at timepos
                waitingjobs.erase( waitingjobs[0] )
            else
                break
        timepos++
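The FCFS loop can be turned into a small runnable sketch under simplifying assumptions: one pool of identical CPUs, the WCL as run-time, and a timeline that holds only job-completion events. The `Job` and `Event` types are hypothetical and not JuFo's classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch.
struct Job { int id; long queueDate; int cpus; long wcl; long start = -1; };
struct Event { long time; int cpusReleased; };  // a job completion frees CPUs

// Returns the jobs sorted by queue date, with start set for every job that
// could be placed. Jobs behind a job that never fits keep start == -1.
std::vector<Job> fcfs(std::vector<Job> waiting, int totalCpus) {
    assert(totalCpus > 0);
    std::stable_sort(waiting.begin(), waiting.end(),
                     [](const Job& a, const Job& b) { return a.queueDate < b.queueDate; });
    std::vector<Event> timeline;           // kept sorted by time
    timeline.push_back(Event{0, 0});       // the current position
    int freeCpus = totalCpus;
    std::size_t head = 0;                  // index of the first waiting job
    for (std::size_t timepos = 0;
         timepos < timeline.size() && head < waiting.size(); ++timepos) {
        // React on the event at timepos: CPUs of a finished job become free.
        const long now = timeline[timepos].time;
        freeCpus += timeline[timepos].cpusReleased;
        // Start jobs strictly in queue order; stop at the first that does not fit.
        while (head < waiting.size() && waiting[head].cpus <= freeCpus) {
            Job& job = waiting[head++];
            job.start = now;
            freeCpus -= job.cpus;
            // The completion event lands behind timepos because its time is >= now.
            const Event done{now + job.wcl, job.cpus};
            timeline.insert(
                std::upper_bound(timeline.begin(), timeline.end(), done,
                                 [](const Event& a, const Event& b) { return a.time < b.time; }),
                done);
        }
    }
    return waiting;
}
```

Unlike List-Scheduling, a small job at the end of the queue cannot overtake a large job in front of it, even if enough CPUs are idle.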

Summary of JuFo packages
- Simulation: implements scheduling algorithms
- ResourceManager: manages compute resources; decides whether a job can be started on given resources
- Jobsorting: configurable prioritization of waiting jobs
- Timeline: stores all simulation events such as job dispatch, completion and reservations
- LMLParsing: reads input LML, converts it into an object hierarchy, generates output LML

Validation results JUROPA: Details I

Validation results JUROPA: Details II
- Difference between the users' WCL and the actual job time span: on average 2-4 hours
- Accuracy of prediction based on the users' WCLs is comparatively bad (average error of multiple hours)
- The target of JuFo is not exact prediction, but modeling the job scheduler based on input data provided by LML_da
- Better results are expected from combining JuFo with statistical data for the WCL

JuFo for JUROPA: improvements
- Use actual contingent data for job priorities instead of showq outputs → system-specific extension of LML_da
- Jobs with identical system priority are sometimes sorted incorrectly → improve details of job sorting
- Reservations are not collected by LML_da → extension of LML_da
- Dynamic querying of the queue configuration → extension/configuration of LML_da

JUROPA prediction

JUROPA actual schedule

Part I: LLview's prediction

1. LLview's prediction: LLview monitors supercomputer status
Source: Snapshot LLview for JUQUEEN (IBM BG/Q, 450k cores, 5.9 Petaflops)

1. LLview's prediction: SchedSim
- SchedSim is written in Perl
- Especially designed and tested for Loadleveler systems
- Basis for the design of JuFo
Source: Snapshot LLview for JUQUEEN

1. LLview's prediction: SchedSim analysis
Pro:
+ Highly configurable
+ Advanced modeling of JUGENE
+ Simple installation, embedded into LLview
Contra:
- Limited scalability and performance
- Hard to extend
- Based on an old implicit XML format, which is to be replaced by LML

Part II: Job schedulers

2. Job schedulers: Loadleveler vs. Moab
Loadleveler:
- Systems: JUGENE, JUQUEEN
- Consists of scheduler, execution machine and central manager
- Considers the torus network
- No simulation mode, prediction only for top dogs
- Backfilling
Moab:
- Systems: JUROPA, JUDGE
- Scheduler using Torque as resource manager
- Network irrelevant
- Simulation mode and showstart
- Backfilling with backfill window first

2. Job schedulers: Similarities
- Reservations
- Highly configurable
- Compute nodes offer resources, jobs request them
- Main components:
  1. Job sorting
  2. Scheduling algorithm
  3. Resource manager

Part III: Optimization

3. Optimization: Serial optimization
- Similar jobs: take advantage of jobs with similar requests (queue, number of CPUs, WCL)
- Simultaneous events: execute events with an identical time stamp in one iteration; this reduces the overall number of iterations
- Backfill windows: generate backfill windows and place jobs into them, instead of running a separate forward simulation for each job
- Recursive interval data structure: use intervals for storing compute resources instead of expanded trees

3. Optimization: Recursive interval data structure
- JuFo manages compute nodes and CPUs per node
- Jobs are allocated to resources, e.g. job 1 uses nodes 10-15
- Idea 1: Tree. Simple, but not efficient and memory demanding
- Idea 2: Recursive intervals. Efficient, but more complex operations
[Figure: an expanded tree listing Node 1, Node 2 and Node 3, each with Core 1 and Core 2, next to the equivalent recursive ID list with the ranges Node [1,3] and Core [1,2]]
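The idea of recursive intervals can be sketched as a range of IDs whose child ranges apply to every ID in the range. This is an illustrative guess at the concept, not JuFo's data structure; the type `IdRange` and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed ID range [first, last]. The child ranges describe the resources
// inside every ID of this range, e.g. Node [1,3] with child Core [1,2].
struct IdRange {
    int first;
    int last;
    std::vector<IdRange> children;
};

// Number of leaf resources (e.g. cores) that the range stands for.
long countLeaves(const IdRange& r) {
    assert(r.first <= r.last);
    const long width = static_cast<long>(r.last) - r.first + 1;
    if (r.children.empty()) return width;
    long perId = 0;
    for (const IdRange& c : r.children) perId += countLeaves(c);
    return width * perId;
}

// True if core `core` on node `node` is covered by a two-level range.
bool contains(const IdRange& nodes, int node, int core) {
    if (node < nodes.first || node > nodes.last) return false;
    for (const IdRange& c : nodes.children)
        if (core >= c.first && core <= c.last) return true;
    return false;
}
```

Node [1,3] with Core [1,2] needs two range objects, whereas the expanded tree for the same six cores needs three node entries and six core entries; the gap grows with the number of nodes.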

3. Optimization: Profile analysis I
Problem: How much simulation time is spent in each component?
Solution: Profile analysis with gprof
Investigated JuFo runs on JUROPA:
- Backfilling, 1 top dog per queue, divided into JSC and HPC-FF
- 10 input samples with 900-1600 jobs, more than 3000 nodes, simulation time 16-65 s

3. Optimization: Profile analysis II
[Figure: pie chart of run-time shares in percent (10.9, 56.9, 6.4, 19.3, 6.5) over the components backfillwindows, sort, reservetopdogs, isinsertable and other]
Results:
- The largest share is spent on backfill windows
- But job placement is efficient
- About 20% is spent on sorting

3. Optimization: Ideas for parallelization
- I/O: parallel parsing of job and node data, parallel output
- Job sorting: job priorities are calculated independently → parallel sorting
- Scheduling algorithms: parallel search for suitable jobs at each time step
- Resource manager: parallel search for suitable compute nodes, torus search