Heuristic Static Load-Balancing Algorithm Applied to CESM

Yuri Alexeev (1), Sheri Mickelson (1), Sven Leyffer (1), Robert Jacob (1), Anthony Craig (2)
(1) Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
(2) CCSM Software Engineering Group, NCAR, Boulder, CO 80305, USA

Argonne National Laboratory supercomputers

Intrepid (IBM Blue Gene/P)
- 40,960 nodes / 163,840 cores
- 557 Teraflops peak
- PowerPC 450 with 4 cores/node at 850 MHz
- Double FPU, 2-wide double-precision SIMD
- 512 MB per core

Mira (IBM Blue Gene/Q)
- 49,152 nodes / 786,432 cores
- 10 Petaflops peak
- PowerPC A2 with 16 cores/node at 1.6 GHz
- Quad FPU, 4-wide double-precision SIMD
- 1 GB per core

CESM setup
- CESM fully coupled active components, 1 degree resolution: f09_g16.b
- Calculations were run on Intrepid (40 racks of Blue Gene/P)
- Goal: minimize total execution time
[Figure: component layout, with time on one axis and node allocation on the other]

Heuristic Static Load-Balancing (HSLB) Algorithm
(1) Gather data: Run the CESM calculation D times, each time with a different total number of cores. Collect the running times $y_{ij}$ for each component i and run j.
(2) Fit: Solve a least squares problem for each component to determine the coefficients $a_i$, $b_i$, $c_i$, and $d_i$ of its performance model.
(3) Solve: Determine the best allocation by solving the MINLP, and obtain the optimal size $n_i$ for each component.
(4) Execute: Run the CESM simulation using the subgroup sizes determined in step (3).

Gather data for step (1)
Calculations were run on 512, 1024, 2048, 4096, and 8192 cores.

Performance model for step (2)

$$T_i(n_i) = T_i^{\mathrm{scal}}(n_i) + T_i^{\mathrm{nonlin}}(n_i) + T_i^{\mathrm{serial}} = \frac{a_i}{n_i} + b_i n_i^{c_i} + d_i, \qquad i = 1, \ldots, C$$

- $T_i(n_i)$: the wall-clock time to compute the i-th component as a function of $n_i$, the number of cores allocated to process it
- $T_i^{\mathrm{scal}}(n_i) = a_i / n_i$: time spent in the perfectly scalable portion of the component
- $T_i^{\mathrm{serial}} = d_i$: time spent in the non-parallelized portion of the component
- $T_i^{\mathrm{nonlin}}(n_i) = b_i n_i^{c_i}$: time spent in the partially parallelized portion: initialization, communication, synchronization, etc. (anything nonlinear and not serial)

The model makes sense both mathematically and from the viewpoint of Amdahl's law.
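As a minimal sketch, the three terms of the model can be evaluated separately for one component. The coefficient values below are invented for illustration and are not fitted CESM data.

```python
def component_time(n, a, b, c, d):
    """Modelled wall-clock time T(n) = a/n + b*n**c + d for one component."""
    scalable = a / n        # perfectly scalable portion
    nonlinear = b * n ** c  # partially parallelized portion
    serial = d              # non-parallelized portion
    return scalable + nonlinear + serial


# Hypothetical coefficients, chosen only to show the shape of the curve.
coeffs = dict(a=40000.0, b=0.5, c=0.5, d=10.0)
for n in (64, 256, 1024, 4096):
    print(n, round(component_time(n, **coeffs), 3))
```

With coefficients like these, the a/n term dominates at small n, while the serial and nonlinear terms set a floor at large n, which is the behaviour Amdahl's law describes.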

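Fitting data for step (2)
Obtain the best fit by solving the least squares problem

$$\min_{a_i, b_i, c_i, d_i} \; \sum_{j=1}^{D} \left( y_{ij} - \frac{a_i}{n_{ij}} - b_i n_{ij}^{c_i} - d_i \right)^2 \quad \text{subject to } a_i, b_i, c_i, d_i \in \mathbb{R}_+$$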
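One way to approximate this fit without a nonlinear solver is sketched below. It is a simplification, not the method used in the talk: the exponent c is restricted to a user-supplied grid, and for each candidate c the remaining coefficients a, b, d enter linearly, so they are found from the normal equations. Candidates with a negative coefficient are discarded rather than handled by a constrained solver. The function names and the synthetic data in the usage note are assumptions.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None  # singular or nearly singular system
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def fit_component(cores, times, exponents):
    """Return (sse, a, b, c, d) with the smallest squared error over the grid of c."""
    best = None
    for c in exponents:
        # Basis functions of the model for a fixed exponent c.
        columns = [[1.0 / n for n in cores], [n ** c for n in cores], [1.0] * len(cores)]
        # Scale columns to unit length to keep the normal equations well conditioned.
        norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
        scaled = [[v / s for v in col] for col, s in zip(columns, norms)]
        gram = [[sum(p * q for p, q in zip(u, w)) for w in scaled] for u in scaled]
        rhs = [sum(p * y for p, y in zip(u, times)) for u in scaled]
        sol = solve_linear(gram, rhs)
        if sol is None:
            continue
        a, b, d = (x / s for x, s in zip(sol, norms))
        if min(a, b, d) < 0.0:
            continue  # violates the non-negativity constraints
        sse = sum((y - (a / n + b * n ** c + d)) ** 2 for n, y in zip(cores, times))
        if best is None or sse < best[0]:
            best = (sse, a, b, c, d)
    return best
```

For example, calling `fit_component([512, 1024, 2048, 4096, 8192], times, [k / 10 for k in range(1, 11)])` on timings generated from the model itself should recover the generating coefficients when the true exponent lies on the grid. The grid should exclude c = 0, where the nonlinear term becomes indistinguishable from the serial term.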

Formulating the Optimization Problem

Problem: optimize the number of nodes, $n_i$, to be allocated to each component $i \in \{1, \ldots, C\}$. Candidate objectives:
- minimize the total wall time over all components: $\min_n \sum_{i=1}^{C} T_i(n_i)$
- minimize the maximum wall time used by a component: $\min_n \max_i T_i(n_i)$
- maximize the minimum wall time used by a component: $\max_n \min_i T_i(n_i)$
[Figure: wall time versus number of nodes]

Formulating the mathematical problem for step (3)

Given:
- $\mathbb{Z}_+$: set of positive integer numbers
- $\mathbb{R}_+$: set of positive real numbers
- $C = \{\mathrm{ice}, \mathrm{lnd}, \mathrm{atm}, \mathrm{ocn}\} = \{i, l, a, o\}$: set of components
- $N \in \mathbb{Z}_+$: total number of nodes available for allocation
- $O = \{2, 4, \ldots, 480, 768\} = \{O_1, \ldots, O_m\}$: possible allocations for ocn
- $A = \{1, 2, \ldots, 1638, 1664\} = \{A_1, \ldots, A_m\}$: possible allocations for atm

Variables:
- $T \in \mathbb{R}_+$: wall-clock time obtained by solving the allocation problem
- $T_{icelnd} \in \mathbb{R}_+$: wall-clock time to balance lnd and ice
- $T_{sync} \in \mathbb{R}_+$: synchronization tolerance to balance lnd and ice
- $n_j \in \mathbb{Z}_+$: number of nodes allocated to component j
- $T_j(n_j) \in \mathbb{R}_+$: (fitted) performance function modeling the time taken to run on $n_j$ nodes
- $z_k \in \{0, 1\}$: binary variables to model the selection of the number of nodes, $n_o$

Minimize: $T$

Subject to (constraints for layout (1)):
- $T_{icelnd} \ge T_i(n_i)$
- $T_{icelnd} \ge T_l(n_l)$
- $T \ge T_{icelnd} + T_a(n_a)$
- $T \ge T_o(n_o)$
- $T_l(n_l) \ge T_i(n_i) - T_{sync}$
- $T_l(n_l) \le T_i(n_i) + T_{sync}$
- $n_a + n_o \le N$
- $n_i + n_l \le n_a$
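For a small number of nodes, the same allocation problem can be explored by plain enumeration, which may help in checking a MINLP formulation. The sketch below is not the AMPL model from the talk. It reads the constraints as "ice and lnd run side by side, atm follows them, and ocn runs concurrently with all three", treats the synchronization tolerance as a fixed input rather than a variable, and uses invented coefficients and allocation sets.

```python
def make_model(a, b, c, d):
    """Return T(n) = a/n + b*n**c + d for one component."""
    return lambda n: a / n + b * n ** c + d


def layout_time(t_ice, t_lnd, t_atm, t_ocn):
    """Objective T: max(ice, lnd) followed by atm, with ocn running alongside."""
    t_icelnd = max(t_ice, t_lnd)
    return max(t_icelnd + t_atm, t_ocn)


def best_allocation(models, total_nodes, ocn_choices, atm_choices, t_sync):
    """Enumerate feasible node counts; return (T, n_ice, n_lnd, n_atm, n_ocn) or None."""
    best = None
    for n_ocn in ocn_choices:
        t_ocn = models["ocn"](n_ocn)
        for n_atm in atm_choices:
            if n_atm + n_ocn > total_nodes:  # n_a + n_o <= N
                continue
            t_atm = models["atm"](n_atm)
            for n_ice in range(1, n_atm):
                t_ice = models["ice"](n_ice)
                for n_lnd in range(1, n_atm - n_ice + 1):  # n_i + n_l <= n_a
                    t_lnd = models["lnd"](n_lnd)
                    if abs(t_lnd - t_ice) > t_sync:  # synchronization window
                        continue
                    total = layout_time(t_ice, t_lnd, t_atm, t_ocn)
                    if best is None or total < best[0]:
                        best = (total, n_ice, n_lnd, n_atm, n_ocn)
    return best


# Hypothetical coefficients and allocation sets, for illustration only.
models = {
    "ice": make_model(800.0, 0.0, 1.0, 1.0),
    "lnd": make_model(400.0, 0.0, 1.0, 1.0),
    "atm": make_model(2000.0, 0.0, 1.0, 2.0),
    "ocn": make_model(1500.0, 0.0, 1.0, 3.0),
}
result = best_allocation(models, 32, [2, 4, 8, 16], range(2, 31), t_sync=1000.0)
print(result)
```

Enumeration grows quickly with the node count, which is why a branch-and-bound solver is the more practical choice at the scale of a real machine.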

Solving the MINLP problem
- The formulation is written in AMPL.
- Classical branch-and-bound [Dakin, 1965] as implemented in MINOTAUR: http://wiki.mcs.anl.gov/minotaur
- Solve the relaxed NLP (continuous relaxation); the solution value provides a lower bound.
- Branch on a non-integral variable y.
- Solve NLPs and branch until one of the following holds:
  - the node is infeasible
  - the node is integer feasible (this gives an upper bound)
  - the node's lower bound is no better than the current upper bound
- The tree search is exhaustive but does not require complete enumeration.
- The method is guaranteed to find the globally optimal solution or to show that none exists.
- Solution time is 10 seconds on a single core (155 components).

MINLP Tree
Synthesis MINLP B&B tree: 10000+ nodes after 360 s

Results
CESM fully coupled active components, 1 degree resolution: f09_g16.b
Calculations were run on Intrepid (40 racks of Blue Gene/P)

1 degree resolution, 128 nodes

                  Manual                 HSLB
Component         # nodes   Time, sec    Predicted # nodes   Predicted time, sec   Actual time, sec
lnd               24        63.766       15                  100.951               100.202
ice               80        109.054      89                  102.972               116.472
atm               104       306.952      104                 307.651               308.699
ocn               24        362.669      24                  365.649               365.853
Total time, sec             416.006                          410.623               425.171
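The "Total time" row appears to follow from the layout constraints: the larger of the ice and lnd times plus the atm time, compared against the ocn time. The short check below recomputes the totals from the per-component values in the table; the dictionary keys are labels chosen here, not names from the talk.

```python
# (ice, lnd, atm, ocn) times in seconds, copied from the table above.
runs = {
    "manual": (109.054, 63.766, 306.952, 362.669),
    "hslb_predicted": (102.972, 100.951, 307.651, 365.649),
    "hslb_actual": (116.472, 100.202, 308.699, 365.853),
}

totals = {}
for name, (ice, lnd, atm, ocn) in runs.items():
    # ice and lnd run side by side, atm follows, ocn runs concurrently.
    totals[name] = round(max(max(ice, lnd) + atm, ocn), 3)

print(totals)
```

In all three columns the ice/lnd plus atm path is longer than the ocn time, so it determines the total.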

Results
CESM fully coupled active components, 1/8 degree resolution: ne240_f02_t12.b

Prediction of Optimal Layout

Prediction of Efficiency

$$E(n) = \frac{T(64) / T(n)}{n / 64}$$
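The efficiency formula is the speedup relative to the 64-node run divided by the increase in node count. A small sketch, with a hypothetical function name and made-up timings in the usage note:

```python
def parallel_efficiency(t_base, t_n, n, n_base=64):
    """E(n) = (T(n_base) / T(n)) / (n / n_base)."""
    speedup = t_base / t_n
    return speedup / (n / n_base)
```

For instance, if a run took 100 s on 64 nodes and 60 s on 128 nodes, `parallel_efficiency(100.0, 60.0, 128)` gives a speedup of about 1.67 on twice the nodes, so an efficiency of about 0.83.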

Future work
- Convert the AMPL code to C++ to make it more portable
- Create scripts that will automate the load-balancing process
  - The first script will gather timing data for the scaling curve by creating and running 4-5 test layouts
  - The second script will analyze the timing files and produce a load-balanced layout based on how many cores the user would like to run on

Acknowledgments
Thank you to Dr. Ray Loy and ALCF team members (Argonne National Laboratory), Jim Edwards, and Mariana Vertenstein for encouraging this work and for helpful discussions.
Funding was provided by the U.S. Department of Energy.