Dynamic Load Balancing in Charm++. Abhinav S. Bhatele, Parallel Programming Lab, UIUC
Outline: Dynamic Load Balancing framework in Charm++; Measurement Based Load Balancing; Examples: Hybrid Load Balancers, Topology-aware Load Balancers; User Control and Flexibility; Future Work
Dynamic Load-Balancing. The task of load balancing (LB): given a collection of migratable objects and a set of processors, find a mapping of objects to processors that places almost the same amount of computation on each processor. Additional constraints: keep communication between processors to a minimum, and take the topology of the machine into consideration. The mapping of chares to processors is dynamic because the load on processors keeps changing during the actual execution.
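The balancing goal on this slide can be summarized as a small min-max problem. The sketch below is only one common formalization, not a formula from the talk; w(v) denotes the load of object v and P the mapping, anticipating the notation of the mapping model introduced later.

```latex
% One common formalization of the load-balancing objective (illustrative):
% choose the mapping P of objects to processors that minimizes the load of
% the most loaded processor, while keeping inter-processor communication low.
\[
  \min_{P}\; \max_{p}\; \sum_{v \,:\, P(v) = p} w(v)
\]
```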
Load-Balancing Approaches. Charm++ provides a rich set of strategies, built around two main ideas. When there is no correlation between successive iterations (fully dynamic load): seed load balancers. When the load varies only slightly over iterations (e.g., CSE and molecular dynamics simulations): measurement-based load balancers.
Principle of Persistence. Object communication patterns and computational loads tend to persist over time, in spite of dynamic behavior: abrupt and large but infrequent changes (e.g., AMR), and slow and small changes (e.g., particle migration). It is a parallel analog of the principle of locality: heuristics that hold for most CSE applications.
Measurement Based Load Balancing. Based on the principle of persistence. Runtime instrumentation (the LB database) records communication volume and computation time. Measurement-based load balancers use the database periodically to make new decisions. Many alternative strategies can use the database: centralized vs. distributed, greedy improvements vs. complete reassignment, topology-aware.
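Charm++ keeps this information in its runtime LB database; the struct below is only a hypothetical illustration of the kind of per-object and per-edge records such instrumentation collects between load-balancing steps. All names here are invented and do not reflect the actual database layout.

```cpp
// Hypothetical sketch of the records a measurement-based load balancer keeps
// between balancing steps (all names invented; the real Charm++ LB database
// is organized differently).
#include <cstdint>
#include <vector>

struct ObjectRecord {
  int    objId;        // which migratable object
  int    currentPe;    // processor it currently lives on
  double wallTime;     // measured computation time since the last LB step
  bool   migratable;   // some objects may be pinned to a processor
};

struct CommRecord {
  int      fromObj, toObj;  // a communicating pair of objects
  uint64_t bytes;           // communication volume since the last LB step
  uint64_t messages;        // number of messages exchanged
};

struct LBSnapshot {          // what a strategy sees when it runs
  std::vector<ObjectRecord> objects;
  std::vector<CommRecord>   edges;
  int                       numPes;
};
```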
Load Balancer Strategies. Centralized: object load data are sent to processor 0 and integrated into a complete object graph; migration decisions are broadcast from processor 0; requires a global barrier. Distributed: load balancing among neighboring processors; each processor builds a partial object graph and sends its migration decisions to its neighbors; no global barrier.
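As a concrete illustration of a centralized strategy, here is a minimal greedy sketch in plain C++ (not the actual Charm++ GreedyLB code): the heaviest remaining object is repeatedly assigned to the currently least-loaded processor.

```cpp
// Minimal sketch of a centralized greedy strategy (illustrative): sort objects
// by measured load, then repeatedly assign the heaviest object to the
// least-loaded processor, tracked with a min-heap.
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

std::vector<int> greedyAssign(const std::vector<double>& objLoad, int numPes) {
  // Order object indices from heaviest to lightest.
  std::vector<int> order(objLoad.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return objLoad[a] > objLoad[b]; });

  // Min-heap of (current load, processor id).
  using Entry = std::pair<double, int>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pes;
  for (int p = 0; p < numPes; ++p) pes.push({0.0, p});

  std::vector<int> assignment(objLoad.size());
  for (int obj : order) {
    auto [load, p] = pes.top();          // least-loaded processor so far
    pes.pop();
    assignment[obj] = p;
    pes.push({load + objLoad[obj], p});  // update its load
  }
  return assignment;                      // assignment[obj] = target processor
}
```

Communication-aware variants (e.g., GreedyCommLB, which appears in the results later) additionally consult the object communication graph when choosing a destination.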
Load Balancing on Large Machines. Existing load balancing strategies don't scale on extremely large machines. Limitations of centralized strategies: the central node becomes a memory/communication bottleneck, and the decision-making algorithms tend to be very slow. Limitation of distributed strategies: it is difficult to achieve well-informed load balancing decisions.
Simulation Study - Memory Overhead. Simulation performed with the BigSim performance simulator. [Plot: memory usage (MB) vs. number of objects, for 32K and 64K processors.] The lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh.
Load Balancing Execution Time. Execution time of load balancing algorithms on a 64K-processor simulation.
Hierarchical Load Balancers. Hierarchical distributed load balancers: divide the processors into groups, apply different strategies at each level, and scale to a large number of processors.
Hierarchical Tree (an example). A 64K-processor hierarchical tree: Level 0 holds 64 groups of 1024 processors each (0-1023, 1024-2047, ..., 63488-64511, 64512-65535); Level 1 holds the 64 group leaders (0, 1024, ..., 63488, 64512); Level 2 holds the single root. Apply different strategies at each level.
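The arithmetic behind this example tree is simple; the sketch below assumes fixed level-0 groups of 1024 processors led by their lowest-ranked member, matching the numbers on the slide (the real hierarchical LB code in Charm++ is more general).

```cpp
// Illustrative arithmetic for the 64K-processor example tree: 64 level-0
// groups of 1024, each led by its lowest-ranked member, with one root above.
#include <cstdio>

constexpr int kNumPes    = 65536;
constexpr int kGroupSize = 1024;

int groupIndex(int pe)  { return pe / kGroupSize; }             // 0 .. 63
int groupLeader(int pe) { return groupIndex(pe) * kGroupSize; } // 0, 1024, ..., 64512

int main() {
  std::printf("number of level-0 groups: %d\n", kNumPes / kGroupSize);
  const int samples[] = {0, 1023, 1024, 2047, 63488, 64511, 64512, 65535};
  for (int pe : samples)
    std::printf("pe %5d is in group %2d, leader %5d\n",
                pe, groupIndex(pe), groupLeader(pe));
  return 0;
}
```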
An Example: Hybrid LB. Processors are divided into independent groups, and the groups are organized in a hierarchy (decentralized). Each group has a leader (its central node) that performs centralized load balancing. A particular hybrid strategy that works well (Gengbin Zheng, PhD Thesis, 2005).
Our HybridLB Scheme. [Diagram: load data (object communication graphs, OCG) flow up the tree from the 1024-processor groups (0-1023, ..., 64512-65535) to their group leaders (0, 1024, ..., 64512) and on to the root; refinement-based load balancing is applied at the upper level and greedy-based load balancing within each group; token objects stand in for migrating objects.]
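The refinement step used at the upper level can be illustrated with a small sketch in plain C++ (not Charm++'s actual RefineLB/HybridLB code): starting from the existing assignment, objects are moved off processors whose load exceeds a tolerance above the average, onto the currently least-loaded processor.

```cpp
// Sketch of refinement-based load balancing (illustrative): keep the existing
// assignment, but shift objects away from processors whose load exceeds a
// tolerance above the average load.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

void refine(const std::vector<double>& objLoad,
            std::vector<int>& assignment,      // in/out: assignment[obj] = pe
            int numPes, double tolerance = 1.05) {
  std::vector<double> peLoad(numPes, 0.0);
  for (std::size_t i = 0; i < objLoad.size(); ++i)
    peLoad[assignment[i]] += objLoad[i];
  const double target =
      tolerance * std::accumulate(peLoad.begin(), peLoad.end(), 0.0) / numPes;

  for (int pe = 0; pe < numPes; ++pe) {
    while (peLoad[pe] > target) {
      // Pick the heaviest object still on this processor (simple heuristic).
      int heaviest = -1;
      for (std::size_t i = 0; i < objLoad.size(); ++i)
        if (assignment[i] == pe &&
            (heaviest < 0 || objLoad[i] > objLoad[heaviest]))
          heaviest = static_cast<int>(i);
      if (heaviest < 0) break;

      // Move it to the least-loaded processor, but only if that actually helps.
      int dest = static_cast<int>(
          std::min_element(peLoad.begin(), peLoad.end()) - peLoad.begin());
      if (dest == pe || peLoad[dest] + objLoad[heaviest] >= peLoad[pe]) break;
      peLoad[pe]           -= objLoad[heaviest];
      peLoad[dest]         += objLoad[heaviest];
      assignment[heaviest]  = dest;
    }
  }
}
```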
Memory Overhead. [Plot: memory usage (MB) vs. number of objects, for CentralLB and HybridLB.] Simulation of lb_test for 64K processors.
Total Load Balancing Time. Simulation of lb_test for 64K processors. [Plot: total load balancing time (s) vs. number of objects, for GreedyCommLB and HybridLB(GreedyCommLB).] Memory usage from the lb_test benchmark's actual run on BG/L at IBM (512K objects):

N procs   Memory
4096      6.8 MB
8192      22.57 MB
16384     22.63 MB
Load Balancing Quality. [Plot: maximum predicted load (seconds) vs. number of objects, for GreedyCommLB and HybridLB.] Simulation of lb_test for 64K processors.
Topology-aware Mapping of Tasks. Problem: map tasks to processors connected in a topology such that the compute load on processors is balanced and communicating chares (objects) are placed on nearby processors.
Mapping Model. Task graph: G_t = (V_t, E_t), a weighted graph with undirected edges; nodes are chares, with w(v_a) the computation of v_a; edges are communication, with c_ab the number of bytes exchanged between v_a and v_b. Topology graph: G_p = (V_p, E_p); nodes are processors; edges are direct network links. Examples: 3D torus, 2D mesh, hypercube.
Model (Contd.). Task mapping: assigns tasks to processors, P : V_t → V_p. Hop-bytes: a communication cost metric; the cost imposed on the network is higher when more links are used, so inter-processor communication is weighted by its distance on the network.
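With the notation from the mapping model, this cost is usually written as each edge's byte count weighted by the hop distance between the processors its endpoints are mapped to; the formula below follows that standard definition (d denotes the hop distance in G_p).

```latex
% Hop-bytes of a mapping P, using the task-graph notation above: every edge's
% byte count c_ab is weighted by the number of network hops between the
% processors its two endpoints are placed on.
\[
  \mathrm{HB}(P) \;=\; \sum_{(v_a, v_b)\,\in\, E_t} c_{ab}\; d\bigl(P(v_a),\, P(v_b)\bigr)
\]
```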
Load Balancing Framework in Charm++. The issues of mapping and decomposition are separated, and the user has full control over mapping. Many choices: initial static mapping, or mapping at run time as new objects are created. To write a new load balancing strategy, inherit from BaseLB (see the sketch below).
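A minimal sketch of such a strategy is given below, assuming the CentralLB/BaseLB interface described in the Charm++ manual. Member names such as n_objs, objData, to_proc, and lbname follow that interface but may differ between Charm++ versions; the .ci module file and registration boilerplate are omitted, and chooseProcessorFor() is a hypothetical helper.

```cpp
// Sketch of a user-defined centralized strategy (assumes the CentralLB/BaseLB
// interface; member names may differ between Charm++ versions).
#include "CentralLB.h"
#include "MyGreedyLB.decl.h"   // generated from the (omitted) .ci file

class MyGreedyLB : public CBase_MyGreedyLB {
 public:
  MyGreedyLB(const CkLBOptions& opt) : CBase_MyGreedyLB(opt) {
    lbname = "MyGreedyLB";
  }

  // Called by the runtime with the measured LB database for all objects.
  void work(LDStats* stats) override {
    for (int obj = 0; obj < stats->n_objs; ++obj) {
      if (!stats->objData[obj].migratable) continue;
      // Decide a destination, e.g. with the greedy or refinement heuristics
      // sketched earlier, then record it (the default keeps the old mapping).
      stats->to_proc[obj] = chooseProcessorFor(obj, stats);
    }
  }

 private:
  int chooseProcessorFor(int obj, LDStats* stats);  // strategy-specific choice
};

#include "MyGreedyLB.def.h"    // generated definitions, included last by convention
```

At job launch the new strategy would then be selected with the usual +balancer runtime option (e.g., +balancer MyGreedyLB).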
Future Work. Hybrid model-based load balancers: the user gives a model to the LB, which is combined with a measurement-based load balancer. Multicast-aware load balancers: try to place the targets of a multicast on the same processor.
Conclusions. Measurement-based LBs are good for most cases. Scalable LBs will be needed in the future because of large machines like BG/L: hybrid load balancers, communication-sensitive LBs, topology-aware LBs.