A number of tasks execute serially or in parallel. Distribute the tasks over the processors so that minimal execution time is achieved: the optimal distribution.



Lecture: Load Balancing

Scheduling a MIMD parallel program: a number of tasks executing serially or in parallel.

The Scheduling Problem
- NP-complete problem (in general).
- Distribute the tasks on the processors so that minimal execution time is achieved.
- Optimal distribution: a processor allocation + an execution order such that the execution time is minimized.

Scheduling system: (consumer, policy, resource). (The slide shows the consumer submitting work to the scheduler, which applies the policy to the resource.)

Load balancing: imperfect balance vs. perfect balance. For the observer it is the longest execution time that matters!

Scheduling Principles
- Local scheduling: timesharing between processes on one processor.
- Global scheduling: allocate work to the processors in a parallel system.
  - Static allocation (before execution, at compile time).
  - Dynamic allocation (during execution).
(The slide shows a taxonomy tree: a scheduler is static or dynamic; either can be optimal or sub-optimal, and a sub-optimal scheduler is heuristic or approximate; a dynamic scheduler is distributed or non-distributed, and a distributed one is cooperative or non-cooperative.)

Static Load Balancing
- Scheduling decisions are made before execution; the task graph is known before execution.
- Each job is allocated to one processor statically.
- Optimal scheduling (impossible?).
- Sub-optimal scheduling:
  - Heuristics (use knowledge acquired through experience). Example: put tasks that communicate a lot on the same processor.
  - Approximate: a limited machine/program model, suboptimal.
- Drawback: cannot handle non-determinism in programs, so it should not be used when we do not know exactly what will happen at run time.

Dynamic Load Balancing
- Scheduling decisions are made during program execution.
- Distributed: decisions are made by local, distributed schedulers.
  - Cooperative: the local schedulers cooperate (global scheduling).
  - Non-cooperative: the local schedulers do not cooperate and affect only local performance.
- Non-distributed: decisions are made by one processor (the master).
- Disadvantages: it is hard to find optimal schedules, and there is overhead since scheduling is done during execution (e.g. a DFS search).
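To make the static case concrete, here is a minimal sketch in C of a greedy sub-optimal heuristic (the LPT rule): sort independent tasks by decreasing execution time and always give the next task to the least-loaded processor. The task times, the two-processor setup, and all identifiers are assumptions for illustration, not part of the lecture.

    #include <stdio.h>
    #include <stdlib.h>

    /* Greedy LPT: sort tasks by decreasing execution time, then always
       assign the next task to the currently least-loaded processor. */
    static int cmp_desc(const void *a, const void *b) {
        double d = *(const double *)b - *(const double *)a;
        return (d > 0) - (d < 0);
    }

    int main(void) {
        double task[] = {7, 5, 4, 3, 2, 2};          /* assumed task times */
        int n = sizeof task / sizeof task[0];
        enum { m = 2 };                              /* assumed processor count */
        double load[m] = {0};

        qsort(task, n, sizeof task[0], cmp_desc);
        for (int i = 0; i < n; i++) {
            int p = 0;                               /* least-loaded processor */
            for (int j = 1; j < m; j++)
                if (load[j] < load[p]) p = j;
            load[p] += task[i];
            printf("task of length %.0f -> P%d\n", task[i], p);
        }
        /* the observer sees the longest execution time (the makespan) */
        printf("makespan: %.0f\n", load[0] > load[1] ? load[0] : load[1]);
        return 0;
    }

For these task times the greedy choice happens to reach the lower bound ceil(23/2) = 12 and is therefore optimal, though in general LPT is only a heuristic.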

Other Kinds of Scheduling
- Single-application / multiple-application system:
  - Only one application at a time: minimize the execution time of that application.
  - Several parallel applications (compare to batch queues): minimize the average execution time over all applications.
- Adaptive / non-adaptive scheduling:
  - Adaptive: changes behavior depending on feedback from the system.
  - Non-adaptive: is not affected by feedback.
- Preemptive / non-preemptive scheduling:
  - Preemptive: allows a process to be interrupted if it is allowed to resume later on.
  - Non-preemptive: does not allow a process to be interrupted.

Graph Theory Approach
Static scheduling (for programs without loops and jumps):
- DAG (directed acyclic graph) = task graph.
- Start node (no parents), exit node (no children).

Machine model:
- Processors P = {P_1, ..., P_m}.
- Edge matrix (m x m): the communication cost between P_i and P_j.
- Processor performance S_i [instructions per second].

Parallel program model:
- Tasks T = {T_1, ..., T_n}; the execution order is given by the arrows of the task graph.
- Communication matrix (n x n): the number of elements D_{i,j} sent from T_i to T_j.
- Number of instructions A_i >= 0 for each task T_i.

Construction of Schedules
A schedule is a mapping that allocates one or more disjoint time intervals to each task so that
- exactly one processor gets each interval,
- the sum of the intervals equals the execution time of the task,
- different intervals on the same processor do not overlap,
- the order between tasks is maintained, and
- some processor is always allocated a job.

Optimal Scheduling Algorithms
The scheduling problem is NP-complete in the general case. HLF (Highest Level First), CP (Critical Path), and LP (Longest Path) give optimal schedules in most cases. Exceptions that can be scheduled optimally:
- A tree-structured task graph, with the simplification that all tasks have the same execution time and all processors have the same performance.
- An arbitrary task graph on two processors, with the simplification that all tasks have the same execution time.

List scheduling: build a priority list of the nodes and allocate the nodes one by one to the processors: choose the node with the highest priority and allocate it to the first available processor; repeat until the list is empty. Algorithms differ in how the priority is computed.

List Scheduling
- Each task is assigned a priority and is placed in a list sorted by priority.
- When a processor is free, allocate the ready task with the highest priority.
- If two tasks have the same priority, take one at random.
- Different choices of priority give different kinds of scheduling; the level gives a priority order closest to optimal (HLF).
(The slide shows a table listing, for each task, its number of predecessors, "the number of reasons I'm not ready", and its level.)

Scheduling of a Tree-Structured Task Graph
- Level = the maximum number of nodes on a path from x to a terminal node.
- Optimal algorithm (HLF): determine the level of each node (this is its priority); when a processor is available, schedule the ready task with the highest priority. A sketch of this is given below.
- HLF can fail: you can always construct an example where it fails; this is true for most algorithms.
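Under the slide's simplifications (unit-time tasks, identical processors), HLF can be sketched as follows in C; the example DAG and all identifiers are made up for illustration. Levels are computed recursively, and at every unit time step each free processor takes the ready task with the highest level.

    #include <stdio.h>

    #define N 5   /* number of tasks */
    #define M 2   /* number of processors */

    /* adj[i][j] = 1 if task i must finish before task j (a made-up DAG) */
    static int adj[N][N] = {
        {0,0,1,1,0},
        {0,0,0,1,0},
        {0,0,0,0,1},
        {0,0,0,0,1},
        {0,0,0,0,0},
    };

    /* level = maximum number of nodes from i to a terminal node */
    static int level(int i) {
        int best = 1;
        for (int j = 0; j < N; j++)
            if (adj[i][j]) {
                int l = 1 + level(j);
                if (l > best) best = l;
            }
        return best;
    }

    /* a task is ready when all its predecessors are done */
    static int ready(int j, const int done[]) {
        if (done[j]) return 0;
        for (int i = 0; i < N; i++)
            if (adj[i][j] && !done[i]) return 0;
        return 1;
    }

    int main(void) {
        int done[N] = {0}, ndone = 0;
        for (int t = 0; ndone < N; t++) {     /* one unit-time step per round */
            int chosen[M], nc = 0;
            for (int p = 0; p < M; p++) {     /* each free processor picks the */
                int best = -1;                /* ready task with highest level */
                for (int j = 0; j < N; j++) {
                    int taken = 0;
                    for (int k = 0; k < nc; k++) if (chosen[k] == j) taken = 1;
                    if (!taken && ready(j, done) &&
                        (best < 0 || level(j) > level(best))) best = j;
                }
                if (best >= 0) {
                    chosen[nc++] = best;
                    printf("t=%d: T%d -> P%d\n", t, best, p);
                }
            }
            for (int k = 0; k < nc; k++) { done[chosen[k]] = 1; ndone++; }
        }
        return 0;
    }

On this DAG the schedule finishes in three time steps, matching the longest path.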

Scheduling Heuristics
The complexity increases if the model allows
- tasks with different execution times,
- different speeds of the communication links,
- communication conflicts,
- loops and jumps, or
- limited networks.
Find suboptimal solutions: find, with the help of a heuristic, solutions that most of the time are close to optimal.

Parallelism vs Communication Delay
Scheduling must be based on both the communication delay and the time when a processor is ready to work. There is a trade-off between maximizing the parallelism and minimizing the communication (a max-min problem).

Example, trade-off parallelism vs communication: (the slide shows two processors and a small task graph with task times T and communication delays D; the recoverable rule is that if the communication delay to the other processor is smaller than the task time it overlaps, assign the task to the other processor, and if every delay exceeds the task time, keep the task on the same processor; see the sketch after this slide's notes).

The Granularity Problem
Find the best clustering of tasks in the task graph (minimize the execution time).
- Coarse grain: less parallelism.
- Fine grain: more parallelism, more scheduling time, more communication conflicts.

Redundant Computing
Sometimes you can eliminate communication delays by duplicating work, computing the same task on several processors.

Dynamic Load Balancing
- Local scheduling. Example: threads, processes, I/O.
- Global scheduling. Example: some simulations.
- Pool of tasks / distributed pool of tasks, receiver-initiated or sender-initiated.
- Queue line structure.
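The trade-off rule from the example above can be captured in a few lines of C. This is a sketch under assumed names (a single task of time T, a communication delay D, and per-processor ready times, none of which come from the lecture): run the task remotely only when the remote finish time, including D, beats the local finish time.

    #include <stdio.h>

    /* ready_local / ready_remote: when each processor can start the task */
    typedef struct { double ready_local, ready_remote; } State;

    /* assign the task remotely only if paying the delay D still wins */
    int assign_remote(State s, double T, double D) {
        double local_finish  = s.ready_local  + T;      /* no communication */
        double remote_finish = s.ready_remote + D + T;  /* pay the delay D  */
        return remote_finish < local_finish;
    }

    int main(void) {
        State s = { .ready_local = 4.0, .ready_remote = 0.0 };
        /* with T = 3 and D = 1: remote finishes at 4, local at 7 -> remote */
        printf("%s\n", assign_remote(s, 3.0, 1.0) ? "remote" : "local");
        return 0;
    }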

Distributed Pool of Tasks
(The slide shows a centralized and a decentralized pool of tasks.)
How do you choose the processor to communicate with? Centralized or distributed?

Work Transfer: Receiver-Initiated (Pull)
- The receiver takes the initiative: one process asks another process for work.
- The process asks when it is out of work, or has too little to do.
- Works well even when the system load is high.
- Can be expensive to approximate the system load.

Work Transfer: Sender-Initiated (Push)
- The sender takes the initiative: one process sends work to another process.
- The process asks (or just sends) when it has too many tasks, or a high load.
- Works well when the system load is low.
- It is hard to know when to send.

Examples of how to choose a process:
- By load (hard).
- Round robin: you must make sure that the processes do not get in phase, i.e. that they do not all ask the same process at once.
- Randomly (random polling): a good random generator is necessary. (A sketch of random polling follows below.)

Queue Line Structure
Have two processes per node:
- one worker process that computes and asks the queue for work;
- another process that asks (to the left) for new tasks if the queue is nearly empty, receives new tasks from the left neighbor, and receives requests from the right neighbor and from the worker process and answers these requests.

Tree-Based Queue
Each process sends to one of two processes; a generalization of the previous technique.
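A minimal sketch of victim selection for random polling, assuming an idle process picks its victim uniformly among the other processes (all identifiers are hypothetical): the draw skips the process itself without re-drawing, which keeps the choice uniform.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Receiver-initiated random polling: an idle process picks a random
       victim other than itself and would then send it a work request. */
    int pick_victim(int self, int nprocs) {
        int v = rand() % (nprocs - 1);   /* uniform over the other processes */
        return v >= self ? v + 1 : v;    /* skip self without re-drawing */
    }

    int main(void) {
        srand((unsigned)time(NULL));     /* a good generator matters here */
        for (int i = 0; i < 5; i++)
            printf("process 3 polls process %d\n", pick_victim(3, 8));
        return 0;
    }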

Example: Shortest Path
Given a set of linked nodes where the edges between the nodes are marked with weights, find the path from one specific node to another that has the least accumulated weight. How do you represent the graph?

The update rule is d_j = min(d_j, d_i + w_{i,j}).

Moore's Algorithm
- Keep a queue containing the vertices not yet computed on; begin with the start vertex.
- Keep a list of shortest distances; begin with zero for the start vertex and infinity for the others.
- For the vertex at the front of the queue, update the list according to the expression above. If a distance is updated, add that vertex to the queue again.

Sequential code, using an adjacency matrix:

    while ((i = next_vertex()) != no_vertex)  /* while there is a vertex */
        for (j = 0; j < n; j++)               /* get next edge */
            if (w[i][j] != infinity) {        /* if there is an edge */
                newdist_j = dist[i] + w[i][j];
                if (newdist_j < dist[j]) {
                    dist[j] = newdist_j;
                    append_queue(j);          /* vertex to queue if not there */
                }
            }
    /* no more vertices to consider */

Parallel Implementation I
Dynamic load balancing with a centralized work pool:
- each computational node takes vertices from the queue and returns new vertices;
- the distances are stored as a list that is copied out to the nodes.

Code, Parallel Implementation I (master):

    while (vertex_queue() != empty) {
        recv(PANY, source = Pi);               /* request for a vertex from Pi */
        v = get_vertex_queue();
        send(&v, Pi);                          /* send the vertex */
        send(&dist, &n, Pi);                   /* and the current distance list */
        ...
        recv(&j, &dist[j], PANY, source = Pi); /* receive an updated distance */
        append_queue(j, dist[j]);              /* vertex to queue if not there */
    };
    recv(PANY, source = Pi);                   /* final request */
    send(Pi, termination_tag);                 /* tell the worker to stop */

(worker):

    while (true) {
        send(Pmaster);                         /* ask the master for work */
        recv(&v, Pmaster, tag);                /* get a vertex number */
        if (tag != termination_tag) {
            recv(&dist, &n, Pmaster);          /* and the distance list */
            for (j = 0; j < n; j++)            /* get next edge */
                if (w[v][j] != infinity) {
                    newdist_j = dist[v] + w[v][j];
                    if (newdist_j < dist[j]) {
                        dist[j] = newdist_j;
                        send(&j, &dist[j], Pmaster);  /* report the update */
                    }
                }
        } else break;
    }
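The sequential pseudocode above relies on helpers such as next_vertex() and append_queue(); below is a self-contained, runnable C version of the same Moore's algorithm with an explicit FIFO queue. The five-vertex adjacency matrix is a made-up example (INF marks a missing edge).

    #include <stdio.h>

    #define N   5
    #define INF 1000000

    /* made-up weighted digraph for illustration */
    int w[N][N] = {
        {INF, 2,   5,   INF, INF},
        {INF, INF, 1,   4,   INF},
        {INF, INF, INF, INF, 3},
        {INF, INF, INF, INF, 1},
        {INF, INF, INF, INF, INF},
    };

    int queue[N * N], head = 0, tail = 0, in_queue[N];

    void append_queue(int v) {            /* add v only if not already queued */
        if (!in_queue[v]) { queue[tail++] = v; in_queue[v] = 1; }
    }

    int main(void) {
        int dist[N];
        for (int i = 0; i < N; i++) dist[i] = INF;
        dist[0] = 0;                      /* start vertex */
        append_queue(0);

        while (head < tail) {             /* while there is a vertex */
            int i = queue[head++]; in_queue[i] = 0;
            for (int j = 0; j < N; j++)   /* relax every outgoing edge */
                if (w[i][j] != INF) {
                    int newdist_j = dist[i] + w[i][j];
                    if (newdist_j < dist[j]) {
                        dist[j] = newdist_j;
                        append_queue(j);
                    }
                }
        }
        for (int j = 0; j < N; j++) printf("dist[%d] = %d\n", j, dist[j]);
        return 0;
    }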

Parallel Implementation II
Decentralized work pool: each vertex is a process. As soon as a vertex gets a new weight (the start node starts by itself), it sends the new distances to its neighbors.

Code, Parallel Implementation II (one vertex process):

    recv(&newdist, PANY);
    if (newdist < dist) {
        dist = newdist;
        /* start searching around the vertex */
        for (j = 0; j < n; j++)        /* get next edge */
            if (w[j] != infinity) {
                d = dist + w[j];
                send(&d, Pj);          /* send the new distance to process j */
            }
    }

You have to handle messages that are still in the air (MPI_Probe).

Shortest Path: Practical Considerations
You probably have to group the vertices, i.e. place several vertices per processor.
- Vertices close to each other on the same processor: little communication, little parallelism.
- Vertices far away from each other on the same processor (scatter): a lot of communication, much parallelism.
Group messages? Synchronize?

Terminating Algorithms
Ring algorithm:
- Let a process p_0 send a token around the ring when p_0 is out of work.
- When a process receives the token: if it is out of work, it passes the token on; if not, it waits until it is out of work and then passes the token on.
- When p_0 gets the token back, p_0 knows that everyone is out of work and can notify the others.
- This does not work if processes borrow work from each other.

Dijkstra's ring algorithm:
- Let process p_0 send a white token around the ring when p_0 is out of work.
- If a process p_i sends work to p_j with j < i, p_i is colored black.
- When a process receives the token: if the process is black, the token is colored black (and the process becomes white again); if the process is out of work, it passes the token on; if not, it waits until it is out of work and then passes the token on.
- If p_0 gets a white token back, p_0 knows that everyone is out of work and sends a terminating message (e.g. a red token).
- If p_0 gets a black token back, p_0 sends out a new white token.

Review questions
- Assume that five worker processes are to solve shortest path for the graph to the right with Parallel Implementation I. How many messages are sent, and which ones?
- Assume that five worker processes are to solve shortest path for the graph to the right with Parallel Implementation II. How many messages are sent, and which ones?
- Find an optimal schedule for the task graph to the right on two processors.
(The graphs referred to appear as figures on the slides.)
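As a sanity check of Dijkstra's ring algorithm, here is a small single-threaded C simulation (the process count and the "work sent backwards" event are made up): a black process taints the token and turns white again, so the first round returns a black token and the second returns a white one, triggering termination.

    #include <stdio.h>

    #define P 4
    enum color { WHITE, BLACK };

    int main(void) {
        enum color proc[P] = {WHITE, WHITE, WHITE, WHITE};

        /* made-up event: p2 sends work to p1 (j < i), so p2 turns black */
        proc[2] = BLACK;

        for (int round = 1; ; round++) {
            enum color token = WHITE;        /* p0 emits a white token */
            for (int i = 1; i < P; i++) {    /* token travels p1..p{P-1} */
                if (proc[i] == BLACK) {
                    token = BLACK;           /* black process taints token */
                    proc[i] = WHITE;         /* ...and becomes white again */
                }
                /* (each process passes the token on once it is out of work) */
            }
            printf("round %d: p0 received a %s token\n",
                   round, token == WHITE ? "white" : "black");
            if (token == WHITE) {            /* everyone idle: terminate */
                printf("p0 sends the terminating (red) token\n");
                break;
            }
        }
        return 0;
    }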