Load balancing; Termination detection
Parallel and Distributed Computing
Department of Computer Science and Engineering (DEI)
Instituto Superior Técnico
November 14, 2013
CPD (DEI / IST) Parallel and Distributed Computing 17 2014-11-14 1 / 22
Outline
- Load Balancing
  - static
  - dynamic
    - centralized
    - decentralized
- Termination Detection
Foster's Design Methodology
Development of scalable parallel algorithms by delaying machine-dependent decisions to later stages.
Four steps:
- partitioning
- communication
- agglomeration
- mapping
Mapping
Examples discussed in previous classes:
- circuit satisfiability
- sieve of Eratosthenes
- all-pairs shortest paths
- matrix-vector multiplication
So far, all primitive tasks have required the same amount of computation.
Agglomeration + mapping strategy: create one task per processor and distribute primitive tasks evenly.
What if we cannot predict beforehand the amount of computation required per primitive task?
Load Balancing
[Figure: execution timelines of processors P0 to P4 over time t, one panel for perfect load balancing and one for imperfect load balancing]
- Parallel execution time defined by last task to finish.
- Parallel time minimized when load distributed evenly.
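The two statements above can be checked with a small sketch. The per-processor loads below are hypothetical numbers chosen for illustration (both distributions carry the same total work); `parallel_time` is an illustrative helper, not part of the course material.

```python
def parallel_time(loads):
    """Parallel execution time is set by the most heavily loaded processor."""
    return max(loads)

# Hypothetical work per processor P0..P4, in arbitrary time units.
balanced = [10, 10, 10, 10, 10]
imbalanced = [4, 16, 9, 13, 8]

print(sum(balanced), sum(imbalanced))   # same total work: 50 50
print(parallel_time(balanced))          # 10
print(parallel_time(imbalanced))        # 16
```

With the same total work, the imbalanced distribution finishes later because every other processor waits for the slowest one.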
Load Balancing
- Static load balancing: assignment of tasks to processors decided during program development (the mapping problem).
- Dynamic load balancing: assignment of tasks to processors decided during runtime.
Static Load Balancing
A substantial amount of research has been dedicated to static load balancing.
Problem Statement: given a number of tasks, each with a given computational effort, and a number of partitions (processors), assign each task to a partition such that the maximum computational effort over all partitions is minimized.
NP-hard combinatorial problem!
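Because the exact problem is NP-hard, heuristics are commonly used in practice. The sketch below uses the greedy "longest processing time first" (LPT) rule, which is a swapped-in illustration and not a method named on the slides: tasks are sorted by decreasing effort and each is placed on the currently least-loaded partition. The function name `lpt_assign` and the task efforts are assumptions made for the example.

```python
import heapq

def lpt_assign(efforts, num_partitions):
    """Greedy LPT heuristic: largest task first, onto the least-loaded partition."""
    heap = [(0, p) for p in range(num_partitions)]   # (current load, partition id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_partitions)]
    # Visit tasks in order of decreasing effort.
    for task, effort in sorted(enumerate(efforts), key=lambda te: -te[1]):
        load, p = heapq.heappop(heap)                # least-loaded partition so far
        assignment[p].append(task)
        heapq.heappush(heap, (load + effort, p))
    loads = [sum(efforts[t] for t in tasks) for tasks in assignment]
    return assignment, loads

efforts = [7, 5, 4, 4, 3, 3, 2]                      # hypothetical task efforts
assignment, loads = lpt_assign(efforts, 3)
print(loads, max(loads))                             # [10, 10, 8] 10
```

The heuristic gives no optimality guarantee in general; for this input it happens to reach the lower bound of ceil(28 / 3) = 10.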
Limitations of Static Load Balancing
- computational effort of each task may not be known a priori
- some problems have an indeterminate number of steps to complete
- programs are subject to variable communication delays
- performance of each processor may vary, and may not be known beforehand
Dynamic load balancing overcomes these issues by making the division of the load dependent on the actual runtimes.
Penalty: additional overhead due to task management.
Dynamic Load Balancing
Dynamic load balancing can be accomplished through:
- centralized management: one process (master) is responsible for assigning tasks to slave processes
- decentralized management: all processes are equal and divide work among themselves cooperatively
Centralized Dynamic Load Balancing
Work Pool or Processor Farm model:
- master holds all tasks of the application
- new tasks may be generated during execution
- when idle, slave process requests a task from the master
- master selects a task among those ready to run and sends it to the slave
- tasks of larger size or importance are executed first
- if tasks are all the same, a FIFO queue may be used
- specialized slaves can be considered
- master can also hold global data
Centralized Dynamic Load Balancing
Work Pool or Processor Farm model:
[Figure: the master process holds the work pool of tasks T1 ... Tn; slave processes P0 ... Pp send "Request Task" (possibly submitting a new task) and the master answers with "Send Task"]
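A minimal shared-memory sketch of the work pool model, using Python threads in place of separate processes and a FIFO `queue.Queue` as the master's pool. The names `run_work_pool`, `work_fn`, and `split`, as well as the task format, are assumptions for illustration; a message-passing version would replace the queue operations with explicit request and send messages.

```python
import queue
import threading

def run_work_pool(tasks, num_workers, work_fn):
    """FIFO work pool: idle workers fetch the next task; tasks may spawn new ones."""
    pool = queue.Queue()
    for task in tasks:
        pool.put(task)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            task = pool.get()            # idle worker requests a task
            if task is None:             # sentinel: the pool has been shut down
                pool.task_done()
                return
            result, new_tasks = work_fn(task)
            for t in new_tasks:          # submit newly generated tasks to the pool
                pool.put(t)
            with lock:
                results.append(result)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    pool.join()                          # pool empty and every fetched task finished
    for _ in threads:
        pool.put(None)                   # tell each worker to stop
    for th in threads:
        th.join()
    return results

def split(n):
    # Hypothetical task: a task of size n > 1 spawns two smaller tasks.
    return n, ([n // 2, n - n // 2] if n > 1 else [])

results = run_work_pool([4, 3], num_workers=3, work_fn=split)
print(sorted(results))
```

New tasks are added before the generating task is marked done, so `pool.join()` returns only when the queue is empty and no worker is still processing a task.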
Centralized Dynamic Load Balancing
Limitations of centralized dynamic load balancing:
Master can become a bottleneck, as it can only issue one task at a time.
- not a significant problem if few slaves and/or computationally intensive tasks
- for finer-grained tasks and many slaves, distribute task management
Decentralized Dynamic Load Balancing
Initial approach can be simply to evolve the centralized system into a tree-like task management scheme:
- main master distributes tasks to be managed by second-level masters
- each second-level master behaves as in the centralized method
Naturally, this model can be extended to as many levels as desired.
Decentralized Dynamic Load Balancing
Fully distributed work pool: each process starts with a given number of tasks, but may send or receive new tasks to handle.
- Receiver-initiated method: a process requests new tasks when it has few or no tasks to do
  - better in high-load systems
- Sender-initiated method: a process under heavy load sends tasks to processes willing to accept them
A mixture of both approaches is possible.
Which process to send to / request from?
- round-robin
- random
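The receiver-initiated method with random selection can be sketched as a round-based simulation. Everything here is an assumption made for illustration: tasks have unit cost, an idle process asks exactly one random peer per round, and the peer hands over half of its pending tasks. The function name `simulate_receiver_initiated` is hypothetical.

```python
import random
from collections import deque

def simulate_receiver_initiated(initial_tasks, seed=0):
    """Each round a process runs one task, or, if idle, requests work from a
    randomly chosen peer. Assumes at least two processes."""
    rng = random.Random(seed)
    queues = [deque(tasks) for tasks in initial_tasks]
    done = [0] * len(queues)             # tasks executed by each process
    rounds = 0
    while any(queues):                   # some process still has pending tasks
        rounds += 1
        for i, q in enumerate(queues):
            if q:
                q.popleft()              # execute one unit-cost task
                done[i] += 1
            else:
                peers = [j for j in range(len(queues)) if j != i]
                victim = rng.choice(peers)
                # The chosen peer gives away half of its pending tasks (maybe none).
                for _ in range(len(queues[victim]) // 2):
                    q.append(queues[victim].pop())
    return rounds, done

# All 12 tasks start on P0; without balancing this would take 12 rounds.
rounds, done = simulate_receiver_initiated([list(range(12)), [], [], []])
print(rounds, done)
```

Replacing `rng.choice(peers)` with a per-process counter that cycles through the peers would give the round-robin variant.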
Mapping Strategy
Decision tree for mapping strategy:
- Static number of tasks
  - Structured communication pattern
    - Roughly constant computation time per task: create one task per processor (agglomerate tasks to minimize communication)
    - Computation per task varies: cyclically map tasks to processors to balance load
  - Unstructured communication pattern: use a static load balancing algorithm
- Dynamic number of tasks
  - Frequent communication between tasks: use a decentralized dynamic load balancing algorithm
  - Many tasks with little communication: use a centralized dynamic load balancing algorithm
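One possible reading of the decision tree, written as a function. The pairing of questions with recommendations is reconstructed from the order of the labels on the slide and should be treated as an assumption; the function name and parameter names are hypothetical.

```python
def mapping_strategy(static_tasks, structured=False, constant_time=False,
                     frequent_comm=False):
    """Walk the mapping decision tree and return the recommended strategy."""
    if static_tasks:
        if not structured:
            return "static load balancing algorithm"
        if constant_time:
            return "one task per processor (agglomerate to minimize communication)"
        return "cyclic mapping of tasks to processors"
    # Dynamic number of tasks.
    if frequent_comm:
        return "decentralized dynamic load balancing"
    return "centralized dynamic load balancing"

print(mapping_strategy(True, structured=True, constant_time=True))
print(mapping_strategy(False, frequent_comm=False))
```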
Termination Detection
When can we be sure that the computation has finished?
Static load balancing
- Typically termination is easy to detect, as the layout of the application is fixed and well controlled.
Centralized dynamic load balancing
- Also easy for master to recognize termination:
  - task queue is empty
  - every slave is idle, having requested another task, and without generating any new task
- Alternatively, if a slave indicates that a solution has been found, master can terminate all slaves.
Decentralized dynamic load balancing
- Not so trivial...
Termination Detection
In general, termination at time t requires the following conditions:
1. local termination conditions exist for all processes at time t
2. there are no messages in transit at time t
Second condition is what makes this problem difficult.
Wait long enough before assuming program has finished?
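The two conditions can be written as a predicate over a global snapshot. This is only a sketch: in a real distributed system no process can observe such a snapshot directly, which is why the dedicated algorithms are needed. The function name and the snapshot values are hypothetical.

```python
def has_terminated(locally_done, messages_in_transit):
    """Condition 1: every process is locally done. Condition 2: nothing in transit."""
    return all(locally_done) and messages_in_transit == 0

# Hypothetical snapshot: all three processes are idle, but one task message
# is still travelling and will reactivate its receiver on arrival.
snapshot_idle = [True, True, True]
print(has_terminated(snapshot_idle, messages_in_transit=1))   # False
print(has_terminated(snapshot_idle, messages_in_transit=0))   # True
```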
Termination Detection Using Acknowledgment Messages
- each process has two states: active and inactive
- processes start inactive
- they become active when they receive their first task
  - sender of that task becomes parent of the process
- process may receive many other tasks while active (computation itself need not be a tree)
- every time a process receives a task, it sends an acknowledgment, except when the task comes from its parent
- acknowledgment to parent is only sent when process is ready to become inactive
- process is ready to become inactive when
  - local termination conditions exist (all tasks finished)
  - it has transmitted all acknowledgments for tasks it received
  - it has received all acknowledgments for tasks it has sent out
When the first process becomes idle, computation can terminate.
Termination Detection Using Acknowledgment Messages
[Figure: a parent process, a process shown in its inactive and active states, and other processes, connected by arrows labeled First Task, Task, Ack and Final Ack]
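The acknowledgment scheme can be sketched as a single-threaded simulation that delivers in-transit messages in random order. Assumptions made for this sketch: P0 is the first process and starts active; only the acknowledgment of the task that activated a process is deferred (as in the Dijkstra-Scholten scheme), so later tasks are acknowledged immediately even if they come from the parent; and a task is "executed" the moment it is delivered. All names are hypothetical.

```python
import random

def simulate_ack_termination(num_procs=4, depth=3, seed=1):
    rng = random.Random(seed)
    parent = [None] * num_procs    # sender of the task that activated each process
    active = [False] * num_procs
    unacked = [0] * num_procs      # tasks sent out and not yet acknowledged
    in_flight = []                 # messages in transit: (kind, src, dst, depth)
    root_idle = []                 # in-transit count recorded when P0 goes idle
    handled = 0

    def send_task(src, d):
        dst = rng.choice([p for p in range(num_procs) if p != src])
        unacked[src] += 1
        in_flight.append(("task", src, dst, d))

    def try_deactivate(p):
        if not active[p] or unacked[p] > 0:
            return                 # still waiting for acknowledgments
        active[p] = False
        if p == 0:
            root_idle.append(len(in_flight))
        else:
            in_flight.append(("ack", p, parent[p], 0))   # final ack to parent
            parent[p] = None

    active[0] = True               # P0 starts the computation with two tasks
    send_task(0, depth)
    send_task(0, depth)
    while in_flight:
        kind, src, dst, d = in_flight.pop(rng.randrange(len(in_flight)))
        if kind == "task":
            handled += 1
            if not active[dst]:
                active[dst], parent[dst] = True, src     # first task: defer the ack
            else:
                in_flight.append(("ack", dst, src, 0))   # immediate ack
            for _ in range(2 if d > 0 else 0):           # a task may spawn two more
                send_task(dst, d - 1)
        else:
            unacked[dst] -= 1
        try_deactivate(dst)
    return root_idle, handled, any(active)

root_idle, handled, still_active = simulate_ack_termination()
print(root_idle, handled, still_active)   # [0] 30 False
```

P0 goes idle exactly once, and at that moment no message is in transit and no process is active, which matches the two general termination conditions.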
Termination Detection
Ring Termination Algorithm
- processes become white when they terminate
- when P0 becomes white, it sends a white token to P1
- the token is passed to the next process in the ring when the process finishes, and:
  - if the color of the process is black, the token becomes black
  - otherwise, the token keeps the same color
- a process Pi becomes black if it sends a message to process Pj and j < i
- a black process becomes white when it passes the token
- when P0 receives a black token, it passes a white token
- when P0 receives a white token, computation can terminate
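The token rules can be traced with a small state machine. This sketch assumes that a task sent from one process to another is delivered instantly (no messages in transit), and that only a busy process can send. The class `Ring` and its method names are hypothetical.

```python
WHITE, BLACK = "white", "black"

class Ring:
    def __init__(self, n):
        self.n = n
        self.idle = [False] * n       # every process starts with work to do
        self.color = [WHITE] * n
        self.holder = 0               # process currently holding the token
        self.token_color = WHITE
        self.launched = False
        self.rounds = 0               # number of times P0 has sent a white token
        self.terminated = False

    def send_task(self, i, j):
        """P_i sends a task to P_j, delivered instantly in this sketch."""
        assert not self.idle[i], "only a busy process can send a task"
        if j < i:
            self.color[i] = BLACK     # sending backwards in the ring
        self.idle[j] = False

    def finish(self, i):
        self.idle[i] = True
        self._advance()

    def _advance(self):
        # The token moves on for as long as its current holder is idle.
        while not self.terminated and self.idle[self.holder]:
            h = self.holder
            if h == 0:
                if self.launched and self.token_color == WHITE:
                    self.terminated = True     # white token made it around
                    break
                self.token_color = WHITE       # first launch, or retry after black
                self.launched = True
                self.rounds += 1
            elif self.color[h] == BLACK:
                self.token_color = BLACK
            self.color[h] = WHITE              # a process is white after passing
            self.holder = (h + 1) % self.n

ring = Ring(4)
ring.finish(0)          # P0 idle: launches a white token towards P1
ring.finish(1)          # P1 passes the token on to P2
ring.send_task(3, 1)    # P3 reactivates P1 behind the token and turns black
ring.finish(2)
ring.finish(3)          # black P3 blackens the token; P0 starts a second round
after_first_round = (ring.terminated, ring.rounds)
ring.finish(1)          # now every process is idle and white
print(after_first_round, ring.terminated, ring.rounds)   # (False, 2) True 2
```

Without the black marking, the first token would have returned to P0 white while P1 was still working on the task received from P3.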
Review
- Load Balancing
  - static
  - dynamic
    - centralized
    - decentralized
- Termination Detection
Next Class
OpenMP vs/with MPI