Introduction to Scheduling Theory

Arnaud Legrand
Laboratoire Informatique et Distribution, IMAG, CNRS, France
arnaud.legrand@imag.fr
November 8, 2004

Outline
1. Task graphs from outer space
2. Scheduling definitions and notions
3. Platform models and scheduling problems

Analyzing a simple code

Solving A.x = b, where A is a lower triangular matrix:

    for i = 1 to n do
        Task T_{i,i}: x(i) <- b(i) / a(i,i)
        for j = i+1 to n do
            Task T_{i,j}: b(j) <- b(j) - a(j,i) * x(i)

For a given value 1 <= i <= n, the tasks T_{i,*} are the computations performed during the i-th iteration of the outer loop. <_seq denotes the sequential order:
T_{1,1} <_seq T_{1,2} <_seq T_{1,3} <_seq ... <_seq T_{1,n} <_seq T_{2,2} <_seq T_{2,3} <_seq ... <_seq T_{n,n}.

Independence

However, some independent tasks could be executed in parallel. Independent tasks are those whose execution order can be changed without modifying the result of the program. Two independent tasks may read the same value, but they never write to the same memory location. For a given task T, In(T) denotes its set of input variables and Out(T) its set of output variables. In the previous example:

    In(T_{i,i}) = {b(i), a(i,i)} and Out(T_{i,i}) = {x(i)};
    In(T_{i,j}) = {b(j), a(j,i), x(i)} and Out(T_{i,j}) = {b(j)}, for j > i.

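To make these definitions concrete, here is a small Python sketch (the helper name is mine, not from the slides) that enumerates the tasks of the triangular solve together with their In/Out sets:

def tasks_of_triangular_solve(n):
    """Return {(i, j): (In, Out)} for every task T_{i,j} of the solve."""
    tasks = {}
    for i in range(1, n + 1):
        # T_{i,i}: x(i) <- b(i) / a(i,i)
        tasks[(i, i)] = ({f"b({i})", f"a({i},{i})"}, {f"x({i})"})
        for j in range(i + 1, n + 1):
            # T_{i,j}: b(j) <- b(j) - a(j,i) * x(i)
            tasks[(i, j)] = ({f"b({j})", f"a({j},{i})", f"x({i})"},
                             {f"b({j})"})
    return tasks

tasks = tasks_of_triangular_solve(3)
print(tasks[(1, 1)])   # In = {'b(1)', 'a(1,1)'}, Out = {'x(1)'}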

Bernstein conditions

Definition. Two tasks T and T' are not independent whenever they share a variable that at least one of them writes:

    T and T' are dependent iff In(T) ∩ Out(T') ≠ ∅, or Out(T) ∩ In(T') ≠ ∅, or Out(T) ∩ Out(T') ≠ ∅.

These conditions are known as Bernstein's conditions [Ber66]. On the example we can check that:

    Out(T_{1,1}) ∩ In(T_{1,2}) = {x(1)}, so T_{1,1} and T_{1,2} are dependent;
    Out(T_{1,3}) ∩ Out(T_{2,3}) = {b(3)}, so T_{1,3} and T_{2,3} are dependent.

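Bernstein's conditions translate directly into code. A minimal sketch, reusing the tasks_of_triangular_solve() helper above:

def dependent(t1, t2, tasks):
    """Bernstein's conditions: two tasks are NOT independent iff they
    share a variable that at least one of them writes."""
    in1, out1 = tasks[t1]
    in2, out2 = tasks[t2]
    return bool(in1 & out2) or bool(out1 & in2) or bool(out1 & out2)

tasks = tasks_of_triangular_solve(3)
print(dependent((1, 1), (1, 2), tasks))  # True: x(1) is written, then read
print(dependent((1, 3), (2, 3), tasks))  # True: both write b(3)
print(dependent((1, 2), (1, 3), tasks))  # False: independent tasks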

Precedence

If two tasks are dependent, they should be ordered according to the sequential execution order: T ≺ T' if T and T' are dependent and T <_seq T'. More precisely, ≺ is defined as the transitive closure of this relation.

A dependence graph G is used: an arc (e : T → T') ∈ G means that T' can start only once T has finished; T is a predecessor of T'. Transitivity arcs are generally omitted.

[Figure: dependence graph of the tasks T_{i,j} of the triangular solve for n = 6.]

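The dependence graph can then be built mechanically: add an arc between every pair of tasks that fails Bernstein's test, oriented by the sequential order. A sketch reusing the helpers above (it keeps the transitive arcs that drawings usually omit):

from itertools import combinations

def dependence_graph(tasks):
    """Arcs (t, t') with t <_seq t' and t, t' dependent. Sorting the
    (i, j) index pairs matches the sequential order of the loop nest."""
    order = sorted(tasks)
    arcs = []
    for a, b in combinations(order, 2):   # a <_seq b
        if dependent(a, b, tasks):
            arcs.append((a, b))
    return arcs

print(dependence_graph(tasks_of_triangular_solve(3)))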

Coarse-grain task graph

Task graphs do not necessarily come from instruction-level analysis. [Figure: execution plan of the SQL query "select p.proteinid, blast(p.sequence) from proteins p, proteinterms t where p.proteinid = t.proteinid and t.term = GO:0008372", with scan, project, exchange, join, and operation-call (blast) nodes.]

Each task may be parallel, preemptable, divisible, ... Each edge depicts a dependency, i.e., most of the time some data to transfer. In the following, we will focus on simple models.

Outline
1. Task graphs from outer space
2. Scheduling definitions and notions
3. Platform models and scheduling problems

Task system

Definition (Task system). A task system is a directed graph G = (V, E, w) where:

    V is the (finite) set of tasks;
    E represents the dependence constraints: e = (u, v) ∈ E iff u ≺ v;
    w : V → N is a time function giving the weight (or duration) of each task.

We could set w(T_{i,j}) = 1, but we could also decide that performing a division is more expensive than a multiplication followed by an addition.

[Figure: dependence graph of the tasks T_{i,j} for n = 6, seen as a task system.]

Schedule and Allocation

Definition (Schedule). A schedule of a task system G = (V, E, w) is a time function σ : V → N such that

    ∀(u, v) ∈ E, σ(u) + w(u) ≤ σ(v).

Let P = {P_1, ..., P_p} denote the set of processors.

Definition (Allocation). An allocation of a task system G = (V, E, w) is a function π : V → P such that

    π(T) = π(T') ⇒ σ(T) + w(T) ≤ σ(T') or σ(T') + w(T') ≤ σ(T),

i.e., two tasks mapped to the same processor cannot overlap in time. Depending on the application and platform model, much more complex definitions can be proposed.

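Checking that a pair (σ, π) is valid is mechanical under these definitions. A minimal sketch, with the task system given as a dict w of durations, a list E of arcs, and dicts sigma and pi for the schedule and allocation (these names are mine):

def is_valid(E, w, sigma, pi):
    # Precedence: a task starts only after each predecessor has finished.
    for (u, v) in E:
        if sigma[u] + w[u] > sigma[v]:
            return False
    # Resource: tasks on the same processor must not overlap in time.
    V = list(w)
    for i, t in enumerate(V):
        for t2 in V[i + 1:]:
            if pi[t] == pi[t2]:
                if not (sigma[t] + w[t] <= sigma[t2] or
                        sigma[t2] + w[t2] <= sigma[t]):
                    return False
    return True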

Gantt chart

Manipulating functions is generally not very convenient; that is why Gantt charts are used to depict schedules and allocations. [Figure: Gantt chart of tasks T_1 to T_8 placed on processors P_1 to P_3 along the time axis.]

Basic feasibility condition

Theorem. Let G = (V, E, w) be a task system. There exists a valid schedule of G iff G has no cycle.

Sketch of the proof. Assume that G has a cycle v_1 → v_2 → ... → v_k → v_1. Then v_1 ≺ v_1, and a valid schedule σ would have to satisfy σ(v_1) + w(v_1) ≤ σ(v_1), which is impossible because w(v_1) > 0. Conversely, if G is acyclic, then some tasks have no predecessor and can be scheduled first. More precisely, we sort the vertices topologically and schedule them one after the other on the same processor; all dependences are then fulfilled.

Therefore, all task systems considered in the following are Directed Acyclic Graphs (DAGs).

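The constructive half of the proof takes a few lines: topologically sort the DAG (Kahn's algorithm below) and run the tasks back to back on a single processor. A sketch:

from collections import deque

def sequential_schedule(V, E, w):
    """Schedule an acyclic task system on one processor, in topological
    order; raises ValueError if the graph has a cycle."""
    indeg = {v: 0 for v in V}
    succ = {v: [] for v in V}
    for (u, v) in E:
        indeg[v] += 1
        succ[u].append(v)
    ready = deque(v for v in V if indeg[v] == 0)
    sigma, t = {}, 0
    while ready:
        u = ready.popleft()
        sigma[u] = t
        t += w[u]
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(sigma) != len(V):
        raise ValueError("cycle detected: no valid schedule exists")
    return sigma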

Makespan

Definition (Makespan). The makespan of a schedule is its total execution time:

    MS(σ) = max_{v ∈ V} {σ(v) + w(v)} − min_{v ∈ V} {σ(v)}.

The makespan is also often referred to as C_max in the literature: C_max = max_{v ∈ V} C_v, where C_v is the completion time of task v. [Figure: Gantt chart on processors P_1 to P_3 illustrating C_max.]

Pb(p): find a schedule with the smallest possible makespan, using at most p processors; MS_opt(p) denotes the optimal makespan using only p processors.
Pb(∞): find a schedule with the smallest makespan when the number of processors that can be used is not bounded; MS_opt(∞) denotes the corresponding makespan.

Critical path

Let Φ = (T_1, T_2, ..., T_n) be a path in G. The weight function w extends to paths in the following way: w(Φ) = Σ_{i=1}^{n} w(T_i).

Lemma. Let G = (V, E, w) be a DAG and σ_p a schedule of G using p processors. For any path Φ in G, we have MS(σ_p) ≥ w(Φ).

Proof. Let Φ = (T_1, T_2, ..., T_n) be a path in G: (T_i, T_{i+1}) ∈ E for 1 ≤ i < n. Therefore we have σ_p(T_i) + w(T_i) ≤ σ_p(T_{i+1}) for 1 ≤ i < n, hence

    MS(σ_p) ≥ w(T_n) + σ_p(T_n) − σ_p(T_1) ≥ Σ_{i=1}^{n} w(T_i) = w(Φ).

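Because G is a DAG, the heaviest path, and hence this lower bound, can be computed by dynamic programming over a topological order. A sketch reusing sequential_schedule() from above (it assumes, as in the theorem, that task durations are positive):

def critical_path_weight(V, E, w):
    """Weight of a heaviest path in the DAG: a lower bound on any
    makespan, and the value of MS_opt(infinity) quoted later."""
    sigma = sequential_schedule(V, E, w)
    topo = sorted(V, key=sigma.get)       # start times give a topological order
    pred = {v: [] for v in V}
    for (u, v) in E:
        pred[v].append(u)
    best = {v: w[v] for v in V}           # heaviest path ending at v
    for v in topo:
        for u in pred[v]:
            best[v] = max(best[v], best[u] + w[v])
    return max(best.values())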

Speed-up and Efficiency

Definition. Let G = (V, E, w) be a DAG and σ_p a schedule of G using only p processors:

    Speed-up: s(σ_p) = Seq / MS(σ_p), where Seq = MS_opt(1) = Σ_{v ∈ V} w(v);
    Efficiency: e(σ_p) = s(σ_p) / p = Seq / (p · MS(σ_p)).

Theorem. Let G = (V, E, w) be a DAG. For any schedule σ_p using p processors: 0 ≤ e(σ_p) ≤ 1.

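These quantities are immediate to compute from a schedule; a small self-contained sketch:

def makespan(sigma, w):
    """MS(sigma) = max(sigma(v) + w(v)) - min(sigma(v))."""
    return max(sigma[v] + w[v] for v in sigma) - min(sigma.values())

def speedup_and_efficiency(sigma, w, p):
    seq = sum(w.values())                 # Seq = MS_opt(1)
    s = seq / makespan(sigma, w)          # speed-up
    return s, s / p                       # efficiency = speed-up / p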

Speed-up and Efficiency (cont'd)

Theorem. Let G = (V, E, w) be a DAG. For any schedule σ_p using p processors: 0 ≤ e(σ_p) ≤ 1.

Proof. [Figure: Gantt chart on processors P_1 to P_4 showing active and idle areas.] Let Idle denote the total idle time. Seq + Idle is then equal to the total area of the rectangle, i.e., p · MS(σ_p). Therefore e(σ_p) = Seq / (p · MS(σ_p)) ≤ 1.

The speed-up is thus bounded by the number of processors: no superlinear speed-up in our model!

A trivial result

Theorem. Let G = (V, E, w) be a DAG. We have

    Seq = MS_opt(1) ≥ ... ≥ MS_opt(p) ≥ MS_opt(p+1) ≥ ... ≥ MS_opt(∞).

Being allowed to use more processors cannot hurt. However, actually using more processors may hurt, especially in a model where communications are taken into account: if we define MS*(p) as the smallest makespan of schedules using exactly p processors, we may have MS*(p) > MS*(p') with p' < p.

Graham notation

Many parameters can change in a scheduling problem. Graham therefore proposed the classification α | β | γ [Bru98]:

α is the processor environment (a few examples): 1: single processor; P: identical processors; Q: uniform processors; R: unrelated processors.

β describes the task and resource characteristics (a few examples):

    pmtn: preemption is allowed;
    prec, tree or chains: general precedence constraints, tree constraints, or sets of chains; independent tasks otherwise;
    r_j: tasks have release dates;
    p_j = p, or p ≤ p_j ≤ p': all tasks have processing time equal to p, or processing times between p and p'; arbitrary processing times otherwise;
    d: deadlines.

γ denotes the optimization criterion (a few examples): C_max: makespan; ΣC_i: average completion time; Σw_i·C_i: weighted average completion time; L_max: maximum lateness (max C_i − d_i); ...

For instance, P | prec | C_max denotes makespan minimization on identical processors under general precedence constraints.

Outline
1. Task graphs from outer space
2. Scheduling definitions and notions
3. Platform models and scheduling problems

Complexity results

If we have an infinite number of processors, the as-soon-as-possible schedule is optimal:

    MS_opt(∞) = max_{Φ path in G} w(Φ).

P2 || C_max is weakly NP-complete (2-Partition).
Proof. By reduction from 2-Partition: can A = {a_1, ..., a_n} be partitioned into two sets A_1 and A_2 such that Σ_{a ∈ A_1} a = Σ_{a ∈ A_2} a? Take p = 2 and G = (V, E, w) with V = {v_1, ..., v_n}, E = ∅ and w(v_i) = a_i for 1 ≤ i ≤ n. Finding a schedule of makespan at most (1/2)·Σ_i a_i is equivalent to solving this 2-Partition instance.

P3 | prec | C_max is strongly NP-complete (3DM);
P | prec, p_j = 1 | C_max is strongly NP-complete (3DM);
Pm | prec, p_j = 1 | C_max for a fixed number m ≥ 3 of processors is open;
P2 | prec, 1 ≤ p_j ≤ 2 | C_max is strongly NP-complete.

List scheduling

Since even simple problems are hard, we should try to find good approximation heuristics. A natural idea is to use a greedy strategy: allocate as many tasks as possible at each time step. At some point we may face a choice, namely when there are more ready tasks than available processors. Any strategy that never deliberately leaves a processor idle is efficient; such a schedule is called a list schedule.

Theorem (Coffman). Let G = (V, E, w) be a DAG, p the number of processors, and σ_p a list schedule of G. Then

    MS(σ_p) ≤ (2 − 1/p) · MS_opt(p).

One can actually prove that this bound cannot be improved. Most of the time, list heuristics are based on the critical path.

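A minimal list-scheduling sketch on p identical processors: whenever a processor is free and a task is ready, start one, picking among ready tasks by a static priority (the priority argument is mine; any static order works for the greedy rule):

import heapq

def list_schedule(V, E, w, p, priority):
    """Greedy list scheduling: never leave a processor idle while some
    task is ready. Among ready tasks, higher priority[v] runs first."""
    indeg = {v: 0 for v in V}
    succ = {v: [] for v in V}
    for (u, v) in E:
        indeg[v] += 1
        succ[u].append(v)
    ready = [(-priority[v], v) for v in V if indeg[v] == 0]
    heapq.heapify(ready)
    running = []                          # heap of (finish_time, task)
    free, t, sigma = p, 0, {}
    while ready or running:
        while ready and free > 0:         # start ready tasks greedily at time t
            _, v = heapq.heappop(ready)
            sigma[v] = t
            heapq.heappush(running, (t + w[v], v))
            free -= 1
        t, v = heapq.heappop(running)     # advance to the next completion
        free += 1
        for s in succ[v]:
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-priority[s], s))
    return sigma

Passing, as priority[v], the weight of a heaviest path from v to an exit node gives the usual critical-path-based list heuristic.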

Taking communications into account

A very simple model (things are already complicated enough): the macro-dataflow model. If there is a data dependence between T and T', the communication cost is

    c(T, T') = 0 if alloc(T) = alloc(T'), and c(T, T') otherwise.

Definition. A DAG with communication costs (cDAG) is a directed acyclic graph G = (V, E, w, c) where vertices represent tasks and edges represent dependence constraints; w : V → N is the computation time function and c : E → N is the communication time function.

Any valid schedule has to respect the dependence constraints: for all e = (v, v') ∈ E,

    σ(v) + w(v) ≤ σ(v') if alloc(v) = alloc(v'),
    σ(v) + w(v) + c(v, v') ≤ σ(v') otherwise.

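Only the precedence constraint changes in this model; a sketch of the corresponding check (compare with is_valid() above), with c given as a dict keyed by edges:

def respects_dependences(E, w, c, sigma, alloc):
    """Macro-dataflow rule: the communication c(u, v) is paid only when
    u and v are mapped to different processors."""
    for (u, v) in E:
        delay = 0 if alloc[u] == alloc[v] else c[(u, v)]
        if sigma[u] + w[u] + delay > sigma[v]:
            return False
    return True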

Taking communications into account (cont'd)

Even Pb(∞) is NP-complete! You constantly have to figure out whether you should use more processors (but then pay more for communications) or not; finding the right trade-off is a real challenge. There is a 4/3-approximation if all communication times are smaller than the computation times. Finding guaranteed approximations for other settings is really hard, but really useful; that must be the reason why there is so much research about it!

Conclusion

Most of the time, the only thing we can do is compare heuristics. There are three ways of doing that:

    Theory: being able to guarantee your heuristic;
    Experiment: generating random graphs and/or typical application graphs, along with platform graphs, to compare your heuristics;
    Smart: proving that your heuristic is optimal for a particular class of graphs (fork, join, fork-join, bounded degree, ...).

However, remember that the first thing to do is to check whether your problem is NP-complete or not. Who knows? You may be lucky...

Bibliography

[Ber66] A. J. Bernstein. Analysis of programs for parallel processing. IEEE Transactions on Electronic Computers, 15:757–762, October 1966.
[Bru98] Peter Brucker. Scheduling Algorithms. Springer, Heidelberg, 2nd edition, 1998.