Throughput constraint for Synchronous Data Flow Graphs

1 Throughput constraint for Synchronous Data Flow Graphs
Alessio Bonfietti, Michele Lombardi, Michela Milano, Luca Benini

2 Talk topic
Resource allocation and scheduling of an application modeled with a Synchronous Data-Flow Graph, given a throughput bound.
Context: Embedded Systems, Multimedia Systems, MP-SoC (MultiProcessor System-on-Chip), Multimedia Applications, Stream Computing based on the Data-Flow Model, Throughput, Synchronous Data-Flow.

3 Synchronous Data-Flow Graph
Node (Actor): Task
Edge: Communication Channel
Token: Data
Concurrency
Execution Constraints (Rate)
Periodic Behaviour (Repetition Vector)
Repetition Vector: [3, 2, 2, 1]
An actor fires when there are enough tokens on all input arcs.
Each time an actor fires:
1) it consumes input tokens for each ingoing channel
2) it produces output tokens for each outgoing channel
[Figure: example SDF graph with actors A, B, C, D and the rates on its channels]
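The repetition vector can be obtained from the balance equations: for every channel, the firings of the producer times its production rate must equal the firings of the consumer times its consumption rate. The sketch below is only an illustration; the function name `repetition_vector` and the channel rates in the usage example are assumptions chosen to give [3, 2, 2, 1], not the rates of the figure on the slide.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(actors, channels):
    """Smallest positive integer firing counts q such that
    q[src] * prod == q[dst] * cons holds for every channel.

    actors:   list of actor names
    channels: list of (src, dst, prod, cons) tuples (hypothetical format)
    Assumes a connected graph; raises ValueError on inconsistent rates.
    """
    q = {actors[0]: Fraction(1)}  # fix one actor, solve the rest relative to it
    pending = list(channels)
    while pending:
        progressed = False
        for ch in list(pending):
            src, dst, prod, cons = ch
            if src in q and dst in q:
                if q[src] * prod != q[dst] * cons:
                    raise ValueError("inconsistent rates")
            elif src in q:
                q[dst] = q[src] * prod / cons
            elif dst in q:
                q[src] = q[dst] * cons / prod
            else:
                continue  # neither end solved yet, retry in a later pass
            pending.remove(ch)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer one.
    scale = lcm(*(f.denominator for f in q.values()))
    return [int(q[a] * scale) for a in actors]


# Hypothetical rates on a four-actor cycle A -> B -> C -> D -> A.
example = [("A", "B", 2, 3), ("B", "C", 1, 1), ("C", "D", 1, 2), ("D", "A", 3, 1)]
vector = repetition_vector(["A", "B", "C", "D"], example)
```

With these assumed rates, `vector` lists how many times each actor fires in one period of the graph.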

4 Homogeneous Synchronous Data-Flow Graph
HSDFG: Homogeneous Rate (all rates are 1), Single Execution of each Task, Higher number of nodes.
Property: the transformation preserves the Throughput.
The transformation process is based on the Repetition Vector (e.g. [2, 1]).
Throughput of SDF graph = Throughput of HSDF graph
[Figure: an SDFG with actors A and B and the equivalent HSDFG]

5 Throughput
To compute the throughput of the HSDF graph:
Find all cycles in the graph.
For each cycle c, compute the total execution time over the number of tokens of c.
Throughput = 1 / the maximum of these values.
The throughput is the average number of actor firings over time.
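The steps above can be sketched directly for a small graph. This is a minimal illustration that enumerates simple cycles by depth-first search, so it only suits small instances; the function name `hsdf_throughput`, the input format and the example graph are assumptions, not taken from the slides.

```python
from fractions import Fraction


def hsdf_throughput(exec_time, edges):
    """Throughput of an HSDF graph: 1 / max over cycles of (time / tokens).

    exec_time: dict node -> integer execution time
    edges:     list of (src, dst, tokens) tuples (hypothetical format)
    Returns None for an acyclic graph.
    """
    succ = {}
    for src, dst, tok in edges:
        succ.setdefault(src, []).append((dst, tok))
    order = {n: i for i, n in enumerate(exec_time)}
    worst = None  # largest time/tokens ratio seen so far

    def dfs(start, node, time, tokens, on_path):
        nonlocal worst
        for nxt, tok in succ.get(node, []):
            if nxt == start:  # closed a cycle back to its first node
                total = tokens + tok
                if total == 0:
                    raise ValueError("cycle without tokens: deadlock")
                ratio = Fraction(time, total)
                worst = ratio if worst is None else max(worst, ratio)
            elif order[nxt] > order[start] and nxt not in on_path:
                # Only visit nodes after `start`, so each cycle is found once.
                dfs(start, nxt, time + exec_time[nxt], tokens + tok, on_path | {nxt})

    for s in exec_time:
        dfs(s, s, exec_time[s], 0, {s})
    return None if worst is None else 1 / worst


# Hypothetical graph: cycle A-B (time 5, 1 token) and cycle B-C (time 4, 2 tokens).
times = {"A": 2, "B": 3, "C": 1}
arcs = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "B", 2)]
thr = hsdf_throughput(times, arcs)
```

In this example the cycle through A and B has the larger ratio (5 against 2), so it alone determines the throughput.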

6 Input and Output
Input: SDF Graph (# Tasks, # Channels, # Tokens); Architecture (# Processors), Homogeneous Architecture.
Output: Allocation (bind each task to a specific processor); Schedule (order the execution of tasks on each processor).

7 Related Work
More than two decades of work on SDFG mapping.
Heuristic Approach, e.g. Periodic Admissible Sequential Schedule (PASS) [Lee 87], SDF-3 tool [Geilen, Stuijk 06].
Complete Approach based on Constraint Programming: Our Work.
[Figure: classification of the approaches by Single-Core / Multi-Core and Heuristic / Complete]

8 Constraint Programming
Constraint programming is a problem-solving methodology used to solve hard combinatorial problems.
Model: Variables; Finite Domain (the set of values that a variable can assume); Constraint (Filtering Algorithm, Domain Reduction).

9 Solving
Consistency: Constraint Propagation reduces the domains of the variables so that the search does not explore infeasible assignments.
Search: to solve the model, define or choose a search algorithm and heuristics.
Once a problem is modeled using constraints, a wide selection of solution techniques is available.
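As a small illustration of constraint propagation as domain reduction, the sketch below filters two finite domains for the constraint x < y. The constraint, the function name `propagate_less_than` and the domains are assumptions chosen for brevity; they are not part of the model in these slides.

```python
def propagate_less_than(dom_x, dom_y):
    """Filtering for x < y on finite integer domains (given as sets).

    Removes values of x that cannot be below any value of y, and values of y
    that cannot be above any value of x. Empty results mean infeasibility.
    """
    if not dom_x or not dom_y:
        return set(), set()
    new_x = {v for v in dom_x if v < max(dom_y)}
    new_y = {v for v in dom_y if v > min(dom_x)}
    if not new_x or not new_y:
        return set(), set()  # the constraint cannot be satisfied
    return new_x, new_y


x_dom, y_dom = propagate_less_than({1, 2, 3, 4, 5}, {1, 2, 3})
```

Here the values 3, 4, 5 are removed from x and the value 1 from y before any search decision is taken.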

10 Model - Variables
Idea: model the effects of decisions by means of modifications to the graph.
Allocation Variables: P_i ∈ [0..#processor-1]
Scheduling Variables: Next_i ∈ [-1, 0..task-1]
Graph Variables: Arcs_ij ∈ [0, 1]

11 Model - Constraints
Idea: model the effects of decisions by means of modifications to the graph.
Edge Constraints: P[]
Order Constraints: Next[]
Deterministic Behaviour
[Figure: HSDFG with tasks A, B ordered on Proc 1 and tasks C, D ordered on Proc 2]
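The idea of turning decisions into graph modifications can be sketched as follows: each ordering decision between two tasks on the same processor becomes an extra arc. This is only an illustration under stated assumptions: the function name `order_arcs` is hypothetical, and the value -1 is assumed here to mean "no successor decided", which the slides do not spell out.

```python
def order_arcs(P, Next):
    """Arcs implied by the ordering decisions.

    P[i]:    processor assigned to task i
    Next[i]: task executed right after task i on the same processor,
             or -1 (assumed meaning: no successor decided)
    Returns the set of arcs (i, j) to add to the graph.
    """
    arcs = set()
    for i, j in enumerate(Next):
        if j == -1:
            continue
        if P[i] != P[j]:
            raise ValueError(f"tasks {i} and {j} are on different processors")
        arcs.add((i, j))
    return arcs


# Tasks 0, 1 on processor 0 and tasks 2, 3 on processor 1 (hypothetical instance).
added = order_arcs(P=[0, 0, 1, 1], Next=[1, -1, 3, -1])
```

Any throughput computation would then run on the original graph extended with the arcs in `added`.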

12 Solver
Tree-Search Strategy in 2 phases:
1) Task Allocation: P[]
2) Ordering of tasks on processors: Next[]
Throughput Constraint
Static Symmetry Breaking Constraint

13 Throughput Constraint
Thr_cst(P, Next, Arcs, Thr_Bound)
W: Execution Time
ψ: Number of Tokens
[Figure: example HSDFG with tasks A, B, C, D, their execution times, and the table indexed by level k and task v]

16 Throughput Constraint
References: R. M. Karp [Karp66]; A. Dasdan, R. K. Gupta [Das98].
Improvements:
1) Take into account tokens. The original algorithm was devised to count a token for each arc.
2) Generalized to graphs that are not strongly connected. Most throughput computation algorithms target strongly connected graphs.
3) Cycle identification at each step, instead of at the end of the algorithm.
[Figure: table indexed by level k and task v for tasks A, B, C, D]
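For reference, the textbook form of Karp's algorithm (the maximum cycle mean variant, which effectively counts one token per arc) can be sketched as below. It does not include the improvements listed on the slide; the function name `karp_max_cycle_mean`, the input format and the example are assumptions.

```python
from fractions import Fraction


def karp_max_cycle_mean(n, arcs, source=0):
    """Maximum mean arc weight over all cycles (Karp's algorithm).

    n:    number of nodes, labelled 0..n-1
    arcs: list of (u, v, weight) with integer weights
    Assumes every node is reachable from `source`, as in a strongly
    connected graph. Returns None if no cycle is found.
    """
    # D[k][v] = maximum weight of a walk with exactly k arcs from source to v,
    # or None when no such walk exists.
    D = [[None] * n for _ in range(n + 1)]
    D[0][source] = 0
    for k in range(1, n + 1):
        for u, v, w in arcs:
            if D[k - 1][u] is not None:
                cand = D[k - 1][u] + w
                if D[k][v] is None or cand > D[k][v]:
                    D[k][v] = cand
    best = None
    for v in range(n):
        if D[n][v] is None:
            continue
        # Missing entries act as minus infinity and never attain the minimum.
        worst = min(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] is not None)
        best = worst if best is None else max(best, worst)
    return best


# Hypothetical graph; each arc carries the execution time of its source task
# (task 0: 2, task 1: 3, task 2: 1), so a cycle mean is time per arc.
mean = karp_max_cycle_mean(3, [(0, 1, 2), (1, 0, 3), (1, 2, 3), (2, 1, 1)])
```

Under the one-token-per-arc reading, the throughput of this example would be the reciprocal of `mean`.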

17 Throughput Constraint
Algorithm Step: Longer Cycles, Throughput Value Decreasing, Fast Pruning.
[Figure: throughput value against algorithm step, compared with the throughput lower bound]

18 Solver Search
Search Step: Longer Cycles, Throughput Value Decreasing, Fast Pruning.
[Figure: throughput against algorithm step and search step, showing the throughput upper bound, the throughput lower bound and the throughput of the solution]

23 Caching System: provides some incrementality.
Trivial Bound: total execution time on a processor.
Processor Pruning: exclude from the computation processors with no in-arcs/out-arcs.
[Figure: processors PI, PII, PIII, PIV]
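One way to read the trivial bound: since every HSDFG task executes once per iteration and tasks on the same processor run one after another, the throughput cannot exceed the reciprocal of the largest total execution time on any processor. The sketch below assumes this reading; the function name `trivial_throughput_bound` and the use of None for unassigned tasks are hypothetical.

```python
from fractions import Fraction


def trivial_throughput_bound(P, exec_time):
    """Upper bound on the throughput from processor loads alone.

    P[i]:         processor of task i, or None if not yet assigned (assumption)
    exec_time[i]: positive integer execution time of task i
    Returns None when no task is assigned yet.
    """
    load = {}
    for task, proc in enumerate(P):
        if proc is not None:
            load[proc] = load.get(proc, 0) + exec_time[task]
    if not load:
        return None
    return Fraction(1, max(load.values()))


# Hypothetical partial allocation: tasks 0, 1 on processor 0, task 2 on processor 1.
bound = trivial_throughput_bound([0, 0, 1, None], [2, 3, 4, 1])
```

Because it needs only one pass over the allocated tasks, a bound of this kind can be checked before running the full cycle-based computation.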

24 Test
Implemented on ILOG Solver 6.3.
Instance structure: Cyclic Graph, Acyclic Graph, Strongly Connected Graph.
Architectures: 2 Core, 4 Core, 8 Core.

25 Test
Run Time Optimization Improvement
[Figure: run time of the algorithm, not optimized versus optimized]

26 Test
Optimal Solution
[Table: Search Time and Constr. Time for Cyclic, Acyclic and Str. Connected graphs, by # Processors and # nodes]

27 Conclusion
CP-based method for allocating and scheduling HSDFGs on multiprocessor platforms.
It can be used to find the optimal solution or a feasible solution.
Future & Present Work:
Search directly on the SDF graph.
Take into account Latency Communication, Memory Capacity Constraints.
