Throughput Constraint for Synchronous Data Flow Graphs*
Alessio Bonfietti, Michele Lombardi, Michela Milano, Luca Benini
DEIS, University of Bologna
*Work supported by the PREDATOR EU Project

Talk topic

Resource allocation and scheduling of an application modeled with a Synchronous Data-Flow Graph, given a throughput bound.

- Embedded systems, multimedia systems
- MP-SoC (MultiProcessor System-on-Chip)
- Multimedia applications
- Stream computing based on the data-flow model
- Throughput
- Synchronous data-flow

Synchronous Data-Flow Graphs

- Node (actor): task
- Edge: communication channel
- Token: data
- Concurrency
- Execution constraints (rates)
- Periodic behaviour (repetition vector)

[Figure: example SDFG with actors A, B, C, D and rates on the channel endpoints. Repetition vector: [3, 2, 2, 1]]

An actor fires when there are enough tokens on all its input arcs. Each time an actor fires:
1. it consumes tokens from each ingoing channel, according to the channel's input rate;
2. it produces tokens on each outgoing channel, according to the channel's output rate.

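The repetition vector follows from the balance equations of the graph: for every channel, the tokens produced per iteration must equal the tokens consumed. The sketch below is a minimal illustration of that idea. The function name `repetition_vector` and the channel rates in the example are assumptions chosen for illustration; they are not read from the slide's figure.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, channels):
    """Return the smallest integer firing counts that balance every channel.

    actors   -- list of actor names (hypothetical input format)
    channels -- list of (src, dst, produced, consumed) tuples
    """
    # Fix the first actor to 1 and propagate the balance equation
    # q[src] * produced == q[dst] * consumed along the channels.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, produced, consumed in channels:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if len(rate) != len(actors):
        raise ValueError("the graph is not connected")
    # Every channel must be balanced, including those not used for propagation.
    for src, dst, produced, consumed in channels:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no repetition vector exists")
    # Scale the fractions to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = [int(rate[a] * scale) for a in actors]
    common = reduce(gcd, counts)
    return [n // common for n in counts]


if __name__ == "__main__":
    # Assumed example rates, picked so that the result is [3, 2, 2, 1].
    example = [("A", "B", 2, 3), ("B", "C", 2, 2),
               ("C", "D", 1, 2), ("D", "A", 3, 1)]
    print(repetition_vector(["A", "B", "C", "D"], example))  # [3, 2, 2, 1]
```

Exact fractions are used so that inconsistent graphs are detected without rounding issues.
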
Homogeneous Synchronous Data-Flow Graphs

HSDFG:
- Homogeneous rates (all rates are 1)
- Single execution of each task
- Higher number of nodes
- Property: the transformation preserves the throughput, so throughput of the SDF graph = throughput of the HSDF graph
- The transformation process is based on the repetition vector (e.g. [2, 1])

[Figure: an SDFG with actors A and B, and the corresponding HSDFG with nodes A1, A2 and B]

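A sketch of the SDFG to HSDFG expansion may help here. It uses the standard token-position argument: the k-th token produced on a channel in one iteration is consumed by a specific firing of the destination actor, possibly some iterations later. The function name `to_hsdf`, the input format and the example rates are assumptions for illustration; the slides do not spell out the transformation.

```python
def to_hsdf(q, channels):
    """Expand an SDF graph into homogeneous edges.

    q        -- dict actor -> repetition count (from the repetition vector)
    channels -- list of (src, dst, produced, consumed, initial_tokens)
    Returns a sorted list of ((src, i), (dst, j), tokens) with firings
    numbered from 1, as in A1, A2.
    """
    edges = {}
    for src, dst, produced, consumed, initial in channels:
        for k in range(q[src] * produced):      # tokens produced in one iteration
            i = k // produced                    # firing of src that produces token k
            pos = k + initial                    # position in the channel's token stream
            j = (pos // consumed) % q[dst]       # firing of dst that consumes it
            delay = pos // (consumed * q[dst])   # how many iterations later
            key = ((src, i + 1), (dst, j + 1))
            # Several tokens may link the same pair: keep the tightest dependency.
            edges[key] = min(edges.get(key, delay), delay)
    return sorted((u, v, t) for (u, v), t in edges.items())


if __name__ == "__main__":
    # Assumed example: A produces 1 token, B consumes 2; a feedback channel
    # from B to A carries 2 initial tokens. Repetition vector: A=2, B=1.
    q = {"A": 2, "B": 1}
    channels = [("A", "B", 1, 2, 0), ("B", "A", 2, 1, 2)]
    for edge in to_hsdf(q, channels):
        print(edge)
```

In this assumed example both A1 and A2 feed B1 without tokens, and B1 feeds A1 and A2 of the next iteration, each through one token.
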
Throughput

The throughput is the average number of actor firings over time.

To compute the throughput of the HSDF graph:
- Find all cycles in the graph
- For each cycle c, compute the total execution time of c divided by the number of tokens on c
- Throughput = 1 / the maximum of these values

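The three steps above can be sketched directly, by enumerating the simple cycles of a small HSDF graph. This is only an illustration of the definition: the name `hsdf_throughput`, the input format and the example values are assumptions, and plain cycle enumeration is exponential in the worst case.

```python
from fractions import Fraction


def hsdf_throughput(exec_time, edges):
    """Throughput of an HSDF graph as 1 / (maximum cycle ratio).

    exec_time -- dict node -> integer execution time
    edges     -- list of (src, dst, tokens)
    """
    nodes = sorted(exec_time)
    succ = {n: [] for n in nodes}
    for u, v, tokens in edges:
        succ[u].append((v, tokens))

    ratios = []

    def dfs(start, node, time, tokens, on_path):
        for nxt, t in succ[node]:
            if nxt == start:
                # Closed a simple cycle: execution time over tokens.
                if tokens + t == 0:
                    raise ValueError("cycle without tokens: the graph deadlocks")
                ratios.append(Fraction(time, tokens + t))
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than the start, so that each
                # cycle is enumerated once, from its smallest node.
                dfs(start, nxt, time + exec_time[nxt], tokens + t, on_path | {nxt})

    for s in nodes:
        dfs(s, s, exec_time[s], 0, {s})
    if not ratios:
        raise ValueError("the graph has no cycles")
    return 1 / max(ratios)


if __name__ == "__main__":
    # Assumed example: cycle A-B takes 3 time units with 1 token,
    # cycle B-C takes 5 time units with 2 tokens.
    times = {"A": 1, "B": 2, "C": 3}
    arcs = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "B", 2)]
    print(hsdf_throughput(times, arcs))  # 1/3
```

The cycle A-B has ratio 3 and the cycle B-C has ratio 5/2, so the first one is critical and the throughput is 1/3.
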
Input and Output

Input:
- SDF graph: # tasks, # channels, # tokens
- Architecture: # processors (homogeneous architecture)

Output:
- Allocation: bind each task to a specific processor
- Schedule: order the execution of the tasks on each processor

Related Work

More than two decades of work on SDFG mapping.

Heuristic approaches, e.g.:
- Periodic Admissible Sequential Schedule (PASS) [Lee 87]
- SDF-3 tool [Geilen, Stuijk 06]

             Single-Core   Multi-Core
Heuristic    PASS          SDF-3, ...
Complete                   Our work

Our work: a complete approach based on Constraint Programming.

Constraint Programming

Constraint programming is a problem-solving methodology for hard combinatorial problems.

Model:
- Variables
- Finite domain: the set of values that a variable can assume
- Constraint: a filtering algorithm that performs domain reduction

CP: Variables + Constraints + Search

Solving:
- Consistency and constraint propagation: the domains of the variables are reduced, so that the search does not explore infeasible assignments
- Search

To solve a model, define or choose a search algorithm and the heuristics. Once a problem is modeled using constraints, a wide selection of solution techniques is available.

Model - Variables

Idea: model the effects of decisions by means of modifications to the graph.

- Allocation variables: P_i ∈ [0..#processor-1]
- Scheduling variables: Next_i ∈ [-1, 0..task-1]
- Graph variables: Arcs_ij ∈ [0, 1]

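Model - Constraints

Idea: model the effects of decisions by means of modifications to the graph.

- Edge constraints, driven by the allocation P[]
- Order constraints, driven by the ordering Next[]
- Deterministic behaviour

[Figure: an HSDFG with tasks A, B, C, D; Proc 1 executes A, B and Proc 2 executes C, D, each in the given order]
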
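To make the idea of "decisions as graph modifications" concrete, the sketch below turns a complete ordering into extra arcs of the HSDF graph. Several points are assumptions made for illustration and are not stated on the slides: `Next[i] = -1` is read as "task i is the last one on its processor", the chain of each processor is closed by a wrap-around arc carrying one token (so that consecutive iterations on the same processor do not overlap), and processors with a single task get no arc. The function name `order_arcs` is hypothetical.

```python
def order_arcs(next_task):
    """Translate a Next[] assignment into (src, dst, tokens) arcs.

    next_task[i] is the task scheduled right after task i on the same
    processor, or -1 if task i is the last one (assumed convention).
    """
    has_pred = {n for n in next_task if n != -1}
    arcs = []
    for first in range(len(next_task)):
        if first in has_pred:
            continue                      # not the first task of a processor
        last, seen = first, {first}
        while next_task[last] != -1:
            nxt = next_task[last]
            if nxt in seen:
                raise ValueError("Next[] must describe acyclic chains")
            arcs.append((last, nxt, 0))   # ordering arc inside one iteration
            seen.add(nxt)
            last = nxt
        if last != first:
            # Assumed wrap-around arc: the next iteration of the first task
            # waits for the last task of the current iteration.
            arcs.append((last, first, 1))
    return arcs


if __name__ == "__main__":
    # Tasks A, B, C, D are numbered 0..3: A then B on one processor,
    # C then D on the other.
    print(order_arcs([1, -1, 3, -1]))
    # [(0, 1, 0), (1, 0, 1), (2, 3, 0), (3, 2, 1)]
```

With arcs of this kind added, the effect of an allocation and ordering on the throughput can be evaluated on the modified graph alone.
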
Solver

Tree-search strategy in 2 phases:
1) Task allocation: P[]
2) Ordering of the tasks on the processors: Next[]

Constraints:
- Throughput constraint
- Static symmetry-breaking constraint

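On a homogeneous architecture, renaming the processors does not change a solution, which is what a static symmetry-breaking constraint can exploit. The sketch below enumerates the first-phase allocations while skipping such renamings. The specific rule used here (a task may only use a processor that is already in use, or the first unused one) is a common choice and an assumption; the slides do not state which constraint the authors post. The name `allocations` is hypothetical.

```python
def allocations(n_tasks, n_procs):
    """Yield task -> processor assignments, one per class of renamings."""

    def extend(partial, used):
        if len(partial) == n_tasks:
            yield tuple(partial)
            return
        # Allowed values: processors already in use, plus one new processor.
        for p in range(min(used + 1, n_procs)):
            partial.append(p)
            yield from extend(partial, max(used, p + 1))
            partial.pop()

    yield from extend([], 0)


if __name__ == "__main__":
    for assignment in allocations(3, 2):
        print(assignment)
    # (0, 0, 0)
    # (0, 0, 1)
    # (0, 1, 0)
    # (0, 1, 1)
```

For 3 tasks and 2 processors this leaves 4 allocations instead of 8.
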
Throughput Constraint

Thr_cst(P, Next, Arcs, Thr_Bound)

W: execution time
ψ: number of tokens

[Figure: an HSDFG with tasks A, B, C, D and their execution times, next to a table indexed by level k and task v that is filled in step by step]

Throughput Constraint

Based on R. M. Karp [Karp66] and A. Dasdan, R. K. Gupta [Das98].

Improvements:
1) Tokens are taken into account. The original algorithm was devised to count one token for each arc.
2) Generalized to graphs that are not strongly connected. Most throughput computation algorithms target strongly connected graphs.
3) Cycles are identified at each step, instead of at the end of the algorithm.

[Figure: table indexed by level k and task v, for tasks A, B, C, D]

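As a point of comparison, a throughput bound on a fixed HSDF graph can also be checked with a simpler, non-incremental technique: positive-cycle detection in the style of Bellman-Ford. This is not the Karp-based filtering algorithm described on the slides. The bound holds exactly when no cycle c has W(c) - ψ(c) / Thr_Bound > 0. The function name and input format are assumptions for illustration.

```python
from fractions import Fraction


def satisfies_throughput_bound(exec_time, edges, bound):
    """Return True if the HSDF graph reaches at least the given throughput.

    exec_time -- dict node -> execution time (W)
    edges     -- list of (src, dst, tokens), tokens being psi of the arc
    bound     -- required throughput, as a Fraction, int or string like "1/3"
    """
    period = 1 / Fraction(bound)
    # Longest-path relaxation from a virtual source connected to every node.
    dist = {n: Fraction(0) for n in exec_time}
    for _ in range(len(dist)):
        updated = False
        for u, v, tokens in edges:
            # Arc weight: time spent in u minus the slack bought by the tokens.
            candidate = dist[u] + exec_time[u] - tokens * period
            if candidate > dist[v]:
                dist[v] = candidate
                updated = True
        if not updated:
            return True    # converged: no positive cycle, the bound holds
    return False           # still relaxing after n passes: positive cycle


if __name__ == "__main__":
    # Assumed example with critical cycle A-B (3 time units, 1 token).
    times = {"A": 1, "B": 2, "C": 3}
    arcs = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "B", 2)]
    print(satisfies_throughput_bound(times, arcs, Fraction(1, 3)))  # True
    print(satisfies_throughput_bound(times, arcs, Fraction(2, 5)))  # False
```

A check of this kind only answers yes or no for a complete graph, whereas the constraint on the slides identifies cycles at each step so that it can prune during search.
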
Throughput Constraint

[Figure: throughput value and throughput lower bound plotted against the algorithm step]

- Longer cycles are found as the algorithm proceeds
- Throughput value decreasing
- Fast pruning

Solver Search

[Figure: throughput upper and lower bounds plotted against the search step, with the throughput value at each algorithm step and the throughput of the solution]

- Longer cycles
- Throughput value decreasing
- Fast pruning

Further Optimizations

- Caching system: provides some incrementality
- Trivial bound: total execution time on a processor
- Processor pruning: exclude from the computation the processors with no in-arcs or out-arcs

[Figure: processors PI, PII, PIII, PIV]

Tests

Implemented on ILOG Solver 6.3.

Instance structure:
- Cyclic graph
- Acyclic graph
- Strongly connected graph

Architectures: 2, 4 and 8 cores.

Tests: Optimization Improvement

[Figure: run time of the algorithm, not optimized versus optimized]

Tests: Optimal Solution

                         Cyclic                  Acyclic                 Str. Connected
# Processors  # nodes    Search T.   Constr. T.  Search T.   Constr. T.  Search T.   Constr. T.
2             10         0,02        0,0         0,04        0,02        0,02        0,0
2             15         ,88         ,33         2,66        0,85        0,92        0,53
2             20         308,5       77,89       46,89       08,06       44,5        82,3
4             10         0,0         0           0,04        0,02        0           0
4             15         2,4         ,65         ,85         ,35         0,06        0,04
4             20         0,2989      0,863       43,39       293,38      0,89        0,66
8             10         0,0         0           0,02        0,0         0,0         0
8             15         0,02        0,0         ,37         ,09         0,02        0
8             20         0,27        0,9         207,2       69,24       0,06        0,02

Conclusions

A CP-based method for allocating and scheduling HSDFGs on multiprocessor platforms. It can be used to find the optimal solution or to find a feasible solution.

Present and future work:
- Search directly on the SDF graph
- Take into account communication latency and memory capacity constraints