C-52 Loop Parallelization

Compilation steps:

- nested loops operating on arrays, sequential execution of the iteration space

      DECLARE B[0..N,0..N+1]
      FOR I := 1..N
        FOR J := 1..I
          B[I,J] := B[I-1,J]+B[I-1,J-1]
        END FOR
      END FOR

- analyze data dependences
  data flow: definition and use of array elements
- transform loops
  keep data dependences intact
- parallelize inner loop(s)
  map onto a field or vector of processors
- map arrays onto processors
  such that many accesses are local; transform index spaces

Prof. Dr. Uwe Kastens, Vorlesung Übersetzer II (lecture Compilers II), SS 2... / Folie (slide) 52

Overview. Explain the application area (scientific computations), the goals (execute inner loops in parallel with efficient data access), and the transformation steps, goals and ...
C-53 Iteration Space of Nested Loops

Iteration space of n properly nested loops:

- n-dimensional space of integral points (polytope)
- each point (i1, ..., in) of that space represents an execution of the innermost loop body
- loop bounds are not known before run-time
- the iteration space is not necessarily orthogonal
- the iteration space is sequentially enumerated

Example: Computation of Pascal's triangle

    DECLARE B[0..N,0..N]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J]+B[I-1,J-1]
      END FOR
    END FOR

Notion of iteration space. Use the example for explanation. Show the execution order of the iteration points. A step size greater than 1 causes unused points in the iteration space: the polytope becomes non-convex. Exercise: draw an iteration space with step size 3 in one dimension.
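The sequential enumeration of the triangular iteration space, and the effect of a step size greater than 1, can be sketched in Python. The function name `iteration_points`, the loop bounds `1..n`, and the `step` parameter are assumptions made for this illustration; they are not taken from the slides.

```python
def iteration_points(n, step=1):
    """Points (i, j) of the nest  FOR I := 1..n / FOR J := 1..I STEP step,
    listed in the order in which the loop body is executed."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, i + 1, step)]


points = iteration_points(4)
# Sequential execution enumerates the points in lexicographic order.
assert points == sorted(points)

# With step size 3 in the inner dimension only every third value of j is
# visited; the integral points in between belong to no iteration.
sparse = iteration_points(7, step=3)
print(sparse)
```

Printing `sparse` shows that, for example, (4, 2) and (4, 3) lie between executed points but are never visited.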
C-54 Data Dependences in Iteration Spaces

Data dependence from iteration point $i^1$ to $i^2$:

- Iteration $i^1$ computes a value that is used in iteration $i^2$ (flow dependence).
- The relative dependence vector $d = i^2 - i^1 = (i^2_1 - i^1_1, \ldots, i^2_n - i^1_n)$ holds for all iteration points except at the border.
- Flow dependences cannot be directed against the execution order, i.e. they cannot point backward in time: each dependence vector must be lexicographically positive, i.e. $d = (0, \ldots, 0, d_i, \ldots, d_n)$ with $d_i > 0$.

Example: Computation of Pascal's triangle

    DECLARE B[0..N,0..N]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J]+B[I-1,J-1]
      END FOR
    END FOR

Understand dependences in loops. Explain the vector representation of dependences; show examples; show the admissible directions graphically. Exercise: show different dependence vectors and the array accesses in a loop body which cause such dependences.
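For the Pascal's triangle loop the dependence vectors can be computed mechanically from the index offsets. The following Python sketch assumes uniform accesses of the form `A[i + offset]`; the helper names `dependence_vectors` and `lex_positive` are hypothetical.

```python
def dependence_vectors(write_offset, read_offsets):
    """Distance vectors d = i2 - i1 for a statement
        A[i + write_offset] := f(A[i + r], ...)   with r in read_offsets.
    Iteration i1 writes the element that iteration i2 reads when
    i1 + write_offset == i2 + r, hence d = write_offset - r."""
    return [tuple(w - r for w, r in zip(write_offset, read))
            for read in read_offsets]


def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False  # the zero vector is not lexicographically positive


# B[I,J] := B[I-1,J] + B[I-1,J-1]
deps = dependence_vectors((0, 0), [(-1, 0), (-1, -1)])
print(deps)                                 # [(1, 0), (1, 1)]
print(all(lex_positive(d) for d in deps))   # True
```

A computed vector that is not lexicographically positive would mean that the read happens before the write, so it would not describe a flow dependence.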
C-55 Loop Transformation

The iteration space of a loop nest is transformed onto new coordinates.

Goals:

- execute innermost loop(s) in parallel
- improve locality of data accesses;
  in space: storage of the executing processor,
  in time: reuse of values stored in cache
- systolic computation and communication scheme

Data dependences must point forward in time, i.e. they must be lexicographically positive and must not lie within parallel dimensions.

3 linear fundamental transformations:

- Reversal: flip the execution order for one dimension
- Permutation: exchange two loops of the loop nest
- Skewing: add the iteration count of an outer loop to that of an inner one

Non-linear transformations, e.g.

- Scaling: stretch the iteration space in one dimension, causes gaps
- Tiling: introduce additional inner loops that cover tiles of fixed size

Overview. Explain the goals and the admissible directions of dependences. Show diagrams for the transformations.
C-56 Reversal

The iteration count of one loop is negated; that dimension is enumerated backward.

Transformation matrix: the identity matrix with one diagonal element replaced by -1:

\[ \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & -1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \]

2-dimensional (loop variables old: i, j; new: ir, jr):

\[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ -j \end{pmatrix} = \begin{pmatrix} i_r \\ j_r \end{pmatrix} \]

original:

    for i = 0 to M
      for j = 0 to N
        ...

transformed:

    for ir = 0 to M
      for jr = -N to 0
        ...

Understand the reversal transformation. Explain its effect and the notation of the transformation matrix. There may be no dependences in the direction of the reversed loop; they would point backward after the transformation. Exercise: show an example where reversal enables loop fusion.
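Whether a reversal is admissible can be checked by applying the matrix to every dependence vector and testing the result for lexicographic positivity. The sketch below uses the dependence vectors (1, 0) and (1, 1) of the Pascal's triangle loop; the helper names `mat_vec`, `reversal`, and `legal` are hypothetical.

```python
def mat_vec(T, v):
    """Matrix-vector product for small integer matrices."""
    return tuple(sum(t * x for t, x in zip(row, v)) for row in T)


def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False


def reversal(n, k):
    """n x n identity matrix with -1 at (k, k): reverses loop k (0-based)."""
    return [[(-1 if r == k else 1) if r == c else 0 for c in range(n)]
            for r in range(n)]


def legal(T, deps):
    """A transformation is admissible if all transformed dependence
    vectors still point forward in time."""
    return all(lex_positive(mat_vec(T, d)) for d in deps)


deps = [(1, 0), (1, 1)]               # Pascal's triangle loop
print(legal(reversal(2, 1), deps))    # inner loop reversed: True
print(legal(reversal(2, 0), deps))    # outer loop reversed: False
```

Reversing the inner loop is admissible here because both dependences are carried by the outer loop; reversing the outer loop turns (1, 0) into (-1, 0), which points backward.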
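C-57 Skewing

The iteration count of an outer loop is added to the count of an inner loop; the iteration space is shifted; the execution order of the iteration points remains unchanged.

Transformation matrix: the identity matrix with a factor f below the diagonal:

\[ \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & f & \ddots & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \]

2-dimensional (loop variables old: i, j; new: is, js):

\[ \begin{pmatrix} 1 & 0 \\ f & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ f \cdot i + j \end{pmatrix} = \begin{pmatrix} i_s \\ j_s \end{pmatrix} \]

original:

    for i = 0 to M
      for j = 0 to N
        ...

transformed:

    for is = 0 to M
      for js = f*is to N+f*is
        ...

Understand the skewing transformation. Explain its effect. Skewing is always applicable. Skewing can enable loop permutation. Exercise: show an example where skewing enables loop permutation.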
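Both properties of skewing, unchanged execution order and enabling a permutation, can be checked on a small case. The dependence vector (1, -1) used below is an assumed example (it would arise from a statement such as `a[i][j] = a[i-1][j+1]`); it does not come from the slides.

```python
def mat_vec(T, v):
    """Matrix-vector product for small integer matrices."""
    return tuple(sum(t * x for t, x in zip(row, v)) for row in T)


def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False


SKEW = [[1, 0], [1, 1]]       # f = 1: (i, j) -> (i, i + j)
PERMUTE = [[0, 1], [1, 0]]    # (i, j) -> (j, i)

# Skewing keeps the execution order: the images of the points, taken in
# original execution order, are already in lexicographic order.
points = [(i, j) for i in range(3) for j in range(4)]
skewed = [mat_vec(SKEW, p) for p in points]
assert skewed == sorted(skewed)

d = (1, -1)                                       # assumed dependence
direct = mat_vec(PERMUTE, d)                      # permutation alone
after_skew = mat_vec(PERMUTE, mat_vec(SKEW, d))   # skewing, then permutation
print(direct, lex_positive(direct))               # (-1, 1) False
print(after_skew, lex_positive(after_skew))       # (0, 1) True
```

Without skewing the interchange would make the dependence point backward; after skewing it stays lexicographically positive.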
C-58 Permutation

Two loops of the loop nest are interchanged; the iteration space is flipped; the execution order of the iteration points changes.

Transformation matrix: a permutation matrix (the identity matrix with two rows exchanged).

2-dimensional (loop variables old: i, j; new: ip, jp):

\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix} = \begin{pmatrix} i_p \\ j_p \end{pmatrix} \]

original:

    for i = 0 to M
      for j = 0 to N
        ...

transformed:

    for ip = 0 to N
      for jp = 0 to M
        ...

Understand loop permutation. Explain its effect. Permutation often yields a parallelizable innermost loop. Exercise: show an example where permutation yields a parallelizable innermost loop.
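A minimal sketch of how a permutation can make the innermost loop parallelizable: the innermost loop may run in parallel when every dependence is carried by an enclosing loop. The dependence (0, 1) is an assumed example (for instance from `a[i][j] = a[i][j-1] + 1`), and `inner_loop_parallel` is a hypothetical helper name.

```python
def mat_vec(T, v):
    """Matrix-vector product for small integer matrices."""
    return tuple(sum(t * x for t, x in zip(row, v)) for row in T)


def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False


def inner_loop_parallel(deps):
    """For admissible (lexicographically positive) dependences: the
    innermost loop carries no dependence if the prefix of every vector,
    without its last component, is itself lexicographically positive."""
    return all(lex_positive(d[:-1]) for d in deps)


PERMUTE = [[0, 1], [1, 0]]

deps = [(0, 1)]                                   # carried by the inner loop
permuted = [mat_vec(PERMUTE, d) for d in deps]    # [(1, 0)]

print(inner_loop_parallel(deps))       # False
print(inner_loop_parallel(permuted))   # True
```

The permuted vector (1, 0) is still lexicographically positive, so the interchange is admissible in this example.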
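C-59 Use of Transformation Matrices

- The transformation matrix T defines the new iteration counts in terms of the old ones: $T \cdot i = i'$

  e.g. reversal:
  \[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ -j \end{pmatrix} = \begin{pmatrix} i' \\ j' \end{pmatrix} \]

- The transformation matrix T transforms old dependence vectors into new ones: $T \cdot d = d'$

  e.g.
  \[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \]

- The inverse transformation matrix $T^{-1}$ defines the old iteration counts in terms of the new ones, for the transformation of index expressions in the loop body: $T^{-1} \cdot i' = i$

  e.g.
  \[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} i' \\ j' \end{pmatrix} = \begin{pmatrix} i' \\ -j' \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix} \]

- Concatenation of transformations, first $T_1$ then $T_2$: $T_2 \cdot T_1 = T$

  e.g. reversal followed by permutation:
  \[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]

Learn how to use the transformation matrices. Explain the 4 uses with examples; transform a loop completely. Question: why do the dependence vectors change under a transformation, although the dependence between array elements remains unchanged?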
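The four uses of a transformation matrix can be tried out with a few lines of Python. This is a sketch limited to 2x2 integer matrices with determinant +1 or -1, which covers reversal, skewing, permutation and their compositions; the helper names are hypothetical.

```python
def mat_vec(T, v):
    """Matrix-vector product."""
    return tuple(sum(t * x for t, x in zip(row, v)) for row in T)


def mat_mul(A, B):
    """Matrix-matrix product A * B."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def inverse_unimodular_2x2(T):
    """Inverse of a 2x2 integer matrix with determinant +1 or -1."""
    (a, b), (c, d) = T
    det = a * d - b * c
    assert det in (1, -1), "only unimodular matrices are handled"
    # For det = +1 or -1, 1/det equals det, so the inverse stays integral.
    return [[det * d, -det * b], [-det * c, det * a]]


REVERSAL = [[1, 0], [0, -1]]
PERMUTE = [[0, 1], [1, 0]]

print(mat_vec(REVERSAL, (2, 3)))          # 1. new iteration counts: (2, -3)
print(mat_vec(REVERSAL, (1, 1)))          # 2. new dependence vector: (1, -1)
print(inverse_unimodular_2x2(REVERSAL))   # 3. inverse: [[1, 0], [0, -1]]
print(mat_mul(PERMUTE, REVERSAL))         # 4. composition: [[0, -1], [1, 0]]
```

Note the order in the composition: the transformation applied first is the right-hand factor.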
C-60 Example for Transformation and Parallelization of a Loop

    for i = 0 to M
      for j = 0 to N
        a[i, j] = (a[i, j-1] + a[i-1, j]) / 2;

Parallelize the above loop.

1. Draw the iteration space.
2. Compute the dependence vectors and draw examples of them into the iteration space. Why can the inner loop not be executed in parallel?
3. Apply a skewing transformation and draw the iteration space.
4. Apply the permutation transformation and draw the iteration space. Explain why the inner loop can now be executed in parallel.
5. Compute the matrix of the composed transformation and use it to transform the dependence vectors.
6. Compute the inverse of the transformation matrix and use it to transform the index expressions.
7. Write the complete loops with new loop variables ip and jp and new loop bounds.

Exercise the method with an example. Explain the steps of the transformation. Solution on C-61. Question: are there other transformations that lead to a parallel inner loop?
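C-61 Solution of the Transformation and Parallelization Example

[Figure: iteration spaces of the original, the skewed, and the permuted loop nest, drawn for the concrete bounds 4 and 7, with dependence vectors, the transformation matrix and its inverse.]

2. A dependence in the direction of the parallel dimension is not allowed.

4. Both dependence vectors point forward in ip direction.

5. Skewing followed by permutation, applied to the dependence vectors (0, 1) and (1, 0):

\[ T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad T \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad T \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]

6. Inverse:

\[ T^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \begin{pmatrix} i \\ j \end{pmatrix} = T^{-1} \begin{pmatrix} i_p \\ j_p \end{pmatrix} = \begin{pmatrix} j_p \\ i_p - j_p \end{pmatrix} \]

7. Complete loops:

    for ip = 0 to M+N
      for jp = max(0, ip-N) to min(ip, M)
        a[jp, ip-jp] = (a[jp, ip-jp-1] + a[jp-1, ip-jp]) / 2;

Solution for C-60. Explain the bounds of the iteration spaces, the dependence vectors, the transformation matrix and its inverse, the conditions for being parallelizable, and the transformation of the index expressions. Describe the transformation steps.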
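The transformed loop nest of step 7 can be checked against the original by executing both orders on a small instance. The sketch below makes several assumptions that are not on the slide: the outer bound is M for i and N for j, elements outside the iteration space get values from a hypothetical `border` function, and the inner loop is deliberately enumerated backward to show that its iterations are independent. Reading an element of the iteration space before it has been computed raises `KeyError`.

```python
def border(i, j):
    """Assumed input values for elements outside the iteration space."""
    return float(i - 2 * j)


def run(M, N, order):
    """Execute a[i,j] = (a[i,j-1] + a[i-1,j]) / 2 for the points in order."""
    a = {}

    def val(i, j):
        inside = 0 <= i <= M and 0 <= j <= N
        return a[i, j] if inside else border(i, j)  # KeyError if read too early

    for i, j in order:
        a[i, j] = (val(i, j - 1) + val(i - 1, j)) / 2
    return a


def original_order(M, N):
    return [(i, j) for i in range(M + 1) for j in range(N + 1)]


def transformed_order(M, N):
    order = []
    for ip in range(M + N + 1):
        # Inner loop: independent iterations, here enumerated backward.
        for jp in reversed(range(max(0, ip - N), min(ip, M) + 1)):
            order.append((jp, ip - jp))     # (i, j) = T^-1 * (ip, jp)
    return order


M, N = 4, 7
same = run(M, N, original_order(M, N)) == run(M, N, transformed_order(M, N))
print(same)
```

Both executions perform the same operation on the same operands for every element, so the results can be compared for exact equality.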
C-61a Inequalities Describe Loop Bounds

The bounds of a loop nest are described by a set of linear inequalities. Each inequality separates the space into the inside and the outside of the iteration space:

\[ B \cdot \begin{pmatrix} i \\ j \end{pmatrix} \le c \]

Positive (negative) factors represent upper (lower) bounds.

[Figure: two example iteration spaces whose edges are numbered 1 to 4, each with the matrix B and the vector c of its four inequalities.]

Understand the representation of bounds. Explain the matrix notation and the graphic interpretation. There can be arbitrarily many inequalities. Exercise: give the representations of other iteration spaces.
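As an illustration of the inequality representation, the triangular iteration space 1 <= i <= N, 1 <= j <= i can be written as four inequalities. The concrete matrix `B`, the vector `c`, and the helper `inside` are assumptions chosen for this sketch; they are not the example matrices of the slide.

```python
def inside(B, c, point):
    """True if B * point <= c holds component-wise."""
    return all(sum(b * x for b, x in zip(row, point)) <= bound
               for row, bound in zip(B, c))


N = 4
B = [[-1, 0],    # -i     <= -1   lower bound of i
     [1, 0],     #  i     <=  N   upper bound of i
     [0, -1],    #     -j <= -1   lower bound of j
     [-1, 1]]    # -i + j <=  0   upper bound of j (j <= i)
c = [-1, N, -1, 0]

# All integral points of a surrounding box that satisfy the inequalities ...
points = [(i, j) for i in range(-1, N + 3) for j in range(-1, N + 3)
          if inside(B, c, (i, j))]
# ... are exactly the points enumerated by the loop nest.
loop = [(i, j) for i in range(1, N + 1) for j in range(1, i + 1)]
print(points == loop)   # True
```

In the last row the factor of j is positive and the factor of i is negative: the inequality is an upper bound for j and at the same time a lower bound for i.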
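C-61b Transformation of Loop Bounds

The inverse of a transformation matrix, $T^{-1}$, transforms a set of inequalities:

\[ B \cdot T^{-1} \cdot i' \le c \]

Example: skewing and its inverse

\[ T = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad T^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \]

New bounds: the rows of $B \cdot T^{-1}$ together with the unchanged vector c.

[Figure: skewed example iteration space with edges numbered 1 to 4 and the matrix product $B \cdot T^{-1}$ for the example.]

Understand the transformation of bounds. Explain how the inequalities are transformed. Exercise: compute further transformations of bounds.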
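The rule B * T^-1 * i' <= c can be verified on a rectangular iteration space 0 <= i <= M, 0 <= j <= N that is skewed with f = 1. The bounds, the sizes `M` and `N`, and the helper names are assumptions for this sketch.

```python
def mat_mul(A, B):
    """Matrix-matrix product A * B."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def inside(B, c, point):
    """True if B * point <= c holds component-wise."""
    return all(sum(b * x for b, x in zip(row, point)) <= bound
               for row, bound in zip(B, c))


M, N = 3, 2
B = [[-1, 0], [1, 0], [0, -1], [0, 1]]   # -i <= 0, i <= M, -j <= 0, j <= N
c = [0, M, 0, N]
T_INV = [[1, 0], [-1, 1]]                # inverse of skewing (i, j) -> (i, i+j)

B_new = mat_mul(B, T_INV)
print(B_new)   # [[-1, 0], [1, 0], [1, -1], [-1, 1]]

# The transformed inequalities describe exactly the skewed points.
image = {(i, i + j) for i in range(M + 1) for j in range(N + 1)}
described = {(p, q) for p in range(-1, M + 2) for q in range(-1, M + N + 2)
             if inside(B_new, c, (p, q))}
print(image == described)   # True
```

The last two rows of `B_new` read is - js <= 0 and -is + js <= N, i.e. the inner loop runs from is to N + is, which matches the skewed loop bounds for f = 1.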
C-62 Transformation and Parallelization

Iteration space, original and transformed: (i, j) -> (i, j-i) = (is, js)

[Figure: original and skewed iteration space; the is axis is sequential time, the js axis is parallel; processor mapping: js mod 2.]

Original:

    DECLARE B[0..N,0..N]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J]+B[I-1,J-1]
      END FOR
    END FOR

Transformed:

    DECLARE B[0..N,0..N]
    FOR IS := 1..N
      FOR JS := 1-IS..0
        B[IS,JS+IS] := B[IS-1,JS+IS]+B[IS-1,JS-1+IS]
      END FOR
    END FOR

Example for parallelization. Explain the skewing transformation. The inner loop is executed in parallel. Explain the time and processor mapping: mod 2 folds the arbitrarily large loop dimension onto a fixed number of 2 processors. Exercise: give the matrix of this transformation. Use it to compute the dependence vectors, the index expressions, and the loop bounds.
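A short Python sketch can confirm that the skewed loop nest computes the same values as the original one, and show how `js mod 2` distributes the inner iterations. The loop bounds starting at 1, the initialisation of column 0 with 1, and the function names are assumptions made for this illustration.

```python
def new_table(N):
    """Assumed initial state: B[I][0] = 1, all other elements 0."""
    B = [[0] * (N + 2) for _ in range(N + 1)]
    for I in range(N + 1):
        B[I][0] = 1
    return B


def pascal_original(N):
    B = new_table(N)
    for I in range(1, N + 1):
        for J in range(1, I + 1):
            B[I][J] = B[I - 1][J] + B[I - 1][J - 1]
    return B


def pascal_skewed(N):
    B = new_table(N)
    for IS in range(1, N + 1):              # sequential time
        for JS in range(1 - IS, 1):         # 1-IS .. 0, parallel dimension
            B[IS][JS + IS] = B[IS - 1][JS + IS] + B[IS - 1][JS - 1 + IS]
    return B


def processor_of(js, processors=2):
    """Fold the parallel dimension onto a fixed number of processors.
    Python's % yields a non-negative result also for negative js."""
    return js % processors


print(pascal_original(4)[4][:5])                   # [1, 4, 6, 4, 1]
print(pascal_original(6) == pascal_skewed(6))      # True
print([processor_of(js) for js in range(-3, 1)])   # [1, 0, 1, 0]
```

In languages where the remainder of a negative operand is negative, the processor number would need an explicit adjustment.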
C-63 Data Mapping

Goal: Distribute array elements over processors such that as many accesses as possible are local.

Index space of an array: n-dimensional space of integral index points (polytope)

- same properties as the iteration space
- same mathematical model
- the same transformations are applicable (skewing, reversal, permutation, ...)
- no restrictions by data dependences

Reuse the model of iteration spaces. Explain with examples of index spaces. Exercise: draw an index space for each of the 3 transformations.
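Because the index space of an array follows the same model as an iteration space, the same matrices can be applied to its index points. The sketch below transforms the index space of an assumed 3x3 array and reports how many cells the smallest enclosing rectangular array needs; the helper names are hypothetical.

```python
def mat_vec(T, v):
    """Matrix-vector product for small integer matrices."""
    return tuple(sum(t * x for t, x in zip(row, v)) for row in T)


def transformed_index_space(T, rows, cols):
    """Images of all index points (r, c) of a rows x cols array."""
    return [mat_vec(T, (r, c)) for r in range(rows) for c in range(cols)]


def bounding_box(points):
    """Cells of the smallest rectangular array that holds all points."""
    size = 1
    for axis in zip(*points):
        size *= max(axis) - min(axis) + 1
    return size


TRANSFORMATIONS = {
    "reversal": [[1, 0], [0, -1]],
    "skewing": [[1, 0], [1, 1]],
    "permutation": [[0, 1], [1, 0]],
}

for name, T in TRANSFORMATIONS.items():
    points = transformed_index_space(T, 3, 3)
    print(name, bounding_box(points))
```

Reversal and permutation keep the storage requirement at 9 cells, while the skewed index space needs a 3x5 box for its 9 elements.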
C-64 Data Distribution for Parallel Loops

Original:

    DECLARE B[0..N,0..N]
    FOR I := 1..N
      FOR J := 1-I..0
        B[I,J+I] := B[I-1,J+I]+B[I-1,J-1+I]
      END FOR
    END FOR

Transformed by skewing the index space, (i, j) -> (i, j-i):

    DECLARE B[0..N,-N..0]
    ...
    B[I,J] := B[I-1,J-1]+B[I-1,J]

[Figure: index space, original and transformed; processor P_j writes B[i, j+i]; data on P_j; 50% of the accesses are local before the index transformation, a higher share after skewing.]

See the effect of the index transformation. Explain local and non-local accesses. Explain the index transformation. Demonstrate the improved locality. Skewing causes unused storage. Question: how do you compute the mapping of the indices using the transformation matrix?