Loop Parallelization




Loop Parallelization (C-52)

Compilation steps:

- nested loops operating on arrays, sequential execution of the iteration space

    DECLARE B[0..N, 0..N+1]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J] + B[I-1,J-1]
      END FOR
    END FOR

- analyze data dependences; data flow: definition and use of array elements
- transform loops, keeping data dependences intact
- parallelize inner loop(s); map them onto a field or vector of processors
- map arrays onto processors such that many accesses are local; transform index spaces

Prof. Dr. Uwe Kastens, Vorlesung Übersetzer II (lecture Compilers II), SS 2 / Folie 52

Overview. Explain the application area (scientific computations), the goals (execute inner loops in parallel with efficient data access), and the transformation steps.

Iteration Space of Nested Loops (C-53)

Iteration space of n properly nested loops:

- n-dimensional space of integral points (polytope)
- each point $(i_1, \dots, i_n)$ of that space represents an execution of the innermost loop body
- loop bounds are not known before run-time
- the iteration space is not necessarily orthogonal
- the iteration space is sequentially enumerated

Example: computation of Pascal's triangle

    DECLARE B[0..N, 0..N]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J] + B[I-1,J-1]
      END FOR
    END FOR

Notion of iteration space. Use the example for explanation. Show the execution order of the iteration points. A step size greater than 1 causes unused points in the iteration space: a non-convex polytope. Draw an iteration space with step size 3 in one dimension.
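The notion of an iteration space can be made concrete with a small sketch. It assumes the loop bounds of the Pascal's triangle example are I = 1..N and J = 1..I (a reading of the damaged slide text); the function names are illustrative only.

```python
def iteration_space(n):
    """Points (i, j) of FOR I := 1..N / FOR J := 1..I, in sequential execution order."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, i + 1)]


def visited_values(lower, upper, step):
    """Loop-variable values actually visited for an inclusive range and a step size."""
    return list(range(lower, upper + 1, step))


# The space is triangular, so it is not orthogonal (not a rectangle).
points = iteration_space(3)

# With step size 3 the values in between are unused points of the iteration space.
visited = visited_values(0, 9, 3)
```

Each tuple in `points` stands for one execution of the innermost loop body; the list order is the sequential enumeration order.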

Data Dependences in Iteration Spaces (C-54)

Data dependence from iteration point $i_1$ to $i_2$:

- iteration $i_1$ computes a value that is used in iteration $i_2$ (flow dependence)
- relative dependence vector

  $$d = i_2 - i_1 = (i_{2,1} - i_{1,1}, \dots, i_{2,n} - i_{1,n})$$

  holds for all iteration points except at the border
- flow dependences cannot be directed against the execution order, i.e. they cannot point backward in time: each dependence vector must be lexicographically positive, i.e.

  $$d = (0, \dots, 0, d_k, \dots, d_n), \quad d_k > 0$$

Example: computation of Pascal's triangle

    DECLARE B[0..N, 0..N]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J] + B[I-1,J-1]
      END FOR
    END FOR

Understand dependences in loops. Explain the vector representation of dependences; show examples; show admissible directions graphically. Show different dependence vectors and array accesses in a loop body which cause such dependences.
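As a minimal sketch of the dependence-vector idea, the following assumes array accesses with constant offsets from the loop variables, as in the Pascal's triangle example. The helper names are hypothetical.

```python
def dependence_vector(write_offset, read_offset):
    """Flow dependence vector for a write A[i + w] and a read A[i + r].

    The value written in iteration i1 is read in iteration i2 with
    i1 + w = i2 + r, hence d = i2 - i1 = w - r.
    """
    return tuple(w - r for w, r in zip(write_offset, read_offset))


def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False  # the zero vector is not lexicographically positive


# B[I,J] := B[I-1,J] + B[I-1,J-1]: one write, two reads.
pascal_deps = [
    dependence_vector((0, 0), (-1, 0)),   # read of B[I-1, J]
    dependence_vector((0, 0), (-1, -1)),  # read of B[I-1, J-1]
]
```

Both vectors of the example come out lexicographically positive, which is what a valid sequential loop requires.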

Loop Transformation (C-55)

The iteration space of a loop nest is transformed onto new coordinates.

Goals:

- execute innermost loop(s) in parallel
- improve locality of data accesses; in space: storage of the executing processor; in time: reuse of values stored in cache
- systolic computation and communication scheme

Data dependences must point forward in time, i.e. they must be lexicographically positive and must not lie within parallel dimensions.

3 linear fundamental transformations:

- Reversal: flip the execution order for one dimension
- Permutation: exchange two loops of the loop nest
- Skewing: add the iteration count of an outer loop to that of an inner one

Non-linear transformations, e.g.

- Scaling: stretch the iteration space in one dimension; causes gaps
- Tiling: introduce additional inner loops that cover tiles of fixed size

Overview. Explain the goals and the admissible directions of dependences. Show diagrams for the transformations.

Reversal (C-56)

The iteration count of one loop is negated; that dimension is enumerated backward.

Transformation matrix: the identity matrix with a single $-1$ on the diagonal, at the position of the reversed loop.

2-dimensional:

$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix} = \begin{pmatrix} i_r \\ j_r \end{pmatrix}$$

Loop variables, old and new:

    original                 transformed
    for i = 0 to N           for ir = -N to 0
      for j = 0 to M           for jr = 0 to M
        ...                      ...

Understand the reversal transformation. Explain the effect of the reversal transformation. Explain the notation of the transformation matrix. There may be no dependences in the direction of the reversed loop; they would point backward after the transformation. Show an example where reversal enables loop fusion.
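The legality condition for reversal can be checked mechanically. The sketch below is an illustration rather than the lecture's own formulation: it negates one component of every dependence vector and tests whether all vectors stay lexicographically positive. All names are hypothetical.

```python
def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False


def reverse(d, k):
    """Dependence vector after reversing loop dimension k (0 = outermost)."""
    return tuple(-c if idx == k else c for idx, c in enumerate(d))


def reversal_is_legal(deps, k):
    """Reversal of dimension k is legal if no dependence points backward afterwards."""
    return all(lex_positive(reverse(d, k)) for d in deps)
```

For the dependence vectors (1,0) and (1,1), reversing the inner loop is legal because the outer loop still carries both dependences; reversing the outer loop is not.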

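Skewing (C-57)

The iteration count of an outer loop is added to the count of an inner loop; the iteration space is shifted; the execution order of the iteration points remains unchanged.

Transformation matrix: the identity matrix with an additional factor $f$ below the diagonal, in the row of the inner loop and the column of the outer loop.

2-dimensional:

$$\begin{pmatrix} 1 & 0 \\ f & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ f \cdot i + j \end{pmatrix} = \begin{pmatrix} i_s \\ j_s \end{pmatrix}$$

Loop variables, old and new:

    original                 transformed
    for i = 0 to N           for is = 0 to N
      for j = 0 to M           for js = f*is to M+f*is
        ...                      ...

Understand the skewing transformation. Explain the effect of the skewing transformation. Skewing is always applicable. Skewing can enable loop permutation. Show an example where skewing enables loop permutation.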
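That skewing leaves the execution order unchanged can be verified by enumeration. This sketch assumes a rectangular nest with bounds 0..n and 0..m (illustrative names) and the mapping (i, j) -> (i, f*i + j).

```python
def original_order(n, m):
    """Iteration points of: for i = 0 to n / for j = 0 to m."""
    return [(i, j) for i in range(n + 1) for j in range(m + 1)]


def skewed_order(n, m, f):
    """Iteration points of: for is = 0 to n / for js = f*is to m + f*is."""
    return [(i_s, j_s)
            for i_s in range(n + 1)
            for j_s in range(f * i_s, m + f * i_s + 1)]


def skew(point, f):
    """Map an original point (i, j) to the skewed point (i, f*i + j)."""
    i, j = point
    return (i, f * i + j)
```

Mapping the original enumeration point by point yields exactly the enumeration of the skewed loop nest, in the same order, for positive and negative factors alike.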

Permutation (C-58)

Two loops of the loop nest are interchanged; the iteration space is flipped; the execution order of the iteration points changes.

Transformation matrix: a permutation matrix, i.e. the identity matrix with the two rows of the interchanged loops exchanged.

2-dimensional:

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix} = \begin{pmatrix} i_p \\ j_p \end{pmatrix}$$

Loop variables, old and new:

    original                 transformed
    for i = 0 to N           for ip = 0 to M
      for j = 0 to M           for jp = 0 to N
        ...                      ...

Understand loop permutation. Explain the effect of loop permutation. Permutation often yields a parallelizable innermost loop. Show an example where permutation yields a parallelizable innermost loop.
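A small sketch of how permutation can yield a parallelizable innermost loop, restricted to 2-dimensional nests. It uses the criterion that the inner loop may run in parallel when every dependence is carried by the outer loop; the function names are illustrative assumptions.

```python
def lex_positive(d):
    """True if the first non-zero component of d is positive."""
    for component in d:
        if component != 0:
            return component > 0
    return False


def permute(d):
    """Dependence vector after interchanging the two loops of a 2-D nest."""
    return (d[1], d[0])


def permutation_is_legal(deps):
    """Interchange is legal if all permuted vectors stay lexicographically positive."""
    return all(lex_positive(permute(d)) for d in deps)


def inner_loop_parallel(deps):
    """For a 2-D nest: the inner loop is parallel if the outer loop carries every dependence."""
    return all(d[0] > 0 for d in deps)


# a[i, j] = f(a[i, j-1]) has the single dependence vector (0, 1).
deps_before = [(0, 1)]
deps_after = [permute(d) for d in deps_before]
```

Before the interchange the dependence lies in the inner dimension; afterwards the outer loop carries it. A vector such as (1, -1) would forbid the interchange.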

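Use of Transformation Matrices (C-59)

- The transformation matrix $T$ defines new iteration counts in terms of the old ones: $T \cdot i = i'$, e.g. reversal:

  $$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix} = \begin{pmatrix} i' \\ j' \end{pmatrix}$$

- The transformation matrix $T$ transforms old dependence vectors into new ones: $T \cdot d = d'$, e.g.

  $$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = \begin{pmatrix} -d_1 \\ d_2 \end{pmatrix}$$

- The inverse transformation matrix $T^{-1}$ defines old iteration counts in terms of the new ones, for the transformation of index expressions in the loop body: $T^{-1} \cdot i' = i$, e.g.

  $$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i' \\ j' \end{pmatrix} = \begin{pmatrix} -i' \\ j' \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix}$$

- Concatenation of transformations, first $T_1$ then $T_2$: $T_2 \cdot T_1 = T$, e.g.

  $$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

Learn how to use the transformation matrices. Explain the 4 uses with examples. Transform a loop completely. Why do the dependence vectors change under a transformation, although the dependence between array elements remains unchanged?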
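The four uses of a transformation matrix can be tried out with plain integer arithmetic. The sketch below composes a skewing with factor 1 and a permutation, a combination chosen here for illustration; the helper names are not from the lecture.

```python
def matvec(t, v):
    """Matrix times vector, e.g. T * i = i' or T * d = d'."""
    return tuple(sum(t[r][c] * v[c] for c in range(len(v))) for r in range(len(t)))


def matmul(a, b):
    """Matrix product a * b (apply b first, then a)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def inverse_2x2_unimodular(t):
    """Integer inverse of a 2x2 matrix with determinant +1 or -1."""
    (a, b), (c, d) = t
    det = a * d - b * c
    if det not in (1, -1):
        raise ValueError("expected determinant +1 or -1")
    # 1/det equals det for det in {+1, -1}, so no division is needed.
    return [[det * d, -det * b], [-det * c, det * a]]


SKEW = [[1, 0], [1, 1]]       # (i, j) -> (i, i + j)
PERMUTE = [[0, 1], [1, 0]]    # (i, j) -> (j, i)
T = matmul(PERMUTE, SKEW)     # first skewing, then permutation
T_INV = inverse_2x2_unimodular(T)
```

`matvec(T, d)` gives the new dependence vector for an old one, and `matvec(T_INV, (ip, jp))` gives the old loop variables that replace `i` and `j` in the index expressions of the loop body.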

Example for Transformation and Parallelization of a Loop (C-60)

    for i = 0 to N
      for j = 0 to M
        a[i, j] = (a[i, j-1] + a[i-1, j]) / 2;

Parallelize the above loop.

1. Draw the iteration space.
2. Compute the dependence vectors and draw examples of them into the iteration space. Why can the inner loop not be executed in parallel?
3. Apply a skewing transformation and draw the iteration space.
4. Apply the permutation transformation and draw the iteration space. Explain why the inner loop can now be executed in parallel.
5. Compute the matrix of the composed transformation and use it to transform the dependence vectors.
6. Compute the inverse of the transformation matrix and use it to transform the index expressions.
7. Write the complete loops with new loop variables ip and jp and new loop bounds.

Exercise the method with an example. Explain the steps of the transformation. Solution on C-61. Are there other transformations that lead to a parallel inner loop?

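Solution of the Transformation and Parallelization Example (C-61)

[Figure: original iteration space and transformed iteration space with example dependence vectors]

Composed transformation (skewing, then permutation) and its inverse:

$$\begin{pmatrix} i_p \\ j_p \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i + j \\ i \end{pmatrix}, \qquad \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} i_p \\ j_p \end{pmatrix} = \begin{pmatrix} j_p \\ i_p - j_p \end{pmatrix}$$

2. A dependence in the direction of the parallel dimension is not allowed.

4. Both dependence vectors point forward in the ip direction.

7. The complete transformed loops:

    for ip = 0 to M+N
      for jp = max(0, ip-M) to min(ip, N)
        a[jp, ip-jp] = (a[jp, ip-jp-1] + a[jp-1, ip-jp]) / 2;

Solution for C-60. Explain the bounds of the iteration spaces, the dependence vectors, the transformation matrix and its inverse, the conditions for being parallelizable, and the transformation of the index expressions. Describe the transformation steps.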
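The transformed loop nest can be checked against the original one by running both. The following sketch assumes original bounds i = 0..N and j = 0..M and adds a border row and column so that the accesses a[i-1, j] and a[i, j-1] stay inside the array; the border and the initial values are illustrative choices, not part of the slides.

```python
def make_grid(n, m):
    """(n+2) x (m+2) grid; logical index (i, j) is stored at [i + 1][j + 1]."""
    return [[float(r + 2 * c) for c in range(m + 2)] for r in range(n + 2)]


def update(a, i, j):
    """a[i, j] = (a[i, j-1] + a[i-1, j]) / 2 in logical coordinates."""
    a[i + 1][j + 1] = (a[i + 1][j] + a[i][j + 1]) / 2


def run_original(n, m):
    a = make_grid(n, m)
    for i in range(n + 1):
        for j in range(m + 1):
            update(a, i, j)
    return a


def run_transformed(n, m, reverse_inner=False):
    """Loops over ip = i + j (sequential) and jp = i (parallelizable)."""
    a = make_grid(n, m)
    for ip in range(n + m + 1):
        inner = range(max(0, ip - m), min(ip, n) + 1)
        # Any order of the inner iterations must give the same result.
        for jp in (reversed(inner) if reverse_inner else inner):
            update(a, jp, ip - jp)
    return a
```

Running the inner loop in reversed order stands in for parallel execution: all points with the same ip only read values computed for ip - 1, so their order does not matter.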

Inequalities Describe Loop Bounds (C-61a)

The bounds of a loop nest are described by a set of linear inequalities. Each inequality separates the space into the inside and the outside of the iteration space:

$$B \cdot i \le c$$

Positive (negative) factors represent upper (lower) bounds.

[Figure: example iteration space bounded by four numbered inequalities, with the corresponding matrix B and vector c]

Understand the representation of bounds. Explain the matrix notation. Explain the graphic interpretation. There can be arbitrarily many inequalities. Give the representations of other iteration spaces.
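As an illustration of the inequality form, the triangular iteration space 1 <= i <= n, 1 <= j <= i can be written as four rows of B and c. This example system is an assumption made here; the entries of the slide's own example are not recoverable from the text.

```python
def satisfies(b_matrix, c_vector, point):
    """True if B * point <= c holds row by row."""
    return all(sum(f * x for f, x in zip(row, point)) <= bound
               for row, bound in zip(b_matrix, c_vector))


def triangle_bounds(n):
    """B and c for 1 <= i <= n and 1 <= j <= i."""
    b_matrix = [[-1, 0],   # -i     <= -1  (lower bound of i)
                [1, 0],    #  i     <=  n  (upper bound of i)
                [0, -1],   # -j     <= -1  (lower bound of j)
                [-1, 1]]   # -i + j <=  0  (upper bound of j)
    c_vector = [-1, n, -1, 0]
    return b_matrix, c_vector


def points_inside(b_matrix, c_vector, lo, hi):
    """All integral points of the box [lo, hi] x [lo, hi] that satisfy B * i <= c."""
    return [(i, j) for i in range(lo, hi + 1) for j in range(lo, hi + 1)
            if satisfies(b_matrix, c_vector, (i, j))]
```

The rows with a negative factor for a variable bound it from below, the rows with a positive factor bound it from above.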

Transformation of Loop Bounds (C-61b)

The inverse of a transformation matrix, $T^{-1}$, transforms a set of inequalities:

$$B \cdot T^{-1} \cdot i' \le c$$

[Figure: example for skewing, showing the inverse matrix, the product $B \cdot T^{-1}$ that yields the new bounds, and the transformed iteration space with four numbered inequalities]

Understand the transformation of bounds. Explain how the inequalities are transformed. Compute further transformations of bounds.
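A worked instance of the bounds transformation, under the assumption of a rectangular nest 0 <= i <= n, 0 <= j <= m and a skewing with factor 1. The matrices and names below are chosen for illustration and are not taken from the slide's example.

```python
def matmul(a, b):
    """Matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def satisfies(b_matrix, c_vector, point):
    """True if B * point <= c holds row by row."""
    return all(sum(f * x for f, x in zip(row, point)) <= bound
               for row, bound in zip(b_matrix, c_vector))


B = [[-1, 0], [1, 0], [0, -1], [0, 1]]   # -i <= 0, i <= n, -j <= 0, j <= m
SKEW_INV = [[1, 0], [-1, 1]]             # inverse of (i, j) -> (i, i + j)
B_NEW = matmul(B, SKEW_INV)              # B * T^-1, to be used with the same c


def check(n, m):
    """Compare the skewed original points with the points satisfying the new bounds."""
    c = [0, n, 0, m]
    skewed = {(i, i + j) for i in range(n + 1) for j in range(m + 1)}
    box = range(-1, n + m + 2)
    inside = {(p, q) for p in box for q in box if satisfies(B_NEW, c, (p, q))}
    return skewed == inside
```

The rows of `B_NEW` read as -is <= 0, is <= n, is - js <= 0 and -is + js <= m, which gives the loop bounds js = is to m + is of the skewed nest.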

Transformation and Parallelization (C-62)

Iteration space, original and transformed: (i, j) -> (i, j-i) = (is, js)

- sequential: time is
- parallel: processor mapping js mod 2

original:

    DECLARE B[0..N, 0..N]
    FOR I := 1..N
      FOR J := 1..I
        B[I,J] := B[I-1,J] + B[I-1,J-1]
      END FOR
    END FOR

transformed:

    DECLARE B[0..N, 0..N]
    FOR IS := 1..N
      FOR JS := 1-IS..0
        B[IS,JS+IS] := B[IS-1,JS+IS] + B[IS-1,JS-1+IS]
      END FOR
    END FOR

Example for parallelization. Explain the skewing transformation. The inner loop runs in parallel. Explain the time and processor mapping: mod 2 folds the arbitrarily large loop dimension onto a fixed number of 2 processors. Give the matrix of this transformation. Use it to compute the dependence vectors, the index expressions, and the loop bounds.
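The skewed Pascal's triangle loop and the folding of the parallel dimension onto 2 processors can be simulated sequentially. This sketch assumes the bounds I = 1..N, J = 1..I, the initial value B[0,0] = 1 with all other elements 0, and models each processor as one pass over the inner loop; these choices are illustrative.

```python
def pascal_original(n):
    b = [[0] * (n + 1) for _ in range(n + 1)]
    b[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            b[i][j] = b[i - 1][j] + b[i - 1][j - 1]
    return b


def pascal_skewed(n, processors=2):
    """Skewed nest (is, js) = (i, j - i); js is mapped to processor js mod processors."""
    b = [[0] * (n + 1) for _ in range(n + 1)]
    b[0][0] = 1
    for i_s in range(1, n + 1):                # sequential: time step is
        for p in range(processors):            # one pass per simulated processor
            for j_s in range(1 - i_s, 1):      # js = 1-is .. 0
                if j_s % processors == p:      # Python's % is non-negative here
                    b[i_s][j_s + i_s] = (b[i_s - 1][j_s + i_s]
                                         + b[i_s - 1][j_s - 1 + i_s])
    return b
```

Within one time step the passes only read row is - 1 and write row is, so the order in which the simulated processors run does not affect the result.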

Data Mapping (C-63)

Goal: distribute array elements over processors such that as many accesses as possible are local.

Index space of an array:

- n-dimensional space of integral index points (polytope)
- same properties as the iteration space
- same mathematical model
- same transformations are applicable (Skewing, Reversal, Permutation, ...)
- no restrictions by data dependences

Reuse the model of iteration spaces. Explain with examples of index spaces. Draw an index space for each of the 3 transformations.

Data Distribution for Parallel Loops (C-64)

original:

    DECLARE B[0..N, 0..N]
    FOR I := 1..N
      FOR J := 1-I..0
        B[I,J+I] := B[I-1,J+I] + B[I-1,J-1+I]
      END FOR
    END FOR

Index space: processor P_j writes B[i,j+i]; data on P_j: 50% local.

transformed, skewing (i, j) -> (i, j-i):

    DECLARE B[0..N, -N..0]
    ...
    B[I,J] := B[I-1,J-1] + B[I-1,J]

Index space: 100% local.

See the effect of the index transformation. Explain local and non-local accesses. Explain the index transformation. Demonstrate improved locality. Skewing causes unused storage. How do you compute the mapping of the indices using the transformation matrix?