Lecture 10 Partial Redundancy Elimination




15-745: Partial Redundancy Elim.

Global code motion optimization
  Remove partially redundant expressions
  Loop invariant code motion
  Can be extended to do strength reduction
No loop analysis needed
Bidirectional flow problem

References
1. E. Morel and C. Renvoise, Global Optimization by Suppression of Partial Redundancies, CACM 22 (2), Feb. 1979, pp. 96-103.
2. Knoop, Rüthing, Steffen, Lazy Code Motion, PLDI 92.
3. F. Chow, A Portable Machine-Independent Global Optimizer: Design and Measurements, Stanford CSL memo 83-254.
4. Dhamdhere, Rosen, Zadeck, How to Analyze Large Programs Efficiently and Informatively, PLDI 92.
5. K. Drechsler, M. Stadel, A Solution to a Problem with Morel and Renvoise's Global Optimization by Suppression of Partial Redundancies, ACM TOPLAS 10 (4), Oct. 1988, pp. 635-640.
6. D. Dhamdhere, Practical Adaptation of the Global Optimization Algorithm of Morel and Renvoise, ACM TOPLAS 13 (2), April 1991.
7. D. Dhamdhere, A Fast Algorithm for Code Movement Optimisation, SIGPLAN Not. 23 (10), 1988, pp. 172-180.
8. S. Joshi, D. Dhamdhere, A composite hoisting-strength reduction transformation for global program optimisation, International Journal of Computer Mathematics, 11 (1982), pp. 21-41, 111-126.

Redundancy
A common subexpression is a redundant computation.
[Figure: flow graph containing t3 = a + b.]
An occurrence of expression E at P is redundant if E is available there: E is evaluated along every path to P, with no operands redefined since.
A redundant expression can be eliminated.

Partial Redundancy
Partially redundant computation.
[Figure: flow graph containing t3 = a + b.]
An occurrence of expression E at P is partially redundant if E is partially available there: E is evaluated along at least one path to P, with no operands redefined since.
A partially redundant expression can be eliminated if we can insert computations to make it fully redundant.
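The two definitions above differ only in "every path" versus "at least one path". A minimal Python sketch of that distinction follows. It is an illustration, not part of the lecture: the names `acyclic_paths`, `available_after` and `classify`, the diamond-shaped flow graph, and the simplification that each block either computes E, kills an operand of E, or does neither are all assumptions. It enumerates only cycle-free paths, so it is a toy check rather than a real analysis.

```python
# Hypothetical model: a block maps to "compute" (evaluates E), "kill"
# (redefines an operand of E), or is absent from the map (does neither).

def acyclic_paths(succ, node, target, prefix=()):
    """Yield each cycle-free path from node to target, as the blocks before target."""
    if node == target:
        yield prefix
        return
    for nxt in succ.get(node, ()):
        if nxt != node and nxt not in prefix:
            yield from acyclic_paths(succ, nxt, target, prefix + (node,))

def available_after(path, effect):
    """True if E was evaluated on the path with no operand redefined since."""
    avail = False
    for block in path:
        if effect.get(block) == "kill":
            avail = False
        elif effect.get(block) == "compute":
            avail = True
    return avail

def classify(succ, effect, entry, p):
    """Classify an occurrence of E at block p by checking every path to p."""
    flags = [available_after(path, effect) for path in acyclic_paths(succ, entry, p)]
    if flags and all(flags):
        return "redundant"
    if any(flags):
        return "partially redundant"
    return "not redundant"

# Diamond: entry branches to B1 and B2, which both flow into B3.
succ = {"entry": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"]}
one_arm = {"B1": "compute"}
both_arms = {"B1": "compute", "B2": "compute"}
print(classify(succ, one_arm, "entry", "B3"))
print(classify(succ, both_arms, "entry", "B3"))
```

With E computed on one arm only, the occurrence in B3 is partially redundant; inserting a computation on the other arm makes it fully redundant, which is the transformation the lecture goes on to describe.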

Loop Invariants are Partial Redundancies
A loop-invariant expression is partially redundant.
As before, a partially redundant computation can be eliminated if we insert computations to make it fully redundant.
Remaining copies can be eliminated through copy propagation or a more complex analysis of partially redundant assignments.

Partial Redundancy Elimination
The method:
1. Insert computations to make partially redundant expression(s) fully redundant.
2. Eliminate redundant expression(s).
Issues (outline of lecture):
1. What expression occurrences are candidates for elimination?
2. Where can we safely insert computations?
3. Where do we want to insert them?
For this lecture, we assume one expression of interest, a + b. In practice, with some restrictions, many expressions can be handled in parallel.

Which Occurrences Might Be Eliminated?
In CSE, E is available at P if it is previously evaluated along every path to P, with no subsequent redefinitions of operands. If so, we can eliminate the computation at P.
In PRE, E is partially available at P if it is previously evaluated along at least one path to P, with no subsequent redefinitions of operands. If so, we might be able to eliminate the computation at P, if we can insert computations to make it fully redundant.
Occurrences of E where E is partially available are candidates for elimination.

Finding Partially Available Expressions
Forward flow problem.
Lattice = {0, 1}, meet is union ($\cup$), top = 0 (not PAVAIL), entry = 0.
\[ \mathrm{PAVOUT}[i] = (\mathrm{PAVIN}[i] - \mathrm{KILL}[i]) \cup \mathrm{AVLOC}[i] \]
\[ \mathrm{PAVIN}[i] = \begin{cases} 0 & i = \text{entry} \\ \bigcup_{p \in \mathrm{preds}(i)} \mathrm{PAVOUT}[p] & \text{otherwise} \end{cases} \]
For a block:
  The expression is locally available (AVLOC) if it is downwards exposed.
  The expression is killed (KILL) if the block contains any assignment to its operands.
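The partial availability equations can be solved by simple iteration from the initial value 0. The sketch below is one possible Python rendering under stated assumptions: the function name `partial_availability`, the dictionary-based block representation, and the three-block flow graph (B1, then a self-looping B2 that computes the expression, then B3) are hypothetical and are not taken from the lecture's figures.

```python
def partial_availability(blocks, preds, avloc, kill, entry):
    """Iterate PAVIN/PAVOUT for one expression until nothing changes.

    Values start at 0 (False), the top element for a union-meet problem.
    """
    pavin = {b: False for b in blocks}
    pavout = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # PAVIN[i] = 0 at entry, else the union over predecessors of PAVOUT[p].
            new_in = b != entry and any(pavout[p] for p in preds[b])
            # PAVOUT[i] = (PAVIN[i] - KILL[i]) U AVLOC[i]
            new_out = (new_in and not kill[b]) or avloc[b]
            if (new_in, new_out) != (pavin[b], pavout[b]):
                pavin[b], pavout[b] = new_in, new_out
                changed = True
    return pavin, pavout

# Hypothetical flow graph: B1 -> B2, B2 -> B2 (back edge), B2 -> B3.
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
avloc = {"B1": False, "B2": True, "B3": False}   # only B2 computes a + b
kill = {"B1": False, "B2": False, "B3": False}   # no block redefines a or b

pavin, pavout = partial_availability(blocks, preds, avloc, kill, "B1")
print(pavin)   # {'B1': False, 'B2': True, 'B3': True}
```

PAVIN is 1 at B2 only because of the back edge, which is the sense in which a loop-invariant computation is partially redundant.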

Partial Availability Example
For expression a + b.
[Figure: flow graph with a loop containing t3 = a + b; blocks are annotated with AVLOC and with PAVIN and PAVOUT values to be filled in.]
The occurrence in the loop is partially redundant.

Where Can We Insert Computations?
Safety: never introduce a new expression along any path.
  Insertion could introduce an exception or change program behavior.
  If we can add a new basic block, we can insert safely in most cases.
  Solution: insert the expression only where it is anticipated.
Performance: never increase the number of computations on any path.
  Under a simple model, this guarantees the program won't get worse.
  Reality: it might increase register lifetimes or add copies, and so lose.

Finding Anticipated Expressions
Backward flow problem.
Lattice = {0, 1}, meet is intersection ($\cap$), top = 1 (ANT), exit = 0.
\[ \mathrm{ANTIN}[i] = \mathrm{ANTLOC}[i] \cup (\mathrm{ANTOUT}[i] - \mathrm{KILL}[i]) \]
\[ \mathrm{ANTOUT}[i] = \begin{cases} 0 & i = \text{exit} \\ \bigcap_{s \in \mathrm{succ}(i)} \mathrm{ANTIN}[s] & \text{otherwise} \end{cases} \]
For a block:
  The expression is locally anticipated (ANTLOC) if it is upwards exposed.

Anticipation Example
For expression a + b.
[Figure: flow graph with blocks annotated with ANTLOC and with ANTIN and ANTOUT values to be filled in.]
The expression is anticipated at the end of the first block. A computation may be safely inserted there.
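The anticipation equations mirror partial availability but run backward with intersection as the meet, so the iteration starts from 1. The following Python sketch illustrates this. The function name `anticipation` and the three-block flow graph (B1, a self-looping B2 that computes the expression, and the exit block B3) are assumptions made for this example, and a single exit block is assumed.

```python
def anticipation(blocks, succs, antloc, kill, exit_block):
    """Iterate ANTIN/ANTOUT for one expression until nothing changes.

    Values start at 1 (True), the top element for an intersection-meet problem.
    """
    antin = {b: True for b in blocks}
    antout = {b: True for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):  # backward problem: visit later blocks first
            # ANTOUT[i] = 0 at exit, else the intersection over successors of ANTIN[s].
            new_out = b != exit_block and all(antin[s] for s in succs[b])
            # ANTIN[i] = ANTLOC[i] U (ANTOUT[i] - KILL[i])
            new_in = antloc[b] or (new_out and not kill[b])
            if (new_in, new_out) != (antin[b], antout[b]):
                antin[b], antout[b] = new_in, new_out
                changed = True
    return antin, antout

# Hypothetical flow graph: B1 -> B2, B2 -> B2 (back edge), B2 -> B3 (exit).
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
antloc = {"B1": False, "B2": True, "B3": False}  # a + b is upwards exposed in B2
kill = {"B1": False, "B2": False, "B3": False}

antin, antout = anticipation(blocks, succs, antloc, kill, "B3")
print(antout)  # {'B1': True, 'B2': False, 'B3': False}
```

ANTOUT is 1 at the end of B1 because every path leaving B1 evaluates the expression before any operand changes, so an insertion there cannot introduce a computation the program would not have performed anyway.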

Where Do We Want to Insert Computations?
Morel-Renvoise and variants: "Placement Possible".
Dataflow analysis shows where to insert:
  PPIN: placement possible at entry of block or before.
  PPOUT: placement possible at exit of block or before.
Insert at the earliest place where PP = 1.
Only place at the end of blocks; PPIN really means "placement possible or not necessary in each predecessor block".
Don't need to insert where the expression is already available.
\[ \mathrm{INSERT}[i] = \mathrm{PPOUT}[i] \cap (\lnot \mathrm{PPIN}[i] \cup \mathrm{KILL}[i]) \cap \lnot \mathrm{AVOUT}[i] \]
Remove (upwards-exposed) computations where PPIN = 1.
\[ \mathrm{DELETE}[i] = \mathrm{PPIN}[i] \cap \mathrm{ANTLOC}[i] \]

Where Do We Want to Insert? Example
[Figure: example flow graph.]

Formulating the Problem
PPOUT: we want to place at the output of this block only if we want to place at the entry of all successors.
PPIN: we want to place at the input of this block only if (all of):
  we have a local computation to place, or a placement at the end of this block which we can move up;
  we want to move the computation to the output of all predecessors where the expression is not already available (don't insert at input);
  we can gain something by placing it here (PAVIN).
Forward or backward? BOTH! The problem is bidirectional, but the lattice {0, 1} is finite, so as long as the transfer functions are monotone, it converges.

Computing Placement Possible
PPOUT: we want to place at the output of this block only if we want to place at the entry of all successors.
\[ \mathrm{PPOUT}[i] = \begin{cases} 0 & i = \text{exit} \\ \bigcap_{s \in \mathrm{succ}(i)} \mathrm{PPIN}[s] & \text{otherwise} \end{cases} \]
PPIN: we want to place at the start of this block only if (all of):
  we have a local computation to place, or a placement at the end of this block which we can move up;
  we want to move the computation to the output of all predecessors where the expression is not already available (don't insert at input);
  we gain something by moving it up (PAVIN heuristic).
\[ \mathrm{PPIN}[i] = \begin{cases} 0 & i = \text{entry} \\ \bigl[\mathrm{ANTLOC}[i] \cup (\mathrm{PPOUT}[i] - \mathrm{KILL}[i])\bigr] \cap \bigcap_{p \in \mathrm{preds}(i)} (\mathrm{PPOUT}[p] \cup \mathrm{AVOUT}[p]) \cap \mathrm{PAVIN}[i] & \text{otherwise} \end{cases} \]
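Putting the pieces together, the placement-possible system can be iterated from the top value 1 until it stabilizes, after which INSERT and DELETE fall out directly. The Python sketch below is a rough illustration under several assumptions: the function name `morel_renvoise` and the three-block flow graph are hypothetical, the boundary values are taken as PPIN[entry] = 0 and PPOUT[exit] = 0, and AVOUT is computed with the usual all-paths availability equations, which the lecture uses but does not spell out.

```python
def morel_renvoise(blocks, succs, antloc, avloc, kill, entry, exit_block):
    """Return (INSERT, DELETE) for one expression; a sketch, not a tuned solver."""
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}

    # Forward passes: AVOUT (every path, starts at 1) and PAVIN (some path, starts at 0).
    avout = {b: True for b in blocks}
    pavin = {b: False for b in blocks}
    pavout = {b: False for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            avin = b != entry and all(avout[p] for p in preds[b])
            pin = b != entry and any(pavout[p] for p in preds[b])
            new = ((avin and not kill[b]) or avloc[b],
                   pin,
                   (pin and not kill[b]) or avloc[b])
            if new != (avout[b], pavin[b], pavout[b]):
                avout[b], pavin[b], pavout[b] = new
                changed = True

    # Bidirectional pass: PPIN and PPOUT start at 1 and only ever drop to 0.
    ppin = {b: True for b in blocks}
    ppout = {b: True for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            out = b != exit_block and all(ppin[s] for s in succs[b])
            inn = (b != entry
                   and (antloc[b] or (out and not kill[b]))
                   and all(ppout[p] or avout[p] for p in preds[b])
                   and pavin[b])
            if (inn, out) != (ppin[b], ppout[b]):
                ppin[b], ppout[b] = inn, out
                changed = True

    insert = {b: ppout[b] and (not ppin[b] or kill[b]) and not avout[b] for b in blocks}
    delete = {b: ppin[b] and antloc[b] for b in blocks}
    return insert, delete

# Hypothetical flow graph: B1 -> B2, B2 -> B2 (back edge), B2 -> B3 (exit).
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
local = {"B1": False, "B2": True, "B3": False}   # B2 computes a + b
kill = {"B1": False, "B2": False, "B3": False}

insert, delete = morel_renvoise(blocks, succs, local, local, kill, "B1", "B3")
print(insert)  # {'B1': True, 'B2': False, 'B3': False}
print(delete)  # {'B1': False, 'B2': True, 'B3': False}
```

On this graph the result is the familiar loop-invariant hoist: the computation is inserted at the end of B1 and the occurrence inside the loop body B2 is deleted.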

Placement Possible Example 1
[Figure: flow graph with blocks annotated with AVLOC, AVOUT, PAVIN, PAVOUT and ANTLOC values.]

Placement Possible Example 2
[Figure: flow graph with blocks annotated with AVLOC, AVOUT, PAVIN, PAVOUT and ANTLOC values.]

Placement Possible Correctness
Convergence of analysis: transfer functions are monotone.
Safety: insert only if anticipated.
\[ \mathrm{PPIN}[i] \subseteq (\mathrm{PPOUT}[i] - \mathrm{KILL}[i]) \cup \mathrm{ANTLOC}[i] \]
\[ \mathrm{PPOUT}[i] = \begin{cases} 0 & i = \text{exit} \\ \bigcap_{s \in \mathrm{succ}(i)} \mathrm{PPIN}[s] & \text{otherwise} \end{cases} \]
\[ \mathrm{INSERT} \subseteq \mathrm{PPOUT} \subseteq \mathrm{ANTOUT} \]
so insertion is safe.
Performance: never increase the number of computations on any path.
\[ \mathrm{DELETE} = \mathrm{PPIN} \cap \mathrm{ANTLOC} \]
On every path from an INSERT, there is a DELETE.
The number of computations on a path does not increase.

Morel-Renvoise Limitations
Movement usefulness is tied to the PAVIN heuristic.
  Makes some useless moves, which might increase register lifetimes. [Figure: example flow graph.]
  Doesn't find some eliminations. [Figure: example flow graph.]
Bidirectional data flow is difficult to compute.
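The performance claim in the correctness argument (every path from an INSERT reaches a DELETE, so no path gains computations) can be checked by hand on a small case. The Python sketch below counts evaluations of the expression along concrete execution paths before and after a transformation. The helper name `computations_on_path` and the example, a loop body B2 that runs k times with an insertion at the end of B1 and a deletion in B2, are assumptions chosen for illustration.

```python
def computations_on_path(path, computes, insert=frozenset(), delete=frozenset()):
    """Count evaluations of the expression along one execution path.

    computes: blocks that originally evaluate the expression.
    insert:   blocks that gain a computation at their end.
    delete:   blocks whose original computation is removed.
    """
    count = 0
    for block in path:
        if block in computes and block not in delete:
            count += 1
        if block in insert:
            count += 1
    return count

# Hypothetical loop: B1, then B2 executed k times, then B3.
computes = {"B2"}
for k in (1, 2, 3):
    path = ["B1"] + ["B2"] * k + ["B3"]
    before = computations_on_path(path, computes)
    after = computations_on_path(path, computes, insert={"B1"}, delete={"B2"})
    print(k, before, after)
```

Every path through this loop executes B2 at least once, so the count after the transformation (always 1) never exceeds the count before it (k). This counts computations only; as the limitations above note, it says nothing about register lifetimes.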

Related Work
Don't need heuristic
  Dhamdhere, Drechsler-Stadel, and Knoop et al. use a restricted flow graph or allow edge placements.
Data flow can be separated into unidirectional passes
  Dhamdhere; Knoop et al.
Improvement still tied to accuracy of computational model
  Assumes performance depends only on the number of computations along any path.
  Ignores resource constraint issues: register allocation, etc.
  Knoop et al. give "earliest" and "latest" placement algorithms which begin to address this.
Further issues:
  more than one expression at once, strength reduction, redundant assignments, redundant stores.