Question 1. In an environment in which computer programs are freely transmitted across the Internet, porting and security issues are becoming increasingly important.

Part (a) Define at least three classes of portability and/or security problems that a program (in source or object form) imported from an external site may be subject to.

Part (b) Assume that we have the complete source for an imported program, including instructions for its configuration (e.g., a Makefile including compiler options). What kinds of compile-time analyses can be used to detect possible occurrences of the problems you defined in Part (a)?

Part (c) Instead of compile-time analyses, what kinds of run-time actions can be used to detect or prevent the classes of problems you defined in Part (a)?

Question 2. Recall that among the possible ways to represent the statements in a basic block are:

  - as a sequence of abstract-syntax trees (one for each statement)
  - as a DAG.

Give an algorithm for building a DAG representation of a basic block. (Assume that a basic block consists of a sequence of 3-address statements.) Illustrate your algorithm using the following sequence of statements (please give illustrations of some intermediate stages of your algorithm, not just the final DAG).

    a = 4 * k
    b = a
    c = b * k
    d = 4 * k
    b = b + 1

What are some advantages of the DAG representation over the sequence-of-trees representation?

Suppose the sequence of statements in the basic block includes array references (e.g., A[k] = 0 or x = A[k]). How does this complicate the process of building the DAG representation of the basic block (and what must be done to handle such array references)?
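One common way to build such a DAG is a value-numbering pass over the block. The sketch below is only an illustration under stated assumptions, not the expected exam answer: the statement encoding `(dest, op, arg1, arg2)`, the `Node` class, and the function name `build_dag` are all hypothetical, and array references are not handled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str       # operator, or "leaf" for a constant or an initial variable value
    args: tuple   # child node ids (for a leaf: the constant or variable name)
    labels: list = field(default_factory=list)  # variables currently holding this value

def build_dag(stmts):
    """stmts: list of (dest, op, arg1, arg2); op is None for a copy 'dest = arg1'."""
    nodes = []     # node id -> Node
    table = {}     # (op, args) -> node id; finds common subexpressions
    current = {}   # variable name -> id of the node holding its current value

    def node_for(key):
        if key not in table:
            table[key] = len(nodes)
            nodes.append(Node(key[0], key[1]))
        return table[key]

    def operand(x):
        # A variable defined earlier in the block maps to its current node;
        # anything else is a constant or an initial value and gets a leaf.
        if x in current:
            return current[x]
        return node_for(("leaf", (x,)))

    for dest, op, a, b in stmts:
        if op is None:
            n = operand(a)                      # copy: share the source node
        else:
            n = node_for((op, (operand(a), operand(b))))
        if dest in current:                     # dest stops labelling its old node
            nodes[current[dest]].labels.remove(dest)
        current[dest] = n
        nodes[n].labels.append(dest)
    return nodes, current

block = [
    ("a", "*", 4, "k"),
    ("b", None, "a", None),
    ("c", "*", "b", "k"),
    ("d", "*", 4, "k"),
    ("b", "+", "b", 1),
]
nodes, current = build_dag(block)
```

On the five statements from the question, `a` and `d` end up as labels on the same `4 * k` node, while `b` moves from that node to the new `+` node after the last statement.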

Question 3. In this question, we explore how a program optimizer might take into account information about whether the condition of an if-then-else statement or a while loop is always true or always false. We consider only a limited version of the problem, which has the following features:

  - Programs are assumed to consist of a single procedure.
  - The program is assumed to have no aliasing.
  - The analysis only needs to track state changes for certain kinds of assignment statements:
      - assignments of constant values, i.e., statements of the form x = c, where c is a constant,
      - copy statements, i.e., statements of the form x = y, where y is another program variable.
    The analysis should assume that nothing is known about the value of x after other kinds of statements that assign to x (e.g., x = y + z).
  - All expressions in conditions are of the form x == 0, where x is a program variable.

Assume that we have an analysis that determines (safely), for each condition with an expression of the form x == 0, whether x is always 0, always non-0, or of unknown value. One approach to optimizing the program would be to iterate between phases of analysis and transformation:

    [1] Analyze the program (to discover information about the values of variables used in conditions)
    [2] while there are conditions with statically determinable values do
    [3]     Remove the non-executable branches and their controlling conditions
    [4]     Analyze the program (to discover information about the values of variables used in conditions)
    [5] od

For Parts (a) and (b), assume that the analysis algorithm used in steps [1] and [4] does not interpret conditions. (Step [2] interprets conditions with respect to the information gathered in steps [1] and [4]; however, this is done as a separate stage in between invocations of the analysis algorithm proper.)

Part (a) Give an example program for which more than one iteration is necessary to produce the best results. Explain what code is removed on each iteration, and why.

Part (b) Give an example program in which there is a condition with the following properties: (i) the condition depends only on assignments from constants and copies, (ii) the condition is always true on any actual execution, yet (iii) the iterative algorithm given above would not detect this fact. [Hint: Consider while loops.]

Part (c) Part (b) suggests that we need an analysis that accounts for the values of conditions as part of the analysis. Define a dataflow analysis (for statically determining the values of conditions of the form x == 0) that incorporates the notion of not propagating information down a branch until there is evidence that the branch will be taken. Recall that a dataflow analysis can be defined by specifying:

  - A lattice of dataflow values, together with the lattice's meet operation.
  - A dataflow function for conditions and for each kind of statement. (Each function maps the dataflow value that characterizes the state before the statement/predicate executes to the dataflow value that characterizes the state after it executes.)

[Hint: Put the dataflow functions on the edges of the control-flow graph. The function on a condition's outgoing true edge need not be the same as the function on the condition's outgoing false edge.]

Part (d) Illustrate your answer to Part (c) using your example from Part (b).
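To make the hint about edge functions concrete, here is a minimal sketch of one possible encoding, offered as an assumption rather than as the intended solution. All names (`ZERO`, `NONZERO`, `UNKNOWN`, `UNREACHED`, `merge`, `branch`, and the `assign_*` helpers) are hypothetical, and every reachable state is assumed to be a dictionary over the same set of variables.

```python
ZERO, NONZERO, UNKNOWN = "zero", "nonzero", "unknown"
UNREACHED = None   # no evidence yet that control reaches this point

def merge(s1, s2):
    """Combine the states arriving on two incoming edges."""
    if s1 is UNREACHED:
        return s2
    if s2 is UNREACHED:
        return s1
    return {v: s1[v] if s1[v] == s2[v] else UNKNOWN for v in s1}

def assign_const(state, x, c):        # x = c
    if state is UNREACHED:
        return UNREACHED
    return {**state, x: ZERO if c == 0 else NONZERO}

def assign_copy(state, x, y):         # x = y
    if state is UNREACHED:
        return UNREACHED
    return {**state, x: state[y]}

def assign_other(state, x):           # e.g. x = y + z: nothing is known about x
    if state is UNREACHED:
        return UNREACHED
    return {**state, x: UNKNOWN}

def branch(state, x, taken):
    """Edge function for the condition x == 0; taken=True is the true edge."""
    if state is UNREACHED:
        return UNREACHED
    impossible = NONZERO if taken else ZERO
    if state[x] == impossible:
        return UNREACHED              # this edge can never be followed
    return {**state, x: ZERO if taken else NONZERO}

# Loop "x = 0; while (x == 0) { y = x }": iterate the loop head to a fixed point.
entry = assign_const({"x": UNKNOWN, "y": UNKNOWN}, "x", 0)
back = UNREACHED
while True:
    head = merge(entry, back)
    new_back = assign_copy(branch(head, "x", True), "y", "x")
    if new_back == back:
        break
    back = new_back
exit_state = branch(head, "x", False)   # the loop's exit edge
```

In this small loop the exit edge stays `UNREACHED`, because the false-edge function refuses to propagate a state in which `x` is known to be zero.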

Question 4. Consider a generalized kind of constant propagation that determines, for each program point and for each variable, whether the variable is N-limited; that is, whether it contains one of a set of up to N values. (Normal constant propagation determines whether a variable is 1-limited.)

Define a dataflow framework that determines which variables at which points are N-limited, for a fixed N. Assume the usual simple imperative programming language (a program is a single procedure, there is no aliasing, etc.).

How can the knowledge that a variable is N-limited at a particular point be used by an optimizing compiler?

What are the advantages and disadvantages of this dataflow problem compared with normal constant propagation?
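One way to picture the dataflow values is as sets of at most N constants, with a distinguished value for "not known to be N-limited". The sketch below rests on that assumption; the names `TOP`, `join`, and `add` are hypothetical, and only the combine operation and one transfer function (for addition) are shown.

```python
TOP = None   # hypothetical marker: the variable is not known to be N-limited

def join(a, b, n):
    """Combine the value sets arriving at a control-flow merge point."""
    if a is TOP or b is TOP:
        return TOP
    union = a | b
    return union if len(union) <= n else TOP

def add(a, b, n):
    """Abstract value of y + z, given the value sets of y and z."""
    if a is TOP or b is TOP:
        return TOP
    sums = frozenset(x + y for x in a for y in b)
    return sums if len(sums) <= n else TOP

two_values = join(frozenset({1}), frozenset({2}), 2)      # still 2-limited
too_many = join(frozenset({1, 2}), frozenset({3}), 2)     # exceeds N = 2
shifted = add(frozenset({1, 2}), frozenset({10}), 2)      # {11, 12}
```

With N = 1 the same functions degenerate to the usual constant-propagation behavior: two different constants meeting at a merge point give `TOP`.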

Question 5. Consider the simple imperative language defined below.

    program → cmd
    cmd     → Id := intexp
            | repeat cmd until boolexp
            | cmd ; cmd
            | switch ( intexp ) cases
    cases   → case intexp : cmd
            | case intexp : cmd ; cases
    intexp  → IntLit | Id | intexp + intexp
    boolexp → BoolLit | intexp == intexp

That is, a program is a command, and a command is an assignment, a repeat-loop, a command followed by another command, or a switch, and commands contain simple integer and boolean expressions.

A partially defined denotational semantics for this language is given below. A State is a mapping from identifiers to values; the initial state σ0 is the state that maps all identifiers to zero. The meaning functions I and B, used to define the meanings of integer and boolean literals, simply return the values of their arguments. The function update, used to define the meaning of the assignment command, takes three parameters: a state σ, an identifier x, and an integer value v, and returns a state that is the same as σ except that it maps x to v.

Meaning Functions

    P:  Command → State
    C:  Command → State → State
    IE: IntExpression → State → Integer
    BE: BoolExpression → State → Bool

    P[[ C ]] = C[[ C ]] σ0
    C[[ Id := intexp ]] = λσ. update(σ, Id, IE[[ intexp ]] σ)
    C[[ C1 ; C2 ]] = λσ. C[[ C2 ]] (C[[ C1 ]] (σ))
    IE[[ IntLit ]] = λσ. I[[ IntLit ]]
    IE[[ Id ]] = λσ. σ(Id)
    IE[[ intexp1 + intexp2 ]] = λσ. (IE[[ intexp1 ]] σ) + (IE[[ intexp2 ]] σ)
    BE[[ BoolLit ]] = λσ. B[[ BoolLit ]]
    BE[[ intexp1 == intexp2 ]] = λσ. (IE[[ intexp1 ]] σ) == (IE[[ intexp2 ]] σ)

You are to supply the definitions of the meaning functions for the repeat loop:

    C[[ repeat C until boolexp ]]

and the switch:

    C[[ switch intexp cases ]]

In writing these definitions you may use the fix operator (which returns the least fixed point of its functional argument), as well as the usual functional constructs (e.g., let, if-then-else). If you need to change any of the types of the meaning functions (P, C, IE, or BE), be sure to write down the new types; if you need to add a new kind of meaning function, write its type, too.
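The meaning functions that the question already gives can be transcribed almost directly into a functional style. The following sketch covers only those given clauses and deliberately leaves repeat and switch undefined; the tuple-based syntax-tree encoding (`("lit", n)`, `("id", x)`, `(":=", x, e)`, and so on) is an assumption made for illustration, as is representing σ0 by an empty dictionary whose missing keys read as zero.

```python
def update(sigma, x, v):
    """Return a state equal to sigma except that it maps x to v."""
    new_sigma = dict(sigma)
    new_sigma[x] = v
    return new_sigma

def IE(e):
    """IE: IntExpression -> State -> Integer"""
    if e[0] == "lit":
        return lambda sigma: e[1]
    if e[0] == "id":
        return lambda sigma: sigma.get(e[1], 0)   # unset identifiers map to zero
    if e[0] == "+":
        return lambda sigma: IE(e[1])(sigma) + IE(e[2])(sigma)
    raise ValueError(f"unknown integer expression: {e!r}")

def BE(b):
    """BE: BoolExpression -> State -> Bool"""
    if b[0] == "boollit":
        return lambda sigma: b[1]
    if b[0] == "==":
        return lambda sigma: IE(b[1])(sigma) == IE(b[2])(sigma)
    raise ValueError(f"unknown boolean expression: {b!r}")

def C(c):
    """C: Command -> State -> State (assignment and sequencing only)"""
    if c[0] == ":=":
        return lambda sigma: update(sigma, c[1], IE(c[2])(sigma))
    if c[0] == ";":
        return lambda sigma: C(c[2])(C(c[1])(sigma))
    raise NotImplementedError("repeat and switch are what the question asks for")

def P(c):
    """P: Command -> State, running the command from the initial state."""
    return C(c)({})

# x := 2 ; y := x + z
prog = (";", (":=", "x", ("lit", 2)),
             (":=", "y", ("+", ("id", "x"), ("id", "z"))))
final_state = P(prog)
```

Each Python function returns a closure over the state, mirroring the `λσ.` in the corresponding clause.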

Question 6. Consider a DFA (deterministic finite automaton) that accepts the set of tokens of a programming language. For purposes of this question, it is convenient to think of the DFA's transition function δ as defining a labeled directed graph (or state-transition diagram) in the usual way: the nodes are the states Q; each transition δ(q, a) = q' corresponds to an edge q --a--> q', labeled with a. In addition, however, it is convenient to assume that the graph is augmented with an explicit failure node, q_fail, which represents a new non-final state, and that the graph is normalized as follows:

  (i)   Nodes (states) from which there is no path to a final-state node are said to be useless. All useless nodes are condensed to q_fail. That is, if node m is useless, edges of the form m --a--> q and q --b--> m are replaced by edges of the form q_fail --a--> q and q --b--> q_fail, respectively. (Some of these edges may be removed by normalization step (iii) below.)
  (ii)  The graph is made into a total representation of δ: an edge of the form q --c--> q_fail is added to the graph for each undefined transition δ(q, c).
  (iii) q_fail is made into a sink node: all edges of the form q_fail --a--> q are removed from the graph.

Let us call a state s an unbounded state iff

  - it is a non-accepting state, and
  - there are an infinite number of paths that start from s and do not include a final state.

That is, the DFA contains unbounded states iff there are arbitrarily long sequences of characters that are prefixes of valid tokens, without themselves being valid tokens. (Note that q_fail is not an unbounded state.)

Part (a) Give a regular expression that defines a token that might reasonably be part of some programming language and for which the DFA has an unbounded state. Show the DFA for the token, and indicate which state or states are unbounded.

Part (b) For this part, either

  (i)  Give an algorithm to identify the set of unbounded states of a DFA, or
  (ii) Explain how to define a collection of equations that identify the set of unbounded states of a DFA. Can a set of equations as you have defined them have more than one solution? If so, explain how you would go about solving the equations to ensure that the final solution obtained identifies exactly the unbounded states. If not, explain why they have a unique solution.

(Whichever approach you choose, you should address the general case, not just your example from Part (a).)

Part (c) For most programming languages, the DFA for the language's tokens never has a path in it from a final state to an unbounded state. Why would it be a bad thing if there were such a path?
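Normalization step (i) depends on knowing which states are useless, and that set can be found by a backward reachability search from the final states. The sketch below illustrates only that step, not the unbounded-state computation the question asks for; the function name `useless_states` and the dictionary encoding of a possibly partial δ are assumptions.

```python
def useless_states(states, delta, finals):
    """Return the states from which no final state is reachable.

    delta maps (state, char) -> state and may be partial.
    """
    # Invert the transition graph so we can walk edges backwards.
    preds = {q: set() for q in states}
    for (q, _char), r in delta.items():
        preds[r].add(q)

    # Every state that can reach a final state is useful.
    useful = set(finals)
    worklist = list(finals)
    while worklist:
        r = worklist.pop()
        for q in preds[r]:
            if q not in useful:
                useful.add(q)
                worklist.append(q)
    return set(states) - useful

# Hypothetical three-state example: state 2 loops on 'b' and never accepts.
example_delta = {(0, "a"): 1, (0, "b"): 2, (2, "b"): 2}
dead = useless_states({0, 1, 2}, example_delta, {1})
```

Here state 2 is the only useless state, so normalization would condense it into q_fail.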