Intermediate Representation

Similar documents
2) Write in detail the issues in the design of code generator.

Intermediate Code. Intermediate Code Generation

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Semantic Analysis: Types and Type Checking

CA Compiler Construction

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

C Compiler Targeting the Java Virtual Machine

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Compilers I - Chapter 4: Generating Better Code

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Lecture 9. Semantic Analysis Scoping and Symbol Table

X86-64 Architecture Guide

Optimizations. Optimization Safety. Optimization Safety. Control Flow Graphs. Code transformations to improve program

PROBLEMS (Cap. 4 - Istruzioni macchina)

Using Eclipse CDT/PTP for Static Analysis

Instruction Set Architecture (ISA)

Compiler Construction

Chapter 5 Names, Bindings, Type Checking, and Scopes

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)

Language Processing Systems

Glossary of Object Oriented Terms

Technical paper review. Program visualization and explanation for novice C programmers by Matthew Heinsen Egan and Chris McDonald.

Stack Allocation. Run-Time Data Structures. Static Structures

Semester Review. CSC 301, Fall 2015

CSCI 3136 Principles of Programming Languages

Chapter 7D The Java Virtual Machine

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

Where we are CS 4120 Introduction to Compilers Abstract Assembly Instruction selection mov e1 , e2 jmp e cmp e1 , e2 [jne je jgt ] l push e1 call e

Symbol Tables. Introduction

Syntaktická analýza. Ján Šturc. Zima 208

The previous chapter provided a definition of the semantics of a programming

Organization of Programming Languages CS320/520N. Lecture 05. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Obfuscation: know your enemy

Stacks. Linear data structures

AUTOMATED TEST GENERATION FOR SOFTWARE COMPONENTS

TEACHING COMPUTER PROGRAMMING WITH PROGRAM ANIMATION

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

1 The Java Virtual Machine

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Recovering Business Rules from Legacy Source Code for System Modernization

Optimizing compilers. CS Modern Compilers: Theory and Practise. Optimization. Compiler structure. Overview of different optimizations

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Programming Languages

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

COMP 356 Programming Language Structures Notes for Chapter 10 of Concepts of Programming Languages Implementing Subprograms.

Habanero Extreme Scale Software Research Project

Data Structure with C

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

OAMulator. Online One Address Machine emulator and OAMPL compiler.

Moving from CS 61A Scheme to CS 61B Java

Optimal Stack Slot Assignment in GCC

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Frances: A Tool For Understanding Code Generation

Computer Systems Architecture

DATA STRUCTURES USING C

1 Classical Universal Computer 3

Programming Language Pragmatics

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Instruction Set Design

- Applet java appaiono di frequente nelle pagine web - Come funziona l'interprete contenuto in ogni browser di un certo livello? - Per approfondire

How To Trace

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2).

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Data Structure [Question Bank]

Run$me Query Op$miza$on

LaTTe: An Open-Source Java VM Just-in Compiler

Programming Languages

The Java Virtual Machine (JVM) Pat Morin COMP 3002

Instruction Set Architecture

Intel Assembler. Project administration. Non-standard project. Project administration: Repository

Static Single Assignment Form. Masataka Sassa (Tokyo Institute of Technology)

Stacks. Data Structures and Data Types. Collections

Introduction to Data-flow analysis

Inside the Java Virtual Machine

7. Virtual Machine I: Stack Arithmetic 1

Automatic Test Data Synthesis using UML Sequence Diagrams

Efficient Data Structures for Decision Diagrams

University of Twente. A simulation of the Java Virtual Machine using graph grammars

02 B The Java Virtual Machine

Chapter 2 Topics. 2.1 Classification of Computers & Instructions 2.2 Classes of Instruction Sets 2.3 Informal Description of Simple RISC Computer, SRC

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Compiler Construction

Algorithms and Data Structures

Machine-Code Generation for Functions

Lecture 1: Introduction

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

CIS570 Lecture 9 Reuse Optimization II 3

Central Processing Unit (CPU)

Static Analysis of Virtualization- Obfuscated Binaries

Transcription:

Intermediate Representation Abstract syntax tree, controlflow graph, three-address code cs5363 1

Intermediate Code Generation Intermediate language between source and target Multiple machines can be targeted Attaching a different backend for each machine Intel, AMD, IBM machines can all share the same parser for C/C++ Multiple source languages can be supported Attaching a different frontend (parser) for each language Eg. C and C++ can share the same backend Allow independent code optimizations Multiple levels of intermediate representation Supporting the needs of different analyses and optimizations cs5363 2

IR In Compilers Internal representation of input program by compilers Source code of the input program Results of program analysis Control-flow graphs, data-flow graphs, dependence graphs Symbol tables Book-keeping information for translation (eg., types and addresses of variables and subroutines) Selecting IR --- depends on the goal of compilation Source-to-source translation: close to source language Parse trees and abstract syntax trees Translating to machine code: close to machine code Linear three-address code External format of IR Support independent passes over IR cs5363 3

Abstraction Level in IR Source-level IR High-level constructs are readily available for optimization Array access, loops, classes, methods, functions Machine-level IR Expose low-level instructions for optimization Array address calculation, goto branches Subscript A i j Source-level tree loadi 1 => r1 sub rj, r1 => r2 loadi 10 => r3 mult r2, r3 => r4 sub ri, r1 => r5 add r4, r5 => r6 loadi @A => r7 add r7, r6 => r8 load r8 => raij ILOC code cs5363 4

Parse Tree And AST Graphically represent grammatical structure of input program Parse tree: tree representation of syntax derivations AST: condensed form of parse tree Operators and keywords do not appear as leaves Chains of single productions are collapsed Parse trees Abstract syntax trees S IF B THEN S1 ELSE S2 If-then-else B S1 S2 E E + T T 5 3 + 3 5 cs5363 5

Implementing AST in C Define different kinds of AST nodes Grammar: E ::= E + T E T T T ::= (E) id num typedef enum {PLUS, MINUS, ID, NUM} ASTNodeTag; Define AST node types typedef struct ASTnode { AstNodeTag kind; union { symbol_table_entry* id_entry; int num_value; struct ASTnode* opds[2]; } description; }; Define AST node construction routines ASTnode* mkleaf_id(symbol_table_entry* e); ASTnode* mkleaf_num(int n); ASTnode* mknode_plus(struct ASTnode* opd1, struct ASTNode* opd2); ASTnode* mknode_minus(struct ASTnode* opd1, struct ASTNode* opd2); cs5363 6

Implementing AST in Java Grammar: E ::= E + T E T T T ::= (E) id num Define AST node abstract class ASTexpression { public System.String tostring(); } class ASTidentifier extends ASTexpression { private symbol_table_entry id_entry; } class ASTvalue extends ASTexpression { private int num_value; } class ASTplus extends ASTexpression { private ASTnode opds[2]; } Class ASTminus extends ASTexpression { private ASTnode opds[2];... } Define AST node construction routines ASTexpression mkleaf_id(symbol_table_entry e) { return new ASTidentifier(e); } ASTexpression mkleaf_num(int n) { return new ASTvalue(n); } ASTexpression mknode_plus(astnode opd1, struct ASTNode opd2) { return new ASTplus(opd1, opd2); ASTexpression mknode_minus(astnode opd1, struct ASTNode opd2) { return new ASTminus(opd1, opd2); cs5363 7

Constructing AST Use syntax-directed definitions Associate each non-terminal with an AST A pointer to an AST node: E.nptr T.nptr Evaluate synthesized attribute bottom-up From children ASTs, compute AST of the parent E ::= E1 + T { E.nptr=mknode_plus(E1.nptr,T.nptr); } E ::= E1 T { E.nptr=mknode_minus(E1.nptr,T.nptr); } E ::= T { E.nptr=T.nptr; } T ::= (E) {T.nptr=E.nptr; } T ::= id { T.nptr=mkleaf_id(id.entry); } T ::= num { T.nptr=mkleaf_num(num.val); } Exercise: what is the AST for 5 + (15-b)? What if top-down parsing is used (need to eliminate left-recursion)? cs5363 8

Example: AST for 5+(15-b) Bottom-up parsing: evaluate attribute at each reduction Parse tree for 5+(15-b) E5 E1 + T4 T1 ( E3 ) 5 E2 - T3 T2 15 b 1. reduce 5 to T1 using T::=num: T1.nptr = leaf(5) 2. reduce T1 to E1 using E::=T: E1.nptr = T1.nptr = leaf(5) 3. reduce 15 to T2 using T::=num: T2.nptr=leaf(15) 4. reduce T2 to E2 using E::=T: E2.nptr=T2.nptr = leaf(15) 5. reduce b to T3 using T::=num: T3.nptr=leaf(b) 6. reduce E2-T3 to E3 using E::=E-T: E3.nptr=node( -,leaf(15),leaf(b)) 7. reduce (E3) to T4 using T::=(E): T4.nptr=node( -,leaf(15),leaf(b)) 8. reduce E1+T4 to E5 using E::=E+T: E5.nptr=node( +,leaf(5), node( -,leaf(15),leaf(b))) cs5363 9

Symbol tables Symbol tables Record information about names defined in programs Types of variables and functions Additional properties (eg., static, global, scope) Contain information about context of program fragment Can use different symbol tables for different purposes Naming conflicts The same name may represent different things in different places Use separate symbol tables for names in different scopes Multiple layers of symbol tables for nested scopes Implementation of symbol tables Map names to additional information (types,values,etc.) Efficient implementation: using hash tables cs5363 10

Implementing symbol tables Interface Lookup(name) Returns the record for name if one exists in the table; otherwise, indicates that name is not found Insert(name, record) Stores the information in record in the table for name. Symbol tables in nested scopes StartNewScope() Increment the current scope level and creates a new symbol table ExitScope() Changes the current-level symbol table pointer so that it points to the symbol table of surrounding scope Use a global symbol table pointer to keep track of the current scope cs5363 11

Linear IR Low level IL before final code generation A linear sequence of low-level instructions Implemented as a collection (table or list) of tuples Similar to assembly code for an abstract machine Explicit conditional branches and goto jumps Reflect instruction sets of the target machine Stack-machine code and three-address code Stack-machine code two-address code three-address code Push 2 Push y Multiply Push x subtract MOV 2 => t1 MOV y => t2 MULT t2 => t1 MOV x => t4 SUB t1 => t4 t1 := 2 t2 := y t3 := t1*t2 t4 := x t5 := t4-t3 Linear IR for x 2 * y cs5363 12

Stack-machine code Also called one-address code Assumes an operand stack Take operands from top of stack; push results onto the stack Need special operations such as Swapping two operands on top of the stack Compact in space, simple to generate and execute Most operands do not need names Results are transitory unless explicitly moved to memory Used as IR for Smalltalk and Java Stack-machine code for x 2 * y Push 2 Push y Multiply Push x subtract cs5363 13

Three address code Each instruction contains at most two operands and one result. Typical forms include Arithmetic operations: x := y op z x := op y Data movement: x := y [ z ] x[z] := y x := y Control flow: if y op z goto x goto x Function call: param x return y call foo Each instruction maps to at most a few machine instructions Additional constraints depend on target machine instructions Eg., for x := y op z and x := op y all operands must be in registers all operands must be temporaries? Reasonably compact, while allowing reuse of names and values Three-address code for x 2 * y t1 := 2 t2 := y t3 := t1*t2 t4 := x t5 := t4-t3 cs5363 14

Storing Three-Address Code Store all instructions in a quadruple table Every instruction has four fields: op, arg1, arg2, result The label of instructions index of instruction in table Three-address code Quadruple entries t1 := - c t2 := b * t1 t3 := -c t4 := b * t3 t5 := t2 + t4 a := t5 (0) (1) (2) (3) op Uminus Mult Uminus Mult arg1 c b c b arg2 t1 t3 result t1 t2 t3 t4 (4) Plus t2 t4 t5 (5) Assign t5 a Alternative: store all the instructions in a singly/doubly linked list What is the tradeoff? cs5363 15

Mapping Storages To Variables Variables are placeholders for values Every variable must have a location to store its value Register, stack, heap, static storage Values need to be loaded into registers before operation x and y are in registers x and y are in memory Three-address code for x 2 * y: t1 := 2 t2 := t1*y t3 := x-t2 t1 := 2 t2 := y t3 := t1*t2 t4 := x t5 := t4-t3 Which variables can be kept in registers? Which variables must be stored in memory? void A(int b, int *p) { int a, d; a = 3; d = foo(a); *p =b+d; } cs5363 16

Appendix: Control-flow graph Graphical representation of runtime control-flow paths Nodes of graph: basic blocks (straight-line computations) Edges of graph: flows of control Useful for collecting information about computation Detect loops, remove redundant computations, register allocation, instruction scheduling Alternative CFG: Each node contains a single statement i = 0 while (i < 50) { t1 = b * 2; a = a + t1; i = i + 1; }. i =0; t1 := b * 2; a := a + t1; i = i + 1; if I < 50 cs5363 17

Appendix: Dependence graph Graphical representation of reordering constraints between statements Each node n is a single operation/statement Edge (n1,n2) indicates n2 uses result of n1 The order of evaluating n1,n2 cannot be reversed Graph is acyclic within each basic block; is cyclic if loops exist Used in reordering transformations Instruction scheduling, loop transformations Construction For each pair of statements, evaluate ordering constraint a: r1 := w b: r1 := r1 + r1 c: r2 := x d: r1 := r1 * r2 e: r2 := y f: r1 := r1 * r2 g: r2 := z h: r1 := r1 * r2 i: return r1 b d a f c Dependence graph e h cs5363 18 g i

Appendix: Static Single-Assignment A variable can hold multiple values throughout its lifetime Mapping multiple values to a name can hide opportunities of optimization Static single-assignment form (SSA) Each variable is defined by a single operation in the code Each use of variable refers to a single definition Use -functions to merge definitions from different control-flow paths x := y := while (x < 100) x := x + 1 y := y + x SSA: x0 := y0 := if (x0 < 100) goto loop goto next loop: x1 := (x0,x2) y1 := (y0,y2) x2 := x1 + 1 y2 := y1 + x2 if (x2 < 100) goto loop next: x3 := (x0,x2) y3 := (y0,y2) cs5363 19