Intermediate Code Generation


CS 471, October 29, 2007

The compiler pipeline so far:

    source code -> Lexical Analysis -> tokens -> Syntactic Analysis -> AST
                -> Semantic Analysis -> AST -> Intermediate Code Gen -> IR

with lexical errors, syntax errors, and semantic errors reported by the
corresponding phases.

Motivation

What we have so far: an abstract syntax tree
- with all the program information
- known to be correct: well-typed, nothing missing, no ambiguities

What we need: something executable, closer to the actual machine's level
of abstraction.

Why an Intermediate Representation?

What is the IR used for?
- portability
- optimization
- component interface
- program understanding

The front end does lexical analysis, parsing, semantic analysis, and
translation to IR. The back end does optimization of the IR and
translation to machine instructions. Try to keep machine dependences out
of the IR for as long as possible.

Intermediate Code

Abstract machine code (an intermediate representation) allows
machine-independent code generation and optimization:

    AST -> IR -> optimize -> x86 / PowerPC / Alpha

What Makes a Good IR?

- easy to translate from the AST
- easy to translate to assembly
- narrow interface: a small number of node types (instructions)
- easy to optimize
- easy to retarget

    AST (>40 node types) -> IR (13 node types) -> x86 (>200 opcodes)

Intermediate Representations

Any representation between the AST and assembly:
- three-address code: triples, quads (low-level)
- expression trees (high-level)

Tiger intermediate code for a[i] (W is the word size):

    MEM(BINOP(+, MEM(a), BINOP(*, i, CONST W)))

Components of an intermediate representation:
- code representation
- symbol table
- analysis information
- string table

Issues:
- Use an existing IR or design a new one? Many are available: RTLs, SSA,
  LLVM, etc.
- How close should it be to the source/target?

IR Selection

Using an existing IR:
- cost savings due to reuse
- it must be expressive and appropriate for the compiler's operations

Designing an IR:
- decide how close to machine code it should be
- decide how expressive it should be
- decide its structure
- consider combining different kinds of IRs

The IR Machine

A machine with:
- an infinite number of temporaries (think registers)
- simple instructions: three operands, branching, calls with a simple
  calling convention
- simple code structure: an array of instructions, with labels to define
  the targets of branches

Temporaries

- The machine has an infinite number of temporaries; call them t0, t1,
  t2, ...
- Temporaries can hold values of any type.
- The type of a temporary is derived from the code that generated it.
- Temporaries go out of scope with each function.

Optimizing Compilers

Goal: get the program closer to machine code without losing the
information needed to do useful optimizations. This requires multiple IR
stages, with optimization at each one:

    AST -> optimize -> HIR -> optimize -> MIR -> optimize
        -> x86 / PowerPC / Alpha (LIR), each optimized in turn
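To make the tree IR concrete, here is a minimal sketch in C of an
expression-tree node type and the construction of the a[i] tree above.
The type and function names (Exp, exp_binop, array_index) are
illustrative assumptions, not the actual Tiger compiler's API:

    /* Minimal expression-tree IR (illustrative names, not Tiger's API). */
    #include <stdlib.h>

    typedef enum { EXP_CONST, EXP_TEMP, EXP_BINOP, EXP_MEM } ExpKind;
    typedef enum { OP_PLUS, OP_MUL } BinOp;

    typedef struct Exp Exp;
    struct Exp {
        ExpKind kind;
        int     value;        /* EXP_CONST: constant; EXP_TEMP: temp number */
        BinOp   op;           /* EXP_BINOP only                             */
        Exp    *left, *right; /* EXP_BINOP: operands; EXP_MEM: left = addr  */
    };

    static Exp *node(ExpKind k) {
        Exp *e = calloc(1, sizeof *e);
        e->kind = k;
        return e;
    }

    Exp *exp_const(int v)   { Exp *e = node(EXP_CONST); e->value = v; return e; }
    Exp *exp_mem(Exp *addr) { Exp *e = node(EXP_MEM); e->left = addr; return e; }
    Exp *exp_binop(BinOp op, Exp *l, Exp *r) {
        Exp *e = node(EXP_BINOP);
        e->op = op; e->left = l; e->right = r;
        return e;
    }

    /* a[i], as above: MEM(BINOP(+, MEM(a), BINOP(*, i, CONST W))) */
    Exp *array_index(Exp *a, Exp *i, int W) {
        return exp_mem(exp_binop(OP_PLUS, exp_mem(a),
                                 exp_binop(OP_MUL, i, exp_const(W))));
    }

Note how narrow the interface is: a handful of node kinds suffices to
express loads, arithmetic, and addressing.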

High-Level IR (HIR)

- Used early in the process; usually converted to a lower form later on.
- Preserves high-level language constructs: structured flow, variables,
  methods.
- Allows high-level optimizations based on properties of the source
  language (e.g. inlining, reuse of constant variables).
- Example: the AST.

Medium-Level IR (MIR)

- Tries to reflect the range of features in the source language in a
  language-independent way.
- Intermediate between AST and assembly: unstructured jumps, registers,
  memory locations.
- Convenient for translation to high-quality machine code.
- Other MIRs:
  - quadruples: a = b OP c (a is explicit, not an arc)
  - UCODE: stack-machine based (like Java bytecode)
- Advantage of a tree IR: easy to generate, easier to do reasonable
  instruction selection.
- Advantage of quadruples: easier optimization.

Low-Level IR (LIR)

- Assembly code plus extra pseudo-instructions.
- Machine dependent; translation to assembly code is trivial.
- Allows optimization of code for low-level considerations: scheduling,
  memory layout.

IR Classification: Level

The same for loop at two levels:

High-level:

    for i := op1 to op2 step op3
        instructions
    endfor

Medium-level:

        i := op1
        if step < 0 goto L2
    L1: if i > op2 goto L3
        instructions
        i := i + step
        goto L1
    L2: if i < op2 goto L3
        instructions
        i := i + step
        goto L2
    L3:

IR Classification: Structure

- Graphical (trees, graphs): not easy to rearrange; large structures.
- Linear: looks like pseudocode; easy to rearrange.
- Hybrid: combines graphical and linear IRs. Example: a low-level linear
  IR for basic blocks, and a graph to represent the flow of control.

Parse Tree / Abstract Syntax Tree

- High-level; useful for source-level information.
- Retains syntactic structure.
- Common uses: source-to-source translation, semantic analysis,
  syntax-directed editors.
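As a concrete sketch of the quadruple form a = b OP c (the type and field
names below are hypothetical, not from any particular compiler), each MIR
instruction can be stored as a fixed four-field record:

    /* One quadruple: result := arg1 op arg2 (illustrative encoding). */
    typedef enum { Q_ADD, Q_MUL, Q_COPY, Q_GOTO, Q_IFGT } QuadOp;

    typedef struct {
        QuadOp op;
        int    arg1, arg2;  /* indices into a table of variables/temps */
        int    result;      /* destination temp, or a label for jumps  */
    } Quad;

    /* x + y * z as two quadruples, with t1 and t2 fresh temporaries:
           { Q_MUL, y, z,  t1 }
           { Q_ADD, x, t1, t2 }                                        */

Because the destination is an explicit field rather than an arc, an array
of quadruples is easy to reorder and rewrite, which is one reason
quadruples are easier to optimize than a tree IR.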

Basic Blocks

A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end, without halting or
the possibility of branching except at the end.

Partitioning a sequence of statements into basic blocks (a code sketch of
this procedure appears after the DAG slides below):
1. Determine the leaders (the first statements of basic blocks):
   - The first statement is a leader.
   - The target of a conditional branch is a leader.
   - A statement following a branch is a leader.
2. For each leader, its basic block consists of the leader and all the
   statements up to, but not including, the next leader.

Example:

    unsigned int fibonacci(unsigned int n) {
        unsigned int f0, f1, f2;
        f0 = 0;
        f1 = 1;
        if (n <= 1) return n;
        for (int i = 2; i <= n; i++) {
            f2 = f0 + f1;
            f0 = f1;
            f1 = f2;
        }
        return f2;
    }

as three-address code:

        read(n)
        f0 := 0
        f1 := 1
        if n <= 1 goto L0
        i := 2
    L2: if i <= n goto L1
        return f2
    L1: f2 := f0 + f1
        f0 := f1
        f1 := f2
        i := i + 1
        goto L2
    L0: return n

The leaders are read(n) (the first statement); i := 2, L2, L1, and L0
(branch targets or statements following branches); and return f2 (follows
a branch). The resulting control-flow graph:

    entry -> [read(n); f0 := 0; f1 := 1; n <= 1?]
        true:  [return n] -> exit
        false: [i := 2] -> [i <= n?]
            true:  [f2 := f0 + f1; f0 := f1; f1 := f2; i := i + 1]
                   -> back to [i <= n?]
            false: [return f2] -> exit

Trees for Basic Blocks

A tree for a basic block* has an operator at the root and up to two
children as operands; trees can be combined. Uses: algebraic
simplifications; may generate locally optimal code. For example,

    L1: i := 2
        t1 := i + 1
        t2 := t1 > 0
        if t2 goto L1

as separate trees:  assgn(i, 2)   add(t1, i, 1)   gt(t2, t1, 0)
combined:           gt(t2, add(t1, assgn(i, 2), 1), 0)

* straight-line code with no branches or branch targets

Directed Acyclic Graphs (DAGs)

Like compressed trees:
- leaves: variables and constants available on entry
- internal nodes: operators, annotated with variable names, with distinct
  left/right children

Used for basic blocks (DAGs don't show control flow). Can generate
efficient code, since DAGs encode common subexpressions; but they are
difficult to transform. Good for analysis.

Generating DAGs

- Check whether an operand is already present; if not, create a leaf for
  it.
- Check whether there is a parent of the operand that represents the same
  operation; if not, create one.
- Then label the node representing the result with the name of the
  destination variable, and remove that label from all other nodes in the
  DAG.
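The leader rules of step 1 above translate directly into code. Here is a
minimal sketch in C, where the Instr type and its fields are illustrative
assumptions:

    #include <stdbool.h>

    typedef struct {
        bool is_branch;  /* conditional or unconditional jump?      */
        int  target;     /* index of the jump target, or -1 if none */
    } Instr;

    /* On return, leader[k] is true iff instruction k starts a block. */
    void find_leaders(const Instr *code, int n, bool *leader) {
        for (int k = 0; k < n; k++)
            leader[k] = false;
        if (n > 0)
            leader[0] = true;                      /* rule: first statement */
        for (int k = 0; k < n; k++) {
            if (code[k].is_branch) {
                if (code[k].target >= 0)
                    leader[code[k].target] = true; /* rule: branch target   */
                if (k + 1 < n)
                    leader[k + 1] = true;          /* rule: after a branch  */
            }
        }
    }

Each basic block then runs from a leader up to, but not including, the
next leader (or the end of the code), exactly as in step 2.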

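The DAG-generation steps above amount to interning nodes: an (operator,
operands) combination is created at most once, so repeated subexpressions
automatically share a node. A minimal sketch in C (all names are
illustrative), building the DAG for the example on the next slide:

    #include <stdlib.h>
    #include <string.h>

    typedef struct Node {
        char op;                   /* '+', '-', '*', '/'; 0 for a leaf */
        const char *leaf;          /* leaf name ("y", "2"); else NULL  */
        struct Node *left, *right;
    } Node;

    #define MAX_NODES 256          /* no overflow check, for brevity   */
    static Node *nodes[MAX_NODES];
    static int   nnodes;

    /* Return the existing node for (op, leaf, left, right) or create it. */
    static Node *intern(char op, const char *leaf, Node *l, Node *r) {
        for (int k = 0; k < nnodes; k++) {
            Node *n = nodes[k];
            if (n->op == op && n->left == l && n->right == r &&
                ((!n->leaf && !leaf) ||
                 (n->leaf && leaf && strcmp(n->leaf, leaf) == 0)))
                return n;                   /* operand already present */
        }
        Node *n = calloc(1, sizeof *n);     /* create a new leaf/node  */
        n->op = op; n->leaf = leaf; n->left = l; n->right = r;
        nodes[nnodes++] = n;
        return n;
    }

    /* m := 2*y*z and p := 2*y - z share the node for 2*y. */
    void build_example(void) {
        Node *two = intern(0, "2", NULL, NULL);
        Node *y   = intern(0, "y", NULL, NULL);
        Node *z   = intern(0, "z", NULL, NULL);
        Node *ty  = intern('*', NULL, two, y);  /* 2 * y, built once   */
        Node *m   = intern('*', NULL, ty, z);   /* m := 2 * y * z      */
        Node *p   = intern('-', NULL, ty, z);   /* p := 2 * y - z      */
        (void)m; (void)p;
    }

A full implementation would also attach the destination-variable labels
and move them between nodes, as the last generation step describes.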
DAG Example

    m := 2 * y * z
    n := 3 * y * z
    p := 2 * y - z

The subexpression 2 * y is computed once and shared by the DAG nodes for
m and p.

Control-Flow Graphs (CFGs)

- Each node corresponds to a basic block, or to part of a basic block; it
  may be necessary to determine facts at specific points within a basic
  block, but using a single statement per node costs more space and time.
- Each edge represents flow of control.

Dependence Graphs

Dependence graphs represent constraints on the sequencing of operations.
A dependence is a relation between two statements that puts a constraint
on their execution order.
- Control dependence: based on the program's control flow.
- Data dependence: based on the flow of data between statements.
- Nodes represent statements; edges represent dependences; labels on the
  edges represent the types of dependences.
- Built for specific optimizations, then discarded.

Example (a sketch that classifies these dependences appears at the end of
this section):

    s1:     a := b + c
    s2:     if a > 10 goto L1
    s3:     d := b * e
    s4:     e := d + 1
    s5: L1: d := e / 2

- control dependence: s3 and s4 are executed only when a <= 10
- true (flow) dependence: s2 uses a value defined in s1; this is a
  read-after-write dependence
- antidependence: s4 defines a value used in s3; this is a
  write-after-read dependence
- output dependence: s5 defines a value also defined in s3; this is a
  write-after-write dependence
- input dependence: s5 uses a value also used in s3; this is a
  read-after-read situation; it places no constraint on the execution
  order, but it is used in some optimizations

IR Classification: Structure (recap)

- Graphical (trees, graphs): not easy to rearrange; large structures.
- Linear: looks like pseudocode; easy to rearrange.
- Hybrid: combines graphical and linear IRs. Example: a low-level linear
  IR for basic blocks, and a graph to represent the flow of control.

Linear IRs

- A sequence of instructions that execute in their order of appearance.
- Control flow is represented by conditional branches and jumps.
- Common representations: stack machine code and three-address code.
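The four data-dependence kinds can be computed mechanically from each
statement's def and use sets. A minimal sketch in C (the Stmt type and
two-entry use set are illustrative assumptions; real compilers use bit
vectors):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        const char *def;     /* variable written, or NULL   */
        const char *use[2];  /* variables read, NULL-padded */
    } Stmt;

    static bool reads(const Stmt *s, const char *v) {
        for (int k = 0; k < 2; k++)
            if (s->use[k] && v && strcmp(s->use[k], v) == 0)
                return true;
        return false;
    }

    /* s1 precedes s2 in program order; report every dependence. */
    void classify(const Stmt *s1, const Stmt *s2) {
        if (s1->def && reads(s2, s1->def))
            puts("true (flow) dependence: read after write");
        if (s2->def && reads(s1, s2->def))
            puts("antidependence: write after read");
        if (s1->def && s2->def && strcmp(s1->def, s2->def) == 0)
            puts("output dependence: write after write");
        bool input = false;
        for (int k = 0; k < 2 && !input; k++)
            input = s2->use[k] && reads(s1, s2->use[k]);
        if (input)
            puts("input dependence: read after read");
    }

    int main(void) {
        Stmt s3 = { "d", { "b", "e" } };  /* d := b * e */
        Stmt s4 = { "e", { "d", NULL } }; /* e := d + 1 */
        classify(&s3, &s4);  /* flow (on d) and antidependence (on e) */
        return 0;
    }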

Stack Machine Code

- Assumes the presence of an operand stack.
- Useful for stack architectures and the JVM.
- Operations typically pop their operands and push their results.
- Advantages: easy code generation; compact form.
- Disadvantages: difficult to rearrange; difficult to reuse expressions.

Example, for the expression x + y * z (compare the three-address version
below):

    push x
    push y
    push z
    mul
    add

Three-Address Code

- Each instruction is of the form x := y op z.
- y and z can only be registers or constants.
- Just like assembly; the common form of intermediate code.
- The AST expression x + y * z is translated as

      t1 := y * z
      t2 := x + t1

  so that each subexpression has a "home".

Three-Address Code

A quadruple holds a result, two operands, and an operator: x := y op z,
where op is a binary operator and x, y, z can be variables, constants, or
compiler-generated temporaries (intermediate results). This can be
written in the shorthand <op, arg1, arg2, result>.

Statements:
- assignment: x := y op z
- copy statements: x := y
- goto L

Three-Address Code Statements (cont.)

- conditional branch: if x relop y goto L
- indexed assignments: x := y[j] or s[j] := z
- address and pointer assignments (for C): x := &y, x := *p, *x := y
- procedure calls: param x; call p, n; return y

Bigger Example

Consider the AST for a := b * -c + b * -c. It translates to (a code
sketch that produces this translation follows the summary):

    t1 := -c
    t2 := b * t1
    t3 := -c
    t4 := b * t3
    t5 := t2 + t4
    a  := t5

Summary

- IRs provide the interface between the front and back ends of the
  compiler.
- They should be machine and language independent.
- They should be amenable to optimization.

Next time: the Tiger IR.
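To make the translation scheme concrete, here is a minimal sketch in C of
three-address code generation by a recursive walk over an expression AST.
The Ast type, gen_exp, and the operator encoding are illustrative
assumptions, not the course's actual code:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Ast {
        char op;           /* '+', '*', 'm' for unary minus; 0 = leaf */
        const char *name;  /* leaf only: variable name                */
        struct Ast *left, *right;
    } Ast;

    static int next_temp = 1;

    /* Emit three-address code for e; return the name of the place
       (the "home") holding its value. */
    static const char *gen_exp(const Ast *e) {
        if (e->op == 0)
            return e->name;                 /* variables need no code */
        if (e->op == 'm') {                 /* unary minus            */
            const char *l = gen_exp(e->left);
            char *t = malloc(16);
            snprintf(t, 16, "t%d", next_temp++);
            printf("%s := -%s\n", t, l);
            return t;
        }
        const char *l = gen_exp(e->left);   /* binary operator        */
        const char *r = gen_exp(e->right);
        char *t = malloc(16);
        snprintf(t, 16, "t%d", next_temp++);
        printf("%s := %s %c %s\n", t, l, e->op, r);
        return t;
    }

    int main(void) {
        /* a := b * -c + b * -c, as in the bigger example */
        Ast c  = {0, "c", NULL, NULL},  b  = {0, "b", NULL, NULL};
        Ast m1 = {'m', NULL, &c, NULL}, m2 = {'m', NULL, &c, NULL};
        Ast p1 = {'*', NULL, &b, &m1},  p2 = {'*', NULL, &b, &m2};
        Ast sum = {'+', NULL, &p1, &p2};
        printf("a := %s\n", gen_exp(&sum));
        return 0;
    }

Running this prints exactly the six lines of the bigger example: each
operator node allocates a fresh temporary as the home for its
subexpression, which is why -c is computed twice (a DAG or common
subexpression elimination would share it).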