Compilers I - Chapter 4: Generating Better Code

Lecturers:
  Paul Kelly (phjk@doc.ic.ac.uk), Office: room 304, William Penney Building
  Naranker Dulay (nd@doc.ic.ac.uk), Office: room 562

Materials:
  Textbook
  Course web pages (http://www.doc.ic.ac.uk/~phjk/compilers)
  Piazza (http://piazza.com/imperial.ac.uk/fall2015/221)

Overview

We have seen a simple code generator which handles the basic components of common programming languages: statements and expressions. We will need to handle more: declarations (constants, records, etc.), storage management, and procedures and functions. But first we will examine ways of producing better-quality output code. The main issue is the effective use of registers. At this stage we are looking for simple, fast algorithms which do a reasonably good job: optimising compilers use more powerful (slower!) techniques, which we will cover later.

The plan

- A simple language with assignments, loops etc.
- A stack-based instruction set and its code generator
- Code generation for a machine with registers:
  - an unbounded number of registers
  - a fixed number of registers: avoiding running out of registers
  - register allocation across multiple statements
- Conditionals and Boolean expressions

Code generation with an unbounded number of registers

We will concentrate on using registers well in arithmetic expressions:
- Initially, assume there will always be enough registers
- Invent a scheme to handle cases where we run out of registers
- Modify the translator to evaluate expressions in the order which minimises the number of registers needed

When we look at sequences of assignments ("basic blocks") it is clear that better code results if variables are kept in registers as well as nameless intermediate values. We will examine the graph-colouring approach to register allocation, which addresses this.

Instruction set for example machine with registers

Model: stack machine as before, augmented with some number of registers R0, R1, ...

Instruction set data type:

  data Instruction
    = Add reg reg        (reg := reg + reg)
    | Sub reg reg        (similar)
    | Load reg name      (reg := value at location name)
    | LoadImm reg num    (load constant into reg)
    | Store reg name     (store reg at location name)
    | Push reg           (push reg onto stack)
    | Pop reg            (remove value from stack, and put it in the reg)
    | CompEq reg reg     (subtract reg from reg; set reg to 1 if the result was zero, 0 otherwise)
    | JTrue reg label    (if reg = 1 jump to label)
    | JFalse reg label   (if reg = 0 jump to label)
    | Define label       (set up destination for jump)

Storage allocation for intermediate values in expressions

The code generator given earlier would translate the expression

  (100*3) + ((200*2) + 300) + (400 + (500*3))

to the stack-machine assembly code:

  PushImm 100, PushImm 3, Mul,
  PushImm 200, PushImm 2, Mul,
  PushImm 300, Add, Add,
  PushImm 400, PushImm 500, PushImm 3, Mul,
  Add, Add
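As a sanity check, this instruction sequence can be executed with a few lines of Haskell. The sketch below is not the course's StackMachine.hs simulator (whose names may differ); it covers only the three instructions the example needs, holding the stack as a list with the head as the top slot:

```haskell
-- Minimal stack-machine sketch: just PushImm, Add and Mul.
data Instr = PushImm Int | Add | Mul

-- run executes instructions against a stack held as a Haskell list
-- (head of list = top of stack).
run :: [Instr] -> [Int] -> [Int]
run []              s           = s
run (PushImm n : is) s          = run is (n : s)
run (Add : is)      (b : a : s) = run is (a + b : s)
run (Mul : is)      (b : a : s) = run is (a * b : s)
run _               _           = error "stack underflow"

-- The code for (100*3) + ((200*2) + 300) + (400 + (500*3))
prog :: [Instr]
prog = [ PushImm 100, PushImm 3, Mul
       , PushImm 200, PushImm 2, Mul
       , PushImm 300, Add, Add
       , PushImm 400, PushImm 500, PushImm 3, Mul
       , Add, Add ]

main :: IO ()
main = print (run prog [])   -- the final stack holds the single value 2900
```

Running it leaves exactly one value, 2900, on the stack, matching a by-hand evaluation of the expression.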

If you feed this into the stack machine simulator (http://www.doc.ic.ac.uk/~phjk/compilers/haskell/StackMachine.hs) you can see the execution trace.

How the stack machine uses memory

The first instruction puts 100 into slot 0; the second puts 3 into slot 1; the third multiplies the contents of slots 0 and 1 and puts the result in slot 0. We see that if the computer knows where the stack pointer starts, it can work out beforehand where everything will be. This shows the way towards using registers efficiently.

Here is the code which results from working out at compile time where all the operands will be:

  LoadImm R0 100, LoadImm R1 3, Mul R0 R1,
  LoadImm R1 200, LoadImm R2 2, Mul R1 R2,
  LoadImm R2 300, Add R1 R2, Add R0 R1,
  LoadImm R1 400, LoadImm R2 500, LoadImm R3 3, Mul R2 R3,
  Add R1 R2, Add R0 R1
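The register code can likewise be checked by simulation. This sketch (illustrative names, not the course simulator) models the register file as a map from register number to value, and uses LoadImm R1 200 for the 200*2 term of the source expression; after running the whole program, R0 holds 2900:

```haskell
import qualified Data.Map as M

-- Register-machine sketch: only the three instructions the example uses.
data Instr = LoadImm Int Int   -- LoadImm r n:  Rr := n
           | Mul Int Int       -- Mul r1 r2:    Rr1 := Rr1 * Rr2
           | Add Int Int       -- Add r1 r2:    Rr1 := Rr1 + Rr2

step :: M.Map Int Int -> Instr -> M.Map Int Int
step rs (LoadImm r n) = M.insert r n rs
step rs (Mul r1 r2)   = M.insert r1 (rs M.! r1 * rs M.! r2) rs
step rs (Add r1 r2)   = M.insert r1 (rs M.! r1 + rs M.! r2) rs

-- Register code for (100*3) + ((200*2) + 300) + (400 + (500*3))
prog :: [Instr]
prog = [ LoadImm 0 100, LoadImm 1 3, Mul 0 1
       , LoadImm 1 200, LoadImm 2 2, Mul 1 2
       , LoadImm 2 300, Add 1 2, Add 0 1
       , LoadImm 1 400, LoadImm 2 500, LoadImm 3 3, Mul 2 3
       , Add 1 2, Add 0 1 ]

main :: IO ()
main = print (foldl step M.empty prog M.! 0)   -- R0 holds the final value
```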

The translation function

The translation function transexp requires an extra parameter which specifies the register in which the result is to be left:

  transexp :: exp -> reg -> [instruction]

The code generated by transexp e Ri can use registers Ri, Ri+1 upwards, but must leave the other registers (R0...Ri-1) intact.

The easy cases:

  transexp (Const n) r = [LoadImm r n]
  transexp (Ident x) r = [Load r x]

The translation function

  transexp (Binop op e1 e2) r
    = transexp e1 r ++
      transexp e2 (r+1) ++
      transbinop op r (r+1)
    where
      transbinop Plus  r1 r2 = [Add r1 r2]
      transbinop Minus r1 r2 = [Sub r1 r2]
      etc.

Parameter r is used to track where the stack pointer would point.

Example: x * 3 + 4

Walkthrough:

  transexp (Binop Plus (Binop Times (Ident "x") (Const 3)) (Const 4)) 0

(deliver result to register 0)

Example: x * 3 + 4

  transexp (Binop Plus (Binop Times (Ident "x") (Const 3)) (Const 4)) 0

Using the definition of transexp, we unfold this to get:

  (transexp (Binop Times (Ident "x") (Const 3)) 0) ++
  (transexp (Const 4) 1) ++
  [Add R0 R1]

Unfolding again, this reduces to:

  [Load R0 "x", LoadImm R1 3, Mul R0 R1,   (R0 := x*3)
   LoadImm R1 4,                           (R1 := 4)
   Add R0 R1]                              (R0 := (x*3)+4)

How might this be improved?
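Putting these slides together, here is a self-contained, runnable version of the unbounded-register translator. The concrete types (Reg as Int, a small Exp type) are assumptions made for illustration; the clauses themselves follow the slides:

```haskell
type Reg = Int
data Op    = Plus | Minus | Times
data Exp   = Const Int | Ident String | Binop Op Exp Exp
data Instr = LoadImm Reg Int | Load Reg String
           | Add Reg Reg | Sub Reg Reg | Mul Reg Reg
           deriving (Show, Eq)

-- transexp e r leaves the value of e in register r,
-- using only registers r, r+1, ... as scratch space.
transexp :: Exp -> Reg -> [Instr]
transexp (Const n) r = [LoadImm r n]
transexp (Ident x) r = [Load r x]
transexp (Binop op e1 e2) r =
  transexp e1 r ++ transexp e2 (r+1) ++ transbinop op r (r+1)
  where transbinop Plus  r1 r2 = [Add r1 r2]
        transbinop Minus r1 r2 = [Sub r1 r2]
        transbinop Times r1 r2 = [Mul r1 r2]

-- The worked example: x * 3 + 4, result delivered to register 0
expr :: Exp
expr = Binop Plus (Binop Times (Ident "x") (Const 3)) (Const 4)

main :: IO ()
main = mapM_ print (transexp expr 0)
-- prints the five instructions from the unfolding above:
--   Load 0 "x", LoadImm 1 3, Mul 0 1, LoadImm 1 4, Add 0 1
```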

How might use of registers be improved?

Example: x * 3 + 4. We used two registers; can we get away with fewer? How about:

  [Load R0 "x", MulImm R0 3, AddImm R0 4]

This is clearly better because it involves fewer instructions and uses fewer registers. How do we fix the translator to do this? Instead of loading the constant into a register, use an instruction that takes an immediate operand, e.g. in Intel's IA32:

  movl x,%eax
  imull $3,%eax
  addl $4,%eax

Using immediate operands

The modification required to the translator is small: we need to add a rule to catch the special case:

  transexp (Binop op e1 (Const n)) r
    = transexp e1 r ++ transbinopimm op r n
    where
      transbinopimm Plus r n = [AddImm r n]
      etc.

If the operator is commutative, we can catch another case:

  transexp (Binop op (Const n) e2) r
    | commutative op = transexp e2 r ++ transbinopimm op r n

The general idea here is to use pattern-matching to find opportunities to use instructions; see "instruction selection" in the textbooks.
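A runnable sketch of the translator with both special cases added. Note that the constant-operand clauses must appear before the general Binop clause so that pattern-matching tries them first; AddImm/MulImm and the commutative predicate are the names used on the slide, and the rest of the scaffolding is assumed for illustration:

```haskell
type Reg = Int
data Op    = Plus | Times
data Exp   = Const Int | Ident String | Binop Op Exp Exp
data Instr = LoadImm Reg Int | Load Reg String
           | Add Reg Reg | Mul Reg Reg
           | AddImm Reg Int | MulImm Reg Int
           deriving (Show, Eq)

commutative :: Op -> Bool
commutative _ = True   -- both Plus and Times commute

transexp :: Exp -> Reg -> [Instr]
transexp (Const n) r = [LoadImm r n]
transexp (Ident x) r = [Load r x]
-- Special cases first: constant right operand, or (if the operator
-- commutes) constant left operand, becomes an immediate instruction.
transexp (Binop op e1 (Const n)) r = transexp e1 r ++ transbinopimm op r n
transexp (Binop op (Const n) e2) r
  | commutative op = transexp e2 r ++ transbinopimm op r n
-- General case: both operands evaluated into registers.
transexp (Binop op e1 e2) r =
  transexp e1 r ++ transexp e2 (r+1) ++ transbinop op r (r+1)

transbinop Plus  r1 r2 = [Add r1 r2]
transbinop Times r1 r2 = [Mul r1 r2]
transbinopimm Plus  r n = [AddImm r n]
transbinopimm Times r n = [MulImm r n]

main :: IO ()
main = print (transexp (Binop Plus (Binop Times (Ident "x") (Const 3)) (Const 4)) 0)
-- prints [Load 0 "x",MulImm 0 3,AddImm 0 4], the improved code for x*3+4
```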

Problem: we don't have an unbounded number of registers

Before we see how to overcome this problem in the register-machine case, we introduce the accumulator machine: a machine with only one register, its accumulator. The solution to the problem will be to combine the two techniques.

The Accumulator Machine

This machine has a stack, and just one register, the accumulator. The stack is used for intermediate values as before, but arithmetic etc. instructions are always of the form:

  Acc := Acc + Store[SP]; SP := SP+1;

The instruction set:

  data Instruction
    = Add | Sub | Mul | Div | ...
    | AddImm num | ...
    | CompEq | ...
    | Push | Pop
    | Load name | LoadImm num | Store name
    | Jump label | JTrue label | JFalse label
    | Define label

The accumulator machine

What the instructions do:

  Add:        Acc := Acc + Store[SP]; SP := SP+1;
  Push:       SP := SP-1; Store[SP] := Acc;
  Pop:        Acc := Store[SP]; SP := SP+1;
  Load name:  Acc := Store[name];
  Store name: Store[name] := Acc;

Translator for accumulator machine

The translator transexp generates code to evaluate an expression and leave its value in the accumulator:

  transexp (Const n) = [LoadImm n]
  transexp (Ident x) = [Load x]
  transexp (Binop op e1 e2)
    = transexp e2 ++
      [Push] ++
      transexp e1 ++
      transbinop op
    where
      transbinop Plus = [Add]
      etc.

For e1 + e2, the value of e2 is pushed onto the stack while e1 is evaluated. (Note that e2 has to be evaluated before e1 so that it forms the right-hand operand of the binary operator.)
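A runnable sketch of the accumulator-machine translator (the Exp and Instr scaffolding is assumed for illustration). Translating a - b shows why e2 must be evaluated first: its value ends up on the stack, i.e. as the right-hand operand that Sub subtracts from the accumulator:

```haskell
data Op    = Plus | Minus
data Exp   = Const Int | Ident String | Binop Op Exp Exp
data Instr = LoadImm Int | Load String | Push | Add | Sub
           deriving (Show, Eq)

-- transexp leaves the value of the expression in the accumulator;
-- intermediate values are pushed on the stack.
transexp :: Exp -> [Instr]
transexp (Const n) = [LoadImm n]
transexp (Ident x) = [Load x]
transexp (Binop op e1 e2) =
  transexp e2 ++ [Push] ++ transexp e1 ++ transbinop op
  where transbinop Plus  = [Add]
        transbinop Minus = [Sub]

main :: IO ()
main = print (transexp (Binop Minus (Ident "a") (Ident "b")))
-- prints [Load "b",Push,Load "a",Sub]:
-- b is pushed, a is loaded into Acc, then Sub computes Acc - Store[SP] = a - b
```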

Translator for machine with limited register set

A neat solution to the problem of running out of registers is to combine the register-machine and accumulator strategies:
- While free registers remain, use the register-machine strategy
- When the limit is reached (i.e. when there is one register left), revert to the accumulator strategy, using the last register as the accumulator

The effect is that most expressions get the full benefit of registers, while unusually large expressions are handled correctly.

Code generation with limited registers - implementation:

  transexp (Const n) r = [LoadImm r n]
  transexp (Ident x) r = [Load r x]
  transexp (Binop op e1 e2) r
    = if r == MAXREG
      then -- no more registers: use the stack for e2 while evaluating e1;
           -- the arithmetic instruction gets one operand from the stack
           -- (next slide)
           transexp e2 r ++ [Push r] ++ transexp e1 r ++ transbinopstack op r
      else -- registers remain: use register r to hold e1, and evaluate e2
           -- leaving its result in register r+1; the arithmetic instruction
           -- gets both operands from registers
           transexp e1 r ++ transexp e2 (r+1) ++ transbinop op r (r+1)

If there were enough registers, generate an arithmetic instruction which takes register operands:

  transbinop Plus  r1 r2 = [Add r1 r2]
  transbinop Minus r1 r2 = [Sub r1 r2]
  etc.

If there is only one register left, we need an arithmetic instruction which takes one operand from the top of the stack, as in the accumulator machine:

  transbinopstack Plus  r = [AddStack r]
  transbinopstack Minus r = [SubStack r]
  etc.

The AddStack r instruction is simply: r := r + Store[SP]; SP := SP+1;
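A runnable sketch of the combined strategy. MAXREG is set to 2 here (an arbitrary choice, to force the spill case on a small example); only the deepest subexpression, which under the unbounded scheme would need register 3, falls back to the stack:

```haskell
type Reg = Int
data Op    = Plus | Times
data Exp   = Const Int | Ident String | Binop Op Exp Exp
data Instr = LoadImm Reg Int | Load Reg String | Push Reg
           | Add Reg Reg | Mul Reg Reg
           | AddStack Reg | MulStack Reg
           deriving (Show, Eq)

maxreg :: Reg
maxreg = 2   -- registers R0..R2; R2 doubles as the accumulator when we spill

transexp :: Exp -> Reg -> [Instr]
transexp (Const n) r = [LoadImm r n]
transexp (Ident x) r = [Load r x]
transexp (Binop op e1 e2) r
  | r == maxreg =   -- last register: revert to the accumulator scheme
      transexp e2 r ++ [Push r] ++ transexp e1 r ++ transbinopstack op r
  | otherwise   =   -- registers to spare: register-machine scheme
      transexp e1 r ++ transexp e2 (r+1) ++ transbinop op r (r+1)
  where transbinop Plus  r1 r2 = [Add r1 r2]
        transbinop Times r1 r2 = [Mul r1 r2]
        transbinopstack Plus  r' = [AddStack r']
        transbinopstack Times r' = [MulStack r']

-- a + (b + (c*d)): the c*d subexpression arrives at r = 2 and spills
expr :: Exp
expr = Binop Plus (Ident "a")
         (Binop Plus (Ident "b") (Binop Times (Ident "c") (Ident "d")))

main :: IO ()
main = mapM_ print (transexp expr 0)
```

The output evaluates d, pushes it, evaluates c into R2, and uses MulStack 2; the outer additions still get pure register instructions.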

Conclusion

In the last chapter we saw a code generator for simple statements and expressions. In this chapter we looked at ways of improving code for expressions. We looked at using stacks, accumulators and registers: three ways of managing storage for intermediate values.
- The stack approach is not very efficient
- Adding just one register (the "accumulator") much reduces the number of memory references
- With several registers, main memory for intermediate values is rarely needed; if you do run out, it is efficient enough to revert to the accumulator scheme

Optimising control structures (if-then-else, while, for) is trickier.