Where we are CS 4120 Introduction to Compilers Abstract Assembly Instruction selection mov e1 , e2 jmp e cmp e1 , e2 [jne je jgt ] l push e1 call e



Similar documents
Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

X86-64 Architecture Guide

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Instruction Set Architecture

Instruction Set Design

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek


Chapter 2 Topics. 2.1 Classification of Computers & Instructions 2.2 Classes of Instruction Sets 2.3 Informal Description of Simple RISC Computer, SRC

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

2) Write in detail the issues in the design of code generator.

Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)

Notes on Assembly Language

CPU Organization and Assembly Language

İSTANBUL AYDIN UNIVERSITY

Central Processing Unit (CPU)

Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

Instruction Set Architecture

Compilers I - Chapter 4: Generating Better Code

Intel 8086 architecture

Lecture 27 C and Assembly

EC 362 Problem Set #2

l C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language

Instruction Set Architecture (ISA)

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

LSN 2 Computer Processors

Chapter 5 Instructor's Manual

CS:APP Chapter 4 Computer Architecture Instruction Set Architecture. CS:APP2e

How It All Works. Other M68000 Updates. Basic Control Signals. Basic Control Signals

OAMulator. Online One Address Machine emulator and OAMPL compiler.

PROBLEMS #20,R0,R1 #$3A,R2,R4

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Computer Organization and Architecture

EECS 427 RISC PROCESSOR

Instruction Set Architecture (ISA) Design. Classification Categories

Systems I: Computer Organization and Architecture

Execution Cycle. Pipelining. IF and ID Stages. Simple MIPS Instruction Formats

Return-oriented programming without returns

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

Software Fingerprinting for Automated Malicious Code Analysis

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Computer Systems Architecture

Operating Systems. Virtual Memory

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

Processor Architectures

Computer Organization and Components

IA-64 Application Developer s Architecture Guide

A3 Computer Architecture

64-Bit NASM Notes. Invoking 64-Bit NASM

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

Computer Organization

1 Classical Universal Computer 3

Language Processing Systems

CPU Performance Equation

Administrative Issues

CSE 141 Introduction to Computer Architecture Summer Session I, Lecture 1 Introduction. Pramod V. Argade June 27, 2005

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

VLIW Processors. VLIW Processors

MACHINE ARCHITECTURE & LANGUAGE

Week 1 out-of-class notes, discussions and sample problems

RISC Processor Simulator (SRC) INEL 4215: Computer Architecture and Organization September 22, 2004

PROBLEMS. which was discussed in Section

Abysssec Research. 1) Advisory information. 2) Vulnerable version

Assembly Language Programming

Review: MIPS Addressing Modes/Instruction Formats

Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap Famiglie di processori) PROBLEMS

Assembly Language: Function Calls" Jennifer Rexford!

Guide to RISC Processors

x64 Cheat Sheet Fall 2015

What to do when I have a load/store instruction?

In the Beginning The first ISA appears on the IBM System 360 In the good old days

Instruction scheduling

UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995

CHAPTER 7: The CPU and Memory

A Tiny Guide to Programming in 32-bit x86 Assembly Language

Optimizations. Optimization Safety. Optimization Safety. Control Flow Graphs. Code transformations to improve program

Computer Architectures

THUMB Instruction Set

Chapter 9 Computer Design Basics!

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

The 80x86 Instruction Set

EE282 Computer Architecture and Organization Midterm Exam February 13, (Total Time = 120 minutes, Total Points = 100)

Property of ISA vs. Uarch?

Central Processing Unit Simulation Version v2.5 (July 2005) Charles André University Nice-Sophia Antipolis

Chapter 2 Logic Gates and Introduction to Computer Architecture

A s we saw in Chapter 4, a CPU contains three main sections: the register section,

Computer organization

Module: Software Instruction Scheduling Part I

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

Reduced Instruction Set Computer (RISC)

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

Transcription:

0/5/03 Where we are CS 0 Introduction to Compilers Ross Tate Cornell University Lecture 8: Instruction Selection Intermediate code synta-directed translation reordering with traces Canonical intermediate code Abstract assembly code tiling dynamic programming register allocation Assembly code Abstract Assembly Abstract assembly = assembly code w/ infinite register set Canonical intermediate code = abstract assembly code epression trees (e, e ) mov e, e JUMP(e) jmp e (e,l) cmp e, e [jne je jgt ] l CALL(e, e, ) push e; ; call e LABEL(l ) l: Instruction selection Conversion to abstract assembly is problem of instruction selection for a single IR statement node Full abstract assembly code: glue translated instructions from each of the statements Problem: more than one way to translate a given statement. How to choose? 3 Eample 86-6 ISA TEMP(t) TEMP(t) TEMP(fp)? mov %rbp, add $, mov (),t3 add t3, t add (%rbp),t Need to map IR tree to actual machine instructions need to know how instructions work A two-address CISC architecture (inherited from 00, 8008, 8086 ) Typical instruction has opcode (mov, add, sub, shl, shr, mul, div, jmp, jcc, push, pop, test, enter, leave) destination (r,n,(r),k(r),(r,r), (r,r,w),k(r,r,w) ) (may also be an operand) source (any legal destination, or a constant $k) opcode src src/dest mov $,%ra add %rc,%rbz sub %rbp,%esi add %edi, (%rc,%edi,6) je label jmp (%rbp) 5 6

0/5/03 AT&T vs Intel Tiling Intel synta: opcode dest, src Registers ra, rb, rc,...r8,r9,...r5 constants k memory operands [n], [rk], [rw*r],... AT&T synta (GNU assembler default): opcode src, dest %ra, %rb, constants $k memory operands n, k(r), (r,r,w),... Idea: each Pentium instruction performs computation for a piece of the IR tree: a tile mov %rbp, add $, mov (),t3 TEMP(t) t3 add t3, t TEMP(t) TEMP(fp) Tiles connected by new temporary registers (, t3) that hold result of tile 7 8 t Tiles mov t, add $, Some tiles TEMP(t) t mov t,t Tiles capture compiler s understanding of instruction set Each tile: sequence of instructions that update a fresh temporary (may need etra mov s) and associated IR tree All outgoing edges are temporaries t t t t CONST(i) mov t, add t, mov $i,(t,t ) 9 0 Designing tiles Only add tiles that are useful to compiler Many instructions will be too hard to use effectively or will offer no advantage Need tiles for all single-node trees to guarantee that every tree can be tiled, e.g. t mov t, add t, t3 t3 More handy tiles lea instruction computes a memory address but doesn t actually load from memory t t lea (t,t ), lea (t,t,k ), t t * f CONST(k ) t (k one of,,8,6)

0/5/03 Matching for RISC As defined in lecture, have (cond, destination) Appel: (op, e, e, destination) where op is one of ==,!=, <, <=, =>, > Our translates easily to RISC ISAs that have eplicit comparison result t < t3 NAME(n) cmplt, t3, t br MIPS t, n Condition code ISA Appel s corresponds more directly to Pentium conditional jumps set condition codes < NAME(n) t cmp t, jl n test condition codes However, can handle Pentium-style jumps with lecture IR with appropriate tiles 3 Branches An annoying instruction How to tile a conditional jump? Fold comparison operator into tile t EQ l (l) l (l) test t jnz l cmp t, t je l Pentium mul instruction multiples single operand by ea, puts result in ea (low 3 bits), ed (high 3 bits) Solution: add etra mov instructions, let register allocation deal with ed overwrite MUL t t mov t, %ea mul mov %ea, t t 5 6 Tiling Problem Eample How to pick tiles that cover IR statement tree with minimum eecution time? Need a good selection of tiles small tiles to make sure we can tile every tree large tiles for efficiency Usually want to pick large tiles: fewer instructions instructions cycles: RISC core instructions take cycle, other instructions may take more add %ra,(%rc) mov (%rc),%rd add %rd,%ra mov %ra,(%rc) = ; t ebp: Pentium frame-pointer register mov (%ebp,), t mov t, add $, mov, (%ebp,) 7 8 3

0/5/03 Alternate (non-risc) tiling Greedy tiling = ; add $, (ebp,) r/m3 r/m3 CONST(k) 9 Assume larger tiles = better Greedy algorithm: starrom top of tree and use largest tile that matches tree Tile remaining subtrees recursively 8 MUL 0 Improving instruction selection Greedy tiling may not generate best code Always selects largest tile, not necessarily fastest instruction May pull nodes up into tiles when better to leave below Can do better using dynamic programming algorithm Timing model Idea: associate cost with each tile (proportional to # cycles to eecute) caveat: cost is fictional on modern architectures Estimate of total eecution time is sum of costs of all tiles Total cost: 5 Finding optimum tiling Goal: find minimum total cost tiling of tree Algorithm: for every node, find minimum total-cost tiling of that node and sub-tree. Lemma: once minimum-cost tiling of all children of a node is known, can find minimum-cost tiling of the node by trying out all possible tiles matching the node Therefore: starrom leaves, work upward to top node Recursive implementation Any dynamic-programming algorithm equivalent to a memoized version of same algorithm that runs top-down For each node, record best tile for node Start at top, recurse: First, check in table for best tile for this node If not computed, try each matching tile to see which one has lowest cost Store lowest-cost tile in table and return Finally, use entries in table to emit code 3

0/5/03 Problems with model Modern processors: eecution time not sum of tile times instruction order matters Processors are pipelining instructions and eecuting different pieces of instructions in parallel bad ordering (e.g. too many memory operations in sequence) stalls processor pipeline processor can eecute some instructions in parallel (super-scalar) cost is merely an approimation instruction scheduling needed Finding matching tiles Eplicitly building every tile: tedious Easier to write subroutines for matching Pentium source, destination operands Reuse matcher for all opcodes 5 6 Matching tiles Tile Specifications abstract class IR_Stmt { Assembly munch(); class IR_Move etends IR_Stmt { IR_Epr src, dst; Assembly munch() { if (src instanceof IR_Plus && ((IR_Plus)src).lhs.equals(dst) && is_regmem3(dst) { Assembly e = (IR_Plus)src).rhs.munch(); return e.append(new AddIns(dst, e.target())); r/m3 else if... Previous approach simple, efficient, but hard-codes tiles and their priorities Another option: eplicitly create data structures representing each tile in instruction set Tiling performed by a generic tree-matching and code generation procedure Can generate from instruction set description generic back end! For RISC instruction sets, over-engineering 7 8 Summary Can specify code-generation process as a set of tiles that relate IR trees to instruction sequences Instructions using fied registers problematic but can be handled using etra temporaries Greedy algorithm implemented simply as recursive traversal Dynamic-programming algorithm generates better code, can also be implemented recursively using memoization Real optimization will require instruction scheduling 9 5