Intel Assembler. Project administration. Non-standard project. Project administration: Repository




Lecture 14: Project, Assembler and Exam
Emma Söderberg
Revised by Emma Söderberg on March 5, 2013. Based on slides by Görel Hedin and Lennart Andersson.
EDA180: Compiler Construction

Compiler phases and program representations
Analysis (front end):
  Source code -> Lexical analysis (scanning) -> Tokens -> Syntactic analysis (parsing) -> AST -> Semantic analysis -> Attributed AST
Synthesis (back end):
  Attributed AST -> Intermediate code generation -> Intermediate code -> Optimization -> Intermediate code -> Machine code generation -> Machine code

Today
- Project
- Intel assembler
- Exam
- Repetition
- Beyond...

Course Project
Build a compiler for your language, in teams of 2 persons.
Prerequisites:
- Approved assignments.
- Assignment supervisor may grant postponement.

Standard project
Design a small procedural language:
- integer and boolean types
- variables, constants, expressions, statements, ...
- block structure with nested procedures
- parameters, return values, recursion
- name analysis
- type analysis
- intermediate code generation
- assembly code generation

Non-standard project
Design a language of your choice.
- Must be accepted by project supervisor in advance.
- Should be approximately the same size as the standard project.
Typical requirements:
- non-trivial grammar
- non-trivial name analysis
- significant semantic computations
- translation to some intermediate code
- translation to native code

Project administration
Estimated work load: 40 hours (20-80)
Administration:
- Report your project group to your assignment supervisor. Your assignment supervisor will be your project supervisor.
- Book a meeting with your supervisor.
- Three tasks: Design, Front end, Back end.
- Three deadlines: March 24, April 22, and May 6 (also on the course webpage).
Project supervisors:
- Niklas Fors, niklas.fors@cs.lth.se
- Jesper Öqvist, jesper.oqvist@cs.lth.se

Project administration: Repository
Git (recommended):
- Private repository: don't assist plagiarism. View the section on "Cooperation or Plagiarism" on the department web page. Note that this excludes GitHub.
- BitBucket: private Git repositories, academic license, used by your supervisor. Set up your own and give access to your supervisor.
Subversion:
- We can set up a repository for you.

Intel Assembler
- Generate assembler from ICode
- Tools: as, ld, gcc

Intel 386/486/Pentium processor architecture
- General-purpose registers: EAX, EBX, ECX, EDX, ESI, EDI
- ESP: stack pointer
- EBP: base pointer
- Instruction pointer: EIP
- Segment registers: CS, DS, ES, SS (16 bits each)
- Flags register: EFLAGS, 32 bits, used (among other things) to store results of comparisons.

Register structure
Structure of the EAX register (bits):
  Bits 31-16   upper half of EAX (no name of its own)
  Bits 15-8    AH
  Bits 7-0     AL
  Bits 15-0    AX (AH and AL together)
  Bits 31-0    EAX
- AL, AH: 8-bit registers.
- AX: a 16-bit register.
- EAX: AX extended to 32 bits.
- EBX, ECX, and EDX have the same structure.

Program example
        .data                   # allocating memory
n:      .long 234               # the number
length: .long 0                 # the result
ten:    .long 10                # the divisor
        .text                   # instructions
        .global _start          # make _start globally known
_start:
        movl $0, %ebx           # use ebx as counter
        movl n, %eax            # copy number to eax
nextdigit:
        movl $0, %edx           # prepare for long division
        idivl ten               # divide combined edx:eax by 10
                                # quotient to eax
        addl $1, %ebx           # add 1 to counter
        cmpl $0, %eax           # compare eax to 0
        jg nextdigit            # jump if eax>0
        movl %ebx, length       # copy counter to memory

Memory
Memory size:
- Every byte (b, 8 bits) has an address, 0, 1, ...
- word (w, 16 bits)
- long (l, 32 bits)
Variables may have predetermined locations in memory and be referred to by name.
In the project:
- All variables reside on the stack.
- Memory for the stack is allocated by ld (default 2 MB).
- You will not need a .data segment!
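To make the control flow of the program example easier to follow, here is a minimal Java sketch of the same loop. The class and method names (`DigitCount`, `countDigits`) are made up for illustration; the local variables are named after the registers they stand in for, and the comments point at the corresponding assembler instruction.

```java
final class DigitCount {
    /** Mirrors the nextdigit loop: divide by ten until the quotient is no longer positive. */
    static int countDigits(int n) {
        int eax = n;              // movl n, %eax
        int ebx = 0;              // movl $0, %ebx
        do {
            eax = eax / 10;       // idivl ten (the quotient ends up in eax)
            ebx = ebx + 1;        // addl $1, %ebx
        } while (eax > 0);        // cmpl $0, %eax ; jg nextdigit
        return ebx;               // movl %ebx, length
    }

    public static void main(String[] args) {
        System.out.println(countDigits(234)); // prints 3
    }
}
```

Because the test comes after the loop body, the body always runs at least once, so 0 is counted as having one digit.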

Useful operand forms
  Operand          Refers to
  $1448            constant 1448 (base 10)
  nextdigit        label address
  %eax             value in eax
  (%ebp)           value at address contained in ebp
  4(%ebp)          value at 4 bytes after address in ebp
  (%ebp,%eax,4)    value at ebp+4*eax
The last three forms refer to values in main memory.

Useful instructions
  Instruction  Operands            Effect
  movl         rmc32, rm32         rm32 ← rmc32
  addl         rmc32, rm32         rm32 ← rm32 + rmc32
  subl         rmc32, rm32         rm32 ← rm32 - rmc32
  negl         rm32                rm32 ← -rm32
  idivl        rm32                eax ← edx:eax / rm32, edx ← remainder
  notl         rm32                rm32 ← !rm32, bitwise, false = 0
  andl         rmc32, rm32         rm32 ← rm32 & rmc32, bitwise
  orl          rmc32, rm32         rm32 ← rm32 | rmc32, bitwise
  cmpl         rmc32_1, rmc32_2    compare by computing rmc32_2 - rmc32_1
  leal         m32, r32            r32 ← address denoted by m32
Operand types: r register, m memory, c constant.
An instruction can have at most one memory (m) operand.

Conditional and jump instructions
The results of comparisons (cmpl) end up in the EFLAGS register and may be used by succeeding instructions.
Condition codes (cc) set by the cmpl instruction:
  l    <
  le   <=
  e    =
  ne   !=
  g    >
  ge   >=
Jumps may be conditional:
  jmp dest    jump unconditionally
  je dest     jump if equal
  jg dest     jump if greater
  jcc dest    jump if cc (condition code)
Other conditional instructions:
  setcc rm8           rm8 = cc ? 1 : 0
  cmovcc rm32, r32    r32 = rm32 if cc

Stack instructions
  Instruction  Operand  Effect
  pushl        rmc32    push value in rmc32
  popl         rm32     pop to rm32
Example: pushl %ebx
[Figure: the stack before and after the push. Before, an existing value is on top. After, the value of ebx sits on top of it. The stack grows towards address 0.]
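Two points in the tables above are easy to get wrong: how a memory operand such as `4(%ebp)` or `(%ebp,%eax,4)` denotes an address, and which operand is "greater" when `jg` follows `cmpl`. The following Java sketch spells both out. The class and method names are hypothetical, and registers are modelled as plain `int` values.

```java
final class OperandSketch {
    /** Address denoted by disp(base,index,scale): base + index*scale + disp. */
    static int effectiveAddress(int disp, int base, int index, int scale) {
        return base + index * scale + disp;
    }

    /**
     * True if "cmpl src, dst" followed by "jg" would jump.
     * cmpl computes dst - src, so jg jumps when dst > src (signed comparison).
     */
    static boolean jgTaken(int src, int dst) {
        return dst > src;
    }

    public static void main(String[] args) {
        int ebp = 1000;
        int eax = 3;
        System.out.println(effectiveAddress(4, ebp, 0, 1));   // 4(%ebp)        -> 1004
        System.out.println(effectiveAddress(0, ebp, eax, 4)); // (%ebp,%eax,4)  -> 1012
        System.out.println(jgTaken(0, eax));                  // cmpl $0, %eax ; jg -> true
    }
}
```

In the program example, `cmpl $0, %eax` followed by `jg nextdigit` therefore jumps when eax > 0, not when 0 > eax.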

Procedure calls
  Instruction  Operands  Effect
  call         c32       push return address and jump
  ret                    pop return address and jump
  int          c32       interrupt to kernel
Example:
        call p          # will push address of next instruction
        ...
p:      ...
        ret             # will pop address and jump

C compiler conventions
- Arguments are pushed on the stack in reverse order in the caller's activation record.
- Caller pops arguments after return.
- Callee must restore EBX, ESI, EDI, ESP, and EBP before returning.
- EAX is used for return values.

Debugging assembler
The ddd debugger (gdb):
- Step through program
- Inspect memory
- Inspect registers

The Exam
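The calling convention above can be traced with a small simulated stack. This is only a sketch under stated assumptions: the class `CallSketch`, the 64-word memory, and the return address 0x1234 are invented for illustration, and the callee prologue (`pushl %ebp` followed by `movl %esp, %ebp`) is the customary one for 32-bit C code rather than something stated on the slide.

```java
final class CallSketch {
    private final int[] memory = new int[64]; // 64 words of simulated stack memory
    int esp = 64 * 4;                         // stack pointer starts just above the highest word
    int ebp = 0;

    void push(int value) { esp -= 4; memory[esp / 4] = value; }           // pushl: stack grows towards address 0
    int pop() { int value = memory[esp / 4]; esp += 4; return value; }    // popl
    int load(int address) { return memory[address / 4]; }

    /** Simulates a call with three arguments and returns what the callee finds at 8(%ebp). */
    static int firstArgumentSeenByCallee(int a, int b, int c) {
        CallSketch m = new CallSketch();
        m.push(c);                      // caller pushes arguments in reverse order
        m.push(b);
        m.push(a);
        m.push(0x1234);                 // call: push the (made-up) return address
        m.push(m.ebp);                  // assumed prologue: pushl %ebp
        m.ebp = m.esp;                  //                   movl %esp, %ebp
        int first = m.load(m.ebp + 8);  // 0(%ebp) = old ebp, 4(%ebp) = return address, 8(%ebp) = first argument
        m.esp = m.ebp;                  // callee restores ESP and EBP
        m.ebp = m.pop();
        m.pop();                        // ret: pop the return address
        m.esp += 12;                    // caller pops the three arguments
        return first;
    }

    public static void main(String[] args) {
        System.out.println(firstArgumentSeenByCallee(1, 2, 3)); // prints 1
    }
}
```

Pushing in reverse order is what places the first argument at the lowest offset from EBP, regardless of how many arguments follow.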

The exam
- Regular exam: Wednesday March 13, 8-13, Sparta:D.
- Next exam: Friday August 30, 8-13, Victoriastadion 1A. One week advance registration is required for the August exam.
- Allowed material at the exam: Manual page on JastAdd syntax. ICode reference. Dictionary between English and your native language.
- Bonus points from the seminar exercises: Are counted at both the above examination dates, but not next year.
- Prerequisites for writing the exam: Approved assignments. Assignment supervisor may grant postponement.

Old exams
See the course web site, but note that...
- from 2008 a slightly different intermediate code is used.
- in 2003 and earlier, a slightly different JastAdd notation was used.
Now, walk-through of the exam from 2007-03-06...

Exam: Problem 1 Lexical analysis
According to the Java Language Specification, an identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. Assume that a Java letter is one of a-z, A-Z, and that a Java digit is one of 0-9. According to the Java code conventions a class identifier should start with a capital letter, a method name should start with a small letter, and all letters should be capital in a constant name.
a. Specify regular expressions for class and method identifiers according to the Java code conventions. You may use [a-z], but not more complex ranges like [a-zA-Z], as a regular expression denoting the language of all strings with one character from the specified range.
b. An identifier cannot have the same spelling as the null literal. Construct a DFA recognizing class and method identifiers according to the Java code conventions and the literal null with distinct final states.

Exam: Problem 2 Parsing
A qualified identifier in Java adheres to the grammar
  qualifiedid → qualifiedid "." qualifiedid
  qualifiedid → ID
where ID is an identifier token.
a. This grammar is ambiguous. Provide a string that has two different parse trees and draw the trees.
b. Construct an equivalent grammar in canonical form that is unambiguous.
c. Consider the language of all strings generated by the first grammar followed by a $ token. Construct a canonical LL(1) grammar for this language and present the LL(1) table.
d. Specify an equivalent EBNF grammar for the first grammar that is not recursive and requires just 1 token lookahead.
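As an illustration of Problem 1, the sketch below writes one possible pair of regular expressions using only simple ranges, and checks them with `java.util.regex`. It is one reading of the task, not the official solution; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class IdentifierPatterns {
    // A capital letter followed by any number of letters and digits.
    static final Pattern CLASS_ID = Pattern.compile("[A-Z]([a-z]|[A-Z]|[0-9])*");
    // A small letter followed by any number of letters and digits.
    static final Pattern METHOD_ID = Pattern.compile("[a-z]([a-z]|[A-Z]|[0-9])*");

    static boolean isClassId(String s) {
        return CLASS_ID.matcher(s).matches();
    }

    /** The regular expression alone; note that it also accepts the spelling "null". */
    static boolean matchesMethodPattern(String s) {
        return METHOD_ID.matcher(s).matches();
    }

    /** Method identifier with the null literal excluded, as part b requires. */
    static boolean isMethodId(String s) {
        return matchesMethodPattern(s) && !s.equals("null");
    }

    public static void main(String[] args) {
        System.out.println(isClassId("StmtList"));          // true
        System.out.println(isMethodId("getExpr"));          // true
        System.out.println(matchesMethodPattern("null"));   // true: this is why part b needs distinct final states
        System.out.println(isMethodId("null"));             // false
    }
}
```

The overlap between the method pattern and `null` is the kind of token ambiguity that the DFA in part b resolves by giving the literal its own final state.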

Exam: Problem 3 Semantic analysis
Consider the following fragment of an abstract grammar.
  ProcedureDecl ::= Type <ID> Parameters Stmt;
  abstract Stmt;
  Assignment: Stmt ::= <ID> Expr;
  IfStmt: Stmt ::= Expr Then:Stmt Else:Stmt;
  Return: Stmt ::= Expr;
  StmtList: Stmt ::= Stmt*;
a. Every execution path through the procedure block must terminate with a return statement. Construct a .jadd file with a method that checks this. Note that the following concrete program should not generate an error message.
  integer fac(integer n) {
    if (n==0) {
      return 1;
    } else {
      return n*fac(n-1);
    }
  }
b. Assume that there is a traversing visitor:
  class TraversingVisitor implements Visitor {
    ...
    Object visit(IfStmt node, Object data) {
      node.getExpr().accept(this, data);
      node.getThen().accept(this, data);
      node.getElse().accept(this, data);
      return null;
    }
    ...
  }
Construct a subclass of this class that provides a method
  static int numberOfReturns(ProcedureDecl node)
that will return the number of return statements in the node argument.

Exam: Problem 4 Code generation and run-time system
You are going to generate intermediate code for the printr procedure in
  void main() {
    int n;
    void printr(int k); {
      if (k >= 0) {
        printr(k-1);
        print(k);
      }
    }
    n = read();
    printr(n);
  }
a. Introduce a Print instruction in ICode that can be used for the print statement in the example. You should specify the abstract and the context-free grammars.
b. What code should be generated for printr? Assume the same activation record layout as in the lectures, i.e. header, local variables, and temporaries, and that arguments are pushed on the stack by the caller. You must not replace the recursive calls by iteration. You must use a labeling scheme that would avoid name clashes in more complex examples.
c. Draw a diagram showing the stack of activation records just before k=0 is printed for the case that n=2. You should indicate where the dynamic and static links point and the values of variables, parameters, and temporaries. The static links should be correct even if they are not used in this example.
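To show the idea behind Problem 3b without the JastAdd-generated classes, here is a self-contained sketch with a hand-written miniature AST. Everything in it is a stand-in: the node classes omit expressions and identifiers, the visitor methods take no `data` argument, and `numberOfReturns` takes a statement rather than a `ProcedureDecl`. It should be read as an outline of the technique, not as the expected exam answer.

```java
import java.util.Arrays;
import java.util.List;

final class ReturnCountSketch {
    interface Visitor {
        void visit(Assignment node);
        void visit(IfStmt node);
        void visit(Return node);
        void visit(StmtList node);
    }

    interface Stmt { void accept(Visitor v); }

    static final class Assignment implements Stmt {
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class Return implements Stmt {
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class IfStmt implements Stmt {
        final Stmt thenPart, elsePart;
        IfStmt(Stmt thenPart, Stmt elsePart) { this.thenPart = thenPart; this.elsePart = elsePart; }
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class StmtList implements Stmt {
        final List<Stmt> stmts;
        StmtList(Stmt... stmts) { this.stmts = Arrays.asList(stmts); }
        public void accept(Visitor v) { v.visit(this); }
    }

    /** Visits every child and does nothing else, like the traversing visitor in the problem. */
    static class TraversingVisitor implements Visitor {
        public void visit(Assignment node) { }
        public void visit(Return node) { }
        public void visit(IfStmt node) { node.thenPart.accept(this); node.elsePart.accept(this); }
        public void visit(StmtList node) { for (Stmt s : node.stmts) s.accept(this); }
    }

    /** Overrides only the Return case; the traversal itself is inherited. */
    static final class ReturnCounter extends TraversingVisitor {
        private int count = 0;

        @Override public void visit(Return node) { count++; }

        static int numberOfReturns(Stmt body) {
            ReturnCounter counter = new ReturnCounter();
            body.accept(counter);
            return counter.count;
        }
    }

    public static void main(String[] args) {
        Stmt facBody = new StmtList(new IfStmt(new Return(), new Return()));
        System.out.println(ReturnCounter.numberOfReturns(facBody)); // prints 2
    }
}
```

The subclass only needs the one overridden method because the traversal is inherited, which is the main selling point of a traversing visitor.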

Repetition F14-F01

F14: Machine code generation
Overall knowledge about:
- Machine architecture with CPU, registers, and memory.

F13: Optimization
- SSA form (Static Single Assignment): a powerful representation for optimization.
- Typical optimizations at the intermediate code level: Dominance analysis. Copy propagation. Constant propagation. ...
- Typical optimizations at the machine code level: Register allocation. Instruction scheduling (to take advantage of pipelining).

F12: Memory Management
Overall knowledge:
- The difference between manual and automatic memory management.
- Terminology: fragmentation, memory leak, dangling pointer, compaction, root pointer, ...
- Main ideas in the main algorithms: reference counting, mark-sweep, copying, generation-based, conservative, incremental, ...
- Main benefits and drawbacks of the different algorithms.
You don't have to:
- Memorize the details of the algorithms.

F11: Intermediate Code
- What different kinds of intermediate code are there?
- Why temporary variables are needed and how they are handled.
- Advantages of using intermediate code.
- Difference between intermediate code and machine code.
- Difference between a virtual machine and a real machine.
- Translate a program to ICode.
- How to implement code generation based on the AST.
You don't have to:
- Memorize the details of ICode; you may use the ICode reference on the exam.

F10: Run-time systems
- Terminology: activation record, stack, stack pointer, frame pointer, static link, dynamic link, return address, object, heap, heap pointer, ...
- How procedure calls work, with parameter and return value transmission.
- How object creation works.
- How local and non-local variables in procedures are accessed.
- How different kinds of variables are accessed in an OO language.
- What v-tables are and how they are used in OO languages for method calls.
- Draw the execution state at a given point in a given program.

F9: Attribute grammars
You should understand:
- General idea.
- What is the difference between inherited and synthesized attributes?
You should be able to:
- Compute values for synthesized and inherited attributes for a given attribute grammar.
- Make name analysis using synthesized and inherited attributes.

F8: Name and type analysis
- Terminology: name analysis, type analysis, scope, block, homogeneous blocks, declaration-before-use, bindings, symbol table, ...
- Different kinds of scope rules.
- The difference between IdDecls and IdUses.
- How to implement name analysis based on the AST.
- Typical kinds of errors that can occur during compilation, and what different compiler phases they are identified in.
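For the F8 terminology (scope, block, bindings, symbol table), a small sketch of a block-structured symbol table may help. It is an assumption-laden illustration: the class `SymbolTableSketch` and its methods are invented here, types are represented as plain strings, and the course itself may implement name analysis with attributes on the AST instead of an explicit table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SymbolTableSketch {
    // One map of bindings (name -> type) per open block; the innermost block is first.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterBlock() { scopes.push(new HashMap<>()); }

    void exitBlock() { scopes.pop(); }

    /** Adds a binding to the innermost block; returns false on a duplicate declaration there. */
    boolean declare(String name, String type) {
        return scopes.peek().putIfAbsent(name, type) == null;
    }

    /** Searches from the innermost block outwards; null means the name is undeclared. */
    String lookup(String name) {
        for (Map<String, String> scope : scopes) { // ArrayDeque iterates from the most recently pushed
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.enterBlock();
        table.declare("n", "integer");
        table.enterBlock();
        table.declare("n", "boolean");           // shadows the outer n
        System.out.println(table.lookup("n"));   // boolean
        table.exitBlock();
        System.out.println(table.lookup("n"));   // integer
    }
}
```

A duplicate declaration and a failed lookup correspond to the two typical name analysis errors: multiply declared and undeclared identifiers.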

F7: LR parsing
You should understand:
- The principles for how an LR parser works, LR items.
- Why LR is more powerful than LL.
- Typical kinds of unambiguous grammars that can be handled by an LR parser but not by an LL parser.
- Shift and reduce actions.
- What is meant by a Shift/Reduce or Reduce/Reduce conflict?

F6: AST computations, AOP, The visitor pattern
- The Visitor pattern and how to use it.
- Intertype declarations (static Aspect-oriented programming) and how to use them.
- The benefits and drawbacks of these techniques, compared to each other and compared to writing tangled code.
- Implement various computations using Visitors and Intertype declarations, e.g., unparsing, metrics, interpretation, name analysis, type checking, computation of information needed for code generation, ...

F5: Nullable, First and Follow,... Abstract syntax trees
- The principles for how an LL parser works.
- Intuitive definitions: nullable, FIRST, FOLLOW.
- Construct the nullable, FIRST, and FOLLOW tables for any CFG.
- Construct the LL(1) table for a CFG.
- Decide if a grammar is LL(1) or not.
- The difference between a parse tree and an abstract syntax tree.
- The difference between a CFG and an abstract grammar.
- How to design an object-oriented abstract grammar with good names.
- Write down an abstract grammar using the JastAdd notation.
- How to build ASTs using semantic actions.
- How to build the AST when an LL parser is used.
You don't have to:
- Memorize the API for generated JastAdd classes; you may use the JastAdd manual page on the exam.
- Memorize the JJTree way for building ASTs.
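The nullable and FIRST tables mentioned under F5 are usually computed by fixed-point iteration: keep applying the rules until nothing changes. The sketch below does that for a grammar given as string arrays. The representation (`{lhs, rhs...}` with an empty right-hand side for an epsilon production) and the class name are choices made for this example, and FOLLOW is left out to keep it short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FirstSetsSketch {
    final Set<String> nonterminals = new HashSet<>();
    final Set<String> nullable = new HashSet<>();
    final Map<String, Set<String>> first = new HashMap<>();

    /** Each production is {lhs, sym1, sym2, ...}; a production with only the lhs derives the empty string. */
    FirstSetsSketch(List<String[]> productions) {
        for (String[] p : productions) {
            nonterminals.add(p[0]);
            first.putIfAbsent(p[0], new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {                       // iterate until a fixed point is reached
            changed = false;
            for (String[] p : productions) {
                Set<String> firstOfLhs = first.get(p[0]);
                boolean prefixNullable = true;  // true while every symbol seen so far is nullable
                for (int i = 1; i < p.length && prefixNullable; i++) {
                    String sym = p[i];
                    if (nonterminals.contains(sym)) {
                        if (!sym.equals(p[0])) {
                            changed |= firstOfLhs.addAll(first.get(sym));
                        }
                        prefixNullable = nullable.contains(sym);
                    } else {
                        changed |= firstOfLhs.add(sym); // a terminal starts the rest of the right-hand side
                        prefixNullable = false;
                    }
                }
                if (prefixNullable) {
                    changed |= nullable.add(p[0]);      // the whole right-hand side can derive the empty string
                }
            }
        }
    }

    public static void main(String[] args) {
        // Q -> ID R ;  R -> "." ID R ;  R -> (empty)
        FirstSetsSketch g = new FirstSetsSketch(List.of(
                new String[] {"Q", "ID", "R"},
                new String[] {"R", ".", "ID", "R"},
                new String[] {"R"}));
        System.out.println(g.nullable); // [R]
        System.out.println(g.first);    // FIRST(Q) = [ID], FIRST(R) = [.]
    }
}
```

The example grammar is a right-recursive form of the qualified identifier language, which is the shape an LL(1) parser can handle.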

F4: LL Parsing
- The different names for LL parsing.
- How to implement an LL parser by hand using recursive procedures.
- Typical kinds of grammars that an LL(1) parser cannot accept.
- Given a CFG with some of these typical problems, construct an equivalent CFG that is LL(1).
- What is the difference between local lookahead and global lookahead?
- What the dangling else problem is and how to handle it in an LL parser generator.
- Why it is sometimes useful to extend a CFG by an EOF-rule, and how to do it.
- What is meant by ambiguous and unambiguous grammars.
- Given an ambiguous grammar for expressions, construct an equivalent unambiguous grammar (given associativity and precedence rules).
- Typical kinds of unambiguous grammars that cannot be handled by an LL(1) parser. When could such grammars be LL(k)? Construct equivalent grammars that are LL(1).

F3: Context-free grammars and Parsing
- How to design a clear and simple CFG for a language (disregarding ambiguities, non-LL-ness, etc.).
- Terminology: terminals, nonterminals, productions, start symbol.
- The formal definition of a CFG, G = (N, T, P, S), and what it means.
- The different notation forms for CFGs.
- Given a grammar in EBNF form, how to construct an equivalent grammar in canonical form, and vice versa.
- What is meant by (leftmost/rightmost) derivation.
- Show that a string belongs to a given language.

F2: Regular expressions and Scanning
- Typical notation for regular expressions (REs).
- The difference between REs and CFGs.
- Typical kinds of tokens and non-tokens.
- How to define typical tokens and non-tokens using regular expressions.
- What typical ambiguities may occur for a set of token definitions? How can such ambiguities be resolved?
- What a finite automaton (FA) is.
- The difference between a deterministic and nondeterministic FA.
- How to translate an NFA to a DFA.
- How to implement a scanner based on FAs, including handling ambiguities between regular expressions.
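The F4 item about implementing an LL parser by hand using recursive procedures can be sketched as follows. The grammar, the class name `ExprParser`, and the decision to evaluate the expression directly (rather than build an AST) are assumptions made for this example. Precedence and left associativity are encoded in the grammar itself:

    expr   → term (("+" | "-") term)*
    term   → factor (("*" | "/") factor)*
    factor → NUMBER | "(" expr ")"

```java
final class ExprParser {
    private final String input;
    private int pos = 0;

    private ExprParser(String input) {
        this.input = input.replace(" ", ""); // crude scanning: drop blanks, read characters directly
    }

    /** Parses and evaluates the whole string; throws IllegalArgumentException on a syntax error. */
    static int evaluate(String text) {
        ExprParser parser = new ExprParser(text);
        int value = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return value;
    }

    /** One character of lookahead; '$' plays the role of the EOF token. */
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '$';
    }

    private int expr() {                       // expr -> term (("+" | "-") term)*
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    private int term() {                       // term -> factor (("*" | "/") factor)*
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    private int factor() {                     // factor -> NUMBER | "(" expr ")"
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 3"));   // 7
        System.out.println(evaluate("(1 + 2) * 3")); // 9
        System.out.println(evaluate("10 - 4 - 3"));  // 3
    }
}
```

Each nonterminal becomes one procedure, and each procedure decides what to do by looking at a single upcoming character, which is what makes the grammar LL(1). The EBNF repetition is what keeps the grammar free of left recursion while still giving left-associative results.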

F1: Introduction
- The typical phases in a compiler.
- The typical representations of a program inside a compiler.
- The separation into analysis and synthesis.
- The separation into front end and back end.
- Typical applications of compiler construction techniques (in addition to the typical source-to-machine code compiler).

Beyond...
Examples of compiler-related research:
- Development of programming editors: textual and graphical.
- Evaluation of reference attributes: incremental/parallel.
- Optimizing compilers for multiprocessors.
- ...
Examples of compiler-related Master's thesis projects:
- Extend the Java language: Java 7, Lambda expressions, ...
- Develop IDE for the Modelica Language (Modelon/Ideon)
- Optimize the JModelica compiler (Modelon/Ideon)
- ...
Let us know if you are interested in a Master's thesis or PhD thesis project!