Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages



Similar documents
How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

CSCI 3136 Principles of Programming Languages

Language Processing Systems

Intel Assembler. Project administration. Non-standard project. Project administration: Repository

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Programming Languages

CA Compiler Construction

Compiler Construction

Compiler Construction

Introduction to formal semantics -

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Compiler I: Syntax Analysis Human Thought

03 - Lexical Analysis

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Syntaktická analýza. Ján Šturc. Zima 208

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

Static vs. Dynamic. Lecture 10: Static Semantics Overview 1. Typical Semantic Errors: Java, C++ Typical Tasks of the Semantic Analyzer

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

Computer Programming. Course Details An Introduction to Computational Tools. Prof. Mauro Gaspari:

Semantic Analysis: Types and Type Checking

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Programming Languages

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Object-Oriented Software Specification in Programming Language Design and Implementation

Programming Languages CIS 443

Programming Languages

Semester Review. CSC 301, Fall 2015

1/20/2016 INTRODUCTION

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Context free grammars and predictive parsing

Lecture 9. Semantic Analysis Scoping and Symbol Table

If-Then-Else Problem (a motivating example for LR grammars)

The previous chapter provided a definition of the semantics of a programming

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

X86-64 Architecture Guide

Chapter 1. Dr. Chris Irwin Davis Phone: (972) Office: ECSS CS-4337 Organization of Programming Languages

FTP client Selection and Programming

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

The programming language C. sws1 1

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Textual Modeling Languages

CSE 307: Principles of Programming Languages

n Introduction n Art of programming language design n Programming language spectrum n Why study programming languages? n Overview of compilation

Lecture 27 C and Assembly

Automated Repair of Binary and Assembly Programs for Cooperating Embedded Devices

Assembly Language: Function Calls" Jennifer Rexford!

High level code and machine code

The Hardware/Software Interface CSE 351 Spring 2016

Natural Language to Relational Query by Using Parsing Compiler

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

2) Write in detail the issues in the design of code generator.

Organization of DSLE part. Overview of DSLE. Model driven software engineering. Engineering. Tooling. Topics:

Introduction to Automata Theory. Reading: Chapter 1

5HFDOO &RPSLOHU 6WUXFWXUH

Software Fingerprinting for Automated Malicious Code Analysis

Unified Language for Network Security Policy Implementation

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

CS 40 Computing for the Web

Machine-Level Programming II: Arithmetic & Control

Chapter 6: Programming Languages

Programming Language Pragmatics

C Compiler Targeting the Java Virtual Machine

Compiler and Language Processing Tools

Approximating Context-Free Grammars for Parsing and Verification

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

A Tiny Guide to Programming in 32-bit x86 Assembly Language

Master of Sciences in Informatics Engineering Programming Paradigms 2005/2006. Final Examination. January 24 th, 2006

A Programming Language Where the Syntax and Semantics Are Mutable at Runtime

Instruction Set Architecture

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Pattern-based Program Visualization

SOFTWARE TOOL FOR TRANSLATING PSEUDOCODE TO A PROGRAMMING LANGUAGE

The Mjølner BETA system

Implementing Programming Languages. Aarne Ranta

[Refer Slide Time: 05:10]

English Grammar Checker

Architectural Design Patterns for Language Parsers

Design Patterns in Parsing

Departamento de Investigación. LaST: Language Study Tool. Nº 143 Edgard Lindner y Enrique Molinari Coordinación: Graciela Matich

Where we are CS 4120 Introduction to Compilers Abstract Assembly Instruction selection mov e1 , e2 jmp e cmp e1 , e2 [jne je jgt ] l push e1 call e

CSC Software II: Principles of Programming Languages

Artificial Intelligence. Class: 3 rd

Chapter 7: Functional Programming Languages

Return-oriented programming without returns

Symbol Tables. Introduction

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

NATURAL LANGUAGE TO SQL CONVERSION SYSTEM

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Compiler Design July 2004

How Compilers Work. by Walter Bright. Digital Mars

Chapter 12 Programming Concepts and Languages

IT UNIVERSITY OF COPENHAGEN. Abstract. Department of Software Development and Technology (SDT) Master s Thesis. Generic deobfuscator for Java

Computer Science 281 Binary and Hexadecimal Review

Pattern-based Program Visualization

Lab Experience 17. Programming Language Translation

Transcription:

Introduction Compiler esign CSE 504 1 Overview 2 3 Phases of Translation ast modifled: Mon Jan 28 2013 at 17:19:57 EST Version: 1.5 23:45:54 2013/01/28 Compiled at 11:48 on 2015/01/28 Compiler esign Introduction CSE 504 1 / 33 What is a Compiler? Overview Programming problems are easier to solve in high-level languages High-level languages are closer to the problem domain E.g. Java, Python, SQ, Tcl/Tk,... Solutions have to be executed by a machine Instructions to a machine are specified in a language that reflects to the cycle-by-cycle working of a processor Compilers are the bridges: Software that translates programs written in high-level languages to efficient executable code. Compiler esign Introduction CSE 504 2 / 33

Overview An Example int gcd(int m, int n) { if (m == 0) return n; else if (m > n) return gcd(n, m); else return gcd(n%m, m); } _gcd: FB2: pushq %rbp CFI0: movq %rsp, %rbp CFI1: movl %edi, %ecx movl %esi, %edi testl %ecx, %ecx jne 11 jmp 3.align 4,0x90 13: 11: 6: 3: movl movl cmpl jg movl sarl idivl movl testl jne movl leave ret %edx, %ecx %edi, %edx %edi, %ecx 6 %edi, %eax $31, %edx %ecx %ecx, %edi %edx, %edx 13 %edi, %eax Compiler esign Introduction CSE 504 3 / 33 Requirements Overview In order to translate statements in a language, one needs to understand both the structure of the language: the way sentences are constructed in the language, and the meaning of the language: what each sentence stands for. Terminology: Structure Syntax Meaning Semantics Compiler esign Introduction CSE 504 4 / 33

Overview Translation Strategy Classic Software Engineering Problem Objective: Translate a program in a high level language into efficient executable code. Strategy: ivide translation process into a series of phases Each phase manages some particular aspect of translation. Interfaces between phases governed by specific intermediate forms. Compiler esign Introduction CSE 504 5 / 33 Translation Process Source Program Syntax Analysis Abstract Syntax Tree Semantic Processing Target Program Compiler esign Introduction CSE 504 6 / 33

Translation Steps Syntax Analysis Phase: Recognizes sentences in the program using the syntax of the language Semantic Analysis Phase: Infers information about the program using the semantics of the language Intermediate Code Generation Phase: Generates abstract code based on the syntactic structure of the program and the semantic information from Phase 2. Optimization Phase: Refines the generated code using a series of optimizing transformations. Final Code Generation Phase: Translates the abstract intermediate code into specific machine instructions. Compiler esign Introduction CSE 504 7 / 33 Structure of a Compiler: an Analogy : the structure (syntax) of a sentence in a language is used to give it a meaning (semantics). Bawat tao y isinilang na may laya at magkakapantay ang taglay na dangal at karapatan. Green wire connect after first not cut white also red wire. He sailed the coffee out of the leaf. This sentence has four words. Compiler esign Introduction CSE 504 8 / 33

Syntax efining and Recognizing Sentences in a anguage ayered approach Alphabet: defines allowed symbols exical Structure: defines allowed words Syntactic Structure: defines allowed sentences We will later associate meaning with sentences (semantics) based on their syntactic structure. Compiler esign Introduction CSE 504 9 / 33 Formal anguage Specification Solid theoretical results applied to a practical problem. eclarative vs. Operational Notations eclarative notation is used to define a language efines precisely the set of allowed objects (words/sentences) Examples: Regular expressions, Grammars. Operational notation is used to recognize statements in a language efines an algorithm for determining whether or not a given word/sentence is in the language Example: Automata Results from theory on converting between the two notations. Compiler esign Introduction CSE 504 10 / 33

Formal anguages A language is a set of strings over a set of symbols. The set of symbols of a language is called its alphabet (usually denoted by Σ. Each string in the language is called a sentence. Parts of sentences are called phrases. Compiler esign Introduction CSE 504 11 / 33 Context-Free Grammars A well-studied notation for defining formal languages. A Context Free Grammar (CFG, or grammar unless otherwise qualified) is defined over an alphabet, called terminal symbols. A CFG is defined by a set of productions. Each production is of the form X β where X is a single non-terminal symbol representing a set of phrases in the language, and β is a sequence of terminal and non-terminal symbols Example: Stmt while Expr do Stmt A unique non-terminal, called the start symbol, represents the set of all sentences in the language. The language defined by a grammar G is denoted by (G). Compiler esign Introduction CSE 504 12 / 33

Example Grammar ist of digits separated by + and signs (Example 2.1 in book): erivation of 9-5+2 from : + - 0 1... 9 = - = - = 9 - = 9 - + = 9 - + = 9-5 + = 9-5 + = 9-5 + 2 Compiler esign Introduction CSE 504 13 / 33 Parse Trees Pictorial representation of derivations = - = - = 9 - = 9 - + = 9 - + = 9-5 + = 9-5 + = 9-5 + 2 - + 9 5 2 = - = - + = - + = - + 2 = - + 2 = - 5 + 2 = - 5 + 2 = 9-5 + 2 Note: one parse tree may correspond to multiple derivations! Compiler esign Introduction CSE 504 14 / 33

Ambiguity A grammar is ambiguous if some sentence in the language has more than one parse tree. - + + - 9 5 2 9 5 2 Compiler esign Introduction CSE 504 15 / 33 Associativity and Precedence 9-5+2 (9-5)+2 9-5-2 (9-5)-2 9+5+2 (9+5)+2 + and usually have the same precedence and are left-associative. i.e. the second parse tree in the previous slide is the correct one The grammar can be changed to reflect the associativity and precedence: + - 0 1... 9 Compiler esign Introduction CSE 504 16 / 33

Schemes A notation that attaches program fragments (also called actions) to productions in a grammar. The intuition is, whenever a production is used in recognizing a sentence, the corresponding action will be taken. Example: + {add} - {sub} 0 {push 0} 1 {push 1}. Compiler esign Introduction CSE 504 17 / 33 Actions can be seen as additional leaves introduced into a parse tree. Example: Reading the actions left-to-right in the tree gives the translation. + {add} - {sub} 0 {push 0} 1 {push 1}. - 9 + 5 2 {push 9} {push 5} {sub} {push 2} {add} Compiler esign Introduction CSE 504 18 / 33

Grammars for anguage Specification The language (i.e. set of allowed strings) of most programming languages can be specified using CFGs. The grammar notation may be tedious for some aspects of a language. For instance, an integer is defined by a grammar of the following form: I P + P - P P P P 0 1... 9 For simpler fragments, the notation of regular expressions may be used. I = (+ -)?[0 9]+ Compiler esign Introduction CSE 504 19 / 33 Syntax Analysis in Practice Usually divided into exical Analysis followed by Parsing. exical Analysis: A lexical analyzer converts a stream of characters into a stream of tokens. Each token has a name (associated with terminal symbols) and a value (also called its attribute). A lexical analyzer is specified by a set of regular expression patterns and actions that are executed when the patterns are matched. Parsing: A parser converts a stream of tokens into a tree. Parsing uncovers the structure of a sentence in the language. Parsers are specified by grammars (actually, by translation schemes which are sets of productions associated with actions). Compiler esign Introduction CSE 504 20 / 33

Translation Process Phases of Translation Source Program Syntax Analysis Abstract Syntax Tree Semantic Processing Target Program Compiler esign Introduction CSE 504 21 / 33 Syntax Analysis Phases of Translation Source Program exical Analysis Token Stream Parsing Abstract Syntax Tree Compiler esign Introduction CSE 504 22 / 33

Phases of Translation exical Analysis First step of syntax analysis Objective: Convert the stream of characters representing input program into a sequence of tokens. Tokens are the words of the programming language. Examples: The sequence of characters static int is recognized as two tokens, representing the two words static and int. The sequence of characters *x++ is recognized as three tokens, representing *, x and ++. Compiler esign Introduction CSE 504 23 / 33 Parsing Phases of Translation Second step of syntax analysis Objective: Uncover the structure of a sentence in the program from a stream of tokens. For instance, the phrase x = +y, which is recognized as four tokens, representing x, = and + and y, has the structure =(x, +(y)), i.e., an assignment expression, that operates on x and the expression +(y). Output: A tree called abstract syntax tree that reflects the structure of the input sentence. Compiler esign Introduction CSE 504 24 / 33

Phases of Translation Abstract Syntax Tree (AST) Represents the syntactic structure of the program, hiding a few details that are irrelevent to later phases of compilation. For instance, consider a statement of the form: if (m == 0) S1 else S2 where S1 and S2 stand for some block of statements. A possible AST for this statement is: == If then else m 0 AST for S1 AST for S2 Compiler esign Introduction CSE 504 25 / 33 Semantic Processing Phases of Translation Abstract Syntax Tree Type Checking Intermediate Code Generation Code Optimization Final Code Generation Target Program Semantic Analysis Code Generation Compiler esign Introduction CSE 504 26 / 33

Phases of Translation Type Checking A instance of Semantic Analysis Objective: ecorate the AST with semantic information that is necessary in later phases of translation. For instance, the AST == If then else m 0 AST for S1 AST for S2 is transformed into If then else == : boolean m 0 : integer : integer AST for S1 AST for S2 Compiler esign Introduction CSE 504 27 / 33 Phases of Translation Intermediate Code Generation Objective: Translate each sub-tree of the decorated AST into intermediate code. Intermediate code hides many machine-level details, but has instruction-level mapping to many assembly languages. Main motivation for using an intermediate code is portability. Compiler esign Introduction CSE 504 28 / 33

Phases of Translation Intermediate Code Generation, an Example m If then else == : boolean 0 : integer : integer AST for S1 AST for S2 = loadint m loadimmed 0 intequal jmpnz.1 jmp.2.1:... code for S1 jmp.3.2:... code for S2 jmp.3.3: Compiler esign Introduction CSE 504 29 / 33 Code Optimization Phases of Translation Objective: Improve the time and space efficiency of the generated code. Usual strategy is to perform a series of transformations to the intermediate code, with each step representing some efficiency improvement. Peephole optimizations: generate new instructions by combining/expanding on a small number of consecutive instructions. Global optimizations: reorder, remove or add instructions to change the structure of generated code. Compiler esign Introduction CSE 504 30 / 33

Phases of Translation Code Optimization, an Example loadint m loadimmed 0 intequal jmpz.1 jmp.2.1:... code for S1 jmp.3.2:... code for S2 jmp.3.3: = loadint m jmpnz.2.1:... code for S1 jmp.3.2:... code for S2.3: Compiler esign Introduction CSE 504 31 / 33 Final Code Generation Phases of Translation Objective: Map instructions in the intermediate code to specific machine instructions. Supports standard object file formats. Generates sufficient information to enable symbolic debugging. Compiler esign Introduction CSE 504 32 / 33

Phases of Translation Final Code Generation, an Example loadint m jmpnz.2.1:... code for S1 jmp.3.2:... code for S2.3: = movl 8(%ebp), %esi testl %esi, %esi jne.2.1:... code for S1 jmp.3.2:... code for S2.3: Compiler esign Introduction CSE 504 33 / 33