Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O'Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.es




Compilers. Spring term. Mick O'Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.es Lecture 1: Introduction to Compilers

Topic 1: What is a Compiler?

A compiler is a computer program which converts source code written in a high-level programming language into another form, typically machine code. For example:

    unsigned int gcd(unsigned int a, unsigned int b) {
        if (a == 0 && b == 0)
            b = 1;
        else if (b == 0)
            b = a;
        else if (a != 0)
            while (a != b)
                if (a < b)
                    b -= a;
                else
                    a -= b;
        return b;
    }

The machine code is a sequence of instructions for the machine to perform.

Why study Compilers? Very few computer scientists actually write or modify compilers. So why is this a core subject? Some computer scientists need to understand how compilers work, so that they can write them. For this reason, the knowledge needs to be passed on. But more importantly, understanding how our computer programs are compiled and executed can help any programmer understand how the code they write drives the machine, and thus help us write faster, more effective programs.

Topic 2: History of Programming Languages

Development of Programming

FIXED HARDWARE: Early automatic machines were constructed with a single task in mind, and thus could not be programmed. Machines would process data supplied by the user, e.g., cash registers, ballot counters.

PROGRAMMABLE HARDWARE: In the 1940s, the first programmable computing machines were made. Users could change the way in which the machine processed data by configuring a number of switches on the machine.

STORED PROGRAMS: Some machines allowed operators to enter a program line by line, stored in memory. In 1944, the Harvard Mark I accepted programs stored on paper tape. In 1947-48, the magnetic drum memory was introduced as a data storage device for computers.

Development of Programming (cont.)

MACHINE CODE: The earliest programmable computers worked directly in machine code. A machine code program is a sequence of instructions; each instruction consists of an operator and (in the early days) a single argument, which would be either a value (e.g., an integer) or a reference to a register (a memory location). The CPU loads in one instruction at a time and executes that instruction; the next instruction in sequence is then loaded in. Each instruction is represented as a fixed number of bits (in the early days, 8 bits were used, although currently 64-bit instructions are common). The first 4 bits represented the operator (e.g., 00 for NOOP, 01 for ADD, 02 for MOV, etc.), and the remaining 4 bits represent the data to operate on, e.g., 01 01 = add 1 to the current sum.
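To make the operator/argument split concrete, here is a minimal decoding sketch (not from the slides), assuming an 8-bit instruction whose high 4 bits are the operator and low 4 bits the argument; the opcode numbers follow the illustrative ones above.

    # Hypothetical 4-bit opcode table, following the illustrative codes above.
    OPCODES = {0x0: "NOOP", 0x1: "ADD", 0x2: "MOV"}

    def decode(instruction):
        """Split an 8-bit instruction into (operator name, argument)."""
        opcode = (instruction >> 4) & 0xF    # high 4 bits: the operator
        argument = instruction & 0xF         # low 4 bits: the data to operate on
        return OPCODES.get(opcode, "UNKNOWN"), argument

    # The "01 01" example above: operator 1 (ADD) with argument 1.
    print(decode(0b00010001))   # -> ('ADD', 1)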

Development of Programming (cont.)

INTRODUCTION OF ASSEMBLY LANGUAGE: Programming in machine code is very slow, requiring the programmer to continually keep track of which binary code represents which operation, and to convert numbers into binary. Assembly language introduced a mnemonic representation, using symbols such as ADD, MOV or POP rather than the binary codes. A program called the ASSEMBLER then translates the assembly language program into machine code. Assembly programs also allow the inclusion of comments, which explain the program to other programmers, but are ignored in the conversion to machine code. There is a distinct assembly language for each machine type. EXAMPLE: mov al, 0x61 is translated to 10110000 01100001.

Development of Programming (cont.)

HIGH-LEVEL PROGRAMMING LANGUAGES: Assembly language is specific to a particular machine code, and thus to a particular machine type. The next step was to develop programming languages which work on any machine. Each machine provides a compiler to translate the high-level language into the machine code for that machine. Some writing in assembler still took place where optimal performance was needed (and still happens today!). 1945: Zuse developed Plankalkül (plan calculus), the first programming language. This was the predecessor of algorithmic programming languages and of concepts of logic programming; it was used to design a chess-playing program. 1949: Short Code, developed by John Mauchly, is thought to be the first high-level programming language. 1954-57: an IBM team led by John W. Backus developed FORTRAN.
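As a minimal illustration of what the assembler does with the example line above, the sketch below maps the mnemonic to its machine code bytes with a lookup table (x86's mov al, imm8 encoding is the byte 0xB0 followed by the immediate value); the table and helper function are invented for illustration, not a real assembler.

    # A toy one-line assembler: translate a mnemonic into machine code bytes
    # via a lookup table. Only the slide's example instruction is covered.
    OPCODES = {("mov", "al"): 0xB0}          # x86: mov al, imm8 -> B0 ib

    def assemble(line):
        line = line.split(";")[0].strip()    # drop comments, as assemblers do
        mnemonic, operands = line.split(None, 1)
        register, value = [part.strip() for part in operands.split(",")]
        return [OPCODES[(mnemonic, register)], int(value, 0)]

    machine_code = assemble("mov al, 0x61   ; load 0x61 into register AL")
    print(" ".join(f"{byte:08b}" for byte in machine_code))
    # -> 10110000 01100001, as in the example above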

Development of Programming (cont.)

NEW PARADIGMS: While the earliest high-level programming languages (HLPLs) were seen as sequences of instructions, the developing experience of using compilers to map from abstract languages to machine-runnable code allowed computer scientists to experiment with the way programming languages were formulated. The best known of the new programming paradigms are: Functional (Lisp, Scheme, ML, Haskell), Logical (Prolog, etc.), Object-oriented (C++, Java, CLOS, Python, etc.).

Development of Programming (cont.)

VIRTUAL MACHINES: In the standard approach, source programs are compiled to distinct machine code formats for each hardware/OS platform. An alternative approach compiles the source code to a platform-independent form of machine code, called byte code. Each platform then provides software (e.g., the Java Runtime for Windows) to execute the byte code on that platform. The byte code thus does not drive the CPU directly: the virtual machine examines the byte code to see which of its functions it should run. For this reason, the virtual machine is usually called an interpreter.
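To illustrate the dispatch idea ("the virtual machine examines the byte code to see which of its functions it should run"), here is a minimal sketch of a stack-based virtual machine with an invented three-instruction byte code; it is not the JVM or any real byte code format.

    # Invented instruction set, purely for illustration.
    PUSH, ADD, PRINT = 0, 1, 2

    def run(bytecode):
        stack, pc = [], 0
        while pc < len(bytecode):
            op = bytecode[pc]; pc += 1
            if op == PUSH:                    # next byte is a literal to push
                stack.append(bytecode[pc]); pc += 1
            elif op == ADD:                   # pop two values, push their sum
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == PRINT:                 # pop and print the top of the stack
                print(stack.pop())

    # A small program: push 100 twice, add, print the result.
    run([PUSH, 100, PUSH, 100, ADD, PRINT])   # prints 200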

Development of Programming (cont.)

INTERPRETED LANGUAGES? Programming languages such as Java, Perl, Python, Tcl/Tk, etc. are often called interpreted languages, because historically they have always been used only in virtual machines. However, more and more compilers are becoming available for these languages, which produce native machine code from the source code. For this reason, calling a language "interpreted" or "compiled" is falling out of use, as a given language can be used in both environments.

Topic 3: History of Compilers

Development of Compilers

Machine code: no need for processing; used directly.
Assembly code: translated to machine code via a direct mapping (e.g., replace the symbol MOV with the hex code 0A BC). Comments are dropped during the mapping.
High-level programs: initial approaches wrote the compiler in machine code, with knowledge of the programming language coded directly. Later approaches separated the language structure from the code, using the notion of grammars:

    PROGRAM   → begin STATEMENT* end
    STATEMENT → VAR = EXPR ;  |  print EXPR
    EXPR      → VAR  |  NUM  |  EXPR + EXPR  |  etc.

Evolution of Bugs

The term bug was used to refer to hardware problems as early as 1878, e.g., Thomas Edison to a friend: "It has been just so in all of my inventions. The first step is an intuition, and comes with a burst, then difficulties arise, this thing gives out and [it is] then that 'Bugs', as such little faults and difficulties are called, show themselves." The term was regularly applied to problems in radar electronics during WW II.
First computer bug, 1947: working on a prototype of the Mark II at Harvard, an operator finds the first computer "bug", logged at 15:45 hours on September 9, 1947: a moth that had caused a relay failure.
1962: NASA's Mariner 1 went off-course during launch.
1985-1987: four people died when exposed to lethal doses of radiation from Therac-25 linear accelerator machines, used for the treatment of cancer. Software errors caused the machines to incorrectly calculate the amount of radiation being delivered to the patient.
1989: A computer in Paris read files on traffic violations and then mistakenly sent out letters charging 41,000 traffic offenders with crimes including murder, drug trafficking, extortion, and prostitution. Recipients were described as "surprised."

Topic 4: How Compilers Work

How does a compiler work? A compiler can be viewed in two parts: 1. The Source Analyser, which takes as input the source code as a sequence of characters, and interprets it as a structure of symbols (variables, values, operators, etc.). 2. The Object Code Generator, which takes the structural analysis from (1) and produces runnable code as output.

(Diagram: source code → Source Analyser → Object Code Generator → object code.)

The Source Analyser is machine-independent, while the Object Code Generator needs to produce different code for each machine type, and is thus machine-dependent. (1) is often called the front-end of the compiler, and (2) the back-end.

How does a compiler work? The Front End typically has three stages:
Lexical Analysis: accepts the source code as a sequence of characters, and outputs the code as a sequence of tokens.
Syntax Analysis: interprets the program tokens as a structured program.
Semantic Analysis: checks that variables are instantiated before use, etc.
(Diagram: FRONT END: source code → Lexical Analysis → Syntactic Analysis → Semantic Analysis.)

The Lexical Analyser

Lexical Analysis: also called tokenising or scanning (in Spanish, morphological analysis). Main task: recognise which character sequences are tokens (variables, values, operators, etc.), e.g., A := 100; becomes A, :=, 100, ;. Secondary task: tag each token by its type, e.g., identifier, reserved-word, integer, etc. For example, the program

    begin int A; A := 100; A := A+A; output A end

is turned into the token sequence

    (reserved-word,begin) (type,int) (<id>,a) (<symb>,;)
    (<id>,a) (<mult-symb>,:=) (<cons int>,100) (<symb>,;)
    (<id>,a) (<mult-symb>,:=) (<id>,a) (<symb>,+) (<id>,a) (<symb>,;)
    (reserved-word,output) (<id>,a)
    (reserved-word,end)

Parts of translators / compilers: Lexical analyser. The tokens recognised in lexical analysis are usually of the following types: identifiers; reserved words (e.g. for, while); numeric constants (integers, real numbers, etc.); literal (or string) constants; simple symbols: operators (+, -, *, ...) and separators (;, ., ...); multiple symbols: operators (+=, -=, ...). These tokens, together with their types, become the atoms (terminals) of the next stage, syntactic analysis.
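As a rough illustration of this stage (not the course's actual lexer), here is a minimal tokeniser sketch in Python for the small example language above; the token type names follow the slides, while the regular expressions are my own guesses.

    import re

    # One regular expression per token type; the order matters (keywords
    # before identifiers, the two-character ':=' before single symbols).
    TOKEN_SPEC = [
        ("reserved-word", r"\b(?:begin|end|output)\b"),
        ("type",          r"\bint\b"),
        ("cons int",      r"\d+"),
        ("id",            r"[A-Za-z_]\w*"),
        ("mult-symb",     r":="),
        ("symb",          r"[;+\-*]"),
        ("skip",          r"\s+"),           # blanks are discarded
    ]
    MASTER = re.compile("|".join(f"(?P<g{i}>{pattern})"
                                 for i, (_, pattern) in enumerate(TOKEN_SPEC)))

    def tokenise(source):
        tokens = []
        for match in MASTER.finditer(source):
            name = TOKEN_SPEC[int(match.lastgroup[1:])][0]
            if name != "skip":
                tokens.append((name, match.group()))
        return tokens

    print(tokenise("begin int A; A := 100; A := A+A; output A end"))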

Parts of translators / compilers: Lexical analyser. Other tasks done in lexical analysis: removing excess blank spaces and comments; detecting lexical errors: badly formed symbols, badly formed identifiers, badly formed constants, badly formed comments.

Parts of translators / compilers: Lexical analyser, example. The figure shows an example of morphological (lexical) analysis of a program correctly written in the programming language asple. The morphological analyser returns, as syntactic units, pairs with: the name of the unit (reserved-word, identifier, symbol, multiple-symbol, ...) and the character sequence from the source that corresponds to that unit. For the program

    begin int A; A := 100; A := A+A; output A end

the morphological analyser (AM) produces:

    (reserved-word,begin) (type,int) (<id>,a) (<symb>,;)
    (<id>,a) (<mult-symb>,:=) (<cons int>,100) (<symb>,;)
    (<id>,a) (<mult-symb>,:=) (<id>,a) (<symb>,+) (<id>,a) (<symb>,;)
    (reserved-word,output) (<id>,a)
    (reserved-word,end)

The Syntactic Analyser

Syntactic Analysis: also called parsing; the syntactic analyser is also called the parser.
Inputs: the tokenised program produced by the lexical analyser.
Resources: a grammar defining the structure of the language, e.g.:

    PROGRAM   → begin STATEMENT* end
    STATEMENT → VAR = EXPR ;  |  print EXPR
    EXPR      → VAR  |  NUM  |  EXPR + EXPR  |  etc.

Outputs: a structural representation of the program, showing, e.g., that tokens group into a statement, that statements group into a block, etc. This data structure is called a Parse Tree, or sometimes an Intermediate Representation. The parse tree is a language-independent structure, which gives a great deal of flexibility to the code generator.
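As an illustration of how a parser can turn the token sequence into a tree, here is a minimal recursive-descent sketch for a simplified form of the grammar above (using := and output as in the asple example). It is my own illustration, not the course's parser, and the nested-tuple tree shape is an arbitrary convention.

    class Parser:
        """Recursive-descent parser over (type, text) token pairs."""
        def __init__(self, tokens):
            self.tokens, self.pos = tokens, 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

        def expect(self, text):
            kind, value = self.peek()
            if value != text:
                raise SyntaxError(f"expected {text!r}, found {value!r}")
            self.pos += 1

        def program(self):                    # PROGRAM -> begin STATEMENT* end
            self.expect("begin")
            statements = []
            while self.peek()[1] != "end":
                statements.append(self.statement())
            self.expect("end")
            return ("program", statements)

        def statement(self):
            kind, value = self.peek()
            if kind == "type":                # declaration: int <id> ;
                self.pos += 1
                _, name = self.peek(); self.pos += 1
                self.expect(";")
                return ("declare", value, name)
            if value == "output":             # output EXPR
                self.pos += 1
                return ("output", self.expr())
            self.pos += 1                     # assignment: <id> := EXPR ;
            self.expect(":=")
            node = ("assign", value, self.expr())
            self.expect(";")
            return node

        def expr(self):                       # EXPR -> term (+ term)*
            node = self.term()
            while self.peek()[1] == "+":
                self.pos += 1
                node = ("+", node, self.term())
            return node

        def term(self):                       # term -> NUM | VAR
            kind, value = self.peek()
            self.pos += 1
            return ("num", int(value)) if kind == "cons int" else ("var", value)

    tokens = [("reserved-word", "begin"), ("type", "int"), ("id", "A"), ("symb", ";"),
              ("id", "A"), ("mult-symb", ":="), ("cons int", "100"), ("symb", ";"),
              ("reserved-word", "output"), ("id", "A"), ("reserved-word", "end")]
    print(Parser(tokens).program())
    # ('program', [('declare', 'int', 'A'), ('assign', 'A', ('num', 100)),
    #              ('output', ('var', 'A'))])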

Parts of translators / compilers: Syntactic analysis, example. (Figure: the parse tree built by the syntactic analyser for the asple program above, with <Program> at the root expanding to begin <declrcns> ; <stmts> end, and the declaration int <id> A, the assignments <id> := <expr> for A := 100 and A := A + A, and the output statement output <expr> as subtrees over the token sequence.)

Syntactic Analysis: other functions. The parser is also responsible for identifying syntactic errors in the code, i.e., places where a sequence of tokens does not match the syntax rules of the language, e.g., a missing ; at the end of a statement in Java.

Parts of translators / compilers: Lexical vs. Syntactic analyser. The distinction between the lexical analyser and the syntax analyser is somewhat arbitrary: it should be possible to write a context-free grammar and implement a pushdown automaton that recognises the complete language. However, some of the syntactic elements of programming languages (e.g. comments, constants, names) belong to a simpler class of languages, are usually easy to describe with regular expressions, and hence are recognisable with finite automata. Therefore, it is usually a good idea to split the analysis into two steps.

The Semantic Analyser

Semantic Analysis: performs checks on program consistency: operations have arguments of allowed types; variables are initialised before being referenced; function calls receive the correct number of arguments; etc. This typically requires a global view of the program.

The Symbol Table
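As a minimal sketch of one of the checks mentioned above (variables declared and initialised before being referenced), the following walks the nested-tuple parse tree used in the parser sketch earlier; the tree shape and error messages are my own illustration.

    def check(program):
        declared, initialised, errors = set(), set(), []

        def check_expr(expr):
            if expr[0] == "var":
                name = expr[1]
                if name not in declared:
                    errors.append(f"{name} is used but never declared")
                elif name not in initialised:
                    errors.append(f"{name} is used before being initialised")
            elif expr[0] == "+":              # check both operands of an addition
                check_expr(expr[1])
                check_expr(expr[2])

        for stmt in program[1]:
            if stmt[0] == "declare":          # ("declare", type, name)
                declared.add(stmt[2])
            elif stmt[0] == "assign":         # ("assign", name, expr)
                check_expr(stmt[2])
                if stmt[1] not in declared:
                    errors.append(f"{stmt[1]} is assigned but never declared")
                initialised.add(stmt[1])
            elif stmt[0] == "output":         # ("output", expr)
                check_expr(stmt[1])
        return errors

    tree = ("program", [("declare", "int", "A"),
                        ("assign", "A", ("num", 100)),
                        ("assign", "A", ("+", ("var", "A"), ("var", "A"))),
                        ("output", ("var", "A"))])
    print(check(tree))   # -> [] : the example program passes both checks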

Parts of translators / compilers: the Symbol Table. The symbol table stores all the information necessary to determine that the program is correct, and to generate the code: names and types of variables; names of procedures, with the types of their arguments and return values; names of objects, packages, modules, etc. It is shared between the three modules of the front end (lexical, syntactic and semantic analysis).

Parts of translators / compilers: Symbol Table. Definition: the symbol table is a data structure which holds information on the identifiers in the program (variable names, function names, etc.). Implementation: the symbol table is usually implemented with a data structure (typically a hash table) that allows for efficient execution of the two basic operations: inserting information and retrieving information.
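A minimal sketch of such a symbol table, backed by a hash table (a Python dict) and supporting the two operations just mentioned; the stored fields are only illustrative (a real compiler would also keep scope, memory address, etc.).

    class SymbolTable:
        def __init__(self):
            self._entries = {}                   # identifier name -> attribute dict

        def insert(self, name, **attributes):
            """Add or update the information stored for an identifier."""
            self._entries.setdefault(name, {}).update(attributes)

        def lookup(self, name):
            """Return the attributes of an identifier, or None if it is unknown."""
            return self._entries.get(name)

    table = SymbolTable()
    table.insert("A", kind="<id>", type="int")   # recorded at the declaration
    table.insert("A", value=100)                 # updated during later analysis
    print(table.lookup("A"))
    # -> {'kind': '<id>', 'type': 'int', 'value': 100}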

Parts of translators / compilers: Symbol Table, example. For the program begin int A; A := 100; A := A+A; output A end, after lexical and syntactic analysis the symbol table contains an entry for the identifier A (<id>), of type int, together with its associated value.

The Front End: summary. (Diagram: FRONT END: source code → Lexical Analysis → Syntactic Analysis → Semantic Analysis, all three sharing the Symbol Table.)

The Back End

The Back End typically has two stages:
Code Generator: generates object code from the parse tree (typically machine code).
Optimiser: recognises structures in the machine code which can be made more efficient, and changes them.
(Diagram: BACK END: Code Generator → Optimiser → object code.)

The Back End

The Back End sometimes has an initial pre-processing step, which breaks any expressions into their simplest components. For example, the assignment a := 1 + 2 * 3 would be broken into: temp := 2 * 3; a := 1 + temp;. Such expressions are called binary expressions, and they facilitate the generation of assembler language code. Compilers that translate from one high-level language to another often do not contain this step. This step sometimes also performs machine-independent optimisations.
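A minimal sketch of this pre-processing step, flattening a nested expression into assignments with at most one operator each and introducing temporaries; the tuple representation of expressions is my own convention for illustration.

    import itertools

    def flatten(target, expr, out, temps=None):
        """Emit simple assignments that compute expr, ending with target := ..."""
        temps = temps if temps is not None else itertools.count(1)
        if isinstance(expr, tuple):            # (operator, left, right)
            op, left, right = expr
            left = operand(left, out, temps)
            right = operand(right, out, temps)
            out.append(f"{target} := {left} {op} {right}")
        else:                                  # a bare constant or variable
            out.append(f"{target} := {expr}")

    def operand(expr, out, temps):
        if isinstance(expr, tuple):            # nested expression: compute it into a temp
            temp = f"temp{next(temps)}"
            flatten(temp, expr, out, temps)
            return temp
        return str(expr)

    code = []
    flatten("a", ("+", 1, ("*", 2, 3)), code)  # a := 1 + 2 * 3
    print("; ".join(code))                     # temp1 := 2 * 3; a := 1 + temp1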

The Back End

A different back end is required for each object language: there is distinct code to generate, and optimisation depends on the target language. Each target machine type has different machine code; thus, the back end differs for each machine type.
(Diagram: BACK END 1: Code Generator 1 → Optimiser 1 → Object Code 1; BACK END 2: Code Generator 2 → Optimiser 2 → Object Code 2.)

Parts of translators / compilers: The Back End, optimisation. It is difficult for a compiler to generate the target code in a way that fully takes advantage of the resources (memory, cache, CPU, ...) in an efficient way. These last modules try to reorganise and rewrite portions of the code in order to mitigate this problem.

Parts of translators / compilers: The Back End, optimisation example. The translator might have generated, initially, the following version of the program in assembler:

    segment .data
    _A dd 0
    segment .codigo
    global _main
    _main:
    push dword 100
    pop eax
    mov [_A], eax
    push dword [_A]
    push dword [_A]
    POP edx
    POP eax
    add eax,edx
    push eax
    pop eax
    mov [_A], eax
    push dword [_A]
    pop eax
    push eax
    call imprime_entero
    add esp, 4
    call imprime_fin_linea
    ret

The optimiser might realise that, in this case, the management of arithmetic expressions has produced two instructions that are redundant (as the value is already in EAX) and can be removed, giving:

    segment .data
    _A dd 0
    segment .codigo
    global _main
    _main:
    push dword 100
    pop eax
    mov [_A], eax
    push dword [_A]
    POP edx
    add eax,edx
    push eax
    pop eax
    mov [_A], eax
    push dword [_A]
    pop eax
    push eax
    call imprime_entero
    add esp, 4
    call imprime_fin_linea
    ret
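The slides' optimiser relies on knowing that EAX already holds the value of _A. As a simpler, general illustration of the same peephole idea, the sketch below scans adjacent instructions and removes push/pop pairs that cancel out (such as the push eax / pop eax pair still present in the listing above); it is my own illustration, not the course's optimiser.

    def peephole(lines):
        """Remove adjacent 'push X' / 'pop X' pairs, which leave the machine state unchanged."""
        out = []
        for line in lines:
            instr = line.strip().lower()
            if out:
                prev = out[-1].strip().lower()
                if prev.startswith("push ") and instr == "pop " + prev[len("push "):]:
                    out.pop()              # drop the earlier push...
                    continue               # ...and skip the matching pop
            out.append(line)
        return out

    fragment = ["add eax,edx", "push eax", "pop eax", "mov [_A], eax"]
    print(peephole(fragment))   # -> ['add eax,edx', 'mov [_A], eax']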

Parts of Translators / Compilers: Error recovery. Most translators are not only used for translating from a high-level programming language into another language; they are also a tool for software development, showing the programmer the bugs in their program. Therefore, one of the major functions of translators is the diagnostics they provide in the case of coding mistakes. Compilers should also be designed so that they don't just stop at the first bug, but can recover from the error and locate other bugs.

Parts of Translators / Compilers: Complete example. The source program begin int A; A := 100; A := A+A; output A end passes through the morphological (lexical) analyser, which produces the token sequence shown earlier; then the syntactic analyser; and then code generation, optimisation and memory management, which produce as object code the assembler listing shown above.

Topic 5: Interpreters

Compilers vs. Interpreters: general concepts. The back end of a compiler translates the internal representation of the program into object code in a file. An interpreter can have the same front end, but rather than converting the internal representation, it knows how to execute it directly. An interpreter thus allows for on-the-fly execution of lines of code provided by a programmer. It is sometimes used in an interactive environment with a user, but it also allows scripts (source code) to be executed as a program without intervening compilation.

Compilers vs. Interpreters: example. If a session of an interpreter received, from the user, the program statements from the previous examples, the interpreter window would show the statements int A, A := 100, A := A + A and output A entered one by one between the start and the end of the session, with the output 200.

Interpreters: parts of an interpreter. The main parts of an interpreter are: the symbol table, the morphological (lexical) analyser, the syntax analyser, the semantic analyser, code execution, and memory management.
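To illustrate the "execute it directly" idea, here is a minimal tree-walking execution sketch over the nested-tuple statements used in the earlier sketches; the representation is my own convention, not the course's interpreter.

    def evaluate(expr, memory):
        if expr[0] == "num":                     # ("num", value)
            return expr[1]
        if expr[0] == "var":                     # ("var", name)
            return memory[expr[1]]
        if expr[0] == "+":                       # ("+", left, right)
            return evaluate(expr[1], memory) + evaluate(expr[2], memory)
        raise ValueError(f"unknown expression {expr!r}")

    def execute(statements):
        memory = {}                              # variable name -> current value
        for stmt in statements:
            if stmt[0] == "declare":             # ("declare", type, name)
                memory[stmt[2]] = 0
            elif stmt[0] == "assign":            # ("assign", name, expr)
                memory[stmt[1]] = evaluate(stmt[2], memory)
            elif stmt[0] == "output":            # ("output", expr)
                print(evaluate(stmt[1], memory))

    # The session from the example above: prints 200.
    execute([("declare", "int", "A"),
             ("assign", "A", ("num", 100)),
             ("assign", "A", ("+", ("var", "A"), ("var", "A"))),
             ("output", ("var", "A"))])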

Topic 6: Other Topics

Other concepts: single- and multiple-pass compilers.
Pass: a complete traversal of the source program, for some specific purpose.
One-pass compiler: a compiler that performs all of the translation with just one pass over the source code.
Multi-pass compiler: a compiler that performs more than one pass. These usually impose fewer restrictions on the source language.