Theory and Compiling COMP360

Similar documents
Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

03 - Lexical Analysis

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

Compiler Construction

The Halting Problem is Undecidable

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Introduction to Automata Theory. Reading: Chapter 1

Compiler I: Syntax Analysis Human Thought

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Programming Languages CIS 443

6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, Class 4 Nancy Lynch

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

Pushdown Automata. International PhD School in Formal Languages and Applications Rovira i Virgili University Tarragona, Spain

Automata and Computability. Solutions to Exercises

Finite Automata. Reading: Chapter 2

Scanner. tokens scanner parser IR. source code. errors

Introduction to Finite Automata

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Implementation of Recursively Enumerable Languages using Universal Turing Machine in JFLAP

Regular Languages and Finite State Machines

CSCI 3136 Principles of Programming Languages

Automata Theory. Şubat 2006 Tuğrul Yılmaz Ankara Üniversitesi

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Lecture 18 Regular Expressions

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Finite Automata. Reading: Chapter 2

Semester Review. CSC 301, Fall 2015

Philadelphia University Faculty of Information Technology Department of Computer Science First Semester, 2007/2008.

Overview of E0222: Automata and Computability

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

FINITE STATE AND TURING MACHINES

C H A P T E R Regular Expressions regular expression

Regular Expressions and Automata using Haskell

Fast nondeterministic recognition of context-free languages using two queues

SRM UNIVERSITY FACULTY OF ENGINEERING & TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF SOFTWARE ENGINEERING COURSE PLAN

Automata and Formal Languages

Computer Science Theory. From the course description:

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

6.080/6.089 GITCS Feb 12, Lecture 3

If-Then-Else Problem (a motivating example for LR grammars)

Properties of Stabilizing Computations

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

3515ICT Theory of Computation Turing Machines

CA Compiler Construction

NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions

C Compiler Targeting the Java Virtual Machine

Language Processing Systems

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

Turing Machines: An Introduction

1/20/2016 INTRODUCTION

Programming Languages

Programming Language Pragmatics

Chapter 2: Elements of Java

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

CS 301 Course Information

Lab Experience 17. Programming Language Translation

The programming language C. sws1 1

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

Theory of Computation Chapter 2: Turing Machines

Syntaktická analýza. Ján Šturc. Zima 208

Teaching Pre-Algebra in PowerPoint

Lecture 9. Semantic Analysis Scoping and Symbol Table

5HFDOO &RPSLOHU 6WUXFWXUH

n Introduction n Art of programming language design n Programming language spectrum n Why study programming languages? n Overview of compilation

Course Manual Automata & Complexity 2015

Informatique Fondamentale IMA S8

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Turing Machines, Part I

2) Write in detail the issues in the design of code generator.

Division of Mathematical Sciences

Genetic programming with regular expressions

Compiler Construction

Compilers Lexical Analysis

Optimization of SQL Queries in Main-Memory Databases

Being Regular with Regular Expressions. John Garmany Session

Model 2.4 Faculty member + student

Computer Architecture Syllabus of Qualifying Examination

T Reactive Systems: Introduction and Finite State Automata

Finite Automata and Regular Languages

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Introduction to Turing Machines

CAs and Turing Machines. The Basis for Universal Computation

THEORY of COMPUTATION

How To Compare A Markov Algorithm To A Turing Machine

University of Wales Swansea. Department of Computer Science. Compilers. Course notes for module CS 218

Regular Languages and Finite Automata

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Programming Languages

A Programming Language Where the Syntax and Semantics Are Mutable at Runtime

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

24 Uses of Turing Machines

Acknowledgements. In the name of Allah, the Merciful, the Compassionate

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Increasing Interaction and Support in the Formal Languages and Automata Theory Course

Syllabus of. BCA III (Bachelor of Computer Application) Semester-V

Transcription:

Theory and Compiling COMP360

It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell

Reading Read sections 3.1 3.3 in the textbook by Friday

Language Recognition We say a theoretical machine can recognize a language if it can correctly identify a string of characters as being a syntactically correct program If a theoretical machine can recognize a language, it can identify input that is not in the proper form, i.e. missing a semicolon or improper brackets

Chomsky Hierarchy There are four classes of languages Each class is a subset of another class Regular languages are the simplest while r.e. languages are the most complex

Languages and Machines Language Regular Expressions Context Free Context Sensitive Recursively Enumerable Machine Deterministic Finite State Automata (DFA) Push Down Automata (PDA) Bounded Turing Machine Turing Machine

Finite state automaton A finite state automaton (FSA) is a graph with directed labeled arcs, two types of nodes (final and non-final state), and a unique start state: This is also called a state machine What strings, starting in state A, end up at state C The language accepted by machine M is set of strings that move from start node to a final node Programming Language design and Implementation -4th Edition Copyright Prentice Hall, 2000

Example FSA This FSA recognizes a Java or C++ character string Remember that \ is an escape sequence that allows you to put a quote character in a string

Deterministic FSAs Deterministic FSA: For each state and for each input symbol, there is exactly one transition Non-deterministic FSA (NDFSA): Remove this restriction At each node there is 0, 1, or more than one transition for each alphabet symbol A string is accepted if there is some path from the start state to some final state Example nondeterministic FSA (NDFSA): 01 is accepted via path: ABD even though 01 also can take the paths: ACC or ABC and C is not a final state Programming Language design and Implementation -4th Edition Copyright Prentice Hall, 2000

Equivalence of FSA and NDFSA Important early result: NDFSA = DFSA Let subsets of states be states in DFSA. Keep track of which subset you can be in. Any string from {A} to either {D} or {CD} represents a path from A to D in the original NDFSA. Programming Language design and Implementation -4th Edition Copyright Prentice Hall, 2000

FSA and Regular Languages An FSA can recognize any regular language Regular languages are often used for pattern matching or searches A popular tool for searching files for a word is grep, General Regular Expression Parser

Regular Expressions Regular expressions are often used to specify a regular language A regular expression defines a series of symbols in the order they can appear in the language Consider a very simple language with only the letters a and b aabb is a regular expression for a string with two letters a followed by two b s

Regular Expression Alternatives A regular expression can specify alternatives using the vertical bar character, aabb ba defines a regular language that accepts two a s then two b s or a string with just a b and an a You can use parenthesis to group strings (aabb ba)a defines a regular language that accepts aabba or baa

Regular Expression Repetitions A symbol or (group) might repeat multiple times in a valid string of the language A * indicates that a symbol or (group) repeats zero, one or many times A + indicates that a symbol or (group) repeats once or many times, but at least once a 0..1 mean the symbol may appear zero or 1 time (aabb ba) + a defines the strings aabbaabba, aabbbaa, baa and more

Which strings are not (aa bb)*a + A. a B. bbaabba C. aaa D. aabb E. bbbbbbbbbbaaaaaaa

Write a Regular Expression Write a regular expression to define a string that begins and ends with an a and can have an even number of b s between them

Possible Solution Write a regular expression to define a string that begins and ends with an a and can have an even number of b s between them a (bb)* a

Equivalent A FSA and a regular expression can define a regular language Regular expressions can be recognized by a regular language

Converting a Regular Expression to a FSA Consider the regular expression a (bb)* a It can be recognized by the FSA a b b a a b

Try it Write a regular expression that defines a double number, such as 12.3 or -12.3 Use n to represent a numerical digit, 0..9

Possible Solution Write a regular expression that defines a double number, such as 12.3 or -12.3 Use n to represent a numerical digit, 0..9-0..1 n +. n*

Degenerate Case 123. and.123 are both valid numbers A single period is not a valid number There must be a digit before or after the decimal point A possible solution might be - 0..1 ((n +. n*) (n*. n + ))

FSA Limits The only memory that a FSA has is the current state As a FSA inputs symbols, it moves to different states An FSA cannot count or compare an arbitrary number of symbols

Push Down Automata A PDA has a FSA and a stack memory The top of the stack, input symbol and FSA state determine what a PDA will do A PDA can: Push a new symbol on the stack Pop the top symbol from the stack Change the FSA state

Push Down Automata Languages A Push Down Automata (PDA) can recognize context free grammars Most programming languages are context free grammars All regular languages are a subset of context free grammars A PDA can count things

Turing Machine A Turing Machine (TM) has a FSA and an infinite tape TM can take an action based on the current symbol on the tape and the current FSA state Overwrite the symbol on the tape Move left or right Change the FSA sate

Turing Power A Turing machine can recognize a n b n c n where n is the same for a, b and c Consider a Turing machine that starts at the first a: It changes the a to an X to moves to the first b It changes the b to a Y and move back to the second (now first) a Continue changing a and b. If you run out of one symbol before the other, then it is not in the language Repeat this process for Y and c

Programming Languages Almost all modern programming languages are context free grammars The syntax of a context free programming language can be recognized by a push down automata Nobody has developed a programming language more complicated than a context free grammar

Compiler s Purpose The compiler converts the program source code into a form that can be executed by the hardware The compiler works with the language libraries and the system linker

Output of a Compiler Most language systems compile the source into an object file of machine language which is linked into an executable Some languages, like Java and C#, are compiled to an intermediate language which is interpreted A compiler can output a high level language Some systems interpret and execute the source code directly

Run Time Compilers Java and C# compilers create an intermediate language that can be interpreted by a virtual machine Most virtual machines contain a Just In Time (JIT) compiler that compiles the intermediate language into machine language for efficiency

Stages of a Compiler Source preprocessing Lexical Analysis (scanning) Syntactic Analysis (parsing) Semantic Analysis Optimization Code Generation Link to libraries

Source Preprocessing In C and C++, preprocessor statements begin with a # The preprocessor edits the source code based on the preprocessor statements #include is the same as copying the included file at that point with the editor The output of the preprocessor is expanded source code with no # statements Old C compilers had a separate preprocessor program

Lexical Analysis Lexical Analysis or scanning reads the source code (or expanded source code) It removes all comments and white space The output of the scanner is a stream of tokens Tokens can be words, symbols or character strings A scanner can be a finite state automata

Syntactic Analysis Syntactic Analysis or parsing reads the stream of tokens created by the scanner It checks that the language syntax is correct The output of the Syntactic Analyzer is a parse tree The parser can be implemented by a context free grammar stack machine

Semantic Analysis The Semantic Analysis inputs the parse tree from the parser This stage determines what the program is to do The output of the Semantic Analysis is an intermediate code. This is similar to assembler language, but may include higher level operations

Optimization Most compilers will attempt to optimize the intermediate code Some compilers will also optimize after code generation There are many optimizations possible such as moving computations out of loops, avoiding redundant loads and stores, efficient use of registers, etc.

Code Generation The Code Generator inputs the intermediate language and outputs machine language for the target machine The code generator is specific to the machine architecture

Linking and Loading While not truly part of the compiler, the libraries provide the functionality that is more than just a few machine language statements The linker reads the object files and outputs and executable file

Simple Program /* This is an example program */ A = Boy + Cat + Dog;

After Lexical Scan A = Boy + Cat + Dog ;

Parsing <statement> -> <variable> = <expression> <variable> -> A Boy Cat Dog <expression> -> <variable> + <expression> <variable>

Compile A = B + C + D Intermediate code Temp1 = B + C Temp2 = Temp1 + D A = Temp2

Simple Machine Language Load register with B Add C to register Store register in Temp1 Load register with Temp1 Add D to register Store register in Temp2 Load register with Temp2 Store register in A

Optimized Machine Language Load register with B Add C to register Store register in Temp1 Load register with Temp1 Add D to register Store register in Temp2 Load register with Temp2 Store register in A

Symbol Table Many stages of a compiler create and reference a symbol table The symbol table keeps a list of all of the names used in the program To assist debugging, the symbol table can be written into the output object file. This tells debuggers where variables are located

Reading Read sections 3.1 3.3 in the textbook by Wednesday