Compiler I: Syntax Analysis Human Thought



Similar documents
How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

Flex/Bison Tutorial. Aaron Myles Landwehr CAPSL 2/17/2012

Compiler Construction

Programming Languages

Compiler Construction

COMP 356 Programming Language Structures Notes for Chapter 4 of Concepts of Programming Languages Scanning and Parsing

CSCI 3136 Principles of Programming Languages

High-Level Language. Building a Modern Computer From First Principles.

03 - Lexical Analysis

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

CA Compiler Construction

Language Processing Systems

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Semantic Analysis: Types and Type Checking

Syntaktická analýza. Ján Šturc. Zima 208

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

7. Virtual Machine I: Stack Arithmetic 1

Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz

Programming Languages CIS 443

Programming Languages

Let s put together a Manual Processor

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Lecture 9. Semantic Analysis Scoping and Symbol Table

Programming Project 1: Lexical Analyzer (Scanner)

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Good FORTRAN Programs

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions

Design Patterns in Parsing

The previous chapter provided a definition of the semantics of a programming

Compilers Lexical Analysis

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

Acknowledgements. In the name of Allah, the Merciful, the Compassionate

Introduction to Automata Theory. Reading: Chapter 1

İSTANBUL AYDIN UNIVERSITY

Natural Language to Relational Query by Using Parsing Compiler

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Programming Language Pragmatics

Sources: On the Web: Slides will be available on:

Compiler Design July 2004

Bottom-Up Parsing. An Introductory Example

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Context free grammars and predictive parsing

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

Scanner. tokens scanner parser IR. source code. errors

Lumousoft Visual Programming Language and its IDE

[Refer Slide Time: 05:10]

Semester Review. CSC 301, Fall 2015

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

The programming language C. sws1 1

Informatica e Sistemi in Tempo Reale

Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology

A Programming Language Where the Syntax and Semantics Are Mutable at Runtime

CS143 Handout 08 Summer 2008 July 02, 2007 Bottom-Up Parsing

Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives

Automata and Formal Languages

C Compiler Targeting the Java Virtual Machine

1/20/2016 INTRODUCTION

Compiler and Language Processing Tools

A Lex Tutorial. Victor Eijkhout. July Introduction. 2 Structure of a lex file

n Introduction n Art of programming language design n Programming language spectrum n Why study programming languages? n Overview of compilation

Textual Modeling Languages

Yacc: Yet Another Compiler-Compiler

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Programming Languages

Data Integration through XML/XSLT. Presenter: Xin Gu

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

Moving from CS 61A Scheme to CS 61B Java

Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking

Computer Science. Master of Science

ZIMBABWE SCHOOL EXAMINATIONS COUNCIL. COMPUTER STUDIES 7014/01 PAPER 1 Multiple Choice SPECIMEN PAPER

Natural Language Database Interface for the Community Based Monitoring System *

Theory of Compilation

Automatic Text Analysis Using Drupal

Fundamentals of Programming and Software Development Lesson Objectives

Instructor Özgür ZEYDAN BEU Dept. of Enve. Eng. CIV 112 Computer Programming Lecture Notes (1)

Introduction to formal semantics -

Syntax Check of Embedded SQL in C++ with Proto

If-Then-Else Problem (a motivating example for LR grammars)

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

Chapter 6: Programming Languages

CS5310 Algorithms 3 credit hours 2 hours lecture and 2 hours recitation every week

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Transcription:

Course map Compiler I: Syntax Analysis Human Thought Abstract design Chapters 9, 12 H.L. Language & Operating Sys. Compiler Chapters 10-11 Virtual Machine Software hierarchy Translator Chapters 7-8 Assembly Language Assembler Chapter 6 Building a Modern Computer From First Principles www.nand2tetris.org Machine Language Computer Architecture Chapters 4-5 Hardware hierarchy Hardware Platform Gate Logic Chapters 1-3 Chips & Logic Gates Electrical Engineering Physics Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 1 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 2 Motivation: Why study about s? Because Compilers Are an essential part of applied computer science Are very relevant to computational linguistics Are implemented using classical programming techniques Employ important software engineering principles Train you in developing software for transforming one structure to another (programs, files, transactions, ) Train you to think in terms of description s. Parsing files of some complex syntax is very common in many applications. The big picture Modern s are two-tiered: Front-end: from high-level to some intermediate Back-end: from the intermediate to binary. CISC implementation over CISC Some RISC Some imp. over RISC Intermediate emulator imp. over the Hack platform written in a high-level Hack Compiler lectures (Projects 10,11) lectures (Projects 7-8) HW lectures (Projects 1-6) CISC RISC other digital, each equipped with its implementation Any computer Hack computer Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 3 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 4

CISC implementation over CISC RISC imp. over RISC Intermediate emulator imp. over the Hack platform written in a high-level Hack Compiler architecture (front end) Some Some Tokenizing / Lexical analysis / scanning Compiler (Chapter 10) XML Syntax Analyzer Program Tokenizer Parser Code Gene -ration (Chapter 11) (source) Syntax analysis: understanding the structure of the source scanner Tokenizing: creating a stream of atoms (target) Parsing: matching the atom stream with the grammar XML output = one way to demonstrate that the syntax analyzer works Code generation: reconstructing the semantics using the syntax of the target. Remove white space Construct a token list ( atoms) Things to worry about: Language specific rules: e.g. how to treat ++ Language-specific classifications: keyword, symbol, identifier, integercconstant, stringconstant, While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source grammar). Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 5 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 6 C function to split a string into tokens char* strtok ( char* str, const char* delimiters ); str: string to be broken into tokens Tokenizer Source if (x < 153) {let city = Paris ; delimiters: string containing the delimiter characters Tokenizer Tokenizer s output <tokens> <keyword> if </keyword> <symbol> ( </symbol> <identifier> x </identifier> <symbol> < </symbol> <integerconstant> 153 </integerconstant> <symbol> ) </symbol> <symbol> { </symbol> <keyword> let </keyword> <identifier> city </identifier> <symbol> = </symbol> <stringconstant> Paris </stringconstant> <symbol> ; </symbol> <symbol> </symbol> </tokens> Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 7 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 8

Parsing The tokenizer discussed thus far is part of a larger program called parser Each is characterized by a grammar. The parser is implemented to recognize this grammar in given texts The parsing process: A text is given and tokenized Parsing examples English He ate an apple on the desk. (5+3)*2 sqrt(9*4) The parser determines weather or not the text can be generated from the grammar In the process, the parser performs a complete structural analysis of the text The text can be in an expression in a : Natural (English, ) Programming (, ). he parse ate an apple 5 + 3 * - 2 sqrt * 9 4 on the desk Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 9 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 10 Regular expressions Context-free grammar a b* {, a, b, bb, bbb, (a b)* {, a, b, aa, ab, ba, bb, aaa, ab*(c ) {a, ac, ab, abc, abb, abbc, S () S (S) S SS S a as bs strings ending with a S x S y S S+S S S-S S S*S S S/S S (S) (x+y)*x-x*y/(x+x) Simple (terminal) forms / complex (non-terminal) forms Grammar = set of rules on how to construct complex forms from simpler forms Highly recursive. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 11 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 12

Recursive descent parser A typical grammar of a typical C-like A=bB cc A() { if (next()== b ) { eat( b ); B(); else if (next()== c ) { eat( c ); C(); A=(bB)* A() { while (next()== b ) { eat( b ); B(); Grammar program: : whilestatement ifstatement // other possibilities '{' Sequence '' whilestatement: 'while' '(' expression ')' ifstatement: simpleif ifelse simpleif: 'if' '(' expression ')' ifelse: 'if' '(' expression ')' 'else' Sequence: '' // null, i.e. the empty sequence ';' Sequence expression: // definition of an expression comes here // more definitions follow Code sample { while (expression) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 13 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 14 Parse tree program: Recursive descent parsing Input Text: while (count<=100) { /** demonstration */ count++; // Tokenized: while ( count <= 100 ) { count ++ ; whilestatement expression : whilestatement ifstatement // other possibilities '{' Sequence '' whilestatement: 'while' '(' expression ')' Sequence Sequence while ( count <= 100 ) { count ++ ; Highly recursive LL(0) grammars: the first token determines in which rule we are In other grammars you have to look ahead 1 or more tokens is almost LL(0). sample while (expression) Parser implementation: a set of parsing methods, one for each rule: parsestatement() parsewhilestatement() parseifstatement() parsestatementsequence() parseexpression(). Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 15 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 16

The grammar The grammar (cont.) x : x appears verbatim x: x is a construct x?: x appears 0 or 1 times x*: x appears 0 or more times x y: either x or y appears (x,y): x appears, then y. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 17 x : x appears verbatim x: x is a construct x?: x appears 0 or 1 times x*: x appears 0 or more times x y: either x or y appears (x,y): x appears, then y. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 18 syntax analyzer in action Tokenizer: a tokenizer for the (proposed implementation) Class Bar { method Fraction foo(int y) { var int temp; // a variable let temp = (xxx+12)*-63; Syntax analyzer Syntax analyzer Using the grammar, a programmer can write a syntax analyzer program (parser) The syntax analyzer takes a source text file and attempts to match it on the grammar If successful, it can generate a parse tree in some structured format, e.g. XML. The syntax analyzer s algorithm shown in this slide: If xxx is non-terminal, output: <xxx> Recursive for the body of xxx </xxx> If xxx is terminal (keyword, symbol, constant, or identifier), output: <xxx> xxx value </xxx> <vardec> <keyword> var </keyword> <keyword> int </keyword> <identifier> temp </identifier> <symbol> ; </symbol> </vardec> <s> <letstatement> <keyword> let </keyword> <identifier> temp </identifier> <symbol> = </symbol> <expression> <term> <symbol> ( </symbol> <expression> <term> <identifier> xxx </identifier> </term> <symbol> + </symbol> <term> <int.const.> 12 </int.const.> </term> </expression> Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 19 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 20

Tokenizer (cont.) CompilationEngine: a recursive top-down parser for The CompilationEngine effects the actual compilation output. It gets its input from a Tokenizer and emits its parsed structure into an output file/stream. The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the grammar. The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx. Thus, compilexxx()may only be called if indeed xxx is the next syntactic element of the input. In the first version of the, which we now build, this module emits a structured printout of the, wrapped in XML tags (defined in the specs of project 10). In the final version of the, this module generates executable (defined in the specs of project 11). In both cases, the parsing logic and module API are exactly the same. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 21 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 22 CompilationEngine (cont.) CompilationEngine (cont.) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 23 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 24

CompilationEngine (cont.) Summary and next step Syntax analysis: understanding syntax Code generation: constructing semantics Compiler (Chapter 10) XML Syntax Analyzer Program Tokenizer Parser Code Gene -ration (Chapter 11) The generation challenge: Extend the syntax analyzer into a full-blown that, instead of generating passive XML, generates executable Two challenges: (a) handling data, and (b) handling commands. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 25 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 26 Perspective The parse tree can be constructed on the fly Syntax analyzers can be built using: Lex tool for tokenizing (flex) Yacc tool for parsing (bison) Do everything from scratch (our approach ) The is intentionally simple: Statement prefixes: let, do, No operator priority No error checking Basic data types, etc. Richer s require more powerful s The : designed to illustrate the key ideas that underlie modern s, leaving advanced features to more advanced courses Industrial-strength s: (LL) Have good error diagnostics Generate tight and efficient Support parallel (multi-core) processors. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 27