University of Wales Swansea
Department of Computer Science

Compilers

Course notes for module CS 218

Dr. Matt Poole 2002, edited by Mr. Christopher Whyley, 2nd Semester 2006/2007
www-compsci.swan.ac.uk/~cschris/compilers

1 Introduction

1.1 Compilation

Definition. Compilation is a process that translates a program in one language (the source language) into an equivalent program in another language (the object or target language).

    source program --> [ compiler ] --> target program
                            |
                      error messages

An important part of any compiler is the detection and reporting of errors; this will be discussed in more detail later in the introduction. Commonly, the source language is a high-level programming language (i.e. a problem-oriented language), and the target language is a machine language or assembly language (i.e. a machine-oriented language). Thus compilation is a fundamental concept in the production of software: it is the link between the (abstract) world of application development and the low-level world of application execution on machines.

Types of Translators. An assembler is also a type of translator:

    assembly program --> [ assembler ] --> machine program

An interpreter is closely related to a compiler, but takes both source program and input data. The translation and execution phases of the source program are one and the same.

    source program + input data --> [ interpreter ] --> output data

Although the above types of translator are the most well-known, we also need knowledge of compilation techniques to deal with the recognition and translation of many other types of languages, including:

- Command-line interface languages;
- Typesetting / word processing languages (e.g. TeX);
- Natural languages;
- Hardware description languages;
- Page description languages (e.g. PostScript);
- Set-up or parameter files.

Early Development of Compilers.

1940s. Early stored-program computers were programmed in machine language. Later, assembly languages were developed, where machine instructions and memory locations were given symbolic forms.

1950s. Early high-level languages were developed, for example FORTRAN. Although more problem-oriented than assembly languages, the first versions of FORTRAN still had many machine-dependent features. Techniques and processes involved in compilation were not well understood at this time, and compiler-writing was a huge task: e.g. the first FORTRAN compiler took 18 man-years of effort to write.

Chomsky's study of the structure of natural languages led to a classification of languages according to the complexity of their grammars. The context-free languages proved to be useful in describing the syntax of programming languages.

1960s onwards. The study of the parsing problem for context-free languages during the 1960s and 1970s has led to efficient algorithms for the recognition of context-free languages. These algorithms, and associated software tools, are central to compiler construction today. Similarly, the theory of finite state machines and regular expressions (which correspond to Chomsky's regular languages) has proven useful for describing the lexical structure of programming languages.

From Algol 60, high-level languages have become more problem-oriented and machine-independent, with features much removed from the machine languages into which they are compiled. The theory and tools available today make compiler construction a manageable task, even for complex languages. For example, your compiler assignment will take only a few weeks (hopefully) and will only be about 1000 lines of code (although, admittedly, the source language is small).

1.2 The Context of a Compiler

The complete process of compilation is illustrated as:

    skeletal source program
      --> [ preprocessor ]    --> source program
      --> [ compiler ]        --> assembly program
      --> [ assembler ]       --> relocatable m/c code
      --> [ link/load editor ] --> absolute m/c code

1.2.1 Preprocessors

Preprocessing performs (usually simple) operations on the source file(s) prior to compilation. Typical preprocessing operations include:

(a) Expanding macros (shorthand notations for longer constructs). For example, in C,

    #define foo(x,y) (3*x+y*(2+x))

defines a macro foo that, when used later in the program, is expanded by the preprocessor. For example, a = foo(a,b) becomes

    a = (3*a+b*(2+a))

(b) Inserting named files. For example, in C,

    #include "header.h"

is replaced by the contents of the file header.h.

1.2.2 Linkers

A linker combines object code (machine code that has not yet been linked) produced from compiling and assembling many source programs, as well as standard library functions and resources supplied by the operating system. This involves resolving references in each object file to external variables and procedures declared in other files.
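As a concrete illustration of the references a linker must resolve, consider a C program split across two files. (This is a sketch, not from the notes; the file and symbol names are invented.)

    /* counter.c -- defines a global variable and a procedure */
    int counter = 0;

    void increment(void)
    {
        counter = counter + 1;
    }

    /* main.c -- uses them; "extern" tells the compiler that the
       definitions live in some other object file, so the compiler
       emits unresolved references for the linker to bind */
    extern int counter;
    extern void increment(void);

    int main(void)
    {
        increment();
        return counter;    /* exit status 1 */
    }

Compiling each file separately (e.g. gcc -c main.c and gcc -c counter.c) produces object files in which the references to counter and increment in main.o are unresolved; linking them (gcc main.o counter.o) binds each reference to its definition.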

1.2.3 Loaders

Compilers, assemblers and linkers usually produce code whose memory references are made relative to an undetermined starting location that can be anywhere in memory (relocatable machine code). A loader calculates appropriate absolute addresses for these memory locations and amends the code to use these addresses.

1.3 The Phases of a Compiler

The process of compilation is split up into six phases, each of which interacts with a symbol table manager and an error handler. This is called the analysis/synthesis model of compilation. There are many variants on this model, but the essential elements are the same.

    source program
      --> lexical analyser
      --> syntax analyser
      --> semantic analyser
      --> intermediate code generator
      --> code optimizer
      --> code generator
      --> target program

    (each phase interacting with the symbol-table manager
     and the error handler)

1.3.1 Lexical Analysis

A lexical analyser or scanner is a program that groups sequences of characters into lexemes, and outputs (to the syntax analyser) a sequence of tokens. Here:

(a) Tokens are symbolic names for the entities that make up the text of the program; e.g. if for the keyword if, and id for any identifier. These make up the output of the lexical analyser.

(b) A pattern is a rule that specifies when a sequence of characters from the input constitutes a token; e.g. the sequence i, f for the token if, and any sequence of alphanumerics starting with a letter for the token id.

(c) A lexeme is a sequence of characters from the input that matches a pattern (and hence constitutes an instance of a token); for example if matches the pattern for if, and foo123bar matches the pattern for id.

For example, the following code might result in the table given below.

    program foo(input,output); var x:integer;
    begin readln(x); writeln('value read = ',x) end.

    Lexeme           Token                      Pattern
    ---------------------------------------------------------------------------
    program          program                    p, r, o, g, r, a, m
    (whitespace)                                newlines, spaces, tabs
    foo              id (foo)                   letter followed by seq. of alphanumerics
    (                leftpar                    a left parenthesis
    input            input                      i, n, p, u, t
    ,                comma                      a comma
    output           output                     o, u, t, p, u, t
    )                rightpar                   a right parenthesis
    ;                semicolon                  a semi-colon
    var              var                        v, a, r
    x                id (x)                     letter followed by seq. of alphanumerics
    :                colon                      a colon
    integer          integer                    i, n, t, e, g, e, r
    ;                semicolon                  a semi-colon
    begin            begin                      b, e, g, i, n
    (whitespace)                                newlines, spaces, tabs
    readln           readln                     r, e, a, d, l, n
    (                leftpar                    a left parenthesis
    x                id (x)                     letter followed by seq. of alphanumerics
    )                rightpar                   a right parenthesis
    ;                semicolon                  a semi-colon
    writeln          writeln                    w, r, i, t, e, l, n
    (                leftpar                    a left parenthesis
    'value read = '  literal ('value read = ')  seq. of chars enclosed in quotes
    ,                comma                      a comma
    x                id (x)                     letter followed by seq. of alphanumerics
    )                rightpar                   a right parenthesis
    (whitespace)                                newlines, spaces, tabs
    end              end                        e, n, d
    .                fullstop                   a fullstop

It is the sequence of tokens in the middle column that is passed as output to the syntax analyser. This token sequence represents almost all the important information from the input program required by the syntax analyser. Whitespace (newlines, spaces and tabs), although often important in separating lexemes, is usually not returned as a token. Also, when outputting an id or literal token, the lexical analyser must also return the value of the matched lexeme (shown in parentheses), or else this information would be lost.

1.3.2 Symbol Table Management

A symbol table is a data structure containing all the identifiers (i.e. names of variables, procedures etc.) of a source program, together with all the attributes of each identifier. For variables, typical attributes include: its type, how much memory it occupies, its scope. For procedures and functions, typical attributes include: the number and type of each argument (if any), the method of passing each argument, and the type of value returned (if any). The purpose of the symbol table is to provide quick and uniform access to identifier attributes throughout the compilation process. Information is usually put into the symbol table during the lexical analysis and/or syntax analysis phases.

1.3.3 Syntax Analysis

A syntax analyser or parser is a program that groups sequences of tokens from the lexical analysis phase into phrases, each with an associated phrase type. A phrase is a logical unit with respect to the rules of the source language. For example, consider:

    a := x * y + z

After lexical analysis, this statement has the structure

    id1 assign id2 binop1 id3 binop2 id4

Now, a syntactic rule of Pascal is that there are objects called expressions for which the rules are (essentially):

(1) Any constant or identifier is an expression.
(2) If exp1 and exp2 are expressions then so is exp1 binop exp2.

Taking all the identifiers to be variable names for simplicity, we have: by rule (1), exp1 = id2 and exp2 = id3 are both phrases with phrase type expression; by rule (2), exp3 = exp1 binop1 exp2 is also a phrase with phrase type expression; by rule (1), exp4 = id4 is a phrase with phrase type expression; by rule (2), exp5 = exp3 binop2 exp4 is a phrase with phrase type expression.

Of course, Pascal also has a rule that says

    id assign exp

is a phrase with phrase type assignment, and so the Pascal statement above is a phrase of type assignment.

Parse Trees and Syntax Trees. The structure of a phrase is best thought of as a parse tree or a syntax tree. A parse tree is a tree that illustrates the grouping of tokens into phrases. A syntax tree is a compacted form of parse tree in which the operators appear as the interior nodes. The construction of a parse tree is a basic activity in compiler-writing. A parse tree for the example Pascal statement is:

                assignment
               /    |     \
            id1  assign   exp5
                        /   |    \
                    exp3  binop2  exp4
                   /   |   \        |
               exp1 binop1 exp2    id4
                 |           |
                id2         id3

and a syntax tree is:

            assign
           /      \
        id1      binop2
                /      \
            binop1     id4
           /      \
         id2      id3

Comment. The distinction between lexical and syntactical analysis sometimes seems arbitrary. The main criterion is whether the analyser needs recursion or not:

- lexical analysers hardly ever use recursion; they are sometimes called linear analysers since they scan the input in a straight line (from left to right);
- syntax analysers almost always use recursion; this is because phrase types are often defined in terms of themselves (cf. the phrase type expression above).

1.3.4 Semantic Analysis

A semantic analyser takes its input from the syntax analysis phase in the form of a parse tree and a symbol table. Its purpose is to determine if the input has a well-defined meaning; in practice semantic analysers are mainly concerned with type checking and type coercion based on type rules. Typical type rules for expressions and assignments are:

Expression Type Rules. Let exp be an expression.

(a) If exp is a constant then exp is well-typed and its type is the type of the constant.
(b) If exp is a variable then exp is well-typed and its type is the type of the variable.
(c) If exp is an operator applied to further subexpressions such that:
    (i) the operator is applied to the correct number of subexpressions,
    (ii) each subexpression is well-typed and
    (iii) each subexpression is of an appropriate type,
    then exp is well-typed and its type is the result type of the operator.

Assignment Type Rules. Let var be a variable of type T1 and let exp be a well-typed expression of type T2. If

(a) T1 = T2 and
(b) T1 is an assignable type

then var assign exp is a well-typed assignment. For example, consider the following code fragment:

    intvar := intvar + realarray

where intvar is stored in the symbol table as being an integer variable, and realarray as an array of reals. In Pascal this assignment is syntactically correct, but semantically incorrect since + is only defined on numbers, whereas its second argument is an array. The semantic analyser checks for such type errors using the parse tree, the symbol table and type rules.

1.3.5 Error Handling

Each of the six phases (but mainly the analysis phases) of a compiler can encounter errors. On detecting an error the compiler must:

- report the error in a helpful way,
- correct the error if possible, and
- continue processing (if possible) after the error to look for further errors.

Types of Error. Errors are either syntactic or semantic.

Syntax errors are errors in the program text; they may be either lexical or grammatical:

(a) A lexical error is a mistake in a lexeme, for example, typing tehn instead of then, or missing off one of the quotes in a literal.
(b) A grammatical error is one that violates the (grammatical) rules of the language, for example if x = 7 y := 4 (missing then).

Semantic errors are mistakes concerning the meaning of a program construct; they may be either type errors, logical errors or run-time errors:

(a) Type errors occur when an operator is applied to an argument of the wrong type, or to the wrong number of arguments.

(b) Logical errors occur when a badly conceived program is executed, for example:

    while x = y do ...

when x and y initially have the same value and the body of the loop need not change the value of either x or y.

(c) Run-time errors are errors that can be detected only when the program is executed, for example:

    var x : real; readln(x); writeln(1/x)

which would produce a run-time error if the user inputs 0.

Syntax errors must be detected by a compiler and at least reported to the user (in a helpful way). If possible, the compiler should make the appropriate correction(s). Semantic errors are much harder and sometimes impossible for a computer to detect.

1.3.6 Intermediate Code Generation

After the analysis phases of the compiler have been completed, a source program has been decomposed into a symbol table and a parse tree, both of which may have been modified by the semantic analyser. From this information we begin the process of generating object code according to either of two approaches:

(1) generate code for a specific machine, or
(2) generate code for a general or abstract machine, then use further translators to turn the abstract code into code for specific machines.

Approach (2) is more modular and efficient provided the abstract machine language is simple enough to:

(a) produce and analyse (in the optimisation phase), and
(b) be easily translated into the required language(s).

One of the most widely used intermediate languages is Three-Address Code (TAC).

TAC Programs. A TAC program is a sequence of optionally labelled instructions. Some common TAC instructions include:

(i) var1 := var2 binop var3
(ii) var1 := unop var2
(iii) var1 := num

(iv) goto label
(v) if var1 relop var2 goto label

There are also TAC instructions for addresses and pointers, arrays and procedure calls, but we will use only the above for the following discussion.

Syntax-Directed Code Generation. In essence, code is generated by recursively walking through a parse (or syntax) tree, and hence the process is referred to as syntax-directed code generation. For example, consider the code fragment:

    z := x * y + x

and its syntax tree (with lexemes replacing tokens):

          :=
         /  \
        z    +
            / \
           *   x
          / \
         x   y

We use this tree to direct the compilation into TAC as follows. At the root of the tree we see an assignment whose right-hand side is an expression, and this expression is the sum of two quantities. Assume that we can produce TAC code that computes the value of the first and second summands and stores these values in temp1 and temp2 respectively. Then the appropriate TAC for the assignment statement is just

    z := temp1 + temp2

Next we consider how to compute the values of temp1 and temp2 in the same top-down recursive way. For temp1 we see that it is the product of two quantities. Assume that we can produce TAC code that computes the value of the first and second multiplicands and stores these values in temp3 and temp4 respectively. Then the appropriate TAC for computing temp1 is

    temp1 := temp3 * temp4

Continuing the recursive walk, we consider temp3. Here we see it is just the variable x and thus the TAC code

    temp3 := x

is sufficient. Next we come to temp4 and, similarly to temp3, the appropriate code is

    temp4 := y

Finally, considering temp2, of course

    temp2 := x

suffices. Each code fragment is output when we leave the corresponding node; this results in the final program:

    temp3 := x
    temp4 := y
    temp1 := temp3 * temp4
    temp2 := x
    z := temp1 + temp2

Comment. Notice how a compound expression has been broken down and translated into a sequence of very simple instructions, and furthermore, the process of producing the TAC code was uniform and simple. Some redundancy has been brought into the TAC code, but this can be removed (along with redundancy that is not due to the TAC generation) in the optimisation phase.

1.3.7 Code Optimisation

An optimiser attempts to improve the time and space requirements of a program. There are many ways in which code can be optimised, but most are expensive in terms of time and space to implement. Common optimisations include:

- removing redundant identifiers,
- removing unreachable sections of code,
- identifying common subexpressions,
- unfolding loops and

- eliminating procedures.

Note that here we are concerned with the general optimisation of abstract code.

Example. Consider the TAC code:

        temp1 := x
        temp2 := temp1
        if temp1 = temp2 goto 200
        temp3 := temp1 * y
        goto 300
    200 temp3 := z
    300 temp4 := temp2 + temp3

Removing redundant identifiers (just temp2) gives

        temp1 := x
        if temp1 = temp1 goto 200
        temp3 := temp1 * y
        goto 300
    200 temp3 := z
    300 temp4 := temp1 + temp3

Removing redundant code gives

        temp1 := x
    200 temp3 := z
    300 temp4 := temp1 + temp3

Notes. Attempting to find a best optimisation is expensive for the following reasons:

- A given optimisation technique may have to be applied repeatedly until no further optimisation can be obtained. (For example, removing one redundant identifier may introduce another.)
- A given optimisation technique may give rise to other forms of redundancy, and thus sequences of optimisation techniques may have to be repeated. (For example, above we removed a redundant identifier and this gave rise to redundant code, but removing redundant code may lead to further redundant identifiers.)
- The order in which optimisations are applied may be significant. (How many ways are there of applying n optimisation techniques to a given piece of code?)
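To make one of these optimisations concrete, here is a sketch in C (ours, not the notes' own code; the type and function names are invented) of removing redundant identifiers by eliminating copies over straight-line TAC. It is sound here only because each temporary is assigned at most once, as in the code the TAC generator above produces.

    #include <stdio.h>
    #include <string.h>

    /* A TAC instruction dst := src1 op src2; op = ' ' marks a plain
       copy dst := src1 (src2 empty). Labels and jumps are omitted. */
    typedef struct {
        char dst[8], src1[8], src2[8];
        char op;
    } Tac;

    /* Replace every later use of 'from' by 'to'. */
    static void substitute(Tac *code, int n, int start,
                           const char *from, const char *to)
    {
        for (int j = start; j < n; j++) {
            if (strcmp(code[j].src1, from) == 0) strcpy(code[j].src1, to);
            if (strcmp(code[j].src2, from) == 0) strcpy(code[j].src2, to);
        }
    }

    /* Delete each copy "x := y" and use y in place of x afterwards.
       Returns the new number of instructions. */
    int remove_copies(Tac *code, int n)
    {
        int kept = 0;
        for (int i = 0; i < n; i++) {
            if (code[i].op == ' ')
                substitute(code, n, i + 1, code[i].dst, code[i].src1);
            else
                code[kept++] = code[i];
        }
        return kept;
    }

    int main(void)
    {
        /* temp1 := x; temp2 := temp1; temp4 := temp2 + temp3 */
        Tac code[] = { { "temp1", "x",     "",      ' ' },
                       { "temp2", "temp1", "",      ' ' },
                       { "temp4", "temp2", "temp3", '+' } };
        int n = remove_copies(code, 3);
        for (int i = 0; i < n; i++)    /* prints: temp4 := x + temp3 */
            printf("%s := %s %c %s\n",
                   code[i].dst, code[i].src1, code[i].op, code[i].src2);
        return 0;
    }

A real optimiser must be more careful: if a name can be reassigned, substitution is only valid up to the next assignment, and, as noted above, passes typically have to be repeated until no further improvement is found.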

1.3.8 Code Generation

The final phase of the compiler is to generate code for a specific machine. In this phase we consider: memory management, register assignment and machine-specific optimisation. The output from this phase is usually assembly language or relocatable machine code.

Example. The TAC code above could typically result in the ARM assembly program shown below. Note that the example illustrates a mechanical translation of TAC into ARM; it is not intended to illustrate compact ARM programming!

    .x    EQUD 0            four bytes for x
    .z    EQUD 0            four bytes for z
    .temp EQUD 0            four bytes each for temp1,
          EQUD 0              temp3, and
          EQUD 0              temp4
    .prog MOV R12,#temp     R12 = base address
          MOV R0,#x         R0 = address of x
          LDR R1,[R0]       R1 = value of x
          STR R1,[R12]      store R1 at R12
          MOV R0,#z         R0 = address of z
          LDR R1,[R0]       R1 = value of z
          STR R1,[R12,#4]   store R1 at R12+4
          LDR R1,[R12]      R1 = value of temp1
          LDR R2,[R12,#4]   R2 = value of temp3
          ADD R3,R1,R2      add temp1 to temp3
          STR R3,[R12,#8]   store R3 at R12+8
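As a postscript to this chapter, the following is a sketch in C of the recursive walk described in Section 1.3.6 (not the notes' own code; the node layout and function names are invented). It emits TAC for the tree of z := x * y + x; the temporaries come out numbered in a slightly different order from the hand-worked example, but the program produced is equivalent.

    #include <stdio.h>

    /* An expression-tree node: a leaf holds a variable name;
       an interior node holds an operator and two subtrees. */
    typedef struct Node {
        char op;                  /* '+' or '*', or 0 for a leaf */
        const char *name;         /* variable name when a leaf   */
        struct Node *left, *right;
    } Node;

    static int next_temp = 0;

    /* Emit TAC that computes the value of e; return the number of
       the temporary holding the result. Code for a node is output
       on leaving that node, as described in Section 1.3.6. */
    int gen(Node *e)
    {
        int t = ++next_temp;
        if (e->op == 0) {
            printf("temp%d := %s\n", t, e->name);
        } else {
            int t1 = gen(e->left);
            int t2 = gen(e->right);
            printf("temp%d := temp%d %c temp%d\n", t, t1, e->op, t2);
        }
        return t;
    }

    int main(void)
    {
        /* the syntax tree of  z := x * y + x  */
        Node x1  = { 0,   "x" }, y = { 0, "y" }, x2 = { 0, "x" };
        Node mul = { '*', 0, &x1, &y };
        Node add = { '+', 0, &mul, &x2 };
        printf("z := temp%d\n", gen(&add));   /* gen's output comes first */
        return 0;
    }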

2 Languages

In this section we introduce the formal notion of a language, and the basic problem of recognising strings from a language. These are central concepts that we will use throughout the remainder of the course.

Note. This section contains mainly theoretical definitions; the lectures will cover examples and diagrams illustrating the theory.

2.1 Basic Definitions

An alphabet Σ is a finite non-empty set (of symbols).

A string or word over an alphabet Σ is a finite concatenation (or juxtaposition) of symbols from Σ. The length of a string w (that is, the number of characters comprising it) is denoted |w|. The empty or null string is denoted ǫ. (That is, ǫ is the unique string satisfying |ǫ| = 0.)

The set of all strings over Σ is denoted Σ*. For each n ≥ 0 we define

    Σ^n = { w ∈ Σ* : |w| = n }.

We define

    Σ+ = ⋃_{n≥1} Σ^n.

(Thus Σ* = Σ+ ∪ {ǫ}.)

For a symbol or word x, x^n denotes x concatenated with itself n times, with the convention that x^0 denotes ǫ.

A language over Σ is a set L ⊆ Σ*. Two languages L1 and L2 over a common alphabet Σ are equal if they are equal as sets. Thus L1 = L2 if, and only if, L1 ⊆ L2 and L2 ⊆ L1.

2.2 Decidability

Given a language L over some alphabet Σ, a basic question is: for each possible word w ∈ Σ*, can we effectively decide if w is a member of L or not? We call this the decision problem for L. Note the use of the word "effectively": this implies the mechanism by which we decide on membership (or non-membership) must be a finitistic, deterministic and mechanical procedure

that can be carried out by some form of computing agent. Also note the decision problem asks if a given word is a member of L or not; that is, it is not sufficient to be only able to decide when words are members of L.

More precisely then, a language L ⊆ Σ* is said to be decidable if there exists an algorithm such that for every w ∈ Σ*:

(1) the algorithm terminates with output Yes when w ∈ L, and
(2) the algorithm terminates with output No when w ∉ L.

If no such algorithm exists then L is said to be undecidable.

Note. Decidability is based on the notion of an algorithm. In standard theoretical computer science this is taken to mean a Turing Machine; this is an abstract, but extremely low-level, model of computation that is equivalent to a digital computer with an infinite memory. Thus it is sufficient in practice to use a more convenient model of computation, such as Pascal programs, provided that any decidability arguments we make assume an infinite memory.

Example. Let Σ = {0, 1} be an alphabet. Let L be the (infinite) language L = { w ∈ Σ* : w = 0^n 1 for some n }. Does the program below solve the decision problem for L?

    read( char );
    if char = END_OF_STRING then
       print( "No" )
    else /* char must be 0 or 1 */
       while char = 0 do read( char ) od;
       /* char must be 1 or END_OF_STRING */
       if char = 1 then print( "Yes" ) else print( "No" ) fi
    fi

Answer: No. The program prints "Yes" as soon as it reads a 1, without checking that the 1 is the last symbol of the input; for example the string 011, which is not in L, is accepted. (A corrected sketch is given at the end of this section.)

2.3 Basic Facts

(1) Every finite language is decidable. (Hence every undecidable language is infinite.)

(2) Not every infinite language is undecidable.

(3) Programming languages are (usually) infinite but (always) decidable. (Why?)

2.4 Applications to Compilation

Languages may be classified by the means by which they are defined. Of interest to us are regular languages and context-free languages.

Regular Languages. The significant aspects of regular languages are:

- they are defined by patterns called regular expressions;
- every regular language is decidable;
- the decision problem for any regular language is solved by a deterministic finite state automaton (DFA); and
- programming languages' lexical patterns are specified using regular expressions, and lexical analysers are (essentially) DFAs.

Regular languages and their relationship to lexical analysis are the subjects of the next section.

Context-Free Languages. The significant aspects of context-free languages are:

- they are defined by rules called context-free grammars;
- every context-free language is decidable;
- the decision problem for any context-free language of interest to us is solved by a deterministic push-down automaton (DPDA); and
- programming language syntax is specified using context-free grammars, and (most) parsers are (essentially) DPDAs.

Context-free languages and their relationship to syntax analysis are the subjects of Sections 4 and 5.
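Finally, returning to the decision-problem example of Section 2.2: here is a sketch in C (the notes use Pascal-style pseudocode; this translation is ours) of a program that does solve the decision problem for L = { 0^n 1 : n ≥ 0 }, by also checking that nothing follows the 1.

    #include <stdio.h>

    /* Decide membership of L = { 0^n 1 : n >= 0 } for one word read
       from standard input; end-of-file or a newline ends the word. */
    int main(void)
    {
        int c = getchar();
        while (c == '0')                 /* consume the leading 0s        */
            c = getchar();
        if (c != '1') {                  /* next must be a single 1 ...   */
            printf("No\n");
            return 0;
        }
        c = getchar();                   /* ... followed by end of input  */
        printf((c == EOF || c == '\n') ? "Yes\n" : "No\n");
        return 0;
    }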

3 Lexical Analysis

In this section we study some theoretical concepts with respect to the class of regular languages and apply these concepts to the practical problem of lexical analysis. Firstly, in Section 3.1, we define the notion of a regular expression and show how regular expressions determine regular languages. We then, in Section 3.2, introduce deterministic finite automata (DFAs), the class of algorithms that solve the decision problems for regular languages. We show how regular expressions and DFAs can be used to specify and implement lexical analysers in Section 3.3, and in Section 3.4 we take a brief look at Lex, a popular lexical analyser generator built upon the theory of regular expressions and DFAs.

Note. This section contains mainly theoretical definitions; the lectures will cover examples and diagrams illustrating the theory.

3.1 Regular Expressions

Recall from the Introduction that a lexical analyser uses pattern matching with respect to rules associated with the source language's tokens. For example, the token then is associated with the pattern t, h, e, n, and the token id might be associated with the pattern "an alphabetic character followed by any number of alphanumeric characters". The notation of regular expressions is a mathematical formalism ideal for expressing patterns such as these, and thus ideal for expressing the lexical structure of programming languages.

3.1.1 Definition

Regular expressions represent patterns of strings of symbols. A regular expression r matches a set of strings over an alphabet. This set is denoted L(r) and is called the language determined or generated by r.

Let Σ be an alphabet. We define the set RE(Σ) of regular expressions over Σ, the strings they match and thus the languages they determine, as follows:

- ∅ ∈ RE(Σ) matches no strings. The language determined is L(∅) = ∅.
- ǫ ∈ RE(Σ) matches only the empty string. Therefore L(ǫ) = {ǫ}.
- If a ∈ Σ then a ∈ RE(Σ) matches the string a. Therefore L(a) = {a}.
- If r and s are in RE(Σ) and determine the languages L(r) and L(s) respectively, then r|s ∈ RE(Σ) matches all strings matched either by r or by s. Therefore L(r|s) = L(r) ∪ L(s).

- rs ∈ RE(Σ) matches any string that is the concatenation of two strings, the first matching r and the second matching s. Therefore the language determined is L(rs) = L(r)L(s) = { uv ∈ Σ* : u ∈ L(r) and v ∈ L(s) }. (Given two sets S1 and S2 of strings, the notation S1S2 denotes the set of all strings formed by appending members of S1 to members of S2.)
- r* ∈ RE(Σ) matches all finite concatenations of strings which all match r. The language denoted is thus

      L(r*) = (L(r))* = ⋃_{i∈ℕ} (L(r))^i = {ǫ} ∪ L(r) ∪ L(r)L(r) ∪ ...

3.1.2 Regular Languages

Let L be a language over Σ. L is said to be a regular language if L = L(r) for some r ∈ RE(Σ).

3.1.3 Notation

We need to use parentheses to override the convention concerning the precedence of the operators. The normal convention is: * is higher than concatenation, which is higher than |. Thus, for example, a|bc* is a|(b(c*)).

We write r+ for rr*. We write r? for ǫ|r. We write r^n as an abbreviation for r...r (n times r), with r^0 denoting ǫ.

3.1.4 Lemma

Writing r = s to mean L(r) = L(s) for two regular expressions r, s ∈ RE(Σ), the following identities hold for all r, s, t ∈ RE(Σ):

    r|s = s|r              (| is commutative)
    (r|s)|t = r|(s|t)      (| is associative)
    (rs)t = r(st)          (concatenation is associative)
    r(s|t) = rs|rt         (concatenation
    (r|s)t = rt|st           distributes over |)

    ∅r = r∅ = ∅
    ∅|r = r|∅ = r
    ∅* = ǫ
    ǫ|r? = r?
    r*r* = r*
    (r*s*)* = (r|s)*
    ǫr = rǫ = r

3.1.5 Regular definitions

It is often useful to give names to complex regular expressions, and to use these names in place of the expressions they represent. Given an alphabet comprising all ASCII characters,

    letter = A|B|...|Z|a|b|...|z
    digit  = 0|1|...|9
    ident  = letter(letter|digit)*

are examples of regular definitions for letters, digits and identifiers.

3.1.6 The Decision Problem for Regular Languages

For every regular expression r ∈ RE(Σ) there exists a string-processing machine M = M(r) such that for every w ∈ Σ*, when input to M:

(1) if w ∈ L(r) then M terminates with output Yes, and
(2) if w ∉ L(r) then M terminates with output No.

Thus every regular language is decidable. The machines in question are Deterministic Finite State Automata.

3.2 Deterministic Finite State Automata

In this section we define the notion of a DFA without reference to its application in lexical analysis. Here we are interested purely in solving the decision problem for regular languages; that is, defining machines that say Yes or No given an input string, depending on its membership of a particular language. In Section 3.3 we use DFAs as the basis for lexical analysers: pattern-matching algorithms that output sequences of tokens.

3.2.1 Definition

A deterministic finite state automaton (or DFA) M is a 5-tuple

    M = (Q, Σ, δ, q0, F)

where

- Q is a finite non-empty set of states,
- Σ is an alphabet,
- δ : Q × Σ → Q is the transition or next-state function,
- q0 ∈ Q is the initial state, and
- F ⊆ Q is the set of accepting or final states.

The idea behind a DFA M is that it is an abstract machine that defines a language L(M) ⊆ Σ* in the following way:

- The machine begins in its start state q0.
- Given a string w ∈ Σ*, the machine reads the symbols of w one at a time from left to right.
- Each symbol causes the machine to make a transition from its current state to a new state; if the current state is q and the input symbol is a, then the new state is δ(q, a).
- The machine terminates when all the symbols of the string have been read.
- If, when the machine terminates, its state is a member of F, then the machine accepts w, else it rejects w.

Note the name "final state" is not a good one, since a DFA does not terminate as soon as a final state has been entered. The DFA only terminates when all the input has been read.

We formalise this idea as follows:

Definition. Let M = (Q, Σ, δ, q0, F) be a DFA. We define δ̂ : Q × Σ* → Q by

    δ̂(q, ǫ) = q                  for each q ∈ Q, and
    δ̂(q, aw) = δ̂(δ(q, a), w)     for each q ∈ Q, a ∈ Σ and w ∈ Σ*.

We define the language of M by

    L(M) = { w ∈ Σ* : δ̂(q0, w) ∈ F }.
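The definition of δ̂ is exactly a loop over the input symbols. As an illustration, here is a sketch in C (ours, not the notes') that runs the DFA M1 of the Examples below and answers its decision problem; it assumes input words contain only the symbols a and b.

    #include <stdio.h>

    /* The DFA M1 of Section 3.2.3: Q = {1,2,3,4}, Sigma = {a,b},
       q0 = 1, F = {4}; delta[q][x] with x = 0 for 'a', 1 for 'b'. */
    static const int delta[5][2] = {
        { 0, 0 },     /* dummy row so that states index from 1 */
        { 2, 3 },     /* delta(1,a) = 2, delta(1,b) = 3 */
        { 3, 4 },     /* delta(2,a) = 3, delta(2,b) = 4 */
        { 3, 3 },     /* delta(3,a) = 3, delta(3,b) = 3 */
        { 3, 4 },     /* delta(4,a) = 3, delta(4,b) = 4 */
    };

    /* Compute delta-hat(q0, w) and test membership of F. */
    int accepts(const char *w)
    {
        int q = 1;                       /* start state q0        */
        for (; *w; w++)
            q = delta[q][*w == 'b'];     /* one transition/symbol */
        return q == 4;                   /* F = {4}               */
    }

    int main(void)
    {
        const char *tests[] = { "ab", "abbb", "ba", "a", "" };
        for (int i = 0; i < 5; i++)
            printf("%-5s %s\n", tests[i], accepts(tests[i]) ? "Yes" : "No");
        return 0;
    }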

3.2.2 Transition Diagrams

DFAs are best understood by depicting them as transition diagrams; these are directed graphs with nodes representing states and labelled arcs between states representing transitions. A transition diagram for a DFA is drawn as follows:

(1) Draw a node labelled q for each state q ∈ Q.
(2) For every q ∈ Q and every a ∈ Σ, draw an arc labelled a from node q to node δ(q, a).
(3) Draw an unlabelled arc from outside the DFA to the node representing the initial state q0.
(4) Indicate each final state by drawing a concentric circle around its node to form a double circle.

3.2.3 Examples

Let M1 = (Q, Σ, δ, q0, F) where Q = {1, 2, 3, 4}, Σ = {a, b}, q0 = 1, F = {4} and where δ is given by:

    δ(1, a) = 2    δ(1, b) = 3
    δ(2, a) = 3    δ(2, b) = 4
    δ(3, a) = 3    δ(3, b) = 3
    δ(4, a) = 3    δ(4, b) = 4

From the transition diagram for M1 it is clear that:

    L(M1) = { w ∈ {a, b}* : δ̂(1, w) ∈ F }
          = { w ∈ {a, b}* : δ̂(1, w) = 4 }
          = { ab, abb, abbb, ..., ab^n, ... }
          = L(ab+).

Let M2 be obtained from M1 by adding states 1 and 2 to F. Then L(M2) = L(ǫ|ab*).

Let M3 be obtained from M1 by changing F to {3}. Then L(M3) = L((b|aa|abb*a)(a|b)*).

Simplifications to transition diagrams.

It is often the case that a DFA has an error state, that is, a non-accepting state from which there are no transitions other than back to the error state. In such a case it is convenient to apply the convention that any apparently missing transitions are transitions to the error state.

It is also common for there to be a large number of transitions between two given states in a DFA, which results in a cluttered transition diagram. For example, in an identifier-recognition DFA there may be 52 arcs, labelled with each of the lower- and upper-case letters, from the start state to a state representing that a single letter has been recognised. It is convenient in such cases to define a set comprising the labels of each of these arcs, for example

    letter = {a, b, c, ..., z, A, B, C, ..., Z} ⊆ Σ

and to replace the arcs by a single arc labelled by the name of this set, e.g. letter. It is acceptable practice to use these conventions provided it is made clear that they are in operation.

3.2.4 Equivalence Theorem

(1) For every r ∈ RE(Σ) there exists a DFA M with alphabet Σ such that L(M) = L(r).
(2) For every DFA M with alphabet Σ there exists an r ∈ RE(Σ) such that L(r) = L(M).

Proof. See J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, 1979).

Applications. The significance of the Equivalence Theorem is that its proof is constructive: there is an algorithm that, given a regular expression r, builds a DFA M such that L(M) = L(r). Thus, if we can write a fragment of a programming language's syntax in terms of regular expressions, then by part (1) of the Theorem we can automatically construct a lexical analyser for that fragment.

Part (2) of the Equivalence Theorem is a useful tool for showing that a language is regular, since if we cannot find a regular expression directly, part (2) states that it is sufficient to find a DFA that recognises the language.

The standard algorithm for constructing a DFA from a given regular expression is not difficult, but would require that we also take a look at nondeterministic finite state automata (NFAs). NFAs are equivalent in power to DFAs but are slightly harder to understand (see the course text for details). Given a regular expression, the RE-to-DFA algorithm first constructs an NFA equivalent to the RE (by a method known as Thompson's Construction), and then transforms the NFA into an equivalent DFA (by a method known as the Subset Construction).

3.3 DFAs for Lexical Analysis

Let's suppose we wish to construct a lexical analyser based on a DFA. We have seen that it is easy to construct a DFA that recognises lexemes for a given programming language token (e.g. for individual keywords, for identifiers, and for numbers). However, a lexical analyser has to deal with all of a programming language's lexical patterns, and has to repeatedly match sequences of characters against these patterns and output corresponding tokens. We illustrate how lexical analysers may be constructed using DFAs by means of an example.

3.3.1 An example DFA lexical analyser

Consider first writing a DFA for recognising tokens for a (minimal!) language with identifiers and the symbols +, ; and := (we'll add keywords later).

[Transition diagram for the DFA. From start: a letter leads to in_ident, which loops on letters and digits and exits on any other character (re-read) to done with token identifier (or keyword); ':' leads to in_assign, from which '=' leads to done with token assign and any other character to done with token error; '+' leads to done with token plus; ';' leads to done with token semi_colon; whitespace loops back to start; end_of_file leads to done with token end_of_input; any other character leads to done with token error.]

The lexical analyser code (see next section) consists of a procedure get_next_token which outputs the next token, which can be either identifier (for identifiers), plus (for +), semi_colon (for ;), assign (for :=), error (if an error occurs, for example if an invalid character such as '(' is read, or if a ':' is not followed by a '=') and end_of_input (for when the complete input file has been read). The lexical analyser based on the DFA begins in state start, and returns a token when it enters state done; the token returned depends on the final transition it takes to enter the done state, and is shown on the right-hand side of the diagram.

For a state with an output arc labelled "other", the intuition is that this transition is made on reading any character except those labelled on the state's other arcs; "(re-read)" denotes that the read character should not be consumed: it is re-read as the first character when get_next_token is next called.

Notice that adding keyword recognition to the above DFA would be tricky to do by hand and would lead to a complex DFA (why?). However, we can recognise keywords as identifiers using the above DFA, and when the accepting state for identifiers is entered the lexeme stored in the buffer can be checked against a table of keywords. If there is a match, then the appropriate keyword token is output, else the identifier token is output. Further, when we have recognised an identifier we will also wish to output its string value as well as the identifier token, so that this can be used by the next phase of compilation.

3.3.2 Code for the example DFA lexical analyser

Let's consider how code may be written based on the above DFA. Let's add the keywords if, then, else and fi to the language to make it slightly more realistic. Firstly, we define enumerated types for the sets of states and tokens:

    state = (start, in_identifier, in_assign, done);
    token = (k_if, k_then, k_else, k_fi, plus, identifier,
             assign, semi_colon, error, end_of_input);

Next, we define some variables that are shared by the lexical analyser and the syntax analyser. The job of the procedure get_next_token is to set the value of current_token to the next token, and if this token is identifier it also sets the value of current_identifier to the current lexeme. The value of the Boolean variable reread_character determines whether the last character read during the previous execution of get_next_token should be re-read at the beginning of its next execution. The current_character variable holds the value of the last character read by get_next_token.

    current_token      : token;
    current_identifier : string[100];
    reread_character   : boolean;
    current_character  : char;

We also need the following auxiliary functions (their implementations are omitted here) with the obvious interpretations:

    function is_alpha(c : char) : boolean;

    function is_digit(c : char) : boolean;
    function is_white_space(c : char) : boolean;

Finally, we define two constant arrays:

    { Constants used to recognise keyword matches }
    NUM_KEYWORDS = 4;
    token_tab   : array[1..NUM_KEYWORDS] of token  = (k_if, k_then, k_else, k_fi);
    keyword_tab : array[1..NUM_KEYWORDS] of string = ('if', 'then', 'else', 'fi');

that store keyword tokens and keywords (with associated keywords and tokens stored at the same location in each array), and a function that searches the keyword array for a string and returns the token associated with a matched keyword, or the token identifier if there is no match. Notice that the arrays and function are easily modified for any number of keywords appearing in our source language.

    function keyword_lookup(s : string) : token;
    { If s is a keyword, return this keyword's token;
      else return the identifier token }
    var
       index : integer;
       found : boolean;
    begin
       keyword_lookup := identifier;
       found := FALSE;
       index := 1;
       while (index <= NUM_KEYWORDS) and (not found) do
       begin
          if keyword_tab[index] = s then
          begin
             keyword_lookup := token_tab[index];
             found := TRUE
          end;
          index := index + 1
       end
    end;

The get_next_token procedure is implemented as follows. Notice that within the main loop a case statement is used to deal with transitions from the current state. After the loop exits (when the done state is entered), if an identifier has been recognised, keyword_lookup is used to check whether or not a keyword has been matched.

    procedure get_next_token;
    { Sets the value of current_token by matching input characters. Also
      sets the values of current_identifier and reread_character
      if appropriate }
    var
       current_state : state;
       no_more_input : boolean;
    begin
       current_state := start;
       current_identifier := '';
       while not (current_state = done) do
       begin
          no_more_input := eof;  { check whether at end of file }
          if not (reread_character or no_more_input) then
             read(current_character);
          reread_character := FALSE;
          case current_state of
             start:
                if no_more_input then
                begin
                   current_token := end_of_input;
                   current_state := done
                end
                else if is_white_space(current_character) then
                   current_state := start
                else if is_alpha(current_character) then
                begin
                   current_identifier := current_identifier + current_character;
                   current_state := in_identifier
                end
                else
                   case current_character of
                      ';' : begin
                               current_token := semi_colon;
                               current_state := done
                            end;
                      '+' : begin
                               current_token := plus;
                               current_state := done
                            end;
                      ':' : current_state := in_assign
                   else
                      begin
                         current_token := error;
                         current_state := done
                      end
                   end; { case }
             in_identifier:
                if (no_more_input or not (is_alpha(current_character) or
                                          is_digit(current_character))) then
                begin
                   current_token := identifier;

                   current_state := done;
                   reread_character := TRUE
                end
                else
                   current_identifier := current_identifier + current_character;
             in_assign:
                if no_more_input or (current_character <> '=') then
                begin
                   current_token := error;
                   current_state := done
                end
                else
                begin
                   current_token := assign;
                   current_state := done
                end
          end; { case }
       end; { while }
       if (current_token = identifier) then
          current_token := keyword_lookup(current_identifier);
    end;

Test code (in the absence of a syntax analyser) might be the following. This just repeatedly calls get_next_token until the end of the input file has been reached, and prints out the value of the read token.

    { Request tokens from the lexical analyser, outputting their
      values, until end_of_input }
    begin
       reread_character := FALSE;
       repeat
          get_next_token;
          writeln('Current Token is ', token_to_text(current_token));
          if (current_token = identifier) then
             writeln('Identifier is ', current_identifier)
       until (current_token = end_of_input)
    end.

where

    function token_to_text(t : token) : string;

converts token values to text.

3.4 Lex

Lex is a widely available lexical analyser generator.

3.4.1 Overview

Given a Lex source file comprising regular expressions for various tokens, Lex generates a lexical analyser (based on a DFA), written in C, that groups characters matching the expressions into lexemes, and can return their corresponding tokens. In essence, a Lex file comprises a number of lines, typically of the form:

    pattern    action

where pattern is a regular expression and action is a piece of C code. When run on a Lex file, Lex produces a C file called lex.yy.c (a lexical analyser). When compiled, lex.yy.c takes a stream of characters as input, and whenever a sequence of characters matches a given regular expression the corresponding action is executed. Characters not matching any regular expressions are simply copied to the output stream.

Example. Consider the Lex fragment:

    a    { printf( "read a\n" ); }
    b    { printf( "read b\n" ); }

After compiling (see below on how to do this) we obtain a binary executable which, when executed on the input

    sdfghjklaghjbfghjkbbdfghjk
    dfghjkaghjklaghjk

produces

    sdfghjklread a
    ghjread b
    fghjkread b
    read b
    dfghjk
    dfghjkread a
    ghjklread a
    ghjk

Example. Consider the Lex program:

    %{
    int abc_count, xyz_count;
    %}
    %%
    ab[cC]  { abc_count++; }
    xyz     { xyz_count++; }
    \n      { ; }
    .       { ; }
    %%
    main()
    {
      abc_count = xyz_count = 0;
      yylex();
      printf( "%d occurrences of abc or abC\n", abc_count );
      printf( "%d occurrences of xyz\n", xyz_count );
    }

This file first declares two global variables for counting the number of occurrences of abc or abC and of xyz. Next come the regular expressions for these lexemes, and actions to increment the relevant counters. Finally, there is a main routine to initialise the counters and call yylex(). When executed on input

    akhabfabcdbcaxyzxyzabchsdk
    dfhslkdxyzabcabcdkkjxyzkdf

the lexical analyser produces:

    4 occurrences of abc or abC
    3 occurrences of xyz

Some features of Lex illustrated by this example are:

(1) The character-class notation [ ]; for example, [cC] matches either c or C.
(2) The regular expression \n, which matches a newline.
(3) The regular expression ., which matches any character except a newline.
(4) The action { ; }, which does nothing except suppress printing.

3.4.2 Format of Lex Files

The format of a Lex file is:

    definitions
    %%
    analyser specification
    %%
    auxiliary functions

Lex Definitions. The (optional) definitions section comprises macros (see below) and global declarations of types, variables and functions to be used in the actions of the lexical analyser and the auxiliary functions (if present). All such global declaration code is written in C and surrounded by %{ and %}.

Macros are abbreviations for regular expressions to be used in the analyser specification. For example, the token identifier could be defined by:

    IDENTIFIER [a-zA-Z][a-zA-Z0-9]*

The shorthand character-range construction [x-y] matches any of the characters between (and including) x and y. For example, [a-c] means the same as a|b|c, and [a-cA-C] means the same as a|b|c|A|B|C.

Definitions may use other definitions (enclosed in braces), as illustrated in:

    ALPHA      [a-zA-Z]
    ALPHANUM   [a-zA-Z0-9]
    IDENTIFIER {ALPHA}{ALPHANUM}*

and:

    ALPHA      [a-zA-Z]
    NUM        [0-9]
    ALPHANUM   ({ALPHA}|{NUM})
    IDENTIFIER {ALPHA}{ALPHANUM}*

Notice the use of parentheses in the definition of ALPHANUM. What would happen without them?

Lex Analyser Specifications. These have the form:

    r1    { action1 }
    r2    { action2 }
    ...
    rn    { actionn }

where r1, r2, ..., rn are regular expressions (possibly involving macros enclosed in braces) and action1, action2, ..., actionn are sequences of C statements. Lex translates the specification into a function yylex() which, when called, causes the following to happen:

- The current input character(s) are scanned to look for a match with the regular expressions. If there is no match, the current character is printed out, and the scanning process resumes with the next character.
- If the next m characters match ri then (a) the matching characters are assigned to the string variable yytext, (b) the integer variable yyleng is assigned the value m, (c) the next m characters are skipped, and (d) actioni is executed. If the last instruction of actioni is return n; (where n is an integer expression) then the call to yylex() terminates and the value of n is returned as the function's value; otherwise yylex() resumes the scanning process.
- If end-of-file is read at any stage, then the call to yylex() terminates, returning the value 0.
- If there is a match against two or more regular expressions, then the expression giving the longest lexeme is chosen; if all lexemes are of the same length then the first matching expression is chosen.

Lex Auxiliary Functions. This optional section has the form:

    fun1
    fun2
    ...
    funn

where each funi is a complete C function.

We can also compile lex.yy.c with the lex library using the command:

    gcc lex.yy.c -ll

This has the effect of automatically including a standard main() function, equivalent to:

    main()
    {
      yylex();
      return;
    }

Thus, in the absence of any return statements in the analyser's actions, this one call to yylex() consumes all the input up to and including end-of-file.

3.4.3 Lexical Analyser Example

The Lex program below illustrates how a lexical analyser for a Pascal-type language is defined. Notice that the regular expression for identifiers is placed at the end of the list (why?). We assume that the syntax analyser requests tokens by repeatedly calling the function yylex(). The global variable yylval (of type integer in this example) is generally used to pass tokens' attributes from the lexical analyser to the syntax analyser, and is shared by both phases of the compiler. Here it is being used to pass integer values and identifiers' symbol table positions to the syntax analyser.

    %{
    definitions (as integers) of IF, THEN, ELSE, ID, INTEGER, ...
    %}
    delim    [ \t\n]
    ws       {delim}+
    letter   [A-Za-z]
    digit    [0-9]
    id       {letter}({letter}|{digit})*
    integer  [+\-]?{digit}+
    %%
    {ws}      { ; }
    if        { return(IF); }
    then      { return(THEN); }
    else      { return(ELSE); }
    {integer} { yylval = atoi(yytext); return(INTEGER); }
    {id}      { yylval = InstallInTable(); return(ID); }
    %%
    int InstallInTable()
    {
      put yytext in the symbol table and
      return the position at which it has been inserted.
    }

4 Syntax Analysis

In this section we will look at the second phase of compilation: syntax analysis, or parsing. Since parsing is a central concept in compilation, and because (unlike lexical analysis) there are many approaches to parsing, this section makes up most of the remainder of the course.

In Section 4.1 we discuss the class of context-free languages and its relationship to the syntactic structure of programming languages and the compilation process. Parsing algorithms for context-free languages fall into two main categories: top-down and bottom-up parsers (the names refer to the process of parse tree construction). Different types of top-down and bottom-up parsing algorithms will be discussed in Sections 4.2 and 4.3 respectively.

4.1 Context-Free Languages

Regular languages are inadequate for specifying all but the simplest aspects of programming language syntax. To specify more complex languages such as

    L = { w ∈ {a, b}* : w = a^n b^n for some n },
    L = { w ∈ {(, )}* : w is a well-balanced string of parentheses }

and the syntax of most programming languages, we use context-free languages. In this section we define context-free grammars and languages, and their use in describing the syntax of programming languages. This section is intended to provide a foundation for the following sections on parsing and parser construction.

Note. Like Section 3, this section contains mainly theoretical definitions; the lectures will cover examples and diagrams illustrating the theory.

4.1.1 Context-Free Grammars

Definition. A context-free grammar is a tuple G = (T, N, S, P) where:

- T is a finite nonempty set of (terminal) symbols (tokens),
- N is a finite nonempty set of (nonterminal) symbols (denoting phrase types), disjoint from T,
- S ∈ N (the start symbol), and
- P is a set of (context-free) productions (denoting rules for phrase types) of the form A → α, where A ∈ N and α ∈ (T ∪ N)*.

Notation. In what follows we use:

    a, b, c, ...     for members of T,
    A, B, C, ...     for members of N,
    ..., X, Y, Z     for members of T ∪ N,
    u, v, w, ...     for members of T*, and
    α, β, γ, ...     for members of (T ∪ N)*.

Examples.

(1) G1 = (T, N, S, P) where T = {a, b}, N = {S} and P = {S → ab, S → aSb}.

(2) G2 = (T, N, S, P) where T = {a, b}, N = {S, X} and P = {S → X, S → aa, S → bb, S → aSa, S → bSb, X → a, X → b}.

Notation. It is customary to define a context-free grammar by simply listing its productions and assuming:

- The terminals and nonterminals of the grammar are exactly those symbols appearing in the productions. (It is usually clear from the context whether a symbol is a terminal or a nonterminal.)
- The start symbol is the nonterminal on the left-hand side of the first production.
- Right-hand sides separated by | indicate alternatives.

For example, G2 above can be written as

    S → X | aa | bb | aSa | bSb
    X → a | b
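The grammar G1 generates exactly { a^n b^n : n ≥ 1 }, the first of the languages above that no DFA can recognise, since recognition requires counting an unbounded number of a's. Here is a sketch in C of a decider for it (ours, not from the notes), where a simple counter stands in for the stack of a push-down automaton:

    #include <stdio.h>

    /* Decide membership of L(G1) = { a^n b^n : n >= 1 }.
       The counter n plays the role of the DPDA's stack. */
    int in_L(const char *w)
    {
        size_t i = 0, n = 0;
        while (w[i] == 'a') { i++; n++; }   /* count the leading a's      */
        while (w[i] == 'b') {               /* match each b against an a  */
            if (n == 0) return 0;
            i++; n--;
        }
        /* accept iff all a's were matched, the whole word was
           consumed, and the word was non-empty */
        return n == 0 && w[i] == '\0' && i > 0;
    }

    int main(void)
    {
        const char *tests[] = { "ab", "aabb", "aab", "ba", "" };
        for (int i = 0; i < 5; i++)
            printf("%-5s %s\n", tests[i], in_L(tests[i]) ? "Yes" : "No");
        return 0;
    }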


More information

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile Pushdown Automata In the last section we found that restricting the computational power of computing devices produced solvable decision problems for the class of sets accepted by finite automata. But along

More information

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment

Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment Programming Assignment II Due Date: See online CISC 672 schedule Individual Assignment 1 Overview Programming assignments II V will direct you to design and build a compiler for Cool. Each assignment will

More information

Semantic Analysis: Types and Type Checking

Semantic Analysis: Types and Type Checking Semantic Analysis Semantic Analysis: Types and Type Checking CS 471 October 10, 2007 Source code Lexical Analysis tokens Syntactic Analysis AST Semantic Analysis AST Intermediate Code Gen lexical errors

More information

Syntaktická analýza. Ján Šturc. Zima 208

Syntaktická analýza. Ján Šturc. Zima 208 Syntaktická analýza Ján Šturc Zima 208 Position of a Parser in the Compiler Model 2 The parser The task of the parser is to check syntax The syntax-directed translation stage in the compiler s front-end

More information

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm

Scanning and parsing. Topics. Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm Scanning and Parsing Announcements Pick a partner by Monday Makeup lecture will be on Monday August 29th at 3pm Today Outline of planned topics for course Overall structure of a compiler Lexical analysis

More information

Informatica e Sistemi in Tempo Reale

Informatica e Sistemi in Tempo Reale Informatica e Sistemi in Tempo Reale Introduction to C programming Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa October 25, 2010 G. Lipari (Scuola Superiore Sant Anna)

More information

Moving from CS 61A Scheme to CS 61B Java

Moving from CS 61A Scheme to CS 61B Java Moving from CS 61A Scheme to CS 61B Java Introduction Java is an object-oriented language. This document describes some of the differences between object-oriented programming in Scheme (which we hope you

More information

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T) Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating

More information

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions CSC45 AUTOMATA 2. Finite Automata: Examples and Definitions Finite Automata: Examples and Definitions A finite automaton is a simple type of computer. Itsoutputislimitedto yes to or no. It has very primitive

More information

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer How to make the computer understand? Fall 2005 Lecture 15: Putting it all together From parsing to code generation Write a program using a programming language Microprocessors talk in assembly language

More information

Programming Project 1: Lexical Analyzer (Scanner)

Programming Project 1: Lexical Analyzer (Scanner) CS 331 Compilers Fall 2015 Programming Project 1: Lexical Analyzer (Scanner) Prof. Szajda Due Tuesday, September 15, 11:59:59 pm 1 Overview of the Programming Project Programming projects I IV will direct

More information

Finite Automata. Reading: Chapter 2

Finite Automata. Reading: Chapter 2 Finite Automata Reading: Chapter 2 1 Finite Automaton (FA) Informally, a state diagram that comprehensively captures all possible states and transitions that a machine can take while responding to a stream

More information

The programming language C. sws1 1

The programming language C. sws1 1 The programming language C sws1 1 The programming language C invented by Dennis Ritchie in early 1970s who used it to write the first Hello World program C was used to write UNIX Standardised as K&C (Kernighan

More information

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code 1 Introduction The purpose of this assignment is to write an interpreter for a small subset of the Lisp programming language. The interpreter should be able to perform simple arithmetic and comparisons

More information

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6) Semantic Analysis Scoping (Readings 7.1,7.4,7.6) Static Dynamic Parameter passing methods (7.5) Building symbol tables (7.6) How to use them to find multiply-declared and undeclared variables Type checking

More information

3515ICT Theory of Computation Turing Machines

3515ICT Theory of Computation Turing Machines Griffith University 3515ICT Theory of Computation Turing Machines (Based loosely on slides by Harald Søndergaard of The University of Melbourne) 9-0 Overview Turing machines: a general model of computation

More information

CS5236 Advanced Automata Theory

CS5236 Advanced Automata Theory CS5236 Advanced Automata Theory Frank Stephan Semester I, Academic Year 2012-2013 Advanced Automata Theory is a lecture which will first review the basics of formal languages and automata theory and then

More information

Notes on Complexity Theory Last updated: August, 2011. Lecture 1

Notes on Complexity Theory Last updated: August, 2011. Lecture 1 Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look

More information

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move. CS54 Turing Machines Turing Machine q 0 AI N P U T IN TAPE read write move read write move Language = {0} q This Turing machine recognizes the language {0} Turing Machines versus DFAs TM can both write

More information

Regular Languages and Finite State Machines

Regular Languages and Finite State Machines Regular Languages and Finite State Machines Plan for the Day: Mathematical preliminaries - some review One application formal definition of finite automata Examples 1 Sets A set is an unordered collection

More information

Automata and Formal Languages

Automata and Formal Languages Automata and Formal Languages Winter 2009-2010 Yacov Hel-Or 1 What this course is all about This course is about mathematical models of computation We ll study different machine models (finite automata,

More information

Chapter One Introduction to Programming

Chapter One Introduction to Programming Chapter One Introduction to Programming 1-1 Algorithm and Flowchart Algorithm is a step-by-step procedure for calculation. More precisely, algorithm is an effective method expressed as a finite list of

More information

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) The JDK command to compile a class in the file Test.java is A) java Test.java B) java

More information

C Compiler Targeting the Java Virtual Machine

C Compiler Targeting the Java Virtual Machine C Compiler Targeting the Java Virtual Machine Jack Pien Senior Honors Thesis (Advisor: Javed A. Aslam) Dartmouth College Computer Science Technical Report PCS-TR98-334 May 30, 1998 Abstract One of the

More information

Automata Theory. Şubat 2006 Tuğrul Yılmaz Ankara Üniversitesi

Automata Theory. Şubat 2006 Tuğrul Yılmaz Ankara Üniversitesi Automata Theory Automata theory is the study of abstract computing devices. A. M. Turing studied an abstract machine that had all the capabilities of today s computers. Turing s goal was to describe the

More information

The previous chapter provided a definition of the semantics of a programming

The previous chapter provided a definition of the semantics of a programming Chapter 7 TRANSLATIONAL SEMANTICS The previous chapter provided a definition of the semantics of a programming language in terms of the programming language itself. The primary example was based on a Lisp

More information

Formal Grammars and Languages

Formal Grammars and Languages Formal Grammars and Languages Tao Jiang Department of Computer Science McMaster University Hamilton, Ontario L8S 4K1, Canada Bala Ravikumar Department of Computer Science University of Rhode Island Kingston,

More information

Finite Automata. Reading: Chapter 2

Finite Automata. Reading: Chapter 2 Finite Automata Reading: Chapter 2 1 Finite Automata Informally, a state machine that comprehensively captures all possible states and transitions that a machine can take while responding to a stream (or

More information

A Lex Tutorial. Victor Eijkhout. July 2004. 1 Introduction. 2 Structure of a lex file

A Lex Tutorial. Victor Eijkhout. July 2004. 1 Introduction. 2 Structure of a lex file A Lex Tutorial Victor Eijkhout July 2004 1 Introduction The unix utility lex parses a file of characters. It uses regular expression matching; typically it is used to tokenize the contents of the file.

More information

Regular Languages and Finite Automata

Regular Languages and Finite Automata Regular Languages and Finite Automata 1 Introduction Hing Leung Department of Computer Science New Mexico State University Sep 16, 2010 In 1943, McCulloch and Pitts [4] published a pioneering work on a

More information

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems. 3130CIT: Theory of Computation Turing machines and undecidability (IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems. An undecidable problem

More information

Introduction to Java Applications. 2005 Pearson Education, Inc. All rights reserved.

Introduction to Java Applications. 2005 Pearson Education, Inc. All rights reserved. 1 2 Introduction to Java Applications 2.2 First Program in Java: Printing a Line of Text 2 Application Executes when you use the java command to launch the Java Virtual Machine (JVM) Sample program Displays

More information

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint) TN203 Porting a Program to Dynamic C Introduction Dynamic C has a number of improvements and differences compared to many other C compiler systems. This application note gives instructions and suggestions

More information

Introduction to Turing Machines

Introduction to Turing Machines Automata Theory, Languages and Computation - Mírian Halfeld-Ferrari p. 1/2 Introduction to Turing Machines SITE : http://www.sir.blois.univ-tours.fr/ mirian/ Automata Theory, Languages and Computation

More information

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005 University of Toronto Department of Electrical and Computer Engineering Midterm Examination CSC467 Compilers and Interpreters Fall Semester, 2005 Time and date: TBA Location: TBA Print your name and ID

More information

Deterministic Finite Automata

Deterministic Finite Automata 1 Deterministic Finite Automata Definition: A deterministic finite automaton (DFA) consists of 1. a finite set of states (often denoted Q) 2. a finite set Σ of symbols (alphabet) 3. a transition function

More information

Basics of Compiler Design

Basics of Compiler Design Basics of Compiler Design Anniversary edition Torben Ægidius Mogensen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF COPENHAGEN Published through lulu.com. c Torben Ægidius Mogensen 2000 2010 torbenm@diku.dk

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

JavaScript: Introduction to Scripting. 2008 Pearson Education, Inc. All rights reserved.

JavaScript: Introduction to Scripting. 2008 Pearson Education, Inc. All rights reserved. 1 6 JavaScript: Introduction to Scripting 2 Comment is free, but facts are sacred. C. P. Scott The creditor hath a better memory than the debtor. James Howell When faced with a decision, I always ask,

More information

7.1 Our Current Model

7.1 Our Current Model Chapter 7 The Stack In this chapter we examine what is arguably the most important abstract data type in computer science, the stack. We will see that the stack ADT and its implementation are very simple.

More information

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement

More information

Semester Review. CSC 301, Fall 2015

Semester Review. CSC 301, Fall 2015 Semester Review CSC 301, Fall 2015 Programming Language Classes There are many different programming language classes, but four classes or paradigms stand out:! Imperative Languages! assignment and iteration!

More information

LEX/Flex Scanner Generator

LEX/Flex Scanner Generator Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1 LEX/Flex Scanner Generator Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 2 flex - Fast Lexical Analyzer Generator We can use flex a to automatically

More information

CS 3719 (Theory of Computation and Algorithms) Lecture 4

CS 3719 (Theory of Computation and Algorithms) Lecture 4 CS 3719 (Theory of Computation and Algorithms) Lecture 4 Antonina Kolokolova January 18, 2012 1 Undecidable languages 1.1 Church-Turing thesis Let s recap how it all started. In 1990, Hilbert stated a

More information

Computability Theory

Computability Theory CSC 438F/2404F Notes (S. Cook and T. Pitassi) Fall, 2014 Computability Theory This section is partly inspired by the material in A Course in Mathematical Logic by Bell and Machover, Chap 6, sections 1-10.

More information

Flex/Bison Tutorial. Aaron Myles Landwehr aron+ta@udel.edu CAPSL 2/17/2012

Flex/Bison Tutorial. Aaron Myles Landwehr aron+ta@udel.edu CAPSL 2/17/2012 Flex/Bison Tutorial Aaron Myles Landwehr aron+ta@udel.edu 1 GENERAL COMPILER OVERVIEW 2 Compiler Overview Frontend Middle-end Backend Lexer / Scanner Parser Semantic Analyzer Optimizers Code Generator

More information

CHAPTER 7 GENERAL PROOF SYSTEMS

CHAPTER 7 GENERAL PROOF SYSTEMS CHAPTER 7 GENERAL PROOF SYSTEMS 1 Introduction Proof systems are built to prove statements. They can be thought as an inference machine with special statements, called provable statements, or sometimes

More information

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share. LING115 Lecture Note Session #4 Python (1) 1. Introduction As we have seen in previous sessions, we can use Linux shell commands to do simple text processing. We now know, for example, how to count words.

More information

Programming Languages

Programming Languages Programming Languages Programming languages bridge the gap between people and machines; for that matter, they also bridge the gap among people who would like to share algorithms in a way that immediately

More information

Antlr ANother TutoRiaL

Antlr ANother TutoRiaL Antlr ANother TutoRiaL Karl Stroetmann March 29, 2007 Contents 1 Introduction 1 2 Implementing a Simple Scanner 1 A Parser for Arithmetic Expressions 4 Symbolic Differentiation 6 5 Conclusion 10 1 Introduction

More information

Language Processing Systems

Language Processing Systems Language Processing Systems Evaluation Active sheets 10 % Exercise reports 30 % Midterm Exam 20 % Final Exam 40 % Contact Send e-mail to hamada@u-aizu.ac.jp Course materials at www.u-aizu.ac.jp/~hamada/education.html

More information

Chapter 2: Elements of Java

Chapter 2: Elements of Java Chapter 2: Elements of Java Basic components of a Java program Primitive data types Arithmetic expressions Type casting. The String type (introduction) Basic I/O statements Importing packages. 1 Introduction

More information

GENERIC and GIMPLE: A New Tree Representation for Entire Functions

GENERIC and GIMPLE: A New Tree Representation for Entire Functions GENERIC and GIMPLE: A New Tree Representation for Entire Functions Jason Merrill Red Hat, Inc. jason@redhat.com 1 Abstract The tree SSA project requires a tree representation of functions for the optimizers

More information

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013 Oct 4, 2013, p 1 Name: CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013 1. (max 18) 4. (max 16) 2. (max 12) 5. (max 12) 3. (max 24) 6. (max 18) Total: (max 100)

More information

Automata on Infinite Words and Trees

Automata on Infinite Words and Trees Automata on Infinite Words and Trees Course notes for the course Automata on Infinite Words and Trees given by Dr. Meghyn Bienvenu at Universität Bremen in the 2009-2010 winter semester Last modified:

More information

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2005 Handout 7 Scanner Parser Project Wednesday, September 7 DUE: Wednesday, September 21 This

More information

Python Loops and String Manipulation

Python Loops and String Manipulation WEEK TWO Python Loops and String Manipulation Last week, we showed you some basic Python programming and gave you some intriguing problems to solve. But it is hard to do anything really exciting until

More information

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy Kim S. Larsen Odense University Abstract For many years, regular expressions with back referencing have been used in a variety

More information

Finite Automata and Regular Languages

Finite Automata and Regular Languages CHAPTER 3 Finite Automata and Regular Languages 3. Introduction 3.. States and Automata A finite-state machine or finite automaton (the noun comes from the Greek; the singular is automaton, the Greek-derived

More information

AUTOMATED TEST GENERATION FOR SOFTWARE COMPONENTS

AUTOMATED TEST GENERATION FOR SOFTWARE COMPONENTS TKK Reports in Information and Computer Science Espoo 2009 TKK-ICS-R26 AUTOMATED TEST GENERATION FOR SOFTWARE COMPONENTS Kari Kähkönen ABTEKNILLINEN KORKEAKOULU TEKNISKA HÖGSKOLAN HELSINKI UNIVERSITY OF

More information

The C Programming Language course syllabus associate level

The C Programming Language course syllabus associate level TECHNOLOGIES The C Programming Language course syllabus associate level Course description The course fully covers the basics of programming in the C programming language and demonstrates fundamental programming

More information

Data Integrator. Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA

Data Integrator. Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA Data Integrator Event Management Guide Pervasive Software, Inc. 12365-B Riata Trace Parkway Austin, Texas 78727 USA Telephone: 888.296.5969 or 512.231.6000 Fax: 512.231.6010 Email: info@pervasiveintegration.com

More information

Fast nondeterministic recognition of context-free languages using two queues

Fast nondeterministic recognition of context-free languages using two queues Fast nondeterministic recognition of context-free languages using two queues Burton Rosenberg University of Miami Abstract We show how to accept a context-free language nondeterministically in O( n log

More information

VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR

VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR Andrey V.Lyamin, State University of IT, Mechanics and Optics St. Petersburg, Russia Oleg E.Vashenkov, State University of IT, Mechanics and Optics, St.Petersburg,

More information

The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2).

The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2). CHAPTER 5 The Tree Data Model There are many situations in which information has a hierarchical or nested structure like that found in family trees or organization charts. The abstraction that models hierarchical

More information

Ed. v1.0 PROGRAMMING LANGUAGES WORKING PAPER DRAFT PROGRAMMING LANGUAGES. Ed. v1.0

Ed. v1.0 PROGRAMMING LANGUAGES WORKING PAPER DRAFT PROGRAMMING LANGUAGES. Ed. v1.0 i PROGRAMMING LANGUAGES ii Copyright 2011 Juhász István iii COLLABORATORS TITLE : PROGRAMMING LANGUAGES ACTION NAME DATE SIGNATURE WRITTEN BY István Juhász 2012. március 26. Reviewed by Ágnes Korotij 2012.

More information

A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts

A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts [Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41] A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts A notational

More information

CSE 135: Introduction to Theory of Computation Decidability and Recognizability

CSE 135: Introduction to Theory of Computation Decidability and Recognizability CSE 135: Introduction to Theory of Computation Decidability and Recognizability Sungjin Im University of California, Merced 04-28, 30-2014 High-Level Descriptions of Computation Instead of giving a Turing

More information

CMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013

CMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013 CMPSCI 250: Introduction to Computation Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013 Regular Expressions and Their Languages Alphabets, Strings and Languages

More information