CSC236 Week 9. Larry Zhang



NEW TOPIC: Finite Automata & Regular Languages

Finite Automata
An important part of the theory of computation.
A simple and powerful model for computation.
It describes a simple idealized machine (the theoretical computer).
It has many applications:
- Digital circuit design (you'll use it in CSC258)
- Compilers and interpreters (how the computer understands your program)
- Text searching/parsing, lexical analysis, pattern matching
- Neural networks (models the working mechanism of the neurons)
- etc.
Yet another hallmark for a CS pro.

The application we focus on in CSC236: Regular Languages & Regular Expressions

Alphabet, String, Language, Regular language, Regular expression, Kleene star, ...

Terminology: Alphabet
Alphabet: a finite set of symbols
e.g., {0, 1}  # the alphabet for binary strings
e.g., {a, b, c, ..., z}  # the alphabet for English words
etc.
We denote an alphabet using the Greek letter Σ ("sigma"). It has nothing to do with summation.

Terminology: String
A string w over alphabet Σ is a finite sequence of symbols from Σ.
For example, the following are some strings over the alphabet {0, 1}: 0, 1, 0110, 1110
Each English word is a string over the alphabet {a, b, ..., z}, e.g., "brace", "yourself".
A special string: the empty string, which we denote with ε. It is a string over any alphabet.

Terminology: Length of a String
The length of string w is the number of characters in w.
It is written as |w|, for example
|brace| = 5
|010111| = 6
|ε| = 0

Exercises

Consider alphabet Σ = {0, 1}: the set of all binary strings of length 0. A set with one string (the empty string) in it: {ε}

Consider alphabet Σ = {0, 1}: the set of all binary strings of length 1. A set with two strings in it: {0, 1}

Consider alphabet Σ = {0, 1}: the set of all binary strings of length 2. A set with four strings in it: {00, 01, 10, 11}

Consider alphabet Σ = {0, 1}: the set of all binary strings of length 12. A set with 4096 (2^12) strings in it: {000000000000, ..., 111111111111}

Consider alphabet Σ = {0, 1}: the set of all binary strings of length from 0 to arbitrarily large, written Σ*. A set with an infinite number of strings in it.

Consider alphabet Σ = {0, 1}: the set of all binary strings of length 0 or 1. A set with three strings in it: {ε, 0, 1}
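
The counts in these exercises can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the helper name strings_of_length is made up for illustration, and the empty string ε is represented by Python's "".

```python
from itertools import product

def strings_of_length(alphabet, n):
    """Return every string of length n over the given alphabet."""
    symbols = sorted(alphabet)  # fixed order so the output is reproducible
    return ["".join(p) for p in product(symbols, repeat=n)]

sigma = {"0", "1"}

print(strings_of_length(sigma, 0))        # [''] : only the empty string ε
print(strings_of_length(sigma, 2))        # ['00', '01', '10', '11']
print(len(strings_of_length(sigma, 12)))  # 4096 == 2**12

# Length 0 or 1: put the two sets together.
length_0_or_1 = strings_of_length(sigma, 0) + strings_of_length(sigma, 1)
print(length_0_or_1)                      # ['', '0', '1']
```

The set of strings of every length cannot be listed this way, since it is infinite; any program can only enumerate it up to some chosen length bound.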

Back to terminology...

Warm-Up
Let the alphabet be Σ = {a, b, c, ..., z}. Is the set of all English words L basically Σ*?
No. L is only a subset of Σ*, i.e., L ⊆ Σ*.
Not all strings in Σ* are in L, such as "sdfasdf", "ttttt" and the empty string.

Terminology: Language
A language L over alphabet Σ is a subset of Σ*, i.e., L ⊆ Σ*.
The set of all English words is a language over Σ = {a, b, ..., z}.
Languages can have finite or infinite size, e.g.,
- The English language is finite
- ... is finite
- ... is finite
- ... is infinite

Operations on Languages
Given two languages L, M ⊆ Σ*, three operations can be used to generate new languages.
- Union, L ∪ M: all strings from L together with all strings from M.
- Concatenation, LM: concatenate each string in L with each string in M.
- Kleene star, L*: all strings that can be formed by concatenating zero or more strings from L.
These give recursively defined sets, whose properties can be proven by structural induction!
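
The three operations can be sketched in Python with finite sets of strings standing in for languages. This is an illustration only: the function names and the languages A and B are hypothetical, and because L* is usually infinite, the star function takes an assumed length bound max_len and returns only the strings of L* up to that length.

```python
def union(L, M):
    """L ∪ M: every string that is in L or in M."""
    return L | M

def concat(L, M):
    """LM: each string of L followed by each string of M."""
    return {x + y for x in L for y in M}

def star(L, max_len):
    """The strings of L* whose length is at most max_len.

    L* is infinite whenever L contains a non-empty string, so this
    sketch truncates by length instead of trying to build all of L*.
    """
    result = {""}      # zero concatenations give the empty string ε
    frontier = {""}    # strings discovered in the previous round
    while frontier:
        longer = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = longer - result   # keep only strings not seen before
        result |= frontier
    return result

# Hypothetical example languages, chosen only for illustration.
A = {"x", "xy"}
B = {"z"}

print(sorted(union(A, B)))    # ['x', 'xy', 'z']
print(sorted(concat(A, B)))   # ['xyz', 'xz']
print(sorted(star(B, 3)))     # ['', 'z', 'zz', 'zzz']
```

Note that star always contains ε, even for the empty language, because "zero or more" includes zero concatenations.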

Exercises

Kleene star example
Let language L = {0, 1}, then what is L*?
Answer: the set of all binary strings, including the empty string.

Consider languages L = {a, aa} and M = {a, cc}. What is L ∪ M?
L ∪ M = {a, aa, cc}

Consider languages L = {a, aa} and M = {a, cc}. What is LM?
LM = {a.a, a.cc, aa.a, aa.cc}
(the dot is for visualizing concatenation, it does NOT count as a character)

Consider languages L = {a, aa} and M = {a, cc}. What is L*?
L* = {ε, a, aa, aaa, aaaa, aaaaa, ...}
Better description: L* = {w | w consists of 0 or more a's}

Consider languages L = {a, aa} and M = {a, cc}. What is M*?
M* = {ε, a, aa, cc, acc, cca, aaa, aacc, ccaa, ccc, cccc, ...}
M* = {w | w consists of 0 or more a's and c's, and all c's are in pairs}
M* = {w | w consists of 0 or more a's and cc's}

Consider languages L = {a, aa} and M = {a, cc}. What is (L ∪ M)*?
(L ∪ M)* = {a, aa, cc}*
(L ∪ M)* = {w | w consists of 0 or more a's and c's, and all c's are in pairs}
It is the same set as M*.
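
The claim that (L ∪ M)* is the same set as M* can be sanity-checked on short strings. The sketch below is an assumption-laden illustration, not a proof: in_star and all_strings are hypothetical helper names, and the check only covers strings over {a, c} up to length 8.

```python
from itertools import product

def in_star(w, L):
    """True iff w is a concatenation of zero or more strings from L."""
    # ok[i] is True when the prefix w[:i] belongs to L*.
    ok = [True] + [False] * len(w)
    for i in range(1, len(w) + 1):
        ok[i] = any(
            ok[i - len(x)] and w[i - len(x):i] == x
            for x in L
            if 0 < len(x) <= i
        )
    return ok[len(w)]

def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for p in product(alphabet, repeat=n):
            yield "".join(p)

L = {"a", "aa"}
M = {"a", "cc"}

# Strings on which M* and (L ∪ M)* disagree, up to length 8.
disagreements = [
    w for w in all_strings("ac", 8)
    if in_star(w, M) != in_star(w, L | M)
]
print(disagreements)        # [] : no difference found up to this length
print(in_star("acca", M))   # True
print(in_star("aca", M))    # False, the c is not part of a pair
```

An empty list is only evidence for the strings tested. The real argument is that aa is already a.a, so adding aa to M contributes no new strings.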

More terminology...

Terminology: Regular Languages
Regular languages are a subset of all languages. (Not all languages are regular.)
The set of regular languages over alphabet Σ is recursively defined as follows:
- ∅, the empty set, is a regular language.
- {ε}, the language consisting of only the empty string, is a regular language.
- For any symbol a ∈ Σ, {a} is a regular language.
- If L, M are regular languages, then so are L ∪ M, LM, and L*.

Quick Exercise
Prove that language L = {a, aa} is a regular language.
Proof:
{a} is regular by definition.
So {aa} = {a}{a} is regular (concatenation rule).
So {a, aa} = {a} ∪ {aa} is regular (union rule).

One last piece of terminology...

Terminology: Regular Expressions
A regular expression (regex) is a string representation of a regular language.
A regex "matches" a set of strings (the represented regular language).
It also has a recursive definition: these are the bases, and there is more...

Definition of regex, continued...

More Exercises

EX1: Find Language from Regular Expression
Describe the language represented by the following regular expression.
r: (01) + 1(0+1)*
The * has the highest precedence.
(01): it matches the string 01.
1(0+1)*: must start with 1, followed by any string of 0's and 1's.
L(r): the set containing 01 and all strings of 0's and 1's that start with 1.
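
The description of L(r) can be spot-checked with Python's re module. This is a sketch under one translation assumption: the slides write union as "+", whereas Python writes it as "|" (in Python, "+" means "one or more"). The test strings are chosen here for illustration.

```python
import re

# Slide notation:  (01) + 1(0+1)*
# Python notation: 01|1[01]*      (the slide's "+" becomes "|")
r = re.compile(r"01|1[01]*")

# fullmatch asks whether the WHOLE string is in the language.
for w in ["01", "1", "10", "1101", "", "0", "011", "0101"]:
    print(repr(w), bool(r.fullmatch(w)))
# The first four strings print True, the last four print False.
```

Using fullmatch rather than search matters: search would report a hit whenever some substring matches, which is not the same as the string belonging to L(r).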

EX2: Develop Regular Expression from Language
Give a regular expression that represents the following language.
Naive way: enumerate all possible strings
ε + a + aa + b + bb + ab + ba
Then try to combine some of them
ε + a(ε + a) + b(ε + b) + ab + ba
or, (ε + a + b)(ε + a + b)
Let's try to expand
(ε + a + b)(ε + a + b)
= εε + εa + εb + aε + aa + ab + bε + ba + bb
= ε + a + b + aa + ab + ba + bb    # εε = ε, εa = aε = a, εb = bε = b
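
The expansion can be reproduced mechanically by treating each factor as a finite set of strings and concatenating. A minimal sketch, assuming the target language is exactly the seven enumerated strings (the slide's own statement of the language did not survive in this transcript); concat is a hypothetical helper name.

```python
def concat(L, M):
    """LM: each string of L followed by each string of M."""
    return {x + y for x in L for y in M}

base = {"", "a", "b"}              # ε + a + b, with ε written as ""
expanded = concat(base, base)      # (ε + a + b)(ε + a + b)

# ASSUMPTION: the language is the set enumerated on the slide.
enumerated = {"", "a", "aa", "b", "bb", "ab", "ba"}

print(expanded == enumerated)      # True
print(len(expanded))               # 7: the 9 products contain duplicates
```

The nine products collapse to seven strings because εa and aε are both a, and εb and bε are both b.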

EX3
Give a regular expression that represents the following language.
Naive way: enumerate all possible strings. No can do! There are infinitely many strings in L.
What should match: ε, 0, 1, 00, 01, 10, 100, 101, 001, 01010010100
What should NOT match: 11, 111, 011, 110, 1111, 01110, 00000011

EX3
Observation 1: if we have a 1, what must happen after that 1? There must be a 0 or nothing.
Observation 2: 0's can go anywhere and there can be as many of them as you want.

EX3
Attempt #1: (10)*
It matches ε, 10, 1010, 101010, ..., all in L.
But it does NOT match everything needed. It's missing strings with multiple 0's in a row, like 100, 10010000.
Try to fix this: (100*)*

EX3
Attempt #2: (100*)*
Now I'm allowing multiple 0's in a row. Right?
It's still missing some strings: 01, 001, 0001010, 000000101000. The strings that start with 0's.
Try to fix this: 0*(100*)*

EX3
Attempt #3: 0*(100*)*
Now I'm allowing the string to start with 0 or 1. It's good, right?
It's still missing some strings: 1, 101, 0010101, 00100001. The strings that end with 1.
How to fix it?
Attempt #4: 0*(100*)*(ε + 1)
This one is correct!
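
The four attempts can be compared against the target language by brute force over short strings. The sketch below rests on a loudly stated assumption: the slide's definition of L is missing from this transcript, so L is taken to be the binary strings with no two consecutive 1's, which is what Observation 1 and the match / no-match lists suggest. The helper names and the length bound of 6 are also illustrative choices.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for p in product(alphabet, repeat=n):
            yield "".join(p)

def in_language(w):
    """ASSUMPTION: L = binary strings with no two consecutive 1's."""
    return "11" not in w

def counterexamples(pattern, max_len=6):
    """Return (missing, extra): strings of L the regex fails to match,
    and strings outside L that the regex matches anyway."""
    regex = re.compile(pattern)
    missing, extra = [], []
    for w in all_strings("01", max_len):
        matched = regex.fullmatch(w) is not None
        if in_language(w) and not matched:
            missing.append(w)
        elif matched and not in_language(w):
            extra.append(w)
    return missing, extra

# Slide notation -> Python notation ("ε + 1" becomes the optional "1?").
attempts = {
    "(10)*":            r"(?:10)*",
    "(100*)*":          r"(?:100*)*",
    "0*(100*)*":        r"0*(?:100*)*",
    "0*(100*)*(ε + 1)": r"0*(?:100*)*1?",
}

for slide_form, python_form in attempts.items():
    missing, extra = counterexamples(python_form)
    print(slide_form, "| missing:", missing[:4], "| extra:", extra[:4])
# Under the assumption above, only the last attempt has no counterexamples.
```

A clean result up to length 6 does not prove attempt #4 correct, but a single counterexample is enough to show that an attempt is wrong.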

Regex 0*(100*)*(ε + 1) represents the language L.
This regex is NOT unique. The following is also a valid regex for the same language:
(ε + 1)(00*1)*0*
Verify it yourself!
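
One way to start that verification is to compare the two regexes on every binary string up to some length. This is a hedged sketch, not a proof: the length bound of 10 is arbitrary, and the Python patterns are translations of the slide notation (with "ε + 1" written as "1?").

```python
import re
from itertools import product

first = re.compile(r"0*(?:100*)*1?")     # 0*(100*)*(ε + 1)
second = re.compile(r"1?(?:00*1)*0*")    # (ε + 1)(00*1)*0*

differ = []
for n in range(11):                      # lengths 0..10
    for p in product("01", repeat=n):
        w = "".join(p)
        if bool(first.fullmatch(w)) != bool(second.fullmatch(w)):
            differ.append(w)

print(differ)   # [] : the two regexes agree on every string tested
```

To be fully convinced, argue directly: the first regex says every 1 except possibly the last is followed by a 0, and the second says every 1 except possibly the first is preceded by a 0.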

Home Exercise
N = {w ∈ {0,1}* | w represents a binary number divisible by 2}
Find a regex for language N.

Takeaway
For a regex to correctly represent a language L, it must match every string in L, and nothing else.
The regex is wrong if any of the following happens:
- There is a string in L that the regex does not match.
- There is a string that is not in L, but is matched by the regex.
The general steps of coming up with a regex:
- Observe and understand the pattern that needs to be matched, then make an educated attempt at a regex.
- Check whether the attempted regex is wrong (the two criteria above). If it is wrong, understand the reason and fix it.
- Repeat the above until you're convinced you have the right answer.

Next week
DFA: model regular expressions as a computation.