Regular Expressions and Finite State Automata

Regular Expressions and Finite State Automata: Themes
- Finite State Automata (FSA)
  - Describing patterns with graphs
  - Programs that keep track of state
- Regular Expressions (RE)
  - Describing patterns with regular expressions
  - Converting regular expressions to programs
- Theorems
  - The languages recognized by FSAs and the languages generated by REs are the same (the regular languages).
  - There are languages generated by grammars that are not regular.

Regular Expressions
Regular expressions describe (generate) regular languages.
- Patterns:
  - ε: the empty string
  - a: a literal character, which stands for itself
- Operations:
  - Concatenation: RS
  - Alternation: R | S
  - Closure (Kleene star): R*, the set of all strings that can be made by concatenating zero or more strings in R

Regular Expressions
In the algebra of regular expressions, an atomic operand is one of the following:
- A character x: L(x) = {x}
- The symbol ε: L(ε) = {ε}
- The symbol ∅: L(∅) = {}
- A variable whose value can be any pattern defined by a regular expression

Regular Expressions
There are three operators used to build regular expressions:
- Union R | S: L(R | S) = L(R) ∪ L(S)
- Concatenation RS: L(RS) = {rs : r ∈ L(R) and s ∈ L(S)}
- Closure R*: L(R*) = {ε} ∪ L(R) ∪ L(RR) ∪ L(RRR) ∪ …, i.e. all strings r1r2…rk with k ≥ 0 and each ri ∈ L(R)

RE Examples
- a | b* denotes {ε, a, b, bb, bbb, …}.
- (a | b)* denotes the set of all strings consisting of any number of a and b symbols, including the empty string.
- b*(ab*)* denotes the same set.
- ab*(c | ε) denotes the set of strings that start with a, continue with zero or more b's, and optionally end with a c.
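One way to try these examples is with a POSIX regular-expression library. The sketch below assumes a POSIX system providing <regex.h> (regcomp, regexec, regfree); re_matches is a hypothetical helper name, not part of the course material. In POSIX extended syntax alternation is written |, and the optional part (c | ε) is written c? because empty alternatives are not portable; ^ and $ anchor the pattern so that the whole string must be in the language.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the POSIX extended regular expression `pattern` matches `s`.
   Anchor the pattern with ^...$ so that the whole string must match.
   Returns false if the pattern does not compile. */
bool re_matches(const char *pattern, const char *s)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    /* With REG_NOSUB no match positions are requested, so nmatch is 0. */
    bool matched = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

For example, re_matches("^(a|b*)$", "bbb") and re_matches("^b*(ab*)*$", "abba") both return true, while re_matches("^(a|b*)$", "ab") returns false.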

Regular Expressions
- a | (ab)
- (a | (ab)) | (c | (bc))
- a*
- a*b*
- (ab)*
- a | bc*d
- letter = a | b | c | … | z | A | B | C | … | Z | _
- digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- letter(letter | digit)*
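The identifier pattern letter(letter | digit)* can also be recognized by hand. This is a minimal sketch, assuming ASCII characters and the default "C" locale (so isalpha accepts only a-z and A-Z); is_identifier and its helpers are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>

/* letter = a..z | A..Z | _ ;  digit = 0..9 */
static bool is_letter(char c) { return isalpha((unsigned char)c) || c == '_'; }
static bool is_digit(char c)  { return c >= '0' && c <= '9'; }

/* Recognizes letter(letter | digit)*: the first character must be a letter,
   and every following character must be a letter or a digit. */
bool is_identifier(const char *s)
{
    if (!is_letter(*s))
        return false;                        /* empty, or starts with a non-letter */
    for (s++; *s != '\0'; s++)
        if (!is_letter(*s) && !is_digit(*s))
            return false;                    /* outside (letter | digit)* */
    return true;
}
```

is_identifier("_tmp1") returns true; is_identifier("2x") returns false because the first character must be a letter.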

Finite State Machine
- Accepts or rejects a string
- Has a finite collection of states
- Has a single start state
- Moves from one state to another on each input character
- Accepts the string if it is in an accepting state at the end of the input

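FSM - Example
[Figure: finite state machines for a | b* and for (a | b)*. Edge labels are input characters; Λ denotes the set of all letters, and Λ-a,b denotes all letters except a and b.]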
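A finite state machine can be programmed directly from its transition table. The sketch below assumes a small fixed alphabet and table size (FSM_MAX_STATES and FSM_MAX_SYMBOLS are arbitrary limits), and uses -1 as a rejecting "dead" state in place of the Λ transitions; Fsm and fsm_accepts are hypothetical names. Every table entry the machine can reach must be filled in, since omitted entries default to state 0.

```c
#include <stdbool.h>

#define FSM_MAX_STATES 8
#define FSM_MAX_SYMBOLS 4

/* Transition table for a machine with a single start state.
   next[s][i] is the state reached from s on symbols[i]; -1 is a dead state. */
typedef struct {
    const char *symbols;                        /* the input alphabet, e.g. "ab" */
    int start;
    int next[FSM_MAX_STATES][FSM_MAX_SYMBOLS];
    bool accepting[FSM_MAX_STATES];
} Fsm;

/* Returns true if the machine is in an accepting state after reading all of input. */
bool fsm_accepts(const Fsm *m, const char *input)
{
    int state = m->start;
    for (const char *p = input; *p != '\0'; p++) {
        int col = -1;
        for (int i = 0; m->symbols[i] != '\0'; i++)
            if (m->symbols[i] == *p)
                col = i;                        /* table column for this character */
        if (col < 0)
            return false;                       /* character not in the alphabet */
        state = m->next[state][col];
        if (state < 0)
            return false;                       /* fell into the dead state */
    }
    return m->accepting[state];
}
```

For a | b*, a table over symbols "ab" with states 0 (start), 1 (after a) and 2 (after one or more b's), all accepting, and rows {1, 2}, {-1, -1}, {-1, 2}, accepts "bbb" and rejects "ab".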

Problem 4
Find all words that contain all of the vowels in alphabetical order.*
ab·ste·mi·ous adj: sparing in use of food or drink: temperate. ab·ste·mi·ous·ly adv. ab·ste·mi·ous·ness n.
(c)2000 Zane Publishing, Inc. and Merriam-Webster, Incorporated. All rights reserved.
*Note: this problem statement is a little ambiguous. Does aieiiaoeua also pass?

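Problem 4 Solution 1
A finite state automaton with start state S0 and accepting state S5 (Λ = set of all letters):
- S0 --a--> S1; S0 loops on Λ-a
- S1 --e--> S2; S1 loops on Λ-e
- S2 --i--> S3; S2 loops on Λ-i
- S3 --o--> S4; S3 loops on Λ-o
- S4 --u--> S5; S4 loops on Λ-u
- S5 loops on Λ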
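Solution 1 translates almost directly into code: the state number is how many of the vowels a, e, i, o, u have been seen so far in order. A minimal sketch, assuming lowercase ASCII words; vowels_in_order is a hypothetical name. Under this reading aieiiaoeua is accepted, since letters other than the next expected vowel are absorbed by the Λ-x self-loops.

```c
#include <stdbool.h>

/* Simulates the Problem 4 automaton: state k (0..5) means the first k vowels
   of "aeiou" have been seen in order. Any other letter leaves the state
   unchanged (the Λ-x self-loops); S5 is the only accepting state. */
bool vowels_in_order(const char *word)
{
    static const char vowels[] = "aeiou";
    int state = 0;                               /* start state S0 */
    for (const char *p = word; *p != '\0'; p++) {
        if (state < 5 && *p == vowels[state])
            state++;                             /* S_k --vowels[k]--> S_k+1 */
    }
    return state == 5;                           /* accept only in S5 */
}
```

vowels_in_order("facetious") returns true; vowels_in_order("education") returns false because no e follows its a.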

Problem 4 Solution 2
    $ grep '.*a.*e.*i.*o.*u.*' < /usr/dict/words
    adventitious
    facetious
    sacrilegious

Problem 5
Partial anagram: find all words that can be made from the letters in "Washington".
a, ago, ah, an, angst, …

Problem 5 Grammar
<Washington> → <w><a><s><h><i><n><g><t><o><n>
<w> → w | ε
<a> → a | ε
<s> → s | ε
…
<o> → o | ε
Note: this grammar only finds partial anagrams in which the characters keep their relative order in "Washington".
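Because every nonterminal <x> either keeps or drops one letter, this grammar generates exactly the subsequences of washington. A minimal sketch of the corresponding membership test, assuming lowercase input; is_subsequence is a hypothetical name.

```c
#include <stdbool.h>

/* Returns true if `word` is a subsequence of `source`, i.e. if it can be
   derived by choosing x or ε for each nonterminal <x> of the grammar built
   from `source`. The kept letters stay in their original relative order. */
bool is_subsequence(const char *word, const char *source)
{
    while (*word != '\0' && *source != '\0') {
        if (*word == *source)
            word++;          /* keep this letter (<x> → x) */
        source++;            /* move on to the next nonterminal either way */
    }
    return *word == '\0';    /* every letter of word was matched in order */
}
```

is_subsequence("ago", "washington") is true, but is_subsequence("angst", "washington") is false: angst uses the letters of Washington, but not in their original order.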

Generating Subsets
Let S = {a,b,c}. (Review the basic notions of set theory in Sec. 7.2 & 7.3.)
The power set of S, P(S), is the set of all subsets of S, including S itself and the empty set:
P(S) = {{a,b,c}, {a,b}, {a,c}, {a}, {b,c}, {b}, {c}, {}}

Recursive Program to Generate P(S)
    PowerSet(S)
        if S = {} return {{}};
        else
            S' = PowerSet(S \ {First(S)});                 // subsets that omit First(S)
            S'' = S';
            for s in S' do S'' = S'' ∪ {{First(S)} ∪ s};   // add First(S) to each
            return S'';
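The same recursion can be written in C if a subset of an n-element set is encoded as a bitmask (bit i set means element i is present). This is a sketch under that assumption: power_set and power_set_from are hypothetical names, element `first` plays the role of First(S), and the caller must supply room for 2^n results (so n must be small).

```c
#include <stddef.h>

/* PowerSet of the elements first..n-1, written to out[]; returns the count. */
static size_t power_set_from(int first, int n, unsigned out[])
{
    if (first == n) {            /* S = {} */
        out[0] = 0u;             /* return {{}} */
        return 1;
    }
    /* S' = PowerSet(S \ {First(S)}) goes into the front of out[] */
    size_t count = power_set_from(first + 1, n, out);
    /* S'' = S' plus {First(S)} ∪ s for every s in S' */
    for (size_t i = 0; i < count; i++)
        out[count + i] = out[i] | (1u << first);
    return 2 * count;
}

/* Stores all 2^n subsets of {0, ..., n-1} as bitmasks; returns 2^n. */
size_t power_set(int n, unsigned out[])
{
    return power_set_from(0, n, out);
}
```

For S = {a,b,c} with a, b, c as bits 1, 2, 4, power_set(3, out) fills out[] with 0, 4, 2, 6, 1, 5, 3, 7, i.e. {}, {c}, {b}, {b,c}, {a}, {a,c}, {a,b}, {a,b,c}.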

Generating Permutations
Let S = [a,b,c]. The permutations of S are:
[a,b,c], [a,c,b], [b,a,c], [b,c,a], [c,a,b], [c,b,a]

Recursive Program to Generate Perm(L), L = [a1, …, an]
    Perm(L)
        if Length(L) = 1 return {L};
        else
            S = {};
            for a in L do
                S' = Perm(L / a);                  // delete a from L
                for s in S' do S = S ∪ {[a, s]};   // put a in front of s
            return S;
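A C version of Perm(L) for a list of distinct characters. Instead of building a new list L / a, this sketch marks elements as used: each call picks the next unused element a, puts it in front of the permutations of the remaining elements, and undoes the choice afterwards. The limit PERM_MAX and the names permutations and perm_rec are assumptions of the sketch. The recursion stops when no elements remain, which is equivalent to the Length(L) = 1 base case.

```c
#include <stddef.h>
#include <string.h>

#define PERM_MAX 8   /* assumed limit on the number of items (8! = 40320) */

/* Extends prefix[0..depth-1] with every arrangement of the unused items and
   appends each completed permutation to out[]; returns the new count. */
static size_t perm_rec(const char *items, size_t n, int used[], char prefix[],
                       size_t depth, char out[][PERM_MAX + 1], size_t count)
{
    if (depth == n) {                     /* no elements left to place */
        prefix[n] = '\0';
        strcpy(out[count], prefix);
        return count + 1;
    }
    for (size_t i = 0; i < n; i++) {
        if (used[i])
            continue;                     /* already placed in this prefix */
        used[i] = 1;                      /* "delete a from L" */
        prefix[depth] = items[i];         /* a goes in front of the rest */
        count = perm_rec(items, n, used, prefix, depth + 1, out, count);
        used[i] = 0;                      /* put a back for the next choice */
    }
    return count;
}

/* Stores all permutations of `items` in out[]; returns how many there are. */
size_t permutations(const char *items, char out[][PERM_MAX + 1])
{
    size_t n = strlen(items);
    int used[PERM_MAX] = {0};
    char prefix[PERM_MAX + 1];
    if (n > PERM_MAX)
        return 0;
    return perm_rec(items, n, used, prefix, 0, out, 0);
}
```

permutations("abc", out) returns 6 and stores them in the order listed for S = [a,b,c]: abc, acb, bac, bca, cab, cba.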

Alternate Approach
Instead of generating all possibilities and checking whether each result is a word, check each dictionary word to see whether it is a partial anagram. To check a word:
- make sure it contains only the right letters, and
- make sure each letter occurs an allowable number of times.
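A minimal sketch of the alternate approach, assuming ASCII words and the default "C" locale: count how many times each letter is available in the source word, then spend one for each letter of the candidate. is_partial_anagram is a hypothetical name; case is ignored.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if `word` uses only letters of `source`, and no letter more
   often than it occurs in `source`. */
bool is_partial_anagram(const char *word, const char *source)
{
    int available[26] = {0};
    for (const char *p = source; *p != '\0'; p++)
        if (isalpha((unsigned char)*p))
            available[tolower((unsigned char)*p) - 'a']++;
    for (const char *p = word; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p))
            return false;                     /* not a letter at all */
        int k = tolower((unsigned char)*p) - 'a';
        if (--available[k] < 0)
            return false;                     /* letter used too many times */
    }
    return true;
}
```

is_partial_anagram("angst", "Washington") is true, while is_partial_anagram("noon", "Washington") is false because Washington has only one o.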

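Problem 5 - Solution 1
[Figure: two automata. "Check Letters": with S = {a,g,h,i,n,o,s,t,w}, state S0 loops on the letters in S and moves to S1 on \n. "Filter Double a's": S0 loops on Λ-a and moves on a, and a second a leads to a further state, so words containing two a's are detected.]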

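Problem 5 - Solution 1, cont.
[Figure: the filter continued, with one branch from S0 per letter that detects a repeated occurrence, e.g. S0 --a--> S1 --a--> S2, S0 --g--> S3 --g--> S4, …, S0 --w--> S18 --w--> S19; the middle state of each branch loops on Λ minus that letter, and S0 and the final states loop on Λ.]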

Problem 5 Solution 2
    $ tr A-Z a-z < /usr/dict/words | \
        egrep '^[aghinostw]*$' | \
        egrep -v 'a.*a|g.*g|h.*h|i.*i|n.*n.*n|o.*o|s.*s|t.*t|w.*w'
    a
    ago
    ah
    an
    angst

State Machines and Automata
- A finite set of states, a start state, and a set of accepting states
- Transitions from state to state depending on the next input character
- The language accepted by a finite automaton is the set of input strings that end up in an accepting state

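Problem 6
Create a finite state automaton that accepts strings of a's and b's with an even number of a's.
[Figure: start state S0, which is accepting, and state S1; each state loops on b, and a moves from S0 to S1 and from S1 back to S0.]
States visited on input abbbabaabbb, starting in S0 (0 = S0, 1 = S1): 011110010000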

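Problem 6
Program to implement the FSA:
    #include <stdio.h>
    #include <stdbool.h>

    #define ENDM EOF   /* end-of-input marker; assumed to be EOF here */

    /* Returns true if standard input holds only a's and b's
       with an even number of a's. */
    bool EA(void)
    {
        int x;
    S0: x = getchar();
        if (x == 'b') goto S0;
        if (x == 'a') goto S1;
        if (x == ENDM) return true;    /* input ends in S0: accept */
        return false;                  /* any other character: reject */
    S1: x = getchar();
        if (x == 'b') goto S1;
        if (x == 'a') goto S0;
        if (x == ENDM) return false;   /* input ends in S1: reject */
        return false;                  /* any other character: reject */
    }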

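Problem 7
Create a regular expression for the language that consists of strings of a's and b's with an even number of a's.
[Figure: the automaton from Problem 6.]
b*(ab*ab*)*, or equivalently (b | ab*a)*: from S0, either loop on b, or go to S1 on a, loop there on b's, and return to S0 on a.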
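A regular expression for this language is easy to get subtly wrong, so it helps to test it exhaustively on short strings. The sketch below assumes POSIX <regex.h> and compares a pattern against a direct count of a's for every string over {a,b} up to a given length; regex_matches_even_a is a hypothetical name.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* For every string over {a,b} of length <= max_len, checks that the POSIX
   extended regex `pattern` matches it exactly when it has an even number
   of a's. Returns true if the two always agree. */
bool regex_matches_even_a(const char *pattern, int max_len)
{
    regex_t re;
    char s[32];
    bool agree = true;
    if (max_len < 0 || max_len > 20)          /* keep the enumeration small */
        return false;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    for (int len = 0; len <= max_len && agree; len++) {
        /* bit i of `bits` chooses 'a' (1) or 'b' (0) for position i */
        for (unsigned long bits = 0; bits < (1UL << len) && agree; bits++) {
            int a_count = 0;
            for (int i = 0; i < len; i++) {
                s[i] = ((bits >> i) & 1UL) ? 'a' : 'b';
                if (s[i] == 'a')
                    a_count++;
            }
            s[len] = '\0';
            bool matched = regexec(&re, s, 0, NULL, 0) == 0;
            if (matched != (a_count % 2 == 0))
                agree = false;                /* found a counterexample */
        }
    }
    regfree(&re);
    return agree;
}
```

Both ^b*(ab*ab*)*$ and ^(b|ab*a)*$ agree with the even-a count on every string up to length 10.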

Problem 8
Create a grammar that generates the language that consists of strings of a's and b's with an even number of a's.
[Figure: the automaton from Problem 6.]
<S0> → b<S0> | a<S1> | ε
<S1> → b<S1> | a<S0>
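The grammar can be turned into a recursive recognizer with one function per syntactic category and one branch per production. Because the productions for each category start with different characters, the next input character always determines which production to use. derive_S0, derive_S1 and even_number_of_as are hypothetical names.

```c
#include <stdbool.h>

static bool derive_S1(const char *s);

/* <S0> → b<S0> | a<S1> | ε */
static bool derive_S0(const char *s)
{
    if (*s == 'b') return derive_S0(s + 1);
    if (*s == 'a') return derive_S1(s + 1);
    return *s == '\0';                  /* ε: only if the input is used up */
}

/* <S1> → b<S1> | a<S0>   (no ε-production) */
static bool derive_S1(const char *s)
{
    if (*s == 'b') return derive_S1(s + 1);
    if (*s == 'a') return derive_S0(s + 1);
    return false;
}

/* True if s can be derived from <S0>. */
bool even_number_of_as(const char *s)
{
    return derive_S0(s);
}
```

even_number_of_as("abba") returns true and even_number_of_as("ab") returns false.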

Equivalence of Regular Expressions and Finite Automata
The languages accepted by finite automata are exactly the languages generated by regular expressions:
- Given any regular expression R, there exists a finite state automaton M such that L(M) = L(R); see Problems 9 and 10 for an indication of why this is true.
- Given any finite state automaton M, there exists a regular expression R such that L(R) = L(M); see Problem 7 for an indication of why this is true.

Proof of Equivalence of Regular Expressions and Finite Automata
Sec. 10.8 of the text proves that there is a finite state automaton that recognizes the language generated by any given regular expression. The proof is by induction on the number of operators in the regular expression and uses a finite state automaton with ε-transitions. Epsilon transitions are introduced to simplify the construction used in the proof. It is then shown that any finite state automaton with ε-transitions can be converted to an ordinary finite state automaton, one without ε-transitions.

Proof of Equivalence of Regular Expressions and Finite Automata
Sec. 10.9 of the text shows how to derive a regular expression that generates the same language that is accepted by a given finite state automaton. The basic idea is to combine the transitions at each node along all paths that lead to an accepting state, labeling the combined transitions with regular expressions that describe the strings read along those paths. See Problem 7 for an example.

Proof of Equivalence of Regular Expressions and Finite Automata
The proofs given in Sections 10.8 and 10.9 are constructive: one algorithm constructs a finite state automaton from a regular expression, and another derives a regular expression from a finite state automaton. This means the conversions can be implemented. In fact, regular expressions are commonly used to describe patterns, and a program that matches a pattern is created by converting its regular expression into a finite state automaton.

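Finite State Automata from Regular Expressions
- Base case a: a start state with a single transition on a to an accepting state.
- Union: given REs R1 and R2, add a new start state with ε-transitions to the start states of the automata for R1 and R2, and ε-transitions from their accepting states to a new accepting state.
- For any machine with more than one accepting state, we can add a new, single accepting state and add ε-transitions to it from the old accepting states.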

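FSA from REs (cont.)
- Concatenation R1R2: add an ε-transition from the accepting state of the automaton for R1 to the start state of the automaton for R2.
- Closure R1*: add ε-transitions that let the automaton for R1 be skipped entirely (so ε is accepted) or repeated (from its accepting state back to its start state).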
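The base case, union, concatenation and closure constructions can be coded as functions that combine machine fragments, each with one start and one accepting state, and the result can be run by tracking the set of states the automaton could be in. This is a Thompson-style sketch; the exact wiring (for example of closure) may differ from the text's figures. The names re_sym, re_alt, re_cat, re_star and nfa_accepts are hypothetical, and the sketch assumes at most MAX_STATES states and that each fragment is used exactly once, so no state needs more than two outgoing arcs.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_STATES 128
#define EPS (-1)                        /* label of an ε-transition */

typedef struct { int label[2]; int to[2]; int n; } State;
typedef struct { int start; int accept; } Frag;  /* one start, one accept */

static State st[MAX_STATES];
static int nstates = 0;

static int new_state(void) { st[nstates].n = 0; return nstates++; }

static void add_arc(int from, int label, int to)
{
    State *s = &st[from];
    s->label[s->n] = label;
    s->to[s->n] = to;
    s->n++;
}

static Frag make_frag(void)
{
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    return f;
}

/* Base case a: start --a--> accept */
Frag re_sym(char c)
{
    Frag f = make_frag();
    add_arc(f.start, (unsigned char)c, f.accept);
    return f;
}

/* Union R1 | R2: ε into both machines, ε out of both into a new accept */
Frag re_alt(Frag r1, Frag r2)
{
    Frag f = make_frag();
    add_arc(f.start, EPS, r1.start);
    add_arc(f.start, EPS, r2.start);
    add_arc(r1.accept, EPS, f.accept);
    add_arc(r2.accept, EPS, f.accept);
    return f;
}

/* Concatenation R1R2: ε from R1's accept to R2's start */
Frag re_cat(Frag r1, Frag r2)
{
    add_arc(r1.accept, EPS, r2.start);
    Frag f = { r1.start, r2.accept };
    return f;
}

/* Closure R1*: ε-arcs to skip R1 entirely or to go around it again */
Frag re_star(Frag r1)
{
    Frag f = make_frag();
    add_arc(f.start, EPS, r1.start);
    add_arc(f.start, EPS, f.accept);    /* zero repetitions: accept ε */
    add_arc(r1.accept, EPS, r1.start);  /* repeat R1 */
    add_arc(r1.accept, EPS, f.accept);
    return f;
}

/* Adds s and every state reachable from s by ε-transitions to set[] */
static void add_closure(bool set[], int s)
{
    if (set[s]) return;
    set[s] = true;
    for (int i = 0; i < st[s].n; i++)
        if (st[s].label[i] == EPS)
            add_closure(set, st[s].to[i]);
}

/* Runs the ε-automaton on input; accepts if its accept state is reachable */
bool nfa_accepts(Frag m, const char *input)
{
    bool cur[MAX_STATES] = { false };
    bool next[MAX_STATES];
    add_closure(cur, m.start);
    for (const char *p = input; *p != '\0'; p++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < nstates; s++)
            if (cur[s])
                for (int i = 0; i < st[s].n; i++)
                    if (st[s].label[i] == (unsigned char)*p)
                        add_closure(next, st[s].to[i]);
        memcpy(cur, next, sizeof cur);
    }
    return cur[m.accept];
}
```

For example, re_alt(re_sym('a'), re_cat(re_sym('b'), re_sym('c'))) builds a machine for (a | bc) that accepts "a" and "bc" and rejects "b"; its state numbers will not match the S0 to S6 labels of a hand-drawn diagram.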

Problem 9
Construct a finite state automaton with ε-transitions that accepts the language generated by the regular expression (a | bc).
Start state S0, accepting state S6:
- S0 --ε--> S1 --a--> S2 --ε--> S6
- S0 --ε--> S3 --b--> S4 --c--> S5 --ε--> S6

Problem 10
Find a finite state automaton equivalent to the one in Problem 9 that does not use ε-transitions.
Start state S1, accepting states S2 and S5:
- S1 --a--> S2
- S1 --b--> S4 --c--> S5

Grammars and Regular Expressions
- Given a regular expression R, there exists a grammar with syntactic category <S> such that L(R) = L(<S>).
- There are grammars for which there does NOT exist a regular expression R with L(<S>) = L(R). For example:
  <S> → a<S>b | ε,   L(<S>) = {a^n b^n : n = 0, 1, 2, …}
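By contrast with the finite automata above, recognizing a^n b^n needs memory that grows with n. The sketch below, with the hypothetical name is_anbn, uses a counter; an unsigned long stands in for an unbounded counter, which is exactly what a finite state automaton does not have.

```c
#include <stdbool.h>

/* Recognizes {a^n b^n : n >= 0}: count the leading a's, then cancel one
   count per b. Accepts only if the input ends exactly when the count hits 0. */
bool is_anbn(const char *s)
{
    unsigned long n = 0;
    while (*s == 'a') { n++; s++; }            /* count the a's */
    while (*s == 'b' && n > 0) { n--; s++; }   /* match each b with an a */
    return *s == '\0' && n == 0;
}
```

is_anbn("aabb") returns true; is_anbn("aab") and is_anbn("abab") return false.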

Proof that a^n b^n is not Recognized by a Finite State Automaton
The proof is a proof by contradiction. In this type of proof, we assume that something is true and then show that this leads to a contradiction (something that is false). The only way out of this situation is that the assumption was wrong, so what we assumed true is in fact false. To show that there is no finite state automaton that recognizes the language L = {a^n b^n : n = 0, 1, 2, …}, we assume that there is a finite state automaton M that recognizes L and show that this leads to a contradiction.

Proof that a^n b^n is not Recognized by a Finite State Automaton
Since M is a finite state automaton, it has a finite number of states; let that number be m. Since M recognizes L, every string of the form a^k b^k must end up in an accepting state. Choose such a string with k = n, where n > m. While M reads a^n, it is in some state after each of the prefixes a^0, a^1, …, a^m. These are m + 1 prefixes but there are only m states, so some state s must be visited twice.

Proof that a^n b^n is not Recognized by a Finite State Automaton
Suppose that state s is reached after reading a^j and also after reading a^k, with j ≠ k. Since the same state is reached for both strings, and M continues identically from s on whatever input follows, the machine cannot distinguish strings that begin with a^j from strings that begin with a^k. Therefore, M must either accept both or reject both of the strings a^j b^j and a^k b^j. However, a^j b^j should be accepted, while a^k b^j should not be. The only way out of this contradiction is that the assumption that there is a finite state automaton recognizing L was wrong.
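The pigeonhole step of the proof can be run on any concrete automaton. The sketch below assumes a DFA over {a,b} stored as a transition table of at most MAX_DFA_STATES states; Dfa, find_repeat and dfa_accepts_ab are hypothetical names. find_repeat reads a^0, a^1, …, a^m and returns the first j < k that reach the same state; dfa_accepts_ab then necessarily gives the same answer for a^j b^j and a^k b^j, so this DFA cannot accept exactly {a^n b^n}.

```c
#include <stdbool.h>

#define MAX_DFA_STATES 16

/* DFA over {a,b}: delta[s][0] is the move on 'a', delta[s][1] on 'b'. */
typedef struct {
    int m;                                   /* number of states */
    int start;
    int delta[MAX_DFA_STATES][2];
    bool accepting[MAX_DFA_STATES];
} Dfa;

/* Among the m + 1 prefixes a^0 .. a^m, two must reach the same state.
   Stores the first such pair j < k and returns true. */
bool find_repeat(const Dfa *d, int *j, int *k)
{
    int first_seen[MAX_DFA_STATES];
    for (int s = 0; s < d->m; s++)
        first_seen[s] = -1;
    int state = d->start;
    for (int i = 0; i <= d->m; i++) {
        if (first_seen[state] >= 0) {        /* state s visited twice */
            *j = first_seen[state];
            *k = i;
            return true;
        }
        first_seen[state] = i;
        state = d->delta[state][0];          /* read one more a */
    }
    return false;                            /* impossible for a valid table */
}

/* Runs the DFA on a^x b^y. */
bool dfa_accepts_ab(const Dfa *d, int x, int y)
{
    int state = d->start;
    for (int i = 0; i < x; i++) state = d->delta[state][0];
    for (int i = 0; i < y; i++) state = d->delta[state][1];
    return d->accepting[state];
}
```

For a 3-state DFA that cycles 0 → 1 → 2 → 0 on a and loops on b, find_repeat reports j = 0 and k = 3, and a^0 b^0 and a^3 b^0 receive the same verdict.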