Regular Expressions. Seungjin Choi

Similar documents
6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, Class 4 Nancy Lynch

C H A P T E R Regular Expressions regular expression

Scanner. tokens scanner parser IR. source code. errors

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

CMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013

Regular Languages and Finite State Machines

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions

Automata and Formal Languages

Reading 13 : Finite State Automata and Regular Expressions

THEORY of COMPUTATION

Introduction to Finite Automata

Compiler Construction

Automata and Computability. Solutions to Exercises

Automata Theory. Şubat 2006 Tuğrul Yılmaz Ankara Üniversitesi

Finite Automata. Reading: Chapter 2

Pushdown Automata. place the input head on the leftmost input symbol. while symbol read = b and pile contains discs advance head remove disc from pile

ASSIGNMENT ONE SOLUTIONS MATH 4805 / COMP 4805 / MATH 5605

Lecture 18 Regular Expressions

CSC4510 AUTOMATA 2.1 Finite Automata: Examples and D efinitions Definitions

Finite Automata. Reading: Chapter 2

Lecture 2: Regular Languages [Fa 14]

Automata on Infinite Words and Trees

Pushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, School of Informatics University of Edinburgh als@inf.ed.ac.

Fundamentele Informatica II

Regular Expressions and Automata using Haskell

The Halting Problem is Undecidable

Lights and Darks of the Star-Free Star

Deterministic Finite Automata

Introduction to Automata Theory. Reading: Chapter 1

Arkansas Tech University MATH 4033: Elementary Modern Algebra Dr. Marcel B. Finan

Regular Languages and Finite Automata

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

CS5236 Advanced Automata Theory

3515ICT Theory of Computation Turing Machines

Finite Automata and Regular Languages

26 Integers: Multiplication, Division, and Order

6.080/6.089 GITCS Feb 12, Lecture 3

Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology

Informatique Fondamentale IMA S8

MACM 101 Discrete Mathematics I

Overview of E0222: Automata and Computability

Properties of Real Numbers

Compiler I: Syntax Analysis Human Thought

Regular Languages and Finite Automata

NFAs with Tagged Transitions, their Conversion to Deterministic Automata and Application to Regular Expressions

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

ω-automata Automata that accept (or reject) words of infinite length. Languages of infinite words appear:

Automata, languages, and grammars

Composability of Infinite-State Activity Automata*

Section 1.1 Real Numbers

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

Lecture 2 Matrix Operations

ÖVNINGSUPPGIFTER I SAMMANHANGSFRIA SPRÅK. 15 april Master Edition

Genetic programming with regular expressions

CS 3719 (Theory of Computation and Algorithms) Lecture 4

Matrix Algebra. Some Basic Matrix Laws. Before reading the text or the following notes glance at the following list of basic matrix algebra laws.

4.5 Finite Mathematical Systems

PRIMARY CONTENT MODULE Algebra I -Linear Equations & Inequalities T-71. Applications. F = mc + b.

Lecture I FINITE AUTOMATA

Basics of Compiler Design

24 Uses of Turing Machines

So let us begin our quest to find the holy grail of real analysis.

Math Workshop October 2010 Fractions and Repeating Decimals

Omega Automata: Minimization and Learning 1

Converting Finite Automata to Regular Expressions

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

A Second Course in Mathematics Concepts for Elementary Teachers: Theory, Problems, and Solutions

Web Data Extraction: 1 o Semestre 2007/2008

Linear Algebra. A vector space (over R) is an ordered quadruple. such that V is a set; 0 V ; and the following eight axioms hold:

Math 312 Homework 1 Solutions

Abstract Algebra Cheat Sheet

Boolean Algebra Part 1

CM2202: Scientific Computing and Multimedia Applications General Maths: 2. Algebra - Factorisation

Learning Analysis by Reduction from Positive Data

Class One: Degree Sequences

Theory of Computation Class Notes 1

The Optimum One-Pass Strategy for Juliet

CSE 135: Introduction to Theory of Computation Decidability and Recognizability

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

Boolean Algebra. Boolean Algebra. Boolean Algebra. Boolean Algebra

Formal Deænition of Finite Automaton. 1. Finite set of states, typically Q. 2. Alphabet of input symbols, typically æ.

ML-Flex Implementation Notes

A.2. Exponents and Radicals. Integer Exponents. What you should learn. Exponential Notation. Why you should learn it. Properties of Exponents

Lexical analysis FORMAL LANGUAGES AND COMPILERS. Floriano Scioscia. Formal Languages and Compilers A.Y. 2015/2016

A Little Set Theory (Never Hurt Anybody)

Elements of Abstract Group Theory

Answer Key for California State Standards: Algebra I

2.0 Chapter Overview. 2.1 Boolean Algebra

IV. ALGEBRAIC CONCEPTS

LINEAR DIFFERENTIAL EQUATIONS IN DISTRIBUTIONS

4. FIRST STEPS IN THE THEORY 4.1. A

Continued Fractions and the Euclidean Algorithm


Learning Automata and Grammars

Properties of Stabilizing Computations

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

Finite dimensional C -algebras

Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

INCIDENCE-BETWEENNESS GEOMETRY

Transcription:

Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 27

Outline Regular expressions: One type of language-defining notation Regular expressions and finite automata Regular grammars (will be discussed in Lecture 4) 2 / 27

Operators Union: Concatenation: L M = {w w L or w M}. LM = {w w = xy, x L, y M}. Powers: L 0 = {ɛ}, L 1 = L, L n+1 = LL n. Kleene Closure (star-closure): L = L 0 L 1 L 2 = i=0l i. 3 / 27

i Question: What are 0, i,? 0 = {ɛ}. i, i 1 is empty since we cannot select any strings from the empty set. = {ɛ}. 4 / 27

Regular Expressions A FA (DFA or NFA) is a blueprint for constructing a machine recognizing a regular language. (machine-like description) A regular expression is a user-friendly declarative way of describing a regular language. (algebraic description) Involves a combination of: strings of symbols from Σ parentheses operators (+,, ). (union, concatenation, star-closure) 5 / 27

Examples {a, b, c} (regular language) a + b + c (regular expression). (a + b c) represents the star-closure of {a} {bc}, which is {ɛ, a, bc, abc, bca, bcbc, aaa,...}. 01 + 10 : The language consisting of all strings that are either a single 0 followed by any number of 1 s or a single 1 followed by any number of 0 s. UNIX grep command UNIX Lex (lexical analyzer generator) and Flex (fast lex) tools 6 / 27

Inductive Definition of Regular Expressions Definition Let Σ be a given alphabet. Then 1., ɛ, and a Σ are all regular expressions. These are called primitive regular expressions. 2. If r 1 and r 2 are regular expressions, so are r 1 + r 2, r 1 r 2, r 1, (r 1). 3. A string is a regular expression if and only if it can be derived from primitive regular expressions by a finite number of applications of the rules described above. 7 / 27

Language L(r) Definition The language L(r) denoted by any regular expression r, is defined by the following rules: 1. is a regular expression denoting the empty set. 2. ɛ is a regular expression denoting {ɛ}. 3. For every a Σ, a is a regular expression denoting {a}. 4. L(r 1 + r 2 ) = L(r 1 ) L(r 2 ). 5. L(r 1 r 2 ) = L(r 1 )L(r 2 ). 6. L((r 1 )) = L(r 1 ). 7. L(r 1 ) = (L(r 1)). 8 / 27

Example Exhibit the language L (a (a + b)) in set notation. Solution. L (a (a + b)) = L(a )L(a + b) = (L(a)) (L(a) L(b)) = {ɛ, a, aa, aaa,...} {a, b} = {a, aa, aaa,..., b, ab, aab,...}. 9 / 27

Another Example Given Σ = {a, b} and r = (a + b) (a + bb), find L(r). L(r) = L ((a + b) ) L(a + bb) = (L(a + b)) (L(a) L(bb)) = (L(a) L(b)) ( L(a) L 2 (b) ) = {a, b} ({a} {bb}) = {a, b} {a, bb} = {ɛ, a, b, aa, ab, ba, bb, aaa, aab,...}{a, bb} = {a, bb, aa, abb, ba, bbb,...}. (a + b) represents any string of a s and b s. {a} = {ɛ, a, aa,...}. 10 / 27

More Examples 1. Given r = (aa) (bb) b, determine L(r). The set of strings with an even number of a s followed by an odd number of b s, i.e., L(r) = {a 2n b 2m+1 n 0, m 0}. 2. For Σ = {0, 1}, give a regular expression r such that L(r) = {w Σ w has at least one pair of consecutive zeros}. r = (0 + 1) 00(0 + 1). 3. Find a regular expression for L = {w {0, 1} w has no pair of consecutive zeros}. r = (1 011 ) (0 + ɛ) + 1 (0 + ɛ). Alternatively, we see L as the }{{}}{{} ending in 0 all 1 s repetitions of the strings 1 and 01, i.e., r = (1 + 01) (0 + ɛ). 11 / 27

Connection between Regular Expressions and Regular Languages For every regular language, there is a regular expression and vice versa. Theorem Let r be a regular expression. Then, there exists some NFA that accepts L(r). Consequently L(r) is a regular language. Proof is shown in the next several slides! 12 / 27

Proof We begin with automata that accept languages for simple regular expressions, ɛ, and a Σ. q 0 q 1 NFA accepts e q 0 q 1 NFA accepts ɛ a q 0 q 1 NFA accepts a M(r) NFA accepts L(r) 13 / 27

ǫ ǫ M(r 1 ) M(r 2 ) ǫ ǫ Automaton for L(r 1 + r 2 ) ǫ M(r 1 ) ǫ M(r 2 ) ǫ Automaton for L(r 1 r 2 ) 14 / 27

ǫ ǫ M(r 1 ) ǫ ǫ Automaton for L(r1 ) W.l.g we consider a NFA with a single final state. Just like building automata for L(r 1 + r 2 ), L(r 1 r 2 ), L(r1 ), we can build automata for arbitrary regular expressions. 15 / 27

Example Find an NFA which accepts L(r) where r = (a + bb) (ba + ɛ). 16 / 27

Generalized Transition Graph Generalized transition graph is a transition graph whose edges are labeled with regular expressions. a c a + b L (a + a (a + b)c ) 17 / 27

Remove State q d e c q i q q j a b ae d ce d ce b q i q j ae b After removing state q 18 / 27

Canonical Form r 1 r 3 r 4 q i q j r 2 Associated regular expression is r = r 1 r 2 (r 4 + r 3 r 1 r 2). 19 / 27

Regular Language and Regular Expression Theorem Let L be a regular language. Then there exists a regular expression r such that L = L(r). Proof. Let M be an NFA that accepts L. W.l.g. we can assume that M has only one final state and that q 0 / F. We interpret the graph M as a generalized transition graph and apply the construction (illustrated in previous slides) to it. We use the method of removing state q. We continue this process, removing one state after the other, until we reach the canonical form. Then the regular expression is of the form in the previous slide. 20 / 27

Example Find a regular expression for L = {w {a, b} n a (w) is even and n b (w) is odd}. The associate DFA is given by a EE a OE b b a b b EO OO a 21 / 27

The canonical graph is of the form aa + ab(bb) ba b + a(bb) ba a(bb) a EE EO b + ab(bb) a 22 / 27

Algebraic Laws: Commutativity and Associativity Commutative law for union: r 1 + r 2 = r 2 + r 1 Commutative law for concatenation: r 1 r 2 r 2 r 1 Associative law for union: (r 1 + r 2 ) + r 3 = r 1 + (r 2 + r 3 ) Associative law for concatenation: (r 1 r 2 )r 3 = r 1 (r 2 r 3 ) 23 / 27

Algebraic Laws: Identities and Annihilators Identities ( and ɛ) + r = r + = r ɛ r = r ɛ = r Annihilator ( ) r = r = 24 / 27

Algebraic Laws: Distributive Laws Left distributive lat of concatenation over union r 1 (r 2 + r 3 ) = r 1 r 2 + r 1 r 3 Right distributive law of concatenation over union (r 2 + r 3 )r 1 = r 2 r 1 + r 3 r 1 25 / 27

Algebraic Laws: Idempotent Law Definition An operator is said to be idempotent if the result of applying it to two of the same values as arguments is that that value. Common arithmetic operators are not idempotent, i.e., x + x x, x x x. In regular expressions, idempotent law for union r + r = r 26 / 27

Algebraic Laws: Laws Involving Closures (r ) = r = ɛ ɛ = ɛ r + = rr = r r r = r + + ɛ 27 / 27