Regular Expressions. Seungjin Choi

Regular Expressions Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 27

Outline Regular expressions: One type of language-defining notation Regular expressions and finite automata Regular grammars (will be discussed in Lecture 4) 2 / 27

Operators Union: Concatenation: L M = {w w L or w M}. LM = {w w = xy, x L, y M}. Powers: L 0 = {ɛ}, L 1 = L, L n+1 = LL n. Kleene Closure (star-closure): L = L 0 L 1 L 2 = i=0l i. 3 / 27

i Question: What are 0, i,? 0 = {ɛ}. i, i 1 is empty since we cannot select any strings from the empty set. = {ɛ}. 4 / 27

Regular Expressions A FA (DFA or NFA) is a blueprint for constructing a machine recognizing a regular language. (machine-like description) A regular expression is a user-friendly declarative way of describing a regular language. (algebraic description) Involves a combination of: strings of symbols from Σ parentheses operators (+,, ). (union, concatenation, star-closure) 5 / 27

Examples {a, b, c} (regular language) a + b + c (regular expression). (a + b c) represents the star-closure of {a} {bc}, which is {ɛ, a, bc, abc, bca, bcbc, aaa,...}. 01 + 10 : The language consisting of all strings that are either a single 0 followed by any number of 1 s or a single 1 followed by any number of 0 s. UNIX grep command UNIX Lex (lexical analyzer generator) and Flex (fast lex) tools 6 / 27

Inductive Definition of Regular Expressions Definition Let Σ be a given alphabet. Then 1., ɛ, and a Σ are all regular expressions. These are called primitive regular expressions. 2. If r 1 and r 2 are regular expressions, so are r 1 + r 2, r 1 r 2, r 1, (r 1). 3. A string is a regular expression if and only if it can be derived from primitive regular expressions by a finite number of applications of the rules described above. 7 / 27

Language L(r) Definition The language L(r) denoted by any regular expression r, is defined by the following rules: 1. is a regular expression denoting the empty set. 2. ɛ is a regular expression denoting {ɛ}. 3. For every a Σ, a is a regular expression denoting {a}. 4. L(r 1 + r 2 ) = L(r 1 ) L(r 2 ). 5. L(r 1 r 2 ) = L(r 1 )L(r 2 ). 6. L((r 1 )) = L(r 1 ). 7. L(r 1 ) = (L(r 1)). 8 / 27

Example Exhibit the language L (a (a + b)) in set notation. Solution. L (a (a + b)) = L(a )L(a + b) = (L(a)) (L(a) L(b)) = {ɛ, a, aa, aaa,...} {a, b} = {a, aa, aaa,..., b, ab, aab,...}. 9 / 27

Another Example Given Σ = {a, b} and r = (a + b) (a + bb), find L(r). L(r) = L ((a + b) ) L(a + bb) = (L(a + b)) (L(a) L(bb)) = (L(a) L(b)) ( L(a) L 2 (b) ) = {a, b} ({a} {bb}) = {a, b} {a, bb} = {ɛ, a, b, aa, ab, ba, bb, aaa, aab,...}{a, bb} = {a, bb, aa, abb, ba, bbb,...}. (a + b) represents any string of a s and b s. {a} = {ɛ, a, aa,...}. 10 / 27

More Examples 1. Given r = (aa) (bb) b, determine L(r). The set of strings with an even number of a s followed by an odd number of b s, i.e., L(r) = {a 2n b 2m+1 n 0, m 0}. 2. For Σ = {0, 1}, give a regular expression r such that L(r) = {w Σ w has at least one pair of consecutive zeros}. r = (0 + 1) 00(0 + 1). 3. Find a regular expression for L = {w {0, 1} w has no pair of consecutive zeros}. r = (1 011 ) (0 + ɛ) + 1 (0 + ɛ). Alternatively, we see L as the }{{}}{{} ending in 0 all 1 s repetitions of the strings 1 and 01, i.e., r = (1 + 01) (0 + ɛ). 11 / 27

Connection between Regular Expressions and Regular Languages For every regular language, there is a regular expression and vice versa. Theorem Let r be a regular expression. Then, there exists some NFA that accepts L(r). Consequently L(r) is a regular language. Proof is shown in the next several slides! 12 / 27

Proof We begin with automata that accept languages for simple regular expressions, ɛ, and a Σ. q 0 q 1 NFA accepts e q 0 q 1 NFA accepts ɛ a q 0 q 1 NFA accepts a M(r) NFA accepts L(r) 13 / 27

ǫ ǫ M(r 1 ) M(r 2 ) ǫ ǫ Automaton for L(r 1 + r 2 ) ǫ M(r 1 ) ǫ M(r 2 ) ǫ Automaton for L(r 1 r 2 ) 14 / 27

ǫ ǫ M(r 1 ) ǫ ǫ Automaton for L(r1 ) W.l.g we consider a NFA with a single final state. Just like building automata for L(r 1 + r 2 ), L(r 1 r 2 ), L(r1 ), we can build automata for arbitrary regular expressions. 15 / 27

Example Find an NFA which accepts L(r) where r = (a + bb) (ba + ɛ). 16 / 27

Generalized Transition Graph Generalized transition graph is a transition graph whose edges are labeled with regular expressions. a c a + b L (a + a (a + b)c ) 17 / 27

Remove State q d e c q i q q j a b ae d ce d ce b q i q j ae b After removing state q 18 / 27

Canonical Form r 1 r 3 r 4 q i q j r 2 Associated regular expression is r = r 1 r 2 (r 4 + r 3 r 1 r 2). 19 / 27

Regular Language and Regular Expression Theorem Let L be a regular language. Then there exists a regular expression r such that L = L(r). Proof. Let M be an NFA that accepts L. W.l.g. we can assume that M has only one final state and that q 0 / F. We interpret the graph M as a generalized transition graph and apply the construction (illustrated in previous slides) to it. We use the method of removing state q. We continue this process, removing one state after the other, until we reach the canonical form. Then the regular expression is of the form in the previous slide. 20 / 27

Example Find a regular expression for L = {w {a, b} n a (w) is even and n b (w) is odd}. The associate DFA is given by a EE a OE b b a b b EO OO a 21 / 27

The canonical graph is of the form aa + ab(bb) ba b + a(bb) ba a(bb) a EE EO b + ab(bb) a 22 / 27

Algebraic Laws: Commutativity and Associativity Commutative law for union: r 1 + r 2 = r 2 + r 1 Commutative law for concatenation: r 1 r 2 r 2 r 1 Associative law for union: (r 1 + r 2 ) + r 3 = r 1 + (r 2 + r 3 ) Associative law for concatenation: (r 1 r 2 )r 3 = r 1 (r 2 r 3 ) 23 / 27

Algebraic Laws: Identities and Annihilators Identities ( and ɛ) + r = r + = r ɛ r = r ɛ = r Annihilator ( ) r = r = 24 / 27

Algebraic Laws: Distributive Laws Left distributive lat of concatenation over union r 1 (r 2 + r 3 ) = r 1 r 2 + r 1 r 3 Right distributive law of concatenation over union (r 2 + r 3 )r 1 = r 2 r 1 + r 3 r 1 25 / 27

Algebraic Laws: Idempotent Law Definition An operator is said to be idempotent if the result of applying it to two of the same values as arguments is that that value. Common arithmetic operators are not idempotent, i.e., x + x x, x x x. In regular expressions, idempotent law for union r + r = r 26 / 27

Algebraic Laws: Laws Involving Closures (r ) = r = ɛ ɛ = ɛ r + = rr = r r r = r + + ɛ 27 / 27