ɛ-closure, Kleene s Theorem,

DEGefWb5wiGH2XgYMEzUKajEmtCDUsRQ4d 1 A nice paper relevant to this course is titled The Glory of the Past 2 NICTA Researcher, Adjunct at the Australian National University and Griffith University ɛ-closure, Kleene s Theorem, and Kleene s Algebra Charles Gretton 2 24 February 2014 NICTA Funding and Supporting Members and Partners Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 1/30

First, a Quick Summary of ɛ-closure I ve previously waived my hands about ɛ-labelled arcs. Recall, ɛ is our notation for the empty string. How can we formally treat ɛ-labelled arcs in an NFA? Using a function called ECLOSE... A construction similar to that for NFA DFA is used to prove ɛ-nfa DFA Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 2/30

Quick Summary of ɛ-closure ɛ-nfa is Q, Σ, δ, q 0, F. ECLOSE : Q 2 Q i.e. gives an element in the powerset of states. It is defined inductively, as follows. BASIS: q ECLOSE(q) i.e. the argument is in the basis. INDUCTION: Suppose p ECLOSE(q) and r δ(p, ɛ), then r ECLOSE(q). Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 3/30

Quick Example of ɛ-closure ECLOSE : Q 2 Q i.e. gives an element in the powerset of states. It is defined inductively, as follows. BASIS: q ECLOSE(q) i.e. the argument is in the basis. INDUCTION: Suppose p ECLOSE(q) and r δ(p, ɛ), then r ECLOSE(q).??? Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 4/30

Quick Example of ɛ-closure ECLOSE : Q 2 Q i.e. gives an element in the powerset of states. It is defined inductively, as follows. BASIS: q ECLOSE(q) i.e. the argument is in the basis. INDUCTION: Suppose p ECLOSE(q) and r δ(p, ɛ), then r ECLOSE(q). ECLOSE(1) = {1, 2, 3, 4, 6} Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 5/30

ɛ-closure Subset Construction For DFA ɛ-nfa, just add δ(q, ɛ) = transitions to all states i.e. ɛ is not an alphabet symbol. ɛ-nfa DFA Same as for NFA DFA, Each DFA state corresponds to a unique set of ɛ-nfa states. Only take the ɛ-closure at ɛ-nfa transitions. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 6/30

Summarizing NFA DFA The equivalent NFA can be exponentially smaller than the corresponding DFA. ɛ-nfa DFA From transitivity: ɛ-nfa NFA All formalisms so far are expressively equivalent. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 7/30

Languages and Regular Expressions Regular expression implementations in many UNIX tools: grep, sed, awk, emacs and vi Following Kleene s seminal work, we deal with regularity formally. We describe the language intended by a given regular expression. We consider the class of languages that can be expressed. We examine algebraic laws which inform re-writing rules for simplifying expressions. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 8/30

Operations on Languages L 1, L 2 Σ product of languages inductive i th product L 1.L 2 = L 1 L 2 = {WX W L 1, X L 2 } positive closure L 0 = {ɛ} i > 0, L i = LL i 1 L = i 0 Li L + = LL Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 9/30

Regular Expressions A regular expression is a well formed word over a given alphabet Σ We use the following symbols: Σ {ɛ ( ) +. } alphabet empty string empty set open parenthesis close parenthesis union or choice concatenation X.Y often written as XY Kleene closure the star of this show Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 10/30

Regular Expressions A regular expression is a well formed word over a given alphabet Σ We use the following symbols: Σ {ɛ ( ) +. } symbol :: alphabet constant (or 0-ary operator) :: empty string constant (or 0-ary operator) :: empty set punctuation :: open parenthesis punctuation :: close parenthesis infix binary operator :: union or choice infix binary operator :: concatenation X.Y often written as XY postfix unary operator :: Kleene closure the star of this show Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 11/30

Regular Expressions We use the following symbols: Σ {ɛ ( ) +. } Taking a Σ {ɛ, }, regular expressions satisfy the following grammar: regxp ::= a (regxp) regxp + regxp regxp.regxp regxp The Kleene closure operator, *, has the highest precedence. Concatenation,., is usually represented by juxtaposition e.g. concatenation a.b is written ab Concatenation has the next highest precedence. Union/choice has the least precedence. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 12/30

Regular Expression Examples Taking a Σ {ɛ, }, regular expressions satisfy the following grammar: regxp ::= a (regxp) regxp + regxp regxp.regxp regxp So, which of the following is correct? ab + a = (a(b + a)) ab + a = (ab) + a ab + a = (a(b )) + a Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 13/30

Regular Expression Examples Taking a Σ {ɛ, }, regular expressions satisfy the following grammar: regxp ::= a (regxp) regxp + regxp regxp.regxp regxp So, which of the following is correct? ab + a = (a(b + a)) ab + a = (ab) + a ab + a = (a(b )) + a Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 14/30

Semantics of Regular Expressions BASIS Language of Expression Semantics L(ɛ) = {ɛ} L( ) = L(a), a Σ = {a} INDUCTION Below, L 1 and L 2 are well formed regular expressions. Language of Expression Semantics L(L 1 + L 2 ) = L(L 1 ) L(L 2 ) L(L 1 L 2 ) = L(L 1 )L(L 2 ) L(L 1 ) = L(L 1) L((L 1 )) = L(L 1 ) Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 15/30

Every Finite Language is Regular A regular language is a language that can be given by a regular expression. A finite language is a finite set of strings: {X 1, X 2,.., X n } The regular expression that gives that finite language is: X 1 + X 2 +.. + X n QED Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 16/30

Algebra and Regular Expressions Commutativity laws: L 1 + L 2 = L 2 + L 1 Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 17/30

Algebra and Regular Expressions Cancellation laws: Identities Annihilators L + = L + L = L ɛl = L Lɛ = L L = L = Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 18/30

Algebra and Regular Expressions Associative laws: L 1 + (L 2 + L 3 ) = (L 1 + L 2 ) + L 3 L 1 (L 2 L 3 ) = (L 1 L 2 )L 3 Power associativity L 1 (L 1 L 1 ) = (L 1 L 1 )L 1......... (L 1 L 1 )(L 1 L 1 ) = (L 1 (L 1 L 1 ))L 1 Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 19/30

Algebra and Regular Expressions Left and right distributive laws: L 1 (L 2 + L 3 ) = L 1 L 2 + L 1 L 3 (L 2 + L 3 )L 1 = L 2 L 1 + L 3 L 1 Idempotence: L + L = L Closure laws There are exactly two languages whose closures are finite. What are they? Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 20/30

Algebra and Regular Expressions Left and right distributive laws: Idempotence: L 1 (L 2 + L 3 ) = L 1 L 2 + L 1 L 3 (L 2 + L 3 )L 1 = L 2 L 1 + L 3 L 1 Closure laws L + L = L ɛ = ɛ = ɛ (L ) = L Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 21/30

Application of Algebraic Laws a + ab precedence: (a + (a(b ))) annihilators: (aɛ + (a(b ))) left distributive law: a(ɛ + b ) cancellation law: a(b ) CONCLUSION: ab = a + ab Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 22/30

Relating Concrete and Lifted Expressions When I write (L ) = L, I intended that L could be any regular expression. For example, the following concrete expressions are instances of this identity. 1 ((a) ) = a 2 ((ɛ(a + bab + ) + bab) ) = (ɛ(a + bab + ) + bab) The above are called concrete regular expressions, because they contain no variables. Our textbook talks about a specific class of concrete expression (above, Type 1), where each variable symbol from the lifted expression is replaced with an alphabet constant. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 23/30

Theorem Relating Concrete and Lifted Expressions Let E be a regular expression with variable symbols L i, i [1..m]. For example taking m = 2, E could be (L 1 + L 2 ), (L 1 L 2 ), etc Let E be an expression where each variable L i is replaced by a constant a i. Every X L(E) can be written as X 1 X 2..X k, where each substring X j is in one of the languages L(L i ). Writing L(X j ) = i if X j is from the i th language, a L(X1 )a L(X2 )..a L(Xk ) L(E ) Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 24/30

Theorem Relating Concrete and Lifted Expressions BASIS: E {ɛ,, L 1 }. The cases E {ɛ, } are trivial, because the concrete and lifted expressions are identical. Taking E = L 1, then L(E) = L(L 1 ). Also E = a 1 and L(E ) = L(a 1 ) = {a 1 }. Above, the L 1 /a 1 correspondence is clearly satisfied. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 25/30

Theorem Relating Concrete and Lifted Expressions INDUCTION: E {E 1 + E 2, E 1 E 2, E 1 }. E 1 and E 2 are constructed so that if a variable appears in both subexpressions, then that is substituted with the same concrete symbol. For example, take E = L 1 L 2 + L 2 L 1. Then a valid concrete expression is: E = a 1 a 2 + a 2 a 1. An invalid concrete expression would be: E = a 1 a 2 + a 3 a 4. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 26/30

Theorem Relating Concrete and Lifted Expressions INDUCTION: E {E 1 + E 2, E 1 E 2, E 1 }. E 1 and E 2 are constructed so that if a variable appears in both subexpressions, then that is substituted with the same concrete symbol. L(E ) = L(E 1 + E 2 ) = L(E 1 ) L(E 2 ) L(E) = L(E 1 + E 2 ) = L(E 1 ) L(E 2 ) Our inductive hypothesis gives us the correspondence for strings in L(E 1 )/L(E 1 ). Also for strings in L(E 2 )/L(E 2 ). Following the definition of union, we have proved this step case. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 27/30

Theorem Relating Concrete and Lifted Expressions INDUCTION: E {E 1 + E 2, E 1 E 2, E 1 }. E 1 and E 2 are constructed so that if a variable appears in both subexpressions, then that is substituted with the same concrete symbol. L(E ) = L(E 1.E 2 ) = L(E 1 ).L(E 2 ) L(E) = L(E 1.E 2 ) = L(E 1 ).L(E 2 ) Our inductive hypothesis gives us the correspondence for strings in L(E 1 )/L(E 1 ). Also for strings in L(E 2 )/L(E 2 ). Following the definition of concatenation, we have proved this step case. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 28/30

Theorem Relating Concrete and Lifted Expressions INDUCTION: E {E 1 + E 2, E 1 E 2, E 1 }. L((E 1 ) ) = L(ɛ) L(E 1 )+ L(E 1 ) = L(ɛ) L(E 1) + Our inductive hypothesis gives us the correspondence for strings in L(E 1 )/L(E 1 ). Because closure yields zero or more occurrences of the language strings, we also have a correspondence between L(E 1 ) /L(E 1 ). QED Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 29/30

Theorem Relating Concrete and Lifted Expressions What if we added to the step cases: E {E 1 E 2, E 1 + E 2, E 1 E 2, E 1 }? Let s propose a law L 1 L 2 L 3 = L 1 L 2 The law holds for a concretization: L(a b c) = L(a b) = But fails generally: L(a a ) L(a a) So, the theorem we just proved is rather specific to regular expressions. Kleene s Theorem Copyright NICTA 2014 Charles Gretton 2 30/30