Automata theory. An algorithmic approach. Lecture Notes. Javier Esparza




Automata theory. An algorithmic approach. Lecture Notes. Javier Esparza. May 3, 2016


Please read this!

Many years ago (I don't want to say how many, it's depressing) I taught a course on the automata-theoretic approach to model checking at the Technical University of Munich, basing it on lecture notes for another course on the same topic that Moshe Vardi had recently taught in Israel. Between my lectures I extended and polished the notes, and sent them to Moshe. At that time he and Orna Kupferman were thinking of writing a book, and the idea came up of doing it together. We made some progress, but life and other work got in the way, and the project has been postponed so many times that I don't dare to predict a completion date.

Some of the work that got in the way was the standard course on automata theory in Munich, which I had to teach several times. The syllabus contained both automata on finite and infinite words, and for the latter I used our notes. Each time I had to teach the course again, I took the opportunity to add some new material about automata on finite words, which also required reshaping the chapters on infinite words, and the notes kept growing and evolving. Now they've reached the point where they are in sufficiently good shape to be shown not only to my students, but to a larger audience. So, after getting Orna and Moshe's very kind permission, I've decided to make them available here.

Despite several attempts I haven't yet convinced Orna and Moshe to appear as co-authors of the notes. But I don't give up: apart from the material we wrote together, their influence on the rest is much larger than they think. Actually, my secret hope is that after they see this material on my home page we'll finally manage to gather some morsels of time here and there and finish our joint project. If you think we should do so, tell us! Send an email to: vardi@cs.rice.edu, orna@cs.huji.ac.il, and esparza@in.tum.de.

Sources

I haven't yet compiled a careful list of the sources I've used, but I'm listing here the main ones. I apologize in advance for any omissions.
The chapter on automata for fixed-length languages ("Finite Universes") was very much influenced by Henrik Reif Andersen's beautiful introduction to Binary Decision Diagrams, available at www.itu.dk/courses/AVA/E2005/bdd-eap.pdf.

The short chapter on pattern matching is influenced by David Eppstein's lecture notes for his course on Design and Analysis of Algorithms, see http://www.ics.uci.edu/~eppstein/teach.html.

As mentioned above, the chapters on operations for Büchi automata and applications to verification are heavily based on notes by Orna Kupferman and Moshe Vardi.

The chapter on the emptiness problem for Büchi automata is based on several research papers:

  Jean-Michel Couvreur: On-the-Fly Verification of Linear Temporal Logic. World Congress on Formal Methods 1999: 253-271.
  Jean-Michel Couvreur, Alexandre Duret-Lutz, Denis Poitrenaud: On-the-Fly Emptiness Checks for Generalized Büchi Automata. SPIN 2005: 169-184.
  Kathi Fisler, Ranan Fraer, Gila Kamhi, Moshe Y. Vardi, Zijiang Yang: Is There a Best Symbolic Cycle-Detection Algorithm? TACAS 2001: 420-434.
  Jaco Geldenhuys, Antti Valmari: More efficient on-the-fly LTL verification with Tarjan's algorithm. Theor. Comput. Sci. (TCS) 345(1): 60-82 (2005).
  Stefan Schwoon, Javier Esparza: A Note on On-the-Fly Verification Algorithms. TACAS 2005: 174-190.

The chapter on Linear Arithmetic is heavily based on the work of Bernard Boigelot, Pierre Wolper, and their co-authors, in particular the paper "An effective decision procedure for linear arithmetic over the integers and reals", published in ACM Trans. Comput. Logic 6(3) in 2005.

Acknowledgments

First of all, thanks to Orna Kupferman and Moshe Vardi for all the reasons explained above (if you haven't read the section "Please read this" yet, please do it now!). Many thanks to Jörg Kreiker, Jan Kretinsky, and Michael Luttenberger for many discussions on the topic of these notes, and for their contributions to several chapters. All three of them helped me to teach the automata course on different occasions. In particular, Jan contributed a lot to the chapter on pattern matching. Breno Faria helped to draw many figures. He was funded by a program of the Computer Science Department of the Technical University of Munich. Thanks also to Fabio Bove, Birgit Engelmann, Moritz Fuchs, Stefan Krusche, Philipp Müller, Martin Perzl, Marcel Ruegenberg, Franz Saller, Hayk Shoukourian, and Daniel Weißauer, who provided very helpful comments.

Contents

1 Introduction and Outline
  1.1 Outline

I Automata on Finite Words

2 Automata Classes and Conversions
  2.1 Regular expressions: a language to describe languages
  2.2 Automata classes
    2.2.1 Using DFAs as data structures
  2.3 Conversion Algorithms between Finite Automata
    2.3.1 From NFA to DFA
    2.3.2 From NFA-ε to NFA
  2.4 Conversion algorithms between regular expressions and automata
    2.4.1 From regular expressions to NFA-ε's
    2.4.2 From NFA-ε's to regular expressions
  2.5 A Tour of Conversions
3 Minimization and Reduction
  3.1 Minimal DFAs
  3.2 Minimizing DFAs
    3.2.1 Computing the language partition
    3.2.2 Quotienting
    3.2.3 Hopcroft's algorithm
  3.3 Reducing NFAs
    3.3.1 The reduction algorithm
  3.4 A Characterization of the Regular Languages
4 Operations on Sets: Implementations
  4.1 Implementation on DFAs
    4.1.1 Membership
    4.1.2 Complement
    4.1.3 Binary Boolean Operations
    4.1.4 Emptiness
    4.1.5 Universality
    4.1.6 Inclusion
    4.1.7 Equality
  4.2 Implementation on NFAs
    4.2.1 Membership
    4.2.2 Complement
    4.2.3 Union and intersection
    4.2.4 Emptiness and Universality
    4.2.5 Inclusion and Equality
5 Applications I: Pattern matching
  5.1 The general case
  5.2 The word case
    5.2.1 Lazy DFAs
6 Operations on Relations: Implementations
  6.1 Encodings
  6.2 Transducers and Regular Relations
  6.3 Implementing Operations on Relations
    6.3.1 Projection
    6.3.2 Join, Post, and Pre
  6.4 Relations of Higher Arity
7 Finite Universes
  7.1 Fixed-length Languages and the Master Automaton
  7.2 A Data Structure for Fixed-length Languages
  7.3 Operations on fixed-length languages
  7.4 Determinization and Minimization
  7.5 Operations on Fixed-length Relations
  7.6 Decision Diagrams
    7.6.1 Decision Diagrams and Kernels
    7.6.2 Operations on Kernels
8 Applications II: Verification
  8.1 The Automata-Theoretic Approach to Verification
  8.2 Programs as Networks of Automata
    8.2.1 Parallel Composition
    8.2.2 Asynchronous Product
    8.2.3 State- and event-based properties
  8.3 Concurrent Programs
    8.3.1 Expressing and Checking Properties
  8.4 Coping with the State-Explosion Problem
    8.4.1 On-the-fly verification
    8.4.2 Compositional Verification
    8.4.3 Symbolic State-space Exploration
  8.5 Safety and Liveness Properties
9 Automata and Logic
  9.1 First-Order Logic on Words
    9.1.1 Expressive power of FO(Σ)
  9.2 Monadic Second-Order Logic on Words
    9.2.1 Expressive power of MSO(Σ)
10 Applications III: Presburger Arithmetic
  10.1 Syntax and Semantics
  10.2 An NFA for the Solutions over the Naturals
    10.2.1 Equations
  10.3 An NFA for the Solutions over the Integers
    10.3.1 Equations
    10.3.2 Algorithms

II Automata on Infinite Words

11 Classes of ω-automata and Conversions
  11.1 ω-languages and ω-regular expressions
  11.2 Büchi automata
    11.2.1 From ω-regular expressions to NBAs and back
    11.2.2 Non-equivalence of NBA and DBA
  11.3 Generalized Büchi automata
  11.4 Other classes of ω-automata
    11.4.1 Co-Büchi Automata
    11.4.2 Muller automata
    11.4.3 Rabin automata
12 Boolean operations: Implementations
  12.1 Union and intersection
  12.2 Complement
    12.2.1 The problems of complement
    12.2.2 Rankings and ranking levels
    12.2.3 A (possibly infinite) complement automaton
    12.2.4 The size of A
13 Emptiness check: Implementations
  13.1 Algorithms based on depth-first search
    13.1.1 The nested-DFS algorithm
    13.1.2 The two-stack algorithm
  13.2 Algorithms based on breadth-first search
    13.2.1 Emerson-Lei's algorithm
    13.2.2 A Modified Emerson-Lei's algorithm
    13.2.3 Comparing the algorithms
14 Applications I: Verification and Temporal Logic
  14.1 Automata-Based Verification of Liveness Properties
    14.1.1 Checking Liveness Properties
  14.2 Linear Temporal Logic
  14.3 From LTL formulas to generalized Büchi automata
    14.3.1 Satisfaction sequences and Hintikka sequences
    14.3.2 Constructing the NGA for an LTL formula
    14.3.3 Size of the NGA
  14.4 Automatic Verification of LTL Formulas
15 Applications II: Monadic Second-Order Logic and Linear Arithmetic
  15.1 Monadic Second-Order Logic on ω-words
    15.1.1 Expressive power of MSO(Σ) on ω-words
  15.2 Linear Arithmetic
    15.2.1 Encoding Real Numbers
  15.3 Constructing an NBA for the Real Solutions
    15.3.1 A NBA for the Solutions of $a \cdot x_F \le \beta$
16 Solutions to exercises

Why this book?

There are excellent textbooks on automata theory, ranging from course books for undergraduates to research monographs for specialists. Why another one?

During the late 1960s and early 1970s the main application of automata theory was the development of lexical analyzers, parsers, and compilers. Analyzers and parsers determine whether an input string conforms to a given syntax, while compilers transform strings conforming to a syntax into equivalent strings conforming to another. With these applications in mind, it is natural to look at automata as abstract machines that accept, reject, or transform input strings, and this view deeply influenced the textbook presentation of automata theory. Results about the expressive power of machines, equivalences between models, and closure properties received much attention, while constructions on automata, like the powerset or product construction, often played a subordinate role as proof tools. To give a simple example, in many textbooks of the time (and in later textbooks written in the same style) the product construction is not introduced as an algorithm that, given two NFAs recognizing languages $L_1$ and $L_2$, constructs a third NFA recognizing their intersection $L_1 \cap L_2$. Instead, the text contains a theorem stating that regular languages are closed under intersection, and the product construction is hidden in its proof. Moreover, it is not presented as an algorithm, but as the mathematical, static definition of the sets of states, transition relation, etc. of the product automaton. Sometimes, the simple but computationally important fact that only states reachable from the initial state need be constructed is not even mentioned.

I claim that this presentation style, summarized by the slogan "automata as abstract machines", is no longer adequate. In the second half of the 1980s and in the 1990s program verification emerged as a new and exciting application of automata theory. Automata were used to describe the behaviour (or intended behaviour) of hardware and software systems, not their syntax, and this shift from syntax to semantics had important consequences. While automata for lexical or syntactical analysis typically have at most some thousands of states, automata for semantic descriptions can easily have tens of millions. In order to handle automata of this size it became imperative to pay special attention to efficient constructions and algorithmic issues, and research in this direction made great progress. Moreover, automata on infinite words, a class of automata models originally introduced in the 60s to solve abstract problems in logic, became necessary to specify and verify liveness properties of software. These automata run over words of infinite length, and so they can hardly be seen as machines accepting or rejecting an input: they could only do so after infinite time!
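To make the remark about the product construction concrete, here is a minimal Python sketch of the construction viewed as an algorithm that builds only the pairs of states reachable from the initial pair. The dictionary encoding of NFAs, the function name `product_nfa`, and the example automaton `even_a` are assumptions made for this illustration and are not taken from the notes.

```python
from collections import deque

def product_nfa(nfa1, nfa2):
    """Intersection NFA, built by exploring only reachable pairs of states.

    Each NFA is a dict (hypothetical encoding) with keys
      'init'  : the initial state,
      'final' : the set of final states,
      'delta' : a dict mapping (state, letter) to a set of successor states.
    """
    init = (nfa1['init'], nfa2['init'])
    states, final, delta = {init}, set(), {}
    # Only letters used by both automata can label a product transition.
    letters = {a for (_, a) in nfa1['delta']} & {a for (_, a) in nfa2['delta']}
    worklist = deque([init])
    while worklist:
        q1, q2 = worklist.popleft()
        if q1 in nfa1['final'] and q2 in nfa2['final']:
            final.add((q1, q2))
        for a in letters:
            succs = {(p1, p2)
                     for p1 in nfa1['delta'].get((q1, a), set())
                     for p2 in nfa2['delta'].get((q2, a), set())}
            if succs:
                delta[((q1, q2), a)] = succs
            for p in succs - states:   # pairs discovered for the first time
                states.add(p)
                worklist.append(p)
    return {'init': init, 'final': final, 'delta': delta, 'states': states}

# Example: an automaton accepting words with an even number of a's,
# intersected with itself. Only 2 of the 4 possible pairs are reachable.
even_a = {
    'init': 'e',
    'final': {'e'},
    'delta': {('e', 'a'): {'o'}, ('o', 'a'): {'e'},
              ('e', 'b'): {'e'}, ('o', 'b'): {'o'}},
}
result = product_nfa(even_a, even_a)
print(sorted(result['states']))   # [('e', 'e'), ('o', 'o')]
```

The static definition would list all four pairs of states; the worklist version never creates the pairs ('e', 'o') and ('o', 'e'), which is the computational point made above.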

This book intends to reflect the evolution of automata theory. Modern automata theory puts more emphasis on algorithmic questions, and less on expressivity. This change of focus is captured by the new slogan "automata as data structures". Just as hash tables and Fibonacci heaps are both adequate data structures for representing sets, depending on whether the operations one needs are those of a dictionary or of a priority queue, automata are the right data structure for representing sets and relations when the required operations are union, intersection, complement, projections and joins. In this view the algorithmic implementation of the operations gets the limelight, and, as a consequence, these implementations constitute the spine of this book.

The shape of the book is also strongly influenced by two further design decisions. First, experience shows that automata-theoretic constructions are best explained by means of examples, and that examples are best presented with the help of pictures. Automata on words are blessed with a graphical representation of instantaneous appeal. We have invested much effort into finding illustrative, nontrivial examples whose graphical representation still fits in one page. Second, for students learning directly from a book, solved exercises are a blessing, an easy way to evaluate progress. Moreover, they can also be used to introduce topics that, for expository reasons, cannot be presented in the main text. The book contains a large number of solved exercises ranging from simple applications of algorithms to relatively involved proofs.

Chapter 1
Introduction and Outline

Courses on data structures show how to represent sets of objects in a computer so that operations like insertion, deletion, lookup, and many others can be efficiently implemented. Typical representations are hash tables, search trees, or heaps. These lecture notes also deal with the problem of representing and manipulating sets, but with respect to a different set of operations: the boolean operations of set theory (union, intersection, and complement with respect to some universe set), some tests that check basic properties (if a set is empty, if it contains all elements of the universe, or if it is contained in another one), and operations on relations. Table 1.1 formally defines the operations to be supported, where $U$ denotes some universe of objects, $X, Y$ are subsets of $U$, $x$ is an element of $U$, and $R, S \subseteq U \times U$ are binary relations on $U$. Observe that many other operations, for example set difference, can be reduced to the ones in the table. Similarly, operations on $n$-ary relations for $n \ge 3$ can be reduced to operations on binary relations.

An important point is that we are not only interested in finite sets: we wish to have a data structure able to deal with infinite sets over some infinite universe. However, a simple cardinality argument shows that no data structure can provide finite representations of all infinite sets: an infinite universe has uncountably many subsets, but every data structure mapping sets to finite representations only has countably many instances. (Loosely speaking, there are more sets to be represented than representations available.) Because of this limitation every good data structure for infinite sets must find a reasonable compromise between expressibility (how large is the set of representable sets) and manipulability (which operations can be carried out, and at which cost). These notes present the compromise offered by word automata, which, as shown by 50 years of research on the theory of formal languages, is the best one available for most purposes. Word automata, or just automata, represent and manipulate sets whose elements are encoded as words, i.e., as sequences of letters over an alphabet.[1]

[1] There are generalizations of word automata in which objects are encoded as trees. The theory of tree automata is also very well developed, but not the subject of these notes. So we shorten word automaton to just automaton.

Any kind of object can be represented by a word, at least in principle. Natural numbers, for instance, are represented in computer science as sequences of digits, i.e., as words over the alphabet of digits. Vectors and lists can also be represented as words by concatenating the word representations of their elements. As a matter of fact, whenever a computer stores an object in a file, the computer is representing it as a word over some alphabet, like ASCII or Unicode. So word automata are a very general data structure. However, while any object can be represented by a word, not every object can be represented by a finite word, that is, a word of finite length. Typical examples are real numbers and non-terminating executions of a program. When objects cannot be represented by finite words, computers usually only represent some approximation: a float instead of a real number, or a finite prefix instead of a non-terminating computation. In the second part of the notes we show how to represent sets of infinite objects exactly using automata on infinite words. While the theory of automata on finite words is often considered a gold standard of theoretical computer science (a powerful and beautiful theory with lots of important applications in many fields), automata on infinite words are harder, and their theory does not achieve the same degree of perfection.

Table 1.1: Operations and tests for manipulation of sets and relations

Operations on sets
  Complement(X): returns $U \setminus X$.
  Intersection(X, Y): returns $X \cap Y$.
  Union(X, Y): returns $X \cup Y$.
Tests on sets
  Member(x, X): returns true if $x \in X$, false otherwise.
  Empty(X): returns true if $X = \emptyset$, false otherwise.
  Universal(X): returns true if $X = U$, false otherwise.
  Included(X, Y): returns true if $X \subseteq Y$, false otherwise.
  Equal(X, Y): returns true if $X = Y$, false otherwise.
Operations on relations
  Projection_1(R): returns the set $\pi_1(R) = \{x \mid \exists y\colon (x, y) \in R\}$.
  Projection_2(R): returns the set $\pi_2(R) = \{y \mid \exists x\colon (x, y) \in R\}$.
  Join(R, S): returns the relation $R \circ S = \{(x, z) \mid \exists y \in U\colon (x, y) \in R \wedge (y, z) \in S\}$.
  Post(X, R): returns the set $post_R(X) = \{y \in U \mid \exists x \in X\colon (x, y) \in R\}$.
  Pre(X, R): returns the set $pre_R(X) = \{y \in U \mid \exists x \in X\colon (y, x) \in R\}$.
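As a point of reference for the operations of Table 1.1, the following Python sketch spells them out for the easy case of a small finite universe, using built-in sets. The universe `U`, the successor relation `SUCC`, and all function names are assumptions chosen for illustration; the notes themselves are concerned with automata-based implementations that also work for infinite sets.

```python
# A small finite universe, chosen only for illustration.
U = frozenset(range(6))

# Operations on sets
def complement(X):      return U - X
def intersection(X, Y): return X & Y
def union(X, Y):        return X | Y

# Tests on sets
def member(x, X):   return x in X
def empty(X):       return len(X) == 0
def universal(X):   return X == U
def included(X, Y): return X <= Y
def equal(X, Y):    return X == Y

# Operations on binary relations, represented as sets of pairs
def projection_1(R): return {x for (x, y) in R}
def projection_2(R): return {y for (x, y) in R}
def join(R, S):      return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}
def post(X, R):      return {y for (x, y) in R if x in X}
def pre(X, R):       return {y for (y, x) in R if x in X}

# Set difference reduces to the operations above.
def difference(X, Y):
    return intersection(X, complement(Y))

# Example relation: successor on U, i.e. pairs (i, i + 1).
SUCC = {(i, i + 1) for i in range(5)}
print(sorted(join(SUCC, SUCC)))   # pairs (i, i + 2)
print(post({0, 1}, SUCC))         # {1, 2}
print(pre({0, 1}, SUCC))          # {0}
```

The function `difference` illustrates the kind of reduction mentioned for operations not listed in the table: it uses only Intersection and Complement.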
The contrast between the two theories gives us a structure for Part II of the notes: we follow the steps of Part I, always comparing the solutions for infinite words with the gold standard.

1.1 Outline

Part I presents data structures and algorithms for the well-known class of regular languages.

Chapter 2 introduces the classical data structures for the representation of regular languages: regular expressions, deterministic finite automata (DFA), nondeterministic finite automata (NFA), and nondeterministic automata with ε-transitions. We refer to all of them as automata. The chapter presents some examples showing how to use automata to finitely represent sets of words, numbers or program states, and describes conversion algorithms between the representations. All algorithms are well known (and can also be found in other textbooks) with the exception of the algorithm for the elimination of ε-transitions.

Chapter 3 addresses the issue of finding small representations for a given set. It shows that there is a unique minimal representation of a language as a DFA, and introduces the classical minimization algorithms. It then shows how the algorithms can be extended to reduce the size of NFAs.

Chapter 4 describes algorithms implementing boolean set operations and tests on DFAs and NFAs. It includes a recent, simple improvement in algorithms for universality and inclusion.

Chapter 5 presents a first, classical application of the techniques and results of Chapter 4: pattern matching. Even this application gets a new twist when examined from the automata-as-data-structures point of view. The chapter presents the Knuth-Morris-Pratt algorithm as the design of a new data structure, lazy DFAs, for which the membership operation can be performed very efficiently.

Chapter 6 shows how to implement operations on relations. It discusses the notion of encoding (which requires more care for operations on relations than for operations on sets), and introduces transducers as a data structure.

Chapter 7 presents automata data structures for the important special case in which the universe U of objects is finite. In this case all objects can be encoded by words of the same length, and the set and relation operations can be optimized. In particular, one can then use minimal DFAs as a data structure, and directly implement the algorithms without using any minimization algorithm. In the second part of the chapter, we show that (ordered) Binary Decision Diagrams (BDDs) are just a further optimization of minimal DFAs as a data structure. We introduce a slightly more general class of deterministic automata, and show that the minimal automaton in this more general class (which is also unique) has at most as many states as the minimal DFA. We then show how to implement the set and relation operations for this new representation.

Chapter 8 applies nearly all the constructions and algorithms of previous chapters to the problem of verifying safety properties of sequential and concurrent programs with bounded-range variables. In particular, the chapter shows how to model concurrent programs as networks of automata, how to express safety properties using automata or regular expressions, and how to automatically check the properties using the algorithmic constructions of previous chapters.

Chapter 9 introduces first-order logic (FOL) and monadic second-order logic (MSOL) on words as a representation allowing us to describe a regular language as the set of words satisfying a property. The chapter shows that FOL cannot describe all regular languages, and that MSOL does.

Chapter 10 introduces Presburger arithmetic, and an algorithm that computes an automaton encoding all the solutions of a given formula. In particular, it presents an algorithm to compute an automaton for the solutions of a linear inequality over the naturals or over the integers.

Part II presents data structures and algorithms for ω-regular languages.

Chapter 11 introduces ω-regular expressions and several different classes of ω-automata: deterministic and nondeterministic Büchi, generalized Büchi, co-Büchi, Muller, Rabin, and Streett automata. It explains the advantages and disadvantages of each class, in particular whether the automata in the class can be determinized, and presents conversion algorithms between the classes.

Chapter 12 presents implementations of the set operations (union, intersection and complementation) for Büchi and generalized Büchi automata. In particular, it presents in detail a complementation algorithm for Büchi automata.

Chapter 13 presents different implementations of the emptiness test for Büchi and generalized Büchi automata. The first part of the chapter presents two linear-time implementations based on depth-first search (DFS): the nested-DFS algorithm and the two-stack algorithm, a modification of Tarjan's algorithm for the computation of strongly connected components. The second part presents further implementations based on breadth-first search.

Chapter 14 applies the algorithms of previous chapters to the problem of verifying liveness properties of programs. After an introductory example, the chapter introduces Linear Temporal Logic as a property specification formalism, and shows how to algorithmically obtain a Büchi automaton recognizing the language of all words satisfying a given formula. The verification algorithm can then be reduced to a combination of the boolean operations and an emptiness check.

Chapter 15 extends the logic approach to regular languages studied in Chapters 9 and 10 to ω-words. The first part of the chapter introduces monadic second-order logic on ω-words, and shows how to construct a Büchi automaton recognizing the set of words satisfying a given formula. The second part introduces linear arithmetic, the first-order theory of the real numbers with addition, and shows how to construct a Büchi automaton recognizing the encodings of all the real numbers satisfying a given formula.

Part I
Automata on Finite Words

Chpter 2 Automt Clsses nd Conversions In Section 2.1 we introduce sic definitions out words nd lnguges, nd then introduce regulr expressions, textul nottion for defining lnguges of finite words. Like ny other forml nottion, it cnnot e used to define ech possile lnguge. However, the next chpter shows tht they re n dequte nottion when deling with utomt, since they define exctly the lnguges tht cn e represented y utomt on words. 2.1 Regulr expressions: lnguge to descrie lnguges An lphet is finite, nonempty set. The elements of n lphet re clled letters. A finite, possily empty sequence of letters is word. A word 1 2... n hs length n. The empty word is the only word of length 0 nd it is written ε. The conctention of two words w 1 = 1... n nd w 2 = 1... m is the word w 1 w 2 = 1... n 1... m, sometimes lso denoted y w 1 w 2. Notice tht ε w = w = w ε = w. For every word w, we define w 0 = ε nd w k+1 = w k w. Given n lphet Σ, we denote y Σ the set of ll words over Σ. A set L Σ of words is lnguge over Σ. The complement of lnguge L is the lnguge Σ \ L, which we often denote y L (notice tht this nottion implicitly ssumes the lphet Σ is fixed). The conctention of two lnguges L 1 nd L 2 is L 1 L 2 = {w 1 w 2 Σ w 1 L 1, w 2 L 2 }. The itertion of lnguge L Σ is the lnguge L = i 0 L i, where L 0 = {ε} nd L i+1 = L i L for every i 0. Definition 2.1 Regulr expressions r over n lphet Σ re defined y the following grmmr, where Σ r ::= ε r 1 r 2 r 1 + r 2 r The set of ll regulr expressions over Σ is written RE(Σ). The lnguge L(r) Σ of regulr expression r RE(Σ) is defined inductively y 17

$L(\emptyset) = \emptyset$, $L(\varepsilon) = \{\varepsilon\}$, $L(a) = \{a\}$,
$L(r_1 r_2) = L(r_1) \cdot L(r_2)$, $L(r_1 + r_2) = L(r_1) \cup L(r_2)$, $L(r^*) = L(r)^*$.

A language $L$ is regular if there is a regular expression $r$ such that $L = L(r)$. We often abuse language, and identify a regular expression and its language. For instance, when there is no risk of confusion we write "the language $r$" instead of "the language $L(r)$".

Example 2.2 Let $\Sigma = \{0, 1\}$. Some examples of languages expressible by regular expressions are:
- The set of all words: $(0 + 1)^*$. We often use $\Sigma$ as an abbreviation of $(0 + 1)$, and so $\Sigma^*$ as an abbreviation of $(0 + 1)^*$.
- The set of all words of length at most 4: $(0 + 1 + \varepsilon)^4$.
- The set of all words that begin and end with 0: $0 \Sigma^* 0$.
- The set of all words containing at least one pair of 0s exactly 5 letters apart: $\Sigma^* 0 \Sigma^4 0 \Sigma^*$.
- The set of all words containing an even number of 0s: $1^* (0 1^* 0 1^*)^*$.
- The set of all words containing an even number of 0s and an even number of 1s: $(00 + 11 + (01 + 10)(00 + 11)^*(01 + 10))^*$.

2.2 Automata classes

We briefly recapitulate the definitions of deterministic and nondeterministic finite automata, as well as nondeterministic automata with ε-transitions and regular expressions.

Definition 2.3 A deterministic automaton (DA) is a tuple $A = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a set of states, $\Sigma$ is an alphabet, $\delta \colon Q \times \Sigma \to Q$ is a transition function, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states.
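Returning briefly to Example 2.2, the last two expressions can be sanity-checked mechanically. The sketch below is not part of the notes: it assumes the even-number-of-0s expression is read as $1^*(01^*01^*)^*$, translates both expressions into the syntax of Python's `re` module (writing `|` for `+`), and compares them against a direct letter count on all words up to a length bound. The helper names `words` and `agrees` and the bound of 8 are illustrative choices.

```python
import re
from itertools import product

# Two expressions of Example 2.2 in Python syntax ('+' of the notes becomes '|').
EVEN_ZEROS = re.compile(r"1*(?:01*01*)*")
EVEN_ZEROS_EVEN_ONES = re.compile(r"(?:00|11|(?:01|10)(?:00|11)*(?:01|10))*")

def words(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, max_len=8):
    """True if `pattern` matches exactly the words (up to max_len) satisfying `predicate`."""
    return all(
        (pattern.fullmatch(w) is not None) == predicate(w)
        for w in words("01", max_len)
    )

even_zeros_ok = agrees(EVEN_ZEROS, lambda w: w.count("0") % 2 == 0)
both_even_ok = agrees(
    EVEN_ZEROS_EVEN_ONES,
    lambda w: w.count("0") % 2 == 0 and w.count("1") % 2 == 0,
)
```

Such a bounded check does not prove that an expression is correct, but it quickly exposes expressions that miss short words.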

A run of $A$ on input $a_0 a_1 \ldots a_{n-1}$ is a sequence $q_0 \xrightarrow{a_0} q_1 \xrightarrow{a_1} q_2 \cdots \xrightarrow{a_{n-1}} q_n$, such that $q_i \in Q$ for $0 \le i \le n$, and $\delta(q_i, a_i) = q_{i+1}$ for $0 \le i < n$. A run is accepting if $q_n \in F$. $A$ accepts a word $w \in \Sigma^*$ if there is an accepting run on input $w$. The language recognized by $A$ is the set $L(A) = \{ w \in \Sigma^* \mid w \text{ is accepted by } A \}$.

A deterministic finite automaton (DFA) is a DA with a finite set of states.

Notice that a DA has exactly one run on a given word. Given a DA, we often say "the word $w$ leads from $q_0$ to $q$", meaning that the unique run of the DA on the word $w$ ends at the state $q$.

Graphically, non-final states of a DFA are represented by circles, and final states by double circles (see the example below). The transition function is represented by labeled directed edges: if $\delta(q, a) = q'$ then we draw an edge from $q$ to $q'$ labeled by $a$. We also draw an edge into the initial state.

Example 2.4 Figure 2.1 shows the graphical representation of the DFA $A = (Q, \Sigma, \delta, q_0, F)$, where $Q = \{q_0, q_1, q_2, q_3\}$, $\Sigma = \{a, b\}$, $F = \{q_0\}$, and $\delta$ is given by the following table:

$\delta(q_0, a) = q_1$   $\delta(q_1, a) = q_0$   $\delta(q_2, a) = q_3$   $\delta(q_3, a) = q_2$
$\delta(q_0, b) = q_3$   $\delta(q_1, b) = q_2$   $\delta(q_2, b) = q_1$   $\delta(q_3, b) = q_0$

The runs of $A$ on $aabb$ and $abbb$ are

$q_0 \xrightarrow{a} q_1 \xrightarrow{a} q_0 \xrightarrow{b} q_3 \xrightarrow{b} q_0$
$q_0 \xrightarrow{a} q_1 \xrightarrow{b} q_2 \xrightarrow{b} q_1 \xrightarrow{b} q_2$

The first one is accepting, but the second one is not. The DFA recognizes the language of all words over the alphabet $\{a, b\}$ that contain an even number of $a$s and an even number of $b$s. The DFA is in the states on the left, respectively on the right, if it has read an even, respectively an odd, number of $a$s. Similarly, it is in the states at the top, respectively at the bottom, if it has read an even, respectively an odd, number of $b$s.

[Figure 2.1: A DFA. States $q_0$ and $q_1$ are drawn at the top, $q_3$ and $q_2$ at the bottom.]
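The definitions of run and acceptance translate almost literally into code. The following minimal sketch, which is an illustration and not part of the notes, stores the transition function of the DFA of Example 2.4 in a dictionary and computes the unique run on a word; the names `DELTA`, `run` and `accepts` are chosen here for illustration.

```python
from itertools import product

# Transition function of the DFA of Example 2.4: (state, letter) -> state.
DELTA = {
    ("q0", "a"): "q1", ("q1", "a"): "q0", ("q2", "a"): "q3", ("q3", "a"): "q2",
    ("q0", "b"): "q3", ("q1", "b"): "q2", ("q2", "b"): "q1", ("q3", "b"): "q0",
}
INITIAL = "q0"
FINAL = {"q0"}

def run(word):
    """Return the unique run of the DFA on `word` as a list of states."""
    states = [INITIAL]
    for letter in word:
        states.append(DELTA[(states[-1], letter)])
    return states

def accepts(word):
    """A word is accepted if its run ends in a final state."""
    return run(word)[-1] in FINAL

def all_words(max_len):
    """Yield every word over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

print(run("aabb"))  # ['q0', 'q1', 'q0', 'q3', 'q0']
print(run("abbb"))  # ['q0', 'q1', 'q2', 'q1', 'q2']
```

Because the transition function is total, `run` never fails on words over $\{a, b\}$, which mirrors the fact that a DA has exactly one run on every word.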

Trap states. Consider the DFA of Figure 2.2 over the alphabet $\{a, b, c\}$. The automaton recognizes the language $\{ab, ba\}$. The pink state on the left is often called a trap state or a garbage collector: if a run reaches this state, it gets trapped in it, and so the run cannot be accepting. DFAs often have a trap state with many ingoing transitions, and this makes it difficult to find a nice graphical representation. So when drawing DFAs we often omit the trap state. For instance, we only draw the black part of the automaton in Figure 2.2. Notice that no information is lost: if a state $q$ has no outgoing transition labeled by $a$, then we know that $\delta(q, a) = q_t$, where $q_t$ is the trap state.

[Figure 2.2: A DFA with a trap state]

2.2.1 Using DFAs as data structures

Here are four examples of how DFAs can be used to represent interesting sets of objects.

Example 2.5 The DFA of Figure 2.3 (drawn without the trap state!) recognizes the strings over the alphabet $\{\texttt{-}, \texttt{.}, 0, 1, \ldots, 9\}$ that encode real numbers with a finite decimal part. We wish to exclude 002, -0, or 3.10000000, but accept 37, 10.503, or -0.234 as correct encodings. A description of the strings in English is rather long: a string encoding a number consists of an integer part, followed by a possibly empty fractional part; the integer part consists of an optional minus sign, followed by a nonempty sequence of digits; if the first digit of this sequence is 0, then the sequence itself is 0; if the fractional part is nonempty, then it starts with ".", followed by a nonempty sequence of digits that does not end with 0; if the integer part is -0, then the fractional part is nonempty.

[Figure 2.3: A DFA for decimal numbers]

Example 2.6 The DFA of Figure 2.4 recognizes the binary encodings of all the multiples of 3. For instance, it recognizes 11, 110, 1001, and 1100, which are the binary encodings of 3, 6, 9, and 12, respectively, but not, say, 10 or 111.

[Figure 2.4: A DFA for the multiples of 3 in binary]

Example 2.7 The DFA of Figure 2.5 recognizes all the nonnegative integer solutions of the inequation $2x - y \le 2$, using the following encoding. The alphabet of the DFA has four letters, namely

$\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

A word like

$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

encodes a pair of numbers, given by the top and bottom rows, 101100 and 010011. The binary encodings start with the least significant bit, that is, 101100 encodes $2^0 + 2^2 + 2^3 = 13$, and 010011 encodes $2^1 + 2^4 + 2^5 = 50$. We see this as an encoding of the valuation $(x, y) := (13, 50)$. This valuation satisfies the inequation, since $2 \cdot 13 - 50 = -24 \le 2$, and indeed the word is accepted by the DFA.

Example 2.8 Consider the following program with two boolean variables x, y:

    1  while x = 1 do
    2    if y = 1 then
    3      x ← 0
    4    y ← 1 − x
    5  end

A configuration of the program is a triple $[\ell, n_x, n_y]$, where $\ell \in \{1, 2, 3, 4, 5\}$ is the current value of the program counter, and $n_x, n_y \in \{0, 1\}$ are the current values of x and y. The initial configurations are $[1, 0, 0]$, $[1, 0, 1]$, $[1, 1, 0]$, $[1, 1, 1]$, i.e., all configurations in which control is at line 1. The DFA of Figure 2.6 recognizes all reachable configurations of the program. For instance, the DFA accepts $[5, 0, 1]$, indicating that it is possible to reach the last line of the program with values $x = 0$, $y = 1$.

[Figure 2.5: A DFA for the solutions of $2x - y \le 2$]

[Figure 2.6: A DFA for the reachable configurations of the program of Example 2.8]

Definition 2.9 A non-deterministic automaton (NA) is a tuple $A = (Q, \Sigma, \delta, Q_0, F)$, where $Q$, $\Sigma$, and $F$ are as for DAs, $Q_0$ is a set of initial states and $\delta \colon Q \times \Sigma \to \mathcal{P}(Q)$ is a transition relation.

A run of $A$ on input $a_0 a_1 \ldots a_{n-1}$ is a sequence $p_0 \xrightarrow{a_0} p_1 \xrightarrow{a_1} p_2 \cdots \xrightarrow{a_{n-1}} p_n$, such that $p_i \in Q$ for $0 \le i \le n$, $p_0 \in Q_0$, and $p_{i+1} \in \delta(p_i, a_i)$ for $0 \le i < n$. That is, a run starts at some initial state. A run is accepting if $p_n \in F$. $A$ accepts a word $w \in \Sigma^*$ if there is an accepting run on input $w$. The language recognized by $A$ is the set $L(A) = \{ w \in \Sigma^* \mid w \text{ is accepted by } A \}$. In other words, the runs of NAs are defined as for DAs, but with $p_0 \in Q_0$ in place of the fixed initial state $q_0$, and $p_{i+1} \in \delta(p_i, a_i)$ in place of $\delta(p_i, a_i) = p_{i+1}$. Acceptance and the language recognized by an NA are defined as for DAs.

A nondeterministic finite automaton (NFA) is an NA with a finite set of states.

A state of an NFA may have zero, one, or many outgoing transitions labeled by the same letter. Also, an NFA may have zero, one, or many runs on the same word. Observe, however, that the number of runs on a word is finite.

We often identify the transition function $\delta$ of a DA with the set of triples $(q, a, q')$ such that $q' = \delta(q, a)$, and the transition relation $\delta$ of an NFA with the set of triples $(q, a, q')$ such that $q' \in \delta(q, a)$; so we often write $(q, a, q') \in \delta$, meaning $q' = \delta(q, a)$ for a DA, or $q' \in \delta(q, a)$ for an NA.

If an NFA has several initial states, then its language is the union of the sets of words accepted by runs starting at each initial state.

Example 2.10 Figure 2.7 shows an NFA $A = (Q, \Sigma, \delta, Q_0, F)$ where $Q = \{q_0, q_1, q_2, q_3\}$, $\Sigma = \{a, b\}$, $Q_0 = \{q_0\}$, $F = \{q_3\}$, and the transition relation $\delta$ is given by the following table:

$\delta(q_0, a) = \{q_1\}$   $\delta(q_1, a) = \{q_1\}$   $\delta(q_2, a) = \emptyset$   $\delta(q_3, a) = \{q_3\}$
$\delta(q_0, b) = \emptyset$   $\delta(q_1, b) = \{q_1, q_2\}$   $\delta(q_2, b) = \{q_3\}$   $\delta(q_3, b) = \{q_3\}$

$A$ has no run for any word starting with $b$. It has exactly one run for $aa$, and four runs for $abbb$, namely

$q_0 \xrightarrow{a} q_1 \xrightarrow{b} q_1 \xrightarrow{b} q_1 \xrightarrow{b} q_1$
$q_0 \xrightarrow{a} q_1 \xrightarrow{b} q_1 \xrightarrow{b} q_1 \xrightarrow{b} q_2$
$q_0 \xrightarrow{a} q_1 \xrightarrow{b} q_1 \xrightarrow{b} q_2 \xrightarrow{b} q_3$
$q_0 \xrightarrow{a} q_1 \xrightarrow{b} q_2 \xrightarrow{b} q_3 \xrightarrow{b} q_3$

Two of these runs are accepting, the other two are not. $L(A)$ is the set of words that start with $a$ and contain two consecutive $b$s.

[Figure 2.7: An NFA]
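To make the difference between a transition function and a transition relation concrete, the following sketch enumerates all runs of the NFA of Example 2.10 on a given word. It is an illustration under the assumption that the transition table is as stated in the example; the names `DELTA`, `runs` and `accepts` are chosen here, and missing dictionary keys stand for the empty set.

```python
from itertools import product

# Transition relation of the NFA of Example 2.10: (state, letter) -> set of states.
DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"}, ("q1", "b"): {"q1", "q2"},
    ("q2", "b"): {"q3"},
    ("q3", "a"): {"q3"}, ("q3", "b"): {"q3"},
}
INITIAL = {"q0"}
FINAL = {"q3"}

def runs(word):
    """Return all runs of the NFA on `word`, each as a list of states."""
    partial = [[p] for p in sorted(INITIAL)]
    for letter in word:
        # Extend every partial run by every possible successor state.
        partial = [
            r + [q]
            for r in partial
            for q in sorted(DELTA.get((r[-1], letter), ()))
        ]
    return partial

def accepts(word):
    """A word is accepted if at least one run ends in a final state."""
    return any(r[-1] in FINAL for r in runs(word))

def all_words(max_len):
    """Yield every word over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

for r in runs("abbb"):
    print(" ".join(r), "(accepting)" if r[-1] in FINAL else "")
```

Enumerating runs explicitly can take exponential time in the length of the word; it is used here only because it follows the definition directly.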
Definition 2.11 A non-deterministic automaton with ε-transitions (NA-ε) is a tuple $A = (Q, \Sigma, \delta, Q_0, F)$, where $Q$, $\Sigma$, $Q_0$, and $F$ are as for NAs and $\delta \colon Q \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(Q)$ is a transition relation.

The runs and accepting runs of NA-ε are defined as for NAs. $A$ accepts a word $a_1 \ldots a_n \in \Sigma^*$ if $A$ has an accepting run on $\varepsilon^{k_0} a_1 \varepsilon^{k_1} \ldots \varepsilon^{k_{n-1}} a_n \varepsilon^{k_n} \in (\Sigma \cup \{\varepsilon\})^*$ for some $k_0, k_1, \ldots, k_n \ge 0$.

A nondeterministic finite automaton with ε-transitions (NFA-ε) is an NA-ε with a finite set of states.

Notice that, if some cycle of the automaton has only ε-transitions, the number of runs of an NFA-ε on a word may even be infinite.

Example 2.12 Figure 2.8 shows an NFA-ε.

[Figure 2.8: An NFA-ε]

Definition 2.13 Let $A = (Q, \Sigma, \delta, Q_0, F)$ be an automaton. A state $q \in Q$ is reachable from $q' \in Q$ if $q = q'$ or if there exists a run $q' \xrightarrow{a_1} \cdots \xrightarrow{a_n} q$ on some input $a_1 \ldots a_n \in \Sigma^*$. $A$ is in normal form if every state is reachable from some initial state.

Convention: Unless otherwise stated, we assume that automata are in normal form. All our algorithms preserve normal forms, i.e., when the output is an automaton, the automaton is in normal form.

We extend NAs to allow regular expressions on transitions. Such automata are called NA-reg and they are obviously a generalization of both regular expressions and NA-εs. They will be useful to provide a uniform conversion between automata and regular expressions.

Definition 2.14 A non-deterministic automaton with regular expression transitions (NA-reg) is a tuple $A = (Q, \Sigma, \delta, Q_0, F)$, where $Q$, $\Sigma$, $Q_0$, and $F$ are as for NAs, and where $\delta \colon Q \times RE(\Sigma) \to \mathcal{P}(Q)$ is a relation such that $\delta(q, r) = \emptyset$ for all but a finite number of pairs $(q, r) \in Q \times RE(\Sigma)$.

Accepting runs are defined as for NFAs. $A$ accepts a word $w \in \Sigma^*$ if $A$ has an accepting run on $r_1 \ldots r_k$ such that $w \in L(r_1) \cdot \ldots \cdot L(r_k)$.

A nondeterministic finite automaton with regular expression transitions (NFA-reg) is an NA-reg with a finite set of states.

2.3 Conversion Algorithms between Finite Automata

We recall that all our data structures can represent exactly the same languages. Since DFAs are a special case of NFAs, which are a special case of NFA-εs, it suffices to show that every language recognized by an NFA-ε can also be recognized by an NFA, and every language recognized by an NFA can also be recognized by a DFA.

2.3.1 From NFA to DFA.

The powerset construction transforms an NFA $A$ into a DFA $B$ recognizing the same language. We first give an informal idea of the construction. Recall that an NFA may have many different runs on a word $w$, possibly leading to different states, while a DFA has exactly one run on $w$. Denote by $Q_w$ the set of states $q$ such that some run of $A$ on $w$ leads from its initial state $q_0$ to $q$. Intuitively, $B$ "keeps track" of the set $Q_w$: its states are sets of states of $A$, with $\{q_0\}$ as initial state, and its transition function is defined to ensure that the run of $B$ on $w$ leads from $\{q_0\}$ to $Q_w$ (see below). It is then easy to ensure that $A$ and $B$ recognize the same language: it suffices to choose the final states of $B$ as the sets of states of $A$ containing at least one final state, because for every word $w$:

$B$ accepts $w$
iff $Q_w$ is a final state of $B$
iff $Q_w$ contains at least one final state of $A$
iff some run of $A$ on $w$ leads to a final state of $A$
iff $A$ accepts $w$.

Let us now define the transition function of $B$, say $\Delta$. "Keeping track" of the set $Q_w$ amounts to satisfying $\Delta(Q_w, a) = Q_{wa}$ for every word $w$. But we have $Q_{wa} = \bigcup_{q \in Q_w} \delta(q, a)$, and so we define

$\Delta(Q', a) = \bigcup_{q \in Q'} \delta(q, a)$

for every $Q' \subseteq Q$. Notice that we may have $Q' = \emptyset$; in this case, $\emptyset$ is a state of $B$, and, since $\Delta(\emptyset, a) = \emptyset$ for every $a$, a trap state.

Summarizing, given $A = (Q, \Sigma, \delta, Q_0, F)$ we define the DFA $B = (\mathcal{Q}, \Sigma, \Delta, Q_0, \mathcal{F})$ as follows:
- $\mathcal{Q} = \mathcal{P}(Q)$;
- $\Delta(Q', a) = \bigcup_{q \in Q'} \delta(q, a)$ for every $Q' \subseteq Q$ and every $a \in \Sigma$;
- the initial state is $Q_0$ (that is, $\{q_0\}$ when $A$ has a single initial state $q_0$); and
- $\mathcal{F} = \{ Q' \in \mathcal{Q} \mid Q' \cap F \neq \emptyset \}$.

Notice, however, that $B$ may not be in normal form: it may have many states non-reachable from $Q_0$. For instance, assume $A$ happens to be a DFA with states $\{q_0, \ldots, q_{n-1}\}$. Then $B$ has $2^n$ states, but only the singletons $\{q_0\}, \ldots, \{q_{n-1}\}$ are reachable. The conversion algorithm shown in Table 2.1 constructs only the reachable states.

NFAtoDFA(A)
Input: NFA $A = (Q, \Sigma, \delta, Q_0, F)$
Output: DFA $B = (\mathcal{Q}, \Sigma, \Delta, Q_0, \mathcal{F})$ with $L(B) = L(A)$

    $\mathcal{Q}, \Delta, \mathcal{F} \leftarrow \emptyset$
    $\mathcal{W} \leftarrow \{Q_0\}$
    while $\mathcal{W} \neq \emptyset$ do
      pick $Q'$ from $\mathcal{W}$
      add $Q'$ to $\mathcal{Q}$
      if $Q' \cap F \neq \emptyset$ then add $Q'$ to $\mathcal{F}$
      for all $a \in \Sigma$ do
        $Q'' \leftarrow \bigcup_{q \in Q'} \delta(q, a)$
        if $Q'' \notin \mathcal{Q}$ then add $Q''$ to $\mathcal{W}$
        add $(Q', a, Q'')$ to $\Delta$

Table 2.1: NFAtoDFA(A)

The algorithm is written in pseudocode, with abstract sets as data structure. Like nearly all the algorithms presented in the next chapters, it is a workset algorithm. Workset algorithms maintain a set of objects, the workset, waiting to be processed. Like in mathematical sets, the elements of the workset are not ordered, and the workset contains at most one copy of an element (i.e., if an element already in the workset is added to it again, the workset does not change). For most of the algorithms in this book, the workset can be implemented as a hash table.

In NFAtoDFA() the workset is called $\mathcal{W}$, in other algorithms just $W$ (we use a calligraphic font to emphasize that in this case the objects of the workset are sets). Workset algorithms repeatedly pick an object from the workset (instruction "pick $Q'$ from $\mathcal{W}$"), and process it; notice that picking an object removes it from the workset. Processing an object may generate new objects that are added to the workset. The algorithm terminates when the workset is empty. Since objects removed from the workset may generate new objects, workset algorithms may potentially fail to terminate, even if the set of all objects is finite, because the same object might be added to and removed from the workset infinitely many times. Termination is guaranteed by making sure that no object that has been removed from the workset once is ever added to it again.
For this, objects picked from the workset are stored (in NFAtoDFA() they are stored in $\mathcal{Q}$), and objects are added to the workset only if they have not been stored yet.
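The workset scheme described above can be sketched in Python as follows. This is one possible rendering, not the implementation of the notes: states of the DFA are represented as `frozenset` objects so that they can be stored in hash-based sets, the workset is an ordinary `set`, and the function name `nfa_to_dfa` and its parameter layout are assumptions made for illustration. The sketch is applied to the NFA of Example 2.10.

```python
from itertools import product

def nfa_to_dfa(alphabet, delta, initial, final):
    """Workset version of the powerset construction.

    `delta` maps (state, letter) to a set of states; missing keys mean
    the empty set. Only the reachable subsets are constructed.
    """
    start = frozenset(initial)
    states, trans, finals = set(), {}, set()
    workset = {start}
    while workset:
        current = workset.pop()          # picking removes the object
        states.add(current)              # store it so it is never re-added
        if current & final:
            finals.add(current)
        for a in alphabet:
            target = frozenset(
                q2 for q in current for q2 in delta.get((q, a), ())
            )
            if target not in states:
                workset.add(target)      # no effect if already in the workset
            trans[(current, a)] = target
    return states, trans, start, finals

# NFA of Example 2.10.
NFA_DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"}, ("q1", "b"): {"q1", "q2"},
    ("q2", "b"): {"q3"},
    ("q3", "a"): {"q3"}, ("q3", "b"): {"q3"},
}
STATES, TRANS, START, FINALS = nfa_to_dfa("ab", NFA_DELTA, {"q0"}, {"q3"})

def dfa_accepts(word):
    """Follow the unique run of the constructed DFA on `word`."""
    current = START
    for a in word:
        current = TRANS[(current, a)]
    return current in FINALS

def all_words(max_len):
    """Yield every word over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

print(len(STATES), "of", 2 ** 4, "subsets are reachable")
```

On this input the sketch constructs 6 of the 16 subsets, including the empty set, which plays the role of the trap state.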

Figure 2.9 shows an NFA at the top, and some snapshots of the run of NFAtoDFA() on it. The states of the DFA are labelled with the corresponding sets of states of the NFA. The algorithm picks states from the workset in order $\{1\}$, $\{1, 2\}$, $\{1, 3\}$, $\{1, 4\}$, $\{1, 2, 4\}$. Snapshots (a)-(d) are taken right after it picks the states $\{1, 2\}$, $\{1, 3\}$, $\{1, 4\}$, and $\{1, 2, 4\}$, respectively. Snapshot (e) is taken at the end. Notice that out of the $2^4 = 16$ subsets of states of the NFA only 5 are constructed, because the rest are not reachable from $\{1\}$.

Complexity. If $A$ has $n$ states, then the output of NFAtoDFA(A) can have up to $2^n$ states. To show that this bound is essentially reachable, consider the family $\{L_n\}_{n \ge 1}$ of languages over $\Sigma = \{a, b\}$ given by $L_n = (a + b)^* a (a + b)^{n-1}$. That is, $L_n$ contains the words of length at least $n$ whose $n$-th letter starting from the end is an $a$. The language $L_n$ is accepted by the NFA with $n + 1$ states shown in Figure 2.10(a): intuitively, the automaton chooses one of the $a$s in the input word, and checks that it is followed by exactly $n - 1$ letters before the word ends. Applying the subset construction, however, yields a DFA with $2^n$ states. The DFA for $L_3$ is shown on the left of Figure 2.10(b). The states of the DFA have a natural interpretation: they "store" the last $n$ letters read by the automaton. If the DFA is in the state storing $a_1 a_2 \ldots a_n$ and it reads the letter $a_{n+1}$, then it moves to the state storing $a_2 \ldots a_{n+1}$. States are final if the first letter they store is an $a$. The interpreted version of the DFA is shown on the right of Figure 2.10(b).

We can also easily prove that any DFA recognizing $L_n$ must have at least $2^n$ states. Assume there is a DFA $A_n = (Q, \Sigma, \delta, q_0, F)$ such that $|Q| < 2^n$ and $L(A_n) = L_n$. We can extend $\delta$ to a mapping $\hat{\delta} \colon Q \times \{a, b\}^* \to Q$, where $\hat{\delta}(q, \varepsilon) = q$ and $\hat{\delta}(q, w \cdot \sigma) = \delta(\hat{\delta}(q, w), \sigma)$ for all $w \in \Sigma^*$ and for all $\sigma \in \Sigma$. Since $|Q| < 2^n$, there must be two words $u\, a\, v_1$ and $u\, b\, v_2$ of length $n$ for which $\hat{\delta}(q_0, u\, a\, v_1) = \hat{\delta}(q_0, u\, b\, v_2)$.
But then we would have that $\hat{\delta}(q_0, u\, a\, v_1 u) = \hat{\delta}(q_0, u\, b\, v_2 u)$; that is, either both $u\, a\, v_1 u$ and $u\, b\, v_2 u$ are accepted by $A_n$ or neither is. Since, however, $|a\, v_1 u| = |b\, v_2 u| = n$, this contradicts the assumption that $A_n$ accepts exactly the words with an $a$ at the $n$-th position from the end.

[Figure 2.9: Conversion of an NFA into a DFA. The NFA has states 1, 2, 3, 4; snapshots (a) to (e) show the DFA states $\{1\}$, $\{1, 2\}$, $\{1, 3\}$, $\{1, 4\}$, $\{1, 2, 4\}$ being constructed.]

[Figure 2.10: NFA for $L_n$, and DFA for $L_3$. (a) NFA for $L_n$ with states $1, 2, \ldots, n, n + 1$. (b) DFA for $L_3$ and interpretation.]

2.3.2 From NFA-ε to NFA.

Let $A$ be an NFA-ε over an alphabet $\Sigma$. In this section we use $a$ to denote an element of $\Sigma$, and $\alpha, \beta$ to denote elements of $\Sigma \cup \{\varepsilon\}$.

Loosely speaking, the conversion first adds to $A$ new transitions that make all ε-transitions redundant, without changing the recognized language: every word accepted by $A$ before adding the new transitions is accepted after adding them by a run without ε-transitions. The conversion then removes all ε-transitions, delivering an NFA that recognizes the same language as $A$.

The new transitions are shortcuts: if $A$ has transitions $(q, \alpha, q')$ and $(q', \beta, q'')$ such that $\alpha = \varepsilon$ or $\beta = \varepsilon$, then the shortcut $(q, \alpha\beta, q'')$ is added. (Notice that either $\alpha\beta = a$ for some $a \in \Sigma$, or $\alpha\beta = \varepsilon$.) Shortcuts may generate further shortcuts: for instance, if $\alpha\beta = a$ and $A$ has a further transition $(q'', \varepsilon, q''')$, then a new shortcut $(q, a, q''')$ is added. We call the process of adding all possible shortcuts saturation. Obviously, saturation does not change the language of $A$, and if before saturation $A$ has a run accepting a nonempty word, for example

$q_0 \xrightarrow{\varepsilon} q_1 \xrightarrow{\varepsilon} q_2 \xrightarrow{a} q_3 \xrightarrow{\varepsilon} q_4 \xrightarrow{b} q_5 \xrightarrow{\varepsilon} q_6$

then after saturation it has a run accepting the same word, and visiting no ε-transitions, namely

$q_0 \xrightarrow{a} q_4 \xrightarrow{b} q_6$

However, we cannot yet remove ε-transitions. The NFA-ε of Figure 2.11(a) accepts $\varepsilon$. After saturation we get the NFA-ε of Figure 2.11(b). However, removing all ε-transitions yields an NFA that no longer accepts $\varepsilon$. To solve this problem, if $A$ accepts $\varepsilon$ from some initial state, then we mark that state as final, which clearly does not change the language. To decide whether $A$ accepts $\varepsilon$, we check if some state reachable from some initial state by a sequence of ε-transitions is final. Figure 2.11(c) shows the final result.

[Figure 2.11: Conversion of an NFA-ε into an NFA by shortcutting ε-transitions. (a) NFA-ε accepting $L(0^* 1^* 2^*)$. (b) After saturation. (c) After marking the initial state as final and removing all ε-transitions.]

Notice that, in general, after removing ε-transitions the automaton may not be in normal form, because some states may no longer be reachable. So the naïve procedure runs in three phases: saturation, ε-check, and normalization. However, it is possible to carry out all three steps in a single pass. We give a workset implementation of this procedure in which the check is done while saturating, and only the reachable states are generated (in the pseudocode $\alpha$ and $\beta$ stand for either a letter of $\Sigma$ or $\varepsilon$, and $a$ stands for a letter of $\Sigma$). Furthermore, the algorithm avoids constructing some redundant shortcuts. For instance, for the NFA-ε of Figure 2.11(a) the algorithm does not construct the transition labeled by 2 leading from the state in the middle to the state on the right.

NFAεtoNFA(A)
Input: NFA-ε $A = (Q, \Sigma, \delta, Q_0, F)$
Output: NFA $B = (Q', \Sigma, \delta', Q'_0, F')$ with $L(B) = L(A)$

     1  $Q'_0 \leftarrow Q_0$
     2  $Q' \leftarrow Q_0$; $\delta' \leftarrow \emptyset$; $F' \leftarrow F \cap Q_0$
     3  $\delta'' \leftarrow \emptyset$; $W \leftarrow \{ (q, \alpha, q') \in \delta \mid q \in Q_0 \}$
     4  while $W \neq \emptyset$ do
     5    pick $(q_1, \alpha, q_2)$ from $W$
     6    if $\alpha \neq \varepsilon$ then
     7      add $q_2$ to $Q'$; add $(q_1, \alpha, q_2)$ to $\delta'$; if $q_2 \in F$ then add $q_2$ to $F'$
     8      for all $q_3 \in \delta(q_2, \varepsilon)$ do
     9        if $(q_1, \alpha, q_3) \notin \delta'$ then add $(q_1, \alpha, q_3)$ to $W$
    10      for all $a \in \Sigma$, $q_3 \in \delta(q_2, a)$ do
    11        if $(q_2, a, q_3) \notin \delta'$ then add $(q_2, a, q_3)$ to $W$
    12    else /* $\alpha = \varepsilon$ */
    13      add $(q_1, \alpha, q_2)$ to $\delta''$; if $q_2 \in F$ then add $q_1$ to $F'$
    14      for all $\beta \in \Sigma \cup \{\varepsilon\}$, $q_3 \in \delta(q_2, \beta)$ do
    15        if $(q_1, \beta, q_3) \notin \delta' \cup \delta''$ then add $(q_1, \beta, q_3)$ to $W$

The correctness proof is conceptually easy, but the different cases require some care, and so we devote a proposition to it.

Proposition 2.15 Let $A$ be an NFA-ε, and let $B$ = NFAεtoNFA(A). Then $B$ is an NFA and $L(A) = L(B)$.

Proof: Notice first that every transition that leaves $W$ is never added to $W$ again: when a transition $(q_1, \alpha, q_2)$ leaves $W$ it is added to either $\delta'$ or $\delta''$, and a transition enters $W$ only if it does not belong to either $\delta'$ or $\delta''$. Since every execution of the while loop removes a transition from the workset, the algorithm eventually exits the loop and terminates.

To show that $B$ is an NFA we have to prove that it only has non-ε transitions, and that it is in normal form, i.e., that every state of $Q'$ is reachable in $B$ from some state of $Q'_0$.
For the first part, observe that transitions are only added to $\delta'$ in line 7, and none of them is an ε-transition because of the guard in line 6. For the second part, we need the following invariant, which can be easily proved by inspection: for every transition $(q_1, \alpha, q_2)$ added to $W$, if $\alpha = \varepsilon$ then $q_1 \in Q_0$, and if $\alpha \neq \varepsilon$, then $q_2$ is reachable in $B$ (after termination). Now, since new states are added to $Q'$ only at line 7, applying the invariant we get that every state of $Q'$ is reachable in $B$ from some state in $Q_0$.

It remains to prove $L(A) = L(B)$. The inclusion $L(A) \supseteq L(B)$ follows from the fact that every transition added to $\delta'$ is a shortcut, which can be proved by inspection. For the inclusion $L(A) \subseteq L(B)$, we first prove that $\varepsilon \in L(A)$ implies $\varepsilon \in L(B)$. Let $q_0 \xrightarrow{\varepsilon} q_1 \cdots q_{n-1} \xrightarrow{\varepsilon} q_n$ be a run of $A$ such that $q_n \in F$. If $n = 0$ (i.e., $q_n = q_0$), then we are done. If $n > 0$, then we prove by induction on $n$ that a transition $(q_0, \varepsilon, q_n)$ is eventually added to $W$ (and so eventually picked from it), which implies that $q_0$ is eventually added to $F'$ at line 13. If $n = 1$, then $(q_0, \varepsilon, q_n)$ is added to $W$ at line 3. If $n > 1$, then by hypothesis $(q_0, \varepsilon, q_{n-1})$ is eventually added to $W$, picked from it at some later point, and so $(q_0, \varepsilon, q_n)$ is added to $W$ at line 15.

We now prove that for every $w \in \Sigma^+$, if $w \in L(A)$ then $w \in L(B)$. Let $w = a_1 a_2 \ldots a_n$ with $n \ge 1$. Then $A$ has a run

$q_0 \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} q_{m_1} \xrightarrow{a_1} q_{m_1 + 1} \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} q_{m_n} \xrightarrow{a_n} q_{m_n + 1} \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} q_m$

such that $q_m \in F$. We have just proved that a transition $(q_0, \varepsilon, q_{m_1})$ is eventually added to $W$. So $(q_0, a_1, q_{m_1 + 1})$ is eventually added at line 15, $(q_0, a_1, q_{m_1 + 2}), \ldots, (q_0, a_1, q_{m_2})$ are eventually added at line 9, and $(q_{m_2}, a_2, q_{m_2 + 1})$ is eventually added at line 11. Iterating this argument, we obtain that

$q_0 \xrightarrow{a_1} q_{m_2} \xrightarrow{a_2} q_{m_3} \cdots q_{m_n} \xrightarrow{a_n} q_m$

is a run of $B$. Moreover, $q_m$ is added to $F'$ at line 7, and so $w \in L(B)$.

Complexity. Observe that the algorithm processes pairs of transitions $(q_1, \alpha, q_2)$, $(q_2, \beta, q_3)$, where $(q_1, \alpha, q_2)$ comes from $W$ and $(q_2, \beta, q_3)$ from $\delta$ (lines 8, 10, 14). Since every transition is removed from $W$ at most once, the algorithm processes at most $|Q| \cdot |\Sigma| \cdot |\delta|$ pairs (because for a fixed transition $(q_2, \beta, q_3) \in \delta$ there are $|Q|$ possibilities for $q_1$ and $|\Sigma|$ possibilities for $\alpha$). The runtime is dominated by the processing of the pairs, and so it is $O(|Q| \cdot |\Sigma| \cdot |\delta|)$.

2.4 Conversion algorithms between regular expressions and automata

To convert regular expressions to automata and vice versa we use NFA-regs as introduced in Definition 2.14. Both NFA-εs and regular expressions can be seen as subclasses of NFA-regs: an NFA-ε is an NFA-reg whose transitions are labeled by letters or by $\varepsilon$, and a regular expression $r$ is the NFA-reg $A_r$ having two states, one initial and the other final, and a single transition labeled $r$ leading from the initial to the final state. We present algorithms that, given an NFA-reg belonging to one of these subclasses, produce a sequence of NFA-regs, each one recognizing the same language as its predecessor in the sequence, and ending in an NFA-reg of the other subclass.
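Before NFA-regs are put to use, it may help to see the NFA-ε side of the picture in executable form. The sketch below is a Python rendering of the workset algorithm NFAεtoNFA of Section 2.3.2; it is an illustration, not the implementation of the notes. Transitions are stored as triples, the empty string stands for ε, and the names `nfa_eps_to_nfa`, `EPS` and `accepts` are chosen here. The input automaton is assumed to be the three-state NFA-ε for $0^* 1^* 2^*$ of Figure 2.11(a), reconstructed from its caption since the drawing is not reproduced.

```python
import re
from itertools import product

EPS = ""  # label standing for ε (an encoding chosen for this sketch)

def nfa_eps_to_nfa(delta, initial, final):
    """Workset conversion of an NFA-ε, given as a set of triples, into an NFA."""
    succ = {}
    for q, label, r in delta:
        succ.setdefault(q, []).append((label, r))
    states = set(initial)                      # Q'
    new_delta = set()                          # δ'
    eps_done = set()                           # δ''
    new_final = set(final) & set(initial)      # F'
    work = {t for t in delta if t[0] in initial}
    while work:
        q1, label, q2 = work.pop()
        if label != EPS:
            states.add(q2)
            new_delta.add((q1, label, q2))
            if q2 in final:
                new_final.add(q2)
            for beta, q3 in succ.get(q2, []):
                if beta == EPS:
                    # Shortcut over an ε-transition leaving q2.
                    if (q1, label, q3) not in new_delta:
                        work.add((q1, label, q3))
                elif (q2, beta, q3) not in new_delta:
                    # q2 is reachable, so explore its letter transitions.
                    work.add((q2, beta, q3))
        else:
            eps_done.add((q1, label, q2))
            if q2 in final:
                new_final.add(q1)
            for beta, q3 in succ.get(q2, []):
                t = (q1, beta, q3)
                if t not in new_delta and t not in eps_done:
                    work.add(t)
    return states, new_delta, set(initial), new_final

def accepts(delta, initial, final, word):
    """Membership test for an NFA without ε-transitions."""
    current = set(initial)
    for a in word:
        current = {r for (q, x, r) in delta if q in current and x == a}
    return bool(current & final)

# Assumed reconstruction of the NFA-ε of Figure 2.11(a).
A_DELTA = {(0, "0", 0), (0, EPS, 1), (1, "1", 1), (1, EPS, 2), (2, "2", 2)}
B_STATES, B_DELTA, B_INITIAL, B_FINAL = nfa_eps_to_nfa(A_DELTA, {0}, {2})

# Bounded comparison with the regular expression 0*1*2*.
same_language = all(
    accepts(B_DELTA, B_INITIAL, B_FINAL, "".join(w))
    == (re.fullmatch(r"0*1*2*", "".join(w)) is not None)
    for n in range(6)
    for w in product("012", repeat=n)
)
```

On this input the result has nine transitions and marks the initial state as final; in line with the remark in Section 2.3.2, the transition labeled 2 from the middle state to the right state is not constructed.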