When Simulation Meets Antichains (on Checking Language Inclusion of NFAs)


Parosh Aziz Abdulla¹, Yu-Fang Chen¹, Lukáš Holík², Richard Mayr³, and Tomáš Vojnar²
¹ Uppsala University, ² Brno University of Technology, ³ University of Edinburgh

Abstract. We describe a new and more efficient algorithm for checking universality and language inclusion on nondeterministic finite word automata (NFA) and tree automata (TA). To the best of our knowledge, the antichain-based approach proposed by Wulf et al. was the most efficient one so far. Our idea is to exploit a simulation relation on the states of finite automata to accelerate the antichain-based algorithms. Normally, a simulation relation can be obtained fairly efficiently, and it can help the antichain-based approach to prune out a large portion of unnecessary search paths. We evaluate the performance of our new method on NFA/TA obtained from random regular expressions and from the intermediate steps of regular model checking. The results show that our approach significantly outperforms the previous antichain-based approach in most of the experiments.

1 Introduction

The language inclusion problem for regular languages is important in many application domains, e.g., formal verification. Many verification problems can be formulated as a language inclusion problem. For example, one may describe the actual behaviors of an implementation in an automaton A and all of the behaviors permitted by the specification in another automaton B. Then, the problem of whether the implementation meets the specification is equivalent to the problem $L(A) \subseteq L(B)$.

Methods for proving language inclusion can be categorized into two types: those based on simulation (e.g., [6]) and those based on the subset construction (e.g., [5, 8-10]). Simulation-based approaches first compute a simulation relation on the states of two automata A and B and then check if all initial states of A can be simulated by some initial state of B. Since simulation can be computed in polynomial time, simulation-based methods are usually very efficient. Their main drawback is that they are incomplete.
Simulation preorder implies language inclusion, but not vice versa. On the other hand, methods based on the subset construction are complete but inefficient because in many cases they will cause an exponential blow-up in the number of states. Recently, Wulf et al. [11] proposed the antichain-based approach. To the best of our knowledge, it was the most efficient one among all of the methods based on the subset construction. Although the antichain-based method significantly outperforms the classical subset construction, it still suffers from the exponential blow-up problem in many cases.

In this paper, we describe a new approach that nicely combines the simulation-based and the antichain-based approaches. The computed simulation relation is used for pruning out unnecessary search paths of the antichain-based method.

To simplify the presentation, we first consider the problem of checking universality for a word automaton A. In a similar manner to the classical subset construction, we

start from the set of initial states and search for sets of states (here referred to as macro-states) which are not accepting (i.e., we search for a counterexample of universality). The key idea is to define an easy-to-check ordering $\preceq$ on the states of A which implies language inclusion (i.e., $p \preceq q$ implies that the language of the state p is included in the language of the state q). From $\preceq$, we derive an ordering on macro-states which we use in two ways to optimize the subset construction: (1) searching from a macro-state need not continue in case a smaller macro-state has already been analyzed; and (2) a given macro-state is represented by (the subset of) its maximal elements. In this paper, we take the ordering to be the well-known maximal simulation relation on the automaton A. In fact, the antichain algorithm of [11] coincides with the special case where the ordering is the identity relation.

Subsequently, we describe how to generalize the above approach to the case of checking language inclusion between two automata A and B, by extending the ordering to pairs each consisting of a state of A and a macro-state of B.

In the second part of the paper, we extend our algorithms to the case of tree automata. First, we define the notion of open trees which we use to characterize the languages defined by tuples of states of the tree automaton. We identify here a new application of the so-called upward simulation relation from [1]. We show that it implies (open tree) language inclusion, and we describe how we can use it to optimize existing algorithms for checking the universality and language inclusion properties.

We have implemented our algorithms and carried out an extensive experimentation using NFA obtained from several different sources. These include NFA from random regular expressions and also 1069 pairs of NFA generated from the intermediate steps of abstract regular model checking [4] while verifying the correctness of the bakery algorithm, a producer-consumer system, the bubble sort algorithm, an algorithm that reverses a circular list, and a Petri net model of the readers/writers protocol.
We have also considered tree automata derived from intermediate steps of abstract regular tree model checking. The experiments show that our approach significantly outperforms the previous antichain-based approach in almost all of the considered cases. (Furthermore, in those cases where simulation is sufficient to prove language inclusion, our algorithm has polynomial running time.)

The remainder of the paper is organized as follows. Section 2 contains some basic definitions. In Section 3, we begin the discussion by applying our idea to solve the universality problem for NFA. The problem is simpler than the language inclusion problem and thus we believe that presenting our universality checking algorithm first makes it easier for the reader to grasp the idea. The correctness proof of our universality checking algorithm is given in Section 4. In Section 5, we discuss our language inclusion checking algorithm for NFA. Section 6 defines basic notations for tree automata and in Section 7, we present the algorithms for checking universality and language inclusion for tree automata. The experimental results are described in Section 8. Finally, in Section 9, we conclude the paper and discuss further research directions.

2 Preliminaries

A Nondeterministic Finite Automaton (NFA) A is a tuple $(\Sigma, Q, I, F, \delta)$ where: $\Sigma$ is an alphabet, Q is a finite set of states, $I \subseteq Q$ is a non-empty set of initial states, $F \subseteq Q$ is

a set of final states, and $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation. For convenience, we use $p \xrightarrow{a} q$ to denote the transition from the state p to the state q with the label a. A word $u = u_1 \ldots u_n$ is accepted by A from the state $q_0$ if there exists a sequence $q_0 u_1 q_1 u_2 \ldots u_n q_n$ such that $q_n \in F$ and $q_{j-1} \xrightarrow{u_j} q_j$ for all $0 < j \le n$. Define $L(A)(q) := \{u \mid u \text{ is accepted by } A \text{ from the state } q\}$ (the language of the state q in A). Define the language $L(A)$ of A as $\bigcup_{q \in I} L(A)(q)$. We say that A is universal if $L(A) = \Sigma^*$. Let $A = (\Sigma, Q_A, I_A, F_A, \delta_A)$ and $B = (\Sigma, Q_B, I_B, F_B, \delta_B)$ be two NFAs. Define their union automaton $A \cup B := (\Sigma, Q_A \cup Q_B, I_A \cup I_B, F_A \cup F_B, \delta_A \cup \delta_B)$. We define the post-image of a state $Post(p) := \{p' \mid \exists a \in \Sigma : (p, a, p') \in \delta\}$.

A simulation on $A = (\Sigma, Q, I, F, \delta)$ is a relation $\preceq \subseteq Q \times Q$ such that $p \preceq r$ only if (i) $p \in F \implies r \in F$ and (ii) for every transition $p \xrightarrow{a} p'$, there exists a transition $r \xrightarrow{a} r'$ such that $p' \preceq r'$. It can be shown that for each automaton $A = (\Sigma, Q, I, F, \delta)$, there exists a unique maximal simulation. The following is a well-known lemma.

Lemma 1. Given a simulation $\preceq$ on an NFA A, $p \preceq r \implies L(A)(p) \subseteq L(A)(r)$.

For convenience, we call a set of states in A a macro-state, i.e., a macro-state is a subset of Q. A macro-state is accepting if it contains at least one accepting state, otherwise it is rejecting. For a macro-state P, define $L(A)(P) := \bigcup_{p \in P} L(A)(p)$. We say that a macro-state P is universal if $L(A)(P) = \Sigma^*$. For two macro-states P and R, we write $P \preceq^{\forall\exists} R$ as a shorthand for $\forall p \in P.\ \exists r \in R : p \preceq r$. We define the post-image of a macro-state $Post(P) := \{P' \mid \exists a \in \Sigma : P' = \{p' \mid \exists p \in P : (p, a, p') \in \delta\}\}$. We use $A^{\subseteq}$ to denote the set of relations over the states of A that imply language inclusion, i.e., if $\preceq \in A^{\subseteq}$, then we have $p \preceq r \implies L(A)(p) \subseteq L(A)(r)$.

3 Universality of NFAs

The universality problem for an NFA $A = (\Sigma, Q, I, F, \delta)$ is to decide whether $L(A) = \Sigma^*$. The problem is PSPACE-complete. The classical algorithm for the problem first determinizes A with the subset construction and then checks if every reachable macro-state is accepting. The algorithm is inefficient since in many cases the determinization will cause an exponential blow-up in the number of states.
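The definitions of Section 2 can be made concrete with a small sketch. The Python code below computes the maximal simulation by a naive greatest-fixpoint refinement, assuming that an NFA is given as plain Python sets and that transitions are triples `(p, a, q)`. The function names `post` and `maximal_simulation` are illustrative choices and do not come from the paper, and the refinement loop is a simple polynomial procedure, not one of the efficient simulation algorithms from the literature.

```python
from itertools import product


def post(delta, p, a):
    """States reachable from p by one a-labelled transition."""
    return {q for (s, b, q) in delta if s == p and b == a}


def maximal_simulation(states, alphabet, final, delta):
    """Return the set of pairs (p, r) such that p is simulated by r.

    Naive refinement: start from all pairs satisfying condition (i)
    and delete pairs violating condition (ii) until nothing changes.
    """
    # Condition (i): if p is final, then r must be final as well.
    rel = {(p, r) for p, r in product(states, repeat=2)
           if p not in final or r in final}
    changed = True
    while changed:
        changed = False
        for (p, r) in list(rel):
            # Condition (ii): each a-move of p is matched by an a-move of r
            # that leads to a related pair of successor states.
            ok = all(
                any((p2, r2) in rel for r2 in post(delta, r, a))
                for a in alphabet
                for p2 in post(delta, p, a)
            )
            if not ok:
                rel.discard((p, r))
                changed = True
    return rel


# Hypothetical two-state example: p --a--> q, q --a--> q, only q is final.
example = maximal_simulation({"p", "q"}, {"a"}, {"q"},
                             {("p", "a", "q"), ("q", "a", "q")})
```

In this example the pair `("q", "p")` is excluded by condition (i), because `q` is final and `p` is not, while `("p", "q")` survives the refinement.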
Note that for universality checking, we can stop the subset construction immediately and conclude that A is not universal whenever a rejecting macro-state is encountered. An example of a run of this algorithm is given in Fig. 1. The automaton A used in Fig. 1 is universal because all reachable macro-states are accepting.

In this section, we propose a more efficient approach to universality checking. In a similar manner to the classical algorithm, we run the subset construction procedure and check if any rejecting macro-state is reachable. However, our algorithm augments the subset construction with two optimizations, henceforth referred to as Optimization 1 and Optimization 2, respectively.

Optimization 1 is based on the fact that if the algorithm encounters a macro-state R whose language is a superset of the language of a visited macro-state P, then there is no need to continue the search from R. The intuition behind this is that if a word is not accepted from R, then it is also not accepted from P. For instance, in Fig. 1(b), the search need not continue from the macro-state $\{s_2, s_3\}$ since its language is a superset of the language of the initial macro-state $\{s_1, s_2\}$. However, in general it is difficult to

check whether $L(A)(P) \subseteq L(A)(R)$ before the resulting DFA is completely built. Therefore, we suggest using an easy-to-compute alternative based on the following lemma.

Lemma 2. Let P, R be two macro-states, A be an NFA, and $\preceq$ be a relation in $A^{\subseteq}$. Then, $P \preceq^{\forall\exists} R$ implies $L(A)(P) \subseteq L(A)(R)$.

Note that in Lemma 2, $\preceq$ can be any relation on the states of A that implies language inclusion. This includes any simulation relation (Lemma 1). When $\preceq$ is the maximal simulation or the identity relation, it can be efficiently obtained from A before the subset construction algorithm is triggered and used to prune out unnecessary search paths.

An example of how the described optimization can help is given in Fig. 1(b). If $\preceq$ is the identity, the universality checking algorithm will not continue the search from the macro-state $\{s_1, s_2, s_4\}$ because it is a superset of the initial macro-state. In fact, the antichain-based approach [11] can be viewed as a special case of our approach when $\preceq$ is the identity. Notice that, in this case, only 7 macro-states are generated (the classical algorithm generates 13 macro-states). When $\preceq$ is the maximal simulation, we do not need to continue from the macro-state $\{s_2, s_3\}$ either because $s_1 \preceq s_3$ and hence $\{s_1, s_2\} \preceq^{\forall\exists} \{s_2, s_3\}$. In this case, only 3 macro-states are generated. As we can see from the example, a better reduction of the number of generated states can be achieved when a weaker relation (e.g., the maximal simulation) is used.

[Fig. 1. Universality Checking Algorithms. (a) Source NFA A. (b) A run of the algorithms. The areas labeled Optimization 1, Antichain, and Classical are the macro-states generated by our approach with the maximal simulation and Optimization 1, the antichain-based approach, and the classical approach, respectively. (c) Optimization 1 and 2.]

Optimization 2 is based on the observation that $L(A)(P) = L(A)(P \setminus \{p_1\})$ if there is some $p_2 \in P$ with $p_1 \preceq p_2$.
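The lifted ordering used by Lemma 2 and Optimization 1 is cheap to check. The following is a minimal sketch, assuming that a relation is stored as a set of pairs `(p, r)` meaning $p \preceq r$. The function name `leq_forall_exists` is a hypothetical choice. The state names follow the discussion of Fig. 1, but the relation `sim` below contains only the identity pairs plus the single pair $s_1 \preceq s_3$ quoted in the text, not the full maximal simulation of the figure.

```python
def leq_forall_exists(P, R, rel):
    """Check the lifted ordering: every p in P is below some r in R."""
    return all(any((p, r) in rel for r in R) for p in P)


states = {"s1", "s2", "s3", "s4"}

# Identity relation: the lifted ordering degenerates to set inclusion,
# which is the antichain-based special case.
identity = {(s, s) for s in states}

# Assumed fragment of a simulation: identity plus s1 <= s3 from the text.
sim = identity | {("s1", "s3")}

initial = {"s1", "s2"}

# With the identity, {s1,s2,s4} is pruned because it contains {s1,s2}.
pruned_by_antichain = leq_forall_exists(initial, {"s1", "s2", "s4"}, identity)

# With the identity, {s2,s3} is not pruned; with s1 <= s3 it is.
pruned_without_sim = leq_forall_exists(initial, {"s2", "s3"}, identity)
pruned_with_sim = leq_forall_exists(initial, {"s2", "s3"}, sim)
```

The design point is that a weaker (larger) relation can only make more macro-states comparable, so it can only prune more.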
This fact is a simple consequence of Lemma 2 (note that $P \preceq^{\forall\exists} P \setminus \{p_1\}$). Since the two macro-states P and $P \setminus \{p_1\}$ have the same language, if a word is not accepted from P, it is not accepted from $P \setminus \{p_1\}$ either. On the other hand, if all words in $\Sigma^*$ can be accepted from P, then they can also be accepted from $P \setminus \{p_1\}$. Therefore, it is safe to replace the macro-state P with $P \setminus \{p_1\}$.

Consider the example in Fig. 1. If $\preceq$ is the maximal simulation relation, we can remove the state $s_2$ from the initial macro-state $\{s_1, s_2\}$ without changing its language,

because $s_2 \preceq s_1$. This change will propagate to all the searching paths. With this optimization, our approach will only generate 3 macro-states, all of which are singletons. The result after the two optimizations are applied is shown in Fig. 1(c).

Algorithm 1 describes our approach in pseudocode. In this algorithm, the function Minimize(R) implements Optimization 2. The function does the following: it chooses a new state $r_1$ from R, removes $r_1$ from R if there exists a state $r_2$ in R such that $r_1 \preceq r_2$, and then repeats the procedure until all of the states in R are processed. Lines 8-10 of the algorithm implement Optimization 1. Overall, the algorithm works as follows. As long as the set Next of macro-states waiting to be processed is non-empty (and no rejecting macro-state has been found), the algorithm chooses one macro-state from Next and moves it to the Processed set. Moreover, it generates all successors of the chosen macro-state, minimizes them, and adds them to Next unless there is already some $\preceq^{\forall\exists}$-smaller macro-state in Next or in Processed. If a new macro-state is added to Next, the algorithm at the same time removes all $\preceq^{\forall\exists}$-bigger macro-states from both Next and Processed. Note that the pruning of the Next and Processed sets together with checking whether a new macro-state should be added into Next can be done within a single iteration through Next and Processed. We discuss correctness of the algorithm in the next section.

Algorithm 1: Universality Checking
    Input: An NFA $A = (\Sigma, Q, I, F, \delta)$ and a relation $\preceq \in A^{\subseteq}$.
    Output: TRUE if A is universal. Otherwise, FALSE.
     1  if I is rejecting then return FALSE;
     2  Processed := $\emptyset$;
     3  Next := {Minimize(I)};
     4  while Next $\neq \emptyset$ do
     5      Pick and remove a macro-state R from Next and move it to Processed;
     6      foreach $P \in \{Minimize(R') \mid R' \in Post(R)\}$ do
     7          if P is rejecting then return FALSE;
     8          else if $\neg\exists S \in$ Processed $\cup$ Next s.t. $S \preceq^{\forall\exists} P$ then
     9              Remove all S from Processed $\cup$ Next s.t. $P \preceq^{\forall\exists} S$;
    10              Add P to Next;
    11  return TRUE

4 Correctness of the Optimized Universality Checking

In this section, we prove correctness of Algorithm 1.
Due to the space limitation, we only present an overview. A more detailed proof can be found in Appendix A. Let $A = (\Sigma, Q, I, F, \delta)$ be the input automaton. We first introduce some definitions and notations that will be used in the proof. For a macro-state P, define $Dist(P) \in \mathbb{N} \cup \{\infty\}$ as the length of the shortest word in $\Sigma^*$ that is not in $L(A)(P)$ (if $L(A)(P) = \Sigma^*$, $Dist(P) = \infty$). For a set of macro-states MStates, the function $Dist(MStates) \in \mathbb{N} \cup \{\infty\}$ returns the length of the shortest word in $\Sigma^*$ that is not in the language of some macro-state in MStates. More precisely, if $MStates = \emptyset$, $Dist(MStates) = \infty$, otherwise, $Dist(MStates) = \min_{P \in MStates} Dist(P)$. The predicate $Univ(MStates)$ is true if and only if all the macro-states in MStates are universal, i.e., $\forall P \in MStates : L(A)(P) = \Sigma^*$.
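To make the object of the proof concrete, Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. It assumes that transitions are triples `(p, a, q)`, that the relation `rel` is a reflexive set of pairs `(p, r)` meaning $p \preceq r$ and implying language inclusion, and that the helper names (`leq`, `minimize`, `is_universal`) are hypothetical.

```python
def leq(P, R, rel):
    """Lifted ordering: every p in P is below some r in R."""
    return all(any((p, r) in rel for r in R) for p in P)


def minimize(R, rel):
    """Optimization 2: drop a state that is below another remaining state."""
    kept = set(R)
    for r1 in list(kept):
        if any(r2 != r1 and (r1, r2) in rel for r2 in kept):
            kept.discard(r1)
    return frozenset(kept)


def is_universal(alphabet, initial, final, delta, rel):
    """Sketch of Algorithm 1; rel must be reflexive (e.g. the identity)."""
    final = frozenset(final)

    def rejecting(P):
        return not (P & final)

    if rejecting(frozenset(initial)):
        return False
    processed, todo = set(), {minimize(initial, rel)}
    while todo:
        R = todo.pop()
        processed.add(R)
        for a in alphabet:
            succ = {q for (p, b, q) in delta if p in R and b == a}
            P = minimize(succ, rel)
            if rejecting(P):
                return False
            # Optimization 1: skip P if a smaller macro-state is known,
            # otherwise discard all bigger macro-states and enqueue P.
            if not any(leq(S, P, rel) for S in processed | todo):
                processed = {S for S in processed if not leq(P, S, rel)}
                todo = {S for S in todo if not leq(P, S, rel)}
                todo.add(P)
    return True


# Hypothetical example: s1 loops on a and b, s2 loops on a only,
# both are initial and final, and s2 is simulated by s1.
delta = {("s1", "a", "s1"), ("s1", "b", "s1"), ("s2", "a", "s2")}
rel = {("s1", "s1"), ("s2", "s2"), ("s2", "s1")}
universal = is_universal({"a", "b"}, {"s1", "s2"}, {"s1", "s2"}, delta, rel)
```

Passing the identity relation as `rel` turns the sketch into the antichain-based special case mentioned in Section 3.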

Lemma 3 describes the invariants used to prove the partial correctness of Alg. 1.

Lemma 3. The two loop invariants below hold in Algorithm 1:
1. $\neg Univ(Processed \cup Next) \implies \neg Univ(\{I\})$.
2. $\neg Univ(\{I\}) \implies Dist(Processed) > Dist(Next)$.

Due to the finite number of macro-states, we can show that Algorithm 1 eventually terminates. Algorithm 1 returns FALSE only if either the set of initial states is rejecting, or the minimized version of some successor $R'$ of a macro-state R chosen from Next on line 5 is found to be rejecting. In the latter case, due to Lemma 2, $R'$ is also rejecting. Then, R is non-universal, and hence $Univ(Processed \cup Next)$ is false. By Lemma 3 (Invariant 1), A is not universal. The algorithm returns TRUE only when Next becomes empty. When Next is empty, $Dist(Processed) > Dist(Next)$ is not true. Therefore, by Lemma 3 (Invariant 2), A is universal. This gives the following theorem.

Theorem 1. Algorithm 1 always terminates and returns TRUE iff the input automaton A is universal.

5 The Language Inclusion Problem

The technique described in Section 3 can be generalized to solve the language inclusion problem. Let A and B be two NFAs. The language inclusion problem for A and B is to decide whether $L(A) \subseteq L(B)$. This problem is also PSPACE-complete. The classical algorithm for solving this problem builds on-the-fly the product automaton $A \times \overline{B}$ of A and the complement of B and searches for an accepting state. A state in the product automaton $A \times \overline{B}$ is a pair $(p, P)$ where p is a state in A and P is a macro-state in B. For convenience, we call such a pair $(p, P)$ a product-state. A product-state is accepting iff p is an accepting state in A and P is a rejecting macro-state in B. We use $L(A, B)(p, P)$ to denote the language of the product-state $(p, P)$ in $A \times \overline{B}$. The language of A is not contained in the language of B iff there exists some accepting product-state $(p, P)$ reachable from some initial product-state. Indeed, $L(A, B)(p, P) = L(A)(p) \setminus L(B)(P)$, and the language of $A \times \overline{B}$ consists of words which can be used as witnesses of the fact that $L(A) \subseteq L(B)$ does not hold.
In a similar manner to universality checking, the algorithm can stop the search immediately and conclude that the language inclusion does not hold whenever an accepting product-state is encountered. An example of a run of the classical algorithm is given in Fig. 2. We find that $L(A) \subseteq L(B)$ is true and the algorithm generates 13 product-states (Fig. 2(c), the area labeled "Classical").

Optimization 1 that we use for universality checking can be generalized for language inclusion checking as follows. Let $A = (\Sigma, Q_A, I_A, F_A, \delta_A)$ and $B = (\Sigma, Q_B, I_B, F_B, \delta_B)$ be two NFAs such that $Q_A \cap Q_B = \emptyset$. We denote by $A \cup B$ the NFA $(\Sigma, Q_A \cup Q_B, I_A \cup I_B, F_A \cup F_B, \delta_A \cup \delta_B)$. Let $\preceq$ be a relation in $(A \cup B)^{\subseteq}$. During the process of constructing the product automaton and searching for an accepting product-state, we can stop the search from a product-state $(p, P)$ if (a) there exists some visited product-state $(r, R)$ such that $p \preceq r$ and $R \preceq^{\forall\exists} P$, or (b) $\exists p' \in P : p \preceq p'$. Optimization 1(a) is justified by Lemma 4, which is very similar to Lemma 2 for universality checking.

Lemma 4. Let A, B be two NFAs, $(p, P)$, $(r, R)$ be two product-states, where p, r are states in A and P, R are macro-states in B, and $\preceq$ be a relation in $(A \cup B)^{\subseteq}$. Then, $p \preceq r$ and $R \preceq^{\forall\exists} P$ implies $L(A, B)(p, P) \subseteq L(A, B)(r, R)$.

[Fig. 2. Language Inclusion Checking Algorithms. (a) NFA A. (b) NFA B. (c) A run of the algorithms while checking $L(A) \subseteq L(B)$, with areas labeled Classical, Antichain, Optimization 1(a), and Optimization 1(b).]

By the above lemma, if a word takes the product-state $(p, P)$ to an accepting product-state, it will also take $(r, R)$ to an accepting product-state. Therefore, we do not need to continue the search from $(p, P)$.

Let us use Fig. 2(c) to illustrate Optimization 1(a). As we mentioned, the antichain-based approach can be viewed as a special case of our approach when $\preceq$ is the identity. When $\preceq$ is the identity, we do not need to continue the search from the product-state $(p_2, \{q_1, q_2\})$ because $\{q_2\} \subseteq \{q_1, q_2\}$. In this case, the algorithm generates 8 product-states (Fig. 2(c), the area labeled "Antichain"). In the case that $\preceq$ is the maximal simulation, we do not need to continue the search from product-states $(p_1, \{q_2\})$, $(p_1, \{q_1, q_2\})$, and $(p_2, \{q_1, q_2\})$ because $q_1 \preceq q_2$ and the algorithm already visited the product-states $(p_1, \{q_1\})$ and $(p_2, \{q_2\})$. Hence, the algorithm generates only 6 product-states (Fig. 2(c), the area labeled "Optimization 1(a)").

If the condition of Optimization 1(b) holds, we have that the language of p (w.r.t. A) is a subset of the language of P (w.r.t. B). In this case, any word that takes p to an accepting state in A also takes P to an accepting macro-state in B. Hence, we do not need to continue the search from the product-state $(p, P)$ because all of its successor states are rejecting product-states. Consider again the example in Fig. 2(c). With Optimization 1(b), if $\preceq$ is the maximal simulation on the states of $A \cup B$, we do not need to continue the search from the first product-state $(p_1, \{q_1\})$ because $p_1 \preceq q_1$. In this case, the algorithm can conclude that the language inclusion holds immediately after the first product-state is generated (Fig. 2(c), the area labeled "Optimization 1(b)").
Observe that from Lemma 4, it holds that for any product-state $(p, P)$ such that $p_1 \preceq p_2$ for some $p_1, p_2 \in P$, $L(A, B)(p, P) = L(A, B)(p, P \setminus \{p_1\})$ (as $P \preceq^{\forall\exists} P \setminus \{p_1\}$). Optimization 2 that we used for universality checking can therefore be generalized for language inclusion checking too.

We give the pseudocode of our optimized inclusion checking in Algorithm 2, which is a straightforward extension of Algorithm 1. In the algorithm, the definition of the Minimize(R) function is the same as what we have defined in Algorithm 1. The function Initialize(PStates) applies Optimization 1 on the set of product-states PStates to avoid unnecessary searching. More precisely, it returns a maximal subset of PStates such that (1) for any two elements $(p, P)$, $(q, Q)$ in the subset, $p \not\preceq q \lor Q \not\preceq^{\forall\exists} P$, and (2) for any element $(p, P)$ in the subset, $\forall p' \in P : p \not\preceq p'$. We define the post-image of a product-state $Post((p, P)) := \{(p', P') \mid \exists a \in \Sigma : (p, a, p') \in \delta,\ P' = \{q' \mid \exists q \in P : (q, a, q') \in \delta\}\}$, where $\delta = \delta_A \cup \delta_B$.
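The optimized inclusion check described above can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: each NFA is passed as a hypothetical triple `(initial, final, delta)` with disjoint state sets, transitions are triples `(p, a, q)`, and `rel` is a reflexive set of pairs over the states of $A \cup B$ that implies language inclusion. The helper `try_add` (an invented name) plays the role of both Initialize and the pruning steps, since it applies Optimizations 1(a) and 1(b) to every candidate product-state.

```python
def leq(P, R, rel):
    """Lifted ordering: every p in P is below some r in R."""
    return all(any((p, r) in rel for r in R) for p in P)


def minimize(R, rel):
    """Optimization 2: drop a state that is below another remaining state."""
    kept = set(R)
    for r1 in list(kept):
        if any(r2 != r1 and (r1, r2) in rel for r2 in kept):
            kept.discard(r1)
    return frozenset(kept)


def step(states, a, delta):
    """a-successors of a set of states."""
    return {q for (p, b, q) in delta if p in states and b == a}


def is_included(alphabet, nfa_a, nfa_b, rel):
    """Sketch of the optimized check of L(A) being a subset of L(B)."""
    (init_a, final_a, delta_a), (init_b, final_b, delta_b) = nfa_a, nfa_b
    final_b = frozenset(final_b)
    processed, todo = set(), set()

    def accepting(p, P):
        return p in final_a and not (P & final_b)

    def try_add(p, P):
        nonlocal processed, todo
        if any((p, q) in rel for q in P):            # Optimization 1(b)
            return
        if any((p, s) in rel and leq(S, P, rel)      # Optimization 1(a)
               for (s, S) in processed | todo):
            return

        def keep(item):
            s, S = item
            return not ((s, p) in rel and leq(P, S, rel))

        processed = {x for x in processed if keep(x)}
        todo = {x for x in todo if keep(x)}
        todo.add((p, P))

    if any(accepting(i, frozenset(init_b)) for i in init_a):
        return False
    start = minimize(init_b, rel)
    for i in init_a:
        try_add(i, start)
    while todo:
        r, R = todo.pop()
        processed.add((r, R))
        for a in alphabet:
            R2 = minimize(step(R, a, delta_b), rel)
            for r2 in step({r}, a, delta_a):
                if accepting(r2, R2):
                    return False
                try_add(r2, R2)
    return True


# Hypothetical example: both automata accept a*.
A = ({"p0"}, {"p0"}, {("p0", "a", "p0")})
B = ({"q0"}, {"q0", "q1"}, {("q0", "a", "q1"), ("q1", "a", "q0")})
identity = {(s, s) for s in ("p0", "q0", "q1")}
holds = is_included({"a"}, A, B, identity)
```

If `rel` additionally contains the pair `("p0", "q0")`, which is a valid simulation pair in this example, Optimization 1(b) discards the only initial product-state and the search ends without exploring any successor.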

Algorithm 2: Language Inclusion Checking
    Input: NFA $A = (\Sigma, Q_A, I_A, F_A, \delta_A)$, $B = (\Sigma, Q_B, I_B, F_B, \delta_B)$. A relation $\preceq \in (A \cup B)^{\subseteq}$.
    Output: TRUE if $L(A) \subseteq L(B)$. Otherwise, FALSE.
     1  if there is an accepting product-state in $\{(i, I_B) \mid i \in I_A\}$ then return FALSE;
     2  Processed := $\emptyset$;
     3  Next := Initialize($\{(i, Minimize(I_B)) \mid i \in I_A\}$);
     4  while Next $\neq \emptyset$ do
     5      Pick and remove a product-state $(r, R)$ from Next and move it to Processed;
     6      foreach $(p, P) \in \{(r', Minimize(R')) \mid (r', R') \in Post((r, R))\}$ do
     7          if $(p, P)$ is an accepting product-state then return FALSE;
     8          else if $\neg\exists p' \in P$ s.t. $p \preceq p'$ then
     9              if $\neg\exists (s, S) \in$ Processed $\cup$ Next s.t. $p \preceq s \land S \preceq^{\forall\exists} P$ then
    10                  Remove all $(s, S)$ from Processed $\cup$ Next s.t. $s \preceq p \land P \preceq^{\forall\exists} S$;
    11                  Add $(p, P)$ to Next;
    12  return TRUE

Correctness: Define $Dist(P) \in \mathbb{N} \cup \{\infty\}$ as the length of the shortest word in the language of the product-state P or $\infty$ if the language of P is empty. The value $Dist(PStates) \in \mathbb{N} \cup \{\infty\}$ is the length of the shortest word in the language of some product-state in PStates or $\infty$ if PStates is empty. The predicate $Incl(PStates)$ is true iff for all product-states $(p, P)$ in PStates, $L(A)(p) \subseteq L(B)(P)$. The correctness of Algorithm 2 can now be proved in a very similar way to Algorithm 1, using the invariants below:
1. $\neg Incl(Processed \cup Next) \implies \neg Incl(\{(i, I_B) \mid i \in I_A\})$.
2. $\neg Incl(\{(i, I_B) \mid i \in I_A\}) \implies Dist(Processed) > Dist(Next)$.

6 Tree Automata Preliminaries

To be able to present a generalization of the above methods for the domain of tree automata, we now introduce some needed preliminaries on tree automata.

A ranked alphabet $\Sigma$ is a set of symbols together with a ranking function $\# : \Sigma \to \mathbb{N}$. For $a \in \Sigma$, the value $\#(a)$ is called the rank of a. For any $n \ge 0$, we denote by $\Sigma_n$ the set of all symbols of rank n from $\Sigma$. Let $\varepsilon$ denote the empty sequence. A tree t over a ranked alphabet $\Sigma$ is a partial mapping $t : \mathbb{N}^* \to \Sigma$ that satisfies the following conditions: (1) $dom(t)$ is a finite, prefix-closed subset of $\mathbb{N}^*$ and (2) for each $v \in dom(t)$, if $\#(t(v)) = n \ge 0$, then $\{i \mid vi \in dom(t)\} = \{1, \ldots, n\}$. Each sequence $v \in dom(t)$ is called a node of t.
For a node v, we define the $i$-th child of v to be the node $vi$, and the $i$-th subtree of v to be the tree $t'$ such that $t'(v') = t(viv')$ for all $v' \in \mathbb{N}^*$. A leaf of t is a node v which does not have any children, i.e., there is no $i \in \mathbb{N}$ with $vi \in dom(t)$. We denote by $T(\Sigma)$ the set of all trees over the alphabet $\Sigma$.

A (finite, non-deterministic, bottom-up) tree automaton (abbreviated as TA in the sequel) is a quadruple $A = (Q, \Sigma, \Delta, F)$ where Q is a finite set of states, $F \subseteq Q$ is a set of final states, $\Sigma$ is a ranked alphabet, and $\Delta$ is a set of transition rules. Each transition rule is a triple of the form $((q_1, \ldots, q_n), a, q)$ where $q_1, \ldots, q_n, q \in Q$, $a \in \Sigma$, and $\#(a) = n$. We use $(q_1, \ldots, q_n) \xrightarrow{a} q$ to denote that $((q_1, \ldots, q_n), a, q) \in \Delta$. In the special case where $n = 0$, we speak about the so-called leaf rules, which we sometimes abbreviate as $\xrightarrow{a} q$.

Let $A = (Q, \Sigma, \Delta, F)$ be a TA. A run of A over a tree $t \in T(\Sigma)$ is a mapping $\pi : dom(t) \to Q$ such that, for each node $v \in dom(t)$ of arity $\#(t(v)) = n$ where $q = \pi(v)$, if $q_i = \pi(vi)$ for $1 \le i \le n$, then $\Delta$ has a rule $(q_1, \ldots, q_n) \xrightarrow{t(v)} q$. We write $t \stackrel{\pi}{\Longrightarrow} q$ to denote that $\pi$ is a run of A over t such that $\pi(\varepsilon) = q$. We use $t \Longrightarrow q$ to denote that $t \stackrel{\pi}{\Longrightarrow} q$ for some run $\pi$. The language accepted by a state q is defined by $L(A)(q) = \{t \mid t \Longrightarrow q\}$, while the language of A is defined by $L(A) = \bigcup_{q \in F} L(A)(q)$.

7 Universality and Language Inclusion of Tree Automata

To optimize universality and inclusion checking on word automata, we used relations that imply language inclusion. For the case of universality and inclusion checking on tree automata, we now propose to use relations that imply inclusion of languages of the so-called open trees (i.e., leafless trees or equivalently trees whose leaves are replaced by a special symbol denoting a "hole") that are accepted from tuples of tree automata states. We formally define the notion below. Notice that in contrast to the notion of a language accepted from a state of a word automaton, which refers to possible "futures" of the state, the notion of a language accepted at a state of a TA refers to possible "pasts" of the state. Our notion of languages of open trees accepted from tuples of tree automata states speaks again about the future of states, which turns out useful when trying to optimize the (antichain-based) subset construction for TA.

Consider a special symbol $\Box \notin \Sigma$ with rank 0, called a hole. An open tree over $\Sigma$ is a tree over $\Sigma \cup \{\Box\}$ such that all its leaves are labeled¹ by $\Box$. We use $T^{\Box}(\Sigma)$ to denote the set of all open trees over $\Sigma$. Given states $q_1, \ldots, q_n \in Q$ and an open tree t with leaves $v_1, \ldots, v_n$, a run $\pi$ of A on t from $(q_1, \ldots, q_n)$ is defined in a similar way as the run on a tree except that for each leaf $v_i$, $1 \le i \le n$, we have $\pi(v_i) = q_i$. We use $t(q_1, \ldots, q_n) \stackrel{\pi}{\Longrightarrow} q$ to denote that $\pi$ is a run of A on t from $(q_1, \ldots, q_n)$ such that $\pi(\varepsilon) = q$. The notation $t(q_1, \ldots, q_n) \Longrightarrow q$ is explained in a similar manner to runs on trees.
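The notion of a run on an ordinary (closed) tree can be illustrated by a short bottom-up evaluation. The sketch below assumes a hypothetical encoding that is not part of the paper: a tree is a nested tuple `(symbol, child_1, ..., child_n)`, and a transition rule $(q_1, \ldots, q_n) \xrightarrow{a} q$ is a triple `((q1, ..., qn), a, q)`, so a leaf rule has an empty left-hand side. The function `states_of` computes all q with $t \Longrightarrow q$, and `accepts` checks membership in $L(A)$.

```python
def states_of(tree, rules):
    """All states q such that the tree evaluates to q in some run."""
    symbol, *children = tree
    # States reachable at each child, computed bottom-up.
    child_states = [states_of(child, rules) for child in children]
    return {
        q
        for (lhs, a, q) in rules
        if a == symbol
        and len(lhs) == len(children)
        and all(s in reach for s, reach in zip(lhs, child_states))
    }


def accepts(tree, rules, final):
    """The tree is in L(A) iff some run ends in a final state at the root."""
    return bool(states_of(tree, rules) & set(final))


# Hypothetical TA over a (rank 0) and f (rank 2) accepting every tree
# built from these two symbols.
rules = [((), "a", "q"), (("q", "q"), "f", "q")]
final = {"q"}

leaf = ("a",)
small = ("f", ("a",), ("f", ("a",), ("a",)))
unknown = ("f", ("a",), ("b",))   # b has no rule, so no run exists
```

The recursion visits every node once per call and collects all states reachable at that node, so the result does not depend on any choice made during a particular run.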
Then, the language of A accepted from a tuple $(q_1, \ldots, q_n)$ of states is $L^{\Box}(A)(q_1, \ldots, q_n) = \{t \in T^{\Box}(\Sigma) \mid t(q_1, \ldots, q_n) \Longrightarrow q \text{ for some } q \in F\}$. Finally, we define the language accepted from a tuple of macro-states $(P_1, \ldots, P_n) \subseteq Q^n$ as the set $L^{\Box}(A)(P_1, \ldots, P_n) = \bigcup \{L^{\Box}(A)(q_1, \ldots, q_n) \mid (q_1, \ldots, q_n) \in P_1 \times \ldots \times P_n\}$. We define $Post_a(q_1, \ldots, q_n) := \{q \mid (q_1, \ldots, q_n) \xrightarrow{a} q\}$. For a tuple of macro-states, we let $Post_a(P_1, \ldots, P_n) := \bigcup \{Post_a(q_1, \ldots, q_n) \mid (q_1, \ldots, q_n) \in P_1 \times \ldots \times P_n\}$.

Let us use $t^{\Box}$ to denote the open tree that arises from a tree $t \in T(\Sigma)$ by replacing all the leaf symbols of t by $\Box$, and let, for every leaf symbol $a \in \Sigma$, $I_a = \{q \mid \xrightarrow{a} q\}$ be the so-called $a$-initial macro-state. Languages accepted at final states of A correspond to the languages accepted from tuples of initial macro-states of A as stated in Lemma 5.

Lemma 5. Let t be a tree over $\Sigma$ with leaves labeled by $a_1, \ldots, a_n$. Then $t \in L(A)$ if and only if $t^{\Box} \in L^{\Box}(A)(I_{a_1}, \ldots, I_{a_n})$.

¹ Note that no internal nodes of an open tree can be labeled by $\Box$ as $\#(\Box) = 0$.

7.1 Upward Simulation

We now work towards defining suitable relations on states of TA allowing us to optimize the universality and inclusion checking. We extend relations $\preceq \in Q \times Q$ on states to tuples of states such that $(q_1, \ldots, q_n) \preceq (r_1, \ldots, r_n)$ iff $q_i \preceq r_i$ for each $1 \le i \le n$. We define

the set $A^{\subseteq}$ of relations that imply inclusion of languages of tuples of states such that $\preceq \in A^{\subseteq}$ iff $(q_1, \ldots, q_n) \preceq (r_1, \ldots, r_n)$ implies $L^{\Box}(A)(q_1, \ldots, q_n) \subseteq L^{\Box}(A)(r_1, \ldots, r_n)$. We define an extension of simulation relations on states of word automata that satisfies the above property as follows. An upward simulation on A is a relation $\preceq \subseteq Q \times Q$ such that if $q \preceq r$, then (1) $q \in F \implies r \in F$ and (2) if $(q_1, \ldots, q_n) \xrightarrow{a} q'$ where $q = q_i$, then $(q_1, \ldots, q_{i-1}, r, q_{i+1}, \ldots, q_n) \xrightarrow{a} r'$ where $q' \preceq r'$. Upward simulations were discussed in [1], together with an efficient algorithm for computing them.²

Lemma 6. For the maximal upward simulation $\preceq$ on A, we have $\preceq \in A^{\subseteq}$.

The proof of this lemma can be obtained as follows. We first show that the maximal upward simulation $\preceq$ has the following property: If $(q_1, \ldots, q_n) \xrightarrow{a} q'$ in A, then for every $(r_1, \ldots, r_n)$ with $(q_1, \ldots, q_n) \preceq (r_1, \ldots, r_n)$, there is $r' \in Q$ such that $q' \preceq r'$ and $(r_1, \ldots, r_n) \xrightarrow{a} r'$. From $(q_1, \ldots, q_n) \xrightarrow{a} q'$ and $q_1 \preceq r_1$, we have that there is some rule $(r_1, q_2, \ldots, q_n) \xrightarrow{a} s_1$ such that $q' \preceq s_1$. From the existence of $(r_1, q_2, \ldots, q_n) \xrightarrow{a} s_1$ and from $q_2 \preceq r_2$, we then get that there is some rule $(r_1, r_2, q_3, \ldots, q_n) \xrightarrow{a} s_2$ such that $s_1 \preceq s_2$, etc. Since the maximal upward simulation is transitive [1], we obtain the property mentioned above. This in turn implies Lemma 6.

7.2 Tree Automata Universality Checking

We now show how upward simulations can be used for optimized universality checking on tree automata. Let $A = (\Sigma, Q, F, \Delta)$ be a tree automaton. We define $T^{\Box}_n(\Sigma)$ as the set of all open trees over $\Sigma$ with n leaves. We say that an n-tuple $(q_1, \ldots, q_n)$ of states of A is universal if $L^{\Box}(A)(q_1, \ldots, q_n) = T^{\Box}_n(\Sigma)$, that is, all open trees with n leaves constructible over $\Sigma$ can be accepted from $(q_1, \ldots, q_n)$. A set of macro-states MStates is universal if all tuples in $MStates^*$ are universal. From Lemma 5, we can deduce that A is universal (i.e., $L(A) = T(\Sigma)$) if and only if $\{I_a \mid a \in \Sigma_0\}$ is universal.

The following lemma allows us to design a new TA universality checking algorithm in a similar manner to Algorithm 1 using Optimizations 1 and 2 from Section 3.

Lemma 7.
For any $\preceq \in A^{\subseteq}$ and two tuples of macro-states of $A$, we have that $(R_1,\dots,R_n) \preceq^{\forall\exists} (P_1,\dots,P_n)$ implies $L^{\Box}(A)(R_1,\dots,R_n) \subseteq L^{\Box}(A)(P_1,\dots,P_n)$.

Algorithm 3 describes our approach to checking universality of tree automata in pseudocode. It closely resembles Algorithm 1. There are two main differences: (1) The initial value of the Next set is the result of applying the function Initialize to the set $\{Minimize(I_a) \mid a \in \Sigma_0\}$. Initialize returns the set of all macro-states in $\{Minimize(I_a) \mid a \in \Sigma_0\}$ which are minimal w.r.t. $\preceq^{\forall\exists}$ (i.e., those macro-states with the best chance of finding a counterexample to universality). (2) The computation of the Post-image of a set of macro-states is a bit more complicated. More precisely, for each symbol $a \in \Sigma_n$, $n \in \mathbb{N}$, we have to compute the $Post_a$-image of each $n$-tuple of macro-states from the set. We design the algorithm such that we avoid computing the $Post_a$-image of a tuple more than once.

We define the Post-image $Post(MStates)(R)$ of a set of macro-states $MStates$ w.r.t. a macro-state $R \in MStates$. It is the set of all macro-states $P = Post_a(P_1,\dots,P_n)$ where $a \in \Sigma_n$, $n \in \mathbb{N}$, and $R$ occurs at least once in the tuple $(P_1,\dots,P_n) \in MStates^n$. Formally, $Post(MStates)(R) = \bigcup_{a \in \Sigma} \{Post_a(P_1,\dots,P_n) \mid n = \#(a),\ P_1,\dots,P_n \in MStates,\ R \in \{P_1,\dots,P_n\}\}$.

Footnote 2: In [1], upward simulations are parameterized by some downward simulation. However, upward simulations parameterized by a downward simulation greater than the identity cannot be used in our framework since they do not generally imply inclusion of languages of tuples of states.

Algorithm 3: Tree Automata Universality Checking
Input: A tree automaton $A = (\Sigma, Q, F, \Delta)$ and a relation $\preceq \in A^{\subseteq}$.
Output: TRUE if $A$ is universal. Otherwise, FALSE.
 1  if $\exists a \in \Sigma_0$ such that $I_a$ is rejecting then return FALSE;
 2  Processed := $\emptyset$;
 3  Next := Initialize($\{Minimize(I_a) \mid a \in \Sigma_0\}$);
 4  while Next $\neq \emptyset$ do
 5      Pick and remove a macro-state $R$ from Next and move it to Processed;
 6      foreach $P \in \{Minimize(R') \mid R' \in Post(Processed)(R)\}$ do
 7          if $P$ is a rejecting macro-state then return FALSE;
 8          else if $\neg\exists Q \in Processed \cup Next$ s.t. $Q \preceq^{\forall\exists} P$ then
 9              Remove all $Q$ from $Processed \cup Next$ s.t. $P \preceq^{\forall\exists} Q$;
10              Add $P$ to Next;
11  return TRUE

The following theorem states the correctness of Algorithm 3, which can be proved using similar invariants as in the case of Algorithm 1 when the notion of distance from an accepting state is suitably defined (see Appendix B for more details).

Theorem 2. Algorithm 3 always terminates and returns TRUE if and only if the input tree automaton $A$ is universal.

7.3 Tree Automata Language Inclusion Checking

We are interested in testing language inclusion of two tree automata $A = (\Sigma, Q_A, F_A, \Delta_A)$ and $B = (\Sigma, Q_B, F_B, \Delta_B)$. From Lemma 5, we have that $L(A) \subseteq L(B)$ iff for every tuple $a_1,\dots,a_n$ of symbols from $\Sigma_0$, $L^{\Box}(A)(I^A_{a_1},\dots,I^A_{a_n}) \subseteq L^{\Box}(B)(I^B_{a_1},\dots,I^B_{a_n})$. In other words, for any $a_1,\dots,a_n \in \Sigma_0$, every open tree that can be accepted from a tuple of states from $I^A_{a_1} \times \dots \times I^A_{a_n}$ can also be accepted from a tuple of states from $I^B_{a_1} \times \dots \times I^B_{a_n}$. This justifies a similar use of the notion of product-states as in Section 5. We define the language of a tuple of product-states as $L^{\Box}(A,B)((q_1,P_1),\dots,(q_n,P_n)) := L^{\Box}(A)(q_1,\dots,q_n) \setminus L^{\Box}(B)(P_1,\dots,P_n)$. Observe that $L(A) \subseteq L(B)$ iff the language of every $n$-tuple (for any $n \in \mathbb{N}$) of product-states from the set $\{(i, I^B_a) \mid a \in \Sigma_0,\ i \in I^A_a\}$ is empty.
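The universality check of Algorithm 3 (Section 7.2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the encoding of rules as a dictionary from `(symbol, tuple_of_states)` to a set of target states, the names `post`, `leq`, `minimize`, `insert` and `is_universal`, and the representation of $\preceq$ as a set of pairs `sim` are all hypothetical choices made for this sketch. The relation passed as `sim` is assumed to belong to $A^{\subseteq}$; the empty default corresponds to the identity.

```python
from itertools import product

def post(symbol, macros, rules):
    """Post_a(P_1,...,P_n): targets of a-rules with left-hand side in P_1 x ... x P_n."""
    out = set()
    for lhs in product(*macros):  # for n = 0 this yields the single empty tuple
        out |= rules.get((symbol, lhs), set())
    return frozenset(out)

def leq(P, R, sim):
    """Lifted relation: every state of P is simulated by some state of R."""
    return all(any(p == r or (p, r) in sim for r in R) for p in P)

def minimize(macro, sim):
    """Drop each state that is simulated by another state that is still kept."""
    kept = set(macro)
    for q in sorted(macro):
        if any(r != q and r in kept and (q, r) in sim for r in macro):
            kept.discard(q)
    return frozenset(kept)

def insert(P, processed, nxt, sim):
    """Lines 8-10: add P unless a smaller macro-state is already stored."""
    if any(leq(Q, P, sim) for Q in processed | nxt):
        return
    for store in (processed, nxt):
        store.difference_update({Q for Q in store if leq(P, Q, sim)})
    nxt.add(P)

def is_universal(arity, rules, final, sim=frozenset()):
    final = frozenset(final)
    init = [post(a, (), rules) for a, n in arity.items() if n == 0]
    if any(not (I & final) for I in init):          # line 1
        return False
    processed, nxt = set(), set()
    for I in init:                                  # plays the role of Initialize
        insert(minimize(I, sim), processed, nxt, sim)
    while nxt:
        R = nxt.pop()
        processed.add(R)
        # Post(Processed)(R): only tuples over Processed that contain R
        succs = [minimize(post(a, tup, rules), sim)
                 for a, n in arity.items() if n > 0
                 for tup in product(list(processed), repeat=n) if R in tup]
        for P in succs:
            if not (P & final):                     # rejecting macro-state
                return False
            insert(P, processed, nxt, sim)
    return True

# Hypothetical example: one leaf symbol "a" and one binary symbol "f".
arity = {"a": 0, "f": 2}
full = {("a", ()): {"q"}, ("f", ("q", "q")): {"q"}}
partial = {("a", ()): {"q0"}, ("f", ("q0", "q0")): {"q1"}}
print(is_universal(arity, full, {"q"}))
print(is_universal(arity, partial, {"q0", "q1"}))
```

Restricting the successor computation to tuples that contain the macro-state just moved to Processed is what keeps each tuple from being expanded more than once: a tuple is considered exactly when its last missing component enters Processed.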
Our algorithm for testing language inclusion of tree automata will check whether it is possible to reach a product-state of the form $(q,P)$ with $q \in F_A$ and $P \cap F_B = \emptyset$ (that we call accepting) from a tuple of product-states from $\{(i, I^B_a) \mid a \in \Sigma_0,\ i \in I^A_a\}$. The following lemma allows us to use Optimization 1(a) and Optimization 2 from Section 5.

Lemma 8. Let $\preceq \in (A \cup B)^{\subseteq}$. For any two tuples of states and two tuples of macro-states such that $(p_1,\dots,p_n) \preceq (r_1,\dots,r_n)$ and $(R_1,\dots,R_n) \preceq^{\forall\exists} (P_1,\dots,P_n)$, we have $L^{\Box}(A,B)((p_1,P_1),\dots,(p_n,P_n)) \subseteq L^{\Box}(A,B)((r_1,R_1),\dots,(r_n,R_n))$.

It is also possible to use Optimization 1(b), where we stop searching from product-states of the form $(q,P)$ such that $q \preceq r$ for some $r \in P$. However, note that this optimization is of limited use for tree automata. Under the assumption that the automata $A$ and $B$ do not contain useless states, the reason is that for any $q \in Q_A$ and $r \in Q_B$, if $q$ appears at a left-hand side of some rule of arity more than 1, then no reflexive relation from $(A \cup B)^{\subseteq}$ allows $q \preceq r$ (footnote 3).

Algorithm 4 describes our method for checking language inclusion of TA in pseudocode. It closely follows Algorithm 2 and differs from it in two main points. First, the initial value of the Next set is the result of applying the function Initialize to the set $\{(i, Minimize(I^B_a)) \mid a \in \Sigma_0,\ i \in I^A_a\}$, where Initialize is the same function as in Algorithm 2. Second, the computation of the Post-image of a set of product-states means that for each symbol $a \in \Sigma_n$, $n \in \mathbb{N}$, we construct the $Post_a$-image of each $n$-tuple of product-states from the set. Like in Algorithm 3, we design the algorithm such that we avoid computing the $Post_a$-image of a tuple more than once.

Algorithm 4: Tree Automata Language Inclusion Checking
Input: TAs $A$ and $B$ over an alphabet $\Sigma$. A relation $\preceq \in (A \cup B)^{\subseteq}$.
Output: TRUE if $L(A) \subseteq L(B)$. Otherwise, FALSE.
 1  if there exists an accepting product-state in $\bigcup_{a \in \Sigma_0} \{(i, I^B_a) \mid i \in I^A_a\}$ then return FALSE;
 2  Processed := $\emptyset$;
 3  Next := Initialize($\bigcup_{a \in \Sigma_0} \{(i, Minimize(I^B_a)) \mid i \in I^A_a\}$);
 4  while Next $\neq \emptyset$ do
 5      Pick and remove a product-state $(r,R)$ from Next and move it to Processed;
 6      foreach $(p,P) \in \{(r', Minimize(R')) \mid (r',R') \in Post(Processed)(r,R)\}$ do
 7          if $(p,P)$ is an accepting product-state then return FALSE;
 8          else if $\neg\exists p' \in P$ s.t. $p \preceq p'$ then
 9              if $\neg\exists (q,Q) \in Processed \cup Next$ s.t. $p \preceq q \wedge Q \preceq^{\forall\exists} P$ then
10                  Remove all $(q,Q)$ from $Processed \cup Next$ s.t. $q \preceq p \wedge P \preceq^{\forall\exists} Q$;
11                  Add $(p,P)$ to Next;
12  return TRUE
We define the Post-image $Post(PStates)(r,R)$ of a set of product-states $PStates$ w.r.t. a product-state $(r,R) \in PStates$. It is the set of all product-states $(q,P)$ such that there is some $a \in \Sigma$, $\#(a) = n$, and some $n$-tuple $((q_1,P_1),\dots,(q_n,P_n))$ of product-states from $PStates$ that contains at least one occurrence of $(r,R)$, where $q \in Post_a(q_1,\dots,q_n)$ and $P = Post_a(P_1,\dots,P_n)$.

Theorem 3. Algorithm 4 always terminates and returns TRUE iff $L(A) \subseteq L(B)$.

Footnote 3: To see this, assume that an open tree $t$ is accepted from $(q_1,\dots,q_n) \in Q_A^n$, $q = q_i$, $1 \le i \le n$. If $q \preceq r$, then by the definition of $\preceq$, $t \in L^{\Box}(A \cup B)(q_1,\dots,q_{i-1},r,q_{i+1},\dots,q_n)$. However, that cannot happen, as $A \cup B$ does not contain any rules with left-hand sides containing both states from $A$ and states from $B$.
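The inclusion check of Algorithm 4 can be sketched along the same lines. The following Python code is only an illustration under assumptions made here: both automata are encoded as dictionaries from `(symbol, tuple_of_states)` to sets of target states, their state names are assumed to be disjoint, and the function and parameter names (`is_included`, `rules_a`, `sim`, and so on) are hypothetical. The relation `sim` is assumed to be in $(A \cup B)^{\subseteq}$; the empty default stands for the identity.

```python
from itertools import product

def post(symbol, macros, rules):
    """Post_a over a tuple of macro-states (or of singleton tuples of states)."""
    out = set()
    for lhs in product(*macros):
        out |= rules.get((symbol, lhs), set())
    return frozenset(out)

def simulated(p, r, sim):
    return p == r or (p, r) in sim

def leq(P, R, sim):
    """Lifted relation: every state of P is simulated by some state of R."""
    return all(any(simulated(p, r, sim) for r in R) for p in P)

def minimize(macro, sim):
    kept = set(macro)
    for q in sorted(macro):
        if any(r != q and r in kept and (q, r) in sim for r in macro):
            kept.discard(q)
    return frozenset(kept)

def insert(state, processed, nxt, sim):
    p, P = state
    if any((p, r) in sim for r in P):               # line 8, Optimization 1(b)
        return
    if any(simulated(p, q, sim) and leq(Q, P, sim)  # line 9
           for q, Q in processed | nxt):
        return
    for store in (processed, nxt):                  # line 10
        store.difference_update({(q, Q) for q, Q in store
                                 if simulated(q, p, sim) and leq(P, Q, sim)})
    nxt.add(state)                                  # line 11

def is_included(arity, rules_a, final_a, rules_b, final_b, sim=frozenset()):
    final_b = frozenset(final_b)

    def accepting(q, P):
        return q in final_a and not (P & final_b)

    initial = [(i, post(a, (), rules_b))
               for a, n in arity.items() if n == 0
               for i in post(a, (), rules_a)]
    if any(accepting(i, I) for i, I in initial):    # line 1
        return False
    processed, nxt = set(), set()
    for i, I in initial:                            # plays the role of Initialize
        insert((i, minimize(I, sim)), processed, nxt, sim)
    while nxt:
        cur = nxt.pop()
        processed.add(cur)
        succs = []
        for a, n in arity.items():
            if n == 0:
                continue
            for tup in product(list(processed), repeat=n):
                if cur not in tup:                  # tuple must contain (r, R)
                    continue
                P = minimize(post(a, [Pi for _, Pi in tup], rules_b), sim)
                for q in post(a, [(qi,) for qi, _ in tup], rules_a):
                    succs.append((q, P))
        for q, P in succs:
            if accepting(q, P):
                return False
            insert((q, P), processed, nxt, sim)
    return True

# Hypothetical example: A accepts all trees over {a/0, f/2}, B only a and f(a, a).
arity = {"a": 0, "f": 2}
rules_all = {("a", ()): {"p"}, ("f", ("p", "p")): {"p"}}
rules_few = {("a", ()): {"s0"}, ("f", ("s0", "s0")): {"s1"}}
print(is_included(arity, rules_all, {"p"}, rules_few, {"s0", "s1"}))
print(is_included(arity, rules_few, {"s0", "s1"}, rules_all, {"p"}))
```

With the identity relation the test on line 8 never fires, which matches the remark that Optimization 1(b) is of limited use for tree automata.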

[Figure 3(a): Detailed results. Scatter plot comparing the Antichain and Simulation approaches; horizontal axis: sum of the sizes of the NFA pair (0 to 6000); vertical axis: execution time.]

(b) Average execution time for different NFA pair sizes (in seconds)

Size         Antichain   Simulation
0-1000       0.059       0.099
1000-2000    1.0         0.7
2000-3000    3.6         1.69
3000-4000    11.2        3.2
4000-5000    20.1        4.79
5000-        33.7        6.3

Fig. 3. Language inclusion checking on NFAs generated from a regular model checker

8 Experimental Results

In this section, we describe our experimental results. We concentrated on experiments with inclusion checking, since it is more common than universality checking in various symbolic verification procedures, decision procedures, etc. We compared our approach, parameterized by maximal simulation (or, for tree automata, maximal upward simulation), with the previous pure antichain-based approach of [11], and with the classical subset-construction-based approach. We implemented all the above in OCaml. We used the algorithm in [7] for computing maximal simulations. In order to make the figures easier to read, we often do not show the results of the classical algorithm, since in all of the experiments that we have done, the classical algorithm performed much worse than the other two approaches.

8.1 The Results on NFA

For language inclusion checking of NFA, we tested our approach on examples generated from the intermediate steps of a tool for abstract regular model checking [4]. In total, we have 1069 pairs of NFA generated from different verification tasks, which included verifying a version of the bakery algorithm, a system with a parameterized number of producers and consumers communicating through a double-ended queue, the bubble sort algorithm, an algorithm that reverses a circular list, and a Petri net model of the readers/writers protocol (cf. [4, 3] for a detailed description of the verification problems). In Fig. 3(a), the horizontal axis is the sum of the sizes of the pairs of automata whose language inclusion we check, and the vertical axis is the execution time (the time for computing the maximal simulation is included).
Each point denotes a result from inclusion testing for a pair of NFA. Fig. 3(b) shows the average results for different NFA sizes. From the figure, one can see that our approach has a much better performance than the antichain-based one. Also, the difference between our approach and the antichain-based approach becomes larger when the size of the NFA pairs increases. If we compare the average results on the smallest 1000 NFA pairs, our approach is 60% slower than the antichain-based approach. For the largest NFA pairs (those with size larger than 5000), our approach is 5.32 times faster than the antichain-based approach.

We also tested our approach using NFA generated from random regular expressions. We have two different tests: (1) language inclusion does not always hold and (2) language inclusion always holds (footnote 4). The result of the first test is in Fig. 4(a). In the figure, the horizontal axis is the sum of the sizes of the pairs of automata whose language inclusion we check, and the vertical axis is the execution time (the time for computing the maximal simulation is included). From Fig. 4(a), we can see that the performance of our approach is much more stable. It seldom produces extreme results. In all of the cases we tested, it always terminates within 10 seconds. In contrast, the antichain-based approach needs more than 100 seconds in the worst case. The result of the second test is in Fig. 4(b), where the horizontal axis is the length of the regular expression and the vertical axis is the average execution time of 30 cases in milliseconds. From Fig. 4(b), we observe that our approach has a much better performance than the antichain-based approach if the language inclusion holds. When the length of the regular expression is 900, our approach is almost 20 times faster than the antichain-based approach.

Footnote 4: To get a sufficient number of tests for the second case, we generate two NFA $A$ and $B$ from random regular expressions, build their union automaton $C = A \cup B$, and test $L(A) \subseteq L(C)$.

[Figure 4(a): Language inclusion does not always hold (series: Simulation, Antichain). Figure 4(b): Language inclusion always holds (series: Simulation, Antichain, Classical).]

Fig. 4. Language inclusion checking on NFA generated from regular expressions

When the maximal simulation relation $\preceq$ is given, a natural way to accelerate the language inclusion checking is to use $\preceq$ to minimize the size of the two input automata by merging $\preceq$-equivalent states. In this case, the simulation relation becomes sparser. A question arises whether our approach still has a better performance than the antichain-based approach in this case. Therefore, we also evaluated our approach under this setting. Here again, we used the NFA pairs generated from abstract regular model checking [4]. The results show that although the antichain-based approach gains some speed-up when combined with minimization, it is still slower than our approach. The main reason is that in many cases, simulation holds only in one direction, but not in the other. Our approach can also utilize this type of relation.
In contrast, the minimization algorithm merges only simulation equivalent states.

We have also evaluated the performance of our approach using backward language inclusion checking combined with maximal backward simulation. As Wulf et al. [11] have shown in their paper, backward language inclusion checking of two automata is in fact equivalent to the forward version on the reversed automata. This can be easily generalized to our case. The result is very consistent with what we have obtained; our algorithm is still significantly better than the antichain-based approach.

8.2 The Results on TA

For language inclusion checking on TA, we tested our approach on 86 tree automata pairs generated from the intermediate steps of a regular tree model checker [2] while verifying the algorithm of rebalancing red-black trees after insertion or deletion of a leaf node. The results are given in Table 1. Our approach has a much better performance when the size of a TA pair is large. For TA pairs of size smaller than 200, our approach is on average 1.39 times faster than the antichain-based approach. However, for those of size above 1000, our approach is on average 6.8 times faster than the antichain-based approach.

Size         Antichain (sec.)   Simulation (sec.)   Diff.     # of Pairs
0-200        1.05               0.75                139.5%    29
200-400      11.7               4.7                 246%      15
400-600      65.2               19.9                327.9%    14
600-800      3019.2.6           568.7               531%      13
800-1000     4481.9             840.4               533%      5
1000-1200    11761.7            1720.9              683.4%    10

Table 1. Language inclusion checking on TA

9 Conclusion

We have introduced several original ways to combine simulation relations with antichains in order to optimize algorithms for checking universality and inclusion on NFA. We have also shown how the proposed techniques can be extended to the domain of tree automata. This was achieved by introducing the notion of languages of open trees accepted from tuples of tree automata states and using the maximal upward simulations parameterized by the identity proposed in our earlier work [1]. We have implemented the proposed techniques and performed a number of experiments showing that our techniques can provide a very significant improvement over currently known approaches. In the future, we would like to perform even more experiments, including, e.g., experiments where our techniques will be incorporated into the entire framework of abstract regular (tree) model checking or into some automata-based decision procedures. Apart from that, it is also interesting to develop the described techniques for other classes of automata (notably Büchi automata) and use them in a setting where the transitions of the automata are represented not explicitly but symbolically, e.g., using BDDs.

References

1. P.A. Abdulla, A. Bouajjani, L. Holík, L. Kaati, and T. Vojnar. Computing Simulations over Tree Automata. In Proc. of TACAS'08, LNCS 4963, 2008.
2. A. Bouajjani, P. Habermehl, L. Holík, T. Touili, and T. Vojnar. Antichain-Based Universality and Inclusion Testing over Nondet. Finite Tree Automata. In CIAA'08, LNCS 5148, 2008.
3. A. Bouajjani, P. Habermehl, P. Moro, T. Vojnar. Verifying Programs with Dynamic 1-Selector-Linked Structures in Regular Model Checking. In TACAS'05, LNCS 3440, 2005.
4. A. Bouajjani, P. Habermehl, and T. Vojnar. Abstract Regular Model Checking. In Proc. of CAV'04, LNCS 3114. Springer, 2004.
5. J. A. Brzozowski. Canonical Regular Expressions and Minimal State Graphs for Definite Events. In Mathematical Theory of Automata, 1962.
6. D. L. Dill, A. J. Hu, and H. Wong-Toi. Checking for Language Inclusion Using Simulation Preorders. In Proc. of CAV'92, LNCS 575. Springer, 1992.
7. L. Holík and J. Šimáček. Optimizing an LTS-Simulation Algorithm. Technical Report FIT-TR-2009-03, Brno University of Technology, 2009.
8. J. E. Hopcroft. An n.log n Algorithm for Minimizing States in a Finite Automaton. Technical Report CS-TR-71-190, Stanford University, 1971.
9. A. R. Meyer and L. J. Stockmeyer. The Equivalence Problem for Regular Expressions with Squaring Requires Exponential Space. In Proc. of the 13th Annual Symposium on Switching and Automata Theory. IEEE CS, 1972.
10. F. Møller. http://www.brics.dk/automaton, 2004.
11. M. D. Wulf, L. Doyen, T. A. Henzinger, and J.-F. Raskin. Antichains: A New Algorithm for Checking Universality of Finite Automata. In Proc. of CAV'06, LNCS 4144. Springer, 2006.

A Correctness of the NFA Universality Checking

The following lemma is implied directly by the fact that if $L(A)(P) \subseteq L(A)(R)$, then the shortest word rejected by $R$ is also rejected by $P$.

Lemma 9. Let $P$ and $R$ be two macro-states such that $L(A)(P) \subseteq L(A)(R)$. We have $Dist(P) \le Dist(R)$.

Lemma 3. The below two loop invariants hold in Algorithm 1:
1. $\neg Univ(Processed \cup Next) \implies \neg Univ(\{I\})$.
2. $\neg Univ(\{I\}) \implies Dist(Processed) > Dist(Next)$.

Proof. It is trivial to see that the invariants hold at the entry of the loop, taking into account Lemma 2 covering the effect of the Minimize function. We show that the invariants continue to hold when the loop body is executed from a configuration of the algorithm in which the invariants hold. We use $Processed_{old}$ and $Next_{old}$ to denote the values of Processed and Next when the control is on line 4 before executing the loop body, and we use $Processed_{new}$ and $Next_{new}$ to denote their values when the control gets back to line 4 after executing the loop body once. We assume that $Next_{old} \neq \emptyset$.

Let us start with Invariant 1. Assume first that $Univ(Processed_{old} \cup Next_{old})$ holds. Then, $R$ must be universal, which holds also for all of its successors and, due to Lemma 2, also for their minimized versions, which may be added to Next on line 10. Hence, $Univ(Processed_{new} \cup Next_{new})$ holds after executing the loop body, and thus Invariant 1 holds too. Now assume that $\neg Univ(Processed_{old} \cup Next_{old})$ holds. Then, $\neg Univ(\{I\})$ holds, and hence Invariant 1 must hold for $Processed_{new}$ and $Next_{new}$ too.

We proceed to Invariant 2 and we assume that $\neg Univ(\{I\})$ holds (the other case being trivial). Hence, $Dist(Processed_{old}) > Dist(Next_{old})$ holds. We distinguish two cases:

1. $Dist(R) = \infty$ or $\exists Q \in Processed_{old} : Dist(Q) \le Dist(R)$. In this case, $Dist(Processed)$ will not decrease on line 5. From $Dist(Processed_{old}) > Dist(Next_{old})$, there exists some macro-state $R'$ in $Next_{old}$ s.t. $Dist(R') = Dist(Next_{old}) < Dist(Processed_{old}) \le Dist(Q) \le Dist(R)$. Therefore, $Dist(Next)$ will not change on line 5 either. Moreover, for any macro-state $P$, removing $Q$ s.t. $P \preceq^{\forall\exists} Q$ from Next and Processed on line 9 and then adding $P$ to Next on line 10 cannot invalidate $Dist(Processed_{new}) > Dist(Next_{new})$ since $Dist(P) \le Dist(Q)$ due to Lemmas 2 and 9. Hence, Invariant 2 must hold for $Processed_{new}$ and $Next_{new}$ too.

2. $Dist(R) \neq \infty$ and $\neg\exists Q \in Processed_{old} : Dist(Q) \le Dist(R)$. In this case, the value of $Dist(Processed)$ decreases to $Dist(R)$ on line 5. Clearly, $Dist(R) \neq 0$ or else we would have terminated before. Then there must be some successor $R'$ of $R$ which is either rejecting (and the loop stops without getting back to line 4) or one step closer to rejection, meaning that $Dist(R') < Dist(R)$. Moreover, $R'$ either appears in $Next_{new}$ or there already exists some $R'' \in Next_{old}$ such that $R'' \preceq^{\forall\exists} R'$, meaning that $Dist(Processed_{new}) > Dist(Next_{new})$. It is impossible that $\exists R'' \in Processed_{old} : R'' \preceq^{\forall\exists} R'$, because $\forall R'' \in Processed_{old} : Dist(R'') > Dist(R) > Dist(R')$ and, from Lemmas 2 and 9, $R'' \preceq^{\forall\exists} R'$ implies $Dist(R'') \le Dist(R')$. Furthermore, if some macro-state is removed from Processed on line 9, $Dist(Processed)$ can only grow, and hence we are done.
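To make the quantity used in the proof concrete, the following Python sketch computes $Dist(P)$ for an NFA macro-state by a breadth-first search over the subset construction. It assumes, in line with the sentence preceding Lemma 9, that $Dist(P)$ is the length of the shortest word not accepted from $P$, with $\infty$ for a universal macro-state. The function name `dist` and the encoding of the transition relation as a dictionary from `(state, letter)` to a set of states are hypothetical choices for this illustration.

```python
from collections import deque
import math

def dist(macro, alphabet, delta, final):
    """Length of the shortest word rejected from the macro-state, or math.inf."""
    start = frozenset(macro)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        P, d = queue.popleft()
        if not (P & final):          # rejecting macro-state: a word of length d is rejected
            return d
        for a in alphabet:
            succ = frozenset(t for q in P for t in delta.get((q, a), ()))
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, d + 1))
    return math.inf                  # no rejecting macro-state reachable: universal

# Hypothetical NFA over {a, b}; both states are accepting, state 1 has no b-move.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {1}}
final = {0, 1}
print(dist({1}, "ab", delta, final), dist({0, 1}, "ab", delta, final))
```

In this example the language accepted from $\{1\}$ is included in the one accepted from $\{0,1\}$, and the computed distances are ordered as Lemma 9 requires.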

Lemma 10 (Termination). Algorithm 1 eventually terminates.

Proof. For the algorithm not to terminate, it would have to be the case that some macro-state is repeatedly added into Next. However, once some macro-state $R$ is added into Next, there will always be some macro-state $Q \in Processed \cup Next$ such that $Q \preceq^{\forall\exists} R$. This holds since $R$ either stays in Next, moves to Processed, or is replaced by some $Q$ such that $Q \preceq^{\forall\exists} R$ in each iteration of the loop. Hence, $R$ cannot be added to Next for the second time since a macro-state is added to Next on line 10 only if there is no $Q \in Processed \cup Next$ such that $Q \preceq^{\forall\exists} R$.

Theorem 1. The algorithm terminates with the return value FALSE if the input automaton $A$ is not universal. Otherwise, it terminates with the return value TRUE.

Proof. From Lemma 10, the algorithm eventually terminates. It returns FALSE only if either the set of initial states is rejecting, or the minimized version $R'$ of some successor $S$ of a macro-state $R$ chosen from Next on line 5 is found rejecting. In the latter case, due to Lemma 2, $S$ is also rejecting. Then $R$ is non-universal, and hence $Univ(Processed \cup Next)$ is false. By Lemma 3 (Invariant 1), we have that $A$ is not universal. The algorithm returns TRUE only when Next becomes empty. When Next is empty, $Dist(Processed) > Dist(Next)$ is not true. Therefore, by Lemma 3 (Invariant 2), $A$ is universal.

B Correctness of the TA Universality Checking

In this section, we prove correctness of Algorithm 3 in a very similar way to Algorithm 1, using suitably modified notions of distances and ranks. Let $A = (Q, \Sigma, \Delta, F)$ be a TA. For $n \ge 0$ and an $n$-tuple of macro-states $(Q_1,\dots,Q_n)$ where $Q_i \subseteq Q$ for $1 \le i \le n$, we let $Dist(Q_1,\dots,Q_n) = 0$ iff $Q_i \cap F = \emptyset$ for some $i \in \{1,\dots,n\}$. We define $Dist(Q_1,\dots,Q_n) = k \in \mathbb{N}^+ \cup \{\infty\}$ iff $Q_i \cap F \neq \emptyset$ for all $i \in \{1,\dots,n\}$ and $k = \min(\{|t| \mid t \in T_n^{\Box}(\Sigma) \wedge t \notin L^{\Box}(A)(Q_1,\dots,Q_n)\})$. Here, we assume $\min(\emptyset) = \infty$. For a set $MStates$ of macro-states over $Q$, we let $Rank(MStates) = \min(\{Dist(Q_1,\dots,Q_n) \mid n \ge 1 \wedge \forall 1 \le i \le n : Q_i \in MStates\})$ and we define $Univ(MStates) \iff Rank(MStates) = \infty$.

Lemma 11. The below two loop invariants hold in Algorithm 3:
1. $\neg Univ(Processed \cup Next) \implies \neg Univ(\{I_a \mid a \in \Sigma_0\})$.
2. $\neg Univ(\{I_a \mid a \in \Sigma_0\}) \implies Rank(Processed) > Rank(Processed \cup Next)$.

Proof. It is trivial to see that the invariants hold at the entry of the loop, taking into account Lemma 7. We show that the invariants continue to hold when the loop body is executed from a configuration of the algorithm in which the invariants hold. We use $Processed_{old}$ and $Next_{old}$ to denote the values of Processed and Next when the control is on line 4 before executing the loop body, and we use $Processed_{new}$ and $Next_{new}$ to denote their values when the control gets back to line 4 after executing the loop body once. We assume that $Next_{old} \neq \emptyset$.

Let us start with Invariant 1. Assume first that $Univ(Processed_{old} \cup Next_{old})$ holds. Then, $R$ can appear within tuples constructed over $Processed_{old} \cup Next_{old}$ which are universal only. In such a case, all macro-states $Q$ reachable from all tuples $T$ built over $Processed_{old} \cup Next_{old}$ are such that when we add them to $Processed_{old} \cup Next_{old}$, the resulting set will still allow building universal tuples only. Otherwise, one could take a non-universal tuple containing some of the newly added macro-states $Q$, replace $Q$ by the tuple $T$ from which it arose, and obtain a non-universal tuple over $Processed_{old} \cup Next_{old}$, which is impossible. Hence, the possibility of adding the new macro-states to Next on line 10 cannot cause non-universality of $Processed_{new} \cup Next_{new}$, which due to Lemma 7 holds when adding the minimized macro-states too. Moreover, removing elements from Next or Processed cannot cause non-universality either. Hence, Invariant 1 holds over $Processed_{new}$ and $Next_{new}$ in this case. Next, let us assume that $\neg Univ(Processed_{old} \cup Next_{old})$ holds. Then, $\neg Univ(\{I_a \mid a \in \Sigma_0\})$ holds, and hence Invariant 1 must hold for $Processed_{new}$ and $Next_{new}$ too.

We proceed to Invariant 2 and we assume that $\neg Univ(\{I_a \mid a \in \Sigma_0\})$ holds (the other case being trivial). Hence, $Rank(Processed_{old}) > Rank(Processed_{old} \cup Next_{old})$ holds. We distinguish two cases:

1. In order to build a tuple $T$ over $Processed_{old}$ and $Next_{old}$ that is of Dist equal to $Rank(Processed_{old} \cup Next_{old})$, one needs to use a macro-state $Q$ in $Next_{old} \setminus \{R\}$. The macro-state $Q$ stays in $Next_{new}$ or is replaced by a $\preceq^{\forall\exists}$-smaller macro-state added to Next on line 10 that, due to Lemma 7, can only allow one to build tuples of the same or even smaller Dist. Likewise, the macro-states accompanying $Q$ in $T$ stay in $Next_{new}$ or $Processed_{new}$ or are replaced by $\preceq^{\forall\exists}$-smaller macro-states added to Next on line 10, allowing one to build tuples of the same or smaller Dist, due to Lemma 7. Hence, moving $R$ to Processed on line 5 cannot cause the invariant to break. Moreover, adding some further macro-states to Next on line 10 can only cause $Rank(Processed \cup Next)$ to decrease, while removing macro-states from Processed on line 9 can only cause $Rank(Processed)$ to grow. Finally, replacing a macro-state in Next by a $\preceq^{\forall\exists}$-smaller one as a combined effect of lines 9 and 10 can again just decrease $Rank(Processed \cup Next)$, due to Lemma 7. Hence, in this case, Invariant 2 must hold over $Processed_{new}$ and $Next_{new}$.

2. One can build some tuple $T$ over $Processed_{old}$ and $Next_{old}$ that is of Dist equal to $Rank(Processed_{old} \cup Next_{old})$ using $Processed_{old} \cup \{R\}$ only. In this case, there must be tuples constructible over $Processed_{old} \cup \{R\}$ and containing $R$ that are not universal. We can distinguish the following subcases:

(a) From some of the tuples built over $Processed_{old} \cup \{R\}$ and containing $R$, a non-accepting macro-state is reached via a single transition of $A$, and the algorithm stops without getting back to line 4.

(b) Otherwise, some of the macro-states that appear in $Post(Processed)(R)$ and that will be added in the minimized form to Next must allow one to construct tuples which are of Dist smaller than those based on $R$. This holds since if a macro-state $Q$ is reached from some tuple $T$ containing $R$ by a single transition, we can replace $T$ in larger tuples leading to non-acceptance by $Q$, and hence decrease the size of the open tree needed to reach non-acceptance. Taking into account Lemma 7 to cover the effect of the minimization and using a similar reasoning as above for covering the effect of lines 9 and 10, it is then clear that Invariant 2 will continue to hold in this case.

Lemma 12. Algorithm 3 eventually terminates.

Proof. An analogy of the proof of Lemma 10.

Theorem 2 can now be proved in a very similar way as Theorem 1.

C Correctness of the TA Language Inclusion Checking

We prove correctness of Algorithm 4 in a very similar way to Algorithm 2, using suitably modified notions of distances and ranks. Let $A = (\Sigma, Q_A, F_A, \Delta_A)$ and $B = (\Sigma, Q_B, F_B, \Delta_B)$ be two tree automata. For $n \ge 0$ and an $n$-tuple of product-states $((q_1,P_1),\dots,(q_n,P_n))$, we let $Dist((q_1,P_1),\dots,(q_n,P_n)) = 0$ iff $\varepsilon \in L^{\Box}(A,B)((q_1,P_1),\dots,(q_n,P_n))$. Otherwise we define $Dist((q_1,P_1),\dots,(q_n,P_n)) = k \in \mathbb{N}^+ \cup \{\infty\}$ iff $k = \min(\{|t| \mid t \in T_n^{\Box}(\Sigma) \wedge t \in L^{\Box}(A,B)((q_1,P_1),\dots,(q_n,P_n))\})$. Here, we assume $\min(\emptyset) = \infty$. For a set $PStates$ of product-states, we let $Rank(PStates) = \min(\{Dist((q_1,P_1),\dots,(q_n,P_n)) \mid n \ge 1 \wedge \forall 1 \le i \le n : (q_i,P_i) \in PStates\})$. The predicate $Incl(PStates)$ is defined to be true iff $Rank(PStates) = \infty$.

Lemma 13. The following two loop invariants hold in Algorithm 4:
1. $\neg Incl(Processed \cup Next) \implies \neg Incl(\bigcup_{a \in \Sigma_0} \{(i, I^B_a) \mid i \in I^A_a\})$.
2. $\neg Incl(\bigcup_{a \in \Sigma_0} \{(i, I^B_a) \mid i \in I^A_a\}) \implies Rank(Processed) > Rank(Processed \cup Next)$.

The proof is similar to that of Lemma 11.

Lemma 14. Algorithm 4 eventually terminates.

Proof. An analogy of the proof of Lemma 10.

Theorem 3 can now be proved in a very similar way as Theorem 1.