Lecture 2: Languages via Logical Formulae

Languages can be defined (and this is perhaps the most natural way to define them) as sets of words satisfying a particular property. For example, the set of words satisfying each of the following properties is a regular language:

1. Every occurrence of an $a$ is eventually followed by a $b$.
2. There is exactly one $a$.
3. The first letter is an $a$.
4. The last letter is a $b$.
5. Every occurrence of $a$ is immediately followed by a $b$.
6. There are an even number of $a$'s.

However, the set of words which satisfy the property "there are an equal number of $a$'s and $b$'s" is not a regular language. So what identifies the properties that define regular languages?

1 First order logic of words

A natural formal language to describe properties such as the ones listed above is the first order logic over words. This logic is quite easy to understand and we shall illustrate it with a few examples. For instance, consider the formula

\[ \forall x.\, \bigl( a(x) \Rightarrow \exists y.\, ((y > x) \wedge b(y)) \bigr) \]

Here, the variables $x$, $y$ etc. refer to positions in the word. The formula $a(x)$ asserts that the letter at position $x$ is $a$. The quantifiers have the usual meaning. The formula $y > x$ is true if the position $y$ appears somewhere to the right of the position $x$. Thus, a word $w$ satisfies the above formula if and only if for any position ($x$) with the letter $a$, there is some position to its right ($y$) with the letter $b$. This captures in the language of first order logic the first property listed above.

The second property can be expressed as

\[ (\exists x.\, a(x)) \wedge \forall x.\, \forall y.\, \bigl( (a(x) \wedge a(y)) \Rightarrow x = y \bigr) \]

where the first conjunct says that some position carries an $a$ and the second that at most one position does.

Consider the formula

\[ \mathrm{First}(x) = \forall y.\, ((x = y) \vee (x < y)) \]

This formula evaluates to true at a position $x$ if and only if it is the first position in the word. With this, the third property can be expressed as

\[ \exists x.\, (a(x) \wedge \mathrm{First}(x)) \]

Consider the formula

\[ (x < y) \wedge \forall z.\, ((x < z) \Rightarrow (y \le z)) \]

(where $y \le z$ is $(y = z) \vee (y < z)$). It asserts that the position $y$ is the position immediately to the right of the position $x$. We shall write $(y = x + 1)$ to denote this formula, and using this we write the fifth property above as

\[ \forall x.\, \bigl( a(x) \Rightarrow \exists y.\, ((y = x + 1) \wedge b(y)) \bigr) \]

As it turns out, the sixth property, which does describe a regular language, has no equivalent in the first order logic of words. This is one of the results that we shall prove. But for the moment, we shall turn to extending the first order logic of words so that all regular languages can be described.

2 Monadic logic over words

In first order logic, the only kind of variables we have are those that represent positions in the word. In monadic logic, in addition to the position variables, we are also allowed to use variables that represent sets of positions. We shall use $X, Y, \ldots$ to denote variables that range over sets of positions. We are also allowed to quantify over such variables. For example,

\[ \forall X.\, \forall x.\, ((x \in X) \Rightarrow a(x)) \]

asserts that in every subset of positions, every position it contains has the letter $a$. Clearly this is true of a word if and only if every position in the word has the letter $a$. The following formula

\[ \forall X.\, \exists x.\, ((x \in X) \wedge a(x)) \]

is not true of any word! This is because, when $X$ is the empty set of positions, $\exists x.\, ((x \in X) \wedge a(x))$ cannot be satisfied.

Once we have quantification over sets, we can describe the set of even length words as follows: the set of all positions can be divided into two sets $X$ and $Y$ such that

1. The first position is in $X$.
2. The last position is in $Y$.
3. For each position $x$, if $x$ is in $X$ then the next position (if it exists) lies in $Y$.
4. For each position $x$, if $x$ is in $Y$ then the next position (if it exists) lies in $X$.

The third and fourth properties ensure that the positions of the word are alternately in $X$ and $Y$, and the first and second properties ensure that this alternation starts in $X$ and ends in $Y$; thus the word must have an even number of positions. Writing $\mathrm{Last}(x)$ for the formula, defined symmetrically to $\mathrm{First}(x)$, that holds exactly at the last position, the monadic formula expressing the above properties is:

\[
\begin{array}{l}
\exists X.\, \exists Y.\, \bigl( \\
\quad \forall x.\, ((x \in X) \Leftrightarrow (x \notin Y)) \\
\quad \wedge\ \forall x.\, (\mathrm{First}(x) \Rightarrow (x \in X)) \\
\quad \wedge\ \forall x.\, (\mathrm{Last}(x) \Rightarrow (x \in Y)) \\
\quad \wedge\ \forall x.\, \forall y.\, (((x \in X) \wedge (y = x + 1)) \Rightarrow (y \in Y)) \\
\quad \wedge\ \forall x.\, \forall y.\, (((x \in Y) \wedge (y = x + 1)) \Rightarrow (y \in X)) \ \bigr)
\end{array}
\]
The first line asserts that the set of all positions is divided into two sets $X$ and $Y$, and the following four lines describe the four properties mentioned above.

This extended logic, which includes the use of set variables (and the atomic formulas $x \in X$), is called the second-order monadic logic of words (or second-order monadic logic of order). We shall often write MSO to denote this logic. It is also called S1S (or second-order monadic logic of one successor).

3 Monadic logic and regular languages

Büchi and Elgot showed that the class of languages definable using formulas in monadic logic over words is precisely the class of regular languages.

Theorem 1 (Büchi/Elgot) A language $L$ is regular if and only if it can be described using a formula in MSO.

The translation from a finite automaton accepting a regular language to an MSO formula describing the same language is the easier direction. Before we give the details, we shall illustrate the ideas with a simple example. Consider the following automaton with 2 states.

[Figure: an automaton with two states, $q_0$ and $q_1$.]

Let us examine a run of this automaton on some word.

[Figure: a run of the automaton on a sample word, starting in state $q_0$.]

Observe that a run on a word of length $n$ involves a sequence of states of length $n + 1$. If, for the moment, we omit the last state reached on an input of length $n$, we have a sequence of states of length $n$. Thus, we can think of the run (without the last state) as a labelling of the positions of the word with states (the state reached before the letter at that position is read).

Can we then write out a formula that describes the run of the automaton on any given word? Here is an outline of how to do this. The trick is to use one second order variable $X_q$ for each $q \in Q$, and use $X_q$ to pick out the positions that are labelled by state $q$. Let the states of the automaton be $q_0, q_1, \ldots, q_k$. (Thus, for the above example we use two variables $X_{q_0}$ and $X_{q_1}$.) We then assert that the set of positions can be decomposed into $|Q|$ sets $X_{q_0}, X_{q_1}, \ldots, X_{q_k}$ such that
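1. The first position belongs to $X_{q_0}$ (since $q_0$ is the start state of the automaton).
2. If $x$ and $x + 1$ are positions in the given word and the letter at position $x$ is some $a \in \Sigma$, then $x$ and $x + 1$ belong to sets $X_p$ and $X_q$ such that $\delta(p, a) = q$.
3. If $x$ is the last position in the word $w$, the letter at this position is $a$ and $x$ belongs to $X_q$, then $\delta(q, a) \in F$.

For the above automaton these assertions can be expressed in MSO as follows:

\[
\begin{array}{l}
\exists X_{q_0}.\, \exists X_{q_1}.\, \bigl( \\
\quad \forall x.\, ((x \in X_{q_0}) \Leftrightarrow (x \notin X_{q_1})) \\
\quad \wedge\ \forall x.\, (\mathrm{First}(x) \Rightarrow (x \in X_{q_0})) \\
\quad \wedge\ \forall x.\, \forall y.\, (((x \in X_{q_0}) \wedge a(x) \wedge (y = x + 1)) \Rightarrow (y \in X_{q_1})) \\
\quad \wedge\ \forall x.\, \forall y.\, (((x \in X_{q_0}) \wedge b(x) \wedge (y = x + 1)) \Rightarrow (y \in X_{q_0})) \\
\quad \wedge\ \forall x.\, \forall y.\, (((x \in X_{q_1}) \wedge a(x) \wedge (y = x + 1)) \Rightarrow (y \in X_{q_0})) \\
\quad \wedge\ \forall x.\, \forall y.\, (((x \in X_{q_1}) \wedge b(x) \wedge (y = x + 1)) \Rightarrow (y \in X_{q_1})) \\
\quad \wedge\ \forall x.\, (\mathrm{Last}(x) \Rightarrow (((x \in X_{q_0}) \wedge b(x)) \vee ((x \in X_{q_1}) \wedge a(x)))) \ \bigr)
\end{array}
\]

The first line says that each position is labelled with exactly one of $q_0$ and $q_1$ (to represent the state reached before reading the letter at that position). The second line asserts that the state corresponding to the first position is the start state. The subsequent four lines encode the four possible transitions of the automaton. The last line ensures that the state assigned to the last position is such that the transition from this state on the letter at the last position takes the run into a final state.

In general, given a finite automaton $A = (\{q_0, \ldots, q_k\}, \Sigma, \delta, q_0, F)$, the following formula asserts that there is an accepting run of $A$ over the given word. Thus the nonempty models of this formula are precisely the nonempty words in the language accepted by $A$. (On the empty word every clause holds vacuously, so that case has to be treated separately.)

\[
\begin{array}{l}
\exists X_{q_0}.\, \exists X_{q_1} \ldots \exists X_{q_k}.\, \bigl( \\
\quad \forall x.\, \bigvee_{0 \le i \le k} (x \in X_{q_i}) \\
\quad \wedge\ \forall x.\, \bigwedge_{i \ne j} \neg ((x \in X_{q_i}) \wedge (x \in X_{q_j})) \\
\quad \wedge\ \forall x.\, (\mathrm{First}(x) \Rightarrow (x \in X_{q_0})) \\
\quad \wedge\ \bigwedge_{\delta(p, a) = q} \forall x.\, \forall y.\, (((x \in X_p) \wedge a(x) \wedge (y = x + 1)) \Rightarrow (y \in X_q)) \\
\quad \wedge\ \forall x.\, \bigl( \mathrm{Last}(x) \Rightarrow \bigvee_{\delta(p, a) \in F} ((x \in X_p) \wedge a(x)) \bigr) \ \bigr)
\end{array}
\]

The first and second lines assert that the set of positions is decomposed into a collection of sets, one for each state in $Q$. The next three lines assert the 3 properties mentioned above, and thus a nonempty word satisfies this formula if and only if it is accepted by the automaton $A$. This completes a sketch of the argument showing that every regular language can be described by an MSO formula.

Notes: In this lecture and the next our presentation follows that of Straubing [1].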
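The three conditions on a labelling of positions by states can be checked by brute force on small words. The following is a minimal sketch, not part of the original notes: the two-state automaton (reading a toggles the state, reading b keeps it, q0 both initial and final) is an assumed example, and all names (`STATES`, `DELTA`, `labelling_ok`, `formula_holds`, `words_up_to`) are hypothetical. The existential set quantifiers are interpreted by trying every way of assigning one state to each position, and the result is compared with a direct run of the automaton.

```python
from itertools import product

# Assumed example DFA (for illustration only): 'a' toggles the state,
# 'b' keeps it; q0 is both initial and final, so the DFA accepts
# exactly the words with an even number of a's.
STATES = ["q0", "q1"]
START = "q0"
FINAL = {"q0"}
DELTA = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}


def dfa_accepts(word):
    """Run the DFA directly on the word."""
    state = START
    for letter in word:
        state = DELTA[(state, letter)]
    return state in FINAL


def labelling_ok(word, labels):
    """Check the three conditions for one labelling of positions by states.

    labels[i] == q means position i belongs to the set X_q. Giving each
    position exactly one state makes the sets X_q a partition of the
    positions by construction.
    """
    n = len(word)
    if n == 0:
        return True  # every condition quantifies over positions: vacuous
    # Condition 1: the first position is labelled with the start state.
    if labels[0] != START:
        return False
    # Condition 2: consecutive labels respect the transition function.
    for i in range(n - 1):
        if DELTA[(labels[i], word[i])] != labels[i + 1]:
            return False
    # Condition 3: the transition at the last position enters a final state.
    return DELTA[(labels[-1], word[-1])] in FINAL


def formula_holds(word):
    """Existential set quantifiers: try every labelling of the positions."""
    return any(labelling_ok(word, labels)
               for labels in product(STATES, repeat=len(word)))


def words_up_to(max_len, alphabet="ab"):
    """Yield all nonempty words over the alphabet up to the given length."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


if __name__ == "__main__":
    for w in words_up_to(6):
        assert formula_holds(w) == dfa_accepts(w), w
    print("formula and automaton agree on all nonempty words up to length 6")
```

The enumeration is exponential in the length of the word and is only meant to make the semantics of the set quantifiers concrete. Because the automaton is deterministic, at most one labelling can satisfy the first two conditions, namely the run itself.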
References

[1] Howard Straubing: Finite Automata, Formal Logic and Circuit Complexity, Birkhäuser, 1994.