A Procedure for Checking Equality of Regular Expressions

A. GINZBURG*

Carnegie Institute of Technology, Pittsburgh, Pa.

* On leave from Technion, Israel Institute of Technology, Haifa, Israel.

ABSTRACT. A simple "mechanical" procedure is described for checking equality of regular expressions. The procedure, based on the work of A. Salomaa, uses derivatives of regular expressions and transition graphs. Given a regular expression R, a corresponding transition graph is constructed. It is used to generate a finite set of left-linear equations which characterize R. Two regular events R and S are equal if and only if each constant term in the set of left-linear equations formed for the pair $\binom{R}{S}$ is $\binom{\phi}{\phi}$ or $\binom{\Lambda}{\Lambda}$. The procedure does not involve any computations with or transformations of regular expressions and is especially appropriate for the use of a computer.

1. Let R denote a regular expression over the alphabet $\Sigma = \{0, 1\}$ (a two-letter alphabet is taken for simplicity), and let x be a word in $\Sigma^*$. $R_x$ will denote the derivative of R with respect to x, i.e., the set of words $w \in \Sigma^*$ such that $xw \in R$. For instance, for the empty word $\Lambda$ one has $R_\Lambda = R$. The basic properties of derivatives of regular expressions are derived in [2], where it is also proved that every R has only a finite number of unequal derivatives.

Let
\[ R = R^{(1)},\ R^{(2)},\ \ldots,\ R^{(m)} \tag{1R} \]
be a set of derivatives of R such that every derivative of R is equal to at least one element in this set, and assume that for every $R^{(i)}$ ($i = 1, 2, \ldots, m$) one can single out $R^{(j_i)}$ and $R^{(k_i)}$ such that
\[ R^{(i)}_0 = R^{(j_i)} \quad \text{and} \quad R^{(i)}_1 = R^{(k_i)}. \tag{2R} \]
Then it is possible to construct the system of left-linear equations
\[ R^{(i)} = 0R^{(i)}_0 + 1R^{(i)}_1 + \gamma^{(i)} = 0R^{(j_i)} + 1R^{(k_i)} + \gamma^{(i)} \quad (i = 1, 2, \ldots, m), \tag{3R} \]
where $\gamma^{(i)} = \Lambda$ if $\Lambda \in R^{(i)}$, and $\gamma^{(i)} = \phi$ (the empty set) otherwise. The system (3R) has a unique solution (up to equality of regular expressions) [2, 6].

2. Let S be another regular expression, and assume that the set
\[ S = S^{(1)},\ S^{(2)},\ \ldots,\ S^{(n)} \tag{1S} \]
has the same properties as (1R); i.e., there can be found equalities (2S) similar to
¹ In this paper two regular expressions, R and S, are said to be equal (notation: R = S) if and only if the regular events described by these expressions are equal.

Journal of the Association for Computing Machinery, Vol. 14, No. 2, April 1967, pp. 355-362.
(2R). Using them, one can construct a system
\[ S^{(i')} = 0S^{(i')}_0 + 1S^{(i')}_1 + \delta^{(i')} = 0S^{(j_{i'})} + 1S^{(k_{i'})} + \delta^{(i')} \quad (i' = 1, 2, \ldots, n;\ \delta^{(i')} = \Lambda \text{ or } \phi) \tag{3S} \]
similar to (3R).

Using (3R) and (3S), one can build the following "compound system" for R and S. (This construction appears essentially in [6].) Starting with the pair (the "column vector") $\binom{R}{S}$, i.e., $\binom{R^{(1)}}{S^{(1)}}$, one writes
\[ \binom{R}{S} = 0\binom{R_0}{S_0} + 1\binom{R_1}{S_1} + \binom{\gamma^{(1)}}{\delta^{(1)}}. \]
Using (3R) and (3S) or, if these systems are not explicitly written, using (2R) and (2S), one replaces in the right-hand side of this equation the derivatives of R and of S by equal derivatives from (1R) and (1S), respectively. For each pair $\binom{R^{(i)}}{S^{(i')}}$ obtained in the right-hand side of the equation, one adds an equation
\[ \binom{R^{(i)}}{S^{(i')}} = 0\binom{R^{(i)}_0}{S^{(i')}_0} + 1\binom{R^{(i)}_1}{S^{(i')}_1} + \binom{\gamma^{(i)}}{\delta^{(i')}}, \]
and the pairs of derivatives in its right-hand side are replaced once more by elements from (1R) and (1S), using (2R) and (2S). The procedure is continued until there are no new pairs. It follows from the existence of (1R) and (1S) that the number u of distinct pairs will satisfy $u \le mn$. By enumerating the pairs, one obtains the compound system
\[ \binom{R_{(a)}}{S_{(a)}} = 0\binom{R_{(a_0)}}{S_{(a_0)}} + 1\binom{R_{(a_1)}}{S_{(a_1)}} + \binom{\gamma_{(a)}}{\delta_{(a)}}, \tag{4} \]
where $a = 1, 2, \ldots, u$, $1 \le a_0 \le u$, $1 \le a_1 \le u$, $R_{(1)} = R$, $S_{(1)} = S$, and $\gamma_{(a)}$ and $\delta_{(a)}$ are $\Lambda$ or $\phi$.

If $\gamma_{(a)} = \delta_{(a)}$ for every a, one has in (4) two identical systems of equations for the $R_{(a)}$ and $S_{(a)}$; since such a system has a unique solution, $R_{(a)} = S_{(a)}$ ($a = 1, 2, \ldots, u$), and particularly $R = R_{(1)} = S_{(1)} = S$. Conversely, if R = S, then in the compound system (4), obtained by (3R) and (3S) in the above way, one has necessarily $\gamma_{(a)} = \delta_{(a)}$. (This is explicitly shown in [6] with "right derivatives" instead of the "left" ones used here.)

Thus, the equality R = S of two regular expressions can be established by showing that in the compound system (4) for R and S, $\gamma_{(a)} = \delta_{(a)}$ for all a. This can be done by computing derivatives of R and S. Unfortunately, the derivation is often quite cumbersome and involves also the comparison of the results in order to find a finite set containing all unequal derivatives. Therefore, it seems to be of interest to find a simple "mechanical" procedure for construction of (4). Such a procedure is described below.

3.
Given a regular expression R, there exist straightforward algorithms for constructing a transition graph (called also a transition system in [3]) representing R. For example, let R = [… + …(… + …)…]…. Consider the transition graph G in Figure 1. The vertices (in the present case vertex 1 only) denoted by →
are called initial, while those denoted by + (vertex 5) are called final. This transition graph represents the given R, because every path starting at an initial vertex and ending at a final one corresponds to a word in R, and, conversely, to every word in R there corresponds such a path in G. For example, the path 1-4-3-3-…-2-…-5 describes a word of R.

4. The same transition graph G can be used also to describe derivatives of R. To this end, denote by $N_x$ the set of all vertices in G which can be reached from the initial vertices following a path corresponding to the word $x \in \Sigma^*$. It follows immediately from the definition of the derivative that $R_x$ consists of all words, and only of these words, which correspond to paths leading from the vertices in $N_x$ to the final vertices in G. In short, $R_x$ is represented by the same transition graph, but with $N_x$ as the set of initial vertices. In the above example, $R_1$ is described by the same G with initial vertices $N_1 = \{2, 4, 5\}$. The final vertex 5 remains unchanged. Notice that $\Lambda \in R_1$, because the vertex 5 is both initial and final for $R_1$.

5. Thus, to every derivative $R_x$ of R there can be put in correspondence a set $N_x$ of vertices of G. The original initial vertices form the set $N_\Lambda$. The correspondence between the subsets of the set of vertices of G and the unequal derivatives of R is not one-to-one. To every derivative there corresponds at least one such subset, but there are subsets to which no derivative corresponds, and there can be also distinct subsets describing equal derivatives (see the examples below). Every regular expression can be represented by a finite transition graph, and thus the mentioned result from [2], that every R has only a finite number of unequal derivatives, follows directly.

6. A system of equations (3) can be derived using the subsets $N_\Lambda, N_0, N_1, \ldots$ only, without actual computation of the derivatives. Indeed, consider Table I, which corresponds to G in Figure 1. The entries in the first column ("inputs") are words $x \in \Sigma^*$ ordered by length and, for the same length, by the numerical magnitude.
[FIG. 1. Transition graph G for R.]

In the second column ("vertices of G") the corresponding subsets of vertices $N_x$ are marked. Thus, $N_\Lambda = \{1\}$, $N_0 = \{3\}$, $N_1 = \{2, 4, 5\}$, $N_{00} = \ldots$, and so on, as can be read directly from
Figure 1. In the third column ("equal to"), the row of x carries the entry y whenever $N_x = N_y$ for an input y entered earlier in the table (and hence $R_x = R_y$). A row (and the corresponding derivative) with an entry in the column "equal to" will be called terminal. Here all derivatives of "second order" are terminal; i.e., they are equal to derivatives of smaller orders and, clearly, so will be all "higher" derivatives. Thus, the table need not be prolonged. As a rule, if the row of x is terminal, one does not enter in the table more inputs beginning with x.

In the last column ("includes $\Lambda$") a mark appears if and only if the corresponding $N_x$ includes a final vertex (these vertices are labeled with a +).

For any x which is not terminal, the rows x0 and x1 are added to the table. The process is stopped when there are no new nonterminal words. (There is only a finite number of subsets in a finite set!)

The obtained table can be used to write the system (3R) for R, because the set of the nonterminal derivatives clearly fulfills the properties of (1R). One has
\[
\begin{aligned}
R = R_\Lambda &= 0R_0 + 1R_1, \\
R_0 &= 0R_{00} + 1R_{01} = 0R_{\ldots} + 1R_{\ldots}, \\
R_1 &= 0R_{10} + 1R_{11} + \Lambda = 0R_{\ldots} + 1R_{\ldots} + \Lambda.
\end{aligned}
\tag{5}
\]
Notice that $\Lambda$ appears in the equations for the derivatives with a mark in the last column of the table.

[TABLE I. Columns: Inputs; Vertices of G (1, 2, 3, 4, 5+); Equal to; Includes $\Lambda$.]

7. The above technique is now used to check an equality R = S.

Example 1. An equality from [5]:

R = [… + …(… + …)…]… = (…)* + (…)*(… + …)[… + (…)*(… + …)…]…(…)* = S.

R was discussed above. Now the same procedure will be applied to S. A transition graph H for S is given in Figure 2. The system of equations (3S) is here (see Table II):
\[
\begin{aligned}
S = S_\Lambda &= 0S_0 + 1S_1, \\
S_0 &= 0S_{00} + 1S_{01} = 0S_{\ldots} + 1S_{01}, \\
S_1 &= 0S_{10} + 1S_{11} + \Lambda = 0S_{\ldots} + 1S_{\ldots} + \Lambda, \\
S_{01} &= 0S_{010} + 1S_{011} = 0S_{\ldots} + 1S_{\ldots}, \\
S_{\ldots} &= 0S_{\ldots} + 1S_{\ldots} + \Lambda.
\end{aligned}
\tag{6}
\]
[FIG. 2. Transition graph H for S.]

[TABLE II. Columns: Inputs; Vertices of H (1 to 9, final vertices marked +); Equal to; Includes $\Lambda$.]

The compound system can be written using (5) and (6), or directly from the tables, which actually give the equalities (2R) and (2S). One obtains:
\[ \binom{R}{S} = 0\binom{R_0}{S_0} + 1\binom{R_1}{S_1}, \quad \ldots \]
There are no new pairs, and for all appearing pairs $\gamma_{(a)} = \delta_{(a)}$; hence R = S.

Notice that it follows that $S_{01} = S_{\ldots}$ (because $R_{01} = R_{\ldots}$), but this fact was not clear from the table for S. This is an example of two equal derivatives with distinct subsets.
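The table construction of Section 6 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's program: the function name `build_table`, the dictionary encoding of arrows, and the two-vertex example graph are all hypothetical (the arrows of G in Figure 1 are not reproduced here), and the graph is assumed to have no Λ-arrows. Words are generated by length and then in numerical order, and a row becomes terminal as soon as its vertex subset has already appeared in an earlier row.

```python
from collections import deque


def build_table(arrows, initial, final, alphabet="01"):
    """Build the table of Section 6 for a graph without Lambda-arrows.

    arrows maps (vertex, letter) to the set of successor vertices.
    Returns {word: (subset, equal_to, includes_lambda)}; equal_to is None
    for a nonterminal row, otherwise the earlier word with the same subset.
    """
    rows = {}
    first_word = {}  # subset -> nonterminal word that produced it first
    queue = deque([("", frozenset(initial))])
    while queue:
        word, subset = queue.popleft()
        includes_lambda = bool(subset & set(final))
        if subset in first_word:
            # Terminal row: no inputs beginning with this word are added.
            rows[word] = (subset, first_word[subset], includes_lambda)
            continue
        first_word[subset] = word
        rows[word] = (subset, None, includes_lambda)
        for letter in alphabet:
            successors = frozenset(
                b for a in subset for b in arrows.get((a, letter), ())
            )
            queue.append((word + letter, successors))
    return rows


# Hypothetical example graph (words ending in 1): vertex 1 initial, vertex 2 final.
example_arrows = {(1, "0"): {1}, (1, "1"): {2}, (2, "0"): {1}, (2, "1"): {2}}
rows = build_table(example_arrows, initial={1}, final={2})
for word, (subset, equal_to, includes_lambda) in rows.items():
    print(repr(word), sorted(subset), equal_to, includes_lambda)
```

The empty string stands for the empty word Λ. The loop terminates because a finite vertex set has only finitely many subsets, which is the argument given in Section 6.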
8. Procedure for Checking an Equality R = S.

I. Construct transition graphs for R and S.
II. Construct the corresponding tables.
III. Write the set of the distinct pairs which will appear in the compound system (use the columns "equal to" of the tables).
IV. R = S if and only if both elements in each pair simultaneously do or do not include $\Lambda$. (Use the columns "includes $\Lambda$" for checking this property.)

9. Example 2.

R = [(…)*…]* = … + (… + …)* + (… + …)*(… + …)* = S.

This equality and the transition graph for R appear in [4]. For R, see Figure 3 and Table III. Notice that in the case when there are arrows with $\Lambda$ in the transition graph, $i \in N_x$ implies that every vertex which can be reached from i by a chain of $\Lambda$-arrows is also an element of $N_x$. In the last case, for example, $N_\Lambda$ includes, in addition to 1, also 2 and 3, and $4 \in N_x$ implies $1, 2, 3 \in N_x$.

For S, see Figure 4 and Table IV. There will appear the following pairs (one omits R and S and writes only the inputs). First $\binom{\Lambda}{\Lambda}$; it implies two further pairs. Each of these in turn implies two pairs, and two of the pairs obtained in this way are new. The two added pairs do not imply new ones; i.e., the set of all appearing pairs consists of five pairs. As both elements in three of these pairs include $\Lambda$ and both elements in the other two pairs do not include $\Lambda$, the checks IV are fulfilled and consequently R = S.

[TABLE III. Columns: Inputs; Vertices of the graph for R (1 to 4, final vertices marked +); Equal to; Includes $\Lambda$.]
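Steps III and IV of the procedure in Section 8 can be sketched as follows. The sketch assumes that each table has already been condensed into two hypothetical structures, neither taken from the paper: a dictionary `step` that sends a nonterminal input x and a letter to the nonterminal input whose row equals that of x followed by the letter (read off the "equal to" column), and a set `lam` of the nonterminal inputs marked in the "includes Λ" column. The function name `check_equal` and both small example tables are illustrative only.

```python
from collections import deque


def check_equal(step_r, lam_r, step_s, lam_s, alphabet="01"):
    """Generate the pairs of the compound system and compare constant terms.

    Returns (equal, pairs); pairs lists the pairs of inputs in the order found.
    """
    start = ("", "")  # the pair (R, S), written by its inputs (Lambda, Lambda)
    pairs = [start]
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        # Step IV: both elements must simultaneously include Lambda or not.
        if (x in lam_r) != (y in lam_s):
            return False, pairs
        # Step III: add the pairs appearing on the right-hand side.
        for letter in alphabet:
            nxt = (step_r[(x, letter)], step_s[(y, letter)])
            if nxt not in pairs:
                pairs.append(nxt)
                queue.append(nxt)
    return True, pairs


# Hypothetical table for R: nonterminal inputs "" and "1" (words ending in 1).
step_r = {("", "0"): "", ("", "1"): "1", ("1", "0"): "", ("1", "1"): "1"}
lam_r = {"1"}
# Hypothetical table for S: the same event, with an extra nonterminal row "0".
step_s = {("", "0"): "0", ("", "1"): "1", ("0", "0"): "0",
          ("0", "1"): "1", ("1", "0"): "0", ("1", "1"): "1"}
lam_s = {"1"}

equal, pairs = check_equal(step_r, lam_r, step_s, lam_s)
print(equal, pairs)
```

In this toy case the pair of inputs (Λ, 0) appears, so the check also shows that the rows Λ and 0 of the second table describe equal derivatives although their subsets differ. The number of pairs is bounded by the product of the numbers of nonterminal rows, which corresponds to the bound u ≤ mn of Section 2.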
[FIG. 3. Transition graph for R.]

[FIG. 4. Transition graph for S.]

[TABLE IV. Columns: Inputs; Vertices of the graph for S (1 to 4, final vertices marked +); Equal to; Includes $\Lambda$.]

10. The use of the tables in the above procedure can be replaced by the following relational technique. A transition graph G can be described by a set of relations over its vertex set in the obvious way: to every input $\sigma \in \Sigma$ and to $\Lambda$ there corresponds a relation $T_\sigma$, such that $a\,T_\sigma\,b$ if and only if there is in G a $\sigma$-arrow from the vertex a to the vertex b. Denote by $T'$ the transitive closure of $T_\Lambda$ and by $\hat{T}$ the union $T' \cup I$, where I is the identity relation. Then for any $x = \sigma_1\sigma_2\cdots\sigma_k \in \Sigma^*$ one has
\[ N_x = (N)(\hat{T}\,T_{\sigma_1}\,\hat{T}\,T_{\sigma_2}\,\hat{T} \cdots T_{\sigma_k}\,\hat{T}), \]
where N is the set of initial vertices. (The operation in the brackets is the usual composition of relations, and $(N)T = \{b \mid \exists a \in N,\ a\,T\,b\}$.)
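The relational technique can be tried out with relations stored as plain sets of vertex pairs. The following Python sketch is only an illustration under stated assumptions: the helper names `compose`, `hat`, `image` and `reached`, the use of the empty string as the key for the Λ-relation, and the three-vertex example graph are all hypothetical, and the relations of the graph in Figure 3 are not reproduced here.

```python
def compose(p, q):
    """Usual composition of relations: a (p q) c iff a p b and b q c for some b."""
    return {(a, c) for (a, b) in p for (b2, c) in q if b == b2}


def hat(t_lambda, vertices):
    """Transitive closure of the Lambda-relation, united with the identity."""
    closure = set(t_lambda) | {(v, v) for v in vertices}
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger


def image(subset, relation):
    """(N)T = {b | there is a in N with a T b}."""
    return {b for (a, b) in relation if a in subset}


def reached(word, initial, relations, vertices):
    """Vertex set reached by word, following the composition formula above."""
    t_hat = hat(relations.get("", set()), vertices)
    product = t_hat
    for letter in word:
        product = compose(compose(product, relations.get(letter, set())), t_hat)
    return image(set(initial), product)


# Hypothetical graph: Lambda-arrow 1 -> 2, 0-arrow 2 -> 3, 1-arrow 3 -> 1.
vertices = {1, 2, 3}
relations = {"": {(1, 2)}, "0": {(2, 3)}, "1": {(3, 1)}}
for word in ["", "0", "01", "1"]:
    print(repr(word), sorted(reached(word, {1}, relations, vertices)))
```

Composing the relations first and taking the image of the initial set last mirrors the formula; applying `image` after every factor instead would give the same sets while handling smaller intermediate objects.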
For example, for G in Figure 3, with the relations $T_\Lambda$, $T_0$, $T_1$ read off the arrows of the graph, one has $N_\Lambda = \{1, 2, 3\}$ ($= \{1\}\hat{T}$) and $N_{\ldots} = (N)(\hat{T}\,T_{\ldots}\,\hat{T}) = \{2, 3\}$.

This computational approach is especially appropriate for the use of a computer.

ACKNOWLEDGMENT. The author thanks Professor David C. Cooper and Mr. Zohar Manna for their interest and stimulating discussions.

REFERENCES

1. AANDERAA, S. On the algebra of regular expressions. Appl. Math., Harvard U., Cambridge, Mass., Jan. 1965, pp. 1-18 (ditto).
2. BRZOZOWSKI, J. A. Derivatives of regular expressions. J. ACM 11 (Oct. 1964), 481-494.
3. HARRISON, M. A. Introduction to Switching and Automata Theory. McGraw-Hill, New York, 1965.
4. McNAUGHTON, R. Techniques for manipulating regular expressions. Machine Structures Group Memo No. 10, MIT Project MAC, Cambridge, Mass., Nov. 1965.
5. McNAUGHTON, R., AND YAMADA, H. Regular expressions and state graphs for automata. IRE Trans. EC-9 (1960), 39-47.
6. SALOMAA, A. Two complete axiom systems for the algebra of regular events. J. ACM 13 (1966), 158-169.

RECEIVED JUNE, 1966; REVISED NOVEMBER, 1966