Regular Repair of Specifications

Michael Benedikt, Oxford University, michael.benedikt@comlab.ox.ac.uk
Gabriele Puppis, Oxford University, gabriele.puppis@comlab.ox.ac.uk
Cristian Riveros, Oxford University, cristian.riveros@comlab.ox.ac.uk

Abstract

What do you do if a computational object (e.g. a program trace) fails a specification? An obvious approach is to perform repair: modify the object minimally to get something that satisfies the constraints. In this paper we study repair of temporal constraints, given as automata or temporal logic formulas. We focus on determining the number of repairs that must be applied to a word satisfying a given input constraint in order to ensure that it satisfies a given target constraint. This number may well be unbounded; one of our main contributions is to isolate the complexity of the bounded repair problem, based on a characterization of the pairs of regular languages that admit such a repair. We consider this in the setting where the repair strategy is unconstrained and also when the strategy is restricted to use finite memory. Although the streaming setting is quite different from the general setting, we find that there are surprising connections between streaming and non-streaming, as well as within variants of the streaming problem.

I. INTRODUCTION

When a computational object does not satisfy a specification, an obvious approach is to repair it, that is, to edit it minimally so that it becomes valid. We may want to perform this editing transformation on the object, or we may be merely interested in knowing how difficult it would be to perform, that is, determining how far a given object or collection of objects is from satisfying the specification. In the database community, this has been extensively studied under the notion of constraint repair (see e.g. [1], [2]): the specifications considered there are relational integrity constraints, such as keys and foreign keys, and the problems considered include determining how much a database needs to be modified in order to satisfy a given constraint. Here we initiate the study of repair for temporal constraints on words.
The notion of repairing a word is indeed more obvious than in the case of databases: we can simply consider the edit distance between strings, a standard measure of how many basic operations it takes to get from one string to the next. Edit distance is lifted in a natural way to give a measure dist(w, L) of the distance of a string w to a language (collection of strings) L: the minimal distance of w to any string in L. It is well-known [3] that the standard dynamic programming approach to edit distance extends to give an efficient algorithm for calculating dist(w, L) when L is a regular language given as an NFA.

In this work we take the next step and consider distance between languages: given languages R and T (specified in different ways) we aim to calculate how difficult it is to transform a string satisfying R into a string satisfying T. The notation is motivated by considering R to be a restriction, a constraint that the input is guaranteed to satisfy, while T is a target, a constraint that we want to enforce. We consider the worst case over strings w ∈ R of the number of edit operations needed to move w into T: sup_{w ∈ R} dist(w, T). That is, we look at the worst-case number of operations needed to get from R to T. Of course, this number may be infinite; the core of our results is a procedure for solving the bounded repair problem: determining whether the supremum above is finite. In order to compute this effectively, we need to restrict the languages R and T. We consider this problem for regular languages, presented by both deterministic and non-deterministic finite automata. We also consider languages specified by linear temporal logic. In all these cases we determine the complexity of the bounded repair problem.

Above we considered the use of an edit/correction function that can read the whole string in memory. In this work we consider the impact of limitations on the editing process: what happens when we require the editing to be done by a transducer, reading in the input letter-by-letter and producing the corrected output, based only on a finite amount of control state and a fixed amount of lookahead in the word.
We refer to this as a streaming repair processor. We isolate the complexity of the streaming repair problem for any lookahead and for any of the language classes considered in the non-streaming case.

The above deals with the problem of determining whether the distance between two specifications is finite or infinite. But in the finite case, we may want to compute this distance exactly, and to produce the processor that optimally edits a given specification. Note that in the non-streaming setting, it is easy to describe the optimal processor: it is simply the function that, given a word w, runs a dynamic-programming algorithm to compute the edit distance to the target (e.g. the algorithm from [3] in the case of NFAs). However, in the streaming setting it is not clear how to derive the optimal editing algorithm efficiently. We give results on the complexity of computing the exact bound when it is finite in both the streaming and non-streaming setting, and also give procedures for computing the optimal processor in the streaming setting.

The streaming and non-streaming repair problems have very different flavors: the former are closely related to games played on the components of an automaton, while the latter require a more global analysis, and exhibit a close relation to distance automata. However, there are connections between the different problems: we show that in the case where there is no restriction, the bounded repair problems are the same for both the streaming and non-streaming setting. We also show
that the bounded repair problem in the streaming setting is independent of the lookahead, and is robust under plausible alternative definitions.

In summary our contributions are:
- We formalize the bounded repair problem for languages and characterize when regular languages have a bounded repair, in both the streaming and non-streaming setting.
- We show that the bounded repair problem in the streaming setting is independent of the lookahead, and is robust under variants of the cost function.
- Using the characterizations above, we give results on the complexity of the bounded repair problem in each setting.
- We present results on the complexity of computing the optimal bound, and on computing the optimal strategy in the streaming case.
- We demonstrate special cases where the streaming and non-streaming bounded repair problems have the same solution.

Organization: Section II gives preliminaries, while Section III defines the basic problems and shows some connections with games and distance automata. Section IV gives the characterizations of bounded repair that we will use throughout the remainder. Section V studies the non-streaming case, while Section VI deals with the streaming case. Section VIII briefly discusses extensions to infinite words, while Section IX gives related work and conclusions. Proofs are in the full paper.

II. BASIC NOTATION AND TERMINOLOGY

Let Σ be a finite alphabet and Σ* be the set of finite words over Σ. We denote the empty word by ε and the length of a word w ∈ Σ* by |w|.

Automata. Non-deterministic finite state automata (shortly, NFA) will be represented by tuples of the form A = (Σ, Q, E, I, F), where Σ is a finite alphabet, Q is a finite set of states, E ⊆ Q × Σ × Q is a transition relation, and I, F ⊆ Q are sets of initial and final states. By L(A) we denote the language recognized by A. If A is a deterministic finite state automaton (DFA), then we usually denote the unique initial state by q_0 and turn its transition relation E into a partial function δ from Q × Σ* to Q defined by δ(q, ε) = q and δ(q, a u) = δ(q', u) iff (q, a, q') ∈ E.
For technical reasons, we assume that all states of an NFA are reachable from some initial state and that from all states a final state is reachable. It is worth noticing that, since the decision problems we are going to deal with are at least NLOGSPACE-hard and since any given automaton can be pruned using NLOGSPACE, this assumption will have no impact on our complexity results.

Since automata can be viewed as directed (labeled) graphs, we inherit the standard definitions and constructions in graph theory. In particular, given an NFA A = (Σ, Q, E, I, F) and a state q ∈ Q, we denote by C(q) the strongly connected component (shortly, SCC) of A that contains all states mutually reachable from q. Given a set C of states of A (e.g., an SCC), we denote by A|C the automaton obtained by restricting A to the set C and by letting the new initial and final states be all and only the states in C. Note that if C consists of a single transient state, then the language L(A|C) recognized by the subautomaton A|C is empty. Finally, we denote by dag(A) the directed acyclic (unlabeled) graph of the SCCs of A and by dag*(A) the graph obtained from the transitive closure of the edges of dag(A).

Transducers. A (letter-to-word sequential) transducer is a device of the form S = (Σ, Δ, Q, δ, q_0, Ω), where Σ is a finite input alphabet, Δ is a finite output alphabet, Q is a finite set of states, δ is a transition function from Q × Σ to Δ* × Q, q_0 is an initial state, and Ω is a final output function from Q to Δ*. For every input word u = a_1 ... a_n ∈ Σ*, there is one run of S of the form

q_0 -[a_1/v_1]→ q_1 -[a_2/v_2]→ ... -[a_n/v_n]→ q_n -[ε/v_{n+1}]→

with δ(q_i, a_{i+1}) = (v_{i+1}, q_{i+1}) for all 0 ≤ i < n and Ω(q_n) = v_{n+1}; in this case, we define the output of S on u to be the word S(u) = v_1 v_2 ... v_n v_{n+1}.

Transducers as above produce an output word immediately after reading an input character. We will also consider transducers with a bounded amount of delay. A k-lookahead transducer, with k ∈ N, is defined as a sequential transducer where the transition function δ now has input in Q × Σ × (Σ_⊥)^k with Σ_⊥ = Σ ∪ {⊥} and ⊥ ∉ Σ.
Given an input word u and a position 1 ≤ i ≤ |u| in it, we denote by u_i the (k + 1)-character subword of u ⊥^k that starts at position i and ends at position i + k. The output of a k-lookahead transducer S on an input u of length n is the unique word v = v_1 v_2 ... v_n v_{n+1} for which there exists a sequence of states q_0, ..., q_n satisfying δ(q_{i-1}, u_i) = (v_i, q_i) for all 1 ≤ i ≤ n and Ω(q_n) = v_{n+1}. Clearly, a 0-lookahead transducer is simply a standard (letter-to-word sequential) transducer.

Logics. In this paper we look at languages defined by automata, and also consider linear temporal logic LTL, which uses the modal operators X (next) and U (until), along with boolean operators. Hereafter, we shall interpret LTL formulas on finite models only and this requires a careful use of the modal operators. For instance, the LTL formula X true does not hold on singleton words. We also assume that the propositional variables of an LTL formula are precisely the symbols of the underlying alphabet. This implies that two different propositional variables cannot hold at the same position in a model.

III. PROBLEM SETTING

Given two finite alphabets Σ and Δ, we denote by dist(u, v) the Levenshtein distance (henceforth, edit distance) between two words u ∈ Σ* and v ∈ Δ*, which is defined as the length of a shortest sequence s of edit operations (e.g., deleting, modifying, and inserting a single character) that transforms u into v [4]. A processor is simply a function from Σ* to Δ*. For a processor f, we refer to dist(u, f(u)) as the cost of f on the word u. Given a language R ⊆ Σ*, we define the worst-case cost of f over R as the supremum of the cost of f over all words in R. If the cost is unbounded, then we say that the worst-case cost is ω.

The general setting of a repair problem consists of two languages R ⊆ Σ* and T ⊆ Δ*, called the restriction
and target languages, respectively. We would like to repair a string that is known to belong to the restriction language into a string in the target language. A processor f is a repair strategy of R into T if for every word u ∈ R, the output f(u) is in T. We denote by dist(R, T) the worst-case cost of an optimal repair strategy of R into T. It is easy to see that dist(R, T) = sup_{u ∈ R} min_{v ∈ T} dist(u, v), since the best strategy is just to output on any u ∈ R the word in T that is closest to u with respect to the edit distance.

The bounded repair problem is to decide, given languages R and T, whether dist(R, T) is finite, that is, whether there is a repair strategy f of R into T and a natural number n ∈ N such that dist(u, f(u)) ≤ n for all u ∈ R. Similarly, the threshold problem is to compute the exact value of dist(R, T). Clearly, the languages R and T must be finitely represented, for instance, in terms of machines or logical formulas. In this paper, we study the complexity of the bounded repair problem for input languages represented by means of the following formalisms: (i) deterministic finite state automata (DFA), (ii) non-deterministic finite state automata (NFA), and (iii) LTL formulas with only future modal operators.

Streaming vs non-streaming. In its most general formulation, a repair strategy could be any function mapping words to edited words. However, we know from [3] that there is a dynamic programming algorithm that, given a word u and a target language T represented by a DFA, computes in polynomial time an optimal edit sequence s such that s(u) ∈ T. In particular, this shows that optimal repair strategies can be described by functions of fairly low complexity. Sometimes it is desirable to have repair strategies that are in even more limited classes. Perhaps the ideal case is when a strategy is realizable with a bounded memory one-pass algorithm, that is, using a (letter-to-word sequential) transducer. Recall that a letter-to-word transducer defines a word-to-word function (i.e., a processor); if this function is a repair strategy, we refer to the transducer as a streaming repair strategy.
The idea is that any input word u from a restriction language should be repaired in an online way. Similarly, we can talk about a k-lookahead streaming repair strategy. Accordingly, we define the bounded repair problem in the (k-lookahead) streaming case as the problem of deciding, given two languages R and T, whether there is a (k-lookahead) streaming strategy for repairing R into T with uniformly bounded cost. To stress the difference between the streaming and the non-streaming settings, we explicitly refer to the original problem as the bounded repair problem in the non-streaming case. The following example, due to Slawomir Staworko, illustrates the difference between the streaming and non-streaming setting:

Example 1. Consider R = (a + b) c* (a^+ + b^+) and T = a c* a^+ + b c* b^+. In the non-streaming case, one can get from R to T by only editing the initial letter and, thus, dist(R, T) is equal to 1. In contrast, a streaming repair strategy must decide whether to leave or change the initial letter, and then it could be forced to repair an unbounded sequence of a or b after the sequence of c.

Costs in the streaming case. Note that, if we have a transducer S and a word u = a_1 ... a_n ∈ Σ*, then we can define the cost of S on u in two ways:
- letting q_0 -[a_1/v_1]→ q_1 -[a_2/v_2]→ ... -[a_n/v_n]→ q_n -[ε/v_{n+1}]→ be the run of S on u, we define the aggregate cost of S on u to be the sum of dist(a_i, v_i) over all indices 1 ≤ i ≤ n, plus |v_{n+1}|, where dist(a_i, v_i) is 1 if v_i is empty, |v_i| - 1 if a_i occurs in v_i, and |v_i| otherwise;
- considering the transducer S as a processor, we define the edit cost of S on u to be simply the edit distance between u and the output S(u).

The first cost considers the distortions performed in producing the output from the input; it is equivalent to considering the transducer as producing edits rather than strings and counting the number of edits produced. The second cost is global and it considers only the output and not its production. Clearly, the edit cost never exceeds the aggregate cost. It is important to notice that these two models of cost can be very different in general.
Consider a transducer S on the input alphabet Σ = {a, b} that swaps a's and b's. On the string u_n = (ab)^n, the aggregate cost is 2n since S changes each letter, but the edit cost between u_n and S(u_n) is only 2. Nevertheless, it will turn out that for the bounded repair problem it does not matter which model of cost we choose (see Theorem 3).

Special cases. We are also interested in a variant of the bounded repair problem where the restriction language is a universal language of the form Σ*. In this case, the input to the bounded repair problem consists of a restriction alphabet Σ and a target language T. We refer to this variant of the bounded repair problem as the unrestricted case.

A. Repair Problems, Automata, and Games

In the case of DFA, both the non-streaming and streaming problems correspond to special cases of prior problems studied in automata and games. Non-streaming repair problems correspond to distance automata, while the streaming variant corresponds to energy games. We explain the correspondences in detail now. In both cases, we find that the results for the more general framework do not give tight bounds.

Non-streaming repairs and distance automata. Intuitively, a distance automaton is a transducer D that receives as input a finite word w and outputs a corresponding cost D(w) in N_∞ = N ∪ {∞}. Formally, a distance automaton is a transducer of the form D = (Σ, Q, E, I, F), where Σ is the input alphabet, Q is a finite set of states, E ⊆ Q × Σ × N × Q is the transition relation, and I, F : Q → N_∞ are the initial and final cost functions. The cost D(w) on input w = a_1 ... a_n ∈ Σ* is obtained by taking the minimum among the costs of the runs of D on w, where the cost of a run q_0 -[a_1/c_1]→ q_1 -[a_2/c_2]→ ... -[a_n/c_n]→ q_n is defined as I(q_0) + c_1 + ... + c_n + F(q_n). We let D(w) = ∞ if D admits no successful run on w. The main problem that has been studied for distance automata is the limitedness problem which consists of deciding whether the cost function computed by a given distance
automaton D is uniformly bounded on all words w ∈ Σ* with cost D(w) ≠ ∞. This problem was shown decidable by Hashiguchi [5] and later in [6] was shown to be PSPACE-complete. Distance automata have been related to edit-distance problems in several prior works; see Section IX for further discussion of the connections. Here we note only a simple reduction of the bounded repair problem to limitedness.

Given two NFA R and T, one can construct a distance automaton D that computes the cost of repairing any word from L(R) into a word from L(T). Let R = (Σ, Q, E, I, F) and T = (Δ, Q', E', I', F') be two NFA for the restriction and target languages. First of all, we associate with each symbol a ∈ Σ a matrix M(a) whose entries M(a)[p', q'] are indexed over the pairs of states p', q' of T and give the minimum edit distance between the symbol a and a word v ∈ Δ* such that T can move from p' to q' consuming v. If q' is not reachable from p', then we let M(a)[p', q'] = ∞. We then define the distance automaton D as the tuple (Σ, Q × Q', E_M, I_M, F_M), where E_M is the set of all transitions of the form ((p, p'), a, c, (q, q')), with a ∈ Σ, (p, a, q) ∈ E, and c = M(a)[p', q']. Further, we define I_M(p, p') as the length of the minimum word from a state in I' to p' if p ∈ I and ∞ otherwise. Similarly, F_M(p, p') is the length of the minimum word from p' to a state in F' if p ∈ F and ∞ otherwise. It is easy to see that the cost function computed by D maps a word u ∈ L(R) (which is accepted by D too) to the cost of the best non-streaming repair of u into L(T). Moreover, the distance automaton D has size polynomial in the size of R and T. Combining this reduction with the PSPACE upper bound for the limitedness problem, we see that the bounded repair problem for NFA is in PSPACE.

The same reduction technique can be applied to solve the bounded repair problem for DFA. In this case, however, the resulting complexity bound is not optimal: the bounded repair problem for DFA is in fact in coNP (cf. Corollary 4).
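The quantity that D assigns to a single word u, namely the edit distance from u to L(T), can also be computed directly by a shortest-path search over pairs (position in u, state of T). The following Python sketch illustrates that idea; it is not the algorithm of [3] itself. The function name and the encoding of the NFA as a set of (p, a, q) triples are assumptions made for this example, and states are assumed to be mutually comparable values such as integers.

```python
import heapq

def dist_to_language(word, transitions, initial, final):
    """Edit distance from `word` to the language of an NFA (illustrative sketch).

    transitions: set of (p, a, q) triples; initial, final: sets of states.
    Returns None if no word of the language can be reached at all.
    """
    n = len(word)
    out = {}
    for p, a, q in transitions:
        out.setdefault(p, []).append((a, q))
    # Dijkstra over configurations (i, q): i letters of `word` consumed,
    # NFA currently in state q. All move costs are 0 or 1.
    best = {(0, q): 0 for q in initial}
    heap = [(0, 0, q) for q in initial]
    heapq.heapify(heap)
    while heap:
        cost, i, q = heapq.heappop(heap)
        if cost > best.get((i, q), float("inf")):
            continue  # stale heap entry
        if i == n and q in final:
            return cost
        moves = []
        if i < n:
            moves.append((cost + 1, i + 1, q))       # delete word[i]
        for a, r in out.get(q, []):
            moves.append((cost + 1, i, r))           # insert letter a
            if i < n:
                step = 0 if a == word[i] else 1      # keep or substitute
                moves.append((cost + step, i + 1, r))
        for c, j, r in moves:
            if c < best.get((j, r), float("inf")):
                best[(j, r)] = c
                heapq.heappush(heap, (c, j, r))
    return None

# Hypothetical target: the language (ab)* over states 0 and 1.
ab_star = {(0, "a", 1), (1, "b", 0)}
print(dist_to_language("aba", ab_star, {0}, {0}))  # 1: delete the last a
```

The search visits at most (|u| + 1) · |Q'| configurations, so it runs in polynomial time. It says nothing about the supremum over all of L(R), which is the quantity the limitedness problem addresses.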
Roughly speaking, the reason why the bounded repair problem for DFAs is easier than the limitedness problem for distance automata is that the distance automata emerging from DFA repair problems are deterministic on the 0-cost moves. In addition to not giving tight bounds, approaches via distance automata give less insight into the problems. We invite the reader, for example, to compare the PSPACE upper bound that we derive from our characterization of repairability, Theorem 2, with the PSPACE upper bound given in [6].

Streaming repairs and energy games. Just as non-streaming repair problems can be seen within the framework of distance automata, bounded repair problems in the streaming setting are special cases of games on graphs with quantitative objectives. An interesting family of such games is that of energy games studied in [7], which are played on finite weighted arenas. The game is played between an energy player, who wants to maintain the running sum of the weights (i.e., the energy) always positive, and her opponent. A variant of energy games allows the parameterization by an initial credit of energy; the higher the credit the more possibility for the energy player to win. It is well known that the problem of determining whether there is a finite initial credit so that the energy player has a winning strategy is in NP ∩ coNP [8], but the exact complexity is still unknown. Furthermore, this problem can be solved in time polynomial in the size of the arena and the largest weight in absolute value. As a matter of fact, the latter complexity result implies that energy games can be solved in polynomial time with respect to the size of the arena provided that the weights are represented in unary.

One can easily reduce the bounded repair problem in the streaming setting, under the aggregate cost model for languages recognized by DFA, to the finite initial credit problem for energy games. Informally, the choice of the opponent in the energy game corresponds to the letters emitted by the restriction, while the edits correspond to choices of the energy player.
Formally, we have a node in the arena for each pair of states of the restriction DFA R and of the target DFA T; call this a Restriction Player Node. We also have a node for each combination of restriction state, target state, and letter played; call this a Target Player Node. The former represents the states reached by the restriction and target automata after parsing the unedited and edited words, while the latter adds the last letter emitted by the restriction. There is an edge of weight 0 going from a Restriction Player Node (p, p') to any Target Player Node (q, p', a), where (p, a, q) is a transition of the restriction DFA R. Similarly, there is an edge of weight c going from a Target Player Node (q, p', a) to a Restriction Player Node (q, q') provided that there is a word v at distance c from a (i.e., dist(a, v) = c) such that T can move from p' to q' consuming v. It is important to observe that this reduction provides a PTIME upper bound to the complexity of the bounded streaming repair problem for DFA, given that the size of the resulting arena is polynomial in the size of the restriction and target DFA and, moreover, the weights are bounded by the size of the target DFA.

Our characterization results (see Theorem 3) give analogous (tight) complexity bounds for languages recognized by DFA and, moreover, prove that the bounded repair problem in the streaming setting is not sensitive to the models of aggregate/edit cost. They also provide tight bounds for special cases of the problem that cannot naturally be captured in the setting of energy games. Our repair strategy can be seen as a special case of the notion of good-for-energy strategy, which is introduced in [8] to solve energy parity games.

Despite the connections mentioned above, many concepts and problems concerning repair do not have natural analogs in the game setting, and vice versa. For instance, in the game setting one could allow lookahead for one player, but it is not as natural as in the repair setting. Moreover, while the aggregate cost metric fits the game setting naturally, our usual cost function does not.
Conversely, the binary weights that are allowed in the game setting have no natural analog in the context of edits. Our characterization also allows us to easily isolate special cases of lower complexity that are not easily seen from the embedding into energy games.
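To make the two cost models of this section concrete, the following Python sketch runs a 0-lookahead transducer and reports both its output and its aggregate cost, using the per-letter cost from the definition of aggregate cost. A textbook Levenshtein routine then gives the edit cost. The dictionary encoding of δ and Ω and all function names are assumptions made for this example.

```python
def levenshtein(u, v):
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, 1):
        cur = [i]
        for j, b in enumerate(v, 1):
            cur.append(min(prev[j] + 1,               # delete a
                           cur[j - 1] + 1,            # insert b
                           prev[j - 1] + (a != b)))   # keep or substitute
        prev = cur
    return prev[-1]

def letter_cost(a, v):
    """dist(a, v) for a single letter a and an output word v."""
    if not v:
        return 1
    return len(v) - 1 if a in v else len(v)

def run(delta, omega, q0, u):
    """Run a 0-lookahead transducer on u; return (output, aggregate cost).

    delta maps (state, letter) to (output word, next state);
    omega maps a state to its final output word.
    """
    q, out, aggregate = q0, [], 0
    for a in u:
        v, q = delta[(q, a)]
        out.append(v)
        aggregate += letter_cost(a, v)
    tail = omega[q]
    return "".join(out) + tail, aggregate + len(tail)

# One-state transducer that swaps a's and b's.
swap = {(0, "a"): ("b", 0), (0, "b"): ("a", 0)}
u = "ab" * 5
output, aggregate = run(swap, {0: ""}, 0, u)
print(aggregate, levenshtein(u, output))  # 10 2
```

For the swap transducer on (ab)^n the aggregate cost grows as 2n while the edit cost stays at 2, since deleting the first letter and appending it at the end already turns (ab)^n into (ba)^n.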

IV. CHARACTERIZATIONS OF BOUNDED REPAIRABILITY

The non-streaming case. We fix a restriction language R and a target language T and we assume that these languages are recognized by two NFA R and T, respectively. Recall that dag(R) is the directed acyclic graph of the SCCs of R and dag*(T) is the symmetric and transitive closure of dag(T). Moreover, recall that we assume that all unreachable and sink states are removed from both R and T. We say that a path π = C_1 ... C_n in dag(R) is covered by a path π' = C'_1 ... C'_n in dag*(T) if we have L(R|C_i) ⊆ L(T|C'_i) for all indices 1 ≤ i ≤ n, namely, if the language recognized by the i-th component along π is contained in the language recognized by the i-th component along π'. The following characterization reduces the bounded repair problem in the non-streaming case to the path matching problem in finite directed acyclic graphs.

Theorem 2. Given two NFA R and T, the following conditions are equivalent:
1) there is a repair strategy of L(R) into L(T) with uniformly bounded cost,
2) every path in dag(R) is covered by some path in dag*(T),
3) there is a repair strategy of L(R) into L(T) with worst-case cost at most (1 + |dag(R)|) · |T|.

The interesting directions are from 2) to 3) and from 1) to 2). For the first implication, if the coverability condition is satisfied, then we repair a word w ∈ L(R) by choosing any path π = C_1 ... C_n in dag(R) taken by a run of w, and looking at a covering path in dag*(T). We can consider w = u_1 a_1 u_2 ... a_{n-1} u_n such that u_i ∈ L(R|C_i) and a_j ∈ Σ for all i ≤ n and j < n. For a covering path π' = C'_1 ... C'_n of π this implies that u_i ∈ L(T|C'_i). Therefore, at the boundary points i where w jumps from the SCC C_i to the next SCC C_{i+1} in R, we can insert small words that push the computation from C'_i to C'_{i+1} in T; because these are strongly connected components and there is a path from C'_i to C'_{i+1}, we can arrange a jump to any state in C'_{i+1}. Thus we can repair w by inserting a bounded number of small words and adding a small word at the end to reach a final state in T. The second implication is more complex, and is proven by contraposition.
Assuming the negation of 2) we know that there is a path π = C_1 ... C_k of dag(R) that is not covered by paths in dag*(T). For each SCC C_i of π we construct a word u_i that witnesses all non-containments of L(R|C_i) in SCCs of T. We then construct, for each n, a word w_n formed by concatenating n-fold iterations of each word u_i, that is, w_n = u'_0 u_1^n u'_1 ... u_k^n u'_k where the fixed words u'_0, ..., u'_k are arranged to make sure the resulting word is in L(R). Finally, we argue that w_n requires at least n edits to be repaired into a word in L(T).

The streaming case. We now modify Theorem 2 to give a characterization of the streaming repair problem, adding in a game setting. Fix two DFAs R and T recognizing the restriction and target languages. We associate with the DFA a reachability game between two players, Adam and Eve, on a suitable arena A_{R,T}, defined in terms of the SCCs of R and T. The idea underlying this game is as follows: during Adam's construction of a path π in dag(R), Eve has to provide a construction of a corresponding path f(π) in dag*(T) that covers π; moreover, the resulting function f must satisfy the following condition: if π C is an extension of the path π in dag(R) by a single SCC, then either f(π C) coincides with f(π) or it is an extension of f(π) by a single SCC, namely, f(π C) is of the form f(π) D.

Formally, the nodes of the arena A_{R,T} for Adam (resp., Eve) are the pairs of the form (C, D) (resp., (D, C)), where C is an SCC of R and D is an SCC of T. The edges of the arena connect Adam's nodes (C, D) to Eve's nodes (D, C') where (C, C') is an edge of dag(R) and, similarly, Eve's nodes (D, C) to Adam's nodes (C, D') where (D, D') is an edge of dag*(T) and, in addition, L(R|C) ⊆ L(T|D'). The initial node is an Eve node (D_0, C_0), where C_0 is the SCC of the initial state of R and D_0 is the SCC of the initial state of T. The last player who moves wins. Intuitively, Adam's objective is to reach a node (C, D) where Eve cannot respond with any move. Conversely, Eve's objective is to reach a node (D, C) where Adam cannot respond with any move.
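The arena just described can be assembled mechanically once the two SCC graphs and the containment relation are available. The Python sketch below is illustrative only: it takes dag(R) and dag*(T) as adjacency dictionaries supplied by the caller, and a hypothetical oracle `contained(C, D)` that answers whether L(R|C) ⊆ L(T|D). The node tags "adam" and "eve" and the function name are assumptions of this example.

```python
def build_arena(dag_R, dag_star_T, contained, C0, D0):
    """Build the part of the arena reachable from the initial Eve node.

    dag_R:       dict, SCC of R -> iterable of successor SCCs in dag(R)
    dag_star_T:  dict, SCC of T -> iterable of successor SCCs in dag*(T)
    contained:   hypothetical oracle, contained(C, D) iff L(R|C) is a
                 subset of L(T|D)
    Returns (moves, start) where moves maps each node to its successors.
    """
    start = ("eve", D0, C0)
    moves = {}
    stack = [start]
    while stack:
        node = stack.pop()
        if node in moves:
            continue
        if node[0] == "adam":
            _, C, D = node
            # Adam extends the path in dag(R) by one SCC.
            succ = [("eve", D, C2) for C2 in dag_R.get(C, ())]
        else:
            _, D, C = node
            # Eve answers with an edge of dag*(T) whose target covers C.
            succ = [("adam", C, D2)
                    for D2 in dag_star_T.get(D, ())
                    if contained(C, D2)]
        moves[node] = succ
        stack.extend(succ)
    return moves, start
```

Computing the SCCs and deciding the containments is left to the caller. The number of nodes is at most twice the product of the numbers of SCCs of R and T.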
As usual, we say that a player has a winning strategy on the arena A_{R,T} if he/she can win the reachability game on A_{R,T} independently of the choices of the other player. The following characterization reduces the bounded repair problem in the streaming case to the problem of determining the winner of a reachability game. It also shows that, quite surprisingly, the bounded repair problem in the streaming setting is not sensitive to the notions of transducer with/without lookahead and to the models of aggregate/edit cost.

Theorem 3. Given two DFA R and T, the following conditions are equivalent:
1) there is a k-lookahead streaming strategy, for some k ∈ N, that repairs L(R) into L(T) with uniformly bounded edit cost,
2) Eve has a winning strategy for the reachability game on A_{R,T},
3) there is a 0-lookahead streaming strategy that repairs L(R) into L(T) with worst-case aggregate cost at most (1 + |dag(R)|) · |T|.

Going from 2) to 3), if we have a strategy for Eve, we can get a streaming repair strategy by tracking the current SCC C of the input string and maintaining the invariant that the component D of the current repaired string is such that (C, D) is a position consistent with Eve's winning strategy. When a new letter comes in and changes the SCC in the restriction from C to C', we respond with a repair that moves from D to the response SCC D' that preserves the invariant. For the direction from 1) to 2), we assume a k-lookahead repair strategy and derive a strategy for Eve; our strategy will maintain the invariant that the position (C, D) corresponds to some input string w and response w' consistent with the
repair strategy. If, by way of contradiction, we have such a pair (C, D) corresponding to some string w, a successor SCC C' of C corresponding to some extension wu of w, and (D, C') is a losing position for Eve, then we can construct a single counterexample word for every candidate SCC. Given that for every successor SCC D' of D there is v ∈ L(R|C') \ L(T|D'), we can concatenate multiple copies of v together. If we make the number of copies large enough, such a string cannot be repaired by our transducer with a bounded number of edit operations, a contradiction.

V. COMPLEXITY RESULTS IN THE NON-STREAMING CASE

In this section, we study the bounded repair problem and the threshold problem in the non-streaming setting.

A. The bounded repair problem

We begin by analyzing the complexity in the case of languages recognized by non-deterministic finite automata.

NFA. Theorem 2 gives a straightforward PSPACE algorithm that solves the bounded repair problem between two NFA R and T in this setting: the algorithm first guesses universally a path π = C_1 ... C_n in dag(R), then it guesses existentially a path π' = C'_1 ... C'_n of the same length in dag*(T), and finally it checks the containment of the subautomaton R|C_i in the subautomaton T|C'_i for all indices 1 ≤ i ≤ n. Together with the PSPACE lower bound for the problem proven later (see Corollary 19), we obtain:

Corollary 4. The bounded repair problem in the non-streaming case, where the restriction and target languages are represented by NFA, is PSPACE-complete.

DFA. The same characterization result can be used to solve the problem when the restriction language is represented by an NFA and the target language is represented by a DFA. In this case, we can take advantage of the determinism to show that the problem turns out to be coNP-complete. Intuitively, the complexity upper bound follows from the observation that containment of languages recognized by SCCs of DFA is decidable in PTIME even if the successful runs can start from arbitrary states inside the SCCs and that the above mentioned coverability problem for paths of SCCs is in coNP.
In other words, we can guess a path in dag(R) and check in PTIME if this path is not covered in dag*(T). The complexity lower bound follows from a reduction from the validity problem for propositional formulas in disjunctive normal form (i.e., the dual of the SAT problem): the idea is to encode in the restriction language all the possible valuations for the propositional variables and then restrict the target language to consist only of encodings of valuations that satisfy at least one clause of the formula. Notice that some redundancy is needed in the restriction to forbid the repair strategy from modifying the encoded valuations.

Theorem 5. The bounded repair problem in the non-streaming case, where the restriction language is represented by an NFA and the target language is represented by a DFA, is in coNP and it is coNP-hard already for languages represented by DFA.

Before turning to the complexity of the bounded repair problem for languages specified by LTL formulas, we briefly outline some parameterized complexity results in the automaton case. We first consider the case where the restriction automaton is fixed and the target automaton is a DFA provided as input to the problem. Using arguments similar to the previous coNP upper bound, one can show that the bounded repair problem between a fixed restriction language and the target language recognized by a given DFA is in PTIME. It is more difficult to show that the bounded repair problem is tractable when we fix the target automaton. Here, instead of guessing a path π in dag(R) and then checking whether π is covered by some path π' in dag*(T), we directly compute all instances of the coverability relation. We then perform a top-down algorithm to compute which restriction components are covered.

Proposition 6. Let T be a fixed target language. The problem of deciding, given an NFA R, whether there is a non-streaming repair strategy of L(R) into T with uniformly bounded cost is in PTIME.

LTL. We conclude the section by analyzing the complexity of the bounded repair problem where languages defined by LTL formulas are involved.
We first consider the problem where both the restriction language R and the target language T are defined by some LTL formulas φ and ψ. It is not difficult to see that this problem is in coNEXPTIME. Indeed, one can use standard automaton-based techniques to construct, in exponential time, two DFA R^rev and T^rev that recognize the reversals R^rev and T^rev of the languages R and T. Since, in the non-streaming setting, the cost of repairing R into T is the same as the cost of repairing R^rev into T^rev, one can exploit Theorem 5 to solve the bounded repair problem on the DFA R^rev and T^rev in coNEXPTIME. For the complexity lower bound, one can reduce the problem of deciding the non-existence of a tiling of an exponential square grid, which is known to be coNEXPTIME-complete [9], to the problem of deciding the existence of a repair strategy of uniformly bounded cost between two regular languages defined by suitable LTL formulas. The idea of such a reduction is to let the formula for the restriction language encode all candidate tilings and the formula for the target language check that none of them is correct.

Theorem 7. The bounded repair problem in the non-streaming case, where the restriction and target languages are represented by LTL formulas, is coNEXPTIME-complete.

The bounded repair problem becomes easier when it involves repairs of languages recognized by NFA into languages defined by LTL formulas. The idea is to convert the formula into symbolic automata (represented using propositional formulas), and then apply the characterization theorem, looking for paths in the NFA that are not covered by paths in the symbolically-represented target language. Because the required containment checks can be done in PSPACE on the symbolic representations, we get:

Theorem 8. The bounded repair problem in the non-streaming case, where the restriction language R is represented by an NFA and the target language T is represented by an LTL formula, is in PSPACE.

Similarly, the bounded repair problem remains in PSPACE when the restriction is specified by an LTL formula φ and the target is recognized by an NFA T. In this case, one still uses Theorem 2 and a symbolic DFA $R^{\mathrm{rev}}$ recognizing the reversal of the language defined by φ. However, instead of universally guessing an entire path π in $\mathit{dag}(R^{\mathrm{rev}})$, one guesses the leaf of a counterexample path, and verifies that it is not covered by moving down from the root to the leaf.

Theorem 9. The bounded repair problem in the non-streaming case, where the restriction language R is represented by an LTL formula and the target language T is represented by an NFA, is in PSPACE.

B. The threshold problem

We now consider the problem of calculating the exact cost. In the case of DFA, we know from Theorem 5 that we can determine whether the repair cost is finite or infinite in coNP. Furthermore, Theorem 2 tells us that if the cost is finite it must be bounded by a polynomial in the input size. Thus, to determine the exact repair cost in the case where it is finite, it suffices to test whether the cost is above or below a given threshold k in unary, since then we can try every k below the polynomial bound. Perhaps surprisingly, this problem is harder than the finiteness problem, although still within polynomial space:

Theorem 10. The problem of determining, given k and two languages R and T recognized by DFA, whether $\mathit{dist}(R, T)$ is above k, is PSPACE-complete. The same holds when R and T are given as NFA.

The upper bound is shown by a reachability analysis in product automata representing all states reachable via at most k edits. The lower bound uses a reduction from tiling a polynomial width corridor. Roughly speaking, our restriction language will represent codes of potential tilings, with each tile repeated k times.
Our target language will check that the word still codes a tiling k-redundantly, and will also check for markings on tiles that indicate that a violation of horizontal or vertical constraints lies within a k-neighborhood of the marked tile. If there is no accepting run, then every potential tiling can be marked with a constraint violation. Conversely, if the restriction is repairable, then it can be shown that a marking must correctly indicate violations on every candidate tiling.

In the case of LTL, it is not a priori even clear how to compute the distance of a single word to a formula. However, this can be shown to be in PSPACE. In the general case of two LTL formulas we get an exponential blow-up over the automata case, as expected:

Theorem 11. The problem of determining, given k and two languages R and T defined by LTL formulas, whether $\mathit{dist}(R, T)$ is above k, is EXPSPACE-complete.

The lower bound is proven using a variation of the tiling technique in the previous theorem.

VI. COMPLEXITY RESULTS IN THE STREAMING CASE

A. The bounded repair problem

DFA. Let us consider two DFA R and T. The characterization of Theorem 3 shows that the problem of deciding the existence of a streaming repair strategy of $\mathcal{L}(R)$ into $\mathcal{L}(T)$ with uniformly bounded cost amounts to solving a reachability game over a suitable (acyclic) arena $A_{R,T}$. In particular, we observe that the arena $A_{R,T}$ can be computed from R and T in polynomial time and that checking containment of languages recognized by SCCs of automata is in PTIME. Moreover, it is known that the problem of deciding the winner of reachability games over acyclic graphs is PTIME-complete [10]. This shows that the bounded repair problem for DFA in the streaming case is PTIME-complete:

Corollary 12. The bounded repair problem in the streaming case, where the restriction and target languages are represented by DFA, is PTIME-complete.

It is worth noticing that the complexity of the bounded repair problem for DFA in the streaming setting is lower than the analogous problem in the non-streaming setting (indeed Theorem 5 shows that the latter problem is coNP-complete).
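The game-solving step behind Corollary 12 can be illustrated with a generic attractor computation. The following is only a minimal sketch under stated assumptions: the construction of the arena $A_{R,T}$ from Theorem 3 is not reproduced, and the function name `attractor`, the node encoding, and the player labels are hypothetical choices made for illustration. The sketch assumes that a non-target node without successors is losing for the player who wants to reach the targets.

```python
def attractor(nodes, edges, owner, targets, reacher):
    """Return the nodes from which `reacher` can force a visit to `targets`.

    nodes   : iterable of hashable node identifiers
    edges   : dict mapping a node to an iterable of its successors
    owner   : dict mapping a node to the player who moves there
    targets : set of nodes the reacher wants to visit
    reacher : label of the player with the reachability objective
    """
    preds = {v: [] for v in nodes}
    remaining = {}  # number of successors not yet known to be winning
    for v in nodes:
        succs = set(edges.get(v, ()))  # a set, so each edge is counted once
        remaining[v] = len(succs)
        for u in succs:
            preds[u].append(v)

    win = set(targets)
    stack = list(win)
    while stack:
        u = stack.pop()
        for v in preds[u]:
            if v in win:
                continue
            if owner[v] == reacher:
                # The reacher needs only one successor inside the winning set.
                win.add(v)
                stack.append(v)
            else:
                # The opponent loses only if every successor is winning.
                remaining[v] -= 1
                if remaining[v] == 0:
                    win.add(v)
                    stack.append(v)
    return win


# Hypothetical four-node arena, used only to exercise the function.
example_nodes = [0, 1, 2, 3]
example_edges = {0: [1, 2], 1: [3]}
example_owner = {0: "adam", 1: "eve", 2: "adam", 3: "eve"}
winning_for_adam = attractor(example_nodes, example_edges, example_owner, {3}, "adam")
```

Each edge is inspected a constant number of times, so the computation runs in time linear in the size of the arena, which is in line with the polynomial-time upper bound claimed for the DFA case.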
The fact that the streaming setting is easier than the non-streaming one for DFA is in contrast with the complexity results for languages defined by LTL formulas, where the streaming setting becomes more difficult than the non-streaming setting (compare Theorem 7 and Theorem 15).

NFA. When both restriction and target are NFA we are not able to provide tight complexity bounds, thus we only claim that the complexity of the bounded repair problem for NFA is between PSPACE and EXPTIME. The lower bound follows from Corollary 19 and the upper bound from the standard subset construction on NFA:

Corollary 13. The bounded repair problem in the streaming case, where the restriction and target languages are represented by NFA, is in EXPTIME and it is PSPACE-hard.

In the case where the restriction is a DFA R and the target is an NFA T, we obtain a tight PSPACE bound. PSPACE-hardness follows again from Corollary 19. As for the PSPACE upper bound, we observe that the longest collection of moves of Adam in the arena $A_{R,\det(T)}$, where $\det(T)$ denotes the DFA obtained from T by applying the standard subset construction, is linear in the size of $\mathit{dag}(R)$. By representing each SCC of $\det(T)$ using a set of states from T, one obtains an alternating polynomial-time procedure that simulates the reachability game over $A_{R,\det(T)}$.

In the symmetric case, where the restriction is an NFA and the target is a DFA, one could prove an EXPTIME upper bound on the bounded repair problem via a reduction to energy games with imperfect information (studied by Degorre et al. in [11]). However, we can improve this upper bound to PSPACE by simulating a reachability game over the arena $A_{\det(R),T}$. In this case the crucial observation is that it is safe to modify the arena $A_{\det(R),T}$ by allowing Adam to move down the DAG of $\det(R)$ with shortcuts, namely, from an SCC of R to any descendant of it (rather than simply a successor of it). Allowing this freedom in the new reachability game clearly makes it easier for Adam to win. On the other hand, if Adam wins in the modified arena, then he can also win in the original arena via longer plays. Moreover, if Adam wins the modified reachability game, then he can do so in at most $|\mathit{dag}^*(T)|$ rounds by properly choosing shortcut moves that push Eve towards a sink node. This shows that the problem is in PSPACE (we do not know whether it is also PSPACE-hard).

Theorem 14. The bounded repair problem in the streaming case, where the restriction language is a DFA and the target language is an NFA, is PSPACE-complete. The bounded repair problem in the streaming case, where the restriction language is an NFA and the target language is a DFA, is in PSPACE.

LTL. We now turn to the complexity of the bounded repair problem in the streaming case, where both restriction and target languages are represented by LTL formulas. By following standard constructions in automata theory, one can translate any pair of LTL formulas φ and ψ into DFA R and T that have size doubly exponential in the size of the formulas φ and ψ and that recognize the same languages defined by φ and ψ. This gives a straightforward 2EXPTIME upper bound to the complexity of the bounded repair problem. As for the complexity lower bound, we can reduce the problem of deciding the winner of a tiling game over an exponential square grid (this problem is known to be EXPSPACE-complete [9]) to the problem of deciding the existence of a streaming repair strategy of uniformly bounded cost between the languages defined by suitable LTL formulas (the idea of such a reduction is similar to the coNEXPTIME-hardness proof of Theorem 7):

Theorem 15. The bounded repair problem in the streaming case, where the restriction and target languages are given by LTL formulas, is in 2EXPTIME and is EXPSPACE-hard.

B. The threshold problem and constructing streaming repairs

For the streaming case, if we consider streaming repair strategies with aggregate cost, the threshold problem maintains its PTIME complexity. Further, one can easily reduce this threshold problem to a reachability game over a suitable arena.

Theorem 16. The problem of determining, given k and two languages R and T recognized by DFA, whether one can repair R into T with a streaming repair strategy with aggregate cost at most k, is in PTIME.

In fact, it follows from the reduction that one can efficiently compute the optimal streaming repair that satisfies a given threshold. This is because we can construct a streaming repair strategy that satisfies a given threshold by finding a winning strategy for Eve in the reachability game. Finding such a strategy is well-known to be in PTIME.

Corollary 17. Let R and T be the restriction and target languages specified by DFA. If R is streaming repairable into T with aggregate cost at most k, then an optimal streaming repair strategy of R into T with aggregate cost at most k can be computed in PTIME.

Note that in the above we deal with the aggregate cost; the example from Section III shows that this cost can differ from the edit cost, while our characterization theorem shows that one is finite iff the other is. We do not know if finding the exact edit cost is even tractable.

VII. SPECIAL CASES: UNRESTRICTED REPAIR PROBLEMS

We now consider a special case of the bounded repair problem, namely, the unrestricted case where the restriction language is assumed to be $\Sigma^*$ and the target language T is represented by a finite state automaton. The following result adapts the characterization theorems given in Section IV to give a necessary and sufficient condition for the unrestricted case. This result, which can be viewed as a special case of both Theorem 2 and Theorem 3, also shows that there is no difference between the non-streaming and the streaming settings when the restriction language is universal.

Corollary 18.
Given an alphabet Σ and an NFA T, the following conditions are equivalent:
1) there is a repair strategy of $\Sigma^*$ into $\mathcal{L}(T)$ with uniformly bounded cost,
2) T has an SCC C such that $\Sigma^* \subseteq \mathcal{L}(T|_C)$,
3) there is a 0-lookahead streaming strategy that repairs $\Sigma^*$ into $\mathcal{L}(T)$ with worst-case aggregate cost at most $2 \cdot |T|$.

Using the above characterization, one can easily devise an NLOGSPACE algorithm that solves the bounded repair problem for DFA in the unrestricted (streaming or non-streaming) case. Indeed, if the target automaton T is a DFA and C is a component of T, then we have $\Sigma^* \subseteq \mathcal{L}(T|_C)$ iff for every symbol $a \in \Sigma$ and every state q in C, T contains a transition of the form $(q, a, q')$, with $q' \in C$. Checking this property amounts to performing a standard NLOGSPACE reachability analysis over T. Conversely, NLOGSPACE-hardness follows from the fact that the emptiness problem for DFA is reducible to the bounded repair problem: given a DFA A over an alphabet Σ, we have that $\mathcal{L}(A) \neq \emptyset$ iff $\Sigma^*$ is repairable into $\mathcal{L}(A')$ with uniformly bounded cost, where A' is a DFA that can be constructed from A in logarithmic space.

In a similar way, one can show that the bounded repair problem for NFA in the unrestricted case is PSPACE-complete. This follows from Corollary 18 and from suitable reductions from the universality problem for NFA. Indeed, checking whether a target NFA T has an SCC C such that $\Sigma^* \subseteq \mathcal{L}(T|_C)$ is equivalent to the problem of deciding whether $\Sigma^*$ is repairable into $\mathcal{L}(T)$ with uniformly bounded cost, and it is clearly reducible to the universality problem for NFA. As for the PSPACE-hardness, we observe that a given NFA A recognizes the universal language $\Sigma^*$ iff $(\Sigma \cup \{\#\})^*$ is repairable into $(\mathcal{L}(A) \cdot \{\#\})^*$ with uniformly bounded cost. Notice that a finite automaton for the language $(\mathcal{L}(A) \cdot \{\#\})^*$ can be computed in linear time.
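The DFA criterion derived from Corollary 18 can be sketched as follows. This is a hedged illustration rather than the NLOGSPACE procedure mentioned in the text: it uses plain quadratic-time reachability, and the function names and the dictionary encoding of the DFA are assumptions introduced here. It also assumes that only components reachable from the initial state, and from which a final state is reachable, are relevant, and that `None` is never used as a state name.

```python
from collections import deque


def reachable(start, succ):
    """Return the set of states reachable from `start`, including `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for p in succ.get(q, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def unrestricted_bounded_repair(sigma, delta, initial, finals):
    """Decide whether some useful SCC of a DFA is closed under every symbol of sigma.

    sigma   : set of symbols of the restriction alphabet
    delta   : dict mapping (state, symbol) to the successor state (may be partial)
    initial : the initial state
    finals  : iterable of final states
    """
    finals = set(finals)
    states = {initial} | finals
    succ = {}
    for (q, _symbol), p in delta.items():
        states.update((q, p))
        succ.setdefault(q, set()).add(p)

    reach = {q: reachable(q, succ) for q in states}
    for q in reach[initial]:
        if not (reach[q] & finals):
            continue  # no accepting run passes through q
        # The SCC of q: states mutually reachable with q.
        component = {p for p in reach[q] if q in reach[p]}
        # Every symbol of sigma must keep every state of the component inside it.
        if all(delta.get((p, a)) in component for p in component for a in sigma):
            return True
    return False


# Hypothetical target: words over {a, b} with an even number of a's.
even_a = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
result_even_a = unrestricted_bounded_repair({"a", "b"}, even_a, 0, {0})
```

For the even-number-of-a's target the single component {0, 1} is closed under both symbols, which matches the intuition that any word can be repaired by editing at most one letter. A target such as a* over the alphabet {a, b} fails the test, since the only component has no b-transition.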

Summing up the DFA and NFA cases, we conclude the following:

Corollary 19. The bounded repair problem in the unrestricted case, where the target languages are represented by DFA (resp., NFA), is NLOGSPACE-complete (resp., PSPACE-complete).

Another consequence of Corollary 18 is the following. Suppose that a target language T is recognized by a DFA $\mathcal{T}$ that is complete over the target alphabet Δ, namely, for every symbol $a \in \Delta$ and every state p of $\mathcal{T}$, $\mathcal{T}$ contains a transition from p labeled by a. Let us consider a restriction alphabet Σ contained in Δ and suppose that $\Sigma^*$ is not repairable into T with uniformly bounded cost. Let us consider an SCC C of $\mathcal{T}$ that is reachable from the initial state and terminal, namely, with no outgoing edges. We know that C does not contain any final state (otherwise, C would be a final SCC and hence, by Corollary 18, $\Sigma^*$ would be repairable into $\mathcal{L}(\mathcal{T})$ with uniformly bounded cost). In this case, however, the same component C in the complement DFA $\overline{\mathcal{T}}$ would be final and hence $\Sigma^*$ would be repairable into $\mathcal{L}(\overline{\mathcal{T}})$ ($= \Delta^* \setminus T$) with uniformly bounded cost. This shows that:

Corollary 20. Given an alphabet Σ and a regular language $T \subseteq \Delta^*$, with $\Sigma \subseteq \Delta$, one of the following two cases (possibly both) holds:
1) $\Sigma^*$ is repairable into T with uniformly bounded cost,
2) $\Sigma^*$ is repairable into $\Delta^* \setminus T$ with uniformly bounded cost.

We now turn to the complexity of the bounded repair problem in the unrestricted case, but where the target languages are represented by LTL formulas. We claim that the problem is PSPACE-hard for LTL formulas. This complexity lower bound follows from arguments similar to the automaton-based setting, namely, from a reduction from the satisfiability problem for LTL formulas, which is known to be PSPACE-hard [12]. As for the complexity upper bound, we claim that the problem for LTL formulas is in PSPACE and, thus, PSPACE-complete. Indeed, given an LTL formula ψ defining a target language T, one can compute in polynomial time a symbolic representation of a DFA that recognizes the reversal $T^{\mathrm{rev}}$ of T. Moreover, one can perform a standard reachability analysis on the symbolic representation of this DFA in polynomial space.
Finally, we observe that $\Sigma^*$ is repairable into T with uniformly bounded cost iff $\Sigma^*$ is repairable into $T^{\mathrm{rev}}$ with uniformly bounded cost. This shows that the bounded repair problem in the unrestricted case for LTL formulas is in PSPACE.

Corollary 21. The bounded repair problem in the unrestricted case, where the target languages are represented by LTL formulas, is PSPACE-complete.

VIII. TOWARDS INFINITE WORDS

In this section, we briefly discuss a natural generalization of our characterization result for the bounded repair problem over infinite words. Recall that Theorem 2 reduces the bounded non-streaming repair problem to the problem of deciding the property of coverability between paths of SCCs in the DAGs of the restriction and target automata.

[Figure 1: Some non-deterministic Büchi automata (R and T at the top, R' and T' at the bottom).]

If we turn to languages of infinite words recognized by non-deterministic Büchi automata (NBA), then the characterization result is similar. There is however a slight complication due to the acceptance condition in the infinite case. First of all, we modify the notation for the sub-automata obtained from an SCC. As in the previous cases for NFA, given an NBA B and an SCC C of it, we write $B|_C$ to denote the usual NFA obtained by restricting B to the states in C and by letting them be both initial and final states. We also write $B^\omega|_C$ to denote the NBA obtained by restricting B to the set of states in C and by letting them be initial (the final states are instead kept as in B).

To understand why we introduce the two variants $B|_C$ and $B^\omega|_C$ of sub-automata, it is worth looking at the following examples. Let R and T be the single-component NBA depicted at the top of Figure 1 and let C and D be their unique SCCs, respectively. Observe that, when we view R and T as NFA, we have $\mathcal{L}(R|_C) \subseteq \mathcal{L}(T|_D)$, and hence, by Theorem 2, $\mathit{dist}(\mathcal{L}(R), \mathcal{L}(T)) < \omega$. However, when we view R and T as NBA, we have $\mathcal{L}(R|_C) \subseteq \mathcal{L}(T|_D)$, but $\mathit{dist}(\mathcal{L}^\omega(R), \mathcal{L}^\omega(T)) = \omega$.
On the other hand, if we consider the NBA R' and T' at the bottom of Figure 1, and we denote by C' and D' their unique SCCs, then we have that $\mathcal{L}^\omega(R'^\omega|_{C'}) \not\subseteq \mathcal{L}^\omega(T'^\omega|_{D'})$, but $\mathit{dist}(\mathcal{L}^\omega(R'), \mathcal{L}^\omega(T')) < \omega$. The above examples suggest that we should use both variants of sub-automata for establishing a characterization result for bounded non-streaming repairability of languages recognized by NBA.

We now turn to the generalization of the notion of coverability. Given two NBA R and T, a path π of length k in $\mathit{dag}(R)$, and a set of paths Π in $\mathit{dag}^*(T)$, we say that π is Büchi-covered by Π iff
1) all paths in Π have length precisely k + 1,
2) $\mathcal{L}(R|_{\pi(i)}) \subseteq \bigcup_{\pi' \in \Pi} \mathcal{L}(T|_{\pi'(i)})$ for all indices i < k,
3) $\mathcal{L}^\omega(R^\omega|_{\pi(k)}) \subseteq \bigcup_{\pi' \in \Pi} \mathcal{L}(T|_{\pi'(k)}) \cdot \bigcup_{\pi' \in \Pi} \mathcal{L}^\omega(T^\omega|_{\pi'(k+1)})$.

The characterization theorem for bounded non-streaming repairability of NBA-recognizable languages is as follows:

Theorem 22. Given two NBA R and T, the following conditions are equivalent:
1) there is a repair strategy of $\mathcal{L}^\omega(R)$ into $\mathcal{L}^\omega(T)$ with uniformly bounded cost,
2) every path in $\mathit{dag}(R)$ is Büchi-covered by a set of paths in $\mathit{dag}^*(T)$,
3) there is a repair strategy of $\mathcal{L}^\omega(R)$ into $\mathcal{L}^\omega(T)$ with worst-case cost at most $(2 + |\mathit{dag}(R)|) \cdot |T|$.

We omit the proof of this theorem, which is almost identical to that of Theorem 2, and we instead invite the reader to check that the characterization for the infinite-word case is consistent with the examples that we gave above. As a matter of fact, the above characterization result easily yields a PSPACE upper bound for the bounded non-streaming repair problem between languages recognized by NBA.

Table I: Complexity of bounded non-streaming repair (rows: restriction; columns: target)

         fixed     DFA       NFA       LTL
fixed    Const     PTIME     PSPACE    PSPACE
DFA      PTIME     CoNP      PSPACE    PSPACE
NFA      PTIME     CoNP      PSPACE    PSPACE
LTL      PSPACE    PSPACE    PSPACE    CoNEXP

Table II: Complexity of bounded streaming repair (rows: restriction; columns: target)

         fixed        DFA          NFA          LTL
fixed    Const        PTIME        PSPACE       PSP, EXPSP
DFA      PTIME        PTIME        PSPACE       PSP, EXPSP
NFA      PT, PSP      PT, PSP      PSP, EXP     PSP, 2EXP
LTL      PSP, EXPSP   PSP, EXPSP   PSP, 2EXP    EXPSP, 2EXP

IX. RELATED WORK AND CONCLUSIONS

In this work we have investigated language repair in the most basic setting of words. Our results are summarized in Table I and Table II: in the non-streaming setting our bounds are tight (indicated by a single class), while in the streaming setting we have several gaps (where a cell gives lower and upper bounds). We omit the corresponding table for computing the exact cost: in the case of non-streaming repair we can derive tight bounds in all cases, and also in the case of streaming repair for aggregate cost. In the latter case we also know the complexity of computing the optimal stream repair processor.

We have focused on the case of finite words, but infinite words raise many new issues. In the case of infinite words in a streaming setting, one can look for strategies that allow finitely many edits per word, without a uniform bound, and likewise look for strategies with continuous (but not uniformly-bounded) lookahead. This last issue has been investigated for purely qualitative games by Holtmann et al. [13].

Related work on edit distance of languages.
The problem of finding the minimal distance of a string to a regular language was first considered by Wagner in [3], who showed that the problem could be solved by adapting the dynamic programming approach to edit distance, giving a polynomial time algorithm. Several authors have extended the definition to deal with distances between languages. Mohri [14] defines a distance function between two sets of strings, and more generally between string distributions: in the case of languages, this is the minimum distance between two strings in the two respective languages, which is appropriate for many applications. Konstantinidis [15] focuses on the minimum distance between distinct strings within the same language, giving tractable algorithms for computing it. Our notion of cost is quite distinct from this, since it is asymmetric in the two languages, focusing on the maximum of the distance of a string in one language to the other language.

Grahne and Thomo [16] consider a related problem of approximate containment of regular expressions. Expressions are evaluated with respect to an edge-labeled graph and are given a numerical semantics by a distortion generalization of the notion of edit distance. Approximate containment of $T_1$ and $T_2$ means, roughly speaking, that for every input graph R and every word w generated by R, the distance to target $T_1$ is bounded by the distance to $T_2$. Grahne and Thomo also study k-containment (distance to $T_1$ is at most k more than $T_2$) and approximate-containment (k-containment for some k), relying primarily on a reduction to the limitedness problem for distance automata. Their problem differs in several fundamental respects from ours: they are interested in bounding the difference over all words, not just the worst case; in addition they quantify over all restrictions (databases, in their terminology).

An entire line of research in XML data management has dealt with comparisons and matching algorithms between schema languages; many of these lift edit distance between trees to the level of schemas (i.e. languages); see, for example, [17].
However, the lifting is done by looking at the syntactic structure of the schema description, rather than at the instance level (distance between documents in each schema).

Acknowledgments. We thank the anonymous referees and Slawek Staworko for many helpful comments. The authors were supported by EPSRC (UK) grant EP/G004021/1.

REFERENCES

[1] M. Arenas, L. Bertossi, and J. Chomicki, "Consistent query answers in inconsistent databases," in PODS, 1999, pp. 68–79.
[2] F. Afrati and P. Kolaitis, "Repair checking in inconsistent databases: Algorithms and complexity," in ICDT, 2009, pp. 31–41.
[3] R. Wagner, "Order-n correction for regular languages," CACM, vol. 17, no. 5, pp. 265–268, 1974.
[4] R. Wagner and M. Fischer, "The string-to-string correction problem," JACM, vol. 21, no. 1, pp. 168–173, 1974.
[5] K. Hashiguchi, "Improved limitedness theorems on finite automata with distance functions," Theor. Comp. Sci., vol. 72, no. 1, pp. 27–38, 1990.
[6] H. Leung and V. Podolskiy, "The limitedness problem on distance automata: Hashiguchi's method revisited," Theor. Comp. Sci., vol. 310, pp. 147–158, 2004.
[7] A. Chakrabarti, L. de Alfaro, T. A. Henzinger, and M. Stoelinga, "Resource interfaces," in EMSOFT, 2003, pp. 117–133.
[8] K. Chatterjee and L. Doyen, "Energy parity games," in ICALP, 2010, pp. 599–610.
[9] P. Van Emde Boas, "The convenience of tilings," in Complexity, Logic and Recursion Theory, vol. 187, 1997, pp. 331–363.
[10] C. Papadimitriou, Computational Complexity. Addison-Wesley Longman Publishing Co., Inc., 1994.
[11] A. Degorre, L. Doyen, R. Gentilini, J. Raskin, and S. Toruńczyk, "Energy and mean payoff games with imperfect information," in CSL, 2010, pp. 260–274.
[12] A. Sistla and E. Clarke, "The complexity of propositional linear temporal logics," JACM, vol. 32, no. 3, pp. 733–749, 1985.
[13] M. Holtmann, L. Kaiser, and W. Thomas, "Degrees of lookahead in regular infinite games," in FOSSACS, 2010, pp. 252–266.
[14] M. Mohri, "Edit-distance of weighted automata: general definitions and algorithms," Int'l Journal of Foundations of Comp. Sci., vol. 14, no. 6, pp. 957–982, 2003.
[15] S. Konstantinidis, "Computing the edit distance of a regular language," Inf. and Comp., vol. 205, no. 9, pp. 1307–1316, 2007.
[16] G. Grahne and A. Thomo, "Query answering and containment for regular path queries under distortions," in FOIKS, 2004, pp. 98–115.
[17] H. Do and E. Rahm, "COMA - a system for flexible combination of schema matching approaches," in VLDB, 2002, pp. 610–621.