A Bit-parallel Approach to Suffix Automata: Fast Extended String Matching

Gonzalo Navarro $^{1,3}$ and Mathieu Raffinot $^{2}$

$^1$ Dept. of Computer Science, University of Chile. Blanco Encalada 2120, Santiago, Chile. gnavarro@dcc.uchile.cl.
$^2$ Institut Gaspard Monge, Cité Descartes, Champs-sur-Marne, 77454 Marne-la-Vallée Cedex 2, France. raffinot@monge.univ-mlv.fr.
$^3$ Partially supported by Chilean Fondecyt grant 1-950622.

Abstract. We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a "suffix automaton" which is made deterministic in the preprocessing. BNDM, instead, simulates the nondeterministic version using bit-parallelism. This algorithm is 20%-25% faster than BDM, 2-3 times faster than other bit-parallel algorithms, and 10%-40% faster than all the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g. on English text it is the fastest between 5 and 110 characters). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, multiple patterns, and errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.

1 Introduction

The string-matching problem is to find all the occurrences of a given pattern $p = p_1 p_2 \ldots p_m$ in a large text $T = t_1 t_2 \ldots t_n$, both sequences of characters from a finite character set $\Sigma$.

Several algorithms exist to solve this problem. One of the most famous, and the first having linear worst-case behavior, is Knuth-Morris-Pratt (KMP) [14]. A second algorithm, as famous as KMP, which allows skipping characters, is Boyer-Moore (BM) [6]. This algorithm leads to several variations, like Horspool [12] and Sunday [20], forming the fastest known string-matching algorithms.

A large part of the research in efficient algorithms for string matching can be regarded as looking for automata which are efficient in some sense. For instance, KMP is simply a deterministic automaton that searches the pattern, its main merit being that it is $O(m)$ in space and construction time. Many variations of the BM family are supported by an automaton as well.

Another automaton, called "suffix automaton", is used in [9,10,11,15,19], where the idea is to search a substring of the pattern instead of a prefix (as KMP) or a suffix (as BM). Optimal sublinear algorithms on average, like "Backward DAWG Match" (BDM) or Turbo_BDM [10,11], have been obtained with this approach, which has also been extended to multipattern matching [9,11,19] (i.e. looking for the occurrences of a set of patterns).

Another related line of research is to take those automata in their nondeterministic form instead of making them deterministic. Usually the nondeterministic versions are very simple and regular and can be simulated using "bit-parallelism" [1]. This technique uses the intrinsic parallelism of the bit manipulations inside computer words to perform many operations in parallel. Competitive algorithms have been obtained for exact string matching [2,22], as well as approximate string matching [22,23,3]. Although these algorithms work well only on relatively short patterns, they are simpler, more flexible, and have very low memory requirements.

In this paper we merge some aspects of the two approaches in order to obtain a fast string matching algorithm, called Backward Nondeterministic DAWG Matching (BNDM), which we extend to handle classes of characters, to search multiple patterns, and to allow errors in the pattern and/or in the text, like Shift-Or [2]. BNDM uses a nondeterministic suffix automaton that is simulated using bit-parallelism. This new algorithm has the advantage of being faster than previous ones which could be extended in such a way (typically 2-3 times faster than Shift-Or), faster than its deterministic-automaton counterpart BDM (20%-25% faster), using little space in comparison with the BDM or Turbo_BDM algorithms, and being very simple to implement. It becomes the fastest string matching algorithm, beating all the Boyer-Moore family (Sunday included) by 10% to 40%. Only for very short (up to 2-6 letters) or very long patterns (past 90-150 letters), depending on $|\Sigma|$ and the architecture, other algorithms become faster than BNDM (Sunday and BDM, respectively). Moreover, we define a new suffix automaton which handles classes of characters and we simulate its nondeterministic version using bit-parallelism. This extension has not been considered for the BDM or Turbo_BDM algorithms before.

An expanded version of this work can be found in [17].

We introduce some notation now. A word $x \in \Sigma^*$ is a factor (i.e. substring) of $p$ if $p$ can be written $p = uxv$, $u, v \in \Sigma^*$. We denote Fact($p$) the set of factors of $p$. A factor $x$ of $p$ is called a suffix of $p$ if $p = ux$. The set of suffixes of $p$ is called Suff($p$).

We denote as $b_\ell \ldots b_1$ the bits of a mask of length $\ell$. We use exponentiation to denote bit repetition (e.g. $0^3 1 = 0001$). We use C-like syntax for operations on the bits of computer words: "|" is the bitwise-or, "&" is the bitwise-and, "^" is the bitwise-xor and "~" complements all the bits. The shift-left operation, "<<", moves the bits to the left and enters zeros from the right, i.e. $b_m b_{m-1} \ldots b_2 b_1 << r = b_{m-r} \ldots b_2 b_1 0^r$. We can also interpret bit masks as integers to perform arithmetic operations on them.
2 Searching with Suffix Automata

We describe in this section the BDM pattern matching algorithm [10,11]. This algorithm is based on a suffix automaton. We first describe such an automaton and then explain how it is used in the search algorithm.

2.1 Suffix Automata

A suffix automaton on a pattern $p = p_1 p_2 \ldots p_m$ (frequently called DAWG($p$), for Deterministic Acyclic Word Graph) is the minimal (incomplete) deterministic finite automaton that recognizes all the suffixes of this pattern. By "incomplete" we mean that some transitions are not present.

The nondeterministic version of this automaton has a very regular structure and is shown in Figure 1. We show now how the corresponding deterministic automaton is built.

Fig. 1. A nondeterministic suffix automaton for the pattern $p$ = baabbaa (states I, 0, 1, ..., 7). Dashed lines represent epsilon transitions (i.e. they occur without consuming any input). I is the initial state of the automaton.

Given a factor $x$ of the pattern $p$, endpos($x$) is the set of all the pattern positions where an occurrence of $x$ ends (there is at least one, since $x$ is a factor of the pattern, and there are as many as repetitions of $x$ inside $p$). Formally, given $x \in$ Fact($p$), we define endpos($x$) $= \{ i \;/\; \exists u,\ p_1 p_2 \ldots p_i = ux \}$. We call each such integer a position. For example, endpos(baa) $= \{3, 7\}$ in the word baabbaa. Notice that endpos($\varepsilon$) is the complete set of possible positions (recall that $\varepsilon$ is the empty string). Notice that for any $u, v$, endpos($u$) and endpos($v$) are either disjoint or one contained in the other.

We define an equivalence relation $\equiv$ between factors of the pattern. For $u, v \in$ Fact($p$), we define $u \equiv v$ if and only if endpos($u$) = endpos($v$) (notice that one of the factors must be a suffix of the other for this equivalence to hold, although the converse is not true). For instance, in our example pattern $p$ = baabbaa, we have that baa $\equiv$ aa because in all the places where aa ends in the pattern, baa ends also (and vice-versa).

The nodes of the DAWG correspond to the equivalence classes of $\equiv$, i.e. to sets of positions. A state, therefore, can be thought of as a factor of the pattern already recognized, except that we do not distinguish between some factors. Another way to see it is that the set of positions is in fact the set of active states in the nondeterministic automaton.

There is an edge labeled $\sigma$ from the set of positions $\{i_1, i_2, \ldots, i_k\}$ to $\gamma_p(i_1+1, \sigma) \cup \gamma_p(i_2+1, \sigma) \cup \ldots \cup \gamma_p(i_k+1, \sigma)$, where

$\gamma_p(i, \sigma) = \{i\}$ if $i \le m$ and $p_i = \sigma$, and $\gamma_p(i, \sigma) = \emptyset$ otherwise,

which is the same as saying that we try to extend the factor that we recognized with the next text character, and keep the positions that still match. If we are left with no matching positions, we do not build the transition. The initial state corresponds to the set $\{0..m\}$. Finally, a state is terminal if its corresponding subset of positions contains the last position $m$ (i.e. we matched a suffix of the pattern). As an example, the deterministic suffix automaton of the word baabbaa is given in Figure 2.

Fig. 2. Deterministic suffix automaton of the word baabbaa (node labels: {0,1,2,3,4,5,6,7}, {2,3,6,7}, {1,4,5}, {2,6}, {3,7}, 4, 5, 6, 7). The largest node is the initial state.

The (deterministic) suffix automaton is a well known structure [8,5,11,18], and we do not prove any of its properties here (nor the correctness of the previous construction). The size of DAWG($p$) is linear in $m$ (counting both nodes and edges), and it can be built in linear time [8]. A very important fact for our algorithm is that this automaton can be used not only to recognize the suffixes of $p$, but also factors of $p$. By the suffix automaton definition, there is a path labeled by $x$ from the initial node of DAWG($p$) if and only if $x$ is a factor of $p$.

2.2 Search Algorithm

The suffix automaton structure is used in [10,11] to design a simple pattern matching algorithm called BDM. This algorithm is $O(mn)$ time in the worst case, but optimal on average ($O(n \log m / m)$ time) $^4$. Other more complex variations such as Turbo_BDM [10] and MultiBDM [11,19] achieve linear time in the worst case.

$^4$ The lower bound of $O(n \log m / m)$ on average for any pattern matching algorithm under a Bernoulli model is from A. C. Yao in [24].

To search a pattern $p = p_1 p_2 \ldots p_m$ in a text $T = t_1 t_2 \ldots t_n$, the suffix automaton of $p^r = p_m p_{m-1} \ldots p_1$ (i.e. the pattern read backwards) is built. A window of length $m$ is slid along the text, from left to right. The algorithm searches backwards inside the window for a factor of the pattern $p$ using the suffix automaton. During this search, if a terminal state is reached which does not correspond to the entire pattern $p$, the window position is remembered (in a variable last). This corresponds to finding a prefix of the pattern starting at position last inside the window and ending at the end of the window (since the suffixes of $p^r$ are the reverse prefixes of $p$). The last recognized prefix is the longest one. The backward search ends because of two possible reasons:

1. We fail to recognize a factor, i.e. we reach a letter $\sigma$ that does not correspond to a transition in DAWG($p^r$). Figure 3 illustrates this case. We then shift the window to the right in last characters (we cannot miss an occurrence because in that case the suffix automaton would have found its prefix in the window).

2. We reach the beginning of the window, therefore recognizing the pattern $p$. We report the occurrence, and we shift the window exactly as in the previous case (notice that we have the previous last value).

Fig. 3. Basic search with the suffix automaton. (Labels in the figure: Window; Search for a factor with the DAWG; Record in last the window position when a terminal state is reached; Fail to recognize a factor at $\sigma$: the pattern cannot start before $\sigma$; The maximum prefix starts at last; safe shift; New window.)

Search example: we search the pattern aabbaab in the text $T$ = abbabaabbaab. We first build DAWG($p^r$ = baabbaa), which is given in Figure 2. We note the current window between square brackets and the recognized prefix in a rectangle. We begin with $T$ = [abbabaa]bbaab, $m = 7$, last = 7.
1. T = [abbabaa]bbaab. a is a factor of $p^r$ and a reverse prefix of $p$. last = 6.
2. T = [abbabaa]bbaab. aa is a factor of $p^r$ and a reverse prefix of $p$. last = 5.
3. T = [abbabaa]bbaab. aab is a factor of $p^r$. We fail to recognize the next a. So we shift the window to last. We search again in the position: T = abbab[aabbaab], last = 7.
4. T = abbab[aabbaab]. b is a factor of $p^r$.
5. T = abbab[aabbaab]. ba is a factor of $p^r$.
6. T = abbab[aabbaab]. baa is a factor of $p^r$.
7. T = abbab[aabbaab]. baa is a factor of $p^r$, and a reverse prefix of $p$. last = 4.
8. T = abbab[aabbaab]. baab is a factor of $p^r$.
9. T = abbab[aabbaab]. baabb is a factor of $p^r$.
10. T = abbab[aabbaab]. baabba is a factor of $p^r$.
11. T = abbab[aabbaab]. We recognize the word aabbaab and report an occurrence.

3 Bit-Parallelism

In [2], a new approach to text searching was proposed. It is based on bit-parallelism [1], which consists in taking advantage of the intrinsic parallelism of the bit operations inside a computer word to cut down the number of operations by a factor of at most $w$, where $w$ is the number of bits in the computer word.

The Shift-Or algorithm uses bit-parallelism to simulate the operation of a nondeterministic automaton that searches the pattern in the text (see Figure 4). As this automaton is simulated in time $O(mn)$, the Shift-Or algorithm achieves $O(mn/w)$ worst-case time (optimal speedup). If we convert the automaton to deterministic we get a version of KMP [14], which is $O(n)$ search time, although twice as slow in practice for $m \le w$.

Fig. 4. A nondeterministic automaton to search the pattern $p$ = baabbaa in a text (states 0 to 7, arrows labeled b, a, a, b, b, a, a). The initial state is 0.

We explain now a variant of the Shift-Or algorithm (called Shift-And). The algorithm first builds a table $B$ which for each character stores a bit mask $b_m \ldots b_1$. The mask in $B[c]$ has the $i$-th bit set if and only if $p_i = c$. The state of the search is kept in a machine word $D = d_m \ldots d_1$, where $d_i$ is set whenever the state numbered $i$ in Figure 4 is active. Therefore, we report a match whenever $d_m$ is set.

We set $D = 0$ originally, and for each new text character $T_j$, we update $D$ using the formula

$D' \leftarrow ((D << 1) \;|\; 0^{m-1}1) \;\&\; B[T_j]$

which mimics what occurs inside the nondeterministic automaton for each new text character: each state gets the value of the previous one, provided the text character matches the corresponding arrow. The "$|\; 0^{m-1}1$" corresponds to the initial self-loop. For patterns longer than the computer word (i.e. $m > w$), the algorithm uses $\lceil m/w \rceil$ computer words for the simulation (not all of them are active all the time).

This algorithm is very simple and can be extended to handle classes of characters (i.e. each pattern position matches a set of characters), and to allow mismatches. This paradigm was later enhanced to support wild cards, regular expressions, approximate search, etc., yielding the fastest algorithms for those problems [22,3]. Bit-parallelism became a general way to simulate simple nondeterministic automata instead of converting them to deterministic. This is how we use it in our algorithm.

4 Bit-Parallelism on Suffix Automata

We simulate the BDM algorithm using bit-parallelism. The result is an algorithm which is simpler, uses less memory, has more locality of reference, and is easily extended to handle more complex patterns. We first assume that $m \le w$ and show later how to extend the algorithm for longer patterns.

4.1 The Basic Algorithm

We simulate the reverse version of the automaton of Figure 1. Just as for Shift-And, we keep the state of the search using $m$ bits of a computer word $D = d_m \ldots d_1$.

The BDM algorithm moves a window over the text. Each time the window is positioned at a new text position just after pos, it searches backwards the window $T_{pos+1} .. T_{pos+m}$ using the DAWG automaton, until either $m$ iterations are performed (which implies a match in the current window) or the automaton cannot perform any transition.

In our case, the bit $d_i$ at iteration $k$ is set if and only if $p_{m-i+1} .. p_{m-i+k} = T_{pos+1+m-k} .. T_{pos+m}$. Since we begin at iteration 0, the initial value for $D$ is $1^m$. There is a match if and only if after iteration $m$ it holds $d_m = 1$. Whenever $d_m = 1$, we have matched a prefix of the pattern in the current window. The longest prefix matched corresponds to the next window position.

The algorithm is as follows. Each time we position the window in the text we initialize $D$ and scan the window backwards. For each new text character we update $D$. Each time we find a prefix of the pattern ($d_m = 1$) we remember the position in the window. If we run out of 1's in $D$ then there cannot be a match and we suspend the scanning (this corresponds to not having any transition to follow in the automaton). If we can perform $m$ iterations then we report a match.

We use a mask $B$ which for each character $c$ stores a bit mask. This mask sets the bits corresponding to the positions where the pattern has the character $c$ (just as in Shift-And). The formula to update $D$ follows

$D' \leftarrow (D \;\&\; B[T_j]) << 1$

The algorithm is summarized in Figure 5. Some optimizations done on the real code, related to improved flow of control and bit manipulation tricks, are not shown for clarity.

BNDM ($p = p_1 p_2 \ldots p_m$, $T = t_1 t_2 \ldots t_n$)
 1. Preprocessing
 2.     For $c \in \Sigma$ do $B[c] \leftarrow 0^m$
 3.     For $i \in 1..m$ do $B[p_{m-i+1}] \leftarrow B[p_{m-i+1}] \;|\; 0^{m-i} 1 0^{i-1}$
 4. Search
 5.     pos $\leftarrow$ 0
 6.     While pos <= $n - m$ do
 7.         $j \leftarrow m$, last $\leftarrow m$
 8.         $D = 1^m$
 9.         While $D$ != $0^m$ do
10.             $D \leftarrow D \;\&\; B[T_{pos+j}]$
11.             $j \leftarrow j - 1$
12.             if $D \;\&\; 1 0^{m-1}$ != $0^m$ then
13.                 if $j > 0$ then last $\leftarrow j$
14.                 else report an occurrence at pos + 1
15.             $D \leftarrow D << 1$
16.         End of while
17.         pos $\leftarrow$ pos + last
18.     End of while

Fig. 5. Bit-parallel code for BDM. Some optimizations are not shown for clarity.

Search example: we search the pattern aabbaab in the text $T$ = abbabaabbaab. Immediately after each step number (1 to 11) we show the text and note the current window between square brackets, as well as the recognized prefix in a rectangle. We begin with $T$ = [abbabaa]bbaab, $D$ = 1111111, $B[a]$ = 1100110, $B[b]$ = 0011001, $m = 7$, last = 7, $j = 7$.
 1. [abbabaa]bbaab.  1111111 & 1100110, D = 1100110, j = 6, last = 6
 2. [abbabaa]bbaab.  1001100 & 1100110, D = 1000100, j = 5, last = 5
 3. [abbabaa]bbaab.  0001000 & 0011001, D = 0001000, j = 4, last = 5
 4. [abbabaa]bbaab.  0010000 & 1100110, D = 0000000, j = 3, last = 5
    We fail to recognize the next a. So we shift the window to last. We search again in the position: abbab[aabbaab], last = 7, j = 7.
 5. abbab[aabbaab].  1111111 & 0011001, D = 0011001, j = 6, last = 7
 6. abbab[aabbaab].  0110010 & 1100110, D = 0100010, j = 5, last = 7
 7. abbab[aabbaab].  1000100 & 1100110, D = 1000100, j = 4, last = 4
 8. abbab[aabbaab].  0001000 & 0011001, D = 0001000, j = 3, last = 4
 9. abbab[aabbaab].  0010000 & 0011001, D = 0010000, j = 2, last = 4
10. abbab[aabbaab].  0100000 & 1100110, D = 0100000, j = 1, last = 4
11. abbab[aabbaab].  1000000 & 1100110, D = 1000000, j = 0, last = 4
    Report an occurrence at 6.

4.2 Handling Longer Patterns

We can cope with longer patterns by setting up an array of words $D_t$ and simulating the work on a long computer word. We propose a different alternative which was experimentally found to be faster.

If $m > w$, we partition the pattern in $M = \lceil m/w \rceil$ subpatterns $s_i$, such that $p = s_1 s_2 \ldots s_M$ and $s_i$ is of length $m_i = w$ if $i < M$ and $m_M = m - w(M-1)$. Those subpatterns are searched with the basic algorithm.

We now search $s_1$ in the text with the basic algorithm. If $s_1$ is found at a text position $j$, we verify whether $s_2$ follows it. That is, we position a window at $T_{j+m_1} .. T_{j+m_1+m_2-1}$ and use the basic algorithm for $s_2$ in that window. If $s_2$ is in the window, we continue similarly with $s_3$ and so on. This process ends either because we find the complete pattern and report it, or because we fail to find a subpattern $s_i$.

We have to move the window now. An easy alternative is to use the shift last$_1$ that corresponds to the search of $s_1$. However, if we tested the subpatterns $s_1$ to $s_i$, each one has a possible shift last$_i$, and we use the maximum of all shifts.
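The basic algorithm of Section 4.1 (Figure 5) can be sketched in a few lines of Python, next to the Shift-And variant of Section 3 for comparison. This is only an illustrative sketch, not the optimized code used in the experiments: the function names `build_masks`, `bndm` and `shift_and` are our own, Python's unbounded integers stand in for the machine word (so the restriction $m \le w$ is not enforced), and the explicit `& full` after the shift is an assumption needed only because Python integers do not drop the bit that leaves an $m$-bit word.

```python
def build_masks(pattern):
    """Table B of Figure 5: bit i (1-based, from the right) of B[c] is set
    iff p_{m-i+1} == c, i.e. the masks describe the pattern read backwards."""
    m = len(pattern)
    B = {}
    for i in range(1, m + 1):
        c = pattern[m - i]                     # p_{m-i+1} in 1-based notation
        B[c] = B.get(c, 0) | (1 << (i - 1))    # 0^{m-i} 1 0^{i-1}
    return B


def bndm(pattern, text):
    """Return the 1-based starting positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    B = build_masks(pattern)
    full = (1 << m) - 1        # 1^m
    high = 1 << (m - 1)        # 1 0^{m-1}
    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        D = full
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)   # T_{pos+j}; absent chars give 0^m
            j -= 1
            if D & high:                        # a prefix of the pattern matched
                if j > 0:
                    last = j                    # remember the window position
                else:
                    occurrences.append(pos + 1)
            D = (D << 1) & full                 # keep only m bits (assumption above)
        pos += last
    return occurrences


def shift_and(pattern, text):
    """Shift-And baseline: D' = ((D << 1) | 0^{m-1}1) & B[T_j]."""
    m = len(pattern)
    if m == 0:
        return []
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)           # bit i+1 set iff p_{i+1} == c
    D, high, occurrences = 0, 1 << (m - 1), []
    for j, c in enumerate(text):
        D = ((D << 1) | 1) & B.get(c, 0)
        if D & high:
            occurrences.append(j - m + 2)       # 1-based start of the match
    return occurrences


print(bndm("aabbaab", "abbabaabbaab"))
print(shift_and("aabbaab", "abbabaabbaab"))
```

On the running example both functions report the single occurrence at text position 6, which agrees with the trace above. Note that `bndm` inspects only part of each window before shifting, while `shift_and` reads every text character once.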
4.3 Analysis

The preprocessing time for our algorithm is $O(m + |\Sigma|)$ if $m \le w$, and $O(m(1 + |\Sigma|/w))$ otherwise.

In the simple case $m \le w$, the analysis is the same as for the BDM algorithm. That is, $O(mn)$ in the worst case (e.g. $T = a^n$, $p = a^m$), $O(n/m)$ in the best case (e.g. $T = a^n$, $p = a^{m-1}b$), and $O(n \log_{|\Sigma|} m / m)$ on average (which is optimal). Our algorithm, however, benefits from more locality of reference, since we do not access an automaton but only a few variables which can be put in registers (with the exception of the $B$ table). As we show in the experiments, this difference makes our algorithm the fastest one.

When $m > w$, our algorithm is $O(nm^2/w)$ in the worst case (since each of the $O(mn)$ steps of the BDM algorithm forces work on $\lceil m/w \rceil$ computer words). The best case occurs when the text traversal using $s_1$ always performs its maximum shift after looking at one character, which is $O(n/w)$. We show, finally, that the average case is $O(n \log_{|\Sigma|} w / w)$. Clearly these complexities are worse than those of the simple BDM algorithm for long enough patterns. We show in the experiments up to which length our version is faster in practice.

The search cost for $s_1$ is $O(n \log_{|\Sigma|} w / w)$. With probability $1/|\Sigma|^w$, we find $s_1$ and check for the rest of the pattern. The check for $s_2$ in the window costs $O(w)$ at most. With probability $1/|\Sigma|^w$ we find $s_2$ and check $s_3$, and so on. The total cost incurred by the existence of $s_2 \ldots s_M$ is at most

$$\sum_{i=1}^{M-1} \frac{w}{|\Sigma|^{wi}} \;\le\; \varepsilon \;=\; \frac{w}{|\Sigma|^{w}} \left(1 + O\!\left(w/|\Sigma|^{w}\right)\right) \;=\; O(1)$$

which therefore does not affect the main cost to search $s_1$ (neither in theory, since the extra cost is $O(1)$, nor in practice, since $\varepsilon$ is very small). We consider the shifts now. The search of each subpattern $s_i$ provides a shift last$_i$, and we take the maximum shift. Now, the shift last$_i$ participates in this maximum with probability $1/|\Sigma|^{wi}$. The longest possible shift is $w$. Hence, if we sum (instead of taking the maximum) the longest possible shifts $w$ with their probabilities, we get the same sum as above, which is $\varepsilon = O(1)$. Therefore, the average shift is longer than last$_1$ and shorter than last$_1 + \varepsilon$ = last$_1 + O(1)$, and hence the cost is that of searching $s_1$ plus lower order terms.

5 Further Improvements

5.1 A Linear Algorithm

Although our algorithm has optimal average case, it is not linear in the worst case even for $m \le w$, since we can traverse the complete window backwards and advance it by one character. Our aim now is to reduce its worst case from $O(nm^2/w)$ to $O(nm/w)$, i.e. $O(n)$ when $m = O(w)$.

Improved variations on BDM already exist, such as Turbo_BDM and Turbo_RF [10,15], the last one being linear in the worst case and still sublinear on average. The main idea is to avoid retraversing the same characters in the backward window verification, using the fact that when we advance the window by last positions, we already know that $T_{i+last} .. T_{i+m-1}$ is a prefix of the pattern (recall Figure 3). The ending position of the prefix in the window is usually called the critical position. The main problem if this area is not retraversed is how to determine the next shift, since among all possible shifts in $T_{i+last} .. T_{i+m-1}$ we remember only the first one.

One strategy adds a kind of BM machine to the BDM algorithm. It works as follows: let $u$ be the pattern prefix before the critical position. If we reach the critical position after reading (backwards) a factor $z$ with the DAWG, it is possible to know whether $z^r$ is a suffix of the pattern $p$: if $z^r$ is a suffix (i.e. $p = u z^r$) we recognize the whole pattern $p$, and the next shift corresponds to the longest border of $p$ (i.e. the longest proper prefix that is also a suffix), which can be computed in advance. If $z^r$ is not a suffix, it appears in the pattern in a set of positions which is given by the state we reached in the suffix automaton. We shift to the rightmost occurrence of $z^r$ in the pattern.

It is not difficult to simulate this idea in our BNDM algorithm. To know if the factor $z$ we read with the DAWG is a suffix, we test whether $d_{|z|} = 1$. To get the rightmost occurrence, we seek the rightmost 1 in $D$, which we can get (if it exists) in constant time with $\log_2(D \;\&\; \sim(D-1))$ $^5$. We implemented this algorithm under the name BM_BNDM in the experimental part of this paper.

$^5$ It is faster and cleaner to implement this $\log_2$ by shifting the mask to the right until it becomes zero. Using this technique we can use the simpler expression $D$ ^ $(D-1)$ and get the same result.

This algorithm remains quadratic, because we do not keep a prefix of the pattern after the BM shift. We do that now. Recall that $u$ is the prefix before the critical position. The Turbo_RF (second variation) [10] uses a complicated preprocessing phase to associate in linear time an occurrence of $z^r$ in the pattern to a border $b_u$ of $u$, in order to obtain the maximal prefix of the pattern that is a suffix of $u z^r$. Moreover, the Turbo_RF uses a suffix tree, and it is quite difficult to use this preprocessing phase on DAWGs. With our simulation, this preprocessing phase becomes simple. To each prefix $u_i$ of the pattern $p$, we associate a mask Bord[$i$] that registers the starting positions of the borders of $u_i$ ($\varepsilon$ included). This table is precomputed in linear time. Now, to join one occurrence of $z^r$ to a border of $u$, we want the positions which start a border of $u$ and continue with an occurrence of $z^r$. The first set of positions is Bord[$i$], and the second one is precisely the current $D$ value (i.e. positions in the pattern where the recognized factor $z$ ends). Hence, the bits of $X$ = Bord[$i$] & $D$ are the positions satisfying both criteria. As we want the rightmost such occurrence (i.e. the maximal prefix), we take again $\log_2(X \;\&\; \sim(X-1))$. We implemented this algorithm under the name Turbo_BNDM in the experimental part of this paper.

5.2 A Constant-Space Algorithm

It is also interesting to notice that, although the algorithm needs $O(|\Sigma| m/w)$ extra space, we can make it constant space on a binary alphabet $\Sigma_2 = \{0, 1\}$.
The trick is that in this case, $B[1] = p$ and $B[0] = \sim B[1]$. Therefore, we need no extra storage apart from the pattern itself to perform all the operations. In theory, any text over a finite alphabet $\Sigma$ could be searched in constant space by representing the symbols of $\Sigma$ with bits and working on the bits (the misaligned matches have to be discarded later). This involves an average search time of

$$\frac{n \log_2 |\Sigma|}{m \log_2 |\Sigma|} \,\log_2 (m \log_2 |\Sigma|) \;=\; \text{normal time} \times \log_2 |\Sigma| \left(1 + \frac{\log_2 \log_2 |\Sigma|}{\log_2 m}\right)$$

which, if the alphabet is considered of constant size, is of the same order as the normal search time.

6 Extensions

We analyze now some extensions applicable to our basic scheme, which form a successful combination of efficiency and flexibility.

6.1 Classes of Characters

As in the Shift-Or algorithm, we allow each position in the pattern to match not only a single character but an arbitrary set of characters. We call "extended patterns" those that are more complex than a simple string to be searched. In this work the only extended patterns we deal with are those allowing a class of characters at each position.

We denote $p = C_1 C_2 \ldots C_m$ such extended patterns. A word $x = x_1 x_2 \ldots x_r$ in $\Sigma^*$ is a factor of an extended pattern $p = C_1 C_2 \ldots C_m$ if there exists an $i$ such that $x_1 \in C_{i-r+1}, x_2 \in C_{i-r+2}, \ldots, x_r \in C_i$. Such an $i$ is called a position of $x$ in $p$. A factor $x = x_1 x_2 \ldots x_r$ of $p = C_1 C_2 \ldots C_m$ is a suffix if $x_1 \in C_{m-r+1}, x_2 \in C_{m-r+2}, \ldots, x_r \in C_m$.

Similarly to the first part of this work, we design an automaton which recognizes all suffixes of an extended pattern $p = C_1 C_2 \ldots C_m$. This automaton is no longer a DAWG. We call it Extended DAWG. To our knowledge, this kind of automaton has never been studied. We first give a formal construction of the Extended DAWG (proving its correctness) and later present a bit-parallel implementation.

Construction. The construction we use is quite similar to the one we give for the DAWG, except for the new definition of suffixes. For any factor $x$ of $p$, we denote L-endpos($x$) the set of positions of $x$ in $p$. For example, L-endpos(baa) $= \{3, 7\}$ in the extended pattern b[a,b]abbaa, and L-endpos(bba) $= \{3, 6\}$ (notice that, unlike before, two sets of positions can overlap without one being a subset of the other). We define the equivalence relation $\equiv_E$ for $u, v$ factors of $p$ by

$u \equiv_E v$ if and only if L-endpos($u$) = L-endpos($v$).

Lemma 1. Let $p$ be an extended pattern and $\equiv_E$ the equivalence relation on its factors (as previously defined). The equivalence relation $\equiv_E$ is compatible with the concatenation on words.

This lemma allows us to define an automaton from this equivalence class. We define $\gamma_p(i, \sigma)$ with $i \in \{0, 1, \ldots, m, m+1\}$, $\sigma \in \Sigma$ by

$\gamma_p(i, \sigma) = \{i\}$ if $i \le m$ and $\sigma \in C_i$, and $\gamma_p(i, \sigma) = \emptyset$ otherwise.

States of the automaton are the equivalence classes of $\equiv_E$. There is an edge labeled by $\sigma$ from the set of positions $\{i_1, i_2, \ldots, i_k\}$ to $\gamma_p(i_1+1, \sigma) \cup \gamma_p(i_2+1, \sigma) \cup \ldots \cup \gamma_p(i_k+1, \sigma)$, if it is not empty. The initial node of the automaton is the set that contains all the positions. Terminal nodes of the automaton are the sets of positions that contain $m$. As an example, the suffix automaton of the word [a,b]aa[a,b]baa is given in Figure 6.

Fig. 6. Extended DAWG of the extended pattern [a,b]aa[a,b]baa (positions 0 to 7).

Lemma 2. The Extended DAWG of an extended pattern $p = C_1 C_2 \ldots C_m$ recognizes the set of suffixes of $p$.

We can use this new automaton to recognize the set of suffixes of an extended pattern $p$. We do not give an algorithm to build this Extended DAWG in its deterministic form; instead, we simulate the automaton in its nondeterministic form using bit-parallelism.

A bit-parallel implementation: from the above construction, the only modification that our algorithm needs is that the $B$ table has the $i$-th bit set for all characters belonging to the set of the corresponding position of the pattern. Therefore we simply change line 3 (part of the preprocessing) in the algorithm of Figure 5 to

For $i \in 1..m$, $c \in \Sigma$ do if $c \in p_{m-i+1}$ then $B[c] \leftarrow B[c] \;|\; 0^{m-i} 1 0^{i-1}$

such that now the preprocessing takes $O(|\Sigma| m)$ time but the search algorithm does not change.

We combine the flexibility of extended patterns with the efficiency of a Boyer-Moore-like algorithm. It should be clear, however, that the efficiency of the shifts can be degraded if the classes of characters are significantly large and prevent long shifts. However, this is much more resistant than some simple variations of Boyer-Moore since it uses more knowledge about the matched characters.

We point out now another extension related to classes of characters: the text itself may have basic characters as well as other symbols denoting sets of basic characters. This is common, for instance, in DNA databases. We can easily handle such texts. Assume that the symbol $C$ represents the set $\{c_1, \ldots, c_r\}$. Then we set $B[C] = B[c_1] \;|\; \ldots \;|\; B[c_r]$. This is much more difficult to achieve with algorithms not based on bit-parallelism.

6.2 Multiple Patterns

To search a set of patterns $P^1 \ldots P^r$ (i.e. reporting the occurrences of all of them) of length $m$ in parallel, we can use an arrangement proposed in [22], which concatenates the patterns as follows: $P = P^1_1 P^2_1 \ldots P^r_1 \; P^1_2 P^2_2 \ldots P^r_2 \; \ldots \; P^1_m P^2_m \ldots P^r_m$ (i.e. all the first letters, then all the second letters, etc.) and searches $P$ just as a single pattern. The only difference in the algorithm of Figure 5 is that the shift is not in one bit but in $r$ bits in line 15 (since we have $r$ bits per multipattern position) and that instead of looking for the highest bit $d_m$ of the computer word we consider all the $r$ bits corresponding to the highest position. That is, we replace the old $1 0^{m-1}$ test mask by $1^r 0^{r(m-1)}$ in line 12.

This will automatically search for $r$ words of length $m$ and keep all the bits needed for each word. Moreover, it will report the matches of any of the patterns and will not allow shifting more than what all patterns allow.

An alternative arrangement is: $P = P^1 P^2 \ldots P^r$ (i.e. just concatenate the patterns). In this case the shift in line 15 is for one bit, and the mask for line 12 is $(1 0^{m-1})^r$. On some processors a shift in one position is faster than a shift in $r > 1$ positions, which could be an advantage for this arrangement. On the other hand, in this case we must clear the bits that are carried from the highest position of a pattern to the next one, replacing line 15 by $D \leftarrow (D << 1) \;\&\; (1^{m-1} 0)^r$. This involves an extra operation. Finally, this arrangement makes it possible to have patterns of different lengths for the algorithm of Wu and Manber [22], which is not possible in their current proposal.

Clearly these techniques cannot be applied to the case $m > \lfloor w/2 \rfloor$. However, if $m \le \lfloor w/2 \rfloor$ and $rm > w$ we divide the set of patterns into $\lceil r / \lfloor w/m \rfloor \rceil$ groups, so that the patterns in each group fit in $w$ bits. Since this skips characters, it is better on average than [22]. As we show in the experiments, this is also better than sequentially searching each pattern in turn, even given that the shifts are the most conservative among all the $r$ patterns.
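The class-of-characters preprocessing of Section 6.1 can be sketched as follows. This is a hedged illustration rather than the experimental code: the names `build_class_masks`, `add_text_class` and `search` are hypothetical, an extended pattern is represented (by assumption) as a Python list of character sets $C_1 \ldots C_m$, and the search loop is the basic one of Figure 5 written with unbounded Python integers and an explicit $m$-bit mask.

```python
def build_class_masks(classes):
    """classes[k] is the set C_{k+1}. Bit i of B[c] is set iff c belongs to
    the class at pattern position m-i+1 (modified line 3 of Figure 5)."""
    m = len(classes)
    B = {}
    for i in range(1, m + 1):
        for c in classes[m - i]:
            B[c] = B.get(c, 0) | (1 << (i - 1))
    return B


def add_text_class(B, symbol, members):
    """Text symbol denoting a set of basic characters:
    B[C] = B[c_1] | ... | B[c_r]."""
    mask = 0
    for c in members:
        mask |= B.get(c, 0)
    B[symbol] = mask


def search(B, m, text):
    """Basic BNDM search loop; returns 1-based starting positions."""
    n = len(text)
    full, high = (1 << m) - 1, 1 << (m - 1)
    occurrences, pos = [], 0
    while m > 0 and pos <= n - m:
        j, last, D = m, m, full
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    last = j
                else:
                    occurrences.append(pos + 1)
            D = (D << 1) & full
        pos += last
    return occurrences


# Extended pattern b[a,b]a: matches both "baa" and "bba".
classes = [{"b"}, {"a", "b"}, {"a"}]
B = build_class_masks(classes)
print(search(B, len(classes), "xbaaybbaz"))

# A text symbol "N" standing for the set {a, b}.
add_text_class(B, "N", {"a", "b"})
print(search(B, len(classes), "xbNa"))
```

Only the table construction differs from the exact-matching case, which is the point made in the text: the search loop is unchanged.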
patterninatextallowingatmostk\errors".theerrorsareinsertions,deletions Approximatestringmatchingistheproblemofndingalltheoccurrencesofa 6.3ApproximateStringMatching oneforlowerrorlevels. anecientlterisproposedtodeterminethatlargetextareascannotcontain allthepiecesinparallel.sincekerrorscannotdestroythek+1pieces,someof thepiecesmustappearwithnoerrorsclosetoeachoccurrence.theyusethe andreplacementstoperforminthepatternsothatitmatchesthetext.in[22], thebestofbothworlds:ourperformanceiscomparabletoboyer-moorealgorithmsandwekeeptheexibilityofbit-parallelismhandleclassesofcharacters. Weshowintheexperimentshowouralgorithmperformsinthissetup. 7ExperimentalResults Weranextensiveexperimentsonrandomandnaturallanguagetexttoshow Ourmultipatternsearchtechniquepresentedintheprevioussectioncombines anoccurrence.itisbasedondividingthepatternink+1piecesandsearching amultipatternboyer-moorestrategyispreferred,whichisfasterbutdoesnot handleclassesofcharactersandotherextensions.thisalgorithmisthefastest multipatternsearchalgorithmmentionedinthepreviousparagraph.in[4,3], textsandpatternswith=2to64,aswellasnaturallanguagetextanddna sequences. UltraSparc-1of167MHz,with64MbofRAM,runningSunOS5.5.1.WemeasureCPUtimes,whicharewithin2%with95%condence.Weusedrandom howecientareouralgorithmsinpractice.theexperimentswererunonasun thebmfamily.turbobndmiscompetitivewithsimplebndmandhaslinear simplebndm.classicalbdm,ontheotherhand,issometimesslowerthan form2-6.thefastestalgorithmisbmbndm,thoughitisverycloseto KMP(veryslowtoappearintheplots,closeto0.14sec/Mb),Shift-Or(not patterns.thecomparisonincludesthebestknownalgorithms:bm,bm-sunday, alwaysshown,closeto0.07sec/mb),classicalbdm,andourthreebit-parallel variants:bndm,bmbndmandturbobndm. 
Ourbit-parallelalgorithmsarealwaysthefastestforshortpatterns,except WeshowinFigure7someoftheresultsforshort(mw)andlong(m>w) BNDM).Forlargeralphabets,ontheotherhand,anotherverysimplealgorithm ismorecomplex(noticethatboyer-mooreisfasterthanbdm,butslowerthan worstcase.ouralgorithmsareespeciallygoodforsmallalphabetssincetheyuse moreinformationonthematchedpatternthanothers.theonlygoodcompetitor forsmallalphabetsisboyer-moore,whichhoweverisslowerbecausethecode getsveryclose:bm-sunday.however,wearealwaysatleast10%faster. 6Wedidnotincludethemorecomplexvariationsofouralgorithmbecausetheyhave alreadybeenshownverysimilartothesimpleone.wedidnotincludealsothe algorithmswhichareknownnottoimprove,suchasshift-orandkmp. Onlongerpatterns6ouralgorithmceasestoimprovebecauseitbasically
searchesfortherstwlettersofthepattern,whileclassicalbdmkeepsimproving.hence,ouralgorithmceasestobethebestone(beatenbybdm)form 90-150.Thisvaluewouldatleastduplicateina64-bitarchitecture. generatedmanuallyasfollows:weselectfromanenglishtextaninfrequentword, thebndmalgorithm.wecomparetheeciencyagainstshift-or.theresultis presentedintable1,whichshowsthateveninthecaseofthreeinitialornal namely"responsible"(closeto10matchespermegabyte).thenwereplaceits lettersallowingalargeclassofcharacterstheshiftsaresignicantandwedouble theperformanceofshift-or.hence,ourgoalsofhandlingclassesofcharacters rstorlastcharactersbytheclassfa::zg.thiswilladverselyaecttheshiftsof withimprovedsearchtimesareachieved. Weshowalsosomeillustrativeresultsusingclassesofcharacters,whichwere responsible6.582.71 responsibl?6.512.96 responsib??6.523.23 responsi???6.493.40?esponsible6.462.93??sponsible6.553.42???ponsible6.513.78 PatternShift-OrBNDM Table1.Searchtimeswithclassesofcharacters,in1/100-thofsecondspermegabyte onenglishtext.thequestionmark'?'representstheclassfa::zg. rstarrangementisslightlymoreecientthanthesecondone,theyarealways calledmulti-bndm(1)and(2)attendingtotheirpresentationorder),against vesequentialsearcheswithbndm(calledbndminthelegend),andagainst theparallelversionproposedin[22](calledmulti-wm).asitcanbeseen,our moreecientthanasequentialsearch(althoughtheimprovementisnotve-fold thatalthoughwetaketheminimumshiftamongallthepatterns,wecanstill dobetterthansearchingeachpatterninturn.wetakerandomgroupsofve patternsoflength6andcompareourmultipatternalgorithm(initstwoversions, WepresentinFigure8someresultsonourmultipatternalgorithm,toshow forapproximatestringmatching.weincludethefastestknownalgorithmsin thecomparison[4,3,7,13,16,23,22].wecomparethosealgorithmsagainst buttwo-orthree-foldbecauseofshortershifts),andaremoreecientthanthe ourversionof[4](wherethesundayalgorithmisreplacedbyourbndm),while proposalof[22]provided8. 
weconsider[22]notasthebit-parallelalgorithmpresentedtherebuttheirother proposal,namelyreductiontoexactsearchingusingtheiralgorithmmulti-wm formultipatternsearch(showninthepreviousexperiment).figure9showsthe Finally,weshowtheperformanceofourmultipatternalgorithmwhenused
results for different alphabet sizes and m = 20.

Since BNDM is not very good for very short patterns, the approximate search algorithm ceases to be competitive shortly before the original version [4]. This is because the length of the patterns to search for is O(m/k). Despite this drawback, our algorithm is quite close to [4] (sometimes even faster), which makes it a reasonably competitive yet more flexible alternative, while being faster than the other flexible candidate [22].

8 Conclusions

We present a new algorithm (called BNDM) based on the bit-parallel simulation of a nondeterministic suffix automaton. This automaton has been previously used in deterministic form in an algorithm called BDM. Our new algorithm is experimentally shown to be very fast on average. It is the fastest algorithm in all cases for patterns from length 5 to 110 (on English; the bounds vary depending on the alphabet size and the architecture). We also present some variations called TurboBNDM and BMBNDM, which are derived from the corresponding variants of BDM. These variants are much more simply implemented using bit-parallelism and become practical algorithms. TurboBNDM has average performance very close to BNDM, but with O(n) worst-case behavior, while BMBNDM is slightly faster than BNDM. The BNDM algorithm can be extended simply and efficiently to handle classes of characters, multiple pattern matching and approximate pattern matching, among others.

The new suffix automaton we introduce and simulate for classes of characters has never been studied. Its study should make it possible to extend BDM and TurboRF to handle classes of characters.

The Agrep software [21] is in many cases faster than BNDM. However, Agrep is just a BM algorithm which uses pairs of characters instead of single ones. This is an orthogonal technique that can be incorporated in all algorithms, and a general study of this technique would make it possible to improve the speed of pattern-matching software. We plan to work on this idea too.

References

1. R. Baeza-Yates. Text retrieval: Theory and practice. In 12th IFIP World Computer Congress, volume I, pages 465-476. Elsevier Science, September 1992.
2. R. Baeza-Yates and G. Gonnet. A new approach to text searching. CACM, 35(10):74-82, October 1992.
3. R. Baeza-Yates and G. Navarro. A faster algorithm for approximate string matching. In Proc. of CPM'96, pages 1-23, 1996.
4. R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185-192. Springer-Verlag, 1992. LNCS 644.
5. A. Blumer, A. Ehrenfeucht, and D. Haussler. Average sizes of suffix trees and dawgs. Discrete Applied Mathematics, 24(1):37-45, 1989.
6. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762-772, 1977.
7. W. Chang and J. Lampe. Theoretical and empirical comparisons of approximate string matching algorithms. In Proc. of CPM'92, pages 172-181, 1992. LNCS 644.
8. M. Crochemore. Transducers and repetitions. Theor. Comput. Sci., 45(1):63-86, 1986.
9. M. Crochemore, A. Czumaj, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Fast practical multipattern matching. Rapport 93-3, Institut Gaspard Monge, Université de Marne la Vallée, 1993.
10. M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Speeding up two string-matching algorithms. Algorithmica, (12):247-267, 1994.
11. M. Crochemore and W. Rytter. Text algorithms. Oxford University Press, 1994.
12. R. N. Horspool. Practical fast searching in strings. Softw. Pract. Exp., 10:501-506, 1980.
13. P. Jokinen, J. Tarhio, and E. Ukkonen. A comparison of approximate string matching algorithms. Software Practice and Experience, 26(12):1439-1458, 1996.
14. D. E. Knuth, J. H. Morris, Jr, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(1):323-350, 1977.
15. T. Lecroq. Recherches de mot. Thèse de doctorat, Université d'Orléans, France, 1992.
16. G. Navarro. A partial deterministic automaton for approximate string matching. In Proc. of WSP'97, pages 112-124. Carleton University Press, 1997.
17. G. Navarro and M. Raffinot. A bit-parallel approach to suffix automata: Fast extended string matching. Technical Report TR/DCC-98-1, Dept. of Computer Science, Univ. of Chile, Jan 1998. ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/bndm.ps.gz.
18. M. Raffinot. Asymptotic estimation of the average number of terminal states in dawgs. In R. Baeza-Yates, editor, Proc. of WSP'97, pages 140-148, Valparaiso, Chile, November 12-13, 1997. Carleton University Press.
19. M. Raffinot. On the multi backward dawg matching algorithm (MultiBDM). In R. Baeza-Yates, editor, Proceedings of the 4th South American Workshop on String Processing, pages 149-165, Valparaiso, Chile, November 12-13, 1997. Carleton University Press.
20. D. Sunday. A very fast substring search algorithm. CACM, 33(8):132-142, August 1990.
21. S. Wu and U. Manber. Agrep - a fast approximate pattern-matching tool. In Proc. of USENIX Technical Conference, pages 153-162, 1992.
22. S. Wu and U. Manber. Fast text searching allowing errors. CACM, 35(10):83-91, October 1992.
23. S. Wu, U. Manber, and E. Myers. A sub-quadratic algorithm for approximate limited expression matching. Algorithmica, 15(1):50-67, 1996.
24. A. C. Yao. The complexity of pattern matching for a random string. SIAM Journal on Computing, 8(3):368-387, 1979.
[Figure 7: six plots, in three rows and two columns, of search time against pattern length m; the left column covers m = 5 to 30 and the right column m = 40 to 160. Curves: BDM, BNDM, BMBNDM, TurboBNDM, Shift-Or, Sunday, Boyer-Moore.]

Fig. 7. Times in 1/100-th of seconds per megabyte. From the first to the third row: random text with σ = 4, random text with σ = 64, and English text. The left column shows short patterns, the right column shows long patterns.
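As a reading aid for the BNDM curves in Fig. 7, the following is a minimal Python sketch of the bit-parallel backward scan that the conclusions summarise: a window of length m is read right to left while a bit mask tracks which pattern substrings are still compatible with the characters read so far. It is not the authors' implementation. The function name `bndm_search` is hypothetical, and Python's unbounded integers stand in for the machine word that a real implementation would use.

```python
def bndm_search(pattern, text):
    """Return the 0-based start positions of all occurrences of pattern in text.

    Illustrative sketch only: the name and the use of Python integers as bit
    masks are assumptions, not taken from the paper.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # B[c] has bit (m - 1 - i) set whenever pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # m ones: every pattern position is still active
    high = 1 << (m - 1)   # set when the text read so far is a pattern prefix

    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m        # number of window characters not yet read
        last = m     # shift to apply once this window is abandoned
        D = full
        while D != 0:
            # Read the window backwards and keep only compatible positions.
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    # A pattern prefix ends the window: remember its position.
                    last = j
                else:
                    # The whole window was read and matched.
                    occurrences.append(pos)
            D = (D << 1) & full
        pos += last
    return occurrences


if __name__ == "__main__":
    print(bndm_search("abra", "abracadabra"))  # [0, 7]
```

The inner loop stops as soon as `D` becomes zero, which is the point where the characters read so far are not a substring of the pattern. The window is then shifted by `last`, which aligns the longest pattern prefix recognised in the window with the window's start.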
[Figure 8: plot of search time t against alphabet size (2, 4, 8, 16, 32, 64). Curves: Multi-BNDM (1), Multi-BNDM (2), Multi-WM, BNDM.]

Fig. 8. Times in 1/100-th of seconds per megabyte, for multipattern search on random text of different alphabet sizes (x axis).

[Figure 9: two plots of search time t against the number of errors k (1 to 10). Curves: Ex. Part. (ours), Ex. Part. [4], Ex. Part. [22], Bit Parall. [3], Col. Part. [7], Counting [13], DFA [16], 4-russians [23].]

Fig. 9. Times in seconds per megabyte, for random text on patterns of length 20, and σ = 16 and 64 (first and second column, respectively). The x axis is the number of errors allowed.
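The "Ex. Part." curves in Fig. 9 rely on reducing approximate search to exact multipattern search, and the text notes that the pieces searched for then have length O(m/k). The sketch below illustrates why. It assumes the usual partitioning argument: if the pattern is cut into k + 1 pieces, any occurrence with at most k errors must contain at least one piece unchanged. The helper `partition_pattern` and the even split it produces are assumptions made for illustration, not details taken from the paper.

```python
def partition_pattern(pattern, k):
    """Split pattern into k + 1 contiguous pieces whose lengths differ by at most one.

    Hypothetical helper for illustration; the paper does not specify how
    the pieces are chosen.
    """
    m = len(pattern)
    count = k + 1
    if count > m:
        raise ValueError("need k + 1 <= len(pattern) so that no piece is empty")
    base, extra = divmod(m, count)
    pieces = []
    start = 0
    for i in range(count):
        # The first `extra` pieces get one additional character.
        length = base + (1 if i < extra else 0)
        pieces.append(pattern[start:start + length])
        start += length
    return pieces


if __name__ == "__main__":
    pattern = "abcdefghijklmnopqrst"  # m = 20, as in Fig. 9
    for k in range(1, 11):
        pieces = partition_pattern(pattern, k)
        print(k, min(len(piece) for piece in pieces))
```

For m = 20 the shortest piece has length 10 when k = 1 but only 5 when k = 3, and it keeps shrinking as k grows. This is consistent with the observation that the BNDM-based filter loses ground as more errors are allowed, since BNDM is weakest on very short patterns.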