- Alexander Hensley
- 8 years ago
The IMS Corpus Workbench
Corpus Administrator's Manual

Oliver Christ
Institut für maschinelle Sprachverarbeitung
Universität Stuttgart
- Computerlinguistik -
Azenbergstr. 12
D 70174 Stuttgart 1

Created: Thu Feb 24 10:34:11 1994 (oli)
Last Modified: Wed Nov 9 14:33:27 1994 (oli)
tc.bibentry: christ:94b
Released: - not yet -
Contents

1 Overview
  1.1 Introduction
  1.2 The role of the Corpus Administrator
  1.3 Organization of this manual
  1.4 Credits and Acknowledgements
2 Internal corpus representation
  2.1 Positional attributes
    2.1.1 Integerized files
    2.1.2 Inversed item sequence
    2.1.3 Example: a simple word search
    2.1.4 The set of positional attributes
  2.2 Other attribute types
    2.2.1 Structural attributes
    2.2.2 Alignment attributes
    Bigram and mapping tables
  2.3 External tools and dynamic attributes
3 Encoding: Transforming a corpus into its internal representation
  3.1 The internal representation of a corpus
  3.2 The encode program
  3.3 The makeall program
  3.4 Space requirements
    Positional attributes
    Structural attributes
4 The corpus registry
  4.1 Some remarks about nomenclature
  4.2 The contents of a registry file
    The header
    Positional attributes
    Structural attributes
    Alignment attributes
    Dynamic attributes
    Mapping tables
    n-gram tables
  4.3 Registration of remote corpora
  4.4 A last example
  4.5 Steps to follow
5 Remote access: client and server setup
  5.1 The .rat and .ratlog files
  5.2 How to start the corpus data server
6 Utilities and debugging tools
  6.1 Decoding of corpus and attribute information
    Decoding of corpus information: decode
    Decoding of word lists: lexdecode
  6.2 Creation and Decoding of Bigram Tables
    Creation of bigram tables: gen-bigrams
    Decoding of bigram tables: decode-bigrams
  6.3 Creation and Decoding of Mapping Tables
    Creation of mapping tables: gen-mapping-table
    Decoding of mapping tables: decode-mapping-table
  6.4 General utilities
    Comparing word lists and corpora: check-coverage
    Converting internal integers to readable numbers: itoa
    Converting readable numbers to internal integers: atoi
7 Access control and security issues
  7.1 Controlling local access to corpora
  7.2 Controlling remote access to corpora
A Hardware and operating system requirements
B Reused software packages and copyright notices
  B.1 The regular expression matcher by Henry Spencer
Chapter 1
Overview

1.1 Introduction

The IMS corpus workbench is a set of tools for the efficient encoding, representation and querying of large text corpora. This manual describes how to encode a text corpus and how the various administration tools must be used to transform a text corpus into the representation used by the access tools. This manual does not describe the functionality of the query tools or the architecture of the workbench in general. If the reader is not familiar with the overall architecture, we recommend a "top-down" reading through the more general papers, especially [Christ, 1994] for an overview of the system architecture as a whole.

The internal representation of a corpus consists of a set of files which represent the corpus data, the different "items" (i.e., words) used in the corpus and several index files for efficient lookup. To transform a text corpus from its textual representation to the internal representation used by the IMS workbench, the following steps have to be performed:

1. transformation of the text file into one-word-per-line format;
2. encoding of the text file;
3. declaration of the corpus in a global "registry directory";
4. and building several file indices.

Steps 1 and 3 have to be done manually; for steps 2 and 4 there are tools within the workbench.

The third step, the registration of a corpus, is necessary since almost all tools (except the one used in step 2 above) access a corpus via a symbolic name. When a symbolic name is passed to a tool, it is looked up in a central directory (called the "corpus registry"), where the tool expects to find a file with the very same name as the symbolic name of the corpus to be accessed. This file holds a description of the components of the corpus, mainly a list of where the components are stored physically. So a user does not have to know where a corpus is stored; he or she only has to know its symbolic name to access it.
After a corpus is transformed into its internal representation and registered, it can be used by the various tools of the workbench, for example the query tools (xkwic, cqp, print-aligned).

1.2 The role of the Corpus Administrator

Within the IMS corpus workbench, the corpus administrator first has the task of providing users with new corpora or of changing existing corpora when some information has to be added or updated. Second, the administrator has to properly install the usually large corpus files in the file system and to find an "optimal" place with regard to backup policies, disk usage and access efficiency. Third, the corpus administrator has to care about access control, since corpora exist where copyrights or license agreements inhibit unrestricted access, even within one institution. These tasks are similar to "standard" system administration. We therefore suggest that the corpus administration tasks are fulfilled by someone who is familiar with the standard UNIX text processing tools, backup strategies, and security and access control, that is, your local system administrator.

1.3 Organization of this manual

This manual is organized as follows. Chapter 2 explains the internal data structures which are used to store the corpus data. You may skip the entire chapter if you want; it is not necessary for the other chapters, but useful if you have problems with the tools or want to learn how to manipulate the data files. Chapter 3 describes the various steps which are necessary to transform a corpus from its textual representation into its internal representation. Chapter 4 describes in detail the registry directory and the format of the files which describe the physical attributes of a corpus. Chapter 5 describes how to set up the client-server capabilities of the workbench. Chapter 6 describes utilities and related tools which either add more (or other types of) information to a corpus or are useful for other purposes, for example for debugging a corpus (during encoding). Another important point is access control for corpora, which is discussed in chapter 7. To check whether the tools can run at all on your system, you may refer to appendix A for the hardware and operating system specific requirements of our tools.
1.4 Credits and Acknowledgements

The internal data structures we use in Cqp, Xkwic and some other tools of the workbench make use of integerized data files and reversed indices. Both of these techniques have been well known in the area of information processing for many decades, but to our knowledge the first who applied them to text and corpus processing in the linguistic area was Ken Church. He deserves our greatest thanks for pointing us to these methods.

Neither the authors, nor IMS, nor the University of Stuttgart make any representations about the suitability of the software described herein or the associated documentation for any purpose. It is provided "as is" without express or implied warranty. We disclaim all warranties with regard to the software described herein or the related documentation, including all implied warranties of merchantability and fitness. In no event shall we be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.
Chapter 2
Internal corpus representation

This chapter explains how corpus data is represented internally. When you understand the internal representation, you can use the tools of the toolbox to create, update or change corpus information without having to go back to the textual version and encoding the whole stuff again. You will also be able to figure out how to encode the internal representation for files which cannot be computed by the tools of the toolbox, for example due to memory problems, software bugs or limitations.

If you do not need to "hack" with the corpus data, you may skip the entire chapter. The understanding of the internal representation is not necessary for the other parts of this manual, but useful if you encounter problems with the tools.

2.1 Positional attributes

Within the IMS corpus workbench, a corpus can have an arbitrary number of annotations of different types. In our system, a corpus is primarily regarded as a sequence of words (not as a sequence of characters). The words, then, are numbered, so that we can directly access the word at a certain corpus position p (i.e., the first word in the corpus, or, in general, the nth word of the corpus). This leads to the more general notion of positional attributes, which is the most important annotation type. Attributes of this class have a (string) value at each corpus position. [1]

[Figure 2.1: Corpus positions and values. Two rows of values over the corpus positions up to n-2, n-1: "word" (Pierre, Vinken, ",", 61, years, old, ..., blessing) and "pos" (N, N, IP, NUM, N, ADJ, ..., N, IP).]

[1] When the corpus is stored in a verticalized one-word-per-line format, a corpus position can also be regarded as the number of a line in this representation.
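The relation between corpus positions and lines of a verticalized file can be tried out with standard tools. The following is only a sketch on toy data taken from figure 2.1; the file names word.1wpl and pos.1wpl and the helper value_at are assumptions made for this illustration and are not part of the toolbox. Positions are counted from 0, so position p corresponds to line p+1.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Two positional attributes in one-word-per-line format (toy data).
printf 'Pierre\nVinken\n,\n61\nyears\nold\n' > word.1wpl
printf 'N\nN\nIP\nNUM\nN\nADJ\n' > pos.1wpl

# value_at FILE P: print the value at corpus position P (0-based),
# which is simply line P+1 of the verticalized file.
# (Hypothetical helper, not a workbench tool.)
value_at() {
    sed -n "$(( $2 + 1 ))p" "$1"
}

# Print every corpus position together with its values for both attributes.
paste word.1wpl pos.1wpl | awk '{ printf "%d\t%s\n", NR - 1, $0 }'

value_at word.1wpl 3   # prints: 61
value_at pos.1wpl 3    # prints: NUM
```

Both files must have the same number of lines for this to make sense, which mirrors the requirement that all positional attributes of a corpus have equal length.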
The corpus text falls within the class of positional attributes, since we can specify, for each corpus position, the word which occurs at that position. The positional attribute which holds the corpus text proper always has the predefined attribute name "word". Other positional attributes are, for example, part-of-speech tags, which are assigned to the words of the corpus. In our view, we regard POS-tags as assigned to a corpus position rather than to the word at that position. Then, the positional attributes "word" and "tag" do not differ very much anymore: both have, for each corpus position, a value which is, in our case, a string (as illustrated in figure 2.1). We therefore use the same internal representation for the word sequence of the corpus (the corpus text) and the tag sequence (the associated POS-tags), as well as for other, additional positional attributes like lemmas, morphosyntactic tags, etc. In other words, a tagged corpus is in our view a set of two positional attributes of equal length, one of which captures the sequence of words, the other captures the sequence of tags. In the following, we therefore use the term "item" to abstract from the type of information encoded in such a positional attribute (here, word vs. tag). A corpus in general, then, is a collection of attributes of different types.

The question of the internal representation of corpora with multiple (positional) annotations can thus be reduced to the question of representing a single positional attribute (remember that all positional attributes must be of equal length, that is, encode item streams of equal length).

The two key concepts of the internal representation of a positional attribute are:

- integerized representation: items are encoded as integer numbers, where equal items (words, ...) get the same integer code. The sequence of items is then represented as a sequence of integer numbers;
- inversed file indices: for the sequence of numbers, an inversed file is created. The inversed file captures, for each item (better: item code), the set of occurrences of the item in the positional attribute.

For the construction of the integer code, you normally need a segmentation or tokenization tool, since "the," and "the" are considered different and undesirably get different codes.
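The effect of missing tokenization can be seen with a deliberately crude sketch. The function tokenize below is an assumption made for this illustration only (it is not a workbench tool): it splits the text at blanks and detaches a single trailing punctuation character from each word. A real tokenizer has to do much more; for example, this one would wrongly split an abbreviation such as "Nov." into two items.

```shell
# tokenize: read raw text, write one item per line.
# Blanks become line breaks; one trailing , . ; : ! or ? is split off.
# (Hypothetical helper for illustration, not part of the toolbox.)
tokenize() {
    tr -s ' ' '\n' |
        sed 's/\([^,.;:!?]\)\([,.;:!?]\)$/\1 \2/' |
        tr ' ' '\n'
}

# Without tokenization, "the," is a different item than "the":
printf 'the dog saw the, cat\n' | tr -s ' ' '\n' | grep -c '^the$'   # prints: 1

# With tokenization, both occurrences are the same item:
printf 'the dog saw the, cat\n' | tokenize | grep -c '^the$'         # prints: 2
```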
The advantage of the integer code is that the represented items have equal internal length (in the case of integers, 4 bytes on our machines). Since the length of the item sequence is known and the items are of equal length, the item sequence can be handled like an array of items, with the advantage of random access. The inversed file is needed for lookup: since it directly indexes the set of occurrences of a given item (code), the occurrences can be computed in a single step.

2.1.1 Integerized files

As said above, the first set of data structures is an integerized file of the textual representation of the item sequence. Then, the item sequence is represented as a sequence of integer codes. It is obvious that at least two functions are needed to handle this encoding:
- first, a function to compute the integer code of a given item (a string);
- second, a function to retrieve the (character) string when the code is given.

These two functions require some auxiliary data structures to be efficiently computable. The first data structure is the item list or "lexicon": it captures the set of (different) items. Internally, this is the set of strings occurring in the item sequence, where a NULL character (octal \000) is padded at the end of each word. The file is not sorted (but it may be). A UNIX command to produce this file would be:

    (2.1)  sort -u 1wpl-item-seq | tr '\n' '\0' > lexicon

where it is assumed that the input item sequence is in one-word-per-line format. In this example, the output would be sorted, but this is not necessary. The item list already defines the item code for each item, since it is assumed that the first item in the item list has code 0, the next one has code 1, and so on.

Note that "traditional" versions of tr delete ASCII 0 (\0) from the input stream, so that the example above will not work with a "traditional tr". GNU's tr does not have this bug.

For the lookup of a string in this list, it is useful to have an index of the starting positions of the strings in this file. This index gives, for each item code, the file offset (in bytes) of the corresponding string in the item list. Thus, the starting position of the string represented by the item code c is computed in one step via this index, which we call the item list index or lexicon index. Since the strings in the item list are terminated with the NULL character (which must not occur in the items themselves), the string represented by an item code is everything from the starting position computed by the item index up to the next NULL character.

Again, this index can be computed by a UNIX command when the lexicon is already computed:

    (2.2)  tr '\0' '\n' < lexicon |
           gawk 'BEGIN { pos = 0 }
                 { print pos; pos = pos + length($0) + 1 }' |
           atoi > lexicon.idx

atoi is a utility program included in the toolbox and maps numbers (represented textually as a sequence of digits) to their internal representation.

The next data structure supports the mapping from strings to their item codes. This could be done in a number of different ways; currently, binary search over a sorted string index is implemented. [2] For example, the same functionality could be achieved with tries, which perhaps would need more space, but could compute the conversion in n steps where n is the length of the input string.

[2] The method currently implemented in the workbench is very simple and could be sped up a lot, but since it is rarely used (all computations are done on the item codes, whenever possible, instead of strings), we didn't yet convert it to a more efficient method.
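How the item list and its index work together can be sketched without the toolbox. The example below is an illustration under stated assumptions: it keeps the index as readable text in a file called lexicon.idx.asc instead of converting it with atoi, it uses a four-item toy lexicon, and code_to_string is a hypothetical helper, not a workbench function. It also assumes a tr that passes NUL bytes through (such as GNU tr) and single-byte characters, so that string length equals byte length.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# One-word-per-line item sequence (toy corpus).
printf 'the\ncat\nsaw\nthe\ndog\n' > 1wpl-item-seq

# Item list: the distinct items, each terminated by a NUL byte.
LC_ALL=C sort -u 1wpl-item-seq | tr '\n' '\0' > lexicon

# Item list index as readable text: line c+1 holds the byte offset
# of the item with code c.
tr '\0' '\n' < lexicon |
    LC_ALL=C awk 'BEGIN { pos = 0 } { print pos; pos += length($0) + 1 }' \
    > lexicon.idx.asc

# code_to_string CODE: print the string represented by item code CODE.
# The string runs from its offset up to the next NUL byte.
code_to_string() {
    offset=$(sed -n "$(( $1 + 1 ))p" lexicon.idx.asc)
    # tail -c +N starts at byte N counted from 1, hence offset + 1.
    tail -c "+$(( offset + 1 ))" lexicon | tr '\0' '\n' | head -n 1
}

cat lexicon.idx.asc    # prints the offsets 0, 4, 8 and 12, one per line
code_to_string 2       # prints: saw
```

Because the lexicon is sorted here, the items cat, dog, saw and the receive the codes 0 to 3 in that order.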
The binary search requires a sorted structure. For this purpose, we do not keep a sorted item list, but rather another index (denoted by Ls) which holds, for each position p of an item in a "virtual" sorted item list (ranging from 0 to the number of encoded items minus 1), the item code at this position. So, Ls(0) is the item code of the "smallest" item, Ls(1) is the code of the second-smallest item, etc. The sorted item list can thus be textually printed by the function

    for (i = 0; i < "size of item set"; i++) {
        code = sortidx[i];
        s = lexidx[code];
        print s;
    }

Here, for each possible position i in the sorted index, first the item code code at that position is computed. Then, through accessing the item index, the character string represented by code is determined, which is then printed.

As you can imagine, this file can easily be produced by a UNIX command:

    (2.3)  tr '\0' '\n' < lexicon |
           gawk '{ print NR-1 "\t" $0 }' |
           sort -k 2 |
           gawk '{ print $1 }' |
           atoi > lexicon.srt

The first line computes the strings from the item list, which are then prefixed by their code (which is the "position" in the item list), beginning with code 0 for the first word. This list of code/value pairs is then sorted by the values, which occur in the second column. The output of the sorting is then filtered, so that only the codes are printed. The code sequence is then transformed into the internal format and written to the index file.

Note: One of the reasons we do not use these UNIX commands to create the data structures is that the UNIX sort command sometimes handles the order of 8-bit characters differently from other programs (it works with signed characters, whereas internally we work with unsigned characters). So the UNIX commands which use sort only create the same files when the standard 7-bit ASCII character set is used. The internal functions which map from items to item codes will not work properly otherwise.

The data structures used to represent the encoded item sequence and the associated auxiliary data structures which facilitate the necessary mappings are illustrated in figure 2.2.

The only file for which we didn't yet give a UNIX command is the item sequence (or better, the sequence of encoded items). gawk's arrays can be used for this purpose. One possibility is to read an already existing item list file, which may be produced by the commands above. Another possibility is to produce the item list and the indices in a single gawk run. The script below can be used for this second purpose, but it assigns other item codes (items are numbered in the order of their first occurrence):

    (2.4)  BEGIN {
               maxcode = 0;
               position = 0;
           }
           {
               if (!($0 in itemlist)) {
                   # new item: append it to the item list and its index
                   print $0 > "lexicon.asc"
                   itemlist[$0] = maxcode;
                   print position > "lexicon.idx.asc"
                   code = maxcode;
                   maxcode++;
                   position = position + length($0) + 1;
               } else
                   code = itemlist[$0];
               print code > "corpus.asc"
           }

[Figure 2.2: Integerized items and associated data structures. Shows the item sequence, the item list index, the item list (never, the, cucumber, ;, &, wine, ...) and the sorted index, with the mapping item index ==> string.]

[Page footer: Oliver Christ, IMS Stuttgart, August 9, 1996]

After the code is executed with a text file as input, the ASCII representations have to be converted into the internal format (this could be done via pipes in the gawk script also, but we left that out here for the sake of clarity):

    (2.5)  atoi < corpus.asc > corpus
           atoi < lexicon.idx.asc > lexicon.idx
           tr '\n' '\0' < lexicon.asc > lexicon
           rm -f *.asc

After that, command 2.3 can be used to produce the sorted item list index.

2.1.2 Inversed item sequence

The second set of data structures concerns the inversed file index associated with the item sequence. This inversed file holds, for each item code, the set of positions in the item sequence where the item code occurs. Through the mapping functions introduced in the last section, we can also regard the inversed item sequence as a list of corpus positions where a certain word or part-of-speech tag occurs.

The inversed file is represented by a set of three files:

- first, the inversed file itself, which contains a set of corpus positions;
- second, an index into this file. This index returns, for each item code, the start point of the associated occurrences in the inversed file;
- third, a table of item code frequencies, which gives, for each item code, the number of occurrences of the code in the corpus (which is, of course, equal to the size of the set of occurrences).

The three files can also be computed by UNIX commands. First, the reversed sequence is produced by the following command:

    (2.6)  itoa corpus |
           gawk '{ print $1 "\t" NR-1 }' |
           sort -ns |
           gawk '{ print $2 }' |
           atoi > corpus.rev

First, the internal representation of the item sequence is converted into readable numbers. Each number of this sequence is then suffixed with its position in the corpus, so that we get code/position pairs, which are then sorted by the code. From this sequence, the code is stripped off, so that we only get the sequence of positions, which is exactly the inversed file.

The frequencies can already be computed in the gawk encode script (2.4), but another possibility is a slightly modified version of the script above:

    (2.7)  itoa corpus |
           gawk '{ print $1 "\t" NR-1 }' |
           sort -ns |
           gawk '{ print $1 }' |
           uniq -c |
           gawk '{ print $1 }' |
           atoi > corpus.cnt

Here, we keep the code sequence of the code/position pairs. This sequence of codes appears in sorted order. By the call to the uniq utility, equal subsequent lines (here: codes) are collapsed into a single line and counted. The counts are then extracted from the uniq output and converted to internal format. [3]

The last file, the index into the inversed file, can simply be computed from the frequencies by summing them up:

    (2.8)  itoa corpus.cnt |
           gawk 'BEGIN { pos = 0 } { print pos; pos += $1 }' |
           atoi > corpus.rdx

[3] It would be more efficient to use a gawk array to hold the item code counts, since the sort step could be omitted. The version here is just for clarity.
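The interplay of the three files can be checked on a toy example without the toolbox. The sketch below rests on several assumptions: it works on readable text files (suffix .asc) instead of the internal integer format, so no itoa or atoi is needed, the five-item code sequence is invented, and positions_of is a hypothetical helper rather than a workbench function.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Readable code sequence of a toy corpus: line n holds the item code
# at corpus position n-1.
printf '3\n0\n2\n3\n1\n' > corpus.asc

# Reversed file: corpus positions, ordered by item code and,
# within one code, by position.
awk '{ printf "%d\t%d\n", $1, NR - 1 }' corpus.asc |
    LC_ALL=C sort -k1,1n -k2,2n | cut -f2 > corpus.rev.asc

# Frequency table: number of occurrences of each item code, in code order.
LC_ALL=C sort -n corpus.asc | uniq -c | awk '{ print $1 }' > corpus.cnt.asc

# Index into the reversed file: running sum of the frequencies.
awk 'BEGIN { pos = 0 } { print pos; pos += $1 }' corpus.cnt.asc > corpus.rdx.asc

# positions_of CODE: print the corpus positions where CODE occurs,
# using only the index, the frequency table and the reversed file.
positions_of() {
    start=$(sed -n "$(( $1 + 1 ))p" corpus.rdx.asc)
    count=$(sed -n "$(( $1 + 1 ))p" corpus.cnt.asc)
    tail -n "+$(( start + 1 ))" corpus.rev.asc | head -n "$count"
}

positions_of 3    # prints 0 and 3, one per line
```

Note that the lookup never reads corpus.asc itself: the start offset and the frequency of a code are enough to cut its occurrences out of the reversed file.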
Now, the whole set of seven files representing the data of a positional attribute (which we call the seven components of a positional attribute) has been created. In the toolbox, there are tools which perform these steps much faster than the shell scripts presented here. But in some cases, the utilities of the toolbox run into memory problems, and then these scripts may help to produce the encoded version of a corpus.

The components associated with the inversed file and their meanings are illustrated in figure 2.3. The next section will show the single steps which are taken when a simple search for an item, a word for example, is performed.

[Figure 2.3: Reversed file indices. Shows the mapping item index ==> set of occurrences via the index into the reversed file, the number of item occurrences (freq) and the reversed file.]

2.1.3 Example: a simple word search

After the internal data structures have been introduced, we can compute the concordance for a single item, for example the word "the". Most data structures can be treated as an array, so we use the symbols

- C for the item sequence (accessed by C[i] where i is a corpus position). The elements of C are item codes;
- R for the reversed item sequence (accessed by R[i] where i is an index into this sequence, computed from IR below). The elements of R are corpus positions;
- IR for the reversed item sequence index (accessed by IR[c] where c is an item code). The elements of IR are pointers (offsets) into R;
- F for the item frequency table (accessed by F[c] where c is an item code). The elements of F are item frequencies in C;
- L for the item list (an array of characters, only accessed by offsets of IL);
- IL for the item list index (accessed by IL[c] where c is an item code). The elements of IL are byte offsets into the item list;
- SL for the sorted item list index (accessed by SL[i] where i is a position in the "virtual" sorted item list). The elements of SL are item codes.

These seven arrays are the components of a positional attribute.

[Figure 2.4: A simple word search. Shows the lookup path through the sorted index, the word list (never, the, cucumber, ;, &, wine, ...) and its index, the index into the reversed corpus, the number of item occurrences (freq), the reversed corpus, the corpus "match" and the concordance element.]

For computing the set of occurrences of a textual item in C, the following steps have to be taken (also illustrated in figure 2.4 for the word "the"):

- first, the item code c(i) of item i has to be determined. For this purpose, the sorted item index SL is consulted and searched with binary search until the item code is found;
- if the item code could be determined, the reversed item sequence index is consulted to determine the starting position rs(i) = IR[c(i)] of the position set associated with i in the reversed item sequence;
- second, the item frequency list is accessed to compute the "length" of the position set, f(i) = F[c(i)];
- then, the set of occurrences P(i) is the set of positions stored in the reversed item sequence R starting at rs(i) with length f(i), that is, R[rs(i)] ... R[rs(i) + f(i) - 1].

The task of computing the set of occurrences of i in the item sequence is then completed. Note that the item sequence itself didn't have to be accessed.

For computing the concordance and printing it, though, the item sequence C must be consulted. When cl is the left display context (in terms of items) and cr is the right context, for each p ∈ P(i) the "subsequence" [p - cl, p + cr] in C must be computed (within the bounds 0 and |C| - 1). For each item k in this subsequence, the associated (textual) item must be determined by computing the start position ts(k) = IL[k] of k in the item list index. Then, the item list can be consulted to get the string s(k), which is then printed.

2.1.4 The set of positional attributes
The IMS Corpus Toolbox supports an arbitrary number of positional attributes. Each positional attribute has its own set of components. For each positional attribute, the lengths of C and R are equal (see also figure 2.5):

    |C| = |R|

and the lengths of IL, IR, SL and F are equal:

    |IL| = |IR| = |SL| = |F|

Furthermore, between all positional attributes associated with a corpus, the lengths of the item sequences of these attributes must be equal:

    |C_word| = |C_lemma| = |C_pos| = |C_syn| = ...

Of course, no such condition usually holds between the other components of two positional attributes.

[Figure 2.5: The set of positional attributes. For each of the positional attributes word, pos and lemma, the figure shows the item sequence, the reversed item sequence, the item freqs, the index for RC, the index for IL, the sorted idx and the item list.]

2.2 Other attribute types

2.2.1 Structural attributes

Structural attributes capture information about boundaries of sentences, paragraphs, phrases, or other entities. Internally, these structures are represented as intervals of corpus positions, which are the start and end point (inclusive) of the structure. Such an interval is a pair of corpus positions. Therefore, each structural item needs 8 bytes of storage internally (4 bytes, the size of an integer, for each of the two positions).

Currently, there are two limitations with respect to structural attributes:
- first, the intervals must not be recursive (for example, embedded NPs in NPs);
- and they must not be overlapping.

Figure 2.6 illustrates the representation of structural attributes. The number of structural attributes associated with a corpus is not limited.

[Figure 2.6: Structural attributes. Shows the positional attributes word (Pierre, Vinken, ",", 61, years, old, ..., blessing) and pos (N, N, IP, NUM, N, ADJ, ..., N, IP) over the corpus positions up to n-2, n-1, together with the structural attributes s and paragraph.]

Creating structural attribute data

Unlike for positional attributes, the data for a structural attribute is stored in a single file, S. Normally, the structural attribute data is created with the encode utility. But in some cases, it is useful to manipulate or create the files through other utilities. The data file S is an array of integer pairs, where |S| is the number of intervals. The file size of S is then 4 * 2 * |S| bytes, since for each interval, two integer numbers have to be stored, each of which needs 4 bytes.

If the files are constructed manually (without the help of encode, for example, after the corpus has been encoded), a simple awk script can help. You must, however, first be aware of the internal representation of positional attributes and the "logics" of corpus positions; second, some pitfalls have to be circumvented.

Let's assume that a one-word-per-line input file with marked sentence boundaries (in SGML-style, like <s> </s>) is available. Then, the intervals can be extracted by the following awk script (the output of which has to be converted into internal integer format with atoi):

    (2.9)  BEGIN {
               structure = "s"
               opentag = "<" structure ">"
               closetag = "</" structure ">"
               position = 0;
               open = 0;
           }
           {
               if ($1 == closetag) {
                   # closing tag, don't increment position.
                   if (open) {
                       print position-1;  # thanks to a3@wsserv.vdl.nl (Adri Verhoef)
                       open = 0;
                   } else {
                       print "closing non-open group at line " NR ": " $0 >> "/dev/stderr"
                       exit
                   }
               } else if ($1 == opentag) {
                   # tag, don't increment position then
                   if (open) {
                       # forgot to close group, which we don't consider an error
                       print position-1
                   }
                   print position
                   open = 1
               } else if ($1 ~ /<\/?[a-zA-Z]+>/) {
                   # do nothing, other structural tag?
               } else {
                   position++;
               }
           }
           END {
               # close a group which is still open at the end of the input;
               # the last corpus position is position-1
               if (open)
                   print position-1
           }

First, care must be taken when groups are closed which are not open. The other case, reopening open groups, is not considered an error, since closing tags are optional. Additionally, when structure tags are used in the text, the line number (position) must not be incremented. But even then, this awk program may yield errors. So, at least check whether the size of the resulting file can be divided by 8.

2.2.2 Alignment attributes

[Figure 2.7: Alignment attributes. Shows the positional attributes word (Pierre, Vinken, ",", 61, years, old, ..., blessing) and pos (N, N, IP, NUM, N, ADJ, ..., N, IP) of one corpus, aligned with the word attribute of a second corpus (Pierre, Vinken, ",", 61, ",", wird, ...).]

Bigram and mapping tables

[Figure 2.8: Bigram and mapping tables. Bigram tables are shown over the items Pierre, Vinken, ",", 61, years; mapping tables over the values NP, NPS, PUNCT, CARD, N.]

2.3 External tools and dynamic attributes

[Figure 2.9: External (dynamic) attributes. Labels: value request, pipe() invocation, data access module, value computation, value return, value passing, value check/conversion, external tool.]
Chapter 3
Encoding: Transforming a corpus into its internal representation

Within the IMS corpus workbench, a corpus can have an arbitrary number of annotations of different types. In our system, a corpus is primarily regarded as a sequence of words (not as a sequence of characters). The words, then, are numbered, so that we can talk about the word at a certain corpus position p, the first word in the corpus, or, in general, the nth word of the corpus. This leads to the more general notion of positional attributes, which is the most important annotation type. Attributes of this class have a (string) value at each corpus position. [1]

The corpus text falls within the class of positional attributes, since we can specify, for each corpus position, the word which occurs at that position. The positional attribute which holds the corpus text proper always has the predefined attribute name "word". Other positional attributes are, for example, part-of-speech tags, which are assigned to the words of the corpus. In our view, we regard POS-tags as assigned to a corpus position rather than to the word at that position. Then, the positional attributes "word" and "tag" do not differ very much anymore: both have, for each corpus position, a value which is, in our case, a string. We therefore use the same representation for the word sequence of the corpus (the corpus text) and the tag sequence (the associated POS-tags). In other words, a tagged corpus is in our view a set of two corpora of equal length, one of which captures the sequence of words, the other captures the sequence of tags. In the following, we therefore use the term "item" to abstract from the type of information encoded in such an attribute (here, word vs. tag). A corpus, then, is a collection of attributes of different types.

In the following section 3.1, we briefly describe the internal representation of a corpus. Section 3.2 describes the steps which are necessary to prepare a textually represented corpus to be suitable as input for the encoding tools, as well as the first of the two encoding tools, encode. Section 3.3 then describes the second encoding tool, which is used to build the indices associated with a corpus.

[1] When the corpus is stored in a verticalized one-word-per-line format, a corpus position can also be regarded as the number of a line in this representation.
3.1 The internal representation of a corpus

After encoding, each item of a textual corpus is represented as a unique integer value [2]. For example, if the first item of a text corpus is "The", all "The"s in the text corpus will internally be represented as the integer number 0 [3]. The corpus can then be physically represented as a sequence of integer numbers. To be able to get the item which is represented by an integer, another (indexed) file holds the mappings from integers to strings. Up to now, 3 files are necessary to hold the information: the corpus (consisting of a sequence of integers), the "lexicon" which holds, for each integer, the string it represents, and an index to the lexicon. These three files are those which are produced within the second step during the encoding of a corpus. The tool which performs this task is called encode and is described below in section 3.2.

Two additional files are built next: the first holds information about the sorted sequence of the items in the lexicon, which is necessary to efficiently compute the integer code of an item, given the string. The other file holds, for each item, the number of times it occurs in the corpus.

For efficient lookup, a reversed file or reversed index has to be built. This index holds, for each item in the corpus, the corpus positions where the item occurs. This index is indexed itself, leading to another file. In summary, we have seven files so far which represent the information of one positional attribute. The four additional files which are not built by the encode program are created with the makeall program described in section 3.3. But prior to the encoding of a corpus, it has to be transformed into a format which is suitable as input for the encode program, which is described in the next section.

[2] The internal corpus representation we use is highly inspired by an (unfortunately) unpublished draft paper by Ken W. Church, "A set of UNIX tools for processing large text corpora".

[3] Of course, "the" will get another code than "The", since "The" and "the" textually differ and are not considered equal.

3.2 The encode program

When a corpus consists of several positional attributes (for example, a POS attribute additionally to the standard "word" attribute), it can either be encoded in one single step (provided that it is in a suitable textual input format for encode) or the various positional attributes can be encoded one after another and be added to an already existing corpus. This latter way is also useful when one of the positional attributes has been changed, for example, when a tagset has been changed or a better tagger was available to produce more accurate tag assignments.

In both cases, the input format is a one-word-per-line format, where each item which is to be encoded occurs in a single line. This line may contain blanks, providing a way to encode adjacent multi-word lexemes, if desired. But care should be taken to avoid blanks at the end of an item, since, for example, "the" and "the " are different strings and therefore are encoded with different codes, which can lead to undesired effects when not all occurrences of "the" are found in a text due to a blank at the end of some.

In the case of the corpus text, the input may look as follows:
    Pierre
    Vinken
    ,
    61
    years
    old
    ,
    will
    join
    the
    board
    as
    a
    nonexecutive
    director
    Nov.
    29
    .

Another file, then, may hold the sequence of assigned tags (in which case both files must hold the same number of lines). The input format can, for example, be produced out of a raw text file with the tr command [4]:

    tr ' ' '\n' < text_file > 1wpl-file

This command replaces all blanks in the input file with line breaks. The encode program then takes a one-word-per-line input file (or reads that format from stdin) and creates the three files corpus, lexicon and lexicon.idx in the current directory:

    encode -t 1wpl-file

The -t option instructs encode to read its input from the file given as an argument of the option instead of reading the standard input. To do both the transformation and the encoding in a single step, one could enter the following pipe:

    tr ' ' '\n' < text_file | encode

or, if one wants to hold the text in a compressed format:

    zcat text_file.gz | tr ' ' '\n' | encode

In general, the one-word-per-line format can be produced by any program you like. In the simple tr examples above, it is already assumed that the corpus has been tokenized (special characters have been separated from the words); perhaps even a sentence boundary detector must be run on a raw corpus to produce the appropriate input file. The simple example only shows how encode processes its input. In general, this method is not applicable to new, raw corpora, which first may have to be preprocessed by other tools.

[4] The tr command is a standard command available on many platforms and is not part of this toolset.

Now, encode may also take an annotated corpus with several positional attributes in a single file. In this case, each line of the input format consists of a number of attribute values, separated by tabulator characters. Thus, the input file consists of several columns, each of which denotes one positional attribute. A POS-tagged text then may look as follows:
    Pierre<tab>NP
    Vinken<tab>NP
    ,<tab>,
    61<tab>CD
    years<tab>NNS
    old<tab>JJ
    ,<tab>,
    will<tab>MD
    join<tab>VB
    the<tab>DT
    board<tab>NN
    as<tab>IN
    a<tab>DT
    nonexecutive<tab>JJ
    director<tab>NN
    Nov.<tab>NP
    29<tab>CD
    .<tab>SENT

where <tab> denotes a single tabulator character (ASCII value 9) [5]. encode must then know which positional attribute is represented in the other columns of the file and how they shall be named. This is done with the -P option:

    encode -t <input-file> -P pos

Here, the option -P pos ("P" for "positional attribute") instructs encode to treat the second column in the file as the sequence of values of the pos attribute. The files associated with the pos attribute have their names prefixed with pos (which leads to pos.corpus, pos.lexicon and so on). The order in which the -P options are given is relevant, since the first -P option denotes the name of the attribute represented in the second column in the input file, the second -P option denotes the third column etc. By default, the first column is treated as the word sequence and therefore gets the prefix word, but this can be overridden with the -p option. Please refer to the manual page of encode for details.

[5] There are no general hints on how to produce this input format. In general, it is a good idea to use standard tools like awk and sed.

Up to 32 positional attributes can currently be encoded in a single step. If large amounts of text are to be encoded, try to determine the disk space the corpus needs after encoding in advance and look for a file system where enough space is available. Some hints on the expected size are given in section 3.4. As explained in chapter 4, it is possible to split the files of a corpus between several file systems in case there isn't enough space on a single disk. If all else fails, you may have to encode the set of positional attributes in several runs of encode, each with the appropriate prefix passed with the -p option.

A corpus (or an arbitrary positional attribute) may be assigned another type of information which can be encoded with the encode program, namely structural information which can be used to represent article, sentence or paragraph boundaries. This kind of information is represented in the input file with SGML-like markers:

    <article>
    <s>
    Pierre<tab>NP
    Vinken<tab>NP
    ,<tab>,
    ...
    29<tab>CD
    .<tab>SENT
    <s>
    ...
    </s>
    </article>

Of course, structural information can be encoded independently of additional positional attributes in the file. Some points have to be noted:

  - in a line with a structure marker (s, article), no values of positional attributes may occur;
  - the end tags may be omitted. In that case, a structure spans all items until the start of the next structure or the end of file;
  - in a structure marker line, everything after a blank or after the closing angle bracket (>) of the tag is neglected;
  - structures must not be recursive or overlapping, that is, trees cannot be represented.

In the above example, the call for encode would look like this:

    encode -t <input-file> -P pos -S article -S s

Here, the two encoded structural attributes are each declared with the -S (for "structural attribute") option. The order in which the structural attributes are declared does not matter. Again, up to 32 structural attributes can currently be encoded in a single step.

Be careful to declare all structural attributes in the encode call, since undeclared structural attributes are considered as simple attribute values in the first column and therefore are treated literally. If you are encoding several positional attributes at once, an error will occur since lines with undeclared structural attributes in the first column in general do not have tabulator-separated columns after them. The rule is simple: either a line consists of a structure attribute designator in angle brackets or it is a line of a fixed number of tabulator-separated columns. Errors may occur if this rule is not obeyed.

encode accepts a number of further options:

  - -p <prefix> has the effect that the files belonging to the positional attribute in the first column of the input file will get the prefix "prefix.". Note that the dot at the end of the prefix is added automatically and must not be part of the option value. Normally, the files get the name corpus[.cnt,.rev,.rdx] and lexicon[.idx,.srt]. This will lead to problems if a positional attribute is encoded after a corpus is already present in the directory the data is written to, since the names of the corpus files may collide with those of the new attribute. Therefore, when the prefix is not given, data may be overwritten and lost. This option affects only the names of the files for the positional attribute in the first column of the input file; the other positional attributes (if present) will get the prefix given with the -P option;
  - -d <path> lets the user specify the directory in which the data shall be written. The path should not end with a slash. Default is to write all output files to the current directory;
  - -s instructs encode to skip empty lines (lines with no characters, not even blanks, in it) during encoding.

encode accepts a number of further options. Please refer to the encode manual page for the most recent description of the program.

Please be aware that encode writes its output files into the current directory unless the -d option is specified. Be careful when you add (or update) a positional attribute to a corpus after encoding the corpus itself: a loss of data may occur during the encoding of the positional attribute, since it tries to write the data to the same files in which the corpus data may already be stored. Either you should pass the -p option to prefix the files belonging to the first column or put each positional attribute in a directory of itself to prevent encode from overwriting important data. It is a very good idea to change the file access mode of all files belonging to a positional attribute to non-writeable for anyone after encoding, in order to prevent accidental overwriting.

3.3 The makeall program

After encoding a corpus, each positional attribute has to be declared in the corpus registry. Please refer to chapter 4 for a detailed description of how to do this. The makeall program, which constructs the second set of files during the encoding process, will not work on undeclared positional attributes or corpora.

After a positional attribute is declared in the registry, makeall must be run to construct the necessary index files. There are no options, and the only argument makeall accepts is the symbolic name of the positional attribute for which the index files shall be produced:

    makeall treebank

This will produce all missing files for all positional attributes declared for the corpus treebank. If you only want to produce the files for a single positional attribute, give the name of the attribute as an additional argument:

    makeall treebank pos

(given that these symbolic names are those of the respective positional attributes). Note that this call can be issued from any point in the file system, since makeall looks up in the registry to find the corpus data. Data is only written to the directory specified in the registry description file. It is therefore a good idea to run makeall either on a very fast machine or on a machine which locally holds the disks the corpus is stored on (to reduce network load in case of NFS-mounted file systems).
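The preparation steps of sections 3.2 and 3.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the toolset: the helper names (check_encode_input, encode_command, makeall_command) are hypothetical, the commands are merely assembled as argument lists and not executed, and the order in which the options are combined is an assumption. Only the options -t, -d, -P and -S described above are used.

```python
import re

# Matches the name inside a start or end tag such as <s> or </article>.
TAG = re.compile(r"</?([^\s<>/]+)")


def check_encode_input(lines, n_columns, declared):
    """Return the 1-based numbers of the lines that break the input rule.

    A line is accepted if it is a marker of a declared structural
    attribute, or if it has exactly n_columns tab-separated columns.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            continue  # empty lines are ignored in this sketch (cf. -s)
        match = TAG.match(line)
        if match and match.group(1) in declared:
            continue  # marker of a declared structural attribute
        if len(line.split("\t")) != n_columns:
            bad.append(number)
    return bad


def encode_command(input_file, positional=(), structural=(), directory=None):
    """Assemble (but do not run) an encode call as an argument list."""
    command = ["encode", "-t", input_file]
    if directory is not None:
        command += ["-d", directory]
    for name in positional:   # order matters: 2nd column, 3rd column, ...
        command += ["-P", name]
    for name in structural:   # order does not matter
        command += ["-S", name]
    return command


def makeall_command(corpus, attributes=()):
    """Assemble (but do not run) a makeall call as an argument list."""
    return ["makeall", corpus, *attributes]


sample = ["<article>", "<s>", "Pierre\tNP", "Vinken\tNP", ",\t,",
          "</s>", "</article>"]
problems = check_encode_input(sample, 2, {"article", "s"})
call = encode_command("input.txt", ["pos"], ["article", "s"])
```

If the article attribute were left undeclared, the two article marker lines would be reported, because they would then be read as ordinary one-column lines in a two-column file.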
Note: "makeall" will currently try to create non-compressed files for attributes which already have the complete data in compressed form. This is a bug and will be fixed in a future release.

After running makeall on all positional attributes of a corpus, the corpus is ready for use. makeall currently produces a lot of debugging output. This output is only important when errors occur or when the program has to be debugged.

When trying to encode really big corpora (20 million words and above), encode and makeall may have problems due to memory or swap space limitations and yield error messages like "can't allocate memory" or "not enough memory". These problems can only be solved by providing more swap space to the machine the program is running on (or by running encode/makeall on another machine at your site which has enough swap space). The available swap space is shown with the pstat -s command and should be enough to hold the whole corpus data (see below in section 3.4). Ask your system administrator for further help.

3.4 Space requirements

This section gives some hints on how much space will be used by an encoded attribute. Currently, we only cover positional and structural attributes here.

3.4.1 Positional attributes

Let A be the sequence of items captured by the positional attribute. The length of this sequence, |A|, therefore is the number of elements of this sequence. Further, let D be the set of distinct strings encoded in A (the list of different words, for example). Then, |D| is the number of distinct words.

Two numbers are important:

  - the number of items in the input file (|A|), which is equal to the number of lines in the one-word-per-line input file for encode (minus the number of structural annotation markers, if present). This number can, for example, be computed with the wc -l command (maybe pre-piped with a grep -v command to get rid of the structural annotation markers);
  - the number of distinct items in the input file (|D|) (this number can be computed by running the pipe sort -u | wc -l over the respective column in the one-word-per-line input text file).

Another important number is the space needed for the one-time representation of all different items, which here is denoted S. This is the sum of the lengths of each different word plus one:

    \[ S = \sum_{s \in D} (\mathrm{strlen}(s) + 1) = |D| + \sum_{s \in D} \mathrm{strlen}(s) \]

The adding of 1 is necessary since a null character ('\0') is added to each string. The number is given by running the pipe sort -u | wc -c over the respective column in the input file. [6]

[6] The columns can be extracted from a multi-column file with the cut program which is contained within GNU's/FSF's set of text utilities, or with the awk utility.

Now, the size of one positional attribute (in bytes) can be computed as follows:

    \[ M_p = 2(|A| \cdot 4) + S + 4(4|D|) = 8|A| + 16|D| + S \]

The size of the input text file does not go into this formula, since it is not needed anymore after encoding.

For each positional attribute, this formula has to be evaluated again, since the number of different items in the attribute (|D|) and the space needed to represent them once (S) usually differ between several positional attributes (|A| must be constant for any two positional attributes of a corpus). Since positional attributes other than the word attribute usually have a small number of distinct values, the space needed to represent a positional attribute mainly depends on its length, which is not very surprising. A less accurate number for the space requirement can be roughly estimated by multiplying the size of the uncompressed input text file(s) by 2.

3.4.2 Structural attributes

The data of a structural attribute, for example sentence boundaries, is stored in a single file, as an ordered sequence of corpus intervals (that is, pairs of corpus positions). So, the computation of the space needed to represent the information of one structural attribute is very simple: let S be the structural attribute. Then, |S| is the number of intervals stored in this attribute ("the number of sentences"), each being a pair of two corpus positions. Since each corpus position is stored as a 4-byte integer number, the space can be computed as follows:

    \[ M_s = (|S| \cdot 4 \cdot 2) = 8|S| \]

So, if you want to represent 1000 sentences, you need 8000 bytes to store the data.
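The two formulas of section 3.4 are easy to evaluate by script. The following Python sketch is one possible way to do so; the function names are hypothetical, and it assumes that strlen corresponds to the byte length of an item in UTF-8 (for plain ASCII or Latin-1 data encoded one byte per character, the character count could be used instead).

```python
def positional_attribute_size(items):
    """Estimate M_p = 8|A| + 16|D| + S in bytes for one column of items."""
    n_items = len(items)          # |A|: number of corpus positions
    distinct = set(items)         # D: set of distinct strings
    # S: every distinct string is stored once, plus one null byte each.
    # Assumption: strlen(s) is the UTF-8 byte length of s.
    s = sum(len(item.encode("utf-8")) + 1 for item in distinct)
    return 8 * n_items + 16 * len(distinct) + s


def structural_attribute_size(n_intervals):
    """Estimate M_s = 8|S| in bytes: two 4-byte positions per interval."""
    return 8 * n_intervals


words = ["the", "board", "the", "."]
size = positional_attribute_size(words)
```

For the four-item example, |A| = 4, |D| = 3 and S = 4 + 6 + 2 = 12, so the estimate is 32 + 48 + 12 = 92 bytes; structural_attribute_size(1000) reproduces the 8000 bytes given above for 1000 sentences.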
Chapter 4

The corpus registry

The corpus registry holds, for each corpus being processed by the workbench, a file which describes where in the file system the various files which build the corpus are stored. Additionally, the registry holds a file which describes who can access a local corpus from remote hosts and a file which captures a log of all remote connections to local corpora. This chapter only describes the description files for local corpora and the description files for remote corpora; the two additional files which are necessary for remote connections are described in chapter 5.

The registry is simply a globally accessible directory, called the registry directory, and holds, for each corpus, a file with the same name as the corpus name, in lowercase letters [1]. Such a file is called the registry file of the corpus. The contents of the file, described in section 4.2 below, define which annotations are associated with the corpus and where the data is stored. An annotation which is not defined in the registry file of a corpus cannot be accessed by any of the tools. Similarly, when an attribute is defined in the registry, it is supposed to be accessible by all tools. It is within the responsibility of the corpus administrator (you!) to assure that all annotations defined in all registry files are accessible, and that only those attributes are defined which are in fact accessible.

[1] In Cqp, all corpus names are entered in uppercase, but they are converted to lowercase to load the correct corpus.

Currently, there are some rules how to name the attributes of a corpus and how to name the respective registry files. These rules are described in the following section.

As already mentioned in section 1.2 above, corpora exist where access has to be controlled. This issue is discussed in chapter 7 below.

At the end of this chapter, section 4.5 summarizes the single steps which you should follow when preparing and registering a new corpus.

4.1 Some remarks about nomenclature

Two simple rules have to be obeyed for the definition of names for corpora and attributes:
  - corpus and attribute names must begin with a lowercase letter, and may be followed by an arbitrary long sequence of lowercase letters or digits.

By default, the tools expect the registry directory to be /corpora/c1/registry. Since at your site this directory most probably does not exist, the default value can be overridden by the environment variable CORPUS_REGISTRY. Please do not add a slash at the end of the value of this variable. We suggest that either each user of the tools sets this environment variable in his or her .tcshrc or .cshrc shell initialization file in his/her home directory or that it is set in one of the global shell initialization files, which usually reside in /etc and are only writeable by the system administrator or the superuser. Please ask your system administrator for further help in case you shouldn't know where or how to set the variable.

Additionally, almost all tools of the toolbox take the -r command line option to specify a non-default registry directory.

4.2 The contents of a registry file

In the registry file, all attributes of a corpus are declared. Additionally, some "global" variables are set.

A registry file may contain empty lines. A comment begins with a hash mark (#); everything up to the end of the line is not read.

The format of a registry file is:

    <Header information and global variables>
    <Attribute definitions>

This order has to be kept in all registry files. In the attribute definition section, the attributes may be declared in any order.

4.2.1 The header

The header consists of 4 declarations of values, each of which is preceded by the field name. The field names (keywords) are all uppercase. The field names are

  - a short (one-line) description of the corpus (keyword NAME). The field value is a string enclosed in double quotes;
  - a unique identifier (keyword ID). The field value is a symbol. Usually, the field value should be the same as the file name of the registry file;
  - optionally, the "home directory" of the corpus (keyword HOME). The field value is a path (not enclosed in double quotes);
  - optionally, the path of the "info file" of the corpus (keyword INFO). This text file should contain a description of the corpus, its annotations, perhaps administrative information, etc. It is also a good idea to include a description of the tagset of the part-of-speech annotation there, if the corpus is tagged.

If the HOME field is missing, you have to specify the path for each attribute, so it is more convenient to define this field when all corpus-related data files are kept in a single directory.

If the INFO field is missing, no corpus information can be displayed in Cqp or Xkwic (in Cqp, this is done with the info command; in Xkwic, this file is displayed when the Info button is selected in the corpus list).

After the header, the set of corpus attributes (annotations) is declared. The different types of annotations are

  - positional attributes (section 4.2.2);
  - structural attributes (section 4.2.3);
  - mapping tables (section 4.2.4);
  - ngram tables (section 4.2.5);
  - alignment information (section 4.2.6);
  - dynamic attributes (section 4.2.7).

As said before, all annotations may be declared in almost any order (but see the notes in the case of mapping tables and bigram tables).

4.2.2 Positional attributes

A positional attribute is encoded as a set of seven files, which have been described above in section 3.2. These files are called the components of a positional attribute. A registry file defines, for each component of a positional attribute, the file name in which the component is stored. It is not necessary to manually set the names of all components since there are default rules how to compute undefined component file names from the defined ones. In fact, we suggest to rely on the default names which encode assigns to the files. Then, you do not have to declare component paths at all.

The declaration of a positional attribute looks as follows:

    ATTRIBUTE Name OptBody

where Name is the identifier for the attribute (like word, pos, lemma, ...). The OptBody is optional and is only needed when you have to define non-default file names.

When you use the body, it can be one of the following:

  - a definition of the component paths, CompPathSpec;
  - or the declaration that the attribute is found on a remote host, RemoteSpec;
  - or a path which overwrites the HOME field of the corpus and says that all components are stored at a different place, AltPath.

The RemoteSpec currently is not supported, since in the current distribution, access to remotely stored corpora is disabled. When the AltPath declaration is used, it declares a path different from the corpus path for this special attribute.

The component path specification, CompPathSpec, must be enclosed in braces {...}. Within these braces, a sequence of component name/path specification pairs is listed. Each component name may only occur once:

    {
      ComponentID PathSpec
      ...
    }

One component is "virtual" (DIR) and doesn't describe file names but rather denotes the "home directory" of the attribute. Two other virtual components are ANAME, the name of the attribute just being defined, and APATH, which is the "home directory" of the corpus and defaults to the value of the HOME field of the header (if present).

The PathSpec is a standard path, in which macros may be used. Such a macro is started with a dollar sign $ and directly followed by a component name. For example, the macro $ANAME represents the value of the attribute name. Macro values may be used in path specifications. For example,

    $APATH/$ANAME.f

stands for the concatenation of the value of the APATH variable (usually the corpus home directory), followed by a slash, followed by the value of the ANAME variable, and then followed by .f.

In general, it is possible to refer to the value of any component by prefixing its component identifier with a dollar sign ($). Thus, when a component value is defined, it is possible to use the values of previously defined components in the definition. It is an error when you refer to a component value which is not yet defined.

The following table lists the components, the component identifiers and the default value or macro through which the (default) value is computed:
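Independently of the component table, the naming rule of section 4.1 and the macro mechanism described in this section can be illustrated with a short Python sketch. It is not the parser used by the tools: the function names are hypothetical, the directory /corpora/treebank is an invented example value, and the sketch assumes that component identifiers consist of uppercase letters only.

```python
import re

# Section 4.1: a lowercase letter followed by lowercase letters or digits.
NAME_RULE = re.compile(r"[a-z][a-z0-9]*")

# A macro is a dollar sign directly followed by a component identifier
# (assumed here to consist of uppercase letters only).
MACRO = re.compile(r"\$([A-Z]+)")


def is_valid_name(name):
    """Check a corpus or attribute name against the rule of section 4.1."""
    return NAME_RULE.fullmatch(name) is not None


def expand_path_spec(path_spec, components):
    """Replace each $COMPONENT macro by the value defined so far.

    Referring to a component that is not yet defined is an error.
    """
    def substitute(match):
        identifier = match.group(1)
        if identifier not in components:
            raise KeyError(f"component {identifier} is not yet defined")
        return components[identifier]

    return MACRO.sub(substitute, path_spec)


# Hypothetical values: APATH is the corpus home, ANAME the attribute name.
defined = {"APATH": "/corpora/treebank", "ANAME": "pos"}
expanded = expand_path_spec("$APATH/$ANAME.f", defined)
```

With these example values, $APATH/$ANAME.f expands to /corpora/treebank/pos.f, while a reference to an undefined component raises an error, mirroring the rule that a component value must be defined before it is used.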
Eligible Professional Menu Measure Frequently Asked Questions Drug Formulary Checks 1. If an EP is unable to meet the measure of a meaningful use objective because it is outside of the scope of his or
More informationAn Eprints Apache Log Filter for Non-Redundant Document Downloads by Browser Agents
An Eprints Apache Log Filter for Non-Redundant Document Downloads by Browser Agents Ed Sponsler Caltech Library System http://resolver.caltech.edu/caltechlib:spoeal04 December, 2004 Contents 1 Abstract
More informationMicrosoft Dynamics GP. Pay Steps for Human Resources Release 9.0
Microsoft Dynamics GP Pay Steps for Human Resources Release 9.0 Copyright Copyright 2006 Microsoft Corporation. All rights reserved. Complying with all applicable copyright laws is the responsibility of
More informationIntroduction to Linux operating system. module Basic Bioinformatics PBF
Introduction to Linux operating system module Basic Bioinformatics PBF What is Linux? A Unix-like Operating System A famous open source project Free to use, distribute, modify under a compatible licence
More informationPageR Enterprise Monitored Objects - AS/400-5
PageR Enterprise Monitored Objects - AS/400-5 The AS/400 server is widely used by organizations around the world. It is well known for its stability and around the clock availability. PageR can help users
More informationRiOffice Users Manual
RiOffice Users Manual Rio Networks 9/23/2009 Contents Available Services... 4 Core PBX Features... 4 Voicemail Features... 4 Call Center Features... 4 Call Features... 4 Using Your Phone... 5 Phone Layout...
More informationIntercluster Lookup Service
When the (ILS) is configured on multiple clusters, ILS updates Cisco Unified Communications Manager with the current status of remote clusters in the ILS network. The ILS cluster discovery service allows
More informationData Intensive Computing Handout 4 Hadoop
Data Intensive Computing Handout 4 Hadoop Hadoop 1.2.1 is installed in /HADOOP directory. The JobTracker web interface is available at http://dlrc:50030, the NameNode web interface is available at http://dlrc:50070.
More informationCSCI 5417 Information Retrieval Systems Jim Martin!
CSCI 5417 Information Retrieval Systems Jim Martin! Lecture 9 9/20/2011 Today 9/20 Where we are MapReduce/Hadoop Probabilistic IR Language models LM for ad hoc retrieval 1 Where we are... Basics of ad
More informationSystem and Network Management
- System and Network Management Network Management : ability to monitor, control and plan the resources and components of computer system and networks network management is a problem created by computer!
More informationVBA Microsoft Access 2007 Macros to Import Formats and Labels to SAS
WUSS 2011 VBA Microsoft Access 2007 Macros to Import Formats and Labels to SAS Maria S. Melguizo Castro, Jerry R Stalnaker, and Christopher J. Swearingen Biostatistics Program, Department of Pediatrics
More informationWeb Development using PHP (WD_PHP) Duration 1.5 months
Duration 1.5 months Our program is a practical knowledge oriented program aimed at learning the techniques of web development using PHP, HTML, CSS & JavaScript. It has some unique features which are as
More informationInstalling and Setting up Microsoft DNS Server
Training Installing and Setting up Microsoft DNS Server Introduction Versions Used Windows Server 2003 Setup Used i. Server Name = martini ii. Credentials: User = Administrator, Password = password iii.
More informationClick-To-Talk. ZyXEL IP PBX License IP PBX LOGIN DETAILS. Edition 1, 07/2009. LAN IP: https://192.168.1.12 WAN IP: https://172.16.1.1.
Click-To-Talk ZyXEL IP PBX License Edition 1, 07/2009 IP PBX LOGIN DETAILS LAN IP: https://192.168.1.12 WAN IP: https://172.16.1.1 Username: admin Password: 1234 www.zyxel.com Copyright 2009 ZyXEL Communications
More informationMicrosoft Dynamics GP. Field Service Preventive Maintenance
Microsoft Dynamics GP Field Service Preventive Maintenance Copyright Copyright 2011 Microsoft. All rights reserved. Limitation of liability This document is provided as-is. Information and views expressed
More informationSWIFT MT940 MT942 formats for exporting data from OfficeNet Direct
SWIFT MT940 MT942 formats for exporting data from OfficeNet Direct January 2008 2008 All rights reserved. With the exception of the conditions specified in or based on the 1912 Copyright Act, no part of
More informationFortiVoice. Version 7.00 User Guide
FortiVoice Version 7.00 User Guide FortiVoice Version 7.00 User Guide Revision 2 28 October 2011 Copyright 2011 Fortinet, Inc. All rights reserved. Contents and terms are subject to change by Fortinet
More informationConnect Ticket Entry. Quick Reference Guide
Connect Ticket Entry Quick Reference Guide Davisware 514 Market Loop West Dundee, IL 60118 Phone: (847) 426-6000 Fax: (847) 426-6027 Contents are the exclusive property of Davisware. Copyright 2015. All
More informationTemplate and Daily Schedules
IDX PATIENT SCHEDULING APPLICATION Template and Daily Schedules TRAINING GUIDE IDX 9.0 MSU HEALTHTEAM TRAINING AND EDUCATION JANUARY 2007 1 MODULE 1 OVERVIEW OF SCHEDULES... 3 MAINTENANCE ACTIVITIES...
More informationUsing Process Monitor
Using Process Monitor Process Monitor Tutorial This information was adapted from the help file for the program. Process Monitor is an advanced monitoring tool for Windows that shows real time file system,
More informationChapter 24: Creating Reports and Extracting Data
Chapter 24: Creating Reports and Extracting Data SEER*DMS includes an integrated reporting and extract module to create pre-defined system reports and extracts. Ad hoc listings and extracts can be generated
More informationBorderware Firewall Server Version 7.1. VPN Authentication Configuration Guide. Copyright 2005 CRYPTOCard Corporation All Rights Reserved
Borderware Firewall Server Version 7.1 VPN Authentication Configuration Guide Copyright 2005 CRYPTOCard Corporation All Rights Reserved http://www.cryptocard.com Overview The BorderWare Firewall Server
More informationCPSM MEDITECH 5.67. Inventory Inquiries
CPSM MEDITECH 5.67 Inventory Inquiries Contents CPSM Inventory Inquires... 2 Stock Inquiry... 2 Select... 11 Item Inquiry... 16 Purchase Order Inquiry... 32 Check Purchase Order Number... 38 View Vendor
More informationenabling prepaid products & services everywhere General description: This documentation will illustrate how to configure the AMS Gateway Airtime Module for IQ Business/IQ Enterprise/IQ POS/IQ Free POS
More informationChapter 6: Data Entry
Chapter 6: Data Entry The Imports module in SEER*DMS includes data entry screens that allow you to enter data using the keyboard. This feature is used to key in data printed on paper forms or images of
More informationTrustkeeper PCI Compliance Guide for Merchants
Trustkeeper PCI Compliance Guide for Merchants For questions about Trustkeeper and the enrollment process please contact Trustwave at 866-659-9067. 1. Register yourself with Trustkeeper The first step
More informationPaperless Collection System. PCS Collector
Paperless Collection System PCS Collector About this Manual This IDX Training Manual is written to give you a step-by-step guide for your classroom training and a handy reference for your daily work. The
More informationApplying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu
Applying Co-Training Methods to Statistical Parsing Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu 1 Statistical Parsing: the company s clinical trials of both its animal and human-based
More informationThis presentation explains how to monitor memory consumption of DataStage processes during run time.
This presentation explains how to monitor memory consumption of DataStage processes during run time. Page 1 of 9 The objectives of this presentation are to explain why and when it is useful to monitor
More informationApply PERL to BioInformatics (II)
Apply PERL to BioInformatics (II) Lecture Note for Computational Biology 1 (LSM 5191) Jiren Wang http://www.bii.a-star.edu.sg/~jiren BioInformatics Institute Singapore Outline Some examples for manipulating
More informationINASP: Effective Network Management Workshops
INASP: Effective Network Management Workshops Linux Familiarization and Commands (Exercises) Based on the materials developed by NSRC for AfNOG 2013, and reused with thanks. Adapted for the INASP Network
More informationDOMIQ, SIP and Mobotix cameras
DOMIQ, SIP and Mobotix cameras This tutorial is the second in the series in which we present integration of Mobotix devices with the DOMIQ system. The main subject of this tutorial is the implementation
More informationConfiguring Denial of Service Protection
24 CHAPTER This chapter contains information on how to protect your system against Denial of Service (DoS) attacks. The information covered in this chapter is unique to the Catalyst 6500 series switches,
More informationLogging Service and Log Viewer for CPC Monitoring
TECHNICAL BULLETIN Logging Service and Log Viewer for CPC Monitoring Overview CPC has developed a set of add-on programs for its Monitoring software that generates logs of events and errors encountered
More informationReal Estate Reports Overview Quick Reference Guide
Real Estate Reports Overview Quick Reference Guide Overview This guide shows you the options available for customising the standard RE reports available in SAP. It covers the following: Using individual
More informationThirdlane User Portal 2.1. Users Guide 05/12/2008. Third Lane Technologies, LLC 39 Power Lane Fairfax, CA 94930. http://www.thirdlane.
Thirdlane User Portal 2.1 Users Guide 05/12/2008 Third Lane Technologies, LLC 39 Power Lane Fairfax, CA 94930 http://www.thirdlane.com Copyright 2003-2008. Third Lane Technologies, LLC. All rights reserved.
More informationVTiger CRM + Joomla/ChronoForms Integration
VTiger CRM + Joomla/ChronoForms Integration Table of Contents 1.- Configuration of VTiger... 2 A.- Enabling the Webforms module... 2 B.- Creating a service user... 2 C.- Editing the Webforms configuration
More informationAvaya Network Configuration Manager User Guide
Avaya Network Configuration Manager User Guide May 2004 Avaya Network Configuration Manager User Guide Copyright Avaya Inc. 2004 ALL RIGHTS RESERVED The products, specifications, and other technical information
More informationCOSC 3351 Software Design. Architectural Design (II) Edgar Gabriel. Spring 2008. Virtual Machine
COSC 3351 Software Design Architectural Design (II) Spring 2008 Virtual Machine A software system of virtual machine architecture usually consists of 4 components: Program component: stores the program
More information