Data Integration with XML and Semantic Web Technologies


Data Integration with XML and Semantic Web Technologies

A thesis presented by Rubén Tous

Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy

Doctorate in Computer Science and Digital Communication
Department of Technologies

Advisor: Catedràtic d'Universitat Jaime Delgado Mercé

Universitat Pompeu Fabra
Barcelona, June 2006

Dipòsit legal: B ISBN:

© 2006 Rubén Tous. All rights reserved.


Thesis advisor: Catedràtic d'Universitat Jaime Delgado Mercé
Author: Rubén Tous

Data Integration with XML and Semantic Web Technologies

Abstract

Accessing data from multiple heterogeneous databases entails dealing with different data models, different schemas and different query languages and interfaces. This thesis can be divided into two main parts, both related to different aspects of the design of modern data integration systems.

The first part is related to the problem of dealing with heterogeneous data models and schemas. It is also divided into two different sub-parts. The first one focuses on the semantic integration between XML and RDF (the Resource Description Framework). We suggest a strategy based on the mapping of the XML model to RDF triples. Translating XML documents to RDF makes it possible to take advantage of the powerful tools of Description Logics, allowing XML documents to interoperate at the semantic level. This mapping has generated some interesting results, like a schema-aware and ontology-aware XPath processor that can be used for schema semantic integration or even for implicit query transcoding among different data models. The approach has been tested in the Digital Rights Management (DRM) domain, where some organizations are involved in the standardization or adoption of rights expression languages (REL).

In the second sub-part we suggest a vector space model for semantic similarity measurement and OWL ontology alignment. A rigorous, efficient and scalable similarity measure is a prerequisite of any ontology matching system. The presented model is based on a matrix representation of nodes from an RDF labelled directed graph. A concept is described with respect to how it relates to other concepts using n-dimensional vectors, n being the number of selected common predicates. We have successfully tested the model with the public test cases of the Ontology Alignment Evaluation Initiative 2005.

The second part of the work is related to the problem of dealing with heterogeneous query interfaces. We suggest a strategy that allows redistributing an expressive user query (expressed in an XML-based data query language) over a set of autonomous and heterogeneous databases accessed through web forms. The idea, which has recently been renamed by Thomas Kabisch [81] as "Query Tunneling", consists of reprocessing the initial user query over the results returned by the different sources, which must be a superset of the results that satisfy the initial query. We describe in this document the strategy and its limitations, and an implementation in the form of two Java APIs: the Java Simple API for Metasearch (JSAM), which has been used in the development of a Spanish news metasearch engine, and the Java Simple API for Web Information Integration (SAWII), which offers high-level tools for the development of articulated wrappers for complex web form-chains and result pages.


Contents

Title Page
Abstract
Table of Contents
List of Figures
List of Tables
Already Published Work
Acknowledgments
Dedication

1 Introduction
    About this thesis
    Aims and Hypothesis
    Methodology
    Related to the first contribution part 'Heterogeneous Data Models and Schemas: Semantic Integration'
    Related to the second contribution part 'Heterogeneous Query Interfaces: Query Tunneling'
    Document outline
    Background Information
    State of the Art and Problem Statement
    Heterogeneous Data Models and Schemas: Semantic Integration
    Heterogeneous Query Interfaces: Query Tunneling
    Relationship between thesis parts and chapters

2 Background Information
    Information vs. Data
    Metadata
    Information Retrieval vs. Data Retrieval
    Traditional Information Retrieval vs. Multimedia Information Retrieval
    Data Integration
    Distributed Information Retrieval vs. Data Integration
    Metasearch
    Datalog
    The Extensible Markup Language (XML)
    The Semantic Web
    Querying the Semantic Web
    Semantic Integration
    Resource Description Framework (RDF)
    Ontology Web Language (OWL)
    OWL and Description Logics

I State of the Art and Problem Statement

3 State of the Art in Data Integration
    Historical Progress
    Mediated Schema (the modeling problem)
    Formalisation of the modelling problem
    Formalisation of global-as-view (GAV) approach
    Formalisation of local-as-view (LAV) approach

4 Query reformulation algorithms (the querying problem)
    Query reformulation in LAV and GAV (the querying problem)
    Answering queries using views
    Query containment
    Rewriting a query using views
    Parametrized views
    Query processing

5 Data Integration and XML
    Mapping the classic data integration problems to XML
    XML query languages and data integration
    XSL Transformations (XSLT)
    XML Query (XQuery)
    XML Data Integration Systems
    Tukwila
    Enosys
    XQuare Fusion

6 Semantic Integration
    Ontologies and Data Integration
    Semantic integration challenges
    Ontology Alignment
    Alignment Methods
    Similarity measures
    GMO. A structure-based semantic similarity algorithm
    Graph similarity calculation algorithm
    GMO adaptation of the graph similarity algorithm to OWL-DL
    Concept of similarity in GMO
    An example
    Upper Ontologies
    IEEE SUMO
    DOLCE
    WordNet
    Cyc/OpenCyc

7 Current Challenges in Data integration
    Semantic Mappings Generation: Schema matching and Ontology Alignment
    Answering queries using ontology alignments
    Uncertain mappings
    XML-RDF semantic integration
    Querying highly volatile and restricted Web data sources
    Data integration in P2P

8 Problem Statement
    Problem addressed 1: Semantic integration
    Problem addressed 2: Heterogeneous query interfaces

II Heterogeneous Data Models and Schemas: Semantic Integration

9 XML Semantic Integration: A Model Mapping Approach
    Already published work
    Introduction
    Related work
    The query rewriting approach
    Other related work. Model-mapping vs. Structure-mapping
    Architecture of the semantic XPath processor
    Overview
    OWL. An ontology web language
    An OWL ontology for the XML model (XML/RDF Syntax)
    XPath
    XPath data model
    XPath syntax
    XPath Formal semantics
    RDQL. A Query Language for RDF
    XPath translation to RDQL
    Example results
    Incorporating schema-awareness
    Mapping XML Schema to RDF
    A simple example of schema-aware XPath processing
    Complete XSD to OWL Mapping
    Implementation and performance
    Jena Inference Engine
    Performance
    Testing in the DRM Application Domain
    Application to ODRL license processing
    Application to the MPEG-21 authorization model
    Conclusions

10 A Vector Space Model for Semantic Similarity Calculation and OWL Ontology Alignment
    Already published work
    Introduction
    Motivation
    Ontology Alignment
    Semantic similarity measures
    Our approach
    Representing RDF labelled directed graphs with a vector space model (VSM)
    Similarity of entities within the same ontology
    Applying the model to an ontology alignment process
    Computational cost and optimization
    Comparison against algorithms based on bipartite graphs
    An extended example
    Results
    Related Work
    Conclusions

III Heterogeneous Query Interfaces: XML Query Tunneling

11 Facing Heterogeneous Query Interfaces: Query Tunneling
    Already published work
    Introduction
    Web search engines
    Specialised search engines
    Metasearch engines
    Our approach: Advanced metasearch
    Specialised Metasearch
    XML Search Neutral Language (XSNL)
    A Practical Application: Advanced News Meta-search Engine
    Implementation
    Mapping the user query to target systems
    Metadata extraction ("Screen Scraping")
    Reprocessing the results
    Related work
    Conclusions

12 Waiting Policies for Distributed Information Retrieval on the Web
    Motivation
    Distributed Search Engine Performance
    Waiting Policy
    Target Engines Behaviour
    Results vs. Time
    Source Discarding Policies
    Minimum Granted Results Policy
    Conclusions

IV General Conclusions

13 Conclusions
    Heterogeneous Data Models and Schemas: Semantic Integration
    Heterogeneous Query Interfaces: XML Query Tunneling
    A Final Comment

Bibliography

List of Figures

Architecture of a data integration system
Global-as-view (GAV) approach
Local-as-view (LAV) approach
A conjunctive query Q2 with two subgoals
A conjunctive query Q1 with three subgoals that is contained in Q
The RDF graph of OA
The RDF graph of OB
GA (left) and GB (right)
Comparison between an RDF graph (left) and its corresponding directed bipartite graph (right)
G0A
G0B
GA
GB
Workflow of the query answering process
Semantic XPath processor architecture overview
XML simple example describing two movies
RDF graph for movies example
Jena hybrid execution model
Some ODRL constraint elements defined as substitutionGroup of constraintElement
MPEG-21 REL authorization model
Example of resource elements
Portion of the acts taxonomy in MPEG-21 RDD
GA
GA (left) and GB (right)
Bipartite version of Figure
GA
GB
Bipartite version of GA
Bipartite version of GB
Strategy Diagram
DMAG's News Advanced Meta-search Engine user interface
DMAG's News Advanced Meta-search Engine advanced user interface
Mapping XSNL to each engine with XML Query
Screen Scraping with XML Query
Reprocessing of the results with XML Query
Three way trade-off in search engine performance [85]
Waiting Policy
Distribution of Engines Delay
Results Evolution
Results progression in hostile network conditions
Timeout Policy Example
Minimum-results Policy Example
Source Discarding Policies Formula
Minimum Granted Results Policy Formula
Minimum Granted Results Policy

List of Tables

XPath axis
XML Schema informal semantics
XSD2OWL translations for the XML Schema constructs
OWL constructs supported by the Jena's OWL reasoner
Performance for different document depth levels
Number of named XML Schema primitives in ODRL and MPEG-21 REL
OAEI 2005 tests where our approach (vsm) obtains a different result than [64]

Already Published Work

Large portions of Chapter 9 have appeared in the following papers:

- Tous R., García R., Rodríguez E., Delgado J. Architecture of a Semantic XPath Processor. Application to Digital Rights Management, 6th International Conference on Electronic Commerce and Web Technologies EC-Web 2005. August 2005 Copenhagen, Denmark. Lecture Notes in Computer Science, Vol (2005), pp. 1-10. ISSN:
- Tous R., Delgado J. A Semantic XPath processor. InterDB 2005 International Workshop on Database Interoperability. ELSEVIER's Electronic Notes in Theoretical Computer Science 2005
- Tous R., Delgado J. RDF Databases for Querying XML. A Model-mapping Approach. DISWeb 2005 International Workshop Data Integration and the Semantic Web. Proceedings of the CAiSE'05 Workshops. Faculdade de Engenharia da Universidade do Porto. ISBN pages:
- Tous R., Delgado J. Using OWL for Querying an XML/RDF Syntax. WWW'05: Special interest tracks and posters of the 14th international conference on World Wide Web. Chiba, Japan. Pages: ACM Press 2005. ISBN:

Large portions of Chapter 10 have appeared in the following paper:

- Tous R., Delgado J. A Vector Space Model for Semantic Similarity Calculation and OWL Ontology Alignment, 17th International Conference on Database and Expert Systems Applications (DEXA 2006), 4-8 September 2006. To be published in Lecture Notes in Computer Science.

Large portions of Chapters 11 and 12 have appeared in the following papers:

- Gil R., Tous R., García R., Rodríguez E., Delgado J. Managing Intellectual Property Rights in the WWW: Patterns and Semantics. 1st International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2005), November 2005
- Tous R., Delgado, J. Interoperability Adaptors for Distributed Information Search on the Web. Proceedings of the 7th ICCC/IFIP International Conference on Electronic Publishing
- Tous R., Delgado, J. Advanced Meta-Search of News in the Web, Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing. Publisher: VWF Berlin, 2002. ISBN pages.

Electronic preprints are available on the Internet at the following URL:


Acknowledgments

This work has been supported by AgentWeb, TIC Spanish Administration project, and by VISNET, European Commission IST FP6 Network of Excellence. I would also like to express my gratitude to Jaime Delgado, my advisor, without whom this thesis would never have begun and, what is worse, never ended. I am also grateful to Miquel Oliver and Joan Vinyes for their support in difficult times. I thank Juan C. Sánchez for sharing with me so many good and bad moments during the doctorate programme. I thank Ana Escudero for being such a patient deskmate and for her grammar wisdom. I thank Boris Bellalta for his sense of humour and sarcastic point of view. I thank Anna Sferopoulou for her unconditional support and inspiration. I thank Mohammad Mobaseri for his quiet and meditated point of view. I thank Jaume Barceló for his impulsive and extreme point of view (that's a cross acknowledgment ;-)). I thank Roberto García for introducing me to the Semantic Web and for our breakfast brainstorms, undoubtedly one of my best sources of inspiration. I thank Eva Rodríguez for her practical opinions and her contributions in the DRM part. I thank Rosa Gil for her sincerity and sensibility. I thank Manel Guerrero for his comments and help with Linux. I'm in real debt to Ramon Martí for his help and partnership in my beginnings at UPF. I thank Jorge Infante for our never-ending discussions, where I never win but always learn. I thank Núria Garcia for her experienced counsel and her humanity. I thank Susanna 'Belén' Fernández, Iolanda Sabater and Carina Ortega for making my visits to the ESUP secretariat a pleasure. I thank Roser Clavero for being the nicest and most competent professional I ever met. I am also grateful to all my other friends from the Department of Technology (present and former). I am forever indebted to my parents, my brother and Elena for their understanding, endless patience and encouragement.


Dedicated to Lourdes, Benjamí, Dan and Elena


Chapter 1

Introduction

This introductory chapter serves three purposes. On the one hand, to define the scope of the work addressed by this document. On the other hand, to facilitate the reading of the document by giving an overview of its structure. Finally, and most importantly, to provide the necessary elements to place the work in the framework of a PhD thesis.

1.1 About this thesis

This document describes research work carried out within the Distributed Multimedia Applications Group (DMAG), and also in the context of the Doctorate in Computer Science and Digital Communication of the Department of Technology of the Universitat Pompeu Fabra. The work focuses on the different aspects related to the design of modern data integration systems in the context of the World Wide Web, combining an exploratory stage with others focused on the design of new strategies and models and the development of specific tools. The main scope of the work is the data integration problem, within the databases research discipline, but it also falls between other well-known disciplines. On the one hand, information retrieval, since in most cases the purpose of storing and accessing data and metadata is not the metadata themselves but the support of information search and retrieval processes. On the other hand, the Web discipline, and especially the Semantic Web Initiative, a promising and relatively new research line where metadata has a central role. Among all these sources I've tried to choose the elements that define the context of this work, and the weaknesses and opportunities that justify the work itself.

1.2 Aims and Hypothesis

The general goal of this research project is to study solutions to the new problems that have arisen with respect to the querying of distributed and heterogeneous sources of data and metadata. This goal can be seen as a reformulation of some topics already faced by the data integration community, which arise again as databases become open and accessible through the Web, storing data and metadata in old and new formats that can be the basis of better ways of searching and retrieving digital contents. Within this ambitious and broad aim, the work has focused on two related but independent aspects, semantic integration and query interoperability. Within the semantic integration problem, we face the XML-RDF integration from a novel approach, and also the ontology alignment problem. Within the query interoperability problem, we face the querying of restricted and heterogeneous web-based database interfaces with XML technologies.

A more tangible sub-goal of the work is the development of tools (in the form of APIs) to test the models and strategies suggested. For the first part, related to semantic integration, the implementation of a semantic XPath processor and its usage in the Digital Rights Management (DRM) domain will demonstrate the interest of our novel way to map XML and RDF. Related to ontology alignment, the implementation of our novel structure similarity measure and its results against the Ontology Alignment Evaluation Initiative 2005 test suite will demonstrate the relevance of our approach. For the second part, related to query interoperability, two Java APIs - the Java Simple API for Metasearch (JSAM) and the Java Simple API for Web Information Integration (SAWII) - integrated inside an advanced Spanish news metasearch engine will serve to prove the advantage of the query tunneling technique.

All the work rests on the presumption that the traditional data integration strategies still don't take advantage of the potential of the new uses of metadata on the Web and some of its new directly or indirectly related technologies, like XML [151], XML Query [154], RDF [124] or OWL [112]. The success of standard syntaxes for metadata (XML and RDF), the dissemination of metadata related to multimedia contents (e.g. MPEG-7 [101]) or to the broader framework of the Semantic Web, open new opportunities and challenges for distributed data retrieval and data integration.

For a more detailed description of the goals of the thesis please refer to the Problem Statement chapter.

1.3 Methodology

In order to avoid replication of work, and also to acquire the necessary background, the exploratory and analytical stage of the thesis has involved the identification and reading of relevant materials from a lot of different sources. Because the work falls between different research disciplines, which sometimes have different approaches to the same problems, I have tried to identify the most authoritative sources of each field.

Related to the first contribution part 'Heterogeneous Data Models and Schemas: Semantic Integration'

I acquired background on issues related to semantic integration from the works of A.Y. Halevy [54] et al. (the Piazza infrastructure), I. Cruz et al. [65], B. Amann et al. [5], M.C.A. Klein, L.V. Lakshmanan and F. Sadri [87], P.F. Patel-Schneider and J. Simeon [117].

However, the spark that inspired the contribution of the first sub-part (XML-RDF integration) came from the reading of a relatively old work of Yoshikawa et al. [97] that differentiated between approaches that map the structure of some XML schema to a set of relational tables and works that map the XML model to a general relational schema. I've also found a previous work from Lehti and Fankhauser [91] that also pursues the target of achieving a semantic behaviour for XPath/XQuery. For the formalisation of the XML/RDF Syntax with Description Logics I've borrowed abundant material from Ian Horrocks and Ulrike Sattler, like [62], [60], [59], [63] and [61]. During the development of the inference-based XPath processor I've used the Jaxen Universal Java XPath Engine [75] for parsing XPath 2.0 expressions and the Jena API [76] and its OWL inference engine [77] for processing the queries. Among the different RDF query languages (rdfDB [125], SquishQL [139], RDQL [127], or the recent SPARQL [138]) I've chosen RDQL for its maturity and because it's supported by the Jena API. This stage has also entailed frequent visits to some unfriendly W3C specifications like the XML Information Set [68], the XML Path Language (XPath) 2.0 [69], XQuery 1.0 [154], XQuery 1.0 and XPath 2.0 Data Model [152], XQuery 1.0 and XPath 2.0 Formal Semantics [153], RDF/XML Syntax Specification [147], OWL Web Ontology Language Overview [112] and XSL Transformations (XSLT) Version 2.0 [155].

For the second sub-part (a vector space model for semantic similarity calculation), I've found a good survey on schema alignment from Erhard Rahm and Philip A. Bernstein [122], and a more recent survey on ontology alignment from Natalya F. Noy [106]. I've also read abundant material about semantic similarity, like the old works of Philip Resnik [129], Dekang Lin [94] or J.J. Jiang and D.W. Conrath [79]; and also more recent ones like those of Francisco Azuaje and Olivier Bodenreider [8] or M. Bisson [19] among others. Here the key work where I've found inspiration has been the paper of Wei Hu et al. [64], from whom I've also received personal support and patient clarifications. The graph matching algorithm that I have used comes from the work of Vincent D. Blondel et al. [20].

Related to the second contribution part 'Heterogeneous Query Interfaces: Query Tunneling'

For general concepts on the Information Retrieval area I've borrowed a lot of material from the work of Ricardo Baeza [9] and some of the references he uses. For more specific aspects of Web search, I've studied the works of Kobayashi and Takeda [85], Lawrence and Giles [88], Brin and Page [23][113] and others. It has been difficult to find rigorous research works about metasearch, because it is a popular topic among independent and sometimes volunteer-based communities, but I've found a good basis in the works of Dreilinger [33] and Selberg and Etzioni [135], among others.

For the study of the Semantic Web I've read materials by the people who are behind this initiative, like Tim Berners-Lee [11][15][12][14][13], James Hendler [17], Ora Lassila [17] or Dan Brickley [22], and I've had to assimilate the specification of RDF and other related technologies. I've also invested some time trying to understand and appreciate the added value of the RDF model against others (e.g. relational or XML), and I have found some good materials about these topics, like e.g. [25] or [13]. Searching for materials related to the Semantic Web activity, I've found some promising works, like the RDFWeb initiative and its FOAF ontology [39], the work of Guha and McCool [120][49] or the Query By Example by Reynolds [130]. To carry out the analysis of digital libraries and general metadata issues I've spent some time studying the specifications of the Dublin Core Element Set [28], the Z39.50 protocol [156], SRW [140] and the OAI-PMH protocol [109][110]. I've also found some works discussing issues about digital libraries and the Open Archives Initiative, like e.g. those by Lagoze and Sompel [86][137] or Baker [10]. These are just some of the materials reviewed, but they serve to illustrate in some way the selection criteria.

For the development of the specialised metasearch application I've followed a traditional software engineering approach. The requirements stage has consisted of innumerable meetings with my advisor, Jaime Delgado, to try to define clearly the different aspects of the functionality. The analysis stage has involved the hard task of translating the natural language requirements into some formal model; I've used UML to do this but also some homemade diagrams. The design stage has consisted of weaving the object architecture of the application, with a special emphasis on the multi-threading sub-parts. It has also involved the specification of a test XML-based query language and the study of the critical parts of the system, such as the mapping of the queries or the metadata extraction. To do this I've studied some existing XML-based query languages, like XCQL [150] or XML Query [154]. The implementation methodology has involved the study and selection of the tools (Java, Tomcat, XML, XML Query, Tidy) and the analysis of the performance, where I've invested a significant amount of time, especially in defining a good policy to discard targets that are suffering from unusual delays. I'm going to talk more about the implementation methodology later.

1.4 Document outline

This document has four main parts, 'Background Information', 'State of the Art and Problem Statement', 'Heterogeneous Data Models and Schemas: Semantic Integration' and 'Heterogeneous Query Interfaces: Query Tunneling', and also other smaller sections like this introduction or the final comments.

Background Information

The 'Background Information' part tries to compile in a coherent way some results of the exploratory stage that can help readers not familiar with databases, information retrieval or the Semantic Web. This implies defining and describing the key concepts of these areas that are related to the work, and also the difficult task of weaving all the relationships between them. Readers familiar with these technologies will probably skip this part.

State of the Art and Problem Statement

The 'State of the Art and Problem Statement' part is a survey of the progress made in recent years in the data integration field in general and in the semantic integration research trend in particular. It begins by enumerating some classical works and concepts of the area and ends by analysing its current weaknesses and opportunities.

Heterogeneous Data Models and Schemas: Semantic Integration

After the State of the Art come two related but independent parts containing the main contributions of this work. Because of their particularities each one of them provides a complete internal structure, including an introduction, a related work section and conclusions.

The first main contribution, the 'Heterogeneous Data Models and Schemas: Semantic Integration' part, includes two works related to the use of Semantic Web technologies to help solve the problem of working in domains where multiple schemas or ontologies exist. The first sub-part, 'XML Semantic Integration: A Model Mapping Approach', describes a model to face the problem of dealing with heterogeneous data models and schemas. It focuses on the semantic integration between XML and RDF (the Resource Description Framework). I suggest a strategy based on the mapping of the XML model to RDF triples. Translating XML documents to RDF makes it possible to take advantage of the powerful tools of Description Logics, allowing XML documents to interoperate at the semantic level. This mapping has generated some interesting results, like a schema-aware and ontology-aware XPath processor that can be used for schema semantic integration or even for implicit query transcoding among different data models.

The second sub-part, 'A Vector Space Model for Semantic Similarity Calculation and OWL Ontology Alignment', describes a novel semantic similarity measure based on a matrix representation of nodes from an RDF labelled directed graph. A concept is described with respect to how it relates to other concepts using n-dimensional vectors, n being the number of selected common predicates. It shows how to adapt the graph matching algorithm in [20] to apply this idea to the alignment of two ontologies. It also includes the results of the test cases of the Ontology Alignment Evaluation Initiative 2005.

Heterogeneous Query Interfaces: Query Tunneling

The second main contribution, 'Heterogeneous Query Interfaces: Query Tunneling', describes a strategy to face the problem of dealing with heterogeneous query interfaces. I suggest a strategy that allows redistributing an expressive user query over a set of databases with heterogeneous web-based query interfaces. The idea, which has recently been renamed by Thomas Kabisch [81] as "Query Tunneling", consists of reprocessing the initial user query over the results returned by the different sources, which must be a superset of the results that satisfy the initial query. I describe in this part the strategy and its limitations, and an implementation in the form of two Java APIs: the Java Simple API for Metasearch (JSAM), which has been used in the development of a Spanish news metasearch engine, and the Java Simple API for Web Information Integration (SAWII), which offers high-level tools for the development of articulated wrappers for complex web form-chains and result pages.

Relationship between thesis parts and chapters

The first part, 'Heterogeneous Data Models and Schemas: Semantic Integration', refers to the problem of schema/ontology interoperability. This problem can be divided into two sub-problems: 1) how can we use a global (mediator) data schema/ontology to interoperate a set of heterogeneous schemas/ontologies? and 2) how can we automatically generate this global schema/ontology?

Chapter 9 faces the first of these problems for the particular case of XML, suggesting a strategy to design a schema-aware and ontology-aware XPath processor, which processes queries taking into consideration the relationships defined in one or more XML schemas or OWL/RDFS ontologies. This allows the queries to be written in terms of one of the schemas/ontologies (acting as a global schema), whose relationships with the specific schemas have been previously specified. How these relationships (mappings) are obtained is not the focus of this chapter, but of the next one.

Chapter 10 is independent from the previous one, but is related to the second problem (how the mappings between ontologies/schemas can be automatically generated). This is the reason why the two chapters have been placed in the same part.

The second part, 'Heterogeneous Query Interfaces: Query Tunneling', describes a practical approach to the design of a web-based data integration system using XML technologies. It includes some chapters covering practical solutions, based on XML and XML Query, for real world scenarios. However, although it is strongly related to the previous chapters, the focus of this work is not the generation of schema mappings or the interaction with ontologies, but the problem of distributing the queries over restricted autonomous interfaces. This is the reason why the two contributions have been placed in separate parts.
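The query tunneling idea summarised in this chapter (send each restricted source a relaxed query whose answers are a superset of the wanted ones, then reprocess the full user query locally over what comes back) can be illustrated with a minimal sketch. This is not the JSAM or SAWII implementation, which is written in Java and is not reproduced here; the names `NewsItem`, `make_keyword_source`, `tunnel_query` and `full_query`, and the sample data, are hypothetical and chosen only for illustration. Each source is assumed to accept a single keyword, standing in for a web form with limited expressiveness.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class NewsItem:
    title: str
    year: int
    source: str


def make_keyword_source(
    name: str, rows: Sequence[Tuple[str, int]]
) -> Callable[[str], List[NewsItem]]:
    """Build a hypothetical restricted source that only supports one-keyword search."""
    def search(keyword: str) -> List[NewsItem]:
        kw = keyword.lower()
        return [NewsItem(t, y, name) for t, y in rows if kw in t.lower()]
    return search


def tunnel_query(
    sources: Sequence[Callable[[str], List[NewsItem]]],
    relaxed_keyword: str,
    full_query: Callable[[NewsItem], bool],
) -> List[NewsItem]:
    """Send the relaxed query to every source, then reprocess the full query locally."""
    candidates: List[NewsItem] = []
    for search in sources:
        # Assumption: the relaxed query returns a superset of the wanted results.
        candidates.extend(search(relaxed_keyword))
    return [item for item in candidates if full_query(item)]


def full_query(item: NewsItem) -> bool:
    """The expressive user query: two keywords in the title and an exact year."""
    title = item.title.lower()
    return "madrid" in title and "election" in title and item.year == 2005


sources = [
    make_keyword_source("A", [("Election results in Madrid", 2004),
                              ("Madrid football derby", 2005)]),
    make_keyword_source("B", [("Madrid election campaign opens", 2005),
                              ("Barcelona election debate", 2005)]),
]
results = tunnel_query(sources, "madrid", full_query)
```

The sketch only works because the relaxed keyword is strictly weaker than the full query; if a source silently dropped matching items, the local reprocessing step could not recover them, which is one of the limitations the strategy has to account for.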

27 Chapter 2 Background Information 'dataintegration','semanticweb'andothers,arespeciallysusceptibleofbeinginterpreted Termsas'information','data','metadata','informationretrieval','dataretrieval', inverydierentways.so,toavoidmisunderstandings,inthissectioni'mgoingtoclarify, oratleastnarrow,thesemanticsofsomeimportantconceptsrelatedtothiswork. 2.1 Informationvs. Data herei'mgoingtospeciallystrictinseparatingthetwoconcepts.accordingtothefreeon- Despiteofinsomeworkstheterms'information'and'data'areusedindistinctly, linedictionaryofcomputing[111],dataarenumbers,characters,images,orothermethod ofrecording,inaformwhichcanbeassessedbyahumanor(especially)inputintoacomputer,storedandprocessedthere,ortransmittedonsomedigitalchannel.computersnearly alwaysrepresentdatainbinary.dataonitsownhasnomeaning,onlywheninterpretedby somekindofdataprocessingsystemdoesittakeonmeaningandbecomeinformation.. highereortofabstraction.nowwecantakeprotthatwe'vealreadyseparatedthesetwo So,itisclearthatiseasiertodenedatathaninformation,whichrequiresa conceptstolocateanotherone,knowledge. Accordingto[21]Wehadtwodecadeswhich focusedsolelyondataprocessing,followedbytwodecadesfocusingoninformationtechnology, andnowthathasshiftedtoknowledge.there'sacleardierencebetweendata,information, andknowledge. Knowledgeistheabilitytousethatinformation.. Informationisabouttakingdataandputtingitintoameaningfulpattern. 2.2 Metadata tionontheorganizationofthedata,thevariousdatadomains,andtherelationshipbetween Inshort,metadataisdataaboutdata. Accordingto[9]metadataisinforma- them.timberners-leegivesamoreweb-centricdenition,metadataismachineunderstandableinformationaboutwebresourcesorotherthings. ofmetadata[100],descriptivemetadataandsemanticmetadata.descriptivemetadatais Wecandierencetwokinds externaltothemeaningofthedataitdescribes,andpertainsmoretohowitwascreated. Forexample,theDublinCoreMetadataElementSet[28]proposes15eldstodescribea 7

28 8 Chapter2:BackgroundInformation document. dataitisdescribing.anexampleofsemanticmetadatacouldbesomempeg-7[101]descriptors,thatallowtodescribeforexamplethatavideoincludesafootballmatchina rainyday. intellectualpropertyrights,etc.butnowadayssomepointtomoreambitioususes(overall Therearealotofdierentusesofmetadata,likecataloguing,contentrating, SemanticMetadatafeaturesthesubjectmatterthatcanbefoundwithinthe ofsemanticmetadata)likeforexamplethesemanticwebinitiative. 2.3 InformationRetrievalvs. DataRetrieval ganizationof,andaccesstoinformationitems. Accordingto[9],informationretrievaldealswiththerepresentation,storage,or- theuseraccesstotheinformationinwhichheisinterested.however,theuserinformation TheaimofanIRsystemistofacilitate needcannotbeeasilyformalized,anitmustbetranslatedintoaqueryprocessablebythe retrievalsystem. mightberelevanttotheuser. Giventheuserquery,theIRsystemaimstoretrieveinformationwhich acollectionsatisfyclearlydenedconditionsasthoseinarelationalalgebraexpression. Ontheotherhand,adataretrievalsystemaimstodeterminewhichobjectsof Foradataretrievalsystem,likeadatabase,asingleerroneousobjectamongathousands retrievedmeansatotalfailure. expressivequerylanguagesandperformanceissues,whileinformationretrievalfacesthe So,dataretrievaldealswithwelldeneddatamodels, problemofinterpretingthecontentsoftheinformationitemstodecidetheirrelevance.the twoconceptsarenotisolated,dataretrievaliseveranimportantpartofanirsystem,and canbeseenasalower-levellayer.becausethisresearchworkfocusesinmetadata,instead ofonothertraditionalirissues,itisinsomewaymorerelatedtodataretrievalratherthan toinformationretrieval.however,becausethecontextkeepsbeingirsystemsfortheweb, thereferencestoiraspectswillbeusual. 2.4 mationretrieval Traditional Information Retrieval vs. 
Multimedia InforditionalIRisanolddiscipline,withpublishedbooksfromeventhelast70slikee.g.[131], Traditionalinformationretrievalonlydealswithunstructuredtextualdata.Tra- andalreadyclassicconferenceslikeacmsigir(internationalconferenceoninformation Retrieval)orTREC(TextREtrievalConference).Itsresearchisbasedinsolidandwellcharacterizedmodels,likethebooleanmodel,thevectormodelortheprobabilisticIRmodel. stagesincethe90s. TheirruptionoftheWebandWebsearchengineshasputIRatthecenterofthe retrievalofheterogeneousmultimediacontents.multimediadataisrapidlygrowinginthe HoweverthenewcontextintroducesnewchallengesforIRlikethe Internet,andalsometadatarelatedtomultimediainformationobjects,thatofcourseinclude textualdocuments.multimediairsystemsmustsupportdierentkindsofmediawithvery heterogeneouscharacteristicssuchastext,stillandmovingimages,graphsandsound.this posesseveralinterestingchallenges,duetotheheterogeneityofdataandthefuzzinessof information.multimediairsystemshaveaninterestingfeatureforourconcerns,theymust

29 Chapter2:BackgroundInformation 9 handlemetadata,becauseitiscrucialfordataretrieval,whereastraditionalirsystemsdo nothavesuchrequirement. 2.5 DataIntegration studymechanismsforaseamlessaccesstoautonomousandheterogeneousinformation TheDataintegration(alsonamedInformationIntegration)researchdiscipline sources. cations. Traditionallythetargetofadataintegrationsystemistoprovideamediation ThesesourcescanvaryfromlegacydatabasestoSemanticWeborP2Pappli- architectureinwhichauserposesaquerytoamediatorthatretrievesdatafromunderlying sourcestoanswerthequery. dierentdatamodelsandschemas,arethechallengesofadataintegrationsystem. Theconstraintsofthesourcesaccess,andtheirpotentially situationwillbedescribedinthestateoftheartandproblemstatementchapter. Dataintegrationisthemaintopicofthisthesis,anditsevolutionandcurrent 2.6 DistributedInformationRetrievalvs. DataIntegration accordingto[24],inasingledatabasemodeloftextretrieval,inwhichdocumentsarecopied SearchenginesfortheWeborotherinformationretrievalsystemsareusuallybased, toacentralizeddatabase,wheretheyareindexedandmadesearchable(thismodelcan beseenastheinformationretrievalversionofdatawarehousingfordataretrieval). ever,someinformationisnotaccessibleunderthismodel(itcanbequeriedbutcannotbe How- copiedtothecentralizeddatabase)fordierentreasons(size,volalility,interfacerestrictions).thealternativeisamulti-databasemodel,inwhichthecentralsite(oranypeerina distributedpeer-to-peercontext),insteadofstoringcopiesofthedocuments,translatesthe userinformationneedintoqueriestothedierentsources. bythedistributedinformationretrievaldiscipline,andhasbeenalsoinformallyknownas Thiskindofmodelisstudied metasearch. Adistributedinformationretrievalsystemcoverstraditionallythefollowingstages: Sourcedescription:Thecontentsofeachtextdatabasemustbedescribed Sourceselection:Giventhedescriptorsandtheuserinformationneed,whichsources mustbequeried. Sourcequerying:Maptheinformationneedtotheselectedsourcesandquerythem. 
Results merging: merge the ranked lists returned by the different sources.

In some aspects data integration and distributed information retrieval are equivalent, and in some contexts the two terms are used interchangeably. However, while distributed information retrieval aims to satisfy a user information need over unstructured or semi-structured data sources, a data integration system aims to satisfy a query over data sources that are also autonomous and heterogeneous, but structured.
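The four stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Source` class, the term-overlap selection rule and the score normalisation used for merging are hypothetical choices made for the sketch, not techniques taken from [24] or from any particular metasearch engine.

```python
from typing import Callable, Dict, List, Set, Tuple

Ranked = List[Tuple[str, float]]  # (document id, source-local score), best first


class Source:
    """A searchable text database together with its source description."""

    def __init__(self, name: str, description: Set[str],
                 search: Callable[[Set[str]], Ranked]) -> None:
        self.name = name
        self.description = description  # stage 1: terms the source claims to cover
        self.search = search            # wrapper around the source's own query interface


def select_sources(sources: List[Source], need: Set[str]) -> List[Source]:
    """Stage 2: keep the sources whose description shares a term with the need."""
    return [s for s in sources if s.description & need]


def merge_results(ranked_lists: List[Ranked]) -> List[str]:
    """Stage 4: normalise each list by its top score, then order by best score."""
    best: Dict[str, float] = {}
    for ranked in ranked_lists:
        top = max((score for _, score in ranked), default=0.0)
        for doc, score in ranked:
            normalised = score / top if top else 0.0
            best[doc] = max(best.get(doc, 0.0), normalised)
    # Ties are broken by document id so that the output is deterministic.
    return sorted(best, key=lambda doc: (-best[doc], doc))


def metasearch(sources: List[Source], need: Set[str]) -> List[str]:
    """Run selection, querying (stage 3) and merging for one information need."""
    selected = select_sources(sources, need)
    ranked_lists = [s.search(need) for s in selected]
    return merge_results(ranked_lists)
```

Scores coming from different sources are not comparable in general, which is why the sketch normalises each list before merging; a real system would need a more careful merging strategy.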

2.7 Metasearch

Metasearch is the informal name for Web-based distributed information retrieval systems (see [33] or [135] for a more detailed definition). Because metasearch systems are supposed to solve some of the problems described in the previous section, and because they have never really gained the favour of Web users, some interesting conclusions can be drawn from their evolution.

In theory, the strength of a metasearch engine is its recall, because it queries a set of conventional crawler-based search engines that cover certain subsets of the public indexable Web. However, some studies have shown that these subsets, far from being disjoint, overlap heavily [88]. Furthermore, the main problem of conventional Web search systems is not recall but precision. Another important drawback of metasearch systems is that they often rely neither on an agreement with their underlying sources nor on specific interchange protocols. This forces them to deal with query forms and result pages designed for human consumption and written in HTML. This is usually solved with hand-coded screen-scraping rules or similar techniques, which reduce speed and make maintenance difficult. Some metasearch defenders claim that these systems have an important advantage, the ability to access the hidden Web: because they capture the results of the target sources on the fly, they can harvest dynamically generated content. However, this way of doing things, even leaving legal issues aside (see footnote 1), is far from elegant. Recall the words of Tim Berners-Lee [14]:

"And so you have one program which is turning it from data into documents, and another program which is taking the document and trying to figure out where in that mass of glowing flashing things is the price of the book. It picks it out from the third row of the second column of the third table in the page. And then when something changes suddenly you get the ISBN number instead of the price of a book and you have a problem. This process is called 'screen scraping', and is clearly ridiculous."

Footnote 1: Some conventional search engines explicitly forbid metasearch. In Google's terms of service page we can find: "You may not send automated queries of any sort to Google's system without express permission in advance from Google".

2.8 Datalog

Chapters 3 and 4 make use of the Datalog language [43] in some examples related to initial data integration approaches. Datalog is an old (1978) database query language that is syntactically a subset of Prolog. Its logical basis made it popular in academic database research, but despite its advantages over standard query languages like SQL it never succeeded in becoming part of commercial systems.

A Datalog query program consists of a finite set of Horn clauses $C_1, \ldots, C_k$ (the rules of the Datalog program). Horn clauses express the subset of statements of first-order logic with at most one positive literal:

$L \leftarrow L_1, \ldots, L_n$

Equivalent to:

$L \lor \lnot L_1 \lor \ldots \lor \lnot L_n$

In Datalog each literal corresponds to an atomic formula $p(A_1, A_2, \ldots, A_n)$, where $p$ is the relation name and the $A_i$ are variables or constants (names that start with an uppercase letter are variables). Let's see a relation table written in Datalog:

employees(name, dept, id)

Let's see an example query program with one Datalog rule:

financeemployees(X, Y) :- employees(X, "finance", Y)

There are two types of relations: (1) base relations (physically stored in the database) and (2) derived relations (temporary relations that hold intermediate results). The general form of a rule is as follows:

$p(x_1, x_2, \ldots, x_n) \;\text{:-}\; q_1(x_{11}, x_{12}, \ldots, x_{1m}), \ldots, q_k(x_{k1}, \ldots, x_{kp}), e.$

where the $q_i$ are base or derived relation names, $e$ stands for arithmetic predicates (any number of them) and each $x_i$ appearing in $p$ appears in at least one of the $q_i$'s. The Datalog rule can be interpreted as:

p(...) is true if q1(...) and q2(...) and ... qk(...) and e are true.

In Datalog an answer to an atomic query is a set of constants that satisfy the query. Answers are computed by using top-down (backward-chaining) or bottom-up (forward-chaining) algorithms.

2.9 The Extensible Markup Language (XML)

According to [9], markup is defined as extra textual syntax that can be used to describe formatting, actions, structure information, text semantics, attributes, etc. One example of markup is the set of formatting commands of the popular text formatting software TeX.

The Standard Generalized Markup Language (SGML), a metalanguage for tagging text developed by Charles F. Goldfarb and his group and based on previous work done at IBM, was defined in the late seventies. In 1996 the SGML Editorial Review Board became the XML Working Group under the auspices of the World Wide Web Consortium (W3C), chaired by Jon Bosak of Sun Microsystems and with the intermediation of Dan Connolly. This group developed the Extensible Markup Language [151] (XML), a subset of SGML whose goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML (also based on SGML). XML has been designed for ease of implementation and for interoperability with both SGML and HTML [151].

XML, like SGML, is not exactly a markup language; it is a metalanguage that can be used to define specific markup languages (like XHTML, MathML, SVG, etc.). That means that XML allows users to define new tags and structures for their own languages.

For some reasons, some obvious and others that will remain a mystery, XML has achieved amazing success worldwide. In an interconnected and global society, the interchange of data over a standard syntax has become a key issue, and this is where XML fits perfectly.

2.10 The Semantic Web

The Semantic Web is a promising initiative led by the W3C whose aim is to provide a data model for the Web, allowing information to be understood and processed also by machines. The official definition of the W3C [142] says: "The Semantic Web is the representation of data on the World Wide Web. [...] It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming." So it is clear that this initiative strongly relies on RDF, but what is the meaning of "the representation of data"? Often in this document we have referred to the difficulties related to searching and retrieving information on the Web. One of the main reasons is the fact that most Web data, despite being processed by machines, can only be understood by humans. This includes natural language text, still and moving images, audio, etc.

Earlier we discussed the difference between information retrieval and data retrieval, saying that while data retrieval is appropriate for databases, it is not appropriate (or not enough) for the Web. The reason is that the information on the Web, contrary to databases, does not have an underlying data model. So "the representation of data" means two things: the development of a data model for the Web, and the dissemination of machine-understandable metadata (under the framework of the data model) linked in some way to the Web information. Another classic definition is that by Berners-Lee et al. [17]: "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

2.11 Querying the Semantic Web

The Semantic Web initiative has opened a broad spectrum of opportunities for improving the search and retrieval of information on the Internet. Of course this is not accidental, but one of the main targets of this new scenario, as pointed out in [15] or [12]. However, the consolidation of a standardised way to interchange semantic information is just another step in the race for interoperability. Other battles are being fought to rationalise the way this information is processed, and search and retrieval are perhaps the most important elements of the information feed chain. The challenge is to find efficient and rational ways to exploit this new information that is beginning to be disseminated over the net and that, despite being formalised in a standard way (RDF [22]), can be stored in different ways (embedded in HTML pages, in a database, in specific knowledge repositories, etc.) and remains highly heterogeneous (an innumerable and unrestricted number of ontologies, potentially overlapping, can coexist in the Semantic Web).

These two key issues, how to locate and access the information and how to manage heterogeneity, are of relevance for our analysis and also closely related to what we have said in the previous sections. Some research works reflect special approaches to this, like the

Edutella project [102], which constitutes a distributed search network for educational resources and is based on P2P networking (its JXTA implementation [80]) and RDF. This interesting work uses the query exchange language family RDF-QEL-i (based on Datalog semantics and subsets) as its standardised query exchange language format. Because Edutella peers are highly heterogeneous and have different kinds of local storage for RDF triples, as well as some kind of local query language (e.g., SQL), wrappers are used to translate queries and results from the peer and vice versa, enabling the peers to participate in the network.

Another work is Sesame [73], an extensible architecture implementing a persistent RDF store and a query engine capable of processing RQL queries [3][42]. Of special interest for us is TAP [49], a system that implements a general query interface called GetData, Semantic Negotiation and Web of Trust enabled registries. It introduces the concept of Semantic Search and describes an implemented system which uses data from the Semantic Web to improve traditional search results. The GetData interface is a simple query interface to network-accessible data presented as directed labelled graphs, in contrast to expressive query languages like SQL, RQL or DQL. This work favours deployability over query expressiveness.

Related to this project, and also to the query language of Edutella, is RDF-QBE [130], a mechanism for specifying RDF subgraphs, which its authors call 'Query by Example', that could allow a high-performance standardised interface for retrieval of semantic information from remote servers. From all these case studies we can observe the latent need to define a low-barrier mechanism for harvesting heterogeneous semantic information, and how it generates a trade-off between deployability and expressiveness. Some of them (e.g. TAP) point out the need to also consider conventional, non-semantic search strategies, like crawler-based engines, when thinking about future applications.

2.12 Semantic Integration

The semantic integration research area is a joint effort between people from databases and data integration and people from knowledge management and ontology research.
It can be seen as a reformulation of some old problems, like matching database schemas or answering queries using multiple sources. Ontologies can be the solution to some of these old challenges but also the source of new problems, like ontology alignment. This last topic tries to determine which concepts and properties represent similar notions between two ontologies, and is one of the main goals in this area. Ontology alignment raises other questions, like how to represent the mappings or what to do with them. [106] presents a recent survey on semantic integration.

Semantic integration is one of the main topics of this thesis, and it will be discussed in more depth in the state of the art and problem statement chapter.

2.13 Resource Description Framework (RDF)

According to [124], the Resource Description Framework (RDF) "is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web". RDF is the main building block of the Semantic Web, a framework composed of several different but strongly related elements (a data model, a syntax and a subclassing language, among other things). First, RDF is a syntax-independent data model designed for representing named properties and property values. The basic model consists of three object types. On the one hand there are resources: all things being described, like Web pages, images, videos or any other thing, even if it is not digital (like a real-life book). The only requirement is that they have a name, and that this name conforms to the URI [16] syntax. On the other hand we have properties, specific characteristics used to describe a resource. The value of a property (the object) can be another resource (specified by a URI) or a literal (i.e. a simple string). Finally, a specific resource together with a named property plus the object (the value of the property for that resource) is a statement, the third basic object type of the model.

RDF is being used in a variety of application areas, for example in resource discovery, in cataloguing (for describing the content and content relationships of some information object), by intelligent software agents, in content rating, or in describing intellectual property rights. The combination of RDF with digital signatures aims to allow what is known as the "Web of Trust" [83]. The conceptual model of RDF is complemented with an XML interchange syntax. The syntax is needed to ensure the required interoperability when creating and exchanging metadata.

To complete the framework, RDF has a class system much like many object-oriented modelling systems. A collection of RDF classes is called a schema. A schema contains classes organised in a hierarchy, offering extensibility through subclass refinement. RDF schemas allow reusability of metadata definitions. The schemas themselves may be written in RDF, with the RDF Schema language [126]. RDF schemas are being used nowadays to serialise ontologies.
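The three object types of the basic RDF model (resources, properties and statements) can be mirrored with plain Python tuples. The sketch below is illustrative only: the `Statement` and `Literal` classes, the `values` helper and the two `example.org` URIs are hypothetical names introduced here, and the code does not use or imitate any RDF library. The two property URIs are the Dublin Core `title` and `creator` elements.

```python
from typing import List, NamedTuple, Union


class Literal(NamedTuple):
    """A literal property value (a simple string)."""
    value: str


class Statement(NamedTuple):
    """An RDF statement: a resource, a named property and the property's value."""
    subject: str              # URI naming the resource being described
    predicate: str            # URI naming the property
    obj: Union[str, Literal]  # another resource (given by its URI) or a literal


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
THESIS = "http://example.org/docs/thesis"    # hypothetical resource URI
AUTHOR = "http://example.org/people/author"  # hypothetical resource URI

statements: List[Statement] = [
    # The value of this property is a literal.
    Statement(THESIS, DC_TITLE,
              Literal("Data Integration with XML and Semantic Web Technologies")),
    # The value of this property is another resource.
    Statement(THESIS, DC_CREATOR, AUTHOR),
]


def values(data: List[Statement], subject: str, predicate: str) -> list:
    """Return every object stated for the given resource and property."""
    return [s.obj for s in data if s.subject == subject and s.predicate == predicate]
```

Keeping literals in their own type makes the distinction between "another resource" and "a simple string" explicit, which is the only distinction the basic model draws between property values.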
2.14 Web Ontology Language (OWL)

According to [112], the Web Ontology Language (OWL) "is a language intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. This representation of terms and their interrelationships is called an ontology".

OWL is a vocabulary for describing properties and classes of RDF resources, complementing the capabilities of RDFS in providing semantics for generalization hierarchies of such properties and classes. OWL enriches the RDFS vocabulary by adding, among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes. The language has three increasingly expressive sublanguages designed for different uses:

OWL Lite has the lowest formal complexity, and serves for simple classification hierarchies. It has some restrictions, e.g. it permits only cardinality values of 0 or 1.

OWL DL tries to offer the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time). OWL DL is so named due to its correspondence with description logics.
OWL Full does not give any computational guarantee but offers the maximum expressiveness and syntactic freedom (e.g. a class can be treated simultaneously as a collection of individuals and as an individual).

In this work I will always focus on OWL DL, even when I do not mention it explicitly.

2.15 OWL and Description Logics

OWL has the influence of more than 10 years of Description Logic research. It is supposed that this knowledge has served to choose the supported constructors and axioms carefully, balancing expressiveness and efficiency. This balance was achieved by basing OWL on the SH family of Description Logics [63]. Members of the SH family include the SHIQ Description Logic [62] and the SHOQ Description Logic [61], which overcomes some limitations of SHIQ by taking the logic SHQ and extending it with individuals and concrete datatypes.

The OWL Lite and OWL DL species are syntactic variants of Description Logic languages. OWL Lite can be seen as a variant of the SHIF(D) description logic language, which is itself just SHOIN(D) without the oneOf constructor and with the atleast and atmost constructors limited to 0 and 1 [60].

OWL DL is a variant of the SHOIN(D) language, which is itself an extension of SHOQ(D) (adding inverse roles and restricted to unqualified number restrictions) [59]. OWL Full extends both OWL DL and RDF(S) and thus cannot be translated into a description logic language. Entailment in OWL Full is undecidable in the general case, because it allows arbitrary roles in number restrictions, which makes the logic undecidable [62].


Part I

State of the Art and Problem Statement


Chapter 3

State of the Art in Data Integration

The integration of data from multiple heterogeneous sources is an old and well-known research problem. It has traditionally been addressed by the database and AI research communities, but the promise of integrating data on the WWW or peer-to-peer systems and its related technologies (XML, RDF, etc.) has attracted efforts from other disciplines. Below I explain the progress of this topic and what challenges lie ahead.

3.1 Historical Progress

The integration of multiple information sources aims at giving users and applications the illusion of interacting with one single information system. There are two general approaches to this problem: materialized integration and virtual integration.

Materialized integration is strongly related to materialized views in databases, and consists of first storing all data from all sources locally and then querying them. Data warehousing is a well-known example of materialized integration. It is suited to situations where data changes infrequently and fast evaluation of complex queries is required. However, it is not always possible or convenient to replicate and update all data from a set of sources. There are situations where the size or volatility of data (or the limitations imposed by the query interfaces of the sources) makes materialization impossible. This is the reason why virtual integration has attracted increasing interest in recent years as it has matured.

Virtual integration aims to offer the same results without the constraint of having to store and update all data from all sources. In pure virtual integration the global schema is strictly a logical entity. Queries issued over it are dynamically rewritten at runtime and redirected to the underlying data sources. The resulting data is fetched from the sources through wrappers and merged.
Here we are interested only in virtual integration, which from now on we simply call data integration. There are several problems related to data integration, but the main ones are: 1) the ability to present an integrated (mediated) schema for the user to query, or the modelling problem; 2) the ability to reformulate the query to combine information from the different sources according to their relationships with the mediated schema, or the querying problem; and 3) the ability to efficiently execute the query over the various local and remote data sources.

Figure 3.1: Architecture of a data integration system

Mediated Schema (the modelling problem)

We talk about a mediated schema when a data integration system employs a logical schema in order for several autonomous sources to interoperate (see Fig. 3.1). This kind of schema is usually accompanied by the definition of semantic mappings (translations) between the mediated schema and the schemas of the different sources. This correspondence will determine how the queries posed to the system are answered.

The two classic approaches to modelling mediated schemas and mappings are global-as-view and local-as-view. The global-as-view approach, or GAV [2] (TSIMMIS) [26][50], consists of a mediated schema (the global schema) which is defined as a set of views over the data sources. This kind of mediation has the advantage that the user query can simply be merged with the view definitions (unfolded), obtaining a full query. The disadvantage of this approach is that the mediated schema is strongly coupled to the underlying source schemas and their changes, making it a bad solution for the Web context, where sources are autonomous and volatile.

The local-as-view approach, or LAV [98][56][34], takes the inverse point of view and describes sources as views over the mediated schema. It has the advantage that changes in the underlying sources do not imply changes to the mediated schema. The disadvantage of this approach is the difficulty of mapping the user query, which refers to the mediated schema, to the different data sources. It is also worth mentioning the hybrid combination of GAV and LAV in the GLAV formalism [41].

Formalisation of the modelling problem

A formalisation of the modelling problem, borrowed from [93], is the following:

Definition 3.1.1. A data integration system $I$ is a triple $\langle G, S, M \rangle$ where:

$G$ is the global schema (structure and constraints),
$S$ is the source schema (structures and constraints), and

$M$ is the mapping between $G$ and $S$.

To specify the semantics of $I$ we have to start with a source database $D$ (source data coherent with $S$). We call global database for $I$ any database for $G$. A global database $B$ for $I$ is said to be legal with respect to $D$ if:

$B$ is legal with respect to $G$, i.e., $B$ satisfies all the constraints of $G$;
$B$ satisfies the mapping $M$ with respect to $D$.

We can also specify the semantics of queries posed to a data integration system. If $q$ is a query of arity $n$ and $DB$ is a database, we denote with $q^{DB}$ the set of tuples (of arity $n$) in $DB$ that satisfy $q$. Given a source database $D$ for $I$, the answer $q^{I,D}$ to a query $q$ in $I$ with respect to $D$ is the set of tuples $t$ such that $t \in q^{B}$ for every global database $B$ that is legal for $I$ with respect to $D$. The set $q^{I,D}$ is called the set of certain answers to $q$ in $I$ with respect to $D$.

Formalisation of the global-as-view (GAV) approach

Figure 3.2: Global-as-view (GAV) approach

When modelling with GAV, the mapping $M$ associates to each element $g$ in $G$ a query $q_S$ over $S$.

Definition 3.1.2. A GAV mapping is a set of assertions, one for each element $g$ of $G$, of the form

$g \leadsto q_S$

The idea is that each element $g$ of the global schema should be characterized in terms of a view $q_S$ over the sources. The mapping explicitly tells the system how to retrieve the data related to each element of the global schema. In this sense the GAV approach greatly helps the design of query processing, but it is effective only when the system is based on a stable set of sources.

Example 3.1.1. A data integration system over two sources of movie information could present the following global schema:

Global schema:
movie(title, year, director)
european(director)

review(title, critique)

The two data sources could present the following local schemas:

Source 1: r1(Title, Year, Director), since 1960, European directors
Source 2: r2(Title, Critique), since 1990

Each entity of the global schema has one or more associated views over the sources:

$\mathit{movie}(T, Y, D) \leadsto \{(T, Y, D) \mid r_1(T, Y, D)\}$
$\mathit{european}(D) \leadsto \{(D) \mid r_1(T, Y, D)\}$
$\mathit{review}(T, R) \leadsto \{(T, R) \mid r_2(T, R)\}$

Formalisation of the local-as-view (LAV) approach

Figure 3.3: Local-as-view (LAV) approach

When modelling with LAV, the mapping $M$ associates to each element $s$ of the source schema $S$ a query $q_G$ over $G$.

Definition 3.1.3. A LAV mapping is a set of assertions, one for each element $s$ of $S$, of the form

$s \leadsto q_G$

The idea is that each source $s$ should be characterized in terms of a view $q_G$ over the global schema. This means that adding a new source just implies adding a new assertion to the mapping. This favours the maintainability and extensibility of the data integration system.

Example 3.1.2. The movies example under the LAV approach:

Global schema:
movie(title, year, director)
european(director)
review(title, critique)

In LAV the sources are characterized as views over the global schema:

$r_1(T, Y, D) \leadsto \{(T, Y, D) \mid \mathit{movie}(T, Y, D) \land \mathit{european}(D) \land Y \geq 1960\}$
$r_2(T, R) \leadsto \{(T, R) \mid \mathit{movie}(T, Y, D) \land \mathit{review}(T, R) \land Y \geq 1990\}$
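The two mapping styles of Examples 3.1.1 and 3.1.2 can be contrasted with a small Python sketch. It is a simplified illustration under assumptions that are not in the text: relations are plain sets of tuples, the sample tuples (including the critique text) are invented, the function names are hypothetical, and the LAV views are read as sound views, i.e. every source tuple must appear in the view evaluated over the global database.

```python
from typing import Dict, Set, Tuple

Relation = Set[Tuple]
Database = Dict[str, Relation]

# Invented sample contents of the two sources (not data from the thesis).
r1: Relation = {("Persona", 1966, "Bergman"),
                ("Breaking the Waves", 1996, "von Trier")}
r2: Relation = {("Breaking the Waves", "sample critique text")}


def gav_global_database() -> Database:
    """GAV: every global relation is computed by a view over the sources."""
    return {
        "movie": {(t, y, d) for (t, y, d) in r1},
        "european": {(d,) for (_, _, d) in r1},
        "review": {(t, c) for (t, c) in r2},
    }


def lav_view_r1(b: Database) -> Relation:
    """LAV description of source 1: movie AND european AND year >= 1960."""
    return {(t, y, d) for (t, y, d) in b["movie"]
            if (d,) in b["european"] and y >= 1960}


def lav_view_r2(b: Database) -> Relation:
    """LAV description of source 2: movie joined with review, year >= 1990."""
    return {(t, c) for (t, y, _d) in b["movie"]
            for (t2, c) in b["review"] if t == t2 and y >= 1990}


def satisfies_lav_mapping(b: Database) -> bool:
    """A global database satisfies the mapping if each source is inside its view."""
    return r1 <= lav_view_r1(b) and r2 <= lav_view_r2(b)
```

Under GAV the global database is obtained directly by running the views, whereas under LAV the mapping only constrains which global databases are acceptable; this is the reason why query answering needs an inference step in the LAV case.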

Chapter 4

Query reformulation algorithms (the querying problem)

4.1 Query reformulation in LAV and GAV (the querying problem)

Strongly related to the mediated schema are the algorithms for answering the user query in a data integration system. An initial query, targeting the logical mediated schema, must be translated into queries over the different data sources. In the case of the global-as-view (GAV) approach, this problem reduces to view unfolding (unnesting). In the local-as-view (LAV) approach, it translates to the more complex problem of answering queries using views [51], with some good solutions like the MiniCon [119] or the bucket [56] algorithms.

The following two examples illustrate the querying problem for GAV and LAV:

Example 4.1.1. Querying the movies example under the GAV approach:

The query "Title and critique of movies in 1998" could be formalised (with respect to the global schema) as:

$\{(T, R) \mid \mathit{movie}(T, 1998, D) \land \mathit{review}(T, R)\}$

Because in GAV we have views for each schema entity, the query is processed by means of view unfolding, i.e., by expanding the atoms according to their definitions:

$\mathit{movie}(T, 1998, D) \rightarrow r_1(T, 1998, D)$
$\mathit{review}(T, R) \rightarrow r_2(T, R)$

Example 4.1.2. Querying the movies example under the LAV approach:

Take the same query as in the previous example ("Title and critique of movies in 1998") and its formalisation:

$\{(T, R) \mid \mathit{movie}(T, 1998, D) \land \mathit{review}(T, R)\}$

Because in LAV both the query and the mappings target the global schema, it is not trivial to determine how to map the query to the local sources. This process is performed by means of an inference mechanism that re-expresses atoms of the global schema in terms of atoms of the sources:

$\{(T, R) \mid r_2(T, R) \land r_1(T, 1998, D)\}$

While query reformulation looks easier in GAV, it is very complex (it needs reasoning) in LAV. However, LAV appears to be a better solution when autonomous and heterogeneous sources are present, as in the case of the Web. In such a context we cannot rewrite the global schema and its mappings again and again, so we need a stable global schema and individual mappings that can be changed independently.

4.2 Answering queries using views

The first goal of the data integration system is to reformulate a user query $Q$ so that it refers to the data sources. In the LAV approach the data covered by each source can be abstracted by a view $V_i$ over the global schema. The first task of the system will be determining which views should be queried to achieve the best possible answer.

The 1995 research paper "Answering Queries Using Views" [55] by A. Halevy et al. is probably the oldest and best-known work facing the problem of determining the combination of data sources (modelled as views) that must be used to answer a given query in a LAV approach. This work considers the problem of rewriting a conjunctive query using a set of conjunctive views in the presence of a large number of candidate views.

Like most similar works among these initial approaches, it uses Datalog (see footnote 1) to formalise the problem. A conjunctive query $Q$ has the form:

$q(X) \;\text{:-}\; e_1(X_1), \ldots, e_n(X_n)$

where $q$ and $e_1, \ldots, e_n$ are predicate names. The atom $q(X)$ represents the answer relation and is called the head of the query. The atoms $e_1(X_1), \ldots, e_n(X_n)$ are the subgoals of the query, where $e_1, \ldots, e_n$ are database relations from the global schema.

Footnote 1: See the Background Information chapter for a brief introduction to Datalog.

Query containment

The query rewriting problem is closely related to the concept of query containment. We say that a query $Q_1$ is contained in the query $Q_2$, denoted by $Q_1 \sqsubseteq Q_2$, if the answer to $Q_1$ is a subset of the answer to $Q_2$ for any database. To determine whether a conjunctive query $Q_1$ is contained in another conjunctive query $Q_2$ we must find, for each subgoal of $Q_2$, a subgoal in $Q_1$ contained in it. Figures 4.1 and 4.2 show graphically the concept of containment applied to conjunctive queries.

Figure 4.1: A conjunctive query Q2 with two subgoals
Figure 4.2: A conjunctive query Q1 with three subgoals that is contained in Q2

Example 4.2.1. A query $Q_2$ that asks for people with blue eyes and blond hair contains a query $Q_1$ that asks for women with blue eyes and blond hair, because for each subgoal of $Q_2$ (blue eyes, blond hair) $Q_1$ has a subgoal contained in it (plus some other goals, like being a woman). In contrast, a query $Q_3$ that asks for people with blue eyes is not contained in $Q_2$, because it does not provide a subgoal contained in the subgoal "blond hair".

When all subgoals of a query contain subgoals of another query, we call the set of containments a containment mapping. So we can say that a query $Q_2$ contains $Q_1$ if and only if there is a containment mapping from $Q_2$ to $Q_1$.
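The search for a containment mapping can be sketched as a small backtracking procedure. The sketch below assumes conjunctive queries without comparison predicates, represents a query as a pair (head arguments, list of atoms), and follows the Datalog convention that names starting with an uppercase letter are variables; the function names and the predicate names used in the usage note are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (predicate name, arguments)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head arguments, body subgoals)


def is_var(term: str) -> bool:
    """Datalog convention: names starting with an uppercase letter are variables."""
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend `mapping` so that the argument list src is sent onto dst, if possible."""
    if len(src) != len(dst):
        return None
    result = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if result.setdefault(s, d) != d:  # variable already bound elsewhere
                return None
        elif s != d:                          # constants must be preserved
            return None
    return result


def containment_mapping(q2: Query, q1: Query) -> Optional[Dict[str, str]]:
    """Find a mapping from q2 to q1 sending head to head and subgoals to subgoals."""
    head2, body2 = q2
    head1, body1 = q1
    start = extend({}, head2, head1)
    if start is None:
        return None

    def search(i: int, mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(body2):
            return mapping
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                candidate = extend(mapping, args, args1)
                if candidate is not None:
                    found = search(i + 1, candidate)
                    if found is not None:
                        return found
        return None

    return search(0, start)


def contained_in(q1: Query, q2: Query) -> bool:
    """Q1 is contained in Q2 iff there is a containment mapping from Q2 to Q1."""
    return containment_mapping(q2, q1) is not None
```

For Example 4.2.1, with `q2 = (("P",), [("blue_eyes", ("P",)), ("blond_hair", ("P",))])` and `q1` having the extra subgoal `("woman", ("P",))`, the call `contained_in(q1, q2)` succeeds, while a query with only the `blue_eyes` subgoal is not contained in `q2`.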

Rewriting a query using views

Given a conjunctive query $Q$ and a set of views $V = V_1, \ldots, V_n$, we want to rewrite $Q$ using just some of the views and comparison predicates.

Example 4.2.2. Consider the following global schema. The relation movie(m, n) stores the identifiers (m) of all movies and their names (n). The relation actor(a, n, y) stores identifiers (a) of actors, their names (n) and birth years (y). The relation starring(a, m) stores the relationship between movie and actor identifiers. The following query asks for the filmography of the actor 'Christopher Walken'.

Q(m) :- movie(m, n), actor(a, 'Christopher Walken', y), starring(a, m)

Database views are named queries that return a subset of the data in a database. They can also be modelled as conjunctive queries and formalised with Datalog rules. There are some differences between answering queries using real relational views and answering queries using virtual views representing data sources in a LAV-based data integration system. Two LAV views with the same definition are not assumed to contain the same tuples, because they represent autonomous data sources. So it makes sense to have the views:

V1(m) :- movie(m, n)
V2(m) :- movie(m, n)

V1 and V2 can represent two different movie databases containing different subsets of movies.

Example 4.2.3. Continuing with our example, consider the following views:

V1(m) :- movie(m, n)
V2(m) :- movie(m, n)
V3(a, m) :- starring(a, m)
V4(a) :- actor(a, n, y), y < 1950

Now we can consider the problem of rewriting a query over a database using only views or comparison predicates (without directly using relation predicates).

Definition 4.2.1. (contained rewriting, or simply rewriting) Let $Q$ be a query, and $V = V_1, \ldots, V_n$ be a set of views. The query $Q'$ is a rewriting of $Q$ using $V$ if $Q' \sqsubseteq Q$.

Example 4.2.4. One possible rewriting of the previous example query using the views is:

Q'(m) :- V1(m), V3(a, m), V4(a)

Another possible rewriting can be:

Q'(m) :- V2(m), V3(a, m), V4(a)

Example 4.2.5. Unfolding the views of a rewriting lets us check whether it is contained in the initial query, and hence whether it is a valid rewriting. Take the first one, for example:

Q'(m) :- movie(m, n), starring(a, m), actor(a, n', y), y < 1950

Every subgoal of Q must be matched by a subgoal of this expansion. Note that V4 as defined above restricts actors only by birth year, so for the expansion to be contained in Q the view over actors must also fix the actor name 'Christopher Walken' (a parametrized view, as described in Section 4.3, can express this).

Contained rewritings guarantee that results will not be outside the scope of the initial query, but they do not guarantee that they are the same results that could be obtained by applying the query directly over a hypothetical local database containing all the data from the sources. This ideal situation is called equivalent rewriting and is pursued when answering queries using views is applied to query optimization and physical data independence in a local database.

Definition 4.2.2. (equivalent rewriting) Let $Q$ be a query, and $V = V_1, \ldots, V_n$ be a set of views. The query $Q'$ is an equivalent rewriting of $Q$ using $V$ if $Q' \sqsubseteq Q$ and $Q \sqsubseteq Q'$.

In the context of data integration we seek to obtain the biggest set of results possible using the given views. The best rewriting in this sense is called the maximally-contained rewriting.

Definition 4.2.3. (maximally-contained rewriting) $Q'$ is a maximally-contained rewriting of $Q$ using views $V = V_1, \ldots, V_n$ if (1) $Q' \sqsubseteq Q$ and (2) there is no other rewriting $Q''$ of $Q$ using $V$, not equivalent to $Q'$, such that $Q' \sqsubseteq Q'' \sqsubseteq Q$.

The maximally-contained rewriting of a conjunctive query can be obtained with the union of all possible contained rewritings.

Example 4.2.6. To obtain the maximally-contained rewriting of our example we simply perform the union of the two obtained rewritings. This is usually represented just by showing the list of rewritings:

Q'(m) :- V1(m), V3(a, m), V4(a)
Q'(m) :- V2(m), V3(a, m), V4(a)

4.3 Parametrized views

The initial approaches to answering queries using views provide a formal basis for the data integration problem. However, data sources in the real world are difficult to represent with a single view or with a finite set of views, because they usually present parametrized query interfaces. These initial works on answering queries using views assume a finite set of views $V$, but a parametrized query interface can only be represented by a potentially infinite number of views.

Example 4.3.1. Continuing with our example about movies, it is not realistic to assume that a data source can be represented with a finite view like:

V1(m) :- movie(m, n)

Probably the source would offer a query interface with some parameters, e.g. the movie name. In this case instead of one view we would have one view for each possible

name:

V1(m) :- movie(m, 'Rio Bravo')
V2(m) :- movie(m, 'Assault on Precinct 13')
...

To overcome this limitation some works have analysed the problem of answering queries using sources modelled by an infinite set of views. In [115] the authors already considered this possibility, showing that it is important to be able to exploit the local processing power of sources to reduce the amount of data transmitted over the network. This work introduces the concept of a parametrized view as a conjunctive query that contains placeholders in argument positions. A parametrized view V represents the set of all view definitions obtained by assigning a constant to each placeholder. Placeholders can be denoted by argument names beginning with an asterisk (*).

Example 4.3.2. We can rewrite the previous example with this parametrized view:

V1(m) :- movie(m, *n)

In [57] A. Halevy, A. Rajaraman and J. D. Ullman extend this idea, showing that any infinite set of views can be partitioned into a finite set of equivalence classes, in such a way that all views in an equivalence class are also equivalent with respect to rewritings of a query Q. The equivalence classes make it possible to keep applying the traditional algorithms for answering queries using views, like [56].

4.4 Query processing

The resolution of a query in a data integration system can be divided into two stages, query reformulation and query processing. Query reformulation corresponds to the research around answering queries using views, and focuses on the selection of the sources that can provide the best valid response to a given query. However, knowing which sources to query is not enough. The rewritten query obtained in the first stage is a declarative query which refers to the sources modelled as views. In a local system such a high-level query would be translated to a syntactic tree and then optimized for execution. By contrast, in a data integration system some of the algebraic operations can be performed locally at the sources, while others must be performed in the mediator. The query processing stage aims to generate the best execution plan for a given query and to execute that plan with the help of the mediator and the wrappers of the sources.

Because the target systems are distributed, autonomous and heterogeneous, achieving good performance can be a difficult task. In answer to this challenge, several works have considered adaptive query processing [71][72][7], where the system starts with some execution plan and adapts it as the execution proceeds.
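The division of work between wrappers and mediator can be illustrated with the two rewritings of Example 4.2.6. In the sketch below each wrapper is a hypothetical Python function returning the tuples of one view, the tuples themselves are invented, and the mediator evaluates each rewriting with in-memory joins before taking the union; a real mediator would choose and adapt an execution plan rather than hard-code one. The last function imitates the parametrized view of Example 4.3.2, where the selection on the movie name is evaluated at the source.

```python
from typing import Callable, Dict, Set, Tuple

Rows = Set[Tuple[str, ...]]

# Hypothetical wrappers: each one returns the tuples exposed by one source view.
def v1() -> Rows:
    return {("m1",), ("m2",)}            # V1(m): first movie database

def v2() -> Rows:
    return {("m2",), ("m3",)}            # V2(m): second movie database

def v3() -> Rows:
    return {("a1", "m1"), ("a1", "m3"), ("a2", "m2")}  # V3(a, m)

def v4() -> Rows:
    return {("a1",)}                     # V4(a)


def run_rewriting(movie_view: Callable[[], Rows],
                  starring_view: Callable[[], Rows],
                  actor_view: Callable[[], Rows]) -> Set[str]:
    """Mediator-side evaluation of Q'(m) :- Vi(m), V3(a, m), V4(a)."""
    movies = {m for (m,) in movie_view()}
    actors = {a for (a,) in actor_view()}
    # The joins are performed in the mediator over what the wrappers returned.
    return {m for (a, m) in starring_view() if m in movies and a in actors}


def answer() -> Set[str]:
    """Union of the two rewritings, as in Example 4.2.6."""
    return run_rewriting(v1, v3, v4) | run_rewriting(v2, v3, v4)


# Invented catalogue held by the first source.
CATALOGUE: Dict[str, str] = {"m1": "Rio Bravo", "m2": "Assault on Precinct 13"}

def v1_by_name(name: str) -> Rows:
    """Parametrized view V1(m) :- movie(m, *n): the name filter runs at the source."""
    return {(m,) for m, n in CATALOGUE.items() if n == name}
```

Pushing the selection into `v1_by_name` means that only matching identifiers cross the network, which is the benefit of exploiting the local processing power of sources mentioned in Section 4.3.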

Chapter 5

Data Integration and XML

Until the mid-1990s the classic data integration literature focused on the relational model for both queries and mappings. However, in the late 1990s researchers turned their interest to a new and emerging data model, XML [68]. The new model arose as a de facto standard to expose and interchange data, so it was the ideal choice for systems pursuing data interoperability. Now, XML and its query languages are the selected interfaces for Web Services, XML-native databases and lots of other applications.

5.1 Mapping the classic data integration problems to XML

Integrating data from various XML sources raises the same problems described in the classic data integration literature, but new solutions need to be found to tackle the particularities of the new scenario.

The first of these classical problems is schema mapping. The schema in whose terms the query is expressed (there is no need to call it the global schema if we are, e.g., in a peer-to-peer context) must be mapped in some way to the schema or schemas of the sources where the query will actually be executed. The simplest approach to such mapping is an attribute correspondence, where some property or attribute in one representation corresponds to some attribute in the other representation. We find increased complexity when mapping concepts that are semantically the same but whose XML representations may be structured differently.

Example 5.1.1. This example, borrowed from [52], illustrates some of the problems of mapping XML schemas.

Source1.xml DTD:

pubs
  book*
    title
    author*
      name
    publisher*
      name

Source2.xml DTD:

    authors
      author*
        full-name
        publication*
          title
          pub-type

The example shows how a simple schema describing books and authors can take different shapes. The difficulty of obtaining a mapping between them will depend on the goal of that mapping. It may serve for simple migration tasks (translation of data from one schema to another), and then a simple translation template will be enough. However, it may be needed for querying purposes, and then a more complex strategy is needed, related to the old query rewriting problem described in previous sections.

5.2 XML query languages and data integration

XML query languages have been broadly used for the development of simple data integration applications. Mapping between schemas or defining wrappers with XSLT or XQuery can be a direct solution for some real-world problems. These solutions are generally based on the manual coding of templates and updates, so they represent the modern version of the more primitive data integration approaches.

XSL Transformations (XSLT)

XSL Transformations (XSLT) is a language standardized by the W3C for transforming XML documents into other XML documents. XSLT is a component of the W3C's Extensible Stylesheet Language (XSL), and initially its main purpose was to be used in conjunction with a formatting language like XSL-FO, targeting presentation layer independence. However, XSLT can be used independently, and it has been used in many application areas, especially by the data integration community.

A transformation expressed in XSLT, called a stylesheet, describes rules for transforming a source XML document into a result XML document. An XSLT stylesheet associates patterns with templates. When a pattern is matched against an element in the source XML tree, the corresponding template is instantiated to generate XML code for the result document. This generation can include data from the source tree, but can also include new data.

The current version, XSLT 2.0 (W3C Candidate Recommendation 3 November 2005), is a revised version of the XSLT 1.0 Recommendation published on 16 November 1999. It is designed to be used in conjunction with XPath 2.0, which is defined in [69]. XSLT shares the same data model as XPath 2.0, which is defined in [152].

The capabilities of XSLT for transforming XML documents make it a natural choice for data integration applications. In scenarios where heterogeneous XML schemas

need to be mapped, XSLT stylesheets can be manually coded or semi-automatically generated to allow the conversion between the different schemas. XSLT has also been used for intra-model conversions, like RDF-to-XML. Another usage of XSLT has been in the definition of Web wrappers. HTML code can be easily modified to become XHTML with tools like HTML Tidy [143], and then filtered with XSLT stylesheets. Lots of commercial products make use of the data integration capabilities of XSLT, like the Altova MapForce tool [99].

Example 5.2.1. This example shows how an input XML document can be transformed using an XSLT template. Take the following XML document describing two movies.

input.xml:

    <?xml version="1.0"?>
    <movies>
      <movie id="26">
        <title>Blade Runner</title>
        <year>1982</year>
      </movie>
      <movie id="27">
        <title>Rio Bravo</title>
        <year>1959</year>
      </movie>
    </movies>

The following XSLT template is applied recursively to all the nodes of the underlying tree of the input document. The template translates the movie elements into record elements. It also translates the id attributes into equivalent elements.

template.xslt:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="xml" indent="yes"/>
      <xsl:template match="/">
        <transform>
          <xsl:apply-templates/>
        </transform>
      </xsl:template>
      <xsl:template match="movie">
        <record>
          <id>
            <xsl:value-of select="@id" />
          </id>
          <title>
            <xsl:value-of select="title" />
          </title>
        </record>
      </xsl:template>
    </xsl:stylesheet>

This is the resulting XML document:

output.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <transform>
      <record>
        <id>26</id>
        <title>Blade Runner</title>
      </record>
      <record>
        <id>27</id>
        <title>Rio Bravo</title>
      </record>
    </transform>

XML Query (XQuery)

The W3C's XML Query language (XQuery) [154] allows querying the logical structure of an XML document (defined in [152]) in a SQL-like fashion. It is derived from a previous XML query language called Quilt, which in turn borrowed features from several other languages, including XPath 1.0, XQL, SQL, and OQL.

XQuery Version 1.0 is an extension of XPath Version 2.0. It enriches XPath functionality with FLWR expressions (for-let-where-sortby-return), element constructors, variables, functions and updating capabilities.

There has been some discussion about the overlap between XQuery and XSLT. This discussion also applies to data integration scenarios. In principle XQuery and XSLT are interchangeable in most situations, e.g. when mapping data from heterogeneous schemas or for the definition of wrappers. The final choice usually depends on the quality of the tools and on developer preferences. In general XSLT continues to be the primary choice for transforming XML data, while XQuery is becoming the standard for querying and updating XML-based databases.

Example 5.2.2. This example shows how an input XML document can be transformed using an XQuery expression. The result of processing this query will be the same as in the previous example.

    <transform>
    {
      for $m in doc("input.xml")//movie
      return
        <record>
          <id>{ string($m/@id) }</id>
          <title>{ $m/title/text() }</title>
        </record>
    }
    </transform>

5.3 XML Data Integration Systems

The success of XML and its potential use in interoperability problems has fuelled lots of XML-related data integration projects like XQuare, Liquid Data (Enosys), Nimble, XML Information Workbench, Callixa, Metamatrix, Xyleme, Tukwila or Raccoon. Here I describe the features of three representative cases.

Tukwila

The Tukwila system, developed at the University of Washington Database Research Group, is an example of the LAV approach applied to XML data. It uses the MiniCon [119] algorithm to reformulate the queries posed over the global schema into queries over the local XML sources. Tukwila is a native-XML integration system, because it operates directly over non-materialized XML (not converted to another internal representation). The system introduces the x-scan operator, which allows XML data to be processed as it is being received (streaming XML).

Today Tukwila is already an old academic prototype, but it was designed by active researchers in data integration like Zachary Ives and Alon Halevy, and it provides advanced features not present in other commercial systems.

Enosys

The industrial success of a research approach does not always entail that it was the better choice, but it at least demonstrates that it was viable and well motivated. We can find such a success in an XML-based data integration system, the Enosys XML Integration Platform [116]. This system, based on the wrapper-mediator architecture, allows querying heterogeneous data sources abstracted with XML schemas. Wrappers (or XMLizers in the project's terminology) use XML Schemas as logical views of the sources, and a mediator resolves XQuery expressions over the sources.

On June 18, 2003, Enosys Software was acquired by BEA Systems, Inc. Now the system is part of BEA's AquaLogic DSP, formerly known as Liquid Data, an XQuery-based Enterprise Information Integration (EII) solution that takes a data service layer-based approach to data integration.

XQuare Fusion

XQuare (XQuery Advanced Runtime Environment), previously known as XQuark, is a set of open source Java modules for extending J2EE platforms with XML-based, heterogeneous information integration capabilities. Instead of being an API, XQuare is designed

to be embedded into Java-based web or application servers, and to rely on the standard J2EE services for exchanging, processing and publishing XML information. The goal of XQuare is to present to applications a single, uniform XML view of the different data sources, which can then be queried with XQuery to produce XML documents. Accessible data sources include relational databases, XML documents, Web Services, any XQuery-enabled data source and JCA connectors. The last known release was on September 10, 2005.
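For readers who prefer a general-purpose language, the movie-to-record transformation of Examples 5.2.1 and 5.2.2 can also be approximated with Python's standard `xml.etree.ElementTree` module. This is only a sketch of the same mapping (movie elements become record elements, the id attribute becomes an id element); the function name `movies_to_records` is an assumption introduced here and is not part of any of the systems described in this chapter.

```python
import xml.etree.ElementTree as ET

INPUT_XML = """<?xml version="1.0"?>
<movies>
  <movie id="26"><title>Blade Runner</title><year>1982</year></movie>
  <movie id="27"><title>Rio Bravo</title><year>1959</year></movie>
</movies>"""

def movies_to_records(xml_text: str) -> ET.Element:
    """Map every <movie> element to a <record> with <id> and <title> children."""
    source = ET.fromstring(xml_text)
    result = ET.Element("transform")
    for movie in source.iter("movie"):
        record = ET.SubElement(result, "record")
        # The id attribute of the source becomes an element in the result
        ET.SubElement(record, "id").text = movie.get("id")
        ET.SubElement(record, "title").text = movie.findtext("title")
    return result

output = ET.tostring(movies_to_records(INPUT_XML), encoding="unicode")
print(output)
```

As with the hand-written stylesheet, the mapping is hard-coded, which is exactly the maintainability limitation that motivates the semantic approaches of the next chapter.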

Chapter 6

Semantic Integration

6.1 Ontologies and Data Integration

For almost two decades the Data Integration discipline has studied mechanisms to allow several autonomous data sources to interoperate. The main limitation of traditional data integration systems has been the impossibility of automatically establishing semantic mappings between the data schemas of the different sources. Schemas obtained from traditional data models (e.g. relational and XML) are built with a reduced set of simple and meaningless constructs and lots of human-readable labels. These kinds of data structures are designed to be interpreted only by humans, and automatically establishing semantic mappings among them becomes a very difficult and imprecise task.

The recent success of not so recent semantic-rich modelling languages, under the global name of the Semantic Web initiative, has raised a new opportunity and challenge for the data integration community. Ontologies, instead of schemas, are the new way to represent information domains. They are built with a rich set of meaningful constructs provided by Semantic Web modelling languages like RDFS [126] and OWL [112].

Because ontologies are built with standard semantic operators, a software agent could try to perform some automatic interpretation of the represented meaning. This could allow, for example, the semantic mappings between two different ontologies to be entailed automatically. However, the set of standard semantic operators is still very small, and today ontology modelling involves a lot of ambiguous natural-language labels. Obviously, the larger the set of standard semantics of an ontology, the easier its automatic processing by software agents. This is the reason for the proliferation of initiatives to standardize an Upper Ontology, like the IEEE SUMO (Suggested Upper Merged Ontology) [105], DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) [44] or Cyc/OpenCyc [92].
While the fight for a standard semantic basis remains at the top level, at the domain level ontology standardization is not the goal. For many knowledge domains (anatomy, web directories, digital rights management, music, etc.), several overlapping ontologies have been engineered. Each is a different abstraction and representation of the same or similar concepts. To enable collaboration within and across information domains, software agents require the semantic alignment (mapping) of the different formalisms. It is the same old problem of Schema Mapping from Data Integration, but now with new promising expectations and

under the name of Ontology Alignment. This topic has attracted a lot of interest recently, even being the object of an international contest, the Ontology Alignment Evaluation Initiative 2005 [108].

Semantic integration challenges

[106] enumerates the three main dimensions of semantic-integration research:

- Mapping discovery (ontology alignment). How do we define the similarity between two ontology entities? And, given two ontologies, how do we find the similarities between them?
- Declarative formal representations of mappings. Once we have found the mappings between two ontologies, how do we represent this new knowledge to enable reasoning with mappings?
- Reasoning with mappings. What do we do with the obtained mappings, how do we use them to answer queries, and how do we face their uncertainty?

6.2 Ontology Alignment

Ontology alignment (or matching) is the operation that takes two ontologies and produces a mapping between elements of the two graphs that correspond semantically to each other. Several ontology alignment algorithms have been proposed, like PROMPT [107], GLUE [32], Ontrapro [6], OLA [37] or FOAM [36].

Definition 6.2.1. (from [35]) Given two ontologies $O$ and $O'$, an alignment between $O$ and $O'$ is a set of correspondences (i.e., 4-tuples) $\langle e, e', r, n \rangle$, with $e \in O$ and $e' \in O'$ being the two matched entities, $r$ being a relationship holding between $e$ and $e'$, and $n$ expressing the level of confidence [0..1] in this correspondence.

It is typically assumed that the two ontologies are described within the same knowledge representation language (e.g. OWL [112]). Here I will focus on automatic and autonomous alignment, but other semi-automatic and interactive approaches exist.

Example 6.2.1. Let's see a simple example. Figures 6.1 and 6.2 present two ontologies, OA and OB respectively. A possible alignment A1 (to simplify, the relation is always "=" and the confidence is always 1.0) is defined as follows:

    <a:human,b:people,=,1.0>
    <a:director,b:manager,=,1.0>
    <a:staff,b:employee,=,1.0>
    <a:directs,b:supervise,=,1.0>

Figure 6.1: The RDF graph of OA

Figure 6.2: The RDF graph of OB

Another reasonable alignment A2:

    <a:human,b:people,=,1.0>
    <a:director,b:manager,=,1.0>
    <a:staff,b:sales Employee,=,1.0>
    <a:directs,b:supervise,=,1.0>

And an obviously wrong alignment A3:

    <a:human,b:manager,=,1.0>
    <a:director,b:employee,=,1.0>
    <a:directs,b:sales Employee,=,1.0>

Alignment Methods

The ontology alignment problem has an important background in discrete mathematics for matching graphs [58][114], in databases for mapping schemas [122] and in machine learning for clustering structured objects [19]. It is closely related to the concept of similarity, an inverse measure of the distance between entities.

Most ontology alignment algorithms rely on some semantic similarity measure, used to deduce that two different data items correspond to the same information. Semantic similarity between ontology entities (within the same ontology or between two different ones) may be defined in many different ways. For example, it may be defined in terms of topological patterns. Given a pair of entities, $c$ and $c'$, a traditional method for measuring their similarity consists of calculating the distance between them in the graph. The shorter this distance, the higher the similarity. This is commonly known as the edge counting method.

Topological similarity methods have evolved from this simple idea, but there exist other similarity methods based, e.g., on Information Theory. The recently held Ontology Alignment Evaluation Initiative 2005 [108] has shown that the best alignment algorithms combine different similarity measures.

Similarity measures

There are many different ways to define the similarity between ontologies. [37] provides a classification (updating [122]):

- terminological (T) comparing the labels of the entities; string based (TS) does the terminological matching through string structure dissimilarity (e.g., edit distance); terminological with lexicons (TL) does the terminological matching modulo the relationships found in a lexicon (i.e., considering synonyms as equivalent and hyponyms as subsumed);
- internal structure comparison (I) comparing the internal structure of entities (e.g., the value range or cardinality of their attributes);

- external structure comparison (S) comparing the relations of the entities with other entities; taxonomical structure (ST) comparing the position of the entities within a taxonomy; external structure comparison with cycles (SC) an external structure comparison robust to cycles;
- extensional comparison (E) comparing the known extension of entities, i.e. the set of other entities that are attached to them (in general instances of classes);
- semantic comparison (M) comparing the interpretations (or more exactly the models) of the entities.

This taxonomy is inherited from the study of similarity in relational schemas and, in my opinion, can be simplified to three categories when applied to ontologies: Lexical, Topological and Extensional.

Lexical approaches

Lexical (or terminological) similarity is based on applying information retrieval techniques to match the labels of entities. Labels are written in natural language and constitute one of the main sources of ambiguity in an ontology. However, all the best algorithms of the Ontology Alignment Evaluation Initiative 2005 make use of lexical similarity measures in their first stages.

Topological (structural) approaches

The initial works around ontologies focused only on is-a constructs (taxonomies). The first ways to evaluate semantic similarity in a taxonomy were based only on the topology of the concept tree. Works like [121] and [90] measure the distance between the different nodes. The shorter the path from one node to another, the more similar they are. Given multiple paths, one takes the length of the shortest one. [148] finds the path length to the root node from the least common subsumer (LCS) of the two concepts, which is the most specific concept they share as an ancestor. This value is scaled by the sum of the path lengths from the individual concepts to the root. [89] finds the shortest path between two concepts, and scales that value by the maximum path length in the is-a hierarchy in which they occur.

However, the problem of this approach is that it relies on the notion that nodes in the taxonomy represent uniform distances. Actually, there can be a large variability in the distance covered by a single taxonomic node, especially when certain sub-taxonomies are much denser than others (e.g., biological categories).
Recently, new works like [20] define more sophisticated topological similarity measures, based on graph matching from discrete mathematics. These new graph-based measures suit the particularities of the new ontologies, built with more expressive languages like OWL [112]. Their use by some of the best alignment algorithms of the Ontology Alignment Evaluation Initiative 2005 (e.g. [64]) raises some expectations about this way of measuring the similarity of two concepts.
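The two classic taxonomy measures discussed in this section, edge counting and the LCS-based measure of [148], can be sketched in a few lines of Python. The toy is-a taxonomy, the function names and the factor 2 in `lcs_depth_similarity` (chosen so that a concept compared with itself scores 1) are assumptions made for illustration; the cited works should be consulted for the exact formulations.

```python
# Hypothetical toy taxonomy: each concept maps to its is-a parent (None for the root).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}

def ancestors(concept):
    """Path from a concept up to the root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Number of is-a edges between a concept and the root."""
    return len(ancestors(concept)) - 1

def least_common_subsumer(c1, c2):
    """Most specific concept that both c1 and c2 have as an ancestor."""
    seen = set(ancestors(c1))
    for candidate in ancestors(c2):
        if candidate in seen:
            return candidate
    raise ValueError("concepts do not share a root")

def edge_count(c1, c2):
    """Edge counting: length of the path between c1 and c2 through their LCS."""
    lcs = least_common_subsumer(c1, c2)
    return (depth(c1) - depth(lcs)) + (depth(c2) - depth(lcs))

def lcs_depth_similarity(c1, c2):
    """Depth of the LCS scaled by the sum of the depths of both concepts."""
    total = depth(c1) + depth(c2)
    if total == 0:
        return 1.0  # both concepts are the root
    return 2 * depth(least_common_subsumer(c1, c2)) / total

print(edge_count("dog", "cat"), lcs_depth_similarity("dog", "cat"))
print(edge_count("dog", "bird"), lcs_depth_similarity("dog", "bird"))
```

In this toy tree "dog" and "cat" are two edges apart and share "mammal", while "dog" and "bird" only share the root. Both measures treat every edge as the same distance, which is the uniform-distance assumption criticized above.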

Extensional approaches

In some cases we know only the labelled structure of an ontology. However, in other situations we also have some information about the instances corresponding to the classes and properties defined in the ontology (its extension). If these instances are representative enough, they can offer relevant information for the similarity measurement. Extensional or corpus-based measures are related to statistics, machine learning and Information Theory.

One of the oldest extensional measures is res, defined by Resnik in 1995 [129]. It was followed by two measures, lin [94] and jcn [79], that augment the information content of the LCS of two concepts with the sum of the information content of the individual concepts. [94] scales the information content of the LCS by this sum, while [79] subtracts the information content of the LCS from this sum.

More recent information-theoretic approaches are [30], [32] (GLUE) and [67]. They are essentially based on the concept of joint probability distribution defined in [38]. This distribution consists of the four probabilities $P(A,B)$, $P(A,\bar{B})$, $P(\bar{A},B)$ and $P(\bar{A},\bar{B})$. A term such as $P(A,\bar{B})$ is the probability that a randomly chosen instance from the universe belongs to A but not to B, and is computed as the fraction of the universe that belongs to A but not to B.

Practical uses of extensional information-theoretic similarity measures exist, like their application to functional genomics [8].

6.3 GMO. A structure-based semantic similarity algorithm

One of the best behaving algorithms of the Ontology Alignment Evaluation Initiative 2005 was Falcon-AO. Among other tools it makes use of a lexical similarity resolver and an interesting structural similarity strategy called GMO (Graph Matching for Ontologies [64]).
GMO is interesting because it is a purely automatic algorithm for finding structural similarities between OWL-DL ontologies, and because it obtained excellent results in tests where lexical labels were obfuscated (to evaluate the behaviour of structural similarity strategies). Among other particularities, GMO simplifies the alignment of ontologies defined with rich modelling languages like OWL-DL because, instead of managing each relationship (is-a, part-of, ...) specifically, it makes use of the underlying directed bipartite graph of the participating ontologies.

Graph similarity calculation algorithm

GMO is based on the structural similarity calculation described in [20], which relies on the following updating equation to compute the similarity matrix:

Definition 6.3.1.

$$X_{k+1} = B X_k A^T + B^T X_k A, \qquad k = 0, 1, \ldots$$

where $X_k$ is the $n_B \times n_A$ similarity matrix of entries $x_{ij}$ at iteration $k$, and $A$ and $B$ are the adjacency matrices of $G_A$ and $G_B$ respectively. [20] demonstrates that the normalized even and odd iterations of this equation converge.
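The updating equation of Definition 6.3.1 is easy to try out. The following pure-Python sketch iterates it with a Frobenius normalization after every step, starting from an all-ones matrix. The input graphs (two identical three-node paths a -> b -> c) are an assumption chosen for illustration, as are the function names; this is not the GMO implementation of [64].

```python
import math

def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def frobenius_norm(x):
    return math.sqrt(sum(v * v for row in x for v in row))

def similarity(a, b, iterations):
    """Iterate X_{k+1} = B X_k A^T + B^T X_k A, normalizing X at every step."""
    x = [[1.0] * len(a) for _ in b]  # X_0: n_B rows, n_A columns, all ones
    for _ in range(iterations):
        sim_ba = matmul(matmul(b, x), transpose(a))   # B X_k A^T
        sim_ab = matmul(transpose(b), matmul(x, a))   # B^T X_k A
        x = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(sim_ba, sim_ab)]
        norm = frobenius_norm(x)  # assumes both graphs have at least one edge
        x = [[v / norm for v in row] for row in x]
    return x

# Hypothetical input: adjacency matrix of the path a -> b -> c, used for both graphs
PATH = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]

x1 = similarity(PATH, PATH, 1)
x22 = similarity(PATH, PATH, 22)
for row in x22:
    print([round(v, 3) for v in row])
```

For these inputs the first iterate has 0.632 in the central entry and 0.316 in its neighbours, and after 22 iterations the matrix is close to 0.577 on the diagonal and 0 elsewhere, i.e. each node is most similar to its counterpart. An even number of iterations is used because, as stated above, even and odd iterates converge separately.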

Let's decompose this basic equation for a better understanding of its behaviour. Once we have a similarity matrix between $G_B$ and $G_A$, we can obtain the relationships between entities of $G_B$ and entities of $G_A$ by using the following formula:

$$rel_{ba} = B X_k$$

This new matrix describes the elements of $G_B$ in terms of their relationship with the elements of $G_A$. We can compare this matrix with $A$, which describes the elements of $G_A$ in terms of their relationship with themselves:

$$sim_{ba} = rel_{ba} A^T$$

The resulting matrix is already a similarity matrix between $G_B$ and $G_A$, but it describes only the relationship between $G_B$ and $G_A$ w.r.t. how elements of $G_B$ relate to elements of $G_A$. We must add the equivalent formulas for the opposite direction, and we obtain the final equation:

$$X_{k+1} = B X_k A^T + B^T X_k A, \qquad k = 0, 1, \ldots$$

That can be seen as:

$$X_{k+1} = sim_{ba} + sim_{ab}$$

where:

$$rel_{ba} = B X_k, \qquad sim_{ba} = rel_{ba} A^T, \qquad rel_{ab} = X_k A, \qquad sim_{ab} = B^T rel_{ab}$$

Example 6.3.1. Let's see a simple example. Take the following trivial graphs $G_A$ and $G_B$. Note that initially all entries of the similarity matrix $X_0$ are set to 1. If we start the process already knowing the similarity values of some pairs of entities, we can modify this matrix accordingly, and keep the known values between iterations. Let's calculate the similarity between $G_A$ and $G_B$:

[Matrices $A$, $B$ and $X_0$: entries not recoverable from the source.]

Figure 6.3: GA (left) and GB (right)

$$X_1 = B X_0 A^T + B^T X_0 A$$

$$X_1 = X_1 / \mathrm{frobeniusNorm}(X_1)$$

[Matrix values garbled in the source; the surviving entries of the normalized $X_1$ include 0.316 and 0.632.]

Iterating the algorithm 22 times, it converges to the following result:

[Matrix $X_{22}$ garbled in the source; the surviving entries include 0.577.]

So, as expected, the entities a', b' and c' (rows) are similar to a, b and c (columns) respectively.

GMO adaptation of the graph similarity algorithm to OWL-DL

GMO takes the graph structural similarity calculation of [20] and adapts it to OWL-DL ontologies.

Definition 6.3.2. (from [64]) Let $G'_A$ be the RDF directed labelled graph of $O_A$. The directed bipartite graph of ontology $O_A$, denoted by $G_A$, is a derivation of $G'_A$ by replacing the "s" (subject) edges with edges pointing to statement nodes, and the "p" (predicate) and "o" (object) edges with edges pointing from statement nodes. The adjacency matrix of $G_A$ is called the matrix representation of ontology $O_A$, denoted by $A$.

Because of the different nature of the ontology entities (classes, properties, statements, shared entities, etc.) it is convenient to give the input matrices the following block structure,

$$A = \begin{pmatrix} 0 & 0 & A_{ES} \\ 0 & 0 & A_{S} \\ A_{E} & A_{OP} & 0 \end{pmatrix}$$

Figure 6.4: Comparison between an RDF graph (left) and its corresponding directed bipartite graph (right).

$$B = \begin{pmatrix} 0 & 0 & B_{ES} \\ 0 & 0 & B_{S} \\ B_{E} & B_{OP} & 0 \end{pmatrix}$$

- $A_{ES}$ and $B_{ES}$ represent the connections from external entities to statements in A and B respectively.
- $A_{S}$ and $B_{S}$ represent the connections from internal entities to statements in A and B respectively.
- $A_{E}$ and $B_{E}$ represent the connections from statements to external entities in A and B respectively.
- $A_{OP}$ and $B_{OP}$ represent the connections from statements to internal entities in A and B respectively.

External entities are usually those constructs defined by RDFS or OWL, built-in datatypes and literals.

As said before, GMO uses the updating equation from [20]:

$$X_{k+1} = B X_k A^T + B^T X_k A, \qquad k = 0, 1, \ldots$$

The matrix $X_k$ also includes the similarity related to statements and external entities. It can be decomposed as follows:

$$X_k = \begin{pmatrix} E_{BA} & 0 & 0 \\ 0 & O_k & 0 \\ 0 & 0 & S_k \end{pmatrix}$$

- $E_{BA}$ represents the similarity among external entities.

- $O_k$ represents the similarity among internal entities.
- $S_k$ represents the similarity among statements.

Other similarities are kept at zero (e.g. between statements and internal entities).

Initially all entries of $S_0$ and $O_0$ are set to 1, and $E_{BA}$ is set in advance as an identity matrix because external entities are supposed to be the same. Take for example only three external entities, subClassOf, range and domain. Their crossed similarity matrix would be:

$$E_{BA} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

At each iteration $S_k$ and $O_k$ are recalculated and normalized using the sum of the Frobenius norms of the three matrices ($E_{BA}$ is kept unchanged). Finally $S_k$ and $O_k$ are normalized again, but with the 2-norm. Some improvements of the algorithm described in [64] include a further classification of entities (into classes, properties and instances) that improves scalability and performance.

Concept of similarity in GMO

The traditional definitions of structural similarity are usually based on the distance among nodes. This concept of similarity is inherited from the graph matching problem of discrete mathematics. However, in a knowledge representation scenario, the same or similar information can be represented in a wide range of different shapes, so simple graph-based similarity can yield totally arbitrary results. Intuitively, the similarity of two concepts can be defined in terms of how these two concepts relate to the world they share. Two red objects are similar w.r.t. the colour dimension, but their similarity cannot be determined in a general way.

Because GMO is based on the directed bipartite graph of the participating ontologies, some modelling constructs like subClassOf, range or domain appear as shared external entities in the input graphs. Initially, GMO measures the similarity of entities by comparing the way they relate to these shared entities.
Initially,GMOmeasuressimilarityofentitiescomparingthe algorithmsarealwaysiterative),entitiesappearingtobesimilarcanbealsotakenasareferencetondsimilaritiesbetweenotherentities1.thisisamorerigorousandlessarbitrary conceptofsimilaritythanthosebasedonnodedistance.itcompareshowentitiesrelateto commonconcepts,soitisclosertothehumaninterpretationprocess,inwhichthemeaningofsomethingisentailedfromhowitrelatestothingsforwhichthemeaningisalready Asthealgorithmiterates(structuralsimilarity known. 1[20]demonstratesthatthealgorithmconverges

An example

Example 6.3.2. Let's see a simple example. Take the following graphs $G'_A$ and $G'_B$.

Figure 6.5: $G'_A$

Figure 6.6: $G'_B$

Figure 6.7: $G_A$

Figure 6.8: $G_B$

Iterating the algorithm 22 times, it converges to the following result (rows: b:teacher, b:overseastudent, b:people, b:other, b:student; columns: a:graduate, a:scholastics, a:phdstudent, a:supervisor):

[Matrix $X_{12}$: entries garbled in the source.]

The similarity between b:teach and a:supervise is 0.446.

After normalization:

$$X_{12} = X_{12} / \mathrm{maxValue}(X_{12})$$

[Normalized matrix: entries garbled in the source.]

The similarity between b:teach and a:supervise is 1.

6.4 Upper Ontologies

Some recent and not so recent initiatives aim to develop a general-purpose ontology (a.k.a. upper ontology), formalizing concepts such as processes and events, time and space, physical objects, and so on. These upper ontologies aim to offer some basic and standard building blocks of meaning that domain-specific ontologies can extend.

As noted by [106], this scenario is different from the traditional data integration scenario, where a global schema (a common view on different local schemas) is usually generated once the underlying local schemas are already known. User queries are written in terms of the global schema, and the integration problem reduces to 1) mapping the local schemas to the global schema (using the Global As View (GAV) or the Local As View (LAV) approaches) and 2) answering the query using the defined mappings.

An upper ontology does not aim to be a view over all its derived domain ontologies, nor does it expect standard queries to be written in its terms. An upper ontology is usually more general, since it defines constructs for ontologies yet to be developed. However, it also serves the goals of data integration, since increasing the number of standard semantics also improves the confidence of ontology alignment algorithms.

Some upper or top-level ontologies are SUMO [105], DOLCE [44] and Cyc/OpenCyc [92]. Closely related to this idea, although it is not strictly an upper ontology, we also find WordNet [95].

IEEE SUMO

SUMO (Suggested Upper Merged Ontology) [105] is an effort by the IEEE Standard Upper Ontology Working Group aimed at developing a standard upper ontology that will

promote data interoperability, information search and retrieval, automated inferencing, and natural language processing. SUMO tries to standardize a hierarchy of some basic ground concepts like object, continuous object, process, quantity or relation.

DOLCE

DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering) [44] is a formal foundational ontology developed as an upper ontology in the WonderWeb project. DOLCE aims to provide a set of common semantics to achieve interoperability among ontologies related to WonderWeb. According to [44], it aims at capturing the ontological categories underlying natural language and human common sense.

WordNet

WordNet is an online lexical reference system, developed at Princeton University. English nouns, verbs, adjectives and adverbs are organized into synsets (synonym sets), each representing one underlying lexical concept shared by words that are semantically equivalent. Although WordNet does not define itself as an ontology, synsets are cross-linked through relationships such as synonymy and antonymy, hypernymy and hyponymy (Superclass-Of and Subclass-Of), and meronymy and holonymy (Part-Of and Has-a). So we can consider WordNet a special kind of upper ontology.

Cyc/OpenCyc

Doug Lenat's Cyc (from enCYClopedia) Project [92] was begun in 1984 as an attempt to build a universal expert system. The project resolved basic questions about representing time, substances, perception, etc., and the original emphasis on frames shifted towards first-order predicate calculus. The initial idea of a unified knowledge base was replaced with the idea of many partially independent micro-theories.

Cyc's main goal was constructing a foundation of basic common sense knowledge, a semantic substratum of terms, rules, and relations. It was intended to provide a layer of meaning that can be used by other programs (such as domain-specific expert systems). Nowadays its open source version, OpenCyc, is still progressing, and contains over 47,000 concept terms and over 300,000 facts.

Chapter 7

Current Challenges in Data integration

Data integration has evolved in parallel with computer networks and computing paradigms. The first data integration problems were related to the evolution of local area networks. The success of the Internet and the WWW fed a new generation of problems, and solutions for some of them. Now XML, the Semantic Web and the P2P paradigm raise new challenges for this discipline.

7.1 Semantic Mappings Generation: Schema matching and Ontology Alignment

Most approaches to data integration (GAV, LAV and GLAV) rely on the semantic mappings between a set of different data sources and a mediated schema. Traditionally these mappings have been written manually, and this is the main drawback of data integration systems. Manual mapping generation is a costly and error-prone task and, what is worse, it entails serious maintainability problems.

For the moment, complete automation of the generation of semantic mappings does not seem possible. This task entails a complete understanding of the semantics of the source schemas or ontologies, which surpasses all known AI techniques. However, the topology and lexical information of the schemas and ontologies, or even their related data, provide clues that can help the process of generation and maintenance, as we have seen in the chapter about semantic integration. Now the focus has turned to Ontology Alignment, because ontologies potentially offer better possibilities for semantic integration than schemas. However, schema integration research has achieved good results in recent years, like the semi-automatic approach in [122], or the work of A. Doan [31], whose Ph.D. thesis "Learning to match the schemas of databases: A Multistrategy Approach" won the 2003 ACM Doctoral Dissertation Award.

7.2 Answering queries using ontology alignments

Data integration aims at giving users and applications the illusion of interacting with one single information system. This interaction usually takes the form of a query that the user issues to the system, which must process it and return a satisfactory answer. In ontology-based information systems, user queries are translated to queries or other retrieval tasks over the defined Abox (see footnote 1).

One of the main goals of Ontology Alignment is allowing semantic information systems to answer queries independently of the ontology from which the query terms have been taken. The common way to proceed in most existing systems is to materialize the obtained mappings into new statements (e.g. owl:equivalentClass), and then let the inference engine do its task. However, not all the mappings have the same level of confidence, and deciding which to include and which not becomes a problem that is usually solved heuristically.

Figure 7.1: Workflow of the query answering process

Uncertain mappings

In traditional semantic information systems, if a statement does not satisfy the exact constraints of a query, then it is not included in the result. However, there are situations in which there is some degree of uncertainty about a statement. One of these situations is an ontology alignment process, which returns a set of mappings with their respective levels of confidence. Traditional description languages (e.g. OWL) and query languages (e.g. SPARQL) do not provide mechanisms to face this problem. However, some initiatives like Fuzzy OWL [141] or PR-OWL [27] are now working to include certainty in the Semantic Web.

Footnote 1: Abox and Tbox are used to describe two different types of statements in ontologies. Tbox statements describe a controlled vocabulary (e.g. a set of classes and properties) while Abox statements are statements about that vocabulary (instances).
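The heuristic materialization step described above can be sketched in Python. A correspondence is represented as the 4-tuple of an alignment (two entities, a relation and a confidence), and only equivalences at or above a chosen threshold are turned into owl:equivalentClass statements. The confidence values, the threshold of 0.5, the class names and the function name `materialize` are all assumptions made for illustration.

```python
from typing import NamedTuple

class Correspondence(NamedTuple):
    entity_a: str
    entity_b: str
    relation: str
    confidence: float

def materialize(alignment, threshold):
    """Keep equivalences with enough confidence and emit them as triples."""
    triples = []
    for c in alignment:
        if c.relation == "=" and c.confidence >= threshold:
            # Only class equivalences are handled in this sketch
            triples.append((c.entity_a, "owl:equivalentClass", c.entity_b))
    return triples

# Hypothetical alignment between two ontologies; confidences are invented
ALIGNMENT = [
    Correspondence("a:Human", "b:People", "=", 0.9),
    Correspondence("a:Director", "b:Manager", "=", 0.8),
    Correspondence("a:Staff", "b:Employee", "=", 0.4),
]

statements = materialize(ALIGNMENT, threshold=0.5)
for statement in statements:
    print(statement)
```

A hard threshold simply discards the low-confidence mapping, which is the loss of information that the probabilistic and fuzzy extensions of OWL try to avoid.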

PR-OWL (probabilistic OWL)

PR-OWL [27] is a novel ontology description language that extends OWL by providing constructs for modelling uncertainty in ontologies. It allows moving beyond the current limitations of deterministic classical logic to a full first-order probabilistic logic. PR-OWL is not limited to extending the attribute-value model with syntax to describe probabilities; it goes further by allowing complex Bayesian probabilistic models to be represented. PR-OWL has Multi-Entity Bayesian Networks (MEBN) as its underlying logical basis. This kind of network combines Bayesian probability theory with classical first-order logic.

f-OWL (Fuzzy OWL)

Fuzzy OWL, or simply f-OWL [141], is a fuzzy extension of OWL DL that adds degrees to OWL facts. Although the only syntactic change required is the addition of a membership degree, which ranges from 0 to 1, the semantics of f-OWL must be redefined. [141] describes the new semantics and also f-SHOIN as an extension of the SHOIN DL.

7.3 XML-RDF semantic integration

Interoperability among autonomous XML schemas has been one of the recent challenges of data integration. The success of the Resource Description Framework (RDF) [124] and its related technologies (RDFS [126], OWL [112]) has refuelled the problem by, on the one hand, raising the need to establish interoperability mechanisms between XML schemas and RDF schemas (RDFS or OWL) and, on the other hand, opening the possibility of using RDFS/OWL ontologies as a solution for the semantic mapping problem among XML schemas.

One of the contributions of this thesis is strongly related to the research trend that tries to exploit the advantages of an XML-to-RDF mapping [54][65][5][84][87][117][132]. The XML-to-RDF mapping has traditionally been approached through what is known as the structure-mapping [97], which defines a direct way to map XML schema entities to RDF classes. Our contribution consists of exploring a different approach, the mapping of the general XML model to RDF. A deeper analysis of the related state of the art can be found in the related work section of chapter 9.

7.4 Querying highly volatile and restricted Web data sources

The classical problems of data integration have well-known solutions like the GAV and LAV approaches or the MiniCon [119] and Bucket [56] algorithms. However, the evolution of Web-related technologies like XML and RDF, and the proliferation of new data sources and wrapper technologies, suggest the reformulation of the old problems and solutions. One of the contributions of this thesis is related to using XQuery [154] in a LAV-based approach to query a set of Spanish online newspapers. Recently Thomas Kabisch and Mattis Neiling [81] have used a very similar strategy, but using RDF and RDQL to query data related to research papers from the best-known web sources. I borrow from them the name of

Query Tunneling to define this trend, which in fact is a simplification of the LAV approach but more appropriate for the very restricted Web data interfaces.

7.5 Data integration in P2P

Peer-to-peer (P2P) architectures are becoming popular. They update the client-server model by eliminating the necessity of having central servers, reducing communication and storage costs and improving reliability. Some works have suggested that having a central schema for data is not a good idea in a P2P architecture. We can find an example of this in the Peer-Data Management Systems (PDMS) [4][53][18]. In a PDMS participants do not need to agree on a central data schema; they can define their local semantic mappings to the most convenient peer, and queries can be answered by chaining mappings. This approach improves flexibility, allowing each peer to query the system using its own schema.

In [52] Alon Y. Halevy et al. described Piazza: Data Management Infrastructure for Semantic Web. Piazza is a PDMS based on the use of XML and XQuery that also allows the integration of RDF. [4] describes some of the problems related to PDMS, focusing on data placement (also related to the Piazza system). The authors demonstrate that an intelligent materialization of views (replication) in some nodes of the network improves performance and availability.

In [18] Philip A. Bernstein et al. described local relational models as a formalism for mediating between different peers in a PDMS, and an algorithm for answering queries using the formalism. [103] describes the Edutella system, focused on XML-RDF interoperability. It aims to provide query and storage services for RDF, but with the ability to use heterogeneous underlying sources. The RDF queries are reformulated to the underlying storage formats and query languages using canonical mappings (Edutella does not employ point-to-point mappings between nodes).

The Chatty Web [1] describes protocols for exchanging semantic mapping information in a decentralized fashion. Schemas and mappings are dynamically spread through the network by a gossip mechanism, and queries are routed and mapped using this information. Hyperion [82] faces the problem of mapping objects from different sources. It focuses on the use of relational tables and provides a theoretical model.

Finally, PeerDB [104] avoids schema mappings, taking a different approach based on information retrieval algorithms for query reformulation. Each peer and each one of its attributes is associated with a set of keywords. Given a query over a peer schema, PeerDB reformulates the query into other peer schemas by matching the keywords associated with the attributes of the two schemas.
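The keyword-based reformulation idea can be illustrated with a short sketch. It is only an approximation made for illustration: it assumes that each attribute carries a plain set of keywords and that the best match is the attribute with the largest keyword overlap, whereas PeerDB itself relies on information retrieval techniques. The function names and the two toy schemas are hypothetical.

```python
def best_match(keywords, candidates):
    """Return the candidate attribute sharing the most keywords, or None."""
    best, best_overlap = None, 0
    for name, candidate_keywords in candidates.items():
        overlap = len(keywords & candidate_keywords)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


def reformulate(query_attrs, local_schema, remote_schema):
    """Map every attribute used in a query to its best remote counterpart."""
    return {
        attr: best_match(local_schema[attr], remote_schema)
        for attr in query_attrs
    }


# Hypothetical peers: attribute name -> associated keywords.
peer_a = {
    "title": {"title", "name", "film"},
    "year": {"year", "date"},
}
peer_b = {
    "titulo": {"title", "film"},
    "fecha": {"date", "release"},
    "pais": {"country"},
}

rewritten = reformulate(["title", "year"], peer_a, peer_b)
```

Attributes with no shared keyword map to None, which a real system would treat as "this peer cannot answer that part of the query".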

Chapter 8

Problem Statement

This thesis faces two specific problems within the general and old research topic of Data Integration. In general, integration of multiple data sources aims at giving a unified view over a set of pre-existent data. This entails giving users uniform access to the data without having to deal with the particularities of each source. Achieving this ambitious goal implies solving several problems that have defined different research topics.

8.1 Problem addressed 1: Semantic integration

Semantic heterogeneity is one of the key challenges in integrating and sharing data across disparate sources. Semantics refers to meaning, in contrast to syntax, which refers to structure. In the database area, semantics can be regarded as people's interpretation of data and schema items according to their understanding of the world in a certain context. Semantic integration is the research area focused on reconciling data from autonomous sources using ontologies or other semantic-based tools.

This thesis aims to contribute to this research trend by providing solutions to two problems:

1. XML-RDF Semantic Integration: How can ontologies be exploited to integrate XML data from disparate schemas? How can XML data related to multiple schemas, and also to one or more ontologies, be queried? Is it possible to do this and keep using conventional XML query languages like XPath or XQuery?

2. OWL Ontology Alignment: Can a rigorous and scalable semantic similarity measure be defined for OWL ontologies? Can an ontology alignment process successfully work directly over the RDF labeled directed graph, or is it better to process the equivalent bipartite graph?

8.2 Problem addressed 2: Heterogeneous query interfaces

Within the old data integration LAV (Local-As-View) approach, some solutions were provided to the problem of how an initial query, targeting a logical mediated schema,

must be translated into queries over a set of different autonomous data sources. The old solutions, generally complex, were based on Datalog, and focused on answering expressive queries over heterogeneous but rich query interfaces. The evolution of the Web and its related technologies, like XML, allows this old problem to be reformulated, and this reformulation is the basis of the second main contribution of this work:

1. Can XML technologies and a strategy based on the reprocessing of results be a practical solution for web-based data integration systems? How can this approach be instantiated to develop specific applications?


Part II

Heterogeneous Data Models and Schemas: Semantic Integration


Chapter 9

XML Semantic Integration: A Model Mapping Approach

This work describes 1) why an ontology-aware XPath processor (or an XQuery engine that makes use of it) can be a natural and powerful tool for processing metadata, 2) how a processor with such behaviour has been implemented using Description Logics (materialised in RDFS and OWL constructs) and 3) a real application scenario in the Digital Rights Management (DRM) domain. We present the architecture of a schema-aware and ontology-aware XPath processor that acts over an RDF mapping of XML. Translating XML documents to RDF makes it possible to exploit the powerful tools of Description Logics, allowing XML documents to interoperate at the semantic level. We test our approach in the Digital Rights Management (DRM) domain, where some organizations are involved in standardization or adoption of rights expression languages (REL). We explore how a schema-aware and ontology-aware XPath/XQuery processor can be used in two of the main REL initiatives (MPEG-21 REL and ODRL).

9.1 Already published work

Large portions of this chapter have appeared in the following papers:

- Tous R., García R., Rodríguez E., Delgado J. Architecture of a Semantic XPath Processor. Application to Digital Rights Management, 6th International Conference on Electronic Commerce and Web Technologies EC-Web 2005. August 2005 Copenhagen, Denmark. Lecture Notes in Computer Science, Vol (2005), pp. 1-10. ISSN:

- Tous R., Delgado J. A Semantic XPath processor. InterDB 2005 International Workshop on Database Interoperability. ELSEVIER's Electronic Notes in Theoretical Computer Science 2005

- Tous R., Delgado J. RDF Databases for Querying XML. A Model-mapping Approach. DISWeb 2005 International Workshop Data Integration and the Semantic Web. Proceedings of the CAiSE'05 Workshops. Faculdade de Engenharia da Universidade do Porto. ISBN pages:

- Tous R., Delgado J. Using OWL for Querying an XML/RDF Syntax. WWW'05: Special interest tracks and posters of the 14th international conference on World Wide Web. Chiba, Japan. Pages: ACM Press 2005. ISBN:

9.2 Introduction

Most XML-based applications make use of one or more XML schemas for checking the validity of instances. In addition to defining a valid structure for the documents, the schemas generally also define inheritance hierarchies among types and element names (to rationalize the writing of the schemas and the instances respectively). However, sometimes it is necessary to consider this information not only for validation but also when evaluating queries over the XML data. Today the use of RDFS/OWL ontologies to define semantic connections among application concepts is also becoming common. In some cases, the ontologies define relationships that are relevant for query evaluation (equivalences among names, subclassing, transitiveness, etc.). Unfortunately, all this structural and semantic knowledge is hard for developers to access, because it requires a specific treatment, like defining multiple extra queries for the schemas or using complex RDF tools to access the information in the ontologies.

To overcome this situation we present the architecture of a schema-aware and ontology-aware XPath/XQuery processor. The processor can be fed with an unlimited set of XML schemas and RDFS/OWL ontologies and will resolve the queries taking into consideration the structural and semantic connections described in them. To achieve this goal, the processor acts over an RDF mapping of XML, contributing to a recent research trend that defines an XML-to-RDF mapping allowing XML documents to interoperate at the semantic level. We use a model-mapping approach to represent instances of XML and XML Schema in RDF. This representation retains the node order, in contrast with the usual structure-mapping approach, so it allows a complete mapping of all XPath axes.

This chapter is structured in three main blocks. First, we describe some related work to help identify the problem and the relevance of the contribution. Second, we describe the architecture of the semantic XPath processor and some implementation details. Third, we apply our approach to a plausible usage scenario, the Digital Rights Management (DRM) domain. We explore how an XPath/XQuery processor with semantic behaviour can be used for processing licenses from two of the main Rights Expression Language (REL) initiatives (MPEG-21 REL and ODRL).

9.3 Related work

The query rewriting approach

There is a previous work that also pursues semantic behaviour for XPath/XQuery. This approach is described in [91], and it shares with ours the translation from XML schemas to OWL. Because the authors do not attempt to provide a new XQuery implementation, they use the obtained ontology as guidance to rewrite the

original semantic XQuery instance (they call it Semantic Web Query Language or SWQL) to a conventional XQuery instance.

The difference between this approach and ours is that our work does not describe a new query language and does not need a translation between the semantic queries and XPath/XQuery expressions. We have developed a new XPath processor that directly manipulates conventional XPath instances while taking into consideration the semantic relationships defined in the schemas and/or ontologies and related to the names involved in the query. The processor can be embedded into a conventional XQuery processor to obtain the semantic behaviour for XQuery as well.

Other related work. Model-mapping vs. Structure-mapping

The key element of our work is the mapping of the XML and XML Schema models to OWL. The origins of this approach can be found in a research trend that tries to exploit the advantages of an XML-to-RDF mapping [54][65][5][84][87][117][132]. However, the concepts of structure-mapping and model-mapping are older. In 2001, [97] defined these terms to differentiate between works that map the structure of some XML schema to a set of relational tables (element names become table names) and works that map the XML model to a general relational schema (a small number of tables representing elements, attributes and relationships; element names become just field values) respectively.

More recently, [84] takes a structure-mapping approach and defines a direct way to map XML documents to RDF triples ([65] classifies this approach as Direct Translation). [54], [65], and [5] also take a structure-mapping approach but focus on defining semantic mappings between different XML schemas ([65] classifies their own approach as High-level Mediator). They also describe some simple mapping mechanisms to cover just a subset of XPath constructs. Other authors like [87] or [117] take a slightly different strategy (though within the structure-mapping trend) and focus on integrating XML and RDF to incorporate into XML the inferencing rules of RDF (strategies classified by [65] as Encoding Semantics). Finally, it is worth mentioning the RPath initiative [132], which tries to define a language analogous to XPath but for natural (not derived from XML) RDF data (this last work does not pursue interoperability between models or schemas).

9.4 Architecture of the semantic XPath processor

Overview

Figure 9.1 outlines how the schema-aware and ontology-aware XPath processor works. The key issue is the XML-to-RDF mapping, already present in other works, but which we approach from the model-mapping perspective. In contrast with the structure-mapping approach, which maps the specific structure of some XML schema to RDF constructs, we map the XML Infoset [68] using RDFS and OWL axioms based on the already existing W3C's RDFS informative representation [147]. This allows us to represent any XML document without any restriction and without losing information about node order. We use the same approach with XSD, obtaining an RDF representation of the schemas, as we will explain later. Incorporating alternative OWL or RDFS ontologies is straightforward, because they

are already compatible with the inference engine. In the figure we can also see that an OWL representation of the XML model is necessary. This ontology allows the inference engine to correctly process the different XPath axes and understand how the XML elements relate to the different XSD constructs.

Figure 9.1: Semantic XPath processor architecture overview

Example 9.4.1. Let us see a simple example. Take the following XML document describing two movies:

    <movies>
      <movie id="m1">
        <title>Blade Runner</title>
        <year>1982</year>
        <director id="d1">
          <name>Ridley Scott</name>
        </director>
      </movie>
      <movie id="m2">
        <title>Paris, Texas</title>
        <year>1984</year>
        <director id="d2">
          <name>Wim Wenders</name>
        </director>
      </movie>
    </movies>

And also its attached XML schema describing the valid structure for all "movies" documents:

    <xs:schema>
      <xs:element name="movies">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="movie">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="title"/>
                  <xs:element name="year"/>
                  <xs:element name="director">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="name"/>
                      </xs:sequence>
                      <xs:attribute name="id"/>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="id"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

The object-oriented nature of some XML Schema constructs allows using them to increase the interoperability of applications or to fix interoperability problems in an elegant way. For example, the substitutionGroup inheritance mechanism can be used to bind the names of two different XML languages. The previous schema defines the elements movies, movie, title, year, etc. In some contexts it could be interesting to have the possibility of writing the element and attribute names in a language different from English. We can generate a schema that binds the different names from the Spanish version to the (master) English version:

    <xs:schema>
      <xs:element name="películas" substitutionGroup='movies'>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="película" substitutionGroup='movie'>
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="título" substitutionGroup='title'/>
                  <xs:element name="año" substitutionGroup='year'/>
                  <xs:element name="director" substitutionGroup='director'>
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="nombre" substitutionGroup='name'/>
                      </xs:sequence>
                      <xs:attribute name="id"/>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="id"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Now, using our schema-aware XPath processor, if we ask for /movie/country we will obtain the same as for /pelicula/pais (independently of whether the XML instance is written in English or in Spanish). So, we can develop applications that are not tied to a particular schema but to a global one. This feature allows using XML schemas (or also OWL ontologies) to define semantic relationships between other XML schemas or ontologies, and to issue XPath queries that will be solved accordingly. This is just one of the features of the approach in a trivial scenario, but it serves to illustrate the idea.

OWL. An ontology web language

The OWL Web Ontology Language, being produced by the W3C Web Ontology Working Group (WebOnt), is a language for defining and instantiating Web ontologies¹. The language can be used to formalize a domain by defining classes, properties of those classes and individuals. With the information of a domain in a machine-understandable format, inference engines or other applications can reason about the different classes and individuals, deriving logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics.

¹ See the Background Information chapters for a brief introduction to OWL.

An OWL ontology for the XML model (XML/RDF Syntax)

Instead of taking the intuitive structure-mapping approach to transform an XML document into a set of RDF triples like [65], we tried to represent the XML Infoset [68] using an OWL ontology based on the already existing W3C's [147]. This allows us to represent any XML document without any restriction and without losing information about node order. Fig. 9.3 shows graphically how the example of fig. 9.2 will be represented using the classes and properties defined with OWL. The descendants of the class Node (Document, Element, Attribute and TextNode) in conjunction with the object property childOf are the main building blocks of the document tree, while the object property preceding-sibling is necessary to preserve the node order.

Below we include the OWL ontology to show the details. We exploit the expressive power of OWL to define properties like parentOf, descendant, ancestor, descendantOrSelf, ancestorOrSelf, immediateFollowingSibling, followingSibling, following, precedingSibling, and preceding just in terms of the two primitives childOf and immediatePrecedingSibling. This will be of great help later when we translate an XPath query to an RDQL query for the RDF representation of the XML data. For example, we define descendant as a superset of childOf, which itself is defined as the inverse of parentOf. All these properties do

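not need to be present in the representation because they will be deduced by the inference engine when processing the queries. A simplified description of the ontology in Description Logics syntax (SHIQ-like style [62]) would be:

\[
\begin{array}{l}
\mathit{Document} \sqsubseteq \mathit{Node} \\
\mathit{Element} \sqsubseteq \mathit{Node} \\
\mathit{TextNode} \sqsubseteq \mathit{Node} \\
\mathit{childOf} \sqsubseteq \mathit{descendant} \\
\mathit{parentOf} \sqsubseteq \mathit{ancestor} \\
\mathit{childOf} \equiv \mathit{parentOf}^{-} \\
\mathrm{Trans}(\mathit{ancestor}) \\
\mathit{ancestor} \sqsubseteq \mathit{ancestorOrSelf} \\
\mathit{self} \sqsubseteq \mathit{descendantOrSelf} \\
\mathit{self} \sqsubseteq \mathit{ancestorOrSelf} \\
\mathit{self} \equiv \mathit{sameAs} \\
\mathit{immediatePrecedingSibling} \sqsubseteq \mathit{precedingSibling} \\
\mathit{immediateFollowingSibling} \sqsubseteq \mathit{followingSibling} \\
\mathit{immediatePrecedingSibling} \equiv \mathit{immediateFollowingSibling}^{-} \\
\mathrm{Trans}(\mathit{followingSibling})
\end{array}
\]

Figure 9.2: XML simple example describing two movies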
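To make concrete what "deduced by the inference engine" means for these axioms, the following minimal sketch derives the ancestor and ancestorOrSelf relations from stored childOf facts only. It is an illustration under stated assumptions, not the reasoner used in this work: relations are plain sets of pairs, the node identifiers are hypothetical, and only the axioms childOf ≡ parentOf⁻, parentOf ⊑ ancestor, Trans(ancestor), ancestor ⊑ ancestorOrSelf and self ⊑ ancestorOrSelf are applied.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Stored facts: (child, parent) pairs for a fragment of the movies document.
# The identifiers are hypothetical node names.
child_of = {
    ("movie1", "movies"),
    ("title1", "movie1"),
    ("text1", "title1"),
}
nodes = {n for pair in child_of for n in pair}

# childOf is the inverse of parentOf.
parent_of = {(parent, child) for (child, parent) in child_of}

# parentOf is included in ancestor, and ancestor is transitive.
ancestor = transitive_closure(parent_of)

# ancestor and self are both included in ancestorOrSelf.
ancestor_or_self = ancestor | {(n, n) for n in nodes}
```

Only the three childOf facts are stored; the pair ("movies", "text1"), for instance, is never asserted but appears in ancestor after the closure, which is the kind of entailment that a query on the ancestor axis relies on.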

Figure 9.3: RDF graph for movies example

XPath

While the official definition of XPath [69] still says "XPath is a language for addressing parts of an XML document", which was strictly correct for XPath 1.0, version 2.0 cannot be defined just that way. Because the syntax of XPath 2.0 is a compact version of XQuery 1.0, it would be better defined as, for example, "a sequence-based language for querying and processing XML". XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax.

[69] says that XPath 2.0 has been designed to be embedded in another host language such as XSLT 2.0 [155] or XQuery 1.0 [154]. However, the relation of XPath to these two languages differs. XPath 2.0 and XQuery 1.0 have the same semantics, defined by XQuery 1.0 and XPath 2.0 Formal Semantics [153]. [69] says that "XQuery 1.0 is an extension of XPath 2.0". So one can talk about the same language with two syntaxes, one with the SQL flavour (XQuery), and the compact version (XPath) to be embedded in a host language (XSLT).

XPath data model

[152] specifies the XQuery 1.0 and XPath 2.0 data model. It defines the information contained in the input to the host language in which XPath is embedded and also all permissible values of XPath expressions. The data model is based on the XML Infoset [68]. The following definitions (extracted from [152] and not comprehensive) describe the key elements of the XPath data model:

1. Every instance of the data model is a sequence.

2. A sequence is an ordered collection of zero or more items.

3. A sequence cannot be a member of a sequence.

4. An item is either a node or an atomic value.

5. Every node is one of the seven kinds of nodes (document, element, attribute, text, namespace, processing instruction, and comment).

So, the basic building block of the data model is the sequence. This is an important difference with respect to XPath 1.0, in which the basic constructs were node-sets (without duplicates). Sequences can contain duplicates but not other sequences; combining sequences always produces a flattened sequence instead of a nesting.

XPath syntax

The grammar rules of XPath 2.0 have increased in complexity, since it now supports for expressions, conditionals, intersections, unions and differences among other constructs. Here we are going to describe just the rules that are shared with version 1.0, focusing on location paths (now PathExpr). The partial Backus-Naur Form (BNF) rules for an XPath expression are:

    Expr       ::= ExprSingle ("," ExprSingle)*
    ExprSingle ::= ForExpr | QuantifiedExpr | IfExpr | OrExpr
    OrExpr     ::= AndExpr ( "or" AndExpr )*
    AndExpr    ::= PathExpr ( "and" PathExpr )*

The basic building block of the syntax is the expression, which is a string of Unicode characters. For this work we are going to consider just expressions consisting of a single PathExpr, the basic construct to address parts of an XML document. The BNF rules for a PathExpr are:

    PathExpr          ::= ("/" RelativePathExpr?) | RelativePathExpr
    RelativePathExpr  ::= AxisStep ("/" AxisStep)*
    AxisStep          ::= (ForwardStep | ReverseStep) Predicate*
    ForwardStep       ::= (ForwardAxis NodeTest) | AbbrevForwardStep
    AbbrevForwardStep ::= "@"? NodeTest
    ReverseStep       ::= (ReverseAxis NodeTest) | AbbrevReverseStep
    AbbrevReverseStep ::= ".."
    NodeTest          ::= KindTest | NameTest
    NameTest          ::= QName | "*"
    KindTest          ::= "node()" | "text()" | ...
    Predicate         ::= "[" Expr "]"

These rules, extracted from [69], were simpler and clearer in version 1.0. They simply say that a PathExpr is a sequence of steps (AxisStep), each one composed of an axis (ForwardAxis or ReverseAxis), a NodeTest and a list of predicates. Axes are the key element, because they define the direction of each step. There are 13 different axes:

    ForwardAxis ::= <"child" "::">
                  | <"descendant" "::">
                  | <"attribute" "::">
                  | <"self" "::">
                  | <"descendant-or-self" "::">
                  | <"following-sibling" "::">
                  | <"following" "::">
                  | <"namespace" "::">
    ReverseAxis ::= <"parent" "::">
                  | <"ancestor" "::">
                  | <"preceding-sibling" "::">
                  | <"preceding" "::">
                  | <"ancestor-or-self" "::">

Table 9.1 gives the meaning of each axis. Some example XPath queries for an XML document describing movies could be:

    /child::movies/child::movie/child::title    (in abbreviated form /movies/movie/title)
    /descendant-or-self::node()/child::title    (in abbreviated form //title)
    /following-sibling::node()                  (this axis has no abbreviated form)

XPath Formal semantics

XPath can be formally defined by describing the operations on this data model. It is not a coincidence that some of the axioms are already present in the XML/RDF ontology, because they map directly to XML primitives (e.g. child). First we must define the function E, corresponding to the XPath Expr rule from the EBNF grammar [69].

\[
\begin{aligned}
E &: \mathit{Path} \to \mathit{Node} \to \mathit{sequence}(\mathit{Node}) \\
E[\![e_1/e_2]\!]\,x &= \{\, x_2 \mid x_1 \in E[\![e_1]\!]\,x \wedge x_2 \in E[\![e_2]\!]\,x_1 \,\} \\
E[\![a::t]\!]\,x &= \{\, x_1 \mid x_1 \in A_a(x) \wedge T_t(x_1) \,\} \\
E[\![e[p]]\!]\,x &= \{\, x_1 \mid x_1 \in E[\![e]\!]\,x \wedge P[\![p]\!]\,x_1 \,\}
\end{aligned}
\]

The function A describes both the ForwardAxis and the ReverseAxis rules from the grammar.

\[
A : \mathit{Axis} \to \mathit{Node} \to \mathit{sequence}(\mathit{Node})
\]
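The first two equations for E given above can be read as a small evaluator: a path is a list of axis::test steps, and each step is evaluated from every node produced by the previous one. The sketch below is a hedged illustration over an in-memory tree, not the processor described in this chapter (which evaluates queries over RDF). The Node class, the AXES table restricted to three axes, and the omission of predicates are all simplifying assumptions.

```python
class Node:
    """Minimal tree node: a name, a parent link and ordered children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


# Axis functions A_a(x); only three of the thirteen axes are sketched here.
AXES = {
    "child": lambda x: list(x.children),
    "parent": lambda x: [x.parent] if x.parent is not None else [],
    "self": lambda x: [x],
}


def name_test(test, x):
    """T_t(x) for name tests: '*' matches anything, otherwise compare names."""
    return test == "*" or x.name == test


def eval_step(axis, test, x):
    """E[[a::t]]x: nodes reachable from x along axis a that satisfy test t."""
    return [x1 for x1 in AXES[axis](x) if name_test(test, x1)]


def eval_path(steps, x):
    """E[[e1/e2]]x applied left to right over a list of (axis, test) steps."""
    nodes = [x]
    for axis, test in steps:
        nodes = [x2 for x1 in nodes for x2 in eval_step(axis, test, x1)]
    return nodes


# A fragment of the movies document used throughout this chapter.
doc = Node("#document")
movies = Node("movies", doc)
movie1 = Node("movie", movies)
title1 = Node("title", movie1)
movie2 = Node("movie", movies)
title2 = Node("title", movie2)

# /child::movies/child::movie/child::title
titles = eval_path(
    [("child", "movies"), ("child", "movie"), ("child", "title")], doc
)
```

The result is a list rather than a set, so duplicates would be preserved if two routes led to the same node; whether to deduplicate is a design choice that the set notation of the equations leaves implicit.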

Table 9.1: XPath axis

    Axis                 Meaning
    child                All children of the context element (attributes cannot have children)
    descendant           The descendants of the context node (the children, the children of the children, and so on)
    parent               The parent of the context node
    ancestor             The ancestors of the context node (the parent, the parent of the parent, and so on)
    following-sibling    Those children of the context node's parent that occur after the context node in document order
    preceding-sibling    Those children of the context node's parent that occur before the context node in document order
    following            All nodes that are descendants of the root of the tree in which the context node is found, are not descendants of the context node, and occur after the context node in document order
    preceding            All nodes that are descendants of the root of the tree in which the context node is found, are not ancestors of the context node, and occur before the context node in document order
    attribute            The attributes of the context node
    namespace            Namespace nodes
    self                 The context node itself
    descendant-or-self   The context node and the descendants of the context node
    ancestor-or-self     The context node and the ancestors of the context node

\[
\begin{aligned}
A_{\text{child}}(x) &= \{\, x_1 \mid \mathit{childOf}(x_1, x) \,\} \\
A_{\text{descendant}}(x) &= \{\, x_1 \mid \mathit{childOf}(x_1, x) \vee (\mathit{childOf}(x_2, x) \wedge x_1 \in A_{\text{descendant}}(x_2)) \,\} \\
A_{\text{descendant-or-self}}(x) &= \{x\} \cup \{\, x_1 \mid x_1 \in A_{\text{descendant}}(x) \,\} \\
A_{\text{parent}}(x) &= \{\, x_1 \mid \mathit{childOf}(x, x_1) \,\} \\
A_{\text{ancestor}}(x) &= \{\, x_1 \mid \mathit{childOf}(x, x_1) \vee (\mathit{childOf}(x, x_2) \wedge x_1 \in A_{\text{ancestor}}(x_2)) \,\}
\end{aligned}
\]

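\[
\begin{aligned}
A_{\text{ancestor-or-self}}(x) &= \{x\} \cup \{\, x_1 \mid x_1 \in A_{\text{ancestor}}(x) \,\} \\
A_{\text{preceding-sibling}}(x) &= \{\, x_1 \mid \mathit{precedingSibling}(x_1, x) \,\} \\
A_{\text{preceding}}(x) &= \{\, x_1 \mid x_1 \in A_{\text{descendant-or-self}}(x_2) \wedge x_2 \in A_{\text{preceding-sibling}}(x_3) \wedge x_3 \in A_{\text{ancestor-or-self}}(x) \,\} \\
A_{\text{following-sibling}}(x) &= \{\, x_1 \mid \mathit{precedingSibling}(x, x_1) \,\} \\
A_{\text{following}}(x) &= \{\, x_1 \mid x_1 \in A_{\text{descendant-or-self}}(x_2) \wedge x_2 \in A_{\text{following-sibling}}(x_3) \wedge x_3 \in A_{\text{ancestor-or-self}}(x) \,\} \\
A_{\text{attribute}}(x) &= \{\, x_1 \mid \mathit{attributeOf}(x_1, x) \,\} \\
A_{\text{namespace}}(x) &= \{\, x_1 \mid \mathit{namespaceOf}(x_1, x) \,\}
\end{aligned}
\]

The function T describes the NodeTest rule from the grammar.

\[
\begin{aligned}
T &: \mathit{NodeTest} \to \mathit{Node} \to \mathit{Boolean} \\
T_{n}(x) &= \{\, \mathit{hasName}(x, n) \,\} \\
T_{*}(x) &= \{\, \mathit{true} \,\} \\
T_{\text{node()}}(x) &= \{\, \mathit{type}(x, \text{'node'}) \,\} \\
T_{\text{element()}}(x) &= \{\, \mathit{type}(x, \text{'elementNode'}) \,\} \\
T_{\text{text()}}(x) &= \{\, \mathit{type}(x, \text{'textNode'}) \,\}
\end{aligned}
\]

The function P describes the Predicate rule from the grammar. There are a lot of different predicates but defining all of them is out of the scope of this document. As an example we define here the predicate that expresses the existence of a specific sub-tree as a condition.

\[
\begin{aligned}
P &: \mathit{Predicate} \to \mathit{Node} \to \mathit{Boolean} \\
P[\![p]\!]\,x &= \{\, \exists\, x_1 \in E[\![p]\!]\,x \,\}
\end{aligned}
\]

RDQL A Query Language for RDF

RDQL [127] is the popular RDF query language from HP Labs Bristol. RDQL is an implementation of the SquishQL [139] RDF query language, which itself is derived from rdfDB [125]. The specification of RDQL was submitted to the W3C on 9 January 2004, and has had an enormous influence on the new W3C RDF query language, SPARQL [138]. However, we have chosen RDQL instead of SPARQL because of the existence of a mature query processor, the Jena API [76]. The results obtained can be extended to the new W3C language (and we plan to do this explicitly when tools are available).

An RDF model is a graph, often expressed as a set of triples. An RDQL query consists of a graph pattern, expressed as a list of triple patterns. Each triple pattern is comprised of named variables and RDF values (URIs and literals). An RDQL query can additionally have a set of constraints on the values of those variables, and a list of the variables required in the answer set. An example RDQL query could be: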

    SELECT ?book
    WHERE (?book, <somelibrary:year>, ?year)
    AND ?year >= 2004
    USING somelibrary FOR <

This sample query will return the subjects (?book) of all the RDF triples with a predicate somelibrary:year and a literal object consisting of an integer equal to or greater than 2004. A complete explanation of the language can be found in [127].

XPath translation to RDQL

Some works face the problem of executing XPath queries over RDF data. Most of them (like [65]) take a structure-mapping approach and describe some simple mapping mechanisms to cover just a subset of XPath constructs (as mentioned before, it is not feasible to map the constructs based on node order in a structure-mapping approach). Other works, like the RPath initiative [132], try to define a language analogous to XPath but for natural (not derived from XML) RDF data.

Our strategy is radically different because we transform an XPath query into an RDQL query that we execute over an exact (and not just an intuitive mapping) RDF representation of the input XML data. This makes the mapping of all XPath constructs feasible in a natural and elegant way.

Each XPath axis can be mapped into one or more triple patterns of the target RDQL [127] query. Analogously, each node test and predicate can also be mapped with just one or more triple patterns. The output RDQL query always takes the form:

    SELECT *
    WHERE (?v1, <rdf:type>, <xmloverrdf:document>)
          [triple pattern 2]
          [triple pattern 3]
          ...
          [triple pattern N]
    USING xmloverrdf FOR <

The translation can be deduced from the XPath formal semantics. For example, the following axis is described as:

\[
A_{\text{following}}(x) = \{\, x_1 \mid x_1 \in A_{\text{descendant-or-self}}(x_2) \wedge x_2 \in A_{\text{following-sibling}}(x_3) \wedge x_3 \in A_{\text{ancestor-or-self}}(x) \,\}
\]

So the following axis must be translated to:

    (?vi, <xmloverrdf:ancestor-or-self>, ?vi-1)
    i = i + 1
    (?vi, <xmloverrdf:following-sibling>, ?vi-1)
    i = i + 1
    (?vi, <xmloverrdf:descendant-or-self>, ?vi-1)
    i = i + 1

There are also simple conversion rules for all node tests and predicates but we omit them to save space. The notation used includes variable names like vi and vi-1, where i begins with value 2 (because the first triple pattern is always the same, as shown before).

So if we had just the expression:

    /child::movies/child::movie

We will translate the first child axis to:

    (?v2, <xmloverrdf:childOf>, ?v1)

The first node test to:

    (?v2, <xmloverrdf:hasName>, <

The second child axis to:

    (?result, <xmloverrdf:childOf>, ?v2)

And the second node test to:

    (?result, <xmloverrdf:hasName>, <

The complete WHERE clause will appear as:

    WHERE (?v1, <rdf:type>, <xmloverrdf:document>),
          (?v2, <xmloverrdf:childOf>, ?v1),
          (?v2, <xmloverrdf:hasName>, <
          (?result, <xmloverrdf:childOf>, ?v2),
          (?result, <xmloverrdf:hasName>, <

Example results

An example query could be:

    /child::movies/child::movie/child::title    (in abbreviated form /movies/movie/title)

That is translated to:

    SELECT *
    WHERE (?v1, <rdf:type>, <xmloverrdf:document>),
          (?v2, <xmloverrdf:childOf>, ?v1),
          (?v2, <xmloverrdf:hasName>, "movies"),
          (?v3, <xmloverrdf:childOf>, ?v2),
          (?v3, <xmloverrdf:hasName>, "movie"),
          (?result, <xmloverrdf:childOf>, ?v3),
          (?result, <xmloverrdf:hasName>, "title")

    Result: 6, 9 (node numbers, see figure)

9.5 Incorporating schema-awareness

Mapping XML Schema to RDF

In our ontology for the XML model, the object of the hasName property is not a literal but a resource (an RDF resource). This key aspect allows applying to hasName all the potential of the OWL relationships (e.g. defining ontologies with relationships among names). So, if we want our XPath processor to be schema-aware, we just need to translate the XML Schema language to RDF, and to add to our XML/RDF Syntax ontology the necessary OWL constructs that allow the inference engine to understand the semantics of the different XML Schema components. The added axioms in Description Logics syntax (SHIQ-like style [62]) would be:

\[
\begin{array}{l}
\mathit{hasName} \sqsubseteq \mathit{fromSubstitutionGroup} \\
\mathrm{Trans}(\mathit{fromSubstitutionGroup}) \\
\mathit{hasName} \sqsubseteq \mathit{fromType} \\
\mathrm{Trans}(\mathit{fromType}) \\
\mathit{fromType} \sqsubseteq \mathit{subTypeOf}
\end{array}
\]

A simple example of schema-aware XPath processing

The next example illustrates the behaviour of our processor in a schema-related XPath query. Take this simple XML document:

    <A>
      <B id='B1' />
      <B id='B2'>
        <C id='C1'>
          <D id='D1'></D>
        </C>
      </B>
      <B id='B3'/>
    </A>

And its attached schema:

    <schema>
      <complexType name='BType'>
        <complexContent>
          <extension base='SUPERBType'></extension>
        </complexContent>
      </complexType>
      <element name='B' type='BType' substitutionGroup='SUPERB' />
    </schema>

When evaluating the XPath query //SUPERB, our processor will return the elements with IDs 'B1', 'B2' and 'B3'. These elements have a name with value 'B', and the schema specifies that this name belongs to the substitution group 'SUPERB', so they match the query. Also, when evaluating the query //SUPERBType, the processor will return 'B1', 'B2' and 'B3'. It assumes that the query is asking for elements of the type SUPERBType or one of its subtypes.

Complete XSD to OWL Mapping

The previous XML Schema (XSD) to RDF mapping is partial in the sense that it just maps the XML Schema semantics that are needed in order to make the XPath processor aware of XSD semantics. There is also a more complete XML Schema to OWL mapping (XSD2OWL) that is responsible for capturing almost all the implicit semantics of the schema. These semantics are determined by the combination of XML Schema constructs. The XSD2OWL mapping is based on translating these constructs to the OWL ones that best capture their semantics. The informal semantics of XML Schema constructs are presented in Table 9.2 and then used to guide the XML Schema to OWL mappings shown in Table 9.3.

The XSD2OWL mapping is quite transparent and captures a great part of the XML Schema semantics. The same names used for XML constructs are used for OWL ones, although in the new namespace defined for the ontology. XSD and OWL construct names are identical; this usually produces uppercase-named OWL properties because the corresponding element name is uppercase, although this is not the usual convention in OWL.

Therefore, XSD2OWL produces OWL ontologies that make explicit the semantics of the corresponding XML schemas. The only caveats are the implicit order conveyed by xsd:sequence and the exclusivity of xsd:choice.

For the first problem, owl:intersectionOf does not retain the order of its operands, and there is no clear solution that retains the great level of transparency that has been achieved. The use of RDF Lists might impose order but introduces ad-hoc constructs not present in the original metadata. Moreover, as has been demonstrated in practice, the ordering of elements does not contribute much from a semantic point of view. For the second problem, owl:unionOf is an inclusive union; the solution is to use the OWL disjointness construct, owl:disjointWith, between all union operands in order to make it exclusive.

The resulting OWL ontology is OWL-Full because the XSD2OWL translator has employed rdf:Property for properties to cope with the fact that there are properties that have

Table 9.2: XML Schema informal semantics

    XML Schema                              Shared informal semantics
    element | attribute                     Named relation between nodes or nodes and values
    [...]                                   Relation can appear in place of a more general one
    [...]                                   The relation range kind
    complexType | group | attributeGroup    Relations and contextual restrictions package
    complexType//element                    Contextualised restriction of a relation
    [...]                                   Package concretises the base package
    [...]                                   Restrict the number of occurrences of a relation
    sequence, choice                        Combination of relations in a context

Table 9.3: XSD2OWL translations for the XML Schema constructs

    XML Schema                              OWL
    element | attribute                     rdf:Property, owl:DatatypeProperty, owl:ObjectProperty
    [...]                                   rdfs:subPropertyOf
    [...]                                   rdfs:range
    complexType | group | attributeGroup    [...]
    complexType//element                    owl:Restriction
    [...]                                   owl:maxCardinality, owl:minCardinality
    sequence, choice                        owl:intersectionOf, owl:unionOf

both datatype and object type ranges as specified in the XML Schema for the corresponding xsd:element.

The full mapping facilitates the implementation of XSD semantics-aware applications. These applications can use the XSD schema semantics formalised in the corresponding ontologies. These ontologies enable semantic XPaths, but they also open the possibility of applying other semantics-enabled tools to the XML domain, e.g. reasoning engines or ontology alignment solutions for schema integration.

For instance, this approach has already shown its usefulness in the Digital Rights Management (DRM) domain [48]. These ontologies have then been exploited for DRM systems implementation [45][46] and assisted negotiation of digital goods [47].

9.6 Implementation and performance

The work has been materialised in the form of a Java API. We have used the Jena 2 API [76] for RDQL computation and OWL reasoning. To process XPath expressions we have modified and recompiled the Jaxen XPath processor [75]. An on-line demo can be found at http://dmag.upf.edu/contorsion.

Jena Inference Engine

The Jena API [76] provides a set of different inference engines or reasoners. We use Jena's OWL reasoner to allow the RDQL query processor to derive additional RDF assertions from the base RDF data together with the XML/RDF ontology axioms. This reasoner includes rules for each one of the OWL/Lite constructs and also others, so it can be considered an incomplete implementation of OWL/Full. Table 9.4 enumerates the constructs supported by the OWL reasoner.

Table 9.4: OWL constructs supported by Jena's OWL reasoner

Constructs
rdfs:subClassOf, rdfs:subPropertyOf, rdf:type
rdfs:domain, rdfs:range
owl:intersectionOf
owl:unionOf
owl:equivalentClass, owl:disjointWith
owl:sameAs, owl:differentFrom, owl:distinctMembers
owl:Thing
owl:equivalentProperty, owl:inverseOf
owl:FunctionalProperty, owl:InverseFunctionalProperty
owl:SymmetricProperty, owl:TransitiveProperty
owl:someValuesFrom, owl:allValuesFrom
owl:minCardinality, owl:maxCardinality, owl:cardinality
owl:hasValue

The OWL reasoner is built on top of a general purpose rule engine. This engine allows rule-based inference over RDF graphs, combining two different strategies. On one

hand the engine uses forward chaining (specifically the Rete algorithm [40]) to precompute deductions. On the other hand it uses backward chaining (a Logic Programming strategy like Prolog) to answer the queries. The combination of these two strategies (hybrid model) is used by default, but the engine can be configured to run just one of them.

Figure 9.4: Jena hybrid execution model

Forward rules can infer new data (deductions) and also other rules. The forward engine maintains a set of inferred statements in the deductions store. When a query is formulated, the backward chaining LP engine applies the merge of the supplied and generated rules to the merge of the raw and deduced data.

The hybrid approach improves performance by reducing, e.g., the generality of some backward rules that could be instantiated for a specific dataset. As an example, extracted from [77], consider the RDFS subPropertyOf entailments. A simple solution would involve the following backward rule:

    (?a ?q ?b) <- (?p rdfs:subPropertyOf ?q), (?a ?p ?b) .

Of course the rule would work, but because the head is composed just of variables, every goal from the query will match. As a consequence, the engine will have to test for subproperty relations for all possible goals. So, it makes sense to adapt this rule to a specific dataset before the backward process begins. We can try the following combination of a forward rule and a backward rule:

    (?p rdfs:subPropertyOf ?q), notEqual(?p,?q) -> [ (?a ?q ?b) <- (?a ?p ?b) ] .

The forward strategy would precompile all the declared subPropertyOf relationships into simple backward rules. These rules would only be fired if the goal references a property which actually has a subproperty.

Performance

Though performance wasn't the target of the work, it is an important aspect of the processor. We have carried out a performance test over a Java Virtual Machine v1.4.1 on a 2GHz Intel Pentium processor with 256MB of memory. The final delay depends mainly on two variables: the size of the target documents and the complexity of the query. Table 9.5 shows the delay of the inferencing stage for different document depth levels and also for some different queries.
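The effect of the hybrid strategy on the subPropertyOf rule described above can be illustrated outside Jena. The following is a minimal Python sketch, not Jena code: the function names compile_backward_rules and query, and the ex: data, are hypothetical and chosen only for illustration. The forward step turns each declared subPropertyOf statement into a specialised backward rule, and the backward step only follows rules whose head names the property being queried.

```python
SUBPROP = "rdfs:subPropertyOf"


def compile_backward_rules(triples):
    """Forward step: each declared (p subPropertyOf q) with p != q becomes a
    specialised backward rule 'q <- p', stored here as q -> [p, ...]."""
    rules = {}
    for s, p, o in triples:
        if p == SUBPROP and s != o:
            rules.setdefault(o, []).append(s)
    return rules


def query(triples, rules, prop):
    """Backward step: answer the goal (?a prop ?b). Only rules whose head is
    the queried property (or one of its sub-properties) are ever fired."""
    results, seen, todo = set(), set(), [prop]
    while todo:
        q = todo.pop()
        if q in seen:
            continue
        seen.add(q)
        results.update((s, o) for s, p, o in triples if p == q)
        todo.extend(rules.get(q, []))
    return results


# Hypothetical example data.
data = [
    ("ex:hasMother", SUBPROP, "ex:hasParent"),
    ("ex:ann", "ex:hasMother", "ex:eve"),
    ("ex:ann", "ex:hasParent", "ex:tom"),
]
rules = compile_backward_rules(data)
parents = query(data, rules, "ex:hasParent")
mothers = query(data, rules, "ex:hasMother")
```

A goal on ex:hasMother consults no rule at all, because that property has no declared sub-property, which is the saving that the forward precompilation is meant to provide.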

The processor behaves well with medium-size documents and also with large ones when simple queries are used (queries that do not involve transitive axes), but when document size grows the delay related to the complex queries increases exponentially. Some performance limitations of Jena's OWL inference engine have been described in [78]. We are now working on this problem, trying to obtain a more scalable inference engine.

Table 9.5: Performance for different document depth levels

Columns: expression, 5d, 10d, 15d, 20d, 20d (Xalan XPath processor)
Rows, with values in the order in which they survive in the transcription:
                             32ms 47ms 47ms 62ms
/A/B/following-sibling::B    125ms 46ms 48ms 16ms
/A/B/following::B            125ms 62ms 63ms 47ms 15ms
/A//B                        172ms 203ms 250ms 219ms 31ms 16ms
//A//B                       178ms 266ms 281ms 422ms 32ms

9.7 Testing in the DRM Application Domain

The amount of digital content delivery in the Internet has made Web-scale Digital Rights Management (DRM) a key issue. Traditionally, DRM Systems (DRMS) have dealt with this problem for bounded domains. However, when scaled to the Web, DRMSs are very difficult to develop and maintain. The solution is interoperability of DRMS, i.e. a common framework for understanding with a shared language and vocabulary. That is why it is not a coincidence that organisations like MPEG (Moving Picture Experts Group), OMA (Open Mobile Alliance), OASIS (Organization for the Advancement of Structured Information Standards), TV-Anytime Forum, OeBF (Open eBook Forum) or PRISM (Publishing Requirements for Industrial Standard Metadata) are all involved in standardisation or adoption of rights expression languages (REL).

Two of the main REL initiatives are MPEG-21 REL [149] and ODRL [66]. Both are XML sublanguages defined by XML Schemas. The XML Schemas define the language syntax and a basic vocabulary. These RELs are then supplemented with what are called Rights Data Dictionaries [133]. They provide the complete vocabulary and a lightweight formalisation of the semantics of the vocabulary terms as XML Schemas or ad hoc ontologies.

ODRL and MPEG-21 REL have just been defined and are available for their implementation in DRMS. They seem quite complete and generic enough to cope with such a complex domain. However, the problem is that they have such a rich structure that they are very difficult to implement. They are rich in the context of XML languages and the "traditional" XML tools like DOM or XPath. There are too many attributes, elements and complex types (see Table 9.6) to deal with.

Application to ODRL license processing

Consider looking for all constraints in a rights expression, usually a rights license, that apply to how we can access the licensed content. This would require so many XPath

Table 9.6: Number of named XML Schema primitives in ODRL and MPEG-21 REL

Columns: Schemas, xsd:attribute, xsd:complexType, xsd:element, Total
ODRL schemas: EX, DD-11
MPEG-21 REL schemas: REL-R, REL-SX, REL-MX

queries as there are different ways to express constraints. For instance, ODRL defines 23 constraints: industry, interval, memory, network, printer, purpose, quality... This amounts to lots of source code, difficult to develop and maintain because it is very sensitive to minor changes to the REL specs. Fortunately, there is a workaround hidden in the language definitions.

As we have said, there is the language syntax but also some semantics. The substitutionGroup relations among elements and the extension/restriction base ones among complexTypes encode generalisation hierarchies that carry some lightweight, taxonomy-like semantics. For instance, all constraints in ODRL are defined as XML elements substituting the o-ex:constraintElement element, see Figure 9.5.

Figure 9.5: Some ODRL constraint elements defined as substitutionGroup of constraintElement (o-ex:constraintElement substituted by o-dd:industry, o-dd:interval, o-dd:memory, o-dd:network, o-dd:printer, o-dd:purpose, o-dd:quality...)

The difficulty is that although this information is provided by the XML Schemas, it remains hidden when working with instance documents of these XML Schemas. However, using the semantics-enabled XPath processor we can profit from all this information. As has been shown, the XML Schemas are translated to OWL ontologies that make the generalisation hierarchies explicit, using subClassOf and subPropertyOf relations. The ontology can then be used to carry out the inferences that allow a semantic XPath like //o-ex:constraintElement to retrieve all o-ex:constraintElement plus all elements defined as its substitutionGroup.

Application to the MPEG-21 authorization model

MPEG-21 REL

In the MPEG-21 standard the protection and governance of digital content are specified in the MPEG-21 IPMP Components [70], MPEG-21 REL [128] and RDD [123] parts. MPEG-21 IPMP Components provides mechanisms to protect a digital item (DI) [29] and to associate licenses to the target of their governance, MPEG-21 REL specifies the syntax and semantics of the language for issuing rights for users to act on DIs, and MPEG-21 RDD comprises a set of terms to support the MPEG-21 REL. Suppose an MPEG-21 compliant

peer that receives a protected and governed MPEG-21 DI, which consists of a digital asset and some metadata, such as the information related to the tools to unprotect the asset and the conditions of use of this asset (see example). When processing this DI the first step is to obtain the IPMP metadata associated to the asset. Then, if any license that governs the protected asset is found, the application has to resolve whether the user can exercise the requested action by means of the authorization mechanism defined in MPEG-21 REL, and if the result is positive the asset is unprotected and the action is exercised.

    <DIDL>
      <Item>
        <Component>
          <Resource mimeType="application/ipmp">
            <ipmpdidl:ProtectedAsset mimeType="audio/mp3">
              <ipmpdidl:Identifier>
                <dii:Identifier>urn:mpegra:mpeg21:dii:as002-11</dii:Identifier>
              </ipmpdidl:Identifier>
              <ipmpdidl:Info>
                <ipmpinfo:IPMPInfoDescriptor>
                  <ipmpinfo:Tool>...</ipmpinfo:Tool>
                  <ipmpinfo:RightsDescriptor>
                    <ipmpinfo:License>
                      <r:license>...</r:license>
                    </ipmpinfo:License>
                  </ipmpinfo:RightsDescriptor>
                </ipmpinfo:IPMPInfoDescriptor>
              </ipmpdidl:Info>
              <ipmpdidl:Contents>EFJDV9FUV98JRF424U039RNCNK...</ipmpdidl:Contents>
            </ipmpdidl:ProtectedAsset>
          </Resource>
        </Component>
      </Item>
    </DIDL>

In the scenario described above, the XPath processor is useful when implementing license based authorization mechanisms. The MPEG-21 REL standard specification defines the authorization model, Figure 9.6, that makes use of the authorization request and story elements and resolves the question "Is a principal authorized to exercise a right upon a resource?". The XPath processor simplifies the implementation of the authorization algorithm because it allows the application to quickly identify which elements are of a particular type in the licenses, authorization request and stories considered to resolve this algorithm. Therefore, when the application has to determine if a license or a grant within an authorization request or story has any element representing a resource, a principal or a condition, this process could become complex and costly if we don't use the XPath processor. A clear example is when we look for a resource in a license or grant element within an authorization request or story: if we don't have the capability to search for an element whose substitutionGroup is resource, then we have to look for each of the elements depicted in Figure 9.7. In the

Figure 9.6: MPEG-21 REL authorization model

Figure 9.7: Example of resource elements

authorization process, the XPath processor is also useful when comparing the right the user wants to exercise and the right in the user's license, because we have to take into account the rights lineage defined in RDD as described above.

MPEG-21 RDD

In order to interpret REL licenses, the semantic XPath processor helps us when determining if the user has the appropriate rights, taking into account the rights lineage defined in the RDD (Rights Data Dictionary).

In contrast with ODRL, which uses XML Schemas both for the language and dictionary definitions, MPEG-21 has an ontology as dictionary (RDD). The semantics that it provides can also be integrated in our semantic XPath processor. To do that, the MPEG-21 RDD ontology is translated [74] to the ontology language used by the Semantic XPath Processor, i.e. OWL. Once this is done, this ontology is connected to the semantic formalisation built up from the MPEG-21 REL XML Schemas. Consequently, semantic XPath queries can also profit from the ad hoc ontology semantics. For instance, the acts taxonomy in MPEG-21 RDD, see Figure 9.8, can be seamlessly integrated in order to facilitate license checking implementation.

Figure 9.8: Portion of the acts taxonomy in MPEG-21 RDD

Consider the scenario: we want to check if our set of licenses authorises us to uninstall a licensed program. If we use XPath, there must be a path to look for licenses that grant the uninstall act, e.g. //r:license/r:grant/mx:uninstall. Moreover, as shown in the taxonomy, the usetool act is a generalisation of the uninstall act. Therefore, we must also check for licenses that grant us usetool, e.g. //r:license/r:grant/dd:usetool. And successively, we should check for interactwith, do and act.

However, if we use a semantic XPath, the existence of a license that grants any of the acts that generalise uninstall implies that the license also states that the uninstall act is granted. This is so because, by inference, the presence of the fact that relates the license to the granted act implies all the facts that relate the license to all the acts that specialise this act. Therefore, it would suffice to check the semantic XPath expression //r:license/r:grant/mx:uninstall. If any of the more general acts is granted it would match. For instance, the XML tree /r:license/r:grant/dd:usetool implies the trees /r:license/r:grant/dd:install and /r:license/r:grant/dd:uninstall.
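The saving offered by the semantic XPath in the uninstall scenario can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the GENERALISES table encodes just the portion of the acts taxonomy named in the text (uninstall and install under usetool, then interactwith, do and act), the dd: prefix is used for every act, and the helper names are hypothetical rather than part of the processor's API.

```python
# Child act -> the act that directly generalises it (portion named in the text).
GENERALISES = {
    "uninstall": "usetool",
    "install": "usetool",
    "usetool": "interactwith",
    "interactwith": "do",
    "do": "act",
}


def generalisations(act):
    """Return the act followed by every act that generalises it, nearest first."""
    chain = [act]
    while chain[-1] in GENERALISES:
        chain.append(GENERALISES[chain[-1]])
    return chain


def equivalent_plain_xpaths(act):
    """Plain XPath queries that would have to be written by hand, one per
    generalisation, to obtain what a single semantic XPath query returns."""
    return ["//r:license/r:grant/dd:" + a for a in generalisations(act)]


def grants(granted_acts, requested):
    """A license grants the requested act if it grants that act or any act
    that generalises it (granting usetool implies granting uninstall)."""
    return any(a in granted_acts for a in generalisations(requested))


paths = equivalent_plain_xpaths("uninstall")
```

With a plain XPath processor the application has to issue and maintain all the paths in paths; with the semantic processor the first one is enough, because the taxonomy is applied by inference.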

Conclusions

In this chapter we have described a novel strategy for designing a schema-aware and ontology-aware XPath/XQuery processor. Such behaviour, which we have called semantic, can be used to transparently resolve queries over XML instances bound to schemas that define inheritance hierarchies among types and element names, or related to ontologies that define relationships that are relevant for the evaluation of the queries.

Our approach has consisted in mapping the XML and XSD models to OWL, allowing a complete translation from XML and XSD instances to RDF triples. The XPath expressions are translated to RDQL queries (we have provided a simple and elegant algorithm to do it) that are then resolved by an RDQL engine with OWL reasoning capabilities. The chosen representation retains the node order, in contrast with the usual structure-mapping approach, which maps the specific structure of some XML schema to RDF constructs. The work has been materialised in the form of a Java API. An on-line demo can be found at http://dmag.upf.edu/contorsion.

Finally, we have demonstrated how our approach can be useful in a plausible usage scenario, the Digital Rights Management domain, where the schema-aware and ontology-aware XPath/XQuery processor has shown its benefits. The behaviour of the processor allows transparent access to the semantics hidden in the schemas of the Rights Expression Languages, so we do not need to recode them. This allows developing software less coupled with the underlying specifications.


Chapter 10

A Vector Space Model for Semantic Similarity Calculation and OWL Ontology Alignment

Ontology alignment (or matching) is the operation that takes two ontologies and produces a set of semantic correspondences (usually semantic similarities) between some elements of one of them and some elements of the other. A rigorous, efficient and scalable similarity measure is a pre-requisite of an ontology alignment process. This chapter presents a semantic similarity measure based on a matrix representation of nodes from an RDF labelled directed graph. An entity is described with respect to how it relates to other entities using N-dimensional vectors, N being the number of selected external predicates. We adapt the graph similarity calculation described in [20] when applying this idea to the alignment of two ontologies. We have successfully tested the model with the public testcases of the Ontology Alignment Evaluation Initiative 2005 (1).

Already published work

Large portions of this chapter have appeared in the following paper:

Tous R., Delgado J. A Vector Space Model for Semantic Similarity Calculation and OWL Ontology Alignment. 17th International Conference on Database and Expert Systems Applications (DEXA 2006), 4-8 September 2006. To be published in Lecture Notes in Computer Science.

(1) This work has been partly supported by the Spanish administration (DRM-MM project, TSI).

10.2 Introduction

Motivation

For many knowledge domains (biology, music, web directories, digital rights management, etc.) several overlapping ontologies (middle ontologies) are being engineered. Each one is a different abstraction and representation of the same or similar concepts. A myriad of problem-specific ontologies (lower ontologies) is also proliferating for many applications, metadata repositories, personal information systems and peer-to-peer networks.

To enable collaboration within and across information domains, software agents require the semantic alignment (mapping) of the different formalisms. The alignment process will identify the equivalences between some entities (e.g. classes and properties) of the participating ontologies, and the different levels of confidence. These mappings are required before the querying of semantic data from autonomous sources can take place.

Ontology Alignment

Ontology alignment (or matching) is the operation that takes two ontologies and produces a set of semantic correspondences (usually semantic similarities) between some elements of one of them and some elements of the other. Several ontology alignment algorithms have been provided, like GLUE [32], OLA [37] or FOAM [36]. A more formal definition, borrowed from [35], can be given:

Definition. Given two ontologies $O$ and $O'$, an alignment between $O$ and $O'$ is a set of correspondences (i.e., 4-uples) $\langle e, e', r, n \rangle$, with $e \in O$ and $e' \in O'$ being the two matched entities, $r$ being a relationship holding between $e$ and $e'$, and $n$ expressing the level of confidence $[0..1]$ in this correspondence.

It is typically assumed that the two ontologies are described within the same knowledge representation language (e.g. OWL [112]). Here we will focus on automatic and autonomous alignment, but other semi-automatic and interactive approaches exist.

Semantic similarity measures

The ontology alignment problem has an important background of work in discrete mathematics for matching graphs [58][114], in databases for mapping schemas [122] and in machine learning for clustering structured objects [19]. Most ontology alignment algorithms are just focused on finding close entities (the "=" relationship), and rely on some semantic similarity measure.

A semantic similarity measure tries to find clues to deduce that two different data items correspond to the same information. Data items can be ontology classes and properties, but also instances or any other information representation entities. Semantic similarity between ontology entities (within the same ontology or between two different ones) may be defined in many different ways. The recently held Ontology Alignment Evaluation Initiative 2005 [108] has shown that the best alignment algorithms combine different similarity measures. [37] provides a classification (updating [122]) inherited from the study of similarity

in relational schemas. This classification can be simplified to four categories when applied to ontologies: lexical, topological, extensional and model-based.

Our approach

The work presented in this chapter takes a topological or structure-based semantic similarity approach. As ontologies and knowledge-representation languages evolve, more sophisticated structure-based similarity measures are required. In RDF graphs, relationships are labelled with predicate names, and trivial distance-based strategies cannot be applied. Some works like [64] explore similarity measures based on structure for RDF equivalent bipartite graphs. Our work also focuses on RDF, but directly faces the natural RDF labelled directed graphs. The approach can be outlined in the following two points:

1. To compute the semantic similarity of two entities we have taken the common RDF and OWL predicates as a semantic reference. Objects are described and compared depending on how they relate to other objects in terms of these predicates. We have modelled this idea as a simple vector space.

2. To efficiently apply our similarity measure to the ontology alignment problem we have adapted it to the graph matching algorithm of [20].

Representing RDF labelled directed graphs with a vector space model (VSM)

In linear algebra a vector space is a set $V$ of vectors together with the operations of addition and scalar multiplication (and also with some natural constraints such as closure, associativity, and so on). A vector space model (VSM) is an algebraic model introduced a long time ago by Salton [134] in the information retrieval field. In a more general sense, a VSM allows objects to be described and compared using N-dimensional vectors. Each dimension corresponds to an orthogonal feature of the object (e.g. the weight of a certain term in a document).

In an OWL ontology, we will compare entities taking into consideration their relationships with all the other entities present in the ontology (first we will focus on similarity within the same ontology, next we will study its application to the alignment of two ontologies). Because relationships can be of different nature we will model them with a vector space. For this vector space, we will take as dimensions any OWL, RDF Schema, or other external predicate (not ontology specific), e.g. rdfs:subClassOf, rdfs:range or foaf:name. We can formally define the relationship of two nodes in the model:

Definition. Given any pair of nodes $n_1$ and $n_2$ of a directed labelled RDF graph $G_O$ representing the OWL ontology $O$, the relationship between them, $rel(n_1, n_2)$, is defined by the vector $\{arc(n_1, n_2, p_1), \ldots, arc(n_1, n_2, p_N)\}$, where $arc$ is a function that returns 1 if there is an arc labelled with the predicate $p_i$ from $n_1$ to $n_2$ or 0 otherwise. $p_i$ is a predicate from the set of external predicates $P$ (e.g. {rdfs:subClassOf, foaf:name}).

\[
rel(n_1, n_2) = \{arc(n_1, n_2, p_1), \ldots, arc(n_1, n_2, p_N)\} \quad | \quad n_1, n_2 \in G_O \;\wedge\; \forall i \in [1, N],\; p_i \in P
\]

\[
arc(n_1, n_2, p_i) =
\begin{cases}
1 & \text{if there is an arc labelled with } p_i \text{ from } n_1 \text{ to } n_2, \\
0 & \text{otherwise.}
\end{cases}
\]

Example. Let us see a simple example. Take the following graph $G_A$ representing an ontology $O_A$. Imagine a trivial two-dimensional vector space to model the relationships between nodes. The external predicates rdfs:domain and rdfs:range have been chosen for dimensions 0 and 1 respectively.

Figure 10.1: $G_A$

The relationship between the property directs and the class director will be described by $\{1, 0\}$. The relationship between the property actsIn and the class movie will be described by $\{0, 1\}$, and so on.

Now, the full description of an entity can be achieved with a vector containing the relationships between it and all the other entities in the ontology. Putting all the vectors together we obtain a three-dimensional matrix $A$ representing the labelled directed graph $G_A$ (row and column order: director, actor, movie, directs, actsIn, voiceIn):

\[
A = \begin{pmatrix}
(0,0) & (0,0) & (0,0) & (0,0) & (0,0) & (0,0) \\
(0,0) & (0,0) & (0,0) & (0,0) & (0,0) & (0,0) \\
(0,0) & (0,0) & (0,0) & (0,0) & (0,0) & (0,0) \\
(1,0) & (0,0) & (0,1) & (0,0) & (0,0) & (0,0) \\
(0,0) & (1,0) & (0,1) & (0,0) & (0,0) & (0,0) \\
(0,0) & (1,0) & (0,1) & (0,0) & (0,0) & (0,0)
\end{pmatrix}
\]

10.4 Similarity of entities within the same ontology

In the general case, the correlation between two vectors $x$ and $y$ in an N-dimensional vector space can be calculated using the scalar product. We can normalize it by dividing this

product by the product of the vector modules, obtaining the cosine distance, a traditional similarity measure. In our case, vectors describing entities in terms of other entities are composed of relationship vectors (so they are matrices). We can calculate the scalar product of two such vectors of vectors $V$ and $W$ by also using the scalar product to compute $V_i \cdot W_i$:

\[
V \cdot W = \sum_{i=1}^{N} \sum_{j=1}^{M} V_{ij} W_{ij}
\]

Applying this equation to the above example we can see that the scalar product of, e.g., the vector describing directs and the vector describing actsIn is $directs \cdot actsIn = 1$. The scalar product of actsIn and voiceIn is $actsIn \cdot voiceIn = 2$, and so on. Normalizing these values (to keep them between 0 and 1) would give a trivial similarity matrix of the ontology entities. However, we aim to propagate the structural similarities iteratively, and also to apply this idea to the alignment of two different ontologies. In the following sections we will describe how to do it by adapting the ideas described in [20].

10.5 Applying the model to an ontology alignment process

To calculate the alignment of two ontologies represented with our vector space model we have adapted the graph matching algorithm of [20]. This adapted algorithm calculates entity similarities in an RDF labelled directed graph by iteratively using the following updating equation:

Definition.

\[
S_{k+1} = B S_k A^T + B^T S_k A, \qquad k = 0, 1, \ldots
\]

where $S_k$ is the $N_B \times N_A$ similarity matrix of entries $s_{ij}$ at iteration $k$, and $A$ and $B$ are the $N_A \times N_A \times N_P$ and $N_B \times N_B \times N_P$ three-dimensional matrices representing $G_A$ and $G_B$ respectively. $N_A$ and $N_B$ are the number of rows of $A$ and $B$, and $N_P$ is the number of predicates selected as dimensions of the VSM.

Note that, as is done in [20], initially the similarity matrix $S_0$ is set to 1 (assuming for the first iteration that all entities from $G_A$ are equal to all entities in $G_B$). If we start the process already knowing the similarity values of some pairs of entities, we can modify this matrix accordingly, and keep the known values between iterations.

Example. Let's see a simple example. Take the following graphs $G_A$ and $G_B$. Figure 10.6 shows their corresponding RDF labelled directed graphs.

Figure 10.2: $G_A$ (left) and $G_B$ (right)

The matrices $A$ and $B$ of the example are $3 \times 3$ matrices of two-dimensional relationship vectors. Each contains one $(1,0)$ entry and one $(0,1)$ entry, all other entries being $(0,0)$. The first iteration is computed in the following steps:

\[
R_{ba} = B S_0, \qquad sim_{ba} = R_{ba} A^T
\]

\[
R_{ab} = S_0 A, \qquad sim_{ab} = B^T R_{ab}
\]

\[
S_1 = sim_{ba} + sim_{ab} = B S_0 A^T + B^T S_0 A
\]

With $S_0$ set to 1, the non-null entries of $R_{ba}$ are $(1,1)$, and the non-null entries of $R_{ab}$ are $(1,0)$ and $(0,1)$.

To normalize the similarity matrix (to keep its values between 0 and 1), [20] divides all its elements by the Frobenius norm of the matrix, defined as the square root of the sum of the absolute squares of its elements (2):

\[
S_1 = \frac{S_1}{frobeniusNorm(S_1)}
\]

Iterating the algorithm 4 times, it converges to a result $S_4$ in which, as expected, the entities $a'$, $b'$ and $c'$ (rows) are similar to $a$, $b$ and $c$ (columns) respectively.

(2) Frobenius norm: $\|A\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} |a_{ij}|^2}$
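The relationship matrices, the scalar product between entity descriptions and the iterative update with Frobenius normalization can be tried out with the following pure Python sketch. It is an illustration under stated assumptions, not the thesis implementation (which is a Java API): all function names are hypothetical, the arcs of the movie ontology are inferred from the scalar products quoted in the text, and the two three-node graphs are an assumed shape (one property node b with a domain arc to a and a range arc to c), since the original figure is not reproduced here.

```python
from math import sqrt


def relationship_matrix(nodes, predicates, triples):
    """M[i][j] = rel(nodes[i], nodes[j]): one 0/1 flag per external predicate."""
    idx = {n: k for k, n in enumerate(nodes)}
    pidx = {p: k for k, p in enumerate(predicates)}
    m = [[[0] * len(predicates) for _ in nodes] for _ in nodes]
    for s, p, o in triples:
        m[idx[s]][idx[o]][pidx[p]] = 1
    return m


def entity_dot(row_v, row_w):
    """Scalar product of two entity descriptions (two rows of vectors)."""
    return sum(x * y for v, w in zip(row_v, row_w) for x, y in zip(v, w))


def update(B, S, A):
    """One step S_{k+1} = B S_k A^T + B^T S_k A, divided by its Frobenius norm."""
    nb, na, n_pred = len(B), len(A), len(A[0][0])
    new = [[0.0] * na for _ in range(nb)]
    for i in range(nb):
        for j in range(na):
            total = 0.0
            for m in range(nb):
                for n in range(na):
                    for p in range(n_pred):
                        total += B[i][m][p] * S[m][n] * A[j][n][p]  # B S A^T
                        total += B[m][i][p] * S[m][n] * A[n][j][p]  # B^T S A
            new[i][j] = total
    norm = sqrt(sum(x * x for row in new for x in row))
    return [[x / norm for x in row] for row in new] if norm else new


def similarity(B, A, iterations):
    """Start from S_0 = 1 and apply the update the given number of times."""
    S = [[1.0] * len(A) for _ in B]
    for _ in range(iterations):
        S = update(B, S, A)
    return S


PREDS = ["rdfs:domain", "rdfs:range"]

# Movie ontology of the earlier example (arcs assumed, see lead-in).
movie_nodes = ["director", "actor", "movie", "directs", "actsIn", "voiceIn"]
movie_triples = [
    ("directs", "rdfs:domain", "director"), ("directs", "rdfs:range", "movie"),
    ("actsIn", "rdfs:domain", "actor"), ("actsIn", "rdfs:range", "movie"),
    ("voiceIn", "rdfs:domain", "actor"), ("voiceIn", "rdfs:range", "movie"),
]
A_movies = relationship_matrix(movie_nodes, PREDS, movie_triples)
directs, acts_in, voice_in = A_movies[3], A_movies[4], A_movies[5]

# Two structurally identical three-node graphs (assumed shape).
GA = relationship_matrix(["a", "b", "c"], PREDS,
                         [("b", "rdfs:domain", "a"), ("b", "rdfs:range", "c")])
GB = relationship_matrix(["a'", "b'", "c'"], PREDS,
                         [("b'", "rdfs:domain", "a'"), ("b'", "rdfs:range", "c'")])
S4 = similarity(GB, GA, 4)
```

In this toy case the normalized iterates alternate between two patterns, so the sketch reads the result after an even number of iterations. After four iterations only the diagonal of S4 is non-null, which pairs a' with a, b' with b and c' with c.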

Computational cost and optimization

Because the number of selected external predicates $p_i \in P$ can be small and is independent of the size of the ontologies, operations involving relationship vectors can be considered of constant cost, and the general algorithm is of order $O(n^2)$. Because the number of nodes can be considerably high, some optimizations are required to constrain the processing time. Inspired by [64], we have classified nodes into five types: Properties ($p$), Classes ($c$), Instances ($i$), External Classes ($c'$) and External Instances ($i'$). Because nodes of one type cannot be similar to nodes of another type, the matrices can be rewritten as follows (rows and columns correspond to the types previously mentioned and in the same order), where $A_{xy}$ denotes the block of $A$ with rows of type $x$ and columns of type $y$:

\[
A = \begin{pmatrix}
A_{pp} & A_{pc} & A_{pi} & A_{pc'} & A_{pi'} \\
A_{cp} & A_{cc} & A_{ci} & A_{cc'} & A_{ci'} \\
A_{ip} & A_{ic} & A_{ii} & A_{ic'} & A_{ii'} \\
A_{c'p} & A_{c'c} & A_{c'i} & A_{c'c'} & A_{c'i'} \\
A_{i'p} & A_{i'c} & A_{i'i} & A_{i'c'} & A_{i'i'}
\end{pmatrix},
\qquad
S_k = \begin{pmatrix}
S^{p}_k & & & & \\
& S^{c}_k & & & \\
& & S^{i}_k & & \\
& & & S^{c'}_k & \\
& & & & S^{i'}_k
\end{pmatrix}
\]

Definition. The $S_{k+1}$ equation can be decomposed into three formulas, one for each $x \in \{p, c, i\}$:

\[
S^{x}_{k+1} = \sum_{y \in \{p, c, i, c', i'\}} \left( B_{xy} \, S^{y}_k \, A_{xy}^T + B_{yx}^T \, S^{y}_k \, A_{yx} \right)
\]

$S^{c'}_{k+1}$ and $S^{i'}_{k+1}$ are diagonal matrices passed as input parameters. They are kept unchanged between iterations.

Comparison against algorithms based on bipartite graphs

The use of an algorithm to measure similarity between directed graphs could lead one to think that it would be better to apply it directly over the equivalent bipartite graphs of the ontologies (as is done in [64]), instead of adapting it to RDF labelled directed graphs.

However, our approach has some advantages. On one hand we critically reduce the number of nodes and the computational cost. On the other hand, in bipartite graphs the core predicates of OWL are treated like all the other nodes, while in our model they become the semantic reference to describe and compare entities. Figure 10.7 shows the equivalent bipartite version of the previous example with two graphs of three nodes.

Figure 10.3: Bipartite version of Figure 10.6

Applying [64] we obtain a similarity matrix between $a'$, $b'$, $c'$ and $a$, $b$, $c$. Note that initially the similarity matrix $X_0$ is set to 1. If we start the process already knowing the similarity values of some pairs of entities, we can modify this matrix accordingly, and keep the known values between iterations. Let's calculate the similarity between $G_A$ and $G_B$ from their bipartite matrices $A$ and $B$ and the initial matrix $X_0$.

Iterating the algorithm 22 times it converges to the result $X_{22}$, whose entries include the values 1, 0.405, 0.172, 0.153 and 0.05.

As can be seen, the inclusion of statement nodes adds some symmetries not present in the original graphs, resulting in less precise results. Some similarities between nodes $b'$ and $c$ (and vice-versa) appear.

An extended example

Example. Let's see a simple example. Take the following graphs $G_A$ and $G_B$.

Figure 10.4: $G_A$

Iterating the algorithm 22 times it converges to the following result:

Rows: b:Teacher, b:OverseaStudent, b:People, b:Other, b:Student
Columns: a:Graduate, a:Scholastics, a:PhDStudent, a:Supervisor

$X_{12}$, entries in the order in which they survive in the transcription: 0, 0.049, 0.013, 0.014, 0.106, 0.051, 0.02, 0, 0.013, 0.018, 0.145, 0.014, 0.014, 0.018, 0.049

Separately, b:teach and a:supervise similarity = 0.446

After normalization:

Figure 10.5: $G_B$

Figure 10.6: Bipartite version of $G_A$

\[
X_{12} = \frac{X_1}{maxValue(X_1)}
\]

Entries in the order in which they survive in the transcription: 0, 0.336, 0.098, 0.73, 0.353, 0.09, 0.09, 0.098, 0.098, 0.127, 0.336

Figure 10.7: Bipartite version of $G_B$

Separately, b:teach and a:supervise similarity = 1

So, as expected, the entities $a'$, $b'$ and $c'$ (rows) are similar to $a$, $b$ and $c$ (columns) respectively.

$B$ rows: b:Other, b:People, b:OverseaStudent, b:Student, b:Teacher, b:teach, owl:ObjectProperty
$A$ rows: a:PhdStudent, a:Graduate, a:Scholastics, a:Supervisor, a:supervise, owl:ObjectProperty
Relationships: (subClassOf, domain, range, type)

$A$ is a $6 \times 6$ matrix of four-dimensional relationship vectors. Its non-null entries are three subClassOf arcs $(1,0,0,0)$, one domain arc $(0,1,0,0)$, one range arc $(0,0,1,0)$ and one type arc $(0,0,0,1)$; all other entries are $(0,0,0,0)$.

$B$ is a $7 \times 7$ matrix of four-dimensional relationship vectors. Its non-null entries are four subClassOf arcs $(1,0,0,0)$, one domain arc $(0,1,0,0)$, one range arc $(0,0,1,0)$ and one type arc $(0,0,0,1)$; all other entries are $(0,0,0,0)$.

The initial matrix $X_0$ is set as described above. Iterating the algorithm 48 times it converges to the result $X_{48}$:

Columns: a:PhD, a:Graduate, a:Scholastics, a:Supervisor, a:supervise, owl:ObjectProperty
Rows: b:Other, b:People, b:OverseaStudent, b:Student, b:Teacher, b:teach, owl:ObjectProperty

The entries of $X_{48}$ that survive in the transcription are 0, 0.156 and 0.399.

Results

To test our approach we have used the Ontology Alignment Evaluation Initiative 2005 test suite [108]. The evaluation organizers provide a systematic benchmark test suite with pairs of ontologies to align as well as expected (human-based) results. The ontologies are described in OWL-DL and serialized in the RDF/XML format. The expected alignments are provided in a standard format expressed in RDF/XML and described in [108]. Because our model does not deal with lexical similarity, we have integrated our algorithm inside another hybrid aligner, Falcon [64] (replacing its structure similarity module with ours). This limits the interest of the obtained results, but otherwise a comparative evaluation would not have been possible. Because most of the tests include more lexical similarity than structural similarity challenges, our aligner and Falcon (3) obtain very similar results. The tests in which the results differ are shown in Table 10.1. Rows correspond to test numbers, while columns correspond to the obtained values of precision (the number of correct alignments found divided by the total number of alignments found) and recall (the number of correct alignments found divided by the total number of expected alignments).

(3) A description of all the tests can be obtained from [108]. Our results for tests not present in the table are the same as those of Falcon, and can be obtained in [64].
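The two evaluation measures defined above can be written down directly. The following is a minimal Python sketch; the function name precision_recall and the sample correspondences are hypothetical and do not come from the OAEI test suite.

```python
def precision_recall(found, expected):
    """Precision: correct alignments found / all alignments found.
    Recall: correct alignments found / all expected alignments."""
    found, expected = set(found), set(expected)
    correct = len(found & expected)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical alignments, each a pair (entity of O, entity of O').
found = {("a", "a'"), ("b", "b'"), ("c", "x'")}
expected = {("a", "a'"), ("b", "b'"), ("c", "c'"), ("d", "d'")}
precision, recall = precision_recall(found, expected)
```

Here two of the three alignments found are correct and two of the four expected alignments are found, so precision is 2/3 and recall is 1/2.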

Table 10.1: OAEI 2005 tests where our approach (vsm) obtains a different result than [64]

Columns: test, then precision (prec.) and recall (rec.) for each of vsm, falcon, foam and ola

10.7 Related Work

The initial work around structure-based semantic similarity just focused on is-a constructs (taxonomies). Previous works like [90] measure the distance between the different nodes. The shorter the path from one node to another, the more similar they are. Given multiple paths, one takes the length of the shortest one. [148] finds the path length to the root node from the least common subsumer (LCS) of the two entities, which is the most specific entity they share as an ancestor. This value is scaled by the sum of the path lengths from the individual entities to the root. [89] finds the shortest path between two entities, and scales that value by the maximum path length in the is-a hierarchy in which they occur.

Recently, new works like [20] define more sophisticated topological similarity measures, based on graph matching from discrete mathematics. These new graph-based measures suit the particularities of the new ontologies, built with more expressive languages like OWL [112]. Our work is based on the previous work in [20], and also on its adaptation to OWL-DL ontology alignment in [64]. This last work describes a structural similarity strategy called GMO (Graph Matching for Ontologies). Differently from our work, GMO operates over RDF bipartite graphs. It allows a more direct application of graph matching algorithms, but also increases the number of nodes and reduces scalability.

Conclusions

We have presented here an approach to structure-based semantic similarity measurement that can be directly applied to OWL ontologies modelled as RDF labelled directed graphs. The work is based on the intuitive idea that similarity of two entities can be defined in terms of how these two entities relate to the world they share (e.g. two red objects are similar with respect to the colour dimension, but their similarity cannot be determined in a general way). We describe and compare ontological objects in terms of how they relate to

other objects. We model these relationships with a vector space of n dimensions, n being the number of selected external predicates (e.g. rdfs:subClassOf, rdfs:range or foaf:name). We have adapted the graph matching algorithm of [20] to these ideas to iteratively compute the similarities between two OWL ontologies. We have also presented an optimization of the algorithm that substantially reduces its computational cost. The good results obtained in the tests performed over the Ontology Alignment Evaluation Initiative 2005 test suite show the value of the approach in situations in which structural similarities exist.
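The idea of describing an entity by how it relates to others along a fixed set of predicate dimensions can be illustrated with a deliberately simplified sketch. It is not the iterative graph matching algorithm adapted from [20]: it only counts, for each entity, the triples in which it is the subject of each selected predicate, and compares the resulting vectors with the cosine measure. The choice of predicates, the triples and the helper names are assumptions made for the example.

```python
import math

# Assumed selection of predicates; each one is a dimension of the vector space.
PREDICATES = ["rdfs:subClassOf", "rdfs:domain", "rdfs:range"]


def predicate_vector(entity, triples):
    """Count how often `entity` is the subject of each selected predicate."""
    return [
        sum(1 for s, p, _ in triples if s == entity and p == pred)
        for pred in PREDICATES
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical fragments of two ontologies as (subject, predicate, object).
triples_a = [
    ("a:PhD", "rdfs:subClassOf", "a:Graduate"),
    ("a:supervise", "rdfs:domain", "a:Supervisor"),
    ("a:supervise", "rdfs:range", "a:PhD"),
]
triples_b = [
    ("b:Student", "rdfs:subClassOf", "b:People"),
    ("b:teach", "rdfs:domain", "b:Teacher"),
    ("b:teach", "rdfs:range", "b:Student"),
]

sim_classes = cosine(predicate_vector("a:PhD", triples_a),
                     predicate_vector("b:Student", triples_b))
sim_props = cosine(predicate_vector("a:supervise", triples_a),
                   predicate_vector("b:teach", triples_b))
sim_mixed = cosine(predicate_vector("a:PhD", triples_a),
                   predicate_vector("b:teach", triples_b))
print(sim_classes, sim_props, sim_mixed)
```

A one-shot comparison like this ignores the similarity of the neighbouring entities, which is precisely what the iterative computation adds.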

Part III Heterogeneous Query Interfaces: XML Query Tunneling


Chapter 11 Facing Heterogeneous Query Interfaces: Query Tunneling

In this chapter we describe the motivation, the requirements, the design decisions and some implementation aspects related to the development of an advanced metasearch strategy. This strategy is a reformulation of the traditional answering queries using views [51] problem in a local-as-view (LAV) scenario from the data integration discipline. We describe a new solution suited for the highly restricted and volatile scenario of the Web, and based on XML technologies.

Already published work

Large portions of this chapter have appeared in the following papers:

Gil R., Tous R., García R., Rodríguez E., Delgado J. Managing Intellectual Property Rights in the WWW: Patterns and Semantics. 1st International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2005), November 2005.

Tous R., Delgado J. Interoperability Adaptors for Distributed Information Search on the Web. Proceedings of the 7th ICCC/IFIP International Conference on Electronic Publishing.

Tous R., Delgado J. Advanced Meta-Search of News in the Web. Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing. Publisher: VWF Berlin, 2002. ISBN … pages …

Introduction

The main idea behind this work is that, in specific domains such as newspaper news, virtual libraries, videos or music repositories, the available metasearch engines usually offer

very restricted user interfaces (often only a keywords field) because of the difficulty of facing the interface particularities of each underlying source. Our approach aims to overcome this limitation with a strategy inspired by older work on data integration but making use of the new possibilities offered by XML technologies. We base our strategy on the reprocessing of the metadata returned by the different sources, which allows metadata-based search conditions to be included in the user interface even if they are not offered by some of the target engines.

Web search engines

When we are looking for some information or content, in any digital environment, we have two possible alternatives: we can explore one by one all the existing objects inside the set of interest, which in the case of the Web could probably take more than a million years, or we can use a search application that allows us to express constraints about the properties of the objects that we are seeking, using some kind of language, as for example SQL in the context of databases. The more expressive the language, the more precise the query. Traditional search over the Internet is usually performed using applications known as 'search engines'. These systems seek a list of keywords among the textual content of the Web (HTML, PDF, etc.). The concordance of a resource with the query depends on the number of times the keywords are present and also on their relative and absolute position.

The traditional search engine user interface consists of a single text field where users can enter a sequence of keywords and boolean operators to constrain how these keywords must be searched. Because common users are not programmers, most search sites offer an "advanced search" page that provides an alternative way to build boolean queries. Once the search is finished, the search engine shows the user a results page, where it lists the web resources where the keywords have been found. The list is shown in descending order, from the best result to the worst according to the criteria described before. The items of the results list, in this kind of search, contain little information about the resource described: the title, a short description, the size and maybe the author.

Specialised search engines

In specific domains, such as newspaper news, the available search engines tend to offer users more complex interfaces than the generic ones. These interfaces allow the user to specify constraints about specific features of the resources being searched, such as the date of an article, the price of a book, etc. The results page of a specialised search engine is quite similar to the results page described in the previous section, but it provides more information about each item in the list. Engines of different domains, such as videos, music or games for example, will use a different set of attributes to describe each matching result, but even engines of the same domain, books in this case, will probably use a similar but not equal set of attributes. This is the main drawback that constrains the functionality of the existing specialised metasearch engines, as we will discuss later, and one of the targets of this part of the research work.

Metasearch engines

The increasing number of search engines has motivated the appearance of new systems that help users find information on the Internet by automatically querying a set of available search engines. These systems are called metasearch engines. From the user's point of view, traditional meta-search engines have the same interface and functionality as normal ones, but it is commonly accepted that they are slower. Metasearch engines have to face the problem of querying applications designed for human interaction, with interfaces like those described above. Some initiatives have appeared to define a standardized and machine-friendly access point to web search systems, but the success of these approaches is constrained by the fact that search service providers are reluctant to let other systems take unrestrained profit from their work. However, the absence of machine-friendly interfaces cannot prevent third parties from exploiting the information harvesting effort of the existing search engines, mainly because they use browsers as a presentation layer, with exposed HTTP requests and HTML results pages. This leaves a door open for other applications to act as browsers and launch queries against them. So the task of meta-search can be divided into two main sub-problems:

1. How to query each search engine
2. How to obtain the information from each results page

The need to characterise each engine interface, especially considering the lack of collaboration, is very time-consuming and cumbersome, and no one can guarantee that the interfaces will remain unchanged. This makes existing metasearch engines very difficult to maintain, and the uncertainty about their update state reduces their public acceptance.

Specialised Metasearch

If in the field of generic search we can find the figure of the meta-search engine, in the field of specialised search exactly the same happens. There exist some meta-search engines designed to launch queries against a set of specialised search engines of the same domain. Currently there exist specialised metasearch systems in practically every possible area.

As said before, specialised search engines provide complex interfaces to perform accurate queries constraining the particular features of the target resources. Surprisingly, traditional specialised meta-search engines usually offer only a "one-field" interface to the user. The origin of this limitation lies in the difficulty of characterising the interface particularities of every underlying specialised engine. To provide a richer interface it would be necessary to map the interface semantics to the semantics of every target engine, a hard task especially if we consider that the interfaces could change. Overcoming this limitation is one of the targets of our research work concerning this area, as will be further explained.

Our approach: Advanced metasearch

Our research group has been working for several years on practical solutions to improve the capabilities of metasearch engines [118]. It is not a quantitative approach, since

we do not intend to reduce the search time or to widen the search field, but a qualitative approach. The target is to provide users (human or agents) with a search interface that allows them to express unrestricted search criteria over any existing resource on the Internet, without making any assumption about the existing technologies (as other approaches do, such as the Open Archives Initiative [109]). To achieve these goals we have defined a new strategy to design meta-search engines. This strategy is based on five main ideas:

1. Common metadata specification: The specification of a selected common set of properties (metadata) of the objects targeted by the search. This specification could be formalised using XML DTDs, XML Schemas or RDF Schemas for example.

2. User-interface independent query language: The specification of a generic query language that will be the entry point to the meta-search engine. It is not necessary to reinvent the wheel: if we assume that results will come in some XML form, W3C's XQuery [154] language will suffice (or RQL if we are using RDF). The language need not be known by human users because it could be distilled from human-friendly interfaces.

3. Human/machine maintainable XML descriptors: The use of XML descriptors to characterise the 'hostile' underlying engine interfaces, to facilitate their generation and maintenance by human administrators or learning agents.

4. Mapping: The XML descriptors should allow the generic queries of the users (formalized in the language mentioned above) to be mapped to the specific interfaces of the underlying engines. These descriptors should also be used to map the heterogeneous results obtained to the generic set of metadata. The homogeneous results obtained could be formalized using XML or RDF. Some questions arise here, such as what happens with search conditions that cannot be mapped to some engines, or what must be done with results where not all the properties were defined (especially the properties referenced in some of the search conditions). The following point will answer these questions.

5. Reprocessing: The key aspect of our strategy is the reprocessing of the results. Because some of the conditions expressed by the generic user query probably cannot be mapped to all the underlying engines, it is necessary to reprocess the query over the obtained results, once they have been normalised. Because the user query arrives at the system in the form of a standard query language (XQuery, RQL, etc.), this stage can be performed by simply executing the respective query processor over the obtained results. This step guarantees that the results returned to the user are coherent with the conditions expressed in the initial query.

Our approach can be applied to any kind of search over the Web, but it becomes especially appropriate when it is applied to specialised meta-search. The reason is that, although the specialised search engines of the same domain tend to share similar and rich sets of metadata, traditional specialised meta-search has not found until now a way to exploit them, except by establishing partnerships and specific protocols with the administrators of the underlying engines. Fig. 11.1 illustrates graphically the main features of this strategy, which leaves undefined some points marked here with dotted lines.
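The interplay between mapping and reprocessing in the five ideas above can be sketched as follows. This is a toy model under stated assumptions, not the architecture of the prototype: the engine names, their capabilities and the in-memory results that stand in for remote calls are all invented for the example. One engine can apply the date condition natively and the other cannot, so the second returns items that only the final reprocessing step removes.

```python
from datetime import date

# Hypothetical descriptors: the generic conditions each engine applies natively.
ENGINES = {
    "engine_a": {"keyword", "date_range"},
    "engine_b": {"keyword"},
}

# Stand-in for remote engines: results already normalised to common properties.
NORMALISED = {
    "engine_a": [
        {"headline": "Iran talks resume", "date": date(2005, 3, 1)},
        {"headline": "Iran vote", "date": date(2004, 1, 1)},
    ],
    "engine_b": [
        {"headline": "Iran deal", "date": date(2005, 3, 5)},
        {"headline": "Iran history", "date": date(2001, 1, 1)},
        {"headline": "Sports", "date": date(2005, 3, 2)},
    ],
}


def keyword_ok(item, query):
    return query["keyword"].lower() in item["headline"].lower()


def date_ok(item, query):
    return query["from"] <= item["date"] <= query["to"]


CHECKS = {"keyword": keyword_ok, "date_range": date_ok}


def query_engine(name, query):
    """Simulate one engine: it applies only the conditions it supports."""
    results = NORMALISED[name]
    for condition in ENGINES[name]:
        results = [r for r in results if CHECKS[condition](r, query)]
    return results


def metasearch(query):
    """Merge the results of all engines, then reprocess the full query."""
    merged = [r for name in ENGINES for r in query_engine(name, query)]
    return [r for r in merged if all(check(r, query) for check in CHECKS.values())]


query = {"keyword": "iran", "from": date(2005, 1, 1), "to": date(2005, 12, 31)}
print([r["headline"] for r in metasearch(query)])
```

Here `engine_b` hands back the 2001 item because it cannot restrict dates; reprocessing the same query over the merged, normalised list is what keeps the final answer coherent with the user's conditions.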

Figure 11.1: Strategy Diagram

11.4 XML Search Neutral Language (XSNL)

Any query, expressed in some syntax over a set of items characterised by a certain set of properties, has an underlying semantics. Unfortunately, there is no universally accepted standard way to express it. The parameters of the HTTP forms of web search engines can be viewed as the building blocks of a particular query syntax, one for each different engine. The problem of mapping a generic user query, expressed in some query language, to a set of different HTTP interfaces is the problem of mapping between query syntaxes.

Imagine two news search engines; one offers the possibility to constrain the search to news that appeared today, this week and this year. The other engine allows the search to be constrained to an explicit month from the last three years. These are two different syntaxes. If we had a syntax that allowed us to express a specific range of dates, we could map it to each one of the syntaxes. So we need a language that acts between the meta-search interface and the target systems, built to be as flexible and fine-grained as possible, allowing the biggest set of possible query conditions to be mapped.

The old research paper (1995) "Answering Queries Using Views" [55] by A. Halevy et al. faces a similar situation and considers the problem of rewriting a conjunctive query using a set of conjunctive views. Like most similar works among these initial approaches, it uses Datalog¹. Instead of Datalog, we have chosen XML-related technologies as a more natural way to interact with modern web interfaces.

We have defined an XML-based query language to test our approach. We call it XML Search Neutral Language (XSNL) and we have applied it to the development of an advanced meta-search engine specialized in newspaper news, as explained in the next sections. The reason why we do not use XML Query as the intermediate language (we use it to process XSNL sentences) is that our strategy is based on having simple queries expressed in XML. There also exists an XML serialization of XML Query, but it is too verbose and complex to suit our approach. This doesn't mean that users (developers) cannot use XML Query to process the results, because XSNL is just used as a mediator query language with XML output.
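The two-engine example above, where a generic date range has to be mapped to a "today / this week / this year" syntax and to an "explicit month" syntax, might look like the sketch below. The function names, the reading of "this week" as the last seven days, and the limit of the current year plus the two previous ones are assumptions made for illustration. When a range cannot be expressed, the function returns `None` and the condition would be left entirely to the reprocessing stage.

```python
from datetime import date, timedelta


def map_to_period_engine(start, end, today):
    """Return the first of 'today', 'week', 'year' that covers [start, end]."""
    if end > today:
        return None
    periods = [
        ("today", today),
        ("week", today - timedelta(days=6)),   # assumed meaning of "this week"
        ("year", date(today.year, 1, 1)),
    ]
    for name, period_start in periods:
        if start >= period_start:
            return name
    return None  # the range starts before this year: not expressible


def map_to_month_engine(start, end, today):
    """Return the (year, month) pairs to query, one request per month."""
    if start.year < today.year - 2 or end > today:
        return None  # outside the assumed three-year window
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


today = date(2005, 6, 15)
print(map_to_period_engine(date(2005, 6, 10), date(2005, 6, 14), today))
print(map_to_month_engine(date(2005, 3, 20), date(2005, 5, 2), today))
```

Both mappings are deliberately generous: "week" or a whole month returns more items than the requested range, and the exact range is enforced afterwards on the normalised results.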
The following XML code shows a sample instance of XSNL in the news context:

¹ See the Background Information chapters for a brief introduction to Datalog.

XSNL sample query:

<query>
  <select>
    <property propertyname="headline" />
    <property propertyname="date" />
  </select>
  <where>
    <contains propertyname="content" value="iran" />
    <between propertyname="date" from=" " to=" " />
    <in minimum="3" propertyname="source">
      <valueitem value="el Mundo" />
      <valueitem value="abc" />
      <valueitem value="la Vanguardia" />
      <valueitem value="el Pais" />
      <valueitem value="reuters" />
      <valueitem value="cnn Spain" />
      <valueitem value="le Monde" />
      <valueitem value="the Washington Post" />
      <valueitem value="bbc" />
      <valueitem value="diari Avui" />
      <valueitem value="la Stampa" />
    </in>
  </where>
  <sortby propertyname="source_order" type="asc" />
  <sortby propertyname="date" />
</query>

To understand the example, let's imagine a web page where a user can search newspaper news by specifying some keywords and a date range. In the web server the user request is analysed and translated to XSNL. Finally the XSNL is sent to the metasearch engine and the search process begins. The structure of an XSNL document is inspired by the SQL language. It has a 'select' element, where the desired attributes of the resulting objects can be specified, a 'where' element, where the search constraints can be specified, and a 'sortby' element, to determine the order of the results. The different constraints can be specified using different elements, allowing new constraint types with different structures to be added to the language.

A Practical Application: Advanced News Meta-search Engine

We have applied our ideas in the development of an advanced metasearch engine specialised in newspaper news [144]. In this domain there exist thousands of commercial and non-commercial traditional search engines, and also hundreds of available meta-search

applications. Most newspapers with a presence on the Web offer search services on their sites. All these engines are the potential information sources of our system, but each one of them uses a different set of parameters in the queries and a different results page format. Our objective is to offer the user a generic interface that allows unrestricted conditions to be specified over the set of common properties that we have selected in this domain (headline, author, date, section, page, newspaper and language). To achieve this target, once the subset of metadata of interest has been selected and formalized (we are currently using an XML DTD), the next step is to analyse the interface of every engine to obtain information about the query method. We use XML descriptors to describe how to map each specific set of query parameters to the generic common properties selected. We plan to use learning agents to perform this operation periodically because the interfaces of the engines could change over time. As a part of the interface characterisation we must also acquire information about the results page, which will be used during the parsing process. Once we have a mechanism to characterise the engine interfaces, we can design the interface of the metasearch engine. We have selected XML messages (SOAP [136]) containing XQuery sentences and the HTTP protocol. This interface is open and can be used by third parties to develop independent clients (user interfaces or agents). However, to demonstrate the functionality of the system, we have developed our own interface (see fig. 11.2, or fig. 11.3 for the advanced interface). The criteria specified by the user are translated to XQuery and sent to the metasearch engine. The engine maps the parts of the query (at least those for which this is possible) to each underlying engine interface and then launches all the searches in parallel. The results obtained are heterogeneous and must be parsed and mapped to the common set of properties. Because no one can guarantee that all the criteria have been mapped to all the engines, the results (now homogeneous and serialized in XML) must be reprocessed. This reprocessing is easily performed in the server just by using an XQuery processor with the received XQuery as the input.

Implementation

Implementing the prototype application has meant instantiating the metasearch strategy explained above. The next subsections explain how, with the help of W3C's XML Query Language [154], the prototype executes the following query over the different sources:

XSNL sample query:

<query>
  <select>
    <property propertyname="headline" />
    <property propertyname="date" />
  </select>
  <where>
    <contains propertyname="content" value="test" />
    <between propertyname="date" from=" " to=" " />
  </where>

Figure 11.2: DMAG's News Advanced Meta-search Engine user interface

Figure 11.3: DMAG's News Advanced Meta-search Engine advanced user interface

  <sortby propertyname="date" />
</query>

Figure 11.4: Mapping XSNL to each engine with XML Query

Mapping the user query to target systems

The first hole to fill from the strategy diagram described above is the way to obtain the parameters for the HTTP call for each search engine from the user query formalised in XSNL. This is not a trivial issue, because it must be determined whether the condition expressed in one parameter is also expressed with one or more statements in the XSNL query, and then the necessary information must be extracted to give a value to the parameter. This process can require a complex analysis of the query, and the information to do it must not be coupled with code, for maintainability reasons. The description of the way to characterise a parameter of one engine should be editable, human-friendly and modifiable at runtime. Here is where the XML Query language fits, as illustrated in fig. 11.4. Each engine parameter is characterised in XML. The parameters that require some analysis of the user query are characterised with an XML Query within a CDATA clause. The following simplified example shows a fragment of the configuration file of a news metasearch engine.

Configuration file fragment:

<parameter>
  <name>precision</name>
  <type>xquery</type>
  <value>
    <![CDATA[
      <result-value>
        LET $a := document('input.xml')
        LET $b := …
        LET $c := IF ($b = 'and') THEN '1'
                  ELSE IF ($b = 'or') THEN '2'
                  ELSE '3'
        RETURN $c
      </result-value>
    ]]>

  </value>
  <default>1</default>
</parameter>

Figure 11.5: Screen Scraping with XML Query

The parameter precision of the target engine corresponds to the boolean operator to be applied. This information is obtained from the user XML query, from the attribute 'type' of the 'contains' element. This is a very simplified example, but it serves to illustrate the idea.

Metadata extraction ("Screen Scraping")

The second hole to fill in the design of our system is the way the results pages are analyzed to extract the metadata related to each item. As we will see later, it would be convenient to formalize this metadata in XML. So, why not convert the HTML results page into XML and then apply an XML Query to it? There exist a lot of tools to convert an HTML page to XML, even if it is malformed, and some of them perform very fast. An example is W3C's HTML Tidy [143]. The XML Query can be edited and modified by human administrators or software agents at runtime, without the need to recompile the sources. The following XML fragment shows an example of how to apply this idea to extract information from the results pages of the Washington Post searcher:

Example of XML Query wrapper for screen scraping:

<resultsmap>
  <![CDATA[
    import dt as org.dmag.metasearch.utils.xquerytransformdate;
    import ps as org.dmag.metasearch.utils.xquerytransformstring;
    FOR $c IN document('input.xml')/table
    WHERE $c/tr/td/font/b/a
    RETURN
      <result>
        <property name='headline' value=$c/tr/td[1] />,
        <property name='description' value=$c/tr/td[2] />,
        <property name='date' value=dt(ps($c/tr/td[3])) />,

        <property name='link' />,
      </result>
    </results>
  ]]>
</resultsmap>

Figure 11.6: Reprocessing of the results with XML Query

Reprocessing the results

Once we have obtained the results in XML, we must face the problem of reprocessing them to ensure that all the initial conditions have been applied. The good news is that XML Query fits this job perfectly; the bad news is that our initial query is formalized in XSNL, not in XML Query. To overcome this problem we simply transform our instance of XSNL to an XML Query, as shown in the following XML fragment:

<results>
  for $c in document('input.xml')//result
  where ' ' and ' '
  return $c
  sortby
</results>

Now we already have an XML Query and we can just apply it to the results. The reprocessed results will be the output of the system, and the interface is responsible for rendering them in a convenient way. With this we have completed the initial strategy, filling all the undefined aspects, as presented in fig. …

Related work

In the State of the Art chapters we have seen that the problem faced in this work is traditionally known as the querying problem of the data integration discipline. This problem

is related to the ability to reformulate a query to combine information from the different sources according to their relationships with a mediated schema. This virtual mediated schema of data is the only schema visible to the users and their queries, giving them the illusion of interacting with one single information system. There are two classic approaches concerning mediated schemas: the global-as-view (GAV) [2][26][50], which defines the mediated schema as a set of views over the data sources, and the local-as-view (LAV) [98][56][34], which takes the inverse point of view and describes sources as views over the mediated schema. Our approach is similar to the LAV scenario, because it describes sources in terms of a mediated schema, and offers a very simplified solution to the problem of answering queries using views. Traditional works on classic metasearch are [33][135]. A more recent approach very similar to ours can be found in [81], which defines this idea as "query tunneling". It makes use of RDF tools and focuses on scientific papers databases.

11.8 Conclusions

Nowadays the Web has become the first place where people go when they need to find some information. Surprisingly, and in accordance with what we have presented in this document, we can affirm that the functionality of the current search systems of the Web is very limited, especially in comparison with other digital environments. Today, the 'keywords paradigm', in which one types some words in a text field and presses the 'search' button, satisfies the needs of most people, and the average Web user probably does not want to hear anything about new search interfaces. However, the Web is growing exponentially, and so is the need for information, and soon the results obtained from a query based on a list of keywords will be unmanageable, and new and faster search mechanisms will be needed. Furthermore, in the short term, most of the non-textual resources of the Web (images, videos, music, etc.) will be enriched with some kind of metadata, supporting new standards such as MPEG-7 [101]. The queries targeting these resources must be capable of expressing complex conditions about properties and attributes. In the long term, the 'Semantic Web' will require strategies to adapt the existing human-oriented search services to enable their use by software agents without traumatic impact on the underlying technologies. Our approach targets all these challenges without making assumptions about the success of any standard or protocol.
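The reprocessing stage described in this chapter translates the XSNL query into an XML Query and runs it over the normalised results. As a rough stand-in, the sketch below does the equivalent in Python: it parses a small XSNL instance with the standard `xml.etree.ElementTree` module, compiles the `where` constraints into a predicate and applies `select` and `sortby`. It is not the prototype's XML Query processor; only the `contains` and `between` constraints are handled, and the dates and result records are invented for the example (the dates in the original sample are not available).

```python
import xml.etree.ElementTree as ET

# XSNL instance shaped like the chapter's sample; the dates are hypothetical.
XSNL = """
<query>
  <select>
    <property propertyname="headline" />
    <property propertyname="date" />
  </select>
  <where>
    <contains propertyname="content" value="test" />
    <between propertyname="date" from="2005-01-01" to="2005-12-31" />
  </where>
  <sortby propertyname="date" />
</query>
"""


def compile_where(where):
    """Turn the children of <where> into one predicate over a result dict."""
    checks = []
    for cond in where:
        prop = cond.get("propertyname")
        if cond.tag == "contains":
            value = cond.get("value").lower()
            checks.append(lambda r, p=prop, v=value: v in r.get(p, "").lower())
        elif cond.tag == "between":
            lo, hi = cond.get("from"), cond.get("to")
            # ISO dates compare correctly as plain strings.
            checks.append(lambda r, p=prop, lo=lo, hi=hi: lo <= r.get(p, "") <= hi)
        else:
            raise ValueError(f"unsupported constraint: {cond.tag}")
    return lambda r: all(check(r) for check in checks)


def reprocess(xsnl_text, results):
    """Apply the where, sortby and select parts of an XSNL query to results."""
    query = ET.fromstring(xsnl_text)
    keep = compile_where(query.find("where"))
    selected = [p.get("propertyname") for p in query.find("select")]
    rows = [r for r in results if keep(r)]
    for sort in reversed(query.findall("sortby")):
        rows.sort(key=lambda r, p=sort.get("propertyname"): r.get(p, ""))
    return [{p: r.get(p) for p in selected} for r in rows]


# Hypothetical normalised results merged from several engines.
results = [
    {"headline": "B", "date": "2005-06-01", "content": "a test of x"},
    {"headline": "A", "date": "2005-02-01", "content": "Test run"},
    {"headline": "C", "date": "2004-02-01", "content": "test"},
    {"headline": "D", "date": "2005-03-01", "content": "nothing"},
]
print(reprocess(XSNL, results))
```

Item C satisfies the keyword but not the date range, which is the typical case of a result returned by an engine that could not apply every condition.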


Chapter 12 Waiting Policies for Distributed Information Retrieval on the Web

Distributed Web search engines, those systems that query on-the-fly a set of available Web search systems in response to a user's request, have to face the problem of interacting with a large set of unpredictable systems with heterogeneous and changing response times. The need to achieve a compromise between the final user-perceived delay and the quality of the results motivates the discussion of different algorithms to determine when the query process of an information source should be aborted. We call these algorithms 'waiting policies', and their goal is to maximise the quality of service (QoS) by minimizing the impact of the sources' behaviour without reducing the result quality beyond the user's tolerance. Our experience in the design and development of specialised metasearch engines has given us the possibility to try some of these policies, and to extract some conclusions that we present here.

Motivation

A distributed search engine is a system that sends queries to multiple search systems, then collates the results in some way and formats them for display. We can identify many different kinds of these systems, depending on their data sources, which can be internal indexes, associated text search engines, database search engines, message archives, intranet or webwide search engines, or even file servers. This part of the work focuses on distributed search over Web search systems, in any of its forms (metasearch, content syndication, aggregation, etc.), but mainly over specialized ones. The typical session when using a traditional distributed Web search engine begins when the user submits a query to the system through the user interface. The engine then sends the user query to a set of underlying search engines (or component search engines [146]). The query must be translated to an appropriate format for each local system. Once the results have been received from the underlying sources, they are merged into a single ranked list and presented to the user. Nevertheless, we are not going to discuss here the motivation or interest of metasearching or other forms of distributed search on the Web, already well documented (see e.g. [135], [33] or [88]).

Generally, and independently of the kind of systems the distributed engine is targeting, the process of querying the different engines is done in parallel. Each target engine is called and handled by a different execution thread of the distributed engine, which waits until the source (i.e. the queried system) responds with a set of results and then passes the information to a results pool. A main thread is responsible for analysing the evolution of the results pool and determining when the process must end. The set of rules used to do this is what we call the 'waiting policy', and it is the focus of this article.

Any distributed search system needs a waiting policy, but depending on the context the complexity of the policy can vary. If we have a system that queries a small set of remote sources and requires all the responses to fulfil the process, the waiting policy will consist of simply determining whether some of the remote systems are out of service (even this can be a non-trivial issue). This is what happens with most traditional Web-wide meta-search systems (e.g. [135]). However, if we have a Web-based distributed search engine that queries hundreds or even thousands of sources, it may be neither necessary nor desirable to wait for them all (even if they are in service). Maybe the user prefers to sacrifice some results to achieve a better response time. This may also happen when working with especially unreliable underlying sources, as in some specialised domains. But how do we measure the 'quality' of the results? And when is the 'quality' sufficient to decide to abort the search process? These are difficult questions that depend on subjective variables like the user's perception of the relevance of the results or of the response time. We have been forced to study this problem as a side effect of our work on an advanced metasearch strategy [118], especially for the development of the architecture that instantiates this strategy and that is being used in real engines like [144].

Distributed Search Engine Performance

Some of the measures proposed to quantitatively measure the performance of classical information retrieval systems (see, e.g., [96]) can be extended to evaluate Web search and distributed search engines. However, as remarked by [85], Web users may have a tendency to favour some performance issues more strongly than traditional users of information retrieval systems. For example, interactive response times appear to be at the top of the list of important issues for Web users. A basic model from traditional retrieval systems [145] recognizes a three-way trade-off between the speed of information retrieval, precision and recall (see fig. 12.1). Precision is the ratio of relevant documents to the number of retrieved documents:

\[ \text{precision} = \frac{\text{relevant documents}}{\text{retrieved documents}} \qquad (12.1) \]

Recall is defined as the proportion of relevant documents that are retrieved with respect to the total number of existing relevant documents:

\[ \text{recall} = \frac{\text{relevant documents}}{\text{total relevant documents}} \qquad (12.2) \]

Precision is related to the expressiveness of the queries and the structure of the information to explore. Most Web users are not so much interested in the traditional measure

of precision and recall as in the precision of the results displayed on the first page of the list of retrieved documents, before the "next page" command is used. This special characteristic can be extended to users of meta-search engines, and is a key point when designing a waiting policy, because it shows that users may prefer to sacrifice some of the results (even relevant ones) to improve response time.

Figure 12.1: Three-way trade-off in search engine performance [85]

Figure 12.2: Waiting Policy

Waiting Policy

A waiting policy (see fig. 12.2) is an algorithm that determines when a distributed information search process must end. A distributed search implies the interaction with remote, heterogeneous and potentially unreliable systems, with different and variable response times. This does not only imply that some engines can sporadically appear out of service, but also that some engines may experience enormous delays, sometimes unrelated to network overloads. These situations can last hours, days or even weeks. So it is not enough to activate a method to detect source failures, because some sources can be in service but with response times beyond user tolerance. Because this is a very dynamic context, the system must have a mechanism to determine when the search process must be stopped. The algorithm must guarantee the quality of service (QoS) by optimising the relationship between the 'results quality' and the user's perceived response time. The response time can be easily measured but the results quality depends on a combination of objective

and subjective aspects. Objective aspects can be the correctness of the results in relation to the query, the total number of results or the number of sources successfully queried. The policies that we will study here focus on the third of these aspects, the number of sources, which has a tight relationship with results quality. Because the quality of the sources can vary, even depending on the user's perception, one can establish weighting mechanisms, assigning different weights to each source of information. However, we will talk about the 'number of successful sources', assuming that, if weights have been assigned, the number already reflects it.

Figure 12.3: Distribution of Engines Delay

Target Engines Behaviour

When designing a waiting policy, and taking the number of successful sources as a main parameter, it is interesting to know if there is some pattern in the behaviour of the target engines. We have tested the response times of an arbitrary set of approximately sixty search engines¹ (see fig. 12.3). The horizontal axis (fig. 12.3) represents the sequence of delay times and the vertical axis the number of target engines. The measurements have been grouped in intervals of 1000 ms (the graph shows the lower margin of the interval, e.g. the first ten engines finished in less than one second). Knowing this we can anticipate how the results pool will evolve, because when each engine terminates, the pool receives a new result². So, the function that models the time evolution of the number of results first grows slowly, because only a small set of engines have very small delay times, then grows very fast, and finally grows slowly again (see fig. 12.4). To construct the graph we have made a one-to-one association between results and engines, but the conclusions can be extended to the situation that we have mentioned before, when we assign different weights to the

¹ We have chosen about sixty heterogeneous engines from diverse locations. They include Web-wide search engines, specialized search engines, meta-search engines, digital libraries and others. The complete list can be downloaded at http:// ertous/projects/ir/url_waiting_policies.htm
² Here by 'result' we mean a set of results returned from one source.

sources.

Figure 12.4: Results Evolution

Because target engines are remote systems arbitrarily located, we do not know the reasons that underlie their response time. Despite that, they are all conditioned by global changes of network conditions, especially considering that the response time includes the transmission of the results to the distributed search engine, around which they share the same network state. So, we can assume that target engines will have a certain global behaviour, and that this behaviour can change. Fig. 12.5 shows the progression of the results obtained from the same engines of the previous figures but under different network conditions. We have simulated a network overload in the proximity of the distributed search engine that has increased the delay of all the sources.

Results vs. Time

The simplest waiting policy one can imagine is a fixed Timeout Policy. Someone heuristically fixes a time limit for waiting for results; once it is surpassed, engines that still have not responded are discarded. In this case, users always experience the same delay (without considering the time it takes to send collated results to them). When network conditions are bad, only a small set of engines have the chance to respond, and users get only a small subset of the potential results. Obviously this cannot be considered a good policy, but it serves to illustrate the problem we are facing. The only advantage of a timeout policy is that it does not propagate the changes in the behaviour of the target engines to end users, as illustrated in fig. … The figure shows how a degradation of network conditions slows down the harvesting of results, which ends when the time-out arrives (vertical line).

The opposite approach to the Timeout Policy is the Minimum-results Policy, which waits until a minimum number of engines have successfully finished. This policy guarantees the quality of the response, but propagates to the user the delay time of the target engines and its changes (see Fig. 12.7). Again it must be mentioned here that we focus on the number of successful engines to measure the results quality, assuming that this value could have been

140 120 Chapter12:WaitingPoliciesforDistributedInformationRetrievalontheWeb Figure12.5:Resultsprogressioninhostilenetworkconditions Figure12.6:TimeoutPolicyExample

141 Chapter12:WaitingPoliciesforDistributedInformationRetrievalontheWeb 121 Figure12.7:Minimum-resultsPolicyExample modiedtoreectsourcerelevanceissues. timeorthenumberofsuccessfulsourcesrespectively. Thesetwodierentapproachesoptimiseonlyoneofthevariables,theresponse instantiatingoneofthesealgorithmscanbeveryfrustrating.inhostilenetworkconditions Theuserexperiencewithasystem thetimeoutpolicymayabortpracticallyallsearchthreads,eliminatingthepossibilityto ndanswerstoevensimplequeries.ifnetworkperformanceisgood,theminimum-results policywillabortthreadsevenwithaveryfastresponsetime.itisclearthatnoneofthese algorithmsisthesolution,becauseitshouldoptimisethetwovariables.todesignagood waitingpolicyweshouldachieveacompromisebetweentimeandquality,andthismustbe abletoadapttoenvironmentalchanges SourceDiscardingPolicies conditions. Somepoliciesfocusonbeingabletodetecttargetfailuresunderchangingnetwork potentialsource. Thesepoliciestrytomaximisethenumberofresultswithoutdiscardingany delay,and,whennetworkconditionsvary,beabletodeterminetheexpecteddelayforeach Oneapproximationtothisistomaintainstatisticsabouteachsource target.ifwehavensourcesandkprevioussearchexperiences,wecantriviallycalculate theaveragedelay(d)ofeachsourcetorepresentitshistoricalperformance(seefig.12.8). Becauseweprobablywanttousethelatestinformation(theinformationofthesearch processincourse),inthesearchexperiencek+1wecanapproximatethefuturedelayof someengineeapplyingtheaveragechangeratioofthensourcesalreadynished. tothenetworkconditionsanditsownhistoricalvalues. Thiskindofpoliciesworknetodiscardsourcesthatarebehavingdisaccording totheuserthechangesinthenetworkperformance,andcannotbeconsideredadenite Howeverthesepoliciespropagate solutionsaccordingtowhatwehavesaidbefore.
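The three policies discussed in this section can be compared with a small simulation. The following Python sketch is illustrative only and is not taken from the thesis: the function names, the sample delays and the engine labels are hypothetical, and the "average change ratio" is interpreted here as the mean, over the engines that have already finished, of the current delay divided by the historical average delay. That interpretation is an assumption, since the exact formula is given only in Fig. 12.8.

```python
from statistics import mean


def timeout_policy(delays_ms, timeout_ms):
    """Fixed Timeout Policy: always wait timeout_ms, keep engines that answered in time."""
    answered = sum(1 for d in delays_ms if d <= timeout_ms)
    return timeout_ms, answered


def minimum_results_policy(delays_ms, min_engines):
    """Minimum-results Policy: wait until min_engines engines have finished."""
    if not 1 <= min_engines <= len(delays_ms):
        raise ValueError("min_engines must be between 1 and the number of engines")
    waited = sorted(delays_ms)[min_engines - 1]
    answered = sum(1 for d in delays_ms if d <= waited)
    return waited, answered


def expected_delay(history_ms, finished_ms, engine):
    """Estimate the delay of a pending engine in search experience k+1.

    history_ms:  engine -> delays observed in the k previous searches
    finished_ms: engine -> delay observed in the current search (finished engines only)
    Assumption: change ratio = current delay / historical average delay.
    """
    averages = {e: mean(ds) for e, ds in history_ms.items()}
    if not finished_ms:
        # No information about the current search yet: fall back to history.
        return averages[engine]
    ratio = mean([finished_ms[e] / averages[e] for e in finished_ms])
    return ratio * averages[engine]


if __name__ == "__main__":
    good = [400, 900, 1500, 2200, 6000]      # hypothetical delays in ms
    degraded = [3 * d for d in good]         # same engines, overloaded network
    print(timeout_policy(good, 2500), timeout_policy(degraded, 2500))
    print(minimum_results_policy(good, 4), minimum_results_policy(degraded, 4))

    history = {"a": [400, 600], "b": [900, 1100], "c": [2000, 2000]}
    print(expected_delay(history, {"a": 1000, "b": 2000}, "c"))
```

With these sample numbers, the timeout policy keeps the waiting time constant but the number of answering engines drops when the network degrades, whereas the minimum-results policy keeps the number of engines constant but its waiting time grows. The estimator scales the historical average of the pending engine by the slowdown observed so far, which is the kind of value a source-discarding policy could compare against the elapsed time.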

Schneps, Leila; Colmez, Coralie. Math on Trial : How Numbers Get Used and Abused in the Courtroom. New York, NY, USA: Basic Books, 2013. p i.

Schneps, Leila; Colmez, Coralie. Math on Trial : How Numbers Get Used and Abused in the Courtroom. New York, NY, USA: Basic Books, 2013. p i. New York, NY, USA: Basic Books, 2013. p i. http://site.ebrary.com/lib/mcgill/doc?id=10665296&ppg=2 New York, NY, USA: Basic Books, 2013. p ii. http://site.ebrary.com/lib/mcgill/doc?id=10665296&ppg=3 New

More information

programsitproduces.finally,weshowhowtoproduceecient,optimizingprogramgeneratorsby

programsitproduces.finally,weshowhowtoproduceecient,optimizingprogramgeneratorsby TopicsinOnlinePartialEvaluation TechnicalReport:CSL-TR-93-563 (alsofusememo93-14) March,1993 ErikRuf DepartmentsofElectricalEngineering&ComputerScience ComputerSystemsLaboratory Partialevaluationisaperformanceoptimizationtechniqueforcomputerprograms.Whenaprogram

More information

ParallelDynamicLoad-BalancingforAdaptiveDistributive MemoryPDESolvers. NasirTouheed by

ParallelDynamicLoad-BalancingforAdaptiveDistributive MemoryPDESolvers. NasirTouheed by ParallelDynamicLoad-BalancingforAdaptiveDistributive MemoryPDESolvers NasirTouheed by Submittedinaccordancewiththerequirements forthedegreeofdoctorofphilosophy SchoolofComputerStudies TheUniversityofLeeds

More information

San$Diego$Imperial$Counties$Region$of$Narcotics$Anonymous$ Western$Service$Learning$Days$$ XXX$Host$Committee!Guidelines$ $$

San$Diego$Imperial$Counties$Region$of$Narcotics$Anonymous$ Western$Service$Learning$Days$$ XXX$Host$Committee!Guidelines$ $$ SanDiegoImperialCountiesRegionofNarcoticsAnonymous WesternServiceLearningDays XXXHostCommitteeGuidelines I. Purpose ThepurposeoftheWesternServiceLearningDays(WSLD)XXXHostCommittee(HostCommittee)isto organize,coordinateandproducethewsldxxxeventwithinthe6weekperiodof3weekspriortotheendof

More information

PG DIPLOMA IN GLOBAL STRATEGIC MANAGEMENT LIST OF BOOKS*

PG DIPLOMA IN GLOBAL STRATEGIC MANAGEMENT LIST OF BOOKS* PG DIPLOMA IN GLOBAL STRATEGIC MANAGEMENT LIST OF BOOKS* Paper I: INTERNATIONAL BUSINESS ENVIRONMENT Global Business Environment (ICMR Publication Textbook) [Ref. No: GBE 11 2K4 23] [ISBN: 81-7881-693-8]

More information

Contents RELATIONAL DATABASES

Contents RELATIONAL DATABASES Preface xvii Chapter 1 Introduction 1.1 Database-System Applications 1 1.2 Purpose of Database Systems 3 1.3 View of Data 5 1.4 Database Languages 9 1.5 Relational Databases 11 1.6 Database Design 14 1.7

More information

Learn AX: A Beginner s Guide to Microsoft Dynamics AX. Managing Users and Role Based Security in Microsoft Dynamics AX 2012. Dynamics101 ACADEMY

Learn AX: A Beginner s Guide to Microsoft Dynamics AX. Managing Users and Role Based Security in Microsoft Dynamics AX 2012. Dynamics101 ACADEMY Learn AX: A Beginner s Guide to Microsoft Dynamics AX Managing Users and Role Based Security in Microsoft Dynamics AX 2012 About.com is a Rand Group Knowledge Center intended to provide our clients, and

More information

TABLE OF CONTENTS CHAPTER TITLE PAGE

TABLE OF CONTENTS CHAPTER TITLE PAGE viii TABLE OF CONTENTS CHAPTER TITLE PAGE TITLE PAGE DECLARATION DEDICATION ACKNOWLEDGEMENT ABSTRACT ABSTRAK TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF APPENDICES I II III IV VI VII VIII

More information

1. Access your account Log in to your online account at http://service.ringcentral.com using your main Ring Central phone number and password.

1. Access your account Log in to your online account at http://service.ringcentral.com using your main Ring Central phone number and password. Ring Central Quick Set Up settings for GA Hotline For assistance call National Hotline Committee Chair hotlinechair@trusteewebsite.com Ring Central Support (888) 898-4591 The purpose of this document is

More information

B1 Project Management 100

B1 Project Management 100 Assignment of points B1 Project Management 100 Requirements for Design Presentation Meetings and Proposal Submissions for Key to Project Management Design Presentation Meeting and Proposal Submissions

More information

Shimmush Tehillim, Tehillim, Psalms 151-155 and Their Kabbalistic Use

Shimmush Tehillim, Tehillim, Psalms 151-155 and Their Kabbalistic Use Shimmush Tehillim, Tehillim, Psalms 151-155 and Their Kabbalistic Use Your purchase helps us to sponsor a new translation Shimmush Tehillim, Tehillim, Psalms 151-155 and Their Kabbalistic Use Edited by

More information

Oximeter Data Management Software. User Manual

Oximeter Data Management Software. User Manual Oximeter Data Management Software User Manual Version 1.0 Date: 14 th Nov, 2013 Shenzhen Med-Link Electronics Tech Co., Ltd I Content 1 Outline---------------------------------------------------------------

More information

Apple Pro Training Series. OS X Server. Essentials. Arek Dreyer. and Ben Greisler

Apple Pro Training Series. OS X Server. Essentials. Arek Dreyer. and Ben Greisler Apple Pro Training Series OS X Server Essentials Arek Dreyer and Ben Greisler Table of Contents Configuring and Monitoring OS X Server Lesson 1 About This Guide 3 Learning Methodology 4 Lesson Structure

More information

How To Get A Financial Aid Award In Athena

How To Get A Financial Aid Award In Athena Athena Self-Service Walkthrough By the UGA Office of Student Financial Aid 220 Holmes/Hunter Academic Building Athens, GA 30602-6114 Phone: (706) 542-6147 Section 1 How to Find Out What is Needed by the

More information

Website user experience

Website user experience copenhagen business school handelshøjskolen solbjerg plads 3 dk-2000 frederiksberg danmark www.cbs.dk Website user experience Website user experience A cross-cultural study of the relation between users

More information

DOCUMENTATION FILE RESTORE

DOCUMENTATION FILE RESTORE DOCUMENTATION Copyright Notice The use and copying of this product is subject to a license agreement. Any other use is prohibited. No part of this publication may be reproduced, transmitted, transcribed,

More information

A STUDY OF THE IMPACT OF CONSTRUCTION ACCIDENTS ON THE PROJECT CONTINUITY

A STUDY OF THE IMPACT OF CONSTRUCTION ACCIDENTS ON THE PROJECT CONTINUITY A STUDY OF THE IMPACT OF CONSTRUCTION ACCIDENTS ON THE PROJECT CONTINUITY Final Project Report as one of requirement to obtain S1 degree of Universitas Atma Jaya Yogyakarta By: KARTIKA IRIANTHY ZEBUA NPM.

More information

This Version Not For Distribution EMR/EHR

This Version Not For Distribution EMR/EHR This Version Not For Distribution EMR/EHR Cheng B Saw, Ph.D. Chair, Asian-Oceanic Affairs of AAPM Director - Physics, Northeast Radiation Oncology Centers President, CBSaw Publishing, LLC Harrisburg, PA,

More information

SAML Authentication within Secret Server

SAML Authentication within Secret Server SAML Authentication within Secret Server Secret Server allows the use of SAML Identity Provider (IdP) authentication instead of the normal authentication process for single sign-on (SSO). To do this, Secret

More information

Dealing with digital Information richness in supply chain Management - A review and a Big Data Analytics approach

Dealing with digital Information richness in supply chain Management - A review and a Big Data Analytics approach Florian Kache Dealing with digital Information richness in supply chain Management - A review and a Big Data Analytics approach kassel IH university press Contents Acknowledgements Preface Glossary Figures

More information

The Impact of Corporate Venture Capital

The Impact of Corporate Venture Capital Timo B. Poser The Impact of Corporate Venture Capital Potentials of Competitive Advantages for the Investing Company Deutscher Universitats-Verlag IX Table of contents List of illustrations List of tables

More information

LIST OF TABLES. 2.4 Variables Related to CRM Implementation 57. 2.11 Variables in Customer Retention at Commercial Banks 74

LIST OF TABLES. 2.4 Variables Related to CRM Implementation 57. 2.11 Variables in Customer Retention at Commercial Banks 74 LIST OF TABLES Table 2.1 Steps in CRM Development at 51 2.2 CRM system at Commercial 53 2.3 Stages in CRM implementation 55 2.4 Variables Related to CRM Implementation 57 2.5 CRM Acceptance Among the Employees

More information

CSSE 372 Software Project Management: Software Risk Management

CSSE 372 Software Project Management: Software Risk Management CSSE 372 Software Project Management: Software Risk Management Shawn Bohner Office: Moench Room F212 Phone: (812) 877-8685 Email: bohner@rose-hulman.edu Plan for the Day n Early Plus/Delta for course n

More information

PMP Certification Exam Prep Bootcamp

PMP Certification Exam Prep Bootcamp Commitment / Vision / Results SM Toll Free (US): (800) 877-8129 Office: (919) 495-7371 Fax: (919) 556-0283 Email: info@cvr-it.com Web site: www.cvr-it.com Course Overview PMP Certification Exam Prep Bootcamp

More information

THE PERFORMANCE MANAGEMENT GROUP LLC

THE PERFORMANCE MANAGEMENT GROUP LLC THE PERFORMANCE MANAGEMENT GROUP LLC ON-CAMPUS TRAINING: LEAN SIX SIGMA EXCELLENCE IN HEALTHCARE DELIVERY BLACK BELT CERTIFICATION Performance Improvement Training for the Healthcare Industry ABOUT THE

More information

Online submission of account of receipts and utilization of Foreign Contribution for the year in FC-6 Form

Online submission of account of receipts and utilization of Foreign Contribution for the year in FC-6 Form Online submission of account of receipts and utilization of Foreign Contribution for the year in FC-6 Form Instructions for online filing of annual account in FC-6 Form 1. For online filing of annual accounts,

More information

Performance Objective Identification Worksheet

Performance Objective Identification Worksheet Performance Objective Identification Worksheet INSTRUCTIONS For each of the performance objective identified, you must indicate by page and paragraph number the equivalent performance objective in your

More information

Managing Successful Projects

Managing Successful Projects 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Managing Successful Projects with PRINCE2 Office of Government Commerce

More information

Software Monthly Maintenance (Non Accounting Use) Quick Reference Guide

Software Monthly Maintenance (Non Accounting Use) Quick Reference Guide Software Monthly Maintenance (Non Accounting Use) Quick Reference Guide When not using the accounting within the software the system will build up information that will affect the performance and speed

More information

PROJECT MANAGEMENT PROFESSIONAL CERTIFIED ASSOCIATE IN PROJECT MANAGEMENT (PMP & CAPM) EXAM PREPARATION WORKSHOP

PROJECT MANAGEMENT PROFESSIONAL CERTIFIED ASSOCIATE IN PROJECT MANAGEMENT (PMP & CAPM) EXAM PREPARATION WORKSHOP TSE015 PROJECT MANAGEMENT PROFESSIONAL CERTIFIED ASSOCIATE IN PROJECT MANAGEMENT (PMP & CAPM) EXAM PREPARATION WORKSHOP Course Outline I. Introduction and Course Objectives A. About PMI B. PMP and CAPM

More information

1 of 7 31/10/2012 18:34

1 of 7 31/10/2012 18:34 Regulatory Story Go to market news section Company TIDM Headline Released Number Ironveld PLC IRON Holding(s) in Company 18:01 31-Oct-2012 0348Q18 RNS Number : 0348Q Ironveld PLC 31 October 2012 TR-1:

More information

THE PERFORMANCE MANAGEMENT GROUP LLC

THE PERFORMANCE MANAGEMENT GROUP LLC THE PERFORMANCE MANAGEMENT GROUP LLC ONLINE TRAINING: LEAN SIX SIGMA SERVICE EXCELLENCE BLACK BELT CERTIFICATION Performance Improvement Training for Service Industries: Financial Services Telecommunications

More information

EMPLOYEE PERFORMANCE REVIEW FORM

EMPLOYEE PERFORMANCE REVIEW FORM EMPLOYEE PERFORMANCE REVIEW FORM The employee under review must complete all sections designated Employee. Supervisors must completed all sections designated Supervisor. Performance Reviews should be submitted

More information

PRESS RELEASE. End of press release

PRESS RELEASE. End of press release Unofficial Translation This is an unofficial translation of the press release made below and it has been prepared for information purposes only. In the case of any discrepancy between this translation

More information

(Japanese Note) 1. With reference to subparagraph (m) of paragraph 1 of Article 3 of the Convention:

(Japanese Note) 1. With reference to subparagraph (m) of paragraph 1 of Article 3 of the Convention: (Japanese Note) Translation London, February 2,2006 Excellency: I have the honour to refer to the Convention between Japan and the United Kingdom of Great Britain and Northern Ireland for the Avoidance

More information

Contents. 1 Introduction. 2 Feature List. 3 Feature Interaction Matrix. 4 Feature Interactions

Contents. 1 Introduction. 2 Feature List. 3 Feature Interaction Matrix. 4 Feature Interactions 1 Introduction 1.1 Purpose and Scope................................. 1 1 1.2 Organization..................................... 1 2 1.3 Requirements Notation............................... 1 2 1.4 Requirements

More information

1. Who can use Agent Portal? 2. What is the definition of an active agent? 3. How to access Agent portal? 4. How to login?

1. Who can use Agent Portal? 2. What is the definition of an active agent? 3. How to access Agent portal? 4. How to login? 1. Who can use Agent Portal? Any active agent who is associated with Future Generali Life Insurance Company Limited can logon to Agent Portal 2. What is the definition of an active agent? An agent, whose

More information

TEXAS STATE BOARD OF PLUMBING EXAMINERS RULE ADOPTION

TEXAS STATE BOARD OF PLUMBING EXAMINERS RULE ADOPTION TEXAS STATE BOARD OF PLUMBING EXAMINERS RULE ADOPTION TITLE 22 Examining Boards PART 17 Texas State Board of Plumbing Examiners CHAPTER 361 Administration 22 Tex. Admin. Code 361.6 Fees 22 Tex. Admin.

More information

Declaration to be submitted by directors in the Applicant Company 1

Declaration to be submitted by directors in the Applicant Company 1 Form SNBFI/D1 Name of the Applicant Company: Declaration to be submitted by directors in the Applicant Company 1 1. Personal Details 1.1 Full name: 1.2 National Identity Card number: 1.3 Passport number:

More information

ACCUPLACER Arithmetic & Elementary Algebra Study Guide

ACCUPLACER Arithmetic & Elementary Algebra Study Guide ACCUPLACER Arithmetic & Elementary Algebra Study Guide Acknowledgments We would like to thank Aims Community College for allowing us to use their ACCUPLACER Study Guides as well as Aims Community College

More information

THE PSYCHOLOGY CLUB EASTERN CONNECTICUT STATE UNIVERSITY CONSTITUTION. Article I: Name. Article II: Purpose

THE PSYCHOLOGY CLUB EASTERN CONNECTICUT STATE UNIVERSITY CONSTITUTION. Article I: Name. Article II: Purpose THE PSYCHOLOGY CLUB EASTERN CONNECTICUT STATE UNIVERSITY CONSTITUTION Article I: Name The name of this organization shall be called the Psychology Club of Eastern Connecticut State College. Article II:

More information

Workflow Administration of Windchill 10.2

Workflow Administration of Windchill 10.2 Workflow Administration of Windchill 10.2 Overview Course Code Course Length TRN-4339-T 2 Days In this course, you will learn about Windchill workflow features and how to design, configure, and test workflow

More information

Data Security at the KOKU

Data Security at the KOKU I. After we proposed our project to the central registration office of the city of Hamburg, they accepted our request for transferring information from their birth records. Transfer of all contact details

More information

INSURANCE IN. Historical Development, Present Status and Future. Challenges

INSURANCE IN. Historical Development, Present Status and Future. Challenges INSURANCE IN ETHIOPIA Historical Development, Present Status and Future Challenges Hailu Zeleke August 2007 TABLE OF CONTENTS PAGE ACKNOWLEDGEMENTS vi ACRONYMS vii INTRODUCTION viif CHARTER 1: RISK - MEANING

More information

TABLE OF CONTENTS CHAPTER DESCRIPTION PAGE

TABLE OF CONTENTS CHAPTER DESCRIPTION PAGE vii TABLE OF CONTENTS CHAPTER DESCRIPTION PAGE TITLE DECLARATION DEDICATION ACKNOWLEDGEMENT ABSTRACT ABSTRAK LIST OF CONTENTS LIST OF FIGURES LIST OF TABLES LIST OF APPENDICES i ii iii iv v vi vii xii

More information

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days or 2008 Five Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students

More information

Introduction. Acknowledgments Support & Feedback Preparing for the Exam. Chapter 1 Plan and deploy a server infrastructure 1

Introduction. Acknowledgments Support & Feedback Preparing for the Exam. Chapter 1 Plan and deploy a server infrastructure 1 Introduction Acknowledgments Support & Feedback Preparing for the Exam xv xvi xvii xviii Chapter 1 Plan and deploy a server infrastructure 1 Objective 1.1: Design an automated server installation strategy...1

More information

How to Become a Clinical Psychologist

How to Become a Clinical Psychologist How to Become a Clinical Psychologist Based on information gathered from assistant psychologists, trainee clinical psychologists and clinical psychology course directors across the country, How to Become

More information

Financial Services (Investment and Fiduciary Services) FINANCIAL SERVICES (FEES) REGULATIONS 2011 FINANCIAL SERVICES (FEES) REGULATIONS 2011

Financial Services (Investment and Fiduciary Services) FINANCIAL SERVICES (FEES) REGULATIONS 2011 FINANCIAL SERVICES (FEES) REGULATIONS 2011 Financial Services (Investment and Fiduciary Services) Legislation made under s. 53 and 56. 1989-47 (LN. ) Commencement 1.4.2011 Amending enactments Relevant current provisions Commencement date LN. 2011/036

More information

Agenda Item #06-29 Effective Spring 2007 Eastern Illinois University Revised Course Proposal MGT 4500, Employee Staffing and Development

Agenda Item #06-29 Effective Spring 2007 Eastern Illinois University Revised Course Proposal MGT 4500, Employee Staffing and Development Agenda Item #06-29 Effective Spring 2007 Eastern Illinois University Revised Course Proposal MGT 4500, Employee Staffing and Development 1. Catalog Description a. Course Number: MGT 4500 b. Title: Employee

More information

Financial Health and Funding in Colleges

Financial Health and Funding in Colleges Report by the Comptroller and Auditor General Managing Finances in English Further Education Colleges Ordered by the House of Commons to be printed 2 May 2000 LONDON: The Stationery Office 0.00 HC 454

More information

Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools

Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools Jack Greenfield Keith Short WILEY Wiley Publishing, Inc. Preface Acknowledgments Foreword Parti Introduction to

More information

The City of Philadelphia Department of Human Services. The Improving Outcomes for Children Initiative. Community Umbrella Agency Practice Guidelines

The City of Philadelphia Department of Human Services. The Improving Outcomes for Children Initiative. Community Umbrella Agency Practice Guidelines Webelievethatacommunity neighborhoodapproachwithclearlydefinedrolesbetween countyandproviderstaffwillpositivelyimpactsafety,permanency,andwell being." Whatareweworkingtogethertoachieve? o Morechildrenandyouthmaintainedsafelyintheirownhomesandcommunities.

More information

TITLE 9. HEALTH SERVICES CHAPTER 1. DEPARTMENT OF HEALTH SERVICES ADMINISTRATION ARTICLE 4. CODES AND STANDARDS REFERENCED

TITLE 9. HEALTH SERVICES CHAPTER 1. DEPARTMENT OF HEALTH SERVICES ADMINISTRATION ARTICLE 4. CODES AND STANDARDS REFERENCED TITLE 9. HEALTH SERVICES CHAPTER 1. DEPARTMENT OF HEALTH SERVICES ADMINISTRATION ARTICLE 4. CODES AND STANDARDS REFERENCED R9-1-412. Physical Plant Health and Safety Codes and Standards A. The following

More information

BSTP SRF 81 Medical Emergency and Work Place Injury

BSTP SRF 81 Medical Emergency and Work Place Injury Page 1 of 5 Table of Contents I. PRINCIPLE (Purpose):...2 II. ROLE:...2 III. GLOSSARY, ABBREVIATIONS OR DEFINITIONS:...2 IV. INDICATIONS (Policy):...2 V. SPECIMENS (Samples):...2 VI. MATERIALS, REAGENTS,

More information

Department of Defense Fiscal Year (FY) 2014 President's Budget Submission

Department of Defense Fiscal Year (FY) 2014 President's Budget Submission Department of Defense Fiscal Year (FY) 2014 President's Budget Submission April 2013 Defense Contract Audit Agency Justification Book Procurement, Defense-Wide UNCLASSIFIED THIS PAGE INTENTIONALLY LEFT

More information

Consolidated Annual Report of the AB Capital Group for the financial year 2008/2009. covering the period from July 1, 2008 to June 30, 2009

Consolidated Annual Report of the AB Capital Group for the financial year 2008/2009. covering the period from July 1, 2008 to June 30, 2009 Consolidated Annual Report of the AB Capital Group for the financial year 2008/2009 covering the period from July 1, 2008 to June 30, 2009 Selected financial data converted to EUR SELECTED FINANCIAL DATA

More information

Notice of Expression of Interest for Provision of Consultancy services for Real Estate Marketing, Branding and Sale of Housing Units at Lubowa

Notice of Expression of Interest for Provision of Consultancy services for Real Estate Marketing, Branding and Sale of Housing Units at Lubowa NATIONAL SOCIAL SECURITY FUND DE POINT CONSULTANTS LIMITED Notice of Expression of Interest for Provision of Consultancy services for Real Estate Marketing, Branding and Sale of Housing Units at Lubowa

More information

Screen Design : Navigation, Windows, Controls, Text,

Screen Design : Navigation, Windows, Controls, Text, Overview Introduction Fundamentals of GUIs - methods - Some examples Screen : Navigation, Windows, Controls, Text, Evaluating GUI Performance 1 Fundamentals of GUI What kind of application? - Simple or

More information

Accounts Receivable. Chapter

Accounts Receivable. Chapter Chapter 7 Accounts Receivable The Accounts Receivable module displays information about individual outstanding income sources. Use this screen to verify that invoice receipts, cash receipts, and other

More information

ISSUING THE AIR OPERATOR CERTIFICATE, OPERATIONS SPECIFICATIONS, AND COMPLETING THE CERTIFICATION REPORT

ISSUING THE AIR OPERATOR CERTIFICATE, OPERATIONS SPECIFICATIONS, AND COMPLETING THE CERTIFICATION REPORT ISSUING THE AIR OPERATOR CERTIFICATE, OPERATIONS SPECIFICATIONS, AND COMPLETING THE CERTIFICATION REPORT GUIDANCE MATERIAL FOR INSPECTORS CA AOC-017 AIR OPERATOR CERTIFICATION RECORD OF AMENDMENTS Amendment

More information

SQL Server Integration Services Design Patterns

SQL Server Integration Services Design Patterns SQL Server Integration Services Design Patterns Second Edition Andy Leonard Tim Mitchell Matt Masson Jessica Moss Michelle Ufford Apress* Contents J First-Edition Foreword About the Authors About the Technical

More information

SECOND EDITION THE SECURITY RISK ASSESSMENT HANDBOOK. A Complete Guide for Performing Security Risk Assessments DOUGLAS J. LANDOLL

SECOND EDITION THE SECURITY RISK ASSESSMENT HANDBOOK. A Complete Guide for Performing Security Risk Assessments DOUGLAS J. LANDOLL SECOND EDITION THE SECURITY RISK ASSESSMENT HANDBOOK A Complete Guide for Performing Security Risk Assessments DOUGLAS J. LANDOLL CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is

More information

Computer Studies/Information and Communications Technology (ICT)

Computer Studies/Information and Communications Technology (ICT) Page 1 of 10 Professional Master of Education Subject Declaration Form IMPORTANT This declaration form should be returned to the PME provider(s) to which you have applied or the Postgraduate Applications

More information

NATIONAL UNIVERSITY OF SCIENCE AND TECHNOLOGY

NATIONAL UNIVERSITY OF SCIENCE AND TECHNOLOGY NATIONAL UNIVERSITY OF SCIENCE AND TECHNOLOGY FACULTY OF COMMERCE GENERAL MASTERS IN BUSINESS ADMINISTRATION MANAGERIAL ACCOUNTING GMB 562 FINAL EXAMINATION 11 DECEMBER 2003 TIME ALLOWED: 3 HOURS + 30

More information

STATE UNIVERSITY OF NEW YORK COLLEGE OF TECHNOLOGY CANTON, NEW YORK COURSE OUTLINE JUST 201 CRITICAL ISSUES IN CRIMINAL JUSTICE

STATE UNIVERSITY OF NEW YORK COLLEGE OF TECHNOLOGY CANTON, NEW YORK COURSE OUTLINE JUST 201 CRITICAL ISSUES IN CRIMINAL JUSTICE STATE UNIVERSITY OF NEW YORK COLLEGE OF TECHNOLOGY CANTON, NEW YORK COURSE OUTLINE JUST 201 CRITICAL ISSUES IN CRIMINAL JUSTICE Prepared by: Dr. Brian K. Harte SCHOOL OF BUSINESS AND CRIMINAL JUSTICE Department

More information

The following report presents financial data only. The full and binding version is available in Polish. K2 INTERNET S.A.

The following report presents financial data only. The full and binding version is available in Polish. K2 INTERNET S.A. The following report presents financial data only. The full and binding version is available in Polish. K2 INTERNET S.A. Annual Financial Statement of K2 Internet S.A. for the twelve-month period ended

More information

FINAL JOINT PRETRIAL ORDER. This matter is before the Court on a Final Pretrial Conference pursuant to R. 4:25-1.

FINAL JOINT PRETRIAL ORDER. This matter is before the Court on a Final Pretrial Conference pursuant to R. 4:25-1. SUPERIOR COURT OF NEW JERSEY MIDDLESEX COUNTY:LAW DIVISION Docket No. Plaintiff(s), v. Defendant(s). FINAL JOINT PRETRIAL ORDER This matter is before the Court on a Final Pretrial Conference pursuant to

More information

TERMS OF REFERENCE FINANCIAL CONSULTING FIRM 6 MONTHS, NATIONAL

TERMS OF REFERENCE FINANCIAL CONSULTING FIRM 6 MONTHS, NATIONAL TERMS OF REFERENCE FINANCIAL CONSULTING FIRM 6 MONTHS, NATIONAL 1. Background Financial statement of the social insurance offices consists of 8 main financial statements and these financial statements

More information

PAPER-6 PART-5 OF 5 CA A.RAFEQ, FCA

PAPER-6 PART-5 OF 5 CA A.RAFEQ, FCA Chapter-4: Business Continuity Planning and Disaster Recovery Planning PAPER-6 PART-5 OF 5 CA A.RAFEQ, FCA Learning Objectives 2 To understand the concept of Business Continuity Management To understand

More information

To define and explain different learning styles and learning strategies.

To define and explain different learning styles and learning strategies. Medical Office Assistant Program Overview The Medical Office Assistant program prepares students for entry-level employment as a medical office assistant. It discusses the fundamentals of medical terminology,

More information

Implementation & Administration

Implementation & Administration Microsoft SQL Server 2008 R2 Master Data Services: Implementation & Administration Tyler Graham Suzanne Selhorn Mc Grauu Hill New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi

More information

Department of International Trade at Feng Chia University Master s Program Requirements Policy

Department of International Trade at Feng Chia University Master s Program Requirements Policy Department of International Trade at Feng Chia University Master s Program Requirements Policy Revised and approved by the Department Affairs Committee on June 9 th, 2005 Revised and approved by the Department

More information

Regulatory Story. RNS Number : 8343I. DCD Media PLC. 08 July 2013. TR-1: NOTIFICATION OF MAJOR INTEREST IN SHARES i

Regulatory Story. RNS Number : 8343I. DCD Media PLC. 08 July 2013. TR-1: NOTIFICATION OF MAJOR INTEREST IN SHARES i 1 of 7 25/11/2013 11:51 Regulatory Story Go to market news section Company TIDM Headline Released DCD Media PLC DCD Holding(s) in Company 15:19 08-Jul-2013 8343I15 RNS : 8343I DCD Media PLC 08 July 2013

More information

THE FIRST SCHEDULE (See rule 7) Table I - FEES PAYABLE

THE FIRST SCHEDULE (See rule 7) Table I - FEES PAYABLE Number of entry On what payable Number of the relevant Form THE FIRST SCHEDULE (See rule 7) Table I - FEES PAYABLE Natural For e-filing Small entity, alone or with natural Others, alone or with natural

More information

"Charting the Course... MOC 20409 B Server Virtualization with Windows Hyper-V and System Center. Course Summary

Charting the Course... MOC 20409 B Server Virtualization with Windows Hyper-V and System Center. Course Summary Description Course Summary This five day course will provide you with the knowledge and skills required to design and implement Microsoft Server solutions using Hyper-V and System. Objectives At the end

More information




