



Efficient SPARQL-to-SQL Translation using R2RML to manage Database Schema Changes

1 Sunil Ahn, 2 Seok-Kyoo Kim, 3 Soonwook Hwang
1, First Author: Sunil Ahn, KISTI, Daejeon, Korea, siahn@kisti.re.kr
*2, Corresponding Author: Seok-Kyoo Kim, KISTI, Daejeon, Korea, anemone@kisti.re.kr
3 Soonwook Hwang, KISTI, Daejeon, Korea, hwang@kisti.re.kr

Abstract

There have been several efforts toward efficient translation from SPARQL into SQL. The R2RML standard makes it possible to view existing relational data in the RDF data model and to offer a virtual SPARQL endpoint over the mapped relational data. However, R2RML does not support flexible mapping when the database schema changes frequently. This paper presents an efficient technique for translating SPARQL into SQL that manages database schema changes by extending R2RML. We applied it to a tag metadata catalog service, which demonstrated the effectiveness of our method.

Keywords: SPARQL, SQL, RDF, R2RML

1. Introduction

RDF is a family of W3C specifications originally designed as a metadata data model. It has become a standard model for the conceptual description or modeling of information and for data interchange on the Web [1]. It is a directed, labeled graph data format for representing information on the Web. In addition to RDF, the W3C has recommended the SPARQL query language [2] for extracting information from RDF graphs. SPARQL is used to query data stored in the RDF data model in the same way that SQL is used for relational databases.

As the Semantic Web becomes more widespread, it becomes more important for Semantic Web applications to be able to access the content of relational databases used by legacy systems. There have been several efforts to convert relational data to the RDF format, or to translate SPARQL into SQL [3-5]. Recently the W3C proposed R2RML [6-7] to provide an easy way to introduce semantic technologies into web applications that use existing relational solutions and data.
R2RML expresses customized mappings from data stored in relational databases to RDF datasets. However, R2RML does not support flexible mapping when a database schema changes frequently. For example, it is not easy to define a mapping policy in R2RML when tables are created and removed dynamically. This paper presents an efficient technique for translating SPARQL into SQL that manages database schema changes. We extend the R2RML vocabulary with optional terms that can be used to retrieve information about a database schema. We applied the technique to a tag metadata catalog service called Tagfiler [8], which showed the effectiveness of our method.

The rest of this paper is organized as follows. Section 2 summarizes previous work. Section 3 presents Tagfiler, a tag metadata catalog service, and its SPARQL interface, which motivated our work. Section 4 presents the R2RML extension for managing database schema changes. Section 5 briefly explains the implementation, and Section 6 concludes the paper.

2. Related Works

There have been several efforts toward efficient translation from SPARQL into SQL. Cyganiak [3] describes a transformation from SPARQL into relational algebra, an abstract intermediate language for expressing queries. This makes existing work on query planning and optimization available to SPARQL implementations. Elliott et al. [5] present a SPARQL-to-SQL translation that implements each SPARQL algebra operator via SQL query augmentation and generates a flat SQL statement for efficient processing by relational database query engines. R2RML [6-7] offers a generic mechanism for describing relational databases, so that SPARQL queries can be supported over any R2RML-mapped RDF graph.

Journal of Next Generation Information Technology(JNIT) Volume 4, Number 8, October 2013 209

3. Tagfiler and its SPARQL Interface

Tags are commonly used in blogs to label images or text on a site with a category or topic. Web pages and blogs with identical tags can then be linked together, allowing users to search for similar or related content. There have been several efforts to bridge the gap between free tagging and semantic annotation by giving meaning to tags in RDF [9-10].

Tagfiler [8] is a collaborative tag metadata catalog rendered as an easy-to-use web service. It allows data to be annotated with extensible, domain-specific tags. It uses a relational data model to store tag-related metadata. One table holds all the information for each tag, so a new table is created whenever a new tag is defined.

To facilitate the integration of metadata with linked data located at several sites, we have implemented a SPARQL interface for Tagfiler. The main design considerations were compliance with the RDF standard and flexible RDF conversion, to support extensive integration with other sources. For this purpose we adopted R2RML, which expresses customized mappings from tagging data stored in relational databases to RDF datasets. Such mappings provide the ability to view existing tagging data in the RDF data model, expressed in a structure and target vocabulary of the mapping author's choice.

In Tagfiler, tables are created or removed dynamically whenever a tag is defined or removed. We found R2RML inadequate for defining a mapping policy over frequently created and removed tables. This led us to extend the R2RML vocabulary with optional terms that can be used to retrieve information about a database schema.

4. The R2RML extension to manage Database Schema Changes

To deal with schema changes, we extended the R2RML vocabulary so that variables can be defined in an R2RML mapping. The values of these variables are retrieved by user-defined SQL queries, which makes it possible to reflect frequent schema changes in R2RML. Figure 1 shows an example of SQL queries that define an R2RML view and a variable view. The <#TagView> view uses a variable table name {PRE1}, and the values of {PRE1} are retrieved from the <#Variables> view. For example, if the <#Variables> view returns four values for {PRE1}, four SQL queries are generated for <#TagView> and their results are merged. Figure 2 shows an example of an R2RML mapping that uses the variable view.

    <#TagView> rr:sqlquery """
        SELECT _name.value AS SUBJ, bar.value AS OBJ
        FROM {PRE1} bar, _name
        WHERE _name.subject = bar.subject ;
        """.

    <#Variables> rr:sqlquery """
        SELECT DISTINCT tagname AS PRE1
        FROM _name, subjecttags, _tagdef, \"_tagdef type\" tt
        WHERE _name.subject = subjecttags.subject
          AND _tagdef.subject = tt.subject
          AND _tagdef.value = tagname
          AND tt.value != \'empty\' ;
        """.

Figure 1. Example SQL Queries
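The expansion step described in this section can be illustrated with a minimal Python sketch. It is not the Tagfiler implementation: the function names (quote_identifier, expand_queries, collect_triples, build_demo_database), the in-memory SQLite database, and the simplified variable query over _tagdef are all assumptions made for illustration. Only the {PRE1} placeholder and the shape of the <#TagView> query come from Figure 1.

```python
import sqlite3


def quote_identifier(name):
    """Quote a value for use as an SQL identifier (here: a table name)."""
    return '"' + str(name).replace('"', '""') + '"'


def expand_queries(conn, variables_sql, template_sql):
    """Return one (bindings, sql) pair per row of the variable view."""
    cursor = conn.execute(variables_sql)
    names = [column[0] for column in cursor.description]
    expanded = []
    for row in cursor.fetchall():
        bindings = dict(zip(names, row))
        sql = template_sql
        for name, value in bindings.items():
            # Substitute each {NAME} placeholder with the quoted table name.
            sql = sql.replace("{" + name + "}", quote_identifier(value))
        expanded.append((bindings, sql))
    return expanded


def collect_triples(conn, variables_sql, template_sql):
    """Run every expanded query and merge rows into (SUBJ, PRE1, OBJ) triples."""
    triples = []
    for bindings, sql in expand_queries(conn, variables_sql, template_sql):
        for subj, obj in conn.execute(sql):
            triples.append((subj, bindings["PRE1"], obj))
    return triples


def build_demo_database():
    """Hypothetical, simplified stand-in for the Tagfiler schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE _name (subject INTEGER, value TEXT);
        CREATE TABLE _tagdef (subject INTEGER, value TEXT);
        CREATE TABLE thumbnail (subject INTEGER, value TEXT);
        CREATE TABLE coordinates (subject INTEGER, value TEXT);
        INSERT INTO _name VALUES (1, 'brazil');
        INSERT INTO _tagdef VALUES (10, 'thumbnail'), (11, 'coordinates');
        INSERT INTO thumbnail VALUES (1, 'flag_of_brazil.svg');
        INSERT INTO coordinates VALUES (1, '-7.4475000, -35.2441700');
        """
    )
    return conn


# Simplified variable view (assumption): one row per defined tag table.
VARIABLES_SQL = "SELECT DISTINCT value AS PRE1 FROM _tagdef ORDER BY value"
# Same shape as the <#TagView> query in Figure 1.
TAG_VIEW_SQL = (
    "SELECT _name.value AS SUBJ, bar.value AS OBJ "
    "FROM {PRE1} bar, _name WHERE _name.subject = bar.subject"
)

if __name__ == "__main__":
    conn = build_demo_database()
    for triple in collect_triples(conn, VARIABLES_SQL, TAG_VIEW_SQL):
        print(triple)
```

Because the variable view is evaluated each time the mapping is used, a tag table created or dropped between two calls changes the set of generated queries without any edit to the mapping itself. The sketch quotes each substituted value as an identifier because it is spliced into the SQL text as a table name and cannot be passed as a bound parameter.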

    <#TriplesMap>
        rr:logicaltable <#TagView>;
        rr:variabletable <#Variables>;
        rr:subjectmap [ rr:template "{SUBJ}"; rr:class tag:subject; ];
        rr:predicateobjectmap [
            rr:template "{PRE1}";
            rr:objectmap [ rr:column "OBJ" ];
        ].

Figure 2. Example R2RML mapping

Figure 3 shows the extended R2RML, which defines the variabletable property. This property is used to define a variable table that holds the values of variables returned by an SQL query. Figure 4 shows the RDF schema definition for the variabletable property.

Figure 3. The properties of variable tables

    variabletable (Property)
        Definition of variable table to be mapped.
    Definition
        This property is an object property.
        The domain of this property is Triples Map.
        The range of this property is Logical Table.

Figure 4. The RDF schema definition for the variabletable property

5. Implementation

Figure 5 depicts the abstract architecture of the SPARQL interface for Tagfiler. The SPARQL interface accepts a query and passes it to RDFLib [11]. RDFLib parses the query and accesses the TF Store to retrieve the required triples. When RDFLib returns the query results, the SPARQL interface returns them as HTML or XML, depending on the HTTP request. Figure 6 shows an example of query results in XML form.

The RDFLib library provides an abstract Store API for the persistence of RDF. The TF Store implements the triples() API of RDFLib to map tagging data stored in databases to RDF data. The role of the triples() implementation is to find the triples that match a SPARQL input. It creates a set of SQL queries based on the user's input and the defined R2RML policies, and finds the matching triples.

6. Conclusion

This paper presented an efficient technique for translating SPARQL into SQL that manages database schema changes by extending R2RML. We applied it to a tag metadata catalog service, which demonstrated flexible mapping from tagging data stored in a database to the RDF format based on the extended R2RML. Future work will mainly focus on integrating tagging data stored at multiple sites.

Figure 5. The abstract architecture of the SPARQL interface for Tagfiler

    $ curl -b cookie.txt2 -H "Accept: application/rdf+xml" -k --data "query=construct+%7b+%3fs+%3fp+%3fo+%7d+where+%7b+%3fs+%3fp+%3fo+%7d" https://localhost/tagfiler/sparql/

    <?xml version="1.0" encoding="utf-8"?>
    <rdf:rdf xmlns:local="http://localhost/tagfiler/tags/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:tags="http://tagfiler.org/tags#">
    <rdf:description rdf:about="http://localhost/tagfiler/tags/coordinates">
    <rdf:type rdf:resource="http://tagfiler.org/tags#tag"/>
    <rdf:description rdf:about="http://localhost/tagfiler/tags/tag">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#class"/>
    <rdf:description rdf:about="http://tagfiler.org/tags#label">
    <rdf:description rdf:about="http://localhost/tagfiler/tags/name=brazil">
    <rdf:type rdf:resource="http://tagfiler.org/tags#subject"/>
    <local:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/en/0/05/flag_of_brazil.svg"/>
    <tags:tagged rdf:resource="http://localhost/tagfiler/tags/coordinates"/>
    <tags:tagged rdf:resource="http://localhost/tagfiler/tags/thumbnail"/>
    <tags:label rdf:datatype="http://www.w3.org/2001/xmlschema#string">thumbnail</tags:label>
    <tags:label rdf:datatype="http://www.w3.org/2001/xmlschema#string">coordinates</tags:label>
    <local:coordinates rdf:datatype="http://www.w3.org/2001/xmlschema#string">-7.4475000, -35.2441700</local:coordinates>
    <rdf:description rdf:about="http://tagfiler.org/tags#tagged">
    <rdf:description rdf:about="http://localhost/tagfiler/tags/subject">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#class"/>
    <rdf:description rdf:about="http://localhost/tagfiler/tags/thumbnail">
    <rdf:type rdf:resource="http://tagfiler.org/tags#tag"/>
    <rdf:description rdf:about="http://tagfiler.org/tags#mean">
    </rdf:rdf>

Figure 6. An example of query results in XML form

7. References

[1] Elena Simperl, Philipp Cimiano, Axel Polleres, Óscar Corcho, Valentina Presutti, The Semantic Web: Research and Applications, LNCS, vol. 7295, Springer, 2012.
[2] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez, Semantics and complexity of SPARQL, ACM Transactions on Database Systems, ACM, vol. 34, no. 3, pp.16-45, 2009.
[3] Richard Cyganiak, A relational algebra for SPARQL, Technical Report, HP Laboratories Bristol, HPL-2005-170, pp.1-20, 2005.
[4] Li Ma, Chen Wang, Jing Lu, Feng Cao, Yue Pan, Yong Yu, Effective and Efficient Semantic Web Data Management over DB2, In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp.1183-1194, 2008.
[5] Brendan Elliott, En Cheng, Chimezie Thomas-Ogbuji, Z. Meral Ozsoyoglu, A complete translation from SPARQL into efficient SQL, IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium, pp.31-42, 2009.
[6] Souripriya Das, Seema Sundara, Richard Cyganiak, R2RML: RDB to RDF Mapping Language, W3C Recommendation, http://www.w3.org/TR/r2rml/, 2012.
[7] Edgard Marx, Percy Salas, Karin Breitman, Jose Viterbo, Marco Antonio Casanova, RDB2RDF: A relational to RDF plug-in for Eclipse, Software: Practice and Experience, vol. 43, no. 4, pp.435-447, 2012.
[8] Carl Kesselman, DataSet Services in GlobusOnline, http://www.globusworld.org/files/2013/06-Kesselman-DataSet_Services_in_Globus_Online.pdf, 2013.
[9] Alexandre Passant, Meaning Of A Tag: A Collaborative Approach to Bridge the Gap Between Tagging and Linked, Proceedings of the Linked Data on the Web, pp.132-135, 2008.
[10] Common Tag, http://commontag.org/.
[11] The Official RDFLib support site, http://rdflib.net/.