Efficient SPARQL-to-SQL Translation using R2RML to manage Database Schema Changes

1 Sunil Ahn, *2 Seok-Kyoo Kim, 3 Soonwook Hwang

1, First Author: Sunil Ahn, KISTI, Daejeon, Korea, siahn@kisti.re.kr
*2, Corresponding Author: Seok-Kyoo Kim, KISTI, Daejeon, Korea, anemone@kisti.re.kr
3: Soonwook Hwang, KISTI, Daejeon, Korea, hwang@kisti.re.kr

Journal of Next Generation Information Technology (JNIT), Volume 4, Number 8, October 2013

Abstract

There have been several efforts toward efficient translation from SPARQL into SQL. The R2RML standard makes it possible to view existing relational data in the RDF data model and to offer a virtual SPARQL endpoint over the mapped relational data. However, R2RML does not support flexible mapping when the database schema changes frequently. This paper presents an efficient technique for translating SPARQL into SQL that manages database schema changes by extending R2RML. We applied the technique to a tag metadata catalog service, which demonstrated the effectiveness of our method.

Keywords: SPARQL, SQL, RDF, R2RML

1. Introduction

RDF is a family of W3C specifications originally designed as a metadata data model. It has become a standard model for the conceptual description or modeling of information and for data interchange on the Web [1]. It is a directed, labeled graph data format for representing information on the Web. In addition to RDF, the W3C has recommended the SPARQL query language [2] for extracting information from RDF graphs. SPARQL is used to query data stored in the RDF data model in the same way that SQL is used for relational databases.

As the Semantic Web becomes more widespread, it becomes more important for Semantic Web applications to be able to access the content of relational databases used by legacy systems. There have been several efforts to convert relational data to the RDF format or to translate SPARQL into SQL [3-5]. Recently, the W3C proposed R2RML [6-7] to provide an easy way to introduce semantic technologies into web applications that use existing relational solutions and data. It expresses customized mappings from data stored in relational databases to RDF datasets. However, R2RML does not support flexible mapping when a database schema changes frequently. For example, it is not easy to define a mapping policy using R2RML when tables are created and removed dynamically.

This paper presents an efficient technique for translating SPARQL into SQL that manages database schema changes. We extend the R2RML vocabulary with optional terms that can be used to retrieve information about a database schema. We applied the technique to a tag metadata catalog service called Tagfiler [8], which showed the effectiveness of our method.

The rest of this paper is organized as follows. In Section 2, we summarize previous work. Section 3 presents a tag metadata catalog service called Tagfiler and its SPARQL interface, which motivated our work. The R2RML extension to manage database schema changes is presented in Section 4. Section 5 briefly explains the implementation, and Section 6 concludes the paper.

2. Related Work

There have been several efforts toward efficient translation from SPARQL into SQL. Cyganiak [3] describes a transformation from SPARQL into the relational algebra, an abstract intermediate language for expressing queries. This makes existing work on query planning and optimization available to SPARQL implementations. Elliott et al. [5] present a SPARQL-to-SQL translation that uses SQL-based algorithms, which implement each SPARQL algebra operator via SQL query augmentation and generate a flat SQL
statement for efficient processing by relational database query engines. R2RML [6-7] offers a generic mechanism for describing relational databases, so that SPARQL queries can be supported over any RDF graph defined by an R2RML mapping.

3. Tagfiler and its SPARQL Interface

Tags are commonly used in blogs to assign images or text within a site to a category or topic. Web pages and blogs with identical tags can then be linked together, allowing users to search for similar or related content. There have been several efforts to bridge the gap between free tagging and semantic annotation by giving meaning to tags in RDF [9-10].

Tagfiler [8] is a collaborative tag metadata catalog rendered as an easy-to-use web service. It allows data to be annotated with extensible, domain-specific tags. It uses a relational data model to store tag-related metadata. One table is allocated to hold all the information for each tag, so a new table is created whenever a new tag is defined.

To facilitate the integration of metadata with linked data located at several sites, we have implemented a SPARQL interface for Tagfiler. The main design considerations were conformance to the RDF standard and flexible RDF conversion to support extensive integration with other sources. For this purpose, we adopted R2RML, which expresses customized mappings from tagging data stored in relational databases to RDF datasets. Such mappings provide the ability to view existing tagging data in the RDF data model, expressed in a structure and target vocabulary of the mapping author's choice.

In Tagfiler, tables are created or removed dynamically whenever a tag is defined or removed. We found R2RML inadequate for defining a mapping policy over tables that are frequently created and removed. This led us to extend the R2RML vocabulary with optional terms that can be used to retrieve information about a database schema.

4.
The R2RML Extension to Manage Database Schema Changes

In order to deal with schema changes, we extended the R2RML vocabulary so that variables can be defined in an R2RML mapping. The values of these variables are retrieved by user-defined SQL queries, so frequent schema changes can be reflected in the mapping. Figure 1 shows an example of SQL queries that define an R2RML view and a variable view. The <#TagView> view uses a variable table name {PRE1}, and the values of {PRE1} are retrieved from the <#Variables> view. If the <#Variables> view returns four values for the {PRE1} variable, then four SQL queries are generated for <#TagView> and all of their results are merged. Figure 2 shows an example of an R2RML mapping that uses the variable view.

<#TagView> rr:sqlquery """
  SELECT _name.value AS SUBJ, bar.value AS OBJ
  FROM {PRE1} bar, _name
  WHERE _name.subject = bar.subject ;
""".

<#Variables> rr:sqlquery """
  SELECT DISTINCT tagname AS PRE1
  FROM _name, subjecttags, _tagdef, \"_tagdef type\" tt
  WHERE _name.subject = subjecttags.subject
    AND _tagdef.subject = tt.subject
    AND _tagdef.value = tagname
    AND tt.value != \'empty\' ;
""".

Figure 1. Example SQL Queries
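The expansion step described above (one generated SQL query per value of {PRE1}, with the results merged) can be sketched as follows. This is a minimal illustration, not the Tagfiler implementation: it assumes Python with the standard sqlite3 module standing in for the real database, a simplified variables query, and made-up table contents. The helper names quote_ident, expand_view, run_expanded and demo_connection are hypothetical.

```python
import sqlite3

# Templates modelled on Figure 1. The variables query is simplified for the
# demo (assumption): it only reads the distinct tag names.
TAG_VIEW = (
    "SELECT n.value AS SUBJ, bar.value AS OBJ "
    "FROM {PRE1} bar, _name n WHERE n.subject = bar.subject"
)
VARIABLES = "SELECT DISTINCT tagname AS PRE1 FROM subjecttags ORDER BY tagname"


def quote_ident(name):
    """Quote a table name so it can be spliced into SQL text safely."""
    return '"' + name.replace('"', '""') + '"'


def expand_view(conn, view_sql, variables_sql, var="PRE1"):
    """Return (value, concrete SQL) pairs, one per value of the variable."""
    values = [row[0] for row in conn.execute(variables_sql)]
    placeholder = "{" + var + "}"
    return [(v, view_sql.replace(placeholder, quote_ident(v))) for v in values]


def run_expanded(conn, view_sql, variables_sql):
    """Run every generated query and merge the rows into (s, p, o) triples."""
    triples = []
    for tag, sql in expand_view(conn, view_sql, variables_sql):
        for subj, obj in conn.execute(sql):
            triples.append((subj, tag, obj))
    return triples


def demo_connection():
    """Build a tiny in-memory database with one table per tag (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE _name (subject INTEGER, value TEXT);
        CREATE TABLE subjecttags (subject INTEGER, tagname TEXT);
        CREATE TABLE coordinates (subject INTEGER, value TEXT);
        CREATE TABLE thumbnail (subject INTEGER, value TEXT);
        INSERT INTO _name VALUES (1, 'brazil');
        INSERT INTO subjecttags VALUES (1, 'coordinates');
        INSERT INTO subjecttags VALUES (1, 'thumbnail');
        INSERT INTO coordinates VALUES (1, '-7.4475000, -35.2441700');
        INSERT INTO thumbnail VALUES (1, 'flag.svg');
        """
    )
    return conn


if __name__ == "__main__":
    connection = demo_connection()
    for triple in run_expanded(connection, TAG_VIEW, VARIABLES):
        print(triple)
```

In this sketch, adding a tag only requires a new table and a new row in subjecttags; the mapping templates stay unchanged because the table names are looked up at query time. Quoting the substituted identifier is a design choice of the sketch, since the table name is spliced into the SQL text rather than bound as a parameter.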
<#TriplesMap>
  rr:logicaltable <#TagView>;
  rr:variabletable <#Variables>;
  rr:subjectmap [
    rr:template "{SUBJ}";
    rr:class tag:subject;
  ];
  rr:predicateobjectmap [
    rr:template "{PRE1}";
    rr:objectmap [ rr:column "OBJ" ];
  ].

Figure 2. Example R2RML mapping

Figure 3 shows the extended R2RML vocabulary, which defines the variabletable property. This property is used to define a variable table, which holds the values of variables retrieved by an SQL query. Figure 4 shows the RDF schema definition for the variabletable property.

Figure 3. The properties of variable tables

variabletable (Property)
  Definition of variable table to be mapped.
  Definition: This property is an object property.
              The domain of this property is Triples Map.
              The range of this property is Logical Table.

Figure 4. The RDF schema definition for the variabletable property

5. Implementation

Figure 5 depicts the abstract architecture of the SPARQL interface for Tagfiler. The SPARQL interface accepts a query and passes it to RDFLib [11]. RDFLib parses the query and accesses the TF Store to retrieve the required triples. Once RDFLib returns the query results, the SPARQL interface returns them as HTML or XML, depending on the HTTP request. Figure 6 shows an example of query results in XML form.

The RDFLib library provides an abstract Store API for the persistence of RDF. The TF Store implements the triples() API of RDFLib to map tagging data stored in databases to RDF data. The role of the triples() API implementation is to find the triples that match a SPARQL input. This
API implementation creates a set of SQL queries based on the user's input and the defined R2RML policies, and finds the matching triples.

6. Conclusion

This paper presented an efficient technique for translating SPARQL into SQL that manages database schema changes by extending R2RML. We applied it to a tag metadata catalog service, where it demonstrated flexible mapping from tagging data stored in a database to the RDF format based on the extended R2RML vocabulary. Future work will mainly focus on integrating tagging data stored at multiple sites.

Figure 5. The abstract architecture of the SPARQL interface for Tagfiler

$ curl -b cookie.txt2 -H "Accept: application/rdf+xml" -k --data "query=construct+%7b+%3fs+%3fp+%3fo+%7d+where+%7b+%3fs+%3fp+%3fo+%7d" https://localhost/tagfiler/sparql/
<?xml version="1.0" encoding="utf-8"?>
<rdf:rdf xmlns:local="http://localhost/tagfiler/tags/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:tags="http://tagfiler.org/tags#">
<rdf:description rdf:about="http://localhost/tagfiler/tags/coordinates">
<rdf:type rdf:resource="http://tagfiler.org/tags#tag"/>
<rdf:description rdf:about="http://localhost/tagfiler/tags/tag">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#class"/>
<rdf:description rdf:about="http://tagfiler.org/tags#label">
<rdf:description rdf:about="http://localhost/tagfiler/tags/name=brazil">
<rdf:type rdf:resource="http://tagfiler.org/tags#subject"/>
<local:thumbnail rdf:resource="http://upload.wikimedia.org/wikipedia/en/0/05/flag_of_brazil.svg"/>
<tags:tagged rdf:resource="http://localhost/tagfiler/tags/coordinates"/>
<tags:tagged rdf:resource="http://localhost/tagfiler/tags/thumbnail"/>
<tags:label rdf:datatype="http://www.w3.org/2001/xmlschema#string">thumbnail</tags:label>
<tags:label rdf:datatype="http://www.w3.org/2001/xmlschema#string">coordinates</tags:label>
<local:coordinates rdf:datatype="http://www.w3.org/2001/xmlschema#string">-7.4475000, -35.2441700</local:coordinates>
<rdf:description rdf:about="http://tagfiler.org/tags#tagged">
<rdf:description rdf:about="http://localhost/tagfiler/tags/subject">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#class"/>
<rdf:description rdf:about="http://localhost/tagfiler/tags/thumbnail">
<rdf:type rdf:resource="http://tagfiler.org/tags#tag"/>
<rdf:description rdf:about="http://tagfiler.org/tags#mean">
</rdf:rdf>

Figure 6. An example of query results in XML form

7. References

[1] Elena Simperl, Philipp Cimiano, Axel Polleres, Óscar Corcho, Valentina Presutti, The Semantic Web: Research and Applications, LNCS, vol. 7295, Springer, 2012.
[2] Jorge Pérez, Marcelo Arenas, Claudio Gutierrez, Semantics and complexity of SPARQL, ACM Transactions on Database Systems, ACM, vol. 34, no. 3, pp.16-45, 2009.
[3] Richard Cyganiak, A relational algebra for SPARQL, Technical Report, HP Laboratories Bristol, HPL-2005-170, pp.1-20, 2005.
[4] Li Ma, Chen Wang, Jing Lu, Feng Cao, Yue Pan, Yong Yu, Effective and Efficient Semantic Web Data Management over DB2, In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp.1183-1194, 2008.
[5] Brendan Elliott, En Cheng, Chimezie Thomas-Ogbuji, Z. Meral Ozsoyoglu, A complete translation from SPARQL into efficient SQL, IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium, pp.31-42, 2009.
[6] Souripriya Das, Seema Sundara, Richard Cyganiak, R2RML: RDB to RDF Mapping Language, W3C Recommendation, http://www.w3.org/tr/r2rml/, 2012.
[7] Edgard Marx, Percy Salas, Karin Breitman, Jose Viterbo, Marco Antonio Casanova, RDB2RDF: A relational to RDF plug-in for Eclipse, Software: Practice and Experience, vol. 43, no. 4, pp.435-447, 2012.
[8] Carl Kesselman, DataSet Services in GlobusOnline, http://www.globusworld.org/files/2013/06-Kesselman-DataSet_Services_in_Globus_Online.pdf, 2013.
[9] Alexandre Passant, Meaning Of A Tag: A Collaborative Approach to Bridge the Gap Between Tagging and Linked, Proceedings of the Linked Data on the Web, pp.132-135, 2008.
[10] Common Tag, http://commontag.org/.
[11] The Official RDFLib support site, http://rdflib.net/.