Benchmarking Bottom-Up and Top-Down Strategies for SPARQL-to-SQL Query Translation

Kashlev (a), Chebotko (b, c), John Abraham (b), Pearl Brazier (b), and Shiyong Lu (a)
(a) Department of Computer Science, Wayne State University, Detroit, Michigan, USA
(b) Department of Computer Science, University of Texas - Pan American, Edinburg, Texas, USA
(c) Corresponding Author. E-mail: chebotkoa@utpa.edu

Abstract
Many researchers have proposed using conventional relational databases to store and query large Semantic Web datasets. The most complex component of this approach is SPARQL-to-SQL query translation. Existing algorithms translate SPARQL queries to SQL using either a bottom-up or a top-down strategy, and the two strategies yield semantically equivalent but syntactically different relational queries. While it can be expected that relational query optimizers produce identical query execution plans for semantically equivalent bottom-up and top-down queries, is this usually the case in practice? And if not, which strategy yields faster SQL queries? To address these questions, this work studies bottom-up and top-down translation of SPARQL queries with nested optional graph patterns, which yield SQL queries with left outer joins whose reordering is not always possible. This paper presents: i) a bottom-up nested optional graph pattern translation algorithm, ii) a top-down nested optional graph pattern translation algorithm, and iii) a performance study featuring SPARQL queries with nested optional graph patterns over RDF databases created in Oracle, DB2, and PostgreSQL.

Keywords: SPARQL; SQL; translation; query; bottom-up; top-down; Semantic Web; RDF; query optimization; query performance

1. Introduction
Semantic Web technologies are finding more and more applications in solving challenging problems of intelligent data and computing resource search, discovery, sharing, and integration. Numerous RDF [1] datasets, such as UniProt, GeoNames, WordNet, DBpedia, and hundreds of others¹, have become available over the Web for use and exploration.
The rapid growth of semantic datasets brings forward a new challenge: efficient management of RDF data, which is crucial for supporting new semantic-enabled applications. Many researchers have proposed using conventional relational databases to store and query large Semantic Web datasets [2]. The resulting systems, called relational RDF databases, share a common design pattern that uses a schema mapping algorithm to generate a relational database schema, a data mapping algorithm to insert RDF data into the database, and a query mapping algorithm to translate RDF queries into equivalent SQL queries. SPARQL-to-SQL translation is not only the most complex mapping in a relational RDF database, but also critical to overall querying performance. Existing algorithms translate SPARQL queries to SQL using either a bottom-up or a top-down strategy, resulting in semantically equivalent but syntactically different relational queries.

¹ W3C SWEO Linking Open Data community project, http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

To illustrate the difference between bottom-up and top-down SPARQL-to-SQL translation in the context of nested optional graph patterns, we use a sample RDF graph G in Fig. 1 that describes academic relations among professors and graduate students in a university. The graph is presented both graphically and as a set of triples. The RDF schema defines two concepts/classes (Professor and GradStudent) and two relations/properties (hasAdvisor and hasCoadvisor). Each relation has the GradStudent class as its domain and the Professor class as its range. Additionally, two instances of Professor, two instances of GradStudent, and relations among these instances are defined as shown in the figure. We design an RDF query that returns (1) every graduate student in the RDF graph; (2) the student's advisor, if this information is available; and (3) the student's coadvisor, if this information is available and if the student's advisor has been successfully retrieved in the previous step.
In other words, while the query attempts to find students and as many advisors as possible, there is no point in returning a coadvisor if no advisor is assigned to a student. The SPARQL representation of this query is as follows:

SELECT ?s ?a ?c
WHERE {
  ?s type GradStudent .            # R1(s)
  OPTIONAL {
    ?s hasAdvisor ?a .             # R2(s, a)
    OPTIONAL {
      ?s hasCoadvisor ?c .         # R3(s, c)
    }
  }
}

The query has three variables: ?s for student, ?a for advisor, and ?c for coadvisor. There are two OPTIONAL clauses, where the innermost one is the nested OPTIONAL clause. For the purpose of illustration, let us assume that each individual triple pattern in the query is translated into SQL and represented by a virtual relation that captures the corresponding variable bindings: R1(s) for (?s type GradStudent), R2(s, a) for (?s hasAdvisor ?a), and R3(s, c) for (?s hasCoadvisor ?c). Then, the SQL queries generated for this SPARQL query using our bottom-up and top-down translations, presented in [3] and [4], respectively, are:

/* Bottom-up query */
Select R1.s, a, c
From R1 Left Outer Join
     (Select R2.s, a, c
      From R2 Left Outer Join R3 On (R2.s = R3.s)) R4
     On (R1.s = R4.s Or R1.s Is Null Or R4.s Is Null)

and

/* Top-down query */
Select R4.s, a, c
From (Select R1.s, a
      From R1 Left Outer Join R2 On (R1.s = R2.s)) R4
     Left Outer Join R3 On (R4.s = R3.s And R4.a Is Not Null)

Both bottom-up and top-down SQL queries have two left outer joins; however, their join order and join conditions differ. The evaluation of these queries produces the same resulting relation, as shown in Fig. 2.

Fig. 1: Sample RDF graph (schema: classes Professor and GradStudent, properties hasAdvisor and hasCoadvisor; instances: two professors, one of them John, two graduate students, one hasAdvisor triple, and two hasCoadvisor triples, one of which has John as its object).
Fig. 2: Evaluation of top-down and bottom-up queries. (a) Bottom-up query evaluation. (b) Top-down query evaluation.

The research that we report is motivated by the following two questions: While it can be expected that relational query optimizers produce identical query execution plans for semantically equivalent bottom-up and top-down queries, is this usually the case in practice? And if not, which strategy yields faster SQL queries?
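The claim that both SQL queries produce the same relation can be checked on a small instance. The sketch below is an illustration, not part of the original study: it loads hypothetical relations R1(s), R2(s, a), and R3(s, c) into an in-memory SQLite database and runs both queries as written above. Only "John" is taken from Fig. 1; S1, S2, and P2 are placeholder instance names. The inner projections carry an explicit "As s" alias, an assumption added to keep column names unambiguous across engines. A third query drops the "R4.a Is Not Null" guard to show why the top-down join condition needs it.

```python
import sqlite3

# Hypothetical stand-in for Fig. 1: S1, S2 (students) and P2 (professor) are placeholders.
SETUP = """
CREATE TABLE R1 (s TEXT);          -- ?s type GradStudent
CREATE TABLE R2 (s TEXT, a TEXT);  -- ?s hasAdvisor ?a
CREATE TABLE R3 (s TEXT, c TEXT);  -- ?s hasCoadvisor ?c
INSERT INTO R1 VALUES ('S1'), ('S2');
INSERT INTO R2 VALUES ('S1', 'P2');
INSERT INTO R3 VALUES ('S1', 'John'), ('S2', 'P2');
"""

BOTTOM_UP = """
Select R1.s, a, c
From R1 Left Outer Join
     (Select R2.s As s, a, c From R2 Left Outer Join R3 On (R2.s = R3.s)) R4
     On (R1.s = R4.s Or R1.s Is Null Or R4.s Is Null)
"""

TOP_DOWN = """
Select R4.s, a, c
From (Select R1.s As s, a From R1 Left Outer Join R2 On (R1.s = R2.s)) R4
     Left Outer Join R3 On (R4.s = R3.s And R4.a Is Not Null)
"""

# The same top-down query without the Is Not Null guard, for comparison.
TOP_DOWN_UNGUARDED = TOP_DOWN.replace(" And R4.a Is Not Null", "")


def evaluate(query: str) -> list:
    """Run query against a fresh in-memory database; rows are sorted for comparison."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = sorted(con.execute(query).fetchall(), key=repr)
    con.close()
    return rows


if __name__ == "__main__":
    print(evaluate(BOTTOM_UP))           # [('S1', 'P2', 'John'), ('S2', None, None)]
    print(evaluate(TOP_DOWN))            # [('S1', 'P2', 'John'), ('S2', None, None)]
    print(evaluate(TOP_DOWN_UNGUARDED))  # [('S1', 'P2', 'John'), ('S2', None, 'P2')]
```

Without the guard, student S2 receives a coadvisor even though no advisor was found, which is exactly the case the query is designed to exclude.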
In search of the answers, this paper presents i) a bottom-up nested optional graph pattern translation algorithm, ii) a top-down nested optional graph pattern translation algorithm, and iii) a performance study featuring SPARQL queries with nested optional graph patterns over RDF databases created in Oracle, DB2, and PostgreSQL.

The organization of this paper is as follows. Related work is discussed in Section 2. Notation and preliminary definitions are introduced in Section 3. Our algorithms for bottom-up and top-down nested optional graph pattern translation are presented in Sections 4 and 5, respectively. Finally, our performance study and conclusions are reported in Sections 6 and 7.

2. Related Work
In recent years, a number of relational RDF database systems have been developed to support large-scale Semantic Web applications [2]. Representatives of such systems include Jena, Sesame, 3store, KAON, RStar, OpenLink Virtuoso, PARKA, DLDB, DBOWL, RDFSuite, RDFBroker, RDFProv, and S2ST (see [2] or [3] for a survey). While they share a common design pattern, they differ in their database schemas, inference support, and the algorithms that map RDF data and queries to the relational model. One of the most complex mappings in relational RDF databases is the SPARQL-to-SQL query mapping, or translation [3], [5], [6], [4], [7], [8]. Existing algorithms translate SPARQL queries to SQL using either a bottom-up or a top-down strategy, resulting in semantically equivalent but syntactically different relational queries. To the best of our knowledge, this work is the first to compare bottom-up and top-down query translation in the context of complex nested optional graph patterns. The importance of such a comparison is twofold: it gives insight into the query optimization problem of choosing a good translation strategy for a particular query, and it motivates future research on a potentially hybrid translation strategy in which both bottom-up and top-down approaches are employed. While we present this work in the context of relational RDF databases, its insights are also beneficial for query optimization in non-relational RDF databases, such as the emerging Hadoop- and HBase-based RDF data management systems in cloud environments [9], [10]. Other related work on RDF query optimization that is complementary to our research includes containment and minimization of RDF/S query patterns [11], SPARQL query rewriting [12], and various RDF data indexing techniques [13], [14], [15], [16].

3. Notation and Preliminary Definitions
Let I, B, L, and V denote pairwise disjoint infinite sets of Internationalized Resource Identifiers (IRIs), blank nodes, literals, and variables, respectively. Let \(IB\), \(IL\), \(IV\), \(IBL\), and \(IVL\) denote \(I \cup B\), \(I \cup L\), \(I \cup V\), \(I \cup B \cup L\), and \(I \cup V \cup L\), respectively. Elements of the set \(IBL\) are also called RDF terms. In the following, we formalize the notions of RDF triple, RDF graph, triple pattern, basic graph pattern, and nested optional graph pattern.

Definition 3.1 (RDF triple and RDF graph): An RDF triple t is a tuple \((s, p, o) \in (IB) \times I \times (IBL)\), where s, p, and o are a subject, predicate, and object, respectively. An RDF graph G is a set of RDF triples.

Definition 3.2 (Triple pattern): A triple pattern tp is a triple \((s_p, p_p, o_p) \in (IVL) \times (IV) \times (IVL)\), where \(s_p\)², \(p_p\), and \(o_p\) are a subject pattern, predicate pattern, and object pattern, respectively.

Definition 3.3 (Basic graph pattern): A basic graph pattern bgp is a set of triple patterns \(\{tp_1, tp_2, \ldots, tp_{n-1}, tp_n\}\), also denoted as \(tp_1\) AND \(tp_2\) AND ... AND \(tp_{n-1}\) AND \(tp_n\), where AND is a binary operator that corresponds to conjunction in SPARQL and n is the number of triple patterns in bgp.
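Definitions 3.1 to 3.3 can be made concrete with a small in-memory evaluator. The following is a minimal sketch, not the relational translation studied in this paper: RDF terms are plain strings, variables are assumed to start with "?" or "$", a triple pattern matches a triple when its constant terms are equal and its variables bind consistently, and a basic graph pattern is evaluated as the conjunction of its triple patterns with a nested loop. The sample graph uses hypothetical instance names (S1, S2, P2) alongside John.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical sample graph; only "John" comes from Fig. 1.
SAMPLE_G: List[Triple] = [
    ("John", "type", "Professor"), ("P2", "type", "Professor"),
    ("S1", "type", "GradStudent"), ("S2", "type", "GradStudent"),
    ("S1", "hasAdvisor", "P2"),
]


def is_var(term: str) -> bool:
    # Assumption: variables are written with a leading '?' or '$', as in SPARQL.
    return term[:1] in ("?", "$")


def match(tp: Triple, t: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that triple pattern tp matches RDF triple t, or return None."""
    extended = dict(binding)
    for pattern_term, term in zip(tp, t):
        if is_var(pattern_term):
            if extended.setdefault(pattern_term, term) != term:
                return None  # variable already bound to a different RDF term
        elif pattern_term != term:
            return None  # an IRI or literal in the pattern must match exactly
    return extended


def eval_bgp(bgp: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate tp_1 AND ... AND tp_n: each solution must satisfy every pattern."""
    solutions: List[Binding] = [{}]
    for tp in bgp:
        next_solutions = []
        for b in solutions:
            for t in graph:
                extended = match(tp, t, b)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    print(eval_bgp([("?s", "type", "GradStudent"), ("?s", "hasAdvisor", "?a")], SAMPLE_G))
    # [{'?s': 'S1', '?a': 'P2'}]
```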
Definition 3.4 (Nested optional graph pattern): A nested optional graph pattern nogp has the form \(bgp_1 \;\mathrm{OPT}\; \{ bgp_2 \;\mathrm{OPT}\; \{ \ldots \{ bgp_{n-1} \;\mathrm{OPT}\; \{ bgp_n \} \} \ldots \} \}\), where OPT corresponds to the OPTIONAL construct in SPARQL, curly braces {} denote nesting of graph patterns, and \(n \geq 3\) is the number of basic graph patterns in nogp.

Formal semantics of RDF and SPARQL are described in [17], [5], [3]. In this paper, to achieve the semantic equivalence of SQL queries that result from bottom-up and top-down SPARQL-to-SQL translation, we require nested optional graph patterns to be well-designed [5], such that for any sub-pattern \(\{ bgp_{i-1} \;\mathrm{OPT}\; \{ bgp_i \} \}\) in nogp, if a variable ?v occurs both outside this sub-pattern and inside \(bgp_i\), then ?v also occurs in \(bgp_{i-1}\).

² Note that a triple pattern can have a literal as a subject pattern, while an RDF triple cannot have a literal as a subject. This inconsistency between the current RDF [1] and SPARQL [17] specifications does not affect our work and will most likely be resolved by the W3C.

In order to support a generic translation of SPARQL graph patterns into equivalent SQL queries over different database schemas, we need a generic representation of a relational RDF storage scheme that models the following information: (1) which relation is used to store the RDF triples that can potentially match a triple pattern, and (2) which relational attributes of that relation are used to store the components (subject, predicate, and object) of triples. To capture this information, we formalize the relational RDF storage scheme as the following two RDF-to-Relational mappings, α and β.

Definition 3.5 (Mapping α): Given the set of all possible triple patterns \(TP = (IVL) \times (IV) \times (IVL)\) and a set of relations REL in a relational RDF database, a mapping α is a many-to-one mapping \(\alpha : TP \rightarrow REL\) such that, given a triple pattern \(tp \in TP\), \(\alpha(tp)\) is a relation in which all the triples that may match tp are stored.
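The well-designedness requirement attached to Definition 3.4 can be checked mechanically. The sketch below follows one reading of it: it applies the general condition of Pérez et al. [5] (for every sub-pattern P1 OPT P2, a variable that occurs inside P2 and outside the sub-pattern must also occur in P1) to the chain shape of Definition 3.4, where P1 is \(bgp_{i-1}\) and P2 contains \(bgp_i, \ldots, bgp_n\). Under that reading, the condition holds exactly when every variable occurs in each basic graph pattern between its first and last occurrence. Only variable sets are needed; computing them plays the role of function var.

```python
from typing import List, Set


def is_well_designed(bgp_vars: List[Set[str]]) -> bool:
    """bgp_vars[k] holds var(bgp_{k+1}) for nogp = bgp_1 OPT { bgp_2 OPT { ... bgp_n } }."""
    for v in set().union(*bgp_vars):
        levels = [k for k, vs in enumerate(bgp_vars) if v in vs]
        # Applied to every sub-pattern bgp_{i-1} OPT {...}, the Perez et al. condition
        # holds for v exactly when v occurs in every bgp between its first and last
        # occurrence, i.e. when its nesting levels form a contiguous range.
        if levels[-1] - levels[0] + 1 != len(levels):
            return False
    return True


if __name__ == "__main__":
    # The query from Section 1: ?s occurs at every level.
    print(is_well_designed([{"?s"}, {"?s", "?a"}, {"?s", "?c"}]))      # True
    # ?y occurs in bgp_1 and bgp_3 but skips bgp_2.
    print(is_well_designed([{"?x", "?y"}, {"?x", "?z"}, {"?y", "?w"}]))  # False
```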
Definition 3.6 (Mapping β): Given the set of all possible triple patterns \(TP = (IVL) \times (IV) \times (IVL)\), a set \(POS = \{sub, pre, obj\}\), and a set of relational attributes ATR in a relational RDF database, a mapping β is a many-to-one mapping \(\beta : TP \times POS \rightarrow ATR\) such that, given a triple pattern \(tp \in TP\) and a position \(pos \in POS\), \(\beta(tp, pos)\) is a relational attribute whose values may match tp at position pos.

Examples of different storage schemes captured with α and β can be found in our prior work [3]. In addition to mappings α and β, our translation uses three auxiliary functions: (1) a function alias that generates a unique alias for a relation, (2) a function var that returns the set of all variables in a graph pattern, and (3) a function name that generates a unique name for a variable in V, such that the generated name conforms to the SQL syntax for relational attribute names (e.g., a variable can be renamed by simply removing the initial "?" or "$"). Finally, for brevity of presentation, we assume the existence of an algorithm that translates SPARQL basic graph patterns into fully flat SQL queries. We denote such an algorithm as function BGPtoFlatSQL; a similar algorithm is presented in [18].

4. Bottom-Up Nested Optional Graph Pattern Translation
The bottom-up approach to SPARQL-to-SQL query translation is well studied in the literature [3] and implemented in many relational RDF databases. This section presents an algorithm that implements one of our translation rules described in [3]. It should be noted that, while this paper assumes that nested OPTIONAL clauses contain basic graph patterns, which is sufficient for our study, in the general case other graph patterns, such as sequential optional graph patterns and alternative graph patterns, are possible. The algorithm uses the translation rule for the general case with one additional simplification that eliminates calls to the Coalesce function for some attributes in the projection list. The use of Coalesce is redundant when only basic graph patterns are assumed in OPTIONAL clauses; however, other simplifications of join conditions are not applied.

Our bottom-up translation function NOGPtoSQL-BU is outlined in Algorithm 1. It visits each basic graph pattern in a SPARQL nested optional graph pattern nogp, starting from \(bgp_n\) and going up to \(bgp_1\). Each basic graph pattern is translated to SQL using function BGPtoFlatSQL, producing a flat SQL query. During the first loop iteration, the translation of \(bgp_n\) is assigned to variable sql and the translation of \(bgp_{n-1}\) is assigned to variable sql_i. A new SQL query that computes a left outer join between the virtual relations sql_i and sql is then constructed. This query contains: "($sql_i) $a1 Left Outer Join ($sql) $a2" in its From clause, where a1 and a2 are unique aliases; a join condition "$a1.$ra = $a2.$ra Or $a1.$ra Is Null Or $a2.$ra Is Null" in its On clause, which requires each relational attribute ra common to a1 and a2 to be equal in both or Null in one of them; and a projection list in its Select clause consisting of all attributes of a1 and all remaining attributes of a2. This newly constructed query is assigned to variable sql, overwriting its previous value. The next loop iteration repeats the procedure with the new value of sql and a new value of sql_i that now holds the translation of \(bgp_{n-2}\). After the final iteration, the value of sql represents the fully generated query and is returned.

5. Top-Down Nested Optional Graph Pattern Translation
One of the first top-down SPARQL-to-SQL query translations found in the literature is described in our unpublished report [4]. This section summarizes our solution for the case when only basic graph patterns are used in OPTIONAL clauses.
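The two translation loops, the bottom-up loop of Algorithm 1 and the top-down loop of Algorithm 2, can be contrasted in a short sketch. This is a minimal Python illustration under stated assumptions, not the authors' Java implementation in S2ST. BGPtoFlatSQL is not reproduced: each basic graph pattern is assumed to be translated already into a hypothetical FlatSQL pair (flat SQL text, Select-clause attribute names). Aliases t1, t2, ... stand in for function alias, and "1 = 1" replaces the True seed of Algorithm 1 because not every engine accepts a boolean literal. For the top-down guard, the sketch takes as v the first attribute whose variable first occurs in \(bgp_{i-1}\), and it raises an error where [4] would introduce a dummy attribute. The toy relations mirror R1(s), R2(s, a), and R3(s, c) from Section 1, with placeholder instance names.

```python
import sqlite3
from itertools import count
from typing import List, Tuple

FlatSQL = Tuple[str, List[str]]  # hypothetical: (flat SQL text, Select-clause attributes)


def _select(a1: str, attrs1: List[str], a2: str, attrs2: List[str]) -> Tuple[str, List[str]]:
    """Project all attributes of a1, then the attributes that occur only in a2."""
    extra = [ra for ra in attrs2 if ra not in attrs1]
    cols = [f"{a1}.{ra} As {ra}" for ra in attrs1] + [f"{a2}.{ra} As {ra}" for ra in extra]
    return ", ".join(cols), attrs1 + extra


def nogp_to_sql_bu(flat: List[FlatSQL]) -> str:
    """Bottom-up (Algorithm 1): flat[k] translates bgp_{k+1}; fold from bgp_n up to bgp_1."""
    aliases = (f"t{k}" for k in count(1))
    sql, attrs = flat[-1]
    for sql_i, attrs_i in reversed(flat[:-1]):
        a1, a2 = next(aliases), next(aliases)
        conds = [f"({a1}.{ra} = {a2}.{ra} Or {a1}.{ra} Is Null Or {a2}.{ra} Is Null)"
                 for ra in attrs_i if ra in attrs]
        select, attrs = _select(a1, attrs_i, a2, attrs)
        sql = (f"Select {select} From ({sql_i}) {a1} Left Outer Join ({sql}) {a2} "
               f"On ({' And '.join(conds) or '1 = 1'})")
    return sql


def nogp_to_sql_td(flat: List[FlatSQL]) -> str:
    """Top-down (Algorithm 2): fold from bgp_1 down to bgp_n with an Is Not Null guard."""
    aliases = (f"t{k}" for k in count(1))
    sql, attrs = flat[0]
    seen: set = set()  # attributes of bgp_1 .. bgp_{i-2}
    for k in range(1, len(flat)):
        sql_i, attrs_i = flat[k]
        # v must correspond to a variable that first occurs in bgp_{i-1}, i.e. flat[k - 1].
        fresh = [ra for ra in flat[k - 1][1] if ra not in seen]
        if not fresh:
            raise ValueError("no fresh attribute; a dummy variable is needed (see [4])")
        seen.update(flat[k - 1][1])
        a1, a2 = next(aliases), next(aliases)
        conds = [f"{a1}.{fresh[0]} Is Not Null"] + [
            f"{a1}.{ra} = {a2}.{ra}" for ra in attrs if ra in attrs_i]
        select, attrs = _select(a1, attrs, a2, attrs_i)
        sql = (f"Select {select} From ({sql}) {a1} Left Outer Join ({sql_i}) {a2} "
               f"On ({' And '.join(conds)})")
    return sql


def run_on_toy_db(sql: str) -> list:
    """Evaluate sql over hypothetical relations R1(s), R2(s, a), R3(s, c)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        "CREATE TABLE R1 (s TEXT); CREATE TABLE R2 (s TEXT, a TEXT);"
        "CREATE TABLE R3 (s TEXT, c TEXT);"
        "INSERT INTO R1 VALUES ('S1'), ('S2');"
        "INSERT INTO R2 VALUES ('S1', 'P2');"
        "INSERT INTO R3 VALUES ('S1', 'John'), ('S2', 'P2');")
    rows = sorted(con.execute(sql).fetchall(), key=repr)
    con.close()
    return rows


# Flat translations of the three basic graph patterns of the Section 1 query.
FLAT: List[FlatSQL] = [("Select s From R1", ["s"]),
                       ("Select s, a From R2", ["s", "a"]),
                       ("Select s, c From R3", ["s", "c"])]

if __name__ == "__main__":
    print(nogp_to_sql_bu(FLAT))
    print(nogp_to_sql_td(FLAT))
    print(run_on_toy_db(nogp_to_sql_bu(FLAT)))  # [('S1', 'P2', 'John'), ('S2', None, None)]
    print(run_on_toy_db(nogp_to_sql_td(FLAT)))  # same rows
```

Unlike the hand-simplified queries in Section 1, the generated SQL keeps every "Or ... Is Null" disjunct and a redundant Not Null guard on the first top-down join, as the algorithms prescribe; the two strategies differ only in traversal direction and in the shape of the join condition.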
Our top-down translation function NOGPtoSQL-TD is outlined in Algorithm 2. The logic of this algorithm is similar to the logic described for NOGPtoSQL-BU. One obvious difference is that function NOGPtoSQL-TD visits each basic graph pattern in a SPARQL nested optional graph pattern nogp starting from \(bgp_1\) and going down to \(bgp_n\). The other difference lies in how a join condition is generated. It encodes the following semantics: before a nested optional graph pattern can succeed, all containing optional graph patterns must have succeeded. Therefore, a join condition must check that the basic graph pattern in the containing OPTIONAL clause has a solution. This is achieved via a Not Null check on a relational attribute with special properties: this attribute must appear in the Select clause of sql, since the translation of the containing graph pattern is part of sql, and it must correspond to a variable that first occurs in the basic graph pattern of the containing OPTIONAL clause and not in any preceding basic graph pattern. If such an attribute is not readily available, a new attribute for a dummy variable can be introduced in a basic graph pattern to perform the check. Further details on this solution can be found in [4].

Algorithm 1 Bottom-up translation of SPARQL nested optional graph patterns to SQL queries
function NOGPtoSQL-BU
  input: nested optional graph pattern nogp; mappings α and β; functions alias, var, and name
  output: bottom-up SQL query
  Let nogp = bgp_1 OPT { bgp_2 OPT { ... { bgp_{n-1} OPT { bgp_n } } ... } } and n ≥ 3
  //Construct a bottom-up SQL query:
  sql = BGPtoFlatSQL(bgp_n, α, β, alias, var, name)
  for i = n − 1; i ≥ 1; i = i − 1 do
    //Construct the SQL From clause:
    sql_i = BGPtoFlatSQL(bgp_i, α, β, alias, var, name)
    a1 = alias(); a2 = alias()
    from = "($sql_i) $a1 Left Outer Join ($sql) $a2"
    //Construct a join condition:
    cond = "True"
    for each relational attribute ra that appears in the Select clauses of both sql_i and sql do
      cond += " And ($a1.$ra = $a2.$ra Or $a1.$ra Is Null Or $a2.$ra Is Null)"
    end for
    //Construct the SQL Select clause:
    select = ""
    for each relational attribute ra that appears in the Select clause of sql_i do
      select += "$a1.$ra As $ra, "
    end for
    for each relational attribute ra that appears in the Select clause of sql but not sql_i do
      select += "$a2.$ra As $ra, "
    end for
    remove the trailing ", " from select
    sql = "Select $select From $from On ($cond)"
  end for
  return sql
end function

6. Performance Study
This section reports our query performance study, conducted using the WordNet dataset and test SPARQL queries that were translated to SQL using the proposed bottom-up and top-down query translation algorithms and evaluated in three relational database management systems.

6.1 Experimental Setup
The experiments were conducted on a server with two 2GHz Intel Xeon E5504 Nehalem CPUs, 32GB of RAM, and a 6TB disk array, running Ubuntu 9.04 Jaunty x64. Three database management systems, namely Oracle 10.2 Express Edition, DB2 9.7 Express-C, and PostgreSQL 8.3.12, were installed on the server.
Algorithm 2 Top-down translation of SPARQL nested optional graph patterns to SQL queries
function NOGPtoSQL-TD
  input: well-designed nested optional graph pattern nogp; mappings α and β; functions alias, var, and name
  output: top-down SQL query
  Let nogp = bgp_1 OPT { bgp_2 OPT { ... { bgp_{n-1} OPT { bgp_n } } ... } } and n ≥ 3
  //Construct a top-down SQL query:
  sql = BGPtoFlatSQL(bgp_1, α, β, alias, var, name)
  for i = 2; i ≤ n; i = i + 1 do
    //Construct the SQL From clause:
    sql_i = BGPtoFlatSQL(bgp_i, α, β, alias, var, name)
    a1 = alias(); a2 = alias()
    from = "($sql) $a1 Left Outer Join ($sql_i) $a2"
    //Construct a join condition:
    Let v be a relational attribute such that (1) v appears in the Select clause of sql, (2) v = name(?v) corresponds to a variable ?v, and (3) ?v ∈ var(bgp_{i-1}) − (var(bgp_1) ∪ var(bgp_2) ∪ ... ∪ var(bgp_{i-2})), i.e., ?v first occurs in bgp_{i-1} and not in bgp_1, bgp_2, ..., bgp_{i-2}. If sql has no attribute that satisfies these conditions, a dummy attribute must be introduced as discussed in [4].
    cond = "$a1.$v Is Not Null"
    for each relational attribute ra that appears in the Select clauses of both sql and sql_i do
      cond += " And $a1.$ra = $a2.$ra"
    end for
    //Construct the SQL Select clause:
    select = ""
    for each relational attribute ra that appears in the Select clause of sql do
      select += "$a1.$ra As $ra, "
    end for
    for each relational attribute ra that appears in the Select clause of sql_i but not sql do
      select += "$a2.$ra As $ra, "
    end for
    remove the trailing ", " from select
    sql = "Select $select From $from On ($cond)"
  end for
  return sql
end function

Our algorithms were implemented in Java 6 within the S2ST³ system; the generic schema and data mapping algorithms supported by S2ST were used, respectively, to generate identical database schemas in Oracle, DB2, and PostgreSQL and to store the RDF dataset in the databases.

6.2 Dataset and Test Queries
The OWL representation of WordNet⁴ was chosen for our experiments.
WordNet is a lexical database for the English language, which organizes English words into synonym sets according to part of speech (e.g., noun, verb, etc.) and enumerates linguistic relations between these sets. In WordNet.OWL, each part of speech is modeled as an owl:Class, and each linguistic relation is modeled as an owl:ObjectProperty, owl:DatatypeProperty, owl:TransitiveProperty, or owl:SymmetricProperty. The relevant statistics for the WordNet dataset are shown in Table 1. For example, WordNet.OWL contains 251,726 triples with rdf:type as the predicate, and 140,470 of them have wn:WordObject as the object.

³ S2ST: Next-Generation Relational RDF Database Management System (RRDBMS), http://www.s2st.org
⁴ WordNet (version 1.2), a lexical database for English, http://wordnet.princeton.edu

Table 1: Properties and Resources in WordNet 1.2
Property | Count | Resource | Count
type | 251,726 | WordObject | 140,470
wordForm | 195,802 | Noun | 75,804
glossaryEntry | 111,223 | Verb | 13,214
hyponymOf | 90,267 | AdjectiveSatellite | 11,231
similarTo | 22,494 | Adjective | 7,345
antonymOf | 7,115 | Adverb | 3,629
Other | 36,225 | Other | 33
Total | 714,852 | Total | 251,726

Table 2 shows the 22 SPARQL queries over the WordNet dataset that were carefully selected for our experiments. In the table, W stands for WHERE and O stands for OPTIONAL; the SPARQL SELECT clause is omitted for brevity, and the projection includes all distinct variables of a query. Queries Q1-Q6 are constructed as all possible permutations of the three triple patterns occurring outside and inside the OPTIONAL clauses. These queries have one nested OPTIONAL clause. Queries Q1'-Q6' and Q1''-Q6'' are obtained from the respective queries Q1-Q6 by restricting variable values in the first and second triple patterns, respectively. The rationale for such restrictions is to reduce the cardinalities of the intermediate relations resulting from the first left outer join in the queries. In particular, in terms of intermediate relation size, Q1'-Q6' favor the top-down approach and Q1''-Q6'' favor the bottom-up approach.
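The construction of Q1-Q6 as permutations, and of the hyponymOf chains Q7 and Q8, can be reproduced with a short generator. The sketch below is illustrative: it renders nested optional patterns in the W{...}/O{...} shorthand of Table 2, with the property names spelled as in Table 2 and prefixes left unresolved. Note that itertools.permutations yields the six queries in the order Q1, Q2, Q3, Q5, Q4, Q6 rather than in Table 2's numbering.

```python
from itertools import permutations
from typing import List, Sequence

# The three triple patterns of Q1-Q6, written as in Table 2.
TYPE = "?a rdf:type :Adjective"
WORD_FORM = "?a :wordForm ?c"
GLOSSARY = "?a :glossaryEntry ?b"


def nested_optional(patterns: Sequence[str]) -> str:
    """Render tp_1 OPT { tp_2 OPT { ... } } in Table 2's W{...}/O{...} shorthand."""
    first, *rest = patterns
    opens = "".join(f" O{{{tp}" for tp in rest)
    return f"W{{{first}{opens}" + "}" * (len(rest) + 1)


def hyponym_chain(k: int) -> str:
    """Q7 (k = 3) and Q8 (k = 6): ?n1 :hyponymOf ?n2, ?n2 :hyponymOf ?n3, ..."""
    return nested_optional([f"?n{j} :hyponymOf ?n{j + 1}" for j in range(1, k + 1)])


PERMUTED: List[str] = [nested_optional(p) for p in permutations([TYPE, WORD_FORM, GLOSSARY])]

if __name__ == "__main__":
    for q in PERMUTED:
        print(q)
    print(hyponym_chain(3))  # W{?n1 :hyponymOf ?n2 O{?n2 :hyponymOf ?n3 O{?n3 :hyponymOf ?n4}}}
```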
We chose not to restrict variable values in the third triple pattern, the one in the nested OPTIONAL clause, in any of queries Q1-Q6, because the relation that results from matching the third triple pattern is always used as the right operand of a left outer join and can therefore only marginally influence the join result for the given dataset and queries. Finally, queries Q7, Q8, Q7', and Q8' are interesting because they only include triple patterns of the same form, with the same predicate and with variables as subject and object patterns. From the viewpoint of bottom-up and top-down translation, these queries are symmetric.

Table 2: Test SPARQL Queries
Q# | SPARQL
Q1 | W{?a rdf:type :Adjective O{?a :wordForm ?c O{?a :glossaryEntry ?b}}}
Q2 | W{?a rdf:type :Adjective O{?a :glossaryEntry ?b O{?a :wordForm ?c}}}
Q3 | W{?a :wordForm ?c O{?a rdf:type :Adjective O{?a :glossaryEntry ?b}}}
Q4 | W{?a :glossaryEntry ?b O{?a rdf:type :Adjective O{?a :wordForm ?c}}}
Q5 | W{?a :wordForm ?c O{?a :glossaryEntry ?b O{?a rdf:type :Adjective}}}
Q6 | W{?a :glossaryEntry ?b O{?a :wordForm ?c O{?a rdf:type :Adjective}}}
Q7 | W{?n1 :hyponymOf ?n2 O{?n2 :hyponymOf ?n3 O{?n3 :hyponymOf ?n4}}}
Q8 | W{?n1 :hyponymOf ?n2 O{?n2 :hyponymOf ?n3 O{?n3 :hyponymOf ?n4 O{?n4 :hyponymOf ?n5 O{?n5 :hyponymOf ?n6 O{?n6 :hyponymOf ?n7}}}}}}
Q1'-Q6' | Same as the respective queries Q1-Q6, but with one variable in the first triple pattern (the W clause) restricted to a URI or literal
Q1''-Q6'' | Same as the respective queries Q1-Q6, but with one variable in the second triple pattern (the first O clause) restricted to a URI or literal
Q7' | Same as Q7, but with ?n1 and ?n4 restricted to URIs
Q8' | Same as Q8, but with ?n1 and ?n7 restricted to URIs

6.3 Bottom-Up and Top-Down Query Performance
The S2ST system was used to generate a database schema with property relations [3] and to load WordNet.OWL into Oracle, DB2, and PostgreSQL. The test SPARQL queries were translated to SQL using algorithms NOGPtoSQL-BU and NOGPtoSQL-TD. The resulting SQL queries were evaluated by each RDBMS. To avoid an unintended comparison of the three RDBMSs, Fig. 3 reports, for each test query, the ratio of the bottom-up query evaluation time to the top-down query evaluation time. In the figure, if ratio > 1, the top-down query was faster; if ratio < 1, the bottom-up query was faster; and if ratio = 1, the top-down and bottom-up queries showed the same execution time.

Fig. 3(a): Ratios over an RDF database instantiated in Oracle.

Our first observation was that bottom-up and top-down queries generally showed different execution times. This observation gave a definite "No" answer to the question "While it can be expected that relational query optimizers produce identical query execution plans for semantically equivalent bottom-up and top-down queries, is this usually the case in practice?" for SPARQL queries with nested optional graph patterns. Our second observation was that different database management systems showed quite different, and sometimes even contradictory, query evaluation ratios. For example, Oracle showed much less contrast between the bottom-up and top-down approaches than DB2 and PostgreSQL did. Some queries, such as Q1, Q3, Q4, Q5, and Q6, showed different classes of ratios (> 1, < 1, and = 1) in different databases.
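The ratio classes used in Fig. 3 follow directly from the definition of the ratio as bottom-up time divided by top-down time. The sketch below classifies a pair of timings and checks whether one query lands in different classes on different systems. The tolerance parameter is an assumption, since the paper does not state how "= 1" was decided for measured times; with the default of 0.0 only exactly equal times count as "= 1". The timings used in the usage lines are placeholders, not measurements from this study.

```python
from typing import Dict, Tuple


def classify(bu_time: float, td_time: float, tol: float = 0.0) -> str:
    """Classify r = bottom-up time / top-down time, as plotted in Fig. 3."""
    if bu_time <= 0 or td_time <= 0:
        raise ValueError("execution times must be positive")
    r = bu_time / td_time
    if abs(r - 1.0) <= tol:
        return "= 1 (same time)"
    return "> 1 (top-down faster)" if r > 1.0 else "< 1 (bottom-up faster)"


def mixed_classes(times: Dict[str, Tuple[float, float]], tol: float = 0.0) -> bool:
    """True if one query falls into different ratio classes on different RDBMSs."""
    return len({classify(bu, td, tol) for bu, td in times.values()}) > 1


if __name__ == "__main__":
    print(classify(1.5, 0.5))  # > 1 (top-down faster)
    print(mixed_classes({"system A": (2.0, 1.0), "system B": (1.0, 2.0)}))  # True
```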
For example, for Q6'', the bottom-up approach was slower than the top-down approach in Oracle, equivalent to it in DB2, and faster than it in PostgreSQL. Our third observation was that the selectivities of the participating triple patterns and their positions in a SPARQL query had a significant impact on which SPARQL-to-SQL translation strategy won. This can be explained by the corresponding effect of the cardinalities of join-participating and intermediate relations on the top-down and bottom-up SQL queries. In particular, top-down queries Q1 and Q2 were consistently faster in all experiments: the first triple pattern, ?a rdf:type :Adjective, yielded the smallest result set of 7,345 triples (the other two triple patterns yielded results over 10 times larger), and therefore the intermediate relation in the top-down queries was also small, over 10 times smaller than the intermediate relation in the corresponding bottom-up queries. When ?a rdf:type :Adjective occurred in the first OPTIONAL clause of Q3 and Q4, the situation was reversed: the intermediate relation in the bottom-up queries was over 10 times smaller than the intermediate relation in the corresponding top-down queries. However, while all three systems showed a lower ratio than for Q1 and Q2, only Oracle showed an advantage for the bottom-up approach; DB2 and PostgreSQL still ran the top-down queries faster. Moving ?a rdf:type :Adjective to the nested OPTIONAL clause in Q5 and Q6 did not favor either translation strategy, since the last triple pattern did not influence the size of an intermediate relation. Top-down queries Q5 and Q6 were consistently faster in all experiments. Next, restricting the first triple pattern in Q1'-Q6' to 1 or 2 matching triples, which was favorable for the top-down approach, showed that the top-down queries were faster than, or as fast as, the corresponding bottom-up queries. Interestingly, Oracle showed identical performance for the top-down and bottom-up versions of Q1'-Q6'. Finally, Q1''-Q6'', which restricted the second triple pattern and favored the bottom-up approach, showed a consistent performance pattern only in PostgreSQL, where the bottom-up queries were faster. In Oracle and DB2, some queries showed a similar pattern: top-down queries Q1'' and Q5'' were faster, and bottom-up queries Q3'' and Q4'' were faster. In addition, the bottom-up and top-down versions of Q6'' showed identical times in DB2, while top-down Q6'' was faster in Oracle; bottom-up query Q2'' was significantly faster in Oracle (the smallest ratio in our experiments) but only as fast as top-down query Q2'' in DB2.

Fig. 3: Bottom-up and top-down query performance. (a) Over an RDF database instantiated in Oracle. (b) Over an RDF database instantiated in DB2. (c) Over an RDF database instantiated in PostgreSQL.

Our fourth observation was that the symmetric queries Q7 and Q8 (and similarly Q7' and Q8'), which are neutral with respect to the top-down and bottom-up translation strategies, showed better performance for the top-down queries. The ratios were significantly larger for DB2 and PostgreSQL, while in Oracle they ranged only from 1.19 to 2.12. These symmetric queries showed that, in the general case (with no particular bias toward one or the other translation strategy), the top-down approach is superior to the bottom-up approach.

Our fifth and last observation was that the choice of translation strategy could have a tremendous impact on the resulting query performance. In one case, Q2'' in Oracle, the bottom-up query was over 600 times faster than the top-down query. In 12 other cases (all of which occurred in experiments with DB2 and PostgreSQL), the ratios were greater than 1,000 in favor of the top-down queries.

6.4 Summary
The performance study answers the two questions of this paper. For the first question, our results imply that, in the general case, a relational RDF database designer cannot rely on a relational query optimizer to produce identical, or nearly identical, query execution plans for the semantically equivalent SQL queries that result from bottom-up and top-down translation of SPARQL queries.
To answer the second question, neither of the two approaches is universally better than the other. The performance of queries produced by the bottom-up and top-down translation strategies depends on many factors, including the selectivities of triple patterns, their order and location in a SPARQL query, and even the relational engine that evaluates the translated queries. We made a number of important observations that suggest directions for how a SPARQL query optimizer can choose the best translation strategy for a particular query; this choice can have a tremendous impact on query performance.

7. Conclusion and Future Work
In this paper, we studied the bottom-up and top-down SPARQL-to-SQL translation strategies and compared them empirically in the context of SPARQL queries with nested optional graph patterns. We proposed bottom-up and top-down nested optional graph pattern translation algorithms and compared their resulting SQL queries in Oracle, DB2, and PostgreSQL. Our performance study suggested that the choice between bottom-up and top-down translation algorithms can have dramatic performance implications for the resulting SQL queries. This choice depends on many factors, including the selectivities of triple patterns, their order and location in a SPARQL query, and even the relational engine that evaluates the translated queries. In the future, we will research a formal framework for optimizing SPARQL queries and define heuristics for choosing a good translation strategy for a SPARQL query.

References
[1] W3C, Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation, 10 February 2004. G. Klyne, J. J. Carroll, and B. McBride (Eds.), 2004.
[2] A. Chebotko and S. Lu, Querying the Semantic Web: An Efficient Approach Using Relational Databases. LAP Lambert Academic Publishing, 2009.
[3] A. Chebotko, S. Lu, and F. Fotouhi, "Semantics preserving SPARQL-to-SQL translation," Data & Knowledge Engineering (DKE), vol. 68, no. 10, pp. 973–1000, 2009.
[4] A. Chebotko, S. Lu, H. M. Jamil, and F. Fotouhi, "Semantics preserving SPARQL-to-SQL query translation for optional graph patterns," Wayne State University, Tech. Rep. TR-DB-052006-CLJF, May 2006, available from http://www.cs.wayne.edu/~artem/main/research/TR-DB-052006-CLJF.pdf.
[5] J. Pérez, M. Arenas, and C. Gutierrez, "Semantics and complexity of SPARQL," ACM Transactions on Database Systems (TODS), vol. 34, no. 3, pp. 16:1–16:45, 2009.
[6] R. Cyganiak, "A relational algebra for SPARQL," Hewlett-Packard Laboratories, Tech. Rep. HPL-2005-170, 2005, available from http://www.hpl.hp.com/techreports/2005/hpl-2005-170.html.
[7] F. Zemke, "Converting SPARQL to SQL," Tech. Rep., October 2006, available from http://lists.w3.org/archives/public/public-rdf-dawg/2006OctDec/att-0058/sparql-to-sql.pdf.
[8] S. Harris and N. Shadbolt, "SPARQL query processing with conventional relational database systems," in Proc. of SSWS, 2005, pp. 235–244.
[9] M. F. Husain, L. Khan, M. Kantarcioglu, and B. M. Thuraisingham, "Data intensive query processing for large RDF graphs using cloud computing tools," in Proc. of CLOUD, 2010, pp. 1–10.
[10] C. Franke, S. Morin, A. Chebotko, J. Abraham, and P. Brazier, "Distributed Semantic Web data management in HBase and MySQL Cluster," in Proc. of CLOUD, 2011.
[11] G. Serfiotis, I. Koffina, V. Christophides, and V. Tannen, "Containment and minimization of RDF/S query patterns," in Proc. of ISWC, 2005, pp. 607–623.
[12] O. Hartig and R. Heese, "The SPARQL query graph model for query optimization," in Proc. of ESWC, 2007, pp. 564–578.
[13] A. Harth and S. Decker, "Optimized index structures for querying RDF from the Web," in Proc. of LA-WEB, 2005, pp. 71–80.
[14] O. Udrea, A. Pugliese, and V. S. Subrahmanian, "GRIN: A graph based RDF index," in Proc. of AAAI, 2007, pp. 1465–1470.
[15] C. Weiss, P. Karras, and A. Bernstein, "Hexastore: sextuple indexing for Semantic Web data management," Proc. of PVLDB, vol. 1, no. 1, pp. 1008–1019, 2008.
[16] G. H. L. Fletcher and P. W. Beck, "Scalable indexing of RDF graphs for efficient join processing," in Proc. of CIKM, 2009, pp. 1513–1516.
[17] W3C, SPARQL Query Language for RDF. W3C Recommendation, 15 January 2008. E. Prud'hommeaux and A. Seaborne (Eds.), 2008.
[18] A. Chebotko, S. Lu, X. Fei, and F. Fotouhi, "RDFProv: A relational RDF store for querying and managing scientific workflow provenance," Data & Knowledge Engineering (DKE), vol. 69, no. 8, pp. 836–865, 2010.