XML-to-SQL Query Mapping in the Presence of Multi-valued Schema Mappings and Recursive XML Schemas


Mustafa Atay, Artem Chebotko, Shiyong Lu, and Farshad Fotouhi
Department of Computer Science, Wayne State University, Detroit, Michigan, USA
{matay, artem, shiyong,

R. Wagner, N. Revell, and G. Pernul (Eds.): DEXA 2007, LNCS 4653, pp , © Springer-Verlag Berlin Heidelberg 2007

Abstract. Several query mapping algorithms have been proposed to translate XML queries into SQL queries for a schema-based relational XML storage. However, existing query mapping algorithms only support single-valued mapping schemes, in which each XML element type is mapped to exactly one relation, and do not support multi-valued mapping schemes, in which each XML element type can be mapped to multiple relations. In this paper, we propose a generic query mapping algorithm, ID-XMLToSQL, for a schema-based relational XML storage. To the best of our knowledge, our algorithm provides the first generic solution to the XML-to-Relational query mapping problem that is applicable to both single-valued and multi-valued mapping schemes. Moreover, our algorithm also provides an elegant solution to the query mapping problem in the presence of recursive XML schemas and recursive queries. While existing algorithms need special recursion operators, our algorithm requires only the traditional relational operators and thus can work with all relational databases.

1 Introduction

Numerous researchers have proposed using relational databases for storing and querying XML documents in order to benefit from this mature technology. This approach requires algorithms to map XML schemas, documents, and queries into their relational equivalents. An XML-to-SQL query mapping algorithm for a schema-based relational XML storage should respect the underlying XML-to-Relational schema mapping scheme. The XML-to-Relational schema mapping schemes in the literature can be classified into the following two categories:

Single-valued Schema Mappings. In a single-valued schema mapping, an XML element or attribute type is mapped into exactly one relation in the target relational schema. Thus, the mapping behaves as a function. The Shared schema mapping approach introduced in [1] and the ODTDMap approach introduced in [2] fall into this category.

Multi-valued Schema Mappings. In a multi-valued schema mapping, an XML element or attribute type can be mapped into more than one relation in the target relational schema. Multi-valued schema mappings do not behave as functions and are therefore harder to deal with. The Basic and Hybrid schema mapping approaches proposed in [1] fall into this category.

Although there are several query mapping algorithms for single-valued schema mapping schemes, to the best of our knowledge there is no published query mapping algorithm that supports multi-valued schema mapping schemes. Therefore, in this paper we propose a generic query mapping algorithm that supports both multi-valued and single-valued schema mapping schemes.

Our generic algorithm also provides an elegant solution to the XML-to-Relational query mapping problem in the presence of recursive XML schemas and recursive queries. This problem is identified as an important practical problem in the literature [3,4,5]. Recursive XML schemas are common in practice, as pointed out by [6], which found 35 of 60 real-world DTDs to be recursive. Recursive XML queries, which include the descendant axis //, are also common in practice. The challenge of XML-to-SQL query mapping is that, when there is recursion both in an XML query and in its underlying XML schema, there might be infinitely many paths corresponding to the given recursive XML query. There are two elegant algorithms [4,5] in the literature which address this issue. These algorithms solve the recursion within the relational engine by using special SQL operators which are not supported by some RDBMSs. In contrast, we solve the recursion at the XML schema level without using special SQL operators.

The main contributions of this paper include the following:

1. We propose a generic query mapping algorithm, ID-XMLToSQL, for a schema-based relational XML storage scheme.
To the best of our knowledge, our algorithm provides the first generic solution to the XML-to-Relational query mapping problem that is applicable to all relational XML storage mapping schemes proposed in the literature, including both single-valued and multi-valued schema mapping schemes.

2. We propose to convert a cyclic XML schema graph to a directed acyclic graph by unfolding the cycles in the XML schema graph to facilitate the recursive query mapping process. Thus, we can find a finite number of matching paths on the generated acyclic graph for an arbitrary XML query, including recursive ones. Therefore, our proposed query mapping technique can be implemented on any RDBMS, as it does not require special SQL operators to capture the recursion, whereas the existing algorithms need special recursion operators.

Organization: The rest of the paper is organized as follows. Section 2 gives a summary of related work. We motivate generic query mapping in Section 3. Section 4 includes all necessary preliminaries for our generic query mapping algorithm. The outline of our proposed query mapping algorithm ID-XMLToSQL is given in Section 5. We present a performance study of the algorithm ID-XMLToSQL in Section 6. Finally, Section 7 concludes the paper and points out some potential future work.

2 Related Work

In order to query XML data stored in a relational database, one should map the XML queries into relational queries based on the underlying XML-to-Relational schema mapping scheme. Hence, we can split the XML-to-Relational query mapping algorithms into the following two categories based on the underlying schema mapping schemes:

Schema-less Query Mapping. There has been a lot of work on schema-less query mapping [7,8,9,10,11,12,13,14]. In this approach, the XML schema is considered to be missing or not used, and a generic relational schema is generated for all XML documents. Then, a given XML query is mapped to its relational equivalent using the generic relational schema.

Schema-based Query Mapping. There have been several works on schema-based query mapping [1,15,4,5,16,17,18], where an XML schema is provided and used to generate a good relational schema. The generated relational schemas vary according to the input XML schemas. Therefore, an XML-to-Relational query mapping algorithm should know and respect the underlying XML-to-Relational schema mapping to generate correct and efficient relational queries.

The problem of mapping recursive XML queries in the presence of recursive schemas has been studied in the schema-less query mapping space [8,10]. However, those query mapping algorithms are not applicable to the schema-based query mapping space. Recently, two elegant approaches were proposed in [4,5] to map recursive XML queries to their relational equivalents in the presence of recursive XML schemas. The query mapping algorithm of [4] first derives a query graph for an input path query from the XML schema graph. Then, it partitions the query graph into strongly-connected components and generates an SQL query for each component.
If a component is recursive, the recursion in this component is captured in the corresponding SQL query by using the with construct of SQL 99. The query mapping algorithm of [5] first rewrites a given XPath query into a regular XPath expression which is capable of capturing recursion both in a DTD and in an XPath query. Furthermore, they provide an algorithm for translating regular XPath expressions to relational queries using the least fixpoint (LFP) operator, which captures the recursion in the queries. However, these recursive query mapping algorithms are not generic enough to be used with multi-valued mappings such as Basic and Hybrid introduced in [1]. Moreover, they require special SQL operators, such as the with construct of SQL 99 or the LFP operator, which are not supported by some RDBMSs. Our proposed ID-XMLToSQL algorithm overcomes these limitations.
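To illustrate what resolving recursion inside the relational engine looks like, the following minimal sketch evaluates a descendant step with a recursive `WITH` query using Python's bundled `sqlite3` module. It is not the SQL produced by the algorithms of [4] or [5]: the single generic `node(id, parentid, tag)` table, the sample rows, and all names are assumptions chosen only to keep the example short.

```python
import sqlite3

# Hypothetical generic storage: one row per element, with a foreign key to its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, parentid INTEGER, tag TEXT);
INSERT INTO node VALUES
  (1, NULL, 'A'), (2, 1, 'D3'), (3, 2, 'E'), (4, 3, 'D1'), (5, 4, 'E');
""")

# Descendant step "D3//E": the recursive common table expression collects every
# node reachable from a D3 element, then the outer query keeps the E elements.
DESCENDANT_QUERY = """
WITH RECURSIVE reach(id, tag) AS (
    SELECT id, tag FROM node WHERE tag = 'D3'
    UNION ALL
    SELECT n.id, n.tag FROM node n JOIN reach r ON n.parentid = r.id
)
SELECT id FROM reach WHERE tag = 'E' ORDER BY id
"""

e_ids = [row[0] for row in conn.execute(DESCENDANT_QUERY)]
```

The recursion depth here is decided by the data at query time, which is why such queries depend on engine support for recursive SQL.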

3 Motivation

A generic query mapping algorithm for a schema-based relational XML storage is supposed to work with a general class of XML-to-Relational schema mappings, which can be classified into two main categories: Single-valued Schema Mappings and Multi-valued Schema Mappings. Surprisingly, there is no published XML-to-Relational query mapping algorithm in the schema-based XML storage space which is generic enough to work with multi-valued XML-to-Relational schema mappings.

The recursive query translation algorithm of [4] handles a general class of single-valued XML-to-Relational mappings. The main query translation procedure SQL() in [4] uses the function Annot() to find the relation/column corresponding to an XML element. Neither Annot() nor SQL() supports multi-valued XML-to-Relational schema mappings. Thus, [4] is not generic enough to handle all types of mappings proposed in the literature. While the RegToSQL algorithm proposed in [5] supports a broad class of XPath queries, it still lacks support for multi-valued schema mappings.

A single-valued mapping is a function which returns only one relation for an input XML element/attribute type. The target relation from which to retrieve an XML element or attribute can easily be determined from a single-valued mapping. Thus, single-valued mappings are relatively easy to handle during the query mapping phase. A multi-valued mapping is not a function, since it can return multiple relations for an input XML element/attribute type. This may cause ambiguity when a query mapping algorithm tries to locate the target relation holding the data of an XML element type. Hence, a query mapping algorithm based on a multi-valued mapping should be intelligent enough to resolve this ambiguity and find the correct relation(s) to access. Thus, it is more challenging to map XML queries to relational queries under multi-valued mapping schemes than under single-valued mapping schemes.

Fig. 1. A Sample XML Schema Graph (figure not reproduced; nodes: A, B1, B2, B3, C, D1, D2, D3, E)

Table 1. Single-valued and Multi-valued Schema Mapping Examples

Element   (A) Single-valued σ-mapping (Shared)   (B) Multi-valued σ-mapping (Hybrid)
A         A                                      A
B1        B1                                     B1
B2        B2                                     B2
B3        B3                                     B3
C         C                                      B1, B2, B3
D1        D1                                     D1
D2        C                                      B1, B2, B3
D3        D3                                     A, B1, B2, B3
E         E                                      E

We use a data structure to store XML-to-Relational schema mapping information. We call this data structure a σ-mapping and formally define it in Section 4.1. The σ-mappings based on the Shared and Hybrid approaches for the XML schema shown in Figure 1 are given in Table 1.A and Table 1.B, respectively. We assume that XML attribute types are mapped to the same relation as their parent element types.

Example 1. If the XPath expression /A/B1/C/D3 is given against the XML schema graph shown in Figure 1, the following is its SQL equivalent based on a typical query mapping algorithm which generates an SQL query by joining all the relations along a path:

    Select T4.ID
    From σ(A) T1, σ(B1) T2, σ(C) T3, σ(D3) T4
    Where T1.ID=T2.parentID And T2.ID=T3.parentID And T3.ID=T4.parentID

While it is trivial to find the matching relations in this SQL query based on the single-valued σ-mapping given in Table 1.A, it is not straightforward to find them in the case of the multi-valued σ-mapping shown in Table 1.B. For instance, it is not clear which relation should be returned for σ(C) out of the set {B1,B2,B3} and for σ(D3) out of the set {A,B1,B2,B3}.

We propose the notion of path-based σ-mapping (σ_p-mapping) in Section 4.2 to resolve the ambiguity due to multi-valued schema mapping schemes with the help of the input path structure and the existing mapping information.
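The ambiguity in Example 1 can be made concrete with a small sketch. It encodes the two σ-mappings of Table 1 as Python dictionaries and counts how many relation combinations a join-per-step translation would have to choose from. The dictionary encoding and the function names `candidate_relations` and `ambiguity` are illustrative assumptions, not part of the paper's algorithm.

```python
from math import prod

# Table 1.A (Shared) and Table 1.B (Hybrid), encoded as element -> list of relations.
SHARED = {"A": ["A"], "B1": ["B1"], "B2": ["B2"], "B3": ["B3"], "C": ["C"],
          "D1": ["D1"], "D2": ["C"], "D3": ["D3"], "E": ["E"]}
HYBRID = {"A": ["A"], "B1": ["B1"], "B2": ["B2"], "B3": ["B3"],
          "C": ["B1", "B2", "B3"], "D1": ["D1"], "D2": ["B1", "B2", "B3"],
          "D3": ["A", "B1", "B2", "B3"], "E": ["E"]}


def candidate_relations(path, sigma):
    """Return, for each step of a simple path, the relations sigma offers."""
    steps = [step for step in path.split("/") if step]
    return [sigma[step] for step in steps]


def ambiguity(path, sigma):
    """Count the relation combinations a naive join-per-step mapping must pick from."""
    return prod(len(relations) for relations in candidate_relations(path, sigma))


shared_choices = ambiguity("/A/B1/C/D3", SHARED)
hybrid_choices = ambiguity("/A/B1/C/D3", HYBRID)
```

Under the Shared mapping there is exactly one combination, while under the Hybrid mapping the same path leaves 3 choices for C and 4 for D3, which is the ambiguity that the σ_p-mapping is meant to remove.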
4 Preliminaries

4.1 Schema-Based Query Mapping

In schema-based relational XML storage, query mapping typically takes as input an XML query, an XML schema, the XML-to-Relational schema mapping information (called the σ-mapping), and a database. It produces a relational query, runs it against the database where the XML document is stored, and returns the query results as output. In the following, we formalize the notions of σ-mapping and query mapping:

Definition 1 (σ-mapping). Given an XML schema S with element-type set E and attribute-type set A, and a database schema R, a σ-mapping is a mapping σ : (E ∪ A) → 2^R, such that given an attribute/element type e, σ(e) is the set of relations in which the instances of e will be stored.

Definition 2 (Query Mapping). A query mapping QM is a function that assigns to each tuple (Q, S, X, R, B, σ) a relational query Q′, where Q is an XML query, S is an XML schema, X is an XML document conforming to S, R is a database schema, B is a database of R, σ is a mapping from S to R, and Q′ is a set of relational queries equivalent to Q such that Q′(B) ≡ Q(X).

4.2 σ_p-Mapping

We propose to define a path-based σ-mapping (σ_p-mapping) to resolve the mapping ambiguity that arises in the presence of multi-valued schema mappings. The σ_p-mapping uses the information obtained from the path structure and the σ-mapping to find a single relation for each element in the input path. Once the σ_p-mapping of a particular path expression is computed, the equivalent relational query can be constructed without any ambiguity.

Lemma 1. Any edge in an XML schema graph G is identified either as a normal-edge or a *-edge.

Proof. If an element can occur at most once under its parent, then it is connected to its parent by an edge labeled by ",", "|" or "?" in the XML schema graph G. All the edges in G labeled by the ",", "|" and "?" operators constitute normal-edges. If an element can occur more than once under its parent, then this element is connected to its parent by an edge labeled by "*" or "+" in G. All the edges in G labeled by the "*" and "+" operators constitute *-edges. Since there is no occurrence operator other than {",", "|", "?", "*", "+"} in G, any edge in an XML schema graph is either a normal-edge or a *-edge.

In the following, we formalize the notions of simple path expression and σ_p-mapping:

Definition 3 (Simple Path Expression). A simple path expression p can be denoted as /n_1/n_2/.../n_k, where each n_i is the node type of step i and the axis of each step is the child axis /, which denotes the parent-child relationship. The node type n_1 represents the root element of the XML document and k represents the number of steps in p.

Definition 4 (σ_p-Mapping). Given an input simple path p = /e_1/e_2/.../e_n, a σ-mapping σ, and an XML schema graph G, σ_p(e_i) is defined as follows, where i = 1, 2, ..., n:

$$
\sigma_p(e_i) =
\begin{cases}
\sigma(e_i), & \text{if } |\sigma(e_i)| = 1 \\
e_i, & \text{if } |\sigma(e_i)| > 1 \text{ and } (e_{i-1}, e_i) \text{ is a *-edge in } G \\
\sigma_p(e_{i-1}), & \text{if } |\sigma(e_i)| > 1 \text{ and } (e_{i-1}, e_i) \text{ is a normal-edge in } G
\end{cases}
$$
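Definition 4 can be sketched directly in Python. The sketch below is an illustration under stated assumptions: the σ-mapping is a dictionary from element names to lists of relations (here the Hybrid mapping of Table 1.B), the set of *-edges is passed in explicitly because the edge labels of Figure 1 are not reproduced in the text, and the function name `sigma_p` is ours.

```python
# Table 1.B (Hybrid), encoded as element -> list of relations.
HYBRID = {"A": ["A"], "B1": ["B1"], "B2": ["B2"], "B3": ["B3"],
          "C": ["B1", "B2", "B3"], "D1": ["D1"], "D2": ["B1", "B2", "B3"],
          "D3": ["A", "B1", "B2", "B3"], "E": ["E"]}


def sigma_p(path, sigma, star_edges):
    """Resolve one relation per step of a simple path, following Definition 4.

    path       -- list of element names, root first
    sigma      -- element -> list of candidate relations
    star_edges -- set of (parent, child) pairs that are *-edges (assumed input)
    """
    mapping = []
    for i, element in enumerate(path):
        relations = sigma[element]
        if len(relations) == 1:
            target = relations[0]                 # case 1: single-valued
        elif i == 0:
            raise ValueError("the root element must map to a single relation")
        elif (path[i - 1], element) in star_edges:
            target = element                      # case 2: *-edge, own relation
        else:
            target = mapping[i - 1]               # case 3: stored with its parent
        mapping.append(target)
    return mapping


# Assumption: (B1, C), (C, D3) and (A, D3) are normal-edges.
no_star_edges = set()
example_path = sigma_p(["A", "B1", "C", "D3"], HYBRID, no_star_edges)
short_path = sigma_p(["A", "D3", "E"], HYBRID, no_star_edges)
# Hypothetical variant in which (C, D3) is a *-edge instead.
variant = sigma_p(["A", "B1", "C", "D3"], HYBRID, {("C", "D3")})
```

The iterative loop replaces the recursive call σ_p(e_{i-1}) with a lookup of the already resolved parent, which is equivalent because the path is processed from the root downwards.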

Example 2. If the XPath expression p = /A/B1/C/D3 is given based on the XML schema graph shown in Figure 1, the σ_p-mapping below is produced by computing σ_p based on the multi-valued schema mapping shown in Table 1.B:

σ_p
Element   Relation
A         A
B1        B1
C         B1
D3        B1

Theorem 1 (Correctness). Given an input simple path expression p = /e_1/e_2/.../e_n, σ_p(e_i) returns the correct and single target relation for every element e_i in p, where i = 1, 2, ..., n.

Proof (Sketch). First, σ_p(e_i) returns the same relation as σ(e_i) if the input element e_i is mapped to a single relation. Second, if the input element e_i is mapped to multiple relations, then the type of the edge between e_i and its parent e_{i-1} is checked in the XML schema graph. If the edge is a *-edge, then σ_p(e_i) returns the relation e_i, since e_i is mapped to a separate relation as it occurs multiple times under its parent. Third, if the input element e_i is mapped to multiple relations and the edge between e_i and its parent e_{i-1} is a normal-edge, then σ_p(e_{i-1}) is called to determine the target relation for e_i, since e_i is mapped to the same relation as its parent e_{i-1}. The recursive call to σ_p(e_{i-1}) stops whenever a single relation is returned. If all the edges from e_1 to e_{i-1} are normal-edges, then the recursion stops at σ_p(e_1), since e_1 is the root element and it is always mapped to the single relation e_1. All the edges in an XML schema graph fall into either the normal-edge or the *-edge category, as follows from Lemma 1. As a result, σ_p(e_i) returns the correct and single relation corresponding to element e_i.

Besides multi-valued mappings, the σ_p-mapping can deal with single-valued schema mappings, where it returns the same values as the σ-mapping.
Therefore, the σ_p-mapping is sufficient to develop a generic XML-to-Relational query mapping algorithm in the presence of multi-valued schema mappings as well as single-valued schema mappings.

4.3 Unfolded XML Schema Graph

The challenge with translating recursive XML queries over recursive XML schemas is that a query may match an infinite number of paths in the XML schema graph. However, if we unfold the recursive XML schema based on the maximum depth of each cycle in the schema graph, we can find a finite number of matching paths for an arbitrary XML query, including recursive ones. This observation leads us to an elegant and efficient solution for the problem of translating recursive XML queries in the presence of recursive XML schemas.

We propose to convert a cyclic XML schema graph to a directed acyclic graph by unfolding the cycles in the original schema. This new schema is called the unfolded XML schema graph (UXG). A UXG of a sample XML document, which conforms to the XML schema graph given in Figure 1, is shown in Figure 2. The formal definition of UXG is given in Definition 5.

    <A>
      <B1>
        <C>
          <D1><E/></D1>
          <D2>
            <E><D1/></E>
          </D2>
          <D3><E/></D3>
        </C>
      </B1>
      <B1>
        <C>
          <D1><E/></D1>
          <D2><E/></D2>
          <D3>
            <E>
              <D1><E/></D1>
            </E>
          </D3>
        </C>
      </B1>
      <D3/>
    </A>

Fig. 2. A Sample XML Document and its Unfolded XML Schema Graph (UXG) (graph not reproduced; nodes: A, B1, B2, B3, C, D1, D2, D3, E, D1, E)

Definition 5 (Unfolded XML Schema Graph (UXG)). Given an XML schema S, the unfolded schema of S is a directed acyclic graph UXG = (V, E, d_1, ..., d_k), where V is the set of vertices, E is the set of edges, each d_i is the maximum depth of cycle c_i in S, and k denotes the number of cycles in S. Each cycle c_i in S is unfolded to depth d_i in UXG in top-down topological order. The vertices represent element types in S, and the edges represent their parent-child relationships. Each vertex is labeled with the name of the corresponding element type. An edge is labeled by * if it is incident to a vertex which can appear more than once under its parent in the corresponding XML documents; otherwise no label is used.

A recursive XML schema S can be converted into a non-recursive one in the form of a UXG G by unfolding the recursion in S a finite number of times. This number is decided from the XML documents X stored in the database, such that X conforms to S and G at the same time. In other words, S and G are equivalent to each other with respect to X. We can create a UXG by using one of the following two approaches:

Static approach. The maximum depth of each cycle in the XML schema graph is determined with the help of a domain expert and a fixed UXG is generated during the schema mapping phase. This fixed UXG is used during query mapping regardless of the structure of the underlying XML documents.

Dynamic approach. The maximum depth of each cycle in the XML schema graph is initialized to 1 and a default UXG is generated during the schema mapping phase. When a new XML document is loaded into the database during the data mapping phase, the maximum depth of each cycle in the current document is found and the UXG is modified if any current depth value is greater than the existing one.

The static UXG approach does not have any computation overhead during the data mapping phase. However, it may return unnecessary matching paths for a given recursive XML query. The dynamic UXG approach, on the other hand, incurs some computational cost during the data mapping phase to maintain the UXG, in exchange for minimizing the total number of matching paths for the input recursive XML queries.

The UXG is constructed either during the schema mapping phase or during the data mapping phase. We assume that bulk data is first loaded into the database system and then queried in a batch-processing fashion. Therefore, the construction of the UXG does not introduce additional overhead to XML-to-Relational query mapping performance, since it is precomputed before the query mapping phase.

5 ID-Based Generic Query Mapping

All the schema-based approaches proposed for XML-to-Relational query mapping in the literature have used ID-based techniques, as in [4,5]. In ID-based techniques, each element is associated with a unique ID and the tree structure of the XML document is preserved by maintaining a foreign key to the parent, which we call parentID. Each child axis / is translated into an equijoin between child and parent elements over their parentID and ID columns.

We propose a generic ID-based XML-to-Relational query mapping algorithm, ID-XMLToSQL, in this section. An outline of ID-XMLToSQL is given in Figure 3. Given a path expression P and a UXG G_u, the ID-XMLToSQL algorithm first identifies all the matching simple paths p_i and the σ_p-mappings σ_{p_i} corresponding to those paths.
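The path-matching step just described can be sketched as follows. This is a simplified illustration, not the paper's implementation: the schema graph is an adjacency dictionary reconstructed from Table 1 and the sample document of Figure 2 (an assumption, since Figure 1 is not reproduced), each cycle is represented by one cycle-closing edge with its maximum depth d_i, and the names `root_paths` and `matching_simple_paths` are hypothetical.

```python
import re

# Assumed schema graph: parent -> children. The cycle is E -> D1 -> E.
SCHEMA_GRAPH = {"A": ["B1", "B2", "B3", "D3"], "B1": ["C"], "B2": ["C"],
                "B3": ["C"], "C": ["D1", "D2", "D3"], "D1": ["E"],
                "D2": ["E"], "D3": ["E"], "E": ["D1"]}
# Assumed unfolding depth: the cycle-closing edge (E, D1) may be followed once.
BACK_EDGE_DEPTH = {("E", "D1"): 1}


def root_paths(graph, root, back_edge_depth):
    """Yield every root-to-node path of the unfolded graph as a list of names."""
    def walk(path, used):
        yield list(path)
        for child in graph.get(path[-1], []):
            edge = (path[-1], child)
            next_used = used
            if edge in back_edge_depth:
                if used.get(edge, 0) >= back_edge_depth[edge]:
                    continue  # this cycle is already unfolded to its maximum depth
                next_used = {**used, edge: used.get(edge, 0) + 1}
            path.append(child)
            yield from walk(path, next_used)
            path.pop()
    yield from walk([root], {})


def matching_simple_paths(expression, graph, root, back_edge_depth):
    """Return the simple paths of the unfolded graph matched by a path expression."""
    steps = re.findall(r"(//|/)([^/]+)", expression)
    # A child step must match the next element; a descendant step may skip elements.
    pattern = re.compile("".join(
        ("(?:/[^/]+)*" if axis == "//" else "") + "/" + re.escape(name)
        for axis, name in steps))
    candidates = ("/" + "/".join(p) for p in root_paths(graph, root, back_edge_depth))
    return [c for c in candidates if pattern.fullmatch(c)]


matches = matching_simple_paths("/A/D3//E", SCHEMA_GRAPH, "A", BACK_EDGE_DEPTH)
```

Because every cycle is bounded by its depth, the enumeration terminates and a descendant step expands into a finite set of simple paths, each of which can then be given its own σ_p-mapping.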
ID-XMLToSQL then calls the SQL generation procedure SPathToSQL() for each simple path p_i along with its mapping σ_{p_i}, and takes the union of the output SQL queries. We formalize the notion of a path expression as follows:

Definition 6 (Path Expression). A path expression P can be denoted as a_1 n_1 a_2 n_2 ... a_k n_k, where each n_i is a node type and each a_i is either the child axis / or the descendant axis //. Each a_i n_i constitutes a navigation step of P and k is the number of steps in P.

    00 Algorithm ID-XMLToSQL
    01 Input: Path Expression P, UXG G_u
    02 Output: SQL query sql
    03 Begin
    04   Let p_i, i=1,2,...,n, be the set of all matching simple paths of P in G_u
    05   Let σ_{p_i} be the σ_p-mapping for the simple path p_i, i=1,2,...,n
    06   sql = ""
    07   sql = ∪_{i=1..n} SPathToSQL(p_i, σ_{p_i})
    08   Return sql
    09 End

    00 Procedure SPathToSQL(Simple Path Expression p, σ_p-mapping σ_p)
    01 Begin
    02   Use σ_p to cluster p = /e_1/e_2/.../e_m according to Definition 7
    03   FromClause = "From"
    04   WhereClause = "Where"
    05   For i=1 to m do /* Construct From Clause */
    06     If e_i is the first element of a cluster then
    07       FromClause += $σ_p(e_i)
    08     End If
    09   End For
    10   For i=2 to m do /* Construct Where Clause */
    11     If e_i is the first element of a cluster then
    12       WhereClause += $σ_p(e_{i-1}).(e_{i-1}.ID) = σ_p(e_i).(e_i.parentID)
    13     End If
    14     If e_i is neither first nor last element of a cluster then
    15       WhereClause += $σ_p(e_i).(e_i.ID) is not null
    16     End If
    17   End For
    18   sql = "Select $σ_p(e_m).(e_m.ID)" + FromClause + WhereClause
    19   Return sql
    20 End

Fig. 3. ID-based Query Mapping Algorithm ID-XMLToSQL

A naive XML-to-SQL query mapping procedure follows a blindfold approach. It takes an input simple path expression and generates a relational query by joining the relations corresponding to each step in the simple path expression. A sample SQL query generated using the naive query mapping approach is given in Example 1. When consecutive elements in a simple path expression are mapped to the same relation, the naive approach unnecessarily joins the same relation with itself multiple times. For the simple path expression and its σ_p-mapping given in Example 2, the corresponding SQL query will include two unnecessary self-joins, since the elements of the last three steps in the path are mapped to the same relation.

An intelligent XML-to-SQL query mapping algorithm should be able to recognize the elements mapped to the same relation and avoid the unnecessary self-join operations. We deal with this issue in the SPathToSQL() procedure, whose outline is shown in Figure 3. The SPathToSQL() procedure identifies the clusters in a path expression, which are the groups of elements in consecutive navigation steps mapped into the same relation.
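The clustering idea and the procedure of Figure 3 can be sketched in Python as shown below. This is a minimal sketch under assumptions that are ours, not the paper's: every cluster receives an alias T1, T2, ..., an element stored in its own relation uses the columns `ID` and `parentID`, and an element inlined into another relation uses hypothetical columns named `<element>_ID` and `<element>_parentID`.

```python
def clusters(path, sigma_p):
    """Group consecutive steps that are mapped to the same relation."""
    groups = []
    for element, relation in zip(path, sigma_p):
        if groups and groups[-1][0] == relation:
            groups[-1][1].append(element)
        else:
            groups.append((relation, [element]))
    return groups


def column(alias, relation, element, kind):
    """Name the ID or parentID column of an element (naming scheme is assumed)."""
    if element == relation:
        return f"{alias}.{kind}"
    return f"{alias}.{element}_{kind}"


def spath_to_sql(path, sigma_p):
    """Build one SQL query for a simple path, joining one relation per cluster."""
    from_items, predicates = [], []
    previous = None  # (alias, relation, last element) of the preceding cluster
    for k, (relation, elements) in enumerate(clusters(path, sigma_p), start=1):
        alias = f"T{k}"
        from_items.append(f"{relation} {alias}")
        if previous is not None:
            # Join the last element of the previous cluster to the first of this one.
            predicates.append(column(*previous, "ID") + " = "
                              + column(alias, relation, elements[0], "parentID"))
        for middle in elements[1:-1]:
            # Intermediate elements are skipped by the joins, so check they exist.
            predicates.append(column(alias, relation, middle, "ID") + " IS NOT NULL")
        previous = (alias, relation, elements[-1])
    sql = f"SELECT {column(*previous, 'ID')} FROM {', '.join(from_items)}"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql


def id_xml_to_sql(paths_with_mappings):
    """Union the queries of all matching simple paths."""
    return " UNION ALL ".join(spath_to_sql(p, m) for p, m in paths_with_mappings)


example_2_sql = spath_to_sql(["A", "B1", "C", "D3"], ["A", "B1", "B1", "B1"])
```

For the path and σ_p-mapping of Example 2, the sketch joins only two relations, A and B1, and adds a single existence check for the intermediate element C, instead of the three joins a join-per-step translation would produce.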
The SPathToSQL() procedure recognizes each cluster in a simple path expression and only joins the relation corresponding to the last element of a cluster to the relation corresponding to the first element of its successor cluster. Thus, it avoids the self-join problem of a blindfold query mapping approach. The notion of a cluster is formalized as follows:

Definition 7 (Cluster). Given a simple path expression p and a mapping σ_p over p, the elements of consecutive steps in p which are mapped to the same relation constitute a cluster. Hence, p can be denoted as a sequence of clusters such that p = c_1 c_2 ... c_k, where each c_i is a cluster and k is the number of clusters in p.

The SPathToSQL() procedure given in Figure 3 first constructs the From clause in its first For loop. It introduces one relation per cluster to the From clause, since all the elements in a cluster are mapped to the same relation. The Where clause is constructed in the second For loop. A transition from one cluster to another in the input path is handled by the first If statement of that loop: a predicate of the form σ_p(e_{i-1}).(e_{i-1}.ID) = σ_p(e_i).(e_i.parentID), joining the last element of the previous cluster to the first element of the current cluster, is added to the Where clause. As a result, the relations representing all neighboring clusters are joined. Because it skips the intermediate elements of a cluster, the SPathToSQL() procedure adds an existential predicate of the form σ_p(e_i).(e_i.ID) is not null for these elements to the Where clause (lines 14-16). Thus, it ensures that the middle elements of a cluster co-exist with the elements at each end of the cluster in the underlying XML document. Finally, the output SQL query is constructed and returned.

The existential predicate is not null is not introduced for the elements at each end of a cluster, since they are already included within the join conditions of the output SQL query. Although the last element in a path expression may not be used in a join condition, we do not need to check its existence, as it is used in the Select clause. We also do not need to check the existence of the first element of a simple path expression, which is the root element, as all the simple paths start from the root element.

Example 3. If the path expression /A/D3//E is given against the UXG shown in Figure 2 and input to the ID-XMLToSQL algorithm, ID-XMLToSQL calls the SPathToSQL() procedure with the following simple paths identified from the UXG: (i) /A/D3/E and (ii) /A/D3/E/D1/E, and their σ_p-mappings: (i) {(A,A), (D3,A), (E,E)} and (ii) {(A,A), (D3,A), (E,E), (D1,D1), (E,E)}, respectively. Below is the output SQL query generated by our ID-XMLToSQL algorithm:

    Select E.ID
    From A, E
    Where A.D3.ID=E.parentID
    UNION ALL
    Select T2.ID
    From A, E T1, D1, E T2
    Where A.D3.ID=T1.parentID And T1.ID=D1.parentID And D1.ID=T2.parentID

Theorem 2 (Time Complexity). The time complexity of the procedure SPathToSQL is O(n), where n is the number of steps in an input simple path expression p.

Proof (Sketch). The statement at line 02 navigates p once to cluster it and can be evaluated in O(n). The first loop navigates p once to construct the From clause and is evaluated in O(n). The second loop navigates p once to construct the Where clause and is executed in O(n). Thus, the time complexity of SPathToSQL() is O(n).

6 Experimental Study

In this section, we compare the performance of our ID-XMLToSQL algorithm with that of the recursive query translation algorithm SQLGen of [4]. We used a Pentium IV computer with a 2.4 GHz processor and 1 GB of main memory for the experiments. The experiments were run using the Java software development kit. We minimized the usage of system resources during the experiments to get more realistic results. We ran each program six times and report the average value, excluding the first run, to obtain more accurate results.

We used the auction.xml document of the XMark benchmark [19] as our data set to compare the performance of our proposed ID-XMLToSQL algorithm and the SQLGen algorithm of [4]. The DTD of XMark includes several cycles, and thus it is an appropriate XML schema for our experiments. The number of elements in the test XML document is 73,740.

We selected nine queries with particular features for the test suite. Our test query suite is shown in Table 2. All the queries in our test suite are recursive queries, as they contain the descendant axis //. All the queries return elements which are included in a cycle in the XML schema. While the queries Q1, Q8 and Q9 include clusters of two or more elements, the queries Q2, Q3, Q5, Q7, Q8 and Q9 include shared elements, which have more than one parent in the XML schema.

We implemented only a single-valued schema mapping scheme to run the two query mapping algorithms ID-XMLToSQL and SQLGen, as SQLGen does not support multi-valued schema mapping schemes. We used a commercial relational DBMS which supports the advanced SQL 99 with clause, as it is central to the SQLGen algorithm. We measured the response time for each test query by running the queries generated by the two algorithms separately.

The experimental results are shown in Figure 4. We used a logarithmic scale to increase the readability of the chart.
As can be seen from the chart, our ID-XMLToSQL algorithm outperformed the SQLGen algorithm in all the test queries. The main reasons for the performance difference between ID-XMLToSQL and SQLGen include the following:

- ID-XMLToSQL resolves the recursion at the XML schema level using a precomputed unfolded XML schema graph, unlike SQLGen, which resolves it inside the relational engine using a recursive SQL query.
- The queries generated by SQLGen are typically more complex and larger than the ones generated by our ID-XMLToSQL.
- ID-XMLToSQL uses the notion of clustering and avoids unnecessary self-joins.

Table 2. Query Suite for Testing

Query   Query Definition
Q1      /site/categories/category/description//parlist
Q2      //text
Q3      //parlist
Q4      //asia//listitem
Q5      //item//listitem
Q6      //asia//parlist
Q7      //item/parlist
Q8      /site/regions/asia/item//parlist
Q9      /site/regions/asia/item//listitem

Fig. 4. Experimental Results for Query Mapping (chart not reproduced; x-axis: queries Q1 to Q9; y-axis: time, logarithmic scale; legend: Interval-XMLToSQL, ID-XMLToSQL, SQLGen)

7 Conclusions and Future Work

In this paper, we proposed the generic XML-to-SQL query mapping algorithm ID-XMLToSQL, which can be used with multi-valued schema mappings as well as with single-valued schema mappings. ID-XMLToSQL uses our proposed path-based σ_p-mapping technique to find the target relation for a given element of a path query in the presence of multi-valued schema mappings. We proposed to convert a cyclic XML schema graph to an acyclic one by unfolding the cycles in the graph to a maximum depth. Thus, we are able to map recursive XML queries over the unfolded XML schema graph to SQL queries without using special operators to capture the recursion. Therefore, our proposed query mapping algorithm can be used on any RDBMS, as it uses standard SQL features, unlike other recursive query mapping algorithms in the literature. We compared the performance of our ID-XMLToSQL algorithm to the SQLGen algorithm of [4] and observed that ID-XMLToSQL outperformed SQLGen for all the queries in our test suite. We consider augmenting our proposed ID-based generic query mapping algorithm with interval-based and path-based mapping schemes as potential future work.

Acknowledgment

The authors would like to thank Rajasekar Krishnamurthy for providing the source code of the SQLGen algorithm and for his cooperation, and Dapeng Liu for his involvement in the implementation of our ID-XMLToSQL algorithm.

References

1. Shanmugasundaram, J., Tufte, K., Zhang, C., He, G., DeWitt, D.J., Naughton, J.F.: Relational databases for querying XML documents: Limitations and opportunities. In: VLDB, pp (1999)
2. Atay, M., Chebotko, A., Liu, D., Lu, S., Fotouhi, F.: Efficient schema-based XML-to-Relational data mapping. Information Systems Journal 32(3), (2007)
3. Krishnamurthy, R., Kaushik, R., Naughton, J.F.: XML-to-SQL query translation literature: The state of the art and open problems. In: XML Database Symposium (2003)
4. Krishnamurthy, R., Chakaravarthy, V.T., Kaushik, R., Naughton, J.F.: Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation. In: Proc. of the 20th International Conference on Data Engineering, Boston, pp (March 2004)
5. Fan, W., Yu, J.X., Lu, H., Lu, J., Rastogi, R.: Query translation from XPath to SQL in the presence of recursive DTDs. In: Proc. of the 31st VLDB Conference, Trondheim, Norway (2005)
6. Choi, B.: What are real DTDs like. In: WebDB Workshop (2002)
7. Deutsch, A., Fernandez, M.F., Suciu, D.: Storing semistructured data with STORED. In: SIGMOD Conference, pp (1999)
8. Florescu, D., Kossmann, D.: Storing and querying XML data using an RDBMS. IEEE Data Engineering Bulletin 22(3), (1999)
9. Schmidt, A., Kersten, M., Windhouwer, M., Waas, F.: Efficient relational storage and retrieval of XML documents. In: WebDB (2000)
10. Yoshikawa, M., Amagasa, T., Shimura, T., Uemura, S.: XRel: A path-based approach to storage and retrieval of XML documents using relational databases. ACM Transactions on Internet Technology (TOIT) 1(1), (2001)
11. Tatarinov, I., Viglas, S., Beyer, K.S., Shanmugasundaram, J., Shekita, E.J., Zhang, C.: Storing and querying ordered XML using a relational database system. In: SIGMOD Conference, pp (2002)
12. Dehaan, D., Toman, D., Consens, M.P., Ozsu, T.: A comprehensive XQuery to SQL translation using dynamic interval encoding. In: SIGMOD Conference (2003)
13. Teubner, J., Grust, T., Keulen, M.V.: Staircase join: Teach a relational DBMS to watch its (axis) steps. In: VLDB Conference (2003)
14. Krishnamurthy, R., Kaushik, R., Naughton, J.F.: Efficient XML-to-Relational query translation: Where to add intelligence? In: Proc. of the 30th VLDB Conference, Toronto, Canada (2004)
15. Runapongsa, K., Patel, J.M.: Storing and querying XML data in object-relational DBMSs. In: EDBT Workshops (2002)
16. Cheng, J., Xu, J.: DB2 extender for XML. IBM (2000)
17. Oracle: XML Database Developer's Guide - Oracle XML DB Release 2 (2002)
18. Microsoft: SQLXML and XML Mapping Technologies (2004)
19. Schmidt, A., Waas, F., Kersten, M.L., Carey, M.J., Manolescu, I., Busse, R.: XMark: a benchmark for XML data management. In: VLDB, pp (2002)
