XML-to-SQL Query Mapping in the Presence of Multi-valued Schema Mappings and Recursive XML Schemas
Mustafa Atay, Artem Chebotko, Shiyong Lu, and Farshad Fotouhi
Department of Computer Science, Wayne State University, Detroit, Michigan, USA
{matay, artem, shiyong,

Abstract. Several query mapping algorithms have been proposed to translate XML queries into SQL queries for schema-based relational XML storage. However, existing query mapping algorithms support only single-valued mapping schemes, in which each XML element type is mapped to exactly one relation, and do not support multi-valued mapping schemes, in which each XML element type can be mapped to multiple relations. In this paper, we propose a generic query mapping algorithm, ID-XMLToSQL, for schema-based relational XML storage. To the best of our knowledge, our algorithm provides the first generic solution to the XML-to-Relational query mapping problem that is applicable to both single-valued and multi-valued mapping schemes. Moreover, our algorithm provides an elegant solution to the query mapping problem in the presence of recursive XML schemas and recursive queries. While existing algorithms need special recursion operators, our algorithm requires only the traditional relational operators and thus can work with all relational databases.

1 Introduction

Numerous researchers propose using relational databases to store and query XML documents in order to benefit from this mature technology. This approach requires algorithms that map XML schemas, documents, and queries into their relational equivalents. An XML-to-SQL query mapping algorithm for schema-based relational XML storage should respect the underlying XML-to-Relational schema mapping scheme. The XML-to-Relational schema mapping schemes in the literature can be classified into the following two categories:

Single-valued Schema Mappings.
In a single-valued schema mapping, an XML element or attribute type is mapped into exactly one relation in the target relational schema. Thus, the mapping has the characteristics of a function. The Shared schema mapping approach introduced in [1] and the ODTDMap approach introduced in [2] fall into this category.

R. Wagner, N. Revell, and G. Pernul (Eds.): DEXA 2007, LNCS 4653, pp , © Springer-Verlag Berlin Heidelberg 2007
Multi-valued Schema Mappings. In a multi-valued schema mapping, an XML element or attribute type can be mapped into more than one relation in the target relational schema. Multi-valued schema mappings do not have the characteristics of a function and thus are harder to deal with. The Basic and Hybrid schema mapping approaches proposed in [1] fall into this category.

Although there are several query mapping algorithms for single-valued schema mapping schemes, to the best of our knowledge there is no published query mapping algorithm that supports multi-valued schema mapping schemes. Therefore, in this paper we propose a generic query mapping algorithm that supports both multi-valued and single-valued schema mapping schemes.

Our generic algorithm also provides an elegant solution to the XML-to-Relational query mapping problem in the presence of recursive XML schemas and recursive queries. This problem is identified as an important practical problem in the literature [3,4,5]. Recursive XML schemas are common in practice, as pointed out by [6], which found 35 of 60 real-world DTDs to be recursive. Recursive XML queries, which include the descendant axis //, are also common in practice. The challenge of XML-to-SQL query mapping is that, when there is recursion both in an XML query and in its underlying XML schema, there might be infinitely many paths corresponding to the given recursive XML query. There are two elegant algorithms [4,5] in the literature which address this issue. These algorithms resolve the recursion within the relational engine by using special SQL operators which are not supported by some RDBMSs. In contrast, we resolve the recursion at the XML schema level without using special SQL operators.

The main contributions of this paper include the following:
1. We propose a generic query mapping algorithm, ID-XMLToSQL, for a schema-based relational XML storage scheme.
To the best of our knowledge, our algorithm provides the first generic solution to the XML-to-Relational query mapping problem that is applicable to all relational XML storage mapping schemes proposed in the literature, including both single-valued and multi-valued schema mapping schemes.
2. We propose to convert a cyclic XML schema graph to a directed acyclic graph by unfolding the cycles in the XML schema graph to facilitate the recursive query mapping process. Thus, we can find a finite number of matching paths on the generated acyclic graph for an arbitrary XML query, including recursive ones. Therefore, our proposed query mapping technique can be implemented on any RDBMS, as it does not require special SQL operators to capture the recursion, while the existing algorithms need special recursion operators.

Organization: The rest of the paper is organized as follows. Section 2 gives a summary of related work. We motivate generic query mapping in Section 3. Section 4 includes all necessary preliminaries for our generic query mapping algorithm. The outline of our proposed query mapping algorithm ID-XMLToSQL is given in Section 5. We demonstrate a performance study of the
algorithm ID-XMLToSQL in Section 6. Finally, Section 7 concludes the paper and points out some potential future work.

2 Related Work

In order to query XML data stored in a relational database, one should map the XML queries into relational queries based on the underlying XML-to-Relational schema mapping scheme. Hence, we can split the XML-to-Relational query mapping algorithms into the following two categories based on the underlying schema mapping schemes:

Schema-less Query Mapping. There has been a lot of work on schema-less query mapping [7,8,9,10,11,12,13,14]. In this approach, the XML schema is considered to be missing or not used, and a generic relational schema is generated for all XML documents. Then, a given XML query is mapped to its relational equivalent using the generic relational schema.

Schema-based Query Mapping. There have been several works on schema-based query mapping [1,15,4,5,16,17,18], where an XML schema is provided and used to generate a good relational schema. The generated relational schemas vary according to the input XML schemas. Therefore, an XML-to-Relational query mapping algorithm should know and respect the underlying XML-to-Relational schema mapping to generate correct and efficient relational queries.

The problem of mapping recursive XML queries in the presence of recursive schemas has been studied in the schema-less query mapping space [8,10]. However, those query mapping algorithms are not applicable to the schema-based query mapping space. Recently, two elegant approaches were proposed in [4,5] to map recursive XML queries to their relational equivalents in the presence of recursive XML schemas. The query mapping algorithm of [4] first derives a query graph for an input path query from the XML schema graph. Then, it partitions the query graph into strongly-connected components and generates an SQL query for each component.
If a component is recursive, the recursion in this component is captured in the corresponding SQL query by using the WITH construct of SQL 99. The query mapping algorithm of [5] first rewrites a given XPath query into a regular XPath expression which is capable of capturing recursion both in a DTD and in an XPath query. Furthermore, the authors of [5] provide an algorithm for translating regular XPath expressions to relational queries using the least fixpoint (LFP) operator, which is used to capture the recursion in the queries.

However, these recursive query mapping algorithms are not generic enough to be used with multi-valued mappings such as Basic and Hybrid introduced in [1]. Moreover, they require special SQL operators, such as the WITH construct of SQL 99 or the LFP operator, which are not supported by some RDBMSs. Our proposed ID-XMLToSQL algorithm overcomes these limitations.
3 Motivation

A generic query mapping algorithm for schema-based relational XML storage is supposed to work with a general class of XML-to-Relational schema mappings, which fall into two main categories: Single-valued Schema Mappings and Multi-valued Schema Mappings. Surprisingly, there is no published XML-to-Relational query mapping algorithm in the schema-based XML storage space that is generic enough to work with multi-valued XML-to-Relational schema mappings.

The recursive query translation algorithm of [4] handles a general class of single-valued XML-to-Relational mappings. The main query translation procedure SQL() in [4] uses the function Annot() to find the relation/column corresponding to an XML element. Neither Annot() nor SQL() supports multi-valued XML-to-Relational schema mappings. Thus, [4] is not generic enough to handle all types of mappings proposed in the literature. While the RegToSQL algorithm proposed in [5] supports a broad class of XPath queries, it still lacks support for multi-valued schema mappings.

A single-valued mapping is a function which returns only one relation for an input XML element/attribute type. The target relation from which to retrieve an XML element or attribute can easily be determined from a single-valued mapping. Thus, single-valued mappings are relatively easy to handle during the query mapping phase. A multi-valued mapping is not a function, since it can return multiple relations for an input XML element/attribute type. This may cause ambiguity when a query mapping algorithm tries to locate the target relation holding the data of an XML element type. Hence, a query mapping algorithm based on a multi-valued mapping should be intelligent enough to resolve this possible ambiguity and find the correct relation(s) to access.
Thus, it is more challenging to map XML queries to relational queries under multi-valued mapping schemes than under single-valued mapping schemes.

Fig. 1. A Sample XML Schema Graph (element types: A, B1, B2, B3, C, D1, D2, D3, E)
Table 1. Single-valued and Multi-valued Schema Mapping Examples

Element   (A) Single-valued σ-mapping (Shared)   (B) Multi-valued σ-mapping (Hybrid)
A         A                                      A
B1        B1                                     B1
B2        B2                                     B2
B3        B3                                     B3
C         C                                      B1, B2, B3
D1        D1                                     D1
D2        C                                      B1, B2, B3
D3        D3                                     A, B1, B2, B3
E         E                                      E

We use a data structure to store XML-to-Relational schema mapping information. We call this data structure a σ-mapping and formally define it in Section 4.1. The σ-mappings based on the Shared and Hybrid approaches for the XML schema shown in Figure 1 are given in Table 1.A and Table 1.B, respectively. We assume that XML attribute types are mapped to the same relation as their parent element types.

Example 1. If the XPath expression /A/B1/C/D3 is given against the XML schema graph shown in Figure 1, a typical query mapping algorithm, which generates a SQL query by joining all the relations along a path, produces the following SQL equivalent:

  Select T4.ID
  From σ(A) T1, σ(B1) T2, σ(C) T3, σ(D3) T4
  Where T1.ID=T2.parentID And T2.ID=T3.parentID And T3.ID=T4.parentID

While it is trivial to find the matching relations in this SQL query based on the single-valued σ-mapping given in Table 1.A, it is not straightforward to find them in the case of the multi-valued σ-mapping shown in Table 1.B. For instance, it is not clear which relation should be returned for σ(C) out of the set {B1,B2,B3} and for σ(D3) out of the set {A,B1,B2,B3}. We propose the notion of path-based σ-mapping (σ_p-mapping) in Section 4.2 to resolve the ambiguity caused by multi-valued schema mapping schemes with the help of the input path structure and the existing mapping information.
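To make the ambiguity of Example 1 concrete, the following Python sketch encodes the two σ-mappings of Table 1 as dictionaries and applies the naive one-join-per-step translation. The names SHARED, HYBRID, and naive_sql are illustrative and do not come from the paper; the sketch assumes every relation exposes ID and parentID columns, as in Example 1.

```python
# Table 1 as dictionaries: element type -> list of candidate relations.
SHARED = {"A": ["A"], "B1": ["B1"], "B2": ["B2"], "B3": ["B3"], "C": ["C"],
          "D1": ["D1"], "D2": ["C"], "D3": ["D3"], "E": ["E"]}
HYBRID = {"A": ["A"], "B1": ["B1"], "B2": ["B2"], "B3": ["B3"],
          "C": ["B1", "B2", "B3"], "D1": ["D1"], "D2": ["B1", "B2", "B3"],
          "D3": ["A", "B1", "B2", "B3"], "E": ["E"]}


def naive_sql(path, sigma):
    """Join one relation per step of a simple path such as "/A/B1/C/D3".

    Raises ValueError when a step maps to several relations, which is the
    ambiguity that a multi-valued mapping introduces.
    """
    steps = path.strip("/").split("/")
    tables = []
    for i, element in enumerate(steps, start=1):
        relations = sigma[element]
        if len(relations) != 1:
            raise ValueError(f"ambiguous relation for {element}: {relations}")
        tables.append(f"{relations[0]} T{i}")
    # Each child axis becomes an equijoin between consecutive aliases.
    joins = [f"T{i}.ID=T{i + 1}.parentID" for i in range(1, len(steps))]
    sql = f"Select T{len(steps)}.ID From {', '.join(tables)}"
    if joins:
        sql += " Where " + " And ".join(joins)
    return sql


print(naive_sql("/A/B1/C/D3", SHARED))
try:
    naive_sql("/A/B1/C/D3", HYBRID)
except ValueError as err:
    print(err)  # the step C has three candidate relations
```

With the Shared mapping the translation succeeds; with the Hybrid mapping it stops at the step C, which is exactly the case that the σ_p-mapping is meant to resolve.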
4 Preliminaries

4.1 Schema-Based Query Mapping

In schema-based relational XML storage, query mapping typically takes as input an XML query, an XML schema, the XML-to-Relational schema mapping information (called the σ-mapping), and a database. It produces a relational query, runs it against the database where the XML document is stored, and returns the query results as output. In the following, we formalize the notions of σ-mapping and query mapping:
Definition 1 (σ-mapping). Given an XML schema S with element-type set E and attribute-type set A, and a database schema R, a σ-mapping is a mapping σ : (E ∪ A) → R, such that given an attribute/element type e, σ(e) is the set of relations in which the instances of e will be stored.

Definition 2 (Query Mapping). A query mapping QM is a function that assigns to each tuple (Q, S, X, R, B, σ) a relational query Q′, where Q is an XML query, S is an XML schema, X is an XML document conforming to S, R is a database schema, B is a database of R, σ is a mapping from S to R, and Q′ is a set of relational queries equivalent to Q such that Q′(B) ≡ Q(X).

4.2 σ_p-Mapping

We propose to define a path-based σ-mapping (σ_p-mapping) to resolve the mapping ambiguity that arises in the presence of multi-valued schema mappings. The σ_p-mapping uses the information obtained from the path structure and the σ-mapping to find a single relation for each element in the input path. Once the σ_p-mapping of a particular path expression is computed, the equivalent relational query can be constructed without any ambiguity.

Lemma 1. Any edge in an XML schema graph G is identified either as a normal-edge or as a *-edge.

Proof. If an element can occur at most once under its parent, then it is connected to its parent by an edge labeled by "," or "?" in the XML schema graph G. All the edges in G labeled by the "," and "?" operators constitute normal-edges. If an element can occur more than once under its parent, then this element is connected to its parent by an edge labeled by "*" or "+" in G. All the edges in G labeled by the "*" and "+" operators constitute *-edges. Since there is no occurrence operator other than {",", "?", "*", "+"} in G, any edge in an XML schema graph is either a normal-edge or a *-edge.

In the following, we formalize the notions of simple path expression and σ_p-mapping:

Definition 3 (Simple Path Expression).
A simple path expression p can be denoted as /n_1/n_2/.../n_k, where each n_i is the node type of step i and the axis of each step is the child axis /, which denotes a parent-child relationship. The node type n_1 represents the root element of the XML document and k represents the number of steps in p.

Definition 4 (σ_p-Mapping). Given an input simple path p = /e_1/e_2/.../e_n, a σ-mapping σ, and an XML schema graph G, σ_p(e_i) is defined as follows, where i = 1, 2, ..., n:

\[
\sigma_p(e_i) =
\begin{cases}
\sigma(e_i), & \text{if } |\sigma(e_i)| = 1 \\
e_i, & \text{if } |\sigma(e_i)| > 1 \text{ and } (e_{i-1}, e_i) \text{ is a *-edge in } G \\
\sigma_p(e_{i-1}), & \text{if } |\sigma(e_i)| > 1 \text{ and } (e_{i-1}, e_i) \text{ is a normal-edge in } G
\end{cases}
\]
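The three cases of Definition 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name sigma_p, the dictionary encoding of the Hybrid mapping of Table 1.B, and the star_edges parameter are assumptions made here. Because the edge labels of Figure 1 are not recoverable from the text, the set of *-edges is passed in explicitly; for the path used below, only the edges (B1, C) and (C, D3) matter, and both are taken to be normal-edges.

```python
# Hybrid σ-mapping of Table 1.B: element type -> candidate relations.
HYBRID = {"A": ["A"], "B1": ["B1"], "B2": ["B2"], "B3": ["B3"],
          "C": ["B1", "B2", "B3"], "D1": ["D1"], "D2": ["B1", "B2", "B3"],
          "D3": ["A", "B1", "B2", "B3"], "E": ["E"]}


def sigma_p(path, sigma, star_edges):
    """Return [(element, relation), ...] for a simple path like "/A/B1/C/D3".

    sigma      : element type -> list of candidate relations (the σ-mapping)
    star_edges : set of (parent, child) pairs that are *-edges in the schema
                 graph; every other edge is treated as a normal-edge
    """
    steps = path.strip("/").split("/")
    resolved = []
    for i, element in enumerate(steps):
        candidates = sigma[element]
        if len(candidates) == 1:
            relation = candidates[0]        # case 1: |σ(e_i)| = 1
        elif i == 0:
            # Assumption: the root always has exactly one relation.
            raise ValueError("root element must map to a single relation")
        elif (steps[i - 1], element) in star_edges:
            relation = element              # case 2: *-edge, own relation
        else:
            relation = resolved[-1][1]      # case 3: σ_p of the parent step
        resolved.append((element, relation))
    return resolved


# Assumed edge kinds: no *-edge on the multi-valued steps of this path.
print(sigma_p("/A/B1/C/D3", HYBRID, star_edges=set()))
```

The recursion of the third case is unrolled into a left-to-right pass: by the time step i is processed, σ_p of step i-1 is already known, so one lookup replaces the recursive call.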
Example 2. If the XPath expression p = /A/B1/C/D3 is given based on the XML schema graph shown in Figure 1, computing σ_p over the multi-valued schema mapping shown in Table 1.B produces the following σ_p-mapping:

Element   Relation (σ_p)
A         A
B1        B1
C         B1
D3        B1

Theorem 1 (Correctness). Given an input simple path expression p = /e_1/e_2/.../e_n, σ_p(e_i) returns the correct and single target relation for every element e_i in p, where i = 1, 2, ..., n.

Proof (Sketch). First, σ_p(e_i) returns the same relation as σ(e_i) if the input element e_i is mapped to a single relation. Second, if the input element e_i is mapped to multiple relations, then the type of the edge between e_i and its parent e_{i-1} is checked in the XML schema graph. If the edge is a *-edge, then σ_p(e_i) returns the relation e_i, since e_i is mapped to a separate relation as it occurs multiple times under its parent. Third, if the input element e_i is mapped to multiple relations and the edge between e_i and its parent e_{i-1} is a normal-edge, then σ_p(e_{i-1}) is called to determine the target relation for e_i, since e_i is mapped to the same relation as its parent e_{i-1}. The recursive call to σ_p(e_{i-1}) stops whenever a single relation is returned. If all the edges from e_1 to e_{i-1} are normal-edges, then the recursion stops at σ_p(e_1), since e_1 is the root element and it is always mapped to the single relation e_1. All the edges in an XML schema graph are either normal-edges or *-edges, as follows from Lemma 1. As a result, σ_p(e_i) returns the correct and single relation corresponding to element e_i.

Besides multi-valued mappings, the σ_p-mapping can deal with single-valued schema mappings, where it returns the same values as the σ-mapping.
Therefore, the σ_p-mapping is sufficient to develop a generic XML-to-Relational query mapping algorithm in the presence of multi-valued schema mappings as well as single-valued schema mappings.

4.3 Unfolded XML Schema Graph

The challenge in translating recursive XML queries over recursive XML schemas is that the XML schema graph may contain infinitely many matching paths. However, if we unfold the recursive XML schema based on the maximum depth of each cycle in the schema graph, we can find a finite number of matching paths for an arbitrary XML query, including recursive ones. This observation leads us to an elegant and efficient solution for the problem of translating recursive XML queries in the presence of recursive XML schemas.

We propose to convert a cyclic XML schema graph to a directed acyclic graph by unfolding the cycles in the original schema. This new schema is called unfolded
XML schema graph (UXG). A UXG of a sample XML document, which conforms to the XML schema graph given in Figure 1, is shown in Figure 2. The formal definition of UXG is given in Definition 5.

  <A>
    <B1>
      <C>
        <D1><E/></D1>
        <D2>
          <E><D1/></E>
        </D2>
        <D3><E/></D3>
      </C>
    </B1>
    <B1>
      <C>
        <D1><E/></D1>
        <D2><E/></D2>
        <D3>
          <E>
            <D1><E/></D1>
          </E>
        </D3>
      </C>
    </B1>
    <D3/>
  </A>

Fig. 2. A Sample XML Document and its Unfolded XML Schema Graph (UXG) (UXG vertices: A, B1, B2, B3, C, D1, D2, D3, E, D1, E)

Definition 5 (Unfolded XML Schema Graph (UXG)). Given an XML schema S, the unfolded schema of S is a directed acyclic graph UXG = (V, E, d_1, ..., d_k), where V is the set of vertices, E is the set of edges, each d_i is the maximum level of depth for cycle c_i in S, and k denotes the number of cycles in S. Each cycle c_i in S is unfolded to depth d_i in UXG in top-down topological order. The vertices represent element types in S, and the edges represent their parent-child relationships. Each vertex is labeled with the name of the corresponding element type. An edge is labeled by * if it is incident to a vertex which can appear more than once under its parent in the corresponding XML documents; otherwise no label is used.

A recursive XML schema S can be converted into a non-recursive one in the form of a UXG G by unfolding the recursion in S a finite number of times. This number is decided from the XML documents X stored in the database, such that X conforms to S and G at the same time. In other words, S and G are equivalent to each other with respect to X. We can create a UXG by using one of the following two approaches:

Static approach. The maximum depth of each cycle in the XML schema graph is determined with the help of a domain expert and a fixed UXG is generated during the schema mapping phase. This fixed UXG is used during query mapping regardless of the structure of the underlying XML documents.

Dynamic approach.
The maximum depth of each cycle in the XML schema graph is initialized to 1 and a default UXG is generated during the schema mapping phase. When a new XML document is loaded into the database during the data mapping phase, the maximum depth of each cycle in the current document is found, and the UXG is modified if any current depth value is greater than the existing one.

The static UXG approach has no computational overhead during the data mapping phase. However, it may return unnecessary matching paths for a given recursive XML query. The dynamic UXG approach, on the other hand, incurs some computational cost during the data mapping phase to maintain the UXG, in exchange for minimizing the total number of matching paths for the input recursive XML queries.

The UXG is constructed either during the schema mapping phase or during the data mapping phase. We assume that bulk data is first loaded into the database system and then queried in a batch-processing fashion. Therefore, the construction of the UXG does not add overhead to XML-to-Relational query mapping, since the UXG is precomputed before the query mapping phase.

5 ID-Based Generic Query Mapping

All the schema-based approaches proposed for XML-to-Relational query mapping in the literature have used ID-based techniques, as in [4,5]. In ID-based techniques, each element is associated with a unique ID, and the tree structure of the XML document is preserved by maintaining a foreign key to the parent, which we call parentID. Each child axis / is translated into an equijoin between child and parent elements over their parentID and ID columns.

We propose a generic ID-based XML-to-Relational query mapping algorithm, ID-XMLToSQL, in this section. An outline of ID-XMLToSQL is given in Figure 3. Given a path expression P and a UXG G_u, the ID-XMLToSQL algorithm first identifies all the matching simple paths p_i and the σ_p-mappings σ_pi corresponding to those paths.
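The first of these steps, enumerating the simple paths of an acyclic UXG that match a path expression with child (/) and descendant (//) axes, might be sketched in Python as follows. The adjacency dictionary CHILDREN is an assumed reconstruction of the UXG of Figure 2, inferred from Table 1.B and the sample document, not a transcription of the figure; the vertex ids "D1'" and "E'" are hypothetical names for the unfolded copies of D1 and E. The function names are likewise illustrative.

```python
import re

# Assumed encoding of the UXG of Fig. 2: vertex id -> child vertex ids.
CHILDREN = {
    "A": ["B1", "B2", "B3", "D3"],
    "B1": ["C"], "B2": ["C"], "B3": ["C"],
    "C": ["D1", "D2", "D3"],
    "D1": ["E"], "D2": ["E"], "D3": ["E"],
    "E": ["D1'"], "D1'": ["E'"],
}


def label(vertex):
    """Element type of a vertex: unfolded copies drop their prime marks."""
    return vertex.rstrip("'")


def matching_simple_paths(expression, root, children):
    """List the simple paths of an acyclic graph that match `expression`."""
    # "/A/D3//E" -> [("/", "A"), ("/", "D3"), ("//", "E")]
    steps = re.findall(r"(//|/)([^/]+)", expression)
    found = {}  # dict used as an insertion-ordered set of result paths

    def walk(trail, k):
        if k == len(steps):
            found["/" + "/".join(label(v) for v in trail)] = None
            return
        axis, name = steps[k]
        candidates = children.get(trail[-1], []) if trail else [root]
        for vertex in candidates:
            if label(vertex) == name:
                walk(trail + [vertex], k + 1)  # vertex satisfies step k
            if axis == "//":
                walk(trail + [vertex], k)      # pass through, look deeper

    walk([], 0)
    return list(found)


print(matching_simple_paths("/A/D3//E", "A", CHILDREN))
```

Since the graph is acyclic, the descendant axis can only pass through finitely many vertices, so the enumeration terminates without any recursion operator on the SQL side.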
Then it calls the SQL generation procedure SPathToSQL() for each simple path p_i along with its mapping σ_pi, and finally takes the union of the output SQL queries. We formalize the notion of a path expression as follows:

Definition 6 (Path Expression). A path expression P can be denoted as a_1 n_1 a_2 n_2 ... a_k n_k, where each n_i is a node type and each a_i is either the child axis / or the descendant axis //. Each a_i n_i constitutes a navigation step of P and k is the number of steps in P.

A naive XML-to-SQL query mapping procedure follows a blindfold approach. It takes an input simple path expression and generates a relational query by joining the relations corresponding to each step in the simple path expression. A sample SQL query generated using the naive query mapping approach is given in Example 1. When consecutive elements in a simple path expression are mapped to the same relation, the naive approach unnecessarily joins the same relation
with itself multiple times. For the simple path expression and its σ_p-mapping given in Example 2, the corresponding SQL query will include two unnecessary self-joins, since the elements of the last three steps in the path are mapped to the same relation. An intelligent XML-to-SQL query mapping algorithm should be able to recognize the elements mapped to the same relation and avoid the unnecessary self-join operations. We deal with this issue in the SPathToSQL() procedure, whose outline is shown in Figure 3. The SPathToSQL() procedure identifies the clusters in a path expression, which are the groups of elements in consecutive navigation steps mapped into the same relation.

00 Algorithm ID-XMLToSQL
01 Input: Path Expression P, UXG G_u
02 Output: SQL query sql
03 Begin
04   Let p_i, i=1,2,...,n, be the set of all matching simple paths of P in G_u
05   Let σ_pi be the σ_p-mapping for the simple path p_i, i=1,2,...,n
06   sql = ""
07   sql = ∪_{i=1..n} SPathToSQL(p_i, σ_pi)
08   Return sql
09 End

00 Procedure SPathToSQL(Simple Path Expression p, σ_p-mapping σ_p)
01 Begin
02   Use σ_p to cluster p = /e_1/e_2/.../e_m according to Definition 7
03   FromClause = "From"
04   WhereClause = "Where"
05   For i=1 to m do  /* Construct From Clause */
06     If e_i is the first element of a cluster then
07       FromClause += "$σ_p(e_i)"
08     End If
09   End For
10   For i=2 to m do  /* Construct Where Clause */
11     If e_i is the first element of a cluster then
12       WhereClause += "$σ_p(e_{i-1}).(e_{i-1}.ID) = σ_p(e_i).(e_i.parentID)"
13     End If
14     If e_i is neither first nor last element of a cluster then
15       WhereClause += "$σ_p(e_i).(e_i.ID) is not null"
16     End If
17   End For
18   sql = "Select $σ_p(e_m).(e_m.ID)" + FromClause + WhereClause
19   Return sql
20 End

Fig. 3. ID-based Query Mapping Algorithm ID-XMLToSQL
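The SPathToSQL() procedure of Figure 3 can be approximated by the following Python sketch. It is one possible reading of the pseudocode, not the authors' code. Three conventions are assumptions made here: the σ_p-mapping is passed as a list of (element, relation) pairs in path order, a relation that appears in more than one cluster receives aliases T1, T2, ..., and an element stored in its own relation uses the plain ID and parentID columns while an inlined element prefixes them with its element name.

```python
from itertools import groupby


def spath_to_sql(mapping):
    """Translate one simple path, given as its σ_p-mapping
    [(element, relation), ...] in path order, into a SQL query."""
    # Cluster consecutive steps that are stored in the same relation.
    clusters = [list(g) for _, g in groupby(mapping, key=lambda pair: pair[1])]
    relations = [cluster[0][1] for cluster in clusters]

    # One From entry per cluster; alias relations used by several clusters.
    aliases, from_items, counter = [], [], 0
    for relation in relations:
        if relations.count(relation) > 1:
            counter += 1
            aliases.append(f"T{counter}")
            from_items.append(f"{relation} T{counter}")
        else:
            aliases.append(relation)
            from_items.append(relation)

    def column(alias, element, relation, name):
        # Assumed naming scheme, see the lead-in above.
        if element == relation:
            return f"{alias}.{name}"
        return f"{alias}.{element}.{name}"

    predicates = []
    for c, cluster in enumerate(clusters):
        if c > 0:
            # Join the last element of the previous cluster to the first
            # element of this cluster.
            prev_el, prev_rel = clusters[c - 1][-1]
            el, rel = cluster[0]
            predicates.append(column(aliases[c - 1], prev_el, prev_rel, "ID")
                              + "=" + column(aliases[c], el, rel, "parentID"))
        for el, rel in cluster[1:-1]:
            # Intermediate elements of a cluster only need to exist.
            predicates.append(column(aliases[c], el, rel, "ID") + " is not null")

    last_el, last_rel = mapping[-1]
    sql = "Select " + column(aliases[-1], last_el, last_rel, "ID")
    sql += " From " + ", ".join(from_items)
    if predicates:
        sql += " Where " + " And ".join(predicates)
    return sql


def union_all(mappings):
    """Combine the queries of all matching simple paths."""
    return " UNION ALL ".join(spath_to_sql(m) for m in mappings)


print(spath_to_sql([("A", "A"), ("D3", "A"), ("E", "E")]))
```

For the σ_p-mapping {(A,A), (D3,A), (E,E)} the sketch joins only two relations, A and E, because A and D3 fall into the same cluster. Using itertools.groupby for the clustering keeps the whole translation linear in the number of steps, apart from the alias bookkeeping.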
The SPathToSQL() procedure recognizes each cluster in a simple path expression and joins only the relation corresponding to the last element of a cluster to the relation corresponding to the first element of its successor cluster. Thus, it avoids the self-join problem of a blindfold query mapping approach. The notion of a cluster is formalized as follows:

Definition 7 (Cluster). Given a simple path expression p and a mapping σ_p over p, the elements of consecutive steps in p which are mapped to the same relation constitute a cluster. Hence, p can be denoted as a sequence of clusters
such that p = c_1 c_2 ... c_k, where each c_i is a cluster and k is the number of clusters in p.

The SPathToSQL() procedure given in Figure 3 first constructs the From clause in its first For loop. It introduces one relation per cluster to the From clause, since all the elements in a cluster are mapped to the same relation. The Where clause is constructed in the second For loop. A transition from one cluster to another in the input path is handled by the first If statement of that loop: a predicate of the form σ_p(e_{i-1}).(e_{i-1}.ID) = σ_p(e_i).(e_i.parentID), joining the last element of the previous cluster to the first element of the current cluster, is added to the Where clause. As a result, the relations representing all the neighboring clusters are joined. Because it skips the intermediate elements of a cluster, the SPathToSQL() procedure adds to the Where clause an existential predicate of the form σ_p(e_i).(e_i.ID) is not null for each of them (lines 14-16). Thus, it ensures that the middle elements of a cluster co-exist with the elements at each end of the cluster in the underlying XML document. The output SQL query is constructed and returned in the last two statements of the procedure.

The existential predicate not null is not introduced for the elements at each end of a cluster, since they are already included in the join conditions of the output SQL query. Although the last element in a path expression may not be used in a join condition, we do not need to check its existence, as it is used in the Select clause. We also do not need to check the existence of the first element of a simple path expression, which is the root element, as all the simple paths start from the root element.

Example 3.
If the path expression /A/D3//E is given against the UXG shown in Figure 2 and input to the ID-XMLToSQL algorithm, ID-XMLToSQL calls the SPathToSQL() procedure with the following simple paths identified from the UXG: (i) /A/D3/E and (ii) /A/D3/E/D1/E, and their σ_p-mappings: (i) {(A,A), (D3,A), (E,E)} and (ii) {(A,A), (D3,A), (E,E), (D1,D1), (E,E)}, respectively. Below is the output SQL query generated by our ID-XMLToSQL algorithm:

  Select E.ID
  From A, E
  Where A.D3.ID=E.parentID
  UNION ALL
  Select T2.ID
  From A, E T1, D1, E T2
  Where A.D3.ID=T1.parentID And T1.ID=D1.parentID And D1.ID=T2.parentID

Theorem 2 (Time Complexity). The time complexity of the procedure SPathToSQL is O(n), where n is the number of steps in an input simple path expression p.

Proof (Sketch). The statement at line 02 navigates p once to cluster it and can be evaluated in O(n). The first loop navigates p once to construct the From clause and is evaluated in O(n). The second loop navigates p once to
construct the Where clause and is executed in O(n). Thus, the time complexity of SPathToSQL() is O(n).

6 Experimental Study

In this section, we compare the performance of our ID-XMLToSQL algorithm and the recursive query translation algorithm SQLGen of [4]. We used a Pentium IV computer with a 2.4 GHz processor and 1 GB of main memory for the experiments. The experiments were run using the Java software development kit. We minimized the usage of system resources during the experiments to get more realistic results. We ran each program 6 times and averaged the results, excluding the first run, to obtain more accurate measurements.

We used the auction.xml document of the XMark benchmark [19] as our data set to compare the performance of our proposed ID-XMLToSQL algorithm and the SQLGen algorithm of [4]. The DTD of XMark includes several cycles, and thus it is an appropriate XML schema for our experiments. The number of elements in the test XML document is 73,740. We selected nine queries with particular features for the test suite, which is shown in Table 2. All the queries in our test suite are recursive queries, as they contain the descendant axis //. All the queries return elements which are included in a cycle in the XML schema. The queries Q1, Q8 and Q9 include clusters of two or more elements, while the queries Q2, Q3, Q5, Q7, Q8 and Q9 include shared elements, which have more than one parent in the XML schema.

We implemented only a single-valued schema mapping scheme to run the two query mapping algorithms ID-XMLToSQL and SQLGen, as SQLGen does not support multi-valued schema mapping schemes. We used a commercial relational DBMS which supports the advanced SQL 99 WITH clause, as it is central to the SQLGen algorithm. We measured the response time for each test query by running the queries generated by the two algorithms separately. The experimental results are shown in Figure 4. We used a logarithmic scale to increase the readability of the chart.
Table 2. Query Suite for Testing

Query   Query Definition
Q1      /site/categories/category/description//parlist
Q2      //text
Q3      //parlist
Q4      //asia//listitem
Q5      //item//listitem
Q6      //asia//parlist
Q7      //item/parlist
Q8      /site/regions/asia/item//parlist
Q9      /site/regions/asia/item//listitem

As can be seen from the chart, our ID-XMLToSQL algorithm outperformed the SQLGen algorithm on all the test queries. The main reasons for the performance difference between ID-XMLToSQL and SQLGen include the following:
- ID-XMLToSQL resolves the recursion at the XML schema level using the precomputed unfolded XML schema graph, unlike SQLGen, which resolves it inside the relational engine using a recursive SQL query.
- The queries generated by SQLGen are typically larger and more complex than the ones generated by our ID-XMLToSQL.
- ID-XMLToSQL uses the notion of clustering and avoids unnecessary self-joins.

Fig. 4. Experimental Results for Query Mapping (x-axis: queries Q1 to Q9; y-axis: time, logarithmic scale; legend: Interval-XMLToSQL, ID-XMLToSQL, SQLGen)

7 Conclusions and Future Work

In this paper, we proposed the generic XML-to-SQL query mapping algorithm ID-XMLToSQL, which can be used with multi-valued schema mappings as well as with single-valued schema mappings. ID-XMLToSQL uses our proposed path-based σ_p-mapping technique to find the target relation for a given element of a path query in the presence of multi-valued schema mappings. We proposed to convert a cyclic XML schema graph to an acyclic one by unfolding the cycles in the graph to a maximum level of depth. Thus, we are able to map recursive XML queries over the unfolded XML schema graph to SQL queries without using special operators to capture the recursion. Therefore, our proposed query mapping algorithm can be used on any RDBMS, as it uses only standard SQL features, unlike other recursive query mapping algorithms in the literature. We compared the performance of our ID-XMLToSQL algorithm to the SQLGen algorithm of [4] and observed that ID-XMLToSQL outperformed SQLGen for all the queries in our test suite. We consider augmenting our proposed ID-based generic query mapping algorithm with interval-based and path-based mapping schemes as potential future work.

Acknowledgment

The authors would like to thank Rajasekar Krishnamurthy for providing the source code of the SQLGen algorithm and for his cooperation, and Dapeng Liu for his involvement in the implementation of our ID-XMLToSQL algorithm.
References

1. Shanmugasundaram, J., Tufte, K., Zhang, C., He, G., DeWitt, D.J., Naughton, J.F.: Relational databases for querying XML documents: Limitations and opportunities. In: VLDB (1999)
2. Atay, M., Chebotko, A., Liu, D., Lu, S., Fotouhi, F.: Efficient schema-based XML-to-Relational data mapping. Information Systems Journal 32(3) (2007)
3. Krishnamurthy, R., Kaushik, R., Naughton, J.F.: XML-to-SQL query translation literature: The state of the art and open problems. In: XML Database Symposium (2003)
4. Krishnamurthy, R., Chakaravarthy, V.T., Kaushik, R., Naughton, J.F.: Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation. In: Proc. of the 20th International Conference on Data Engineering, Boston (March 2004)
5. Fan, W., Yu, J.X., Lu, H., Lu, J., Rastogi, R.: Query translation from XPath to SQL in the presence of recursive DTDs. In: Proc. of the 31st VLDB Conference, Trondheim, Norway (2005)
6. Choi, B.: What are real DTDs like? In: WebDB Workshop (2002)
7. Deutsch, A., Fernandez, M.F., Suciu, D.: Storing semistructured data with STORED. In: SIGMOD Conference (1999)
8. Florescu, D., Kossmann, D.: Storing and querying XML data using an RDBMS. IEEE Data Engineering Bulletin 22(3) (1999)
9. Schmidt, A., Kersten, M., Windhouwer, M., Waas, F.: Efficient relational storage and retrieval of XML documents. In: WebDB (2000)
10. Yoshikawa, M., Amagasa, T., Shimura, T., Uemura, S.: XRel: A path-based approach to storage and retrieval of XML documents using relational databases. ACM Transactions on Internet Technology (TOIT) 1(1) (2001)
11. Tatarinov, I., Viglas, S., Beyer, K.S., Shanmugasundaram, J., Shekita, E.J., Zhang, C.: Storing and querying ordered XML using a relational database system. In: SIGMOD Conference (2002)
12. DeHaan, D., Toman, D., Consens, M.P., Ozsu, T.: A comprehensive XQuery to SQL translation using dynamic interval encoding. In: SIGMOD Conference (2003)
13. Teubner, J., Grust, T., Keulen, M.V.: Staircase join: Teach a relational DBMS to watch its (axis) steps. In: VLDB Conference (2003)
14. Krishnamurthy, R., Kaushik, R., Naughton, J.F.: Efficient XML-to-Relational query translation: Where to add intelligence? In: Proc. of the 30th VLDB Conference, Toronto, Canada (2004)
15. Runapongsa, K., Patel, J.M.: Storing and querying XML data in object-relational DBMSs. In: EDBT Workshops (2002)
16. Cheng, J., Xu, J.: DB2 extender for XML. IBM (2000)
17. Oracle: XML Database Developer's Guide - Oracle XML DB Release 2 (2002)
18. Microsoft: SQLXML and XML Mapping Technologies (2004)
19. Schmidt, A., Waas, F., Kersten, M.L., Carey, M.J., Manolescu, I., Busse, R.: XMark: A benchmark for XML data management. In: VLDB (2002)
