Storing and Querying XML Data in Object-Relational DBMSs
Kanda Runapongsa and Jignesh M. Patel
University of Michigan, Ann Arbor MI 48109, USA
{krunapon,

Abstract. As the popularity of eXtensible Markup Language (XML) continues to increase at an astonishing pace, data management systems for storing and querying large repositories of XML data are urgently needed. In this paper, we investigate an Object-Relational DBMS (ORDBMS) for storing and querying XML data. We present an algorithm, called XORator, for mapping XML documents to tables in an ORDBMS. An important part of this mapping is assigning a fragment of an XML document to a new XML data type. We demonstrate that, using the XORator algorithm, an ORDBMS is usually more efficient than a Relational DBMS (RDBMS). Based on an actual implementation in DB2 V.7.2, we compare the performance of the XORator algorithm with a well-known algorithm for mapping XML data to an RDBMS. Our experiments show that the XORator algorithm requires less storage space, has much faster loading times, and in most cases can evaluate queries faster. The primary reason for this performance improvement is that the XORator algorithm results in a database that is smaller in size, and in queries that usually have fewer joins.

1 Introduction

As the popularity of XML (eXtensible Markup Language) [5] for representing easily sharable data continues to grow, large repositories of XML data are likely to emerge. Data management systems for storing and querying these large repositories are urgently needed. Currently, there are two dominant approaches for managing XML repositories [14]. The first approach is to use a native XML database engine for storing and querying XML data sets [20, 1]. This approach has the advantage that it can provide a more natural data model and query language for XML data, which is typically viewed using a hierarchical or graph representation.
The second approach is to map the XML data and queries to constructs provided by a Relational DBMS (RDBMS) [12, 15, 25, 26]. XML data is mapped to relations, and queries on the XML data are converted into SQL queries. The results of the SQL queries are then converted to XML documents before the answer is returned to the user. If the mapping of the XML data and queries to relational constructs is automatic, then the user does not need to be involved in the complexity of the mapping. One can leverage many decades of research and commercialization efforts by exploiting existing features in an RDBMS. An additional advantage of an RDBMS is that it can be used for querying both XML data and data that exists in the relational systems. The disadvantage of using an RDBMS is that it can lower performance, since a mapping from XML data to relational data may produce a database schema with many relations. Queries on the XML data, when translated into SQL queries, may have many joins, which makes the queries expensive to evaluate.

This work was supported in part by the National Science Foundation under NSF grant IIS , by an IBM CAS Fellowship, and by gift donations from IBM and NCR.
A.B. Chaudhri et al. (Eds.): EDBT 2002 Workshops, LNCS 2490, pp , © Springer-Verlag Berlin Heidelberg 2002

In this paper, we investigate a third approach, namely using an Object-Relational DBMS (ORDBMS) for managing XML data sets. Our motivations for using an ORDBMS are threefold. First, most database vendors today offer universal database products that combine their relational DBMS and ORDBMS offerings into a single product. This implies that the ORDBMS products have all the advantages of an RDBMS. Second, an ORDBMS has a more expressive type system than an RDBMS and, as we will show, can be used to produce a more efficient mapping from an XML data model to constructs in the ORDBMS type system. Third, an ORDBMS is better suited for storing and querying XML documents that may use a richer set of data types.

We present an algorithm, called XORator (XML to OR Translator), that uses Document Type Definitions (DTDs) to map XML documents to tables in an ORDBMS. An important part of this mapping is the assignment of a fragment of an XML document to a new XML data type, called XADT (XML Abstract Data Type). Among the several recently proposed XML schema languages, we use DTDs in this paper since real XML documents that conform to DTDs are readily available today. Although we focus on DTDs, the XORator algorithm is applicable to any XML schema language that allows defining elements composed of attributes and other nested subelements.

In this paper, we also explore alternative storage organizations for the XADT.
Storing a large XML fragment as a tagged string can be inefficient, as repeated tags can occupy a large amount of space. To reduce this space overhead, we also explore the use of an alternative compressed storage technique for the XADT.

We have implemented the XORator algorithm and the XADT in DB2 UDB V.7.2, and used real and synthetic data sets to demonstrate the effectiveness of the proposed algorithm. In the experiments, we compare the XORator algorithm with the well-known Hybrid algorithm for mapping XML data to relational databases [25]. Our experiments demonstrate that, compared to the Hybrid algorithm, the XORator algorithm requires less storage space, has much faster loading times, and in most cases can evaluate queries faster. In many cases, query evaluation using the XORator algorithm is faster by an order of magnitude, primarily because the XORator algorithm produces a database that is smaller in size, and results in queries that usually have fewer joins.

The remainder of this paper is organized as follows. We first discuss related work in Section 2. Section 3 describes the XORator algorithm for mapping XML documents to relations in an ORDBMS using a DTD. We then compare the effectiveness of the XORator algorithm with the Hybrid algorithm in Section 4. Finally, we present our conclusions and discuss future work in Section 5.

2 Related Work

In this section we discuss and compare previous work on mapping XML data to relational data. Several commercial DBMSs offer some support for storing and querying
XML documents [23, 11, 10]. However, these engines do not provide automatic mappings from XML data to relational data, so the user needs to design an appropriate storage mapping.

A number of previous works have proposed automatic mappings from XML documents to relations [12, 15, 17, 24, 25, 26]. Deutsch, Fernandez, and Suciu [12] proposed the STORED system for mapping between the semistructured data model and the relational data model. They adapted a data mining algorithm to identify highly supported patterns for storage in relations. Along the lines of mapping XML data sets to relations, Florescu and Kossmann [15] proposed and evaluated a number of alternative mapping techniques. In their experimental results, the best overall approach is based on separate Attribute tables for every attribute name, with values inlined into these Attribute tables. While these approaches require only an instance of XML data in the transformation process, Shanmugasundaram et al. [25] used the DTD to find a "good" storage mapping. They proposed three strategies to map DTDs into relational schemas and identified the Hybrid inlining algorithm as being superior to the other ones (in most cases). Most recently, Bohannon et al. [2] introduced a cost-based framework for XML-to-relational storage mapping that automatically finds the best mapping for a given configuration of an XML Schema, XML data statistics, and an XML query workload. Like [25, 2], we also use the schema of XML documents to derive a relational schema. However, unlike these previously discussed algorithms, we leverage the data type extensibility feature of ORDBMSs to provide a more efficient mapping. We compare the effectiveness of the XORator algorithm (using an ORDBMS) with the Hybrid algorithm (using an RDBMS), and show that the XORator algorithm generally performs significantly better.

Shimura et al.
[26] proposed a method that decomposes XML documents into nodes and stores them in relational tables according to the node types. They defined a user data type to store the region of each node within a document. This data type keeps the positions of nodes, and the methods associated with the data type determine ancestor-descendant and element order relationships. Schmidt et al. [24] proposed the Monet XML data model, which is based on the notion of binary associations, and showed that their approach had better performance than the approach proposed by Shimura et al. [26]. Since the Monet approach uses a mapping scheme that converts each distinct edge in the DTD to a table, it produces a large number of tables. The Shakespeare DTD maps to four tables using the XORator algorithm, while it maps to ninety-five tables using the algorithm proposed in [24].

Techniques for resolving the data model and schema heterogeneity differences between the relational and XML data models have been examined [16]. The problem of preserving the semantics of the XML data model in the mapping process has also been addressed [18]. These techniques are complementary to the XORator algorithm, which maps based on the structural information of the XML data.

Our work is closest to the work proposed by Klettke and Meyer [17]. While their mapping scheme uses a combination of the DTD, the statistics of sample XML documents, and the query workload to map XML data to ORDBMS data, the XORator algorithm only examines the DTD. Whereas no implementation or experimental evaluation is presented in [17], we implement the XORator algorithm and compare it with the Hybrid algorithm. Furthermore, their mapping assumes the existence of the following type
constructors: set-of and list-of in ORDBMSs (which are not available in current commercial products), and requires that the user set a threshold specifying which attributes should be assigned to an XML data type. However, no guidelines are provided for choosing the threshold. In contrast, the XORator algorithm requires neither user input nor a query workload. The XORator algorithm is a practical demonstration of the use of an XML data type and of the advantage of using an ORDBMS over an RDBMS.

3 Storing XML Documents in an ORDBMS

In this section, we describe the XORator algorithm for generating an object-relational schema from a DTD. In our discussions below we graphically represent a DTD using the DTD graph proposed by Shanmugasundaram et al. [25]. A sample DTD for describing Plays is shown in Fig. 1, and the corresponding DTD graph is shown in Fig. 3(a).

<!ELEMENT PLAY (INDUCT?, ACT+)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE+)>
<!ELEMENT ACT (SCENE+, TITLE, SUBTITLE*, PROLOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, (SPEECH | SUBHEAD)+)>
<!ELEMENT SPEECH (SPEAKER, LINE)+>
<!ELEMENT PROLOGUE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA)>

Fig. 1. A DTD of a Plays Data Set

Fig. 1 shows a DTD which states that a PLAY element can have two subelements, INDUCT and ACT, in that order. The symbol ? following INDUCT indicates that there can be zero or one occurrence of the INDUCT subelement nested in each PLAY element. The symbol + following ACT indicates that there can be one or more occurrences of ACT subelements nested in each PLAY element. In the second ELEMENT definition, the symbol * following SUBTITLE indicates that there can be zero or more occurrences of SUBTITLE subelements nested in each INDUCT element. A subelement that is not followed by any symbol must occur exactly once.
For example, an ACT element must contain one and only one TITLE subelement. For more details about DTDs, please refer to [4].

3.1 Reducing DTD Complexity

The first step in the mapping process is to simplify the DTD information to a form that makes the mapping process easier. We start by applying the set of rules proposed in [25] to simplify the complexity of DTD element specifications. These transformations reduce the number of nested expressions and the number of element items. Examples of these transformations are as follows:
Flattening (to convert a nested definition into a flat representation): $(e_1, e_2)^* \rightarrow e_1^*, e_2^*$
Simplification (to reduce multiple unary operators into a single unary operator): $e_1^{**} \rightarrow e_1^*$
Grouping (to group subelements that have the same name): $e_0, e_1^*, e_1^*, e_2 \rightarrow e_0, e_1^*, e_2$

In addition, $e^+$ is transformed to $e^*$. The simplified version of the DTD shown in Fig. 1 is depicted in Fig. 2.

<!ELEMENT PLAY (INDUCT?, ACT*)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE*)>
<!ELEMENT ACT (SCENE*, TITLE, SUBTITLE*, PROLOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, SPEECH*, SUBHEAD*)>
<!ELEMENT SPEECH (SPEAKER*, LINE*)>
<!ELEMENT PROLOGUE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA)>

Fig. 2. A DTD of a Plays Data Set (Simplified Version)

3.2 Building a DTD Graph

After simplifying the DTD using the simplification rules [25], we build a DTD graph to represent the structure of the DTD. Nodes in the DTD graph are elements, attributes, and operators. Unlike the DTD graph proposed by Shanmugasundaram et al. [25], where each element below a node appears exactly once, in our DTD graph elements that contain characters are duplicated to eliminate the sharing. To illustrate the application of this rule, consider the SUBTITLE element, which is an element of type PCDATA (it contains characters). In the DTD graph of [25], the SUBTITLE element appears only once, as shown in Fig. 3(a). We choose to decouple the shared SUBTITLE element by rewriting the DTD graph, as shown in Fig. 3(b). The advantage of this approach is that fewer joins are required for queries that involve the SUBTITLE element and its parent elements in the DTD graph (such as the INDUCT or the ACT elements), because the SUBTITLE element is represented directly in the table corresponding to the parent element. The disadvantage of this approach is that queries on the SUBTITLE elements must now query all tables that contain data corresponding to the SUBTITLE element.
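The decoupling of shared character-data elements described above can be illustrated with a small sketch. It is not the authors' implementation: the child-map representation of the simplified Plays DTD, the function names, and the `PARENT.CHILD` naming of the duplicated nodes are all assumptions made for illustration, and the occurrence operators are omitted for brevity.

```python
# Simplified Plays DTD as a child map (assumed representation).
# An empty child list marks a #PCDATA leaf element.
DTD = {
    "PLAY": ["INDUCT", "ACT"],
    "INDUCT": ["TITLE", "SUBTITLE", "SCENE"],
    "ACT": ["SCENE", "TITLE", "SUBTITLE", "PROLOGUE"],
    "SCENE": ["TITLE", "SUBTITLE", "SPEECH", "SUBHEAD"],
    "SPEECH": ["SPEAKER", "LINE"],
    "PROLOGUE": [], "TITLE": [], "SUBTITLE": [],
    "SUBHEAD": [], "SPEAKER": [], "LINE": [],
}


def parents_of(dtd):
    """Map every element name to the list of elements that contain it."""
    parents = {name: [] for name in dtd}
    for parent, children in dtd.items():
        for child in children:
            parents[child].append(parent)
    return parents


def decouple_shared_leaves(dtd):
    """Return graph edges in which each shared leaf gets a private copy per parent."""
    parents = parents_of(dtd)
    edges = []
    for parent, children in dtd.items():
        for child in children:
            is_leaf = not dtd[child]
            if is_leaf and len(parents[child]) > 1:
                # Shared character-data element: duplicate it under this parent.
                edges.append((parent, f"{parent}.{child}"))
            else:
                # Non-leaf or unshared element: keep the single shared node.
                edges.append((parent, child))
    return edges


edges = decouple_shared_leaves(DTD)
```

In this sketch SUBTITLE and TITLE, which have several parents, are duplicated, while the shared non-leaf SCENE element keeps a single node.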
In the future, we plan to take the query workload (if it is available) into account during the transformation.

3.3 XORator: Mapping DTD to an ORDBMS Schema

The next step is to map this DTD graph to constructs in an ORDBMS schema. For this purpose, the XORator algorithm builds on the procedure used in the Hybrid algorithm [25]. The procedure used in the Hybrid algorithm is summarized as follows. After creating a DTD graph, the Hybrid algorithm creates an element graph, which expands the relevant part of the DTD graph into a tree structure. Given an element graph, a relation is created for nodes that satisfy any of the following conditions: 1) nodes that have an in-degree of zero, 2) nodes that are directly below a * operator, 3) recursive nodes with in-degree greater than one, and 4) one node among mutually recursive nodes with in-degree one. All remaining nodes (nodes not mapped to a relation) are inlined as attributes under the relation created for their closest ancestor nodes (in the element graph).

[Fig. 3. The DTD Graphs for the Plays DTD: (a) The DTD Graph for the Plays DTD; (b) The Revised DTD Graph for the Plays DTD]

In contrast, the XORator algorithm creates only a DTD graph, and not all nodes below a * operator are mapped to a relation. The XORator algorithm allows mapping an entire subtree of the DTD graph to an attribute of the XADT. An XADT attribute can store a fragment of an XML document, and its interfaces are described in Section 3.4. The implementation details of the XADT in DB2 are described in Section 4.1. Using the XADT, the XORator algorithm applies the following rules:

1. If a non-leaf node N in the DTD graph has exactly one parent, and if there are no links incident on any of the descendants of this node, then node N is assigned to an XADT attribute. (If node N were instead assigned to a relation, queries on this node and its parent would require a join.) This rule identifies maximal subgraphs that are connected to the remaining nodes in the graph by a single node. Each subgraph is mapped to an XADT attribute.
2. If a non-leaf node below a * node is accessed by multiple nodes, then it is assigned to a relation. For nodes that are mapped to relations, the ancestors of these nodes must also be assigned to relations.
3. If a leaf node is below a * node, then it is assigned as an attribute of the XADT. Otherwise, it is assigned as an attribute of string type.

The schemas of the relations produced by the Hybrid and the XORator algorithms are shown in Fig. 4 and Fig. 5, respectively.

play (playid:integer)
act (actid:integer, act_parentid:integer, act_childorder:integer, act_title:string, act_prologue:string)
scene (sceneid:integer, scene_parentid:integer, scene_childorder:integer, scene_title:string)
induct (inductid:integer, induct_parentid:integer, induct_childorder:integer, induct_title:string)
speech (speechid:integer, speech_parentid:integer, speech_parentcode:string, speech_childorder:integer)
subtitle (subtitleid:integer, subtitle_parentid:integer, subtitle_parentcode:integer, subtitle_childorder:integer, subtitle_value:string)
subhead (subheadid:integer, subhead_parentid:integer, subhead_childorder:integer, subhead_value:string)
speaker (speakerid:integer, speaker_parentid:integer, speaker_childorder:integer, speaker_value:string)
line (lineid:integer, line_parentid:integer, line_childorder:integer, line_value:string)

Fig. 4. The Relational Schema Produced Using the Hybrid Algorithm

Fields shown in italic are primary keys. As proposed in [25], each relation has an ID field that serves as the primary key for that relation, and all relations corresponding to element nodes that have a parent also have a parentid field that serves as a foreign key to the parent tuple. Moreover, all relations corresponding to element nodes that have multiple parents have a parentcode field to identify the corresponding parent tables. In this paper, we also add a childorder field that records the order number of the element among its siblings.
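The bookkeeping fields described above (ID, parentid, parentcode, childorder) can be sketched as follows. This is a deliberately simplified illustration, not the Hybrid or XORator mapping itself: every element gets its own table with no inlining and no XADT attributes, and the function name `shred` and the row layout are hypothetical. It also assumes that childorder counts position among siblings with the same tag, which is one reading of "order number of the element among its siblings".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def shred(xml_text):
    """Split a document into per-tag tables carrying the key fields."""
    tables = defaultdict(list)   # element tag -> list of rows
    next_id = defaultdict(int)   # element tag -> last ID handed out

    def visit(elem, parent_id, parent_code, child_order):
        next_id[elem.tag] += 1
        elem_id = next_id[elem.tag]
        is_leaf = len(elem) == 0
        tables[elem.tag].append({
            "id": elem_id,                # primary key within this table
            "parentid": parent_id,        # foreign key to the parent tuple
            "parentcode": parent_code,    # which table the parent lives in
            "childorder": child_order,    # assumed: order among same-tag siblings
            "value": (elem.text or "").strip() if is_leaf else None,
        })
        seen = defaultdict(int)
        for child in elem:
            seen[child.tag] += 1
            visit(child, elem_id, elem.tag, seen[child.tag])

    visit(ET.fromstring(xml_text), None, None, 1)
    return dict(tables)


rows = shred(
    "<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>a</LINE><LINE>b</LINE></SPEECH>"
)
```

Here the second LINE row carries parentid 1, parentcode "SPEECH", and childorder 2, so a parent-child lookup becomes an equality join on the key fields.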
3.4 Defining an XML Data Type (XADT)

There are two aspects in designing the XADT: choosing a storage format for the data type, and defining appropriate methods on the data type. We discuss each of these aspects in turn.

Storage Alternatives for the XADT

A naive storage format is to store in the attribute the text string corresponding to the
fragment of the XML document. Since a string may have many repeated element tag names, this storage format may be inefficient.

play (playid:integer)
act (actid:integer, act_parentid:integer, act_childorder:integer, act_title:string, act_subtitle:xadt, act_prologue:string)
scene (sceneid:integer, scene_parentid:integer, scene_childorder:integer, scene_title:string, scene_subtitle:xadt, scene_subhead:xadt)
induct (inductid:integer, induct_parentid:integer, induct_childorder:integer, induct_title:string, induct_subtitle:xadt)
speech (speechid:integer, speech_parentid:integer, speech_parentcode:string, speech_childorder:integer, speech_speaker:xadt, speech_line:xadt)

Fig. 5. The Relational Schema Produced Using the XORator Algorithm

An alternative is to use a compressed representation for the XML fragment. The approach that we adopt in this paper is a compression technique inspired by the XMill compressor [19]. Element tag names are mapped to integer codes, and each tag in the fragment is replaced by its code. A small dictionary is stored along with the XML fragment to record the mapping between the integer codes and the actual element tag names. In cases where there are few repeated tags in the XADT attribute, the compression increases the storage size because of the space taken by the dictionary. Consequently, we have two implementations of the XADT: one that uses compression, and one that does not. The decision about which implementation of the XADT to use is made during the document transformation process by monitoring the effectiveness of the compression technique. This is achieved by randomly parsing a few sample documents to obtain the storage space sizes in both the uncompressed and the compressed versions. Compression is used only if the space saving is above a certain threshold value.
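A minimal sketch of the tag-dictionary idea described above is given here. It is an assumption-laden illustration rather than the XADT storage format: the `dictionary|body` layout, the function names, and the restriction to attribute-free tags are all hypothetical, and XMill itself does considerably more than this.

```python
import re

# Matches attribute-free start and end tags such as <LINE> or </LINE>.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")
CODE = re.compile(r"<(/?)(\d+)>")


def compress(fragment):
    """Replace tag names with integer codes and prepend the dictionary."""
    codes = {}  # tag name -> integer code, in first-seen order

    def encode(match):
        slash, name = match.groups()
        code = codes.setdefault(name, len(codes))
        return f"<{slash}{code}>"

    body = TAG.sub(encode, fragment)
    dictionary = ",".join(codes)  # list position doubles as the code
    return dictionary + "|" + body


def decompress(blob):
    """Invert compress() by looking each code up in the dictionary."""
    dictionary, body = blob.split("|", 1)
    names = dictionary.split(",")
    return CODE.sub(lambda m: f"<{m.group(1)}{names[int(m.group(2))]}>", body)


repeated = "<LINE>a</LINE>" * 50   # many repeated tags: compression helps
single = "<sp>s1</sp>"             # one short tag: the dictionary costs more
```

With many repeated tags the encoded form is much shorter, whereas for the single short fragment the dictionary overhead makes it longer, which mirrors the reason given above for keeping an uncompressed implementation as well.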
Methods on the XADT

In addition to defining the required methods for input and output on the XADT, we also define the following methods:

1. XADT getelm(xadt inxml, VARCHAR rootelm, VARCHAR searchelm, VARCHAR searchkey, INTEGER level): This method examines the XML fragment stored in inxml, and returns all rootelm elements that have searchelm within a depth of level from the rootelm. A default value for level indicates that the level information is to be ignored. If searchkey and searchelm are both specified, this method only considers the searchelm elements that contain the searchkey keyword. If only searchkey is an empty string, then it returns all rootelm elements that have searchelm as subelements. If only searchelm is an empty string, then it returns all rootelm elements. If both searchelm and searchkey are empty strings, this method returns all rootelm elements in the inxml fragment. This function answers a simple path query with two element tag names, but more complex path queries can be answered by composing multiple calls to this function. The function takes an XADT attribute as input and produces an XADT output, which can then be the input to another call of this function.

2. INTEGER findkeyinelm(xadt inxml, VARCHAR searchelm, VARCHAR searchkey): This method examines all elements with the tag name searchelm in inxml, and searches for searchelm elements whose content matches the searchkey keyword. As soon as the first searchelm element that contains searchkey is found, the function returns a value of 1 (true). Otherwise, the function returns a value of 0 (false). If only searchkey is an empty string, this method simply checks whether inxml contains any searchelm elements. If only searchelm is an empty string, this method simply checks whether searchkey is part of the content of any element in inxml. searchelm and searchkey cannot both be empty strings at the same time. This function is a special case of the getelm method defined above, and is implemented for efficiency purposes only.

3. XADT getelmindex(xadt inxml, VARCHAR parentelm, VARCHAR childelm, INTEGER startpos, INTEGER endpos): This method returns all childelm elements that are children of the parentelm elements and whose sibling order is between the startpos and endpos positions. If parentelm is an empty string, then childelm is treated as the root element in the XADT. Note that childelm cannot be an empty string.

In this paper, we use only the three methods described above; however, more specialized methods can be implemented to improve the performance of the XADT even further.

Sample queries for both algorithms, posed on a data set describing Shakespeare plays, are shown in Fig. 6 and Fig. 7. The DTD for this data set is shown in Fig. 9. Fig. 6 shows query QE1, which retrieves lines that are spoken in acts by the speaker HAMLET and contain the keyword friend. Fig. 6(a) demonstrates the use of the XADT methods getelm and findkeyinelm, and Fig.
6(b) demonstrates the query QE1 executed over the database produced by the Hybrid algorithm. Fig. 7 shows query QE2, which returns the second line in each speech. Fig. 7(a) demonstrates the use of the XADT method getelmindex, and Fig. 7(b) demonstrates the query QE2 executed over the database produced by the Hybrid algorithm.

3.5 Unnest Operator

In addition to the functionality provided by the methods described above, we also need an unnest operator to answer queries posed on the XML data. As described in Section 3.3, the XORator algorithm can map an entire subtree of a DTD graph below a * node to an XADT attribute. One can then view the XADT attribute as a set of XML fragment trees. When a query needs to examine individual elements in the set, an unnest operator is required. For example, for the Plays DTD (of Fig. 1), consider the query that requests a distinct list of all speakers who speak in at least one play. In our approach, speakers are stored as an XADT attribute. It is possible that one speech has
many speakers. Thus the speaker attribute of a speech tuple, which is of type XADT, can store an XML fragment such as <sp>s1</sp><sp>s2</sp>, while another speech tuple could have a single speaker stored as <sp>s1</sp>. To evaluate the query, we need to first unnest the speakers and then retrieve the distinct speakers.

(a) Using the XORator Algorithm
SELECT getelm(speech_line, 'LINE', 'LINE', 'friend')
FROM speech, act
WHERE findkeyinelm(speech_speaker, 'SPEAKER', 'HAMLET') = 1
AND findkeyinelm(speech_line, 'LINE', 'friend') = 1
AND speech_parentid = actid
AND speech_parentcode = 'ACT'

(b) Using the Hybrid Algorithm
SELECT line_value
FROM speech, act, speaker, line
WHERE speech_parentid = actid
AND speech_parentcode = 'ACT'
AND speaker_parentid = speechid
AND speaker_value = 'HAMLET'
AND line_parentid = speechid
AND line_value LIKE '%friend%'

Fig. 6. Query QE1 in Both Algorithms

(a) Using the XORator Algorithm
SELECT getelmindex(speech_line, '', 'LINE', 2, 2)
FROM speech

(b) Using the Hybrid Algorithm
SELECT line_value
FROM speech, line
WHERE line_parentid = speechid
AND line_childorder = 2

Fig. 7. Query QE2 in Both Algorithms

In our implementation, we define such an unnest operation using a table User-Defined Function (UDF). A table UDF is an external UDF that delivers a table to the SQL query in which it is referenced. A table UDF can be invoked in the FROM clause of a SQL statement, and produces a table with tuples in an unnested form. Fig. 8 shows the content of the table speakers before we unnest the speaker attribute, and the result of the query that unnests the speaker attribute. The first parameter (speaker) of the table UDF unnest is the name of the attribute that contains the nested elements. The second parameter of this function, sp, is the tag name of the elements to be unnested. unnesteds is the name of the table that is returned by the function unnest. This table has one attribute, out, which contains the unnested elements.
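The effect of the unnest operation described above can be sketched outside the database as follows. This is only an illustration of the semantics under stated assumptions: the generator `unnest` and the wrapping `<root>` element are hypothetical helpers, and the actual operator is a DB2 table UDF, not Python.

```python
import xml.etree.ElementTree as ET


def unnest(fragment, tag):
    """Yield every <tag> element found in an XML fragment as its own string."""
    # A fragment may hold several sibling elements, so wrap it in one root.
    wrapper = ET.fromstring(f"<root>{fragment}</root>")
    for elem in wrapper.iter(tag):
        elem.tail = None  # drop text that follows the element, if any
        yield ET.tostring(elem, encoding="unicode")


# One XADT value per speech tuple, as in the speakers example above.
speakers = ["<sp>s1</sp><sp>s2</sp>", "<sp>s1</sp>"]

unnested = [item for fragment in speakers for item in unnest(fragment, "sp")]
distinct_speakers = sorted(set(unnested))
```

Unnesting turns the two stored fragments into three single-speaker rows, and the distinct step then reduces them to the two speakers s1 and s2.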
4 Performance Evaluation

In this section, we present the results of implementing the XORator algorithm (along with the XADT), and the results comparing its effectiveness with the Hybrid algorithm [25]. We evaluated the effectiveness of the two algorithms using both real and synthetic data
sets. For the real data set, we used the well-known Shakespeare plays data set [3]. In Section 4.3, we present the experimental results comparing the Hybrid and the XORator algorithms using this data set. We also tested the XORator algorithm using a data set that is deep and forces the XORator algorithm to map large XML fragments to XADT attributes. The deep DTD allows us to test how effective the XORator algorithm is when most of the data is stored inside XADT attributes. We used the SIGMOD Proceedings DTD [21] for this purpose. Since we wanted large data sets, we used an XML document generator [9] to generate data conforming to this DTD. The results of the experiment with this DTD are presented in Section 4.4.

(a) Before Unnesting
QUERY: SELECT speaker FROM speakers
RESULT:
SPEAKER
<sp>s1</sp><sp>s2</sp>
<sp>s1</sp>
2 record(s) selected.

(b) After Unnesting
QUERY: SELECT unnesteds.out AS SPEAKER FROM speakers, table(unnest(speaker, 'sp', unnesteds))
RESULT:
SPEAKER
<sp>s1</sp>
<sp>s2</sp>
2 record(s) selected.

Fig. 8. Before and After Unnesting speaker Attribute

4.1 Implementation Overview

We implemented the XADT in DB2 UDB V.7.2. Two versions of the XADT were implemented, based on the two storage alternatives discussed in Section 3.4. The first implementation stores all the element tag names as strings. The second one uses a dictionary to encode the element tag names as integers, thereby storing the XML data in a compressed format. In both cases, the XADT was implemented on top of the VARCHAR data type provided by DB2. We used the C string functions to implement the methods outlined in Section 3.4.

To parse the original XML documents, we used IBM's alphaWorks Java XML Parser Release (XML4JV ) [8]. We modified the parser so that it reads the DTD and applies the XORator algorithm and the Hybrid algorithm, generating SQL commands to create tables in DB2.
The modified parser also chooses the storage alternative. The compressed format is chosen only if it reduces the storage space by at least 20%. In our current implementation all tuples in a table with an attribute of the XADT use the same storage representation. To decide which storage alternative to use, we randomly parse a few sample documents to obtain the storage space sizes in both uncompressed and compressed cases.
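The selection rule described above can be sketched as follows. Only the 20% threshold comes from the text; the function name `choose_storage`, the sample size, the fixed random seed, and the representation of each document as a pair of measured sizes are assumptions made for this illustration.

```python
import random


def choose_storage(size_pairs, sample_size=3, min_saving=0.20, seed=0):
    """Pick a storage format from sampled (uncompressed, compressed) byte sizes.

    size_pairs holds one (uncompressed_bytes, compressed_bytes) pair per document.
    Returns the chosen format and the measured fractional saving.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(size_pairs, min(sample_size, len(size_pairs)))
    plain = sum(uncompressed for uncompressed, _ in sample)
    packed = sum(compressed for _, compressed in sample)
    if plain == 0:
        return "uncompressed", 0.0
    saving = 1 - packed / plain
    # Compress only when the sampled saving reaches the threshold.
    choice = "compressed" if saving >= min_saving else "uncompressed"
    return choice, saving


# Hypothetical measurements: the first set shrinks by 38%, the second grows.
shrinking = [(1000, 620)] * 10
growing = [(1000, 1050)] * 10
```

Because the decision is made once per table from a sample, all tuples of that table share one representation, in line with the implementation note above.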
4.2 Experimental Platform and Methodology

We performed all experiments on a single-processor 550 MHz Pentium III machine running Windows 2000 V.5.0 with 256 MB of main memory. We used IBM DB2 V.7.2 as the database system. All the UDFs on the XADT are run in NOT FENCED mode in DB2. We chose the NOT FENCED mode because the FENCED option causes the UDF to run in an address space that is separate from the database address space, which incurs a significant performance penalty [7]. DB2 was configured to use a page size of 8 KB, and the use of hash joins was enabled.

Before executing queries in both algorithms, we created indexes as suggested by the DB2 Index Wizard, and collected statistics. The execution times reported in this section are cold numbers. Each query was run five times, and we report the average of the middle three execution times.

4.3 Experiments Using the Shakespeare Plays Data Set

In this experiment, we loaded 37 XML Shakespeare play documents (size 7.5 MB) into DB2 using the two mapping algorithms. The DTD corresponding to the Shakespeare Plays data set [3] is shown in Fig. 9.

<!ELEMENT PLAY (TITLE, FM, PERSONAE, SCNDESCR, PLAYSUBT, INDUCT?, PROLOGUE?, ACT+, EPILOGUE?)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT FM (P+)>
<!ELEMENT P (#PCDATA)>
<!ELEMENT PERSONAE (TITLE, (PERSONA | PGROUP)+)>
<!ELEMENT PGROUP (PERSONA+, GRPDESCR)>
<!ELEMENT PERSONA (#PCDATA)>
<!ELEMENT GRPDESCR (#PCDATA)>
<!ELEMENT SCNDESCR (#PCDATA)>
<!ELEMENT PLAYSUBT (#PCDATA)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, (SCENE+ | (SPEECH | STAGEDIR | SUBHEAD)+))>
<!ELEMENT ACT (TITLE, SUBTITLE*, PROLOGUE?, SCENE+, EPILOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, (SPEECH | STAGEDIR | SUBHEAD)+)>
<!ELEMENT PROLOGUE (TITLE, SUBTITLE*, (STAGEDIR | SPEECH)+)>
<!ELEMENT EPILOGUE (TITLE, SUBTITLE*, (STAGEDIR | SPEECH)+)>
<!ELEMENT SPEECH (SPEAKER+, (LINE | STAGEDIR | SUBHEAD)+)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA | STAGEDIR)*>
<!ELEMENT STAGEDIR (#PCDATA)>

Fig. 9. The DTD of Shakespeare Data Set

For this data set, the XORator algorithm chooses not to use the compressed storage alternative, since the compressed representation actually increases the storage size. The schemas for these relations are presented in the extended
version of this paper [22]. Table 1 compares the number of tables, the database sizes, and the index sizes for the two algorithms. The size of the database produced by the XORator algorithm is about 60% of the size of the database produced by the Hybrid algorithm.

Table 1. Database Sizes When Using the Shakespeare Data Set

                      Hybrid   XORator
Number of tables        17        7
Database size (MB)      15        9
Index size (MB)         30        3

To produce data sets that are larger than the base data set (size 7.5 MB), we took the original Shakespeare data set and loaded it multiple times, producing data sets that were two, four, and eight times the original database size. We call these configurations DSx2, DSx4, and DSx8, respectively. We call the original configuration DSx1.

The query set in this experiment is described below:

QS1: Flattening. List speakers and the lines that they speak.
QS2: Full path expression. Retrieve all lines that have stage directions associated with the lines.
QS3: Selection. Retrieve the lines that have the keyword "Rising" in the text of the stage direction.
QS4: Multiple selections. Retrieve the speeches spoken by the speaker "Romeo" in the play "Romeo and Juliet".
QS5: Twig with selection. Retrieve the speeches in the play "Romeo and Juliet" spoken by the speaker "Romeo", and the lines in the speech that contain the keyword "love".
QS6: Order access. Retrieve the second line in all speeches that are in prologues.

We chose this simple set of queries because it allows us to study the core performance issues and is an indicator of performance for more complex queries. In this paper, we do not focus on automatically rewriting XML queries into equivalent SQL queries. We refer the reader to proposed algorithms for rewriting queries from XML query languages to SQL [26, 6, 13]. The SQL statements corresponding to the above queries for both algorithms are presented in the extended version of this paper [22].
The data loading times and the results of executing queries QS1 through QS6 for the various data sets are shown in Fig. 10. In this figure, we plot the ratios of the execution times using the Hybrid and the XORator algorithms on a log scale. Since the size of the database produced using the XORator algorithm is about 60% of the size of the database produced using the Hybrid algorithm, the XORator algorithm results in much shorter loading times than the Hybrid algorithm. The XORator algorithm also results in significantly better execution times for all queries except query QS6. For most queries, the XORator algorithm is an order of magnitude faster than the Hybrid algorithm. This is because every query using the XORator algorithm requires at least one fewer join than the corresponding query (on the database produced) using the Hybrid algorithm. In the case of query QS6, the response times of the XORator algorithm are longer than
those of the Hybrid algorithm. This is because the database needs to scan the XADT attribute to extract elements in the specified order when using the XORator algorithm, while the database only needs to extract the value of the element order attribute when using the Hybrid algorithm.

Fig. 10. Hybrid/XORator Performance Ratios for DSx1, DSx2, DSx4, and DSx8 (y-axis: Hybrid/XORator performance ratios, log scale; x-axis: QS1 through QS6 and loading time)

As shown in Fig. 10, the ratios of the response times of some queries, such as query QS3, do not always increase as the data size increases. This is because the optimizer chooses different plans for different data set sizes. Note that the query optimizer has the most up-to-date statistics, since we always ran the runstats command and created indexes as suggested by the DB2 Index Wizard before executing the queries.

4.4 Experiments Using the Synthetic Data Set

This section presents the experimental results from using the SIGMOD Proceedings data set. The DTD of this data set is an example of a deep DTD. With such a DTD, the XORator technique maps large fragments of XML data to the XADT attributes, so this DTD is representative of the worst-case scenario for the XORator algorithm. The DTD corresponding to the SIGMOD Proceedings data set has seven levels. Elements that are likely to be queried often, such as author, are at the bottom-most level. The DTD is shown in Fig. 11.

In this experiment, we loaded the 3000 documents (size 12 MB) into DB2. For this data set, the XORator algorithm chooses to use the compressed storage alternative since the compressed representation reduces the database size by about 38%. Table 2 compares the number of tables, the database sizes, and the index sizes for the two algorithms. The size of the database produced by the XORator algorithm is about 65% of the size of the database produced by the Hybrid algorithm.
Please refer to the extended version of this paper [22] for the schemas of these relations.
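The storage decision described above (use the compressed alternative only when it actually shrinks the data) can be illustrated with a small sketch. This section does not describe the compressed format of the XADT, so the tag-dictionary encoding below is an assumed stand-in chosen purely for illustration, and the names compress_tags, stored_size, and choose_storage are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches simple start and end tags without attributes, e.g. <author>, </author>.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def compress_tags(fragment: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct tag name with a short numeric code.

    Assumed encoding for illustration only; tags with attributes are left
    untouched because TAG_PATTERN does not match them.
    """
    codes: Dict[str, str] = {}

    def encode(match):
        slash, name = match.group(1), match.group(2)
        code = codes.setdefault(name, str(len(codes)))
        return "<" + slash + code + ">"

    return TAG_PATTERN.sub(encode, fragment), codes


def stored_size(fragment: str) -> int:
    """Size of the compressed form plus the dictionary needed to decode it."""
    compressed, codes = compress_tags(fragment)
    # One extra character per entry as a separator between name and code.
    dictionary_cost = sum(len(name) + len(code) + 1 for name, code in codes.items())
    return len(compressed) + dictionary_cost


def choose_storage(fragment: str) -> str:
    """Pick the compressed alternative only if it is strictly smaller."""
    return "compressed" if stored_size(fragment) < len(fragment) else "uncompressed"
```

Under this assumed encoding, a fragment with long, frequently repeated tag names shrinks, while a fragment with short or rarely repeated tag names can grow once the dictionary is counted, which matches the behavior reported for the two data sets.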
<!ELEMENT PP (volume,number,month,year,conference,
              date,confyear,location,slist)>
<!ELEMENT volume
<!ELEMENT number
<!ELEMENT month
<!ELEMENT year
<!ELEMENT conference
<!ELEMENT date
<!ELEMENT confyear
<!ELEMENT location
<!ELEMENT slist (slisttuple)>
<!ELEMENT slisttuple (sectionname,articles)>
<!ELEMENT sectionname
<!ATTLIST sectionname SectionPosition CDATA #IMPLIED >
<!ELEMENT articles (atuple)>
<!ELEMENT atuple (title,authors,initpage,endpage,
                  Toindex,fullText)>
<!ELEMENT title
<!ATTLIST title articlecode CDATA #IMPLIED>
<!ELEMENT authors (author)>
<!ELEMENT author
<!ATTLIST author AuthorPosition CDATA #IMPLIED >
<!ELEMENT initpage
<!ELEMENT endpage
<!ELEMENT Toindex (index)?>
<!ELEMENT index
<!ATTLIST Toindex %Xlink;>
<!ELEMENT fulltext (size)?>
<!ELEMENT size
<!ATTLIST fulltext %Xlink;>

Fig. 11. The DTD of the SIGMOD Proceedings Data Set

Table 2. Database Sizes When Using the SIGMOD Proceedings Dataset

                      Hybrid   XORator
Number of tables         7        1
Database size (MB)
Index size (MB)         34        2

To produce data sets that are larger than the base data set (size 12 MB), we took the original SIGMOD Proceedings data set and loaded it multiple times, producing data sets that were two, four, and eight times the original database size. We call these configurations DPx2, DPx4, and DPx8 respectively. We call the original configuration DPx1. The query set in this experiment is described below:
QG1: Selection and extraction. Retrieve the authors of the papers with the keyword "Join" in the paper title.
QG2: Flattening. List all authors and the names of the proceeding sections in which their papers appear.
QG3: Flattening with selection. Retrieve the proceeding section names that have the papers published by authors whose names have the keyword "Worthy".
QG4: Aggregation. For each author, count the number of proceeding sections in which the author has a paper.
QG5: Aggregation with selection. Count the number of proceeding sections that have papers published by authors whose names have the keyword "Bird".
QG6: Order access with selection. Retrieve the second author of the papers with the keyword "Join" in the paper title.

The SQL statements corresponding to the above queries for both algorithms are presented in the extended version of this paper [22].

Fig. 12. Hybrid/XORator Performance Ratios for DPx1, DPx2, DPx4, and DPx8 (y-axis: Hybrid/XORator performance ratios, log scale; x-axis: QG1 through QG6 and loading time)

The performance comparison between the Hybrid technique and the XORator technique (with the XADT implemented in its compressed version) for the various data sets is shown in Fig. 12. Since the size of the database produced using the XORator algorithm is about 65% of the size of the database produced using the Hybrid algorithm, the XORator approach results in much shorter loading times. Two observations can be made based on Fig. 12: a) when the size of the data is small (DPx1 and DPx2), the XORator algorithm performs worse than the Hybrid algorithm; and b) when the size of the data becomes large (DPx4 and DPx8), the XORator algorithm outperforms the Hybrid algorithm.

When the amount of data is small, the XORator algorithm results in worse execution times for all queries. With the SIGMOD Proceedings data set, all data is mapped to a single table.
Consequently, there is no table join in the query, but each query has four to
eight calls to UDFs to extract subelements or to join elements inside the XADT attribute. The cost of invoking UDFs is a significant component of the query evaluation cost of the XORator algorithm.

To investigate whether a UDF incurs a higher performance penalty than an equivalent built-in function, we conducted the following experiment. We implemented two string functions, namely "return length" and "return substring", using built-in string functions and using UDFs. The experiment consists of two queries:

QT1: Return the length of the string in the SPEAKER attribute.
QT2: Return a substring of the string in the SPEAKER attribute from the fifth position to the last position.

Both queries were run on the Shakespeare data set and returned 31,028 tuples. In the built-in function case, we used the SQL length and substr functions in queries QT1 and QT2 respectively. In the UDF case, we used UDFs that called the C functions strlen and strncpy in queries QT1 and QT2 respectively. The results of these experiments are shown in Fig. 13 (the exact times taken to run these queries are deliberately not shown in this figure). Compared to using a built-in function, using the equivalent UDF is approximately 40% more expensive.

Fig. 13. Overhead in Invoking UDFs (y-axis: response time; x-axis: queries QT1 and QT2; series: built-in functions and UDFs)

Invoking UDFs is expensive for two reasons. First, our implementation of the methods on the XADT uses string compare and copy functions on the VARCHAR. This sometimes requires scanning a large amount of data. If metadata associated with each XADT attribute were available to help us quickly access the starting position of each element stored inside the XADT data, the performance might be improved. We plan on investigating this issue further in the future. Second, the cost of evaluating a UDF is actually higher than the cost of evaluating an equivalent built-in function, as shown in Fig. 13.
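The metadata idea mentioned above can be sketched as follows. This is a hypothetical illustration in Python, not the XADT implementation (which the paper describes as C string functions over a VARCHAR): it contrasts scanning a fragment to find the n-th element with looking the element up through precomputed start and end offsets. The function names are invented, and the sketch assumes non-nested elements without attributes.

```python
from typing import List, Optional, Tuple


def scan_for_nth(fragment: str, tag: str, n: int) -> Optional[str]:
    """Find the n-th (1-based) <tag>...</tag> by scanning from the start."""
    if n < 1:
        return None
    open_tag, close_tag = "<" + tag + ">", "</" + tag + ">"
    pos = 0
    content = None
    for _ in range(n):
        start = fragment.find(open_tag, pos)
        if start == -1:
            return None
        end = fragment.find(close_tag, start)
        if end == -1:
            return None
        content = fragment[start + len(open_tag):end]
        pos = end + len(close_tag)
    return content


def build_offsets(fragment: str, tag: str) -> List[Tuple[int, int]]:
    """Precompute (start, end) positions of the content of every <tag> element.

    This plays the role of the per-attribute metadata; it would be built once
    at load time rather than on every query.
    """
    open_tag, close_tag = "<" + tag + ">", "</" + tag + ">"
    offsets = []
    pos = 0
    while True:
        start = fragment.find(open_tag, pos)
        if start == -1:
            break
        end = fragment.find(close_tag, start)
        if end == -1:
            break
        offsets.append((start + len(open_tag), end))
        pos = end + len(close_tag)
    return offsets


def nth_with_offsets(fragment: str, offsets: List[Tuple[int, int]], n: int) -> Optional[str]:
    """Return the n-th element's content with a direct slice, no scanning."""
    if not 1 <= n <= len(offsets):
        return None
    start, end = offsets[n - 1]
    return fragment[start:end]
```

With the offsets available, an order-access query such as "retrieve the second line" becomes a single slice instead of a scan whose cost grows with the size of the fragment, at the price of storing and maintaining the extra metadata.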
If the XADT were implemented as a native data type (by the database vendors), we would expect the overhead of invoking the methods associated with the XADT to be reduced significantly, making the XORator technique more competitive.
Regarding the second observation in Fig. 12, as the data size increases, the ratios of the response times between the Hybrid and the XORator algorithms become greater than one (i.e., the XORator technique starts to outperform the Hybrid technique). The reason is that queries using the XORator algorithm typically have no joins, and thus their response times grow at an O(n) rate (scan cost). However, the response times of queries using the Hybrid algorithm, which typically have many joins, grow at either an O(n log n) rate (merge sort join cost) or an O(n^2) rate (nested loop join cost), where n is the number of tuples.

5 Conclusion and Future Work

We have proposed and evaluated XORator, an algorithm for mapping XML documents with DTDs to relations in an Object-Relational DBMS. Using the type-extensibility mechanisms provided by an ORDBMS, we added a new data type called XADT that can store and query arbitrary fragments of an XML document. The XORator algorithm maps a DTD to an object-relational schema that may have attributes of type XADT. Using an implementation in DB2, we show that the XORator algorithm generally outperforms the well-known Hybrid algorithm that is used for mapping XML documents to relations in an RDBMS. The primary reason for this superior performance is that queries on the database produced using XORator usually execute fewer joins. We also show that it is important to pay close attention to the implementation and evaluation of the UDFs. If the database vendors implemented the XADT as a native data type, the overhead of invoking the methods associated with the XADT would likely be reduced, making the XORator algorithm more effective. For future work, we will expand the mapping rules to accommodate additional factors, such as the query workload and the statistics of the XML data, including the number of levels and the size of the data in an XML fragment.
We will also investigate storing metadata with the XADT attribute and indexing the XADT attribute to improve the performance of the methods on the XADT.

Acknowledgements. We would like to thank H.V. Jagadish and Atul Prakash for providing insightful comments. We also would like to thank Kelly Lyons and Chun Zhang for providing information about UDFs. Finally, we would like to thank Hamid Pirahesh and Berthold Reinwald for allowing Kanda Runapongsa to work on this paper during her summer internship at IBM Almaden Research Center.

References

1. Software AG. Tamino - The Information Server for Electronic Business.
2. P. Bohannon, J. Freire, P. Roy, and J. Simeón. From XML Schema to Relations: A Cost-Based Approach to XML Storage. In Proceedings IEEE International on Data Engineering, San Jose, California, February.
3. J. Bosak. The Plays of Shakespeare in XML, July.
4. J. Bosak, T. Bray, D. Connolly, E. Maler, G. Nicol, C.M. Sperberg-McQueen, L. Wood, and J. Clark. W3C XML Specification DTD, June.
5. T. Bray, J. Paoli, C.M. Sperberg-McQueen, and E. Maler. Extensible Markup Language (XML), October.
6. M.J. Carey, J. Kiernan, J. Shanmugasundaram, E.J. Shekita, and S.N. Subramanian. XPERANTO: A Middleware for Publishing Object-Relational Data as XML Documents. In Proceedings International Conference Very Large Data Bases, Cairo, Egypt, September.
7. D. Chamberlin. Using The New DB2: IBM's Object-Relational Database System. Morgan Kaufmann Publishers, Inc., San Francisco, California.
8. IBM Corporation. XML Parser for Java, February.
9. IBM Corporation. IBM XML Generator, September.
10. IBM Corporation. IBM DB2 UDB XML Extender Administration and Programming, March. v71wrk/dxx%awmst.pdf.
11. Oracle Corporation. Oracle XML SQL Utility.
12. A. Deutsch, M. F. Fernandez, and D. Suciu. Storing Semistructured Data with STORED. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press.
13. M. F. Fernandez, Dan Suciu, and Wang-Chiew Tan. SilkRoute: Trading between Relations and XML. In Proceedings of Ninth International World Wide Web Conference.
14. D. Florescu, G. Graefe, G. Moerkotte, H. Pirahesh, and H. Schöning. Panel: XML Data Management: Go Native or Spruce up Relational Systems? In Proceedings of the ACM SIGMOD International Conference on Management of Data, Santa Barbara, California, May. (Panel Chair: Per-Ake Larson).
15. D. Florescu and D. Kossmann. Storing and Querying XML Data using an RDBMS. Bulletin of the Technical Committee on Data Engineering, 22(3):27-34.
16. G. Kappel, E. Kapsammer, S. Raush-Schott, and W. Retschzegger. X-Ray - Towards Integrating XML and Relational Database Systems. In International Conference on Conceptual Modeling (ER), Utah, USA, October.
17. M. Klettke and H. Meyer. XML and Object-Relational Database Systems - Enhancing Structural Mappings Based on Statistics. In International Workshop on the Web and Databases, Dallas, Texas, May.
18. D. Lee and W. W. Chu. Constraints-Preserving Transformation from XML Document Type Definition to Relational Schema. In International Conference on Conceptual Modeling (ER), October.
19. H. Liefke and D. Suciu. XMill: an Efficient Compressor for XML Data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, May.
20. J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: A Database Management System for Semistructured Data. SIGMOD Record, 26(3):54-66, September.
21. Sigmod Record. Sigmod Record: XML Edition.
22. K. Runapongsa and J. M. Patel. Storing and Querying XML Data in ORDBMSs. University of Michigan, Technical Report.
23. M. Rys. State-of-the-art Support in RDBMS: Microsoft SQL Server's XML Features. Bulletin of the Technical Committee on Data Engineering, 24(2):3-11, June.
24. A.R. Schmidt, M.L. Kersten, M. Windhouwer, and F. Waas. Efficient Relational Storage and Retrieval of XML Documents. In WebDB 2000 Third International Workshop on the Web and Databases, Dallas, Texas, May 2000.
25. J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, and J. Naughton. Relational Databases for Querying XML Documents: Limitations and Opportunities. In Proceedings International Conference Very Large Data Bases, Edinburgh, Scotland, September.
26. T. Shimura, M. Yoshikawa, and S. Uemura. Storage and Retrieval of XML Documents Using Object-Relational Databases. In International Conference on Database and Expert Systems Applications, Florence, Italy, September 1999.
