Translating XQuery expressions to Functional Queries in a Mediator Database System
|
|
|
- Grace Bond
- 10 years ago
- Views:
Transcription
1 Uppsala Student Thesis Computing Science No ISSN Translating XQuery expressions to Functional Queries in a Mediator Database System A student project paper by Tobias Hilka Advisor and Supervisor Tore Risch January 2004 Uppsala Database Laboratory Uppsala University P.O. Box 513 S Uppsala Sweden
2 Contents 1 INTRODUCTION XML XPath XQuery FIRST APPROACH TRANSLATION AmosQL and XQuery The Parts of FLWOR expressions Simple Query using for Where Clause Order By Clause special clause Multiple for clauses Combination of for and let clause Restrictions TRANSLATION RULES IMPLEMENTATION OF THE QUERY TRANSLATOR CONCLUSION AND FUTURE WORK REFERENCES...25 APPENDIX A: TRANSLATED QUERIES...26 A.1 TC/MD...26 A.2 TC/SD...31 A.3 DC/MD...36 A.4 DC/SD
3 1 Introduction This report on my student project will discuss some approaches of translating XQuery expressions into functional queries in a mediator database system. The mediator database system used is AMOS II (Active Mediator Object System)[1][2]. This system uses a functional data model and is based on the query language AmosQL. This language is relationally complete object-oriented and is similar to the object-oriented parts of SQL-99. An Amos II database system can integrate data of different type into its own object-oriented database using wrappers. The wrappers process data from different external sources such as ODBC based relational databases, CAD systems or internet search engines. This results in a common data model and a query language for heterogeneous data. AMOS II can not only be used as a stand-alone database system but also interoperate with many other independent AMOS II system over a communication network such as the internet. Applications can access data stored in some AMOS II database using other AMOS II databases called mediators. These mediators combine the underlying data sources in the needed way to offer a high-level abstraction. This greatly simplifies accessing heterogeneous data sources at the application level. The mediator system themselves can also manage data of its own. The data stored in an AMOS II database resides in main memory until it is stored explicitly to hard disk. There are two kinds of interfaces between AMOS II and the programming language Java called the callin and the callout interfaces[3]. Using the callout interface, the programmer can define foreign AMOS II functions in Java. These functions can freely be used in database queries. To manipulate the database, foreign functions written in Java use the callin interface by calling AmosQL statements or functions. The purpose of the project presented here was to find a way to translate XQuery expressions into queries expressed in AmosQL. Since XQuery is a very powerful language, not all constructions that are possible with XQuery can be translated to AmosQL. I used the well known XMLSchema based Waterloo benchmark for XML databases[4] and the queries used to generate workload. 1.1 XML XML stands for Extensible Markup Language and was standardized by the W3C[5]. It is designed to structure, store and send information. The tags used in XML are not predefined like in HTML and can be defined using a Document Type Definition (DTD) or an XML Schema[6]. Because of that, XML is extensible. With this standard it is possible to exchange data (e.g. over the internet) between different systems that are not able to communicate otherwise. Since its introduction by the W3C, XML has gained attention throughout the industry and is very widely used in software development. 3
4 1.2 XPath As increasing amounts of information are stored, exchanged, and presented using the XML format, the ability to intelligently query XML data sources becomes increasingly important. The primary purpose of XPath[7] is to address parts of an XML document. It operates on its abstract, logical structure. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document. In addition to its use for addressing, XPath is also designed so that it has a natural subset that can be used for matching (testing whether or not a node matches a pattern). The primary syntactic construct in XPath is the expression. One important kind of expression is a location path which selects a set of nodes relative to the context node. There are two kinds of location path: relative location paths and absolute location paths. A relative location path consists of one or more location steps separated by /. The steps are composed from left to right. Each step selects a set of nodes relative to the context node. Every node in this set will be used as context node for the following step. An absolute location path consists of / optionally followed by a relative location path. A / by itself selects the root node of a document which contains as a child the element node for the document, which will be used as context node for the following step. A location path itself consists of the following parts: an axis, specifying the tree relationship between the nodes, a node test, specifying the node type and name of the nodes to be selected and optional predicates, refining the nodes selected by the location step. For example, in "child::paragraph[position()=last()]", child is the name of the axis, paragraph is the node test and [position()=last()] is a predicate selecting the last paragraph child of the context node. The example uses the unabbreviated syntax. There are a number of syntactic abbreviations which make it possible to write the example as paragraph[last()]. A fully working extension to the AMOS II system, evaluating XPath expressions was done in a previous project called AXE[]. The AXE extension is used in this current project for the necessary evaluation of XPath expressions. 1.3 XQuery XQuery Version 1.0 [8] is an extension to the XPath language. Therefore they share the same data model. Any expression that is syntactically valid and executes successfully in both languages will the same result. XQuery is designed to be a language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents. The basic building block of XQuery is the expression. There are several kinds of expressions provided by XQuery. This work focused on the translation of the so 4
5 called FLWOR-exrpressions as presented in the queries of the Waterloo benchmark. These expressions support iteration and binding of variables to intermediate results. This kind of expression is often useful for computing joins between two or more documents and for restructuring. The name FLWOR (pronounced flower ) consists of the first letter of the keywords for, let, where, order by and. The purpose of the for and let clauses is to generate a sequence of bound variables, the tuple stream. The simplest example of a for clause contains one variable and an associated expression. It evaluates the expression and iterates over the items in the resulting sequence, binding the variable to each item in turn. A for clause may also contain multiple variables, each with an associated expression. In this case, the for clause iterates each variable over the items that result from evaluating its expression. The resulting tuple stream contains one tuple for each combination of values in the Cartesian product of the sequences resulting from evaluating the given expressions. A let clause may also contain one or more variables, each with an associated expression. The difference compared to a for clause is that a let clause binds each variable to the result of its associated expression, without iteration. The variable bindings generated by let clauses are added to the binding tuples generated by the for clauses. If there are no for clauses, the let clauses generate one tuple containing all the variable bindings. A simple example will illustrate that: - Query 1: for $s in (<x/>, <y/>, <z/>) let $i := (<a/>, <b/>) <out>{$s}{$i}</out> Output: <out> <x/> <a/> <b/> </out> <out> <y/> <a/> <b/> </out> <out> <z/> <a/> <b/> </out> - Query 2: for $i in (<a/>, <b/>) for $s in (<x/>, <y/>, <z/>) <out>{$s}{$i}</out> Output: <out> <x/> <a/> </out> <out> <y/> <a/> </out> <out> <z/> <a/> </out> <out> <x/> <b/> </out> <out> <y/> <b/> </out> <out> <z/> <b/> </out> - Query 3: 5
6 let $i := (<a/>, <b/>) let $s := (<x/>, <y/>, <z/>) <out>{$i}{$s}</out> Output: <out> <x/> <y/> <z/> <a/> <b/> </out> The optional where clause serves as a filter for the tuples generated by the for and let clauses. It is evaluated once for each tuple. If the effective boolean value of the where-expression is true, the tuple is used in the execution of the clause. If it is false, the tuple is discarded. The clause is evaluated once for each tuple in the stream and the results of these evaluations are concatenated to form the result of the FLWOR expression. The order by clause contains one or more ordering specifications. For each tuple in the stream, these specifications are evaluated. In the current version of this project, only ordering specifications that are results of the FLWOR expression can be used as ordering key. 6
7 2 First Approach The first approach to the translation of XQuery to AmosQL was to use a system developed at UBDL that uses XML Schema to generate a database schema for AMOS II [9]. The basic idea behind this project was to parse a given XML Schema definition and translate it into the corresponding schema definitions in AMOS II. In other words, to dynamically generate a file containing the type and function definitions corresponding to the imported XML Schema. After the XML Schema specification is imported to AMOS II it is possible to populate the AMOS II mainmemory database with any XML document that is described by this schema. After the schema is imported to AMOS II and the database is populated it is possible to state arbitrary queries over the imported data using AmosQL. Here is an example of a XML Schema file describing a note: <schema> <element name= note > <complextype> <sequence> <element name= to type= string /> <element name= from type= string /> <element name= heading type= string /> <complextype> <element name= body type= string /> <attribute name= lang type= string /> </complextype> </sequence> </complextype> </element> </schema> The corresponding AMOS II schema would look like this: create Type XS_note; create Type XS_to; create Type XS_from; create Type XS_heading; create Type XS_body; create function XS_to(XS_note)->XS_to as stored; create function XS_from(XS_note)->XS_from as stored; create function XS_heading(XS_note)->XS_heading as stored; create function XS_body(XS_note)->XS_body as stored; create function XS_to(XS_to)->charstring as stored; create function XS_from(XS_from)->charstring as stored; create function XS_heading(XS_heading)->charstring as stored; create function XS_body(XS_body)->charstring as stored; create function XS_lang(XS_body)->charstring as stored; As you can see, all <element> tags are translated in AMOS II types. For every element there is a function inserted which s the element of a given the parent node. For all <element> and <attribute> tags of type string in XML Schema a function ing the corresponding string of the element is ed. 7
8 A simple example of a possible translation of an XQuery query to an AmosQL Query using this database schema is TC/MD_Q1: Return the title of the article that has matching id attribute value (1). XQuery: for $art in input()/article[@id= 1 ] $art/prolog/title select XS_title(XS_title(XS_prolog(a))) from XS_article a where XS_id(a)=1; The translation is possible in simple cases like the example above. An example using some more advanced features of the XQuery language is the following: TC/MD_Q9: Return all author names (several consecutive elements unknown) of the article with matching id attribute value (2). for $art in input()/article[@id="2"] $art//author[position()=last()]/name The [position()=last()] predicate was added to make the problems of the translation more obvious. First of all, it is very hard to translate queries that use the descendant-orself::node() axis. In abbreviated syntax this expression would be // like in the example above. The meaning of this construct is: select all nodes named author that are descendants of the document root node article[@id="2"]. The predicate is evaluated after the selection (see later). Since there is only the parent or the child of a node known, constructs like this need a descendant(object) function. Such a function could be constructed by a generic function which looks up all corresponding functions to this object in the schema and creates some kind of descendant function. Another problem of this approach can not be solved. It concerns the nondocument-preserving storage of the XML document in the database. There is no way to translate queries that refer to the structure of the document. Predicates like [postion()=last()] of the example can not be translated. Once stored in the database, the order of the elements is lost. The main interest of the translation of XML Schema into the corresponding AMOS II schema was data-centric which means, only the content of an XML document is important. This type of documents are mostly intended to be processed by software. One example for this kind of documents would be air travel information. Document-centric documents (or text-centric as in the waterloo benchmark) in contrast are mostly for human consumption. The structure 8
9 of the document and particularly the order of the elements are of great interest. Documents-centric documents are for example web pages or user manuals. Data centric queries are relatively easy to express over the translated XMLSchema using AmosQL. However, XQuery is mainly document centric and the queries are specified as paths through the document. Because of this the method of querying the translated data centric XMLSchema document is not so straight forward with XQuery. Therefore we decided to base the query translator on the document centric AXE system representation that represent the document structure explicitly and uses XPath to navigate the document structure. 9
10 3 Translation Since the first approach turned out to be not capable to satisfy the requirements of document-centric query expressions, a new approach was taken. This new approach uses the AXE system developed at the UBDL. As mentioned before, the AXE system is a general wrapper for schema-free XML documents. It also includes an implementation of XPath expressions. To represent the XML structure in AMOS II, the system used a DOM-like database schema. DOM stands for Document Object Model. It is a specification to describe the structure of documents and the way to access and manipulate documents. DOM uses a tree like object graph to model the documents that it represents. The mapping to the AMOS II database schema is done in a rather straightforward way. With the help of the document preserving storage in the database and the implementation of the XPath expressions this system is suitable to serve as a platform for translating XQuery expressions to AmosQL. The current version of the translator only works with AXE Version AmosQL and XQuery The most flexible way to specify queries in AmosQL is to use the select statement. The simplified syntax of a select statement that is used for this translation is select-stmt ::= "select" ["distinct"] expr-commalist [from-clause] [where-clause] from-clause ::= "from" variable-declaration-commalist variable-declaration ::= type-name variable-name "bag of" type-name variable-name where-clause ::= "where" predicate-expression The [into-clause] of the original syntax is skipped because for the translation there is no use to have temporary variables in the AMOS II system. Even more simplified, the select statement has the following format: select result from type extents where condition The general semantic of an AmosQL query is - form the Cartesian product of the type extents - restrict it by the condition - for each possible variable binding to tuple elements in the restricted cartesian product, evaluate the result expressions - result containing NIL are not included in the result set 10
11 Since it would be very inefficient to execute the query like this, the select statement passes a query optimization before being executed. The partial grammar of the XQuery FLWOR expressions that can be translated looks like this: FLOWRExpr ::= (ForClause LetClause)+ WhereClause? OrderByClause? ExprReturn Translating the whole scope of possible constructs of FLWOR expressions is far beyond the scope of this work, therefore some restrictions were made. ForClause ::= "for" "$" VarName "in" PathExpr ("," "$" VarName "in" PathExpr)* LetClause ::= "let" "$" VarName ":=" PathExpr ("," "$" VarName ":=" PathExpr)* WhereClause ::= "where" Expr OrderByClause ::= "order" "by" OrderSpec PathExpr ::= ("/" RelativePathExpr?) ("//" RelativePathExpr) RelativePathExpr ExprReturn ::= ((count(varname)) (VarName PathExpr)) ("{" (count(varname)) (VarName PathExpr) "}")* OrderSpec ::= VarName PathExpr The OrderSpec part has to be one of the elements in ExprReturn since the sorting abilities of AMOS II are very restricted. 3.2 The Parts of FLWOR expressions To illustrate the way the XQuery queries are translated into AmosQL some simple examples from the Waterloo benchmark are used. In this benchmark the expression text-centric is used for the expression document-centric documents that was used so far. The queries are split into four groups: - Text-Centric/Single-Document (TC/SD), - Text-Centric/Multiple-Document (TC/MD) - Data-Centric/Single-Document (DC/SD) - Data-Centric/Multiple-Document (DC/MD) As an orientation for the reader, the group and the number of the queries are given for every example. The translated queries were generated by a software tool which was implemented during this project. Some of the queries may have been written in an easier way by the user. Since translation rules are used for generating the AmosQL queries they may look a little complicated. The translation rules will be explained in chapter 4. The path expressions used in the waterloo benchmark usually start with a function called input(). This is the unnamed input set of the database and is skipped in the translation of the expressions Simple Query using for TC/MD_Q1: 11
12 Return the title of the article that has matching id attribute value (1). XQuery: for $art in 1 ] $art/prolog/title where f0=evaluate(xf0, "/article[@id='1']") and r0=evaluate(f0, "/prolog/title"); As you can see, for every for statement a new type extent is inserted in the from-clause. The same is done for every statement. Furthermore, the name of the variable from the statement is inserted into the result clause. To get the requested nodes, the evaluate(node subtree, charstring expression) -> bag of node b function of the AXE system is used. It takes a node (the root node for this evaluation) and an expression as arguments, evaluates the expression and s a bag of nodes matching this expression. As you can see, in the statement for the result variable, f0 (each node ed by the evaluate function of the for clause) is used as root node. Therefore, only the relative path from this node to the result node is used, just as it is expressed in the XQuery statement. The query above can also be written in a much more simple: select r from node x, node r where r=evaluate(x, "/article[@id='1']/prolog/title"); As mentioned before, the reason for the more complicated queries are the translation rules used to generate the queries Where Clause An example using a simple where clause is TC/MD_Q2: Find the title of the article(s) authored by (Ben Yang). XQuery: for $prolog in input()/article/prolog where $prolog/authors/author/name="ben Yang" $prolog/title from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/article/prolog") and tmp_f0=evaluate(f0, "/.[authors/author/name='ben Yang']") and r0=evaluate(tmp_f0, "/title"); 12
13 This simple where clause is directly translated into a call to the evaluate function using f0, the variable it refers to, as root node. It uses the abbreviated syntax of XPath "/.[predicate]". Because of the slash, the evaluation starts with the root node (here all prologs). The "." takes the current node and the condition in brackets is evaluated to restrict the current node. Since f0 contains all nodes named "prolog" which have the parent node "article", this statement selects only the "prolog" nodes, which have an author named "Ben Yang" following the given path. After that, the "title" of the "prolog"(respectively the "article") is extracted. There are some more complicated ways of expressing a where clause. empty(pathexpr) A query that illustrates the empty(pathexpr) function is TC/MD_Q14: List article title that does not have genre element. XQuery: for $a in input()/article/prolog where empty ($a/genre) <NoGenre> {$a/title} </NoGenre> where f0=evaluate(xf0, "/article/prolog") and notany(evaluate(f0, "/genre")) and r0=evaluate(f0, "/title"); The empty expression of XQuery is directly translated into a call to the evaluate function, using the variable of the for statement as root node and passing the ed nodes to the AmosQL function notany. This function s true if the bag passed as argument is empty. contains(pathexpr, String) The contains function is illustrated using the following example. TC/MD_Q17: Return the titles of articles which contain a certain word ("hockey"). XQuery: for $a in input()/article where contains ($a//p, "hockey") $a/prolog/title where f0=evaluate(xf0, "/article") 13
14 and some(select x from node x, text t where x=evaluate(f0, "//p") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and r0=evaluate(f0, "/prolog/title"); The contains function of XQuery is translated into a rather complicated expression in AmosQL. The inner query within the some function uses the evaluate function to get all the nodes named p which are descendants of the root node f0. Then, an element of type text is used to compare the string, given by the contains function to the value of this element. The function like(charstring string, Charstring pattern) -> Boolean b of AmosQL is used for the comparison. The text element is a child node of a p node. If at least one of these elements exist, the some(bag of node) -> boolean function of AmosQL will true and the tuple will be used to evaluate the result expression. Two more complicated expressions of XQuery are some and every. These expressions require special treatment. some One example showing the use of the some expression is TC/MD_Q6: Find titles of articles where both keywords ("the" and "hockey") are mentioned in the same paragraph of abstracts. XQuery: for $a in input()/article where some $b in $a/body/abstract/p satisfies (contains($b, "the") and contains($b, "hockey")) $a/prolog/title from node f0, node xf0, node w, node r0 where f0=evaluate(xf0, "/article") and some(select x from node x, text t where x=evaluate(w, "/.") and like(nodevalue(t), "*the*") and parentnode(t)=x) and some(select x from node x, text t where x=evaluate(w, "/.") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and w=evaluate(f0, "/body/abstract/p") and r0=evaluate(f0, "/prolog/title"); Inner queries are used to evaluate the XQuery some expression. One for each statement of the some expression. The first call to the evaluate function in each inner query selects all elements, that are addressed by the path of the new variable $b plus additional path given for each statement (in this case, no additional path is given, so "." is inserted). The contains function is translated with the help of a text element which is child of a node x and the like function of AmosQL. every An example will illustrate the every expression. 14
15 TC/MD_Q7: Find titles of articles where a keyword ("hockey") is mentioned in every paragraph of abstract. XQuery: for $a in input()/article where every $b in $a/body/abstract/p satisfies contains($b, "hockey") $a/prolog/title where f0=evaluate(xf0, "/article") and count(select x from node x, text t where x=evaluate(f0, "/body/abstract/p")and like(nodevalue(t), "*hockey*") and parentnode(t)=x) =count(select x from node x where x=evaluate(f0, "/body/abstract/p")) and r0=evaluate(f0, "/prolog/title"); The every function of XQuery is translated to AmosQL using two calls of the count function. The idea behind this is to have one inner query which has no restrictions and selects all the nodes of a path (in this case, the second call to count) without restriction. A second inner query(in this case the first one) selects all nodes of a path with the restriction given by the every statement. If the count of both queries is the same, every node of the given path satisfies the condition Order By Clause The following example demonstrates the translation of the order by clause. TC/MD_Q11: List the titles of articles that have a matching country element type (Canada), sorted by date. XQuery: for $a in input()/article/prolog where $a/dateline/country="canada" order by $a/dateline/country {$a/title} {$a/dateline/country} sort((, r1 from node f0, node xf0, node tmp_f0, node r0, node r1 where f0=evaluate(xf0, "/article/prolog") and tmp_f0=evaluate(f0, "/.[dateline/country='canada']") and r0=evaluate(tmp_f0, "/title") and r1=evaluate(tmp_f0, "/dateline/date") 15
16 ), 2, 'inc'); In this example there are two results, therefore two type extents are inserted. The order by clause is not directly translated. As mentioned before, only elements that appear as results of the query can be used as sorting key. The reason for this is the sorting function of AMOS II: sort(bag b, integer pos, charstring order) -> vector v This function takes a bag, containing the tuples to sort, an integer, giving the position in the tuple by which to sort (starting with 1), and a string, either 'dec' or 'inc' giving the order of the sorting special clause The normal way of the translation of the clause is obvious looking at the examples in this chapter. The clause is translated into a call of evaluate. If the only result value is the count of a bag of nodes, the translation is different. Looking at a simple example will demonstrate it. The query s the number of articles in the database. XQuery: let $a := input()/article count($a) count( select in(r0) from bag of node l0, bag of node r0 where l0=(select evaluate(x, "/article ") from node x) and r0=l0 ); The statement for f0 selects all articles. The statement for r0 also select all articles since the root node of the evaluate call is f0 and the path is ".". After building the query, the count function is applied on the results of the query (all articles) Multiple for clauses Using multiple for clauses are mostly used to have some kind of relationship between the different clauses. One possibility to express this relationship is directly in the for clause. The following example illustrates this. TC/MD_Q4: Find the heading of the section following the section entitled "Introduction" in a certain article with id attribute value (4). XQuery: for $a in input()/article[@id="4"]/body/section[@heading="introduction"], $p in input()/article[@id="4"]/body/section[. >> $a][1] 16
17 <HeadingOfSection> </HeadingOfSection> from node f0, node xf0, node f1, node xf1, attr r0 where f0=evaluate(xf0, and f1=evaluate(f0, "following-sibling::node()[1] ") and f1=evaluate(xf1, and nodename(r0)="heading" and ownerelement(r0)=f1; In this example the first section following a certain section with an attribute heading and value "introduction" should be ed. First, the section with the attribute heading and the value "introduction" is bound to f0. Then two expressions are used to get the relationship. The first expression of f1 s all sections in the article (with id value 4). The second one s the first ([1]) section following (following-sibling::node()) the section ed by f0. The root node of this expression is f0. Looking at the statement, the special treatment of an attribute as value is shown. Since attributes are not stored as nodes, attr r0 has to be inserted into the type extents instead of node r0. An attribute always belongs to an element. Therefore the function ownerelement(attr a) -> node n is called. Since several attributes can belong to one element, the name of the attribute has to be checked using the nodename(attr a) -> charstring function. Another possibility of getting some kind of relationship between several for clauses is to use the where clause. TC/MD_Q19: Retrieve the headwords of entries cited, in etymology part, by certain entry with id attribute value (E1). XQuery: for $a in input()/article[@id="7"]/epilog/references/a_id, $b in input()/article where $a = $b/@id {$b/prolog/title} from node f0, node xf0, node f1, node xf1, text tf0, node zf0, attr zf1, node r0 where f0=evaluate(xf0, "/article[@id='7']/epilog/references/a_id") and f1=evaluate(xf1, "/article") and zf0=evaluate(f0, "/.") and parentnode(tf0)=zf0 17
18 and nodename(zf1)="id" and ownerelement(zf1)=f1 and nodevalue(tf0)=nodevalue(zf1) and r0=evaluate(f1, "/prolog/title"); In this case temporary variables named zf0 and zf1 are introduced to express the condition of the where clause. Furthermore, a type extent text is introduced to match the string representation of the node value of node. The function nodevalue applied on a node does not the string representation of this node. Therefore the text type extent has to be introduced. Applying the nodevalue function on an attribute s the string representation and no text type extent is needed. zf0 contains the nodes named a_id, which were already selected by f0. This expression is introduced because in case that the path does consist of several steps, they are performed here. zf0 contains the parent nodes of the text type extent. zf1 is of type attr and contains the id attribute of the nodes selected by f Combination of for and let clause This section will deal with the combination of a for and a let clause which is used by some queries in the waterloo benchmark. There are basically two different kinds of combination of these two clauses. The first one introduced here is to get the count of a bag of objects. TC/MD_Q3: Group articles by date and calculate the total number of articles in each group. XQuery: for $a in distinct-values (input()/article/prolog/dateline/date) let $b := input()/article/prolog/dateline[date=$a] <Date>{$a/text()}</Date> <NumberOfArticles>{count($b)}</NumberOfArticles> select distinct r0, count(r1) from node f0, node xf0, bag of node l0, text f0t, node r0, bag of node r1 where f0=evaluate(xf0, "/article/prolog/dateline/date") and l0=(select x from node x, text t, node x1 where x=evaluate(x1, "/article/prolog/dateline/date") and nodevalue(t)=nodevalue(f0t) and parentnode(t)=x) and parentnode(f0t)=f0 and r0=evaluate(f0, "/text()") and r1=l0; The for clause selects all disctinct values of the nodes called "date". The let clause selects in each iteration of the for clause all the datelines who have the same date as the date selected by the for clause. To express the different kind of variable bindings of the let clause compared to the for clause, an inner query is used. This inner query also uses a text type 18
19 extent to get the string representation of the date. The type extent of the corresponding expression of the let clause is bag of node and not simply node as in case of the for clause. Therefore the function count(bag of object o) -> integer can be applied and will the number of objects in this bag. The other possibility is very similar to the combination of two for clauses. The example given here will show it. DC/MD_Q4 List the item id of the previous item of a matching item with id attribute value (8). XQuery: let $item := input()/items/item[@id="8"] for $previtem in input()/items/item[. << $item] [position() = last()] <CurrentItem>{$item/@id}</CurrentItem> <PreviousItem>{$prevItem/@id}</PreviousItem>, r1 from node f0, node xf0, bag of node l0, attr r0, attr r1 where f0=evaluate(in(l0), "precedingsibling::node()[position()=last()]") and f0=evaluate(xf0, "/items/item") and l0=(select evaluate(x, "/items/item[@id='8'] ") from node x) and nodename(r0)="id" and ownerelement(r0)=in(l0) and nodename(r1)="id" and ownerelement(r1)=f0; As mentioned before there is no big difference between the combination of two for clauses and the combination of a for clause and a let clause as shown here. The type extent of the let clause is still a bag of node (l0). Later, the function in(bag of object o) -> bag of object x is applied on l0. Therefore, at this point, there is no difference in treatment between for and let clauses. The only difference is that in case on of the element extracted by the let clause should be used as a root node of an evaluation, the function in(bag of object o) -> bag of object x has to be applied (first condition in the AmosQL where clause). If the in( ) function is not applied during evaluation like here through the ownerelement(r0)=in(l0) and the results of the let clause are directly results of the query, the in( ) function is applied "at the end" in the select statement (e.g. select in(r0)). Since the direct translation of the variable binding done by the let clause is not possible, it is mostly treated the same as a for clause. The difference is only obvious when using the count( ) function. 19
20 3.3 Restrictions As mentioned before, not all expressions of the XQuery language can be translated into AmosQL statements. Nevertheless almost all queries of the Waterloo benchmark can be translated. One that could not be translated was for example DC/SD_Q10. for $a in input()/catalog/item where $a/date_of_release gt " " and $a/date_of_release lt " " order by $a/publisher/name {$a/title} {$a/publisher} The results of this query should be ordered by the name of the publisher. In the current version of this project, only order by clauses which are also mentioned in the clause can be evaluated. In the clause, only publisher is ed, not his name. Therefore the translation is not possible. Another one that could not be translated is DC/SD_Q20. for $size in input()/catalog/item/attributes/size_of_book where $size/length*$size/width*$size/height > {$size/../../title} In this query a calculation of an integer is done in the where clause ($size/length*$size/width*$size/height > ). This is not supported in the current version. Another problem is the evaluation of XPath expressions with the AXE system. This system does not support character escapes (like 'eq' instead of '=') according to the XML rules. Therefore characters like comparison characters have to be written in their normal representation and not using their escape characters. 20
21 4. Translation Rules A brief summary of the necessary steps to perform the translation is given here. For the translation of multiple for/let clauses with references to each other please see For every for statement do the following: insert a new node type extent into the from clause insert a node x type extent into the from clause for temporary results insert a new statement looking like varname=evaluate(x, path ) into the where clause For every let statement do the following: insert a new bag of node type extent into the from clause insert a node x type extent into the from clause for temporary results insert a new statement looking like varname=evaluate(x, path ) into the where clause For every where statement do the following: simple case (where = " ") insert into the where clause: refvarname=evaluate(refvarname, "/.[predicate]") empty( ) insert into the where clause: notany(evaluate(refvar, "path")) contains( ) insert inner query into the where clause (see 3.2.2) some insert some(inner query) into the where clause (see 3.2.2) every insert count(innerquery)=count(innerquerynoconditions) into the where clause (see 3.2.2) For the order by statement do the following: write query as sort(query, pos, 'inc' 'dec') where query is the original query, pos is the position of the order key and 'inc' 'dec' is the sort order For every statement do the following: insert a new node/bag of node/attr type extent into the from clause, depending on the referenced variable insert varname/in(varname)/count(varname) into the select clause if a path expression is to be ed: insert a new statement looking like varname=evaluate(refvarname, path ) 21
22 into the where clause, the refnamevar was declared in some for or let statement if an attribute is to be ed: insert ownerelement(varname) = refvarname and nodename(varname) = "nodevalue" into the where clause if count(bag of object) is to be ed: insert varname = evaluate(refvarname, "path") into the where clause and if count is the only value: rewrite query as count(query) if count is not the only value: add count(varname) to the select clause 22
23 5. Implementation of the query translator The query translater implemented in this project consist of two Java classes. The Translator class does the actual translation of the XQuery expression to the AmosQL expression. The XQueryTrans class is the driver class for the communication with the AMOS II database. AMOS II Database system evalxquery AmosQL Query XQuery Extenstion XQueryTrans Class XQuery query String AmosQL query String Translator Class The function evalxquery(string strxqueryexpression) of the XQueryTrans class takes the XQuery expression as argument and calls the Translator class. The Translator class does the actual translation to an AmosQL expression. This expression is ed to the XQueryTrans class which calls AMOS II and executes the AmosQL query. To use this extension in AMOS II the following statement has to be executed: create function evalxquery(charstring expression) -> bag of Node as foreign "JAVA:XQueryTrans/evalXQuery"; With this statement a function called evalxquery is added to the AMOS II database system. It takes the XQuery expression as an argument. The value is bag of node. The definition as foreign function is done by the statement as foreign "JAVA:XQueryTrans/evalXQuery. The keyword JAVA means that the extension is written in java. The name of the java class is XQueryTrans and the name of the function to call is evalxquery. After executing this statement the XQuery extension can be used as a regular function of AMOS II. 23
24 6. Conclusion and future work XML is nowadays very common in use and it is therefore necessary to find a way to query the data of this format. Intelligent queries are necessary to deal with the large amount of data that needs to be processed. In this paper, ways to translate XQuery expressions into queries of the Functional Database system AMOS II with its query language AmosQL were discussed. The first approach using an XMLSchema translator developed at UDBL was not able to deal with the whole scope of the XQuery queries. That system translated XML Schema definitions into corresponding AMOS II database schema definitions. Because of the data centric approach assumed by the XMLSchema translator, i.e. focussing on the storage of the data rather than on the structure of the document, constructs of XQuery dealing with the document centric elements could not be translated. In general XQuery expressions are over the document structure, not the XMLSchema data schema and therefore the translated XMLSchema is of no use for XQuery translation. The other approach using the AXE system developed at the UDBL was capable to deal with both data-centric and document-centric XML documents and the corresponding queries. Due to the large complexity of XQuery, restrictions have to be made to build a translator. These restrictions where made by focusing only on the queries given by the Waterloo benchmark. Some more restrictions have to be made since they are not supported in AMOS II, for example the restrictions concerning the order by clause. A complete query translator was implemented. This system is able to translate queries expressed in XQuery into corresponding queries in AmosQL using the database schema and functions given by the AXE system. This implementation leaves some room for optimization and simplification of the generated expressions. This can be done as a future work. Some restrictions had to be made to implement the translation. This can be a starting point for future work. The restrictions that had to be made when translating the queries of the Waterloo benchmark concern the order by clause of a FLWOR expression and the calculation of values within a where clause. The biggest restriction that had to be made concerns disallowing nested FLWOR expressions such as a FLWOR expression in a clause. Such expressions are not present in the Waterloo benchmark. Utilization of indexes is always wanted in a database environment. Since functions are needed to have indexes, it is not possible to use this mechanism in the current version. Future work can therefore use the first approach given in chapter 2 and combine it with the current version of this project to a hybrid system. The first approach was not implemented yet and can be done in the future. Currently no syntax check is made for the queries expressed in XQuery. This could be added in future versions. 24
25 7. References [1] Risch, T.; Josifovski, V.; Katchaounov, T. (2000): AMOS II Concepts [2] Risch, T.; Josifovski, V.; Katchaounov, T. (2000): Functional Data Integration in a Distributed Mediator System [3] Elin, D.; Risch, T.(2000): Amos II Java Interfaces [4] Bin Yao, B.; Ozsu, M.T.; Keenleyside, J. (2003) XBench - A Family of Benchmarks for XML DBMSs [5] Bray, T.; Paoli, J.; Sperberg-McQueen, C.M.; Maler, E. (2000): Extensible Markup Language (XML) 1.0 (Second Edition) W3C Recommendation [6] Thompson, H.S.; Beech, D.; Maloney, M.; Mendelsohn, N. (2001): XML Schema Part 1: Structures W3C Recommendation [7] Clark, J.; DeRose, S. (1999) XML Path Language (XPath) Version 1.0 W3C Recommendation [8] Boag, S.; Chamberlin, D.; Fernández, M.F.; Florescu, D.; Robie, J.; Siméon, J. (2003): XQuery 1.0: An XML Query Language W3C Working Draft [9] Johannson, T.; Heggbrenna, R. (2003): Importing XML Schema into an Object-Oriented Database Mediator System Uppsala Master's Theses in Computer Science no
26 Appendix A: Translated Queries A.1 TC/MD TC/MD_Q1: Return the title of the article that has matching id attribute value (1) for $art in $art/prolog/title where f0=evaluate(xf0, and r0=evaluate(f0, "/prolog/title"); TC/MD_Q2: Find the title of the article authored by Ben Yang for $prolog in input()/article/prolog where $prolog/authors/author/name="ben Yang" $prolog/title from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/article/prolog") and tmp_f0=evaluate(f0, "/.[authors/author/name='ben Yang']") and r0=evaluate(tmp_f0, "/title"); TC/MD_Q3: Group articles by date and calculate the total number of articles in each group for $a in distinct-values (input()/article/prolog/dateline/date) let $b := input()/article/prolog/dateline[date=$a] <Date>{$a/text()}</Date> <NumberOfArticles>{count($b)}</NumberOfArticles> select distinct r0, count(r1) from node f0, node xf0, bag of node l0, text f0t, node r0, bag of node r1 where f0=evaluate(xf0, "/article/prolog/dateline/date") and l0=(select x from node x, text t, node x1 where x=evaluate(x1, "/article/prolog/dateline/date") and nodevalue(t)=nodevalue(f0t) and parentnode(t)=x) and parentnode(f0t)=f0 and r0=evaluate(f0, "/text()") and r1=l0; TC/MD_Q4: Find the heading of the section following the section entitled Introduction" in a certain article with id attribute value (4). 26
27 for $a in $p in >> $a][1] <HeadingOfSection> </HeadingOfSection> from node f0, node xf0, node f1, node xf1, attr r0 where f0=evaluate(xf0, and f1=evaluate(f0, "following-sibling::node()[1] ") and f1=evaluate(xf1, and nodename(r0)="heading" and ownerelement(r0)=f1; TC/MD_Q5: Return the headings of the first section of a certain article with id attribute value (2). for $a in input()/article[@id="2"] <HeadingOfSection> {$a/body/section[1]/@heading} </HeadingOfSection> where f0=evaluate(xf0, "/article[@id='2']") and r0=evaluate(f0, "/body/section[1]/@heading"); TC/MD_Q6: Find titles of articles where both keywords ( the" and hockey") are mentioned in the same paragraph of abstracts. for $a in input()/article where some $b in $a/body/abstract/p satisfies (contains($b, "the") and contains($b, "hockey")) $a/prolog/title from node f0, node xf0, node w, node r0 where f0=evaluate(xf0, "/article") and some(select x from node x, text t where x=evaluate(w, "/.") and like(nodevalue(t), "*the*") and parentnode(t)=x) and some(select x from node x, text t where x=evaluate(w, "/.") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and w=evaluate(f0, "/body/abstract/p") and r0=evaluate(f0, "/prolog/title"); TC/MD_Q7: Find titles of articles where a keyword ( hockey") is mentioned in every paragraph of abstract. for $a in input()/article where every $b in $a/body/abstract/p satisfies contains($b, "hockey") 27
28 $a/prolog/title where f0=evaluate(xf0, "/article") and count(select x from node x, text t where x=evaluate(f0, "/body/abstract/p")and like(nodevalue(t), "*hockey*") and parentnode(t)=x) =count(select x from node x where x=evaluate(f0, "/body/abstract/p")) and r0=evaluate(f0, "/prolog/title"); TC/MD_Q8: Return the names of all authors (one element name unknown) of the article with matching id attribute value (2). for $art in input()/article[@id="2"] $art/prolog/*/author/name where f0=evaluate(xf0, "/article[@id='2']") and r0=evaluate(f0, "/prolog/*/author/name"); TC/MD_Q9: Return all author names (several consecutive element unknown) of the article with matching id attribute value (2). for $art in input()/article[@id="2"] $art//author/name where f0=evaluate(xf0, "/article[@id='2']") and r0=evaluate(f0, "//author/name"); TC/MD_Q10: List the titles of articles sorted by country. for $a in input()/article/prolog order by $a/dateline/country {$a/title} {$a/dateline/country} sort((, r1, node r1 where f0=evaluate(xf0, "/article/prolog") and r0=evaluate(f0, "/title") and r1=evaluate(f0, "/dateline/country") ), 2, 'inc'); 28
29 TC/MD_Q11: List the titles of articles that have a matching country element type (Canada), sorted by date. for $a in input()/article/prolog where $a/dateline/country="canada" order by $a/dateline/date {$a/title} {$a/dateline/date} sort((, r1 from node f0, node xf0, node tmp_f0, node r0, node r1 where f0=evaluate(xf0, "/article/prolog") and tmp_f0=evaluate(f0, "/.[dateline/country='canada']") and r0=evaluate(tmp_f0, "/title") and r1=evaluate(tmp_f0, "/dateline/date") ), 2, 'inc'); TC/MD_Q12: Retrieve the body of the article that has a matching id attribute value (4). for $a in input()/article[@id="4"] <Article> {$a/body} </Article> where f0=evaluate(xf0, "/article[@id='4']") and r0=evaluate(f0, "/body"); TC/MD_Q13: Construct a brief information on the article that has a matching id attribute value (5), including title, the name of _rst author, date and abstract. for $a in input()/article[@id="5"] {$a/prolog/title} {$a/prolog/authors/author[1]/name} {$a/prolog/dateline/date} {$a/body/abstract}, r1, r2, r3, node r1, node r2, node r3 where f0=evaluate(xf0, "/article[@id='5']") and r0=evaluate(f0, "/prolog/title") and r1=evaluate(f0, "/prolog/authors/author[1]/name") and r2=evaluate(f0, "/prolog/dateline/date") and r3=evaluate(f0, "/body/abstract"); TC/MD_Q14: List article title that doesn't have genre element. for $a in input()/article/prolog 29
30 where empty ($a/genre) <NoGenre> {$a/title} </NoGenre> where f0=evaluate(xf0, "/article/prolog") and notany(evaluate(f0, "/genre")) and r0=evaluate(f0, "/title"); TC/MD_Q15: List author names whose contact elements are empty in articles. for $a in input()/article/prolog/authors/author where empty($a/contact/text()) <NoContact> {$a/name} </NoContact> where f0=evaluate(xf0, "/article/prolog/authors/author") and notany(evaluate(f0, "/contact/text()")) and r0=evaluate(f0, "/name"); TC/MD_Q16: Get the article by its id attribute value (4). for $a in $a where f0=evaluate(xf0, and r0=evaluate(f0, "/."); TC/MD_Q17: Return the titles of articles which contain a certain word ( hockey"). for $a in input()/article where contains ($a//p, "hockey") $a/prolog/title where f0=evaluate(xf0, "/article") and some(select x from node x, text t where x=evaluate(f0, "//p") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and r0=evaluate(f0, "/prolog/title"); TC/MD_Q18: List the titles and abstracts of articles which contain a given phrase ( the hockey"). for $a in input()/article 30
31 where contains ($a//p, "the hockey") {$a/prolog/title} {$a/body/abstract}, r1, node r1 where f0=evaluate(xf0, "/article") and some(select x from node x, text t where x=evaluate(f0, "//p") and like(nodevalue(t), "*the hockey*") and parentnode(t)=x) and r0=evaluate(f0, "/prolog/title") and r1=evaluate(f0, "/body/abstract"); TC/MD_Q19: List the names of articles cited by an article with a certain id attribute value (2). for $a in input()/article[@id='2']/epilog/references/a_id, $b in input()/article where $a = $b/@id {$b/prolog/title} from node f0, node xf0, node f1, node xf1, text tf0, node zf0, attr zf1, node r0 where f0=evaluate(xf0, "/article[@id='2']/epilog/references/a_id") and f1=evaluate(xf1, "/article") and zf0=evaluate(f0, "/.") and parentnode(tf0)=zf0 and nodename(zf1)="id" and ownerelement(zf1)=f1 and nodevalue(tf0)=nodevalue(zf1) and r0=evaluate(f1, "/prolog/title"); A.2 TC/SD TC/SD_Q1: Return the entry that has matching headword ("the"). for $ent in input()/dictionary/e where $ent/hwg/hw="the" $ent from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[hwg/hw='the']") and r0=evaluate(tmp_f0, "/."); TC/SD_Q2: Find the headword of the entry which has matching quotation year (1900). for $ent in input()/dictionary/e 31
32 where $ent/ss/s/qp/q/qd="1900" $ent/hwg/hw from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[ss/s/qp/q/qd='1900']") and r0=evaluate(tmp_f0, "/hwg/hw"); TC/SD_Q3: Group entries by quotation location in a certain quotation year (1900) and calculate the total number entries in each group. for $a in distinct-values (input()/dictionary/e/ss/s/qp/q[qd="1900"]/loc) let $b := input()/dictionary/e/ss/s/qp/q[loc=$a] <Location>{$a/text()}</Location> <NumberOfEntries>{count($b)}</NumberOfEntries> select distinct r0, count(r1) from node f0, node xf0, bag of node l0, text f0t, node r0, bag of node r1 where f0=evaluate(xf0, "/dictionary/e/ss/s/qp/q[qd='1900']/loc") and l0=(select x from node x, text t, node x1 where x=evaluate(x1, "/dictionary/e/ss/s/qp/q/loc") and nodevalue(t)=nodevalue(f0t) and parentnode(t)=x) and parentnode(f0t)=f0 and r0=evaluate(f0, "/text()") and r1=l0; TC/SD_Q4: List the headword of the previous entry of a matching headword ("you"). let $ent := input()/dictionary/e[hwg/hw="you"] for $prevent in input()/dictionary/e[hwg/hw << $ent] [position() = last()] <CurrentEntry>{$ent/hwg/hw/text()}</CurrentEntry> <PreviousEntry>{$prevEnt/hwg/hw/text()}</PreviousEntry>, r1 from node f0, node xf0, node zf0, bag of node l0, node r0, node r1 where zf0=evaluate(in(l0), "hwg/hw") and zf0=evaluate(in(l0), "hwg/hw/precedingsibling::node()[position() = last()]") and f0=evaluate(xf0, "/dictionary/e") and l0=(select evaluate(x, "/dictionary/e[hwg/hw='you'] ") from node x) and r0=evaluate(in(l0), "/hwg/hw/text()") and r1=evaluate(f0, "/hwg/hw/text()"); TC/SD_Q5: Return the first sense of a matching headword ("that"). 32
33 for $a in input()/dictionary/e where $a/hwg/hw="that" $a/ss/s[1] from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[hwg/hw='that']") and r0=evaluate(tmp_f0, "/ss/s[1]"); TC/SD_Q6: Return the words where some quotations were quoted in a certain year (1900). for $word in input()/dictionary/e where some $item in $word/ss/s/qp/q satisfies $item/qd eq "1900" $word from node f0, node xf0, node w, node xw, node r0 where f0=evaluate(xf0, "/dictionary/e") and xw=evaluate(w, "/.[qd eq '1900']") and w=evaluate(f0, "/ss/s/qp/q") and r0=evaluate(f0, "/."); TC/SD_Q7: Return the words where all quotations were quoted in a certain year (1900). for $word in input()/dictionary/e where every $item in $word/ss/s/qp/q satisfies $item/qd eq "1900" $word where f0=evaluate(xf0, "/dictionary/e") and count(evaluate(f0, "/ss/s/qp/q[qd eq '1900']")) =count(evaluate(f0, "/ss/s/qp/q")) and r0=evaluate(f0, "/."); TC/SD_Q8: Return Quotation Text (one element name unknown) of a word ("and"). for $ent in input()/dictionary/e where $ent/*/hw = "and" $ent/ss/s/qp/*/qt from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[*/hw = 'and']") and r0=evaluate(tmp_f0, "/ss/s/qp/*/qt"); TC/SD_Q9: 33
34 Return Quotation Text (several consecutive element names unknown) of a word ("or"). for $ent in input()/dictionary/e where $ent//hw = "or" $ent//qt from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[//hw = 'or']") and r0=evaluate(tmp_f0, "//qt"); TC/SD_Q10: List the words and their pronunciation, alphabetically, quoted in a certain year (1900). for $a in input()/dictionary/e where $a/ss/s/qp/q/qd = "1900" order by $a/hwg/hw {$a/hwg/hw} {$a/hwg/pr} sort((, r1 from node f0, node xf0, node tmp_f0, node r0, node r1 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[ss/s/qp/q/qd = '1900']") and r0=evaluate(tmp_f0, "/hwg/hw") and r1=evaluate(tmp_f0, "/hwg/pr") ), 1, 'inc'); TC/SD_Q11: List the quotation locations and quotation dates, sorted by date, for a word ("word"). for $a in input()/dictionary/e[hwg/hw="the"]/ss/s/qp/q order by $a/qd {$a/a} {$a/qd} sort((, r1, node r1 where f0=evaluate(xf0, "/dictionary/e[hwg/hw='the']/ss/s/qp/q") and r0=evaluate(f0, "/a") and r1=evaluate(f0, "/qd") ), 2, 'inc'); TC/SD_Q12: Retrieve the senses of a word ("his"). for $a in input()/dictionary/e where $a/hwg/hw="his" 34
35 <Entry> {$a/ss} </Entry> from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[hwg/hw='his']") and r0=evaluate(tmp_f0, "/ss"); TC/SD_Q13: Construct a brief information on a word ("his"), including: headword, pronunciation, part_of_speech, first etymology and first sense definition. for $a in input()/dictionary/e where $a/hwg/hw="his" {$a/hwg/hw} {$a/hwg/pr} {$a/hwg/pos} {$a/etymology/cr[1]} {$a/ss/s[1]/def}, r1, r2, r3, r4 from node f0, node xf0, node tmp_f0, node r0, node r1, node r2, node r3, node r4 where f0=evaluate(xf0, "/dictionary/e") and tmp_f0=evaluate(f0, "/.[hwg/hw='his']") and r0=evaluate(tmp_f0, "/hwg/hw") and r1=evaluate(tmp_f0, "/hwg/pr") and r2=evaluate(tmp_f0, "/hwg/pos") and r3=evaluate(tmp_f0, "/etymology/cr[1]") and r4=evaluate(tmp_f0, "/ss/s[1]/def"); TC/SD_Q14: List the ids of entries that do not have variant form lists and etymologies. for $a in input()/dictionary/e where empty($a/vfl) and empty($a/et) <NoVFLnET> {$a/@id} </NoVFLnET> from node f0, node xf0, attr r0 where f0=evaluate(xf0, "/dictionary/e") and notany(evaluate(f0, "/vfl")) and notany(evaluate(f0, "/et")) and nodename(r0)="id" and ownerelement(r0)=f0; TC/SD_Q17: Return the headwords of the entries which contain a certain word ("hockey"). for $a in input()/dictionary/e where contains ($a, "hockey") $a/hwg/hw 35
36 where f0=evaluate(xf0, "/dictionary/e") and some(select x from node x, text t where x=evaluate(f0, "/.") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and r0=evaluate(f0, "/hwg/hw"); TC/SD_Q18: List the headwords of entries which contain a given phrase ("the hockey"). for $a in input()/dictionary/e where contains($a, "the hockey") $a/hwg/hw where f0=evaluate(xf0, "/dictionary/e") and some(select x from node x, text t where x=evaluate(f0, "/.") and like(nodevalue(t), "*the hockey*") and parentnode(t)=x) and r0=evaluate(f0, "/hwg/hw"); TC/SD_Q19: Retrieve the headwords of entries cited, in etymology part, by certain entry with id attribute value (E1). for $ent in input()/dictionary/e[@id="e1"], $related in input()/dictionary/e where $ent/et/cr = $related/@id {$related/hwg/hw} from node f0, node xf0, node f1, node xf1, text tf0, node zf0, attr zf1, node r0 where f0=evaluate(xf0, "/dictionary/e[@id='e1']") and f1=evaluate(xf1, "/dictionary/e") and zf0=evaluate(f0, "/et/cr") and parentnode(tf0)=zf0 and nodename(zf1)="id" and ownerelement(zf1)=f1 and nodevalue(tf0)=nodevalue(zf1) and r0=evaluate(f1, "/hwg/hw"); A.3 DC/MD DC/MD_Q1: Return the customer id of the order that has matching id attribute value (1). for $order in input()/order[@id="1"] $order/customer_id 36
37 where f0=evaluate(xf0, and r0=evaluate(f0, "/customer_id"); DC/MD_Q3: Group orders with total amount bigger than a certain number( ), by customer id and calculate the total number of each group. for $a in distinct-values (input()/order [total > ]/customer_id) let $b := input()/order[customer_id=$a] <CustKey>{$a/text()}</CustKey> <NumberOfOrders>{count($b)}</NumberOfOrders> select distinct r0, count(r1) from node f0, node xf0, bag of node l0, text f0t, node r0, bag of node r1 where f0=evaluate(xf0, "/order[total > ]/customer_id") and l0=(select x from node x, text t, node x1 where x=evaluate(x1, "/order/customer_id") and nodevalue(t)=nodevalue(f0t) and parentnode(t)=x) and parentnode(f0t)=f0 and r0=evaluate(f0, "/text()") and r1=l0; DC/MD_Q4 List the item id of the previous item of a matching item with id attribute value (8). let $item := input()/items/item[@id="8"] for $previtem in input()/items/item[. << $item][position()=last()] <CurrentItem>{$item/@id}</CurrentItem> <PreviousItem>{$prevItem/@id}</PreviousItem>, r1 from node f0, node xf0, bag of node l0, attr r0, attr r1 where f0=evaluate(in(l0), "precedingsibling::node()[position()=last()]") and f0=evaluate(xf0, "/items/item") and l0=(select evaluate(x, "/items/item[@id='8'] ") from node x) and nodename(r0)="id" and ownerelement(r0)=in(l0) and nodename(r1)="id" and ownerelement(r1)=f0; DC/MD_Q5: Return the first order line item of a certain order with id attribute value (2). for $a in input()/order[@id="2"] $a/order_lines/order_line[1] 37
38 where f0=evaluate(xf0, and r0=evaluate(f0, "/order_lines/order_line[1]"); DC/MD_Q6: Return invoice where some discount rates of sub-line items are higher than a certain number (0.02). for $ord in input()/order where some $item in $ord/order_lines/order_line satisfies $item/discount_rate gt 0.02 $ord from node f0, node xf0, node w, node xw, node r0 where f0=evaluate(xf0, "/order") and xw=evaluate(w, "/.[discount_rate gt 0.02]") and w=evaluate(f0, "/order_lines/order_line") and r0=evaluate(f0, "/."); DC/MD_Q7: Return invoice where all discount rates of sub-line items are higher than a certain number (0.02). for $ord in input()/order where every $item in $ord/order_lines/order_line satisfies $item/discount_rate gt 0.02 $ord where f0=evaluate(xf0, "/order") and count(evaluate(f0, "/order_lines/order_line[discount_rate gt 0.02]")) =count(evaluate(f0, "/order_lines/order_line")) and r0=evaluate(f0, "/."); DC/MD_Q8: Return the order line item ids of an order with an attribute value (3). for $a in input()/order[@id="3"] $a/*/order_line/item_id where f0=evaluate(xf0, "/order[@id='3']") and r0=evaluate(f0, "/*/order_line/item_id"); DC/MD_Q9: Return the item ids of an order with id attribute value (4). for $a in input()/order[@id="4"] $a//item_id 38
39 where f0=evaluate(xf0, and r0=evaluate(f0, "//item_id"); DC/MD_Q10: List the orders (order id, order date and ship type), with total amount larger than a certain number ( ), ordered alphabetically by ship type. for $a in input()/order where $a/total gt order by $a/ship_type {$a/@id} {$a/order_date} {$a/ship_type} sort((, r1, r2 from node f0, node xf0, node tmp_f0, attr r0, node r1, node r2 where f0=evaluate(xf0, "/order") and tmp_f0=evaluate(f0, "/.[total gt ]") and nodename(r0)="id" and ownerelement(r0)=tmp_f0 and r1=evaluate(tmp_f0, "/order_date") and r2=evaluate(tmp_f0, "/ship_type") ), 3, 'inc'); DC/MD_Q11: List the orders (order id, order date and order total), with total amount larger than a certain number ( ), in descending order by total amount. for $a in input()/order where $a/total gt order by $a/total descending {$a/@id} {$a/order_date} {$a/total} sort((, r1, r2 from node f0, node xf0, node tmp_f0, attr r0, node r1, node r2 where f0=evaluate(xf0, "/order") and tmp_f0=evaluate(f0, "/.[total gt ]") and nodename(r0)="id" and ownerelement(r0)=tmp_f0 and r1=evaluate(tmp_f0, "/order_date") and r2=evaluate(tmp_f0, "/total") ), 3, 'dec'); DC/MD_Q12: List all order lines of a certain order with id attribute value (5). for $a in input()/order[@id="5"] {$a/order_lines} 39
40 where f0=evaluate(xf0, and r0=evaluate(f0, "/order_lines"); DC/MD_Q14: List the ids of orders that only have one order line. for $a in input()/order where empty($a/order_lines/order_line[2]) <OneItemLine> </OneItemLine> from node f0, node xf0, attr r0 where f0=evaluate(xf0, "/order") and notany(evaluate(f0, "/order_lines/order_line[2]")) and nodename(r0)="id" and ownerelement(r0)=f0; DC/MD_Q16: Retrieve one whole order document with certain id attribute value (6). for $a in $a where f0=evaluate(xf0, and r0=evaluate(f0, "/."); DC/MD_Q17: Return the ids of authors whose biographies contain a certain word ("hockey"). for $a in input()/authors/author where contains ($a/biography, "hockey") {$a/@id} from node f0, node xf0, attr r0 where f0=evaluate(xf0, "/authors/author") and some(select x from node x, text t where x=evaluate(f0, "/biography") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and nodename(r0)="id" and ownerelement(r0)=f0; DC/MD_19: For a particular order with id attribute value (7), get its customer name and phone, and its order status. 40
41 for $order in input()/order, $cust in input()/customers/customer where $order/customer_id = $cust/@id and $order/@id = "7" {$order/@id} {$order/order_status} {$cust/first_name} {$cust/last_name} {$cust/phone_number}, r1, r2, r3, r4 from node f0, node xf0, node f1, node xf1, text tf0, node zf0, attr zf1, node tmp_f0, attr r0, node r1, node r2, node r3, node r4 where f0=evaluate(xf0, "/order") and f1=evaluate(xf1, "/customers/customer") and zf0=evaluate(f0, "/customer_id") and parentnode(tf0)=zf0 and nodename(zf1)="id" and ownerelement(zf1)=f1 and nodevalue(tf0)=nodevalue(zf1) and tmp_f0=evaluate(f0, "/.[@id = '7']") and nodename(r0)="id" and ownerelement(r0)=tmp_f0 and r1=evaluate(tmp_f0, "/order_status") and r2=evaluate(f1, "/first_name") and r3=evaluate(f1, "/last_name") and r4=evaluate(f1, "/phone_number"); A.4 DC/SD DC/SD_Q1 Return the item that has matching item id attribute value (I1). for $item in input()/catalog/item[@id="i1"] $item where f0=evaluate(xf0, "/catalog/item[@id='i1']") and r0=evaluate(f0, "/."); DC/SD_Q2 Find the title of the item which has matching author first name (Ben). for $item in input()/catalog/item where $item/authors/author/name/first_name = "Ben" $item/title from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/catalog/item") 41
42 and tmp_f0=evaluate(f0, "/.[authors/author/name/first_name = 'Ben']") and r0=evaluate(tmp_f0, "/title"); DC/SD_Q3: Group items released in a certain year (1990), by publisher name and calculate the total number of items for each group. for $a in distinct-values (input()/catalog/item [date_of_release >= " "] [date_of_release < " "]/publisher/name) let $b := input()/catalog/item/publisher[name=$a] <Publisher>{$a/text()}</Publisher> <NumberOfItems>{count($b)}</NumberOfItems> select distinct r0, count(r1) from node f0, node xf0, bag of node l0, text f0t, node r0, bag of node r1 where f0=evaluate(xf0, "/catalog/item [date_of_release >= ' '] [date_of_release < ' ']/publisher/name") and l0=(select x from node x, text t, node x1 where x=evaluate(x1, "/catalog/item/publisher/name") and nodevalue(t)=nodevalue(f0t) and parentnode(t)=x) and parentnode(f0t)=f0 and r0=evaluate(f0, "/text()") and r1=l0; DC/SD_Q4: List the item id of the previous item of a matching item with id attribute value (I2). let $item := input()/catalog/item[@id="i2"] for $previtem in input()/catalog/item [. << $item][position() = last()] <CurrentItem>{$item/@id}</CurrentItem> <PreviousItem>{$prevItem/@id}</PreviousItem>, r1 from node f0, node xf0, bag of node l0, attr r0, attr r1 where f0=evaluate(in(l0), "preceding-sibling::node()[position() = last()]") and f0=evaluate(xf0, "/catalog/item") and l0=(select evaluate(x, "/catalog/item[@id='i2'] ") from node x) and nodename(r0)="id" and ownerelement(r0)=in(l0) and nodename(r1)="id" and ownerelement(r1)=f0; DC/SD_Q5: Return the information about the _rst author of item with a matching id attribute value (I3). 42
43 for $a in $a/authors/author[1] where f0=evaluate(xf0, and r0=evaluate(f0, "/authors/author[1]"); DC/SD_Q6: Return item information where some authors are from certain country (Canada). for $item in input()/catalog/item where some $auth in $item/authors/author/contact_information/mailing_address satisfies $auth/name_of_country = "Canada" $item from node f0, node xf0, node w, node xw, node r0 where f0=evaluate(xf0, "/catalog/item") and xw=evaluate(w, "/.[name_of_country = 'Canada']") and w=evaluate(f0, "/authors/author/contact_information/mailing_address") and r0=evaluate(f0, "/."); DC/SD_Q7: Return item information where all its authors are from certain country (Canada). for $item in input()/catalog/item where every $add in $item/authors/author/contact_information/mailing_address satisfies $add/name_of_country = "Canada" $item where f0=evaluate(xf0, "/catalog/item") and count(evaluate(f0, "/authors/author/contact_information/mailing_address[name _of_country = 'Canada']")) =count(evaluate(f0, "/authors/author/contact_information/mailing_address")) and r0=evaluate(f0, "/."); DC/SD_Q8: Return the publisher of an item with id attribute value (I4). for $a in input()/catalog/*[@id="i4"] $a/publisher where f0=evaluate(xf0, "/catalog/*[@id='i4']") and r0=evaluate(f0, "/publisher"); 43
44 DC/SD_Q9: Return the ISBN of an item with id attribute value (I5). for $a in input()/catalog/item where $a//isbn/text() from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/catalog/item") and tmp_f0=evaluate(f0, and r0=evaluate(tmp_f0, "//ISBN/text()"); DC/SD_Q10: List the item titles ordered alphabetically by publisher name, with release date within a certain time period (from to ). for $a in input()/catalog/item where $a/date_of_release gt " " and $a/date_of_release lt " " order by $a/publisher/name {$a/title} {$a/publisher} not supported in current version unordered query generated:, r1 from node f0, node xf0, node tmp_f0, node tmp_tmp_f0, node r0, node r1 where f0=evaluate(xf0, "/catalog/item") and tmp_f0=evaluate(f0, "/.[date_of_release gt ' ']") and tmp_tmp_f0=evaluate(tmp_f0, "/.[date_of_release lt ' ']") and r0=evaluate(tmp_tmp_f0, "/title") and r1=evaluate(tmp_tmp_f0, "/publisher"); DC/SD_Q11: List the item titles in descending order by date of release with date of release within a certain time range (from to ). for $a in input()/catalog/item where $a/date_of_release gt " " and $a/date_of_release lt " " order by $a/date_of_release descending {$a/title} {$a/date_of_release} sort((, r1 from node f0, node xf0, node tmp_f0, node tmp_tmp_f0, node r0, node r1 where f0=evaluate(xf0, "/catalog/item") 44
45 and tmp_f0=evaluate(f0, "/.[date_of_release gt ' ']") and tmp_tmp_f0=evaluate(tmp_f0, "/. [date_of_release lt ' ']") and r0=evaluate(tmp_tmp_f0, "/title") and r1=evaluate(tmp_tmp_f0, "/date_of_release") ), 2, 'dec'); DC/SD_Q12: Get the mailing address of the _rst author of certain item with id attribute value (I6). for $a in {$a/authors/author[1]/contact_information/mailing_address} where f0=evaluate(xf0, and r0=evaluate(f0, "/authors/author[1]/contact_information/mailing_address"); DC/SD_Q14: Return the names of publishers who publish books between a period of time (from to ) but do not have FAX number. for $a in input()/catalog/item where $a/date_of_release gt " " and $a/date_of_release lt " " and empty($a/publisher/contact_information/fax_number) {$a/publisher/name} from node f0, node xf0, node tmp_f0, node tmp_tmp_f0, node r0 where f0=evaluate(xf0, "/catalog/item") and tmp_f0=evaluate(f0, "/.[date_of_release gt ' ']") and tmp_tmp_f0=evaluate(tmp_f0, "/.[date_of_release lt ' ']") and notany(evaluate(tmp_tmp_f0, "/publisher/contact_information/fax_number")) and r0=evaluate(tmp_tmp_f0, "/publisher/name"); DC/SD_Q17: Return the ids of items whose descriptions contain a certain word ("hockey"). for $a in input()/catalog/item where contains ($a/description, "hockey") {$a/@id} from node f0, node xf0, attr r0 where f0=evaluate(xf0, "/catalog/item") 45
46 and some(select x from node x, text t where x=evaluate(f0, "/description") and like(nodevalue(t), "*hockey*") and parentnode(t)=x) and nodename(r0)="id" and ownerelement(r0)=f0; DC/SD_Q19: Retrieve the item titles related by certain item with id attribute value (I7). for $item in input()/catalog/item[@id="i7"], $related in input()/catalog/item where $item/related_items/related_item/item_id = $related/@id {$related/title} from node f0, node xf0, node f1, node xf1, text tf0, node zf0, attr zf1, node r0 where f0=evaluate(xf0, "/catalog/item[@id='i7']") and f1=evaluate(xf1, "/catalog/item") and zf0=evaluate(f0, "/related_items/related_item/item_id") and parentnode(tf0)=zf0 and nodename(zf1)="id" and ownerelement(zf1)=f1 and nodevalue(tf0)=nodevalue(zf1) and r0=evaluate(f1, "/title"); DC/SD_Q20: Retrieve the item title whose size (length*width*height) is bigger than certain number (500000). for $size in input()/catalog/item/attributes/size_of_book where $size/length*$size/width*$size/height > {$size/../../title} not supported in current version query generated: from node f0, node xf0, node tmp_f0, node r0 where f0=evaluate(xf0, "/catalog/item/attributes/size_of_book") and tmp_f0=evaluate(f0, "/.[length*$size/width*$size/height > ]") and r0=evaluate(tmp_f0, "/../../title"); 46
Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu
Semistructured data and XML Institutt for Informatikk 1 Unstructured, Structured and Semistructured data Unstructured data e.g., text documents Structured data: data with a rigid and fixed data format
Managing large sound databases using Mpeg7
Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob ([email protected]) ABSTRACT
Markup Languages and Semistructured Data - SS 02
Markup Languages and Semistructured Data - SS 02 http://www.pms.informatik.uni-muenchen.de/lehre/markupsemistrukt/02ss/ XPath 1.0 Tutorial 28th of May, 2002 Dan Olteanu XPath 1.0 - W3C Recommendation language
Deferred node-copying scheme for XQuery processors
Deferred node-copying scheme for XQuery processors Jan Kurš and Jan Vraný Software Engineering Group, FIT ČVUT, Kolejn 550/2, 160 00, Prague, Czech Republic [email protected], [email protected] Abstract.
Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?
Database Indexes How costly is this operation (naive solution)? course per weekday hour room TDA356 2 VR Monday 13:15 TDA356 2 VR Thursday 08:00 TDA356 4 HB1 Tuesday 08:00 TDA356 4 HB1 Friday 13:15 TIN090
Introduction to XML Applications
EMC White Paper Introduction to XML Applications Umair Nauman Abstract: This document provides an overview of XML Applications. This is not a comprehensive guide to XML Applications and is intended for
Extensible Markup Language (XML): Essentials for Climatologists
Extensible Markup Language (XML): Essentials for Climatologists Alexander V. Besprozvannykh CCl OPAG 1 Implementation/Coordination Team The purpose of this material is to give basic knowledge about XML
Unified XML/relational storage March 2005. The IBM approach to unified XML/relational databases
March 2005 The IBM approach to unified XML/relational databases Page 2 Contents 2 What is native XML storage? 3 What options are available today? 3 Shred 5 CLOB 5 BLOB (pseudo native) 6 True native 7 The
Chapter 3: XML Namespaces
3. XML Namespaces 3-1 Chapter 3: XML Namespaces References: Tim Bray, Dave Hollander, Andrew Layman: Namespaces in XML. W3C Recommendation, World Wide Web Consortium, Jan 14, 1999. [http://www.w3.org/tr/1999/rec-xml-names-19990114],
XBRL Processor Interstage XWand and Its Application Programs
XBRL Processor Interstage XWand and Its Application Programs V Toshimitsu Suzuki (Manuscript received December 1, 2003) Interstage XWand is a middleware for Extensible Business Reporting Language (XBRL)
Translating between XML and Relational Databases using XML Schema and Automed
Imperial College of Science, Technology and Medicine (University of London) Department of Computing Translating between XML and Relational Databases using XML Schema and Automed Andrew Charles Smith acs203
Creating a TEI-Based Website with the exist XML Database
Creating a TEI-Based Website with the exist XML Database Joseph Wicentowski, Ph.D. U.S. Department of State July 2010 Goals By the end of this workshop you will know:...1 about a flexible set of technologies
Modern Databases. Database Systems Lecture 18 Natasha Alechina
Modern Databases Database Systems Lecture 18 Natasha Alechina In This Lecture Distributed DBs Web-based DBs Object Oriented DBs Semistructured Data and XML Multimedia DBs For more information Connolly
An Approach to Translate XSLT into XQuery
An Approach to Translate XSLT into XQuery Albin Laga, Praveen Madiraju and Darrel A. Mazzari Department of Mathematics, Statistics, and Computer Science Marquette University P.O. Box 1881, Milwaukee, WI
How To Use X Query For Data Collection
TECHNICAL PAPER BUILDING XQUERY BASED WEB SERVICE AGGREGATION AND REPORTING APPLICATIONS TABLE OF CONTENTS Introduction... 1 Scenario... 1 Writing the solution in XQuery... 3 Achieving the result... 6
A Workbench for Prototyping XML Data Exchange (extended abstract)
A Workbench for Prototyping XML Data Exchange (extended abstract) Renzo Orsini and Augusto Celentano Università Ca Foscari di Venezia, Dipartimento di Informatica via Torino 155, 30172 Mestre (VE), Italy
Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey
Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey 1 Amjad Qtaish, 2 Kamsuriah Ahmad 1 School of Computer Science, Faculty of Information Science and Technology,
XML: extensible Markup Language. Anabel Fraga
XML: extensible Markup Language Anabel Fraga Table of Contents Historic Introduction XML vs. HTML XML Characteristics HTML Document XML Document XML General Rules Well Formed and Valid Documents Elements
Structured vs. unstructured data. Motivation for self describing data. Enter semistructured data. Databases are highly structured
Structured vs. unstructured data 2 Databases are highly structured Semistructured data, XML, DTDs Well known data format: relations and tuples Every tuple conforms to a known schema Data independence?
XML and Data Management
XML and Data Management XML standards XML DTD, XML Schema DOM, SAX, XPath XSL XQuery,... Databases and Information Systems 1 - WS 2005 / 06 - Prof. Dr. Stefan Böttcher XML / 1 Overview of internet technologies
Standard Recommended Practice extensible Markup Language (XML) for the Interchange of Document Images and Related Metadata
Standard for Information and Image Management Standard Recommended Practice extensible Markup Language (XML) for the Interchange of Document Images and Related Metadata Association for Information and
A Tale of Two Approaches: Query Performance Study of XML Storage Strategies in Relational Databases
A Tale of Two Approaches: Performance Study of XML Storage Strategies in Relational Databases Sandeep Prakash Sourav S. Bhowmick School of Computer Engineering, Nanyang Technological University, Singapore
Storing and Querying Ordered XML Using a Relational Database System
Kevin Beyer IBM Almaden Research Center Storing and Querying Ordered XML Using a Relational Database System Igor Tatarinov* University of Washington Jayavel Shanmugasundaram* Cornell University Stratis
[MS-ACCDT]: Access Template File Format. Intellectual Property Rights Notice for Open Specifications Documentation
[MS-ACCDT]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages,
Relational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute
OData Extension for XML Data A Directional White Paper
OData Extension for XML Data A Directional White Paper Introduction This paper documents some use cases, initial requirements, examples and design principles for an OData extension for XML data. It is
Chapter 1: Introduction
Chapter 1: Introduction Database System Concepts, 5th Ed. See www.db book.com for conditions on re use Chapter 1: Introduction Purpose of Database Systems View of Data Database Languages Relational Databases
Data Integration through XML/XSLT. Presenter: Xin Gu
Data Integration through XML/XSLT Presenter: Xin Gu q7.jar op.xsl goalmodel.q7 goalmodel.xml q7.xsl help, hurt GUI +, -, ++, -- goalmodel.op.xml merge.xsl goalmodel.input.xml profile.xml Goal model configurator
XBench Benchmark and Performance Testing of XML DBMSs
XBench Benchmark and Performance Testing of XML DBMSs Benjamin Bin Yao M. Tamer Özsu University of Waterloo School of Computer Science {bbyao, tozsu}@uwaterloo.ca Nitin Khandelwal University of Pennsylvania
AN ENHANCED DATA MODEL AND QUERY ALGEBRA FOR PARTIALLY STRUCTURED XML DATABASE
THE UNIVERSITY OF SHEFFIELD DEPARTMENT OF COMPUTER SCIENCE RESEARCH MEMORANDA CS-03-08 MPHIL/PHD UPGRADE REPORT AN ENHANCED DATA MODEL AND QUERY ALGEBRA FOR PARTIALLY STRUCTURED XML DATABASE SUPERVISORS:
Moving from CS 61A Scheme to CS 61B Java
Moving from CS 61A Scheme to CS 61B Java Introduction Java is an object-oriented language. This document describes some of the differences between object-oriented programming in Scheme (which we hope you
Language Interface for an XML. Constructing a Generic Natural. Database. Rohit Paravastu
Constructing a Generic Natural Language Interface for an XML Database Rohit Paravastu Motivation Ability to communicate with a database in natural language regarded as the ultimate goal for DB query interfaces
Exchanger XML Editor - Canonicalization and XML Digital Signatures
Exchanger XML Editor - Canonicalization and XML Digital Signatures Copyright 2005 Cladonia Ltd Table of Contents XML Canonicalization... 2 Inclusive Canonicalization... 2 Inclusive Canonicalization Example...
Integrating VoltDB with Hadoop
The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.
Web Service Mediation Through Multi-level Views
Web Service Mediation Through Multi-level Views Manivasakan Sabesan and Tore Risch Department of Information Technology, Uppsala University, Sweden {msabesan, Tore.Risch}@it.uu.se Abstract. The web Service
T-110.5140 Network Application Frameworks and XML Web Services and WSDL 15.2.2010 Tancred Lindholm
T-110.5140 Network Application Frameworks and XML Web Services and WSDL 15.2.2010 Tancred Lindholm Based on slides by Sasu Tarkoma and Pekka Nikander 1 of 20 Contents Short review of XML & related specs
An Oracle White Paper October 2013. Oracle XML DB: Choosing the Best XMLType Storage Option for Your Use Case
An Oracle White Paper October 2013 Oracle XML DB: Choosing the Best XMLType Storage Option for Your Use Case Introduction XMLType is an abstract data type that provides different storage and indexing models
Recovering Business Rules from Legacy Source Code for System Modernization
Recovering Business Rules from Legacy Source Code for System Modernization Erik Putrycz, Ph.D. Anatol W. Kark Software Engineering Group National Research Council, Canada Introduction Legacy software 000009*
Database trends: XML data storage
Database trends: XML data storage UC Santa Cruz CMPS 10 Introduction to Computer Science www.soe.ucsc.edu/classes/cmps010/spring11 [email protected] 25 April 2011 DRC Students If any student in the class
Database Systems. Lecture 1: Introduction
Database Systems Lecture 1: Introduction General Information Professor: Leonid Libkin Contact: [email protected] Lectures: Tuesday, 11:10am 1 pm, AT LT4 Website: http://homepages.inf.ed.ac.uk/libkin/teach/dbs09/index.html
A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX
ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database
6. SQL/XML. 6.1 Introduction. 6.1 Introduction. 6.1 Introduction. 6.1 Introduction. XML Databases 6. SQL/XML. Creating XML documents from a database
XML Databases Silke Eckstein Andreas Kupfer Institut für Informationssysteme Technische Universität http://www.ifis.cs.tu-bs.de in XML XML Databases SilkeEckstein Institut fürinformationssysteme TU 2 Creating
Database Design Patterns. Winter 2006-2007 Lecture 24
Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each
XML Programming with PHP and Ajax
http://www.db2mag.com/story/showarticle.jhtml;jsessionid=bgwvbccenyvw2qsndlpskh0cjunn2jvn?articleid=191600027 XML Programming with PHP and Ajax By Hardeep Singh Your knowledge of popular programming languages
XML Processing and Web Services. Chapter 17
XML Processing and Web Services Chapter 17 Textbook to be published by Pearson Ed 2015 in early Pearson 2014 Fundamentals of http://www.funwebdev.com Web Development Objectives 1 XML Overview 2 XML Processing
Symbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example
MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design
1 Introduction. 2 An Interpreter. 2.1 Handling Source Code
1 Introduction The purpose of this assignment is to write an interpreter for a small subset of the Lisp programming language. The interpreter should be able to perform simple arithmetic and comparisons
VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR
VIRTUAL LABORATORY: MULTI-STYLE CODE EDITOR Andrey V.Lyamin, State University of IT, Mechanics and Optics St. Petersburg, Russia Oleg E.Vashenkov, State University of IT, Mechanics and Optics, St.Petersburg,
Compiler Construction
Compiler Construction Regular expressions Scanning Görel Hedin Reviderad 2013 01 23.a 2013 Compiler Construction 2013 F02-1 Compiler overview source code lexical analysis tokens intermediate code generation
CSCE-608 Database Systems COURSE PROJECT #2
CSCE-608 Database Systems Fall 2015 Instructor: Dr. Jianer Chen Teaching Assistant: Yi Cui Office: HRBB 315C Office: HRBB 501C Phone: 845-4259 Phone: 587-9043 Email: [email protected] Email: [email protected]
Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling
XML (extensible Markup Language) Nan Niu ([email protected]) CSC309 -- Fall 2008 DHTML Modifying DOM Event bubbling Applets Last Week 2 HTML Deficiencies Fixed set of tags No standard way to create new
10. XML Storage 1. 10.1 Motivation. 10.1 Motivation. 10.1 Motivation. 10.1 Motivation. XML Databases 10. XML Storage 1 Overview
10. XML Storage 1 XML Databases 10. XML Storage 1 Overview Silke Eckstein Andreas Kupfer Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 10.6 Overview and
Syntax Check of Embedded SQL in C++ with Proto
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 383 390. Syntax Check of Embedded SQL in C++ with Proto Zalán Szűgyi, Zoltán Porkoláb
Parsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
Introduction to Web Services
Department of Computer Science Imperial College London CERN School of Computing (icsc), 2005 Geneva, Switzerland 1 Fundamental Concepts Architectures & escience example 2 Distributed Computing Technologies
SnapLogic Salesforce Snap Reference
SnapLogic Salesforce Snap Reference Document Release: October 2012 SnapLogic, Inc. 71 East Third Avenue San Mateo, California 94401 U.S.A. www.snaplogic.com Copyright Information 2012 SnapLogic, Inc. All
WebSphere Business Monitor
WebSphere Business Monitor Monitor models 2010 IBM Corporation This presentation should provide an overview of monitor models in WebSphere Business Monitor. WBPM_Monitor_MonitorModels.ppt Page 1 of 25
An XML Based Data Exchange Model for Power System Studies
ARI The Bulletin of the Istanbul Technical University VOLUME 54, NUMBER 2 Communicated by Sondan Durukanoğlu Feyiz An XML Based Data Exchange Model for Power System Studies Hasan Dağ Department of Electrical
Lecture 9. Semantic Analysis Scoping and Symbol Table
Lecture 9. Semantic Analysis Scoping and Symbol Table Wei Le 2015.10 Outline Semantic analysis Scoping The Role of Symbol Table Implementing a Symbol Table Semantic Analysis Parser builds abstract syntax
XML Databases 6. SQL/XML
XML Databases 6. SQL/XML Silke Eckstein Andreas Kupfer Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6. SQL/XML 6.1Introduction 6.2 Publishing relational
10CS73:Web Programming
10CS73:Web Programming Question Bank Fundamentals of Web: 1.What is WWW? 2. What are domain names? Explain domain name conversion with diagram 3.What are the difference between web browser and web server
CHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 Introduction Nowadays, with the rapid development of the Internet, distance education and e- learning programs are becoming more vital in educational world. E-learning alternatives
High Performance XML Data Retrieval
High Performance XML Data Retrieval Mark V. Scardina Jinyu Wang Group Product Manager & XML Evangelist Oracle Corporation Senior Product Manager Oracle Corporation Agenda Why XPath for Data Retrieval?
[MS-ASMS]: Exchange ActiveSync: Short Message Service (SMS) Protocol
[MS-ASMS]: Exchange ActiveSync: Short Message Service (SMS) Protocol Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Lesson 4 Web Service Interface Definition (Part I)
Lesson 4 Web Service Interface Definition (Part I) Service Oriented Architectures Module 1 - Basic technologies Unit 3 WSDL Ernesto Damiani Università di Milano Interface Definition Languages (1) IDLs
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs Sherif Sakr National ICT Australia UNSW, Sydney, Australia [email protected] Sameh Elnikety Microsoft Research Redmond, WA, USA [email protected]
Data XML and XQuery A language that can combine and transform data
Data XML and XQuery A language that can combine and transform data John de Longa Solutions Architect DataDirect technologies [email protected] Mobile +44 (0)7710 901501 Data integration through
DTD Tutorial. About the tutorial. Tutorial
About the tutorial Tutorial Simply Easy Learning 2 About the tutorial DTD Tutorial XML Document Type Declaration commonly known as DTD is a way to describe precisely the XML language. DTDs check the validity
MongoDB Aggregation and Data Processing Release 3.0.4
MongoDB Aggregation and Data Processing Release 3.0.4 MongoDB Documentation Project July 08, 2015 Contents 1 Aggregation Introduction 3 1.1 Aggregation Modalities..........................................
Implementing XML Schema inside a Relational Database
Implementing XML Schema inside a Relational Database Sandeepan Banerjee Oracle Server Technologies 500 Oracle Pkwy Redwood Shores, CA 94065, USA + 1 650 506 7000 [email protected] ABSTRACT
Advanced compiler construction. General course information. Teacher & assistant. Course goals. Evaluation. Grading scheme. Michel Schinz 2007 03 16
Advanced compiler construction Michel Schinz 2007 03 16 General course information Teacher & assistant Course goals Teacher: Michel Schinz [email protected] Assistant: Iulian Dragos INR 321, 368 64
MongoDB Aggregation and Data Processing
MongoDB Aggregation and Data Processing Release 3.0.8 MongoDB, Inc. December 30, 2015 2 MongoDB, Inc. 2008-2015 This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike 3.0
Schematron Validation and Guidance
Schematron Validation and Guidance Schematron Validation and Guidance Version: 1.0 Revision Date: July, 18, 2007 Prepared for: NTG Prepared by: Yunhao Zhang i Schematron Validation and Guidance SCHEMATRON
PeopleSoft Query Training
PeopleSoft Query Training Overview Guide Tanya Harris & Alfred Karam Publish Date - 3/16/2011 Chapter: Introduction Table of Contents Introduction... 4 Navigation of Queries... 4 Query Manager... 6 Query
Glossary of Object Oriented Terms
Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction
C Compiler Targeting the Java Virtual Machine
C Compiler Targeting the Java Virtual Machine Jack Pien Senior Honors Thesis (Advisor: Javed A. Aslam) Dartmouth College Computer Science Technical Report PCS-TR98-334 May 30, 1998 Abstract One of the
DataDirect XQuery Technical Overview
DataDirect XQuery Technical Overview Table of Contents 1. Feature Overview... 2 2. Relational Database Support... 3 3. Performance and Scalability for Relational Data... 3 4. XML Input and Output... 4
Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar
Object Oriented Databases OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar Executive Summary The presentation on Object Oriented Databases gives a basic introduction to the concepts governing OODBs
1 File Processing Systems
COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.
XFlash A Web Application Design Framework with Model-Driven Methodology
International Journal of u- and e- Service, Science and Technology 47 XFlash A Web Application Design Framework with Model-Driven Methodology Ronnie Cheung Hong Kong Polytechnic University, Hong Kong SAR,
JSquash: Source Code Analysis of Embedded Database Applications for Determining SQL Statements
JSquash: Source Code Analysis of Embedded Database Applications for Determining SQL Statements Dietmar Seipel 1, Andreas M. Boehm 1, and Markus Fröhlich 1 University of Würzburg, Department of Computer
3. Relational Model and Relational Algebra
ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra
Representation of E-documents in AIDA Project
Representation of E-documents in AIDA Project Diana Berbecaru Marius Marian Dip. di Automatica e Informatica Politecnico di Torino Corso Duca degli Abruzzi 24, 10129 Torino, Italy Abstract Initially developed
Using Object And Object-Oriented Technologies for XML-native Database Systems
Using Object And Object-Oriented Technologies for XML-native Database Systems David Toth and Michal Valenta David Toth and Michal Valenta Dept. of Computer Science and Engineering Dept. FEE, of Computer
Relational Databases for Querying XML Documents: Limitations and Opportunities. Outline. Motivation and Problem Definition Querying XML using a RDBMS
Relational Databases for Querying XML Documents: Limitations and Opportunities Jayavel Shanmugasundaram Kristin Tufte Gang He Chun Zhang David DeWitt Jeffrey Naughton Outline Motivation and Problem Definition
