Relational Databases for Querying XML Documents: Limitations and Opportunities

Jayavel Shanmugasundaram, Kristin Tufte, Gang He, Chun Zhang, David DeWitt, Jeffrey Naughton

Outline

- Motivation and Problem Definition
- Querying XML using a RDBMS
  - Physical schema design
  - Query mapping
  - Result construction
- Extensions to Relational Model
- Conclusion and Future Work

XML in One Slide

- Hierarchical document format for information exchange on the WWW
- Self-describing data (tags)
- Nested element structure having a root
- Element data can have
  - Attributes
  - Sub-elements

XML Example

    <book>
      <booktitle> The Selfish Gene </booktitle>
      <author id="dawkins">
        <name>
          <firstname> Richard </firstname>
          <lastname> Dawkins </lastname>
        </name>
        <address>
          <city> Timbuktu </city>
          <zip> 99999 </zip>
        </address>
      </author>
    </book>

What is the big deal about XML?

- Fast emerging as the dominant standard for data representation on the WWW
- Exciting database opportunity:
  - Unlike HTML, tags are not only for presentation
  - Can capture semantics
  - Can query the web if we can query XML!

Usage Scenario

[Diagram: an application/user sends a query over XML documents to a query engine, which reads several XML documents and returns an XML result (processed or displayed in browser).]

XML Query Languages: XML-QL

    WHERE <book>
            <booktitle> The Selfish Gene </booktitle>
            <author>
              <lastname> $l </lastname>
            </>
          </>
          IN http://www.amazon.com/listings.xml,
             http://www.barnesandnoble.com/books.xml
    CONSTRUCT <lastname> $l </lastname>

Usage Scenario

- Key requirement: presence of a schema for XML documents
  - For issuing queries
  - For applications to interpret data
- Current proposals
  - Document Type Descriptors (DTDs)
  - With typing: Document Content Descriptors (DCDs), XML Schemas

DTD Graph

[Figure: DTD graph with nodes book, article, monograph, booktitle, title, author, contactauthor, editor, name, address, authorid, firstname and lastname; edges are marked with ? or * where a sub-element is optional or repeated.]

DTDs: Schema for XML Docs

    <!ELEMENT book (booktitle, author)>
    <!ELEMENT booktitle (#PCDATA)>
    <!ELEMENT author (name, address)>
    <!ATTLIST author id ID #REQUIRED>
    <!ELEMENT name (firstname?, lastname)>
    <!ELEMENT firstname (#PCDATA)>
    <!ELEMENT lastname (#PCDATA)>
    <!ELEMENT address ANY>
    <!ELEMENT article (title, author*, contactauthor)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT contactauthor EMPTY>
    <!ATTLIST contactauthor authorid IDREF #IMPLIED>
    <!ELEMENT monograph (title, author, editor)>
    <!ELEMENT editor (monograph*)>
    <!ATTLIST editor name CDATA #REQUIRED>

The Problem

- Given:
  - DTDs
  - Collection of XML documents conforming to DTDs
- Query:
  - Based on DTD schemas
  - Over collection of XML documents, performing selections, joins, etc.
  - Producing an XML result

What do we use to Query XML?

- XML is a prime example of semi-structured data
- Invest in development of semi-structured query engines? [e.g., Lore]
  - Ignores three decades of research/development of the relational model
  - Requires new tools for use with relational data
- Can we not leverage relational technology?

Our Approach

[Diagram: an Automatic Translation Layer sits on top of a commercial RDBMS (DB2). It maps the DTD to a relational schema, XML documents to tuples, the XML-QL query to a SQL query, and the relational result to an XML result, using stored translation information.]

Scope of the Work

- Schema/Data mapping: Automate storage of XML in RDBMS (Oracle's iFS, DB2's XML Extender)
- Query mapping: Provide XML views of relational sources
- Result construction: Export existing data as XML

DTDs to Relations: Issues

- Complex DTD specifications
- Two-level nature of relational schema (tuples and attributes) vs. arbitrary nesting of XML DTD
- Recursion

[Figure: the DTD graph of the example DTD, shown again.]

DTD to Relational Schema

- Naïve approach:
  - Each element ==> relation
  - Each attribute of element ==> column of relation
  - Connect elements using foreign keys
- Problem? Fragmentation!

Fragmentation: Example

    <!ELEMENT author (name, address)>
    <!ATTLIST author id ID #REQUIRED>
    <!ELEMENT name (firstname?, lastname)>
    <!ELEMENT firstname (#PCDATA)>
    <!ELEMENT lastname (#PCDATA)>
    <!ELEMENT address ANY>

    author (authorid: integer, id: string)
    name (nameid: integer, authorid: integer)
    firstname (firstnameid: integer, nameid: integer, value: string)
    lastname (lastnameid: integer, nameid: integer, value: string)
    address (addressid: integer, authorid: integer, value: string)

- Results in 5 relations
- Just retrieving first and last names of an author requires three joins!
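The fragmentation problem can be tried out directly. The following is a minimal sketch, assuming SQLite as a stand-in for the RDBMS and two hand-made sample authors; the table layout follows the five relations on the slide, while the helper name `author_names` and the sample rows are illustrative assumptions.

```python
import sqlite3

# In-memory database holding the five relations of the naive mapping.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author    (authorid INTEGER PRIMARY KEY, id TEXT);
CREATE TABLE name      (nameid INTEGER PRIMARY KEY, authorid INTEGER);
CREATE TABLE firstname (firstnameid INTEGER PRIMARY KEY, nameid INTEGER, value TEXT);
CREATE TABLE lastname  (lastnameid INTEGER PRIMARY KEY, nameid INTEGER, value TEXT);
CREATE TABLE address   (addressid INTEGER PRIMARY KEY, authorid INTEGER, value TEXT);

-- Sample data (assumed for illustration only).
INSERT INTO author VALUES (1, 'dawkins');
INSERT INTO name VALUES (10, 1);
INSERT INTO firstname VALUES (100, 10, 'Richard');
INSERT INTO lastname VALUES (200, 10, 'Dawkins');

INSERT INTO author VALUES (2, 'darwin');
INSERT INTO name VALUES (11, 2);
INSERT INTO lastname VALUES (201, 11, 'Darwin');
""")

# Three joins: author-name, name-firstname, name-lastname.
# firstname is optional in the DTD (firstname?), hence the LEFT JOIN.
NAIVE_QUERY = """
SELECT f.value, l.value
FROM author a
JOIN name n ON n.authorid = a.authorid
LEFT JOIN firstname f ON f.nameid = n.nameid
JOIN lastname l ON l.nameid = n.nameid
WHERE a.id = ?
"""


def author_names(connection, author_id):
    """Return (firstname, lastname) pairs for the author with the given id attribute."""
    return connection.execute(NAIVE_QUERY, (author_id,)).fetchall()


print(author_names(conn, "dawkins"))
print(author_names(conn, "darwin"))
```

Even this two-column lookup touches four of the five relations, which is the cost the inlining techniques try to avoid.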

Shared Inlining Technique

- Intuition:
  - Inline as many sub-elements as possible
  - Do not inline only if it is a shared, recursive or set sub-element
- Technique:
  - Necessary and sufficient condition for shared/recursive element: in-degree >= 2 in DTD graph

Issues with Sharing Elements

- Parent of elements not fixed at schema level
- Need to store type and ids of parents (or if there are no parents)
  - parentcode field (type of parent)
  - parentid field (id of parent)
  - Not a foreign key relationship
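The decision of which elements get their own relation can be sketched from the DTD graph. The snippet below is an illustration under stated assumptions, not the authors' implementation: the edge list is transcribed by hand from the example DTD (attributes omitted), the function name `relations_for_shared` is hypothetical, and the rule is simplified to "root, set sub-element, or in-degree >= 2".

```python
from collections import Counter

# Element edges of the example DTD graph: (parent, child, is_set_edge).
# is_set_edge is True where the DTD uses "*" (a set sub-element).
DTD_EDGES = [
    ("book", "booktitle", False),
    ("book", "author", False),
    ("article", "title", False),
    ("article", "author", True),
    ("article", "contactauthor", False),
    ("monograph", "title", False),
    ("monograph", "author", False),
    ("monograph", "editor", False),
    ("editor", "monograph", True),
    ("author", "name", False),
    ("author", "address", False),
    ("name", "firstname", False),
    ("name", "lastname", False),
]


def relations_for_shared(edges):
    """Return the elements that get their own relation under the simplified rule."""
    nodes = {node for parent, child, _ in edges for node in (parent, child)}
    in_degree = Counter(child for _, child, _ in edges)
    set_children = {child for _, child, is_set in edges if is_set}
    return sorted(
        node
        for node in nodes
        if in_degree[node] == 0      # root of the graph
        or in_degree[node] >= 2      # shared (or recursive) element
        or node in set_children      # set sub-element, reached through "*"
    )


print(relations_for_shared(DTD_EDGES))
```

Every other element (booktitle, name, firstname, and so on) has exactly one non-set parent and is inlined into that parent's relation.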

Shared: Relational Schema

    book (bookid: integer, book.booktitle.isroot: boolean, book.booktitle: string)
    article (articleid: integer, article.contactauthor.isroot: boolean,
        article.contactauthor.authorid: string)
    monograph (monographid: integer, monograph.parentid: integer,
        monograph.parentcode: integer, monograph.editor.isroot: boolean,
        monograph.editor.name: string)
    title (titleid: integer, title.parentid: integer, title.parentcode: integer,
        title: string)
    author (authorid: integer, author.parentid: integer, author.parentcode: integer,
        author.name.isroot: boolean, author.name.firstname.isroot: boolean,
        author.name.firstname: string, author.name.lastname.isroot: boolean,
        author.name.lastname: string, author.address.isroot: boolean,
        author.address: string, author.authorid: string)

Shared Inlining Technique: Pros

+ Reduces number of joins for queries like "get the first and last names of an author"
+ Efficient for queries such as "list all authors with name Jack"
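To see why inlining helps, here is a small sketch of part of the Shared schema in SQLite. It is an assumption-laden illustration: only the book and author relations are created, the `isroot` columns are left out, the parent columns are named `parentid` and `parentcode`, and `parentcode = 0` is assumed to mean "parent is a book". The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE book (
    bookid INTEGER PRIMARY KEY,
    "book.booktitle" TEXT
);
CREATE TABLE author (
    authorid INTEGER PRIMARY KEY,
    parentid INTEGER,      -- id of the parent tuple
    parentcode INTEGER,    -- type of the parent (assumed: 0 = book, 1 = article)
    "author.name.firstname" TEXT,
    "author.name.lastname" TEXT
);
INSERT INTO book VALUES (1, 'The Selfish Gene');
INSERT INTO author VALUES (1, 1, 0, 'Richard', 'Dawkins');
INSERT INTO author VALUES (2, 1, 1, NULL, 'Darwin');
''')


def all_author_names(connection):
    """First and last names of every author: a single scan, no joins."""
    return connection.execute(
        'SELECT "author.name.firstname", "author.name.lastname" '
        "FROM author ORDER BY authorid"
    ).fetchall()


def book_author_names(connection, booktitle):
    """Names of the authors of one book: one join through parentid/parentcode."""
    return connection.execute(
        'SELECT A."author.name.firstname", A."author.name.lastname" '
        "FROM author A, book B "
        "WHERE B.bookid = A.parentid AND A.parentcode = 0 "
        'AND B."book.booktitle" = ?',
        (booktitle,),
    ).fetchall()


print(all_author_names(conn))
print(book_author_names(conn, "The Selfish Gene"))
```

The second author shares the numeric parent id 1 but has a different parent code, which is why the parent type has to be checked alongside the id.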

Shared Inlining Technique: Cons

- Sharing whenever possible implies extra joins for path expressions
  - Example: article with a given title name

Hybrid Inlining Technique

- Inlines some elements that are shared in Shared
  - Elements with in-degree >= 2 that are not set sub-elements or recursive
- Handles set and recursive sub-elements as in Shared

Hybrid: Relational Schema

    book (bookid: integer, book.booktitle.isroot: boolean, book.booktitle: string,
        author.name.firstname: string, author.name.lastname: string,
        author.address: string, author.authorid: string)
    article (articleid: integer, article.contactauthor.isroot: boolean,
        article.contactauthor.authorid: string, article.title.isroot: boolean,
        article.title: string)
    monograph (monographid: integer, monograph.parentid: integer,
        monograph.parentcode: integer, monograph.title: string,
        monograph.editor.isroot: boolean, monograph.editor.name: string,
        author.name.firstname: string, author.name.lastname: string,
        author.address: string, author.authorid: string)
    author (authorid: integer, author.parentid: integer, author.parentcode: integer,
        author.name.isroot: boolean, author.name.firstname.isroot: boolean,
        author.name.firstname: string, author.name.lastname.isroot: boolean,
        author.name.lastname: string, author.address.isroot: boolean,
        author.address: string, author.authorid: string)

Hybrid Inlining Technique: Pros

+ Reduces joins through shared elements (that are not set or recursive elements)
+ Shares some strengths of Shared: reduces joins for queries like "get first and last names of a book author"

Hybrid Inlining Technique: Cons

- Requires more SQL sub-queries to retrieve all authors with first name Jack
- Tradeoff between reducing number of queries and reducing number of joins
  - Shared and Hybrid target query reduction and join reduction respectively

Qualitative Evaluation

- 37 real DTDs from Oasis XML web page (http://www.oasis-open.org/cover/xml.html)
- Query set: all path expressions (that are valid in a given DTD) of a given length
- Metric: number of joins in query
  - Study benefit of inlining
- Also study tradeoff:
  - Number of joins per SQL query
  - Number of SQL queries

Path Expression with Length 3

[Figure: three bar charts comparing Shared and Hybrid on the DTDs aml, bips14, math, nitf-x, ofx1516, pif, residential, saej, smil and vrml. Panels: Joins/Query, Total Joins, Queries.]

Classification of DTDs

(J = joins per query, Q = number of queries, TJ = total joins)

- Group 1: J hybrid << J shared, Q hybrid > Q shared, TJ hybrid < TJ shared (35.14%)
- Group 2: J hybrid << J shared, Q hybrid >> Q shared, TJ hybrid ~ TJ shared (5.40%)
- Group 3: J hybrid < J shared, Q hybrid >> Q shared, TJ hybrid > TJ shared (16.22%)
- Group 4: J hybrid ~ J shared, Q hybrid ~ Q shared, TJ hybrid ~ TJ shared (43.24%)

Evaluation

- Shared and Hybrid have pros and cons
- In many cases, Shared and Hybrid are nearly identical
- Number of joins per SQL query ~ path length
  - Mainly due to large number of set nodes
- Problem, as join processing is expensive!

Semi-structured Queries to SQL Queries

- Semi-structured queries have a lot more flexibility than SQL
- Allow path expressions with various operators and wild cards
- Discussion based on Shared approach

Queries with Simple Path Expressions: Example

    WHERE <book>
            <booktitle> The Selfish Gene </booktitle>
            <author>
              <name>
                <firstname> $f </firstname>
                <lastname> $l </lastname>
              </name>
            </author>
          </book>
          IN * CONFORMING TO pubs.dtd
    CONSTRUCT <result> $f $l </result>

Queries with Simple Path Expressions: Translation

    Select A."author.name.firstname", A."author.name.lastname"
    From author A, book B
    Where B.bookID = A.parentID
      AND A.parentCODE = 0
      AND B."book.booktitle" = 'The Selfish Gene'

Queries with Recursive Path Expressions: Example

    WHERE <*.monograph>
            <editor.(monograph.editor)*>
              <name> $n </name>
            </>
            <title> Subclass Cirripedia </title>
          </>
          IN * CONFORMING TO pubs.dtd
    CONSTRUCT <result> $n </result>

Queries with Recursive Path Expressions: Translation

    With Q1 (monographID, name) AS (
        Select X.monographID, X."editor.name"
        From monograph X
        Where X.title = 'Subclass Cirripedia'
      UNION ALL
        Select Z.monographID, Z."editor.name"
        From Q1 Y, monograph Z
        Where Y.monographID = Z.parentID AND Z.parentCODE = 0
    )
    Select A.name
    From Q1 A

Queries with Arbitrary Path Expressions

    WHERE <(article|monograph).$*.name> $n </>
    CONSTRUCT <name> $n </>

- Split complex path expression into (possibly many) simple recursive path expressions
- Has the effect of splitting a single XML-QL/Lorel query into (possibly many) SQL queries
- Can handle nested recursive queries (though DB2 does not support them)
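The recursive translation can be exercised with a recursive common table expression. The sketch below runs on SQLite rather than DB2, so it uses `WITH RECURSIVE`; the single simplified `monograph` table, the column name `editor_name`, the meaning of `parentcode = 0` and all sample rows except the title "Subclass Cirripedia" are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monograph (
    monographid INTEGER PRIMARY KEY,
    parentid    INTEGER,   -- id of the enclosing monograph, NULL for a root
    parentcode  INTEGER,   -- assumed: 0 means the parent is a monograph's editor
    title       TEXT,
    editor_name TEXT
);
INSERT INTO monograph VALUES (1, NULL, NULL, 'Subclass Cirripedia', 'Editor A');
INSERT INTO monograph VALUES (2, 1, 0, 'Nested volume', 'Editor B');
INSERT INTO monograph VALUES (3, 2, 0, 'Doubly nested volume', 'Editor C');
INSERT INTO monograph VALUES (4, NULL, NULL, 'Unrelated monograph', 'Editor D');
""")

# Base case: monographs with the requested title.
# Recursive step: monographs nested (through an editor) in one already found.
RECURSIVE_QUERY = """
WITH RECURSIVE q1 (monographid, name) AS (
    SELECT x.monographid, x.editor_name
    FROM monograph x
    WHERE x.title = ?
  UNION ALL
    SELECT z.monographid, z.editor_name
    FROM q1 y, monograph z
    WHERE y.monographid = z.parentid AND z.parentcode = 0
)
SELECT a.name FROM q1 a
"""


def editor_names(connection, title):
    """Editor names reachable via editor.(monograph.editor)* from the titled monograph."""
    rows = connection.execute(RECURSIVE_QUERY, (title,))
    return sorted(row[0] for row in rows)


print(editor_names(conn, "Subclass Cirripedia"))
```

The unrelated monograph is never reached, because the recursion only follows parent links starting from the monograph selected in the base case.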

Structuring Scenarios

- Simple structuring
- Tag variables
- Grouping
- Complex element construction
- Heterogeneity
- Nested queries

Simple Structuring: Query

    WHERE <book>
            <author>
              <firstname> $f </firstname>
              <lastname> $l </lastname>
            </>
          </>
          IN * CONFORMING TO pubs.dtd
    CONSTRUCT <author>
                <firstname> $f </firstname>
                <lastname> $l </lastname>
              </author>

Simple Structuring: Translation

Relational result:

    (Richard, Dawkins)
    (NULL, Darwin)

XML result:

    <author>
      <firstname> Richard </firstname>
      <lastname> Dawkins </lastname>
    </author>
    <author>
      <lastname> Darwin </lastname>
    </author>

Grouping: Query

    WHERE <$p>
            <(title|booktitle)> $t </>
            <author>
              <lastname> $l </lastname>
            </>
          </>
          IN * CONFORMING TO pubs.dtd
    CONSTRUCT <author ID=authorID($l)>
                <name> $l </name>
                <$p ID=pID($p)>
                  <title> $t </>
                </>
              </>

Grouping: Translation

Relational result:

    (Darwin, book, Origin of Species)
    (Darwin, book, The Descent of Man)
    (Darwin, monograph, Subclass Cirripedia)
    (Dawkins, book, The Selfish Gene)

XML result:

    <author>
      <name> Darwin </name>
      <book>
        <title> Origin of Species </title>
        <title> The Descent of Man </title>
      </book>
      <monograph>
        <title> Subclass Cirripedia </title>
      </monograph>
    </author>
    <author>
      <name> Dawkins </name>
      <book>
        <title> The Selfish Gene </title>
      </book>
    </author>
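The tagging step that turns the sorted relational result into grouped XML can be sketched in a few lines. This is an illustration, not the authors' tagger: the function name `build_grouped_xml` and the wrapping `<result>` element are assumptions, and the ID attributes produced by authorID(...) and pID(...) are left out.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# Relational result of the grouping query: (lastname, tag of $p, title).
ROWS = [
    ("Darwin", "book", "Origin of Species"),
    ("Darwin", "book", "The Descent of Man"),
    ("Darwin", "monograph", "Subclass Cirripedia"),
    ("Dawkins", "book", "The Selfish Gene"),
]


def build_grouped_xml(rows):
    """Group tuples by author, then by tag, and emit nested XML elements."""
    result = ET.Element("result")  # wrapper element, assumed for well-formedness
    ordered = sorted(rows)  # groupby only merges adjacent rows, so sort first
    for lastname, author_rows in groupby(ordered, key=lambda row: row[0]):
        author = ET.SubElement(result, "author")
        ET.SubElement(author, "name").text = lastname
        for tag, tag_rows in groupby(author_rows, key=lambda row: row[1]):
            parent = ET.SubElement(author, tag)
            for _, _, title in tag_rows:
                ET.SubElement(parent, "title").text = title
    return result


print(ET.tostring(build_grouped_xml(ROWS), encoding="unicode"))
```

Sorting on the grouping keys is what lets a single pass over the tuples produce the nesting; in a relational setting the ORDER BY clause of the SQL query would play that role.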

Complex Results: Query

- Assume article has multiple authors and titles

    WHERE <article> $a </> IN * CONFORMING TO pubs.dtd
    CONSTRUCT <article> $a </>

Complex Results: Translation

- Returning a single table: results in multi-valued dependencies
- Returning many tables: database functionality (outer join, optimization) duplicated outside
- Similar problem with recursive types
- Problem with Relational Model!

Extensions to Relational Model

- Support for sets
- Untyped/variable typed references
- Information retrieval style indices
- Flexible comparison operators
- Multiple query optimization/execution
- Complex recursion

Support for Sets

- Set-valued attributes
  - Reduce fragmentation
- Nesting operators
  - Construct complex XML results

Untyped/Variable Typed References

- IDREFs not typed in XML
- Joins through IDREFs in relational model:
  - One join for every possible reference type
  - Proliferation of joins
- Better idea is to have database support

Information Retrieval Style Indices

- Query ANY fields in a DTD
- Limited query support for XML fragments
- Reduces fragmentation
- Example: Oracle's ConText Search Engine, DB2's Text Extender

Conclusion

- Relational model can be used to query XML documents!
- But:
  - Several limitations make the model awkward or inefficient
  - Extensions remove some of the limitations
- Open question: Is it better to develop a specialized XML database or extend the relational model?

Future Research Directions

- Study impact of extensions
- Performance comparison with native XML approaches
- Exploit queryable XML sources:
  - May have high performance RDBMS under wraps
  - Push processing to queryable source

More Details?

- Relational Databases for Querying XML Documents: Limitations and Opportunities, VLDB '99
- http://www.cs.wisc.edu/~jai/pubs.html

XML Query Languages: XQL

- Last name of the author of the book "The Selfish Gene"

    book[booktitle = 'The Selfish Gene']/author/lastname

XML Query Languages: Lorel

    SELECT X.author.lastname
    FROM book X
    WHERE X.booktitle = "The Selfish Gene"

Complexity of DTDs

- DTD element specifications can be of arbitrary complexity
  - <!ELEMENT a ((b|c|e)?,(e?|(f?,(b,b)*))*)> is valid!
- Fortunately, can simplify DTD for translation purposes
- Key observations:
  - Not necessary to regenerate DTD from relational schema
  - XML query languages do not query over complex structures of elements

Simplification of DTDs

- Flattening transformations
    (e1, e2)*  ==>  e1*, e2*
    (e1, e2)?  ==>  e1?, e2?
    (e1 | e2)  ==>  e1?, e2?
- Simplification transformations
    e1**  ==>  e1*
    e1*?  ==>  e1*
    e1?*  ==>  e1*
    e1??  ==>  e1?
- Grouping transformations
    ..., a*, ..., a*, ...  ==>  a*, ...
    ..., a*, ..., a?, ...  ==>  a*, ...
    ..., a?, ..., a*, ...  ==>  a*, ...
    ..., a?, ..., a?, ...  ==>  a*, ...
    ..., a, ..., a, ...    ==>  a*, ...
- Example
    <!ELEMENT a ((b|c|e)?,(e?|(f?,(b,b)*))*)>  ==>  <!ELEMENT a (b*, c?, e*, f*)>

Basic Inlining Technique

- Intuition:
  - Inline as many sub-elements as possible
  - Do not inline only if it is a set sub-element
  - Connect relations using foreign keys
- Complications:
  - A document can be rooted at any element
    - Create separate relational schema for each root
  - Recursion
    - Detect cycles in schema
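The flattening, simplification and grouping transformations for DTDs can be combined into one small recursive pass over a content model. The following is a sketch under stated assumptions: the tuple-based AST encoding and the function names `flatten`, `group` and `simplify` are illustrative choices, and the "+" operator is not handled.

```python
# Content-model AST (assumed encoding):
#   ("name", "b")        an element name
#   ("seq", [n1, n2])    n1, n2
#   ("alt", [n1, n2])    n1 | n2
#   ("opt", n)           n?
#   ("star", n)          n*


def _combine(outer, inner):
    """Quantifier ('', '?' or '*') of a name nested under two quantifiers."""
    if "*" in (outer, inner):
        return "*"  # e**, e*?, e?*  ==>  e*
    if "?" in (outer, inner):
        return "?"  # e??  ==>  e?
    return ""


def flatten(node, outer=""):
    """Push quantifiers down to the names; return (name, quantifier) pairs."""
    kind = node[0]
    if kind == "name":
        return [(node[1], outer)]
    if kind == "seq":
        return [pair for child in node[1] for pair in flatten(child, outer)]
    if kind == "alt":  # (e1 | e2)  ==>  e1?, e2?
        inner = _combine(outer, "?")
        return [pair for child in node[1] for pair in flatten(child, inner)]
    if kind == "opt":
        return flatten(node[1], _combine(outer, "?"))
    if kind == "star":
        return flatten(node[1], "*")
    raise ValueError(f"unknown node kind: {kind!r}")


def group(pairs):
    """Merge repeated names: a name that occurs more than once becomes name*."""
    merged = {}
    for name, quantifier in pairs:
        merged[name] = "*" if name in merged else quantifier
    return merged


def simplify(node):
    """Return the simplified content model as a string such as 'b*, c?'."""
    return ", ".join(name + q for name, q in group(flatten(node)).items())


def n(name):
    return ("name", name)


# ((b|c|e)?,(e?|(f?,(b,b)*))*)
EXAMPLE = ("seq", [
    ("opt", ("alt", [n("b"), n("c"), n("e")])),
    ("star", ("alt", [
        ("opt", n("e")),
        ("seq", [("opt", n("f")), ("star", ("seq", [n("b"), n("b")]))]),
    ])),
])

print(simplify(EXAMPLE))
```

The simplified model loses ordering and exact cardinality, which is acceptable here because the DTD never needs to be regenerated from the relational schema.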

Basic: Relational Schema

    book (bookid: integer, book.booktitle: string,
        book.author.name.firstname: string, book.author.name.lastname: string,
        book.author.address: string, author.authorid: string)
    booktitle (booktitleid: integer, booktitle: string)
    article (articleid: integer, article.contactauthor.authorid: string,
        article.title: string)
    article.author (article.authorid: integer, article.author.parentid: integer,
        article.author.name.firstname: string, article.author.name.lastname: string,
        article.author.address: string, article.author.authorid: string)
    contactauthor (contactauthorid: integer, contactauthor.authorid: string)
    title (titleid: integer, title: string)
    monograph (monographid: integer, monograph.parentid: integer,
        monograph.title: string, monograph.editor.name: string,
        monograph.author.name.firstname: string,
        monograph.author.name.lastname: string,
        monograph.author.address: string, monograph.author.authorid: string)
    editor (editorid: integer, editor.parentid: integer, editor.name: string)
    editor.monograph (editor.monographid: integer,
        editor.monograph.parentid: integer, editor.monograph.title: string,
        editor.monograph.author.name.firstname: string,
        editor.monograph.author.name.lastname: string,
        editor.monograph.author.address: string,
        editor.monograph.author.authorid: string)
    author (authorid: integer, author.name.firstname: string,
        author.name.lastname: string, author.address: string,
        author.authorid: string)
    name (nameid: integer, name.firstname: string, name.lastname: string)
    firstname (firstnameid: integer, firstname: string)
    lastname (lastnameid: integer, lastname: string)
    address (addressid: integer, address: string)

Basic Inlining Technique: Pros

- Reduces number of joins for queries like "get the first and last names of a book author"
- Efficient for queries such as "list all authors of books"

Basic Inlining Technique: Cons

- Queries like "list all authors with name Jack": union of 5 queries!
- Large number of relations:
  - Separate relational schema for each element as root (minor)
  - Unrolling recursive strongly connected components (major)

Experimental Results

- Is Basic practical?
  - Many DTDs have recursion
  - Large strongly connected components
  - Schema translation program ran out of virtual memory!
- Concentrate on Shared vs. Hybrid

Varying Path Expression Length

[Figure: two line charts of total joins (0 to 11) against path length (1 to 11) for Shared and Hybrid, one for a Group 1 DTD and one for a Group 3 DTD.]

Tag Variables: Query

    WHERE <$p>
            <author>
              <firstname> $f </firstname>
              <lastname> $l </lastname>
            </>
          </>
          IN * CONFORMING TO pubs.dtd
    CONSTRUCT <$p>
                <author>
                  <firstname> $f </firstname>
                  <lastname> $l </lastname>
                </author>
              </>

Tag Variables: Translation

Relational result:

    (book, Richard, Dawkins)
    (book, NULL, Darwin)
    (monograph, NULL, Darwin)

XML result:

    <book>
      <author>
        <firstname> Richard </firstname>
        <lastname> Dawkins </lastname>
      </author>
    </book>
    <book>
      <author>
        <lastname> Darwin </lastname>
      </author>
    </book>
    <monograph>
      <author>
        <lastname> Darwin </lastname>
      </author>
    </monograph>

Heterogeneous Results

    WHERE <article> <$p> $y </> </article> IN * CONFORMING TO pubs.dtd
    CONSTRUCT <$p> $y </>

- Rewritten as separate queries
- Results merged together

Nested Queries

- Very short answer: can be rewritten as SQL queries
- Use outer joins to make the connection

Flexible Comparison Operators

- Comparison between values of different types
- Problem present even with typed schemas
  - Different schemas may represent comparable information as different types

Multiple Query Optimization/Execution

- Complex path expressions translated to (possibly) many SQL queries
- SQL queries share scans, selections, joins, etc.
- More efficient to optimize and execute as a group

More Powerful Recursion

- Path expressions can be of arbitrary complexity
- Some systems support fixed-point operators (e.g., DB2)
- Nested fixed-point operators not supported
  - Required for completeness