Purely Relational XQuery

Similar documents
Database-Supported XML Processors

OptImatch: Semantic Web System with Knowledge Base for Query Performance Problem Determination

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

QuickDB Yet YetAnother Database Management System?

Inside the PostgreSQL Query Optimizer

Unified XML/relational storage March The IBM approach to unified XML/relational databases

SQL/XML-IMDBg GPU boosted In-Memory Database for ultra fast data management Harald Frick CEO QuiLogic In-Memory DB Technology

Database & Information Systems Group Prof. Marc H. Scholl. XML & Databases. Tutorial. 11. SQL Compilation, XPath Symmetries

IBM DB2 XML support. How to Configure the IBM DB2 Support in oxygen

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Data XML and XQuery A language that can combine and transform data

Storing and Querying Ordered XML Using a Relational Database System

MS SQL Performance (Tuning) Best Practices:

Creating a TEI-Based Website with the exist XML Database

Oracle Database 11g: SQL Tuning Workshop Release 2

Unit Storage Structures 1. Storage Structures. Unit 4.3

Deferred node-copying scheme for XQuery processors

SQL Query Evaluation. Winter Lecture 23

SQL Optimization & Access Paths: What s Old & New Part 1

Scalable Data Integration by Mapping Data to Queries

Discovering SQL. Wiley Publishing, Inc. A HANDS-ON GUIDE FOR BEGINNERS. Alex Kriegel WILEY

Introduction. Part I: Finding Bottlenecks when Something s Wrong. Chapter 1: Performance Tuning 3

Oracle Database 11g: SQL Tuning Workshop

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

Damia: Data Mashups for Intranet Applications

Enhancing Traditional Databases to Support Broader Data Management Applications. Yi Chen Computer Science & Engineering Arizona State University

Indexing XML Data in RDBMS using ORDPATH

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs


XML and Data Management

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Data Warehousing. Jens Teubner, TU Dortmund Winter 2014/15. Jens Teubner Data Warehousing Winter 2014/15 1


Modern Databases. Database Systems Lecture 18 Natasha Alechina

Use a Native XML Database for Your XML Data

How To Use Query Console

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski

How To Improve Performance In A Database

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

Big Data looks Tiny from the Stratosphere

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

Database Application Developer Tools Using Static Analysis and Dynamic Profiling

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

Symbol Tables. Introduction

DBMS / Business Intelligence, SQL Server

I Have...Who Has... Multiplication Game

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Relational Databases

Oracle Database: SQL and PL/SQL Fundamentals NEW

1. INTRODUCTION TO RDBMS

Oracle Database: SQL and PL/SQL Fundamentals

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

ORACLE DATABASE 10G ENTERPRISE EDITION

Semistructured data and XML. Institutt for Informatikk INF Ahmet Soylu

Best Practices for DB2 on z/os Performance

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : Hertford College Supervisor: Dr.

Integrating Apache Spark with an Enterprise Data Warehouse

Chapter 1: Introduction

Querying Microsoft SQL Server

Oracle Database 10g. Page # The Self-Managing Database. Agenda. Benoit Dageville Oracle Corporation benoit.dageville@oracle.com

16.1 MAPREDUCE. For personal use only, not for distribution. 333

The Import & Export of Data from a Database

CIS 631 Database Management Systems Sample Final Exam

In-Memory Data Management for Enterprise Applications

History of Database Systems

Execution Strategies for SQL Subqueries

DB2 for i. Analysis and Tuning. Mike Cain IBM DB2 for i Center of Excellence. mcain@us.ibm.com

Markup Languages and Semistructured Data - SS 02

DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at

The MonetDB Architecture. Martin Kersten CWI Amsterdam. M.Kersten

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

2 Associating Facts with Time

SQL Anywhere 12 New Features Summary

Efficiently Identifying Inclusion Dependencies in RDBMS

Course ID#: W 35 Hrs. Course Content

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Introducing Microsoft SQL Server 2012 Getting Started with SQL Server Management Studio

Querying Microsoft SQL Server 20461C; 5 days

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

White Paper. Optimizing the Performance Of MySQL Cluster

Maksym Iaroshenko Co-Founder and Senior Software Engineer at Eltrino. Magento non-mysql implementations

Oracle Database: SQL and PL/SQL Fundamentals

InfiniteGraph: The Distributed Graph Database

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

Transcription:

Purely Relational XQuery Database-Supported XML Processors Torsten Grust http://www-db.in.tum.de/~grust/ LUG Erding, Oct 2007

A Database Guy s View of XML, XPath, and XQuery 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. Wait! Sizable data volume...? Use Relational Databases! 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. Wait! Sizable data volume...? Use Relational Databases! <?xml version= 1.0?> <root> <elem att= val > <!-- comment --> text </elem> </root> let $doc := doc( in.xml ) for $t in $doc//text() return count($t/../comment()) 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. Wait! Sizable data volume...? Use Relational Databases! <?xml version= 1.0?> <root> <elem att= val > <!-- comment --> text </elem> </root> let $doc := doc( in.xml ) for $t in $doc//text() return count($t/../comment()) 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. Wait! Sizable data volume...? Use Relational Databases! pre size level 1 1 10 2 1 20 3 1 30 let $doc := doc( in.xml ) for $t in $doc//text() return count($t/../comment()) 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. Wait! Sizable data volume...? Use Relational Databases! pre size level 1 1 10 2 1 20 3 1 30 SELECT pre,size, ROW_NUMBER ( ) FROM t000 WHERE value GROUP BY iter, 2

A Database Guy s View of XML, XPath, and XQuery Build XML processors that do not stumble once XML input volumes become sizable. Wait! Sizable data volume...? Use Relational Databases! OFF-THE-SHELF pre size level 1 1 10 2 1 20 3 1 30 SELECT pre,size, ROW_NUMBER ( ) FROM t000 WHERE value GROUP BY iter, 2

RDBMSs as Processors for Non-Relational Languages Relational databases contain the best understood and most scalable query processors available today. Can we use relational query engines as-is to efficiently process non-relational languages? 3

RDBMSs as Processors for Non-Relational Languages Relational databases contain the best understood and most scalable query processors available today. Can we use relational query engines as-is to efficiently process non-relational languages? L X X 3

RDBMSs as Processors for Non-Relational Languages Relational databases contain the best understood and most scalable query processors available today. Can we use relational query engines as-is to efficiently process non-relational languages? L X X R R -1 SQL / Relational Algebra 3

RDBMSs as Processors for Non-Relational Languages Relational databases contain the best understood and most scalable query processors available today. Can we use relational query engines as-is to efficiently process non-relational languages? XQuery R R -1 SQL / Relational Algebra 3

Purely Relational XQuery Find a relational representation of XQuery (XML, XPath) that leverages the table-based processing model: Exploit functionality already built into the database system (do not invade the database kernel): SQL idioms, OLAP extensions, partitioned B-Trees. Run on top off-the-shelf relational database systems: 4

Purely Relational XQuery Find a relational representation of XQuery (XML, XPath) that leverages the table-based processing model: Exploit functionality already built into the database system (do not invade the database kernel): SQL idioms, OLAP extensions, partitioned B-Trees. Run on top off-the-shelf relational database systems: DB2 (kxsystems) IBM DB2 V9 SQL Server 2005 PostgreSQL kdb+ MonetDB DB2 Version 9 4

Pathfinder pathfinder-xquery.org Relational Database System (Kernel) 5

Pathfinder pathfinder-xquery.org Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding Relational Database System (Kernel) 5

Pathfinder pathfinder-xquery.org XQuery Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding Relational Database System (Kernel) 5

Pathfinder pathfinder-xquery.org XQuery Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding SQL Relational Database System (Kernel) 5

Pathfinder pathfinder-xquery.org XQuery Pathfinder XQuery Compiler XPath/XQuery Semantics XML Encoding RA SQL Relational Database System (Kernel) 5

Pathfinder & DB2 DB2 Against 110+ MB of XML DB2 Version 9 for Linux, UNIX, an 6

Pathfinder & DB2 DB2 Against 110+ MB of XML DB2 Version 9 for Linux, UNIX, an 6

Slow Motion Replay XQuery Pathfinder SQL DB2 DB2 Version 9 for Linux, UNIX, and Windows 7

Slow Motion Replay XQuery XQuery (XMark Q8) let $a := doc("xmark-110mb.xml") return Pathfinder <item person="{ $p/name/text() }"> { count($ca) } </item> SQL DB2 DB2 Version 9 for Linux, UNIX, and Windows 7

Slow Motion Replay XQuery SQL:1999 (no SQL/XML) Pathfinder SQL DB2 DB2 Version 9 WITH t0000 (iter,pre) AS SELECT FROM WHERE t0001 (iter) AS SELECT FROM WHERE SELECT FROM WHERE ORDER BY pre,size,kind, for Linux, UNIX, and Windows 7

Slow Motion Replay XQuery Pathfinder SQL DB2 Relational Encoding of Result pre size kind 5072509 2 1 5072510 0 2 5072511 0 3 DB2 Version 9 for Linux, UNIX, and Windows 7

Slow Motion Replay XQuery Pathfinder SQL DB2 Serialized XML Text <?xml version="1.0" encoding="utf-8"?> <XQuery> <item person= Jixiang >0</item> <item person= Harpreet >0</item> </XQuery> DB2 Version 9 for Linux, UNIX, and Windows 7

Table Access Modes 8

Table Access Modes Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. A B 8

Table Access Modes Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 8

Table Access Modes Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 8

Table Access Modes Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 2. Index-based access: B-Trees, range scans. 8

Table Access Modes Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 2. Index-based access: B-Trees, range scans. 8

Table Access Modes Relational query engines derive much of their efficiency from the simplicity of the table of tuples model. 1. Sequential scans: read-ahead on disks, prefetching CPU caches. A B 2. Index-based access: B-Trees, range scans. But: We will benefit only if we actually operate the relational database in this bulk-oriented mode. 8

The Core of XQuery 9

The Core of XQuery (e1,..,en) ordered item sequences 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 ordered item sequences iteration 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 ordered item sequences iteration variable binding 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 if (e1) then e2 else e3 ordered item sequences iteration variable binding conditionals 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> ordered item sequences iteration variable binding conditionals XML node construction 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e 1 /child::t, e 1 /following::t ordered item sequences iteration variable binding conditionals XML node construction XPath location steps 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e 1 /child::t, e 1 /following::t unordered { e 1 } ordered item sequences iteration variable binding conditionals XML node construction XPath location steps local order indifference 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e 1 /child::t, e 1 /following::t unordered { e 1 } fn:doc(e 1 ), fn:data(e 1 ) ordered item sequences iteration variable binding conditionals XML node construction XPath location steps local order indifference built-in functions + Document order, node identity, dynamic typing, schema validation, user-defined functions,... 9

The Core of XQuery (e1,..,en) for $x in e 1 return e 2 let $x := e1 return e2 if (e1) then e2 else e3 <t> { e1 } </t> e 1 /child::t, e 1 /following::t unordered { e 1 } fn:doc(e 1 ), fn:data(e 1 ) ordered item sequences iteration variable binding conditionals XML node construction XPath location steps local order indifference built-in functions + Document order, node identity, dynamic typing, schema validation, user-defined functions,... The XQuery SQL gap appears substantial. 9

FP-Style Iteration 10

FP-Style Iteration The XQuery for..in..return construct performs side-effect free iteration: for $x in (e1,..,en) return b 10

FP-Style Iteration The XQuery for..in..return construct performs side-effect free iteration: for $x in (e1,..,en) return b ( b[e1/$x],..,b[en/$x] ) Individual iterations cannot interfere and may be evaluated in any order (or in parallel). 10

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. 11

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 11

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 11

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 for $x in (10,20,30) return $x + 42 11

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 iter pos item 1 1 10 2 1 20 3 1 30 11

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 for $x in (10,20,30) return $x + 42 iter pos item 1 1 42 2 1 42 3 1 42 iter pos item 1 1 10 2 1 20 3 1 30 let $x := (10,20,30) return $x 11

Loop Lifting Relational loop unrolling : For any XQuery subexpression, generate a relational plan that evaluates the expression in all iterations. for $x in (10,20,30) return $x + 42 for $x in (10,20,30) return $x + 42 let $x := (10,20,30) return $x 11 iter pos item 1 1 42 2 1 42 3 1 42 iter pos item 1 1 10 2 1 20 3 1 30 iter pos item 1 1 10 1 2 20 1 3 30

Loop Lifting 12

Loop Lifting for $x in (10,20,30) return $x + 42 12

Loop Lifting for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 iter 1 pos 1 item 1 1 1 42 2 1 42 3 1 42 12

Loop Lifting for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 iter = iter 1 iter 1 pos 1 item 1 1 1 42 2 1 42 3 1 42 + item 2 :(item,item 1 ) π iter,pos,item 2 12

Loop Lifting for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 iter = iter 1 + item 2 :(item,item 1 ) π iter,pos,item 2 iter 1 pos 1 item 1 1 1 42 2 1 42 3 1 42 iter pos item iter 1 pos 1 item 1 1 1 10 1 1 42 2 1 20 2 1 42 3 1 30 3 1 42 12

Loop Lifting for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 iter = iter 1 + item 2 :(item,item 1 ) π iter,pos,item 2 iter 1 pos 1 item 1 1 1 42 2 1 42 3 1 42 iter pos item iter 1 pos 1 item 1 item 2 1 1 10 1 1 42 52 2 1 20 2 1 42 62 3 1 30 3 1 42 72 12

Loop Lifting for $x in (10,20,30) return $x + 42 iter pos item 1 1 10 2 1 20 3 1 30 iter = iter 1 iter 1 pos 1 item 1 1 1 42 2 1 42 3 1 42 + item 2 :(item,item 1 ) π iter,pos,item 2 iter pos item 2 1 1 52 2 1 62 3 1 72 12

Loop Lifting 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd iter pos item 1 1 false 2 1 true 3 1 false 4 1 true 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd σ item π iter iter pos item 1 1 false 2 1 true 3 1 false 4 1 true pos item 1 even pos item 1 odd σ item π iter 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd iter pos item 2 1 true 4 1 true σ item π iter iter pos item 1 1 false 2 1 true 3 1 false 4 1 true pos item 1 even pos item 1 odd σ item π iter 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd iter 2 4 σ item π iter iter pos item 1 1 false 2 1 true 3 1 false 4 1 true pos item 1 even pos item 1 odd σ item π iter 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd iter pos item 2 1 even 4 1 even σ item π iter iter pos item 1 1 false 2 1 true 3 1 false 4 1 true pos item 1 even pos item 1 odd σ item π iter 13

for $x in (1,2,3,4) Loop Lifting return if ($x mod 2 = 0) then even else odd iter pos item 1 1 odd σ item π iter 2 1 even 3 1 odd iter pos item 1 1 false 2 1 true 3 1 false 4 1 true pos item 1 even pos item 1 odd 4 1 even σ item π iter 13

Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra σ π 14

Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra σ π Compiles into RA that mimics capabilites of SQL:1999 query engines. RA dialect is... 1. primitive 2. explicit 3. designed to be easily analyzable 14

Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra σ π Code Gen Code Gen Code Gen Compiles into RA that mimics capabilites of SQL:1999 query engines. RA dialect is... 1. primitive 2. explicit 3. designed to be easily analyzable 14

Intermediate Code: Relational Algebra XQuery Pathfinder XQuery Compiler Relational Algebra σ π Code Gen Code Gen Code Gen Compiles into RA that mimics capabilites of SQL:1999 query engines. RA dialect is... 1. primitive SQL Q MIL 2. explicit (kxsystems) 3. designed to be easily analyzable 14

Typical Plans... 15

Typical Plans... 15

Relational XQuery Optimization let $d := fn:doc( ) return for $a in $d//a for $b in $d//b where $b/@c = $a/@d return $b 16

Relational XQuery Optimization We use concepts in the relational domain to analyze plans and to steer simplification and optimization. Detect join-like XQuery expressions. Let XQuery s syntactic diversity not affect the detection: let $d := fn:doc( ) return for $a in $d//a for $b in $d//b where $b/@c = $a/@d return $b 16

Relational XQuery Optimization We use concepts in the relational domain to analyze plans and to steer simplification and optimization. Detect join-like XQuery expressions. Let XQuery s syntactic diversity not affect the detection: let $d := fn:doc( ) return $d//b[@c = $d//a/@c] 16

Relational XQuery Optimization We use concepts in the relational domain to analyze plans and to steer simplification and optimization. Detect join-like XQuery expressions. Let XQuery s syntactic diversity not affect the detection: let $d := fn:doc( ) return $d//b[@c = $d//a/@c] For both queries, the value-based XQuery join surfaces as a multi-valued dependency in the algebraic plans. 16

ROW# (pos1:<sort, pos>/outer) (iter, pos, item:res) access textnode content (res:<item>) X (iter = inner) @ (pos), val: #1 (iter:inner, item) (outer:iter, sort:pos, inner) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) / child::text (iter, item) (iter, item) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) / child::element name { item* } (iter, item) (iter, pos, item:cast) CAST (cast:<item>), type: ua (iter, pos, item:res) access attribute value (res:<item>) @ (pos), val: #1 (iter:inner, item) (iter, item) @ (pos), val: #1 (iter:inner, item) X (iter = inner) (outer:iter, sort:pos, inner) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) / attribute::attribute id { atomic* } (iter, item) (iter:inner, item) @ (pos), val: #1 (iter:inner, item) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) / child::element closed_auction { item* } (iter, item) (iter, item) ROW# (pos:<item>/iter) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) / child::element person { item* } (iter, item) (iter, item) ROW# (pos:<item>/iter) DISTINCT (iter, item) DISTINCT (iter, item) ROW# (pos:<item>/iter) / child::element closed_auctions { item* } (iter, item) ROW# (pos:<item>/iter) / child::element people { item* } (iter, item) (iter, item) (iter) @ (pos), val: #1 (iter:inner, item) (iter, item) ROW# (pos:<item>/iter) DISTINCT (iter, item) DIFF ROW# (pos:<item>/iter) @ (item), val: false DISTINCT (iter:outer) NUMBER (inner) / child::element site { item* } (iter, item) ROW# (pos1:<sort, pos>/outer) (outer:iter, sort:pos, inner) NUMBER (inner) X (iter = inner) @ (item), val: 1 @ (pos), val: #1 (iter) SEL (item) (iter, pos, item:res) = (res:<item, item1>) X (iter = iter1) (iter:inner, item) X (iter = outer) (iter, pos, item:res) NOT (res:<item>) @ (pos), val: #1 U @ (item), val: true (iter, pos, item:cast) (iter1:iter, item1:cast) CAST (cast:<item>), type: str (iter:inner, pos, item) X (iter = outer) ROW# (pos1:<sort, pos>/outer) (iter:outer, pos:pos1, item) (iter:outer, pos:pos1, item) CAST (cast:<item>), type: str ROW# (pos1:<sort, pos>/outer) (iter, pos, item:cast) CAST (cast:<item>), type: ua (iter, pos, item:res) access attribute value (res:<item>) @ (pos), val: #1 X (iter = inner) (iter:inner, item) @ (pos), val: #1 (iter:inner, item) X (iter = outer) (iter:inner, pos, item) (outer:iter, sort:pos, inner) NUMBER (inner) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) / attribute::attribute person { atomic* } (iter, item) (iter, item) ROW# (pos:<item>/iter) DISTINCT (iter, item) ROW# (pos:<item>/iter) (outer:iter, sort:pos, inner) (outer:iter, sort:pos, inner) (outer:iter, sort:pos, inner) / child::element buyer { item* } (iter, item) X (iter = outer) (iter, item) Rewriting XMark Q8 17

EMPTY_FRAG FRAG_UNION FRAGs ROW# (pos1:<sort>) ELEM (iter, item:<iter, item><iter, pos, item>) FRAG_UNION FRAG_UNION FRAGs @ (item), val: person (iter:outer, pos:pos1, item) ROW# (pos1:<sort>/outer) X (iter = inner) (iter, pos, item:res) access textnode content (res:<item>) @ (pos), val: #1 (iter:inner, item) ATTR (res:<item, item1>) X (iter = iter1) @ (pos), val: #1 NUMBER (inner) ROW# (pos:<item>/iter) / child::text (iter, item) X (iter = inner) @ (pos), val: #1 ROOTS ELEM_TAG @ (item), val: item (iter1:iter, item1:item) fn:string_join (outer:iter, sort:pos, inner) / child::element name { item* } (iter, item) @ (item), val: " " (iter:inner) (iter, pos:pos1, item) ROW# (pos1:<ord>/iter) @ (ord), val: #1 @ (ord), val: #2 (iter, pos, item:res) @ (pos), val: #1 ROOTS U FRAGs (iter, item:res) ROOTS TEXT (res:<cast>) CAST (cast:<item>), type: str U FRAG_UNION @ (item), val: 0 DIFF (iter) COUNT (item:/iter) (iter:outer) X (iter = inner) (iter) X (iter = iter1) (iter1:iter) SEL (item) (iter, item:res) NOT (res:<item>) U @ (item), val: false (iter:inner) (iter:inner, item) NUMBER (inner) ROW# (pos:<item>) / child::element person { item* } (iter, item) / child::element people { item* } (iter, item) FRAGs (iter:inner, item) NUMBER (inner) @ (item), val: true / child::element closed_auction { item* } (iter, item) / child::element closed_auctions { item* } (iter, item) / child::element site { item* } (iter, item) DOC DIFF (iter, item:cast) CAST (cast:<item>), type: ua (iter, item:res) access attribute value (res:<item>) / attribute::attribute person { atomic* } (iter, item) / child::element buyer { item* } (iter, item) @ (item), val: "auctiong.xml" DISTINCT (iter:outer) @ (item), val: false (iter:inner) DIFF NUMBER (inner) (iter:inner, item) X (iter = inner) (outer:iter, sort:pos, inner) / child::element site { item* } (iter, item) ROOTS (iter) SEL (item) (iter, item:res) NOT (res:<item>) U DISTINCT (iter:outer) @ (item), val: true X (iter = inner) (iter) SEL (item) (iter, item:res) = (res:<item, item1>) (iter, item:cast) X (iter = iter1) CAST (cast:<item>), type: str (iter:inner, item) X (iter = outer) (iter:inner, item) (iter:outer, item) X (iter = inner) (outer:iter, inner) (iter:inner, item) NUMBER (inner) (iter1:iter, item1:cast) CAST (cast:<item>), type: str (iter, item:cast) CAST (cast:<item>), type: ua (iter, item:res) (iter:inner, item) access attribute value (res:<item>) (iter:inner, item) NUMBER (inner) (iter:outer, item) X (iter = inner) (outer:iter, inner) (outer:iter, inner) NUMBER (inner) / attribute::attribute id { atomic* } (iter, item) (iter:inner, item) X (iter = outer) (iter:inner, item) X (iter = outer) (outer:iter, inner) (outer:iter, inner) Rewriting XMark Q8 Rewriting phases: 1. Constant propagation and projection pushdown, 17

EMPTY_FRAG FRAG_UNION FRAGs SERIALIZE (item, pos) ROW# (pos:<pos1>) (pos1, item) X (iter = iter1) ELEM (iter1, item:<iter1, item><iter1, pos, item>) (iter1, item1, pos) ROW# (pos:<pos1>/iter1) (iter1, pos1, item1) access textnode content (item1:<item>) ROW# (pos1:<item>/iter1) / child::text (iter1, item) ROOTS FRAG_UNION FRAG_UNION FRAGs ATTR (item:<item2, item1>) @ (item2), val: person fn:string_join / child::element name { item* } (iter1, item) (item:item1, iter1:iter) @ (item1), val: " " ELEM_TAG @ (item), val: item NUMBER (iter) ROW# (pos1:<item1>) (item1) (iter1:iter) / child::element person { item* } (iter, item1) (iter1, item, pos) ROW# (pos:<pos1>/iter1) U @ (pos1), val: #1 @ (pos1), val: #2 (iter1, item) (iter1, item) ROOTS FRAGs TEXT (item:<item2>) CAST (item2:<item1>), type: str U @ (item1), val: 0 DIFF (iter1) ROOTS COUNT (item1:/iter1) (iter1) X (iter = iter2) (iter:iter2) DISTINCT (iter2) SEL (item) (iter2, item) = (item:<item3, item4>) (iter2, item3, item4) CAST (item4:<item>), type: str CAST (item3:<item2>), type: str (iter2, item2, item) CAST (item:<item4>), type: ua access attribute value (item4:<item3>) (item3, iter2, item2) / child::element people { item* } (iter, item1) FRAG_UNION X (iter1 = iter3) (iter1:iter3, item3) / attribute::attribute id { atomic* } (iter3, item3) (item3:item, iter3:iter1) FRAGs (iter2, item2:item1, iter3:iter1) NUMBER (iter1) (item, iter2, item1) CAST (item1:<item3>), type: ua access attribute value (item3:<item2>) / child::element site { item* } (iter, item1) DOC TBL: (iter item) [#1,"auctionG.xml"] @ (iter), val: #1 (item1:item) ROOTS (item2, item, iter2) X (iter = iter3) (iter:iter3, item2) / attribute::attribute person { atomic* } (iter3, item2) / child::element buyer { item* } (iter3, item2) (item, item1, iter1, iter2:iter) (item2:item, iter3:iter) (item:item1, iter2:iter, iter3:iter) (item) NUMBER (iter) (item1, iter1:iter) / child::element closed_auction { item* } (iter, item) / child::element closed_auctions { item* } (iter, item) / child::element site { item* } (iter, item) Rewriting XMark Q8 Rewriting phases: 1. Constant propagation and projection pushdown, 2. functional dependency and data flow analysis, 17

EMPTY_FRAG FRAG_UNION FRAGs SERIALIZE (item, pos) ROW# (pos:<pos1>) (pos1, item) X (iter = iter1) ELEM (iter1, item:<iter1, item><iter1, pos, item>) (iter1, item1, pos) ROW# (pos:<pos1>/iter1) (iter1, pos1, item1) access textnode content (item1:<item>) ROW# (pos1:<item>/iter1) / child::text (iter1, item) ROOTS FRAG_UNION FRAG_UNION FRAGs ATTR (item:<item2, item1>) @ (item2), val: person fn:string_join @ (item1), val: " " / child::element name { item* } (iter1, item) (item:item1, iter1:iter) ELEM_TAG @ (item), val: item (iter1:iter) NUMBER (iter) ROW# (pos1:<item1>) (item1) / child::element person { item* } (iter, item1) (iter1, item, pos) ROW# (pos:<pos1>/iter1) U @ (pos1), val: #1 @ (pos1), val: #2 (iter1, item) (iter1, item) ROOTS FRAGs TEXT (item:<item2>) CAST (item2:<item1>), type: str U @ (item1), val: 0 DIFF (iter1) COUNT (item1:/iter1) (iter1) DISTINCT (iter, iter1) ROOTS X (item1 = item) (iter, item1) (iter1, item) access attribute value (item1:<item>) (item, iter:iter2) / attribute::attribute person { atomic* } (iter2, item) / child::element people { item* } (iter, item1) FRAG_UNION FRAGs / child::element buyer { item* } (iter2, item) DOC TBL: (iter item1) [#1,"auctionG.xml"] (item:item1, iter2:iter) NUMBER (iter) ROOTS (item1) / child::element closed_auction { item* } (iter, item1) / child::element closed_auctions { item* } (iter, item1) / child::element site { item* } (iter, item1) access attribute value (item:<item1>) (item1, iter1:iter2) / attribute::attribute id { atomic* } (iter2, item1) (item1, iter2:iter) Rewriting XMark Q8 Rewriting phases: 1. Constant propagation and projection pushdown, 2. functional dependency and data flow analysis, 3. algebraic XQuery join detection, 17

Attach (item), val: item SERIALIZE ROOTS twig (iter, item) ELEM (iter, item) fcns ATTR (iter, item, item1) Attach (item), val: person fn:string_join fcns access textnode content (item1:<item>) Attach (item1), val: " " Project (iter, item) / + child::text (item:item1) level=5 / + child::element name { item* } (item1:iter) level=4 Project (iter) / + child::element person { item* } (iter:item) level=3 Project (item) TEXT (iter, item) Project (iter, item) access attribute value (item1:<item>) / + child::element people { item* } (item:item1) level=2 CAST (item:<item1>), type: str / + attribute::attribute id { atomic* } (item:iter) level=4 Project (item1) / + child::element site { item* } (item1:item) level=1 ROOTS DOC (iter, item) TBL: (iter item) [#1,"auctionG.xml"] Project (item) UNION COUNT (item1:/iter) Project (iter) ThetaJoin (item eq item1) Project (iter, item1) access attribute value (item:<item1>) Project (item1) / + attribute::attribute person { atomic* } (item1:item) level=5 Project (item) / + child::element buyer { item* } (item:item1) level=4 Project (item1) / + child::element closed_auction { item* } (item1:item) level=3 Project (item) / + child::element closed_auctions { item* } (item:item1) level=2 Attach (item1), val: 0 DIFF Project (iter) Rewriting XMark Q8 Rewriting phases: 1. Constant propagation and projection pushdown, 2. functional dependency and data flow analysis, 3. algebraic XQuery join detection, 4. XPath join graph isolation, 17

EMPTY FRAG item1 :item item:iter descendant::text() GPS = 802, level = 5 SERIALIZE FRAG UNION FRAGs FRAGs ROOTS twig (iter, item) DOC (iter, item) TBL: (iter item) item:item1 πitem1 ROOTS πiter,item1 πitem item1 :item attribute(person) GPS = 960, level = 5 πitem item:item1 descendant::element(buyer) GPS = 959, level = 4 πitem1 :item item = item1 πiter,item1 item1 :item πiter ELEM (iter, item) πitem @item: item item:iter attribute(id) GPS = 799, level = 4 item:iter fcns ATTR (iter, item, item1) @item: person COUNT item1 :()/iter πiter fcns TEXT (iter, item) πiter,item descendant::element(person) GPS = 798, level = 3 nil CAST (item: item1 ), type: str @item1 :0 \ πiter Rewriting XMark Q8 Rewriting phases: 1. Constant propagation and projection pushdown, 2. functional dependency and data flow analysis, 3. algebraic XQuery join detection, 4. XPath join graph isolation, 5. data guide exploitation ( Node GPS ). Pathfinder & IBM DB2 V9, 115 MB XML input data < 1 sec 17

Rewriting XMark Q8 π iter Rewriting phases: item=item1 1. Constant propagation and projection pushdown, π item π iter,item1 2. functional dependency and data flow analysis, item:item1 item1 :item 3. algebraic XQuery join detection, π item1 item1 :item 4. XPath join graph item:iter GPS = descendant::text() 802, isolation, level = 5 attribute(person) GPS = 960, level = 5 item1 :item item:iter a GPS = 79 5. data guide exploitation ( Node GPS ). π item Pathfinder & IBM DB2 V9, 115 MB XML item:item1 input data < 1 sec descendant::element(buyer) GPS = 959, level = 4 17 π item π item1 :item

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

upstream Can DB2 Cope? SORT(33) 373.32 HSJOIN(35) 373.32 *) Indexes avised by DB2 itself NLJOIN(37) 75.02 HSJOIN(43) 298.3 IXSCAN(41) 50.01 NLJOIN(45) 149.15 DB2 Version 9 NLJOIN( IDX_GUIDE_PRE for Linux, UNIX, and Windows IXSCAN(53) 50.01 NLJOIN(47) 75.02 NLJOIN( XMARK_1.DOC *) IDX_GUIDE_PRE_VALUE IXSCAN(51) 50.01 IXSCAN(49) 50.01 IXSCAN(63) 50.0 XMARK_1.DOC IDX_GUIDE_PRE_PRE_PLUS_SIZE *) *) IDX_VALUE_KIND_PRE_SIZE IDX_GUIDE_PRE_VAL XMARK_1.DOC XMARK_1.DOC XMARK_1.DOC 18

Order Indifference let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return fn:distinct-doc-order( for $x in $warning//* return $x/text() ) 19

Order Indifference Order in XML and XQuery processing is essential: let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return fn:distinct-doc-order( for $x in $warning//* return $x/text() ) 19

Order Indifference Order in XML and XQuery processing is essential: let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return fn:distinct-doc-order( for $x in $warning//* return $x/text() ) Do not press button, computer will explode! 19

Order Indifference Order in XML and XQuery processing is essential: let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! 19

Order Indifference Order in XML and XQuery processing is essential: let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! Order-awareness thus is often deeply wired into XQuery processors. Supporting constructs like unordered { e1 } is challenging. 19

Order Indifference Order in XML and XQuery processing is essential: let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! Order-awareness thus is often deeply wired into XQuery processors. SERIALIZE (item, pos) Supporting constructs like unordered { e1 } is challenging. FRAG_UNION FRAGs (pos1, item) X (iter = iter1) ROOTS ELEM (iter1, item:<iter1, item><iter1, pos, item>) ROW# (pos:<pos1>) ELEM_TAG (iter1, item, pos) @ (item), val: item ROW# (pos:<pos1>/iter1) U 19 FRAG_UNION FRAG_UNION @ (pos1), val: #1 @ (pos1), val: #2 (iter1, item) (iter1, item) ROOTS FRAGs ROOTS FRAGs TEXT (item:<item2>)

Order Indifference Order in XML and XQuery processing is essential: let $warning := <p>do <em>not</em> press button, computer will <em>explode!</em></p> return for $x in $warning//* return $x/text() Do press button, computer will not explode! Order-awareness thus is often deeply wired into XQuery processors. π iter,item SERIALIZE (item, pos) Supporting constructs like unordered { e1 } is challenging. FRAG_UNION FRAGs (pos1, item) X (iter = iter1) ROOTS ELEM (iter1, item:<iter1, item><iter1, pos, item>) ROW# (pos:<pos1>) ELEM_TAG (iter1, item, pos) @ (item), val: item ROW# (pos:<pos1>/iter1) U 19 FRAG_UNION FRAG_UNION @ (pos1), val: #1 @ (pos1), val: #2 (iter1, item) (iter1, item) ROOTS FRAGs ROOTS FRAGs TEXT (item:<item2>)

More Purely Relational XQuery 20

More Purely Relational XQuery Declarative XQuery debugging with time travel. Users observe expressions and may traverse iterations forward/backward. 20

More Purely Relational XQuery Declarative XQuery debugging with time travel. Users observe expressions and may traverse iterations forward/backward. iter pos item 1 2 3 4 20

More Purely Relational XQuery Declarative XQuery debugging with time travel. Users observe expressions and may traverse iterations forward/backward. iter pos item 1 2 3 4 Dependable cardinality estimates at XQuery subexpression level (# items per iteration and overall contribution). for $x in $doc//x return if (e 1 ) then e 2 else e 3 20

More Purely Relational XQuery Declarative XQuery debugging with time travel. Users observe expressions and may traverse iterations forward/backward. iter pos item 1 2 3 4 Dependable cardinality estimates at XQuery subexpression level (# items per iteration and overall contribution). for $x in $doc//x 4.3 return if (e 1 ) then e 2 else e 3 8 2 20

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day shirt 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c shirt d 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c shirt d 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c Same d shirt different day shirt 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c Same d shirt shirt different day shirt 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c Same Same Same Same shirt shirt shirt different day different d shirt shirt different day shirt 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c Same Same Same Same shirt shirt shirt different day different d shirt shirt different day shirt dict atom 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a b c Sameδ 1 Sameδ 1 Sameδ 1 Sameδ 1 shirt shirt shirt different day different d shirt shirt different day shirt dict δ 1 Same atom 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a δ 1 δ 2 δ 3 δ 4 b δ 1 δ 2 δ 3 c δ 1 δ 2 δ 1 d δ 2 δ 2 δ 3 δ 4 21 shirt dict atom δ 1 Same δ 2 shirt δ 3 different day δ 4

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a δ 15 δ 2 δ 3 δ 4 b δ 15 δ 2 δ 3 c δ 15 δ 2 δ 1 d δ 2 δ 2 δ 3 shirt dict atom δ 1 Same δ 2 shirt δ 3 different δ 4 day δ 5 Same shirt δ 4 21

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 atom3 atom4 a δ 15 6 δ 2 δ 3 δ 4 b δ 15 6 δ 2 δ 3 c δ 1 5 δ 2 δ 1 d δ 2 δ 2 δ 3 δ 4 21 shirt dict atom δ 1 Same δ 2 shirt δ 3 different δ 4 day δ 5 δ 1 δ 2 δ 6 δ 5 δ 3

Atomization Indexes <a> <b> <c>same<d>shirt</d></c> different </b> day </a> Same c b d a different day node atom1 atom2 a δ 6 δ 4 b δ 6 c δ 5 δ 1 d δ 2 δ 2 δ 3 δ 4 21 shirt dict atom δ 1 Same δ 2 shirt δ 3 different δ 4 day δ 5 δ 1 δ 2 δ 6 δ 5 δ 3

Relational Encodings for XML XML documents represent ordered, unranked trees (nodes of 7 kinds). Relational XQuery processing calls for an adequate relational encoding of such trees: 1. Schema-oblivious (XQuery constructs arbitrary XML fragments), 2. accessible for the query processor and index support ( 10 7 nodes in typical GB-range XML instances). 22

Relational Encodings for XML XML documents represent ordered, unranked trees (nodes of 7 kinds). Relational XQuery processing calls for an adequate relational encoding of such trees: 1. Schema-oblivious (XQuery constructs arbitrary XML fragments), 2. accessible for the query processor and index support ( 10 7 nodes in typical GB-range XML instances). doc X 22 1 <a>b<c/>d<e/><f>g</f></a> 2

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 23

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> d c b e a g i f h j 23

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3b 2c2c 3d0d 0a9a 6g4g 4e1e 5f8f 8i5i 7h7h 9j6j 23

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3b 2c2c 3d0d 0a9a 6g4g 4e1e 5f8f 8i5i 7h7h 9j6j 23 pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3b 2c2c 3d0d 0a9a 6g4g 4e1e 5f8f 8i5i 7h7h 9j6j 23 document order pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3b 2c2c 3d0d 0a9a 6g4g 4e1e 5f8f 8i5i 7h7h 9j6j document order pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j kind elem elem elem elem text elem comm elem elem elem 23

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3b 2c2c 3d0d 0a9a 6g4g 4e1e 5f8f 8i5i 7h7h 9j6j document order pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j kind elem elem elem elem text elem comm elem elem elem size level 9 0 3 1 2 2 0 3 0 3 4 1 0 2 2 2 0 3 0 3 23

Pre/Postorder Encoding 0a9a <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> 1b3b 2c2c 3d0d 4e1e 6g4g 5f8f 8i5i 7h7h 9j6j document order pre node 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j kind elem elem elem elem text elem comm elem elem elem size level 9 0 3 1 2 2 0 3 0 3 4 1 0 2 2 2 0 3 0 3 23

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> post 5 5 pre 23 3d0d 1b3b 2c2c document order 4e1e 0a9a 6g4g pre node 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 5f8f 8i5i kind elem elem elem elem text elem comm elem elem elem 7h7h 9j6j size level 9 0 3 1 2 2 0 3 0 3 4 1 0 2 2 2 0 3 0 3

Pre/Postorder Encoding <a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a> post a 5 b c d f e 5 g h i j pre 23 3d0d 1b3b 2c2c document order 4e1e 0a9a 6g4g pre node 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 5f8f 8i5i kind elem elem elem elem text elem comm elem elem elem 7h7h 9j6j size level 9 0 3 1 2 2 0 3 0 3 4 1 0 2 2 2 0 3 0 3

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: 24

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h 5 b c e g i j d 5 24

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h 5 b c e g i j d 5 24

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h 5 b c e g i j d 5 24

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h 5 b c e g i j d 5 24

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a f h 5 b c e g i j d 5 24

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a 5 b c d f e 5 g h i j 24 B-Tree pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j

B-Trees Accelerate XPath Location Steps The XPath axes ancestor, descendant, preceding, following partition any XML document: a 5 b c d f e 5 g h i j 24 B-Tree pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j

Partitioned B-Trees 25

Partitioned B-Trees Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 25

Partitioned B-Trees Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 1. level (support XPath child axis) B-Tree 25

Partitioned B-Trees Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 1. level (support XPath child axis) B-Tree level = 1 level = 2 level = 3 25

Partitioned B-Trees Use low selectivity key prefixes to partition the XML nodes in a B-Tree according to various criteria: 1. level (support XPath child axis) B-Tree 2. element tag name 3. path to node ( Node GPS ) level = 1 level = 2 level = 3 Given a Pathfinder-generated workload, the index advisor of IBM DB2 V9 proposes such partitionings automatically. 25

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. c3 c4 c1 c2 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. Pathfinder XQuery Compiler c1 c3 c4 XPath/XQuery Semantics XML Encoding c2 26

Injecting More Tree Awareness In XQuery, an XPath location step originates in a sequence of context nodes. Duplicate nodes, out of order results (XPath semantics!), wasted work. Pathfinder XQuery Compiler c1 c2 c3 c4 XPath/XQuery Semantics XML Encoding staircase join 26

Staircase Join: Context Pruning XPath following axis: c3 c4 c1 c2 27

Staircase Join: Context Pruning XPath following axis: c1 c2 27

Staircase Join: Context Pruning XPath following axis: c 1 c1 c2 c 2 27

Staircase Join: Context Pruning XPath following axis: c 1 c2 c 2 Index scan yields duplicate free result in document order. 27

Staircase Join: Skipping XPath descendant axis: c3 c4 c1 c2 28

Staircase Join: Skipping XPath descendant axis: c3 c1 28

Staircase Join: Skipping XPath descendant axis: c3 c1 scan 28

Staircase Join: Skipping XPath descendant axis: c3 v c 1 v c1 scan 28

Staircase Join: Skipping XPath descendant axis: c3 v c 1 v c1 scan skip scan Index scan yields duplicate free result in document order. 28

MonetDB/XQuery 29

MonetDB/XQuery MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. Full vertical fragmentation (column store). pre post node 0 9 a 1 3 b 2 2 c 3 0 d 4 1 e 5 8 f 6 4 g 7 7 h 8 5 i 9 6 j 29

MonetDB/XQuery MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. Full vertical fragmentation (column store). pre post 0 9 1 3 2 2 3 0 4 1 5 8 6 4 7 7 8 5 9 6 pre node 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 29

MonetDB/XQuery MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. Full vertical fragmentation (column store). pre post 0 9 1 3 2 2 3 0 4 1 5 8 6 4 7 7 8 5 9 6 pre node 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 29

MonetDB/XQuery MonetDB: Extensible relational database kernel, optimized for query processing close to the CPU. Full vertical fragmentation (column store). Pathfinder + MonetDB = MonetDB/XQuery. Implementation of XQuery 1.0. Evaluates queries against GB-range XML instances in interactive time. pre post 0 9 1 3 2 2 3 0 4 1 5 8 6 4 7 7 8 5 9 6 pre node 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j + XQuery Update, XQuery RPC ( execute at uri { e } ), support for multi-dimensional XML,... 29

DB2 s XML support doesn t beat its relational self. 30

DB2 s XML support doesn t beat its relational self. Pathfinder builds on 30+ years of development of relational database technology. 30

DB2 s XML support doesn t beat its relational self. Pathfinder builds on 30+ years of development of relational database technology. Outperforms the built-in DB2 V9 purexml engine, especially for large XML input instances. 30

DB2 s XML support doesn t beat its relational self. Pathfinder builds on 30+ years of development of relational database technology. Outperforms the built-in DB2 V9 purexml engine, especially for large XML input instances. Covers other data-intensive languages: 1. SQL/XML 30

DB2 s XML support doesn t beat its relational self. Pathfinder builds on 30+ years of development of relational database technology. Outperforms the built-in DB2 V9 purexml engine, especially for large XML input instances. Covers other data-intensive languages: 1. SQL/XML 2. LINQ + other Nested Loop languages. 30

grust@in.tum.de http://www-db.in.tum.de/~grust/ 31

32