Database-Supported XML Processors Prof. Dr. Torsten Grust torsten.grust@uni-tuebingen.de Winter 2008/2009 Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 1
Part I Preliminaries
Outline of this part
1 Welcome
A Word About Myself
Torsten Grust
Originally from Hannover
1989–1994    Student of Computer Science @ TU Clausthal
1994–2004    Database Research @ U Konstanz
1999         Doctorate (Promotion)
2000         Visiting Scientist @ IBM, DB2 Everyplace
2004         Habilitation
2004–2005    Professor @ TU Clausthal
2005–2008    Professor @ TU München
since 9/2008 Professor @ U Tübingen
Web home: http://www-db.informatik.uni-tuebingen.de/
Welcome to this Course...
We will use relational database technology to develop a highly efficient, scalable processor for XML languages like XPath, XQuery, and XML Schema.
This means that
  1. you will get to know these XML technologies quite well, and
  2. you can apply and deepen your (rusty?) knowledge of RDBMSs in a challenging, unusual, and fun domain.
Relational XML Processing
XML Processors, Tree Processors
This is a course on Relational Tree Processors.
Relational Tree Encoding E
Map tree queries into relational queries over tree encodings.
[Diagram: E maps Tree to Rel; a tree query on the Tree side corresponds to a relational query on the Rel side; E^-1 maps Rel back to Tree.]
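To make the idea of a relational tree encoding concrete, here is a minimal sketch. The slide does not say which encoding E is meant, so this example assumes a preorder/postorder rank encoding; the function names `encode` and `descendants` and the sample document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Assumed encoding E: one row (pre, post, tag) per element node."""
    rows = []
    post = 0

    def visit(node):
        nonlocal post
        pre = len(rows)          # preorder rank = position in document order
        rows.append(None)        # reserve the slot for this node
        for child in node:
            visit(child)
        rows[pre] = (pre, post, node.tag)
        post += 1                # postorder rank is assigned after all children

    visit(root)
    return rows


def descendants(rows, context_pre):
    """Descendant step as a relational range predicate on pre and post."""
    c_pre, c_post, _ = rows[context_pre]
    return [r for r in rows if r[0] > c_pre and r[1] < c_post]


doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")
table = encode(doc)
below_b = [tag for _, _, tag in descendants(table, 1)]
```

In this sketch, the tree navigation "all descendants of b" turns into a plain selection over the table: a node is a descendant of the context node exactly when its preorder rank is larger and its postorder rank is smaller.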
Compiling XQuery to Relational Algebra (1)
Input: XQuery Expression
Query against an Internet auction database (think eBay):
How many auction items are listed in each of the site's [geographical] regions?
    for $r in doc("auction.xml")/site/regions/*
    return count($r//item)
Tree query: Note how this query uses the tree navigation operators / (read: child) and // (descendant) to explore the input XML document auction.xml.
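The query above can also be phrased as a relational query over an encoded document. The following is only a sketch under stated assumptions: the table `node(pre, post, parent, tag)`, the tiny sample document, and the SQL text are hypothetical and are not the output of any actual compiler; SQLite stands in for an off-the-shelf RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for auction.xml.
XML = """<site><regions>
  <africa><item/><item/></africa>
  <asia><item/></asia>
  <europe/>
</regions></site>"""


def encode(root):
    """Assumed encoding: one row (pre, post, parent, tag) per element."""
    rows = []
    post = 0

    def visit(node, parent):
        nonlocal post
        pre = len(rows)
        rows.append(None)
        for child in node:
            visit(child, pre)
        rows[pre] = (pre, post, parent, node.tag)
        post += 1

    visit(root, None)
    return rows


# Child steps become joins on parent; the descendant step becomes a
# range predicate on pre/post; count() becomes COUNT with GROUP BY.
QUERY = """
SELECT r.tag, COUNT(i.pre)
FROM node AS s
JOIN node AS g ON g.parent = s.pre AND g.tag = 'regions'
JOIN node AS r ON r.parent = g.pre
LEFT JOIN node AS i
       ON i.pre > r.pre AND i.post < r.post AND i.tag = 'item'
WHERE s.parent IS NULL AND s.tag = 'site'
GROUP BY r.pre, r.tag
ORDER BY r.pre
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (pre INTEGER PRIMARY KEY, post INTEGER, "
    "parent INTEGER, tag TEXT)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)",
                 encode(ET.fromstring(XML)))
result = conn.execute(QUERY).fetchall()
```

The LEFT JOIN keeps regions without any items, so they are reported with a count of 0, which matches what count($r//item) returns for an empty sequence.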
Compiling XQuery to Relational Algebra (2)
Compiling XQuery to Relational Algebra (3)
Output: Relational Algebra (MonetDB's Dialect)
    ...
    a0000 := a0004.reverse ().sort ().reverse ();
    a0000 := a0000.ctrefine (a0003);
    a0000 := a0000.ctrefine (a0002);
    a0000 := a0000.mark (0@0).reverse ();
    a0001 := a0000.leftjoin (a0002);
    a0005 := a0000.leftjoin (a0004);
    a0006 := a0000.leftjoin (a0003);
    ...
    a0003 := count(a0004.reverse ());
    a0007 := a0003.reverse ().mark (0@0).reverse ();
    a0008 := a0003.mark (0@0).reverse ();
    ...
    [... 429 lines in total ...]
Pathfinder
For about 6½ years now, work has been underway to design and build the purely relational XQuery processor Pathfinder.
Joint work with a couple of brilliant guys from [...]
Pathfinder generates an internal algebraic representation of XQuery expressions and then emits
  1. MIL code for consumption by MonetDB/XQuery, or
  2. SQL:1999 code to be executed by an off-the-shelf RDBMS, e.g., IBM DB2.
Pathfinder & IBM DB2 vs. 110+ MB of XML
(Pathfinder & IBM DB2 Screencast)
Hands On!
In a sense, this course is an in-depth tour of the techniques and concepts behind Pathfinder.
Because Pathfinder has been under development since 2002, the system is already usable and provides an ideal playground for us.
Available under the Mozilla OSS License:
  www.pathfinder-xquery.org
  www.monetdb-xquery.org
Source code and installers for Unix (Linux, Mac OS X) and Windows.
Please download and use it (and submit bug reports ;-)).
Further Reading Material...
... the XML standard family:
  http://www.w3.org/xml/
  (links marked with [icon] are frequently found on the slides)
  Warning: rather impenetrable at first sight!
... on XPath and XQuery:
  XQuery from the Experts, Jonathan Robie et al., Addison-Wesley, 2003, ISBN 0-321-18060-7
  The XML Query Language, Michael Brundage, Addison-Wesley, 2004, ISBN 0-321-16581-0
... various research papers on how database technology can embrace XML, XPath, and XQuery (this is a lively research area); downloadable from the course web page.
Further Reading Material
Easily digestible introductions to XML, XPath, and XQuery:
  The Annotated XML Specification
    http://www.xml.com/axml/testaxml.htm
  Chapter "XPath" of XML in a Nutshell (O'Reilly)
    http://www.oreilly.com/catalog/xmlnut2/chapter/
  XQuery: A Guided Tour
    http://www.datadirect.com/developer/xml/xquery/docs/katz c01.pdf
Organizational Matters
Sessions   Time             Location
Lecture    Thu, 13:15–14:45 Sand 6/7, small lecture hall
Tutorial   Tue, 13:15–14:45 Sand 6/7, small lecture hall (Jan Rittinger)
Homepage and course material:
  www-db.informatik.uni-tuebingen.de/teaching/ws0809/dbxml
Slides [PDF] are available for download (about one day before each session).
How Do You Benefit from this Course?
Tutorial assignments and exam problems will be very similar.
Participate actively! Tutorials start next Tuesday (October 28).
Work through the examples and run your own experiments:
  Michael Kay's Saxon (www.saxonica.com)
  Pathfinder
Pass the written exam or oral colloquium at the end of the semester.
Make use of office hours: almost any time the doors to our offices (Sand 13, B312 and B318) are open. Effectively, that is 90% of the time we are in.
Questions?
Questions...? Comments...? Suggestions...?