XML & Databases
Tutorial 11. SQL Compilation, XPath Symmetries
Christian Grün, Database & Information Systems Group, University of
Winter 2005/06
SQL Compilation

Relational Encoding:
- the table representation of pre/post-encoded XML documents makes it easy to store them in a relational, SQL-driven database system
- we have already transformed XPath location steps, e.g.
    result ← context/descendant::node()
  into simple range matches and predicate tests; a node of the table belongs to the result if
    pre(table) > pre(context) ∧ post(table) < post(context)
- these range matches can now be translated further into SQL queries:
    SELECT DISTINCT t.*
    FROM table t, context c
    WHERE t.pre > c.pre AND t.post < c.post
    ORDER BY t.pre
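A minimal sketch of this compilation step, assuming Python's built-in sqlite3 module, a small hypothetical sample document, and a table named doc instead of table (TABLE is an SQL keyword). The hypothetical helper encode() assigns pre and post ranks in one depth-first traversal and encodes element nodes only.

```python
import sqlite3
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Pre/post-encode all elements as (pre, post, par, kind, tag) rows."""
    rows, ranks = [], {"pre": 0, "post": 0}

    def visit(elem, par):
        pre = ranks["pre"]
        ranks["pre"] += 1
        slot = len(rows)
        rows.append(None)  # reserve the row; post is only known after the children
        for child in elem:
            visit(child, pre)
        post = ranks["post"]
        ranks["post"] += 1
        rows[slot] = (pre, post, par, "elem", elem.tag)

    visit(ET.fromstring(xml_text), None)  # the root has no parent (NULL)
    return rows


# Hypothetical sample document.
XML = "<a><b><toothpick/><c><toothpick/></c></b><toothpick/></a>"

con = sqlite3.connect(":memory:")
schema = "(pre INTEGER, post INTEGER, par INTEGER, kind TEXT, tag TEXT)"
con.execute("CREATE TABLE doc " + schema)  # plays the role of the slides' "table"
con.execute("CREATE TABLE context " + schema)
con.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)", encode(XML))
con.execute("INSERT INTO context SELECT * FROM doc WHERE tag = 'b'")

# result <- context/descendant::node(), compiled into a range match
descendants = con.execute("""
    SELECT DISTINCT t.* FROM doc t, context c
    WHERE t.pre > c.pre AND t.post < c.post
    ORDER BY t.pre
""").fetchall()
print(descendants)
```

With <b> as the only context node, the query returns the rows with pre values 2, 3 and 4: both toothpicks below <b> and the element <c>, but not the toothpick at pre 5, whose post rank lies outside the window.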
SQL Compilation

Relational Encoding:
- to compile every location step into a single SQL command, we store the attributes pre, post, par (parent), kind and tag in our pre/post table
- a query like
    result ← context/child::toothpick
  is thus evaluated as:
    SELECT DISTINCT t.*
    FROM table t, context c
    WHERE t.pre > c.pre AND t.post < c.post
      AND t.par = c.pre
      AND t.kind = 'elem' AND t.tag = 'toothpick'
    ORDER BY t.pre
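The child step can be tried out with sqlite3 as well. This sketch assumes a hand-written pre/post encoding of a hypothetical document and again names the table doc; it shows how the par column separates children from deeper descendants.

```python
import sqlite3

# Hypothetical pre/post encoding of
#   <a><b><toothpick/><c><toothpick/></c></b><toothpick/></a>
# with the columns pre, post, par, kind, tag (par is NULL for the root).
ROWS = [
    (0, 5, None, "elem", "a"),
    (1, 3, 0, "elem", "b"),
    (2, 0, 1, "elem", "toothpick"),
    (3, 2, 1, "elem", "c"),
    (4, 1, 3, "elem", "toothpick"),
    (5, 4, 0, "elem", "toothpick"),
]

con = sqlite3.connect(":memory:")
schema = "(pre INTEGER, post INTEGER, par INTEGER, kind TEXT, tag TEXT)"
con.execute("CREATE TABLE doc " + schema)
con.execute("CREATE TABLE context " + schema)
con.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)", ROWS)
con.execute("INSERT INTO context SELECT * FROM doc WHERE tag = 'b'")

# result <- context/child::toothpick
children = con.execute("""
    SELECT DISTINCT t.* FROM doc t, context c
    WHERE t.pre > c.pre AND t.post < c.post
      AND t.par = c.pre
      AND t.kind = 'elem' AND t.tag = 'toothpick'
    ORDER BY t.pre
""").fetchall()
print(children)  # the toothpick at pre 4 is a grandchild of <b> and is excluded
```

Strictly speaking, t.par = c.pre already implies the pre/post range test; keeping both conditions lets a range index on pre/post narrow the candidates before par is checked.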
SQL Compilation

Multiple Location Steps:
- as SQL queries can be nested arbitrarily, several location steps can be combined into one single SQL query:
    result ← context/descendant::node()/child::toothpick

    SELECT DISTINCT t2.*
    FROM (SELECT DISTINCT t1.*
          FROM context c1, table t1
          WHERE t1.pre > c1.pre AND t1.post < c1.post) c2,
         table t2
    WHERE t2.pre > c2.pre AND t2.post < c2.post
      AND t2.par = c2.pre
      AND t2.kind = 'elem' AND t2.tag = 'toothpick'
    ORDER BY t2.pre
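A sketch of the nested query in sqlite3, assuming the same hypothetical encoding as before and the root <a> as the only context node. The inner SELECT computes the descendant step and serves as the context c2 of the child step.

```python
import sqlite3

# Hypothetical pre/post encoding (pre, post, par, kind, tag) of
#   <a><b><toothpick/><c><toothpick/></c></b><toothpick/></a>
ROWS = [
    (0, 5, None, "elem", "a"),
    (1, 3, 0, "elem", "b"),
    (2, 0, 1, "elem", "toothpick"),
    (3, 2, 1, "elem", "c"),
    (4, 1, 3, "elem", "toothpick"),
    (5, 4, 0, "elem", "toothpick"),
]

con = sqlite3.connect(":memory:")
schema = "(pre INTEGER, post INTEGER, par INTEGER, kind TEXT, tag TEXT)"
con.execute("CREATE TABLE doc " + schema)
con.execute("CREATE TABLE context " + schema)
con.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)", ROWS)
con.execute("INSERT INTO context SELECT * FROM doc WHERE tag = 'a'")

# result <- context/descendant::node()/child::toothpick
result = con.execute("""
    SELECT DISTINCT t2.*
    FROM (SELECT DISTINCT t1.*
          FROM context c1, doc t1
          WHERE t1.pre > c1.pre AND t1.post < c1.post) c2,
         doc t2
    WHERE t2.pre > c2.pre AND t2.post < c2.post
      AND t2.par = c2.pre
      AND t2.kind = 'elem' AND t2.tag = 'toothpick'
    ORDER BY t2.pre
""").fetchall()
print([row[0] for row in result])
```

The toothpick at pre 5 is missing from the result: it is a child of the context node <a> itself, but <a> is not among its own descendants, so it never appears in c2.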
SQL Compilation

Window Queries:
- the relational encoding of an XML document can also be captured in so-called window queries; for a step axis::t evaluated for a context node c, a result node must fall into the following window ((x,y) denotes an open interval, a square bracket an inclusive bound, * an unbounded side; an empty cell imposes no condition):

    axis                pre        post        par    kind  tag
    child               (c.pre,*)  (*,c.post)  c.pre  elem  t
    descendant          (c.pre,*)  (*,c.post)         elem  t
    descendant-or-self  [c.pre,*)  (*,c.post]         elem  t
    following           (c.pre,*)  (c.post,*)         elem  t
    following-sibling   (c.pre,*)  (c.post,*)  c.par  elem  t
    parent              c.par      (c.post,*)         elem  t
    ancestor            (*,c.pre)  (c.post,*)         elem  t
    ancestor-or-self    (*,c.pre]  [c.post,*)         elem  t
    preceding           (*,c.pre)  (*,c.post)         elem  t
    preceding-sibling   (*,c.pre)  (*,c.post)  c.par  elem  t
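The window table translates almost mechanically into one predicate per axis. This plain-Python sketch assumes a hypothetical encoded document held in memory; the wildcard tag '*' is a convention of this sketch only, not part of the slides.

```python
from collections import namedtuple

Node = namedtuple("Node", "pre post par kind tag")

# Hypothetical pre/post encoding of
#   <a><b><toothpick/><c><toothpick/></c></b><toothpick/></a>
DOC = [
    Node(0, 5, None, "elem", "a"),
    Node(1, 3, 0, "elem", "b"),
    Node(2, 0, 1, "elem", "toothpick"),
    Node(3, 2, 1, "elem", "c"),
    Node(4, 1, 3, "elem", "toothpick"),
    Node(5, 4, 0, "elem", "toothpick"),
]

# One predicate per table row: round brackets become strict comparisons,
# square brackets (the -or-self axes) become inclusive ones.
WINDOWS = {
    "child":              lambda t, c: t.pre > c.pre and t.post < c.post and t.par == c.pre,
    "descendant":         lambda t, c: t.pre > c.pre and t.post < c.post,
    "descendant-or-self": lambda t, c: t.pre >= c.pre and t.post <= c.post,
    "following":          lambda t, c: t.pre > c.pre and t.post > c.post,
    "following-sibling":  lambda t, c: t.pre > c.pre and t.post > c.post and t.par == c.par,
    "parent":             lambda t, c: t.pre == c.par and t.post > c.post,
    "ancestor":           lambda t, c: t.pre < c.pre and t.post > c.post,
    "ancestor-or-self":   lambda t, c: t.pre <= c.pre and t.post >= c.post,
    "preceding":          lambda t, c: t.pre < c.pre and t.post < c.post,
    "preceding-sibling":  lambda t, c: t.pre < c.pre and t.post < c.post and t.par == c.par,
}


def window(axis, tag, c):
    """Evaluate axis::tag for context node c; tag '*' matches any element name."""
    inside = WINDOWS[axis]
    return [t.pre for t in DOC
            if inside(t, c) and t.kind == "elem" and (tag == "*" or t.tag == tag)]


c = DOC[3]  # the element <c>
for axis in WINDOWS:
    print(f"{axis:20} {window(axis, '*', c)}")
```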
SQL Compilation

Window Queries:
- window queries are somewhat easier to read and can be implemented as user-defined SQL functions
- the previous query, result ← context/descendant::node()/child::toothpick, can then be formulated as follows:
    SELECT DISTINCT t1.*
    FROM table t1, table t2, context c
    WHERE t1 INSIDE window(child::toothpick, t2)
      AND t2 INSIDE window(descendant::node(), c)
    ORDER BY t1.pre
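One way to emulate INSIDE window(...) is a user-defined function registered with sqlite3's Connection.create_function. The function name inside() and its argument list are hypothetical choices of this sketch, which only covers the child and descendant windows; the name test is spelled out as ordinary column comparisons.

```python
import sqlite3

# Hypothetical pre/post encoding (pre, post, par, kind, tag) of
#   <a><b><toothpick/><c><toothpick/></c></b><toothpick/></a>
ROWS = [
    (0, 5, None, "elem", "a"),
    (1, 3, 0, "elem", "b"),
    (2, 0, 1, "elem", "toothpick"),
    (3, 2, 1, "elem", "c"),
    (4, 1, 3, "elem", "toothpick"),
    (5, 4, 0, "elem", "toothpick"),
]


def inside(axis, t_pre, t_post, t_par, c_pre, c_post):
    """Hypothetical window test: 1 if node t lies in the axis window of node c."""
    if axis == "child":
        return int(c_pre < t_pre and t_post < c_post and t_par == c_pre)
    if axis == "descendant":
        return int(c_pre < t_pre and t_post < c_post)
    raise ValueError(f"axis not covered by this sketch: {axis}")


con = sqlite3.connect(":memory:")
con.create_function("inside", 6, inside)
schema = "(pre INTEGER, post INTEGER, par INTEGER, kind TEXT, tag TEXT)"
con.execute("CREATE TABLE doc " + schema)
con.execute("CREATE TABLE context " + schema)
con.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)", ROWS)
con.execute("INSERT INTO context SELECT * FROM doc WHERE tag = 'a'")

# result <- context/descendant::node()/child::toothpick as a window query
result = con.execute("""
    SELECT DISTINCT t1.*
    FROM doc t1, doc t2, context c
    WHERE inside('child', t1.pre, t1.post, t1.par, t2.pre, t2.post)
      AND t1.kind = 'elem' AND t1.tag = 'toothpick'
      AND inside('descendant', t2.pre, t2.post, t2.par, c.pre, c.post)
    ORDER BY t1.pre
""").fetchall()
print([row[0] for row in result])
```

The query reads like the window formulation, but the comparisons are now hidden inside a function call, so SQLite cannot use an index on pre or post and has to test every row combination. Expanding each window into plain column comparisons keeps the range predicates visible to the optimizer.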
XPath Symmetries

Idea:
- when a query is evaluated, efficiency might be improved by changing the execution order of its location steps
- XPath location steps can be divided into forward and reverse axes; since a SAX parser processes a document sequentially, it pays to rewrite reverse axes into forward axes
- when the content of an XML document is indexed, it makes sense to evaluate predicates first
- redundant location steps can be merged into single steps
- depending on the query implementation, some steps might be executed faster than others

Paper: Olteanu et al., Symmetry in XPath. 2002
XPath Symmetries

Full Symmetry:
    v' ∈ v/parent ⇔ v ∈ v'/child

Partial Symmetries:
    v' ∈ v/descendant ⇔ v ∈ v'/ancestor
    v' ∈ v/following ⇔ v ∈ v'/preceding

Full Root Symmetry:
    v' ∈ r/descendant-or-self ⇔ r ∈ v'/ancestor

Simplification:
    v//city ≡ v/descendant-or-self::node()/child::city ≡ v/descendant::city

[Figure: example tree with mondial, province and city nodes]
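These symmetries can be checked exhaustively on a small tree. The sketch below assumes a hypothetical mondial-like document and computes each axis from pre/post ranks and parent links; it tests the node-pair symmetries and the // simplification, not the root symmetry.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Map each element's pre rank to (post, par, tag)."""
    nodes, ranks = {}, [0, 0]  # ranks = [next pre, next post]

    def visit(elem, par):
        pre = ranks[0]
        ranks[0] += 1
        for child in elem:
            visit(child, pre)
        nodes[pre] = (ranks[1], par, elem.tag)
        ranks[1] += 1

    visit(ET.fromstring(xml_text), None)
    return nodes


# Hypothetical document modelled on the slide's example tree.
N = encode("<mondial><province><city/><city/></province>"
           "<province><city/></province></mondial>")


def post(v):
    return N[v][0]


def par(v):
    return N[v][1]


AXES = {
    "child": lambda v: {u for u in N if par(u) == v},
    "parent": lambda v: {u for u in N if u == par(v)},
    "descendant": lambda v: {u for u in N if u > v and post(u) < post(v)},
    "ancestor": lambda v: {u for u in N if u < v and post(u) > post(v)},
    "following": lambda v: {u for u in N if u > v and post(u) > post(v)},
    "preceding": lambda v: {u for u in N if u < v and post(u) < post(v)},
}


def symmetric(axis, inverse):
    """True if v' in v/axis <=> v in v'/inverse holds for all node pairs."""
    return all((w in AXES[axis](v)) == (v in AXES[inverse](w))
               for v in N for w in N)


def dos_child_city(v):
    """v//city, spelled out as v/descendant-or-self::node()/child::city."""
    dos = {v} | AXES["descendant"](v)
    return {u for d in dos for u in AXES["child"](d) if N[u][2] == "city"}


def desc_city(v):
    """v/descendant::city"""
    return {u for u in AXES["descendant"](v) if N[u][2] == "city"}


PAIRS = [("parent", "child"), ("descendant", "ancestor"), ("following", "preceding")]
print({f"{a}/{b}": symmetric(a, b) for a, b in PAIRS})
print(all(dos_child_city(v) == desc_city(v) for v in N))
```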
XPath Symmetries

Multiple Location Steps:
    v' ∈ v/child/child ⇔ v ∈ v'/parent/parent

Simplified Step:
    v' ∈ v/child/parent ⇒ v ∈ v'/self

- warning: a location step can lead to an empty result set; for a node v without children,
    v/child = ()
    v/child/parent = ()
  whereas v/self = v

Using predicates:
- by introducing predicates, we can guarantee the equivalence:
    v' ∈ v/child/parent ⇔ v' ∈ v/self[child]

[Figure: example tree with mondial, province and city nodes]
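The warning and its fix can be observed directly. This sketch assumes a hypothetical three-level document and represents every node by its pre rank; step() applies an axis to a whole context set, as an XPath step does.

```python
import xml.etree.ElementTree as ET


def parents(xml_text):
    """Map each element's pre rank to its parent's pre rank (None for the root)."""
    par, counter = {}, [0]

    def visit(elem, parent_pre):
        pre = counter[0]
        counter[0] += 1
        par[pre] = parent_pre
        for child in elem:
            visit(child, pre)

    visit(ET.fromstring(xml_text), None)
    return par


# Hypothetical document modelled on the slide's example tree.
PAR = parents("<mondial><province><city/><city/></province></mondial>")


def child(v):
    return {u for u, p in PAR.items() if p == v}


def parent(v):
    return set() if PAR[v] is None else {PAR[v]}


def step(context, axis):
    """Apply an axis to every node of a context set and union the results."""
    return {u for v in context for u in axis(v)}


def child_parent(v):   # v/child/parent
    return step(step({v}, child), parent)


def self_if_child(v):  # v/self[child]
    return {v} if child(v) else set()


# v' in v/child/child  <=>  v in v'/parent/parent, for all node pairs
grand = all((w in step(step({v}, child), child)) == (v in step(step({w}, parent), parent))
            for v in PAR for w in PAR)

leaf = 2  # the first <city>, which has no children
print(step({leaf}, child), child_parent(leaf), {leaf})  # v/child, v/child/parent, v/self
print(all(child_parent(v) == self_if_child(v) for v in PAR), grand)
```

For the leaf, v/child and v/child/parent are both empty while v/self is not, whereas v/self[child] agrees with v/child/parent on every node.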
XPath Symmetries

Predicates:
- predicates are also helpful as they merely filter our context set instead of replacing it with a new one; compare
    province/child = city
  with
    province[child] = province

Other Symmetries:
    child::city/parent::province ≡ self::province[child::city]
    descendant::city/parent::province ≡ descendant-or-self::province[child::city]
    /descendant::/preceding::province ≡ /descendant::province[following::]

[Figure: example tree with mondial, province and city nodes]
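The three rewrite rules can be tested on a hypothetical document in which the second province has no city children. The slide's third rule leaves its name test open; since the rule holds for any name test, the sketch simply tries every tag that occurs in the document.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Map each element's pre rank to (post, par, tag)."""
    nodes, ranks = {}, [0, 0]  # ranks = [next pre, next post]

    def visit(elem, par):
        pre = ranks[0]
        ranks[0] += 1
        for child in elem:
            visit(child, pre)
        nodes[pre] = (ranks[1], par, elem.tag)
        ranks[1] += 1

    visit(ET.fromstring(xml_text), None)
    return nodes


# Hypothetical document: the second province has no city children.
N = encode("<mondial><province><city/><city/></province>"
           "<province><name/></province></mondial>")

AXES = {
    "self": lambda v: {v},
    "child": lambda v: {u for u in N if N[u][1] == v},
    "parent": lambda v: {u for u in N if u == N[v][1]},
    "descendant": lambda v: {u for u in N if u > v and N[u][0] < N[v][0]},
    "descendant-or-self": lambda v: {u for u in N if u >= v and N[u][0] <= N[v][0]},
    "following": lambda v: {u for u in N if u > v and N[u][0] > N[v][0]},
    "preceding": lambda v: {u for u in N if u < v and N[u][0] < N[v][0]},
}


def step(context, axis, tag):
    """context/axis::tag: union of the axis over all context nodes, filtered by name."""
    return {u for v in context for u in AXES[axis](v) if N[u][2] == tag}


def where(context, axis, tag):
    """context[axis::tag]: a predicate only filters the context set."""
    return {v for v in context if step({v}, axis, tag)}


def named(tag):
    """/descendant::tag, i.e. every element with that name."""
    return {u for u in N if N[u][2] == tag}


TAGS = {tag for _, _, tag in N.values()}
rule1 = all(step(step({v}, "child", "city"), "parent", "province")
            == where(step({v}, "self", "province"), "child", "city") for v in N)
rule2 = all(step(step({v}, "descendant", "city"), "parent", "province")
            == where(step({v}, "descendant-or-self", "province"), "child", "city")
            for v in N)
rule3 = all(step(named(x), "preceding", "province")
            == where(named("province"), "following", x) for x in TAGS)

print(step({1}, "child", "city"), where({1}, "child", "city"))  # province/child vs province[child]
print(rule1, rule2, rule3)
```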
XPath Symmetries

Bottom-Up Approach:
- predicates are often used in XPath to match text or attribute nodes; a conventional top-down approach creates many context nodes that are dropped at the end:
    /mondial///city[/text() = "Rome"]
- if all context nodes are stored in an index, we can go the other way round and evaluate the leaf nodes first:
    //text()[. = "Rome"]/parent::/parent::city[ancestor::/parent::mondial]
- note that the parent axis is very efficient when the parent is stored as an attribute (such as the par column)!

[Figure: example tree with mondial, province and city nodes and the text leaves "Parma" and "Rome"]
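A rough comparison of both strategies, assuming a hypothetical document in which each city has a name child; the element names province and name stand in for the name tests that the slide leaves open. The text index and the parent dictionary are hypothetical helpers. ElementTree stores text on its element, so the text() step and its first parent step collapse into one lookup here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document modelled on the slide's example tree.
XML = """<mondial>
  <province><city><name>Parma</name></city><city><name>Rome</name></city></province>
  <province><city><name>Milan</name></city></province>
</mondial>"""
root = ET.fromstring(XML)

# Parent "column": every parent step is a single dictionary lookup.
parent = {child: elem for elem in root.iter() for child in elem}

# Hypothetical text index: text value -> elements carrying that text.
text_index = defaultdict(list)
for elem in root.iter():
    if elem.text and elem.text.strip():
        text_index[elem.text.strip()].append(elem)


def top_down(value):
    """/mondial/province/city[name/text() = value], expanding every level."""
    touched, result = 0, []
    for province in root.findall("province"):
        touched += 1
        for city in province.findall("city"):
            touched += 1  # most of these cities are dropped by the predicate
            if any(name.text == value for name in city.findall("name")):
                result.append(city)
    return result, touched


def bottom_up(value):
    """//text()[. = value]/parent::name/parent::city[parent::province/parent::mondial]"""
    touched, result = 0, []
    for name in text_index.get(value, []):
        touched += 1
        city = parent.get(name)
        province = parent.get(city) if city is not None else None
        if (name.tag == "name" and city is not None and city.tag == "city"
                and province is not None and province.tag == "province"
                and parent.get(province) is root and root.tag == "mondial"):
            result.append(city)
    return result, touched


print(top_down("Rome")[1], bottom_up("Rome")[1])  # context nodes touched
```

Both functions return the same city; the top-down walk touches every province and city, while the bottom-up variant starts from the single indexed text match and climbs a few parent links.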