Databases and Information Systems

Storage models for XML trees in small main memory devices

Long term goals:
    reduce memory
    compression (?)
    still query efficiently
    small data structures

Databases and Information Systems SS 0 07 Prof. Dr. Stefan Böttcher Storage models for XML trees /

Storage models for XML trees (1): binary tables for binary trees based on label, firstchild (fc) and nextsibling (ns)

    <> <></> <>pc00</> </>

    Three tables:
        label(ID, Label)    columns: ID, Label(ID)
        fc(pID, fcID)       columns: ID, fc(ID)
        ns(ID, nsID)        columns: ID, ns(ID)
Storage models for XML trees (2): a single table for binary trees based on label, firstchild (fc) and nextsibling (ns)

    <> <></> <>pc00</> </>

    Table columns: ID, Label(ID), fc(ID), ns(ID)

Storage models for XML trees (3): a single table for unranked trees based on label, firstchild (fc) and nextsibling (ns)

    <> <></> <>pc00</> </>

    Table columns: ID, Label(ID), p(ID), sibling
Ex.: Compute needed storage for XML trees

    (1) three tables, binary tree:     ID, Label(ID) / ID, fc(ID) / ID, ns(ID)
    (2) single table, binary tree:     ID, Label(ID), fc(ID), ns(ID)
    (3) single table, unranked tree:   ID, Label(ID), p(ID), sibling

Assume: … byte per ID, fc(ID), ns(ID) and … bytes on average per Label.
1. compute sizes for (1), (2) and (3).
2. develop general formulas for binary XML tree with N inner nodes:
    a) how many leaf nodes?
    b) formulas for (1) and for (2).
3. How can we store an unranked tree in (more) tables?

Using arrays to store XML trees

    (1) three tables, binary tree:     ID, Label(ID) / ID, fc(ID) / ID, ns(ID)
    (2) single table, binary tree:     ID, Label(ID), fc(ID), ns(ID)
    (3) single table, unranked tree:   ID, Label(ID), p(ID), sibling

Use ID as array index -> 1st column is not needed
how to reduce high ID values?
    use relative IDs / array indices (array index difference) instead of absolute IDs
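The idea of using the ID as array index and storing index differences instead of absolute IDs can be sketched as follows. This is only an illustration under assumptions: the class name RelativeIndexTree, the one-byte offset width and the convention "offset 0 means no such node" are not from the slides. It relies on nodes being stored in document order, so that a first child or next sibling always has a larger index than the node itself.

```java
// Sketch only: names, the byte-wide offsets and the "0 = none" convention are assumptions.
final class RelativeIndexTree {
    // Nodes are stored in document order; the array index serves as the node ID.
    private final String[] label;
    // Offset from a node to its first child / next sibling; 0 means "none".
    private final byte[] fcOffset;
    private final byte[] nsOffset;

    // fc and ns hold absolute target indices, -1 meaning "no such node".
    RelativeIndexTree(String[] label, int[] fc, int[] ns) {
        this.label = label.clone();
        this.fcOffset = toOffsets(fc);
        this.nsOffset = toOffsets(ns);
    }

    // Converts absolute target indices into offsets relative to the row index.
    private static byte[] toOffsets(int[] absolute) {
        byte[] offsets = new byte[absolute.length];
        for (int i = 0; i < absolute.length; i++) {
            if (absolute[i] < 0) {
                continue; // offset stays 0 = none
            }
            int diff = absolute[i] - i;
            if (diff <= 0 || diff > Byte.MAX_VALUE) {
                throw new IllegalArgumentException("offset out of range at index " + i);
            }
            offsets[i] = (byte) diff;
        }
        return offsets;
    }

    String label(int id) { return label[id]; }

    int firstChild(int id) { return fcOffset[id] == 0 ? -1 : id + fcOffset[id]; }

    int nextSibling(int id) { return nsOffset[id] == 0 ? -1 : id + nsOffset[id]; }

    public static void main(String[] args) {
        // <a><b><c/></b><d/></a> in document order: a=0, b=1, c=2, d=3
        RelativeIndexTree t = new RelativeIndexTree(
                new String[] {"a", "b", "c", "d"},
                new int[] {1, 2, -1, -1},
                new int[] {-1, 3, -1, -1});
        System.out.println(t.label(t.nextSibling(t.firstChild(0)))); // prints d
    }
}
```

With document order, a first-child offset is always 0 or 1, which hints at why a single bit can replace the fc column.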
How to treat different node types?

escape text nodes:
    <E>"text"</E>  ->  <E> <=text> </=text> </E>
escape attribute nodes:
    <E a="value"></E>  ->  <E> <@a> <=value> </=value> </@a> </E>
+ escape node, comments, PIs
-> only elements remain

How to transform XML into a binary XML tree

1. Simplify -> single node type (element nodes) only
2. generate binary tree
3. store binary tree

[Figure: tree with element nodes E1 to E8]
How to transform XML into a binary XML tree

1. Simplify -> single node type (element nodes) only
2. generate binary tree
3. store binary tree

[Figure: the tree with element nodes E1 to E8 drawn as a binary tree, edges labelled fc and ns]

Generating and storing a binary XML tree

XML file
-> SAX events
-> Simplified SAX events
-> Binary simplified SAX events
-> list storing binary simplified SAX events
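Step 2, generating the binary tree, can be illustrated on an in-memory tree. The sketch below is an assumption-laden illustration, not the lecture's implementation: the classes Node and BinNode and the method toBinary are hypothetical names. The first child of a node becomes its fc successor, and every further child becomes the ns successor of the child before it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Node, BinNode and toBinary are hypothetical names.
final class FcNsSketch {
    // Unranked input node: a label and any number of children.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Binary node: at most one first child (fc) and one next sibling (ns).
    static final class BinNode {
        final String label;
        BinNode fc;
        BinNode ns;

        BinNode(String label) { this.label = label; }
    }

    // Recursively rewrites an unranked tree into its fc/ns binary form.
    static BinNode toBinary(Node n) {
        BinNode b = new BinNode(n.label);
        BinNode previous = null;
        for (Node child : n.children) {
            BinNode c = toBinary(child);
            if (previous == null) {
                b.fc = c;        // first child hangs below its parent
            } else {
                previous.ns = c; // later children hang off the previous sibling
            }
            previous = c;
        }
        return b;
    }

    public static void main(String[] args) {
        Node tree = new Node("E1", new Node("E2", new Node("E3")), new Node("E4"));
        BinNode root = toBinary(tree);
        System.out.println(root.fc.label + " " + root.fc.ns.label + " " + root.fc.fc.label);
    }
}
```

The recursion is chosen for brevity; on a small device a SAX-driven, stack-free construction would avoid building the unranked tree first.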
Simple API for XML (SAX)

    <doc>
        <customer name="Alice">
            <order> ... </order>
            <address> </address>
        </customer>
        <customer>
            <order/>
            <address/>
        </customer>
    </doc>

[Figure: document tree with root doc and two customer children (the first with attribute name = Alice), each with an order and an address child; numbers mark the order in which the parser visits the nodes]

Parser accesses at most one XML element node at a time:
    -> can navigate and process nodes only in document order
    - less flexible programming than DOM
    + need less space in main memory
    + loading document nodes into main memory is fast

SAXParserJavaAPI (1)

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.XMLReader;

    // generate JAXP SAXParserFactory
    SAXParserFactory spf = SAXParserFactory.newInstance();
    // set namespaceAware to true
    spf.setNamespaceAware(true);
    // generate JAXP SAXParser
    SAXParser saxParser = spf.newSAXParser();
    // get handle to the embedded SAX XMLReader
    XMLReader xmlReader = saxParser.getXMLReader();
    // generate new SAX output stream for ContentHandler of XMLReader
    xmlReader.setContentHandler(new SAXOut());
    // set up ErrorHandler before parsing starts
    xmlReader.setErrorHandler(new MyErrorHandler(System.err));
    // parse the XML file using the XMLReader
    xmlReader.parse("file:" + filename);
SAXParserJavaAPI (2)

    // SAX parser calls this once, when parsing the document starts
    public void startDocument() throws SAXException { }

    // SAX parser calls this once for each start tag of an element
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) throws SAXException {
        // code example: output attribute name and attribute value
        for (int i = 0; i < atts.getLength(); i++) {   // for each attribute
            out.println(atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
        }
    }

    // SAX parser calls this once for each end tag of an element
    public void endElement(String namespaceURI, String localName,
                           String qName) throws SAXException { }

    // SAX parser calls this once when end of document is reached
    public void endDocument() throws SAXException { }

SAXParserJavaAPI (3)

    // SAX parser calls this once
    // for each text found in the XML document
    public void characters(char[] ch, int start, int length) throws SAXException {
        String text = new String(ch, start, length);
        text = text.trim();
    }
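The handler methods above can be combined with the escaping rules for text and attribute nodes to produce simplified SAX events, in which only element events remain. The following is a minimal sketch under assumptions: the class name SimplifiedSaxSketch, the string encoding of events ("start x" / "end x") and the helper simplify are hypothetical. It leaves the parser at its default (not namespace-aware) so that qName is always filled in, and it buffers text because a parser may deliver one text node in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: class name, event encoding and simplify() are assumptions.
final class SimplifiedSaxSketch extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Emits a start/end pair for a leaf element with the given label.
    private void leaf(String label) {
        events.add("start " + label);
        events.add("end " + label);
    }

    // Turns buffered, trimmed text into an escaped "=text" element.
    private void flushText() {
        String t = text.toString().trim();
        text.setLength(0);
        if (!t.isEmpty()) {
            leaf("=" + t);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            // attribute a="value" becomes <@a><=value></=value></@a>
            events.add("start @" + atts.getQName(i));
            leaf("=" + atts.getValue(i));
            events.add("end @" + atts.getQName(i));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses an XML string and returns its simplified event list.
    static List<String> simplify(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            SimplifiedSaxSketch handler = new SimplifiedSaxSketch();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        for (String event : simplify("<E a=\"v\">text</E>")) {
            System.out.println(event);
        }
    }
}
```

Comments and processing instructions are simply ignored here, since DefaultHandler discards them.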
From SAX events to binary SAX events

Pairs of SAX events                Location step      Generate node
endElement(_), startElement(a)     nextsibling::a     nextsibling : a
startElement(_), startElement(a)   firstchild::a      firstchild : a
endElement(_), endElement(_)       parent::*          (nothing) go back to parent
startElement(_), endElement(_)     no location step   (nothing)

Summary: Storage of XML trees

different storage models
binary tree can be efficiently stored
different implementations: multiple tables, single table
we use single table because of further compression steps
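The table of event pairs can be read as a small state machine over consecutive events. A minimal sketch, assuming events are encoded as the strings "start x" and "end x" (an encoding chosen here for illustration; the class BinarySaxSketch and the method toSteps are hypothetical names). The very first start event has no predecessor, so the root node has to be generated separately.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the string encoding of events and all names are assumptions.
final class BinarySaxSketch {
    private static final String START = "start ";

    // Maps each pair of consecutive events to the location step from the table.
    static List<String> toSteps(List<String> events) {
        List<String> steps = new ArrayList<>();
        String previous = null;
        for (String event : events) {
            if (previous != null) {
                boolean prevIsStart = previous.startsWith(START);
                if (event.startsWith(START)) {
                    String label = event.substring(START.length());
                    // start after start: first child; start after end: next sibling
                    steps.add((prevIsStart ? "firstchild::" : "nextsibling::") + label);
                } else if (!prevIsStart) {
                    // end after end: nothing is generated, go back to the parent
                    steps.add("parent::*");
                }
                // end after start: leaf node, no location step
            }
            previous = event;
        }
        return steps;
    }

    public static void main(String[] args) {
        // events of <r><A><K/></A><P/></r>
        List<String> events = List.of("start r", "start A", "start K", "end K",
                "end A", "start P", "end P", "end r");
        System.out.println(toSteps(events));
    }
}
```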
Succinct storage of XML trees (1)

(2) single table binary tree: ID, Label(ID), fc(ID), ns(ID)

How can we avoid pointers?
    use array instead of table -> avoids first column
    use bits denoting existence of fc and ns -> avoids fc and ns columns, but requires bits

Succinct storage of XML trees (2)

use bits denoting existence of fc and ns -> avoid pointers

(2) single table binary tree: ID, Label(ID), fc(ID), ns(ID)

tags      <r>  <A>  <K>  <=M>  </=M>  </K>  <P>  <=p>  </=p>  </P>  </A>  </r>
bits       1    1    1    1     0      0     1    1     0      0     0     0
node IDs   1    2    3    4                  5    6
Succinct storage of XML trees (3)

use bits denoting existence of fc and ns -> avoid pointers

tags      <r>  <A>  <K>  <=M>  </=M>  </K>  <P>  <=p>  </=p>  </P>  </A>  </r>
bits       1    1    1    1     0      0     1    1     0      0     0     0
node IDs   1    2    3    4                  5    6

Label table: ID, Label(ID)

Succinct storage of XML trees (4)

Exercise
1. How can we support navigation via firstchild (fc) and nextsibling (ns)?
2. How can we compress further without disabling navigation?
Succinct storage of XML trees (5)

1. How can we support navigation via firstchild (fc) and nextsibling (ns)?
2. How can we compress further without disabling navigation?

tags      <r>  <A>  <K>  <=M>  </=M>  </K>  <P>  <=p>  </=p>  </P>  </A>  </r>
bits       1    1    1    1     0      0     1    1     0      0     0     0
node IDs   1    2    3    4                  5    6

Label table: ID, Label(ID)

1. store actual position
    fc: look at next bit; = 1 -> fc exists
    ns: close following subtree, i.e. number of 1s = number of 0s, and look at next bit; = 1 -> ns exists
2. count 1s until actual position -> node ID

Succinct storage of XML trees (6)

Succinct representation of IDs in table (ID, Label(ID)) -> TID concept

Table: length | Label(ID)
zip packages
packages' size e.g. 0?
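The navigation rules on the bit string (next bit for fc, skip the balanced subtree for ns, count 1s for the node ID) can be sketched as follows. This is an illustration under assumptions: the bit string is held as a Java String of '0' and '1' characters for readability rather than as packed bits, a position always points at the 1-bit of a start tag, and the class and method names are hypothetical. The example bits 111100110000 encode the tag sequence of the r/A/K/=M/P/=p document with 1 for a start tag and 0 for an end tag.

```java
// Sketch only: names and the String-based bit representation are assumptions.
final class SuccinctNavSketch {
    // fc: the node has a first child iff the bit right after its start bit is 1.
    static int firstChild(String bits, int pos) {
        int next = pos + 1;
        return next < bits.length() && bits.charAt(next) == '1' ? next : -1;
    }

    // ns: skip the subtree starting at pos (until 1s and 0s balance),
    // then the node has a next sibling iff the following bit is 1.
    static int nextSibling(String bits, int pos) {
        int open = 0;
        for (int i = pos; i < bits.length(); i++) {
            open += bits.charAt(i) == '1' ? 1 : -1;
            if (open == 0) {
                int next = i + 1;
                return next < bits.length() && bits.charAt(next) == '1' ? next : -1;
            }
        }
        return -1;
    }

    // Node ID = number of 1s up to and including the current position.
    static int nodeId(String bits, int pos) {
        int ones = 0;
        for (int i = 0; i <= pos; i++) {
            if (bits.charAt(i) == '1') {
                ones++;
            }
        }
        return ones;
    }

    public static void main(String[] args) {
        String bits = "111100110000";
        String[] label = {"r", "A", "K", "=M", "P", "=p"}; // label[ID - 1]
        int k = firstChild(bits, firstChild(bits, 0));      // r -> A -> K
        int p = nextSibling(bits, k);                       // K -> P
        System.out.println(label[nodeId(bits, p) - 1]);     // prints P
    }
}
```

Counting 1s by a linear scan is the simplest choice; precomputed rank tables would speed this up at the cost of extra space.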
Succinct storage of XML trees (7)

Succinct representation of IDs in table (ID, Label(ID)) -> TID concept

tags      <r>  <A>  <K>  <=M>  </=M>  </K>  <P>  <=p>  </=p>  </P>  </A>  </r>
bits       1    1    1    1     0      0     1    1     0      0     0     0
node IDs   1    2    3    4                  5    6

Table: length | Label(ID)
zip packages
packages' size e.g. 0?
store / search only the IDs that start a new package (…, …, 7)

Tuple identifier (TID) concept for strings

zip packages
packages' size e.g. 0
    +7++ <=0 Byte
    +++ <=0 Byte

Table: length | Label(ID)
store only IDs that start a new package
Table: ID | String that starts a new package

improvement (?): relative addresses (= package lengths)
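The package idea can be sketched as follows: labels are written as length-prefixed strings into packages of bounded size, and only the ID of the first label of each package is kept. Everything concrete here is an assumption made for illustration: the class name LabelPackagesSketch, the one-byte length prefix, the package limit of 3 bytes in the example, and the fact that the zip step is left out entirely. Looking up Label(ID) then means finding the package by its first ID and skipping over the preceding labels inside it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: names, the 1-byte length prefix and the size limit are assumptions;
// compressing (zipping) each package is omitted.
final class LabelPackagesSketch {
    private final List<byte[]> packages = new ArrayList<>();
    private final List<Integer> firstId = new ArrayList<>(); // ID that starts each package
    private final int count;

    // Labels get the IDs 1, 2, 3, ...; label bytes per package total at most maxBytes.
    LabelPackagesSketch(String[] labels, int maxBytes) {
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int used = 0;
        for (int i = 0; i < labels.length; i++) {
            byte[] b = labels[i].getBytes(StandardCharsets.UTF_8);
            if (b.length > 255) {
                throw new IllegalArgumentException("label too long for 1-byte length");
            }
            if (current.size() > 0 && used + b.length > maxBytes) {
                packages.add(current.toByteArray()); // close the full package
                current.reset();
                used = 0;
            }
            if (current.size() == 0) {
                firstId.add(i + 1); // this ID starts a new package
            }
            current.write(b.length);
            current.write(b, 0, b.length);
            used += b.length;
        }
        if (current.size() > 0) {
            packages.add(current.toByteArray());
        }
        count = labels.length;
    }

    List<Integer> packageStartIds() { return firstId; }

    String label(int id) {
        if (id < 1 || id > count) {
            throw new IllegalArgumentException("unknown ID " + id);
        }
        int p = 0; // last package whose first ID is <= id
        while (p + 1 < firstId.size() && firstId.get(p + 1) <= id) {
            p++;
        }
        byte[] pack = packages.get(p);
        int pos = 0;
        for (int skip = id - firstId.get(p); skip > 0; skip--) {
            pos += 1 + (pack[pos] & 0xFF); // jump over length byte and label
        }
        return new String(pack, pos + 1, pack[pos] & 0xFF, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        LabelPackagesSketch s = new LabelPackagesSketch(
                new String[] {"r", "A", "K", "=M", "P", "=p"}, 3);
        System.out.println(s.packageStartIds() + " " + s.label(5));
    }
}
```

Storing package lengths instead of first IDs, as the slide's "relative addresses" remark suggests, would be a variation of the same index.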
Generating and storing a binary XML tree

XML file
-> SAX events
-> Simplified SAX events
-> Binary simplified SAX events
-> list storing binary simplified SAX events
-> binary DAG of binary simplified SAX events
-> grammar of simplified DAG events
-> succinct representation of grammar