XML Databases: 10. XML Storage 1, Overview




Silke Eckstein, Andreas Kupfer
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

10. XML Storage 1
10.1 Motivation
10.2 Text-based storage
  10.2.1 Index structures
10.3 Model-based storage
10.4 Schema-based storage
10.5 Conclusion
10.6 Overview and References

10.1 Motivation
Applications require different types of XML documents:
- structure vs. content
- regular vs. irregular
Thus, XML documents are data-centric, document-centric, or somewhere in between.
Questions:
- How are XML documents stored?
- How are queries on the stored documents or data processed efficiently?
There are several storage methods.
- 1st goal: learn and understand the methods
- 2nd goal: classify the methods by principles, advantages and disadvantages, and usage

10.1 Motivation
Characterisation of XML documents:
- Data-centric documents: structured, regular; e.g. product catalog, order, invoice
- Document-centric documents: unstructured, irregular; e.g. scientific article, book, email, web page
- Semi-structured documents: contain data-centric and document-centric parts; e.g. publications, Amazon, MS Press (example chapters)

10.1 Motivation
Requirements for the physical layer:
- Order-preserving and lossless storage of XML documents
- Efficient access to XML documents or parts thereof
- Quick response time for queries and update operations
- Indexing
- Transaction processing
- Support of XPath and XQuery
- Support of SAX and DOM for applications

10.1 Motivation
Storage approaches for XML documents:
- Text-based: storage as character data
- Model-based: generic storage of the graph structure; storage of the DOM
- Schema-based: mapping to (object-)relational databases, either by deriving the database schema from the XML structure or by user-defined mapping procedures


10.2 Text-based storage
The whole XML document text is stored as character data:
- as a file in the file system, or
- as a CLOB (Character Large OBject) in the database system
Operations on documents as a whole, i.e. reading and writing the whole document, are very efficient.
But the content is monolithic and opaque to the relational query engine: a query cannot inspect a fragment.
Granular access requires additional support:
- full text index
- path index
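A minimal sketch of the trade-off described above, assuming SQLite's TEXT column as a stand-in for a CLOB and a made-up catalog document: the whole document round-trips unchanged in one read, but any question about a fragment forces the application to parse the complete string first.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table: one row per document, the full text in one TEXT column
# (SQLite's stand-in for a CLOB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

doc = """<catalog>
  <book><title>XML Storage</title><price>25.00</price></book>
  <book><title>Databases</title><price>9.50</price></book>
</catalog>"""
conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (1, doc))

# Whole-document access: one read, no parsing, the text comes back unchanged.
(body,) = conn.execute("SELECT body FROM docs WHERE id = 1").fetchone()

def cheap_titles(text, max_price):
    """Fragment access: the engine only sees a string, so the application
    has to parse the complete text before it can look at any <book>."""
    root = ET.fromstring(text)
    return [b.findtext("title") for b in root.iter("book")
            if float(b.findtext("price")) < max_price]

print(body == doc)               # True
print(cheap_titles(body, 10.0))  # ['Databases']
```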

10.2.1 Index structures
Index structures for XML documents allow efficient access for specific queries.
- Different types of indexes are optimized for different types of queries.
- Indexes introduce redundancy: each index has to be kept up to date by propagating data changes.
- Index structures can serve as storage structures as well; in that case they define the storage method.

10.2.1 Index structures
Types of index structures:
- Value index: indexes atomic values of an XML document, like element content or attribute values. Index format for structured parts of XML documents. Already known from databases (B-trees, hash indexes, ...).
- Full text index: indexes single words of the full text. Index format for unstructured parts of XML documents. Already known from information retrieval (inverted lists, tries, suffix trees, ...).
- Path index: indexes subtrees/paths in an XML document. Index format for semi-structured parts of XML documents. Already known from object databases (access support relations, ...).

10.2.1 Index structures
Figure: B-tree as value index for an XML document fragment [Tür08]
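To make the value index concrete, here is a small sketch under clear simplifications: a sorted Python list searched with `bisect` stands in for the B-tree (same ordered-key lookup, no pages or node splits), and the indexed values are the `price` contents of a hypothetical catalog.

```python
import bisect
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    "<book id='b1'><price>25.00</price></book>"
    "<book id='b2'><price>9.50</price></book>"
    "<book id='b3'><price>12.00</price></book>"
    "</catalog>")

# Value index entries: (typed atomic value, owning book id), sorted by value.
entries = sorted((float(b.findtext("price")), b.get("id")) for b in doc.iter("book"))
keys = [value for value, _ in entries]

def price_range(lo, hi):
    """Ids of books with lo <= price <= hi, answered from the index alone."""
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return [book_id for _, book_id in entries[start:stop]]

print(price_range(10.0, 30.0))  # ['b3', 'b1']
```

Converting the content to `float` before indexing matters: compared as strings, "9.50" would sort after "25.00".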

10.2.1 Index structures
Full text index:
- Not limited to exact matches: keyword-based search and Boolean retrieval, pattern search (with regular expressions)
- Statistical, word-based methods: stop word removal, elimination of uncommon items
- Linguistic methods: normalization of words (e.g. capitalisation, hyphenation, ...), word decomposition by rules (English) or dictionaries (German), stemming
- Knowledge-based methods: use of ontologies and thesauri to search for synonyms, hypernyms and hyponyms
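The statistical and linguistic steps above are usually applied as a pipeline before words enter the index. The sketch below is only illustrative: the stop word list and suffix rules are hypothetical and tiny, and the suffix stripping is a crude stand-in for a real stemmer such as Porter's.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for"}  # hypothetical list
SUFFIXES = ("ing", "es", "s")  # crude suffix stripping, not a real stemmer

def normalize(text):
    """Tokenize, normalize capitalisation, drop stop words, strip suffixes."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):  # also splits on hyphens
        if token in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # keep at least three characters so short words are not mangled
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms

print(normalize("Storing and Querying XML Documents in the Database"))
# ['stor', 'query', 'xml', 'document', 'database']
```

The stem "stor" shows why real systems use dictionaries or proper stemming algorithms instead of blind suffix rules.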

10.2.1 Index structures
Figure: inverted list as full text index for XML, mapping each word to its occurrences (word position in the text) [Tür08]

10.2.1 Index structures
Figure: word occurrences [Tür08]
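A minimal inverted list in the spirit of the figures, assuming that elements are numbered in document order (a hypothetical numbering scheme) and that only each element's own text is indexed. Each word maps to its (node number, word position) occurrences, which is enough for Boolean AND retrieval.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring(
    "<library>"
    "<book><title>XML and databases</title></book>"
    "<book><title>Storing XML in relational databases</title></book>"
    "</library>")

# word -> list of (node number, word position within that node's text)
inverted = defaultdict(list)
for node_no, elem in enumerate(doc.iter()):
    words = re.findall(r"\w+", (elem.text or "").lower())
    for pos, word in enumerate(words):
        inverted[word].append((node_no, pos))

def nodes_with_all(*words):
    """Boolean AND retrieval: node numbers whose text contains every word."""
    sets = [{n for n, _ in inverted.get(w, [])} for w in words]
    return sorted(set.intersection(*sets)) if sets else []

print(inverted["xml"])                      # [(2, 0), (4, 1)]
print(nodes_with_all("xml", "relational"))  # [4]
```

Keeping the positions (not just node numbers) is what would allow phrase or proximity queries later.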

10.2.1 Index structures
Path index:
- Structure information must be identifiable and reconstructable: the index assigns the markup to the content and represents the hierarchical nesting and order of elements/attributes.
- Especially suited for keyword search with regard to structure, or for path expressions, e.g.:
  for $b in //book where contains($b/author, "Benjamin") return $b
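One simple form of a path index keys every element by its root-to-node label path. In the sketch below (sample data and helper names are hypothetical), `//book` is answered by scanning the index keys rather than walking the document, and the `contains` filter of the query above is then applied to the hits.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring(
    "<library>"
    "<book><author>Walter Benjamin</author><title>Illuminations</title></book>"
    "<shelf><book><author>Ada Lovelace</author><title>Notes</title></book></shelf>"
    "</library>")

# Path index: label path such as '/library/shelf/book' -> elements on that path.
path_index = defaultdict(list)

def build(elem, prefix):
    path = prefix + "/" + elem.tag
    path_index[path].append(elem)
    for child in elem:
        build(child, path)

build(doc, "")

def descendant(tag):
    """Answer //tag from the index: every path whose last step is tag."""
    return [e for path, elems in path_index.items()
            if path.rsplit("/", 1)[-1] == tag for e in elems]

# Roughly: for $b in //book where contains($b/author, "Benjamin") return $b
hits = [b for b in descendant("book") if "Benjamin" in (b.findtext("author") or "")]
print([b.findtext("title") for b in hits])  # ['Illuminations']
```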

10.2.1 Index structures
Types of path indexes [Tür08]:
- Nested path index: access to the root node from every node
- Multi-index: access to parent nodes
- Join index: access to parent and child nodes
- Access support relations (ASR): generalization of the indexes above, listing all paths in a table
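The access support relation can be pictured as one table row per path instance. This sketch assumes elements are numbered in document order (hypothetical) and materializes the relation for the label path library/book/author; because each row holds the whole chain of node ids, the same table answers root, parent, and child lookups.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library><book><author>A</author><author>B</author></book>"
    "<book><author>C</author></book></library>")

# Numeric ids in document order (hypothetical numbering scheme).
ids = {elem: i for i, elem in enumerate(doc.iter())}

def asr(root, steps):
    """Access support relation for a label path: one row per path instance,
    holding the ids of all nodes on that instance (root first)."""
    rows = [[root]]
    for tag in steps:
        rows = [row + [child] for row in rows for child in row[-1] if child.tag == tag]
    return [tuple(ids[e] for e in row) for row in rows]

relation = asr(doc, ["book", "author"])
print(relation)  # [(0, 1, 2), (0, 1, 3), (0, 4, 5)]

# Parent lookup from the same table: the book of author node 5.
print([row[1] for row in relation if row[2] == 5])  # [4]
```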

10.2.1 Index structures
Conclusion: efficient query processing on XML documents requires different types of index structures.
- Value index: efficient access to structured parts; keyword search, value search
- Full text index: efficient access to unstructured parts
- Path index: uses the document structure; navigating queries

10.2 Text-based storage
Summary of text-based storage:
- Schema definition: not required
- Document reconstruction: documents stay in their original format
- Queries: information retrieval queries; XML queries are possible by processing the markup at query time
- Special features: full text functions
- Efficiency: the character string must be parsed by an XML processor on every access, which is expensive; no concurrency on reads or writes, hence no parallel processing
- Usage: for document-centric XML applications; only to a limited extent also for semi-structured applications


10.3 Model-based storage
Idea: generic storage of the graph structure.
- XML elements, XML attributes, ... are nodes of a graph; the nesting of elements defines the edges.
- Nodes get an (internal) ID based on graph traversal.
- Relations or object classes are used to store elements and attributes:
  - Elements: ID, element name, value, reference to the predecessor (parent) node, rank
  - Attributes: ID, attribute name, value, reference to the element
- The document structure can be restored completely.
- An extension for data-type-adapted storage is possible.
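A minimal sketch of the two relations above in SQLite. Assumptions: the reference column is read as the parent node, `rank` is the position among siblings, IDs come from a preorder traversal, and mixed-content tail text is ignored. Shredding a small document and rebuilding it shows that the structure can be restored completely.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element   (id INTEGER PRIMARY KEY, name TEXT, value TEXT,
                        parent INTEGER REFERENCES element(id), rank INTEGER);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT, value TEXT,
                        element INTEGER REFERENCES element(id));
""")
next_id = itertools.count(1)  # IDs assigned in preorder (graph traversal)

def shred(elem, parent=None, rank=0):
    """Store elem, its attributes, and its subtree; return elem's ID."""
    eid = next(next_id)
    conn.execute("INSERT INTO element VALUES (?, ?, ?, ?, ?)",
                 (eid, elem.tag, elem.text, parent, rank))
    for name, value in elem.attrib.items():
        conn.execute("INSERT INTO attribute VALUES (?, ?, ?, ?)",
                     (next(next_id), name, value, eid))
    for r, child in enumerate(elem):
        shred(child, eid, r)
    return eid

def rebuild(eid):
    """Reconstruct the subtree rooted at eid (tail text is not stored)."""
    name, value = conn.execute(
        "SELECT name, value FROM element WHERE id = ?", (eid,)).fetchone()
    attrs = conn.execute(
        "SELECT name, value FROM attribute WHERE element = ? ORDER BY id", (eid,)).fetchall()
    elem = ET.Element(name, dict(attrs))
    elem.text = value
    children = conn.execute(
        "SELECT id FROM element WHERE parent = ? ORDER BY rank", (eid,)).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(child_id))
    return elem

src = '<order no="17"><item sku="a1">Tea</item><item sku="b2">Cup</item></order>'
root_id = shred(ET.fromstring(src))
print(ET.tostring(rebuild(root_id), encoding="unicode") == src)  # True
```

Rebuilding issues one query per node, which illustrates why document reconstruction is possible but expensive in this approach.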

10.3 Model-based storage
The EDGE approach [FK99]
Figure: XML documents stored in the EDGE table; variant BINARY: horizontal partition of EDGE based on the label [Tür08]
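A rough sketch of EDGE and its BINARY variant. The column names loosely follow the Edge table described in [FK99] (source, ordinal, name, flag, target); putting leaf values directly into an extra `value` column is a simplification, attributes are ignored, and oid 0 stands for the root element. Please check the paper for the exact layout.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict

def edge_table(root):
    """Shred an XML tree into EDGE rows (source, ordinal, name, flag, target, value).
    flag 'ref': edge to an inner element; flag 'string': leaf with a text value."""
    rows, ids = [], itertools.count(1)

    def visit(elem, oid):
        for ordinal, child in enumerate(elem, start=1):
            target = next(ids)
            if len(child):  # inner node: the edge points to another object
                rows.append((oid, ordinal, child.tag, "ref", target, None))
                visit(child, target)
            else:           # leaf: value kept in the edge row
                rows.append((oid, ordinal, child.tag, "string", target, child.text))

    visit(root, 0)  # oid 0 represents the root element itself
    return rows

doc = ET.fromstring("<shop><item><description>Tea</description><price>4.50</price></item>"
                    "<item><description>Pot</description><price>30.00</price></item></shop>")
edge = edge_table(doc)

# BINARY: horizontal partition of EDGE by label, one table per name.
binary = defaultdict(list)
for source, ordinal, name, flag, target, value in edge:
    binary[name].append((source, ordinal, flag, target, value))

print(sorted(binary))   # ['description', 'item', 'price']
print(binary["price"])  # [(1, 2, 'string', 3, '4.50'), (4, 2, 'string', 6, '30.00')]
```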

10.3 Model-based storage
XML queries [Tür08]:
- XML queries (XPath, XQuery) are mapped to SQL queries, taking the storage structures into account.
- The result of the XML query is generated from the result of the database query by "labeling" the result tuples, so that the result is in XML format.
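The mapping from XML query to SQL can be sketched for the simplest case, an absolute path with child steps only. Assumptions: a hypothetical `node(id, name, value, parent)` table, one table alias per step joined via `parent = id`, and a trivial labeling step that wraps each result value in markup again.

```python
import sqlite3
from xml.sax.saxutils import escape

def path_to_sql(path):
    """Translate an absolute child-axis path such as '/shop/item/price' into a
    self-join over node(id, name, value, parent): one alias per step."""
    steps = path.strip("/").split("/")
    aliases = ", ".join(f"node n{i}" for i in range(len(steps)))
    conds = ["n0.parent IS NULL"]                       # first step = root element
    conds += [f"n{i}.name = ?" for i in range(len(steps))]
    conds += [f"n{i}.parent = n{i - 1}.id" for i in range(1, len(steps))]
    last = f"n{len(steps) - 1}"
    sql = (f"SELECT {last}.id, {last}.value FROM {aliases} "
           f"WHERE {' AND '.join(conds)} ORDER BY {last}.id")
    return sql, steps

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, value TEXT, parent INTEGER)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", [
    (1, "shop", None, None), (2, "item", None, 1), (3, "price", "4.50", 2),
    (4, "item", None, 1), (5, "price", "30.00", 4)])

sql, params = path_to_sql("/shop/item/price")
rows = conn.execute(sql, params).fetchall()
# "Labeling": wrap each result tuple in markup again so the result is XML.
tag = params[-1]
print([f"<{tag}>{escape(value)}</{tag}>" for _, value in rows])
# ['<price>4.50</price>', '<price>30.00</price>']
```

Each extra step costs one more self-join, which is why long path expressions become inefficient in model-based storage.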

10.3 Model-based storage
Example: list bargain buys (price below 10.00) with their descriptions [Tür08]

  SELECT a.content, b.content
  FROM   Edge a, Edge b
  WHERE  a.label = 'price' AND a.content < 10.00
    AND  b.label = 'description'
    AND  b.parent = a.parent
    AND  a.key = b.key
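The query can be tried on SQLite with a small hand-filled Edge table. The layout is an assumption, since the slide does not define its columns: `key` is read as a document id, `id`/`parent` as node ids within that document, and `content` as the leaf text. Because `content` is text, the sketch adds a `CAST`; SQLite would otherwise compare '4.50' and 10.00 as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Edge layout: key = document id, id/parent = node ids per document.
conn.execute("""CREATE TABLE Edge (key INTEGER, id INTEGER, parent INTEGER,
                                   label TEXT, content TEXT)""")
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 0, "item", None), (1, 2, 1, "description", "Green tea"), (1, 3, 1, "price", "4.50"),
    (1, 4, 0, "item", None), (1, 5, 4, "description", "Teapot"),    (1, 6, 4, "price", "30.00"),
    (2, 1, 0, "item", None), (2, 2, 1, "description", "Mug"),       (2, 3, 1, "price", "12.00"),
])

# The query from the slide, with the price converted before the comparison.
rows = conn.execute("""
    SELECT a.content, b.content
    FROM Edge a, Edge b
    WHERE a.label = 'price' AND CAST(a.content AS REAL) < 10.00
      AND b.label = 'description'
      AND b.parent = a.parent
      AND a.key = b.key
""").fetchall()
print(rows)  # [('4.50', 'Green tea')]
```

Under this reading, the `a.key = b.key` condition keeps price and description from different documents apart, since node 1 is a parent in both documents.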

10.3 Model-based storage
DOM-based storage [Tür08]
The information from the Document Object Model is stored in the database. Storage alternatives:
- (object-)relational databases
- object-oriented databases
- a custom data structure

10.3 Model-based storage
DOM-based storage example [Tür08]
Figure: stored nodes by node type (ELEMENT, ATTRIBUTE, TEXT)

10.3 Model-based storage
XML queries [Tür08]:
- XML queries (DOM method invocations) are mapped to SQL queries, taking the storage structures into account.
- The result of a method invocation is generated from the result of the database query.
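A sketch of how DOM-style calls can be turned into SQL. The node table and the `Node` class are hypothetical, and their names only loosely follow the W3C DOM (this is not a DOM implementation). The point is that every property or method access becomes one query against the stored ELEMENT, ATTRIBUTE, and TEXT rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DOM node table: one row per node, with the DOM node type.
conn.execute("""CREATE TABLE dom_node (id INTEGER PRIMARY KEY, parent INTEGER,
                rank INTEGER, type TEXT, name TEXT, value TEXT)""")
conn.executemany("INSERT INTO dom_node VALUES (?, ?, ?, ?, ?, ?)", [
    (1, None, 0, "ELEMENT",   "book",  None),
    (2, 1,    0, "ATTRIBUTE", "isbn",  "123"),
    (3, 1,    1, "ELEMENT",   "title", None),
    (4, 3,    0, "TEXT",      None,    "XML Storage"),
])

class Node:
    """DOM-like facade: each property/method access runs an SQL query."""

    def __init__(self, node_id):
        self.id = node_id

    def _row(self):
        return conn.execute("SELECT type, name, value FROM dom_node WHERE id = ?",
                            (self.id,)).fetchone()

    @property
    def nodeName(self):
        return self._row()[1]

    @property
    def childNodes(self):
        rows = conn.execute("SELECT id FROM dom_node WHERE parent = ? "
                            "AND type <> 'ATTRIBUTE' ORDER BY rank", (self.id,)).fetchall()
        return [Node(node_id) for (node_id,) in rows]

    def getAttribute(self, name):
        row = conn.execute("SELECT value FROM dom_node WHERE parent = ? "
                           "AND type = 'ATTRIBUTE' AND name = ?", (self.id, name)).fetchone()
        return row[0] if row else ""

    @property
    def textContent(self):
        node_type, _, value = self._row()
        if node_type == "TEXT":
            return value
        return "".join(child.textContent for child in self.childNodes)

book = Node(1)
print(book.getAttribute("isbn"), [c.nodeName for c in book.childNodes], book.textContent)
# 123 ['title'] XML Storage
```

Navigating from a given node is cheap here (one indexed lookup per call), while `textContent` on a large subtree triggers many queries.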

10.3 Model-based storage
Summary of model-based storage:
- Schema definition: not required for storage
- Document reconstruction: possible, but expensive
- Queries: XML queries possible; adapted database queries
- Special features: querying many elements/attributes is expensive
- Efficiency: navigation from a given context is efficient; restoring the document and evaluating path expressions is inefficient
- Usage: for data-centric, document-centric, and semi-structured XML applications


10.4 Schema-based storage
Motivation:
- XML content is to be stored in a conventional database, accepting the loss of native access.
- The DB schema is derived from a DTD or an XML schema.
Problem:
- Generate the DB schema automatically, using as much structure information as possible.
General approach for mapping from a DTD:
- Transform the DTD into a tree representation:
  - Nodes: element types, attributes, etc. (type layer, not instance layer!)
  - Edges: nesting relationships of element types and their restrictions
- Traverse the tree and transform nodes and edges into database tables (according to certain rules).

10.4 Schema-based storage
Generating the DB schema for a DTD
Rules to map element types:
- XML element type → column of a table
- sequence of element types → columns of a table
- alternative of element types → column of a table
- element type with quantifier ? → column with null values
- element type with quantifier +, * → set/list of columns (SET OF, LIST OF)
- nested element types → TUPLE OF
Rules to map attributes:
- XML attribute → column of a table
- #IMPLIED → null values allowed
- #REQUIRED → null values not allowed
- default value → DEFAULT constraint
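A small generator that applies a subset of these rules to one element type. It is a sketch under clear simplifications: all children are #PCDATA leaves, and since SQLite has no SET OF/LIST OF, `+` and `*` children get their own table with a foreign key and a position column, as in the relational mapping of collection types. The DTD fragment in the comment is a made-up example.

```python
import sqlite3

def dtd_to_sql(element, children, attributes):
    """Map one DTD element type to CREATE TABLE statements (simplified rules).
    children:   (name, quantifier) pairs, quantifier in '', '?', '+', '*'
    attributes: (name, decl) pairs, decl = '#REQUIRED', '#IMPLIED' or a default value"""
    cols = [f"{element}_id INTEGER PRIMARY KEY"]
    extra = []
    for child, q in children:
        if q == "":
            cols.append(f"{child} TEXT NOT NULL")   # exactly once
        elif q == "?":
            cols.append(f"{child} TEXT")            # optional -> null values
        else:                                       # '+' or '*' -> own relation
            extra.append(f"CREATE TABLE {element}_{child} ("
                         f"{element}_id INTEGER REFERENCES {element}, "
                         f"pos INTEGER, {child} TEXT NOT NULL)")
    for name, decl in attributes:
        if decl == "#REQUIRED":
            cols.append(f"{name} TEXT NOT NULL")
        elif decl == "#IMPLIED":
            cols.append(f"{name} TEXT")
        else:
            cols.append(f"{name} TEXT DEFAULT '{decl}'")  # default value
    return [f"CREATE TABLE {element} ({', '.join(cols)})"] + extra

# <!ELEMENT lecture (title, room?, lesson+)>
# <!ATTLIST lecture term CDATA #REQUIRED lang CDATA "en">
statements = dtd_to_sql("lecture", [("title", ""), ("room", "?"), ("lesson", "+")],
                        [("term", "#REQUIRED"), ("lang", "en")])

conn = sqlite3.connect(":memory:")
for stmt in statements:
    conn.execute(stmt)  # raises if the generated DDL were invalid
names = [(row[1], row[3]) for row in conn.execute("PRAGMA table_info(lecture)")]
print([n for n, _ in names])  # ['lecture_id', 'title', 'room', 'term', 'lang']
```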

10.4 Schema-based storage
Mapping to relational databases:
- A DTD is usually required.
- Queries use SQL functionality.
- RDBMS data types are used (e.g. prices are NUMERIC).
Problem: mapping of collection types, which are subdivided into additional relations.
Example:

Comment:
Comment_ID | Customer_info | Feedback
44901      | C0001         | F0001

Customer_Info:
ID    | Fname   | Lname   | Email
C0001 | Charles | Sanchez | C.Sanchez@hotmail...

Feedback:
ID    | Type    | Content
F0001 | opinion | Darjeeling Special

10.4 Schema-based storage
Mapping with STORED (Semistructured TO RElational Data) [DFS99]
Basic idea: use data mining techniques on the XML structure to find a good mapping to tables.
Input:
- XML documents (or a representative sample of the collection)
- query workload
- restrictions on storage space, number of tables, ...
- no DTD or XML schema is required!
Output:
- relational schema
- STORED queries: mapping instructions from XML documents to DB tables
Procedure:
- Determine the XML subtrees with the largest support in the collection and in the queries.
- These subtrees are materialised in tables.
- Irregular data is stored in overflow tables according to the EDGE approach.
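The support idea can be illustrated with plain label-path counting. This is much simpler than STORED itself: [DFS99] also weighs the query workload and the storage restrictions, while this sketch only counts in how many documents each path occurs. The threshold and the sample documents are hypothetical. Frequent paths would become table columns; rare ones would go to the EDGE-style overflow table.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def label_paths(elem, prefix=""):
    """Set of root-to-node label paths occurring in one document."""
    path = f"{prefix}/{elem.tag}"
    paths = {path}
    for child in elem:
        paths |= label_paths(child, path)
    return paths

docs = [ET.fromstring(s) for s in (
    "<person><name>Ann</name><email>a@x.org</email></person>",
    "<person><name>Bob</name><email>b@x.org</email></person>",
    "<person><name>Cy</name><phone>123</phone></person>",
)]

# Support of a path = fraction of documents in which it occurs.
support = Counter(p for d in docs for p in label_paths(d))
threshold = 0.6  # hypothetical mining parameter
frequent = sorted(p for p, n in support.items() if n / len(docs) >= threshold)
rare = sorted(p for p, n in support.items() if n / len(docs) < threshold)
print(frequent)  # ['/person', '/person/email', '/person/name']
print(rare)      # ['/person/phone']
```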

10.4 Schema-based storage
Mapping with STORED, example [Tür08]
Figure: XML documents shown as tree structures, with the subtrees of high support marked

10.4 Schema-based storage
Mapping to object-relational databases:
- A DTD is usually required.
- Queries use SQL functionality.
- "Natural" mapping to tuple types and collection types.
- With an irregular document structure, the database contains many null values.
Example:

Comment:
Comment_ID | <Customer_info>                        | <Feedback>
           | Fname   | Lname   | Email                | Type    | Content
44901      | Charles | Sanchez | C.Sanchez@hotmail... | opinion | Darjeeling Special

10.4 Schema-based storage [Tür08]

10.4 Schema-based storage
Mapping of recursive data definitions:
- DTDs can be recursive, but infinite recursion is impossible on the instance layer of a database.
- Procedure: mark the recursive nodes and subdivide them into separate tables.
  - Use primary and foreign keys in an RDBMS.
  - Use reference types in an ORDBMS.
Example:
  <!ELEMENT book (front, body, references)>
  <!ELEMENT references (book*)>
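A sketch of the RDBMS variant for the recursive `book` type, with hypothetical table and column names. The recursive element type gets one table, and nesting inside `<references>` becomes a self-referencing foreign key. The sample document ends the recursion with an empty `<references/>`, i.e. it assumes the content model lets the recursion terminate. A recursive SQL query reads the nested books back.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One table for the recursive element type; nesting -> self-referencing FK.
conn.execute("""CREATE TABLE book (id INTEGER PRIMARY KEY, front TEXT,
                referenced_by INTEGER REFERENCES book(id))""")
ids = itertools.count(1)

def shred_book(elem, referenced_by=None):
    book_id = next(ids)
    conn.execute("INSERT INTO book VALUES (?, ?, ?)",
                 (book_id, elem.findtext("front"), referenced_by))
    for inner in elem.findall("references/book"):
        shred_book(inner, book_id)

doc = ET.fromstring(
    "<book><front>A</front><body/><references>"
    "<book><front>B</front><body/><references>"
    "<book><front>C</front><body/><references/></book>"
    "</references></book>"
    "</references></book>")
shred_book(doc)

# Everything reachable from book 1 via references, with nesting depth.
rows = conn.execute("""
    WITH RECURSIVE reach(id, depth) AS (
        SELECT id, 0 FROM book WHERE id = 1
        UNION ALL
        SELECT b.id, r.depth + 1 FROM book b JOIN reach r ON b.referenced_by = r.id)
    SELECT front, depth FROM book JOIN reach USING (id) ORDER BY depth""").fetchall()
print(rows)  # [('A', 0), ('B', 1), ('C', 2)]
```

A book cited from several places would need a separate link table instead of the single `referenced_by` column; the tree-shaped document here has only one parent per book.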

10.4 Schema-based storage
Mapping of element sequences:
- The order of a sequence can be important; in these cases, use an additional attribute (column) that stores the position.
Example:
  <lecture>
    <lesson>introduction</lesson>
    <lesson>xml basics</lesson>
  </lecture>

Order | Lesson
1     | introduction
2     | xml basics
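A minimal sketch of the order column in SQLite (table and column names are hypothetical; `ord` is used because ORDER is an SQL keyword). Relational tables are unordered, so only the explicit `ORDER BY` on the stored position restores the original sequence.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Extra 'ord' column keeps the document order of the <lesson> sequence.
conn.execute("CREATE TABLE lesson (lecture_id INTEGER, ord INTEGER, lesson TEXT, "
             "PRIMARY KEY (lecture_id, ord))")

lecture = ET.fromstring("<lecture><lesson>introduction</lesson>"
                        "<lesson>xml basics</lesson></lecture>")
conn.executemany("INSERT INTO lesson VALUES (1, ?, ?)",
                 [(pos, l.text) for pos, l in enumerate(lecture.findall("lesson"), start=1)])

# Rebuild the sequence; without ORDER BY, SQL guarantees no row order.
rebuilt = ET.Element("lecture")
for (text,) in conn.execute("SELECT lesson FROM lesson WHERE lecture_id = 1 ORDER BY ord"):
    ET.SubElement(rebuilt, "lesson").text = text
print(ET.tostring(rebuilt, encoding="unicode"))
# <lecture><lesson>introduction</lesson><lesson>xml basics</lesson></lecture>
```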

10.4 Schema-based storage
Mapping of alternatives:
- XML allows specifying alternatives, e.g. <!ELEMENT car (compactcar | sedan | van)*>
- Three possible storage variants:
  1. each alternative is stored as a separate table column
  2. the alternatives are subdivided into separate tables
  3. a table column of type XML is used

10.4 Schema-based storage
Variant 1: all alternatives in one table [Tür08]
Problem: many null values (wasted storage space)

10.4 Schema-based storage
Variant 2: subdivided into multiple tables [Tür08]
Queries need to combine the tables.

10.4 Schema-based storage
Variant 3: using a column of type XML [Tür08]
The XML type allows XML queries or DOM methods.
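Variants 1 and 2 side by side in SQLite, for the `car` example with hypothetical sample data. Variant 3 is left out because XML-typed columns are DBMS-specific and SQLite has none. The sketch counts the null values of variant 1 and shows that variant 2 needs a UNION to read all cars back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# <!ELEMENT car (compactcar | sedan | van)*>: each car row stores one choice.
cars = [(1, "compactcar", "Mini"), (2, "van", "Transporter"), (3, "sedan", "Passat")]

# Variant 1: one table, one nullable column per alternative.
conn.execute("CREATE TABLE car_v1 (id INTEGER PRIMARY KEY, compactcar TEXT, sedan TEXT, van TEXT)")
for cid, kind, value in cars:  # kind comes from a fixed list, so the f-string is safe
    conn.execute(f"INSERT INTO car_v1 (id, {kind}) VALUES (?, ?)", (cid, value))
nulls = conn.execute("SELECT SUM((compactcar IS NULL) + (sedan IS NULL) + (van IS NULL)) "
                     "FROM car_v1").fetchone()[0]
print(nulls)  # 6: two of the three columns are always NULL

# Variant 2: one table per alternative; reading all cars needs a UNION.
for kind in ("compactcar", "sedan", "van"):
    conn.execute(f"CREATE TABLE car_{kind} (id INTEGER PRIMARY KEY, value TEXT)")
for cid, kind, value in cars:
    conn.execute(f"INSERT INTO car_{kind} VALUES (?, ?)", (cid, value))
all_cars = conn.execute(
    "SELECT id, 'compactcar', value FROM car_compactcar UNION ALL "
    "SELECT id, 'sedan', value FROM car_sedan UNION ALL "
    "SELECT id, 'van', value FROM car_van ORDER BY id").fetchall()
print(all_cars == cars)  # True
```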

10.4 Schema-based storage
Mapping of mixed content, example [Tür08]

10.4 Schema-based storage
Mapping of mixed content:
- Mapping to plain tables is ill-suited; use variant 3 from above (a column of type XML) instead.
- The content model ANY cannot be represented at all (arbitrary content, arbitrary element types).
- Often the fitting storage structure can only be decided on the instance layer.

10.4 Schema-based storage
Schema-based storage with automatic mapping
Advantages:
- queries, data types, aggregation functions, views
- integration with other databases when storing structured data
Disadvantages:
- large schema, sparsely filled databases (many null values)
- no flexible data types; storing alternatives is problematic
- less flexible queries
- no information retrieval queries without additional extensions
- no full text operations for semi- or unstructured data
- native access is usually no longer possible

10.4 Schema-based storage
Mapping solutions with different specializations:
- algorithms, middleware, commercial applications, ...
- varying amounts of required input or user decisions
- different algorithms create different database schemas
Two phases:
- Mapping: assign a place in the DB to each node type
- Shredding: import the XML data as DB tuples

10.4 Schema-based storage
Table: mapping algorithms/products compared by what they are based on (n/a, DTD, schema), the restrictions they take into account (keys, cardinalities, types), and DTD optimisation [Bus08]

10.4 Schema-based storage
The shredder can be part of the DB:
- usually requires an XML schema
- in IBM Data Studio, the shredder is part of the "annotated XML schema decomposition"
- direct approach in DB2: register the XML schema and call the stored procedure:

  register xmlschema http://our.org/custacc from dec_files/custacc.xsd as cust_schema ;
  complete xmlschema cust_schema enable decomposition ;
  call SYSPROC.XDBDECOMPXML ('VRODRIG', 'CUST_SCHEMA', ?, ?, 1, null, null, null)

10.4 Schema-based storage
Shredding without an XML schema in DB2: the XMLTABLE function in combination with an INSERT

  INSERT INTO ENVELOPEXT (MAILFROM, MAILTO, MAILDATE, SUBJECT)
  SELECT MAILFROM, MAILTO, MAILDATE, SUBJECT
  FROM XMLTABLE(
         XMLNAMESPACES('http://www.sal.com/mails' AS "email"),
         '$doc/email:mails/mail'            -- some XQuery expression
         PASSING xml-source AS "doc"
         COLUMNS
           MAILFROM VARCHAR(100) PATH 'envelope/from',
           MAILTO   VARCHAR(100) PATH 'envelope/to',
           MAILDATE VARCHAR(30)  PATH 'envelope/email:date',
           SUBJECT  VARCHAR(100) PATH 'envelope/subject'
       ) AS T;

http://www.ibm.com/developerworks/db2/library/techarticle/dm-0801ledezma/
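The row-path/column-path idea behind XMLTABLE can be imitated outside DB2. This is not DB2 and not XMLTABLE: it is a sketch using ElementTree's limited XPath subset and SQLite, with a made-up sample document whose namespace usage follows the paths in the statement above (only `mails` and `date` are in the `email` namespace).

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"email": "http://www.sal.com/mails"}  # same prefix binding as XMLNAMESPACES
src = """<email:mails xmlns:email="http://www.sal.com/mails">
  <mail><envelope><from>ann@x.org</from><to>bob@y.org</to>
    <email:date>2008-01-15</email:date><subject>Tea</subject></envelope></mail>
  <mail><envelope><from>bob@y.org</from><to>ann@x.org</to>
    <email:date>2008-01-16</email:date><subject>Re: Tea</subject></envelope></mail>
</email:mails>"""

# Row path plus one relative path per column, as in COLUMNS ... PATH.
ROW_PATH = "mail"  # relative to the email:mails root element
COLUMNS = {"MAILFROM": "envelope/from", "MAILTO": "envelope/to",
           "MAILDATE": "envelope/email:date", "SUBJECT": "envelope/subject"}

def xmltable(root):
    """Yield one tuple per row node, one value per column path."""
    for row in root.findall(ROW_PATH, namespaces=NS):
        yield tuple(row.findtext(path, namespaces=NS) for path in COLUMNS.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENVELOPEXT (MAILFROM VARCHAR(100), MAILTO VARCHAR(100), "
             "MAILDATE VARCHAR(30), SUBJECT VARCHAR(100))")
conn.executemany("INSERT INTO ENVELOPEXT VALUES (?, ?, ?, ?)", xmltable(ET.fromstring(src)))
subjects = conn.execute("SELECT SUBJECT FROM ENVELOPEXT ORDER BY MAILDATE").fetchall()
print(subjects)  # [('Tea',), ('Re: Tea',)]
```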

10.4 Schema-based storage
Summary of schema-based storage with automatic mapping:
- Schema definition: usually required and analysed; not required e.g. for STORED
- Document reconstruction: limited (requires logging of the mapping process)
- Queries: database queries; XML queries possible, but without the XPath horizontal axes, e.g. following, preceding-sibling
- Special features: federation with existing databases is possible
- Efficiency: high efficiency by using the DB engine
- Usage: for data-centric XML applications, but with limited nesting

10.4 Schema-based storage
User-defined mapping
Idea:
- With all previously shown methods, the user cannot influence how the data is stored in the DB.
- With user-defined mappings, the user defines the storage structure.
- The structure of the XML documents and the database schema can be designed independently of each other.
- Also possible: storing XML documents in existing databases.
- Annotation of the DTD or XML schema, respectively; in many cases the mapping definition is combined with existing schema information.
- Only limited XML queries are possible.
- Logging of the mapping process from XML documents to databases: for a given query, all relevant data has to be stored (lossless mapping).

10.4 Schema-based storage
Example: XML document and mapping instruction [Tür08]

10.4 Schema-based storage
Mapping instruction, example syntax for XML-DBMS (Ronald Bourret):

  <ClassMap>
    <ElementType Name="sales:SalesOrder"/>
    <ToClassTable>
      <Table Name="Sales"/>
    </ToClassTable>
    <PropertyMap>
      <Attribute Name="SONumber"/>
      <ToColumn>
        <Column Name="Number"/>
      </ToColumn>
    </PropertyMap>
  </ClassMap>

- ClassMap: connection between elements and tables
- PropertyMap: connection between elements/attributes and table columns
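To show what the two map elements mean operationally, here is a tiny interpreter for exactly this ClassMap. It is not XML-DBMS's implementation. Namespace prefixes are not resolved (only local names are compared), and the instance document and its namespace URI are made up. The element type becomes one row of the named table, and the mapped attribute becomes the named column.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The ClassMap from the slide (only these two map elements are interpreted).
MAP = ET.fromstring("""<ClassMap>
  <ElementType Name="sales:SalesOrder"/>
  <ToClassTable><Table Name="Sales"/></ToClassTable>
  <PropertyMap>
    <Attribute Name="SONumber"/>
    <ToColumn><Column Name="Number"/></ToColumn>
  </PropertyMap>
</ClassMap>""")

def local(name):
    """Local part of '{uri}name' or 'prefix:name' (prefixes are not resolved)."""
    return name.rsplit("}", 1)[-1].split(":")[-1]

element = local(MAP.find("ElementType").get("Name"))
table = MAP.find("ToClassTable/Table").get("Name")
props = [(pm.find("Attribute").get("Name"), pm.find("ToColumn/Column").get("Name"))
         for pm in MAP.findall("PropertyMap")]

# Hypothetical instance document.
doc = ET.fromstring('<sales:Orders xmlns:sales="urn:example:sales">'
                    '<sales:SalesOrder SONumber="12345"/>'
                    '<sales:SalesOrder SONumber="12346"/></sales:Orders>')

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {table} ({', '.join(col for _, col in props)})")
placeholders = ", ".join("?" for _ in props)
for elem in doc.iter():
    if local(elem.tag) == element:  # element type -> one row in the table
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})",
                     [elem.get(attr) for attr, _ in props])  # attribute -> column
result = conn.execute(f"SELECT Number FROM {table} ORDER BY Number").fetchall()
print(result)  # [('12345',), ('12346',)]
```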

10.4 Schema-based storage
Remarks:
- There are many different mapping languages or schema annotations.
- Automatic mappings usually have an internal mapping language.
- Remember the mapping constructs from lectures 5 and 6: the SQL/XML annotations are a mapping language, too.
- DB2 uses annotations similar to those of SQL/XML.
- On the next slide, the example from lecture 6 is shown with DB2 syntax.

Mapping SQL tables [Tür08]

  CREATE TABLE Account (
    Name    CHAR(20),
    Balance NUMERIC(12,2)
  );

  Name | Balance
  Joe  | 2000
  Jim  | 3500

SQL/XML: mapping table rows to <row> elements and table columns to XML elements:

  <ACCOUNT>
    <row> <NAME>Joe</NAME> <BALANCE>2000</BALANCE> </row>
    <row> <NAME>Jim</NAME> <BALANCE>3500</BALANCE> </row>
  </ACCOUNT>

Schema annotations in DB2 (the table is called rowset):

  <xsd:complexType xmlns:db2-xdb="http://www.ibm.com/xmlns/prod/db2/xdb1" name="row.account">
    <xsd:sequence>
      <xsd:element name="name" type="char_20" db2-xdb:rowset="account" db2-xdb:column="name"/>
      <xsd:element name="balance" type="numeric_12_2" db2-xdb:rowset="account" db2-xdb:column="balance"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="table.account">
    <xsd:sequence>
      <xsd:element name="row" type="row.account" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="account" type="table.account"/>

10.4 Schema-based storage
Summary of schema-based storage with user-defined mapping:
- Schema definition: depends on the mapping language
- Document reconstruction: not possible in most cases (requires logging of the mapping process)
- Queries: database queries; XML queries in rare cases only!
- Special features: integration with existing databases is possible
- Efficiency: high efficiency by using the DB engine
- Usage: for data-centric XML applications


10.5 Conclusion
Different methods for storing XML documents:
- Text-based: storing whole XML documents as strings; can use a full text index or a path index
- Model-based: generic mapping of the tree structure
- Schema-based: detect and analyse the structure of the XML documents and derive a DB schema from it
- Hybrid approaches: a combination of some of those methods
No algorithm has the optimal solution for all kinds of XML documents; a reasonable solution depends heavily on the application.

10.6 References
- [Tür08] Can Türker: "XML und Datenbanken", lecture, University of Zurich, 2008.
- [KM03] M. Klettke, H. Meier: "XML und Datenbanken", dpunkt.verlag, 2003.
- [Bus08] Carsten Busche: "Generierung eines adaptiven Datenbankschemas für datenzentrierte XML-Dokumente", Diplomarbeit, TU Braunschweig, 2008.
- [FK99] D. Florescu, D. Kossmann: "Storing and Querying XML Data using an RDBMS", IEEE Data Engineering Bulletin (DEBU) 22(3), pages 27-34, 1999.
- [DFS99] A. Deutsch, M. F. Fernández, D. Suciu: "Storing Semistructured Data with STORED", Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pages 431-442, ACM, 1999.

10.6 Overview
1. Introduction
2. XML Basics
3. Schema definition
4. XML query languages I
5. Mapping relational data to XML
6. SQL/XML
7. XML processing
8. XML query languages II: XQuery Data Model
9. XML query languages III: XQuery
10. XML storage I: Overview
11. XML storage II
12. Updates / Transactions
13. Systems

Questions, ideas, comments?
Now, or ...
Room: IZ 232
Office hours: Tuesday, 12:30-13:30, or by appointment
Email: eckstein@ifis.cs.tu-bs.de