Introduction to XML. Data Integration. Structure in Data Representation. Yanlei Diao UMass Amherst Nov 15, 2007

Similar documents
04 XML Schemas. Software Technology 2. MSc in Communication Sciences Program in Technologies for Human Communication Davide Eynard

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling

XML Schema Definition Language (XSDL)

Semistructured data and XML. Institutt for Informatikk INF Ahmet Soylu

DTD Tutorial. About the tutorial. Tutorial

XML: extensible Markup Language. Anabel Fraga

Translating between XML and Relational Databases using XML Schema and Automed

Structured vs. unstructured data. Semistructured data, XML, DTDs. Motivation for self-describing data

XML and Data Management

Chapter 2: Designing XML DTDs

Introduction. Web Data Management and Distribution. Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart

Java and XML parsing. EH2745 Lecture #8 Spring

Lecture 21: NoSQL III. Monday, April 20, 2015

Et tu, XML? Philip Wadler, Avaya Labs

How To Write A Type Definition In Xhtml 1.2.2

Structured vs. unstructured data. Motivation for self describing data. Enter semistructured data. Databases are highly structured

XML and Data Integration

Languages for Data Integration of Semi- Structured Data II XML Schema, Dom/SAX. Recuperación de Información 2007 Lecture 3.

XML Based Customizable Screen. Rev 1.1

6. SQL/XML. 6.1 Introduction. 6.1 Introduction. 6.1 Introduction. 6.1 Introduction. XML Databases 6. SQL/XML. Creating XML documents from a database

Course: Introduction to XML

Integration and interoperability of data sources: forward into the new century

Visualization of GML data using XSLT.

XML Databases 6. SQL/XML

AN ENHANCED DATA MODEL AND QUERY ALGEBRA FOR PARTIALLY STRUCTURED XML DATABASE

How To Use Xml In A Web Browser (For A Web User)

Modern Databases. Database Systems Lecture 18 Natasha Alechina

<Namespaces> Core XML Technologies. Why Namespaces? Namespaces - based on unique prefixes. Namespaces. </Person>

Model-driven Rule-based Mediation in XML Data Exchange

Exercises: XSD, XPath Basi di da4 2

XML. Document Type Definitions XML Schema

An XML Based Data Exchange Model for Power System Studies

Content Management for Declarative Web Site Design

Relational Databases for Querying XML Documents: Limitations and Opportunities. Outline. Motivation and Problem Definition Querying XML using a RDBMS

Implementing XML Schema inside a Relational Database

Outline. Data Modeling. Conceptual Design. ER Model Basics: Entities. ER Model Basics: Relationships. Ternary Relationships. Yanlei Diao UMass Amherst

Connecting to WebSphere ESB and WebSphere Process Server

George McGeachie Metadata Matters Limited. ER SIG June 9th,

XML. Dott. Nicole NOVIELLI XML: extensible Markup Language

Intelligent Agents and XML - A method for accessing webportals in both B2C and B2B E-Commerce

QUT Digital Repository:

Conceptual Level Design of Semi-structured Database System: Graph-semantic Based Approach

The A2A Data Model and its application in WieWasWie. Michel

Hybrid Similarity Measure for XML Data Integration and Transformation

Data Integration for XML based on Semantic Knowledge

1. Write the query of Exercise 6.19 using TRC and DRC: Find the names of all brokers who have made money in all accounts assigned to them.

Managing Typed Hybrid XML and Relational Data Sources for Web Content Management

How To Write A Contract Versioning In Wsdl 2.2.2

XML Data Integration

Validating Documents of Web-based Metalanguages Using Semantic Rules

How To Create A Table In Sql (Ahem)

XML WEB TECHNOLOGIES

INTEGRATING WEB SERVICES INTO A WEB-BASED COLLEGE ADMISSION PORTAL SYSTEM

Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks

Part 5 Overview. Conceptual Design. Conceptual Design Activities. Purpose. Verteilte Web-basierte Systeme. Part V. Chapter://1

Standard Recommended Practice extensible Markup Language (XML) for the Interchange of Document Images and Related Metadata

XML. CIS-3152, Spring 2013 Peter C. Chapin

XML for RPG Programmers: An Introduction

OpenTravel Alliance XML Schema Design Best Practices

Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey

Modernize your NonStop COBOL Applications with XML Thunder September 29, 2009 Mike Bonham, TIC Software John Russell, Canam Software

2009 Martin v. Löwis. Data-centric XML. Other Schema Languages

Logical Design of Data Warehouses from XML

A Workbench for Prototyping XML Data Exchange (extended abstract)

Data Modeling. Database Systems: The Complete Book Ch ,

A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS

Converting XML Data To UML Diagrams For Conceptual Data Integration

Using XML to Test Web Software Services. Modern Web Sites

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Combining Unstructured, Fully Structured and Semi-Structured Information in Semantic Wikis

Allegato XML flusso richieste di produzione

SQL DATA DEFINITION: KEY CONSTRAINTS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 7

AUTOMATING DATA ACQUISITION INTO ONTOLOGIES FROM PHARMACOGENETICS RELATIONAL DATA SOURCES USING DECLARATIVE OBJECT DEFINITIONS AND XML

Xml Normalization and Reducing RedUndies

Technologies for a CERIF XML based CRIS

Privilege Algebra for Access Control in Digital Libraries

Chapter 15 Working with Web Services

[MS-MDE]: Mobile Device Enrollment Protocol. Intellectual Property Rights Notice for Open Specifications Documentation

XEP-0043: Jabber Database Access

XML DATA INTEGRATION SYSTEM

Stage 3 proposal: Feature #13102 (Release Management Domain)

Data integration is a central issue within Semantic Web research, especially when

Xml Mediator and Data Management

Archiving Scientific Data - A Practical Approach


CHAPTER 9: DATAPORT AND XMLPORT CHANGES

Providing Semantically Equivalent, Complete Views for Multilingual Access to Integrated Data

Geography Markup Language (GML) simple features profile

XML Databases 10 O. 10. XML Storage 1 Overview

Extensible Markup Language (XML): Essentials for Climatologists

EHR-IIS Interoperability Enhancement Project. Transport Layer Protocol Recommendation Formal Specification. Version 1.

Chapter 1: Introduction

XStruct: Efficient Schema Extraction from Multiple and Large XML Documents

Databases 2011 The Relational Model and SQL


Data Modeling Basics

An Introduction to Designing XML Data Documents

XML - A Practical Application and Design

Transcription:

Introduction to XML Yanlei Diao UMass Amherst Nov 15, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau. 1 Structure in Data Representation Relational data is highly structured structure is defined by the schema good for system design good for precise query semantics / answers Structure can be limiting data exchange hard: integration of diff schema authoring is constrained: schema-first querying constrained: must know schema changes to structure not easy 2 Data Integration 1. Find all departments whose total employee salaries exceed 1% of the budget of the company. 2. Find names of employees with the top sales record last month. US Australia Internet Europe Asia 3 1

Integration of Text and Structured Data Structured data - Databases Semistructured Data WWW Unstructured Text - Documents 4 Need for A New Data Model Loose (and rich) structure Integration of structured, but heterogeneous data sources Textual data with tags and links Combination of data models Evolving and irregular structures 5 XML: Universal Data Exchange Format XML is the confluence of many factors: Databases needed a more flexible interchange format. Data needed to be generated and consumed by applications. The Web needed a more declarative format for data. Documents needed a mechanism for extended tags. XML was originally proposed for online publishing, is becoming the wire format for data exchange. W3C Recommendation: http://www.w3.org/tr/rec-xml/ 6 2

From HTML to XML HTML describes the presentation. 7 HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999 8 XML <bibliography> <book> <title> Foundations </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> </bibliography> XML describes the content! 9 3

XML: Syntax & Typing 10 XML Syntax Tags: book, title, author, start tag: <book> end tag: </book> Elements: <book> </book>,<author> </author> elements are nested empty element: <red></red>, abbrv. <red/> An XML document: single root element An XML document is well formed if it has matching tags 11 XML Syntax <book price = 55 currency = USD > <title> Foundations of Databases </title> <author> Abiteboul </author> <year> 1995 </year> </book> Attributes are alternative ways to represent data. 12 4

XML Syntax <person id= o555 > <name> Jane </name> <person id= o456 > <name> Mary </name> <children idref= o123 o555 /> <person id= o123 mother= o456 ><name>john</name> Oids and references in XML are just syntax. 13 XML Semantics: a Tree! <data> <person id= o555 > <name> Mary </name> <address> <street> Maple </street> <no> 345 </no> <city> Seattle </city> </address> <person> <name> John </name> <address> Thailand </address> <phone> 23456 </phone> </data> i d o555 Attribute node name Mary person address street no city Maple 345 data Seattle name John Element node person address phone Thai 23456 Text node Order matters! IDREF will turn it to a graph. 14 XML Data XML is self-describing Schema elements become part of the data Relational schema: persons(name,phone) In XML <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible Some real data: http://www.cs.washington.edu/research/xmldatasets/ 15 5

Relational Data as XML person name phone John 3634 Sue 6343 Dick 6363 XML: person row row row name name name phone phone phone John 3634 Sue Dick 6343 6363 <person> <row> <name>john</name> <phone> 3634</phone></row> <row> <name>sue</name> <phone> 6343</phone> <row> <name>dick</name> <phone> 6363</phone></row> 16 XML is Semi-structured Data Missing attributes: <data> <person> <name> John</name> <phone>1234</phone> <person> <name>joe</name> </data> no phone! Could represent in a table with nulls name phone John 1234 Joe - 17 XML is Semi-structured Data Repeated attributes <person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> two phones! Impossible in tables: nested collections (non 1NF) name phone Mary 2345 3456??? 18 6

XML is Semi-structured Data Attributes with different types in different objects <data> <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> <person> <name> M. Carey</name> <phone>3456</phone> </data> structured name! unstructured name! Mixed content: <db> contains both <book>s and <publisher>s 19 Data Typing in XML Data typing in the relational model: schema Data typing in XML Much more complex Typing restricts valid trees that can occur theoretical foundation: tree languages Practical methods: DTD (Document Type Definition) XML Schema 20 Document Type Definitions (DTD) Part of the original XML specification To be replaced by XML Schema Much more complex An XML document may have a DTD XML document: well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it Validation is useful in data exchange 21 7

DTD Example <!DOCTYPE company [ <!ELEMENT company ((person product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]> 22 DTD Example Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> <product>... </product>... </company> 23 DTD: The Content Model <!ELEMENT tag (CONTENT)> content model Content model: Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY Mixed content = (#PCDATA A B C)* 24 8

DTD: Regular Expressions sequence DTD <!ELEMENT name (firstname, lastname)) optional <!ELEMENT name (firstname?, lastname)) Kleene star <!ELEMENT person (name, phone*)) alternation <!ELEMENT person (name, (phone email))) XML <name> <firstname>..... </firstname> <lastname>..... </lastname> </name> <person> <name>..... </name> <phone>..... </phone> <phone>..... </phone> <phone>..... </phone>...... 25 Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIST person age CDATA #REQUIRED> <person age= 25 > <name>...</name>... 26 Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIST person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED > <person age= 25 id= p29432 manager= p48293 manages= p34982 p423234 > <name>...</name>... 27 9

Attributes in DTDs Types: CDATA = string ID = key IDREF = foreign key IDREFS = foreign keys separated by space (Monday Wednesday Friday) = enumeration 28 Attributes in DTDs Kind: #REQUIRED #IMPLIED = optional value = default value value #FIXED = the only value allowed 29 Using DTDs Must include in the XML document Either include the entire DTD: <!DOCTYPE rootelement [... ]> Or include a reference to it: <!DOCTYPE rootelement SYSTEM http://www.mydtd.org > Or mix the two... (e.g. to override the external definition) 30 10

XML Schema DTDs capture grammatical structure, but have some drawbacks: Not themselves in XML, inconvenient to build tools Don t capture database datatypes domains No way of defining OO-like inheritance XML Schema addresses shortcomings of DTDs XML syntax Subclassing Domains and built-in datatypes min. and max # of occurrences of elements http://www.w3.org/xml/schema 31 Basics of XML Schema Need to use the XML Schema namespace (generally named xsd) simpletypes are a way of restricting domains on scalars Can define a simpletype based on integer, with values within a particular range complextypes are a way of defining element structures Basically equivalent to!element, but more powerful Specify sequence, choice between child elements Specify minoccurs and maxoccurs (default 1) Must associate an element/attribute with a simpletype, or an element with a complextype 32 Simple Schema Example <xsd:schema xmlns:xsd="http://www.w3.org/2001/xmlschema"> <xsd:element name= mastersthesis" type= ThesisType"/> <xsd:complextype name= ThesisType"> <xsd:attribute name= mdate" type="xsd:date"/> <xsd:attribute name= key" type="xsd:string"/> <xsd:attribute name= advisor" type="xsd:string"/> <xsd:sequence> <xsd:element name= author" type= xsd:string"/> <xsd:element name= title" type= xsd:string"/> <xsd:element name= year" type= xsd:integer"/> <xsd:element name= school" type= xsd:string /> <xsd:element name= committeemember" type= CommitteeType minoccurs= 0"/> </xsd:sequence> </xsd:complextype> </xsd:schema> 33 11