XML Schema Definition




Lecture "IFS in Bioinformatics", Summer Semester 2011, Module 2
a.Univ.-Prof. Dr. Werner Retschitzegger
Johannes Kepler University Linz (www.jku.ac.at)
Institute of Bioinformatics (www.bioinf.jku.at), Information Systems Group (www.ifs.uni-linz.ac.at)

Outline
  Introduction: Motivation for XML, Document Markup Languages, Application Areas for XML
  XML 1.0
  Namespaces
  XML Schema

The following slides are based (among others) on: Elliotte Rusty Harold, W. Scott Means, XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005

2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)

Motivation for XML 1/5: From HTML to XML
"If I invent another programming language, its name will contain the letter X." (N. Wirth, Software Pioneers Conference, Bonn 2001)
Google indicator (number of hits, as of Sep 16, 2008):
  Love                    2.2 billion
  XML                     603 million
  ABC                     252 million
  Soccer                  237 million
  SQL                     223 million
  Werner Retschitzegger   20.6 thousand

Motivation for XML 2/5: From HTML to XML
HTML (HyperText Markup Language) is the "lingua franca" for representing hypertext documents on the Web
  Originated in 1989 with Tim Berners-Lee at CERN; standardized later by the W3C (World Wide Web Consortium, founded 1994)
  Basic concept: "markup" in terms of "tags"
Drawbacks
  Restricted number of pre-defined tags, leading to permanent extensions with proprietary tags
  Tags primarily describe layout aspects, which makes Web search harder
Brian Kernighan: "The problem with HTML-WYSIWYG is that what you see is all you've got"

Motivation for XML 3/5: From HTML to XML
HTML describes the layout of content:
    <h1>PDACatalog</h1>
    <h2>Nokia 8210</h2>
    <table border="1">
      <tr> <td>Battery</td><td>900mAh</td> </tr>
      <tr> <td>Weight</td><td>141g</td> </tr>
    </table>
XML describes the structure and semantics of content:
    <PDACatalog>
      <Producer name="nokia">
        <PDA name="8210">
          <Battery>900mAh</Battery>
          <Weight>141g</Weight>
        </PDA>
      </Producer>
    </PDACatalog>
Tim Bray, co-editor of XML 1.0: "XML will become the ASCII of the 21st century - basic, essential, unexciting"

Motivation for XML 4/5: Features of XML
  Layout independence: separation of the structure and semantics of the content from its layout
  Platform and vendor independence: endorsed by the W3C
  Internationality: based on the Unicode standard
  Extensibility: tags can be defined and named arbitrarily (XML is a meta language)
  Structurability: tags can be nested arbitrarily
  Semi-structured: content can contain fully structured parts and fully unstructured parts
  Self-describing: tags describing the structure and semantics of the content are relatively easy for humans to read and edit, and easy for machines to generate and parse
  X-technology infrastructure: the W3C provides a set of XML-based standards (the XML standards family)
  Validation: optionally, XML documents can be checked for correctness against a schema
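To make "easy to parse for machines" concrete, here is a minimal sketch in Python (our choice; the slides prescribe no language) using the standard-library module xml.etree.ElementTree. The helper pda_specs is hypothetical: it reads the XML version of the catalog and maps each PDA to the values of its child elements, which the layout-oriented HTML version does not support as directly.

```python
import xml.etree.ElementTree as ET

# The XML version of the catalog from the slide above.
CATALOG = """<PDACatalog>
  <Producer name="nokia">
    <PDA name="8210">
      <Battery>900mAh</Battery>
      <Weight>141g</Weight>
    </PDA>
  </Producer>
</PDACatalog>"""

def pda_specs(xml_text):
    """Hypothetical helper: map each PDA name to {child tag: child text}."""
    root = ET.fromstring(xml_text)
    specs = {}
    for pda in root.iter("PDA"):  # iter() searches the whole subtree
        specs[pda.get("name")] = {child.tag: child.text for child in pda}
    return specs

print(pda_specs(CATALOG))  # {'8210': {'Battery': '900mAh', 'Weight': '141g'}}
```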

Motivation for XML 5/5: Properties of XML Documents and XML Processors
Well-formedness: syntactical properties, e.g.:
  at least one element per document
  exactly one root element
  elements must not overlap, i.e., they must be properly nested
  every start tag needs a matching end tag (unless the empty-element form <x/> is used) ...
Validity: the XML document is well-formed and conforms to a schema
  The schema defines vocabulary and grammar
  Alternatives: DTD or the XML Schema standard
XML processors parse XML documents and check
  either solely well-formedness (non-validating processors) or also validity (validating processors)
  They can be called from within an application (e.g., a browser)
  They decompose an XML document into its parts, forming a tree, so that the application can access these parts
[Figure: an XML processor, consisting of an entity manager and a parser, reads the XML document PDACatalog1.XML and the DTD Catalog.DTD and passes document parts or errors to the application]

Document Markup Languages 1/4: History
  1945          Vannevar Bush: Memex
  1962          Douglas Engelbart: Augment
  1965          Ted Nelson: Xanadu
  1967          William Tunnicliffe (GCA): GenCode
  1969          Goldfarb, Mosher, Lorie (IBM): GML (Generalized Markup Language)
  1978          ANSI: standardization (GenCode & GML)
  1986          Charles Goldfarb, ISO: SGML (Standard Generalized Markup Language, ISO 8879)
  1989          Tim Berners-Lee (CERN): HTML (Hypertext Markup Language)
  1993          Marc Andreessen (NCSA): HTML forms (XMosaic)
  1994          Netscape, Microsoft: HTML derivations
  1996          Jon Bosak, Tim Bray, James Clark et al. (W3C): XML Working Group
  10 Feb 1998   XML 1.0
  29 Sep 2006   XML 1.1, 2nd Edition
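Python's xml.etree.ElementTree (our choice, not the slides') is built on the expat parser, which acts as a non-validating XML processor: it checks well-formedness only, never validity against a DTD or schema. The minimal sketch below uses a hypothetical helper is_well_formed to exercise the well-formedness rules listed above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if expat accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b></b></a>"))  # True: properly nested
print(is_well_formed("<a><b></a></b>"))  # False: overlapping elements
print(is_well_formed("<a/><b/>"))        # False: two root elements
print(is_well_formed("<a><b></a>"))      # False: missing end tag
```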

Document Markup Languages 2/4: Memex
http://www.ps.uni-sb.de/~duchier/pub/vbush/vbush-all.shtml

Document Markup Languages 3/4: XML and OMG's Metadata Architecture [www.omg.org]
  M2, meta level: SGML, XML
  M1, language level (e.g., DTDs): e.g., HTML; MathML, WML, XHTML
  M0, instance level (documents): e.g., $e^{i\pi} + 1 = 0$ or $f(n) = \sum_{k=1}^{n} k$ as MathML documents

Document Markup Languages 4/4: XML versus ...
... SGML
  XML vs. SGML: 60 pages vs. 600 pages of specification
  XML has 20% of SGML's complexity, but 80% of its functionality
  XML documents conform to an ISO revision of SGML: WebSGML (annex to the SGML standard ISO 8879) ...
... HTML
  XML is complementary to HTML (semantics and structure vs. layout)
  XML is not backward compatible with HTML
  Simple conversion from HTML documents to XML ...
  XHTML = Extensible HTML
    W3C Recommendation, Aug. 2002 (2nd edition)
    HTML 4.01 as an XML application, i.e., HTML described by means of an XML DTD

Application Areas of XML 1/4: Three Main Application Areas
Data exchange ("portable data")
  using XML solely as an exchange format, or
  additionally using a common schema
Multi-delivery
  one and the same content can be delivered to different end-user devices
Intelligent retrieval
  instead of a simple keyword search on the basis of HTML documents, a structure-based search on the basis of XML documents
  "Mozart": composer or chocolate ball?

Application Areas of XML 2/4: Industrial Sectors ("Verticalization of XML")
XML DTDs for ...
  Literature: "Gutenberg"
  Travel: "OpenTravel"
  News: "NewsML"
  Marketing: "adXML"
  Weather: "OMF"
  Human resources: "HR-XML"
  Voice applications: "VoxML"
  Vector graphics: "SVG"
  Mobile applications: "WML"
  Geo applications: "ANZMETA"
  Health care: "HL7"
  Mathematics: "MathML"
  Banking: "MBA"
  eGovernment: "eGovML"
Electronic commerce
  CBL: Common Business Library (Commerce One)
  BizTalk: Microsoft
  cXML: Commerce XML
  RosettaNet: format for online orders
  ebXML: OASIS + XML/EDI
  FpML: Financial Products Markup Language ...
[http://www.oasis-open.org/cover/xml.html#applications]

Application Areas of XML 3/4: Sources of XML Data
  Inter-application and mobile device communication data, e.g., Web Services
  Logs and blogs, e.g., RSS
  Metadata, e.g., XML Schema, WSDL, XMP
  Presentation data, e.g., XHTML
  Documents, e.g., Word
  Views of other data sources, e.g., relational, LDAP, CSV, Excel, etc.
  Sensor data

Application Areas of XML 4/4: XML Standardization Family (excerpt)
"It takes ten minutes to understand (base) XML, but then ten months to understand the new technologies hung around it." (Peter Chen)
  XML: language concepts incl. DTD
  XML Namespaces: support of a global identification schema for element names and attribute names
  XPath (XML Path Language): path expressions for navigation in XML documents
  XML Schema: XML-based language for the definition of XML schemata
  XLink, XPointer: XML-based languages for linking (parts of) XML documents
  XSL (Extensible Stylesheet Language)
    XSLT: transformation of XML documents (declarative)
    XSL-FO: rendering of XML documents (declarative)
  DOM (Document Object Model): API for accessing XML documents in a procedural manner
W3C standardization levels: (1) Note, (2) Working Draft (WD), (3) Candidate Recommendation (CR), (4) Proposed Recommendation (PR), (5) Recommendation (REC)

Outline
  Introduction
  XML 1.0: XML Document, DTD, Entities
  Namespaces
  XML Schema

XML Document 1/3: Running Example PDACatalog (PDACatalog1.XML)
    <?xml version="1.0" encoding="utf-8"?>
    <PDACatalog>
      <!-- NOKIA -->
      <Producer name="nokia">
        <ProducerNo no="h1234"/>
        <PDA name="7110">
          <Weight>141g</Weight>
          <Price contract="yes">999</Price>
          <Price contract="no">4999</Price>
        </PDA>
        <PDA name="8210">...</PDA>
      </Producer>
    </PDACatalog>
Terminology illustrated by the example
  prologue (optional): the "XML declaration" <?xml ...?>
  root element (or document element): PDACatalog
  comment: <!-- NOKIA -->
  start tag <Producer ...> and end tag </Producer>; in between, the element content of <Producer>, made up of subelements
  empty element: <ProducerNo no="h1234"/>
  element name, attribute (e.g., contract) and attribute value (e.g., "yes")
  text (character data): 141g
  mixed content: text and subelements within one element

XML Document 2/3: Elements and Attributes
Element and attribute names have to be valid "XML names":
    [ letter | _ | : ] [ letter | 0..9 | . | - | _ | : ]*
  "letter": A-Z, a-z, and others such as ä, ê, ς
  ':' is reserved for namespaces
  no length restriction
  case-sensitive
Empty elements can be represented in long form or short form:
    <ProducerNo no="h1234"></ProducerNo>   or   <ProducerNo no="h1234"/>
Attribute values must be enclosed in (single or double) quotation marks:
    <PDA name='8210'>   or   <PDA name="8210">
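The rules for names, empty elements and attribute quoting can be checked with Python's standard-library parser (a sketch under our own assumptions; parses is a hypothetical helper). It shows that the long and short forms of an empty element yield the same tree, that tag names are case-sensitive, and that unquoted attribute values or names starting with a digit are rejected.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if text is a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

long_form = ET.fromstring('<ProducerNo no="h1234"></ProducerNo>')
short_form = ET.fromstring('<ProducerNo no="h1234"/>')
# Same tag, same attributes, no text: both forms describe the same element.
print((long_form.tag, long_form.attrib, long_form.text)
      == (short_form.tag, short_form.attrib, short_form.text))  # True

print(parses('<ProducerNo no="h1234"></producerno>'))  # False: case-sensitive names
print(ET.fromstring("<PDA name='8210'/>").get("name"))  # 8210: single quotes are fine
print(parses("<PDA name=8210/>"))                       # False: value not quoted
print(parses("<1PDA/>"))                                 # False: name starts with a digit
```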

XML Document 3/3: Comments
  Can stretch across multiple lines
  Can appear between the start tag and the end tag of an element, and before or after the root element
Restrictions
  A comment within a tag is not allowed
  Nesting of comments is not allowed
  "--" within a comment is not allowed
    <!-- A comment may also comprise <tagnames> or &entities; -->
  (such tag names and entity references are not interpreted as markup inside a comment)

DTD 1/8: Purpose and Characteristics
A DTD defines vocabulary and grammar for a set of XML documents.
An XML document may reference only a single DTD (document type declaration, "DOCTYPE").
The document type declaration has to appear AFTER the XML declaration but BEFORE the root element.
A DTD does NOT DEFINE the root element of an XML document.
  Rather, the root element is defined within the XML document itself, using the DOCTYPE declaration.
  It can be any element declared in the DTD.
    PDACatalog1.XML:
    <?xml version="1.0"?>
    <!DOCTYPE PDACatalog ...>
    <PDACatalog>...
  The DTD itself (the definitions used by the document) resides in Catalog.DTD.
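The comment rules can be observed with Python's expat-based parser (a sketch; accepted is a hypothetical helper). ElementTree's default tree builder drops comments, so markup-like text inside a comment produces no child elements, while "--" inside a comment, nested comments, and comments inside a tag are rejected.

```python
import xml.etree.ElementTree as ET

def accepted(text):
    """Hypothetical helper: True if the non-validating expat parser accepts text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Tag names and entity references inside a comment are not interpreted:
# the comment is dropped, so <a> ends up without child elements.
root = ET.fromstring("<a><!-- may comprise <tagnames> or &entities; --></a>")
print(len(root))                                           # 0

print(accepted("<!-- before --><a/><!-- after -->"))      # True: around the root element
print(accepted("<a><!-- x -- y --></a>"))                 # False: "--" inside a comment
print(accepted("<a><!-- outer <!-- inner --> --></a>"))   # False: nesting produces "--"
print(accepted('<a <!-- c --> b="1"/>'))                  # False: comment inside a tag
```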

DTD 2/8: Incorporating DTDs into XML Documents (3 Alternatives)
1. External DTD, i.e., a dedicated file (*.dtd) identified by means of a URI ("external subset")
    <!DOCTYPE PDACatalog SYSTEM "Catalog.dtd">
2. Internal DTD, i.e., defined within the XML document ("internal subset")
    <!DOCTYPE PDACatalog [ ... ]>
3. External and internal DTD, i.e., the internal subset complements the external one
Excursus: URL vs. URI
  A URL (Uniform Resource Locator) identifies Internet resources on the basis of their location, using the Domain Name System (DNS)
  A URI (Uniform Resource Identifier) identifies arbitrary resources on the basis of their names (e.g., an ISBN) or other properties of the resource
  Every URL is a valid URI

DTD 3/8: Example Catalog.dtd
[Figure: UML class diagram of the catalog. PDACatalog contains any number (*) of Producer (attribute name); each Producer has exactly 1 ProducerNo (attribute no) and 1..* PDA (attribute name); each PDA has exactly 1 Weight and 1..* Price (attribute contract). Legend: classes are XML elements, class attributes are XML attributes; 1: exactly once; 1..*: once or several times; *: 0 or several times; aggregation: part-of]
    <!-- Catalog DTD Version 1.0 -->
    <!ELEMENT PDACatalog (Producer*)>
    <!ELEMENT Producer (ProducerNo, PDA+)>
    <!ATTLIST Producer name CDATA #REQUIRED>
    <!ELEMENT ProducerNo EMPTY>
    <!ATTLIST ProducerNo no ID #REQUIRED>
    <!ELEMENT PDA (Weight, Price+)>
    <!ATTLIST PDA name CDATA #REQUIRED>
    <!ELEMENT Weight (#PCDATA)>
    <!ELEMENT Price (#PCDATA)>
    <!ATTLIST Price contract (yes | no) "no">

DTD 4/8: Element Declaration
    <!ELEMENT elementname (content model)>
Sequence
    <!ELEMENT Producer (ProducerNo, PDA+)>
Alternative
    <!ELEMENT Battery (LiIo | NiMh | NiCd)>
Cardinality
  optional (0 or 1 times):
    <!ELEMENT PDA (Comment?)>
  zero or more times:
    <!ELEMENT PDACatalog (Producer*)>
  one or more times:
    <!ELEMENT Producer (PDA+)>
Content models can be nested by means of parentheses:
    <!ELEMENT div1 (head, (p | list | note)*, div2*)>

DTD 5/8: Element Declaration (Content Types)
Empty element: the element may contain attributes, but neither text nor subelements
    <!ELEMENT ProducerNo EMPTY>
Element content: the element contains subelements and optional attributes, but no text
    <!ELEMENT PDACatalog (Producer*)>
Mixed content: the element contains text and optionally subelements or attributes
    <!ELEMENT Price (#PCDATA)>
    <!ELEMENT Price (#PCDATA | Category | Discount)*>
Element with arbitrary content: the content is not specified exactly in the DTD; the elements used have to be declared anyway
    <!ELEMENT Comment ANY>
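DTD content models are essentially regular expressions over the sequence of child elements. The sketch below (our own illustration in Python; CONTENT_MODELS and children_match are hypothetical, and the models are translated by hand) maps "," to concatenation and "|" to alternation, while ?, * and + keep their meaning. Text content and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by one blank.
CONTENT_MODELS = {
    "PDACatalog": r"(Producer )*",        # (Producer*)
    "Producer": r"ProducerNo (PDA )+",    # (ProducerNo, PDA+)
    "PDA": r"Weight (Price )+",           # (Weight, Price+)
    "Battery": r"(LiIo |NiMh |NiCd )",    # (LiIo | NiMh | NiCd)
}

def children_match(element):
    """Check the sequence of child element names against the content model.
    Elements without a model in CONTENT_MODELS are not checked here."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True
    names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, names) is not None

producer = ET.fromstring(
    '<Producer name="nokia"><ProducerNo no="h1234"/>'
    '<PDA name="7110"><Weight>141g</Weight><Price>999</Price></PDA></Producer>'
)
print(all(children_match(e) for e in producer.iter()))              # True
print(children_match(ET.fromstring("<Producer><PDA/></Producer>")))  # False: ProducerNo missing
```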

DTD 6/8: Attribute Declaration
    <!ATTLIST elementname
              attributename1 type default
              attributename2 type default ... >
Attribute names must be unique within an element.
Default specifications
  mandatory ("NOT NULL"):  #REQUIRED
  optional value:          #IMPLIED
  default value:           [#FIXED] "value"

DTD 7/8: Attribute Declaration, 10 Types
CDATA: string
    <!ATTLIST Producer name CDATA #REQUIRED>
ID, IDREF(S)
  ID ensures the uniqueness of attribute values within a document
  Only one attribute of type ID is allowed per element
  IDREF is a reference to the value of an attribute of type ID
    <!ATTLIST Example identity  ID    #IMPLIED
                      reference IDREF #IMPLIED>
  Referential integrity is checked by a validating XML processor, but untyped: an IDREF may point to the ID of any element type
  Values of ID and IDREF(S) attributes must be valid XML names, i.e., they must not start with a digit
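The ID/IDREF checks of a validating processor can be sketched in Python (an assumption-laden illustration: ElementTree does not read attribute types from the DTD, so the hypothetical helper check_id_refs is told which attribute holds IDs and which holds IDREF(S), defaulting to the names from the Example declaration above). Note that the reference check is untyped, exactly as on the slide.

```python
import xml.etree.ElementTree as ET

def check_id_refs(root, id_attr="identity", idref_attr="reference"):
    """Hypothetical helper: report duplicate IDs and dangling IDREF(S).
    Returns a list of error messages (empty if the document is consistent)."""
    errors = []
    ids = set()
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in ids:
            errors.append(f"duplicate ID {value!r}")
        ids.add(value)
    for element in root.iter():
        # IDREFS holds whitespace-separated values; split() covers IDREF too.
        for ref in element.get(idref_attr, "").split():
            if ref not in ids:  # untyped: any element's ID satisfies the reference
                errors.append(f"dangling IDREF {ref!r}")
    return errors

doc = ET.fromstring('<r><Example identity="p1"/><Other reference="p1"/></r>')
print(check_id_refs(doc))  # []
```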

DTD 8/8: Attribute Declaration, 10 Types (continued)
Enumeration type: a pre-defined set of values consisting of XML name tokens
    <!ATTLIST Price contract (yes | no) "no">
ENTITY, ENTITIES: the attribute value is the name of a declared unparsed entity
    <!ATTLIST Image filename ENTITY #REQUIRED>
NMTOKEN(S): XML name tokens are an extended form of XML names; in addition, they may start with "0..9", "." and "-"
    <!ATTLIST journal year NMTOKEN #REQUIRED>
NOTATION: the attribute value is the name of a declared notation (seldom used)
    <!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>

Entities 1/9: Overview
Entities are referenceable, named parts of XML documents (plain text, markup or other arbitrary formats) or of a DTD
  Purpose: character replacement, macros, modularization
  Processing: references are expanded during parsing
Classification
  General entities: used in XML documents
    Pre-defined: replacement of XML-specific characters
    Unicode: replacement of non-ASCII characters
    User-defined: replacement of document parts; internal (embedded) or external (file); external ones are parsed or unparsed
  Parameter entities: used in DTDs; internal or external
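The difference between XML names (required for ID and IDREF values) and name tokens (NMTOKEN) lies only in the first character. The sketch below is a simplified approximation in Python regular expressions, assuming Python's Unicode-aware \w stands in for "letter or digit"; the real XML productions list exact Unicode ranges, so corner cases differ.

```python
import re

# Simplified: a name starts with a letter, "_" or ":"
# ([^\W\d] is a word character that is not a digit, i.e. a letter or "_").
NAME_START = r"(?:[^\W\d]|:)"
NAME_CHAR = r"[\w.:-]"  # letters, digits, "_", ".", ":", "-"

NAME = re.compile(rf"{NAME_START}{NAME_CHAR}*")
NMTOKEN = re.compile(rf"{NAME_CHAR}+")

def is_name(text):
    return NAME.fullmatch(text) is not None

def is_nmtoken(text):
    return NMTOKEN.fullmatch(text) is not None

for candidate in ["h1234", "1999", "-1", "a b"]:
    print(candidate, is_name(candidate), is_nmtoken(candidate))
# h1234 True True
# 1999 False True
# -1 False True
# a b False False
```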

Entities 2/9: Pre-defined Entities
Purpose: representation of XML-specific characters such as < and > ("escaping")
5 pre-defined entities
  &amp;   & (ampersand)
  &lt;    < (less than)
  &gt;    > (greater than)
  &apos;  ' (apostrophe)
  &quot;  " (quotation mark)
  Example: <formular>x &lt; y</formular>
Usage: as element value or attribute value
Alternative: CDATA section
  Example: <formular>x <![CDATA[<]]> y</formular>
  Within a CDATA section only its end (']]>') is recognized; the content is interpreted as plain text, NOT as markup
  CDATA sections cannot be nested

Entities 3/9: Unicode ("Character Encoding") Entities
Purpose: representation of characters that are not available on the keyboard (http://www.unicode.org/)
  Unicode classifies characters into letters, numbers, punctuation, symbols (general, technical, mathematical), etc.
  Unique assignment of characters to numbers
  Supports 25 living languages (Cyrillic, Hebrew, Hiragana, ...)
  All in all approx. 50,000 different characters (a figure from early Unicode versions; Unicode 6.0 (2010) already encodes more than 100,000 characters)
Usage: as element value or attribute value
  Arbitrary Unicode characters are referenced via their numbers (decimal or hexadecimal):
  &#251;, &#xFB; and û all represent the same character
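A short Python sketch (our own illustration with the standard library) of the escaping mechanisms above: xml.sax.saxutils.escape replaces &, < and > by their pre-defined entities; a pre-defined entity and a CDATA section parse to the same text; and decimal and hexadecimal character references resolve to the same character as the literal û.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > by &amp;, &lt; and &gt;.
print(escape("x < y & z"))                       # x &lt; y &amp; z

# A pre-defined entity and a CDATA section yield the same character data.
via_entity = ET.fromstring("<formular>x &lt; y</formular>").text
via_cdata = ET.fromstring("<formular>x <![CDATA[<]]> y</formular>").text
print(via_entity == via_cdata == "x < y")        # True

# Decimal and hexadecimal character references for U+00FB, then the literal.
print(ET.fromstring("<c>&#251;&#xFB;\u00fb</c>").text)   # ûûû
```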

Entities 4/9: User-defined Internal Entities
Text or well-formed markup is associated with a name
Declaration within the DTD:
    <!ENTITY entityname "replacement text or markup">
Usage: &entityname;
  as element value or attribute value of the XML document
  within other entities, but cyclic references are forbidden

Entities 5/9: User-defined External Parsed Entities
Purpose: decomposition of the XML document (similar to the SSI, Server Side Include, mechanism), because of the document's size or for reuse
Declaration within the DTD:
    <!ENTITY entityname SYSTEM "URI">
Characteristics
  In principle well-formed, but may contain multiple root elements
  A reference to a DTD is not allowed
Usage
  Syntax analogous to internal entities
  As element values of the XML document and within other entities; cyclic references are forbidden
  NOT within attribute values
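Python's expat-based parser expands internal entities declared in the internal DTD subset, so their behavior can be tried out directly (a sketch; the producer attribute and the helper expands are hypothetical, and no validation against a DTD takes place). The replacement markup becomes real child elements, and a cyclic reference is rejected.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE PDA [
  <!ENTITY maker "Nokia">
  <!ENTITY address "<town>Linz</town>">
]>
<PDA name="8210" producer="&maker;">&address;</PDA>"""

pda = ET.fromstring(DOC)
print(pda.get("producer"))   # Nokia: expanded inside an attribute value
print(pda.findtext("town"))  # Linz: the replacement markup became a child element

def expands(text):
    """Hypothetical helper: True if the document parses with all entities expanded."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

CYCLIC = """<!DOCTYPE r [
  <!ENTITY a "&b;">
  <!ENTITY b "&a;">
]>
<r>&a;</r>"""
print(expands(CYCLIC))       # False: cyclic references are rejected
```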

Entities 6/9: User-defined External Unparsed Entities
Purpose: references to files of arbitrary formats, e.g., ASCII, non-well-formed XML, GIF, JPEG, QuickTime movies
    <!ENTITY entityname SYSTEM "URI" NDATA formatname>
    <!NOTATION formatname SYSTEM "URI">
  NDATA defines an unparsed entity and specifies an arbitrary file format
  A NOTATION declaration is necessary to identify a corresponding application (via a URI) that is able to process files of this format
Usage
  Only as the value of an attribute of type ENTITY
  Syntax: the entity name within quotation marks (Note: NO &...;)
  The processor only informs the application that an unparsed entity exists at a certain location: no expansion!
  (More expressive) alternative: the W3C XLink standard

Entities 7/9: User-defined Entities, Example (Declaration and Usage)
    <?xml version="1.0"?>
    <!DOCTYPE PDACatalog SYSTEM "Catalog.dtd" [
      <!ENTITY linkNokia "http://www.nokia.de/8210">              <!-- internal -->
      <!ENTITY address "<town>Linz</town>">                       <!-- internal -->
      <!ENTITY features SYSTEM "feat8210.xml">                    <!-- external, parsed -->
      <!ENTITY bildnokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>  <!-- external, unparsed -->
      <!NOTATION jpeg SYSTEM "image/jpeg">
      <!ATTLIST Image filename ENTITY #REQUIRED>
    ]>
    <PDA name="8210">
      <Picture><Image filename="bildnokia"/></Picture>  <!-- usage as attribute value -->
      <ProducerInfo>&linkNokia;</ProducerInfo>            <!-- usage as element value -->
      &features;
      &address;
    </PDA>
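The three kinds of general entities in the example can be told apart from the declarations alone. The sketch below uses the standard-library expat binding and its EntityDeclHandler, which receives the replacement text for internal entities and a notation name for unparsed ones; classify_entities is a hypothetical helper, and the document is reduced to an internal subset (no external Catalog.dtd) so it runs standalone.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE PDA [
  <!ENTITY linkNokia "http://www.nokia.de/8210">
  <!ENTITY features SYSTEM "feat8210.xml">
  <!ENTITY bildnokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ATTLIST Image filename ENTITY #REQUIRED>
]>
<PDA name="8210"><Picture><Image filename="bildnokia"/></Picture></PDA>"""

def classify_entities(text):
    """Hypothetical helper: classify each declared general entity."""
    kinds = {}

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        if is_parameter:
            return  # parameter entities are not general entities
        if value is not None:
            kinds[name] = "internal"  # replacement text given inline
        elif notation is None:
            kinds[name] = "external parsed"
        else:
            kinds[name] = f"external unparsed ({notation})"

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(text, True)
    return kinds

print(classify_entities(DOC))
# {'linkNokia': 'internal', 'features': 'external parsed', 'bildnokia': 'external unparsed (jpeg)'}
```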

Entities 8/9: Parameter Entities
Purpose: modularization of DTDs
Syntactic difference to general entities
  "%" followed by a blank in the declaration
  "%" without a blank in the usage
Used to define ...
  names and content models of elements
  attribute declarations
Internal:
    <!ENTITY % Battery "(type, capacity)" >
    <!ELEMENT PDABatt %Battery;>
    <!ELEMENT camcorderbatt %Battery;>
  (Parameter entity references inside declarations, as here, are only allowed in the external DTD subset; in the internal subset they may only appear between declarations.)
External:
    <!ENTITY % linknokia SYSTEM "http://nokia.de" >
    %linknokia;

Entities 9/9: Overriding Parameter Entities
A parameter entity defined within an external DTD can be overridden within the internal DTD subset of an XML document: the internal subset is processed before the external subset, and the first declaration of an entity is binding.
This allows the external DTD to be adapted to the requirements of single XML documents without having to change the external DTD.
Thus, the parameter entity serves as a kind of "customization hook".
External DTD:
    <!ENTITY % residental_content "address,rooms">
Internal DTD of an XML document:
    <!ENTITY % residental_content "address,rooms,baths">
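The rule behind overriding, "the first declaration is binding", can be observed with Python's expat-based parser. This sketch is a stand-in under stated assumptions: the standard library does not load external DTDs by default, so it declares a general entity (not a parameter entity) twice in one internal subset; the XML 1.0 rule applies to both kinds, and the entity name content is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE rooms [
  <!ENTITY content "address,rooms,baths">
  <!ENTITY content "address,rooms">
]>
<rooms>&content;</rooms>"""

# The second declaration is ignored: the first one encountered is binding.
print(ET.fromstring(DOC).text)  # address,rooms,baths
```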

Outline
  Introduction
  XML 1.0
  Namespaces
  XML Schema

Namespaces 1/5
An XML namespace (NS) allows a globally unique identification of elements and attributes
  W3C Recommendation "Namespaces in XML", 14 Jan 1999 (13 pages)
For this, the elements and attributes of a domain (e.g., MathML) are assigned to one or more NS
  XSL, e.g., uses different namespaces for XSLT and XSL-FO
A NS is represented by a URI
  The URI need not refer directly to the corresponding vocabulary
  Thus, it provides a level of indirection that decouples the location of the vocabulary from its unique identifier, the URI
When used, the associated elements and attributes have to be qualified by means of this URI, which makes them globally unique
This allows the reuse and especially the combination ("mixture") of different vocabularies

Namespaces 2/5: NS with Prefix vs. Default NS
BUT: URIs cannot be used for direct qualification, since URIs normally contain characters that are not allowed in valid XML names (e.g., "/", "&")
Instead, user-defined prefixes have to be used
  One or more NS are declared by means of the pre-defined attribute xmlns (xmlns:prefix="URI")
  This attribute can be specified on any element of the XML document
  The name of the element on which the NS has been declared, as well as its direct and indirect subelements and their attributes, can be qualified with the NS (NS inheritance)
Default NS
  Also declared via the pre-defined attribute xmlns, BUT only one per element and without a prefix
  Unqualified subelements are automatically associated with the default NS, attributes are NOT
  Can be overridden within subelements

Namespaces 3/5: Declaration and Usage
    <edi:hc xmlns:edi='http://ecommerce.org/schema'
            xmlns='http://www.mobildev.com/schema'>
      <model name="8210">
        <edi:price edi:units='euro'>32.18</edi:price>
        <price währung='usd'>25.16</price>
        ...
      </model>
      ...
    </edi:hc>
  xmlns:edi, the pre-defined attribute with the optional prefix edi, declares the NS with the URI http://ecommerce.org/schema; xmlns without a prefix declares the default NS
  The NS of the element edi:price is http://ecommerce.org/schema
  The NS of the elements model and price is the default NS http://www.mobildev.com/schema
  The attributes name and währung have NO NS associated with them
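Python's ElementTree resolves prefixes to namespace URIs while parsing and writes qualified names as {URI}localname, so the statements above can be checked directly (a sketch; the prefix map NS and its key "m" for the default namespace are our own choice, since ElementTree's find() needs some prefix to refer to it).

```python
import xml.etree.ElementTree as ET

DOC = """<edi:hc xmlns:edi='http://ecommerce.org/schema'
        xmlns='http://www.mobildev.com/schema'>
  <model name="8210">
    <edi:price edi:units='euro'>32.18</edi:price>
    <price währung='usd'>25.16</price>
  </model>
</edi:hc>"""

# Our own prefix map for find(); "m" is a hypothetical alias for the default NS.
NS = {"edi": "http://ecommerce.org/schema", "m": "http://www.mobildev.com/schema"}

root = ET.fromstring(DOC)
model = root.find("m:model", NS)
edi_price = model.find("edi:price", NS)
price = model.find("m:price", NS)

print(root.tag)          # {http://ecommerce.org/schema}hc
print(model.attrib)      # {'name': '8210'}: unprefixed attribute, no NS
print(edi_price.attrib)  # {'{http://ecommerce.org/schema}units': 'euro'}
print(price.attrib)      # {'währung': 'usd'}
```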

Namespaces 4/5: ... and DTDs
NS are in principle independent of DTDs: they can be used in documents with or without DTDs
BUT: all elements and attributes that are qualified in the XML document must also be declared, including their prefixes, within the DTD
  Huge overhead, since DTDs are not aware of NS:
    <edi:hc>...       requires   <!ELEMENT edi:hc (...)>
    <edi:price>...    requires   <!ELEMENT edi:price (#PCDATA)>
What can be done is to specify a default NS within the DTD:
    <!ATTLIST edi:hc xmlns CDATA #FIXED 'http://www.mobildev.com/schema'>

Namespaces 5/5: Exemplary NS URIs (compared case-sensitively)
  RDF:     http://www.w3.org/1999/02/22-rdf-syntax-ns#
           http://www.w3.org/2000/01/rdf-schema#
  MathML:  http://www.w3.org/1998/Math/MathML
  XHTML:   http://www.w3.org/1999/xhtml
  SMIL:    http://www.w3.org/TR/REC-smil
  XSL:     http://www.w3.org/1999/XSL/Transform
           http://www.w3.org/1999/XSL/Format

Outline
  Introduction
  XML 1.0
  Namespaces
  XML Schema: Introduction, Elements and Attributes, Pre-defined Datatypes, User-defined Datatypes, Keys, Schema Composition, Schema Modeling Styles, Comparison DTD vs. XML Schema

Introduction: DTD versus XML Schema 1/2
XML Schema: definition of the structure of XML documents
  W3C Recommendation May 2001, approx. 420 pages
  W3C Recommendation 2nd edition, October 2004
Drawbacks of DTDs
  Proprietary (non-XML) syntax
  Few datatypes, in fact only one: string
  Global definition of elements only
  Parameter entities needed for modularization and overriding
  ID, IDREF(S): severe restrictions
Advantages of XML Schema
  XML as syntax
  Numerous pre-defined datatypes
  User-defined simple and complex datatypes
  Inheritance
  Keys, references: flexible concept

Introduction: DTD versus XML Schema 2/2
Catalog.xsd:
    <?xml version="1.0"?>
    <schema ...>
      <simpleType name="ProducerNoType"> ...
      <element name="PDACatalog">
        <complexType>
          <sequence>
            <element name="Producer" minOccurs="0" maxOccurs="unbounded">
              <complexType>
                <sequence>
                  <element name="ProducerNo" type="hc:ProducerNoType" minOccurs="1" maxOccurs="1"/>
                  <element name="PDA" minOccurs="1" maxOccurs="unbounded">
                    <complexType>
                      <sequence>
                        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
                        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
                      </sequence>
                      ...
    </schema>
Catalog.dtd:
    ...
    <!ELEMENT PDACatalog (Producer*)>
    <!ELEMENT Producer (ProducerNo, PDA+)>
    <!ELEMENT PDA (Weight, Battery)>
    <!ELEMENT Weight (#PCDATA)>
    <!ELEMENT Battery (#PCDATA)>
    ...

Introduction: Declaration of Namespaces in the Schema
Namespace for the own vocabulary
  The namespace (NS) of the vocabulary to be defined can be declared by means of the attribute targetNamespace (optional!)
NS of the XML Schema standard vocabulary
  Declaration is obligatory!
Additional NS (i.e., vocabularies) can be incorporated
A single NS can be defined as default NS: either the own NS, the XML Schema NS or another NS
  For all other NS used, a prefix is obligatory
    <?xml version="1.0"?>
    <schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
            xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
            xmlns="http://www.w3.org/2001/XMLSchema"
            attributeFormDefault="qualified"
            elementFormDefault="qualified">
    ...

Introduction: Usage of NS in the XML Document
The schema of an XML document is specified within the root element via the attribute xsi:schemaLocation:
  1st part: targetNamespace of the schema
  2nd part: location of the schema document
Catalog1.xml (Catalog.xsd declares its namespaces as on the previous slide):
    <?xml version="1.0"?>
    <PDACatalog xmlns="http://www.ifs.uni-linz.ac.at/hc"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd">
    ...
For a schema without targetNamespace:
    xsi:noNamespaceSchemaLocation="directpathtoxsd_file"

Elements and Attributes 1/3: Global / Local Definition
Element (simple or complex type):
    <element name="name" type="type" minOccurs="int" maxOccurs="int|unbounded" ... />
  Cardinality: upper/lower bounds (only in local elements)
Attribute (simple type):
    <attribute name="name" type="type" use="how-its-used" default/fixed="value" ... />
  use: required, optional, prohibited (only in local attributes)
  default: only relevant if use is not set (XML Schema requires use="optional", the default, whenever a default value is given)
Global definition: direct subelement of schema
  NOTE: the root element of the XML document has to be defined as a global element!
Local definition: definition on an arbitrary nesting level
Analogously for datatypes!
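xsi:schemaLocation holds pairs of whitespace-separated tokens (namespace, then schema location). The Python sketch below reads those pairs with ElementTree; schema_locations is a hypothetical helper, and it only reads the hint, since the standard library does not validate against XML Schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(root):
    """Hypothetical helper: read xsi:schemaLocation as {namespace: location}."""
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))

DOC = """<PDACatalog xmlns="http://www.ifs.uni-linz.ac.at/hc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd"/>"""

root = ET.fromstring(DOC)
print(root.tag)                # {http://www.ifs.uni-linz.ac.at/hc}PDACatalog
print(schema_locations(root))  # {'http://www.ifs.uni-linz.ac.at/hc': 'Catalog.xsd'}
```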

Elements and Attributes 2/3: Global / Local Datatypes and References
Global or local datatypes:
    <element name="name" minOccurs="int" maxOccurs="int|unbounded" ...>
      <complexType> ... </complexType>
    </element>
    <attribute name="name" use="how-its-used" default/fixed="value" ...>
      <simpleType> ... </simpleType>
    </attribute>
Reference to an existing element or attribute:
    <element ref="name" minOccurs="int" maxOccurs="int|unbounded" .../>
    <attribute ref="name" use="how-its-used" default/fixed="value" .../>

Elements and Attributes 3/3: Summarizing Example Global/Local
    <schema ...>
      <element name="Producer">                                  <!-- global element, local datatype -->
        <complexType>
          <sequence>
            <element name="ProducerNo" type="hc:ProducerNoType"
                     minOccurs="1" maxOccurs="1"/>               <!-- local element, global datatype -->
            <element ref="hc:PDA" maxOccurs="unbounded"/>        <!-- reference to a global element -->
          </sequence>
          <attribute name="name" type="string" use="required"/>  <!-- local attribute, pre-defined datatype -->
        </complexType>
      </element>
      <element name="PDA">                                       <!-- global element, local datatype -->
        <complexType>
          <sequence>
            <element name="Weight" type="string"/>               <!-- local element, pre-defined datatype -->
            <element name="Battery" type="string"/>
          </sequence>
        </complexType>
      </element>
      <simpleType name="ProducerNoType"> ...
Orthogonality of concepts: whether an element or attribute is defined globally or locally is independent of whether its datatype is global, local or pre-defined.

Pre-Defined Datatypes 1/4: Type Hierarchy (W3C REC, 28 Oct 2004)
anyType
  all complex types
  anySimpleType
    primitive (atomic) types: string, boolean, float, double, decimal, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
    derived types (by restriction, or by list where noted):
      string → normalizedString → token → language, NMTOKEN (list: NMTOKENS), Name
      Name → NCName → ID, IDREF (list: IDREFS), ENTITY (list: ENTITIES)
      decimal → integer → nonPositiveInteger → negativeInteger
      integer → long → int → short → byte
      integer → nonNegativeInteger → unsignedLong → unsignedInt → unsignedShort → unsignedByte
      nonNegativeInteger → positiveInteger

Pre-Defined Datatypes 2/4: String Datatypes
  string: string datatype without whitespace replacement
  hexBinary, base64Binary: binary, string-encoded datatypes
  anyURI
  NOTATION
  QName: qualified name; supports the usage of names with an NS prefix
  normalizedString: normalized string with whitespace replacement; each tab, linefeed and carriage return is replaced by a blank
  token: "tokenized" string; all whitespace characters are replaced by blanks, leading and trailing blanks are deleted, and multiple consecutive blanks are replaced by a single one
  language: standardized language codes (e.g., en, en-US, de, de-DE)
  NMTOKEN: name token, a string without blanks (e.g., "CMS", "234234")
  NMTOKENS: for backward-compatibility reasons, should be used only as attribute types
  Name: XML name, must start with a letter, "_" or ":" (e.g., "CMS", "_1")
  NCName: name without prefix (no ":")
  ID, IDREF, IDREFS, ENTITY, ENTITIES: backward compatibility with DTDs
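The whitespace handling of normalizedString and token corresponds to the whiteSpace facet values "replace" and "collapse". A minimal Python sketch of both (function names are our own); as in XML Schema, only tab, linefeed, carriage return and blank count as whitespace.

```python
import re

def replace_whitespace(text):
    """whiteSpace="replace" (normalizedString): tab, LF and CR become blanks."""
    return re.sub(r"[\t\n\r]", " ", text)

def collapse_whitespace(text):
    """whiteSpace="collapse" (token): replace, then squeeze runs of blanks
    into one blank and strip leading and trailing blanks."""
    return re.sub(r" {2,}", " ", replace_whitespace(text)).strip(" ")

print(repr(replace_whitespace("a\tb\nc")))        # 'a b c'
print(repr(collapse_whitespace("  a \t\n b  ")))  # 'a b'
```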

Pre-Defined Datatypes 3/4: Numerical Datatypes
Pre-defined primitive types (derived directly from anySimpleType):
- boolean
- float, double: floating-point numbers with single (32-bit) and double (64-bit) precision
- decimal: decimal numbers; decimal separator ".", optional sign "+" or "-"
Pre-defined derived types (all derived from decimal via integer):
- integer → nonPositiveInteger → negativeInteger
- integer → nonNegativeInteger → positiveInteger
- integer → long → int → short → byte: 64, 32, 16 and 8 bit
- nonNegativeInteger → unsignedLong → unsignedInt → unsignedShort → unsignedByte: 64, 32, 16 and 8 bit

Pre-Defined Datatypes 4/4: Date and Time Datatypes (all derived directly from anySimpleType)
- duration: "PnYnMnDTnHnMnS"
- dateTime: "CCYY-MM-DDThh:mm:ss"
- time: "hh:mm:ss"
- date: "CCYY-MM-DD"
- gYearMonth: "CCYY-MM"
- gYear: "CCYY"
- gMonthDay: "--MM-DD" (day of the year)
- gDay: "---DD" (day of the month)
- gMonth: "--MM" (month of the year)
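A few numerical and date/time types can be combined in one declaration to show their lexical forms. A minimal sketch, assuming a hypothetical element `Release` with hypothetical attribute names in the hypothetical namespace `urn:example:hc`:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <element name="Release">
    <complexType>
      <!-- unsignedByte: 8 bit, values 0 to 255 -->
      <attribute name="revision" type="unsignedByte"/>
      <!-- decimal: e.g. "199.90", "+199.90" or "-0.5" -->
      <attribute name="price" type="decimal"/>
      <!-- date: CCYY-MM-DD -->
      <attribute name="launch" type="date"/>
      <!-- gYearMonth: CCYY-MM -->
      <attribute name="since" type="gYearMonth"/>
      <!-- duration: PnYnMnDTnHnMnS, e.g. P2Y or PT30M -->
      <attribute name="warranty" type="duration"/>
    </complexType>
  </element>
</schema>
```

An element such as `<Release xmlns="urn:example:hc" revision="3" price="199.90" launch="2005-03-01" since="2005-03" warranty="P2Y"/>` would be valid against this sketch, whereas `revision="256"` would exceed the range of unsignedByte.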

User-defined Datatypes: Alternatives
Should the type contain elements or attributes?
- No: unstructured content → <simpleType>; derivation via <restriction>, <union> or <list>
- Yes: structured content → <complexType>. Should the type contain elements?
  - Yes (attributes and elements, also empty or mixed content): nesting via <sequence>, <all> or <choice>; derivation via <complexContent> with <restriction> or <extension>
  - No (attributes only, with simple text content): <simpleContent>; derivation via <restriction> or <extension>
Note: <complexContent> is only necessary in case of derivation from a user-defined type.
Each type can be named or anonymous.

User-defined Datatypes: Alternatives, Examples
- Simple, no derivation: pre-defined type, e.g. xsd:integer
- Complex, no derivation: anonymous user-defined type

    <xsd:complexType>
      <xsd:sequence>...</xsd:sequence>
    </xsd:complexType>

- Simple, derivation: named user-defined type

    <xsd:simpleType name="LongitudeType">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="-180"/>
        <xsd:maxInclusive value="180"/>
      </xsd:restriction>
    </xsd:simpleType>

- Complex, derivation: named user-defined type

    <xsd:complexType name="BookTypeWithID">
      <xsd:complexContent>
        <xsd:extension base="BookType">
          <xsd:attribute name="id" type="xsd:token"/>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
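The branch "attributes only, with simple text content" is the one case not covered by the examples above. A minimal sketch of a <simpleContent> type, assuming the hypothetical type `WeightType`, attribute `unit` and namespace `urn:example:hc`; it extends a pre-defined type, so no <complexContent> is involved.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="urn:example:hc"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- Text content of a pre-defined simple type plus an attribute -->
  <complexType name="WeightType">
    <simpleContent>
      <extension base="decimal">
        <attribute name="unit" type="token" default="g"/>
      </extension>
    </simpleContent>
  </complexType>
  <element name="Weight" type="hc:WeightType"/>
</schema>
```

With this sketch, `<Weight unit="g">141</Weight>` is valid, while `<Weight>141g</Weight>` is not, because "141g" is not a decimal.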

User-defined Datatypes: Derived Simple Datatypes <simpleType>
- <restriction>: restriction of a pre-defined datatype
- <union>: union of several datatypes (extension); values must correspond to at least one of the combined datatypes
- <list>: list of values of one datatype; the item type must be atomic or a union, not itself a list

User-defined Datatypes: Derived Simple Datatypes <simpleType>, restriction
Alternative definition possibilities:
- reference an existing datatype via the attribute base
- local definition from scratch, using <simpleType> as a subelement of the <restriction> element
12 possible restrictions (facets), depending on the base datatype: length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, whiteSpace, totalDigits, fractionDigits

    <simpleType name="BatteryType">
      <restriction base="string">
        <enumeration value="NiMH"/>
        <enumeration value="NiCd"/>
        <enumeration value="LiIo"/>
      </restriction>
    </simpleType>
    <element name="Battery" type="hc:BatteryType"/>

XML document:

    <Battery>NiCd</Battery>
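The second definition possibility, a base type defined locally inside <restriction> instead of being referenced via base, looks like this. A minimal sketch, assuming a hypothetical `CapacityType` (capacity in mAh) in the hypothetical namespace `urn:example:hc`; the XML Schema namespace is the default namespace, as in the slides.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <simpleType name="CapacityType">
    <!-- no base attribute: the base type is the anonymous simpleType below -->
    <restriction>
      <simpleType>
        <restriction base="positiveInteger">
          <totalDigits value="4"/>
        </restriction>
      </simpleType>
      <!-- further facet applied on top of the local base type -->
      <maxInclusive value="5000"/>
    </restriction>
  </simpleType>
</schema>
```

Values from 1 to 5000 are accepted; "0" fails the positiveInteger base and "6000" fails maxInclusive.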

User-defined Datatypes: Derived Simple Datatypes <simpleType>, restriction with pattern
Restrictions using a <pattern> element restrict the lexical values by means of simple regular expressions:
- normal characters: "C&A" (written as "C&amp;A" inside an XML attribute value)
- character categories and blocks: "\p{IsBasicLatin}" (the Unicode block Basic Latin)
- sets of characters, including subtraction: "[\p{IsBasicLatin}-[\d]]" (Basic Latin characters except digits)
- quantifiers: "[a-zA-Z]{1,8}"
- parentheses and alternatives: "(XML(\s+|-))?Schema"
- combinations of these expressions
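Two of the expressions above can be wrapped into pattern facets. A minimal sketch, assuming the hypothetical type names `SchemaNameType` and `ShortNameType` and the hypothetical namespace `urn:example:hc`. XML Schema patterns always match the whole value (there are no ^ and $ anchors), and the comparison is case-sensitive.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- Accepts "Schema", "XML Schema", "XML   Schema" and "XML-Schema" -->
  <simpleType name="SchemaNameType">
    <restriction base="string">
      <pattern value="(XML(\s+|-))?Schema"/>
    </restriction>
  </simpleType>
  <!-- One to eight ASCII letters, e.g. "Nokia" -->
  <simpleType name="ShortNameType">
    <restriction base="string">
      <pattern value="[a-zA-Z]{1,8}"/>
    </restriction>
  </simpleType>
</schema>
```

Because the whole value must match, "XML Schema 1.0" and "xml schema" are rejected by `SchemaNameType`.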

User-defined Datatypes: Derived Simple Datatypes <simpleType>, union/list
Alternative definition possibilities:
- reference existing datatypes via the attributes memberTypes (union) or itemType (list)
- local definition from scratch, using <simpleType> as a subelement of the <union> or <list> element

    <simpleType name="PDAFeatureType">
      <union memberTypes="hc:PDAColor hc:PDARobustness"/>
    </simpleType>
    <simpleType name="PDAFeatureListType">
      <list itemType="hc:PDAFeatureType"/>
    </simpleType>
    <element name="PDAFeatureList" type="hc:PDAFeatureListType"/>

XML document (list items are separated by whitespace):

    <PDAFeatureList>blue waterproof shockproof</PDAFeatureList>

User-defined Datatypes: <complexType>, Nested Elements / Attributes / Empty / Mixed
- Nested elements: possible within a complex datatype only
- Attributes: possible within a complex datatype only; independent of the existence of nested elements
- Empty content: possible within a complex datatype only; the type has no nested elements
- Mixed content: the datatype may contain both nested elements and text; in contrast to DTDs, the ordering and cardinality of the nested elements can be specified freely
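The union example refers to the member types `hc:PDAColor` and `hc:PDARobustness`, which the slide does not define. A self-contained sketch, assuming hypothetical enumerations for both member types (chosen so that the instance "blue waterproof shockproof" is valid), the hypothetical namespace `urn:example:hc`, and a hypothetical further restriction of the list type:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="urn:example:hc"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- Hypothetical member types of the union -->
  <simpleType name="PDAColor">
    <restriction base="string">
      <enumeration value="blue"/>
      <enumeration value="silver"/>
    </restriction>
  </simpleType>
  <simpleType name="PDARobustness">
    <restriction base="string">
      <enumeration value="waterproof"/>
      <enumeration value="shockproof"/>
    </restriction>
  </simpleType>
  <!-- Union: a value is valid if it is valid for at least one member type -->
  <simpleType name="PDAFeatureType">
    <union memberTypes="hc:PDAColor hc:PDARobustness"/>
  </simpleType>
  <!-- List: the item type is a union, which is allowed (a list of lists is not) -->
  <simpleType name="PDAFeatureListType">
    <list itemType="hc:PDAFeatureType"/>
  </simpleType>
  <!-- On a list type, maxLength counts list items, not characters -->
  <simpleType name="ShortPDAFeatureListType">
    <restriction base="hc:PDAFeatureListType">
      <maxLength value="3"/>
    </restriction>
  </simpleType>
  <element name="PDAFeatureList" type="hc:PDAFeatureListType"/>
</schema>
```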

User-defined Datatypes: <complexType>, Nested Elements / Attributes
- Sequence <sequence>: nested elements in a fixed order

    <complexType name="PDAType">
      <sequence minOccurs="1" maxOccurs="1">
        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
      </sequence>
      <attribute name="no" type="nonNegativeInteger" use="required"/>
    </complexType>

- Choice <choice>: one of the nested elements is chosen
- Arbitrary ordering <all>: nested elements can be used in arbitrary order (in XML Schema 1.0, each at most once)
- Cardinality: expressed by means of minOccurs and maxOccurs (default value 1)

User-defined Datatypes: <complexType>, Mixed Content

    <complexType name="PDAType" mixed="true">
      <sequence>
        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
      </sequence>
    </complexType>
    <element name="PDA" type="hc:PDAType"/>

XML document:

    <PDA>Type Nokia 7110 has <Weight>141g</Weight> and <Battery>900mAh</Battery></PDA>
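<choice>, <all> and empty content are only named on the slides. A minimal sketch of all three, assuming the hypothetical type names `PowerSupplyType`, `UnorderedPDAType` and `DisplayType`, the hypothetical elements `Adapter` and `Color`, and the hypothetical namespace `urn:example:hc`:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- choice: exactly one of Battery or Adapter -->
  <complexType name="PowerSupplyType">
    <choice>
      <element name="Battery" type="string"/>
      <element name="Adapter" type="string"/>
    </choice>
  </complexType>
  <!-- all: Weight and Battery in any order, Color optional; each at most once -->
  <complexType name="UnorderedPDAType">
    <all>
      <element name="Weight" type="string"/>
      <element name="Battery" type="string"/>
      <element name="Color" type="string" minOccurs="0"/>
    </all>
  </complexType>
  <!-- empty content: an attribute, but no nested elements and no text -->
  <complexType name="DisplayType">
    <attribute name="colors" type="positiveInteger"/>
  </complexType>
</schema>
```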

User-defined Datatypes: <complexType>, Derivation of Complex Types
- Extension <extension>: additional nested elements and/or attributes
- Restriction <restriction>: restriction of the domain or the cardinality
- Abstract datatypes: <complexType> with attribute abstract="true"
- Prohibition of derivation: <complexType> with attribute final; possible values: #all, restriction, extension

User-defined Datatypes: <complexType>, Derivation via Extension
- new elements are appended at the end of the base type's content
- the extension must be specified within a <complexContent> element
ExtendedPDAType is derived from PDAType:

    <complexType name="ExtendedPDAType">
      <complexContent>
        <extension base="hc:PDAType">
          <sequence>
            <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
            <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
          </sequence>
        </extension>
      </complexContent>
    </complexType>
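The attributes abstract and final can be sketched as follows, assuming the hypothetical types `DeviceType` and `PhoneType`, the hypothetical element `Device` and the hypothetical namespace `urn:example:hc`. An element whose declared type is abstract can only be used together with xsi:type naming a non-abstract derived type.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="urn:example:hc"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- abstract="true": no element may be validated directly against DeviceType -->
  <complexType name="DeviceType" abstract="true">
    <sequence>
      <element name="Weight" type="string"/>
    </sequence>
  </complexType>
  <!-- final="extension": PhoneType may still be restricted, but not extended -->
  <complexType name="PhoneType" final="extension">
    <complexContent>
      <extension base="hc:DeviceType">
        <sequence>
          <element name="Band" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <element name="Device" type="hc:DeviceType"/>
</schema>
```

Under this sketch, `<Device xmlns="urn:example:hc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="PhoneType"><Weight>141g</Weight><Band>Dualband</Band></Device>` is valid; the same element without xsi:type is not.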

User-defined Datatypes: <complexType>, Derivation via Restriction
- the declarations of the base datatype that are to be retained must be repeated; declarations that are required in the base type (minOccurs of 1 or more) cannot be omitted
- the restriction must be specified within a <complexContent> element
RestrictedPDAType is derived from ExtendedPDAType, which in turn is derived from PDAType:

    <complexType name="RestrictedPDAType">
      <complexContent>
        <restriction base="hc:ExtendedPDAType">
          <sequence>
            <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
            <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
            <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
            <element name="Feature" type="string" minOccurs="1" maxOccurs="5"/>
          </sequence>
        </restriction>
      </complexContent>
    </complexType>

User-defined Datatypes: <complexType>, Two Usage Possibilities
- Static: element PDA has datatype PDAType

    <PDA>
      <Weight>141g</Weight>
      <Battery>900mAh</Battery>
    </PDA>

- Dynamic: the derived datatype is selected within the XML document via the attribute type of the XML Schema Instance (xsi) namespace; the prefix xsi must be bound to http://www.w3.org/2001/XMLSchema-instance

    <PDA xsi:type="ExtendedPDAType">
      <Weight>115g</Weight>
      <Battery>550mAh</Battery>
      <Band>Dualband</Band>
      <Feature>Waterproof</Feature>
    </PDA>

Datatype ExtendedPDAType is derived from PDAType by extension with Band and Feature.

Keys 1/2
Characteristics of a key (<key>):
- the value (or value combination) must be unique
- the value must exist
- the key must be defined as a subelement of an element declaration, following the type definition
Candidates for key fields (<field>):
- elements with simple datatypes only
- attributes
- combinations of elements and attributes
The scope can be defined (<selector>).
References to a key can be defined (<keyref>).
Elements, attributes and combinations thereof can be declared unique (<unique>):
- the value (combination) must be unique
- the value need NOT exist

Keys 2/2
Example: PDA (name, version, weight, ...) is referenced by PDAQuantity (name, version, quantity).

    <element name="PDACatalog">
      <complexType>...</complexType>
      <key name="typeKey">
        <selector xpath="hc:Producer/hc:PDA"/>
        <field xpath="@name"/>
        <field xpath="@version"/>
      </key>
      <keyref name="refToTypeKey" refer="hc:typeKey">
        <selector xpath="hc:Stock/hc:PDAQuantity"/>
        <field xpath="@name"/>
        <field xpath="@version"/>
      </keyref>
    </element>

    <element name="PDACatalog">
      <complexType>...</complexType>
      <unique name="uniqueProducerNo">
        <selector xpath="hc:Producer"/>
        <field xpath="@producerno"/>
      </unique>
    </element>
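An instance document that exercises the key, keyref and unique constraints could look like this. It is a sketch under assumptions: the elided `<complexType>...</complexType>` of PDACatalog is assumed to allow Producer elements containing PDA elements, plus a Stock element containing PDAQuantity elements; the namespace `urn:example:hc`, the PDA children and the attribute `quantity` are hypothetical.

```xml
<!-- Hypothetical instance; element content of the elided types is assumed -->
<PDACatalog xmlns="urn:example:hc">
  <Producer producerno="1">
    <!-- key (name, version): each combination must occur exactly once -->
    <PDA name="7110" version="1">
      <Weight>141g</Weight>
      <Battery>900mAh</Battery>
    </PDA>
    <PDA name="7110" version="2">
      <Weight>115g</Weight>
      <Battery>550mAh</Battery>
    </PDA>
  </Producer>
  <Stock>
    <!-- keyref: (7110, 2) matches an existing key value -->
    <PDAQuantity name="7110" version="2" quantity="40"/>
    <!-- (7110, 3) would violate the keyref, since no PDA has that key:
         <PDAQuantity name="7110" version="3" quantity="5"/> -->
  </Stock>
</PDACatalog>
```

A PDA without a version attribute would violate the key (the value must exist), whereas a Producer without producerno does not violate the unique constraint.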

Schema Composition: Within a Schema 1/2, Group of Elements

    <group name="MainData">
      <sequence>
        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
      </sequence>
    </group>
    <complexType name="PDAType">
      <sequence>
        <group ref="hc:MainData"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
      </sequence>
    </complexType>

Schema Composition: Within a Schema 2/2, Group of Attributes

    <attributeGroup name="BatteryAttributeGroup">
      <attribute name="type" type="string" default="NiMH"/>
      <attribute name="capacity" type="string" use="required"/>
    </attributeGroup>
    <complexType name="BatteryType">
      <sequence>...</sequence>
      <attributeGroup ref="hc:BatteryAttributeGroup"/>
    </complexType>

Schema Composition: Different Schemata 1/2
- other schemata are incorporated via <include>, <redefine> and <import>
- these elements must be subelements of <schema> and must precede all other declarations
Include of a schema (<include>):
- the target namespace of the included schema must be equal to the target namespace of the including schema, or the included schema must have no namespace at all
- the included components can be used as if they were declared directly within the including schema

    <schema ...>
      <include schemaLocation="pda.xsd"/>
      ...

Schema Composition: Different Schemata 2/2
Including and redefining a schema (<redefine>):
- same functionality as <include>
- in addition, included components (simpleType, complexType, group, attributeGroup) can be newly defined; the new definitions replace the original ones

    <redefine schemaLocation="pda.xsd">
      <complexType name="PDAType">...</complexType>
      ...
    </redefine>
    ...

Import of a schema (<import>):
- the imported schema can have an arbitrary namespace (different from the current one) or none

    <import namespace="http://www.somewhere.else.com" schemaLocation="producer.xsd"/>
    ...
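Inside <redefine>, a new definition of a complex type must be derived from the original definition of the same name (it refers to itself as base). A minimal sketch, assuming that pda.xsd declares PDAType in the hypothetical target namespace `urn:example:hc` and that producer.xsd has the target namespace http://www.somewhere.else.com; the added element `Color` is hypothetical.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="urn:example:hc"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- composition elements come first, before any other declaration -->
  <redefine schemaLocation="pda.xsd">
    <!-- the new PDAType extends the original PDAType from pda.xsd -->
    <complexType name="PDAType">
      <complexContent>
        <extension base="hc:PDAType">
          <sequence>
            <element name="Color" type="string" minOccurs="0"/>
          </sequence>
        </extension>
      </complexContent>
    </complexType>
  </redefine>
  <!-- components from another namespace -->
  <import namespace="http://www.somewhere.else.com" schemaLocation="producer.xsd"/>
</schema>
```

Every use of hc:PDAType, in pda.xsd as well as in this schema, then refers to the redefined type.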

Schema Modeling Styles: Non-Normative Data Model of XML Schema Concepts
(Figure with legend; source: http://www.w3.org/TR/xmlschema-1/)

Schema Modeling Styles: XML Schema Concepts in Practice
Analysis of 1400 schemata of diverse standard vocabularies: Open Travel Alliance (OTA), Human Resource XML (HR-XML), W3C, Global Justice XML, etc.
Source: P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html

Schema Modeling Styles: Relationships, Global vs. Local, Element vs. Type
- Relationships: realized by means of nesting or via references
- Global element/attribute declarations: prerequisite for reuse in the same or another schema; the root element must be global
- Local element/attribute declarations: appropriate when a declaration makes sense only in combination with the declaring type
  - local element declarations: elements with the same name but different structure can occur in different types
  - local attribute declarations: sensible, since attributes are usually tightly coupled to their elements
Three stereotypical design forms:
- Russian Doll Design
- Salami Slice Design
- Venetian Blinds Design
Literature: XML Schema Best Practices (Roger Costello), www.xfront.com; P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html

Schema Modeling Styles: Russian Doll Design
- nested element declarations; local declarations only; no global types
- Advantages: the structure is obvious (it corresponds to the XML document's structure); side effects are prevented
- Disadvantages: danger of deep nesting levels; no reuse of declarations, hence redundancies; no extensibility in terms of derivation
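A small catalog schema in Russian Doll style might look as follows. A minimal sketch, assuming the element names from the keys example and the hypothetical namespace `urn:example:hc`: there is one global element, and all other declarations and types are nested and anonymous.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- the only global declaration; everything else is local -->
  <element name="PDACatalog">
    <complexType>
      <sequence>
        <element name="Producer" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <element name="PDA" maxOccurs="unbounded">
                <complexType>
                  <sequence>
                    <element name="Weight" type="string"/>
                    <element name="Battery" type="string"/>
                  </sequence>
                  <attribute name="name" type="string" use="required"/>
                </complexType>
              </element>
            </sequence>
            <attribute name="name" type="string" use="required"/>
          </complexType>
        </element>
      </sequence>
    </complexType>
  </element>
</schema>
```

The nesting mirrors the document structure, but neither the PDA structure nor its type can be reused or derived from elsewhere.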

Schema Modeling Styles: Salami Slice Design
- global element declarations, used via references (ref attribute); each global element can be a root element
- Advantages: reuse of elements
- Disadvantages: large number of global elements, which is confusing; danger of side effects in case of changes to global elements; no extensibility in terms of derivation

Schema Modeling Styles: Venetian Blinds Design
- global type declarations; all elements except the root element are declared locally
- Advantages: reuse of types; a named type is available for each element and attribute; types can be imported from other schemata; extensibility by derivation and <redefine>
- Disadvantages: large number of global types, which is confusing
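The same catalog in Venetian Blinds style replaces the nested anonymous types by named global types. A minimal sketch, assuming the hypothetical type names `PDAType`, `ProducerType` and `PDACatalogType` and the hypothetical namespace `urn:example:hc`:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="urn:example:hc"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- named global types: reusable and derivable -->
  <complexType name="PDAType">
    <sequence>
      <element name="Weight" type="string"/>
      <element name="Battery" type="string"/>
    </sequence>
    <attribute name="name" type="string" use="required"/>
  </complexType>
  <complexType name="ProducerType">
    <sequence>
      <element name="PDA" type="hc:PDAType" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="name" type="string" use="required"/>
  </complexType>
  <complexType name="PDACatalogType">
    <sequence>
      <element name="Producer" type="hc:ProducerType" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <!-- the only global element, hence the only possible root -->
  <element name="PDACatalog" type="hc:PDACatalogType"/>
</schema>
```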

Schema Modeling Styles: Comparison
- Russian Doll Design: for restrictive structures; the structure of the XML documents is largely predefined by the schema
- Salami Slice Design: for flexible structures; the structure of the XML documents can vary strongly, since different root elements are possible
- Venetian Blinds Design: also for flexible structures; the structure of the XML documents can vary strongly if type inheritance is used
In practice: mixtures!

Schema Modeling Styles: Possible Mixture: Garden of Eden
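In the Garden of Eden style, both elements and types are declared globally, so it combines the global elements of the Salami Slice design with the named types of the Venetian Blinds design. A minimal sketch, assuming the hypothetical type names `PDAType` and `ProducerType` and the hypothetical namespace `urn:example:hc`:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="urn:example:hc"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- global elements, referenced via ref (Salami Slice) -->
  <element name="Weight" type="string"/>
  <element name="Battery" type="string"/>
  <!-- global named types (Venetian Blinds) -->
  <complexType name="PDAType">
    <sequence>
      <element ref="hc:Weight"/>
      <element ref="hc:Battery"/>
    </sequence>
  </complexType>
  <element name="PDA" type="hc:PDAType"/>
  <complexType name="ProducerType">
    <sequence>
      <element ref="hc:PDA" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <element name="Producer" type="hc:ProducerType"/>
</schema>
```

Every element in this sketch is global and can therefore be a root element, and every type can be reused or derived from.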

Comparison DTD vs. XML Schema: General Criteria
- Syntax: DTD proprietary; XML Schema XML
- Structure: DTD simple; XML Schema complex
- Namespaces: DTD not supported; XML Schema supported

Comparison DTD vs. XML Schema: Elements
- Default values: DTD not available; XML Schema default and fixed
- Definition of the content: DTD text, elements, mixed content; XML Schema simple types, complex types
- Defined order: DTD ","; XML Schema <sequence>
- Arbitrary order: DTD not available; XML Schema <all>
- Alternative: DTD "|"; XML Schema <choice>
- Cardinality: DTD "?", "*", "+"; XML Schema minOccurs and maxOccurs (more flexible)
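The correspondence between DTD content models and XML Schema can be seen on one element. A minimal sketch, assuming a hypothetical content model for PDA in the hypothetical namespace `urn:example:hc`; the DTD declaration is quoted in a comment.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- DTD equivalent: <!ELEMENT PDA (Weight, Battery?, Feature*)> -->
  <element name="PDA">
    <complexType>
      <!-- "," becomes sequence; "?" becomes minOccurs="0";
           "*" becomes minOccurs="0" maxOccurs="unbounded" -->
      <sequence>
        <element name="Weight" type="string"/>
        <element name="Battery" type="string" minOccurs="0"/>
        <element name="Feature" type="string" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>
```

A bounded cardinality such as minOccurs="2" maxOccurs="5" has no direct counterpart in DTD syntax, which is what "more flexible" refers to.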

Comparison DTD vs. XML Schema: Attributes
- Default values: DTD supported; XML Schema supported (default, fixed)
- Optionality: DTD supported (#REQUIRED, #IMPLIED); XML Schema supported (use)

Comparison DTD vs. XML Schema: Datatypes
- Pre-defined datatypes: DTD few datatypes, in fact strings only (e.g., CDATA, ID, ...); XML Schema various datatypes (e.g., boolean, integer, ...)
- User-defined datatypes: DTD not available; XML Schema supported
- Domains: DTD enumeration of all possible values (only for attributes); XML Schema many possibilities (<length>, ...)
- Patterns for datatypes: DTD very restricted (e.g., by means of cardinality); XML Schema <pattern>
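Attribute declarations translate in a similar way, and XML Schema can additionally constrain the value format. A minimal sketch, reusing the battery codes from the restriction example; the pattern for capacity values such as "900mAh" and the namespace `urn:example:hc` are hypothetical.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:hc"
        elementFormDefault="qualified">
  <!-- DTD equivalent:
       <!ATTLIST Battery type (NiMH|NiCd|LiIo) "NiMH"
                         capacity CDATA #REQUIRED> -->
  <complexType name="BatteryType">
    <!-- enumerated values with a default value -->
    <attribute name="type" default="NiMH">
      <simpleType>
        <restriction base="string">
          <enumeration value="NiMH"/>
          <enumeration value="NiCd"/>
          <enumeration value="LiIo"/>
        </restriction>
      </simpleType>
    </attribute>
    <!-- required, and with a format that CDATA cannot express -->
    <attribute name="capacity" use="required">
      <simpleType>
        <restriction base="string">
          <pattern value="[0-9]{1,5}mAh"/>
        </restriction>
      </simpleType>
    </attribute>
  </complexType>
</schema>
```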

Comparison DTD vs. XML Schema: Inheritance
- Derivation from pre-defined, simple datatypes: DTD not available; XML Schema base
- Derivation from complex datatypes (extension): DTD not available; XML Schema base and <extension>
- Derivation from complex datatypes (restriction): DTD not available; XML Schema base and <restriction>

Comparison DTD vs. XML Schema: Summary
Most important advantages of DTDs:
- quick and easy to specify
- beneficial for the specification of simple documents
Most important advantages of XML Schema:
- numerous datatypes
- object-oriented approach
- more modelling possibilities than with DTDs
- beneficial for the specification of complex documents
