MSc in Communication Sciences 2009-10, Program in Technologies for Human Communication
Davide Eynard
Software Technology 2
04 XML Schemas
2 XML: recap and evaluation
During the last lesson we saw the basics of XML...
- tree structure
- elements and attributes
- content vs. presentation...
... and the basics of XML evaluation:
- well-formedness (just syntax)
- validity with respect to a schema
3 Why do we need a schema?
XML can be used to describe different kinds of data, and it is totally unaware of what you are talking about.
Example: you can check whether the syntax is right... but you cannot constrain its usage in any way!
  <person>
    <firstname>john</firstname>
    <lastname>doe</lastname>
    <SSN>123-45-6789</SSN>
    <SSN>987-65-4321</SSN>
  </person>
A person should have a single SSN, but a simple syntax check would not find the error in this code: the document is perfectly well-formed.
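Anticipating the constructs introduced in the following slides, a minimal XML Schema sketch for this example could allow exactly one SSN per person (the default cardinality), so a validator would reject the document above. The element names come from the example; the SSN pattern is an assumed format added only for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <!-- no maxOccurs given: the default is 1, so a second SSN is an error -->
        <xs:element name="SSN">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <!-- assumed format: 3 digits, 2 digits, 4 digits -->
              <xs:pattern value="\d{3}-\d{2}-\d{4}"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

This only enforces one SSN inside each person; uniqueness across different persons is a separate problem not covered here.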
4 What does a schema do?
A schema defines all the elements and attributes that can be used inside an XML document. Moreover, you can add constraints specifying:
- which children a particular element has
- in which order they appear
- how many children an element can have
- whether the element is empty or contains text
- datatypes for elements and attributes
- default values for elements and attributes
Given this information, an XML document can be validated against the schema: the document is valid if it is well-formed and it follows the structure given in the schema.
Validation can be done automatically by any tool that understands the schema language.
5 DTD vs XML Schema
There are many ways of defining the structure of an XML document, e.g. DTD, XML Schema, RELAX NG, Schematron...
DTD and XML Schema are the most widely used, but XML Schema (a W3C Recommendation since 2001) is becoming more successful because:
- it is written with XML syntax
- it supports datatypes
- it supports namespaces
- it supports inheritance and datatype extension
6 Using a DTD
Writing a DTD is as easy as writing any other text file, but how can we use a DTD? How can we say that a file should follow a schema? How can we use this information to validate the file?
To associate a document with a DTD, we add a DOCTYPE declaration to the XML prolog:
  <!DOCTYPE root-element SYSTEM "dtd-location">
To test it, we can use online validators or validating editors.
Example:
  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE messages SYSTEM "./messages.dtd">
  <messages>
    <message msgid="1">
      <from>...
7 DTD Elements 1
An element can be declared in one of the following ways:
  <!ELEMENT element-name category>
  <!ELEMENT element-name (element-content)>
- Empty elements (category EMPTY):
  <!ELEMENT br EMPTY>   e.g. <br/>
- Elements containing only a sequence of characters:
  <!ELEMENT element-name (#PCDATA)>
- Elements containing any mixture of text and other elements:
  <!ELEMENT element-name ANY>
- Elements containing one or more child elements (the children must follow the specified order!):
  <!ELEMENT element-name (child1, child2, ...)>
8 DTD Elements 2
Example:
  <text-message>
    <from>+393357654321</from>
    <to>+393471234567</to>
    <text>hi there!</text>
  </text-message>
DTD:
  <!ELEMENT text-message (from, to, text)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
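To try the declarations above in a validator without a separate .dtd file, the DTD can also be placed inside the document as an internal subset of the DOCTYPE declaration. A minimal sketch, reusing the text-message example:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- internal DTD subset: the declarations sit between [ and ] -->
<!DOCTYPE text-message [
  <!ELEMENT text-message (from, to, text)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<text-message>
  <from>+393357654321</from>
  <to>+393471234567</to>
  <text>hi there!</text>
</text-message>
```

Swapping from and to would make this document invalid, since the content model (from, to, text) fixes the order.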
9 DTD Elements 3
Disjunction (|):
  <!ELEMENT email_header (cc | bcc)>
  <!ELEMENT cc (#PCDATA)>
  <!ELEMENT bcc (#PCDATA)>
We can use disjunction to allow subelements in any order:
  <!ELEMENT email_header ((from, to) | (to, from))>
... but what if we have 10 subelements? We would have to list all 10! = 3,628,800 orderings.
Cardinality:
  <!ELEMENT email_header (from, to+, cc*, subject)>
- ? zero times or once
- * zero or more times
- + one or more times
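A small sketch combining all three cardinality operators, again with an internal DTD subset. The bcc? particle and the addresses are hypothetical, added only to show the ? operator next to + and *:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE email_header [
  <!-- one from, one or more to, zero or more cc, at most one bcc, one subject -->
  <!ELEMENT email_header (from, to+, cc*, bcc?, subject)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT cc (#PCDATA)>
  <!ELEMENT bcc (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
]>
<email_header>
  <from>alice@example.org</from>
  <to>bob@example.org</to>
  <to>carol@example.org</to>
  <subject>meeting</subject>
</email_header>
```

This instance is valid: cc and bcc are omitted, and to appears twice.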
10 DTD Attributes 1
Attributes are declared in a DTD with an attribute list:
  <!ATTLIST element-name attr-name attr-type value-type>
For each attribute you have to define:
- the name of the element it belongs to
- its name
- its type
- its value type
11 DTD Attributes 2
Example:
  <text-message from="+393357654321">
    <to>+393471234567</to>
    <text>hi there!</text>
  </text-message>
DTD:
  <!ELEMENT text-message (to, text)>
  <!ATTLIST text-message from CDATA #REQUIRED>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
12 DTD Attributes 3
Attribute types:
- CDATA, a string
- ID, a name that is unique across the XML document
- IDREF, a reference to the ID attribute of another element
- IDREFS, a whitespace-separated sequence of IDREFs
- (v1 | ... | vn), an enumeration of all possible values, e.g. weekday (monday | tuesday | ... | sunday)
Limitations:
- no dates
- no numbers
- no booleans
13 DTD Attributes 4
Attribute value types:
- #REQUIRED: the attribute must appear in every occurrence of the element type in the XML document
- #IMPLIED: the attribute is optional
- #FIXED value: every element has this attribute with this value (if it appears in the document, it must match)
  <!ATTLIST html xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
- value: specifies the default value for the attribute
  <!ATTLIST car color (red | white | blue) "red">
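A hypothetical garage DTD (element and attribute names are illustrative, not from the original slides) puts the attribute types and value types together: an ID, an IDREF pointing to it, and an enumeration with a default value.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE garage [
  <!ELEMENT garage (car*, owner*)>
  <!ELEMENT car EMPTY>
  <!ELEMENT owner (#PCDATA)>
  <!-- plate: unique ID; owner: optional reference to an owner id; color: enumeration with default -->
  <!ATTLIST car
    plate  ID                #REQUIRED
    owner  IDREF             #IMPLIED
    color  (red|white|blue)  "red">
  <!ATTLIST owner
    id     ID                #REQUIRED>
]>
<garage>
  <car plate="AB123CD" owner="o1" color="blue"/>
  <car plate="EF456GH"/>
  <owner id="o1">Mario Rossi</owner>
</garage>
```

A validator would reject a second car with plate="AB123CD" (duplicate ID) or an owner="o2" reference with no matching ID; the second car gets color="red" by default.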
14 From DTD to XML Schema
Main differences between XML DTD and XML Schema:
- XML Schema's syntax is based on XML itself (you can use the same tools for XML documents and schemas!)
- it allows the reuse of existing schemas (inheritance) and their refinement (extension)
- it supports more specific datatypes
- it supports namespaces
XML Schema is also called XML Schema Definition (XSD).
15 Namespaces
Element names in XML files are chosen by the developers. What if two developers use the same name for different kinds of elements?
Example:
  <table>
    <tr>
      <td>apples</td>
      <td>bananas</td>
    </tr>
  </table>

  <table>
    <name>African Coffee Table</name>
    <width>80</width>
    <length>120</length>
  </table>
16 Namespaces definition
We need a way to specify that element names come from two different contexts:
- we put a prefix before element names
- we specify which namespace that prefix represents
  <h:table xmlns:h="http://www.w3.org/TR/html4/">
    <h:tr>
      <h:td>apples</h:td>
      <h:td>bananas</h:td>
    </h:tr>
  </h:table>

  <f:table xmlns:f="http://my.name.space/furniture/">
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
The child elements need the prefix too: an unprefixed name does not belong to the prefixed namespace.
17 Root and default namespaces
You can also declare all the namespaces you are going to use in the root element of your XML document:
  <root xmlns:h="http://www.w3.org/TR/html4/"
        xmlns:f="http://my.name.space/furniture/">
    <h:table>...</h:table>
    <f:table>...</f:table>
  </root>
If the xmlns attribute is not followed by a prefix, the specified namespace becomes the default one (it applies to all unprefixed elements):
  <html xmlns="http://www.w3.org/1999/xhtml">
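A sketch mixing a default namespace with a prefixed one, reusing the namespace URIs from the previous slides:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- unprefixed elements belong to the default (furniture) namespace -->
<root xmlns="http://my.name.space/furniture/"
      xmlns:h="http://www.w3.org/TR/html4/">
  <table>
    <name>African Coffee Table</name>
  </table>
  <h:table>
    <h:tr><h:td>apples</h:td></h:tr>
  </h:table>
</root>
```

Here root, table, and name are in the furniture namespace, while h:table, h:tr, and h:td are in the HTML 4 namespace. Default namespaces do not apply to unprefixed attributes.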
18 Documents using XML Schema
What does the prolog of an XML document using an XML Schema look like?
  <?xml version="1.0" encoding="utf-8"?>
  <messages xmlns="http://my.name.space"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://my.name.space ./messages.xsd">
    <message msgid="1">
      <from>...
    ...
  </messages>
The value of xsi:schemaLocation is a whitespace-separated pair: the namespace URI, followed by the location of the schema file for that namespace.
19 XML Schema opening tag
An XML Schema is an XML document whose root element is called schema, and it is defined as follows:
  <?xml version="1.0" encoding="utf-8"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://my.name.space"
             xmlns="http://my.name.space"
             elementFormDefault="qualified">
Notes:
- the xs:schema element is the root of every XML Schema
- xmlns:xs declares the source namespace (the XML Schema vocabulary itself), targetNamespace is the target namespace (the one the declared elements belong to), and xmlns sets the default namespace
- qualified = associated with a namespace, either by the use of a declared prefix or via a default namespace declaration; with elementFormDefault="qualified", locally declared elements must be namespace-qualified in instance documents
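As an illustration, the text-message DTD from the DTD slides could be rewritten as a schema with this opening tag. This is a sketch; the file name text-message.xsd mentioned below is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://my.name.space"
           xmlns="http://my.name.space"
           elementFormDefault="qualified">
  <!-- XSD counterpart of <!ELEMENT text-message (from, to, text)> -->
  <xs:element name="text-message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="text" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance would then declare xmlns="http://my.name.space" on its text-message root and set xsi:schemaLocation="http://my.name.space text-message.xsd", as in the prolog shown for messages.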
20 The four constructs of XML Schema
XML Schema is built on four constructs:
- a simple type definition defines a family of (Unicode) text strings
- a complex type definition defines a collection of requirements for attributes, sub-elements, and character data
- an element declaration associates an element name with either a simple type or a complex type
- an attribute declaration associates an attribute name with a simple type (attributes always contain unstructured text)
21 XML Schema Elements and Types
To declare an element (equivalent to <!ELEMENT> in a DTD) use the element tag:
  <xs:element name="..."/>
The most important (optional) attribute is type, as it defines the element's content type:
  <xs:element name="..." type="..."/>
22 Cardinality and default values
To change cardinality, use the (optional) attributes minOccurs and maxOccurs:
  <xs:element name="from" minOccurs="1" maxOccurs="1"/>
  <xs:element name="to" minOccurs="1" maxOccurs="unbounded"/>
  <xs:element name="cc" minOccurs="0" maxOccurs="unbounded"/>
Note:
- minOccurs="x", where x is an integer >= 0
- maxOccurs="x", where x is an integer >= minOccurs, or unbounded
- the default is 1 in both cases
Default or fixed values can also be specified:
  <xs:element name="color" type="xs:string" default="red"/>
  <xs:element name="color" type="xs:string" fixed="green"/>
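The DTD cardinality operators map onto minOccurs and maxOccurs. A sketch of the email_header declaration from the DTD slides, where subject is made optional (a variation, to show ?); it is a fragment that assumes the xs prefix is bound as in the schema opening tag:

```xml
<xs:element name="email_header">
  <xs:complexType>
    <xs:sequence>
      <!-- from: exactly once (defaults 1..1) -->
      <xs:element name="from" type="xs:string"/>
      <!-- to+: one or more -->
      <xs:element name="to" type="xs:string" maxOccurs="unbounded"/>
      <!-- cc*: zero or more -->
      <xs:element name="cc" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <!-- subject?: zero or one -->
      <xs:element name="subject" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```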
23 XML Schema Attributes and Types
To declare an attribute use the attribute tag (very similar to the element one):
  <xs:attribute name="..."/>
Like elements, attributes can have types, default values, and fixed values:
  <xs:attribute name="..." type="..."/>
  <xs:attribute name="color" type="xs:string" default="red"/>
Attributes are optional by default. You can use the use attribute to make them required:
  <xs:attribute name="color" type="xs:string" use="required"/>
Note: only elements with a complex type can have attributes (see later).
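A sketch of the XSD counterpart of the DTD example where from is a required attribute of text-message. The declarations mirror that DTD; the complex type syntax is covered in the following slides, and the fragment assumes the xs prefix is bound:

```xml
<xs:element name="text-message">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="text" type="xs:string"/>
    </xs:sequence>
    <!-- corresponds to <!ATTLIST text-message from CDATA #REQUIRED> -->
    <xs:attribute name="from" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
```

Attribute declarations come after the content model (the sequence) inside the complex type.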
24 XML Schema built-in data types
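The built-in types fill the gaps listed among the DTD limitations (dates, numbers, booleans). A few declarations as a sketch; the element and attribute names are hypothetical, while the types are standard XML Schema built-ins:

```xml
<xs:element name="birthdate" type="xs:date"/>           <!-- e.g. 2009-10-15 -->
<xs:element name="age" type="xs:nonNegativeInteger"/>   <!-- e.g. 42 -->
<xs:element name="price" type="xs:decimal"/>            <!-- e.g. 19.90 -->
<xs:element name="married" type="xs:boolean"/>          <!-- true, false, 1 or 0 -->
<xs:attribute name="homepage" type="xs:anyURI"/>
```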
25 Simple derived types
Derived datatypes (such as integer) are built from the original ones using:
- restrictions
- lists
- unions
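A sketch of the three derivation mechanisms. The type names are hypothetical, and unprefixed references such as DayOfMonth assume a schema without a targetNamespace (or one whose default namespace equals it):

```xml
<!-- Restriction by range: an integer between 1 and 31 -->
<xs:simpleType name="DayOfMonth">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="1"/>
    <xs:maxInclusive value="31"/>
  </xs:restriction>
</xs:simpleType>

<!-- Restriction by enumeration: like the DTD (red | white | blue) -->
<xs:simpleType name="Color">
  <xs:restriction base="xs:string">
    <xs:enumeration value="red"/>
    <xs:enumeration value="white"/>
    <xs:enumeration value="blue"/>
  </xs:restriction>
</xs:simpleType>

<!-- List: whitespace-separated values, e.g. "1 15 31" -->
<xs:simpleType name="DayList">
  <xs:list itemType="DayOfMonth"/>
</xs:simpleType>

<!-- Union: either a day number or a full date, e.g. "15" or "2009-10-15" -->
<xs:simpleType name="DayOrDate">
  <xs:union memberTypes="DayOfMonth xs:date"/>
</xs:simpleType>
```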
26 Complex types
Complex types are used to define elements that contain attributes, text, other elements, or any combination of these.
They are built using the following operators:
- element references, such as <xs:element ref="name"/>
- concatenation, using the sequence element
- union (alternatives), using the choice element
- the all element (like sequence, but unordered)
- the any construct
- the group element (to allow references to groups of items)
- the minOccurs and maxOccurs attributes, to define cardinalities
- the mixed (boolean) attribute, to allow mixed content
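A sketch of choice, all, and the mixed attribute; the type and element names are hypothetical. Note that all addresses the earlier question of 10 subelements in any order without listing every permutation:

```xml
<!-- choice: exactly one of cc or bcc, like the DTD (cc | bcc) -->
<xs:complexType name="RecipientType">
  <xs:choice>
    <xs:element name="cc" type="xs:string"/>
    <xs:element name="bcc" type="xs:string"/>
  </xs:choice>
</xs:complexType>

<!-- all: from and to once each, in any order -->
<xs:complexType name="EnvelopeType">
  <xs:all>
    <xs:element name="from" type="xs:string"/>
    <xs:element name="to" type="xs:string"/>
  </xs:all>
</xs:complexType>

<!-- mixed: free text interleaved with any number of <b> elements -->
<xs:complexType name="NoteType" mixed="true">
  <xs:sequence>
    <xs:element name="b" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```

In XML Schema 1.0, each child of all may appear at most once (maxOccurs 0 or 1).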
27 An example
  <xsd:complexType name="TeacherType">
    <xsd:sequence>
      <xsd:element name="firstname" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="lastname" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="title" type="xsd:string" use="optional"/>
  </xsd:complexType>
  <xsd:element name="teacher" type="TeacherType"/>

  <teacher title="Ph.D.">
    <firstname>Davide</firstname>
    <lastname>Eynard</lastname>
  </teacher>
28 Schema extension
- Modularization is allowed by the following three constructs:
  <xs:include schemaLocation="URI"/>
  <xs:import namespace="ns" schemaLocation="URI"/>
  <xs:redefine schemaLocation="URI">...</xs:redefine>
- Inheritance and extensions
- Restrictions
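A sketch of extension and restriction built on the TeacherType from the previous example, assuming it is declared in the same schema (written here with the xs prefix rather than xsd). ProfessorType, StrictTeacherType, and course are hypothetical names:

```xml
<!-- Extension: a professor is a teacher plus one or more course elements -->
<xs:complexType name="ProfessorType">
  <xs:complexContent>
    <xs:extension base="TeacherType">
      <xs:sequence>
        <xs:element name="course" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Restriction: at most one first name, and title becomes required -->
<xs:complexType name="StrictTeacherType">
  <xs:complexContent>
    <xs:restriction base="TeacherType">
      <xs:sequence>
        <xs:element name="firstname" type="xs:string" minOccurs="0" maxOccurs="1"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="title" type="xs:string" use="required"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

An extension appends its sequence after the base content (firstname*, lastname, course+); a restriction repeats the content model and may only narrow it.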
29 Limitations of XML Schema
XML Schema is much more powerful and expressive than DTDs; however, it still has some limitations:
- it is too difficult for non-experts (a problem, since non-experts need to read the schema to write valid XML documents!)
- element and attribute declarations are context-insensitive
- although XML Schema is written in XML, there is no complete XML Schema description of XML Schema itself
Technical limitations:
- when describing mixed content, the character data cannot be constrained in any way
- a schema cannot enforce a particular root element
- element defaults cannot contain markup, only character data
- ... and many others
30 References
Some Web references:
- G. Antoniou and F. van Harmelen. A Semantic Web Primer. The MIT Press, 2004. Chapter 2 slides: http://www.ics.forth.gr/isl/swprimer
- M.C. Daconta, L.J. Obrst, and K.T. Smith. The Semantic Web. Wiley, 2003. Chapter 3 online: http://www.wiley.com/legacy/compbooks/daconta/sw/sample.html
- W3Schools website, in particular http://www.w3schools.com/dtd and http://www.w3schools.com/schema
- Anders Møller and Michael I. Schwartzbach. An Introduction to XML and Web Technologies. Addison-Wesley, 2006. Chapter 4 (Schema Languages) online: http://www.brics.dk/ixwt
- Examples from Elizabeth Castro. XML for the World Wide Web: Visual QuickStart Guide. Peachpit Press, 2000. http://www.cookwood.com/xml
Tools:
- XML validation services: http://www.stg.brown.edu/service/xmlvalid, http://validator.w3.org
- XML Copy Editor, a free (as in freedom), multiplatform editor that supports validation: http://xml-copy-editor.sourceforge.net
- Validator, a free (as in freedom), multiplatform, drag-and-drop XML validator that works on Windows, Linux, and Mac OS X: http://homepage.mac.com/rcrews/software/validator/