XML Schema
Nan Niu (nn@cs.toronto.edu)
CSC309 -- Fall 2008

Schedule
This week
  Tuesday: lecture
  Thursday: A3 due (no tutorial)
  A4 will be posted around Thursday
Next week
  Tuesday: no lecture
  Thursday: A2 re-marking & A4 office hour, in BA 5170

XML and DTD
Learned on Oct 7:
  Well-formed vs. valid
Deficiencies in DTDs:
  Do not use XML syntax
  Do not support namespaces
  Data types cannot be strictly specified (for example, a date vs. an arbitrary string)
  Web interoperability

Namespace
  Supports modularity and reuse; avoids name collisions
  A document may include elements and attributes from different schemas; namespaces are used to disambiguate them
  Defined with the xmlns attribute: xmlns[:prefix]="URI"
  If no prefix is specified, the namespace is the default namespace
  The URI uniquely identifies the namespace; it has no further meaning (a parser does not fetch it)

Namespace: Example
<countries xmlns="http://www.countries-info.org/countries"
           xmlns:cap="http://www.countries-info.org/countrycapitals">
  <country>
    <name>Mexico</name>
    <population>108,700,891</population>
    <capital>
      <cap:name>Mexico</cap:name>
      <cap:population>18,131,000</cap:population>
    </capital>
  </country>
  <!--more countries-->
</countries>

XML Schema (www.w3.org/XML/Schema)
"XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents."
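To make the DTD data-type deficiency concrete, here is a minimal sketch contrasting a DTD declaration with an XML Schema declaration for the same element. The element name birthdate is hypothetical; the point is only that a DTD can say "character data" while a schema can say "a date".

```xml
<!-- DTD: content is plain character data, so any text is accepted,
     "1999-03-03" and "yesterday" alike -->
<!ELEMENT birthdate (#PCDATA)>

<!-- XML Schema: content must be a lexically valid xsd:date (YYYY-MM-DD) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="birthdate" type="xsd:date"/>
</xsd:schema>

<!-- Valid against the schema:   <birthdate>1999-03-03</birthdate> -->
<!-- Invalid against the schema: <birthdate>03.03.99</birthdate>   -->
```

Both instances are acceptable under the DTD; only the schema can tell them apart.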
Schema
  Description of an XML content model
    Defines the structure of instances
    Defines the data types of elements and attributes
  Is an XML application
    Follows XML syntax
    Supports namespaces
  The XML Schema language itself is a set of XML tags; the application being described is another set of XML tags
  Documents that conform to a schema's rules are considered instances of that schema

Sample Schema
<xsd:schema xmlns="http://www.library.org"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org">
  <xsd:element name="Book">
  </xsd:element>
  <xsd:element name="Library">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Conceptual Model [diagram not reproduced]

XML Schema
  Namespace: xmlns="http://www.w3.org/2001/XMLSchema"
  Tags: <schema>, <element>, <attribute>
  Primitive and derived data types: int, boolean, string, date, anyURI, etc.

XML Schema Definition (XSD)
  A document model that conforms to the XML Schema standard
  An XML application that can be used to describe other XML applications (document types)
  Defined in terms of the XML Schema tag set: <schema>, <element>, <attribute>

<schema>
  Root element for schema documents
  xmlns: declares the schema namespace and the namespace being defined, e.g. xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace: declares the namespace being defined
  elementFormDefault: with the value "qualified", locally declared elements of the target namespace must also be namespace-qualified when used in an instance, either with a prefix or by falling under the default namespace (global elements are always qualified)
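The effect of elementFormDefault="qualified" is easiest to see on a local (nested) element. The sketch below reuses the courses namespace from the slides; the element names course and code are hypothetical. It shows one schema followed by two alternative, equally valid instance documents.

```xml
<!-- Schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:c="http://www.cs.toronto.edu/courses"
            targetNamespace="http://www.cs.toronto.edu/courses"
            elementFormDefault="qualified">
  <!-- Global element: always belongs to the target namespace -->
  <xsd:element name="course">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element: namespace-qualified because elementFormDefault="qualified" -->
        <xsd:element name="code" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- Instance, prefix style -->
<c:course xmlns:c="http://www.cs.toronto.edu/courses">
  <c:code>CSC309</c:code>
</c:course>

<!-- Instance, default-namespace style -->
<course xmlns="http://www.cs.toronto.edu/courses">
  <code>CSC309</code>
</course>
```

Had elementFormDefault been left at its default (unqualified), the local code element would instead have to appear in no namespace, e.g. an unprefixed <code> inside a prefixed <c:course>.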
<schema>: Example
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:c="http://www.cs.toronto.edu/courses"
        targetNamespace="http://www.cs.toronto.edu/courses"
        elementFormDefault="qualified">
</schema>

<schema>: Example (More Commonly Used)
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:c="http://www.cs.toronto.edu/courses"
            targetNamespace="http://www.cs.toronto.edu/courses"
            elementFormDefault="qualified">
</xsd:schema>

<element>
  type
  ref: must reference a global element
  maxOccurs = (nonNegativeInteger | unbounded), default 1
  minOccurs = nonNegativeInteger, default 1
  content: (simpleType | complexType)

<element>: Example
Definition:
  <element name="score" type="integer" default="0"/>
Instances:
  <score/>              (empty, so the default value 0 applies)
  <score>124</score>
Built-in types: boolean, integer, float, double, decimal, string, time, date

<attribute>
  type
  default
  use = (optional | prohibited | required), default optional
  ref: must reference a global attribute
  content: simpleType

<attribute>: Example
Create new attributes:
  <attribute name="format" type="string"/>
  <attribute name="uri" type="string" use="required"/>
(use is permitted only on local attribute declarations, i.e., inside a complex type, not on global ones)
How are attributes associated with elements?
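The ref, minOccurs/maxOccurs and use properties listed above can be combined in one small schema. This is a sketch with no target namespace (so refs need no prefix); the element game and the attribute value final are hypothetical, while score and format come from the slides.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global declarations: these can be the target of ref -->
  <xsd:element name="score" type="xsd:integer" default="0"/>
  <xsd:attribute name="format" type="xsd:string"/>

  <xsd:element name="game">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Between one and three score children -->
        <xsd:element ref="score" minOccurs="1" maxOccurs="3"/>
      </xsd:sequence>
      <!-- use is allowed here because this is a local reference -->
      <xsd:attribute ref="format" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- A valid instance: the empty score takes the default value 0 -->
<game format="final">
  <score>124</score>
  <score/>
</game>
```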
Element with Attributes
Definition:
<element name="picture">
  <complexType>
    <attribute name="format" type="string" use="optional"/>
    <attribute name="uri" type="string" use="required"/>
  </complexType>
</element>
Instance:
<picture format="GIF" uri="images/blue_jays.gif"/>

4 kinds of complex elements
  Empty elements:
    <product pid="123"/>
  Elements that contain only other elements:
    <employee>
      <firstname>john</firstname>
      <lastname>green</lastname>
    </employee>
  Elements that contain only text:
    <food type="dessert">Ice cream</food>
  Elements that contain both other elements and text:
    <description>
      It happened on <date lang="norwegian">03.03.99</date>
    </description>
  Each of the above elements may carry attributes as well

(all | choice | sequence | group)
  all: each child appears (at most) once, in any order
    <xsd:all>
      <xsd:element name="firstname"/>
      <xsd:element name="lastname"/>
    </xsd:all>
  choice: only one of its children is allowed
    <xsd:choice>
      <xsd:element name="prof"/>
      <xsd:element name="student"/>
    </xsd:choice>
  sequence: children appear in the given order
    <xsd:sequence>
      <xsd:element name="firstname"/>
      <xsd:element name="lastname"/>
    </xsd:sequence>
  group: a reference to a custom-defined (named) group (attributes can be grouped likewise, with attributeGroup)

Example of Nested Element [example not reproduced]

Example of Repeatable Elements [example not reproduced]

(restriction | list | union)
  restriction: constrains the range of values
  list: a whitespace-separated list of values of the given item type
  union: a union of the included member types
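The slides give only instances for the four kinds of complex elements. One plausible way to declare each of them is sketched below; the attribute types (positiveInteger, string) are assumptions, and date is typed as a string because 03.03.99 is not a valid xsd:date.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- 1. Empty element with an attribute: <product pid="123"/> -->
  <xsd:element name="product">
    <xsd:complexType>
      <xsd:attribute name="pid" type="xsd:positiveInteger"/>
    </xsd:complexType>
  </xsd:element>

  <!-- 2. Element that contains only other elements -->
  <xsd:element name="employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="firstname" type="xsd:string"/>
        <xsd:element name="lastname" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- 3. Text only, plus an attribute: extend a simple type -->
  <xsd:element name="food">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <xsd:attribute name="type" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>

  <!-- 4. Mixed content: text interleaved with child elements -->
  <xsd:element name="description">
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="date">
          <xsd:complexType>
            <xsd:simpleContent>
              <xsd:extension base="xsd:string">
                <xsd:attribute name="lang" type="xsd:string"/>
              </xsd:extension>
            </xsd:simpleContent>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

The key switches are simpleContent/extension for "text plus attributes" and mixed="true" for text interleaved with elements.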
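Since the nested and repeatable element examples are not reproduced, here is a substitute sketch that combines both with the choice compositor. The roster and person elements and the names in the instance are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="roster">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Repeatable: zero or more person elements -->
        <xsd:element name="person" minOccurs="0" maxOccurs="unbounded">
          <!-- Nested: person has its own anonymous complex type -->
          <xsd:complexType>
            <!-- choice: each person is either a prof or a student -->
            <xsd:choice>
              <xsd:element name="prof" type="xsd:string"/>
              <xsd:element name="student" type="xsd:string"/>
            </xsd:choice>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- A valid instance -->
<roster>
  <person><prof>mary brown</prof></person>
  <person><student>john green</student></person>
</roster>
```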
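Of the three simple-type derivations, list is the only one without an example in these notes. A minimal sketch follows; the type names scoreList and topScores and the element scores are hypothetical. Note that on a list type, length facets such as maxLength count items, not characters.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- list: a whitespace-separated sequence of integers -->
  <xsd:simpleType name="scoreList">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <!-- restriction of the list type: at most 5 items -->
  <xsd:simpleType name="topScores">
    <xsd:restriction base="scoreList">
      <xsd:maxLength value="5"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="scores" type="topScores"/>
</xsd:schema>

<!-- Valid:   <scores>124 98 77</scores>         -->
<!-- Invalid: <scores>1 2 3 4 5 6</scores> (6 items) -->
```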
Constraining Facets (<restriction>)
  length           maxInclusive
  minLength        maxExclusive
  maxLength        minExclusive
  pattern          minInclusive
  enumeration      totalDigits
  whiteSpace       fractionDigits

Example
<simpleType name="testu">
  <restriction base="string">
    <pattern value="\d{3}-[A-Z]{2}"/>
  </restriction>
</simpleType>
The pattern matches three digits, followed by a hyphen, followed by two upper-case ASCII letters (e.g., 123-AB).

XML Schema Regular Expression References
  http://www.xmlschemareference.com/regularexpression.html
  http://www.w3.org/TR/xmlschema-2/#dt-regex

Constraining Element Content and Attribute Value
  A new simpleType references an existing type through the base attribute
  It restricts or extends the existing type
Example 1:
<attribute name="Security">
  <simpleType>
    <restriction base="NMTOKEN">
      <enumeration value="normal"/>
      <enumeration value="secret"/>
      <enumeration value="topsecret"/>
    </restriction>
  </simpleType>
</attribute>
Example 2:
<simpleType name="internalemailaddress">
  <restriction base="anyURI">
    <pattern value="[a-zA-Z]+\.[a-zA-Z]+@cs\.toronto\.edu"/>
  </restriction>
</simpleType>
<element name="studentemail" type="internalemailaddress"/>
Example 3 (values from 0 up to 999.99):
<element name="stockprice">
  <simpleType>
    <restriction base="decimal">
      <totalDigits value="5"/>
      <fractionDigits value="2"/>
      <minInclusive value="0"/>
    </restriction>
  </simpleType>
</element>
Example 4: Extension based on Unions
<simpleType name="timeordate">
  <union memberTypes="time date"/>
</simpleType>

Named Complex Types
A complexType that is named in the schema can be set as the type of an element.
Option 1 (anonymous type, defined inline):
<element name="Email">
  <complexType>
    <sequence>
      <element ref="E:From"/>...
    </sequence>
  </complexType>
</element>
Option 2 (named type, referenced by name):
<complexType name="eMailType">
  <sequence>
    <element ref="E:From"/>...
  </sequence>
</complexType>
<element name="Email" type="E:eMailType"/>

Schema Processor / Validator
  Validates a document instance against a schema definition
  Online validation: http://www.w3.org/2001/03/webdata/xsv
  Apache Xerces Java parser
    http://xml.apache.org/xerces-j/schema.html
      java Validator -v library.xml
    http://xerces.apache.org/xerces2-j/
      java -cp xercesImpl.jar;. Validator -v library.xml
      (";" is the Windows classpath separator; use ":" on Unix-like systems)
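Several of the constraining facets listed earlier (length, minLength, maxLength, whiteSpace, maxExclusive) have no example in these notes. The sketch below exercises them; the type names courseCode, loginName and hourOfDay are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- length: exactly 6 characters, e.g. CSC309 -->
  <xsd:simpleType name="courseCode">
    <xsd:restriction base="xsd:string">
      <xsd:length value="6"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- whiteSpace + minLength/maxLength: collapse whitespace,
       then require 3 to 8 characters -->
  <xsd:simpleType name="loginName">
    <xsd:restriction base="xsd:string">
      <xsd:whiteSpace value="collapse"/>
      <xsd:minLength value="3"/>
      <xsd:maxLength value="8"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- minInclusive + maxExclusive: 0 <= value < 24, so 23 is valid, 24 is not -->
  <xsd:simpleType name="hourOfDay">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxExclusive value="24"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```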
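A practical reason to prefer Option 2 for named complex types is reuse: one named type can serve several elements. In this sketch the E namespace URI and the To and Draft elements are hypothetical; From and eMailType come from the slides.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:E="http://www.example.org/email"
            targetNamespace="http://www.example.org/email">
  <!-- Global elements, referenced from the named type -->
  <xsd:element name="From" type="xsd:string"/>
  <xsd:element name="To" type="xsd:string"/>

  <!-- Named type: defined once -->
  <xsd:complexType name="eMailType">
    <xsd:sequence>
      <xsd:element ref="E:From"/>
      <xsd:element ref="E:To" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- ...and reused for two different elements -->
  <xsd:element name="Email" type="E:eMailType"/>
  <xsd:element name="Draft" type="E:eMailType"/>
</xsd:schema>
```

With Option 1, the same content model would have to be repeated inline inside each element declaration.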
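For the validator commands to have something to check, library.xml must be an instance of the Sample Schema. A minimal sketch is shown below, assuming the schema is saved as library.xsd next to it (the file name is an assumption). The xsi:schemaLocation attribute pairs a namespace with a schema location; a validator such as Xerces can use it as a hint, though exact behavior depends on the tool and its settings.

```xml
<?xml version="1.0"?>
<!-- library.xml: Book has no declared type in the Sample Schema,
     so empty Book elements are accepted -->
<Library xmlns="http://www.library.org"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.library.org library.xsd">
  <Book/>
  <Book/>
</Library>
```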