Languages for Data Integration of Semi- Structured Data II XML Schema, Dom/SAX. Recuperación de Información 2007 Lecture 3.



Similar documents
04 XML Schemas. Software Technology 2. MSc in Communication Sciences Program in Technologies for Human Communication Davide Eynard

Introduction to XML. Data Integration. Structure in Data Representation. Yanlei Diao UMass Amherst Nov 15, 2007

XML Schema Definition Language (XSDL)

Java and XML parsing. EH2745 Lecture #8 Spring

XML & Databases. Tutorial. 2. Parsing XML. Universität Konstanz. Database & Information Systems Group Prof. Marc H. Scholl

XML in programming. Patryk Czarnik. XML and Modern Techniques of Content Management 2012/13

Introduction to XML Applications

XML and Data Management

Modernize your NonStop COBOL Applications with XML Thunder September 29, 2009 Mike Bonham, TIC Software John Russell, Canam Software

Geography Markup Language (GML) simple features profile

by LindaMay Patterson PartnerWorld for Developers, AS/400 January 2000

Visualization of GML data using XSLT.

XML WEB TECHNOLOGIES

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

High Performance XML Data Retrieval

Chapter 3: XML Namespaces

An XML Based Data Exchange Model for Power System Studies

Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling

Translating between XML and Relational Databases using XML Schema and Automed

XML: extensible Markup Language. Anabel Fraga

Concrete uses of XML in software development and data analysis.

Agents and Web Services

CST6445: Web Services Development with Java and XML Lesson 1 Introduction To Web Services Skilltop Technology Limited. All rights reserved.

DTD Tutorial. About the tutorial. Tutorial

N CYCLES software solutions. XML White Paper. Where XML Fits in Enterprise Applications. May 2001

How To Write A Contract Versioning In Wsdl 2.2.2

Standard Recommended Practice extensible Markup Language (XML) for the Interchange of Document Images and Related Metadata

Semistructured data and XML. Institutt for Informatikk INF Ahmet Soylu

How To Write A Technical Interoperability Standard For Spain

XML Based Customizable Screen. Rev 1.1

Exchanger XML Editor - Canonicalization and XML Digital Signatures

Data Integration through XML/XSLT. Presenter: Xin Gu

Structured Data and Visualization. Structured Data. Programming Language Support. Programming Language Support. Programming Language Support

LabVIEW Internet Toolkit User Guide

XML Processing and Web Services. Chapter 17

6. SQL/XML. 6.1 Introduction. 6.1 Introduction. 6.1 Introduction. 6.1 Introduction. XML Databases 6. SQL/XML. Creating XML documents from a database

Structured vs. unstructured data. Motivation for self describing data. Enter semistructured data. Databases are highly structured

XML Databases 6. SQL/XML

Web Services Technologies

XIII. Service Oriented Computing. Laurea Triennale in Informatica Corso di Ingegneria del Software I A.A. 2006/2007 Andrea Polini

An Approach to Eliminate Semantic Heterogenity Using Ontologies in Enterprise Data Integeration

Ficha técnica de curso Código: IFCAD320a

Managing large sound databases using Mpeg7

DRAFT. Standard Definition. Extensible Event Stream. Christian W. Günther Fluxicon Process Laboratories

XML-BASED AUTOMATIC TEST DATA GENERATION

Designing the Service Contract

Advanced Information Management

Towards XML-based Network Management for IP Networks

Introduction to Web Services

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

Device Feature Key Synchronization

T XML in 2 lessons! %! " #$& $ "#& ) ' */,: -.,0+(. ". "'- (. 1

SOFTWARE ENGINEERING PROGRAM

Lightweight Data Integration using the WebComposition Data Grid Service

Textual Modeling Languages

Extensible Markup Language (XML): Essentials for Climatologists

XML for RPG Programmers: An Introduction

Developers Guide. Designs and Layouts HOW TO IMPLEMENT WEBSITE DESIGNS IN DYNAMICWEB. Version: English

Service Description: NIH GovTrip - NBS Web Service

[MS-DVRD]: Device Registration Discovery Protocol. Intellectual Property Rights Notice for Open Specifications Documentation

Allegato XML flusso richieste di produzione

RUT developers handbook 9.51 Introduction to XML and DOM, with applications in Matlab v. 2.0

XML. CIS-3152, Spring 2013 Peter C. Chapin

CHAPTER 9: DATAPORT AND XMLPORT CHANGES

XML. Document Type Definitions XML Schema

Presentation / Interface 1.3

XSLT Mapping in SAP PI 7.1

REDUCING THE COST OF GROUND SYSTEM DEVELOPMENT AND MISSION OPERATIONS USING AUTOMATED XML TECHNOLOGIES. Jesse Wright Jet Propulsion Laboratory,

Parallels Operations Automation 5.4

[MS-FSDAP]: Forms Services Design and Activation Web Service Protocol

A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS

Model-driven Rule-based Mediation in XML Data Exchange

+ <xs:element name="productsubtype" type="xs:string" minoccurs="0"/>

XML Schemadefinition

10CS73:Web Programming

JavaScript: Client-Side Scripting. Chapter 6

BACKGROUND. Namespace Declaration and Qualification

User manual for e-line DNB: the XML import file. User manual for e-line DNB: the XML import file

OpenTravel Alliance XML Schema Design Best Practices

Selling on Amazon Guide to XML

Transcription:

Languages for Data Integration of Semi- Structured Data II XML Schema, Dom/SAX Recuperación de Información 2007 Lecture 3.

Overview XML-schema, a powerful alternative to DTDs XML APIs: DOM, a data-object Model for XML SAX, a simple API for XML Validating Parsers based on DOM and SAX More XML Companion standards 2

XML Schema Another way to describe Structure of an XML document. DTDs have very limited typing, missing: Cardinalities in DTDs hardly expressable at least very inconvenient. Reuse, inheritance of types is missing, etc. Typing, Datatypes All this is provided in XML Schema! 3

XML-Schema : Why use XMLschema? In order to validate an XML document you use XML Schema to specify: The allowed structure of an XML document The allowed data types contained in one (or several) XML documents 4

XML-Schema : Why use schemas instead of DTDs? XML Schema Documents (XSDs) themselves are XML documents and therefore use the same syntax and can themselves be validated with schemas! (whereas DTDs are somehow SGML legacy ) XML-schema has more powerful possibilites to define custom datatypes (regular expressions, inheritance, cardinalities etc.) (BTW: Schemas may also be combined with DTDs) 5

XML-Schema : Why use schemas instead of DTDs? XML-schema allows to define elements with nil content XML-schema allows to reuse element and attribute definitions (by reference) XML-schema uses XML namespaces (allows to refer to and prescribe certain namespaces) XML-schema allows to define full context-free grammars for expressing arbitrary XML structures! 6

XML-Schema : A Simple XML-Schema <?xml version= 1.0 encoding= UTF-8?> <marketing xmlns:xsi = http://www.w3c.org/2001/xmlschema-instance xsi:nonamespaceschemalocation = http://www.dot.com/myschema.xsd > <employee>gustav Sielmann</employee> <employee>arnold Rummer</employee> <employee>johann Neumeier</employee> </marketing> Instance <?xml version= 1.0 encoding= UTF-8?> <xsd:schema xmlns:xsd = http://www.w3c.org/2001/xmlschema > <xsd:element name = marketing> <xsd:complextype> <xsd:sequence> <xsd:element name = employee type = xsd:string maxoccurs = unbounded /> </xsd:sequence> </xsd:complextype> </xsd:element> </xsd:schema> Schema Definition 7

General features of XSD (XML Schema Definition) XML Schema is always a separate, i.e. external entity (file) No internal Schema within the xml-instance Schemalink via attribute in Document Element schemalocation attribute (for Schema with a tragetnamespace) nonamespaceschemalocation (for schema with no targetnamespace) Schemata can be embedded: <include>, <redefine>, <import> import if different namespaces <xsd:import schemalocation=" http://www.w3.org/1999/xlink xlink.xsd"/> with <redefine> single elements can be redefined e.g. a second schema which redefines the first one: <xsd:redefine schemalocation="..."><xsd:complextype.../> which restricts or extends the type with the same name from the original schema 8

XSD and namespaces: XML Schema uses namespaces itself - to distinguish schema instructions from the language we are describing supports namespace assigning - by associating a target namespace to the language we are describing. Example: Here: the default namespace is that of XML Schema (such that e.g. complextype is considered an XML Schema element) the target namespace is our business card namespace the b prefix also denotes our business card namespace (such that we can refer to target language constructs from within the schema) 9

XSD structure An XSD is composed of: The Schema Element Element Definitions Attribute Definitions Type Definitions Annotations 10

XSD : The Schema Element The schema element is the container element where all elements, attributes and data types contained in an XML document are stored The schema element refers to the XMLschema definition at W3C and is the document element (root) of an XSD: <xsd:schema xmlns:xsd = http://www.w3c.org/2001/xmlschema >..... </xsd:schema> 11

XSD: Schema element Can have several attributes: <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/xmlschema" targetnamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementformdefault="qualified">... </xsd:schema> Elements and data types used in an XML schema document (schema, element, complextype, sequence, string, boolean, etc.) come from the "http://www.w3.org/2001/xmlschema" namespace, and must be prefixed with xsd: 12

XSD: Schema element Can have several attributes: <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/xmlschema" targetnamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementformdefault="qualified">... </xsd:schema> The language defined by this schema (note, to, from, heading, body.) is supposed to be in the "http://www.w3schools.com" namespace. 13

XSD: Schema element Can have several attributes: <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/xmlschema" targetnamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementformdefault="qualified">... </xsd:schema> default namespace is "http://www.w3schools.com". Usually makes sense to have the same def.ns and targetns 14

XSD: Schema element Can have several attributes: <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/xmlschema" targetnamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementformdefault="qualified">... </xsd:schema> Any elements used by an XML instance document which were declared in this schema must be namespace qualified by the target namespace (analogous: attributeformdefault), i.e. the language defined here is bound to the namespace. alternative: unqualified, 15

How to reference XSD inside an instance document: In the instance file: Again default namespace, (this is now what was the targetns of the schema) <?xml version="1.0"?> <note xmlns="http://www.w3schools.com" xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xsi:schemalocation="http://www.w3schools.com note.xsd"> XML Schema Instance namespace The schemalocation attribute has two values. The first value is the namespace to use. The second value is the location of the XML schema to use for that namespace 16

How to reference XSD XSD can make use of namespaces! schemalocation attribute in instance (value: pair namespaces-xsd) w/o targetnamespace: nonamespaceschemalocation attribute in instance sophisticated don t bother too much for the moment. Find more in the references: e.g. http://ww.w3schools.com/ http://www.w3.org/xml/schema 17

XML-Schema : Defining Elements Elements Must have a name and a type (attributes) Either globally defined: as child of <schema> or locally defined: in context of other elements (can also have the same name as globally defined) Elements can refer to other (global) element or type definitions Elements can have an associated cardinality Complex or simple type: simple: (refinements of) built-in types (e.g. strings, numbers, dates, etc.) complex: can have nested elements, etc. Example for a simple type element referring to a built-in type: <xsd:element name = employee type = xsd:string /> Example for a complex type element which does not refer to a global or built-in type declaration: <xsd:element name = marketing"> <xsd:complextype> <xsd:sequence> <xsd:element name = employee type = xsd:string maxoccurs = unbounded /> </xsd:sequence> </xsd:complextype> </xsd:element> 18

Element order in XSD In general, there is no preferred order within a schema document you generally have to infer the "containing" element of a schema based upon which <xsd:element> references the highest elements in the tree. (unlike DTDs where you strictly define the document element) 19

XML-Schema : Defining Attributes Like elements, attributes must have a name and type Attributes can use custom data types (but only simple types*!) Attributes can be restricted similar like in DTDs (optional, fixed, required, default values) <xsd:attribute name = country type = xsd:string fixed= Austria /> <xsd:attribute name="partnum" type="ipo:sku" use="required"/> Attributes can refer to other (global) attribute definitions: <xsd:attribute ref= xml:lang" use="optional" /> * More on what exactly are simple types in the following slides. 20

XML-Schema : Annotations XML-schema provides several tags for annotating a schema: documentation (intended for human readers), appinfo (intended for applications) and annotation Documentation and appinfo usually appear as subelements of annotation <xsd:annotation> <xsd:documentation xml:lang = en > here goes the documentation text for the schema </xsd:documentation> </xsd:annotation> Remark: Different from normal comments in XML, since these annotations can be used in e.g. XML Schema editors, etc., difference is mainly conceptually 21

XML-Schema : Data Type Definitions XML-schema data types are either: Pre-built Simple Types Derived from Simple Types Complex Types 22

XML-Schema : Simple Types Simple types are elements that contain data Simple types may not contain attributes or sub-elements New simple types are defined by deriving them from built-in simple types <xsd:simpletype name = mysimpledayofmonth > <xsd:restriction base = xsd:positiveinteger > <!--positiveinteger defines the minimum to be 1--> <xsd:maxinclusion value = 31 /> </xsd:restriction> </xsd:simpletype> 23

XML-Schema : Some Built-in Simple Types Simple Type xsd:string xsd:hexbinary xsd:integer xsd:decimal xsd:boolean xsd:date xsd:anyuri Example Man, this day is long! 0FB7-12834, 0, 235-1.23, 0, 5.256 TRUE, FALSE, 0, 1 1977-10-03 http://www.dot.com/my.html#a3 24

XML-Schema : Complex Types Complex types are elements that allow subelements and/or attributes Complex types are defined by listing the elements and/or attributes nested within Complex types are used if one wants to define sequences, groups or choices of elements <xsd:complextype name = myadresstype > <xsd:sequence> <xsd:element name = Name type = xsd:string > <xsd:element name = Email type = xsd:string > <xsd:element name = Tel type = xsd:string > </xsd:sequence> </xsd:complextype> 25

XML-Schema : Complex Types In complex types elements can be combined using the following constructs: Sequence: all the named elements must appear in the sequence listed Choice: one and only one of the elements must appear All: all the named elements must appear, but in no specific order Group: collection of elements, usually used to refer to a common group of elements (only usable for global declarations and for references!), groups other sequences, choices or alls, for reuse. 26

XML-Schema : Complex Types <xsd:complextype name = myadresstype > <xsd:sequence> <xsd:element name = Name type = xsd:string > <xsd:element name = Email type = xsd:string > <xsd:element name = Tel type = xsd:string > </xsd:sequence> </xsd:complextype> Cardinality can also be restricted (for choice and sequence! <xsd:complextype name = myatmost2adresstype > <xsd:sequence minoccurs=0 maxoccurs=2> <xsd:element name = Name type = xsd:string > <xsd:element name = Email type = xsd:string > <xsd:element name = Tel type = xsd:string > </xsd:sequence> </xsd:complextype> 27

XML-Schema : Mixed, Empty and Any Content Mixed content is used if you want to model elements that includes both subelements and character data <xs:complextype mixed="true"> Empty content is used to define elements that must not include any subelements and character data, implicit in the definition: <xs:element name="product"> <xs:complextype> <xs:attribute name="prodid" type="xs:positiveinteger"/> </xs:complextype> </xs:element> Any content (the most basic data type) does not constrain the content in any way. The <any> element enables us to extend the XML document with elements not specified by the schema: <xs:element name="person"> <xs:complextype> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> <xs:any minoccurs="0"/> </xs:sequence> </xs:complextype> </xs:element> 28

XML-Schema : Inheritance XML-Schema provides a pseudo inheritance via type-derivations In XML-Schema all inheritance has to be defined explicitly New types can only be created by extending or restricting existing types Types can only be derived from one type multiple inheritance is not supported 29

XML-Schema : Restricting a Type New simple types can be derived by constraining facets of a simple type The XSD:RESTRICTION element is used to state the base type <xsd:simpletype name = AustrianPostalCode > <xsd:restriction base = xsd:integer > <xsd:mininclusive value = 1000 > <xsd:maxinclusive value = 9999 > </xsd:restriction> </xsd:simpletype> 30

XML-Schema : Extending a Type New complex types can be derived by extending other types (simplecontent/complexcontent) The XSD:EXTENSION element is used to state the base type <xsd:complextype name = myprice > <xsd:simplecontent> <xsd:extension base = "xsd:decimal"> <xsd:attribute name = "currency" type="xsd:string"/> </xsd:extension> </xsd:simplecontent> </xsd:complextype> When do you need this? E.g. if you want to attach attributes to a simple type element 31

Yet another complex example proddescr <element name="item"> <annotation> <documentation>one item of a purchase order with its details</documentation> </annotation> <complextype> <sequence> <choice> <element name="productname" type="string"/> <element name="productdescr" type="string"/> </choice> <element name="quantity"> <simpletype> <restriction base="positiveinteger"> <maxexclusive value="100"/> </restriction> </simpletype> </element> <element name="price" type="decimal"> <annotation> <documentation>needs to be specified in US$</documentation> </annotation> </element> <element ref="ipo:comment" minoccurs="0" maxoccurs="5"/> <element name="shipdate" type="date" minoccurs="0"/> </sequence> <attribute name="partnum" type="ipo:sku"/> </complextype> </element> 32

Inheritance in XSD compared with real type systems/inheritance: Note that this inheritance is not FULL inheritance in the sense of the semantics defined for OOP or Ontologies, for instance there is no Bottom-up inheritance of instances: E.g. element employee is a subclass (restriction/extension) of complextype person, but in XSD no means to say that an employee is a person then ) 33

These were some but not ALL features of DTDs and XSD Some examples and exercices in the zip file I will pt on the lecture homepage might help! Play around with it and find out more! 34

Overview XML-schema, a powerful alternative to DTDs XML APIs: DOM, a data-object Model for XML SAX, a simple API for XML Validating Parsers based on DOM and SAX More XML Companion standards 35

XML Parsers&APIs: DOM vs. SAX Document Object Model (DOM) DOM is a programming language independent object model and an API The specification describes a defined XML-usage handling elements as objects for interaction with object-oriented programming languages such as Java (tutorials: Java DOM implementation of apache: Xerces) DOM provides a complete tree structure for all objects of XML-documents Not suitable for extremely large XML-files. Good for forms/editors, simple to program with. DOM is a (bunch of) W3C Standard recommendation(s). Simple API for XML (SAX) More low-level API for XML application processing with the help of object-oriented programming languages such as Java. (tutorials: SAX implementation of apache: Xerces) Event-based rather than tree-based, not the whole tree in memory, sequential. SAX delivers an XML-element to an output stream and is suitable for processing large XML- Files. Good for App2App exchange. An XML parser reads an XML document and provides it in a form of DOM or in its own structure with SAX-events for further processing. A validating parser also checks with DTD/XSD. 36

DOM What is DOM? DOM(Document Object Model) Was developed by W3C Specify how future Web browser and embedded scripts should access HTML and XML documents This and the following slides are very much borrowed from: http://oopsla.snu.ac.kr/research/object/open_xml/8_xmldomsax_noted.ppt copyright 2001 SNU OOPSLA Lab. 37

DOM Java implementation SUN provides a class for parsing XML, called Xml Document. Xml Document methods to parse XML file, build the document tree. To use the SUN parser => import org.w3c.dom.*; import com.sun.xml.tree.*; import org.xml.sax.*; Check docs for these to learn details! This slide: copyright 2001 SNU OOPSLA Lab. 38

DOM Nodes (1/4) Nodes describe elements, text, comments, processing instructions, CDATA section, entity references... The Node interface itself defines a number of methods. 1. Each node has characteristics (type, name, value) 2. Having a contextual location in the document tree. 3. Capability to modify its contents. This slide: copyright 2001 SNU OOPSLA Lab. 39

DOM Nodes (2/4) Node characteristics getnodetype => determining its type getnodename => returning the name of the node setnodevalue => replacing the value of node haschildnodes => whether node has children or not getattributes => accessing attribute This slide: copyright 2001 SNU OOPSLA Lab. 40

DOM Nodes (3/4) Node navigation When processing a document via the DOM interface, it is to use node as a stepping-stones. Each node has methods that return references to surrounding nodes. getparentnode( ) getprevioussibling( ) getfirstchild( ) getchildnodes( ) getlastchild( ) This slide: copyright 2001 SNU OOPSLA Lab. getnextsibling( ) 41

DOM Nodes (4/4) Node manipulation Ex) remove child method. appendchild method insertbefore method replacechild method Old Child New Child This slide: copyright 2001 SNU OOPSLA Lab. 42

DOM Documents An entire XML document is represented by a special type of node. - getdoctype - getimplementation - getdocumentelement - getelementsbytagname This slide: copyright 2001 SNU OOPSLA Lab. 43

DOM Elements Element interface Extends the Node interfaces Adds element-specific functionality General element processing - gettagname method - getelementsbytagname method - normalize method Example Here is some normalize: text Here is sometext This slide: copyright 2001 SNU OOPSLA Lab. 44

DOM Attributes Attribute characteristics - getname method - getvalue - setvalue - getspecified Creating attribute - createattribute This slide: copyright 2001 SNU OOPSLA Lab. 45

DOM Node lists The Nodelist interface contains two method - Node item(int index); int getlength( ); getlength( ); getlength( ) node 0 3 Item(1); node 1 node 2 This slide: copyright 2001 SNU OOPSLA Lab. 46

DOM Named node maps The NamedNodemap interface is designed to contain nodes, in no particular order, that can be accessed by name. 4 getlength( ); Item(1); getlength( ) ID Lang getnameditem( Security ); Security setnameditem(o); Added removenameditem( Added ) This slide: copyright 2001 SNU OOPSLA Lab. 47

SAX What is SAX? SAX(the Simple API for XML) Is a standard API for event-driven processing of XML data Allowing parsers to deliver information to applications in digestible chunks This slide: copyright 2001 SNU OOPSLA Lab. 48

SAX Call-backs and interfaces The SAX interface are: Parser Document Handler AttributeList ErrorHandler EntityResolver Locator DTD Handler This slide: copyright 2001 SNU OOPSLA Lab. 49

SAX The Parser The Work of Parser The parser developer creates a class that actually parses the XML document or data stream The parser reads the XML source data Stops reading when encounters a meaningful object Sends the information to the main application by calling an appropriate method Waits for this method to return before continuing This slide: copyright 2001 SNU OOPSLA Lab. 50

SAX Document handlers In order for the application to receive basic markup events from the parser, the application developer must create a class that implements the DocumentHandler interface. Document Handler Application Feedback When event driven give create parsing startdocument() startelement() characters() endelement() enddocument() <! > <->. </-> Parser Event driven This slide: copyright 2001 SNU OOPSLA Lab. 51

SAX Attribute lists A wrapper object for all attribute details int getlength(); to associate how many attributes are present. String getname(int i); to discover the name of one of the attributes String gettype(int i); when a DTD is in use, to get a data type String gettype(string name); assigned to each attribute. String getvalue(int i); to get the value of an attribute String getvalue(string name); This slide: copyright 2001 SNU OOPSLA Lab. 52

SAX Error handlers When the application needs to be informed of warnings and errors It can implement ErrorHandler interface This slide: copyright 2001 SNU OOPSLA Lab. 53

SAX Locators Necessity An error message is not particularly helpful when no indication is given as to where the error occurred. Locator interface can tell the entity, line number and character number of the warning or error This slide: copyright 2001 SNU OOPSLA Lab. 54

SAX Handler bases HandlerBase class Providing some sensible default behavior for each event, which could be subclassed to add application-specific functionality E.g. DTDHandler This slide: copyright 2001 SNU OOPSLA Lab. 55

Making XML Application XML Application Architecture An XML application is typically built around an XML parser It has an interface to its users, and an interface to some sort of back-end data store User Interface XML Application XML Parser Data Store This slide: copyright 2001 SNU OOPSLA Lab. 56

Making XML Application Kinds of Parsers Validating versus non-validating parsers Validating parsers validate XML documents as they parse them with respect to a DTD or an XSD Non-validating parsers ignore any validation errors Parsers that support the Document Object Model(DOM) Parsers that support the Simple API for XML(SAX) This slide: copyright 2001 SNU OOPSLA Lab. 57

Making XML Application DOM Parser Tree structure that contains all of the elements of a document Provides a variety of functions to examine the contents and structure of the document This slide: copyright 2001 SNU OOPSLA Lab. 58

Making XML Application SAX Parser Generates events at various points in the document It s up to you to decide what to do with each of those events This slide: copyright 2001 SNU OOPSLA Lab. 59

Making XML Application DOM vs SAX Why use DOM? Need to know a lot about the structure of a document Need to move parts of the document around Need to use the information in the document more than once Why use SAX? Only need to extract a few elements from an XML document This slide: copyright 2001 SNU OOPSLA Lab. 60

A sample XML parser implementation: Xerces: xml.apache.org/ provides full implementation of SAX and DOM (level 2) APIs for JAVA, C++, etc. 61

Overview XML-schema, a powerful alternative to DTDs XML APIs: DOM, a data-object Model for XML SAX, a simple API for XML Validating Parsers based on DOM and SAX More XML Companion standards 62

XML : XPath XPath is a non-xml language used to identify particular parts of XML documents. XPath lets you write expressions that refer to the document's first person element, the seventh child element of the third person element, the ID attribute of the first person element whose contents are the string "Fred Jones, XSLT and XPointer use XPath to identify particular points in an XML document. 63

XML : XPointer XPointer defines an addressing scheme for individual parts of an XML document. XPointers enable you to target a given element by number, name, type, or relation, to other elements in the document. XPointers uses the XPath syntax to identify the parts of the document they point to. 64

XML : XML Linking Language The XML Linking Language (XLink) allows elements to be inserted into XML documents in order to create and describe links between resources. Generalizes Link Concept of HTML, based on XPointer. XLink provides a framework for creating both basic unidirectional links and more complex linking structures, e.g. linking relationships among more than two resources, associate metadata with a link, 65

XML : XML Style Language XSL (XML Style Language) is a language for expressing stylesheets for XML documents XSL describes how to display an XML document of a given type XSL defines a set of elements called Formatting Objects and attributes (in part borrowed from CSS2 properties and adding more complex ones) 66

XML : XSLT XSLT (XSL Transformations) is a language for transforming XML documents into other XML documents. XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL. A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. Important for Data Integration/Mediation! 67

Etc. etc. References XML-Schema : http://www.w3.org/xml/schema (also has links to useful tools, etc.) Fallside, D. XML Schema Primer, http://www.w3.org/tr/xmlschema-0/ DOM & SAX: Some very nice introduction to DOM&SAX: http://oopsla.snu.ac.kr/research/object/open_xml/8_xmldomsax_noted.ppt Xerces: http://xml.apache.org Examples will be provided on the Lecture-homepage (zip file)! 68