An XML file. XML Parsing. Generic XML Parsers. XML Parsing. Event-based Parser. Tree-based XML Parser. Michael Li

Transcription:

XML Parsing
Michael Li
Email: jwl@cs.nott.ac.uk

An XML file

    <note>
      <to>jerry</to>
      <from>tom</from>
      <heading>reminder</heading>
      <body>don't forget me this weekend!</body>
    </note>

- XML stands for EXtensible Markup Language
- XML was designed to store and transfer data
- XML tags are not predefined; you must define your own tags
- XML is designed to be self-descriptive
- An XML file represents a tree structure

XML Parsing
- Extracting the tree from serialized XML into a format that the computer can process
- As the document is parsed, the data in the document becomes available to the application using the parser
- Many possible methods

Generic XML Parsers
- Understand XML notation
- Can also perform validation and well-formedness checks
- Common types in use today:
  - Tree-based (DOM)
  - Event-driven (SAX)
  - Pull-based (Microsoft .NET)

Tree-based XML Parser
- Deserializes the XML and builds an in-memory representation of the XML tree
- Provides an API for the user to manipulate the tree
- Slower than event-based parsers
- One-size-fits-all, which can be a never-really-fits-anywhere

Event-based Parser
- Low-level interface
- Tree-based parsers are usually built on top of an event-based parser
- Sends messages to the user's code when it discovers content in the XML file
- Can work in either push (event-driven) or pull mode
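As a rough illustration of the tree-based approach described above, the following sketch parses the note example into an in-memory DOM tree with the standard JAXP `DocumentBuilder`, reads one element and then edits it through the same API. The class name `NoteDomDemo` and its helper methods are assumptions made for this example and do not come from the slides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; the name and helpers are not from the slides.
class NoteDomDemo {
    static final String NOTE_XML =
        "<note><to>jerry</to><from>tom</from>"
        + "<heading>reminder</heading>"
        + "<body>don't forget me this weekend!</body></note>";

    // Parse the whole document into an in-memory tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Return the text of the first element with the given tag name.
    static String textOf(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(NOTE_XML);
        Element root = doc.getDocumentElement();
        System.out.println("root: " + root.getTagName());
        System.out.println("to:   " + textOf(doc, "to"));

        // The tree can be modified in place through the same API.
        doc.getElementsByTagName("to").item(0).setTextContent("spike");
        System.out.println("to (after edit): " + textOf(doc, "to"));
    }
}
```

The whole document is held in memory before any element is read, which is the convenience and the cost that the tree-based slide points at.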

Trees or Events: Which parsing is best?
- Depends on the task in hand
- Event-based: good for building your own object structures
- Tree-based: suitable for jobs that change quickly, or have a short lifetime

- Traditionally, the execution of a program is under the programmer's control
- Execution starts at a defined entry point (e.g. main())
- Continues until the program exits, executing the program line after line
- The program sometimes hands control over to the OS (e.g. to read input from the keyboard)

- Event-driven programming operates in the opposite direction
- The program surrenders control to the OS
- The OS then sends messages to the program telling it about events that have happened
- Examples include mouse input, key presses etc.
- User registers a handler for the events with the OS
- The OS calls back into the program when the event occurs

Event-based Parsing
- So how does this work for XML? Using SAX as the example
- Parser is passed a pointer to a user-implemented object
- This object supports a defined interface (ContentHandler)
- As the parser finds certain types of object in the serialized XML stream, it generates calls to the methods
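The register-then-call-back pattern described above can be sketched without any XML at all. In this minimal example an `EventSource` object stands in for the OS, and the program only supplies a handler; every name here (`EventLoopDemo`, `KeyHandler`, `EventSource`) is hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; EventSource stands in for the OS.
class EventLoopDemo {
    // The interface a program implements in order to be told about events.
    interface KeyHandler {
        void keyPressed(char key);
    }

    // Holds registered handlers and calls back into them when an event occurs.
    static class EventSource {
        private final List<KeyHandler> handlers = new ArrayList<>();

        void register(KeyHandler handler) {
            handlers.add(handler);
        }

        // Deliver one event to every registered handler.
        void fire(char key) {
            for (KeyHandler h : handlers) {
                h.keyPressed(key);
            }
        }
    }

    // Simulate a sequence of key presses and return what the handler saw.
    static String run(String keys) {
        StringBuilder seen = new StringBuilder();
        EventSource os = new EventSource();
        os.register(key -> seen.append(key));  // the program's handler
        for (char c : keys.toCharArray()) {
            os.fire(c);                        // the "OS" drives execution
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("abc"));
    }
}
```

A SAX parser plays the role of `EventSource` here, and the user-implemented handler object plays the role of `KeyHandler`.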

Parsing
- This is exactly what happens in an event-based parser
- The parser recognizes each XML tag and calls the appropriate method on the interface
- An object that implements the interface will then be notified of the relevant parts of the XML document

    <?xml version="1.0"?>
    <Document>
      Hello World
      <Bold>
        Goodbye Universe!
      </Bold>
    </Document>

    StartDocument();
    StartElement("Document");
    Characters("Hello World");
    StartElement("Bold");
    Characters("Goodbye Universe!");
    EndElement("Bold");
    EndElement("Document");
    EndDocument();

SAX
- SAX (Simple API for XML) defines an interface called ContentHandler
- The parser is passed a reference to an object that implements the interface
- The SAX parser then calls the relevant methods on that object in response to the XML document

    public interface ContentHandler {
        public void setDocumentLocator(Locator locator);
        public void startDocument() throws SAXException;
        public void endDocument() throws SAXException;
        public void startPrefixMapping(String prefix, String uri) throws SAXException;
        public void endPrefixMapping(String prefix) throws SAXException;
        public void startElement(String namespaceURI, String localName, String rawName, Attributes atts) throws SAXException;
        public void endElement(String namespaceURI, String localName, String rawName) throws SAXException;
        public void characters(char ch[], int start, int length) throws SAXException;
        public void ignorableWhitespace(char ch[], int start, int length) throws SAXException;
        public void processingInstruction(String target, String data) throws SAXException;
        public void skippedEntity(String name) throws SAXException;
    }

- No need to implement unused methods, since SAX provides a default "does nothing" implementation
- Therefore inherit from DefaultHandler rather than implementing ContentHandler directly

    public void startDocument() throws SAXException;
    public void endDocument() throws SAXException;

- Self-explanatory
- Called only once, when parsing starts and when it finishes respectively
- endDocument() can be a useful place to put code to process the deserialized data
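To see the callback sequence for the Document example, a handler can simply record each call it receives. The sketch below extends `DefaultHandler` and overrides only the methods of interest. The class name `TraceHandler` and the `trace` helper are assumptions for this example. Whitespace-only text is skipped, and because a parser may deliver text in more than one `characters()` call, the exact number of character events is not guaranteed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every callback it receives.
class TraceHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Use start and length: the data need not begin at ch[0].
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters(" + text + ")");
        }
    }

    // Parse an XML string and return the recorded events in order.
    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TraceHandler handler = new TraceHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><Document>Hello World"
            + "<Bold>Goodbye Universe!</Bold></Document>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```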

    public void startElement(String namespaceURI, String localName, String qName, Attributes atts);
    public void endElement(String namespaceURI, String localName, String qName);

- Each element in the document invokes these methods
- namespaceURI and qName (called rawName elsewhere in these slides) are used when dealing with multiple namespaces and can be ignored for our purposes
- localName contains the name of the element (only when the parser is namespace-aware; otherwise it may be empty)

Attributes
- startElement() methods are passed a reference to an Attributes object
- User can obtain the value of an attribute by using the method getValue(String attName)

    String date = atts.getValue("number");

    public void characters(char ch[], int start, int length);

- Called whenever PCDATA is parsed
- Note the start parameter: do not expect your data to start at ch[0]
- SAX does not define how it will pass the PCDATA into this function
- You may get called once for each character!

State
- SAX is stateless
- If you need to know state, then you must maintain it yourself
- Recall Finite State Automata

A SAX example

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    // Fragment: place these statements inside a method such as main().
    String file = "memo.xml";
    // try block to create and use the parser
    try {
        // create the parser as a SAX2 parser and set its handlers
        SAXParserFactory fact = SAXParserFactory.newInstance();
        fact.setValidating(false);
        fact.setNamespaceAware(true);
        SAXParser parser = fact.newSAXParser();
        // start the parser
        parser.parse(file, new CCountTags());
    } catch (Exception e) {
        // catch any errors from either the parser or the parser setup
        System.err.println(e.getMessage());
    }

A SAX example (cont.)

    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class CCountTags extends DefaultHandler {
        private int m_cstartelements;
        private int m_cendelements;

        // reset both counters when parsing begins
        public void startDocument() {
            m_cstartelements = 0;
            m_cendelements = 0;
        }

        public void startElement(String namespaceURI, String localName, String rawName, Attributes atts) {
            m_cstartelements++;
        }

        public void endElement(String namespaceURI, String localName, String rawName) {
            m_cendelements++;
        }

        // report the totals once the whole document has been read
        public void endDocument() {
            System.out.println("Found " + m_cstartelements + " start elements");
            System.out.println("Found " + m_cendelements + " end elements");
        }
    }
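The notes on Attributes, characters() and State can be combined in one small handler. The sketch below is a minimal illustration and is not part of the slides: it keeps its own stack of open elements, appends to a buffer because `characters()` may be called several times for one piece of text, and reads an attribute in `startElement()`. The `memo` document, the `number` attribute on it and the class name `MemoHandler` are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for a hypothetical <memo number="..."> document.
class MemoHandler extends DefaultHandler {
    // State the handler keeps for itself, since SAX keeps none.
    private final Deque<String> openElements = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    final Map<String, String> fields = new LinkedHashMap<>();
    String number;  // value of the assumed "number" attribute on <memo>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.push(qName);
        text.setLength(0);
        if (qName.equals("memo")) {
            number = atts.getValue("number");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
        // Record only elements whose parent is <memo>.
        if ("memo".equals(openElements.peek())) {
            fields.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    // Parse an XML string and return the populated handler.
    static MemoHandler parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        MemoHandler handler = new MemoHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        MemoHandler h = parse(
            "<memo number=\"7\"><to>jerry</to><body>don't forget me</body></memo>");
        System.out.println(h.number + " " + h.fields);
    }
}
```

The element stack acts as the state of a small automaton: the handler decides what to do with buffered text based on which element it has just left and which one is still open.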

Readings
- "SAX, the power API" (Benoît Marchal, developerWorks, August 2001): Learn about when to use the SAX API instead of DOM, plus get an overview of commonly used SAX interfaces and detailed examples in a Java-based application with many code samples.
- "Simplify XML programming with JDOM" (Wes Biggs and Harry Evans, developerWorks, May 2001): Explore an alternate object API that is optimized for the Java language.