XML Programming in Java



Similar documents
XML & Databases. Tutorial. 2. Parsing XML. Universität Konstanz. Database & Information Systems Group Prof. Marc H. Scholl

XML in programming. Patryk Czarnik. XML and Modern Techniques of Content Management 2012/13

by LindaMay Patterson PartnerWorld for Developers, AS/400 January 2000

Structured Data and Visualization. Structured Data. Programming Language Support. Programming Language Support. Programming Language Support

First Java Programs. V. Paúl Pauca. CSC 111D Fall, Department of Computer Science Wake Forest University. Introduction to Computer Science

High Performance XML Data Retrieval

java.util.scanner Here are some of the many features of Scanner objects. Some Features of java.util.scanner

How To Write An Xml Document In Java (Java) (Java.Com) (For Free) (Programming) (Web) (Permanent) (Powerpoint) (Networking) (Html) (Procedure) (Lang

Java 2 Platform, Enterprise Edition (J2EE): Enabling Technologies for EAI

AP Computer Science Java Subset

Chapter 2 Introduction to Java programming

JDOM Overview. Application development with XML and Java. Application Development with XML and Java. JDOM Philosophy. JDOM and Sun

Masters programmes in Computer Science and Information Systems. Object-Oriented Design and Programming. Sample module entry test xxth December 2013

Ordered Lists and Binary Trees

Java and XML parsing. EH2745 Lecture #8 Spring

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

J a v a Quiz (Unit 3, Test 0 Practice)

Unified XML/relational storage March The IBM approach to unified XML/relational databases

Java Interview Questions and Answers

Languages for Data Integration of Semi- Structured Data II XML Schema, Dom/SAX. Recuperación de Información 2007 Lecture 3.

Java EE Web Development Course Program

CS 111 Classes I 1. Software Organization View to this point:

Data Structures and Algorithms

Continuous Integration Part 2

Basic Programming and PC Skills: Basic Programming and PC Skills:

Data XML and XQuery A language that can combine and transform data

Extracting data from XML. Wednesday DTL

N CYCLES software solutions. XML White Paper. Where XML Fits in Enterprise Applications. May 2001

CST6445: Web Services Development with Java and XML Lesson 1 Introduction To Web Services Skilltop Technology Limited. All rights reserved.

Output: struct treenode{ int data; struct treenode *left, *right; } struct treenode *tree_ptr;

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Extending XSLT with Java and C#

Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

Markup Languages and Semistructured Data - SS 02

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

Exchanger XML Editor - Canonicalization and XML Digital Signatures

JAVA r VOLUME II-ADVANCED FEATURES. e^i v it;

12-6 Write a recursive definition of a valid Java identifier (see chapter 2).

WORKING WITH WEB SERVICES. Rob Richards

Tutorial for Creating Resources in Java - Client

Sample CSE8A midterm Multiple Choice (circle one)

Java Programming Fundamentals

DataDirect XQuery Technical Overview

Moving from CS 61A Scheme to CS 61B Java

The Forger s Art Exploiting XML Digital Signature Implementations HITB 2013

TagSoup: A SAX parser in Java for nasty, ugly HTML. John Cowan (cowan@ccil.org)

Java Application Developer Certificate Program Competencies

CompSci 125 Lecture 08. Chapter 5: Conditional Statements Chapter 4: return Statement

Managing large sound databases using Mpeg7

Introduction to Java

SOAP Tips, Tricks & Tools

Introduction to Computer Programming, Spring Term 2014 Practice Assignment 3 Discussion

Introduction to XML Applications

JAVA - QUICK GUIDE. Java SE is freely available from the link Download Java. So you download a version based on your operating system.

XML Processing and Web Services. Chapter 17

Pemrograman Dasar. Basic Elements Of Java

Smooks Dev Tools Reference Guide. Version: GA

Concrete uses of XML in software development and data analysis.

Design Patterns in Parsing

Translating to Java. Translation. Input. Many Level Translations. read, get, input, ask, request. Requirements Design Algorithm Java Machine Language

qwertyuiopasdfghjklzxcvbnmqwerty uiopasdfghjklzxcvbnmqwertyuiopasd fghjklzxcvbnmqwertyuiopasdfghjklzx cvbnmqwertyuiopasdfghjklzxcvbnmq

Free Java textbook available online. Introduction to the Java programming language. Compilation. A simple java program

API for java.util.iterator. ! hasnext() Are there more items in the list? ! next() Return the next item in the list.

JAVA IN A NUTSHELL O'REILLY. David Flanagan. Fifth Edition. Beijing Cambridge Farnham Köln Sebastopol Tokyo

Compiler Construction

Free Java textbook available online. Introduction to the Java programming language. Compilation. A simple java program

Supporting Data Set Joins in BIRT

Recursion. Definition: o A procedure or function that calls itself, directly or indirectly, is said to be recursive.

Stacks. Linear data structures

Custom SIS Integration Type Development. Jim Riecken Blackboard Learn Product Development

JavaScript: Control Statements I

Chapter 5. Recursion. Data Structures and Algorithms in Java

AdaDoc. How to write a module for AdaDoc. August 28, 2002

Introduction to Java Applications Pearson Education, Inc. All rights reserved.

EVALUATION. WA1844 WebSphere Process Server 7.0 Programming Using WebSphere Integration COPY. Developer

Java SE 8 Programming

ETL Systems; XML Processing in PHP

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

A LANGUAGE INDEPENDENT WEB DATA EXTRACTION USING VISION BASED PAGE SEGMENTATION ALGORITHM

ECE 122. Engineering Problem Solving with Java

public static void main(string[] args) { System.out.println("hello, world"); } }

Unit 6. Loop statements

Towards XML-based Network Management for IP Networks

Zoomer: An Automated Web Application Change Localization Tool

Database & Information Systems Group Prof. Marc H. Scholl. XML & Databases. Tutorial. 11. SQL Compilation, XPath Symmetries

Exception Handling. Overloaded methods Interfaces Inheritance hierarchies Constructors. OOP: Exception Handling 1

Transcription:

XML Programming in Java Leonidas Fegaras University of Texas at Arlington Web Databases and XML L6: XML Programming in Java 1

Java APIs for XML Processing DOM: a language-neutral interface for manipulating XML data requires that the entire document be in memory SAX: push-based stream processing hard to write non-trivial applications StAX: pull-based stream processing far better than SAX, but not very popular yet Web Databases and XML L6: XML Programming in Java 2

DOM The Document Object Model (DOM) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of XML documents The following is part of the DOM interface: public interface Node { public String getnodename (); public String getnodevalue (); public NodeList getchildnodes (); public NamedNodeMap getattributes (); public interface Element extends Node { public NodeList getelementsbytagname ( String name ); public interface Document extends Node { public Element getdocumentelement (); public interface NodeList { public int getlength (); public Node item ( int index ); Web Databases and XML L6: XML Programming in Java 3

Traversing the DOM Tree Finding all children of node n with a given tagname NodeList nl = n.getchildnodes(); for (int i = 0; i < nl.getlength(); i++) { Node m = nl.item(i); if (m.getnodename() == tagname ) but can also be done this way NodeList nl = n.getelementsbytagname( tagname ); for (int i = 0; i < nl.getlength(); i++) { Node m = nl.item(i); Finding all descendants-or-self of node n with a given tagname requires recursion Web Databases and XML L6: XML Programming in Java 4

Node Printer using DOM static void printnode ( Node e ) { if (e instanceof Text) System.out.print(((Text) e).getdata()); else { NodeList c = e.getchildnodes(); System.out.print( "<" + e.getnodename() + ">" ); for (int k = 0; k < c.getlength(); k++) printnode(c.item(k)); System.out.print( "</" + e.getnodename() + ">" ); Web Databases and XML L6: XML Programming in Java 5

A DOM Example import javax.xml.parsers.*; import org.w3c.dom.*; import java.io.file; class DOMTest { static void query ( Node n ) { NodeList nl = ((Element) n).getelementsbytagname("gradstudent"); for (int i = 0; i < nl.getlength(); i++) { Element e = (Element) (nl.item(i)); if (e.getelementsbytagname("firstname").item(0).getfirstchild().getnodevalue().equals("leonidas")) printnode(e); public static void main ( String args[] ) throws Exception { DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); DocumentBuilder db = dbf.newdocumentbuilder(); Document doc = db.parse(new File("cs.xml")); Node root = doc.getdocumentelement(); query(root); //gradstudent[firstname='leonidas'] Web Databases and XML L6: XML Programming in Java 6

A Better, Modular Programming Want to write modular Java code to evaluate XPath queries One component for each XPath axis step An XPath query is constructed by composing these components Each component must read a sequence of Nodes and return a sequence of Nodes... starting from the root: root = new Sequence( cs.xml ) Examples: /department/gradstudent/name root.child("department").child("gradstudent").child("name").print(); /department/gradstudent[gpa="3.5"]/name Condition p = new Condition() { public Sequence condition ( Node x ) { return (new Sequence(x)).child("gpa").equals("3.5"); ; root.child("department").child("gradstudent").predicate(p).child("name").print(); See exanples/dom1.java Web Databases and XML L6: XML Programming in Java 7

Implementation Every component is a method of Sequence that returns a Sequence class Sequence extends Vector<Node> {... child: return the children of the current nodes that have the given tagname public Sequence child ( String tagname ) { Sequence result = new Sequence(); for (int i = 0; i < size(); i++) { Node n = elementat(i); NodeList c = n.getchildnodes(); for (int k = 0; k < c.getlength(); k++) if (c.item(k).getnodename().equals(tagname)) result.add(c.item(k)); ; return result; Web Databases and XML L6: XML Programming in Java 8

Descendant must be Recursive descendant: return the descendants of the current nodes that have the given tagname public Sequence descendant ( String tagname ) { Sequence result = new Sequence(); for (int i = 0; i < size(); i++) { Node n = elementat(i); NodeList c = n.getchildnodes(); for (int k = 0; k < c.getlength(); k++) { if (c.item(k).getnodename().equals(tagname)) result.add(c.item(k)); result.append((new Sequence(c.item(k))).descendant(tagname)); ; return result; Web Databases and XML L6: XML Programming in Java 9

Predicates The predicate method must be parameterized by a condition abstract class Condition { public Condition () { abstract public Sequence condition ( Node x ); Filter out the current nodes that do not satisfy the condition pred public Sequence predicate ( Condition pred ) { Sequence result = new Sequence(); for (int i = 0; i < size(); i++) { Node n = elementat(i); Sequence s = pred.condition(n); boolean b = false; for (int j = 0; j < s.size(); j++) b =!(s.elementat(j) instanceof Text)!((Text) s.elementat(j)).getwholetext().equals("false"); if (b) result.add(n); ; return result; Web Databases and XML L6: XML Programming in Java 10

An Even Better Method Problem: the intermediate Sequences of Nodes may be very big they are created to be discarded at the next step Example: //book[author='smith'] //book may return thousands of items to be discarded by the predicate Instead of creating Sequence, we can process one Node at a time Pull-based processing Method: pull-based iterators to avoid creating intermediate sequences of nodes A method used extensively in stream processing Also used in database query evaluation (query pipeline processing) See: examples/dom2.java Web Databases and XML L6: XML Programming in Java 11

Iterators Iterator interface: abstract class Iterator { Iterator input; // the previous iterator in the pipeline public Node next () { return input.next(); /department/gradstudent/name query(new Child("name",new Child("gradstudent",new Child("department", new Document("cs.xml"))))); /department/gradstudent[gpa="3.5"]/name Clone n = new Clone(new Child("gradstudent",new Child("department", new Document("cs.xml")))); query(new Child("name",new Predicate(new Equals("3.5",new Child("gpa",n)),n))); Document cs.xml /department /gradstudent clone /gpa = 3.5 predicate /name Web Databases and XML L6: XML Programming in Java 12

Node Children Child: return the children of the current nodes that have the given tagname class Child extends Iterator { final String tagname; NodeList nodes; int index; public Child ( String tagname, Iterator input ) { super(input); this.tagname = tagname; index = 0; nodes = new NodeList(); public Node next () { while (true) { while (index == nodes.getlength()) { Node e = input.next(); if (e == null) return null; nodes = e.getchildnodes(); index = 0; ; Node n = nodes.item(index++); if (n.getnodetype() == Node.ELEMENT_NODE && n.getnodename().equals(tagname)) return n; Web Databases and XML L6: XML Programming in Java 13

Cloning the Stream For each element e, emit e, then end-of-stream (null), then e again The first e is consumed by the predicate while the second by the result class Clone extends Iterator { Node node; int state; public Node next () { state = (state+1) % 3; switch (state) { case 0: node = input.next(); return node; case 2: return node; ; return null; Web Databases and XML L6: XML Programming in Java 14

Predicate Filters out the current nodes that do not satisfy the condition class Predicate extends Iterator { final Iterator condition; public Predicate ( Iterator condition, Clone input ) {... public Node next () { while (true) { Node n = condition.next(); if (n == null) if (input.next() == null) return null; else continue; while (n!= null) n = condition.next(); return input.next(); Web Databases and XML L6: XML Programming in Java 15

SAX DOM requires the entire document to be read before it takes any action Requires a large amount of memory for large XML documents It waits for the document to load in memory before it starts processing For selective queries, most loaded data will not be used SAX is a Simple API for XML that allows you to process a document as it's being read The SAX API is event based The XML parser sends events, such as the start or the end of an element, to an event handler, which processes the information Web Databases and XML L6: XML Programming in Java 16

Parser Events Receive notification of the beginning of a document void startdocument () Receive notification of the end of a document void enddocument () Receive notification of the beginning of an element void startelement ( String namespace, String localname, String qname, Attributes atts ) Receive notification of the end of an element void endelement ( String namespace, String localname, String qname ) Receive notification of character data void characters ( char[] text, int start, int length ) Web Databases and XML L6: XML Programming in Java 17

SAX Example: a Printer class Printer extends DefaultHandler { public Printer () { super(); public void startdocument () { public void enddocument () { System.out.println(); public void startelement ( String uri, String name, String tag, Attributes atts ) { System.out.print( < + tag + > ); public void endelement ( String uri, String name, String tag ) { System.out.print( </ + tag + > ); public void characters ( char text[], int start, int length ) { System.out.print(new String(text,start,length)); Web Databases and XML L6: XML Programming in Java 18

Parsing with SAX import javax.xml.parsers.saxparserfactory; import org.xml.sax.*; import java.io.filereader; try { SAXParserFactory factory = SAXParserFactory.newInstance(); factory.setvalidating(false); factory.setnamespaceaware(false); XMLReader xmlreader = factory.newsaxparser().getxmlreader(); xmlreader.setcontenthandler(printer); xmlreader.parse(file); catch (Exception e) { throw new Error(e); Web Databases and XML L6: XML Programming in Java 19

//*[lastname='smith']/firstname Need to construct a finite state machine class QueryHandler extends DefaultHandler { int state; public void startelement ( String uri, String name, String tag, Attributes atts ) { if (tag.equals("lastname")) state = 1; else if (state == 2 && tag.equals("firstname")) state = 3; public void endelement ( String uri, String name, String tag ) { if (tag.equals("firstname")) state = 0; public void characters ( char text[], int start, int length ) { String s = new String(text,start,length); if (state == 1 && s.equals("smith")) state = 2; else if (state == 3) System.out.println(s); <lastname> 0 1 2 <firstname> </firstname> Web Databases and XML L6: XML Programming in Java 20 = Smith 3 print

A Better, Modular Method For each XPath query, construct a pipeline to process the SAX stream one-event-at-a-time Each handler in the pipeline corresponds to an XPath axis step A handler acts as a filter that either blocks a SAX event, or propagates the event to the next pipeline handler A handler must have a fixed size state (independent of stream size) See: examples/sax.java Examples: /department/gradstudent/name new Document("cs.xml", new Child("department", new Child("gradstudent", new Child("name",new Print())))); Document cs.xml /department /gradstudent /name Print Web Databases and XML L6: XML Programming in Java 21

The Child Handler class Child extends XPathHandler { String tagname; // the tagname of the child boolean keep; // are we keeping or skipping events? short level; // the depth level of the current element public Child ( String tagname, XPathHandler next ) { super(next); this.tagname = tagname; keep = false; level = 0; public void startelement ( String nm, String ln, String qn, Attributes a ) throws SAXException { if (level++ == 1) keep = tagname.equals(qn); if (keep) next.startelement(nm,ln,qn,a); public void endelement ( String nm, String ln, String qn ) throws SAXException { if (keep) next.endelement(nm,ln,qn); if (--level == 1) keep = false; public void characters ( char[] text, int start, int length ) throws SAXException { if (keep) next.characters(text,start,length); Web Databases and XML L6: XML Programming in Java 22

Example Input Stream <department> <deptname> Computer Science </deptname> <gradstudent> <name> <lastname> Smith </lastname> <firstname> John </firstname> </name> </gradstudent>... </department> SAX Events StartDocument: SE: department SE: deptname C: Computer Science EE: deptname SE: gradstudent SE: name SE: lastname C: Smith EE: lastname SE: firstname C: John EE: firstname EE: name EE: gradstudent... EE: department EndDocument: Child:gradstudent Child: name Printer Web Databases and XML L6: XML Programming in Java 23

Predicates Need new events: startpredicate, endpredicate, releasepredicate /department/gradstudent[gpa="3.5"]/name XPathHandler n = new Child("name",new Print()); new Document("cs.xml", new Child("department", new Child("gradstudent", new Predicate(1,new Child("gpa", n)))); new Equals(1,"3.5",n)), Document cs.xml /department /gradstudent predicate /gpa = 3.5 /name Web Databases and XML L6: XML Programming in Java 24

How Predicate Works predicate stream A /name /gpa = 3.5 stream B stream A Outcome of = 3.5 stream B streams A and B <tag> false startpredicate <tag> false true releasepredicate releasepredicate </tag> true </tag> EndPredicate <tag> false startpredicate <tag> false false </tag> </tag> endpredicate Web Databases and XML L6: XML Programming in Java 25

The Predicate Handler class Predicate extends XPathHandler { int level; public Predicate ( int predicate_id, XPathHandler condition, XPathHandler next ) {... public void startelement ( String nm, String ln, String qn, Attributes a ) { if (level++ == 0) next.startpredicate(predicate_id); next.startelement(nm,ln,qn,a); condition.startelement(nm,ln,qn,a); public void endelement ( String nm, String ln, String qn ) { next.endelement(nm,ln,qn); condition.endelement(nm,ln,qn); if (--level == 0) next.endpredicate(predicate_id); public void characters ( char[] text, int start, int length ) { next.characters(text,start,length); condition.characters(text,start,length); Web Databases and XML L6: XML Programming in Java 26

StAX Unlike SAX, you pull events from document Imports: import javax.xml.transform.stax.*; import javax.xml.stream.*; Create a pull parser: XMLInputFactory factory = XMLInputFactory.newInstance(); XMLStreamReader reader = factory.createxmlstreamreader(new FileReader(file)); Pull the next event: reader.geteventtype() Type of events: START_TAG, END_TAG, TEXT START_DOCUMENT, END_DOCUMENT Methods: Predicates: isstartelement(), isendelement(), ischaracters(),... Accessors: getlocalname(), gettext(),... See: examples/stax.java Web Databases and XML L6: XML Programming in Java 27

Better StAX Events class Attributes { public String[] names; public String[] values; abstract class Event { class StartTag extends Event { public String tag; public Attributes attributes; class EndTag extends Event { public String tag; class CData extends Event { public String text; Web Databases and XML L6: XML Programming in Java 28

Iterators A stream iterator must conform with the following class abstract class Iterator { Iterator input; public Iterator ( Iterator input ) { this.input = input; public Event next () { return input.next(); public void skip () { input.skip(); // the next iterator in the pipeline Web Databases and XML L6: XML Programming in Java 29

The Child Iterator class Child extends Iterator { String tag; short nest; boolean keep; public Event next () { while (true) { Event t = input.next(); if (t == null) return null; if (t instanceof StartTag) { if (nest++ == 1) { keep = tag.equals(((starttag) t).tag); if (!keep) continue; else if (t instanceof EndTag) if (--nest == 1 && keep) { keep = false; return t; ; if (keep) return t; Web Databases and XML L6: XML Programming in Java 30