Validating XML Data with an XML Schema



Date: May 2007
Version: DRAFT 0.2

Contents
1. XML Validation Concepts
   a. Concepts
   b. Errors
   c. Resources
2. Example: Validation with XMLSpy
   a. Downloading XMLSpy
   b. Creating a new XMLSpy Project
   c. Associate the homestead XML Schema with a folder
   d. Open the file in XMLSpy
   e. Add the active file to the folder
   f. Click the "Validate" button
3. Example: Manipulating Large XML Data Sets with Ant & Eclipse
   a. Tools for Records and Metadata vs. Tools for Data
   b. Apache Ant: DOS command line
   c. Eclipse: GUI interface
   d. V - The File Viewer: viewing large files
   e. XML databases

Disclaimer
The information and examples in this document are for demonstration purposes only. They are provided to help counties work with and validate XML datasets against Minnesota Revenue XML schemas. The Minnesota Department of Revenue does not endorse or support any products mentioned in this presentation. It is beyond the scope of the mission of the Property Tax Division to support tools within each county. Your staff is responsible for ensuring that your tools match your business requirements.

XML Validation Concepts
If you have:
1) a well-formed XML file, and
2) a well-defined XML Schema,
you can
3) check the XML file with any standard XML validation program to confirm that it is well-formed XML and has all the required tags defined by the schema.
This is called validation.
[Diagram: an XML file and an XML Schema are fed into an XML validator, which reports either "Validates" or a list of validation errors.]

XML Validation Concepts
An XML file is a text file in which well-defined tags surround each data value.
Tag example: <Zip_Code>55101</Zip_Code>
An XML Schema describes which tags are needed, and where they must appear, for a particular file.
<xs:element name="Zip_Code">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
This fragment from an XML Schema defines the Zip_Code tag and restricts its value to exactly five digits.
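To make the pattern restriction concrete, here is a minimal Python sketch that applies the same five-digit rule to a few values. It is only an illustration: a real validator reads the rule from the schema, and XML Schema regular expressions differ from Python's in general, although they agree for this simple pattern. The function name is_valid_zip is made up for this example.

```python
import re

# XML Schema patterns must match the whole value, so re.fullmatch
# is used to get the same whole-string behaviour in Python.
ZIP_PATTERN = re.compile(r"[0-9]{5}")


def is_valid_zip(value: str) -> bool:
    """Return True if value satisfies the [0-9]{5} pattern."""
    return ZIP_PATTERN.fullmatch(value) is not None


print(is_valid_zip("55101"))       # True
print(is_valid_zip("5510"))        # False: only four digits
print(is_valid_zip("55101-1234"))  # False: extra characters
```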

XML Validation Errors
Tag example: <Zip_Code>55101</Zip_Code>
If you have:
1) An XML file that is not well-formed: you get an "invalid XML", "malformed XML" or content error. Examples are missing tag brackets or other syntax errors.
2) A well-formed XML file with tag errors: you get a list of the tags that are inconsistent with the specific XML Schema being validated against.
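The first kind of error can be reproduced with any XML parser, before a schema is involved at all. The sketch below uses Python's standard xml.etree.ElementTree module to test whether a string is well-formed. It does not check the file against a schema, and the function name check_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> str:
    """Return "well-formed" or the position of the first syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"malformed at line {line}, column {column}"
    return "well-formed"


# Matching start and end tags parse cleanly.
print(check_well_formed("<Zip_Code>55101</Zip_Code>"))
# A misspelled end tag is a syntax error, not a schema error.
print(check_well_formed("<Zip_Code>55101</Zip_Cod>"))
```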

XML Validation Errors for XML Escape Characters
Five characters are used in XML syntax and cannot appear directly in a data value. They must be escaped using the ampersand representation:

Character  Name                         Escape
<          less than                    &lt;
>          greater than                 &gt;
&          ampersand                    &amp;
'          single quote or apostrophe   &apos;
"          double quote                 &quot;
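If data values are written out by a script, the escaping can be done in code rather than by hand. A minimal Python sketch using the standard xml.sax.saxutils.escape function follows. The helper name escape_value and the sample owner string are made up for illustration.

```python
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default; the two quote characters
# are added through its optional entities mapping.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def escape_value(value: str) -> str:
    """Escape all five reserved XML characters in a data value."""
    return escape(value, QUOTES)


owner = 'Smith & Sons <"Bob\'s">'
print(escape_value(owner))
# Smith &amp; Sons &lt;&quot;Bob&apos;s&quot;&gt;
```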

10 Common XML Transmission Errors
1. Malformed XML
2. Missing namespace declarations
3. Invalid document structure
4. Missing required element
5. Missing data in element
6. Invalid document type code values
7. Invalid property type code value
8. Invalid character values
9. Incorrect number of repeating fields
10. Incorrect tax year
For more information about XML errors, please also refer to the document "XML and XML Errors".

XML & Validation Resources
- W3C XML Standards Page: http://www.w3.org/xml/
- OASIS XML Cover Pages: http://xml.coverpages.org/xml.html#xmlvalresources (lots of references)
- XML.com: http://www.xml.com (up-to-date XML information)
- XML.com Schema Tools: http://www.xml.com/pub/a/2000/12/13/schematools.html (older list of schema tools)
- XMLSpy: http://www.altova.com (free 30-day evaluation of XML tools and validation)
- XMLStar: http://xmlstar.sourceforge.net (free tools and validation)

Example: Validating a Homestead File with XMLSpy

Validating with XMLSpy: Steps
1. Download XMLSpy (30-day free evaluation) and the homestead zip file
2. Create a new XMLSpy project
3. Associate the homestead XML Schema with a folder
4. Open the file in XMLSpy
5. Add the active file to the folder
6. Click the "Validate" button

Download XMLSpy
http://www.altova.com/products/xmlspy/xml_editor.html
Altova will e-mail you a 30-day license key.

Download Homestead Files

Start XMLSpy
Double-click the XMLSpy icon, then create a new project.

New Project Window
Note: if the window is not visible, use the Window/Project menu to show the project window.

Set the Properties of the XML Folder
Right-click the XML files folder in the project view. NOTE: right-click, not left-click.

Folder Properties
Click the "Validate with:" check box.

Browse to the Homestead Schema
Click OK, then double-click the XML data file to be validated.

Add This File to Your Project
Right-click and select "Add Active File".

Click the Green Check

View Results in the Validation View
If your file is valid, a green check appears in the validation view. Error messages appear in this same window.

File Size Limitations
- XMLSpy tends to have problems validating files over about 25MB on a system with 1GB of RAM.
- Use Apache Ant and/or Eclipse if you want to validate larger files.

Example: Manipulating Large XML Data Sets with Ant & Eclipse
Tips for XML Files Above 25MB

Agenda
- Tools for Records and Metadata vs. Tools for Data
- Apache Ant: DOS command line
- Eclipse: GUI interface
- V - The File Viewer: viewing large files
- XML databases

Records vs. Databases
- XML file viewers (like XMLSpy) are ideal for viewing single records and metadata (XML Schemas).
- Visual editing tools tend to stop working when file sizes exceed about 25MB (given 2GB of RAM). By analogy, we don't use MS-Word to edit 100,000 records in a database.
- Other tools are more appropriate for debugging large data sets.

In Memory vs. Streaming
There are several different approaches to checking large files:
- Load the entire file into memory (DOM)
- Stream the file through memory (SAX)
- Page only relevant sections into memory (chunking, used in V - The File Viewer)
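As a rough illustration of the streaming (SAX) approach, the Python sketch below counts elements with a given name while the parser reads the input, without building the whole document in memory. The element name Record, the class and function names, and the tiny sample document are all made up for this example. For a real file, a file path could be passed to count_records instead of the in-memory sample.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """SAX handler that counts start tags with a given name."""

    def __init__(self, record_tag: str):
        super().__init__()
        self.record_tag = record_tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per start tag as the parser streams through the input.
        if name == self.record_tag:
            self.count += 1


def count_records(source, record_tag: str) -> int:
    """Stream source (a file name or file object) and count record_tag elements."""
    handler = RecordCounter(record_tag)
    xml.sax.parse(source, handler)
    return handler.count


sample = io.BytesIO(
    b"<Records>"
    b"<Record><Zip_Code>55101</Zip_Code></Record>"
    b"<Record><Zip_Code>55102</Zip_Code></Record>"
    b"</Records>"
)
print(count_records(sample, "Record"))  # 2
```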

Apache Ant
- Open source build manager
- The user gives Ant a high-level description of a task
- Ant executes the task using dependency analysis (for example, only validate after extract)
- Can be called from a shell (DOS or UNIX)
- Can be called from an Integrated Development Environment (IDE)
- Download link: http://www.uniontransit.com/apache/ant/binaries/apache-ant-1.7.0-bin.zip
- See Wikipedia, "Apache Ant"

Download the .zip file

Adding tools.jar
- Apache Ant needs one missing jar file called "tools.jar", which comes free with Sun's software development tools.
- It is freely available from the Java download as part of the Java SDK (JDK) 1.4+, but it is not part of the JRE.
- A temporary copy is on the Java Open Source User Group (JOSUG) web site (www.josug.org/tools.jar).
- The file is about 6MB.
- It must be in your build "Classpath".

Apache Ant 1.7
- Many new features
- Simple <schemavalidate> task
- Faster execution
<schemavalidate noNamespaceFile="homestead-data_v0.28.xsd"
                file="my-homestead-data.xml">
</schemavalidate>
Here noNamespaceFile is the path to your XML schema and file is the path to your XML data.

build.xml: Ant From the DOS Command Line
<?xml version="1.0" encoding="utf-8"?>
<project default="validate-homestead">
  <!-- Change these two properties to match your local system -->
  <property name="srcdir" value="c:/homestead/stress-test"/>
  <property name="schemadir" value="c:/homestead/schemas"/>
  <target name="validate-homestead">
    <schemavalidate noNamespaceFile="${schemadir}/homestead-data_v0.28.xsd"
                    file="${srcdir}/100mb-test.xml">
    </schemavalidate>
  </target>
</project>
1. Download Apache Ant version 1.7.0
2. Copy the build.xml into a directory
3. Change the file locations in the properties of the build file to match your local files
4. Run ant.bat (using the full path name) in the folder where the build file is located

Apache Ant Tasks
- schemavalidate: new Ant 1.7 optional task just for XML Schema
- xmlvalidate: very general Ant 1.6 task for validation of XML files
  - checks for well-formed files
  - checks for validation against an XML Schema
- xslt: transforms XML files
- replace: replaces specific text in large files

schemavalidate options
http://ant.apache.org/manual/optionaltasks/xmlvalidate.html
http://ant.apache.org/manual/optionaltasks/schemavalidate.html

Example <schemavalidate> task
A 100MB file validates in 10 seconds.

Sample Ant 1.6 Validate Script
This will validate only the 100MB-test.xml file. Replace the file name with *.xml and all XML files in the source directory will be validated.

Eclipse
- Open source integrated development environment, originally sponsored by IBM
- "GUI" front end to Apache Ant
- See http://www.eclipse.org/

Sample Ant Classpath

Complete Ant 1.7 Build File
<?xml version="1.0" encoding="utf-8"?>
<project default="validate-homestead">
  <property name="datadir" value="c:/homestead/data-files"/>
  <property name="schemadir" value="c:/homestead/schemas"/>
  <target name="validate-homestead">
    <schemavalidate noNamespaceFile="${schemadir}/homestead-data_v0.28.xsd"
                    file="${datadir}/my-data-file.xml">
    </schemavalidate>
  </target>
</project>
Properties can be set once in the file and referenced many times. This makes your build files easier to maintain.

GUI "Point and Click" UI
- Sample "point and click" GUI interface
- Press Alt+Shift+X, Q to run a task

XML Transform
View the homestead record of a specific parcel ID.
[Diagram: a big file (gigabytes) passes through an XML transform with matching rules. Records that match go to a very small output file. Records that do not match are dropped.]

Sample XML Transform
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mn="http://data.state.mn.us"
    xmlns:c="http://niem.gov/niem/common/1.0"
    xmlns:u="http://niem.gov/niem/universal/1.0"
    xmlns:mnr="http://revenue.state.mn.us"
    xmlns:mnr-ptx="http://propertytax.state.mn.us"
    exclude-result-prefixes="mn mnr c u mnr-ptx">
  <xsl:output indent="yes"/>
  <!-- only display the homestead record for this parcel ID -->
  <xsl:template match="/homesteadrecordsdocument/countyhomesteadrecord/homesteadparcels/homesteadparcel/countypropertytaxstatement[mn:parcelid='1234567']">
    <!-- copy the CountyHomesteadRecord that matched this parcel ID to the output -->
    <xsl:copy-of select="../../.."/>
  </xsl:template>
  <!-- do not output anything else -->
  <xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>
</xsl:stylesheet>
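The same "keep only the matching record" idea can also be sketched outside XSLT. The Python example below is a simplified illustration, not the method used in this presentation: it streams through the input with the standard xml.etree.ElementTree.iterparse function and returns the first record that contains the wanted parcel ID. The element names Record, ParcelID and Owner, the function name find_record, and the sample data are hypothetical, and namespaces are left out to keep the example short.

```python
import io
import xml.etree.ElementTree as ET


def find_record(source, record_tag, id_tag, wanted_id):
    """Return the first record_tag element whose id_tag equals wanted_id, as text."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue  # only act when a whole record has been read
        ids = [node.text for node in elem.iter(id_tag)]
        if wanted_id in ids:
            return ET.tostring(elem, encoding="unicode")
        elem.clear()  # no match: drop this record's content before moving on
    return None


sample = io.BytesIO(
    b"<Records>"
    b"<Record><ParcelID>1234567</ParcelID><Owner>A</Owner></Record>"
    b"<Record><ParcelID>7654321</ParcelID><Owner>B</Owner></Record>"
    b"</Records>"
)
print(find_record(sample, "Record", "ParcelID", "7654321"))
```

Clearing each non-matching record keeps memory use low, although the emptied elements stay attached to the root, so this is only a starting point for truly large files.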

V - The File Viewer
- Opens multi-gigabyte files in a few seconds
- $20 application (less in quantity)
- Easily allows viewing of files greater than 1GB (uses file "chunking" technology)
- Note: read-only tool
- See http://www.fileviewer.com/

Use the Goto function (Ctrl-G)

XML Databases
- XML databases store XML in its native format.
- You can associate a column in your database, or a "collection", with the homestead XML Schema.
- This allows the database itself to validate data before transmission to the state.

Examples of XML Databases
- IBM DB2 version 9 "PureXML": free and low-cost "Express" versions exist for development and testing
- eXist (open source): native XML database with XML Schema validation
- Over 50 other free and low-cost solutions with 30-, 60- or 90-day evaluation periods
- http://www.rpbourret.com/xml/xmldatabaseprods.htm

DB2
- IBM DB2 version 9 supports fast searches on complex XML data sets
- Load records into the XML datatype
- Records are quickly validated using an XML Schema
- Searching is very fast

eXist
- Open source
- Built-in web administration
- Easy to set up and configure
- Allows data to be validated on insert
- Fast searches
- Every XQuery is a REST web service

Microsoft SQL Server 2005
- Supports a native XML datatype
- Supports fast indexing
- Adds SOAP services to XML documents
- Supports XQuery and XQuery updates

Ant Book
Covers Ant 1.7

Questions?