EMC Documentum xdb Installation and Administration Guide, P/N A01, Version 9.0


EMC Documentum xdb
Version 9.0
Installation and Administration Guide
P/N A01
EMC Corporation
Corporate Headquarters: Hopkinton, MA

Copyright EMC Corporation. All rights reserved.
Published February 2009

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

xdb includes software developed by the Apache Software Foundation.

Table of Contents
1 Table of contents
2 Introduction
2.1 What is this?
2.2 Who is this manual for?
2.3 How is this manual structured?
2.4 How to use this manual?
2.5 Conventions: Terminology; Text usage
2.6 Support and feedback
3 Quickstart
3.1 Pre-installation
3.2 Installing xdb
3.3 Creating a database
3.4 Running a sample
4 General information
4.1 What is xdb?
4.2 The technology behind xdb: DOM support and data manipulation; Searching, linking and indexing; Session and transaction control; Administration features; Import from non-xml resources and storage of BLOBs; Transform XML into various outputs; Versioning
4.3 xdb logical architecture: Federated database; Databases; Users and user lists; Groups and group lists; Indexes and index lists; Libraries; Catalogs and DTDs and XML Schemas; Documents; Super user
5 Installing xdb
5.1 Installation requirements: Things to check before installation; Supported platforms; System requirements; Java; Things to do before installation
5.2 Installation - Windows: Run the installation program
5.3 Installation - UNIX: Starting the installation; Entering installation parameters; Performing post-installation steps
5.4 Verify the installation
5.5 Configuration files
5.6 Running the samples: Create a database; Execute the samples
5.7 Uninstalling xdb
5.8 Configuring the xhive.bootstrap property
5.9 Using the Java command line instead of xhive-ant
5.10 The xdb dedicated server process: Configuration of the server; XHStartServer; XHStopServer
6 Creating applications
6.1 Introduction
6.2 Create a database: Creating a database using xdb create-database; Creating a database using the adminclient; Creating a database using API functions
6.3 Connect to the database
6.4 Use sessions and transactions
6.5 Create libraries
6.6 Parse XML documents
6.7 Validate XML documents
6.8 Store XML documents
6.9 DOM configuration settings: Deviations from specification; Additional parameters
6.10 Parse documents with context
6.11 Store BLOBs
6.12 Import non-xml data
6.13 Create documents
6.14 Retrieve XML documents and document parts: Through DOM operations; By document ID; By document name; Using the executeFullPathXPointerQuery() method; Using XQuery, XPath and XPointer; Using a Context Conditioned Index
6.15 Use XQuery
6.16 Use XPath and XPointer: XPath; XPointer; Working with namespaces
6.17 Use indexes: Library id indexes; Library name indexes; Id attribute indexes; Element name indexes; Value indexes; Full text indexes; Context conditioned indexes; Creating index node filters
6.18 Traverse XML documents: Using DOM operations; Through DOM Traversal; Using function objects
6.19 Export XML documents
6.20 Publish XML documents: Publish using XSLT; Publish to PDF
6.21 Use XLink
6.22 Use abstract schemas
6.23 Revalidate documents with XML schema
6.24 Access PSVI information
6.25 Manage users and groups
6.26 Use versioning: Working with versioned documents; Retrieving (older) versions of documents; Branching; Node level versioning
6.27 Using metadata on library children
6.28 Using JAAS to connect to the database
7 Using Sessions & Locking
7.1 Working with sessions: The session lifecycle (XhiveDriverIf.createSession(), join(), leave(), connect(), begin(), commit(), rollback(), checkpoint(), disconnect(), terminate()); Sessions and references to database objects
7.2 Sessions and transaction isolation
7.3 Sessions and locking: What gets locked when?; xdb behavior when a locking conflict occurs; Readonly transactions
7.4 The xdb info command
8 Using XQuery
8.1 Executing queries
8.2 External variables and functions
8.3 Accessing documents and libraries
8.4 XQuery Error reporting
8.5 Data model discrepancies
8.6 Current implementation: XPath axes; Module Imports; Supported options and extension expressions (pragmata); Examples; XQuery Update Syntax; Conditional Order By; Extension functions; Namespace declarations; Collation support; Limitations
8.7 Using indexes in XQuery: Value and element name indexes; Range queries; Indexes on metadata; Multiple indexes; Using indexes to enhance order by performance; Ignoring indexes
8.8 Type information and XQuery
8.9 Extending XQuery functionality using Java: Type marshalling; Java objects and instance methods; Type checking; Known limitations
8.10 Miscellaneous performance tips: Parallel queries
8.11 Full text searching: Supported W3C XQuery Full Text features; Logical full-text operators; Wildcard option; Anyall options; Positional filters; Score variables; Score calculation; xhive:fts function; Full text search query syntax; Boolean queries; Prefix search; Phrase search; Wildcards; Tokenization; The analyzer; Examples; Limitations
9 Using indexes
9.1 Library indexes: Library id indexes; Library name indexes
9.2 Id attribute indexes
9.3 Element name indexes
9.4 Value indexes: Value Index Types
9.5 Full text indexes
9.6 Path indexes: Path index specification syntax
9.7 Metadata value indexes
9.8 Context conditioned indexes
9.9 Index performance: Index scope; Index selectiveness; Index property: sorted keys; Summary
9.10 Concurrent indexes
10 Using validation and the Catalog
10.1 Introduction
10.2 Catalog content: Catalog location; Identification of XML Schema and DTD models; Managing models
10.3 Linking models to documents: DTDs; XML Schema
10.4 Validated parsing: DTDs; XML Schema
10.5 Validation: DTDs; XML Schema
10.6 PSVI
11 Performance
11.1 Internal server
11.2 JVM settings and cache size
11.3 Database page size
11.4 Multiple disks
11.5 Linux filesystems
11.6 Disk write caches
12 Internal structure of xdb
12.1 Segments and files: Segments; Files; Setting up database configurations; Log files
12.2 Detachable Libraries
12.3 Database configuration files: xhive-clustering element; segment element; file element
13 Administering xdb
13.1 The Admin Client: Starting the adminclient; Importing data; Exporting and editing documents; Adding Indexes; Querying
13.2 The Command Line Client: Running single commands; The interactive console
13.3 Creating a new federation
13.4 Creating a backup: Online ("hot") backups; Incremental backups; The xdb backup command; Restoring backups; The xdb restore command; Offline ("cold") backups; Snapshot backups; Exporting individual libraries and documents
13.5 Managing a detachable library
13.6 Creating a library backup: Online backup using API; The xdb backup-library command; Restoring a library backup; The xdb restore-library command
13.7 Using Secure Sockets Layer (SSL): Server side setup; Client side setup
13.8 Readonly federations (CDROM usage)
13.9 Federation sets: Creating federation sets; Using federation sets
14 Replication
14.1 Introduction
14.2 Creating a replica
14.3 Running a replicator: Dedicated server; Internal server
14.4 Readonly usage of replicators: Temporary data
14.5 Removing a replica
14.6 Federation metadata: Replicated metadata; Metadata not replicated
14.7 Moving the master copy: Failure of the master server; Scheduled moving of the master copy
15 Ant tasks
15.1 Introduction
15.2 Requirements
15.3 Working with xdb's Ant tasks
15.4 xdb's Ant Types
15.5 xdb Ant Types Reference: <federation/>, <database/>, <library/>, <document/>, <user/>, <group/>
15.6 Referencing xdb's Ant Types
15.7 xdb's Ant Tasks: <createdatabase/>, <copydatabase/>, <renamedatabase/>, <deletedatabase/>, <backup/>, <restore/>, <replicatefederation/>, <createlibrary/>, <deletelibrary/>, <upload/>, <parse/>, <libraryidindex/>, <idattributeindex/>, <pathvalueindex/>, <valueindex/>, <elementindex/>, <fulltextindex/>, <metadatavalueindex/>, <metadatafulltextindex/>, <listindexes/>, <deleteindex/>, <batchindexadder/>, <exportlibrary/>, <serialize/>, <deserialize/>, <serialize-users/>, <deserialize-users/>, <adduser/>, <deleteuser/>, <addgroup/>, <deletegroup/>, <session/>, <createfederation/>, <updatefederation/>, <registerreplicator/>, <unregisterreplicator/>, <closedriver/>
15.8 Notes
Appendix

1 Table of contents

2 Introduction

2.1 What is this?
This manual provides a technical introduction to the xdb product. It describes all aspects of installing, configuring, programming with, administering, and using xdb.
This manual contains the basic information on xdb. More detailed information on specific aspects of xdb, and on using xdb together with other tools, is available as 'Advanced Topics' on the EMC Developer Network.

2.2 Who is this manual for?
This manual is for all users of xdb. It assumes that you have knowledge of Java, XML, the operating system that you are using, and basic database principles such as transactions, locking, and access rights. Some knowledge of DOM, XQuery and XSLT is also recommended, but not necessary.

2.3 How is this manual structured?
This manual presents basic and essential information first, and then moves on to more advanced topics. This enables you to get started quickly with xdb.

Quickstart gives you an overview of the essential steps you need to perform to install and set up xdb on your system. It takes you to the point where you have installed xdb, created your first database, and learned how to use the sample files provided with the product. These sample files illustrate the key concepts of xdb functionality.

General information describes the architecture of xdb and the concepts that you need to understand when using the product. These concepts cover both XML in general and issues specific to xdb.

Installing xdb explains how to prepare your system for installing the product, and describes the actions you need to perform when running the installation program.

Creating applications explains how to develop xdb applications, from basic tasks such as programming a transaction to advanced techniques such as versioning, XLink and content models.

Using Sessions & Locking explains the features and issues of the xdb transaction model in detail.

Using XQuery explains in detail how to use XQuery within xdb.

Using indexes describes how you can use xdb's indexing methods to improve the performance of your xdb application.

Using validation and the Catalog describes how xdb handles DTDs and XML Schemas.

Internal structure of xdb gives advanced information on the physical data structure of xdb.

Administering xdb describes the tasks you need to perform as a database administrator in xdb. It covers basic tasks such as creating users, as well as more advanced topics such as backing up a database.

2.4 How to use this manual?
We recommend that you read this manual chapter by chapter, because this gives you a logical progression through the concepts and tasks involved in using xdb. This approach is especially important if you are using xdb for the first time.
However, if you are a first-time user, you may want to leave the later chapters (covering issues such as maintenance and optimization) until you have gained experience with the basic features of the product.
If you are an experienced user of XML and xdb, you may not need to read the General information chapter in detail. However, we recommend that you browse through it before you start using the product, because this release of xdb contains architectural changes as well as new functionality.
If you are an administrator, the Administering xdb chapter is of particular interest. However, all users may find essential information in this chapter.

2.5 Conventions
To ensure clarity and consistency, this manual uses the following conventions for terminology and text usage.

Terminology
Unless otherwise stated, "the product" and "the application" refer to xdb. The adminclient is a front-end component of xdb that gives easy access to administration features, such as user management. The administrator is the database administrator who uses the adminclient. In general, the xdb API conforms to the coding standards published by Sun Microsystems.

Text usage
Strings of literal code and command-line entries appear in courier font. When asked to enter such a string, enter it exactly as it appears in the manual. Variable information appears in italics. Sometimes a variable appears in the middle of a literal string; in that case, replace the italicized part with a value suitable for that context.
Sample code appears in a text box, like this:

Here is where you find sample code.

Sample code illustrates various programming concepts and techniques in xdb. In most cases, the sample code in this manual is also available in sample files supplied with the product. If so, the sample code in the manual is followed by a reference to the appropriate sample file.

2.6 Support and feedback
EMC provides support via Powerlink. We appreciate your feedback on this manual and on any aspect of the xdb product. You can send your feedback by email to veen_michiel@emc.com. You can also access product information, browse FAQs, and submit questions through the EMC Developer Network.

3 Quickstart
This section helps you get started quickly with xdb. It covers all the steps needed to get xdb running, from installation to creating a database to running your first sample. For more detailed information, see the Installing xdb chapter.

3.1 Pre-installation
Before installing xdb, be aware of the following. xdb should run on any system that has a Java 5 installation: it requires a Sun JDK 5 or a fully compatible Java Virtual Machine, which you must install before installing xdb. When installing on Windows 2000, you must be logged on as an administrator or as a user with similar access.

3.2 Installing xdb
On Windows, run the setup.exe file and follow the on-screen instructions. Besides installing the files in the proper directories, the installer extends your PATH environment variable and starts a page server service. If previous versions of xdb are running, uninstall them properly or choose other directories and port numbers. For more information, see the Uninstalling xdb section.
On UNIX, start the installation from the distribution directory with the command sh setup.sh /InstallationPath/xdb. The UNIX installation requires some post-installation steps, which are described in the readme.txt file of the distribution. You can install xdb under any account.

3.3 Creating a database
The first step after installing xdb is to create a database. The easiest way to do this is through the AdminClient, as follows:
1. Start the AdminClient by running xdb admin, which is located in the bin subdirectory of the xdb target directory and, on Windows, in the xdb group of the Start menu.
2. Select the menu option Databases->Create database.
3. Create a database named united_nations, using the super user password entered during installation of xdb and the administrator password northsea. All samples described in this manual use this database by default.
4. After creating the database, you can close the AdminClient.
Note that you can also use the AdminClient to view the data after you have run the samples, and to perform other actions. Alternatively, you can create databases with the command line tool xdb create-database.

3.4 Running a sample
xdb samples are run using the Ant build system. A command line tool called xhive-ant is provided that sets the proper CLASSPATH and other parameters. Run the samples as follows:
1. Open a command prompt and go (cd) to the XhiveDir\bin directory.
2. To run a sample that inserts two documents into the database, enter the command:
xhive-ant run-sample -Dname=manual.StoreDocuments

On successful completion of the sample, a message appears stating the number of documents stored in the database. The sources of all the samples can be found in XhiveDir\src\samples\manual. Before running the samples, check that the property values in SampleProperties.java match your settings.

4 General information

4.1 What is xdb?
xdb is a database that enables high-speed storage and manipulation of very large numbers of XML documents. Using xdb, programmers can build custom XML content management solutions that are tailored to the exact requirements of a given application. xdb stores XML documents in an integrated, highly scalable, high-performance, object-oriented database, and exposes the database and its contents to the user through an application programming interface (API) written in Java.
xdb implements (and extends) the following recommendations of the World Wide Web Consortium (W3C) for querying, retrieving, manipulating, and writing XML documents: Document Object Model (DOM) Level 1; DOM Level 2 (Core and Traversal); DOM Level 3 (Core and Load/Save); XSL Transformations (XSLT); XQuery; XPath; XLink; XPointer.

4.2 The technology behind xdb
This section introduces the following technical aspects of xdb: DOM support and data manipulation; searching, linking and indexing; session and transaction control; administration features; import from non-XML resources and storage of BLOBs; transforming XML into various outputs; versioning.

DOM support and data manipulation
xdb implements an extended DOM Level 3 interface for manipulating the content, structure, and style of documents. xdb supports all DOM Level 3 functionality, including functions for retrieval, modification and navigation within XML documents. Because DOM Level 3 does not support collections of XML documents, xdb provides extended API functions for processing multiple documents simultaneously.
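Because xdb implements the standard DOM Level 3 Load/Save interfaces, code written against org.w3c.dom.ls follows the same pattern whichever implementation supplies them. The minimal sketch below uses only the JDK's bundled DOM implementation (no xdb classes), so it shows the standard interfaces, not xdb's extended API; how to obtain an xdb-backed implementation is not shown here. The class name DomLoadSaveSketch is hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

/** Hypothetical sketch: parse and serialize XML through standard DOM Level 3 Load/Save. */
final class DomLoadSaveSketch {

    // Ask the DOM bootstrap registry for any implementation supporting the "LS" feature.
    private static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Level 3 LS implementation available", e);
        }
    }

    /** Parses an XML string into a DOM Document, synchronously. */
    static Document parse(String xml) {
        DOMImplementationLS ls = loadSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    /** Serializes a Document back to a string, without the XML declaration. */
    static String serialize(Document doc) {
        LSSerializer serializer = loadSave().createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        Document doc = parse("<un><country name=\"Norway\"/></un>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(serialize(doc));
    }
}
```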
xdb supports nested libraries: you can store libraries within other libraries in the same way that you store documents within libraries. Because libraries are implemented as DOM nodes, all operations on documents (including XQuery queries) can also be performed on libraries.

Searching, linking and indexing
The XML Query Language (XQuery) is a query language that you can use to address any piece of information in an XML document, make selections based on conditions, and even construct new structures from the queried information. This makes XQuery an extremely useful tool for searching in XML databases. xdb includes an XQuery query engine. For information on using XQuery, see Using XQuery.
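XQuery builds on XPath path expressions, which address parts of a document and filter them with conditions. The JDK ships no XQuery engine, so this sketch uses javax.xml.xpath (XPath 1.0 only) to illustrate that address-and-select step on a small, made-up document. In xdb you would run such a query through xdb's own XQuery support instead. PathQuerySketch and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates an XPath 1.0 expression and returns the text of each match. */
final class PathQuerySketch {

    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Parse the XML and evaluate the expression, asking for a node-set result.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<un>"
                + "<country population=\"5\"><name>Norway</name></country>"
                + "<country population=\"60\"><name>France</name></country>"
                + "</un>";
        // Address the name elements, keeping only countries above a population threshold.
        System.out.println(select(xml, "/un/country[@population > 10]/name"));
    }
}
```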

Besides XQuery, xdb also supports XPath and XPointer queries. For information on using XPath and XPointer, see Use XPath and XPointer.
XLink is a W3C recommendation that enables links between XML documents. You can use XLink to create simple links, equivalent to <a href> links in HTML, and more complex links, known as extended links. Extended links let you, for example, create one-to-many links and add semantic meaning to your links. For a complete and up-to-date description of XLink, refer to the XML Linking Language (XLink) Version 1.0 recommendation on the W3C website.
Indexing enables faster access to parts of a library or document and increases the performance and scalability of xdb applications. xdb supports several indexing methods: context conditioned indexing, library id and library name indexing, id attribute indexing, element name indexing, value indexing and full text indexing.

Session and transaction control
xdb includes a transaction mechanism that ensures that changes to the database (the 'transactions') are applied consistently across the system. All transactions take place within a session. A transaction is either committed or rolled back, for example when it conflicts with other transactions. For instructions on using sessions and transactions in xdb, see Use sessions and transactions.

Administration features
The administration features of xdb enable the xdb administrator to manage the access permissions of users and user groups for documents and queries. An administration tool (the 'adminclient') provides access to the administration features and also includes a data browser that displays the contents of the xdb database in a tree view. The administration features are also accessible through the API.

Import from non-XML resources and storage of BLOBs
xdb contains an SQL Loader that uses JDBC to import data stored in relational databases. All JDBC-compliant RDBMSs are supported.
Data in sequential files can also be loaded into xdb using the SQL Loader interface. xdb also contains an integrated version of the Xerces parser for importing XML documents from files on the file system. More information on the import facilities of xdb can be found in the Import non-xml data section. Besides XML documents, xdb can also store binary large objects (BLOBs).

Transform XML into various outputs
xdb contains a transformation engine that uses XSLT. XSLT provides a declarative means of transforming an XML source tree into any required result tree, which makes it possible to transform XML into formats such as HTML or WML. Publishing XML content on the Web (or for any other device) typically relies heavily on XSLT.
xdb includes a special interface, com.xhive.util.interfaces.XhiveFormatterIf, for producing formatted documents that are converted to PDF. You can either format an FO document as a PDF string, or transform an XML file into an FO document and then format that as a PDF string. For a detailed specification of the formataspdf method, refer to the Publish XML documents section.

Versioning
xdb provides linear versioning with branches. This feature enables the storage of multiple (older) versions of a document within the context of the document. Storing multiple versions of a document makes it easier to keep track of the changes in a document and to restore older versions when needed.

4.3 xdb logical architecture
This section describes the logical architecture of xdb. For an overview of the internal architecture of xdb, see the Internal structure of xdb chapter. The following diagram provides an overview of the logical architecture of xdb:

Figure 1. xdb's logical architecture

Federated database
A single xdb application contains one or more databases. These databases are grouped within a single federated database (the xdb federation). A federated database contains the super user and the databases. You can obtain an iterator over all database names using the getDatabases() method of XhiveFederationIf.

Databases
An xdb application can contain one or more databases. These databases are contained in the federated database and are created by the super user. A database can contain users and user lists; groups and group lists; indexes and index lists; libraries; documents; catalogs and schemas; and BLOBs (binary large objects).

Users and user lists
Every user of an xdb application has a user account stored in the database. This user account is identified by a unique user name and has a password for access control. Each user has their own set of access rights to the database. For example, one user could be allowed to update documents in the database, while another user has read-only access to documents.
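To make per-user access rights concrete, here is a toy model, not the xdb API: each unique user name maps to its own set of rights, mirroring the example above where one user may update documents and another has read-only access. UserRightsSketch, Right, addUser and may are illustrative names only. xdb's actual user management is covered in Manage users and groups.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model, not the xdb API: each unique user name has its own set of access rights. */
final class UserRightsSketch {

    enum Right { READ, UPDATE }

    private final Map<String, EnumSet<Right>> rightsByUser = new HashMap<>();

    /** Registers a user; user names must be unique. */
    void addUser(String name, Set<Right> rights) {
        EnumSet<Right> copy = EnumSet.noneOf(Right.class);
        copy.addAll(rights); // defensive copy, safe even for an empty set
        if (rightsByUser.putIfAbsent(name, copy) != null) {
            throw new IllegalArgumentException("User name already exists: " + name);
        }
    }

    /** True only if the user exists and holds the given right. */
    boolean may(String name, Right right) {
        EnumSet<Right> rights = rightsByUser.get(name);
        return rights != null && rights.contains(right);
    }

    public static void main(String[] args) {
        UserRightsSketch users = new UserRightsSketch();
        users.addUser("editor", EnumSet.of(Right.READ, Right.UPDATE));
        users.addUser("reader", EnumSet.of(Right.READ));
        System.out.println("editor may update: " + users.may("editor", Right.UPDATE));
        System.out.println("reader may update: " + users.may("reader", Right.UPDATE));
    }
}
```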

In addition to regular users, a super user account exists. This account is created automatically during installation and enables initial database configuration. All other user accounts are created by the database administrator (a regular user with extra privileges). xdb maintains a user list for each database, which contains all the users of that database.

Groups and group lists
A group is an object in the database with which you can associate one or more users. The main purpose of a group is to let you assign the same access rights and privileges to a subset of users. Once rights are assigned to a group, you only need to add users to the group or remove them from it to change their privileges. xdb maintains a group list for each database, which contains all groups of that database.

Indexes and index lists
An index list provides a list of all the indexes of a library or document.

Libraries
A library is a logical means of storing documents or other libraries. You can create as many libraries as you need to build a hierarchy or storage architecture for your documents. The nested structure of libraries within xdb is very similar to the nested structure of directories or folders in a file system: any library can contain other libraries, and there is exactly one root library. The root library is created automatically when the database is created, but otherwise behaves like an ordinary library.

Catalogs and DTDs and XML Schemas
The catalog is the part of the database that stores Document Type Definitions (DTDs) and XML Schemas. For more information, see the Using validation and the Catalog chapter.

Documents
A document is the means of storage for XML data in an xdb database. xdb can handle both valid XML documents (that is, documents conforming to a structure defined in a DTD or XML Schema) and XML documents that are only well-formed.

Super user
A federated database has one super user, with username superuser.
The super user has the right to create and delete databases, and to perform administration operations such as setting the license key and backing up the database. The super user cannot access regular data such as libraries and documents, and is not represented by an object in the xdb API. The super user is created during xdb installation.

5 Installing xdb

5.1 Installation requirements
Before running the installation program, perform the following checks and tasks to ensure that the installation succeeds.

Things to check before installation
Check the following before installation: supported platforms, system requirements, and Java settings.

Supported platforms
xdb is a Java application, so you can install it on any platform on which a Java Development Kit 5 or higher is already installed. Installers are available for Windows 2000/XP and UNIX (tested on Linux, Solaris, HP-UX and Mac OS X).

System requirements
The following table shows the system requirements for each supported platform.

Table: System requirements for xdb
RAM: 256 MB
Hard drive space for xdb software: 50 MB
Hard drive space per database: 1 MB (minimum; the real size depends on the data loaded)
Java: JDK 5 or higher
Network: TCP/IP

Java
To install xdb, you must already have installed a Java Development Kit (JDK 5 or higher) or a fully compatible Java Virtual Machine. For platform-specific information, see System requirements. Also note the base directory of the JDK installation, because the installation program requires this information.

Things to do before installation
Perform the following tasks before installation: read the readme.txt file, and, on Windows, log in as an administrator or as a user with similar access.

5.2 Installation - Windows
When you have satisfied the installation requirements, you can run the installation program. Follow all the prompts in the installation program, then create a database. This completes the installation procedure, and xdb is then ready for use.

Run the installation program
On Windows, the installation program is called xdb_setup.exe and is located in the root directory of the xdb CD or distribution file. Besides installing the files in the proper directories, the installation program extends your PATH environment variable. If previous versions of xdb are running, uninstall them properly first. For information on uninstalling xdb, see the Uninstalling xdb section.
When you run xdb_setup.exe, the installation process begins.

Figure 2.

Follow the on-screen instructions to complete the installation. The installation program requires your input for the following steps:
1. Accept the software license agreement. If you do not accept the license agreement, the installation is terminated.

Figure 3.

2. Specify the target installation directory. The default on Microsoft Windows is C:\Program Files\xDB (or similar, depending on the language of your Windows installation).

Figure 4.

3. Browse for the base directory of the applicable Java Development Kit, for example C:\jdk1.5.

Figure 5.

4. Enter the license key you received for the product. Note that xdb software licenses are valid for a limited time only. If in doubt about your license, contact xdb customer support.

Figure 6.

5. Specify the password of the super user. You need this password later to create databases with xdb and to perform other super user tasks.

Figure 7.

6. Choose whether to proceed with a standard installation (recommended) or to perform some additional advanced installation steps.

Figure 8.

If you proceed with a standard installation, the program files for xdb are now installed on your host machine. If you proceed with an advanced installation, your input is needed for the following two additional steps.
7. Specify the database directories. The installation program provides defaults for all database directories, based on the installation directory. Never put database files on remote file servers, such as NFS servers or MS Windows shares. For performance, it is best to place the log files on a different disk than the database files (in xdb 5, this makes a bigger difference than in previous xdb versions).
It is also possible to change the default page size. Ideally, the page size should match the page size of the file system on which the database will reside, and it should never be larger than that. See the section on page sizes for more information.

Figure 9.

You can set some JVM properties. These only determine the JVM options with which client applications, such as the samples run through Ant and the Adminclient, are started.

Figure 10.

During installation, a service is installed and started that acts as a page server. You need to specify some settings for this service:
The server can run on any port; the installer checks whether the port is available. The port number determines which URL you have to use later to access the database.
The cache size determines the number of pages cached by the server to speed up performance. This is one of the major factors determining overall performance. Leaving it at 0, so that the database uses half of the JVM's memory as cache, is a safe default.
For security reasons, by default only processes running on the same machine can access the server. If you want processes on multiple machines to access the database, you can allow that here.

20 Page 18 of 130 Figure 11. After successful installation of xdb, the installation program displays a message confirming that the installation completed without errors. To have all settings take effect you will need to do log off and log in (a reboot is not necessary, only the PATH variable change needs to be registered). 5.3 Installation - UNIX On UNIX platforms you can install xdb as any user. Installation consists of three steps: 1. Extracting the distribution to a directory. 2. Entering installation parameters. 3. Performing post-installation steps Starting the installation Extract the distribution.tar.gz file to the directory where you would like to install xdb. Make sure that you have a working Java 5 executable in your PATH (test this by running java -version from the command line). Then run the included setup script using sh setup.sh. You will need write permissions in both the installation directory (the directory where you untarred the distribution to) and the directory where you create your initial federation Entering installation parameters After the installation files have been copied, you need to enter a number of parameters needed during installation. The format for these questions is: Question [default] : For most questions a default answer is provided. If that default is okay, press the Enter-key, otherwise type a new value (and press Enter) to override that default. If information you enter is incorrect, a message is displayed, and the question is asked again. For yes/no-style questions the default alternative is printed in upper case - in [y/n], no will be the default. The basic questions are: A base directory to the JDK - the installation program attempts to identify where the JDK resides based on the JAVA_HOME and PATH environment variables, but it is wise to check whether that directory really contains the full JDK. Your valid xdb license key. You need a super-user password to administer xdb databases. 
Note that this password must be typed twice and is displayed in clear text when the installer runs with a Java version older than 6.
The database directory, where your data will reside.

After this you are asked whether you want to change the advanced settings. If you select no (the default), the installation is completed with default values for the advanced settings. If you select yes, you can set the following parameters:
The directory for journal files. For performance reasons, you could place the journal files on a different hard disk from your database files.
The page size in bytes. Ideally this should equal the block size of the file system where the databases will be located, and it should never be larger than that size. See the section on page sizes for more information.
The port number on which the xdb page server accepts connections.
Whether applications on other machines may access the page server on this machine.
The minimum and maximum amount of memory passed to the JVM (-Xms and -Xmx). These values are used in the scripts that start client applications such as the Adminclient and xhive-ant.
The number of pages the page server caches. More pages means more memory usage but better performance. The default is 0, which means the server uses 50% of the JVM's available memory.
Other Java command line options. The installer tries to default to useful options for certain platforms; usually you do not need to change them.

Performing post-installation steps

After you have installed xdb on UNIX, perform the following post-installation steps:
Add the $XHIVE_HOME/bin directory to your path, for example:
bash$ export PATH="${PATH}:$XHIVE_HOME/bin"
or, if you use a (t)csh:
tcsh> setenv PATH ${PATH}:$XHIVE_HOME/bin
Start the page server with the command xdb run-server. This starts a page server process that can then be accessed from the Administrator client, the command line tools, and so on.

5.4 Verify the installation

The following are the most important commands available after installation. Everything that can be done with these commands can also be done by programming your own application against the xdb API. Running a command without any parameters displays more information about it. Additional commands are described in the command line section.

Important xdb commands
xdb admin: Graphical administration client, for maintaining databases, users and content.
xhive-ant: Compile and run the samples, as well as other programs.
xdb create-database: Create a new database.
xdb delete-database: Remove an existing database.
xdb create-federation: Create a new, empty federation.
xdb configure-federation: Set the superuser password and/or license key of a federation.
xdb backup: Back up a federation to a backup file.
xdb restore: Restore a federation from a backup file.
xdb info: Show debug information on currently open transactions and their locks.
xdb run-server: Start a server process for a specific federation.
xdb start-server: Start the dedicated server process for the default federation.
xdb stop-server: Stop the dedicated server process for the default federation.
xdb create-replica: Create a full replica of a federation at a given location (optionally registering a replica id for it).
xdb suspend-diskwrites: Ensure the federation files are flushed to disk and suspend (or resume) writing to them.

5.5 Configuration files

The installation process creates the following three configuration files:
xdb.properties contains the settings used when launching commands through the xdb command, the xhive-ant tool, or one of the xh* tools (which in turn map to xdb).
$XHIVE_HOME\bin\xDB Server.lax contains the JVM options for the NT service process on Windows. The relevant options in this file are lax.nl.java.option.java.heap.size.initial, equivalent to -Xms, and lax.nl.java.option.java.heap.size.max, equivalent to -Xmx. You can also change JVM parameters using lax.nl.java.option.additional, and the default JVM to use with lax.nl.current.vm. All other settings are read from the regular property file. This file applies only to Windows; on UNIX operating systems the server is configured through xdb.properties.
$XHIVE_HOME\bin\xDB Admin Client.lax contains the JVM options for the xDB Admin Client.exe executable on Windows. These settings apply only when you start the Admin Client through the Start menu shortcut or the executable, not when you start it through xdb admin or xhive-ant run-admin.

xdb.properties contains key/value pairs for the database settings described below. Settings from the configuration file can be overridden by setting an environment variable with the same name (for example, XHIVE_BOOTSTRAP), or by passing the corresponding command line switch to the tool. The command line tools first search for a .xdb.properties file in the current user's home directory, so you can create default xdb configurations for specific users.
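The override rules just described can be pictured as a lookup that consults several sources in turn. The exact precedence is an assumption of this sketch and is not spelled out above: it gives a command line switch priority over an environment variable, the environment variable priority over the per-user ~/.xdb.properties file, and that file priority over the installation's xdb.properties. The class and method are hypothetical, not part of the xdb tools, and the host name dbhost is only an example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

public class XdbSettingLookup {
    // Hypothetical resolver (not part of xdb). Assumed precedence, highest first:
    // command line switch, environment variable, ~/.xdb.properties, xdb.properties.
    static Optional<String> resolve(String key,
                                    Map<String, String> commandLine,
                                    Map<String, String> environment,
                                    Properties userFile,
                                    Properties installFile) {
        if (commandLine.containsKey(key)) {
            return Optional.of(commandLine.get(key));
        }
        if (environment.containsKey(key)) {
            return Optional.of(environment.get(key));
        }
        String fromUserFile = userFile.getProperty(key);
        if (fromUserFile != null) {
            return Optional.of(fromUserFile);
        }
        return Optional.ofNullable(installFile.getProperty(key));
    }

    public static void main(String[] args) {
        Properties install = new Properties();
        install.setProperty("XHIVE_BOOTSTRAP", "xhive://localhost:1235");
        Properties user = new Properties();
        // In a real launcher this map would come from System.getenv().
        Map<String, String> env = Map.of("XHIVE_BOOTSTRAP", "xhive://dbhost:1235");
        // prints "xhive://dbhost:1235": the environment variable wins over the file
        System.out.println(resolve("XHIVE_BOOTSTRAP", Map.of(), env, user, install).orElse("(unset)"));
    }
}
```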
xdb settings (property, example value on Windows, explanation)
XHIVE_BOOTSTRAP (xhive://localhost:1235): The xdb URL used by the command line tools. In the example, the tools try to connect to a server running on localhost.
XHIVE_DATABASE (empty): The default database to use. This is not set by the installer.
XHIVE_USERNAME (Administrator): The default username for the command line tools. Do not use this on production systems!
XHIVE_PASSWORD (empty): The default password for the command line tools. Do not use this on production systems!
XHIVE_MAX_MEMORY (128M): Maximum memory used by a single command line tool, as in the -Xmx parameter to the JVM.
XHIVE_MIN_MEMORY (32M): Minimum memory used by a single command line tool, as in the -Xms parameter to the JVM.
XHIVE_CACHEPAGES (0): The number of cache pages allocated in command line tools. 0 causes half of the JVM memory to be used.
XHIVE_FEDERATION (C:/Program Files/xDB/data/XhiveDatabase.bootstrap): The location of the default database file. Note that you can have more than one federation. This property is used by xdb run-server (XHStartServer and the Windows service), xdb create-federation, and xdb restore.
XHIVE_SERVER_MAX_MEMORY (256M): Maximum memory for the xdb server process.
XHIVE_SERVER_CACHEPAGES (0): Cache pages for the xdb server process. 0 causes half of the JVM memory to be used.
XHIVE_OPTS (-server): Additional options to be passed to the JVM.
XHIVE_SERVER_ADDRESS (localhost): The address the server listens on. If set to "*", the server accepts connections from all hosts; if set to "localhost", only connections from the same machine are accepted.
XHIVE_SERVER_PORT (1235): The port used by the xdb server.
XHIVE_HOME (C:/Program Files/xDB): The installation location. You can change this to use a different xdb version or an installation in a different location. If left empty, the tools try to infer a location.
XHIVE_JAVA_HOME (C:/Program Files/Java/jdk1.6.0_10): The JDK installation to be used with the xdb tools. This must be a proper Java Development Kit, not a Java Runtime Environment (JRE). If left empty, the tools use JAVA_HOME or, as a last resort, any java executable on the path.

Changing server-related settings (such as XHIVE_SERVER_MAX_MEMORY) requires a server restart. Other settings are picked up the next time you run a command line tool. For more information about the command line tools, see the command line section.

5.6 Running the samples

After you have installed xdb, you can create databases and run the samples provided with xdb.

Create a database

To create a database:
1. Start the Admin Client by running xdb admin, which is located in the bin subdirectory of the xdb installation directory; on Windows you can also start it from the xdb group in the Start menu.
2. Create a database by selecting the menu option Databases->Create database.
3. Create a database named united_nations, using the super user password entered during installation of xdb and the administrator password northsea. This database is used by default in all samples described in this manual.
4. After creating the database, you can close the administrator client. You can also use the administrator client to view the data after you have run the samples, and to perform other actions.

You can also create databases using the command line tool xdb create-database. For more details on creating databases, see the section Create a database in the Creating applications chapter.

Execute the samples

xdb samples can be run using the ant tool:
1. Open a command prompt and go (cd) to the XhiveDir\bin directory.
2. To run a sample that inserts two documents into the database, enter the command:
xhive-ant run-sample -Dname=manual.StoreDocuments
On successful completion of the sample, a message appears stating the number of documents stored in the database. The sources of all the samples can be found in XhiveDir\src\samples\manual.
5.7 Uninstalling xdb

In the (hypothetical) event that you want to uninstall xdb, follow the instructions in the uninstalling.txt file. On Windows, you can uninstall xdb with the Add/Remove Programs item in the Control Panel, or with the Uninstall xdb link in the xdb program group. The uninstaller does not delete data directories, compiled samples, or other files created after installation; you can remove these by hand afterwards. On UNIX, first stop the page server, if it is running, with the command xdb stop-server. Then remove the installation and data directories.

5.8 Configuring the xhive.bootstrap property

A property called xhive.bootstrap specifies the location of the xdb federation. The property can be used in two ways:
If the property is a URL of the form xhive://host:port, the xdb code attempts to connect to an xdb server listening on the specified TCP/IP port.
If the property is the complete (or relative) path to an XhiveDatabase.bootstrap file, an xdb server is started in the current JVM. Depending on the application, this can be much faster than using a remote server, because the communication overhead is avoided. However, only one JVM at a time can run an xdb server for a specific federation.
If you use xhive-ant to run xdb applications, the xhive.bootstrap property automatically points to the location of the xdb server.
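The two forms of the property can be told apart by their prefix. The following sketch is a hypothetical helper, not part of the xdb API, that classifies a bootstrap value and extracts host and port from an xhive:// URL; in a real application you would simply pass the string on to XhiveDriverFactory.getDriver() and let xdb interpret it.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BootstrapLocation {
    enum Kind { REMOTE_SERVER, LOCAL_FILE }

    final Kind kind;
    final String host; // set only for REMOTE_SERVER
    final int port;    // set only for REMOTE_SERVER, otherwise -1
    final Path file;   // set only for LOCAL_FILE

    private BootstrapLocation(Kind kind, String host, int port, Path file) {
        this.kind = kind;
        this.host = host;
        this.port = port;
        this.file = file;
    }

    // Hypothetical classifier (not xdb API) for values of the xhive.bootstrap property.
    static BootstrapLocation parse(String value) {
        if (value.startsWith("xhive://")) {
            // Remote form: connect to an xdb server listening at host:port.
            URI uri = URI.create(value);
            if (uri.getHost() == null || uri.getPort() == -1) {
                throw new IllegalArgumentException("expected xhive://host:port, got " + value);
            }
            return new BootstrapLocation(Kind.REMOTE_SERVER, uri.getHost(), uri.getPort(), null);
        }
        // Local form: a complete or relative path to XhiveDatabase.bootstrap,
        // meaning the server would run inside the current JVM.
        return new BootstrapLocation(Kind.LOCAL_FILE, null, -1, Paths.get(value));
    }

    public static void main(String[] args) {
        for (String value : new String[] {"xhive://localhost:1235", "data/XhiveDatabase.bootstrap"}) {
            BootstrapLocation location = parse(value);
            if (location.kind == Kind.REMOTE_SERVER) {
                System.out.println("remote server " + location.host + " on port " + location.port);
            } else {
                System.out.println("in-process server using " + location.file);
            }
        }
    }
}
```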

The syntax of the xhive.bootstrap property as used with xhive-ant is:
java -Dxhive.bootstrap=xhive://host:port
or
java -Dxhive.bootstrap=PathName/XhiveDatabase.bootstrap
where PathName is the complete path to the directory containing the XhiveDatabase.bootstrap file. Enclose the parameter in quotation marks if it contains whitespace. You can also specify the path to XhiveDatabase.bootstrap, or the URL to connect to, in your Java application, as an argument to getDriver() on XhiveDriverFactory.

5.9 Using the Java command line instead of xhive-ant

If you use the Java command line, you first need to configure the CLASSPATH variable to include the necessary jar files. Which jars are necessary depends on the functionality you need to use.
xhive.jar, google-collect.jar, xercesImpl.jar, xbean.jar, jsr173_api.jar, antlr-runtime.jar, icu4j.jar, lucene.jar: These are all needed for basic xdb functionality. When using Java 6, the jsr173_api.jar file is not needed, because the interfaces it contains are included in Java 6.
xalan.jar, serializer.jar: These jars are only needed when using XSLT transformations. If desired, you can substitute another JAXP-compliant XSLT processor, such as Saxon.
fop.jar, xmlgraphics-commons.jar, avalon-framework.jar, batik-all.jar, commons-io.jar, commons-logging.jar: These jars are only needed when creating PDFs using the XhiveFormatterIf.
jline.jar: This jar is only used by the command line clients and is not needed by applications that use xdb.
ant.jar, ant-launcher.jar: These jars are only used by the xhive-ant script. They are not needed to run applications.

5.10 The xdb dedicated server process

When you have installed xdb, the system is configured to run with a dedicated page server that runs as a background process, to which other applications that want to access the database can connect. It is important to realize that this process is not a required part of the xdb architecture.
Java applications can also access the database directly through the bootstrap file, without making any network connection. However, when the database files are accessed directly, no other process (including the dedicated server described here) may access the database at the same time. This need not be a problem, as long as the 'main application' starts a listener thread for other applications to connect to (see below).

It is also important to realize that the server program is a very small Java program that uses the xdb API like any xdb application you would write yourself. Apart from some configuration options and the processing of the passed parameters, the whole server process essentially consists of these five lines:

XhiveDriverIf driver = XhiveDriverFactory.getDriver(bootstrapPath); // Get a driver for a bootstrap location
driver.init(cachePages);                      // Initialize the page cache of the driver
ServerSocket socket = new ServerSocket(port); // Create a listen socket
driver.startListenerThread(socket);           // Start accepting remote connections
Thread.currentThread().join();                // Block the main thread forever

Only the ServerSocket and startListenerThread lines are specific to accepting remote connections; adding these lines to your own application makes it accept connections from other applications.

As mentioned in the performance section, running xdb with a separate dedicated server, as configured after installation, is not necessarily the best configuration for performance. It is configured with a separate server by default because we feel that is the more convenient way to get started, but for production applications where performance is essential, running the complete database in process with the application itself might be better.

Configuration of the server

The default configuration of a server is determined during the installation.
The settings are the port number to accept connections on (default '1235'), the size of the page cache used by the server (default '0', which makes the server use half of the available JVM memory), and the address(es) to accept connections from ('localhost' means only connections from the same machine are accepted; '*' means connections from every machine are accepted). You can change these parameters after installation by editing the configuration files. For the Windows service, this file is xdb-server.properties; on UNIX the parameters need to be changed in the xdb.properties file. The files are located in the xdb installation directory. You have to stop and start the server process after any change (see below). Note that when you enlarge the cache size, you also have to allocate more memory to the process, through the XHIVE_SERVER_MAX_MEMORY parameter in the same configuration file. The selected page size can be found in the XhiveDatabase.bootstrap file, but it cannot be changed after a federation has been created.

The dedicated process is not configured for SSL connections by default. However, examining the configuration files shows that the server process is essentially the Java class com.xhive.tools.XhiveServer, which corresponds to the xdb run-server command. Examining the options of the xdb run-server command and reading the SSL section should get you started on changing the configuration file.

As explained, you do not have to use a server process at all. To configure xdb for use without a server:
Use /path/to/XhiveDatabase.bootstrap instead of xhive://hostname:port as the bootstrap path in your application.
Stop the server, because only one process is allowed to access the federation. On UNIX, the server simply does not run if you do not start it. On Windows, stop the server with net stop xhive-server and disable its automatic startup in the system preferences.

XHStartServer

XHStartServer starts a background server process for the default federation, to which other processes can connect. Error log output is redirected to a file named server.log, which is placed in the same directory as the XhiveDatabase.bootstrap file.
On Windows, this background server is implemented as a service, so this command only starts the service named 'xdb-server', which effectively runs the Java program com.xhive.tools.XhiveServer. The service starts automatically when the machine boots. Because this is a service, the SYSTEM account must have access to all involved resources (jars and database files).
On UNIX, the background server is simply a Java process running in the background, so the command is a script that starts a background Java process for the class com.xhive.tools.XhiveServer. If you want to run a server at startup, add this command to your system startup configuration. Be aware of the user under which the command is started in that case: that user must have full access to all involved resources.

XHStopServer

XHStopServer stops the background server process for the default federation. On Windows, this simply stops the 'xdb-server' service. On UNIX, it kills the background process whose pid XHStartServer registered in the file server.pid, in the same directory as the XhiveDatabase.bootstrap file. If the process terminated unexpectedly, you have to remove this pid file manually. An alternative is the xdb stop-server tool, which connects to a running federation and tells it to exit.

6 Creating applications

6.1 Introduction

This chapter describes how to create xdb applications of varying complexity. The sections appear in logical order, beginning with basic steps such as creating a database and moving on to advanced topics such as XPointer, XQuery, XLink and Abstract Schema. When developing an xdb application, you may need to:
Create a database
Connect to the database
Use sessions and transactions
Create libraries
Parse XML documents
Validate XML documents
Store XML documents
DOM configuration settings

Parse documents with context
Store BLOBs
Import non-xml data
Create documents
Retrieve XML documents and document parts
Use XQuery
Use XPath and XPointer
Use indexes
Traverse XML documents
Export XML documents
Publish XML documents
Use XLink
Use abstract schemas
Revalidate documents with XML schema
Access PSVI information
Manage users and groups
Use versioning
Using metadata on library children
Using JAAS to connect to the database

We provide samples for most of the important functions. One way to run a sample is through the xhive-ant tool. For example, xhive-ant run-sample -Dname=manual.StoreDocuments compiles and runs the StoreDocuments sample. Before you run the samples, it is wise to adjust the values in src/samples/manual/SampleProperties.java to match your system.

6.2 Create a database

When you install xdb on a system, one database, called a federated database, is created by default. This federated database acts as a holder for one or more "regular" xdb databases. The federated database also contains super user information. For more information on the xdb architecture, see the xdb architecture section.

As an xdb user, you can create as many "regular" databases as you need. You can create a database in xdb using any of the following:
The command line utility xdb create-database.
The adminclient.
The Application Programming Interface (API) functions.

Creating a database using xdb create-database

The command line utility xdb create-database is located in the XhiveDir/bin directory. When creating a new database using xdb create-database, you must give the super user password, the administrator password, and the name of the new database, as follows:
xdb create-database SuperUserPassword AdminPassword DatabaseName
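As the note below states, names and passwords are compared case-sensitively, and passwords must be three to eight alphanumeric characters. A script that wraps xdb create-database might check its arguments before calling the tool. The class below is a hypothetical pre-check written for illustration only; xdb performs its own validation, and reading "alphanumeric" as ASCII letters and digits is an assumption.

```java
import java.util.regex.Pattern;

public class CreateDatabaseArgs {
    // Assumption: "alphanumeric" means ASCII letters and digits only.
    private static final Pattern PASSWORD = Pattern.compile("[A-Za-z0-9]{3,8}");

    // Hypothetical pre-check (not xdb API): 3 to 8 alphanumeric characters.
    static boolean isValidPassword(String password) {
        return password != null && PASSWORD.matcher(password).matches();
    }

    // Database names are case-sensitive, so compare with equals(), never equalsIgnoreCase().
    static boolean sameDatabaseName(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(isValidPassword("northsea"));        // true: 8 alphanumeric characters
        System.out.println(isValidPassword("ab"));              // false: too short
        System.out.println(sameDatabaseName("Xhive", "XHIVE")); // false: different names
    }
}
```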

Note: xdb is case-sensitive for database names, user names, and passwords. For example, Xhive and XHIVE are regarded as different database names. Passwords must be between three and eight characters long, and must be alphanumeric. The default administrator password used in all samples is northsea. Make the appropriate changes to SampleProperties.java if you use a different password.

Creating a database using the adminclient

To create a database using the adminclient:
1. Start the adminclient.
2. Select Databases->Create database. The Create database window is displayed.
Figure 13. Create database
3. Enter the database name, super user password, and administrator password.

Creating a database using API functions

If you are a super user, you can create a database using the API. The method for creating a database is called createDatabase(), and is located in the com.xhive.core.interfaces.XhiveFederationIf interface. To create a new database using the API:
1. Start a session and open a connection to the database as super user. The databaseName parameter of the connect() call should be null:

XhiveDriverIf driver = XhiveDriverFactory.getDriver();
if (!driver.isInitialized()) {
    driver.init(1024);
}
XhiveSessionIf session = driver.createSession();
session.connect(superUserName, superUserPassword, null);

Note: By default, the sample CreateDatabase.java uses northsea as the super user password. If you entered a different super user password during xdb installation, change the source code of the sample accordingly.
2. Get a handle to the database federation:

XhiveFederationIf federation = session.getFederation();

3. Call createDatabase() with the name of the new database and the administrator password. You can specify a configuration file or use the default configuration (null):

federation.createDatabase(newDbName, administratorPassword, null, System.out);

More information on the physical (file) structure of a database can be found in a separate section.

Note: A super user can only create (and delete) databases. Therefore, after creating a database with the createDatabase() method, you need to disconnect and reconnect as an administrator to perform actions on the new database.
See also:
samples: <XhiveDir>\src\samples\manual\CreateDatabase.java
API documentation: com.xhive.core.interfaces.XhiveFederationIf

6.3 Connect to the database

To connect to an xdb database:
1. Obtain an xdb driver:
XhiveDriverIf xhiveDriver = XhiveDriverFactory.getDriver();
If you do not specify the bootstrap location in the environment of the JVM, you can pass it as an argument to getDriver():
XhiveDriverIf xhiveDriver = XhiveDriverFactory.getDriver("xhive://localhost:1235");
(If you connect to the database without a server, use the path to the XhiveDatabase.bootstrap file instead.)
2. Initialize the local page cache shared by the sessions for this driver:
xhiveDriver.init(1024);
(You should initialize a specific driver only once in your application; you can check this with isInitialized().)
3. Create a new XhiveSessionIf:
XhiveSessionIf session = xhiveDriver.createSession();
4. Connect to the database, supplying a user name, password, and database name:
session.connect(userName, userPassword, databaseName);
See also:
samples: <XhiveDir>\src\samples\manual\ConnectDatabase.java
API documentation: com.xhive.XhiveDriverFactory, com.xhive.core.interfaces.XhiveDriverIf, com.xhive.core.interfaces.XhiveSessionIf

6.4 Use sessions and transactions

There is also a separate chapter with background information on sessions. In xdb, all database operations take place within a session. The developer determines the scope of a session. To create a session in xdb:
1. Obtain and initialize a driver, as described in the Connect to the database section.
2. Create a session using the createSession() method of the com.xhive.core.interfaces.XhiveDriverIf interface, and connect it to the database.
XhiveDriverIf driver = XhiveDriverFactory.getDriver();
if (!driver.isInitialized()) {
    driver.init(1024);
}
XhiveSessionIf session = driver.createSession();
session.connect(userName, userPassword, databaseName);

Within a session, one or more transactions can occur. A transaction is a group of operations that accesses and updates one or more XML documents, or parts of XML documents, in a database. Uniting a group of operations in a transaction makes that group 'atomic': either all the instructions are executed to successful completion, or none are performed. This ensures that a database is never left in an inconsistent state.

To use transactions within xdb:
1. Start the transaction with the begin() method of com.xhive.core.interfaces.XhiveSessionIf.
2. Enter the instructions you want executed during the transaction.
3. End the transaction using either the commit() or the rollback() method. Use commit() when you want to make the changes of your transaction permanent. Use rollback() when, because of some failure during the transaction, you want to reverse all the instructions within the transaction.

Always roll back the transaction if you get an unexpected exception. If, for instance, the disk becomes full while a document is being loaded, modifications may have been done only partially, and committing such partial modifications can leave an inconsistent data structure in the database.

Within the scope of a transaction, an application could, for example, parse external XML documents and append them to a library.
If an error occurs during parsing or appending, the complete transaction is rolled back and none of the files are appended:

session.begin();
try {
    XhiveLSParserIf parser = rootLibrary.createLSParser();
    for (int i = 1; i <= numFiles; i++) {
        XhiveDocumentIf newDocument = parser.parseURI(
            new File(baseFileName + i + ".xml").toURI().toString());
        rootLibrary.appendChild(newDocument);
    }
    session.commit();
} catch (Exception e) {
    // in case of an error: roll back the whole transaction
    session.rollback();
    e.printStackTrace();
}
// remove the session
session.disconnect();
session.terminate();

In addition to commit(), xdb offers an alternative method, checkpoint(), which commits all persistent operations executed since the last begin() or checkpoint(). One advantage of checkpoint() over commit() is that the transaction remains active after the checkpoint() call; another is that references to database objects remain usable. After a commit, when you begin a new transaction on the same session, you must get all database objects again. For instance, the code

session.begin();
XhiveLibraryIf library = session.getDatabase().getRoot();
session.commit();
session.begin();
System.out.println(library.getName());
session.commit();

will not work: you get an XhiveException with error code XhiveException.OBJECT_DEAD when you access the library after the second begin(). The reason is that another concurrent session may have removed the database objects you hold references to. You have to get a new reference to the library after the second begin(), for example:

session.begin();
XhiveLibraryIf library = session.getDatabase().getRoot();
session.commit();
session.begin();
library = session.getDatabase().getRoot();
System.out.println(library.getName());
session.commit();

The following sections discuss some of the possible persistent operations, including creating a library and storing a document.
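The begin/commit/rollback discipline can be tried out without a database. The sketch below uses a hypothetical in-memory Session class, a stand-in and not the xdb XhiveSessionIf, that buffers changes until commit(), so a failure halfway through a batch leaves nothing behind once rollback() has run. It mirrors the structure of the example above: begin, do all the work inside try, commit at the end, and roll back in the catch block.

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionPatternSketch {
    // Hypothetical stand-in for a session (not xdb API): changes stay pending until commit().
    static class Session {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private boolean active;

        void begin() {
            if (active) throw new IllegalStateException("transaction already active");
            active = true;
        }

        void append(String document) {
            if (!active) throw new IllegalStateException("no active transaction");
            pending.add(document);
        }

        void commit() {
            if (!active) throw new IllegalStateException("no active transaction");
            committed.addAll(pending);
            pending.clear();
            active = false;
        }

        void rollback() {
            pending.clear(); // discard every change made since begin()
            active = false;
        }

        List<String> committedDocuments() {
            return committed;
        }
    }

    // Stores all documents atomically; returns false if the batch was rolled back.
    static boolean storeAll(Session session, List<String> documents) {
        session.begin();
        try {
            for (String doc : documents) {
                if (doc.isEmpty()) {
                    throw new IllegalArgumentException("cannot parse an empty document");
                }
                session.append(doc);
            }
            session.commit();
            return true;
        } catch (RuntimeException e) {
            // Never commit a partially applied batch: undo everything instead.
            session.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        System.out.println(storeAll(session, List.of("<a/>", "<b/>"))); // true
        System.out.println(storeAll(session, List.of("<c/>", "")));     // false
        System.out.println(session.committedDocuments());               // [<a/>, <b/>]
    }
}
```

Because the second batch fails on its second document, "<c/>" never becomes visible, which is the all-or-nothing behaviour the text asks you to preserve by always rolling back on unexpected exceptions.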
Note: Although disconnect() marks the end of the scope of a session, it does not free all the resources allocated by the session. To improve performance, use the terminate() method to terminate a session when you no longer need it. For a remote session, this closes the TCP connection to the xdb server. If you do not call terminate(), the TCP connection is closed only when the session is finalized after it has been garbage collected.
See also:
samples: <XhiveDir>\src\samples\manual\UseSessions.java
API documentation: com.xhive.core.interfaces.XhiveSessionIf, com.xhive.core.interfaces.XhiveDriverIf

6.5 Create libraries

In xdb, a library is a means of storing documents or other libraries. You can create as many libraries as you need to produce a hierarchy or storage architecture for your documents. The nested structure of libraries within xdb is very similar to the nested structure of directories or folders in a file system: any library can contain other libraries, and there is exactly one root library. The root library is created automatically when the database is created, but otherwise behaves like an ordinary library.

To create a library, perform the following steps:
1. Obtain a handle to the parent library. If the root library is the parent, use the getRoot() method to get a handle; otherwise, use a previously obtained variable.
2. Create the library using the createLibrary() method.
3. Give the new library a unique name using the setName() method (optional).
4. Append the new library to its parent using the appendChild() method.

The following code creates a library structure in the sample database with one top-level library called Publications. This top-level library has one sub-library called General Info.

// get a handle to the root library
XhiveLibraryIf rootLibrary = united_nations_db.getRoot();
// create a library
XhiveLibraryIf newLibA = rootLibrary.createLibrary();
// give the new library a name
newLibA.setName("Publications");
// append the new library to its parent
rootLibrary.appendChild(newLibA);
// create a library which is a sub-library of newLibA
XhiveLibraryIf newLibA1 = newLibA.createLibrary();
// give the new library a name
newLibA1.setName("General Info");
// append the new library to its parent
newLibA.appendChild(newLibA1);

The created library hierarchy looks like this:
Figure 14. Sample library hierarchy

To create a detachable library, call XhiveLibraryIf.createLibrary(int options, String segmentId) with the option flag XhiveLibraryIf.DETACHABLE_LIBRARY set. When this flag is set, you must also specify a segment id. If an unused segment with the given id already exists, that segment is used for the library. If the specified segment does not exist, the server creates one with the given name and uses it for the library. Once a segment has been used for a detachable library, it cannot be used for anything other than the descendants of that library.

Note: Although naming a library is optional, it is strongly recommended, because several access and indexing methods only work with named libraries.
See also:
samples: <XhiveDir>\src\samples\manual\CreateLibrary.java
API documentation: com.xhive.core.interfaces.XhiveDatabaseIf, com.xhive.dom.interfaces.XhiveLibraryIf, com.xhive.dom.interfaces.XhiveLibraryChildIf

6.6 Parse XML documents

To import an XML document from an external source, the XML document needs to be parsed. You can parse documents using the parseURI method of the DOM Load/Save LSParser interface. The XhiveLibraryIf interface extends DOMImplementationLS, which can be used to create LSParser and LSSerializer objects. You must create LSParsers on the library where you want to store the document. When parsing succeeds, a DOM Document is returned.

LSParser builder = rootLibrary.createLSParser();
Document firstDocument = builder.parseURI(
    new File(fileName).toURI().toString());

To store a parsed document in the database, you also need to perform an explicit appendChild(); otherwise, the document is only parsed and not stored. See the Store XML documents section for more information on storing documents. The parse documents sample uses the default LSParser configuration settings. The DOM configuration settings section provides more information on parser configuration options and how to change them.

xdb supports the DOM Load/Save specification, which provides standard ways of parsing and serializing DOMs. For more documentation, see the Load/Save specification.
See also:
samples: <XhiveDir>\src\samples\manual\ParseDocuments.java, <XhiveDir>\src\samples\manual\DOMLoadSave.java
API documentation: org.w3c.dom.as, com.xhive.dom.interfaces.XhiveLibraryIf, org.w3c.dom.ls.LSParser, org.w3c.dom.ls.LSSerializer

6.7 Validate XML documents

Validation consists of two main steps:
1. The document is validated.
2. The DTD/XML Schema information is stored in the catalog as an abstract schema.
As described in the catalog section, each time a document is validated, xdb checks whether the catalog already contains a matching abstract schema. For DTDs, it looks for one with the document's PUBLIC id. For XML Schemas, it looks for one with the matching namespace and file id. If a match exists, validation uses that abstract schema instead of the external DTD or XML Schema. Setting the LSParser configuration parameter "xhive-ignore-catalog" to true during parsing overrides this check: a new abstract schema is then created for every document that is validated.

Important: if a document refers to a DTD only by system id and you parse with validation, a DTD is stored for every document you parse. This does not happen for XML Schemas. Setting the configuration parameter "xhive-store-schema" to false lets you parse with validation without storing the schema.

The "validate" configuration parameter defaults to false, so you must set it to parse a file with validation:

LSParser parser = charterLib.createLSParser();
parser.getDomConfig().setParameter("validate", Boolean.TRUE);
Document firstDocument = parser.parseURI(new File(fileName).toURI().toString());

The parsed document contains a reference to a DTD. This DTD is stored as an ASModel in the catalog of the library on which the document is parsed. The XhiveCatalogIf interface (in the com.xhive.dom.interfaces package) contains several methods for querying and updating the stored abstract schema models:

// retrieve the catalog of the "UN Charter" library
XhiveCatalogIf unCharterCatalog = charterLib.getCatalog();
// iterate over the abstract schema models stored in that catalog
Iterator iter = unCharterCatalog.getASModels();
while (iter.hasNext()) {
    ASModel asModel = (ASModel) iter.next();
    System.out.println(" asModel = " + asModel.getLocation());
}

See also:
samples: <XhiveDir>\src\samples\manual\ParseDocumentsWithValidation.java, <XhiveDir>\src\samples\manual\ParseDocumentsWithContext.java
API documentation: com.xhive.dom.interfaces.XhiveLibraryIf, org.w3c.dom.ls.LSParser

6.8 Store XML documents

To store XML documents in an xdb database, use the appendChild() method. Before you can call appendChild(), you need a handle to the library where the document will be stored. Every database has a root library by default. To store a document in the root library of the sample database, you could use the following code:

XhiveLibraryIf rootLibrary = united_nations_db.getRoot();
rootLibrary.appendChild(firstDocument);

You can also store a document with insertBefore(), which is another standard DOM method. Its second parameter is the document before which the new document is inserted:

rootLibrary.insertBefore(secondDocument, firstDocument);

See also:
samples: <XhiveDir>\src\samples\manual\StoreDocuments.java
API documentation: org.w3c.dom.Node

6.9 DOM configuration settings

The DOM Level 3 DOMConfiguration interface is used to change configuration settings. XhiveDocumentIf, LSParser and LSSerializer objects each have their own configuration object. DOMConfiguration can set boolean parameters such as "validate", "namespaces" and "cdata-sections", string parameters such as "schema-location", and object parameters such as "error-handler".
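DOMConfiguration is a standard DOM Level 3 interface, so the same calls work on any conforming implementation. The sketch below assumes the JDK's built-in Load/Save implementation rather than xdb. It inspects an LSParser configuration, calls canSetParameter() before setParameter(), and reads values with getParameter(). The class and helper names are hypothetical. xdb's defaults differ from the specification for some parameters, as the following sections describe.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; uses the JDK's Load/Save implementation, not xdb.
class DomConfigProbe {
    // Returns the configuration object of a fresh LSParser.
    static DOMConfiguration newParserConfig() throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        return parser.getDomConfig();
    }

    // Sets a parameter only if the configuration supports that value, and reports whether it did.
    static boolean trySet(DOMConfiguration config, String name, Object value) {
        if (!config.canSetParameter(name, value)) {
            return false; // setParameter would throw a DOMException here
        }
        config.setParameter(name, value);
        return true;
    }

    public static void main(String[] args) throws Exception {
        DOMConfiguration config = newParserConfig();
        System.out.println("validate (default) = " + config.getParameter("validate"));
        System.out.println("set validate: " + trySet(config, "validate", Boolean.TRUE));
        System.out.println("validate (now) = " + config.getParameter("validate"));
        System.out.println("set unknown parameter: " + trySet(config, "no-such-parameter", Boolean.TRUE));
        // getParameterNames() lists every parameter this configuration recognizes.
        System.out.println(config.getParameterNames().getLength() + " parameter names");
    }
}
```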
The boolean and string configuration settings of a document are stored in the database. The XhiveDocumentIf method normalizeDocument() uses these settings. The complete list of options is in the Javadoc of DOMConfiguration and of getDomConfig() for LSParser and LSSerializer (all linked below). The following code sets a boolean parameter and an object parameter on the document configuration:

XhiveDocumentIf document = (XhiveDocumentIf) builder.parseURI(new File(fileName).toURI().toString());
DOMConfiguration config = document.getDomConfig();
config.setParameter("validate", Boolean.TRUE);
config.setParameter("error-handler", new SimpleDOMErrorPrinter());

In xdb, the default configuration settings follow the defaults defined by the DOM Level 3 Core specification and the Load/Save specification, except for the deviations listed below. A parameter value is supported if the parameter can be set to that value. The following code tests whether a boolean parameter value is supported by a configuration:

boolean supported = config.canSetParameter("validate", Boolean.TRUE);

The following sections describe the deviations from the default values and the additional xdb-specific settings.

Deviations from specification

According to the specification, the LSParser default value of the boolean parameter "element-content-whitespace" is Boolean.TRUE. Documents parsed with this setting and stored in xdb may contain many text nodes that hold only whitespace. These extra nodes take up disk space and can slow down queries and validation. For this reason, the xdb LSParser default for this parameter is Boolean.FALSE.

Additional parameters

xdb adds a number of parameters that can be used by the LSParser, the document, or both. For each parameter, the interfaces that support it and its default value are given in parentheses.

"xhive-psvi" (XhiveDocumentIf, LSParser; default Boolean.FALSE): Stores PSVI information on elements and attributes. Documents parsed with this parameter turned on give access to PSVI information and enable data-type support in XQuery queries.

"xhive-ignore-catalog" (XhiveDocumentIf, LSParser; default Boolean.FALSE): During validated parsing, the corresponding DTDs and XML schemas in the catalog are ignored.

"xhive-store-schema" (LSParser; default: the value of "validate", which can be overridden after "validate" has been set): During validated parsing, the corresponding DTDs or XML schemas are stored in the catalog.

"xhive-store-schema-only-internal-subset" (LSParser; default Boolean.FALSE): Modifier for "xhive-store-schema". It has an effect only when that parameter is true and DTDs are involved. Use it to store only the internal subset of the document, not any external subset.

"xhive-predefined-entities" (LSParser; default Boolean.FALSE): Stores predefined entities as entity reference nodes.

"xhive-character-references" (LSParser; default Boolean.FALSE): Stores character references as entity reference nodes. The resulting entity reference nodes have names that are not legal according to the DOM recommendation.

"xhive-raw-attributes" (LSParser, LSSerializer; default Boolean.FALSE): On parsing, stores the raw, unparsed value of each attribute. On serializing, uses the raw value of the attribute instead of the normal one.

"xhive-insert-xml-base" (LSParser, LSSerializer; default Boolean.FALSE): On parsing, sets the xml:base attribute on the top-level element read from an external parsed entity, so that Node.getBaseURI() gives correct results. On serializing (only when the "entities" parameter is false), inserts xml:base attributes when external parsed entities are expanded.

"xhive-sync-features" (LSParser; default Boolean.FALSE): Convenience setting. When it is on, the parameter settings of XhiveDocumentIf are synchronized with those of the LSParser. The parameters "xhive-psvi" and "schema-location" are always synchronized.

"xhive-schema-ids" (read-only parameter of XhiveDocumentIf, so it is not supported in the canSetParameter() sense; default null): Identifies the XML schema ids used by the document.

"xhive-node-callback" (value: an XhiveNodeCallbackIf object; default null): Lets the application call a user-defined function before Text or CDATASection nodes are constructed. Future versions may also call the function before nodes of other types are constructed. Currently, the function selects the text nodes to compress while the document is loaded into the database. Do not compress all Text or CDATASection nodes. For short text, the header of the compressed representation (at least 11 bytes; see the zlib compression algorithm for details) adds overhead, and compression also costs CPU time. The current implementation stores a Text or CDATASection node in compressed form only if the compressed text is smaller than the original.

See also:
samples: <XhiveDir>\src\samples\manual\DOMLoadSave.java, <XhiveDir>\src\samples\manual\TextCompression.java

API documentation: org.w3c.dom.DOMConfiguration, org.w3c.dom.ls.LSParser, org.w3c.dom.ls.LSSerializer, com.xhive.dom.interfaces.XhiveDocumentIf

6.10 Parse documents with context

You can use the LSParser method parseWithContext() to parse complete documents:

LSParser parser = charterLib.createLSParser();
// Passing null, null, null creates a completely empty document
Document document = charterLib.createDocument(null, null, null);
// Other actions on document...
LSInput source = charterLib.createLSInput();
source.setSystemId("file:///c:/docs/document.xml");
parser.parseWithContext(source, document, LSParser.ACTION_REPLACE);

This takes more code than a simple parse. In return, you can create the document first, define indexes on it (at the "Other actions on document" line), and only then parse the data into it. Especially for large documents, indexing the document while it is parsed can improve performance.

See also:
samples: <XhiveDir>\src\samples\manual\DOMLoadSave.java
API documentation: org.w3c.dom.ls.LSParser, org.w3c.dom.ls.LSSerializer, com.xhive.dom.interfaces.XhiveNodeIf

6.11 Store BLOBs

xdb can store Binary Large OBjects (BLOBs) in a database. Examples of BLOBs are image files (GIF, JPEG, PNG, BMP, etc.), sound files (MP3, WAV, etc.) and Microsoft Office files (DOC, XLS, PPT, etc.). Storing BLOBs alongside XML documents in xdb lets you manage all the resources of a project or product in one uniform way.

BLOBs are stored in xdb as a special type of node, the BLOB node. The createBlob() method creates a BLOB node. After creating a BLOB node, fill its content with the setContents() method.
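The BLOB API is stream-based: setContents() is given an InputStream, and reading a BLOB back also yields an InputStream. Moving BLOB data in and out of xdb is therefore ordinary java.io work. The helper below is a hypothetical sketch; copyStream is not part of the xdb API. It copies a stream through a fixed-size buffer, so memory use does not grow with the BLOB size. Try-with-resources closes both streams even if an I/O error occurs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper; works with any InputStream/OutputStream pair.
class BlobCopy {
    // Copies everything from in to out through a fixed-size buffer, then closes both streams.
    // Returns the number of bytes copied.
    static long copyStream(InputStream in, OutputStream out) throws IOException {
        try (InputStream source = in; OutputStream target = out) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int length;
            while ((length = source.read(buffer)) != -1) {
                target.write(buffer, 0, length);
                total += length;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeImage = new byte[20000]; // stands in for the bytes of an image file
        for (int i = 0; i < fakeImage.length; i++) {
            fakeImage[i] = (byte) i;
        }
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        long copied = copyStream(new ByteArrayInputStream(fakeImage), copy);
        System.out.println(copied + " bytes copied");
    }
}
```

A fixed buffer also avoids a subtle bug: a buffer sized to the BLOB would have length 0 for an empty BLOB, and read() on a zero-length array returns 0 rather than -1, so the copy loop would never end.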
To add the BLOB node to the library, use the normal methods for adding nodes, appendChild() or insertBefore():

String imgFileName = "un_flags.gif";
String imgName = "Flags of UN members";
FileInputStream imgFile = new FileInputStream(SampleProperties.baseDir + imgFileName);
// create BLOB node
XhiveBlobNodeIf img = charterLib.createBlob();
// set the contents and name of the BLOB node
img.setContents(imgFile);
img.setName(imgName);
// append the BLOB node to the library
charterLib.appendChild(img);

To retrieve a stored BLOB, use the getContents() method of XhiveBlobNodeIf, which returns an InputStream. The getSize() method returns the size of the BLOB in bytes:

// retrieve the contents of the BLOB node
InputStream in = img.getContents();
// retrieve the size of the BLOB node
long imgSize = img.getSize();

A FileOutputStream can write the contents of the BLOB node to a file:

// output the image to a new file
FileOutputStream out = new FileOutputStream(SampleProperties.baseDir + "copy_of_" + imgFileName);
// a fixed-size buffer; a buffer sized to the BLOB would loop forever on an empty BLOB
byte[] buffer = new byte[8192];
int length;
while ((length = in.read(buffer)) != -1) {

    out.write(buffer, 0, length);
}
out.close();
in.close();

When iterating over the child nodes of a library, you can distinguish BLOB nodes from other nodes by their node type, XhiveNodeIf.BLOB_NODE. The following code instead checks whether a node implements XhiveBlobNodeIf:

Node n = charterLib.getFirstChild();
while (n != null) {
    if (n instanceof XhiveBlobNodeIf) {
        System.out.println("BLOB node found: " + ((XhiveLibraryChildIf) n).getName());
    }
    n = n.getNextSibling();
}

See also:
samples: <XhiveDir>\src\samples\manual\StoreBLOBs.java
API documentation: com.xhive.dom.interfaces.XhiveBlobNodeIf, com.xhive.dom.interfaces.XhiveNodeIf

6.12 Import non-xml data

xdb can import data from non-XML files if you describe how the data fields are separated and arranged in the source file. The com.xhive.util.interfaces.XhiveSqlLoaderIf interface contains the methods for importing non-XML data. For a detailed specification of these methods, see the XhiveSqlLoaderIf Javadoc.

In this example, data in CSV format is imported into xdb and stored as an XML document. The data to import looks like this:

"Member", "Date of Admission", "Additional Notes"
"Iceland", 19 Nov. 1946, ""
"India", 30 Oct. 1945, ""
"Indonesia", 28 Sep. 1950, "By letter of 20 January..."
"Iran (Islamic Republic of)", 24 Oct. 1945, ""

You can import the data with an XhiveSqlLoaderIf object, using the loadSqlData() method:

Document un_members_doc = loader.loadSqlData(impl, FileName, ',', '\\', '"', true, "UN_members", XhiveSqlLoaderIf.IGNORE_HEADER, "member", new String[] {"name", "admission_date", "additional_note"}, new boolean[] {false, false, false});

This call specifies the following information:
- The DOM implementation used: impl.
- File name: the variable FileName.
- Separator: , (comma).
- Escape character: \ (written '\\' in Java).
- Encloser: ". The encloser is the character that marks the start and end of a string.
- Column names: true. This specifies that the first row of the data file contains column names, not data.
- Document element name: UN_members.
  This specifies the name of the root element.
- Mode: IGNORE_HEADER. This specifies that column headers are not used as element names.

- Row element name: member. This is the name of the element that encloses each row.
- Column element names: new String[] {"name", "admission_date", "additional_note"}. These are the element names used for the columns.
- Column element ignore: new boolean[] {false, false, false}. This specifies, for each column, whether to ignore it (that is, not import it).

This call produces the following XML:

<UN_members>
  <member>
    <name>Iceland</name>
    <admission_date>19 Nov. 1946</admission_date>
    <additional_note></additional_note>
  </member>
  <member>
    <name>India</name>
    <admission_date>30 Oct. 1945</admission_date>
    <additional_note></additional_note>
  </member>
  <member>
    <name>Indonesia</name>
    <admission_date>28 Sep. 1950</admission_date>
    <additional_note>By letter of 20 January...</additional_note>
  </member>
  <member>
    <name>Iran (Islamic Republic of)</name>
    <admission_date>24 Oct. 1945</admission_date>
    <additional_note></additional_note>
  </member>
</UN_members>

In a similar way, you can use the loadSqlData() method to import data from a JDBC-compliant relational database. The loadSqlData() method has several variants. See the API documentation for all the possible parameters.

See also:
samples: <XhiveDir>\src\samples\manual\StoreRelationalData.java
API documentation: com.xhive.util.interfaces.XhiveSqlLoaderIf

6.13 Create documents

XML data is stored in xdb databases as documents. The xdb API represents a document with the org.w3c.dom.Document interface. This interface contains methods for creating a new XML document, updating (parts of) XML documents, and accessing parts of the document (elements, comments, attributes, and so on). See the Retrieve XML documents and document parts section for more information.

To create a new document:
1. Obtain a handle to a DOM implementation. Because XhiveLibraryIf extends DOMImplementation, you can use the root library:
DOMImplementation impl = rootLibrary;
2.
Create a DocumentType and a Document with the createDocumentType() and createDocument() methods of org.w3c.dom.DOMImplementation:
DocumentType docType = impl.createDocumentType("typename", "publicid", "systemid");
Document eventsDocument = impl.createDocument(null, "events", docType);
No namespace URI is used, so the first parameter of createDocument() is null. The second parameter, events, is the (tag) name of the root element. The third parameter sets the document type of the new document.
3. Obtain a handle to the root element of the new document:
Element rootElement = eventsDocument.getDocumentElement();
4. You can now add document parts with standard DOM methods from the org.w3c.dom.Document interface. The most commonly used methods are:

createAttribute()
createComment()
createElement()
createTextNode()

The following code adds a comment, an element named event with the attribute occurrence, and the text value "UNICEF, Executive Board, annual session" to the new document:

// add a comment to the document before the root element
Comment comment = eventsDocument.createComment("This document contains UN events");
eventsDocument.insertBefore(comment, rootElement);
// add a new element to the root element
Element eventElement = eventsDocument.createElement("event");
rootElement.appendChild(eventElement);
// add a text value to the element
Text eventText = eventsDocument.createTextNode("UNICEF, Executive Board, annual session");
eventElement.appendChild(eventText);
// add an attribute to the element
eventElement.setAttribute("occurrence", "year");

To add an element date with the value "4-8 June, 2001" to the element event, use the following code:

// add a new element to event
Element dateElement = eventsDocument.createElement("date");
eventElement.appendChild(dateElement);
// add a text value to the date element
Text dateText = eventsDocument.createTextNode("4-8 June, 2001");
dateElement.appendChild(dateText);

Here is the resulting XML document:

<!DOCTYPE typename PUBLIC "publicid" "systemid">
<!--This document contains UN events-->
<events>
  <event occurrence="year">
    UNICEF, Executive Board, annual session
    <date>4-8 June, 2001</date>
  </event>
</events>

As usual, to actually store the document in the database, use the appendChild() or insertBefore() method:

rootLibrary.appendChild(eventsDocument);

See also:
samples: <XhiveDir>\src\samples\manual\CreateDocument.java
API documentation: org.w3c.dom.Document, org.w3c.dom.DOMImplementation

6.14 Retrieve XML documents and document parts

xdb offers several ways to retrieve documents and document parts from the database:
1. Through DOM operations.
2. By document ID.
3. By document name.
4. Using the executeFullPathXPointerQuery() method.
5. Using XQuery or XPath.

6. Using an index.

Through DOM operations

The DOM specifications include a number of methods for retrieving documents or document parts. These methods are in the org.w3c.dom.Node interface and rely on the fact that nodes in the DOM are linked as parent and child or as siblings. The following code counts the children (documents or libraries) of a library. The count stays zero if the library has no children:

int nrChildren = 0;
Node n = charterLib.getFirstChild();
while (n != null) {
    nrChildren++;
    n = n.getNextSibling();
}

The org.w3c.dom.Node interface also includes the getLastChild() and getPreviousSibling() methods, which let you retrieve the children in reverse order.

Besides retrieving documents, DOM operations can retrieve specific parts of XML documents. The available methods include getFirstChild(), getNextSibling(), and hasChildNodes() (all in org.w3c.dom.Node). The following code uses DOM operations to display all elements of an XML document:

public static void showChildren(Node theNode, int level) {
    // some output formatting
    String indentation = "";
    for (int i = 0; i < level; i++) {
        indentation += "\t";
    }
    Node n = theNode.getFirstChild();
    int j = 1;
    // as long as there are children...
    while (n != null) {
        // ...and the child is an element...
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            // ...show the element...
            System.out.println(indentation + "child " + j++ + " is a " + n.getNodeName());
            // ...and show the children of this element (recursively)
            showChildren(n, level + 1);
        }
        // get the next child
        n = n.getNextSibling();
    }
}

The org.w3c.dom.Element interface contains several methods for retrieving element attributes, including getAttribute() and getAttributeNode(). The com.xhive.dom.interfaces.XhiveNodeIf interface contains several convenience methods that extend org.w3c.dom.Node, including getFirstChildByType(), getFirstChildElementByName(), and getDescendantByKey().
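showChildren() uses only org.w3c.dom.Node, so the same traversal works on any DOM, not only on documents stored in xdb. The variant below collects the output lines instead of printing them, which makes the getFirstChild()/getNextSibling() pattern easy to check. For illustration, it runs against a document parsed with the JDK's DocumentBuilder; in xdb you would pass a stored document. ElementLister and listElements are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical variant of showChildren() that collects lines instead of printing them.
class ElementLister {
    // Adds one line per element below theNode, indented with one tab per level.
    // Children are visited with getFirstChild()/getNextSibling() rather than getChildNodes().
    static void listElements(Node theNode, int level, List<String> out) {
        StringBuilder indentation = new StringBuilder();
        for (int i = 0; i < level; i++) {
            indentation.append('\t');
        }
        int j = 1;
        for (Node n = theNode.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.add(indentation + "child " + j++ + " is a " + n.getNodeName());
                listElements(n, level + 1, out); // recurse into the element's children
            }
        }
    }

    // Parses an XML string with the JDK parser (stand-in for a document stored in xdb).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<chapter><title>Purposes</title><article><para/></article></chapter>");
        List<String> lines = new ArrayList<>();
        listElements(doc, 0, lines);
        lines.forEach(System.out::println);
    }
}
```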
The DOM specification also includes several methods for traversing documents. For more information on retrieving parts of XML documents by traversal, see the Traverse XML documents section.

Note: The getChildNodes() method (in org.w3c.dom.Node), which returns all the children of a node in a NodeList, can be slow. Iterating over child nodes with getNextSibling() is recommended instead.

By document ID

When a document is created, xdb automatically assigns it an identifier of type long. This identifier is unique within the library in which the document is stored. You can use the get() method (in com.xhive.dom.interfaces.XhiveLibraryIf) to retrieve documents by their identifier:

long anId = 10;
Node child = charterLib.get(anId);
System.out.println("document with ID = " + anId + " in \"UN Charter\" has name: " + ((XhiveLibraryChildIf) child).getName());

By document name

Every document has an identifier, but it is often more convenient to retrieve documents by name. Document names are optional. When set, a document name must be unique within the library in which the document is stored. You can retrieve documents by name with the get() method in com.xhive.dom.interfaces.XhiveLibraryIf:

String documentName = "UN Charter - Chapter 2";
Document docRetrievedByName = (Document) charterLib.get(documentName);

Using the executeFullPathXPointerQuery() method

The com.xhive.dom.interfaces.XhiveLibraryChildIf interface includes the executeFullPathXPointerQuery() method, which retrieves a document by its library path and document name. If you add an XPointer expression to the library path and document name, the method retrieves document parts instead. The syntax of the input query is:

/libname[/libname...][/docname|/id:docid[##versionid]][#xpointer_query]

where square brackets ([]) enclose optional parts and | separates alternatives.
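To make the syntax concrete, here is a hypothetical helper (not part of xdb) that only splits a query string into the parts named above: the slash-separated path, an optional ##version identifier, and an optional #xpointer_query. It does not decide whether the last path segment names a library or a document, or whether a version identifier is a label or an id. executeFullPathXPointerQuery() resolves those against the database. For simplicity, the sketch assumes that "#" occurs only as the version or XPointer delimiter and never in library or document names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, illustration-only splitter for the full-path query syntax.
class FullPathQuery {
    final boolean absolute;      // true if the query starts with "/"
    final List<String> segments; // library names, possibly ending with a docname or id:docid
    final String version;        // text after "##", or null
    final String xpointer;       // text after the single "#", or null

    private FullPathQuery(boolean absolute, List<String> segments, String version, String xpointer) {
        this.absolute = absolute;
        this.segments = segments;
        this.version = version;
        this.xpointer = xpointer;
    }

    static FullPathQuery parse(String query) {
        String path = query;
        String version = null;
        String xpointer = null;
        int versionStart = path.indexOf("##");
        // The XPointer part starts at the first "#" after any "##version" part.
        int hash = path.indexOf('#', versionStart >= 0 ? versionStart + 2 : 0);
        if (hash >= 0) {
            xpointer = path.substring(hash + 1);
            path = path.substring(0, hash);
        }
        if (versionStart >= 0) {
            version = path.substring(versionStart + 2);
            path = path.substring(0, versionStart);
        }
        List<String> segments = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                segments.add(s); // ".." is kept as a segment for relative paths
            }
        }
        return new FullPathQuery(path.startsWith("/"), segments, version, xpointer);
    }

    public static void main(String[] args) {
        FullPathQuery q = parse("/UN Charter/UN Charter - Chapter 5##1.3#xpointer(/descendant::title)");
        System.out.println(q.segments + " version=" + q.version + " xpointer=" + q.xpointer);
    }
}
```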
The following code retrieves the document UN Charter - Chapter 2 from the library UN Charter and prints its content:

Iterator docsFound = rootLibrary.executeFullPathXPointerQuery("/UN Charter/UN Charter - Chapter 2");
Document docRetrievedByFPXPQ = (Document) docsFound.next();
System.out.println(docRetrievedByFPXPQ.toString());

You can also specify a relative path:

// newLib is a sub library of "UN Charter"
// execute the FullPathXPointerQuery relative to the new sub library
docsFound = newLib.executeFullPathXPointerQuery("../UN Charter - Chapter 3");
docRetrievedByFPXPQ = (Document) docsFound.next();
System.out.println(docRetrievedByFPXPQ.toString());

When no document name is specified, the executeFullPathXPointerQuery() method returns a library:

Iterator librariesFound = rootLibrary.executeFullPathXPointerQuery("/UN Charter");
XhiveLibraryIf charterLib = (XhiveLibraryIf) librariesFound.next();

If you add an XPointer expression to the query string, executeFullPathXPointerQuery() retrieves parts of XML documents.
Here is an example that retrieves all title elements in the sample document UN Charter - Chapter 5 in the library UN Charter:

String sampleLibName = "/UN Charter";
String sampleDocName = "UN Charter - Chapter 5";
String sampleDocPath = sampleLibName + "/" + sampleDocName;
String queryXPointer = "#xpointer(/descendant::title)";
Iterator resultNodes = rootLibrary.executeFullPathXPointerQuery(sampleDocPath + queryXPointer);
while (resultNodes.hasNext()) {
    Node resultNode = (Node) resultNodes.next();
    System.out.println(resultNode.getFirstChild().getNodeValue());
}

The document name is an optional part of the input string. You could therefore use the following code to retrieve the first paragraph of UN article #68 without specifying which document contains this article:

queryXPointer = "#xpointer(/descendant::article[@number='68']/para[1])";
// note that only the library path is specified, not the document name:
resultNodes = rootLibrary.executeFullPathXPointerQuery(sampleLibName + queryXPointer);
while (resultNodes.hasNext()) {
    Node resultNode = (Node) resultNodes.next();
    System.out.println(resultNode.getFirstChild().getNodeValue());
}

You can also get a specific version of a document. If you add ## followed by a version id or label after the document name in the query string, the method fetches a copy of the specified version. You can add an XPointer expression to a query string that gets a specific version of a

document to retrieve an XML document part. You can only get versions of documents, not of libraries, and requesting a version that does not exist is an error. The version identifier in the query string is first evaluated as a label. If no version has that label, the identifier is treated as a version id.

Here is an example that retrieves all title elements in version 1.3 of the sample document UN Charter - Chapter 5 in the library UN Charter:

String sampleLibName = "/UN Charter";
String sampleDocName = "UN Charter - Chapter 5";
String sampleDocPath = sampleLibName + "/" + sampleDocName;
String versionIdentifier = "##1.3";
String queryXPointer = "#xpointer(/descendant::title)";
Iterator resultNodes = rootLibrary.executeFullPathXPointerQuery(sampleDocPath + versionIdentifier + queryXPointer);
while (resultNodes.hasNext()) {
    Node resultNode = (Node) resultNodes.next();
    System.out.println(resultNode.getFirstChild().getNodeValue());
}

For more information on specifying XPointer queries, see the Use XPath and XPointer section.

Using XQuery, XPath and XPointer

See the Use XQuery and Use XPath and XPointer sections for details.

Using a Context Conditioned Index

See the Use Context Conditioned Indexes section for details.

See also:
samples: <XhiveDir>\src\samples\manual\RetrieveDocuments.java, <XhiveDir>\src\samples\manual\RetrieveDocumentParts.java
API documentation: org.w3c.dom.Node, org.w3c.dom.Element, com.xhive.dom.interfaces.XhiveLibraryChildIf, com.xhive.dom.interfaces.XhiveLibraryIf, com.xhive.dom.interfaces.XhiveNodeIf

6.15 Use XQuery

Most of the details of using XQuery with xdb are in a separate chapter. You can execute an XQuery query with the executeXQuery(String query) method of the XhiveLibraryChildIf interface, so queries can be executed on both libraries and documents. The method returns a java.util.Iterator that represents the result sequence. Each element of the result is an instance of XhiveXQueryValueIf.
For example, you could use the following code to execute a query that retrieves all the titles of UN Charter chapters:

Iterator result = charterLib.executeXQuery("//chapter/title");
while (result.hasNext()) {
    XhiveXQueryValueIf value = (XhiveXQueryValueIf) result.next();
    // We know this query will only return nodes.
    Node node = value.asNode();
    // Do something with the node...
}

For more details on the executeXQuery variants, the specifics of the xdb XQuery engine, the extension functions xdb adds, and so on, see the separate section on XQuery.

See also:
samples: <XhiveDir>\src\samples\manual\XQuery.java
API documentation: com.xhive.dom.interfaces.XhiveLibraryChildIf

com.xhive.query.interfaces

6.16 Use XPath and XPointer

Note: The information in this section is mostly for backward compatibility. Use XQuery whenever possible.

XPath

XPath queries can be executed with the executeXPathQuery(...) methods of the XhiveNodeIf interface. Optionally, you can supply a query context that contains namespace declarations, variable and function bindings, and an absolute root. XPath query results are returned through the XhiveQueryResultIf interface. For example, the following code executes a query that retrieves all the titles of UN Charter chapters:

XhiveQueryResultIf result = charterLib.executeXPathQuery("descendant::chapter/title");

The XhiveQueryResultIf interface includes methods for extracting different types of information from a query result. A result can be a string, a boolean, a number, or a location set (a collection of nodes, points, and ranges). You can convert a query result to any of these types with the following methods:
- getStringResult() retrieves the string value of the result.
- getBooleanResult() retrieves the boolean value of the result.
- getNumberResult() retrieves the numeric value of the result.
- getLocationSetValue() retrieves the location set value of the result.
The conversion rules that apply to these methods are listed in the XPath specification.

Note: Calling getLocationSetValue() on a result that is not a location set raises an XhiveException.
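These conversions follow the XPath 1.0 data model, so you can see them in action with the JDK's standard javax.xml.xpath API. That is our own analogy, not xdb's XhiveQueryResultIf. The sketch evaluates one node-selecting expression as a string, a boolean, a number, and a node set. Under XPath 1.0 rules, the string value of a node set is the string value of its first node, and a non-empty node set converts to true. A node set's number is the number of its string value, which is why a count() expression is added to get a meaningful number. The class name and sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo of XPath 1.0 result conversions using the JDK, not xdb.
class XPathResultTypes {
    static final String CHARTER =
        "<charter><chapter><title>Purposes</title></chapter>"
        + "<chapter><title>Membership</title></chapter></charter>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluates expr once per result type: {String, Boolean, Double, Double (count), NodeList}.
    static Object[] evaluateAllTypes(Document doc, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String asString = (String) xpath.evaluate(expr, doc, XPathConstants.STRING);    // first node's text
        Boolean asBoolean = (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN); // non-empty?
        Double asNumber = (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);     // number(string(...))
        Double count = (Double) xpath.evaluate("count(" + expr + ")", doc, XPathConstants.NUMBER);
        NodeList asNodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET); // like a location set
        return new Object[] {asString, asBoolean, asNumber, count, asNodes};
    }

    public static void main(String[] args) throws Exception {
        Object[] r = evaluateAllTypes(parse(CHARTER), "descendant::chapter/title");
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3] + " "
            + ((NodeList) r[4]).getLength());
    }
}
```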
To process the results in the example (returned as a location set), you could use the following code:

// resultType holds the type of the query result (see XhiveQueryResultIf)
if (result != null) {
    if (resultType == XhiveQueryResultIf.LOCATIONSET) {
        XhiveLocationIteratorIf resultNodeSet = result.getLocationSetValue();
        XhiveLocationIf resultNode;
        while (resultNodeSet.hasNext()) {
            resultNode = resultNodeSet.next();
            if (resultNode.getLocationType() == Node.ELEMENT_NODE) {
                System.out.println(" " + ((Node) resultNode).getFirstChild().getNodeValue());
            }
        }
    }
}

XPointer

The XML Pointer Language (XPointer) is based on the XML Path Language (XPath) and supports addressing into the internal structures of XML documents. It lets you traverse a hierarchical document structure and select parts of it based on various properties. For a complete and up-to-date description of XPointer, see the XML Pointer Language (XPointer) Version 1.0 documentation on the W3C website.

XPointer queries are executed in a similar way to XPath queries:

XhiveQueryResultIf result = charterLib.executeXPointerQuery("xpointer(/chapter/article/para/list/item[1]/para)");

Unlike an XPath query, an XPointer query must produce a non-empty location set. Any other outcome raises an exception. The location set is processed as described above.

Working with namespaces

Before you can use a namespace in an XPath query, you must declare it in an XhiveXPathContextIf and supply that context when executing the query. Use the addNamespaceBinding(...) method:

XhiveXPathContextIf xpathContext = nsDocument.createXPathContext();
// Add a namespace declaration for the XSL namespace
// Note that the prefix does not have to match the prefix used in the document
xpathContext.addNamespaceBinding("ns", "
// Execute the query
XhiveQueryResultIf result = nsDocument.executeXPathQuery("descendant::ns:template/@match", xpathContext);

Note: XhiveXPathContextIf objects become invalid at the end of a transaction (commit, checkpoint, or rollback). This prevents a context from operating on invalid data, because the view of the database changes when a new transaction begins. Using a context after the end of its transaction throws an XhiveException with error code XhiveException.INVALID_CONTEXT.

In XPointer, the namespace declaration is part of the query, in the format defined by the XPointer specification:

theQuery = "xmlns(ns= xpointer(descendant::ns:template/@match)";

See also:
samples: <XhiveDir>\src\samples\manual\XPath.java, <XhiveDir>\src\samples\manual\XPathXPointerNamespaces.java
API documentation: com.xhive.query.interfaces, com.xhive.xpath.interfaces.XhiveXPathContextIf, com.xhive.dom.interfaces.XhiveNodeIf

6.17 Use indexes

Indexes can dramatically improve the performance of your queries. Detailed information about indexes is in a dedicated chapter.
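Conceptually, a value index maps each value to the elements that carry it, so a lookup no longer has to visit every node. An example is the index created later in this section with addValueIndex(idIndexName, null, null, null, "ID"). The sketch below is only an in-memory illustration of that idea over a JDK DOM. It rests on our own simplifying assumptions: a HashMap built in one pass, with no persistence and no updates. It does not show how xdb implements or maintains its indexes, and AttributeValueIndex is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical in-memory illustration of an "elements by attribute value" index.
class AttributeValueIndex {
    private final Map<String, List<Element>> byValue = new HashMap<>();

    // Builds the index in one pass: every element that has attrName is filed under its value.
    AttributeValueIndex(Document doc, String attrName) {
        addSubtree(doc.getDocumentElement(), attrName);
    }

    private void addSubtree(Element element, String attrName) {
        if (element.hasAttribute(attrName)) {
            byValue.computeIfAbsent(element.getAttribute(attrName), k -> new ArrayList<>())
                   .add(element);
        }
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                addSubtree((Element) n, attrName);
            }
        }
    }

    // A hash lookup replaces a scan over the whole document; elements keep document order.
    List<Element> lookup(String value) {
        return byValue.getOrDefault(value, Collections.emptyList());
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<people><PERSON ID='p1' FATHER='p0'/>"
            + "<PERSON ID='p2' FATHER='p1'/><PERSON ID='p3' FATHER='p1'/></people>");
        AttributeValueIndex personByFather = new AttributeValueIndex(doc, "FATHER");
        System.out.println(personByFather.lookup("p1").size() + " children of p1");
    }
}
```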
This section contains only some sample code.

Library id indexes

The following sample shows how to add a library id index:

// get the index list of the library
XhiveIndexListIf indexList = library.getIndexList();
// add a library id index to the library
String idIndexName = "Library ID Index";
XhiveIndexIf idIndex = indexList.addLibraryIdIndex(idIndexName);

Note that the root library has a library id index by default when a database is created.

Library name indexes

The following sample shows how to add a library name index:

// add a library name index to the library
String nameIndexName = "Library Name Index";
XhiveIndexIf nameIndex = indexList.addLibraryNameIndex(nameIndexName);

Note that a library name index is added by default when a library is created.

Id attribute indexes

The code below shows how to add an id attribute index to a document:

// get the index list
XhiveIndexListIf indexList = document.getIndexList();
// add the id attribute index to the index list of the document if the index is not found
String indexName = "ID Attribute Index";
XhiveIndexIf index = indexList.getIndex(indexName);
if (index == null) {
        index = indexList.addIdAttributeIndex(indexName);
    }

Id attribute indexes are used by XQuery, XPath and XPointer queries and by the getElementsById(String elementId) method. Users will rarely need to access this type of index directly. However, the following code shows how to look up the entry for a key:

    // Print the element with key "p3"
    String key = "p3";
    XhiveNodeIteratorIf nodeIter = index.getNodesByKey(key);
    if (nodeIter.hasNext()) {
        System.out.println(" Element = " + nodeIter.next());
    }

Element name indexes

The following code shows how to create a selected element name index:

    // Add a selected element name index
    String[] names = {"NAME", "BORN", "WIFE"};
    XhiveIndexIf selectedElementNameIndex = indexList.getIndex(selectedElementIndexName);
    if (selectedElementNameIndex == null) {
        indexList.addElementNameIndex(selectedElementIndexName, names);
    }

If the elements are in a namespace, the selected element names could be defined as follows:

    // Define the names of an element name index with namespaces
    String[] names = {" chapter", " owner"};

Value indexes

To create a value index, use the addValueIndex() method. The exact type of value index is determined by the parameters that are supplied to addValueIndex():

    // Create a value index that stores elements by element value
    XhiveIndexIf nameIndex = indexList.addValueIndex(nameIndexName, null, "NAME", null, null);
    // Create a value index that stores elements by attribute value
    XhiveIndexIf idIndex = indexList.addValueIndex(idIndexName, null, null, null, "ID");
    // Create a value index that stores named elements by attribute value
    XhiveIndexIf personByFatherIndex = indexList.addValueIndex(personByFatherIndexName, null, "PERSON", null, "FATHER");

Full text indexes

To create a full text index, use the addFullTextIndex() method.
The exact type of full text index is determined by the parameters that are supplied to addFullTextIndex():

    // Create a full text index on the text contents of the NAME elements
    XhiveIndexIf nameIndex = indexList.addFullTextIndex(nameIndexName, null, "NAME", null, null, null,
            XhiveIndexIf.FTI_SUPPORT_PHRASES | XhiveIndexIf.FTI_GET_ALL_TEXT);

Context conditioned indexes

A context conditioned index stores Node objects by a user-defined key. The three basic steps involved in creating a context conditioned index are:

1. Get a handle to the index list.
2. Create an XhiveCCIndexIf object.
3. Create an index node filter (an implementation of XhiveIndexNodeFilterIf) that defines which nodes are included in the index and which are not.

    // Get the index list that belongs to this library
    XhiveIndexListIf indexList = charterLib.getIndexList();
    // Create an XhiveCCIndexIf object (an existing index with the same name is removed first)
    String indexName = "Index of even numbered chapters";
    XhiveCCIndexIf index = (XhiveCCIndexIf) indexList.getIndex(indexName);
    if (index != null) {
        // Remove the existing index first
        indexList.removeIndex(index);
    }
    index = (XhiveCCIndexIf) indexList.addNodeFilterIndex("samples.manual.SampleIndexFilter", indexName);

Once the context conditioned index has been created, the indexDocument() method of XhiveCCIndexIf can be used to add context conditioned index entries for a document:

    index.indexDocument(newDocument);

The following sample shows how to use the index to retrieve the titles of all even-numbered chapters:

    Iterator keyIter = index.getKeys();
    while (keyIter.hasNext()) {
        String key = (String) keyIter.next();
        System.out.println(key);
    }

The getNodesByKey() method retrieves the nodes that the index stores under a given key:

    XhiveNodeIteratorIf nodesFound = index.getNodesByKey("amendments");
    while (nodesFound.hasNext()) {
        XhiveNodeIf docFound = (XhiveNodeIf) nodesFound.next();
        System.out.println(docFound.toXml());
    }

Creating index node filters

An index node filter determines which nodes are included in a context conditioned index. An index node filter is a Java class that implements the XhiveIndexNodeFilterIf interface.
XhiveIndexNodeFilterIf extends the org.w3c.dom.traversal.NodeFilter interface, and classes implementing XhiveIndexNodeFilterIf must define the following methods:

    short acceptNode(Node n);
    Vector getKeys(Node n);

The following code creates the index node filter used in the sample above:

    package samples.manual;

    import com.xhive.index.interfaces.*;
    import java.util.Vector;
    import org.w3c.dom.*;

    public class SampleIndexFilter implements XhiveIndexNodeFilterIf {
        Vector v = new Vector();

        public SampleIndexFilter() {
            super();
        }

        public short acceptNode(Node n) {
            // Add chapters with an even number to the index
            if (n.getNodeName().equals("chapter")
                    && Integer.parseInt(((Element) n).getAttribute("number")) % 2 == 0) {
                return FILTER_ACCEPT;
            }
            return FILTER_SKIP;
        }

        public Vector getKeys(Node n) {
            // Use the titles of the chapters as keys for this index
            v.removeAllElements();
            v.addElement(n.getFirstChild().getFirstChild().getNodeValue());
            return v;
        }
    }

The speed of indexing depends on the implementation of the filter. By returning NodeFilter.FILTER_REJECT for subtrees that are
of no interest to the index, you can improve indexing performance.

See also:
samples: <XhiveDir>\src\samples\manual\LibraryIndexes.java
<XhiveDir>\src\samples\manual\IdAttributeIndex.java
<XhiveDir>\src\samples\manual\ValueIndexIndex.java
<XhiveDir>\src\samples\manual\ElementNameIndex.java
<XhiveDir>\src\samples\manual\CCIndex.java
<XhiveDir>\src\samples\manual\SampleIndexFilter.java
<XhiveDir>\src\samples\manual\FTI.java
<XhiveDir>\src\samples\manual\IndexAdder.java
API documentation: com.xhive.index.interfaces.XhiveIndexIf
com.xhive.index.interfaces.XhiveIndexListIf
com.xhive.index.interfaces.XhiveIndexAdderIf
com.xhive.index.interfaces

6.18 Traverse XML documents

When you need to perform an action (for example, applying a change) on a number of nodes, you can use traversal. Traversal moves through a tree, processing every node it encounters. In xdb, you can traverse XML documents using:

DOM operations
DOM Traversal
Function objects

Using DOM operations

For information on using DOM operations, see the section on retrieving XML document parts through DOM operations.

Through DOM Traversal

The DOM Level 2 Traversal specification defines operations for traversing XML documents. These are defined in the org.w3c.dom.traversal package. The package defines two means of traversal:

NodeIterator
TreeWalker

The difference between the two is that a NodeIterator works on a flat representation of an XML document (or set of XML documents), while a TreeWalker uses the tree representation of an XML document. For example, the following simple XML document:

    <A>
        <B>some text</B>
        <C>
            <D>1st child of C</D>
            <E>2nd child of C</E>
        </C>
        <F>some more text</F>
    </A>

has the following flat representation:

    A B C D E F

and the following tree representation:

Figure 15.

The most important methods defined by both the NodeIterator and the TreeWalker interfaces are nextNode() and previousNode(). These methods retrieve the next and the previous node in a traversal. In addition, TreeWalker defines the following methods:

parentNode()
firstChild()
lastChild()
previousSibling()
nextSibling()

The org.w3c.dom.traversal package also defines the NodeFilter interface, which both NodeIterators and TreeWalkers use to determine which nodes are included in the traversal. The NodeFilter interface contains one method, acceptNode(), which determines whether a node is accepted or rejected. acceptNode() returns one of the following values:

FILTER_ACCEPT: the current node is included.
FILTER_SKIP: the current node is not included, but its children are still considered for inclusion.
FILTER_REJECT: the current node is not included and, for TreeWalkers, its children are not considered for inclusion either. (For NodeIterators, FILTER_REJECT has the same effect as FILTER_SKIP.)

The following example shows a NodeFilter implementation, SampleFilter, that skips all title elements and rejects all list elements:

    public class SampleFilter implements NodeFilter {
        public short acceptNode(Node n) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element elem = (Element) n;
                if (elem.getNodeName().equals("title")) {
                    return FILTER_SKIP;
                }
                if (elem.getNodeName().equals("list")) {
                    return FILTER_REJECT;
                }
            }
            return FILTER_ACCEPT;
        }
    }

To create a NodeIterator that uses this filter, you first need a handle to the DocumentTraversal implementation:

    DocumentTraversal docTraversal = XhiveDriverFactory.getDriver().getDocumentTraversal();

The createNodeIterator() method creates a NodeIterator and takes the following parameters:

root: the node at which to start the traversal.
whatToShow: a flag specifying which node types to include.
filter: the filter to use, or null if you do not want to use a filter.
entityReferenceExpansion: a flag specifying whether to expand entity reference nodes.
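The FILTER_SKIP versus FILTER_REJECT distinction can be tried outside xdb, because org.w3c.dom.traversal is a standard package. The following sketch assumes the JDK's built-in DOM implementation, whose Document objects also implement DocumentTraversal; instead of going through XhiveDriverFactory, it simply casts the parsed document. The filter mirrors SampleFilter (skip title, reject list). The class name TraversalDemo and the sample markup are illustrative, and whatToShow is set to SHOW_ELEMENT to keep the output short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;

class TraversalDemo {
    // Same rules as SampleFilter: skip <title>, reject <list>, accept everything else.
    static final NodeFilter SAMPLE_FILTER = n -> {
        switch (n.getNodeName()) {
            case "title": return NodeFilter.FILTER_SKIP;
            case "list":  return NodeFilter.FILTER_REJECT;
            default:      return NodeFilter.FILTER_ACCEPT;
        }
    };

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse XML", ex);
        }
    }

    // Flat view: every element in document order is offered to the filter.
    static List<String> withIterator(Document doc) {
        NodeIterator iter = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, SAMPLE_FILTER, false);
        List<String> names = new ArrayList<>();
        for (Node n = iter.nextNode(); n != null; n = iter.nextNode()) {
            names.add(n.getNodeName());
        }
        iter.detach();
        return names;
    }

    // Tree view: a rejected node hides its whole subtree.
    static List<String> withWalker(Document doc) {
        TreeWalker walker = ((DocumentTraversal) doc)
                .createTreeWalker(doc, NodeFilter.SHOW_ELEMENT, SAMPLE_FILTER, false);
        List<String> names = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<chapter><title>Purposes</title>"
                + "<list><item>first</item></list><para>text</para></chapter>");
        System.out.println("NodeIterator: " + withIterator(doc)); // [chapter, item, para]
        System.out.println("TreeWalker:   " + withWalker(doc));   // [chapter, para]
    }
}
```

The NodeIterator still returns the item element inside list, because a NodeIterator treats FILTER_REJECT like FILTER_SKIP, whereas the TreeWalker drops the whole list subtree.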

To traverse the first chapter of the UN Charter using a NodeIterator without a NodeFilter, you could use the following code:

    System.out.println("\n#NodeIterator without a NodeFilter:");
    NodeIterator iter = docTraversal.createNodeIterator(resultGetDocument, NodeFilter.SHOW_ALL, null, false);
    Node node;
    while ((node = iter.nextNode()) != null) {
        System.out.println(" Node Name = " + node.getNodeName());
    }

To exclude title and list elements from the traversal, instantiate SampleFilter and change the second line of the above example so that it passes the filter:

    NodeFilter sampleFilter = new SampleFilter();
    NodeIterator iter = docTraversal.createNodeIterator(resultGetDocument, NodeFilter.SHOW_ALL, sampleFilter, false);

If you use a TreeWalker (together with sampleFilter) instead of a NodeIterator to traverse the sample document, the child nodes of the list element are also skipped:

    TreeWalker walker = docTraversal.createTreeWalker(resultGetDocument, NodeFilter.SHOW_ALL, sampleFilter, false);
    while ((node = walker.nextNode()) != null) {
        System.out.println(" Node Name = " + node.getNodeName());
    }

Using function objects

Function objects are an elegant way to traverse documents because they separate the traversal methods from the processing code, so that both can be reused. A function object is a class that implements the com.xhive.dom.interfaces.XhiveFunctionIf interface. You need to define the following methods in a function object class:

isDone(): indicates whether the traversal can be terminated.
process(): contains the code for the actual processing of the node.
test(): indicates whether the node has to be processed with process().
To create a function object that only processes Element nodes that have a number attribute, you can declare the test() method as follows:

    public boolean test(Node node) {
        return node.getNodeType() == Node.ELEMENT_NODE && ((Element) node).hasAttribute("number");
    }

The test() method is called for every node and determines whether the node is processed. The process() method of the function object class defines what happens to the nodes that pass test():

    public void process(Node node) {
        String indentation = "";
        String elementName = ((Element) node).getTagName();
        if (elementName.equals("article")) {
            indentation = "  ";
        }
        System.out.println(indentation + elementName + " " + ((Element) node).getAttribute("number"));
    }

The isDone() method, which the traversal method calls automatically, determines whether the traversal continues or terminates. In the example, all nodes have to be processed, so isDone() always returns false:

    public boolean isDone(Node node) {
        return false;
    }

You could, for example, use the isDone() method to limit the number of processed nodes to a specified maximum:

    public boolean isDone(Node node) {
        return nrResults == maxNrResult; // nrResults is incremented in process()
    }

The com.xhive.dom.interfaces.XhiveNodeIf interface contains the traversal methods for function objects. The following traversal methods are available:

traverseAllNodesDocumentOrder(): traverses the current node and all descendant nodes, including attributes, in document order.
traverseAncestors(): traverses the ancestors of the current node.
traverseAttributesDocumentOrder(): traverses the attributes of the current node and its descendant nodes in document order.
traverseBreadthFirst(): traverses the current node and all its descendants in breadth-first order.
traverseChildren(): traverses the children of the current node.
traverseDocumentOrder(): traverses the current node and all its descendants in document order.
traverseReverseDocumentOrder(): traverses the current node and all its descendants in reverse document order.
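The sketch below imitates the function-object pattern with the JDK DOM, to show how one object with test(), process() and isDone() can drive different traversal orders. The FunctionObject interface, the documentOrder() and breadthFirst() helpers and NumberFinder are hypothetical stand-ins for XhiveFunctionIf, the XhiveNodeIf traversal methods and MyNumberFinder. In particular, checking isDone() before each node is an assumption; xdb's own traversal methods may handle it differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class FunctionObjectDemo {
    // Hypothetical stand-in for a function object with the same three hooks (not XhiveFunctionIf).
    interface FunctionObject {
        boolean test(Node node);    // should process() be called for this node?
        void process(Node node);    // the actual work
        boolean isDone(Node node);  // true stops the traversal
    }

    // Document order = pre-order depth-first walk. Returns false once isDone() has stopped it.
    static boolean documentOrder(Node node, FunctionObject f) {
        if (f.isDone(node)) return false;
        if (f.test(node)) f.process(node);
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!documentOrder(c, f)) return false;
        }
        return true;
    }

    // Breadth-first: visit the tree level by level using a FIFO queue.
    static void breadthFirst(Node root, FunctionObject f) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            if (f.isDone(node)) return;
            if (f.test(node)) f.process(node);
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) queue.add(c);
        }
    }

    // Collects the "number" attribute of elements that have one, up to a maximum.
    static class NumberFinder implements FunctionObject {
        final List<String> found = new ArrayList<>();
        private final int maxResults;

        NumberFinder(int maxResults) { this.maxResults = maxResults; }

        public boolean test(Node node) {
            return node.getNodeType() == Node.ELEMENT_NODE && ((Element) node).hasAttribute("number");
        }
        public void process(Node node) { found.add(((Element) node).getAttribute("number")); }
        public boolean isDone(Node node) { return found.size() == maxResults; }
    }

    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Node chapter = parseRoot("<chapter number='1'><article number='1'/>"
                + "<article number='2'><sub number='2a'/></article><article number='3'/></chapter>");
        NumberFinder inDocOrder = new NumberFinder(Integer.MAX_VALUE);
        documentOrder(chapter, inDocOrder);
        NumberFinder inBreadthFirst = new NumberFinder(Integer.MAX_VALUE);
        breadthFirst(chapter, inBreadthFirst);
        System.out.println(inDocOrder.found);     // [1, 1, 2, 2a, 3]
        System.out.println(inBreadthFirst.found); // [1, 1, 2, 3, 2a]
    }
}
```

Switching the traversal order only means calling a different helper; the function object itself does not change.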
The following example uses the function object MyNumberFinder to traverse all nodes within a library, first in document order and then breadth-first:

    MyNumberFinder numberFinder = new MyNumberFinder();
    System.out.println("# traverse the charter library with traverseDocumentOrder:");
    charterLib.traverseDocumentOrder(numberFinder);
    System.out.println("# traverse the charter library with traverseBreadthFirst:");
    charterLib.traverseBreadthFirst(numberFinder);

You can also use the same function object to traverse a single document in various ways:

    Document chapter5Document = (Document) charterLib.get("UN Charter - Chapter 5");
    System.out.println("# traverse \"UN Charter - Chapter 5\" with traverseDocumentOrder:");
    ((XhiveDocumentIf) chapter5Document).traverseDocumentOrder(numberFinder);
    System.out.println("# traverse \"UN Charter - Chapter 5\" with traverseReverseDocumentOrder:");
    ((XhiveDocumentIf) chapter5Document).traverseReverseDocumentOrder(numberFinder);

See also:
samples: <XhiveDir>\src\samples\manual\DomTraversal.java
<XhiveDir>\src\samples\manual\MyNumberFinder.java
<XhiveDir>\src\samples\manual\FunctionObjects.java
API documentation: org.w3c.dom.traversal
com.xhive.dom.interfaces.XhiveNodeIf
com.xhive.dom.interfaces.XhiveFunctionIf

6.19 Export XML documents

To serialize a DOM (for example, to a String or an output stream), you can use the toXml(...) method of XhiveNodeIf, or a DOM Load/Save LSSerializer:

    LSSerializer writer = charterLib.createLSSerializer();
    writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
    String output = writer.writeToString(firstDocument);

See also:
samples: <XhiveDir>\src\samples\manual\DOMLoadSave.java
API documentation: com.xhive.dom.interfaces.XhiveNodeIf
org.w3c.dom.ls.LSSerializer

6.20 Publish XML documents

When publishing XML documents from xdb, you can:

Publish using XSLT, through the interface XhiveTransformerIf.
Publish to PDF, through the interface XhiveFormatterIf.

Publish using XSLT

xdb contains an XSL Transformations (XSLT) engine. XSLT provides a declarative means of transforming an XML source tree into any required result tree. This enables publishing of XML content as (X)HTML, WAP/WML, PDF, or any other format. For more information on XSLT, refer to the W3C website.

To publish from xdb using XSLT, you specify an XML document and an XSL document within a Java application. Both the XML and XSL documents are stored in the database. The XSL document, which is itself an XML document, specifies the transformations that produce the desired output. After specifying which XML and XSL documents to use, call the transformToString(), transformToStream() or transformToDocument() method to produce the output. The output can be another XML document, or a document in any other format.

The following example retrieves a document from the database, parses an XSL file and then uses the transformToString() method to transform the document and return the output as a string:

    // Parse the XSL source file
    LSParser parser = charterLib.createLSParser();
    Document xslDocument = parser.parseURI(new File(baseDirectory + "/publish2html.xsl").toURI().toString());
    // Get a handle to the Transformer implementation
    XhiveTransformerIf transformer = XhiveDriverFactory.getDriver().getTransformer();
    // Retrieve the document to publish
    Document firstDocument = (Document) charterLib.get("UN Charter - Chapter 1");
    // Transform the XML document using the XSL document
    String result = transformer.transformToString(firstDocument, xslDocument);

Publish to PDF

The steps required to produce PDF output are similar to those required for other formats.
However, a different interface, com.xhive.util.interfaces.XhiveFormatterIf, is used for converting output to PDF. This interface contains the formatAsPDF() method, which can format either an XSL-FO document or an XML document as a PDF string. In the latter case, you also need to supply an XSL document.

The following example retrieves an XML document, applies the formatAsPDF() method to it and a parsed XSL document to produce a PDF string, and writes this string to a file:

    // Parse the XSL source file
    LSParser parser = charterLib.createLSParser();
    Document xslDocument = parser.parseURI(new File(baseDirectory + "/publish2pdf.xsl").toURI().toString());
    XhiveFormatterIf formatter = XhiveDriverFactory.getDriver().getFormatter();
    // Format the XML document as PDF using the parsed XSL document
    String result = formatter.formatAsPDF(rootLibrary, firstDocument, xslDocument);
    // Write the PDF file
    File pdfFile = new File(baseDirectory + "/output.pdf");
    FileWriter fw = new FileWriter(pdfFile);
    PrintWriter pw = new PrintWriter(fw);
    pw.println(result);
    pw.close(); // also closes fw

Note: When parsing XSL documents, the XhiveLibraryIf.PARSER_NAMESPACES_ENABLED option must be TRUE. Otherwise, an exception is thrown during the transformation of the XML document.
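For comparison, the standard JAXP transformation API in the JDK performs the same kind of XSLT transformation to a string. This is not XhiveTransformerIf; it is a minimal sketch that assumes both the XML document and the stylesheet are available as strings. The class name XsltToString and the sample stylesheet are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToString {
    // Applies an XSL stylesheet to an XML document and returns the result as a String.
    static String transformToString(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='html'/>"
                + "<xsl:template match='/chapter'><h1><xsl:value-of select='title'/></h1></xsl:template>"
                + "</xsl:stylesheet>";
        String xml = "<chapter><title>Purposes and Principles</title></chapter>";
        System.out.println(transformToString(xml, xsl));
    }
}
```

If you build the stylesheet as a DOM yourself and pass it as a DOMSource instead, the DocumentBuilderFactory should be namespace-aware (setNamespaceAware(true)), which is the JAXP counterpart of the namespace note above.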

See also:
samples: <XhiveDir>\src\samples\manual\publish2HTML.java
<XhiveDir>\src\samples\manual\publish2PDF.java
API documentation: com.xhive.util.interfaces.XhiveTransformerIf
com.xhive.util.interfaces.XhiveFormatterIf

6.21 Use XLink

XLink is a W3C recommendation that enables links between XML documents. You can use XLink to create simple links equivalent to <a> links in HTML, as well as more complex links. For a complete and up-to-date description of XLink, refer to the XML Linking Language (XLink) Version 1.0 documentation on the W3C website.

All information about an XLink is stored in attributes. For example, the xlink:show attribute describes how the content should be displayed, while xlink:type describes the type of the XLink. The xlink:href attribute defines the target of the link and can be specified using URIs and XPointer. For example:

    xlink:href="/UN Charter/UN Charter - Chapter 4#xpointer(/chapter[1]/title[1])"

defines the (first) title of the first chapter of document "UN Charter - Chapter 4" in library "UN Charter" as the target of this link. For more information on the syntax of such links, see the section on executeFullPathXPointerQuery().

Because XLink information is stored as attributes, it is accessible through the DOM API. However, it is much easier to use the XLink convenience methods that xdb provides through the interfaces in package com.xhive.dom.xlink.interfaces. These include methods for:

Retrieving specific attributes, for example getTitle(), getHref(), and getRole().
Retrieving all available links, for example getLinks(), getLinksByTitle() and getLinksByRole().
Retrieving all nodes linking to a specific resource, for example getNodesLinkingTo().
Retrieving the information from arcs, for example getArcs(), getFrom(), getTo(), getStartingResources() and getEndingResources().
Retrieving the information from locators, for example getLocators(), getRole() and getLabel().
Retrieving (the content of) the targeted resources, for example expandDocument() and getResourcesLinkedBy().

The sample DomLinkBase.java demonstrates several of the XLink methods available in xdb:

    // Get an Iterator with all extended links
    for (Iterator i = linkBase.getLinksBy("type", "extended"); i.hasNext();) {
        XhiveExtendedLinkIf link = (XhiveExtendedLinkIf) i.next();
        System.out.println(" Title = " + link.getTitle());
        // Get an Iterator with all arcs
        for (Iterator j = link.getArcs(); j.hasNext();) {
            XhiveArcIf arc = (XhiveArcIf) j.next();
            if (arc.getFrom().equals("Margaret Martin")) {
                for (Iterator m = arc.getEndingResources(); m.hasNext();) {
                    XhiveLocatorIf locator = (XhiveLocatorIf) m.next();
                    System.out.println(" *Checked out resource = " + locator.getLabel());
                }
            }
        }
    }

See also:
samples: <XhiveDir>\src\samples\manual\XLink.java
<XhiveDir>\src\samples\manual\DomLinkBase.java
API documentation: com.xhive.dom.xlink.interfaces
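Because XLink information consists of ordinary attributes in the namespace http://www.w3.org/1999/xlink, it can also be read with namespace-aware DOM calls. The sketch below uses only the JDK DOM, not the com.xhive.dom.xlink.interfaces API: it collects simple links and splits each xlink:href at the # into the resource path and the XPointer part, using the href from the example in this section. SimpleLinks and Link are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SimpleLinks {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // A simple link's target: the resource part of xlink:href and the fragment after '#', if any.
    static final class Link {
        final String resource;
        final String fragment; // for example "xpointer(...)", or null when href has no '#'

        Link(String resource, String fragment) {
            this.resource = resource;
            this.fragment = fragment;
        }
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getAttributeNS() sees the xlink namespace
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse XML", ex);
        }
    }

    // Collects every element with xlink:type="simple", in document order.
    static List<Link> simpleLinks(Document doc) {
        List<Link> links = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (!"simple".equals(e.getAttributeNS(XLINK_NS, "type"))) {
                continue;
            }
            String href = e.getAttributeNS(XLINK_NS, "href");
            int hash = href.indexOf('#');
            links.add(hash < 0 ? new Link(href, null)
                               : new Link(href.substring(0, hash), href.substring(hash + 1)));
        }
        return links;
    }

    public static void main(String[] args) {
        Document doc = parse("<refs xmlns:xlink='http://www.w3.org/1999/xlink'>"
                + "<ref xlink:type='simple'"
                + " xlink:href='/UN Charter/UN Charter - Chapter 4#xpointer(/chapter[1]/title[1])'/>"
                + "</refs>");
        for (Link link : simpleLinks(doc)) {
            System.out.println(link.resource + "  ->  " + link.fragment);
        }
    }
}
```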

6.22 Use abstract schemas

Abstract schema support consists of a set of interfaces (formerly a W3C draft specification) that enables the use of, for example, DTD and XML Schema information with the DOM. Abstract schemas contain interfaces for handling the schema information itself (for example, the structure of element declarations), as well as interfaces for applying schema information to the validation of DOMs. xdb supports the former W3C specification for DTDs, and offers more limited support for XML Schema (only the read-only part of the ASModel interface), with some product-specific alterations. For a detailed description of the interfaces used for abstract schema models, refer to the API documentation.

Note: Important: The W3C has cancelled the DOM Level 3 Abstract Schemas specification. We will continue to support the AS specification for manipulating the model.

In xdb, ASModels representing DTDs or XML Schemas are stored in the catalog. Usually you use the abstract schema that was retrieved when a document was parsed with validation. To use an abstract schema that is already stored in the catalog, use the setActiveASModel() method:

    Document doc = charterLib.createDocument(null, "chapter", null);
    DocumentAS document = (DocumentAS) doc;
    document.setActiveASModel(model);

You can also get information about the declarations in a DTD, using the ASModel, ASElementDecl, ASAttributeDecl, ASNotationDecl, and ASEntityDecl interfaces. This is not yet possible for XML Schemas; use the XML Schema API for that.
For example, you can use the following code to show all required attributes of an element:

    ASModel model = ((DocumentAS) document).getActiveASModel();
    ASElementDecl eltDeclaration = model.getElementDecl(element.getTagName());
    ASNamedObjectMap attributeDeclarations = eltDeclaration.getASAttributeDecls();
    System.out.println("The following attributes are all required:");
    for (int i = 0; i < attributeDeclarations.getLength(); i++) {
        ASAttributeDecl attDecl = (ASAttributeDecl) attributeDeclarations.item(i);
        // Check whether this attribute is required
        if (attDecl.getDefaultType() == ASAttributeDecl.REQUIRED) {
            String attName = attDecl.getObjectName();
            System.out.println(attName);
        }
    }

As mentioned, one way to get an ASModel into the catalog is to parse a document with validation. You can also parse a DTD or XML Schema directly into the catalog of a library (or into the catalog of the nearest ancestor library that has a catalog), with code like:

    ASDOMBuilder builder = (ASDOMBuilder) charterLib.createLSParser();
    model = builder.parseASURI(url, ASDOMBuilder.DTD_SCHEMA_TYPE);
    unCharterCatalog.setPublicId(publicId, model);

Similarly, you can serialize a schema model stored in the catalog:

    DOMASWriter writer = (DOMASWriter) charterLib.createLSSerializer();
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    writer.writeASModel(output, model);
    String schemaString = output.toString();

See also:
samples: <XhiveDir>\src\samples\manual\DOMValidation.java
API documentation: org.w3c.dom.as
com.xhive.dom.interfaces.XhiveCatalogIf

6.23 Revalidate documents with XML schema

To revalidate a document with an XML schema, use the normalizeDocument() method of XhiveDocumentIf. Settings for the normalization process are specified in the document's DOM configuration. To normalize a document with validation, the "validate" parameter must be turned on.
Because normalizeDocument() does not throw any exceptions, you are advised to set an error handler during normalization. The following code shows how to set the parameters and normalize the document:

    DOMConfiguration config = ((XhiveDocumentIf) document).getDomConfig();
    config.setParameter("validate", Boolean.TRUE);
    config.setParameter("error-handler", new SimpleDOMErrorPrinter());
    document.normalizeDocument();

During validated parsing and validated normalization, PSVI information is set if the document was parsed with the "xhive-psvi" parameter turned on. If a document was parsed without validation, PSVI information can be set at a later stage during revalidation.

To revalidate against a different schema, use the "schema-location" parameter. Before a schema location can be set, the schema type must also be set. The following code shows how to set a schema location:

    config.setParameter("schema-type", "
    config.setParameter("schema-location", "personal.xsd");

If schema-location is set, the validator first searches for a corresponding XML schema in the catalog. If no such schema is found, the validator searches for the schema in the file system. During parsing of a document, the schema location is resolved relative to the URI of the document. During validation, this URI is not available, so a full path must be set if revalidation is executed against a schema located in the file system.

See also:
samples: <XhiveDir>\src\samples\manual\ValidateDocumentWithXMLSchema.java
API documentation: com.xhive.dom.interfaces.XhiveDocumentIf
org.w3c.dom.DOMConfiguration

6.24 Access PSVI information

Schema and validation information of attributes and elements can be accessed using the Xerces XML Schema API interfaces. To access the PSVI information of an element, cast the element to an ElementPSVI object. The following code shows how to retrieve the validity of a node:

    Element = (Element) document.getElementsByTagNameNS(null, " ").item(0);
    ElementPSVI elemPSVI = (ElementPSVI) ;
    short validity = elemPSVI.getValidity();

Schema information can be traversed by retrieving element declarations, attribute declarations and type definitions.
The following code shows how to retrieve the data type name of an element:

    XSTypeDefinition elemTypeDef = elemPSVI.getTypeDefinition();
    String typeName = elemTypeDef.getName();

Another way to access data type information of elements and attributes is DOM Level 3 TypeInfo, which can be accessed through the XhiveNodeIf interface. The PSVI information can only be accessed to its full extent when the LSParser parameter "xhive-psvi" is turned on during parsing. If a document is created using XhiveLibraryIf (which serves as the DOMImplementation), the createDocumentPSVI method must be used.

See also:
samples: <XhiveDir>\src\samples\manual\PSVI.java
API documentation: org.w3c.dom.TypeInfo
com.xhive.dom.interfaces.XhiveNodeIf
org.apache.xerces.xni.psvi
org.apache.xerces.impl.xs.psvi

6.25 Manage users and groups

xdb provides features to support authorization and security requirements. Authorization in this context means managing the actions of users, and security means managing user access to information.

The xdb security model is based on three levels: user, administrator, and super user. These types of user have distinct access rights:

The user can access all data for which they have authorization.
The administrator can access all the data and user information in one database.
The super user can create and delete databases, but cannot access any of the data stored in them.

You can refine this basic level of security and authorization using the xdb authority object, represented in the xdb API by the XhiveAuthorityIf interface. Authority objects are associated with, among others, Document objects, and store the following permission settings:

Executable: whether the object can be executed. (Note: Currently, there are no objects in xdb to which this setting applies.)
Readable: whether the object can be read.
Writable: whether the object can be written to or deleted.

A document object always has exactly one attached authority object. The authority object divides users into three levels of access for the object:

owner: the user who owns this object.
group: the group that has group access rights to the object.
other: all other users.

It is logical for the owner to have higher access rights than the group, and for the group in turn to have higher access rights than other, but xdb gives you complete freedom to assign any access rights to any of these access levels.

The xdb API offers several methods for managing users, user lists, groups and group lists, including:

In interface XhiveUserListIf: addUser(), removeUser(), getUser() and hasUser().
In interface XhiveUserIf: setPassword() and isAdministrator().
In interface XhiveGroupListIf: addGroup(), removeGroup(), getGroup() and hasGroup().
In interface XhiveGroupIf: isMember() and users().
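To make the owner/group/other rules concrete, the sketch below models one authority object with readable, writable and executable flags per access level. It is hypothetical and is not the XhiveAuthorityIf API. Two rules in it are assumptions rather than documented xdb behavior: the most specific matching level decides (owner before group before other, as with Unix file permissions), and administrators bypass the check, following the statement above that an administrator can access all data in a database.

```java
import java.util.Set;

// Hypothetical sketch of owner/group/other permission checks; not the XhiveAuthorityIf API.
class AuthorityModel {
    enum Access { READ, WRITE, EXECUTE }

    // The three flags stored for one access level.
    static final class Rights {
        final boolean read, write, execute;

        Rights(boolean read, boolean write, boolean execute) {
            this.read = read;
            this.write = write;
            this.execute = execute;
        }

        boolean allows(Access access) {
            switch (access) {
                case READ:  return read;
                case WRITE: return write;
                default:    return execute;
            }
        }
    }

    static final class User {
        final String name;
        final Set<String> groups;
        final boolean administrator;

        User(String name, Set<String> groups, boolean administrator) {
            this.name = name;
            this.groups = groups;
            this.administrator = administrator;
        }
    }

    // One authority object per document: an owner, a group, and rights for owner, group and other.
    static final class Authority {
        final String owner, group;
        final Rights ownerRights, groupRights, otherRights;

        Authority(String owner, String group, Rights ownerRights, Rights groupRights, Rights otherRights) {
            this.owner = owner;
            this.group = group;
            this.ownerRights = ownerRights;
            this.groupRights = groupRights;
            this.otherRights = otherRights;
        }

        boolean isAllowed(User user, Access access) {
            if (user.administrator) return true;                                // assumption: admins bypass
            if (user.name.equals(owner)) return ownerRights.allows(access);     // assumption: owner level wins
            if (user.groups.contains(group)) return groupRights.allows(access); // then the group level
            return otherRights.allows(access);                                  // everyone else
        }
    }

    public static void main(String[] args) {
        Authority authority = new Authority("alice", "editors",
                new Rights(true, true, false), new Rights(true, false, false), new Rights(false, false, false));
        User bob = new User("bob", Set.of("editors"), false);
        System.out.println("bob may read:  " + authority.isAllowed(bob, Access.READ));
        System.out.println("bob may write: " + authority.isAllowed(bob, Access.WRITE));
    }
}
```

Because every level has its own flags, nothing in the model forces the owner to have more rights than the group or other, which matches the freedom described above.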
The following code creates a user and a group (unless they already exist) and adds the user to the group:

    XhiveUserListIf userList = united_nations_db.getUserList();
    XhiveGroupListIf groupList = united_nations_db.getGroupList();
    // Create a new user (unless it already exists)
    if (!userList.hasUser(userName)) {
        userList.addUser(userName, userPassword);
    }
    // Create a new group (unless it already exists)
    if (!groupList.hasGroup(groupName)) {
        groupList.addGroup(groupName);
    }
    // Add the user to the group (unless it is already a member)
    XhiveUserIf user = userList.getUser(userName);
    XhiveGroupIf group = groupList.getGroup(groupName);
    if (!group.isMember(user)) {
        user.addGroup(group);
    }

For a complete list of supported methods, refer to the API documentation for the interfaces mentioned.

See also:
samples: <XhiveDir>\src\samples\manual\ManageUsers.java
API documentation: com.xhive.core.interfaces.XhiveGroupIf
com.xhive.core.interfaces.XhiveGroupListIf
com.xhive.core.interfaces.XhiveUserIf
com.xhive.core.interfaces.XhiveUserListIf

6.26 Use versioning

Through the (optional) versioning feature, xdb can store different versions of a document or BLOB within the context of that document or BLOB. To enable versioning for a document, use the makeVersionable() method in XhiveLibraryChildIf:

    XhiveLibraryChildIf doc = briefingLib.get("briefing.xml");
    doc.makeVersionable();

The getXhiveVersion() method in XhiveLibraryChildIf can be used to get the last version of a document:
    // Get the last version of doc
    XhiveVersionIf lastVersion = doc.getXhiveVersion();

The XhiveVersionIf interface contains several methods that return information about a version, including getId(), getDate(), getLabel() and getCreator():

    System.out.println("id : " + lastVersion.getId());
    System.out.println("creation date: " + lastVersion.getDate().toString());
    System.out.println("label : " + lastVersion.getLabel());
    System.out.println("created by : " + lastVersion.getCreator().getName());

Working with versioned documents

Once versioning has been turned on for a document, the document has to be checked out (through the checkOut() method) before it can be updated:

    // Check out the last version
    Document lastVersionDoc = lastVersion.checkOut();

A versioned document that has not been checked out is read-only. An attempt to update a versioned document without checking it out generates a VERSION_ACCESS_DENIED exception. Each versioned document can only be checked out by one user at a time (that is, a check-out lock is set). An attempt to check out a document that has been checked out by another user generates a VERSION_CHECKED_OUT exception.

After updating a versioned document, the document has to be checked in using the checkIn() method:

    // Do some updates on lastVersionDoc
    // ...
    // Check the document in
    lastVersion.checkIn(lastVersionDoc);

The checkIn() method creates a new version and releases the check-out lock. An alternative way to release the check-out lock is the abort() method. When abort() is used, all updates on the checked-out document are discarded and no new version is created.

It is not necessary to check in the specific library child that was created by the check-out. Any document or BLOB (as applicable) can be checked in, thereby creating a new version.

Note: Versioned documents remain accessible through the normal retrieval methods for documents.
The last stored version of a versioned document is used for retrieval, traversal, querying and indexing.

Retrieving (older) versions of documents
There are several methods to retrieve (older) versions of documents:
- getXhiveVersion() in XhiveLibraryChildIf returns the last version of a document.
- getPreviousVersion() in XhiveVersionIf returns the previous version of a document.
- getNextVersions() in XhiveVersionIf returns all next versions of a document.
Alternatively, a document's versionspace can be used to access the versions of a document. Every versioned document has its own versionspace, which is created automatically when versioning is enabled. The getVersionSpace() method in XhiveVersionIf gives access to the versionspace:

    XhiveVersionSpaceIf versionspace = doc.getXhiveVersion().getVersionSpace();

To retrieve a version through a versionspace, use either the getVersionById() or the getVersionByLabel() method:

    // example of accessing a version via the getVersionById() method
    XhiveVersionIf version1_1 = versionspace.getVersionById("1.1");
    Document doc1_1 = version1_1.getAsDocument();

Branching
A version branch is a sequence of versions that has been separated from the main sequence of versions. Branches are normally used when multiple (groups of) users are working in parallel on the same documents. xdb automatically creates a new version branch if a document is checked in on a version that has one or more successors. Alternatively, a branch can be created explicitly by calling createBranch() on XhiveVersionIf.
For example: a team of analysts is working on an article entitled "The European Airline Industry, ". The first four sections of the article are introductory and generic. Section 5 deals with future scenarios for the airlines. Because of the dynamic market situation, the team decides to split into two smaller groups. One group will write section 5 assuming that the European Union allows government support to airlines, while the second group writes the same section assuming that government support will be considered illegal.
Up until section 5, the document has been checked out and checked in twice, so the latest version is 1.3. The first group ("Government Intervention") checks out this version and adds its section 5. After they check in their document, version 1.4 is created. The second group ("Free Market") checks out 1.3, adds their story of how the airline market will evolve and checks the document in. Because this check-in occurs on a version (1.3) that has a successor (1.4), a new branch of version 1.3 is created (see Figure 16). Subsequent check-ins by the "Free Market" group add further versions to this branch. The first group continues to work on the head branch; their next check-in will create version 1.5.

Figure 16. Overview of branches

xdb automatically numbers versions and branches as follows:
- versions: 1.1, 1.2, 1.3, etc.
- branches (for version 1.x): 1.x.2.1, 1.x.4.1, 1.x.6.1, etc.
- sub-branches (for branch 1.x.y.1): 1.x.y.1.2.1, 1.x.y.1.4.1, etc.

The getBranches() method in XhiveVersionSpaceIf returns the branches of a document. The XhiveBranchIf interface contains methods to retrieve the versions and documents of a branch: getVersions(), getVersionByDate() and getHeadLibraryChild().

Node level versioning
If desired, individual nodes (and all their descendants) of a document can be checked out. This could, for instance, be used to let several authors independently edit chapters of a very large document. Nodes are checked out on a particular branch of the document.
    import java.util.Collections;
    import java.util.Map;

    XhiveDocumentIf doc = ...; // some versioned document
    Node introchapter = doc.executeXQuery("/root/chapter[@id='intro']").next().asNode();
    XhiveVersionIf version = doc.getXhiveVersion();
    XhiveBranchIf branch = version.getBranch();
    // create an owner document for the copy of the chapter to be edited
    XhiveDocumentIf temporarydoc = session.createTemporaryDocument();
    XhiveCheckoutIf checkout = branch.checkOutNode(introchapter, temporarydoc);
    Node chaptercopy = checkout.getNodeCopy();
    // edit the copy of the chapter contained in chaptercopy
    // ...
    Map<Node, Node> nodes = Collections.singletonMap(introchapter, chaptercopy);
    branch.checkInNodesAndMetadata(nodes, null, 0);

Metadata fields can be checked out as well. Checking out a particular key name allows a new value for that key to be checked in.

    XhiveLibraryChildIf lc = ...; // some versioned document or BLOB
    XhiveVersionIf version = lc.getXhiveVersion();
    XhiveBranchIf branch = version.getBranch();
    branch.checkOutMetadataField("key");
    String oldvalue = lc.getMetadata().get("key"); // if required
    Map<String, String> metadata = Collections.singletonMap("key", "newvalue");
    branch.checkInNodesAndMetadata(null, metadata, 0);

Any number of nodes and metadata fields can be checked in at once, to create a single new version of the document on that branch. Because nodes are checked out on a branch instead of on a single version, checking in a node creates a new head version of the branch, regardless of what the latest version was when the node was checked out. To create a new branch, use XhiveVersionIf.createBranch().

See also:
samples: <XhiveDir>\src\samples\manual\Versioning.java

<XhiveDir>\src\samples\manual\Branching.java
API documentation:
com.xhive.dom.interfaces.XhiveLibraryChildIf
com.xhive.versioning.interfaces.XhiveBranchIf
com.xhive.versioning.interfaces.XhiveCheckoutIf
com.xhive.versioning.interfaces.XhiveVersionIf
com.xhive.versioning.interfaces.XhiveVersionSpaceIf

6.27 Using metadata on library children
All library children (libraries, documents and BLOBs) can contain metadata. Metadata consists of key-value pairs of strings. These can be used for any purpose, e.g. to store information about a document that you do not want to store in the document itself because it does not fit the DTD in use. The metadata is a java.util.Map that can only contain strings. Example:

    XhiveDocumentIf doc = ...;
    doc.getMetadata().put("author", "Jane Doe");

If the document is versioned, the metadata cannot be changed, just like the document itself. When a new version of the document is checked in, the metadata of the checked-in document is also checked in, overriding the previous version of the metadata.
For the use of metadata in XQuery queries, see the xhive:metadata function in the section on XQuery extension functions.

Using JAAS to connect to the database
xdb has its own users, groups and authority system. However, in certain cases you will want to authenticate users through external authentication systems, such as an LDAP database or operating-system authentication. xdb allows you to authenticate based on JAAS (Java Authentication and Authorization Service). JAAS offers a pluggable authentication module framework, with modules for different kinds of authentication systems. When xdb is used in combination with JAAS, the checking of username and password is handled by JAAS. All other user- and group-related functionality is still handled by xdb. To make this work, once JAAS authentication has succeeded, xdb automatically creates users and groups within xdb that match the authentication information.
To be able to use JAAS authentication with xdb, you have to implement certain Java interfaces and configure the PAM (Pluggable Authentication Module) used. Specifically, the following has to be done:
- Configure your PAM (e.g. the server to connect to) in the JAAS configuration file or in your Java code (this is module-specific).
- Enable JAAS on the xdb driver object with a call to driver.getSecurityConfig().enableJavaAuthentication(chosenconfigurationentryname, XhiveNameHandlerIf). You can only call this method for drivers that connect directly to the bootstrap file, not to an xhive://... URL (although your application can act as a server for other such drivers).
  Note: the standard dedicated server that comes with xdb cannot be configured for JAAS, so it cannot be used in combination with JAAS.
- For the XhiveNameHandlerIf argument in the above call, provide your own implementation of that interface. Its purpose is to map JAAS user/group objects (Subjects) to xdb user names and group names (Strings).
- When connecting a session, use connect(databasename, CallbackHandler) instead of the standard connect(username, password, databasename). The callback handler is a JAAS interface that passes authentication parameters to your PAM. You can provide your own implementation, but there may already be standard classes you can use, such as DialogCallbackHandler (which the Adminclient uses for JAAS authentication).
The exact details of implementing and configuring JAAS are not explained here, as they depend on the PAM and the actual authentication system used. You can start with the following sources for more information:
- The sample for connecting to an LDAP server in the <XhiveDir>\src\samples\ldap directory shows which interfaces must be implemented and which API calls must be used to enable JAAS authentication.
- Sun offers reference guides, tutorials and API documentation for JAAS.

See also:
samples:
<XhiveDir>\src\samples\ldap\SampleClient.java
<XhiveDir>\src\samples\ldap\XhiveServerWithLDAP.java
API documentation:
com.xhive.core.interfaces.XhiveDriverIf
com.xhive.core.interfaces.XhiveSessionIf
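The connect(databasename, CallbackHandler) call described above takes a standard JAAS javax.security.auth.callback.CallbackHandler. The following is a minimal sketch that is not taken from the xdb samples. It shows a non-interactive handler that answers name and password prompts from fixed values. FixedCredentialsHandler is a hypothetical name, and which callbacks are actually requested depends on the login module (PAM) you configure.

```java
import java.io.IOException;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

// Hypothetical non-interactive JAAS callback handler (sketch, not xdb code).
class FixedCredentialsHandler implements CallbackHandler {
    private final String username;
    private final char[] password;

    FixedCredentialsHandler(String username, char[] password) {
        this.username = username;
        this.password = password.clone(); // keep a private copy of the secret
    }

    @Override
    public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {
        for (Callback callback : callbacks) {
            if (callback instanceof NameCallback) {
                ((NameCallback) callback).setName(username);
            } else if (callback instanceof PasswordCallback) {
                ((PasswordCallback) callback).setPassword(password);
            } else {
                // e.g. a TextOutputCallback; an interactive handler might display it instead
                throw new UnsupportedCallbackException(callback, "unsupported callback");
            }
        }
    }
}
```

An interactive alternative such as DialogCallbackHandler can be swapped in without changing the connect call, because both implement the same JAAS interface.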

7 Using Sessions & Locking
In xdb, all access to the database must take place in transactions within sessions. Sessions can be seen as connections to a database. In a session, you can start multiple transactions. Within those transactions, you can perform actions on the database, commit changes to the database, or do rollbacks.
Associated with transactions are locks. Whenever you change data in a transaction, that data is locked. Locks are released when a transaction is committed or rolled back. When one transaction has locked data, other transactions cannot alter it at the same time.
The xdb session mechanism complies with the ACID database properties:
- Atomicity: either all actions in a transaction succeed and are made persistent in the database (after a commit()), or none of them are (after a rollback()).
- Consistency: within a transaction you get a coherent view of the database. This means that during a transaction, all read actions on a particular part of the database return the same value.
- Isolation: changes made in one transaction are not visible in concurrent transactions until the transaction is committed.
- Durability: it is guaranteed that when a commit() succeeds, the data has really been written to disk.

7.1 Working with sessions
This section describes how XhiveSessionIf objects should be used: the normal order in which the methods of a session are called, and details on the individual methods.

The session lifecycle
Sample code on how to create and work with sessions can be found in the "Creating applications" chapter. A full session lifecycle consists of at least the following operations (except for createSession(), all methods mentioned can be found in the XhiveSessionIf interface):
1. createSession() (on XhiveDriverIf)
2. join()
3. connect()
4. begin()
5. Perform database actions
6. commit() or rollback()
7. leave()
8. disconnect()
9. terminate()
When creating xdb applications, we advise you to follow this model closely.
The methods are described in detail below. We also advise you to reuse sessions as much as possible. The easiest way to do this is by creating a pool of sessions. The following code comes from the sample servlet that can be downloaded from the EMC Developer Network, and shows how to deal with sessions when a session pool is used:

    XhiveSessionIf session = sessionpool.getSession();
    try {
        session.join();
        session.begin();
        executeRequest(session);
        session.commit();
    } finally {
        // still open here only if commit() was not reached (an exception was thrown)
        if (session.isOpen()) {
            session.rollback();
        }
        if (session.isJoined()) {
            session.leave();
        }
        sessionpool.returnSession(session);
    }

As can be seen, the structure is very simple; the only conditional statements are in the finally block. The finally block ensures that the session transaction never remains open: when an exception occurs in the try block, the transaction is rolled back; otherwise the commit has succeeded. In this sample the actual database operations are performed in executeRequest(), which does not contain any session-related operations at all.

XhiveDriverIf.createSession()
Creating a session is a relatively slow and resource-intensive task. Our advice is to call this method only when necessary and to use a pool of sessions where possible. The easiest way to do this is by using a java.util.Stack onto which sessions are pushed back when they are no longer needed. To retrieve a session, you could use the following code:

    XhiveSessionIf session;
    if (!sessionstack.isEmpty()) {
        session = (XhiveSessionIf) sessionstack.pop();
    } else {
        session = xhivedriver.createSession();
    }

join()
By calling this method, you join the session to the current thread. This means that only this thread can use database objects (like documents) that belong to this session. When a session is created, it is automatically joined to the thread that creates it. You should always call this method before you start using a session in a certain thread. When working with servlets or EJBs, you will execute each request in an 'unknown' thread, but the thread does not change during the execution of the request, so join() only needs to be called once per request.
The time it takes to execute the join() method is negligible. Still, you should only call it when needed, as this can help you detect unexpected or unwanted thread changes.
Note: It is not possible to use a single session concurrently in multiple threads; use separate sessions for multithreaded use of xdb.
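The java.util.Stack approach described under XhiveDriverIf.createSession() can be wrapped in a small generic pool. The sketch below is an illustration under stated assumptions, not xdb code. SimplePool is a hypothetical name, and the Supplier stands in for xhivedriver.createSession(). An ArrayDeque guarded by synchronized blocks replaces Stack. The pool synchronizes on itself and never on a session object, which keeps it clear of session-internal synchronization.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical generic session pool (sketch); S would be XhiveSessionIf in practice.
class SimplePool<S> {
    private final Deque<S> idle = new ArrayDeque<>();
    private final Supplier<S> factory; // stands in for xhivedriver.createSession()
    private int created;

    SimplePool(Supplier<S> factory) {
        this.factory = factory;
    }

    /** Hands out an idle session if one is available, otherwise creates a new one. */
    S acquire() {
        synchronized (this) {
            S session = idle.pollFirst(); // null when no idle session is available
            if (session != null) {
                return session;
            }
            created++;
        }
        return factory.get(); // slow creation happens outside the lock
    }

    /** Puts a session back for reuse; the caller should have called leave() first. */
    synchronized void release(S session) {
        idle.addFirst(session); // LIFO order, like the java.util.Stack variant
    }

    synchronized int createdCount() {
        return created;
    }
}
```

The LIFO order hands out the most recently used session first. Idle sessions at the bottom of the deque are reused only under load.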
You should not try to serialize use of a session in multiple threads by synchronizing (using the Java synchronized keyword) on the session object. This conflicts with the internal use of session synchronization and may lead to deadlocks. If you need to serialize use of a session, synchronize on an application object instead.

leave()
By calling this method, you indicate that the session is no longer bound to this thread. It is important to call leave() when a thread no longer uses a session. Otherwise it might not be possible to terminate the session after the thread has exited and terminate() is called from another thread.

connect()
A session is connected to a database by calling connect(). The overhead of this call is relatively small, so in a multi-user setting you may choose to connect every time you start using a session.

begin()
A call to begin() starts a transaction. After the call to begin():
- All the changes you make in the database are part of the transaction, and only become visible in the database after a commit() or checkpoint().
- All the data you read from the database is in the same state as at the time of the begin().

commit()
A call to this method commits all changes you made to the database and cleans up data temporarily stored in the database (like intermediate query results). Also, all locks are released. The time taken by a commit depends on the amount of changes made in the transaction; certain removal operations in particular can be quite expensive in terms of processing time.

rollback()
When you call rollback(), all changes made in the database since begin() or the last checkpoint() are revoked. This method can have considerable overhead, as changes that were made may already have been written to disk.

checkpoint()
When checkpoint() is called, the changes made since begin() or the last checkpoint() within the transaction are committed to the database. The changes are now visible to other sessions. The transaction remains open after a checkpoint(). There are differences between checkpoint() and a commit()/begin() pair:
- The locks on the database are kept (unless you pass the option true to downgrade the locks).
- checkpoint() is faster than a commit()/begin() pair.
Like a commit(), checkpoint() deletes all temporary objects, like XQuery-constructed result nodes or elements created but not appended to their document. Therefore, even though references to existing database objects remain valid, you can no longer use these temporary objects after a checkpoint().

disconnect()
A call to disconnect() brings the session back to the state it was in when the session was first created. disconnect() has no overhead.

terminate()
When you terminate a remote session, the TCP connection to the server is closed. Terminating a local session has no effect. After a call to terminate(), the session object can no longer be used. If you do not terminate a session, it continues to use resources until it is garbage collected; the finalizer closes any TCP connection to the server. Open sessions (those within a transaction) have internal references pointing to them and will never be garbage collected. Sessions not in a transaction (whether connected or not) are garbage collected when you release all references to them.

See also:
API documentation: com.xhive.core.interfaces.XhiveSessionIf

Sessions and references to database objects
Objects in the database can only be accessed in an open transaction.
Furthermore, objects retrieved from a database in one transaction cannot be used in subsequent transactions after a commit(). So, for instance, the following is not allowed:

    session.begin();
    XhiveLibraryIf library = session.getDatabase().getRoot();
    session.commit();
    session.begin();
    System.out.println(library.getName()); // not allowed: library belongs to the previous transaction
    session.commit();

Instead, you should do:

    session.begin();
    XhiveLibraryIf library = session.getDatabase().getRoot();
    session.commit();
    session.begin();
    library = session.getDatabase().getRoot(); // retrieve the object again in the new transaction
    System.out.println(library.getName());
    session.commit();

The reason for this is that after you have called commit() and begin(), the library may have been removed in another session. If you do try to use an object in a different transaction, you will usually get an XhiveException.OBJECT_DEAD, which indicates that the object can no longer be used in your Java application.
Instead of calling commit() and begin(), you can also call checkpoint(), which makes all changes permanent but does not refresh the view on the database. Also, by default, all locks are kept. Therefore, you can keep using all references to database objects you have retrieved.

7.2 Sessions and transaction isolation
As mentioned before, xdb supports transaction isolation and atomicity. When begin() is called, the view on the database for that session is updated, meaning that changes made in other transactions within other sessions become visible. This section gives an example with three sessions and shows when a change in readwrite session A becomes visible to the other, readonly sessions (B and C). Each column shows the actions in a session, in chronological order from top to bottom.

    Session A             Session B             Session C
    begin()               begin()
    addDocument 'doc'     // 'doc' not seen     begin()
                                                // 'doc' not seen
    commit()                                    commit()
                          // 'doc' not seen
                                                begin()
                          commit()              // 'doc' is seen
                          begin()               commit()
                          // 'doc' is seen
                          commit()

As shown in the example above, the document added in the transaction within session A is not visible in any other transaction until the transaction within session A is committed. Even then, the open transaction of session B does not see the added document until its next call to begin().

7.3 Sessions and locking
To ensure that the same data cannot be modified in different transactions, data is locked during modification. These locks are placed on the related objects as soon as they are modified, and they are released after a commit() or rollback(). When an object is locked, other transactions cannot change it at the same time. By default, a transaction that attempts to use a locked object blocks until the lock is released.
Besides implicit locking of data during modification, you can also lock libraries explicitly by calling lock() on them.

What gets locked when?
An xdb database is divided into a number of so-called 'locking contexts'. Within a locking context, as soon as you make a change in any one item within that context, the entire locking context gets locked. For instance, a library is a locking context: when you add a document to that library, other concurrent sessions cannot add any other documents to that library until commit() is called.
The following table shows a number of sample actions and describes what gets locked:

    Action                                   What gets locked
    Add/remove a document (or library)       The library to which the document is added.
    Modify a document                        The document.
    Add/remove an index to/from a library    The library to which the index is added.
    Add/remove a user/group                  The database object (which means that only one
                                             concurrent thread can make these changes).
    Update a user/group                      The database object.
    Update a context conditioned index       The context conditioned index.

Documents in xdb use an internal data structure called the namebase. This structure cannot be accessed directly through the API, but its existence is relevant for the locking behaviour. The namebase maps element and attribute names to small integers that take less space and time to process. When the namebase is changed (e.g., when a document that contains new element names is parsed), it is locked. When creating a library, you can specify two options that influence the namebase locking behaviour for that library:
- XhiveLibraryIf.LOCK_WITH_PARENT: By default, each library is created with its own namebase. However, in certain cases you may want to share the namebase of the parent library, especially if you have many libraries with very little content.
- XhiveLibraryIf.DOCUMENTS_DO_NOT_LOCK_WITH_PARENT: By default, all documents in a library share the same namebase. If you have a reasonably fixed set of documents whose contents change often, it can be useful to specify DOCUMENTS_DO_NOT_LOCK_WITH_PARENT when creating the library. In that case, each document in the library gets its own private namebase.

Using more namebases improves concurrency, but adds some space and processing overhead. The default options are a sufficient compromise for most applications. We recommend using the default options unless testing conclusively shows a need to do otherwise.

See also:
API documentation: com.xhive.dom.interfaces.XhiveLibraryIf

xdb behavior when a locking conflict occurs
A transaction cannot continue in two situations: when it tries to lock a database object for writing while another transaction has read that object, or when it tries to read a database object that another transaction is currently writing. What happens at that point depends on the wait option setting and on the state of all current locks.
You can specify what xdb must do when a transaction attempts to use a locked object through the wait option setting on a session (XhiveSessionIf.setWaitOption()). By default (WAIT), the transaction blocks until the other transaction releases the lock (when that transaction ends). You can also set the wait option to not wait at all (NO_WAIT); in that case, an XhiveLockNotGrantedException is thrown as soon as a locked object is encountered. It is also possible to set a time (in milliseconds) to wait for a lock to be granted. This option is useful if you want to wait for locks, but not for concurrent transactions that take a very long time. When the wait time has passed and the other transaction still holds the lock on the database object, an XhiveLockNotGrantedException is thrown.
Regardless of the wait option used, it is possible to get lock exceptions, because deadlocks can occur.
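The three wait-option behaviours can be pictured with java.util.concurrent locks. This is a sketch of the semantics only. WaitOptions, WaitMode and acquire() are hypothetical names. In xdb the behaviour is chosen with XhiveSessionIf.setWaitOption(), and a refused lock surfaces as an XhiveLockNotGrantedException rather than a false return value.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

// Hypothetical model of WAIT / NO_WAIT / timeout semantics (not the xdb API).
class WaitOptions {
    enum WaitMode { WAIT, NO_WAIT, TIMEOUT }

    /** Returns true if the lock was granted; false models "lock not granted". */
    static boolean acquire(Lock lock, WaitMode mode, long timeoutMillis) throws InterruptedException {
        switch (mode) {
            case WAIT:
                lock.lockInterruptibly(); // block until the holder releases the lock
                return true;
            case NO_WAIT:
                return lock.tryLock(); // give up immediately if the lock is held elsewhere
            case TIMEOUT:
                return lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS); // wait at most timeoutMillis
            default:
                throw new IllegalArgumentException("unknown wait mode: " + mode);
        }
    }
}
```

As in xdb, none of these modes prevents deadlocks by itself. Code that holds one lock and waits for another can still end up waiting on a transaction that waits for it.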
Here is one common scenario:
1. Transaction A reads document X.
2. Transaction A writes document X.
3. Transaction B reads document Y.
4. Transaction A wants to write document Y -> blocks because transaction B already has a read lock.
5. Transaction B wants to read document X -> blocks because transaction A already has a write lock.
At this moment, neither transaction can continue, because each is waiting for the other to finish. In this case, xdb picks one transaction and throws an XhiveDeadlockException (a subclass of XhiveLockNotGrantedException) in it. If a rollback is then performed on that transaction, its locks are released so that the other transaction can continue.
It is possible to get a deadlock even when only one database object is involved:
1. Transaction A reads document X.
2. Transaction B reads document X.
3. Transaction A wants to write document X -> blocks because transaction B already has a read lock.
4. Transaction B wants to write document X -> blocks because transaction A already has a read lock.
In your application code, you should always take into account that locking exceptions can occur, catch them and act appropriately. Usually the appropriate thing to do is to roll back, restart the transaction and try the same operation again. Readonly transactions (see below) do not use locks, so using them can help reduce the number of potential locking conflicts.

Readonly transactions
On session objects, you can set readonly mode by calling setReadOnlyMode(). By default, transactions are readwrite, which means that they can modify database objects. Readonly transactions cannot modify database objects. The advantage of readonly transactions is that they do not take any locks, not even read locks. This improves concurrency with transactions that do modify data. To get a consistent view of the database without using locks, readonly transactions view a logical snapshot of the data at the time the transaction begins (using the begin() method of the session).
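The visibility rules in the three-session table of section 7.2 and the snapshot behaviour of readonly transactions can be illustrated with a toy in-memory model. It models only what each transaction can see. It assumes a single shared list of committed document names and does not model locks. SnapshotStore and its methods are hypothetical names, not xdb API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a transaction sees the committed state as of its begin(), plus its own changes.
class SnapshotStore {
    private List<String> committed = new ArrayList<>();

    Session newSession() {
        return new Session();
    }

    class Session {
        private List<String> view = new ArrayList<>();          // snapshot taken at begin()
        private final List<String> pending = new ArrayList<>(); // this transaction's uncommitted changes

        void begin() {
            view = new ArrayList<>(committed); // refresh the view, as begin() does
            pending.clear();
        }

        void addDocument(String name) {
            pending.add(name);
        }

        boolean sees(String name) {
            return view.contains(name) || pending.contains(name);
        }

        void commit() {
            List<String> next = new ArrayList<>(committed);
            next.addAll(pending); // publish this transaction's changes
            committed = next;
            pending.clear();
        }
    }
}
```

Replaying the table row by row against this model gives the same "not seen" and "is seen" results. Session B keeps its old view until its own next begin().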
Any modifications to the data made after the transaction began (whether committed or not) are not seen by the readonly transaction. Data pages that have been deleted (e.g., when deleting a document) are not reallocated to new documents as long as there are still open transactions that could possibly use the old data. Therefore, you should not keep transactions open indefinitely. The checkpoint() method has no effect in readonly transactions.

7.4 The xdb info command
The xdb info command prints debug information on the currently open transactions and the locks they hold. It is a simple wrapper around the XhiveDriverIf.printSessionInformation() API that prints the information to standard output.
Only sessions that are currently open (in a transaction) are shown. Sessions that are not open are not listed in any internal administration, so that they can be garbage collected when they are no longer referenced by user code.
Locks are printed as either R(ead) or W(rite) locks, followed by an internal id and a short description of the locked object. If the description says "<page not in cache>", this usually means that the locked object is new and its first page has not yet been entered in the database server cache. It can also mean that the first page of the object has been removed from the cache to make space for other pages. In that case the object could be read from disk to retrieve its name. This is not done, to avoid affecting the performance of current transactions too much.
If your application creates different sessions for different purposes, it can be useful to give the sessions a name to identify them in the output of xdb info. This name can be set by creating the sessions using XhiveDriverIf.createSession(String name).

8 Using XQuery
This chapter explains how to use XQuery in xdb.

8.1 Executing queries
You can execute an XQuery query using the executeXQuery(String query) method on the XhiveNodeIf interface. It returns an iterator that represents the result sequence. Each element of the result is an instance of XhiveXQueryValueIf. In xdb for Java 5 (JDK 1.5 and later), all iterators and results are also typed using generics (e.g. executeXQuery(String) returns an Iterator<XhiveXQueryValueIf>).

    XhiveNodeIf lc = ...;
    Iterator result = lc.executeXQuery("doc('doc')//item");
    while (result.hasNext()) {
        XhiveXQueryValueIf value = (XhiveXQueryValueIf) result.next();
        // We know this query will only return nodes.
        Node node = value.asNode();
        // Do something with the node...
    }

Within the query, the context item (accessible via ".") is initially bound to the node the query was executed on. Example (Java 1.5 syntax):

    XhiveNodeIf node = ...;
    Iterator<XhiveXQueryValueIf> result = node.executeXQuery("./author/first,./author/last,./contents");
    // an Iterator cannot be used directly in an enhanced for loop, so iterate explicitly
    while (result.hasNext()) {
        XhiveXQueryValueIf value = result.next();
        // do something with the value...
    }
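Java's enhanced for loop accepts an Iterable or an array, but not an Iterator, so a result returned by executeXQuery() cannot be looped over directly with for (... : result). If you prefer that style, a small generic adapter works on Java 8 or later. IteratorLoops is a hypothetical helper, not part of xdb.

```java
import java.util.Iterator;

// Generic helper: lets a single-use Iterator drive an enhanced for loop.
class IteratorLoops {
    static <T> Iterable<T> each(Iterator<T> iterator) {
        // The returned Iterable hands out the same iterator every time,
        // so it can only be looped over once, just like the query result itself.
        return () -> iterator;
    }
}
```

Usage: for (XhiveXQueryValueIf value : IteratorLoops.each(result)) { ... }. The loop still advances the lazily evaluated result one item at a time.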
If you only want to display the result, you can use the toString() method on the values returned, regardless of their type:

    XhiveLibraryChildIf lc = ...;
    String query = ...;
    Iterator result = lc.executeXQuery(query);
    while (result.hasNext()) {
        System.out.println(result.next().toString());
    }

If the query uses node constructors, any nodes created are created in a temporary document. If desired, these nodes can be inserted into another document using the DOM importNode() method. If you want to insert the nodes into a particular document, you can instead specify an owner document for new nodes in the call. This is more efficient than creating a temporary document and importing its nodes into the destination document.

    XhiveLibraryChildIf lc = ...;
    XhiveDocumentIf doc = ...; // create new nodes in this document
    Iterator result = lc.executeXQuery("<count>{count(//item)}</count>", doc);
    // We know this query will only return a single node.
    XhiveXQueryValueIf value = (XhiveXQueryValueIf) result.next();
    Node node = value.asNode();
    // Append it to the document element of the destination document
    doc.getDocumentElement().appendChild(node);

The query result is evaluated lazily, i.e., each time you call next() on the result iterator. Do not call result.next() after a modification (in the same session) of the searched documents or libraries; if you do, the results are undefined. If you want to use the query output to modify the searched documents, use the xhive:force() function or the update syntax (see below).

8.2 External variables and functions
XQuery provides a method for importing external values (parameters) into the query scope. To use this feature, first create a query using the method createXQuery(String query) on an XhiveNodeIf. This method parses the query, resolves module imports, etc., and then returns an object of type XhiveXQueryQueryIf that represents the query.
The XhiveXQueryQueryIf object is only valid for the current database session, so make sure you don't use it across multiple sessions.

    XhiveNodeIf node = ...;
    XhiveXQueryQueryIf query = node.createXQuery(
        "declare variable $pi external; " +
        "for $rad in doc('radius.xml')//radius " +
        "return ($rad * $rad * $pi)");
    query.setVariable("pi", Math.PI);
    Iterator<XhiveXQueryValueIf> result = query.execute();
    ...

XhiveXQueryQueryIf also provides an executeOn(XhiveNodeIf node) method, which allows you to run the same query multiple times on different context items. With executeOn(), the initial context item (".") points to the given XhiveNodeIf.
XhiveXQueryQueryIf provides methods to set variables to all kinds of primitive Java types (int, float, ...). It also understands certain Java numeric objects (Integer, Float, ... and the two unlimited-precision datatypes BigInteger and BigDecimal) and, of course, Strings and all XhiveXQueryValueIf objects. It is also possible to supply an Iterator over a sequence of such objects. This is especially handy for executing XQuery queries over the results of other queries, effectively creating a lazily executed XQuery pipeline. Iterators used by a query cannot be reused afterwards, not even by the same query, so the declared variable will be empty if this query is run again.

    XhiveNodeIf node = ...;
    Iterator<XhiveXQueryValueIf> subresult = node.executeXQuery(...);
    XhiveXQueryQueryIf query = node.createXQuery(
        "declare variable $values external; " +
        "for $value in $values " +
        "return $value + 5");
    query.setVariable("values", subresult);
    Iterator<XhiveXQueryValueIf> result = query.execute();
    ...

Custom functions can be set using the setFunction(String, XhiveXQueryExtensionFunctionIf) method. You can find out more about extension functions in the API documentation for XhiveXQueryExtensionFunctionIf. Your supplied function may or may not be called, depending on optimization, and it may be called in a different order than you expect.
XhiveNodeIf node = ...;
XhiveXQueryQueryIf query = node.createXQuery(
    "declare function circle-area($radius as xs:double) as xs:double external; " +
    "for $rad in doc('radius.xml')//radius " +
    "return circle-area($rad)");
query.setFunction("circle-area", new XhiveXQueryExtensionFunctionIf() {
    public Object[] call(Iterator<? extends XhiveXQueryValueIf>[] args) {
        // First (and only) argument: the radius value
        double rad = args[0].next().asDouble();
        double res = Math.PI * rad * rad;
        return new Object[] { Double.valueOf(res) };
    }
});
Iterator<XhiveXQueryValueIf> result = query.execute();
...

The methods for setting variables and functions all come in two flavors. One takes a single String as the name and places the variable or function in the empty namespace. The other takes two String arguments: the first is the namespace URI and the second is the local name. If you use a namespace URI, you must declare a prefix for it within the query. In xdb it is not strictly necessary to declare external variables and functions, but declaring them is recommended for compatibility with other XQuery implementations.

8.3 Accessing documents and libraries

In a query, you can refer to specific documents using the XQuery doc() function. Its argument should be a path that optionally starts with a '/' and contains names or ids of libraries or documents, separated by '/'. Paths that don't start with a '/' are evaluated relative to the context item the query is executed on, or to its parent library. If the path designates a library, the function returns a sequence of all documents in that library and its descendant libraries.
Examples:

doc("/"),                         (: All documents in the database :)
doc("/document.xml"),             (: The document "document.xml" in the root library :)
doc("/MyLibrary"),                (: All documents in "MyLibrary" :)
doc("/id:10"),                    (: The document (or all documents in the library) with id "10" :)
doc("/mylib/mydoc"),
doc("/mylib/mysublib/id:1234"),
doc("relative/path"),
doc("../steps/work/./too")

The argument does not have to be a string literal. It can be any expression that returns a string.
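The path rules above can be illustrated with a small, self-contained Java sketch. It is not xdb code. The class DocPath and its resolve method are hypothetical, and the sketch makes three assumptions: a relative path is resolved purely textually against the path of the library the query runs on, "." and ".." behave like file-system steps, and segments such as id:1234 are passed through unchanged.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper, not part of the xdb API: resolves a doc()-style path
 * against the path of the library a query runs on. "." and ".." are treated
 * like file-system steps; segments such as "id:1234" are kept unchanged.
 */
class DocPath {
    static String resolve(String basePath, String path) {
        Deque<String> segments = new ArrayDeque<>();
        // Relative paths start at the base library, absolute paths at the root
        if (!path.startsWith("/")) {
            push(segments, basePath);
        }
        push(segments, path);
        return "/" + String.join("/", segments);
    }

    private static void push(Deque<String> segments, String path) {
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // a leading "/" or a "." step does not change the position
            }
            if (segment.equals("..")) {
                segments.pollLast(); // go up one library; does nothing at the root
            } else {
                segments.addLast(segment);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("/mylib", "../steps/work/./too")); // /steps/work/too
        System.out.println(resolve("/mylib", "relative/path"));       // /mylib/relative/path
    }
}
```

In this sketch, ".." at the root is silently ignored. How xdb itself treats such a path is not stated here.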

In xdb, the collection() function is currently a synonym for doc() and works in precisely the same way. The only difference is that collection() can also be called without any parameter.

If there is a context node, an absolute path expression starts at the root of the context node, fn:root(.). Otherwise (in an outer expression), it starts at the document, or at all documents in the library, on which the executeXQuery method was called, or at the document containing an initial context node. If you want to access the calling document(s) when there is a context node, use the xhive:input() function:

/docelem[//@id="2"]
(: this is equivalent to :)
xhive:input()/docelem[root(.)//@id="2"]

8.4 XQuery Error reporting

Errors within XQuery processing are reported by throwing exceptions that extend XhiveXQueryException, which in turn extends XhiveException. All XQuery-related exceptions are in the package com.xhive.error.xquery.

XhiveXQueryErrorException - thrown on semantic errors within the query, either directly or through one of its subclasses
XhiveXQueryTypeException - on errors related to the type system, e.g. when a supplied value doesn't match the expected type
XhiveStackOverflowException - on stack overflows in user-defined functions
XhiveXQueryParseException - on parse errors in the query
XhiveXQueryFTSParseException - if a full text search query is incorrect
XhiveXQueryUnknownFunctionException - if an unknown function is used in the query
XhiveXQueryUnsupportedException - if a specific unsupported feature is used (currently only collations in order by clauses), or if the declared XQuery version is greater than 1.0
XhiveXQueryInternalException - on internal errors

8.5 Data model discrepancies

The xdb data model (the Document Object Model with extensions such as libraries) does not fit the XQuery/XPath data model perfectly. Following the Document Object Model (DOM) Level 3 XPath Specification, xdb does the following:

Entity reference nodes are treated as if they had been expanded.
Queries will never return entity reference nodes. The children of an entity reference node are treated as siblings of the entity reference node's siblings, so in effect the entity reference node is replaced by its children.

Adjacent text and CDATA section DOM nodes are treated as single XQuery text nodes. The string value of such an XQuery text node is the concatenation of the contents of the adjacent DOM text and CDATA section nodes. A query like //text() returns only the first node of each set of adjacent DOM nodes.

BLOB nodes and library nodes have no representation in XQuery/XPath, so they are invisible to queries. However, selecting a library using the doc() function returns all documents in the library, as described above.

8.6 Current implementation

xdb's XQuery implementation is based on the XQuery 1.0 W3C Recommendation (23 January 2007).

XPath axes
xdb implements the 'Full Axis Feature', which provides the axes ancestor, ancestor-or-self, following, following-sibling, preceding and preceding-sibling.

Module Imports
xdb implements the 'Module Import Feature', which provides a way of creating library functions. Modules can be imported using the following syntax:

import module namespace prefix = '...' at 'location';
(: ... use functions from the module ... :)

XQuery leaves the syntax for module locations implementation-defined. In xdb, the location part can be any valid Java URI (think file://..., etc.). It can also be a URI within the database: use xhive:// or just a relative or absolute path without a protocol identifier, following the same syntax as the doc() function. Import paths are evaluated relative to the XhiveNodeIf (or, respectively, the library containing it) on which the query is executed or created.
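The following Java sketch roughly illustrates how a module location could be dispatched. It uses java.net.URI to separate database locations from external ones. The class ModuleLocation and its Kind enum are hypothetical. The sketch assumes that any location without a scheme, or with the xhive scheme, is a database path, and that every other scheme is read from outside the database. It does not reproduce xdb's actual lookup code.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical sketch, not xdb code: decides whether a module location is
 * looked up inside the database or read from an external URI.
 */
class ModuleLocation {
    enum Kind { DATABASE, EXTERNAL }

    static Kind classify(String location) {
        URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid URI: " + location, e);
        }
        String scheme = uri.getScheme();
        // No scheme (relative or absolute path) or the xhive scheme: a database path
        if (scheme == null || scheme.equalsIgnoreCase("xhive")) {
            return Kind.DATABASE;
        }
        // Any other scheme (file:, http:, ...) points outside the database
        return Kind.EXTERNAL;
    }

    public static void main(String[] args) {
        System.out.println(classify("file:///opt/modules/geometry.xq")); // EXTERNAL
        System.out.println(classify("modules/geometry.xq"));              // DATABASE
    }
}
```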

Importing a module into the current query makes available all functions and variables that have been declared within the module namespace. Modules can themselves import other modules, as long as they don't create a loop. If a module imports another module, the functions and variables of the imported module are available only within the directly importing module. They are not propagated further. As a slight extension of the standard, xdb allows modules to declare variables and functions outside the module namespace. These can only be used within the module itself.

XQuery modules can be stored within xdb either as BLOB nodes or as XML documents. BLOB nodes must contain the module as plain UTF-8 text. XML documents can have any encoding, as long as it is correctly specified when importing them. In XML documents, the string-value of the document root element, as defined by the XQuery Data Model, is used as the query. This basically means that the query is the concatenation of all text nodes below that root element. Example:

<querymodule><![CDATA[
module namespace mns = '...';
declare variable $mns:pi := ...;
declare function mns:circle-area($r as xs:double) as xs:double {
  $r * $r * $mns:pi
};
]]></querymodule>

The name of the root element (here: <querymodule/>) is ignored, so any name will work. Feel free to choose something interesting.

Be careful about escaping XML within the query. It is recommended to use a CDATA section for the contents of the module, because otherwise embedded direct element constructors are interpreted as XML markup. For example, in the following module:

<querymodule>
module namespace foo = 'bar';
declare variable $foo:element := <element>"Hello World!"</element>;
</querymodule>

the variable $foo:element will contain just the string "Hello World!", because <element/> was not escaped.
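The effect of the string-value rule can be reproduced outside xdb with the JDK's own DOM parser, because Element.getTextContent() also concatenates all text and CDATA content below an element. The class ModuleText below is hypothetical and uses javax.xml.parsers, not the xdb API. It shows that the unescaped module loses its <element> constructor, while the CDATA version keeps it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch using the JDK DOM parser (not xdb): returns the query
 * text a module stored as an XML document would yield.
 */
class ModuleText {
    static String queryText(String moduleDocument) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(moduleDocument)));
            // Like the XDM string-value: all text and CDATA content below the root, concatenated
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("module document could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String unescaped = "<querymodule>module namespace foo = 'bar'; "
                + "declare variable $foo:element := <element>\"Hello World!\"</element>;</querymodule>";
        String escaped = "<querymodule><![CDATA[module namespace foo = 'bar'; "
                + "declare variable $foo:element := <element>\"Hello World!\"</element>;]]></querymodule>";
        System.out.println(queryText(unescaped)); // the <element> tags are gone
        System.out.println(queryText(escaped));   // the element constructor survives
    }
}
```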
XQuery modules that are stored outside of xdb (accessed via URIs) are always expected to be in UTF-8 character encoding.

Supported options and extension expressions (pragmata)
xdb supports a set of options. They can be set globally in the query prologue using the syntax:

declare option QName "Value";

If you want to set an option for only a specific part of your query, you can use an extension expression. Extension expressions use the syntax

(# QName Value #) { expr }

where the option identified by the QName is set for the whole inner expression. Quotes around Value are optional. You can set multiple options at once by writing multiple (# #) parts before the curly braces. The currently supported options are:

xhive:index-debug - lets you check whether an index is used in a query. When its value is not the empty string, the query evaluator prints a message to an output stream whenever a value is looked up in an index selected by the optimizer. You can set the output stream using XhiveXQueryQueryIf.setDebugStream(). By default, messages are printed to System.out.
xhive:queryplan-debug - works like index-debug, but shows how the query is divided into parts and in what order the parts are executed. It also shows which indexes (with what options) are looked for.
xhive:pathexpr-debug - works like index-debug, but shows which low-level expressions within the XQuery are executed, and in what order.
xhive:optimizer-debug - works like xhive:queryplan-debug, but shows how the query optimizer tries to create an index plan for a path expression. The output contains detailed information on the indexes that are considered, including those that are eliminated, and on how a query plan is constructed. The contents of the output are not currently documented.
The output of this option is very verbose, so it is recommended that you give it its own output stream, so that it does not mask the output from other debug options. Use XhiveXQueryQueryIf.setDebugStream(QName, PrintWriter) for this, passing XhiveXQueryQueryIf.XHIVE_OPTIMIZER_DEBUG as the QName.
xhive:ignore-indexes - a comma-separated list of indexes that should not be used to optimize accesses.
xhive:fts-analyzer-class - the fully qualified class name of the Analyzer to be used in an XQuery Full Text query, or in the xhive:fts function when no indexes are present. See Full text indexes for more information about full text indexes and analyzers.
xhive:fts-similarity-class - the fully qualified class name of the Similarity to be used for calculating scores in an XQuery Full Text query.
xhive:timer - sets up a timer for the encapsulated expression. The second parameter is used as a label for the timer. The timer values can be accessed via XhiveXQueryQueryIf.getTimings().
xhive:max-tail-recursion-depth - sets the maximum recursion depth for tail-recursive functions. The default is ...
xhive:implicit-timezone - sets the implicit timezone, which functions and operators on the various date types (xs:date, ...) use if no explicit timezone is given. The default implicit timezone is the local timezone.
xhive:fts-implicit-conjunction - sets the implicit conjunction operator for full text search. The only valid values are "AND" and "OR". The default implicit conjunction operator is "OR".

Examples

declare option xhive:index-debug "true";
doc("/products")//product[@product_id = "42"]

(# xhive:index-debug "true" #) { doc("/products")//product[@product_id = "42"] }

(# xhive:queryplan-debug "true" #) (# xhive:pathexpr-debug "true" #) { doc("/products")//product[@product_id = "42"] }

(# xhive:queryplan-debug "true" #) (# xhive:optimizer-debug "true" #) { doc("/products")//product[@product_id = "42"] }

declare option xhive:implicit-timezone 'PT10H';
adjust-dateTime-to-timezone(xs:dateTime("...T10:00:00-07:00"))

(# xhive:fts-implicit-conjunction 'AND' #) { doc("/manual")//paragraph[xhive:fts(., "long list of words")]/text() }

XQuery Update Syntax
xdb implements the current W3C XQuery Update Facility standard. As of xdb version 9.0, this standard is a Candidate Recommendation, so the specified behaviour might still change. Updates in XQuery are evaluated using snapshot semantics.
This means that the query is evaluated completely before the updates are applied, so update effects are not visible from within the query. This allows lazy evaluation and out-of-order execution, and it also makes updates less error-prone. A query that uses the update syntax is not evaluated lazily. Instead, it is evaluated all at once and its results are cached, so that all updates have been generated and applied by the time execute()/executeXQuery() returns. This can increase memory usage if a query executes updates and also returns a lot of values.

xdb also supports a proprietary update syntax and will continue to do so. Some operations on documents and libraries are only possible with this custom syntax. The syntax consists of the following functions:

xhive:create-library($uri as xs:string) as empty-sequence() creates a library with the given $uri. Any parent libraries in the path that are missing are created too (the function behaves like UNIX mkdir -p).

There is a set of insert functions that follow the same style:
xhive:insert-into($where as node(), $what as item()*) as empty-sequence()
xhive:insert-into-as-first($where as node(), $what as item()*) as empty-sequence()
xhive:insert-into-as-last($where as node(), $what as item()*) as empty-sequence()
xhive:insert-before($where as node(), $what as item()*) as empty-sequence()
xhive:insert-after($where as node(), $what as item()*) as empty-sequence()
Each insert function inserts the given items ($what) relative to $where:
- insert-into and insert-into-as-last, which behave identically, insert the items into $where as its last children.
- insert-into-as-first inserts them into $where as its first children.
- insert-before inserts them before $where.
- insert-after inserts them after $where.
Atomic values within $what are converted into text nodes, as in element constructors. If $where is not a single node (for example, if it is the empty sequence), an error is raised.

xhive:insert-document($uri as xs:string, $document as document-node()) as empty-sequence() inserts the given $document at $uri. If there is already a document at $uri, an error is raised.

xhive:remove($nodes as node()*) as empty-sequence() removes the given $nodes from their parents. xhive:delete() is an alias for this function.

xhive:remove-library($uri as xs:string) as empty-sequence() removes the library at $uri together with all its children.

xhive:rename-to($what as node(), $newname as xs:QName) as empty-sequence() renames the given node to $newname. This function raises an error if its target is not an attribute node, an element node, a processing instruction or a document node. In addition, processing instructions can only be renamed to unqualified local names, i.e. QNames without a namespace URI. To construct a QName, use the standard function fn:QName($uri as xs:string?, $qname as xs:string) as xs:QName.

xhive:replace-value-of($where as node(), $newcontents as item()*) as empty-sequence() removes all children of $where and replaces them with $newcontents. This is similar to xhive:delete($where/node()), xhive:insert-into($where, $newcontents).

xhive:move($target as node(), $sources as node()*) as empty-sequence() and xhive:move($target as node(), $anchor as node()?, $sources as node()*) as empty-sequence() move DOM nodes directly into a new target. By default, $sources are inserted into $target as its last children. If $anchor is given and non-empty, $sources are inserted before $anchor. This function has a potential performance advantage, because nodes do not need to be copied or imported if $target and $sources belong to the same document. More importantly, it allows you to move nodes that are covered by indexes with the UNIQUE_KEYS flag. Moving elements with a "delete node $node" statement followed by an "insert node $node into $target" statement gives a DUPLICATE_KEY exception if any of the nodes below $node are covered by a unique index.
Using xhive:move($target, $node) instead works.

Examples:

for $book in doc('bib.xml')/bib/book
where $book/@year < 1990
return xhive:remove($book)

for $book in doc('bib.xml')/bib/book, $review in doc('...')
where $review/@isbn = $book/@isbn
return xhive:insert-into($book, $review)

xhive:insert-document('/lib/newfile.xml', document { <root>...</root> })

Conditional Order By
xdb implements a proprietary extension to order by clauses that makes it possible to order the results of a query depending on user input. The XQuery FLWOR grammar is extended as follows:

OrderModifier ::= ("ascending" | "descending" | <"ascending" "if" "("> ExprSingle ")")? (<"empty" "greatest"> | <"empty" "least">)? ("collation" URILiteral)?

The expression in parentheses is evaluated to a boolean value. The order is ascending if the result is true, and descending otherwise. This syntax is useful for queries that must order data by many different columns, depending on user input. Without it, ordering tabular data with 8 columns in all ascending/descending combinations would mean writing 256 (2^8) different queries and wrapping them in if statements.

declare variable $asc_order1 external;
declare variable $asc_order2 external;
for $entry in //...
order by $entry/id ascending if ($asc_order1),
         $entry/name ascending if ($asc_order2)
return $entry

Combined with dynamic checks for QNames (e.g. $entry/*[node-name(.) eq $order_col1] instead of $entry/id), this avoids a lot of duplicated query code. Please note that this functionality is proprietary, and queries that use it are not compatible with other XQuery implementations. Note also that the ascending if (...) syntax prevents index-supported evaluation of order by expressions.

Extension functions
xdb implements a number of useful functions that are not part of the XQuery specification. They are all in the xdb extension functions namespace, which is bound to the prefix xhive by default.

xhive:fts($context as node(), $query as xs:string, $options as xs:string) as xs:boolean executes a query using xdb's full text index. The $options argument is optional. If present, it should be a string literal containing a semicolon-separated list of options. This function is described in a separate section. Example:

doc("/library")//chapter[xhive:fts(title, "venice and merchant*")]

The include-attrs option also searches the attributes of elements under $context.

xhive:evaluate($query as xs:string) as item()* takes a single string argument, evaluates it as an XQuery query and returns the result of that query. Example:

for $query in doc("/queries")//query
return <queryresult>
         { $query }
         <result>{ xhive:evaluate($query) }</result>
       </queryresult>

xhive:parse($doc-text as xs:string, $schema-hint as xs:string) as document-node() and xhive:parse($doc-text as xs:string) as document-node() take the serialized text of an XML document and parse it into a document. By default, the document is validated if it declares a schema ("validate-if-schema"). You can force validation against a particular schema by passing a $schema-hint. If the document is not well-formed, is not valid, or fails to parse for some other reason, an error is thrown.
(: parse the contents of the given element and return it as a document-node() :)
xhive:parse(/channel/item[1]/content:encoded)

(: take the given serialized document and store it in the database :)
declare variable $doc-text as xs:string external;
let $doc := xhive:parse($doc-text, "...atom.xsd")
return xhive:insert-document("feed-lib/newentry.xml", $doc)

xhive:input() as document-node() returns the calling document(s). It is useful when another context node is active. For an example, see above.

xhive:java($class as xs:string, ...) as item()* calls a function written in Java by the user. See the API docs for com.xhive.query.interfaces.XhiveXQueryExtensionFunctionIf for details.

xhive:java("com.mydomain.MyClass", $x, doc("/mydoc")//item)

xhive:get-nodes-by-key($library as xs:string, $indexname as xs:string, $key as xs:string) as node()* looks up the key in the index with the specified name on the specified library (or document) and returns the matching nodes from the index. This gives direct access to indexes. E.g.:

xhive:get-nodes-by-key("/mylibrary", "item_index", "pc34")

xhive:set-option($optionname as xs:string, $value as xs:string) as empty-sequence() sets an option as a side effect and returns the empty sequence. The possible options are described above. This function is deprecated. Use the option declaration and extension expression syntax described above instead.

xhive:document-name($document as document-node()) returns the name of the document as set by XhiveLibraryChildIf.setName(String name). If the node passed is not a document, or is a document without a name, this function returns the empty sequence.

xhive:force($items as item()*) as item()* forces the immediate evaluation of its argument. If you use this function as the outermost expression of your query, the query is evaluated immediately and the result is stored internally. You can use this to modify the data based on the query result, which is normally impossible because of lazy evaluation.

String query = "xhive:force(doc('doc')//elem)";
// lc is the library or document the query is run on (declared elsewhere)
Iterator<XhiveXQueryValueIf> result = lc.executeXQuery(query);
while (result.hasNext()) {
    XhiveXQueryValueIf value = result.next();
    // We know this query will only return nodes.
    Node node = value.asNode();
    // Remove this node; safe because xhive:force has already evaluated the whole result
    node.getParentNode().removeChild(node);
}

xhive:version($document as document-node()*, $version as xs:string) as document-node()* returns a sequence of documents that represent the contents of specific versions of a set of input documents. For nodes that are not documents or are not versioned, and for non-existing versions, an empty sequence is returned. The version argument is first evaluated as a label. If no version with that label is found, the argument is evaluated as a version id. So if version 1.4 has the label "1.2", a query for version 1.2 yields version 1.4. You can specify a set of documents as the first argument. This is useful, for instance, in a query like xhive:version(doc("/versioned-lib"), "release2"), which gets all document versions in a library that have the label release2.

xhive:version-property($document as document-node()*, $version as xs:string, $property as xs:string) as xs:string* returns the value of a specified attribute of a version. This function acts mostly like the xhive:version function, but the result is a sequence of strings. The property argument must be one of the following:
- date: the date on which the version was created, in the format yyyy-mm-ddThh:mm
- creator: the name of the user who created the version
- checked-out-by: the name of the user who has checked out this version of the document, if it is checked out

xhive:version-ids($document as node()*[, $branchversion as xs:string]) as xs:string* returns the ids of versions of a document. You can pass multiple documents to this function, but you will usually pass only one.
If no second argument is specified, all version ids of the version space are returned as a sequence of strings. Passing a second argument gives more detailed information:
If you specify the id of a branch, the result contains only the version ids that are part of that branch, including all version ids the branch shares with other branches.
If you pass "1" as the argument, the result contains the ids of all branches in the version space of the document argument.
If you specify the id of a version, the result contains the labels set on that version, if there are any.
For non-versioned documents, or when the branchversion argument refers to a non-existing branch or version, the result is the empty sequence.

As an example of these versioning functions, the following query gets all the different titles of a book document across all versions created before 2003:

distinct-values(
  let $doc := doc("/version-lib/book.xml")
  for $version in xhive:version-ids($doc)
  where xhive:version-property($doc, $version, "date") < "..."
  return xhive:version($doc, $version)/book/title
)

xhive:metadata($document as document-node(), $key as xs:string?) as xdt:untypedAtomic* retrieves the value that belongs to $key in the metadata of the document $document. If the key is the empty sequence, the result is a sequence with the values of all metadata fields.

xhive:metadata(doc("/mydoc"), "author")

xhive:highlight($arg as item()*, ...) as item()* calls the extension function set on the query with the XhiveXQueryQueryIf.setHighlighter(highlighter) API. The first argument to the extension function is a sequence of strings consisting of the tokens used by any full text search in the current FLWOR expression. Any arguments passed to the XQuery function are passed to the highlighter function as further arguments.
For example, if the query is

for $elem in //para
where $elem ftcontains "Rotterdam"
return xhive:highlight($elem)

the highlighter function is called with two arguments: "rotterdam" and the matching para element. The tokens passed have already been processed by the analyzer used for the query. If the query strings contain wildcards, the wildcards have been replaced by the characters specified in the XhiveFtsUtilIf interface. You can use the XhiveFtsUtilIf.compilePattern method to match query strings with wildcards against terms in the text.

Namespace declarations

For your convenience, several namespace prefixes are bound by default. You can override these in the query prologue. In addition to those specified in the XQuery working draft (xml, xs, xsd, xsi, local), the following two prefixes are predefined:
fn is bound to the XQuery functions namespace. This is also the default function namespace, so you can use standard XQuery functions without a prefix.
xhive is bound to the xdb extension functions namespace.

Collation support
Several XQuery functions take a collation argument. The possible values of this argument are implementation-defined. In xdb, a collation consists of a locale and an optional strength, separated by a slash. The possible locales are those supported by the java.text.Collator class. You can list them with the following code fragment:

java.util.Locale[] locales = java.text.Collator.getAvailableLocales();
for (int i = 0; i < locales.length; ++i) {
    System.out.println(locales[i].toString());
}

The strength argument is a number from 1 to 4, corresponding to the collator strengths PRIMARY, SECONDARY, TERTIARY and IDENTICAL. If no strength is specified, the default strength of the java.text.Collator instance is used. Example usages:

compare($string1, $string2, xs:anyURI("en")),
starts-with($string, "abc", xs:anyURI("nl_NL")),
ends-with($string, "xyz", xs:anyURI("no/1")),
substring-after($string1, $string2, xs:anyURI("fr_CA/2"))

If you do not specify a collation, the implementation compares strings with the normal Java String class methods, which effectively compares UTF-16 code units.
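The following Java sketch shows one way a collation string of the form locale/strength could be mapped onto java.text.Collator. It is an illustration, not the xdb implementation. The class CollationSpec is hypothetical, and converting "fr_CA" to the language tag "fr-CA" with Locale.forLanguageTag is an assumption about how locale names are parsed.

```java
import java.text.Collator;
import java.util.Locale;

/**
 * Hypothetical sketch (not xdb code): turns "locale[/strength]", e.g. "fr_CA/2",
 * into a Collator. Strength 1-4 maps to PRIMARY, SECONDARY, TERTIARY, IDENTICAL.
 */
class CollationSpec {
    private static final int[] STRENGTHS = {
        Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
    };

    static Collator toCollator(String spec) {
        String[] parts = spec.split("/", 2);
        // Assumption: "fr_CA" corresponds to the BCP 47 tag "fr-CA"
        Locale locale = Locale.forLanguageTag(parts[0].replace('_', '-'));
        Collator collator = Collator.getInstance(locale); // returns a fresh instance
        if (parts.length == 2) {
            int strength = Integer.parseInt(parts[1]);
            if (strength < 1 || strength > 4) {
                throw new IllegalArgumentException("strength must be 1-4: " + spec);
            }
            collator.setStrength(STRENGTHS[strength - 1]);
        }
        // Without an explicit strength, the Collator's default strength is kept
        return collator;
    }

    public static void main(String[] args) {
        System.out.println(toCollator("en/1").compare("a", "A")); // 0: case is ignored at PRIMARY
        System.out.println(toCollator("en/3").compare("a", "A")); // non-zero at TERTIARY
    }
}
```

With strength 1 (PRIMARY), the collator ignores case and accent differences such as "e" versus "é". With strength 3 (TERTIARY), both kinds of difference are significant.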
Note that functions like substring and string-length always count Unicode code points. The result may therefore differ from a count of Java characters.

Limitations
We do not implement the complete specification. In particular, the following items are not supported:
Collations in "order by" specifications
The XML base-uri declaration and the construction declaration
Type names in element tests and attribute tests

Support for the optional XQuery features:
Full Axis Feature: supported
Schema Import Feature: not supported
Static Typing Feature: not supported
Module Feature: supported
Static Typing Extensions: not supported

As in XPath 1.0, a path step does not set the context size and position. Only predicates do so. You can work around this by using a for iterator with a positional variable (for $x at $y in ...) together with the count() function.

8.7 Using indexes in XQuery
xdb contains several kinds of indexes:
Library name indexes and library id indexes. If available, the XQuery doc() function always uses them.
Id attribute indexes on documents are always used by the XQuery id() function. Id attribute indexes on libraries are never used by queries, except when explicitly used with the xhive:get-nodes-by-key() extension function.
Value indexes and element name indexes are used when the optimizer can determine that they apply to a specific expression in the query. See below for details.
Context conditioned indexes can only be used with the xhive:get-nodes-by-key() extension function.

Full text indexes can be used in XQuery through the xhive:fts function. See the "Full text searching" paragraph for more details.
External full text indexes are never used by queries.

Value and element name indexes
Value indexes and element name indexes are used for path expressions that meet these conditions:
The expression starts with a call to the XQuery doc() function that specifies the document or library containing the index, or one of its ancestor libraries. If the specified library does not contain a useful index but one or more descendant libraries do, the evaluator uses the index on those descendant libraries and evaluates the query in the other libraries by brute-force search.
The expression contains only "downward" steps, that is, child, descendant or descendant-or-self steps, including abbreviated forms like //. For element name indexes, the first step must be a descendant(-or-self) step.
The expression contains at most one predicate per step. If a step has multiple predicates, rewrite them to use and. For instance, replace //parent[@color = "red"][elem[@attr = "green"]] with //parent[@color = "red" and elem[@attr = "green"]] so that the query can use value indexes on parent/@color and elem/@attr.
For value indexes, the expression contains a predicate or where clause that checks the indexed value using a value or general comparison. The comparison must be against an expression that is constant for this path expression and whose type corresponds to the type of the value index.
For element name indexes, the expression contains a step with an indexed element name.

Assume there is a value index (with the default type "STRING") on attribute attr of element elem on the root library. Here are some examples:

(: Use index without further checks :)
doc("/")//elem[@attr = $var]
(: Ditto :)
for $x in doc("/")//elem where $x/@attr eq func(2) return ...
(: Use index and check parent of indexed element :)
doc("/")//parent[@color = "red"]/elem[@attr = "green"]
(: Use index and check ancestors of indexed element :)
doc("/")/parent[@color = "red"]//elem[@attr = "green"]
(: Use index and look up children :)
doc("/")//elem[@attr eq substring($str, 1, 3)]/name
(: Use index and return parent of indexed node :)
doc("/")//parent[elem/@attr = "black"]
(: Use index and return all ancestors of indexed nodes called "parent" :)
doc("/")//parent[descendant::elem/@attr = "black"]

With an element value index, the predicate or where clause must check the contents of the element for the index to be used. Assume an element value index on the elem element in the root library:

(: Use the index directly :)
doc("/")//elem[. eq "red"]
(: In a FLWOR expression :)
for $x in doc("/")//elem where $x = "green" return $x
(: Uses text() instead of context node :)
doc("/")//person/elem[text() = "yellow"]
(: Use the index and return the parent of the indexed node :)
doc("/")//person[elem = "black"]
(: Or equivalently :)
for $x in doc("/")//person where $x/elem = "black" return $x

See also the section on path indexes for more examples of queries that use indexes.

Range queries
Range queries constrain data to a range of values instead of a single value. If the optimizer finds a predicate or where clause that uses both "less than (or equal)" and "greater than (or equal)" on the same node, it can use an index, if available, to find the values in the requested range. Only indexes with the XhiveIndexIf.KEY_SORTED property can be used for range queries. Example:

doc('/')//book[@author >= "A" and @author < "B"]

If there is an index with sorted keys on book/@author, the optimizer scans the index from A to B to find the result of this expression. Make sure the optimizer can be certain that both conditions refer to the same node. In this example:

(: Cannot use range query on author index :)
doc('/')//book[author >= "A" and author < "B"]

the optimizer cannot use the author index in a range query. A book may have one author that satisfies the first condition and another author that satisfies the second. To make both conditions refer to the same author and allow use of the index, write the query like this:

(: Can use range query on author index :)
doc('/')//book[author[. >= "A" and . < "B"]]
(: Can use path index book[author + title] :)
doc('/')//book[author = "Asimov" and title[. >= "Robot" and . < "Second"]]

Indexes on metadata
To use an index created on the library mylib for the metadata field author, use:

doc("/mylib")[xhive:metadata(., "author") = "Jane Doe"]

This looks up Jane Doe in the author index and returns all documents found. The optimizer can only use indexes for expressions where the metadata key is a string literal or the literal empty sequence (for an index on all metadata fields). It cannot use them when the key is a generic expression.
With full text indexes, you need an expression like:

doc("/mylib")[xhive:fts(xhive:metadata(., "p"), "XQuery")]

Multiple indexes
If different parts of a query can use different indexes, each part uses its own index. For example, suppose you have an index on attribute x of element x, and an index on attribute y of element y. Both will be used in

for $x in doc("/")//x[@x = "x"]
for $y in doc("/")//y[@y = $x/@yref]
return ...
(: or, equivalently :)
for $x in doc("/")//x
for $y in doc("/")//y
where $x/@x = "x" and $y/@y = $x/@yref
return ...

If both indexes can be used for a single path expression, the optimizer creates a query plan with an intersection. Take a query like

doc("/")//x[@x="x"]/y[@y="y"]

The optimizer first looks up "x" in the index for x/@x and stores the result in a temporary set. It then looks up "y" in the index for y/@y and checks whether the parents of the indexed elements are present in the temporary set. If so, the element is put in the result set. This is somewhat like evaluating the query

let $x := xhive:get-nodes-by-key("/", "x/@x", "x")
return xhive:get-nodes-by-key("/", "y/@y", "y")[parent::x intersect $x]

With metadata indexes, you can create similar combinations:

doc("/")[xhive:metadata(., "status") = "ready" and .//x/@x = "x"]
doc("/")[xhive:metadata(., "author") = "PP"]//chapter[xhive:fts(title, "xdb")]

Using indexes to enhance order by performance
Indexes with the property XhiveIndexIf.KEY_SORTED can be used to speed up queries that use an order by clause. This is only possible if:
- the individual order specs (the expressions given in the order by clause, separated by commas) match the indexed values, in the correct order;
- the FLWOR expression is run on a single indexed library or document;
- the order by is not stable.

If there is a multi-valued path value index, the optimizer will try to use as many values from the index as possible. In this case the order specs have to be in the same order as in the index specification.

(: with an index on foo.xml, this will use an order by index :)
for $book in doc('foo.xml')//book[@year] (: have to mention child for index usage :)
order by $book/@year descending
return $book

You can find out whether your order by query is being optimized by enabling the queryplan debug statements. Given a path value index like //book[@year<string> + title<string>], the following query will print something like "Found an index to support the first 2 order specs.":

declare option xhive:queryplan-debug 'true';
for $book in doc('foo.xml')//book[@year and title]
order by $book/@year, $book/title
return $book

If the query plan does not match the expected plan, you can enable the optimizer debug statements. Then check whether the optimizer considered the desired index:

declare option xhive:optimizer-debug 'true';
for $book in doc('foo.xml')//book[@year and title]
order by $book/@year, $book/title
return $book

It is also possible to optimize a subset of the order specs. For example, if an index can only support two of three order specs, only the last order spec has to be evaluated. It is evaluated only for items whose first two values are equal.
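The partial optimization described above can be pictured as follows. Items read from a sorted path value index on (year, title) are already in order for the first two order specs. Only runs of items with an equal year and title still need sorting by the third spec. This Java sketch is illustrative only. The Book type and its publisher key are hypothetical, and this is not how xdb represents query results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch (not xdb code): the input is assumed to be already
// ordered by (year, title), for example because it was read from a sorted
// path value index. Only the third order spec (publisher) is still applied,
// and only inside runs of books whose year and title are equal.
class PartialOrderBySketch {
    static final class Book {
        final String year, title, publisher;
        Book(String year, String title, String publisher) {
            this.year = year;
            this.title = title;
            this.publisher = publisher;
        }
        @Override public String toString() { return year + "/" + title + "/" + publisher; }
    }

    static List<Book> finishOrdering(List<Book> indexOrdered) {
        List<Book> result = new ArrayList<>(indexOrdered);
        int start = 0;
        while (start < result.size()) {
            Book first = result.get(start);
            int end = start + 1;
            // Extend the run while both index-supported keys are equal.
            while (end < result.size()
                    && result.get(end).year.equals(first.year)
                    && result.get(end).title.equals(first.title)) {
                end++;
            }
            if (end - start > 1) {
                // subList is a view, so this sorts the tie group in place.
                result.subList(start, end).sort(Comparator.comparing((Book b) -> b.publisher));
            }
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Book> fromIndex = Arrays.asList(
                new Book("2001", "Dune", "Zed Press"),
                new Book("2001", "Dune", "Acme"),
                new Book("2002", "Emma", "Mid"));
        System.out.println(finishOrdering(fromIndex));
        // prints [2001/Dune/Acme, 2001/Dune/Zed Press, 2002/Emma/Mid]
    }
}
```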
Queries that use range or equality comparisons on index values in combination with an order by clause also benefit from indexes. With the path value index from the last example, this query is sped up, too:

for $book in doc('/booklib/')//book[@year = '2002' and title > 'V']
order by $book/@year, $book/title
return $book

Ignoring indexes
You can ignore (disable) certain indexes using the xhive:ignore-indexes option. It takes a comma-separated list of index names. These indexes will not be used to optimize the given query.

declare option xhive:ignore-indexes 'myindex1,ftsindex';
for $x in ...

Type information and XQuery
There are two ways in which type information is stored in xdb:
- Value indexes can have a type.
- During validation of a document, PSVI information can be stored with the nodes of the document. It registers the type of each node as declared in the associated XML Schema.

This section describes how this type information is used in XQuery. The focus is on the basic data types (as defined in XML Schema and used in XQuery). It does not cover derived types, which can also be used in XQuery and in PSVI. Consider a query like

//element[@id < /my/first/idelement]

How the comparison between the id attribute and the idelement element is processed depends on the type information that can be found for the data:
- If no type information can be found anywhere, the comparison is performed between the string values of the two items involved.
- If PSVI information is stored for the items involved, it is used. For instance, if /my/first/idelement is known to represent an integer, the comparison is performed as if both values are integers. This can be important for ordering (e.g. 2 < 12, but '2' > '12').
- If the id attribute is known to be an integer, but /my/first/idelement is known (through PSVI) to be of an incompatible type, the XQuery processor raises an exception.

In the above example, the XQuery processor can use a value index on the id attribute. This has to be a sorted index because of the '<'. Which kind of index is looked for depends on the type of /my/first/idelement. If the element is untyped, or the PSVI information indicates that it represents a string, a value index of type string is used. If the PSVI information indicates that /my/first/idelement is an integer in the linked XML Schema, an integer index is looked up instead. In that case a string value index would not be used, even if it is the only one defined.

It is important to realize that the index type never determines the type used in the comparison. First a type is determined for the comparison, and only then is an index of that type looked for. However, using a typed index can lead to different query results than evaluating the query without that index. With an integer index, id attributes are treated as integers; without it, they may be untyped and treated as text.
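The ordering difference mentioned above (2 < 12, but '2' > '12') is easy to reproduce outside the database. This Java sketch only illustrates the two comparison semantics. It does not model how xdb selects indexes, and the class name is hypothetical.

```java
import java.math.BigInteger;

// Hypothetical sketch: the same lexical values compared once as untyped
// strings and once as xs:integer (modelled with java.math.BigInteger, the
// Java type that the type marshalling table in 8.9 pairs with xs:integer).
// The outcomes differ, which is why the comparison type has to be settled
// before an index of that type can be looked up.
class TypedComparisonSketch {
    // No type information: compare the string values.
    static boolean lessAsStrings(String a, String b) {
        return a.compareTo(b) < 0;
    }

    // Both values known to be integers (for example through PSVI).
    static boolean lessAsIntegers(String a, String b) {
        return new BigInteger(a.trim()).compareTo(new BigInteger(b.trim())) < 0;
    }

    public static void main(String[] args) {
        System.out.println("'2' < '12' as strings:  " + lessAsStrings("2", "12"));  // false
        System.out.println(" 2  <  12  as integers: " + lessAsIntegers("2", "12")); // true
    }
}
```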
It can be useful to add

declare option xhive:queryplan-debug "true";

to your query. Debugging information is then printed about which indexes the XQuery processor is looking for, including their type and options.

Suppose you want to execute the sample XQuery with an integer comparison, but /my/first/idelement has no PSVI information or is of a non-integer type in the linked XML Schema. You can still get the correct result by explicitly casting the compared value to the right type, or by using one of the internal conversion functions, like

//element[@id < int(/my/first/idelement)]

or

//element[@id < (/my/first/idelement cast as xs:integer)]

See also: samples: <XhiveDir>\src\samples\manual\TypedIndex.java

8.9 Extending XQuery functionality using Java
xdb provides three extension mechanisms to integrate custom Java code with the XQuery engine:
- Extension functions can be declared in XQuery and assigned using XhiveXQueryQueryIf.setFunction(...) (see the following section).
- Java code can be accessed directly using xhive:java().
- XQuery code can directly import Java classes as modules. This is probably the easiest way:

import module namespace math = "java:java.lang.Math";
math:sqrt(4), $math:e

All public methods and public static fields of the given class are made available to the query. They are accessible under their plain Java names and under a translated version. In the translated version all characters are lower-cased and camel case is turned into a hyphenated form (a hyphen is inserted between a lowercase character and a following uppercase character). For example, "getFoo()" becomes "get-foo()" in XQuery, and "localUri" becomes "local-uri".

Type marshalling
Parameters are marshalled. The system currently translates the following types:

XQuery type    Java type
xs:string      java.lang.String
xs:int         java.lang.Integer / int
xs:integer     java.math.BigInteger
xs:long        java.lang.Long / long
xs:decimal     java.math.BigDecimal
xs:double      java.lang.Double / double
xs:float       java.lang.Float / float
item()         com.xhive.query.interfaces.XhiveXQueryValueIf
node()         com.xhive.dom.interfaces.XhiveNodeIf
node()         org.w3c.dom.Node
xs:string      java.lang.String

Additionally, the system understands Iterators, Collections (List, Set) and arrays of these types. They are treated as a zero-or-more sequence (*) of the corresponding type. If there is a Java 5 generic type parameter (e.g. Iterator<XhiveNodeIf>), the type will be interpreted accordingly.

Java objects and instance methods
Values of an unrecognized type are returned to XQuery as Java objects. These objects can be passed to other Java functions, but all other XQuery expressions on them will fail. Java values in XQuery are needed to call instance methods (as opposed to static methods). If a method is non-static, the instance it should be called on has to be passed as an additional first parameter.

/* Java code */
public int foo(String bar) {...}

(: XQuery code :)
import module namespace eg = 'java:mypackage.eg';
let $x := eg:new()
return eg:foo($x, 'param1')

Instances can be created using a constructor with the eg:new(...) syntax, or injected from the outside as an external variable:

import module namespace eg = 'java:mypackage.eg';
declare variable $x external;
eg:foo($x, 'param1')

Type checking
Parameters from XQuery are checked for the correct type and promoted to Java objects according to the table above.

public static String foo(String bar, int baz, Iterator<XhiveNodeIf> nodes) {...}

(: legal call :)
eg:foo("bar", 5, <element/>)
(: wrong type :)
eg:foo("bar", "baz", ())

The return value of the function is transformed to XQuery values exactly as in xhive:java(). The difference is that Iterators, Collections, Arrays and Sets can also be returned.

Known limitations
In Java, two methods with different parameter types can have the same name. In XQuery, functions with the same name are only allowed if they have a different number of parameters.
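Returning to the naming rule described earlier in 8.9, the translation from Java names to XQuery names can be sketched in Java as below. This is a hypothetical re-implementation of the rule as stated in this guide, not xdb source code. It may not match the real implementation for names with digits or other special cases.

```java
// Hypothetical re-implementation of the stated rule (not xdb source code):
// insert a hyphen wherever a lowercase letter is directly followed by an
// uppercase letter, and lower-case every character.
class XQueryNameSketch {
    static String toXQueryName(String javaName) {
        StringBuilder sb = new StringBuilder(javaName.length() + 4);
        for (int i = 0; i < javaName.length(); i++) {
            char c = javaName.charAt(i);
            if (i > 0 && Character.isUpperCase(c)
                    && Character.isLowerCase(javaName.charAt(i - 1))) {
                sb.append('-');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"getFoo", "localUri", "sqrt", "E"}) {
            System.out.println(name + " -> " + toXQueryName(name));
        }
    }
}
```

Under this rule a run of capitals such as "URI" gets only one hyphen, before its first letter. So "localURI" would also map to "local-uri".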
In xdb, the query parser analyzes the input types from the query and tries to select the correct Java method accordingly. It calculates a score for each method based on how well the XQuery parameter types match the Java parameters. An error is reported if the scoring ends in a draw (more than one method has the best score). To tell the parser which method to use, add 'treat as' or 'cast as' expressions to the call:

eg:foo(/some/path treat as element(*, xs:integer))

8.10 Miscellaneous performance tips
If a path expression uses an index, use as few explicit steps as possible, because all of them must be checked to verify that they match. If the path expression does not use an index, it must be evaluated by scanning the document. In that case, use as many explicit steps and predicates as possible, so that branches of the DOM tree can be skipped as early as possible.

(: Preferred with index :)
doc("/")//elem[@attr = "green"]
(: Preferred without index :)
doc("/")/docelem/persons/person/name/elem[@attr = "green"]

Some predicate expressions are optimized to stop searching as soon as the required number of items has been found. This applies when the predicate is an integer expression that does not depend on the context node, context position or context size. Evaluation then stops as soon as possible. The same applies if the predicate requires position() to be less than (or equal to) such an expression.
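The early-stop behaviour for predicates such as [position() le 2] can be pictured as a lazy scan that is abandoned once enough matches have been found. The Java sketch below is a hypothetical model of that idea, not the xdb evaluator. The node list stands in for a document traversal, and the visited counter shows how much of it was inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of "(//foo)[position() le n]": nodes are inspected in
// document order and the scan stops as soon as n matching nodes have been
// collected.
class EarlyStopSketch {
    int visited; // how many nodes were inspected before the scan stopped

    List<String> firstMatches(List<String> nodesInDocumentOrder, Predicate<String> test, int n) {
        List<String> result = new ArrayList<>();
        for (String node : nodesInDocumentOrder) {
            if (result.size() >= n) {
                break; // no further node can satisfy position() le n
            }
            visited++;
            if (test.test(node)) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        EarlyStopSketch scan = new EarlyStopSketch();
        List<String> doc = Arrays.asList("bar", "foo1", "baz", "foo2", "foo3", "foo4");
        List<String> hits = scan.firstMatches(doc, s -> s.startsWith("foo"), 2);
        System.out.println(hits + " after visiting " + scan.visited + " of " + doc.size() + " nodes");
        // prints [foo1, foo2] after visiting 4 of 6 nodes
    }
}
```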

(: Stops searching after the second foo element :)
let $x := doc("/mydoc")//foo return $x[position() le 2]
(: Also stops searching after the second foo element :)
let $x := doc("/mydoc")//foo return $x[2]

Sometimes you can rewrite a query to make it use such a predicate.

(: Tests predicate for all foo elements :)
(doc("/mydoc")/descendant::foo)[position() = 2 to 4]
(: Stops after the fourth foo element :)
(doc("/mydoc")/descendant::foo)[position() le 4][position() ge 2]

In cases like the above example, you can also use the subsequence() function to make sure that a search does not continue after the requested number of items.

(: Stops searching after the fourth foo element :)
let $x := doc("/mydoc")//foo return subsequence($x, 2, 3)

Use let to move expensive computations out of loops.

(: Before (searches document b for each occurrence of element a) :)
doc("/a")//a[@id = doc("/b")//b[@id = "10"]/@ref_a_id]
(: After (searches document b at most once) :)
let $ref_a_id := doc("/b")//b[@id = "10"]/@ref_a_id
return doc("/a")//a[@id = $ref_a_id]

If you know something occurs only once, use a predicate [1] to allow the evaluator to stop searching after the first occurrence.

(: Even better if you know each id is only used once: stops searching after the first occurrence is found. :)
let $ref_a_id := (doc("/b")//b[@id = "10"])[1]/@ref_a_id
return (doc("/a")//a[@id eq $ref_a_id])[1]
(: Do not confuse the previous query with this one. This query is probably not what the user intended. :)
let $ref_a_id := doc("/b")//b[@id = "10"][1]/@ref_a_id
return doc("/a")//a[@id eq $ref_a_id][1]

Use the unordered function where possible. If the order of the result is not important, unordered may speed up your query. It allows the evaluator to skip sorting the result in document order.

(: Assuming the query uses an index, this needs to sort the values looked up in the index. :)
doc("/")//elem[@attr = 'value']
(: This version needs no sorting step. :)
unordered(doc("/")//elem[@attr = 'value'])

Rewrite recursive functions to be tail recursive. xdb implements the so-called "tail call modulo cons" recursion generalization. It lets recursive function calls whose result is returned directly be evaluated iteratively, without any recursion, which saves stack space. This only works if the tail call is the last thing in an evaluation branch of the function, apart from other tail calls. Tail recursion also works for mutually tail-callable functions.

(: This function is tail recursive because the recursive call is
 : the last thing on the 'else' evaluation branch :)
declare function local:x($arg) {
  if ($arg eq 3) then 'foo' else ('a', local:x($arg - 1))
};
(:
 : This function is not tail recursive because the result of the
 : recursive call is used in the 'or' expression and is therefore
 : needed for evaluation of the function body.
 :)
declare function local:x($arg) {
  exists($arg/@attr) or local:x($arg/child::*)
};
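The benefit of "tail call modulo cons" can be shown with a hand-translated loop for the first local:x function above. The item placed in front of the recursive call ('a') is appended to an output buffer, and the call itself becomes the next loop iteration. As a result, no stack frames pile up. This Java sketch is a hypothetical translation for illustration, not the code xdb generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-translated, hypothetical loop form of
//   declare function local:x($arg) {
//     if ($arg eq 3) then 'foo' else ('a', local:x($arg - 1))
//   };
// Like the XQuery original, it is only meant to be called with arg >= 3.
class TailCallModuloConsSketch {
    static List<String> x(long arg) {
        List<String> out = new ArrayList<>();
        while (arg != 3) {   // 'else' branch: emit 'a', then "call" x(arg - 1)
            out.add("a");
            arg = arg - 1;
        }
        out.add("foo");      // 'then' branch: the last item of the sequence
        return out;
    }

    public static void main(String[] args) {
        System.out.println(x(5));                // prints [a, a, foo]
        System.out.println(x(1_000_003).size()); // deep "recursion" without a stack overflow
    }
}
```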

Many functions can be rewritten to be tail recursive.

(: Not tail recursive because the result is used in the '+' operation :)
declare function local:sum($x as xs:integer) as xs:integer {
  if ($x eq 0) then 0 else $x + local:sum($x - 1)
};
(: Rewritten to use an accumulator; call with $acc = 0 to start :)
declare function local:sum($x as xs:integer, $acc as xs:integer) as xs:integer {
  if ($x eq 0) then $acc else local:sum($x - 1, $x + $acc)
};

Tail recursion does not necessarily make things faster, but it allows recursive functions that would otherwise cause stack overflows. Because of the evaluation strategy, the results of tail calls cannot be type checked. Also, the parameters to tail calls are evaluated eagerly rather than lazily. The end result of the function is still type checked. You can find out whether your function is tail recursive by enabling queryplan debug (declare option xhive:queryplan-debug 'stdout';). You can set the maximum tail recursion depth using the xhive:max-tail-recursion-depth option (see options).

Parallel queries
A particular subset of queries can be evaluated in parallel. For parallel evaluation to be considered at all, you must provide an executor instance, using code like:

XhiveXQueryQueryIf query = ... ;
Executor executor = Executors.newCachedThreadPool();
query.setParallelExecution(executor);

Parallel evaluation can reduce the response time of queries, but its overhead may reduce the total throughput. Suppose a FLWOR or path expression is evaluated on a library and no relevant indexes can be found. The query evaluation then descends into the child libraries and evaluates the expression on each child library separately. This step can be parallelized: the database creates a job for each child library and submits these jobs to the user-supplied executor.
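The child-library fan-out described above can be sketched with the standard java.util.concurrent classes. In this hypothetical model, each child library is reduced to a list of values and the "query" is an arbitrary function. It shows the job-per-child pattern with a caller-supplied executor, not xdb's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch (not xdb internals): the expression is evaluated once
// per child library, each evaluation is submitted as a job to a
// caller-supplied executor, and partial results are concatenated in child order.
class ParallelChildEvaluationSketch {
    static <C, R> List<R> evaluateOnChildren(List<C> children,
                                             Function<C, List<R>> evaluate,
                                             ExecutorService executor) {
        List<Future<List<R>>> jobs = new ArrayList<>();
        for (C child : children) {
            Callable<List<R>> job = () -> evaluate.apply(child);
            jobs.add(executor.submit(job));
        }
        List<R> result = new ArrayList<>();
        for (Future<List<R>> job : jobs) {
            try {
                result.addAll(job.get()); // waits for this child's job
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            // Each inner list stands in for the values stored in one child library.
            List<List<Integer>> children = Arrays.asList(
                    Arrays.asList(1, 2, 3), Arrays.asList(4, 5), Arrays.asList(6));
            List<Integer> evens = evaluateOnChildren(children, (List<Integer> lib) -> {
                List<Integer> hits = new ArrayList<>();
                for (int v : lib) {
                    if (v % 2 == 0) hits.add(v);
                }
                return hits;
            }, executor);
            System.out.println(evens); // prints [2, 4, 6]
        } finally {
            executor.shutdown();
        }
    }
}
```

The results are collected from the futures in submission order. The combined result therefore keeps the child order even if the jobs finish in a different order.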
Parallel query evaluation is most useful when the child libraries searched are located on different disks, so that the I/O load can be spread. Generally, the expressions that can be parallelized are the same ones that can use indexes, regardless of whether indexes are actually present or used. See the section on value indexes for the form of expressions that can be optimized. If the xhive:queryplan-debug option is turned on for the query, the output contains a message when the query is being parallelized.

See also: samples: <XhiveDir>\src\samples\manual\ParallelQuery.java

8.11 Full text searching
xdb partially implements the W3C XQuery Full Text Facility standard. xdb also extends XQuery with a proprietary full text search function. This proprietary syntax is still supported in version 9.0, but we recommend using the standard syntax proposed by the W3C.

Supported W3C XQuery Full Text features
Currently, xdb supports the following W3C Full Text features:
- Logical full-text operators
- Wildcard option
- Anyall options
- Positional filters
- Score variables

Logical full-text operators
xdb supports all full-text logical operators: ftor, ftand, ftnot and not in. These operators are described in detail in the W3C specification. Examples of full-text search queries with logical operators are:

(: retrieves all books with a title containing the terms "programming" and "web" :)
doc('bib.xml')/bib/book[title ftcontains "programming" ftand "web"]
(: retrieves all books with a title containing the terms "Unix" and "TCP", or the term "programming" :)
doc('bib.xml')/bib/book[title ftcontains "Unix" ftand "TCP" ftor "programming"]
(: retrieves all books with a title containing the term "Unix" but not the term "UDP" :)
doc('bib.xml')/bib/book[title ftcontains "Unix" ftand ftnot "UDP"]
(: retrieves all books with a title containing the term "Unix" where it is not part of "Unix environment" :)
doc('bib.xml')/bib/book[title ftcontains "Unix" not in "Unix environment"]

Wildcard option
xdb supports all qualifiers of the wildcard option: '.', '.?', '.*', '.+' and '.{n,m}'. The wildcard option is described in detail in the W3C specification. Examples of full-text search queries with the wildcard option are:

(: retrieves all books with a publisher containing a term starting with "Kauf" :)
doc('bib.xml')/bib/book[publisher ftcontains "Kauf.*" with wildcards]
(: retrieves all books with a publisher containing a term that starts with "Aca", continues with one arbitrary character and ends with "emic" :)
doc('bib.xml')/bib/book[publisher ftcontains "Aca.emic" with wildcards]
(: retrieves all books with a publisher containing the term "Publisher", or "Publisher" followed by one arbitrary character :)
doc('bib.xml')/bib/book[publisher ftcontains "Publisher.?" with wildcards]
(: retrieves all books with a publisher containing the literal term "Aca.emic" :)
doc('bib.xml')/bib/book[publisher ftcontains "Aca.emic" without wildcards]

Anyall options
xdb supports the anyall options: any, any word, all, all words and phrase.
The anyall options are described in detail in the W3C specification. Examples of full-text search queries with anyall options are:

(: retrieves all books with a title containing the phrase "TCP programming" and the term "UDP" :)
doc('bib.xml')/bib/book[title ftcontains {"TCP programming", "UDP"} all]
(: retrieves all books with a title containing the phrase "TCP programming" :)
doc('bib.xml')/bib/book[title ftcontains {"TCP", "programming"} phrase]
(: retrieves all books with a title containing at least one of the terms "TCP", "programming" or "UDP" :)
doc('bib.xml')/bib/book[title ftcontains {"TCP programming", "UDP"} any word]

Positional filters
xdb supports three positional filters: ordered, window and distance. Positional filters are described in detail in the W3C specification. Examples of full-text search queries with positional filters are:

(: retrieves all books with a title containing both "unix" and "programming", with the matched terms in the same order as in the query :)
doc('bib.xml')/bib/book[title ftcontains "unix" ftand "programming" ordered]
(: retrieves all books with a title containing both "unix" and "programming" within a window of 3 words :)
doc('bib.xml')/bib/book[title ftcontains "unix" ftand "programming" window 3 words]
(: retrieves all books with a title containing both "unix" and "programming", with at least 2 words between the matched terms :)
doc('bib.xml')/bib/book[title ftcontains "unix" ftand "programming" distance at least 2 words]

Score variables
xdb supports scoring through score variables in the for and let clauses of FLWOR expressions. Score variables are of type xs:double in the range [0, 1]. A higher score value implies a higher degree of relevance. XQFT score variables are described in detail in the W3C specification. Examples of full-text search queries with score variables are:

(: retrieves all books with a title containing both the terms "unix" and "programming", sorted by score :)
for $book score $s in doc('bib.xml')/bib/book[title ftcontains "unix" ftand "programming"]
order by $s
return $book
(: retrieves all books with a title containing "unix", sorted by score in descending order;
 : however, the scores reflect whether the book's content contains the terms "programming" and "java" :)
for $book in doc('bib.xml')/bib/book[title ftcontains "unix"]
let score $s := $book/content ftcontains "programming" ftand "java"
order by $s descending
return $book

Score calculation
Scoring is available for both indexed and non-indexed queries. However, score estimation is much better when indexes are used. Depending on the options used, frequency and occurrence counts are then available for the whole set of searched nodes. For optimal score estimation, create full text indexes with the options FTI_SUPPORT_PHRASES and FTI_SUPPORT_SCORING.

The scoring implementation of xdb is partially based on Lucene's. xdb also uses a Lucene-based Similarity class. As in Lucene, the user can influence the results by changing the similarity measure, using the xhive:fts-similarity-class XQuery option. For a basic presentation of the concepts used to estimate the score of a given query, we refer the reader to the documentation of Lucene's Similarity API, which describes many of these concepts well. Our scoring implementation currently differs from Lucene's most significantly in one respect. We do not evaluate all results and compute all scores before returning the first result with its score. Instead, we estimate the expected number of results and use this estimate to normalize and weight the different query components.
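For orientation, the Similarity documentation builds on two classic Lucene components: term frequency and inverse document frequency. They are sketched below in Java using the formulas of Lucene's classic DefaultSimilarity. This is a simplified illustration only. xdb's estimation, weighting and normalization to [0, 1] differ and are not reproduced here, and the class name is hypothetical.

```java
// Simplified sketch of the classic Lucene DefaultSimilarity formulas
// (not xdb's scoring code):
//   tf(freq)              = sqrt(freq)
//   idf(docFreq, numDocs) = 1 + ln(numDocs / (docFreq + 1))
// Their product is an unnormalized weight of one term in one node.
class TfIdfSketch {
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double termWeight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // A term that is rare in the collection weighs more than a common one.
        System.out.println("rare:   " + termWeight(1, 1, 1000));
        System.out.println("common: " + termWeight(1, 500, 1000));
    }
}
```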
In our current implementation, a user or administrator cannot manually increase the weight of a certain node, and therefore cannot raise its relevance with respect to scoring. We cannot guarantee a stable relevance order. As we improve the xdb score calculation, the relative relevance of different search results is bound to change.

xhive:fts function
xdb extends XQuery with a full text search function, which can be used to search for terms within a text string. In general you can think of 'terms' as words. For example, the string 'yadda yadda yadda' contains three terms, each with the value 'yadda'. In xdb, terms are the basic units for full text indexing and searching. This differs from, for instance, the XQuery contains function, which treats the text as a single monolithic string. The full text search function has a number of advantages:
- It looks upon the input string as a list of terms, instead of a list of characters as the contains function does. This makes the usage of indexes more practical. Currently no indexes are used with the contains function, but the full text search can use indexes.
- It allows the use of wildcards and prefixes.
- It allows you to search for exact or sloppy phrases.

What exactly counts as a term is determined by the tokenization process, which is controlled by the analyzer class. xdb's current full text search implementation is partially based on code from the Lucene project. This manual also contains parts of the Lucene FAQ.

The full text search function is declared as follows:

xhive:fts(node(s), querystring, options)

The first argument of the function should be a node or a set of nodes. The second argument is expected to be a query string. If a set of nodes is provided as the first argument, the full text search is executed on all the nodes.
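The difference between term-based and character-based matching can be shown with a few lines of Java. The tokenizer below is an assumption for illustration: it splits on anything that is not a letter or digit and lower-cases the terms. It is not the xdb analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: xhive:fts-style term matching versus
// fn:contains-style substring matching on the same text.
class TermVersusContainsSketch {
    // Splits on every run of characters that are not letters or digits.
    static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    static boolean containsTerm(String text, String term) {
        return terms(text).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String text = "yadda yadda yadda";
        System.out.println(terms(text));               // prints [yadda, yadda, yadda]
        System.out.println(text.contains("add"));      // prints true: substring match
        System.out.println(containsTerm(text, "add")); // prints false: not a whole term
    }
}
```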
The options argument is optional. If present, it should be a string literal containing a semicolon-separated list of options. The available options are:
1. include-attrs: the function also searches the attribute values of searched elements (and their descendants) along with the other text nodes. To use the include-attrs option with full text indexes, you must set the FTI_INCLUDE_ATTRIBUTES option on that index. Conversely, if you do not use include-attrs, that option must not be set on the index.
2. analyze-wildcards: if set, the terms in the query are sent to the analyzer. However, if the query contains wildcards, the analyzer must not remove the characters used to represent the wildcards. (These are defined on XhiveFtsUtilIf.)
3. analyzer classname: you can change the analyzer used for the query text by passing an analyzer class name as an option.

The result is a boolean value: true if the text matches the query.

Full text search query syntax
The syntax of full text search queries is as follows:

Query        ::= Clause ( [ Conjunction ] Clause )*
Conjunction  ::= 'AND' | 'OR' | '||' | '&&'
Clause       ::= [ Modifier ] BasicClause [ Boost ]
Modifier     ::= '-' | '+' | '!' | 'NOT'
BasicClause  ::= ( TermQuery | Phrase | '(' Query ')' )
TermQuery    ::= ( Term | WildCardTerm | PrefixQuery ) [ Fuzzy ]
PrefixQuery  ::= Term '*'
Phrase       ::= '"' Term* '"' [ SlopFactor ]
Fuzzy        ::= '~'
SlopFactor   ::= '~' DecimalDigit+
Boost        ::= '^' DecimalDigit+ '.' DecimalDigit+
Term         ::= <a-word-or-token-to-match>
WildCardTerm ::= <a-word-or-token-to-match-with-wildcards>

The following characters are reserved. Escape them with a backslash (\) when you use them without their special meaning: +, -, !, (, ), :, ^, [, ], ", {, }, ~, *, ?

Boolean queries
A boolean query is a composite query. It may contain sub queries nested to any level, combined with composition rules such as 'and', 'or' or 'not'. Each sub query of a boolean query has two binary qualifiers that control how its super query is matched:
- prohibited: when this flag is set, the matching status of the sub query is negated. The super query is then considered a match only when the sub query does not match.
- required: when this flag is set, the sub query is required to match (or not to match, if its 'prohibited' flag is set) for the super query to match. This is a necessary but not sufficient condition for the super query to match.
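The interplay of the two qualifiers can be made concrete with a small Java model. It assumes Lucene BooleanQuery semantics, on which xdb's full text code is partially based, and reduces each sub query to a single term. The class is hypothetical and is not the xdb query parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the 'required' and 'prohibited' qualifiers, using
// Lucene BooleanQuery semantics as an assumption (not xdb source code):
//  - a prohibited clause must not match (even if it is also required),
//  - every required clause must match,
//  - without required clauses, at least one optional clause must match.
class BooleanClauseSketch {
    static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;
        Clause(String term, boolean required, boolean prohibited) {
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    static boolean matches(List<Clause> clauses, Set<String> termsInText) {
        boolean anyRequired = false;
        boolean anyOptionalMatched = false;
        for (Clause c : clauses) {
            boolean hit = termsInText.contains(c.term);
            if (c.prohibited) {
                if (hit) return false;
            } else if (c.required) {
                anyRequired = true;
                if (!hit) return false;
            } else if (hit) {
                anyOptionalMatched = true;
            }
        }
        return anyRequired || anyOptionalMatched;
    }

    public static void main(String[] args) {
        // '!apples && oranges': apples prohibited, oranges required
        List<Clause> query = Arrays.asList(
                new Clause("apples", false, true),
                new Clause("oranges", true, false));
        Set<String> text = new HashSet<>(Arrays.asList("oranges", "pears"));
        System.out.println(matches(query, text)); // prints true
    }
}
```

With the default implicit OR, a query such as "apples oranges bananas" corresponds to a list of clauses that are neither required nor prohibited.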
The default implicit conjunction is OR, so a query like "apples oranges bananas" is equivalent to "apples OR oranges OR bananas". You can change the implicit conjunction locally using the pragma xhive:fts-implicit-conjunction.

This functionality partly overlaps with XQuery, which also allows you to use the same boolean operators. For instance, the queries

//element[xhive:fts(., "apples AND oranges")]
(# xhive:fts-implicit-conjunction 'AND' #) { //element[xhive:fts(., "apples oranges")] }

yield the same results as the query

//element[xhive:fts(., "apples") and xhive:fts(., "oranges")]

So there is no semantic difference between the three forms. In the current implementation, however, the first two forms are significantly faster than the last one (this could change in future versions).

Prefix search

Search for all terms starting with a certain prefix, such as in 'build*'.

Phrase search
A phrase query is matched against a consecutive sequence of terms in the field. For example, the phrase query 'winding road' should match 'winding road' but not 'road winding', unless a more relaxed slop factor is used (see below). A phrase query may have an optional boost factor and an optional slop parameter (default = 0). The slop parameter relaxes phrase matching by also accepting sequences of the terms that are somewhat out of order.

Wildcards
The following wildcards can be used: * and ?. The * wildcard substitutes for any number of characters, and the ? wildcard substitutes for a single character. Only indexes built with the option FTI_LEADING_WILDCARD_SEARCH can efficiently search for terms that start with a wildcard. If this option is not set, the search is exhaustive and may become extremely slow.

Tokenization
Tokenization is the process of breaking content into 'terms' (words). In xdb, tokenization is done by an object called the analyzer.

The analyzer
An analyzer breaks up content into tokens, and can also change tokens to improve its search capacity. It might change the terms to lowercase, or change a term from plural to singular. For example, one analyzer can break the text 'The ill dogs' into the tokens 'the', 'ill', 'dogs' (lower case). Another analyzer can break it into 'ill', 'dog' (lower case, plural normalized to singular, and the common word 'the' removed). Both the text to be searched and the query are passed through the same analyzer. If an index is available, the query is analyzed with the same analyzer that was used to build the index. If no index is available, the analyzer used depends on the value of the "fts-analyzer-class" option.
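The two wildcards can be sketched as a translation to java.util.regex. The sketch also shows why a leading wildcard is expensive: the characters before the first wildcard are what allow a search to seek into a sorted term list. The class below is hypothetical and does not reproduce xdb's parser or its FTI_LEADING_WILDCARD_SEARCH support.

```java
import java.util.regex.Pattern;

// Hypothetical sketch (not the xdb query parser): match a full text
// wildcard term against an indexed term by translating '*' (any number of
// characters) and '?' (exactly one character) into a regular expression.
// All other characters are quoted so that they match literally; backslash
// escapes of the query syntax are not handled here.
class WildcardTermSketch {
    static Pattern compile(String wildcardTerm) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcardTerm.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String wildcardTerm, String term) {
        return compile(wildcardTerm).matcher(term).matches();
    }

    // The characters before the first wildcard. In a sorted term list only
    // terms starting with this prefix need to be examined; with a leading
    // wildcard the prefix is empty and every term must be checked.
    static String fixedPrefix(String wildcardTerm) {
        int i = 0;
        while (i < wildcardTerm.length()
                && wildcardTerm.charAt(i) != '*' && wildcardTerm.charAt(i) != '?') {
            i++;
        }
        return wildcardTerm.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(matches("build*", "building")); // prints true
        System.out.println(matches("te?t", "text"));       // prints true
        System.out.println(fixedPrefix("*ing").isEmpty()); // prints true: exhaustive scan
    }
}
```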
To use a different analyzer in the query, include its class name in the options argument of the xhive:fts function. The default analyzer used by the fts function behaves as follows:
- It creates terms containing only letters and/or digits (any other character starts a new term).
- It converts all characters in a term to lower case.
- It filters out English stopwords.

The list of English stopwords that are filtered out is: "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

Examples
An example with Java code is included in the distribution. Examples of full text search queries are:

(: The text should contain the terms 'apples' and 'oranges' :)
'apples AND oranges'
(: The text should contain the term 'oranges' but not the term 'apples' :)
'!apples && oranges'
(: The text should contain terms starting with 'appl' :)
'appl*'
(: The text should contain the phrase "apples and oranges" :)
'" apples and oranges "'
(: The text should contain something like the phrase "apples and oranges", with at most 3 terms between the terms :)
'" apples and oranges " ~3'

Limitations
The current full text search implementation has some limitations:
- The query parser recognizes boost factors, but currently it is not possible to rank the end results.
- Prefix queries are not passed through the analyzer. This matters if your index contains only lowercase terms: in that case, do not use uppercase letters in your prefix query.
- Phrase queries on indexed nodes currently raise an unsupported operation exception if the index does not support phrase queries. For phrase queries, use either an index that supports them or no index at all.
- Phrase queries should always be surrounded by double quotes. The query parser does not yet recognize a list of terms within single quotes as a phrase query.

9 Using indexes
Indexes are used to speed up queries. Especially for large datasets, indexes are essential to query performance. An index stores key-value pairs. In xdb, the index key is a string or (for value indexes only) a number type, and its value is a node set. xdb currently has the following indexing methods:
- Library indexes
- Id attribute indexes
- Element name indexes
- Value indexes
- Full text indexes
- Path indexes
- Metadata indexes
- Context conditioned indexes

Each xdb indexing method has certain characteristics. An index is either 'live' or 'non-live'. 'Live' indexes are automatically updated when the indexed data is updated; 'non-live' indexes are only updated on request. All indexes except context conditioned indexes are 'live'. As a result, indexes slow down data updates. To maintain update performance, add a 'live' index only if it is essential to query performance, and choose the index type and location with care. Context conditioned indexes are updated on user request, so this type of index does not affect data update performance.
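The cost trade-off between 'live' and 'non-live' indexes can be sketched as follows. A live index is maintained inside every update. A non-live index, such as a context conditioned index, stays stale until it is rebuilt on request. The Java model below is hypothetical and only illustrates this behaviour. It is not the xdb index API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not the xdb API) of a value index over node values.
class LiveIndexSketch {
    private final Map<Integer, String> data = new HashMap<>();           // node id -> value
    private final Map<String, List<Integer>> liveIndex = new HashMap<>(); // updated on every put
    private Map<String, List<Integer>> nonLiveIndex = new HashMap<>();    // updated on request

    // Every data update also pays for maintaining the live index.
    void put(int nodeId, String value) {
        String old = data.put(nodeId, value);
        if (old != null) {
            List<Integer> ids = liveIndex.get(old);
            ids.remove(Integer.valueOf(nodeId));
            if (ids.isEmpty()) {
                liveIndex.remove(old);
            }
        }
        liveIndex.computeIfAbsent(value, k -> new ArrayList<>()).add(nodeId);
    }

    // The non-live index is only brought up to date when explicitly asked.
    void rebuildNonLive() {
        Map<String, List<Integer>> fresh = new HashMap<>();
        data.forEach((id, v) -> fresh.computeIfAbsent(v, k -> new ArrayList<>()).add(id));
        nonLiveIndex = fresh;
    }

    int countLive(String value) {
        return liveIndex.getOrDefault(value, new ArrayList<>()).size();
    }

    int countNonLive(String value) {
        return nonLiveIndex.getOrDefault(value, new ArrayList<>()).size();
    }

    public static void main(String[] args) {
        LiveIndexSketch s = new LiveIndexSketch();
        s.put(1, "red");
        s.put(2, "red");
        System.out.println(s.countLive("red") + " " + s.countNonLive("red")); // prints 2 0
        s.rebuildNonLive();
        s.put(1, "green");
        System.out.println(s.countLive("red") + " " + s.countNonLive("red")); // prints 1 2
    }
}
```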
An index can be defined on library or document level. It is maintained automatically either for all descendants of that level (id attribute, value, element name and full text indexes) or for all children of that level (library indexes). Context conditioned indexes are updated on request. Library name and library id indexes can only be defined on libraries; all other indexes can be defined on both libraries and documents (in most cases, you will want to place indexes on libraries). An index is either accessible or inaccessible through an index list (XhiveIndexListIf). All index types have a common interface (XhiveIndexIf). Indexes are created with the following default properties:
- The index does not keep its keys sorted.
- The index is stored in a separate storage container and is therefore not locked together with its owner library or document. This improves concurrent access to the index.
- The key type of value indexes is XhiveIndexIf.TYPE_STRING.

Indexes are very scalable: the number of keys and values of an index can grow without limitations. Especially for id attribute, element name and value indexes, the values (node sets) can become very large. Information about how to optimize index performance can be found in section 9.9, Index performance. The following paragraphs describe each indexing method in detail.

Note: A detailed description of how XQuery uses indexes internally can be found in section "Using indexes in XQuery".

9.1 Library indexes
Library indexes improve the performance and scalability of access to the children of a library. This indexing method can only be applied to libraries. Library indexes are 'live', which means that they are updated automatically when library children (e.g. documents, libraries, BLOB nodes) are inserted, replaced or removed. Library indexes are stored in an XhiveIndexListIf and must have a unique name. There are two types of library indexes:
- Library id index
- Library name index
A library can only have one library id index and one library name index. Adding a second library id or library name index generates an exception. By default, a library has a library name index. The root library also has a library id index.

Library id indexes
A library id index stores the children of a library by their id. This index improves the performance and scalability of the method get(long id) in XhiveLibraryIf. As each library child is given a unique id, each child of the library can be accessed through the index. Library id indexes are recommended when a large number of children is stored in the library and the get(long id) method is used.

Library name indexes
A library name index stores the children of a library by their name. This index improves the performance and scalability of the method get(String name) in XhiveLibraryIf. As names are not mandatory for library children, not all children may be represented in the index. Library name indexes are recommended when the get(String name) method is used and a large number of library children is involved. The index also improves the performance of XLink operations and full path XPointer queries, because library and document names are used to uniquely identify a document in a library.

9.2 Id attribute indexes
An id attribute index stores elements by their element id.
An element id is the value of an element's (unique) attribute of type ID. The id attribute index is used by the DOM method getElementById(String elementId) and by XQuery/XPath/XPointer queries, and improves their performance. You can specify which attributes are ID attributes in the DTD or XML Schema linked to the document. When a DTD is associated with the document in the database (e.g. by parsing with validation), attributes created through the DOM call createAttribute(String name) become ID attributes when the DTD defines them as such. If you do not want to work with a DTD, you can also create ID attributes explicitly by calling createIdAttribute(String name) on XhiveDocumentIf.

Id attribute indexes can be created on either library or document level. Element ids are only unique within the context of a document, so an id attribute index on library level can have more than one value for a given id. Furthermore, documents that were not validated may contain more than one element with the same id. The index does not limit the number of entries for a given key.

Id attribute indexes are 'live', which means that they are automatically updated when element ids are inserted, replaced or removed. The indexes are of class XhiveIndexIf and are stored in an XhiveIndexListIf. As index lists can store indexes of different types, each index in the index list must have a unique name. Each index list can only contain one id attribute index.

Note: Id attribute indexes are used by XQuery/XPath/XPointer queries and by the getElementById(String elementId) method. Users will most likely not access this type of index directly.

9.3 Element name indexes
An element name index stores elements by their name. Element name indexes are therefore only useful if you want to get all elements with a certain name. To access elements with certain values, see section 9.4, Value indexes. Two types of element name indexes are supported:
- All elements are indexed.
- Only elements with one of a selection of element names are indexed.
This second type of index is called a selected element name index. To maintain data update performance, use selected element name indexes if you use element name indexes at all.
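The sketch below shows how an element name index could be built over a DOM tree. It uses only the JDK's standard DOM parser, not xdb. It computes the namespace-aware key that this guide specifies for element name indexes: the node name when there is no namespace, otherwise the namespace URI, a single space and the local name. When a set of selected keys is passed in, it models a selected element name index. The class name ElementNameIndexSketch and its methods are hypothetical, and an in-memory map stands in for xdb's persistent index.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of an element name index over a JDK DOM tree (not the xdb API).
final class ElementNameIndexSketch {

    // Namespace-aware key: nodeName if there is no namespace URI,
    // otherwise namespaceURI + " " + localName.
    static String keyOf(Element e) {
        String uri = e.getNamespaceURI();
        return uri == null ? e.getNodeName() : uri + " " + e.getLocalName();
    }

    // selected == null indexes all elements; otherwise only elements whose key is
    // in the selection (a "selected element name index").
    static Map<String, List<Element>> build(Document doc, Set<String> selected) {
        Map<String, List<Element>> index = new LinkedHashMap<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String key = keyOf(e);
            if (selected == null || selected.contains(key)) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(e);
            }
        }
        return index;
    }

    // Namespace awareness is required for getNamespaceURI()/getLocalName() to work.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<book xmlns:x='urn:example'><chapter/><x:chapter/><title/></book>");
        System.out.println(build(doc, null).keySet());
        // prints [book, chapter, urn:example chapter, title]
        System.out.println(build(doc, Set.of("urn:example chapter")).keySet());
        // prints [urn:example chapter]
    }
}
```

The selected variant skips every element whose key is not in the selection. This is why it keeps both the index and its update cost smaller than an index over all element names.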

To distinguish between elements with the same (local) name in different namespaces, keys must be "namespace aware". An element is indexed with the key:
- nodeName, when the namespaceURI is null;
- namespaceURI + ' ' + localName, when the namespaceURI is not null. This string contains both the namespace URI and the local name, separated by a single space character.
For example, an element with the local name chapter in a namespace is indexed by the key consisting of that namespace URI, a space, and chapter.

Element name indexes can be created on either library or document level.

9.4 Value indexes
A value index stores elements by an element value or an attribute value. Value indexes are created on either library or document level. There are three different types of value indexes:
- Value indexes that store elements by element value.
- Value indexes that store elements by attribute value.
- Value indexes that store named elements by attribute value.
Value indexes are 'live', which means that they are automatically updated when elements and/or attributes are inserted, replaced or removed. An index list can contain multiple value indexes.

Value indexes have namespace support. Whenever namespaces are used, the documents involved must have been parsed with the XhiveIndexIf.PARSER_NAMESPACES_ENABLED option, and the element URI and/or attribute URI parameters must be supplied to the addValueIndex() method.

Value index types
For value indexes, you can specify a type, which indicates the type of the values that are put in the index. The default, and also the most common, type is the string type. The complete list of types is TYPE_STRING, TYPE_LONG, TYPE_INT, TYPE_FLOAT, TYPE_DOUBLE, TYPE_DAY_TIME_DURATION, TYPE_DATE_TIME, TYPE_DATE, TYPE_TIME, and TYPE_YEAR_MONTH_DURATION. All these types match basic XML Schema data types. When you select a type for the index, it is your responsibility to make sure that all indexed data adheres to the type.
The indexed documents therefore do not need to adhere to an XML Schema that specifies data types. As long as the values of all elements and attributes placed in the index match the syntax of the index type, the index is constructed properly. When an attempt is made to index data that does not match the syntax of the index type, an exception is thrown. All data can be indexed using TYPE_STRING. However, it can be useful to index data that represents, for example, decimals with TYPE_DOUBLE, because:
- In a sorted index, the sort order is based on the type rather than always being lexicographical (e.g. 2 < 15, whereas as strings "15" sorts before "2").
- Different lexical representations of the same value are registered under the same index key.
Furthermore, when the XQuery processor is looking for a value of type integer and may be able to use an index, it will only look for an integer value index.

Note: Value indexes are not integrated within XPath and XPointer, but are used by XQuery.
See also: API documentation: com.xhive.index.interfaces.XhiveIndexIf

9.5 Full text indexes
A full text index stores elements by the full text information of their element text value(s) or of an attribute value. Full text indexes are created on either library or document level. There are three main differences with value indexes:

- Value indexes can be used to get elements by their complete value; with full text indexes you can search for individual words of the value.
- Besides looking for individual words, full text indexes can be used in more complex boolean and wildcard queries (unlike value indexes, where you can only look for a complete text match).
- Value indexes can only index elements with exactly one text node child; with full text indexes you can optionally index all the underlying text of an element (including the text of sub-elements).
Thus, full text indexes are more versatile than value indexes. This versatility comes at a price: full text indexes are slower than value indexes, especially during updates.

There are a number of index options specifically for full text indexes:
- FTI_GET_ALL_TEXT: the element is indexed by its string value, which is computed from the string values of all descendant nodes. If this option is not set, only elements whose children are all text nodes can be indexed, but index updates are faster.
- FTI_SUPPORT_PHRASES: if set, the index is optimized to perform phrase queries (at the cost of a larger index).
- FTI_SUPPORT_SCORING: if set, the index stores extra information about the indexed tokens to improve the quality of the score calculation.
- FTI_SA_ADJUST_TO_LOWERCASE: if set, the indexed terms are converted to lower case, so queries are performed case-insensitively.
- FTI_SA_FILTER_ENGLISH_STOP_WORDS: if set, words that appear in a list of standard (English) stopwords are not indexed.
- FTI_INCLUDE_ATTRIBUTES: if set, words that are part of the attribute values of the indexed elements are also indexed. This option has no effect on full text indexes placed on attributes. When this option is set, you should set the option "include-attrs" when using the xhive:fts function.
- FTI_LEADING_WILDCARD_SEARCH: if set, the index can efficiently search for terms with a leading wildcard (e.g. '*plication'). This option also improves the speed of searches of the form "PREFIX*SUFFIX", such as "cou*eracts". Setting this option does not slow down normal non-wildcard searches, but it may increase the time required for an index update.

For indexes, you can also specify a custom Analyzer class. The analyzer acts as a tokenizer and gives finer-grained control over the indexing process. Custom analyzers have to be subclasses of org.apache.lucene.analysis.Analyzer. See section Supported options for information about how to specify an analyzer class to use in xhive:fts in the absence of indexes.

Full text indexes are 'live', which means that they are automatically updated when elements and/or attributes are inserted, replaced or removed. An index list can contain multiple full text indexes.

Full text indexes have namespace support. Whenever namespaces are used, the documents involved must have been parsed with the XhiveIndexIf.PARSER_NAMESPACES_ENABLED option, and the element URI and/or attribute URI parameters must be supplied to the addFullTextIndex() method.

Note: Full text indexes are used by XQuery, specifically by the xhive:fts function.

9.6 Path indexes
Path indexes are like value indexes in that they index the value of elements and attributes. However, path indexes allow a more general way of specifying the indexed element, and they also allow multiple element and/or attribute values to be used as index keys. One of the values used as index key can be a full text field, which can be used to accelerate xhive:fts queries.

Path indexes are specified using an XPath-like syntax. A specification consists of a path to the indexed element and, optionally, a specification of the values to use as index keys. As with normal element value indexes, the element whose value is used as index key can only contain a single child node, which must be a text or CDATA section node. Here are various examples:
- //elem indexes the value of all elements with the name elem. This is exactly like an element value index.
  Such an index can speed up queries like //elem[. = "value"], and also queries like /foo/bar/elem[. = "value"] or /foo[bar/elem = "value"].
- //elem[@attr] indexes the value of all attr attributes of elem elements. This is exactly like an attribute value index. Such an index speeds up queries like //elem[@attr = "value"].

- A specification of the form //{uri}elem[@{uri}attr], where uri is a namespace URI, is like the previous index, except that now the elements and attributes must be in that namespace.
- //*[@attr] indexes the value of all attr attributes, regardless of the name of their owner element. This is exactly like an attribute value index. Such an index speeds up queries like //*[@attr = "value"] and also //elem[@attr = "value"].
- //chapter/title indexes the value of all title elements that are children of chapter elements. This is useful if you also have other title elements that should not be found by your query.
- //chapter[title] also uses the value of the title elements beneath chapter elements. Unlike the previous example, however, the indexed nodes are the chapter elements instead of the title elements. Both indexes can be used for a query like //chapter[title="intro"], but this index saves a parent lookup step from the indexed element. Only the previous example index can be used for the query //chapter/title[. = "Intro"].
- /root[node/@id] creates an index whose keys are the values of the id attributes of node elements that are children of the document element root. Using a path with only child steps like this can speed up index updates, because the whole document does not have to be searched for indexed elements.
- //elem[@attr1 + @attr2] creates an index whose keys are pairs of the values of the attributes attr1 and attr2. This is useful for queries like //elem[@attr1 = "value1" and @attr2 = "value2"], which can be evaluated with a single lookup in this index. This index is also used for a query like //elem[@attr1 = "value1"], but not for a query like //elem[@attr2 = "value2"]. If one (or both) of the attributes does not exist for a particular element, the index does not contain that element, because the example query should never return that element.
- //elem<int> indexes the values of elem elements as integers, like a value index with the option TYPE_INT. This is useful for a query like //elem[. = 10]. You cannot specify the type as an option with path indexes, because in a multi-valued index each of the values used as the index key can have a different type.
- //items/item[@id<int> + price<float> + description/name<string>] is an example of an index where the various values have different types. It can be used for the query //items/item[@id = 10 and price = xs:float(4.53) and description/name = "keyboard"].
- //article[body<full_text>] is an example of an index where articles are indexed by the full text content of the article bodies. It can be used for the query //article[xhive:fts(body,"apples")].
- //article[body<full_text:my.package.CustomAnalyzer:>] is an example of an index where articles are indexed by the full text content of the article bodies using a custom text analyzer (see section 9.5, Full text indexes, for details on using custom analyzers).
- //article[author<string> + body<full_text::get_all_text,sa_adjust_to_lowercase,sa_filter_english_stop_words>] is an example of an index where articles are indexed on both their authors and their content. It can be used for the query //article[author="john" and xhive:fts(body,"apples")].

Path index specification syntax
Here is the syntax for path index specifications:

    spec            := elementpath ( type | '[' values ']' )?
    elementpath     := elementstep+
    elementstep     := ('/' | '//') name
    values          := value ( '+' value )*
    value           := ('.' | '@' name | valuepath) type?
    valuepath       := './/'? name (('/' | '//') name)* ('/@' name)?
    type            := '<' ( 'STRING' | 'INT' | 'LONG' | 'FLOAT' | 'DOUBLE' | 'DATE_TIME' | 'DATE' | 'TIME'
                           | 'YEAR_MONTH_DURATION' | 'DAY_TIME_DURATION' | fulltextspec ) '>'
    fulltextspec    := 'FULL_TEXT' (':' analyzername? (':' fulltextoptions?)?)?
    fulltextoptions := fulltextoption | fulltextoption ',' fulltextoptions
    fulltextoption  := 'GET_ALL_TEXT' | 'SUPPORT_PHRASES' | 'SA_ADJUST_TO_LOWERCASE'
                     | 'SA_FILTER_ENGLISH_STOP_WORDS' | 'INCLUDE_ATTRIBUTES'
    name            := '*' | uri? localname
    uri             := '{' ([^&}] | reference)* '}'

In these productions, localname is an XML name, reference is an XML character or predefined entity reference, and analyzername is a Java class name. Note that some of the full text options are only valid in combination with the XhiveStandardAnalyzer; see section 9.5, Full text indexes.
See also: API documentation:

com.xhive.index.interfaces.XhiveIndexIf

9.7 Metadata value indexes
Metadata value indexes are just like normal value indexes, but instead of indexing the content of a node, they index the value of a metadata entry. The biggest difference is that here the index key is the complete value of a metadata entry.
See also: API documentation: com.xhive.index.interfaces.XhiveIndexListIf

9.8 Context conditioned indexes
A context conditioned index stores Node objects by a user-defined key. A Node can be an element, text node, processing instruction, comment or document. Each context conditioned index has a user-defined index node filter that determines which nodes are included in the index and which key is used for each of them. Context conditioned indexes are not automatically updated. In other words, context conditioned indexes are 'non-live', and the application program must update them periodically. Context conditioned indexes should be used for:
- Searching for elements by element value or attribute value when the search cannot use value indexes.
- Complex queries: in certain cases a (well-defined) context conditioned index is faster than its equivalent XQuery/XPath/XPointer query.
The three basic steps involved in creating a context conditioned index are:
1. Get a handle to the index list.
2. Create an XhiveCCIndexIf object.
3. Create an index node filter (using XhiveIndexNodeFilterIf) to define which nodes to include in the index and which to leave out.
An example of using context conditioned indexes can be found in the sample <XhiveDir>\src\samples\manual\CCIndex.java.

9.9 Index performance
This section explains how to speed up queries by using indexes while maintaining reasonable data update performance. The following topics are discussed:
- Index scope
- Index selectiveness
- Index property: sorted keys
- Summary
Section "Using indexes in XQuery" explains how indexes are used internally by XQuery.

Index scope
xdb supports nested library structures.
You are free to select the scope (context) of an index. For example, an index on the root library has a bigger scope than an index on a nested library. The smaller the scope of a live index, the fewer data updates fall within that scope and trigger index updates. Therefore, the smaller the index scope, the better the data update performance. Because each key then maps to fewer nodes, query performance also improves. The scope of library indexes does not matter, because library indexes only apply to the direct children of the library.

Index selectiveness
In general, indexes provide the best query and update performance when the index is as 'selective' as possible. In other words, each key should map to the smallest possible number of nodes.
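One rough way to compare how selective candidate index keys are is to compute the average number of nodes per key on sample data. The plain-Java sketch below does this for an id-based key and for an element-name-based key. SelectivenessSketch, Elem and avgNodesPerKey are hypothetical names, and the sketch does not assume that xdb exposes such a metric.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper for judging index selectiveness on sample data (not part of xdb).
final class SelectivenessSketch {

    // Stand-in for an element: its name and its ID attribute value.
    static final class Elem {
        final String name;
        final String id;

        Elem(String name, String id) {
            this.name = name;
            this.id = id;
        }
    }

    // Average number of nodes per distinct key; lower values mean a more selective index.
    static <N> double avgNodesPerKey(List<N> nodes, Function<N, String> keyOf) {
        Map<String, Long> nodesPerKey = nodes.stream()
                .collect(Collectors.groupingBy(keyOf, Collectors.counting()));
        return nodesPerKey.isEmpty() ? 0.0 : (double) nodes.size() / nodesPerKey.size();
    }

    public static void main(String[] args) {
        String[] names = {"para", "title", "section"};
        List<Elem> elems = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            elems.add(new Elem(names[i % names.length], "e" + i)); // unique ids, 3 names
        }
        System.out.println(avgNodesPerKey(elems, e -> e.id));   // prints 1.0 (id-like key)
        System.out.println(avgNodesPerKey(elems, e -> e.name)); // prints 100.0 (element-name-like key)
    }
}
```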

Because library children have unique names and ids, both library name and library id indexes are optimally selective and therefore show excellent query and update behaviour. Besides library indexes, value indexes and id attribute indexes in particular are well suited to serve as selective indexes. Element name indexes are usually not very selective, because most documents do not have unique element names. To maintain good data update performance, use selected element name indexes instead of the default element name indexes, because selected element name indexes only index a subset of all elements.

Index property: sorted keys
An index with the property XhiveIndexIf.KEY_SORTED keeps its keys sorted. Indexes with sorted keys may need more time to update than indexes without sorted keys. By default, indexes do not keep their keys sorted. Sorted keys are necessary to use the index for inequality queries in XQuery.

Summary
Follow these rules to achieve the best index performance:
- Do not add more indexes than required.
- Reduce the scope of the index as much as possible.
- Make the index as selective as possible.
- Use selected element name indexes instead of default element name indexes.
- Create indexes after loading the data.

9.10 Concurrent indexes
Indexes can be specified as concurrent by using the XhiveIndexIf.CONCURRENT flag when creating the index. Concurrent indexes are not locked for the duration of the transaction when accessed or modified. Only the pages actually used are latched, and only for the time that they are actually read or modified. This improves concurrency at the expense of some extra overhead when using the indexes. Whether the net effect is beneficial depends on your application. Keep the following in mind when using concurrent indexes:
- When you use the XhiveIndexIf.getKeys() method, the index may return keys that have no nodes. If you only use indexes from within XQuery, this is not an issue.
- Uniqueness checking is not implemented for concurrent indexes. The extra locking required to ensure uniqueness would defeat the main purpose of concurrent indexes. (Simply checking that a key is not present is not sufficient, because its removal may not yet have been committed and could still be rolled back.)
- Concurrent indexes do not provide phantom protection. If you read from the index a second time during the same transaction, new keys and nodes that have been committed since the previous lookup may appear. (Keys and nodes that were read can never disappear, because the data read remains read-locked for the duration of the transaction.)
- Concurrent indexes do not have a separate authority value. The XhiveIndexIf.getAuthority() method returns the authority of the owning library child.
See also: samples: <XhiveDir>\src\samples\manual\CCIndex.java

10 Using validation and the Catalog
This chapter gives some background on working with XML Schemas and DTDs in xdb.

Introduction
xdb libraries can store XML documents, as well as sub-libraries and BLOBs. An XML document can have a DTD (document type definition) or an XML Schema attached, which can be used to validate the document. DTDs and XML Schemas are also accessible from the catalog, as ASModel objects. This section does not deal with using ASModel objects; another section covers that. This section explains how to get information into and out of the catalog, and how the catalog is used in combination with documents during parsing and validation.

In the rest of this section, DTDs and XML Schemas are referred to as 'models'.

Catalog content

Catalog location
A catalog is linked to a library. By default, only the root library has a catalog, and all models are stored there. However, you can also place a catalog on a sub-library and split your models over multiple catalogs in this way. Catalogs on sub-libraries are called local catalogs, and they are created by calling addLocalCatalog on a library. Local catalogs override information in the root catalog. When information is looked up in the catalog, the local catalog is searched first. If nothing is found there, the lookup continues in higher-level catalogs. For example, you can put a model with a certain id in the root library and use it in all libraries. If you then add a local catalog to some sub-library and put a model with the same id in it, that model is used for documents in that sub-library and possibly in its descendant libraries.

Identification of XML Schema and DTD models
Each model in a catalog has a unique id. The id depends on the schema type of the model:
- A DTD model is identified by its public id. This id is a string and can be chosen more or less freely. If you do not set a public id on a DTD, xdb automatically generates one for it when required.
- An XML Schema is identified by its file name.

Managing models
DTDs and XML Schemas stored in the catalog are accessible as ASModel objects. That interface is part of the DOM Abstract Schema specification. It provides several interfaces for functionality like access to model information, guided document editing, parsing and serialization. Documents with a DTD can have internal subsets, which means there is a document type declaration along with DTD declarations in the document itself.
When a document is parsed with validation, its internal subset is stored and made available as an ASModel, but the internal subset is not stored in the catalog. It is only available on the document itself. A document with a DTD has at most one external ASModel object. A document with an XML Schema can have multiple ASModel objects, because each schema document is represented by a separate ASModel. You can add models to a catalog in two ways:
- Explicitly, by parsing a model into it. There is a code sample for this in the Abstract Schema section.
- Implicitly: when you parse a document with validation enabled, the model linked to the document is automatically stored in the catalog.
On a catalog, you can set a default DTD (through its ASModel). This default is only used during parsing. It is not possible to set a default XML Schema.
See also: API documentation: com.xhive.dom.interfaces.XhiveLibraryIf, com.xhive.dom.interfaces.XhiveCatalogIf, org.w3c.dom.as.ASModel

10.3 Linking models to documents
When schema information of a document is accessed (e.g. by the validator), xdb automatically tries to find the correct model for it in the catalog of the library where the document is (to be) stored. This section explains how models can be linked to documents in xdb.

DTDs
If a document in xdb has a document type declaration, it may have a DTD in the catalog linked to it. An example of a document type declaration is:
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "svg10.dtd">
Here the document type declaration has both a public id (-//W3C//DTD SVG 1.0//EN) and a system id (svg10.dtd). When you get the active ASModel of a document, xdb looks up the ASModel by its id in all catalogs up the hierarchy, until the root library is reached. You can set the active ASModel using Abstract Schema. This effectively introduces a document type declaration in the document if it does not exist yet, and sets its id to the id of the ASModel. This is one way to change which model is linked to a document. You can also replace the model in the catalog and assign the id to the new model. The new model then becomes the model for all documents that refer to that id.

XML Schema
The link from a document to its XML Schema can be specified in two ways:
- By the schema location attribute on the document element. An example is: <personnel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='personal.xsd'>
- By the schema-location parameter in the document configuration, as shown in the Revalidation sample.
A document with an XML Schema may have more than one active model attached to it. For documents with an XML Schema, you can change the linked models by setting the schema-location parameter in the document configuration settings. You can also use setActiveASModel and addAS from Abstract Schema for this; these functions also modify the schema-location parameter value. After validation (or validated parsing), the concatenation of the ids of all ASModels used is stored on the document. This string is used to identify the ASModels for validation or access to the PSVI interfaces. The document configuration parameter "xhive-schema-ids" shows the value of this string.

Validated parsing
If you parse with validation and the models used are not found in the catalog, the models are stored automatically in the catalog. If you do not parse with validation, no models are stored in the catalog. There are also options to store the model (and/or internal subset) without validation, or to validate without storing the model.
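The schema location attributes described above are ordinary attributes in the XML Schema instance namespace. The following sketch reads them from a document element with the JDK's namespace-aware DOM parser, not with xdb. It only extracts the locations. It does not model how xdb maps them to ASModels in the catalog. SchemaLocationSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Sketch: read xsi schema location hints with the JDK DOM parser (not the xdb API).
final class SchemaLocationSketch {

    static List<String> schemaLocations(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // xsi attributes are matched by namespace URI
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse XML", ex);
        }
        String xsi = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
        List<String> locations = new ArrayList<>();

        // xsi:noNamespaceSchemaLocation holds a single location ("" if the attribute is absent).
        String noNamespace = root.getAttributeNS(xsi, "noNamespaceSchemaLocation");
        if (!noNamespace.isEmpty()) {
            locations.add(noNamespace);
        }
        // xsi:schemaLocation holds whitespace-separated (namespace URI, location) pairs.
        String[] tokens = root.getAttributeNS(xsi, "schemaLocation").trim().split("\\s+");
        for (int i = 1; i < tokens.length; i += 2) {
            locations.add(tokens[i]);
        }
        return locations;
    }

    public static void main(String[] args) {
        String personnel = "<personnel xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:noNamespaceSchemaLocation='personal.xsd'/>";
        System.out.println(schemaLocations(personnel)); // prints [personal.xsd]
    }
}
```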
Model resolution is handled differently for DTD and XML Schema models, so the two are described separately.

DTDs
When parsing with validation, the following steps are executed when a document type declaration is encountered:
1. If the document type declaration has a public id, this id is used to identify the ASModel to validate against.
2. If no public id was specified, or if no DTD with that public id can be found, xdb checks whether a default DTD is specified. If it is, that DTD is used for validation.
3. If no DTD was found through the two methods above, the DTD specified by the system id is read from the file system and used for validation. In addition, the DTD is stored in the first local catalog encountered (when going up to the root library). If a public id was specified, that id is used; otherwise xdb generates one.
The third step has an important consequence for documents with a DTD. If the document type declarations of your documents do not have a public id, a DTD is stored for each document that you parse with validation, even if all documents point to a DTD with the same system id. This happens because xdb cannot be sure that the system id really refers to the DTD stored in the catalog. To prevent this without modifying your input files, load the DTD beforehand and set the resulting ASModel as the default model. xdb then uses it when no DTD can be found through public id resolution.

XML Schema
When parsing with validation, xdb identifies the XML Schema models as follows:
1. If the document is parsed using the LSParser interface and a schema-location is specified in the LSParser configuration settings, the ASModel defined by this setting is used for validation. This schema-location is also set in the configuration settings of the document.
2. If one of the XML Schema instance attributes "noNamespaceSchemaLocation" or "schemaLocation" is set, the corresponding ASModel is used for validation.
A model defined by the configuration parameter schema-location overrides a model defined by the schema location attributes of the document if both models have the same target namespace.
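The DTD resolution steps above can be modelled together with the catalog lookup order described under Catalog location. This is a simplified sketch, not xdb's implementation. Catalog, Library, resolveDtd and the "generated-" id format are hypothetical. DTDs are plain strings, and the file system is a function from system id to DTD text. The sketch also assumes, although this guide does not say so, that the default DTD is searched for from the nearest catalog upward. The demo shows the consequence of step 3: without a public id, each validated parse stores another copy of the DTD, while a default DTD prevents this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified, hypothetical model of catalog lookup and DTD resolution during
// validated parsing. None of these classes or methods are part of the xdb API.
final class DtdResolutionSketch {

    static final class Catalog {
        final Map<String, String> dtdsById = new HashMap<>(); // model id -> DTD text
        String defaultDtd;                                     // null if no default is set
    }

    static final class Library {
        final Library parent;  // null for the root library
        final Catalog catalog; // null if the library has no catalog of its own

        Library(Library parent, boolean hasCatalog) {
            this.parent = parent;
            this.catalog = hasCatalog ? new Catalog() : null;
        }

        // Local catalog first, then higher-level catalogs up to the root library.
        String findById(String id) {
            for (Library lib = this; lib != null; lib = lib.parent) {
                if (lib.catalog != null && lib.catalog.dtdsById.containsKey(id)) {
                    return lib.catalog.dtdsById.get(id);
                }
            }
            return null;
        }

        // Assumption: the default DTD is searched for in the same bottom-up order.
        String findDefaultDtd() {
            for (Library lib = this; lib != null; lib = lib.parent) {
                if (lib.catalog != null && lib.catalog.defaultDtd != null) {
                    return lib.catalog.defaultDtd;
                }
            }
            return null;
        }

        // First catalog encountered when going up to the root (the root must have one).
        Catalog nearestCatalog() {
            Library lib = this;
            while (lib.catalog == null) {
                lib = lib.parent;
            }
            return lib.catalog;
        }
    }

    private static int generatedIds = 0; // counter for hypothetical generated ids

    static String resolveDtd(Library lib, String publicId, String systemId,
                             Function<String, String> readFromFileSystem) {
        if (publicId != null) {                        // step 1: look up by public id
            String dtd = lib.findById(publicId);
            if (dtd != null) {
                return dtd;
            }
        }
        String dtd = lib.findDefaultDtd();             // step 2: default DTD
        if (dtd != null) {
            return dtd;
        }
        dtd = readFromFileSystem.apply(systemId);      // step 3: read via the system id
        String id = publicId != null ? publicId : "generated-" + (++generatedIds);
        lib.nearestCatalog().dtdsById.put(id, dtd);    // and store it in the nearest catalog
        return dtd;
    }

    public static void main(String[] args) {
        Library root = new Library(null, true);
        Library docs = new Library(root, false); // sub-library without a local catalog
        Function<String, String> fileSystem = systemId -> "<!-- DTD read from " + systemId + " -->";

        for (int i = 0; i < 3; i++) {
            resolveDtd(docs, null, "svg10.dtd", fileSystem); // no public id in the document
        }
        System.out.println(root.catalog.dtdsById.size()); // prints 3: one copy per document

        root.catalog.defaultDtd = fileSystem.apply("svg10.dtd"); // load once, set as default
        resolveDtd(docs, null, "svg10.dtd", fileSystem);
        System.out.println(root.catalog.dtdsById.size()); // still prints 3
    }
}
```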

Validation
A document stored in xdb can be (re)validated against its model. Documents with an XML Schema are validated differently from documents with a DTD.

DTDs
The validator uses the ASModel defined by the PUBLIC id of the document. If the corresponding model is not found in the catalog, validation fails. A document with a DTD can be validated using the AS interfaces.

XML Schema
The configuration settings of the document can influence the XML Schema validation process. Validation of the document is handled as shown in Revalidate documents with XML schema. The validator tries to find the ASModel(s) of the document in the order shown below; if a model is found, it is used for validation.
1. The "schema-location" parameter in the configuration settings of the document.
2. The XML Schema instance attributes "noNamespaceSchemaLocation" or "schemaLocation" specified in the document.
A model defined by the configuration parameter schema-location overrides a model defined by the schema location attributes of the document if both models have the same target namespace.

PSVI
xdb supports XML Schema data types, which makes it possible to use data types in XQuery queries. Furthermore, xdb supports the Xerces XML Schema API, which provides access to the Post Schema Validation Infoset (PSVI) and to XML Schema model information. This API has not yet been submitted to the W3C, so its interfaces may change in the future. The XML Schema API can be used to traverse XML Schema components like type definitions, element declarations and schema constraints. In addition to the schema information, the PSVI information of each individual node can be accessed, as shown in Access PSVI information. This includes validity, validation context, normalized value, type definition and member type definition. Nodes that store this state information need more disk space.
Users can set the configuration option "xhive-psvi" to enable storage of PSVI information. If this option is not set, queries will not support data types and the XML Schema API is only partially accessible; for example, no validity information of a node is available. When accessing schema information of a document by using the XML Schema API, the corresponding ASModels are identified by the value of the "xhive-schema-ids" configuration setting. This value lists the schema ids corresponding to the information stored by the schema-location configuration parameter and the schema location attributes. If possible, the xhive-schema-ids value excludes the schema locations of the attributes that are overruled by the schema-location value.

11 Performance

This chapter gives some tips on getting the best performance out of xdb. For XQuery query performance, see also the chapter on XQuery.

Internal server

Usually the easiest and most effective way of speeding up your xdb application is to run the xdb server in the same JVM as your application. Instead of client/server communication over TCP/IP, all requests are done as simple method calls, which are much faster. Additionally, the system will not use a separate client and server cache, which allows you to use more RAM for the single cache.

Because only one server can be running for a specific federation, using an internal server is only possible if your application architecture allows it. If you have many users running a client application that uses xdb directly, you cannot use a server in the client. However, in many application architectures all database accesses are done from a single application server. In these cases, using an internal server will not only speed up database calls, it will also simplify application deployment. An internal server can also function as a server for remote clients.
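If you run an internal xdb server within your application server, you can still connect to it with the administrator client, a data loading utility, the xdb backup command, etc. To set this up, your main application has to contain code like this:

    import java.net.ServerSocket;
    import com.xhive.XhiveDriverFactory;
    import com.xhive.core.interfaces.XhiveDriverIf;

    // Obtain the driver and initialize it with the number of cache pages
    XhiveDriverIf driver = XhiveDriverFactory.getDriver();
    driver.init(cachePages);
    // Listen on a TCP port so other applications can connect (xhive://host:port)
    int port = ...;
    ServerSocket socket = new ServerSocket(port);
    driver.startListenerThread(socket);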

The main application has to use the filename of the bootstrap file as the xhive.bootstrap property or as an argument to XhiveDriverFactory.getDriver(). This will make it run the xdb server internally. All other applications use a URL of the form xhive://host:port to connect to the server that runs in your application.

If you want to use the same internal xdb server from different applications in the same application server, you should make sure that both applications use the same Java class loader to load the xdb classes. Otherwise, the xdb code loaded by each class loader will attempt to start its own xdb server and all but the first one will fail.

JVM settings and cache size

There are several JVM settings that impact the performance of Java applications, including xdb. The easiest to use and most important one is the -server flag. Some JRE versions come with a client and a server compiler. Using the server compiler can give a large performance improvement to CPU-bound processes, at the cost of a longer startup time that is usually irrelevant for server applications.

In our experience, upgrading to a new major version of the Sun JDK gives a performance improvement of some 10% to CPU-bound applications, due to new optimizations in the new version. The new versions are generally more stable as well, so we recommend always using the latest release.

Another setting that can be very important, depending on your application, is the amount of memory available to the JVM and to the xdb cache. The defaults chosen by the installer were meant not to have too much impact on a typical developer desktop and are generally insufficient for real server applications. It is impossible to recommend specific numbers, as the optimal settings will depend on your application and data, the hardware and the other tasks that the hardware has to perform.
The rule of thumb is: the more, the better.

Database page size

When creating a federation, you need to specify a page size for the database pages. This should be a power of 2 from 512 to …. In most cases, the best page size to use is the same page size that the filesystem uses (see below).

Each document and blob will use an integral number of pages. If you expect to have many small documents, it may be useful to choose a database page size that is smaller than the filesystem page size. This saves disk space. The disadvantage is that when xdb writes a database page, the operating system has to retrieve the old filesystem page, copy the database page into the filesystem page and write back the whole filesystem page. When a database page equals a filesystem page, retrieving the old filesystem page is not necessary.

Choosing a database page size that is bigger than the filesystem page size is not advisable, because then file writes are no longer atomic. In the event of an operating system crash, a database page may be written to the disk only in part, which can cause that page, and thus perhaps the entire database, to become inaccessible.

On Solaris, the filesystem block size is usually 8192 bytes; on MS Windows (with a default NTFS filesystem) and on Linux it is usually … bytes. These defaults can be overridden, though. To find out the page size of a filesystem on which you are going to deploy database files, you can:

On Windows 2000, run chkdsk c: to find out the size of an allocation unit (this command will also check the file system, so it may take some time to run).
On Windows XP, run fsutil fsinfo ntfsinfo c: (one command) to find out the number of bytes per cluster.
On Linux with the ext2/ext3 filesystem, run tune2fs -l /dev/device to find out with what block size the filesystem was created. With xfs, use xfs_info /mountpoint. You need to be root to run these commands. You can run mount to find the mapping between devices and logical mount points.
On Solaris or HP-UX, run mkfs -m /dev/device to find out with what block size (bsize) the filesystem was created. You need to be root to run this command. You can run mount to find the mapping between devices and logical mount points. This command may also work on other Unix variants, but use it with care, as mkfs is also used to create new filesystems.

Multiple disks

If you have multiple physical disks available, you should put the log directory on a separate disk. Writes to the federation log are usually more performance-critical than writes to the database files, because a committing transaction has to wait until its log records have been flushed to the disk before it can continue. Unless the cache has too many dirty pages in it, writes to database files happen asynchronously from a background thread.

If you have multiple disks for the actual data, the easiest and most effective way of spreading the disk access load is to use hardware or software to put the disks in a RAID 0 or RAID 1+0 configuration.

Linux filesystems

On Linux, there are a number of filesystems you can choose for formatting your disks. The xfs filesystem is known for good throughput on very large files, such as the ones xdb uses. Our tests confirm that for typical xdb usage the xfs filesystem gives the best performance. Xfs users may want to read the XFS FAQ.

Disk write caches

Most hard disk drives use an internal write cache for buffering writes. When the operating system writes a block to the disk, the disk confirms the write as soon as the data is in the drive cache. On a power failure or similar condition, the data in the cache may not be written to the drive platters. If your operating system properly requests the disk to flush its cache when an application calls fdatasync() (Unix), FlushFileBuffers() (Win32) or similar, this is not a problem. Otherwise, you should consider disabling the write cache on the relevant disk drives to avoid data corruption on a power failure. This can have a significant negative effect on performance, though. Of course, if the drive contains a backup battery to guarantee that confirmed writes are always written to the physical disk even on a power failure, this is not an issue. MS Windows users should read this paper.

12 Internal structure of xdb

This chapter describes the internal architecture of xdb.

Note: The information in this chapter is only applicable to the latest version of xdb. Similar information about databases created with older versions of xdb can be found in their respective manuals.

Segments and files

Segments

A database consists of a number of segments. A segment is a logical storage location within an xdb database. Each database always has at least one segment, the default segment. The default segment can never be deleted without deleting the database. Segments can be added and deleted (when empty) using the xdb API.

You can specify in which segment data is stored. By default, all data is stored in the default segment. Data will never be automatically stored in another segment, even if the current segment is full. If you want automatic overflow, use a single segment with multiple files. You can specify the segment for temporary data using the XhiveDatabaseIf.setTemporaryDataSegment() method.
You can specify the segment in which all children (documents, sublibraries, blobs) of a specific library are stored using the XhiveLibraryIf.setChildSegmentId() method.

Files

A segment can be spread physically over multiple files. A segment always has at least one file, the default file. The default file can never be deleted without deleting its owner segment. Files can be added using the adminclient or using the xdb API.

If no xdb code is running, it is safe to move database files to a new location and manually edit the paths in the bootstrap file. You should never edit the bootstrap file while an xdb server is running for that federation. You should also never change the file ids in the bootstrap file. If you have moved database files, you should not create an incremental backup without creating a full backup first. While creating such an incremental backup will seem to succeed, you will lose data when the backup is restored.

A file consists of a number of pages. The page size is set when creating the federation. You can limit the maximum size of a file. If the file has reached this maximum size, further allocation in the segment will happen in the next file of the segment. If all the files in the segment have reached their maximum size, any further allocation in the segment will fail. There is no way to control or check in which file of a segment data is stored; different pages of a single document can be stored in different files. If you need control over the location of data, use different segments.

Note: If a file has a maximum size limit of 0 (meaning unlimited), the next file of the segment will never be used, not even if the disk containing the first file is full. Unfortunately, the only reliable way of detecting a full disk is actually writing the (still empty) page to the file while allocating the page. We consider this too expensive.

Note: Sometimes xdb has to allocate pages for internal page allocation administration.
At those times, the maximum file size limit is not heeded, because the allocation administration is required and there is no way to continue and keep the data consistent without allocating this space. Therefore, a file can become slightly larger than the limit set by the administrator.

Setting up database configurations

Configurations can be set up in three ways:

Using the admin client. You can use the admin client to create or delete segments and/or files, and change default cluster rules.
Using the API. You can also use the API to create or delete segments and/or files, and change default cluster rules.
Using a configuration file when creating a database. The configuration file can be used to specify an initial database configuration. The configuration can be specified when creating a database using the adminclient, the command line tool or the API.

The default configuration (the one used when using the admin client and specifying the default configuration, or when using the command line tool or API without supplying a custom configuration file) has the following parameters:

The database has a single segment with a single file.
All data is clustered in the default segment.

The configuration of existing databases can also be obtained (as a DOM Document) through an API call (XhiveDatabaseIf.getConfigurationFile()). As indicated before, these settings of the database can all be changed later with the API or the Administrator client.

Log files

To ensure recoverability of data, xdb uses the well-known database technique of write-ahead logging (WAL). Data is first committed to the database log before being written to the actual database files. Log files are written to the log directory specified when creating the federation and are sequentially numbered. If the keep-log-files option of the federation is unset, log files that are no longer needed for recovery are automatically deleted. If the option is set, log files are only deleted after creating a backup. See also the section on incremental backups.
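To make the write-ahead ordering concrete, here is a minimal Java sketch that uses only java.nio. It is not xdb's log format, log file naming or recovery code; the class WalSketch, its file names and its one-record-per-line format are hypothetical. It shows the two properties described for xdb: a commit returns only after its log record has been forced to disk, while the data file is written later, as a background writer would do.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Toy write-ahead log (hypothetical); NOT xdb's implementation.
class WalSketch implements AutoCloseable {
    private final FileChannel log;
    private final Path dataFile;
    // Records committed to the log but not yet in the data file,
    // standing in for dirty pages held in the cache.
    private final List<String> dirty = new ArrayList<>();

    WalSketch(Path logFile, Path dataFile) throws IOException {
        this.log = FileChannel.open(logFile, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.dataFile = dataFile;
    }

    /** Returns only after the log record has been forced to disk. */
    void commit(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            log.write(buf);
        }
        // force(false) flushes file content, not necessarily metadata,
        // which is comparable to fdatasync() on Unix.
        log.force(false);
        dirty.add(record); // the data file itself is not touched yet
    }

    /** What a background writer would do later, outside the commit path. */
    void flushData() throws IOException {
        Files.write(dataFile, dirty, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        dirty.clear();
    }

    @Override
    public void close() throws IOException {
        log.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        Path logFile = dir.resolve("log-000001");
        Path dataFile = dir.resolve("data");
        try (WalSketch wal = new WalSketch(logFile, dataFile)) {
            wal.commit("insert doc 1");
            // Committed: present in the log, not yet in the data file.
            System.out.println(Files.readAllLines(logFile, StandardCharsets.UTF_8));
            System.out.println(Files.exists(dataFile));
            wal.flushData();
            System.out.println(Files.readAllLines(dataFile, StandardCharsets.UTF_8));
        }
    }
}
```

After a crash between commit and flushData, the log still holds the record, which is what makes recovery possible. It is also why only log files that are no longer needed for recovery can be deleted.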
If (and only if) no xdb server is running for the federation, you can safely move the log files manually and edit the bootstrap file accordingly.

Detachable Libraries

A library is detachable if:

The library and its descendants, including meta-data and indexes, are all stored on isolated segments (not shared with any other library), and
Its ancestors do not have indexes referencing the library.

A detachable library may have its own external FT indexes. To maintain its detachability, a detachable library's external FT index should not be merged into its parent or ancestor levels. However, external FT indexes are allowed at the descendant library levels of the detachable library.

A detachable library is read-only if no changes to the library, i.e., adding/removing objects or indexes to/from the library, are allowed. An attempt to update a read-only library causes an exception. A detachable library and its segments can be in the following three mutually exclusive states:

read-write: Both reading from and writing to the library are allowed.
read-only: Only reading from the library is allowed.
detached: The library is logically (or physically) removed from the database and thus not accessible from the database.

When a detachable library is created, its default state is read-write. All the segments of the library must have the same state at any time. In addition, a detachable library can be marked as non-searchable. A non-searchable detachable library is not visible to search queries.

Database configuration files

Database configuration files can be used to describe initial database configurations. They are only used when creating a database; modifications of a file do not influence existing databases. The configuration file is in XML format. This section describes the possible elements and attributes.
For example, this is the configuration file for a default database:

    <xhive-clustering>
      <segment id="default"/>
    </xhive-clustering>

xhive-clustering element

Document element for the configuration file.

Possible child elements: segment.

segment element

Represents a segment. For each occurrence of this element a segment is created.

Attributes (name, use, default, description):

id (required, no default): Id for the segment. Should be unique within the configuration file.
path (optional, no default): Path to the location where the default file should be placed. If not supplied, the default file is placed in the same directory as that of the federated database.
max-size (optional, default 0): Maximum number of bytes that are allowed in the default file. 0 means unlimited.
temporary (optional, default false): Whether this segment is a temporary data segment.

Possible child elements: file.

file element

Represents a file. For each occurrence of this element a file is created.

Attributes (name, use, default, description):

path (optional, no default): Path to the location where the file should be placed. If not supplied, the file is placed in the same directory as that of the federated database.
maxsize (optional, default 0): Maximum number of bytes that are allowed to be created in the file. 0 means unlimited.

13 Administering xdb

This chapter describes administration of xdb. There are four ways to perform administrative tasks within xdb:

The admin client allows administrating xdb through a graphical user interface.
The command line provides a text-based interface to administrate and use xdb through a terminal or from scripting languages.
The Ant tasks provide integration with the common Java build environment and also allow scripted execution of administrative operations.

The Admin Client

The adminclient provides access for administrators and super users to key database functions. Using the adminclient, you can rapidly perform most tasks with xdb databases in a user-friendly way. The adminclient is based on the xdb API, so everything that can be done with the adminclient can also be done in a Java program with the API. In fact, the sources of the administrator client are included, so you can inspect how the adminclient uses the xdb API to perform certain tasks.

This chapter contains a short tutorial on how to accomplish certain tasks in the adminclient, to give an idea of its functionality:

Starting the adminclient
Importing data
Editing and exporting data
Adding indexes
Querying with XQuery

Starting the adminclient

To start the xdb adminclient:

1. Execute the command xdb admin in the XhiveDir\bin directory of the installation (or, on Microsoft Windows platforms, choose that program in the start menu). This starts the xdb adminclient.
2. An alternative way to run the administrator client is to execute the command xhive-ant run-admin in the XhiveDir\bin directory of the installation. This will compile and run the administrator client from the included sources.

Note: in xdb version 7.0 and higher, this requires the JAVA_HOME environment variable used by xhive-ant to point to a JDK.
3. Select "Database->Connect". The Connect to a database window is displayed.
4. Enter the database name, user name, and password for a valid user.

The admin client is a database explorer that shows a tree view of the database information in its left part. The right part of the database explorer shows details of the node selected in the tree view. Most operations are accessible by right-clicking on nodes in the tree view.

Figure 17. Main administrator view

Note: The status bar in the bottom right corner is a memory usage indicator; by clicking the trashcan in the right corner you can force a garbage collection in the adminclient.

Note: A number of preferences are automatically stored (like the last query executed and the last database connected to). These preferences are stored in a file named .xhive.admin.properties in your home directory (on MS Windows 2000, this is the directory c:\Documents and Settings\username by default). You may remove this file if you want to reset the settings of the adminclient program.

There are four main nodes in the tree:

The Database info node, for the database information such as segments, container pools and cluster rules.
The Groups and Users node, for viewing, adding and removing users and groups that act as the basis of the authority model.
The root-library node. The root-library is the top-level library that contains all other libraries and documents.

Below the root-library, the hierarchy of libraries with their contents is presented. Besides the regular children of the library, there can be special types of 'folders' presented as children of the library. Folders with the letter 'C' in them represent a (local) catalog of the library, which can hold schema items such as DTDs and XML Schemas.
Folders with the letter 'V' in them represent version spaces, representing the versioning information of a versioned document in that library. Documents in a library that are versioned have a small 'v' letter in the bottom right corner of their pictogram. In the example below, the 'V' folder 'briefing.xml' holds the version information of the versioned document 'briefing.xml'.

Figure 18. Library contents example

You can right-click on items in the tree to execute actions on them. For categories of information (e.g. users or libraries) such an action could be to add a new item of that category. For specific items in the tree (e.g. a specific user) you will find actions to change their properties in the right-click menu.

There are a couple of actions that are not accessible through the right-click menus of the various nodes in the database tree view, but are in the regular menu of the adminclient window:

Creating and deleting databases (in the create database dialog, a default configuration will work for most applications).
Changing the superuser password (in the settings menu).
Entering a new license key (in the settings menu).

The remaining adminclient sections below deal with some common tasks.

Importing data

One of the principal functions is to import XML documents into the database. Importing is done through the 'Import' menu item in the right-click menu of library nodes (the selected library also becomes the target of the imported data). The basics of importing are very simple. Through the add button you can add (multiple) files and directories, and by clicking 'Okay' that data will be imported into the database. By default, the subdirectory structure of an imported directory will be recreated in the database.

Figure 19. Main import dialog

However, there are a lot of different options in the import dialog that have a serious influence on the way data is imported. The options can be divided into three groups:

Below the list of items to be imported, there are some options that influence how the importer treats the documents that are loaded, mostly concerning where documents are (re-)placed and how libraries are created. For instance, with the 'Flatten library structure' option, all data imported from a directory will be placed directly in the target library.

In the screenshot above, you can see some error messages regarding filters. If these messages were not resolved, the file sql_output.txt would simply be skipped. The reason is that the adminclient importer uses filters to determine how each file is stored, and by default only filters for files with the extensions xml and xsl are defined. The rationale for this is that:

Not every file encountered while processing, for instance, a directory being imported needs to be an XML document.
A file need not be stored in xdb as an XML document; specifically, you can store non-XML data as BLOBs, and schema-related files (DTDs and XML Schemas) as schema models in the catalog.

Therefore, filename-matching filters can be defined in the filters tab of the import dialog. When an import is run, it will try to match filenames against the wildcard patterns of the defined filters, in the order in which they are specified. If no matching pattern is found, the file is ignored. Filters are stored in the preferences of the adminclient, so you only need to define them once.

There is also a 'Parser configuration' tab, which holds the parser options used. There you can specify whether to use validation, what information items of the original file to preserve in the parsed Document, and other properties.
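The filter lookup described for the import dialog is a first-match search over an ordered list of wildcard patterns. The following Java sketch mimics that behavior with java.nio.file.PathMatcher glob patterns. It is not the adminclient's implementation. The class ImportFilterSketch and its StoreAs categories (XML document, BLOB, catalog schema) are hypothetical stand-ins for the storage choices the text mentions.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of ordered import filters; not the adminclient's code.
class ImportFilterSketch {

    /** How a matched file would be stored (illustrative categories). */
    enum StoreAs { XML_DOCUMENT, BLOB, CATALOG_SCHEMA }

    private static final class Filter {
        final PathMatcher matcher;
        final StoreAs storeAs;

        Filter(String wildcard, StoreAs storeAs) {
            // "glob:" syntax supports * and ? wildcards.
            this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + wildcard);
            this.storeAs = storeAs;
        }
    }

    private final List<Filter> filters = new ArrayList<>();

    /** Appends a filter; filters are tried in the order they were added. */
    ImportFilterSketch add(String wildcard, StoreAs storeAs) {
        filters.add(new Filter(wildcard, storeAs));
        return this;
    }

    /** Storage type of the first matching filter, or empty if the file is ignored. */
    Optional<StoreAs> classify(String file) {
        Path name = Paths.get(file).getFileName(); // match on the file name only
        for (Filter f : filters) {
            if (f.matcher.matches(name)) {
                return Optional.of(f.storeAs);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Defaults mentioned in the text: only *.xml and *.xsl are defined.
        ImportFilterSketch filters = new ImportFilterSketch()
                .add("*.xml", StoreAs.XML_DOCUMENT)
                .add("*.xsl", StoreAs.XML_DOCUMENT);
        System.out.println(filters.classify("data/briefing.xml")); // Optional[XML_DOCUMENT]
        System.out.println(filters.classify("sql_output.txt"));    // Optional.empty

        // Extra filters appended after the defaults; "*" acts as a catch-all.
        filters.add("*.xsd", StoreAs.CATALOG_SCHEMA)
               .add("*.dtd", StoreAs.CATALOG_SCHEMA)
               .add("*", StoreAs.BLOB);
        System.out.println(filters.classify("sql_output.txt"));    // Optional[BLOB]
    }
}
```

Because the first matching filter wins, a catch-all pattern such as * belongs at the end of the list. If it were placed first, it would route every file, including XML documents, to the same target.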
The options in the 'Parser configuration' tab are the properties that can also be set on the DOMConfiguration object of the LSParser (see the javadoc links in the DOMConfiguration section for more information on the semantics of the options). Unlike the filters, the configuration options are not stored in the preferences (some sensible defaults are chosen).

Exporting and editing documents