DataDirect XQuery Technical Overview

Table of Contents
1. Feature Overview
2. Relational Database Support
3. Performance and Scalability for Relational Data
4. XML Input and Output
5. Calling Java Methods or SQL Functions from XQuery
6. Custom URI Resolvers
7. XML Converters
8. Making SOAP Requests from DataDirect XQuery
9. IDE Support
9.1. Stylus Studio
9.2. <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition)
10. Comparison to Other XQuery Implementations

As a native XML query language, XQuery was designed to simplify data integration, native XML processing, and XML reporting. DataDirect XQuery is an implementation of XQuery that can query XML, relational data, SOAP messages, EDI, or a combination of data sources. It provides fast, reliable, and scalable XQuery support for all major relational databases and runs on any Java platform.

DataDirect XQuery implements the XQuery API for Java (XQJ) and is easily embeddable into any Java program: it installs just like a JDBC driver, does not require any other product or application server, and has no server of its own. It can also be used with any J2EE application server.

DataDirect XQuery is highly optimized for relational data, can handle very large XML inputs and outputs, and can be used with DataDirect's XML Converters to query non-XML formats such as EDI. DataDirect XQuery is recommended for applications dealing with XML, relational, and legacy formats, including data integration, XML-based data exchange, web services, XML-driven web sites, XML pipelines, and XML publishing.
DataDirect XQuery Architecture

Java applications issue queries to the DataDirect XQuery engine through the XQJ API and receive the results of each query as XML. The engine analyzes the query to determine which data sources it uses, divides the query up, and sends each part to the appropriate adaptor:

- If a relational source is queried, the engine sends the query to the SQL adaptor, which translates it into highly optimized SQL and uses that SQL to query the database. The SQL adaptor receives the results, maps them into XML, and returns them as the application retrieves them.
- If an XML source is queried, the engine sends the query to the Streaming XML adaptor, which executes the query and returns XML results.
- If a flat or EDI file is queried, the engine sends the query to the Streaming XML adaptor, which relies on the XML Converters to create an XML representation of the file.

If the XML results come from more than one source, the engine combines them.

1. Feature Overview

Here are some of the features of DataDirect XQuery 2.0:

- Delivers the productivity of XQuery for data integration, native XML programming, and XML reporting in a Java environment.
- Implements the November 2005 XQuery Candidate Recommendation.
- Implements the XQuery API for Java (XQJ), Early Draft Review 1 (EDR 1).
- Supports all leading relational databases, with excellent performance and scalability.
- Generates very efficient SQL for querying relational data.
- Can query XML documents or DOM trees. The built-in URI resolver supports the ftp:, http:, and file: schemes, and you can write your own URI resolver to support any XML source.
- Queries very large XML files efficiently. Input XML documents can be multiple gigabytes and can be larger than available memory.
- Supports very large query results through lazy instantiation when streaming output via text streams, SAX, or StAX.
- Supports XML Converters, which allow non-XML formats to be queried as XML. Converters available from DataDirect support many formats, including thousands of versions and sub-versions of EDI messages (X12 and EDIFACT), along with other formats such as tab-delimited files, comma-separated values, dBASE files, and many more. You can also design your own converters using Stylus Studio and use them in your DataDirect XQuery programs.
- Supports calls to SQL functions or Java methods as external functions in an XQuery.
- Pure Java implementation runs on any Java platform.
- Does not require any kind of server, but supports any J2EE application server, including JBoss, Tomcat, WebLogic, Sun Java System Application Server, and WebSphere.
- Supported by Stylus Studio, an Integrated Development Environment for many XML technologies, which makes it easy to visualize data sources for data integration, create queries using DataDirect XQuery, view query results, and debug queries with an integrated debugger.
- Supported by the <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition), an Eclipse plugin that lets you visualize database connections, write XQueries, and view results.

2. Relational Database Support

DataDirect XQuery supports the leading relational databases, so the applications you write can work with whichever of these databases your customers use. DataDirect XQuery 2.0 supports the following databases:

- SQL Server 2000, 2005
- Oracle 9i, 10g R1, 10g R2
- DB2 for Windows/UNIX/Linux: v8.x, v9
- DB2 for iSeries: V5R2, V5R3
- DB2 for z/OS: v8
- Sybase: 12.5.x, 15

3. Performance and Scalability for Relational Data

DataDirect XQuery's SQL adaptors were designed with a strong emphasis on performance and scalability. As a result, applications written with DataDirect XQuery usually perform better than applications written using JDBC and an XML API. The best way to confirm this is to test DataDirect XQuery in your own environment, but here are some of the reasons for its performance.
We know how to measure performance. Our performance test suites are extensive and represent a major development effort. They are run regularly as part of our standard release cycle, and when our support staff identifies interesting customer performance scenarios, we add them to the suites.

We use highly selective SQL queries to retrieve no more data than a query needs, carefully analyzing each XQuery for all conditions that can be used to limit the data required. We retrieve only the rows needed in the query results, and only the needed columns within those rows. Some XQuery implementations retrieve much more data and filter out the unneeded data in the XQuery engine, which significantly hurts performance.

Operations that can be done in SQL are pushed into the database. This is particularly important for joins, order by clauses, and SQL functions.

Each database has its own SQL adaptor, which takes an XQuery and generates SQL specifically for that database. The SQL we generate is optimized for each database based on the results of our performance testing. Some XQuery implementations support only one database; others generate the same SQL regardless of the database involved, a strategy that cannot offer optimal performance.

We use lazy evaluation so that streaming APIs such as SAX, StAX, or output streams can retrieve data as soon as it is available. As data is needed, we retrieve it incrementally from JDBC result sets.

Because XML is hierarchical, we use SQL generation algorithms that are optimized for building XML hierarchies, and we make extensive use of merge-joins when translating XQuery to SQL. The added cost of XML construction, which can be considerable in some implementations, is negligible in ours.

Users can control several aspects of SQL generation through pragmas in the XQuery, which can significantly improve performance in some cases.

The underlying database drivers can significantly affect performance.
As the leading vendor of JDBC, ODBC, and ADO.NET drivers, DataDirect knows how to make drivers perform. And because we control both the XQuery implementation and the underlying drivers, we can add optimizations to the drivers to support our implementation.

4. XML Input and Output

DataDirect XQuery can query any of the following XML sources:

- XML files
- XML documents made available via user-written URI resolvers
- XML stored in columns of relational databases
- Java DOM trees

DataDirect XQuery is optimized for querying large XML files, which can be many gigabytes. We use two techniques, Document Projection and Streaming, to dramatically reduce the memory required to query such documents. With Document Projection, we analyze the query before parsing the XML input and build only the parts of the document that the query needs. If you enable streaming, we parse the document incrementally, discarding parts that are no longer needed for query processing. With these techniques, documents significantly larger than available RAM can be queried, and for many queries, memory consumption is constant and independent of the size of the input document.

DataDirect XQuery supports the following output formats:
- SAX
- StAX
- DOM
- java.io.OutputStream
- java.io.Writer

Because we use lazy evaluation and stream results, very large query results are handled efficiently.

5. Calling Java Methods or SQL Functions from XQuery

DataDirect XQuery lets you call Java methods or SQL functions from within a query. Java methods can be used to return system information, to invoke a web service, or simply to make available a function that is not in the standard XQuery function library. SQL functions can be used to invoke a stored procedure or to make available a function that is not in the XQuery function library.

All external functions are namespace-qualified, and the namespace tells DataDirect XQuery whether the function is written in Java or in SQL. An external function must be declared in the query before it is called; the SOAP example in section 8 shows such a declaration for a Java function.

6. Custom URI Resolvers

By default, DataDirect XQuery uses its built-in Java URI resolver to locate documents requested with the doc() function. If your program needs to locate resources in custom repositories, convert some resource to XML on the fly, or do anything else that is not built in to DataDirect XQuery, you can write your own URI resolver, and queries will use it instead. Your URI resolver must return the XML document through one of the following Java APIs: StreamSource, SAXSource, StAXSource (using an XMLStreamReader), or DOMSource.

7. XML Converters

A great deal of useful data is stored in non-XML formats. With DataDirect's XML Converters, you can convert many of these formats to XML on the fly so that they can be queried with DataDirect XQuery. Converters are available for EDI messages (X12 and EDIFACT), along with other formats such as tab-delimited files, comma-separated values, dBASE files, and many more. EDI alone has literally thousands of versions and sub-versions, and no standard way to query or process them. XML Converters let you read any of these formats as XML without writing code to parse them.
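To give a sense of the parsing work an XML Converter saves you, here is a minimal Java sketch that turns EDIFACT segments into XML. It is a hypothetical illustration, not DataDirect's converter or its API; the class EdiSketch and its methods are invented for this example. It assumes the default EDIFACT delimiters (':' between components, '+' between data elements, '?' as the release character, and an apostrophe as the segment terminator) rather than reading them from the UNA segment, and it produces a flat, simplified structure whose element names (UNB, UNB01, UNB0101, and so on) only mimic the converter output shown in the example that follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of EDIFACT tokenizing; NOT DataDirect's XML Converter or its API.
// Assumes the default delimiters below rather than reading them from the UNA segment.
public class EdiSketch {
    static final char COMPONENT = ':';   // separates components within a data element
    static final char ELEMENT = '+';     // separates data elements within a segment
    static final char RELEASE = '?';     // makes the following character literal
    static final char TERMINATOR = '\''; // ends a segment

    // Splits text on a separator, ignoring separators preceded by the release
    // character. Escapes are kept so that finer-grained splits still honor them.
    static List<String> split(String text, char separator) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == RELEASE && i + 1 < text.length()) {
                current.append(c).append(text.charAt(++i));
            } else if (c == separator) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    // Drops release characters, keeping the characters they escape.
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == RELEASE && i + 1 < text.length()) {
                out.append(text.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Converts one segment such as "UNB+UNOA:4+..." into XML named like the
    // converter output: UNB, then UNB01, UNB02, ..., then UNB0101, UNB0102, ...
    static String segmentToXml(String segment) {
        List<String> elements = split(segment, ELEMENT);
        String tag = elements.get(0);
        StringBuilder xml = new StringBuilder("<" + tag + ">");
        for (int e = 1; e < elements.size(); e++) {
            String elementTag = tag + String.format(Locale.ROOT, "%02d", e);
            List<String> components = split(elements.get(e), COMPONENT);
            xml.append('<').append(elementTag).append('>');
            if (components.size() == 1) {
                // Simple element: text directly inside, like <UNB05>6002</UNB05>
                xml.append(escapeXml(unescape(components.get(0))));
            } else {
                for (int c = 0; c < components.size(); c++) {
                    String componentTag = elementTag + String.format(Locale.ROOT, "%02d", c + 1);
                    xml.append('<').append(componentTag).append('>')
                       .append(escapeXml(unescape(components.get(c))))
                       .append("</").append(componentTag).append('>');
                }
            }
            xml.append("</").append(elementTag).append('>');
        }
        return xml.append("</").append(tag).append('>').toString();
    }

    // Converts a whole interchange; line breaks between segments are tolerated.
    static String interchangeToXml(String interchange) {
        // The UNA service string advice is always exactly 9 characters long.
        String body = interchange.startsWith("UNA") ? interchange.substring(9) : interchange;
        StringBuilder xml = new StringBuilder("<EDIFACT>");
        for (String segment : split(body, TERMINATOR)) {
            String trimmed = segment.trim();
            if (!trimmed.isEmpty()) {
                xml.append(segmentToXml(trimmed));
            }
        }
        return xml.append("</EDIFACT>").toString();
    }

    public static void main(String[] args) {
        String edi = "UNA:+.? 'UNB+UNOA:4+STYLUSSTUDIO:1+DATADIRECT:1+20051107:1159+6002'\n"
                   + "FTX+AFM+1++XPath 2.0 Programmer?'s Reference'";
        System.out.println(interchangeToXml(edi));
    }
}
```

The release character is what allows a value such as Programmer?'s to contain a literal apostrophe without ending the segment. A production converter does considerably more, for example honoring the delimiters declared in UNA, modeling the message and segment-group hierarchy, and annotating fields with their data element names, as the comments in the converter output show.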
Consider the following EDI message:

    UNA:+.? '
    UNB+UNOA:4+STYLUSSTUDIO:1+DATADIRECT:1+20051107:1159+6002'
    UNH+SSDD1+ORDERS:D:03B:UN:EAN008'
    BGM+220+BKOD99+9'
    DTM+137:20051107:102'
    NAD+BY+5412345000176::9'
    NAD+SU+4012345000094::9'
    LIN+1+1+0764569104:IB'
    QTY+1:25'
    FTX+AFM+1++XPath 2.0 Programmer?'s Reference'
    LIN+2+1+0764569090:IB'
    QTY+1:25'
    FTX+AFM+1++XSLT 2.0 Programmer?'s Reference'
    LIN+3+1+1861004656:IB'
    QTY+1:16'
    FTX+AFM+1++Java Server Programming'
    LIN+4+1+0596006756:IB'
    QTY+1:10'
    FTX+AFM+1++Enterprise Service Bus'
    UNS+S'
    CNT+2:4'
    UNT+22+SSDD1'
    UNZ+1+6002'

When this message is read through an XML Converter, it is converted to the following XML (for brevity, only the first few lines are given):
    <?xml version="1.0" encoding="utf-8"?>
    <EDIFACT>
      <UNB>
        <UNB01>
          <UNB0101><!--0001: Syntax identifier-->UNOA</UNB0101>
          <UNB0102><!--0002: Syntax version number-->4</UNB0102>
        </UNB01>
        <UNB02>
          <UNB0201><!--0004: Interchange sender identification-->STYLUSSTUDIO</UNB0201>
          <UNB0202><!--0007: Identification code qualifier-->1</UNB0202>
        </UNB02>
        <UNB03>
          <UNB0301><!--0010: Interchange recipient identification-->DATADIRECT</UNB0301>
          <UNB0302><!--0007: Identification code qualifier-->1</UNB0302>
        </UNB03>
        <UNB04>
          <UNB0401><!--0017: Date-->20051107</UNB0401>
          <UNB0402><!--0019: Time-->1159</UNB0402>
        </UNB04>
        <UNB05><!--0020: INTERCHANGE CONTROL REFERENCE-->6002</UNB05>
      </UNB>
      <!--!!! SNIP!!! -->
    </EDIFACT>

An XML Converter is invoked using the doc() function. The following XQuery extracts the sender, recipient, date, and time from the EDI message. Note that the query does not need to understand anything about the complex EDI format; it simply navigates the XML structure produced by the XML Converter.

    let $msg := doc("adapter://edi?transaction.edi")/EDIFACT/UNB
    return
      <transaction-summary>
        <sender>{ $msg/UNB02/UNB0201/text() }</sender>
        <recipient>{ $msg/UNB03/UNB0301/text() }</recipient>
        <date>{ $msg/UNB04/UNB0401/text() }</date>
        <time>{ $msg/UNB04/UNB0402/text() }</time>
      </transaction-summary>

Here is the output of the query:

    <transaction-summary>
      <sender>STYLUSSTUDIO</sender>
      <recipient>DATADIRECT</recipient>
      <date>20051107</date>
      <time>1159</time>
    </transaction-summary>

8. Making SOAP Requests from DataDirect XQuery

DataDirect XQuery can use the ws:call() function to make SOAP requests. For instance, we might want to look up a book on Amazon.com using its ISBN. The following function creates the payload for the web service request we want to send:
    declare function local:amazon-listing($isbn) {
      <tns:request>
        <tns:condition>all</tns:condition>
        <tns:deliverymethod>ship</tns:deliverymethod>
        <tns:futurelaunchdate/>
        <tns:idtype>asin</tns:idtype>
        <tns:ispupostalcode/>
        <tns:merchantid/>
        <tns:offerpage/>
        <tns:itemid>{ $isbn }</tns:itemid>
        <tns:responsegroup>medium</tns:responsegroup>
        <tns:reviewpage/>
        <tns:searchindex/>
        <tns:searchinsidekeywords/>
        <tns:variationpage/>
      </tns:request>
    };

Now we can use the following query to create the message payload by calling our function, then use ws:call() to invoke the web service and get the result, which is a description of the book whose ISBN we are sending:

    declare namespace ws = "ddtekjava:com.stylusstudio.webservice.soapcall";
    declare function ws:call($location as element(), $payload as element())
      as document-node() external;

    let $loc := <location address="http://soap.amazon.com/onca/soap?service=awsecommerceservice"
                          soapaction="http://soap.amazon.com"/>
    let $payload := local:amazon-listing("0395518482")
    return ws:call($loc, $payload)

9. IDE Support

Two Integrated Development Environments provide support for DataDirect XQuery: Stylus Studio and the <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition). Each of these tools lets you:

- Browse relational data sources to configure connections
- Write and test DataDirect XQuery queries
- Drag and drop from data sources to help create XQuery path expressions
- View query results
- Automatically generate simple Java classes that run a query using DataDirect XQuery's implementation of the XQJ API

These two tools have different strengths. Stylus Studio provides broader support for XQuery and other XML technologies, while the <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition) is tightly integrated into a widely used Java development environment.
9.1. Stylus Studio

[Figure: DataDirect XQuery in Stylus Studio]

Stylus Studio is an XML Integrated Development Environment designed to support a wide variety of XML technologies, including the following XQuery features:

- XQuery Editor: provides syntax coloring, code folding, auto-completion, and automatic indenting, and allows ad-hoc queries.
- Database Connections window: simplifies creation of XQuery queries that access relational data.
- XQuery Mapper: a visual tool for creating XQuery transformations that map multiple input data sources into an output XML structure. Supports two-way editing between the mapper's visual representation and the XQuery source.
- XQuery Debugger: lets you simulate XQuery execution step by step, or backmap from the query result into the XQuery source to understand which part of your query is responsible for each part of the output.
- XML Publisher: lets you generate HTML or XSL-FO (PDF, PostScript) reports by visually creating XQueries based on one or more XML, EDI, or relational data sources.
- XML Pipeline: lets you create complex XML applications consisting of multiple XML operations, such as XQuery processing, XML Schema validation, XPath-based flow control, nested pipelines, and more. Fully supports pipeline-level and XQuery-level debugging.
9.2. <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition)

[Figure: DataDirect XQuery in <oxygen/> XML Editor for Eclipse]

The <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition) is an Eclipse plugin that provides XQuery and general XML support in a widely used Java development environment. This plugin is fully integrated with DataDirect XQuery and provides the following:

- XQuery Editor: provides syntax coloring, indenting, and code folding, and allows ad-hoc queries.
- Database Connections window: simplifies creation of XQuery queries that access relational data.
- XQuery Results window: shows the results of a query.
- XQuery Perspective: lets you focus on the components used for XQuery development.

Because this tool runs in Eclipse, it is easy to switch from the XQuery Perspective to the Java Perspective, which is very convenient when writing Java programs that use XQuery.

10. Comparison to Other XQuery Implementations

XQuery is now a widely implemented language, with implementations for most relational databases, XML repositories, and XML IDEs. There are also in-memory implementations, implementations based on full-text systems, and implementations in application servers. When choosing an XQuery implementation, be aware that implementations differ greatly in the data sources they can query, the environments in which the XQuery engine runs, the quality and completeness of the XQuery implementation, performance, and the level of support you can expect from the company that stands behind it. Here are some of the differences between DataDirect XQuery and other XQuery implementations:

- Unlike XQuery implementations from relational vendors, DataDirect XQuery works with any major relational database.
- Unlike server-based XQuery implementations, DataDirect XQuery does not require a server to be installed and has no server of its own, although it works with any web server or J2EE application server.
- Unlike the dedicated XQuery implementations in some XML IDEs, DataDirect XQuery is designed to be embedded into any Java application and has no dependency on the IDE. Of course, you can use Stylus Studio or the <oxygen/> XML Editor for Eclipse (DataDirect XQuery Edition) to visualize data sources and to design and debug the queries for your application.
- Unlike XQuery implementations from native XML databases and repositories, DataDirect XQuery works with conventional relational databases.
- Unlike memory-based XQuery implementations, DataDirect XQuery can generate efficient SQL to query relational data along with other kinds of data.
- Unlike the vast majority of XQuery implementations in any environment, DataDirect XQuery can query very large XML files, thanks to its support for Document Projection and Document Streaming.
- DataDirect XQuery's implementation of XQuery is complete, current, and compatible.
- DataDirect XQuery offers exceptional performance and scalability.