An Empirical Study on XML Schema Idiosyncrasies in Big Data Processing
Dmitry Vasilenko, Mahesh Kurapati
Business Analytics, IBM, Chicago, USA
{dvasilen,

Abstract
The design and maintenance of XML schemas for the enterprise, if done incorrectly, can be a difficult and frustrating experience. Applications that use XML data binding or process large and complex XML have to adapt quickly to changing and evolving requirements without forcing every application that exchanges XML data or uses value-objects to upgrade in lock-step. Although not ideally suited to Big Data processing, XML remains a widely accepted data format. This paper provides an overview of practical techniques and approaches that can help an XML schema author make intelligent design decisions.

Keywords- XML Schema, XML Data Binding, Apache Hadoop, Apache Hive, Map-Reduce, VTD-XML, XPath.

I. INTRODUCTION
An XML designer without extensive programming experience will usually lack the intuition needed to make intelligent design decisions. A schema that is deemed correct and conformant might still not transform to value-objects with the expected object layout or method signatures in a particular programming language. Additionally, processing large and complex XML in map-reduce environments can be difficult and may require nontrivial custom solutions. A number of generic solutions for processing XML in Big Data environments have been proposed, such as the one described in [1]. Such approaches rely on the XML data being reliably and unambiguously transformable to predictable value-objects even if the XML schema for the underlying data is not readily available. The approaches discussed in this paper are derived in part from [2] and are based on experience gained from examining and working with a significant number of schemas for XML data binding and Big Data processing.
The examples are illustrated by snippets from the XML schemas the authors worked with. The objective of the discussed approaches is not to dictate rules but to shed light on each design issue so that an informed decision can be made.

II. XML DATA BINDING
The W3C specifications provide basic [3] and advanced [4] patterns known to be interoperable between state-of-the-art data binding implementations. The examples shown below are derived from such patterns but focus mainly on eliminating ambiguity in how the data binding can be achieved or interpreted, even if the schema is not available to the implementation.

A. GLOBAL VERSUS LOCAL
An XML schema component (element, complexType, or simpleType) is "global" if it is an immediate child of the <schema> element, whereas it is "local" if it is nested within another component [5]. Reference [5] also provides a useful discussion of the design choices. The following example shows locally defined elements <freeform> and <password> which, depending on the layout of the enclosing XML elements, can lead to ambiguity in how the names of the generated objects are constructed.

    <xs:choice>
      <xs:element name="freeform">
        <xs:extension base="basetype">
          <xs:attribute name="max" type="xs:int" use="optional"/>
          <xs:attribute name="min" type="xs:int" use="optional" default="0"/>
      <xs:element name="password">
        <xs:extension base="basetype">
          <xs:attribute name="max" type="xs:int" use="optional"/>
    </xs:choice>

On the other hand, defining the elements in the global scope eliminates any naming uncertainty:

    <xs:element name="freeform">
      <xs:extension base="basetype">
        <xs:attribute name="max" type="xs:int" use="optional"/>
        <xs:attribute name="min" type="xs:int" use="optional" default="0"/>
    <xs:element name="password">
      <xs:extension base="basetype">
        <xs:attribute name="max" type="xs:int" use="optional"/>
    <xs:choice>
      <xs:element ref="freeform"/>
      <xs:element ref="password"/>
    </xs:choice>

This approach also agrees with the recommendations provided by Kohsuke Kawaguchi in [6].

ISSN : Vol. 7 No.10 Oct
B. UNNAMED ELEMENTS AND CARDINALITY
To support minOccurs/maxOccurs attributes on unnamed XSD constructs such as <choice> and <sequence>, XML binding frameworks tend to generate wrapper classes. The names of these wrapper classes can be awkward and inconsistent. Placing minOccurs/maxOccurs on named elements instead helps to avoid the ambiguity. The following example shows questionable usage of minOccurs/maxOccurs on the <sequence> element:

    <xs:element name="input">
      <xs:extension base="baseparametertype">
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="node"/>
        </xs:sequence>

To support such a construct, some code generators will generate a wrapper class, say InputItem, to wrap a Node element, as shown in the example below:

    public class Input extends com.example.BaseParameterType implements Serializable {
        public void addInputItem(com.example.InputItem vInputItem)
                throws IndexOutOfBoundsException { }
    }

    public class InputItem implements Serializable {
        private com.example.Node _node;
    }

Such a wrapper class can easily be avoided by specifying the cardinality directly on the <node> element:

    <xs:element name="input">
      <xs:extension base="baseparametertype">
        <xs:sequence>
          <xs:element ref="node" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>

The same is true for the <choice> element.
C. ELEMENTS VERSUS TYPES
There are different schools of thought on how types and elements should be expressed in an XML schema to achieve the desired interoperability and ease of maintenance. An overview of the "element vs. type" considerations, which also discusses element nullability, is provided in [7]. Paul Downey in [8] provides a strong critique of "xsi:type" usage. The authors found that the following approach works best:

1. Define a global type, e.g.:
    <xs:complexType name="notificationelement" abstract="true">..
2. Derive a global element from the global type by extension:
    <xs:element name="notificationdomain">
      <xs:extension base="ns1:notificationelement">.
3. Use <element ref=""> in the schema to refer to the element:
    <xs:choice>
      <xs:element ref="ns1:notificationdomain" minOccurs="0"/>
    </xs:choice>

This approach eliminates variability in how types and elements are treated in XML data binding. Admittedly, such an approach can also lead to designs that cannot easily evolve. Because the terminal elements cannot be extended, the suggested design is mostly applicable to well-researched and well-modeled domains where the terminal nature of the resulting XML elements is acceptable.

D. PRIMITIVE TYPES AND OBJECT WRAPPERS
The XSD <choice> element allows only one of the elements contained in the <choice> declaration to be present within the containing element.

    <xs:choice>
      <xs:element name="numbervalue" type="xs:double"/>
      <xs:element name="datevalue" type="xs:date"/>
      <xs:element name="booleanvalue" type="xs:boolean"/>
      <xs:element name="stringvalue" type="xs:string"/>
    </xs:choice>

While modern XML data binding implementations can handle a mix of primitive types and object wrappers, the resulting "choice" object can be difficult to process. A discussion of the primitive vs. object dichotomy can be found in [9].
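To make the processing difficulty concrete, the following is a minimal Python sketch (not part of the original study) of a consumer that reads the <choice> members shown above. The containing element name "value" and the helper read_choice are hypothetical, and the converters are simplified; for example, xs:date values carrying a time zone are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Converters keyed by the element names used in the <xs:choice> example.
CONVERTERS = {
    "numbervalue": float,
    "datevalue": date.fromisoformat,
    "booleanvalue": lambda text: text.strip() in ("true", "1"),
    "stringvalue": str,
}

def read_choice(parent):
    """Return (member_name, converted_value) for the single choice member present."""
    present = [child for child in parent if child.tag in CONVERTERS]
    if len(present) != 1:
        raise ValueError("expected exactly one choice member, found %d" % len(present))
    child = present[0]
    return child.tag, CONVERTERS[child.tag](child.text or "")

# "value" is a hypothetical containing element.
doc = ET.fromstring("<value><datevalue>2015-10-01</datevalue></value>")
name, value = read_choice(doc)
```

The caller receives a value whose type depends on which member was present, so every consumer has to branch on the member name. This is the kind of handling that makes the "choice" object awkward in statically typed bindings.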
E. ELEMENT NAMES AND RESERVED WORDS
Although the problem mostly affects older implementations, naming XML elements with words that are reserved in a given programming language can lead to awkward and inconsistent method signatures. Consider the following XSD fragment that contains an element named "default":

    <xs:complexType name="basetype">
      <xs:sequence>
        <xs:element name="description" type="xs:string"/>
        <xs:element name="default" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>

The generated method signatures did not follow the expected "camel" case [10] for the Java programming language and included underscores, as shown below:

    public void add_default(String inName) throws IndexOutOfBoundsException;
    public void add_default(int i, String inName) throws IndexOutOfBoundsException;
    public void clear_default();

On the other hand, the accessors for the simple "description" element were generated correctly using normal Java naming conventions:

    public void setDescription(String inName);
    public String getDescription();

Although the issue was resolved in later versions of the code generator, the clients were effectively locked into the erroneous API. The issue could have been avoided by renaming the element to "defaultvalue" so that the accessors have the expected method signatures.

F. EMBEDDED ENUMERATIONS
The generated name of an enumeration defined as an immediate child of an attribute can contain the parent name, which is not always desirable.

    <xs:attribute name="visibility" use="optional" default="always">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="noui"/>
          <xs:enumeration value="hidden"/>
          <xs:enumeration value="always"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
Defining the enumeration under its own name in the global scope and referring to it through the type attribute, as shown below, helps to avoid ambiguity:

    <xs:simpleType name="visibility">
      <xs:restriction base="xs:NMTOKEN">
        <xs:enumeration value="noui"/>
        <xs:enumeration value="hidden"/>
        <xs:enumeration value="always"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:attribute name="visibility" type="visibility" use="optional" default="always"/>

G. ELEMENT TYPE
An XSD element declared without a type defaults to anyType, so the binding generates a generic object. This should be avoided.

    <xs:element name="fieldname"/>

Instead, the type of the element should be specified explicitly.

H. NAME AND REFERENCE
If "name" is used in the XSD definition of an element or attribute, the "type" should be specified explicitly rather than a reference; XML Schema does not allow "name" and "ref" on the same declaration. The following example shows an incorrect attribute specification that mixes the "name" and the "reference":

    <xs:attribute name="hierarchy" ref="search:hierarchy" use="required"/>

The correct definition follows:

    <xs:attribute name="hierarchy" type="search:hierarchy" use="required"/>

I. ARRAYS AND WRAPPERS
The authors observed that in some cases array wrapper elements such as "actionlist", shown below, get filtered out by the code generator:

    <xs:element name="actionlist">
      <xs:sequence>
        <xs:element ref="search:actionspecification" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>

As a countermeasure, one can add a "reserved" element, as shown below, to ensure that the wrapper is not omitted:

    <xs:element name="actionlist">
      <xs:sequence>
        <xs:element ref="search:actionspecification" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="reserved" type="xs:string" minOccurs="0"/>
      </xs:sequence>
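Some of the guidelines in this section lend themselves to mechanical checks. The following Python sketch, offered as an illustration rather than as a tool used by the authors, scans a schema for three of the patterns discussed: cardinality on <sequence> or <choice> (II.B), a named element without a type (II.G), and a declaration that mixes "name" and "ref" (II.H). The function name lint_schema and the sample schema are hypothetical, and the type check is simplified (it ignores substitution groups).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def lint_schema(xsd_text):
    """Return a list of (rule, description) findings for a schema document."""
    root = ET.fromstring(xsd_text)
    findings = []
    for node in root.iter():
        attrs = node.attrib
        if node.tag in (XS + "sequence", XS + "choice"):
            # II.B: cardinality placed on an unnamed compositor.
            if "minOccurs" in attrs or "maxOccurs" in attrs:
                findings.append(("B", "cardinality on <%s>" % node.tag[len(XS):]))
        elif node.tag in (XS + "element", XS + "attribute"):
            if "name" in attrs and "ref" in attrs:
                # II.H: name and ref must not be combined.
                findings.append(("H", "'%s' mixes name and ref" % attrs["name"]))
            elif node.tag == XS + "element" and "name" in attrs:
                # II.G: no type attribute and no inline type definition.
                inline = any(c.tag in (XS + "complexType", XS + "simpleType")
                             for c in node)
                if "type" not in attrs and not inline:
                    findings.append(("G", "element '%s' has no type" % attrs["name"]))
    return findings

# Hypothetical schema that combines the questionable patterns.
SAMPLE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fieldname"/>
  <xs:complexType name="input">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="node"/>
    </xs:sequence>
    <xs:attribute name="hierarchy" ref="hierarchy" use="required"/>
  </xs:complexType>
</xs:schema>
"""

for rule, description in lint_schema(SAMPLE):
    print("II.%s: %s" % (rule, description))
```

The sketch only parses the schema as XML and does not validate it, so it can run on schemas that a strict schema processor would reject.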
III. XML SCHEMA AND BIG DATA PROCESSING
Efficient processing of XML in map-reduce environments can be challenging because of the "impedance mismatch inefficiencies" [11] and the size and complexity of the data [12]. As discussed in [1], the Apache Hive XML serializer-deserializer (SerDe) can be successfully used to process data of arbitrary complexity. Additionally, as shown in [13], a software processor for the Hive XML SerDe based on the Virtual Token Descriptor technology [14] can visibly improve performance.

The authors observed that XML processing for Big Data gravitates towards defining a minimal fragment of an XML document that can be reliably mapped to a Hive table row definition. The elements of such a fragment can be translated to column definitions using appropriate XPath expressions. In a number of cases, however, the XML namespace declarations for the fragment will be lost, as they are typically defined on the XML root element. The widely accepted workaround is to remove the XML namespaces with an XSLT transform before uploading the XML to the Hadoop file system. As XML schemas are not normally designed with such a row-oriented metaphor in mind, the major difficulty is finding an XML fragment that is self-sufficient and translatable to the desired row and column definitions. Additionally, complex data types often need to be flattened, as shown in the following example. Consider the following simple and well-designed XML fragment:

    <Report account="12345">
      <Session id="123" rid="ab123">
        <line by="z" name="pete"/>
        <line by="y" name="oxe"/>
      </Session>
      <Session id="456" rid="cd456">
        <line by="m" name="sam"/>
        <line by="n" name="doug"/>
      </Session>
      <Session id="789" rid="ef789">
        <line by="p" name="russ"/>
        <line by="q" name="john"/>
      </Session>
    </Report>

that needs to be translated to the following output:

    account  session_id  reference_id  by  name
    12345    123         ab123         z   pete
    12345    123         ab123         y   oxe
    12345    456         cd456         m   sam
    12345    456         cd456         n   doug
    12345    789         ef789         p   russ
    12345    789         ef789         q   john
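The intended flattening can be sketched outside Hive to make the target row shape explicit. The following Python fragment is an illustration only, not the authors' implementation; the function name flatten_report is hypothetical. It walks the same Report fragment with two nested loops, one per level of nesting, and emits one row per <line> element.

```python
import xml.etree.ElementTree as ET

REPORT = """
<Report account="12345">
  <Session id="123" rid="ab123">
    <line by="z" name="pete"/>
    <line by="y" name="oxe"/>
  </Session>
  <Session id="456" rid="cd456">
    <line by="m" name="sam"/>
    <line by="n" name="doug"/>
  </Session>
  <Session id="789" rid="ef789">
    <line by="p" name="russ"/>
    <line by="q" name="john"/>
  </Session>
</Report>
"""

def flatten_report(xml_text):
    """Yield one (account, session_id, reference_id, by, name) tuple per <line>."""
    report = ET.fromstring(xml_text)
    account = report.get("account")
    for session in report.findall("Session"):   # first level: sessions
        for line in session.findall("line"):    # second level: lines
            yield (account, session.get("id"), session.get("rid"),
                   line.get("by"), line.get("name"))

rows = list(flatten_report(REPORT))
for row in rows:
    print("\t".join(row))
```

Each loop level corresponds to one array that has to be expanded, which is why a relational engine needs one flattening step per level of nesting.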
The Hive table CREATE DDL for the above XML looks like this:

    CREATE TABLE report(
      account string,
      sessions array<struct<id:string,rid:string,session:array<struct<by:string,name:string>>>>
    )
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
      "xml.processor.class"="com.ximpleware.hive.serde2.xml.vtd.XmlProcessor",
      "column.xpath.account"="/Report/@account",
      "column.xpath.sessions"="/Report/Session"
    )
    STORED AS
      INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES (
      "xmlinput.start"="<Report",
      "xmlinput.end"="</Report>"
    );

Note that the sessions column definition is somewhat complex even for such relatively simple XML. To generate the desired output, two nested LATERAL VIEW transformations are required, one to flatten the sessions array and another to flatten the line array:

    SELECT account, exp_sessions.id, exp_sessions.rid, line.by, line.name
    FROM (
      SELECT account, exp_sessions.exp_session.id, exp_sessions.exp_session.rid,
             exp_sessions.exp_session
      FROM report
      LATERAL VIEW EXPLODE(sessions) exp_sessions AS exp_session
    ) exp_sessions
    LATERAL VIEW EXPLODE(exp_session.session) exp_lines AS line;

A. ARRAYS AND WRAPPER ELEMENTS
Handling arrays without a wrapper element, as in the example below, is rather challenging if the parent element should be represented in a single Hive column.

    <ReportingFI>
      <Name>a</Name>
      <Name>b</Name>
      <DocSpec>
        <DocRefId>d1</DocRefId>
      </DocSpec>
    </ReportingFI>
A possible solution is to introduce a wrapper, Names, for the array so that the ReportingFI element can be represented as a structure that contains the array member. The modified example follows:

    <FATCA_OECD version="1" schemalocation="urn:oecd:ties:fatca:v1 FatcaXML_v1.1.xsd ">
      <MessageSpec>
        <MessageRefId>Sample1</MessageRefId>
      </MessageSpec>
      <FATCA>
        <ReportingFI>
          <Names>
            <Name>a</Name>
            <Name>b</Name>
          </Names>
          <DocSpec>
            <DocRefId>d1</DocRefId>
          </DocSpec>
        </ReportingFI>
        <ReportingFI>
          <Names>
            <Name>c</Name>
            <Name>d</Name>
          </Names>
          <DocSpec>
            <DocRefId>d2</DocRefId>
          </DocSpec>
        </ReportingFI>
      </FATCA>
    </FATCA_OECD>

The CREATE TABLE DDL will look like this:

    CREATE TABLE fatca(
      messagerefid string,
      reportingfis array<struct<reportingfi:struct<names:array<struct<name:string>>,docspec:struct<docrefid:string>>>>
    )
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
      "xml.processor.class"="com.ximpleware.hive.serde2.xml.vtd.XmlProcessor",
      "column.xpath.messagerefid"="/FATCA_OECD/MessageSpec/MessageRefId/text()",
      "column.xpath.reportingfis"="/FATCA_OECD/FATCA/ReportingFI"
    )
    STORED AS
      INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES (
      "xmlinput.start"="<FATCA_OECD ",
      "xmlinput.end"="</FATCA_OECD>"
    );
To flatten the data structure and generate the output, one has to use the LATERAL VIEW:

    set hive.exec.mode.local.auto=true;
    set hive.cli.print.header=true;

    SELECT messagerefid,
           exp_reportingfi.reportingfi.docspec.docrefid,
           exp_reportingfi.reportingfi.names
    FROM fatca
    LATERAL VIEW explode(reportingfis) exploded_reportingfis AS exp_reportingfi;

Running the SELECT query will produce the following output:

    messagerefid  docrefid  names
    Sample1       d1        [{"name":"a"},{"name":"b"}]
    Sample1       d2        [{"name":"c"},{"name":"d"}]

B. HANDLING ENUMERATED TYPES
When dealing with enumerations defined in such a way that the value of the type is referenced by an identifier, one can collect the identifiers into arrays, one per type, as shown in the following DDL:

    CREATE TABLE response_table(
      loan_id string,
      seller_number string,
      messages array<map<string,map<string,string>>>,
      product_ids array<string>,
      classification_ids array<string>
    )
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
      "xml.processor.class"="com.ximpleware.hive.serde2.xml.vtd.XmlProcessor",
      "column.xpath.loan_id"="//attrvalwrapper[attrkey/text()='loanid']/attrval/text()",
      "column.xpath.seller_number"="//attrvalwrapper[attrkey/text()='sellernumber']/attrval/text()",
      "column.xpath.messages"="//messagewrapper",
      "column.xpath.product_ids"="//DecisionValWrapper[DecisionKey/text()='DecisionCategory'][DecisionVal/text()='Product']/parent::DecisionValueContainer/parent::Decision/RowIdContainer//RowId/text()",
      "column.xpath.classification_ids"="//DecisionValWrapper[DecisionKey/text()='DecisionCategory'][DecisionVal/text()='Classification']/parent::DecisionValueContainer/parent::Decision/RowIdContainer//RowId/text()"
    )
    STORED AS
      INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES (
      "xmlinput.start"="<answer>",
      "xmlinput.end"="</answer>"
    );
To generate the enumerated value, a SQL CASE expression with the array_contains() Hive UDF can be used to test whether the collected array of identifiers contains the identifier associated with the XML fragment being processed. The approach might also require flattening the structure using the LATERAL VIEW, as shown below:

    SELECT loan_id, seller_number,
      CASE
        WHEN CAST(array_contains(product_ids, exp_message['messagewrapper']['rowid']) AS STRING) = 'TRUE' THEN 'Product'
        WHEN CAST(array_contains(classification_ids, exp_message['messagewrapper']['rowid']) AS STRING) = 'TRUE' THEN 'Classification'
        ELSE 'Unknown'
      END AS category_code
    FROM response_table
    LATERAL VIEW explode(messages) exploded_messages AS exp_message;

IV. CONCLUSION
In Big Data processing there is a tendency to represent XML data using a relational metaphor that often conflicts with the commonly perceived tree-like nature of XML. Implementing a translation layer to flatten the data after the XML schema is finalized can be rather challenging if such usage of the XML was not considered during the design phase. The issues, techniques and approaches discussed in this paper can help designers make intelligent decisions when developing schemas for XML that will be used for data binding or processed in map-reduce environments.

ACKNOWLEDGMENTS
The authors thank Vishwanath Kamat, Linda Liu, Vinayak Agrawal, Curtis Browning and Steven Halter of IBM for invaluable comments during the preparation of this paper.

REFERENCES
[1] Dmitry Vasilenko, Mahesh Kurapati. Efficient Processing of XML Documents in Hadoop Map Reduce, IJCSE, 2014, Vol.6, No.9, p
[2] XML Schemas: Best Practices
[3] Basic XML Schema Patterns for Databinding Ver. 1.0
[4] Advanced XML Schema Patterns for Databinding Ver. 1.0
[5] Global versus Local
[6] Kohsuke Kawaguchi, W3C XML Schema: DOs and DON'Ts
[7] Should it be an Element or a Type?
[8] Paul Downey, xsi:type is Evil
[9] Sherman R. Alpert, Primitive Types Considered Harmful
[10] Camel Case
[11] Douglas Crockford, JSON: The Fat-Free Alternative to XML
[12] Michael Driscoll, How XML Threatens Big Data
[13] Vinayak Agrawal, Processing XML data in BigInsights 3.0
[14] VTD-XML
Introduction to XML. Data Integration. Structure in Data Representation. Yanlei Diao UMass Amherst Nov 15, 2007
Introduction to XML Yanlei Diao UMass Amherst Nov 15, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau. 1 Structure in Data Representation Relational data is highly
EFSOC Framework Overview and Infrastructure Services
EFSOC Framework Overview and Infrastructure Services Infolab Technical Report Series INFOLAB-TR-13 Kees Leune Id: infraserv.tex,v 1.12 2003/10/23 10:36:08 kees Exp 1 Contents 1 Introduction 4 1.1 Notational
CAS Protocol 3.0 specification
CAS Protocol 3.0 specification Contents CAS Protocol 3.0 Specification 5 Authors, Version 5 1. Introduction 5 1.1. Conventions & Definitions.................... 5 1.2 Reference Implementation....................
XML. Document Type Definitions XML Schema
XML Document Type Definitions XML Schema 1 Well-Formed and Valid XML Well-Formed XML allows you to invent your own tags. Valid XML conforms to a certain DTD. 2 Well-Formed XML Start the document with a
04 XML Schemas. Software Technology 2. MSc in Communication Sciences 2009-10 Program in Technologies for Human Communication Davide Eynard
MSc in Communication Sciences 2009-10 Program in Technologies for Human Communication Davide Eynard Software Technology 2 04 XML Schemas 2 XML: recap and evaluation During last lesson we saw the basics
Advanced Object Oriented Database access using PDO. Marcus Börger
Advanced Object Oriented Database access using PDO Marcus Börger ApacheCon EU 2005 Marcus Börger Advanced Object Oriented Database access using PDO 2 Intro PHP and Databases PHP 5 and PDO Marcus Börger
Implementing XML Schema inside a Relational Database
Implementing XML Schema inside a Relational Database Sandeepan Banerjee Oracle Server Technologies 500 Oracle Pkwy Redwood Shores, CA 94065, USA + 1 650 506 7000 [email protected] ABSTRACT
Experiences with JSON and XML Transformations IBM Submission to W3C Workshop on Data and Services Integration October 20-21 2011, Bedford, MA, USA
Experiences with JSON and XML Transformations IBM Submission to W3C Workshop on Data and Services Integration October 20-21 2011, Bedford, MA, USA 21 October 2011 John Boyer, Sandy Gao, Susan Malaika,
Agents and Web Services
Agents and Web Services ------SENG609.22 Tutorial 1 Dong Liu Abstract: The basics of web services are reviewed in this tutorial. Agents are compared to web services in many aspects, and the impacts of
Introduction to Web Services
Department of Computer Science Imperial College London CERN School of Computing (icsc), 2005 Geneva, Switzerland 1 Fundamental Concepts Architectures & escience example 2 Distributed Computing Technologies
HareDB HBase Client Web Version USER MANUAL HAREDB TEAM
2013 HareDB HBase Client Web Version USER MANUAL HAREDB TEAM Connect to HBase... 2 Connection... 3 Connection Manager... 3 Add a new Connection... 4 Alter Connection... 6 Delete Connection... 6 Clone Connection...
DTD Tutorial. About the tutorial. Tutorial
About the tutorial Tutorial Simply Easy Learning 2 About the tutorial DTD Tutorial XML Document Type Declaration commonly known as DTD is a way to describe precisely the XML language. DTDs check the validity
Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig
Introduction to Pig Agenda What is Pig? Key Features of Pig The Anatomy of Pig Pig on Hadoop Pig Philosophy Pig Latin Overview Pig Latin Statements Pig Latin: Identifiers Pig Latin: Comments Data Types
Supporting Data Set Joins in BIRT
Supporting Data Set Joins in BIRT Design Specification Draft 1: Feb 13, 2006 Abstract This is the design specification of the BIRT Data Set Join feature. Document Revisions Version Date Description of
CHAPTER 9: DATAPORT AND XMLPORT CHANGES
Chapter 9: Dataport and XMLport Changes CHAPTER 9: DATAPORT AND XMLPORT CHANGES Objectives Introduction The objectives are: Provide an overview of dataport changes. Discuss changes in XMLport object and
Integrate Master Data with Big Data using Oracle Table Access for Hadoop
Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler
Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Data Domain Profiling and Data Masking for Hadoop
Data Domain Profiling and Data Masking for Hadoop 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or
Xiaoming Gao Hui Li Thilina Gunarathne
Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal
Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?
Database Indexes How costly is this operation (naive solution)? course per weekday hour room TDA356 2 VR Monday 13:15 TDA356 2 VR Thursday 08:00 TDA356 4 HB1 Tuesday 08:00 TDA356 4 HB1 Friday 13:15 TIN090
Exam Name: IBM InfoSphere MDM Server v9.0
Vendor: IBM Exam Code: 000-420 Exam Name: IBM InfoSphere MDM Server v9.0 Version: DEMO 1. As part of a maintenance team for an InfoSphere MDM Server implementation, you are investigating the "EndDate must
D4.1.2 Cloud-based Data Storage (Prototype II)
< ADVENTURE WP 4 D4.1.2 Cloud-based Data Storage (Prototype II) D4.1.2 Cloud-based Data Storage (Prototype II) Authors: ASC, TUDA Delivery Date: 2013-10-01 Due Date: 2013-08-31 Dissemination Level: PU
Translating between XML and Relational Databases using XML Schema and Automed
Imperial College of Science, Technology and Medicine (University of London) Department of Computing Translating between XML and Relational Databases using XML Schema and Automed Andrew Charles Smith acs203
XIII. Service Oriented Computing. Laurea Triennale in Informatica Corso di Ingegneria del Software I A.A. 2006/2007 Andrea Polini
XIII. Service Oriented Computing Laurea Triennale in Informatica Corso di Outline Enterprise Application Integration (EAI) and B2B applications Service Oriented Architecture Web Services WS technologies
Hadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
PHP Language Binding Guide For The Connection Cloud Web Services
PHP Language Binding Guide For The Connection Cloud Web Services Table Of Contents Overview... 3 Intended Audience... 3 Prerequisites... 3 Term Definitions... 3 Introduction... 4 What s Required... 5 Language
Agency to System Infrastructure Provider Interface Specification
Agency to System Infrastructure Provider Interface Specification Version 1.0.0 November 8, 2006 FINAL Document History Status Release Date Comment Audience Draft 0.1.0 08/31/06 Posted for public review
Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu
Semistructured data and XML Institutt for Informatikk 1 Unstructured, Structured and Semistructured data Unstructured data e.g., text documents Structured data: data with a rigid and fixed data format
