Apache Tika for Enabling Metadata Interoperability

1 Apache Tika for Enabling Metadata Interoperability
Apache: Big Data Europe, September 28-30, 2015, Budapest, Hungary
Presented by Michael Starch (NASA JPL) and Nick Burch (Quanticate)
Proposed by Giuseppe Totaro (Sapienza University of Rome) and Chris Mattmann (NASA JPL)

2 Summary
- What is Apache Tika
- Tika and Metadata
- Metadata Interoperability
- Tika for Enabling Metadata Interoperability
- Conclusion and Future Work

3 WHAT IS TIKA
Apache Tika as the de facto "Babel fish" for digital documents

4 What is Tika
- Java-based toolkit to detect and extract metadata and text from heterogeneous files
- Built from source code using Maven
- Provides a single Parser interface to wrap around third-party parsing libraries
- Enables recursive parsing
- Performs language detection and translation
- Used either programmatically or standalone

5 Supported Formats
- HTML, XML, XHTML
- Microsoft Office document formats
- OpenDocument Format
- iWork document formats
- EPUB, PDF, RTF
- Compression and packaging formats
- Text formats
- Audio, image and video formats
- Mail formats

6 Detection
- Tika tries to identify the right file type
- Custom MIME types registry
- Detection methods:
  - File name
  - Content-Type hints
  - Magic bytes
  - Character encodings
  - Combined approaches
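The "combined approaches" idea can be illustrated with a small sketch that checks magic bytes first and falls back to the file name. This is not Tika's detector: the class `SimpleDetector`, its method names, and the choice of PDF as the only known type are assumptions made for illustration, and it uses only the Java standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Toy detector (hypothetical, not Tika code): magic bytes first, file name second. */
class SimpleDetector {

    // PDF files start with the ASCII bytes "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns a MIME type guess from the first bytes of the file and its name. */
    static String detect(byte[] head, String fileName) {
        // 1. Content-based evidence is the strongest hint in this sketch.
        if (startsWith(head, PDF_MAGIC)) {
            return "application/pdf";
        }
        // 2. Fall back to the file-name extension.
        if (fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return "application/pdf";
        }
        // 3. Nothing matched: report the generic binary type.
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdfHead, "report.bin"));      // application/pdf (magic wins)
        System.out.println(detect(new byte[0], "notes.pdf"));   // application/pdf (name fallback)
        System.out.println(detect(new byte[0], "unknown.dat")); // application/octet-stream
    }
}
```

Ordering the checks this way means a misleading file name cannot override what the content says, which is one reason to combine detection methods rather than rely on a single one.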

7 Extending Tika Parsers
- Add your MIME type (tika-mimetypes.xml):
    <mime-type type="application/x-isatab-investigation">
      <_comment>ISA-Tab Investigation file</_comment>
      <magic priority="50">
        <match value="ONTOLOGY SOURCE REFERENCE" type="string" offset="0"/>
      </magic>
      <glob pattern="i_*.txt"/>
    </mime-type>
- Create your Parser class:
    public class ISArchiveParser implements Parser {
      public void parse() {...}
    }

8 New Features
- The last stable release is Tika 1.10
- Some new features of the last releases:
  - Upgraded to Java 7 (TIKA-1536)
  - Extraction of biomedical information relying on Apache cTAKES (TIKA-1645, TIKA-1642)
  - Probabilistic MIME type detection
  - tika-batch module for directory-to-directory batch processing

9 TIKA AND METADATA
Extraction of metadata with Apache Tika

10 What is Metadata
- Informally defined as "data about data"
- Descriptive, structural, administrative, rights management, preservation [NISO (2004)]
- E.g., Title, Author, Creation Date, Rights
- Metadata schemas may vary a lot:
  - Naming (e.g., Description and Info)
  - Correspondences (e.g., Creator and FirstName / LastName)
  - Representation (e.g., and 2007/01/01)

11 Model Perspective of Metadata
[Figure: "Metadata building blocks from a model perspective", Fig. 5 in HASLHOFER, KLAS (2010), A Survey of Techniques for Achieving Metadata Interoperability. Abstraction levels: M3 Meta-Meta-Model, M2 Meta-Model (schema definition language), M1 Model (metadata schema), M0 Metadata; each level is an instance of the level above it.]

12 Tika and Metadata
- Tika enables metadata extraction (if present)
- Tika maps metadata onto common, consistent key-value pairs in Metadata
- (Some) Metadata APIs:
  - org.apache.tika.metadata
    - Metadata class: multi-valued metadata container
    - TikaCoreProperties: core set of basic properties
  - org.apache.tika.xmp
    - XMPMetadata: conversion to XMP model

13 Solr's ExtractingRequestHandler
- Apache Solr uses Tika to ingest binary and/or structured documents
- Solr's ExtractingRequestHandler uses Tika to upload binary files into Solr
- Input parameters (configuration):
  - fmap.<source_field>=<target_field>
  - uprefix=<prefix>
- Performs only name mapping
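The name-only mapping described here can be sketched in a few lines of plain Java. This is not Solr code: `FieldNameMapper` and `mapNames` are hypothetical names, and the sketch assumes that a field is first renamed through its `fmap` entry and then, if the resulting name is not a known schema field, prefixed with `uprefix`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of fmap/uprefix-style renaming; values are never touched. */
class FieldNameMapper {

    static Map<String, String> mapNames(Map<String, String> extracted,
                                        Map<String, String> fmap,
                                        Set<String> schemaFields,
                                        String uprefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            // fmap.<source_field>=<target_field>: rename if a mapping exists.
            String name = fmap.getOrDefault(e.getKey(), e.getKey());
            // uprefix=<prefix>: prefix names that the schema does not know.
            if (uprefix != null && !schemaFields.contains(name)) {
                name = uprefix + name;
            }
            out.put(name, e.getValue()); // the value is copied unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("Author", "Ada Lovelace");
        extracted.put("X-Custom", "42");

        Map<String, String> mapped = mapNames(
                extracted, Map.of("Author", "author"), Set.of("author"), "attr_");
        System.out.println(mapped); // {author=Ada Lovelace, attr_X-Custom=42}
    }
}
```

Because only keys change, differences in how values are represented (for example two date formats) survive the mapping, which is the limitation the slide points out.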

14 Metadata Roadmap
The six-point roadmap includes:
- Reorganize metadata keys internally
- Move XMP output to an extra XMP module of Tika
- Correct parsers where necessary
- Add support for structured data to the Metadata class
- Introduce a versioning scheme for metadata mappings
- Introduce the ability for clients to define their own mappings

15 METADATA INTEROPERABILITY
Interoperability as a prerequisite for uniform data access

16 Metadata Interoperability
- Prerequisite for uniform access to media objects
- "Metadata interoperability is a qualitative property of metadata information objects that enables systems and applications to work with or use these objects across system boundaries." [HASLHOFER, KLAS (2010)]

17 Metadata Heterogeneities [HASLHOFER, KLAS (2010)]
The predominant heterogeneities were originally identified by:
- [Sheth, Larson (1990)]
- [Ouksel, Sheth (1999)]
- [Wache (2003)]
- [Visser et al. (1997)]

18 Examples of Metadata Heterogeneities

19 Interoperability Techniques
- Model Agreement (e.g., Standardized Metadata Schema)
- Meta-Model Agreement (e.g., Global Conceptual Model)
- Model Reconciliation:
  - Language Mapping (M2)
  - Schema Mapping (M1)
  - Instance Transformation (M0)
- Metadata Mapping

20 Metadata Mapping
Technique that subsumes:
- schema mapping (crosswalks)
- instance transformation (functions)
Given a source schema $S^s$ and a target schema $S^t$, each consisting of a set of schema elements $e^s \in S^s$ and $e^t \in S^t$, a mapping $M$ is a directional relationship between two sets of elements $e^s_i \in S^s$ and $e^t_j \in S^t$.

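21 Mapping Relationship
Each mapping relationship $m \in M$ has:
- a cardinality
- a mapping expression $p \in P$
- an instance transformation function $f \in F$
Mapping expressions [Spaccapietra et al. (1992)], where $I(e)$ denotes the set of instances of element $e$:
- Exclude: $I(e^s_i) \cap I(e^t_j) = \emptyset$
- Equivalent: $I(e^s_i) \equiv I(e^t_j)$
- Include: $I(e^s_i) \subseteq I(e^t_j) \lor I(e^t_j) \subseteq I(e^s_i)$
- Overlap: $I(e^s_i) \cap I(e^t_j) \neq \emptyset \land I(e^s_i) \not\subseteq I(e^t_j) \land I(e^t_j) \not\subseteq I(e^s_i)$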
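The four mapping expressions named on this slide (exclude, equivalent, include, overlap) can be read as relations between the instance sets of a source element and a target element. The sketch below assumes that set-theoretic reading; `MappingExpressions` and `classify` are hypothetical names, not part of Tika or of the cited papers.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: classify two instance sets by their set relation. */
class MappingExpressions {

    enum Expression { EXCLUDE, EQUIVALENT, INCLUDE, OVERLAP }

    static <T> Expression classify(Set<T> source, Set<T> target) {
        Set<T> common = new HashSet<>(source);
        common.retainAll(target); // intersection of the two instance sets

        if (common.isEmpty()) {
            // No shared instances (two empty sets also land here in this sketch).
            return Expression.EXCLUDE;
        }
        if (source.equals(target)) {
            return Expression.EQUIVALENT;
        }
        if (target.containsAll(source) || source.containsAll(target)) {
            // One set is contained in the other, in either direction.
            return Expression.INCLUDE;
        }
        // Shared instances exist, but neither set contains the other.
        return Expression.OVERLAP;
    }

    public static void main(String[] args) {
        System.out.println(classify(Set.of("a", "b"), Set.of("c")));      // EXCLUDE
        System.out.println(classify(Set.of("a", "b"), Set.of("b", "a"))); // EQUIVALENT
        System.out.println(classify(Set.of("a"), Set.of("a", "b")));      // INCLUDE
        System.out.println(classify(Set.of("a", "b"), Set.of("b", "c"))); // OVERLAP
    }
}
```

The order of the checks matters: equality is tested before containment so that identical sets are reported as equivalent rather than as a special case of include.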

22 Elements of Metadata Mapping [HASLHOFER, KLAS (2010)]

23 TIKA FOR ENABLING METADATA INTEROPERABILITY
Introduce the ability for clients to define their own mappings

24 Tika for Enabling Metadata Interoperability
- To be integrated into Tika (TIKA-1691) as a new component
- Based on the following improvements:
  - MappedMetadata class
  - Mapping utilities (schema and instance)
  - MetadataConfig class
- Maps the metadata to a mediated schema

25 MappedMetadata class
- Wrapper of the Metadata class
- Decorates two methods of Metadata:
  - get: maps metadata on the getter side (default)
  - set: maps metadata on the setter side
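The getter-side versus setter-side distinction can be sketched with a tiny stand-in container. Nothing below is the TIKA-1691 code: `SimpleMetadata` is a simplified substitute for Tika's Metadata class, `MappedMetadataSketch` and its constructor flag are hypothetical, and the mapping is assumed to be a plain source-name to target-name table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal multi-valued container (a stand-in, not Tika's Metadata class). */
class SimpleMetadata {
    private final Map<String, List<String>> values = new HashMap<>();

    public void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    public String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }
}

/** Hypothetical wrapper that applies a name mapping either on get or on set. */
class MappedMetadataSketch extends SimpleMetadata {
    private final Map<String, String> sourceToTarget;
    private final Map<String, String> targetToSource = new HashMap<>();
    private final boolean mapOnGet;

    MappedMetadataSketch(Map<String, String> sourceToTarget, boolean mapOnGet) {
        this.sourceToTarget = sourceToTarget;
        this.mapOnGet = mapOnGet;
        for (Map.Entry<String, String> e : sourceToTarget.entrySet()) {
            targetToSource.put(e.getValue(), e.getKey()); // inverse table for lookups
        }
    }

    @Override
    public String get(String name) {
        // Getter side: values stay under their raw names; a request for a
        // mediated-schema name is translated back to the raw name.
        String key = mapOnGet ? targetToSource.getOrDefault(name, name) : name;
        return super.get(key);
    }

    @Override
    public void set(String name, String value) {
        // Setter side: the raw name is translated once, when the value is stored.
        String key = mapOnGet ? name : sourceToTarget.getOrDefault(name, name);
        super.set(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("Author", "dc:creator");

        MappedMetadataSketch onGet = new MappedMetadataSketch(mapping, true);
        onGet.set("Author", "Ada");
        System.out.println(onGet.get("dc:creator")); // Ada

        MappedMetadataSketch onSet = new MappedMetadataSketch(mapping, false);
        onSet.set("Author", "Ada");
        System.out.println(onSet.get("Author"));     // null
        System.out.println(onSet.get("dc:creator")); // Ada
    }
}
```

Mapping on the getter side keeps the raw parser output intact and still answers queries in the mediated schema; mapping on the setter side stores only the mediated names.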

26 Utilities and Configuration
- Mapping methods:
  - CrosswalkUtils class (schema)
  - TransformationsUtils class (instance)
- Mapping configuration:
  - MetadataConfig class works like TikaConfig (parses an XML config file)
  - Fine-grained configuration
  - Default configuration in tika-metadata.xml
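An instance transformation changes values, not names. The sketch below shows two such functions that match the heterogeneity examples from the "What is Metadata" slide: normalizing a date written as 2007/01/01 and splitting a Creator value into first and last name. The class `InstanceTransformations` and its methods are hypothetical; they are not the TransformationsUtils API, and the name-splitting rule (last space separates the last name) is a deliberately naive assumption.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical instance transformation functions (values are converted, not renamed). */
class InstanceTransformations {

    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("uuuu/MM/dd");

    /** Converts a date such as "2007/01/01" to ISO 8601 form ("2007-01-01"). */
    static String toIsoDate(String slashedDate) {
        return LocalDate.parse(slashedDate, SLASHED).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    /**
     * Splits a creator string into { firstName, lastName }.
     * Naive rule: everything after the last space is the last name.
     */
    static String[] splitCreator(String creator) {
        String trimmed = creator.trim();
        int idx = trimmed.lastIndexOf(' ');
        if (idx < 0) {
            return new String[] { "", trimmed }; // single token: treat it as the last name
        }
        return new String[] { trimmed.substring(0, idx).trim(), trimmed.substring(idx + 1) };
    }

    public static void main(String[] args) {
        System.out.println(toIsoDate("2007/01/01")); // 2007-01-01

        String[] parts = splitCreator("Ada Lovelace");
        System.out.println(parts[0] + " | " + parts[1]); // Ada | Lovelace
    }
}
```

A schema crosswalk would decide that Creator corresponds to FirstName and LastName; a function like `splitCreator` is what actually produces the two target values.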

27 Example of MappedMetadata (1/2)

28 Example of MappedMetadata (2/2)

29 CONCLUSION AND FUTURE WORK
Future directions for tika-metadata

30 Conclusion
- tika-metadata is a new component to enable metadata interoperability on the client side
- Pros:
  - Highly configurable technique
  - Fine-grained mapping
- Cons:
  - Configuring a new mapping from scratch may require a lot of time
  - Unknown metadata is mapped only to dynamic fields

31 Future Work
- Integration with the next releases of Tika
- Simplify configuration and provide a complete sample with documentation
- Support strategies for unknown metadata
- Return current mappings as a graphical representation (i.e., Hierarchical Edge Bundling)

32 Acknowledgements
- This work was started by Giuseppe Totaro and Chris Mattmann at NASA JPL
- Thanks to Michael Starch, who presented this proposal
- Thanks to Nick Burch, who is kindly supporting this idea
- Thanks to Tim Allison, who is actively commenting on Jira issue TIKA


More information

FreeForm Designer. Phone: +972-9-8309999 Fax: +972-9-8309998 POB 8792, Natanya, 42505 Israel www.autofont.com. Document2

FreeForm Designer. Phone: +972-9-8309999 Fax: +972-9-8309998 POB 8792, Natanya, 42505 Israel www.autofont.com. Document2 FreeForm Designer FreeForm Designer enables designing smart forms based on industry-standard MS Word editing features. FreeForm Designer does not require any knowledge of or training in programming languages

More information

PTK Forensics. Dario Forte, Founder and Ceo DFLabs. The Sleuth Kit and Open Source Digital Forensics Conference

PTK Forensics. Dario Forte, Founder and Ceo DFLabs. The Sleuth Kit and Open Source Digital Forensics Conference PTK Forensics Dario Forte, Founder and Ceo DFLabs The Sleuth Kit and Open Source Digital Forensics Conference What PTK is about PTK forensics is a computer forensic framework based on command line tools

More information

S1000D Transformation Toolkit. Mr. Wayne Gafford Advanced Distributed Learning (ADL) Mr. Tyler Shumaker Concurrent Technologies Corporation (CTC)

S1000D Transformation Toolkit. Mr. Wayne Gafford Advanced Distributed Learning (ADL) Mr. Tyler Shumaker Concurrent Technologies Corporation (CTC) S1000D Transformation Toolkit Mr. Wayne Gafford Advanced Distributed Learning (ADL) Mr. Tyler Shumaker Concurrent Technologies Corporation (CTC) Topics Thoughts on Content Management The Bridge Project

More information

RIFF Submission Service Technical and User Guide 21 December 2007

RIFF Submission Service Technical and User Guide 21 December 2007 RIFF Submission Service Technical and User Guide 21 December 2007 Table of Contents 1.Purpose......3 2.Overview......3 3.Services......3 4.Default Workflows......4 5.Job Configuration......6 6.Source Code

More information

Grandstream Networks, Inc.

Grandstream Networks, Inc. Grandstream Networks, Inc. XML Based Downloadable Phone Book Guide GXP21xx/GXP14xx/GXP116x IP Phone Version 2.0 XML Based Downloadable Phone Book Guide Index INTRODUCTION... 4 WHAT IS XML... 4 WHY XML...

More information

JAVA r VOLUME II-ADVANCED FEATURES. e^i v it;

JAVA r VOLUME II-ADVANCED FEATURES. e^i v it; ..ui. : ' :>' JAVA r VOLUME II-ADVANCED FEATURES EIGHTH EDITION 'r.", -*U'.- I' -J L."'.!'.;._ ii-.ni CAY S. HORSTMANN GARY CORNELL It.. 1 rlli!>*-

More information

Web Services Development In a Java Environment

Web Services Development In a Java Environment Web Services Development In a Java Environment SWE 642, Spring 2008 Nick Duan April 16, 2008 1 Overview Services Process Architecture XML-based info processing model Extending the Java EE Platform Interface-driven

More information

data.bris: collecting and organising repository metadata, an institutional case study

data.bris: collecting and organising repository metadata, an institutional case study Describe, disseminate, discover: metadata for effective data citation. DataCite workshop, no.2.. data.bris: collecting and organising repository metadata, an institutional case study David Boyd data.bris

More information

Archiving Systems. Uwe M. Borghoff Universität der Bundeswehr München Fakultät für Informatik Institut für Softwaretechnologie. uwe.borghoff@unibw.

Archiving Systems. Uwe M. Borghoff Universität der Bundeswehr München Fakultät für Informatik Institut für Softwaretechnologie. uwe.borghoff@unibw. Archiving Systems Uwe M. Borghoff Universität der Bundeswehr München Fakultät für Informatik Institut für Softwaretechnologie uwe.borghoff@unibw.de Decision Process Reference Models Technologies Use Cases

More information

Archival Data Format Requirements

Archival Data Format Requirements Archival Data Format Requirements July 2004 The Royal Library, Copenhagen, Denmark The State and University Library, Århus, Denmark Main author: Steen S. Christensen The Royal Library Postbox 2149 1016

More information

Cross Platform Publisher (XPP)

Cross Platform Publisher (XPP) G- Cloud service Cross Platform Publisher (XPP) 2014 Page 1 Table of contents 1. About us... 3 2. Overview of G- Cloud Service... 3 1.1 What does it do?... 3 1.2 What can be created?... 3 1.3 Major features...

More information

1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment?

1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment? Questions 1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment? 4. When will a TCP process resend a segment? CP476 Internet

More information

In: Proceedings of RECPAD 2002-12th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal

In: Proceedings of RECPAD 2002-12th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal Paper Title: Generic Framework for Video Analysis Authors: Luís Filipe Tavares INESC Porto lft@inescporto.pt Luís Teixeira INESC Porto, Universidade Católica Portuguesa lmt@inescporto.pt Luís Corte-Real

More information

CI:IRL. By Beth Tucker Long

CI:IRL. By Beth Tucker Long CI:IRL By Beth Tucker Long Who am I? Beth Tucker Long (@e3betht) Editor in Chief php[architect] magazine Freelancer under Treeline Design, LLC Stay at home mom User group organizer Madison PHP Audience

More information

Improved document archiving speeds; data enters the FileNexus System at a faster rate! See benchmark test spreadsheet.

Improved document archiving speeds; data enters the FileNexus System at a faster rate! See benchmark test spreadsheet. Feature Sheet Version 6.100.14 FileNexus Major Advances Client Server Communication - Dependency on Windows DCOM protocols eliminated which means NO additional configuration required on Client PCs after

More information

ISM/ISC Middleware Module

ISM/ISC Middleware Module ISM/ISC Middleware Module Lecture 14: Web Services and Service Oriented Architecture Dr Geoff Sharman Visiting Professor in Computer Science Birkbeck College Geoff Sharman Sept 07 Lecture 14 Aims to: Introduce

More information

Simplifying e Business Collaboration by providing a Semantic Mapping Platform

Simplifying e Business Collaboration by providing a Semantic Mapping Platform Simplifying e Business Collaboration by providing a Semantic Mapping Platform Abels, Sven 1 ; Sheikhhasan Hamzeh 1 ; Cranner, Paul 2 1 TIE Nederland BV, 1119 PS Amsterdam, Netherlands 2 University of Sunderland,

More information

Manage Website Template That Using Content Management System Joomla

Manage Website Template That Using Content Management System Joomla Manage Website Template That Using Content Management System Joomla Ahmad Shaker Abdalrada Alkunany Thaer Farag Ali الخالصة : سىف نتطشق في هزا البحث ال هفاهين اساسيت كيفيت ادساة قىالب الوىاقع التي تستخذم

More information

Final Report - HydrometDB Belize s Climatic Database Management System. Executive Summary

Final Report - HydrometDB Belize s Climatic Database Management System. Executive Summary Executive Summary Belize s HydrometDB is a Climatic Database Management System (CDMS) that allows easy integration of multiple sources of automatic and manual stations, data quality control procedures,

More information

>copy openssl.cfg openssl.conf (use the example configuration to create a new configuration)

>copy openssl.cfg openssl.conf (use the example configuration to create a new configuration) HowTo - PxPlus SSL This page contains the information/instructions on SSL Certificates for use with PxPlus Secure TCP/IP-based applications such as the PxPlus Web Server, the PxPlus Application Server

More information

SAP Cloud Identity Service Document Version: 1.0 2014-09-01. SAP Cloud Identity Service

SAP Cloud Identity Service Document Version: 1.0 2014-09-01. SAP Cloud Identity Service Document Version: 1.0 2014-09-01 Content 1....4 1.1 Release s....4 1.2 Product Overview....8 Product Details.... 9 Supported Browser Versions....10 Supported Languages....12 1.3 Getting Started....13 1.4

More information

Witango Application Server 6. Installation Guide for OS X

Witango Application Server 6. Installation Guide for OS X Witango Application Server 6 Installation Guide for OS X January 2011 Tronics Software LLC 503 Mountain Ave. Gillette, NJ 07933 USA Telephone: (570) 647 4370 Email: support@witango.com Web: www.witango.com

More information

HP AppPulse Mobile. Adding HP AppPulse Mobile to Your Android App

HP AppPulse Mobile. Adding HP AppPulse Mobile to Your Android App HP AppPulse Mobile Adding HP AppPulse Mobile to Your Android App Document Release Date: April 2015 How to Add HP AppPulse Mobile to Your Android App How to Add HP AppPulse Mobile to Your Android App For

More information

Managing large sound databases using Mpeg7

Managing large sound databases using Mpeg7 Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT

More information