DC, MODS and CERIF-XML




DC, MODS and CERIF-XML: A Tale of Two Cultures
Ed Simons, Radboud University Nijmegen, NL

Some personal data
Ed Simons
Workplace: Information Centre (UCI) of Radboud University (RU). UCI takes care of all IT services for RU and is also the managing host of SURFnet, the NL university network.
Project leader of software development projects.
Initiator and project leader of METIS: the CRIS of all NL universities and the NL Royal Academy of Sciences.
Last few years: international IT projects within the framework of development cooperation (Africa).
Board member of euroCRIS.

Structure of the Presentation
1. Comparison of the 3 formats.
2. Why XML?
3. Towards another solution for exposing and accessing research information?

Part 1: Comparison of the formats

Comparison of the formats
DC: Dublin Core.
MODS: Metadata Object Description Schema. MODS often goes together with DIDL (Digital Item Declaration Language), which is why you often see DIDL/MODS mentioned. In these cases MODS is a metadata record inside the DIDL container that describes the bibliographic metadata of the publication, whereas other parts of the DIDL document contain the metadata of the publication's object files (location, file size, MIME type, etc.).
CERIF-XML: the XML representation of the CERIF data model.

Comparison of the formats
The following documents give a representation of the same article in the 3 XML formats:
Title: On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses
Author(s): Dauce R (Dauce, R.) [1], Keding R (Keding, R.) [2], Sangleboeuf JC (Sangleboeuf, J-C.) [1]
Source: JOURNAL OF MATERIALS SCIENCE
Abbreviation: J MATER SCI
Volume: 43, Issue: 22, Pages: 7239-7246
Published: NOV 2008
JCR Impact factor: 1.081
Times Cited: 0
References: 47
Abstract: Six oxide and oxynitride glasses were synthesized in the Y-Mg-Si-Al-O-N, Nd-Mg-Si-Al-O-N and La-Mg-Si-Al-O-N systems. As already known, nitrogen introduction increases the T-g, packing factor and mechanical properties of the glasses. Cationic substitution also has an influence on the glasses' behavior, particularly in terms of sensitivity to indentation load/size effect (ISE). The structure of the yttrium-containing glasses was investigated by mean of Al-27 and Si-29 MAS-NMR. Al is found to occur for 2/3 as a network former and for 1/3 as a modifier.
Language: English
Reprint Address: Dauce, R (reprint author), Univ Rennes 1, CNRS, LARMAUR, FRE 2717, F-35042 Rennes, France
Addresses: [1] Univ Rennes 1, CNRS, LARMAUR, FRE 2717, F-35042 Rennes, France; [2] Univ Aalborg, Aalborg, Denmark
Corresponding author e-mail address: rachel.dauce@gmail.com
Publisher: SPRINGER, 233 SPRING ST, NEW YORK, NY 10013 USA
Keywords: AL-O-N; EARTH ALUMINOSILICATE GLASSES; OXYNITRIDE GLASSES; MAS-NMR; FLOPPY MODES; INDENTATION; SYSTEM; MICROHARDNESS; RAMAN; DIFFRACTION
Subject Category: Materials Science, Multidisciplinary
IDS Number: 373FZ
ISSN: 0022-2461 (Print), 1573-4803 (Online)
DOI: 10.1007/s10853-008-2851-3
Full text in Institutional Repository (post-print): http://vbn.aau.dk/ws/fbspretrieve/16588792/fulltext.pdf

Dublin Core
DC is too simple: it is of limited use because it lacks detail and granularity. For example, there are no separate elements for volume, issue and pages; it is not possible to describe, in the same DC record, the item of which a publication (e.g. a book chapter) is a part; and it is not possible to indicate the exact role of a creator or contributor. Etc.
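To make the granularity problem concrete, here is a minimal Python sketch (standard library only) that builds a simple DC record in the OAI-PMH `oai_dc` container for the sample article. Packing journal, volume, issue and pages into one `dc:source` string is an assumption reflecting one common practice, not a DC rule; the point is that simple DC has no separate slot for them, and every author becomes an undifferentiated `dc:creator`. The helper `build_dc_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the OAI-PMH oai_dc container and the DC element set.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_dc_record(fields):
    """Build a flat oai_dc record from (element, value) pairs.

    Every value becomes a sibling <dc:...> string: there is no nesting
    and no attribute to say what role a creator played.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


record = build_dc_record([
    ("title", "On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses"),
    ("creator", "Dauce, R."),
    ("creator", "Keding, R."),
    ("creator", "Sangleboeuf, J-C."),
    # No volume/issue/pages elements: they end up in one free-text string.
    ("source", "Journal of Materials Science, 43(22), 7239-7246"),
    ("date", "2008-11"),
    ("identifier", "doi:10.1007/s10853-008-2851-3"),
])

print(ET.tostring(record, encoding="unicode"))
```

A harvester that wants the page range must parse the `dc:source` text again, and the same string convention is not shared by every repository.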

Dublin Core
DC reflects the traditional library culture: it is an electronic version of the old library card. DC possibly also reflects a political aspect or culture. The OAI community needed a format that was easy to implement everywhere at short notice; in a way, they did not have time to wait until a more suitable, robust solution had been worked out. DC and DC-based harvesting have indeed been a success, but in what sense: the success of the tool, or the success of optimally supplying research information?

MODS
MODS solves the shortcomings of DC. It is a more detailed format with good handling of semantics: e.g. it can express the roles of authors/persons, and it can use established classification schemes (controlled vocabularies) by means of the authority attribute:
<role>
  <roleTerm authority="marcrelator" ...>aut</roleTerm>
</role>
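A minimal Python sketch of how a MODS `<name>` carries a coded role from the MARC relator list (`aut` = author, `edt` = editor). The helpers `m`, `mods_name` and `roles` are hypothetical, and the editor "Example, A." is invented for illustration. The sketch sets `type="code"` on `roleTerm`, which MODS uses for coded terms; whether the slide's elided attributes include it is not known.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)


def m(tag):
    """Qualify a tag name with the MODS v3 namespace."""
    return f"{{{MODS}}}{tag}"


def mods_name(family, given, relator_code):
    """Build a MODS <name> whose role is a coded MARC relator term."""
    name = ET.Element(m("name"), type="personal")
    ET.SubElement(name, m("namePart"), type="family").text = family
    ET.SubElement(name, m("namePart"), type="given").text = given
    role = ET.SubElement(name, m("role"))
    term = ET.SubElement(role, m("roleTerm"), type="code", authority="marcrelator")
    term.text = relator_code
    return name


def roles(name_el):
    """Return (authority, code) pairs for every coded role of a <name>."""
    return [
        (t.get("authority"), t.text)
        for t in name_el.iter(m("roleTerm"))
        if t.get("type") == "code"
    ]


author = mods_name("Dauce", "R.", "aut")    # author of the sample article
editor = mods_name("Example", "A.", "edt")  # hypothetical editor
print(ET.tostring(author, encoding="unicode"))
print(roles(author), roles(editor))
```

Because the role value comes from a named authority, a receiving system can interpret `aut` or `edt` without guessing from free text.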

MODS
MODS can describe, in the same record, the item of which a publication (e.g. a book chapter) is a part:
<titleInfo>
  <title>The provisions of the Corpus Juris on community fraud</title>
  <subTitle>A Belgian and Dutch perspective</subTitle>
</titleInfo>
<relatedItem type="host">
  <titleInfo>
    <title>Das Corpus Juris als Grundlage eines europaeischen Strafrechts : Europaeisches Kolloquium, Trier, 4.-6. Maerz 1999</title>
  </titleInfo>
</relatedItem>
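A consumer can follow the host relation explicitly. This Python sketch (standard library `xml.etree.ElementTree`; the helper `titles` is hypothetical) reads the chapter title and its host title from the MODS fragment on this slide, written with MODS camel-case element names and wrapped in a `<mods>` root in the MODS v3 namespace.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>The provisions of the Corpus Juris on community fraud</title>
    <subTitle>A Belgian and Dutch perspective</subTitle>
  </titleInfo>
  <relatedItem type="host">
    <titleInfo>
      <title>Das Corpus Juris als Grundlage eines europaeischen Strafrechts : Europaeisches Kolloquium, Trier, 4.-6. Maerz 1999</title>
    </titleInfo>
  </relatedItem>
</mods>"""


def titles(mods):
    """Return (own title, host title) from one MODS record.

    The host title sits inside <relatedItem type="host">, so it is looked
    up by that path instead of a blind search for any <title>.
    """
    own = mods.find("mods:titleInfo/mods:title", NS)
    host = mods.find("mods:relatedItem[@type='host']/mods:titleInfo/mods:title", NS)
    return (own.text if own is not None else None,
            host.text if host is not None else None)


record = ET.fromstring(RECORD)
print(titles(record))
```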

MODS
Still, MODS heavily reflects the library culture and its vision of research information. It offers a rich metadata set to adequately describe the bibliographic aspects of a publication. But adequately and optimally exposing research information involves more than just bibliographic aspects, and more than just publications: e.g. contextual research metadata, such as the research project the publication results from.

CERIF-XML
CERIF-XML can also describe, in the same record, the item of which a publication (e.g. an article or a book chapter) is a part:
<cfResPublTitle>
  <cfResPublId>arttitle4778</cfResPublId>
  <cfTitle cfLangCode="en" cfTrans="o">On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses</cfTitle>
</cfResPublTitle>
<cfResPublTitle>
  <cfResPublId>journaltitle345</cfResPublId>
  <cfTitle cfLangCode="en" cfTrans="o">JOURNAL OF MATERIALS SCIENCE</cfTitle>
</cfResPublTitle>
<cfResPubl_ResPubl>
  <cfResPublId1>arttitle4778</cfResPublId1>
  <cfResPublId2>journaltitle345</cfResPublId2>
  <cfClassId>is article in</cfClassId>
  <cfClassSchemeId>cfresultpublication-resultpublication</cfClassSchemeId>
  <cfStartDate>2001-01-01T12:00:00-05:00</cfStartDate>
  <cfEndDate>2001-01-01</cfEndDate>
</cfResPubl_ResPubl>
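For comparison with the MODS case, a Python sketch that resolves the `cfResPubl_ResPubl` link on this slide to the journal title. For brevity it assumes no XML namespace (real CERIF-XML documents declare one), wraps the fragment in a hypothetical `<root>` element, and drops the date fields; the helper `container_of` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Slide fragment with CERIF camel-case names; namespace and dates omitted.
DOC = """<root>
  <cfResPublTitle>
    <cfResPublId>arttitle4778</cfResPublId>
    <cfTitle cfLangCode="en" cfTrans="o">On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses</cfTitle>
  </cfResPublTitle>
  <cfResPublTitle>
    <cfResPublId>journaltitle345</cfResPublId>
    <cfTitle cfLangCode="en" cfTrans="o">JOURNAL OF MATERIALS SCIENCE</cfTitle>
  </cfResPublTitle>
  <cfResPubl_ResPubl>
    <cfResPublId1>arttitle4778</cfResPublId1>
    <cfResPublId2>journaltitle345</cfResPublId2>
    <cfClassId>is article in</cfClassId>
    <cfClassSchemeId>cfresultpublication-resultpublication</cfClassSchemeId>
  </cfResPubl_ResPubl>
</root>"""


def container_of(doc, publ_id):
    """Follow cfResPubl_ResPubl links starting at publ_id and return
    (relation, title of the related publication) pairs."""
    titles = {
        t.findtext("cfResPublId"): t.findtext("cfTitle")
        for t in doc.iter("cfResPublTitle")
    }
    return [
        (link.findtext("cfClassId"), titles.get(link.findtext("cfResPublId2")))
        for link in doc.iter("cfResPubl_ResPubl")
        if link.findtext("cfResPublId1") == publ_id
    ]


doc = ET.fromstring(DOC)
print(container_of(doc, "arttitle4778"))  # [('is article in', 'JOURNAL OF MATERIALS SCIENCE')]
```

Unlike MODS, the journal is not nested inside the article: both are separate records joined by an id-based link, in relational style.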

CERIF-XML
The link between an author and the publication is made in the same, uniform way:
<cfPers_ResPubl>
  <cfPersId>daucer</cfPersId>
  <cfResPublId>arttitle4778</cfResPublId>
  <cfClassId>is author of</cfClassId>
  <cfClassSchemeId>cfperson-resultpublicationroles</cfClassSchemeId>
  <cfStartDate>2001-01-01T12:00:00-05:00</cfStartDate>
  <cfEndDate>2001-01-01T12:00:00-05:00</cfEndDate>
  <cfCopyright></cfCopyright>
</cfPers_ResPubl>

CERIF-XML
A very strong point of CERIF-XML is that all relations between the entities of a publication are expressed in exactly the same, uniform way: authors to publication, article to journal, chapter to book, editors to book, etc. Secondly, all these relations are at the same time semantically described (the role of a person, the type of relation between publications, etc.), again in a uniform, standardized way.
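Because every link has the same shape, one data structure and one pair of lookup functions can cover all relations. The Python sketch below models a CERIF-style link entity as a dataclass. The class `Link`, the functions `targets` and `sources`, the ids `editor1`, `book7`, `chapter12`, and the class values `is editor of` and `is chapter in` are hypothetical illustrations, not CERIF vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    """A CERIF-style link entity: two entity ids plus a classification.

    Person-to-publication, article-to-journal, chapter-to-book, ... all
    share this one shape; only the classification values differ.
    """
    kind: str             # which entity types are linked, e.g. "Pers_ResPubl"
    id1: str
    id2: str
    class_id: str         # the role or relation type
    class_scheme_id: str  # the vocabulary that class_id belongs to
    start: str = ""
    end: str = ""


def targets(links: List[Link], source_id: str, class_id: str) -> List[str]:
    """Ids reached from source_id through links classified as class_id."""
    return [lk.id2 for lk in links if lk.id1 == source_id and lk.class_id == class_id]


def sources(links: List[Link], target_id: str, class_id: str) -> List[str]:
    """Ids that point at target_id through links classified as class_id."""
    return [lk.id1 for lk in links if lk.id2 == target_id and lk.class_id == class_id]


links = [
    Link("Pers_ResPubl", "daucer", "arttitle4778", "is author of",
         "cfperson-resultpublicationroles", "2001-01-01T12:00:00-05:00"),
    Link("ResPubl_ResPubl", "arttitle4778", "journaltitle345", "is article in",
         "cfresultpublication-resultpublication", "2001-01-01T12:00:00-05:00"),
    # Hypothetical links: a new role or relation is only a new class value.
    Link("Pers_ResPubl", "editor1", "book7", "is editor of",
         "cfperson-resultpublicationroles"),
    Link("ResPubl_ResPubl", "chapter12", "book7", "is chapter in",
         "cfresultpublication-resultpublication"),
]

print(sources(links, "arttitle4778", "is author of"))   # authors of the article
print(targets(links, "arttitle4778", "is article in"))  # journal of the article
print(sources(links, "book7", "is chapter in"))         # chapters of the book
```

Adding a new kind of relation means adding a classification value, not a new element or table, which is the kind of extensibility this presentation credits to the CERIF model.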

CERIF-XML
It very strongly reflects the relational database culture or way of thinking: it mirrors the relational CERIF model into the XML world.
It is too fragmented: too many schemas and namespaces (e.g. more than 10 different schemas to express an article). This could lead to performance issues.
It is difficult to communicate to non-experts or to people not familiar with relational thinking.

CERIF-XML
There is a need to combine the schemas into a limited number, corresponding to the major research objects, e.g. one schema for a publication. The CERIF task group is aware of this and is already working on it.

CERIF-XML
An extensive set of metadata, not limited to bibliographic or publication metadata but encompassing all aspects of research information (including the bibliographic metadata as expressed by MODS). A strong point of CERIF is that it is a uniform, standardized MODEL that allows easy extension or addition of research metadata, rather than a given, fixed list of metadata.

Part 2: Why XML?

Why XML?
We all seem to embrace XML uncritically as the obvious format for exposing research information in the international context. Everything has to be prepared to work with XML-based architectures and technologies: OAI-PMH, SOA, ... The result: we all copy, transform and double-store data (e.g. from our CRIS repositories we transform a set of metadata into XML, which we then upload to and store in the institutional repository).

Why XML?
But shouldn't we ask ourselves whether all this copying, transforming and re-storing of data in XML format is necessary and really the way to go? Would it not be better to have a solution that needs only the original data sources and leaves them intact, without transforming and re-storing the data somewhere else?

Part 3: Towards another solution?

Towards another solution?

Towards another solution?
Conclusions: METIS can automatically harvest the metadata already stored in Elsevier's Scopus database, so these do not have to be entered separately in METIS again. However, up to now METIS still stores the harvested data in its own database. This should probably not be necessary, so we should be considering solutions for it. This brings us to the next step.

Business Intelligence view may be inspiring
In one sentence, Business Intelligence (BI) could be defined as: knowledge of all aspects of the business in a comprehensive, integrated and manageable way. BI tools are software products that supply this knowledge (e.g. Business Objects, JasperReports/iReport). Great consumers of BI are managers of big companies, who need to know all aspects of their business in a comprehensive, manageable form (statistics, charts, diagrams, etc.).

Business Intelligence view may be inspiring
The problem BI is confronted with is more or less the same as the one we face when trying to get a full, appropriate and integrated view of research information: the data is dispersed over various heterogeneous resources (databases, XML repositories, files, etc.). Solutions are emerging that solve this problem; in other words, they supply timely data from heterogeneous sources in an integrated way, without first copying, transforming and storing these data in intermediate resources.

Business Intelligence view may be inspiring
The following builds upon the ideas of Rick van der Lans, an internationally acclaimed Dutch expert on software architecture and solutions for Business Intelligence, notably his recent publication: Rick F. van der Lans, Developing a Data Delivery Platform with Informatica Data Services. A Technical Whitepaper on Next Generation Data Virtualization, February 28th, 2011, Copyright 2011 R20/Consultancy. http://vip.informatica.com/ricklans8761?elqpurlpage=6013&docid=1571&lsc=na-Ongoing-2011Q1-JP-DI_Developing_Data_Delivery_Platform_WP_www

Federation Server

Federation Server
Works with all kinds of input resources: relational databases, data warehouses, XML resources, Excel sheets, text files, web services, etc.
Based on the relational database concept, but with virtual tables.
No (re-)storage of the data.
On-demand (on-the-fly) transformation of incoming data to the virtual table structure.
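The virtual-table idea can be illustrated with a toy Python sketch. It is not the API of Informatica Data Services or of any federation server; the `VirtualTable` class, the sample `pers` table and the CSV layout are hypothetical. Two heterogeneous sources (an in-memory SQLite database and CSV text) are mapped onto one virtual column layout, and each scan reads the source afresh, so a change at the source shows up without any copy or reload step.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List


class VirtualTable:
    """A table whose rows are produced from a source on every scan.

    Nothing is copied or stored: each call to rows() reads the source and
    maps its records to the virtual column layout on the fly.
    """

    def __init__(self, columns: List[str], fetch: Callable[[], Iterable],
                 mapping: Callable[[object], Dict]):
        self.columns = columns
        self.fetch = fetch      # reads the underlying source
        self.mapping = mapping  # converts one source record to a row dict

    def rows(self) -> Iterator[Dict]:
        for record in self.fetch():
            row = self.mapping(record)
            yield {c: row.get(c) for c in self.columns}


# Source 1: a relational database (in-memory SQLite standing in for a CRIS).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pers (id TEXT, family TEXT, given TEXT)")
db.execute("INSERT INTO pers VALUES ('daucer', 'Dauce', 'R.')")

# Source 2: a text file (CSV contents standing in for a spreadsheet export).
CSV_TEXT = "person_id,surname,first\nkedingr,Keding,R.\n"

persons_db = VirtualTable(
    ["person_id", "name"],
    fetch=lambda: db.execute("SELECT id, family, given FROM pers"),
    mapping=lambda r: {"person_id": r[0], "name": f"{r[1]}, {r[2]}"},
)
persons_csv = VirtualTable(
    ["person_id", "name"],
    fetch=lambda: csv.DictReader(io.StringIO(CSV_TEXT)),
    mapping=lambda r: {"person_id": r["person_id"],
                       "name": f"{r['surname']}, {r['first']}"},
)

print(list(persons_db.rows()) + list(persons_csv.rows()))

# A change in the source is visible at the next scan; there is no reload step.
db.execute("INSERT INTO pers VALUES ('sangleboeufjc', 'Sangleboeuf', 'J-C.')")
print(len(list(persons_db.rows())))
```

The trade-off is that every query pays the cost of reading and mapping the sources, which is why real federation servers add query optimization and caching on top of this basic pattern.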

Federation Server: virtualization Copyright 2011 R20/Consultancy B.V., The Hague, The Netherlands

Mapping foreign to virtual table Copyright 2011 R20/Consultancy B.V., The Hague, The Netherlands

Mapping XML document to virtual table Copyright 2011 R20/Consultancy B.V., The Hague, The Netherlands

Joining Relational and XML data Copyright 2011 R20/Consultancy B.V., The Hague, The Netherlands
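In the same spirit, a toy Python sketch of joining relational and XML data on the fly: XML `cfPers_ResPubl` elements are mapped to rows only while a query runs, and then joined with a relational person table through a simple hash join (build a dictionary from the XML rows, probe it with the relational rows). The person ids, names, XML content and helper functions are hypothetical; a real federation server would express this as SQL over virtual tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational source: person records (in-memory stand-in for a CRIS database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pers (id TEXT PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO pers VALUES (?, ?)",
    [("daucer", "Dauce, R."), ("kedingr", "Keding, R.")],
)

# XML source: authorship links as a repository might expose them
# (hypothetical content, CERIF namespace omitted).
XML_SOURCE = """<links>
  <cfPers_ResPubl>
    <cfPersId>daucer</cfPersId><cfResPublId>arttitle4778</cfResPublId>
    <cfClassId>is author of</cfClassId>
  </cfPers_ResPubl>
  <cfPers_ResPubl>
    <cfPersId>kedingr</cfPersId><cfResPublId>arttitle4778</cfResPublId>
    <cfClassId>is author of</cfClassId>
  </cfPers_ResPubl>
</links>"""


def xml_rows(text):
    """Map each <cfPers_ResPubl> element to a (pers_id, publ_id, role) row.

    Rows are produced while the query runs and are never stored.
    """
    for link in ET.fromstring(text).iter("cfPers_ResPubl"):
        yield (
            link.findtext("cfPersId"),
            link.findtext("cfResPublId"),
            link.findtext("cfClassId"),
        )


def authors_of(publ_id, xml_text=XML_SOURCE):
    """Join XML link rows with the relational person table.

    Build phase: index the matching XML rows by person id (one role per
    person, for simplicity). Probe phase: scan the relational table and
    look each person id up.
    """
    role_by_person = {
        pers_id: role
        for pers_id, pid, role in xml_rows(xml_text)
        if pid == publ_id
    }
    return [
        (name, role_by_person[pers_id])
        for pers_id, name in db.execute("SELECT id, name FROM pers ORDER BY id")
        if pers_id in role_by_person
    ]


print(authors_of("arttitle4778"))
```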

Concrete Application

Concrete Application (Nirvana?)

To conclude: it may be worthwhile to also explore this kind of technology instead of just sticking to XML-based solutions. Thank you for your attention!