Content Management for Content Enrichment: Architectural Issues and Strategies
Transcription
1 Content Management for Content Enrichment: Architectural Issues and Strategies
Evan Owens, Chief Information Officer, AIP Publishing, LLC
STM E-Production Seminar 2013
2 This Presentation
- Historical Introduction
- Content Management & Content Enrichment
- Architectural Issues, Questions, Strategies, Use Cases
- AIP Publishing Case Study: New Thesaurus, Author Disambiguation, Affiliation Disambiguation
- A related presentation: The Evolving Information Ecosystem of Publishing, JATS-Con Proceedings
3 Publishing & Content Management in the 1990s
Publishing is adding a useful degree of uniformity to information
How were we preparing for the digital future?
- Creating a version of record in SGML/XML full text
- Making the perfect master file
- Preparing to publish simultaneously to print and online
- Article SGML/XML file as a pseudo-database or pseudo-CMS: <article copyeditor="XYZ" maildate="00/00/00">
A document-centric approach!
4 Publishing & Content Management Today
Publishing is adding value to a collection of content by enrichment and by managing the information life cycle
What has changed?
- Content management is now multi-dimensional, multi-system
- Publishing is a much more complex ecosystem: DOIs, ORCID, DataCite, thesauri, etc.
- Less static and less document-centric, more database-like
- More complex information data models
- No longer "publish and be done"
- Life cycle management is now an essential component
5 Content Management
A set of processes and technologies that support the evolutionary life cycle of digital information
Capture, storage, security, revision control, retrieval, distribution, preservation, and description of documents and content
(Wikipedia, 2007)
Some CM components that support enrichment:
- Version control
- Technical metadata (formats, format versions, validation)
- Provenance metadata (processing history)
6 Questions for Enrichment Implementations
1. Does the enrichment benefit from author vetting?
2. Is the enrichment part of the permanent scholarly record?
3. How standardized is the enriched information?
4. How volatile is the enriched information?
7 Some Use Cases
Example use cases:
- Reference Linking
- Keywords
- Affiliations
- Author Identity
- Funding Information
Key questions:
- Author vetting?
- Scholarly record?
- How standardized?
- How volatile?
8 Key Architectural Choices
When is the content enhanced?
- By the author
- During submission
- During production/editorial process
- By the delivery/hosting system
Where does the enhanced information live?
- Embedded in the content
  - In the archival XML or in the exported XML
- External to the content
- Layered information architectures
9 Key Design Challenges
- What is the master source/copy of the information?
- Is the information normalized or de-normalized? e.g., repeating parent metadata across child elements
- How to synchronize between multiple systems? e.g., between the Peer Review System and the XML content
As we move from document-centric to more complex information models and architectures, robust entity-relationship modeling becomes critical.
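The normalized versus de-normalized question can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the journal and article records, the field names, and the `denormalize` helper are illustrative assumptions, not AIPP's data model.

```python
# Hypothetical records: one parent (journal) and two children (articles).
journals = {"J1": {"title": "Journal A", "publisher": "Publisher X"}}
articles = [
    {"id": "a1", "journal_id": "J1", "title": "First article"},
    {"id": "a2", "journal_id": "J1", "title": "Second article"},
]


def denormalize(articles, journals):
    """Copy the parent's metadata onto every child record."""
    flat = []
    for art in articles:
        parent = journals[art["journal_id"]]
        flat.append({**art,
                     "journal_title": parent["title"],
                     "publisher": parent["publisher"]})
    return flat


flat = denormalize(articles, journals)

# In the normalized form, a rename touches exactly one record ...
journals["J1"]["title"] = "Journal A (renamed)"

# ... while every de-normalized copy is now stale until it is regenerated.
stale = [r["id"] for r in flat
         if r["journal_title"] != journals[r["journal_id"]]["title"]]
```

The de-normalized form is convenient for export, since each record is self-contained, but it multiplies the number of places a change has to reach. That is the synchronization cost the slide points at.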
10 AIPP CASE STUDY: NEW PHYSICS THESAURUS, AUTHOR DISAMBIGUATION, AFFILIATION DISAMBIGUATION
11 Enrichment / Disambiguation Goal
[Diagram labels: Author Pages, Subject Pages, Institution Pages, Articles]
12 Key Strategic Decision (Business & Technical)
- Semantic enrichment and disambiguation to be considered as a feature of the delivery platform
- Not as part of the publication or version of record
Past practice: PACS codes printed on pages and in the PDF
- Resulted in mismatches between older and newer content
- Differences visible in previous hosting platform
13 Delivery/Hosting System Architecture
- Publishing Technology's Pub2Web (P2W) hosting platform is built on an RDF triple store
- RDF is ideal for expressing complex relationships
But:
- P2W manages RDF changes via set algebra
- P2W displays links dynamically based on the RDF
- Both parts of a relationship have to be present at RDF loading
- Content loading is very resource intensive
Every technology has its quirks
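Managing RDF changes "via set algebra" can be sketched in Python by treating each load as a set of (subject, predicate, object) triples. This is only an illustration of the general idea, not Pub2Web's actual mechanism. The identifiers, the predicate names, and both helper functions are hypothetical.

```python
def diff_triples(old, new):
    """Return (to_add, to_remove) as plain set differences."""
    return new - old, old - new


def unloadable(triples, present):
    """Triples whose subject or object is not yet loaded.

    Models the constraint that both ends of a relationship must be
    present at load time (an assumption about how this would be checked).
    """
    return {t for t in triples if t[0] not in present or t[2] not in present}


# Hypothetical state before and after an author is re-assigned.
old = {
    ("article:1", "hasAuthor", "author:A"),
    ("article:1", "hasSubject", "term:optics"),
}
new = {
    ("article:1", "hasAuthor", "author:B"),
    ("article:1", "hasSubject", "term:optics"),
}

to_add, to_remove = diff_triples(old, new)

# author:B has not been loaded yet, so the new link cannot be loaded either.
present = {"article:1", "author:A", "term:optics"}
blocked = unloadable(to_add, present)
```

Here `to_add` holds only the new author link and `to_remove` only the old one; the unchanged subject triple appears in neither set. The `blocked` set shows why load order matters when both ends of a relationship must exist.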
14 AIPP's Implementation Choices
XML master in RSuite archive
- With all article assets: print, online, supplemental
- Export packaging for hosting platform
- Version control via RSuite
- Interactions between AI and P2W managed by RSuite
- Processing history captured in RSuite, not in the XML
Semantic enrichment embedded in article XML
- Keywords and inline tagging
- XML markup strippable via XSLT
Disambiguation in separate XML files
- Pointing back into the articles
- An external annotation layer
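Two of the choices above, strippable inline tagging and an external annotation layer that points back into articles, can be sketched in Python. The slides name XSLT for the stripping step; this sketch uses the standard-library `xml.etree.ElementTree` only to show the idea. The element names (`term`, `contrib`), the IDs, and the annotation record are hypothetical, and the inline element is assumed to contain only text.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment with one inline semantic tag (<term>).
ARTICLE = ('<article id="a1"><contrib id="c1">J. Smith</contrib>'
           '<p>Study of <term>thin films</term> growth.</p></article>')


def unwrap(root, tag):
    """Remove every <tag> element but keep its text in place.

    Assumes the inline element holds only text (no child elements).
    """
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            if child.tag == tag:
                text = (child.text or "") + (child.tail or "")
                if prev is None:
                    parent.text = (parent.text or "") + text
                else:
                    prev.tail = (prev.tail or "") + text
                parent.remove(child)
            else:
                prev = child
    return root


def resolve(annotation, root):
    """Follow an external annotation's pointer into the article."""
    return root.find(".//*[@id='{}']".format(annotation["target"]))


root = ET.fromstring(ARTICLE)

# External layer: the disambiguated identity lives outside the article
# and points at an element inside it by ID.
annotation = {"article": "a1", "target": "c1", "author_id": "author:42"}
target_text = resolve(annotation, root).text

stripped = ET.tostring(unwrap(root, "term"), encoding="unicode")
```

Stripping the inline tag leaves the article text unchanged, and the external annotation still resolves afterwards because it never lived in the article XML.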
15 Implementation Overview
[Diagram labels: Thesaurus; Subject Pages (dynamically created by Pub2Web); Author Pages (XML from AI, points into articles); Institution Pages (XML from AI, points into articles); Articles (XML + semantic, not disambiguation)]
17 AIPP JATS XML: Keywords
- Keyword group in header
- Keywords inline
18 AIPP Disambiguated Author XML
19 AIPP Disambiguated Affiliation XML
20 AIPP Lifecycle Use Cases: Add / Change / Delete
- Content: corrections could impact author / affiliation / keywords
- Thesaurus Terms: vocabulary will evolve
- Enrichment Rules: quarterly reruns of entire corpus with latest rule set
- Author Disambiguation: new info could cause merge or split
- Institution Disambiguation: organizational changes (names, mergers, etc.)
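The author merge and split cases can be made concrete with a small Python sketch that compares two disambiguation runs. It assumes, hypothetically, that each run maps an author mention (here "article#position") to a cluster ID and that cluster IDs are not stable between runs, so clusters are compared by their members. None of these names or structures come from the slides.

```python
from collections import defaultdict


def clusters(assignment):
    """Group mentions by cluster ID; return the member sets."""
    groups = defaultdict(set)
    for mention, cluster_id in assignment.items():
        groups[cluster_id].add(mention)
    return list(groups.values())


def merges_and_splits(old, new):
    """Compare two runs by cluster membership rather than by cluster ID."""
    old_c, new_c = clusters(old), clusters(new)
    # A merge: one new cluster draws mentions from several old clusters.
    merges = [n for n in new_c if sum(1 for o in old_c if o & n) > 1]
    # A split: one old cluster is spread over several new clusters.
    splits = [o for o in old_c if sum(1 for n in new_c if o & n) > 1]
    return merges, splits


# Hypothetical previous run and rerun with a newer rule set.
old_run = {"a1#1": "A", "a2#1": "B", "a3#1": "C", "a4#1": "C"}
new_run = {"a1#1": "X", "a2#1": "X", "a3#1": "Y", "a4#1": "Z"}

merges, splits = merges_and_splits(old_run, new_run)
```

In this example the mentions in a1 and a2 are merged into one author, and the author previously shared by a3 and a4 is split in two. A report like this after each rerun would show which author pages need to be rebuilt.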
21 Publishing is adding value to a collection of content by enrichment and by managing the information life cycle
NO SINGLE MAGIC SOLUTION
YOUR MILEAGE MAY VARY!
QUESTIONS? COMMENTS?
