LONG TERM PRESERVATION OF ELECTRONIC PUBLICATIONS GUARANTEEING ACCESS THROUGH ADOPTION OF XML-BASED OPEN DOCUMENT FORMATS




1. R Vasanth Kumar Mehta, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Kanchipuram
2. N R Ananthanarayanan, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Kanchipuram

CONTACT PERSON & DETAILS:
R Vasanth Kumar Mehta, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Enathur, Kanchipuram 631561
Tel: 022-27264301 Ext. 271
Mob: 91 94452-47274
Email: vasanthmehta@gmail.com

SHORT TITLE: XML-Based Open Document Formats

ABSTRACT

A common approach to preserving documents is digitization. However, preserving digitized documents and ensuring their longevity raises new challenges as storage media, devices and data formats change. This paper highlights the problem that proprietary data formats pose for the long term preservation of electronic documents, and suggests XML-based open document formats as a solution.

KEYWORDS

Open document format, digital preservation, digital longevity

Introduction

The need to preserve documents is inherent, especially in government, and more so in defence and scientific organizations. Digitizing documents into electronic publications is the first step towards preservation. An electronic publication consists of three components:

a. The bit stream
b. The logical format in the bit stream
c. The functionality needed to decode this logical format

Issues with Long term Digital Preservation

Let us discuss the problems involved in preserving electronic publications and ensuring that they remain accessible over time.

A. Issues with the bit stream: The medium on which the bit stream is stored could deteriorate, leading to loss of data, and the storage technology could become obsolete over time.

B. The Logical Format: The logical format can become obsolete. For example, the Word format has largely displaced WordPerfect. So, even if the bit stream is intact, we will not be able to access the information if we can no longer decipher the format used. This problem can be overcome through the adoption of XML-based Open Document Formats, and is the focus of this paper.

C. Preserving the Functionality: The bit stream is transformed into a form suitable for viewing by an interpreter. The interpreter is thus the software that provides the functionality for decoding the format and data embedded in the bit stream. One possible solution is to bundle the software (because it is also a bit stream) with the electronic publication for long term maintenance [1]. However, over time the available hardware may no longer be compatible with the interpreter.

Proposed Solution

Steps for Preservation and Access to Electronic Publications:

1. Archiving: saving the electronic publication with descriptive and technical metadata.
2. Preserving the digital object: the bit stream must be checked regularly and copied, and the storage medium must be refreshed.
3. Guaranteeing long term access: the two steps above are analogous to the preservation of conventional, non-digital documents, which can be accessed and read without any tool or intermediary aid. This third requirement is specific to digital documents. Even if the publication is archived and maintained in its pristine form, we cannot guarantee access to the stored information over time without the necessary functionality or software.
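The first two steps above can be illustrated with a small sketch. It is not a method prescribed by this paper: it assumes that "checking the bit stream" is done by comparing a SHA-256 checksum against one stored at archiving time, and the XML element names (record, title, format, size, sha256) are hypothetical, not taken from any metadata standard.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_archive_record(data: bytes, title: str, fmt: str) -> str:
    """Step 1 (archiving): store descriptive and technical metadata as XML.

    The element names are hypothetical, chosen only for this illustration.
    """
    record = ET.Element("record")
    ET.SubElement(record, "title").text = title          # descriptive metadata
    ET.SubElement(record, "format").text = fmt           # technical metadata
    ET.SubElement(record, "size").text = str(len(data))
    ET.SubElement(record, "sha256").text = sha256_of(data)
    return ET.tostring(record, encoding="unicode")


def bit_stream_intact(data: bytes, record_xml: str) -> bool:
    """Step 2 (preservation): re-hash the bit stream and compare with the record."""
    recorded = ET.fromstring(record_xml).findtext("sha256")
    return recorded == sha256_of(data)


publication = b"<doc><p>Sample electronic publication</p></doc>"
record = make_archive_record(publication, "Sample publication", "application/xml")

print(bit_stream_intact(publication, record))            # unchanged copy
print(bit_stream_intact(publication + b"\x00", record))  # simulated corruption
```

A check of this kind only detects damage to the bit stream. It says nothing about whether the logical format can still be interpreted, which is the concern of the third step.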

PUBLISHING ENVIRONMENT versus ARCHIVING ENVIRONMENT

While creating a document, we often overlook the need for its preservation or long term access, and so we deal with it in the publishing mode. Our choice of document format, of the software used to access it, and so on is often made on the basis of immediate concerns such as storage and bandwidth cost or document richness, without considering the implications for the long term life of the document. Hence the need arises to move a document from the publishing environment to the archiving environment [2]. This transformation could be achieved by adopting XML-based open formats such as the Open Document Format or the IUPAC standard for the storage and capture of experimental and critically evaluated thermodynamic property data [3].

Advantages of XML-based Document Formats

XML offers significant benefits for interoperability, since it is a standardized, vendor- and platform-independent format for data and metadata exchange [4]. Open XML-based document file formats make transformations to other formats simple by leveraging and reusing existing standards wherever possible. They also make it possible to develop new types of applications and solutions beyond the traditional applications that presently access the data [5]. As more and more documents of significance are created and stored in digital form, it is essential to keep in mind the ability to keep these documents and files free and accessible, not only today but for future generations. XML-based document file formats help achieve this.
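As a rough illustration of why an open, XML-based format is easy to transform, the sketch below reads the text of an OpenDocument-style file using only a general-purpose ZIP reader and XML parser, with no word processor involved. It relies on two facts about the Open Document Format: a document is a ZIP package whose main content is in content.xml, and paragraphs are text:p elements. The sample package built here is a deliberately minimal stand-in (it omits the manifest and styles a conformant file would carry), and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Minimal content.xml used only for this illustration.
CONTENT_XML = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>First paragraph.</text:p>"
    "<text:p>Second paragraph.</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def build_sample_package() -> bytes:
    """Build a simplified ODF-like ZIP package in memory (not fully conformant)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", CONTENT_XML)
    return buffer.getvalue()


def extract_paragraphs(package_bytes: bytes) -> List[str]:
    """Return the plain text of every text:p element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


paragraphs = extract_paragraphs(build_sample_package())
print(paragraphs)
```

Because the format is documented and built on existing standards (ZIP and XML), a reader of this kind could be rewritten for a future platform from the specification alone.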

Need for Openness

It is proposed that the XML-based standards be kept open to insulate the preserved archives from any restrictions or royalties. Openness would enable the standard to be fully and independently implemented by multiple software providers on multiple platforms, without any intellectual property reservations on the necessary technology.

References

[1] Lorie, R.; van Diessen, R. UVC: A Universal Virtual Computer for Long-Term Preservation of Digital Information. RJ 10338, IBM Almaden Research Center, San Jose, CA (2005)

[2] Backfile conversion and format issues for information stored in digital archives. AIIM Industry White Paper on Records, Document and Enterprise Content Management for the Public Sector. http://h30046.www3.hp.com/uploads/whitepapers/conversion_formats_wp.pdf

[3] Michael Frenkel; Robert D. Chirico; Vladimir Diky; Qian Dong; Kenneth N. Marsh; John H. Dymond; William A. Wakeham; Stephen E. Stein; Erich Königsberger; Anthony R. H. Goodwin. XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML) (IUPAC Recommendations 2006). Pure Appl. Chem., 2006, Vol. 78, No. 3, pp. 541-612

[4] Oscar Mangisengi; Johannes Huber; Christian Hawel; Wolfgang Essmayr. A Framework for Supporting Interoperability of Data Warehouse Islands Using XML. LNCS 2114, pp. 328-338, 2001

[5] Open Office Specification 1.0. Committee Draft 1, 22 March 2004. http://xml.coverpages.org/openofficespecificationv10-cd.pdf