LONG TERM PRESERVATION OF ELECTRONIC PUBLICATIONS GUARANTEEING ACCESS THROUGH ADOPTION OF XML-BASED OPEN DOCUMENT FORMATS




1. R Vasanth Kumar Mehta, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Kanchipuram
2. N R Ananthanarayanan, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Kanchipuram

CONTACT PERSON & DETAILS:
R Vasanth Kumar Mehta, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University
Enathur, Kanchipuram 631561
Tel: 022-27264301 Ext. 271
Mob: 91 94452-47274
Email: vasanthmehta@gmail.com

SHORT TITLE: XML-Based Open Document Formats

ABSTRACT
A common approach to preserving documents is digitization. However, preserving digitized documents and ensuring their longevity raises new challenges, because storage media, devices and data formats change over time. This paper highlights the problem that proprietary data formats pose for the long-term preservation of electronic documents and suggests XML-based open document formats as a solution.

KEYWORDS
Open document format, digital preservation, digital longevity

Introduction
The need to preserve documents is an inherent one, especially in government, and more so in defence and scientific organizations. Digitizing documents into electronic publications is the first step towards preservation. An electronic publication consists of three components:
a. The bit stream
b. The logical format in the bit stream
c. The functionality needed to decode this logical format

Issues with Long term Digital Preservation
Let us discuss the problems involved in preserving electronic publications and ensuring that they remain accessible over time.

A. Issues with the bit stream: The medium on which the bit stream is stored could deteriorate, leading to loss of data, and the storage technology could become obsolete over time.

B. The Logical Format: The logical format may become obsolete. For example, the Word format has largely displaced WordPerfect. So, even if the bit stream is available intact, we will not be able to access the information if we can no longer decipher the format used. This problem can be overcome through the adoption of XML-based Open Document Formats, and is the focus of this paper.

C. Preserving the Functionality: The bit stream is transformed by an interpreter into a form suitable for viewing. The interpreter is thus the software that provides the functionality for decoding the format and data embedded in the bit stream. One possible solution is to bundle the software (which is itself a bit stream) with the electronic publication for long term maintenance [1]. However, over time the available hardware may no longer be compatible with the interpreter.

Proposed Solution
Steps for Preservation and Access to Electronic Publications:
1. Archiving: saving the electronic publication together with descriptive and technical metadata.
2. Preserving the digital object: the bit stream must be checked regularly and copied, and the storage medium must be refreshed.
3. Guaranteeing long term access: the first two steps are analogous to the preservation of conventional, non-digital documents. Such documents, however, can be accessed and read without any tool or intermediary aid, so this third requirement is specific to digital documents. Even if the publication is archived and maintained in its pristine form, we cannot guarantee access to the stored information over time without the necessary functionality or software.
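Steps 1 and 2 can be illustrated with a minimal sketch. It assumes that the technical metadata stored at archiving time includes a SHA-256 checksum of the bit stream, and that the regular check in step 2 recomputes and compares that checksum. The paper does not prescribe a metadata schema or a checking method, so the element names (archiveRecord, title, format, sizeBytes, sha256) and the function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_archive_record(title: str, fmt: str, data: bytes) -> str:
    """Build a small XML metadata record (hypothetical schema) for a publication."""
    record = ET.Element("archiveRecord")
    ET.SubElement(record, "title").text = title               # descriptive metadata
    ET.SubElement(record, "format").text = fmt                # technical metadata
    ET.SubElement(record, "sizeBytes").text = str(len(data))  # technical metadata
    ET.SubElement(record, "sha256").text = sha256_hex(data)   # fixity information
    return ET.tostring(record, encoding="unicode")


def verify_bit_stream(record_xml: str, data: bytes) -> bool:
    """Recompute the checksum and compare it with the one stored at archiving time."""
    stored = ET.fromstring(record_xml).findtext("sha256")
    return stored == sha256_hex(data)


# Step 1: archive a publication together with its metadata record.
publication = b"<doc>Annual report 2006</doc>"
record = make_archive_record("Annual report 2006", "application/xml", publication)

# Step 2: check the stored bit stream later; a single altered byte is detected.
intact = verify_bit_stream(record, publication)
damaged = verify_bit_stream(record, publication.replace(b"2006", b"2OO6"))
print(intact, damaged)  # prints: True False
```

A checksum of this kind only detects damage to the bit stream. Repair still depends on the copies made in step 2, and the check says nothing about whether the logical format can still be decoded.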

PUBLISHING ENVIRONMENT versus ARCHIVING ENVIRONMENT
While creating a document, we often overlook the need for its preservation or long term access, and so we deal with it in the publishing mode. Our choice of the document format, the software to access it and so on is often made on the basis of the issues at hand, such as storage and bandwidth cost or document richness, without considering the implications for the long term life of the document. Hence the need arises to move a document from the publishing environment to the archiving environment [2]. This transformation could be achieved by adopting XML-based open formats such as the Open Document Format or the IUPAC standard for the storage and capture of experimental and critically evaluated thermodynamic property data [3].

Advantages of XML-based Document Formats
XML offers significant benefits for interoperability, since it is a standardized, vendor- and platform-independent format for data and metadata exchange [4]. Open XML-based document file formats make transformations to other formats simple by leveraging and reusing existing standards wherever possible. They also make it possible to develop new types of applications and solutions, beyond the traditional applications that presently access the data [5]. As more and more documents of significance are created and stored in digital form, it is essential to keep in mind that these documents and files must remain free and accessible not only today but also for future generations. XML-based document file formats help achieve this.
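As a hedged illustration of this point, the sketch below reads the text of an OpenDocument-style file using only a generic ZIP reader and a generic XML parser, without the word processor that created it. It assumes the packaging used by the OASIS OpenDocument format, in which the document is a ZIP archive whose content.xml holds the text in text:p elements. The sample package built here is a deliberately minimal stand-in and omits parts that a complete ODF file contains, such as the manifest. The function names are illustrative.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as defined in the OASIS OpenDocument 1.0 specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

CONTENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:p>Digitization is the first step.</text:p>
      <text:p>Open formats keep the text readable.</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def build_sample_package() -> bytes:
    """Create a minimal ODF-like ZIP package in memory (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", CONTENT_XML)
    return buf.getvalue()


def extract_paragraphs(package: bytes) -> list:
    """Return the text of every text:p element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


paragraphs = extract_paragraphs(build_sample_package())
for line in paragraphs:
    print(line)
```

Because the container and the markup are both openly specified, the same few lines could be rewritten in any language with ZIP and XML support, which is the property that matters for long term access.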

Need for Openness
It is proposed that the XML-based standards be kept open, to insulate the preserved archives from any restrictions or royalties. Openness would enable the standard to be fully and independently implemented by multiple software providers on multiple platforms, without any intellectual property reservations on the necessary technology.

References
[1] Lorie, R., van Diessen, R.: UVC: A Universal Computer for Long-Term Preservation of Digital Information. RJ 10338, IBM Almaden Research Center, San Jose, CA (2005)
[2] Backfile conversion and format issues for information stored in digital archives, AIIM Industry White Paper on Records, Document and Enterprise Content Management for the Public Sector - http://h30046.www3.hp.com/uploads/whitepapers/conversion_formats_wp.pdf
[3] Michael Frenkel; Robert D. Chirico; Vladimir Diky; Qian Dong; Kenneth N. Marsh; John H. Dymond; William A. Wakeham; Stephen E. Stein; Erich Königsberger & Anthony R. H. Goodwin. XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML) (IUPAC Recommendations 2006). Pure Appl. Chem., 2006, Vol. 78, No. 3, pp. 541-612
[4] Oscar Mangisengi; Johannes Huber; Christian Hawel & Wolfgang Essmayr. A Framework for Supporting Interoperability of Data Warehouse Islands Using XML, LNCS 2114, pp. 328-338, 2001.
[5] Open Office Specification 1.0. Committee Draft 1, 22 March 2004. http://xml.coverpages.org/openofficespecificationv10-cd.pdf