LONG TERM PRESERVATION OF ELECTRONIC PUBLICATIONS
GUARANTEEING ACCESS THROUGH ADOPTION OF XML-BASED OPEN DOCUMENT FORMATS

1. R Vasanth Kumar Mehta, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Kanchipuram
2. N R Ananthanarayanan, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University, Kanchipuram

CONTACT PERSON & DETAILS:
R Vasanth Kumar Mehta, Lecturer, Department of Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya SCSVMV University
Enathur, Kanchipuram 631561
Tel: 022-27264301 Ext. 271
Mob: 91 94452-47274
Email: vasanthmehta@gmail.com

SHORT TITLE: XML-Based Open Document Formats
ABSTRACT

A common approach to preserving documents is digitization. However, preserving digitized documents and ensuring their longevity gives rise to new challenges, because storage media, devices and data formats change over time. This paper highlights the problem that proprietary data formats pose for the long-term preservation of electronic documents and suggests XML-based open document formats as a solution.

KEYWORDS

Open document format, digital preservation, digital longevity
Introduction

The need to preserve documents is an inherent one, especially in government, and more so in defence and scientific organizations. Digitizing documents into electronic publications is the first step towards preservation. An electronic publication consists of three components:

a. The bit stream
b. The logical format in the bit stream
c. The functionality needed to decode this logical format

Issues with Long term Digital Preservation

Let us discuss the problems involved in preserving electronic publications and in ensuring that they remain accessible over a period of time.

A. Issues with the bit stream: The medium on which the bit stream is stored could deteriorate, leading to loss of data, and the storage technology itself could become obsolete over a period of time.

B. The Logical Format: The logical format can become obsolete. For example, the Word format has largely replaced WordPerfect. So, even if the bit stream is available intact, we will not be able to access the information if we can no longer decipher the format used. This problem can
be overcome through the adoption of XML-based Open Document Formats, which is the focus of this paper.

C. Preserving the Functionality: The bit stream is transformed by an interpreter into a form suitable for viewing. The interpreter is thus the software that provides the functionality for decoding the format and the data embedded in the bit stream. One possible solution is to bundle the software (because it is also a bit stream) along with the electronic publication for long term maintenance [1]. However, over a period of time the available hardware may no longer be compatible with the interpreter.

Proposed Solution

Steps for Preservation and Access to Electronic Publications:

1. Archiving: Saving the electronic publication together with descriptive and technical metadata.
2. Preserving the digital object: The bit stream must be checked regularly and copied, and the storage medium must be refreshed.
3. Guaranteeing long term access: The two steps above are analogous to the preservation of conventional, non-digital documents, which can be accessed and read without any tool or intermediary aid. This third requirement is specific to digital documents. Even if the publication is archived and maintained in its pristine form, we cannot guarantee access to the stored information over a period of time without the necessary functionality or software.
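
Steps 1 and 2 above can be illustrated with a minimal sketch. It assumes that a SHA-256 digest recorded at archiving time is an acceptable way to check the bit stream later, and that metadata is kept in a JSON file next to the publication. The function names, the metadata fields and the ".metadata.json" suffix are hypothetical choices made for this illustration and are not prescribed by any standard discussed in this paper.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of the file's bit stream as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        # Read in fixed-size chunks so large publications need not fit in memory.
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(path: Path) -> Path:
    """Hypothetical naming rule: metadata lives beside the publication."""
    return path.with_name(path.name + ".metadata.json")


def archive(path: Path, descriptive: dict) -> Path:
    """Step 1: save descriptive and technical metadata with the publication."""
    record = {
        "descriptive": descriptive,
        "technical": {
            "file_name": path.name,
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        },
    }
    sidecar = sidecar_for(path)
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def is_intact(path: Path) -> bool:
    """Step 2: recompute the digest and compare it with the archived value."""
    record = json.loads(sidecar_for(path).read_text(encoding="utf-8"))
    return record["technical"]["sha256"] == sha256_of(path)
```

Calling archive() once when a publication enters the archive and is_intact() at regular intervals, and again after every copy to a fresh medium, covers the checking part of step 2. It does nothing for step 3: an intact bit stream in an undecipherable format remains inaccessible.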
PUBLISHING ENVIRONMENT versus ARCHIVING ENVIRONMENT

While creating a document, we often overlook the need for its preservation or long term access, and deal with it purely in the publishing mode. The choice of document format and of the software used to access it is usually made with an eye to immediate concerns such as storage and bandwidth cost or document richness, without considering the implications for the long term life of the document. Hence the need arises to move a document from the publishing environment to the archiving environment [2]. This transformation could be achieved by adopting XML-based open formats such as the Open Document Format or the IUPAC standard for the storage and capture of experimental and critically evaluated thermodynamic property data (ThermoML) [3].

Advantages of XML-based Document Formats

XML offers significant benefits for interoperability, since it is a standardized, vendor and platform independent format for data and metadata exchange [4]. Open XML-based document file formats make transformations to other formats simple by leveraging and reusing existing standards wherever possible. They also make it possible to develop new types of applications and solutions beyond the traditional applications that presently access the data [5].

As more and more documents of significance are created and stored in digital form, it is essential to keep these documents and files free and accessible, not only today but also for future generations. XML-based document file formats help achieve this.
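
The claim that open XML-based formats make transformation to other formats simple can be illustrated with a small sketch. The element names (document, title, para) form a made-up schema used only for this example; they are not the Open Document Format, whose files are packaged and namespaced and would need more handling. Only the Python standard library is assumed.

```python
import xml.etree.ElementTree as ET

# A publication in a hypothetical, openly documented XML format.
SAMPLE = """<document>
  <title>Annual Report</title>
  <para>Archives must stay readable.</para>
  <para>Open formats help.</para>
</document>"""


def to_plain_text(xml_source: str) -> str:
    """Render the publication as plain text: title in capitals, then paragraphs."""
    root = ET.fromstring(xml_source)
    title = root.findtext("title", default="")
    paragraphs = [para.text or "" for para in root.findall("para")]
    return "\n\n".join([title.upper(), *paragraphs])


def to_html(xml_source: str) -> str:
    """Render the same publication as a minimal HTML page."""
    root = ET.fromstring(xml_source)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = root.findtext("title", default="")
    for para in root.findall("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


print(to_plain_text(SAMPLE))
print(to_html(SAMPLE))
```

Because the structure is self-describing text, each target format needs only a short, independent function, and a future archive could add further converters without access to the software that originally created the document.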
Need for Openness

It is proposed that XML-based standards be kept open, so that preserved archives are insulated from restrictions and royalties. Openness enables a standard to be fully and independently implemented by multiple software providers on multiple platforms, without any intellectual property reservations on the necessary technology.

References

[1] Lorie, R., van Diessen, R.: UVC: A Universal Virtual Computer for Long-Term Preservation of Digital Information. RJ 10338, IBM Almaden Research Center, San Jose, CA (2005)

[2] Backfile conversion and format issues for information stored in digital archives. AIIM Industry White Paper on Records, Document and Enterprise Content Management for the Public Sector. http://h30046.www3.hp.com/uploads/whitepapers/conversion_formats_wp.pdf

[3] Michael Frenkel; Robert D. Chirico; Vladimir Diky; Qian Dong; Kenneth N. Marsh; John H. Dymond; William A. Wakeham; Stephen E. Stein; Erich Königsberger & Anthony R. H. Goodwin. XML-based IUPAC standard for experimental, predicted, and critically evaluated thermodynamic property data storage and capture (ThermoML) (IUPAC Recommendations 2006). Pure Appl. Chem., 2006, Vol. 78, No. 3, pp. 541-612

[4] Oscar Mangisengi; Johannes Huber; Christian Hawel & Wolfgang Essmayr. A Framework for Supporting Interoperability of Data Warehouse Islands Using XML. LNCS 2114, pp. 328-338, 2001

[5] Open Office Specification 1.0. Committee Draft 1, 22 March 2004. http://xml.coverpages.org/openofficespecificationv10-cd.pdf