Standards Development. PROS 14/00x Specification 3: Long term preservation formats




Copyright Statement

State of Victoria 2014

This work is licensed under a Creative Commons Attribution 3.0 Australia licence. You are free to re-use the work under that licence, on the condition that you credit the State of Victoria (through the Public Record Office Victoria) as author. The licence does not apply to any images, photographs, or branding, including the Victorian Coat of Arms, the Victorian Government Logo, and the Public Record Office Victoria logo.

Disclaimer

General

The State of Victoria gives no warranty that the information in this version is correct or complete, error free or contains no omissions. The State of Victoria shall not be liable for any loss howsoever caused whether due to negligence or otherwise arising from the use of this document. This issues paper should not constitute, and should not be read as, a competent legal opinion. Agencies are advised to seek independent legal advice if appropriate.

Records Management Standards Application

The recordkeeping standards issued by PROV apply to all records in all formats, media or systems (including business systems). Agencies are advised to conduct an independent assessment to determine what other records management requirements apply.

State of Victoria 2014 v1.0

Table of Contents

1. Introduction
1.1. Existing standard
1.2. Related documents
1.3. General References
1.4. Acknowledgements
2. VERS Long Term Preservation Formats
2.1. How PROV chose the long term preservation formats
2.2. How to choose one of the long term preservation formats
2.3. Long term preservation formats
2.4. Other formats

Acronyms

The following acronyms are used throughout this document.

PROS    Public Record Office Standard
PROV    Public Record Office Victoria
VEO     VERS Encapsulated Object
VERS    Victorian Electronic Records Strategy

Executive Summary

This specification is part of the Standard for the encapsulation of digital information (PROS 14/xxx). It describes the long term preservation formats that are acceptable to PROV.

Obsolescence of digital formats is a problem for long term digital preservation. When a format becomes obsolete it may be difficult (or impossible) to obtain software to read the format and render the information contained within it.

A long term preservation format is one that is:
- so ubiquitous that software to read the format is likely to be available for the foreseeable future, or
- well documented so that readers can, if necessary, be implemented from scratch.

We are interested in any suggestions on ways to improve this specification. Please send comments to standards@prov.vic.gov.au.

1. Introduction

This specification lists a set of long term preservation formats that are considered to have a long usable life, and that will minimise the cost of providing access to the content over its life.

All content transferred to PROV must be migrated (if necessary) to one of these formats. If content is migrated, the original format must also be transferred to PROV (unless specifically exempted).

It is recommended that agencies adopt these formats when creating content, as this minimises subsequent migration costs. It is also recommended that agencies adopt these long term formats for content that they need to keep for a long time.

The formats were chosen on the following criteria:
- Extremely widespread adoption.
- Being the dominant format in a particular category.
- Multiple independent implementations of the software.
- A published formal specification that implementations adhere to.
- Already accepted by the previous version of this standard.

1.1. Existing standard

In order to protect the investment already made by vendors and agencies, PROV will continue to accept VEOs conformant to the existing VERS standard (PROS 99/007 (Version 2)) for the indefinite future. The new format, however, is not backwards compatible with the previous version.

1.2. Related documents

This document should be read in conjunction with:
- PROS 14/00x Standard for the encapsulation of digital information
- PROS 14/00x Specification 1: Constructing VERS Encapsulated Objects
- PROS 14/00x Specification 2: Adding metadata to VEOs
- PROS 14/00x Specification 3: Long term preservation formats

1.3. General References

References to format specifications are included in the body of this specification.

1.4. Acknowledgements

We would like to acknowledge and thank the people who took the time to comment on earlier drafts of this proposal. Nearly all the comments have been included in this draft, and where this was not possible we have included footnotes to explain the reasons.

2. VERS Long Term Preservation Formats

This specification identifies the formats that are considered low risk for the long term preservation of information. The risk being addressed is that in the future it will not be possible to obtain software to extract and present the information embedded in a digital object.

Over a sufficiently long period of time all formats can be expected to become unreadable. At some point, then, it will be necessary to undertake a preservation action for a format. The goal of this specification is to identify formats for which this preservation action is likely to be a long way in the future, and for which the necessary tools will be available when the preservation action has to be performed.

2.1. How PROV chose the long term preservation formats

Selection of long term preservation formats is based on the assumption that ultimately any format is likely to fall out of use and objects in that format will require preservation actions. Good long term preservation formats are those that are likely to have a long lifespan before preservation interventions are required, and for which suitable tools should be easily obtained when a preservation action is required. It is important to note that we are not assuming that formats will have an indefinite lifespan.

Characteristics that suggest that a format is likely to have a long lifespan before preservation interventions are required are that:
- The format is in extremely widespread use.
- The format has the dominant market share in its domain.

These two characteristics mean that economics sustains the format. New products in the domain must accurately support the format (otherwise it is extremely difficult for them to gain market share). The number of instances of the format means that there is an economic incentive for developers to produce readers or migration tools for that format, even if the original vendor ceases support.

An additional benefit of selecting common, dominant formats is that these are likely to be the majority of record content held by an agency.

Characteristics that suggest that tools will be available to undertake the preservation action include:
- A published format specification exists [1].
- Multiple independent implementations exist of format creators/readers [2].

2.2. How to choose one of the long term preservation formats

Requirement to use one of the formats

All record content transferred to PROV must be represented in one of the long term preservation formats in this standard.

Agencies are strongly encouraged to use these long term preservation formats for information that must be retained for more than seven years, even if the information does not have to be transferred to PROV. The longer the information needs to be kept, the more important it is to use one of these long term preservation formats. Use of the formats will aid in ensuring long term access to the content.

[1] It is not necessary that the format be published by a standards body, but this is preferred.
[2] Multiple independent implementations show 1) that there is an economic incentive to read the selected format, and 2) that the format can be accurately implemented by independent vendors.

Avoiding migration

Agencies are strongly encouraged to adopt these long term preservation formats for day to day business use. This will avoid the requirement to subsequently migrate information to a long term preservation format.

Migration from one format to another is to be avoided if at all possible. In general, migration is expensive (to obtain the necessary tools, to carry out the migration, and to conduct the necessary quality assurance to ensure the migration was carried out successfully). There is always the risk when migrating of losing information.

One of the criteria that PROV used when selecting long term preservation formats was to choose formats commonly in use within agencies to avoid, as far as possible, migration.

Format selection

Several long term preservation formats are provided for most categories of information. Agencies and agency staff can choose the most appropriate format for their business needs. In accordance with our policy of avoiding migration, the most appropriate format to choose will be the one in which the business is actually undertaken.

Version selection

Unless otherwise indicated, PROV will accept any version or variant of the selected long term preservation formats. This is because, in most cases, it is difficult for agencies to configure or set up software products to produce particular variants of a format. Further, most software will produce the latest version [3] of a particular format.

PROV reserves the right, however, to not accept records that do not conform to a particular format, even though the production software is claimed to produce valid objects.

Where appropriate, we provide recommendations if particular versions of a format are preferred over others. Agencies are encouraged to adopt the recommended versions where possible.

Including the original format

Where record content has been migrated to a long term preservation format for the purposes of transfer, a copy of the original, un-migrated, format must also be included in the VEO unless otherwise agreed by PROV.

The requirement to include a copy of the original format guards against the following risks:
- that the migration did not result in an accurate representation of the original record
- that the migration resulted in loss of information
- that a better migration approach may be available in the future

PROV will not generally require a copy in original format where:
- the record is extremely large, and
- the migration process is a routine technology process with little chance of content loss.

[3] At the time the software was created.
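The rule for including the original format can be sketched as a small decision helper. This is only an illustration in Python, not part of the specification: the class and function names are hypothetical, and "extremely large" and "routine" are left as human judgements because the specification gives no thresholds for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordContent:
    """Hypothetical description of one piece of record content in a VEO."""
    migrated: bool           # converted to a preservation format for transfer
    prov_agreed_omit: bool   # PROV has agreed the original may be left out
    extremely_large: bool    # judgement call; no size threshold is specified
    routine_migration: bool  # routine process with little chance of content loss


def must_include_original(content: RecordContent) -> bool:
    """Migrated content carries its original unless PROV has agreed otherwise."""
    if not content.migrated:
        # Nothing was converted, so there is no separate original to add.
        return False
    return not content.prov_agreed_omit


def may_qualify_for_omission(content: RecordContent) -> bool:
    """Both conditions must hold before PROV would generally waive the original."""
    return (content.migrated
            and content.extremely_large
            and content.routine_migration)
```

Note that `may_qualify_for_omission` only flags content worth raising with PROV. The original may be dropped only once PROV has actually agreed, which is what `prov_agreed_omit` records.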

A typical example of a situation where the original would not normally be required is the conversion of video or audio to a long term preservation format.

2.3. Long term preservation formats [4] [5]

The following formats are defined as long term preservation formats:

Document and text formats:
- Plain text (.txt)
- Portable Document Format (.pdf). PDF documents should be conformant to PDF/A-1 or PDF/A-2 [6]. PDF/A-3 must not be used.
- Microsoft Word [7] (.doc, .docx)

Web formats:
- HTML (.htm, .html, .css) [8]
- Extensible Markup Language (.xml) [9]
- Web ARChive format (.warc) [10]

Spreadsheet formats:
- Comma separated values (.csv) [11]
- Microsoft Excel [12] (.xls, .xlsx)

Presentation formats:
- Microsoft PowerPoint [13] (.ppt, .pptx)
- Portable Document Format (.pdf). PDF documents should be conformant to PDF/A-1 or PDF/A-2 [14]. PDF/A-3 must not be used.

Image formats:
- JPEG (.jpg)
- JPEG2000 [15] (.jp2)
- Tagged image file format (.tif, .tiff)

[4] PROV has tried to limit the number of alternative equivalent formats, particularly where the alternatives have low use (e.g. non-inclusion of RTF, as few documents are in this format and they can easily be migrated to Word with no loss of functionality). Some alternative formats, however, are already in the current version of the standard (e.g. JPEG2000).
[5] Experience with PROS 99/007 has shown that it is extremely difficult for content creators to know what format their software products are creating, or to configure a product to create a particular profile of a format. Further, PROV does not currently have the capability of verifying format versions. It is for these reasons that we do not require specific versions of the listed formats. However, we encourage creators to use the current version of the products, and recommend particular format versions. These recommendations may become requirements over time.
[6] It is recommended that PDF documents be conformant to either PDF/A-1 (ISO 19005-1) or PDF/A-2 (ISO 19005-2).
[7] The Microsoft Word format has been chosen over the Open Office Write format because PROV believes a key characteristic of longevity is economic adoption, not status as an open standard. Selection of the Word format also avoids the cost of migrating the single most common record type within government. While we have selected the Word format, this does not mean that agencies must adopt the Microsoft Office product. Any application that creates valid Word files (e.g. OpenOffice) can be used.
[8] It is recommended that HTML files conform to the HTML 4.01 standard (http://www.w3.org/TR/1999/REC-html401-19991224/) and CSS 2.1 (http://www.w3.org/TR/2011/REC-CSS2-20110607/).
[9] XML files will be readable for the indefinite future, but they may not be interpretable. This is because the meaning of the XML markup is defined in separate standards (e.g. SVG for vector graphics).
[10] Note that WARC is a container format that encapsulates web objects. Each web object (e.g. a web page) in the WARC file must be in one of the long term preservation formats specified in this specification.
[11] See http://tools.ietf.org/html/rfc4180 for a non-normative definition of a CSV file.
[12] The selection of the Microsoft Excel format over the Open Office Calc format is for the same reasons as preferring the Word format over the Write format.
[13] The selection of the Microsoft PowerPoint format over the Open Office Impress format is for the same reasons as preferring the Word format over the Write format.
[14] It is recommended that PDF documents be conformant to either PDF/A-1 (ISO 19005-1) or PDF/A-2 (ISO 19005-2).
[15] JPEG2000 is accepted as a long term preservation format in PROS 99/007 (Version 2.0). For this reason, PROV will continue to accept it; at the moment (2014), however, it would fail the economic adoption test.

Audio formats:
- MPEG 1/2 Audio Layer 3 (.mp3)
- MPEG-4 (.mp4)
- WAVE (.wav) using an LPCM codec

Video formats:
- MPEG-4 (.mp4)

Email formats:
- MIME (.eml) [16]

2.4. Other formats

Agencies are encouraged to contact PROV if a common format they use is not in this list [17]. However, before extending this list of formats PROV will work with the agency to determine the likely economic life of the format.

[16] The .eml format is used by many email clients, including Microsoft Outlook Express, Lotus Notes, Windows Mail, and Mozilla Thunderbird. Other straight representations of the MIME format can be used.
[17] In particular, we are interested in discussing with agencies appropriate long term preservation formats for specialized types of data (e.g. CAD, GIS).
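As an illustration of how an agency might pre-screen files against the format list in section 2.3, the following Python sketch looks up a file extension and reads the PDF/A part that a PDF declares. It is a rough aid under stated assumptions, not a PROV tool: the function names are hypothetical, an extension says nothing about whether a file is actually valid, and the PDF check only finds an uncompressed XMP `pdfaid:part` declaration, which is a claim of conformance rather than proof of it.

```python
import re
from pathlib import PurePath
from typing import List, Optional

# Extensions transcribed from section 2.3, grouped by category.
PRESERVATION_FORMATS = {
    "document": {".txt", ".pdf", ".doc", ".docx"},
    "web": {".htm", ".html", ".css", ".xml", ".warc"},
    "spreadsheet": {".csv", ".xls", ".xlsx"},
    "presentation": {".ppt", ".pptx", ".pdf"},
    # Assumption: .jp2, the usual JPEG 2000 extension, stands for JPEG2000.
    "image": {".jpg", ".jp2", ".tif", ".tiff"},
    "audio": {".mp3", ".mp4", ".wav"},
    "video": {".mp4"},
    "email": {".eml"},
}

# XMP may write the part as an element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"); the pattern accepts both.
_PDFA_PART = re.compile(rb"""pdfaid:part\s*(?:=\s*["']|>)\s*(\d)""")


def categories_for(filename: str) -> List[str]:
    """Return the categories whose extension list contains this file's suffix."""
    ext = PurePath(filename).suffix.lower()
    return sorted(c for c, exts in PRESERVATION_FORMATS.items() if ext in exts)


def declared_pdfa_part(pdf_bytes: bytes) -> Optional[int]:
    """Return the PDF/A part the file declares in its XMP metadata, if any."""
    match = _PDFA_PART.search(pdf_bytes)
    return int(match.group(1)) if match else None


def pdf_advice(pdf_bytes: bytes) -> str:
    """Map the declared PDF/A part onto the wording used in section 2.3."""
    part = declared_pdfa_part(pdf_bytes)
    if part == 3:
        return "not acceptable: PDF/A-3 must not be used"
    if part in (1, 2):
        return f"recommended: declares PDF/A-{part}"
    return "no PDF/A-1 or PDF/A-2 declaration found; conformance is recommended"
```

An empty result from `categories_for` (for example, for an `.rtf` file) means the content would need to be migrated before transfer, or the format raised with PROV under section 2.4. Contents of a `.warc` container would still need checking object by object, which this sketch does not attempt.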