Archiving Digital Content
ARMA Spring Meeting
Leigh A. Grinstead
April 14, 2015

Digital Archiving
This session will:
  Cover the components of digital archiving providing long-term access, aka digital preservation
  Review standards and best practices
  Discuss tools and standards for technical and preservation metadata
  Provide a list of factors to consider in evaluating and selecting digital preservation systems

What is Digital Archiving?

Digital Preservation
Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Prepared by the ALCTS Preservation and Reformatting Section, Working Group on Defining Digital Preservation, ALA Annual Conference, Washington, D.C., June 24, 2007
http://www.ala.org/alcts/resources/preserv/defdigpres0408

Digital Stewardship is NOT Back-up
  Backing up is not the same as digital stewardship
  Periodic testing

Goal of Digital Stewardship
Digital objects that are:
  Authentic
  Renderable
  Understandable
  Viable

Digital Preservation Readiness Assessment
  https://www.nedcc.org/free-resources/digital-preservation
  https://www.nedcc.org/assets/media/documents/toolkitquestionnaire.pdf
  https://www.nedcc.org/assets/media/documents/sodaexercisetoolkit.pdf

Digital Preservation Activities

Content Creation
  Not just reformatting

File Formats: obsolescence
What causes file format obsolescence?
  Software upgrades fail to support legacy files
  The format is superseded by another
  The format fails to be widely implemented
  The software that supports the format fails or is withdrawn

File Formats: proprietary v. non-proprietary
  Features may only be available within proprietary formats
  Open formats may include features that are not actually supported by any of the applications available to create or display them
  Open formats take a long time to develop and include input from many interested parties
  Since they are controlled by a commercial company, proprietary formats are vulnerable to being changed or dropped without notice

Distinguish between Masters & Access
  Distinguish your preservation masters and access collections
  Preservation master or archival files are your digital masters and should be stored separately with limited access
  Access files are the ones you provide your patrons and most staff and may include a variety of duplicates: thumbnails, screen-sized, higher-resolution, or zoomable versions derived from the same master image

Discussing how to Choose a File Format
Consider:
  Wide adoption
  History of backward compatibility
  Good metadata support in an open format
  Built-in error-checking
  Reasonable upgrade cycles

Before receipt of new files determine
  What can your department or institution reasonably accomplish in order to set yourself up for success

Standards & Best Practices for Creation
  Standards & Best Practices provide a benchmark
  Adhering to well-recognized and broadly implemented standards for encoding and formatting supports those standards and their longevity.

What is Metadata?
  Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects.
  The creation of metadata is governed by a body of standards, best practices and schemas that, when appropriately applied, work together to facilitate the management, description, and preservation of digital objects.

Metadata: Preservation Fields
  Provenance: for authentication and a documented history of the file's contents
  Context: why the data was created, how it relates to other data
  Reference identifiers: ISBN, accession number, etc. to demonstrate the relationship between the digital file and any physical holding you have
  Technical: to describe the technology environment used to create the digital objects and suggest how the files might be read/used

Technical Metadata Suggested
Information about the creation of the digital version of the original object:
  Capture equipment used
  Software used (version number)
  Name of technician
  Date of capture/conversion
  Resolution
  Bit-depth
  Size/length of original
  File format
  File storage location

Best Practices for Imaging
  Federal Agencies Digitization Guidelines Initiative
    Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files
    http://www.digitizationguidelines.gov/guidelines/fadgi_still_image-Tech_Guidelines_2010-08-24.pdf
  Universal Photographic Digital Imaging Guidelines (UPDIG)
    www.updig.org

Additional Best Practices
  Digitizing Video for Long-Term Preservation: An RFP Guide and Template
    http://library.nyu.edu/preservation/varrfp.pdf
  CDP Digital Audio Working Group, Digital Audio Best Practices Version 2.1
    http://www.lyrasis.org/products-and-services/digital-and-Preservation-Services/Digital-Toolbox.aspx
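
The suggested technical metadata fields listed above could be captured as a simple structured record. The following is only a minimal sketch: the class name, field names, and every sample value are hypothetical illustrations and do not come from FADGI, UPDIG, or any other standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TechnicalMetadata:
    """One record per digital file; field names are illustrative only."""
    capture_equipment: str
    software: str
    software_version: str
    technician: str
    capture_date: str       # ISO 8601 date, e.g. "2015-04-14"
    resolution_ppi: int
    bit_depth: int
    original_size: str      # size or length of the original object
    file_format: str
    storage_location: str


def to_json(record: TechnicalMetadata) -> str:
    """Serialize the record to JSON so it can be stored next to the file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Hypothetical sample values, for illustration only.
example = TechnicalMetadata(
    capture_equipment="flatbed scanner (example)",
    software="scanning software (example)",
    software_version="1.0",
    technician="J. Doe",
    capture_date="2015-04-14",
    resolution_ppi=600,
    bit_depth=24,
    original_size="8 x 10 in",
    file_format="TIFF",
    storage_location="masters/box01/item0001.tif",
)

print(to_json(example))
```

Storing the record as a sidecar file beside each preservation master is one option; an institution's own metadata schema would determine the real field names.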

Storage
  Online Storage
  Offline Storage
  In the Cloud

Digital Preservation Activities

Content Integrity

Strategies for Preservation
  Bitstream Copying
  Analog Backup
  Migration
  Data Recovery
  Technology Preservation
  Emulation
  Replication

Strategies: Bitstream Copying
Bitstream Copying is the making of an exact duplicate of a digital object.
  Often combined with remote storage so the two copies are not subject to the same disaster (power failure, fire, etc.)
  The minimum required strategy for even the most ephemeral data
Pros:
  Easy
  Inexpensive
Cons:
  Doesn't account for loss due to hardware or media failure
  Not an adequate long-term strategy for data of significant value

Strategies: Analog Backup
It is possible to transfer some of the content of a digital object into an analog format (e.g. printing to paper or microfilm)
Pros:
  Transfers preservation strategy or burden back to media with which we may have more experience
Cons:
  Not suitable for many kinds of digital objects
  Sacrifices enhanced operability of digital objects
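
Bitstream copying, as described above, means the duplicate must match the original byte for byte. A minimal sketch of a copy followed by that check is shown here, using only the Python standard library; the function name `bitstream_copy` and the example paths are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def bitstream_copy(source, destination):
    """Copy a file, then confirm the copy is byte-for-byte identical."""
    source, destination = Path(source), Path(destination)
    # Create the target folder (for example a remote storage mount) if needed.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    # shallow=False forces a comparison of file contents, not just size/date.
    if not filecmp.cmp(source, destination, shallow=False):
        raise OSError(f"Copy of {source} does not match the original")
    return destination
```

A call such as `bitstream_copy("masters/item0001.tif", "/mnt/offsite/masters/item0001.tif")` would raise an error rather than silently keep a bad copy. The check only confirms the copy was exact at the time it was made; it does not protect against later media failure.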

Strategies: Migration
Migration is to convert data from one technology to another while preserving the essential characteristics of the data.
Pros:
  Data storage keeps getting cheaper and more reliable
  We're all already doing it (anybody still got a 5.25-inch floppy drive?)
Cons:
  Each transfer provides another opportunity for error and data loss

Strategies: Data Recovery
Data Recovery is rescuing content from damaged media or hardware
  Usually performed by commercial vendors
  This is an emergency recovery strategy ONLY!
Pros:
  Results may be better than a total loss
Cons:
  Even if the data can be recovered, that doesn't mean that it will be renderable or understandable
  Almost no cultural heritage institution will be prepared to do this in house

Strategies: Preservation of Devices
Preservation of Devices is to preserve the historic technological environment
Pros:
  Better than total loss
Cons:
  May not be possible to preserve staff understanding of operating the preserved technology
  Saving hardware takes more storage than most of us have room for.

Strategies: Emulation
Emulation combines software and hardware to reproduce the essential characteristics of a different computer so that media designed for one environment can be used in another one.
Pros:
  You may be able to use files you wouldn't otherwise be able to
Cons:
  Recreating computing environments on different equipment is difficult and not particularly profitable, so the strategy may be unavailable when you need it

Strategies: Replication
Replication is keeping many copies of the same digital object, preserving copies variously, with the hope that one of them will still be viable when it is needed.
Pros:
  It works with popular publishing
Cons:
  Copies have to be kept in different places and preserved in different ways
  Interoperability standards become even more important for multiple participant success

Your strategy/strategies of choice right now?
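
With replication, someone eventually has to decide which of the many copies is still viable. One possible approach, sketched below, is to compare the copies against each other and flag any that disagree with the majority. This is an illustrative assumption, not a method named in the slides: the function name `audit_replicas` is hypothetical, SHA-256 is assumed as the comparison digest, and each file is read fully into memory for brevity.

```python
import hashlib
from collections import Counter
from pathlib import Path


def audit_replicas(paths):
    """Return (majority digest, sorted list of copies that disagree with it)."""
    # Assumption: files are small enough to read into memory in one go.
    digests = {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths
    }
    # The digest shared by the most copies is treated as the reference.
    consensus, _votes = Counter(digests.values()).most_common(1)[0]
    suspect = sorted(p for p, d in digests.items() if d != consensus)
    return consensus, suspect
```

Majority agreement is only a heuristic. If a checksum was recorded when the object was first ingested, comparing every copy against that stored value is the safer test, and a tie between copies cannot be resolved by voting at all.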

Digital Preservation Activities

Content Maintenance

Checksum Creation
checksum /ˈCHekˌsəm/ noun
  a digit representing the sum of the correct digits in a piece of stored or transmitted digital data, against which later comparisons can be made to detect errors in the data.

Open Archival Information System (OAIS)
  Information being maintained has been deemed to need long-term preservation
  There is a particular focus on digital information
  The OAIS is technologically agnostic
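
The checksum definition above describes a two-step routine: record a value when the file is known to be good, then recompute it later and compare. A minimal sketch of both steps for a folder of files follows. The slides do not name an algorithm, so SHA-256 is an assumption here, and the function names `create_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a file's checksum, reading in chunks to handle large files."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_manifest(directory):
    """Map each file's relative path to its checksum at this moment."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(directory, manifest):
    """Re-check files later; return {relative path: 'missing' or 'changed'}."""
    root = Path(directory)
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif checksum(target) != expected:
            problems[name] = "changed"
    return problems
```

An empty result from `verify_manifest` means every listed file still matches its recorded checksum. The manifest itself needs to be stored somewhere safe, ideally apart from the files it describes.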

Standards & Best Practices
  National Digital Stewardship Alliance
    http://www.digitalpreservation.gov/ndsa/
  SAA Standards Portal
    http://www2.archivists.org/standards
  LYRASIS Digital Tool Box
    http://bit.ly/1pmhp52

Useful Tools
  LOC Digital Preservation Tool Showcase
    http://www.digitalpreservation.gov/tools/
  FOSS4LIB
    https://foss4lib.org/

Decision Making Matrix

Tools
  Archivematica digital preservation system
  Access to Memory (AtoM) content management system
  BitCurator, BitCurator Access projects

Questions?

Contact Information
  Leigh A. Grinstead
  Digital Services Consultant
  Leigh.Grinstead@lyrasis.org