USGS Guidelines for the Preservation of Digital Scientific Data




Introduction

This document provides guidelines for use by USGS scientists, management, and IT staff in technical evaluation of systems for preserving digital scientific data. These guidelines will assist in selecting, specifying, building, operating, or enhancing data repositories. The USGS Fundamental Science Practices Advisory Committee Data Preservation Subcommittee developed these guidelines based on material from the Library of Congress-sponsored National Digital Stewardship Alliance (National Digital Stewardship Alliance, 2013).

This document does not cover additional non-technical issues such as preservation policies, funding, or organizational competency and longevity, which are critical for data preservation but beyond the scope of this document. More information about this topic can be found in Trustworthy Repositories Audit & Certification: Criteria and Checklist (OCLC-NARA, 2007) [1].

When considering how to preserve digital data, you should address these questions:

- Where are the data stored?
- Do you store a copy of the data off-site?
- How do you ensure the integrity of the data over time?
- What IT security features does USGS require for storing and accessing the data?
- What additional IT security features do you need?
- What metadata standard should be used to document the data?
- What sustainable file formats should be used for long-term storage?
- What are the applicable USGS rules regarding record retention periods for the data? [2]
- Who can you ask for assistance?

Table Element Definitions

Each row in the table below represents a technical element of digital data preservation.

Storage & Geographic Location: Storage systems, locations, and multiple copies to prevent loss of data.

Data Integrity: Procedures to prevent, detect, and recover from unexpected or deliberate changes to data.

Information Security: Procedures to prevent human-caused corruption of data, deletion, and unauthorized access.

Metadata: Documentation of the data to enable contextual understanding and long-term usability.

File Formats: File types, data structures, and naming conventions to aid long-term preservation and reuse.

Physical Media: Basic recommendations to reduce obsolescence risks that can threaten the readability of physical media.

Long-term and Sustainable Format Definitions

Long-term: A period of time long enough for there to be concern about the loss of integrity of digital information held in a repository, including deterioration of storage media, changing technologies, support for old and new media and data formats, and a changing user community. This period extends into the indefinite future.

Sustainable format: The ability to access an electronic record throughout its lifecycle, regardless of the technology used when it was originally created. A sustainable format is one that increases the likelihood of a record being accessible in the future.

Levels of Digital Preservation

For each element, the columns in the following table describe four levels of increasing assurance that digital data will be preserved. The table is based upon a left-to-right progression: each level adds requirements to the previous levels. To enhance an existing data repository, upgrade all elements to the same level. For the highest assurance of data preservation, specify all elements at Level Four.

The table can be used to evaluate a repository's compliance with the desired levels. For each cell in the table, assign the repository a grade of Pass, Incomplete, or Fail. Mark each cell (e.g. with green, yellow, or red backgrounds) to create a quick visual assessment. Pay attention to the lower levels marked Incomplete or Fail. If you are proposing improvements to an existing repository, you could create Before and After tables to highlight proposed or actual progress.

Level One: The minimum criteria and activities needed to maintain data through the life of a research project.

Level Two: Better. Implement Level Two elements after all Level One elements are in place.

Level Three: Even better. Implement Level Three elements after all Level Two elements are in place.

Level Four: Best. USGS should plan to provide repositories that meet these criteria for all long-term USGS records.

| ELEMENT | LEVEL ONE | LEVEL TWO | LEVEL THREE | LEVEL FOUR |
| --- | --- | --- | --- | --- |
| Storage and Geographic Location | Two complete copies stored physically separate from each other. Transfer the digital content from temporary media into an established storage system. Managed storage system in place. | Three complete copies. At least one copy in a different geographic location (offsite locations must follow NARA 1571 guidelines [3]). Document the storage system and storage media. | At least one copy in a geographic location with a different disaster threat (e.g. hurricane area versus an earthquake area). Maintain an obsolescence monitoring process for the storage system and media. | At least 3 copies in geographic locations with different disaster threats. Implement a comprehensive plan that keeps files and metadata on currently accessible systems and media. |
| Data Integrity | Verify checksums on ingest, if provided. Create checksums if not provided. Virus check all content. | Verify checksums on all data ingest. Use read-only procedures when working with original media. | Verify checksums at fixed intervals. Maintain logs of checksums and supply audit information on demand. Maintain procedures to detect corrupt data. Virus check all content. | Verify checksums of all content in response to specific events or activities. Maintain procedures to replace or repair corrupted data. Ensure no one person has write access to all copies. Create, store, and verify a second, different checksum for all content. |
| Information Security | Identify who has authorization to read, write, move, and delete individual files. Limit authorizations to individual files. | Document access restrictions for content. | Maintain logs of who performed what actions on files, including deletions and preservation actions. | Perform audit of logs. |
| Metadata | Inventory of content and its storage location. Ensure backup and physical separation of inventory information. Adhere to current USGS metadata standards. | Store all relevant database management information. Store information describing changes to the structure or format of the data, including time of occurrence. Provide access to all forms of the metadata. | Preserve standard technical, descriptive, and preservation metadata. | Same as Level Three. |
| File Formats | Encourage the use of a limited set of documented and open file formats, codecs, compression schemes, and encapsulation schemes. | Inventory the file formats in use. | Monitor file format obsolescence issues. | Perform format migrations, emulations (a virtual instance of a previous operating system or procedure), and similar activities. |
| Physical Media | Inventory all physical media utilized, including hard disks. | Develop a plan to utilize trade studies to evaluate media suitable for USGS purposes. Begin to transition away from all media utilized that are 10 years or more in age. | All non-recommended media have been properly disposed of following transition activities. | Base all media choices on trade studies. All information is migrated from older media to newer media every 3 to 5 years, including hard disks. |

Derived from: Library of Congress, National Digital Stewardship Alliance, NDSA Levels of Digital Preservation: Version 1, February 2013.

Roles and Responsibilities

The repository manager or project chief ensures that all of the table elements are addressed, though others may be responsible for implementation and operation (e.g. data managers, IT specialists).

Scientists and research staff can use these criteria to judge the suitability of a data repository for preserving data. Management can use these criteria for specifying or selecting data repositories. IT personnel can use these criteria for building, buying, enhancing, or operating data repositories.

Checksums

A checksum is a short mathematical digest of a file, which changes if any bit in the file changes. You use checksums to detect unexpected changes in file content. Federal agencies, including USGS, should use NIST-approved checksums for new systems. Currently, those are: SHA-224, SHA-256, SHA-384, and SHA-512. MD5 and SHA-1 checksums are widely used, but not approved for new systems. For more information, see http://csrc.nist.gov/groups/st/toolkit/secure_hashing.html.
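As an illustration only, and not part of the USGS guidelines themselves, the following Python sketch shows one way checksums might be created and later verified using the standard library's hashlib module. The function names file_digest and verify_manifest, and the manifest layout (a mapping of file path to recorded hex digest), are assumptions made for this example.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, algorithm="sha256"):
    """Return the paths whose current digest no longer matches the recorded one.

    `manifest` is a hypothetical mapping of path -> recorded hex digest.
    Missing or unreadable files are reported as failures too.
    """
    failures = []
    for path, recorded in manifest.items():
        try:
            current = file_digest(path, algorithm)
        except OSError:
            failures.append(path)
            continue
        if current != recorded.lower():
            failures.append(path)
    return failures
```

A repository could record file_digest(path) for each file on ingest, then run verify_manifest at fixed intervals or after specific events and log the result. Passing algorithm="sha512" on a second pass is one possible way to keep a second, different checksum for the same content.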

Assistance

Storage & Geographic Location: John Faundeen (faundeen@usgs.gov, 605-594-6092)

Data Integrity: Rex Sanders (rsanders@usgs.gov, 831-460-7555)

Information Security: See your local IT security point of contact.

Metadata: Vivian Hutchison (vhutchison@usgs.gov, 303-202-4227)

Additional USGS information and contacts can be found at http://www.usgs.gov/datamanagement/.

Other Recommendations

Individual scientists who want to improve the viable longevity of their personal archives could use the Library of Congress recommendations for Personal Archiving at http://www.digitalpreservation.gov/personalarchiving/.

Footnotes

[1] http://www.oclc.org/research/news/2007/03-12.html
[2] http://internal.usgs.gov/gio/irm/fmref2.html
[3] http://www.archives.gov/foia/directives/nara1571.pdf