Multiple Digital Content Types in a Single Collection. Dina Sokolova and Jane Gorjevsky, Columbia University



Digitized
Digital-born records (modern and legacy)
Delivered-digital
Harvested online materials

Acquisition procedures
Hardware and software
Sorting and weeding workflow
Metadata capture and enhancement
Preservation routines for various content types and file formats
Finding aids for hybrid digital/analog collections
Tiered access system

Creators/Donors
Users
Copyright advisory office
Curating Unit (RBML)
Original and Special Materials Cataloging (OSMC)
Preservation and Digital Conversion Division (PDCD)
Libraries Information Technology Office (LITO)
Libraries Digital Programs Division (LDPD)

Permanently preserve IFP paper and electronic records
Provide access to IFP digital archives based on three types of user access:
  publicly accessible online
  viewable onsite only
  embargoed until 2075
Make IFP materials discoverable via OPAC, EAD finding aid and a web interface
Funded by Ford Foundation grant, October 2011
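The three access types above can be read as a simple decision rule. The following is a minimal sketch, not the project's actual access system: the names `AccessTier` and `can_view` are hypothetical, and the exact embargo date (1 January 2075) and the idea that embargoed items open to everyone afterwards are assumptions, since the slides give only the year.

```python
from datetime import date
from enum import Enum


class AccessTier(Enum):
    PUBLIC_ONLINE = "publicly accessible online"
    ONSITE_ONLY = "viewable onsite only"
    EMBARGOED = "embargoed until 2075"


# Assumption: the embargo lifts on 1 January 2075 (the slides state only the year).
EMBARGO_LIFTS = date(2075, 1, 1)


def can_view(tier: AccessTier, onsite: bool, today: date) -> bool:
    """Return True if a user may view an item in the given access tier."""
    if tier is AccessTier.PUBLIC_ONLINE:
        return True
    if tier is AccessTier.ONSITE_ONLY:
        return onsite
    # Embargoed items stay closed to everyone until the embargo date.
    # Assumption: they become viewable to all once the embargo lifts.
    return today >= EMBARGO_LIFTS
```

For example, `can_view(AccessTier.ONSITE_ONLY, onsite=False, today=date.today())` is `False`, while the same call with `onsite=True` is `True`.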

Offered fellowships for post-graduate study to more than 4,300 people via offices in 22 countries, with overall program management by the Secretariat in New York, 2001-2013

Paper and digital records from 22 international partner organizations, the New York Secretariat and CHEPS (Center for Higher Education Policy Studies)
Materials include:
  Office documents
  Time-based (audio and video) materials
  Databases
  Email correspondence
  Websites
  Academic and personal records of fellows
  Surveys, interviews and statistical reports
  Datasets
3.6 TB of electronic materials in PC and Mac formats

Record surveys (2010, 2012)
Selection and sorting guidelines
Transfer instructions and tools
Record samples
Additional file/folder naming, selection and sorting recommendations

CUL program administered by OSMC using archive.org toolset
RBML:
  Verifies URLs
  Provides metadata
  Specifies capturing frequency
  Monitors captures
  Adds descriptive metadata

Hard drives or flash drives
  RBML:
    Received
    Accessioned
    Housed
    Labeled
    Forwarded to LDPD for transfer
  LDPD:
    Data transferred to server
    Inventory and initial ingest report generated
Obsolete media
  RBML:
    Found
    Inventoried using template
    Separated
    Forwarded to LDPD for transfer
  LDPD:
    Data transferred to server
    Initial ingest report generated
Original media returned to RBML, shipped offsite
Selection and weeding using digital forensics tools (LDPD, RBML)

Most materials in English
Pre-selected and sorted into 3 access categories
No access to embargoed files until 2075
Full list of fellows and their consent status provided
Limited number of file formats
Sensitive information in paper format only
No obsolete media

About 350,000 files in 245 formats, 10 languages, 7 non-roman character sets
Long filenames/file paths (> 260 characters)
Compressed and password-protected files
Variety of transfer media (hard and flash drives, DVDs, floppy disks, ZIP disks, DV tapes) in need of digital conversion:
  Standards
  Vendor communications
  Quality control
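Long paths and non-roman filenames of the kind listed above can be flagged before transfer with a short scan. This is an illustrative sketch only, not a tool named in the slides: `path_issues` and `scan` are hypothetical names, the 260-character threshold is taken from the slide, and "non-ASCII" is used as a rough proxy for "non-roman" (it also catches accented Latin characters).

```python
import os

MAX_PATH = 260  # threshold cited on the slide


def path_issues(path: str, limit: int = MAX_PATH) -> list[str]:
    """Return the reasons (possibly none) why a path may need attention."""
    issues = []
    if len(path) > limit:
        issues.append("long path")
    # Rough proxy: any non-ASCII character in the file name is flagged.
    if not os.path.basename(path).isascii():
        issues.append("non-ASCII name")
    return issues


def scan(root: str, limit: int = MAX_PATH) -> dict[str, list[str]]:
    """Walk a directory tree and map each problematic file path to its issues."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            issues = path_issues(full, limit)
            if issues:
                report[full] = issues
    return report
```

Running `scan` on a transfer drive before copying gives a list of files that may fail to copy or display correctly on the processing workstation.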

Selection and sorting by creators unreliable
Personally Identifiable Information
Privacy and confidentiality concerns vary by country
Growing complexity of access needs
Manual item-level content appraisal for unrestricted category
Initial access assumptions insufficiently restrictive

File/directory names - the only source of descriptive item-level metadata:
  Non-roman character sets:
    IFP\...\??????????\????????????.jpg
    IFP\...\ \.doc
  Long filenames/file paths:
    IFP\Newsletter\Alumni Meeting\... \...\...\Fifth meeting October 23-28, 2008\Agenda\IFP Assembly\Other\07.jpg
  Foreign languages:
    IFP\...\...\Foto bersama usai sidang kongres Perhimpunan Pelajar Indonesia Australia di Balai Kartini Gedung KBRI Canberra, 2012.jpg
    (A group photograph of Indonesian students taken after the congress in front of the Indonesian Embassy in Canberra, Australia, 2012)

Curatorial surveys (pre-acquisition)
Record transfer documentation
Accessioning workflow
Weeding routines
File format action plan
Pre-processing and ingest workflows

Processing workstation: FRED (Forensic Recovery of Evidence Device) and Apple Mac computer

ClamWin (ClamXav): initial virus check

makeinventory program: verifying content integrity
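The slides do not describe how the makeinventory program works internally. As a general illustration of integrity checking, the sketch below records a SHA-256 checksum per file and compares two inventories taken at different times; the function names are hypothetical and the choice of SHA-256 is an assumption.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    inventory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            inventory[os.path.relpath(full, root)] = sha256_of(full)
    return inventory


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or altered between two inventories."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

An inventory built on the donor media and another built after transfer to the server should produce an empty `changed_files` list if the content arrived intact.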

AccessData FTK Imager: creating disk images (CDs/DVDs, ZIP and Floppy Disks)

Forensic Toolkit (FTK): content review and weeding

Forensic Toolkit (FTK): finding duplicate content
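Duplicate detection of this kind is commonly done by grouping files that share a content hash. The sketch below shows that general technique in Python; it is not FTK's implementation, and `find_duplicates` is a hypothetical name. It reads each file fully into memory, which is acceptable for a small demonstration but not for multi-gigabyte media files.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root: str) -> list[list[str]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(full)
    # Keep only the groups that contain more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a candidate for weeding: one copy is kept and the others are reviewed for removal.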

SIPs for each office are based on access restrictions (Unrestricted, Onsite, Restricted)
Aid4Mail e-discovery Archivist: converting email from multiple formats (eml, mbx, msg, pst, sbd, Pegasus mail) to MBOX
Database Preservation Toolkit: converting Microsoft Access databases to XML format
Original formats: SQL-based databases, statistical datasets (sav, spss)
External Vendor: converting content of commercially produced video DVDs, audio CDs, and mini DV-tapes to preservation formats
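The project used Aid4Mail for email conversion. For the simplest of the listed cases (a folder of `.eml` files), the same kind of normalization to MBOX can be sketched with Python's standard `mailbox` module. This is an illustration under stated assumptions, not the project's workflow: `eml_to_mbox` is a hypothetical helper, and proprietary formats such as pst or msg are out of its scope.

```python
import email
import mailbox
from pathlib import Path


def eml_to_mbox(eml_dir: str, mbox_path: str) -> int:
    """Append every .eml file in eml_dir to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for eml in sorted(Path(eml_dir).glob("*.eml")):
            with open(eml, "rb") as handle:
                message = email.message_from_binary_file(handle)
            box.add(message)
            count += 1
    finally:
        box.close()  # flushes pending changes to disk
    return count
```

Calling `eml_to_mbox("office_mail", "office_mail.mbox")` (both names are placeholders) would produce a single MBOX file that can then be packaged with the rest of the office's SIP.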

Archivematica: digital preservation

Descriptive metadata: makeinventory program, Archivematica
Technical metadata: makeinventory program, FTK, Archivematica
Preservation metadata: Archivematica
Rights metadata: Archivematica

Digital Content Areas:
  Electronic records
  Research data
  Websites
  Audiovisual materials
Organizational Roles, Policies, and Practices:
  Education of content creators
  Large amount of complex data
  Rights management
  Security and compliance policies
Technical Infrastructure Development:
  Integration of digital forensics tools
  Ensuring content integrity
  File format action plan development