Columbia University Digital Library Architecture. Robert Cartolano, Director Library Information Technology Office October, 2009





Agenda
- Technology Architecture
- Off-site NYSERNet Facility
- Ingest, curation and support tools
- Academic Commons - DSpace to Fedora Migration

Goal
- Scale up efforts to catalog, digitize & publish to the Web unique, distinctive collection holdings that have significant value for teaching or research.
- Design & implement a coherent & comprehensive preservation program for ensuring the survival & continued accessibility of the Libraries' digital content.
- Develop & budget for a long-term digital archiving strategy for content created by the Libraries, whether born-digital or converted from analog formats.
- Collaborate with other stakeholders to develop affordable cooperative solutions to ensure long-term preservation of licensed content.
Source: Libraries Strategic Plan, 2006-2009

Who
Columbia University Libraries/Information Services
- Digital Program and Technology Services
- Center for Digital Research and Scholarship
- Center for New Media Teaching and Learning
- Copyright Advisory Office
- Digital Preservation and Conversion
- Libraries Digital Program Division
- Library Information Technology Office
Columbia University Libraries: http://www.columbia.edu/library/

Technical Team
Ben Armintor, Terry Catapano, Robert Cartolano, Stephen Davis, Jack Donovan, Sarah Holsted, Risa Karaviotis, Rebecca Kennison, Stuart Marquis, Nada O'Neal, Alberto Ortiz, Patricia Renfro, James Stuart

Questions in 2007
- Build a single consolidated system?
  - Digital Library Collections
  - DSpace Academic Commons (institutional repository)
  - Long-term Archive
- What is the storage impact of large-scale digitization?
- How many copies?
- What level of offsite storage?
- What are the budget requirements?

Requirements
- Stable, secure storage for large-scale access & long-term preservation
- Support efficient creation & management of administrative, descriptive, structural, preservation & rights metadata
- Support object relationships, actions, behaviors
- Support fine-grained access control policies
- Administrative tools (e.g., statistics, reporting)

Key Decision Points
- Build an integrated system to support:
  - Digital Library Collections
  - Academic Commons (institutional repository)
  - Long-term Archive
- Use Fedora 3 as platform
- Two copies on disk, two copies on tape
- Offsite storage
- Scalable storage
- Central IT supported infrastructure
- Sustainable funding

Technology Approach
- Risk averse: use tried and true technologies
- Use mature, commodity products as much as possible
- Choose a first-tier vendor (Sun) with sustainable support models, proven reliability, stability
- Open to maximize sustainability and flexibility
  - Open Data, Open Formats, Open Source, Open Protocols, Open Community
- Entrance and Exit Strategy
Source: Hancock, Mara, UC Berkeley, New Structures and Efficiencies, Exploring New Potential Collaborations in the Field, http://opencontent.ccnmtl.columbia.edu/presentations/mhancock.html

Technology
Storage Technology: Sun Storage Archive Manager (SAM)
- Policy-based, tiered storage approach
- Released as open source
- TAR archive format: no proprietary archive format on disks
- Commercial support from Sun
Access Method
- NFS access for application use
Solaris 10, ZFS
- Highly reliable, scalable file system model
- Released as open source
- Commercial support from Sun
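The practical value of the TAR point above is that archived content can be read back with ordinary tools, independent of the storage vendor. The following is a minimal sketch using only Python's standard `tarfile` module. It does not reproduce SAM's actual on-media layout; the function names and the sample file name are hypothetical and chosen for illustration only.

```python
import io
import tarfile


def make_tar(files):
    """Pack a {name: bytes} mapping into an uncompressed TAR archive (in memory)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar(blob):
    """Read every regular file back out of a TAR archive using only the stdlib."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out


if __name__ == "__main__":
    # Hypothetical object name; any byte content works.
    archive = make_tar({"collection/item0001/master.tif": b"example bytes"})
    print(sorted(read_tar(archive)))
```

Because nothing here depends on vendor software, the same read path would still work after an exit from the current storage platform, which is the point of the "no proprietary archive format" requirement.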

Technology
Sun Storage Archive Manager (SAM) Platform
- 70TB effective storage, expandable to 400TB
- Policy-based, tiered storage, commercial support
- Two front-end SAM T5240 Solaris servers
- Tier I Disk Cache: 9.6 terabytes (TB), expandable to 60TB
- Tier II Disk Storage: 192TB raw, mirrored for 70TB net storage
- Tier II Tape Storage: 80TB, expandable to 460TB
- Four copies (2 disk, 2 tape) for preservation
- Open as possible to maximize sustainability

Storage Architecture
[Diagram: Columbia Digital Library Applications and Columbia Academic Commons Applications sit on top of the Fedora Servers, which use the SAM Servers and Disk Cache. SAM maintains four copies:]
- Copy 1: On-Site Disk, Campus Data Center
- Copy 2: On-Site Tape, Campus Data Center
- Copy 3: Off-Site Disk, NYSERNet Data Center
- Copy 4: Off-Site Tape, Off-line, Off-Site Facility

Offsite: NYSERNet Data Center
- Colocation facility in Syracuse, NY
- 24x7x365 support
- Battery/Diesel Backup
- Dual-Power Grid
- High Speed Fiber (adjacent to NYSERNet POP)
- New Machine Room
NYSERNet Data Center Overview: http://www.nysernet.org/services/bcc/

Storage Architecture
[Diagram: two sites connected by the Campus Private Network at 1 Gigabit/sec.]
Columbia Data Center, New York, NY
- Sun T5240 SAM Metadata Servers
- 10TB FC Disk Storage
- Copy 1: 70TB SATA Onsite Disk Storage
- Copy 2: 80TB Onsite Tape
- Copy 4: Offline Tape to Offsite Facility
NYSERNet Data Center, Syracuse, New York
- Copy 3: 70TB SATA Offsite Disk Storage
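The four-copy layout can be written down as data and checked against the "two copies on disk, two copies on tape, offsite storage" decision. This is a minimal sketch, not part of the SAM configuration; the `Copy` class, `count_by`, `meets_policy`, and the "at least one offsite copy" threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    number: int
    medium: str    # "disk" or "tape"
    site: str      # "onsite" or "offsite"
    location: str


# Copy layout as listed on the storage architecture slides.
COPIES = [
    Copy(1, "disk", "onsite", "Campus Data Center"),
    Copy(2, "tape", "onsite", "Campus Data Center"),
    Copy(3, "disk", "offsite", "NYSERNet Data Center"),
    Copy(4, "tape", "offsite", "Off-line, Off-Site Facility"),
]


def count_by(copies, attr):
    """Count copies grouped by one attribute, e.g. 'medium' or 'site'."""
    counts = {}
    for c in copies:
        key = getattr(c, attr)
        counts[key] = counts.get(key, 0) + 1
    return counts


def meets_policy(copies, per_medium=2, min_offsite=1):
    """True if there are enough disk copies, tape copies, and offsite copies.

    The thresholds are assumptions for illustration: two per medium comes
    from the slides, the offsite minimum is a hypothetical parameter.
    """
    media = count_by(copies, "medium")
    sites = count_by(copies, "site")
    return (
        media.get("disk", 0) >= per_medium
        and media.get("tape", 0) >= per_medium
        and sites.get("offsite", 0) >= min_offsite
    )


if __name__ == "__main__":
    print(count_by(COPIES, "medium"), meets_policy(COPIES))
```

A check like this is mainly useful as a way to state the policy unambiguously, for example when asking what happens if one tape copy is lost.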

Fedora Platform and Library/Information Services
[Diagram: layered view of the platform.]
- Public facing: Columbia Digital Library, Columbia Academic Commons
- Fedora Asset Repository & Long-Term Archive
- Library facing: Internal Workflow Management Systems, Internal Data Management Systems

Fedora Platform and Library/Information Services: examples
[Same diagram, with example callouts for each component.]
- Columbia Digital Library, for example:
  - Digitized collections
  - Born-digital collections (e.g., spatial data)
  - Online exhibitions
- Columbia Academic Commons, for example:
  - Columbia-produced content
  - Rich collaboration spaces
  - Faculty profiles
- Internal Workflow Management Systems, for example:
  - Hypatia, batch ingest tools
  - Digitization workflow
  - Preservation workflow
  - Online exhibition workflow
- Internal Data Management Systems, for example:
  - Backup
  - Data migration
  - SAM-FS

Open Source Benefit
- Columbia Library/IS staff added a capability to Fedora to accept content via a locally mounted file system
- Provides better integration with SAM-FS
- Patch to be incorporated into Fedora 3.3
- Benefit of the open source approach: make a change to meet local requirements, with benefit to the larger community
- FCREPO-453 - Allow the retrieval of content via the file URI scheme: https://fedora-commons.org/jira/browse/fcrepo-453
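To illustrate what retrieving content "via the file URI scheme" involves, here is a minimal sketch that turns a `file:` URI into a local path and refuses anything outside an allowed mount point. This is not the Fedora patch (Fedora is a Java application and its actual implementation is not shown in these slides); `resolve_file_uri` and `allowed_root` are hypothetical names, and POSIX-style paths are assumed.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_file_uri(uri, allowed_root):
    """Map a file: URI to a local path, restricted to allowed_root.

    allowed_root stands in for a locally mounted file system (for example
    an NFS mount); it is a hypothetical parameter for this sketch.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    if parsed.netloc not in ("", "localhost"):
        raise ValueError(f"remote host in file URI: {parsed.netloc!r}")

    # Decode percent-escapes, then normalise so '..' cannot escape the root.
    path = Path(unquote(parsed.path)).resolve()
    root = Path(allowed_root).resolve()
    if path != root and root not in path.parents:
        raise ValueError(f"{path} is outside {root}")
    return path


if __name__ == "__main__":
    import tempfile

    root = tempfile.mkdtemp()
    uri = Path(root, "item 0001.tif").as_uri()
    print(uri, "->", resolve_file_uri(uri, root))
```

Restricting resolution to a configured root is a design choice for the sketch: accepting arbitrary local paths from ingest requests would let a request read any file the server process can see.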

Progress To Date: July thru December 2008
- Purchased hardware
- Completed initial hardware installation
- Sun professional services, training
- Finalized Fedora software plan and configuration
- Completed initial Fedora installation
- Evaluated multiple tools
- Revised technology roadmap

Progress To Date: January thru September 2009
- Implemented Academic Commons in Fedora
- Migrated from DSpace to Academic Commons
- Built Hypatia cataloging tool for Columbia needs
- Designing metadata and content models in coordination with Cornell
- Inventoried and stabilized legacy digital content from hard drives and CDs to staging storage (approx. 8 terabytes)
- Began metadata remediation of legacy digital assets
- Began batch ingest of Digital Collections for long-term archive
- Developed initial requirements for long-term archive

Curation
[Figure: curation approaches arranged along an effort axis from Low to High. Ingest rate runs from Fast to Slow, complexity from Simple to Complex, and cost from $ to $$$. Approaches shown: None, Automated, Self-Service Cataloging, Assisted Cataloging, Full-Service Cataloging. Academic Commons and Digital Library are placed on this spectrum.]
Derived from: Goble, Carole, University of Manchester, Curating Services and Workflows: the Good, the Bad and the Ugly, http://www.slideshare.net/carolegoble/dcc-keynote-2007/

Hypatia - Ingest Tool
- Developed by Columbia Library/IS staff
- Enables non-programmers to create input forms and workflows for metadata schemas and then catalog items in a secure, controlled environment.
- Supports multiple projects, collections, workflows, with secure, granular access controls, in "Hypatia Spaces."
- Initial support for Academic Commons:
  - Assisted curation support for Library/IS staff
  - Faculty self-service deposit

Hypatia - Ingest Tool
[Workflow diagram: Describe, Deposit, Disseminate. Legible actions: Users: Submit, Withdraw. Admin: Reject with Comments. Admin: Approve / Un-Approve. Admin: Commits. The remaining labels in the diagram did not survive text extraction.]
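The legible actions in the workflow diagram (Submit, Withdraw, Reject with Comments, Approve / Un-Approve, Commits) suggest a small state machine. The sketch below is one possible reading, not Hypatia's implementation: the state names (`draft`, `submitted`, `approved`, `committed`), the transition table, and the `advance` function are all assumptions made for illustration.

```python
# Hypothetical transition table: (current state, role, action) -> next state.
# Only the action names come from the diagram; the states are assumed.
TRANSITIONS = {
    ("draft", "user", "submit"): "submitted",
    ("submitted", "user", "withdraw"): "draft",
    ("submitted", "admin", "reject"): "draft",
    ("submitted", "admin", "approve"): "approved",
    ("approved", "admin", "unapprove"): "submitted",
    ("approved", "admin", "commit"): "committed",
}


def advance(state, role, action):
    """Return the next state, or raise ValueError if the move is not allowed."""
    try:
        return TRANSITIONS[(state, role, action)]
    except KeyError:
        raise ValueError(
            f"{role!r} may not {action!r} an item in state {state!r}"
        ) from None


if __name__ == "__main__":
    state = "draft"
    for role, action in [("user", "submit"), ("admin", "approve"), ("admin", "commit")]:
        state = advance(state, role, action)
        print(role, action, "->", state)
```

Keeping the rules in a single table makes it straightforward to give each "Hypatia Space" its own workflow, which matches the slide's description of configurable workflows with granular access controls.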

Academic Commons
http://academiccommons.columbia.edu
- Migrated from DSpace to Fedora for Fall 2009
- Open access to DSpace and Fedora backends was very helpful
- Custom export/import code written to move data in exactly the format we wanted, to meet Columbia metadata requirements
- Multiple iterations and extensive testing
- Matches existing public interface, with enhancements
- Provides foundation to rapidly increase deposits

Academic Commons

Next Steps
Application Development
- Develop and implement staff digital collection viewer
- Continue Academic Commons development
- Continue Hypatia development
- Expand to support additional media types (e.g., video)
Research and Development
- Investigate Fedora administration tools
- Investigate media servers (e.g., djatoka JPEG2000 server)
- Develop broad strategy for persistent identifiers (e.g., handles)
- Investigate advanced search and discovery for Academic Commons (e.g., Blacklight evaluation)

Final Thoughts
- Team collaboration across disparate groups
- Technology platform is working well for our needs
- Success with Fedora 3 platform
- Importance of communications and awareness building