How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives



Similar documents
How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives

ALEX THURMAN. PCC Participants Meeting ALA Annual June 30, Columbia University Libraries

Progress Made and Lessons Learned through Collaborative Web Archiving Projects

Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collecting Composers' Websites

Archiving the Internet

Scholarly Use of Web Archives

INTRODUCTION. by Kristine Hanna Director of Archiving Services at the Internet Archive

Web and Twitter Archiving at the Library of Congress

Growing a web archiving program: A case study for evolving an organization-management plan

Practical Options for Archiving Social Media

Content Management Systems: Who Makes the Rules?

Web Archiving and Scholarly Use of Web Archives

Using Dataverse Virtual Archive Technology for Research Data Management. Jonathan Crabtree Thu-Mai Christian Amanda Gooch

web archives & research collections

Final Report April 12, 2013

Local Loading. The OCUL, Scholars Portal, and Publisher Relationship

Leveraging SharePoint for Library Services - F6

POSITION DETAILS. Digitisation & Digital Services

Best Practices for Data Management. September 24, 2014

How To Useuk Data Service

Web Site Collection Plan. for. Michigan State University Archives & Historical Collections. May 21, 2015

Building a master s degree on digital archiving and web archiving. Sara Aubry (IT department, BnF) Clément Oury (Legal Deposit department, BnF)

The Institutional Repository at West Virginia University Libraries: Resources for Effective Promotion

Emerging Career Trends for Information Professionals: A Snapshot of Job Titles in Summer 2013

DSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories

Advanced Archive- It Applica2on Training: Archiving Social Networking and Social Media Sites

Judy Jeng, Ph. D. International Conference of Digital Archives and Digital Humanities Taipei, Taiwan December 1-2, 2009

How To Harvest For The Agnic Portal

Oh My Blawg! Who Will Save the Legal Blogs? *

How To Use The Internet Archive For A Library Collection

Anne Karle-Zenith, Special Projects Librarian, University of Michigan University Library

Recommendations of the Campus Data Management and Curation Advisory Committee

Web Archiving Tools: An Overview

Archives & Records Administration Tier 2 courses waived.

Functional Requirements for Digital Asset Management Project version /30/2006

Digital Preservation and Access - The 5 Opens Project

Indiana University s Library focusing on the Online Archive Kelsey Bawel

Archive-IT Services Andrea Mills Booksgroup Collections Specialist

How To Manage Pandora

Data Management Planning Tool User Guide

OCLC CONTENTdm. Geri Ingram Community Manager. Overview. Spring 2015 CONTENTdm User Conference Goucher College Baltimore MD May 27, 2015

New Technologies and the Convergence of Libraries, Archives, and Museums

Best Practices for Research Data Management. October 30, 2014

Checklist for a Data Management Plan draft

Oxford University Press Online Resources. Tracy James Library Sales Executive Oxford University Press

Columbia University Libraries / Information Services

Research Data Management Guide

CDL s OAI Harvest Architecture Development

Facilitating the discovery of free online content: the librarian perspective. June Open Access Author Survey March

Web Archiving at BnF September 2006

Digital Asset Management Developing your Institutional Repository

Archiving the Social Web MARAC Spring 2013 Conference

Digital Asset Management DAM + Image Intellectual Property Management IIPM. MCN 2010 Alan Newman, National Gallery of Art

Statement of Work (SOW) for Web Harvesting U.S. Government Printing Office Office of Information Dissemination

Melvin P. Thatcher FamilySearch Utah, USA

Managing explicit knowledge using SharePoint in a collaborative environment: ICIMOD s experience

Jean Sykes The UK Research Data Service Project

The Center for Research Libraries

DIGITAL OBJECT an item or resource in digital format. May be the result of digitization or may be born digital.

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014

A Unified, Standards-Compliant System for Describing Archives and Manuscript Collections

Navigating An Introductory Guide for Librarians

General concepts: DDI

ERA Challenges. Draft Discussion Document for ACERA: 10/7/30

Cambridge University Library. Working together: a strategic framework

Library-Based Publishing Using Open Journal Systems

OpenAIRE Research Data Management Briefing paper

Michele Kimpton, Internet Archive. April 7, Written Response to Section 4, Section 108

Collaborative Open Source Software Production & APIs. Tom Cramer Stanford

State Library of North Carolina Library Services & Technology Act Grant Projects Narrative Report

THE WEB ARCHIVING LIFE CYCLE MODEL

Building Australia s eresearch Capability: the challenge of data management. Adrian Burton and Margaret Henty

View from the Coalface: experiences of digital collection management

WEB ARCHIVING AT SCALE

Yale University. Report to the Digital Library Federation October 2004 Volume 5, Number 1. Fall

USER EDUCATION: ACADEMIC LIBRARIES

Multiple Digital Content Types in a Single Collection. Dina Sokolova and Jane Gorjevsky, Columbia University

Archiving Systems. Uwe M. Borghoff Universität der Bundeswehr München Fakultät für Informatik Institut für Softwaretechnologie.

UNIVERSITY OF SUSSEX. 1 Advertisement Ref: 861

Ranking Web of repositories: End users point of view?

HP Records Manager. Release Notes. Software Version: 8.1. Document Release Date: June 2014

Policies of Managing Web Resources at the Canadian Government: A Records Management Perspective

Collecting and Providing Access to Large Scale Archived Web Data. Helen Hockx-Yu Head of Web Archiving, British Library

How To Use The Web Curator Tool On A Pc Or Macbook Or Ipad (For Macbook)

University-wide Data Management Services: Cross-campus Collaborations at Indiana University

1.00 ATHENS LOG IN BROWSE JOURNALS 4

WEB ARCHIVING IN THE UNITED STATES: A 2013 SURVEY AN NDSA REPORT

WHY DIGITAL ASSET MANAGEMENT? WHY ISLANDORA?

Archive I. Metadata. 26. May 2015

Databases & Data Infrastructure. Kerstin Lehnert

Beth Bernhardt Assistant Dean for Collection Management and Scholarly Communications UNC Greensboro. Anna Craft Metadata Cataloger UNC Greensboro

Working with the British Library and DataCite Institutional Case Studies

The FAO Open Archive: Enhancing Access to FAO Publications Using International Standards and Exchange Protocols

Institutional Repositories: Staff and Skills requirements

The NDSA Content Working Group s National Agenda Digital Content Area Discussions: Web and Social Media

Administrative Manual

Summary of Critical Success Factors, Action Items and Performance Measures

Documenting the research life cycle: one data model, many products

Plagiarism. Dr. M.G. Sreekumar UNESCO Coordinator, Greenstone Support for South Asia Head, LRC & CDDL, IIM Kozhikode

The New ADS Search Interface and API

Transcription:

How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives Anna Perricci Columbia University Libraries Best Practices Exchange November 14, 2013

Overview Web archiving context (Who) Benefits of curated web archives (Why) Columbia University Libraries Web Resources Collection Program (What/How) o Background o Selection o Permissions o Harvesting o Description o Access

Andrew W. Mellon Foundation support for CUL web archiving Grant projects Collection Building for Web Resources (2008 2009) 1 FTE: project librarian Web Resources Collection Program Development (2009 2012) 3 FTE: 2 web curators, 1 programmer Web Resources Archiving Collaboration (2013 2015) 2 FTE: 1 project librarian, 1 bibliographic asst

Avery Library Historic Preservation and Urban Planning 60+ websites, semiannual Burke Library New York City Religions 225 websites, semiannual Human Rights 547 websites, quarterly Rare Book and Manuscript Library 30 websites, semiannual University Archives most of columbia.edu domain, plus 75+ affiliated sites with "external" URLs General CUL Web Archive Collections

CUL s Archive It space: http://www.archive it.org/organizations/304

Elements of CUL s web archiving workflows Selection or nomination Archive It for scoping, crawls, access, storage & QA Cataloging workflows/output Firm policy on permissions Project management tools for growing number of collaborations Workflows can vary a little between collections

Archive It provides some essential services and thereby lets CUL focus on curation, cataloging and collaboration building

Archive It s responsiveness to CUL s requests Metadata: custom field functionality added to metadata template & all fields made repeatable Pilot projects: options related to ignoring robots.txt, downloading WARC files and export metadata as XML, improvements to capture for architecture/urban planning sites via Wayback QA tools

Challenges: using QA tools (media files & images)

Description and access for archived websites Archive it.org site level metadata (All thematic collections, DCMI, copied from MARC records if possible) CLIO collection level MARC records (Human rights, Avery, Burke) CLIO site level MARC records (Human rights, Avery) Document level MARC records (selected longish Avery collection reports, pre existing IRCR records) Human Rights Web Archive portal on CUL website (using metadata extracted from MARC records)

CLIO public view of MARC collection level record

Permissions No explicit US Copyright Act libraries exception for web archiving CUL policy is to request permission from website owners to harvest their websites and provide access to archived versions o Permission request email sent to contact info from website o If no response after 3 weeks, follow up request with notification of intent to archive website Statistics o 918 requests sent o 437 responded Yes o 5 responded No o 415 did not respond

Web Resources Archiving Collaboration Many thanks to the Mellon Foundation Building collaborations among The web archiving community Other research libraries Users and potential users of web archives Site creators

Incentives grants to advance web archiving tools Image source: http://imgur.com/gallery/vg7ke48

Building an efficient, coherent, and scalable national framework for collecting web content Image credit: http://imgur.com/gallery/1m5mbkf

Designated space for collaborative collecting

Collaborative collection experiment in progress via partner institutions in Borrow Direct

Collaboration with music librarians

Contemporary composers the perfect storm?

Collection Development (test case)

Project tracking

Use cases

Who are the web archives for? Are they being used? Could we encourage more effective use?

Using the Human Rights Web Archive & learning from human rights scholars work (publications, citations)

http://hrwa.cul.columbia.edu

Columbia University web resources: creating best practices for site creators

Best Practices for site creators: working with website creators Image credit: http://imgur.com/gallery/nwj12pl

Open issue: division and maintenance of cooperative efforts (communication, software and more)

Process over next 18 months Planning, needs assessment (interviews) Group communication Ongoing growth (scale of collecting and distribution of effort) Planning for sharing costs 5 year plan for Borrow Direct collaborations

Leveraging expertise & devising ways to share responsibilities Collecting strategy and establishment of priorities for collection development Suggest seed URLs Liaise with site owners to solicit permission to archive websites Conduct detailed quality assurance by browsing the archived website as a user would (e.g. try to access media files to ensure they have been successfully captured) Assessment of efficacy for users?

Leveraging expertise: web archivists Capture web archives & do initial quality assurance of a crawl Coordinate efforts and field questions regarding technical elements of web archiving existing CUL web archiving policy permissions processing needs assessment user profiles and use cases value and usage assessment? (someday)

Larger questions to return to periodically (and maybe discuss today!) Do you have any ideas about what users would want to find in web archives or use cases? Which websites do you and your colleagues consider indispensable for your day to day work? Are there any websites in your institution's domain that should be harvested and saved (e.g. web pages created by or about faculty members, academic programs or student groups)? Are there any types of sites that would certainly be out of scope or have proven to be out of scope? Is there anything impeding the use of web archives we are creating?

Takeaways Image credit: http://www.thebraiser.com/wp content/uploads/2012/12/chinese.jpg

The future is bright! Image credit: Flickr user: Tambako the Jaguar (CC BY ND 2.0)

Opportunities for discussion: Today and on an ongoing basis Image credit: http://www.idocampaigns.com/editor/assets/istock_000014314309_large.jpg

Thank you! Anna Perricci alp2198@columbia.edu