Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collecting Composers' Websites



Similar documents
Progress Made and Lessons Learned through Collaborative Web Archiving Projects

How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives

How collaboration can save [more of] the web: recent progress in collaborative web archiving initiatives

PASIG San Diego March 13, 2015

How To Use The Internet Archive For A Library Collection

Final Report April 12, 2013

ERA Challenges. Draft Discussion Document for ACERA: 10/7/30

Texas State University University Library Strategic Plan

Building a master s degree on digital archiving and web archiving. Sara Aubry (IT department, BnF) Clément Oury (Legal Deposit department, BnF)

31 December Dear Sir:

Stewarding Big Data: Perspectives on Public Access to Federally Funded Scientific Research Data

The ISPS Data Archive: Mission, Work, and Some Reflections

DATA CITATION. what you need to know

Archiving the Social Web MARAC Spring 2013 Conference

WEB ARCHIVING AT SCALE

The Institutional Repository at West Virginia University Libraries: Resources for Effective Promotion

INFORMATION SYSTEMS & HIGHER EDUCATION. Steve Kutay Digital Services Librarian Oviatt Library CSUN

Library Strategic Planning

Canadian National Research Data Repository Service. CC and CARL Partnership for a national platform for Research Data Management

RESEARCH LIBRARY VIRTUAL RESOURCES & INSTRUCTIONAL INITIATIVES SUBTOPIC: SPECIALLY ASSIGNED LIBRARIANS

Queen s Open Journal System (OJS) Business Case

The University of Queensland Library Your Partner in Scholarship STRATEGIC PLAN

DSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories

Entering its Third Century

Brown University Libraries Technology Plan,

Building Australia s eresearch Capability: the challenge of data management. Adrian Burton and Margaret Henty

National Snow and Ice Data Center A brief overview and data management projects

How To Useuk Data Service

The Importance of a Digital Library in Modern Business

Test Requirements at Top Colleges

Oh My Blawg! Who Will Save the Legal Blogs? *

The Center for Research Libraries

The Library as Partner in University Data Curation

Managing Business Records and Archives at the Getty Center

Benefits of managing and sharing your data

The Dartmouth Digital Library Program, Priorities, and Policies. A Report from the Digital Projects and Infrastructure Group (DPIG)

University Libraries Strategic Goals and Objectives. extracted from: A Strategic Plan for the UNLV Libraries:

Databases & Data Infrastructure. Kerstin Lehnert

Web Archiving and Scholarly Use of Web Archives

Design and selection criteria for a national web archive

Strategic Plan

Planning for Data Curation in the Small Liberal Arts College Environment

Recommendations of the Campus Data Management and Curation Advisory Committee

THE EFFECTS OF BIG DATA ON INFRASTRUCTURE. Sakkie Janse van Rensburg Dr Dale Peters UCT 1

New York Art Resources Consortium Web Archiving Program

North Carolina Department of Cultural Resources Digital Preservation Plan. State Archives of North Carolina State Library of North Carolina

Scott A. Hollander

How To Write A Blog Post On Globus

Cambridge University Library. Working together: a strategic framework

Trinity College Library

Walk Before You Run. Prerequisites to Linked Data. Kenning Arlitsch Dean of the

web archives & research collections

Compute Canada & CANARIE Response to TC3+ Consultation Paper: Capitalizing on Big Data: Toward a Policy Framework for Advancing Digital Scholarship

Allegheny College CIC Information Fluency Workshop February 2013 Information Literacy Plan Report to CIC

Digital Preservation and Access - The 5 Opens Project

Content Management Systems: Who Makes the Rules?

Facilitating the discovery of free online content: the librarian perspective. June Open Access Author Survey March

Workforce Demand and Career Opportunities in University and Research Libraries

University Libraries Strategic Plan 2015

Introduction... 3 Transition to Digital Media... 3 Creation of an Asset New Workflows... 4 Figure

DataShare & Data Audit. Lessons Learned. Robin Rice. Digital Curation Practice, Promise and Prospects

NOAA Environmental Data Management Update for Unidata SAC

Melvin P. Thatcher FamilySearch Utah, USA

The 20 best universities in the world

NISO Thought Leader Meeting on Institutional Repositories

James Hardiman Library. Digital Scholarship Enablement Strategy

University Library SANTA CLARA UNIVERSITY

Checklist and guidance for a Data Management Plan

UC AND THE NATIONAL RESEARCH COUNCIL RATINGS OF GRADUATE PROGRAMS

Collections Associate University Librarian (Acting) Jean Mckenzie. Acquisitions. Scholarly Communication Officer. (under recruitment)

Growing a web archiving program: A case study for evolving an organization-management plan

The Libraries Role in Research Data Management: A Case Study from the University of Minnesota

Working with the British Library and DataCite Institutional Case Studies

Building community clouds to support access to scholarship. Michele Kimpton CEO, DuraSpace Jonathan Markow CSO, DuraSpace

Digital Preservation Webinar Series: Identify

All You Wanted To Know About the Management of Digital Resources in Alma

Data Curation Profile for History

The University of Alabama at Birmingham. Information Technology. Strategic Plan

University-wide Data Management Services: Cross-campus Collaborations at Indiana University

ECU Libraries Joint Strategic Plan

Advanced Archive- It Applica2on Training: Archiving Social Networking and Social Media Sites

Scholarly Use of Web Archives

Functional Requirements for Digital Asset Management Project version /30/2006

SHared Access Research Ecosystem (SHARE)

LIBRARY GUIDE & SELECTED RESOURCES FOR NURSING. To find books about and by nursing theorists:

ALEX THURMAN. PCC Participants Meeting ALA Annual June 30, Columbia University Libraries

Florida Statewide Digital Initiative: Digital Action Plan

Archiving the Web and Beyond: A Look at Twi8er and Facebook (and some other things too)

Research Data Management Guide

Research Data Management Policy

Adding Robust Digital Asset Management to Oracle s Storage Archive Manager (SAM)

The International Internet Preservation Consortium (IIPC)

Archiving the Web: the mass preservation challenge

CUL Strategic Plan Goals for September

Research and the Scholarly Communications Process: Towards Strategic Goals for Public Policy

Digital Library as Partner in Transformative Scholarship

Web Archiving Tools: An Overview

Managing research data and Horizon 2020

Ranking Web of repositories: End users point of view?

Columbia University Digital Library Architecture. Robert Cartolano, Director Library Information Technology Office October, 2009

Transcription:

Contemporary Composers Web Archive (CCWA): Progress in Collaboratively Collecting Composers' Websites June 24, 2015 IAML/IMS Anna Perricci, Columbia University Laura Stokes, Brown University

What is CCWA?

Collaborative Project Borrow Direct (Ivies Plus) Music Librarians Group Brown University (Laura Stokes) Columbia University (Elizabeth Davis) Cornell University (Bonna Boettcher, Leonora Schneller) Dartmouth College (Patricia Fisken) Harvard University (Sarah Adams, Sandi Jo Malmon) Johns Hopkins University (Jennifer Ottervik) Massachusetts Institute of Technology (Peter Munstedt) Princeton University (Darwin Scott) University of Chicago (Scott Landvatter) University of Pennsylvania (Dick Griscom) Yale University (Suzanne Lovejoy) and Anna Perricci, Columbia University

Ex.: Steven Stucky s website, list of captures

Captured page, 10/28/2014

Web Resources Archiving Collaboration Many thanks to the Mellon Foundation Building collaborations among Web archiving communities Other research libraries Users and potential users of web archives Website creators

Premise: we need radical collaboration and cross institutional partnerships Collaborative web archiving pilots projects supported/furthered by existing and developing cohorts of peers focused on cooperative collection development initiatives Photo credit: http://www.concordia. ca/about/strategicdirections/jamesneal.html

What is a web archiving project librarian? Roles and goals: Web (education/outreach about web collecting, coordinating & supporting selection of web resources to archive) Archiving (helping to prioritize, obtain and save content for immediate and long term access) Project (project manager for many moving pieces and partners) Librarian (technical services librarian and liaison roles)

Project elements Incentives grants to advance web archiving tools Collaborative collection building through Ivy Plus / Borrow Direct CCWA; Collaborative Architecture, Urbanism and Sustainability Web Archive (CAUSEWAY); and climate change curation experiment Outreach to site creators Best practices for site creators Citation analysis Humans testing 2061 URLs from citations in scholarship on Human Rights published in 9 major journals in this field in 2010 Around 50% don t work or have major content drift/lead to cited content now what? Leveraging APIs to determine if cited content in the Human Right Web Archive and/or the Internet Archive; part of this process involved assistants looking for missing content on live web Interviews with scholars to enrich use case development

A brief introduction to web archiving Web archiving entails a multifaceted approach to preserve web based materials (e.g. websites) and ensure ongoing access to collected content The main elements of web archiving are Selection Collection Metadata assignment/cataloging Quality assurance Access Long term stewardship Columbia University Libraries Web Resources Collection Program All of the above steps plus a policy to ask permission before collecting

Selection From collection development policy to subject expertise First questions: what to collect, for whom and why? Collection development policy Defining themes and goals For CCWA this was an extension of cooperative collecting of contemporary composers scores Engaged selectors Photo credit: Anna Perricci

Collection / harvest using software How do we get the content for our collections of archived websites? We harvest sites using software, often known as a crawler, spider or robot For collecting and access we use Archive It, a subscription based service of the Internet Archive Photo credit: Anna Perricci

Technical elements of curation Fostering and managing collaboration is a multifaceted pursuit Tools for collecting, track progress and manage communications Archive It Google Drive Basecamp Email aliases & folder rules Photo credit: Anna Perricci

Cataloging & Quality Assurance Cataloging / Metadata assignment essential to discoverability Recognition of Russell Merritt, a skilled music cataloger at Columbia, who has cataloged most of the sites in CCWA and lead the effort to create a process for cataloging the archived websites of contemporary composers Quality assurance testing Many thanks to music librarians who did QA testing last year! Sometimes errors can be corrected through patch crawls/troubleshooting Photo credit: Anna Perricci

Access Who will use what we have collected and how will they access it? We need more use cases We need to make web archives more accessible to get use cases Archive It is an access system with limitations This is a server at the Internet Archive! Photo credit: Anna Perricci

When? Now! Plus long term stewardship We need to collect websites before they disappear but we also must ensure their longterm survival and maintain access to them over time So far we are saving websites in the WARC file format (the preservation standard) and temporarily relying on the Internet Archive to store the files until a repository framework can be established/chosen Photo credit: Anna Perricci

Web archives in the media Jill Lepore, The Cobweb: Can the Internet Be Archived? The New Yorker (January 26, 2015) Lepore got a lot of things right but there was a severely incorrect assertion that most everything on the internet is already in the Internet Archive (it s not!) There was no recognition of curated web archives and their tendency to be of higher quality/fidelity to the original resource Archive It provides a set of tools to focus and improve crawls Photo credit: Anna Perricci

Advocating for curated collections Curated collections are focused rather than haphazard guided by a collection development policy informed by skilled selectors Re crawling sites at regular intervals can show patterns and maintain a consistent flow of information Because we ask permission, we ignore a file that blocks crawlers (so we get things that the Internet Archive otherwise would pass over as a matter of policy) Photo credit: Anna Perricci

Discoverability & facilitating access

CCWA in a library catalog

Collaborative work Selection Automation Curation / Quality control This brings up content questions still to be resolved! Challenges faced Ownership and costs Leadership and consensus, esp. as the project grows

Research implications Loss and destruction of information in the digital world Curation and canonformation How to retain original order when materials are removed from a linked context? Creative Commons: Wasteland by Shane Gorski CC BY ND 2.0

Effect on research Do digital resources and tools lead to fundamentally different questions, or do they provide the basis for new answers to existing questions that could not be answered without newly emerging digital tools? Or: Is web archiving philosophically different from other kinds of archiving? Photo credit: Anna Perricci

Concluding remarks This pilot project has: Effectively created a robust technical framework for curated web archiving Demonstrated some of the advantages (and a few pitfalls) of a collaborative approach Current and future media coverage and scholarly research needs demonstrate why this work is critically necessary

Thank you! Anna Perricci Web Archiving Project Librarian, Columbia University anna.perricci@columbia.edu Laura Stokes Performing Arts Librarian, Brown University laura_stokes@brown.edu Photo Credit: David Niblack, Imagebase.net